/** 
\page marvel-2 MARVEL tutorial: Path CVs

\section marvel-2-aims Aims

Consider the two overlain protein structures that are shown in the figure below.  
Can you see the difference between these two structures?  Can you think of a collective variable that could be used to study the substantial change in structure?

\anchor marvel-2-cdk-fig
\image html belfast-2-cdk.png "CDK2 conformational change, PDB code 2C5X and 2C5Y."

Your answers to the questions posed above are hopefully: yes I can see the difference between the two structures - the upper loop is radically different in the two 
cases - and no I have absolutely no idea as to how to create a collective variable that might be used to study the change in structure.

These answers are interesting as they cut to the very heart of what is interesting about biomolecular systems.  In fact this difficulty is one of the reasons 
why such systems are so widely studied.  If you think for a moment about solid state systems any transition usually involves a substantial change in symmetry.  
Low energy configurations are usually high symmetry while higher energy configurations have a low symmetry.  This makes it easy to design collective variables 
to study solid state transitions - you simply measure the degree of symmetry in the system (see \ref belfast-10).  In biomolecular systems by contrast the 
symmetry does not change substantially during a folding transition.  The unfolded state has a low symmetry but the folded state also has a low symmetry, 
which is part of the reason that it is so difficult to find the folded state from the amino acid sequence alone.

With all this in mind the purpose of this tutorial is to learn about how we can design collective variables that can be used to study 
transitions between different states of these low-symmetry systems.  In particular, we are going to learn how we can design collective variables
that describe how far we have progressed along some pathway between two configurations with relatively low symmetry.  We will in most of this 
tutorial study how these coordinates work in a two-dimensional space as this will allow us to visualize what we are doing.  Hopefully, however, 
you will be able to use what you learn from this tutorial to generalize these ideas so that you can use \ref PATH and \ref PCAVARS in 
higher-dimensional spaces.  

\section marvel-2-lo Learning Outcomes

Once this tutorial is completed students will:

- be able to explain what is computed by a \ref PCAVARS coordinate and write down expressions for these quantities.
- be able to write a PLUMED input file that calculates and prints a \ref PCAVARS coordinate.
- be able to write down an expression for the quantity contained in the \f$s\f$ and \f$z\f$ components of a \ref PATH collective variable.
- be able to write PLUMED input files that calculate \ref PATH collective variables for a range of different metrics.
- be able to measure the quality of a transition state by calculating the isocommittor.

\section marvel-2-resources Resources

The <a href="tutorial-resources/marvel-2.tar.gz" download="marvel-2.tar.gz"> tarball </a> for this project contains the following files:

- transformation.pdb : a trajectory that shows the transition between the \f$C_7ax\f$ and \f$C_7eq\f$ conformers of alanine dipeptide.
- pca-reference.pdb : a file that gives the start and end points of the vector that connects the \f$C_7ax\f$ and \f$C_7eq\f$ conformers.  This file contains the positions of the atoms in these two structures.
- PCA-isocommittor : a directory containing the files required to run isocommittor simulations that monitor the values of \f$\phi\f$, \f$\psi\f$ and the PCA coordinate.
- PATH-isocommittor : a directory containing the files required to run isocommittor simulations that monitor the values of the \ref PATH collective variable \f$S(X)\f$.
- 2CV-isocommittor : a directory containing the files required to run isocommittor simulations that monitor the values of the \ref PATH collective variables \f$S(X)\f$ and \f$Z(X)\f$ 

\section marvel-2-instructions Instructions

In this tutorial we are going to be considering a conformational transition of alanine dipeptide.  In particular we are going to be considering the transition
between the two conformers of this molecule shown below:

\anchor marvel-2-transition-fig
\image html belfast-2-transition.png  "Two metastable states of alanine dipeptide are characterized by their Ramachandran dihedral angles."

Alanine dipeptide is a rather well-studied biomolecule (in fact it is an over studied molecule!).  It is well known that you can 
understand the inter-conversion of the two conformers shown above by looking at the free energy surface as a function of the \f$\phi\f$ and \f$\psi\f$ 
Ramachandran angles as shown below:

\anchor marvel-2-rama-fig
\image html belfast-2-rama.png  "The Free energy landscape of alanine dipeptide in Ramachandran angles in the CHARMM27 force field."
 
In this tutorial we are not going to use these coordinates to study alanine dipeptide.  Instead we are going to see if we can find a single collective 
variable that can distinguish between these two states.

\subsection marvel-2-pca1 PCA coordinates

Consider the free energy surface shown in figure \ref marvel-2-rama-fig.  It is clear that either the \f$\phi\f$ (\f$x\f$-axis) or the 
\f$\psi\f$ (\f$y\f$-axis) angle of the molecule can be used to distinguish between the two configurations shown in 
figure \ref marvel-2-transition-fig.  Having said that, however, given the shape of landscape and the associated thermal fluctuations we would 
expect to see in the values of these angles during a typical simulation, it seems likely that \f$\phi\f$ will do a better job
at distinguishing between the two configurations.   \f$\psi\f$ would most likely be a bad coordinate as when the molecule is in the \f$C_7eq\f$ 
configuration the \f$\psi\f$ angle can fluctuate to any value with only a very small energetic cost.  If we only had information on how the \f$\psi\f$
angle changed during a simulation we would thus struggle to distinguish a transition to the \f$C_7ax\f$ configuration from a thermal fluctuations.
It has been shown that metadynamics simulations that use just the \f$\phi\f$ angle as a collective variable can effectively 
drive the system from the \f$C_7ax\f$ configuration to the \f$C_7eq\f$ configuration.  We will not repeat these calculations here but will instead use driver to determine 
how the values of \f$\phi\f$ and \f$\psi\f$ change as we move between along the transition path that connects the \f$C_7ax\f$ configuration to the 
\f$C_7eq\f$ state.

\verbatim
t1: TORSION ATOMS=2,4,6,9
t2: TORSION ATOMS=4,6,9,11
PRINT ARG=t1,t2 FILE=colvar
\endverbatim 

Lets run this now on trajectory that describes the transition from the \f$C_7ax\f$ configuration to the \f$C_7eq\f$ configuration.  To run this calculation copy the input
file above into a file called plumed.dat and run the command below:

\verbatim
plumed driver --mf_pdb transformation.pdb
\endverbatim

Try plotting each of the two torsional angles in this file against time in order to get an idea of how good a job each one of these coordinates at distinguishing between
the various configurations along the pathway connecting the \f$C_7ax\f$ and \f$C_7eq\f$ configurations.  What you will see is that in both cases the CV does not increase/decrease 
monotonically as the transition progresses.

We can perhaps come up with a better coordinate that incorporates changes in both \f$\phi\f$ and \f$\psi\f$ by using the coordinate illustrated in the figure below.

\anchor pca-figure
\image html marvel-2-pca-coordinates.png "An illustration showing how PCAVARS coorditates work.  The vector connecting some reference state to any state the system is in can be in (purple and orange points) can be projected onto the vector connecting the two states of interest (black arrow) by using the dot product of the vectors shown here."

We can even use PLUMED to calculate this coordinate by using the input shown below:

\verbatim
t1: TORSION ATOMS=2,4,6,9
t2: TORSION ATOMS=4,6,9,11
tc: COMBINE ARG=t1,t2 COEFFICIENTS=2.621915,-2.408714 PERIODIC=NO 
PRINT ARG=t1,t2,tc FILE=colvar
\endverbatim

Try calculating the values of the above collective variables for each of the configurations in the transformation.pdb file by using the 
command that was given earlier.

Notice that what we are using here are some well known results on the dot product of two vectors here.  Essentially if the values of the 
Ramachandran angles in the \f$C_7eq\f$ configuration are \f$(\phi_1,\psi_1)\f$ and the Ramachandran angles in the \f$C_7ax\f$ configuration are
\f$(\phi_2,\psi_2)\f$. If our instantaneous configuration is \f$(\phi_3,\psi_3)\f$ we can thus calculate the following projection 
on the vector connecting the \f$C_7eq\f$ state to the \f$C_7ax\f$ state:

\f[
s = (\phi_2 - \phi_1).(\phi_3 - \phi_1) + (\psi_2 - \psi_1).(\psi_3 - \psi_1)
\f]

which is just the dot product between the vector connecting the point \f$(\phi_1,\psi_1)\f$ to \f$(\phi_2,\psi_2)\f$ and the vector 
connecting the point \f$(\phi_1,\psi_1)\f$ to \f$(\phi_3,\psi_3)\f$.  If we call these two vectors \f$\mathbf{v}_1\f$ and 
\f$\mathbf{v}_2\f$ we can write this dot product as:

\f[
\mathbf{v}_1 \cdot \mathbf{v}_2 = | \mathbf{v}_1 | | \mathbf{v}_2 | cos( \alpha )
\f]

where \f$| \mathbf{v}_1 |\f$ and \f$| \mathbf{v}_2 |\f$ are the magnitudes of our two vectors and where \f$\alpha\f$ is the angle between
the two of them.  Elementary trigonometry thus tells us that if \f$\mathbf{v}_1\f$ is a unit vector (i.e. if it has magnitude 1) the dot product
is thus equal to the projection of the vector \f$\mathbf{v}_2\f$ on \f$\mathbf{v}_2\f$ as shown in figure \ref pca-figure.

This is an useful idea.  In fact it is the basis of the \ref PCAVARS collective variable that is implemented in PLUMED so we can (almost) calculate 
the projection on this vector by using the input shown below:

\verbatim
t1: TORSION ATOMS=2,4,6,9
t2: TORSION ATOMS=4,6,9,11
p: PCAVARS REFERENCE=angle-pca-reference.pdb TYPE=EUCLIDEAN 
PRINT ARG=t1,t2,p.* FILE=colvar 
\endverbatim

We cannot, however, do this in practice (we also shouldn't really use the previous input either) as the \ref TORSION angles that we use 
to define our vectors here are periodic.  In this next section we will thus look at how we can avoid this problem of periodicity by working 
in a higher dimensional space. 

\subsection marvel-2-metric1 PCA with the RMSD metric

In the previous section I showed how we can use the projection of a displacement on a vector as a collective variable.  I demonstrated this
in a two dimensional space as this makes it easy to visualize the vectors involved.  We are not forced to work with two dimensional vectors,
however.  We can instead find the vector that connects the \f$C_7eq\f$ and \f$C_7ax\f$ states in some higher dimensional space and project our current
coordinate on that particular vector.  In fact we can even define this vector in the space of the coordinates of the atoms.  In other words,
if the 3\f$N\f$ coordinate of atomic positions is \f$\mathbf{x}^{(1)}\f$ for the \f$C_7eq\f$ configuration and \f$\mathbf{x}^{(2)}\f$ for the \f$C_7ax\f$ 
configuration and if the instantaneous configuration of the atoms is \f$\mathbf{x}^{(3)}\f$ we can use the following as a CV:
\f[
s = \sum_{i=1}^{3N} (x^{(2)}_i - x^{(1)}_i ) (x^{(3)}_i - x^{(1)}_i ) 
\f]
where the sum here runs over the \f$3N\f$-dimensional vector that defines the positions of the \f$N\f$ atoms in the system.  This is what
(in a manner of speaking - I will return to this point momentarily) is calculated by this PLUMED input:

\verbatim
t1: TORSION ATOMS=2,4,6,9
t2: TORSION ATOMS=4,6,9,11
p: PCAVARS REFERENCE=pca-reference.pdb TYPE=OPTIMAL
PRINT ARG=t1,t2,p.* FILE=colvar
\endverbatim

Use this input to analyze the set of configurations that are in the transformation.pdb file.

Let's now look further at the caveat that I alluded to before we ran the calculations.  I stated that we are only calculating:
\f[
s = \sum_{i=1}^{3N} (x^{(2)}_i - x^{(1)}_i ) (x^{(3)}_i - x^{(1)}_i )
\f]
in a manner of speaking.  The point is that we would not want to calculate exactly this quantity because the vectors of displacements that
are calculated in this way includes both rotational and translational motion.  This is a problem as the majority of the change
in moving from the \f$C_7ax\f$ configuration shown in figure \ref marvel-2-transition-fig to the \f$C_7eq\f$ configuration shown in figure 
\ref marvel-2-transition-fig comes from the translation of all the atoms.  To put this another way if I had, in figure \ref marvel-2-transition-fig,
shown two images of the \f$C_7ax\f$ configuration side by side the displacement in the positions of the atoms in those two structures would be
similar to the displacement of the atoms in \ref marvel-2-transition-fig as as the majority of the displacement in the vector of atomic positions 
comes about because I have translated all the atoms in the molecule
rightwards by a fixed amount.  I can, however, remove these translational displacements from consideration when calculating these vectors.  In addition,
I can also remove any displacements due rotation in the frame of reference of the molecule.  If you are interested in how this is done in practice you can 
read about it on the manual page about the \ref RMSD collective variable.  For what concerns us here, however, all we need to know is that when we 
use the OPTIMAL metric we are calculating a vector which tells us how far the atoms have been displaced in moving from structure A to structure
B in a way that excludes any displacements due to translation of the center of mass of the molecule or any displacements that occur due to rotation
of the Cartesian frame. 
 
\subsection marvel-2-iso The isocommittor surface

In the previous sections I have been rather loose when talking about better and worse collective variables in that I have not been clear in my distinction
between what makes a collective variable good and what makes a collective variable bad.  In this section I thus want to discuss one method that we can use
to judge the quality of a collective variable.  This method involves calculating the so called isocommittor.  The essential notion behind this technique is 
that there will be a saddle point between the two states of interest (the \f$C_7ax\f$ and \f$C_7eq\f$ configurations in our alanine dipeptide example).  If the free energy
is plotted as a function of a good collective variable the location of this dividing surface - the saddle point - will appear as a maximum.

Lets suppose that we now start a simulation with the system balanced precariously on this maximum.  The system will, very-rapidly, fall off the maximum and move
towards either the left or the right basin.  Furthermore, if this maximum provides a good representation of the location of the transition state ensemble - in other
words if the position of maximum in the low-dimensional free energy surface tells us something about the structure in the transition state ensemble - then 
50% of trajectories started from this point will fall to the left and 50% will fall to the right.  If by contrast the maximum in the free energy surface does 
not represent the location of the transition state well then there will be an imbalance between the number of trajectories that move rightwards and the number 
that move leftwards.

We can think of this business of the isocommittor one further way, however.  If in the vicinity of the transition state between the two basins the collective variable
is perpendicular to the surface separating these two states half of the trajectories that start from this configuration will move to the left in CV space while the 
other half will move to the right.  If the dividing surface and the CV are not perpendicular in the vicinity of the transition state, however, there will be an 
imbalance between the number of trajectories that move rightwards and the number of trajectories that move leftwards.  We can thus determine the goodness of a 
collective variable by shooting trajectories from what we believe is the transition state and examining the number of trajectories that move into the left and 
right basins.

Lets make all this a bit more concrete by looking at how we might calculate the isocommittor by using some of the collective variables we have introduced in this 
exercise.  You will need to go into the directory in the tar ball that you downloaded that is called PCA-isocommittor.  In this directory you will find a number
of files that will serve as input to gromacs 5 and PLUMED.  You thus need to ensure that you have an installed version of gromacs 5 patched with PLUMED on your 
computer in order to perform this exercise.  In addition to these input files you will find a bash script called script.sh, which we are going to use in order 
to set of a large number of molecular dynamics simulations.  If you open script.sh you will find the following lines near the top:

\verbatim
GROMACS_BIN=/Users/gareth/MD_code/gromacs-5.1.1/build/bin
GROMACS=$GROMACS_BIN/gmx
source $GROMACS_BIN/GMXRC.bash
\endverbatim  

These will need to be adjusted so that the GROMACS_BIN variable points at the bin directory of the gromacs build on your computer.  Once you have made this 
modification though you can run the calculation by issuing the following command in the PCA-isocommittor directory:

\verbatim
./script.sh
\endverbatim

This command submits 50 molecular dynamics simulations that all start from a configuration that lies between the \f$C_7eq\f$ and \f$C_7ax\f$ configurations of alanine 
dipeptide.  In addition, this command also generates some scripts that allow us to visualize how the \f$\phi\f$, \f$\psi\f$ and PCA coordinates that we 
introduced in the previous sections change during each of these 50 simulations.  If you load gnuplot and issue the command:

\verbatim
load "script_psi.gplt"
\endverbatim

you see how \f$\psi\f$ changes during the course of the 50 simulations.  Similarly the gnuplot command:

\verbatim
load "script_phi.gplt"
\endverbatim

will show how how \f$\phi\f$ changes during the course of the 50 simulations and:

\verbatim
load "script_pca.gplt"
\endverbatim

gives you the information on the PCA coordinates.  If you look at the \f$\psi\f$ and PCA data first you can see clearly that it is very difficult to distinguish
the configurations that moved to the \f$C_7eq\f$ basin from the trajectories that moved to the \f$C_7ax\f$ basin.  By contrast if you look at the data on the \f$\phi\f$ angles 
you can indeed use these plots to distinguish the trajectories that moved to \f$C_7eq\f$ from the trajectories that moved to \f$C_7ax\f$.  It is abundantly clear, however,
that the number of trajectories that moved to \f$C_7eq\f$ is not equal to the number of trajectories that moved to \f$C_7ax\f$.  This CV, therefore, is not capturing 
the transition state ensemble. 

One thing you will have seen from these examples is that the \ref PCAVARS coordinate that were introduced in the previous 
sections provides an extremely poor model for the transition state ensemble.  The value of the isocommittor at the maximum for both of these variables is not at all 
close to 50%.  In fact the CV is not even particularly good at capturing the difference between these two states.  If you look at the free energy surface shown below 
it is perhaps clear why.

\anchor ala-tstate
\image html "marvel-2-trans-state.png" "The PCAVARS coordinate and the transition state are highlighted on this figure.  As you can see the coordinate does not pass through the transition state" 

You can see the location of the saddle point between these two states in this surface and it is very clear that the vector connecting the \f$C_7eq\f$ state to the 
\f$C_7ax\f$ state does not pass through this point.  In fact it would be extremely fortuitous if a vector connecting an initial state and a final state also passed
through the intermediate transition state between them.  We can, after all, define the equation of straight line (a vector) if we are given only two points 
on it.  In the next section we are thus going to see how we can resolve this problem by introducing a non-linear (or curvilinear) coordinate.

\subsection marvel-2-pathcvs Path collective variables

Consider the black path that connects the \f$C_7ax\f$ and \f$C_7eq\f$ states in the free energy shown below:

\anchor marvel-2-good-bad-path-fig
\image html belfast-2-good-bad-path.png "Examples of good and bad paths:  the black path follows the minimum free energy path connecting the two metastable states, while the red path connects the two states directly via a linear path that passes through high energy"

This black pathways appears to be the "perfect" coordinate for modelling this conformational transition as it passes along the lowest 
energy pathway that connects the two states and because it thus passes over the lowest saddle point that lies between them.  We can calculate 
such a coordinate with PLUMED by using the input file below (I will return to the mathematical details of how this works momentarily)

\verbatim
path: PATH REFERENCE=path-reference.pdb TYPE=OPTIMAL LAMBDA=15100.
PRINT ARG=* STRIDE=2 FILE=colvar FMT=%12.8f
\endverbatim

Lets thus use this input and run an isocommittor analysis using the location of the maximum in this coordinate as the start point for all our trajectories.
Everything you need to do this analysis is in the PATH-isocommittor directory.  Once again you will find that there is a script.sh bash script inside this
directory, which, as in the previous section, you can use to run a large number of molecular dynamics simulation.  Furthermore, similarly to the last 
section you will need to begin this exercise by modifying the location of path to gromacs within this script.  Once you have made this modification 
submit your molecular dynamics jobs by issuing the command:

\verbatim
./script.sh
\endverbatim

You can then plot the data output using gnuplot and the command:

\verbatim
load "script_path.gplt"
\endverbatim

Unlike what we saw for the \ref PCAVARS variables in the previous section we find that it is easy to use these \ref PATH variables to distinguish those
configurations that moved to \f$C_7ax\f$ from those that moved to \f$C_7eq\f$.  Having said that, however, we still have a large imbalance between the number of 
trajectories that move rightwards and the number that move leftwards.  We are thus still a long way from unambiguously identifying the location of the 
transition state ensemble for this system.

\subsection marvel-2-pathcvs2 The mathematics of path collective variables  

Let's now take a moment to discuss the mathematics of these coordinates, which is not so complicated if we think about what they do through an analogy.
Suppose that you were giving your friend instructions as to how to get to your house and lets suppose these instructions read something like this:

1. Take the <b> M1 motorway </b> and get off <b> at junction 5 </b>
2. At <b> the roundabout </b> you need to take <b> the third exit </b> towards <b> Crumlin </b>
3. Follow the road as far as the <b> farmers arms </b> then take the <b> next left </b>
4. The house is <b> on the corner by the garage. </b>  

If you think about these instructions in the abstract what you have is a set of way markers (the item I have put in bold) in a particular order.  The 
list of way markers is important as is the order they appear in in the instructions so we  incorporate both these items in the \ref PATH coordinates 
that we have just used to study the transition between the transition between \f$C_7eq\f$ and \f$C_7ax\f$.  In these coordinates the way markers are a set of 
interesting points in a high dimensional space.  In other words, these are like the configurations of \f$C_7eq\f$ and \f$C_7ax\f$ that we used in the previous sections
when talking about \ref PCAVARS.  Each of these configurations lies along the path connecting \f$C_7eq\f$ and \f$C_7ax\f$ and, as in the directions example above, one 
must pass them in a particular order in order to pass between these two conformations.  The final CV that we have used above is thus:

\f[
S(X)=\frac{\sum_{i=1}^{N} i\ \exp^{-\lambda \vert X-X_i \vert }}{ \sum_{i=1}^{N} \exp^{-\lambda \vert X-X_i \vert } }
\f]

In this expression \f$\vertX-X_i\vert\f$ is the distance between the instantaneous coordinate of the system, \f$X\f$, in the high-dimensional space and 
\f$X_i\f$ is the coordinate of the \f$i\f$th way mark in the path.  The largest exponential in the sum that appears in the numerator and the denominator
will thus be the one that corresponds to the point that is closest to where the system currently lies.  In other words, \f$S(X)\f$, measures the position 
on a (curvilinear) path that connects two states of interest as shown in red in the figure below:

\anchor marvel-2-ab-sz-fig
\image html belfast-2-ab-sz.png "The S variable can be thought as the length of the red segment, while the Z variable is the length of the green one." 

\subsection marvel-2-pathz The Z(X) collective variable

You may reasonably ask what the purpose these \ref PATH collective variables variables serve given that in this case they seem to do no better than 
\f$\phi\f$ when it comes to the tests we have performed on calculating the isocommittor.  To answer this question we are going to run one final set of 
isocommittor simulations.  The input for these calculations are in the directory called 2CV-isocommittor.  Once again you will find that there is a script.sh bash script inside this
directory, which, as in the previous section, you can use to run a large number of molecular dynamics simulation.  Furthermore, similarly to the last
section you will need to begin this exercise by modifying the location of path to gromacs within this script.  Once you have made this modification
submit your molecular dynamics jobs by issuing the command:

\verbatim
./script.sh
\endverbatim

You can then plot the data output using gnuplot and the commands:

\verbatim
load "script_pca.gplt"
\endverbatim

and

\verbatim
load "script_path.gplt"
\endverbatim

What is plotted by these commands is slightly different from what was plotted in the previous exercises where we calculated the isocommittor.  
Instead of plotting the value of the CV against simulation time the above commands plot the values that 2CVs take during the simulation.  The 
script called script_path.gplt plots the value of the \f$S(X)\f$ collective variable on the x-axis and the value of the following quantity
on the y-axis:

\f[
Z(X)=-\frac{1}{\lambda}\log (\sum_{i=1}^{N} \ \exp^{-\lambda \vert X-X_i \vert })
\f]

What this quantity measures is shown in green in the figure \ref marvel-2-ab-sz-fig.  Essentially it measures the distance between the instantaneous configuration 
the system finds itself in and the path that is marked out using the way markers.  If you plot the data using script_path.gplt what you thus see is that the 
system never moves very far from the path that is defined using the \ref PATH command.  In short the system follows this path from the transition state back to 
either the \f$C_7eq\f$ or \f$C_7ax\f$ configuration.

We can calculate a quantity similar to \f$Z(X)\f$ for the \ref PCAVARS collective variables.  Furthermore, when we plot the data we have generated using this
exercise using script_pca.gplt the value this quantity takes is shown plotted against the instantaneous value of the \ref PCAVARS collective variable.  If you 
compare this graph with what was obtained when you plotted the output from \ref PATH above you see that the system has moved very far from the \ref PCAVARS 
coordinate.

This exercise illustrates the strength of these \ref PATH collective variables.  We can use a \ref PATH to monitor how a large number of coordinates change during 
a chemical transition.  Furthermore, we can use \f$Z(X)\f$ to measure how much real trajectories deviate from our \ref PATH and thus have a quantitative measure
of how well our \ref PATH represents the true transition mechanism.  Compare this with using a single CV such as \f$\phi\f$.  When use use a single CV we 
map the high-dimensional data from the trajectory into a lower dimensional space.   We thus lose some information on what occurs during the transition.  
To be clear we also loose information on the what occurs during the transition when we use \ref PATH as any mapping into a lower dimensional space deletes information.
In the \ref PATH case, however, we can use the value of \f$Z(X)\f$ to measure how much data has been lost in mapping the trajectory onto \f$S(X)\f$. 

These two coordinates, \f$S(X)\f$ and \f$Z(X)\f$, are very flexible.  They are thus been used widely in the literature on modelling conformational changes of biomolecules.
A part of this flexibility comes because one can use any set of way markers to define the \ref PATH.  Another flexibility comes, however, when you 
recognize that you can also change the way in which the distance, \f$\vertX- X_i\vert\f$, is calculated in the two formulas above.  For example
this distance can be calculated using the \ref RMSD distance or it can be calculated by measuring the sum of the squares of a set of displacements
in collective variable values (see \ref TARGET).  Changing the manner in which the distance between path way points is calculated thus provides a 
way to control the level of detail that is incorporated in the description of the reaction \ref PATH.

\subsection marvel-2-pathcv-find Optimizing path collective variables

Hopefully the previous sections have allowed you to understand how \ref PATH collective variables work and the sorts of problems they might be used to 
solve.  If you have one of these problems to solve the next reasonable question to ask is: how to collect the set of reference frames that serve as the 
way markers on your \ref PATH.  Unfortunately, there is no single answer to this question.  Different researchers have used different methods 
including using packages that morph one protein structure into another, using information from prior molecular dynamics or enhanced sampling calculations
and even using \ref PATH collective variables that change adaptively as the simulation progresses.  Ultimately, you will need to find the best method
for solving your particular problem.  Having said that, however, there is some general guidance on setting up \ref PATH collective variable and it is 
this that we will focus on in this section.  The first thing that you will need to double check is the spacing between the frames in your
\ref PATH.  Lets suppose that your \ref PATH has \f$N\f$ of these way markers upon it you will need to calculate is the \f$N \times N\f$ matrix of 
distances between way markers.  That is to say you will have to calculate the distance \f$\vertX_j-X_i\vert\f$ between each pair of frames.
The values of the distance in this matrix for a good \ref PATH are shown in the figure below:

\anchor marvel-2-good-matrix-fig
\image html belfast-2-good-matrix.png "A good distance matrix for path variables has the gull wing shape shape shown here." 

For contrast the values of the distances in this matrix for a bad \ref PATH are shown in the figure below:

\anchor marvel-2-bad-matrix-fig
\image html belfast-2-bad-matrix.png "A bad distance matrix for path variables is rather irregular."

If the distance matrix looks like the second of the two figures shown above this is indicates that the frames in the \ref PATH that have been chosen 
are not particularly effective.  Lets suppose that we have a \ref PATH with four way markers upon it.  In order for the \f$S(x)\f$ CV that was defined 
earlier to work well frame number 3 must be further from frame number 1 than frame number 2.  Similarly frame number 4 must be still further
from frame number 1 than frame number 3.  This is what the gull wing shape in \ref marvel-2-good-matrix-fig is telling us.  The order of the frames in the 
rows and columns of the matrix is the same as the order that they are run through in the sums in the equation for \f$S(X)\f$.  The shape of the surface 
in this figure shows that the distance between frames \f$i\f$ and \f$j\f$ increases monotonically as the magnitude of the difference between \f$i\f$ and 
\f$j\f$ is increased, which is what is required.

A second important requirement of a good \ref PATH is shown in the figure below:  

\anchor marvel-2-good-vs-bad-fig
\image html belfast-2-good-vs-bad.png "Comparison between the distances between neighboring frames on the PATH.  A good PATH will have a set of frames that are all approximately equally spaced along it."

A good \ref PATH has an approximately equal spacing between the neighboring frames along it.  In other words, the distance between frame 1 and frame 2 
is approximately equal to the distance between frame 2 and frame 3 as shown above.  When this condition is satisfied a good criterion for selecting a 
suitable \f$\lambda\f$ parameter to use is:

\f[
\lambda=\frac{2.3 (N-1) }{\sum_{i=1}^{N-1} \vert X_i-X_{i+1} \vert }
\f]

\section marvel-2-conc Conclusions and further work

If you have worked through all of this tutorial make sure that you have understood it by ensuring that you understand what the list of learning outcomes
in section \ref marvel-2-lo means and that you can use PLUMED to perform all these tasks.  You might then want to read the original paper on the \ref PATH
collective variable method as well as a few other articles in which these coordinates have been used to analyze simulations and to accelerate sampling.

- Davide Branduardi and Francesco Luigi Gervasio and Michele Parrinello <a href="http://aip.scitation.org/doi/10.1063/1.2432340"> From A to B in free energy space </a> J. Chem. Phys., 126, 054103 (2007)

If you are interested in learning more about isocommittor surfaces and the transition state ensemble you should read up on the transition path sampling method.

*/

link: @subpage marvel-2

description: This tutorial explains how to use various kinds of path collective variables

additional-files: marvel-2
