/**
\page trieste-3 Trieste tutorial: Using restraints 

\section trieste-3-aims Aims

The aim of this tutorial is to introduce the users to the use of constant biases in PLUMED. 

\section trieste-3-objectives Objectives

- Apply a restraint on a simulation over one or more collective variables 
- Understand the effect of a restraint on the acquired statistics
- Perform a simple un-biasing of a restrained simulation
- Add an external potential in the form of an analytical or numerical function

\section trieste-3-resources Resources

The \tarball{trieste-3} for this tutorial contains the following files:
- wdimer.pdb: a PDB file for two molecules of water in vacuo
- wdimer.tpr: a GROMACS run file to perform MD of two water molecules
- diala.pdb: a PDB file for alanine dipeptide in vacuo
- diala.tpr: a GROMACS run file to perform MD of alanine dipeptide
- do_block_histo.py: the python script from \ref trieste-2 to perform block averaging over histograms

This tutorial has been tested on a pre-release version of version 2.4. However, it should not take advantage
of 2.4-only features, thus should also work with version 2.3.

\section trieste-3-intro Introduction

PLUMED can calculate conformational properties of a system a posteriori as well as on-the-fly. This information can be used to manipulate a simulation on-the-fly. This means adding energy terms in addition to those of the original Hamiltonian. These additional energy terms are usually referred to as \ref Bias. In the following we will see how to apply a constant bias potential with PLUMED. It is preferable to run each exercise in a separate folder.

\hidden{Summary of theory}
\subsection trieste-3-theory-biased-sampling Biased sampling

A system at temperature \f$ T\f$ samples 
conformations from the canonical ensemble:
\f[
  P(q)\propto e^{-\frac{U(q)}{k_BT}}
\f].
Here \f$ q \f$ are the microscopic coordinates and \f$ k_B \f$ is the Boltzmann constant.
Since \f$ q \f$ is a highly dimensional vector, it is often convenient to analyze it
in terms of a few collective variables (see \ref trieste-1 ).
The probability distribution for a CV \f$ s\f$ is 
\f[
  P(s)\propto \int dq  e^{-\frac{U(q)}{k_BT}} \delta(s-s(q))
\f]
This probability can be expressed in energy units as a free energy landscape \f$ F(s) \f$:
\f[
  F(s)=-k_B T \log P(s)
\f].

Now we would like to modify the potential by adding a term that depends on the CV only.
That is, instead of using \f$ U(q) \f$, we use \f$ U(q)+V(s(q))\f$.
There are several reasons why one would like to introduce this potential. One is to
prevent the system from sampling some undesired portion of the conformational space.
As an example, imagine  that you want to study dissociation of a complex of two molecules.
If you perform a very long simulation you will be able to see association and dissociation.
However, the typical time required for association will depend on the size of the simulation
box. It could be thus convenient to limit the exploration to conformations where the 
distance between the two molecules is lower than a given threshold. This could be done
by adding a bias potential on the distance between the two molecules.
Another example
is the simulation of a portion of a large molecule taken out from its initial context.
The fragment alone could be unstable, and one might want to add additional
potentials to keep the fragment in place. This could be done by adding
a bias potential on some measure of the distance from the experimental structure
(e.g. on root-mean-square deviation).

Whatever CV we decide to bias, it is very important to recognize the
effect of this bias and, if necessary, remove it a posteriori.
The biased distribution  of the CV will be
\f[
  P'(s)\propto \int dq  e^{-\frac{U(q)+V(s(q))}{k_BT}} \delta(s-s(q))\propto e^{-\frac{V(s)}{k_BT}}P(s)
\f]
and the biased free energy landscape
\f[
  F'(s)=-k_B T \log P'(s)=F(s)+V(s)+C
\f]
Thus, the effect of a bias potential on the free energy is additive. Also notice the presence
of an undetermined constant \f$ C \f$. This constant is irrelevant for free-energy differences
and barriers, but will be important later when we will learn the weighted-histogram method.
The last equation can be inverted so as to obtain the original, unbiased free-energy 
landscape from the biased one, just by subtracting the bias potential
\f[
  F(s)=F'(s)-V(s)+C
\f]
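
As an illustration of the last relation, the following Python sketch builds a toy free energy on a grid, adds a bias, and checks that subtracting the bias from the biased free energy recovers the original landscape up to a constant. The double-well form of \f$ F(s) \f$, the harmonic bias and the grid are assumptions chosen only for illustration; they do not correspond to any system used in this tutorial.

```python
import math

KBT = 2.4777  # kJ/mol, roughly k_B T at 298 K

def free_energy(prob):
    """Convert a (possibly unnormalized) probability into a free energy."""
    return [-KBT * math.log(p) for p in prob]

# Grid for a toy CV s and an assumed double-well free energy F(s) in kJ/mol.
grid = [-1.5 + 0.05 * i for i in range(61)]
f_true = [5.0 * (s * s - 1.0) ** 2 for s in grid]

# Assumed bias potential V(s): harmonic restraint centered at s = 0.5.
bias = [0.5 * 10.0 * (s - 0.5) ** 2 for s in grid]

# Biased distribution P'(s), proportional to exp(-(F(s) + V(s)) / kBT).
p_biased = [math.exp(-(f + v) / KBT) for f, v in zip(f_true, bias)]
norm = sum(p_biased)
p_biased = [p / norm for p in p_biased]

# Unbias: F(s) = F'(s) - V(s) + C.
f_biased = free_energy(p_biased)
f_recovered = [fb - v for fb, v in zip(f_biased, bias)]

# The recovered and true landscapes differ only by the constant C.
offsets = [fr - ft for fr, ft in zip(f_recovered, f_true)]
print("spread of the offset C over the grid:", max(offsets) - min(offsets))
```

The printed spread is zero up to rounding, which is the statement that \f$ C \f$ does not depend on \f$ s \f$.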

Additionally, one might be interested in recovering the distribution of an arbitrary
observable. E.g., one could add a bias on the distance between two molecules and want to
compute the unbiased distribution of some torsional angle. In this case
there is no straightforward relationship that can be used, and one has to go back to
the relationship between the microscopic probabilities:
\f[
  P(q)\propto P'(q) e^{\frac{V(s(q))}{k_BT}}
\f]
The consequence of this expression is that one can obtain any kind of unbiased
information from a biased simulation just by weighting every sampled conformation
with a weight
\f[
  w\propto e^{\frac{V(s(q))}{k_BT}}
\f]
That is, frames that have been explored
in spite of a high (disfavoring) bias potential \f$ V \f$ will be counted more
than frames that have been explored with a less disfavoring bias potential.
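
The weighting can be made concrete with a minimal two-state sketch in Python. Both the bias values and the "trajectory" below are hypothetical: state A is assumed to be penalized by \f$ k_BT \ln 4 \f$, state B is not biased, and the biased run is assumed to visit the two states equally often.

```python
import math

KBT = 2.4777  # kJ/mol, roughly k_B T at 298 K

# Assumed bias: state "A" is penalized by kBT*ln(4), state "B" is not biased.
bias = {"A": KBT * math.log(4.0), "B": 0.0}

# Hypothetical biased trajectory: both states are visited equally often.
frames = ["A"] * 50 + ["B"] * 50

# Each frame gets a weight proportional to exp(+V / kBT).
weights = [math.exp(bias[state] / KBT) for state in frames]
total = sum(weights)

# Weighted populations estimate the unbiased probabilities.
p_unbiased = {
    state: sum(w for s, w in zip(frames, weights) if s == state) / total
    for state in ("A", "B")
}
print(p_unbiased)
```

Frames in the disfavored state A carry four times the weight of frames in B, so the reweighted populations are close to 0.8 and 0.2 even though the biased run spent half of its time in each state.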

\endhidden

We will make use of two toy models. The first is a water dimer, i.e. two molecules of water in vacuo, which we will use to compare the effect of a constant bias on the equilibrium properties of the system, which in this case can be readily computed. The second toy model is alanine dipeptide in vacuo. This system is more challenging to characterize with a standard MD simulation, and we will see how we can use an interactive approach to build a constant bias that will help in flattening the underlying free energy surface and thus speed up the sampling.

\note Create a folder for each exercise and use sub-folders if you want to run the same simulation with multiple choices for the parameters

\section trieste-3-ex-1 Exercise 1: converged histogram of the water dimer relative distance

\image html trieste-3-wdimer.png "A water dimer"

First of all let's start to learn something about the water dimer system by running a first simulation. You can start by creating a folder with the dimer.tpr file and running a simulation.

\verbatim
> gmx mdrun -s dimer.tpr  
\endverbatim

In this way we have a 25 ns long trajectory that we can use to have a first look at the behavior of the system.
Is the sampling of the relative distance between the two water molecules converged?

Use plumed driver to analyze the trajectory and evaluate the quality of the sampling.

Here you can find a sample `plumed.dat` file that you can use as a template.
Whenever you see a highlighted \highlight{FILL} string, this is a string that you should replace.

\plumedfile
# vim:ft=plumed
#compute the distance between the two oxygens
d: DISTANCE ATOMS=1,4
#accumulate block histograms
hh: HISTOGRAM ARG=d STRIDE=10 GRID_MIN=0 GRID_MAX=4.0 GRID_BIN=200 KERNEL=DISCRETE CLEAR=10000
#and dump them
DUMPGRID GRID=hh FILE=my_histogram.dat STRIDE=10000
# Print the collective variable.
PRINT ARG=d STRIDE=10 FILE=distance.dat 
\endplumedfile

\verbatim
> plumed driver --mf_xtc traj_comp.xtc --plumed plumed.dat  
> python3 do_block_histo.py > final-histo-10000.dat
\endverbatim

If there is something you don't remember about this procedure go back and check in \ref trieste-2 .
There you can also find a python script to perform block averaging of the histograms and assess the error. 
The result should be comparable with the following:
\image html trieste-3-histo-dimer.png "A histogram of the relative distance (in nm) with errors"
Notice the peak at 0.9 nm: this is an effect of using a cut-off for the calculation of the interactions in the simulation (check the run-dimer.mdp file for the properties of the run).
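
The idea behind the block analysis can be sketched in a few lines of Python. This is not the `do_block_histo.py` script from \ref trieste-2 , only a minimal illustration under the assumption that each block histogram is available as a list of bin values and that the blocks are statistically independent. The function name `block_average` and the toy numbers are hypothetical.

```python
import math

def block_average(blocks):
    """Return per-bin mean and standard error over a list of block histograms."""
    n_blocks = len(blocks)
    n_bins = len(blocks[0])
    means, errors = [], []
    for b in range(n_bins):
        values = [block[b] for block in blocks]
        mean = sum(values) / n_blocks
        # Unbiased sample variance across blocks.
        var = sum((v - mean) ** 2 for v in values) / (n_blocks - 1)
        means.append(mean)
        # Standard error of the mean, assuming uncorrelated blocks.
        errors.append(math.sqrt(var / n_blocks))
    return means, errors

# Three toy block histograms with two bins each.
toy_blocks = [[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
means, errors = block_average(toy_blocks)
print(means, errors)
```

If the blocks are too short to be uncorrelated, this estimate of the error is too small, which is why the block length matters when judging convergence.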

\section trieste-3-ex-2 Exercise 2: Apply a linear restraint on the same collective variable 
Now we will try to apply a linear restraint on the relative distance and compare the resulting distribution.
The new sampling will reflect the effect of the bias.
Be careful about the statistics: in the simulation of exercise 1 you were post-processing a trajectory of 125000 frames, accumulating one frame every ten in a histogram and clearing
the histogram after 10000 frames. As a result you had 12 blocks in the form of 11 analysis.* files and a final block named my_histogram.dat. In the following, try to accumulate on-the-fly
the same amount of statistics. Look into the .mdp file to see how often frames are written to the trajectory. If you write too many analysis.* files (i.e. 100 files), plumed will fail
with an error.
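
The bookkeeping can be checked with a small Python helper. The function `block_statistics` is hypothetical and only does integer arithmetic on the STRIDE and CLEAR values; it assumes that all quantities are given in the same unit (trajectory frames when post-processing with driver, MD steps when running on-the-fly).

```python
def block_statistics(n_frames, stride, clear):
    """Count samples per block and complete blocks for HISTOGRAM-style settings.

    All three arguments are expressed in the same unit: trajectory frames when
    post-processing with driver, MD steps when running on-the-fly.
    """
    samples_per_block = clear // stride
    complete_blocks = n_frames // clear
    leftover = n_frames % clear
    return samples_per_block, complete_blocks, leftover

# Post-processing settings of exercise 1: 125000 frames, STRIDE=10, CLEAR=10000.
print(block_statistics(125000, 10, 10000))
```

For exercise 1 this gives 1000 samples per block and 12 complete blocks, with 5000 frames left over. To obtain comparable noise on-the-fly, the ratio CLEAR/STRIDE expressed in MD steps should give the same number of samples per block.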

\plumedfile
# vim:ft=plumed
#compute the distance between the two oxygens
d: DISTANCE __FILL__
#accumulate block histograms
hh: HISTOGRAM ARG=d KERNEL=DISCRETE STRIDE=500 CLEAR=500000 GRID_MIN=0 GRID_MAX=4.0 GRID_BIN=200
#and dump them
DUMPGRID __FILL__

#apply a linear restraint
lr: RESTRAINT ARG=d KAPPA=0 AT=0 SLOPE=2.5

# Print the collective variable and the bias.
PRINT __FILL__ 
\endplumedfile

In a new folder we can run this new simulation, this time biasing and analyzing the simulation on-the-fly.

\verbatim
> gmx mdrun -s dimer.tpr -plumed plumed.dat 
\endverbatim

The histogram should look different.

The effect of a constant bias is that of systematically changing the probability of each conformation by a factor
\f$ \exp(-V_{bias}/k_{B}T) \f$. This means that it is easily possible to recover the un-biased distribution, at least in the regions of the conformational space that have been thoroughly sampled. In practice the statistical weight of each frame is no longer 1 but is given by \f$ \exp(+V_{bias}/k_{B}T) \f$, the exponential of the bias in units of \f$ k_{B}T \f$.

In order to recover the unbiased distribution we can post process the simulation using plumed driver to recalculate the bias felt by each frame and store this information to analyze any property. Furthermore plumed can also automatically use the bias to reweight the accumulated histogram.
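
To get a feeling for the size of these weights, the Python sketch below evaluates the linear bias used above, \f$ V(d) = \mathrm{SLOPE} \cdot (d - \mathrm{AT}) \f$ with SLOPE=2.5 and AT=0, together with the corresponding unnormalized weight at 298 K. It assumes PLUMED default units (kJ/mol and nm); the distances in the loop are arbitrary illustrative values.

```python
import math

KB = 0.0083144626  # Boltzmann constant in kJ/(mol K)
TEMP = 298.0       # K, as in REWEIGHT_BIAS TEMP=298
KBT = KB * TEMP

def linear_bias(d, slope=2.5, at=0.0):
    """Bias of RESTRAINT with KAPPA=0: V = SLOPE * (d - AT), in kJ/mol."""
    return slope * (d - at)

def weight(d):
    """Unnormalized statistical weight of a frame at distance d (nm)."""
    return math.exp(linear_bias(d) / KBT)

for d in (0.3, 1.0, 2.0):
    print(f"d = {d:.1f} nm  V = {linear_bias(d):.2f} kJ/mol  w = {weight(d):.3f}")
```

With this slope the bias grows by about one \f$ k_{B}T \f$ per nanometer, so frames at large distance, which the bias disfavors, are counted more when reweighting.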

\plumedfile
# vim:ft=plumed
d: DISTANCE __FILL__ 

lr: RESTRAINT __FILL__ 
as: REWEIGHT_BIAS TEMP=298

HISTOGRAM ...
  LOGWEIGHTS=as
  ARG=d 
  STRIDE=10 
  GRID_MIN=0 GRID_MAX=4.0 GRID_BIN=200 
  KERNEL=DISCRETE
  CLEAR=10000
... HISTOGRAM

DUMPGRID __FILL__
PRINT ARG=*.* FILE=COLVAR STRIDE=1
\endplumedfile

Be careful again about the difference in the way statistics are accumulated on-the-fly and in post-processing. This is not critical for the result but is important in order to have comparable histograms, that is, histograms with comparable noise. Remember to give a different name to the new histogram, otherwise the one obtained before will be overwritten.

\verbatim
> plumed driver --mf_xtc traj_comp.xtc --plumed plumed.dat  
\endverbatim

\note To run block analysis of both sets of histograms you need to edit the python script because the file name is hard coded.

\verbatim
> python3 do_block_histo.py > histo-biased.dat 
> python3 do_block_histo.py > histo-reweighted.dat 
\endverbatim

Now the resulting histogram should be comparable to the reference one.

\section trieste-3-ex-3 Exercise 3: Apply a quadratic restraint on the same collective variable 

Do you expect a different behavior? This time we can write the plumed input file so as to compare the biased and unbiased histograms directly.

\plumedfile
# vim:ft=plumed
#calculate the distance
d: DISTANCE ATOMS=1,4
#apply the quadratic restraint centered at a distance of 0.5 nm
lr: RESTRAINT ARG=d KAPPA=10 AT=0.5 
#accumulate the biased histogram
hh: HISTOGRAM ARG=d STRIDE=500 GRID_MIN=0 GRID_MAX=4.0 GRID_BIN=200 KERNEL=DISCRETE CLEAR=500000
#dumpit
DUMPGRID GRID=hh FILE=my_histogram.dat STRIDE=500000
#calculate the weights from the constant bias
as: REWEIGHT_BIAS TEMP=298
#accumulate the unbiased histogram
hhu: HISTOGRAM ARG=d STRIDE=500 GRID_MIN=0 GRID_MAX=4.0 GRID_BIN=200 KERNEL=DISCRETE CLEAR=500000 LOGWEIGHTS=as
#dumpit
DUMPGRID GRID=hhu FILE=myhistu.dat STRIDE=500000
#print distance and bias
PRINT ARG=d,lr.bias FILE=distance.dat STRIDE=50
\endplumedfile

The comparison of the two histograms with the former will show the effect of the weak quadratic bias on the simulation.
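
To see why this bias can be called weak, the following Python sketch evaluates the harmonic term \f$ V(d) = \frac{1}{2}\mathrm{KAPPA}\,(d-\mathrm{AT})^2 \f$ with the values KAPPA=10 and AT=0.5 from the input above, and expresses it in units of \f$ k_{B}T \f$ at 298 K. PLUMED default units (kJ/mol and nm) are assumed, and the distances are arbitrary illustrative values.

```python
KBT = 2.4777  # kJ/mol, roughly k_B T at 298 K

def harmonic_bias(d, kappa=10.0, at=0.5):
    """Bias of RESTRAINT with SLOPE=0: V = 0.5 * KAPPA * (d - AT)^2, in kJ/mol."""
    return 0.5 * kappa * (d - at) ** 2

for d in (0.3, 0.5, 1.0, 1.5):
    v = harmonic_bias(d)
    print(f"d = {d:.1f} nm  V = {v:.2f} kJ/mol = {v / KBT:.2f} kBT")
```

At 1.0 nm the bias is 1.25 kJ/mol, about half of \f$ k_{B}T \f$, so the restrained simulation still visits a wide range of distances.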

\note To run block analysis of both sets of histograms you need to edit the python script because the file name is hard coded.

\verbatim
> python3 do_block_histo.py > histo-biased.dat 
> python3 do_block_histo.py > histo-reweighted.dat 
\endverbatim

\section trieste-3-ex-4 Exercise 4: Apply an upper wall on the distance.
In the above cases we have always applied weak biases. Sometimes biases are useful to prevent the system from reaching some region of the conformational space. In this case, instead of using \ref RESTRAINT , we can make use of lower or upper restraints, e.g. \ref LOWER_WALLS and \ref UPPER_WALLS.

What happens to the histogram when we use walls? 
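
As a rough guide to what a wall does, the Python sketch below evaluates a one-sided potential that is zero below AT and grows as \f$ \mathrm{KAPPA}\,((d-\mathrm{AT}+\mathrm{OFFSET})/\mathrm{EPS})^{\mathrm{EXP}} \f$ above it. This functional form and the defaults EXP=2, EPS=1 and OFFSET=0 are stated here as assumptions; check them against the \ref UPPER_WALLS documentation. KAPPA=1000 and AT=2.5 are the values used in this exercise.

```python
def upper_wall(d, kappa=1000.0, at=2.5, exp=2, eps=1.0, offset=0.0):
    """Upper wall bias: zero below AT, KAPPA*((d-AT+OFFSET)/EPS)^EXP above."""
    x = (d - at + offset) / eps
    return kappa * x ** exp if x > 0.0 else 0.0

for d in (1.0, 2.5, 2.6, 2.8):
    print(f"d = {d:.1f} nm  V = {upper_wall(d):.1f} kJ/mol")
```

Under these assumptions the bias is exactly zero wherever the distance is below 2.5 nm, and already about 10 kJ/mol only 0.1 nm beyond the wall.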

\plumedfile
# vim:ft=plumed
d: DISTANCE ATOMS=1,4
uw: UPPER_WALLS ARG=d KAPPA=1000 AT=2.5
# accumulate the biased histogram
__FILL__
#dumpit
__FILL__
# calculate the weights from the constant bias
__FILL__
#accumulate the unbiased histogram
__FILL__
#dumpit
__FILL__
#print distance and bias
__FILL__
\endplumedfile

Run it.

\verbatim
> gmx mdrun -s dimer.tpr -plumed plumed.dat 
\endverbatim

If we have not sampled a region thoroughly enough it is not possible to estimate the histogram in that region even using reweighting (reweighting is not magic!).

\section trieste-3-ex-5 Exercise 5: Evaluate the free energy and use it as an external restraint

The main issue in sampling rare events is that importance sampling algorithms spend more time in low energy regions: if two low energy regions are separated by a high energy one, it is unlikely for the sampling algorithm to cross the high energy region and reach the other low energy one. From this point of view an algorithm based on random sampling will work better in crossing the barrier. A particularly efficient sampling can be obtained if one knows the underlying free energy and uses it to bias the sampling, making the sampling probability uniform in the regions of interest.
In this exercise we will make use of the free-energy estimate along the distance collective variable to bias the sampling of the same collective variable in the dimer simulation. To do so we will make use of a table potential applied using the \ref Bias \ref EXTERNAL. We first need to get a smooth estimate of the free energy from our first reference simulation. We will do this by accumulating a histogram with kernel functions, that is, continuous functions centered at the value of each accumulated point and added to the discrete representation of the histogram, see <a href="https://en.wikipedia.org/wiki/Kernel_density_estimation"> Kernel density estimation </a>.
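
The kernel accumulation can be illustrated with a short, standalone Python sketch. It is not how PLUMED implements \ref HISTOGRAM internally (normalization and kernel details may differ); it only shows the idea of summing a normalized Gaussian of fixed width for every sample on a regular grid. The function name `gaussian_kde_histogram` and the sample values are hypothetical.

```python
import math

def gaussian_kde_histogram(samples, grid_min, grid_max, n_bins, bandwidth):
    """Accumulate normalized Gaussian kernels centered on each sample on a grid."""
    spacing = (grid_max - grid_min) / n_bins
    grid = [grid_min + i * spacing for i in range(n_bins + 1)]
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi) * len(samples))
    density = [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]
    return grid, density

# Hypothetical distance samples in nm.
samples = [0.28, 0.30, 0.31, 0.29, 0.95]
grid, density = gaussian_kde_histogram(samples, 0.0, 4.0, 400, 0.05)

# The density integrates to approximately one on the grid.
spacing = grid[1] - grid[0]
print(sum(density) * spacing)
```

Compared with a discrete histogram, every sample contributes to several neighboring grid points, which gives a smooth estimate whose logarithm and derivative are well behaved in the sampled region.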

\plumedfile
# vim:ft=plumed
#calculate the distance
d: DISTANCE ATOMS=1,4
#accumulate the histogram using a gaussian kernel with 0.05 nm width
hh2: HISTOGRAM ARG=d STRIDE=10 GRID_MIN=0 GRID_MAX=4.0 GRID_BIN=400 BANDWIDTH=0.05
#convert to a free energy
ff: CONVERT_TO_FES GRID=__FILL__ TEMP=__FILL__
#dump the free energy
DUMPGRID GRID=__FILL__ FILE=__FILL__
\endplumedfile

By running plumed driver on the reference trajectory we obtain a free energy estimate.

\verbatim
> plumed driver --mf_xtc traj_comp.xtc --plumed plumed.dat  
\endverbatim

The resulting file for the free energy should be edited in order to:
- Invert the sign of the free-energy and of its derivative
- Remove some unused flags and regions with infinite potential at the boundaries 

The file looks like:

\verbatim
#! FIELDS d ff dff_d
#! SET min_d 0
#! SET max_d 4.0
#! SET nbins_d  400
#! SET periodic_d false
0.060000 -34.9754 185.606
0.070000 -26.0117 184.362
0.080000 -20.8195 181.39
0.090000 -17.5773 176.718
\endverbatim

where the first column is the value of the distance at each grid point, the second the free energy and the third the derivative of the free energy. You can edit
the file as you want, for example using the following bash lines: 

\verbatim
grep \# ff.dat | grep -v normalisation > external.dat
grep -v \# ff.dat | awk '{print $1, -$2, -$3}' | grep -v inf >> external.dat 
\endverbatim

Furthermore edit the first line of external.dat from

\verbatim
#! FIELDS d ff dff_d
\endverbatim

to 

\verbatim
#! FIELDS d ff.bias der_d
\endverbatim
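
The same edits, including the change of the FIELDS line, can be sketched in Python. The function `make_external` is hypothetical and works on a list of lines; the sample input, including the `normalisation` header line and the row with infinite values, is made up to exercise every branch.

```python
def make_external(lines):
    """Turn CONVERT_TO_FES grid lines into an EXTERNAL table with inverted sign."""
    out = []
    for line in lines:
        if line.startswith("#"):
            if "normalisation" in line:
                continue  # drop the unused normalisation flag
            if line.startswith("#! FIELDS"):
                line = "#! FIELDS d ff.bias der_d"
            out.append(line)
            continue
        if "inf" in line:
            continue  # skip grid points with infinite free energy
        d, fes, dfes = line.split()
        # Plain float negation; output formatting may differ slightly from awk.
        out.append(f"{d} {-float(fes)} {-float(dfes)}")
    return out

sample = [
    "#! FIELDS d ff dff_d",
    "#! SET min_d 0",
    "#! SET normalisation 1.0",
    "0.050000 inf inf",
    "0.060000 34.9754 -185.606",
]
for row in make_external(sample):
    print(row)
```

To use it on a real file, read `ff.dat` into a list of stripped lines and write the returned list to `external.dat`, one entry per line.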

Now we have an external potential that is the opposite of the free energy and we can use it in a new folder to bias a simulation:

\plumedfile
# vim:ft=plumed
d: DISTANCE ATOMS=1,4
EXTERNAL ARG=d FILE=__FILL__ LABEL=ff
# accumulate the biased histogram
__FILL__
#dumpit
__FILL__
# calculate the weights from the constant bias
__FILL__
#accumulate the unbiased histogram
__FILL__
#dumpit
__FILL__
#print distance and bias
__FILL__
\endplumedfile

Run it.

\verbatim
> gmx mdrun -s dimer.tpr -plumed plumed.dat 
\endverbatim

What do the biased and unbiased histograms look like? In the following we will apply this concept to sample the conformational space of a more complex system.

\section trieste-3-ex-6 Exercise 6: Preliminary run with Alanine dipeptide

Alanine dipeptide is characterized by multiple minima separated by relatively high free energy barriers. Here we will explore the conformational space of
alanine dipeptide using a standard MD simulation, then instead of using the free energy as an external potential we will try to fit the potential using
gnuplot and add a bias using an analytical function of a collective variable with \ref MATHEVAL and \ref BIASVALUE .


As a first test let's run an MD simulation and generate on-the-fly the free energy as a function of the phi and psi collective variables separately.

\plumedfile
#SETTINGS MOLFILE=user-doc/tutorials/old_tutorials/trieste-3/aladip/aladip.pdb
# vim:ft=plumed
MOLINFO STRUCTURE=aladip.pdb
phi: TORSION ATOMS=@phi-2
psi: TORSION ATOMS=@psi-2
hhphi: HISTOGRAM ARG=phi STRIDE=10 GRID_MIN=-pi GRID_MAX=pi GRID_BIN=600 BANDWIDTH=0.05
hhpsi: HISTOGRAM ARG=psi STRIDE=10 GRID_MIN=-pi GRID_MAX=pi GRID_BIN=600 BANDWIDTH=0.05
ffphi: CONVERT_TO_FES GRID=hhphi TEMP=298
ffpsi: CONVERT_TO_FES GRID=hhpsi TEMP=298
DUMPGRID GRID=ffphi FILE=ffphi.dat
DUMPGRID GRID=ffpsi FILE=ffpsi.dat
PRINT ARG=phi,psi FILE=colvar.dat STRIDE=10
\endplumedfile

From the colvar file it is clear that we can quickly explore two minima but that the region for positive phi is not accessible. Instead of using the opposite
of the free energy as a table potential, here we introduce the function \ref MATHEVAL , which allows defining complex analytical functions of collective variables,
and the bias \ref BIASVALUE , which allows using any continuous function of the positions as a bias.

So first we need to fit the opposite of the free energy as a function of phi, in the region explored, with a periodic function. Because of the Gaussian-like shape
of the minima, we can fit it using <a href="https://en.wikipedia.org/wiki/Von_Mises_distribution"> the von Mises distribution</a>. In gnuplot:

\verbatim
>gnuplot
gnuplot>plot 'ffphi.dat' u 1:(-$2+31) w l
gnuplot>f(x)=exp(k1*cos(x-a1))+exp(k2*cos(x-a2))
gnuplot>fit [-3.:-0.6] f(x) 'ffphi.dat' u 1:(-$2+31) via k1,a1,k2,a2
gnuplot>rep f(x)
\endverbatim
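
The functional form used in the fit can also be evaluated in Python, for instance to check that it is periodic in the torsion angle. The parameter values below are placeholders chosen for illustration, not the result of any fit; use the values reported by gnuplot for your own data.

```python
import math

def double_von_mises(x, k1, a1, k2, a2):
    """Sum of two von Mises-like terms, same functional form as the gnuplot fit."""
    return math.exp(k1 * math.cos(x - a1)) + math.exp(k2 * math.cos(x - a2))

# Placeholder parameters for illustration only, not fitted values.
params = dict(k1=2.0, a1=-2.5, k2=2.0, a2=-1.4)

x = -2.0
print(double_von_mises(x, **params))
print(double_von_mises(x + 2.0 * math.pi, **params))  # same value by periodicity
```

Because the angle enters only through a cosine, the bias built from this function has no discontinuity when phi wraps from pi to -pi.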

The function and the resulting parameters can be used to run a new biased simulation:

\section trieste-3-ex-7 Exercise 7: First biased run with Alanine dipeptide

\plumedfile
#SETTINGS MOLFILE=user-doc/tutorials/trieste-3/aladip/aladip.pdb
# vim:ft=plumed
MOLINFO STRUCTURE=aladip.pdb

phi: TORSION ATOMS=@phi-2
__FILL__

MATHEVAL ...
ARG=phi 
LABEL=doubleg
FUNC=exp(__FILL__)+__FILL__
PERIODIC=NO
... MATHEVAL

b: BIASVALUE ARG=__FILL__
PRINT __FILL__ 
\endplumedfile

It is now possible to run a second simulation and observe the new behavior. The system quickly explores a new minimum. While a quantitative estimate of the free energy difference between the old and new regions
is beyond the scope of the current exercise, what we can do is add a new von Mises function centered on the new minimum with a comparable height. In this way we can hope to facilitate back-and-forth
transitions along the phi collective variable.

\verbatim
>gnuplot
gnuplot> ...
\endverbatim

We can now run a third simulation where both regions are biased.

\section trieste-3-ex-8 Exercise 8: Second biased run with Alanine dipeptide

\plumedfile
#SETTINGS MOLFILE=user-doc/tutorials/trieste-3/aladip/aladip.pdb
# vim:ft=plumed
MOLINFO STRUCTURE=aladip.pdb

phi: TORSION ATOMS=@phi-2
psi: TORSION ATOMS=@psi-2


MATHEVAL ...
ARG=phi 
LABEL=tripleg
FUNC=exp(k1*cos(x-a1))+exp(k2*cos(x-a2))+exp(k3*cos(x+a3))
PERIODIC=NO
... MATHEVAL

b: BIASVALUE ARG=tripleg

__FILL__

ENDPLUMED
\endplumedfile

With this third simulation it should be possible to visit both regions as a function of the phi torsion. Now it is possible to reweight the sampling
and obtain a better free energy estimate along phi.

\plumedfile
#SETTINGS MOLFILE=user-doc/tutorials/trieste-3/aladip/aladip.pdb
# vim:ft=plumed
MOLINFO STRUCTURE=aladip.pdb

phi: TORSION ATOMS=@phi-2
psi: TORSION ATOMS=@psi-2

MATHEVAL ...
ARG=phi 
LABEL=tripleg
FUNC=__FILL__
PERIODIC=NO
... MATHEVAL

b: BIASVALUE ARG=tripleg
as: REWEIGHT_BIAS TEMP=298
hhphi: HISTOGRAM ARG=phi STRIDE=10 GRID_MIN=-pi GRID_MAX=pi GRID_BIN=600 BANDWIDTH=0.05 LOGWEIGHTS=as
hhpsi: HISTOGRAM ARG=psi STRIDE=10 GRID_MIN=-pi GRID_MAX=pi GRID_BIN=600 BANDWIDTH=0.05 LOGWEIGHTS=as
ffphi: CONVERT_TO_FES GRID=hhphi TEMP=298
ffpsi: CONVERT_TO_FES GRID=hhpsi TEMP=298

DUMPGRID GRID=ffphi FILE=ffphi.dat
DUMPGRID GRID=ffpsi FILE=ffpsi.dat

PRINT ARG=phi,psi FILE=colvar.dat STRIDE=10
\endplumedfile

If you have time you can extend this to two dimensions, using the phi and psi collective variables at the same time.


*/

link: @subpage trieste-3

description: This tutorial explains how to use PLUMED to run simple restrained simulations and account for the bias in the analysis

additional-files: trieste-3
