/**
\page masterclass-21-1 PLUMED Masterclass 21.1: PLUMED syntax and analysis

\authors Max Bonomi
\date January 18, 2021

\section masterclass-21-1-aims Aims

The aim of this Masterclass is to introduce users to the PLUMED syntax and to 
illustrate how PLUMED can be used to analyze pre-existing molecular dynamics trajectories.

\section masterclass-21-1-lo Objectives

Once this Masterclass is completed, users will be able to:

- Write a simple PLUMED input file and use it with the \ref driver utility to analyze a trajectory.
- Use advanced selection tools with \ref MOLINFO.
- Define and use virtual atoms, such as \ref CENTER.
- Deal with discontinuities in the trajectory due to periodic boundary conditions.
- Use \ref RMSD to measure protein conformational changes.
- Align the system to a template with \ref FIT_TO_TEMPLATE. 
 
\section masterclass-21-1-install Setting up the software 

\subsection masterclass-21-1-conda Installation

We will use [conda](https://docs.conda.io/en/latest/) to install all the software needed in this Masterclass:
- [PLUMED](https://www.plumed.org) version 2.7.0
- [NumPy](https://numpy.org)
- [pandas](https://pandas.pydata.org)
- [matplotlib](https://matplotlib.org)
- [jupyter](https://jupyter.org/index.html)
- [MDTraj](https://mdtraj.org/1.9.4/index.html)
- [MDAnalysis](https://www.mdanalysis.org)
- [Git](https://git-scm.com)

First, make sure conda is installed by typing:

\verbatim
conda
\endverbatim

If the command is not found, please refer to these [instructions](https://docs.conda.io/en/latest/miniconda.html) to install conda on your machine.
Alternatively, if you use the [Homebrew](https://brew.sh) package manager, you can install conda with:

\verbatim
brew install --cask anaconda
# add this line to your .bashrc
export PATH="/usr/local/anaconda3/bin:$PATH"
\endverbatim

Now we can create a conda environment for the PLUMED Masterclass:

\verbatim
conda create --name plumed-masterclass 
\endverbatim

and activate it with:

\verbatim
conda activate plumed-masterclass
\endverbatim

Finally, we can proceed with the installation of the required software:

\verbatim
conda install -c conda-forge plumed py-plumed numpy pandas matplotlib notebook mdtraj mdanalysis git
\endverbatim

\note Do not forget to activate the `plumed-masterclass` environment every time you open a new terminal/shell.

\subsection masterclass-21-1-plumed PLUMED overview 
 
PLUMED is a library that can be incorporated into many molecular dynamics (MD) codes by adding a relatively simple and well documented interface.
Once it is incorporated you can use PLUMED to perform on-the-fly a variety of different analyses and to bias 
the sampling in MD simulations. Additionally, PLUMED can be used as a standalone code for analyzing trajectories.
If you want to use the code in this way, you can run the PLUMED executable by issuing the command:

\verbatim
plumed <instructions>
\endverbatim

Let's start by getting a feel for the range of calculations that PLUMED can do. Issue the following command now:

\verbatim
plumed --help 
\endverbatim

The output of this command is the list of \ref tools included in PLUMED. Among these, there are commands that allow you
to patch an MD code, postprocess metadynamics simulations, and build the manual. In this
class we will use PLUMED to analyze trajectories. In order to do so, we will
learn how to use the \ref driver command line tool. 
Let's look at the options of PLUMED \ref driver by issuing the following command:  

\verbatim
plumed driver --help
\endverbatim

As you can see we can do a number of things with \ref driver. For all of these options, however, we are going to need to write 
a PLUMED input file. 

\section masterclass-21-1-resources Resources

The data needed to complete the exercises of this Masterclass can be found on [GitHub](https://github.com/plumed/masterclass-21-1).
You can clone this repository locally on your machine using the following command:

\verbatim
git clone https://github.com/plumed/masterclass-21-1.git
\endverbatim

The MD trajectory that we will analyze can be found in the folder called `data`:
- `5-HT1B.pdb`: reference conformation of the 5-HT1B receptor with the serotonin ligand.
- `5-HT1B.xtc`: trajectory of the 5-HT1B receptor with the serotonin ligand.

A Jupyter notebook to be used as template for the analysis of the PLUMED output can be found in the folder called `notebooks`. In this
directory, you will also find the complete solution to all exercises in the notebook `solution.ipynb`.

Please note that:
- This is a simulation of a membrane receptor, but water, lipids, and ions have been stripped out of the trajectory.
- This is the raw trajectory generated by [GROMACS](https://www.gromacs.org). Therefore it is discontinuous due to periodic boundaries conditions (PBCs).
- The students are invited to solve the exercises by themselves after completing the PLUMED input file templates provided below. In case of problems, students
 can rely on the solution notebook provided in the [GitHub](https://github.com/plumed/masterclass-21-1) repository.

To keep things clean, it is recommended to run each exercise in a separate sub-directory (i.e. Exercise-1, Exercise-2, ...), which you can create inside the root directory `masterclass-21-1`.

\note All the exercises have been tested with PLUMED version 2.7.0.

\section masterclass-21-1-structure A sample PLUMED input file

The main goal of PLUMED is to compute collective variables (or CVs), which are complex descriptors 
of the system that can be used to describe the conformational change of a protein or a chemical reaction.
This can be done either _on-the-fly_ during a molecular dynamics simulations or _a posteriori_ on a 
pre-calculated trajectory using PLUMED as a post-processing tool. 
In both cases, you should create an input file with a specific PLUMED syntax. 
Have a look at the sample input file below:

\plumedfile
# Compute distance between atoms 1 and 10.
# Atoms are ordered as in the trajectory files and their numbering starts from 1.
# The distance is called "d" for future reference.
d: DISTANCE ATOMS=1,10

# Compute the torsional angle between atoms 1, 10, 20, and 30.
# The angle is called "phi1" for future reference.
phi1: TORSION ATOMS=1,10,20,30

# The same CV defined above can be split into multiple lines
# The angle is called "phi2" for future reference.
TORSION ...
LABEL=phi2
ATOMS=1,10,20,30
...

# Print "d" on a file named "COLVAR1" every 10 steps.
PRINT ARG=d FILE=COLVAR1 STRIDE=10

# Print "phi1" and "phi2" on another file named "COLVAR2" every 100 steps.
PRINT ARG=phi1,phi2 FILE=COLVAR2 STRIDE=100
\endplumedfile

In the input file above, each line defines a so-called action. In this simple example, 
actions are used to compute a distance, a dihedral angle, 
or print some values on a file. Each action supports a number of keywords,
whose value is specified. Action names are highlighted in green and, by clicking on them, you can go to the
corresponding page in the manual that contains a detailed description of each keyword.
Actions that support the keyword `STRIDE` are those that determine how frequently things are done.
Notice that the default value for `STRIDE` is always 1. In the example above, omitting `STRIDE` keywords
the corresponding `COLVAR` files would have been written for every frame of the analyzed trajectory.
All the other actions in the example above, i.e. \ref DISTANCE and \ref TORSION, do not 
support the `STRIDE` keyword and are only calculated when requested. That is, `d` will be computed
every 10 frames, and `phi1` and `phi2` every 100 frames.

Variables should be given a name (in the example above, `d`, `phi1`, and `phi2`), which is
then used to refer to these variables in subsequent actions, such as the \ref PRINT command.
A lists of atoms should be provided as comma separated numbers, with no space.

You can find more information on the PLUMED syntax
at \ref Syntax page of the manual. The complete documentation for all the supported
collective variables can be found at the \ref colvarintro page.

\subsection masterclass-21-1-units The PLUMED internal units

By default the PLUMED inputs and outputs quantities in the following units:

- Length: nanometers
- Energy: kJ/mol
- Time: picoseconds
- Mass: amu
- Charge: e

If you want to change these units, you can do this using the \ref UNITS keyword. 

\section masterclass-21-1-ex Exercises

\subsection masterclass-21-1-ex-1 Exercise 1: Computing and printing simple collective variables

In this exercise, we will learn how to compute and print collective variables on a pre-calculated MD trajectory.
To analyze the trajectory provided here, we will:
- create a PLUMED input file with a text editor (typically called `plumed.dat`);
- run the PLUMED \ref driver utility;
- visualize the output with the aid of a Jupyter notebook.

Notice that you can also visualize trajectories with [VMD](https://www.ks.uiuc.edu/Research/vmd/) (always a good idea!).
For example, the trajectory `5-HT1B.xtc` can be visualized with the command:

\verbatim
vmd 5-HT1B.pdb 5-HT1B.xtc 
\endverbatim

When you try this, you will notice that this trajectory is discontinuous due to PBCs. We need to keep this
in mind in our analysis. 

Let's now prepare a PLUMED input file to calculate: 
- the gyration radius of the CA protein atoms (\ref GYRATION) of the first 40 N-terminal residues; 
- the distance (\ref DISTANCE) between CA atoms of residues 1 and 40.

The first 40 residues of the 5-HT1B receptor correspond to an extracellular flexible loop of which we want to characterize the
dynamics during our MD simulation. Below you can find a sample `plumed.dat` file that you can use as a template.
Whenever you see an highlighted \highlight{FILL} string, be careful because this is a string that you must replace.
To retrieve the atom indexes that you need to include in the input file, you can have a look at `5-HT1B.pdb`.
The atoms indexes are contained in the second column. Keep in mind that numbering scheme in PLUMED starts from 1.

\plumedfile
# Compute gyration radius on CA atoms of the first 40 N-terminal residues:
r: GYRATION ATOMS=__FILL__

# Compute distance between CA atoms of residues 1 and 40 
d: DISTANCE ATOMS=__FILL__

# Print the two collective variables on COLVAR file every step
PRINT ARG=__FILL__ FILE=COLVAR STRIDE=__FILL__
\endplumedfile

Once your `plumed.dat` file is complete, you can run the PLUMED \ref driver as follows:

\verbatim
plumed driver --plumed plumed.dat --mf_xtc 5-HT1B.xtc
\endverbatim

Scroll in your terminal to read the PLUMED log. As you can see,
PLUMED gives a lot of feedback about the input that it is reading and the actions that it will
execute. Please take your time to inspect the log file and check if PLUMED is actually doing what you intend to do.

The command above will create a `COLVAR` file. The first lines should be identical to these ones:

\verbatim
#! FIELDS time r d
 0.000000 1.271069 2.782998
 1.000000 1.263125 2.388697
 2.000000 1.348965 2.606062
 3.000000 1.291011 2.204363
 4.000000 1.280714 2.411836
 5.000000 1.257692 2.334839
\endverbatim

Notice that the first line informs you about the content of each column.

__In case you obtain different numbers, check your input, you might have made some mistakes!__

This `COLVAR` file can be analyzed using the Jupyter notebook `plumed-pandas.ipynb` provided in the folder `notebooks`.
You can use the following command to open the notebook:

\verbatim
jupyter notebook plumed-pandas.ipynb
\endverbatim

This notebook allows you to import the `COLVAR` file produced by PLUMED and to generate the desired figures using the 
`matplotlib` library:

\code{.py}
     
 # import python modules
 import plumed
 import matplotlib.pyplot as plt

 # import COLVAR file as pandas dataset
 # set the right path to the COLVAR file
 data=plumed.read_as_pandas("../Exercise-1/COLVAR")
 # print pandas dataset
 data

 # plot time serie of gyration radius (r) and distance (d)
 plt.plot(data.time,data.r, label="gyration radius")
 plt.plot(data.time,data.d, label="distance")
 # x-y axis labels
 plt.xlabel("MD frame")
 plt.ylabel("r/d [nm]")
 plt.legend()

 # plot gyration radius vs distance
 plt.plot(data.r,data.d, 'o')
 # x-y axis labels
 plt.xlabel("gyration radius [nm]")
 plt.ylabel("distance [nm]")
     
\endcode

What can you deduce about the dynamics of this region of the 5-HT1B receptors? Are the two CVs both providing useful information or are they quite correlated? To answer to the latter question, you can inspect the plot of one CV against the other.

\subsection masterclass-21-1-ex-2 Exercise 2: Mastering advanced selection tools

PLUMED provides some shortcuts to select atoms with specific properties.
To use this feature, you should specify the \ref MOLINFO action along with a reference
PDB file. This command is used to provide information on the molecules that are present in your system.

Let's try to use this functionality to calculate the backbone dihedral angle \f$ \phi \f$ (phi) of residue 2
of the 5-HT1B receptor. This CV is defined by the action \ref TORSION and a set of 4 atoms. For residue `i`,
the dihedral \f$ \phi \f$ is defined by these atoms: C(i-1),N(i),CA(i),C(i) (see Fig. \ref masterclass-21-1-dih-fig).

\anchor masterclass-21-1-dih-fig 
\image html masterclass-21-1-dih-fig.png "Definition of backbone dihedral angles, with dihedral phi highlighted in red."

After consulting the manual and inspecting `5-HT1B.pdb`, let's define the dihedral angle \f$ \phi \f$ of residue 2 
in two different ways:
1. specifying an explicit list of 4 atoms (`t1`).
2. using the \ref MOLINFO shortcut to select quadruplets for dihedral angles (`t2`);

\plumedfile
# Activate MOLINFO functionalities
MOLINFO STRUCTURE=__FILL__

# Define the dihedral phi of residue 2 as an explicit list of 4 atoms
t1: TORSION ATOMS=__FILL__
# Define the same dihedral using MOLINFO shortcuts
t2: TORSION ATOMS=__FILL__

# Print the two collective variables on COLVAR file every step
PRINT ARG=__FILL__ FILE=COLVAR STRIDE=__FILL__
\endplumedfile

After completing the PLUMED input file above, let's use it to analyze the trajectory `5-HT1B.xtc` using the \ref driver tool:

\verbatim
plumed driver --plumed plumed.dat --mf_xtc 5-HT1B.xtc
\endverbatim

You can modify the Jupyter notebook used in \ref masterclass-21-1-ex-1 to visualize the trajectory of the two CVs calculated with the PLUMED input file above
and written in the `COLVAR` file.
If you executed this exercise correctly, these two trajectories should be identical.

As a second example of \ref MOLINFO capabilities, we will use the advanced atom selection tools provided by
the [MDAnalysis](https://www.mdanalysis.org) and [MDTraj](https://mdtraj.org/1.9.4/index.html) libraries.
Let's redo \ref masterclass-21-1-ex-1, this time using \ref MOLINFO shortcuts to select CA atoms. 
You need to complete the following template PLUMED input file using the appropriate selection syntax for the corresponding
library used. Please consult the [MDAnalysis](https://www.mdanalysis.org) and [MDTraj](https://mdtraj.org/1.9.4/index.html) 
documentations if you are not familiar with these libraries and their syntax.

\plumedfile
# Activate MOLINFO functionalities
MOLINFO STRUCTURE=__FILL__ 

# To use MDAnalysis selection tools:
r1: GYRATION ATOMS={@mda:{resid __FILL__ and name __FILL__}}
d1: DISTANCE ATOMS={@mda:{resid __FILL__ and name __FILL__}}

# To use MDTraj selection tools:
r2: GYRATION ATOMS={@mdt:{resid __FILL__ and name __FILL__}}
d2: DISTANCE ATOMS={@mdt:{resid __FILL__ and name __FILL__}}

# Print all the collective variables on COLVAR file every step
PRINT ARG=__FILL__ FILE=COLVAR STRIDE=__FILL__
\endplumedfile

Now, you can compare the `COLVAR` file obtained with the one of \ref masterclass-21-1-ex-1: they should be identical!

\subsection masterclass-21-1-ex-3 Exercise 3: Using virtual atoms

Sometimes, when calculating a CV, you may not want to use the positions of a number of atoms directly. 
Instead you may want to define a virtual atom whose position is generated based on the positions 
of a collection of other atoms. For example you might want to use the center of mass (\ref COM) or
the geometric center (\ref CENTER) of a group of atoms.

In this exercise, you will learn how to specify virtual atoms and later use them to define a CV.
The objective is to calculate the distances between the geometric center of the serotonin ligand (indicated
as residue `LIG` in `5-HT1B.pdb`) and the geometric centers of the two glycans located at position N24 and N32. 
Glycans are carbohydrate-based polymers that are sometimes linked to certain protein aminoacids. If you examine
`5-HT1B.pdb`, you will find the two glycans defined after the end of the protein, i.e. after residue SER-390.
These two glycans have different length:
- the glycan attached at position N24 ranges from residue BGLC-1 to AFUC-9
- the glycan attached at position N32 ranges from residue BGLC-1 to AFUC-10 

Let's complete the PLUMED input file below. You can use the advanced selection tools learned in \ref masterclass-21-1-ex-2 
to specify the atoms belonging to the ligand and to the two glycans:

\plumedfile
# Geometric center of the ligand
lig: CENTER ATOMS=__FILL__
# Geometric center of the first glycan 
g1: CENTER ATOMS=__FILL__
# Geometric center of the second glycan
g2: CENTER ATOMS=__FILL__

# Distance between ligand and first glycan
d1: DISTANCE ATOMS=__FILL__
# Distance between ligand and second glycan 
d2: DISTANCE ATOMS=__FILL__

# Print the two distances on COLVAR file every step
PRINT ARG=__FILL__ FILE=COLVAR STRIDE=__FILL__
\endplumedfile

Once you have prepared a PLUMED input file containing the above instructions, you can execute it on the trajectory `5-HT1B.xtc`
by making use of the following command:

\verbatim
plumed driver --mf_xtc 5-HT1B.xtc --plumed plumed.dat
\endverbatim

Let's now analyze the output of the calculation by plotting the time series of the two CVs.
Can you say if the ligand is overall staying closer to the first or second glycan?

\subsection masterclass-21-1-ex-4 Exercise 4: Fixing PBCs discontinuities 

As mentioned above, `5-HT1B.xtc` is the raw trajectory generated by the GROMACS MD code. Therefore, it typically presents
discontinuities due to PBCs. Many of the CVs used so far, such as \ref CENTER or \ref DISTANCE, take care of these discontinuities 
automatically. However, other CVs need a special command, called \ref WHOLEMOLECULES, to fix PBCs discontinuities before the calculation
of the CV. In this exercise, you will learn how to use this action.

We have seen that the first 40 N-terminal residues of the 5-HT1B receptor are quite flexibile. In this exercise, we want to estimate
the secondary structure content (alpha-helix and beta-sheet) of this fragment during the course of the MD simulations. In order to 
do so, we can use the following 3 CVs:
- \ref ALPHARMSD to measure the alpha-helical content of a protein structure. 
- \ref PARABETARMSD to measure the parallel beta-sheet content. 
- \ref ANTIBETARMSD to measure the antiparallel beta-sheet content. 

Let's first try to complete the following PLUMED input file:

\plumedfile
# Activate MOLINFO functionalities 
MOLINFO __FILL__
# make the first 40 N-terminal residues whole
WHOLEMOLECULES __FILL__
# alpha-helix content of residues 1-40
h: ALPHARMSD __FILL__ TYPE=OPTIMAL
# parallel beta-sheet content of residues 1-40
pb: PARABETARMSD __FILL__ TYPE=OPTIMAL
# antiparallel beta sheet-content of residues 1-40
ab: ANTIBETARMSD __FILL__ TYPE=OPTIMAL
# now we create a new CV that sums parallel and antiparallel beta-sheet contents
b: COMBINE __FILL__ PERIODIC=NO
# Print the alpha-helical content and the *total* beta-sheet content on COLVAR file every step
PRINT __FILL__
\endplumedfile

and use it to analyze the trajectory `5-HT1B.xtc`. Can you say if the first 40 N-terminal residues tend to populate more
alpha-helical or beta-sheet conformations? 

\subsection masterclass-21-1-ex-5 Exercise 5: Using RMSD to measure conformational changes

As previously mentioned, the first 40 N-terminal residues of the 5-HT1B receptor are quite flexible, while the rest of the 
protein remains more stable during the course of the simulation. In this exercise, you will learn how to use \ref RMSD to
measure deviations from a reference structure. To use this CV, you need to keep in mind that you must specify in the PLUMED input
a PDB file in which you mark the atoms that you want to use to:
- optimally align a conformation to the reference;
- calculate the displacement from the reference conformation after optimal alignment.

Keep in mind that these two sets of atoms might be different! In fact, the objective of this exercise is to calculate:
- the \ref RMSD of the backbone atoms of the first 40 N-terminal residues after aligning the system on the backbone atoms of 
  residues 41 to 390;
- the \ref RMSD of the backbone atoms of residues 41 to 390 after aligning the system on the same set of atoms.

To create the two PDB files needed to define the two \ref RMSD CVs, you can start from the provided PDB file
`5-HT1B.pdb`. Please consult the manual at the \ref RMSD page, in order to:
- create the PLUMED input file to calculate the two CVs defined above (use `TYPE=OPTIMAL`);
- learn how to mark atoms for alignment and displacement in the PDB files;
- check whether PBCs are automatically taken care of or you need to use the \ref WHOLEMOLECULES action.

After analyzing `5-HT1B.xtc` with the \ref driver tool, can you say which part of the receptor is more flexible and deviates more from the
starting conformation during the course of the simulation?

\subsection masterclass-21-1-ex-6 Exercise 6: Aligning conformations to a template

In this exercise, we will learn how to align a MD trajectory to a reference conformation, after fixing possible discontinuities due to PBCs.
The goal is to compute the vertical position of the serotonin ligand with respect to the lipid bilayer. In the simulations of membrane proteins,
typically the initial conformation is oriented so that the lipid bilayer is parallel to the xy plane (look for example at `5-HT1B.pdb`). 
Therefore, initially one could use for example
the coordinate `z` of the geometric center of the ligand to measure how far it is from the membrane bilayer. However, during the simulation:
- the system can translate from its original position;
- the system can be broken by PBCs;

therefore one could not use an absolute position to keep track of the location of the ligand. To solve this problem, there are several PLUMED actions that
can be used to make sure the system is not broken by PBCs, to re-align it to a reference conformation, and thus to use absolute positions safely.

To complete this exercise, the users will need to make heavy use of the PLUMED [manual](https://www.plumed.org/doc-v2.7/user-doc/html/colvarintro.html) to prepare the input file on their own. In the following, some suggestions
will be given:
- first, make sure the entire protein is not broken by PBCs using \ref WHOLEMOLECULES;
- then, make sure the ligand is not broken by PBCs and in the same cell as the protein, using the \ref WRAPAROUND action and the `GROUPBY` option;
- to align the _stable_ protein residues (as defined in \ref masterclass-21-1-ex-5, i.e. residues 41 to 390) to the template `5-HT1B.pdb`, you can use \ref FIT_TO_TEMPLATE;
- at this point, you can safely define the position of the geometric center of the ligand using the \ref POSITION CV with the option `NOPBC`;
- the requested CV is the `z` component of the \ref POSITION CV. 

You can check that your PLUMED input file is correct in two ways. First, you can print out the conformations of the system after the transformations done
by \ref WHOLEMOLECULES, \ref WRAPAROUND, and \ref FIT_TO_TEMPLATE by modifying your input file as follows:

\plumedfile
# Activate MOLINFO functionalities
MOLINFO STRUCTURE=__FILL__
# Here put your PLUMED input file
#
#
# Write coordinates of all the atoms to file after PLUMED transformations
DUMPATOMS FILE=5-HT1B_aligned.gro ATOMS=__FILL__
\endplumedfile

Now, you can visualize the `5-HT1B_aligned.gro` file using [VMD](https://www.ks.uiuc.edu/Research/vmd/).
Second, if you inspect the time series of your CV, this should be a continuous trajectory that spans the range from 9 nm to 18 nm.

\subsection masterclass-21-1-ex-7 Exercise 7: Estimating binding propensity 

In this last exercise, we want to determine the propensity of the serotonin ligand to bind the first 40 N-terminal flexible residues
of the 5-HT1B receptor and if there are hot-spots where binding is more favorite. In order to answer to these questions, the user will:
- compute the fraction of bound conformations over the total number of frames in the MD trajectory;
- for each individual residue and glycan, compute the fraction of bound conformation per residue/glycan;
- dump all bound conformations to a `gro` file, after fixing PBCs as in \ref masterclass-21-1-ex-6;
- for each individual residue and glycan, dump all bound conformations to a separate `gro` file, after fixing PBCs.

To solve this exercise, no template PLUMED input file nor any indication of the procedure to follow will be given. The users should only keep in mind that:
- we arbitrarily define as *bound* a conformation in which at least one pair of atoms of the ligand and of the protein/residue/glycan is closer than 0.4 nm;
- any pre-exisiting CV defined in the PLUMED [manual](https://www.plumed.org/doc-v2.7/user-doc/html/colvarintro.html) can be used;
- any CV defined directly by the user in the PLUMED input file via the \ref CUSTOM action can be used.


*/

link: @subpage masterclass-21-1 

description: This Masterclass explains the syntax of the PLUMED input file and how to use PLUMED to analyze CVs
