Manual SMOG - Beta
Manual SMOG - Beta
Version 2.3beta
User’s Manual
This manual is automatically regenerated daily
Last built: October 8, 2020
Authors:
Jeffrey Noel, Mariana Levi, Mohit Raghunathan
Heiko Lammert, Ryan Hayes, José Onuchic, Paul Whitford
info@smog-server.org
i
If you would only like to use the basic functionality of SMOG 2 (i.e. the standard sup-
ported models), then you may find that the README file associated with the distribution
provides all the information you need. This manual provides a more detailed description
of the basic usage guidelines, in addition to advanced usage information and detailed
descriptions of the underlying methodologies/models. For basic users, if the README is
not sufficient, then Chapters 1, 2, 3 and 4 will help you get started. For more advanced
users, who may wish to modify structure-based models (e.g. extending to new residue
types, ligands, electrostatics, etc), then consulting Chapters 5, 6, 7 and 8 will be nec-
essary. We additionally provide appendices that have technical details that may be of
interest to some users. While we try to provide all pertinent information here, don’t
hesitate to contact us for clarification.
SMOG 2, and all associated files, are distributed free of charge, made available under
the GNU General Public License.
Contents
1 Introduction 1
1.1 What are Structure-Based Models? . . . . . . . . . . . . . . . . . . . . . 1
1.2 What does SMOG 2 do? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 “Installation” 3
2.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.2 Configuration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.3 Verify SMOG is properly configured . . . . . . . . . . . . . . . . . . . . . 5
3 Using SMOG 2 7
3.1 Preparing the input PDB file . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.1.1 PDB file format . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.1.2 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2 Generating a Structure-Based Model . . . . . . . . . . . . . . . . . . . . . 8
3.2.1 Default All-Atom Model . . . . . . . . . . . . . . . . . . . . . . . . 8
3.2.2 Default Cα model . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.2.3 Default Gaussian contact potential models . . . . . . . . . . . . . . 10
3.3 Input options . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.3.1 User-provided contact map . . . . . . . . . . . . . . . . . . . . . . 10
5 SMOG Tools 19
5.1 smog adjustPDB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
5.2 smog extract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.3 smog ions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.4 smog optim . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.5 smog scale-energies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
ii
iii
6 Template-Based Approach 26
6.1 Introduction to templates . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.2 SMOG 2 Templates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.2.1 Biomolecular Information File (.bif) . . . . . . . . . . . . . . . . . 27
6.2.2 Setting Information File (.sif) . . . . . . . . . . . . . . . . . . . . . 30
6.2.3 Bond Information File (.b) & Nonbond Information File (.nb) . . . 33
E Acknowledgements 68
Bibliography 69
Chapter 1
Introduction
Structure-based models (i.e. SBMs, or SMOG models) define a particular known con-
formation as a potential energy minimum. With this being the only requirement, there
is an endless number of ways in which one may construct a structure-based model. For
example, one may build protein-specific and RNA-specific variants, the resolution of
the model can be varied, multiple minima may be included, and the degree to which
non-native interactions are stabilizing can be adjusted. The utility of these models is
equally diverse, where they may be applied for understanding dynamics, or for structural
modeling objectives, as discussed in recent review articles [1, 2]1 . With such flexibility,
this general class of models can be tailored to ask specific questions about biomolecular
processes. In the present document, we describe a set of computational tools that allows
one to use previously-developed structure-based models, as well as design and implement
new variations that are suited for your specific needs.
In the simplest form, a structure-based model defines a single configuration as the global
potential energy minimum, where all intra- and inter-molecular interactions are assigned
minima that correspond to that structure. This fully native-centric variant of the model
is colloquially referred to as a “vanilla” structure-based model. In terms of the energy
landscapes of biomolecules, these vanilla models represent an energetically unfrustrated
landscape [3, 4]. Since biomolecular landscapes possess some degree of energetic rough-
ness, it is often desirable to extend structure-based models to include both native and
non-native interactions. As such, in the SMOG 2 software package, we provide two
energetically unfrustrated models by default, upon which additional interactions may
be added by the user. Specifically, this distribution provides the coarse-grained Cα
1
[1] available at http://smog-server.org/noel/book chapter sbm.pdf
1
Chapter 1. Introduction 2
structure-based model for proteins, as developed by Clementi et al [5], and the all-atom
structure-based model, as developed by Whitford et al. [6]. While the Cα model is only
defined for proteins, the all-atom model supports proteins, RNA, DNA and some lig-
ands. Since there have been a number of extensions in the all-atom model over the last
several years, a complete description of the energetic parameters is given in Appendix A.
SMOG 2 is a software package designed to allow the user to start with a structure of
a biomolecule (i.e. a PDB file) and construct a structure-based model, which is then
simulated using Gromacs [7], NAMD [8], or openMM. We previously implemented an
online server (SMOG 1) that was capable of providing the vanilla flavor of structure-
based models, along with a few adjustable parameters. SMOG 2 is a nearly complete
rewrite of the original software package, and it provides four major advantages over its
predecessor:
• Extensibility – One may add new residue and molecule types without source-code
modifications.
It is important to note that, in order to adjust a SMOG model, one simply adjusts the
XML template files. In addition, the templates are not statically-linked to SMOG 2,
which allows any user to easily choose from a library of models at runtime.
Chapter 2
“Installation”
Since SMOG 2 is written in Perl and Java there is no need for compilation and installa-
tion. However, one must configure a few settings and ensure that appropriate modules
are available at runtime. This section describes the steps necessary to configure and
verify that SMOG 2 is functioning properly on your local machine.
2.1 Prerequisites
SMOG 2 runs on all Unix-like operating systems. The prerequisites for SMOG 2 are
as well as the following modules, which are available through the Perl module managing
utility CPAN (recommended), or through manual installation:
String::Util
XML::Simple
Exporter
XML::Validator::Schema
Finally, your machine must have Java Runtime Environment v1.7 or greater.
3
Chapter 2. Installation 4
2.2 Configuration
Before running SMOG 2, you must configure it on your local machine. This is accom-
plished through a short two-step process:
1) Set the required environment variables. To do so, modify the file configure.smog2,
which is included with the distribution. Specifically, you will need to modify the follow-
ing two lines:
smog2dir=""
perl4smog=""
smog2dir is the main SMOG directory, and perl4smog is the version of perl that you
would like to use. On most linux systems, the default location of Perl is "/usr/bin/perl",
whereas on OSX it is typically "/opt/local/bin/perl".
This will set the required environment variables for your current session.
To automatically configure SMOG at login, you may want to add the above command
to your shell profile file (e.g. ~/.bashrc):
source /full-smog2-path/configure.smog2
As a note, we have found that some Linux distributions require that you replace source
with bash. Among other things, this will add the smog bin to your PATH.
If java is not found, make sure to install the JRE or JDK 1.7 or greater and that it is
in your path (accessible as: > java).
Chapter 2. Installation 5
If SMOG is properly configured, then you will be able to run smog with the following
command:
> smog2
If configuration was successful, then you will be greeted with the following message:
In addition to verifying that SMOG 2 will start, it is highly recommended that you
also run the testing scripts provided as a tarball (smog-check), which is available at
smog-server.org.
smog-check contains three main test scripts. Two of these scripts test smog2 and the
third tests the associated tools. For testing smog2, one script is fast, whereas the second
is very comprehensive and can be used to test new SMOG models that you may design.
When everything works well, performing the checks is as easy as issuing three commands.
Just make sure you source configure.smog2 before running the tests.
./quick-check
./smog-check
./smog-tool-check
If you find that any of these scripts reports failures, please communicate that to the
smog-server.org team, so that we may help diagnose the problem.
Chapter 2. Installation 6
If you are modifying the SMOG package, or you are making modifications to the default
force fields, it may be desirable to only employ specific checks. In this case, you can run
a single test with the command ./smog-check N, where N is the index of the check.
You can also run a range of tests with ./smog-check N M.
Chapter 3
Using SMOG 2
This chapter describes the usage of SMOG 2. It is recommended that all users read this
chapter before using the software.
To avoid I/O issues, please follow these additional guidelines when preparing your PDB
file for use with SMOG 2:
• Only use a text editor (e.g. vi, or emacs) to prevent insertion of hidden characters.
• Only include lines that start with ATOM, HETATM, COMMENT(may be at the
beginning, or end of any chain), BOND (user-defined system-specific bonds. Must
appear after END. See Section 8.1), TER (to indicate a break between 2 chains)
and END. Only BOND and COMMENT lines may appear after END.
• Chain identifiers are ignored. If your system has multiple chains, insert TER lines
(left justified) between chains. NOTE: Do not immediately follow a TER line with
an END line. This is interpreted as a chain with 0 atoms, and an error message
will be issued.
7
Chapter 3. Usage 8
• Only residues and atoms within a residue defined in the force field templates will
be recognized by SMOG 2. Unless a coarse-grained template is designated with
-tCG, unrecognized residues and atoms will lead to a PDB parse error, and the
program will exit.
• Residue name field is officially columns 18-20 in the PDB definition, but smog2
reads columns 18-21 since column 21 is unused.
3.1.2 Preprocessing
As discussed in Chapter 6, SMOG 2 reads “template” files in order to generate force field
files. As such, each PDB file has to fully conform to the molecular structure definitions
provided by the templates. For example, the default all-atom templates (provided in
SBM AA) distinguish between terminal and non-terminal residues (i.e. in proteins there
is an OXT in place of a peptide bond for terminal residues). In the default templates,
the C-terminal amino acid residues have an suffix “T” added to their their three-letter
code (e.g. GLY vs. GLYT).
A preprocessing tool (smog adjustPDB) is provided that will adjust your PDB to reflect
changes necessary to conform to the templates. For usage guidelines, see Section 5.1.
SMOG 2 supports a broad range of structure-based models, and the All-Atom [6], and
the Cα [5] models are provided as defaults. See Appendix A for full details of the default
models. By running SMOG, you will generate the .top, .gro, and .ndx files necessary
to perform a structure-based simulation in Gromacs, NAMD, or openMM. Additional
output files are provided, for your information.
The all-atom potential energy function is defined through the template files found in
$SMOG PATH/SBM AA. These files define:
1) the covalent geometry of amino acids, nucleic acids, some ligands, as well as bioinor-
ganic atoms
2) the energetic and system parameters (e.g. mass, charge, interaction strengths)
Chapter 3. Usage 9
To generate all-atom force field and coordinate files for the default model (i.e. .top and
.gro files), issue the command:
where yourFile.pdb is the name of the file containing your molecular system.
If you would like to specify a different all-atom model, then use the command:
where templateDirName is the name of the directory containing the desired template
files.
To generate force field and coordinate files for the default Cα model, issue the command:
Note that an additional set of templates are required when using a coarse-grained model.
The option -tCG is used to indicate the precise coarse-grained model that should be pre-
pared. When -tCG is given, then the -t flag is used to designate the templates that
initially process the PDB for contact analysis. Normally an all-atom PDB is provided,
since native contact maps make the most sense when generated from an all-atom struc-
ture (note that the “Shadow” map only makes sense with atomic graining). The -tCG
templates are then used to construct the CG energetic model. In the above example,
the PDB has residues and atoms defined in the -t templates, and these definition will
also be used for contact map generation. See Appendix B for a detailed description of
the supported contact map calculations.
Chapter 3. Usage 10
The default Gaussian models have energetic and structural parameters that are tailored
to be similar to the LJ contact potential models, while allowing for a more flexible Gaus-
sian contact shape. To generate force field and coordinate files for the default Gaussian
all-atom model, issue the command:
SMOG 2 always requires a PDB file and some argument indicating which model should
be used. Table 3.1 shows the currently-supported input arguments.
If you have generated contacts yourself, these can be used instead of using the internal
SMOG 2 routines. A single file containing all the contacts in a list can be specified at
the command line with the flag -c. For example:
which should be formatted as a single line per contact, whitespace delimited, where each
line has the two atoms interacting and their respective chain numbers. The chains are
numbered starting from 1 by the order of occurence in the PDB file. The atomNum
should be consistent with atom numbers in the input PDB file. The fifth column can
contain a numeric distance in Å, which, if provided, will be used instead of the native
Chapter 3. Usage 11
distance. If using -tCG to obtain a coarse grained topology, the input contact map should
designate residue numbers instead of atom numbers, again with the same numbering as
in the input PDB file.
Chapter 4
Performing a simulation, or
calculation
Once you have generate the .top and .gro files with SMOG, you are ready to perform
a simulation. Rather than write a new molecular dynamics simulation package, SMOG
generates input files for use with Gromacs [7], NAMD [8] and openMM, highly-optimized
and parallelized MD software suites. This allows you to use nearly every protocol that
has been implemented in these programs when performing simulations with structure-
based models (e.g. replica exchange, umbrella sampling). In addition, these packages
are scalable to many processors through a combination of MPI and thread-based paral-
lelization, and they also support GPU acceleration, which allows SMOG models to fully
take advantage of cutting-edge computing resources. Here, we provide brief descriptions
of how to perform SMOG model simulations in Gromacs, while references to NAMD
and openMM introductions are also noted.
First, produce a portable xdr file (in the example below, run.tpr) that describes your
simulation. This file is platform independent and contains all parameters for your sim-
ulations. This allows you to produce a tpr file on any machine, and then move it to
another machine and run your simulation. The xdr file is produced by grompp (part of
the Gromacs distribution):
12
Chapter 4. Simulations 13
The file mdpfile.mdp tells Gromacs what settings to use during the simulation, such
as the timestep size, the number of timesteps and what thermostat to use. Here is a
sample set of recommended configurations when using the default all-atom model:
integrator = sd ;Run control: Use Langevin Dynamics protocols.
dt = 0.002 ;time step in reduced units.
nsteps = 100000 ;number of integration steps
nstxout = 100000 ;frequency to write coordinates to output trajectory .trr file.
nstvout = 100000 ;frequency to write velocities to output trajectory .trr file
nstlog = 1000 ;frequency to write energies to log file
nstenergy = 1000 ;frequency to write energies to energy file
nstxtcout = 1000 ;frequency to write coordinates to .xtc trajectory
xtc_grps = system ;group(s) to write to .xtc trajectory (assuming no ndx file is supplied to grompp).
energygrps = system ;group(s) to write to energy file
nstlist = 20 ;Frequency to update the neighbor list
coulombtype = Cut-off
ns_type = grid ; use grid-based neighbor searching
rlist = 1.2 ;cut-off distance for the short-range neighbor list
rcoulomb = 1.2 ; cut-off distance for coulomb interactions
rvdw = 1.2 ; cut-off distance for Vdw interactions
pbc = no ; Periodic boundary conditions in all the directions
table-extension = 10 ; (nm) Should equals half of the box’s longest diagonal.
tc-grps = system ;Temperature coupling
tau_t = 1.0 ; Temperature coupling time constant. Smaller values = stronger coupling.
ref_t = 80.0 ; ~1 reduced temperature unit (see Gromacs manual or SMOG2 manual for details)
Pcoupl = no ;Pressure coupling
gen_vel = yes ;Velocity generation
gen_temp = 80.0
gen_seed = -1
ld_seed = -1
comm_mode = angular ; center of mass velocity removal.
Listing 4.1: Sample mdp file for all-atom SMOG models used for Gromacs v4.5/4.6
Note: If you would like to perform energy minimization, simply change the integrator
settings to steep (steepest descent) or cg (conjugate gradient).
After you have generated the .tpr file with grompp, you will need to perform the simu-
lation. To run the simulation, issue the command:
Is is highly recommended that you explore all Gromacs options, in order to ensure max-
imum performance (e.g. the number of threads being used). SMOG-model specific
requirement: To use domain decomposition when performing a simulation in parallel,
using either threads, or MPI, you should add the additional flag -noddcheck. Note: For
Chapter 4. Simulations 14
folding of small proteins you will probably want to avoid domain decomposition, and
instead use particle decomposition by adding the option -pd when on a single node.
4.1.2 Cα Model
To run a simulation with the Cα model, the steps are the same as for the AA model,
though there are a few minor differences. First, when running grompp, you will want to
change a few settings in the .mdp file. A sample .mdp file for Cα models is given below.
integrator = sd ;Run control: Use Langevin Dynamics protocols.
dt = 0.0005 ;time step in reduced units.
nsteps = 100000 ;number of integration steps
nstxout = 100000 ;frequency to write coordinates to output trajectory .trr file.
nstvout = 100000 ;frequency to write velocities to output trajectory .trr file
nstlog = 1000 ;frequency to write energies to log file
nstenergy = 1000 ;frequency to write energies to energy file
nstxtcout = 1000 ;frequency to write coordinates to .xtc trajectory
xtc_grps = system ;group(s) to write to .xtc trajectory (assuming no ndx file is supplied to grompp).
energygrps = system ;group(s) to write to energy file
nstlist = 20 ;Frequency to update the neighbor list
coulombtype = Cut-off
ns_type = grid ; use grid-based neighbor searching
rlist = 3.0 ;cut-off distance for the short-range neighbor list
rcoulomb = 3.0 ; cut-off distance for coulomb interactions
rvdw = 3.0 ; cut-off distance for Vdw interactions
coulombtype = User
vdwtype = User
pbc = no ; Periodic boundary conditions in all the directions
table-extension = 10 ; (nm) Should equals half of the box’s longest diagonal.
tc-grps = system ;Temperature coupling
tau_t = 1.0 ; Temperature coupling time constant. Smaller values = stronger coupling.
ref_t = 80.0 ; ~1 reduced temperature unit (see Gromacs manual or SMOG2 manual for details)
Pcoupl = no ;Pressure coupling
gen_vel = yes ;Velocity generation
gen_temp = 80.0
gen_seed = -1
ld_seed = -1
comm_mode = angular ; center of mass velocity removal.
Listing 4.2: Sample mdp file for Cα SMOG models used for Gromacs v4.5
The most significant difference is the use of “User-defined” vdW and Coulomb inter-
actions. This is due to the fact that the 10-12 potential used for contact interactions
in the Cα model. In order to run mdrun (next step), it is necessary to generate table
files that define the 10-12 interaction. We provide a tools for generating these tables
($SMOG PATH/bin/smog tablegen) with the SMOG2 distribution, which is described in
Section 5.6.
Chapter 4. Simulations 15
After you have generated your tabulated potentials for the 10-12 interaction (i.e. ta-
ble file.xvg), and you have prepared a .tpr file with grompp, you can run the simulation
with the command:
> mdrun -s run.tpr -noddcheck -table table file.xvg -tablep table file.xvg
Typically, for protein folding, you will want to avoid domain decomposition and instead
use particle decomposition by adding the option -pd when on a single node. After you
perform your simulation, you can utilize any analysis tools provided with Gromacs.
4.1.3 Examples
Gromacs 5 has a few changes that impact SMOG models. First, we don’t yet provide
a Gromacs 5 distribution with the SMOG enhancements (umbrellas, g kuh, gaussian
contact potentials). So, if you want to use these you can only use Gromacs 4.5. Gromacs
5 itself has changes of note: 1) OpenMP support has replaced the option of particle
decomposition and 2) OpenMP requires cutoff-scheme=Verlet and Verlet doesn’t yet
allow tabulated potentials. This has the largest impact on C-alpha models, which use
tabulated potentials. If your simulated system has less than roughly 100 atoms, you can
typically only use a single processor with v5, because additional threads are only allowed
through OpenMP. If your system is large enough, you can use multiple MPI processes
with domain decomposition to scale to multiple cores. When using Verlet lists you have
to use pbc = xyz. For all-atom simulations, Verlet lists are fine, and it is usually best
to use as many OpenMP threads as possible with -ntomp.
Chapter 4. Simulations 16
4.2.1 Examples
The force field files generated by SMOG 2 are fully compatible with NAMD. To perform
SMOG models in NAMD, please consult the NAMD manual. A tutorial is available:
http://vidar.scs.uiuc.edu/jlai7/Tutorial/GoDemo.pdf.
The force field files generated by SMOG 2 are compatible with openMM. To perform
SMOG models with openMM, please consult the Grossfield Group webpage.
An alternate method for sampling the landscape is to use Discrete Path Sampling (DPS).
As part of the SMOG 2 package, a script (smog optim) is provided that will convert the
.gro and .top files into the inputs necessary for DPS using the OPTIM/PATHSAMPLE
suite developed by the Wales Group. Currently supported interactions are: bonds, bond
angles, cosine and harmonic dihedrals, 10-12 and 6-12 contacts, anisotropic position
restraints and Debye-Hückel electrostatics. As we develop and test the protocols for
smog optim, this manual will be updated with recommended step-by-step instructions
for DPS SMOG models. For now, we just provide a few examples for how to get started.
Before discussing usage, there are a few important differences between the SMOG force
fields used in Gromacs, and those applied in DPS.
1. Even though the routines required for using DPS with SMOG models are invoked
with the “SB” flag (see listing 4.3), these routines are not structure-based-specific.
That is, these are general routines for calculating bond, angle, dihedral, contacts,
excluded volume and electrostatic energies/forces. Similar to how SMOG models
are implemented in Gromacs, the input force field file (SBM.INP) encodes the
Chapter 4. Simulations 17
structure-based aspects of the model. Thus, one may define a SBM.INP file that
has many non-structure-based features (e.g. non-specific contacts and electrostat-
ics), even though the “SB” routines would be used.
3. In Gromacs, one may exclude all non-bonded interaction that are connected by less
than a specific number (nrexcl) of bonds. In OPTIM, exclusions are automatically
generated for all atoms that interact via a bond, bond angle, dihedral, or contact.
nrexcl is not used.
To convert the .gro and .top files into an odata and SBM.INP file, which is necessary for
OPTIM/PATHSAMPLE, use the smog optim module (described in section 5.4). This
tool may be found in the $SMOG PATH/tools directory. The script is interactive, and
you will be prompted for all necessary input and options. Once you have organized
your force field for DPS, you will need to perform energy minimization for your struc-
ture. By default, the odata file generated by smog optim will provide the keywords for
minimization. The odata file contains the initial coordinates, as well as the calculation
specifications. The odata file should look like the following:
STEPS 1000000
BFGSMIN 0.000001
POINTS
SB 13.67 48.35 -15.2
SB 12.16 51.69 -14.1
SB 10.07 51.44 -10.94
SB 6.78 53.35 -10.85
SB 4.22 53.53 -8.01
SB 0.53 53.56 -8.78
...
\
If you selected rigidification when using smog optim, then there will be the additional
keyword RIGIDINIT in the odata file. In addition to the odata file, smog optim will also
generate the file SBM.INP, which defines the SMOG force field.
After energy minimizing two structures, you will want to find a connection between
them. To find an initial connection, you need to update the odata file, such that it
starts with:
Chapter 4. Simulations 18
In this case, the coordinates in the odata file should correspond to the coordinates of one
of the minimized endpoints. The coordinates of the second endpoint should be provided
in a file called finish. finish should simply be a listing of XYZ coordinates (no “SB”
at the beginning of each line). These steps are sufficient to get one started with DPS
using models generated by SMOG 2. Note: Since SMOG models are so computationally
inexpensive, compared to other models, I/O can sometimes become limiting when using
OPTIM. It is advisable that you check the summary statistics at the end of each use of
OPTIM and verify that roughly 90 percent of time is spent on energy+gradient calls.
Chapter 5
SMOG Tools
This script helps rename atoms and residues in a PDB file, such that they will conform
to the naming scheme used in a set of template files.
The first required input is the PDB file, the second required input is either the option
-default to use the map file provided with the program (for use with the default SMOG
models), or the option -map followed by your own map file.
map file format Starting with SMOG v2.3, the default behavior of smog adjustPDB
has changed. Previously, only terminal residues were renamed. Starting with v2.3, the
following changes have taken effect:
• Support was added for alternate atomic naming conventions (e.g. C3* or C3’).
• One can now ignore (not rename) specific residue names, regardless of the atomic
composition.
Currently, there are 4 types of lines that can appear in the mapping file:
19
Chapter 5. Additional Tools 20
• comment lines: A semicolon can be used to denote a comment. If there are only
white spaces before a comment, then the line will be ignored.
• rename: This is used to globally rename a specific atom. For example, one may
want to use the prime convention, or asterisk convention. As another example, if
you wanted to rename all CA atoms as CAA, then you would give the line:
rename CA CAA
The convention is to begin with rename (not case-sensitive), followed by the desired
atom name and then a possible alternate atom name that could be encountered in
a PDB.
• ignoreres: Sometimes one does not want to rename specific residues. Example:
If one wanted to not rename PSU residues, nor its atom names, then the following
line would be added:
ignoreRes PSU
residue ALA C CA CB N O
Based on this, any residue that has (only) a C, CA, CB, N and O would be renamed
ALA. Note that the rename option is applied before matching. So, if you had a
rename definition that converted CF to CA (rename CF CA), then you could also
match ALA if the residue has C, CF, CB, N and O. In this example, C, CA, CB,
N and O, or C, CF, CB, N and O would both match ALA. The output PDB file
would name the residue CA. It is also possible to allow for multiple types of residue
compositions to correspond to the same target residue. For example, sometimes
one uses O2 to refer to OXT. This would be enabled by:
A final option with the residue definition line is the use of %first or %last. These
tags indicate that, in addition to matching the atoms, the corresponding residue
name will only be substituted if the residue is the first/last residue in a chain.
This may occur if, for example, one is assigning charges to RNA residues. Since
the terminal residues may have the same atoms, but different charges, then one
may need to assign a different name for the terminal position.
Chapter 5. Additional Tools 21
To see the current default format, check out the file share/mapfiles/sbmMapExact.
;This is a mapping file that was generated by smog2
;This defines the composition of atoms in the templates found in:
; /Users/coolsim/smogtemplates
rename OP1 O1P
rename O3’ O3*
residue A C1* C2 C2* C3* C4 C4* C5 C5* C6 C8 N1 N3 N6 N7 N9 O1P O2* O2P O3* O4* O5* P
residue ALAT C CA CB N (O O1) (OXT O2)
residue A C1* C2 C2* C3* C4 C4* C5 C5* C6 C8 N1 N3 N6 N7 N9 O1P O2* O2P O3* O4* O5* P
residue A3P C1* C2 C2* C3* C4 C4* C5 C5* C6 C8 N1 N3 N6 N7 N9 O1P O2* O2P O3* O4* O5* P \%last
....
Generating the mapping file: While one can make the mapping file by hand, smog2 will
also generate a mapping file that defines all of the residues in a given force field. To
generate a minimal mapping file for a given .bif file, then use the gen map option with
smog2. Note: This will only generate the residue definitions, using the exact atom
names defined in the force field (i.e. rename, ignoreres and () will not generated). If
a residue has a “meta” attribute defined in the .bif file, then its value will be included
at the end of the atom names with a “%”. For example, if you have a “last” definition,
then meta=‘‘last’’ in the residue definition in the .bif file would lead to “%last” being
included in the residue definition of the SMOG-generated mapping file.
Legacy map file format When using the -legacy option, the following format should
be used for the map file:
Lines containing a “#” character are interpreted as comments. Each line must have
three strings that are space/tab delimited. The first field is the residue name, as it
appears in the input PDB file. The second is the name to be substituted if the residue
is the first residue in a chain (e.g. N-terminus in a protein), and the third field is the
corresponding substitution for the last residue (e.g. C-terminus) in each chain. The
preprocessing tool will write a modified PDB file smog.pdb.
The script can also renumber atom and residue indices to be sequential within each
chain, and it adjusts atom names to be consistent with the SBM AA template files.
Chapter 5. Additional Tools 22
It is common when studying large molecular assemblies that you will only want to
simulate a portion of the system. In these cases, it is often convenient to remove many
atoms from the model, and then apply restraints on the boundary atoms [9]. To facilitate
this, smog extract will take a SMOG .top and .gro file, and produce a new set of force
field files that only include a specified subset of atoms. The atom list can be given in
a Gromacs-style .ndx file. If multiple groups are listed in the ndx file, the user will be
prompted to select a single group. By default, position restraints are not introduced.
If you would like to include position restraints on all atoms that have an interaction
removed during extraction, then use the -restraints flag to indicate the strength of
the restraints. To see all options, use the -h flag, or refer to Table 5.2.
This module allows one to add ions to your model. The definitions used for the ions
is decided by the user. That is, vdW parameters, mass, charge and name must all be
specified. These ions will be place around the existing atoms without introducing atomic
clashes. For a full list of supported flags, see Table 5.3.
Chapter 5. Additional Tools 23
smog optim is currently in development, and the details of its use are likely to change.
To see a complete list of current options, used smog optim -h.
This tool uses a SMOG .top and .gro file, along with an index file to generate a SMOG
model in which contacts and/or dihedrals weights are modified. This is a common
task when using SMOG models, and we provide this script as a convenience. On the
command line, you must indicate whether you want to rescale dihedrals (-rd <float>)
or contacts (-rc <float>). The specific interactions that will be rescaled are determined
Chapter 5. Additional Tools 24
by the index file and user input at runtime. That is, if you specify that dihedrals will
be rescaled, you will be prompted to select an index group for rescaling. If all four
atom that form a dihedral are within the index group, then the weight is rescaled. Only
dihedrals of type 1 are rescaled. If one is rescaling contacts, then the users will need to
select two index groups. Any contacts between the two groups will be rescaled. If the
same group is selected twice, then intra-group contacts will be rescaled.
When using user-defined potentials (i.e. not 6-12, or direct coulomb interactions), then
it is necessary to provide a table file that contains tabulated potentials and forces.
Specifically, Gromacs will consider any tabulated potential of the form:
where A and B are defined in the .top file, and qi is the charge of atom i. The functions
f (rij ), f 0 (rij ), g(rij ), g 0 (rij ), h(rij ), h0 (rij ) are given by the table file.
smog tablegen will generate a table file with specific parameters, where h(rij ) = r1M ,
ij
exp(−κrij )
1
. κ is the Debye length, and it is equal to 3.2 [C]nm−1 ,
p
g(rij ) = rN and f (rij ) = rij
ij
where [C] is the concentration of monovalent ions (Molar) to be described implicitly by
a Debye-Hückel potential. [C] should be the sum of all monovalent ion concentrations,
regardless of electric charge. Table 5.5 describes the available options.
Chapter 5. Additional Tools 25
Template-Based Approach
A single SMOG “template” is comprised of four XML-formatted files. These files are
required when using SMOG 2. XML format was adopted because of its consistent
formatting, ease of editability and readability, and there are widely available program
modules to generate and parse XML files. Furthermore, XML allows for schemas, a
form content format restriction file, to which the template files must conform, which
adds an additional layer of error checking capabilities. This chapter assumes that the
user knows the basics of XML formatting. Users unfamiliar with XML formatting may
want to check out the W3schools’ website.
Table 6.1 summarizes the purpose of each template file. In Chapter 7, we show how to
add new residues to the template files for an all-atom structure-based model.
SMOG 2 expects four template files to be present in a single folder (i.e. the template
folder). As discussed in Chapter 3, if one is not using a default model, then the template
folder name is a required argument when running SMOG 2. A template folder can only
26
Chapter 6. Template-Based Approach 27
File Purpose
Biomolecular Information File (.bif) Defines the structure of
biomolecules to be supported
Setting Information File (.sif) Defines interaction function
declarations
Bond Information File (.b) Defines bonded interactions
between atoms
Nonbond Information File (.nb) Defines non-bonded interac-
tions between atoms
Table 6.1: Descriptions of the four files that comprise a single template. The expected
suffix of each file is shown in parentheses.
contain one of each file type. If your template folder contains more than one file of
a specific file type, the program will exit with an error. Each file contains unique
information, as described below.
The Biomolecular Information File (from here on called .bif) defines the covalent struc-
ture of all residues described by a particular force field. Each residue is defined in the
.bif file by declaring all the atoms in that residue, the bonds between the atoms and the
improper dihedrals between the atoms.
Residues
Each residue is individually defined between the <residues> and </residues> tags. As
an example, the text below shows how one would define the residue ALA, which contains
5 atoms.
1 <residue name="ALA" residueType="amino" atomCount="5">
2 <atoms>
3 <atom bType="B_1" nbType="NB_1" pairType="P_1">N</atom>
4 <atom bType="B_1" nbType="NB_1" pairType="P_1">CA</atom>
5 <atom bType="B_1" nbType="NB_1" pairType="P_1">C</atom>
6 <atom bType="B_1" nbType="NB_1" pairType="P_1" charge="-1.0">O</atom>
7 <atom bType="B_1" nbType="NB_1" pairType="P_1">CB</atom>
8 </atoms>
9 <bonds>
10 <!--BACKBONE-->
11 <bond energyGroup="bb_a">
12 <atom>N</atom>
13 <atom>CA</atom>
14 </bond>
15 <bond energyGroup="bb_a">
16 <atom>CA</atom>
17 <atom>C</atom>
Chapter 6. Template-Based Approach 28
18 </bond>
19 <bond energyGroup="bb_a">
20 <atom>C</atom>
21 <atom>O</atom>
22 </bond>
23 <!--FUNCTIONAL GROUP-->
24 <bond energyGroup="sc_a">
25 <atom>CA</atom>
26 <atom>CB</atom>
27 </bond>
28 </bonds>
29 <impropers>
30 <improper>
31 <atom>CB</atom>
32 <atom>CA</atom>
33 <atom>C</atom>
34 <atom>N</atom>
35 </improper>
36 </impropers>
37 </residue>
The attribute name is the name of the residue, as used in your PDB file. The attribute
residue residueType is the type of residue, in this case, an amino residue. Finally the
optional attribute atomCount allows the user to explicitly set the total number of atoms
to be counted for normalizing energies. That is, the total atom count is used in the
energetic scaling procedure of dihedrals and contact energies, as described in Appendix
A. This feature is useful when including many copies of a ligand in your system, since
the energetic normalization should only be based on the protein, or RNA, and not the
multiply-copied ligands. In such a scenario, the user would set atomCount to 0. If
atomCount is not defined, SMOG 2 automatically uses the total number of atoms listed
under the <atoms> tag.
The <atoms> tag declares all the atoms in the residue. Note that all the atoms within a
specific residue in your PDB must be listed here. If the PDB and .bif are not consistent,
the program will terminate with an error. This is the reason that the default templates
differentiate between the C-terminal and non-terminal protein residues (See Chapter 3).
Each atom has a bond type bType, a non-bond type nbType and a pair type pairType.
The bond type attribute is used in the generation of the bonded interactions (bonds,
angles, and dihedrals). Likewise, the non-bond type attribute is used in the generation
of the non-bonded interactions. The pairType attribute is used in the generation of
contact interactions (6-12, 10-12 or Gaussian interactions). In the example above, we
also show the optional charge attribute, which allows the user to specify the charge of
an atom within a residue. Using charge within an atom declaration will supersede any
charge assignment based on nbType (see below).
Chapter 6. Template-Based Approach 29
The <bonds> tag contains all the bonds that should be present in the residue. Each bond
in a residue is listed under the <bond> tag. The atom names must match those listed in
the <atoms> field. The bond tag also has an attribute called energyGroup that allows
for one to define heterogeneous energetics in the system. The energy group attribute is
used in conjunction with the bond types to determine the dihedral interaction. Using
the bonds declared here, the program dynamically calculates all angles and dihedrals
that can exist in the molecule.
The <impropers> tag contains all the improper dihedral angles in the biomolecule. The
tag <improper></improper> holds four atom tags. The order of the four atoms defines
a specific improper dihedral within a residue. This feature is used to add dihedrals that
cannot be determined based on bond geometry.
Connections
In addition to defining a residue, the .bif file is also used to define how sequential
residues are covalently connected. Listing 6.2.1 shows how two residues of type amino
are covalently linked. The attribute residueType1 and residueType2 declares how a
residue of type residueType1 at position n should be connected to a residue of type
residueType2 at position n+1. The residue types are matched based on the residue
definitions. Much of the structure of the connection element is similar to that of the
residue element. There is a single bond, whereby the first atom belongs to the nth residue
and the second atom belongs to the (n+1)th residue. We can also define, though not
a requirement component of all connection definitions, a single improper dihedral. In
the context of impropers, the special character suffix “+” is used to declare atoms that
belong to the (n+1)th residue. In the code listing, the N atom belongs to the (n+1)th
residue.
Note: If you do not want to covalently link residues that are listed sequentially in the
PDB file, then you should add the attribute connect=no to the definition of the first
residue type within the residue declaration. This could be helpful if you are modeling
ions, where each ion is listed as a single residue, but a block of ions appear in the PDB
file between TER lines.
1 <connections>
2 <!-- AMINO/AMINO -->
3 <connection name="amino-amino" residueType1="amino" residueType2="amino">
4 <bond energyGroup="r_a" >
5 <atom>C</atom>
6 <atom>N</atom>
7 </bond>
8
9
Chapter 6. Template-Based Approach 30
10 <improper>
11 <atom>O</atom>
12 <atom>CA</atom>
13 <atom>C</atom>
14 <atom>N+</atom>
15 </improper>
16 </connection>
While the .bif file is used to define the covalent geometry of each residue, the Setting
Information File (.sif) is used to control the distribution and functional form of the
energetics, which includes the inter-dihedral dihedral ratios, contact-to-dihedral ratios,
contact map settings and function declarations.
Functions
The <functions> tag should list all the functions that the model requires.
1 <functions>
2 <function name="bond_harmonic" directive="bonds"/>
3 <function name="bond_type6" directive="bonds"/>
4 <function name="bond_free" directive="bonds"/>
5 <function name="angle_harmonic" directive="angles"/>
6 <function name="angle_free" directive="angles"/>
7 <function name="dihedral_cosine" directive="dihedrals"/>
8 <function name="dihedral_harmonic" directive="dihedrals"/>
9 <function name="dihedral_free" directive="dihedrals"/>
10 <function name="contact_1" directive="pairs" exclusions="1"/>
11 <function name="contact_2" directive="pairs" exclusions="1"/>
12 <function name="contact_gaussian" directive="pairs" exclusions="1"/>
13 <function name="contact_free" directive="pairs" exclusions="0"/>
14 </functions>
All supported functions are defined using the <functions> tag. These function names
are mapped to specific subroutines in src/smogv2. To add a new interaction type, follow
the examples already in the code. Note: These functions are simply mappings to defined
Gromacs interactions. If it doesn’t exist in Gromacs, it won’t help to add it here. The
interactions currently in the code are listed in Table 6.2. If you add useful interactions
please let us know so that we can incorporate them into the default distribution.
The function tag has two attributes, directive and exclusions. directive takes
a string and specifies the topology directive under which to write out the interaction.
Chapter 6. Template-Based Approach 31
exclusions is boolean and specifies whether to write out the atom pair additionally
under the [ exclusions ] directive. In Gromacs this means that the atom pair is not
included in the non-specific excluded volume interaction list. This is because the pair
interaction has its own excluded volume part and it shouldn’t be double counted. Since
there can be interactions defined that do not include excluded volume, we include the
option for flexibility.
Table 6.2: Functions available in SMOG2. The energy unit here means the reduced
energy unit (see A.4), which is equivalently kJ/mol in the Gromacs .top file.
There are two ways to declare Lennard-Jones-style potentials for contacts (See Table
6.2): contact 1 and contact 2
A B
Vcontact = M
− N (6.1)
r r
A and B are automatically evaluated, such that the minimum is at distance σ and depth
. Distances are calculated based on the input PDB if “?” is given in the σ field.
Chapter 6. Template-Based Approach 32
f (σ) and g(σ) are expressions that are evaluated using the native distance if “?” is given
for σ. The functions G(r) and H(r) are typically provided to Gromacs via table files
(See Section 5.6).
Group Settings
Structure-based models have two classes of energy groups: contact groups, and dihedral
groups. Each contact group represents a collection of contacts, and each dihedral group
represents a collection of dihedrals. These energy groups are used for energetic scaling
of interaction strength.
X X
HAA = ... + BB FD (φ) + SC FD (φ) (6.3)
backbone sidechain
X
+ C Fcontacts (r) + ... (6.4)
contacts
Shown above are the dihedral and contact terms of the All-Atom Hamiltonian. Shown
below are the energetic scaling factors for dihedrals and contacts, and their respective
attributes under the .sif file.
BB intraRelativeStrength bb
= (6.5)
SC intraRelativeStrength sc
X X X
BB + SC + C = Total non-ligand atoms (6.6)
P P
BB + SC dihedrals groupRatio
P = (6.7)
C contacts groupRatio
The program automatically calculates the total number of non-ligand atoms used in the
energetic scaling. Although the scaling equations shown above are limited to residue
types with only two dihedral types (backbone and sidechain dihedrals), and a single
contact type, the program allows for scaling equations to be generalized to more than two
dihedral types and more than one contact type. The energy group ratios are contained
within the <Groups> tag.
Chapter 6. Template-Based Approach 33
The two classes of energy group ratios, dihedral and contact ratios, are determined by
the <energyGroup> and <contactGroup> tags respectively. The residueType attribute
is used to designate the residue type the scaling factors of a particular energy group is
used for. The name attribute is the label for the energy group. The name attribute
used here is matched to the energyGroup attribute under <bond> tag in the .bif file.
The name of a particular contact energy group will be used later when declaring contact
interaction functions in the subsequent section of this chapter.
The normalize attribute for each energy group is a boolean attribute (1 or 0), and is
used to determine if a particular energy group is to be included in energy normalization
(see equation 6.6). For the All-Atom model, the dihedral group with the name “pr n”
(which represents the nucleic planar rigid dihedrals) has a normalize option set to 0,
indicating that planar dihedrals in nucleic acids will not be part of the normalization.
In contrast, in the All-Atom model, sidechain dihedrals are normalized, as are backbone
dihedrals and contact energies. Accordingly, those energy groups have the normalize
option set to 1. The intraRelativeStrength attribute is the relative ratio of stabilizing
energy within the different class of energy group for a particular residue (equation 6.5).
Finally we use the <groupRatios> tag to set the energy partition between the two classes
of energy group according to equation 6.7.
6.2.3 Bond Information File (.b) & Nonbond Information File (.nb)
The .b file is used to define all bonded interactions, which includes bonds, angles, and
dihedral. The .nb file is used to define all the non-bonded interactions, such as contacts,
1-4 pairs, and excluded volume.
Bonded Interactions
We first discuss how to define a basic bond interaction for the All-atom model.
Chapter 6. Template-Based Approach 34
Recall that each atom is given a bType when they are declared in the .bif file. Given
a particular bonded interaction (bonds, angles, dihedrals, impropers), the functional
form for a bonded interaction is assigned by matching the combination of the bTypes in
that interaction. The first <bond> tag (line 3 in Listing 6.5) is used assign a function
called bond harmonic to two bonded atoms of any type. Since the vanilla model has
only B 1 atoms, the *s could be replaced with B 1, and the same bonds would be
assigned. Recall under Listing 6.3, line 2, we defined bond harmonic() as a type 1
function under the bonds directive (harmonic bond function). The input parameters
for bond harmonic() are r0 and bond . In this case we use the special character “?” to
tell SMOG to calculate the native bond length (r0 ) from the PDB structure file. You
can instead also give an specific value for the bond distance. This feature is useful
when adding nonspecific/empirical terms (e.g. an AMBER/CHARMM backbone) to
the Hamiltonian.
The bType attribute here can take either an exact bond type or a special wildcard “*”
character that matches to all available bTypes. For the case of the All-Atom model,
since all the bType is identical, we can instead also defined the bond interaction as
shown under the <bond> tag in line 8 of Listing 6.5. Please note that the program
will assign the interaction that most closely matches the bTypes of a given
4-atom pair. One needs to be careful not to declare interactions that conflict with
one another. For example, if your system contains a bond between atoms of bType B 1
and B 2, and bond definitions are only given for bTypes B 1-* and B 2-*, then SMOG
would not know which function to apply, and it will exit with an error. However, if one
also explicitly defines a bond function for B 1-B 2 pairs, that bond would take priority.
The angle interaction follow a similar form as bonds, but instead of expecting two bType
attributes, it requires three. The bType attribute in this case is symmetric to the central
bType. When matching bond angles, the angle definition that matches the most atoms
Chapter 6. Template-Based Approach 35
identically will be used. Again, if an equal number of atoms match in multiple angle
definitions, there would be ambiguity, and SMOG will quit.
The “dihedral cosine” function is classified under the dihedral directive with function
type 1 in the .sif file (Listing 6.3 line 3). The arguments are of the form (φ0 , d , mult).
As introduced earlier the special character “?” tells the program to calculate the native
value from the PDB structure file. In this case we ask the program to calculate the
dihedral angle for all dihedral interactions that involve the bType combination *-*-*-*.
By using the “?” argument, along with a multiplicity factor (third argument), we tell
the program to multiply all the φ0 values by the multiplicity factor. More generally,
one may provide any function for the angle calculation. For example, if you were to use
(?*2-1), the angle would be evaluated as: native angle, times 2 and minus 1.
Equation 6.8 shown above is the dihedral interaction function used in the All-Atom
model. The code Listing 4.6 shown below shows how to define the second term of
FD (φ)
1 <dihedral func="dihedral_cosine(?,?,3)+dihedral_cosine(?,?,1)" energyGroup="bb_n">
2 <bType>*</bType>
3 <bType>*</bType>
4 <bType>*</bType>
5 <bType>*</bType>
6 </dihedral>
Finally multiple function can be applied to the same dihedral angle by including a sum
of functions (e.g. func=“f(..)+g(..)+h(..)”).
The special keyword “?” has limitations in where it can be used. For bonds and angles,
it can only be used for the first input parameter to a function. For dihedrals declared
as type 1, it can be used for the first two input parameters to a function. In the case of
dihedrals, the second input parameter for a type 1 dihedral is the energetic scaling term.
The “?” option is used to tell the program that the dihedral is considered under the
global energy normalization procedure. d is dynamically calculated by the program.
A note on assigning dihedrals: As with bonds and angles, it is quite easy to provide
multiple bType sequences that will match to the same atoms in a system. For example,
if you define a dihedral function for B 1-B 2-*-*, as well as B 1-*-*-*, then a dihedral in
your system between atoms (B 1,B 2,B 1,B 1) would match both. To determine which
Chapter 6. Template-Based Approach 36
Non-Bonded Interactions
The .nb files is used to define non-bonded interactions, including native contacts and
non-specific interactions.
1 <!-- DEFAULTS -->
2 <defaults gen-pairs="0" nbfunc="1" gmx-combination-rule="1" fudgeLJ="1" fudgeQ="1"/>
3 <!-- GENERAL NONBONDS -->
4 <nonbond mass="1.00" charge="0.000" ptype="A" c6="0.0" c12="5.96046e-9">
5 <nbType>NB_1</nbType>
6 </nonbond>
7 <!-- CONTACTS -->
8 <contact func="contact_1(6,12,?,?)" contactGroup="c">
9 <pairType>*</pairType>
10 <pairType>*</pairType>
11 </contact>
Listing 6.7 shows how to define non-bonded and native contact parameters. The nbType
value defined for each atom in the .bif is matched to the nonbond declaration.
Note: the contacts are placed in the [ pairs ] section in the topology file which was
originally used for 1-4 pair interactions. For structure based models, it has been applied
to include native contacts.
SMOG 2 supports two types of interactions for native contacts. Since the contact inter-
actions are unique to SMOG 2, refer to Table 6.2 for appropriate use. In code Listing
6.7, the contact parameters C , c6, and c12 are calculated automatically by the program
using energy normalization and the PDB structure because of the “?” marks. C is
also calculated dynamically through the scaling equations. If one uses the 10-12 or N-M
functions, table files have to be included (see Section 4.1.2 for syntax and details.)
Chapter 7
This chapter provides a step-by-step tutorial on how to add a new residue type using
SMOG 2 template files. Rather than introduce a new residue, we will go through the
declarations necessary the modified RNA residue MIA, which is part of the default AA
model.
Make sure to have the correct chemical structure of your molecule. A useful method is
to visually inspect it with a molecular visualization program (VMD for example):
38
Chapter 7. Adding a new residue definition 39
Since we’re going to explicitly define each atom in the MIA residue, the example here will
use the All-Atom model. When adding a new residue, you can either copy an existing
set of templates, or directly modify the distributed models.
As mentioned in Chapter 6, the biomolecular information file (.bif) defines the structure
of all biomolecules in your system. Here we will define the residue information by declar-
ing all of the atoms, bonds and improper dihedrals within the residue. The full modified
file can be found in the XMLcode examples/AA-whitford09 withMIA.bif directory.
As you get acquainted with the AA-whitford09.bif file structure, you’ll find that the
residues appear grouped together according to their type: Ligands, amino and nucleic
residues. The residue type of MIA is nucleic, so it is added for convenience next to the
existing nucleic residues. The <residue> tag encapsulates all of the residue information.
2945 <!-- NUCLEIC RESIDUES -->
2946
2947 <!--2-methylthio-N6 isopentenyl adenosine-->
2948 <residue name="MIA" residueType="nucleic">
Chapter 7. Adding a new residue definition 40
2949 <atoms>
2950 </atoms>
2951 <bonds>
2952 </bonds>
2953 <impropers>
2954 </impropers>
2955 </residue>
2956
2957 <!--RNA A-->
2958 <residue name="A" residueType="nucleic">
Keep in mind that the attribute name should match the residue name in the PDB file.
List all of the atom names in your residue as they appear in your PDB. The <atoms>
tag encapsulates all the atoms in the biomolecule.
2945 <!-- NUCLEIC RESIDUES -->
2946
2947 <!--2-methylthio-N6 isopentenyl adenosine-->
2948 <residue name="MIA" residueType="nucleic">
2949 <atoms>
2950 <atom bType="B_1" nbType="NB_1" pairType="P_1">P</atom>
2951 <atom bType="B_1" nbType="NB_1" pairType="P_1">O1P</atom>
2952 <atom bType="B_1" nbType="NB_1" pairType="P_1">O2P</atom>
2953 <atom bType="B_1" nbType="NB_1" pairType="P_1">O5*</atom>
2954 <atom bType="B_1" nbType="NB_1" pairType="P_1">C5*</atom>
2955 <atom bType="B_1" nbType="NB_1" pairType="P_1">C4*</atom>
2956 <atom bType="B_1" nbType="NB_1" pairType="P_1">O4*</atom>
2957 <atom bType="B_1" nbType="NB_1" pairType="P_1">C3*</atom>
2958 <atom bType="B_1" nbType="NB_1" pairType="P_1">O3*</atom>
2959 <atom bType="B_1" nbType="NB_1" pairType="P_1">C2*</atom>
2960 <atom bType="B_1" nbType="NB_1" pairType="P_1">O2*</atom>
2961 <atom bType="B_1" nbType="NB_1" pairType="P_1">C1*</atom>
2962 <atom bType="B_1" nbType="NB_1" pairType="P_1">N9</atom>
2963 <atom bType="B_1" nbType="NB_1" pairType="P_1">C8</atom>
2964 <atom bType="B_1" nbType="NB_1" pairType="P_1">N7</atom>
2965 <atom bType="B_1" nbType="NB_1" pairType="P_1">C5</atom>
2966 <atom bType="B_1" nbType="NB_1" pairType="P_1">C6</atom>
2967 <atom bType="B_1" nbType="NB_1" pairType="P_1">N6</atom>
2968 <atom bType="B_1" nbType="NB_1" pairType="P_1">N1</atom>
2969 <atom bType="B_1" nbType="NB_1" pairType="P_1">C2</atom>
2970 <atom bType="B_1" nbType="NB_1" pairType="P_1">N3</atom>
2971 <atom bType="B_1" nbType="NB_1" pairType="P_1">C4</atom>
2972 <atom bType="B_1" nbType="NB_2" pairType="P_1">S10</atom>
2973 <atom bType="B_1" nbType="NB_1" pairType="P_1">C11</atom>
2974 <atom bType="B_1" nbType="NB_1" pairType="P_1">C12</atom>
2975 <atom bType="B_1" nbType="NB_1" pairType="P_1">C13</atom>
2976 <atom bType="B_1" nbType="NB_1" pairType="P_1">C14</atom>
Chapter 7. Adding a new residue definition 41
In this example, as in the default models, all bonded interactions (bonds, angles and
dihedrals) are defined the same for all atoms. Therefore, only one atom group needs to be
defined. The bond type:bType=B 1 is identical for all atoms. The contact interactions
pairType=P 1 are also defined to be the same for all atoms. However, changing the
mass of a specific atom, such as sulfur (S10) in our example will produce a different
non-bonded interaction due to a different volume exclusion term. The new group type
of nbType=NB 2 will be defined in the .nb file.
The chemical structure should tell you how atoms are connected to each other via
chemical bonds. Inspect those bonds and add them to the <bonds> section. The <bonds>
tag encapsulates all the bonds in the biomolecule.
2980 <bonds>
2981 <!--BACKBONE-->
2982 <bond energyGroup="bb_n">
2983 <atom>P</atom>
2984 <atom>O1P</atom>
2985 </bond>
2986 <bond energyGroup="bb_n">
2987 <atom>P</atom>
2988 <atom>O2P</atom>
2989 </bond>
2990 <bond energyGroup="bb_n">
2991 <atom>P</atom>
2992 <atom>O5*</atom>
2993 </bond>
2994 <bond energyGroup="bb_n">
2995 <atom>O5*</atom>
2996 <atom>C5*</atom>
2997 </bond>
2998 <bond energyGroup="bb_n">
2999 <atom>C5*</atom>
3000 <atom>C4*</atom>
3001 </bond>
Chapter 7. Adding a new residue definition 42
3057 <atom>N1</atom>
3058 </bond>
3059 <bond energyGroup="pr_n">
3060 <atom>N1</atom>
3061 <atom>C2</atom>
3062 </bond>
3063 <bond energyGroup="pr_n">
3064 <atom>C2</atom>
3065 <atom>N3</atom>
3066 </bond>
3067 <bond energyGroup="pr_n">
3068 <atom>N3</atom>
3069 <atom>C4</atom>
3070 </bond>
3071 <bond energyGroup="pr_n">
3072 <atom>C4</atom>
3073 <atom>C5</atom>
3074 </bond>
3075 <bond energyGroup="pr_n">
3076 <atom>N9</atom>
3077 <atom>C4</atom>
3078 </bond>
3079 <!--ADDITIONAL TO "A" FUNCTIONAL GROUP-->
3080 <bond energyGroup="pr_n">
3081 <atom>C2</atom>
3082 <atom>S10</atom>
3083 </bond>
3084 <bond energyGroup="pr_n">
3085 <atom>S10</atom>
3086 <atom>C11</atom>
3087 </bond>
3088 <bond energyGroup="pr_n">
3089 <atom>N6</atom>
3090 <atom>C12</atom>
3091 </bond>
3092 <bond energyGroup="pr_n">
3093 <atom>C12</atom>
3094 <atom>C13</atom>
3095 </bond>
3096 <bond energyGroup="pr_n">
3097 <atom>C13</atom>
3098 <atom>C14</atom>
3099 </bond>
3100 <bond energyGroup="pr_n">
3101 <atom>C14</atom>
3102 <atom>C15</atom>
3103 </bond>
3104 <bond energyGroup="pr_n">
3105 <atom>C14</atom>
3106 <atom>C16</atom>
3107 </bond>
3108 </bonds>
Note that the bonds are separated by the comments: BACKBONE,FUNCTIONAL GROUP,
ADDITIONAL BONDS TO ’RNA A’ FUNCTIONAL GROUP. The bond tag attribute energyGroup
classifies the energy group the specific bond belongs to and helps to determine the di-
hedral strengths. The energy group bb n refers to bonds that belong to the backbone
group, and pr n refers to the functional group. It is sometimes useful to use an existing
residue as a reference if we know that the new residue is a slight modification of it. In our
example, MIA is a modified RNA A molecule. All backbone and functional group bonds
of RNA A can be added to MIA and we are left to determine the few other bonds created
by the additional atoms: S10, C11-C16. The additions are added under the comment
ADDITIONAL BONDS TO ’RNA A’ FUNCTIONAL GROUP with energy group of pr n.
Improper dihedrals cannot be dynamically calculated by the program using the bonds,
and should be added separately. The tag <improper></improper> holds four atoms.
The order of the four atoms here defines a specific improper dihedral within a biomolecule.
An improper dihedral is used to ensure proper geometry about a chiral centers (i.e.
prevent symmetry inversion due to an absent hydrogen atom). A proper dihedral would
be defined by four sequential atoms connected by bonds (such as the dihedral C4*-
C5*-O5*-P). An improper dihedral is defined by atoms that are not sequential. Those
angles need to be identified using the chemical structure of the molecule. For example
we consider the four carbon atoms: C13,C14,C15,C16 and their bonds as defined above
(highlighted in orange).
Chapter 7. Adding a new residue definition 45
Finally, we add all of the improper dihedrals to our residue. In our example there is
only one additional improper dihedral described above.
3109 <impropers>
3110 <improper>
3111 <atom>C3*</atom>
3112 <atom>C4*</atom>
3113 <atom>C5*</atom>
3114 <atom>O4*</atom>
3115 </improper>
3116 <improper>
3117 <atom>O3*</atom>
3118 <atom>C3*</atom>
3119 <atom>C4*</atom>
3120 <atom>C2*</atom>
3121 </improper>
3122 <improper>
3123 <atom>O2*</atom>
3124 <atom>C2*</atom>
3125 <atom>C1*</atom>
3126 <atom>C3*</atom>
3127 </improper>
3128 <improper>
3129 <atom>C2*</atom>
3130 <atom>C1*</atom>
Chapter 7. Adding a new residue definition 46
3131 <atom>O4*</atom>
3132 <atom>N9</atom>
3133 </improper>
3134 <!--ADDITIONAL IMPROPER DIHEDRAL TO "RNA A" -->
3135 <improper>
3136 <atom>C13</atom>
3137 <atom>C14</atom>
3138 <atom>C15</atom>
3139 <atom>C16</atom>
3140 </improper>
3141 </impropers>
3142 </residue>
Listing 7.4: Adding the improper dihedrals section to the residue structure
Adding a new atom of a different mass requires a creation of a new non-bonded group for
the excluded volume interactions. In our example we chose the atom sulfur to have larger
mass of twice the carbon mass. A new <nonbond> tag is added and it encapsulates the
new non-bond group information such as mass charge and other explicit non-bonded
terms. The mass is doubled in the mass entry. The nbType includes the previously
defined group name NB 2 that is consistent with the .bif file. The modified file can be
found in: XMLcode examples/AA-whitford09 withMIA.nb.
1 <?xml version=’1.0’?>
2 <nb>
3 <!-- DEFAULTS -->
4 <defaults gen-pairs="0"/>
5 <!-- GENERAL NONBONDS -->
6 <nonbond mass="2.00" charge="0.000" ptype="A" c6="0.0" c12="5.96046e-9">
7 <nbType>NB_2</nbType>
8 </nonbond>
9 <nonbond mass="1.00" charge="0.000" ptype="A" c6="0.0" c12="5.96046e-9">
10 <nbType>NB_1</nbType>
11 </nonbond>
12 <!-- CONTACTS -->
13 <contact func="contact_1(6,12,?,?)" contactGroup="c">
14 <pairType>*</pairType>
15 <pairType>*</pairType>
16 </contact>
17 </nb>
As described above, it is the aim of SMOG2 that the user will be able to extend the
models in a wide range of ways. Accordingly, it is not possible that we provide descrip-
tions of every possible variation that one may explore. However, here, we make an effort
to provide some examples of how to implement specific features that we think may be
frequently of interest.
There are often cases where the covalent geometry of the system can not be determined
by a general set of rules. For example, disulfide bonds may be formed, or broken,
depending on the oxidation state. As another example, sugar structures often have
branching patterns, and do not form linear chains. For these types of chemical bonds,
we provide the BOND option in the PDB file. If you would like to add a chemical bond,
then add BOND lines immediately after the END line in the PDB file. The formatting is
the following:
ChainIndex1 and AtomIndex1 indicate the first atom involved in the bond. ChainIndex1
is the index of the chain in which the atom exists, starting at 1. AtomIndex1 is the
number of the atom, as it appears in the PDB file. Note, ChainIndex1 is not necessarily
the chain ID. ChainIndex2 and AtomIndex2 indicate the second atom involved in the
bond. energygroup indicates the properties of any dihedral angles that have the new
bond as a middle bond (See Chapter 6 for discussion on energy groups). For example,
the following line:
BOND 1 51 4 100 r p
47
Chapter 8. Additional options 48
would add a chemical bond between the atom numbered 51 in the first chain, and the
atom numbered 100 in the 4th chain. The bond properties would be determined based
on the .b file settings, and the energy group of any associated dihedrals would be r p.
When SMOG2 runs, it will write information to the screen as the BOND lines are detected.
It will also write out information about what it interpreted the lines to mean, so you
can verify that the intended bonds are being added. As a note: any new bond angles
are automatically identified. However, only dihedrals that have the new bond as the
central bond will be added. This is a slight limitation in the software, but it is unlikely
to be significant for most applications.
One of the goals of SMOG2 is to allow one to “mix and match” elements of SMOG
models and more highly-detailed energetic models (e.g. AMBER or CHARMM). For
this, we have introduced the ability to define bonds, angles and dihedrals “by type”. For
example, if you would like all bonds between atoms of bType=B 1 and bType=B 2 to
be given AMBER parameters, you could include the AMBER definition in the “extras”
file and then define the bond of type bond bytype in the .b file. In this example, the .b
file would have
<bond func="bond_bytype()">
<bType>B_1</bType>
<bType>B_2</bType>
</bond>
Notice in this example that the atom names in the extras file do not need to correspond
to the values used for bTypes. This is intentional. In SMOG2, each atom is assigned a
bType, nbType and pairType. When determining which interaction to assign to a bond,
the bType is read. However, when using bond bytype, the bondtype read by Gromacs
will be the atom type (which is the nbType in SMOG2). So, in this example, if a residue
were to have a C-O bond, and the C atom were bType=B 1 and nbType=C, while the
O atoms were bType=B 2 and nbType=OS, then the bondtype parameters would be
used in the top file. When these templates were used to generate a top file, it would
only list the atom numbers bonds section, while the parameters would appear under
bondtypes.
Chapter 8. Additional options 49
The extras file may be used to add content to any supported directive. However, non-
bond params, bondtypes, angletypes and dihedraltypes lines are only written if the atom
types for each definition are used in a given SMOG model and simulated system. This
ensures that the top file only contains information that is required for the given system.
In order to allow matched atom types to be given no interaction, the “free” interaction
functions (angle free, dihedral free, and contact free) are provided and listed in
the .sif. They can be used as function types in the .b or .nb. As an example, suppose
you added a FRET dye as a new residue into the .bif:
To ensure that no native contacts would be defined with the FRET dye, the contact free
function must be used in the .nb file:
Further, in order to ensure that exclusions are also not added for atoms that were
identified as “in contact” (according to whichever contact protocol you have selected,
such as Shadow, or cut-off), then you need to make sure that the function is declared in
the .sif file, where the exclusions are explicitly disabled by setting the “exclusions” tag
to zero:
<functions>
...
...
Chapter 8. Additional options 50
Together, this example would ensure that an atom of pairType free that is interacting
with any other type (*) will be given a function contact free, which adds nothing into
the top file. A caveat is that matching angle free nullifies any dihedrals containing the
angle. Note that using free functions will not change the normalizations for contact and
dihedral interaction strengths, i.e. the contacts and dihedrals not written to the .top
are still counted in the normalization sums. If you disable normalizations, then this will
not be an issue.
While not part of the default structure-based models distributed with SMOG 2, there
are often times where it is desirable to add some degree of electrostatic interactions, or
one would like to use interactions that are not of the 6-12 form.
There are two ways to introduce charges in your system. First, you can define the charge
for an nbType in the .nb file, as shown in Listing 6.7. This will tell SMOG to assign a
charge to every atom of the specific nbType. The second way to add charges would be
to define charges for specific atoms within a residue type, in the .bif file, as shown in
Listing 6.1.
All non-hydrogen atoms are explicitly represented, and the provided structure (file.pdb)
is defined as the global potential energy minimum. Here, we provide a complete descrip-
tion of the default all-atom structure-based model energy function, which is defined by
the template SBM AA). All calculations employ reduced units (see A.4). Each atom is
represented as a single bead of unit mass, and the charge of each atom is set to zero.
Covalent geometry is maintained through harmonic interactions that ensure the bond
lengths, bond angles, improper dihedral angles and planar dihedral angles remain about
the values found in file.pdb. Non-bonded atom pairs that are in contact in the provided
structure between residues i and j, where i > j + 3 for proteins and i 6= j for RNA,
are given an attractive 6-12 potential. The minimum of each 6-12 interaction is set to
the distance of that atom pair in the provided structure. All non-native interactions
between atoms that are not in contact in file.pdb are repulsive. Contacts were defined
according to the Shadow algorithm (See Appendix B, with an all-atom cutoff distance
of 6 Å and a shadowing radius of 1 Å. The functional form of the potential is,
X r X θ
V = (ri − ri,o )2 + (θi − θi,0 )2 +
2 2
bonds angles
χimp χ
(χi − χi,0 )2 + planar planar (χi − χi,0 )2
P P
impropers 2 2
X X
+ BB FD (φi − φi,0 ) + SC FD (φi − φi,0 ) (A.1)
backbone " sidechains
#
12 6
σNC 12
X σij σij X
+ C −2 + NC
contacts
rij rij non−contacts
rij
51
Appendix A. Energetic formulation 52
where,
1
FD (φ) = [1 − cos(φ)] + [1 − cos(3φ)] (A.2)
2
When using SMOG 2, all values may be adjusted by the user, such as defining stabilizing
non-native interactions and including non-specific dihedral angles. However, for the
default model ri,0 , θi,0 , χi,0 , φi,0 and σij are given the values defined by the provided
structure. For the default model, the parameters are set to the following values:
The sums are over all dihedral angles in the system, and N is the number of atoms in
the system.
Note: The rescaling of dihedrals based on common middle bonds may not always desired.
To explicitly indicate whether this feature should be enabled, add the following child
element within the settings element in the .sif file
1 <dihedralNormalization dihedralCounting="0" />
0 indicates that counting should be turned off. 1 (default) indicates that counting should
be performed.
Appendix A. Energetic formulation 53
The Cα model coarse-grains the protein as single bead of unit mass per residue located
at the position of the α-carbon. x~0 denotes the coordinates of the native state and any
subscript 0 signifies a value taken from the native state. The potential is given by
X r X θ X
VCα (~x, x~0 ) = (r − r0 )2 + (θ − θ0 )2 + D FD (φ − φ0 )
2 2
bonds angles dihedrals
σNC 12
X σij 12 σij 10
X
+ C 5 −6 + NC (A.3)
contacts
r r non−contacts
rij
1
FD (φ) = [1 − cos(φ)] + [1 − cos(3φ)]. (A.4)
2
The coordinates ~x describe a configuration of the α-carbons, with the bond lengths to
nearest neighbors r, three body angles θ, four body dihedrals φ, and distance between
atoms i and j given by rij . Protein contacts that are separated by less than 3 residues
are neglected. Excluded volume is maintained by a hard wall interaction giving the
residues an apparent radius of σNC = 4 Å. The native bias is provided by using the
parameters from the native state x~0 . Setting the energy scale ≡ kB , the coefficients
are given the homogeneous values: r = 200/Å2 , θ = 40/rad2 , D = C = NC = .
The Gaussian-shaped contact potentials (Fig. A.1) are available in the SMOG version
of Gromacs and NAMD (see section A.3.4). These potentials are used when one desires
control over either the shape of the excluded volume or the width of the attractive
potential. They are also useful if contacts require minima in two places. In depth
characterization of the Gaussian potentials with all-atom structure-based models using
SMOG can be found in [12] (templates/SBM AA+gaussian). They are explored in the
context of a multi-basin Cα model here [13] (templates/SBM calpha+gaussian).
This template can be selected using the -AAgaussian option. Selecting this template
p
changes the contact potential to ftype = 6, C =?, r0 =?, σ= r02 /(50 ln 2), and enforces
Appendix A. Energetic formulation 54
12 = 12
rNC =0.21 nm. rNC NC × σNC , i.e. the non-native excluded volume term, with NC =
12 = 0.25 nm. The rather complex definition of the width of the Gaussian
0.1 and σNC
well σ is designed to model the variable width of the LJ potential: CLJ (1.2r0 ) ∼ −1/2
so σ is defined such that G(1.2r0 ) = −1/2 giving σ 2 = (r0 )2 /(50 ln 2).
12 = 0.25 nm.
• Note: v2.0 and earlier had rNC =0.17 nm, i.e. NC = 0.01 and σNC
This template can be selected using the -CAgaussian option. Selecting this template
changes the contact potential to ftype = 6, C =?, r0 =? nm, σ=0.05 nm, rNC =0.4 nm.
This is not yet implemented as a function in SMOG2, but the syntax of ftype= 7 can
be found on the SMOG webserver.
A.3.4.1 Gromacs
The Gaussian contact shapes are not available in the standard Gromacs distributions.
The necessary source code can be obtained at http://smog-server.org/SBMextension.html.
This source distribution is compiled exactly as any other Gromacs source distribution.
Appendix A. Energetic formulation 55
A.3.4.2 NAMD
Currently the “nightly build” version of NAMD contains the Gaussian potentials in the
“Go potentials” section. More information can be found in the NAMD manual.
A.3.5.1 Gromacs
The Gaussian interaction is designated in the [ pairs ] section of the topology file.
• ftype = 6
0 )2
12
1 rNC (rij −rij
– Cij (rij ) = −C 1+ 12
C rij 1 − exp − 2σ2 −1
ij
– 12
rNC = NC × 12 ,
σNC where NC and σNC come from the non-native excluded
volume c12 term.
– C → depth of the attractive well
– r0 → position of the minimum of the attractive well in nm
– σ → width of the attractive well in nm
12 → position of the excluded volume hard wall in nm12
– rNC
– This form includes an excluded volume part, and therefore the pair ij should
be included in [ exclusions ]. The multiplicative form anchors the mini-
mum of the well at (r0 , −C ).
Note that ftype = 5 and ftype = 7 exist in the SBM extensions version, though there
is no implementation in SMOG2 at the moment. They can be added by hand if desired.
A.3.5.2 NAMD
Currently the “nightly build” version of NAMD contains the Gaussian potentials in the
“Go potentials” section. More information can be found in the NAMD manual.
SMOG topology files are written in reduced units, meaning that the masses, energies,
and Boltzmann’s constant are unitless and given a value of unity. (Note that lengths
Appendix A. Energetic formulation 56
are written with units of nanometers for readability.) For example, this means that the
all-atom model gives a mass of 1 to all the atoms with weights near 12 amu and the
C-alpha model gives a mass of 1 for each coarse-grained residue bead. In such a scheme,
the physically meaningful temperature is then near 1 (i.e. kB T ≈ 1). Independent of
reduced units, interpreting the simulation temperatures used in SMOG models as real
world temperatures is problematic because of the effective energetics mixing energetic
and entropic terms in a simplified potential. Gromacs adds an additional caveat to
the expression of reduced temperatures because Gromacs uses a hard-coded value of
Boltzmann’s constant (kB ) of 0.00831451 in Gromacs units of kJ/mol/K. Thus, a reduced
temperature of 1 is expressed as 1/0.00831451 = 120.27 in a Gromacs .mdp file. You
will find that proteins will typically unfold at reduced temperatures of 1-1.3 or Gromacs
temperatures of 120-155. For a discussion of mixing externally known energetic terms
with the effective energetics in SMOG (or equivalently, trying to estimate the reduced
energy unit in physical units), see section S1 of reference [14].
Appendix B
B.1 Introduction
This section describes a Java application SCM.jar that computes the “Shadow” map,
a general contact definition for capturing the dynamics of biomolecular folding and
function. It is described in the literature here [12]. A contact map is a binary symmetric
matrix that encodes which atom pairs are given attractive interactions in the SBM
potential. In the context of a SBM, the native contact map should approximate the
distribution of stabilizing enthalpy in the native state that is provided by short range
interactions like van der Waals forces, hydrogen bonding, and salt bridges. Any long
range interactions or nonlocal effects are taken into account in a mean field way through
the native bias.
Internally SMOG 2 uses SCM.jar to compute contact maps. From the user’s point of
view the contact map can be of two types, all-atom or coarse-grained. An all-atom
map returns the atoms that are in contact based on the Shadow definition. A coarse-
grained map (e.g. to be used with the Calpha model) is created from an all-atom map.
The coarse-grained map consists of residue-level contacts. A residue-level contact exists
if there is at least one atom-atom contact between two residues. This is why a PDB
containing all heavy atoms is required by the tool. When coarse-graining SMOG2 asks
that the user provide an all-atom template in addition to the coarse-graining template
that tells SMOG2 how to interpret the all-atom PDB in order to interface with SCM.jar.
57
Appendix B. SCM.jar 58
Figure B.1: The Shadow contact map screening geometry. Only atoms within the
cutoff distance C are considered. Atoms 1 and 2 are in contact because they are within
C and have no intervening atom. To check if atoms 1 and 3 are in contact, one checks
if atom 2 shadows atom 1 from atom 3. The three atoms are viewed in the plane, and
all atoms are given the same shadowing radius S. Since a light shining from the center
of atom 1 causes a shadow to be cast on atom 3, atoms 1 and 3 are not in contact. See
Section B.1.2.
The tool is available with the SMOG2 source for users that want to create their own
customized maps. The rest of the chapter describes the basics of using the tool.
The coarse-grained contact map returned is only strictly recommended for use with
Calpha models of proteins, and where the input PDB has an all-atom representation.
For various modeling applications, it is desirable that the program not die with an error if
the PDB doesn’t only contain all-atom protein with each residue containing a CA atom.
Therefore, the behavior is that the program will choose one atom from each residue to
stand in as the representative coarse-grained position. It chooses, in order of preference:
CA, N1, first atom in the residue. This really only matters for the --distance option.
From the beginning, SCM.jar has had a bug where asin was replaced by atan in the
code calculating the tangents to the spheres (see Figure B.1). This went undetected
since the error is small when the distances between atoms is larger than the shadowing
size, roughly 3% error in the angle for a shadowing atom 4 Å away. This works out
to allow slightly more contacts. The option --correctedShadow (introduced in v2.3)
Appendix B. SCM.jar 59
implements the exact algorithm of Figure B.1, whereas SCM.jar without that switch
continues to use the “buggy” form.
If the user wishes to use --correctedShadow, but wants to create maps as close to the
“buggy” maps as possible, a good rule of thumb is to use a slightly smaller shadow size.
For example, using -m shadow -s 0.975 -c 6 for the CI2 protein (1YPA.pdb), gives
the same number of contacts (599) with 99.5% of the contacts identical.
Like any java application, no compilation is necessary, but a virtual machine is required;
SCM.jar requires a sufficiently recent JRE. SCM.jar reads SMOG formatted Gromacs
input files. Important! The all (heavy) atom geometry must be used, even if
the output will be a coarse-grained residue-based map for a Cα model. The atomic co-
ordinates are read in .gro format and the bond connectivity is read via a .top obtained
from the SMOG webtool (or source distribution). The topology is required since bonded
atoms shadow each other differently and since contacts are automatically discarded be-
tween two atoms if they share a bonded interaction. At the command line, the basic
syntax is
-Xmx1000m assigns 1000 MB of RAM to the Java virtual machine heap. With large
complexes (>1e5 atoms) the default heap allocation can run out which gives the following
error:
• Shadow map, atomic contacts, shadowing radius 1 Å and cutoff 6 Å (default sizes).
See Figure B.1 for definition of radius and cutoff. Add --chain if you have multiple
chains, since the .gro format does not allow for chain information. Specify the
chains file you get from your SMOG output.
• Same as above, but including the correction such that the algorithm exactly follows
the description in Figure B.1 (see Section B.1.2).
- OR -
C.1 Introduction
When installed Perl modules, it is often easiest to use CPAN, rather than manually
installing the RPMs. Here we provide an introduction to CPAN to get you started.
CPAN (“Comprehensive Perl Archive Network”) is a collection of over 100,000 Perl
modules ready to be installed. It is a great tool for easily adding perl modules or
updating your perl version.
Here we introduce installation for Linux and OSX. CPAN can be installed using the
command line. If you have root privileges and are using the system perl located at
/usr/bin/perl, then it’s easier to install via the package management system of your
Linux or OSX distribution. You can explore CPAN options by typing “?”.
RedHat/Fedora/CentOS Linux:
> sudo yum install perl-CPAN
Debian/Ubuntu Linux:
1. Install all dependent packages for CPAN:
> sudo apt-get install build-essential
2. Invoke the cpan command:
> cpan
62
Appendix C. Installing Perl Modules using CPAN 63
OSX on Mac:
Before you can use CPAN, you need to configure your Mac appropriately. CPAN uses
some low-level tools to install modules, therefore tools like “Xcode” or the “Command
Line Tools for Xcode package” are necessary. These are available at the Apple’s De-
veloper site. You should also have a terminal or shell application downloaded from the
App store.
From the terminal application type the following and hit enter:
> cpan
This will take you through the installation process with more potential questions. You
can enter “yes” for every question.
You can upgrade your CPAN and perl version if it is not up to date. This step is
recommended before installing any other modules.
1. In the CPAN application type:
> upgrade perl
2. If your perl or CPAN is not up to date, you can upgrade it:
> install CPAN
> reload cpan
Appendix C. Installing Perl Modules using CPAN 64
Here we show an example of how to install the Perl module String::Util using CPAN.
1. Open your CPAN application as root (recommended) and type in:
> install String::Util
2. You should see some output messages in your terminal. If the installation is successful,
you should see an output saying “Installation Complete” or “Build install OK”.
This means you can go ahead and start using the module within your perl scripts.
• Unsuccessful installation - Start by looking at the first error message that was out-
put to the screen. Most likely there are missing libraries that are required before
installing your module. It’s recommended that you install those missing libraries
Appendix C. Installing Perl Modules using CPAN 65
one by one, as instructed in the output messages. After installing the missing
libraries, try the installation of the modules again.
• Changing the Perl version linked to CPAN - as mentioned earlier, by default CPAN
works with the system perl located at /usr/bin/perl. It can sometimes help to
work with a different Perl version if your system perl version is damaged and you
cannot install modules properly. You can change which version CPAN sees by
replacing a single line in the CPAN executable script:
(i) Check the full path of CPAN executable by:
> which cpan
Usually the executable file is: /usr/local/bin/cpan
(ii) Open (as root) the executable file, for example using vi:
> vi /usr/local/bin/cpan
(iii) Change the following line (line n.3) to include the full path of the version
you’d like to use:
Save the changes in the file. The next time you use CPAN, all of the newly installed
modules will be linked to the new perl version.
Appendix D
In an effort to reduce the size of .tpr files, grompp tries to find parameters that are
common (e.g. contacts with the same c6 and c12 parameters). In SMOG models,
most contacts have unique parameters. This leads to a very large number of unique
parameters, which are then compared against all other parameters in the .top file. As
a result, grompp executes a nested for loop that has increasing bounds (essentially an
N 2 calculation). For small systems (less than ∼ 10, 000 atoms), this tends not to be
a problem. However, for systems of 100,000 atoms or greater, grompp can easily re-
quire hours to complete. Since this bookkeeping feature of grompp does not appear
to impact the performance of mdrun for SMOG models, it is desirable to disable this
parameter comparison. To disable this check, you can make a one-line modification
to the source code and rebuild Gromacs. For Gromacs v4.6.5, you will want to edit
src/kernel/convparm.c. Specifically, go to line 465, which reads:
if (!bAppend)
if (0)
For Gromacs 5.1.4, the corresponding code can be found on line 542 of
src/gromacs/gmxpreprocess/convparm.c
We have found that this modification results in a .tpr file that is 20-60% larger, though
grompp can be over 100 times faster.
66
Appendix D. FAQs and Tips 67
I used smog adjustPDB and when I run SMOG2 I get the error “FATAL ER-
ROR: It appears that a residue in the PDB file does not contain all of the
atoms defined in the .bif file.”. What is wrong?
Sometimes, this error arises because there were missing atoms in the PDB file (e.g.
some atoms may have not been resolved in the crystallographic model). If you are
convinced that all atoms are present, then there is another possibility. When you use
smog adjustPDB, it makes its best guesses for how residues should be named. For ex-
ample, a 5’-terminal RNA residue (say G) will be renamed with a 5 added to the residue
name (G5). In the default templates, G5 is defined as a residue that has a terminal
phosphate group, however most PDB files are lacking these atoms. The appropriate
name for a phosphate-less terminal G would be G0P (zero, not O). If this is the case,
you can provide your own mapping file for smog adjustPDB, or you can manually edit
the residue names in the pdb file.
Appendix E
Acknowledgements
We would like to thank everyone that has given feedback during the development of
SMOG 2.
We acknowledge Mark Howarth [15] for the protein structure alphabet used in the logo.
68
Bibliography
[1] Jeffrey K Noel and José N Onuchic. The many faces of structure-based poten-
tials: From protein folding landscapes to structural characterization of complex
biomolecules. Computational Modeling of Biological Systems, Springer US, pages
31–54, 2012.
[3] J Bryngelson and P Wolynes. Intermediates and barrier crossing in a random energy
model (with applications to protein folding). J. Phys. Chem., 93:6902–6915, 1989.
[5] C Clementi, H Nymeyer, and J N Onuchic. Topological and energetic factors: what
determines the structural details of the transition state ensemble and “en-route”
intermediates for protein folding? an investigation for small globular proteins. J.
Mol. Biol., 298(5):937–953, 2000.
[6] Paul C Whitford, Jeffrey K Noel, Shachi Gosavi, Alexander Schug, Kevin Y San-
bonmatsu, and José N Onuchic. An all-atom structure-based potential for proteins:
bridging minimal models with all-atom empirical forcefields. Proteins, 75(2):430–
441, 2009.
[7] Sander Pronk, Szilárd Páll, Roland Schulz, Per Larsson, Pär Bjelkmar, Rossen
Apostolov, Michael R Shirts, Jeremy C Smith, Peter M Kasson, David van der
Spoel, Berk Hess, and Erik Lindahl. Gromacs 4.5: a high-throughput and highly
parallel open source molecular simulation toolkit. Bioinformatics, 29(7):845–854,
2013.
[8] James C Phillips, Rosemary Braun, Wei Wang, James Gumbart, Emad Tajkhor-
shid, Elizabeth Villa, Christophe Chipot, Robert D Skeel, Laxmikant Kalé, and
69
Bibliography 70
[9] Jeffrey K Noel, Jorge Chahine, Vitor B P Leite, and Paul Charles Whitford. Cap-
turing transition paths and transition states for conformational rearrangements in
the ribosome. Biophys J, 107(12):2872–2881, December 2014.
[10] Ohad Givaty and Y Levy. Protein sliding along dna: dynamics and structural
characterization. J Mol Biol, 385(4):1087–97, Jan 2009. doi: 10.1016/j.jmb.2008.
11.016.
[11] Jeffrey K Noel, Mariana Levi, Mohit Raghunathan, Heiko Lammert, Ryan L Hayes,
Jose N Onuchic, and Paul C Whitford. SMOG 2: A Versatile Software Package for
Generating Structure-Based Models. PLoS Comp Biol, 12(3):e1004794, 2016.
[12] Jeffrey K Noel, Paul C Whitford, and José N Onuchic. The shadow map: a general
contact definition for capturing the dynamics of biomolecular folding and function.
J. Phys. Chem. B, 116(29):8692–8702, 2012.
[13] Heiko Lammert, Alexander Schug, and José N Onuchic. Robustness and general-
ization of structure-based models for protein folding and function. Proteins, 77(4):
881–891, 2009.
[14] Ryan L Hayes, Jeffrey K Noel, Paul C Whitford, Udayan Mohanty, Karissa Y San-
bonmatsu, and José N Onuchic. Reduced model captures Mg(2+)-RNA interaction
free energy of riboswitches. Biophysical Journal, 106(7):1508–1519, 2014.
[15] Mark Howarth. Say it with proteins: an alphabet of crystal structures. Nat Struct
Mol Biol, 22:349, 2015.