Parameterization with Dabble¶
Regardless of simulation program or force field, parameterizing a molecular system consists of several steps:
- Parse the library of known residues provided by the forcefield
- Match system residues to those in library
- Assign residue and atom names based on matching
- Invoke simulation package preparatory program to build input files.
Pretty much all simulation package prepatory programs require that atom names match the defined residue types, and cannot match up for you. This resuls in many awkward work arounds, such as large dictionaries of name translations for each amino acid from RSC PDB names to the required ones, and tedious work on the part of the user for ligands.
Dabble does all of this work for you, and is in fact completely agnostic to the atom names in the input file. It does this by representing each residue as an undirected graph, with nodes being atoms. A library of these graphs is generated from provided parameter and topology files, and graph isomorphism is used to match up system residues to those in the library based on element.
As protein residues are in a long chain, the graph representation is a little trickier. Using the same philosophy as the CHARMM and AMBER parameter sets, "special" atoms called + and - are attached to represent the next and previous amino acid in the chain. This does require that all protein residues are correctly represented in the input file, that is, each resid does correspond to an entire amino acid.
There are several tweaks to this process depending on the desired output format. For example, CHARMM has the concept of "patches" that can be applied on top of residues. Dabble can detect and apply these correctly.
The following API functions are called on the built system from Dabble's builder, but you can use them on systems you have already built as well, or extend them to other systems.
Automatic atom naming¶
Atom matching classes inherit from MoleculeMatcher
, which
defines basic functionality for matching:
-
class
Dabble.param.
MoleculeMatcher
(**kwargs)¶ Bases:
object
Represents a collection of graphs of all residues defined in the topology files. Can pass in VMD molecules to be checked and atom names corrected
-
topologies
¶ list of str -- The topology files this molecule graph knows about
-
known_res
¶ dict str resname -> networkx graph -- The molecule graphs
-
known_pres
¶ dict tuple (str resname, patchname) -> networkx graph
-
patches
¶ dict patchname -> str instructions -- Known patches
-
nodenames
¶ dict name -> element -- Translates atom names to elements
-
static
get_element
(mass)¶ Returns the element corresponding to a given mass. Masses are rounded up to get atomic number to control for varying numbers of decimal places. If the mass isn't in the table (it's not complete), returns "Other" which should be sufficient for our matching purposes.
Parameters: mass (float) -- Mass to look up Returns: Element corresponding to the mass, or "Other" Return type: (str)
-
get_extraresidue_atoms
(selection)¶ Determines if this selection contains bonds to other things. We
Parameters: selection (atomsel) -- Selection to check Returns: - List of indices of atoms belonging to other
- residues, or an empty list
Return type: (list of int)
-
get_names
(selection, print_warning=False)¶ Obtains a name mapping for the current selection
Parameters: - selection (VMD atomsel) -- Selection to set names for
- print_warning (bool) -- Whether or not to print matching suggestions if matching fails. Set to false if you'll try patches later.
Returns: (dict int->str) Atom index to resname matched (dict int->str) Atom index to atom name matched up
Raises: KeyError
-- if no matching possible
-
static
parse_vmd_graph
(selection)¶ Takes a VMD atom selection, translates it to a graph representation
Parameters: selection (VMD atomsel) -- Atom selection to verify, modified.
Returns: graph representing the molecule, with nodes named indices bool representing if the selection is covalently bonded to
another residue
Raises: - ValueError if atom selection is more than one residue
- LookupError if the residue couldn't be found in known templates
-
Generation of simulation input files¶
Simulation input files are generated by a preparatory program specific to that package. Dabble invokes this program instead of generating files itself as functionality may change, and application of correct forcefield parameters can be challenging. Also, there is no reason to reinvent the wheel.
For CHARMM, Dabble invokes the VMD plugin psfgen, and for AMBER, tleap. Dabble generates an input file for these programs, invokes them, and checks the output for correctness.
The design philosophy here is that this should be accomplished by a
Writer
class, that when initialized with a VMD molecule ID corresponding
to the built system as well as parameter and/or topology files to be applied.
It invokes the atom matching, then generates a new molecule
with corrected atom and residue names. Invoking Writer.write
with an
output file name then generates an input file for and calls the setup program,
checking the output along the way.
Amber-specific Parameterization¶
Atom name matching in AMBER has functionality specific to the modular head/tail representation of lipids, as well as identification of disulfide bonds.
-
class
Dabble.param.
AmberMatcher
(topologies)¶ Bases:
Dabble.param.moleculematcher.MoleculeMatcher
Matches up names and residues for molecules for use with leap and the amber stack of preparatory programs
-
topologies
¶ list of str -- The topology files this molecule graph knows about. For amber, these are leaprc files.
-
known_res
¶ dict str resname -> networkx graph -- The molecule graphs
-
nodenames
¶ dict name -> element -- Translates atom names to elements
-
get_disulfide
(selection, molid)¶ Checks if the selection corresponds to a cysteine in a disulfide bond. Sets the patch line appropriately and matches atom names using a subgraph match to the normal cysteine residue
Parameters: - selection (VMD atomsel) -- Selection to check
- molid (int) -- VMD molecule ID to look for other CYS in
Returns: resnames (dict int -> str) Residue name translation dictionary atomnames (dict int -> str) Atom name translation dictionary conect (int) Residue this one is connected to
-
get_linkage
(selection, molid)¶ Checks if the selection corresponds to a residue that is covalently bonded to some other residue other than the normal + or - peptide bonds. Sets the patch line (bond line for leap) appropriately and matches atom names using a maximal subgraph isomorphism to the normal residue.
Parameters: - selection (VMD atomsel) -- Selection to check
- molid (int) -- VMD molecule ID to look for other bonded residue in
Returns: resnames (dict int -> str) Residue name translation dictionary atomnames (dict int -> str) Atom name translation dictionary conect (str) Leap patch line to apply for this linkage
-
get_lipid_head
(selection)¶ Obtains a name mapping for a lipid head group given a selection describing a possible lipid.
Parameters: selection (VMD atomsel) -- Selection to set names for Returns: (dict int->str) Atom index to resname matched (dict int->str) Atom index to atom name matched up (int) Atom index corresponding to - direction tail Raises: KeyError
-- if no matching possible
-
get_lipid_tails
(selection, head)¶ Obtains a name mapping for both ligand tails in a system given a selection describing the lipid and the indices of the head group atoms.
Parameters: - selection (VMD atomsel) -- Selection to pull tails from
- head (list of int) -- Atom indices in the head group of this lipid. Obtain with get_lipid_head function.
Returns: - Atom index to
resname matched, atom index to atom name translation dictionaries for both tails
Return type: (array of tuples that are dict int->str)
Raises: ValueError
-- If a tail could not be matched or if there is an incorrect number of tails somehow attached.
-
get_names
(selection, print_warning=False)¶ Obtains a name mapping for the current selection. Can't use parent here since name is stored in a different field.
Parameters: - selection (VMD atomsel) -- Selection to set names for
- print_warning (bool) -- Whether or not to print matching suggestions if matching fails. Set to false if you'll try patches later.
Returns: (dict int->str) Atom index to resname matched (dict int->str) Atom index to atom name matched up
Raises: KeyError
-- if no matching possible
-
get_unit
(selection)¶ Gets the UNIT name corresponding to a selection. Helpful when matching ligands that may have a residue name different from unit name since the average user doesn't know what a unit name is when creating a library. This isn't as efficient as it could be because it is called infrequently, on ligands only.
Parameters: - selection (VMD atomsel) -- Selection to check
- molid (int) -- VMD molecule ID to look in
Returns: (str) The unit name, or None if there was no match
-
The AmberWriter
class requires an installation of AmberTools in
$AMBERHOME
, enabling the use of the most up to date forcefields and
bugfixes. The following parameter and topology sets are used by default:
- leaprc.protein.ff14SB
- leaprc.lipid14
- leaprc.lipid16
- leaprc.water.tip3p
- leaprc.gaff2
The tleap program is used to do the parameterization, and the ParmEd API then applies HMR if desired, and checks the result.
If the user is asking for input files for AMBER using CHARMM parameters,
the AmberWriter
class can also take the molecule ID of a
CHARMM-parameterized molecule, and invoke chamber
through the ParmEd
API to convert to the AMBER format.
-
class
Dabble.param.
AmberWriter
(molid, **kwargs)¶ Bases:
object
Creates AMBER-format input files, using either CHARMM or AMBER parameters.
When using the CHARMM parameters, creates a psf/pdb file and interfaces with ParmEd chamber command to create AMBER format files. When using the AMBER parameters, creates a pdb file and runs the amber lipid conversion script to create leap input files.
-
write
(prmtop_name)¶ Creates a prmtop with either AMBER or CHARMM parameters.
-
Charmm-specific Parameterization¶
Atom name matching in CHARMM is somewhat complicated by the fact that patches are allowed. Patches represent modifications to residues, and may be applied to any residue although in reality are only chemically valid for a subset of them. An example of a patch is the addition of a disulfide bond between two residues (DISU), or a cap on the N-terminal end of the protein (NTER, ACE).
If a residue cannot be matched from the provided topology definitions, Dabble's graph matching functionality generates a library of valid patch applications and attempts to match there, adding the relevant patches if a match is found. Subgraph isomorphism is used to help trim the size of this library.
With the exception of disulfide bonds, patches involving only one residue are currently supported.
Note that this requires patch atoms to be in the same residue. This can pose a challenge to recognition of caps, which are applied as separate residues in some protein preparation workflows (notably Maestro).
-
class
Dabble.param.
CharmmMatcher
(topologies)¶ Bases:
Dabble.param.moleculematcher.MoleculeMatcher
Represents a collection of graphs of all residues defined in the topology files. Can pass in VMD molecules to be checked and atom names corrected
-
topologies
¶ list of str -- The topology files this molecule graph knows about, from parent class
-
known_res
¶ dict str resname -> networkx graph -- The molecule graphs, from parent class
-
nodenames
¶ dict name -> element -- Translates atom names to elements, from parent class
-
known_pres
¶ dict tuple (str resname, patchname) -> networkx graph
-
patches
¶ dict patchname -> str instructions -- Known patches
-
get_disulfide
(selstring, fragment, molid)¶ Checks if the selection corresponds to a cysteine in a disulfide bond. Sets the patch line appropriately and matches atom names using a subgraph match to the normal cysteine residue
Parameters: - selstring (str) -- Selection to check
- fragment (str) -- Fragment ID (to narrow down selection)
- molid (int) -- VMD molecule of entire system (needed for disu partner)
Returns: - (str, Patch, dict) resname matched, patch object for psfgen,
name translation dictionary
-
get_names
(selection, print_warning=False)¶ Returns at atom name matching up dictionary. Does the generic moleculematcher algorithm then checks that only one resname matched since for CHARMM there is no concept of a unit and only one named residue is defined per topology.
Parameters: - selection (VMD atomsel) -- Selection to rename
- print_warning (bool) -- Debug output
Returns: (str) resname matched (dict int->str) translation dictionary from index to atom name
Raises: ValueError if more than one residue name is matched
-
get_patches
(selection)¶ Obtains names and patch info for a modified residue in a selection. Identifies which amino acid is patched by finding which amino acid this one is a maximal subgraph of. Then, builds a library of graphs representing all valid patches applied to this amino acid.
Note that this does NOT handle multiple-residue patches such as disulfides!
Parameters: selection (VMD atomsel) -- Selection that is patched Returns: - (str, str, dict) resname matched, patch name applied,
- name translation dictionary
-
The VMD plugin psfgen is used to generate parameterized files with CHARMM parameters. Dabble writes an input file for psfgen, and handles all its quirks (such as not allowing segments to have more than 10,000 atoms because its PDB parser is the worst).
-
class
Dabble.param.
CharmmWriter
(tmp_dir, molid, lipid_sel='lipid', **kwargs)¶ Bases:
object
An object that handles all the conversions to a psf file by interfacing with psfgen.
Writes a pdb/psf file pair from the current molecule using the CHARMM36 topology and atom names/types. Interfaces with psfgen by dynamically generating the .tcl file that psfgen takes as input. Prompts the user for additional topology files and helps with matching atom names that cannot be automatically translated to the charmm naming conventions.
-
file
¶ file handle -- Temporary file to write TCL script that invokes psfgen
-
tmp_dir
¶ string -- Directory where temporary files are stored
-
psf_name
¶ str -- Prefix for the pdb/psf output files, extension will be appended
-
molid
¶ str,optional -- the VMD molecule id to write. Defaults to 0.
-
lipid_sel
¶ str,optional -- atomselect string describing what should count as "lipid". Defaults to "lipid"
-
topologies
¶ list of str -- Topology files that were used in creating the psf
-
prompt_topos
¶ bool -- Whether to ask for more topology files
-
matcher
¶ CharmmMatcher -- Molecular graph matcher object
-
write
(psf_name)¶ Writes the pdb/psf file.
Parameters: psf_name (str) -- Prefix for the pdb/psf output files, extension will be appended Returns: - Topology files that were used in creating
- the psf
Return type: topologies (list of str)
-