Parameterization with Dabble¶
Regardless of simulation program or force field, parameterizing a molecular system consists of several steps:
Parse the library of known residues provided by the forcefield
Match system residues to those in library
Assign residue and atom names based on matching
Invoke simulation package preparatory program to build input files.
Pretty much all simulation package prepatory programs require that atom names match the defined residue types, and cannot match up for you. This resuls in many awkward work arounds, such as large dictionaries of name translations for each amino acid from RSC PDB names to the required ones, and tedious work on the part of the user for ligands.
Dabble does all of this work for you, and is in fact completely agnostic to the atom names in the input file. It does this by representing each residue as an undirected graph, with nodes being atoms. A library of these graphs is generated from provided parameter and topology files, and graph isomorphism is used to match up system residues to those in the library based on element.
As protein residues are in a long chain, the graph representation is a little trickier. Using the same philosophy as the CHARMM and AMBER parameter sets, "special" atoms called + and - are attached to represent the next and previous amino acid in the chain. This does require that all protein residues are correctly represented in the input file, that is, each resid does correspond to an entire amino acid.
There are several tweaks to this process depending on the desired output format. For example, CHARMM has the concept of "patches" that can be applied on top of residues. Dabble can detect and apply these correctly.
The following API functions are called on the built system from Dabble's builder, but you can use them on systems you have already built as well, or extend them to other systems.
Automatic atom naming¶
Atom matching classes inherit from MoleculeMatcher
, which
defines basic functionality for matching:
- class dabble.param.MoleculeMatcher(**kwargs)¶
Bases:
ABC
Represents a collection of graphs of all residues defined in the topology files. Can pass in VMD molecules to be checked and atom names corrected
- topologies¶
The topology files this molecule graph knows about
- Type:
list of str
- known_res¶
The molecule graphs
- Type:
dict str resname -> networkx graph
- known_pres¶
- Type:
dict tuple (str resname, patchname) -> networkx graph
- patches¶
Known patches
- Type:
dict patchname -> str instructions
- nodenames¶
Translates atom names to elements
- Type:
dict name -> element
- pseudoatoms¶
Atom types for pseudo or dummy atoms
- Type:
list of str
- static get_element(mass)¶
Returns the element corresponding to a given mass. Masses are rounded up to get atomic number to control for varying numbers of decimal places. If the mass isn't in the table (it's not complete), returns "Other" which should be sufficient for our matching purposes.
- Parameters:
mass (float) -- Mass to look up
- Returns:
Element corresponding to the mass, or "Other"
- Return type:
(str)
- get_extraresidue_atoms(selection)¶
Determines if this selection contains bonds to other things. We
- Parameters:
selection (atomsel) -- Selection to check
- Returns:
- List of indices of atoms belonging to other
residues, or an empty list
- Return type:
(list of int)
- get_names(selection, print_warning=False)¶
Obtains a name mapping for the current selection
- Parameters:
selection (VMD atomsel) -- Selection to set names for
print_warning (bool) -- Whether or not to print matching suggestions if matching fails. Set to false if you'll try patches later.
- Returns:
(dict int->str) Atom index to resname matched (dict int->str) Atom index to atom name matched up
- Raises:
KeyError -- if no matching possible
- static parse_vmd_graph(selection)¶
Takes a VMD atom selection, translates it to a graph representation
- Parameters:
selection (VMD atomsel) -- Atom selection to verify, modified.
- Returns:
Intermediate representation of the molecule, and True if the selection is covalently bonded to another residue.
- Return type:
(networkx.Graph, bool)
- Raises:
ValueError if atom selection is more than one residue --
LookupError if the residue couldn't be found in known templates --
- static write_dot(graph, output)¶
Writes the residue as a dot file that can be rendered as a SVG to aid in debugging.
- Parameters:
graph (networkx Graph) -- Graph to draw
output (str) -- Filename to write to
Generation of simulation input files¶
Simulation input files are generated by a preparatory program specific to that package. Dabble invokes this program instead of generating files itself as functionality may change, and application of correct forcefield parameters can be challenging. Also, there is no reason to reinvent the wheel.
For CHARMM, Dabble invokes the VMD plugin psfgen, and for AMBER, tleap. Dabble generates an input file for these programs, invokes them, and checks the output for correctness.
The design philosophy here is that this should be accomplished by a
Writer
class, that when initialized with a VMD molecule ID corresponding
to the built system as well as parameter and/or topology files to be applied.
It invokes the atom matching, then generates a new molecule
with corrected atom and residue names. Invoking Writer.write
with an
output file name then generates an input file for and calls the setup program,
checking the output along the way.
Amber-specific Parameterization¶
Atom name matching in AMBER has functionality specific to the modular head/tail representation of lipids, as well as identification of disulfide bonds.
- class dabble.param.AmberMatcher(topologies)¶
Bases:
MoleculeMatcher
Matches up names and residues for molecules for use with leap and the amber stack of preparatory programs
- topologies¶
The topology files this molecule graph knows about. For amber, these are leaprc files.
- Type:
list of str
- known_res¶
The molecule graphs
- Type:
dict str resname -> networkx graph
- nodenames¶
Translates atom names to elements
- Type:
dict name -> element
- get_disulfide(selection, molid)¶
Checks if the selection corresponds to a cysteine in a disulfide bond. Sets the patch line appropriately and matches atom names using a subgraph match to the normal cysteine residue
- Parameters:
selection (VMD atomsel) -- Selection to check
molid (int) -- VMD molecule ID to look for other CYS in
- Returns:
resnames (dict int -> str) Residue name translation dictionary atomnames (dict int -> str) Atom name translation dictionary conect (int) Residue this one is connected to
- get_linkage(selection, molid)¶
Checks if the selection corresponds to a residue that is covalently bonded to some other residue other than the normal + or - peptide bonds. Sets the patch line (bond line for leap) appropriately and matches atom names using a maximal subgraph isomorphism to the normal residue.
- Parameters:
selection (VMD atomsel) -- Selection to check
molid (int) -- VMD molecule ID to look for other bonded residue in
- Returns:
resnames (dict int -> str) Residue name translation dictionary atomnames (dict int -> str) Atom name translation dictionary conect (str) Leap patch line to apply for this linkage
- get_lipid_head(selection)¶
Obtains a name mapping for a lipid head group given a selection describing a possible lipid.
- Parameters:
selection (VMD atomsel) -- Selection to set names for
- Returns:
(dict int->str) Atom index to resname matched (dict int->str) Atom index to atom name matched up (int) Atom index corresponding to - direction tail
- Raises:
KeyError -- if no matching possible
- get_lipid_tails(selection, head)¶
Obtains a name mapping for both ligand tails in a system given a selection describing the lipid and the indices of the head group atoms.
- Parameters:
selection (VMD atomsel) -- Selection to pull tails from
head (list of int) -- Atom indices in the head group of this lipid. Obtain with get_lipid_head function.
- Returns:
- Atom index to
resname matched, atom index to atom name translation dictionaries for both tails
- Return type:
(array of tuples that are dict int->str)
- Raises:
ValueError -- If a tail could not be matched or if there is an incorrect number of tails somehow attached.
- get_names(selection, print_warning=False)¶
Obtains a name mapping for the current selection. Can't use parent here since name is stored in a different field.
- Parameters:
selection (VMD atomsel) -- Selection to set names for
print_warning (bool) -- Whether or not to print matching suggestions if matching fails. Set to false if you'll try patches later.
- Returns:
(dict int->str) Atom index to resname matched (dict int->str) Atom index to atom name matched up
- Raises:
KeyError -- if no matching possible
- get_unit(selection)¶
Gets the UNIT name corresponding to a selection. Helpful when matching ligands that may have a residue name different from unit name since the average user doesn't know what a unit name is when creating a library. This isn't as efficient as it could be because it is called infrequently, on ligands only.
- Parameters:
selection (VMD atomsel) -- Selection to check
molid (int) -- VMD molecule ID to look in
- Returns:
(str) The unit name, or None if there was no match
The AmberWriter
class requires an installation of AmberTools in
$AMBERHOME
, enabling the use of the most up to date forcefields and
bugfixes. The following parameter and topology sets are used by default:
- leaprc.protein.ff14SB
- leaprc.lipid14
- leaprc.lipid16
- leaprc.water.tip3p
- leaprc.gaff2
The tleap program is used to do the parameterization, and the ParmEd API then applies HMR if desired, and checks the result.
If the user is asking for input files for AMBER using CHARMM parameters,
the AmberWriter
class can also take the molecule ID of a
CHARMM-parameterized molecule, and invoke chamber
through the ParmEd
API to convert to the AMBER format.
- class dabble.param.AmberWriter(molid, **kwargs)¶
Bases:
MoleculeWriter
Creates AMBER-format input files, using either CHARMM or AMBER parameters.
When using the CHARMM parameters, creates a psf/pdb file and interfaces with ParmEd chamber command to create AMBER format files. When using the AMBER parameters, creates a pdb file and runs the amber lipid conversion script to create leap input files.
- write(filename)¶
Writes the parameter and topology files
- Parameters:
filename (str) -- File name to write. File type suffix will be added.
Charmm-specific Parameterization¶
Atom name matching in CHARMM is somewhat complicated by the fact that patches are allowed. Patches represent modifications to residues, and may be applied to any residue although in reality are only chemically valid for a subset of them. An example of a patch is the addition of a disulfide bond between two residues (DISU), or a cap on the N-terminal end of the protein (NTER, ACE).
If a residue cannot be matched from the provided topology definitions, Dabble's graph matching functionality generates a library of valid patch applications and attempts to match there, adding the relevant patches if a match is found. Subgraph isomorphism is used to help trim the size of this library.
With the exception of disulfide bonds, patches involving only one residue are currently supported.
Note that this requires patch atoms to be in the same residue. This can pose a challenge to recognition of caps, which are applied as separate residues in some protein preparation workflows (notably Maestro).
- class dabble.param.CharmmMatcher(topologies)¶
Bases:
MoleculeMatcher
Represents a collection of graphs of all residues defined in the topology files. Can pass in VMD molecules to be checked and atom names corrected
- topologies¶
The topology files this molecule graph knows about, from parent class
- Type:
list of str
- known_res¶
The molecule graphs, from parent class
- Type:
dict str resname -> networkx graph
- nodenames¶
Translates atom names to elements, from parent class
- Type:
dict name -> element
- known_pres¶
- Type:
dict tuple (str resname, patchname) -> networkx graph
- patches¶
Known patches
- Type:
dict patchname -> str instructions
- get_disulfide(selstring, molid)¶
Checks if the selection corresponds to a cysteine in a disulfide bond. Sets the patch line appropriately and matches atom names using a subgraph match to the normal cysteine residue
- Parameters:
selstring (str) -- Selection to check
molid (int) -- VMD molecule of entire system (needed for disu partner)
- Returns:
- (str, Patch, dict) resname matched, patch object for psfgen,
name translation dictionary
- get_names(selection, print_warning=False)¶
Returns at atom name matching up dictionary. Does the generic moleculematcher algorithm then checks that only one resname matched since for CHARMM there is no concept of a unit and only one named residue is defined per topology.
- Parameters:
selection (VMD atomsel) -- Selection to rename
print_warning (bool) -- Debug output
- Returns:
(str) resname matched (dict int->str) translation dictionary from index to atom name
- Raises:
ValueError if more than one residue name is matched --
- get_patches(selection)¶
Obtains names and patch info for a modified residue in a selection. Identifies which amino acid is patched by finding which amino acid this one is a maximal subgraph of. Then, builds a library of graphs representing all valid patches applied to this amino acid.
Note that this does NOT handle multiple-residue patches such as disulfides!
- Parameters:
selection (VMD atomsel) -- Selection that is patched
- Returns:
- (str, str, dict) resname matched, patch name applied,
name translation dictionary
- get_protein_segname(molid, fragment)¶
Gets the segment name from a given protein fragment. Sometimes fragment numbers for protein can be quite large if there are a lot of other molecules in the system. This method returns a consistent segment name for a given protein fragment.
The VMD plugin psfgen is used to generate parameterized files with CHARMM parameters. Dabble uses the Python interface to psfgen to invoke this code, and handles all its quirks (such as not allowing segments to have more than 10,000 atoms because its PDB parser is the worst).
- class dabble.param.CharmmWriter(molid, **kwargs)¶
Bases:
MoleculeWriter
An object that handles all the conversions to a psf file by interfacing with psfgen.
Writes a pdb/psf file pair from the current molecule using the CHARMM36 topology and atom names/types. Interfaces with psfgen by dynamically generating the .tcl file that psfgen takes as input. Prompts the user for additional topology files and helps with matching atom names that cannot be automatically translated to the charmm naming conventions.
- write(filename)¶
Writes the parameter and topology files
- Parameters:
filename (str) -- File name to write. File type suffix will be added.