Parameterization with Dabble

Regardless of simulation program or force field, parameterizing a molecular system consists of several steps:

  1. Parse the library of known residues provided by the forcefield

  2. Match system residues to those in library

  3. Assign residue and atom names based on matching

  4. Invoke simulation package preparatory program to build input files.

Pretty much all simulation package prepatory programs require that atom names match the defined residue types, and cannot match up for you. This resuls in many awkward work arounds, such as large dictionaries of name translations for each amino acid from RSC PDB names to the required ones, and tedious work on the part of the user for ligands.

Dabble does all of this work for you, and is in fact completely agnostic to the atom names in the input file. It does this by representing each residue as an undirected graph, with nodes being atoms. A library of these graphs is generated from provided parameter and topology files, and graph isomorphism is used to match up system residues to those in the library based on element.

As protein residues are in a long chain, the graph representation is a little trickier. Using the same philosophy as the CHARMM and AMBER parameter sets, "special" atoms called + and - are attached to represent the next and previous amino acid in the chain. This does require that all protein residues are correctly represented in the input file, that is, each resid does correspond to an entire amino acid.

There are several tweaks to this process depending on the desired output format. For example, CHARMM has the concept of "patches" that can be applied on top of residues. Dabble can detect and apply these correctly.

The following API functions are called on the built system from Dabble's builder, but you can use them on systems you have already built as well, or extend them to other systems.

Automatic atom naming

Atom matching classes inherit from MoleculeMatcher, which defines basic functionality for matching:

class dabble.param.MoleculeMatcher(**kwargs)

Bases: ABC

Represents a collection of graphs of all residues defined in the topology files. Can pass in VMD molecules to be checked and atom names corrected

topologies

The topology files this molecule graph knows about

Type:

list of str

known_res

The molecule graphs

Type:

dict str resname -> networkx graph

known_pres
Type:

dict tuple (str resname, patchname) -> networkx graph

patches

Known patches

Type:

dict patchname -> str instructions

nodenames

Translates atom names to elements

Type:

dict name -> element

pseudoatoms

Atom types for pseudo or dummy atoms

Type:

list of str

static get_element(mass)

Returns the element corresponding to a given mass. Masses are rounded up to get atomic number to control for varying numbers of decimal places. If the mass isn't in the table (it's not complete), returns "Other" which should be sufficient for our matching purposes.

Parameters:

mass (float) -- Mass to look up

Returns:

Element corresponding to the mass, or "Other"

Return type:

(str)

get_extraresidue_atoms(selection)

Determines if this selection contains bonds to other things. We

Parameters:

selection (atomsel) -- Selection to check

Returns:

List of indices of atoms belonging to other

residues, or an empty list

Return type:

(list of int)

get_names(selection, print_warning=False)

Obtains a name mapping for the current selection

Parameters:
  • selection (VMD atomsel) -- Selection to set names for

  • print_warning (bool) -- Whether or not to print matching suggestions if matching fails. Set to false if you'll try patches later.

Returns:

(dict int->str) Atom index to resname matched (dict int->str) Atom index to atom name matched up

Raises:

KeyError -- if no matching possible

static parse_vmd_graph(selection)

Takes a VMD atom selection, translates it to a graph representation

Parameters:

selection (VMD atomsel) -- Atom selection to verify, modified.

Returns:

Intermediate representation of the molecule, and True if the selection is covalently bonded to another residue.

Return type:

(networkx.Graph, bool)

Raises:
  • ValueError if atom selection is more than one residue --

  • LookupError if the residue couldn't be found in known templates --

static write_dot(graph, output)

Writes the residue as a dot file that can be rendered as a SVG to aid in debugging.

Parameters:
  • graph (networkx Graph) -- Graph to draw

  • output (str) -- Filename to write to

Generation of simulation input files

Simulation input files are generated by a preparatory program specific to that package. Dabble invokes this program instead of generating files itself as functionality may change, and application of correct forcefield parameters can be challenging. Also, there is no reason to reinvent the wheel.

For CHARMM, Dabble invokes the VMD plugin psfgen, and for AMBER, tleap. Dabble generates an input file for these programs, invokes them, and checks the output for correctness.

The design philosophy here is that this should be accomplished by a Writer class, that when initialized with a VMD molecule ID corresponding to the built system as well as parameter and/or topology files to be applied. It invokes the atom matching, then generates a new molecule with corrected atom and residue names. Invoking Writer.write with an output file name then generates an input file for and calls the setup program, checking the output along the way.

Amber-specific Parameterization

Atom name matching in AMBER has functionality specific to the modular head/tail representation of lipids, as well as identification of disulfide bonds.

class dabble.param.AmberMatcher(topologies)

Bases: MoleculeMatcher

Matches up names and residues for molecules for use with leap and the amber stack of preparatory programs

topologies

The topology files this molecule graph knows about. For amber, these are leaprc files.

Type:

list of str

known_res

The molecule graphs

Type:

dict str resname -> networkx graph

nodenames

Translates atom names to elements

Type:

dict name -> element

get_disulfide(selection, molid)

Checks if the selection corresponds to a cysteine in a disulfide bond. Sets the patch line appropriately and matches atom names using a subgraph match to the normal cysteine residue

Parameters:
  • selection (VMD atomsel) -- Selection to check

  • molid (int) -- VMD molecule ID to look for other CYS in

Returns:

resnames (dict int -> str) Residue name translation dictionary atomnames (dict int -> str) Atom name translation dictionary conect (int) Residue this one is connected to

get_linkage(selection, molid)

Checks if the selection corresponds to a residue that is covalently bonded to some other residue other than the normal + or - peptide bonds. Sets the patch line (bond line for leap) appropriately and matches atom names using a maximal subgraph isomorphism to the normal residue.

Parameters:
  • selection (VMD atomsel) -- Selection to check

  • molid (int) -- VMD molecule ID to look for other bonded residue in

Returns:

resnames (dict int -> str) Residue name translation dictionary atomnames (dict int -> str) Atom name translation dictionary conect (str) Leap patch line to apply for this linkage

get_lipid_head(selection)

Obtains a name mapping for a lipid head group given a selection describing a possible lipid.

Parameters:

selection (VMD atomsel) -- Selection to set names for

Returns:

(dict int->str) Atom index to resname matched (dict int->str) Atom index to atom name matched up (int) Atom index corresponding to - direction tail

Raises:

KeyError -- if no matching possible

get_lipid_tails(selection, head)

Obtains a name mapping for both ligand tails in a system given a selection describing the lipid and the indices of the head group atoms.

Parameters:
  • selection (VMD atomsel) -- Selection to pull tails from

  • head (list of int) -- Atom indices in the head group of this lipid. Obtain with get_lipid_head function.

Returns:

Atom index to

resname matched, atom index to atom name translation dictionaries for both tails

Return type:

(array of tuples that are dict int->str)

Raises:

ValueError -- If a tail could not be matched or if there is an incorrect number of tails somehow attached.

get_names(selection, print_warning=False)

Obtains a name mapping for the current selection. Can't use parent here since name is stored in a different field.

Parameters:
  • selection (VMD atomsel) -- Selection to set names for

  • print_warning (bool) -- Whether or not to print matching suggestions if matching fails. Set to false if you'll try patches later.

Returns:

(dict int->str) Atom index to resname matched (dict int->str) Atom index to atom name matched up

Raises:

KeyError -- if no matching possible

get_unit(selection)

Gets the UNIT name corresponding to a selection. Helpful when matching ligands that may have a residue name different from unit name since the average user doesn't know what a unit name is when creating a library. This isn't as efficient as it could be because it is called infrequently, on ligands only.

Parameters:
  • selection (VMD atomsel) -- Selection to check

  • molid (int) -- VMD molecule ID to look in

Returns:

(str) The unit name, or None if there was no match

The AmberWriter class requires an installation of AmberTools in $AMBERHOME, enabling the use of the most up to date forcefields and bugfixes. The following parameter and topology sets are used by default:

- leaprc.protein.ff14SB
- leaprc.lipid14
- leaprc.lipid16
- leaprc.water.tip3p
- leaprc.gaff2

The tleap program is used to do the parameterization, and the ParmEd API then applies HMR if desired, and checks the result.

If the user is asking for input files for AMBER using CHARMM parameters, the AmberWriter class can also take the molecule ID of a CHARMM-parameterized molecule, and invoke chamber through the ParmEd API to convert to the AMBER format.

class dabble.param.AmberWriter(molid, **kwargs)

Bases: MoleculeWriter

Creates AMBER-format input files, using either CHARMM or AMBER parameters.

When using the CHARMM parameters, creates a psf/pdb file and interfaces with ParmEd chamber command to create AMBER format files. When using the AMBER parameters, creates a pdb file and runs the amber lipid conversion script to create leap input files.

write(filename)

Writes the parameter and topology files

Parameters:

filename (str) -- File name to write. File type suffix will be added.

Charmm-specific Parameterization

Atom name matching in CHARMM is somewhat complicated by the fact that patches are allowed. Patches represent modifications to residues, and may be applied to any residue although in reality are only chemically valid for a subset of them. An example of a patch is the addition of a disulfide bond between two residues (DISU), or a cap on the N-terminal end of the protein (NTER, ACE).

If a residue cannot be matched from the provided topology definitions, Dabble's graph matching functionality generates a library of valid patch applications and attempts to match there, adding the relevant patches if a match is found. Subgraph isomorphism is used to help trim the size of this library.

With the exception of disulfide bonds, patches involving only one residue are currently supported.

Note that this requires patch atoms to be in the same residue. This can pose a challenge to recognition of caps, which are applied as separate residues in some protein preparation workflows (notably Maestro).

class dabble.param.CharmmMatcher(topologies)

Bases: MoleculeMatcher

Represents a collection of graphs of all residues defined in the topology files. Can pass in VMD molecules to be checked and atom names corrected

topologies

The topology files this molecule graph knows about, from parent class

Type:

list of str

known_res

The molecule graphs, from parent class

Type:

dict str resname -> networkx graph

nodenames

Translates atom names to elements, from parent class

Type:

dict name -> element

known_pres
Type:

dict tuple (str resname, patchname) -> networkx graph

patches

Known patches

Type:

dict patchname -> str instructions

get_disulfide(selstring, molid)

Checks if the selection corresponds to a cysteine in a disulfide bond. Sets the patch line appropriately and matches atom names using a subgraph match to the normal cysteine residue

Parameters:
  • selstring (str) -- Selection to check

  • molid (int) -- VMD molecule of entire system (needed for disu partner)

Returns:

(str, Patch, dict) resname matched, patch object for psfgen,

name translation dictionary

get_names(selection, print_warning=False)

Returns at atom name matching up dictionary. Does the generic moleculematcher algorithm then checks that only one resname matched since for CHARMM there is no concept of a unit and only one named residue is defined per topology.

Parameters:
  • selection (VMD atomsel) -- Selection to rename

  • print_warning (bool) -- Debug output

Returns:

(str) resname matched (dict int->str) translation dictionary from index to atom name

Raises:

ValueError if more than one residue name is matched --

get_patches(selection)

Obtains names and patch info for a modified residue in a selection. Identifies which amino acid is patched by finding which amino acid this one is a maximal subgraph of. Then, builds a library of graphs representing all valid patches applied to this amino acid.

Note that this does NOT handle multiple-residue patches such as disulfides!

Parameters:

selection (VMD atomsel) -- Selection that is patched

Returns:

(str, str, dict) resname matched, patch name applied,

name translation dictionary

get_protein_segname(molid, fragment)

Gets the segment name from a given protein fragment. Sometimes fragment numbers for protein can be quite large if there are a lot of other molecules in the system. This method returns a consistent segment name for a given protein fragment.

The VMD plugin psfgen is used to generate parameterized files with CHARMM parameters. Dabble uses the Python interface to psfgen to invoke this code, and handles all its quirks (such as not allowing segments to have more than 10,000 atoms because its PDB parser is the worst).

class dabble.param.CharmmWriter(molid, **kwargs)

Bases: MoleculeWriter

An object that handles all the conversions to a psf file by interfacing with psfgen.

Writes a pdb/psf file pair from the current molecule using the CHARMM36 topology and atom names/types. Interfaces with psfgen by dynamically generating the .tcl file that psfgen takes as input. Prompts the user for additional topology files and helps with matching atom names that cannot be automatically translated to the charmm naming conventions.

write(filename)

Writes the parameter and topology files

Parameters:

filename (str) -- File name to write. File type suffix will be added.