trajinfo.info_single#

Classes to work with ensembles of trajectories.

The statistics of a protein are often better described by an ensemble of trajectories than by a single long trajectory. Treating the data as an ensemble changes how molecular dynamics data can be handled. Trajectory ensembles allow:

Faster convergence via adaptive sampling.

Better anomaly detection of unique structural states.

This subpackage contains two classes which are containers of trajectory data. The SingleTraj class contains information about a single trajectory. The TrajEnsemble class contains information about multiple trajectories. This adds a new dimension to MD data. The time and atom dimensions are already established. Two frames can be appended along the time axis to get a trajectory with multiple frames. If they are appended along the atom axis, the new frame contains the atoms of both. The trajectory axis works in a similar fashion. Adding two trajectories along the trajectory axis returns a trajectory ensemble, represented as a TrajEnsemble class in this package.
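The three axes described above can be illustrated with a minimal sketch. It uses plain nested lists as stand-ins for coordinate data and does not use SingleTraj or TrajEnsemble; all variable names are made up for illustration.

```python
# Illustrative only: nested lists stand in for coordinate arrays.
# A frame is a list of atoms, each atom is an (x, y, z) tuple.
frame_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]  # 2 atoms
frame_b = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)]  # the same 2 atoms at a later time

# Time axis: appending frames yields a trajectory with more frames.
traj_1 = [frame_a, frame_b]  # 2 frames x 2 atoms

# Atom axis: appending frames yields one frame holding the atoms of both.
merged_frame = frame_a + frame_b  # 1 frame x 4 atoms

# Trajectory axis: appending trajectories yields an ensemble.
traj_2 = [frame_b]  # 1 frame x 2 atoms
ensemble = [traj_1, traj_2]  # 2 trajectories of different length

# Trajectories in an ensemble need not have the same number of frames.
n_frames_total = sum(len(traj) for traj in ensemble)
```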

trajinfo.trajinfo_utils#

Util functions for the TrajEnsemble and SingleTraj classes.

class Bond(resname, type, atom1, atom2)[source]#

Dataclass that contains information about an atomic bond.

Parameters:

resname (str)
type (Literal['add', 'delete', 'optional', 'optional_delete'])
atom1 (str | int)
atom2 (str | int)

resname#

The name of the residue this bond belongs to. Although bonds belong to residues, atom1 or atom2 can also belong to a different residue.

Type:: str

type#

Defines what should be done with this bond. ‘add’ adds it to the topology and raises an Exception if the bond is already present. ‘optional’ does the same as ‘add’, but without raising an Exception. ‘delete’ deletes this bond from the topology and raises an Exception if the bond is not in the topology to begin with. ‘optional_delete’ deletes the bond if it is present, but doesn’t raise an Exception otherwise.

Type:: Literal[“add”, “delete”, “optional”, “optional_delete”]

atom1#

The name of the first atom. Can be ‘CA’, ‘N’, or any other atom name (not limited to proteins). If it is an int, it is the index of any atom in the topology (which may belong to a different residue).

Type:: Union[str, int]

atom2#

The name of the second atom. Can be ‘CA’, ‘N’, or any other atom name (not limited to proteins). If it is an int, it is the index of any atom in the topology (which may belong to a different residue).

Type:: Union[str, int]

atom1: str | int#

atom2: str | int#

resname: str#

type: Literal['add', 'delete', 'optional', 'optional_delete']#
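The four values of type can be summarized in a short sketch. BondSketch and apply_bond are hypothetical names that only mirror the documented fields and the documented add/delete behavior on a plain set of atom pairs; they are not EncoderMap's implementation, which operates on an MDTraj topology.

```python
from dataclasses import dataclass
from typing import Literal, Union


@dataclass
class BondSketch:
    """Hypothetical stand-in mirroring the documented fields of Bond."""

    resname: str
    type: Literal["add", "delete", "optional", "optional_delete"]
    atom1: Union[str, int]
    atom2: Union[str, int]


def apply_bond(bonds: set, bond: BondSketch) -> set:
    """Apply one bond instruction to a set of frozenset atom pairs."""
    pair = frozenset((bond.atom1, bond.atom2))
    present = pair in bonds
    if bond.type == "add" and present:
        raise Exception(f"Bond {bond.atom1}-{bond.atom2} already exists.")
    if bond.type == "delete" and not present:
        raise Exception(f"Bond {bond.atom1}-{bond.atom2} does not exist.")
    if bond.type in ("add", "optional"):
        return bonds | {pair}
    return bonds - {pair}  # "delete" and "optional_delete"


bonds = {frozenset(("N", "CA"))}
# "optional" tolerates a bond that is already present.
bonds = apply_bond(bonds, BondSketch("CSR", "optional", "N", "CA"))
# "add" creates a bond that is not present yet.
bonds = apply_bond(bonds, BondSketch("CSR", "add", "CA", "C"))
# "optional_delete" tolerates a bond that is absent.
bonds = apply_bond(bonds, BondSketch("CSR", "optional_delete", "C", "O"))
```

Using ‘add’ for N-CA or ‘delete’ for C-O on this set would raise instead, which is the difference between the strict and the optional variants.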

class CustomTopology(*new_residues, traj=None)[source]#

Adds custom topology elements to a topology parsed by MDTraj.

Postpones parsing the custom AAs until requested.

The custom_aminoacids dictionary follows these conventions:

The keys can be str or tuple[str, str].
If a key is a str, it needs to be a 3-letter code (MET, ALA, GLY, …).
If a key is a tuple[str, str], the first str of the tuple is a common_str (see the docstring of encodermap.TrajEnsemble to learn about common_str). This common_str can be used to apply custom topologies to an ensemble based on their common_str. For example:
{("CSR_mutant", "CSR"): ...}
A key can also affect only a single residue (not all residues called “CSR”). For that, the 3-letter code of the residue needs to be suffixed with a dash and the 1-based resSeq of the residue:
{"CSR-2": ...}
The value of a key can be None, which means this residue will not be used for building a topology. EncoderMap raises Exceptions when it encounters unknown residues (to make sure you don’t forget to featurize important residues), so it also raises Exceptions when the topology contains unknown solvents/solutes. If you run a simulation in a water/methanol mixture with the residue names SOL and MOH, EncoderMap will raise an Exception upon encountering MOH, so your custom topology should contain {"MOH": None} to make MOH a recognized residue.
The value of a key can also be a tuple[str, Union[dict, None]]. In this case, the first string should be the one-letter code of the residue or of the residue most closely representing it. If you use phosphotyrosine (PTR) in your simulations and want to treat it as a standard tyrosine residue, the custom topology should contain {"PTR": ("Y", None)}.
If your residue is completely novel, you need to define all possible bonds, backbone dihedrals, and sidechain dihedrals yourself. For that, provide a tuple[str, dict[str, Union[list[str], list[int]]]]. This second-level dict allows the following keys:
- bonds: For bonds between atoms. This key can contain a list[tuple[str, str]], which defines bonds in this residue. This dict defines a bond between N and CA in phosphotyrosine:
  {"PTR": ("Y", {
      "bonds": [
          ("N", "CA"),
      ],
  })}
  These strings can contain - and + signs to denote bonds to previous or following residues. To connect the residues MET1 to TPO2 to ALA3, use this dict:
  {"TPO": ("T", {
      "bonds": [
          ("-C", "N"),  # bond to MET1-C
          ("N", "CA"),
          …
          ("C", "+N"),  # bond to ALA3-N
      ],
  })}
  For exotic bonds, one of the two entries can also be an int to connect to any 0-based indexed atom in your topology. You can connect the residues CYS2 and CYS20 with a disulfide bridge like so:
  {
      "CYS-2": ("C", {
          "bonds": [
              ("S", 321),  # connect to CYS20, the 321 is a placeholder
          ],
      }),
      "CYS-20": ("C", {
          "bonds": [
              (20, "S"),  # connect to CYS2
          ],
      }),
  }
- optional_bonds: This key accepts the same list[tuple] as ‘bonds’. However, ‘bonds’ will raise an Exception if a bond already exists. The above example with a disulfide bridge between CYS2 and CYS20 will thus raise an Exception. A better example is:
  {
      "CYS-2": ("C", {
          "optional_bonds": [
              ("S", 321),  # connect to CYS20, the 321 is a placeholder
          ],
      }),
      "CYS-20": ("C", {
          "optional_bonds": [
              (20, "S"),  # connect to CYS2
          ],
      }),
  }
- delete_bonds: This key accepts the same list[tuple] as ‘bonds’, but will remove bonds. If a bond is marked for deletion but does not exist in your topology, an Exception will be raised. To delete bonds without raising an Exception, use ‘optional_delete_bonds’.
- optional_delete_bonds: This will delete bonds if they are present and won’t raise an Exception if no bond is present.
- PHI, PSI, OMEGA: These keys define the backbone torsions of this residue. You can just provide a list[str] for these keys. The str can contain - and + to use atoms in previous or following residues. Example:
  {
      "CYS-2": (
          "C", {
              "PHI": ["-C", "N", "CA", "C"],
              "PSI": ["N", "CA", "C", "+N"],
              "OMEGA": ["CA", "C", "+N", "+CA"],
          },
      ),
  }
- not_PSI, not_OMEGA, not_PHI: The counterparts of ‘PHI’, ‘PSI’, and ‘OMEGA’, which remove these dihedrals from consideration. The values of these keys do not matter. Example:
  {
      "CYS-2": (
          "C", {
              "PHI": ["-C", "N", "CA", "C"],
              "not_PSI": [],  # value for not_* keys does not matter
              "not_OMEGA": [],  # it just makes EncoderMap skip these dihedrals.
          },
      ),
  }
- CHI1, …, CHI5: Finally, these keys define the atoms considered for the sidechain angles. If you want to add extra sidechain dihedrals for phosphothreonine, you can do:
  {
      "TPO": (
          "T", {
              "CHI2": ["CA", "CB", "OG1", "P"],  # include phosphorus in sidechain angles
              "CHI3": ["CB", "OG1", "P", "OXT"],  # include the terminal oxygen in sidechain angles
          },
      ),
  }
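The three documented key forms (a 3-letter code, a code with a resSeq suffix, and a (common_str, code) tuple) can be told apart with a few lines of Python. The helper parse_key below is hypothetical and only illustrates the convention; it is not a function of EncoderMap, and how the library parses keys internally is not shown in this documentation.

```python
from typing import Optional, Tuple, Union

Key = Union[str, Tuple[str, str]]


def parse_key(key: Key) -> Tuple[Optional[str], str, Optional[int]]:
    """Split a custom-topology key into (common_str, resname, resSeq).

    Hypothetical helper for illustration, not part of EncoderMap.
    """
    common_str = None
    if isinstance(key, tuple):
        # ("CSR_mutant", "CSR") -> common_str first, residue name second
        common_str, key = key
    # "CSR-2" -> residue name "CSR" and 1-based resSeq 2
    resname, dash, resseq = key.partition("-")
    return (common_str, resname, int(resseq) if dash else None)


all_met = parse_key("MET")
single_csr = parse_key("CSR-2")
mutant_csr = parse_key(("CSR_mutant", "CSR"))
```

Here all_met addresses every MET residue, single_csr only the CSR residue with resSeq 2, and mutant_csr every CSR residue in trajectories whose common_str is "CSR_mutant".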

Examples

>>> # Aminoacids taken from https://www.swisssidechain.ch/
>>> # The provided .pdb file has only strange and unnatural aminoacids.
>>> # Its sequence is:
>>> # TPO - PTR - ORN - OAS - 2AG - CSR
>>> # TPO: phosphothreonine
>>> # PTR: phosphotyrosine
>>> # ORN: ornithine
>>> # OAS: o-acetylserine
>>> # 2AG: 2-allyl-glycine
>>> # CSR: selenocysteine
>>> # However, someone misnamed the 2AG residue as ALL
>>> # Let's fix that with EncoderMap's CustomTopology
>>> import encodermap as em
>>> from pathlib import Path
...
>>> traj = em.load(Path(em.__file__).resolve().parent.parent / "tests/data/unnatural_aminoacids.pdb")
...
>>> custom_aas = {
...     "ALL": ("A", None),  # makes EncoderMap treat 2-allyl-glycine as alanine
...     "OAS": (
...         "S",  # OAS is o-acetylserine
...         {
...             "CHI2": ["CA", "CB", "OG", "CD"],  # this is a non-standard chi2 angle
...             "CHI3": ["CB", "OG", "CD", "CE"],  # this is a non-standard chi3 angle
...         },
...     ),
...     "CSR": (  # CSR is selenocysteine
...         "S",
...         {
...             "bonds": [   # we can manually define bonds for selenocysteine like so:
...                 ("-C", "N"),      # bond between previous carbon and nitrogen CSR
...                 ("N", "CA"),
...                 ("N", "H1"),
...                 ("CA", "C"),
...                 ("CA", "HA"),     # this topology includes hydrogens
...                 ("C", "O"),
...                 ("C", "OXT"),     # As the C-terminal residue, we don't need to put ("C", "+N") here
...                 ("CA", "CB"),
...                 ("CB", "HB1"),
...                 ("CB", "HB2"),
...                 ("CB", "SE"),
...                 ("SE", "HE"),
...             ],
...             "CHI1": ["N", "CA", "CB", "SE"],  # this is a non-standard chi1 angle
...         },
...     ),
...     "TPO": (  # TPO is phosphothreonine
...         "T",
...         {
...             "CHI2": ["CA", "CB", "OG1", "P"],  # a non-standard chi2 angle
...             "CHI3": ["CB", "OG1", "P", "OXT"],  # a non-standard chi3 angle
...         },
...     ),
... }
...
>>> # loading this will raise an Exception, because the bonds in CSR already exist
>>> traj.load_custom_topology(custom_aas)  
Traceback (most recent call last):
    ...
Exception: Bond between ALL5-C and CSR6-N already exists. Consider using the key 'optional_bonds' to not raise an Exception on already existing bonds.
>>> # If we rename the "bonds" section in "CSR" to "optional_bonds" it will work
>>> custom_aas["CSR"][1]["optional_bonds"] = custom_aas["CSR"][1].pop("bonds")
>>> traj.load_custom_topology(custom_aas)
>>> sidechains = em.features.SideChainDihedrals(traj).describe()
>>> "SIDECHDIH CHI2  RESID  OAS:   4 CHAIN 0" in sidechains
True
>>> "SIDECHDIH CHI3  RESID  OAS:   4 CHAIN 0" in sidechains
True
>>> "SIDECHDIH CHI1  RESID  CSR:   6 CHAIN 0" in sidechains
True
>>> "SIDECHDIH CHI2  RESID  TPO:   1 CHAIN 0" in sidechains
True
>>> "SIDECHDIH CHI3  RESID  TPO:   1 CHAIN 0" in sidechains
True

Parameters:

new_residues (NewResidue)
traj (Optional[SingleTraj])

add_amino_acid_codes()[source]#

Return type:: None

add_bonds()[source]#

Adds and deletes bonds specified in the custom topology.

Returns:: The new topology.
Return type:: md.Topology

add_new_residue(new_residue)[source]#

Adds an instance of NewResidue to the residues of this CustomTopology.

Parameters:: new_residue (NewResidue) – An instance of NewResidue.
Return type:: None

atom_sequence(type)[source]#

Returns either backbone or sidechain indices in a useful order.

Parameters:: type (Literal["OMEGA", "PHI", "PSI", "CHI1", "CHI2", "CHI3", "CHI4", "CHI5"]) – The angle to look for.
Returns:: A tuple containing two numpy arrays:
Return type:: tuple[np.ndarray, np.ndarray]

backbone_sequence(atom_names, type)[source]#

Searches for a sequence along the backbone.

Parameters:

atom_names (list[str]) – The names of the atoms. Can use - and + to mark atoms in the previous or following residue.
type (Literal["PHI", "PSI", "OMEGA"]) – The type of the dihedral sequence.

Returns:

The integer indices of the requested atoms.

Return type:

np.ndarray
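The -/+ prefix convention for atom names can be sketched as follows. The toy residues list and the helpers split_offset and resolve are assumptions made for illustration; the actual method works on an MDTraj topology and returns a np.ndarray.

```python
def split_offset(atom_name):
    """Return (residue_offset, bare_atom_name) for names like '-C' or '+N'."""
    if atom_name.startswith("-"):
        return -1, atom_name[1:]
    if atom_name.startswith("+"):
        return 1, atom_name[1:]
    return 0, atom_name


# Hypothetical toy chain: one dict of atom name -> 0-based atom index per residue.
residues = [
    {"N": 0, "CA": 1, "C": 2},
    {"N": 3, "CA": 4, "C": 5},
    {"N": 6, "CA": 7, "C": 8},
]


def resolve(atom_names, residue_index):
    """Look up atom indices for one residue, following -/+ prefixes."""
    indices = []
    for name in atom_names:
        offset, bare = split_offset(name)
        target = residue_index + offset
        if not 0 <= target < len(residues):
            # Terminal residues have no previous/following neighbor.
            raise IndexError(f"{name!r} points outside the chain.")
        indices.append(residues[target][bare])
    return indices


# Backbone dihedrals of the middle residue (index 1).
phi = resolve(["-C", "N", "CA", "C"], 1)
psi = resolve(["N", "CA", "C", "+N"], 1)
omega = resolve(["CA", "C", "+N", "+CA"], 1)
```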

combine_chains(chain_id1, chain_id2)[source]#

Function to combine two chains into one.

Parameters:

chain_id1 (int) – The 0-based index of chain 1.
chain_id2 (int) – The 0-based index of chain 2.

Return type:

None

classmethod from_dict(custom_aas, traj=None)[source]#

Instantiate the class from a dictionary.

Parameters:

custom_aas (CustomAAsDict) –
Custom AAs defined by a dict with the following properties: The keys are the residue names encountered in this traj. The values to the keys can be one of the following types:
- None: If a key: None pair is supplied, this just adds the residue to the recognized residues. Nothing will be done with it.
- str: If a key: str pair is supplied, it is expected that the string matches one of the one-letter amino-acid codes. If your new residue is based on lysine and you named it LYQ, you need to supply: {"LYQ": "K"}
- tuple[str, dict]: If your residue has nonstandard side-chain angles (e.g. due to phosphorylation), you can supply a tuple of the one-letter amino-acid code and a dict which defines the sidechain angles like so: {"THR": ("T", {"CHI2": ["CA", "CB", "CG", "P"]})} In this example, the standard amino acid threonine was phosphorylated and the chi2 angle was added. If you want to add custom bonds, you can add the "bonds" key to the dict and give it either atom names or atom indices of other atoms like so: {"LYQ": ("K", {"bonds": [("N", "CA"), ("N", "H"), …], "CHI1": ["N", "CA", "CB", "CG"]})}
- tuple[str, str, dict]: In this case, the first string should be the name of the amino acid, and the second string should be a common_str that is in self.common_str. That way, the different topologies in this TrajEnsemble can dynamically use different custom_aas.
traj (SingleTraj | None)
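The first three value types listed above can be brought into one common shape before further processing. The function normalize_value below is a hypothetical illustration of that idea and makes no claim about how from_dict is implemented.

```python
def normalize_value(value):
    """Map None, a one-letter str, or a (str, dict | None) tuple to
    a (one_letter_code, definitions) pair.

    Hypothetical helper for illustration, not part of EncoderMap.
    """
    if value is None:
        # Residue is only registered as known, nothing else is defined.
        return None, {}
    if isinstance(value, str):
        # Residue is treated like the standard residue with this code.
        return value, {}
    one_letter, definitions = value
    # A None dict means "no extra bonds or dihedrals".
    return one_letter, dict(definitions or {})


custom_aas = {
    "MOH": None,
    "LYQ": "K",
    "PTR": ("Y", None),
    "THR": ("T", {"CHI2": ["CA", "CB", "CG", "P"]}),
}
normalized = {name: normalize_value(value) for name, value in custom_aas.items()}
```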

classmethod from_hdf5_file(fname, traj=None)[source]#

Parameters:

fname (Path | str)
traj (SingleTraj | None)

classmethod from_json(json_str, traj=None)[source]#

The same as from_dict, but using a JSON string.

Parameters:

json_str (str)
traj (SingleTraj | None)
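When preparing a JSON string by hand, two properties of JSON itself are worth keeping in mind: it has no tuple type, and object keys must be strings. The sketch below uses only Python's standard json module and shows this generic behavior. How from_json maps lists back to tuples, or how it represents (common_str, resname) keys, is not stated here, so the restored dict is only an assumption about what a round trip could look like.

```python
import json

custom_aas = {
    "MOH": None,
    "PTR": ("Y", None),
    "TPO": ("T", {"CHI2": ["CA", "CB", "OG1", "P"]}),
}

# Tuples are written as JSON arrays and None as null.
json_str = json.dumps(custom_aas)
loaded = json.loads(json_str)

# After loading, the former tuples are lists; convert the top-level values back.
restored = {
    name: tuple(value) if isinstance(value, list) else value
    for name, value in loaded.items()
}

# Tuple keys such as ("CSR_mutant", "CSR") cannot be serialized directly.
try:
    json.dumps({("CSR_mutant", "CSR"): None})
    tuple_keys_ok = True
except TypeError:
    tuple_keys_ok = False
```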

classmethod from_yaml(path, traj=None)[source]#

Parameters:

path (str | Path)
traj (SingleTraj | None)

get_single_residue_atom_ids(atom_names, r, key_error_ok=False)[source]#

Gives the 0-based atom ids of a single residue.

Parameters:

atom_names (list[str]) – The names of the atoms, e.g. ['N', 'CA', 'C', '+N'].
r (NewResidue) – An instance of NewResidue.
key_error_ok (bool) – Whether a key error when querying self._atom_dict raises an error or returns an empty np.ndarray.

Returns:

An integer array with the ids of the requested atoms.

Return type:

np.ndarray
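The effect of key_error_ok can be sketched with a plain dictionary lookup. The function atom_ids and the atom_dict argument are hypothetical; the sketch returns an empty list where the documented method returns an empty np.ndarray.

```python
def atom_ids(atom_dict, atom_names, key_error_ok=False):
    """Return the ids of atom_names, or an empty list if one is missing
    and key_error_ok is True.

    Hypothetical helper for illustration, not part of EncoderMap.
    """
    try:
        return [atom_dict[name] for name in atom_names]
    except KeyError:
        if key_error_ok:
            return []
        raise


# Toy mapping of atom name -> 0-based atom id for a single residue.
atom_dict = {"N": 10, "CA": 11, "C": 12}

found = atom_ids(atom_dict, ["N", "CA", "C"])
missing = atom_ids(atom_dict, ["N", "CA", "CB"], key_error_ok=True)
```

With the default key_error_ok=False, the second call would re-raise the KeyError for the missing CB atom instead of returning an empty result.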

indices_chi1()[source]#