trajinfo.info_single#

Classes to work with ensembles of trajectories.

The statistics of a protein are often better described by an ensemble of trajectories than by a single long trajectory. Treating a protein this way changes how molecular dynamics data can be handled. Trajectory ensembles allow:

  • Faster convergence via adaptive sampling.

  • Better anomaly detection of unique structural states.

This subpackage contains two classes which are containers of trajectory data. The SingleTraj class contains information about a single trajectory. The TrajEnsemble class contains information about multiple trajectories. This adds a new dimension to MD data. The time and atom dimensions are already established. Two frames can be appended along the time axis to get a trajectory with multiple frames. If they are appended along the atom axis, the new frame contains the atoms of both. The trajectory axis works in a similar fashion. Adding two trajectories along the trajectory axis returns a trajectory ensemble, represented as a TrajEnsemble class in this package.
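The three axes can be illustrated with a toy model that only tracks shapes. This is a minimal sketch and not EncoderMap's API: `ToyTraj`, `join_time`, `stack_atoms`, and `make_ensemble` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTraj:
    """Hypothetical stand-in for a trajectory; stores only its shape."""

    n_frames: int
    n_atoms: int


def join_time(a: ToyTraj, b: ToyTraj) -> ToyTraj:
    """Append along the time axis: frames add up, atom counts must match."""
    if a.n_atoms != b.n_atoms:
        raise ValueError("time-axis join needs identical atom counts")
    return ToyTraj(a.n_frames + b.n_frames, a.n_atoms)


def stack_atoms(a: ToyTraj, b: ToyTraj) -> ToyTraj:
    """Append along the atom axis: atoms add up, frame counts must match."""
    if a.n_frames != b.n_frames:
        raise ValueError("atom-axis stack needs identical frame counts")
    return ToyTraj(a.n_frames, a.n_atoms + b.n_atoms)


def make_ensemble(*trajs: ToyTraj) -> list:
    """Append along the trajectory axis: members keep their own shapes."""
    return list(trajs)


frame_a = ToyTraj(n_frames=1, n_atoms=10)
frame_b = ToyTraj(n_frames=1, n_atoms=10)
two_frames = join_time(frame_a, frame_b)
merged_atoms = stack_atoms(frame_a, frame_b)
ensemble = make_ensemble(two_frames, merged_atoms)
```

Unlike the first two operations, the ensemble does not require its members to share a shape, which is what allows trajectories with different topologies to live side by side.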

class Capturing(iterable=(), /)[source]#

Class to capture print statements from function calls.

Examples

>>> # write a function
>>> def my_func(arg='argument'):
...     print(arg)
...     return('fin')
>>> # use capturing context manager
>>> with Capturing() as output:
...     my_func('new_argument')
>>> print(output)
['new_argument', "'fin'"]
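The signature `Capturing(iterable=(), /)` suggests a list subclass. A context manager of that kind could be sketched as follows; this is an illustration under that assumption, not EncoderMap's implementation, and `PrintCapture` is a hypothetical name.

```python
import sys
from io import StringIO


class PrintCapture(list):
    """Hypothetical list subclass that collects printed lines."""

    def __enter__(self):
        self._original_stdout = sys.stdout
        self._buffer = StringIO()
        sys.stdout = self._buffer  # redirect print output into the buffer
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore stdout first so later prints are never swallowed.
        sys.stdout = self._original_stdout
        self.extend(self._buffer.getvalue().splitlines())
        del self._buffer
        return False  # do not suppress exceptions


with PrintCapture() as captured:
    print("new_argument")
```

In a script, a bare function call does not echo its return value the way an interactive session does, so only printed text would be collected.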
exception MixedUpInputs[source]#

For when the user provides trajectories as topologies and vice versa.

class SingleTrajFsel(other)[source]#

trajinfo.trajinfo_utils#

Util functions for the TrajEnsemble and SingleTraj classes.

class Bond(resname, type, atom1, atom2)[source]#

Dataclass, that contains information of an atomic bond.

Parameters:
  • resname (str)

  • type (Literal['add', 'delete', 'optional', 'optional_delete'])

  • atom1 (str | int)

  • atom2 (str | int)

resname#

The name of the residue this bond belongs to. Although bonds belong to residues, atom1 or atom2 can also belong to a different residue.

Type:

str

type#

Defines what should be done with this bond. ‘add’ adds it to the topology and raises an Exception if the bond was already present. ‘optional’ does the same as ‘add’, but without raising an Exception. ‘delete’ deletes this bond from the topology and raises an Exception if the bond wasn’t in the topology to begin with. ‘optional_delete’ deletes the bond, but doesn’t raise an Exception if it is missing.

Type:

Literal[“add”, “delete”, “optional”, “optional_delete”]
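The four behaviours can be sketched on a plain set of bonds. This is a minimal illustration of the semantics described above, not the library's code: `apply_bond_action` is a hypothetical helper, and `ValueError` stands in for whichever Exception the library raises.

```python
def apply_bond_action(bonds, atom_pair, action):
    """Apply one Bond-style action to a set of bonds (hypothetical helper)."""
    bond = frozenset(atom_pair)  # the order of the two atoms does not matter
    if action == "add":
        if bond in bonds:
            raise ValueError(f"bond {atom_pair} already exists")
        bonds.add(bond)
    elif action == "optional":
        bonds.add(bond)  # silently keeps an already existing bond
    elif action == "delete":
        if bond not in bonds:
            raise ValueError(f"bond {atom_pair} does not exist")
        bonds.remove(bond)
    elif action == "optional_delete":
        bonds.discard(bond)  # silently ignores a missing bond
    else:
        raise ValueError(f"unknown action: {action!r}")
    return bonds


bonds = {frozenset(("N", "CA"))}
apply_bond_action(bonds, ("CA", "C"), "add")
apply_bond_action(bonds, ("N", "CA"), "optional")
apply_bond_action(bonds, ("C", "O"), "optional_delete")
try:
    apply_bond_action(bonds, ("N", "CA"), "add")
    add_raised = False
except ValueError:
    add_raised = True
```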

atom1#

The name of the first atom. Can be ‘CA’, ‘N’, or whatever (not limited to proteins). If it is int it can be any other atom of the topology (also belonging to a different residue).

Type:

Union[str, int]

atom2#

The name of the second atom. Can be ‘CA’, ‘N’, or whatever (not limited to proteins). If it is int it can be any other atom of the topology (also belonging to a different residue).

Type:

Union[str, int]

atom1: str | int#
atom2: str | int#
resname: str#
type: Literal['add', 'delete', 'optional', 'optional_delete']#
class CustomTopology(*new_residues, traj=None)[source]#

Adds custom topology elements to a topology parsed by MDTraj.

Postpones parsing the custom AAs until requested.

The custom_aminoacids dictionary follows these conventions:
  • The keys can be str or tuple[str, str]

  • If a key is str, it needs to be a 3-letter code (MET, ALA, GLY, …)

  • If a key is a tuple[str, str], the first str of the tuple is a common_str (see the docstring for encodermap.TrajEnsemble to learn about common_str). This common_str can be used to apply custom topologies to an ensemble based on their common_str. For example:

    {("CSR_mutant", "CSR"): ...}
    
  • A key can also affect only a single residue (not all residues called “CSR”). For that, the 3-letter code of the residue needs to be suffixed with a dash and the 1-based resSeq of the residue:

    {"CSR-2": ...}
    
  • The value of a key can be None, which means this residue will not be used for building a topology. Because EncoderMap raises Exceptions when it encounters unknown residues (to make sure you don’t forget to featurize some important residues), it will also raise Exceptions when the topology contains unknown solvents/solutes. If you run a simulation in water/methanol mixtures with the residue names SOL and MOH, EncoderMap will raise an Exception upon encountering MOH, so your custom topology should contain {"MOH": None} so that MOH is recognized.

  • The value of a key can also be a tuple[str, Union[dict, None]]. In this case, the first string should be the one-letter code of the residue or of the residue most closely representing this residue. If you use phosphotyrosine (PTR) in your simulations and want to use it as a standard tyrosine residue, the custom topology should contain {"PTR": ("Y", None)}

  • If your residue is completely novel, you need to define all possible bonds, backbone dihedrals, and sidechain dihedrals yourself. For that, provide a tuple[str, dict[str, Union[list[str], list[int]]]]. This second-level dict allows for the following keys:

    • bonds: For bonds between atoms. This key can contain a list[tuple[str, str]], which defines bonds in this residue. This dict defines a bond between N and CA in phosphotyrosine:

      {"PTR": ("Y", {"bonds": [("N", "CA")]})}

      These strings can contain + and - signs to denote bonds to previous or following residues. To connect the residues MET1 to TPO2 to ALA3, you want to have this dict:

      {
          "TPO": (
              "T",
              {
                  "bonds": [
                      ("-C", "N"),  # bond to MET1-C
                      ("N", "CA"),
                      ...
                      ("C", "+N"),  # bond to ALA3-N
                  ],
              },
          ),
      }

      For exotic bonds, one of the strings can also be an int to connect to any 0-based indexed atom in your topology. You can connect the residues CYS2 and CYS20 with a disulfide bridge like so:

      {
          "CYS-2": ("C", {"bonds": [("S", 321)]}),  # connect to CYS20, the 321 is a placeholder
          "CYS-20": ("C", {"bonds": [(20, "S")]}),  # connect to CYS2
      }

    • optional_bonds: This key accepts the same list[tuple] as ‘bonds’. However, ‘bonds’ will raise an Exception if a bond already exists, whereas ‘optional_bonds’ will not. The above example with a disulfide bridge between CYS2 and CYS20 will thus raise an exception. A better example is:

      {
          "CYS-2": ("C", {"optional_bonds": [("S", 321)]}),  # connect to CYS20, the 321 is a placeholder
          "CYS-20": ("C", {"optional_bonds": [(20, "S")]}),  # connect to CYS2
      }

    • delete_bonds: This key accepts the same list[tuple] as ‘bonds’, but will remove bonds. If a bond was marked for deletion but does not exist in your topology, an Exception will be raised. To delete bonds without raising an Exception, use optional_delete_bonds.

    • optional_delete_bonds: This will delete bonds if they are present and won’t raise an Exception if no bond is present.

    • PHI, PSI, OMEGA: These keys define the backbone torsions of this residue. You can just provide a list[str] for these keys. The str can contain + and - to use atoms in previous or following residues. Example:

      {
          "CYS-2": (
              "C",
              {
                  "PHI": ["-C", "N", "CA", "C"],
                  "PSI": ["N", "CA", "C", "+N"],
                  "OMEGA": ["CA", "C", "+N", "+CA"],
              },
          ),
      }

    • not_PHI, not_PSI, not_OMEGA: Same as ‘PHI’, ‘PSI’, ‘OMEGA’, but will remove these dihedrals from consideration. The values of these keys do not matter. Example:

      {
          "CYS-2": (
              "C",
              {
                  "PHI": ["-C", "N", "CA", "C"],
                  "not_PSI": [],  # value for not_* keys does not matter
                  "not_OMEGA": [],  # it just makes EncoderMap skip these dihedrals.
              },
          ),
      }

    • CHI1, …, CHI5: Finally, these keys define the atoms considered for the sidechain angles. If you want to add extra sidechain dihedrals for phosphothreonine, you can do:

      {
          "TPO": (
              "T",
              {
                  "CHI2": ["CA", "CB", "OG1", "P"],  # include phosphorus in sidechain angles
                  "CHI3": ["CB", "OG1", "P", "OXT"],  # include the terminal oxygen in sidechain angles
              },
          ),
      }
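The three key styles described in the list above (plain 3-letter code, code plus resSeq, and a tuple with a common_str) can be told apart with a few lines of string handling. This is a sketch of one possible reading of those rules, not EncoderMap's parser; `parse_key` is a hypothetical helper and assumes that residue names contain no dash.

```python
def parse_key(key):
    """Split a custom-topology key into (common_str, resname, resSeq)."""
    common_str = None
    if isinstance(key, tuple):
        # ("CSR_mutant", "CSR") -> common_str first, residue key second
        common_str, key = key
    # "CSR-2" -> ("CSR", "-", "2"); "CSR" -> ("CSR", "", "")
    resname, dash, res_seq = key.partition("-")
    return common_str, resname, int(res_seq) if dash else None


all_csr = parse_key("CSR")
single_csr = parse_key("CSR-2")
mutant_csr = parse_key(("CSR_mutant", "CSR"))
```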

Examples

>>> # Aminoacids taken from https://www.swisssidechain.ch/
>>> # The provided .pdb file has only strange and unnatural aminoacids.
>>> # Its sequence is:
>>> # TPO - PTR - ORN - OAS - 2AG - CSR
>>> # TPO: phosphothreonine
>>> # PTR: phosphotyrosine
>>> # ORN: ornithine
>>> # OAS: o-acetylserine
>>> # 2AG: 2-allyl-glycine
>>> # CSR: selenocysteine
>>> # However, someone mis-named the 2AG residue to ALL
>>> # Let's fix that with EncoderMap's CustomTopology
>>> import encodermap as em
>>> from pathlib import Path
...
>>> traj = em.load(Path(em.__file__).resolve().parent.parent / "tests/data/unnatural_aminoacids.pdb")
...
>>> custom_aas = {
...     "ALL": ("A", None),  # makes EncoderMap treat 2-allyl-glycine as alanine
...     "OAS": (
...         "S",  # OAS is O-acetylserine
...         {
...             "CHI2": ["CA", "CB", "OG", "CD"],  # this is a non-standard chi2 angle
...             "CHI3": ["CB", "OG", "CD", "CE"],  # this is a non-standard chi3 angle
...         },
...     ),
...     "CSR": (  # CSR is selenocysteine
...         "S",
...         {
...             "bonds": [   # we can manually define bonds for selenocysteine like so:
...                 ("-C", "N"),      # bond between previous carbon and nitrogen CSR
...                 ("N", "CA"),
...                 ("N", "H1"),
...                 ("CA", "C"),
...                 ("CA", "HA"),     # this topology includes hydrogens
...                 ("C", "O"),
...                 ("C", "OXT"),     # As the C-terminal residue, we don't need to put ("C", "+N") here
...                 ("CA", "CB"),
...                 ("CB", "HB1"),
...                 ("CB", "HB2"),
...                 ("CB", "SE"),
...                 ("SE", "HE"),
...             ],
...             "CHI1": ["N", "CA", "CB", "SE"],  # this is a non-standard chi1 angle
...         },
...     ),
...     "TPO": (  # TPO is phosphothreonine
...         "T",
...         {
...             "CHI2": ["CA", "CB", "OG1", "P"],  # a non-standard chi2 angle
...             "CHI3": ["CB", "OG1", "P", "OXT"],  # a non-standard chi3 angle
...         },
...     ),
... }
...
>>> # loading this will raise an Exception, because the bonds in CSR already exist
>>> traj.load_custom_topology(custom_aas)  
Traceback (most recent call last):
    ...
Exception: Bond between ALL5-C and CSR6-N already exists. Consider using the key 'optional_bonds' to not raise an Exception on already existing bonds.
>>> # If we rename the "bonds" section in "CSR" to "optional_bonds" it will work
>>> custom_aas["CSR"][1]["optional_bonds"] = custom_aas["CSR"][1].pop("bonds")
>>> traj.load_custom_topology(custom_aas)
>>> sidechains = em.features.SideChainDihedrals(traj).describe()
>>> "SIDECHDIH CHI2  RESID  OAS:   4 CHAIN 0" in sidechains
True
>>> "SIDECHDIH CHI3  RESID  OAS:   4 CHAIN 0" in sidechains
True
>>> "SIDECHDIH CHI1  RESID  CSR:   6 CHAIN 0" in sidechains
True
>>> "SIDECHDIH CHI2  RESID  TPO:   1 CHAIN 0" in sidechains
True
>>> "SIDECHDIH CHI3  RESID  TPO:   1 CHAIN 0" in sidechains
True
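The rename step in the example works because the tuple stored under "CSR" holds a reference to a mutable dict: the tuple itself cannot be changed, but the dict inside it can. The same pattern on a small stand-in dictionary (no EncoderMap needed; the contents are shortened for illustration):

```python
# Shortened stand-in for the custom_aas dict used in the example.
custom_aas = {
    "CSR": ("S", {"bonds": [("N", "CA")], "CHI1": ["N", "CA", "CB", "SE"]}),
}

definitions = custom_aas["CSR"][1]  # the dict inside the tuple
# pop() removes the "bonds" entry and returns its value,
# which is then stored again under the new key.
definitions["optional_bonds"] = definitions.pop("bonds")
```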
add_amino_acid_codes()[source]#
Return type:

None

add_bonds()[source]#

Adds and deletes bonds specified in the custom topology.

Returns:

The new topology.

Return type:

md.Topology

add_new_residue(new_residue)[source]#

Adds an instance of NewResidue to the residues of this CustomTopology.

Parameters:

new_residue (NewResidue) – An instance of NewResidue.

Return type:

None

atom_sequence(type)[source]#

Returns either backbone or sidechain indices in a useful order.

Parameters:

type (Literal["OMEGA", "PHI", "PSI", "CHI1", "CHI2", "CHI3", "CHI4", "CHI5"]) – The angle, that is looked for.

Returns:

A tuple containing two numpy arrays.

Return type:

tuple[np.ndarray, np.ndarray]

backbone_sequence(atom_names, type)[source]#

Searches for a sequence along the backbone.

Parameters:
  • atom_names (list[str]) – The names of the atoms. Can use +/- to mark atoms in previous or following residue.

  • type (Literal["PHI", "PSI", "OMEGA"]) – The type of the dihedral sequence.

Returns:

The integer indices of the requested atoms.

Return type:

np.ndarray
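How the +/- prefixes could be resolved to integer indices is sketched below on a toy topology. The data layout (a list of per-residue dicts mapping atom name to atom index) and the helpers `split_offset` and `resolve_atoms` are assumptions made for this illustration, not the library's internals.

```python
def split_offset(atom_name):
    """Return (residue offset, bare atom name) for names like '-C' or '+N'."""
    if atom_name[0] in "+-":
        return (1 if atom_name[0] == "+" else -1), atom_name[1:]
    return 0, atom_name


def resolve_atoms(atom_names, residues, res_index):
    """Look up atom indices for one residue; None if a neighbor is missing."""
    indices = []
    for name in atom_names:
        offset, bare_name = split_offset(name)
        target = res_index + offset
        if not 0 <= target < len(residues):
            return None  # e.g. PHI of the first residue has no previous C
        indices.append(residues[target][bare_name])
    return indices


# Toy three-residue backbone: atom name -> 0-based atom index.
residues = [
    {"N": 0, "CA": 1, "C": 2},
    {"N": 3, "CA": 4, "C": 5},
    {"N": 6, "CA": 7, "C": 8},
]
phi = resolve_atoms(["-C", "N", "CA", "C"], residues, 1)
psi = resolve_atoms(["N", "CA", "C", "+N"], residues, 1)
phi_first = resolve_atoms(["-C", "N", "CA", "C"], residues, 0)
```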

combine_chains(chain_id1, chain_id2)[source]#

Function to combine two chains into one.

Parameters:
  • chain_id1 (int) – The 0-based index of chain 1.

  • chain_id2 (int) – The 0-based index of chain 2.

Return type:

None

classmethod from_dict(custom_aas, traj=None)[source]#

Instantiate the class from a dictionary.

Parameters:
  • custom_aas (CustomAAsDict) –

    Custom AAs defined by a dict with the following properties: The keys are the residue names encountered in this traj. The values to the keys can be one of four types:

    • None: If a key: None pair is supplied, this just adds the residue to the recognized residues. Nothing will be done with it.

    • str: If a key: str pair is supplied, it is expected that the string matches one of the one-letter amino-acid codes. If your new residue is based on lysine and you named it LYQ, you need to supply: {"LYQ": "K"}

    • tuple[str, dict]: If your residue has nonstandard side-chain angles (e.g. due to phosphorylation), you can supply a tuple of the one-letter amino-acid code and a dict which defines the sidechain angles like so: {"THR": ("T", {"CHI2": ["CA", "CB", "CG", "P"]})} In this example, the standard amino acid threonine was phosphorylated and the chi2 angle was added. If you want to add custom bonds, you can add the "bonds" key to the dict and give it either atom names or atom indices of other atoms like so: {"LYQ": ("K", {"bonds": [("N", "CA"), ("N", "H"), ...], "CHI1": ["N", "CA", "CB", "CG"]})}

    • tuple[str, str, dict]: In this case, the first string should be the name of the amino acid and the second string should be a common_str that is in self.common_str. That way, the different topologies in this TrajEnsemble can dynamically use different custom_aas.

  • traj (SingleTraj | None)
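The first three value forms can be brought into one shape before further processing. The following is a sketch of that idea only; `normalize_value` is a hypothetical helper, the returned pair is an arbitrary choice, and the tuple[str, str, dict] form is not covered.

```python
def normalize_value(value):
    """Return (one_letter_code, definitions) for a custom_aas value."""
    if value is None:
        # Residue is only registered; nothing is defined for it.
        return None, {}
    if isinstance(value, str):
        # Bare one-letter code, e.g. "K".
        return value, {}
    one_letter_code, definitions = value
    # The dict part may itself be None, e.g. ("Y", None).
    return one_letter_code, dict(definitions or {})


solvent = normalize_value(None)
lysine_like = normalize_value("K")
tyrosine_like = normalize_value(("Y", None))
phospho_thr = normalize_value(("T", {"CHI2": ["CA", "CB", "CG", "P"]}))
```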

classmethod from_hdf5_file(fname, traj=None)[source]#
classmethod from_json(json_str, traj=None)[source]#

The same as from_dict, but using a json str.
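When moving such a dictionary through JSON, two properties of Python's standard `json` module are worth keeping in mind: tuples are serialized as arrays and come back as lists, and tuple keys cannot be serialized at all. The snippet below only demonstrates this stdlib behaviour; how EncoderMap itself encodes tuple keys in its JSON string is not described here.

```python
import json

custom_aas = {"PTR": ("Y", None), "MOH": None}

json_str = json.dumps(custom_aas)
restored = json.loads(json_str)  # the tuple ("Y", None) is now a list

# Tuple keys are rejected by json.dumps with a TypeError.
try:
    json.dumps({("CSR_mutant", "CSR"): None})
    tuple_key_rejected = False
except TypeError:
    tuple_key_rejected = True
```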

classmethod from_yaml(path, traj=None)[source]#
get_single_residue_atom_ids(atom_names, r, key_error_ok=False)[source]#

Gives the 0-based atom ids of a single residue.

Parameters:
  • atom_names (list[str]) – The names of the atoms, e.g. [‘N’, ‘CA’, ‘C’, ‘+N’].

  • r (NewResidue) – An instance of NewResidue.

  • key_error_ok (bool) – Whether a key error when querying self._atom_dict raises an error or returns an empty np.ndarray.

Returns:

An integer array with the ids of the requested atoms.

Return type:

np.ndarray

indices_chi1()[source]#

Returns the requested indices as a (n_dihedrals, 4)-shaped numpy array.

Return type:

ndarray

indices_chi2()[source]#

Returns the requested indices as a (n_dihedrals, 4)-shaped numpy array.

Return type:

ndarray

indices_chi3()[source]#

Returns the requested indices as a (n_dihedrals, 4)-shaped numpy array.

Return type:

ndarray

indices_chi4()[source]#

Returns the requested indices as a (n_dihedrals, 4)-shaped numpy array.

Return type:

ndarray

indices_chi5()[source]#

Returns the requested indices as a (n_dihedrals, 4)-shaped numpy array.

Return type:

ndarray

indices_omega()[source]#

Returns the requested indices as a (n_dihedrals, 4)-shaped numpy array.

Return type:

ndarray

indices_phi()[source]#

Returns the requested indices as a (n_dihedrals, 4)-shaped numpy array.

Return type:

ndarray

indices_psi()[source]#

Returns the requested indices as a (n_dihedrals, 4)-shaped numpy array.

Return type:

ndarray
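Each row of such an (n_dihedrals, 4) index array names the four atoms that span one torsion. As a sketch of how the rows could be consumed, the pure-Python function below computes one angle per row from a list of atom positions. The helper names and toy coordinates are made up for this example, and the sign convention may differ from the one used by MD packages.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle in radians around the p1-p2 axis."""
    b0, b1, b2 = _sub(p0, p1), _sub(p2, p1), _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    return math.atan2(_dot(_cross(b1, v), w), _dot(v, w))


def dihedrals_from_indices(positions, indices):
    """One angle per row of a (n_dihedrals, 4)-shaped index list."""
    return [dihedral(*(positions[i] for i in row)) for row in indices]


positions = [
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (1.0, 1.0, 0.0),  # cis to atom 0
    (1.0, 0.0, 1.0),  # perpendicular to atom 0
]
indices = [[0, 1, 2, 3], [0, 1, 2, 4]]
angles = dihedrals_from_indices(positions, indices)
```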

property new_residues: list[NewResidue]#

A list of all new residues.

Type:

list[NewResidue]

sidechain_indices_by_residue()[source]#
Return type:

Generator[Residue, ndarray]

sidechain_sequence(atom_names, type, top=None)[source]#

Searches for a sequence along the sidechains.

Parameters:
  • atom_names (list[str]) – The names of the atoms. Can use +/- to mark atoms in previous or following residue.

  • type (Literal["CHI1", "CHI2", "CHI3", "CHI4", "CHI5"]) – The type of the dihedral sequence.

  • top (Optional[md.Topology]) – Can be used to overwrite the topology in self.traj.

Returns:

The integer indices of the requested atoms.

Return type:

np.ndarray

to_dict()[source]#
Return type:

dict[str | tuple[str, str], None | tuple[str, None] | tuple[str, dict[Literal[‘bonds’, ‘optional_bonds’, ‘delete_bonds’, ‘optional_delete_bonds’, ‘PHI’, ‘PSI’, ‘OMEGA’, ‘not_PHI’, ‘not_PSI’, ‘not_OMEGA’, ‘CHI1’, ‘CHI2’, ‘CHI3’, ‘CHI4’, ‘CHI5’], list[str] | list[tuple[str | int, str | int]]]]]

to_hdf_file(fname)[source]#
Parameters:

fname (Path | str)

Return type:

None

to_json()[source]#
Return type:

str

to_yaml()[source]#
Return type:

None

property top: Topology#

The fixed topology.

Type:

md.Topology

class Dihedral(resname, type, atom1=None, atom2=None, atom3=None, atom4=None, delete=False)[source]#

Dataclass that stores information about a dihedral of 4 atoms.

Parameters:
  • resname (str)

  • type (Literal['OMEGA', 'PHI', 'PSI', 'CHI1', 'CHI2', 'CHI3', 'CHI4', 'CHI5'])

  • atom1 (int | str | None)

  • atom2 (int | str | None)

  • atom3 (int | str | None)

  • atom4 (int | str | None)

  • delete (bool)

resname#

The name of the residue this dihedral belongs to. Although dihedrals belong to residues, their atoms can also belong to a different residue.

Type:

str

type#

Defines what type of dihedral this dihedral is. Mainly used to distinguish between the different types of dihedrals.

Type:

Literal[“OMEGA”, “PHI”, “PSI”, “CHI1”, “CHI2”, “CHI3”, “CHI4”, “CHI5”]

atom1#

The name of the first atom. Can be ‘CA’, ‘N’, or whatever (not limited to proteins). If it is int it can be any other atom of the topology (also belonging to a different residue).

Type:

Union[str, int]

atom2#

The name of the second atom. Can be ‘CA’, ‘N’, or whatever (not limited to proteins). If it is int it can be any other atom of the topology (also belonging to a different residue).

Type:

Union[str, int]

atom3#

The name of the third atom. Can be ‘CA’, ‘N’, or whatever (not limited to proteins). If it is int it can be any other atom of the topology (also belonging to a different residue).

Type:

Union[str, int]

atom4#

The name of the fourth atom. Can be ‘CA’, ‘N’, or whatever (not limited to proteins). If it is int it can be any other atom of the topology (also belonging to a different residue).

Type:

Union[str, int]

delete#

Whether this dihedral has to be deleted or not. If delete is set to True, this dihedral won’t produce output.

Type:

bool

atom1: int | str | None = None#
atom2: int | str | None = None#
atom3: int | str | None = None#
atom4: int | str | None = None#
delete: bool = False#
property new_atoms_def: list[str]#

A list of str, that describes the dihedral’s atoms.

Type:

list[str]

resname: str#
type: Literal['OMEGA', 'PHI', 'PSI', 'CHI1', 'CHI2', 'CHI3', 'CHI4', 'CHI5']#
class NewResidue(name, idx=None, resSeq=None, one_letter_code='', topology=None, ignore=False, bonds=<factory>, dihedrals=<factory>, common_str=None)[source]#

Dataclass that stores information about a new (nonstandard) residue.

name#

The 3-letter code name of the new residue.

Type:

str

idx#

The 0-based unique index of the residue. The idx index is always unique (i.e., if multiple chains are present, this residue can only appear in one chain).

Type:

Union[None, int]

resSeq#

The 1-based non-unique index of the residue. resSeqs can appear multiple times, but in separate chains. Each residue chain can have a MET1 residue. Either resSeq or idx must be defined. Not both can be None.

Type:

Union[None, int]

one_letter_code#

The one-letter code of this new residue. Can be set to a known one-letter code, so that this new residue mimics the behavior of that residue. Can also be ‘’ (empty string), if you don’t want to bother with this definition.

Type:

str

ignore#

Whether to ignore the features of this residue.

Type:

bool

bonds#

A list of Bond instances.

Type:

list[Bond]

dihedrals#

A list of Dihedral instances.

Type:

list[Dihedral]

common_str#

The common_str of the (sub)set of SingleTrajs that this new residue should apply to. Only applies to SingleTrajs with the same common_str. Can be None, in which case it applies to all trajs in the TrajEnsemble.

Type:

Optional[str]
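The rule that idx and resSeq cannot both be None, and the `<factory>` defaults shown in the signature for bonds and dihedrals, map naturally onto standard dataclass features. The class below is a simplified sketch of that pattern, not the real NewResidue; `ToyResidue` is a hypothetical name and `ValueError` is an assumed error type.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyResidue:
    """Hypothetical, simplified stand-in for a residue dataclass."""

    name: str
    idx: Optional[int] = None     # 0-based, unique across chains
    resSeq: Optional[int] = None  # 1-based, may repeat in other chains
    # default_factory gives every instance its own fresh list.
    bonds: list = field(default_factory=list)
    dihedrals: list = field(default_factory=list)

    def __post_init__(self):
        if self.idx is None and self.resSeq is None:
            raise ValueError("either idx or resSeq must be defined")


tpo = ToyResidue("TPO", resSeq=2)
ptr = ToyResidue("PTR", idx=1)
tpo.bonds.append(("N", "CA"))  # does not leak into ptr.bonds

try:
    ToyResidue("CSR")
    missing_index_rejected = False
except ValueError:
    missing_index_rejected = True
```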

add_bond(bond)[source]#
Parameters:

bond (Bond)

Return type:

None

add_dihedral(dihedral)[source]#
Parameters:

dihedral (Dihedral)

Return type:

None

as_amino_acid_dict_entry()[source]#
Return type:

dict[str, str | None]

bonds: list[Bond]#
common_str: str | None = None#
dihedrals: list[Dihedral]#
get_dihedral_by_type(type)[source]#
Parameters:

type (str)

Return type:

Dihedral

idx: None | int = None#
ignore: bool = False#
name: str#
one_letter_code: str = ''#
parse_bonds_and_dihedrals(bonds_and_dihedrals)[source]#

Parses a dict of bonds and dihedrals. The format of this can be derived from the format of the CustomTopology input dict.

Parameters:

bonds_and_dihedrals (DihedralOrBondDict) – A dict defining bonds and dihedrals of this NewResidue.

Return type:

None

resSeq: None | int = None#
topology: Topology | None = None#
flatten(container)[source]#
load_CV_from_string_or_path(file_or_feature, traj, attr_name=None, cols=None, deg=None, labels=None)[source]#

Loads CV data from a string. That string can either identify a feature or point to a file.

Parameters:
  • file_or_feature (str) – The file or feature to load. If ‘all’ is provided, all “standard” features are loaded. A feature name like ‘sidechain_angle’ can also be provided. If a file with the .txt or .npy extension is provided, the data in that file is used.

  • traj (SingleTraj) – The trajectory, that is used to load the features.

  • attr_name (Union[None, str], optional) – The name under which the CV should be found in the class. Is needed, if a raw numpy array is passed, otherwise the name will be generated from the filename (if data == str), the DataArray.name (if data == xarray.DataArray), or the feature name.

  • cols (Union[list, None], optional) – A list specifying the columns to use for the high-dimensional data. If your highD data contains (x,y,z,…)-errors or has an enumeration column at col=0 this can be used to remove this unwanted data.

  • deg (bool) – Whether the provided data is in radians (False) or degree (True). Can also be None for non-angular data.

  • labels (Union[list[str], str, None], optional) – If you want to label the data you provided pass a list of str. If set to None, the features in this dimension will be labeled as [f”{attr_name.upper()} FEATURE {i}” for i in range(self.n_frames)]. If a str is provided, the features will be labeled as [f”{attr_name.upper()} {label.upper()} {i}” for i in range(self.n_frames)]. If a list[str] is provided, it needs to have the same length as the traj has frames. Defaults to None.

Returns:

An xarray dataset.

Return type:

xr.Dataset
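The labeling rules quoted for the labels argument can be written out as a small function. This sketch reproduces only the string formats given above; `make_labels` is a hypothetical helper, and the number of labels `n` is left to the caller rather than taken from a trajectory.

```python
def make_labels(attr_name, n, labels=None):
    """Build n labels following the formats described for `labels`."""
    if labels is None:
        return [f"{attr_name.upper()} FEATURE {i}" for i in range(n)]
    if isinstance(labels, str):
        return [f"{attr_name.upper()} {labels.upper()} {i}" for i in range(n)]
    if len(labels) != n:
        raise ValueError(f"expected {n} labels, got {len(labels)}")
    return list(labels)


default_labels = make_labels("highd", 2)
prefixed_labels = make_labels("highd", 2, labels="dist")
custom_labels = make_labels("highd", 2, labels=["d(CA1, CA2)", "d(CA1, CA3)"])
```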

load_CVs_ensemble(trajs, data, periodic=True)[source]#

Loads CVs for a trajectory ensemble. This time with generic feature names so different topologies are aligned and can be treated separately. Loading CVs with ensemble=True will always delete existing CVs.

Parameters:
  • trajs (TrajEnsemble) – The trajectory ensemble to load the data for.

  • data (Union[str, list[str], Literal["all"]]) – The CV to load. When a numpy array is provided, it needs to have a shape matching n_frames. The data is distributed to the trajs. When a list of files is provided, len(data) needs to match n_trajs. The first file will be loaded by the first traj (based on the traj’s traj_num) and so on. If a list of np.ndarray is provided, the first array will be assigned to the first traj (based on the traj’s traj_num). If None is provided, the argument directory will be used to construct a str like: fname = directory + traj.basename + ‘_’ + attr_name. If there are .txt or .npy files matching that string in the directory, the CVs will be loaded from these files to the corresponding trajs. Defaults to None.

  • periodic (bool) – Whether distance, angle, dihedral calculations should obey the minimum image convention.

Return type:

None
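Aligning features from different topologies under shared labels can be pictured as a union of label sets, with missing entries filled by NaN. The sketch below shows that idea on plain dicts; the labels, values, and the `align_features` helper are invented for illustration and do not reflect EncoderMap's internal naming.

```python
import math


def align_features(per_traj_features):
    """Align per-trajectory {label: value} dicts onto a shared label list."""
    labels = []
    for features in per_traj_features:
        for label in features:
            if label not in labels:
                labels.append(label)  # keep first-seen order
    rows = [
        [features.get(label, math.nan) for label in labels]
        for features in per_traj_features
    ]
    return labels, rows


# A residue with one sidechain dihedral vs. a mutant with three.
wild_type = {"CHI1 1": 1.0}
mutant = {"CHI1 1": 1.1, "CHI2 1": -0.4, "CHI3 1": 2.0}
labels, rows = align_features([wild_type, mutant])
```

The wild-type row ends up with NaN in the columns that only exist for the mutant.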

load_CVs_ensembletraj(trajs, data, attr_name=None, cols=None, deg=None, periodic=True, labels=None, directory=None, ensemble=False, override=False)[source]#

Loads CVs for a trajectory ensemble.

CVs can be loaded from a multitude of sources. The argument data can be:
  • np.ndarray: Use a numpy array as a feature.

  • str | Path: You can point to .txt or .npy files and load the features from these files. In this case, the cols argument can be used to only use a subset of columns in these files. You can also point to a single directory, in which case the basename of the trajectories will be used to look for .npy and .txt files.

  • str: Some strings like “central_dihedrals” are recognized out-of-the-box. You can also provide “all” to load all dihedrals used in an encodermap.AngleDihedralCartesianEncoderMap.

  • Feature: You can provide an encodermap.loading.features Feature. The CVs will be loaded by creating a featurizer, adding this feature, and obtaining the output.

  • Featurizer: You can also directly provide a featurizer with multiple features.

  • xr.DataArray: You can also provide a xarray.DataArray, which will be appended to the existing CVs.

  • xr.Dataset: If you provide a xarray.Dataset, you will overwrite all currently loaded CVs.

Parameters:
  • trajs (TrajEnsemble) – The trajectory ensemble to load the data for.

  • data (Union[str, list, np.ndarray, 'all', xr.Dataset]) – The CV to load. When a numpy array is provided, it needs to have a shape matching n_frames. The data is distributed to the trajs. When a list of files is provided, len(data) needs to match n_trajs. The first file will be loaded by the first traj (based on the traj’s traj_num) and so on. If a list of np.ndarray is provided, the first array will be assigned to the first traj (based on the traj’s traj_num). If None is provided, the argument directory will be used to construct a str like: fname = directory + traj.basename + ‘_’ + attr_name. If there are .txt or .npy files matching that string in the directory, the CVs will be loaded from these files to the corresponding trajs. Defaults to None.

  • attr_name (Optional[str]) – The name under which the CV should be found in the class. Choose whatever you like. highd, lowd, dists, etc. The CV can then be accessed via dot-notation: trajs.attr_name. Defaults to None, in which case, the argument data should point to existing files and the attr_name will be extracted from these files.

  • cols (Optional[list[int]]) –

    A list of integers indexing the columns of the data to be loaded. This is useful if a file contains columns which are not features (e.g. an indexer or the error of the features). For example:

    id   f1    f2    f1_err    f2_err
    0    1.0   2.0   0.1       0.1
    1    2.5   1.2   0.11      0.52
    

    In that case, you would want to supply cols=[1, 2] to the cols argument. If None all columns are loaded. Defaults to None.

  • deg (Optional[bool]) – Whether to return angular CVs using degrees. If None or False, CVs will be in radian. Defaults to None.

  • labels (list) – A list containing the labels for the dimensions of the data. If you provide a np.ndarray with shape (n_trajs, n_frames, n_feat), this list needs to be of len(n_feat) Defaults to None.

  • directory (Optional[str]) – If this argument is provided, the directory will be searched for .txt or .npy files which have the same names as the trajectories have basenames. The CVs will then be loaded from these files.

  • ensemble (bool) – Whether the trajs in this class belong to an ensemble. This implies that they contain either the same topology or are very similar (think wt, and mutant). Setting this option True will try to match the CVs of the trajs onto the same dataset. If a VAL residue has been replaced by LYS in the mutant, the number of sidechain dihedrals will increase. The CVs of the trajs with VAL will thus contain some NaN values. Defaults to False.

  • override (bool) – Whether to override CVs with the same name as attr_name.

  • periodic (bool)

Return type:

None
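The effect of cols=[1, 2] on the small table shown for the cols argument can be reproduced with plain Python. This is only an illustration of column selection; `select_columns` is a hypothetical helper, and treating the first line as a header to skip is an assumption of this sketch.

```python
table = """\
id   f1    f2    f1_err    f2_err
0    1.0   2.0   0.1       0.1
1    2.5   1.2   0.11      0.52
"""


def select_columns(text, cols=None, skip_header=1):
    """Parse whitespace-separated rows and keep only the given columns."""
    rows = []
    for line in text.splitlines()[skip_header:]:
        values = [float(v) for v in line.split()]
        rows.append(values if cols is None else [values[c] for c in cols])
    return rows


features = select_columns(table, cols=[1, 2])
everything = select_columns(table)
```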

load_CVs_from_dir(trajs, data, attr_name=None, cols=None, deg=None)[source]#
Return type:

None

load_CVs_singletraj(data, traj, attr_name=None, cols=None, deg=None, periodic=True, labels=None)[source]#
Parameters:
  • data (str | Path | ndarray | Feature | Dataset | DataArray | SingleTrajFeaturizer | DaskFeaturizer | Literal['all'] | ~typing.Literal['full'] | ~encodermap.loading.features.AllCartesians | ~encodermap.loading.features.AllBondDistances | ~encodermap.loading.features.CentralCartesians | ~encodermap.loading.features.CentralBondDistances | ~encodermap.loading.features.CentralAngles | ~encodermap.loading.features.CentralDihedrals | ~encodermap.loading.features.SideChainCartesians | ~encodermap.loading.features.SideChainBondDistances | ~encodermap.loading.features.SideChainAngles | ~encodermap.loading.features.SideChainDihedrals | ~encodermap.loading.features.CustomFeature | ~encodermap.loading.features.SelectionFeature | ~encodermap.loading.features.AngleFeature | ~encodermap.loading.features.DihedralFeature | ~encodermap.loading.features.DistanceFeature | ~encodermap.loading.features.AlignFeature | ~encodermap.loading.features.InverseDistanceFeature | ~encodermap.loading.features.ContactFeature | ~encodermap.loading.features.BackboneTorsionFeature | ~encodermap.loading.features.ResidueMinDistanceFeature | ~encodermap.loading.features.GroupCOMFeature | ~encodermap.loading.features.ResidueCOMFeature | ~encodermap.loading.features.SideChainTorsions | ~encodermap.loading.features.MinRmsdFeature | None)

  • traj (SingleTraj)

  • attr_name (str | None)

  • cols (list[int] | None)

  • deg (bool | None)

  • periodic (bool)

  • labels (list[str] | None)

Return type:

Dataset

np_to_xr(data, traj, attr_name=None, deg=None, labels=None, filename=None)[source]#

Converts a numpy.ndarray to a xarray.DataArray.

Can use some additional labels and attributes to customize the DataArray.

Parameters:
  • data (np.ndarray) – The data to put into the xarray.DataArray. It is assumed that this array is of shape (n_frames, n_features), where n_frames is the number of frames in traj and n_features can be any positive integer.

  • traj (SingleTraj) – An instance of SingleTraj.

  • attr_name (Optional[str]) – The name of the feature, that will be used to identify this feature (e.g. ‘dihedral_angles’, ‘my_distance’). Can be completely custom. If None is provided, the feature will be called ‘FEATURE_{i}’, where i is a 0-based index of unnamed features. Defaults to None.

  • deg (Optional[bool]) – When True, the input is assumed to use degree. When False, the input is assumed in radians. This can be important if you want to combine features (that are not allowed for angle features with different units). If None, the input is assumed to be not angular (distances, absolute positions). Defaults to None.

  • labels (Optional[list[str]]) – A list of str, which contain labels for the feature. If provided needs to be of len(labels) == data.shape[1]. If None is provided, the labels will be ‘… FEATURE 0’, ‘… FEATURE 1’, …, ‘… FEATURE {n_frames}’.

  • filename (Optional[Union[str, Path]]) – If the data is loaded from a file, and attr_name and labels are both None, then they will use the filename.

Returns:

The DataArray.

Return type:

xr.DataArray

trajs_combine_attrs(args, context=None)[source]#

Used for combining attributes and checking, whether CVs stay in the same unit system.

Parameters:
  • args (Sequence[dict[str, Any]]) – A sequence of dicts to combine.

  • context (Optional[xr.Context]) – An xarray.Context object. Currently not used in the function, but xarray passes it nonetheless.

Returns:

The combined dict.

Return type:

dict[str, Any]
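A function with this signature can be handed to xarray wherever a callable `combine_attrs` is accepted. What such a function might check is sketched below in plain Python; the attribute name "angle_units" and the `ValueError` are assumptions for this example, not the names used by EncoderMap.

```python
def combine_attrs_sketch(attrs_list, context=None):
    """Merge attribute dicts and refuse to mix angular unit systems."""
    combined = {}
    for attrs in attrs_list:
        for key, value in attrs.items():
            # "angle_units" is a hypothetical attribute name.
            if key == "angle_units" and combined.get(key, value) != value:
                raise ValueError("cannot combine CVs in degrees and radians")
            combined[key] = value
    return combined


merged = combine_attrs_sketch(
    [{"angle_units": "rad", "a": 1}, {"angle_units": "rad", "b": 2}]
)

try:
    combine_attrs_sketch([{"angle_units": "rad"}, {"angle_units": "deg"}])
    mixed_units_rejected = False
except ValueError:
    mixed_units_rejected = True
```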