encodermap.loading package#

Submodules#

encodermap.loading.delayed module#

Functions to use with the DaskFeaturizer class.

build_dask_xarray(featurizer: DaskFeaturizer, traj: SingleTraj | None, streamable: bool, return_delayeds: Literal[True]) tuple[Dataset, dict[str, Variable]][source]#
build_dask_xarray(featurizer: DaskFeaturizer, traj: SingleTraj | None, streamable: bool, return_delayeds: Literal[False]) tuple[Dataset, None]

Builds a large dask-backed xarray Dataset, which will be evaluated in a distributed fashion.

This function takes a DaskFeaturizer instance, which contains a list of features. Every feature in this list contains enough information for the delayed functions to calculate the requested quantities when provided the xyz coordinates of the atoms, the unitcell vectors, and the unitcell info as a Bravais matrix.

Parameters:
  • featurizer (DaskFeaturizer) – An instance of the DaskFeaturizer.

  • return_delayeds (bool) – Whether to add this information: all_xyz, all_time, all_cell_lengths, all_cell_angles to the returned values. Defaults to False.

  • streamable (bool) – Whether to divide the calculations into one-frame blocks, which are then only calculated when requested.

Returns:

A tuple whose first element is the xr.Dataset. When return_delayeds is True, the second element is a dict of the additional variables; when it is False, the second element is None.

Return type:

tuple[xr.Dataset, dict[str, Variable] | None]

calc_bravais_box(box_info)[source]#

Calculates the Bravais vectors from lengths and angles (in degrees).

Note

This code is adapted from gyroid, which is licensed under the BSD license: http://pythonhosted.org/gyroid/_modules/gyroid/unitcell.html

Parameters:

box_info (np.ndarray) – The box info, where the columns are ordered as follows: a, b, c, alpha, beta, gamma. The angles are given in degrees.

Returns:

The Bravais vectors as a shape (n_frames, 3, 3) array.

Return type:

np.ndarray
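The conversion from lengths and angles to box vectors can be sketched for a single frame as follows. This is a minimal pure-Python illustration and not the code used by calc_bravais_box: the function name bravais_from_lengths_angles is hypothetical, and the orientation (first vector along x, second vector in the xy-plane) is an assumed, commonly used convention that may differ from the one in gyroid.

```python
import math


def bravais_from_lengths_angles(a, b, c, alpha, beta, gamma):
    """Return three box vectors (as rows) for one frame; angles in degrees.

    Hypothetical helper for illustration only.
    """
    alpha, beta, gamma = (math.radians(x) for x in (alpha, beta, gamma))
    # Assumed convention: first vector along x, second in the xy-plane.
    vec_a = (a, 0.0, 0.0)
    vec_b = (b * math.cos(gamma), b * math.sin(gamma), 0.0)
    # The third vector is fixed by its angles to the first two.
    cx = c * math.cos(beta)
    cy = c * (math.cos(alpha) - math.cos(beta) * math.cos(gamma)) / math.sin(gamma)
    cz = math.sqrt(max(c * c - cx * cx - cy * cy, 0.0))
    return [vec_a, vec_b, (cx, cy, cz)]


# An orthorhombic box yields a (numerically) diagonal matrix.
box = bravais_from_lengths_angles(2.0, 3.0, 4.0, 90.0, 90.0, 90.0)
# A rhombohedral box keeps unit lengths and 60 degree angles.
tri = bravais_from_lengths_angles(1.0, 1.0, 1.0, 60.0, 60.0, 60.0)
```

Stacking one such 3x3 result per frame gives the (n_frames, 3, 3) layout described above.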

load_xyz(traj_file, frame_indices, traj_num=None)[source]#
Return type:

tuple[array, array, array, array]

load_xyz_from_h5(traj_file, frame_indices, traj_num=None)[source]#

Loads xyz coordinates and unitcell info from a block in a .h5 file.

Standard MDTraj h5 keys are:

[‘cell_angles’, ‘cell_lengths’, ‘coordinates’, ‘time’, ‘topology’]

Parameters:
  • traj_file (str) – The file to load.

  • frame_indices (np.ndarray) – An int array giving the positions to load.

  • traj_num (int) – Which traj num the output should be put to.

Returns:

A four-tuple of dask arrays that contain dask delayeds. The order of these arrays is:
  • positions: Shape (len(frame_indices), n_atoms, 3): The xyz coordinates in nm.

  • time: Shape (len(frame_indices), ): The time in ps.

  • unitcell_vectors: Shape (len(frame_indices), 3, 3): The unitcell vectors.

  • unitcell_info: Shape (len(frame_indices), 6), where [:, :3] are the unitcell lengths in nm and [:, 3:] are the unitcell angles in degrees.

Return type:

tuple[da.array, da.array, da.array, da.array]

encodermap.loading.features module#

Features contain topological information of proteins and other biomolecules.

This topological information can be calculated once and then combined with input coordinates to calculate frame-wise collective variables of MD simulations.

The features in this module used to inherit from PyEMMA’s features (markovmodel/PyEMMA), but PyEMMA has since been archived.

If you use EncoderMap’s featurization, make sure to also cite PyEMMA, from which much of this code was adapted:

@article{scherer_pyemma_2015,
     author = {Scherer, Martin K. and Trendelkamp-Schroer, Benjamin
               and Paul, Fabian and Pérez-Hernández, Guillermo and Hoffmann, Moritz and
               Plattner, Nuria and Wehmeyer, Christoph and Prinz, Jan-Hendrik and Noé, Frank},
     title = {{PyEMMA} 2: {A} {Software} {Package} for {Estimation},
              {Validation}, and {Analysis} of {Markov} {Models}},
     journal = {Journal of Chemical Theory and Computation},
     volume = {11},
     pages = {5525-5542},
     year = {2015},
     issn = {1549-9618},
     shorttitle = {{PyEMMA} 2},
     url = {http://dx.doi.org/10.1021/acs.jctc.5b00743},
     doi = {10.1021/acs.jctc.5b00743},
     urldate = {2015-10-19},
     month = oct,
}
class AlignFeature(traj, reference, indexes, atom_indices=None, ref_atom_indices=None, in_place=False, delayed=False)[source]#

Bases: SelectionFeature

Parameters:
  • traj (SingleTraj)

  • reference (md.Trajectory)

  • indexes (np.ndarray)

  • atom_indices (Optional[np.ndarray])

  • ref_atom_indices (Optional[np.ndarray])

  • in_place (bool)

  • delayed (bool)

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
prefix_label: str = 'aligned ATOM:'#
transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Returns the aligned xyz coordinates.

Return type:

ndarray

class AllBondDistances(traj, distance_indexes=None, periodic=True, check_aas=True, delayed=False)[source]#

Bases: DistanceFeature

Feature that collects all bonds in a topology.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘DISTANCE’.

Type:

str

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]
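The relation between the feature-axis length and the labels can be illustrated with the backbone-torsion example from the description above. The helper backbone_torsion_labels is hypothetical and only mimics the label scheme quoted in the text; the labels produced by EncoderMap's own features look different.

```python
def backbone_torsion_labels(n_residues):
    """Build one label per feature-axis entry (hypothetical label scheme)."""
    labels = []
    # n residues give n - 1 (phi, omega, psi) triples, so 3 * n - 3 labels.
    for i in range(1, n_residues):
        labels.extend([f"phi_{i}", f"omega_{i}", f"psi_{i}"])
    return labels


labels = backbone_torsion_labels(4)
```

Every label is unique, so each column of the computed collective variable can be identified by name.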

generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These can be used to stack features of different topologies.

Returns:

A list of labels.

Return type:

list[str]

property indexes: ndarray#

A (n_distances, 2) shaped numpy array giving the atom indices of the distances to be calculated.

Type:

np.ndarray

property name: str#

The name of the class: “AllBondDistances”.

Type:

str

prefix_label: str = 'DISTANCE        '#
class AllCartesians(traj, check_aas=True, generic_labels=False, delayed=False)[source]#

Bases: SelectionFeature

Feature that collects all cartesian positions of all atoms in the trajectory.

Note

The order of the cartesians is not as in standard MD coordinates. Rather than giving the positions of all atoms of the first residue, then all positions of the second, and so on, this feature gives all central (backbone) cartesians first, followed by the cartesians of the sidechains. This allows better and faster backmapping. See encodermap.misc.backmapping._full_backmapping_np for more information on why this is easier.
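The backbone-first ordering can be sketched as a stable partition of the atom list. This is an illustration under assumptions and not EncoderMap's implementation: the function backbone_first is hypothetical, and treating N, CA, and C as the central atoms is an assumption of this sketch.

```python
# Assumption: these atom names make up the central (backbone) chain.
BACKBONE_NAMES = {"N", "CA", "C"}


def backbone_first(atoms):
    """Return atom indices with backbone atoms first, sidechain atoms last.

    The relative order inside each group is preserved.
    """
    backbone = [idx for idx, name in atoms if name in BACKBONE_NAMES]
    sidechain = [idx for idx, name in atoms if name not in BACKBONE_NAMES]
    return backbone + sidechain


# Two toy residues in residue-by-residue order as (atom index, atom name).
atoms = [
    (0, "N"), (1, "CA"), (2, "CB"), (3, "C"),
    (4, "N"), (5, "CA"), (6, "CB"), (7, "C"),
]
order = backbone_first(atoms)
```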

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case, it is ‘POSITION’.

Type:

str

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature. This list has as many entries as atoms in self.top.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These can be used to stack features of different topologies.

Returns:

A list of labels.

Return type:

list[str]

property name: str#

The name of this class: ‘AllCartesians’

Type:

str

prefix_label: str = 'POSITION '#
class AngleFeature(traj, angle_indexes, deg=False, cossin=False, periodic=True, check_aas=True, delayed=False)[source]#

Bases: Feature

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask delayed computations are distributed, the required Python objects are pickled. Every feature would therefore need its own pickled traj, which defeats the purpose of dask distributed. For that reason, this method implements the same calculations as transform() in a more bare-bones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.

Parameters:
  • indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • deg (bool) – Whether to return the result in degrees (deg=True) or in radians (deg=False). Defaults to False (radians).

  • cossin (bool) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful if you calculate means (e.g. TICA/PCA, clustering) in that space. Defaults to False.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometers. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form: a numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometers and the other three columns are the unitcell_angles in degrees.

Return type:

dask.delayed
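The staticmethod pattern described above can be sketched with a toy class. ToyAngleFeature and angle_transform are hypothetical names, not EncoderMap's API; the sketch ignores periodicity, uses plain Python lists instead of numpy arrays, and the (sin, cos) order of the cossin pair simply follows the parameter description above.

```python
import math


class ToyAngleFeature:
    """Hypothetical stand-in; not EncoderMap's AngleFeature."""

    def __init__(self, indexes):
        self.indexes = indexes  # list of (i, j, k) atom-index triples

    @staticmethod
    def angle_transform(indexes, xyz, deg=False, cossin=False):
        """Compute the angle at atom j for every frame, without using self."""
        result = []
        for frame in xyz:
            row = []
            for i, j, k in indexes:
                u = [p - q for p, q in zip(frame[i], frame[j])]
                v = [p - q for p, q in zip(frame[k], frame[j])]
                norm_u = math.sqrt(sum(p * p for p in u))
                norm_v = math.sqrt(sum(p * p for p in v))
                cos_t = sum(p * q for p, q in zip(u, v)) / (norm_u * norm_v)
                # Clamp to guard against rounding slightly outside [-1, 1].
                theta = math.acos(max(-1.0, min(1.0, cos_t)))
                if cossin:
                    row.extend([math.sin(theta), math.cos(theta)])
                else:
                    row.append(math.degrees(theta) if deg else theta)
            result.append(row)
        return result


feature = ToyAngleFeature([(0, 1, 2)])
xyz = [[(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]  # one frame
# Only the small index list and the coordinates are handed to the function.
angles = ToyAngleFeature.angle_transform(feature.indexes, xyz, deg=True)
pairs = ToyAngleFeature.angle_transform(feature.indexes, xyz, cossin=True)
```

Because the function receives only indexes and coordinates, a scheduler would have to serialize just those arguments rather than the whole feature with its trajectory.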

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometers of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometers. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form: a numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometers and the other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray
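The minimum image convention mentioned for unitcell_vectors can be sketched for the simplest case. The function minimum_image_distance is hypothetical and assumes an orthorhombic box described by three edge lengths; a triclinic cell needs the full (3, 3) box vectors, which this sketch does not handle.

```python
import math


def minimum_image_distance(p, q, box_lengths):
    """Distance between p and q in an orthorhombic periodic box (sketch)."""
    total = 0.0
    for p_i, q_i, length in zip(p, q, box_lengths):
        d = p_i - q_i
        # Shift the component by whole box lengths to the nearest image.
        d -= length * round(d / length)
        total += d * d
    return math.sqrt(total)


# Two points near opposite faces of a 1 nm cubic box are close neighbours.
dist = minimum_image_distance((0.1, 0.0, 0.0), (0.9, 0.0, 0.0), (1.0, 1.0, 1.0))
```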

class BackboneTorsionFeature(traj, selstr=None, deg=False, cossin=False, periodic=True, delayed=False)[source]#

Bases: DihedralFeature

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature. The length is determined by the dih_indexes and the cossin argument in the __init__() method. If cossin is false, then len(describe()) == self.angle_indexes[-1], else len(describe()) is twice as long.

Return type:

list[str]

class CentralAngles(traj, deg=False, cossin=False, periodic=True, generic_labels=False, check_aas=True, delayed=False)[source]#

Bases: AngleFeature

Feature that collects all angles in the backbone of a topology.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘CENTERANGLE’.

Type:

str

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These can be used to stack features of different topologies.

Returns:

A list of labels.

Return type:

list[str]

property indexes: ndarray#

A (n_angles, 3) shaped numpy array giving the atom indices of the angles to be calculated.

Type:

np.ndarray

property name: str#

The name of the class: “CentralAngles”.

Type:

str

prefix_label: str = 'CENTERANGLE     '#
class CentralBondDistances(traj, distance_indexes=None, periodic=True, generic_labels=False, check_aas=True, delayed=False)[source]#

Bases: AllBondDistances

Feature that collects all bonds in the backbone of a topology.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case, it is ‘CENTERDISTANCE’.

Type:

str

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These can be used to stack features of different topologies.

Returns:

A list of labels.

Return type:

list[str]

property indexes: ndarray#

A (n_distances, 2) shaped numpy array giving the atom indices of the distances to be calculated.

Type:

np.ndarray

property name: str#

The name of the class: “CentralBondDistances”.

Type:

str

prefix_label: str = 'CENTERDISTANCE  '#
class CentralCartesians(traj, generic_labels=False, check_aas=True, delayed=False)[source]#

Bases: SelectionFeature

Feature that collects all cartesian positions of the backbone atoms.

Examples

>>> import encodermap as em
>>> from pprint import pprint
>>> traj = em.load_project("pASP_pGLU", 0)[0]
>>> traj
<encodermap.SingleTraj object...>
>>> feature = em.features.CentralCartesians(traj, generic_labels=False)
>>> pprint(feature.describe())
['CENTERPOS X     ATOM     N:    0 GLU:   1 CHAIN 0',
 'CENTERPOS Y     ATOM     N:    0 GLU:   1 CHAIN 0',
 'CENTERPOS Z     ATOM     N:    0 GLU:   1 CHAIN 0',
 'CENTERPOS X     ATOM    CA:    3 GLU:   1 CHAIN 0',
 'CENTERPOS Y     ATOM    CA:    3 GLU:   1 CHAIN 0',
 'CENTERPOS Z     ATOM    CA:    3 GLU:   1 CHAIN 0',
 '...
 'CENTERPOS Z     ATOM     C:   65 GLU:   6 CHAIN 0']
>>> feature = em.features.CentralCartesians(traj, generic_labels=True)
>>> pprint(feature.describe())
['CENTERPOS X 1',
 'CENTERPOS Y 1',
 'CENTERPOS Z 1',
 'CENTERPOS X 2',
 'CENTERPOS Y 2',
 'CENTERPOS Z 2',
 '...
 'CENTERPOS Z 18']
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These can be used to stack features of different topologies.

Returns:

A list of labels.

Return type:

list[str]

property name: str#

The name of the class: “CentralCartesians”.

Type:

str

prefix_label: str = 'CENTERPOS'#
class CentralDihedrals(traj, selstr=None, deg=False, cossin=False, periodic=True, omega=True, generic_labels=False, check_aas=True, delayed=False)[source]#

Bases: DihedralFeature

Feature that collects all dihedrals in the backbone of a topology.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = True#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature. The length is determined by the dih_indexes and the cossin argument in the __init__() method. If cossin is false, then len(describe()) == self.angle_indexes[-1], else len(describe()) is twice as long.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These can be used to stack features of different topologies.

Returns:

A list of labels.

Return type:

list[str]

property indexes: ndarray#

A (n_angles, 4) shaped numpy array giving the atom indices of the dihedral angles to be calculated.

Type:

np.ndarray

property name: str#

The name of the class: “CentralDihedrals”.

Type:

str

class ContactFeature(traj, distance_indexes, threshold=5.0, periodic=True, count_contacts=False, delayed=False)[source]#

Bases: DistanceFeature

Defines certain distances as contacts and returns a binary (0, 1) result.

Instead of returning the binary result, this feature can also count contacts when the argument count_contacts=True is provided at instantiation. In that case, every frame returns an integer number.
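The two output modes can be sketched on precomputed distances. The function contacts below is a hypothetical illustration and not the ContactFeature implementation; it assumes that a distance strictly below the threshold counts as a contact and works on nested lists of shape (n_frames, n_distances).

```python
def contacts(distances, threshold=5.0, count_contacts=False):
    """Turn per-frame distances into binary contacts or contact counts."""
    result = []
    for frame in distances:
        # Assumption: strictly below the threshold means "in contact".
        binary = [1 if d < threshold else 0 for d in frame]
        result.append(sum(binary) if count_contacts else binary)
    return result


distances = [[0.3, 0.7, 1.2], [0.6, 0.4, 0.2]]  # two frames, three pairs
binary = contacts(distances, threshold=0.5)
counts = contacts(distances, threshold=0.5, count_contacts=True)
```

The binary mode keeps one column per distance pair, while the counting mode collapses each frame to a single integer.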

_nonstandard_transform_args: list[str] = ['threshold', 'count_contacts']#
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask delayed computations are distributed, the required Python objects are pickled. Every feature would therefore need its own pickled traj, which defeats the purpose of dask distributed. For that reason, this method implements the same calculations as transform() in a more bare-bones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.

Parameters:
  • indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • threshold (float) – The threshold in nm, under which a distance is considered to be a contact. Defaults to 5.0 nm.

  • count_contacts (bool) – When True, return the number of contacts as an integer instead of the array of individual contacts.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometers. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form: a numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometers and the other three columns are the unitcell_angles in degrees.

Return type:

dask.delayed

prefix_label: str = 'CONTACT:'#
transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometers of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometers. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form: a numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometers and the other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

class CustomFeature(fun, dim, traj=None, description=None, fun_args=(), fun_kwargs=None, delayed=False)[source]#

Bases: Feature

_args: tuple[Any, ...] | None = None#
_fun: Callable | None = None#
_is_custom: Final[True] = True#
_kwargs: dict[str, Any] | None = None#
_nonstandard_transform_args: list[str] = ['top', 'indexes', 'delayed_call', '_fun', '_args', '_kwargs']#
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
property dask_indices#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The CustomFeature dask transform is still under development.

Parameters:
  • top (md.Topology)

  • indexes (np.ndarray)

  • delayed_call (Optional[Callable])

  • _fun (Optional[Callable])

  • _args (Optional[Sequence[Any]])

  • _kwargs (Optional[dict[str, Any]])

  • xyz (Optional[np.ndarray])

  • unitcell_vectors (Optional[np.ndarray])

  • unitcell_info (Optional[np.ndarray])

Return type:

np.ndarray

delayed: bool = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

indexes: np.ndarray | None = None#
top: md.Topology | None = None#
traj: SingleTraj | None = None#
transform(traj=None, xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometers of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometers. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form: a numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometers and the other three columns are the unitcell_angles in degrees.

  • traj (Trajectory | None)

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

class DihedralFeature(traj, dih_indexes, deg=False, cossin=False, periodic=True, check_aas=True, delayed=False)[source]#

Bases: AngleFeature

Dihedrals are torsion angles defined by four atoms.
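How four atom positions define a torsion angle can be sketched with the common atan2 formulation. The function dihedral and its helpers are hypothetical and not EncoderMap's implementation; the sketch works on a single set of four points, ignores periodic boundaries, and its sign convention is an assumption.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle in radians around the p1-p2 axis (sketch)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the axis.
    v = tuple(x - _dot(b0, b1) * y for x, y in zip(b0, b1))
    w = tuple(x - _dot(b2, b1) * y for x, y in zip(b2, b1))
    return math.atan2(_dot(_cross(b1, v), w), _dot(v, w))


cis = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0))
trans = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0))
perpendicular = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
```

The cis arrangement gives an angle of zero, the trans arrangement an angle of magnitude pi, and the perpendicular arrangement an angle of magnitude pi/2.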

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask delayed computations are distributed, the required Python objects are pickled. Every feature would therefore need its own pickled traj, which defeats the purpose of dask distributed. For that reason, this method implements the same calculations as transform() in a more bare-bones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.

Parameters:
  • indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • deg (bool) – Whether to return the result in degrees (deg=True) or in radians (deg=False). Defaults to False (radians).

  • cossin (bool) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful if you calculate means (e.g. TICA/PCA, clustering) in that space. Defaults to False.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometers. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form: a numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometers and the other three columns are the unitcell_angles in degrees.

Return type:

dask.delayed
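
To illustrate why the cossin option helps when means are calculated, here is a minimal sketch in plain Python. The function names `to_cossin` and `circular_mean` are hypothetical and not part of encodermap. The sketch only shows the idea of averaging in (sin, cos) space and makes no claim about how the library orders its output columns.

```python
import math


def to_cossin(angles):
    """Map each angle (radians) to a (sin, cos) pair."""
    return [(math.sin(a), math.cos(a)) for a in angles]


def circular_mean(angles):
    """Mean angle computed in (sin, cos) space, returned in radians."""
    pairs = to_cossin(angles)
    mean_sin = sum(s for s, _ in pairs) / len(pairs)
    mean_cos = sum(c for _, c in pairs) / len(pairs)
    return math.atan2(mean_sin, mean_cos)


# Two angles that sit right next to each other across the +/-180 degree seam.
angles = [math.radians(179.0), math.radians(-179.0)]

naive = sum(angles) / len(angles)  # arithmetic mean lands on the opposite side
circular = circular_mean(angles)   # mean in (sin, cos) space stays at the seam
```

The arithmetic mean of the raw angles is zero, although both angles are close to 180 degrees. The mean taken in (sin, cos) space has a magnitude of pi, which is the physically sensible answer.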

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature. The length is determined by the dih_indexes and the cossin argument in the __init__() method. If cossin is False, then len(describe()) == self.angle_indexes[-1], else len(describe()) is twice as long.

Return type:

list[str]
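
The relationship between residue count and feature-axis length described above can be sketched as follows. The function `backbone_torsion_labels` is hypothetical; it copies the label pattern of the example in the text and does not reproduce the exact strings that encodermap emits.

```python
def backbone_torsion_labels(n_residues):
    """Hypothetical labels phi_i, omega_i, psi_i for i = 1 .. n_residues - 1."""
    labels = []
    for i in range(1, n_residues):
        labels.extend([f"phi_{i}", f"omega_{i}", f"psi_{i}"])
    return labels


labels = backbone_torsion_labels(5)
n_labels = len(labels)  # 3 * 5 - 3 labels for five residues
```

Each label names exactly one column of the feature axis, so the number of labels equals the second dimension of the array returned by transform().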

transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms that trajectory. The argument xyz can be the xyz coordinates, in nanometers, of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray
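
The two unit-cell arguments carry the same information in different forms. A minimal sketch of the conversion from one row of unitcell_info (three lengths in nanometer, three angles in degrees) to three box vectors is shown below. It assumes one common orientation convention (a along x, b in the xy-plane); the function name `box_info_to_vectors` is hypothetical, and the convention used internally by encodermap is not stated here.

```python
import math


def box_info_to_vectors(a, b, c, alpha, beta, gamma):
    """Convert lengths (nm) and angles (degrees) to three box vectors (rows)."""
    alpha, beta, gamma = (math.radians(x) for x in (alpha, beta, gamma))
    vec_a = (a, 0.0, 0.0)
    vec_b = (b * math.cos(gamma), b * math.sin(gamma), 0.0)
    cx = c * math.cos(beta)
    cy = c * (math.cos(alpha) - math.cos(beta) * math.cos(gamma)) / math.sin(gamma)
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return (vec_a, vec_b, (cx, cy, cz))


# An orthorhombic box: all angles are 90 degrees, so the vectors are axis-aligned.
vectors = box_info_to_vectors(2.0, 3.0, 4.0, 90.0, 90.0, 90.0)
```

For a full trajectory, this conversion would be applied once per frame, turning an (n_frames, 6) array into an (n_frames, 3, 3) array.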

class DistanceFeature(traj, distance_indexes, periodic=True, dim=None, check_aas=True, delayed=False)[source]#

Bases: Feature

Parameters:
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask distributes delayed computations, the Python objects they require are pickled. Every feature would therefore need its own pickled copy of traj, which defeats the purpose of dask distributed. For that reason, this method implements the same calculations as transform() in a more bare-bones way: it forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not require self to function. It does, however, need the indexes in self.indexes, which is why the dask_indices property informs the scheduler to also pickle this object and pass it to the workers.

Parameters:
  • indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Return type:

dask.delayed

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

prefix_label: str = 'DIST:'#
transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms that trajectory. The argument xyz can be the xyz coordinates, in nanometers, of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray
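
The minimum image convention mentioned for periodic=True can be sketched for the simplest case, an orthorhombic box. The function name `min_image_distance` is hypothetical, and the restriction to orthorhombic boxes is an assumption made to keep the example short. A general triclinic cell needs the full unitcell_vectors, and the library's own implementation may differ.

```python
import math


def min_image_distance(p, q, box_lengths):
    """Distance between p and q using the nearest periodic image.

    Assumes an orthorhombic box with edge lengths box_lengths = (lx, ly, lz).
    """
    total = 0.0
    for pi, qi, length in zip(p, q, box_lengths):
        d = qi - pi
        d -= length * round(d / length)  # shift to the nearest periodic image
        total += d * d
    return math.sqrt(total)


# Two atoms near opposite faces of a 3 nm box are close through the boundary.
p = (0.1, 0.0, 0.0)
q = (2.9, 0.0, 0.0)
direct = math.dist(p, q)
wrapped = min_image_distance(p, q, (3.0, 3.0, 3.0))
```

The direct distance is 2.8 nm, whereas the distance through the periodic boundary is 0.2 nm.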

class GroupCOMFeature(traj, group_definitions, ref_geom=None, image_molecules=False, mass_weighted=True, delayed=False)[source]#

Bases: Feature

Cartesian coordinates of the center-of-mass (COM) of atom groups.

Groups can be defined as sequences of sequences of int, so a list of lists of int can be used to define groups of various sizes. The resulting array will have the shape (n_frames, n_groups * 3). The xyz coordinates are flattened along the feature axis, so the array can be rebuilt with np.dstack().

Examples

>>> import encodermap as em
>>> import numpy as np
>>> traj = em.SingleTraj.from_pdb_id("1YUG")
>>> f = em.features.GroupCOMFeature(
...     traj=traj,
...     group_definitions=[
...         [0, 1, 2],
...         [3, 4, 5, 6, 7],
...         [8, 9, 10],
...     ]
... )
>>> a = f.transform()
>>> a.shape  # this array is flattened along the feature axis
(15, 9)
>>> a = np.dstack([
...     a[..., ::3],
...     a[..., 1::3],
...     a[..., 2::3],
... ])
>>> a.shape  # now the z coordinate of the 2nd center of mass is a[:, 1, -1]
(15, 3, 3)

Note

Centering (ref_geom) and imaging (image_molecules=True) can be time-consuming. Consider doing this to your trajectory files prior to featurization.
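
The center-of-mass calculation for one frame can be sketched in plain Python as follows. The function name `group_coms` and its arguments are hypothetical. The output order (x, y, z of the first group, then x, y, z of the second group, and so on) is the layout implied by the slicing in the example above, not a statement about the library's internals.

```python
def group_coms(frame_xyz, groups, masses=None):
    """Flattened centers of mass [x0, y0, z0, x1, y1, z1, ...] for one frame.

    frame_xyz: list of (x, y, z) tuples, one per atom.
    groups: list of lists of 0-based atom indices.
    masses: optional per-atom masses; equal weights are used when None.
    """
    flat = []
    for group in groups:
        weights = [1.0 if masses is None else masses[i] for i in group]
        total = sum(weights)
        for axis in range(3):
            weighted = sum(w * frame_xyz[i][axis] for w, i in zip(weights, group))
            flat.append(weighted / total)
    return flat


frame = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
groups = [[0, 1], [2]]

weighted = group_coms(frame, groups, masses=[1.0, 3.0, 5.0])
unweighted = group_coms(frame, groups)
```

With two groups, the result has 2 * 3 = 6 entries per frame. The heavier second atom pulls the first group's center of mass toward x = 2.0 in the mass-weighted case.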

Parameters:
  • traj (SingleTraj)

  • group_definitions (Sequence[Sequence[int]])

  • ref_geom (Optional[md.Trajectory])

  • image_molecules (bool)

  • mass_weighted (bool)

  • delayed (bool)

_nonstandard_transform_args: list[str] = ['top', 'ref_geom', 'image_molecules', 'masses_in_groups']#
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask distributes delayed computations, the Python objects they require are pickled. Every feature would therefore need its own pickled copy of traj, which defeats the purpose of dask distributed. For that reason, this method implements the same calculations as transform() in a more bare-bones way: it forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not require self to function. It does, however, need the indexes in self.indexes, which is why the dask_indices property informs the scheduler to also pickle this object and pass it to the workers.

Parameters:
  • indexes (np.ndarray) – For this special feature, the indexes argument in the @staticmethod dask_transform is self.group_definitions.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • threshold (float) – The threshold in nm, under which a distance is considered to be a contact. Defaults to 5.0 nm.

  • count_contacts (bool) – When True, return an integer of the number of contacts instead of returning the array of regular contacts.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

  • top (md.Topology)

  • ref_geom (Union[md.Trajectory, None])

  • image_molecules (bool)

  • masses_in_groups (list[float])

Return type:

dask.delayed

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms that trajectory. The argument xyz can be the xyz coordinates, in nanometers, of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

class InverseDistanceFeature(traj, distance_indexes, periodic=True, delayed=False)[source]#

Bases: DistanceFeature

Parameters:
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask distributes delayed computations, the Python objects they require are pickled. Every feature would therefore need its own pickled copy of traj, which defeats the purpose of dask distributed. For that reason, this method implements the same calculations as transform() in a more bare-bones way: it forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not require self to function. It does, however, need the indexes in self.indexes, which is why the dask_indices property informs the scheduler to also pickle this object and pass it to the workers.

Parameters:
  • indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Return type:

dask.delayed

prefix_label: str = 'INVDIST:'#
transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms that trajectory. The argument xyz can be the xyz coordinates, in nanometers, of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

class MinRmsdFeature(traj, ref, ref_frame=0, atom_indices=None, precentered=False, delayed=False)[source]#

Bases: Feature

Parameters:
_nonstandard_transform_args: list[str] = ['top', 'ref']#
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
property dask_indices#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

Takes xyz and unitcell information to apply the topological calculations on.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

  • indexes (np.ndarray)

  • top (md.Topology)

  • ref (md.Trajectory)

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms that trajectory. The argument xyz can be the xyz coordinates, in nanometers, of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

class ResidueCOMFeature(traj, residue_indices, residue_atoms, scheme='all', ref_geom=None, image_molecules=False, mass_weighted=True, delayed=False)[source]#

Bases: GroupCOMFeature

Parameters:
  • traj (SingleTraj)

  • residue_indices (Sequence[int])

  • residue_atoms (np.ndarray)

  • scheme (Literal['all', 'backbone', 'sidechain'])

  • ref_geom (Optional[md.Trajectory])

  • image_molecules (bool)

  • mass_weighted (bool)

  • delayed (bool)

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
class ResidueMinDistanceFeature(traj, contacts, scheme, ignore_nonprotein, threshold, periodic, count_contacts=False, delayed=False)[source]#

Bases: DistanceFeature

Parameters:
  • traj (SingleTraj)

  • contacts (np.ndarray)

  • scheme (Literal['ca', 'closest', 'closest-heavy'])

  • ignore_nonprotein (bool)

  • threshold (float)

  • periodic (bool)

  • count_contacts (bool)

  • delayed (bool)

_nonstandard_transform_args: list[str] = ['threshold', 'count_contacts', 'scheme', 'top']#
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask distributes delayed computations, the Python objects they require are pickled. Every feature would therefore need its own pickled copy of traj, which defeats the purpose of dask distributed. For that reason, this method implements the same calculations as transform() in a more bare-bones way: it forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not require self to function. It does, however, need the indexes in self.indexes, which is why the dask_indices property informs the scheduler to also pickle this object and pass it to the workers.

Parameters:
  • indexes (np.ndarray) – For this special feature, the indexes argument in the @staticmethod dask_transform is self.contacts.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • threshold (float) – The threshold in nm, under which a distance is considered to be a contact. Defaults to 5.0 nm.

  • count_contacts (bool) – When True, return an integer of the number of contacts instead of returning the array of regular contacts.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

  • top (md.Topology)

  • scheme (Literal['ca', 'closest', 'closest-heavy'])

Return type:

dask.delayed
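
The interplay of threshold and count_contacts can be sketched as follows. The function name `contacts_from_distances` is hypothetical, and the example assumes that the non-counting result is a boolean contact map per frame, which the parameter description above does not state explicitly.

```python
def contacts_from_distances(distances, threshold=5.0, count_contacts=False):
    """Turn per-frame minimum distances (nm) into contacts.

    distances: list of frames, each a list of residue-pair distances.
    Returns booleans per pair, or one integer per frame if count_contacts.
    """
    contacts = [[d < threshold for d in frame] for frame in distances]
    if count_contacts:
        return [sum(frame) for frame in contacts]
    return contacts


# Two frames with three residue pairs each.
distances = [
    [0.3, 0.6, 0.4],
    [0.9, 0.2, 0.8],
]

contact_map = contacts_from_distances(distances, threshold=0.5)
contact_counts = contacts_from_distances(distances, threshold=0.5, count_contacts=True)
```

Counting collapses the feature axis to a single value per frame, which is convenient when only the number of formed contacts matters.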

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms that trajectory. The argument xyz can be the xyz coordinates, in nanometers, of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

class SelectionFeature(traj, indexes, check_aas=True, delayed=False)[source]#

Bases: Feature

Parameters:
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask distributes delayed computations, the Python objects they require are pickled. Every feature would therefore need its own pickled copy of traj, which defeats the purpose of dask distributed. For that reason, this method implements the same calculations as transform() in a more bare-bones way: it forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not require self to function. It does, however, need the indexes in self.indexes, which is why the dask_indices property informs the scheduler to also pickle this object and pass it to the workers.

Parameters:
  • indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Return type:

dask.delayed

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

prefix_label: str = 'ATOM:'#
transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms that trajectory. The argument xyz can be the xyz coordinates, in nanometers, of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

class SideChainAngles(traj, deg=False, cossin=False, periodic=True, check_aas=True, generic_labels=False, delayed=False)[source]#

Bases: AngleFeature

Feature that collects all angles not in the backbone of a topology.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘SIDECHANGLE’.

Type:

str

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.

Returns:

A list of labels.

Return type:

list[str]

property indexes#

A (n_angles, 3) shaped numpy array giving the atom indices of the angles to be calculated.

Type:

np.ndarray
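
To make the role of the (n_angles, 3) index array concrete, here is a minimal NumPy sketch of how such indices could be turned into angles. The function compute_angles is hypothetical and ignores periodic images; it is not the routine EncoderMap uses.

```python
import numpy as np


def compute_angles(xyz: np.ndarray, indexes: np.ndarray) -> np.ndarray:
    """Angles in radians, shape (n_frames, n_angles); no periodic images."""
    # Vectors from the central atom (column 1) to both outer atoms.
    v1 = xyz[:, indexes[:, 0]] - xyz[:, indexes[:, 1]]
    v2 = xyz[:, indexes[:, 2]] - xyz[:, indexes[:, 1]]
    cos = np.sum(v1 * v2, axis=-1) / (
        np.linalg.norm(v1, axis=-1) * np.linalg.norm(v2, axis=-1)
    )
    # Clip to guard against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0))


# One frame, three atoms forming a right angle at atom 1.
xyz = np.array([[[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
indexes = np.array([[0, 1, 2]])
angles = compute_angles(xyz, indexes)
print(angles.shape)
```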

property name#

The name of the class: “SideChainAngles”.

Type:

str

prefix_label: str = 'SIDECHANGLE '#
class SideChainBondDistances(traj, periodic=True, check_aas=True, generic_labels=False, delayed=False)[source]#

Bases: AllBondDistances

Feature that collects all bonds not in the backbone of a topology.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘SIDECHDISTANCE’.

Type:

str

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
generic_describe()[source]#

Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.

Returns:

A list of labels.

Return type:

list[str]

property indexes#

A (n_distances, 2) shaped numpy array giving the atom indices of the distances to be calculated.

Type:

np.ndarray
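
The following sketch shows how a two-column index array could be used to compute distances, optionally with the minimum image convention. The function compute_distances is hypothetical, and the periodic branch assumes an orthorhombic box (all angles 90 degrees); general triclinic cells need the full unitcell vectors.

```python
import numpy as np


def compute_distances(xyz, indexes, box_lengths=None):
    """Distances with shape (n_frames, n_pairs).

    box_lengths is an optional (n_frames, 3) array of orthorhombic box edges.
    """
    diff = xyz[:, indexes[:, 1]] - xyz[:, indexes[:, 0]]
    if box_lengths is not None:
        # Minimum image: shift each component by whole box lengths.
        box = box_lengths[:, None, :]
        diff = diff - box * np.round(diff / box)
    return np.linalg.norm(diff, axis=-1)


# One frame, two atoms close to opposite faces of a 3 nm box.
xyz = np.array([[[0.1, 0.0, 0.0], [2.9, 0.0, 0.0]]])
pairs = np.array([[0, 1]])
box = np.array([[3.0, 3.0, 3.0]])
print(compute_distances(xyz, pairs))
print(compute_distances(xyz, pairs, box))
```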

property name#

The name of the class: “SideChainBondDistances”.

Type:

str

prefix_label: str = 'SIDECHDISTANCE  '#
class SideChainCartesians(traj, check_aas=True, generic_labels=False, delayed=False)[source]#

Bases: SelectionFeature

Feature that collects all cartesian positions of all non-backbone atoms.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘SIDECHPOS’.

Type:

str

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.

Returns:

A list of labels.

Return type:

list[str]

property name#

The name of the class: “SideChainCartesians”.

Type:

str

prefix_label: str = 'SIDECHPOS'#
class SideChainDihedrals(traj, selstr=None, deg=False, cossin=False, periodic=True, generic_labels=False, check_aas=True, delayed=False)[source]#

Bases: DihedralFeature

Feature that collects all dihedrals in the side chains of a topology.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

options#

A list of possible sidechain angles [‘chi1’ to ‘chi5’].

Type:

list[str]

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature. The length is determined by the dih_indexes and the cossin argument in the __init__() method. If cossin is false, then len(describe()) == self.angle_indexes[-1], else len(describe()) is twice as long.

Return type:

list[str]
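
Why the label list doubles in length with cossin can be seen in a small sketch. The helper cossin_encode and the COS(...)/SIN(...) label format are assumptions made for illustration; the exact ordering and naming used by EncoderMap may differ.

```python
import numpy as np


def cossin_encode(angles: np.ndarray, labels: list[str]):
    """Hypothetical helper: replace every angle by a (cos, sin) pair."""
    # Stack on a new last axis, then flatten so that the pairs are interleaved.
    encoded = np.stack((np.cos(angles), np.sin(angles)), axis=-1)
    encoded = encoded.reshape(angles.shape[0], -1)
    new_labels = []
    for label in labels:
        new_labels.extend([f"COS({label})", f"SIN({label})"])
    return encoded, new_labels


angles = np.linspace(-np.pi, np.pi, 10).reshape(5, 2)  # 5 frames, 2 angles
encoded, new_labels = cossin_encode(angles, ["chi1_1", "chi2_1"])
print(encoded.shape, len(new_labels))
```

Encoding angles this way removes the discontinuity at the period boundary, at the cost of a feature axis that is twice as long.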

generic_describe()[source]#

Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.

Returns:

A list of labels.

Return type:

list[str]

property indexes: ndarray#

A (n_angles, 4) shaped numpy array giving the atom indices of the dihedral angles to be calculated.

Type:

np.ndarray
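
A (n_angles, 4) index array selects four atoms per dihedral. The sketch below shows one common way to compute the angle from such quadruples with NumPy. The function compute_dihedrals is a hypothetical stand-in that ignores periodic images, and its sign convention is not guaranteed to match EncoderMap's.

```python
import numpy as np


def compute_dihedrals(xyz: np.ndarray, indexes: np.ndarray) -> np.ndarray:
    """Dihedral angles in radians, shape (n_frames, n_angles)."""
    p0, p1, p2, p3 = (xyz[:, indexes[:, k]] for k in range(4))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    # Unit vector along the central bond.
    b1 = b1 / np.linalg.norm(b1, axis=-1, keepdims=True)
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.sum(b0 * b1, axis=-1, keepdims=True) * b1
    w = b2 - np.sum(b2 * b1, axis=-1, keepdims=True) * b1
    x = np.sum(v * w, axis=-1)
    y = np.sum(np.cross(b1, v) * w, axis=-1)
    return np.arctan2(y, x)


# Three frames that differ only in the position of the last atom:
# cis, trans and perpendicular arrangements.
base = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
xyz = np.array([
    base + [[1.0, 1.0, 0.0]],
    base + [[1.0, -1.0, 0.0]],
    base + [[1.0, 0.0, 1.0]],
])
indexes = np.array([[0, 1, 2, 3]])
dihedrals = compute_dihedrals(xyz, indexes)
print(dihedrals.shape)
```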

property name: str#

The name of the class: “SideChainDihedrals”.

Type:

str

options: list[str] = ['chi1', 'chi2', 'chi3', 'chi4', 'chi5']#
class SideChainTorsions(traj, selstr=None, deg=False, cossin=False, periodic=True, which='all', delayed=False)[source]#

Bases: DihedralFeature

Parameters:
  • traj (Union[SingleTraj, TrajEnsemble])

  • selstr (Optional[str])

  • deg (bool)

  • cossin (bool)

  • periodic (bool)

  • which (Union[Literal['all'], Sequence[Literal['chi1', 'chi2', 'chi3', 'chi4', 'chi5']]])

  • delayed (bool)

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature. The length is determined by the dih_indexes and the cossin argument in the __init__() method. If cossin is false, then len(describe()) == self.angle_indexes[-1], else len(describe()) is twice as long.

Return type:

list[str]

options = ('chi1', 'chi2', 'chi3', 'chi4', 'chi5')#

encodermap.loading.featurizer module#

EncoderMap featurization follows the example of the now deprecated PyEMMA package.

You can define your features in advance, inspect the expected output, and then let the computer do the number crunching afterwards. This can be done either with PyEMMA's streamable featurization or, as a newer option, with dask and delayed on a dask cluster of your liking. Here are the basic concepts of EncoderMap’s featurization.

class DaskFeaturizer(trajs, n_workers='cpu-2', client=None)[source]#

Bases: object

Container for SingleTrajFeaturizer and EnsembleFeaturizer that implements delayed transforms.

The DaskFeaturizer is similar to the other two featurizer classes and mostly implements the same API. However, instead of computing the transformations using in-memory computing, it prepares a xarray.Dataset, which contains dask.Arrays. This dataset can be lazily and distributively evaluated using dask.distributed clients and clusters.

Parameters:
property active_features: list[AnyFeature] | dict[md.Topology, list[AnyFeature]]#
add_custom_feature(feature)[source]#
build_graph(traj=None, streamable=False, return_delayeds=False)[source]#

Prepares the dask graph.

Parameters:
  • with_trajectories (Optional[bool]) – Whether to also compute xyz. This can be useful if you want to also save the trajectories to disk.

  • traj (Optional[SingleTraj])

  • streamable (bool)

  • return_delayeds (bool)

Return type:

None

describe()[source]#
Return type:

list[str]

dimension()[source]#
Return type:

int

property feature_containers: dict[Topology, SingleTrajFeaturizer]#
get_output(make_trace=False)[source]#

This function passes the trajs and the features to dask to create a delayed xarray from them.

Parameters:

make_trace (bool)

Return type:

Dataset

to_netcdf(filename, overwrite=False, with_trajectories=False)[source]#

Saves the dask tasks to a NetCDF4 formatted HDF5 file.

Parameters:
  • filename (Union[str, list[str]]) – The filename to be used.

  • overwrite (bool) – Whether to overwrite the existing filename.

  • with_trajectories (bool) – Also save the trajectory data. The output file can be read with encodermap.load(filename) and rebuilds the trajectories complete with traj_nums, common_str, custom_top, and all CVs, that this featurizer calculates.

Returns:

The filename of the created files.

Return type:

str

transform(traj_or_trajs=None, *args, **kwargs)[source]#
Parameters:

traj_or_trajs (Optional[Union[SingleTraj, TrajEnsemble]])

Return type:

np.ndarray

visualize()[source]#
Return type:

None

class Featurizer(traj)[source]#

Bases: object

EncoderMap’s featurization has drawn much inspiration from PyEMMA (markovmodel/PyEMMA).

EncoderMap’s Featurizer collects and computes collective variables (CVs). CVs are data that are aligned with MD trajectories on the frame/time axis. Trajectory data contains (besides the topology) an axis for atoms and an axis for cartesian coordinates (x, y, z), so that a trajectory can be understood as an array with shape (n_frames, n_atoms, 3). A CV is an array that is aligned with the frame/time axis and has its own feature axis. If the trajectory in our example has 3 residues (MET, ALA, GLY), we can define 6 dihedral angles along the backbone of this peptide. These angles are:

  • PSI1: Between MET1-N - MET1-CA - MET1-C - ALA2-N

  • OMEGA1: Between MET1-CA - MET1-C - ALA2-N - ALA2-CA

  • PHI1: Between MET1-C - ALA2-N - ALA2-CA - ALA2-C

  • PSI2: Between ALA2-N - ALA2-CA - ALA2-C - GLY3-N

  • OMEGA2: Between ALA2-CA - ALA2-C - GLY3-N - GLY3-CA

  • PHI2: Between ALA2-C - GLY3-N - GLY3-CA - GLY3-C

Thus, the collective variable ‘backbone-dihedrals’ provides an array of shape (n_frames, 6) and is aligned with the frame/time axis of the trajectory.
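
The six dihedral definitions listed above can be reproduced programmatically. The helper backbone_dihedral_atoms is purely illustrative and not part of EncoderMap; it only rebuilds the atom quadruples from a residue sequence so that the size of the feature axis can be checked.

```python
def backbone_dihedral_atoms(residues: list[str]) -> dict[str, tuple[str, ...]]:
    """Hypothetical helper: map dihedral labels to their four atom names."""
    definitions = {}
    for i in range(len(residues) - 1):
        this = f"{residues[i]}{i + 1}"      # e.g. MET1
        nxt = f"{residues[i + 1]}{i + 2}"   # e.g. ALA2
        definitions[f"PSI{i + 1}"] = (
            f"{this}-N", f"{this}-CA", f"{this}-C", f"{nxt}-N")
        definitions[f"OMEGA{i + 1}"] = (
            f"{this}-CA", f"{this}-C", f"{nxt}-N", f"{nxt}-CA")
        definitions[f"PHI{i + 1}"] = (
            f"{this}-C", f"{nxt}-N", f"{nxt}-CA", f"{nxt}-C")
    return definitions


definitions = backbone_dihedral_atoms(["MET", "ALA", "GLY"])
for label, atoms in definitions.items():
    print(label, " - ".join(atoms))
```

For a trajectory with n_frames frames, a CV built from these definitions would have the shape (n_frames, len(definitions)).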

Parameters:

traj (Union[SingleTraj, TrajEnsemble])

Module contents#