Featurization#

class Featurizer(traj)[source]#

EncoderMap’s featurization has drawn much inspiration from PyEMMA (markovmodel/PyEMMA).

EncoderMap’s Featurizer collects and computes collective variables (CVs). CVs are data that are aligned with MD trajectories on the frame/time axis. Besides the topology, trajectory data contains an axis for atoms and an axis for the cartesian coordinates (x, y, z), so that a trajectory can be understood as an array with shape (n_frames, n_atoms, 3). A CV is an array that is aligned with the frame/time axis and has its own feature axis. If the trajectory in our example has 3 residues (MET, ALA, GLY), we can define 6 dihedral angles along the backbone of this peptide. These angles are:

  • PSI1: Between MET1-N - MET1-CA - MET1-C - ALA2-N

  • OMEGA1: Between MET1-CA - MET1-C - ALA2-N - ALA2-CA

  • PHI1: Between MET1-C - ALA2-N - ALA2-CA - ALA2-C

  • PSI2: Between ALA2-N - ALA2-CA - ALA2-C - GLY3-N

  • OMEGA2: Between ALA2-CA - ALA2-C - GLY3-N - GLY3-CA

  • PHI2: Between ALA2-C - GLY3-N - GLY3-CA - GLY3-C

Thus, the collective variable ‘backbone-dihedrals’ provides an array of shape (n_frames, 6) and is aligned with the frame/time axis of the trajectory.
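The six dihedrals listed above can be derived mechanically: every window of four consecutive backbone atoms (N, CA, C per residue) defines one dihedral. The following is a minimal sketch of that bookkeeping in plain Python. It does not use EncoderMap itself; the atom-name strings and the value of n_frames are illustrative assumptions.

```python
# Backbone atoms of the MET-ALA-GLY example peptide, in chain order.
residues = ["MET1", "ALA2", "GLY3"]
backbone = [f"{res}-{atom}" for res in residues for atom in ("N", "CA", "C")]

# Every window of four consecutive backbone atoms defines one dihedral.
# Along the chain, the windows cycle through psi, omega, and phi.
kinds = ["PSI", "OMEGA", "PHI"]
dihedrals = {}
for i in range(len(backbone) - 3):
    label = f"{kinds[i % 3]}{i // 3 + 1}"
    dihedrals[label] = backbone[i : i + 4]

for label, atoms in dihedrals.items():
    print(f"{label}: Between {' - '.join(atoms)}")

# The resulting CV has one column per dihedral and one row per frame.
n_frames = 100  # arbitrary frame count, for illustration only
cv_shape = (n_frames, len(dihedrals))
print(cv_shape)
```

With 3 residues there are 9 backbone atoms and therefore 9 - 3 = 6 windows, which matches the feature axis of length 6.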

Parameters:

traj (Union[SingleTraj, TrajEnsemble])

Features#

Features contain topological information of proteins and other biomolecules.

This topological information can be calculated once and then combined with input coordinates to calculate frame-wise collective variables of MD simulations.

The features in this module used to inherit from PyEMMA’s features (markovmodel/PyEMMA), but PyEMMA has since been archived.

If using EncoderMap’s featurization, make sure to also cite PyEMMA, from which a lot of this code was adopted:

@article{scherer_pyemma_2015,
     author = {Scherer, Martin K. and Trendelkamp-Schroer, Benjamin
               and Paul, Fabian and Pérez-Hernández, Guillermo and Hoffmann, Moritz and
               Plattner, Nuria and Wehmeyer, Christoph and Prinz, Jan-Hendrik and Noé, Frank},
     title = {{PyEMMA} 2: {A} {Software} {Package} for {Estimation},
              {Validation}, and {Analysis} of {Markov} {Models}},
     journal = {Journal of Chemical Theory and Computation},
     volume = {11},
     pages = {5525-5542},
     year = {2015},
     issn = {1549-9618},
     shorttitle = {{PyEMMA} 2},
     url = {http://dx.doi.org/10.1021/acs.jctc.5b00743},
     doi = {10.1021/acs.jctc.5b00743},
     urldate = {2015-10-19},
     month = oct,
}
class AlignFeature(traj, reference, indexes, atom_indices=None, ref_atom_indices=None, in_place=False, delayed=False)[source]#

Bases: SelectionFeature

Parameters:
  • traj (SingleTraj)

  • reference (md.Trajectory)

  • indexes (np.ndarray)

  • atom_indices (Optional[np.ndarray])

  • ref_atom_indices (Optional[np.ndarray])

  • in_place (bool)

  • delayed (bool)

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
prefix_label: str = 'aligned ATOM:'#
transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Returns the aligned xyz coordinates.

Parameters:
Return type:

ndarray

class AllBondDistances(traj, distance_indexes=None, periodic=True, check_aas=True, delayed=False)[source]#

Bases: DistanceFeature

Feature that collects all bonds in a topology.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘DISTANCE’.

Type:

str

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These can be used to stack features from different topologies.

Returns:

A list of labels.

Return type:

list[str]

property indexes: ndarray#

An (n_distances, 2) shaped numpy array giving the atom indices of the distances to be calculated.

Type:

np.ndarray

property name: str#

The name of the class: “AllBondDistances”.

Type:

str

prefix_label: str = 'DISTANCE        '#
class AllCartesians(traj, check_aas=True, generic_labels=False, delayed=False)[source]#

Bases: SelectionFeature

Feature that collects all cartesian positions of all atoms in the trajectory.

Note

The order of the cartesians is not as in standard MD coordinates. Rather than giving the positions of all atoms of the first residue, then all positions of the second, and so on, this feature gives all central (backbone) cartesians first, followed by the cartesians of the sidechains. This allows better and faster backmapping. See encodermap.misc.backmapping._full_backmapping_np for more info on why this is easier.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case, it is ‘POSITION’.

Type:

str

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature. This list has as many entries as atoms in self.top.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These can be used to stack features from different topologies.

Returns:

A list of labels.

Return type:

list[str]

property name: str#

The name of this class: ‘AllCartesians’

Type:

str

prefix_label: str = 'POSITION '#
class AngleFeature(traj, angle_indexes, deg=False, cossin=False, periodic=True, check_aas=True, delayed=False)[source]#

Bases: Feature

Parameters:
_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask delayed computations are distributed, the required Python objects are pickled. Every feature would thus need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculations as transform() in a more bare-bones way: it forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it still needs the indexes in self.indexes, which is why the dask_indices property informs the scheduler to also pickle this object and pass it to the workers.

Parameters:
  • indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • deg (bool) – Whether to return the result in degree (deg=True) or in radians (deg=False). Defaults to False (radians).

  • cossin (bool) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful, if you calculate the means (e.g. TICA/PCA, clustering) in that space. Defaults to False.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically the same information as unitcell_vectors, in a different representation. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Return type:

dask.delayed
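The cossin option matters because angles are periodic: averaging raw angle values close to +180° and -180° gives a misleading result near 0°, while averaging their (sin, cos) pairs does not. The sketch below illustrates this with the standard library only. The helper names cossin_pairs and circular_mean are hypothetical and not part of EncoderMap.

```python
import math


def cossin_pairs(angles):
    """Map each angle (radians) to a (sin(x), cos(x)) pair."""
    return [(math.sin(a), math.cos(a)) for a in angles]


def circular_mean(angles):
    """Average in (sin, cos) space and convert back to an angle."""
    pairs = cossin_pairs(angles)
    mean_sin = sum(s for s, _ in pairs) / len(pairs)
    mean_cos = sum(c for _, c in pairs) / len(pairs)
    return math.atan2(mean_sin, mean_cos)


# Two angles that are only 2 degrees apart on the circle.
angles = [math.radians(179.0), math.radians(-179.0)]

naive_mean = sum(angles) / len(angles)  # lands on the opposite side of the circle
circ_mean = circular_mean(angles)       # lands between the two angles

print(math.degrees(naive_mean), math.degrees(circ_mean))
```

The same reasoning applies to methods that rely on means or variances, such as PCA, TICA, or clustering.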

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically the same information as unitcell_vectors, in a different representation. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray
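The relation between unitcell_info (three lengths, three angles) and unitcell_vectors (a 3x3 matrix per frame) can be sketched for a single frame as follows. This assumes the common convention in which a lies along x and b lies in the xy-plane, with alpha the angle between b and c, beta between a and c, and gamma between a and b. Whether EncoderMap uses exactly this orientation is an assumption, and box_vectors_from_info is a hypothetical helper name.

```python
import math


def box_vectors_from_info(info):
    """Convert (a, b, c, alpha, beta, gamma) to three box vectors.

    Lengths are in nanometer, angles in degrees. Assumes a is along x
    and b is in the xy-plane (a common, but not the only, convention).
    """
    a_len, b_len, c_len, alpha, beta, gamma = info
    alpha, beta, gamma = (math.radians(x) for x in (alpha, beta, gamma))

    a = (a_len, 0.0, 0.0)
    b = (b_len * math.cos(gamma), b_len * math.sin(gamma), 0.0)
    cx = c_len * math.cos(beta)
    cy = c_len * (math.cos(alpha) - math.cos(beta) * math.cos(gamma)) / math.sin(gamma)
    cz = math.sqrt(c_len**2 - cx**2 - cy**2)
    return (a, b, (cx, cy, cz))


# An orthorhombic box yields a (numerically) diagonal matrix.
orthorhombic = box_vectors_from_info((2.0, 3.0, 4.0, 90.0, 90.0, 90.0))
# With gamma = 60 degrees, b tilts towards a.
tilted = box_vectors_from_info((1.0, 1.0, 1.0, 90.0, 90.0, 60.0))

for row in orthorhombic:
    print([round(v, 6) for v in row])
```

For a whole trajectory, the same conversion would be applied frame by frame, turning an (n_frames, 6) array into an (n_frames, 3, 3) array.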

class BackboneTorsionFeature(traj, selstr=None, deg=False, cossin=False, periodic=True, delayed=False)[source]#

Bases: DihedralFeature

Parameters:
_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature. The length is determined by the dih_indexes and the cossin argument in the __init__() method. If cossin is false, then len(describe()) == self.angle_indexes[-1], else len(describe()) is twice as long.

Return type:

list[str]

class CentralAngles(traj, deg=False, cossin=False, periodic=True, generic_labels=False, check_aas=True, delayed=False)[source]#

Bases: AngleFeature

Feature that collects all angles in the backbone of a topology.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘CENTERANGLE’.

Type:

str

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These can be used to stack features from different topologies.

Returns:

A list of labels.

Return type:

list[str]

property indexes: ndarray#

A (n_angles, 3) shaped numpy array giving the atom indices of the angles to be calculated.

Type:

np.ndarray

property name: str#

The name of the class: “CentralAngles”.

Type:

str

prefix_label: str = 'CENTERANGLE     '#
class CentralBondDistances(traj, distance_indexes=None, periodic=True, generic_labels=False, check_aas=True, delayed=False)[source]#

Bases: AllBondDistances

Feature that collects all bonds in the backbone of a topology.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case, it is ‘CENTERDISTANCE’.

Type:

str

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These can be used to stack features from different topologies.

Returns:

A list of labels.

Return type:

list[str]

property indexes: ndarray#

An (n_distances, 2) shaped numpy array giving the atom indices of the distances to be calculated.

Type:

np.ndarray

property name: str#

The name of the class: “CentralBondDistances”.

Type:

str

prefix_label: str = 'CENTERDISTANCE  '#
class CentralCartesians(traj, generic_labels=False, check_aas=True, delayed=False)[source]#

Bases: SelectionFeature

Feature that collects all cartesian positions of the backbone atoms.

Examples

>>> import encodermap as em
>>> from pprint import pprint
>>> traj = em.load_project("pASP_pGLU", 0)[0]
>>> traj  
<encodermap.SingleTraj object...>
>>> feature = em.features.CentralCartesians(traj, generic_labels=False)
>>> pprint(feature.describe())  
['CENTERPOS X     ATOM     N:    0 GLU:   1 CHAIN 0',
 'CENTERPOS Y     ATOM     N:    0 GLU:   1 CHAIN 0',
 'CENTERPOS Z     ATOM     N:    0 GLU:   1 CHAIN 0',
 'CENTERPOS X     ATOM    CA:    3 GLU:   1 CHAIN 0',
 'CENTERPOS Y     ATOM    CA:    3 GLU:   1 CHAIN 0',
 'CENTERPOS Z     ATOM    CA:    3 GLU:   1 CHAIN 0',
 '...
 'CENTERPOS Z     ATOM     C:   65 GLU:   6 CHAIN 0']
>>> feature = em.features.CentralCartesians(traj, generic_labels=True)
>>> pprint(feature.describe())  
['CENTERPOS X 1',
 'CENTERPOS Y 1',
 'CENTERPOS Z 1',
 'CENTERPOS X 2',
 'CENTERPOS Y 2',
 'CENTERPOS Z 2',
 '...
 'CENTERPOS Z 18']
Parameters:
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These can be used to stack features from different topologies.

Returns:

A list of labels.

Return type:

list[str]
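The reason generic labels help when combining data from different topologies can be shown with a small sketch. Two peptides of equal length but different sequence get different residue-specific labels, yet identical generic labels. The generic format below follows the 'CENTERPOS X 1' pattern from the example above. The residue-specific format and both helper functions are simplified, hypothetical stand-ins, not EncoderMap's actual implementation.

```python
def specific_labels(residues):
    """Labels that include residue names (simplified, hypothetical format)."""
    return [
        f"CENTERPOS {axis} ATOM {atom} {res}{i}"
        for i, res in enumerate(residues, start=1)
        for atom in ("N", "CA", "C")
        for axis in "XYZ"
    ]


def generic_labels(residues):
    """Labels that only number the backbone atoms along the chain."""
    n_atoms = 3 * len(residues)  # N, CA, C per residue
    return [f"CENTERPOS {axis} {k}" for k in range(1, n_atoms + 1) for axis in "XYZ"]


peptide_a = ["MET", "ALA", "GLY"]
peptide_b = ["GLU", "ASP", "LYS"]

# Residue-specific labels differ, so the feature axes cannot be matched by name.
same_specific = specific_labels(peptide_a) == specific_labels(peptide_b)
# Generic labels agree, so the columns of both CVs line up.
same_generic = generic_labels(peptide_a) == generic_labels(peptide_b)

print(same_specific, same_generic)
```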

property name: str#

The name of the class: “CentralCartesians”.

Type:

str

prefix_label: str = 'CENTERPOS'#
class CentralDihedrals(traj, selstr=None, deg=False, cossin=False, periodic=True, omega=True, generic_labels=False, check_aas=True, delayed=False)[source]#

Bases: DihedralFeature

Feature that collects all dihedrals in the backbone of a topology.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = True#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature. The length is determined by the dih_indexes and the cossin argument in the __init__() method. If cossin is false, then len(describe()) == self.angle_indexes[-1], else len(describe()) is twice as long.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These can be used to stack features from different topologies.

Returns:

A list of labels.

Return type:

list[str]

property indexes: ndarray#

A (n_angles, 4) shaped numpy array giving the atom indices of the dihedral angles to be calculated.

Type:

np.ndarray

property name: str#

The name of the class: “CentralDihedrals”.

Type:

str

exception CitePYEMMAWarning[source]#

Bases: UserWarning

class ContactFeature(traj, distance_indexes, threshold=5.0, periodic=True, count_contacts=False, delayed=False)[source]#

Bases: DistanceFeature

Defines certain distances as contacts and returns a binary (0, 1) result.

Instead of returning the binary result, this feature can also count contacts when the argument count_contacts=True is provided at instantiation. In that case, every frame returns a single integer.
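The two output modes can be illustrated with a minimal sketch that starts from already computed distances. The distance values, the threshold of 0.5 nm, and the use of a strict "less than" comparison are assumptions made for illustration; the sketch does not call EncoderMap.

```python
threshold = 0.5  # nm, hypothetical value for this illustration

# Distances for 2 frames and 3 atom pairs, shape (n_frames, n_pairs), in nm.
distances = [
    [0.30, 0.80, 0.45],
    [0.60, 0.70, 0.20],
]

# Binary mode: 1 where the distance is below the threshold, else 0.
contacts = [[1 if d < threshold else 0 for d in frame] for frame in distances]

# Counting mode: one integer per frame.
counts = [sum(frame) for frame in contacts]

print(contacts)
print(counts)
```

In binary mode the feature axis has one entry per atom pair, while in counting mode it collapses to a single value per frame.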

Parameters:
_nonstandard_transform_args: list[str] = ['threshold', 'count_contacts']#
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask delayed computations are distributed, the required Python objects are pickled. Every feature would thus need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculations as transform() in a more bare-bones way: it forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it still needs the indexes in self.indexes, which is why the dask_indices property informs the scheduler to also pickle this object and pass it to the workers.

Parameters:
  • indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • threshold (float) – The threshold in nm, under which a distance is considered to be a contact. Defaults to 5.0 nm.

  • count_contacts (bool) – When True, return an integer of the number of contacts instead of returning the array of regular contacts.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically the same information as unitcell_vectors, in a different representation. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Return type:

dask.delayed
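The minimum image convention mentioned for the periodic argument can be sketched for the simplest case, an orthorhombic box: each component of the displacement is shifted by a whole number of box lengths so that the shortest periodic image is used. This is an illustrative stand-alone function with a hypothetical name. General triclinic cells need more care, and this is not EncoderMap's implementation.

```python
import math


def minimum_image_distance(p, q, box_lengths):
    """Distance between p and q in an orthorhombic periodic box (all in nm)."""
    total = 0.0
    for pi, qi, length in zip(p, q, box_lengths):
        d = qi - pi
        # Shift by whole box lengths so that |d| <= length / 2.
        d -= length * round(d / length)
        total += d * d
    return math.sqrt(total)


box = (3.0, 3.0, 3.0)
p = (0.1, 0.0, 0.0)
q = (2.9, 0.0, 0.0)

direct = math.dist(p, q)                      # ignores periodicity
periodic = minimum_image_distance(p, q, box)  # uses the nearest image

print(direct, periodic)
```

Here the two atoms sit on opposite sides of the box, so the periodic distance is much shorter than the direct one. This is the situation that arises when a protein is broken across the periodic boundary.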

prefix_label: str = 'CONTACT:'#
transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically the same information as unitcell_vectors, in a different representation. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

class CustomFeature(fun, dim, traj=None, description=None, fun_args=(), fun_kwargs=None, delayed=False)[source]#

Bases: Feature

Parameters:
_args: tuple[Any, ...] | None = None#
_fun: Callable | None = None#
_is_custom: Final[True] = True#
_kwargs: dict[str, Any] | None = None#
_nonstandard_transform_args: list[str] = ['top', 'indexes', 'delayed_call', '_fun', '_args', '_kwargs']#
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
property dask_indices#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The CustomFeature dask transform is still under development.

Parameters:
  • top (md.Topology)

  • indexes (np.ndarray)

  • delayed_call (Optional[Callable])

  • _fun (Optional[Callable])

  • _args (Optional[Sequence[Any]])

  • _kwargs (Optional[dict[str, Any]])

  • xyz (Optional[np.ndarray])

  • unitcell_vectors (Optional[np.ndarray])

  • unitcell_info (Optional[np.ndarray])

Return type:

np.ndarray

delayed: bool = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

indexes: np.ndarray | None = None#
top: md.Topology | None = None#
traj: SingleTraj | None = None#
transform(traj=None, xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically the same information as unitcell_vectors, in a different representation. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

  • traj (Trajectory | None)

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

class DihedralFeature(traj, dih_indexes, deg=False, cossin=False, periodic=True, check_aas=True, delayed=False)[source]#

Bases: AngleFeature

Dihedrals are torsion angles defined by four atoms.
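How four atom positions define a torsion angle can be sketched with the standard atan2 formula for a single frame. This is a stdlib-only illustration with hypothetical helper names. It ignores periodic boundaries and is not the routine EncoderMap uses internally.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle (radians) around the p1-p2 bond, in the range [-pi, pi]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


# The central bond runs along z. Only the last atom is moved.
p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)

cis = dihedral(p0, p1, p2, (1.0, 0.0, 1.0))     # same side: 0
trans = dihedral(p0, p1, p2, (-1.0, 0.0, 1.0))  # opposite side: pi in magnitude
quarter = dihedral(p0, p1, p2, (0.0, 1.0, 1.0)) # perpendicular planes

print(math.degrees(cis), math.degrees(trans), math.degrees(quarter))
```

Applied to every frame and every row of an (n_dihedrals, 4) index array, this yields a CV of shape (n_frames, n_dihedrals).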

Parameters:
_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask delayed computations are distributed, the required Python objects are pickled. Every feature would thus need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculations as transform() in a more bare-bones way: it forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it still needs the indexes in self.indexes, which is why the dask_indices property informs the scheduler to also pickle this object and pass it to the workers.

Parameters:
  • indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • deg (bool) – Whether to return the result in degree (deg=True) or in radians (deg=False). Defaults to False (radians).

  • cossin (bool) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful, if you calculate the means (e.g. TICA/PCA, clustering) in that space. Defaults to False.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Basically the same information as unitcell_vectors, in a different representation. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Return type:

dask.delayed

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature. The length is determined by the dih_indexes and the cossin argument in the __init__() method. If cossin is false, then len(describe()) == self.angle_indexes[-1], else len(describe()) is twice as long.

Return type:

list[str]
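
To make the size of the feature axis concrete, the following sketch builds the backbone torsion labels quoted in the description above. The function backbone_torsion_labels is hypothetical and only mirrors the label pattern from the text. It is not the library's implementation.

```python
def backbone_torsion_labels(n_residues):
    """Return 3 * n_residues - 3 labels in the order phi, omega, psi."""
    labels = []
    for i in range(1, n_residues):
        labels.extend([f"phi_{i}", f"omega_{i}", f"psi_{i}"])
    return labels

labels = backbone_torsion_labels(4)  # 4 residues give 3 * 4 - 3 = 9 labels
```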

transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Describes the same unit cell as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray
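
The relation between unitcell_info and unitcell_vectors can be sketched with NumPy. The function info_to_vectors is a hypothetical helper. It assumes the common convention that a lies along x and b lies in the xy-plane, with alpha between b and c, beta between a and c, and gamma between a and b.

```python
import numpy as np

def info_to_vectors(unitcell_info):
    """Convert (n_frames, 6) lengths [nm] and angles [deg] to (n_frames, 3, 3).

    The rows of each 3x3 matrix are the vectors a, b, c (assumed convention:
    a along x, b in the xy-plane).
    """
    info = np.asarray(unitcell_info, dtype=float)
    a, b, c = info[:, 0], info[:, 1], info[:, 2]
    alpha, beta, gamma = np.deg2rad(info[:, 3:6]).T
    zeros = np.zeros_like(a)
    cx = c * np.cos(beta)
    cy = c * (np.cos(alpha) - np.cos(beta) * np.cos(gamma)) / np.sin(gamma)
    cz = np.sqrt(np.clip(c**2 - cx**2 - cy**2, 0.0, None))
    vec_a = np.stack([a, zeros, zeros], axis=1)
    vec_b = np.stack([b * np.cos(gamma), b * np.sin(gamma), zeros], axis=1)
    vec_c = np.stack([cx, cy, cz], axis=1)
    return np.stack([vec_a, vec_b, vec_c], axis=1)

# Five frames of an orthorhombic box with edges 2, 3, and 4 nm.
info = np.array([[2.0, 3.0, 4.0, 90.0, 90.0, 90.0]] * 5)
vectors = info_to_vectors(info)
```

For an orthorhombic box the result is a diagonal matrix per frame. For triclinic boxes the vector lengths still match the first three columns of unitcell_info.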

class DistanceFeature(traj, distance_indexes, periodic=True, dim=None, check_aas=True, delayed=False)[source]#

Bases: Feature

Parameters:
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask delayed computations are distributed, the required Python objects are pickled. Every feature would therefore need its own pickled traj, which defeats the purpose of dask distributed. This method thus implements the same calculations as transform() in a more barebones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.

Parameters:
  • indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based index of the atoms whose positions should be returned.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Describes the same unit cell as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Return type:

dask.delayed
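
The design described for dask_transform() can be sketched without dask. SketchDistanceFeature is a hypothetical stand-in, not EncoderMap's class, and the argument order of its static method is an assumption. The point is that the static method needs only plain arrays, so a worker never has to receive the instance and its trajectory.

```python
import numpy as np

class SketchDistanceFeature:
    """Hypothetical stand-in that mimics the transform/dask_transform split."""

    def __init__(self, xyz, indexes):
        self.xyz = np.asarray(xyz, dtype=float)  # stands in for the heavy traj object
        self.indexes = np.asarray(indexes)

    @staticmethod
    def dask_transform(indexes, xyz):
        # Only plain arrays are needed here, so `self` never has to be serialized.
        delta = xyz[:, indexes[:, 1]] - xyz[:, indexes[:, 0]]
        return np.linalg.norm(delta, axis=-1)

    def transform(self):
        # The instance method delegates to the same barebones calculation.
        return self.dask_transform(self.indexes, self.xyz)

xyz = np.zeros((3, 2, 3))
xyz[:, 1, 0] = 1.0  # the second atom sits 1 nm along x in every frame
feat = SketchDistanceFeature(xyz, [[0, 1]])
via_instance = feat.transform()
via_static = SketchDistanceFeature.dask_transform(feat.indexes, xyz)
```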

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

prefix_label: str = 'DIST:'#
transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Describes the same unit cell as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray
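
A distance feature with periodic=True relies on the minimum image convention. The sketch below illustrates the idea for the special case of an orthorhombic box. The function pair_distances and its box_lengths argument are hypothetical, and general triclinic cells need the full unitcell_vectors instead.

```python
import numpy as np

def pair_distances(xyz, pairs, box_lengths=None):
    """Distances for atom pairs, shape (n_frames, n_pairs).

    xyz: (n_frames, n_atoms, 3) in nm. pairs: (n_pairs, 2) atom indices.
    box_lengths: optional (n_frames, 3) edge lengths of an orthorhombic box.
    """
    xyz = np.asarray(xyz, dtype=float)
    pairs = np.asarray(pairs)
    delta = xyz[:, pairs[:, 1]] - xyz[:, pairs[:, 0]]  # (n_frames, n_pairs, 3)
    if box_lengths is not None:
        box = np.asarray(box_lengths, dtype=float)[:, None, :]
        delta -= box * np.round(delta / box)  # wrap to the nearest periodic image
    return np.linalg.norm(delta, axis=-1)

xyz = np.array([[[0.1, 0.0, 0.0], [2.9, 0.0, 0.0]]])  # one frame, two atoms
box = np.array([[3.0, 3.0, 3.0]])
d_plain = pair_distances(xyz, [[0, 1]])     # 2.8 nm through the box
d_pbc = pair_distances(xyz, [[0, 1]], box)  # 0.2 nm across the boundary
```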

class Feature(traj, check_aas=True, periodic=None, delayed=False)[source]#

Bases: object

Parent class to all feature classes. Implements the FeatureMeta, the transform method, and checks for unknown amino acids.

This class implements functionality that holds true for all features. The transform() method can be used by subclasses in two ways:

  • Provide all args with None. In this case, the traj in self.traj will be used to calculate the transformation.

  • Provide custom xyz, unitcell_vectors, and unitcell_info. In this case, the provided arrays are used instead of the data in self.traj.

Parameters:
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
property dimension: int#

The dimension of the feature.

Type:

int

transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Carries out the computation of the CVs.

For featurization of single trajs, all arguments can be left None, and the values of the traj at class instantiation will be returned by this method. For ensembles with a single topology, but multiple trajectories, the xyz, unitcell_vectors, and unitcell_info should be provided accordingly. This parent class’ transform then carries out checks (do all arguments provide the same number of frames, does the xyz array have the same number of atoms as the traj at instantiation, do the unitcell_angles coincide with the one of the parent traj, …). Thus, it is generally advised to call this method with super() to run these checks.

Parameters:
  • xyz (Optional[np.ndarray]) – If None, the coordinates of the trajectory that was provided as traj when the feature was instantiated will be used.

  • unitcell_vectors (Optional[np.ndarray]) – If None, the unitcell vectors of the trajectory that was provided as traj when the feature was instantiated will be used. The unitcell_vectors are arrays with shape (n_frames, 3, 3), where the rows are the Bravais vectors a, b, c.

  • unitcell_info (Optional[np.ndarray]) – If None, the unitcell info of the trajectory that was provided as traj when the feature was instantiated will be used. The unitcell_info is an array with shape (n_frames, 6), where the first three columns are the unitcell lengths in nm and the remaining columns are the unitcell angles in degrees.

Returns:

A tuple containing three np.ndarrays:
  • The xyz coordinates.

  • The unitcell_vectors

  • The unitcell_info

Return type:

tuple
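
The kind of consistency checks described for the parent transform() can be sketched as follows. The function check_transform_inputs is hypothetical and covers only the shape checks (frame count and atom count). The actual method also compares unit-cell angles with the parent trajectory.

```python
import numpy as np

def check_transform_inputs(xyz, unitcell_vectors, unitcell_info, n_atoms):
    """Raise ValueError if the arrays disagree in frame count or atom count."""
    xyz = np.asarray(xyz)
    if xyz.ndim != 3 or xyz.shape[1:] != (n_atoms, 3):
        raise ValueError(f"xyz must have shape (n_frames, {n_atoms}, 3), got {xyz.shape}")
    n_frames = xyz.shape[0]
    if unitcell_vectors is not None and np.shape(unitcell_vectors) != (n_frames, 3, 3):
        raise ValueError("unitcell_vectors must have shape (n_frames, 3, 3)")
    if unitcell_info is not None and np.shape(unitcell_info) != (n_frames, 6):
        raise ValueError("unitcell_info must have shape (n_frames, 6)")
    # Mirrors the documented return value: a tuple of the three inputs.
    return xyz, unitcell_vectors, unitcell_info

checked = check_transform_inputs(
    np.zeros((4, 2, 3)), np.zeros((4, 3, 3)), np.zeros((4, 6)), n_atoms=2
)
```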

class FeatureMeta(name, bases, dct)[source]#

Bases: type

Inspects the __init__ of classes and adds attributes to them based on their call signature.

If a feature uses the arguments deg or omega in its call signature, the instance will have the CLASS attributes _use_angle and _use_omega set to True. Otherwise, the instance will have them set as False.

This allows other functions that use these features to easily discern whether they need these arguments before instantiating the classes.

Example

>>> from encodermap.loading import features
>>> f_class = getattr(features, "SideChainDihedrals")
>>> f_class._use_angle
True
>>> f_class._use_omega
False
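
The mechanism can be sketched with a small metaclass that inspects the __init__ signature. SignatureMeta and the two example classes are hypothetical and only illustrate the idea. They are not the code of FeatureMeta.

```python
import inspect

class SignatureMeta(type):
    """Hypothetical metaclass: set class flags from the __init__ signature."""

    def __init__(cls, name, bases, dct):
        super().__init__(name, bases, dct)
        params = inspect.signature(cls.__init__).parameters
        cls._use_angle = "deg" in params
        cls._use_omega = "omega" in params

class PlainFeature(metaclass=SignatureMeta):
    def __init__(self, traj):
        self.traj = traj

class AngleLikeFeature(PlainFeature):
    def __init__(self, traj, deg=False):
        super().__init__(traj)
        self.deg = deg

# The flags are readable on the classes without instantiating them.
flags = (PlainFeature._use_angle, AngleLikeFeature._use_angle, AngleLikeFeature._use_omega)
```

Because the flags live on the class, calling code can decide which keyword arguments to pass before it creates an instance.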
class GroupCOMFeature(traj, group_definitions, ref_geom=None, image_molecules=False, mass_weighted=True, delayed=False)[source]#

Bases: Feature

Cartesian coordinates of the center-of-mass (COM) of atom groups.

Groups can be defined as sequences of sequences of int, so a list of lists of int can be used to define groups of various sizes. The resulting array will have the shape (n_frames, n_groups * 3). The xyz coordinates are flattened, so the array can be rebuilt with np.dstack().

Examples

>>> import encodermap as em
>>> import numpy as np
>>> traj = em.SingleTraj.from_pdb_id("1YUG")
>>> f = em.features.GroupCOMFeature(
...     traj=traj,
...     group_definitions=[
...         [0, 1, 2],
...         [3, 4, 5, 6, 7],
...         [8, 9, 10],
...     ]
... )
>>> a = f.transform()
>>> a.shape  # this array is flattened along the feature axis
(15, 9)
>>> a = np.dstack([
...     a[..., ::3],
...     a[..., 1::3],
...     a[..., 2::3],
... ])
>>> a.shape  # now the z coordinate of the 2nd center of mass is a[:, 1, -1]
(15, 3, 3)
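
The flattened layout used in the example above can be reproduced with plain NumPy. The function group_coms is a hypothetical sketch. It assumes uniform masses unless a masses array is given, and it skips centering and imaging. It also shows that a reshape gives the same result as the np.dstack() call.

```python
import numpy as np

def group_coms(xyz, groups, masses=None):
    """Flattened centers of mass with shape (n_frames, 3 * n_groups)."""
    xyz = np.asarray(xyz, dtype=float)
    if masses is None:
        masses = np.ones(xyz.shape[1])  # unweighted: geometric center
    masses = np.asarray(masses, dtype=float)
    coms = []
    for group in groups:
        weights = masses[group] / masses[group].sum()
        coms.append(np.einsum("a,fax->fx", weights, xyz[:, group]))  # (n_frames, 3)
    return np.concatenate(coms, axis=1)  # x1, y1, z1, x2, y2, z2, ...

rng = np.random.default_rng(0)
xyz = rng.normal(size=(15, 11, 3))  # random stand-in coordinates
flat = group_coms(xyz, [[0, 1, 2], [3, 4, 5, 6, 7], [8, 9, 10]])
stacked = np.dstack([flat[:, ::3], flat[:, 1::3], flat[:, 2::3]])
reshaped = flat.reshape(flat.shape[0], -1, 3)  # same array as `stacked`
```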

Note

Centering (ref_geom) and imaging (image_molecules=True) can be time-consuming. Consider doing this to your trajectory files prior to featurization.

Parameters:
  • traj (SingleTraj)

  • group_definitions (Sequence[Sequence[int]])

  • ref_geom (Optional[md.Trajectory])

  • image_molecules (bool)

  • mass_weighted (bool)

  • delayed (bool)

_nonstandard_transform_args: list[str] = ['top', 'ref_geom', 'image_molecules', 'masses_in_groups']#
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask delayed computations are distributed, the required Python objects are pickled. Every feature would therefore need its own pickled traj, which defeats the purpose of dask distributed. This method thus implements the same calculations as transform() in a more barebones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.

Parameters:
  • indexes (np.ndarray) – For this special feature, the indexes argument in the @staticmethod dask_transform is self.group_definitions.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • threshold (float) – The threshold in nm, under which a distance is considered to be a contact. Defaults to 5.0 nm.

  • count_contacts (bool) – When True, return an integer of the number of contacts instead of returning the array of regular contacts.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Describes the same unit cell as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

  • top (md.Topology)

  • ref_geom (Union[md.Trajectory, None])

  • image_molecules (bool)

  • masses_in_groups (list[float])

Return type:

dask.delayed

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Describes the same unit cell as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

class InverseDistanceFeature(traj, distance_indexes, periodic=True, delayed=False)[source]#

Bases: DistanceFeature

Parameters:
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask delayed computations are distributed, the required Python objects are pickled. Every feature would therefore need its own pickled traj, which defeats the purpose of dask distributed. This method thus implements the same calculations as transform() in a more barebones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.

Parameters:
  • indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based index of the atoms whose positions should be returned.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Describes the same unit cell as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Return type:

dask.delayed

prefix_label: str = 'INVDIST:'#
transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Describes the same unit cell as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

class MinRmsdFeature(traj, ref, ref_frame=0, atom_indices=None, precentered=False, delayed=False)[source]#

Bases: Feature

Parameters:
_nonstandard_transform_args: list[str] = ['top', 'ref']#
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
property dask_indices#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

Takes xyz and unitcell information to apply the topological calculations on.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Describes the same unit cell as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

  • indexes (np.ndarray)

  • top (md.Topology)

  • ref (md.Trajectory)

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Describes the same unit cell as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray
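
A minimal RMSD after optimal superposition can be sketched with the Kabsch approach via SVD. The function min_rmsd is hypothetical, the output shape (n_frames, 1) is an assumption, and the sketch uses all atoms without mass weighting. The feature itself may delegate to a trajectory library instead.

```python
import numpy as np

def min_rmsd(xyz, ref):
    """RMSD of every frame to `ref` after optimal superposition, shape (n_frames, 1).

    xyz: (n_frames, n_atoms, 3); ref: (n_atoms, 3).
    """
    xyz = np.asarray(xyz, dtype=float)
    q = np.asarray(ref, dtype=float)
    q = q - q.mean(axis=0)                     # center the reference
    p = xyz - xyz.mean(axis=1, keepdims=True)  # center every frame
    cov = np.einsum("fai,aj->fij", p, q)       # (n_frames, 3, 3) covariance matrices
    u, s, vt = np.linalg.svd(cov)
    sign = np.sign(np.linalg.det(u) * np.linalg.det(vt))
    s[:, -1] *= sign                           # exclude improper rotations (reflections)
    e0 = (p**2).sum(axis=(1, 2)) + (q**2).sum()
    msd = np.clip(e0 - 2.0 * s.sum(axis=1), 0.0, None) / q.shape[0]
    return np.sqrt(msd)[:, None]

rng = np.random.default_rng(1)
ref = rng.normal(size=(10, 3))
theta = 0.7
rot = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
# Frame 0 is the reference itself, frame 1 is a rotated and shifted copy.
frames = np.stack([ref, ref @ rot.T + 1.5])
rmsd = min_rmsd(frames, ref)
```

Both frames differ from the reference only by a rigid-body motion, so their minimal RMSD is zero up to floating-point error.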

class ResidueCOMFeature(traj, residue_indices, residue_atoms, scheme='all', ref_geom=None, image_molecules=False, mass_weighted=True, delayed=False)[source]#

Bases: GroupCOMFeature

Parameters:
  • traj (SingleTraj)

  • residue_indices (Sequence[int])

  • residue_atoms (np.ndarray)

  • scheme (Literal['all', 'backbone', 'sidechain'])

  • ref_geom (Optional[md.Trajectory])

  • image_molecules (bool)

  • mass_weighted (bool)

  • delayed (bool)

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
class ResidueMinDistanceFeature(traj, contacts, scheme, ignore_nonprotein, threshold, periodic, count_contacts=False, delayed=False)[source]#

Bases: DistanceFeature

Parameters:
  • traj (SingleTraj)

  • contacts (np.ndarray)

  • scheme (Literal['ca', 'closest', 'closest-heavy'])

  • ignore_nonprotein (bool)

  • threshold (float)

  • periodic (bool)

  • count_contacts (bool)

  • delayed (bool)

_nonstandard_transform_args: list[str] = ['threshold', 'count_contacts', 'scheme', 'top']#
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask delayed computations are distributed, the required Python objects are pickled. Every feature would therefore need its own pickled traj, which defeats the purpose of dask distributed. This method thus implements the same calculations as transform() in a more barebones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.

Parameters:
  • indexes (np.ndarray) – For this special feature, the indexes argument in the @staticmethod dask_transform is self.contacts.

  • periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.

  • threshold (float) – The threshold in nm, under which a distance is considered to be a contact. Defaults to 5.0 nm.

  • count_contacts (bool) – When True, return an integer of the number of contacts instead of returning the array of regular contacts.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Describes the same unit cell as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

  • top (md.Topology)

  • scheme (Literal['ca', 'closest', 'closest-heavy'])

Return type:

dask.delayed

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Describes the same unit cell as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray
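
The roles of threshold and count_contacts can be sketched on a precomputed distance array. The function contacts_from_distances is hypothetical. The integer encoding of contacts and the (n_frames, 1) shape of the counts are assumptions, not the confirmed output of this feature.

```python
import numpy as np

def contacts_from_distances(distances, threshold, count_contacts=False):
    """Turn (n_frames, n_pairs) distances in nm into contacts or contact counts."""
    distances = np.asarray(distances, dtype=float)
    contacts = distances < threshold                # boolean (n_frames, n_pairs)
    if count_contacts:
        return contacts.sum(axis=1, keepdims=True)  # (n_frames, 1) integer counts
    return contacts.astype(int)                     # 1 where a pair is in contact

distances = np.array([[0.3, 0.9, 0.4],
                      [1.2, 0.2, 2.0]])
contact_map = contacts_from_distances(distances, threshold=0.5)
contact_counts = contacts_from_distances(distances, threshold=0.5, count_contacts=True)
```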

class SelectionFeature(traj, indexes, check_aas=True, delayed=False)[source]#

Bases: Feature

Parameters:
_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
property dask_indices: str#

The name of the delayed transformation to carry out with this feature.

Type:

str

static dask_transform()#

The same as transform() but without the need to pickle traj.

When dask delayed computations are distributed, the required Python objects are pickled. Every feature would therefore need its own pickled traj, which defeats the purpose of dask distributed. This method thus implements the same calculations as transform() in a more barebones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.

Parameters:
  • indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based index of the atoms whose positions should be returned.

  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Describes the same unit cell as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Return type:

dask.delayed

describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

prefix_label: str = 'ATOM:'#
transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#

Takes xyz and unitcell information to apply the topological calculations on.

When this method is not provided with any input, it takes the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.

Parameters:
  • xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.

  • unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.

  • unitcell_info (Optional[np.ndarray]) – Holds the same information as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.

Returns:

The result of the computation with shape (n_frames, n_indexes).

Return type:

np.ndarray
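The two unit-cell arguments describe the same box in different forms. The following sketch derives an (n_frames, 6) array from (n_frames, 3, 3) vectors. The helper vectors_to_info is hypothetical, and the angle ordering assumes the usual crystallographic convention (alpha between b and c, beta between a and c, gamma between a and b).

```python
import numpy as np


def vectors_to_info(unitcell_vectors):
    """Hypothetical helper: (n_frames, 3, 3) vectors -> (n_frames, 6) info."""
    # Rows of each 3x3 matrix are the box vectors a, b, and c.
    a, b, c = (unitcell_vectors[:, i] for i in range(3))
    lengths = np.stack([np.linalg.norm(v, axis=-1) for v in (a, b, c)], axis=1)

    def angle_deg(u, v):
        cos = np.sum(u * v, axis=-1) / (
            np.linalg.norm(u, axis=-1) * np.linalg.norm(v, axis=-1)
        )
        return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

    # Assumed ordering: alpha = (b, c), beta = (a, c), gamma = (a, b).
    angles = np.stack([angle_deg(b, c), angle_deg(a, c), angle_deg(a, b)], axis=1)
    return np.hstack([lengths, angles])


# Five frames of an orthorhombic box with edge lengths 2, 3, and 4 nm.
box = np.tile(np.diag([2.0, 3.0, 4.0]), (5, 1, 1))
info = vectors_to_info(box)
```

For this orthorhombic box every row of info holds the three lengths followed by three right angles.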

class SideChainAngles(traj, deg=False, cossin=False, periodic=True, check_aas=True, generic_labels=False, delayed=False)[source]#

Bases: AngleFeature

Feature that collects all angles not in the backbone of a topology.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘SIDECHANGLE’.

Type:

str

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.

Returns:

A list of labels.

Return type:

list[str]

property indexes#

A (n_angles, 3) shaped numpy array giving the atom indices of the angles to be calculated.

Type:

np.ndarray
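How an (n_angles, 3) index array maps to one angle per frame and triplet can be sketched with plain numpy. The function triplet_angles is hypothetical, returns radians, and ignores periodic boundary conditions; it is not the routine EncoderMap uses.

```python
import numpy as np


def triplet_angles(xyz, triplets):
    """Angle at the middle atom of every (i, j, k) triplet, in radians."""
    # Vectors pointing from the middle atom j to the outer atoms i and k.
    v1 = xyz[:, triplets[:, 0]] - xyz[:, triplets[:, 1]]
    v2 = xyz[:, triplets[:, 2]] - xyz[:, triplets[:, 1]]
    cos = np.sum(v1 * v2, axis=-1) / (
        np.linalg.norm(v1, axis=-1) * np.linalg.norm(v2, axis=-1)
    )
    # Clip guards against rounding slightly outside [-1, 1].
    return np.arccos(np.clip(cos, -1.0, 1.0))


# One frame, four atoms; atom 1 sits at the origin.
xyz = np.array([[[1.0, 0.0, 0.0],
                 [0.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0],
                 [-1.0, 0.0, 0.0]]])
triplets = np.array([[0, 1, 2],   # right angle
                     [0, 1, 3]])  # straight line
angles = triplet_angles(xyz, triplets)
```

The result has shape (n_frames, n_angles), so every row of the index array contributes one column to the feature axis.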

property name#

The name of the class: “SideChainAngles”.

Type:

str

prefix_label: str = 'SIDECHANGLE '#
class SideChainBondDistances(traj, periodic=True, check_aas=True, generic_labels=False, delayed=False)[source]#

Bases: AllBondDistances

Feature that collects all bonds not in the backbone of a topology.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘SIDECHDISTANCE’.

Type:

str

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
generic_describe()[source]#

Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.

Returns:

A list of labels.

Return type:

list[str]

property indexes#

A (n_distances, 2) shaped numpy array giving the atom indices of the distances to be calculated.

Type:

np.ndarray
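A two-column index array of this kind selects atom pairs, and each pair yields one distance per frame. The sketch below shows the idea with plain numpy. The function pair_distances is hypothetical and ignores periodic boundary conditions; it is not the routine EncoderMap uses.

```python
import numpy as np


def pair_distances(xyz, pairs):
    """Distance between the two atoms of every index pair, per frame."""
    # Displacement vectors have shape (n_frames, n_pairs, 3).
    diff = xyz[:, pairs[:, 1]] - xyz[:, pairs[:, 0]]
    return np.linalg.norm(diff, axis=-1)


# Two frames, three atoms, coordinates in nanometer.
xyz = np.array([
    [[0.0, 0.0, 0.0], [0.3, 0.4, 0.0], [0.0, 0.0, 1.0]],
    [[0.0, 0.0, 0.0], [0.6, 0.8, 0.0], [0.0, 0.0, 2.0]],
])
pairs = np.array([[0, 1], [0, 2]])
dists = pair_distances(xyz, pairs)
```

The output has shape (n_frames, n_pairs), so the first axis stays aligned with the trajectory frames.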

property name#

The name of the class: “SideChainBondDistances”.

Type:

str

prefix_label: str = 'SIDECHDISTANCE  '#
class SideChainCartesians(traj, check_aas=True, generic_labels=False, delayed=False)[source]#

Bases: SelectionFeature

Feature that collects all cartesian positions of all non-backbone atoms.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘SIDECHPOS’.

Type:

str

_raise_on_unitcell = False#
_use_angle = False#
_use_omega = False#
_use_periodic = False#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.

Returns:

A list of labels.

Return type:

list[str]

property name#

The name of the class: “SideChainCartesians”.

Type:

str

prefix_label: str = 'SIDECHPOS'#
class SideChainDihedrals(traj, selstr=None, deg=False, cossin=False, periodic=True, generic_labels=False, check_aas=True, delayed=False)[source]#

Bases: DihedralFeature

Feature that collects all dihedrals in the sidechains of a topology.

Parameters:
top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

options#

A list of possible sidechain angles [‘chi1’ to ‘chi5’].

Type:

list[str]

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature. The length is determined by the dih_indexes and the cossin argument in the __init__() method. If cossin is false, then len(describe()) == self.angle_indexes[-1], else len(describe()) is twice as long.

Return type:

list[str]
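Why the cossin option doubles the number of labels can be sketched as follows. The helper cossin_expand, the interleaved column order, and the COS(...)/SIN(...) label format are assumptions made for illustration and may differ from what EncoderMap produces.

```python
import numpy as np


def cossin_expand(angles, labels):
    """Replace every angle column by a cosine and a sine column."""
    # Stack to (n_frames, n_angles, 2), then flatten the last two axes so the
    # columns read cos_0, sin_0, cos_1, sin_1, ...
    cos_sin = np.dstack((np.cos(angles), np.sin(angles)))
    values = cos_sin.reshape(angles.shape[0], -1)
    new_labels = [f"{fn}({lab})" for lab in labels for fn in ("COS", "SIN")]
    return values, new_labels


angles = np.array([[0.0, np.pi / 2]])  # one frame, two angles in radians
values, new_labels = cossin_expand(angles, ["chi1", "chi2"])
```

Each angle contributes two columns, so both the feature axis and the label list become twice as long.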

generic_describe()[source]#

Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.

Returns:

A list of labels.

Return type:

list[str]

property indexes: ndarray#

A (n_angles, 4) shaped numpy array giving the atom indices of the dihedral angles to be calculated.

Type:

np.ndarray
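How an (n_angles, 4) index array maps to one dihedral per frame and quadruplet can be sketched with plain numpy. The function dihedrals is hypothetical, uses the common arctan2 formulation, returns radians, and ignores periodic boundary conditions; it is not the routine EncoderMap uses.

```python
import numpy as np


def dihedrals(xyz, quads):
    """Dihedral angle of every atom quadruplet, per frame, in radians."""
    p0, p1, p2, p3 = (xyz[:, quads[:, k]] for k in range(4))
    # Three consecutive bond vectors along the quadruplet.
    b1, b2, b3 = p1 - p0, p2 - p1, p3 - p2
    # Normals of the planes spanned by (b2, b3) and (b1, b2).
    c1 = np.cross(b2, b3)
    c2 = np.cross(b1, b2)
    y = np.sum(b1 * c1, axis=-1) * np.linalg.norm(b2, axis=-1)
    x = np.sum(c1 * c2, axis=-1)
    return np.arctan2(y, x)


# One frame; atoms 3, 4, and 5 are alternative fourth atoms.
xyz = np.array([[[0.0, 1.0, 0.0],
                 [0.0, 0.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [1.0, 1.0, 0.0],    # cis to atom 0
                 [1.0, -1.0, 0.0],   # trans to atom 0
                 [1.0, 0.0, 1.0]]])  # perpendicular
quads = np.array([[0, 1, 2, 3], [0, 1, 2, 4], [0, 1, 2, 5]])
result = dihedrals(xyz, quads)
```

The absolute values of the three dihedrals are 0, pi, and pi/2 for the cis, trans, and perpendicular arrangements; the sign depends on the handedness convention of the formula.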

property name: str#

The name of the class: “SideChainDihedrals”.

Type:

str

options: list[str] = ['chi1', 'chi2', 'chi3', 'chi4', 'chi5']#
class SideChainTorsions(traj, selstr=None, deg=False, cossin=False, periodic=True, which='all', delayed=False)[source]#

Bases: DihedralFeature

Parameters:
  • traj (Union[SingleTraj, TrajEnsemble])

  • selstr (Optional[str])

  • deg (bool)

  • cossin (bool)

  • periodic (bool)

  • which (Union[Literal['all'], Sequence[Literal['chi1', 'chi2', 'chi3', 'chi4', 'chi5']]])

  • delayed (bool)

_raise_on_unitcell = False#
_use_angle = True#
_use_omega = False#
_use_periodic = True#
atom_feature = False#
describe()[source]#

Gives a list of strings describing this feature’s feature-axis.

A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].

Returns:

The labels of this feature. The length is determined by the dih_indexes and the cossin argument in the __init__() method. If cossin is false, then len(describe()) == self.angle_indexes[-1], else len(describe()) is twice as long.

Return type:

list[str]

options = ('chi1', 'chi2', 'chi3', 'chi4', 'chi5')#
_check_aas(traj)[source]#
Parameters:

traj (SingleTraj)

Return type:

None

_describe_atom(topology, index)[source]#

Returns a string describing the given atom.

Parameters:
  • topology (md.Topology) – An MDTraj Topology.

  • index (str) – The index of the atom.

Returns:

A description of the atom.

Return type:

str

describe_last_feats(feat, n=5)[source]#

Prints the description of the last n features.

Parameters:
  • feat (encodermap.Featurizer) – An instance of a featurizer.

  • n (int) – The number of last features to describe. Default is 5.

Return type:

None

pair(*numbers)[source]#

ConvertGroup’s (https://convertgroup.com/) implementation of Matthew Szudzik’s pairing function (http://szudzik.com/ElegantPairing.pdf)

Maps a pair of non-negative integers to a uniquely associated single non-negative integer. Pairing also generalizes for n non-negative integers, by recursively mapping the first pair. For example, a tuple (n_1, n_2, n_3) is mapped by first pairing n_1 and n_2 to a number n_p and then pairing n_p with n_3.

Parameters:

*numbers (int) – Variable length integers.

Returns:

The paired integer.

Return type:

int

unpair(number, n=2)[source]#

ConvertGroup’s (https://convertgroup.com/) implementation of Matthew Szudzik’s pairing function (http://szudzik.com/ElegantPairing.pdf)

The inverse function outputs the pair associated with a non-negative integer. Unpairing also generalizes by recursively unpairing a non-negative integer to n non-negative integers.

For example, to associate a number with three non-negative integers n_1, n_2, n_3, such that:

pairing(n_1, n_2, n_3) = number

the number will first be unpaired to n_p, n_3, then the n_p will be unpaired to n_1, n_2, producing the desired n_1, n_2 and n_3.

Parameters:
  • number (int) – The paired integer.

  • n (int) – The number of integers that are paired in number. Defaults to 2.

Returns:

A list of length n with the constituting ints.

Return type:

list[int]
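The pairing and unpairing described above can be sketched in a few lines. The function names szudzik_pair, szudzik_unpair, pair_many, and unpair_many are hypothetical; the sketch follows the formula from Szudzik's paper and is not the implementation shipped with EncoderMap.

```python
from math import isqrt


def szudzik_pair(x, y):
    """Map two non-negative integers to one non-negative integer."""
    if x < 0 or y < 0:
        raise ValueError("Only non-negative integers can be paired.")
    return y * y + x if x < y else x * x + x + y


def szudzik_unpair(z):
    """Inverse of szudzik_pair."""
    s = isqrt(z)          # integer square root, exact for large ints
    r = z - s * s
    return (r, s) if r < s else (s, r - s)


def pair_many(*numbers):
    """Pair n integers by repeatedly pairing the running result with the next."""
    if len(numbers) < 2:
        raise ValueError("At least two integers are needed.")
    result = szudzik_pair(numbers[0], numbers[1])
    for k in numbers[2:]:
        result = szudzik_pair(result, k)
    return result


def unpair_many(number, n=2):
    """Recover the n integers that were combined by pair_many."""
    out = []
    for _ in range(n - 1):
        # Split off the last integer, keep unpairing the remainder.
        number, last = szudzik_unpair(number)
        out.append(last)
    out.append(number)
    return out[::-1]


paired = pair_many(1, 2, 3)        # pairs (1, 2) to 5, then (5, 3) to 33
restored = unpair_many(paired, n=3)
```

The round trip recovers the original integers only when n matches the number of integers that were paired.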