Featurization#
- class Featurizer(traj)[source]#
EncoderMap’s featurization has drawn much inspiration from PyEMMA (markovmodel/PyEMMA).
EncoderMap’s Featurizer collects and computes collective variables (CVs). CVs are data that are aligned with MD trajectories on the frame/time axis. Besides the topology, trajectory data contains an axis for atoms and an axis for the cartesian coordinates (x, y, z), so that a trajectory can be understood as an array with shape (n_frames, n_atoms, 3). A CV is an array that is aligned with the frame/time axis and has its own feature axis. If the trajectory in our example has 3 residues (MET, ALA, GLY), we can define 6 dihedral angles along the backbone of this peptide. These angles are:
PSI1: Between MET1-N - MET1-CA - MET1-C - ALA2-N
OMEGA1: Between MET1-CA - MET1-C - ALA2-N - ALA2-CA
PHI1: Between MET1-C - ALA2-N - ALA2-CA - ALA2-C
PSI2: Between ALA2-N - ALA2-CA - ALA2-C - GLY3-N
OMEGA2: Between ALA2-CA - ALA2-C - GLY3-N - GLY3-CA
PHI2: Between ALA2-C - GLY3-N - GLY3-CA - GLY3-C
Thus, the collective variable ‘backbone-dihedrals’ provides an array of shape (n_frames, 6) and is aligned with the frame/time axis of the trajectory.
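The enumeration above can be reproduced with a short, pure-Python sketch. It is not part of EncoderMap; the function name backbone_dihedral_labels and the label format are assumptions made for illustration only. It shows that a chain of n residues yields 3*n-3 backbone dihedrals.

```python
def backbone_dihedral_labels(residues):
    """Return (label, atoms) pairs for the psi, omega, and phi angles
    spanning each pair of consecutive residues (hypothetical helper)."""
    dihedrals = []
    for i in range(len(residues) - 1):
        a = f"{residues[i]}{i + 1}"      # e.g. MET1
        b = f"{residues[i + 1]}{i + 2}"  # e.g. ALA2
        dihedrals.append((f"PSI{i + 1}", (f"{a}-N", f"{a}-CA", f"{a}-C", f"{b}-N")))
        dihedrals.append((f"OMEGA{i + 1}", (f"{a}-CA", f"{a}-C", f"{b}-N", f"{b}-CA")))
        dihedrals.append((f"PHI{i + 1}", (f"{a}-C", f"{b}-N", f"{b}-CA", f"{b}-C")))
    return dihedrals


dihedrals = backbone_dihedral_labels(["MET", "ALA", "GLY"])
for label, atoms in dihedrals:
    print(label, " - ".join(atoms))
```

For the three-residue peptide this prints six lines, which matches the feature axis of length 6 described above.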
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
Features#
Features contain topological information of proteins and other biomolecules.
This topological information can be calculated once and then combined with input coordinates to calculate frame-wise collective variables of MD simulations.
The features in this module used to inherit from PyEMMA’s features (markovmodel/PyEMMA), but PyEMMA has since been archived.
If you use EncoderMap’s featurization, make sure to also cite PyEMMA, from which a lot of this code was adopted:
@article{scherer_pyemma_2015,
author = {Scherer, Martin K. and Trendelkamp-Schroer, Benjamin
and Paul, Fabian and Pérez-Hernández, Guillermo and Hoffmann, Moritz and
Plattner, Nuria and Wehmeyer, Christoph and Prinz, Jan-Hendrik and Noé, Frank},
title = {{PyEMMA} 2: {A} {Software} {Package} for {Estimation},
{Validation}, and {Analysis} of {Markov} {Models}},
journal = {Journal of Chemical Theory and Computation},
volume = {11},
pages = {5525-5542},
year = {2015},
issn = {1549-9618},
shorttitle = {{PyEMMA} 2},
url = {http://dx.doi.org/10.1021/acs.jctc.5b00743},
doi = {10.1021/acs.jctc.5b00743},
urldate = {2015-10-19},
month = oct,
}
- class AlignFeature(traj, reference, indexes, atom_indices=None, ref_atom_indices=None, in_place=False, delayed=False)[source]#
Bases:
SelectionFeature
- Parameters:
traj (SingleTraj)
reference (md.Trajectory)
indexes (np.ndarray)
atom_indices (Optional[np.ndarray])
ref_atom_indices (Optional[np.ndarray])
in_place (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- class AllBondDistances(traj, distance_indexes=None, periodic=True, check_aas=True, delayed=False)[source]#
Bases:
DistanceFeature
Feature that collects all bonds in a topology.
- Parameters:
traj (SingleTraj)
distance_indexes (Optional[np.ndarray])
periodic (bool)
check_aas (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- generic_describe()[source]#
Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.
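The following sketch illustrates why generic labels make stacking possible. The label formats and both function names are hypothetical and do not reflect the strings EncoderMap actually produces. Two peptides of equal length but different sequence get different residue-specific labels, yet identical generic labels.

```python
def specific_labels(residues):
    """Labels that embed residue names (hypothetical format)."""
    pairs = zip(residues, residues[1:])
    return [f"DIST {a}{i}-{b}{i + 1}" for i, (a, b) in enumerate(pairs, start=1)]


def generic_labels(residues):
    """Labels that only encode the position on the feature axis."""
    return [f"DIST {i}" for i in range(1, len(residues))]


peptide_a = ["MET", "ALA", "GLY"]
peptide_b = ["LYS", "SER", "THR"]

print(specific_labels(peptide_a))
print(specific_labels(peptide_b))
print(generic_labels(peptide_a) == generic_labels(peptide_b))
```

Because the generic labels agree, the feature columns of both peptides can share one feature axis.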
- class AllCartesians(traj, check_aas=True, generic_labels=False, delayed=False)[source]#
Bases:
SelectionFeature
Feature that collects all cartesian positions of all atoms in the trajectory.
Note
The order of the cartesians is not as in standard MD coordinates. Rather than giving the positions of all atoms of the first residue, then all positions of the second, and so on, this feature gives all central (backbone) cartesians first, followed by the cartesians of the sidechains. This allows better and faster backmapping. See encodermap.misc.backmapping._full_backmapping_np for more information on why this is easier.
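A minimal sketch of such a backbone-first ordering follows. It assumes that the central atoms are N, CA, and C; the helper name backbone_first_order and the toy atom list are made up for illustration and are not EncoderMap code.

```python
BACKBONE_NAMES = ("N", "CA", "C")  # assumption: these are the central atoms

# (atom name, residue number) in the usual residue-by-residue MD order
atoms = [
    ("N", 1), ("CA", 1), ("CB", 1), ("C", 1), ("O", 1),
    ("N", 2), ("CA", 2), ("C", 2), ("O", 2),
]


def backbone_first_order(atoms):
    """Return atom indices with all backbone atoms first, then the rest."""
    central = [i for i, (name, _) in enumerate(atoms) if name in BACKBONE_NAMES]
    others = [i for i, (name, _) in enumerate(atoms) if name not in BACKBONE_NAMES]
    return central + others


order = backbone_first_order(atoms)
print([atoms[i] for i in order])
```

Indexing the atom axis of an (n_frames, n_atoms, 3) array with such an index list would give the reordered coordinates.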
- Parameters:
traj (SingleTraj)
check_aas (bool)
generic_labels (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- class AngleFeature(traj, angle_indexes, deg=False, cossin=False, periodic=True, check_aas=True, delayed=False)[source]#
Bases:
Feature
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
angle_indexes (np.ndarray)
deg (bool)
cossin (bool)
periodic (bool)
check_aas (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
str
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask delayed computations are distributed, the required python objects are pickled. Every feature would thus need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculations as transform() in a more barebones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it still needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.
- Parameters:
indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
deg (bool) – Whether to return the result in degree (deg=True) or in radians (deg=False). Defaults to False (radians).
cossin (bool) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful, if you calculate the means (e.g. TICA/PCA, clustering) in that space. Defaults to False.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Return type:
dask.delayed
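The cossin option exists because angles are periodic, so arithmetic means of raw angles can be misleading. The sketch below is plain Python, not EncoderMap code, and the function name circular_mean is an illustrative choice. It compares a naive mean with a mean taken in (sin, cos) space for two angles close to the ±180° boundary.

```python
import math


def circular_mean(angles):
    """Mean of angles (in radians) computed in sin/cos space."""
    mean_sin = sum(math.sin(a) for a in angles) / len(angles)
    mean_cos = sum(math.cos(a) for a in angles) / len(angles)
    return math.atan2(mean_sin, mean_cos)


angles = [math.radians(179.0), math.radians(-179.0)]

naive = sum(angles) / len(angles)   # lands near 0 degrees, far from both inputs
circular = circular_mean(angles)    # lands at +/-180 degrees, between both inputs

print(math.degrees(naive), math.degrees(circular))
```

The same reasoning applies to methods such as PCA, TICA, or clustering, which all rely on means or distances in the feature space.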
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is not provided with any input, it will take the traj_container provided as traj in the __init__() method and transform this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
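To make the (n_frames, n_indexes) return shape concrete, here is a minimal pure-Python sketch of an angle calculation over atom triplets. It is not the library’s implementation: the function name angle_transform is hypothetical, nested lists stand in for numpy arrays, and periodic boundary conditions are ignored.

```python
import math


def angle_transform(xyz, indexes, deg=False):
    """Compute the angle i-j-k for every frame and every index triplet.

    xyz: list of frames, each a list of (x, y, z) tuples.
    indexes: list of (i, j, k) atom index triplets, j being the vertex.
    Returns a nested list with shape (n_frames, n_indexes).
    """
    result = []
    for frame in xyz:
        row = []
        for i, j, k in indexes:
            u = [a - b for a, b in zip(frame[i], frame[j])]
            v = [a - b for a, b in zip(frame[k], frame[j])]
            dot = sum(a * b for a, b in zip(u, v))
            norm = math.hypot(*u) * math.hypot(*v)
            # clamp to [-1, 1] to guard against rounding errors
            theta = math.acos(max(-1.0, min(1.0, dot / norm)))
            row.append(math.degrees(theta) if deg else theta)
        result.append(row)
    return result


xyz = [
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],   # right angle
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],  # straight line
]
angles = angle_transform(xyz, [(0, 1, 2)], deg=True)
print(angles)
```

Two frames and one triplet give a result with two rows and one column.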
- class BackboneTorsionFeature(traj, selstr=None, deg=False, cossin=False, periodic=True, delayed=False)[source]#
Bases:
DihedralFeature
- Parameters:
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- class CentralAngles(traj, deg=False, cossin=False, periodic=True, generic_labels=False, check_aas=True, delayed=False)[source]#
Bases:
AngleFeature
Feature that collects all angles in the backbone of a topology.
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
deg (bool)
cossin (bool)
periodic (bool)
generic_labels (bool)
check_aas (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- generic_describe()[source]#
Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.
- class CentralBondDistances(traj, distance_indexes=None, periodic=True, generic_labels=False, check_aas=True, delayed=False)[source]#
Bases:
AllBondDistances
Feature that collects all bonds in the backbone of a topology.
- Parameters:
traj (SingleTraj)
distance_indexes (Optional[np.ndarray])
periodic (bool)
generic_labels (bool)
check_aas (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- generic_describe()[source]#
Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.
- class CentralCartesians(traj, generic_labels=False, check_aas=True, delayed=False)[source]#
Bases:
SelectionFeature
Feature that collects all cartesian positions of the backbone atoms.
Examples
>>> import encodermap as em
>>> from pprint import pprint
>>> traj = em.load_project("pASP_pGLU", 0)[0]
>>> traj
<encodermap.SingleTraj object...>
>>> feature = em.features.CentralCartesians(traj, generic_labels=False)
>>> pprint(feature.describe())
['CENTERPOS X ATOM N: 0 GLU: 1 CHAIN 0',
 'CENTERPOS Y ATOM N: 0 GLU: 1 CHAIN 0',
 'CENTERPOS Z ATOM N: 0 GLU: 1 CHAIN 0',
 'CENTERPOS X ATOM CA: 3 GLU: 1 CHAIN 0',
 'CENTERPOS Y ATOM CA: 3 GLU: 1 CHAIN 0',
 'CENTERPOS Z ATOM CA: 3 GLU: 1 CHAIN 0',
 '...
 'CENTERPOS Z ATOM C: 65 GLU: 6 CHAIN 0']
>>> feature = em.features.CentralCartesians(traj, generic_labels=True)
>>> pprint(feature.describe())
['CENTERPOS X 1',
 'CENTERPOS Y 1',
 'CENTERPOS Z 1',
 'CENTERPOS X 2',
 'CENTERPOS Y 2',
 'CENTERPOS Z 2',
 '...
 'CENTERPOS Z 18']
- Parameters:
traj (SingleTraj)
generic_labels (bool)
check_aas (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- class CentralDihedrals(traj, selstr=None, deg=False, cossin=False, periodic=True, omega=True, generic_labels=False, check_aas=True, delayed=False)[source]#
Bases:
DihedralFeature
Feature that collects all dihedrals in the backbone of a topology.
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
selstr (Optional[str])
deg (bool)
cossin (bool)
periodic (bool)
omega (bool)
generic_labels (bool)
check_aas (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = True#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- generic_describe()[source]#
Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.
- exception CitePYEMMAWarning[source]#
Bases:
UserWarning
- class ContactFeature(traj, distance_indexes, threshold=5.0, periodic=True, count_contacts=False, delayed=False)[source]#
Bases:
DistanceFeature
Defines certain distances as contacts and returns a binary (0, 1) result.
Instead of returning the binary result, the feature can also count contacts when count_contacts=True is provided at instantiation. In that case, every frame yields a single integer.
- Parameters:
traj (SingleTraj)
distance_indexes (np.ndarray)
threshold (float)
periodic (bool)
count_contacts (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
str
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask delayed computations are distributed, the required python objects are pickled. Every feature would thus need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculations as transform() in a more barebones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it still needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.
- Parameters:
indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
threshold (float) – The threshold in nm, under which a distance is considered to be a contact. Defaults to 5.0 nm.
count_contacts (bool) – When True, return an integer of the number of contacts instead of returning the array of regular contacts.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Return type:
dask.delayed
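The thresholding described for this feature can be sketched in a few lines of plain Python. This is an illustration under assumptions, not EncoderMap’s code: the function name contacts is made up, nested lists stand in for numpy arrays, and a distance counts as a contact when it is strictly below the threshold.

```python
def contacts(distances, threshold=5.0, count_contacts=False):
    """Turn per-frame distances into binary contacts or contact counts.

    distances: nested list with shape (n_frames, n_pairs).
    """
    binary = [[1 if d < threshold else 0 for d in frame] for frame in distances]
    if count_contacts:
        # one integer per frame
        return [sum(frame) for frame in binary]
    return binary


distances = [
    [0.3, 6.0, 4.9],
    [7.0, 8.0, 0.1],
]
binary = contacts(distances)
counts = contacts(distances, count_contacts=True)
print(binary, counts)
```

The binary result keeps the pair axis, while the counted result collapses it to one value per frame.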
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is not provided with any input, it will take the traj_container provided as traj in the __init__() method and transform this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
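The relation between unitcell_info and unitcell_vectors can be made explicit with the standard crystallographic conversion from lengths and angles to box vectors. The sketch below handles a single frame in plain Python. The function name box_vectors is hypothetical, and the convention that a lies along x and b lies in the xy-plane is an assumption of this sketch.

```python
import math


def box_vectors(a, b, c, alpha, beta, gamma):
    """Convert lengths (nm) and angles (degrees) to three box vectors.

    alpha is the angle between b and c, beta between a and c,
    gamma between a and b. Returns the rows (vec_a, vec_b, vec_c).
    """
    alpha, beta, gamma = (math.radians(x) for x in (alpha, beta, gamma))
    vec_a = (a, 0.0, 0.0)
    vec_b = (b * math.cos(gamma), b * math.sin(gamma), 0.0)
    cx = c * math.cos(beta)
    cy = c * (math.cos(alpha) - math.cos(beta) * math.cos(gamma)) / math.sin(gamma)
    cz = math.sqrt(c * c - cx * cx - cy * cy)
    return (vec_a, vec_b, (cx, cy, cz))


cubic = box_vectors(2.0, 2.0, 2.0, 90.0, 90.0, 90.0)
triclinic = box_vectors(1.0, 2.0, 3.0, 80.0, 95.0, 100.0)
print(cubic)
print(triclinic)
```

For the cubic box the rows are, up to rounding, 2 nm along x, y, and z respectively.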
- class CustomFeature(fun, dim, traj=None, description=None, fun_args=(), fun_kwargs=None, delayed=False)[source]#
Bases:
Feature
- Parameters:
- _is_custom: Final[True] = True#
- _nonstandard_transform_args: list[str] = ['top', 'indexes', 'delayed_call', '_fun', '_args', '_kwargs']#
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- property dask_indices#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The CustomFeature dask transform is still under development.
- Parameters:
- Return type:
np.ndarray
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- traj: SingleTraj | None = None#
- transform(traj=None, xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is not provided with any input, it will take the traj_container provided as traj in the __init__() method and transform this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
traj (Trajectory | None)
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
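How CustomFeature invokes fun is not documented in this section, so the following is only a sketch of the general pattern suggested by the fun, fun_args, and fun_kwargs arguments. The helper apply_custom is a hypothetical stand-in, not the class itself, and the assumption that fun receives coordinates as its first argument is made purely for illustration.

```python
import math


def end_to_end_distance(xyz, first=0, last=-1):
    """Distance between two atoms per frame, shape (n_frames, 1)."""
    return [[math.dist(frame[first], frame[last])] for frame in xyz]


def apply_custom(fun, xyz, fun_args=(), fun_kwargs=None):
    """Hypothetical stand-in: call fun with extra positional and keyword arguments."""
    return fun(xyz, *fun_args, **(fun_kwargs or {}))


xyz = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]

default = apply_custom(end_to_end_distance, xyz)
shortened = apply_custom(end_to_end_distance, xyz, fun_kwargs={"last": 1})
print(default, shortened)
```

Here the feature axis has length 1, which would correspond to dim=1 under the assumptions of this sketch.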
- class DihedralFeature(traj, dih_indexes, deg=False, cossin=False, periodic=True, check_aas=True, delayed=False)[source]#
Bases:
AngleFeature
Dihedrals are torsion angles defined by four atoms.
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
dih_indexes (np.ndarray)
deg (bool)
cossin (bool)
periodic (bool)
check_aas (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask delayed computations are distributed, the required python objects are pickled. Every feature would thus need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculations as transform() in a more barebones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it still needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.
- Parameters:
indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
deg (bool) – Whether to return the result in degree (deg=True) or in radians (deg=False). Defaults to False (radians).
cossin (bool) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful, if you calculate the means (e.g. TICA/PCA, clustering) in that space. Defaults to False.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Return type:
dask.delayed
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is not provided with any input, it will take the traj_container provided as traj in the __init__() method and transform this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
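Since a dihedral is defined by four atoms, the underlying geometry can be shown for a single quadruplet. The sketch below uses the common atan2 formulation with cross products in plain Python. The function names are illustrative, the sign convention is that of this sketch, and periodic boundary conditions are ignored; it is not taken from EncoderMap’s source.

```python
import math


def _sub(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def dihedral(p0, p1, p2, p3):
    """Torsion angle in radians, in (-pi, pi], around the p1-p2 bond."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    c1 = _cross(b2, b3)
    c2 = _cross(b1, b2)
    y = _dot(b1, c1) * math.sqrt(_dot(b2, b2))
    x = _dot(c1, c2)
    return math.atan2(y, x)


p0, p1, p2 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
cis = dihedral(p0, p1, p2, (1.0, 1.0, 0.0))
trans = dihedral(p0, p1, p2, (1.0, -1.0, 0.0))
perpendicular = dihedral(p0, p1, p2, (1.0, 0.0, 1.0))
print(math.degrees(cis), math.degrees(trans), math.degrees(perpendicular))
```

Applying this to every frame and every row of dih_indexes would give an array of shape (n_frames, n_indexes).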
- class DistanceFeature(traj, distance_indexes, periodic=True, dim=None, check_aas=True, delayed=False)[source]#
Bases:
Feature
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
distance_indexes (np.ndarray)
periodic (bool)
dim (Optional[int])
check_aas (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
str
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask delayed computations are distributed, the required python objects are pickled. Every feature would thus need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculations as transform() in a more barebones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it still needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.
- Parameters:
indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to apply the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correspond to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Carries the same information as unitcell_vectors in a different form. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Return type:
dask.delayed
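To illustrate what the periodic flag refers to, here is a minimal sketch of a minimum-image distance for the special case of an orthorhombic box (all unit-cell angles equal to 90 degrees). The function name min_image_distances is hypothetical and not part of EncoderMap. General triclinic cells need the full unitcell_vectors and are not covered here.

```python
import numpy as np


def min_image_distances(xyz, pairs, unitcell_info):
    """Pair distances under the minimum image convention.

    Assumption: the box is orthorhombic, so only the three box lengths
    (first three columns of unitcell_info) are needed.

    xyz:           (n_frames, n_atoms, 3) coordinates in nm
    pairs:         (n_pairs, 2) integer atom indices
    unitcell_info: (n_frames, 6) lengths in nm followed by angles in degrees
    """
    xyz = np.asarray(xyz, dtype=float)
    pairs = np.asarray(pairs, dtype=int)
    lengths = np.asarray(unitcell_info, dtype=float)[:, :3]  # (n_frames, 3)

    # Displacement vectors for every pair in every frame.
    delta = xyz[:, pairs[:, 1]] - xyz[:, pairs[:, 0]]  # (n_frames, n_pairs, 3)

    # Wrap each component into the interval [-L/2, L/2].
    box = lengths[:, None, :]
    delta = delta - box * np.round(delta / box)
    return np.linalg.norm(delta, axis=-1)


# Two atoms close to opposite faces of a 2 nm cubic box.
xyz = np.array([[[0.1, 0.0, 0.0], [1.9, 0.0, 0.0]]])
info = np.array([[2.0, 2.0, 2.0, 90.0, 90.0, 90.0]])
d = min_image_distances(xyz, [[0, 1]], info)
```

Without the wrapping step the two atoms would be 1.8 nm apart; with it, the distance through the periodic boundary is used.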
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
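As a small illustration of the 3*n-3 feature axis mentioned above, the following sketch builds labels in the order given in the example. The helper backbone_torsion_labels is hypothetical; the labels produced by EncoderMap's own features may be formatted differently.

```python
def backbone_torsion_labels(n_residues):
    """Build one label per backbone torsion for a chain of n_residues.

    Hypothetical helper: a chain of n residues has n - 1 peptide links,
    each contributing one phi, one omega, and one psi label.
    """
    labels = []
    for i in range(1, n_residues):
        labels.extend([f"phi_{i}", f"omega_{i}", f"psi_{i}"])
    return labels


labels = backbone_torsion_labels(4)
print(len(labels))  # 3 * 4 - 3 labels
print(labels)
```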
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is called without any input, it transforms the trajectory that was provided as traj in __init__(). Alternatively, xyz can hold the coordinates (in nanometer) of a trajectory whose topology is identical to that of self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- class Feature(traj, check_aas=True, periodic=None, delayed=False)[source]#
Bases:
object
Parent class to all feature classes. Implements the FeatureMeta, the transform method, and checks for unknown amino acids.
This class implements functionality that holds true for all features. The transform() method can be used by subclasses in two ways:
- Provide all args with None. In this case, the traj in self.traj
will be used to calculate the transformation.
- Provide custom xyz, unitcell_vectors, and unitcell_info. In this
case, the provided arrays are used instead of those in self.traj.
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
check_aas (bool)
periodic (Optional[bool])
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Carries out the computation of the CVs.
For featurization of single trajs, all arguments can be left None, and the values of the traj at class instantiation will be returned by this method. For ensembles with a single topology but multiple trajectories, xyz, unitcell_vectors, and unitcell_info should be provided accordingly. This parent class’s transform then carries out checks (do all arguments provide the same number of frames, does the xyz array have the same number of atoms as the traj at instantiation, do the unitcell angles coincide with those of the parent traj, …). Thus, it is generally advised to call this method with super() to run these checks.
- Parameters:
xyz (Optional[np.ndarray]) – If None, the coordinates of the trajectory provided as traj when the feature was instantiated will be used.
unitcell_vectors (Optional[np.ndarray]) – If None, the unitcell vectors of the trajectory provided as traj when the feature was instantiated will be used. Unitcell vectors are arrays with shape (n_frames, 3, 3), where the rows are the Bravais vectors a, b, c.
unitcell_info (Optional[np.ndarray]) – If None, the unitcell info of the trajectory provided as traj when the feature was instantiated will be used. The unitcell_info is an array with shape (n_frames, 6), where the first three columns are the unitcell lengths in nm and the remaining columns are the unitcell angles in degrees.
- Returns:
- A tuple containing three np.ndarrays:
The xyz coordinates.
The unitcell_vectors
The unitcell_info
- Return type:
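The shape checks described for this method can be sketched with plain NumPy. The function check_transform_inputs and its n_atoms argument are hypothetical and only mirror the checks named in the text (frame count and atom count). The comparison of unit-cell angles against the parent traj is left out.

```python
import numpy as np


def check_transform_inputs(xyz, unitcell_vectors, unitcell_info, n_atoms):
    """Validate array shapes and return the three arrays as a tuple.

    Hypothetical sketch, not EncoderMap's implementation.
    n_atoms is the atom count of the trajectory the feature was built from.
    """
    xyz = np.asarray(xyz)
    if xyz.ndim != 3 or xyz.shape[2] != 3:
        raise ValueError("xyz must have shape (n_frames, n_atoms, 3).")
    if xyz.shape[1] != n_atoms:
        raise ValueError(f"xyz has {xyz.shape[1]} atoms, expected {n_atoms}.")
    n_frames = xyz.shape[0]

    if unitcell_vectors is not None:
        unitcell_vectors = np.asarray(unitcell_vectors)
        if unitcell_vectors.shape != (n_frames, 3, 3):
            raise ValueError("unitcell_vectors must have shape (n_frames, 3, 3).")

    if unitcell_info is not None:
        unitcell_info = np.asarray(unitcell_info)
        if unitcell_info.shape != (n_frames, 6):
            raise ValueError("unitcell_info must have shape (n_frames, 6).")

    return xyz, unitcell_vectors, unitcell_info


checked = check_transform_inputs(
    np.zeros((5, 4, 3)), np.zeros((5, 3, 3)), np.zeros((5, 6)), n_atoms=4
)
```

A subclass that overrides transform() would typically obtain such validated arrays from the parent via super().transform(...) before doing its own calculation.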
- class FeatureMeta(name, bases, dct)[source]#
Bases:
type
Inspects the __init__ of classes and adds attributes to them based on their call signature.
If a feature’s __init__ takes the argument deg or omega, the class attribute _use_angle or _use_omega, respectively, is set to True. Otherwise, the attribute is set to False.
This allows other functions that use these features to easily discern whether they need these arguments before instantiating the classes.
Example
>>> from encodermap.loading import features
>>> f_class = getattr(features, "SideChainDihedrals")
>>> f_class._use_angle
True
>>> f_class._use_omega
False
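How a metaclass can derive such class attributes from a call signature is sketched below. SignatureMeta and the three example classes are hypothetical stand-ins written for this illustration; they are not EncoderMap's FeatureMeta.

```python
import inspect


class SignatureMeta(type):
    """Hypothetical metaclass that flags classes by their __init__ arguments."""

    def __init__(cls, name, bases, dct):
        super().__init__(name, bases, dct)
        # Inspect the parameter names of the class's __init__.
        params = inspect.signature(cls.__init__).parameters
        cls._use_angle = "deg" in params
        cls._use_omega = "omega" in params


class PlainFeature(metaclass=SignatureMeta):
    def __init__(self, traj):
        self.traj = traj


class AngleLikeFeature(metaclass=SignatureMeta):
    def __init__(self, traj, deg=False):
        self.traj = traj
        self.deg = deg


class BackboneLikeFeature(metaclass=SignatureMeta):
    def __init__(self, traj, deg=False, omega=True):
        self.traj = traj
        self.deg = deg
        self.omega = omega


# The flags are available on the classes themselves, before instantiation.
print(PlainFeature._use_angle, PlainFeature._use_omega)
print(AngleLikeFeature._use_angle, AngleLikeFeature._use_omega)
print(BackboneLikeFeature._use_angle, BackboneLikeFeature._use_omega)
```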
- class GroupCOMFeature(traj, group_definitions, ref_geom=None, image_molecules=False, mass_weighted=True, delayed=False)[source]#
Bases:
Feature
Cartesian coordinates of the center-of-mass (COM) of atom groups.
Groups can be defined as sequences of sequences of int, so a list of lists of int can be used to define groups of various sizes. The resulting array will have the shape (n_frames, 3 * n_groups). The xyz coordinates are flattened, so the array can be rebuilt with np.dstack().
Examples
>>> import encodermap as em
>>> import numpy as np
>>> traj = em.SingleTraj.from_pdb_id("1YUG")
>>> f = em.features.GroupCOMFeature(
...     traj=traj,
...     group_definitions=[
...         [0, 1, 2],
...         [3, 4, 5, 6, 7],
...         [8, 9, 10],
...     ]
... )
>>> a = f.transform()
>>> a.shape  # this array is flattened along the feature axis
(15, 9)
>>> a = np.dstack([
...     a[..., ::3],
...     a[..., 1::3],
...     a[..., 2::3],
... ])
>>> a.shape  # now the z coordinate of the 2nd center of mass is a[:, 1, -1]
(15, 3, 3)
Note
Centering (ref_geom) and imaging (image_molecules=True) can be time-consuming. Consider doing this to your trajectory files prior to featurization.
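The center-of-mass calculation itself can be sketched with NumPy alone. The function group_coms is hypothetical; it assumes the flattened layout [x1, y1, z1, x2, y2, z2, …] that the np.dstack() call in the example implies, and it ignores centering and imaging.

```python
import numpy as np


def group_coms(xyz, masses, group_definitions, mass_weighted=True):
    """Flattened centers of mass with shape (n_frames, 3 * n_groups).

    Hypothetical sketch, not EncoderMap's implementation.
    xyz:    (n_frames, n_atoms, 3) coordinates in nm
    masses: (n_atoms,) atomic masses
    """
    xyz = np.asarray(xyz, dtype=float)
    masses = np.asarray(masses, dtype=float)
    coms = []
    for group in group_definitions:
        idx = np.asarray(group, dtype=int)
        weights = masses[idx] if mass_weighted else np.ones(len(idx))
        # Weighted mean over the atoms of this group -> (n_frames, 3).
        com = np.einsum("fad,a->fd", xyz[:, idx], weights) / weights.sum()
        coms.append(com)
    # Concatenate along the feature axis: x1, y1, z1, x2, y2, z2, ...
    return np.hstack(coms)


rng = np.random.default_rng(0)
xyz = rng.random((15, 11, 3))
masses = np.ones(11)
flat = group_coms(xyz, masses, [[0, 1, 2], [3, 4, 5, 6, 7], [8, 9, 10]])

# With this layout, a plain reshape recovers one xyz triple per group.
per_group = flat.reshape(15, 3, 3)
```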
- Parameters:
traj (SingleTraj)
group_definitions (Sequence[Sequence[int]])
ref_geom (Optional[md.Trajectory])
image_molecules (bool)
mass_weighted (bool)
delayed (bool)
- _nonstandard_transform_args: list[str] = ['top', 'ref_geom', 'image_molecules', 'masses_in_groups']#
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask delayed computations are distributed, the required Python objects are pickled, so every feature would need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculation as transform() in a more barebones way: it skips the checks for periodicity and unit-cell shape and takes only xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not require self. It still needs the indexes in self.indexes, which is why the dask_indices property tells the scheduler to pickle this object and pass it to the workers as well.
- Parameters:
indexes (np.ndarray) – For this special feature, the indexes argument in the @staticmethod dask_transform is self.group_definitions.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
threshold (float) – The threshold in nm, under which a distance is considered to be a contact. Defaults to 5.0 nm.
count_contacts (bool) – When True, return an integer of the number of contacts instead of returning the array of regular contacts.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
top (md.Topology)
ref_geom (Union[md.Trajectory, None])
image_molecules (bool)
- Return type:
dask.delayed
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is called without any input, it transforms the trajectory that was provided as traj in __init__(). Alternatively, xyz can hold the coordinates (in nanometer) of a trajectory whose topology is identical to that of self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- class InverseDistanceFeature(traj, distance_indexes, periodic=True, delayed=False)[source]#
Bases:
DistanceFeature
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
distance_indexes (np.ndarray)
periodic (bool)
delayed (bool)
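This class carries no description here. Judging only by its name and its DistanceFeature base, it presumably returns reciprocal pair distances. Under that assumption, a minimal non-periodic sketch could look like this (inverse_distances is a hypothetical helper):

```python
import numpy as np


def inverse_distances(xyz, distance_indexes):
    """Reciprocal pair distances with shape (n_frames, n_pairs).

    Hypothetical sketch; periodic boundaries are ignored.
    xyz:              (n_frames, n_atoms, 3) coordinates in nm
    distance_indexes: (n_pairs, 2) integer atom indices
    """
    xyz = np.asarray(xyz, dtype=float)
    pairs = np.asarray(distance_indexes, dtype=int)
    delta = xyz[:, pairs[:, 1]] - xyz[:, pairs[:, 0]]
    return 1.0 / np.linalg.norm(delta, axis=-1)


xyz = np.array([[[0.0, 0.0, 0.0], [0.5, 0.0, 0.0], [0.0, 2.0, 0.0]]])
inv = inverse_distances(xyz, [[0, 1], [0, 2]])
```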
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask delayed computations are distributed, the required Python objects are pickled, so every feature would need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculation as transform() in a more barebones way: it skips the checks for periodicity and unit-cell shape and takes only xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not require self. It still needs the indexes in self.indexes, which is why the dask_indices property tells the scheduler to pickle this object and pass it to the workers as well.
- Parameters:
indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Return type:
dask.delayed
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is called without any input, it transforms the trajectory that was provided as traj in __init__(). Alternatively, xyz can hold the coordinates (in nanometer) of a trajectory whose topology is identical to that of self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- class MinRmsdFeature(traj, ref, ref_frame=0, atom_indices=None, precentered=False, delayed=False)[source]#
Bases:
Feature
- Parameters:
traj (SingleTraj)
ref (Union[md.Trajectory, SingleTraj])
ref_frame (int)
atom_indices (Optional[np.ndarray])
precentered (bool)
delayed (bool)
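This class has no description here. Its name and the parameters ref, ref_frame, atom_indices, and precentered suggest an RMSD after optimal superposition onto a reference frame, similar to what MDTraj's rmsd function computes; whether the class delegates to MDTraj is an assumption. The underlying idea can be sketched with the Kabsch algorithm in NumPy (min_rmsd is a hypothetical helper, without mass weighting):

```python
import numpy as np


def min_rmsd(xyz, ref_xyz):
    """RMSD of every frame to a reference after optimal superposition.

    Hypothetical sketch using the Kabsch algorithm.
    xyz:     (n_frames, n_atoms, 3) coordinates in nm
    ref_xyz: (n_atoms, 3) reference coordinates in nm
    """
    xyz = np.asarray(xyz, dtype=float)
    ref = np.asarray(ref_xyz, dtype=float)
    ref = ref - ref.mean(axis=0)  # center the reference
    out = np.empty(len(xyz))
    for i, frame in enumerate(xyz):
        mobile = frame - frame.mean(axis=0)  # center the frame
        # SVD of the 3x3 covariance matrix between frame and reference.
        u, _, vt = np.linalg.svd(mobile.T @ ref)
        # Flip the last axis if needed so the result is a proper rotation.
        d = np.sign(np.linalg.det(u @ vt))
        rot = u @ np.diag([1.0, 1.0, d]) @ vt
        diff = mobile @ rot - ref
        out[i] = np.sqrt((diff ** 2).sum() / len(ref))
    return out


rng = np.random.default_rng(1)
ref = rng.random((10, 3))
theta = np.pi / 6
rz = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
# A rotated and translated copy of the reference should give an RMSD near 0.
moved = ref @ rz.T + np.array([1.0, -2.0, 0.5])
rmsd = min_rmsd(np.stack([ref, moved]), ref)
```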
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- property dask_indices#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
Takes xyz and unitcell information to apply the topological calculations on.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
indexes (np.ndarray)
top (md.Topology)
ref (md.Trajectory)
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is called without any input, it transforms the trajectory that was provided as traj in __init__(). Alternatively, xyz can hold the coordinates (in nanometer) of a trajectory whose topology is identical to that of self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- class ResidueCOMFeature(traj, residue_indices, residue_atoms, scheme='all', ref_geom=None, image_molecules=False, mass_weighted=True, delayed=False)[source]#
Bases:
GroupCOMFeature
- Parameters:
traj (SingleTraj)
residue_indices (Sequence[int])
residue_atoms (np.ndarray)
scheme (Literal['all', 'backbone', 'sidechain'])
ref_geom (Optional[md.Trajectory])
image_molecules (bool)
mass_weighted (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- class ResidueMinDistanceFeature(traj, contacts, scheme, ignore_nonprotein, threshold, periodic, count_contacts=False, delayed=False)[source]#
Bases:
DistanceFeature
- Parameters:
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask delayed computations are distributed, the required Python objects are pickled, so every feature would need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculation as transform() in a more barebones way: it skips the checks for periodicity and unit-cell shape and takes only xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not require self. It still needs the indexes in self.indexes, which is why the dask_indices property tells the scheduler to pickle this object and pass it to the workers as well.
- Parameters:
indexes (np.ndarray) – For this special feature, the indexes argument in the @staticmethod dask_transform is self.contacts.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
threshold (float) – The threshold in nm, under which a distance is considered to be a contact. Defaults to 5.0 nm.
count_contacts (bool) – When True, return an integer of the number of contacts instead of returning the array of regular contacts.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
top (md.Topology)
scheme (Literal['ca', 'closest', 'closest-heavy'])
- Return type:
dask.delayed
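The interplay of threshold and count_contacts can be illustrated on a precomputed distance array. The helper contacts_from_distances is hypothetical; in particular, the strict comparison and the per-frame count are assumptions about how the options behave.

```python
import numpy as np


def contacts_from_distances(distances, threshold=5.0, count_contacts=False):
    """Turn residue-pair distances into contacts.

    Hypothetical sketch, not EncoderMap's implementation.
    distances: (n_frames, n_pairs) minimal residue distances in nm
    threshold: distances below this value (in nm) count as a contact
    """
    distances = np.asarray(distances, dtype=float)
    contacts = distances < threshold  # boolean array, (n_frames, n_pairs)
    if count_contacts:
        # One integer per frame: how many pairs are in contact.
        return contacts.sum(axis=1)
    return contacts


dist = np.array([[0.3, 0.8, 6.0], [5.5, 0.2, 7.1]])
mask = contacts_from_distances(dist, threshold=0.5)
counts = contacts_from_distances(dist, threshold=0.5, count_contacts=True)
```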
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is called without any input, it transforms the trajectory that was provided as traj in __init__(). Alternatively, xyz can hold the coordinates (in nanometer) of a trajectory whose topology is identical to that of self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- class SelectionFeature(traj, indexes, check_aas=True, delayed=False)[source]#
Bases:
Feature
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
indexes (Sequence[int])
check_aas (bool)
delayed (bool)
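This class has no description here. Going by the indexes parameter, it returns the positions of the selected atoms. A minimal sketch of such a selection, assuming the coordinates are flattened along the feature axis as [x1, y1, z1, x2, …], could be (select_cartesians is a hypothetical helper):

```python
import numpy as np


def select_cartesians(xyz, indexes):
    """Positions of the selected atoms, flattened per frame.

    Hypothetical sketch; the flattened layout is an assumption.
    xyz:     (n_frames, n_atoms, 3) coordinates in nm
    indexes: (n,) 0-based atom indices
    """
    xyz = np.asarray(xyz, dtype=float)
    indexes = np.asarray(indexes, dtype=int)
    # (n_frames, n, 3) -> (n_frames, 3 * n)
    return xyz[:, indexes].reshape(len(xyz), -1)


xyz = np.arange(2 * 4 * 3, dtype=float).reshape(2, 4, 3)
selected = select_cartesians(xyz, [0, 2])
```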
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask delayed computations are distributed, the required Python objects are pickled, so every feature would need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculation as transform() in a more barebones way: it skips the checks for periodicity and unit-cell shape and takes only xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not require self. It still needs the indexes in self.indexes, which is why the dask_indices property tells the scheduler to pickle this object and pass it to the workers as well.
- Parameters:
indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Return type:
dask.delayed
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is called without any input, it transforms the trajectory that was provided as traj in __init__(). Alternatively, xyz can hold the coordinates (in nanometer) of a trajectory whose topology is identical to that of self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- class SideChainAngles(traj, deg=False, cossin=False, periodic=True, check_aas=True, generic_labels=False, delayed=False)[source]#
Bases:
AngleFeature
Feature that collects all angles not in the backbone of a topology.
- Parameters:
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- generic_describe()[source]#
Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.
- property indexes#
A (n_angles, 3) shaped numpy array giving the atom indices of the angles to be calculated.
- Type:
np.ndarray
- class SideChainBondDistances(traj, periodic=True, check_aas=True, generic_labels=False, delayed=False)[source]#
Bases:
AllBondDistances
Feature that collects all bonds not in the backbone of a topology.
- Parameters:
traj (SingleTraj)
periodic (bool)
check_aas (bool)
generic_labels (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- generic_describe()[source]#
Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.
- property indexes#
A (n_distances, 2) shaped numpy array giving the atom indices of the distances to be calculated.
- Type:
np.ndarray
- class SideChainCartesians(traj, check_aas=True, generic_labels=False, delayed=False)[source]#
Bases:
SelectionFeature
Feature that collects all cartesian positions of all non-backbone atoms.
- Parameters:
traj (SingleTraj)
check_aas (bool)
generic_labels (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- class SideChainDihedrals(traj, selstr=None, deg=False, cossin=False, periodic=True, generic_labels=False, check_aas=True, delayed=False)[source]#
Bases:
DihedralFeature
Feature that collects all dihedrals in the sidechains of a topology.
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
selstr (Optional[str])
deg (bool)
cossin (bool)
periodic (bool)
generic_labels (bool)
check_aas (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select('all').
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis of size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein, in contrast, would have a feature axis of length 1. The describe() method labels these values unambiguously. A backbone torsion feature's describe() could return ['phi_1', 'omega_1', 'psi_1', 'phi_2', 'omega_2', …, 'psi_n-1']. The end-to-end distance feature could be described by ['distance_between_MET1_and_LYS80'].
- generic_describe()[source]#
Returns a list of generic labels that do not contain residue names. These labels can be used to stack features computed from different topologies.
- class SideChainTorsions(traj, selstr=None, deg=False, cossin=False, periodic=True, which='all', delayed=False)[source]#
Bases:
DihedralFeature
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
selstr (Optional[str])
deg (bool)
cossin (bool)
periodic (bool)
which (Union[Literal['all'], Sequence[Literal['chi1', 'chi2', 'chi3', 'chi4', 'chi5']]])
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis of size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein, in contrast, would have a feature axis of length 1. The describe() method labels these values unambiguously. A backbone torsion feature's describe() could return ['phi_1', 'omega_1', 'psi_1', 'phi_2', 'omega_2', …, 'psi_n-1']. The end-to-end distance feature could be described by ['distance_between_MET1_and_LYS80'].
- options = ('chi1', 'chi2', 'chi3', 'chi4', 'chi5')#
- _check_aas(traj)[source]#
- Parameters:
traj (SingleTraj)
- Return type:
None
- describe_last_feats(feat, n=5)[source]#
Prints the description of the last n features.
- Parameters:
feat (encodermap.Featurizer) – An instance of a featurizer.
n (int) – The number of last features to describe. Default is 5.
- Return type:
None
- pair(*numbers)[source]#
ConvertGroup’s (https://convertgroup.com/) implementation of Matthew Szudzik’s pairing function (http://szudzik.com/ElegantPairing.pdf)
Maps a pair of non-negative integers to a uniquely associated single non-negative integer. Pairing also generalizes for n non-negative integers, by recursively mapping the first pair. For example, to map the following tuple:
- unpair(number, n=2)[source]#
ConvertGroup’s (https://convertgroup.com/) implementation of Matthew Szudzik’s pairing function (http://szudzik.com/ElegantPairing.pdf)
The inverse function outputs the pair associated with a non-negative integer. Unpairing also generalizes by recursively unpairing a non-negative integer to n non-negative integers.
For example, to associate a number with three non-negative integers n_1, n_2, n_3, such that:
pairing(n_1, n_2, n_3) = number
the number will first be unpaired to n_p, n_3, then the n_p will be unpaired to n_1, n_2, producing the desired n_1, n_2 and n_3.
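The recursive pairing and unpairing described above can be sketched as follows. This is a minimal illustration that assumes the formula from Szudzik's paper (y^2 + x if x < y, otherwise x^2 + x + y). The names szudzik_pair and szudzik_unpair are hypothetical, and the numbers they produce are not guaranteed to match those of the pair() and unpair() functions documented here.

```python
from math import isqrt


def szudzik_pair(*numbers):
    """Map n >= 2 non-negative integers to one non-negative integer."""
    if len(numbers) < 2:
        raise ValueError("Need at least two numbers.")
    if any(n < 0 for n in numbers):
        raise ValueError("Numbers must be non-negative.")
    result = numbers[0]
    # Recursively pair the running result with the next number.
    for y in numbers[1:]:
        x = result
        result = y * y + x if x < y else x * x + x + y
    return result


def szudzik_unpair(number, n=2):
    """Invert szudzik_pair for n paired non-negative integers."""
    if n < 2:
        raise ValueError("n must be at least 2.")
    if number < 0:
        raise ValueError("number must be non-negative.")
    out = []
    current = number
    for _ in range(n - 1):
        s = isqrt(current)
        r = current - s * s
        # r < s means the pair was (x, y) with x < y.
        x, y = (r, s) if r < s else (s, r - s)
        out.append(y)  # the last paired element comes off first
        current = x    # the remainder still holds the earlier elements
    out.append(current)
    return tuple(reversed(out))


number = szudzik_pair(1, 2, 3)
print(number)
print(szudzik_unpair(number, n=3))
```

Unpairing needs to know n, because the paired number alone does not record how many integers went into it.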