encodermap.loading package#
Submodules#
encodermap.loading.delayed module#
Functions to use with the DaskFeaturizer class.
- build_dask_xarray(featurizer: DaskFeaturizer, traj: SingleTraj | None, streamable: bool, return_delayeds: Literal[True]) tuple[Dataset, dict[str, Variable]] [source]#
- build_dask_xarray(featurizer: DaskFeaturizer, traj: SingleTraj | None, streamable: bool, return_delayeds: Literal[False]) tuple[Dataset, None]
Builds a large dask xarray, which is evaluated in a distributed manner.
This function takes a DaskFeaturizer instance, which contains a list of features. Every feature in this list contains enough information for the delayed functions to calculate the requested quantities when provided with the xyz coordinates of the atoms, the unitcell vectors, and the unitcell info as a Bravais matrix.
- Parameters:
featurizer (DaskFeaturizer) – An instance of the DaskFeaturizer.
return_coordinates (bool) – Whether to add this information: all_xyz, all_time, all_cell_lengths, all_cell_angles to the returned values. Defaults to False.
streamable (bool) – Whether to divide the calculations into one-frame blocks, which can then only be calculated when requested.
- Returns:
When return_coordinates is False, only an xr.Dataset is returned. Otherwise, a tuple with an xr.Dataset and a sequence of dask.Delayed objects is returned.
- Return type:
- calc_bravais_box(box_info)[source]#
Calculates the Bravais vectors from lengths and angles (in degrees).
Note
This code is adapted from gyroid, which is licensed under the BSD license: http://pythonhosted.org/gyroid/_modules/gyroid/unitcell.html
- Parameters:
box_info (np.ndarray) – The box info, where the columns are ordered as follows: a, b, c, alpha, beta, gamma. The angles are given in degrees.
- Returns:
The Bravais vectors as an array of shape (n_frames, 3, 3).
- Return type:
np.ndarray
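The conversion from lengths and angles to Bravais vectors can be sketched as follows. This is a minimal, dependency-free illustration and not the gyroid or EncoderMap implementation: the function name bravais_from_box_info is hypothetical, and the sketch assumes the common convention in which a lies along x and b lies in the xy-plane.

```python
import math


def bravais_from_box_info(box_info):
    """Hypothetical helper, not part of EncoderMap.

    box_info: iterable of rows (a, b, c, alpha, beta, gamma), angles in degrees.
    Returns one 3x3 nested list per frame; the rows are the vectors a, b, c.
    Assumed convention: a along x, b in the xy-plane.
    """
    vectors = []
    for a, b, c, alpha, beta, gamma in box_info:
        al, be, ga = (math.radians(x) for x in (alpha, beta, gamma))
        vec_a = [a, 0.0, 0.0]
        vec_b = [b * math.cos(ga), b * math.sin(ga), 0.0]
        cx = c * math.cos(be)
        cy = c * (math.cos(al) - math.cos(be) * math.cos(ga)) / math.sin(ga)
        # The z-component follows from requiring |vec_c| == c.
        cz = math.sqrt(c * c - cx * cx - cy * cy)
        vectors.append([vec_a, vec_b, [cx, cy, cz]])
    return vectors


# One orthorhombic frame and one triclinic frame.
boxes = bravais_from_box_info([
    [2.0, 3.0, 4.0, 90.0, 90.0, 90.0],
    [1.0, 1.0, 1.0, 60.0, 70.0, 80.0],
])
```

For the orthorhombic frame the result is diagonal up to floating-point rounding. For the triclinic frame, the dot product of b and c equals cos(alpha) because both vectors have unit length.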
- load_xyz_from_h5(traj_file, frame_indices, traj_num=None)[source]#
Loads xyz coordinates and unitcell info from a block in a .h5 file.
- Standard MDTraj h5 keys are:
[‘cell_angles’, ‘cell_lengths’, ‘coordinates’, ‘time’, ‘topology’]
- Parameters:
- Returns:
A four-tuple of dask arrays that contain dask delayeds. The order of these arrays is:
- positions: Shape (len(frame_indices), 3): The xyz coordinates in nm.
- time: Shape (len(frame_indices), ): The time in ps.
- unitcell_vectors: Shape (len(frame_indices), 3, 3): The unitcell vectors.
- unitcell_info: Shape (len(frame_indices), 6), where [:, :3] are the unitcell lengths in nm and [:, 3:] are the unitcell angles in degrees.
- Return type:
tuple[da.array, da.array, da.array, da.array]
encodermap.loading.features module#
Features contain topological information of proteins and other biomolecules.
This topological information can be calculated once and then combined with input coordinates to calculate frame-wise collective variables of MD simulations.
The features in this module used to inherit from PyEMMA’s features (markovmodel/PyEMMA), but PyEMMA has since been archived.
If you use EncoderMap’s featurization, make sure to also cite PyEMMA, from which much of this code was adapted:
@article{scherer_pyemma_2015,
author = {Scherer, Martin K. and Trendelkamp-Schroer, Benjamin
and Paul, Fabian and Pérez-Hernández, Guillermo and Hoffmann, Moritz and
Plattner, Nuria and Wehmeyer, Christoph and Prinz, Jan-Hendrik and Noé, Frank},
title = {{PyEMMA} 2: {A} {Software} {Package} for {Estimation},
{Validation}, and {Analysis} of {Markov} {Models}},
journal = {Journal of Chemical Theory and Computation},
volume = {11},
pages = {5525-5542},
year = {2015},
issn = {1549-9618},
shorttitle = {{PyEMMA} 2},
url = {http://dx.doi.org/10.1021/acs.jctc.5b00743},
doi = {10.1021/acs.jctc.5b00743},
urldate = {2015-10-19},
month = oct,
}
- class AlignFeature(traj, reference, indexes, atom_indices=None, ref_atom_indices=None, in_place=False, delayed=False)[source]#
Bases:
SelectionFeature
- Parameters:
traj (SingleTraj)
reference (md.Trajectory)
indexes (np.ndarray)
atom_indices (Optional[np.ndarray])
ref_atom_indices (Optional[np.ndarray])
in_place (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- class AllBondDistances(traj, distance_indexes=None, periodic=True, check_aas=True, delayed=False)[source]#
Bases:
DistanceFeature
Feature that collects all bonds in a topology.
- Parameters:
traj (SingleTraj)
distance_indexes (Optional[np.ndarray])
periodic (bool)
check_aas (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
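The labelling scheme described above can be illustrated with a short sketch. The helper backbone_torsion_labels is hypothetical and only reproduces the example label order (phi, omega, psi); it is not how the features build their descriptions internally.

```python
def backbone_torsion_labels(n_residues: int) -> list[str]:
    """Hypothetical helper: one (phi, omega, psi) triple per residue pair."""
    labels = []
    for i in range(1, n_residues):
        labels.extend([f"phi_{i}", f"omega_{i}", f"psi_{i}"])
    return labels


labels = backbone_torsion_labels(5)
# The feature axis has 3 * n - 3 entries for n residues.
print(len(labels))
print(labels[:4])
```

Each label identifies exactly one column of the feature axis, which is what allows a computed CV to be matched to a named quantity later on.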
- generic_describe()[source]#
Returns a list of generic labels that do not contain residue names. These can be used to stack features from different topologies.
- class AllCartesians(traj, check_aas=True, generic_labels=False, delayed=False)[source]#
Bases:
SelectionFeature
Feature that collects all cartesian positions of all atoms in the trajectory.
Note
The order of the cartesians is not as in standard MD coordinates. Rather than giving the positions of all atoms of the first residue, then all positions of the second, and so on, this feature gives all central (backbone) cartesians first, followed by the cartesians of the sidechains. This allows better and faster backmapping. See encodermap.misc.backmapping._full_backmapping_np for more information on why this is easier.
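The reordering described in this note can be pictured with the sketch below. It is a simplified illustration under stated assumptions: the helper central_first is hypothetical, N, CA, and C are assumed to be the central atoms, and every other atom is treated as non-central. The actual feature may classify atoms differently.

```python
# Assumption for this sketch: N, CA, and C are the central (backbone) atoms.
CENTRAL_ATOM_NAMES = ("N", "CA", "C")


def central_first(atom_names):
    """Return atom indices with central atoms first, all other atoms after."""
    central = [i for i, name in enumerate(atom_names) if name in CENTRAL_ATOM_NAMES]
    other = [i for i, name in enumerate(atom_names) if name not in CENTRAL_ATOM_NAMES]
    return central + other


# Two residues in the usual per-residue atom order.
names = ["N", "CA", "CB", "C", "N", "CA", "CB", "CG", "C"]
order = central_first(names)
print(order)
```

Within each group the original atom order is preserved, so the backbone block can be sliced off as one contiguous range.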
- Parameters:
traj (SingleTraj)
check_aas (bool)
generic_labels (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- class AngleFeature(traj, angle_indexes, deg=False, cossin=False, periodic=True, check_aas=True, delayed=False)[source]#
Bases:
Feature
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
angle_indexes (np.ndarray)
deg (bool)
cossin (bool)
periodic (bool)
check_aas (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask delayed computations are distributed, the required Python objects are pickled, so every feature would need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculations as transform() in a more bare-bones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.
- Parameters:
indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
deg (bool) – Whether to return the result in degree (deg=True) or in radians (deg=False). Defaults to False (radians).
cossin (bool) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful if you calculate means (e.g. TICA/PCA, clustering) in that space. Defaults to False.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Return type:
dask.delayed
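Why the cossin option helps when taking means can be seen in a small sketch. The helpers cossin_pairs and circular_mean are hypothetical and are not part of the feature classes; they only show that averaging in (sin, cos) space respects the periodicity of angles, whereas a naive mean does not.

```python
import math


def cossin_pairs(angles):
    """Return a (sin(x), cos(x)) pair for each angle given in radians."""
    return [(math.sin(x), math.cos(x)) for x in angles]


def circular_mean(angles):
    """Average in (sin, cos) space and map the result back to an angle."""
    pairs = cossin_pairs(angles)
    mean_sin = sum(s for s, _ in pairs) / len(pairs)
    mean_cos = sum(c for _, c in pairs) / len(pairs)
    return math.atan2(mean_sin, mean_cos)


# Two angles that lie close together across the +/-180 degree boundary.
angles = [math.radians(179.0), math.radians(-179.0)]
naive = sum(angles) / len(angles)  # lands at 0 degrees, opposite to both inputs
circ = circular_mean(angles)       # lands at the +/-180 degree boundary
```

The same reasoning applies to PCA, TICA, or clustering, all of which rely on means and distances in the input space.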
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is not provided with any input, it will take the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with identical topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- class BackboneTorsionFeature(traj, selstr=None, deg=False, cossin=False, periodic=True, delayed=False)[source]#
Bases:
DihedralFeature
- Parameters:
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- class CentralAngles(traj, deg=False, cossin=False, periodic=True, generic_labels=False, check_aas=True, delayed=False)[source]#
Bases:
AngleFeature
Feature that collects all angles in the backbone of a topology.
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
deg (bool)
cossin (bool)
periodic (bool)
generic_labels (bool)
check_aas (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- generic_describe()[source]#
Returns a list of generic labels that do not contain residue names. These can be used to stack features from different topologies.
- class CentralBondDistances(traj, distance_indexes=None, periodic=True, generic_labels=False, check_aas=True, delayed=False)[source]#
Bases:
AllBondDistances
Feature that collects all bonds in the backbone of a topology.
- Parameters:
traj (SingleTraj)
distance_indexes (Optional[np.ndarray])
periodic (bool)
generic_labels (bool)
check_aas (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- generic_describe()[source]#
Returns a list of generic labels that do not contain residue names. These can be used to stack features from different topologies.
- class CentralCartesians(traj, generic_labels=False, check_aas=True, delayed=False)[source]#
Bases:
SelectionFeature
Feature that collects all cartesian positions of the backbone atoms.
Examples
>>> import encodermap as em
>>> from pprint import pprint
>>> traj = em.load_project("pASP_pGLU", 0)[0]
>>> traj
<encodermap.SingleTraj object...>
>>> feature = em.features.CentralCartesians(traj, generic_labels=False)
>>> pprint(feature.describe())
['CENTERPOS X ATOM N: 0 GLU: 1 CHAIN 0',
 'CENTERPOS Y ATOM N: 0 GLU: 1 CHAIN 0',
 'CENTERPOS Z ATOM N: 0 GLU: 1 CHAIN 0',
 'CENTERPOS X ATOM CA: 3 GLU: 1 CHAIN 0',
 'CENTERPOS Y ATOM CA: 3 GLU: 1 CHAIN 0',
 'CENTERPOS Z ATOM CA: 3 GLU: 1 CHAIN 0',
 '...
 'CENTERPOS Z ATOM C: 65 GLU: 6 CHAIN 0']
>>> feature = em.features.CentralCartesians(traj, generic_labels=True)
>>> pprint(feature.describe())
['CENTERPOS X 1',
 'CENTERPOS Y 1',
 'CENTERPOS Z 1',
 'CENTERPOS X 2',
 'CENTERPOS Y 2',
 'CENTERPOS Z 2',
 '...
 'CENTERPOS Z 18']
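The pattern of the generic labels in this example (three labels per atom, numbered from 1) can be reproduced with a short sketch. The helper generic_cartesian_labels is hypothetical; it mirrors the label format shown in the example output and is not the library's own code.

```python
def generic_cartesian_labels(n_atoms: int) -> list[str]:
    """Hypothetical helper: X, Y, Z labels for atoms numbered 1..n_atoms."""
    return [
        f"CENTERPOS {axis} {i}"
        for i in range(1, n_atoms + 1)
        for axis in "XYZ"
    ]


# 18 central atoms, matching the last label of the example output.
labels = generic_cartesian_labels(18)
print(labels[:4])
print(labels[-1])
```

Because these labels carry no residue names, two peptides with the same number of central atoms produce identical label lists.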
- Parameters:
traj (SingleTraj)
generic_labels (bool)
check_aas (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- class CentralDihedrals(traj, selstr=None, deg=False, cossin=False, periodic=True, omega=True, generic_labels=False, check_aas=True, delayed=False)[source]#
Bases:
DihedralFeature
Feature that collects all dihedrals in the backbone of a topology.
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
selstr (Optional[str])
deg (bool)
cossin (bool)
periodic (bool)
omega (bool)
generic_labels (bool)
check_aas (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = True#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- generic_describe()[source]#
Returns a list of generic labels that do not contain residue names. These can be used to stack features from different topologies.
- class ContactFeature(traj, distance_indexes, threshold=5.0, periodic=True, count_contacts=False, delayed=False)[source]#
Bases:
DistanceFeature
Defines certain distances as contacts and returns a binary (0, 1) result.
Instead of returning the binary result, this feature can also count contacts when the argument count_contacts=True is provided at instantiation. In that case, every frame yields a single integer.
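The two output modes can be sketched in plain Python. The function contacts below is a hypothetical stand-in that ignores periodic boundaries and treats a distance strictly below the threshold as a contact; it is meant only to show the difference between the binary result and the per-frame count.

```python
import math


def contacts(xyz, pairs, threshold=5.0, count_contacts=False):
    """Hypothetical non-periodic sketch of a contact calculation.

    xyz: list of frames, each a list of (x, y, z) positions in nm.
    pairs: list of (i, j) atom-index pairs.
    """
    result = []
    for frame in xyz:
        row = [
            1 if math.dist(frame[i], frame[j]) < threshold else 0
            for i, j in pairs
        ]
        result.append(sum(row) if count_contacts else row)
    return result


xyz = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (7.0, 0.0, 0.0)],
]
pairs = [(0, 1), (0, 2), (1, 2)]
binary = contacts(xyz, pairs)
counts = contacts(xyz, pairs, count_contacts=True)
```

The binary mode keeps one column per distance pair, while the counting mode collapses each frame to one number.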
- Parameters:
traj (SingleTraj)
distance_indexes (np.ndarray)
threshold (float)
periodic (bool)
count_contacts (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask delayed computations are distributed, the required Python objects are pickled, so every feature would need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculations as transform() in a more bare-bones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.
- Parameters:
indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
threshold (float) – The threshold in nm, under which a distance is considered to be a contact. Defaults to 5.0 nm.
count_contacts (bool) – When True, return an integer of the number of contacts instead of returning the array of regular contacts.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Return type:
dask.delayed
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is not provided with any input, it will take the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with identical topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- class CustomFeature(fun, dim, traj=None, description=None, fun_args=(), fun_kwargs=None, delayed=False)[source]#
Bases:
Feature
- Parameters:
- _is_custom: Final[True] = True#
- _nonstandard_transform_args: list[str] = ['top', 'indexes', 'delayed_call', '_fun', '_args', '_kwargs']#
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- property dask_indices#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The CustomFeature dask transform is still under development.
- Parameters:
- Return type:
np.ndarray
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- traj: SingleTraj | None = None#
- transform(traj=None, xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is not provided with any input, it will take the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with identical topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
traj (Trajectory | None)
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- class DihedralFeature(traj, dih_indexes, deg=False, cossin=False, periodic=True, check_aas=True, delayed=False)[source]#
Bases:
AngleFeature
Dihedrals are torsion angles defined by four atoms.
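As an illustration of how four atom positions define a torsion angle, the sketch below uses the standard atan2 formulation in plain Python. The function dihedral is hypothetical and is not the routine this class calls; sign conventions differ between packages, so only the magnitude is commented on here.

```python
import math


def _sub(p, q):
    return [a - b for a, b in zip(p, q)]


def _dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def _cross(p, q):
    return [
        p[1] * q[2] - p[2] * q[1],
        p[2] * q[0] - p[0] * q[2],
        p[0] * q[1] - p[1] * q[0],
    ]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle in radians for four points (hypothetical helper)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
cis = dihedral(p0, p1, p2, (1.0, 1.0, 0.0))     # all four atoms on one side
trans = dihedral(p0, p1, p2, (-1.0, 1.0, 0.0))  # planar, opposite sides
perp = dihedral(p0, p1, p2, (0.0, 1.0, 1.0))    # quarter turn
```

The cis arrangement gives an angle of zero, the trans arrangement gives a magnitude of pi, and the perpendicular arrangement gives a magnitude of pi / 2.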
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
dih_indexes (np.ndarray)
deg (bool)
cossin (bool)
periodic (bool)
check_aas (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask delayed computations are distributed, the required Python objects are pickled, so every feature would need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculations as transform() in a more bare-bones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.
- Parameters:
indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based indices of the atoms whose positions should be returned.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
deg (bool) – Whether to return the result in degree (deg=True) or in radians (deg=False). Defaults to False (radians).
cossin (bool) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful if you calculate means (e.g. TICA/PCA, clustering) in that space. Defaults to False.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Return type:
dask.delayed
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When this method is not provided with any input, it will take the traj_container provided as traj in the __init__() method and transforms this trajectory. The argument xyz can be the xyz coordinates in nanometer of a trajectory with identical topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- class DistanceFeature(traj, distance_indexes, periodic=True, dim=None, check_aas=True, delayed=False)[source]#
Bases:
Feature
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
distance_indexes (np.ndarray)
periodic (bool)
dim (Optional[int])
check_aas (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask delayed computations are distributed, the required Python objects are pickled, so every feature would need its own pickled traj, which defeats the purpose of dask distributed. This method therefore implements the same calculations as transform() in a more bare-bones way. It forgoes the checks for periodicity and unit-cell shape and just takes xyz, unitcell vectors, and unitcell info. Furthermore, it is a staticmethod, so it doesn’t require self to function. However, it needs the indexes in self.indexes. That’s why the dask_indices property informs the scheduler to also pickle and pass this object to the workers.
- Parameters:
indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based index of the atoms which positions should be returned.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Return type:
dask.delayed
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When called without arguments, this method transforms the trajectory container that was passed as traj to __init__(). Alternatively, xyz can be the coordinates (in nanometer) of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
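What a distance feature computes, and what the minimum image convention changes, can be sketched as follows. This is an illustration under stated assumptions, not the library's code: pair_distances is a hypothetical helper, and the periodic branch only handles orthorhombic (rectangular) boxes given as edge lengths rather than general Bravais vectors.

```python
import numpy as np

def pair_distances(xyz, pairs, box_lengths=None):
    """Distances for atom pairs, shape (n_frames, n_pairs).

    Hypothetical helper. If box_lengths (n_frames, 3) is given, the minimum
    image convention is applied for a rectangular box only.
    """
    xyz = np.asarray(xyz, dtype=float)
    pairs = np.asarray(pairs)
    # Displacement vectors, shape (n_frames, n_pairs, 3).
    delta = xyz[:, pairs[:, 1]] - xyz[:, pairs[:, 0]]
    if box_lengths is not None:
        box = np.asarray(box_lengths, dtype=float)[:, None, :]
        # Shift each component by whole box lengths to the nearest image.
        delta = delta - box * np.round(delta / box)
    return np.linalg.norm(delta, axis=-1)

# Two atoms near opposite faces of a 2 nm cubic box.
xyz = np.array([[[0.1, 0.0, 0.0], [1.9, 0.0, 0.0]]])
pairs = np.array([[0, 1]])
d_plain = pair_distances(xyz, pairs)
d_pbc = pair_distances(xyz, pairs, box_lengths=np.array([[2.0, 2.0, 2.0]]))
```

Without periodicity the atoms are 1.8 nm apart; with the minimum image convention the nearest periodic image is only 0.2 nm away.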
- class GroupCOMFeature(traj, group_definitions, ref_geom=None, image_molecules=False, mass_weighted=True, delayed=False)[source]#
Bases:
Feature
Cartesian coordinates of the center-of-mass (COM) of atom groups.
Groups are defined as a sequence of sequences of int, so a list of lists of int can define groups of various sizes. The resulting array has the shape (n_frames, n_groups * 3). The xyz coordinates are flattened along the feature axis, so the array can be rebuilt with np.dstack().
Examples
>>> import encodermap as em
>>> import numpy as np
>>> traj = em.SingleTraj.from_pdb_id("1YUG")
>>> f = em.features.GroupCOMFeature(
...     traj=traj,
...     group_definitions=[
...         [0, 1, 2],
...         [3, 4, 5, 6, 7],
...         [8, 9, 10],
...     ]
... )
>>> a = f.transform()
>>> a.shape  # this array is flattened along the feature axis
(15, 9)
>>> a = np.dstack([
...     a[..., ::3],
...     a[..., 1::3],
...     a[..., 2::3],
... ])
>>> a.shape  # now the z coordinate of the 2nd center of mass is a[:, 1, -1]
(15, 3, 3)
Note
Centering (ref_geom) and imaging (image_molecules=True) can be time-consuming. Consider doing this to your trajectory files prior to featurization.
- Parameters:
traj (SingleTraj)
group_definitions (Sequence[Sequence[int]])
ref_geom (Optional[md.Trajectory])
image_molecules (bool)
mass_weighted (bool)
delayed (bool)
- _nonstandard_transform_args: list[str] = ['top', 'ref_geom', 'image_molecules', 'masses_in_groups']#
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask distributes delayed computations, the Python objects they require are pickled and sent to the workers. If every feature carried its own pickled traj, the benefit of distributing the work would be lost. This method therefore implements the same calculation as transform() in a more bare-bones way: it skips the checks for periodicity and unit-cell shape and takes only xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not need self. It still needs the indexes stored in self.indexes, which is why the dask_indices property tells the scheduler to pickle this object and pass it to the workers as well.
- Parameters:
indexes (np.ndarray) – For this special feature, the indexes argument in the @staticmethod dask_transform is self.group_definitions.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
threshold (float) – The threshold in nm, under which a distance is considered to be a contact. Defaults to 5.0 nm.
count_contacts (bool) – When True, return an integer of the number of contacts instead of returning the array of regular contacts.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
top (md.Topology)
ref_geom (Union[md.Trajectory, None])
image_molecules (bool)
- Return type:
dask.delayed
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When called without arguments, this method transforms the trajectory container that was passed as traj to __init__(). Alternatively, xyz can be the coordinates (in nanometer) of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
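The flattened layout of the group centers of mass can be reproduced with plain NumPy. The sketch below is an assumption-laden illustration, not GroupCOMFeature itself: group_coms is a hypothetical helper, masses default to 1 (a geometric center), and no centering or imaging is performed.

```python
import numpy as np

def group_coms(xyz, groups, masses=None):
    """Centers of mass per group, flattened to (n_frames, 3 * n_groups).

    Hypothetical helper; the column order is x1, y1, z1, x2, y2, z2, ...
    """
    xyz = np.asarray(xyz, dtype=float)
    if masses is None:
        masses = np.ones(xyz.shape[1])
    masses = np.asarray(masses, dtype=float)
    coms = []
    for group in groups:
        group = np.asarray(group)
        weights = masses[group] / masses[group].sum()
        # Weighted average over the atoms of this group, per frame.
        coms.append(np.einsum("fad,a->fd", xyz[:, group], weights))
    return np.concatenate(coms, axis=1)

groups = [[0, 1, 2], [3, 4, 5, 6, 7], [8, 9, 10]]
xyz = np.arange(15 * 11 * 3, dtype=float).reshape(15, 11, 3)
flat = group_coms(xyz, groups)

# Two equivalent ways to recover a (n_frames, n_groups, 3) array.
stacked = np.dstack([flat[..., ::3], flat[..., 1::3], flat[..., 2::3]])
reshaped = flat.reshape(flat.shape[0], len(groups), 3)
```

With three groups the flattened feature axis has length 9, and the np.dstack() idiom from the docstring gives the same array as a plain reshape.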
- class InverseDistanceFeature(traj, distance_indexes, periodic=True, delayed=False)[source]#
Bases:
DistanceFeature
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
distance_indexes (np.ndarray)
periodic (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask distributes delayed computations, the Python objects they require are pickled and sent to the workers. If every feature carried its own pickled traj, the benefit of distributing the work would be lost. This method therefore implements the same calculation as transform() in a more bare-bones way: it skips the checks for periodicity and unit-cell shape and takes only xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not need self. It still needs the indexes stored in self.indexes, which is why the dask_indices property tells the scheduler to pickle this object and pass it to the workers as well.
- Parameters:
indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based index of the atoms which positions should be returned.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Return type:
dask.delayed
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When called without arguments, this method transforms the trajectory container that was passed as traj to __init__(). Alternatively, xyz can be the coordinates (in nanometer) of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- class MinRmsdFeature(traj, ref, ref_frame=0, atom_indices=None, precentered=False, delayed=False)[source]#
Bases:
Feature
- Parameters:
traj (SingleTraj)
ref (Union[md.Trajectory, SingleTraj])
ref_frame (int)
atom_indices (Optional[np.ndarray])
precentered (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- property dask_indices#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
Takes xyz and unitcell information to apply the topological calculations on.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
indexes (np.ndarray)
top (md.Topology)
ref (md.Trajectory)
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When called without arguments, this method transforms the trajectory container that was passed as traj to __init__(). Alternatively, xyz can be the coordinates (in nanometer) of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
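A minimal RMSD, meaning the RMSD to a reference frame after optimal superposition, can be sketched with the Kabsch algorithm. This is an illustrative stand-in and not the routine MinRmsdFeature uses: min_rmsd is a hypothetical helper that works on bare coordinate arrays and ignores atom_indices, ref_frame, and precentered.

```python
import numpy as np

def min_rmsd(xyz, ref):
    """RMSD of every frame to ref after optimal rotation and translation.

    Hypothetical helper. xyz has shape (n_frames, n_atoms, 3), ref has shape
    (n_atoms, 3). Returns an array of shape (n_frames, 1).
    """
    xyz = np.asarray(xyz, dtype=float)
    ref = np.asarray(ref, dtype=float)
    ref_c = ref - ref.mean(axis=0)  # remove translation of the reference
    out = np.empty((xyz.shape[0], 1))
    for i, frame in enumerate(xyz):
        frame_c = frame - frame.mean(axis=0)
        u, s, vt = np.linalg.svd(frame_c.T @ ref_c)
        # Flip the smallest singular value if the best fit is a reflection.
        s[-1] *= np.sign(np.linalg.det(u @ vt))
        msd = (np.sum(frame_c**2) + np.sum(ref_c**2) - 2.0 * s.sum()) / len(ref)
        out[i, 0] = np.sqrt(max(msd, 0.0))  # guard against tiny negatives
    return out

ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
rot_z = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = ref @ rot_z.T + np.array([1.0, 2.0, 3.0])  # rigid-body motion only
distorted = ref.copy()
distorted[0, 0] += 0.5                              # real structural change
rmsd = min_rmsd(np.stack([moved, distorted]), ref)
```

The rigidly moved frame has an RMSD of essentially zero, while the distorted frame keeps a positive value that no superposition can remove.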
- class ResidueCOMFeature(traj, residue_indices, residue_atoms, scheme='all', ref_geom=None, image_molecules=False, mass_weighted=True, delayed=False)[source]#
Bases:
GroupCOMFeature
- Parameters:
traj (SingleTraj)
residue_indices (Sequence[int])
residue_atoms (np.ndarray)
scheme (Literal['all', 'backbone', 'sidechain'])
ref_geom (Optional[md.Trajectory])
image_molecules (bool)
mass_weighted (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- class ResidueMinDistanceFeature(traj, contacts, scheme, ignore_nonprotein, threshold, periodic, count_contacts=False, delayed=False)[source]#
Bases:
DistanceFeature
- Parameters:
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask distributes delayed computations, the Python objects they require are pickled and sent to the workers. If every feature carried its own pickled traj, the benefit of distributing the work would be lost. This method therefore implements the same calculation as transform() in a more bare-bones way: it skips the checks for periodicity and unit-cell shape and takes only xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not need self. It still needs the indexes stored in self.indexes, which is why the dask_indices property tells the scheduler to pickle this object and pass it to the workers as well.
- Parameters:
indexes (np.ndarray) – For this special feature, the indexes argument in the @staticmethod dask_transform is self.contacts.
periodic (bool) – Whether to observe the minimum image convention and respect proteins breaking over the periodic boundary condition as a whole (True). In this case, the trajectory container in traj needs to have unitcell information. Defaults to True.
threshold (float) – The threshold in nm, under which a distance is considered to be a contact. Defaults to 5.0 nm.
count_contacts (bool) – When True, return an integer of the number of contacts instead of returning the array of regular contacts.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
top (md.Topology)
scheme (Literal['ca', 'closest', 'closest-heavy'])
- Return type:
dask.delayed
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When called without arguments, this method transforms the trajectory container that was passed as traj to __init__(). Alternatively, xyz can be the coordinates (in nanometer) of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
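The roles of threshold and count_contacts can be illustrated on an array of precomputed residue distances. This is a sketch under assumptions rather than the library's logic: contacts_from_distances is a hypothetical helper, the threshold must be passed explicitly, and the exact return layout of the real feature may differ.

```python
import numpy as np

def contacts_from_distances(distances, threshold, count_contacts=False):
    """Turn (n_frames, n_pairs) distances in nm into contacts.

    Hypothetical helper. A distance below threshold counts as a contact.
    """
    contacts = np.asarray(distances, dtype=float) < threshold
    if count_contacts:
        # One number per frame: how many pairs are in contact.
        return contacts.sum(axis=1, keepdims=True)
    return contacts.astype(int)

distances = np.array([
    [0.3, 0.7, 0.4],
    [0.9, 0.8, 0.2],
])
contact_map = contacts_from_distances(distances, threshold=0.5)
contact_count = contacts_from_distances(distances, threshold=0.5, count_contacts=True)
```

Counting collapses the feature axis to length 1, which is useful when only the overall number of contacts per frame matters.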
- class SelectionFeature(traj, indexes, check_aas=True, delayed=False)[source]#
Bases:
Feature
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
indexes (Sequence[int])
check_aas (bool)
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- property dask_indices: str#
The name of the delayed transformation to carry out with this feature.
- Type:
- static dask_transform()#
The same as transform() but without the need to pickle traj.
When dask distributes delayed computations, the Python objects they require are pickled and sent to the workers. If every feature carried its own pickled traj, the benefit of distributing the work would be lost. This method therefore implements the same calculation as transform() in a more bare-bones way: it skips the checks for periodicity and unit-cell shape and takes only xyz, unitcell vectors, and unitcell info. Because it is a staticmethod, it does not need self. It still needs the indexes stored in self.indexes, which is why the dask_indices property tells the scheduler to pickle this object and pass it to the workers as well.
- Parameters:
indexes (np.ndarray) – A numpy array with shape (n, ) giving the 0-based index of the atoms which positions should be returned.
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Return type:
dask.delayed
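The design idea behind dask_transform, a static method that receives the indexes and the coordinates as plain arguments, can be sketched without dask. The class below is a hypothetical stand-in (ToySelectionFeature is not part of EncoderMap), and the pickle calls only compare payload sizes; they do not reproduce what a dask scheduler actually does.

```python
import pickle
import numpy as np

class ToySelectionFeature:
    """Hypothetical stand-in, not EncoderMap's SelectionFeature."""

    def __init__(self, xyz, indexes):
        self.xyz = np.asarray(xyz, dtype=float)  # stands in for the heavy traj
        self.indexes = np.asarray(indexes)

    def transform(self):
        # Instance method: needs self and therefore the whole trajectory.
        return self.dask_transform(self.indexes, self.xyz)

    @staticmethod
    def dask_transform(indexes, xyz):
        # No self: everything needed arrives as plain arguments.
        return xyz[:, indexes].reshape(xyz.shape[0], -1)

xyz = np.arange(4 * 100 * 3, dtype=float).reshape(4, 100, 3)
feature = ToySelectionFeature(xyz, [0, 5, 7])

# A one-frame block, as a worker might receive it.
block = ToySelectionFeature.dask_transform(feature.indexes, xyz[:1])

# The indexes are far cheaper to ship than the full coordinates.
size_indexes = len(pickle.dumps(feature.indexes))
size_xyz = len(pickle.dumps(feature.xyz))
```

Because the static method only needs the small indexes array plus the block of coordinates it is asked to process, each task stays lightweight regardless of how long the trajectory is.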
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- transform(xyz=None, unitcell_vectors=None, unitcell_info=None)[source]#
Takes xyz and unitcell information to apply the topological calculations on.
When called without arguments, this method transforms the trajectory container that was passed as traj to __init__(). Alternatively, xyz can be the coordinates (in nanometer) of a trajectory with the same topology as self.traj. If periodic was set to True, unitcell_vectors and unitcell_info should also be provided.
- Parameters:
xyz (Optional[np.ndarray]) – A numpy array with shape (n_frames, n_atoms, 3) in nanometer. If None is provided, the coordinates of self.traj will be used. Otherwise, the topology of this set of xyz coordinates should match the topology of self.atom. Defaults to None.
unitcell_vectors (Optional[np.ndarray]) – When periodic is set to True, the unitcell_vectors are needed to calculate the minimum image convention in a periodic space. This numpy array should have the shape (n_frames, 3, 3). The rows of this array correlate to the Bravais vectors a, b, and c.
unitcell_info (Optional[np.ndarray]) – Basically identical to unitcell_vectors. A numpy array of shape (n_frames, 6), where the first three columns are the unitcell_lengths in nanometer. The other three columns are the unitcell_angles in degrees.
- Returns:
The result of the computation with shape (n_frames, n_indexes).
- Return type:
np.ndarray
- class SideChainAngles(traj, deg=False, cossin=False, periodic=True, check_aas=True, generic_labels=False, delayed=False)[source]#
Bases:
AngleFeature
Feature that collects all angles not in the backbone of a topology.
- Parameters:
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- generic_describe()[source]#
Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.
- property indexes#
A (n_angles, 3) shaped numpy array giving the atom indices of the angles to be calculated.
- Type:
np.ndarray
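How an (n_angles, 3) index array maps to angle values can be sketched as follows. This is an illustration only: angles_from_triplets is a hypothetical helper, it ignores periodic boundaries, and it assumes the middle column holds the vertex atom of each angle.

```python
import numpy as np

def angles_from_triplets(xyz, triplets, deg=False):
    """Angles at the middle atom of each triplet, shape (n_frames, n_angles).

    Hypothetical helper without periodic-boundary handling.
    """
    xyz = np.asarray(xyz, dtype=float)
    triplets = np.asarray(triplets)
    # Vectors pointing from the vertex atom to its two neighbors.
    v1 = xyz[:, triplets[:, 0]] - xyz[:, triplets[:, 1]]
    v2 = xyz[:, triplets[:, 2]] - xyz[:, triplets[:, 1]]
    cos = np.einsum("fad,fad->fa", v1, v2) / (
        np.linalg.norm(v1, axis=-1) * np.linalg.norm(v2, axis=-1)
    )
    # Clip to guard against rounding slightly outside [-1, 1].
    angles = np.arccos(np.clip(cos, -1.0, 1.0))
    return np.rad2deg(angles) if deg else angles

xyz = np.array([[[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
right_angle = angles_from_triplets(xyz, [[0, 1, 2]], deg=True)
```

The deg flag mirrors the deg argument of the class signature: the default returns radians, and deg=True converts to degrees.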
- class SideChainBondDistances(traj, periodic=True, check_aas=True, generic_labels=False, delayed=False)[source]#
Bases:
AllBondDistances
Feature that collects all bonds not in the backbone of a topology.
- Parameters:
traj (SingleTraj)
periodic (bool)
check_aas (bool)
generic_labels (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- generic_describe()[source]#
Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.
- property indexes#
A (n_bonds, 2) shaped numpy array giving the atom indices of the distances to be calculated.
- Type:
np.ndarray
- class SideChainCartesians(traj, check_aas=True, generic_labels=False, delayed=False)[source]#
Bases:
SelectionFeature
Feature that collects the Cartesian positions of all non-backbone atoms.
- Parameters:
traj (SingleTraj)
check_aas (bool)
generic_labels (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = False#
- _use_omega = False#
- _use_periodic = False#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
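The labeling scheme that describe() is said to produce can be mimicked in a few lines. Both helpers below are hypothetical and only reproduce the label patterns quoted in the docstring; they are not the methods of any EncoderMap class.

```python
def backbone_torsion_labels(n_residues):
    """Labels for a backbone torsion feature axis of length 3 * n - 3."""
    labels = []
    for i in range(1, n_residues):
        labels.extend([f"phi_{i}", f"omega_{i}", f"psi_{i}"])
    return labels

def distance_label(first_residue, second_residue):
    """Label for a single distance, e.g. an end-to-end distance."""
    return f"distance_between_{first_residue}_and_{second_residue}"

torsion_labels = backbone_torsion_labels(4)
end_to_end = [distance_label("MET1", "LYS80")]
```

For four residues the torsion feature axis has 3 * 4 - 3 = 9 entries, running from phi_1 to psi_3, while the end-to-end distance has a feature axis of length 1.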
- class SideChainDihedrals(traj, selstr=None, deg=False, cossin=False, periodic=True, generic_labels=False, check_aas=True, delayed=False)[source]#
Bases:
DihedralFeature
Feature that collects all dihedrals not in the backbone of a topology.
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
selstr (Optional[str])
deg (bool)
cossin (bool)
periodic (bool)
generic_labels (bool)
check_aas (bool)
delayed (bool)
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- generic_describe()[source]#
Returns a list of generic labels, not containing residue names. These can be used to stack tops of different topology.
- class SideChainTorsions(traj, selstr=None, deg=False, cossin=False, periodic=True, which='all', delayed=False)[source]#
Bases:
DihedralFeature
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
selstr (Optional[str])
deg (bool)
cossin (bool)
periodic (bool)
which (Union[Literal['all'], Sequence[Literal['chi1', 'chi2', 'chi3', 'chi4', 'chi5']]])
delayed (bool)
- _raise_on_unitcell = False#
- _use_angle = True#
- _use_omega = False#
- _use_periodic = True#
- atom_feature = False#
- describe()[source]#
Gives a list of strings describing this feature’s feature-axis.
A feature computes a collective variable (CV). A CV is aligned with an MD trajectory on the time/frame-axis. The feature axis is unique for every feature. A feature describing the backbone torsions (phi, omega, psi) would have a feature axis with the size 3*n-3, where n is the number of residues. The end-to-end distance of a linear protein in contrast would just have a feature axis with length 1. This describe() method will label these values unambiguously. A backbone torsion feature’s describe() could be [‘phi_1’, ‘omega_1’, ‘psi_1’, ‘phi_2’, ‘omega_2’, …, ‘psi_n-1’]. The end-to-end distance feature could be described by [‘distance_between_MET1_and_LYS80’].
- options = ('chi1', 'chi2', 'chi3', 'chi4', 'chi5')#
encodermap.loading.featurizer module#
EncoderMap featurization follows the example of the now-deprecated PyEMMA package.
You can define your features in advance, inspect the expected output, and let the computer do the number crunching afterwards. This can be done either with PyEMMA’s streamable featurization or, newly, with dask and delayed on a dask cluster of your choice. Here are the basic concepts of EncoderMap’s featurization.
- class DaskFeaturizer(trajs, n_workers='cpu-2', client=None)[source]#
Bases:
object
Container for SingleTrajFeaturizer and EnsembleFeaturizer that implements delayed transforms.
The DaskFeaturizer is similar to the other two featurizer classes and mostly implements the same API. However, instead of computing the transformations in memory, it prepares an xarray.Dataset that contains dask.Arrays. This dataset can be evaluated lazily and in a distributed fashion using dask.distributed clients and clusters.
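The idea of lazy evaluation can be sketched without dask. The example below is a plain-Python stand-in, not dask's or EncoderMap's API: the class `LazyFeature` and the function `end_to_end_distance` are hypothetical names chosen for illustration. A feature is defined first and only computed when `compute()` is called.

```python
import math


class LazyFeature:
    """Hypothetical stand-in for a delayed task (not dask's API).

    Stores a per-frame function and the frames, and evaluates only
    when compute() is called.
    """

    def __init__(self, name, func, frames):
        self.name = name
        self.func = func
        self.frames = frames
        self.n_calls = 0  # counts how often the feature was evaluated

    def compute(self):
        self.n_calls += 1
        return [self.func(frame) for frame in self.frames]


def end_to_end_distance(frame):
    """Distance between the first and the last atom of one frame."""
    return math.dist(frame[0], frame[-1])


# Two frames with three atoms each, given as (x, y, z) tuples.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 4.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 2.0)],
]

feature = LazyFeature("end_to_end", end_to_end_distance, frames)
calls_before = feature.n_calls  # nothing has been computed yet
result = feature.compute()      # the work happens here
print(calls_before, result)
```

With dask, the deferred work is represented as a task graph instead, which lets a scheduler distribute the tasks across workers.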
- Parameters:
trajs (Union[SingleTraj, TrajEnsemble])
client (Optional[Client])
- build_graph(traj=None, streamable=False, return_delayeds=False)[source]#
Prepares the dask graph.
- Parameters:
with_trajectories (Optional[bool]) – Whether to also compute xyz. This can be useful if you want to also save the trajectories to disk.
traj (Optional[SingleTraj])
streamable (bool)
return_delayeds (bool)
- Return type:
None
- get_output(make_trace=False)[source]#
This function passes the trajs and the features to dask to create a delayed xarray from them.
- to_netcdf(filename, overwrite=False, with_trajectories=False)[source]#
Saves the dask tasks to a NetCDF4 formatted HDF5 file.
- Parameters:
overwrite (bool) – Whether to overwrite the existing filename.
with_trajectories (bool) – Whether to also save the trajectory data. The output file can be read with encodermap.load(filename), which rebuilds the trajectories complete with traj_nums, common_str, custom_top, and all CVs that this featurizer calculates.
- Returns:
The filename of the created file.
- Return type:
- transform(traj_or_trajs=None, *args, **kwargs)[source]#
- Parameters:
traj_or_trajs (Optional[Union[SingleTraj, TrajEnsemble]])
- Return type:
np.ndarray
- class Featurizer(traj)[source]#
Bases:
object
EncoderMap’s featurization has drawn much inspiration from PyEMMA (markovmodel/PyEMMA).
EncoderMap’s Featurizer collects and computes collective variables (CVs). CVs are data that are aligned with MD trajectories on the frame/time axis. Besides the topology, trajectory data contains an axis for atoms and an axis for the Cartesian coordinates (x, y, z), so that a trajectory can be understood as an array with shape (n_frames, n_atoms, 3). A CV is an array that is aligned with the frame/time axis and has its own feature axis. If the trajectory in our example has 3 residues (MET, ALA, GLY), we can define 6 dihedral angles along the backbone of this peptide. These angles are:
PSI1: Between MET1-N - MET1-CA - MET1-C - ALA2-N
OMEGA1: Between MET1-CA - MET1-C - ALA2-N - ALA2-CA
PHI1: Between MET1-C - ALA2-N - ALA2-CA - ALA2-C
PSI2: Between ALA2-N - ALA2-CA - ALA2-C - GLY3-N
OMEGA2: Between ALA2-CA - ALA2-C - GLY3-N - GLY3-CA
PHI2: Between ALA2-C - GLY3-N - GLY3-CA - GLY3-C
Thus, the collective variable ‘backbone-dihedrals’ provides an array of shape (n_frames, 6) and is aligned with the frame/time axis of the trajectory.
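The six angles listed above follow a regular pattern over consecutive residue pairs. The following minimal sketch enumerates the atom quadruples in plain Python. The helper `backbone_dihedral_atoms` is hypothetical and not part of EncoderMap. It only reproduces the naming used in this example and computes no angles.

```python
def backbone_dihedral_atoms(residues):
    """Hypothetical helper: atom quadruples of the backbone dihedrals.

    Returns a dict that maps a label such as 'PSI1' to the four atoms
    defining the angle, following the naming of the example above.
    """
    names = [f"{res}{i}" for i, res in enumerate(residues, start=1)]
    dihedrals = {}
    for i in range(len(names) - 1):
        a, b = names[i], names[i + 1]  # two consecutive residues
        k = i + 1                      # 1-based index of the pair
        dihedrals[f"PSI{k}"] = (f"{a}-N", f"{a}-CA", f"{a}-C", f"{b}-N")
        dihedrals[f"OMEGA{k}"] = (f"{a}-CA", f"{a}-C", f"{b}-N", f"{b}-CA")
        dihedrals[f"PHI{k}"] = (f"{a}-C", f"{b}-N", f"{b}-CA", f"{b}-C")
    return dihedrals


dihedrals = backbone_dihedral_atoms(["MET", "ALA", "GLY"])
for label, atoms in dihedrals.items():
    print(f"{label}: {' - '.join(atoms)}")
```

The number of entries, here 6, is the size of the feature axis, so the resulting CV has shape (n_frames, 6).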
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])