Featurization#
Features#
Classes to be used as custom features with PyEMMA's add_custom_feature method.
Todo
Write tests
Put the describe_last_feats function into utils.
Add Nan feature.
Write Examples.
- class encodermap.loading.features.AllBondDistances(*args, **kwargs)[source]#
Bases:
DistanceFeature
Feature that collects all bonds in a topology.
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- prefix_label#
A prefix for the labels. In this case it is ‘DISTANCE’.
- Type:
str
- __serialize_fields = ('distance_indexes', 'periodic')#
attribute names to serialize
- __serialize_version = 0#
version of class definition
- describe()[source]#
Returns a list of labels that can be used to unambiguously define atoms in the protein topology.
- Returns:
A list of labels. This list has as many entries as atoms in self.top.
- Return type:
list[str]
- property indexes#
An (n_bonds, 2) shaped numpy array giving the atom indices of the distances to be calculated.
- Type:
np.ndarray
- property name#
The name of the class: “AllBondDistances”.
- Type:
str
- prefix_label = 'DISTANCE '#
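To make the two-column index layout concrete, here is a minimal sketch in plain Python. It does not use encodermap or MDTraj; the coordinates, the `bond_indexes` list, and the `bond_distances` helper are hypothetical stand-ins chosen for illustration.

```python
import math

# Hypothetical coordinates (in nm) for a three-atom fragment; not taken from encodermap.
xyz = [
    (0.0, 0.0, 0.0),
    (0.3, 0.4, 0.0),
    (0.3, 0.4, 1.2),
]

# One row per bond, two atom indices per row, i.e. an (n_bonds, 2) layout.
bond_indexes = [(0, 1), (1, 2)]


def bond_distances(coords, pairs):
    """Return the Euclidean distance for every (i, j) index pair."""
    return [math.dist(coords[i], coords[j]) for i, j in pairs]


distances = bond_distances(xyz, bond_indexes)
```

Each row of the index array yields exactly one distance, so the feature has as many columns as there are bonds.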
- class encodermap.loading.features.AllCartesians(*args, **kwargs)[source]#
Bases:
SelectionFeature
Feature that collects the Cartesian positions of all atoms in the trajectory.
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- prefix_label#
A prefix for the labels. In this case it is ‘POSITION’.
- Type:
str
- __init__(top)[source]#
Instantiate the AllCartesians class.
- Parameters:
top (mdtraj.Topology) – A mdtraj topology.
- __serialize_fields = ('indexes',)#
attribute names to serialize
- __serialize_version = 0#
version of class definition
- describe()[source]#
Returns a list of labels that can be used to unambiguously define atoms in the protein topology.
- Returns:
A list of labels. This list has as many entries as atoms in self.top.
- Return type:
list[str]
- property name#
The name of this class: ‘AllCartesians’
- Type:
str
- prefix_label = 'POSITION '#
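A Cartesian feature can be pictured as flattening the selected atoms' positions into one row per frame. The sketch below is only an illustration in plain Python under that assumption; `xyz`, `indexes`, and `flatten_positions` are hypothetical names, and the actual output layout of the class is not confirmed here.

```python
# Hypothetical positions (in nm) for three atoms; one (x, y, z) row per atom.
xyz = [
    (0.0, 0.1, 0.2),
    (1.0, 1.1, 1.2),
    (2.0, 2.1, 2.2),
]

# Stand-in for the atom indices a selection such as top.select('all') would return.
indexes = [0, 1, 2]


def flatten_positions(coords, atom_indexes):
    """Concatenate the x, y, z values of the selected atoms into one flat list."""
    return [value for i in atom_indexes for value in coords[i]]


features = flatten_positions(xyz, indexes)
```

With this layout, selecting fewer atoms (for example only backbone atoms) simply shortens the flat feature vector.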
- class encodermap.loading.features.CentralAngles(*args, **kwargs)[source]#
Bases:
AngleFeature
Feature that collects all angles in the backbone of a topology.
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- prefix_label#
A prefix for the labels. In this case it is ‘CENTERANGLE’.
- Type:
str
- __serialize_fields = ('angle_indexes', 'deg', 'cossin', 'periodic')#
attribute names to serialize
- __serialize_version = 0#
version of class definition
- describe()[source]#
Returns a list of labels that can be used to unambiguously define atoms in the protein topology.
- Returns:
A list of labels. This list has as many entries as atoms in self.top.
- Return type:
list[str]
- property indexes#
A (n_angles, 3) shaped numpy array giving the atom indices of the angles to be calculated.
- Type:
np.ndarray
- property name#
The name of the class: “CentralAngles”.
- Type:
str
- prefix_label = 'CENTERANGLE '#
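The three-column index layout can be read as "one angle per row, measured at the middle atom". The following plain-Python sketch illustrates that reading; it is not the library's implementation, and the coordinates and helper names are hypothetical.

```python
import math


def angle(a, b, c):
    """Angle (in radians) at atom b formed by the atoms a-b-c."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


# Hypothetical coordinates; the middle atom sits at the origin.
xyz = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]

# One row per angle, three atom indices per row, i.e. an (n_angles, 3) layout.
angle_indexes = [(0, 1, 2)]

angles = [angle(xyz[i], xyz[j], xyz[k]) for i, j, k in angle_indexes]
```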
- class encodermap.loading.features.CentralBondDistances(*args, **kwargs)[source]#
Bases:
AllBondDistances
Feature that collects all bonds in the backbone of a topology.
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- prefix_label#
A prefix for the labels. In this case it is ‘CENTERDISTANCE’.
- Type:
str
- __serialize_fields = ('distance_indexes', 'periodic')#
attribute names to serialize
- __serialize_version = 0#
version of class definition
- property indexes#
An (n_bonds, 2) shaped numpy array giving the atom indices of the distances to be calculated.
- Type:
np.ndarray
- property name#
The name of the class: “CentralBondDistances”.
- Type:
str
- prefix_label = 'CENTERDISTANCE '#
- class encodermap.loading.features.CentralCartesians(*args, **kwargs)[source]#
Bases:
AllCartesians
Feature that collects the Cartesian positions of the backbone atoms.
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- prefix_label#
A prefix for the labels. In this case it is ‘CENTERPOS’.
- Type:
str
- __serialize_fields = ('indexes',)#
attribute names to serialize
- __serialize_version = 0#
version of class definition
- describe()[source]#
Returns a list of labels that can be used to unambiguously define atoms in the protein topology.
- Returns:
A list of labels. This list has as many entries as atoms in self.top.
- Return type:
list[str]
- property name#
The name of the class: “CentralCartesians”.
- Type:
str
- prefix_label = 'CENTERPOS'#
- class encodermap.loading.features.CentralDihedrals(*args, **kwargs)[source]#
Bases:
DihedralFeature
Feature that collects all dihedrals in the backbone of a topology.
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- __init__(topology, selstr=None, deg=False, cossin=False, periodic=True, omega=True, generic_labels=False)[source]#
Instantiate this feature class.
- Parameters:
topology (mdtraj.Topology) – A topology to build features from.
selstr (Optional[str]) – A string that limits the selection of dihedral angles. Only dihedral angles whose atoms are represented by the selstr argument are considered. This selection string follows MDTraj’s atom selection language: https://mdtraj.org/1.9.3/atom_selection.html. Can also be None, in which case all backbone dihedrals (including omega) are considered. Defaults to None.
deg (bool) – Whether to return the result in degrees (deg=True) or in radians (deg=False). Defaults to False (radians).
cossin (bool) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful if you calculate means in that space (e.g., TICA/PCA, clustering). Defaults to False.
periodic (bool) – Whether to recognize periodic boundary conditions and work under the minimum image convention. Defaults to True.
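The motivation for the cossin option can be shown with a small plain-Python sketch: averaging raw angles near ±180° gives a misleading result, while averaging in (sin, cos) space does not. The helper names below are hypothetical, and the (sin(x), cos(x)) ordering simply follows the parameter description above; this is not the library's own code.

```python
import math


def to_cossin(angles_rad):
    """Map every angle x to the pair (sin(x), cos(x))."""
    return [(math.sin(x), math.cos(x)) for x in angles_rad]


def circular_mean(angles_rad):
    """Average in (sin, cos) space and map the result back to an angle."""
    pairs = to_cossin(angles_rad)
    mean_sin = sum(s for s, _ in pairs) / len(pairs)
    mean_cos = sum(c for _, c in pairs) / len(pairs)
    return math.atan2(mean_sin, mean_cos)


# Two dihedral angles that lie close together across the periodic boundary.
angles = [math.radians(179.0), math.radians(-179.0)]

naive_mean = sum(angles) / len(angles)  # lands near 0, far from both angles
wrapped_mean = circular_mean(angles)    # lands near +/- pi, between both angles

# The deg option only changes units; math.degrees performs the same conversion.
wrapped_mean_deg = math.degrees(wrapped_mean)
```

The naive mean sits on the opposite side of the circle from both inputs, which is why mean-based methods are usually run on the (sin, cos) representation.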
- __serialize_fields = ('selstr', '_phi_inds', '_psi_inds', '_omega_inds')#
attribute names to serialize
- __serialize_version = 0#
version of class definition
- property dask_transform#
- describe()[source]#
Returns a list of labels that can be used to unambiguously define atoms in the protein topology.
- Returns:
A list of labels. This list has as many entries as atoms in self.top.
- Return type:
list[str]
- generic_describe()[source]#
Returns a list of generic labels that do not contain residue names. These can be used to stack features of different topologies.
- Returns:
A list of labels.
- Return type:
list[str]
- property indexes#
A (n_angles, 4) shaped numpy array giving the atom indices of the dihedral angles to be calculated.
- Type:
np.ndarray
- property name#
The name of the class: “CentralDihedrals”.
- Type:
str
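Each row of the four-column index array names the four atoms that define one dihedral angle. As a rough sketch of what is computed per row, the plain-Python function below evaluates a dihedral from four points with the common atan2 formulation. It is a stand-in for illustration, not encodermap's or MDTraj's code, and the sign convention is an assumption.

```python
import math


def _sub(a, b):
    return [p - q for p, q in zip(a, b)]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle (in radians) around the p1-p2 axis."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = [c / length for c in b1]
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = [c - _dot(b0, b1) * d for c, d in zip(b0, b1)]
    w = [c - _dot(b2, b1) * d for c, d in zip(b2, b1)]
    return math.atan2(_dot(_cross(b1, v), w), _dot(v, w))


# Hypothetical coordinates: atoms 0-3 are cis, atoms 0, 1, 2, 4 are trans.
xyz = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 1)]

# One row per dihedral, four atom indices per row, i.e. an (n_angles, 4) layout.
dihedral_indexes = [(0, 1, 2, 3), (0, 1, 2, 4)]

values = [dihedral(*(xyz[i] for i in row)) for row in dihedral_indexes]
```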
- class encodermap.loading.features.SideChainAngles(*args, **kwargs)[source]#
Bases:
AngleFeature
Feature that collects all angles not in the backbone of a topology.
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- prefix_label#
A prefix for the labels. In this case it is ‘SIDECHANGLE’.
- Type:
str
- __serialize_fields = ('angle_indexes', 'deg', 'cossin', 'periodic')#
attribute names to serialize
- __serialize_version = 0#
version of class definition
- describe()[source]#
Returns a list of labels that can be used to unambiguously define atoms in the protein topology.
- Returns:
A list of labels. This list has as many entries as atoms in self.top.
- Return type:
list[str]
- property indexes#
A (n_angles, 3) shaped numpy array giving the atom indices of the angles to be calculated.
- Type:
np.ndarray
- property name#
The name of the class: “SideChainAngles”.
- Type:
str
- prefix_label = 'SIDECHANGLE '#
- class encodermap.loading.features.SideChainBondDistances(*args, **kwargs)[source]#
Bases:
AllBondDistances
Feature that collects all bonds not in the backbone of a topology.
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- prefix_label#
A prefix for the labels. In this case it is ‘SIDECHDISTANCE’.
- Type:
str
- __serialize_fields = ('distance_indexes', 'periodic')#
attribute names to serialize
- __serialize_version = 0#
version of class definition
- property indexes#
An (n_bonds, 2) shaped numpy array giving the atom indices of the distances to be calculated.
- Type:
np.ndarray
- property name#
The name of the class: “SideChainBondDistances”.
- Type:
str
- prefix_label = 'SIDECHDISTANCE '#
- class encodermap.loading.features.SideChainCartesians(*args, **kwargs)[source]#
Bases:
AllCartesians
Feature that collects the Cartesian positions of all non-backbone atoms.
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- prefix_label#
A prefix for the labels. In this case it is ‘SIDECHPOS’.
- Type:
str
- __serialize_fields = ('indexes',)#
attribute names to serialize
- __serialize_version = 0#
version of class definition
- property name#
The name of the class: “SideChainCartesians”.
- Type:
str
- prefix_label = 'SIDECHPOS'#
- class encodermap.loading.features.SideChainDihedrals(*args, **kwargs)[source]#
Bases:
DihedralFeature
Feature that collects all dihedrals not in the backbone of a topology.
- top#
Topology of this feature.
- Type:
mdtraj.Topology
- indexes#
The numpy array returned from top.select(‘all’).
- Type:
np.ndarray
- options#
A list of possible sidechain angles [‘chi1’ to ‘chi5’].
- Type:
list[str]
- __serialize_fields: tuple[str] = ('_prefix_label_lengths',)#
attribute names to serialize
- __serialize_version: int = 0#
version of class definition
- describe()[source]#
Returns a list of labels that can be used to unambiguously define atoms in the protein topology.
- Returns:
A list of labels. This list has as many entries as atoms in self.top.
- Return type:
list[str]
- property indexes#
A (n_angles, 4) shaped numpy array giving the atom indices of the dihedral angles to be calculated.
- Type:
np.ndarray
- property name#
The name of the class: “SideChainDihedrals”.
- Type:
str
- options: list[str] = ['chi1', 'chi2', 'chi3', 'chi4', 'chi5']#
- encodermap.loading.features.add_KAC_backbone_bonds(top)[source]#
Adds acetylated Lysine specific backbone bonds to mdtraj.Topology.
- Parameters:
top (mdtraj.Topology) – The topology to be extended.
- Returns:
The new topology with added bonds.
- Return type:
mdtraj.Topology
Note
The bonds are currently not at the correct index, i.e. they are at the very end of top.bonds and not at the correct position.
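The practical effect of the note can be illustrated with plain tuples standing in for bonds. The sketch assumes that "correct position" means ordering by atom index, which the source does not state explicitly; the bond lists are hypothetical and no MDTraj objects are involved.

```python
# Hypothetical existing bonds as (atom_index, atom_index) tuples, ordered by atom index.
bonds = [(0, 1), (1, 2), (4, 5)]

# Hypothetical residue-specific bonds that are added afterwards.
new_bonds = [(2, 3), (3, 4)]

# Appending places the new bonds at the very end of the bond list.
appended = bonds + new_bonds

# Code that relies on index order would have to sort explicitly.
in_order = sorted(appended)
```

Both lists contain the same bonds; only the order differs, which matters for any code that matches bonds to features by position.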
- encodermap.loading.features.add_KAC_sidechain_bonds(top)[source]#
Adds acetylated Lysine specific side chain bonds to mdtraj.Topology. Bonds between indented atoms are added: KAC11-N 102 KAC11-H 103
KAC11-CA 104 KAC11-CB 105 KAC11-CG 106 KAC11-CD 107 KAC11-CE 108 KAC11-NZ 109
KAC11-HZ 110 KAC11-CH 111 KAC11-OI2 112 KAC11-CI1 113 KAC11-C 114 KAC11-O 115
- Parameters:
top (mdtraj.Topology) – The topology to be extended.
- Returns:
The new topology with added bonds.
- Return type:
mdtraj.Topology
Note
The bonds are currently not at the correct index, i.e. they are at the very end of top.bonds and not at the correct position.
- encodermap.loading.features.describe_last_feats(feat: AnyFeature, n: int = 5) → None [source]#
Prints the description of the last n features.
- Parameters:
feat (encodermap.Featurizer) – An instance of a featurizer.
n (Optional[int]) – The number of last features to describe. Defaults to 5.
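The behavior described above amounts to slicing the end of a list of feature labels. The sketch below mimics that with a hypothetical `last_labels` helper operating on a plain list of strings; it is not the function's actual implementation and does not require a featurizer.

```python
def last_labels(labels, n=5):
    """Return the last n labels, or all labels if fewer than n exist."""
    return labels[-n:] if n > 0 else []


# Hypothetical labels, standing in for what a featurizer's describe() might return.
labels = [f"FEATURE {i}" for i in range(8)]

for label in last_labels(labels, n=3):
    print(label)
```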