Featurization#

Features#

Classes to be used as custom features with PyEMMA's add_custom_feature.

Todo

  • Write tests

  • Put the describe_last_feats function into utils.

  • Add Nan feature.

  • Write Examples.

class encodermap.loading.features.AllBondDistances(*args, **kwargs)[source]#

Bases: DistanceFeature

Feature that collects all bonds in a topology.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘DISTANCE’.

Type:

str

__serialize_fields = ('distance_indexes', 'periodic')#

attribute names to serialize

__serialize_version = 0#

version of class definition

describe()[source]#

Returns a list of labels that can be used to unambiguously define atoms in the protein topology.

Returns:

A list of labels. This list has as many entries as atoms in self.top.

Return type:

list[str]

generic_describe()[source]#

property indexes#

A (n_distances, 2) shaped numpy array giving the atom indices of the distances to be calculated.

Type:

np.ndarray
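As a rough illustration of how a two-column index array maps to distances, the sketch below computes one Euclidean distance per index pair. The coordinates, the `bond_indexes` list, and the `bond_distances` helper are assumptions made up for this example, and plain Python lists stand in for the NumPy arrays; this is not EncoderMap's implementation.

```python
import math

# Hypothetical coordinates for four atoms in nanometers (x, y, z).
xyz = [
    (0.0, 0.0, 0.0),
    (0.1, 0.0, 0.0),
    (0.1, 0.2, 0.0),
    (0.1, 0.2, 0.3),
]

# One row per bond, two atom indices per row (illustrative values only).
bond_indexes = [(0, 1), (1, 2), (2, 3)]


def bond_distances(coords, indexes):
    """Return one Euclidean distance per (i, j) index pair."""
    return [math.dist(coords[i], coords[j]) for i, j in indexes]


distances = bond_distances(xyz, bond_indexes)
for (i, j), d in zip(bond_indexes, distances):
    print(f"atoms {i}-{j}: {d:.3f} nm")
```

The result has one entry per row of the index list, which mirrors the first dimension of the index array described above.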

property name#

The name of the class: “AllBondDistances”.

Type:

str

prefix_label = 'DISTANCE        '#

class encodermap.loading.features.AllCartesians(*args, **kwargs)[source]#

Bases: SelectionFeature

Feature that collects the Cartesian positions of all atoms in the trajectory.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘POSITION’.

Type:

str

__init__(top)[source]#

Instantiate the AllCartesians class.

Parameters:

top (mdtraj.Topology) – A mdtraj topology.

__serialize_fields = ('indexes',)#

attribute names to serialize

__serialize_version = 0#

version of class definition

describe()[source]#

Returns a list of labels that can be used to unambiguously define atoms in the protein topology.

Returns:

A list of labels. This list has as many entries as atoms in self.top.

Return type:

list[str]

property name#

The name of this class: ‘AllCartesians’

Type:

str

prefix_label = 'POSITION '#

class encodermap.loading.features.CentralAngles(*args, **kwargs)[source]#

Bases: AngleFeature

Feature that collects all angles in the backbone of a topology.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘CENTERANGLE’.

Type:

str

__serialize_fields = ('angle_indexes', 'deg', 'cossin', 'periodic')#

attribute names to serialize

__serialize_version = 0#

version of class definition

describe()[source]#

Returns a list of labels that can be used to unambiguously define atoms in the protein topology.

Returns:

A list of labels. This list has as many entries as atoms in self.top.

Return type:

list[str]

generic_describe()[source]#

property indexes#

A (n_angles, 3) shaped numpy array giving the atom indices of the angles to be calculated.

Type:

np.ndarray
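To make the three-column layout concrete, here is a minimal sketch that evaluates the angle at the middle atom of each index triple. The coordinates, the `angle_indexes` list, and the `angle` helper are hypothetical and use plain Python instead of NumPy; the `deg` switch only mimics the degree/radian option that appears elsewhere on this page.

```python
import math


def angle(a, b, c, deg=False):
    """Return the angle at atom b formed by the atoms a-b-c."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(u * v for u, v in zip(ba, bc))
    norm = math.hypot(*ba) * math.hypot(*bc)
    # Clamp to guard against rounding slightly outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, dot / norm)))
    return math.degrees(theta) if deg else theta


# Hypothetical coordinates for four atoms.
xyz = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 2.0, 1.0)]

# One row per angle, three atom indices per row (illustrative values only).
angle_indexes = [(0, 1, 2), (1, 2, 3)]

angles = [angle(xyz[i], xyz[j], xyz[k], deg=True) for i, j, k in angle_indexes]
print(angles)
```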

property name#

The name of the class: “CentralAngles”.

Type:

str

prefix_label = 'CENTERANGLE '#

class encodermap.loading.features.CentralBondDistances(*args, **kwargs)[source]#

Bases: AllBondDistances

Feature that collects all bonds in the backbone of a topology.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘CENTERDISTANCE’.

Type:

str

__serialize_fields = ('distance_indexes', 'periodic')#

attribute names to serialize

__serialize_version = 0#

version of class definition

property indexes#

A (n_distances, 2) shaped numpy array giving the atom indices of the distances to be calculated.

Type:

np.ndarray

property name#

The name of the class: “CentralBondDistances”.

Type:

str

prefix_label = 'CENTERDISTANCE  '#

class encodermap.loading.features.CentralCartesians(*args, **kwargs)[source]#

Bases: AllCartesians

Feature that collects the Cartesian positions of the backbone atoms.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘CENTERPOS’.

Type:

str

__serialize_fields = ('indexes',)#

attribute names to serialize

__serialize_version = 0#

version of class definition

describe()[source]#

Returns a list of labels that can be used to unambiguously define atoms in the protein topology.

Returns:

A list of labels. This list has as many entries as atoms in self.top.

Return type:

list[str]

generic_describe()[source]#

property name#

The name of the class: “CentralCartesians”.

Type:

str

prefix_label = 'CENTERPOS'#

class encodermap.loading.features.CentralDihedrals(*args, **kwargs)[source]#

Bases: DihedralFeature

Feature that collects all dihedrals in the backbone of a topology.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

__init__(topology, selstr=None, deg=False, cossin=False, periodic=True, omega=True, generic_labels=False)[source]#

Instantiate this feature class.

Parameters:
  • topology (mdtraj.Topology) – A topology to build features from.

  • selstr (Optional[str]) – A string that limits the selection of dihedral angles. Only dihedral angles whose atoms are represented by the selstr argument are considered. This selection string follows MDTraj’s atom selection language: https://mdtraj.org/1.9.3/atom_selection.html. Can also be None, in which case all backbone dihedrals (including omega) are considered. Defaults to None.

  • deg (bool) – Whether to return the result in degrees (deg=True) or in radians (deg=False). Defaults to False (radians).

  • cossin (bool) – If True, each angle is returned as a pair (sin(x), cos(x)). This is useful if you calculate means (e.g., TICA/PCA, clustering) in that space. Defaults to False.

  • periodic (bool) – Whether to recognize periodic boundary conditions and work under the minimum image convention. Defaults to True.
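The motivation for the cossin option can be sketched with a small, self-contained example: averaging raw angles near the ±180° boundary gives a misleading result, whereas averaging their (sin, cos) pairs and converting back does not. The helper names `cossin` and `circular_mean` are assumptions made for this illustration and are not part of EncoderMap.

```python
import math


def cossin(angles):
    """Encode each angle (in radians) as a (sin(x), cos(x)) pair."""
    return [(math.sin(a), math.cos(a)) for a in angles]


def circular_mean(angles):
    """Average in (sin, cos) space and convert back to an angle."""
    pairs = cossin(angles)
    mean_sin = sum(s for s, _ in pairs) / len(pairs)
    mean_cos = sum(c for _, c in pairs) / len(pairs)
    return math.atan2(mean_sin, mean_cos)


# Two dihedral angles on either side of the periodic boundary.
angles = [math.radians(179.0), math.radians(-179.0)]

naive_mean = sum(angles) / len(angles)  # arithmetic mean of the raw values
periodic_mean = circular_mean(angles)   # mean taken in (sin, cos) space

print(math.degrees(naive_mean), math.degrees(periodic_mean))
```

The arithmetic mean of the raw values lands at 0°, on the opposite side of the circle from both inputs, while the mean taken in (sin, cos) space stays at the ±180° boundary where the two angles actually lie.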

__serialize_fields = ('selstr', '_phi_inds', '_psi_inds', '_omega_inds')#

attribute names to serialize

__serialize_version = 0#

version of class definition

property dask_transform#

describe()[source]#

Returns a list of labels that can be used to unambiguously define atoms in the protein topology.

Returns:

A list of labels. This list has as many entries as atoms in self.top.

Return type:

list[str]

generic_describe()[source]#

Returns a list of generic labels that do not contain residue names. These labels can be used to stack data from different topologies.

Returns:

A list of labels.

Return type:

list[str]
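The idea behind generic labels can be illustrated with a small sketch. The label formats below are invented for this example and are not EncoderMap's actual output; the point is only that dropping residue names makes the labels of two different sequences line up.

```python
# Hypothetical backbone dihedrals of two peptides with different sequences:
# (angle name, residue name, residue number).
peptide_a = [("PSI", "ALA", 1), ("PHI", "GLY", 2), ("PSI", "GLY", 2), ("PHI", "SER", 3)]
peptide_b = [("PSI", "VAL", 1), ("PHI", "LEU", 2), ("PSI", "LEU", 2), ("PHI", "THR", 3)]


def specific_labels(dihedrals):
    """Labels with residue names: unambiguous within a single topology."""
    return [f"{name} {res}{num}" for name, res, num in dihedrals]


def generic_labels(dihedrals):
    """Labels without residue names: angle type plus a running count."""
    counts = {}
    labels = []
    for name, _res, _num in dihedrals:
        counts[name] = counts.get(name, 0) + 1
        labels.append(f"{name} {counts[name]}")
    return labels


print(specific_labels(peptide_a))
print(generic_labels(peptide_a))
```

In this sketch the residue-specific labels of the two peptides differ, while their generic labels are identical, so the corresponding feature columns could share one axis.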

property indexes#

A (n_angles, 4) shaped numpy array giving the atom indices of the dihedral angles to be calculated.

Type:

np.ndarray
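For readers unfamiliar with four-column dihedral indices, the following sketch evaluates a torsion angle for each index quadruple using a common projection-based formula. The coordinates, the `dihedral_indexes` list, and the helper functions are assumptions made for this illustration; the sketch uses plain Python and makes no claim about how EncoderMap or MDTraj compute the angle or which sign convention they follow.

```python
import math


def _sub(a, b):
    return tuple(p - q for p, q in zip(a, b))


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0, p1, p2, p3, deg=False):
    """Torsion angle around the p1-p2 axis for four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    phi = math.atan2(_dot(_cross(b1, v), w), _dot(v, w))
    return math.degrees(phi) if deg else phi


# Hypothetical coordinates; atoms 3, 4 and 5 are alternative fourth atoms.
xyz = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1), (-1, 0, 1), (0, 1, 1)]

# One row per dihedral, four atom indices per row (illustrative values only).
dihedral_indexes = [(0, 1, 2, 3), (0, 1, 2, 4), (0, 1, 2, 5)]

torsions = [dihedral(*(xyz[i] for i in row), deg=True) for row in dihedral_indexes]
print(torsions)
```

The three rows correspond to a cis arrangement, a trans arrangement, and a perpendicular arrangement of the first and last atom around the central bond.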

property name#

The name of the class: “CentralDihedrals”.

Type:

str

class encodermap.loading.features.SideChainAngles(*args, **kwargs)[source]#

Bases: AngleFeature

Feature that collects all angles not in the backbone of a topology.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘SIDECHANGLE’.

Type:

str

__serialize_fields = ('angle_indexes', 'deg', 'cossin', 'periodic')#

attribute names to serialize

__serialize_version = 0#

version of class definition

describe()[source]#

Returns a list of labels that can be used to unambiguously define atoms in the protein topology.

Returns:

A list of labels. This list has as many entries as atoms in self.top.

Return type:

list[str]

property indexes#

A (n_angles, 3) shaped numpy array giving the atom indices of the angles to be calculated.

Type:

np.ndarray

property name#

The name of the class: “SideChainAngles”.

Type:

str

prefix_label = 'SIDECHANGLE '#

class encodermap.loading.features.SideChainBondDistances(*args, **kwargs)[source]#

Bases: AllBondDistances

Feature that collects all bonds not in the backbone of a topology.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘SIDECHDISTANCE’.

Type:

str

__serialize_fields = ('distance_indexes', 'periodic')#

attribute names to serialize

__serialize_version = 0#

version of class definition

property indexes#

A (n_distances, 2) shaped numpy array giving the atom indices of the distances to be calculated.

Type:

np.ndarray

property name#

The name of the class: “SideChainBondDistances”.

Type:

str

prefix_label = 'SIDECHDISTANCE  '#

class encodermap.loading.features.SideChainCartesians(*args, **kwargs)[source]#

Bases: AllCartesians

Feature that collects the Cartesian positions of all non-backbone atoms.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

prefix_label#

A prefix for the labels. In this case it is ‘SIDECHPOS’.

Type:

str

__serialize_fields = ('indexes',)#

attribute names to serialize

__serialize_version = 0#

version of class definition

property name#

The name of the class: “SideChainCartesians”.

Type:

str

prefix_label = 'SIDECHPOS'#

class encodermap.loading.features.SideChainDihedrals(*args, **kwargs)[source]#

Bases: DihedralFeature

Feature that collects all dihedrals in the sidechains of a topology.

top#

Topology of this feature.

Type:

mdtraj.Topology

indexes#

The numpy array returned from top.select(‘all’).

Type:

np.ndarray

options#

A list of possible sidechain angles [‘chi1’ to ‘chi5’].

Type:

list[str]

__serialize_fields: tuple[str] = ('_prefix_label_lengths',)#

attribute names to serialize

__serialize_version: int = 0#

version of class definition

describe()[source]#

Returns a list of labels that can be used to unambiguously define atoms in the protein topology.

Returns:

A list of labels. This list has as many entries as atoms in self.top.

Return type:

list[str]

generic_describe()[source]#

property indexes#

A (n_angles, 4) shaped numpy array giving the atom indices of the dihedral angles to be calculated.

Type:

np.ndarray

property name#

The name of the class: “SideChainDihedrals”.

Type:

str

options: list[str] = ['chi1', 'chi2', 'chi3', 'chi4', 'chi5']#

encodermap.loading.features.add_KAC_backbone_bonds(top)[source]#

Adds acetylated-lysine-specific backbone bonds to mdtraj.Topology.

Parameters:

top (mdtraj.Topology) – The topology to be extended.

Returns:

The new topology with added bonds.

Return type:

mdtraj.Topology

Note

The bonds are currently not at the correct index, i.e. they are at the very end of top.bonds and not at the correct position.

encodermap.loading.features.add_KAC_sidechain_bonds(top)[source]#

Adds acetylated-lysine-specific side chain bonds to mdtraj.Topology. Bonds between indented atoms are added:

KAC11-N 102 KAC11-H 103

KAC11-CA 104 KAC11-CB 105 KAC11-CG 106 KAC11-CD 107 KAC11-CE 108 KAC11-NZ 109

KAC11-HZ 110 KAC11-CH 111 KAC11-OI2 112 KAC11-CI1 113 KAC11-C 114 KAC11-O 115

Parameters:

top (mdtraj.Topology) – The topology to be extended.

Returns:

The new topology with added bonds.

Return type:

mdtraj.Topology

Note

The bonds are currently not at the correct index, i.e. they are at the very end of top.bonds and not at the correct position.

encodermap.loading.features.describe_last_feats(feat: AnyFeature, n: int = 5) → None[source]#

Prints the description of the last n features.

Parameters:
  • feat (encodermap.Featurizer) – An instance of a featurizer.

  • n (Optional[int]) – The number of last features to describe. Defaults to 5.
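The behavior described here can be approximated with a short sketch that operates on a plain list of label strings. The `last_labels` helper and the label strings are hypothetical; the real function takes a featurizer instance and prints its output, and its exact formatting is not reproduced here.

```python
def last_labels(labels, n=5):
    """Return the last n entries of a list of feature labels."""
    return labels[-n:] if n > 0 else []


# Hypothetical feature descriptions, standing in for a featurizer's labels.
labels = [f"FEATURE {i}" for i in range(1, 9)]

for label in last_labels(labels, n=3):
    print(label)
```

Looking only at the tail of the description list is a quick way to check what the most recently added feature contributed.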