SingleTraj#

class SingleTraj(traj, top=None, common_str='', backend='no_load', index=None, traj_num=None, basename_fn=None, custom_top=None)[source]#

This class contains the info about a single trajectory.

This class contains many of the attributes and methods of mdtraj.Trajectory. It is meant to be used as a standalone single trajectory or in an ensemble defined in the encodermap.trajinfo.info_all.TrajEnsemble class. Unlike the standard mdtraj.Trajectory, this class loads the MD data only when needed. The location of the file(s) and other attributes like indices (int, list[int], numpy.ndarray, slice) are stored until the traj is accessed via the SingleTraj.traj attribute. The returned traj is a mdtraj.Trajectory with the correct number of frames in the correct sequence.

Besides MD data, this class keeps track of your collective variables (CVs). Often the raw xyz data of a trajectory is not needed to understand the conformation, and suitable CVs are selected to represent a protein via internal coordinates (torsions, pairwise distances, etc.). Whether you call them 'highd' or 'torsions', this class keeps track of them and returns the values when you need them.

SingleTraj supports fancy indexing, so you can extract one or more frames from a Trajectory as a separate trajectory. For example, to form a trajectory with every other frame, you can slice with traj[::2].

Note

SingleTraj uses the nanometer, degree & picosecond unit system.

backend#

Current state of loading. If backend == 'no_load', the xyz data is loaded from disk once it is accessed. If backend == 'mdtraj', the data is already in RAM.

Type:: str
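
The load-on-access behaviour described above can be pictured with a small toy class. This is only a sketch of the general pattern, not EncoderMap's implementation: `LazyTraj`, `fake_loader`, and the list of strings standing in for frames are all hypothetical.

```python
class LazyTraj:
    """Toy stand-in for a lazily loaded trajectory (hypothetical, not EncoderMap code)."""

    def __init__(self, path, loader):
        self.path = path
        self._loader = loader     # callable that reads the frames from `path`
        self._traj = None         # nothing is held in RAM yet
        self.backend = "no_load"

    @property
    def traj(self):
        # Load on first access only, then flip the backend flag.
        if self._traj is None:
            self._traj = self._loader(self.path)
            self.backend = "mdtraj"
        return self._traj


calls = []


def fake_loader(path):
    # Records each call so we can see how often the "disk" is touched.
    calls.append(path)
    return ["frame0", "frame1", "frame2"]


lazy = LazyTraj("protein1_traj1.xtc", fake_loader)
print(lazy.backend)   # no_load
frames = lazy.traj    # first access triggers the load
print(lazy.backend)   # mdtraj
```

Accessing `lazy.traj` a second time returns the cached object, so the loader runs only once.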

common_str#

Substring of traj_file and top_file. Used to group multiple trajectory and topology files. If the trajectory files ['protein1_traj1.xtc', 'protein1_traj2.xtc'] have the same topology stored in a file called 'protein1.pdb', you can load them with common_str='protein1' together with more .xtc and .pdb files, and these two .xtc files will use the correct .pdb file.

Type:: str
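
As a rough illustration of the grouping idea, the following sketch pairs trajectory files with topology files by a shared substring. The helper `group_by_common_str` and the 'protein2' file names are made up for this example; how EncoderMap performs the matching internally is not shown here.

```python
def group_by_common_str(traj_files, top_files, common_strs):
    """Pair each trajectory file with a topology file sharing a common substring.

    Hypothetical helper for illustration only.
    """
    pairs = {}
    for traj_file in traj_files:
        for common_str in common_strs:
            if common_str not in traj_file:
                continue
            # Take the first topology file that contains the same substring.
            matches = [top for top in top_files if common_str in top]
            if matches:
                pairs[traj_file] = matches[0]
            break
    return pairs


pairs = group_by_common_str(
    traj_files=["protein1_traj1.xtc", "protein1_traj2.xtc", "protein2_traj1.xtc"],
    top_files=["protein1.pdb", "protein2.pdb"],
    common_strs=["protein1", "protein2"],
)
for traj_file, top_file in pairs.items():
    print(traj_file, "->", top_file)
```

Note that plain substring matching is ambiguous when one common string contains another (for example 'protein1' and 'protein10'), so distinct, non-overlapping strings are the safer choice.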

index#

A sequence of fancy slices of the trajectory. When the file is loaded from disk, the fancy indexes are applied one after the other.

Type:: Sequence[Union[None, int, list, numpy.ndarray, slice]]
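
To make "applied one after the other" concrete, here is a minimal pure-Python sketch that resolves a sequence of indices to the original frame numbers. `resolve_frame_ids` is a hypothetical helper operating on plain lists; it mirrors the indexing examples further down this page but is not the library's code.

```python
def resolve_frame_ids(n_frames, index_sequence):
    """Apply each index in turn to the frame ids left over by the previous one."""
    ids = list(range(n_frames))
    for index in index_sequence:
        if index is None:
            continue                      # None means "keep everything"
        if isinstance(index, int):
            ids = [ids[index]]            # a single frame
        elif isinstance(index, slice):
            ids = ids[index]              # regular Python slicing
        else:
            ids = [ids[i] for i in index]  # list or array of positions
    return ids


# Frames 0, 1, 5 are selected first; position 2 of that subset is frame 5.
print(resolve_frame_ids(14, [[0, 1, 5], 2]))      # [5]
# A reversed slice with a step of 2.
print(resolve_frame_ids(14, [slice(-1, 7, -2)]))  # [13, 11, 9]
```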

traj_num#

Integer to identify a SingleTraj instance in a TrajEnsemble.

Type:: int

traj_file#

Trajectory file used to create this class.

Type:: str

top_file#

Topology file used to create this class. If a .h5 trajectory was used, traj_file and top_file are identical. If a mdtraj.Trajectory was used to create SingleTraj, these strings are empty.

Type:: str

Examples

Load a pdb file with 14 frames from rcsb.org

>>> import encodermap as em
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
>>> traj  
<encodermap.SingleTraj object...
>>> traj.n_frames
14

Providing common_str sets this attribute.

>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb", common_str="1GHC")
>>> traj.common_str
'1GHC'

Indexing using integers returns a SingleTraj with only one frame.

>>> frame = traj[5]
>>> frame.n_frames
1

Indexing can also use lists of integers.

>>> subset = traj[[0, 1, 5]]
>>> subset.n_frames
3

Further indexing this subset uses the current trajectory ‘as is’. Although frame 0, 1, and 5 have been extracted from traj, we get frame 5 from subset by indexing with 2.

>>> frame = subset[2]
>>> frame.id
array([5])

Indexing using the original frame indices from the file is done using the fsel[] accessor.

>>> frame = subset.fsel[5]
>>> frame.id
array([5])

Advanced slicing

>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")[-1:7:-2]
>>> [frame.id[0] for frame in traj]
[13, 11, 9]

The traj_num argument is mainly used in encodermap.TrajEnsemble, but can be provided manually.

>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb", traj_num=2)
>>> traj.traj_num
2

The argument basename_fn should be a callable that takes a string and returns a string.

>>> from pathlib import Path
>>> def my_basename_fn(filename):
...     stem = str(Path(filename).stem)
...     return "custom_" + stem
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb", basename_fn=my_basename_fn)
>>> traj.basename
'custom_1GHC'
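
When every trajectory file has the same name and only its directory is unique, a basename_fn can return the directory name instead. The sketch below uses only the standard library; the path '/data/run_01/traj.xtc' and both function names are hypothetical, and `PurePosixPath` is used so the result does not depend on the operating system.

```python
from pathlib import PurePosixPath


def default_like_basename(filename):
    # Filename without its extension, similar to the documented default.
    return PurePosixPath(filename).stem


def parent_dir_basename(filename):
    # Name of the directory that contains the file.
    return PurePosixPath(filename).parent.name


path = "/data/run_01/traj.xtc"  # hypothetical location
print(default_like_basename(path))  # traj
print(parent_dir_basename(path))    # run_01
```

A function such as `parent_dir_basename` takes a str and returns a str, so it has the shape that the basename_fn argument expects.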

Build a trajectory ensemble from multiple SingleTraj objects.

>>> traj1 = em.SingleTraj("https://files.rcsb.org/view/1YUG.pdb")
>>> traj2 = em.SingleTraj("https://files.rcsb.org/view/1YUF.pdb")
>>> trajs = traj1 + traj2
>>> print(trajs.n_trajs, trajs.n_frames, [traj.n_frames for traj in trajs])
2 31 [15, 16]

Initialize the SingleTraj object with location and reference pdb file.

Parameters:

traj (Union[str, mdtraj.Trajectory]) – The trajectory. This argument can either be the filename of a trajectory file (.xtc, .dcd, .h5, .trr) or an instance of mdtraj.Trajectory.
top (Union[str, mdtraj.Topology], optional) – The path to the topology file. Defaults to None. If a mdtraj.Trajectory or a .h5 file is provided in traj, this argument will not be used and the topology from the corresponding traj argument will be used.
common_str (str, optional) – A string to group traj of similar topology. If multiple SingleTraj are grouped in one encodermap.trajinfo.info_all.TrajEnsemble, the common_str is used to group them together. Defaults to ‘’ which means this instance of SingleTraj won’t have a common string.
backend (Literal['no_load', 'mdtraj'], optional) –
Choose the backend to load trajectories.
- ’mdtraj’ uses mdtraj, which loads all trajectories into RAM.
- ’no_load’ creates an empty trajectory object.
Defaults to ‘no_load’
index (Optional[Union[int, list[int], numpy.ndarray, slice]]) – An integer or a Sequence of int. If an integer is provided, only the frame at this position will be loaded once the internal mdtraj.Trajectory is accessed. If an array or list is provided, the corresponding frames will be used. Indices always slice the trajectory as is, meaning they don’t index the original frames of the trajectory on disk (see Example section). These indices can have duplicates: [0, 1, 1, 2, 0, 1]. A slice object can also be provided. Supports fancy slicing (traj[1:50:3]). If None is provided, the traj is loaded fully. Defaults to None.
traj_num (Union[int, None], optional) – If working with multiple trajs, this is the easiest unique identifier. If multiple SingleTraj are instantiated by encodermap.trajinfo.info_all.TrajEnsemble the traj_num is used as a unique identifier per traj. Defaults to None.
basename_fn (Optional[Callable[[str], str]]) – A function to apply to traj to give it another identifier. If all your trajs are called 'traj.xtc' and only the directory they’re in gives them a unique identifier, you can provide a function into this argument to split the path. The function has to take a str and return a str. Defaults to None, in which case the filename without extension is used, i.e. the basename is extracted like so: lambda x: x.split('/')[-1].split('.')[0].
custom_top (Optional[encodermap._typing.CustomAAsDict]) – An instance of the encodermap.trajinfo.trajinfo_utils.CustomTopology class or a dictionary that can be made into such.

atom_slice(atom_indices, invert=False)[source]#

Deletes atoms from this SingleTraj instance.

Parameters:

atom_indices (Union[list, numpy.ndarray]) – The indices of the atoms to keep.
invert (bool) – If False, the atoms in atom_indices are the ones to be kept. If True, the atoms in atom_indices are the ones to be removed.

Return type:

None
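
The effect of invert can be sketched on plain integer indices. `atoms_to_keep` is a hypothetical helper, and returning the kept indices in ascending order is an assumption made for this illustration.

```python
def atoms_to_keep(n_atoms, atom_indices, invert=False):
    """Return the atom indices that survive the slice (illustrative only)."""
    selected = set(atom_indices)
    if invert:
        # atom_indices lists the atoms to remove.
        return [i for i in range(n_atoms) if i not in selected]
    # atom_indices lists the atoms to keep.
    return [i for i in range(n_atoms) if i in selected]


print(atoms_to_keep(5, [0, 2]))               # [0, 2]
print(atoms_to_keep(5, [0, 2], invert=True))  # [1, 3, 4]
```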

copy()[source]#

Returns a copy of self.

Return type:: SingleTraj

dash_summary()[source]#

Returns a pandas.DataFrame with useful information about this instance.

Returns:: The dataframe.
Return type:: pd.DataFrame

del_CVs()[source]#

Resets the _CVs attribute to an empty xarray.Dataset.

Return type:: None

classmethod from_pdb_id(pdb_id, traj_num=None)[source]#

Alternate constructor for the SingleTraj class.

Builds a SingleTraj instance from a pdb-id.

Parameters:

pdb_id (str) – The 4-letter pdb id.
traj_num (int | None)

Returns:

A SingleTraj instance.

Return type:

SingleTraj

get_single_frame(key)[source]#

Returns a single frame from the trajectory.

Parameters:: key (Union[int, np.int]) – Index of the frame.
Return type:: SingleTraj

Examples

Import EncoderMap and load SingleTraj.

>>> import encodermap as em
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
>>> traj.n_frames
14

Load the same traj and give it a traj_num for recognition in a set of multiple trajectories.

>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb", traj_num=5)
>>> frame = traj.get_single_frame(2)
>>> frame.id
array([[5, 2]])

iterframes(with_traj_num: bool = False) → Iterable[tuple[int, SingleTraj]][source]#

iterframes(with_traj_num: bool = True) → Iterable[tuple[int, int, SingleTraj]]

Iterator over the frames in this class.

Parameters:

with_traj_num (bool) – Whether to return a three-tuple of traj_num, frame_num, frame (True) or just a two-tuple of frame_num, frame (False).

Yields:

tuple –

A tuple containing the following:

int: The traj_num.
int: The frame_num.
encodermap.SingleTraj: A SingleTraj object.

Examples

Import EncoderMap and create SingleTraj instance.

>>> import encodermap as em
>>> traj = em.SingleTraj('https://files.rcsb.org/view/1YUG.pdb')
>>> traj.n_frames
15

Slicing the trajectory every 5th frame

>>> traj = traj[::5]
>>> traj.n_frames
3

Using the iterframes() iterator.

>>> for frame_num, frame in traj.iterframes():
...     print(frame_num, frame.n_frames)
0 1
5 1
10 1
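
The two tuple shapes can be illustrated with a plain generator. This is a toy sketch under stated assumptions: `iterframes_sketch` is hypothetical, the frames are represented by strings, and the traj_num of 3 is arbitrary.

```python
def iterframes_sketch(frame_nums, frames, traj_num=None, with_traj_num=False):
    """Yield (frame_num, frame) or (traj_num, frame_num, frame)."""
    for frame_num, frame in zip(frame_nums, frames):
        if with_traj_num:
            yield traj_num, frame_num, frame
        else:
            yield frame_num, frame


frame_nums = [0, 5, 10]                    # original frame numbers after traj[::5]
frames = ["frame_a", "frame_b", "frame_c"]  # placeholders for one-frame trajectories

for frame_num, frame in iterframes_sketch(frame_nums, frames):
    print(frame_num, frame)

for traj_num, frame_num, frame in iterframes_sketch(
    frame_nums, frames, traj_num=3, with_traj_num=True
):
    print(traj_num, frame_num, frame)
```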

join(other)[source]#

Join two trajectories together along the time/frame axis.

Note

Returns a mdtraj.Trajectory and thus loses CVs, filenames, etc.

Parameters:: other (SingleTraj | Trajectory)
Return type:: Trajectory

load_CV(data, attr_name=None, cols=None, deg=None, periodic=True, labels=None, override=False)[source]#

Load CVs into traj. Many options are possible. Provide xarray, numpy array, em.loading.feature, em.featurizer, and even string!

This method loads CVs into the SingleTraj instance. Many ways of doing so are available:

numpy.ndarray: The easiest way. Provide a np array and a name for the array, and the data will be saved as an instance variable, accessible via SingleTraj.name.

xarray.DataArray: You can load a multidimensional xarray as data into the class. Please refer to xarray's own documentation if you want to create one yourself.

xarray.Dataset: You can add another dataset to the existing _CVs.

encodermap.loading.features.Feature: If you provide one of the features from encodermap.loading.features, the resulting features will be loaded and also be placed under the set name.

encodermap.loading.featurizer.Featurizer: If you provide a full featurizer, the data will be generated and be accessible as an attribute.

str: If a string is provided, the data will be loaded from a .txt, .npy, or NetCDF / HDF5 .nc file.

Parameters:

data (Union[str, numpy.ndarray, xr.DataArray, em.loading.features.Feature, em.loading.featurizer.Featurizer]) – The CV to load. Either as numpy.ndarray, xarray.DataArray, EncoderMap feature, or EncoderMap Featurizer.
attr_name (Optional[str]) – The name under which the CV should be found in the class. It is needed if a raw numpy array is passed. Otherwise the name will be generated from the filename (if data is a str), the DataArray.name (if data is an xarray.DataArray), or the feature name.
cols (Optional[list[int]]) – A list specifying the columns to use for the high-dimensional data. If your highD data contains (x,y,z,…)-errors or has an enumeration column at col=0, this can be used to remove this unwanted data.
deg (Optional[bool]) – Whether the provided data is in radians (False) or degree (True). It can also be None for non-angular data.
periodic (bool)
labels (Optional[Union[list, str]]) – If you want to label the data you provided, pass a list of str. If set to None, the features in this dimension will be labeled as [f"{attr_name.upper()} FEATURE {i}" for i in range(self.n_frames)]. If a str is provided, the features will be labeled as [f"{attr_name.upper()} {label.upper()} {i}" for i in range(self.n_frames)]. If a list of str is provided, it needs to have the same length as the traj has frames. Defaults to None.
override (bool) – Whether to overwrite existing CVs. The method will also print a message stating which CVs have been overwritten.

Return type:

None
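
The labeling rules for the labels argument can be sketched as a small function. `make_labels` is a hypothetical helper, not part of EncoderMap; it takes the number of labels `n` explicitly instead of reading it from the trajectory, and the ValueError for a list of the wrong length is an assumption of this sketch.

```python
def make_labels(attr_name, n, labels=None):
    """Build n feature labels following the rules described for `labels`."""
    if labels is None:
        return [f"{attr_name.upper()} FEATURE {i}" for i in range(n)]
    if isinstance(labels, str):
        return [f"{attr_name.upper()} {labels.upper()} {i}" for i in range(n)]
    if len(labels) != n:
        # Assumed behaviour for this sketch: refuse mismatched label lists.
        raise ValueError(f"Expected {n} labels, got {len(labels)}.")
    return list(labels)


print(make_labels("z_coordinate", 2))
# ['Z_COORDINATE FEATURE 0', 'Z_COORDINATE FEATURE 1']
print(make_labels("dist", 2, labels="ca"))
# ['DIST CA 0', 'DIST CA 1']
```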

Examples

Import EncoderMap and load an example Trajectory.

>>> import encodermap as em
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")

Load the central dihedrals using data='central_dihedrals' as a shortcut.

>>> traj.load_CV("central_dihedrals")
>>> traj.central_dihedrals.shape
(14, 222)

>>> traj._CVs['central_dihedrals']['CENTRAL_DIHEDRALS'].values[:2]
['CENTERDIH PSI   RESID  MET:   1 CHAIN 0'
 'CENTERDIH OMEGA RESID  MET:   1 CHAIN 0']

Slicing the SingleTraj keeps all CVs in order.

>>> import numpy as np
>>> from pathlib import Path
>>> traj1 = em.SingleTraj(
...     Path(em.__file__).parent.parent / "tests/data/1am7_corrected.xtc",
...     Path(em.__file__).parent.parent / "tests/data/1am7_protein.pdb",
... )
>>> traj1.load_CV(traj1.xyz[..., -1], 'z_coordinate')
...
>>> for i, frame in enumerate(traj1):
...     print(np.array_equal(frame.z_coordinate[0], frame.xyz[0, :, -1]))
...     if i == 3:
...         break
True
True
True
True

Raises:

FileNotFoundError – When the file given by data does not exist.
IOError – When the provided filename does not have .txt, .npy or .nc extension.
TypeError – When data does not match the specified input types.
Exception – When a numpy array has been passed as data and no attr_name has been provided.
Exception – When the provided attr_name is str, but cannot be a python identifier.
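
The checks behind these exceptions might look roughly like the following. This is a sketch using only the standard library; `check_cv_file`, `check_attr_name`, and the error messages are hypothetical and do not reproduce EncoderMap's actual code.

```python
import os
import tempfile
from pathlib import Path

ALLOWED_SUFFIXES = {".txt", ".npy", ".nc"}


def check_cv_file(data):
    """Validate a CV filename (hypothetical helper)."""
    path = Path(data)
    if not path.is_file():
        raise FileNotFoundError(f"{path} does not exist.")
    if path.suffix not in ALLOWED_SUFFIXES:
        raise IOError(f"{path} needs a .txt, .npy, or .nc extension.")
    return path


def check_attr_name(attr_name):
    """Reject names that could not be used as an attribute, e.g. traj.<name>."""
    if not attr_name.isidentifier():
        raise Exception(f"{attr_name!r} is not a valid Python identifier.")
    return attr_name


with tempfile.TemporaryDirectory() as tmp:
    good = os.path.join(tmp, "highd.txt")
    Path(good).write_text("0.1 0.2\n")
    print(check_cv_file(good).suffix)   # .txt

print(check_attr_name("z_coordinate"))  # z_coordinate
```

Because FileNotFoundError is a subclass of IOError (an alias of OSError) in Python 3, code that wants to tell the two cases apart has to catch FileNotFoundError first.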

load_custom_topology(custom_top=None)[source]#

Loads a custom_topology from a CustomTopology class or a dict.
