SingleTraj#
- class SingleTraj(traj, top=None, common_str='', backend='no_load', index=None, traj_num=None, basename_fn=None, custom_top=None)[source]#
This class contains the info about a single trajectory.
This class provides many of the attributes and methods of mdtraj.Trajectory. It is meant to be used as a standalone single trajectory or in an ensemble defined in the encodermap.trajinfo.info_all.TrajEnsemble class. Unlike the standard mdtraj.Trajectory, this class loads the MD data only when needed. The location of the file(s) and other attributes like indices (int, list[int], numpy.ndarray, slice) are stored until the traj is accessed via the SingleTraj.traj attribute. The returned traj is a mdtraj.Trajectory with the correct number of frames in the correct sequence.
Besides MD data, this class keeps track of your collective variables (CVs). Oftentimes the raw xyz data of a trajectory is not needed for understanding the conformation, and suitable CVs are selected to represent a protein via internal coordinates (torsions, pairwise distances, etc.). Whether you call them 'highd' or 'torsions', this class keeps track of everything and returns the values when you need them.
SingleTraj supports fancy indexing, so you can extract one or more frames from a Trajectory as a separate trajectory. For example, to form a trajectory with every other frame, you can slice with traj[::2].
Note
SingleTraj uses the nanometer, degree & picosecond unit system.
- backend#
Current state of loading. If backend == 'no_load', xyz data will be loaded from disk if accessed. If backend == 'mdtraj', the data is already in RAM.
- Type:
str
- common_str#
Substring of traj_file and top_file. Used to group multiple trajectory and topology files. If traj_files=['protein1_traj1.xtc', 'protein1_traj2.xtc'] have the same topology stored in a file called 'protein1.pdb', you can load them with common_str='protein1' together with more .xtc and .pdb files, and these two .xtc files will use the correct .pdb file.
- Type:
str
- index#
A sequence of fancy slices of the trajectory. When the file is loaded from disk, the fancy indexes are applied one after the other.
- Type:
Sequence[Union[None, int, list, numpy.ndarray, slice]]
- top_file#
Topology file used to create this class. If a .h5 trajectory was used, traj_file and top_file are identical. If a mdtraj.Trajectory was used to create SingleTraj, these strings are empty.
- Type:
str
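The grouping idea behind common_str can be sketched in plain Python. This is only an illustration of substring matching; the helper match_topology and the 'protein2' file names are hypothetical and not part of EncoderMap.

```python
def match_topology(traj_files, top_files, common_strs):
    """Pair each trajectory file with the topology sharing its common string.

    Hypothetical helper for illustration; not EncoderMap's implementation.
    """
    pairs = {}
    for traj_file in traj_files:
        for common_str in common_strs:
            if common_str not in traj_file:
                continue
            # Take the first topology file containing the same substring.
            candidates = [top for top in top_files if common_str in top]
            if candidates:
                pairs[traj_file] = candidates[0]
            break
    return pairs


traj_files = ["protein1_traj1.xtc", "protein1_traj2.xtc", "protein2_traj1.xtc"]
top_files = ["protein1.pdb", "protein2.pdb"]
pairs = match_topology(traj_files, top_files, ["protein1", "protein2"])
print(pairs["protein1_traj2.xtc"])  # protein1.pdb
print(pairs["protein2_traj1.xtc"])  # protein2.pdb
```

Both 'protein1' trajectories end up with 'protein1.pdb', even though more .xtc and .pdb files are present.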
Examples
Load a pdb file with 14 frames from rcsb.org.
>>> import encodermap as em
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
>>> traj
<encodermap.SingleTraj object...
>>> traj.n_frames
14
Providing common_str sets this attribute.
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb", common_str="1GHC")
>>> traj.common_str
'1GHC'
Indexing using integers returns a SingleTraj with only one frame.
>>> frame = traj[5]
>>> frame.n_frames
1
Indexing can also use lists of integers.
>>> subset = traj[[0, 1, 5]]
>>> subset.n_frames
3
Further indexing this subset uses the current trajectory ‘as is’. Although frames 0, 1, and 5 have been extracted from traj, we get frame 5 from subset by indexing with 2.
>>> frame = subset[2]
>>> frame.id
array([5])
Indexing using the original frame indices from the file is done using the fsel[] accessor.
>>> frame = subset.fsel[5]
>>> frame.id
array([5])
Advanced slicing
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")[-1:7:-2]
>>> [frame.id[0] for frame in traj]
[13, 11, 9]
The traj_num argument is mainly used in encodermap.TrajEnsemble, but can be provided manually.
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb", traj_num=2)
>>> traj.traj_num
2
The argument basename_fn should be a callable that takes a string and returns a string.
>>> from pathlib import Path
>>> def my_basename_fn(filename):
...     stem = str(Path(filename).stem)
...     return "custom_" + stem
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb", basename_fn=my_basename_fn)
>>> traj.basename
'custom_1GHC'
Build a trajectory ensemble from multiple SingleTraj objects.
>>> traj1 = em.SingleTraj("https://files.rcsb.org/view/1YUG.pdb")
>>> traj2 = em.SingleTraj("https://files.rcsb.org/view/1YUF.pdb")
>>> trajs = traj1 + traj2
>>> print(trajs.n_trajs, trajs.n_frames, [traj.n_frames for traj in trajs])
2 31 [15, 16]
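The difference between positional indexing and indexing by original frame number can be reproduced without any MD data. The toy class below tracks only the original frame numbers. FrameView and by_original are hypothetical names that stand in for the behaviour of [] and fsel[] shown in the examples; they are not EncoderMap code.

```python
class FrameView:
    """Toy stand-in that only tracks original frame numbers (hypothetical)."""

    def __init__(self, original_frames):
        self.original_frames = list(original_frames)

    def __getitem__(self, key):
        # Positional indexing acts on the current view 'as is'.
        if isinstance(key, slice):
            return FrameView(self.original_frames[key])
        if isinstance(key, int):
            return FrameView([self.original_frames[key]])
        return FrameView([self.original_frames[i] for i in key])

    def by_original(self, frame_number):
        # Lookup by the frame number in the file, similar in spirit to fsel[].
        if frame_number not in self.original_frames:
            raise IndexError(f"frame {frame_number} is not part of this view")
        return FrameView([frame_number])


traj = FrameView(range(14))  # 14 frames, as in the 1GHC example
subset = traj[[0, 1, 5]]
print(subset[2].original_frames)              # [5]
print(subset.by_original(5).original_frames)  # [5]
print(traj[-1:7:-2].original_frames)          # [13, 11, 9]
```

Position 2 of the subset and original frame 5 point at the same frame, which is why both lookups in the examples return an id of 5.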
Initialize the SingleTraj object with location and reference pdb file.
- Parameters:
traj (Union[str, mdtraj.Trajectory]) – The trajectory. This argument can either be the filename of a trajectory file (.xtc, .dcd, .h5, .trr) or an instance of mdtraj.Trajectory.
top (Union[str, mdtraj.Topology], optional) – The path to the topology file. Defaults to None. If a mdtraj.Trajectory or a .h5 file is provided in traj, this argument will not be used and the topology from the corresponding traj argument will be used.
common_str (str, optional) – A string to group trajs of similar topology. If multiple SingleTraj are grouped in one encodermap.trajinfo.info_all.TrajEnsemble, the common_str is used to group them together. Defaults to '', which means this instance of SingleTraj won’t have a common string.
backend (Literal['no_load', 'mdtraj'], optional) –
Choose the backend to load trajectories.
'mdtraj' uses mdtraj, which loads all trajectories into RAM.
'no_load' creates an empty trajectory object.
Defaults to 'no_load'.
index (Optional[Union[int, list[int], numpy.ndarray, slice]]) – An integer or a Sequence of int. If an integer is provided, only the frame at this position will be loaded once the internal mdtraj.Trajectory is accessed. If an array or list is provided, the corresponding frames will be used. Indices always slice the trajectory as is, meaning they don’t index the original frames of the trajectory on disk (see Example section). These indices can have duplicates: [0, 1, 1, 2, 0, 1]. A slice object can also be provided. Supports fancy slicing (traj[1:50:3]). If None is provided, the traj is loaded fully. Defaults to None.
traj_num (Union[int, None], optional) – If working with multiple trajs, this is the easiest unique identifier. If multiple SingleTraj are instantiated by encodermap.trajinfo.info_all.TrajEnsemble, the traj_num is used as a unique identifier per traj. Defaults to None.
basename_fn (Optional[Callable[[str], str]]) – A function to apply to traj to give it another identifier. If all your trajs are called 'traj.xtc' and only the directory they’re in gives them a unique identifier, you can provide a function to this argument to split the path. The function has to take a str and return a str. If None is provided, the basename is extracted like so: lambda x: x.split('/')[-1].split('.')[0]. Defaults to None, in which case the filename without extension will be used.
custom_top (Optional[CustomAAsDict]) – An instance of the encodermap.trajinfo.trajinfo_utils.CustomTopology class or a dictionary that can be made into such.
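A basename_fn for the "all files are called traj.xtc" situation could look like the sketch below. The directory layout and the function names are made up for illustration; only the str-to-str signature is taken from the parameter description.

```python
from pathlib import Path


def default_basename(filename):
    # File name without directory and without extension.
    return Path(filename).stem


def parent_dir_basename(filename):
    # Use the name of the containing directory as identifier instead.
    return Path(filename).parent.name


# Hypothetical layout: identical file names, distinct directories.
files = ["runs/replica_1/traj.xtc", "runs/replica_2/traj.xtc"]
print([default_basename(f) for f in files])     # ['traj', 'traj']
print([parent_dir_basename(f) for f in files])  # ['replica_1', 'replica_2']
```

A function like parent_dir_basename takes a str and returns a str, so it fits the signature expected by basename_fn.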
- atom_slice(atom_indices, invert=False)[source]#
Deletes atoms from this SingleTraj instance.
- Parameters:
atom_indices (Union[list, numpy.ndarray]) – The indices of the atoms to keep.
invert (bool) – If False, it is assumed that the atoms in atom_indices are the ones to be kept. If True, the atoms in atom_indices are the ones to be removed.
- Return type:
None
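The effect of invert can be illustrated on bare index lists. The helper below is a hypothetical sketch of the keep/remove logic described above, not the method's actual code.

```python
def resolve_atom_indices(n_atoms, atom_indices, invert=False):
    """Return the indices of the atoms that remain (hypothetical helper)."""
    selected = set(atom_indices)
    if invert:
        # atom_indices names the atoms to remove; keep all others.
        return [i for i in range(n_atoms) if i not in selected]
    # atom_indices names the atoms to keep.
    return sorted(selected)


print(resolve_atom_indices(6, [0, 2, 4]))               # [0, 2, 4]
print(resolve_atom_indices(6, [0, 2, 4], invert=True))  # [1, 3, 5]
```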
- dash_summary()[source]#
Returns a pandas.DataFrame with useful information about this instance.
- Returns:
The dataframe.
- Return type:
pd.DataFrame
- del_CVs()[source]#
Resets the _CVs attribute to an empty xarray.Dataset.
- Return type:
None
- classmethod from_pdb_id(pdb_id, traj_num=None)[source]#
Alternate constructor for the SingleTraj class.
Builds a SingleTraj instance from a pdb-id.
- Parameters:
- Returns:
A SingleTraj instance.
- Return type:
- get_single_frame(key)[source]#
Returns a single frame from the trajectory.
- Parameters:
key (Union[int, np.int]) – Index of the frame.
- Return type:
Examples
Import EncoderMap and load SingleTraj.
>>> import encodermap as em
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
>>> traj.n_frames
14
Load the same traj and give it a traj_num for recognition in a set of multiple trajectories.
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb", traj_num=5)
>>> frame = traj.get_single_frame(2)
>>> frame.id
array([[5, 2]])
- iterframes(with_traj_num: bool = False) Iterable[tuple[int, SingleTraj]] [source]#
- iterframes(with_traj_num: bool = True) Iterable[tuple[int, int, SingleTraj]]
Iterator over the frames in this class.
- Parameters:
with_traj_num (bool) – Whether to return a three-tuple of traj_num, frame_num, frame (True) or just a two-tuple of frame_num, frame (False).
- Yields:
tuple –
- A tuple containing the following:
int: The traj_num (only when with_traj_num is True).
int: The frame_num.
encodermap.SingleTraj: A SingleTraj object.
Examples
Import EncoderMap and create SingleTraj instance.
>>> import encodermap as em
>>> traj = em.SingleTraj('https://files.rcsb.org/view/1YUG.pdb')
>>> traj.n_frames
15
Slicing the trajectory every 5th frame
>>> traj = traj[::5]
>>> traj.n_frames
3
Using the iterframes() iterator.
>>> for frame_num, frame in traj.iterframes():
...     print(frame_num, frame.n_frames)
0 1
5 1
10 1
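The two tuple shapes produced by iterframes() can be mimicked with a small generator. This is a hypothetical sketch: the frame is replaced by a placeholder string and the function signature is simplified.

```python
def iterframes(frame_numbers, traj_num=None, with_traj_num=False):
    """Yield (frame_num, frame) or (traj_num, frame_num, frame) tuples.

    Hypothetical sketch; 'frame' is only a placeholder string here.
    """
    for frame_num in frame_numbers:
        frame = f"frame {frame_num}"
        if with_traj_num:
            yield traj_num, frame_num, frame
        else:
            yield frame_num, frame


frame_numbers = list(range(15))[::5]  # 15 frames sliced with [::5]
print([num for num, _ in iterframes(frame_numbers)])  # [0, 5, 10]
print(next(iterframes(frame_numbers, traj_num=2, with_traj_num=True)))
# (2, 0, 'frame 0')
```

As in the example above, the yielded frame numbers are those of the original file (0, 5, 10), not positions in the sliced trajectory.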
- join(other)[source]#
Join two trajectories together along the time/frame axis.
Note
Returns a mdtraj.Trajectory and thus loses CVs, filenames, etc.
- Parameters:
other (SingleTraj | Trajectory)
- Return type:
Trajectory
- load_CV(data, attr_name=None, cols=None, deg=None, periodic=True, labels=None, override=False)[source]#
Load CVs into traj. Many options are possible. Provide xarray, numpy array, em.loading.feature, em.featurizer, and even string!
This method loads CVs into the SingleTraj instance. Many ways of doing so are available:
- numpy.ndarray: The easiest way. Provide a np array and a name for the array, and the data will be saved as an instance variable, accessible via SingleTraj.name.
- xarray.DataArray: You can load a multidimensional xarray as data into the class. Please refer to xarray's own documentation if you want to create one yourself.
- xarray.Dataset: You can add another dataset to the existing _CVs.
- encodermap.loading.features.Feature: If you provide one of the features from encodermap.loading.features, the resulting features will be loaded and also be placed under the set name.
- encodermap.loading.featurizer.Featurizer: If you provide a full featurizer, the data will be generated and be accessible as an attribute.
- str: If a string is provided, the data will be loaded from a .txt, .npy, or NetCDF / HDF5 .nc file.
- Parameters:
data (Union[str, numpy.ndarray, xr.DataArray, em.loading.features.Feature, em.loading.featurizer.Featurizer]) – The CV to load. Either as numpy.ndarray, xarray.DataArray, EncoderMap feature, or EncoderMap Featurizer.
attr_name (Optional[str]) – The name under which the CV should be found in the class. It is needed if a raw numpy array is passed; otherwise the name will be generated from the filename (if data == str), the DataArray.name (if data == xarray.DataArray), or the feature name.
cols (Optional[list]) – A list specifying the columns to use for the high-dimensional data. If your highD data contains (x,y,z,…)-errors or has an enumeration column at col=0, this can be used to remove this unwanted data.
deg (Optional[bool]) – Whether the provided data is in radians (False) or degree (True). It can also be None for non-angular data.
periodic (bool)
labels (Optional[Union[list, str]]) – If you want to label the data you provided, pass a list of str. If set to None, the features in this dimension will be labeled as [f"{attr_name.upper()} FEATURE {i}" for i in range(self.n_frames)]. If a str is provided, the features will be labeled as [f"{attr_name.upper()} {label.upper()} {i}" for i in range(self.n_frames)]. If a list of str is provided, it needs to have the same length as the traj has frames. Defaults to None.
override (bool) – Whether to overwrite existing CVs. The method will also print a message stating which CVs have been overwritten.
- Return type:
None
Examples
Import EncoderMap and load an example Trajectory.
>>> import encodermap as em
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
Load the central dihedrals using data='central_dihedrals' as shortcut.
>>> traj.load_CV("central_dihedrals")
>>> traj.central_dihedrals.shape
(14, 222)
>>> traj._CVs['central_dihedrals']['CENTRAL_DIHEDRALS'].values[:2]
['CENTERDIH PSI   RESID  MET:   1 CHAIN 0'
 'CENTERDIH OMEGA RESID  MET:   1 CHAIN 0']
Slicing the SingleTraj keeps all CVs in order.
>>> import numpy as np
>>> from pathlib import Path
>>> traj1 = em.SingleTraj(
...     Path(em.__file__).parent.parent / "tests/data/1am7_corrected.xtc",
...     Path(em.__file__).parent.parent / "tests/data/1am7_protein.pdb",
... )
>>> traj1.load_CV(traj1.xyz[..., -1], 'z_coordinate')
...
>>> for i, frame in enumerate(traj1):
...     print(np.array_equal(frame.z_coordinate[0], frame.xyz[0, :, -1]))
...     if i == 3:
...         break
True
True
True
True
- Raises:
FileNotFoundError – When the file given by data does not exist.
IOError – When the provided filename does not have .txt, .npy or .nc extension.
TypeError – When data does not match the specified input types.
Exception – When a numpy array has been passed as data and no attr_name has been provided.
Exception – When the provided attr_name is str, but cannot be a python identifier.
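For file-name input, the checks listed under Raises can be sketched with the standard library. The helper validate_cv_file, its error messages, and the order of the checks are assumptions made for this illustration; only the exception types, the allowed extensions, and the fallback to the file name come from the description above.

```python
import os
import tempfile
from pathlib import Path

ALLOWED_EXTENSIONS = {".txt", ".npy", ".nc"}


def validate_cv_file(data, attr_name=None):
    """Mimic the documented checks for str input (hypothetical helper)."""
    if not isinstance(data, str):
        raise TypeError("this sketch only handles file names given as str")
    path = Path(data)
    if path.suffix not in ALLOWED_EXTENSIONS:
        raise IOError(f"unsupported extension: {path.suffix!r}")
    if not path.is_file():
        raise FileNotFoundError(data)
    # Without an explicit name, fall back to the file name without extension.
    name = attr_name if attr_name is not None else path.stem
    if not name.isidentifier():
        raise Exception(f"{name!r} cannot be used as a Python identifier")
    return name


# A valid, existing file: the attribute name is derived from the file name.
with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "highd.npy")
    Path(fname).touch()
    name = validate_cv_file(fname)

# An unsupported extension is rejected before the file is looked up.
try:
    validate_cv_file("cv.csv")
except IOError as err:
    message = str(err)

print(name)     # highd
print(message)  # unsupported extension: '.csv'
```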
- load_custom_topology(custom_top=None)[source]#
Loads a custom_topology from a CustomTopology class or a dict.
- Parameters:
custom_top (Optional[Union[CustomTopology, CustomAAsDict]]) – An instance of encodermap.trajinfo.trajinfo_utils.CustomTopology or a dictionary that can be made into such.
- Return type:
None
- load_traj(new_backend='mdtraj')[source]#
Loads the trajectory, with a new specified backend.
After this is called, the instance variable self.trajectory will contain a mdtraj Trajectory object.
- Parameters:
new_backend (str, optional) –
- Can either be:
'mdtraj' to load the trajectory using mdtraj.
'no_load' to not load the traj (unload).
Defaults to 'mdtraj'.
- Return type:
None
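The two backend states can be modelled with a toy class that reads data only on request. LazyTraj and fake_loader are hypothetical and stand in for the real file reading. Whether the real class caches data on access is not stated here, so this sketch does not cache unless load_traj() is called.

```python
class LazyTraj:
    """Toy model of the 'no_load' / 'mdtraj' states (hypothetical sketch)."""

    def __init__(self, traj_file, loader):
        self.traj_file = traj_file
        self.backend = "no_load"
        self._loader = loader  # callable that reads the file
        self._data = None

    @property
    def traj(self):
        if self.backend == "mdtraj":
            return self._data  # already in RAM
        return self._loader(self.traj_file)  # read on access, not cached

    def load_traj(self, new_backend="mdtraj"):
        if new_backend == "mdtraj":
            self._data = self._loader(self.traj_file)
            self.backend = "mdtraj"
        elif new_backend == "no_load":
            self.unload()
        else:
            raise ValueError(f"unknown backend: {new_backend!r}")

    def unload(self):
        # Drop the in-memory data; the file location is kept.
        self._data = None
        self.backend = "no_load"


calls = []


def fake_loader(fname):
    calls.append(fname)
    return f"<data from {fname}>"


lazy = LazyTraj("traj.xtc", fake_loader)
print(lazy.backend, len(calls))  # no_load 0
lazy.load_traj()
lazy.traj
lazy.traj
print(lazy.backend, len(calls))  # mdtraj 1
lazy.load_traj("no_load")
print(lazy.backend)              # no_load
```

Nothing is read when the object is created. After load_traj(), repeated access reuses the data held in memory.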
- save(fname, CVs='all', overwrite=False)[source]#
Save the trajectory to disk in HDF5 file format.
- Parameters:
- Raises:
IOError – When the file already exists and overwrite is False.
- Return type:
None
- save_CV_as_numpy(attr_name, fname=None, overwrite=False)[source]#
Saves a specified collective variable of this traj as a .npy file.
This got its own method for parallelization purposes.
- select(sel_str='all')[source]#
Execute a selection against the topology.
Examples
>>> import encodermap as em
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
>>> select = traj.top.select("name CA and resSeq 1")
>>> select
array([1])
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
>>> select = traj.top.select("name CA and resSeq 1")
>>> traj.top.atom(select[0])
MET1-CA
- show_traj(gui=True)[source]#
Returns an nglview view object.
- Returns:
The nglview widget object.
- Return type:
view (nglview.widget)
- Parameters:
gui (bool)
- sidechain_info()[source]#
Indices used for the AngleDihedralCartesianEncoderMap class to allow training with multiple different sidechains.
- Returns:
The indices. The key ‘-1’ is used for the hypothetical convex hull of all feature spaces (the output of the tensorflow model). The other keys match the common_str of the trajs.
- Return type:
- Raises:
Exception – When the common_strings and topologies are not aligned. Aligned means that all trajs with the same common_str should possess the same topology.
- stack(other)[source]#
Stack two trajectories along the atom axis.
Note
Returns a mdtraj.Trajectory and thus loses CVs, filenames, etc.
- Parameters:
other (SingleTraj)
- Return type:
Trajectory
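The difference between joining along the frame axis (join) and stacking along the atom axis (stack) can be shown with nested lists. This is a toy model with made-up atom labels, not the mdtraj implementation.

```python
# A trajectory is modelled as a list of frames; each frame is a list of atoms.
traj_a = [["a0", "a1"], ["a0", "a1"]]              # 2 frames, 2 atoms
traj_b = [["b0", "b1", "b2"], ["b0", "b1", "b2"]]  # 2 frames, 3 atoms
traj_c = [["a0", "a1"]]                            # 1 frame, 2 atoms


def join(first, second):
    # Frame axis: atom counts must match, frame counts add up.
    if len(first[0]) != len(second[0]):
        raise ValueError("number of atoms must match")
    return first + second


def stack(first, second):
    # Atom axis: frame counts must match, atom counts add up.
    if len(first) != len(second):
        raise ValueError("number of frames must match")
    return [frame_a + frame_b for frame_a, frame_b in zip(first, second)]


joined = join(traj_a, traj_c)
stacked = stack(traj_a, traj_b)
print(len(joined), len(joined[0]))    # 3 2
print(len(stacked), len(stacked[0]))  # 2 5
```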
- superpose(reference, frame=0, atom_indices=None, ref_atom_indices=None, parallel=True, inherit_CVs=False)[source]#
Superpose each conformation in this trajectory upon a reference.
- Parameters:
reference (Union[mdtraj.Trajectory, SingleTraj]) – The reference frame to align to. If the reference has multiple frames and you want to use a specific frame as reference, use the frame argument also.
frame (int, optional) – Align to this frame in reference. Default is 0.
atom_indices (Union[np.array, None], optional) – Indices in self, used to calculate RMS values. Defaults to None which means all atoms will be used.
ref_atom_indices (Union[np.array, None], optional) – Indices in reference, used to calculate RMS values. Defaults to None which means all atoms will be used.
parallel (bool, optional) – Use OpenMP to run the superposition in parallel over multiple cores.
inherit_CVs (bool, optional) – Whether to also inherit the CVs. This feature is currently not implemented. It would require additional code in all Feature classes to distinguish intrinsic features (distance, angle, cluster_membership, etc.) from extrinsic ones (absolute coordinate, COG position, etc.). This extrinsic/intrinsic boolean flag would then also need to accompany the xarray Datasets, so that the intrinsic CVs can be inherited and the extrinsic ones can be dropped with a corresponding message.
- Returns:
A new trajectory with atoms aligned.
- Return type:
- unload(CVs=False)[source]#
Clears up RAM by deleting the trajectory info and the CV data.
If CVs is set to True, the loaded CVs will also be deleted.
- Parameters:
CVs (bool, optional) – Whether to also delete CVs, defaults to False.
- Return type:
None
- property CVs: dict[str, ndarray]#
Returns a simple dict from the more complicated self._CVs xarray Dataset.
If self._CVs is empty and self.traj_file is an HDF5 (.h5) file, the contents of the HDF5 file will be checked for stored CVs. If there are none, an empty dict will be returned.
- Type:
- property _n_frames_base_h5_file: int#
Can be used to get n_frames without loading an HDF5 into memory.
- Type:
- property _original_frame_indices: ndarray#
If trajectory has not been loaded, it is loaded and the frames of the trajectory file on disk are put into a np.arange(). If the trajectory is sliced in weird ways, this array tracks the original frames.
- Type:
- property _traj#
Needs to be here to complete setter. Not returning anything, because setter is also not returning anything.
- property basename: str#
Basename is the filename without path and without extension. If basename_fn is not None, it will be applied to traj_file.
- Type:
- property extension: str#
Extension is the file extension of the trajectory file (self.traj_file).
- Type:
- property id: ndarray#
id is an array of unique identifiers which identify the frames in this SingleTraj object when multiple Trajectories are considered.
If the traj was initialized from a TrajEnsemble class, the traj gets a unique identifier (traj_num) which will also be put into the id array, so that id can have two shapes: (n_frames, ) or (n_frames, 2). This corresponds to self.id.ndim = 1 and self.id.ndim = 2. In the latter case self.id[:,1] are the frames and self.id[:,0] is an array full of traj_num.
- Type:
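The two layouts of id can be written out with plain lists. build_ids is a hypothetical helper and the nested lists stand in for the numpy arrays.

```python
def build_ids(frame_numbers, traj_num=None):
    """Return frame identifiers in the 1D or 2D layout (hypothetical helper)."""
    if traj_num is None:
        # Layout (n_frames,): only the frame numbers.
        return list(frame_numbers)
    # Layout (n_frames, 2): column 0 holds traj_num, column 1 the frame number.
    return [[traj_num, frame] for frame in frame_numbers]


print(build_ids([5]))              # [5]
print(build_ids([2], traj_num=5))  # [[5, 2]]
print(build_ids([0, 1, 5], traj_num=0))
# [[0, 0], [0, 1], [0, 5]]
```

The first two calls mirror the array([5]) and array([[5, 2]]) outputs shown in the examples on this page.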
- property indices_chi1: ndarray#
A numpy array with shape (n_dihedrals, 4) indexing the atoms that take part in this dihedral angle. This index is 0-based.
- Type:
- property indices_chi2: ndarray#
A numpy array with shape (n_dihedrals, 4) indexing the atoms that take part in this dihedral angle. This index is 0-based.
- Type:
- property indices_chi3: ndarray#
A numpy array with shape (n_dihedrals, 4) indexing the atoms that take part in this dihedral angle. This index is 0-based.
- Type:
- property indices_chi4: ndarray#
A numpy array with shape (n_dihedrals, 4) indexing the atoms that take part in this dihedral angle. This index is 0-based.
- Type:
- property indices_chi5: ndarray#
A numpy array with shape (n_dihedrals, 4) indexing the atoms that take part in this dihedral angle. This index is 0-based.
- Type:
- property indices_omega: ndarray#
A numpy array with shape (n_dihedrals, 4) indexing the atoms that take part in this dihedral angle. This index is 0-based.
- Type:
- property indices_phi: ndarray#
A numpy array with shape (n_dihedrals, 4) indexing the atoms that take part in this dihedral angle. This index is 0-based.
- Type:
- property indices_psi: ndarray#
A numpy array with shape (n_dihedrals, 4) indexing the atoms that take part in this dihedral angle. This index is 0-based.
- Type:
- property n_atoms: int#
Number of atoms in traj.
Loads the traj into memory if not in HDF5 file format. Be aware.
- Type:
- property n_frames: int#
Number of frames in traj.
Loads the traj into memory if not in HDF5 file format. Be aware.
- Type:
- property top: Topology#
The structure of a Topology object is similar to that of a PDB file.
It consists of a set of Chains (often but not always corresponding to polymer chains). Each Chain contains a set of Residues, and each Residue contains a set of Atoms. In addition, the Topology stores a list of which atom pairs are bonded to each other. Atom and residue names should follow the PDB 3.0 nomenclature for all molecules for which one exists.
- chains#
Iterate over chains.
- Type:
generator
- residues#
Iterate over residues.
- Type:
generator
- atoms#
Iterate over atoms.
- Type:
generator
- bonds#
Iterate over bonds.
- Type:
generator
- Type:
mdtraj.Topology
- property traj: Trajectory#
This attribute always returns an mdtraj.Trajectory. If backend is ‘no_load’, the trajectory will be loaded into memory and returned.
- Type:
mdtraj.Trajectory