Trajectory classes#

SingleTraj#

class encodermap.trajinfo.info_single.SingleTraj(traj: Union[str, Path, md.Trajectory], top: Optional[str, Path] = None, common_str: str = '', backend: Literal['no_load', 'mdtraj'] = 'no_load', index: Optional[Union[int, list[int], np.ndarray, slice]] = None, traj_num: Optional[int] = None, basename_fn: Optional[Callable] = None)[source]#

Bases: object

This class contains the info about a single trajectory.

This class contains many of the attributes and methods of mdtraj’s Trajectory. It is meant to be used as a single trajectory in an ensemble defined in the TrajEnsemble class. Unlike the standard mdtraj Trajectory, this class loads the MD data only when needed. The location of the file and other attributes like a single integer index (single frame of trajectory) or a list of integers (multiple frames of the same traj) are stored until the traj is accessed via the SingleTraj.traj attribute. The returned traj is an mdtraj Trajectory with the correct number of frames in the correct sequence.
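
The deferred-loading idea can be sketched in plain Python. This is only an illustration under stated assumptions, not EncoderMap's implementation: `LazyFrames` and its `loader` callable are hypothetical stand-ins for SingleTraj and the file reader.

```python
from typing import Callable, Optional, Sequence


class LazyFrames:
    """Hypothetical stand-in for a lazily loaded trajectory."""

    def __init__(self, path: str, loader: Callable[[str], Sequence],
                 index: Optional[slice] = None):
        self.path = path        # only the location is stored at first
        self.index = index      # frame selection, applied on load
        self._loader = loader
        self._data = None       # nothing has been read yet
        self.load_count = 0

    @property
    def traj(self):
        # Read the data on first access, then reuse the cached copy.
        if self._data is None:
            frames = self._loader(self.path)
            self._data = frames if self.index is None else frames[self.index]
            self.load_count += 1
        return self._data


# A fake loader that "reads" 14 frames; a real one would parse a file.
lazy = LazyFrames("1GHC.pdb", loader=lambda path: list(range(14)),
                  index=slice(None, None, 2))
before = lazy.load_count   # nothing loaded so far
frames = lazy.traj         # triggers the load and applies the index
again = lazy.traj          # served from the cache
```

Storing the index next to the path is what allows a slice to be taken without touching the disk.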

Furthermore, this class keeps track of your collective variables (CVs). Oftentimes the raw xyz data of a trajectory is not needed and suitable CVs are selected to represent a protein via internal coordinates (torsions, pairwise distances, etc.). Whether you call them highd or torsions, this class keeps track of everything and returns the values when you need them.

SingleTraj supports fancy indexing, so you can extract one or more frames from a Trajectory as a separate trajectory. For example, to form a trajectory with every other frame, you can slice with traj[::2].

SingleTraj uses the nanometer, degree & picosecond unit system.

backend#

Current state of loading. If backend == ‘no_load’, the xyz data is read from disk when it is first accessed. If backend == ‘mdtraj’, the data is already in RAM.

Type:

str

common_str#

Substring of traj_file. Used to group multiple trajectories together based on common topology files. If traj files protein1_traj1.xtc and protein1_traj2.xtc share the same protein1.pdb, common_str can be set to group them together.

Type:

str

index#

Fancy slices of the trajectory. When file is loaded from disk, the fancy indexes will be applied.

Type:

Union[int, list, np.array, slice]

traj_num#

Integer to identify a SingleTraj class in a TrajEnsemble class.

Type:

int

traj_file#

Trajectory file used to create this class.

Type:

str

top_file#

Topology file used to create this class. If a .h5 trajectory was used, traj_file and top_file are identical. If an mdtraj.Trajectory was used to create SingleTraj, these strings are empty.

Type:

str

Examples

>>> # load a pdb file with 14 frames from rcsb.org
>>> import encodermap as em
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
>>> print(traj)
encodermap.SingleTraj object. Current backend is no_load. Basename is 1GHC. Not containing any CVs.
>>> traj.n_frames
14
>>> # advanced slicing
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")[-1:7:-2]
>>> print([frame.id for frame in traj])
[13, 11, 9]
>>> # Build a trajectory ensemble from multiple trajs
>>> traj1 = em.SingleTraj("https://files.rcsb.org/view/1YUG.pdb")
>>> traj2 = em.SingleTraj("https://files.rcsb.org/view/1YUF.pdb")
>>> trajs = traj1 + traj2
>>> print(trajs.n_trajs, trajs.n_frames, [traj.n_frames for traj in trajs])
2 31 [15, 16]
property CVs: dict[str, numpy.ndarray]#

Returns a simple dict from the more complicated self._CVs xarray Dataset.

If self._CVs is empty and self.traj_file is an HDF5 (.h5) file, the contents of the HDF5 file will be checked for stored CVs. If none are found, an empty dict will be returned.

Type:

dict

property CVs_in_file: bool#

Is True if traj_file has the extension .h5 and contains CVs.

Type:

bool

__add__(y: SingleTraj) TrajEnsemble[source]#

Adding two SingleTraj objects yields a TrajEnsemble, i.e. a trajectory ensemble.

Parameters:

y (encodermap.SingleTraj) – The other traj, that will be added.

Returns:

The new trajs.

Return type:

encodermap.TrajEnsemble

__enter__()[source]#

Enters context manager. Inside context manager, the traj stays loaded.

__eq__(other: SingleTraj) bool[source]#

Two SingleTraj objects are the same when the trajectories are the same, the files are the same and the loaded CVs are the same.

__exit__(type, value, traceback)[source]#

Exits the context manager and deletes unwanted variables.
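
The enter/exit behaviour can be illustrated with a small sketch. It is not the library's code: `LoadedWhileInside` is a hypothetical class, and the assumption that the data is dropped again on exit is made here for illustration.

```python
class LoadedWhileInside:
    """Hypothetical stand-in: data is kept in memory only inside the with block."""

    def __init__(self):
        self.data = None

    def __enter__(self):
        self.data = [0.0, 0.1, 0.2]   # pretend this was read from disk
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.data = None              # free the memory again
        return False                  # do not swallow exceptions


holder = LoadedWhileInside()
with holder as h:
    inside = h.data is not None       # data is available inside the block
outside = holder.data is None         # and gone afterwards
```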

__getattr__(attr)[source]#

Called when an attribute cannot be found in the normal way.

This method allows access to the values of the self.CVs dictionary as instance variables. Furthermore, if an mdtraj attribute is requested, the traj is loaded and the corresponding attribute is returned.
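
Exposing dictionary entries as attributes is a standard use of Python's `__getattr__` hook. The sketch below shows the general pattern only. `WithCVs` is a hypothetical class, and the mdtraj forwarding described above is left out.

```python
class WithCVs:
    """Hypothetical stand-in showing dict entries exposed as attributes."""

    def __init__(self):
        self.CVs = {}

    def __getattr__(self, attr):
        # Python calls this only when normal attribute lookup fails.
        # The guard on "CVs" avoids infinite recursion if CVs is not set yet.
        if attr != "CVs" and attr in self.CVs:
            return self.CVs[attr]
        raise AttributeError(
            f"{type(self).__name__!r} object has no attribute {attr!r}"
        )


obj = WithCVs()
obj.CVs["highd"] = [[0.1, 0.2], [0.3, 0.4]]
value = obj.highd   # resolved through __getattr__
```

Unknown names still raise AttributeError, so `hasattr` and similar checks keep working.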

__getitem__(key)[source]#

This method returns another trajectory as a SingleTraj class.

Parameters:

key (Union[int, list[int], np.ndarray, slice]) – Indexing the trajectory can be done by int (returns a traj with 1 frame), lists of int or np.ndarray (returns a new traj with len(traj) == len(key)), or slice ([::3]), which returns a new traj with the correct number of frames.

Returns:

A SingleTraj object with the selected frame(s) in it.

Return type:

SingleTraj
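
Which frames a given key selects can be worked out with ordinary Python indexing rules. The helper below is a hypothetical illustration (`select_frames` is not part of EncoderMap) that returns the original frame numbers for an int, a slice, or a list of ints.

```python
from typing import List, Sequence, Union

Key = Union[int, Sequence[int], slice]


def select_frames(n_frames: int, key: Key) -> List[int]:
    """Return the original frame numbers that a key would pick."""
    frames = list(range(n_frames))
    if isinstance(key, int):
        return [frames[key]]            # a single frame
    if isinstance(key, slice):
        return frames[key]              # regular Python slicing
    return [frames[i] for i in key]     # list of ints, duplicates allowed


single = select_frames(14, 3)
strided = select_frames(14, slice(-1, 7, -2))
repeated = select_frames(14, [0, 1, 1, 2, 0, 1])
```

The strided case uses the same slice as the `[-1:7:-2]` example in the class docstring.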

__init__(traj: Union[str, Path, md.Trajectory], top: Optional[str, Path] = None, common_str: str = '', backend: Literal['no_load', 'mdtraj'] = 'no_load', index: Optional[Union[int, list[int], np.ndarray, slice]] = None, traj_num: Optional[int] = None, basename_fn: Optional[Callable] = None) None[source]#

Initialize the SingleTraj object with location and reference pdb file.

Parameters:
  • traj (Union[str, mdtraj.Trajectory]) – The trajectory. Can either be the filename of a trajectory file (.xtc, .dcd, .h5, .trr) or an mdtraj.Trajectory.

  • top (Union[str, mdtraj.Topology], optional) – The path to the reference pdb file. Defaults to ‘’. If an mdtraj.Trajectory or a .h5 traj filename is provided this option is not needed.

  • common_str (str, optional) – A string to group trajs of similar topology. If multiple trajs are loaded (TrajEnsemble), this common_str is used to group them together. Defaults to ‘’ and won’t be matched to other trajs. If traj files protein1_traj1.xtc and protein1_traj2.xtc share the same protein1.pdb and protein2_traj.xtc uses protein2.pdb as its topology, this argument can be [‘protein1’, ‘protein2’].

  • backend (Literal['no_load', 'mdtraj'], optional) – Chooses the backend to load trajectories. ‘mdtraj’ uses mdtraj, which loads all trajectories into RAM. ‘no_load’ creates an empty trajectory object. Defaults to ‘no_load’.

  • index (Optional[Union[int, list[int], np.ndarray, slice]]) – An integer or an array giving the indices. If an integer is provided, only the frame at this position will be loaded once the internal mdtraj.Trajectory is accessed. If an array or list is provided, the corresponding frames will be used. These indices can have duplicates: [0, 1, 1, 2, 0, 1]. A slice object can also be provided. Supports fancy slicing like traj[1:50:3]. If None is provided, the trajectory is simply loaded as is. Defaults to None.

  • traj_num (Union[int, None], optional) – If working with multiple trajs this is the easiest unique identifier. If multiple SingleTrajs are instantiated by TrajEnsemble the traj_num is used as unique identifier per traj. Defaults to None.

  • basename_fn (Optional[Callable]) – A function to apply to traj_file to give it a unique identifier. If all your trajs are called traj.xtc and only the directory they’re in gives them a unique identifier, you can provide a function to this argument to split the path. If None is provided, the basename is the filename without path and extension, as in lambda x: x.split('/')[-1].split('.')[0]. Defaults to None.
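
A short sketch of what a basename function does. The paths are made up, and both helper functions are illustrative assumptions rather than code taken from EncoderMap.

```python
def default_basename(path: str) -> str:
    # File name without directory and without extension.
    return path.split("/")[-1].split(".")[0]


def parent_dir_basename(path: str) -> str:
    # Use the enclosing directory when every file is called traj.xtc.
    return path.split("/")[-2]


# Hypothetical file layout: identical file names in different run directories.
paths = ["/data/run_a/traj.xtc", "/data/run_b/traj.xtc"]
defaults = [default_basename(p) for p in paths]
by_dir = [parent_dir_basename(p) for p in paths]
```

With the default rule both files collapse to the same name, which is the situation a custom function is meant to resolve.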

__iter__()[source]#

Iterate over frames in this class. Returns the correct CVs along with the frame of the trajectory.

__reversed__() SingleTraj[source]#

Reverses the frame order of the traj. Same as traj[::-1]

_add_along_traj(y: SingleTraj) TrajEnsemble[source]#

Puts self and y into a TrajEnsemble object.

This way the trajectories are not appended along the time axis but rather along the trajectory axis.

Parameters:

y (SingleTraj) – The other SingleTraj trajectory.

_gen_ensemble() TrajEnsemble[source]#

Creates a TrajEnsemble class with this traj in it.

This method is needed to add two SingleTraj objects along the trajectory axis with the method add_new_traj. This method is also called by the __getitem__ method of the TrajEnsemble class.

_mdtraj_attr = ['n_frames', 'n_atoms', 'n_chains', 'n_residues', 'openmm_boxes', 'openmm_positions', 'time', 'timestep', 'xyz', 'unitcell_vectors', 'unitcell_lengths', 'unitcell_angles', '_check_valid_unitcell', '_distance_unit', '_have_unitcell', '_rmsd_traces', '_savers', '_string_summary_basic', '_time', '_time_default_to_arange', '_topology', '_unitcell_angles', '_unitcell_lengths', '_xyz']#
property _n_frames_base_h5_file: int#

Can be used to get n_frames without loading an HDF5 into memory.

Type:

int

property _original_frame_indices#
_string_summary() str[source]#

Returns a summary about the current instance.

Number of frames, index, loaded CVs.

property _traj#

Needs to be here to complete setter. Not returning anything, because setter is also not returning anything.

_validate_uri(uri: str) bool[source]#

Checks whether uri is a valid uri.

atom_slice(atom_indices: ndarray, inplace: bool = False) Union[None, SingleTraj][source]#

Create a new trajectory from a subset of atoms.

Parameters:
  • atom_indices (Union[list, np.array]) – The indices of the atoms to keep.

  • inplace (bool, optional) – Whether to overwrite the current instance, or return a new instance. Defaults to False.

property basename: str#

Basename is the filename without path and without extension. If basename_fn is not None, it will be applied to traj_file.

Type:

str

property extension: str#

Extension is the file extension of the trajectory file (self.traj_file).

Type:

str

classmethod from_pdb_id(pdb_id: str) SingleTraj[source]#

Alternate constructor for the SingleTraj class.

Builds a SingleTraj class from a pdb-id.

Parameters:

pdb_id (str) – The 4-letter pdb id.

Returns:

A SingleTraj class.

Return type:

SingleTraj

get_single_frame(key: int) SingleTraj[source]#

Returns a single frame from the trajectory.

Parameters:

key (Union[int, np.int]) – Index of the frame.

Examples

>>> # Load traj from pdb
>>> import encodermap as em
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
>>> traj.n_frames
14
>>> # Load the same traj and give it a number for recognition in a set of multiple trajs
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb", traj_num=5)
>>> frame = traj.get_single_frame(2)
>>> frame.id
array([[5, 2]])
property id: ndarray#

id is an array of unique identifiers which identify the frames in this SingleTraj object when multiple Trajectories are considered.

If the traj was initialized from a TrajEnsemble class, the traj gets a unique identifier (traj_num) which will also be put into the id array, so that id can have two shapes: (n_frames, ) or (n_frames, 2). This corresponds to self.id.ndim == 1 and self.id.ndim == 2. In the latter case self.id[:,1] are the frames and self.id[:,0] is an array full of traj_num.

Type:

np.ndarray

join(other: SingleTraj) Trajectory[source]#

Join two trajectories together along the time/frame axis.

Returns a mdtraj.Trajectory and thus loses CVs, filenames, etc.

load_CV(data: SingleTrajFeatureType, attr_name: Optional[str] = None, cols: Optional[list[int]] = None, labels: Optional[list[str]] = None, override: bool = False) None[source]#

Load CVs into traj. Many options are possible. Provide xarray, numpy array, em.loading.feature, em.featurizer, and even string!

This method loads CVs into the SingleTraj class. Many ways of doing so are available:
  • np.ndarray: The easiest way. Provide a np array and a name for the array and the data will be saved as an instance variable, accessible via instance.name.

  • xarray.DataArray: You can load a multidimensional xarray as data into the class. Please refer to xarray’s own documentation if you want to create one yourself.

  • xarray.Dataset: You can add another dataset to the existing _CVs.

  • em.loading.feature: If you provide one of the features from em.loading.features, the resulting features will be loaded and also placed under the provided name.

  • em.Featurizer: If you provide a full featurizer, the data will be generated and put as an instance variable under the provided name.

  • str: If a string is provided, the data will be loaded from a .txt, .npy, or NetCDF / HDF5 .nc file.

Parameters:
  • data (Union[str, np.ndarray, xr.DataArray, em.loading.feature, em.Featurizer]) – The CV to load. Either as numpy array, xarray DataArray, encodermap or pyemma feature, or full encodermap Featurizer.

  • attr_name (Union[None, str], optional) – The name under which the CV should be found in the class. Is needed, if a raw numpy array is passed, otherwise the name will be generated from the filename (if data == str), the DataArray.name (if data == xarray.DataArray), or the feature name.

  • cols (Union[list, None], optional) – A list specifying the columns to use for the highD data. If your highD data contains (x,y,z,…)-errors or has an enumeration column at col=0 this can be used to remove this unwanted data.

  • labels (Union[list, str, None], optional) – If you want to label the data you provided, pass a list of str. If set to None, the features in this dimension will be labelled as [f"{attr_name.upper()} FEATURE {i}" for i in range(self.n_frames)]. If a str is provided, the features will be labelled as [f"{attr_name.upper()} {label.upper()} {i}" for i in range(self.n_frames)]. If a list of str is provided, it needs to have the same length as the traj has frames. Defaults to None.

  • override (bool) – Whether to overwrite existing CVs. The method will also print a message which CVs have been overwritten.
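
The labelling rule for the labels argument can be sketched as a small function. `make_labels` is a hypothetical helper, and the number of labels `n` is passed in explicitly as an assumption instead of being read from a trajectory.

```python
from typing import List, Sequence, Union


def make_labels(attr_name: str, n: int,
                labels: Union[None, str, Sequence[str]] = None) -> List[str]:
    """Build n label strings following the documented naming pattern."""
    if labels is None:
        return [f"{attr_name.upper()} FEATURE {i}" for i in range(n)]
    if isinstance(labels, str):
        return [f"{attr_name.upper()} {labels.upper()} {i}" for i in range(n)]
    if len(labels) != n:
        raise ValueError(f"Expected {n} labels, got {len(labels)}.")
    return list(labels)   # user-supplied labels are used unchanged


generic = make_labels("highd", 2)
named = make_labels("highd", 2, "dist")
```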

Examples

>>> # Load the backbone torsions from a time-resolved NMR ensemble from the pdb
>>> import encodermap as em
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
>>> central_dihedrals = em.loading.features.CentralDihedrals(traj.top)
>>> traj.load_CV(central_dihedrals)
>>> traj.central_dihedrals.shape
(1, 14, 222)
>>> # The values are stored in an xarray Dataset to track every possible datafield
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
>>> traj.load_CV(em.loading.features.CentralDihedrals(traj.top))
>>> print(traj._CVs['central_dihedrals']['CENTRALDIHEDRALS'].values[:2])
['CENTERDIH PSI   RESID  MET:   1 CHAIN 0'
 'CENTERDIH OMEGA RESID  MET:   1 CHAIN 0']
Raises:
  • FileNotFoundError – When the file given by data does not exist.

  • IOError – When the provided filename does not have .txt, .npy or .nc extension.

  • TypeError – When data does not match the specified input types.

  • Exception – When a numpy array has been passed as data and no attr_name has been provided.

  • BadError – When the provided attr_name is str, but can not be a python identifier.

load_traj(new_backend: Literal['no_load', 'mdtraj'] = 'mdtraj') None[source]#

Loads the trajectory, with a new specified backend.

After this is called the instance variable self.trajectory will contain an mdtraj Trajectory object.

Parameters:

new_backend (str, optional) – Can either be ‘mdtraj’ to load the trajectory using mdtraj, or ‘no_load’ to not load the traj (unload). Defaults to ‘mdtraj’.

property n_atoms: int#

Number of atoms in traj.

Loads the traj into memory if not in HDF5 file format. Be aware.

Type:

int

property n_chains: int#

Number of chains in traj.

Type:

int

property n_frames: int#

Number of frames in traj.

Loads the traj into memory if not in HDF5 file format. Be aware.

Type:

int

property n_residues: int#

Number of residues in traj.

Type:

int

save(fname: str, CVs: Union[str, list[str]] = 'all', overwrite: bool = False) None[source]#

Save the trajectory in HDF5 file format to disk.

Parameters:
  • fname (str) – The filename.

  • CVs (Union[List, 'all'], optional) – Either provide a list of strings of the CVs you would like to save to disk, or set to ‘all’ to save all CVs. Defaults to ‘all’.

  • overwrite (bool, optional) – Whether to force overwrite an existing file. Defaults to False.

Raises:

IOError – When the file already exists and overwrite is False.
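
The overwrite guard follows a common pattern, sketched here with the standard library only. `guarded_write` is a hypothetical helper that writes raw bytes, not a real HDF5 file.

```python
import os
import tempfile


def guarded_write(fname: str, payload: bytes, overwrite: bool = False) -> None:
    """Write payload to fname, refusing to clobber an existing file."""
    if os.path.exists(fname) and not overwrite:
        raise IOError(f"{fname} already exists. Set overwrite=True to replace it.")
    with open(fname, "wb") as fh:
        fh.write(payload)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "traj.h5")
    guarded_write(target, b"first")
    try:
        guarded_write(target, b"second")      # file exists, overwrite is False
        refused = False
    except IOError:
        refused = True
    guarded_write(target, b"third", overwrite=True)
    with open(target, "rb") as fh:
        final = fh.read()
```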

save_CV_as_numpy(attr_name: str, fname: Optional[str] = None, overwrite: bool = False) None[source]#

Saves the highD data of this traj.

This got its own method for parallelization purposes.

Parameters:
  • attr_name (str) – Name of the CV to save.

  • fname (str, optional) – Can be either

  • overwrite (bool, optional) – Whether to overwrite the file. Defaults to False.

Raises:

IOError – When the file already exists and overwrite is set to False.

select(sel_str: str = 'all') ndarray[source]#

Execute a selection against the topology

Parameters:

sel_str (str, optional) – What to select. Defaults to ‘all’.

Examples

>>> import encodermap as em
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
>>> select = traj.top.select("name CA and resSeq 1")
>>> select
array([1])
>>> traj = em.SingleTraj("https://files.rcsb.org/view/1GHC.pdb")
>>> select = traj.top.select("name CA and resSeq 1")
>>> traj.top.atom(select[0])
MET1-CA
show_traj(gui: bool = True) nglview.view[source]#

Returns an nglview view object.

Returns:

The nglview widget object.

Return type:

view (nglview.widget)

stack(other: SingleTraj) Trajectory[source]#

Stack two trajectories along the atom axis

Returns a mdtraj.Trajectory and thus loses CVs, filenames, etc.

superpose(reference: Union[Trajectory, SingleTraj], frame: int = 0, atom_indices: Optional[ndarray] = None, ref_atom_indices: Optional[ndarray] = None, parallel: bool = True) SingleTraj[source]#

Superpose each conformation in this trajectory upon a reference

Parameters:
  • reference (Union[mdtraj.Trajectory, SingleTraj]) – The reference frame to align to.

  • frame (int, optional) – Align to this frame in reference. Defaults to 0.

  • atom_indices (Union[np.array, None], optional) – Indices in self, used to calculate RMS values. Defaults to None, which means all atoms will be used.

  • ref_atom_indices (Union[np.array, None], optional) – Indices in reference, used to calculate RMS values. Defaults to None, which means all atoms will be used.

  • parallel (bool, optional) – Use OpenMP to run the superposition in parallel over multiple cores.

Returns:

A new aligned trajectory.

Return type:

SingleTraj

property top: Topology#

The structure of a Topology object is similar to that of a PDB file.

It consists of a set of Chains (often but not always corresponding to polymer chains). Each Chain contains a set of Residues, and each Residue contains a set of Atoms. In addition, the Topology stores a list of which atom pairs are bonded to each other. Atom and residue names should follow the PDB 3.0 nomenclature for all molecules for which one exists.

chains#

Iterate over chains.

Type:

generator

residues#

Iterate over residues.

Type:

generator

atoms#

Iterate over atoms.

Type:

generator

bonds#

Iterate over bonds.

Type:

generator

Type:

mdtraj.Topology

property top_file: str#

The topology file as a string (rather than a pathlib.Path).

Type:

str

property traj: Trajectory#

This attribute always returns an mdtraj.Trajectory. If backend is ‘no_load’, the trajectory will be loaded into memory and returned.

Type:

mdtraj.Trajectory

property traj_file: str#

The traj file as a string (rather than a pathlib.Path).

Type:

str

unload(CVs: bool = False) None[source]#

Frees RAM by deleting the loaded trajectory data and, if requested, the CV data.

If CVs is set to True the loaded CVs will also be deleted.

Parameters:

CVs (bool, optional) – Whether to also delete CVs, defaults to False.

TrajEnsemble#

class encodermap.trajinfo.info_all.TrajEnsemble(trajs: Union[list[str], list[md.Trajectory], list[SingleTraj], list[Path]], tops: Optional[list[str]] = None, backend: Literal['mdtraj', 'no_load'] = 'no_load', common_str: Optional[list[str]] = None, basename_fn: Optional[Callable] = None)[source]#

Bases: object

This class contains the info about many trajectories. Topologies can be mismatching.

This class is a fancy list of encodermap.trajinfo.SingleTraj objects. Trajectories can have different topologies and will be grouped by the common_str argument.

TrajEnsemble supports fancy indexing. You can slice to your liking: trajs[::5] returns a TrajEnsemble object that only considers every fifth frame. Besides indexing by slices and integers, you can pass a 2-dimensional np.array. np.array([[0, 5], [1, 10], [5, 20]]) will return a TrajEnsemble object with frame 5 of trajectory 0, frame 10 of trajectory 1 and frame 20 of trajectory 5. Simply passing an integer as index returns the corresponding SingleTraj object.

The TrajEnsemble class also contains an iterator to iterate over trajectories. You could do:

>>> for traj in trajs:
...     for frame in traj:
...         print(frame)

CVs#

The collective variables of the SingleTraj classes. Only CVs with matching names in all SingleTraj classes are returned. The data is stacked along a hypothetical time axis along the trajs.

Type:

dict

_CVs#

The same data as in CVs but with labels. Additionally, the xarray is not stacked along the time axis. It contains an extra dimension for trajectories.

Type:

xarray.Dataset

n_trajs#

Number of individual trajectories in this class.

Type:

int

n_frames#

Number of frames, sum over all trajectories.

Type:

int

locations#

A list with the locations of the trajectories.

Type:

list of str

top#

A list with the reference pdb for each trajectory.

Type:

list of mdtraj.Topology

basenames#

A list with the names of the trajectories. The leading path and the file extension are omitted.

Type:

list of str

name_arr#

An array with len(name_arr) == n_frames. This array keeps track of each frame in this object by identifying each frame with a filename. This can be useful when frames are mixed inside a TrajEnsemble class.

Type:

np.ndarray of str

index_arr#

index_arr.shape = (n_frames, 2). This array keeps track of each frame with two ints. One giving the number of the trajectory, the other the frame.

Type:

np.ndarray of int

Examples

>>> # Create a trajectory ensemble from a list of files
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble(['https://files.rcsb.org/view/1YUG.pdb', 'https://files.rcsb.org/view/1YUF.pdb'])
>>> # trajs are internally numbered
>>> print([traj.traj_num for traj in trajs])
[0, 1]
>>> # Build a new traj from random frames
>>> # Let's say frame 2 of traj 0, frame 5 of traj 1 and again frame 2 of traj 0
>>> # Doing this, every frame will now be its own trajectory for easier bookkeeping
>>> arr = np.array([[0, 2], [1, 5], [0, 2]])
>>> new_trajs = trajs[arr]
>>> print(new_trajs.n_trajs)
3
>>> # trace back a single frame
>>> frame_num = 28
>>> index = trajs.index_arr[frame_num]
>>> print('Frame {}, originates from trajectory {}, frame {}.'.format(frame_num, trajs.basenames[index[0]], index[1]))
Frame 28, originates from trajectory 1YUF, frame 13.
property CVs: dict[str, numpy.ndarray]#

Returns dict of CVs in SingleTraj classes. Only CVs with the same names in all SingleTraj classes are loaded.

Type:

dict

property CVs_in_file: bool#

Is true, if CVs can be loaded from file. Can be used to build a data generator from.

Type:

bool

property _CVs: Dataset#

Returns an xarray Dataset of matching CVs, stacked along the trajectory axis.

Type:

xarray.Dataset

__add__(y)[source]#

Addition of two TrajEnsemble objects returns new TrajEnsemble with trajectories joined along the traj axis.

__init__(trajs: Union[list[str], list[md.Trajectory], list[SingleTraj], list[Path]], tops: Optional[list[str]] = None, backend: Literal['mdtraj', 'no_load'] = 'no_load', common_str: Optional[list[str]] = None, basename_fn: Optional[Callable] = None) None[source]#

Initialize the TrajEnsemble class with two lists of files.

Parameters:
  • trajs (Union[list[str], list[md.Trajectory], list[SingleTraj], list[Path]]) – List of strings with paths to trajectories.

  • tops (Optional[list[str]]) – List of strings with paths to reference pdbs.

  • backend (str, optional) – Chooses the backend to load trajectories. ‘mdtraj’ uses mdtraj, which loads all trajectories into RAM. ‘no_load’ creates an empty trajectory object. Defaults to ‘no_load’.

  • common_str (list of str, optional) – If you want to include trajectories with different topology. The common string is used to pair traj-files (.xtc, .dcd, .lammpstrj) with their topology (.pdb, .gro, …). The common-string should be a substring of matching trajs and topologies.

  • basename_fn (Union[None, function], optional) – A function to apply to the traj_file string to return the basename of the trajectory. If None is provided, the filename without extension will be used. When all files are named the same and the folder they’re in defines the name of the trajectory, you can supply lambda x: x.split('/')[-2] as this argument. Defaults to None.

Raises:

TypeError – If some of your inputs are mismatched. If your input lists contain other types than str or mdtraj.Trajectory.
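
How a common string can pair trajectory files with their topology is sketched below. `pair_by_common_str` is a hypothetical helper written for illustration; the matching logic inside EncoderMap may differ.

```python
from typing import Dict, List, Tuple


def pair_by_common_str(trajs: List[str], tops: List[str],
                       common_str: List[str]) -> Dict[str, Tuple[str, List[str]]]:
    """Map each common string to (topology file, matching trajectory files)."""
    groups = {}
    for cs in common_str:
        matching_tops = [t for t in tops if cs in t]
        if len(matching_tops) != 1:
            raise ValueError(f"Expected exactly one topology containing {cs!r}.")
        groups[cs] = (matching_tops[0], [t for t in trajs if cs in t])
    return groups


groups = pair_by_common_str(
    trajs=["protein1_traj1.xtc", "protein1_traj2.xtc", "protein2_traj.xtc"],
    tops=["protein1.pdb", "protein2.pdb"],
    common_str=["protein1", "protein2"],
)
```

Substring matching only works if each common string is specific enough to hit exactly one topology, which is why the sketch raises otherwise.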

_pyemma_indexing(key: ndarray) TrajEnsemble[source]#

Returns a new TrajEnsemble by giving the indices of traj and frame

_return_trajs_by_index(index: list[int]) TrajEnsemble[source]#

Creates a TrajEnsemble object with the trajs specified by index.

_string_summary() str[source]#
property basenames: list[str]#

List of the basenames of the SingleTraj classes.

Type:

list

property frames: list[int]#

Frames of individual trajectories.

Type:

list

classmethod from_textfile(fname, basename_fn=None) TrajEnsemble[source]#

Creates a TrajEnsemble object from a textfile.

The textfile needs to be space-separated with two or three columns.
  • Column 1: The trajectory file.

  • Column 2: The corresponding topology file (if you are using .h5 trajs, column 1 and 2 will be identical).

  • Column 3: The common string of the trajectory. This column can be left out, which will result in a TrajEnsemble without common_strings.
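
The column layout can be illustrated with a small parser. This is a sketch of the file format only: `parse_traj_list` is hypothetical and the file names are invented.

```python
from typing import List, Optional, Tuple


def parse_traj_list(text: str) -> List[Tuple[str, str, Optional[str]]]:
    """Parse lines of 'traj top [common_str]' into tuples."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        if len(fields) not in (2, 3):
            raise ValueError(f"Expected 2 or 3 columns, got {len(fields)}: {line!r}")
        traj, top = fields[0], fields[1]
        common = fields[2] if len(fields) == 3 else None
        rows.append((traj, top, common))
    return rows


example = """protein1_traj1.xtc protein1.pdb protein1
protein2_traj.xtc protein2.pdb protein2
run.h5 run.h5
"""
rows = parse_traj_list(example)
```

The last row shows the .h5 case, where trajectory and topology are the same file.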

Parameters:
  • fname (str) – File to be read.

  • basename_fn (Union[None, function], optional) – A function to apply to the traj_file string to return the basename of the trajectory. If None is provided, the filename without extension will be used. When all files are named the same and the folder they’re in defines the name of the trajectory, you can supply lambda x: x.split('/')[-2] as this argument. Defaults to None.

Returns:

An instantiated TrajEnsemble class.

Return type:

TrajEnsemble

classmethod from_xarray(fnames, basename_fn=None) TrajEnsemble[source]#
get_single_frame(key: int) SingleTraj[source]#

Returns a single frame from all loaded trajectories.

Consider a TrajEnsemble class with two SingleTraj classes. One has 10 frames, the other 5 (trajs.n_frames is 15). With zero-based counting, calling trajs.get_single_frame(12) is equal to calling trajs[1][2].

Parameters:

key (int) – The frame to return.

Returns:

The frame.

Return type:

encodermap.SingleTraj
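
Translating a global frame number into a trajectory and a local frame is a cumulative-sum lookup. The helper below is a hypothetical sketch that assumes zero-based counting throughout.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def locate_frame(frames_per_traj: List[int], key: int) -> Tuple[int, int]:
    """Translate a global frame number into (trajectory index, local frame)."""
    ends = list(accumulate(frames_per_traj))     # cumulative frame counts
    if not 0 <= key < ends[-1]:
        raise IndexError(f"Frame {key} is out of range for {ends[-1]} frames.")
    traj_idx = bisect_right(ends, key)           # first traj whose end exceeds key
    start = ends[traj_idx - 1] if traj_idx else 0
    return traj_idx, key - start


location = locate_frame([10, 5], 12)   # two trajs with 10 and 5 frames
```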

property id: ndarray#

Duplication of self.index_arr

Type:

np.ndarray

property index_arr: ndarray#

Returns an np.ndarray with ndim = 2 that assigns every loaded frame an identifier of traj_num (self.index_arr[:,0]) and frame_num (self.index_arr[:,1]). Can be used to create an arbitrary subset of frames and can be useful when used with clustering.

Type:

np.ndarray
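
The layout of such an index can be mimicked with plain tuples. This sketch uses a list instead of an np.ndarray, and `build_index` is a hypothetical helper; the frame counts are those of the 1YUG/1YUF example in the class docstring.

```python
from typing import List, Tuple


def build_index(frames_per_traj: List[int]) -> List[Tuple[int, int]]:
    """One (traj_num, frame_num) pair per frame, in ensemble order."""
    return [
        (traj_num, frame)
        for traj_num, n_frames in enumerate(frames_per_traj)
        for frame in range(n_frames)
    ]


index = build_index([15, 16])   # 15 + 16 = 31 frames in total
traced = index[28]              # trace a global frame back to its origin
subset = index[::10]            # every tenth frame of the whole ensemble
```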

iterframes() Iterator[tuple[int, SingleTraj]][source]#

Generator over the frames in this class.

Yields:

tuple

A tuple containing the following:

  • int: A loop-counter integer.

  • encodermap.SingleTraj: A SingleTraj object.

Examples

>>> import encodermap as em
>>> trajs = em.TrajEnsemble(['https://files.rcsb.org/view/1YUG.pdb', 'https://files.rcsb.org/view/1YUF.pdb'])
>>> for i, frame in trajs.iterframes():
...     print(frame.basename)
...     print(frame.n_frames)
...     break
1YUG
1
itertrajs() Iterator[tuple[int, SingleTraj]][source]#

Generator over the SingleTraj classes.

Yields:

tuple

A tuple containing the following:

  • int: A loop-counter integer. Is identical to traj.traj_num.

  • encodermap.SingleTraj: A SingleTraj object.

Examples

>>> import encodermap as em
>>> trajs = em.TrajEnsemble(['https://files.rcsb.org/view/1YUG.pdb', 'https://files.rcsb.org/view/1YUF.pdb'])
>>> for i, traj in trajs.itertrajs():
...     print(traj.basename)
1YUG
1YUF
load_CVs(data: TrajEnsembleFeatureType, attr_name: Optional[str] = None, cols: Optional[list[int]] = None, labels: Optional[list[str]] = None, directory: Optional[Union[str, Path]] = None, ensemble: bool = False) None[source]#

Loads CVs in various ways. Easiest way is to provide a single numpy array and a name for that array.

Besides np.ndarrays, files (.txt and .npy) can be loaded. Features or Featurizers can be provided. An xarray.Dataset can be provided. A str can be provided that is either the name of one of encodermap’s features (encodermap.loading.features) or ‘all’, which loads all features required for encodermap’s AngleDihedralCartesianEncoderMap class.

Parameters:
  • data (Union[str, list, np.ndarray, 'all', xr.Dataset]) – The CV to load. When a numpy array is provided, it needs to have a shape matching n_frames. The data is distributed to the trajs. When a list of files is provided, len(data) needs to match n_trajs. The first file will be loaded by the first traj and so on. If a list of np.arrays is provided, the first array will be assigned to the first traj. If None is provided, the arg directory will be used to construct fname = directory + traj.basename + ‘_’ + attr_name. These files will then be loaded and put into the trajs. Defaults to None.

  • attr_name (Optional[str]) – The name under which the CV should be found in the class. Choose whatever you like. highd, lowd, dists, etc…

  • cols (Optional[list[int]]) – A list of integers indexing the columns of the data to be loaded. This is useful if a file contains feature1, feature2, …, feature1_err, feature2_err formatted data. This option will only be used when loading multiple .txt files. If None is provided, all columns will be loaded. Defaults to None.

  • labels (list) – A list containing the labels for the dimensions of the data. Defaults to None.

  • directory (Optional[str]) – The directory to save the data at, if data is an instance of em.Featurizer and this featurizer has in_memory set to False. Defaults to ‘’.

  • ensemble (bool) – Whether the trajs in this class belong to an ensemble. This implies that they contain either the same topology or are very similar (think wt, and mutant). Setting this option True will try to match the CVs of the trajs onto a same dataset. If a VAL residue has been replaced by LYS in the mutant, the number of sidechain dihedrals will increase. The CVs of the trajs with VAL will thus contain some NaN values. Defaults to False.
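
The NaN padding described for ensemble=True can be pictured as aligning every trajectory on the union of feature labels. The sketch below is an assumption-laden illustration: `align_features`, the label strings and the values are all invented, and the real matching may work differently.

```python
import math
from typing import Dict, List


def align_features(per_traj: List[Dict[str, float]]) -> Dict[str, List[float]]:
    """Put every trajectory on the union of feature labels, padding with NaN."""
    labels: List[str] = []
    for features in per_traj:
        for label in features:
            if label not in labels:
                labels.append(label)      # keep first-seen order
    return {
        label: [features.get(label, math.nan) for features in per_traj]
        for label in labels
    }


# Hypothetical sidechain dihedrals of a wild type (VAL) and a mutant (LYS).
wild_type = {"RES5 CHI1": 1.1}
mutant = {"RES5 CHI1": 0.9, "RES5 CHI2": -1.0}
aligned = align_features([wild_type, mutant])
```

The wild type has no second sidechain dihedral at this position, so its entry for that label is NaN.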

Raises:

TypeError – When wrong Type has been provided for data.

load_trajs() None[source]#

Loads all trajs in self.

property locations: list[str]#

Duplication of self.traj_files but using the trajs own traj_file attribute. Ensures that traj files are always returned independent from current load state.

Type:

list

property n_frames: int#

Sum of the loaded frames.

Type:

int

property n_residues: int#

List of number of residues of the SingleTraj classes

Type:

list

property n_trajs: int#

Number of trajectories in this ensemble.

Type:

int

property name_arr: ndarray#

Trajectory names with the same length as self.n_frames.

Type:

np.ndarray

save()[source]#
save_CVs(path: Union[str, Path]) None[source]#

Saves the CVs to a NETCDF file using xarray.

split_into_frames(inplace: bool = False) None[source]#

Splits self into separate frames.

Parameters:

inplace (bool, optional) – Whether to do the split inplace or not. Defaults to False and thus returns a new TrajEnsemble class.

subsample(stride: int, inplace: bool = False) Union[None, TrajEnsemble][source]#

Returns a subset of this TrajEnsemble class given the provided stride.

This is faster than using trajs[trajs.index_arr[::1000]] when HDF5 trajs are used, because the slicing information is saved in the respective SingleTraj classes and loading of single frames is faster in HDF5 formatted trajs.

Note

The result from subsample() is different from trajs[trajs.index_arr[::1000]]. With subsample, every trajectory is subsampled independently. Consider a TrajEnsemble with two SingleTraj trajectories with 18 frames each. subsampled = trajs.subsample(5) would return a TrajEnsemble with two trajs with 4 frames each (frames 0, 5, 10 and 15 of each trajectory; subsampled.n_frames is 8). Whereas subsampled = trajs[trajs.index_arr[::5]] would return a TrajEnsemble with 8 SingleTrajs with 1 frame each (frames 0, 5, 10 and 15 of the first trajectory, but frames 2, 7, 12 and 17 of the second, because the stride runs over the concatenated frames). Because the times and frame numbers are saved all the time, this should not be too much of a problem.

property top: list[mdtraj.core.topology.Topology]#

Returns a minimal set of mdtraj.Topologies.

If all trajectories share the same topology a list with len 1 will be returned.

Type:

list

property top_files: list[str]#

Returns minimal set of topology files.

If you want a list of top files with the same length as self.trajs, use self._top_files and self._traj_files.

Type:

list

property traj_files: list[str]#

A list of the traj_files of the individual SingleTraj classes.

Type:

list

property traj_joined: Trajectory#

Returns a mdtraj Trajectory with every frame of this class appended along the time axis.

Can also work if different topologies (with the same number of atoms) are loaded. In that case, the first frame in self will be used as topology parent and the remaining frames’ xyz coordinates are used to position the parents’ atoms accordingly.

Examples

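>>> import encodermap as em
>>> trajs = em.TrajEnsemble(['https://files.rcsb.org/view/1YUG.pdb', 'https://files.rcsb.org/view/1YUF.pdb'])
>>> single_mdtraj = trajs.split_into_frames().traj_joined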
>>> print(single_mdtraj)
<mdtraj.Trajectory with 31 frames, 720 atoms, 50 residues, without unitcells>
Type:

mdtraj.Trajectory

property traj_nums: list[int]#

The traj_num identifiers of the SingleTraj classes in self.

Type:

list

unload() None[source]#

Unloads all trajs in self.

property xyz: ndarray#

xyz coordinates of all atoms stacked along the traj-time axis. Only works if all trajs share the same topology.

Type:

np.ndarray