TrajEnsemble#
- class TrajEnsemble(trajs, tops=None, backend='no_load', common_str=None, basename_fn=None, traj_nums=None, custom_top=None)[source]#
A fancy list of single trajectories. Topologies can be different across trajs.
Check out http://statisticalbiophysicsblog.org/?p=92 for why trajectory ensembles are awesome.
This class is a fancy list of encodermap.trajinfo.info_single.SingleTraj objects. Trajectories can have different topologies and will be grouped by the common_str argument. Each trajectory has its own unique traj_num, which identifies it in the ensemble, even when the ensemble is sliced or subsampled.
Examples
>>> import encodermap as em
>>> traj1 = em.SingleTraj.from_pdb_id("1YUG")
>>> traj2 = em.SingleTraj.from_pdb_id("1YUF")
Addition of two encodermap.trajinfo.info_single.SingleTraj objects also creates an ensemble.
>>> trajs = traj1 + traj2
>>> trajs
<encodermap.TrajEnsemble object. Current backend is no_load. Containing 2 trajectories. Common str is ['1YUG', '1YUF']. Not containing any CVs...>
Indexing a TrajEnsemble returns an encodermap.trajinfo.info_single.SingleTraj based on its 0-based index. Think of the TrajEnsemble as a list of encodermap.trajinfo.info_single.SingleTraj objects. But trajectories can also have traj_nums, which do not have to adhere to [0, 1, 2, ...]. This is similar to how a pandas.DataFrame offers indexing via .loc[] and .iloc[] (https://pandas.pydata.org/docs/user_guide/indexing.html#different-choices-for-indexing). For indexing trajs based on their traj_num, you can use the .tsel[] accessor of the TrajEnsemble.
Examples
>>> import encodermap as em
>>> traj1 = em.SingleTraj.from_pdb_id("1YUG")
>>> traj2 = em.SingleTraj.from_pdb_id("1YUF")
Addition of two SingleTraj objects also creates an ensemble.
>>> trajs = traj1 + traj2
>>> trajs.traj_nums
[0, 1]
Change the traj_num of traj2.
>>> trajs[1].traj_num = 4
>>> trajs.traj_nums
[0, 4]
>>> trajs[1]
<encodermap.SingleTraj object. Currently not in memory. Basename is '1YUF'. Not containing any CVs. Common string is '1YUF'. Object at ...>
>>> trajs.tsel[4]
<encodermap.SingleTraj object. Currently not in memory. Basename is '1YUF'. Not containing any CVs. Common string is '1YUF'. Object at ...>
TrajEnsemble supports fancy indexing. You can slice to your liking (trajs[::5] returns a TrajEnsemble object that only considers every fifth frame). Besides indexing by slices and integers, you can pass a 2-dimensional numpy.ndarray. np.array([[0, 5], [1, 10], [5, 20]]) will return a TrajEnsemble object with frame 5 of trajectory 0, frame 10 of trajectory 1 and frame 20 of trajectory 5.
Examples
>>> import encodermap as em
>>> traj1 = em.SingleTraj.from_pdb_id("1YUG")
>>> traj2 = em.SingleTraj.from_pdb_id("1YUF")
>>> trajs = traj1 + traj2
>>> sel = trajs[[[0, 0], [0, 1], [0, 2], [1, 10]]]
>>> sel
<encodermap.TrajEnsemble object. Current backend is no_load. Containing 4 frames and 2 trajectories. Common str is...>
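The 2-dimensional index can be read as a list of [trajectory, frame] pairs. The following is a minimal pure-Python sketch of that reading and not EncoderMap's implementation; group_frames_by_traj is a hypothetical helper introduced only for illustration.

```python
from collections import defaultdict


def group_frames_by_traj(index_pairs):
    """Group [traj, frame] pairs into {traj: [frames]} (hypothetical helper)."""
    grouped = defaultdict(list)
    for traj, frame in index_pairs:
        grouped[traj].append(frame)
    return dict(grouped)


# The same index that is used in the example above.
pairs = [[0, 0], [0, 1], [0, 2], [1, 10]]
grouped = group_frames_by_traj(pairs)
print(grouped)  # {0: [0, 1, 2], 1: [10]}
```

Four pairs select four frames, spread over two trajectories, which matches the "4 frames and 2 trajectories" reported in the example.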
The TrajEnsemble class is also an iterator over its trajectories. Besides plain iteration, the TrajEnsemble also offers alternate iterators. The itertrajs() iterator returns a two-tuple of traj_num and traj. The iterframes() iterator returns a three-tuple of traj_num, frame_num, and traj.
Examples
>>> import encodermap as em
>>> traj1 = em.SingleTraj.from_pdb_id("1YUG")
>>> traj2 = em.SingleTraj.from_pdb_id("1YUF")
>>> trajs = traj1 + traj2
>>> trajs[1].traj_num = 4
>>> for traj_num, traj in trajs.itertrajs():
...     print(traj_num, traj.n_frames)
0 15
4 16
>>> for traj_num, frame_num, traj in trajs.subsample(10).iterframes():
...     print(traj_num, frame_num, traj.n_frames)
0 0 1
0 10 1
4 0 1
4 10 1
The TrajEnsemble has multiple alternative constructors. The with_overwrite_trajnums constructor fixes inhomogeneous sequences of encodermap.trajinfo.info_single.SingleTraj and TrajEnsemble.
Examples
>>> import encodermap as em
>>> traj1 = em.SingleTraj.from_pdb_id("1YUG", traj_num=0)
>>> traj2 = em.SingleTraj.from_pdb_id("1YUF", traj_num=0)
>>> trajs = em.TrajEnsemble([traj1, traj2])
Traceback (most recent call last):
...
Exception: The `traj_num` attributes of the provided 2 `SingleTraj`s is not unique, the `traj_num` 0 occurs 2 times. This can happen, if you use `SingleTraj`s, that are already part of a `TrajEnsemble`. To create copies of the `SingleTraj`s and overwrite their `traj_num`s, use the `with_overwrite_trajnums()` constructor.
>>> trajs = em.TrajEnsemble.with_overwrite_trajnums(traj1, traj2)
>>> trajs
<encodermap.TrajEnsemble...>
The from_dataset constructor can be used to load an ensemble from an .h5 file.
Examples
>>> import encodermap as em
>>> from tempfile import TemporaryDirectory
>>> traj1 = em.SingleTraj.from_pdb_id("1YUG")
>>> traj2 = em.SingleTraj.from_pdb_id("1YUF")
>>> trajs = em.TrajEnsemble([traj1, traj2])
>>> with TemporaryDirectory() as td:
...     trajs.save(td + "/trajs.h5")
...     new = em.TrajEnsemble.from_dataset(td + "/trajs.h5")
...     print(new)
encodermap.TrajEnsemble object. Current backend is no_load. Containing 2 trajectories. Common str is...Not containing any CVs.
- CVs#
The collective variables of the SingleTraj classes. Only CVs with matching names in all SingleTraj classes are returned. The data is stacked along a hypothetical time axis along the trajs.
- _CVs#
The same data as in CVs but with labels. Additionally, the xarray is not stacked along the time axis. It contains an extra dimension for trajectories.
- Type:
- basenames#
A list with the names of the trajectories. The leading path and the file extension are omitted.
- name_arr#
An array with len(name_arr) == n_frames. This array keeps track of each frame in this object by identifying each frame with a filename. This can be useful when frames are mixed inside a TrajEnsemble class.
- Type:
np.ndarray
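As a rough illustration of how CVs and name_arr relate, the sketch below stacks per-trajectory values along a common frame axis and builds one name per frame. It uses plain Python lists instead of NumPy arrays, and the trajectory names and values are made up for the example.

```python
# Hypothetical per-trajectory data: basename -> one CV value per frame.
cv_per_traj = {
    "traj_a": [0.8, 1.3, 1.2],
    "traj_b": [1.9, 0.2],
}

# Stacking along the time axis: the frames of all trajs, one after another.
stacked_cv = [value for values in cv_per_traj.values() for value in values]

# One name per frame, so every stacked frame can be traced back to its traj.
name_arr = [name for name, values in cv_per_traj.items() for _ in values]

n_frames = sum(len(values) for values in cv_per_traj.values())
print(stacked_cv)  # [0.8, 1.3, 1.2, 1.9, 0.2]
print(name_arr)    # ['traj_a', 'traj_a', 'traj_a', 'traj_b', 'traj_b']
```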
Instantiate the TrajEnsemble class with two lists of files.
- Parameters:
trajs (Union[Sequence[str], Sequence[Path], Sequence[md.Trajectory], Sequence[SingleTraj]]) – List of strings with paths to trajectories. Can also be a list of md.Trajectory or em.SingleTraj.
tops (Union[None, Sequence[str], Sequence[Path]])
backend (Literal['mdtraj', 'no_load'])
common_str (Optional[Sequence[str]])
traj_nums (Optional[Sequence[int]])
custom_top (Optional[CustomAAsDict])
- Parameters:
tops (Optional[list[str]]) – List of strings with paths to reference pdbs.
backend (str, optional) –
- Choose the backend to load trajectories:
’mdtraj’ uses mdtraj, which loads all trajectories into RAM.
’no_load’ creates an empty trajectory object.
Defaults to ‘no_load’, which makes the instantiation of large ensembles fast and RAM efficient.
common_str (list[str], optional) – Use this if you want to include trajectories with different topologies. The common string is used to pair traj-files (.xtc, .dcd, .lammpstrj, ...) with their topology (.pdb, .gro, ...). The common string should be a substring of matching traj and topology files.
basename_fn (Union[None, Callable[[str], str]], optional) – A function to apply to the trajectory file path string to return the basename of the trajectory. If None is provided, the filename without extension will be used. When all files are named the same and the folder they’re in defines the name of the trajectory, you can supply lambda x: x.split('/')[-2] as this argument. Defaults to None.
custom_top (Optional[CustomAAsDict]) – An instance of encodermap.trajinfo.trajinfo_utils.CustomTopology or a dictionary that can be made into such.
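To illustrate the basename_fn argument, the sketch below compares the documented default (filename without extension) with a folder-based function. The two helper names and the file paths are hypothetical and only serve the example.

```python
from pathlib import PurePosixPath


def default_basename(path):
    """Filename without extension, mirroring the documented default."""
    return PurePosixPath(path).stem


def folder_basename(path):
    """Name of the parent folder, same as ``lambda x: x.split('/')[-2]``."""
    return path.split("/")[-2]


# Hypothetical layout in which every trajectory file is called traj.xtc.
paths = ["/data/run_01/traj.xtc", "/data/run_02/traj.xtc"]
default_names = [default_basename(p) for p in paths]
folder_names = [folder_basename(p) for p in paths]
print(default_names)  # ['traj', 'traj']
print(folder_names)   # ['run_01', 'run_02']
```

With the default, both trajectories would end up with the same basename. The folder-based function keeps them distinguishable.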
- batch_iterator(batch_size: int, replace: bool = False, CV_names: tuple[str] = ('',), deterministic: bool = True, yield_index: bool = True, start: int = 1) Iterator[tuple[ndarray, ndarray]] [source]#
- batch_iterator(batch_size: int, replace: bool = False, CV_names: tuple[str] = ('',), deterministic: bool = True, yield_index: bool = False, start: int = 1) Iterator[ndarray]
- batch_iterator(batch_size: int, replace: bool = False, CV_names: Sequence[str] | None = None, deterministic: bool = True, yield_index: bool = True, start: int = 1) Iterator[tuple[ndarray, tuple[ndarray, ndarray, ndarray, ndarray, ndarray]]]
- batch_iterator(batch_size: int, replace: bool = False, CV_names: Sequence[str] | None = None, deterministic: bool = True, yield_index: bool = False, start: int = 1) Iterator[tuple[ndarray, ndarray, ndarray, ndarray, ndarray]]
Lazy batched iterator of CV data.
This iterator extracts batches of CV data from the ensemble. If the ensemble is a large HDF5 dataset, this provides the ability to use all data without loading it all into memory.
Examples
Import EncoderMap and load some example trajectories.
>>> import encodermap as em
>>> trajs = em.TrajEnsemble(
...     [
...         'https://files.rcsb.org/view/1YUG.pdb',
...         'https://files.rcsb.org/view/1YUF.pdb'
...     ]
... )
This iterator will yield new samples forever. The batch is a tuple of numpy.ndarray.
>>> for batch in trajs.batch_iterator(batch_size=2):
...     print([b.shape for b in batch])
...     break
[(2, 148), (2, 147), (2, 150, 3), (2, 149), (2, 82)]
Use it with Python’s builtin next() function. The deterministic flag returns deterministic batches. The yield_index flag also provides the index of the extracted batch. In this example, both batches are extracted from the 1YUG trajectory (traj_num==0).
>>> iterator = trajs.batch_iterator(deterministic=True, batch_size=2, yield_index=True)
>>> index, batch = next(iterator)
>>> index
[[0 5]
 [0 8]]
>>> index, batch = next(iterator)
>>> index
[[ 0  3]
 [ 0 10]]
If a single string is requested for CV_names, the batch will be a single numpy.ndarray rather than a tuple thereof.
>>> iterator = trajs.batch_iterator(batch_size=2, CV_names=["central_dihedrals"])
>>> batch = next(iterator)
>>> batch.shape
(2, 147)
- Parameters:
batch_size (int) – The size of the batch.
replace (bool) – Whether inside a single batch a sample can occur more than once. Set to False (default) to only allow unique samples in a batch.
CV_names (Sequence[str]) – The names of the CVs to be used in the iterator. If a list/tuple with a single string is provided, the batch will be a numpy.ndarray, rather than a tuple thereof.
deterministic (bool) – Whether the samples should be deterministic.
yield_index (bool) – Whether to also yield the index of the extracted samples.
start (int) – A start integer, which can be used together with deterministic=True to get different deterministic datasets.
- Returns:
Different iterators based on chosen arguments.
- Return type:
Iterator[Any]
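The interplay of replace, deterministic and start can be pictured with a small index generator. This is only a sketch under stated assumptions: batch_index_iterator is a hypothetical function, it yields (traj_num, frame_num) pairs instead of CV data, and treating the start value as a random seed is an assumption rather than EncoderMap's documented behavior.

```python
import random
from itertools import islice


def batch_index_iterator(index_pairs, batch_size, replace=False, seed=1):
    """Yield batches of (traj_num, frame_num) pairs forever (illustrative only)."""
    rng = random.Random(seed)  # a fixed seed gives reproducible batches
    while True:
        if replace:
            # A sample may occur more than once inside a batch.
            yield rng.choices(index_pairs, k=batch_size)
        else:
            # Every sample is unique inside a batch.
            yield rng.sample(index_pairs, batch_size)


# Hypothetical ensemble: traj 0 has 3 frames, traj 4 has 2 frames.
index_pairs = [(0, 0), (0, 1), (0, 2), (4, 0), (4, 1)]

first_run = list(islice(batch_index_iterator(index_pairs, 2, seed=1), 3))
second_run = list(islice(batch_index_iterator(index_pairs, 2, seed=1), 3))
print(first_run == second_run)  # True
```

Because the generator never terminates, it is consumed with next() or itertools.islice, just like the iterator in the examples above is stopped with break.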
- cluster(cluster_id, col='cluster_membership', memberships=None, n_points=-1, overwrite=True)[source]#
Clusters this TrajEnsemble based on the provided cluster_id and col.
With ‘clustering’ we mean extracting a subset given a certain membership. Take two trajectories with 3 frames each as an ensemble. Let’s say we calculate the end-to-end distance of the trajectories and use it as a collective variable of the system. The values are [0.8, 1.3, 1.2, 1.9, 0.2, 1.3]. Based on these values, we define a boolean CV (using 0 as False and 1 as True) which says whether the end-to-end distance is smaller or greater than 1.0. We give this CV the name 'end_to_end_binary' and the values are [0, 1, 1, 1, 0, 1]. We can use this CV to ‘cluster’ the TrajEnsemble via:
cluster = trajs.cluster(cluster_id=0, col='end_to_end_binary'): This gives a TrajEnsemble with 2 frames.
cluster = trajs.cluster(cluster_id=1, col='end_to_end_binary'): This gives a TrajEnsemble with 4 frames.
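The frame counts can be checked with a few lines of plain Python. The sketch below only reproduces the membership logic of the example. select_frames is a hypothetical helper and not part of EncoderMap.

```python
# End-to-end distances of two trajectories with 3 frames each (from the text).
end_to_end = [0.8, 1.3, 1.2, 1.9, 0.2, 1.3]

# Binary CV: 1 if the distance is greater than 1.0, else 0.
end_to_end_binary = [int(distance > 1.0) for distance in end_to_end]


def select_frames(memberships, cluster_id):
    """Return the positions of all frames that belong to cluster_id."""
    return [i for i, member in enumerate(memberships) if member == cluster_id]


print(end_to_end_binary)                    # [0, 1, 1, 1, 0, 1]
print(select_frames(end_to_end_binary, 0))  # [0, 4]
print(select_frames(end_to_end_binary, 1))  # [1, 2, 3, 5]
```

Cluster 0 contains two frames and cluster 1 contains four frames.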
Sometimes, you want to save a cluster in a format that can be rendered by graphical programs (.xtc, .pdb). You can use either the join or stack method of the resulting TrajEnsemble to get an mdtraj.Trajectory, which is either stacked along the atom axis or joined along the time axis.
Note
If the resulting TrajEnsemble has inhomogeneous topologies, the join method will return a dict[md.Topology, md.Trajectory] instead. This dict can be used to save multiple (.xtc, .pdb) files and visualize your cluster in external programs.
The col parameter takes any CV name that is per-frame and integer.
- Parameters:
cluster_id (int) – The cluster id to use. Needs to be an integer that is present in the col parameter.
col (str) – Which ‘column’ of the collective variables to use. Needs to be a key that can be found in trajs.CVs.keys().
memberships (Optional[np.ndarray]) – If a numpy.ndarray is provided here, the memberships from this array will be used. In this case, the col argument will be unused.
n_points (int) – How many points the resulting cluster should contain. Subsamples the points in col == cluster_id evenly and without repeat. If set to -1, all points will be used.
overwrite (bool) – When the memberships argument is used, but the TrajEnsemble already has a CV under the name specified by col, you can set this to True to overwrite this column. Can be helpful when you iteratively conduct multiple clusterings.
- Return type:
Examples
Import EncoderMap and NumPy.
>>> import encodermap as em
>>> import numpy as np
Load an example project.
>>> trajs = em.load_project("pASP_pGLU", load_autoencoder=False)
Create an array full of -1’s. These are the ‘outliers’.
>>> cluster_membership = np.ones(shape=(trajs.n_frames, )) * -1
Select the first 5 frames of every traj to be in cluster 0.
>>> cluster_membership[trajs.id[:, 1] < 5] = 0
Select all frames between 50 and 55 to be cluster 1.
>>> cluster_membership[(50 <= trajs.id[:, 1]) & (trajs.id[:, 1] <= 55)] = 1
>>> np.unique(cluster_membership)
array([-1.,  0.,  1.])
Load this array as a CV called 'clu_mem'.
>>> trajs.load_CVs(cluster_membership, attr_name='clu_mem')
Extract all of cluster 0 with n_points=-1.
>>> clu0 = trajs.cluster(0, "clu_mem")
>>> clu0.n_frames
35
Extract an evenly spaced subset of cluster 1 with 10 total points.
>>> clu1 = trajs.cluster(1, "clu_mem", n_points=10)
>>> clu1.n_frames
10
Clusters with inhomogeneous topologies can be stacked along the atom axis.
>>> [t.n_atoms for t in trajs]
[69, 83, 103, 91, 80, 63, 73]
>>> stacked = clu1.stack()
>>> stacked.n_atoms
795
But joining the trajectories returns a dict[top, traj] if the topologies are inhomogeneous.
>>> joined = clu1.join()
>>> type(joined)
<class 'dict'>
- dash_summary()[source]#
A pandas.DataFrame that summarizes this ensemble.
- Returns:
The DataFrame.
- Return type:
pd.DataFrame
- classmethod from_textfile(fname, basename_fn=None)[source]#
Creates a TrajEnsemble object from a textfile.
The textfile needs to be space-separated with two or three columns:
- Column 1:
The trajectory file.
- Column 2:
The corresponding topology file (if you are using .h5 trajs, column 1 and 2 will be identical, but column 2 needs to be there nonetheless).
- Column 3:
The common string of the trajectory. This column can be left out, which will result in a TrajEnsemble without common strings.
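The column layout can be sketched with a small parser. This is not the parser used by from_textfile. parse_ensemble_textfile and the file names are hypothetical and only illustrate the two-or-three-column format described above.

```python
def parse_ensemble_textfile(text):
    """Parse lines of 'traj top [common_str]' (hypothetical helper)."""
    trajs, tops, common_strs = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) not in (2, 3):
            raise ValueError(f"Expected 2 or 3 columns, got {len(fields)}: {line!r}")
        trajs.append(fields[0])
        tops.append(fields[1])
        # The third column is optional; use None when it is left out.
        common_strs.append(fields[2] if len(fields) == 3 else None)
    return trajs, tops, common_strs


# Hypothetical content; the .h5 line repeats the file as its own topology.
content = """\
asp7.xtc asp7.pdb asp7
glu7.xtc glu7.pdb glu7
combined.h5 combined.h5
"""
trajs, tops, common_strs = parse_ensemble_textfile(content)
print(trajs)        # ['asp7.xtc', 'glu7.xtc', 'combined.h5']
print(common_strs)  # ['asp7', 'glu7', None]
```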
- Parameters:
fname (Union[str, Path]) – File to be read.
basename_fn (Union[None, Callable[[str], str]], optional) – A function to apply to the traj_file string to return the basename of the trajectory. If None is provided, the filename without extension will be used. When all files are named the same and the folder they’re in defines the name of the trajectory, you can supply lambda x: x.split('/')[-2] as this argument. Defaults to None.
- Returns:
A TrajEnsemble instance.
- Return type:
- get_single_frame(key)[source]#
Returns a single frame from all loaded trajectories.
Consider a TrajEnsemble class with two trajectories. One has 10 frames, the other 5 (trajs.n_frames is 15). Frames are counted from 0 across all trajectories, so calling trajs.get_single_frame(12) is equal to calling trajs[1][2]. Calling trajs.get_single_frame(16) will error, and trajs.get_single_frame(1) is the same as trajs[0][1].
- Parameters:
key (int) – The frame to return.
- Returns:
The frame.
- Return type:
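One way to picture the lookup is to map a global frame number to a (trajectory, frame) pair with cumulative frame counts. The sketch below assumes 0-based counting and trajectories in list order. locate_frame is a hypothetical helper, not the method's actual implementation.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_frame(key, frames_per_traj):
    """Map a global frame index to (traj_index, frame_index); illustrative only."""
    # Cumulative frame counts, e.g. [10, 15] for trajs with 10 and 5 frames.
    ends = list(accumulate(frames_per_traj))
    if not 0 <= key < ends[-1]:
        raise IndexError(f"Frame {key} is out of range for {ends[-1]} frames.")
    traj_index = bisect_right(ends, key)
    start = ends[traj_index - 1] if traj_index > 0 else 0
    return traj_index, key - start


frames_per_traj = [10, 5]
print(locate_frame(1, frames_per_traj))   # (0, 1)
print(locate_frame(10, frames_per_traj))  # (1, 0)
```

A key of 16 is larger than the total number of frames and raises an IndexError in this sketch.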
- iterframes()[source]#
Generator over the frames in this instance.
- Yields:
tuple –
- A tuple containing the following:
int: The traj_num
int: The frame_num
encodermap.SingleTraj: A SingleTraj object.
- Return type:
Iterator[tuple[int, int, SingleTraj]]
Examples
Import EncoderMap and load an example TrajEnsemble.
>>> import encodermap as em
>>> trajs = em.TrajEnsemble(
...     [
...         'https://files.rcsb.org/view/1YUG.pdb',
...         'https://files.rcsb.org/view/1YUF.pdb',
...     ],
... )
>>> print(trajs.n_frames)
31
Subsample every tenth frame.
>>> trajs = trajs.subsample(10)
>>> trajs.n_frames
4
Call the iterframes method.
>>> for traj_num, frame_num, frame in trajs.iterframes():
...     print(traj_num, frame_num, frame.n_frames)
0 0 1
0 10 1
1 0 1
1 10 1
- itertrajs()[source]#
Generator over the SingleTraj classes.
- Yields:
tuple –
- A tuple containing the following:
int: A loop-counter integer, identical to traj.traj_num.
encodermap.SingleTraj: A SingleTraj object.
- Return type:
Examples
>>> import encodermap as em
>>> trajs = em.TrajEnsemble(
...     [
...         'https://files.rcsb.org/view/1YUG.pdb',
...         'https://files.rcsb.org/view/1YUF.pdb'
...     ]
... )
>>> for i, traj in trajs.itertrajs():
...     print(traj.basename)
1YUG
1YUF
- load_CVs(data=None, attr_name=None, cols=None, deg=None, periodic=True, labels=None, directory=None, ensemble=False, override=False, custom_aas=None, alignment=None)[source]#
Loads CVs in various ways. The easiest way is to provide a single numpy.ndarray and a name for that array.
Besides np.ndarray, files (.txt and .npy) can be loaded. Features or Featurizers can be provided. An xarray.Dataset can be provided. A str can be provided which either is the name of one of EncoderMap’s features (encodermap.features) or the string can be ‘all’, which loads all features required for EncoderMap’s encodermap.autoencoder.autoencoder.AngleDihedralCartesianEncoderMap.
- Parameters:
data (Optional[TrajEnsembleFeatureType]) – The CV to load. When a numpy.ndarray is provided, it needs to have a shape matching n_frames and the data will be distributed to the trajs. When a list of files is provided, len(data) (the files) needs to match n_trajs. The first file will be loaded by the first traj (based on the traj’s traj_num) and so on. If a list of numpy.ndarray is provided, the first array will be assigned to the first traj (based on the traj’s traj_num). If None is provided, the argument directory will be used to construct a str using this expression: fname = directory + traj.basename + '_' + attr_name. If there are .txt or .npy files matching that string in the directory, the CVs will be loaded from these files to the corresponding trajs. Defaults to None.
attr_name (Optional[str]) – The name under which the CV should be found in the class. Choose whatever you like: 'highd', 'lowd', 'dists', etc. The CV can then be accessed via dot-notation: trajs.attr_name. Defaults to None, in which case the argument data should point to existing files. The attr_name will be extracted from these files.
cols (Optional[list[int]]) – A list of integers indexing the columns of the data to be loaded. This is useful if a file contains columns which are not features (e.g. an indexer or the error of the features), for example:
id  f1   f2   f1_err  f2_err
0   1.0  2.0  0.1     0.1
1   2.5  1.2  0.11    0.52
In that case, you would want to supply cols=[1, 2] to the cols argument. If None is provided, all columns are loaded. Defaults to None.
deg (Optional[bool]) – Whether to return angular CVs using degrees. If None or False, CVs will be in radian. Defaults to None.
periodic (bool) – Whether to use the minimum image convention to calculate distances/angles/dihedrals. This is generally recommended, when you don’t clean up your trajectories and the proteins break over the periodic boundary conditions. However, when the protein is large, the distance between one site and another might be shorter through the periodic boundary. This can lead to wrong results in your distance calculations.
labels (list[str]) – A list containing the labels for the dimensions of the data. If you provide a numpy.ndarray with shape (n_trajs, n_frames, n_feat), this list needs to have length n_feat. An exception will be raised otherwise. If None is provided, the labels will be automatically generated. Defaults to None.
directory (Optional[str]) – If this argument is provided, the directory will be searched for .txt or .npy files which have the same names as the trajectories have basenames. The CVs will then be loaded from these files.
ensemble (bool) – Whether the trajs in this class belong to an ensemble. This implies that they contain either the same topology or are very similar (think wt, and mutant). Setting this option True will try to match the CVs of the trajs onto the same dataset. If a VAL residue has been replaced by LYS in the mutant, the number of sidechain dihedrals will increase. The CVs of the trajs with VAL will thus contain some NaN values. Defaults to False.
override (bool) – Whether to override CVs with the same name as attr_name.
custom_aas (Optional[CustomAAsDict]) – You can provide non-standard residue definitions in this argument. See encodermap.trajinfo.trajinfo_utils.CustomTopology for information how to use the custom_aas argument. If set to None (default), only standard residue names are assumed.
alignment (Optional[str]) – If your proteins have similar but different sequences, you can provide a CLUSTAL W alignment as this argument and the featurization will align the features accordingly.
- Raises:
TypeError – When wrong Type has been provided for data.
- Return type:
None
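The effect of the cols argument on the small table from the parameter description can be sketched in plain Python. load_columns is a hypothetical helper and skipping the header line is an assumption of this sketch. How load_CVs itself reads text files is not shown here.

```python
# The example table with an id column and two error columns.
content = """\
id f1 f2 f1_err f2_err
0 1.0 2.0 0.1 0.1
1 2.5 1.2 0.11 0.52
"""


def load_columns(text, cols=None):
    """Read a whitespace-separated table, keeping only the columns in cols."""
    rows = []
    for line in text.splitlines()[1:]:  # skip the header line
        fields = [float(field) for field in line.split()]
        rows.append(fields if cols is None else [fields[c] for c in cols])
    return rows


features = load_columns(content, cols=[1, 2])
print(features)  # [[1.0, 2.0], [2.5, 1.2]]
```

With cols=[1, 2], only the f1 and f2 columns remain, and the id and error columns are dropped.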
- load_custom_topology(custom_top=None)[source]#
Loads a custom_topology from a CustomTopology class or a dict.
See also
CustomTopology
- Parameters:
custom_top (CustomTopology | dict[str | tuple[str, str], None | tuple[str, None] | tuple[str, dict[Literal['bonds', 'optional_bonds', 'delete_bonds', 'optional_delete_bonds', 'PHI', 'PSI', 'OMEGA', 'not_PHI', 'not_PSI', 'not_OMEGA', 'CHI1', 'CHI2', 'CHI3', 'CHI4', 'CHI5'], list[str] | list[tuple[str | int, str | int]]]]] | None) – Optional[Union[CustomTopology, CustomAAsDict]]: An instance of the CustomTopology class or a dictionary that can be made into such.
- Return type:
None
- parse_clustal_w_alignment(aln)[source]#
Parse an alignment in ClustalW format and add the info to the trajectories.
- Parameters:
aln (str) – The alignment in ClustalW format.
- Return type:
None
- save(fname, CVs='all', overwrite=False, only_top=False)[source]#
Saves this TrajEnsemble into a single .h5 file.
- Parameters:
fname (Union[str, Path]) – Where to save the file.
CVs (Union[Literal["all"], list[str], Literal[False]]) – Which CVs to also store in the file. If set to 'all', all CVs will be saved. Otherwise, a list[str] can be provided to only save specific CVs. If set to False, no CVs are stored in the file.
overwrite (bool) – If the file exists, it is overwritten.
only_top (bool) – Only writes the trajectories’ topologies into the file.
- Raises:
IOError – If file already exists and overwrite is not True.
- Return type:
None
- sidechain_info()[source]#
Indices used for the AngleDihedralCartesianEncoderMap class to allow training with multiple different sidechains.
- Returns:
The indices. The key ‘-1’ is used for the hypothetical convex hull of all feature spaces (the output of the tensorflow model). The other keys match the common_str of the trajs.
- Return type:
- Raises:
Exception – When the common_strings and topologies are not aligned. Aligned means that all trajs with the same common_str should possess the same topology.
- split_into_frames(inplace=False)[source]#
Splits self into separate frames.
- Parameters:
inplace (bool) – Whether to do the split inplace or not. Defaults to False and thus, returns a new TrajEnsemble class.
- Return type:
None
- subsample(stride=None, total=None)[source]#
Returns a subset of this TrajEnsemble given the provided stride or total.
This is a faster alternative to using trajs[trajs.index_arr[::1000]] when HDF5 trajs are used, because the slicing information is saved in the respective encodermap.trajinfo.info_single.SingleTraj and loading of single frames is faster in HDF5 formatted trajs.
- Parameters:
- Returns:
A trajectory ensemble.
- Return type:
Note
The result from subsample(1000) is different from trajs[trajs.index_arr[::1000]]. With subsample every trajectory is sub-sampled independently. Consider a TrajEnsemble with two encodermap.trajinfo.info_single.SingleTraj trajectories with 18 frames each. subsampled = trajs.subsample(5) would return a TrajEnsemble with two trajs with 3 frames each (subsampled.n_frames == 6). Whereas subsampled = trajs[trajs.index_arr[::5]] would return a TrajEnsemble with 7 SingleTrajs with 1 frame each (subsampled.n_frames == 7). Because the time and frame numbers are saved all the time, this should not be too much of a problem.
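The difference between the two approaches can be sketched with plain index lists. The frame counts below are made up for this illustration, and the sketch assumes ordinary Python slice semantics for the stride. It does not call EncoderMap.

```python
def build_index(frames_per_traj):
    """List of (traj, frame) pairs for every frame, in trajectory order."""
    return [(t, f) for t, n in enumerate(frames_per_traj) for f in range(n)]


frames_per_traj = [12, 7]  # hypothetical frame counts
stride = 5

# Per-trajectory striding: each trajectory is sliced on its own.
per_traj = [
    (t, f) for t, n in enumerate(frames_per_traj) for f in range(0, n, stride)
]

# Global striding: the flat index over all frames is sliced once.
global_stride = build_index(frames_per_traj)[::stride]

print(per_traj)       # [(0, 0), (0, 5), (0, 10), (1, 0), (1, 5)]
print(global_stride)  # [(0, 0), (0, 5), (0, 10), (1, 3)]
```

Per-trajectory striding always restarts at frame 0 of each trajectory. Global striding carries the offset over from the previous trajectory, so the two selections differ in both size and content.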
- to_alignment_query()[source]#
A string that can be put into sequence alignment software.
- Return type:
- classmethod with_overwrite_trajnums(*trajs)[source]#
Creates a TrajEnsemble by copying the provided encodermap.trajinfo.info_single.SingleTraj instances and changing their traj_num attribute to adhere to [0, 1, 2, ...].
- Parameters:
trajs (Sequence[SingleTraj]) – The sequence of trajs.
- Returns:
A TrajEnsemble instance.
- Return type:
- property CVs: dict[str, ndarray]#
Returns dict of CVs in SingleTraj classes. Only CVs with the same names in all SingleTraj classes are loaded.
- Type:
- property CVs_in_file: bool#
Is true, if CVs can be loaded from file. Can be used to build a data generator from.
- Type:
- property _CVs: Dataset#
Returns an xarray Dataset of matching CVs, stacked along the trajectory axis.
- Type:
- property index_arr: ndarray#
Returns np.ndarray with ndim = 2. Clearly assigning every loaded frame an identifier of traj_num (self.index_arr[:,0]) and frame_num (self.index_arr[:,1]). Can be used to create an unspecified subset of frames and can be useful when used with clustering.
- Type:
np.ndarray
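The layout of index_arr can be pictured as one [traj_num, frame_num] row per frame. The sketch below builds such rows with plain lists for an ensemble whose traj_nums are not contiguous. build_index_arr and the frame counts are hypothetical.

```python
def build_index_arr(traj_nums, frames_per_traj):
    """One [traj_num, frame_num] row per frame (illustrative only)."""
    rows = []
    for traj_num, n_frames in zip(traj_nums, frames_per_traj):
        rows.extend([traj_num, frame_num] for frame_num in range(n_frames))
    return rows


# Hypothetical ensemble: traj_num 0 with 3 frames, traj_num 4 with 2 frames.
index_arr = build_index_arr(traj_nums=[0, 4], frames_per_traj=[3, 2])
print(index_arr)  # [[0, 0], [0, 1], [0, 2], [4, 0], [4, 1]]

# Column 0 holds the traj_num, column 1 holds the frame_num.
frames_of_traj_4 = [row[1] for row in index_arr if row[0] == 4]
print(frames_of_traj_4)  # [0, 1]
```

Any subset of these rows identifies a set of frames, which is why such an array is handy for picking cluster members.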
- property locations: list[str]#
Duplication of self.traj_files but using the trajs own traj_file attribute. Ensures that traj files are always returned independent of the current load state.
- Type:
- property name_arr: ndarray#
Trajectory names with the same length as self.n_frames.
- Type:
np.ndarray
- property top: list[Topology]#
Returns a minimal set of mdtraj.Topologies.
If all trajectories share the same topology a list with len 1 will be returned.
- Type:
- property top_files: list[str]#
Returns minimal set of topology files.
If you want a list of top files with the same length as self.trajs use self._top_files and self._traj_files.
- Type:
- property traj_files: list[str]#
A list of the traj_files of the individual SingleTraj classes.
- Type:
- property traj_joined: Trajectory#
Returns a mdtraj Trajectory with every frame of this class appended along the time axis.
Can also work if different topologies (with the same number of atoms) are loaded. In that case, the first frame in self will be used as topology parent and the remaining frames’ xyz coordinates are used to position the parents’ atoms accordingly.
Examples
>>> import encodermap as em
>>> trajs = em.load_project("pASP_pGLU")
>>> subsample = trajs[0][:20] + trajs[1][:20]
>>> subsample.split_into_frames().traj_joined
<mdtraj.Trajectory with 40 frames, 69 atoms, 6 residues, and unitcells at ...>
- Type:
mdtraj.Trajectory
- property trajs_by_common_str: dict[None | str, TrajEnsemble]#
Returns the trajs in self ordered by common_str.
If all trajectories share the same common_str, a dict with one key will be returned. As the common_str can be None, None can also occur as a key in this dict.
- Type:
- property trajs_by_top: dict[Topology, TrajEnsemble]#
Returns the trajs in self ordered by top.
If all trajectories share the same topology, a dict with one key will be returned.
- Type:
dict[md.Topology, TrajEnsemble]