encodermap.moldata package
Submodules
encodermap.moldata.moldata module
New MolData class. Uses PyEMMA to process many trajectories in parallel, even when the set of trajectories or collective variables is too large to keep in memory.
Allows creation of tfrecord files to pass large datasets to TensorFlow that would not otherwise fit into memory.
Backwards-compatible with the old MolData class.
Todo
Add tfrecord capabilities
- class encodermap.moldata.moldata.NewMolData(trajs, cache_path='', top=None, write_traj=False, fmt='.nc', start=None, stop=None, step=None)
Bases: object
MolData version 2. Extracts and holds conformational information of trajectories.
In version 2, you can use either MDAnalysis or the out-of-memory option based on EncoderMap’s new TrajEnsemble and SingleTraj classes.
- Collective Variables is a term used for data of some dimension matching the dimension of your trajectory.
Collective variables of dimensionality 1 assign a single (float) value to every frame of a simulation or simulation ensemble. This could be the membership to a cluster, the distance between the termini of a protein or the distance between two spin labels. Collective variables of dimensionality 2 assign a list of floats to every simulation frame. The backbone torsions are such a collective variable. A flattened array of pairwise distances between CA atoms would also fall into this category. Collective variables of dimensionality 3 ascribe a value to every atom in every frame. This could be the xyz-coordinates of the atom, the beta-factor or the charge.
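The three dimensionalities can be pictured as array shapes. The following is a minimal NumPy sketch using random placeholder data. The sizes and the variable names (end_to_end, torsions, xyz) are illustrative assumptions and not part of EncoderMap's API.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_frames, n_residues, n_atoms = 100, 10, 50  # arbitrary example sizes

# Dimensionality 1: one float per frame, e.g. an end-to-end distance.
end_to_end = rng.random(n_frames)

# Dimensionality 2: a list of floats per frame, e.g. backbone torsions
# (three angles per residue are assumed here purely for illustration).
torsions = rng.uniform(-np.pi, np.pi, size=(n_frames, 3 * n_residues))

# Dimensionality 3: values for every atom in every frame, e.g. xyz coordinates.
xyz = rng.random((n_frames, n_atoms, 3))

# In every case the first axis runs over the frames.
for name, cv in [("end_to_end", end_to_end), ("torsions", torsions), ("xyz", xyz)]:
    print(f"{name}: ndim={cv.ndim}, shape={cv.shape}")
```

The number of array axes corresponds to the dimensionality described above, and the first axis always matches the number of frames.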
- EncoderMap in its Angle-Dihedral-Cartesian mode uses the following collective variables:
cartesians: The xyz-coordinates of every atom in every frame in every trajectory.
central_cartesians: The xyz-coordinates of the backbone C, CA, N atoms.
dihedrals: The omega-phi-psi angles of the backbone.
angles: The angles between the central_cartesian atoms.
lengths: The distances between the central_cartesian atoms.
sidedihedrals: The dihedrals of the sidechains, in the order residue1-chi1-chi5, residue2-chi1-chi5.
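To make the lengths, angles and dihedrals listed above concrete, here is a small NumPy sketch that computes them for a toy chain of four atoms. It only illustrates the geometry. The function names (bond_lengths, bond_angles, dihedral) are hypothetical, and this is not how EncoderMap computes these values internally.

```python
import numpy as np


def bond_lengths(xyz):
    """Distances between consecutive atoms; xyz has shape (n_atoms, 3)."""
    return np.linalg.norm(np.diff(xyz, axis=0), axis=1)


def bond_angles(xyz):
    """Angle (radians) at each inner atom between its two neighbors."""
    a = xyz[:-2] - xyz[1:-1]
    b = xyz[2:] - xyz[1:-1]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )
    return np.arccos(np.clip(cos, -1.0, 1.0))


def dihedral(p0, p1, p2, p3):
    """Torsion angle (radians) around the p1-p2 axis."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)


# Toy chain of four atoms with unit bond lengths and right angles.
chain = np.array(
    [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 1.0]]
)

lengths = bond_lengths(chain)   # three values for four atoms
angles = bond_angles(chain)     # two values for four atoms
torsion = dihedral(*chain)      # one value for four atoms
```

A chain of N atoms yields N-1 lengths, N-2 angles and N-3 dihedrals, which is why these collective variables have different feature counts for the same set of backbone atoms.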
- __init__(trajs, cache_path='', top=None, write_traj=False, fmt='.nc', start=None, stop=None, step=None)
Instantiate the MolData Class.
- The trajs parameter can take a number of possible inputs:
MDAnalysis.AtomGroup: Ensuring backwards compatibility with the old MolData class.
- em.TrajEnsemble: EncoderMap’s TrajEnsemble class, which keeps track of frames and collective variables.
- list of str: If you don’t want to work with the TrajEnsemble class directly, you can pass a list of str giving the filenames of the trajectory files (.xtc, .dcd, .h5). Make sure to also provide a topology if the trajectory files do not contain one.
- Parameters:
trajs (Union[MDAnalysis.AtomGroup, encodermap.TrajEnsemble, list]) – The trajectories to load. Can be one of the following:
* MDAnalysis.AtomGroup: For backwards compatibility.
* encodermap.TrajEnsemble: New TrajEnsemble class which manages frames and collective variables.
* list: Simply provide a list of trajectory files and don’t forget to provide a topology.
cache_path (str, optional) – Where to save generated data. Saves either numpy arrays (when an AtomGroup is provided as trajs, or fmt is ‘.npy’) or NetCDF-HDF5 files with xarray (fmt is ‘.nc’). When an empty string is provided, nothing is written to disk. Defaults to ‘’ (empty string).
top (Union[str, mdtraj.Topology, None], optional) – The topology of trajs in case trajs is a list of str. Can be the filename of a topology file or an already loaded mdtraj.Topology. Defaults to None.
write_traj (bool, optional) – Whether to include the trajectory (and topology) in the NetCDF-HDF5 file. This option only works in conjunction with fmt=’.nc’. If set to True, mdtraj is used to write the trajectory, topology and collective variables to one comprehensive file.
fmt (str, optional) – The format to save the CVs as. Can be either ‘.npy’ or ‘.nc’. Defaults to ‘.nc’. The default is NetCDF-HDF5 because these files can be read iteratively and can therefore be larger than the available memory. This helps in the construction of tfrecord files, which can also be used to train a network with large datasets.
start (Union[int, None], optional) – First frame to analyze. Is there for backwards-compatibility. This feature is dropped in the newer TrajEnsemble pipeline.
stop (Union[int, None], optional) – Last frame to analyze. Is there for backwards-compatibility. This feature is dropped in the newer TrajEnsemble pipeline.
step (Union[int, None], optional) – Step provided to old MolData class. Is there for backwards-compatibility. This feature is dropped in the newer TrajEnsemble pipeline.
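As a rough illustration of how start, stop and step could select frames, the sketch below assumes ordinary Python slice semantics, in which stop is exclusive. The documentation above does not state whether the last frame is included, so treat this as an assumption. The helper select_frames is hypothetical and not part of EncoderMap.

```python
def select_frames(n_frames, start=None, stop=None, step=None):
    """Hypothetical helper: return the frame indices picked by a slice.

    Assumes Python slice semantics (stop is exclusive); the actual
    behavior of the old MolData class may differ.
    """
    return list(range(n_frames))[slice(start, stop, step)]


# Every third frame between frame 2 and frame 9 of a 10-frame trajectory.
subset = select_frames(10, start=2, stop=9, step=3)

# Leaving all three at None keeps every frame.
all_frames = select_frames(5)
```

With all three parameters left at their default of None, every frame is kept, which matches the defaults in the signature above.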
Examples
>>> import encodermap as em
>>> traj =