encodermap package

Subpackages

Submodules

encodermap._optional_imports module

Optional imports of python packages.

Postpones import exceptions until an optional feature is actually used. This keeps the codebase of EncoderMap lean, so that users don’t need to install packages for features they don’t require.

Examples

>>> from encodermap._optional_imports import _optional_import
>>> np = _optional_import('numpy')
>>> np.array([1, 2, 3])
array([1, 2, 3])
>>> nonexistent = _optional_import('nonexistent_package')
>>> try:
...     nonexistent.function()
... except ValueError as e:
...     print(e)
Install the `nonexistent_package` package to make use of this feature.
>>> try:
...     _ = nonexistent.variable
... except ValueError as e:
...     print(e)
Install the `nonexistent_package` package to make use of this feature.
>>> numpy_random = _optional_import('numpy', 'random.random')
>>> np.random.seed(1)
>>> np.round(numpy_random((5, 5)) * 20, 0)
array([[ 8., 14.,  0.,  6.,  3.],
       [ 2.,  4.,  7.,  8., 11.],
       [ 8., 14.,  4., 18.,  1.],
       [13.,  8., 11.,  3.,  4.],
       [16., 19.,  6., 14., 18.]])
encodermap._optional_imports._optional_import(module: str, name: Optional[str] = None, version: Optional[str] = None) → Any
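The deferred-error pattern described above can be sketched in a few lines of plain Python. This is a minimal illustration under our own assumptions, not EncoderMap’s actual implementation: the names `optional_import` and `_MissingPackage` are hypothetical, the `version` argument of the real function is not modelled, and the error message simply mirrors the one shown in the example.

```python
import importlib
from typing import Any, Optional


class _MissingPackage:
    """Hypothetical stand-in that raises ValueError as soon as it is used."""

    def __init__(self, package: str) -> None:
        self._package = package

    def _fail(self) -> None:
        raise ValueError(
            f"Install the `{self._package}` package to make use of this feature."
        )

    def __getattr__(self, attr: str) -> Any:
        # Only called for attributes not found normally, e.g. `.function`
        self._fail()

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        self._fail()


def optional_import(module: str, name: Optional[str] = None) -> Any:
    """Import `module` (and optionally a dotted attribute `name` of it); defer errors."""
    try:
        obj = importlib.import_module(module)
    except ImportError:
        # The error is only raised later, when the placeholder is used
        return _MissingPackage(module)
    if name is not None:
        # Resolve dotted paths such as "random.random" one attribute at a time
        for part in name.split("."):
            obj = getattr(obj, part)
    return obj
```

Importing a missing package therefore succeeds, and only accessing an attribute of the placeholder or calling it raises the ValueError.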

encodermap._typing module

Typing for the encodermap package

encodermap._version module

EncoderMap’s versioning follows the Semantic Versioning guidelines. Read more about them here: https://semver.org/

tldr: Given a version number MAJOR.MINOR.PATCH, increment the:

  • MAJOR version when you make incompatible API changes,

  • MINOR version when you add functionality in a backwards compatible manner, and

  • PATCH version when you make backwards compatible bug fixes.

Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.

Current example: Currently I am writing this documentation. Writing this will not break an API, nor does it add functionality, nor does it fix bugs. Thus, the version stays at 3.0.0.
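To make the increment rules concrete, here is a small sketch of a version bump. It is an illustration only (the helper `bump` is hypothetical and not part of EncoderMap) and ignores pre-release and build labels. Following semver, incrementing MAJOR resets MINOR and PATCH to 0, and incrementing MINOR resets PATCH.

```python
def bump(version: str, part: str) -> str:
    """Return `version` with `part` incremented (pre-release/build labels not handled)."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        # Incompatible API change: reset MINOR and PATCH
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backwards-compatible new functionality: reset PATCH
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards-compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"part must be 'major', 'minor' or 'patch', got {part!r}")


print(bump("3.0.0", "patch"))  # 3.0.1
```

A documentation-only change, as in the example above, calls none of these and leaves 3.0.0 untouched.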

Module contents

EncoderMap: Dimensionality reduction for molecular dynamics.

EncoderMap provides a framework for using molecular dynamics data with the tensorflow library. It started as the implementation of a neural network autoencoder to do dimensionality reduction and also to create new high-dimensional data from the low-dimensional embedding. The user was still required to create their own dataset and provide the numpy arrays. In the second iteration of EncoderMap, the possibility to provide molecular dynamics data with the MolData class was added. A new neural network architecture was implemented to rebuild Cartesian coordinates from the low-dimensional embedding.

This iteration of EncoderMap continues this endeavour by porting the old code to the newer tensorflow version (2.x). It also adds features that should aid computational chemists and structural biologists:

  • New trajectory classes with lazy loading of coordinates to save memory.

  • Featurization which can be parallelized using the distributed computing library dask.

  • Interactive matplotlib plots for clustering and structure creation.

  • Neural network building blocks that allow users to easily build new neural networks.

  • Sparse networks allow comparison of proteins with different topologies.

class encodermap.ADCParameters(**kwargs: Optional[Union[float, int, str, bool, list[int], list[str], list[float], tuple[int, None]]])

Bases: ParametersFramework

This is the parameter object for the AngleDihedralCartesianEncoderMap. It holds all the parameters that the Parameters object includes, plus the following attributes:

cartesian_pwd_start

Index of the first atom to use for the pairwise distance calculation.

Type:

int

cartesian_pwd_stop

Index of the last atom to use for the pairwise distance calculation.

Type:

int

cartesian_pwd_step

Step for the calculation of pairwise distances. E.g. for a chain of atoms N-C_a-C-N-C_a-C… cartesian_pwd_start=1 and cartesian_pwd_step=3 will result in using all C-alpha atoms for the pairwise distance calculation.

Type:

int
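The C-alpha selection in the example above behaves like Python slicing. The sketch below assumes that cartesian_pwd_start, cartesian_pwd_stop and cartesian_pwd_step map onto a slice `[start:stop:step]` over the atom order (an assumption; in particular, whether stop is inclusive is not stated here) and uses a hypothetical list of backbone atom names.

```python
# Hypothetical backbone atom names of four residues: N, C-alpha (CA), C, repeated
backbone = ["N", "CA", "C"] * 4
atom_indices = list(range(len(backbone)))

# Assumed slice semantics for the three parameters
cartesian_pwd_start, cartesian_pwd_stop, cartesian_pwd_step = 1, None, 3
selection = slice(cartesian_pwd_start, cartesian_pwd_stop, cartesian_pwd_step)

print(backbone[selection])      # ['CA', 'CA', 'CA', 'CA']
print(atom_indices[selection])  # [1, 4, 7, 10]
```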

use_backbone_angles

Whether backbone bond angles should be learned (True) or whether mean values should be used instead to generate conformations (False).

Type:

bool

use_sidechains

Whether sidechain dihedrals should be passed through the autoencoder.

Type:

bool

angle_cost_scale

Adjusts how much the angle cost is weighted in the cost function.

Type:

int

angle_cost_variant

Defines how the angle cost is calculated. Must be one of “mean_square”, “mean_abs”, or “mean_norm”.

Type:

str
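The three variant names suggest the following reductions over the difference between input and reconstructed angles. The sketch is an assumption, not EncoderMap’s code: it wraps differences with a periodicity (2π by default, matching the periodicity default in ADCParameters.defaults) and interprets “mean_norm” as the Euclidean norm per sample averaged over the batch. The function names are hypothetical.

```python
import numpy as np


def periodic_difference(a, b, periodicity=2 * np.pi):
    """Signed difference a - b wrapped into [-periodicity/2, periodicity/2)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return (d + periodicity / 2) % periodicity - periodicity / 2


def angle_cost(inp, out, variant="mean_abs", periodicity=2 * np.pi):
    """Hypothetical cost between input and reconstructed angles of shape (batch, n)."""
    d = periodic_difference(inp, out, periodicity)
    if variant == "mean_square":
        return float(np.mean(np.square(d)))
    if variant == "mean_abs":
        return float(np.mean(np.abs(d)))
    if variant == "mean_norm":
        # Euclidean norm over the angles of each sample, averaged over the batch
        return float(np.mean(np.linalg.norm(d, axis=-1)))
    raise ValueError(f"Unknown cost variant: {variant!r}")


# 0.1 rad and 6.2 rad are only about 0.18 rad apart on the circle
inp = np.array([[0.1, 6.2]])
out = np.array([[6.2, 0.1]])
for variant in ("mean_square", "mean_abs", "mean_norm"):
    print(variant, angle_cost(inp, out, variant))
```

With periodicity=360, as in the AngleDihedralCartesianEncoderMap example, the same functions would apply to angles in degrees.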

angle_cost_reference

Can be used to normalize the angle cost with the cost of a reference model (dummy).

Type:

int

dihedral_cost_scale

Adjusts how much the dihedral cost is weighted in the cost function.

Type:

int

dihedral_cost_variant

Defines how the dihedral cost is calculated. Must be one of “mean_square”, “mean_abs”, or “mean_norm”.

Type:

str

dihedral_cost_reference

Can be used to normalize the dihedral cost with the cost of a reference model (dummy).

Type:

int

side_dihedral_cost_scale

Adjusts how much the side dihedral cost is weighted in the cost function.

Type:

float

side_dihedral_cost_variant

Defines how the side dihedral cost is calculated. Must be one of “mean_square”, “mean_abs”, or “mean_norm”.

Type:

str

side_dihedral_cost_reference

Can be used to normalize the side dihedral cost with the cost of a reference model (dummy).

Type:

int

cartesian_cost_scale

Adjusts how much the cartesian cost is weighted in the cost function.

Type:

int

cartesian_cost_scale_soft_start

Allows the cartesian cost to be turned on gradually. Must be a tuple (start, end) or (None, None). If start and end are given, cartesian_cost_scale will be increased linearly in the given range.

Type:

tuple
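A linear soft start can be sketched as a function of the training epoch. `cartesian_scale_at` is a hypothetical helper, and two details are assumptions: the scale is taken to be 0 before start, and (start, end) are interpreted as epochs, as in the cartesian_loss_step description of AngleDihedralCartesianEncoderMap.

```python
from typing import Optional, Tuple


def cartesian_scale_at(
    epoch: int,
    soft_start: Tuple[Optional[int], Optional[int]],
    cartesian_cost_scale: float = 1.0,
) -> float:
    """Hypothetical linear ramp of the cartesian cost scale."""
    start, end = soft_start
    if start is None or end is None:
        # No soft start: the full scale is used from the beginning
        return cartesian_cost_scale
    if epoch < start:
        return 0.0
    if epoch >= end:
        return cartesian_cost_scale
    # Linear interpolation between 0 at `start` and the full scale at `end`
    return cartesian_cost_scale * (epoch - start) / (end - start)


print([cartesian_scale_at(e, (6, 12)) for e in range(0, 15, 3)])  # [0.0, 0.0, 0.0, 0.5, 1.0]
```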

cartesian_cost_variant

Defines how the cartesian cost is calculated. Must be one of “mean_square”, “mean_abs”, or “mean_norm”.

Type:

str

cartesian_cost_reference

Can be used to normalize the cartesian cost with the cost of a reference model (dummy).

Type:

int

cartesian_dist_sig_parameters

Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l).

Type:

tuple of floats
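The six numbers parametrize two sigmoids, one for the high-dimensional and one for the low-dimensional distances. The sketch below uses the switching function from the SketchMap method, s(r) = 1 - (1 + (2^(a/b) - 1)(r/σ)^a)^(-b/a), which equals 0.5 at r = σ. That EncoderMap uses exactly this form is an assumption, and `sketchmap_sigmoid` is a hypothetical name.

```python
import numpy as np


def sketchmap_sigmoid(r, sig, a, b):
    """SketchMap switching function: 0 at r = 0, 0.5 at r = sig, approaching 1 for r >> sig."""
    r = np.asarray(r, dtype=float)
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sig) ** a) ** (-b / a)


# Default cartesian_dist_sig_parameters from ADCParameters.defaults
sig_h, a_h, b_h, sig_l, a_l, b_l = (4.5, 12, 6, 1, 2, 6)
high_d = sketchmap_sigmoid([2.0, 4.5, 9.0], sig_h, a_h, b_h)  # high-dimensional distances
low_d = sketchmap_sigmoid([0.5, 1.0, 2.0], sig_l, a_l, b_l)   # low-dimensional distances
print(high_d, low_d)
```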

cartesian_distance_cost_scale

Adjusts how much the cartesian distance cost is weighted in the cost function.

Type:

int

Examples

>>> import encodermap as em
>>> parameters = em.ADCParameters()
>>> parameters.auto_cost_variant
'mean_abs'
>>> parameters.save(path='/path/to/dir')
/path/to/dir/parameters.json
>>> # alternative constructor
>>> new_params = em.ADCParameters.from_file('/path/to/dir/parameters.json')
>>> new_params.main_path
'/path/to/dir/parameters.json'
__init__(**kwargs: Optional[Union[float, int, str, bool, list[int], list[str], list[float], tuple[int, None]]]) → None

Instantiate the ADCParameters class

Takes keyword arguments and uses them to overwrite the class defaults. The resulting dict is stored as an attribute, and its values can be accessed as instance attributes.

Parameters:

**kwargs (dict) – Parameter values passed as keyword arguments. Unknown keys are dropped.
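The merge of defaults and keyword arguments can be pictured as follows. `ParametersSketch` is a hypothetical, trimmed-down class with only three defaults; it illustrates the described behaviour (known keys override defaults, unknown keys are dropped) rather than reproducing ParametersFramework.

```python
from typing import Any, Dict


class ParametersSketch:
    """Hypothetical stand-in for a parameters class with a few defaults."""

    defaults: Dict[str, Any] = {
        "batch_size": 256,
        "learning_rate": 0.001,
        "use_sidechains": False,
    }

    def __init__(self, **kwargs: Any) -> None:
        params = dict(self.defaults)  # copy so the class defaults stay untouched
        # Keep only known keys; unknown keys are silently dropped
        params.update({k: v for k, v in kwargs.items() if k in self.defaults})
        for key, value in params.items():
            setattr(self, key, value)


p = ParametersSketch(batch_size=64, not_a_parameter=1)
print(p.batch_size, p.learning_rate)  # 64 0.001
```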

activation_functions: list[str]
defaults = {'activation_functions': ['', 'tanh', 'tanh', ''], 'analysis_path': '', 'angle_cost_reference': 1, 'angle_cost_scale': 0, 'angle_cost_variant': 'mean_abs', 'auto_cost_scale': None, 'auto_cost_variant': 'mean_abs', 'batch_size': 256, 'batched': True, 'cartesian_cost_reference': 1, 'cartesian_cost_scale': 1, 'cartesian_cost_scale_soft_start': (None, None), 'cartesian_cost_variant': 'mean_abs', 'cartesian_dist_sig_parameters': (4.5, 12, 6, 1, 2, 6), 'cartesian_distance_cost_scale': 1, 'cartesian_pwd_start': None, 'cartesian_pwd_step': None, 'cartesian_pwd_stop': None, 'center_cost_scale': 0.0001, 'checkpoint_step': 5000, 'dihedral_cost_reference': 1, 'dihedral_cost_scale': 1, 'dihedral_cost_variant': 'mean_abs', 'dist_sig_parameters': (4.5, 12, 6, 1, 2, 6), 'distance_cost_scale': None, 'gpu_memory_fraction': 0, 'id': '', 'l2_reg_constant': 0.001, 'learning_rate': 0.001, 'loss': 'emap_cost', 'model_api': 'functional', 'n_neurons': [128, 128, 2], 'n_steps': 100000, 'periodicity': 6.283185307179586, 'seed': None, 'side_dihedral_cost_reference': 1, 'side_dihedral_cost_scale': 0.5, 'side_dihedral_cost_variant': 'mean_abs', 'summary_step': 10, 'tensorboard': False, 'training': 'auto', 'use_backbone_angles': False, 'use_sidechains': False}
classmethod defaults_description() → str

str: A string that contains tabulated default parameter values.

n_neurons: list[int]
class encodermap.AngleDihedralCartesianEncoderMap(trajs: encodermap.TrajEnsemble, parameters: Optional[encodermap.ADCParameters] = None, model: Optional[tensorflow.keras.Model] = None, read_only: bool = False, cartesian_loss_step: int = 0, top: Optional[mdtraj.Topology] = None)

Bases: Autoencoder

Differs from the Autoencoder class in its __init__ method and uses callbacks to gradually tune in the cartesian cost.

Overridden methods include _setup_callbacks, generate, and train.

Examples

>>> import encodermap as em
>>> # Load two trajectories
>>> xtcs = ["tests/data/1am7_corrected_part1.xtc", "tests/data/1am7_corrected_part2.xtc"]
>>> tops = ["tests/data/1am7_protein.pdb", "tests/data/1am7_protein.pdb"]
>>> trajs = em.load(xtcs, tops)
>>> print(trajs)
encodermap.TrajEnsemble object. Current backend is no_load. Containing 2 trajs. Not containing any CVs.
>>> # load CVs
>>> # This step can be omitted. The AngleDihedralCartesianEncoderMap class automatically loads CVs
>>> trajs.load_CVs('all')
>>> print(trajs.CVs['central_cartesians'].shape)
(51, 474, 3)
>>> print(trajs.CVs['central_dihedrals'].shape)
(51, 471)
>>> # create some parameters
>>> p = em.ADCParameters(periodicity=360, use_backbone_angles=True, use_sidechains=True,
...                      cartesian_cost_scale_soft_start=(6, 12))
>>> # Standard is functional model, as it offers more flexibility
>>> print(p.model_api)
functional
>>> print(p.distance_cost_scale)
None
>>> # Instantiate the class
>>> e_map = em.AngleDihedralCartesianEncoderMap(trajs, p, read_only=True)
>>> # dataset contains these inputs:
>>> # central_angles, central_dihedrals, central_cartesians, central_distances, sidechain_dihedrals
>>> print(e_map.dataset)
<BatchDataset element_spec=(TensorSpec(shape=(None, 472), dtype=tf.float32, name=None), TensorSpec(shape=(None, 471), dtype=tf.float32, name=None), TensorSpec(shape=(None, 474, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None, 473), dtype=tf.float32, name=None), TensorSpec(shape=(None, 316), dtype=tf.float32, name=None))>
>>> # output from the model contains the following data:
>>> # out_angles, out_dihedrals, back_cartesians, pairwise distances of input cartesians, pairwise distances of back-mapped cartesians, out_side_dihedrals
>>> for data in e_map.dataset.take(1):
...     pass
>>> out = e_map.model(data)
>>> print([i.shape for i in out])
[TensorShape([256, 472]), TensorShape([256, 471]), TensorShape([256, 474, 3]), TensorShape([256, 112101]), TensorShape([256, 112101]), TensorShape([256, 316])]
>>> # get output of latent space by providing central_angles, central_dihedrals, sidechain_dihedrals
>>> latent = e_map.encoder([data[0], data[1], data[-1]])
>>> print(latent.shape)
(256, 2)
>>> # Rebuild central_angles, central_dihedrals and sidechain_dihedrals from latent
>>> ang, dih, side_dih = e_map.decode(latent)
>>> print(ang.shape, dih.shape, side_dih.shape)
(256, 472) (256, 471) (256, 316)
__init__(trajs: encodermap.TrajEnsemble, parameters: Optional[encodermap.ADCParameters] = None, model: Optional[tensorflow.keras.Model] = None, read_only: bool = False, cartesian_loss_step: int = 0, top: Optional[mdtraj.Topology] = None) → None

Instantiate the AngleDihedralCartesianEncoderMap class.

Parameters:
  • trajs (em.TrajEnsemble) – The trajectories to be used as input. If trajs contain no CVs, the required CVs will be loaded.

  • parameters (Optional[em.ADCParameters]) – The parameters for the current run. Can be set to None and the default parameters will be used. Defaults to None.

  • model (Optional[tf.keras.models.Model]) – The keras model to use. You can provide your own model with this argument. If set to None, the model will be built to the specifications of parameters using either the functional or sequential API. Defaults to None.

  • read_only (bool) – Whether to write anything to disk (False) or not (True). Defaults to False.

  • cartesian_loss_step (int, optional) – For loading and re-training the model. The cartesian_distance_loss is tuned in gradually, so the start step of the training needs to be accounted for. If the scale of the cartesian loss should increase from epoch 6 to epoch 12 and the model is saved at epoch 9, this argument should also be set to 9 to continue training with the correct scaling factor. Defaults to 0.

_setup_callbacks() → None

Overwrites the parent class’ _setup_callbacks method.

Due to the ‘soft start’ of the cartesian cost, the cartesian_increase_callback needs to be added to the list of callbacks.

encode(data=None)

Calls encoder part of model.

Parameters:

data (Union[np.ndarray, None], optional) – The data to be passed to the encoder part. Can be either numpy ndarray or None. If None is provided a set of 10000 points from the provided train data will be taken. Defaults to None.

Returns:

The output from the bottleneck/latent layer.

Return type:

np.ndarray

classmethod from_checkpoint(trajs, checkpoint_path, read_only=True, overwrite_tensorboard_bool=False)

Reconstructs the model from a checkpoint.

generate(points: np.ndarray, top: Optional[Union[str, int, mdtraj.Topology]] = None, backend: Literal['mdtraj', 'mdanalysis'] = 'mdtraj') → Union[MDAnalysis.Universe, mdtraj.Trajectory]

Overrides the parent class’ generate method and builds a trajectory.

Instead of just providing data to decode using the decoder part of the network, this method also takes a molecular topology as its top argument. This topology is then used to rebuild a time-resolved trajectory.

Parameters:
  • points (np.ndarray) – The low-dimensional points from which the trajectory should be rebuilt.

  • top (Optional[Union[str, int, mdtraj.Topology]]) – The topology to be used for rebuilding the trajectory. This should be a string pointing towards a <*.pdb, *.gro, *.h5> file. Alternatively, None can be provided, in which case the internal topology (self.top) of this class is used. Defaults to None.

  • backend (str) – Defines which MD python package is used to build the trajectory and therefore what type this method returns. Must be either “mdtraj” or “mdanalysis”.

Returns:

The trajectory after applying the decoded structural information. The type of this depends on the chosen backend parameter.

Return type:

Union[mdtraj.Trajectory, MDAnalysis.Universe]

static get_train_data_from_trajs(trajs, p, attr='CVs')
property loss

A list of loss functions passed to the model when it is compiled. When the main Autoencoder class is used and parameters.loss is ‘emap_cost’, this list comprises center_cost, regularization_cost, and auto_cost. When the EncoderMap subclass is used and parameters.loss is ‘emap_cost’, distance_cost is added to the list. When parameters.loss is not ‘emap_cost’, the loss can be either a string (e.g. ‘mse’) or a function; both are acceptable loss arguments when a keras model is compiled.

Type:

(Union[list, string, function])

save(step: Optional[int] = None) → None

Saves the model to the current path defined in parameters.main_path.

Parameters:

step (Optional[int]) – Does not actually save the model at the given training step, but rather changes the string used for saving the model from a datetime format to another.

train() → None

Overrides the parent class’ train method.

After the training is finished, an additional file is written to disk, which saves the current epoch. If training is continued later, the current state of the soft-start cartesian cost is read from that file.

class encodermap.Autoencoder(parameters=None, train_data: Optional[Union[np.ndarray, tf.data.Dataset]] = None, model=None, read_only=False, sparse=False)

Bases: object

Main Autoencoder class preparing data, setting up the neural network and implementing training.

This is the main class for neural networks inside EncoderMap. The class prepares the data (batching and shuffling), creates a tf.keras.Model of layers specified by the attributes of the encodermap.Parameters class. Depending on what Parent/Child-Class is instantiated a combination of cost functions is set up. Callbacks to Tensorboard are also set up.

train_data

The numpy array of the train data passed at init.

Type:

np.ndarray

p

An encodermap.Parameters instance containing all info needed to set up the network.

Type:

encodermap.Parameters

dataset

The dataset that is actually used in training the keras model. The dataset is a batched, shuffled, infinitely-repeating dataset.

Type:

tensorflow.data.Dataset

read_only

Variable telling the class whether it is allowed to write to disk (False) or not (True).

Type:

bool

optimizer

Instance of the Adam optimizer with learning rate specified by the Parameters class.

Type:

tf.keras.optimizers.Adam

metrics

A list of metrics passed to the model when it is compiled.

Type:

list

callbacks

A list of tf.keras.callbacks.Callback subclasses that change the behavior of the model during training. Some standard callbacks are always present, like:

  • encodermap.callbacks.callbacks.ProgressBar:

    A progress bar callback using tqdm giving the current progress of training and the current loss.

  • CheckPointSaver:

    A callback that saves the model every parameters.checkpoint_step steps into the main directory. This callback will only be used, when read_only is False.

  • TensorboardWriteBool:

    A callback that contains a boolean Tensor that will be True or False, depending on the current training step and the summary_step in the parameters class. The loss functions use this callback to decide whether they should write to Tensorboard. This callback will only be present, when read_only is False and parameters.tensorboard is True.

You can append your own callbacks to this list before executing Autoencoder.train().

Type:

list

encoder

The encoder (sub)model of model.

Type:

tf.keras.models.Model

decoder

The decoder (sub)model of model.

Type:

tf.keras.models.Model

from_checkpoint()

Rebuild the model from a checkpoint.

add_images_to_tensorboard()

Make tensorboard plot images.

train()

Starts the training of the tf.keras.models.Model.

plot_network()

Tries to plot the network. For this method to work, graphviz, pydot and pydotplus need to be installed.

encode()

Takes high-dimensional data and sends it through the encoder.

decode()

Takes low-dimensional data and sends it through the decoder.

generate()

Same as decode. For AngleDihedralCartesianEncoderMap classes this will build a protein structure.

Note

Performance of tensorflow depends not only on your system’s hardware and on how the data is presented to the network (for this check out https://www.tensorflow.org/guide/data_performance), but also on how tensorflow was compiled. Normal tensorflow (pip install tensorflow) is built without CPU extensions so that it works on many CPUs. However, tensorflow can greatly benefit from CPU instructions like AVX2 and AVX512, which can bring a speed-up of up to 300% in linear algebra computations. By building tensorflow from source you can activate these extensions. However, the CPU speed-up is dwarfed by the speed-up you get when you allow tensorflow to run on your GPU (graphics card). To check whether a GPU is available, run: print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU'))). Refer to these pages to install tensorflow for best performance: https://www.tensorflow.org/install/pip, https://www.tensorflow.org/install/gpu

Examples

>>> import encodermap as em
>>> # without providing any data, default parameters and a 4D hypercube as input data will be used.
>>> e_map = em.EncoderMap(read_only=True)
>>> print(e_map.train_data.shape)
(16000, 4)
>>> print(e_map.dataset)
<BatchDataset element_spec=(TensorSpec(shape=(None, 4), dtype=tf.float32, name=None), TensorSpec(shape=(None, 4), dtype=tf.float32, name=None))>
>>> print(e_map.encode(e_map.train_data).shape)
(16000, 2)
__init__(parameters=None, train_data: Optional[Union[np.ndarray, tf.data.Dataset]] = None, model=None, read_only=False, sparse=False)

Instantiate the Autoencoder class.

Parameters:
  • parameters (Union[encodermap.Parameters, None], optional) – The parameters to be used. If None is provided default values (check them with print(em.Parameters.defaults_description())) are used. Defaults to None.

  • train_data (Union[np.ndarray, tf.data.Dataset, None], optional) –

    The train data. Can be one of the following:

    • None: If None is provided, points on the edges of a 4-dimensional hypercube will be used as train data.

    • np.ndarray: If a numpy array is provided, it will be transformed into a batched tf.data.Dataset by first making it an infinitely repeating dataset, shuffling it, and then batching it with the batch size specified by parameters.batch_size.

    • tf.data.Dataset: If a dataset is provided, it will be used without any adjustments. Make sure that the dataset uses float32 as its dtype.

    Defaults to None.

  • model (Union[tf.keras.models.Model, None], optional) – Providing a keras model to this argument will make the Autoencoder/EncoderMap class use this model instead of the predefined ones. Make sure the model can accept EncoderMap’s loss functions. If None is provided the model will be built using the specifications in parameters. Defaults to None.

  • read_only (bool, optional) – Whether the class is allowed to write to disk (False) or not (True). Defaults to False and will allow the class to write to disk.
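For the case where train_data is None, the following sketch samples points uniformly along the 32 edges of a 4-dimensional unit hypercube; with 500 points per edge it reproduces the (16000, 4) shape printed in the example above. The sampling scheme (uniform positions along each edge of [0, 1]^4, no added noise) and the function name `hypercube_edge_points` are assumptions, not EncoderMap’s implementation.

```python
import itertools
from typing import Optional

import numpy as np


def hypercube_edge_points(
    dim: int = 4, points_per_edge: int = 500, seed: Optional[int] = None
) -> np.ndarray:
    """Hypothetical sampler for points on the edges of a unit hypercube."""
    rng = np.random.default_rng(seed)
    vertices = np.array(list(itertools.product([0.0, 1.0], repeat=dim)))
    chunks = []
    for vertex in vertices:
        for axis in range(dim):
            # An edge joins two vertices that differ in one coordinate; starting
            # only from the vertex with 0 on that axis visits every edge once
            if vertex[axis] == 0.0:
                pts = np.repeat(vertex[None, :], points_per_edge, axis=0)
                pts[:, axis] = rng.random(points_per_edge)
                chunks.append(pts)
    # float32, as recommended for datasets passed to the class
    return np.concatenate(chunks).astype(np.float32)


train_data = hypercube_edge_points(seed=0)
print(train_data.shape)  # (16000, 4)
```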

Raises:

BadError – When read_only is True and parameters.tensorboard is True, this Exception will be raised, because they are mutually exclusive.

_setup_callbacks()

Sets up a list of callbacks to be passed to self.model.fit().

add_images_to_tensorboard(data=None, image_step=None, scatter_kws={'s': 20}, hist_kws={'bins': 50}, additional_fns=None, when='epoch')

Adds images of the latent space to Tensorboard, computed from the data in data.

Parameters:
  • data (Union[np.ndarray, list, None], optional) – The input-data will be passed through the encoder part of the autoencoder. If None is provided a set of 10000 points from the provided train data will be taken. A list is needed for the functional API of the ADCAutoencoder, that takes a list of [angles, dihedrals, side_dihedrals]. Defaults to None.

  • image_step (Union[int, None], optional) – The interval in which to plot images to tensorboard. If None is provided, the update step will be the same as parameters.summary_step. Defaults to None.

  • scatter_kws (dict, optional) – A dict with items that matplotlib.pyplot.scatter() will accept. Defaults to {‘s’: 20}, which sets an appropriate size of scatter points for the size of datasets encodermap is usually used for.

  • hist_kws (dict, optional) – A dict with items that matplotlib’s histogram plotting function will accept. You can choose a colormap here. Defaults to {‘bins’: 50} which sets an appropriate bin count for the size of datasets encodermap is usually used for.

  • additional_fns (Union[list, None], optional) – A list of functions that will accept the low-dimensional output of the autoencoder’s latent/bottleneck layer and return a tf.Tensor that can be logged by tf.summary.image(). See the notebook ‘writing_custom_images_to_tensorboard.ipynb’ in tutorials/notebooks_customization for more info. If None is provided no additional functions will be used to plot to tensorboard. Defaults to None.

  • when (str, optional) – When to log the images. Either ‘batch’, in which case the images are logged after every training step, or ‘epoch’, in which case they are written only every image_step epochs. Defaults to ‘epoch’.

close()

Clears the current keras backend and frees up resources.

decode(data)

Calls the decoder part of the model.

AngleDihedralCartesianEncoderMap will, unlike the other two classes, output a tuple of data.

Parameters:

data (np.ndarray) – The data to be passed to the decoder part of the model. Make sure that the shape of the data matches the number of neurons in the latent space.

Returns:

The output from the decoder part.

Return type:

np.ndarray

property decoder

Decoder part of the model.

Type:

tf.keras.models.Model

encode(data=None)

Calls encoder part of model.

Parameters:

data (Union[np.ndarray, None], optional) – The data to be passed to the encoder part. Can be either numpy ndarray or None. If None is provided a set of 10000 points from the provided train data will be taken. Defaults to None.

Returns:

The output from the bottleneck/latent layer.

Return type:

np.ndarray

property encoder

Encoder part of the model.

Type:

tf.keras.models.Model

classmethod from_checkpoint(checkpoint_path, read_only=True, overwrite_tensorboard_bool=False, sparse=False)

Reconstructs the class from a checkpoint.

Parameters:
  • checkpoint_path (str) – The path to the checkpoint. Most models are saved in parts (encoder, decoder) and thus the provided path often needs a wildcard (*). The save() method of this class prints a string with which the model can be reloaded.

  • read_only (bool, optional) – Whether to reload the model in read_only mode (True) or allow the Autoencoder class to write to disk (False). This option might collide with the tensorboard Parameter in the respective parameters.json file in the main_path. Defaults to True.

  • overwrite_tensorboard_bool (bool, optional) – Whether to overwrite the tensorboard Parameter while reloading the class. This can be set to True to set the tensorboard parameter False and allow read_only. Defaults to False.

Raises:

BadError – When read_only is True, overwrite_tensorboard_bool is False and the reloaded parameters have tensorboard set to True.

Returns:

An EncoderMap Autoencoder instance.

Return type:

Autoencoder

generate(data)

Duplication of decode.

In Autoencoder and EncoderMap this method is equivalent to decode(). In AngleDihedralCartesianEncoderMap this method is overridden to produce output molecular conformations.

Parameters:

data (np.ndarray) – The data to be passed to the decoder part of the model. Make sure that the shape of the data matches the number of neurons in the latent space.

Returns:

The output from the decoder part.

Return type:

np.ndarray

property loss

A list of loss functions passed to the model when it is compiled. When the main Autoencoder class is used and parameters.loss is ‘emap_cost’, this list comprises center_cost, regularization_cost, and auto_cost. When the EncoderMap subclass is used and parameters.loss is ‘emap_cost’, distance_cost is added to the list. When parameters.loss is not ‘emap_cost’, the loss can be either a string (e.g. ‘mse’) or a function; both are acceptable loss arguments when a keras model is compiled.

Type:

(Union[list, string, function])

property model

The tf.keras.Model model used for training.

Type:

tf.keras.models.Model

plot_network()

Tries to plot the network using pydot, pydotplus and graphviz. Doesn’t raise an exception if plotting is not possible.

save(step=None)

Saves the model to the current path defined in parameters.main_path.

Parameters:

step (Union[int, None], optional) – Does not actually save the model at the given training step, but rather changes the string used for saving the model from a datetime format to another.

train()

Starts the training of the model.

class encodermap.EncoderMap(parameters=None, train_data: Optional[Union[np.ndarray, tf.data.Dataset]] = None, model=None, read_only=False, sparse=False)

Bases: Autoencoder

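Complete copy of the Autoencoder class, but uses an additional distance cost that compares high- and low-dimensional distances after transforming them with the SketchMap sigmoid functions (parameters.dist_sig_parameters).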
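The distance cost can be sketched by comparing sigmoid-transformed pairwise distances in input and latent space. This is a minimal NumPy sketch under stated assumptions: it uses the SketchMap switching function, takes the mean squared difference as the cost, and uses the (sig_h, a_h, b_h, sig_l, a_l, b_l) ordering and default values listed for dist_sig_parameters. The function names are hypothetical; EncoderMap computes its costs in tensorflow.

```python
import numpy as np


def sketchmap_sigmoid(r, sig, a, b):
    """SketchMap switching function applied element-wise to distances r."""
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sig) ** a) ** (-b / a)


def pairwise_distances(x):
    """Euclidean distance matrix of the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def distance_cost(high_d, low_d, sig_params=(4.5, 12, 6, 1, 2, 6)):
    """Assumed cost: mean squared difference of the sigmoid-transformed distance matrices."""
    sig_h, a_h, b_h, sig_l, a_l, b_l = sig_params
    s_high = sketchmap_sigmoid(pairwise_distances(high_d), sig_h, a_h, b_h)
    s_low = sketchmap_sigmoid(pairwise_distances(low_d), sig_l, a_l, b_l)
    return float(np.mean((s_high - s_low) ** 2))


rng = np.random.default_rng(0)
x_high = rng.normal(size=(10, 5))  # e.g. 10 samples with 5 input features
x_low = rng.normal(size=(10, 2))   # their 2D latent coordinates
print(distance_cost(x_high, x_low))
```

Because both sigmoids map distances into [0, 1), the cost stays in [0, 1] regardless of the absolute scale of either space.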

classmethod from_checkpoint(checkpoint_path, read_only=True, overwrite_tensorboard_bool=False, sparse=False)

Reconstructs the model from a checkpoint.

property loss

A list of loss functions passed to the model when it is compiled. When the main Autoencoder class is used and parameters.loss is ‘emap_cost’, this list comprises center_cost, regularization_cost, and auto_cost. When the EncoderMap subclass is used and parameters.loss is ‘emap_cost’, distance_cost is added to the list. When parameters.loss is not ‘emap_cost’, the loss can be either a string (e.g. ‘mse’) or a function; both are acceptable loss arguments when a keras model is compiled.

Type:

(Union[list, string, function])

class encodermap.EncoderMapBaseCallback(parameters: Optional[AnyParameters] = None)

Bases: Callback

Base class for multiple callbacks.

Can be used to implement new callbacks that can also use encodermap.Parameters classes. A counter is increased after a train batch is finished. Based on the two attributes summary_step and checkpoint_step in the encodermap.Parameters classes, the corresponding methods are called. Two attributes are important:

steps_counter

The current step counter. Increases every on_train_batch_end.

Type:

int

p

The parameters for this callback. Based on the summary_step and checkpoint_step of this parameters class, different methods are called.

Type:

Union[encodermap.Parameters, encodermap.ADCParameters]

__init__(parameters: Optional[AnyParameters] = None) → None

Instantiate the EncoderMapBaseCallback class.

Parameters:

parameters (Union[encodermap.Parameters, encodermap.ADCParameters, None], optional) – Parameters whose summary_step and checkpoint_step determine when the callback methods are called. If None is passed, default values (check them with print(em.ADCParameters.defaults_description())) will be used. Defaults to None.

on_checkpoint_step(step: int, logs: Optional[dict] = None) → None

Executed when the currently finished batch matches encodermap.Parameters.checkpoint_step.

Parameters:
  • step (int) – The number of the current step.

  • logs (Optional[dict]) – logs is a dict containing the metrics results.

on_summary_step(step: int, logs: Optional[dict] = None) → None

Executed when the currently finished batch matches encodermap.Parameters.summary_step.

Parameters:
  • step (int) – The number of the current step.

  • logs (Optional[dict]) – logs is a dict containing the metrics results.

on_train_batch_end(batch: int, logs: Optional[dict] = None) → None

Called after a batch ends. The batch number is provided by keras.

This method is the backbone of all of encodermap’s callbacks. Keras calls this method after every batch. When the number of that batch matches either encodermap.Parameters.summary_step or encodermap.Parameters.checkpoint_step, the code in self.on_summary_step or self.on_checkpoint_step is executed. These methods should be overridden by child classes.

Parameters:
  • batch (int) – The number of the current batch. Provided by keras.

  • logs (Optional[dict]) – logs is a dict containing the metrics results.
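The step bookkeeping described above can be sketched without tensorflow. `StepDispatcherSketch` and `RecordingCallback` are hypothetical stand-ins for EncoderMapBaseCallback and a child class; dispatching when the step counter is a multiple of summary_step or checkpoint_step is an assumption about what “matches” means. The child class overrides the two hooks, as the docstring suggests.

```python
from typing import List, Optional


class StepDispatcherSketch:
    """Hypothetical stand-in for EncoderMapBaseCallback (no tensorflow needed)."""

    def __init__(self, summary_step: int = 10, checkpoint_step: int = 5000) -> None:
        self.summary_step = summary_step
        self.checkpoint_step = checkpoint_step
        self.steps_counter = 0

    def on_train_batch_end(self, batch: int, logs: Optional[dict] = None) -> None:
        # Count finished batches, then dispatch (assumed: on multiples of the step sizes)
        self.steps_counter += 1
        if self.steps_counter % self.summary_step == 0:
            self.on_summary_step(self.steps_counter, logs)
        if self.steps_counter % self.checkpoint_step == 0:
            self.on_checkpoint_step(self.steps_counter, logs)

    def on_summary_step(self, step: int, logs: Optional[dict] = None) -> None:
        """Hook to be overridden by child classes."""

    def on_checkpoint_step(self, step: int, logs: Optional[dict] = None) -> None:
        """Hook to be overridden by child classes."""


class RecordingCallback(StepDispatcherSketch):
    """Child class that records the steps at which each hook fired."""

    def __init__(self, **kwargs: int) -> None:
        super().__init__(**kwargs)
        self.summaries: List[int] = []
        self.checkpoints: List[int] = []

    def on_summary_step(self, step: int, logs: Optional[dict] = None) -> None:
        self.summaries.append(step)

    def on_checkpoint_step(self, step: int, logs: Optional[dict] = None) -> None:
        self.checkpoints.append(step)


cb = RecordingCallback(summary_step=10, checkpoint_step=25)
for batch in range(50):  # simulate 50 finished training batches
    cb.on_train_batch_end(batch)
print(cb.summaries)    # [10, 20, 30, 40, 50]
print(cb.checkpoints)  # [25, 50]
```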

class encodermap.Featurizer(trajs, in_memory=True)

Bases: type

class encodermap.InteractivePlotting(autoencoder, trajs=None, data=None, ax=None, align_string='name CA', top=None, hist=False, scatter_kws={'s': 5}, ball_and_stick=False, top_index=0)

Bases: object

Class to open up an interactive plotting window.

Contains sub-classes to handle user-clickable menus and selectors.

trajs#

The trajs passed into this class.

Type:

encodermap.TrajEnsemble

fig#

The figure that is plotted onto. If ax is passed when this class is instantiated, the parent figure is fetched with self.fig = self.ax.get_figure().

Type:

matplotlib.figure.Figure

ax#

The axes on which the low-dimensional data of the trajs is plotted.

Type:

matplotlib.axes.Axes

menu_ax#

The axes on which the normal menu is plotted.

Type:

matplotlib.axes.Axes

status_menu_ax#

The axes on which the status menu is plotted.

Type:

matplotlib.axes.Axes

pts#

The plotted points. The color of this collection is adjusted based on other class attributes.

Type:

matplotlib.collections.Collection

statusmenu#

The menu containing the status buttons.

Type:

encodermap.plot.utils.StatusMenu

menu#

The menu containing the remaining buttons.

Type:

encodermap.plot.utils.Menu

tool#

The currently active tool used to select points. This can be lasso, polygon, etc.

Type:

encodermap.plot.utils.SelectFromCollection

mode#

Current mode of the statusmenu.

Type:

str

Examples

>>> import encodermap as em
>>> sess = em.InteractivePlotting(autoencoder, trajs=trajs)
__init__(autoencoder, trajs=None, data=None, ax=None, align_string='name CA', top=None, hist=False, scatter_kws={'s': 5}, ball_and_stick=False, top_index=0)[source]#

Instantiate the InteractivePlotting class.

Parameters:
  • trajs (encodermap.TrajEnsemble) – The trajs whose low-dimensional data should be plotted.

  • ax (matplotlib.axes.Axes, optional) – The axes to plot on. If no axes are provided, a new figure and axes will be created. Defaults to None.

accept()[source]#
bezier()[source]#
property cluster_zoomed#
ellipse()[source]#
lasso()[source]#
property mode#
on_click(event)[source]#

Decides whether the release event happened in the drawing area or the menu.

Parameters:

event (matplotlib.backend_bases.Event) – The event provided by figure.canvas.mpl_connect().

on_click_menu(event)[source]#

Chooses the function to call based on what MenuItem was clicked.

Parameters:

event (matplotlib.backend_bases.Event) – The event provided by figure.canvas.mpl_connect().

on_click_tool(event)[source]#

Left here for convenience if some tools need a button release event.

on_enter_ax(event)[source]#

Chooses the tool to use when self.ax is entered, based on the current mode.

Parameters:

event (matplotlib.backend_bases.Event) – The event provided by figure.canvas.mpl_connect().

on_leave_ax(event)[source]#

Disconnect the current tool.

path()[source]#
polygon()[source]#
rectangle()[source]#
render_move()[source]#
reset()[source]#

Called when ‘Reset’ is pressed.

set_points()[source]#

Called when ‘Set Points’ is pressed.

write()[source]#

Called when ‘Write’ is pressed.

class encodermap.MolData(atom_group, cache_path='', start=None, stop=None, step=None)[source]#

Bases: object

MolData is designed to extract and hold conformational information from trajectories.

Variables:
  • cartesians – numpy array of the trajectory atom coordinates

  • central_cartesians – cartesian coordinates of the central backbone atoms (N-CA-C-N-CA-C…)

  • dihedrals – all backbone dihedrals (phi, psi, omega)

  • angles – all bond angles of the central backbone atoms

  • lengths – all bond lengths between neighbouring central atoms

  • sidedihedrals – all sidechain dihedrals

  • aminoaciddict – number of sidechain dihedrals
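A backbone dihedral is the torsion angle defined by four consecutive central atoms (e.g. C-N-CA-C for phi, N-CA-C-N for psi). The sketch below computes one such angle from four positions with NumPy. The function name is hypothetical, and MolData’s own implementation, which builds on MDAnalysis, may compute it differently.

```python
import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Torsion angle (radians, in [-pi, pi]) defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)  # unit vector along the central bond
    # project the outer bonds onto the plane perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.arctan2(y, x))


# cis arrangement: angle close to 0
print(dihedral([1, 0, 0], [0, 0, 0], [0, 1, 0], [1, 1, 0]))
# trans arrangement: magnitude close to pi
print(abs(dihedral([1, 0, 0], [0, 0, 0], [0, 1, 0], [-1, 1, 0])))
```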

__init__(atom_group, cache_path='', start=None, stop=None, step=None)[source]#
Parameters:
  • atom_group – MDAnalysis atom group

  • cache_path – A path where the calculated variables can be cached.

  • start – first frame to analyze

  • stop – last frame to analyze

  • step – step size between analyzed frames

static sort_key(atom)[source]#
write(path, coordinates, name='generated', formats=('pdb', 'xtc'), only_central=False, align_reference=None, align_select='all')[source]#

Writes a trajectory for the given coordinates.

Parameters:
  • path – directory where to save the trajectory

  • coordinates – numpy array of xyz coordinates (frames, atoms, xyz)

  • name – filename (without extension)

  • formats – specify which formats should be used to write structure and trajectory. Default: (“pdb”, “xtc”)

  • only_central – if True only central atom coordinates are expected (N-Ca-C…)

  • align_reference – Allows aligning the generated conformations to a reference. The reference should be given as an MDAnalysis AtomGroup.

  • align_select – Selects which atoms should be used for the alignment, e.g. “resid 5:60”. Defaults to “all”. Have a look at the MDAnalysis selection syntax for more details.

Returns:

class encodermap.Parameters(**kwargs: Optional[Union[float, int, str, bool, list[int], list[str], list[float], tuple[int, None]]])[source]#

Bases: ParametersFramework

Class to hold Parameters for the Autoencoder

Parameters can be set via keyword args while instantiating the class, set as instance attributes or read from disk. This class can write parameters to disk in .yaml or .json format.

defaults#

Class variable dict that holds the defaults, even when the current values have changed.

Type:

dict

main_path#

Defines a main path where the parameters and other things might be stored.

Type:

str

n_neurons#

List containing the number of neurons for each layer up to the bottleneck layer. For example, [128, 128, 2] stands for an autoencoder with the architecture {i, 128, 128, 2, 128, 128, i}, where i is the number of dimensions of the input data. The input and output layers of size i are not trained.

Type:

list of int
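The architecture rule above can be written as a tiny helper. layer_sizes is a hypothetical name used only for illustration, and 50 stands in for an arbitrary input dimension i.

```python
def layer_sizes(n_input: int, n_neurons: list) -> list:
    """Full layer sizes of the symmetric autoencoder described by n_neurons."""
    # encoder: input -> n_neurons (ending at the bottleneck);
    # decoder: mirror of the encoder without repeating the bottleneck -> output
    return [n_input] + list(n_neurons) + list(n_neurons[-2::-1]) + [n_input]


print(layer_sizes(50, [128, 128, 2]))  # [50, 128, 128, 2, 128, 128, 50]
```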

activation_functions#

List of activation function names as implemented in TensorFlow, for example “relu”, “tanh”, “sigmoid”, or “” to use no activation function. The encoder part of the network takes the activation functions from the list starting with the second element. The decoder part of the network takes the activation functions in reversed order, starting with the second element from the back. For example, [“”, “relu”, “tanh”, “”] would result in an autoencoder with {“relu”, “tanh”, “”, “tanh”, “relu”, “”} as its sequence of activation functions.

Type:

list of str
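The slicing rule for activation_functions can be checked against the example above with plain list slicing. split_activations is a hypothetical helper, not part of encodermap.

```python
def split_activations(activation_functions: list) -> tuple:
    """Return (encoder, decoder) activation lists following the rule above."""
    encoder = activation_functions[1:]      # start with the second element
    decoder = activation_functions[-2::-1]  # reversed, starting second from the back
    return encoder, decoder


enc, dec = split_activations(["", "relu", "tanh", ""])
print(enc + dec)  # ['relu', 'tanh', '', 'tanh', 'relu', '']
```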

periodicity#

Defines the distance between periodic walls for the inputs. For example 2pi for angular values in radians. All periodic data processed by EncoderMap must be wrapped to one periodic window. E.g. data with 2pi periodicity may contain values from -pi to pi or from 0 to 2pi. Set the periodicity to float(“inf”) for non-periodic inputs.

Type:

float
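Since EncoderMap expects periodic data to be wrapped into one window already, a minimal sketch of such a wrap into [-periodicity/2, periodicity/2), i.e. [-pi, pi) for radians, may help. wrap is a hypothetical helper; with periodicity float(“inf”) the value is returned unchanged.

```python
import math


def wrap(x: float, periodicity: float = 2 * math.pi) -> float:
    """Map x into the window [-periodicity/2, periodicity/2)."""
    if math.isinf(periodicity):
        return x  # non-periodic input: leave unchanged
    half = periodicity / 2
    return (x + half) % periodicity - half


print(wrap(3 * math.pi / 2))    # approximately -pi/2
print(wrap(5.0, float("inf")))  # 5.0
```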

learning_rate#

Learning rate used by the optimizer.

Type:

float

n_steps#

Number of training steps.

Type:

int

batch_size#

Number of training points used in each training step

Type:

int

summary_step#

A summary for TensorBoard is written every summary_step steps.

Type:

int

checkpoint_step#

A checkpoint is written every checkpoint_step steps.

Type:

int

dist_sig_parameters#

Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l)

Type:

tuple of floats
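Assuming the distance cost uses the sketch-map sigmoid, each (sig, a, b) triple defines the function below. It equals 0 at r = 0, 0.5 at r = sig, and approaches 1 for large r. The function name is hypothetical; the values are taken from Parameters.defaults.

```python
def sketchmap_sigmoid(r: float, sig: float, a: float, b: float) -> float:
    """Sketch-map sigmoid: 1 - (1 + (2**(a/b) - 1) * (r/sig)**a) ** (-b/a)."""
    return 1 - (1 + (2 ** (a / b) - 1) * (r / sig) ** a) ** (-b / a)


# default dist_sig_parameters: (sig_h, a_h, b_h, sig_l, a_l, b_l)
sig_h, a_h, b_h, sig_l, a_l, b_l = (4.5, 12, 6, 1, 2, 6)
print(sketchmap_sigmoid(sig_h, sig_h, a_h, b_h))  # 0.5 (up to rounding)
print(sketchmap_sigmoid(sig_l, sig_l, a_l, b_l))  # 0.5 (up to rounding)
```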

distance_cost_scale#

Adjusts how much the distance based metric is weighted in the cost function.

Type:

int

auto_cost_scale#

Adjusts how much the autoencoding cost is weighted in the cost function.

Type:

int

auto_cost_variant#

Defines how the auto cost is calculated. Must be one of:

  • mean_square

  • mean_abs

  • mean_norm

Type:

str
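One plausible reading of the three variants is sketched below with NumPy. The reductions are inferred from the variant names rather than taken from encodermap’s source, auto_cost is a hypothetical name, and periodicity is ignored here for simplicity.

```python
import numpy as np


def auto_cost(inputs: np.ndarray, outputs: np.ndarray, variant: str = "mean_abs") -> float:
    """Hypothetical reconstruction cost for the three auto_cost_variant values."""
    diff = outputs - inputs  # reconstruction error, shape (n_samples, n_features)
    if variant == "mean_square":
        return float(np.mean(np.square(diff)))
    if variant == "mean_abs":
        return float(np.mean(np.abs(diff)))
    if variant == "mean_norm":
        # mean over samples of the Euclidean norm of each sample's error vector
        return float(np.mean(np.linalg.norm(diff, axis=1)))
    raise ValueError(f"Unknown auto_cost_variant: {variant!r}")


x = np.zeros((2, 2))
y = np.array([[3.0, 4.0], [0.0, 0.0]])
for v in ("mean_square", "mean_abs", "mean_norm"):
    print(v, auto_cost(x, y, v))
```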

center_cost_scale#

Adjusts how much the centering cost is weighted in the cost function.

Type:

float

l2_reg_constant#

Adjusts how much the L2 regularisation is weighted in the cost function.

Type:

float

gpu_memory_fraction#

Specifies the fraction of GPU memory blocked. If set to 0, memory is allocated as needed.

Type:

float

analysis_path#

A path that can be used to store analysis results.

Type:

str

id#

Can be any name for the run. Might be useful for example for specific analysis for different data sets.

Type:

str

model_api#

A string defining the API to be used to build the keras model. Defaults to sequential. Possible strings are:

  • functional will use keras’ functional API.

  • sequential will define a keras Model containing two other models, encoder and decoder, built with the Sequential API.

  • custom will create a custom Model where even the layers are custom.

Type:

str

loss#

A string defining the loss function. Defaults to emap_cost. Possible losses are:

  • reconstruction_loss will try to train output == input.

  • mse returns a mean squared error loss.

  • emap_cost is the EncoderMap loss function. Depending on the class (Autoencoder, EncoderMap, ACDAutoencoder), different contributions are used for a combined loss. Autoencoder uses auto_cost, reg_cost and center_cost. The EncoderMap class adds sigmoid_loss.

Type:

str

batched#

Whether the dataset is batched or not.

Type:

bool

training#

A string defining what kind of training is performed when autoencoder.train() is called:

  • auto does a regular model.compile() and model.fit() procedure.

  • custom uses gradient tape and calculates losses and gradients manually.

Type:

str

tensorboard#

Whether to print tensorboard information. Defaults to False.

Type:

bool

seed#

Fixes the state of all operations using random numbers. Defaults to None.

Type:

Union[int, None]

Examples

>>> import encodermap as em
>>> parameters = em.Parameters()
>>> parameters.auto_cost_variant
'mean_abs'
>>> parameters.save(path='/path/to/dir')
/path/to/dir/parameters.json
>>> # alternative constructor
>>> new_params = em.Parameters.from_file('/path/to/dir/parameters.json')
>>> new_params.main_path
/path/to/dir/parameters.json
__init__(**kwargs: Optional[Union[float, int, str, bool, list[int], list[str], list[float], tuple[int, None]]]) None[source]#

Instantiate the Parameters class

Takes keyword arguments as input and overwrites the class defaults. The values are stored directly and can be accessed as instance attributes.

Parameters:

**kwargs (dict) – Dict containing values. If unknown keys are passed, they will be dropped.
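The override behaviour described above (kwargs replace defaults, unknown keys are dropped) can be sketched as follows. ParametersSketch is a hypothetical stand-in that uses only a subset of the real defaults and is not how encodermap.Parameters is implemented.

```python
class ParametersSketch:
    """Hypothetical stand-in: kwargs override class defaults, unknown keys are dropped."""

    defaults = {"learning_rate": 0.001, "n_steps": 100000, "batch_size": 256}

    def __init__(self, **kwargs) -> None:
        # start from a copy so the class-level defaults stay untouched
        values = dict(self.defaults)
        values.update({k: v for k, v in kwargs.items() if k in self.defaults})
        self.__dict__.update(values)  # expose every value as an instance attribute


p = ParametersSketch(n_steps=500, not_a_parameter=1)
print(p.n_steps, p.learning_rate, hasattr(p, "not_a_parameter"))  # 500 0.001 False
```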

activation_functions: list[str]#
defaults = {'activation_functions': ['', 'tanh', 'tanh', ''], 'analysis_path': '', 'auto_cost_scale': 1, 'auto_cost_variant': 'mean_abs', 'batch_size': 256, 'batched': True, 'center_cost_scale': 0.0001, 'checkpoint_step': 5000, 'dist_sig_parameters': (4.5, 12, 6, 1, 2, 6), 'distance_cost_scale': 500, 'gpu_memory_fraction': 0, 'id': '', 'l2_reg_constant': 0.001, 'learning_rate': 0.001, 'loss': 'emap_cost', 'model_api': 'sequential', 'n_neurons': [128, 128, 2], 'n_steps': 100000, 'periodicity': 6.283185307179586, 'seed': None, 'summary_step': 10, 'tensorboard': False, 'training': 'auto'}#
classmethod defaults_description() str[source]#

str: A string that contains tabulated default parameter values.

n_neurons: list[int]#
class encodermap.Repository(repo_source='data/repository.yaml', checksum_file='data/repository.md5', ignore_checksums=False, debug=True)[source]#

Bases: object

Main class to work with repositories of MD data and to download the data.

This class handles the download of files from a repository source. All data are obtained from a .yaml file (default at data/repository.yaml), which contains trajectory files and topology files organized in a readable manner. With this class, the repository.yaml file can be queried using unix-like file patterns. Files can be downloaded on-the-fly (if they already exist, they won’t be downloaded again). Besides single files, full projects can be downloaded and rebuilt.

current_path#

Path of the .py file containing this class. If no working directory is given (None), all files will be downloaded to a directory named ‘data’ (will be created) which will be placed in the directory of this .py file.

Type:

str

url#

The url to the current repo source.

Type:

str

maintainer#

The maintainer of the current repo source.

Type:

str

files_dict#

A dictionary summarizing the files in this repo. dict keys are built from ‘project_name’ + ‘filetype’. So for a project called ‘protein_sim’, possible keys are ‘protein_sim_trajectory’, ‘protein_sim_topology’, ‘protein_sim_log’. The values of these keys are all str and they give the actual filename of the files. If ‘protein_sim’ was conducted with GROMACS, these files would be ‘traj_comp.xtc’, ‘confout.gro’ and ‘md.log’.

Type:

dict

files#

Just a list of str of all downloadable files.

Type:

list

data#

The main organization of the repository. This is the complete .yaml file as it was read and returned by pyyaml.

Type:

dict

Examples

>>> import encodermap as em
>>> repo = em.Repository()
>>> print(repo.search('*PFFP_sing*')) 
{'PFFP_single_trajectory': 'PFFP_single.xtc', 'PFFP_single_topology': 'PFFP_single.gro', 'PFFP_single_input': 'PFFP.mdp', 'PFFP_single_log': 'PFFP.log'}
>>> print(repo.url)
http://134.34.112.158
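The search output above suggests that the pattern is matched against the keys of files_dict (the value ‘PFFP.mdp’ would not match ‘*PFFP_sing*’). Assuming that, the lookup can be sketched with the standard-library fnmatch module. The function is hypothetical and the FPPF entry is made up for illustration.

```python
from fnmatch import fnmatchcase


def search(files_dict: dict, pattern: str) -> dict:
    """Hypothetical sketch: keep entries whose key matches a unix-like pattern."""
    return {key: value for key, value in files_dict.items() if fnmatchcase(key, pattern)}


files_dict = {
    "PFFP_single_trajectory": "PFFP_single.xtc",
    "PFFP_single_topology": "PFFP_single.gro",
    "FPPF_single_trajectory": "FPPF_single.xtc",
}
print(search(files_dict, "*PFFP_sing*"))
```

fnmatchcase is used instead of fnmatch so that matching is case-sensitive on every operating system.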
__init__(repo_source='data/repository.yaml', checksum_file='data/repository.md5', ignore_checksums=False, debug=True)[source]#

Initialize the repository.

Parameters:
  • repo_source (str) – The source .yaml file to build the repository from. Defaults to ‘data/repository.yaml’.

  • checksum_file (str) – A file containing the md5 hash of the repository file. This ensures no one tampers with the repository.yaml file and injects malicious code. Defaults to ‘data/repository.md5’.

  • ignore_checksums (bool) – If you want to ignore the checksum check of the repo_source file, set this to True. Can be useful during development, when the repository.yaml file undergoes a lot of changes. Defaults to False.

  • debug (bool, optional) – Whether to print debug info. Defaults to True.
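The checksum protection described for checksum_file can be sketched with hashlib. This assumes the checksum file starts with the hex md5 digest of repository.yaml (an md5sum-style “digest  filename” line also works); checksum_matches is a hypothetical helper, not encodermap’s code.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum_matches(repo_source: str, checksum_file: str) -> bool:
    """Hypothetical check: md5 of repo_source vs. hex digest stored in checksum_file."""
    digest = hashlib.md5(Path(repo_source).read_bytes()).hexdigest()
    # assumption: the first whitespace-separated token is the expected digest
    expected = Path(checksum_file).read_text().split()[0]
    return digest == expected


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "repository.yaml"
    md5 = Path(tmp) / "repository.md5"
    repo.write_text("projects: {}\n")
    md5.write_text(hashlib.md5(repo.read_bytes()).hexdigest() + "\n")
    print(checksum_matches(str(repo), str(md5)))  # True
    repo.write_text("projects: {tampered: true}\n")
    print(checksum_matches(str(repo), str(md5)))  # False
```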

_get_connection()[source]#

Also for compatibility with mdshare.

static _split_proj_filetype(proj_filetype)[source]#

Splits the strings that index the self.datasets dictionary.

property catalogue#

Returns the underlying catalogue data.

Type:

dict

property datasets#

A set of datasets in this repository. A dataset can either be characterized by a set of trajectory-, topology-, log- and input-file or a dataset is a .tar.gz container, which contains all necessary files.

Type:

set

fetch(remote_filenames, working_directory=None, overwrite=False, max_attempts=3, makdedir=False, progress_bar=True)[source]#

This fetches a singular file from self.files.

Also displays a progress bar with the name of the file. Uses requests.

Parameters:
  • remote_filenames (str) – The name of the remote file. Check self.files for more info.

  • working_directory (Union[str, None], optional) – Can be a string to a directory to save the files at. Can also be None. In that case, self.current_path + ‘/data’ will be used to save the files at, with self.current_path retrieved by inspect.getfile(inspect.currentframe()). If the files are already there and overwrite is False, the file path is simply returned. Defaults to None.

  • overwrite (bool, optional) – Whether to overwrite local files. Defaults to False.

  • max_attempts (int, optional) – Number of download attempts. Defaults to 3.

  • makdedir (bool, optional) – Whether to create working_directory if it does not already exist. Defaults to False.

  • progress_bar (bool, optional) – Uses the package progress-reporter to display a progress bar.

Returns:

A tuple containing the following:

  • list: A list of files that have just been downloaded.

  • str: A string leading to the directory the files have been downloaded to.

Return type:

tuple

get_sizes(pattern)[source]#

Returns a list of file-sizes of a given pattern.

Parameters:

pattern (Union[str, list]) – A unix-like pattern (‘traj*.xtc’) or a list of files ([‘traj_1.xtc’, ‘traj_2.xtc’]).

Returns:

A list of filesizes in bytes.

Return type:

list

load_project(project, working_directory=None, overwrite=False, max_attempts=3, makdedir=False, progress_bar=True)[source]#

This will return TrajEnsemble / SingleTraj objects that are correctly formatted.

This method allows one to directly rebuild projects from the repo source, using encodermap’s own SingleTraj and TrajEnsemble classes.

Parameters:
  • project (str) – The name of the project to be loaded. See Repository.projects.keys() for a list of projects.

  • working_directory (Union[str, None], optional) – Can be a string to a directory to save the files at. Can also be None. In that case, self.current_path + ‘/data’ will be used to save the files at, with self.current_path retrieved by inspect.getfile(inspect.currentframe()). If the files are already there and overwrite is False, the file path is simply returned. Defaults to None.

  • overwrite (bool, optional) – Whether to overwrite local files. Defaults to False.

  • max_attempts (int, optional) – Number of download attempts. Defaults to 3.

  • makdedir (bool, optional) – Whether to create working_directory if it does not already exist. Defaults to False.

  • progress_bar (bool, optional) – Uses the package progress-reporter to display a progress bar.

Returns:

The project already loaded into encodermap’s SingleTraj or TrajEnsemble classes.

Return type:

Union[encodermap.SingleTraj, encodermap.TrajEnsemble]

Examples

>>> import encodermap as em
>>> repo = em.Repository()
>>> trajs = repo.load_project('Tetrapeptides_Single')
>>> print(trajs)
encodermap.TrajEnsemble object. Current backend is no_load. Containing 2 trajs. Common str is ['PFFP', 'FPPF']. Not containing any CVs.
>>> print(trajs.n_trajs)
2
lookup(file)[source]#

Piece of code to allow some compatibility with mdshare.

The complete self.data dictionary will be traversed to find file and its location in the self.data dictionary. This will be used to get the filesize and its md5 hash. The returned tuple also tells whether the file is a .tar.gz container or not. In the case of a container, the container needs to be extracted using tarfile.

Parameters:

file (str) – The file to search for.

Returns:

A tuple containing the following:

  • str: A string that is either ‘container’ or ‘index’ (for normal files).

  • dict: A dict with dict(file=filename, hash=filehash, size=filesize).

Return type:

tuple

print_catalogue()[source]#

Prints the catalogue nicely formatted.

property projects#

A dictionary containing project names and their associated files. Projects are a larger collection of individual sims, that belong together. The project names are the dictionary’s keys, the files are given as lists in the dict’s values.

Type:

dict

search(pattern)[source]#
stack(pattern)[source]#

Creates a stack to prepare for downloads.

Parameters:

pattern (Union[str, list]) – A unix-like pattern (‘traj*.xtc’) or a list of files ([‘traj_1.xtc’, ‘traj_2.xtc’]).

Returns:

A list of dicts. Each dict contains the filename, the size and a boolean value telling whether the downloaded file needs to be extracted after downloading.

Return type:

list

encodermap.function(debug=False)[source]#
encodermap.load(trajs: Union[str, md.Trajectory, Sequence[str], Sequence[md.Trajectory]], tops: Optional[Union[str, md.Topology, Sequence[str], Sequence[md.Topology]]] = None, common_str: Optional[str, list[str]] = None, backend: Literal['no_load', 'mdtraj'] = 'no_load', index: Optional[Union[int, np.ndarray, list[int], slice]] = None, traj_num: Optional[int] = None, basename_fn: Optional[Callable] = None) Union[SingleTraj, TrajEnsemble][source]#

Encodermap’s forward-facing function for working with MD data of one or more trajectories.

Based on what’s provided for trajs, you either get a SingleTraj object that collects information about a single traj, or a TrajEnsemble object that contains information about multiple trajectories (even with different topologies).

Parameters:
  • trajs (Union[str, md.Trajectory, Sequence[str], Sequence[md.Trajectory], Sequence[SingleTraj]]) – Here, you can provide a single string pointing to a trajectory on your computer (e.g. /path/to/traj_file.xtc or /path/to/protein.pdb) or a list of such strings. In the former case, you will get a SingleTraj object, which is encodermap’s way of storing data (positions, CVs, times) of a single trajectory. In the latter case, you will get a TrajEnsemble object, which is Encodermap’s way of working with multiple SingleTrajs.

  • tops (Optional[Union[str, md.Topology, Sequence[str], Sequence[md.Topology]]]) – For this argument, you can provide the topology(ies) of the corresponding traj(s). Trajectory file formats like .xtc and .dcd only store atomic positions and not weights, elements, or bonds. That’s what the tops argument is for. Some trajectory file formats (MDTraj HDF5, AMBER netCDF4) store both trajectory and topology in a single file, and .pdb files can be used the same way. If you provide such files for trajs, you can leave tops as None. If you provide multiple files for trajs, you can still provide a single tops file if the trajs in trajs share the same topology. If that is not the case, you can either provide a list of topologies, matched to the trajs in trajs, or use the common_str argument to match them. Defaults to None.

  • common_str (Optional[str, list[str]]) –

    If you provided a different number of trajs and tops, this argument is used to match them. Let’s say, you have 5 trajectories of a wild type protein and 5 trajectories of a mutant. If the path to these files is somewhat consistent (e.g:

    • /path/to/wt/traj1.xtc

    • /different/path/to/wt/traj_no_water.xtc

    • /data/path/to/mutant/traj0.xtc

    • /data/path/to/mutant/traj0.xtc

    ), you can provide [‘wt’, ‘mutant’] for the common_str argument and the files are grouped based on the occurrence of ‘wt’ and ‘mutant’ in their file paths. Defaults to None.

  • backend (Literal["no_load", "mdtraj"]) – Normally, encodermap postpones the actual loading of the atomic positions until you really need them. This accelerates the handling of large trajectory ensembles. If ‘mdtraj’ is chosen as the backend, all atomic positions are always loaded, taking up space in your system memory, but accessing positions in a non-sequential fashion is faster. Defaults to ‘no_load’.

  • index (Optional[Union[int, np.ndarray, list[int], slice]]) –

    Only used if argument trajs is a single trajectory. This argument can be used to index the trajectory data. If you want to exclude the first 100 frames of your trajectory, because the protein relaxes from its crystal structure, you can load it like so:

    em.load(traj_file, top_file, index=slice(100, None))

    As encodermap lazily evaluates positional data, the slice(100, None) argument is stored until the data is accessed, at which point the first 100 frames are not accessible, just as if you had deleted them. Note that slice(100) would instead keep only the first 100 frames. Besides a slice, you can also provide an int (which returns a single frame at the requested index) and lists of int (which return frames at the locations indexed by the ints in the list). If None is provided, the trajectory data is not sliced/subsampled. Defaults to None.

  • traj_num (Optional[int]) –

    Only used if argument trajs is a single trajectory. This argument is meant to organize the SingleTraj trajectories in a TrajEnsemble class. Of course, you can build your own TrajEnsemble from a list of SingleTraj objects and provide this list as the trajs argument to em.load(). In this case, you need to set the traj_num of each SingleTraj yourself. Defaults to None.

  • basename_fn (Optional[Callable]) – A function to apply to the traj_file string to return the basename of the trajectory. If None is provided, the filename without extension will be used. When all files are named the same and the folder they’re in defines the name of the trajectory, you can supply lambda x: x.split('/')[-2] as this argument. Defaults to None.
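The path handling behind common_str and basename_fn can be sketched with plain string operations. group_by_common_str, default_basename and folder_basename are hypothetical helpers. Plain substring matching (first match wins) is an assumption about how an occurrence of a common_str is detected; em.load() itself may match differently.

```python
import os


def group_by_common_str(paths: list, common_str: list) -> dict:
    """Hypothetical grouping: assign each path to the first common_str it contains."""
    groups = {cs: [] for cs in common_str}
    for path in paths:
        for cs in common_str:
            if cs in path:
                groups[cs].append(path)
                break
    return groups


def default_basename(path: str) -> str:
    """Filename without extension, as described for basename_fn=None."""
    return os.path.splitext(os.path.basename(path))[0]


def folder_basename(path: str) -> str:
    """Custom basename_fn: name the trajectory after its parent folder."""
    return path.split("/")[-2]


trajs = ["/path/to/wt/traj1.xtc", "/data/path/to/mutant/traj0.xtc"]
tops = ["/path/to/wt/top.pdb", "/data/path/to/mutant/top.pdb"]
print(group_by_common_str(trajs + tops, ["wt", "mutant"]))
print([default_basename(t) for t in trajs])  # ['traj1', 'traj0']
print([folder_basename(t) for t in trajs])   # ['wt', 'mutant']
```

Naive substring matching can misfire when a common_str also occurs elsewhere in a path, so short strings like ‘wt’ are best chosen to be unambiguous within your directory layout.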

Examples

>>> # load a pdb file with 14 frames from rcsb.org
>>> import encodermap as em
>>> traj = em.load("https://files.rcsb.org/view/1GHC.pdb")
>>> print(traj)
encodermap.SingleTraj object. Current backend is no_load. Basename is 1GHC. Not containing any CVs.
>>> traj.n_frames
14
>>> # load multiple trajs
>>> trajs = em.load(['https://files.rcsb.org/view/1YUG.pdb', 'https://files.rcsb.org/view/1YUF.pdb'])
>>> # trajs are internally numbered
>>> print([traj.traj_num for traj in trajs])