encodermap package
Subpackages
- encodermap.autoencoder package
- Submodules
- encodermap.autoencoder.autoencoder module
AngleDihedralCartesianEncoderMap
AngleDihedralCartesianEncoderMap.__init__()
AngleDihedralCartesianEncoderMap._setup_callbacks()
AngleDihedralCartesianEncoderMap.encode()
AngleDihedralCartesianEncoderMap.from_checkpoint()
AngleDihedralCartesianEncoderMap.generate()
AngleDihedralCartesianEncoderMap.get_train_data_from_trajs()
AngleDihedralCartesianEncoderMap.loss
AngleDihedralCartesianEncoderMap.save()
AngleDihedralCartesianEncoderMap.train()
Autoencoder
Autoencoder.train_data
Autoencoder.p
Autoencoder.dataset
Autoencoder.read_only
Autoencoder.optimizer
Autoencoder.metrics
Autoencoder.callbacks
Autoencoder.encoder
Autoencoder.decoder
Autoencoder.from_checkpoint()
Autoencoder.add_images_to_tensorboard()
Autoencoder.train()
Autoencoder.plot_network()
Autoencoder.encode()
Autoencoder.decode()
Autoencoder.generate()
Autoencoder.__init__()
Autoencoder._setup_callbacks()
Autoencoder.add_images_to_tensorboard()
Autoencoder.close()
Autoencoder.decode()
Autoencoder.decoder
Autoencoder.encode()
Autoencoder.encoder
Autoencoder.from_checkpoint()
Autoencoder.generate()
Autoencoder.loss
Autoencoder.model
Autoencoder.plot_network()
Autoencoder.save()
Autoencoder.train()
DihedralEncoderMap
EncoderMap
- Module contents
- encodermap.callbacks package
- encodermap.data package
- encodermap.encodermap_tf1 package
- Submodules
- encodermap.encodermap_tf1.angle_dihedral_cartesian_encodermap module
AngleDihedralCartesianEncoderMap
AngleDihedralCartesianEncoderMap.__init__()
AngleDihedralCartesianEncoderMap._angle_cost()
AngleDihedralCartesianEncoderMap._cartesian_cost()
AngleDihedralCartesianEncoderMap._cartesian_distance_cost()
AngleDihedralCartesianEncoderMap._dihedral_cost()
AngleDihedralCartesianEncoderMap._distance_cost()
AngleDihedralCartesianEncoderMap._prepare_data()
AngleDihedralCartesianEncoderMap._setup_cost()
AngleDihedralCartesianEncoderMap._setup_network()
AngleDihedralCartesianEncoderMap.generate()
AngleDihedralCartesianEncoderMapDummy
- encodermap.encodermap_tf1.autoencoder module
Autoencoder
Autoencoder.__init__()
Autoencoder._auto_cost()
Autoencoder._center_cost()
Autoencoder._encode()
Autoencoder._generate()
Autoencoder._l2_reg_cost()
Autoencoder._prepare_data()
Autoencoder._random_batch()
Autoencoder._setup_cost()
Autoencoder._setup_data_iterator()
Autoencoder._setup_network()
Autoencoder._step()
Autoencoder.close()
Autoencoder.encode()
Autoencoder.generate()
Autoencoder.profile()
Autoencoder.train()
- encodermap.encodermap_tf1.backmapping module
- encodermap.encodermap_tf1.encodermap module
- encodermap.encodermap_tf1.misc module
- encodermap.encodermap_tf1.moldata module
- encodermap.encodermap_tf1.parameters module
- encodermap.encodermap_tf1.plot module
ManualPath
ManualPath.__init__()
ManualPath._add_point_interp()
ManualPath._free_draw()
ManualPath._free_draw_callback()
ManualPath._grab_background()
ManualPath._interpolate()
ManualPath._on_click()
ManualPath._on_key()
ManualPath._reset_lines()
ManualPath._update_interp()
ManualPath._update_lines()
ManualPath.use_points()
PathGenerateCartesians
PathGenerateDihedrals
PathSelect
distance_histogram()
- Module contents
- encodermap.loading package
- Submodules
- encodermap.loading.dask_featurizer module
- encodermap.loading.delayed module
- encodermap.loading.features module
- encodermap.loading.featurizer module
- encodermap.loading.pipeline module
- encodermap.loading.utils module
- Module contents
- encodermap.loss_functions package
- encodermap.misc package
- Submodules
- encodermap.misc.backmapping module
- encodermap.misc.clustering module
- encodermap.misc.distances module
- encodermap.misc.errors module
- encodermap.misc.function_def module
- encodermap.misc.misc module
- encodermap.misc.saving_loading_models module
- encodermap.misc.summaries module
- encodermap.misc.transformations module
- encodermap.misc.xarray module
- encodermap.misc.xarray_save_wrong_hdf5 module
- Module contents
- encodermap.models package
- encodermap.moldata package
- encodermap.parameters package
- Submodules
- encodermap.parameters.parameters module
ADCParameters
ADCParameters.cartesian_pwd_start
ADCParameters.cartesian_pwd_stop
ADCParameters.cartesian_pwd_step
ADCParameters.use_backbone_angles
ADCParameters.use_sidechains
ADCParameters.angle_cost_scale
ADCParameters.angle_cost_variant
ADCParameters.angle_cost_reference
ADCParameters.dihedral_cost_scale
ADCParameters.dihedral_cost_variant
ADCParameters.dihedral_cost_reference
ADCParameters.side_dihedral_cost_scale
ADCParameters.side_dihedral_cost_variant
ADCParameters.side_dihedral_cost_reference
ADCParameters.cartesian_cost_scale
ADCParameters.cartesian_cost_scale_soft_start
ADCParameters.cartesian_cost_variant
ADCParameters.cartesian_cost_reference
ADCParameters.cartesian_dist_sig_parameters
ADCParameters.cartesian_distance_cost_scale
ADCParameters.__init__()
ADCParameters.activation_functions
ADCParameters.defaults
ADCParameters.defaults_description()
ADCParameters.n_neurons
Parameters
Parameters.defaults
Parameters.main_path
Parameters.n_neurons
Parameters.activation_functions
Parameters.periodicity
Parameters.learning_rate
Parameters.n_steps
Parameters.batch_size
Parameters.summary_step
Parameters.checkpoint_step
Parameters.dist_sig_parameters
Parameters.distance_cost_scale
Parameters.auto_cost_scale
Parameters.auto_cost_variant
Parameters.center_cost_scale
Parameters.l2_reg_constant
Parameters.gpu_memory_fraction
Parameters.analysis_path
Parameters.id
Parameters.model_api
Parameters.loss
Parameters.batched
Parameters.training
Parameters.tensorboard
Parameters.seed
Parameters.__init__()
Parameters.activation_functions
Parameters.defaults
Parameters.defaults_description()
Parameters.n_neurons
_datetime_windows_and_linux_compatible()
- Module contents
- encodermap.plot package
- Submodules
- encodermap.plot.interactive_plotting module
InteractivePlotting
InteractivePlotting.trajs
InteractivePlotting.fig
InteractivePlotting.ax
InteractivePlotting.menu_ax
InteractivePlotting.status_menu_ax
InteractivePlotting.pts
InteractivePlotting.statusmenu
InteractivePlotting.menu
InteractivePlotting.tool
InteractivePlotting.mode
InteractivePlotting.__init__()
InteractivePlotting.accept()
InteractivePlotting.bezier()
InteractivePlotting.cluster_zoomed
InteractivePlotting.ellipse()
InteractivePlotting.lasso()
InteractivePlotting.mode
InteractivePlotting.on_click()
InteractivePlotting.on_click_menu()
InteractivePlotting.on_click_tool()
InteractivePlotting.on_enter_ax()
InteractivePlotting.on_leave_ax()
InteractivePlotting.path()
InteractivePlotting.polygon()
InteractivePlotting.rectangle()
InteractivePlotting.render_move()
InteractivePlotting.reset()
InteractivePlotting.set_points()
InteractivePlotting.write()
- encodermap.plot.jinja_template module
- encodermap.plot.plotting module
- encodermap.plot.utils module
Bernstein()
Bezier()
BezierBuilder
DummyTool
Menu
MenuItem
ModeButton
Props
SelectFromCollection
StatusMenu
_check_all_templates_defined()
_create_readme()
_get_system_info()
_match_tops_and_trajs()
_unpack_cluster_info()
_unpack_path_info()
abc_to_rgb()
calculate_dssps()
correct_missing_uniques()
digitize_dssp()
- Module contents
- encodermap.trajinfo package
- Submodules
- encodermap.trajinfo.hash_files module
- encodermap.trajinfo.info_all module
TrajEnsemble
TrajEnsemble.CVs
TrajEnsemble._CVs
TrajEnsemble.n_trajs
TrajEnsemble.n_frames
TrajEnsemble.locations
TrajEnsemble.top
TrajEnsemble.basenames
TrajEnsemble.name_arr
TrajEnsemble.index_arr
TrajEnsemble.CVs
TrajEnsemble.CVs_in_file
TrajEnsemble._CVs
TrajEnsemble.__add__()
TrajEnsemble.__init__()
TrajEnsemble._pyemma_indexing()
TrajEnsemble._return_trajs_by_index()
TrajEnsemble._string_summary()
TrajEnsemble.basenames
TrajEnsemble.frames
TrajEnsemble.from_textfile()
TrajEnsemble.from_xarray()
TrajEnsemble.get_single_frame()
TrajEnsemble.id
TrajEnsemble.index_arr
TrajEnsemble.iterframes()
TrajEnsemble.itertrajs()
TrajEnsemble.load_CVs()
TrajEnsemble.load_trajs()
TrajEnsemble.locations
TrajEnsemble.n_frames
TrajEnsemble.n_residues
TrajEnsemble.n_trajs
TrajEnsemble.name_arr
TrajEnsemble.save()
TrajEnsemble.save_CVs()
TrajEnsemble.split_into_frames()
TrajEnsemble.subsample()
TrajEnsemble.top
TrajEnsemble.top_files
TrajEnsemble.traj_files
TrajEnsemble.traj_joined
TrajEnsemble.traj_nums
TrajEnsemble.unload()
TrajEnsemble.xyz
_can_be_feature()
_datetime_windows_and_linux_compatible()
- encodermap.trajinfo.info_single module
SingleTraj
SingleTraj.backend
SingleTraj.common_str
SingleTraj.index
SingleTraj.traj_num
SingleTraj.traj_file
SingleTraj.top_file
SingleTraj.CVs
SingleTraj.CVs_in_file
SingleTraj.__add__()
SingleTraj.__enter__()
SingleTraj.__eq__()
SingleTraj.__exit__()
SingleTraj.__getattr__()
SingleTraj.__getitem__()
SingleTraj.__init__()
SingleTraj.__iter__()
SingleTraj.__reversed__()
SingleTraj._add_along_traj()
SingleTraj._gen_ensemble()
SingleTraj._mdtraj_attr
SingleTraj._n_frames_base_h5_file
SingleTraj._original_frame_indices
SingleTraj._string_summary()
SingleTraj._traj
SingleTraj._validate_uri()
SingleTraj.atom_slice()
SingleTraj.basename
SingleTraj.extension
SingleTraj.from_pdb_id()
SingleTraj.get_single_frame()
SingleTraj.id
SingleTraj.join()
SingleTraj.load_CV()
SingleTraj.load_traj()
SingleTraj.n_atoms
SingleTraj.n_chains
SingleTraj.n_frames
SingleTraj.n_residues
SingleTraj.save()
SingleTraj.save_CV_as_numpy()
SingleTraj.select()
SingleTraj.show_traj()
SingleTraj.stack()
SingleTraj.superpose()
SingleTraj.top
SingleTraj.chains
SingleTraj.residues
SingleTraj.atoms
SingleTraj.bonds
SingleTraj.top_file
SingleTraj.traj
SingleTraj.traj_file
SingleTraj.unload()
_load_traj()
- encodermap.trajinfo.load_traj module
- encodermap.trajinfo.repository module
Repository
Repository.current_path
Repository.url
Repository.maintainer
Repository.files_dict
Repository.files
Repository.data
Repository.__init__()
Repository._get_connection()
Repository._split_proj_filetype()
Repository.catalogue
Repository.datasets
Repository.fetch()
Repository.get_sizes()
Repository.load_project()
Repository.lookup()
Repository.print_catalogue()
Repository.projects
Repository.search()
Repository.stack()
- encodermap.trajinfo.trajinfo_deprecated module
- encodermap.trajinfo.trajinfo_utils module
- Module contents
Submodules
encodermap._optional_imports module
Optional imports of Python packages.
Allows you to postpone import exceptions. This keeps the EncoderMap codebase lean, so that users don’t need to install packages for features they don’t require.
Examples
>>> from encodermap._optional_imports import _optional_import
>>> np = _optional_import('numpy')
>>> np.array([1, 2, 3])
array([1, 2, 3])
>>> nonexistent = _optional_import('nonexistent_package')
>>> try:
...     nonexistent.function()
... except ValueError as e:
...     print(e)
Install the `nonexistent_package` package to make use of this feature.
>>> try:
...     _ = nonexistent.variable
... except ValueError as e:
...     print(e)
Install the `nonexistent_package` package to make use of this feature.
>>> numpy_random = _optional_import('numpy', 'random.random')
>>> np.random.seed(1)
>>> np.round(numpy_random((5, 5)) * 20, 0)
array([[ 8., 14.,  0.,  6.,  3.],
       [ 2.,  4.,  7.,  8., 11.],
       [ 8., 14.,  4., 18.,  1.],
       [13.,  8., 11.,  3.,  4.],
       [16., 19.,  6., 14., 18.]])
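The lazy-failure behaviour shown above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not EncoderMap’s actual `_optional_import`: the names `optional_import` and `_MissingPackage` are hypothetical, and the sketch assumes the error should only surface once an attribute of the missing package is accessed.

```python
import importlib
from typing import Any, Optional


class _MissingPackage:
    """Placeholder returned when a package is not installed.

    Creating it never fails; the error is raised only when the package is used.
    """

    def __init__(self, name: str) -> None:
        self._name = name

    def __getattr__(self, attr: str) -> Any:
        # Called only for attributes the placeholder does not have,
        # i.e. every function or variable of the missing package.
        raise ValueError(
            f"Install the `{self._name}` package to make use of this feature."
        )


def optional_import(module: str, name: Optional[str] = None) -> Any:
    """Import `module`, optionally resolving a dotted attribute path inside it."""
    try:
        obj = importlib.import_module(module)
    except ImportError:
        return _MissingPackage(module)
    if name is not None:
        for part in name.split("."):  # e.g. "random.random" -> two getattr calls
            obj = getattr(obj, part)
    return obj


sqrt = optional_import("math", "sqrt")
basename = optional_import("os", "path.basename")
missing = optional_import("nonexistent_package")
```

Deferring the failure to attribute access is what lets a module import all optional dependencies at the top level while only the features that need them break.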
encodermap._typing module
Type hints for the encodermap package.
encodermap._version module
EncoderMap’s versioning follows the semantic versioning guidelines. Read more about them here: https://semver.org/
TL;DR: Given a version number MAJOR.MINOR.PATCH, increment the:
- MAJOR version when you make incompatible API changes,
- MINOR version when you add functionality in a backwards compatible manner, and
- PATCH version when you make backwards compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
Example: I am currently writing this documentation. Writing it does not break the API, add functionality, or fix bugs. Thus, the version stays at 3.0.0.
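As an illustration of the MAJOR.MINOR.PATCH rules, the hypothetical helper `bump_version` below (not part of EncoderMap) increments one field and resets the lower-order fields to zero. It deliberately ignores pre-release and build labels.

```python
def bump_version(version: str, part: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version string.

    part must be "major", "minor" or "patch"; lower-order fields are reset to 0.
    """
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":  # incompatible API change
        return f"{major + 1}.0.0"
    if part == "minor":  # backwards compatible new functionality
        return f"{major}.{minor + 1}.0"
    if part == "patch":  # backwards compatible bug fix
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("3.0.0", "patch"))  # 3.0.1
```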
Module contents
EncoderMap: Dimensionality reduction for molecular dynamics.
EncoderMap provides a framework for using molecular dynamics data with the TensorFlow library. It started as the implementation of a neural network autoencoder to do dimensionality reduction and also create new high-dimensional data from the low-dimensional embedding. The user was still required to create their own dataset and provide the numpy arrays. In the second iteration of EncoderMap, the possibility to provide molecular dynamics data with the MolData class was added. A new neural network architecture was implemented to try and rebuild cartesian coordinates from the low-dimensional embedding.
This iteration of EncoderMap continues this endeavour by porting the old code to the newer TensorFlow version (2.x). In addition, new features have been added that should aid computational chemists and structural biologists:
- New trajectory classes with lazy loading of coordinates to save memory.
- Featurization which can be parallelized using the distributed computing library dask.
- Interactive matplotlib plots for clustering and structure creation.
- Neural network building blocks that allow users to easily build new neural networks.
- Sparse networks that allow comparison of proteins with different topologies.
- class encodermap.ADCParameters(**kwargs: Optional[Union[float, int, str, bool, list[int], list[str], list[float], tuple[int, None]]])
Bases:
ParametersFramework
This is the parameter object for the AngleDihedralCartesianEncoderMap. It holds all the parameters that the Parameters object includes, plus the following attributes:
- cartesian_pwd_start
Index of the first atom to use for the pairwise distance calculation.
- Type:
int
- cartesian_pwd_stop
Index of the last atom to use for the pairwise distance calculation.
- Type:
int
- cartesian_pwd_step
Step for the calculation of pairwise distances. E.g. for a chain of atoms N-C_a-C-N-C_a-C…, cartesian_pwd_start=1 and cartesian_pwd_step=3 will result in using all C-alpha atoms for the pairwise distance calculation.
- Type:
int
- use_backbone_angles
Defines whether backbone bond angles should be learned (True) or whether mean values should be used instead to generate conformations (False).
- Type:
bool
- use_sidechains
Whether sidechain dihedrals should be passed through the autoencoder.
- Type:
bool
- angle_cost_scale
Adjusts how much the angle cost is weighted in the cost function.
- Type:
int
- angle_cost_variant
Defines how the angle cost is calculated. Must be one of “mean_square”, “mean_abs” or “mean_norm”.
- Type:
str
- angle_cost_reference
Can be used to normalize the angle cost with the cost of some reference model (dummy).
- Type:
int
- dihedral_cost_scale
Adjusts how much the dihedral cost is weighted in the cost function.
- Type:
int
- dihedral_cost_variant
Defines how the dihedral cost is calculated. Must be one of “mean_square”, “mean_abs” or “mean_norm”.
- Type:
str
- dihedral_cost_reference
Can be used to normalize the dihedral cost with the cost of some reference model (dummy).
- Type:
int
- side_dihedral_cost_scale
Adjusts how much the side dihedral cost is weighted in the cost function.
- Type:
int
- side_dihedral_cost_variant
Defines how the side dihedral cost is calculated. Must be one of “mean_square”, “mean_abs” or “mean_norm”.
- Type:
str
- side_dihedral_cost_reference
Can be used to normalize the side dihedral cost with the cost of some reference model (dummy).
- Type:
int
- cartesian_cost_scale
Adjusts how much the cartesian cost is weighted in the cost function.
- Type:
int
- cartesian_cost_scale_soft_start
Allows the cartesian cost to be turned on gradually. Must be a tuple (start, end) or (None, None). If start and end are given, cartesian_cost_scale will be increased linearly in the given range.
- Type:
tuple
- cartesian_cost_variant
Defines how the cartesian cost is calculated. Must be one of “mean_square”, “mean_abs” or “mean_norm”.
- Type:
str
- cartesian_cost_reference
Can be used to normalize the cartesian cost with the cost of some reference model (dummy).
- Type:
int
- cartesian_dist_sig_parameters
Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l).
- Type:
tuple of floats
- cartesian_distance_cost_scale
Adjusts how much the cartesian distance cost is weighted in the cost function.
- Type:
int
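To make the three cost variants concrete, here is a plain-Python sketch of how they could be computed for a batch of inputs and reconstructions. The exact definitions are assumptions (EncoderMap computes its costs with TensorFlow): “mean_square” and “mean_abs” average over all elements, “mean_norm” averages the per-sample Euclidean norm, and for periodic inputs such as angles and dihedrals the difference is wrapped into [-p/2, p/2) with p = periodicity. The functions `periodic_diff` and `cost` are hypothetical. The *_cost_scale and *_cost_reference attributes would then presumably weight and normalize the result, e.g. scale * cost / reference.

```python
import math
from typing import Optional, Sequence


def periodic_diff(a: float, b: float, periodicity: Optional[float]) -> float:
    """Difference a - b, wrapped into [-p/2, p/2) when the data is periodic."""
    d = a - b
    if periodicity is not None:
        d = (d + periodicity / 2) % periodicity - periodicity / 2
    return d


def cost(
    inp: Sequence[Sequence[float]],
    out: Sequence[Sequence[float]],
    variant: str,
    periodicity: Optional[float] = None,
) -> float:
    """Compare a batch of input rows with the network's reconstruction."""
    diffs = [
        [periodic_diff(a, b, periodicity) for a, b in zip(row_in, row_out)]
        for row_in, row_out in zip(inp, out)
    ]
    if variant == "mean_square":
        values = [d * d for row in diffs for d in row]
    elif variant == "mean_abs":
        values = [abs(d) for row in diffs for d in row]
    elif variant == "mean_norm":
        # One Euclidean norm per sample (row), then averaged over the batch.
        values = [math.sqrt(sum(d * d for d in row)) for row in diffs]
    else:
        raise ValueError(f"unknown cost variant: {variant!r}")
    return sum(values) / len(values)


# Dihedrals close to -pi and +pi are nearly identical once wrapped.
dihedral_cost = cost([[3.1]], [[-3.1]], "mean_abs", periodicity=2 * math.pi)
```

Without the wrapping step, the last example would report a difference of 6.2 radians instead of roughly 0.08, which is why periodicity matters for the angle and dihedral costs.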
Examples
>>> import encodermap as em
>>> parameters = em.ADCParameters()
>>> parameters.auto_cost_variant
mean_abs
>>> parameters.save(path='/path/to/dir')
/path/to/dir/parameters.json
>>> # alternative constructor
>>> new_params = em.Parameters.from_file('/path/to/dir/parameters.json')
>>> new_params.main_path
/path/to/dir/parameters.json
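The cartesian_pwd_* and cartesian_dist_sig_parameters attributes work together: a subset of atoms is selected, their pairwise distances are computed, and a sigmoid is applied to those distances. The sketch below assumes Python slice semantics for start/stop/step (stop exclusive) and assumes the sketch-map sigmoid form 1 - (1 + (2^(a/b) - 1)(r/sig)^a)^(-b/a), which evaluates to 0.5 at r = sig. Both are assumptions, and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def select_atoms(
    coords: Sequence[Point],
    start: Optional[int],
    stop: Optional[int],
    step: Optional[int],
) -> List[Point]:
    # Assumption: cartesian_pwd_start/stop/step act like a Python slice,
    # so None (the default for all three) selects every atom.
    return list(coords[start:stop:step])


def pairwise_distances(points: Sequence[Point]) -> List[float]:
    # Condensed upper-triangle distances: n * (n - 1) / 2 values.
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def sketchmap_sigmoid(r: float, sig: float, a: float, b: float) -> float:
    # Assumed sketch-map sigmoid: 0 at r == 0, 0.5 at r == sig, tends to 1.
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sig) ** a) ** (-b / a)


# Toy backbone N-C_a-C repeated three times, laid out along the x axis.
backbone = [(float(i), 0.0, 0.0) for i in range(9)]
c_alphas = select_atoms(backbone, start=1, stop=None, step=3)  # atoms 1, 4, 7
sig_h, a_h, b_h = (4.5, 12, 6, 1, 2, 6)[:3]  # high-dimensional part of the default
transformed = [sketchmap_sigmoid(d, sig_h, a_h, b_h) for d in pairwise_distances(c_alphas)]
```

With 474 central atoms, the condensed distance list has 474 * 473 / 2 = 112101 entries, which matches the pairwise-distance output shape in the AngleDihedralCartesianEncoderMap example.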
- __init__(**kwargs: Optional[Union[float, int, str, bool, list[int], list[str], list[float], tuple[int, None]]]) -> None
Instantiate the ADCParameters class.
Takes keyword arguments as input and overwrites the class defaults. The values are stored directly and can be accessed as instance attributes.
- Parameters:
**kwargs (dict) – Parameter names and their values. Unknown parameters are dropped.
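The overwrite-and-drop behaviour described for **kwargs can be sketched with plain dictionaries. This assumes the parameters behave like a dict merged over the class defaults; `merge_parameters` and the reduced `DEFAULTS` dict are hypothetical stand-ins, not EncoderMap’s implementation.

```python
import copy
from typing import Any, Dict

# Illustrative subset of ADCParameters.defaults (values copied from the listing).
DEFAULTS: Dict[str, Any] = {
    "n_neurons": [128, 128, 2],
    "periodicity": 6.283185307179586,
    "use_backbone_angles": False,
}


def merge_parameters(defaults: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Return the defaults overwritten by known kwargs; unknown keys are dropped."""
    params = copy.deepcopy(defaults)  # mutable defaults (lists) must not be shared
    for key, value in kwargs.items():
        if key in params:
            params[key] = value
    return params


params = merge_parameters(DEFAULTS, periodicity=360, not_a_parameter=1)
```

The deep copy matters because defaults such as n_neurons are lists; without it, mutating one instance’s list would silently change the class-wide defaults.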
- activation_functions: list[str]
- defaults = {'activation_functions': ['', 'tanh', 'tanh', ''], 'analysis_path': '', 'angle_cost_reference': 1, 'angle_cost_scale': 0, 'angle_cost_variant': 'mean_abs', 'auto_cost_scale': None, 'auto_cost_variant': 'mean_abs', 'batch_size': 256, 'batched': True, 'cartesian_cost_reference': 1, 'cartesian_cost_scale': 1, 'cartesian_cost_scale_soft_start': (None, None), 'cartesian_cost_variant': 'mean_abs', 'cartesian_dist_sig_parameters': (4.5, 12, 6, 1, 2, 6), 'cartesian_distance_cost_scale': 1, 'cartesian_pwd_start': None, 'cartesian_pwd_step': None, 'cartesian_pwd_stop': None, 'center_cost_scale': 0.0001, 'checkpoint_step': 5000, 'dihedral_cost_reference': 1, 'dihedral_cost_scale': 1, 'dihedral_cost_variant': 'mean_abs', 'dist_sig_parameters': (4.5, 12, 6, 1, 2, 6), 'distance_cost_scale': None, 'gpu_memory_fraction': 0, 'id': '', 'l2_reg_constant': 0.001, 'learning_rate': 0.001, 'loss': 'emap_cost', 'model_api': 'functional', 'n_neurons': [128, 128, 2], 'n_steps': 100000, 'periodicity': 6.283185307179586, 'seed': None, 'side_dihedral_cost_reference': 1, 'side_dihedral_cost_scale': 0.5, 'side_dihedral_cost_variant': 'mean_abs', 'summary_step': 10, 'tensorboard': False, 'training': 'auto', 'use_backbone_angles': False, 'use_sidechains': False}
- classmethod defaults_description() -> str
str: A string that contains tabulated default parameter values.
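A tabulated description like the one returned by defaults_description() can be produced from the defaults dictionary alone. The layout below is an assumption (the real output format may differ), and `defaults_table` is a hypothetical helper.

```python
from typing import Any, Dict


def defaults_table(defaults: Dict[str, Any]) -> str:
    """Render parameter names and default values as a two-column text table."""
    width = max([len("Parameter"), *(len(key) for key in defaults)])
    lines = [f"{'Parameter':<{width}}  Default value", "-" * (width + 15)]
    for key in sorted(defaults):
        lines.append(f"{key:<{width}}  {defaults[key]!r}")
    return "\n".join(lines)


print(defaults_table({"batch_size": 256, "seed": None, "loss": "emap_cost"}))
```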
- n_neurons: list[int]
- class encodermap.AngleDihedralCartesianEncoderMap(trajs: encodermap.TrajEnsemble, parameters: Optional[encodermap.ADCParameters] = None, model: Optional[tensorflow.keras.Model] = None, read_only: bool = False, cartesian_loss_step: int = 0, top: Optional[mdtraj.Topology] = None)
Bases:
Autoencoder
Unlike the Autoencoder class, this class has a different __init__ method and uses callbacks to tune in the cartesian cost.
Overwritten methods: _setup_callbacks and generate.
Examples
>>> import encodermap as em
>>> # Load two trajectories
>>> xtcs = ["tests/data/1am7_corrected_part1.xtc", "tests/data/1am7_corrected_part2.xtc"]
>>> tops = ["tests/data/1am7_protein.pdb", "tests/data/1am7_protein.pdb"]
>>> trajs = em.load(xtcs, tops)
>>> print(trajs)
encodermap.TrajEnsemble object. Current backend is no_load. Containing 2 trajs. Not containing any CVs.
>>> # load CVs
>>> # This step can be omitted. The AngleDihedralCartesianEncoderMap class automatically loads CVs
>>> trajs.load_CVs('all')
>>> print(trajs.CVs['central_cartesians'].shape)
(51, 474, 3)
>>> print(trajs.CVs['central_dihedrals'].shape)
(51, 471)
>>> # create some parameters
>>> p = em.ADCParameters(periodicity=360, use_backbone_angles=True, use_sidechains=True,
...                      cartesian_cost_scale_soft_start=(6, 12))
>>> # The default is the functional model, as it offers more flexibility
>>> print(p.model_api)
functional
>>> print(p.distance_cost_scale)
None
>>> # Instantiate the class
>>> e_map = em.AngleDihedralCartesianEncoderMap(trajs, p, read_only=True)
>>> # dataset contains these inputs:
>>> # central_angles, central_dihedrals, central_cartesians, central_distances, sidechain_dihedrals
>>> print(e_map.dataset)
<BatchDataset element_spec=(TensorSpec(shape=(None, 472), dtype=tf.float32, name=None), TensorSpec(shape=(None, 471), dtype=tf.float32, name=None), TensorSpec(shape=(None, 474, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None, 473), dtype=tf.float32, name=None), TensorSpec(shape=(None, 316), dtype=tf.float32, name=None))>
>>> # output from the model contains the following data:
>>> # out_angles, out_dihedrals, back_cartesians, pairwise_distances of inp cartesians, pairwise of back-mapped cartesians, out_side_dihedrals
>>> for data in e_map.dataset.take(1):
...     pass
>>> out = e_map.model(data)
>>> print([i.shape for i in out])
[TensorShape([256, 472]), TensorShape([256, 471]), TensorShape([256, 474, 3]), TensorShape([256, 112101]), TensorShape([256, 112101]), TensorShape([256, 316])]
>>> # get output of latent space by providing central_angles, central_dihedrals, sidechain_dihedrals
>>> latent = e_map.encoder([data[0], data[1], data[-1]])
>>> print(latent.shape)
(256, 2)
>>> # Rebuild central_angles, central_dihedrals and sidechain_dihedrals from latent
>>> ang, dih, side_dih = e_map.decode(latent)
>>> print(ang.shape, dih.shape, side_dih.shape)
(256, 472) (256, 471) (256, 316)
- __init__(trajs: encodermap.TrajEnsemble, parameters: Optional[encodermap.ADCParameters] = None, model: Optional[tensorflow.keras.Model] = None, read_only: bool = False, cartesian_loss_step: int = 0, top: Optional[mdtraj.Topology] = None) -> None
Instantiate the AngleDihedralCartesianEncoderMap class.
- Parameters:
trajs (em.TrajEnsemble) – The trajectories to be used as input. If trajs contain no CVs, correct CVs will be loaded.
parameters (Optional[em.ADCParameters]) – The parameters for the current run. Can be set to None and the default parameters will be used. Defaults to None.
model (Optional[tf.keras.models.Model]) – The keras model to use. You can provide your own model with this argument. If set to None, the model will be built to the specifications of parameters using either the functional or sequential API. Defaults to None.
read_only (bool) – Whether to write anything to disk (False) or not (True). Defaults to False.
cartesian_loss_step (int, optional) – For loading and re-training the model. The cartesian_distance_loss is tuned in stepwise, so the start step of the training needs to be accounted for. If the scale of the cartesian loss should increase from epoch 6 to epoch 12 and the model is saved at epoch 9, this argument should also be set to 9 to continue training with the correct scaling factor. Defaults to 0.
- _setup_callbacks() -> None
Overwrites the parent class’ _setup_callbacks method.
Due to the ‘soft start’ of the cartesian cost, the cartesian_increase_callback needs to be added to the list of callbacks.
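The soft start can be pictured as a piecewise-linear schedule for the cartesian cost weight. The sketch below is an assumption about the behaviour described for cartesian_cost_scale_soft_start and cartesian_loss_step, not the code of the actual callback: the weight is 0 before start, rises linearly to cartesian_cost_scale between start and end, and stays there afterwards, while the epoch counter is shifted by cartesian_loss_step when training is resumed. `cartesian_scale_at` is a hypothetical name.

```python
from typing import Optional, Tuple


def cartesian_scale_at(
    epoch: int,
    soft_start: Tuple[Optional[int], Optional[int]],
    scale: float,
    loss_step_offset: int = 0,
) -> float:
    """Weight of the cartesian cost at a given epoch under a linear soft start."""
    start, end = soft_start
    if start is None or end is None:
        return scale  # soft start disabled: full weight from the beginning
    current = epoch + loss_step_offset  # offset plays the role of cartesian_loss_step
    if current < start:
        return 0.0
    if current >= end:
        return scale
    return scale * (current - start) / (end - start)


# Soft start from epoch 6 to 12, as in the AngleDihedralCartesianEncoderMap example.
print([cartesian_scale_at(e, (6, 12), 1.0) for e in range(0, 14, 3)])
# [0.0, 0.0, 0.0, 0.5, 1.0]
```

Resuming a model saved at epoch 9 with an offset of 9 makes the first new epoch continue at weight 0.5 instead of restarting the ramp at 0.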
- encode(data=None)
Calls encoder part of model.
- Parameters:
data (Union[np.ndarray, None], optional) – The data to be passed to the encoder part. Can be either a numpy ndarray or None. If None is provided, a set of 10000 points from the provided train data will be taken. Defaults to None.
- Returns:
The output from the bottleneck/latent layer.
- Return type:
np.ndarray
- classmethod from_checkpoint(trajs, checkpoint_path, read_only=True, overwrite_tensorboard_bool=False)
Reconstructs the model from a checkpoint.
- generate(points: np.ndarray, top: Optional[str, int, mdtraj.Topology] = None, backend: Literal['mdtraj', 'mdanalysis'] = 'mdtraj') -> Union[MDAnalysis.Universe, mdtraj.Trajectory]
Overrides the parent class’ generate method and builds a trajectory.
Instead of just providing data to decode using the decoder part of the network, this method also takes a molecular topology as its top argument. This topology is then used to rebuild a time-resolved trajectory.
- Parameters:
points (np.ndarray) – The low-dimensional points from which the trajectory should be rebuilt.
top (Optional[str, int, mdtraj.Topology]) – The topology to be used for rebuilding the trajectory. This should be a string pointing towards a <*.pdb, *.gro, *.h5> file. Alternatively, None can be provided, in which case the internal topology (self.top) of this class is used. Defaults to None.
backend (str) – Defines which MD Python package is used to build the trajectory and also what type this method returns. Needs to be one of “mdtraj” or “mdanalysis”.
- Returns:
The trajectory after applying the decoded structural information. Its type depends on the chosen backend parameter.
- Return type:
Union[mdtraj.Trajectory, MDAnalysis.Universe]
- property loss
A list of loss functions passed to the model when it is compiled. When the main Autoencoder class is used and parameters.loss is ‘emap_cost’, this list consists of center_cost, regularization_cost and auto_cost. When the EncoderMap subclass is used and parameters.loss is ‘emap_cost’, distance_cost is added to the list. When parameters.loss is not ‘emap_cost’, the loss can either be a string (‘mse’) or a function; both are acceptable arguments for loss when a keras model is compiled.
- Type:
(Union[list, string, function])
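The selection rules above can be restated as a small function. This is a sketch of the two cases the paragraph describes, with strings standing in for the actual cost functions; `select_loss` is hypothetical, and the additional angle, dihedral and cartesian costs that the AngleDihedralCartesianEncoderMap class sets up are not modelled here.

```python
from typing import Callable, List, Union

LossSpec = Union[List[str], str, Callable]


def select_loss(class_name: str, loss: LossSpec) -> LossSpec:
    """Mirror the loss-selection rules described for the loss property."""
    if loss != "emap_cost":
        # Anything else (e.g. "mse" or a callable) is handed to keras unchanged.
        return loss
    losses = ["center_cost", "regularization_cost", "auto_cost"]
    if class_name == "EncoderMap":
        losses.append("distance_cost")
    return losses


print(select_loss("EncoderMap", "emap_cost"))
```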
- save(step: Optional[int] = None) -> None
Saves the model to the current path defined in parameters.main_path.
- Parameters:
step (Optional[int]) – Does not actually save the model at the given training step, but rather changes the string used for saving the model from a datetime format to another.
- class encodermap.Autoencoder(parameters=None, train_data: Optional[Union[np.ndarray, tf.Dataset]] = None, model=None, read_only=False, sparse=False)
Bases:
object
Main Autoencoder class preparing data, setting up the neural network and implementing training.
This is the main class for neural networks inside EncoderMap. The class prepares the data (batching and shuffling) and creates a tf.keras.Model of layers specified by the attributes of the encodermap.Parameters class. Depending on which parent or child class is instantiated, a different combination of cost functions is set up. Callbacks to Tensorboard are also set up.
- train_data
The numpy array of the train data passed at init.
- Type:
np.ndarray
- p
An encodermap.Parameters instance containing all info needed to set up the network.
- Type:
encodermap.Parameters
- dataset
The dataset that is actually used in training the keras model. The dataset is a batched, shuffled, infinitely-repeating dataset.
- Type:
tensorflow.data.Dataset
- read_only
Variable telling the class whether it is allowed to write to disk (False) or not (True).
- Type:
bool
- optimizer
Instance of the Adam optimizer with learning rate specified by the Parameters class.
- Type:
tf.keras.optimizers.Adam
- metrics
A list of metrics passed to the model when it is compiled.
- Type:
list
- callbacks
A list of tf.keras.callbacks.Callback subclasses changing the behavior of the model during training. Some standard callbacks are added by default:
- encodermap.callbacks.callbacks.ProgressBar:
A progress bar callback using tqdm giving the current progress of training and the current loss.
- CheckPointSaver:
A callback that saves the model every parameters.checkpoint_step steps into the main directory. This callback will only be used, when read_only is False.
- TensorboardWriteBool:
A callback that contains a boolean Tensor that will be True or False, depending on the current training step and the summary_step in the parameters class. The loss functions use this callback to decide whether they should write to Tensorboard. This callback will only be present, when read_only is False and parameters.tensorboard is True.
You can append your own callbacks to this list before executing Autoencoder.train().
- Type:
list
- encoder
The encoder (sub)model of model.
- Type:
tf.keras.models.Model
- decoder
The decoder (sub)model of model.
- Type:
tf.keras.models.Model
- plot_network()
Tries to plot the network. For this method to work, graphviz, pydot and pydotplus need to be installed.
- generate()
Same as decode. For the AngleDihedralCartesianEncoderMap class, this will build a protein structure.
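The conditions under which the CheckPointSaver and TensorboardWriteBool callbacks act can be summarized as a small pure function. This sketch assumes both are simple modulo checks on the training step; `callback_actions` and the action names are hypothetical and are not EncoderMap’s callback classes. The ProgressBar callback, which is always present, is left out.

```python
from typing import List


def callback_actions(
    step: int,
    checkpoint_step: int,
    summary_step: int,
    read_only: bool,
    tensorboard: bool,
) -> List[str]:
    """Return which side effects would fire at a given training step."""
    actions: List[str] = []
    if read_only:
        return actions  # nothing is written to disk in read-only mode
    if step % checkpoint_step == 0:
        actions.append("save_checkpoint")  # CheckPointSaver
    if tensorboard and step % summary_step == 0:
        actions.append("write_summaries")  # TensorboardWriteBool is True
    return actions


# Defaults from the parameters: checkpoint_step=5000, summary_step=10.
print(callback_actions(5000, 5000, 10, read_only=False, tensorboard=True))
```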
Note
The performance of TensorFlow depends not only on your system’s hardware and on how the data is presented to the network (for this, check out https://www.tensorflow.org/guide/data_performance), but also on how TensorFlow was compiled. The standard TensorFlow package (pip install tensorflow) is built without CPU extensions so that it works on many CPUs. However, TensorFlow can greatly benefit from CPU instructions like AVX2 and AVX512, which bring a speed-up of 300% in linear algebra computations. By building TensorFlow from source, you can activate these extensions. Still, the CPU speed-up is dwarfed by the speed-up gained when you allow TensorFlow to run on your GPU (graphics card). To check whether a GPU is available, run: print("Num GPUs Available: ", len(tf.config.list_physical_devices('GPU'))). Refer to these pages to install TensorFlow for best performance: https://www.tensorflow.org/install/pip, https://www.tensorflow.org/install/gpu
Examples
>>> import encodermap as em
>>> # without providing any data, default parameters and a 4D hypercube as input data will be used.
>>> e_map = em.EncoderMap(read_only=True)
>>> print(e_map.train_data.shape)
(16000, 4)
>>> print(e_map.dataset)
<BatchDataset element_spec=(TensorSpec(shape=(None, 4), dtype=tf.float32, name=None), TensorSpec(shape=(None, 4), dtype=tf.float32, name=None))>
>>> print(e_map.encode(e_map.train_data).shape)
(16000, 2)
- __init__(parameters=None, train_data: Optional[Union[np.ndarray, tf.Dataset]] = None, model=None, read_only=False, sparse=False)
Instantiate the Autoencoder class.
- Parameters:
parameters (Union[encodermap.Parameters, None], optional) – The parameters to be used. If None is provided default values (check them with print(em.Parameters.defaults_description())) are used. Defaults to None.
train_data (Union[np.ndarray, tf.data.Dataset, None], optional) –
The train data. Can be one of the following:
- None: If None is provided, points on the edges of a 4-dimensional hypercube will be used as train data.
- np.ndarray: If a numpy array is provided, it will be transformed into a batched tf.data.Dataset by first making it an infinitely repeating dataset, shuffling it and then batching it with a batch size specified by parameters.batch_size.
- tf.data.Dataset: If a dataset is provided, it will be used without making any adjustments. Make sure that the dataset uses float32 as its type.
Defaults to None.
model (Union[tf.keras.models.Model, None], optional) – Providing a keras model to this argument will make the Autoencoder/EncoderMap class use this model instead of the predefined ones. Make sure the model can accept EncoderMap’s loss functions. If None is provided the model will be built using the specifications in parameters. Defaults to None.
read_only (bool, optional) – Whether the class is allowed to write to disk (False) or not (True). Defaults to False and will allow the class to write to disk.
- Raises:
BadError – When read_only is True and parameters.tensorboard is True, this Exception will be raised, because they are mutually exclusive.
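The default train data and the array-to-dataset conversion described above can be approximated without TensorFlow. The sketch samples points on the edges of a unit hypercube (on an edge, every coordinate but one is 0 or 1) and mimics repeat, shuffle and batch with a generator. How EncoderMap actually samples the hypercube, and the fact that tf.data shuffles through a finite buffer rather than reshuffling whole passes, are not reproduced here; both function names are hypothetical.

```python
import random
from typing import Iterator, List, Sequence


def hypercube_edge_points(n: int, dim: int = 4, seed: int = 0) -> List[List[float]]:
    """Sample n points on the edges of the unit hypercube in `dim` dimensions."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        point = [float(rng.randint(0, 1)) for _ in range(dim)]  # a random vertex
        point[rng.randrange(dim)] = rng.random()  # move along one edge from it
        points.append(point)
    return points


def repeat_shuffle_batch(data: Sequence, batch_size: int, seed: int = 0) -> Iterator[list]:
    """Endlessly yield batches of batch_size items, reshuffling on every pass."""
    if not data:
        raise ValueError("data must not be empty")  # would otherwise loop forever
    rng = random.Random(seed)
    batch: list = []
    while True:  # "infinitely repeating"
        order = list(data)
        rng.shuffle(order)
        for item in order:
            batch.append(item)
            if len(batch) == batch_size:
                yield batch
                batch = []


train_data = hypercube_edge_points(16000)
batches = repeat_shuffle_batch(train_data, batch_size=256)
first_batch = next(batches)
```

Because the generator never ends, a training loop has to decide on its own how many batches make up an epoch, which matches how an infinitely repeating dataset is consumed.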
- add_images_to_tensorboard(data=None, image_step=None, scatter_kws={'s': 20}, hist_kws={'bins': 50}, additional_fns=None, when='epoch')
Adds images to Tensorboard using the data in data.
- Parameters:
data (Union[np.ndarray, list, None], optional) – The input-data will be passed through the encoder part of the autoencoder. If None is provided a set of 10000 points from the provided train data will be taken. A list is needed for the functional API of the ADCAutoencoder, that takes a list of [angles, dihedrals, side_dihedrals]. Defaults to None.
image_step (Union[int, None], optional) – The interval in which to plot images to tensorboard. If None is provided, the update step will be the same as parameters.summary_step. Defaults to None.
scatter_kws (dict, optional) – A dict with items that matplotlib.pyplot.scatter() will accept. Defaults to {‘s’: 20}, which sets an appropriate size of scatter points for the size of datasets encodermap is usually used for.
hist_kws (dict, optional) – A dict with items that the matplotlib histogram plotting function will accept. You can choose a colormap here. Defaults to {‘bins’: 50}, which sets an appropriate bin count for the size of datasets encodermap is usually used for.
additional_fns (Union[list, None], optional) – A list of functions that will accept the low-dimensional output of the autoencoder’s latent/bottleneck layer and return a tf.Tensor that can be logged by tf.summary.image(). See the notebook ‘writing_custom_images_to_tensorboard.ipynb’ in tutorials/notebooks_customization for more info. If None is provided no additional functions will be used to plot to tensorboard. Defaults to None.
when (str, optional) – When to log the images. Can be either ‘batch’, in which case the images are logged after every training step, or ‘epoch’, in which case they are only written every image_step epochs. Defaults to ‘epoch’.
- decode(data)[source]#
Calls the decoder part of the model.
Unlike the other two classes, AngleDihedralCartesianAutoencoder will output a tuple of data.
- Parameters:
data (np.ndarray) – The data to be passed to the decoder part of the model. Make sure that the shape of the data matches the number of neurons in the latent space.
- Returns:
The output from the decoder part.
- Return type:
np.ndarray
- property decoder#
Decoder part of the model.
- Type:
tf.keras.models.Model
- encode(data=None)[source]#
Calls the encoder part of the model.
- Parameters:
data (Union[np.ndarray, None], optional) – The data to be passed to the encoder part. Can be either a numpy ndarray or None. If None is provided, a set of 10000 points from the provided train data will be taken. Defaults to None.
- Returns:
The output from the bottleneck/latent layer.
- Return type:
np.ndarray
- property encoder#
Encoder part of the model.
- Type:
tf.keras.models.Model
- classmethod from_checkpoint(checkpoint_path, read_only=True, overwrite_tensorboard_bool=False, sparse=False)[source]#
Reconstructs the class from a checkpoint.
- Parameters:
checkpoint_path (str) – The path to the checkpoint. Most models are saved in parts (encoder, decoder), so the provided path often needs a wildcard (*). The save() method of this class prints a string with which the model can be reloaded.
read_only (bool, optional) – Whether to reload the model in read_only mode (True) or allow the Autoencoder class to write to disk (False). This option might collide with the tensorboard Parameter in the respective parameters.json file in the main_path. Defaults to True.
overwrite_tensorboard_bool (bool, optional) – Whether to overwrite the tensorboard Parameter while reloading the class. This can be set to True to set the tensorboard parameter False and allow read_only. Defaults to False.
- Raises:
BadError – When read_only is True, overwrite_tensorboard_bool is False and the reloaded parameters have tensorboard set to True.
- Returns:
Encodermap Autoencoder class.
- Return type:
- generate(data)[source]#
Duplication of decode.
In Autoencoder and EncoderMap, this method is equivalent to decode(). In AngleDihedralCartesianAutoencoder, this method is overridden to produce molecular conformations as output.
- Parameters:
data (np.ndarray) – The data to be passed to the decoder part of the model. Make sure that the shape of the data matches the number of neurons in the latent space.
- Returns:
The output from the decoder part.
- Return type:
np.ndarray
- property loss#
A list of loss functions passed to the model when it is compiled. When the main Autoencoder class is used and parameters.loss is ‘emap_cost’, this list is comprised of center_cost, regularization_cost and auto_cost. When the EncoderMap subclass is used and parameters.loss is ‘emap_cost’, distance_cost is added to the list. When parameters.loss is not ‘emap_cost’, the loss can either be a string (e.g. ‘mse’) or a function; both are acceptable values for the loss argument when a keras model is compiled.
- Type:
(Union[list, string, function])
- property model#
The tf.keras.Model model used for training.
- Type:
tf.keras.models.Model
- plot_network()[source]#
Tries to plot the network using pydot, pydotplus and graphviz. Doesn’t raise an exception if plotting is not possible.
Note
Refer to this guide to install these programs: https://stackoverflow.com/questions/47605558/importerror-failed-to-import-pydot-you-must-install-pydot-and-graphviz-for-py
- class encodermap.EncoderMap(parameters=None, train_data: Optional[Union[np.ndarray, tf.Dataset]] = None, model=None, read_only=False, sparse=False)[source]#
Bases:
Autoencoder
Complete copy of the Autoencoder class, but uses an additional distance cost scaled by the SketchMap sigmoid parameters.
- classmethod from_checkpoint(checkpoint_path, read_only=True, overwrite_tensorboard_bool=False, sparse=False)[source]#
Reconstructs the model from a checkpoint.
- property loss#
A list of loss functions passed to the model when it is compiled. When the main Autoencoder class is used and parameters.loss is ‘emap_cost’, this list is comprised of center_cost, regularization_cost and auto_cost. When the EncoderMap subclass is used and parameters.loss is ‘emap_cost’, distance_cost is added to the list. When parameters.loss is not ‘emap_cost’, the loss can either be a string (e.g. ‘mse’) or a function; both are acceptable values for the loss argument when a keras model is compiled.
- Type:
(Union[list, string, function])
- class encodermap.EncoderMapBaseCallback(parameters: Optional[AnyParameters] = None)[source]#
Bases:
Callback
Base class for multiple callbacks.
Can be used to implement new callbacks that can also use encodermap.Parameters classes. A counter is increased after a train batch is finished. Based on the two attributes summary_step and checkpoint_step in the encodermap.Parameters classes, the corresponding methods are called. It has two important attributes:
- steps_counter#
The current step counter. Increases every on_train_batch_end.
- Type:
int
- p (Union[encodermap.Parameters, encodermap.ADCParameters])
The parameters for this callback. Based on the summary_step and checkpoint_step of this parameters class, different methods are called.
- __init__(parameters: Optional[AnyParameters] = None) None [source]#
Instantiate the EncoderMapBaseCallback class.
- Parameters:
parameters (Union[encodermap.Parameters, encodermap.ADCParameters, None], optional) – Parameters that will be used to print out the progress bar. If None is passed default values (check them with print(em.ADCParameters.defaults_description())) will be used. Defaults to None.
- on_checkpoint_step(step: int, logs: Optional[dict] = None) None [source]#
Executed when the currently finished batch matches encodermap.Parameters.checkpoint_step.
- Parameters:
step (int) – The number of the current step.
logs (Optional[dict]) – logs is a dict containing the metrics results.
- on_summary_step(step: int, logs: Optional[dict] = None) None [source]#
Executed when the currently finished batch matches encodermap.Parameters.summary_step.
- Parameters:
step (int) – The number of the current step.
logs (Optional[dict]) – logs is a dict containing the metrics results.
- on_train_batch_end(batch: int, logs: Optional[dict] = None) None [source]#
Called after a batch ends. The batch number is provided by keras.
This method is the backbone of all of encodermap’s callbacks. Keras calls it after every batch. When the number of that batch matches either encodermap.Parameters.summary_step or encodermap.Parameters.checkpoint_step, the code in self.on_summary_step or self.on_checkpoint_step is executed. These methods should be overridden by child classes.
- Parameters:
batch (int) – The number of the current batch. Provided by keras.
logs (Optional[dict]) – logs is a dict containing the metrics results.
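The dispatch described for on_train_batch_end can be sketched without keras. The class below is a hypothetical stand-in, not encodermap’s EncoderMapBaseCallback; it assumes the counter is incremented once per finished batch and that "matches" means the counter is a multiple of summary_step or checkpoint_step.

```python
from typing import List, Optional


class StepDispatcher:
    """Hypothetical stand-in for EncoderMapBaseCallback (no keras needed)."""

    def __init__(self, summary_step: int, checkpoint_step: int) -> None:
        self.summary_step = summary_step
        self.checkpoint_step = checkpoint_step
        self.steps_counter = 0
        self.summary_calls: List[int] = []
        self.checkpoint_calls: List[int] = []

    def on_train_batch_end(self, batch: int, logs: Optional[dict] = None) -> None:
        # Assumed logic: count finished batches, then dispatch on multiples.
        self.steps_counter += 1
        if self.steps_counter % self.checkpoint_step == 0:
            self.on_checkpoint_step(self.steps_counter, logs=logs)
        if self.steps_counter % self.summary_step == 0:
            self.on_summary_step(self.steps_counter, logs=logs)

    # The two hooks a child class would override; here they only record the step.
    def on_summary_step(self, step: int, logs: Optional[dict] = None) -> None:
        self.summary_calls.append(step)

    def on_checkpoint_step(self, step: int, logs: Optional[dict] = None) -> None:
        self.checkpoint_calls.append(step)


cb = StepDispatcher(summary_step=10, checkpoint_step=25)
for batch in range(50):  # keras would call this once per finished batch
    cb.on_train_batch_end(batch, logs={"loss": 0.0})
print(cb.summary_calls, cb.checkpoint_calls)  # [10, 20, 30, 40, 50] [25, 50]
```

In a real subclass of encodermap.EncoderMapBaseCallback, you would only override on_summary_step and on_checkpoint_step and leave on_train_batch_end alone.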
- class encodermap.InteractivePlotting(autoencoder, trajs=None, data=None, ax=None, align_string='name CA', top=None, hist=False, scatter_kws={'s': 5}, ball_and_stick=False, top_index=0)[source]#
Bases:
object
Class to open up an interactive plotting window.
Contains sub-classes to handle user-clickable menus and selectors.
- trajs#
The trajs passed into this class.
- Type:
encodermap.TrajEnsemble
- fig#
The figure plotted onto. If ax is passed when this class is instantiated the parent figure will be fetched with self.fig = self.ax.get_figure()
- Type:
matplotlib.figure
- ax#
The axes on which the low-dimensional data of the trajs is plotted.
- Type:
matplotlib.axes
The axes on which the normal menu is plotted.
- Type:
matplotlib.axes
The axes on which the status menu is plotted.
- Type:
matplotlib.axes
- pts#
The points which are plotted. Based on some other class variables the color of this collection is adjusted.
- Type:
matplotlib.collections.Collection
The menu containing the status buttons.
The menu containing the remaining buttons.
- tool#
The currently active tool used to select points. This can be lasso, polygon, etc…
- mode#
Current mode of the status menu.
- Type:
str
Examples
>>> import encodermap as em
>>> sess = em.InteractivePlotting(autoencoder, trajs)
- __init__(autoencoder, trajs=None, data=None, ax=None, align_string='name CA', top=None, hist=False, scatter_kws={'s': 5}, ball_and_stick=False, top_index=0)[source]#
Instantiate the InteractivePlotting class.
- Parameters:
trajs (encodermap.TrajEnsemble) – The trajs of which the lowd info should be plotted.
ax (matplotlib.axes, optional) – The axes to plot on. If no axes are provided, a new figure and axes will be created. Defaults to None.
- property cluster_zoomed#
- property mode#
- on_click(event)[source]#
Decides whether the release event happened in the drawing area or the menu.
- Parameters:
event (matplotlib.backend_bases.Event) – The event provided by figure.canvas.connect().
Chooses the function to call based on what MenuItem was clicked.
- Parameters:
event (matplotlib.backend_bases.Event) – The event provided by figure.canvas.connect().
- class encodermap.MolData(atom_group, cache_path='', start=None, stop=None, step=None)[source]#
Bases:
object
MolData is designed to extract and hold conformational information from trajectories.
- Variables:
cartesians – numpy array of the trajectory atom coordinates
central_cartesians – cartesian coordinates of the central backbone atoms (N-CA-C-N-CA-C…)
dihedrals – all backbone dihedrals (phi, psi, omega)
angles – all bond angles of the central backbone atoms
lengths – all bond lengths between neighbouring central atoms
sidedihedrals – all sidechain dihedrals
aminoaciddict – number of sidechain dihedrals
- __init__(atom_group, cache_path='', start=None, stop=None, step=None)[source]#
- Parameters:
atom_group – MDAnalysis atom group
cache_path – path where the calculated variables can be cached
start – first frame to analyze
stop – last frame to analyze
step – step size between analyzed frames
- write(path, coordinates, name='generated', formats=('pdb', 'xtc'), only_central=False, align_reference=None, align_select='all')[source]#
Writes a trajectory for the given coordinates.
- Parameters:
path – directory where to save the trajectory
coordinates – numpy array of xyz coordinates (frames, atoms, xyz)
name – filename (without extension)
formats – specifies which formats should be used to write structure and trajectory. Default: (“pdb”, “xtc”)
only_central – if True only central atom coordinates are expected (N-Ca-C…)
align_reference – allows aligning the generated conformations to a reference. The reference should be given as an MDAnalysis atomgroup.
align_select – selects which atoms are used for the alignment, e.g. “resid 5:60”. Default is “all”. Have a look at the MDAnalysis selection syntax for more details.
- Returns:
- class encodermap.Parameters(**kwargs: Optional[Union[float, int, str, bool, list[int], list[str], list[float], tuple[int, None]]])[source]#
Bases:
ParametersFramework
Class to hold Parameters for the Autoencoder
Parameters can be set via keyword args while instantiating the class, set as instance attributes or read from disk. This class can write parameters to disk in .yaml or .json format.
- defaults#
Class variable dict that holds the defaults even when the current values might have changed.
- Type:
dict
- main_path#
Defines a main path where the parameters and other things might be stored.
- Type:
str
- n_neurons#
List containing number of neurons for each layer up to the bottleneck layer. For example [128, 128, 2] stands for an autoencoder with the following architecture {i, 128, 128, 2, 128, 128, i} where i is the number of dimensions of the input data. These are Input/Output Layers that are not trained.
- Type:
list of int
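To make the mirroring rule for n_neurons concrete, the hypothetical helper below expands it into the full list of layer sizes. It assumes the decoder repeats the encoder’s hidden layers in reverse order without repeating the bottleneck, exactly as in the {i, 128, 128, 2, 128, 128, i} example.

```python
from typing import List


def layer_sizes(n_input: int, n_neurons: List[int]) -> List[int]:
    """Full layer sizes: input, encoder (incl. bottleneck), mirrored decoder, output."""
    # n_neurons[-2::-1] walks back from the layer before the bottleneck
    return [n_input] + n_neurons + n_neurons[-2::-1] + [n_input]


print(layer_sizes(10, [128, 128, 2]))  # [10, 128, 128, 2, 128, 128, 10]
```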
- activation_functions#
List of activation function names as implemented in TensorFlow. For example: “relu”, “tanh”, “sigmoid” or “” to use no activation function. The encoder part of the network takes the activation functions from the list starting with the second element. The decoder part of the network takes the activation functions in reversed order starting with the second element from the back. For example [“”, “relu”, “tanh”, “”] would result in an autoencoder with {“relu”, “tanh”, “”, “tanh”, “relu”, “”} as the sequence of activation functions.
- Type:
list of str
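The reversal rule for activation_functions can be written as two list slices. activation_sequence is a hypothetical name used only for illustration; it reproduces the worked example above.

```python
from typing import List


def activation_sequence(activation_functions: List[str]) -> List[str]:
    """Order in which the activations are applied, encoder first, then decoder."""
    encoder = activation_functions[1:]      # start with the second element
    decoder = activation_functions[-2::-1]  # reversed, start with the second element from the back
    return encoder + decoder


print(activation_sequence(["", "relu", "tanh", ""]))  # ['relu', 'tanh', '', 'tanh', 'relu', '']
```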
- periodicity#
Defines the distance between periodic walls for the inputs. For example 2pi for angular values in radians. All periodic data processed by EncoderMap must be wrapped to one periodic window. E.g. data with 2pi periodicity may contain values from -pi to pi or from 0 to 2pi. Set the periodicity to float(“inf”) for non-periodic inputs.
- Type:
float
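Wrapping data into one periodic window, as required for periodicity, can be done with a modulo operation. The helper below is a hypothetical sketch: the lower bound of the window is a free choice (-pi for [-pi, pi), 0 for [0, 2pi)), and an infinite periodicity leaves values untouched.

```python
import math


def wrap(x: float, periodicity: float = 2 * math.pi, lower: float = -math.pi) -> float:
    """Map x into the half-open window [lower, lower + periodicity)."""
    if math.isinf(periodicity):
        return x  # non-periodic input: leave as is
    return (x - lower) % periodicity + lower


print(wrap(3 * math.pi / 2))     # about -pi/2, window [-pi, pi)
print(wrap(-1.0, lower=0.0))     # about 2*pi - 1, window [0, 2pi)
print(wrap(42.0, float("inf")))  # 42.0
```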
- learning_rate#
Learning rate used by the optimizer.
- Type:
float
- n_steps#
Number of training steps.
- Type:
int
- batch_size#
Number of training points used in each training step.
- Type:
int
- summary_step#
A summary for TensorBoard is written every summary_step steps.
- Type:
int
- checkpoint_step#
A checkpoint is written every checkpoint_step steps.
- Type:
int
- dist_sig_parameters#
Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l)
- Type:
tuple of floats
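Assuming encodermap follows the SketchMap form of the sigmoid, each (sig, a, b) triple in dist_sig_parameters defines s(r) = 1 - (1 + (2^(a/b) - 1)(r/sig)^a)^(-b/a). The sketch below evaluates it with the default high- and low-dimensional values from Parameters.defaults; by construction it equals 0.5 at r = sig, and a and b control how sharply it rises.

```python
def sketchmap_sigmoid(r: float, sig: float, a: float, b: float) -> float:
    """SketchMap-style sigmoid; 0 at r == 0, 0.5 at r == sig, tends to 1 for large r."""
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sig) ** a) ** (-b / a)


# default dist_sig_parameters from Parameters.defaults
sig_h, a_h, b_h, sig_l, a_l, b_l = (4.5, 12, 6, 1, 2, 6)
for r in (0.0, 2.0, 4.5, 8.0):
    # high-dimensional and low-dimensional sigmoid at the same distance r
    print(r, sketchmap_sigmoid(r, sig_h, a_h, b_h), sketchmap_sigmoid(r, sig_l, a_l, b_l))
```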
- distance_cost_scale#
Adjusts how much the distance based metric is weighted in the cost function.
- Type:
int
- auto_cost_scale#
Adjusts how much the autoencoding cost is weighted in the cost function.
- Type:
int
- auto_cost_variant#
Defines how the auto cost is calculated. Must be one of:
* mean_square
* mean_abs
* mean_norm
- Type:
str
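The three auto_cost_variant options differ only in how the per-element reconstruction error is reduced. The sketch below assumes those reductions are the mean of squares, the mean of absolute values, and the mean of per-sample Euclidean norms, and it uses the periodic difference min(d, periodicity - d) so that wrapped angular inputs are compared correctly. It is a standard-library illustration, not encodermap’s implementation.

```python
import math
from typing import List, Sequence


def periodic_diff(a: float, b: float, periodicity: float = float("inf")) -> float:
    """Shortest distance between a and b on a circle of length `periodicity`."""
    d = abs(b - a)
    return min(d, periodicity - d)  # with periodicity=inf this is just |b - a|


def auto_cost(
    inputs: Sequence[Sequence[float]],
    outputs: Sequence[Sequence[float]],
    variant: str = "mean_abs",
    periodicity: float = float("inf"),
) -> float:
    """Reconstruction cost between inputs and outputs (one row per sample)."""
    diffs: List[List[float]] = [
        [periodic_diff(x, y, periodicity) for x, y in zip(row_in, row_out)]
        for row_in, row_out in zip(inputs, outputs)
    ]
    if variant == "mean_square":
        values = [d * d for row in diffs for d in row]
    elif variant == "mean_abs":
        values = [d for row in diffs for d in row]
    elif variant == "mean_norm":
        values = [math.sqrt(sum(d * d for d in row)) for row in diffs]  # one norm per sample
    else:
        raise ValueError(f"unknown auto_cost_variant: {variant!r}")
    return sum(values) / len(values)


x = [[0.0, 0.0], [1.0, 1.0]]
y = [[3.0, 4.0], [1.0, 1.0]]
for variant in ("mean_square", "mean_abs", "mean_norm"):
    print(variant, auto_cost(x, y, variant))  # 6.25, 1.75 and 2.5 respectively
```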
- center_cost_scale#
Adjusts how much the centering cost is weighted in the cost function.
- Type:
float
- l2_reg_constant#
Adjusts how much the L2 regularisation is weighted in the cost function.
- Type:
float
- gpu_memory_fraction#
Specifies the fraction of gpu memory blocked. If set to 0, memory is allocated as needed.
- Type:
float
- analysis_path#
A path that can be used to store analysis results.
- Type:
str
- id#
Can be any name for the run. Might be useful for example for specific analysis for different data sets.
- Type:
str
- model_api#
A string defining the API to be used to build the keras model. Defaults to sequential. Possible strings are:
* functional will use keras’ functional API.
* sequential will define a keras Model containing two other models built with the Sequential API. These two models are encoder and decoder.
* custom will create a custom Model where even the layers are custom.
- Type:
str
- loss#
A string defining the loss function. Defaults to emap_cost. Possible losses are:
* reconstruction_loss will try to train output == input.
* mse: Returns a mean squared error loss.
* emap_cost is the EncoderMap loss function. Depending on the class (Autoencoder, EncoderMap, ADCAutoencoder), different contributions are used for a combined loss. Autoencoder uses auto_cost, reg_cost and center_cost. The EncoderMap class adds sigmoid_loss.
- Type:
str
- batched#
Whether the dataset is batched or not.
- Type:
bool
- training#
A string defining what kind of training is performed when autoencoder.train() is called.
* auto does a regular model.compile() and model.fit() procedure.
* custom uses gradient tape and calculates losses and gradients manually.
- Type:
str
- tensorboard#
Whether to print tensorboard information. Defaults to False.
- Type:
bool
- seed#
Fixes the state of all operations using random numbers. Defaults to None.
- Type:
Union[int, None]
Examples
>>> import encodermap as em
>>> parameters = em.Parameters()
>>> parameters.auto_cost_variant
mean_abs
>>> parameters.save(path='/path/to/dir')
/path/to/dir/parameters.json
>>> # alternative constructor
>>> new_params = em.Parameters.from_file('/path/to/dir/parameters.json')
>>> new_params.main_path
/path/to/dir/parameters.json
- __init__(**kwargs: Optional[Union[float, int, str, bool, list[int], list[str], list[float], tuple[int, None]]]) None [source]#
Instantiate the Parameters class
Takes a dict as input and overwrites the class defaults. The dict is directly stored as an attribute and can be accessed via instance attributes.
- Parameters:
**kwargs (dict) – Dict containing values. If unknown keys are passed, they will be dropped.
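The kwargs behavior described above (override defaults, drop unknown keys) can be pictured as a dictionary merge. merge_params and DEFAULTS below are hypothetical; DEFAULTS holds only a few entries copied from Parameters.defaults. With encodermap installed, the documented call would simply be em.Parameters(batch_size=64).

```python
import copy
from typing import Any, Dict

# A few entries copied from Parameters.defaults; the real dict has many more.
DEFAULTS: Dict[str, Any] = {"n_neurons": [128, 128, 2], "learning_rate": 0.001, "batch_size": 256}


def merge_params(defaults: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Override defaults with known kwargs; unknown keys are silently dropped."""
    known = {k: v for k, v in kwargs.items() if k in defaults}
    return {**copy.deepcopy(defaults), **known}  # deepcopy so list defaults are not shared


params = merge_params(DEFAULTS, batch_size=64, not_a_parameter=1)
print(params)  # {'n_neurons': [128, 128, 2], 'learning_rate': 0.001, 'batch_size': 64}
```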
- activation_functions: list[str]#
- defaults = {'activation_functions': ['', 'tanh', 'tanh', ''], 'analysis_path': '', 'auto_cost_scale': 1, 'auto_cost_variant': 'mean_abs', 'batch_size': 256, 'batched': True, 'center_cost_scale': 0.0001, 'checkpoint_step': 5000, 'dist_sig_parameters': (4.5, 12, 6, 1, 2, 6), 'distance_cost_scale': 500, 'gpu_memory_fraction': 0, 'id': '', 'l2_reg_constant': 0.001, 'learning_rate': 0.001, 'loss': 'emap_cost', 'model_api': 'sequential', 'n_neurons': [128, 128, 2], 'n_steps': 100000, 'periodicity': 6.283185307179586, 'seed': None, 'summary_step': 10, 'tensorboard': False, 'training': 'auto'}#
- classmethod defaults_description() str [source]#
str: A string that contains tabulated default parameter values.
- n_neurons: list[int]#
- class encodermap.Repository(repo_source='data/repository.yaml', checksum_file='data/repository.md5', ignore_checksums=False, debug=True)[source]#
Bases:
object
Main Class to work with Repositories of MD data and download the data.
This class handles the download of files from a repository source. All data are obtained from a .yaml file (default at data/repository.yaml), which contains trajectory files and topology files organized in a readable manner. With this class the repository.yaml file can be queried using unix-like file patterns. Files can be downloaded on-the-fly (if they already exist, they won’t be downloaded again). Besides single files, full projects can be downloaded and rebuilt.
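Querying with unix-like file patterns can be illustrated with the standard fnmatch module. The file index below is copied from the repo.search() example output in this section; the search helper is hypothetical, and matching against both keys and filenames is an assumption about how Repository.search() behaves.

```python
from fnmatch import fnmatchcase
from typing import Dict

# File index copied from the repo.search('*PFFP_sing*') example output.
files: Dict[str, str] = {
    "PFFP_single_trajectory": "PFFP_single.xtc",
    "PFFP_single_topology": "PFFP_single.gro",
    "PFFP_single_input": "PFFP.mdp",
    "PFFP_single_log": "PFFP.log",
}


def search(pattern: str) -> Dict[str, str]:
    """Return entries whose key or filename matches a unix-like pattern (case-sensitive)."""
    return {k: v for k, v in files.items() if fnmatchcase(k, pattern) or fnmatchcase(v, pattern)}


print(search("*.xtc"))             # {'PFFP_single_trajectory': 'PFFP_single.xtc'}
print(len(search("*PFFP_sing*")))  # 4
```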
- current_path#
Path of the .py file containing this class. If no working directory is given (None), all files will be downloaded to a directory named ‘data’ (will be created) which will be placed in the directory of this .py file.
- Type:
str
- url#
The url to the current repo source.
- Type:
str
- maintainer#
The maintainer of the current repo source.
- Type:
str
- files_dict#
A dictionary summarizing the files in this repo. dict keys are built from ‘project_name’ + ‘filetype’. So for a project called ‘protein_sim’, possible keys are ‘protein_sim_trajectory’, ‘protein_sim_topology’, ‘protein_sim_log’. The values of these keys are all str and they give the actual filename of the files. If ‘protein_sim’ was conducted with GROMACS, these files would be ‘traj_comp.xtc’, ‘confout.gro’ and ‘md.log’.
- Type:
dict
- files#
Just a list of str of all downloadable files.
- Type:
list
- data#
The main organization of the repository. This is the complete .yaml file as it was read and returned by pyyaml.
- Type:
dict
Examples
>>> import encodermap as em
>>> repo = em.Repository()
>>> print(repo.search('*PFFP_sing*'))
{'PFFP_single_trajectory': 'PFFP_single.xtc', 'PFFP_single_topology': 'PFFP_single.gro', 'PFFP_single_input': 'PFFP.mdp', 'PFFP_single_log': 'PFFP.log'}
>>> print(repo.url)
http://134.34.112.158
- __init__(repo_source='data/repository.yaml', checksum_file='data/repository.md5', ignore_checksums=False, debug=True)[source]#
Initialize the repository.
- Parameters:
repo_source (str) – The source .yaml file to build the repository from. Defaults to ‘data/repository.yaml’.
checksum_file (str) – A file containing the md5 hash of the repository file. This ensures no one tampers with the repository.yaml file and injects malicious code. Defaults to ‘data/repository.md5’.
ignore_checksums (bool) – If you want to ignore the checksum check of the repo_source file, set this to True. Can be useful for development, when the repository.yaml file undergoes a lot of changes. Defaults to False.
debug (bool, optional) – Whether to print debug info. Defaults to True.
- static _split_proj_filetype(proj_filetype)[source]#
Splits the strings that index the self.datasets dictionary.
- property catalogue#
Returns the underlying catalogue data.
- Type:
dict
- property datasets#
A set of datasets in this repository. A dataset can either be characterized by a set of trajectory-, topology-, log- and input-file or a dataset is a .tar.gz container, which contains all necessary files.
- Type:
set
- fetch(remote_filenames, working_directory=None, overwrite=False, max_attempts=3, makdedir=False, progress_bar=True)[source]#
Fetches a single file from self.files.
Also displays a progress bar with the name of the file. Uses requests.
- Parameters:
remote_filenames (str) – The name of the remote file. Check self.files for more info.
working_directory (Union[str, None], optional) – A directory in which to save the files. If None, self.current_path + ‘/data’ is used, where self.current_path is retrieved via inspect.getfile(inspect.currentframe()). If the files already exist there and overwrite is False, the file path is simply returned. Defaults to None.
overwrite (bool, optional) – Whether to overwrite local files. Defaults to False.
max_attempts (int, optional) – Number of download attempts. Defaults to 3.
makdedir (bool, optional) – Whether to create working_directory, if it is not already existing. Defaults to False.
progress_bar (bool, optional) – Whether to display a progress bar using the package progress-reporter. Defaults to True.
- Returns:
- A tuple containing the following:
list: A list of files that have just been downloaded.
str: The path of the directory the files have been downloaded to.
- Return type:
tuple
- get_sizes(pattern)[source]#
Returns a list of file-sizes of a given pattern.
- Parameters:
pattern (Union[str, list]) – A unix-like pattern (‘traj*.xtc’) or a list of files ([‘traj_1.xtc’, ‘traj_2.xtc’]).
- Returns:
A list of filesizes in bytes.
- Return type:
list
- load_project(project, working_directory=None, overwrite=False, max_attempts=3, makdedir=False, progress_bar=True)[source]#
Returns TrajEnsemble / SingleTraj objects that are correctly formatted.
This method allows one to directly rebuild projects from the repo source, using encodermap’s own SingleTraj and TrajEnsemble classes.
- Parameters:
project (str) – The name of the project to be loaded. See Repository.projects.keys() for a list of projects.
working_directory (Union[str, None], optional) – A directory in which to save the files. If None, self.current_path + ‘/data’ is used, where self.current_path is retrieved via inspect.getfile(inspect.currentframe()). If the files already exist there and overwrite is False, the file path is simply returned. Defaults to None.
overwrite (bool, optional) – Whether to overwrite local files. Defaults to False.
max_attempts (int, optional) – Number of download attempts. Defaults to 3.
makdedir (bool, optional) – Whether to create working_directory, if it is not already existing. Defaults to False.
progress_bar (bool, optional) – Whether to display a progress bar using the package progress-reporter. Defaults to True.
- Returns:
- The project, already loaded into encodermap’s SingleTraj or TrajEnsemble classes.
- Return type:
Union[encodermap.SingleTraj, encodermap.TrajEnsemble]
Examples
>>> import encodermap as em
>>> repo = em.Repository()
>>> trajs = repo.load_project('Tetrapeptides_Single')
>>> print(trajs)
encodermap.TrajEnsemble object. Current backend is no_load. Containing 2 trajs. Common str is ['PFFP', 'FPPF']. Not containing any CVs.
>>> print(trajs.n_trajs)
2
- lookup(file)[source]#
Piece of code to allow some compatibility with mdshare.
The complete self.data dictionary will be traversed to find file and its location in the self.data dictionary. This will be used to get the filesize and its md5 hash. The returned tuple also tells whether the file is a .tar.gz container or not. In the case of a container, the container needs to be extracted using tarfile.
- Parameters:
file (str) – The file to search for.
- Returns:
- A tuple containing the following:
str: A string that is either ‘container’ or ‘index’ (for normal files).
dict: A dict with dict(file=filename, hash=filehash, size=filesize)
- Return type:
tuple
- property projects#
A dictionary containing project names and their associated files. Projects are a larger collection of individual sims, that belong together. The project names are the dictionary’s keys, the files are given as lists in the dict’s values.
- Type:
dict
- stack(pattern)[source]#
Creates a stack to prepare for downloads.
- Parameters:
pattern (Union[str, list]) – A unix-like pattern (‘traj*.xtc’) or a list of files ([‘traj_1.xtc’, ‘traj_2.xtc’]).
- Returns:
- A list of dicts. Each dict contains filename, size and a boolean value telling whether the downloaded file needs to be extracted after downloading.
- Return type:
list
- encodermap.load(trajs: Union[str, md.Trajectory, Sequence[str], Sequence[md.Trajectory]], tops: Optional[Union[str, md.Topology, Sequence[str], Sequence[md.Topology]]] = None, common_str: Optional[str, list[str]] = None, backend: Literal['no_load', 'mdtraj'] = 'no_load', index: Optional[Union[int, np.ndarray, list[int], slice]] = None, traj_num: Optional[int] = None, basename_fn: Optional[Callable] = None) Union[SingleTraj, TrajEnsemble] [source]#
Encodermap’s forward facing function to work with MD data of single or more trajectories.
Depending on what is provided for trajs, you either get a SingleTraj object, which collects information about a single traj, or a TrajEnsemble object, which contains information about multiple trajectories (even with different topologies).
- Parameters:
trajs (Union[str, md.Trajectory, Sequence[str], Sequence[md.Trajectory], Sequence[SingleTraj]]) – Here, you can provide a single string pointing to a trajectory on your computer (/path/to/traj_file.xtc) or (/path/to/protein.pdb) or a list of such strings. In the former case, you will get a SingleTraj object, which is encodermap’s way of storing data (positions, CVs, times) of a single trajectory. In the latter case, you will get a TrajEnsemble object, which is encodermap’s way of working with multiple SingleTrajs.
tops (Optional[Union[str, md.Topology, Sequence[str], Sequence[md.Topology]]]) – For this argument, you can provide the topology(ies) of the corresponding traj(s). Trajectory file formats like .xtc and .dcd only store atomic positions, not weights, elements, or bonds; that is what the tops argument is for. Some trajectory file formats (MDTraj HDF5, AMBER netCDF4) store both trajectory and topology in a single file, and a .pdb file can also be used this way. If you provide such files for trajs, you can leave tops as None. If you provide multiple files for trajs, you can still provide a single tops file if the trajs in trajs share the same topology. If that is not the case, you can either provide a list of topologies matched to the trajs in trajs, or use the common_str argument to match them. Defaults to None.
common_str (Optional[str, list[str]]) –
If you provided a different number of trajs and tops, this argument is used to match them. Let’s say, you have 5 trajectories of a wild type protein and 5 trajectories of a mutant. If the path to these files is somewhat consistent (e.g:
/path/to/wt/traj1.xtc
/different/path/to/wt/traj_no_water.xtc
…
/data/path/to/mutant/traj0.xtc
/data/path/to/mutant/traj0.xtc
), you can provide [‘wt’, ‘mutant’] for the common_str argument and the files are grouped based on the occurrence of ‘wt’ and ‘mutant’ in their filepaths. Defaults to None.
backend (Literal["no_load", "mdtraj"]) – Normally, encodermap postpones the actual loading of the atomic positions until you really need them. This accelerates the handling of large trajectory ensembles. If ‘mdtraj’ is chosen as the backend, all atomic positions are always loaded, taking up system memory, but non-sequential access to positions is faster. Defaults to ‘no_load’.
index (Optional[Union[int, np.ndarray, list[int], slice]]) –
Only used if argument trajs is a single trajectory. This argument can be used to index the trajectory data. If you want to exclude the first 100 frames of your trajectory, because the protein relaxes from its crystal structure, you can load it like so:
em.load(traj_file, top_file, index=slice(100, None))
As encodermap lazily evaluates positional data, the slice(100, None) argument is stored until the data is accessed, at which point the first 100 frames are not accessible, just as if you had deleted them. Besides a slice, you can also provide an int (which returns a single frame at the requested index) or a list of ints (which returns the frames at the indices in the list). If None is provided, the trajectory data is not sliced/subsampled. Defaults to None.
traj_num (Optional[int]) –
Only used if argument trajs is a single trajectory. This argument is meant to organize the SingleTraj trajectories in a TrajEnsemble class. You can also build your own TrajEnsemble from a list of SingleTrajs by providing this list as the trajs argument to em.load(). In this case, you need to set the traj_num of each SingleTraj yourself. Defaults to None.
basename_fn (Optional[Callable]) – A function to apply to the traj_file string to return the basename of the trajectory. If None is provided, the filename without extension will be used. When all files are named the same and the folder they are in defines the name of the trajectory, you can supply lambda x: x.split(‘/’)[-2] as this argument. Defaults to None.
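The index, basename_fn and common_str arguments rely on plain Python semantics that are easy to get wrong. The sketch below illustrates each with the standard library, using the example paths from the common_str description. default_basename, folder_basename and group_by_common_str are hypothetical helpers, and the plain substring test in group_by_common_str is an assumption about how encodermap matches common_str.

```python
import os
from typing import Dict, List, Sequence

# index: slice(100, None) skips the first 100 frames, slice(100) keeps only them.
frames = list(range(300))
kept = frames[slice(100, None)]
first_only = frames[slice(100)]


def default_basename(path: str) -> str:
    """Filename without extension (the documented default for basename_fn)."""
    return os.path.splitext(os.path.basename(path))[0]


def folder_basename(path: str) -> str:
    """Same as lambda x: x.split('/')[-2]: name a trajectory after its folder."""
    return path.split("/")[-2]


def group_by_common_str(paths: Sequence[str], common_str: Sequence[str]) -> Dict[str, List[str]]:
    """Assign each path to the first common_str entry it contains."""
    groups: Dict[str, List[str]] = {s: [] for s in common_str}
    for p in paths:
        for s in common_str:
            if s in p:
                groups[s].append(p)
                break
    return groups


paths = [
    "/path/to/wt/traj1.xtc",
    "/different/path/to/wt/traj_no_water.xtc",
    "/data/path/to/mutant/traj0.xtc",
]
print(default_basename(paths[0]), folder_basename(paths[0]))  # traj1 wt
print(group_by_common_str(paths, ["wt", "mutant"]))
```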
Examples
>>> # load a pdb file with 14 frames from rcsb.org
>>> import encodermap as em
>>> traj = em.load("https://files.rcsb.org/view/1GHC.pdb")
>>> print(traj)
encodermap.SingleTraj object. Current backend is no_load. Basename is 1GHC. Not containing any CVs.
>>> traj.n_frames
14
>>> # load multiple trajs
>>> trajs = em.load(['https://files.rcsb.org/view/1YUG.pdb', 'https://files.rcsb.org/view/1YUF.pdb'])
>>> # trajs are internally numbered
>>> print([traj.traj_num for traj in trajs])