API#
Here follows a complete collection of EncoderMap’s low-level functions and classes. Most of them are cross-linked with the documentation in the User Guide.
Subpackages#
- encodermap.autoencoder package
- Submodules
- encodermap.autoencoder.autoencoder module
AngleDihedralCartesianEncoderMap
Autoencoder
train_data
p
dataset
read_only
metrics
callbacks
encoder
decoder
loss
from_checkpoint
add_images_to_tensorboard
train
plot_network
encode
decode
generate
add_callback
add_images_to_tensorboard
add_loss
add_metric
close
decode
decoder
encode
encoder
from_checkpoint
generate
plot_network
save
set_train_data
train
DihedralEncoderMap
EncoderMap
- Module contents
- encodermap.callbacks package
- Submodules
- encodermap.callbacks.callbacks module
- encodermap.callbacks.metrics module
- Module contents
- encodermap.encodermap_tf1 package
- Submodules
- encodermap.encodermap_tf1.angle_dihedral_cartesian_encodermap module
- encodermap.encodermap_tf1.autoencoder module
- encodermap.encodermap_tf1.backmapping module
- encodermap.encodermap_tf1.encodermap module
- encodermap.encodermap_tf1.misc module
- encodermap.encodermap_tf1.moldata module
- encodermap.encodermap_tf1.parameters module
- encodermap.encodermap_tf1.plot module
- Module contents
- encodermap.loading package
- Submodules
- encodermap.loading.delayed module
- encodermap.loading.features module
AlignFeature
AllBondDistances
AllCartesians
AngleFeature
BackboneTorsionFeature
CentralAngles
CentralBondDistances
CentralCartesians
CentralDihedrals
ContactFeature
CustomFeature
DihedralFeature
DistanceFeature
GroupCOMFeature
InverseDistanceFeature
MinRmsdFeature
ResidueCOMFeature
ResidueMinDistanceFeature
SelectionFeature
SideChainAngles
SideChainBondDistances
SideChainCartesians
SideChainDihedrals
SideChainTorsions
- encodermap.loading.featurizer module
- Module contents
- encodermap.loss_functions package
- encodermap.misc package
- Submodules
- encodermap.misc.backmapping module
- encodermap.misc.clustering module
- encodermap.misc.distances module
- encodermap.misc.function_def module
- encodermap.misc.misc module
- encodermap.misc.rotate module
- encodermap.misc.saving_loading_models module
- encodermap.misc.summaries module
- encodermap.misc.xarray module
- encodermap.misc.xarray_save_wrong_hdf5 module
- Module contents
- encodermap.models package
- encodermap.moldata package
- encodermap.parameters package
- Submodules
- encodermap.parameters.parameters module
ADCParameters
track_clashes
track_RMSD
cartesian_pwd_start
cartesian_pwd_stop
cartesian_pwd_step
use_backbone_angles
use_sidechains
angle_cost_scale
angle_cost_variant
angle_cost_reference
dihedral_cost_scale
dihedral_cost_variant
dihedral_cost_reference
side_dihedral_cost_scale
side_dihedral_cost_variant
side_dihedral_cost_reference
cartesian_cost_scale
cartesian_cost_scale_soft_start
cartesian_cost_variant
cartesian_cost_reference
cartesian_dist_sig_parameters
cartesian_distance_cost_scale
multimer_training
multimer_topology_classes
multimer_connection_bridges
multimer_lengths
reconstruct_sidechains
_defaults
defaults_description
Parameters
defaults
main_path
n_neurons
activation_functions
periodicity
learning_rate
n_steps
batch_size
summary_step
checkpoint_step
dist_sig_parameters
distance_cost_scale
auto_cost_scale
auto_cost_variant
center_cost_scale
l2_reg_constant
gpu_memory_fraction
analysis_path
id
model_api
loss
batched
training
tensorboard
seed
current_training_step
write_summary
trainable_dense_to_sparse
using_hypercube
_defaults
defaults_description
- Module contents
- encodermap.plot package
- Submodules
- encodermap.plot.dashboard module
- encodermap.plot.interactive_plotting module
- encodermap.plot.jinja_template module
- encodermap.plot.plotting module
- encodermap.plot.utils module
- Module contents
- encodermap.trajinfo package
- Submodules
- encodermap.trajinfo.info_all module
TrajEnsemble
CVs
_CVs
n_trajs
n_frames
locations
top
basenames
name_arr
CVs
CVs_in_file
_CVs
basenames
batch_iterator
cluster
copy
dash_summary
del_CVs
del_featurizer
featurizer
frames
from_dataset
from_textfile
get_single_frame
id
index_arr
iterframes
itertrajs
join
load_CVs
load_custom_topology
load_trajs
locations
n_frames
n_residues
n_trajs
name_arr
parse_clustal_w_alignment
save
save_CVs
sidechain_info
split_into_frames
stack
subsample
tf_dataset
to_alignment_query
to_dataframe
top
top_files
traj_files
traj_joined
traj_nums
trajs_by_common_str
trajs_by_top
trajs_by_traj_num
tsel
unload
with_overwrite_trajnums
xyz
- encodermap.trajinfo.info_single module
SingleTraj
backend
common_str
index
traj_num
traj_file
top_file
CVs
CVs_in_file
_frames
_mdtraj_attr
_n_frames_base_h5_file
_original_frame_indices
_traj
atom_slice
basename
copy
dash_summary
del_CVs
extension
featurizer
from_pdb_id
fsel
get_single_frame
id
indices_chi1
indices_chi2
indices_chi3
indices_chi4
indices_chi5
indices_omega
indices_phi
indices_psi
iterframes
join
load_CV
load_custom_topology
load_traj
n_atoms
n_chains
n_frames
n_residues
save
save_CV_as_numpy
select
show_traj
sidechain_info
stack
superpose
top
top_file
traj
traj_file
unload
- encodermap.trajinfo.load_traj module
- encodermap.trajinfo.trajinfo_utils module
CustomTopology
add_amino_acid_codes
add_bonds
add_new_residue
atom_sequence
backbone_sequence
combine_chains
from_dict
from_hdf5_file
from_json
from_yaml
get_single_residue_atom_ids
indices_chi1
indices_chi2
indices_chi3
indices_chi4
indices_chi5
indices_omega
indices_phi
indices_psi
new_residues
sidechain_indices_by_residue
sidechain_sequence
to_dict
to_hdf_file
to_json
to_yaml
top
load_CVs_ensembletraj
load_CVs_singletraj
- Module contents
Submodules#
encodermap._typing module#
Typing for the encodermap package
encodermap._version module#
Encodermap’s versioning follows semantic versioning guidelines. Read more about them here: https://semver.org/
tldr: Given a version number MAJOR.MINOR.PATCH, increment the:
MAJOR version when you make incompatible API changes,
MINOR version when you add functionality in a backwards compatible manner, and
PATCH version when you make backwards compatible bug fixes.
Additional labels for pre-release and build metadata are available as extensions to the MAJOR.MINOR.PATCH format.
Current example: I am currently writing this documentation. This neither breaks the API, nor adds functionality, nor fixes bugs. Thus, the version stays at 3.0.0.
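The increment rules above can be sketched in a few lines. This is only an illustration of the semantic versioning scheme; `bump_version` is a hypothetical helper and not part of EncoderMap or its versioning machinery.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with the requested part incremented (hypothetical helper)."""
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Incompatible API change: MINOR and PATCH are reset.
        return f"{major + 1}.0.0"
    if part == "minor":
        # Backwards-compatible functionality: PATCH is reset.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backwards-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


print(bump_version("3.0.0", "minor"))  # 3.1.0
```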
- get_config()[source]#
Create, populate and return the VersioneerConfig() object.
- Return type:
VersioneerConfig
- git_pieces_from_vcs(tag_prefix, root, verbose, runner=<function run_command>)[source]#
Get version from ‘git describe’ in the root of the source tree.
This only gets called if the git-archive ‘subst’ keywords were not expanded, and _version.py hasn’t already been rewritten with a short version string, meaning we’re inside a checked out source tree.
- git_versions_from_keywords(keywords, tag_prefix, verbose)[source]#
Get version information from git keywords.
- pep440_split_post(ver)[source]#
Split pep440 version string at the post-release segment.
Returns the release segments before the post-release and the post-release version number (or -1 if no post-release segment is present).
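A minimal sketch of the documented behavior, assuming a plain `X.Y.Z.postN` string with nothing after the post-release number. `split_post` is a hypothetical stand-in written from the description above, not the function's actual source.

```python
from __future__ import annotations


def split_post(ver: str) -> tuple[str, int]:
    """Split 'X.Y.Z.postN' into ('X.Y.Z', N); N is -1 without a post segment."""
    release, separator, post = ver.partition(".post")
    if not separator:
        # No post-release segment present.
        return release, -1
    # Assumption: a bare '.post' without digits counts as post-release 0.
    return release, int(post or 0)


print(split_post("1.2.3.post4"))  # ('1.2.3', 4)
print(split_post("1.2.3"))        # ('1.2.3', -1)
```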
- register_vcs_handler(vcs, method)[source]#
Create decorator to mark a method as the handler of a VCS.
- render_git_describe(pieces)[source]#
TAG[-DISTANCE-gHEX][-dirty].
Like ‘git describe --tags --dirty --always’.
Exceptions: 1: no tags. HEX[-dirty] (note: no ‘g’ prefix)
- render_git_describe_long(pieces)[source]#
TAG-DISTANCE-gHEX[-dirty].
Like ‘git describe --tags --dirty --always --long’. The distance/hash is unconditional.
Exceptions: 1: no tags. HEX[-dirty] (note: no ‘g’ prefix)
- render_pep440(pieces)[source]#
Build up version string, with post-release “local version identifier”.
Our goal: TAG[+DISTANCE.gHEX[.dirty]] . Note that if you get a tagged build and then dirty it, you’ll get TAG+0.gHEX.dirty
Exceptions: 1: no tags. git_describe was just HEX. 0+untagged.DISTANCE.gHEX[.dirty]
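The format string above can be illustrated with a small sketch. The function name and its four arguments are hypothetical (the real function receives a `pieces` mapping whose layout is not shown on this page); only the output format follows the description.

```python
def render_local_version(tag, distance, short_hash, dirty):
    """Build TAG[+DISTANCE.gHEX[.dirty]] from already-parsed git information."""
    if tag:
        rendered = tag
        if distance or dirty:
            # Commits after the tag, or local modifications, add a local version part.
            rendered += f"+{distance}.g{short_hash}"
            if dirty:
                rendered += ".dirty"
        return rendered
    # Exception 1: no tags at all.
    rendered = f"0+untagged.{distance}.g{short_hash}"
    if dirty:
        rendered += ".dirty"
    return rendered


print(render_local_version("3.0.0", 0, "abc1234", True))  # 3.0.0+0.gabc1234.dirty
```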
- render_pep440_branch(pieces)[source]#
TAG[[.dev0]+DISTANCE.gHEX[.dirty]] .
The “.dev0” means not master branch. Note that .dev0 sorts backwards (a feature branch will appear “older” than the master branch).
Exceptions: 1: no tags. 0[.dev0]+untagged.DISTANCE.gHEX[.dirty]
- render_pep440_old(pieces)[source]#
TAG[.postDISTANCE[.dev0]] .
The “.dev0” means dirty.
Exceptions: 1: no tags. 0.postDISTANCE[.dev0]
- render_pep440_post(pieces)[source]#
TAG[.postDISTANCE[.dev0]+gHEX] .
The “.dev0” means dirty. Note that .dev0 sorts backwards (a dirty tree will appear “older” than the corresponding clean one), but you shouldn’t be releasing software with -dirty anyways.
Exceptions: 1: no tags. 0.postDISTANCE[.dev0]
- render_pep440_post_branch(pieces)[source]#
TAG[.postDISTANCE[.dev0]+gHEX[.dirty]] .
The “.dev0” means not master branch.
Exceptions: 1: no tags. 0.postDISTANCE[.dev0]+gHEX[.dirty]
- render_pep440_pre(pieces)[source]#
TAG[.postN.devDISTANCE] – No -dirty.
Exceptions: 1: no tags. 0.post0.devDISTANCE
- run_command(commands, args, cwd=None, verbose=False, hide_stderr=False, env=None)[source]#
Call the given command(s).
- versions_from_parentdir(parentdir_prefix, root, verbose)[source]#
Try to determine the version from the parent directory name.
Source tarballs conventionally unpack into a directory that includes both the project name and a version string. We will also support searching up two directory levels for an appropriately named parent directory.
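The directory search described above might look roughly like this. `version_from_parent_dirs` and its `levels` argument are hypothetical names used for illustration; the sketch works on path names only and does not touch the file system.

```python
from __future__ import annotations

from pathlib import Path


def version_from_parent_dirs(root, prefix: str, levels: int = 2) -> str | None:
    """Look at `root` and up to `levels` parents for a directory named PREFIX + VERSION."""
    current = Path(root)
    for _ in range(levels + 1):
        if current.name.startswith(prefix):
            # Everything after the project prefix is taken as the version string.
            return current.name[len(prefix):]
        current = current.parent
    return None


print(version_from_parent_dirs("/work/encodermap-3.0.0/src/pkg", "encodermap-"))  # 3.0.0
```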
encodermap.kondata module#
Functions for interfacing with the University of Konstanz’s repository service KonDATA.
- get_from_kondata(dataset_name, output=None, force_overwrite=False, mk_parentdir=False, silence_overwrite_message=False, tqdm_class=None, download_extra_data=False, download_checkpoints=False, download_h5=True)[source]#
Get dataset from the University of Konstanz’s data repository KonDATA.
- Parameters:
dataset_name (str) – The name of the dataset. Refer to DATASET_URL_MAPPING to get a list of the available datasets.
output (Union[str, Path]) – The output directory.
force_overwrite (bool) – Whether to overwrite existing files. Defaults to False.
mk_parentdir (bool) – Whether to create the output directory if it does not already exist. Defaults to False.
silence_overwrite_message (bool) – Whether to silence the ‘file already exists’ warning. Can be useful in scripts. Defaults to False.
tqdm_class (Optional[Any]) – A class that is similar to tqdm.tqdm. This is mainly useful if this function is used inside a rich.status.Status context manager, as the normal tqdm does not work inside this context. If None is provided, the default tqdm will be used.
download_extra_data (bool) – Whether to download extra data. It is only used if the dataset is not available on KonDATA. Defaults to False.
download_checkpoints (bool) – Whether to download pretrained checkpoints. It is only used if the dataset is not available on KonDATA. Defaults to False.
download_h5 (bool) – Whether to also download an h5 file of the ensemble. Defaults to True.
- Returns:
The output directory.
- Return type:
Module contents#
EncoderMap: Dimensionality reduction for molecular dynamics.
EncoderMap provides a framework for using molecular dynamics data with the tensorflow library. It started as the implementation of a neural network autoencoder to do dimensionality reduction and also create new high-dimensional data from the low-dimensional embedding. The user was still required to create their own dataset and provide the numpy arrays. In the second iteration of EncoderMap, the possibility to provide molecular dynamics data with the MolData class was added. A new neural network architecture was implemented to try and rebuild cartesian coordinates from the low-dimensional embedding.
This iteration of EncoderMap continues this endeavour by porting the old code to the newer tensorflow version (2.x). However, more has been added which should aid computational chemists and also structural biologists:
- New trajectory classes with lazy loading of coordinates to accelerate analysis.
- Featurization which can be parallelized using the distributed computing library dask.
- Interactive plotly plots for clustering and structure creation.
- Neural network building blocks that allow users to easily build new neural networks.
- Sparse networks allow comparison of proteins with different topologies.
Todo
- [ ] Rework all notebooks.
[x] 01 Basic cube
[x] 02 asp7
[x] 03 your data
[ ] customization
[ ] Ensembles and ensemble classes
[ ] Ub mutants
[ ] sidechain reconstruction (if possible)
[ ] FAT10 (if possible)
[ ] Rewrite the install encodermap script in a github gist and add that to the notebooks.
[ ] Record videos.
- [~] Fix FAT 10 Nans
[ ] NaNs are fixed, but training still bad.
- [x] Check whether sigmoid values are good for FAT10
[x] Test [40, 10, 5, 1, 2, 5] (from linear dimers) and compare.
[ ] Test (20, 10, 5, 1, 2, 5)
- [~] Fix sidechain reconstruction NaNs
[ ] Try out LSTM layers
[ ] Try out gradient clipping
[~] Try out a higher regularization cost (increase l2 reg constant from 0.001 to 0.1)
[ ] Remove OTU11 from tests
[ ] Image for FAT10 decoding, if NaN error is fixed.
[ ] Delete commented stuff (i.e. all occurrences of more than 3 # signs in lines)
[ ] Fix the deterministic training for M1diUb
[ ] Add FAT10 to the deterministic training.
- class ADCParameters(**kwargs)[source]#
Bases:
ParametersFramework
This is the parameter object for the AngleDihedralCartesianEncoder. It holds all the parameters that the Parameters object includes, plus the following attributes:
- Parameters:
kwargs (ParametersData)
- track_clashes#
Whether to track the number of clashes during training. The average number of clashes is the average number of distances in the reconstructed cartesian coordinates with a distance smaller than 1 (nm). Defaults to False.
- Type:
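As a rough illustration of the clash definition above, the following pure-Python sketch counts atom pairs closer than 1 nm per structure and averages over a batch. The helper names are hypothetical, and whether EncoderMap counts each pair once or twice is an assumption here (each pair is counted once).

```python
import math
from itertools import combinations


def count_clashes(coords, threshold=1.0):
    """Count atom pairs whose distance is below `threshold` (nm) in one structure."""
    return sum(1 for a, b in combinations(coords, 2) if math.dist(a, b) < threshold)


def mean_clashes(batch, threshold=1.0):
    """Average number of clashes over all structures in a batch."""
    return sum(count_clashes(coords, threshold) for coords in batch) / len(batch)


batch = [
    [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0)],  # one pair closer than 1 nm
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)],  # no clashes
]
print(mean_clashes(batch))  # 0.5
```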
- track_RMSD#
Whether to track the RMSD of the input and reconstructed cartesians during training. The RMSDs are computed along the batch by minimizing the following expression over rotations \mathsf{R} and translations \mathbf{t}:
\text{RMSD}(\mathbf{x}, \mathbf{x}^{\text{ref}}) = \min_{\mathsf{R}, \mathbf{t}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ (\mathsf{R}\cdot\mathbf{x}_{i}(t) + \mathbf{t}) - \mathbf{x}_{i}^{\text{ref}} \right]^{2}}
This results in n RMSD values, where n is the size of the batch. A mean RMSD of this batch and the values for this batch will be logged to tensorboard.
- Type:
- cartesian_pwd_start#
Index of the first atom to use for the pairwise distance calculation.
- Type:
- cartesian_pwd_step#
Step for the calculation of pairwise distances. E.g. for a chain of atoms N-C_a-C-N-C_a-C… cartesian_pwd_start=1 and cartesian_pwd_step=3 will result in using all C-alpha atoms for the pairwise distance calculation.
- Type:
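The C-alpha example above behaves like ordinary Python slicing. The sketch below assumes that cartesian_pwd_start, cartesian_pwd_stop and cartesian_pwd_step are applied as a `slice(start, stop, step)` along the atom axis; `select_atoms` is a hypothetical helper.

```python
def select_atoms(atoms, start=None, stop=None, step=None):
    """Pick the atoms that enter the pairwise distance calculation."""
    return atoms[slice(start, stop, step)]


# Backbone of four residues: N, C-alpha (CA), C repeated.
backbone = ["N", "CA", "C"] * 4
selected = select_atoms(backbone, start=1, step=3)
print(selected)  # ['CA', 'CA', 'CA', 'CA']
```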
- use_backbone_angles#
Defines whether backbone bond angles should be learned (True) or if instead mean values should be used to generate conformations (False).
- Type:
- angle_cost_variant#
Defines how the angle cost is calculated. Must be one of:
“mean_square”
“mean_abs”
“mean_norm”.
- Type:
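The three variants can be sketched in plain Python. The exact definitions are not given on this page, so the following is an assumption: differences are taken on a periodic domain, "mean_square" and "mean_abs" average over all values, and "mean_norm" averages the Euclidean norm of each sample's difference vector. The same sketch would apply to the dihedral, side dihedral and cartesian cost variants.

```python
import math


def periodic_difference(a, b, periodicity=2 * math.pi):
    """Smallest absolute difference between two values on a periodic domain."""
    d = abs(a - b) % periodicity
    return min(d, periodicity - d)


def cost(inputs, outputs, variant="mean_abs", periodicity=2 * math.pi):
    """Compare two batches (lists of rows) with one of the three cost variants."""
    diffs = [
        [periodic_difference(x, y, periodicity) for x, y in zip(row_in, row_out)]
        for row_in, row_out in zip(inputs, outputs)
    ]
    flat = [d for row in diffs for d in row]
    if variant == "mean_square":
        return sum(d * d for d in flat) / len(flat)
    if variant == "mean_abs":
        return sum(flat) / len(flat)
    if variant == "mean_norm":
        # Assumed definition: mean over samples of the per-sample Euclidean norm.
        return sum(math.sqrt(sum(d * d for d in row)) for row in diffs) / len(diffs)
    raise ValueError(f"unknown variant: {variant!r}")


print(cost([[0.0, 0.0]], [[3.0, 4.0]], "mean_norm", periodicity=100.0))  # 5.0
```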
- angle_cost_reference#
Can be used to normalize the angle cost with the cost of some reference model (dummy).
- Type:
- dihedral_cost_variant#
Defines how the dihedral cost is calculated. Must be one of:
“mean_square”
“mean_abs”
“mean_norm”.
- Type:
- dihedral_cost_reference#
Can be used to normalize the dihedral cost with the cost of some reference model (dummy).
- Type:
- side_dihedral_cost_scale#
Adjusts how much the side dihedral cost is weighted in the cost function.
- Type:
- side_dihedral_cost_variant#
Defines how the side dihedral cost is calculated. Must be one of:
“mean_square”
“mean_abs”
“mean_norm”.
- Type:
- side_dihedral_cost_reference#
Can be used to normalize the side dihedral cost with the cost of some reference model (dummy).
- Type:
- cartesian_cost_scale#
Adjusts how much the cartesian cost is weighted in the cost function.
- Type:
- cartesian_cost_scale_soft_start#
Allows the cartesian cost to be turned on slowly. Must be a tuple with (start, end) or (None, None). If start and end are given, cartesian_cost_scale will be increased linearly in the given range.
- Type:
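A minimal sketch of such a linear ramp, assuming the scale is zero before `start` and equal to cartesian_cost_scale after `end`. `soft_start_scale` is a hypothetical helper and not the callback EncoderMap uses internally.

```python
def soft_start_scale(step, scale, soft_start=(None, None)):
    """Return the effective cartesian cost scale at a given training step."""
    start, end = soft_start
    if start is None or end is None:
        # No soft start requested: the full scale applies from the first step.
        return float(scale)
    if step <= start:
        return 0.0
    if step >= end:
        return float(scale)
    # Linear interpolation between start and end.
    return scale * (step - start) / (end - start)


print(soft_start_scale(9, 1, (6, 12)))  # 0.5
```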
- cartesian_cost_variant#
Defines how the cartesian cost is calculated. Must be one of:
“mean_square”
“mean_abs”
“mean_norm”.
- Type:
- cartesian_cost_reference#
Can be used to normalize the cartesian cost with the cost of some reference model (dummy).
- Type:
- cartesian_dist_sig_parameters#
Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l).
- Type:
tuple of floats
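To make the role of the six values more concrete, here is a sketch that assumes the sketch-map-style sigmoid 1 - (1 + (2^(a/b) - 1) (r/sig)^a)^(-b/a). The functional form is not stated on this page, so treat it as an assumption; under it, sig is the distance at which the sigmoid reaches 0.5.

```python
def sigmoid(r, sig, a, b):
    """Sketch-map-style sigmoid (assumed form); equals 0.5 at r == sig."""
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sig) ** a) ** (-b / a)


# Default order: (sig_h, a_h, b_h, sig_l, a_l, b_l)
sig_h, a_h, b_h, sig_l, a_l, b_l = (4.5, 12, 6, 1, 2, 6)

high_d = sigmoid(4.5, sig_h, a_h, b_h)  # high-dimensional distance at the inflection point
low_d = sigmoid(1.0, sig_l, a_l, b_l)   # low-dimensional distance at the inflection point
print(round(high_d, 6), round(low_d, 6))  # 0.5 0.5
```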
- cartesian_distance_cost_scale#
Adjusts how much the cartesian distance cost is weighted in the cost function.
- Type:
- multimer_training#
Experimental feature.
- Type:
Any
- multimer_topology_classes#
Experimental feature.
- Type:
Any
- multimer_connection_bridges#
Experimental feature.
- Type:
Any
- multimer_lengths#
Experimental feature.
- Type:
Any
Examples
>>> import encodermap as em
>>> import tempfile
>>> from pathlib import Path
...
>>> with tempfile.TemporaryDirectory() as td:
...     td = Path(td)
...     p = em.Parameters()
...     print(p.auto_cost_variant)
...     savepath = p.save(td / "parameters.json")
...     print(savepath)
...     new_params = em.Parameters.from_file(td / "parameters.json")
...     print(new_params.main_path)
mean_abs
/tmp...parameters.json
seems like the parameter file was moved to another directory. Parameter file is updated ...
/home...
- _defaults = {'activation_functions': ['', 'tanh', 'tanh', ''], 'analysis_path': '', 'angle_cost_reference': 1, 'angle_cost_scale': 0, 'angle_cost_variant': 'mean_abs', 'auto_cost_scale': None, 'auto_cost_variant': 'mean_abs', 'batch_size': 256, 'batched': True, 'cartesian_cost_reference': 1, 'cartesian_cost_scale': 1, 'cartesian_cost_scale_soft_start': (None, None), 'cartesian_cost_variant': 'mean_abs', 'cartesian_dist_sig_parameters': (4.5, 12, 6, 1, 2, 6), 'cartesian_distance_cost_scale': 1, 'cartesian_pwd_start': None, 'cartesian_pwd_step': None, 'cartesian_pwd_stop': None, 'center_cost_scale': 0.0001, 'checkpoint_step': 5000, 'current_training_step': 0, 'dihedral_cost_reference': 1, 'dihedral_cost_scale': 1, 'dihedral_cost_variant': 'mean_abs', 'dist_sig_parameters': (4.5, 12, 6, 1, 2, 6), 'distance_cost_scale': None, 'gpu_memory_fraction': 0, 'id': '', 'l2_reg_constant': 0.001, 'learning_rate': 0.001, 'loss': 'emap_cost', 'model_api': 'functional', 'multimer_connection_bridges': None, 'multimer_lengths': None, 'multimer_topology_classes': None, 'multimer_training': None, 'n_neurons': [128, 128, 2], 'n_steps': 1000, 'periodicity': 6.283185307179586, 'reconstruct_sidechains': False, 'seed': None, 'side_dihedral_cost_reference': 1, 'side_dihedral_cost_scale': 0.5, 'side_dihedral_cost_variant': 'mean_abs', 'summary_step': 10, 'tensorboard': False, 'track_RMSD': False, 'track_clashes': False, 'trainable_dense_to_sparse': False, 'training': 'auto', 'use_backbone_angles': False, 'use_sidechains': False, 'using_hypercube': False, 'write_summary': False}#
- class AngleDihedralCartesianEncoderMap(trajs=None, parameters=None, model=None, read_only=False, dataset=None, ensemble=False, use_dataset_when_possible=True, deterministic=False)[source]#
Bases:
object
Has a different __init__ method than the Autoencoder class. Uses callbacks to tune in the cartesian cost.
Overwritten methods: _set_up_callbacks and generate.
Examples
>>> import encodermap as em
>>> from pathlib import Path
>>> # Load two trajectories
>>> test_data = Path(em.__file__).parent.parent / "tests/data"
>>> test_data.is_dir()
True
>>> xtcs = [test_data / "1am7_corrected_part1.xtc", test_data / "1am7_corrected_part2.xtc"]
>>> tops = [test_data / "1am7_protein.pdb", test_data /"1am7_protein.pdb"]
>>> trajs = em.load(xtcs, tops)
>>> print(trajs)
encodermap.TrajEnsemble object. Current backend is no_load. Containing 2 trajectories. Not containing any CVs.
>>> # load CVs
>>> # This step can be omitted. The AngleDihedralCartesianEncoderMap class automatically loads CVs
>>> trajs.load_CVs('all')
>>> print(trajs.CVs['central_cartesians'].shape)
(51, 474, 3)
>>> print(trajs.CVs['central_dihedrals'].shape)
(51, 471)
>>> # create some parameters
>>> p = em.ADCParameters(periodicity=360, use_backbone_angles=True, use_sidechains=True,
...                      cartesian_cost_scale_soft_start=(6, 12))
>>> # Standard is functional model, as it offers more flexibility
>>> print(p.model_api)
functional
>>> print(p.distance_cost_scale)
None
>>> # Instantiate the class
>>> e_map = em.AngleDihedralCartesianEncoderMap(trajs, p, read_only=True)
Model...
>>> # dataset contains these inputs:
>>> # central_angles, central_dihedrals, central_cartesians, central_distances, sidechain_dihedrals
>>> print(e_map.dataset)
<BatchDataset element_spec=(TensorSpec(shape=(None, 472), dtype=tf.float32, name=None), TensorSpec(shape=(None, 471), dtype=tf.float32, name=None), TensorSpec(shape=(None, 474, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None, 473), dtype=tf.float32, name=None), TensorSpec(shape=(None, 316), dtype=tf.float32, name=None))>
>>> # output from the model contains the following data:
>>> # out_angles, out_dihedrals, back_cartesians, pairwise_distances of inp cartesians, pairwise of back-mapped cartesians, out_side_dihedrals
>>> for data in e_map.dataset.take(1):
...     pass
>>> out = e_map.model(data)
>>> print([i.shape for i in out])
[TensorShape([256, 472]), TensorShape([256, 471]), TensorShape([256, 474, 3]), TensorShape([256, 112101]), TensorShape([256, 112101]), TensorShape([256, 316])]
>>> # get output of latent space by providing central_angles, central_dihedrals, sidechain_dihedrals
>>> latent = e_map.encoder([data[0], data[1], data[-1]])
>>> print(latent.shape)
(256, 2)
>>> # Rebuild central_angles, central_dihedrals and sidechain_angles from latent
>>> dih, ang, side_dih = e_map.decode(latent)
>>> print(dih.shape, ang.shape, side_dih.shape)
(256, 472) (256, 471) (256, 316)
- Parameters:
trajs (Optional[TrajEnsemble])
parameters (Optional[ADCParameters])
model (Optional[tf.keras.Model])
read_only (bool)
dataset (Optional[tf.data.Dataset])
ensemble (bool)
use_dataset_when_possible (bool)
deterministic (bool)
- add_images_to_tensorboard(*args, **kwargs)[source]#
Adds images of the latent space to tensorboard.
- Parameters:
data (Optional[Union[np.ndarray, Sequence[np.ndarray]]) – The input-data will be passed through the encoder part of the autoencoder. If None is provided, a set of 10_000 points from self.train_data will be taken. A list[np.ndarray] is needed for the functional API of the AngleDihedralCartesianEncoderMap, that takes a list of [angles, dihedrals, side_dihedrals]. Defaults to None.
image_step (Optional[int]) – The interval in which to plot images to tensorboard. If None is provided, the image_step will be the same as Parameters.summary_step. Defaults to None.
max_size (int) – The maximum size of the high-dimensional data, that is projected. Prevents excessively large-datasets from being projected at every image_step. Defaults to 10_000.
scatter_kws (Optional[dict[str, Any]]) – A dict with items that plotly.express.scatter() will accept. If None is provided, a dict with size 20 will be passed to px.scatter(**{‘size_max’: 10, ‘opacity’: 0.2}), which sets an appropriate size of scatter points for the size of datasets encodermap is usually used for.
hist_kws (Optional[dict[str, Any]]) – A dict with items that encodermap.plot.plotting._plot_free_energy() will accept. If None is provided a dict with bins 50 will be passed to encodermap.plot.plotting._plot_free_energy(**{‘bins’: 50}). You can choose a colormap here by providing {‘bins’: 50, ‘cmap’: ‘plasma’} for this argument.
additional_fns (Optional[Sequence[Callable]]) – A list of functions that will accept the low-dimensional output of the Autoencoder latent/bottleneck layer and return a tf.Tensor that can be logged by tf.summary.image(). See the notebook ‘writing_custom_images_to_tensorboard.ipynb’ in tutorials/notebooks_customization for more info. If None is provided, no additional functions will be used to plot to tensorboard. Defaults to None.
when (Literal["epoch", "batch"]) – When to log the images. Can be either ‘batch’, in which case the images are logged after every step during training, or ‘epoch’, in which case the images are only written after every image_step epochs. Defaults to ‘epoch’.
save_to_disk (bool) – Whether to also write the images to disk.
args (Any)
kwargs (Any)
- Return type:
None
- decode(data)[source]#
Calls the decoder part of the model.
AngleDihedralCartesianEncoderMap outputs a list of np.ndarray.
- Parameters:
data (np.ndarray) – The data to be passed to the decoder part of the model. Make sure that the shape of the data matches the number of neurons in the latent space.
- Returns:
- Outputs from the decoder part.
For AngleDihedralCartesianEncoderMap, this will be a list of np.ndarray.
- Return type:
Union[list[np.ndarray], np.ndarray]
- property decoder: Model#
The decoder Model.
- Type:
tf.keras.Model
- encode(data=None)[source]#
Runs the central_angles, central_dihedrals, (side_dihedrals) through the autoencoder. Make sure that data has the correct shape.
- Parameters:
data (Sequence[np.ndarray]) – Provide a sequence of angles and central_dihedrals. If you used sidechain_dihedrals during training, append these to the end of the sequence.
- Returns:
The latent space representation of the provided data.
- Return type:
np.ndarray
- property encoder: Model#
The encoder Model.
- Type:
tf.keras.Model
- classmethod from_checkpoint(trajs, checkpoint_path, dataset=None, use_previous_model=False, compat=False)[source]#
Reconstructs the model from a checkpoint.
Although the model can be loaded from disk without any form of data and still yield the correct input and output shapes, it is required to provide either trajs or dataset to double-check that the correct model will be reloaded.
This is also why the sparse argument is not needed, as the sparsity of the input data is a property of the provided TrajEnsemble.
- Parameters:
trajs (Union[None, TrajEnsemble]) – Either None (in which case, the argument dataset is required), or an instance of TrajEnsemble, which was used to instantiate the AngleDihedralCartesianEncoderMap, before it was saved to disk.
checkpoint_path (Union[Path, str]) – The path to the checkpoint. Can either be the path to a .keras file or to a directory containing .keras files, in which case the most recently created .keras file will be used.
dataset (Optional[tf.data.Dataset]) – If trajs is not provided, a dataset is required to make sure the input shapes match the model, that is stored on the disk.
use_previous_model (bool) – Set this flag to True, if you load a model from an in-between checkpoint step (e.g., to continue training with different parameters). If you have the files saved_model_0.keras, saved_model_500.keras and saved_model_1000.keras, setting this to True and loading the saved_model_500.keras will back up the saved_model_1000.keras.
compat (bool) – Whether to use compatibility mode when missing or wrong parameter files are present. In this special case, some assumptions about the network architecture are made from the model and the parameters in parameters.json overwritten accordingly (a backup will also be made).
- Returns:
An instance of AngleDihedralCartesianEncoderMap.
- Return type:
AngleDihedralCartesianEncoderMapType
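The rule for checkpoint_path (a .keras file, or a directory from which the most recent .keras file is taken) can be sketched with pathlib. `resolve_checkpoint` is a hypothetical helper, and using the modification time as a stand-in for "most recently created" is an assumption.

```python
import os
import tempfile
from pathlib import Path


def resolve_checkpoint(checkpoint_path):
    """Return the .keras file to load for a file or directory path."""
    path = Path(checkpoint_path)
    if path.is_file():
        return path
    # Directory: sort the contained .keras files by modification time.
    candidates = sorted(path.glob("*.keras"), key=lambda f: f.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"no .keras files found in {path}")
    return candidates[-1]


with tempfile.TemporaryDirectory() as td:
    for offset, step in enumerate((0, 500, 1000)):
        model_file = Path(td) / f"saved_model_{step}.keras"
        model_file.touch()
        # Set explicit, increasing timestamps so the ordering is unambiguous.
        os.utime(model_file, (1_000_000 + offset, 1_000_000 + offset))
    latest_name = resolve_checkpoint(td).name

print(latest_name)  # saved_model_1000.keras
```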
- generate(points: ndarray, top: str | int | Topology | None, backend: Literal['mdtraj'], progbar: Any | None) Trajectory [source]#
- generate(points: ndarray, top: str | int | Topology | None, backend: Literal['mdanalysis'], progbar: Any | None) Universe
Overrides the parent class’ generate method and builds a trajectory.
Instead of just providing data to decode using the decoder part of the network, this method also takes a molecular topology as its top argument. This topology is then used to rebuild a time-resolved trajectory.
- Parameters:
points (np.ndarray) – The low-dimensional points from which the trajectory should be rebuilt.
top (Optional[str, int, mdtraj.Topology]) – The topology to be used for rebuilding the trajectory. This should be a string pointing towards a <*.pdb, *.gro, *.h5> file. Alternatively, None can be provided; in which case, the internal topology (self.top) of this class is used. Defaults to None.
backend (str) –
Defines which MD Python package is used to build the trajectory and, with it, the type this method returns. Needs to be one of the following:
“mdtraj”
“mdanalysis”
- Returns:
- The trajectory after applying the decoded structural information. The type of this depends on the chosen backend parameter.
- Return type:
Union[mdtraj.Trajectory, MDAnalysis.Universe]
- static get_train_data_from_trajs(trajs, p, attr='CVs', max_size=-1)[source]#
Builds train data from a TrajEnsemble.
- Parameters:
trajs (TrajEnsemble) – A TrajEnsemble instance.
p (encodermap.parameters.ADCParameters) – An instance of encodermap.parameters.ADCParameters.
attr (str) – Which attribute to get from TrajEnsemble. This defaults to ‘CVs’, because ‘CVs’ is usually a dict containing the CV data. However, you can build the train data from any dict in the TrajEnsemble.
max_size (int) – When you only want a subset of the CV data. Set this to the desired size.
- Returns:
- A tuple containing the following:
- bool: A bool that shows whether some ‘CV’ values are np.nan (True), which will be used to decide whether the sparse training will be used.
- list[np.ndarray]: An array of features fed into the autoencoder, concatenated along the feature axis. The order of the features is: central_angles, central_dihedrals, (side_dihedrals if p.use_sidechains is True).
- dict[str, np.ndarray]: The training data as a dict, containing all values in trajs.CVs.
- Return type:
- plot_network()[source]#
Tries to plot the network using pydot, pydotplus and graphviz. Doesn’t raise an exception if plotting is not possible.
Note
Refer to this guide to install these programs: https://stackoverflow.com/questions/47605558/importerror-failed-to-import-pydot-you-must-install-pydot-and-graphviz-for-py
- Return type:
None
- save(step=None)[source]#
Saves the model to the current path defined in parameters.main_path.
- Parameters:
step (Optional[int]) – Does not save the model at the given training step, but rather changes the string used for saving the model from a datetime format to another.
- Returns:
- When the model has been saved, the Path will be returned. If the model could not be saved, None will be returned.
- Return type:
Union[None, Path]
- set_train_data(data)[source]#
Resets the train data for reloaded models.
- Parameters:
data (TrajEnsemble)
- Return type:
None
- class Autoencoder(parameters=None, train_data=None, model=None, read_only=False, sparse=False)[source]#
Bases:
object
Main Autoencoder class. Presents all high-level functions.
This is the main class for neural networks inside EncoderMap. The class prepares the data (batching and shuffling), creates a tf.keras.Model of layers specified by the attributes of the encodermap.Parameters class. Depending on what Parent/Child-Class is instantiated, a combination of various cost functions is set up. Callbacks to Tensorboard are also set up.
- Parameters:
- train_data#
The numpy array of the train data passed at init.
- Type:
np.ndarray
- p#
An encodermap.Parameters class containing all info needed to set up the network.
- Type:
AnyParameters
- dataset#
The dataset that is actually used in training the keras model. The dataset is a batched, shuffled, infinitely-repeating dataset.
- Type:
tensorflow.data.Dataset
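What "batched, shuffled, infinitely-repeating" means can be shown with a pure-Python analogue. This is not the tf.data pipeline EncoderMap builds; `batched_shuffled_repeating` is a hypothetical generator that only mimics the behavior.

```python
import random
from itertools import islice


def batched_shuffled_repeating(data, batch_size, seed=None):
    """Yield batches forever, reshuffling the data at the start of every pass."""
    rng = random.Random(seed)

    def sample_stream():
        while True:
            order = list(data)
            rng.shuffle(order)
            yield from order

    samples = sample_stream()
    while True:
        # The stream never ends, so every batch has exactly `batch_size` items.
        yield list(islice(samples, batch_size))


# Five samples, batches of two: the generator keeps going past the end of the data.
batches = list(islice(batched_shuffled_repeating(range(5), batch_size=2, seed=0), 4))
print(len(batches), [len(batch) for batch in batches])  # 4 [2, 2, 2, 2]
```

Because the dataset never runs out, training length is controlled by a step count (n_steps in the parameters) rather than by exhausting the data.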
- read_only#
Variable telling the class whether it is allowed to write to disk (False) or not (True).
- Type:
- callbacks#
A list of tf.keras.callbacks.Callback subclasses changing the behavior of the model during training. Some standard callbacks are always present like:
- encodermap.callbacks.callbacks.ProgressBar:
A progress bar callback using tqdm giving the current progress of training and the current loss.
- CheckPointSaver:
A callback that saves the model every parameters.checkpoint_step steps into the main directory. This callback will only be used, when read_only is False.
- TensorboardWriteBool:
A callback that contains a boolean Tensor that will be True or False, depending on the current training step and the summary_step in the parameters class. The loss functions use this callback to decide whether they should write to Tensorboard. This callback will only be present when read_only is False and parameters.tensorboard is True.
You can append your own callbacks to this list before executing self.train().
- Type:
list[Any]
- encoder#
The encoder submodel of self.model.
- Type:
tf.keras.Model
- decoder#
The decoder submodel of self.model.
- Type:
tf.keras.Model
- loss#
A list of loss functions passed to the model when it is compiled. When the main Autoencoder class is used and parameters.loss is ‘emap_cost’, this list comprises center_cost, regularization_cost, and auto_cost. When the EncoderMap sub-class is used and parameters.loss is ‘emap_cost’, distance_cost is added to the list. When parameters.loss is not ‘emap_cost’, the loss can be either a string (‘mse’) or a function; both are acceptable loss arguments when a keras model is compiled.
- Type:
Sequence[Callable]
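The selection rules for the loss list can be summarized in a few lines. The sketch below works on names only; `describe_losses` is a hypothetical helper and not part of EncoderMap, which stores the actual callables.

```python
from typing import List


def describe_losses(class_name: str, loss: str = "emap_cost") -> List[str]:
    """Return the names of the loss terms described above (illustrative only)."""
    if loss != "emap_cost":
        # Any other value is handed to keras as-is, e.g. "mse".
        return [loss]
    terms = ["center_cost", "regularization_cost", "auto_cost"]
    if class_name == "EncoderMap":
        # The EncoderMap sub-class adds the distance cost.
        terms.append("distance_cost")
    return terms


print(describe_losses("Autoencoder"))
print(describe_losses("EncoderMap"))
print(describe_losses("Autoencoder", loss="mse"))
```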
- plot_network()[source]#
Tries to plot the network. For this method to work graphviz, pydot and pydotplus need to be installed.
- Return type:
None
- generate()[source]#
Same as decode. For AngleDihedralCartesianAutoencoder classes, this will build a protein structure.
Note
The performance of tensorflow depends not only on your system’s hardware and on how the data is presented to the network (for this, check out https://www.tensorflow.org/guide/data_performance), but also on how tensorflow was compiled. Normal tensorflow (pip install tensorflow) is built without CPU extensions so that it works on many CPUs. However, tensorflow can greatly benefit from CPU instructions like AVX2 and AVX512, which bring a speed-up of 300% in linear algebra computations. By building tensorflow from source, you can activate these extensions. However, the speed-up of using tensorflow with a GPU dwarfs the CPU speed-up. To check whether a GPU is available, run: print(len(tf.config.list_physical_devices('GPU'))). Refer to these pages to install tensorflow for the best performance: https://www.tensorflow.org/install/pip and https://www.tensorflow.org/install/gpu
Examples
>>> import encodermap as em
>>> # without providing any data, default parameters and a 4D
>>> # hypercube as input data will be used.
>>> e_map = em.EncoderMap(read_only=True)
>>> print(e_map.train_data.shape)
(16000, 4)
>>> print(e_map.dataset)
<BatchDataset element_spec=(TensorSpec(shape=(None, 4), dtype=tf.float32, name=None), TensorSpec(shape=(None, 4), dtype=tf.float32, name=None))>
>>> print(e_map.encode(e_map.train_data).shape)
(16000, 2)
- add_images_to_tensorboard(*args, **kwargs)[source]#
Adds images of the latent space to tensorboard.
- Parameters:
data (Optional[Union[np.ndarray, Sequence[np.ndarray]]]) – The input data will be passed through the encoder part of the autoencoder. If None is provided, a set of 10_000 points from self.train_data will be taken. A list[np.ndarray] is needed for the functional API of the AngleDihedralCartesianEncoderMap, which takes a list of [angles, dihedrals, side_dihedrals]. Defaults to None.
image_step (Optional[int]) – The interval in which to plot images to tensorboard. If None is provided, the image_step will be the same as Parameters.summary_step. Defaults to None.
max_size (int) – The maximum size of the high-dimensional data, that is projected. Prevents excessively large-datasets from being projected at every image_step. Defaults to 10_000.
scatter_kws (Optional[dict[str, Any]]) – A dict with items that plotly.express.scatter() will accept. If None is provided, a dict with size 20 will be passed to px.scatter(**{‘size_max’: 10, ‘opacity’: 0.2}), which sets an appropriate size of scatter points for the size of datasets encodermap is usually used for.
hist_kws (Optional[dict[str, Any]]) – A dict with items that encodermap.plot.plotting._plot_free_energy() will accept. If None is provided a dict with bins 50 will be passed to encodermap.plot.plotting._plot_free_energy(**{‘bins’: 50}). You can choose a colormap here by providing {‘bins’: 50, ‘cmap’: ‘plasma’} for this argument.
additional_fns (Optional[Sequence[Callable]]) – A list of functions that will accept the low-dimensional output of the Autoencoder latent/bottleneck layer and return a tf.Tensor that can be logged by tf.summary.image(). See the notebook ‘writing_custom_images_to_tensorboard.ipynb’ in tutorials/notebooks_customization for more info. If None is provided, no additional functions will be used to plot to tensorboard. Defaults to None.
when (Literal["epoch", "batch"]) – When to log the images can be either ‘batch’, then the images will be logged after every step during training, or ‘epoch’, then only after every image_step epoch the images will be written. Defaults to ‘epoch’.
save_to_disk (bool) – Whether to also write the images to disk.
args (Any)
kwargs (Any)
- Return type:
None
- decode(data)[source]#
Calls the decoder part of the model.
Unlike the other two classes, AngleDihedralCartesianAutoencoder will output a list of np.ndarray.
- Parameters:
data (np.ndarray) – The data to be passed to the decoder part of the model. Make sure that the shape of the data matches the number of neurons in the latent space.
- Returns:
- Outputs from the decoder part.
For AngleDihedralCartesianEncoderMap, this will be a list of np.ndarray.
- Return type:
Union[list[np.ndarray], np.ndarray]
- property decoder: Model#
Decoder part of the model.
- Type:
tf.keras.Model
- encode(data=None)[source]#
Calls encoder part of self.model.
- Parameters:
data (Optional[np.ndarray]) – The data to be passed to the encoder part. It can be either a numpy ndarray or None. If None is provided, a set of 10000 points from the provided train data will be taken. Defaults to None.
- Returns:
The output from the bottleneck/latent layer.
- Return type:
np.ndarray
- property encoder: Model#
Encoder part of the model.
- Type:
tf.keras.Model
- classmethod from_checkpoint(checkpoint_path, train_data=None, sparse=False, use_previous_model=False, compat=False)[source]#
Reconstructs the class from a checkpoint.
- Parameters:
checkpoint_path (Union[str, Path]) – The path to the checkpoint. Can be either a directory, in which case the most recently saved model will be loaded. Or a direct .keras file, in which case, this specific model will be loaded.
train_data (Optional[np.ndarray]) – The train data can be provided here.
sparse (bool) – Whether the reloaded model should be sparse.
use_previous_model (bool) – Set this flag to True, if you load a model from an in-between checkpoint step (e.g., to continue training with different parameters). If you have the files saved_model_0.keras, saved_model_500.keras and saved_model_1000.keras, setting this to True and loading the saved_model_500.keras will back up the saved_model_1000.keras.
compat (bool) – Whether to use compatibility mode when missing or wrong parameter files are present. In this special case, some assumptions about the network architecture are made from the model and the parameters in parameters.json overwritten accordingly (a backup will also be made).
- Returns:
Encodermap Autoencoder class.
- Return type:
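How a directory might be resolved to a single .keras file can be sketched with pathlib. `resolve_checkpoint` is a hypothetical helper, and "most recently saved" is assumed here to mean the latest modification time; EncoderMap may use a different criterion (for example the step number in the file name).

```python
import os
import tempfile
from pathlib import Path
from typing import Optional, Union


def resolve_checkpoint(checkpoint_path: Union[str, Path]) -> Optional[Path]:
    """Pick a .keras file from a file or directory path (illustrative only)."""
    path = Path(checkpoint_path)
    if path.is_file():
        return path  # a direct .keras file is used as-is
    # Assumption: the newest file by modification time is the "most recent" one.
    candidates = sorted(path.glob("*.keras"), key=lambda f: f.stat().st_mtime)
    return candidates[-1] if candidates else None


with tempfile.TemporaryDirectory() as td:
    for step in (0, 500, 1000):
        model_file = Path(td) / f"saved_model_{step}.keras"
        model_file.touch()
        os.utime(model_file, (step, step))  # fake modification times for the demo
    latest_name = resolve_checkpoint(td).name
    direct_name = resolve_checkpoint(Path(td) / "saved_model_500.keras").name

print(latest_name)  # saved_model_1000.keras
print(direct_name)  # saved_model_500.keras
```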
- generate(data)[source]#
Duplication of self.decode.
In Autoencoder and EncoderMap this method is equivalent to decode(). In AngleDihedralCartesianEncoderMap this method will be overwritten to produce output molecular conformations.
- Parameters:
data (np.ndarray) – The data to be passed to the decoder part of the model. Make sure that the shape of the data matches the number of neurons in the latent space.
- Returns:
- Outputs from the decoder part. For
AngleDihedralCartesianEncoderMap, this will either be a mdtraj.Trajectory or MDAnalysis.Universe.
- Return type:
np.ndarray
- plot_network()[source]#
Tries to plot the network using pydot, pydotplus and graphviz. Doesn’t raise an exception if plotting is not possible.
Note
Refer to this guide to install these programs: https://stackoverflow.com/questions/47605558/importerror-failed-to-import-pydot-you-must-install-pydot-and-graphviz-for-py
- Return type:
None
- save(step=None)[source]#
Saves the model to the current path defined in parameters.main_path.
- Parameters:
step (Optional[int]) – Passing a step does not save the model at that training step. It only changes the string used for saving the model from a datetime format to another format.
- Returns:
When the model has been saved, its Path will be returned. If the model could not be saved, None will be returned.
- Return type:
Union[None, Path]
- class EncoderMap(parameters=None, train_data=None, model=None, read_only=False, sparse=False)[source]#
Bases:
Autoencoder
A complete copy of the Autoencoder class that uses an additional distance cost scaled by the SketchMap sigmoid parameters.
- Parameters:
- classmethod from_checkpoint(checkpoint_path, train_data=None, sparse=False, use_previous_model=False, compat=False)[source]#
Reconstructs the class from a checkpoint.
- Parameters:
checkpoint_path (Union[str, Path]) – The path to the checkpoint. Can be either a directory, in which case the most recently saved model will be loaded. Or a direct .keras file, in which case, this specific model will be loaded.
train_data (Optional[np.ndarray]) – The train data can be provided here.
sparse (bool) – Whether the reloaded model should be sparse.
use_previous_model (bool) – Set this flag to True, if you load a model from an in-between checkpoint step (e.g., to continue training with different parameters). If you have the files saved_model_0.keras, saved_model_500.keras and saved_model_1000.keras, setting this to True and loading the saved_model_500.keras will back up the saved_model_1000.keras.
compat (bool) – Whether to use compatibility mode when missing or wrong parameter files are present. In this special case, some assumptions about the network architecture are made from the model and the parameters in parameters.json overwritten accordingly (a backup will also be made).
- Returns:
EncoderMap EncoderMap class.
- Return type:
- class EncoderMapBaseCallback(parameters=None)[source]#
Bases:
Callback
Base class for callbacks in EncoderMap.
The Parameters class in EncoderMap has a summary_step variable that dictates when variables and other tensors are logged to TensorBoard. No matter what property is logged, some code section will always execute an if train_step % summary_step == 0 check. This check is handled centrally in this class. This class is instantiated inside the user-facing AutoEncoderClass classes and is provided with the appropriate parameters (Parameters for EncoderMap and ADCParameters for AngleDihedralCartesianEncoderMap). Thus, subclasses of this class do not need to implement a new __init__ method. Sub-classes only need to implement the on_summary_step and on_checkpoint_step methods with the code that should run when these events occur.
Examples:
In this example, the on_summary_step method causes an exception.
>>> from typing import Optional
>>> import encodermap as em
...
>>> class MyCallback(em.callbacks.EncoderMapBaseCallback):
...     def on_summary_step(self, step: int, logs: Optional[dict] = None) -> None:
...         raise Exception(f"Summary step {self.steps_counter} has been reached.")
...
>>> emap = em.EncoderMap()
Output...
>>> emap.add_callback(MyCallback)
>>> emap.train()
Traceback (most recent call last):
    ...
Exception: Summary step 10 has been reached.
- Parameters:
parameters (Optional['AnyParameters'])
- p (Union[encodermap.parameters.Parameters, encodermap.parameters.ADCParameters])
The parameters for this callback. Based on the summary_step and checkpoint_step of the encodermap.parameters.Parameters class different class-methods are called.
- on_checkpoint_step(step, logs=None)[source]#
Executed, when the currently finished batch matches encodermap.Parameters.checkpoint_step
- on_summary_step(step, logs=None)[source]#
Executed, when the currently finished batch matches encodermap.Parameters.summary_step
- on_train_batch_end(batch, logs=None)[source]#
Called after a batch ends. The number of batch is provided by keras.
This method is the backbone of all of EncoderMap’s callbacks. After every batch, this method is called by keras. When the number of that batch matches either encodermap.Parameters.summary_step or encodermap.Parameters.checkpoint_step, the code in self.on_summary_step or self.on_checkpoint_step is executed. These methods should be overridden by child classes.
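The central dispatch can be pictured without keras. The class below is a plain-Python stand-in, not the real base class: `StepDispatcher` and its `events` list are hypothetical, and whether the real implementation counts batches from 0 or 1 is not stated here.

```python
from typing import List, Optional


class StepDispatcher:
    """Plain-Python stand-in for the dispatch logic (no keras required)."""

    def __init__(self, summary_step: int, checkpoint_step: int) -> None:
        self.summary_step = summary_step
        self.checkpoint_step = checkpoint_step
        self.events: List[str] = []

    def on_train_batch_end(self, batch: int, logs: Optional[dict] = None) -> None:
        # Central modulo check: sub-classes only fill in the two hooks below.
        if batch % self.summary_step == 0:
            self.on_summary_step(batch, logs)
        if batch % self.checkpoint_step == 0:
            self.on_checkpoint_step(batch, logs)

    def on_summary_step(self, step: int, logs: Optional[dict] = None) -> None:
        self.events.append(f"summary@{step}")

    def on_checkpoint_step(self, step: int, logs: Optional[dict] = None) -> None:
        self.events.append(f"checkpoint@{step}")


dispatcher = StepDispatcher(summary_step=10, checkpoint_step=25)
for batch in range(1, 51):  # assumption: batches are counted from 1 in this demo
    dispatcher.on_train_batch_end(batch)
print(dispatcher.events)
```

Keeping the modulo check in one place means a sub-class never has to repeat it.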
- class Featurizer(traj)[source]#
Bases:
object
EncoderMap’s featurization has drawn much inspiration from PyEMMA (markovmodel/PyEMMA).
EncoderMap’s Featurizer collects and computes collective variables (CVs). CVs are data that are aligned with MD trajectories on the frame/time axis. Trajectory data contains (besides the topology) an axis for atoms, and an axis for cartesian coordinate (x, y, z), so that a trajectory can be understood as an array with shape (n_frames, n_atoms, 3). A CV is an array that is aligned with the frame/time and has its own feature axis. If the trajectory in our example has 3 residues (MET, ALA, GLY), we can define 6 dihedral angles along the backbone of this peptide. These angles are:
PSI1: Between MET1-N - MET1-CA - MET1-C - ALA2-N
OMEGA1: Between MET1-CA - MET1-C - ALA2-N - ALA2-CA
PHI1: Between MET1-C - ALA2-N - ALA2-CA - ALA2-C
PSI2: Between ALA2-N - ALA2-CA - ALA2-C - GLY3-N
OMEGA2: Between ALA2-CA - ALA2-C - GLY3-N - GLY3-CA
PHI2: Between ALA2-C - GLY3-N - GLY3-CA - GLY3-C
Thus, the collective variable ‘backbone-dihedrals’ provides an array of shape (n_frames, 6) and is aligned with the frame/time axis of the trajectory.
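The enumeration above follows a simple pattern per pair of neighboring residues, which the following sketch reproduces. `backbone_dihedrals` is a hypothetical helper that only builds labels; it does not compute any angles and is not part of the Featurizer.

```python
from typing import List, Tuple


def backbone_dihedrals(residues: List[str]) -> List[Tuple[str, Tuple[str, ...]]]:
    """List the backbone dihedrals of a linear peptide in the order used above."""
    labelled = [f"{name}{i}" for i, name in enumerate(residues, start=1)]
    dihedrals = []
    for i in range(len(labelled) - 1):
        a, b = labelled[i], labelled[i + 1]  # current and next residue
        n = i + 1
        dihedrals.append((f"PSI{n}", (f"{a}-N", f"{a}-CA", f"{a}-C", f"{b}-N")))
        dihedrals.append((f"OMEGA{n}", (f"{a}-CA", f"{a}-C", f"{b}-N", f"{b}-CA")))
        dihedrals.append((f"PHI{n}", (f"{a}-C", f"{b}-N", f"{b}-CA", f"{b}-C")))
    return dihedrals


cv_names = backbone_dihedrals(["MET", "ALA", "GLY"])
for name, atoms in cv_names:
    print(f"{name}: Between " + " - ".join(atoms))
```

For n residues this yields 3 * (n - 1) labels, which is the size of the feature axis of the resulting CV.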
- Parameters:
traj (Union[SingleTraj, TrajEnsemble])
- class InteractivePlotting(autoencoder=None, trajs=None, lowd_data=None, highd_data=None, align_string='name CA', top=None, ball_and_stick=False, histogram_type='free_energy', superpose=True, ref_align_string='name CA', base_traj=None)[source]#
Bases:
object
EncoderMap’s interactive plotting for jupyter notebooks.
Instantiating this class will display an interactive display in your notebook. The display will look like this:
┌─────────────────────┐ ┌───────────┐
│Display              │ │Top        │
└─────────────────────┘ └───────────┘
┌─────────────┐ ┌───┐ ┌─────────────┐
│             │ │   │ │             │
│             │ │ T │ │             │
│    Main     │ │ R │ │  Molecular  │
│  Plotting   │ │ A │ │  Conform.   │
│    Area     │ │ C │ │    Area     │
│             │ │ E │ │             │
│             │ │   │ │             │
└─────────────┘ └───┘ └─────────────┘
┌───┐ ┌─────────────────────────────┐
│   │ │Progress Bar                 │
└───┘ └─────────────────────────────┘
┌─┐ ┌─┐ ┌─┐ ┌─┐ ┌───────────────────┐
│C│ │G│ │S│ │D│ │Slider             │
└─┘ └─┘ └─┘ └─┘ └───────────────────┘
┌────────────────┐  ┌───────────────┐
│                │  │               │
│     Data       │  │               │
│   Overview     │  │               │
│                │  │               │
│                │  │               │
└────────────────┘  └───────────────┘
- The components do the following:
- Display:
This part will display debug information.
- Top (Top selector):
Select which topology to use when creating new molecular conformations from the autoencoder network.
- Main plotting area:
In this area, a scatter plot will be displayed. The coordinates of the scatter plot will be taken from the low-dimensional projection of the trajectories. The data for this plotting area can be taken from different sources. See the _lowd_parser docstring for information on how the lowd data is selected. Clicking on a point in the scatter plot displays the conformation of that point.
- TRACE:
Displays the high-dimensional data of selected points or clusters.
- Molecular conformation area:
Displays molecular conformations.
- Progress Bar:
Displays progress.
- C (Cluster button):
After selecting points in the main plotting area with the lasso tool, hit this button to display the molecular conformations of the selected cluster.
- G (Generate Button):
Switch to density using the density button. Then, you can draw a freeform path into the Main plotting area. Pressing the generate button will generate the appropriate molecular conformations. If your data has multiple conformations, you can choose which conformation to use for decoding with the top selector.
- S (Save button):
Writes either a cluster or a generated path to your disk. Uses the main_path of the autoencoder (the same directory in which the training data is stored).
- D (Density button):
Switch the main plotting area to Density.
- Slider:
In scatter mode this slider defines how many structures to select from a cluster for representation in the molecular conformations window. In density mode, this slider defines how many points along the user-drawn path should be sampled.
- Parameters:
autoencoder (Optional[AutoencoderClass])
trajs (Optional[Union[str, list[str], TrajEnsemble, SingleTraj]])
lowd_data (Optional[np.ndarray])
highd_data (Optional[np.ndarray])
align_string (str)
ball_and_stick (bool)
histogram_type (Union[None, Literal['free_energy', 'density']])
superpose (bool)
ref_align_string (str)
base_traj (Optional[Trajectory])
- MolData#
alias of
NewMolData
- class Parameters(**kwargs)[source]#
Bases:
ParametersFramework
Class to hold Parameters for the Autoencoder
Parameters can be set via keyword args while instantiating the class, set as instance attributes or read from disk. This class can write parameters to disk in .yaml or .json format.
- Parameters:
kwargs (ParametersData)
- defaults#
Classvariable dict that holds the defaults even when the current values might have changed.
- Type:
- n_neurons#
List containing the number of neurons for each layer up to the bottleneck layer. For example, [128, 128, 2] stands for an autoencoder with the architecture {i, 128, 128, 2, 128, 128, i}, where i is the number of dimensions of the input data. The two i layers are input/output layers that are not trained.
- activation_functions#
List of activation function names as implemented in TensorFlow. For example: “relu”, “tanh”, “sigmoid” or “” to use no activation function. The encoder part of the network takes the activation functions from the list starting with the second element. The decoder part of the network takes the activation functions in reversed order starting with the second element from the back. For example, [“”, “relu”, “tanh”, “”] would result in an autoencoder with {“relu”, “tanh”, “”, “tanh”, “relu”, “”} as the sequence of activation functions.
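The two rules for n_neurons and activation_functions can be checked with a short sketch. `expand_architecture` is a hypothetical helper that only mirrors the rules stated in the two descriptions; it does not build a keras model.

```python
from typing import List, Tuple


def expand_architecture(
    n_inputs: int, n_neurons: List[int], activation_functions: List[str]
) -> Tuple[List[int], List[str]]:
    """Expand both parameters into full layer widths and activations."""
    # Widths: input, encoder layers, mirrored decoder layers, output.
    widths = [n_inputs] + n_neurons + n_neurons[-2::-1] + [n_inputs]
    # Encoder: the list starting with the second element.
    encoder_acts = activation_functions[1:]
    # Decoder: the reversed list starting with the second element from the back.
    decoder_acts = activation_functions[::-1][1:]
    return widths, encoder_acts + decoder_acts


widths, activations = expand_architecture(4, [128, 128, 2], ["", "relu", "tanh", ""])
print(widths)       # [4, 128, 128, 2, 128, 128, 4]
print(activations)  # ['relu', 'tanh', '', 'tanh', 'relu', '']
```

Note that activation_functions has one more entry than n_neurons, because its first element belongs to the untrained input layer.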
- periodicity#
Defines the distance between periodic walls for the inputs. For example 2pi for angular values in radians. All periodic data processed by EncoderMap must be wrapped to one periodic window. E.g. data with 2pi periodicity may contain values from -pi to pi or from 0 to 2pi. Set the periodicity to float(“inf”) for non-periodic inputs.
- Type:
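Wrapping data into one periodic window, and measuring distances across the periodic walls, can be sketched as follows. Both function names are hypothetical, and the window [-periodicity/2, periodicity/2) is just one of the valid choices mentioned above.

```python
import math


def wrap(value: float, periodicity: float = 2 * math.pi) -> float:
    """Wrap a value into the window [-periodicity/2, periodicity/2)."""
    if math.isinf(periodicity):
        return value  # non-periodic input: nothing to wrap
    half = periodicity / 2
    return (value + half) % periodicity - half


def periodic_distance(a: float, b: float, periodicity: float = 2 * math.pi) -> float:
    """Shortest distance between two values across the periodic walls."""
    d = abs(a - b)
    if math.isinf(periodicity):
        return d
    d %= periodicity
    return min(d, periodicity - d)


print(wrap(1.5 * math.pi))                         # approximately -pi/2
print(periodic_distance(-3.0, 3.0))                # approximately 0.283
print(periodic_distance(-3.0, 3.0, float("inf")))  # 6.0
```

The second call shows why the periodicity matters: two angles close to -pi and pi are near neighbors, not 6 radians apart.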
- dist_sig_parameters#
Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l)
- Type:
tuple of floats
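To give a feeling for these parameters, the sketch below evaluates the sigmoid in the form known from the sketch-map literature. That EncoderMap uses exactly this functional form is an assumption here, and `sketchmap_sigmoid` is a hypothetical name.

```python
def sketchmap_sigmoid(r: float, sigma: float, a: float, b: float) -> float:
    """Sketch-map style sigmoid: 0 at r = 0, 0.5 at r = sigma, approaching 1 for large r."""
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sigma) ** a) ** (-b / a)


# Order as documented above: (sig_h, a_h, b_h, sig_l, a_l, b_l).
sig_h, a_h, b_h, sig_l, a_l, b_l = (4.5, 12, 6, 1, 2, 6)

for distance in (0.0, 2.0, 4.5, 8.0):
    high = sketchmap_sigmoid(distance, sig_h, a_h, b_h)
    low = sketchmap_sigmoid(distance, sig_l, a_l, b_l)
    print(f"r={distance:4.1f}  high-d sigmoid={high:.3f}  low-d sigmoid={low:.3f}")
```

In this form, sigma marks the distance at which the sigmoid crosses 0.5, while a and b control how steeply it rises before and after that point.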
- distance_cost_scale#
Adjusts how much the distance based metric is weighted in the cost function.
- Type:
- auto_cost_variant#
Defines how the auto cost is calculated. Must be one of:
* mean_square
* mean_abs
* mean_norm
- Type:
- center_cost_scale#
Adjusts how much the centering cost is weighted in the cost function.
- Type:
- l2_reg_constant#
Adjusts how much the L2 regularisation is weighted in the cost function.
- Type:
- gpu_memory_fraction#
Specifies the fraction of gpu memory blocked. If set to 0, memory is allocated as needed.
- Type:
- id#
Can be any name for the run. Might be useful for example for specific analysis for different data sets.
- Type:
- model_api#
A string defining the API to be used to build the keras model. Defaults to sequential. Possible strings are:
* functional will use keras’ functional API.
* sequential will define a keras Model, containing two other models with the Sequential API. These two models are encoder and decoder.
* custom will create a custom Model where even the layers are custom.
- Type:
- loss#
A string defining the loss function. Defaults to emap_cost. Possible losses are:
* reconstruction_loss will try to train output == input.
* mse returns a mean squared error loss.
* emap_cost is the EncoderMap loss function. Depending on the class (Autoencoder, EncoderMap, ADCAutoencoder), different contributions are used for a combined loss. Autoencoder uses auto_cost, reg_cost, center_cost. The EncoderMap class adds sigmoid_loss.
- Type:
- training#
A string defining what kind of training is performed when autoencoder.train() is called.
* auto does a regular model.compile() and model.fit() procedure.
* custom uses gradient tape and calculates losses and gradients manually.
- Type:
- seed#
Fixes the state of all operations using random numbers. Defaults to None.
- Type:
Union[int, None]
- write_summary#
If True, writes a summar.txt of the models into main_path. If tensorboard is True, summaries will also be written.
- Type:
- trainable_dense_to_sparse#
When using different topologies to train the AngleDihedralCartesianEncoderMap, some inputs might be sparse, which means, they have missing values. Creating a dense input is done by first passing these sparse tensors through tf.keras.layers.Dense layers. These layers have trainable weights, and if this parameter is True, these weights will be changed by the optimizer.
- Type:
- using_hypercube#
This parameter is not meant to be set by the user. It allows us to print better error messages when re-loading and re-training a model. It contains a boolean indicating whether a model has been trained on the hypercube example data. If your data is 4-dimensional and you reload a model and forget to provide your data, the model will happily train with the hypercube (and not your) data. This variable implements a check.
- Type:
Examples
>>> import encodermap as em
>>> import tempfile
>>> from pathlib import Path
...
>>> with tempfile.TemporaryDirectory() as td:
...     td = Path(td)
...     p = em.Parameters()
...     print(p.auto_cost_variant)
...     savepath = p.save(td / "parameters.json")
...     print(savepath)
...     new_params = em.Parameters.from_file(td / "parameters.json")
...     print(new_params.main_path)
mean_abs
/tmp...parameters.json
seems like the parameter file was moved to another directory. Parameter file is updated ...
/home...
- _defaults = {'activation_functions': ['', 'tanh', 'tanh', ''], 'analysis_path': '', 'auto_cost_scale': 1, 'auto_cost_variant': 'mean_abs', 'batch_size': 256, 'batched': True, 'center_cost_scale': 0.0001, 'checkpoint_step': 5000, 'current_training_step': 0, 'dist_sig_parameters': (4.5, 12, 6, 1, 2, 6), 'distance_cost_scale': 500, 'gpu_memory_fraction': 0, 'id': '', 'l2_reg_constant': 0.001, 'learning_rate': 0.001, 'loss': 'emap_cost', 'model_api': 'sequential', 'n_neurons': [128, 128, 2], 'n_steps': 1000, 'periodicity': 6.283185307179586, 'seed': None, 'summary_step': 10, 'tensorboard': False, 'trainable_dense_to_sparse': False, 'training': 'auto', 'using_hypercube': False, 'write_summary': False}#
- load(trajs, tops=None, common_str=None, backend='no_load', index=None, traj_num=None, basename_fn=None, custom_top=None)[source]#
Load MD data.
Based on what’s provided for trajs, you either get a SingleTraj object that collects information about a single traj, or a TrajEnsemble object that contains information about multiple trajectories (even with different topologies).
- Parameters:
trajs (Union[str, md.Trajectory, Sequence[str], Sequence[md.Trajectory], Sequence[SingleTraj]]) – Here, you can provide a single string pointing to a trajectory on your computer (/path/to/traj_file.xtc) or (/path/to/protein.pdb) or a list of such strings. In the former case, you will get a SingleTraj object, which is EncoderMap’s way of storing data (positions, CVs, times) of a single trajectory. In the latter case, you will get a TrajEnsemble object, which is EncoderMap’s way of working with multiple SingleTrajs.
tops (Optional[Union[str, md.Topology, Sequence[str], Sequence[md.Topology]]]) – For this argument, you can provide the topology(ies) of the corresponding traj(s). Trajectory file formats like .xtc and .dcd only store atomic positions and not weights, elements, or bonds. That’s what the tops argument is for. There are some trajectory file formats out there (MDTraj HDF5, AMBER netCDF4) that store both trajectory and topology in a single file. A .pdb file can also be used this way. If you provide such files for trajs, you can leave tops as None. If you provide multiple files for trajs, you can still provide a single tops file, if the trajs in trajs share the same topology. If that is not the case, you can either provide a list of topologies, matched to the trajs in trajs, or use the common_str argument to match them. Defaults to None.
common_str (Optional[Union[str, list[str]]]) –
If you provided a different number of trajs and tops, this argument is used to match them. Let’s say, you have 5 trajectories of a wild type protein and 5 trajectories of a mutant. If the path to these files is somewhat consistent (e.g:
/path/to/wt/traj1.xtc
/different/path/to/wt/traj_no_water.xtc
…
/data/path/to/mutant/traj0.xtc
/data/path/to/mutant/traj0.xtc
), you can provide [‘wt’, ‘mutant’] for the common_str argument and the files are grouped based on the occurrence of ‘wt’ and ‘mutant’ in their filepaths. Defaults to None.
backend (Literal["no_load", "mdtraj"]) – Normally, encodermap postpones the actual loading of the atomic positions until you really need them. This accelerates the handling of large trajectory ensembles. Choosing ‘mdtraj’ as the backend, all atomic positions are always loaded, taking up space on your system memory, but accessing positions in a non-sequential fashion is faster. Defaults to ‘no_load’.
index (Optional[Union[int, np.ndarray, list[int], slice]]) –
Only used, if argument trajs is a single trajectory. This argument can be used to index the trajectory data. If you want to exclude the first 100 frames of your trajectory, because the protein relaxes from its crystal structure, you can load it like so:
em.load(traj_file, top_file, index=slice(100, None))
As encodermap lazily evaluates positional data, the slice(100, None) argument is stored until the data is accessed, at which point the first 100 frames are not accessible, just as if you had deleted them. Besides a slice, you can also provide an int (which returns a single frame at the requested index) or a list of int (which returns frames at the locations indexed by the ints in the list). If None is provided, the trajectory data is not sliced/subsampled. Defaults to None.
traj_num (Optional[int]) –
Only used, if argument trajs is a single trajectory. This argument is meant to organize the SingleTraj trajectories in a TrajEnsemble class. Of course, you can build your own TrajEnsemble from a list of SingleTrajs and provide this list as the trajs argument to em.load(). In this case, you need to set the traj_num of each SingleTraj yourself. Defaults to None.
basename_fn (Optional[Callable[[str], str]]) – A function to apply to the traj_file string to return the basename of the trajectory. If None is provided, the filename without extension will be used. When all files are named the same and the folder they’re in defines the name of the trajectory, you can supply lambda x: x.split('/')[-2] as this argument. Defaults to None.
custom_top (Optional['CustomAAsDict'])
- Return type:
Union[SingleTraj, TrajEnsemble]
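The values accepted by index follow familiar indexing rules. The sketch below uses a plain list as a stand-in for the frames of a trajectory and assumes that EncoderMap applies index with the usual Python/NumPy semantics; in plain Python, slice(100) keeps the first 100 items, whereas slice(100, None) skips them.

```python
frames = list(range(500))  # stand-in for the frames of a trajectory

first_100 = frames[slice(100)]                # same as frames[:100]
without_first_100 = frames[slice(100, None)]  # same as frames[100:]
every_10th = frames[slice(None, None, 10)]    # same as frames[::10]
single_frame = frames[5]                      # an int selects one frame
picked = [frames[i] for i in [0, 10, 20]]     # a list of ints selects several

print(len(first_100), len(without_first_100), len(every_10th))
print(without_first_100[0], single_frame, picked)
```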
Examples
>>> # load a pdb file with 14 frames from rcsb.org
>>> import encodermap as em
>>> traj = em.load("https://files.rcsb.org/view/1GHC.pdb")
>>> print(traj)
encodermap.SingleTraj object. Current backend is no_load. Basename is 1GHC. At indices (None,). Not containing any CVs.
>>> traj.n_frames
14
>>> # load multiple trajs
>>> trajs = em.load([
...     'https://files.rcsb.org/view/1YUG.pdb',
...     'https://files.rcsb.org/view/1YUF.pdb'
... ])
>>> # trajs are internally numbered
>>> print([traj.traj_num for traj in trajs])
[0, 1]
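The matching done by common_str and the effect of a folder-based basename_fn can be pictured with plain string operations. Both helpers below are hypothetical and only illustrate the idea; the matching inside em.load() may differ in detail.

```python
from typing import Dict, List


def group_by_common_str(files: List[str], common_str: List[str]) -> Dict[str, List[str]]:
    """Group file paths by the first common string that occurs in them."""
    groups: Dict[str, List[str]] = {key: [] for key in common_str}
    for file in files:
        for key in common_str:
            if key in file:
                groups[key].append(file)
                break  # each file ends up in exactly one group
    return groups


def folder_basename(path: str) -> str:
    """Use the parent folder as the trajectory name, like lambda x: x.split('/')[-2]."""
    return path.split("/")[-2]


traj_files = [
    "/path/to/wt/traj1.xtc",
    "/different/path/to/wt/traj_no_water.xtc",
    "/data/path/to/mutant/traj0.xtc",
]
groups = group_by_common_str(traj_files, ["wt", "mutant"])
print({key: len(value) for key, value in groups.items()})
print([folder_basename(f) for f in traj_files])
```

Substring matching is simple but can misfire when one key is contained in an unrelated part of a path, so distinctive folder names help.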