encodermap.parameters package#

Submodules#

encodermap.parameters.parameters module#

Parameter Classes for Encodermap.

This module contains parameter classes which are used to hold information for the encodermap autoencoder. Parameters can be set from keyword arguments, by overwriting the class attributes or by reading them from .json, .yaml or ASCII files.

Features:
  • Setting and saving parameters with the Parameters class.

  • Loading parameters from disk and continuing where you left off.

  • The Parameters and ADCParameters classes already contain good default values.

class ADCParameters(**kwargs)[source]#

Bases: ParametersFramework

This is the parameter object for the AngleDihedralCartesianEncoder. It holds all the parameters that the Parameters object includes, plus the following attributes:

Parameters:

kwargs (ParametersData)

track_clashes#

Whether to track the number of clashes during training. The number of clashes is the average number of pairwise distances in the reconstructed cartesian coordinates that are smaller than 1 (nm). Defaults to False.

Type:

bool
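The clash count described above can be sketched in plain Python. This is only an illustration under assumptions: the helper names `count_clashes` and `mean_clashes` are hypothetical, and the sketch counts every atom pair, since this page does not say whether bonded neighbours are excluded.

```python
import itertools
import math


def count_clashes(coords, cutoff=1.0):
    """Count atom pairs in one conformation closer than `cutoff` (hypothetical helper)."""
    return sum(
        1
        for a, b in itertools.combinations(coords, 2)
        if math.dist(a, b) < cutoff
    )


def mean_clashes(batch, cutoff=1.0):
    """Average the per-conformation clash count over a batch."""
    return sum(count_clashes(coords, cutoff) for coords in batch) / len(batch)


# Two toy conformations with three atoms each.
frame_a = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 0.0, 0.0)]  # one close pair
frame_b = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0)]  # no close pair

print(mean_clashes([frame_a, frame_b]))
```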

track_RMSD#

Whether to track the RMSD of the input and reconstructed cartesians during training. The RMSDs are computed along the batch by minimizing over the rotation R and the translation t:

\text{RMSD}(\mathbf{x}, \mathbf{x}^{\text{ref}}) = \min_{\mathsf{R}, \mathbf{t}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left[ (\mathsf{R}\cdot\mathbf{x}_{i}(t) + \mathbf{t}) - \mathbf{x}_{i}^{\text{ref}} \right]^{2}}

This results in n RMSD values, where n is the size of the batch. A mean RMSD of this batch and the values for this batch will be logged to tensorboard.

Type:

bool
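A minimum-RMSD computation of this kind is commonly done with the Kabsch algorithm. The NumPy sketch below is not taken from EncoderMap; the function names `kabsch_rmsd` and `batch_rmsd` are hypothetical, and it only illustrates how one RMSD value per batch entry and a batch mean could be obtained.

```python
import numpy as np


def kabsch_rmsd(x, ref):
    """RMSD between two (N, 3) coordinate sets after optimal rotation and translation."""
    x = np.asarray(x, dtype=float)
    ref = np.asarray(ref, dtype=float)
    # The optimal translation aligns the centroids.
    xc = x - x.mean(axis=0)
    rc = ref - ref.mean(axis=0)
    # The optimal rotation follows from the SVD of the covariance matrix.
    u, _, vt = np.linalg.svd(xc.T @ rc)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = xc @ rot.T
    return float(np.sqrt(np.mean(np.sum((aligned - rc) ** 2, axis=1))))


def batch_rmsd(batch, refs):
    """One RMSD value per (conformation, reference) pair in the batch."""
    return np.array([kabsch_rmsd(x, r) for x, r in zip(batch, refs)])


ref = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
rot_z = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
moved = ref @ rot_z.T + np.array([1.0, 2.0, 3.0])  # rigidly moved copy
stretched = ref * np.array([2.0, 1.0, 1.0])        # genuinely different shape

values = batch_rmsd([moved, stretched], [ref, ref])
print(values, values.mean())
```

The rigidly moved copy yields an RMSD of essentially zero, while the stretched structure keeps a non-zero RMSD that no rotation or translation can remove.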

cartesian_pwd_start#

Index of the first atom to use for the pairwise distance calculation.

Type:

int

cartesian_pwd_stop#

Index of the last atom to use for the pairwise distance calculation.

Type:

int

cartesian_pwd_step#

Step for the calculation of pairwise distances. E.g. for a chain of atoms N-C_a-C-N-C_a-C… cartesian_pwd_start=1 and cartesian_pwd_step=3 will result in using all C-alpha atoms for the pairwise distance calculation.

Type:

int
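The C-alpha example can be reproduced with ordinary Python slicing. This sketch assumes the three parameters behave like the arguments of a `slice` (so the stop index would be exclusive); the helper `select_atoms` is hypothetical and not part of EncoderMap.

```python
def select_atoms(atoms, start=None, stop=None, step=None):
    """Pick atoms as a Python slice would (assumed semantics)."""
    return atoms[slice(start, stop, step)]


# A backbone of four residues: N-C_a-C-N-C_a-C...
atoms = ["N", "CA", "C"] * 4

selected = select_atoms(atoms, start=1, step=3)
print(selected)
```

With all three values left at None, as in the defaults, the slice selects every atom.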

use_backbone_angles#

Defines whether backbone bond angles should be learned (True) or whether mean values should be used instead to generate conformations (False).

Type:

bool

use_sidechains#

Whether sidechain dihedrals should be passed through the autoencoder.

Type:

bool

angle_cost_scale#

Adjusts how much the angle cost is weighted in the cost function.

Type:

int

angle_cost_variant#

Defines how the angle cost is calculated. Must be one of:

  • “mean_square”

  • “mean_abs”

  • “mean_norm”.

Type:

str
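The three variant names suggest three different reductions of the difference between input and output. The sketch below is an assumption based on the names alone, not EncoderMap's implementation: `mean_square` averages squared differences, `mean_abs` averages absolute differences, and `mean_norm` averages the Euclidean norm of each sample's difference vector. Periodic wrapping of the differences is left out, and the function name `cost` is hypothetical.

```python
import math


def cost(diffs, variant="mean_abs"):
    """Reduce per-sample difference vectors (output - input) to a scalar cost."""
    if variant == "mean_square":
        values = [d * d for row in diffs for d in row]
    elif variant == "mean_abs":
        values = [abs(d) for row in diffs for d in row]
    elif variant == "mean_norm":
        values = [math.sqrt(sum(d * d for d in row)) for row in diffs]
    else:
        raise ValueError(f"Unknown cost variant: {variant!r}")
    return sum(values) / len(values)


# Two samples with two features each.
diffs = [[3.0, 4.0], [0.0, 0.0]]

for variant in ("mean_square", "mean_abs", "mean_norm"):
    print(variant, cost(diffs, variant))
```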

angle_cost_reference#

Can be used to normalize the angle cost with the cost of a reference (dummy) model.

Type:

int

dihedral_cost_scale#

Adjusts how much the dihedral cost is weighted in the cost function.

Type:

int

dihedral_cost_variant#

Defines how the dihedral cost is calculated. Must be one of:

  • “mean_square”

  • “mean_abs”

  • “mean_norm”.

Type:

str

dihedral_cost_reference#

Can be used to normalize the dihedral cost with the cost of a reference (dummy) model.

Type:

int

side_dihedral_cost_scale#

Adjusts how much the side dihedral cost is weighted in the cost function.

Type:

int

side_dihedral_cost_variant#

Defines how the side dihedral cost is calculated. Must be one of:

  • “mean_square”

  • “mean_abs”

  • “mean_norm”.

Type:

str

side_dihedral_cost_reference#

Can be used to normalize the side dihedral cost with the cost of a reference (dummy) model.

Type:

int

cartesian_cost_scale#

Adjusts how much the cartesian cost is weighted in the cost function.

Type:

int

cartesian_cost_scale_soft_start#

Allows the cartesian cost to be turned on slowly. Must be a tuple (start, end) or (None, None). If start and end are given, cartesian_cost_scale will be increased linearly in the given range.

Type:

tuple
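A linear soft start can be sketched as a function of the current training step. The behaviour outside the range is an assumption (zero before the start step, the full cartesian_cost_scale after the end step), and the helper `soft_start_scale` is hypothetical.

```python
def soft_start_scale(step, cartesian_cost_scale, soft_start=(None, None)):
    """Effective cartesian cost scale at a given training step (assumed behaviour)."""
    start, end = soft_start
    if start is None or end is None:
        # No soft start requested: the full scale applies from the first step.
        return cartesian_cost_scale
    if step <= start:
        return 0.0
    if step >= end:
        return cartesian_cost_scale
    # Linear ramp between start and end.
    return cartesian_cost_scale * (step - start) / (end - start)


for step in (0, 250, 500, 1000, 2000):
    print(step, soft_start_scale(step, 1.0, soft_start=(0, 1000)))
```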

cartesian_cost_variant#

Defines how the cartesian cost is calculated. Must be one of:

  • “mean_square”

  • “mean_abs”

  • “mean_norm”.

Type:

str

cartesian_cost_reference#

Can be used to normalize the cartesian cost with the cost of a reference (dummy) model.

Type:

int

cartesian_dist_sig_parameters#

Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l).

Type:

tuple of floats
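This page does not state the functional form of the sigmoid. The sketch below assumes the sketch-map style sigmoid, in which sig is the inflection point and a and b control the steepness on either side of it; the function name `sigmoid` is chosen for illustration. The parameter values are the defaults listed in `_defaults`.

```python
def sigmoid(r, sig, a, b):
    """Assumed sketch-map style sigmoid: 0 at r = 0, 0.5 at r = sig, approaching 1 for large r."""
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sig) ** a) ** (-b / a)


# Default values, unpacked in the documented order.
sig_h, a_h, b_h, sig_l, a_l, b_l = (4.5, 12, 6, 1, 2, 6)

high_d = sigmoid(4.5, sig_h, a_h, b_h)  # a high-dimensional distance equal to sig_h
low_d = sigmoid(1.0, sig_l, a_l, b_l)   # a low-dimensional distance equal to sig_l
print(high_d, low_d)
```

Under this assumption a high-dimensional distance of sig_h and a low-dimensional distance of sig_l both map to about 0.5, which is how the two distance scales are matched.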

cartesian_distance_cost_scale#

Adjusts how much the cartesian distance cost is weighted in the cost function.

Type:

int

multimer_training#

Experimental feature.

Type:

Any

multimer_topology_classes#

Experimental feature.

Type:

Any

multimer_connection_bridges#

Experimental feature.

Type:

Any

multimer_lengths#

Experimental feature.

Type:

Any

reconstruct_sidechains#

Whether to also reconstruct sidechains.

Type:

bool

Examples

>>> import encodermap as em
>>> import tempfile
>>> from pathlib import Path
...
>>> with tempfile.TemporaryDirectory() as td:
...     td = Path(td)
...     p = em.Parameters()
...     print(p.auto_cost_variant)
...     savepath = p.save(td / "parameters.json")
...     print(savepath)
...     new_params = em.Parameters.from_file(td / "parameters.json")
...     print(new_params.main_path)  
mean_abs
/tmp...parameters.json
seems like the parameter file was moved to another directory. Parameter file is updated ...
/home...
_defaults = {'activation_functions': ['', 'tanh', 'tanh', ''], 'analysis_path': '', 'angle_cost_reference': 1, 'angle_cost_scale': 0, 'angle_cost_variant': 'mean_abs', 'auto_cost_scale': None, 'auto_cost_variant': 'mean_abs', 'batch_size': 256, 'batched': True, 'cartesian_cost_reference': 1, 'cartesian_cost_scale': 1, 'cartesian_cost_scale_soft_start': (None, None), 'cartesian_cost_variant': 'mean_abs', 'cartesian_dist_sig_parameters': (4.5, 12, 6, 1, 2, 6), 'cartesian_distance_cost_scale': 1, 'cartesian_pwd_start': None, 'cartesian_pwd_step': None, 'cartesian_pwd_stop': None, 'center_cost_scale': 0.0001, 'checkpoint_step': 5000, 'current_training_step': 0, 'dihedral_cost_reference': 1, 'dihedral_cost_scale': 1, 'dihedral_cost_variant': 'mean_abs', 'dist_sig_parameters': (4.5, 12, 6, 1, 2, 6), 'distance_cost_scale': None, 'gpu_memory_fraction': 0, 'id': '', 'l2_reg_constant': 0.001, 'learning_rate': 0.001, 'loss': 'emap_cost', 'model_api': 'functional', 'multimer_connection_bridges': None, 'multimer_lengths': None, 'multimer_topology_classes': None, 'multimer_training': None, 'n_neurons': [128, 128, 2], 'n_steps': 1000, 'periodicity': 6.283185307179586, 'reconstruct_sidechains': False, 'seed': None, 'side_dihedral_cost_reference': 1, 'side_dihedral_cost_scale': 0.5, 'side_dihedral_cost_variant': 'mean_abs', 'summary_step': 10, 'tensorboard': False, 'track_RMSD': False, 'track_clashes': False, 'trainable_dense_to_sparse': False, 'training': 'auto', 'use_backbone_angles': False, 'use_sidechains': False, 'using_hypercube': False, 'write_summary': False}#
classmethod defaults_description()[source]#

str: A string that contains tabulated default parameter values.

Return type:

str

class Parameters(**kwargs)[source]#

Bases: ParametersFramework

Class to hold Parameters for the Autoencoder

Parameters can be set via keyword args while instantiating the class, set as instance attributes or read from disk. This class can write parameters to disk in .yaml or .json format.

Parameters:

kwargs (ParametersData)

defaults#

Class variable dict that holds the defaults even when the current values have changed.

Type:

dict

main_path#

Defines a main path where the parameters and other things might be stored.

Type:

str

n_neurons#

List containing the number of neurons for each layer up to the bottleneck layer. For example [128, 128, 2] stands for an autoencoder with the following architecture {i, 128, 128, 2, 128, 128, i} where i is the number of dimensions of the input data. The two layers of size i are the input/output layers, which are not trained.

Type:

list of int
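The mirrored architecture can be derived from n_neurons with a few lines of Python. The helper `layer_sizes` is hypothetical and only reproduces the {i, 128, 128, 2, 128, 128, i} example.

```python
def layer_sizes(n_neurons, input_dim):
    """Expand n_neurons into the full, symmetric list of layer sizes."""
    # Decoder: hidden layers in reverse order, without repeating the bottleneck.
    mirrored = n_neurons[-2::-1]
    return [input_dim] + n_neurons + mirrored + [input_dim]


print(layer_sizes([128, 128, 2], input_dim=10))
```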

activation_functions#

List of activation function names as implemented in TensorFlow. For example: “relu”, “tanh”, “sigmoid” or “” to use no activation function. The encoder part of the network takes the activation functions from the list starting with the second element. The decoder part of the network takes the activation functions in reversed order starting with the second element from the back. For example [“”, “relu”, “tanh”, “”] would result in an autoencoder with {“relu”, “tanh”, “”, “tanh”, “relu”, “”} as sequence of activation functions.

Type:

list of str
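The rule for distributing the activation functions over encoder and decoder can be written out as list slicing. The helper `activation_sequence` is hypothetical; it only reproduces the example from the description.

```python
def activation_sequence(activation_functions):
    """Return the encoder activations followed by the decoder activations."""
    # Encoder: everything from the second element onwards.
    encoder = activation_functions[1:]
    # Decoder: reversed list, starting with the second element from the back.
    decoder = activation_functions[::-1][1:]
    return encoder + decoder


print(activation_sequence(["", "relu", "tanh", ""]))
```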

periodicity#

Defines the distance between periodic walls for the inputs. For example 2pi for angular values in radians. All periodic data processed by EncoderMap must be wrapped to one periodic window. E.g. data with 2pi periodicity may contain values from -pi to pi or from 0 to 2pi. Set the periodicity to float(“inf”) for non-periodic inputs.

Type:

float
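Wrapping data into one periodic window, and measuring distances across the periodic boundary, can be sketched as follows. Both helpers are hypothetical illustrations, not EncoderMap functions; `wrap` uses the window from -periodicity/2 to periodicity/2.

```python
import math


def wrap(value, periodicity=2 * math.pi):
    """Wrap a value into the window [-periodicity/2, periodicity/2)."""
    if math.isinf(periodicity):
        return value  # non-periodic data is left untouched
    half = periodicity / 2
    return (value + half) % periodicity - half


def periodic_distance(a, b, periodicity=2 * math.pi):
    """Shortest distance between two values across the periodic boundary."""
    d = abs(a - b)
    if math.isinf(periodicity):
        return d
    d %= periodicity
    return min(d, periodicity - d)


print(wrap(1.5 * math.pi))                         # wrapped to -pi/2
print(periodic_distance(-3.0, 3.0))                # short way across the boundary
print(periodic_distance(-3.0, 3.0, float("inf")))  # plain distance
```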

learning_rate#

Learning rate used by the optimizer.

Type:

float

n_steps#

Number of training steps.

Type:

int

batch_size#

Number of training points used in each training step.

Type:

int

summary_step#

A summary for TensorBoard is written every summary_step steps.

Type:

int

checkpoint_step#

A checkpoint is written every checkpoint_step steps.

Type:

int

dist_sig_parameters#

Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l)

Type:

tuple of floats

distance_cost_scale#

Adjusts how much the distance based metric is weighted in the cost function.

Type:

int

auto_cost_scale#

Adjusts how much the autoencoding cost is weighted in the cost function.

Type:

int

auto_cost_variant#

Defines how the auto cost is calculated. Must be one of:

  • “mean_square”

  • “mean_abs”

  • “mean_norm”.

Type:

str

center_cost_scale#

Adjusts how much the centering cost is weighted in the cost function.

Type:

float

l2_reg_constant#

Adjusts how much the L2 regularisation is weighted in the cost function.

Type:

float

gpu_memory_fraction#

Specifies the fraction of GPU memory blocked. If set to 0, memory is allocated as needed.

Type:

float

analysis_path#

A path that can be used to store analysis results.

Type:

str

id#

Can be any name for the run. Might be useful for example for specific analysis for different data sets.

Type:

str

model_api#

A string defining the API to be used to build the keras model. Defaults to sequential. Possible strings are:

  • functional will use keras’ functional API.

  • sequential will define a keras Model, containing two other models with the Sequential API. These two models are encoder and decoder.

  • custom will create a custom Model where even the layers are custom.

Type:

str

loss#

A string defining the loss function. Defaults to emap_cost. Possible losses are:

  • reconstruction_loss will try to train output == input.

  • mse returns a mean squared error loss.

  • emap_cost is the EncoderMap loss function. Depending on the class (Autoencoder, EncoderMap, ADCAutoencoder), different contributions are used for a combined loss. Autoencoder uses auto_cost, reg_cost, center_cost. The EncoderMap class adds sigmoid_loss.

Type:

str

batched#

Whether the dataset is batched or not.

Type:

bool

training#

A string defining what kind of training is performed when autoencoder.train() is called.

  • auto does a regular model.compile() and model.fit() procedure.

  • custom uses gradient tape and calculates losses and gradients manually.

Type:

str

tensorboard#

Whether to print tensorboard information. Defaults to False.

Type:

bool

seed#

Fixes the state of all operations using random numbers. Defaults to None.

Type:

Union[int, None]

current_training_step#

The current training step. Aids in reloading of models.

Type:

int

write_summary#

If True, writes a summary.txt of the models into main_path. If tensorboard is True, summaries will also be written.

Type:

bool

trainable_dense_to_sparse#

When using different topologies to train the AngleDihedralCartesianEncoderMap, some inputs might be sparse, which means they have missing values. A dense input is created by first passing these sparse tensors through tf.keras.layers.Dense layers. These layers have trainable weights, and if this parameter is True, these weights will be changed by the optimizer.

Type:

bool

using_hypercube#

This parameter is not meant to be set by the user. It allows us to print better error messages when re-loading and re-training a model. It contains a boolean that states whether a model has been trained on the hypercube example data. If your data is 4-dimensional and you reload a model and forget to provide your data, the model will happily train with the hypercube (and not your) data. This variable implements a check.

Type:

bool

Examples

>>> import encodermap as em
>>> import tempfile
>>> from pathlib import Path
...
>>> with tempfile.TemporaryDirectory() as td:
...     td = Path(td)
...     p = em.Parameters()
...     print(p.auto_cost_variant)
...     savepath = p.save(td / "parameters.json")
...     print(savepath)
...     new_params = em.Parameters.from_file(td / "parameters.json")
...     print(new_params.main_path)  
mean_abs
/tmp...parameters.json
seems like the parameter file was moved to another directory. Parameter file is updated ...
/home...
_defaults = {'activation_functions': ['', 'tanh', 'tanh', ''], 'analysis_path': '', 'auto_cost_scale': 1, 'auto_cost_variant': 'mean_abs', 'batch_size': 256, 'batched': True, 'center_cost_scale': 0.0001, 'checkpoint_step': 5000, 'current_training_step': 0, 'dist_sig_parameters': (4.5, 12, 6, 1, 2, 6), 'distance_cost_scale': 500, 'gpu_memory_fraction': 0, 'id': '', 'l2_reg_constant': 0.001, 'learning_rate': 0.001, 'loss': 'emap_cost', 'model_api': 'sequential', 'n_neurons': [128, 128, 2], 'n_steps': 1000, 'periodicity': 6.283185307179586, 'seed': None, 'summary_step': 10, 'tensorboard': False, 'trainable_dense_to_sparse': False, 'training': 'auto', 'using_hypercube': False, 'write_summary': False}#
classmethod defaults_description()[source]#

str: A string that contains tabulated default parameter values.

Return type:

str

Module contents#