Models

ToDo: Add some nice images to the plot_model of the functional model.

class encodermap.models.models.ADCSequentialModel(*args, **kwargs)

Bases: SequentialModel

call(x, training=False)

Calls the model on new inputs and returns the outputs as tensors.

In this case, call() just reapplies all ops in the graph to the new inputs (i.e. it builds a new computational graph from the provided inputs).

Note: This method should not be called directly. It is only meant to be overridden when subclassing tf.keras.Model. To call a model on an input, always use the __call__() method, i.e. model(inputs), which relies on the underlying call() method.

Parameters:

x – Input tensor, or dict/list/tuple of input tensors.
training – Boolean or boolean scalar tensor, indicating whether to run the network in training mode or inference mode.

Returns:

A tensor if there is a single output, or a list of tensors if there is more than one output.

call_and_map_back(x, distances, angles, dihedrals, cartesians, splits, side_dihedrals=None)

train_step(data)

Overrides the default train_step. What is different?

Not much. The provided data is still expected to be a tuple of (x, y), as it would be (data, classes) in a classification task. The data is unpacked and y is discarded, because the autoencoder model solves a regression task.

Parameters: data (tuple) – The (x, y) data of this train step.
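The unpacking described above can be sketched in plain Python. This is a minimal illustration, not EncoderMap's implementation: the helper name unpack_autoencoder_batch is hypothetical, and the sketch assumes a batch is either a bare input or a tuple whose first element is x.

```python
def unpack_autoencoder_batch(data):
    """Return only the inputs x of a training batch (hypothetical helper).

    If the batch is a tuple such as (x, y), everything after x
    (e.g. class labels y) is discarded, because the autoencoder
    is trained on x alone.
    """
    if isinstance(data, tuple):
        return data[0]
    # A bare batch already consists of the inputs only.
    return data


x = unpack_autoencoder_batch(([0.1, 0.2], ["class_a"]))
print(x)  # [0.1, 0.2]
```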

class encodermap.models.models.FunctionalModel(*args, **kwargs)

Bases: Model

compile(*args, **kwargs)

Configures the model for training.

Example:

```python
import tensorflow as tf

# `model` is assumed to be an already constructed tf.keras.Model.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy(),
                       tf.keras.metrics.FalseNegatives()])
```

Parameters:

optimizer – String (name of optimizer) or optimizer instance. See tf.keras.optimizers.
loss – Loss function. May be a string (name of loss function), or a tf.keras.losses.Loss instance. See tf.keras.losses. A loss function is any callable with the signature loss = fn(y_true, y_pred), where y_true are the ground truth values, and y_pred are the model’s predictions. y_true should have shape (batch_size, d0, .. dN) (except in the case of sparse loss functions such as sparse categorical crossentropy which expects integer arrays of shape (batch_size, d0, .. dN-1)). y_pred should have shape (batch_size, d0, .. dN). The loss function should return a float tensor. If a custom Loss instance is used and reduction is set to None, return value has shape (batch_size, d0, .. dN-1) i.e. per-sample or per-timestep loss values; otherwise, it is a scalar. If the model has multiple outputs, you can use a different loss on each output by passing a dictionary or a list of losses. The loss value that will be minimized by the model will then be the sum of all individual losses, unless loss_weights is specified.
metrics – List of metrics to be evaluated by the model during training and testing. Each of these can be a string (name of a built-in function), a function, or a tf.keras.metrics.Metric instance. See tf.keras.metrics. Typically you will use metrics=['accuracy']. A function is any callable with the signature result = fn(y_true, y_pred). To specify different metrics for different outputs of a multi-output model, you could also pass a dictionary, such as metrics={'output_a': 'accuracy', 'output_b': ['accuracy', 'mse']}. You can also pass a list to specify a metric or a list of metrics for each output, such as metrics=[['accuracy'], ['accuracy', 'mse']] or metrics=['accuracy', ['accuracy', 'mse']]. When you pass the strings 'accuracy' or 'acc', we convert this to one of tf.keras.metrics.BinaryAccuracy, tf.keras.metrics.CategoricalAccuracy, tf.keras.metrics.SparseCategoricalAccuracy based on the shapes of the targets and of the model output. We do a similar conversion for the strings 'crossentropy' and 'ce' as well. The metrics passed here are evaluated without sample weighting; if you would like sample weighting to apply, you can specify your metrics via the weighted_metrics argument instead.
loss_weights – Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs. The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients. If a list, it is expected to have a 1:1 mapping to the model’s outputs. If a dict, it is expected to map output names (strings) to scalar coefficients.
weighted_metrics – List of metrics to be evaluated and weighted by sample_weight or class_weight during training and testing.
run_eagerly – Bool. Defaults to False. If True, this Model’s logic will not be wrapped in a tf.function. Recommended to leave this at its default unless your Model cannot be run inside a tf.function. run_eagerly=True is not supported when using tf.distribute.experimental.ParameterServerStrategy.
steps_per_execution – Int. Defaults to 1. The number of batches to run during each tf.function call. Running multiple batches inside a single tf.function call can greatly improve performance on TPUs or small models with a large Python overhead. At most, one full epoch will be run each execution. If a number larger than the size of the epoch is passed, the execution will be truncated to the size of the epoch. Note that if steps_per_execution is set to N, Callback.on_batch_begin and Callback.on_batch_end methods will only be called every N batches (i.e. before/after each tf.function execution).
jit_compile – If True, compile the model training step with XLA. [XLA](https://www.tensorflow.org/xla) is an optimizing compiler for machine learning. jit_compile is not enabled by default. This option cannot be enabled with run_eagerly=True. Note that jit_compile=True may not necessarily work for all models. For more information on supported operations please refer to the [XLA documentation](https://www.tensorflow.org/xla). Also refer to [known XLA issues](https://www.tensorflow.org/xla/known_issues) for more details.
**kwargs – Arguments supported for backwards compatibility only.
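To make the interplay of loss and loss_weights concrete, here is a plain-Python sketch of how per-output loss values combine into the single value that is minimized. The function total_loss and the output names are hypothetical, and the sketch assumes that loss_weights, when given, is a dict listing every output.

```python
def total_loss(per_output_losses, loss_weights=None):
    """Combine per-output losses into one scalar (hypothetical helper).

    Without loss_weights the losses are simply summed; with a dict of
    loss_weights, each loss is first scaled by its output's coefficient.
    """
    if loss_weights is None:
        return sum(per_output_losses.values())
    return sum(loss_weights[name] * value
               for name, value in per_output_losses.items())


losses = {"output_a": 0.5, "output_b": 2.0}
print(total_loss(losses))                                       # 2.5
print(total_loss(losses, {"output_a": 1.0, "output_b": 0.25}))  # 1.0
```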

decoder(x, training=False)

encoder(x, training=False)

get_loss(inp)

train_step(data)

The logic for one training step.

This method can be overridden to support custom training logic. For concrete examples of how to override this method see [Customizing what happens in fit](https://www.tensorflow.org/guide/keras/customizing_what_happens_in_fit). This method is called by Model.make_train_function.

This method should contain the mathematical logic for one step of training. This typically includes the forward pass, loss calculation, backpropagation, and metric updates.

Configuration details for how this logic is run (e.g. tf.function and tf.distribute.Strategy settings) should be left to Model.make_train_function, which can also be overridden.

Parameters: data – A nested structure of `Tensor`s.
Returns: A dict containing values that will be passed to tf.keras.callbacks.CallbackList.on_train_batch_end. Typically, the values of the Model’s metrics are returned. Example: {'loss': 0.2, 'accuracy': 0.7}.
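The four ingredients listed above (forward pass, loss calculation, backpropagation and metric updates) can be seen in a deliberately tiny, framework-free sketch: one gradient-descent step for a one-feature linear model y = w * x + b with a mean squared error loss. The function toy_train_step and its parameter names are hypothetical, and the gradients are written out by hand where a real train_step would typically use tf.GradientTape.

```python
def toy_train_step(params, batch, learning_rate=0.1):
    """One gradient-descent step for y = w * x + b (hypothetical toy)."""
    xs, ys = batch
    w, b = params["w"], params["b"]
    n = len(xs)
    # Forward pass.
    preds = [w * x + b for x in xs]
    # Loss calculation: mean squared error.
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / n
    # Backpropagation: analytic gradients of the MSE w.r.t. w and b.
    grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
    grad_b = 2.0 * sum(errors) / n
    # Parameter update (plain gradient descent, modifies params in place).
    params["w"] = w - learning_rate * grad_w
    params["b"] = b - learning_rate * grad_b
    # Metric values are returned as a dict, mirroring Keras' convention.
    return {"loss": loss}


params = {"w": 0.0, "b": 0.0}
batch = ([1.0, 2.0], [2.0, 4.0])
print(toy_train_step(params, batch))  # {'loss': 10.0}
```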

class encodermap.models.models.SequentialModel(*args, **kwargs)

Bases: Model

build(*args, **kwargs)

Builds the model based on input shapes received.

This is to be used for subclassed models, which do not know at instantiation time what their inputs look like.

This method only exists for users who want to call model.build() in a standalone way (as a substitute for calling the model on real data to build it). It will never be called by the framework (and thus it will never throw unexpected errors in an unrelated workflow).

Parameters:

input_shape – Single tuple, TensorShape instance, or list/dict of shapes, where shapes are tuples, integers, or TensorShape instances.

Raises:

ValueError –
1. In case of invalid user-provided data (not of type tuple, list, TensorShape, or dict).
2. If the model requires call arguments that are agnostic to the input shapes (positional or keyword arg in call signature).
3. If not all layers were properly built.
4. If float type inputs are not supported within the layers.
In each of these cases, the user should build their model by calling it on real tensor data.
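The idea behind build(), creating weights only once the input shape is known, either through an explicit build() call or implicitly on the first call with real data, can be illustrated without TensorFlow. The class LazyDense below is a hypothetical toy, not a Keras layer; it stores its kernel as nested lists and initializes it with zeros for simplicity.

```python
class LazyDense:
    """Toy layer whose kernel shape depends on the input width (hypothetical)."""

    def __init__(self, units):
        self.units = units
        self.kernel = None  # unknown until the input shape is known
        self.built = False

    def build(self, input_shape):
        # Only the last dimension (number of features) matters for the kernel.
        in_features = input_shape[-1]
        self.kernel = [[0.0] * self.units for _ in range(in_features)]
        self.built = True

    def __call__(self, batch):
        if not self.built:
            # Calling the layer on real data builds it implicitly.
            self.build((len(batch), len(batch[0])))
        # Plain matrix product: (batch, in_features) x (in_features, units).
        return [
            [sum(row[i] * self.kernel[i][j] for i in range(len(row)))
             for j in range(self.units)]
            for row in batch
        ]


explicit = LazyDense(units=3)
explicit.build((None, 4))  # standalone build, no data needed
print(len(explicit.kernel), len(explicit.kernel[0]))  # 4 3

implicit = LazyDense(units=2)
print(implicit([[1.0, 2.0, 3.0]]))  # [[0.0, 0.0]]
```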

call(x, training=False)

Calls the model on new inputs and returns the outputs as tensors.

In this case, call() just reapplies all ops in the graph to the new inputs (i.e. it builds a new computational graph from the provided inputs).

Parameters:

x – Input tensor, or dict/list/tuple of input tensors.
training – Boolean or boolean scalar tensor, indicating whether to run the network in training mode or inference mode.

Returns:

A tensor if there is a single output, or a list of tensors if there is more than one output.

compile(*args, **kwargs)

Configures the model for training.

Example:

```python
import tensorflow as tf

# `model` is assumed to be an already constructed tf.keras.Model.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss=tf.keras.losses.BinaryCrossentropy(),
              metrics=[tf.keras.metrics.BinaryAccuracy(),
                       tf.keras.metrics.FalseNegatives()])
```

Parameters:

optimizer – String (name of optimizer) or optimizer instance. See tf.keras.optimizers.
loss – Loss function. May be a string (name of loss function), or a tf.keras.losses.Loss instance. See tf.keras.losses. A loss function is any callable with the signature loss = fn(y_true, y_pred), where y_true are the ground truth values, and y_pred are the model’s predictions. y_true should have shape (batch_size, d0, .. dN) (except in the case of sparse loss functions such as sparse categorical crossentropy which expects integer arrays of shape (batch_size, d0, .. dN-1)). y_pred should have shape (batch_size, d0, .. dN). The loss function should return a float tensor. If a custom Loss instance is used and reduction is set to None, return value has shape (batch_size, d0, .. dN-1) i.e. per-sample or per-timestep loss values; otherwise, it is a scalar. If the model has multiple outputs, you can use a different loss on each output by passing a dictionary or a list of losses. The loss value that will be minimized by the model will then be the sum of all individual losses, unless loss_weights is specified.
metrics – List of metrics to be evaluated by the model during training and testing. Each of these can be a string (name of a built-in function), a function, or a tf.keras.metrics.Metric instance. See tf.keras.metrics. Typically you will use metrics=['accuracy']. A function is any callable with the signature result = fn(y_true, y_pred). To specify different metrics for different outputs of a multi-output model, you could also pass a dictionary, such as metrics={'output_a': 'accuracy', 'output_b': ['accuracy', 'mse']}. You can also pass a list to specify a metric or a list of metrics for each output, such as metrics=[['accuracy'], ['accuracy', 'mse']] or metrics=['accuracy', ['accuracy', 'mse']]. When you pass the strings 'accuracy' or 'acc', we convert this to one of tf.keras.metrics.BinaryAccuracy, tf.keras.metrics.CategoricalAccuracy, tf.keras.metrics.SparseCategoricalAccuracy based on the shapes of the targets and of the model output. We do a similar conversion for the strings 'crossentropy' and 'ce' as well. The metrics passed here are evaluated without sample weighting; if you would like sample weighting to apply, you can specify your metrics via the weighted_metrics argument instead.
loss_weights – Optional list or dictionary specifying scalar coefficients (Python floats) to weight the loss contributions of different model outputs. The loss value that will be minimized by the model will then be the weighted sum of all individual losses, weighted by the loss_weights coefficients. If a list, it is expected to have a 1:1 mapping to the model’s outputs. If a dict, it is expected to map output names (strings) to scalar coefficients.
weighted_metrics – List of metrics to be evaluated and weighted by sample_weight or class_weight during training and testing.
run_eagerly – Bool. Defaults to False. If True, this Model’s logic will not be wrapped in a tf.function. Recommended to leave this at its default unless your Model cannot be run inside a tf.function. run_eagerly=True is not supported when using tf.distribute.experimental.ParameterServerStrategy.
steps_per_execution – Int. Defaults to 1. The number of batches to run during each tf.function call. Running multiple batches inside a single tf.function call can greatly improve performance on TPUs or small models with a large Python overhead. At most, one full epoch will be run each execution. If a number larger than the size of the epoch is passed, the execution will be truncated to the size of the epoch. Note that if steps_per_execution is set to N, Callback.on_batch_begin and Callback.on_batch_end methods will only be called every N batches (i.e. before/after each tf.function execution).
jit_compile – If True, compile the model training step with XLA. [XLA](https://www.tensorflow.org/xla) is an optimizing compiler for machine learning. jit_compile is not enabled by default. This option cannot be enabled with run_eagerly=True. Note that jit_compile=True may not necessarily work for all models. For more information on supported operations please refer to the [XLA documentation](https://www.tensorflow.org/xla). Also refer to [known XLA issues](https://www.tensorflow.org/xla/known_issues) for more details.
**kwargs – Arguments supported for backwards compatibility only.
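The effect of steps_per_execution on how often batch callbacks fire can be worked out with simple arithmetic. This sketch assumes that the last execution in an epoch simply runs whatever batches are left over; executions_per_epoch is a hypothetical helper, not part of TensorFlow.

```python
import math


def executions_per_epoch(steps_per_epoch, steps_per_execution):
    """Number of tf.function executions (and batch-callback calls) per epoch."""
    # At most one full epoch runs per execution, so larger values are truncated.
    per_execution = min(steps_per_execution, steps_per_epoch)
    return math.ceil(steps_per_epoch / per_execution)


print(executions_per_epoch(100, 1))   # 100: callbacks after every batch
print(executions_per_epoch(100, 32))  # 4: 32 + 32 + 32 + 4 batches
print(executions_per_epoch(10, 50))   # 1: truncated to the epoch size
```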

decoder(x, training=False)

encoder(x, training=False)

train_step(data)

Overrides the default train_step.

Parameters: data (tuple) – The (x, y) data of this train step.

class encodermap.models.models.Sparse(*args, **kwargs)

Bases: Dense

call(inputs)

This is where the layer’s logic lives.

The call() method may not create state (except in its first invocation, wrapping the creation of variables or other resources in tf.init_scope()). It is recommended to create state, including tf.Variable instances and nested Layer instances, in __init__(), or in the build() method that is called automatically before call() executes for the first time.

Parameters:

inputs –
Input tensor, or dict/list/tuple of input tensors. The first positional inputs argument is subject to special rules:
- inputs must be explicitly passed. A layer cannot have zero arguments, and inputs cannot be provided via the default value of a keyword argument.
- NumPy array or Python scalar values in inputs get cast as tensors.
- Keras mask metadata is only collected from inputs.
- Layers are built (build(input_shape) method) using shape info from inputs only.
- input_spec compatibility is only checked against inputs.
- Mixed precision input casting is only applied to inputs. If a layer has tensor arguments in *args or **kwargs, their casting behavior in mixed precision should be handled manually.
- The SavedModel input specification is generated using inputs only.
- Integration with various ecosystem packages like TFMOT, TFLite, TF.js, etc. is only supported for inputs and not for tensors in positional and keyword arguments.
*args – Additional positional arguments. May contain tensors, although this is not recommended, for the reasons above.
**kwargs –
Additional keyword arguments. May contain tensors, although this is not recommended, for the reasons above. The following optional keyword arguments are reserved:
- training: Boolean scalar tensor or Python boolean indicating whether the call is meant for training or inference.
- mask: Boolean input mask. If the layer’s call() method takes a mask argument, its default value will be set to the mask generated for inputs by the previous layer (if inputs did come from a layer that generated a corresponding mask, i.e. if it came from a Keras layer with masking support).

Returns:

A tensor or list/tuple of tensors.
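Sparse derives from Dense, and its name suggests that it applies a dense kernel to sparse input; that reading is an assumption, since the documentation above does not state it. Under that assumption, the arithmetic such a layer performs can be sketched in plain Python with a coordinate-format (COO) sparse input, where only the stored non-zero entries contribute to the output. The function sparse_dense_layer is hypothetical, and no activation is applied.

```python
def sparse_dense_layer(indices, values, n_rows, kernel, bias):
    """Compute inputs @ kernel + bias for a sparse input given in COO format.

    indices: list of (row, col) positions of the non-zero input entries
    values:  the corresponding non-zero values
    kernel:  nested list of shape (in_features, units)
    bias:    list of length units
    """
    # Every output row starts from the bias.
    out = [list(bias) for _ in range(n_rows)]
    # Each stored entry adds value * kernel[col] to its output row.
    for (row, col), value in zip(indices, values):
        for j, weight in enumerate(kernel[col]):
            out[row][j] += value * weight
    return out


# Equivalent dense input: [[0.0, 2.0], [3.0, 0.0]]
print(sparse_dense_layer([(0, 1), (1, 0)], [2.0, 3.0], 2,
                         kernel=[[1.0, 0.0], [0.0, 1.0]], bias=[0.5, 0.5]))
# [[0.5, 2.5], [3.5, 0.5]]
```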

class encodermap.models.models.SparseFunctionalModel(*args, **kwargs)

Bases: FunctionalModel

get_loss(inp)

class encodermap.models.models.SparseModel(*args, **kwargs)

Bases: Model

call(sparse_tensor)

Calls the model on new inputs and returns the outputs as tensors.

In this case, call() just reapplies all ops in the graph to the new inputs (i.e. it builds a new computational graph from the provided inputs).

Parameters:

sparse_tensor – Input tensor, or dict/list/tuple of input tensors.

Returns:

A tensor if there is a single output, or a list of tensors if there is more than one output.

encodermap.models.models.gen_functional_model(input_dataset, parameters=None, reload_layers=None, sparse=False)

Builds a model to specification of parameters using the functional API.

The functional API is much more flexible than the sequential API, in that models with multiple inputs and outputs can be defined. Custom layers and sub-models can be intermixed. In EncoderMap’s case, the functional API is used to build the AngleDihedralCartesianAutoencoder, which takes input data in the form of a tf.data.Dataset with:

backbone_angles (angles between the C, CA and N atoms in the backbone).

backbone_torsions (dihedral angles in the backbone, commonly known as omega, phi, psi).

cartesian_coordinates (coordinates of the C, CA and N backbone atoms; this data has ndim 3, the others have ndim 2).

backbone_distances (distances between the C, CA, N backbone atoms).

sidechain_torsions (dihedral angles in the sidechain, commonly known as chi1, chi2, chi3, chi4, chi5).

Packing and unpacking that data in the correct manner is important. Make sure to double check whether you are using angles or dihedrals. A simple print of the shape can be enough.

In the functional model, all operations are tf.keras.layers, meaning that the projection onto a unit circle that the SequentialModel does in its call() method needs to be a layer. The FunctionalModel consists of five main parts:

Angle Inputs: The provided dataset is unpacked and the periodic data of the angles is projected onto a unit circle. If the angles are in gradians, they will also be normalized into a [-pi, pi) interval.

Autoencoder: The trainable part of the network is the autoencoder, built to the specifications in the provided parameters. Here, Dense layers are stacked. Only the angles and torsions are fed into the autoencoder; the distances and Cartesian coordinates are used later.

Angle Outputs: The angles are recalculated from their unit-circle representation.

Back-Mapping: The back-mapping layer takes backbone_angles, backbone_dihedrals and backbone_distances to calculate new Cartesian coordinates.

Pairwise Distances: The pairwise distances of the input Cartesians and the back-mapped Cartesians are calculated.
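Two of the steps above, the unit-circle projection of the angle inputs and outputs and the pairwise distances, are plain geometry and can be sketched without TensorFlow. The function names are hypothetical and this is not how EncoderMap's layers are implemented; it only shows why the projection helps: angles such as 3.1 and -3.1 radians are far apart as numbers but close on the circle. wrap_to_pi shows one common way to map an angle into [-pi, pi).

```python
import math


def wrap_to_pi(angle):
    """Map an angle in radians into the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def to_unit_circle(angles):
    """Project angles (radians) onto the unit circle as (cos, sin) pairs."""
    return [(math.cos(a), math.sin(a)) for a in angles]


def from_unit_circle(points):
    """Recover angles in [-pi, pi] from (cos, sin) pairs."""
    return [math.atan2(s, c) for c, s in points]


def pairwise_distances(coords):
    """Euclidean distances between all pairs (i < j) of points."""
    return [math.dist(coords[i], coords[j])
            for i in range(len(coords))
            for j in range(i + 1, len(coords))]


points = to_unit_circle([3.1, -3.1])
print(math.dist(points[0], points[1]) < 0.1)  # True: neighbours on the circle
print(pairwise_distances([(0, 0, 0), (3, 0, 0), (3, 4, 0)]))  # [3.0, 5.0, 4.0]
```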

Parameters:

input_dataset (tf.data.Dataset) – The dataset with the data in the order given in the explanation.
parameters (Union[em.ADCParameters, None], optional) – The parameters to be used to build the network. If None is provided, the default parameters in encodermap.ADCParameters.defaults are used. You can look at the defaults with print(em.ADCParameters.defaults_description()). Defaults to None.
reload_layers (Union[None, list], optional) – List of layers that will be reloaded when reloading the model from disk. Defaults to None, which builds a new model.

Raises:

AssertionError – Raised when the input data is not formatted correctly, i.e. if len(distances) != len(cartesians) - 1 or len(angles) != len(cartesians) - 2. This can mean that the input dataset is not packed correctly (please keep the order specified above), or that the provided protein is not linear (branched, circular, …).
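The shape bookkeeping behind this AssertionError follows from the geometry of an unbranched chain: N atoms are joined by N - 1 bonds (distances), form N - 2 consecutive triples (angles) and N - 3 consecutive quadruples (dihedrals). The hypothetical helper check_backbone_counts below checks these counts before data is fed to a model; the dihedral check is an addition that follows the same counting and is not stated in the text above.

```python
def check_backbone_counts(n_cartesians, n_distances, n_angles, n_dihedrals):
    """Raise AssertionError if the counts do not describe one linear chain."""
    expected = {
        "distances": (n_distances, n_cartesians - 1),  # one per bond
        "angles": (n_angles, n_cartesians - 2),        # one per atom triple
        "dihedrals": (n_dihedrals, n_cartesians - 3),  # one per atom quadruple
    }
    for name, (got, want) in expected.items():
        if got != want:
            raise AssertionError(
                f"expected {want} {name} for {n_cartesians} atoms, got {got}"
            )


# Three residues with N, CA and C each give 9 backbone atoms.
check_backbone_counts(9, 8, 7, 6)  # passes silently
```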

Returns:

A subclass of tf.keras.Model built with the specified parameters.

Return type:

em.FunctionalModel

encodermap.models.models.gen_sequential_model(input_shape, parameters=None, sparse=False)

Returns a tf.keras Model built with the specified input shape and the parameters in the Parameters class.

Parameters:

input_shape (int) – The input shape of the returned model. In most cases that is data.shape[1] of your data.
parameters (Union[encodermap.Parameters, encodermap.ADCParameters, None], optional) – The parameters to use on the returned model. If None is provided, the default parameters in encodermap.Parameters.defaults are used. You can look at the defaults with print(em.Parameters.defaults_description()). Defaults to None.

Returns:

A subclass of tf.keras.Model built with the specified parameters.

Return type:

em.SequentialModel