encodermap.encodermap_tf1 package#

Submodules#

encodermap.encodermap_tf1.angle_dihedral_cartesian_encodermap module#

class AngleDihedralCartesianEncoderMap(*args, **kwargs)[source]#

Bases: Autoencoder

This EncoderMap variant is designed specifically for protein conformations. During training, the cartesian conformations of the backbone chain are reconstructed from backbone angles and dihedrals. This allows a more sophisticated comparison of input and generated conformations and improves the accuracy of generated conformations, especially for large proteins. This is achieved with the cartesian_cost, which compares pairwise distances between atoms in cartesian coordinates in the input and generated conformations.
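To illustrate why comparing pairwise distances is useful, here is a minimal NumPy sketch. It is not the library's cartesian_cost implementation; the helper names pairwise_distances and mean_abs_distance_difference are hypothetical. It only shows that a distance-based comparison ignores rigid rotations and translations of a conformation while still detecting real changes in shape.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the (n_atoms, n_atoms) matrix of Euclidean distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def mean_abs_distance_difference(reference, generated):
    """Mean absolute difference between two pairwise distance matrices."""
    return np.mean(
        np.abs(pairwise_distances(reference) - pairwise_distances(generated))
    )


# Three atoms of a toy chain (hypothetical coordinates).
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 1.0, 0.0]])

# The same conformation rotated by 90 degrees about z and translated.
rotation = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
moved = reference @ rotation.T + np.array([5.0, -2.0, 3.0])

# A genuinely different conformation: all distances doubled.
stretched = reference * 2.0

print(mean_abs_distance_difference(reference, moved))      # close to zero
print(mean_abs_distance_difference(reference, stretched))  # clearly non-zero
```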

generate(latent, quantity=None)[source]#

Generates new high-dimensional points based on given low-dimensional points using the decoder part of the autoencoder.

Parameters:

latent – 2d numpy array containing points in the low-dimensional space. The number of columns must be equal to the number of neurons in the bottleneck layer of the autoencoder.

Returns:

2d numpy array containing points in the high-dimensional space.

class AngleDihedralCartesianEncoderMapDummy(*args, **kwargs)[source]#

Bases: AngleDihedralCartesianEncoderMap

encodermap.encodermap_tf1.autoencoder module#

class Autoencoder(parameters, train_data=None, validation_data=None, checkpoint_path=None, n_inputs=None, read_only=False, seed=None, debug=False)[source]#

Bases: object

close()[source]#

Closes the TensorFlow session to free resources.

encode(inputs)[source]#

Projects high dimensional data to a low dimensional space using the encoder part of the autoencoder.

Parameters:

inputs – 2d numpy array with the same number of columns as the train_data used for training

Returns:

2d numpy array with the points projected into the low-dimensional space. The number of columns is equal to the number of neurons in the bottleneck layer of the autoencoder.

generate(latent)[source]#

Generates new high-dimensional points based on given low-dimensional points using the decoder part of the autoencoder.

Parameters:

latent – 2d numpy array containing points in the low-dimensional space. The number of columns must be equal to the number of neurons in the bottleneck layer of the autoencoder.

Returns:

2d numpy array containing points in the high-dimensional space.

profile()[source]#
train()[source]#

Train the autoencoder as specified in the parameters object.

encodermap.encodermap_tf1.backmapping module#

_expand_universe(universe, length)[source]#
_set_dihedral(dihedral, atoms, angle)[source]#
chain_in_plane(lengths, angles)[source]#

Reconstructs cartesian coordinates from distances and angles.

dihedral_backmapping(pdb_path, dihedral_trajectory, rough_n_points=-1)[source]#

Takes a pdb file with a peptide and creates a trajectory based on the given dihedral angles. It simply rotates around the dihedral angle axis. In the result, side chains might overlap, but the backbone should turn out quite well.

Parameters:
  • pdb_path – (str)

  • dihedral_trajectory – array-like of shape (traj_length, number_of_dihedrals)

  • rough_n_points – (int) controls subsampling of dihedral_trajectory. A step size is calculated as max(1, int(len(dihedral_trajectory) / rough_n_points)) and used to select a subset of values. With rough_n_points = -1, all values are used.

Returns:

(MDAnalysis.Universe)
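The step size formula given for rough_n_points can be worked through with a few numbers. The sketch below is an illustration only: subsample_step is a hypothetical helper, and selecting frames with a plain slice is an assumption about how the step size is applied.

```python
import numpy as np


def subsample_step(n_frames, rough_n_points):
    """Step size as described for rough_n_points (hypothetical helper)."""
    return max(1, int(n_frames / rough_n_points))


# Toy trajectory with 1000 frames and 4 dihedrals per frame.
dihedral_trajectory = np.zeros((1000, 4))

# Asking for roughly 100 points gives a step of 10.
step = subsample_step(len(dihedral_trajectory), 100)
subset = dihedral_trajectory[::step]  # slicing is an assumption
print(step, subset.shape)

# rough_n_points = -1 gives int(1000 / -1) = -1000, so max(1, -1000) = 1
# and every frame is kept.
print(subsample_step(len(dihedral_trajectory), -1))
```

Note that asking for more points than there are frames also yields a step of 1, because int() truncates the ratio to 0 and max() lifts it back to 1.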

dihedral_to_cartesian_tf_one_way(dihedrals, cartesian)[source]#
dihedrals_to_cartesian_tf(dihedrals, cartesian)[source]#
dihedrals_to_cartesian_tf_old(dihedrals, cartesian=None, central_atom_indices=None, no_omega=False)[source]#
guess_amide_H(cartesians, atom_names)[source]#
guess_amide_O(cartesians, atom_names)[source]#
guess_sp2_atom(cartesians, atom_names, bond_partner, angle_to_previous, bond_length)[source]#
merge_cartesians(central_cartesians, central_atom_names, H_cartesians, O_cartesians)[source]#
straight_tetrahedral_chain(n_atoms=None, bond_lengths=None)[source]#

encodermap.encodermap_tf1.encodermap module#

class EncoderMap(parameters, train_data=None, validation_data=None, checkpoint_path=None, n_inputs=None, read_only=False, seed=None, debug=False)[source]#

Bases: Autoencoder

encodermap.encodermap_tf1.misc module#

add_layer_summaries(layer, debug=False)[source]#
Parameters:

layer

Returns:

create_dir(path)[source]#
Parameters:

path

Returns:

distance_cost(r_h, r_l, sig_h, a_h, b_h, sig_l, a_l, b_l, periodicity)[source]#
Parameters:
  • r_h

  • r_l

  • sig_h

  • a_h

  • b_h

  • sig_l

  • a_l

  • b_l

  • periodicity

Returns:

pairwise_dist(positions, squared=False, flat=False)[source]#
pairwise_dist_periodic(positions, periodicity)[source]#
periodic_distance(a, b, periodicity=6.283185307179586)[source]#
Parameters:
  • a

  • b

  • periodicity

Returns:

periodic_distance_np(a, b, periodicity=6.283185307179586)[source]#
Parameters:
  • a

  • b

  • periodicity

Returns:
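Since this page does not document the periodic distance functions, the following NumPy sketch shows one common way to compute such a distance. It is an assumption, not the library's confirmed implementation, and periodic_distance_sketch is a hypothetical name. It assumes both inputs are already wrapped to the same periodic window.

```python
import numpy as np


def periodic_distance_sketch(a, b, periodicity=2 * np.pi):
    """Shortest distance between a and b on a periodic axis.

    Assumes a and b lie within one periodic window, so the direct
    distance d is at most `periodicity`.
    """
    d = np.abs(np.asarray(b) - np.asarray(a))
    return np.minimum(d, periodicity - d)


# Two angles close to the periodic boundary at -pi / pi.
print(periodic_distance_sketch(-3.0, 3.0))  # 2*pi - 6, not 6

# With an infinite periodicity the plain distance is returned.
print(periodic_distance_sketch(-3.0, 3.0, periodicity=float("inf")))
```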

potential_energy(angles, dihedrals, distances)[source]#
random_on_cube_edges(n_points, sigma=0)[source]#
read_from_log(run_path, names)[source]#
rotation_matrix(axis_unit_vec, angle)[source]#
run_path(path)[source]#

Creates a directory at “path/run{i}”, where i is the smallest index for which the path does not yet exist.

Parameters:

path – (str)

Returns:

(str) path of the created folder
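The behaviour described for run_path can be sketched with the standard library alone. This is not the library's code: next_run_dir is a hypothetical name, and starting the numbering at 0 is an assumption.

```python
import os
import tempfile


def next_run_dir(path):
    """Create and return path/run{i} for the smallest unused i."""
    i = 0  # starting index is an assumption
    while True:
        candidate = os.path.join(path, "run{}".format(i))
        if not os.path.exists(candidate):
            os.makedirs(candidate)
            return candidate
        i += 1


# Use a throwaway directory so the example leaves no traces elsewhere.
base = tempfile.mkdtemp()
first = next_run_dir(base)
second = next_run_dir(base)
print(os.path.basename(first), os.path.basename(second))
```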

search_and_replace(file_path, search_pattern, replacement, out_path=None, backup=True)[source]#

Searches for a pattern in a text file and replaces each match with the replacement string.

Parameters:
  • file_path – (str) path to the file to search

  • search_pattern – (str) pattern to search for

  • replacement – (str) string that replaces the search_pattern in the output file

  • out_path – (str) path where the output file is written. If no path is given, the original file is replaced.

  • backup – (bool) if True, the original file is renamed to filename.bak before it is overwritten

Returns:
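A minimal sketch of the documented behaviour follows. It is an illustration under assumptions, not the library's implementation: search_and_replace_sketch is a hypothetical name, and treating search_pattern as a literal string rather than a regular expression is an assumption.

```python
import os
import tempfile


def search_and_replace_sketch(file_path, search_pattern, replacement,
                              out_path=None, backup=True):
    """Replace search_pattern in a text file (literal match assumed)."""
    with open(file_path, "r") as f:
        text = f.read()
    text = text.replace(search_pattern, replacement)
    if out_path is None:
        out_path = file_path
        if backup:
            # Keep the original content as filename.bak.
            os.replace(file_path, file_path + ".bak")
    with open(out_path, "w") as f:
        f.write(text)


# Demonstration on a temporary file with made-up content.
workdir = tempfile.mkdtemp()
demo_file = os.path.join(workdir, "settings.txt")
with open(demo_file, "w") as f:
    f.write("n_steps = 100\n")

search_and_replace_sketch(demo_file, "100", "5000")

with open(demo_file) as f:
    replaced = f.read()
with open(demo_file + ".bak") as f:
    original = f.read()
print(repr(replaced), repr(original))
```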

sigmoid(r, sig, a, b)[source]#
Parameters:
  • r

  • sig

  • a

  • b

Returns:
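The parameters (sig, a, b) are not explained on this page. The sketch below assumes the sketch-map style sigmoid, which is an assumption and not confirmed by this documentation. Under that assumption, sig is the distance at which the function reaches 0.5, and a and b control the steepness on either side. The name sigmoid_sketch and the example values are hypothetical.

```python
import numpy as np


def sigmoid_sketch(r, sig, a, b):
    """Assumed form: 1 - (1 + (2**(a/b) - 1) * (r/sig)**a) ** (-b/a)."""
    return 1 - (1 + (2 ** (a / b) - 1) * (r / sig) ** a) ** (-b / a)


# Example values chosen for illustration only.
sig, a, b = 4.5, 12, 6
r = np.array([0.0, 2.0, 4.5, 9.0])
print(sigmoid_sketch(r, sig, a, b))
```

With this form the output rises monotonically from 0 at r = 0 towards 1 for large r and equals 0.5 exactly at r = sig.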

variable_summaries(name, variables, debug=False)[source]#

Attach several summaries to a Tensor for TensorBoard visualization.

Parameters:
  • name

  • variables

Returns:

encodermap.encodermap_tf1.moldata module#

class Angles(atomgroups, **kwargs)[source]#

Bases: AnalysisBase

class MolData(atom_group, cache_path='', start=None, stop=None, step=None)[source]#

Bases: object

MolData is designed to extract and hold conformational information from trajectories.

Variables:
  • cartesians – numpy array of the trajectory atom coordinates

  • central_cartesians – cartesian coordinates of the central backbone atoms (N-CA-C-N-CA-C…)

  • dihedrals – all backbone dihedrals (phi, psi, omega)

  • angles – all bond angles of the central backbone atoms

  • lengths – all bond lengths between neighbouring central atoms

  • sidedihedrals – all sidechain dihedrals

  • aminoaciddict – number of sidechain dihedrals

static sort_key(atom)[source]#
write(path, coordinates, name='generated', formats=('pdb', 'xtc'), only_central=False, align_reference=None, align_select='all')[source]#

Writes a trajectory for the given coordinates.

Parameters:
  • path – directory where to save the trajectory

  • coordinates – numpy array of xyz coordinates (frames, atoms, xyz)

  • name – filename (without extension)

  • formats – specifies which formats should be used to write the structure and trajectory. Default: (“pdb”, “xtc”)

  • only_central – if True only central atom coordinates are expected (N-Ca-C…)

  • align_reference – Allows the generated conformations to be aligned to a reference. The reference should be given as an MDAnalysis atom group.

  • align_select – Allows to select which atoms should be used for the alignment. e.g. “resid 5:60” default is “all”. Have a look at the MDAnalysis selection syntax for more details.

Returns:

class Positions(atomgroup, **kwargs)[source]#

Bases: AnalysisBase

encodermap.encodermap_tf1.parameters module#

class ADCParameters[source]#

Bases: Parameters

This is the parameter object for the AngleDihedralCartesianEncoderMap. It holds all the parameters that the Parameters object includes, plus the following parameters:

Variables:
  • cartesian_pwd_start – index of the first atom to use for the pairwise distance calculation

  • cartesian_pwd_stop – index of the last atom to use for the pairwise distance calculation

  • cartesian_pwd_step – step for the calculation of pairwise distances. E.g. for a chain of atoms N-C_a-C-N-C_a-C… cartesian_pwd_start=1 and cartesian_pwd_step=3 will result in using all C-alpha atoms for the pairwise distance calculation.

  • use_backbone_angles – Allows to define whether backbone bond angles should be learned (True) or if instead mean values should be used to generate conformations (False)

  • angle_cost_scale – Adjusts how much the angle cost is weighted in the cost function.

  • angle_cost_variant – Defines how the angle cost is calculated. Must be one of: “mean_square”, “mean_abs”, “mean_norm”

  • angle_cost_reference – Can be used to normalize the angle cost with the cost of some reference model (dummy)

  • dihedral_cost_scale – Adjusts how much the dihedral cost is weighted in the cost function.

  • dihedral_cost_variant – Defines how the dihedral cost is calculated. Must be one of: “mean_square”, “mean_abs”, “mean_norm”

  • dihedral_cost_reference – Can be used to normalize the dihedral cost with the cost of some reference model (dummy)

  • cartesian_cost_scale – Adjusts how much the cartesian cost is weighted in the cost function.

  • cartesian_cost_scale_soft_start – Allows the cartesian cost to be turned on slowly. Must be a tuple (begin, end) or (None, None). If begin and end are given, cartesian_cost_scale is increased linearly in the given range.

  • cartesian_cost_variant – Defines how the cartesian cost is calculated. Must be one of: “mean_square”, “mean_abs”, “mean_norm”

  • cartesian_cost_reference – Can be used to normalize the cartesian cost with the cost of some reference model (dummy)

  • cartesian_dist_sig_parameters – Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l)

  • cartesian_distance_cost_scale – Adjusts how much the cartesian distance cost is weighted in the cost function.
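The C-alpha example given for cartesian_pwd_start and cartesian_pwd_step can be checked with a plain Python slice. Treating the three parameters as slice arguments is an assumption made for this illustration, and the atom list is made up. Whether cartesian_pwd_stop is inclusive is not stated above, so None is used here to keep all atoms to the end of the chain.

```python
# Central backbone atoms of a hypothetical four-residue chain.
atom_names = ["N", "CA", "C"] * 4

# Values from the description above; stop=None keeps everything to the end.
cartesian_pwd_start = 1
cartesian_pwd_stop = None
cartesian_pwd_step = 3

# Assumption: the parameters act like start:stop:step of a slice.
selected = atom_names[cartesian_pwd_start:cartesian_pwd_stop:cartesian_pwd_step]
print(selected)
```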

class Parameters[source]#

Bases: ParametersFramework

Variables:
  • main_path – Defines a main path where the parameters and other things might be stored.

  • n_neurons – List containing number of neurons for each layer up to the bottleneck layer. For example [128, 128, 2] stands for an autoencoder with the following architecture {i, 128, 128, 2, 128, 128, i} where i is the number of dimensions of the input data.

  • activation_functions – List of activation function names as implemented in TensorFlow. For example: “relu”, “tanh”, “sigmoid” or “” to use no activation function. The encoder part of the network takes the activation functions from the list starting with the second element. The decoder part of the network takes the activation functions in reversed order starting with the second element from the back. For example [“”, “relu”, “tanh”, “”] would result in an autoencoder with {“relu”, “tanh”, “”, “tanh”, “relu”, “”} as sequence of activation functions.

  • periodicity – Defines the distance between periodic walls for the inputs. For example 2pi for angular values in radians. All periodic data processed by EncoderMap must be wrapped to one periodic window. E.g. data with 2pi periodicity may contain values from -pi to pi or from 0 to 2pi. Set the periodicity to float(“inf”) for non-periodic inputs.

  • learning_rate – Learning rate used by the optimizer.

  • n_steps – Number of training steps.

  • batch_size – Number of training points used in each training step

  • summary_step – A summary for TensorBoard is written every summary_step steps.

  • checkpoint_step – A checkpoint is written every checkpoint_step steps.

  • dist_sig_parameters – Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l)

  • distance_cost_scale – Adjusts how much the distance based metric is weighted in the cost function.

  • auto_cost_scale – Adjusts how much the autoencoding cost is weighted in the cost function.

  • auto_cost_variant – Defines how the auto cost is calculated. Must be one of: “mean_square”, “mean_abs”, “mean_norm”

  • center_cost_scale – Adjusts how much the centering cost is weighted in the cost function.

  • l2_reg_constant – Adjusts how much the l2 regularisation is weighted in the cost function.

  • gpu_memory_fraction – Specifies the fraction of gpu memory blocked. If it is 0 memory is allocated as needed.

  • analysis_path – A path that can be used to store analysis

  • id – Can be any name for the run. Might be useful for example for specific analysis for different data sets.
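The n_neurons and activation_functions descriptions in the list above can be reproduced with a short sketch. The helper names layer_sizes and activation_sequence are hypothetical and not part of the package; they only expand the two lists in the way the text describes.

```python
def layer_sizes(n_inputs, n_neurons):
    """Expand n_neurons into the full symmetric architecture."""
    # Decoder mirrors the encoder, skipping the bottleneck itself.
    return [n_inputs] + list(n_neurons) + list(n_neurons[-2::-1]) + [n_inputs]


def activation_sequence(activation_functions):
    """Expand the activation list into one entry per layer transition."""
    # Encoder: from the second element onwards.
    encoder = list(activation_functions[1:])
    # Decoder: reversed order, starting with the second element from the back.
    decoder = list(activation_functions[-2::-1])
    return encoder + decoder


# The examples from the parameter descriptions, with i = 10 input dimensions.
print(layer_sizes(10, [128, 128, 2]))
print(activation_sequence(["", "relu", "tanh", ""]))
```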

class ParametersFramework[source]#

Bases: object

classmethod load(path)[source]#

Loads the parameters saved in a JSON file into a new Parameters object.

Parameters:

path – path of the json parameter file

Returns:

a Parameters object

save(path=None)[source]#

Saves the parameters in JSON format.

Parameters:

path – Path where parameters should be saved. If no path is given main_path/parameters.json is used.

Returns:

The path where the parameters were saved.
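The save and load round trip can be pictured with a small stand-in class. TinyParameters is hypothetical and is not the package's ParametersFramework; storing the instance attributes directly as a JSON object is an assumption made for this sketch.

```python
import json
import os
import tempfile


class TinyParameters:
    """Hypothetical stand-in that mimics the documented save/load pattern."""

    def __init__(self):
        self.main_path = tempfile.mkdtemp()
        self.n_steps = 1000
        self.learning_rate = 0.001

    def save(self, path=None):
        # Default location follows the description: main_path/parameters.json
        if path is None:
            path = os.path.join(self.main_path, "parameters.json")
        with open(path, "w") as f:
            json.dump(self.__dict__, f, indent=4)
        return path

    @classmethod
    def load(cls, path):
        with open(path, "r") as f:
            values = json.load(f)
        params = cls()
        params.__dict__.update(values)
        return params


params = TinyParameters()
params.n_steps = 250
saved_path = params.save()
loaded = TinyParameters.load(saved_path)
print(saved_path, loaded.n_steps)
```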

encodermap.encodermap_tf1.plot module#

class ManualPath(axe, n_points=200)[source]#

Bases: object

ManualPath is a tool to manually select a path in a matplotlib graph. It supports two modes: “interpolated line”, and “free draw”. Press “m” to switch modes.

In interpolated line mode, click in the graph to add a waypoint. Press “delete” to remove the last waypoint. Press “d” to remove all waypoints. Press “enter” once you have finished your path selection.

In free draw mode press and hold the left mouse button while you draw a path.

Once the path selection is completed, the use_points method is called with the points on the selected path. You can override the use_points method to do whatever you want with the points on the path.

use_points(points)[source]#

Override this method to use the selected points in any way you like.

For example:

>>> class MyManualPath(ManualPath):
...     def use_points(self, points):
...         print(points)

Parameters:

points – numpy array with points from the manual path selection

Returns:

None

class PathGenerateCartesians(axe, autoencoder, mol_data, save_path=None, n_points=200, vmd_path='', align_reference=None, align_select='all')[source]#

Bases: ManualPath

This class inherits from encodermap.plot.ManualPath. It is used to select paths in a 2d map and to generate conformations for these paths with an AngleDihedralCartesianEncoderMap.

use_points(points)[source]#

Override this method to use the selected points in any way you like.

For example:

>>> class MyManualPath(ManualPath):
...     def use_points(self, points):
...         print(points)

Parameters:

points – numpy array with points from the manual path selection

Returns:

None

class PathGenerateDihedrals(axe, autoencoder, pdb_path, save_path=None, n_points=200)[source]#

Bases: ManualPath

This class inherits from encodermap.plot.ManualPath. The points from a manually selected path are fed into the decoder part of a given autoencoder. The output of the autoencoder is used as phi and psi dihedral angles to reconstruct protein conformations based on the protein structure given with pdb_path. Three output files are written for each selected path: points.npy, generated.npy and generated.pdb. They contain the points on the selected path, the generated output of the autoencoder, and the generated protein conformations, respectively. Keep in mind that backbone dihedrals are not sufficient to describe a protein conformation completely. Usually the backbone is reconstructed well, but the side chains are not.

use_points(points)[source]#

Override this method to use the selected points in any way you like.

For example:

>>> class MyManualPath(ManualPath):
...     def use_points(self, points):
...         print(points)

Parameters:

points – numpy array with points from the manual path selection

Returns:

None

class PathSelect(axe, projected, mol_data, save_path, n_points=200, vmd_path='', align_reference=None, align_select='all')[source]#

Bases: ManualPath

This class inherits from encodermap.plot.ManualPath. It is used to select areas in a 2d map and to write all conformations in these areas to separate trajectories.

use_points(points)[source]#

Override this method to use the selected points in any way you like.

For example:

>>> class MyManualPath(ManualPath):
...     def use_points(self, points):
...         print(points)

Parameters:

points – numpy array with points from the manual path selection

Returns:

None

distance_histogram(data, periodicity, sigmoid_parameters, axes=None, low_d_max=5, bins='auto')[source]#

Plots the histogram of all pairwise distances in the data. It also shows the sigmoid function and its normalized derivative.

Parameters:
  • data – each row should contain a point in a number_of_columns dimensional space.

  • periodicity – Periodicity of the data. Use float(“inf”) for non-periodic data.

  • sigmoid_parameters – tuple (sigma, a, b)

  • axes – Array-like structure with two matplotlib axes objects, or None. If None, a new figure is generated.

  • low_d_max – upper limit for plotting the low_d sigmoid

  • bins – number of bins for histogram

Returns:

matplotlib axe objects

Module contents#