encodermap.encodermap_tf1 package#

Submodules#

encodermap.encodermap_tf1.angle_dihedral_cartesian_encodermap module#

class encodermap.encodermap_tf1.angle_dihedral_cartesian_encodermap.AngleDihedralCartesianEncoderMap(*args, **kwargs)[source]#

Bases: Autoencoder

This EncoderMap variant is designed specifically for protein conformations. During training, the cartesian conformations of the backbone chain are reconstructed from backbone angles and dihedrals. This allows a more sophisticated comparison of input and generated conformations and improves the accuracy of generated conformations, especially for large proteins. This is achieved with the cartesian_cost, which compares pairwise distances between atoms in cartesian coordinates in the input and generated conformations.
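The idea of comparing pairwise distances can be illustrated with a minimal NumPy sketch. The function names `pairwise_distances` and `cartesian_cost_sketch` are hypothetical and the mean absolute difference is only one possible variant; this is not the TensorFlow implementation used by the class.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the (n_atoms, n_atoms) matrix of Euclidean distances."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def cartesian_cost_sketch(input_coords, generated_coords):
    """Mean absolute difference between the two pairwise-distance matrices."""
    d_in = pairwise_distances(input_coords)
    d_gen = pairwise_distances(generated_coords)
    return np.abs(d_in - d_gen).mean()


# Three atoms on a line, spaced one unit apart.
reference = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
shifted = reference + np.array([5.0, -3.0, 2.0])  # rigid translation
stretched = reference * 2.0                       # all distances doubled

cost_shifted = cartesian_cost_sketch(reference, shifted)
cost_stretched = cartesian_cost_sketch(reference, stretched)
```

Because pairwise distances do not change under translation or rotation, the shifted copy gives a cost of essentially zero without any prior alignment, while the stretched copy is penalized.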

__init__(*args, **kwargs)[source]#
Parameters:
  • parameters – ADCParameters object as defined in encodermap.encodermap_tf1.parameters.ADCParameters

  • train_data – the training data as a MolData object

  • validation_data – not yet supported

  • checkpoint_path – If a checkpoint path is given, values like neural network weights stored in this checkpoint will be restored.

  • read_only – if True, no output is written

_angle_cost()[source]#
_cartesian_cost()[source]#
_cartesian_distance_cost()[source]#
_dihedral_cost()[source]#
_distance_cost()[source]#
_prepare_data()[source]#
_setup_cost()[source]#
_setup_network()[source]#
generate(latent, quantity=None)[source]#

Generates new high-dimensional points based on given low-dimensional points using the decoder part of the autoencoder.

Parameters:

latent – 2d numpy array containing points in the low-dimensional space. The number of columns must be equal to the number of neurons in the bottleneck layer of the autoencoder.

Returns:

2d numpy array containing points in the high-dimensional space.

class encodermap.encodermap_tf1.angle_dihedral_cartesian_encodermap.AngleDihedralCartesianEncoderMapDummy(*args, **kwargs)[source]#

Bases: AngleDihedralCartesianEncoderMap

_setup_network()[source]#

encodermap.encodermap_tf1.autoencoder module#

class encodermap.encodermap_tf1.autoencoder.Autoencoder(parameters, train_data=None, validation_data=None, checkpoint_path=None, n_inputs=None, read_only=False, seed=None, debug=False)[source]#

Bases: object

__init__(parameters, train_data=None, validation_data=None, checkpoint_path=None, n_inputs=None, read_only=False, seed=None, debug=False)[source]#
Parameters:
  • parameters – Parameters object as defined in encodermap.encodermap_tf1.parameters.Parameters

  • train_data – 2d numpy array where each row is treated as a training point

  • validation_data – A 2d numpy array. This data will only be used to calculate a validation error during training. It will not be used for training.

  • checkpoint_path – If a checkpoint path is given, values like neural network weights stored in this checkpoint will be restored.

  • n_inputs – If no train_data is given, for example when an already trained network is restored from a checkpoint, the number of inputs needs to be given. This should be equal to the number of columns of the train_data the network was originally trained with.

  • read_only – if True, no output is written

_auto_cost()[source]#
_center_cost()[source]#
_encode(inputs)[source]#
_generate(inputs)[source]#
_l2_reg_cost()[source]#
_prepare_data()[source]#
_random_batch(data, batch_size=None)[source]#
_setup_cost()[source]#
_setup_data_iterator()[source]#
_setup_network()[source]#
_step()[source]#
close()[source]#

Close the TensorFlow session to free resources.

encode(inputs)[source]#

Projects high dimensional data to a low dimensional space using the encoder part of the autoencoder.

Parameters:

inputs – 2d numpy array with the same number of columns as the used train_data

Returns:

2d numpy array with the points projected to the low dimensional space. The number of columns is equal to the number of neurons in the bottleneck layer of the autoencoder.

generate(latent)[source]#

Generates new high-dimensional points based on given low-dimensional points using the decoder part of the autoencoder.

Parameters:

latent – 2d numpy array containing points in the low-dimensional space. The number of columns must be equal to the number of neurons in the bottleneck layer of the autoencoder.

Returns:

2d numpy array containing points in the high-dimensional space.

profile()[source]#
train()[source]#

Train the autoencoder as specified in the parameters object.

encodermap.encodermap_tf1.backmapping module#

encodermap.encodermap_tf1.backmapping._expand_universe(universe, length)[source]#
encodermap.encodermap_tf1.backmapping._set_dihedral(dihedral, atoms, angle)[source]#
encodermap.encodermap_tf1.backmapping.chain_in_plane(lengths, angles)[source]#

Reconstructs cartesian coordinates from distances and angles.

encodermap.encodermap_tf1.backmapping.dihedral_backmapping(pdb_path, dihedral_trajectory, rough_n_points=-1)[source]#

Takes a pdb file with a peptide and creates a trajectory based on the dihedral angles given. It simply rotates around the dihedral angle axis. In the result, side chains might overlap, but the backbone should turn out quite well.

Parameters:
  • pdb_path – (str)

  • dihedral_trajectory – array-like of shape (traj_length, number_of_dihedrals)

  • rough_n_points – (int) A step size used to select a subset of values from dihedral_trajectory is calculated as max(1, int(len(dihedral_trajectory) / rough_n_points)). With rough_n_points = -1, all values are used.

Returns:

(MDAnalysis.Universe)
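The step-size formula given for rough_n_points can be checked with a few lines of plain Python. The helper name `step_size` is hypothetical, and selecting frames with a `[::step]` slice is an assumption about how the step is applied.

```python
def step_size(traj_length, rough_n_points):
    """Step size as stated in the parameter description."""
    return max(1, int(traj_length / rough_n_points))


dihedral_trajectory = list(range(1000))  # stand-in for 1000 frames

# rough_n_points = -1: int(1000 / -1) is negative, so max() yields 1.
step_all = step_size(len(dihedral_trajectory), -1)

# rough_n_points = 100: every 10th frame is used.
step_100 = step_size(len(dihedral_trajectory), 100)
subset_100 = dihedral_trajectory[::step_100]

# rough_n_points = 300: the step is truncated to 3, so the number of
# selected frames only roughly matches the request.
step_300 = step_size(len(dihedral_trajectory), 300)
subset_300 = dihedral_trajectory[::step_300]
```

With 1000 frames, a request for roughly 300 points yields 334 frames, which is why the parameter is called "rough".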

encodermap.encodermap_tf1.backmapping.dihedral_to_cartesian_tf_one_way(dihedrals, cartesian)[source]#
encodermap.encodermap_tf1.backmapping.dihedrals_to_cartesian_tf(dihedrals, cartesian)[source]#
encodermap.encodermap_tf1.backmapping.dihedrals_to_cartesian_tf_old(dihedrals, cartesian=None, central_atom_indices=None, no_omega=False)[source]#
encodermap.encodermap_tf1.backmapping.guess_amide_H(cartesians, atom_names)[source]#
encodermap.encodermap_tf1.backmapping.guess_amide_O(cartesians, atom_names)[source]#
encodermap.encodermap_tf1.backmapping.guess_sp2_atom(cartesians, atom_names, bond_partner, angle_to_previous, bond_length)[source]#
encodermap.encodermap_tf1.backmapping.merge_cartesians(central_cartesians, central_atom_names, H_cartesians, O_cartesians)[source]#
encodermap.encodermap_tf1.backmapping.straight_tetrahedral_chain(n_atoms=None, bond_lengths=None)[source]#

encodermap.encodermap_tf1.encodermap module#

class encodermap.encodermap_tf1.encodermap.EncoderMap(parameters, train_data=None, validation_data=None, checkpoint_path=None, n_inputs=None, read_only=False, seed=None, debug=False)[source]#

Bases: Autoencoder

_distance_cost()[source]#
_setup_cost()[source]#

encodermap.encodermap_tf1.misc module#

encodermap.encodermap_tf1.misc.add_layer_summaries(layer, debug=False)[source]#
Parameters:

layer

Returns:

encodermap.encodermap_tf1.misc.create_dir(path)[source]#
Parameters:

path

Returns:

encodermap.encodermap_tf1.misc.distance_cost(r_h, r_l, sig_h, a_h, b_h, sig_l, a_l, b_l, periodicity)[source]#
Parameters:
  • r_h

  • r_l

  • sig_h

  • a_h

  • b_h

  • sig_l

  • a_l

  • b_l

  • periodicity

Returns:

encodermap.encodermap_tf1.misc.pairwise_dist(positions, squared=False, flat=False)[source]#
encodermap.encodermap_tf1.misc.pairwise_dist_periodic(positions, periodicity)[source]#
encodermap.encodermap_tf1.misc.periodic_distance(a, b, periodicity=6.283185307179586)[source]#
Parameters:
  • a

  • b

  • periodicity

Returns:

encodermap.encodermap_tf1.misc.periodic_distance_np(a, b, periodicity=6.283185307179586)[source]#
Parameters:
  • a

  • b

  • periodicity

Returns:
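As an illustration of what a periodic distance means, the following pure-Python sketch computes the shortest distance between two scalar values in a periodic space. The name `periodic_distance_sketch` and the modulo step are assumptions; the library functions operate on tensors or arrays and may differ in detail.

```python
import math


def periodic_distance_sketch(a, b, periodicity=2 * math.pi):
    """Shortest distance between a and b when values wrap at `periodicity`."""
    d = abs(b - a) % periodicity
    # Going "the other way around" may be shorter than the direct route.
    return min(d, periodicity - d)


# -3.0 and 3.0 rad are far apart on the real line but close on a circle.
wrapped = periodic_distance_sketch(-3.0, 3.0)

# With infinite periodicity the ordinary absolute difference is recovered.
plain = periodic_distance_sketch(-3.0, 3.0, periodicity=float("inf"))
```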

encodermap.encodermap_tf1.misc.potential_energy(angles, dihedrals, distances)[source]#
encodermap.encodermap_tf1.misc.random_on_cube_edges(n_points, sigma=0)[source]#
encodermap.encodermap_tf1.misc.read_from_log(run_path, names)[source]#
encodermap.encodermap_tf1.misc.rotation_matrix(axis_unit_vec, angle)[source]#
encodermap.encodermap_tf1.misc.run_path(path)[source]#

Creates a directory at “path/run{i}”, where i is the smallest index for which the directory does not yet exist.

Parameters:

path – (str)

Returns:

(str) path of the created folder
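The behaviour described above can be sketched with the standard library. The function name `next_run_dir` is hypothetical, and starting the counter at 0 is an assumption.

```python
import os
import tempfile


def next_run_dir(path):
    """Create and return path/run{i} for the smallest i that does not exist yet."""
    i = 0
    while True:
        candidate = os.path.join(path, "run{}".format(i))
        if not os.path.exists(candidate):
            os.makedirs(candidate)
            return candidate
        i += 1


base = tempfile.mkdtemp()    # temporary directory for the demonstration
first = next_run_dir(base)   # creates .../run0
second = next_run_dir(base)  # run0 exists now, so .../run1 is created
```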

encodermap.encodermap_tf1.misc.search_and_replace(file_path, search_pattern, replacement, out_path=None, backup=True)[source]#

Searches for a pattern in a text file and replaces it with the replacement

Parameters:
  • file_path – (str) path to the file to search

  • search_pattern – (str) pattern to search for

  • replacement – (str) string that replaces the search_pattern in the output file

  • out_path – (str) path where to write the output file. If no path is given the original file will be replaced.

  • backup – (bool) if backup is true the original file is renamed to filename.bak before it is overwritten

Returns:

encodermap.encodermap_tf1.misc.sigmoid(r, sig, a, b)[source]#
Parameters:
  • r

  • sig

  • a

  • b

Returns:
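The parameters (sig, a, b) match the sigmoid known from sketch-map. Assuming that functional form (an assumption, since it is not stated on this page), a scalar sketch looks as follows; `sigmoid_sketch` is a hypothetical name.

```python
def sigmoid_sketch(r, sig, a, b):
    """Sketch-map style sigmoid: 0 at r = 0, 0.5 at r = sig, approaching 1 for large r."""
    return 1 - (1 + (2 ** (a / b) - 1) * (r / sig) ** a) ** (-b / a)


# Example parameters chosen for illustration only.
at_zero = sigmoid_sketch(0.0, sig=4.5, a=12, b=6)
at_sigma = sigmoid_sketch(4.5, sig=4.5, a=12, b=6)
far_away = sigmoid_sketch(45.0, sig=4.5, a=12, b=6)
```

In this form, sig sets the distance at which the function reaches one half, while a and b control how steeply it rises below and above that distance.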

encodermap.encodermap_tf1.misc.variable_summaries(name, variables, debug=False)[source]#

Attach several summaries to a Tensor for TensorBoard visualization.

Parameters:
  • name

  • variables

Returns:

encodermap.encodermap_tf1.moldata module#

class encodermap.encodermap_tf1.moldata.Angles(atomgroups, **kwargs)[source]#

Bases: AnalysisBase

_conclude()[source]#

Finalize the results you’ve gathered.

Called at the end of the run() method to finish everything up.

_prepare()[source]#

Set things up before the analysis loop begins

_single_frame()[source]#

Calculate data from a single frame of trajectory

Don’t worry about normalising, just deal with a single frame.

class encodermap.encodermap_tf1.moldata.MolData(atom_group, cache_path='', start=None, stop=None, step=None)[source]#

Bases: object

MolData is designed to extract and hold conformational information from trajectories.

Variables:
  • cartesians – numpy array of the trajectory atom coordinates

  • central_cartesians – cartesian coordinates of the central backbone atoms (N-CA-C-N-CA-C…)

  • dihedrals – all backbone dihedrals (phi, psi, omega)

  • angles – all bond angles of the central backbone atoms

  • lengths – all bond lengths between neighbouring central atoms

  • sidedihedrals – all sidechain dihedrals

  • aminoaciddict – number of sidechain dihedrals

__init__(atom_group, cache_path='', start=None, stop=None, step=None)[source]#
Parameters:
  • atom_group – MDAnalysis atom group

  • cache_path – Path where the calculated variables can be cached.

  • start – first frame to analyze

  • stop – last frame to analyze

  • step – step size between analyzed frames

static sort_key(atom)[source]#
write(path, coordinates, name='generated', formats=('pdb', 'xtc'), only_central=False, align_reference=None, align_select='all')[source]#

Writes a trajectory for the given coordinates.

Parameters:
  • path – directory where to save the trajectory

  • coordinates – numpy array of xyz coordinates (frames, atoms, xyz)

  • name – filename (without extension)

  • formats – specify which formats should be used to write structure and trajectory. default: (“pdb”, “xtc”)

  • only_central – if True only central atom coordinates are expected (N-Ca-C…)

  • align_reference – Allows aligning the generated conformations to some reference. The reference should be given as an MDAnalysis atomgroup.

  • align_select – Allows to select which atoms should be used for the alignment. e.g. “resid 5:60” default is “all”. Have a look at the MDAnalysis selection syntax for more details.

Returns:

class encodermap.encodermap_tf1.moldata.Positions(atomgroup, **kwargs)[source]#

Bases: AnalysisBase

_conclude()[source]#

Finalize the results you’ve gathered.

Called at the end of the run() method to finish everything up.

_prepare()[source]#

Set things up before the analysis loop begins

_single_frame()[source]#

Calculate data from a single frame of trajectory

Don’t worry about normalising, just deal with a single frame.

encodermap.encodermap_tf1.parameters module#

class encodermap.encodermap_tf1.parameters.ADCParameters[source]#

Bases: Parameters

This is the parameter object for the AngleDihedralCartesianEncoderMap. It holds all the parameters that the Parameters object includes, plus the following parameters:

Variables:
  • cartesian_pwd_start – index of the first atom to use for the pairwise distance calculation

  • cartesian_pwd_stop – index of the last atom to use for the pairwise distance calculation

  • cartesian_pwd_step – step for the calculation of pairwise distances. E.g. for a chain of atoms N-C_a-C-N-C_a-C… cartesian_pwd_start=1 and cartesian_pwd_step=3 will result in using all C-alpha atoms for the pairwise distance calculation.

  • use_backbone_angles – Defines whether backbone bond angles should be learned (True) or whether mean values should be used instead to generate conformations (False)

  • angle_cost_scale – Adjusts how much the angle cost is weighted in the cost function.

  • angle_cost_variant – Defines how the angle cost is calculated. Must be one of: “mean_square”, “mean_abs”, “mean_norm”

  • angle_cost_reference – Can be used to normalize the angle cost with the cost of some reference model (dummy)

  • dihedral_cost_scale – Adjusts how much the dihedral cost is weighted in the cost function.

  • dihedral_cost_variant – Defines how the dihedral cost is calculated. Must be one of: “mean_square”, “mean_abs”, “mean_norm”

  • dihedral_cost_reference – Can be used to normalize the dihedral cost with the cost of some reference model (dummy)

  • cartesian_cost_scale – Adjusts how much the cartesian cost is weighted in the cost function.

  • cartesian_cost_scale_soft_start – Allows the cartesian cost to be turned on slowly. Must be a tuple (begin, end) or (None, None). If begin and end are given, cartesian_cost_scale is increased linearly in the given range.

  • cartesian_cost_variant – Defines how the cartesian cost is calculated. Must be one of: “mean_square”, “mean_abs”, “mean_norm”

  • cartesian_cost_reference – Can be used to normalize the cartesian cost with the cost of some reference model (dummy)

  • cartesian_dist_sig_parameters – Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l)

  • cartesian_distance_cost_scale – Adjusts how much the cartesian distance cost is weighted in the cost function.
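The C-alpha example given for cartesian_pwd_start and cartesian_pwd_step can be reproduced with an ordinary Python slice. Treating the three parameters as a `start:stop:step` slice over the central atoms is an assumption made for illustration, and `atom_names` is a made-up backbone of four residues.

```python
# Central backbone atoms of four residues: N-CA-C-N-CA-C-...
atom_names = ["N", "CA", "C"] * 4

cartesian_pwd_start = 1    # index of the first C-alpha atom
cartesian_pwd_stop = None  # no upper limit in this sketch
cartesian_pwd_step = 3     # three backbone atoms per residue

selected = atom_names[cartesian_pwd_start:cartesian_pwd_stop:cartesian_pwd_step]
selected_indices = list(range(len(atom_names)))[
    cartesian_pwd_start:cartesian_pwd_stop:cartesian_pwd_step
]
```

Under this assumption only the C-alpha atoms (indices 1, 4, 7, 10) enter the pairwise distance calculation, which reduces the number of distances considerably for large proteins.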

class encodermap.encodermap_tf1.parameters.Parameters[source]#

Bases: ParametersFramework

Variables:
  • main_path – Defines a main path where the parameters and other things might be stored.

  • n_neurons – List containing number of neurons for each layer up to the bottleneck layer. For example [128, 128, 2] stands for an autoencoder with the following architecture {i, 128, 128, 2, 128, 128, i} where i is the number of dimensions of the input data.

  • activation_functions – List of activation function names as implemented in TensorFlow. For example: “relu”, “tanh”, “sigmoid” or “” to use no activation function. The encoder part of the network takes the activation functions from the list starting with the second element. The decoder part of the network takes the activation functions in reversed order starting with the second element from the back. For example [“”, “relu”, “tanh”, “”] would result in an autoencoder with {“relu”, “tanh”, “”, “tanh”, “relu”, “”} as sequence of activation functions.

  • periodicity – Defines the distance between periodic walls for the inputs. For example 2pi for angular values in radians. All periodic data processed by EncoderMap must be wrapped to one periodic window. E.g. data with 2pi periodicity may contain values from -pi to pi or from 0 to 2pi. Set the periodicity to float(“inf”) for non-periodic inputs.

  • learning_rate – Learning rate used by the optimizer.

  • n_steps – Number of training steps.

  • batch_size – Number of training points used in each training step

  • summary_step – A summary for TensorBoard is written every summary_step steps.

  • checkpoint_step – A checkpoint is written every checkpoint_step steps.

  • dist_sig_parameters – Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l)

  • distance_cost_scale – Adjusts how much the distance based metric is weighted in the cost function.

  • auto_cost_scale – Adjusts how much the autoencoding cost is weighted in the cost function.

  • auto_cost_variant – defines how the auto cost is calculated. Must be one of: “mean_square”, “mean_abs”, “mean_norm”

  • center_cost_scale – Adjusts how much the centering cost is weighted in the cost function.

  • l2_reg_constant – Adjusts how much the l2 regularisation is weighted in the cost function.

  • gpu_memory_fraction – Specifies the fraction of GPU memory blocked. If it is 0, memory is allocated as needed.

  • analysis_path – A path that can be used to store analysis

  • id – Can be any name for the run. Might be useful for example for specific analysis for different data sets.
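The way n_neurons and activation_functions expand into the full autoencoder can be traced with a short sketch. The helper names `layer_sizes` and `activation_sequence` are hypothetical; they only reproduce the two examples given in the descriptions and are not the network setup code.

```python
def layer_sizes(n_neurons, n_inputs):
    """Mirror the encoder layers around the bottleneck to obtain all layer sizes."""
    return [n_inputs] + n_neurons + n_neurons[-2::-1] + [n_inputs]


def activation_sequence(activation_functions):
    """Encoder uses the list from the second element on; the decoder uses
    the reversed list, again from its second element on."""
    encoder = activation_functions[1:]
    decoder = activation_functions[::-1][1:]
    return encoder + decoder


sizes = layer_sizes([128, 128, 2], n_inputs=10)
activations = activation_sequence(["", "relu", "tanh", ""])
```

For 10 input dimensions, `sizes` is `[10, 128, 128, 2, 128, 128, 10]`, and `activations` is `["relu", "tanh", "", "tanh", "relu", ""]`, matching the sequences stated above.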

class encodermap.encodermap_tf1.parameters.ParametersFramework[source]#

Bases: object

_setattrs(dictionary)[source]#
classmethod load(path)[source]#

Loads the parameters saved in a json file into a new Parameter object.

Parameters:

path – path of the json parameter file

Returns:

a Parameter object

save(path=None)[source]#

Save parameters in json format

Parameters:

path – Path where parameters should be saved. If no path is given main_path/parameters.json is used.

Returns:

The path where the parameters were saved.
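The save and load round trip can be pictured with a small stand-in class. `ParametersSketch` is hypothetical and only mimics the documented behaviour (JSON output, default location main_path/parameters.json); the attribute handling of the real ParametersFramework may differ.

```python
import json
import os
import tempfile


class ParametersSketch:
    """Minimal stand-in that stores its attributes in a JSON file."""

    def __init__(self):
        self.main_path = os.getcwd()
        self.n_steps = 1000
        self.learning_rate = 0.001

    def save(self, path=None):
        # Fall back to main_path/parameters.json when no path is given.
        if path is None:
            path = os.path.join(self.main_path, "parameters.json")
        with open(path, "w") as f:
            json.dump(self.__dict__, f, indent=4, sort_keys=True)
        return path

    @classmethod
    def load(cls, path):
        with open(path) as f:
            values = json.load(f)
        params = cls()
        for key, value in values.items():
            setattr(params, key, value)
        return params


params = ParametersSketch()
params.main_path = tempfile.mkdtemp()  # keep the demo out of the working directory
params.n_steps = 5000
saved_path = params.save()
restored = ParametersSketch.load(saved_path)
```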

encodermap.encodermap_tf1.plot module#

class encodermap.encodermap_tf1.plot.ManualPath(axe, n_points=200)[source]#

Bases: object

ManualPath is a tool to manually select a path in a matplotlib graph. It supports two modes: “interpolated line”, and “free draw”. Press “m” to switch modes.

In interpolated line mode click in the graph to add an additional way point. Press “delete” to remove the last way point. Press “d” to remove all way points. Press “enter” once you have finished your path selection.

In free draw mode press and hold the left mouse button while you draw a path.

Once the path selection is completed, the use_points method is called with the points on the selected path. You can override the use_points method to do whatever you want with the points on the path.

__init__(axe, n_points=200)[source]#
Parameters:
  • axe – matplotlib axe object for example from: fig, axe = plt.subplots()

  • n_points – Number of points distributed on the selected path.
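How n_points could be distributed along a path defined by a few way points is sketched below with NumPy. The function `distribute_points` is hypothetical; it places points evenly by arc length using linear interpolation, which is an assumption and not necessarily the interpolation scheme ManualPath uses.

```python
import numpy as np


def distribute_points(way_points, n_points=200):
    """Place n_points evenly (by arc length) along the polyline through way_points."""
    way_points = np.asarray(way_points, dtype=float)
    segment_lengths = np.linalg.norm(np.diff(way_points, axis=0), axis=1)
    arc_length = np.concatenate([[0.0], np.cumsum(segment_lengths)])
    targets = np.linspace(0.0, arc_length[-1], n_points)
    x = np.interp(targets, arc_length, way_points[:, 0])
    y = np.interp(targets, arc_length, way_points[:, 1])
    return np.column_stack([x, y])


# An L-shaped path with three way points, sampled at five positions.
points = distribute_points([(0, 0), (1, 0), (1, 1)], n_points=5)
```

The resulting array has one row per point and two columns, which is the kind of 2d input that can be passed on to a decoder with a two-neuron bottleneck.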

_add_point_interp(event)[source]#
_free_draw(event)[source]#
_free_draw_callback(verts)[source]#
_grab_background(event=None)[source]#

When the figure is resized, hide the points, draw everything, and update the background.

_interpolate(x, y)[source]#
_on_click(event)[source]#
_on_key(event)[source]#
_reset_lines()[source]#
_update_interp()[source]#
_update_lines()[source]#

Efficiently update the figure, without needing to redraw the “background” artists.

use_points(points)[source]#

Override this method to use the selected points in any way you like.

For example:

>>> class MyManualPath(ManualPath):
...     def use_points(self, points):
...         print(points)
Parameters:

points – numpy array with points from the manual path selection

Returns:

None

class encodermap.encodermap_tf1.plot.PathGenerateCartesians(axe, autoencoder, mol_data, save_path=None, n_points=200, vmd_path='', align_reference=None, align_select='all')[source]#

Bases: ManualPath

This class inherits from encodermap.plot.ManualPath. It is used to select paths in a 2d map and to generate conformations for these paths with an AngleDihedralCartesianEncoderMap.

__init__(axe, autoencoder, mol_data, save_path=None, n_points=200, vmd_path='', align_reference=None, align_select='all')[source]#
Parameters:
  • axe – matplotlib axe object for example from: fig, axe = plt.subplots()

  • autoencoder – AngleDihedralCartesianEncoderMap

  • mol_data – MolData

  • save_path – Path where outputs should be written

  • n_points – Number of points distributed on the selected path.

  • vmd_path – If a path to vmd is given, the generated conformations will be directly opened in vmd.

  • align_reference – Allows aligning the generated conformations to some reference. The reference should be given as an MDAnalysis atomgroup.

  • align_select – Allows to select which atoms should be used for the alignment. e.g. “resid 5:60” default is “all”. Have a look at the MDAnalysis selection syntax for more details.

use_points(points)[source]#

Override this method to use the selected points in any way you like.

For example:

>>> class MyManualPath(ManualPath):
...     def use_points(self, points):
...         print(points)
Parameters:

points – numpy array with points from the manual path selection

Returns:

None

class encodermap.encodermap_tf1.plot.PathGenerateDihedrals(axe, autoencoder, pdb_path, save_path=None, n_points=200)[source]#

Bases: ManualPath

This class inherits from encodermap.plot.ManualPath. The points from a manually selected path are fed into the decoder part of a given autoencoder. The output of the autoencoder is used as phi psi dihedral angles to reconstruct protein conformations based on the protein structure given with pdb_path. Three output files are written for each selected path: points.npy, generated.npy and generated.pdb which contain: the points on the selected path, the generated output of the autoencoder, and the generated protein conformations respectively. Keep in mind that backbone dihedrals are not sufficient to describe a protein conformation completely. Usually the backbone is reconstructed well but all side chains are messed up.

__init__(axe, autoencoder, pdb_path, save_path=None, n_points=200)[source]#
Parameters:
  • axe – matplotlib axe object for example from: fig, axe = plt.subplots()

  • autoencoder – encodermap.autoencoder.Autoencoder that was trained on protein dihedral angles. The dihedrals have to be ordered starting from the amino end: first all phi angles, then all psi angles.

  • pdb_path – Path to a protein data bank (pdb) file of the protein

  • save_path – Path where outputs should be written

  • n_points – Number of points distributed on the selected path.

use_points(points)[source]#

Override this method to use the selected points in any way you like.

For example:

>>> class MyManualPath(ManualPath):
...     def use_points(self, points):
...         print(points)
Parameters:

points – numpy array with points from the manual path selection

Returns:

None

class encodermap.encodermap_tf1.plot.PathSelect(axe, projected, mol_data, save_path, n_points=200, vmd_path='', align_reference=None, align_select='all')[source]#

Bases: ManualPath

This class inherits from encodermap.plot.ManualPath. It is used to select areas in a 2d map and to write all conformations in these areas to separate trajectories.

__init__(axe, projected, mol_data, save_path, n_points=200, vmd_path='', align_reference=None, align_select='all')[source]#
Parameters:
  • axe – matplotlib axe object for example from: fig, axe = plt.subplots()

  • projected – points in the map (must be the same number of points as conformations in mol_data)

  • mol_data – MolData

  • save_path – Path where outputs should be written

  • n_points – Number of points distributed on the selected path.

  • vmd_path – If a path to vmd is given, the generated conformations will be directly opened in vmd.

  • align_reference – Allows aligning the generated conformations to some reference. The reference should be given as an MDAnalysis atomgroup.

  • align_select – Allows to select which atoms should be used for the alignment. e.g. “resid 5:60” default is “all”. Have a look at the MDAnalysis selection syntax for more details.

use_points(points)[source]#

Override this method to use the selected points in any way you like.

For example:

>>> class MyManualPath(ManualPath):
...     def use_points(self, points):
...         print(points)
Parameters:

points – numpy array with points from the manual path selection

Returns:

None

encodermap.encodermap_tf1.plot.distance_histogram(data, periodicity, sigmoid_parameters, axes=None, low_d_max=5, bins='auto')[source]#

Plots the histogram of all pairwise distances in the data. It also shows the sigmoid function and its normalized derivative.

Parameters:
  • data – each row should contain a point in a number_of_columns-dimensional space.

  • periodicity – Periodicity of the data. use float(“inf”) for non periodic data

  • sigmoid_parameters – tuple (sigma, a, b)

  • axes – Array-like structure with two matplotlib axe objects or None. If None, a new figure is generated.

  • low_d_max – upper limit for plotting the low_d sigmoid

  • bins – number of bins for histogram

Returns:

matplotlib axe objects

Module contents#