encodermap.encodermap_tf1 package#
Submodules#
encodermap.encodermap_tf1.angle_dihedral_cartesian_encodermap module#
- class encodermap.encodermap_tf1.angle_dihedral_cartesian_encodermap.AngleDihedralCartesianEncoderMap(*args, **kwargs)[source]#
Bases:
Autoencoder
This EncoderMap variant is specially designed for protein conformations. During the training, the cartesian conformations of the backbone chain are reconstructed from backbone angles and dihedrals. This allows for a more sophisticated comparison of input conformations and generated conformations and improves the accuracy of generated conformations especially for large proteins. We achieve this with the cartesian_cost where we compare pairwise distances between atoms in cartesian coordinates in the input and generated conformations.
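The idea of comparing pairwise distances between input and generated conformations can be illustrated with a small standalone sketch. It is not the library's implementation: the helper names `pairwise_distances` and `pairwise_distance_mismatch` are hypothetical, and the mean absolute difference is only one plausible way to compare the two distance sets.

```python
import math
from itertools import combinations


def pairwise_distances(conformation):
    """Return all pairwise atom distances of one conformation.

    conformation: sequence of (x, y, z) tuples, one per atom.
    """
    return [math.dist(p, q) for p, q in combinations(conformation, 2)]


def pairwise_distance_mismatch(input_conf, generated_conf):
    """Mean absolute difference between the pairwise distances of two conformations."""
    d_in = pairwise_distances(input_conf)
    d_gen = pairwise_distances(generated_conf)
    return sum(abs(a - b) for a, b in zip(d_in, d_gen)) / len(d_in)


# Three atoms on a line, a translated copy, and a copy with one stretched bond.
input_conf = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
shifted = [(x + 5.0, y, z) for x, y, z in input_conf]
stretched = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]

# A pure translation leaves all pairwise distances unchanged.
print(pairwise_distance_mismatch(input_conf, shifted))
# Stretching the last bond changes two of the three distances.
print(pairwise_distance_mismatch(input_conf, stretched))
```

Because only distances enter the comparison, the measure does not depend on how a conformation is translated or rotated in space.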
- __init__(*args, **kwargs)[source]#
- Parameters:
parameters – ADCParameters object as defined in
encodermap.encodermap_tf1.parameters.ADCParameters
train_data – the training data as a
MolData
object
validation_data – not yet supported
checkpoint_path – If a checkpoint path is given, values like neural network weights stored in this checkpoint will be restored.
read_only – if True, no output is written
- generate(latent, quantity=None)[source]#
Generates new high-dimensional points based on given low-dimensional points using the decoder part of the autoencoder.
- Parameters:
latent – 2d numpy array containing points in the low-dimensional space. The number of columns must be equal to the number of neurons in the bottleneck layer of the autoencoder.
- Returns:
2d numpy array containing points in the high-dimensional space.
encodermap.encodermap_tf1.autoencoder module#
- class encodermap.encodermap_tf1.autoencoder.Autoencoder(parameters, train_data=None, validation_data=None, checkpoint_path=None, n_inputs=None, read_only=False, seed=None, debug=False)[source]#
Bases:
object
- __init__(parameters, train_data=None, validation_data=None, checkpoint_path=None, n_inputs=None, read_only=False, seed=None, debug=False)[source]#
- Parameters:
parameters – Parameters object as defined in
encodermap.encodermap_tf1.parameters.Parameters
train_data – 2d numpy array where each row is treated as a training point
validation_data – A 2d numpy array. This data will only be used to calculate a validation error during training. It will not be used for training.
checkpoint_path – If a checkpoint path is given, values like neural network weights stored in this checkpoint will be restored.
n_inputs – If no train_data is given, for example when an already trained network is restored from a checkpoint, the number of inputs needs to be given. This should be equal to the number of columns of the train_data the network was originally trained with.
read_only – if True, no output is written
- encode(inputs)[source]#
Projects high dimensional data to a low dimensional space using the encoder part of the autoencoder.
- Parameters:
inputs – 2d numpy array with the same number of columns as the used train_data
- Returns:
2d numpy array with the points projected to the low-dimensional space. The number of columns is equal to the number of neurons in the bottleneck layer of the autoencoder.
- generate(latent)[source]#
Generates new high-dimensional points based on given low-dimensional points using the decoder part of the autoencoder.
- Parameters:
latent – 2d numpy array containing points in the low-dimensional space. The number of columns must be equal to the number of neurons in the bottleneck layer of the autoencoder.
- Returns:
2d numpy array containing points in the high-dimensional space.
encodermap.encodermap_tf1.backmapping module#
- encodermap.encodermap_tf1.backmapping.chain_in_plane(lengths, angles)[source]#
Reconstructs cartesian coordinates from distances and angles.
- encodermap.encodermap_tf1.backmapping.dihedral_backmapping(pdb_path, dihedral_trajectory, rough_n_points=-1)[source]#
Takes a pdb file with a peptide and creates a trajectory based on the dihedral angles given. It simply rotates around the dihedral angle axis. In the result side-chains might overlap but the backbone should turn out quite well.
- Parameters:
pdb_path – (str)
dihedral_trajectory – array-like of shape (traj_length, number_of_dihedrals)
rough_n_points – (int) A step size used to select a subset of values from dihedral_trajectory is calculated as max(1, int(len(dihedral_trajectory) / rough_n_points)). With rough_n_points = -1, all values are used.
- Returns:
(MDAnalysis.Universe)
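The step-size formula given for rough_n_points can be checked with a few lines of plain Python. The helper name `subsample_step` is hypothetical, and selecting the subset by slicing with `[::step]` is an assumption about how the step size is applied.

```python
def subsample_step(n_frames, rough_n_points):
    """Step size as given in the rough_n_points description."""
    return max(1, int(n_frames / rough_n_points))


# Placeholder for a trajectory with 1000 frames.
dihedral_trajectory = list(range(1000))

# rough_n_points = -1 gives a negative quotient, so max() falls back to a step of 1.
step_all = subsample_step(len(dihedral_trajectory), -1)

# Asking for roughly 100 points on 1000 frames gives a step of 10.
step_100 = subsample_step(len(dihedral_trajectory), 100)
subset = dihedral_trajectory[::step_100]  # assumed way of applying the step

print(step_all, step_100, len(subset))
```

The number of selected frames is only approximately rough_n_points, because the step is truncated to an integer.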
- encodermap.encodermap_tf1.backmapping.dihedral_to_cartesian_tf_one_way(dihedrals, cartesian)[source]#
- encodermap.encodermap_tf1.backmapping.dihedrals_to_cartesian_tf_old(dihedrals, cartesian=None, central_atom_indices=None, no_omega=False)[source]#
- encodermap.encodermap_tf1.backmapping.guess_sp2_atom(cartesians, atom_names, bond_partner, angle_to_previous, bond_length)[source]#
encodermap.encodermap_tf1.encodermap module#
encodermap.encodermap_tf1.misc module#
- encodermap.encodermap_tf1.misc.add_layer_summaries(layer, debug=False)[source]#
- Parameters:
layer –
- Returns:
- encodermap.encodermap_tf1.misc.distance_cost(r_h, r_l, sig_h, a_h, b_h, sig_l, a_l, b_l, periodicity)[source]#
- Parameters:
r_h –
r_l –
sig_h –
a_h –
b_h –
sig_l –
a_l –
b_l –
periodicity –
- Returns:
- encodermap.encodermap_tf1.misc.periodic_distance(a, b, periodicity=6.283185307179586)[source]#
- Parameters:
a –
b –
periodicity –
- Returns:
- encodermap.encodermap_tf1.misc.periodic_distance_np(a, b, periodicity=6.283185307179586)[source]#
- Parameters:
a –
b –
periodicity –
- Returns:
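The parameters of the two periodic distance functions are not described here. A minimal scalar sketch, assuming the usual shortest-distance convention for values that are already wrapped into one periodic window, could look as follows. The function name `periodic_distance_sketch` is hypothetical, and the library versions operate on tensors or arrays rather than single floats.

```python
import math


def periodic_distance_sketch(a, b, periodicity=2 * math.pi):
    """Shortest distance between a and b in a periodic space.

    Assumes a and b lie in the same periodic window, so that the
    direct distance never exceeds the periodicity.
    """
    d = abs(b - a)
    # Either go directly from a to b, or the other way across the periodic wall.
    return min(d, periodicity - d)


# Two angles close to -pi and +pi are near each other across the boundary.
print(periodic_distance_sketch(-3.0, 3.0))
# With infinite periodicity the plain absolute difference is returned.
print(periodic_distance_sketch(1.0, 4.0, float("inf")))
```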
- encodermap.encodermap_tf1.misc.run_path(path)[source]#
Creates a directory at “path/run{i}”, where i is the smallest integer for which this directory does not yet exist.
- Parameters:
path – (str)
- Returns:
(str) path of the created folder
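The described behaviour can be re-implemented in a few lines with the standard library. This is a sketch, not the library code: the name `run_path_sketch` is hypothetical, and starting the counter at 0 is an assumption.

```python
import os
import tempfile


def run_path_sketch(path):
    """Create and return the first not yet existing directory path/run{i}."""
    i = 0  # assumption: numbering starts at 0
    while True:
        candidate = os.path.join(path, "run{}".format(i))
        if not os.path.exists(candidate):
            os.makedirs(candidate)
            return candidate
        i += 1


# Each call inside the same base directory yields a new run folder.
base = tempfile.mkdtemp()
first = run_path_sketch(base)
second = run_path_sketch(base)
print(os.path.basename(first), os.path.basename(second))
```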
- encodermap.encodermap_tf1.misc.search_and_replace(file_path, search_pattern, replacement, out_path=None, backup=True)[source]#
Searches for a pattern in a text file and replaces it with the replacement
- Parameters:
file_path – (str) path to the file to search
search_pattern – (str) pattern to search for
replacement – (str) string that replaces the search_pattern in the output file
out_path – (str) path where to write the output file. If no path is given the original file will be replaced.
backup – (bool) if backup is true the original file is renamed to filename.bak before it is overwritten
- Returns:
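A standalone sketch of the described behaviour is given below. It rests on two assumptions that the description does not settle: the search pattern is treated as a literal string rather than a regular expression, and the backup is only made when the original file is replaced. The name `search_and_replace_sketch` is hypothetical.

```python
import os
import tempfile


def search_and_replace_sketch(file_path, search_pattern, replacement,
                              out_path=None, backup=True):
    """Replace search_pattern in a text file and write the result."""
    with open(file_path) as f:
        text = f.read()
    # Assumption: literal string replacement, no regular expressions.
    text = text.replace(search_pattern, replacement)
    if out_path is None:
        out_path = file_path
        if backup:
            # Keep the original content as filename.bak before overwriting.
            os.rename(file_path, file_path + ".bak")
    with open(out_path, "w") as f:
        f.write(text)


# Demonstration on a temporary file.
workdir = tempfile.mkdtemp()
demo_file = os.path.join(workdir, "parameters.txt")
with open(demo_file, "w") as f:
    f.write("n_steps = 100\n")

search_and_replace_sketch(demo_file, "100", "200")

with open(demo_file) as f:
    new_content = f.read()
with open(demo_file + ".bak") as f:
    old_content = f.read()
print(new_content.strip(), "|", old_content.strip())
```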
encodermap.encodermap_tf1.moldata module#
- class encodermap.encodermap_tf1.moldata.Angles(atomgroups, **kwargs)[source]#
Bases:
AnalysisBase
- class encodermap.encodermap_tf1.moldata.MolData(atom_group, cache_path='', start=None, stop=None, step=None)[source]#
Bases:
object
MolData is designed to extract and hold conformational information from trajectories.
- Variables:
cartesians – numpy array of the trajectory atom coordinates
central_cartesians – cartesian coordinates of the central backbone atoms (N-CA-C-N-CA-C…)
dihedrals – all backbone dihedrals (phi, psi, omega)
angles – all bond angles of the central backbone atoms
lengths – all bond lengths between neighbouring central atoms
sidedihedrals – all sidechain dihedrals
aminoaciddict – number of sidechain dihedrals
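To make the relation between central_cartesians, lengths and angles concrete, the following standalone sketch computes bond lengths and bond angles for a short chain of points. It does not use MolData. The helper names `bond_lengths` and `bond_angles` are hypothetical, and returning the angles in radians is an assumption.

```python
import math


def bond_lengths(chain):
    """Distances between neighbouring atoms: n atoms give n - 1 lengths."""
    return [math.dist(p, q) for p, q in zip(chain, chain[1:])]


def bond_angles(chain):
    """Angles at each inner atom in radians: n atoms give n - 2 angles."""
    angles = []
    for p, q, r in zip(chain, chain[1:], chain[2:]):
        u = [a - b for a, b in zip(p, q)]  # vector from q to p
        v = [a - b for a, b in zip(r, q)]  # vector from q to r
        cos_angle = sum(a * b for a, b in zip(u, v)) / (math.dist(p, q) * math.dist(r, q))
        # Clamp to [-1, 1] to guard against rounding errors before acos.
        angles.append(math.acos(max(-1.0, min(1.0, cos_angle))))
    return angles


# Four atoms connected by three mutually perpendicular bonds.
chain = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 2.0, 1.0)]
print(bond_lengths(chain))
print(bond_angles(chain))
```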
- __init__(atom_group, cache_path='', start=None, stop=None, step=None)[source]#
- Parameters:
atom_group – MDAnalysis atom group
cache_path – Path where the calculated variables can be cached.
start – first frame to analyze
stop – last frame to analyze
step – step size between analyzed frames
- write(path, coordinates, name='generated', formats=('pdb', 'xtc'), only_central=False, align_reference=None, align_select='all')[source]#
Writes a trajectory for the given coordinates.
- Parameters:
path – directory where to save the trajectory
coordinates – numpy array of xyz coordinates (frames, atoms, xyz)
name – filename (without extension)
formats – specifies which formats should be used to write the structure and trajectory. Default: (“pdb”, “xtc”)
only_central – if True only central atom coordinates are expected (N-Ca-C…)
align_reference – Reference used to align the generated conformations. The reference should be given as an MDAnalysis atomgroup.
align_select – Selects which atoms are used for the alignment, e.g. “resid 5:60”. Default is “all”. See the MDAnalysis selection syntax for more details.
- Returns:
encodermap.encodermap_tf1.parameters module#
- class encodermap.encodermap_tf1.parameters.ADCParameters[source]#
Bases:
Parameters
This is the parameter object for the AngleDihedralCartesianEncoderMap. It holds all the parameters that the Parameters object includes, plus the following parameters:
- Variables:
cartesian_pwd_start – index of the first atom to use for the pairwise distance calculation
cartesian_pwd_stop – index of the last atom to use for the pairwise distance calculation
cartesian_pwd_step – step for the calculation of pairwise distances. E.g. for a chain of atoms N-C_a-C-N-C_a-C… cartesian_pwd_start=1 and cartesian_pwd_step=3 will result in using all C-alpha atoms for the pairwise distance calculation.
use_backbone_angles – Defines whether backbone bond angles are learned (True) or mean values are used instead to generate conformations (False)
angle_cost_scale – Adjusts how much the angle cost is weighted in the cost function.
angle_cost_variant – Defines how the angle cost is calculated. Must be one of: “mean_square”, “mean_abs”, “mean_norm”
angle_cost_reference – Can be used to normalize the angle cost with the cost of some reference model (dummy)
dihedral_cost_scale – Adjusts how much the dihedral cost is weighted in the cost function.
dihedral_cost_variant – Defines how the dihedral cost is calculated. Must be one of: “mean_square”, “mean_abs”, “mean_norm”
dihedral_cost_reference – Can be used to normalize the dihedral cost with the cost of some reference model (dummy)
cartesian_cost_scale – Adjusts how much the cartesian cost is weighted in the cost function.
cartesian_cost_scale_soft_start – Allows the cartesian cost to be turned on slowly. Must be a tuple (begin, end) or (None, None). If begin and end are given, cartesian_cost_scale is increased linearly in the given range.
cartesian_cost_variant – Defines how the cartesian cost is calculated. Must be one of: “mean_square”, “mean_abs”, “mean_norm”
cartesian_cost_reference – Can be used to normalize the cartesian cost with the cost of some reference model (dummy)
cartesian_dist_sig_parameters – Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l)
cartesian_distance_cost_scale – Adjusts how much the cartesian distance cost is weighted in the cost function.
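Two of the parameters above, the cartesian_pwd_* selection and cartesian_cost_scale_soft_start, can be illustrated with a small standalone sketch. The helper names `select_pwd_atoms` and `soft_start_scale` are hypothetical. The sketch assumes Python slice semantics for start, stop and step (so stop is exclusive, which the description does not confirm), that begin and end are training steps, and that the scale is 0 before begin.

```python
def select_pwd_atoms(atoms, start=None, stop=None, step=None):
    """Pick the atoms used for pairwise distances via slicing (assumed semantics)."""
    return atoms[start:stop:step]


def soft_start_scale(step, cost_scale, soft_start):
    """Linearly ramp cost_scale between soft_start = (begin, end).

    With (None, None) the full cost_scale is used from the first step.
    Returning 0 before begin is an assumption of this sketch.
    """
    begin, end = soft_start
    if begin is None or end is None:
        return cost_scale
    if step <= begin:
        return 0.0
    if step >= end:
        return cost_scale
    return cost_scale * (step - begin) / (end - begin)


# Backbone of three residues: start=1 and step=3 pick every C-alpha atom.
backbone = ["N", "CA", "C"] * 3
print(select_pwd_atoms(backbone, start=1, step=3))

# Halfway through a ramp from step 0 to step 1000, half of the scale is applied.
print(soft_start_scale(500, 2.0, (0, 1000)))
```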
- class encodermap.encodermap_tf1.parameters.Parameters[source]#
Bases:
ParametersFramework
- Variables:
main_path – Defines a main path where the parameters and other things might be stored.
n_neurons – List containing number of neurons for each layer up to the bottleneck layer. For example [128, 128, 2] stands for an autoencoder with the following architecture {i, 128, 128, 2, 128, 128, i} where i is the number of dimensions of the input data.
activation_functions – List of activation function names as implemented in TensorFlow. For example: “relu”, “tanh”, “sigmoid” or “” to use no activation function. The encoder part of the network takes the activation functions from the list starting with the second element. The decoder part of the network takes the activation functions in reversed order starting with the second element from the back. For example [“”, “relu”, “tanh”, “”] would result in an autoencoder with {“relu”, “tanh”, “”, “tanh”, “relu”, “”} as sequence of activation functions.
periodicity – Defines the distance between periodic walls for the inputs. For example 2pi for angular values in radians. All periodic data processed by EncoderMap must be wrapped to one periodic window. E.g. data with 2pi periodicity may contain values from -pi to pi or from 0 to 2pi. Set the periodicity to float(“inf”) for non-periodic inputs.
learning_rate – Learning rate used by the optimizer.
n_steps – Number of training steps.
batch_size – Number of training points used in each training step
summary_step – A summary for TensorBoard is written every summary_step steps.
checkpoint_step – A checkpoint is written every checkpoint_step steps.
dist_sig_parameters – Parameters for the sigmoid functions applied to the high- and low-dimensional distances in the following order (sig_h, a_h, b_h, sig_l, a_l, b_l)
distance_cost_scale – Adjusts how much the distance based metric is weighted in the cost function.
auto_cost_scale – Adjusts how much the autoencoding cost is weighted in the cost function.
auto_cost_variant – defines how the auto cost is calculated. Must be one of: “mean_square”, “mean_abs”, “mean_norm”
center_cost_scale – Adjusts how much the centering cost is weighted in the cost function.
l2_reg_constant – Adjusts how much the l2 regularisation is weighted in the cost function.
gpu_memory_fraction – Specifies the fraction of GPU memory blocked. If it is 0, memory is allocated as needed.
analysis_path – A path that can be used to store analysis results.
id – Can be any name for the run. Might be useful for example for specific analysis for different data sets.
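The n_neurons and activation_functions conventions described above can be reproduced with plain list slicing. The following sketch only mirrors the two examples given in the descriptions. The helper names `layer_sizes` and `activation_sequence` are hypothetical and not part of the package.

```python
def layer_sizes(n_inputs, n_neurons):
    """Full layer layout: input, encoder layers, mirrored decoder layers, output."""
    return [n_inputs] + n_neurons + n_neurons[-2::-1] + [n_inputs]


def activation_sequence(activation_functions):
    """Activation functions in network order, following the described convention."""
    # Encoder: the list starting with the second element.
    encoder = activation_functions[1:]
    # Decoder: reversed order, starting with the second element from the back.
    decoder = activation_functions[-2::-1]
    return encoder + decoder


# 10 is a placeholder for i, the number of input dimensions.
print(layer_sizes(10, [128, 128, 2]))
print(activation_sequence(["", "relu", "tanh", ""]))
```

The first and last entries of activation_functions belong to the input and output side of the network, which is why a list of n + 1 names describes an encoder with n layers.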
encodermap.encodermap_tf1.plot module#
- class encodermap.encodermap_tf1.plot.ManualPath(axe, n_points=200)[source]#
Bases:
object
ManualPath is a tool to manually select a path in a matplotlib graph. It supports two modes: “interpolated line”, and “free draw”. Press “m” to switch modes.
In interpolated line mode click in the graph to add an additional way point. Press “delete” to remove the last way point. Press “d” to remove all way points. Press “enter” once you have finished your path selection.
In free draw mode press and hold the left mouse button while you draw a path.
Once the path selection is completed, the use_points method is called with the points on the selected path. You can overwrite the use_points method to do whatever you want with the points on the path.
- __init__(axe, n_points=200)[source]#
- Parameters:
axe – matplotlib axes object, for example from: fig, axe = plt.subplots()
n_points – Number of points distributed on the selected path.
- _grab_background(event=None)[source]#
When the figure is resized, hide the points, draw everything, and update the background.
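How n_points are distributed on a path of way points is not specified here. One plausible reading, sketched below without matplotlib, is to space the points evenly by arc length along the straight segments between the way points. The function name `distribute_points` is hypothetical and the even spacing is an assumption.

```python
import math


def distribute_points(way_points, n_points=200):
    """Place n_points evenly by arc length along a polyline.

    way_points: list of at least two (x, y) tuples; n_points must be >= 2.
    """
    seg = [math.dist(p, q) for p, q in zip(way_points, way_points[1:])]
    total = sum(seg)
    points = []
    for k in range(n_points):
        # Arc length from the start of the path to the k-th point.
        target = total * k / (n_points - 1)
        i = 0
        # Walk along the segments until the one containing the target is found.
        while i < len(seg) - 1 and target > seg[i]:
            target -= seg[i]
            i += 1
        t = target / seg[i] if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = way_points[i], way_points[i + 1]
        points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return points


# An L-shaped path with three way points, sampled with five points.
path = distribute_points([(0.0, 0.0), (2.0, 0.0), (2.0, 2.0)], n_points=5)
print(path)
```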
- class encodermap.encodermap_tf1.plot.PathGenerateCartesians(axe, autoencoder, mol_data, save_path=None, n_points=200, vmd_path='', align_reference=None, align_select='all')[source]#
Bases:
ManualPath
This class inherits from encodermap.plot.ManualPath. It is used to select paths in a 2d map and to generate conformations for these paths with an AngleDihedralCartesianEncoderMap.
- __init__(axe, autoencoder, mol_data, save_path=None, n_points=200, vmd_path='', align_reference=None, align_select='all')[source]#
- Parameters:
axe – matplotlib axes object, for example from: fig, axe = plt.subplots()
autoencoder –
AngleDihedralCartesianEncoderMap
mol_data –
MolData
save_path – Path where outputs should be written
n_points – Number of points distributed on the selected path.
vmd_path – If a path to vmd is given, the generated conformations will be directly opened in vmd.
align_reference – Reference used to align the generated conformations. The reference should be given as an MDAnalysis atomgroup.
align_select – Selects which atoms are used for the alignment, e.g. “resid 5:60”. Default is “all”. See the MDAnalysis selection syntax for more details.
- class encodermap.encodermap_tf1.plot.PathGenerateDihedrals(axe, autoencoder, pdb_path, save_path=None, n_points=200)[source]#
Bases:
ManualPath
This class inherits from encodermap.plot.ManualPath. The points from a manually selected path are fed into the decoder part of a given autoencoder. The output of the autoencoder is used as phi and psi dihedral angles to reconstruct protein conformations based on the protein structure given with pdb_path. Three output files are written for each selected path: points.npy, generated.npy and generated.pdb, which contain the points on the selected path, the generated output of the autoencoder, and the generated protein conformations, respectively. Keep in mind that backbone dihedrals are not sufficient to describe a protein conformation completely. Usually the backbone is reconstructed well, but the side chains are not reconstructed correctly.
- __init__(axe, autoencoder, pdb_path, save_path=None, n_points=200)[source]#
- Parameters:
axe – matplotlib axes object, for example from: fig, axe = plt.subplots()
autoencoder –
encodermap.autoencoder.Autoencoder
which was trained on protein dihedral angles. The dihedrals have to be ordered starting from the amino end: first all phi angles, then all psi angles.
pdb_path – Path to a protein data bank (pdb) file of the protein
save_path – Path where outputs should be written
n_points – Number of points distributed on the selected path.
- class encodermap.encodermap_tf1.plot.PathSelect(axe, projected, mol_data, save_path, n_points=200, vmd_path='', align_reference=None, align_select='all')[source]#
Bases:
ManualPath
This class inherits from encodermap.plot.ManualPath. It is used to select areas in a 2d map and to write all conformations in these areas to separate trajectories.
- __init__(axe, projected, mol_data, save_path, n_points=200, vmd_path='', align_reference=None, align_select='all')[source]#
- Parameters:
axe – matplotlib axes object, for example from: fig, axe = plt.subplots()
projected – points in the map (must be the same number of points as conformations in mol_data)
mol_data –
MolData
save_path – Path where outputs should be written
n_points – Number of points distributed on the selected path.
vmd_path – If a path to vmd is given, the generated conformations will be directly opened in vmd.
align_reference – Reference used to align the generated conformations. The reference should be given as an MDAnalysis atomgroup.
align_select – Selects which atoms are used for the alignment, e.g. “resid 5:60”. Default is “all”. See the MDAnalysis selection syntax for more details.
- encodermap.encodermap_tf1.plot.distance_histogram(data, periodicity, sigmoid_parameters, axes=None, low_d_max=5, bins='auto')[source]#
Plots the histogram of all pairwise distances in the data. It also shows the sigmoid function and its normalized derivative.
- Parameters:
data – each row should contain a point in a number_of_columns-dimensional space.
periodicity – Periodicity of the data. Use float(“inf”) for non-periodic data.
sigmoid_parameters – tuple (sigma, a, b)
axes – Array-like structure with two matplotlib axes objects, or None. If None, a new figure is generated.
low_d_max – upper limit for plotting the low_d sigmoid
bins – number of bins for histogram
- Returns:
matplotlib axes objects
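The functional form of the sigmoid is not given on this page. The sketch below assumes the sketch-map style sigmoid that is commonly associated with EncoderMap, parameterised by the tuple (sigma, a, b). The function name `sigmoid` and the example parameter values are illustrative assumptions only.

```python
def sigmoid(r, sigma, a, b):
    """Assumed sigmoid: 1 - (1 + (2**(a/b) - 1) * (r/sigma)**a) ** (-b/a)."""
    return 1.0 - (1.0 + (2.0 ** (a / b) - 1.0) * (r / sigma) ** a) ** (-b / a)


# Illustrative parameter values, not defaults taken from the package.
sigma, a, b = 4.5, 12, 6

# The function rises from 0 at r = 0 towards 1 for large distances.
for r in (0.0, sigma, 10 * sigma):
    print(r, sigmoid(r, sigma, a, b))
```

With this form, sigma marks the distance at which the function reaches one half, while a and b control how steeply it rises below and above sigma.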