encodermap.misc package#

Submodules#

encodermap.misc.backmapping module#

Backmapping functions largely based upon encodermap_tf1’s backmapping and martini-tools’ backwards.py.

Todo

  • Could the rotation matrices in Tensorflow be accelerated by using quaternions?

  • Multi Top.

encodermap.misc.backmapping._dihedral(xyz, indices)[source]#

Returns current dihedral angle between positions.

Adapted from MDTraj.

Parameters:
  • xyz (np.ndarray) – This function only takes an xyz array of a single frame and uses np.expand_dims() to make that frame work with the _displacement function from mdtraj.

  • indices (Union[np.ndarray, list]) – List of 4 ints describing the dihedral.
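
The dihedral computation described above can be sketched in plain Python. This is a minimal illustration of the usual atan2-based formula for four points of one frame, not the library's implementation; the names dihedral_sketch, _sub, _dot and _cross are hypothetical.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def dihedral_sketch(xyz, indices):
    """Signed dihedral angle (radians) of four atoms in a single frame.

    Hypothetical helper: xyz is a list of [x, y, z] positions,
    indices a list of 4 ints selecting the atoms.
    """
    p0, p1, p2, p3 = (xyz[i] for i in indices)
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    c1 = _cross(b2, b3)
    c2 = _cross(b1, b2)
    y = _dot(b1, c1) * math.sqrt(_dot(b2, b2))
    x = _dot(c1, c2)
    return math.atan2(y, x)


frame = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
angle = dihedral_sketch(frame, [0, 1, 2, 3])  # a quarter turn, pi / 2
```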

encodermap.misc.backmapping._displacement(xyz, pairs)[source]#

Displacement vector between pairs of points in each frame

encodermap.misc.backmapping._get_far_and_near_networkx(bondgraph, edge_indices, top=None)[source]#

Returns near and far sides for a list of edges giving the indices of the two atoms at which the structure is broken.

Parameters:
  • bondgraph (networkx.classes.graph.Graph) – The bondgraph describing the protein.

  • edge_indices (np.ndarray) – The edges the graph will be broken at.

Returns:

A tuple containing the following:

near_sides (list of np.ndarray): List of integer arrays giving the near sides. len(near_sides) == len(edge_indices).

far_sides (list of np.ndarray): Same as near_sides, but for the far sides.

Return type:

tuple
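
To illustrate what breaking a bond graph at an edge means, here is a small sketch that uses a breadth-first search over a plain adjacency dictionary instead of networkx. The function name split_sides_sketch is hypothetical, and which side is called "near" (the side of the first atom of the edge) is an assumption made for this example only.

```python
from collections import deque


def split_sides_sketch(bonds, edge):
    """Return (near_side, far_side) atom indices after removing `edge`.

    Assumption: the near side contains edge[0], the far side edge[1].
    """
    neighbors = {}
    for a, b in bonds:
        if {a, b} == set(edge):
            continue  # this is the bond at which the structure is broken
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    def reachable(start):
        seen, queue = {start}, deque([start])
        while queue:
            for nxt in neighbors.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return sorted(seen)

    return reachable(edge[0]), reachable(edge[1])


# A tiny branched chain: 0-1-2, with 3 and 4 both bonded to 2.
bonds = [(0, 1), (1, 2), (2, 3), (2, 4)]
near, far = split_sides_sketch(bonds, (1, 2))
```

For a list of edges, the same helper would be called once per edge, which gives one near side and one far side per entry, in line with len(near_sides) == len(edge_indices).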

encodermap.misc.backmapping.backbone_hydrogen_oxygen_crossproduct(backbone_positions)[source]#

encodermap.misc.clustering module#

Functions for building clusters.

encodermap.misc.clustering._gen_dummy_traj_single(trajs, cluster_no, align_string='name CA', nglview=False, stack_atoms=False, shorten=False, max_frames=-1, superpose=True, col='cluster_membership', subunit='', ref_align_string='name CA', base_traj=None)[source]#

Called when only one cluster is needed.

encodermap.misc.clustering.gen_dummy_traj(trajs, cluster_no, align_string='name CA', nglview=False, stack_atoms=False, shorten=False, max_frames=-1, superpose=True, col='cluster_membership', subunit='', ref_align_string='name CA', base_traj=None)[source]#

Makes a dummy traj from an encodermap trajectory which contains trajectories with different topology.

This function takes an encodermap.TrajEnsemble object and returns mdtraj trajectories for clustered data. This function can concatenate trajs even if the topology of the trajectories in the TrajEnsemble class is different. The topology of this dummy traj will be wrong, but the atomic positions are correct.

This function constructs a traj of length cluster_membership.count(cluster_no) with the topology of the first frame of this cluster (trajs.get_single_frame(cluster_membership.index(cluster_no))) and changes the atomic coordinates of this traj based on the other frames in this cluster.

Note

If you have loaded the encodermap functions with the ‘no_load’ backend a second call to this function with the same parameters will be faster, because the trajectory frames have been loaded to memory.

Parameters:
  • trajs (encodermap.TrajEnsemble) – The trajs which were clustered.

  • cluster_no (Union[int, np.ndarray, list]) – The cluster_no of the cluster to make the dummy traj from. If an int is provided, the cluster will be found by using the trajs’ own cluster_membership in the trajs pandas dataframe. If a list or np.ndarray is provided, multiple clusters are returned and colored according to cluster_no.

  • align_string (str, optional) – Use this mdtraj atom selection string to align the frames of the dummy traj. Defaults to ‘name CA’.

  • nglview (bool, optional) – Whether to return a tuple of an nglview.view object and the traj or not. Defaults to False.

  • stack_atoms (bool, optional) – Whether to stack all frames into a single frame with multiple structures in it. This option is useful if you want to generate a picture of interpenetrating structures. Defaults to False.

  • shorten (bool, optional) – Whether to return all structures or just a subset of roughly ten structures. Defaults to False.

  • max_frames (int, optional) – Only return so many frames. If set to -1 all frames will be returned. Defaults to -1.

  • superpose (Union[bool, mdtraj.Trajectory], optional) – Whether the frames of the returned traj should be superposed to frame 0 of the traj. If an mdtraj Trajectory is provided, this trajectory is used to superpose. Defaults to True.

  • subunit (str, optional) – When you want to only visualize an ensemble of certain parts of your protein but keep some part stationary (see align_string), you can provide an mdtraj selection string. This part of the protein will only be rendered from the first frame. The other parts will be rendered as an ensemble of structures (either along atom (stack_atoms = True) or time (stack_atoms = False)). Defaults to ‘’.

  • ref_align_string (str, optional) – When the type of superpose is mdtraj.Trajectory with a different topology than trajs, you can give a different align string into this argument. Defaults to ‘name CA’.

  • base_traj (Union[None, mdtraj.Trajectory], optional) – An mdtraj.Trajectory that will be set to the coordinates from trajs, instead of trajs[0]. Normally, the first traj in trajs (trajs[0]) will be used as a base traj. It will be extended in the time direction until it has the desired number of frames (10 for shorten=True, N for max_frames=N, etc.). If you don’t want to use this traj but something else, you can feed this option an mdtraj.Trajectory object. Defaults to None.

Returns:

A tuple containing:

view (nglview.view): The nglview.view object if nglview == True, is None otherwise.

dummy_traj (mdtraj.Trajectory): The mdtraj trajectory with wrong topology but correct atomic positions.

Return type:

tuple

See also

See the render_vmd function in this document to render an image of the returned traj.

encodermap.misc.clustering.get_cluster_frames(trajs, cluster_no, align_string='name CA', nglview=False, stack_atoms=False, shorten=False, max_frames=-1, superpose=True, col='cluster_membership', subunit='', ball_and_stick=False, cmap='viridis')[source]#

encodermap.misc.distances module#

encodermap.misc.distances.pairwise_dist(positions, squared=False, flat=False)[source]#

Tensorflow implementation of scipy.spatial.distance.cdist.

Returns a tensor with shape (positions.shape[1], positions.shape[1]). This tensor is the distance matrix of the provided positions. The matrix is hollow, i.e. the diagonal elements are zero.

Parameters:
  • positions (Union[np.ndarray, tf.Tensor]) – Collection of n-dimensional points. positions.shape[0] are points. positions.shape[1] are dimensions.

  • squared (bool, optional) – Whether to return the pairwise squared euclidean distance matrix or normal euclidean distance matrix. Defaults to False.

  • flat (bool, optional) – Whether to return only the lower triangle of the hollow matrix. Setting this to True mimics the behavior of scipy.spatial.distance.pdist. Defaults to False.
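
The meaning of squared and flat can be illustrated with a pure-Python sketch of a hollow distance matrix. This is not the Tensorflow implementation. The name pairwise_dist_sketch is hypothetical, and the row-by-row ordering of the flattened lower triangle is an assumption of this example.

```python
import math


def pairwise_dist_sketch(positions, squared=False, flat=False):
    """Pairwise Euclidean distances between rows of `positions`."""
    n = len(positions)
    matrix = [[0.0] * n for _ in range(n)]  # the diagonal stays zero (hollow)
    for i in range(n):
        for j in range(i):
            d2 = sum((p - q) ** 2 for p, q in zip(positions[i], positions[j]))
            matrix[i][j] = matrix[j][i] = d2 if squared else math.sqrt(d2)
    if flat:
        # Keep only the entries below the diagonal, row by row.
        return [matrix[i][j] for i in range(n) for j in range(i)]
    return matrix


points = [[0.0, 0.0], [3.0, 4.0], [0.0, 4.0]]
full = pairwise_dist_sketch(points)
lower = pairwise_dist_sketch(points, flat=True)
lower_squared = pairwise_dist_sketch(points, squared=True, flat=True)
```

For n points the flattened result has n * (n - 1) / 2 entries, which is the same length that scipy.spatial.distance.pdist returns.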

encodermap.misc.distances.pairwise_dist_periodic(positions, periodicity)[source]#

Pairwise distances using periodicity.

Parameters:
  • positions (Union[np.ndarray, tf.Tensor]) – The positions of the points. Currently, only 2D arrays with positions.shape[0] == n_points and positions.shape[1] == 1 (rotational values) are supported.

  • periodicity (float) – The periodicity of the data. Most often you will use either 2*pi or 360.

encodermap.misc.distances.periodic_distance(a, b, periodicity=6.283185307179586)[source]#

Calculates distance between two points and respects periodicity.

If the provided dataset is periodic (e.g. angles and torsion angles), the returned distance is corrected.

Parameters:
  • a (tf.Tensor) – Coordinate of point a.

  • b (tf.Tensor) – Coordinate of point b.

  • periodicity (float, optional) – The periodicity (i.e. the box length/ maximum angle) of your data. Defaults to 2*pi. Provide float(‘inf’) for no periodicity.

Example

>>> import numpy as np
>>> import tensorflow as tf
>>> import encodermap as em
>>> x = tf.convert_to_tensor(np.array([[1.5], [1.5]]))
>>> y = tf.convert_to_tensor(np.array([[-3.1], [-3.1]]))
>>> r = em.misc.periodic_distance(x, y)
>>> print(r.numpy())
[[1.68318531]
 [1.68318531]]

encodermap.misc.distances.periodic_distance_np(a, b, periodicity=6.283185307179586)[source]#

Calculates distance between two points and respects periodicity.

If the provided dataset is periodic (e.g. angles and torsion angles), the returned distance is corrected.

Parameters:
  • a (np.ndarray) – Coordinate of point a.

  • b (np.ndarray) – Coordinate of point b.

  • periodicity (float, optional) – The periodicity (i.e. the box length/ maximum angle) of your data. Defaults to 2*pi. Provide float(‘inf’) for no periodicity.
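
The correction for periodicity can be sketched for scalar inputs as the minimum of the direct distance and the distance across the periodic boundary. The formula below is an assumption consistent with the documented example (1.5 and -3.1 give about 1.683), and periodic_distance_sketch is a hypothetical name.

```python
import math


def periodic_distance_sketch(a, b, periodicity=2 * math.pi):
    """Distance between two scalars on a periodic axis."""
    d = abs(b - a)
    # Either go directly from a to b, or wrap around the boundary.
    return min(d, periodicity - d)


wrapped = periodic_distance_sketch(1.5, -3.1)
unwrapped = periodic_distance_sketch(1.5, -3.1, float("inf"))
degrees = periodic_distance_sketch(10.0, 350.0, 360.0)
```

With float('inf') the second branch can never be the smaller one, so the plain absolute difference is returned.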

encodermap.misc.distances.sigmoid(sig, a, b)[source]#

Returns a sigmoid function with specified parameters.

Parameters:
  • sig (float) – Sigma.

  • a (float) –

  • b (float) –

Returns:

A function that can be used to calculate the sigmoid with the

specified parameters.

Return type:

function
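
The pattern of returning a configured function can be sketched as a closure. The functional form used here is an assumption (a sketch-map-style sigmoid that equals 0.5 at r == sig); this page does not state the formula, so treat the body of func as illustrative. sigmoid_sketch is a hypothetical name.

```python
def sigmoid_sketch(sig, a, b):
    """Return a function of r with sig, a and b baked in."""

    def func(r):
        # Assumed form: rises from 0 at r == 0 and passes 0.5 at r == sig.
        return 1 - (1 + (2 ** (a / b) - 1) * (r / sig) ** a) ** (-b / a)

    return func


sig_fn = sigmoid_sketch(4.5, 12, 6)
at_zero = sig_fn(0.0)
at_sigma = sig_fn(4.5)
```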

encodermap.misc.errors module#

exception encodermap.misc.errors.BadError(message)[source]#

Bases: Error

Raised when the Error is really bad.

encodermap.misc.function_def module#

encodermap.misc.function_def.function(debug=False)[source]#

encodermap.misc.misc module#

Miscellaneous functions.

encodermap.misc.misc._can_be_feature(inp)[source]#

Function to decide whether the input can be interpreted by the Featurizer class.

Outputs True, if inp == ‘all’ or inp is a list of strings contained in FEATURE_NAMES.

Parameters:

inp (Any) – The input.

Returns:

True, if inp can be interpreted by featurizer.

Return type:

bool

Example

>>> from encodermap.misc.misc import _can_be_feature
>>> _can_be_feature('all')
True
>>> _can_be_feature('no')
False
>>> _can_be_feature(['AllCartesians', 'central_dihedrals'])
True

encodermap.misc.misc._datetime_windows_and_linux_compatible()[source]#

Portable way to get now as either a linux or windows compatible string.

For linux systems strings in this manner will be returned:

2022-07-13T16:04:04+02:00

For windows systems strings in this manner will be returned:

2022-07-13_16-04-46
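
Both string formats can be produced with the standard library. The sketch below takes the operating system name as an argument so that both branches can be shown. The function name now_string_sketch and its parameters are hypothetical, and the motivation (colons are not allowed in Windows file names) is an assumption.

```python
from datetime import datetime, timedelta, timezone


def now_string_sketch(system, now=None):
    """Format a timestamp for the given platform name."""
    if now is None:
        now = datetime.now().astimezone()  # local time with UTC offset
    now = now.replace(microsecond=0)
    if system == "Windows":
        return now.strftime("%Y-%m-%d_%H-%M-%S")
    return now.isoformat()


stamp = datetime(2022, 7, 13, 16, 4, 4, tzinfo=timezone(timedelta(hours=2)))
linux_style = now_string_sketch("Linux", stamp)
windows_style = now_string_sketch("Windows", stamp)
```

In real use, the first argument would typically come from platform.system().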

encodermap.misc.misc._flatten_model(model_nested, input_dim=None, return_model=True)[source]#

Flattens a nested tensorflow.keras.models.Model.

Can be useful if a model consists of two sequential models and needs to be flattened to be plotted.

encodermap.misc.misc._validate_uri(str_)[source]#

Checks whether the str_ is a valid uri.
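
One common way to perform such a check uses urllib.parse from the standard library. Whether encodermap uses the same criterion is not stated here, so the sketch below (hypothetical name looks_like_uri) only shows the general idea: a string counts as a URI if it has both a scheme and a network location.

```python
from urllib.parse import urlparse


def looks_like_uri(str_):
    """Return True if str_ has both a scheme and a network location."""
    try:
        result = urlparse(str_)
    except ValueError:
        return False
    return bool(result.scheme) and bool(result.netloc)


remote = looks_like_uri("https://files.rcsb.org/view/1GHC.pdb")
local = looks_like_uri("trajectories/traj.xtc")
```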

encodermap.misc.misc.create_n_cube(n=3, points_along_edge=500, sigma=0.05, same_colored_edges=3, seed=None)[source]#

Creates points along the edges of an n-dimensional unit hyper-cube.

The cube is created using networkx.hypercube_graph and points are placed along the edges of the cube. By providing a sigma value the points can be shifted by some Gaussian noise.

Parameters:
  • n (int, optional) – The dimension of the Hypercube (can also take 1 or 2). Defaults to 3.

  • points_along_edge (int, optional) – How many points should be placed along any edge. By increasing the number of dimensions, the number of edges increases, which also increases the total number of points. Defaults to 500.

  • sigma (float, optional) – The sigma value for np.random.normal which introduces Gaussian noise to the positions of the points. Defaults to 0.05.

  • same_colored_edges (int, optional) – How many edges of the Hypercube should be colored with the same color. This can be used to later better visualize the edges of the cube. Defaults to 3.

  • seed (int, optional) – If an int is provided this will be used as a seed for np.random and fix the random state. Defaults to None which produces random results every time this function is called.

Returns:

A tuple containing the following:

coordinates (np.ndarray): The coordinates of the points. colors (np.ndarray): Integers that can be used for coloration.

Return type:

tuple
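
The construction can be sketched without networkx: the vertices of an n-dimensional unit hypercube are all 0/1 tuples of length n, and two vertices share an edge if they differ in exactly one coordinate. The helper names below are hypothetical, coloring is omitted, and the random module replaces np.random to keep the sketch dependency-free.

```python
import itertools
import random


def hypercube_edges_sketch(n):
    """All edges of the n-dimensional unit hypercube as vertex pairs."""
    edges = []
    for v in itertools.product((0, 1), repeat=n):
        for axis in range(n):
            if v[axis] == 0:  # flip one coordinate from 0 to 1
                edges.append((v, v[:axis] + (1,) + v[axis + 1:]))
    return edges


def points_on_edges_sketch(n=3, points_along_edge=5, sigma=0.0, seed=None):
    """Evenly spaced points on every edge, shifted by Gaussian noise."""
    rng = random.Random(seed)
    coords = []
    for start, end in hypercube_edges_sketch(n):
        for k in range(points_along_edge):
            t = k / (points_along_edge - 1)  # needs points_along_edge >= 2
            coords.append(
                [s + t * (e - s) + rng.gauss(0.0, sigma) for s, e in zip(start, end)]
            )
    return coords


edges = hypercube_edges_sketch(3)
coords = points_on_edges_sketch(3, points_along_edge=5, sigma=0.0)
```

An n-cube has n * 2**(n - 1) edges, so the total number of points grows quickly with the dimension, as noted for points_along_edge.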

Example

>>> from encodermap.misc.misc import create_n_cube
>>> # A sigma value of zero means no noise at all.
>>> coords, colors = create_n_cube(2, sigma=0)
>>> coords[0]
[0., 1.]

encodermap.misc.misc.get_full_common_str_and_ref(trajs, tops, common_str)[source]#

Matches traj_files, top_files and common string and returns lists with the same length matching the provided common str.

Parameters:
  • trajs (list[str]) – A list of str pointing to trajectory files.

  • tops (list[str]) – A list of str pointing to topology files.

  • common_str (list[str]) – A list of strings that can be found in both trajs and tops (i.e. substrings).
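
The matching idea can be sketched as follows: every trajectory file is paired with the common string it contains and with the topology file that contains the same string. The return format is not documented on this page, so the three equally long lists returned below are an assumption, as is the name match_by_common_str_sketch. The file names are made up.

```python
def match_by_common_str_sketch(trajs, tops, common_str):
    """Pair each trajectory file with a topology via a shared substring."""
    matched_trajs, matched_tops, matched_strs = [], [], []
    for traj in trajs:
        for cs in common_str:
            if cs not in traj:
                continue
            candidates = [top for top in tops if cs in top]
            if not candidates:
                raise ValueError(f"No topology file contains {cs!r}.")
            matched_trajs.append(traj)
            matched_tops.append(candidates[0])
            matched_strs.append(cs)
            break
    return matched_trajs, matched_tops, matched_strs


trajs = ["asp7_run1.xtc", "asp7_run2.xtc", "glu7_run1.xtc"]
tops = ["asp7.pdb", "glu7.pdb"]
out_trajs, out_tops, out_strs = match_by_common_str_sketch(
    trajs, tops, ["asp7", "glu7"]
)
```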

encodermap.misc.misc.plot_model(model, input_dim)[source]#

Plots keras model using tf.keras.utils.plot_model

encodermap.misc.misc.run_path(path)[source]#

Creates a directory at “path/run{i}”, where i is the smallest integer for which the path does not yet exist.

Parameters:

path (str) – Path to the run folder.

Returns:

The new output path.

Return type:

str
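
The search for the smallest free run index can be sketched with os and tempfile. The name next_run_dir_sketch is hypothetical, and starting the count at 0 is an assumption of this sketch.

```python
import os
import tempfile


def next_run_dir_sketch(path):
    """Create and return path/run{i} for the smallest unused i."""
    i = 0  # assumption: counting starts at 0
    while os.path.exists(os.path.join(path, f"run{i}")):
        i += 1
    new_path = os.path.join(path, f"run{i}")
    os.makedirs(new_path)
    return new_path


# Work in a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as base:
    first = os.path.basename(next_run_dir_sketch(base))
    second = os.path.basename(next_run_dir_sketch(base))
```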

Example
>>> import os
>>> import encodermap as em
>>> os.makedirs('run1/')
>>> em.misc.run_path('run1/')
'run2/'
>>> os.listdir()
['run1/', 'run2/']

encodermap.misc.saving_loading_models module#

Todo

  • This is in desperate need of rework.

encodermap.misc.saving_loading_models.load_list_of_models(models: list[str], custom_objects: Optional[dict[str, Callable]] = None) list[keras.engine.training.Model][source]#

Load the models supplied in models using keras.

Parameters:

models (list[str]) – The paths of the models to be loaded

encodermap.misc.saving_loading_models.load_model(autoencoder_class: AutoencoderClass, checkpoint_path: str, read_only: bool = True, overwrite_tensorboard_bool: bool = False, trajs: Optional[TrajEnsemble] = None, sparse: bool = False) AutoencoderClass[source]#

Reloads a tf.keras.Model from a checkpoint path.

For this, an AutoencoderClass is necessary, to provide the corresponding custom objects, such as loss functions.

encodermap.misc.saving_loading_models.model_sort_key(model_name: str) int[source]#

Returns numerical values based on whether model_name contains substrings.

Parameters:

model_name (str) – The filepath to the saved model.

Returns:

Returns 0 for ‘encoder’, 1 for ‘decoder’, 2 for everything else.

Return type:

int
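
A key function like this is typically passed to sorted() so that encoder files come first, decoder files second and everything else last. The sketch below follows the documented return values. The name model_sort_key_sketch and the file names are hypothetical.

```python
def model_sort_key_sketch(model_name):
    """Return 0 for 'encoder', 1 for 'decoder', 2 for everything else."""
    if "encoder" in model_name:
        return 0
    if "decoder" in model_name:
        return 1
    return 2


files = ["saved_model_decoder.keras", "saved_model.keras", "saved_model_encoder.keras"]
ordered = sorted(files, key=model_sort_key_sketch)
```

Because this is a plain substring test, a name such as 'autoencoder' would also match 'encoder' in this sketch.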

encodermap.misc.saving_loading_models.save_model(model, main_path, inp_class_name, step=None, current_step=None)[source]#

encodermap.misc.summaries module#

Functions that write stuff to tensorboard. Mainly used for the image callbacks.

encodermap.misc.summaries._gen_hist(data, hist_kws)[source]#

Creates matplotlib histogram and returns tensorflow Tensor that represents an image.

Parameters:
  • data (Union[np.ndarray, tf.Tensor]) – The xy data to be used. data.ndim should be 2. 1st dimension the datapoints, 2nd dimension x, y.

  • hist_kws (dict) – Additional keywords to be passed to matplotlib.pyplot.hist2d().

Returns:

A tensorflow tensor that can be written to Tensorboard with tf.summary.image().

Return type:

tf.Tensor

encodermap.misc.summaries._gen_nan_image()[source]#

Creates matplotlib image with debug info.

Returns:

A tensorflow tensor that can be written to Tensorboard with tf.summary.image().

Return type:

tf.Tensor

encodermap.misc.summaries._gen_scatter(data, scatter_kws)[source]#

Creates matplotlib scatter plot and returns tensorflow Tensor that represents an image.

Parameters:
  • data (Union[np.ndarray, tf.Tensor]) – The xy data to be used. data.ndim should be 2. 1st dimension the datapoints, 2nd dimension x, y.

  • scatter_kws (dict) – Additional keywords to be passed to matplotlib.pyplot.scatter().

Returns:

A tensorflow tensor that can be written to Tensorboard with tf.summary.image().

Return type:

tf.Tensor

encodermap.misc.summaries.add_layer_summaries(layer, step=None)[source]#

Adds summaries for a layer to Tensorboard.

Parameters:
  • layer (tf.keras.layers.Layer) – The layer.

  • step (Union[tf.Tensor, int, None], optional) – The current step. Can be either a Tensor or None. Defaults to None.

encodermap.misc.summaries.image_summary(lowd, step=None, scatter_kws={'s': 20}, hist_kws={'bins': 50}, additional_fns=None)[source]#

Writes an image to Tensorboard.

Parameters:
  • lowd (np.ndarray) – The data to plot. Usually that will be the output of the latent space of the Autoencoder. This array has to be of dimensionality 2 (rows and columns). The first two points of the rows will be used as xy coordinates in a scatter plot.

  • step (Union[int, None], optional) – The training step under which you can find the image in tensorboard. Defaults to None.

  • scatter_kws (dict, optional) – A dictionary with keyword arguments to be passed to matplotlib.pyplot.scatter(). Defaults to {‘s’: 20}.

  • hist_kws (dict, optional) – A dictionary with keyword arguments to be passed to matplotlib.pyplot.hist2d(). Defaults to {‘bins’: 50}.

  • additional_fns (Union[None, list], optional) – A list of functions that take the data of the latent space and return a tf.Tensor that can be logged to tensorboard with tf.summary.image().

Raises:

AssertionError – When lowd.ndim is not 2 or when len(lowd) != len(ids).

encodermap.misc.transformations module#

Homogeneous Transformation Matrices and Quaternions.

A library for calculating 4x4 matrices for translating, rotating, reflecting, scaling, shearing, projecting, orthogonalizing, and superimposing arrays of 3D homogeneous coordinates as well as for converting between rotation matrices, Euler angles, and quaternions. Also includes an Arcball control object and functions to decompose transformation matrices.

Author:

Christoph Gohlke

Organization:

Laboratory for Fluorescence Dynamics, University of California, Irvine

Version:

2013.06.29

Requirements#

Notes#

The API is not stable yet and is expected to change between revisions.

This Python code is not optimized for speed. Refer to the transformations.c module for a faster implementation of some functions.

Documentation in HTML format can be generated with epydoc.

Matrices (M) can be inverted using numpy.linalg.inv(M), be concatenated using numpy.dot(M0, M1), or transform homogeneous coordinate arrays (v) using numpy.dot(M, v) for shape (4, *) column vectors, respectively numpy.dot(v, M.T) for shape (*, 4) row vectors (“array of points”).

This module follows the “column vectors on the right” and “row major storage” (C contiguous) conventions. The translation components are in the right column of the transformation matrix, i.e. M[:3, 3]. The transpose of the transformation matrices may have to be used to interface with other graphics systems, e.g. with OpenGL’s glMultMatrixd(). See also [16].
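
The column-vector convention can be illustrated with a dependency-free sketch: the translation sits in the right column of the 4x4 matrix, and a homogeneous point [x, y, z, 1] is transformed by multiplying it from the right. The helper names are hypothetical, and nested lists stand in for numpy arrays.

```python
def translation_matrix_sketch(direction):
    """4x4 identity with the translation in the right column, M[:3, 3]."""
    m = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    for i in range(3):
        m[i][3] = float(direction[i])
    return m


def matmul_sketch(a, b):
    """Concatenate two 4x4 transformations (plays the role of numpy.dot)."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def apply_sketch(m, v):
    """Transform one homogeneous column vector v of length 4."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]


T = translation_matrix_sketch([1, 2, 3])
moved = apply_sketch(T, [1.0, 1.0, 1.0, 1.0])
twice = apply_sketch(matmul_sketch(T, T), [0.0, 0.0, 0.0, 1.0])
```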

Calculations are carried out with numpy.float64 precision.

Vector, point, quaternion, and matrix function arguments are expected to be “array like”, i.e. tuple, list, or numpy arrays.

Return types are numpy arrays unless specified otherwise.

Angles are in radians unless specified otherwise.

Quaternions w+ix+jy+kz are represented as [w, x, y, z].
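
A short sketch of the [w, x, y, z] convention: a rotation by an angle about an axis becomes [cos(angle/2), sin(angle/2) * axis], and two rotations are combined with the Hamilton product. The function names below carry a _sketch suffix to mark them as illustrations, not the module's own functions.

```python
import math


def quaternion_about_axis_sketch(angle, axis):
    """Unit quaternion [w, x, y, z] for a rotation about `axis`."""
    norm = math.sqrt(sum(c * c for c in axis))
    s = math.sin(angle / 2.0) / norm
    return [math.cos(angle / 2.0), axis[0] * s, axis[1] * s, axis[2] * s]


def quaternion_multiply_sketch(q1, q0):
    """Hamilton product q1 * q0 in [w, x, y, z] order."""
    w1, x1, y1, z1 = q1
    w0, x0, y0, z0 = q0
    return [
        w1 * w0 - x1 * x0 - y1 * y0 - z1 * z0,
        w1 * x0 + x1 * w0 + y1 * z0 - z1 * y0,
        w1 * y0 - x1 * z0 + y1 * w0 + z1 * x0,
        w1 * z0 + x1 * y0 - y1 * x0 + z1 * w0,
    ]


quarter = quaternion_about_axis_sketch(math.pi / 2, [0, 0, 1])
half = quaternion_multiply_sketch(quarter, quarter)  # two quarter turns about z
```

Two quarter turns about z give a half turn about z, i.e. a quaternion close to [0, 0, 0, 1].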

A triple of Euler angles can be applied/interpreted in 24 ways, which can be specified using a 4 character string or encoded 4-tuple:

Axes 4-string: e.g. ‘sxyz’ or ‘ryxy’

  • first character : rotations are applied to ‘s’tatic or ‘r’otating frame

  • remaining characters : successive rotation axis ‘x’, ‘y’, or ‘z’

Axes 4-tuple: e.g. (0, 0, 0, 0) or (1, 1, 1, 1)

  • inner axis: code of axis (‘x’:0, ‘y’:1, ‘z’:2) of rightmost matrix.

  • parity : even (0) if inner axis ‘x’ is followed by ‘y’, ‘y’ is followed by ‘z’, or ‘z’ is followed by ‘x’. Otherwise odd (1).

  • repetition : first and last axis are same (1) or different (0).

  • frame : rotations are applied to static (0) or rotating (1) frame.

References#

  1. Matrices and transformations. Ronald Goldman. In “Graphics Gems I”, pp 472-475. Morgan Kaufmann, 1990.

  2. More matrices and transformations: shear and pseudo-perspective. Ronald Goldman. In “Graphics Gems II”, pp 320-323. Morgan Kaufmann, 1991.

  3. Decomposing a matrix into simple transformations. Spencer Thomas. In “Graphics Gems II”, pp 320-323. Morgan Kaufmann, 1991.

  4. Recovering the data from the transformation matrix. Ronald Goldman. In “Graphics Gems II”, pp 324-331. Morgan Kaufmann, 1991.

  5. Euler angle conversion. Ken Shoemake. In “Graphics Gems IV”, pp 222-229. Morgan Kaufmann, 1994.

  6. Arcball rotation control. Ken Shoemake. In “Graphics Gems IV”, pp 175-192. Morgan Kaufmann, 1994.

  7. Representing attitude: Euler angles, unit quaternions, and rotation vectors. James Diebel. 2006.

  8. A discussion of the solution for the best rotation to relate two sets of vectors. W Kabsch. Acta Cryst. 1978. A34, 827-828.

  9. Closed-form solution of absolute orientation using unit quaternions. BKP Horn. J Opt Soc Am A. 1987. 4(4):629-642.

  10. Quaternions. Ken Shoemake. http://www.sfu.ca/~jwa3/cmpt461/files/quatut.pdf

  11. From quaternion to matrix and back. JMP van Waveren. 2005. http://www.intel.com/cd/ids/developer/asmo-na/eng/293748.htm

  12. Uniform random rotations. Ken Shoemake. In “Graphics Gems III”, pp 124-132. Morgan Kaufmann, 1992.

  13. Quaternion in molecular modeling. CFF Karney. J Mol Graph Mod, 25(5):595-604

  14. New method for extracting the quaternion from a rotation matrix. Itzhack Y Bar-Itzhack, J Guid Contr Dynam. 2000. 23(6): 1085-1087.

  15. Multiple View Geometry in Computer Vision. Hartley and Zissermann. Cambridge University Press; 2nd Ed. 2004. Chapter 4, Algorithm 4.7, p 130.

  16. Column Vectors vs. Row Vectors. http://steve.hollasch.net/cgindex/math/matrix/column-vec.html

Examples#

>>> alpha, beta, gamma = 0.123, -1.234, 2.345
>>> origin, xaxis, yaxis, zaxis = [0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]
>>> I = identity_matrix()
>>> Rx = rotation_matrix(alpha, xaxis)
>>> Ry = rotation_matrix(beta, yaxis)
>>> Rz = rotation_matrix(gamma, zaxis)
>>> R = concatenate_matrices(Rx, Ry, Rz)
>>> euler = euler_from_matrix(R, 'rxyz')
>>> numpy.allclose([alpha, beta, gamma], euler)
True
>>> Re = euler_matrix(alpha, beta, gamma, 'rxyz')
>>> is_same_transform(R, Re)
True
>>> al, be, ga = euler_from_matrix(Re, 'rxyz')
>>> is_same_transform(Re, euler_matrix(al, be, ga, 'rxyz'))
True
>>> qx = quaternion_about_axis(alpha, xaxis)
>>> qy = quaternion_about_axis(beta, yaxis)
>>> qz = quaternion_about_axis(gamma, zaxis)
>>> q = quaternion_multiply(qx, qy)
>>> q = quaternion_multiply(q, qz)
>>> Rq = quaternion_matrix(q)
>>> is_same_transform(R, Rq)
True
>>> S = scale_matrix(1.23, origin)
>>> T = translation_matrix([1, 2, 3])
>>> Z = shear_matrix(beta, xaxis, origin, zaxis)
>>> R = random_rotation_matrix(numpy.random.rand(3))
>>> M = concatenate_matrices(T, R, Z, S)
>>> scale, shear, angles, trans, persp = decompose_matrix(M)
>>> numpy.allclose(scale, 1.23)
True
>>> numpy.allclose(trans, [1, 2, 3])
True
>>> numpy.allclose(shear, [0, math.tan(beta), 0])
True
>>> is_same_transform(R, euler_matrix(axes='sxyz', *angles))
True
>>> M1 = compose_matrix(scale, shear, angles, trans, persp)
>>> is_same_transform(M, M1)
True
>>> v0, v1 = random_vector(3), random_vector(3)
>>> M = rotation_matrix(angle_between_vectors(v0, v1), vector_product(v0, v1))
>>> v2 = numpy.dot(v0, M[:3,:3].T)
>>> numpy.allclose(unit_vector(v1), unit_vector(v2))
True

encodermap.misc.transformations._import_module(name, package=None, warn=False, prefix='_py_', ignore='_')[source]#

Try to import all public attributes from module into global namespace.

Existing attributes with name clashes are renamed with prefix. Attributes starting with underscore are ignored by default.

Return True on successful import.

encodermap.misc.xarray module#

encodermap.misc.xarray._validate_uri(str_)[source]#

Checks whether the str_ is a valid uri.

encodermap.misc.xarray.construct_xarray_from_numpy(traj: SingleTraj, data: np.ndarray, name: str, labels: Optional[list[str]] = None, check_n_frames: bool = False) xr.DataArray[source]#

Constructs an xarray dataarray from a numpy array.

Three different cases are recognized:
  • The input array in data has ndim == 2: This kind of feature/CV is a per-frame feature, like the membership to clusters. Every frame of every trajectory is assigned a single value (most often int values).

  • The input array in data has ndim == 3: This is also a per-frame feature/CV, but this time every frame is characterized by a series of values. These values can be dihedral angles in the backbone starting from the protein’s N-terminus to the C-terminus, or pairwise distance features between certain atoms. The xarray dataarray constructed from this kind of data will have a label dimension that will either contain generic labels like ‘CUSTOM_FEATURE FEATURE 0’ or labels defined by the featurizer such as ‘SIDECHAIN ANGLE CHI1 OF RESIDUE 1LYS’.

  • The input array in data has ndim == 4: Here, the same feature/CV is duplicated for the protein’s atoms. Besides the XYZ coordinates of the atoms, no other CVs should fall into this case. The labels will be 2-dimensional with ‘POSITION OF ATOM H1 IN RESIDUE 1LYS’ in dimension 0 and either ‘X’, ‘Y’ or ‘Z’ in dimension 1.
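
The three cases can be summarized in a small dispatch sketch that only looks at the shape of the input. The axis order (trajectory, frame, feature, coordinate), the helper names and the placeholder atom labels are assumptions made for illustration. The generic label format follows the ‘CUSTOM_FEATURE FEATURE 0’ pattern mentioned above.

```python
def generic_labels_sketch(name, n_features):
    """Generic labels such as 'CUSTOM_FEATURE FEATURE 0'."""
    return [f"{name.upper()} FEATURE {i}" for i in range(n_features)]


def describe_feature_sketch(shape, name):
    """Classify an array shape and propose labels for it."""
    if len(shape) == 2:
        # (n_trajs, n_frames): one value per frame, no label axis
        return "per-frame scalar", []
    if len(shape) == 3:
        # (n_trajs, n_frames, n_features): one label per feature
        return "per-frame vector", generic_labels_sketch(name, shape[2])
    if len(shape) == 4:
        # (n_trajs, n_frames, n_atoms, 3): atom labels plus a coordinate axis
        atoms = [f"ATOM {i}" for i in range(shape[2])]  # placeholder labels
        return "per-atom coordinates", [atoms, ["X", "Y", "Z"]]
    raise ValueError(f"Unsupported number of dimensions: {len(shape)}")


kind, labels = describe_feature_sketch((1, 20, 3), "custom_feature")
```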

Parameters:
  • traj (em.SingleTraj) – The trajectory we want to create the xarray dataarray for.

  • data (np.ndarray) – The numpy array we want to create the data from. Note that the data passed into this function should be expanded by np.expand_dims(a, axis=0) to add a new axis to the complete data containing the trajectories of a trajectory ensemble.

  • name (str) – The name of the feature. This can be chosen freely. Names like ‘central_angles’, ‘backbone_torsions’ would make the most sense.

  • labels (Optional[list]) – If you have specific labels for your CVs in mind, you can overwrite the generic ‘CUSTOM_FEATURE FEATURE 0’ labels with providing a list for this argument. If None is provided, generic names will be given to the features. Defaults to None.

  • check_n_frames (bool) – Whether to check whether the number of frames in the trajectory matches the len of the data in at least one dimension. Defaults to False.

Returns:

An xarray.DataArray.

Return type:

xarray.DataArray

Examples

>>> import numpy as np
>>> import encodermap as em
>>> from encodermap.misc.xarray import construct_xarray_from_numpy
>>> # load file from RCSB and give it traj num to represent it in a potential trajectory ensemble
>>> traj = em.load('https://files.rcsb.org/view/1GHC.pdb', traj_num=1)
>>> # single trajectory needs to be expanded into 'trajectory' axis
>>> z_coordinate = np.expand_dims(traj.xyz[:,:,2], 0)
>>> da = construct_xarray_from_numpy(traj, z_coordinate, 'z_coordinate')
>>> print(da.coords['Z_COORDINATE'].values[:2])
['Z_COORDINATE FEATURE 0' 'Z_COORDINATE FEATURE 1']
>>> print(da.coords['traj_num'].values)
[1]
>>> print(da.attrs['time_units'])
ps

encodermap.misc.xarray.unpack_data_and_feature(feat: Featurizer, traj: SingleTraj, input_data: np.ndarray, put_indices_into_attrs: bool = True) xr.Dataset[source]#

Makes a xarray.Dataset from data and a featurizer.

Usually, if you add multiple features to a PyEMMA featurizer, they are stacked along the feature axis. Let’s say, you have a trajectory with 20 frames and 3 residues. If you add the Ramachandran angles, you get 6 features (3xphi, 3xpsi). If you then also add the end-to-end distance as a feature, the data returned by PyEMMA will have the shape (20, 7). This function returns the correct indices, so that iteration of zip(Featurizer.active_features, indices) will yield the correct results.
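
The bookkeeping described above can be sketched as follows: given the number of columns each feature contributes, compute which columns of the stacked array belong to which feature. The feature names and the helper feature_indices_sketch are hypothetical.

```python
def feature_indices_sketch(feature_dims):
    """Map each feature name to its column indices in the stacked array."""
    indices, start = {}, 0
    for name, dim in feature_dims.items():
        indices[name] = list(range(start, start + dim))
        start += dim
    return indices


# 3 residues: 3 phi + 3 psi angles, followed by one end-to-end distance.
dims = {"ramachandran": 6, "end_to_end_distance": 1}
indices = feature_indices_sketch(dims)
n_columns = sum(dims.values())
```

Slicing the (20, 7) array with these index lists then recovers one block of columns per feature.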

Parameters:
  • feat (em.Featurizer) – An instance of the currently used encodermap.Featurizer.

  • traj (em.SingleTraj) – An instance of encodermap.SingleTraj that the data in input_data was computed from.

  • input_data (np.ndarray) – The data, as returned from PyEMMA.

  • put_indices_into_attrs (bool) – Whether to put the indices into the attrs. This needs to be False, when Ensembles are loaded because the Features of the ensemble load function do not match the real indices that should be there.

Returns:

An xarray.Dataset with all features in a nice format.

Return type:

xarray.Dataset

encodermap.misc.xarray_save_wrong_hdf5 module#

Allows the combined storing of CVs and trajectories in single HDF5/NetCDF4 files.

These files represent collated and completed trajectory ensembles, which can be lazy-loaded (memory efficient) and used as training input for encodermap’s NNs.

encodermap.misc.xarray_save_wrong_hdf5._get_scheduler(get=None, collection=None) str | None[source]#

Determine the dask scheduler that is being used.

None is returned if no dask scheduler is active.

See Also#

dask.base.get_scheduler

encodermap.misc.xarray_save_wrong_hdf5._to_netcdf(dataset: Dataset, path_or_file: Optional[str] = None, mode: Optional[str] = 'w', format: Optional[str] = None, group: Optional[str] = None, engine: Optional[str] = None, encoding: Optional[Mapping] = None, unlimited_dims: Optional[Iterable[Hashable]] = None, compute: bool = True, multifile: bool = False, invalid_netcdf: bool = False) Union[None, Delayed][source]#

This function creates an appropriate datastore for writing a dataset to disk as a netCDF file.

See Dataset.to_netcdf for full API docs.

The multifile argument is only for the private use of save_mfdataset.

encodermap.misc.xarray_save_wrong_hdf5._validate_attrs(dataset: Dataset) None[source]#

attrs must have a string key and a value which is either: a number, a string, an ndarray or a list/tuple of numbers/strings.
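
The rule can be sketched as a small validator. To stay dependency-free, this sketch leaves out the ndarray case and only accepts numbers, strings and lists/tuples of those. The name validate_attrs_sketch is hypothetical.

```python
import numbers


def validate_attrs_sketch(attrs):
    """Raise TypeError if an attribute key or value has an invalid type."""
    allowed = (str, numbers.Number)
    for key, value in attrs.items():
        if not isinstance(key, str):
            raise TypeError(f"Invalid attribute name: {key!r}")
        if isinstance(value, allowed):
            continue
        if isinstance(value, (list, tuple)) and all(
            isinstance(v, allowed) for v in value
        ):
            continue
        raise TypeError(f"Invalid value for attr {key!r}: {value!r}")


validate_attrs_sketch({"time_units": "ps", "traj_num": 1, "box": [1.0, 2.0, 3.0]})
```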

encodermap.misc.xarray_save_wrong_hdf5._validate_dataset_names(dataset: Dataset) None[source]#

DataArray.name and Dataset keys must be a string or None.

encodermap.misc.xarray_save_wrong_hdf5.save_netcdf_alongside_mdtraj(fname: str, dataset: Dataset) None[source]#

Module contents#