encodermap.misc package#
Submodules#
encodermap.misc.backmapping module#
Backmapping functions to create new atomistic conformations from intrinsic coordinates.
- mdtraj_backmapping(top: Path | str | int | Topology | None, dihedrals: ndarray | None, sidechain_dihedrals: ndarray | None, trajs: TrajEnsemble | SingleTraj | None, remove_component_size: int, verify_every_rotation: bool, angle_type: Literal['degree', 'radian'], omega: bool, guess_amid_atoms: bool, return_indices: Literal[False], parallel: bool, progbar: Any | None) Trajectory [source]#
- mdtraj_backmapping(top: Path | str | int | Topology | None, dihedrals: ndarray | None, sidechain_dihedrals: ndarray | None, trajs: TrajEnsemble | SingleTraj | None, remove_component_size: int, verify_every_rotation: bool, angle_type: Literal['degree', 'radian'], omega: bool, guess_amid_atoms: bool, return_indices: Literal[True], parallel: bool, progbar: Any | None) tuple[Trajectory, dict[str, ndarray]]
Uses MDTraj and Christoph Gohlke’s transformations.py to rotate the bonds in the provided topology.
Todo
Make this faster. Maybe write a C or FORTRAN implementation.
- General procedure:
- Decide which topology to use. If the TrajEnsemble contains different topologies, the dihedrals and sidechain_dihedrals arrays need to be altered so that the correct dihedrals are used. Because EncoderMap is trained on the full input, dihedrals and sidechain_dihedrals contain the dihedrals of the topology in the TrajEnsemble with the most such angles. Some SingleTraj objects in the TrajEnsemble might not contain all of these angles. If, for example, an amino acid has been modified so that the mutant contains more sidechain dihedrals than the wild type, the correct sidechain dihedrals for the wild type need to be selected.
- Get the indices of the far sides of the rotations. The graph is gradually broken apart and the longer sub-graphs are kept.
- Extend the trajectory. The lengths of dihedrals and sidechain_dihedrals should match. The frame given by top will be duplicated len(dihedrals) times.
- Get the current angles. We know what the final angles should be, but not how far to rotate the bonds. This can be done by getting the difference between the current and the target angle.
- Rotate the bonds. Using Christoph Gohlke’s transformations.py, the rotation matrix is constructed and the array is padded with zeros to resemble an array of quaternions.
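The "far side" step of the procedure above can be illustrated with a small, library-free sketch. It is not EncoderMap's implementation: the function name far_side_atoms, the toy bond list, and the choice to always move the side attached to the third dihedral atom are assumptions made for illustration. It also shows why a bond inside a ring (as in proline) cannot be rotated freely: removing it does not split the graph.

```python
from collections import deque


def far_side_atoms(bonds, dihedral):
    """Return the atoms that would move when rotating around the central bond.

    `bonds` is a list of (i, j) atom index pairs, `dihedral` holds four atom
    indices. Both names are hypothetical and only used in this sketch.
    """
    _, b, c, _ = dihedral
    # Build an adjacency map of the bond graph without the central bond b-c.
    neighbors = {}
    for i, j in bonds:
        if {i, j} == {b, c}:
            continue
        neighbors.setdefault(i, set()).add(j)
        neighbors.setdefault(j, set()).add(i)
    # A breadth-first search starting at c collects everything on the far side.
    seen = {c}
    queue = deque([c])
    while queue:
        atom = queue.popleft()
        for nxt in neighbors.get(atom, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    if b in seen:
        # The two sides are still connected, so the bond is part of a ring.
        raise ValueError("central bond is part of a ring; the graph does not split")
    return sorted(seen)


# Toy chain 0-1-2-3-4 with a branch 2-5.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (2, 5)]
far = far_side_atoms(bonds, dihedral=(0, 1, 2, 3))
print(far)  # [2, 3, 4, 5]
```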
- Parameters:
top (Optional[str]) – The topology file to use.
dihedrals (Optional[np.ndarray]) – The dihedrals to put onto the trajectory. len(dihedrals) is number of frames of output trajectory. dihedrals.shape[1] needs to be the same as the number of dihedrals in the topology. Can be None, in which case dihedrals and sidechain dihedrals will be faked.
sidechain_dihedrals (Optional[np.ndarray]) – The sidechain dihedrals to put onto the trajectory. If None is provided, the sidechains are kept like they were in the topology. Defaults to None.
trajs (Optional[em.TrajEnsemble, em.SingleTraj]) – Encodermap TrajEnsemble class. It can accelerate the loading of current dihedral angles. Checks if the provided topology is part of trajs. Defaults to None.
verify_every_rotation (bool) – Whether to verify that every rotation succeeded.
angle_type (Literal["degree", "radian"]) – Whether the input is in degrees or radians. Input in degrees will be converted to radians.
omega (bool) – Whether your input backbone dihedrals contain the omega angle.
return_indices (bool) –
Whether to not only return the back-mapped trajectory, but also a dict of labels. This dict contains the keys:
’dihedrals_labels’
’generic_dihedrals_labels’
’side_dihedrals_labels’
’generic_side_dihedrals_labels’
These labels match the indices of the returned dihedrals with the input MD structures in top and/or trajs. This can be useful to make sure that input dihedrals match output dihedrals, because some proline dihedrals cannot be adjusted. They are filtered out before backmapping, and the labels give the names of all dihedrals that were adjusted. See the example below.
Examples
>>> from pathlib import Path
>>> import numpy as np
>>> import encodermap as em
>>> from pprint import pprint
>>> output_dir = Path(
...     em.get_from_kondata(
...         "OTU11",
...         mk_parentdir=True,
...         silence_overwrite_message=True,
...     ),
... )
>>> # assign how many backbone angles we need
>>> traj = em.load(output_dir / "OTU11_wt_only_prot.pdb")
>>> traj.load_CV("central_dihedrals")
>>> n_angles = traj.central_dihedrals.shape[-1]
>>> n_angles
732
>>> # create some fake dihedrals with a uniform distribution between -pi and pi
>>> dihedrals = np.random.uniform(low=-np.pi, high=np.pi, size=(5, n_angles))
>>> out, index = em.misc.backmapping.mdtraj_backmapping(
...     top=output_dir / "OTU11_wt_only_prot.pdb",
...     dihedrals=dihedrals,
...     remove_component_size=10,
...     return_indices=True,
... )
>>> out = em.SingleTraj(out)
>>> out.load_CV("central_dihedrals")
>>> # Here you will see which indices were automatically dropped during backmapping.
>>> # They will be proline phi angles, as these angles cannot be
>>> # freely rotated.
>>> all_coords = set(out._CVs.coords["CENTRAL_DIHEDRALS"].values)
>>> indexed_coords = set(index['dihedrals_labels'])
>>> pprint(all_coords - indexed_coords)
{'CENTERDIH PHI RESID PRO: 8 CHAIN 0',
 'CENTERDIH PHI RESID PRO: 70 CHAIN 0',
 'CENTERDIH PHI RESID PRO: 73 CHAIN 0',
 'CENTERDIH PHI RESID PRO: 80 CHAIN 0',
 'CENTERDIH PHI RESID PRO: 151 CHAIN 0',
 'CENTERDIH PHI RESID PRO: 200 CHAIN 0',
 'CENTERDIH PHI RESID PRO: 205 CHAIN 0',
 'CENTERDIH PHI RESID PRO: 231 CHAIN 0',
 'CENTERDIH PHI RESID PRO: 234 CHAIN 0',
 'CENTERDIH PHI RESID PRO: 238 CHAIN 0'}
encodermap.misc.clustering module#
Functions for building clusters.
encodermap.misc.distances module#
EncoderMap implements different distance computations.
Normal: Euclidean distance between two points.
Periodic: Euclidean distance between two points lying in a periodic space.
Pairwise: Euclidean distance between sets of points. Either with or without periodicity.
- pairwise_dist(positions, squared=False, flat=False)[source]#
Tensorflow implementation of scipy.spatial.distance.cdist.
Returns a tensor with shape (positions.shape[0], positions.shape[0]), i.e. one row and one column per point. This tensor is the distance matrix of the provided positions. The matrix is hollow, i.e., the diagonal elements are zero.
Thanks to https://omoindrot.github.io/triplet-loss for this implementation. Find an archived link here: https://archive.is/lNT2L
- Parameters:
positions (Union[np.ndarray, tf.Tensor]) – Collection of n-dimensional points. Axis 0 of positions indexes the points, axis 1 indexes the dimensions.
squared (bool) – Whether to return the pairwise squared Euclidean distance matrix or normal Euclidean distance matrix. Defaults to False.
flat (bool) – Whether to return only the lower triangle of the hollow matrix. Setting this to true mimics the behavior of scipy.spatial.distance.pdist. Defaults to False.
- Returns:
The distances.
- Return type:
tf.Tensor
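The behavior described for pairwise_dist can be sketched in NumPy. This is an illustration of the dot-product trick used in the linked triplet-loss article, not the TensorFlow code of EncoderMap; the name pairwise_dist_np is hypothetical, and the ordering of the flattened lower triangle is an assumption.

```python
import numpy as np


def pairwise_dist_np(positions, squared=False, flat=False):
    """NumPy sketch of a pairwise Euclidean distance matrix (hypothetical helper)."""
    positions = np.asarray(positions, dtype=float)
    # ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    dot = positions @ positions.T
    sq_norms = np.diag(dot)
    dist = sq_norms[:, None] - 2.0 * dot + sq_norms[None, :]
    # Rounding errors can produce tiny negative values; clip them to zero.
    dist = np.maximum(dist, 0.0)
    if not squared:
        dist = np.sqrt(dist)
    if flat:
        # Keep only the entries below the diagonal of the hollow matrix.
        rows, cols = np.tril_indices(len(positions), k=-1)
        dist = dist[rows, cols]
    return dist


points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
full = pairwise_dist_np(points)
lower = pairwise_dist_np(points, flat=True)
print(full.shape)  # (3, 3)
```

With three points, the full matrix has shape (3, 3) and the flat variant holds the three values below the diagonal.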
- pairwise_dist_periodic(positions, periodicity)[source]#
Pairwise distances using periodicity.
- Parameters:
positions (tf.Tensor) – The positions of the points. Currently, only 2D arrays with positions.shape[0] == n_points and positions.shape[1] == 1 (rotational values) are supported.
periodicity (float) – The periodicity of the data. Most often you will use either 2*pi or 360.
- Returns:
The pairwise distances.
- Return type:
tf.Tensor
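A minimal NumPy sketch of periodic pairwise distances for an input of shape (n_points, 1) is given below. The function name pairwise_dist_periodic_np and the returned (n_points, n_points) shape are assumptions, and the sketch expects all values to lie within a single period.

```python
import numpy as np


def pairwise_dist_periodic_np(positions, periodicity):
    """Pairwise distances on a periodic axis (hypothetical NumPy sketch)."""
    positions = np.asarray(positions, dtype=float)
    # Broadcasting (n, 1) against (1, n) gives all pairwise differences.
    diff = np.abs(positions - positions.T)
    # On a periodic axis the shorter way around may cross the boundary.
    return np.minimum(diff, periodicity - diff)


angles = np.array([[-3.0], [0.0], [3.0]])
dists = pairwise_dist_periodic_np(angles, periodicity=2 * np.pi)
print(dists.shape)  # (3, 3)
```

The points at -3.0 and 3.0 are 6.0 apart on the real line, but only 2*pi - 6.0 apart on the circle.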
- periodic_distance(a, b, periodicity=6.283185307179586)[source]#
Calculates distance between two points and respects periodicity.
If the provided dataset is periodic (e.g. angles and torsion angles), the returned distance is corrected.
- Parameters:
a (tf.Tensor) – Coordinate of point a.
b (tf.Tensor) – Coordinate of point b.
periodicity (float) – The periodicity (i.e. the box length or maximum angle) of your data. Defaults to 2*pi. Provide float('inf') for no periodicity.
- Returns:
The distances accounting for periodicity.
- Return type:
tf.Tensor
Example
>>> import numpy as np
>>> import tensorflow as tf
>>> import encodermap as em
>>> x = tf.convert_to_tensor(np.array([[1.5], [1.5]]))
>>> y = tf.convert_to_tensor(np.array([[-3.1], [-3.1]]))
>>> r = em.misc.periodic_distance(x, y)
>>> print(r.numpy())
[[1.68318531]
 [1.68318531]]
- periodic_distance_np(a, b, periodicity=6.283185307179586)[source]#
Calculates distance between two points and respects periodicity.
If the provided dataset is periodic (e.g. angles and torsion angles), the returned distance is corrected.
- Parameters:
a (np.ndarray) – Coordinate of point a.
b (np.ndarray) – Coordinate of point b.
periodicity (float) – The periodicity (i.e. the box length or maximum angle) of your data. Defaults to 2*pi. Provide float('inf') for no periodicity.
- Returns:
The distances accounting for periodicity.
- Return type:
np.ndarray
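The correction described for periodic_distance_np can be written in a few lines of NumPy. The sketch below is an independent illustration under the assumption that both inputs lie within one period; periodic_distance_sketch is a hypothetical name, not the library function.

```python
import numpy as np


def periodic_distance_sketch(a, b, periodicity=2 * np.pi):
    """Distance between a and b on a periodic axis (hypothetical sketch)."""
    d = np.abs(np.asarray(b, dtype=float) - np.asarray(a, dtype=float))
    # Take the shorter of the direct path and the path across the boundary.
    return np.minimum(d, periodicity - d)


x = np.array([[1.5], [1.5]])
y = np.array([[-3.1], [-3.1]])
r = periodic_distance_sketch(x, y)
r_open = periodic_distance_sketch(x, y, periodicity=float("inf"))
print(r.round(8))
```

The direct distance is 4.6, so the periodic result is 2*pi - 4.6, roughly 1.68318531, which agrees with the value shown in the periodic_distance example. With an infinite periodicity, the plain distance 4.6 is returned.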
encodermap.misc.function_def module#
Wraps tensorflow’s tf.function again to accept a debug=True or debug=False argument.
With debug=True, the function will not be compiled. With debug=False (which is the default), it will be compiled.
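The pattern of wrapping a compiling decorator so that it accepts a debug switch can be sketched without TensorFlow. In the sketch below, stand_in_compile is a placeholder for a compiling decorator such as tf.function, and the name function as well as the is_compiled marker are assumptions made purely for illustration.

```python
import functools


def stand_in_compile(func):
    """Placeholder for a compiling decorator such as tf.function (assumption)."""

    @functools.wraps(func)
    def compiled(*args, **kwargs):
        return func(*args, **kwargs)

    # Marker so that the two code paths can be told apart in this sketch.
    compiled.is_compiled = True
    return compiled


def function(debug=False):
    """Decorator factory: skip compilation when debug is True."""

    def decorator(func):
        if debug:
            # Return the plain Python function so it can be stepped through.
            return func
        return stand_in_compile(func)

    return decorator


@function(debug=True)
def add_eager(a, b):
    return a + b


@function()
def add_compiled(a, b):
    return a + b


print(add_eager(1, 2), add_compiled(1, 2))  # 3 3
```

Both functions return the same result; only the second one passes through the compiling wrapper.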
encodermap.misc.misc module#
Miscellaneous functions.
- create_n_cube(n=3, points_along_edge=500, sigma=0.05, same_colored_edges=3, seed=None)[source]#
Creates points along the edges of an n-dimensional unit hyper-cube.
The cube is created using networkx.hypercube_graph and points are placed along the edges of the cube. By providing a sigma value, the points can be shifted by some Gaussian noise.
- Parameters:
n (int) – The dimension of the Hypercube (can also take 1 or 2). Defaults to 3.
points_along_edge (int) – How many points should be placed along any edge. By increasing the number of dimensions, the number of edges increases, which also increases the total number of points. Defaults to 500.
sigma (float) – The sigma value for np.random.normal which introduces Gaussian noise to the positions of the points. Defaults to 0.05.
same_colored_edges (int) – How many edges of the Hypercube should be colored with the same color. This can be used to later better visualize the edges of the cube. Defaults to 3.
seed (Optional[int]) – If an int is provided, this will be used as a seed for np.random and fix the random state. Defaults to None which produces random results every time this function is called.
- Returns:
- A tuple containing the following:
np.ndarray: The coordinates of the points.
np.ndarray: Integers that can be used for coloration.
- Return type:
tuple[np.ndarray, np.ndarray]
Example
>>> from encodermap.misc.misc import create_n_cube
>>> # A sigma value of zero means no noise at all.
>>> coords, colors = create_n_cube(2, sigma=0)
>>> coords[0]
array([0., 0.])
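To make the geometry concrete, the sketch below places points along the edges of an n-dimensional unit hypercube. It deliberately swaps networkx.hypercube_graph for itertools.product and omits the coloring, so it is only an approximation of what create_n_cube does; hypercube_edge_points is a hypothetical name.

```python
from itertools import product

import numpy as np


def hypercube_edge_points(n=3, points_along_edge=5, sigma=0.0, seed=None):
    """Points along the edges of a unit hypercube (hypothetical sketch)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, points_along_edge)[:, None]
    segments = []
    for start in product((0, 1), repeat=n):
        for axis in range(n):
            # Each edge connects two vertices that differ in exactly one
            # coordinate. Only walking from 0 to 1 visits every edge once.
            if start[axis] == 0:
                end = np.array(start, dtype=float)
                end[axis] = 1.0
                begin = np.array(start, dtype=float)
                segments.append(begin + t * (end - begin))
    coords = np.concatenate(segments)
    if sigma > 0:
        # Optional Gaussian noise on every coordinate.
        coords = coords + rng.normal(scale=sigma, size=coords.shape)
    return coords


cube = hypercube_edge_points(n=3, points_along_edge=5)
print(cube.shape)  # (60, 3)
```

An n-dimensional hypercube has n * 2**(n - 1) edges, so the 3D cube has 12 edges and 12 * 5 = 60 points here. This is why the total number of points grows with the dimension.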
- get_full_common_str_and_ref(trajs, tops, common_str)[source]#
Matches traj_files, top_files and common string and returns lists with the same length matching the provided common str.
- Parameters:
trajs (list[str]) – A list of str pointing to trajectory files.
tops (list[str]) – A list of str pointing to topology files.
common_str (list[str]) – A list of strings that can be found in both trajs and tops (i.e. substrings).
- Returns:
- A tuple containing the following:
list[str]: A list of str with the traj file names.
list[str]: A list of str with the top file names.
list[str]: A list of str with the common_str’s.
All lists have the same length.
- Return type:
tuple
- plot_model(model, input_dim=None)[source]#
Plots keras model using tf.keras.utils.plot_model
- Parameters:
model (tf.keras.Model)
input_dim (Optional[Sequence[int]])
- Return type:
Optional[Image]
- run_path(path)[source]#
Creates a directory at “path/run{i}”, where i is the smallest integer for which the directory does not yet exist.
- Examples:
>>> import encodermap as em
>>> import tempfile
>>> import os
>>>
>>> def sort_key(inp: str) -> int:
...     return int(inp[-1])
>>>
>>> with tempfile.TemporaryDirectory() as td:
...     # create some directories
...     os.makedirs(os.path.join(td, "run0"))
...     os.makedirs(os.path.join(td, "run1"))
...     # em.misc.run_path will automatically advance the counter to 'run2'
...     new_path = em.misc.run_path(td)
...     print(new_path)
...     print(sorted(os.listdir(td), key=sort_key))
/tmp/.../run2
['run0', 'run1', 'run2']
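The counter logic behind run_path can be sketched with the standard library alone. The helper next_run_path below is a hypothetical stand-in written for illustration and may differ from the library function in details such as the return type.

```python
import os
import tempfile
from pathlib import Path


def next_run_path(path):
    """Create and return the first non-existing 'run{i}' directory (sketch)."""
    base = Path(path)
    i = 0
    # Advance the counter until an unused directory name is found.
    while (base / f"run{i}").exists():
        i += 1
    new_path = base / f"run{i}"
    new_path.mkdir(parents=True)
    return new_path


with tempfile.TemporaryDirectory() as td:
    (Path(td) / "run0").mkdir()
    (Path(td) / "run1").mkdir()
    created_name = next_run_path(td).name
    existing = sorted(os.listdir(td))

print(created_name)  # run2
print(existing)      # ['run0', 'run1', 'run2']
```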
encodermap.misc.rotate module#
Helpers to apply rotations to molecular coordinates.
- mdtraj_rotate(traj, angles, indices, deg=False, check_cyclic_backbone=True, verify_every_rotation=False, drop_proline_angles=False, delete_sulfide_bridges=True)[source]#
Uses MDTraj and Christoph Gohlke’s transformations.py to set bond rotations in the provided traj.
Input can be in radian (set deg to False) or degree (set deg to True).
- General procedure:
- Carry out some checks. Shapes of input need to be correct. traj needs to have a single frame, must not be a cyclic protein, and must not contain multiple chains.
- Get the indices of the near and far side of the rotations. Every dihedral angle is indexed by 4 atoms. The rotational axis is located between the central two atoms (dihedral[1:3]).
- Extend the trajectory. The single frame of traj will be duplicated len(angles) times.
- Get the current angles. We know what the final angles should be, but not how far to rotate the bonds. This can be done by getting the difference between the current and the target angle.
- Rotate the bonds. Using Christoph Gohlke’s transformations.py, the rotation matrix is constructed and the array is padded with zeros to resemble an array of quaternions.
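The last two steps of the procedure above (take the difference between current and target angle, then rotate the far side around the central bond) can be sketched in NumPy. The sketch uses Rodrigues' rotation formula directly instead of the rotation matrices from transformations.py, and all names, coordinates, and the far-side selection are made up for illustration.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    return np.arctan2(np.dot(np.cross(b1, v), w), np.dot(v, w))


def rotate_about_axis(points, origin, axis, angle):
    """Rotate points around an axis through origin (Rodrigues' formula)."""
    k = axis / np.linalg.norm(axis)
    rel = points - origin
    return (
        origin
        + rel * np.cos(angle)
        + np.cross(k, rel) * np.sin(angle)
        + np.outer(rel @ k, k) * (1.0 - np.cos(angle))
    )


# Four atoms defining the dihedral plus one more atom on the far side.
xyz = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [0.0, 1.5, 2.0],
])
indices = [0, 1, 2, 3]
far_side = [3, 4]  # assumed to be known from the graph-splitting step

target = np.deg2rad(-60.0)
current = dihedral(*xyz[indices])
delta = target - current  # how far the far side has to be rotated

new_xyz = xyz.copy()
new_xyz[far_side] = rotate_about_axis(
    xyz[far_side], origin=xyz[2], axis=xyz[2] - xyz[1], angle=delta
)
new_angle = dihedral(*new_xyz[indices])
print(np.rad2deg(current).round(1), np.rad2deg(new_angle).round(1))  # 90.0 -60.0
```

Only the far-side atoms move, and bond lengths are preserved because the rotation axis passes through the central bond.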
- Parameters:
traj (mdtraj.Trajectory) – The trajectory to use. Needs to have only one frame.
angles (list[list[float]], np.ndarray) – The values the angles should assume after the rotations. This arg can either be a nested list with floats or (better) a numpy array with the shape angles.shape = (n_new_frames, n_indexed_dihedrals). Here, angles.shape[0] defines how many frames the output trajectory is going to have and angles.shape[1] should be the same as the number of dihedrals you want to rotate around. A shape of (4, 2) would indicate that two dihedrals are going to be used for rotation and the output trajectory is going to have 4 frames.
indices (list[list[int]], np.ndarray) – A list of ints indexing the dihedrals to be rotated around. Naturally indices.shape[1] needs to be 4. Additionally indices.shape[0] needs to be the same as angles.shape[1]. indices indexes the angles along axis 1 and angles sets the values of those angles along axis 0.
deg (bool, optional) – Whether argument angles is in deg. Defaults to False.
check_cyclic_backbone (bool) – Whether the backbone should be checked for being cyclic. Rotating around a backbone angle for a cyclic protein is not possible and thus an Exception is raised. However, rotation around sidechain dihedrals is still possible. If you are sure you want to rotate sidechain dihedrals set this argument to False to prevent the cyclic backbone check. Defaults to True.
verify_every_rotation (bool) – Whether to verify that every rotation succeeded.
drop_proline_angles (bool) – Whether to automatically drop proline angles and indices.
delete_sulfide_bridges (bool) – Whether to automatically remove bonds between cysteine residues.
- Raises:
Exception – If the input seems like it is in degrees, but deg is False.
Exception – If traj contains more than 1 frame.
Exception – If traj is not fully connected.
Exception – If the shapes of angles and indices mismatch.
Exception – If shape[1] of indices is not 4.
Exception – If backbone is cyclic and check_cyclic_backbone is True.
Exception – If the first rotation does not reach a tolerance of 1e-3.
- Returns:
An MDTraj trajectory with applied rotations.
- Return type:
mdtraj.Trajectory
Examples
>>> import mdtraj as md
>>> import numpy as np
>>> from encodermap.misc.rotate import mdtraj_rotate
>>> # load an arbitrary protein from the pdb
>>> traj = md.load_pdb('https://files.rcsb.org/view/1GHC.pdb')
>>> print(traj.n_frames)
14
>>> # traj has multiple frames so we remove all but one
>>> traj = traj[0]
>>> # Get indices of psi_angles
>>> psi_indices, old_values = md.compute_psi(traj)
>>> # set every psi angle to be either 0 or 180 deg
>>> angles = np.full((len(psi_indices), 2), [0, 180]).T
>>> # create the new traj with the desired rotations
>>> out_traj = mdtraj_rotate(traj, angles, psi_indices, deg=True)
>>> print(out_traj.n_frames)
2
>>> # check values
>>> _, new_values = md.compute_psi(out_traj)
>>> print(np.abs(np.rad2deg(new_values[0, :2]).round(0)))  # prevent rounding inconsistencies
[0. 0.]
>>> print(np.abs(np.rad2deg(new_values[1, :2]).round(0)))  # prevent rounding inconsistencies
[180. 180.]
encodermap.misc.saving_loading_models module#
Implementation of saving and loading models.
- load_model(autoencoder: None | 'AutoencoderClass', checkpoint_path: str | Path, trajs: TrajEnsemble | None, sparse: bool, dataset: ndarray | DatasetV2 | None, print_message: bool, submodel: Literal[None], use_previous_model: bool, compat: bool) AutoencoderClass [source]#
- load_model(autoencoder: None | 'AutoencoderClass', checkpoint_path: str | Path, trajs: TrajEnsemble | None, sparse: bool, dataset: ndarray | DatasetV2 | None, print_message: bool, submodel: Literal['encoder', 'decoder'], use_previous_model: bool, compat: bool) Model
Reloads a model from a checkpoint path.
Loads the .keras files produced by EncoderMap. The old legacy .model files can still be loaded by this function. Alternatively, use the load_model_legacy function directly.
- Parameters:
autoencoder (Union[None, "AutoencoderClass"]) – Kept for legacy reasons. The old .model files had a list of “custom_objects” that was created by the autoencoder classes (AutoEncoder, EncoderMap, AngleDihedralCartesianEncoderMap) and needed to be supplied when reloading the models from disk. The new implementations use the from_config and get_config implementations of serializable keras objects and thus, the layers and cost functions can save their own state. Is only needed to load legacy models and can be None if a new .keras model is loaded.
checkpoint_path (Union[str, Path]) – Can be either the path to a .keras file or to a directory with multiple .keras files in which case, the most recent .keras file will be loaded.
trajs (Optional[TrajEnsemble]) – A TrajEnsemble class for when an AngleDihedralCartesianEncoderMap is reloaded.
sparse (bool) – This argument is also only needed to load legacy .model files. Defaults to False.
dataset (Optional[Union[tf.data.Dataset, np.ndarray]]) – A pass-through to the dataset argument of the autoencoder classes (AutoEncoder, EncoderMap, AngleDihedralCartesianEncoderMap) which all can take a tf.data.Dataset. Can be None, in which case the data will be sourced differently (the EncoderMap class uses example data from a 4D hypercube, the AngleDihedralCartesianEncoderMap uses the data from the provided trajs).
print_message (bool) – Whether to print some debug information. Defaults to False.
submodel (Optional[Literal["encoder", "decoder"]]) – Whether to only load a specific submodel. In order to use this argument, a file with the name *encoder.keras or *decoder.keras has to be in the directory specified by checkpoint_path.
use_previous_model (bool) – Whether to load a model from an intermediate checkpoint step.
compat (bool) – Whether to fix a parameters.json file that has been saved with the legacy .model file.
- Returns:
- A tf.keras.models.Model when submodel is specified, and an appropriate “AutoencoderClass” otherwise.
- Return type:
Union[tf.keras.models.Model, “AutoencoderClass”]
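How a checkpoint_path that points to a directory might be resolved to a single file can be sketched with pathlib. The sketch assumes that "most recent" means the latest modification time and ignores the separate *encoder.keras and *decoder.keras files; both points are assumptions, and resolve_checkpoint is a hypothetical helper, not part of EncoderMap.

```python
import os
import tempfile
from pathlib import Path


def resolve_checkpoint(checkpoint_path):
    """Return a .keras file for a file or directory path (hypothetical sketch)."""
    path = Path(checkpoint_path)
    if path.is_file():
        return path
    # Assumption: the most recent file is the one modified last.
    candidates = sorted(path.glob("*.keras"), key=lambda p: p.stat().st_mtime)
    if not candidates:
        raise FileNotFoundError(f"No .keras files found in {path}")
    return candidates[-1]


with tempfile.TemporaryDirectory() as td:
    older = Path(td) / "saved_model_100.keras"
    newer = Path(td) / "saved_model_200.keras"
    older.touch()
    newer.touch()
    # Set explicit modification times so the order is unambiguous.
    os.utime(older, (1_000, 1_000))
    os.utime(newer, (2_000, 2_000))
    picked_name = resolve_checkpoint(td).name
    direct_name = resolve_checkpoint(older).name

print(picked_name)  # saved_model_200.keras
```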
- save_model(model, main_path, inp_class_name=None, step=None, print_message=False)[source]#
Saves a model in the portable .keras format.
- Parameters:
model (tf.keras.models.Model) – The keras model to save. If the keras model has the attribute ‘encoder_model’ the encoder_model will be saved separately. The same with the attribute ‘decoder_model’.
main_path (Union[str, Path]) – Which directory to save the model to. If step is None, the name will be saved_model_{time}.keras, where time is the current time as an ISO-8601 formatted string.
step (Optional[int]) – Can be None, in which case the model will be saved using the current time. Otherwise, the step argument will be used like so: saved_model_{step}.keras. Defaults to None.
print_message (bool) – Whether to print a message after saving the model. Defaults to False.
inp_class_name (str | None)
- Returns:
The path, where the model was saved.
- Return type:
Path
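The naming scheme described for main_path and step can be sketched as follows. The helper model_filename is hypothetical, and the exact ISO-8601 variant (timezone, precision) used by EncoderMap is not specified in the text above, so the timestamp format here is an assumption.

```python
from datetime import datetime
from pathlib import Path


def model_filename(main_path, step=None):
    """Build the target file name for a saved model (hypothetical sketch)."""
    main_path = Path(main_path)
    if step is None:
        # Assumption: local time with UTC offset and second precision.
        stamp = datetime.now().astimezone().replace(microsecond=0).isoformat()
        return main_path / f"saved_model_{stamp}.keras"
    return main_path / f"saved_model_{step}.keras"


by_step = model_filename("runs/run0", step=100)
by_time = model_filename("runs/run0")
print(by_step.name)  # saved_model_100.keras
```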
encodermap.misc.summaries module#
Functions that write stuff to tensorboard. Mainly used for the image callbacks.
- add_layer_summaries(layer, step=None)[source]#
Adds summaries for a layer to Tensorboard.
- Parameters:
layer (tf.keras.layers.Layer) – The layer.
step (Union[tf.Tensor, int, None], optional) – The current step. Can be either a Tensor or None. Defaults to None.
- Return type:
None
- image_summary(lowd, step=None, scatter_kws=None, hist_kws=None, additional_fns=None, backend='matplotlib')[source]#
Writes an image to Tensorboard.
- Parameters:
lowd (np.ndarray) – The data to plot. Usually that will be the output of the latent space of the Autoencoder. This array has to be of dimensionality 2 (rows and columns). The first two values of each row will be used as xy coordinates in a scatter plot.
step (Optional[int]) – The training step under which you can find the image in tensorboard. Defaults to None.
scatter_kws (Optional[dict[str, Any]]) – A dict with items that plotly.express.scatter() will accept. If None is provided, the dict {'size_max': 10, 'opacity': 0.2} will be passed to px.scatter(), which sets an appropriate size of scatter points for the size of datasets encodermap is usually used for.
hist_kws (Optional[dict[str, Any]]) – A dict with items that encodermap.plot.plotting._plot_free_energy() will accept. If None is provided a dict with bins 50 will be passed to encodermap.plot.plotting._plot_free_energy(**{‘bins’: 50}). You can choose a colormap here by providing {‘bins’: 50, ‘cmap’: ‘plasma’} for this argument.
additional_fns (Optional[Sequence[Callable]]) – A sequence of functions that take the data of the latent space and return a tf.Tensor that can be logged to tensorboard with tf.summary.image().
backend (Literal["matplotlib", "plotly"]) – Which backend to use for plotting. Defaults to ‘matplotlib’.
- Raises:
AssertionError – When lowd.ndim is not 2 and when len(lowd) != len(ids)
- Return type:
None
encodermap.misc.xarray module#
EncoderMap’s xarray manipulation functions.
EncoderMap uses xarray datasets to save CV data alongside trajectory data. These functions implement the creation of such xarray datasets.
- construct_xarray_from_numpy(traj, data, name, deg=False, labels=None, check_n_frames=False)[source]#
Constructs an xarray.DataArray from a numpy array.
- Three cases are recognized:
- The input array in data has ndim == 2. This kind of feature/CV is a per-frame feature, like the membership to clusters. Every frame of every trajectory is assigned a single value (most often int values).
- The input array in data has ndim == 3. This is also a per-frame feature/CV, but this time every frame is characterized by a series of values. These values can be dihedral angles in the backbone starting from the protein’s N-terminus to the C-terminus, or pairwise distance features between certain atoms. The xarray DataArray constructed from this kind of data will have a label dimension that will either contain generic labels like ‘CUSTOM_FEATURE FEATURE 0’ or labels defined by the featurizer, such as ‘SIDECHAIN ANGLE CHI1 OF RESIDUE 1LYS’.
- The input array in data has ndim == 4. Here, the same feature/CV is duplicated for the protein’s atoms. Besides the XYZ coordinates of the atoms, no other CVs should fall into this case. The labels will be 2-dimensional with ‘POSITION OF ATOM H1 IN RESIDUE 1LYS’ in dimension 0 and either ‘X’, ‘Y’ or ‘Z’ in dimension 1.
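The three cases can be told apart by the number of array dimensions alone. The sketch below only classifies an array and builds generic labels. It does not create an xarray.DataArray, and describe_cv is a hypothetical helper. The generic label pattern follows the ‘CUSTOM_FEATURE FEATURE 0’ style mentioned above, while the per-atom labels are simplified placeholders.

```python
import numpy as np


def describe_cv(data, name):
    """Classify CV data by ndim and build generic labels (hypothetical sketch)."""
    data = np.asarray(data)
    if data.ndim == 2:
        # (n_trajs, n_frames): a single value per frame, no label axis needed.
        return "one value per frame", []
    if data.ndim == 3:
        # (n_trajs, n_frames, n_features): one label per feature.
        labels = [f"{name.upper()} FEATURE {i}" for i in range(data.shape[-1])]
        return "feature vector per frame", labels
    if data.ndim == 4:
        # (n_trajs, n_frames, n_atoms, 3): one label per atom plus an XYZ axis.
        labels = [f"{name.upper()} ATOM {i}" for i in range(data.shape[2])]
        return "per-atom coordinates", labels
    raise ValueError(f"Unsupported number of dimensions: {data.ndim}")


# A single trajectory with 14 frames and 5 features, expanded along axis 0.
values = np.expand_dims(np.zeros((14, 5)), 0)
kind, labels = describe_cv(values, "z_coordinate")
print(kind, labels[:2])
```

For the (1, 14, 5) array, the second case applies and the first two labels are 'Z_COORDINATE FEATURE 0' and 'Z_COORDINATE FEATURE 1'.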
- Parameters:
traj (em.SingleTraj) – The trajectory we want to create the xarray.DataArray for.
data (np.ndarray) – The numpy array we want to create the data from. Note that the data passed into this function should be expanded with np.expand_dims(a, axis=0) to add a new axis that indexes the trajectories of a trajectory ensemble.
name (str) – The name of the feature. This can be chosen freely. Names like ‘central_angles’, ‘backbone_torsions’ would make the most sense.
deg (bool) – Whether provided data is in deg or radians.
labels (Optional[list]) – If you have specific labels for your CVs in mind, you can overwrite the generic ‘CUSTOM_FEATURE FEATURE 0’ labels by providing a list for this argument. If None is provided, generic names will be given to the features. Defaults to None.
check_n_frames (bool) – Whether to check whether the number of frames in the trajectory matches the len of the data in at least one dimension. Defaults to False.
- Returns:
An xarray.DataArray.
- Return type:
xarray.DataArray
Examples
>>> import encodermap as em
>>> import numpy as np
>>> from encodermap.misc.xarray import construct_xarray_from_numpy
>>> # load file from RCSB and give it traj num to represent it in a
>>> # potential trajectory ensemble
>>> traj = em.load('https://files.rcsb.org/view/1GHC.pdb', traj_num=1)
>>> # single trajectory needs to be expanded into 'trajectory' axis
>>> z_coordinate = np.expand_dims(traj.xyz[:, :, 2], 0)
>>> da = construct_xarray_from_numpy(traj, z_coordinate, 'z_coordinate')
>>> print(da.coords['Z_COORDINATE'].values[:2])
['Z_COORDINATE FEATURE 0' 'Z_COORDINATE FEATURE 1']
>>> print(da.coords['traj_num'].values)
[1]
>>> print(da.attrs['time_units'])
ps
- unpack_data_and_feature(feat, traj, input_data)[source]#
Makes an xarray.Dataset from data and a featurizer.
Usually, if you add multiple features to a featurizer, they are stacked along the feature axis. Let’s say, you have a trajectory with 20 frames and 3 residues. If you add the Ramachandran angles, you get 6 features (3xphi, 3xpsi). If you then also add the end-to-end distance as a feature, the data returned by the featurizer will have the shape (20, 7). This function returns the correct indices, so that iteration of zip(Featurizer.active_features, indices) will yield the correct results.
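The index bookkeeping described here can be illustrated with plain NumPy. The feature names and sizes below are made up to mirror the example of 3 phi angles, 3 psi angles, and one end-to-end distance, and feature_slices is a hypothetical helper rather than part of EncoderMap.

```python
import numpy as np


def feature_slices(feature_sizes):
    """Map each feature name to its column slice in the stacked array (sketch)."""
    slices = {}
    start = 0
    for name, size in feature_sizes.items():
        slices[name] = slice(start, start + size)
        start += size
    return slices


# 20 frames, 7 stacked feature columns as in the text.
data = np.arange(20 * 7, dtype=float).reshape(20, 7)
sizes = {"phi": 3, "psi": 3, "end_to_end_distance": 1}  # hypothetical names
parts = {name: data[:, s] for name, s in feature_slices(sizes).items()}

for name, block in parts.items():
    print(name, block.shape)
```

Each feature gets back its own block of columns: (20, 3) for phi, (20, 3) for psi, and (20, 1) for the distance.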
- Parameters:
feat (encodermap.loading.Featurizer) – An instance of the currently used encodermap.loading.Featurizer.
traj (encodermap.trajinfo.SingleTraj) – An instance of encodermap.SingleTraj, that the data in input_data was computed from.
input_data (np.ndarray) – The data, as returned from PyEMMA.
- Returns:
An xarray.Dataset with all features in a nice format.
- Return type:
xarray.Dataset
encodermap.misc.xarray_save_wrong_hdf5 module#
Allows the combined storing of CVs and trajectories in single HDF5/NetCDF4 files.
These files represent collated and completed trajectory ensembles, which can be lazy-loaded (memory efficient) and used as training input for EncoderMap’s NNs.