Coverage for encodermap/plot/jinja_template.py: 100%
2 statements
coverage.py v7.1.0, created at 2023-02-07 11:05 +0000
# -*- coding: utf-8 -*-
# encodermap/plot/jinja_template.py
################################################################################
# Encodermap: A python library for dimensionality reduction.
#
# Copyright 2019-2022 University of Konstanz and the Authors
#
# Authors:
# Kevin Sawade, Tobias Lemke
#
# Encodermap is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as
# published by the Free Software Foundation, either version 2.1
# of the License, or (at your option) any later version.
# This package is distributed in the hope that it will be useful to other
# researchers. IT DOES NOT COME WITH ANY WARRANTY WHATSOEVER; without even the
# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# See the GNU Lesser General Public License for more details.
#
# See <http://www.gnu.org/licenses/>.
################################################################################
"""This is a template for a README.md generated when a user writes a cluster to disk."""

template = """# Cluster {{cluster_id}} generated at {{now}}

## What just happened?

You either selected a cluster with the `InteractivePlotting` class of `encodermap` or you called the `_unpack_cluster_info()` function from `encodermap.plot.utils`. Many files have been put into a directory at {{cluster_abspath}}, which can be used to rebuild the cluster. The cluster you selected has been assigned the number {{cluster_id}}. If your cluster number is 0, your cluster is the first selected cluster of these MD trajectories (outliers are assigned -1). If your cluster number is greater than 0, other clusters were selected before this one, and the `cluster_membership` is given by this unique identifier.

Here is a general rundown of the files created:

## {{parents_trajs}}

This plain text document contains the absolute paths to all trajectory files, their corresponding topology files and their corresponding `common_str` that were considered during the clustering. Some of these trajectories might not contribute any frames to the cluster, but they are listed in this file nonetheless. You can reload the trajectories with the `from_textfile()` alternative constructor of the `TrajEnsemble` class.

```python
import encodermap as em
trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
```

## {{pdb_name}}

This file contains ca. 10 frames, which were selected from the {{cluster_n_points}} points inside the cluster by evenly slicing them. That's why there are only roughly 10 structures (sometimes a few more). You can load this pdb however you like and render a nice image of the cluster.

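The exact slicing rule is not spelled out here, so the following is only a minimal sketch, assuming the step is the integer quotient of the number of cluster points and 10. It shows why evenly slicing yields roughly 10 frames and sometimes more (`evenly_sliced` is a hypothetical helper):

```python
import numpy as np

def evenly_sliced(cluster_indices, n_target=10):
    # Assumption: the step is the integer quotient len // n_target, but at least 1.
    step = max(1, len(cluster_indices) // n_target)
    # Keep every step-th point, starting with the first one.
    return cluster_indices[::step]

# 25 hypothetical cluster points: step = 2, so 13 frames are kept.
print(len(evenly_sliced(np.arange(25))))
# 100 hypothetical cluster points: step = 10, so exactly 10 frames are kept.
print(len(evenly_sliced(np.arange(100))))
```

Under this assumption, a cluster with fewer than 10 points gets a step of 1, so every point is kept.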
## Other pdb and xtc files

The other pdb and xtc files contain the data to rebuild not only the ca. 10 frames from the pdb, but the whole cluster. They are enumerated the same way as the trajectories in {{parents_trajs}}. The file `cluster_id_{{cluster_id}}_traj_0.xtc` corresponds to `cluster_id_{{cluster_id}}_start_traj_0_from_{{basename}}.pdb`, `cluster_id_{{cluster_id}}_traj_1.xtc` corresponds to `cluster_id_{{cluster_id}}_start_traj_1_from_{{basename}}.pdb`, and so on.

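To reload the whole cluster, every xtc file has to be paired with its start pdb. The helper below is hypothetical (it is not part of encodermap) and only reproduces the naming scheme described above; the basename `'my_traj'` is made up and should be replaced with the one in your file names:

```python
def cluster_file_pairs(cluster_id, basename, traj_nums):
    # Hypothetical helper: rebuilds the (xtc, pdb) file names
    # following the naming scheme described above.
    pairs = []
    for traj_num in traj_nums:
        xtc = 'cluster_id_%d_traj_%d.xtc' % (cluster_id, traj_num)
        pdb = 'cluster_id_%d_start_traj_%d_from_%s.pdb' % (cluster_id, traj_num, basename)
        pairs.append((xtc, pdb))
    return pairs

for xtc, pdb in cluster_file_pairs(0, 'my_traj', [0, 1]):
    print(xtc, '->', pdb)
    # With MDTraj installed, a pair could then be loaded via md.load(xtc, top=pdb).
```

Passing the trajectory numbers explicitly (instead of a count) keeps the sketch usable if some trajectories do not take part in the cluster.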
## {{lowd_npy_name}}

A 2D numpy array with as many points as there are frames in all `cluster_id_{{cluster_id}}_traj_X.xtc` files combined. This is the low-dimensional representation of the whole cluster.

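If you need the low-dimensional points of each xtc file separately, you can split the array at the cumulative frame counts. This sketch assumes the points are stored in the same order as the xtc files (traj_0 first, then traj_1, ...); `split_lowd` is a hypothetical helper and the frame counts are made up:

```python
import numpy as np

def split_lowd(lowd, frames_per_file):
    # Cumulative frame counts give the row indices where one file ends and the next begins.
    boundaries = np.cumsum(frames_per_file)[:-1]
    return np.split(lowd, boundaries)

# Hypothetical example: three xtc files with 4, 2 and 3 frames (9 points in total).
lowd = np.random.default_rng(0).normal(size=(9, 2))
chunks = split_lowd(lowd, [4, 2, 3])
print([chunk.shape for chunk in chunks])  # [(4, 2), (2, 2), (3, 2)]
```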
## {{indices_npy_name}}

This file can be used to rebuild the clustering of the trajectories like so:

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
>>> cluster_membership = np.full(trajs.n_frames, -1) # fill array with -1, meaning outliers
>>> indices = np.load('{{indices_npy_name}}') # load the indices
>>> cluster_membership[indices] = {{cluster_id}} # set the cluster number of the indices
>>> trajs.load_CVs(cluster_membership, 'cluster_membership') # load the cluster membership as collective variables
>>> traj_indices = trajs.id[indices] # more on this line in a separate section
>>> cluster_trajs = trajs[traj_indices]
```

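The core of the snippet above is plain numpy indexing. Here it is on made-up numbers, without any trajectory files, so you can see what `cluster_membership` ends up looking like:

```python
import numpy as np

n_frames = 8                    # hypothetical total number of frames in all trajectories
indices = np.array([1, 2, 5])   # hypothetical content of the indices .npy file
cluster_id = 0

cluster_membership = np.full(n_frames, -1)  # -1 marks outliers
cluster_membership[indices] = cluster_id    # frames inside the cluster get the cluster id
print(cluster_membership)  # [-1  0  0 -1 -1  0 -1 -1]
```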
## {{pdb_origin_names}}

This file is only created when the structures inside the cluster have different numbers of atoms and thus cannot be loaded with the same topology. This plain text file records where the pdb files were copied from. It is probably only useful in very niche scenarios.

## {{csv_name}}

This .csv table contains information about every point inside the cluster. Its columns are:

| Column | Description |
| ------------------------------------- | ------------------------------------------------------------ |
| trajectory file | Contains the trajectory data (file formats such as .xtc, .dcd, .h5). |
| topology file | Contains the topology of the trajectory (i.e. atom types, masses, residues) (file formats such as .pdb, .gro, .h5). Some trajectory files (.h5) might also contain the topology. In that case, `trajectory file` and `topology file` are identical. |
| frame number | The number of the frame in the `trajectory file`. If you index your trajectories by frame number, use this number to reload this specific trajectory frame. `import mdtraj as md; frame = md.load_frame(trajectory_file, index=frame_number, top=topology_file)`<br />or<br />`import MDAnalysis as mda; frame = mda.Universe(topology_file, trajectory_file).trajectory[frame_number]` |
| time | The time of the frame. This can be used for time-based indexing of trajectories. `gmx trjconv -f $traj_file -s $top_file -dump $time` |
| cluster id | The id of the cluster. This column is identical in the whole csv file, but it can be used to merge multiple csv files to analyze multiple clusters at once. |
| trajectory number | The number of the trajectory in the full dataset. This corresponds to the position of the trajectory in the file {{parents_trajs}}. If many trajectories have been loaded, the first trajectory is 0, and so on. If only one trajectory is loaded, its `trajectory number` might also be `None`. |
| unique id in set of {{n_trajs}} trajs | An integer that uniquely identifies every frame of every trajectory given in {{parents_trajs}}. The frames of trajectory number 0 are enumerated 0, 1, ..., n. The frames of the next trajectory (trajectory number 1) are enumerated n + 1, n + 2, ..., n + m. The frames of trajectory number 2 are enumerated n + m + 1, n + m + 2, and so on. This way, every frame gets a unique integer identifier. |

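The `unique id` column follows from `trajectory number`, `frame number` and the number of frames in each trajectory. A minimal sketch with made-up frame counts, assuming trajectory numbers and frame numbers both start at 0 (`unique_ids` is a hypothetical helper):

```python
import numpy as np

def unique_ids(traj_nums, frame_nums, frames_per_traj):
    # Offset of each trajectory = total number of frames in all trajectories before it.
    offsets = np.concatenate([[0], np.cumsum(frames_per_traj)[:-1]])
    return offsets[np.asarray(traj_nums)] + np.asarray(frame_nums)

# Hypothetical ensemble: trajectory 0 has 5 frames, trajectory 1 has 3, trajectory 2 has 4.
print(unique_ids([0, 1, 2], [4, 0, 1], [5, 3, 4]))  # [4 5 9]
```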
## {{selector_npy_name}}

This is a 2D numpy array of the points of the selector used. A selector is a matplotlib widget (see `matplotlib.widgets`) that can interactively select points in a 2D scatter plot. Four selectors are available in `encodermap`:

- Rectangle: The array contains 4 points, the xy coordinates of the rectangle's corners in data coordinates.
- Polygon: Similar to Rectangle, a collection of points (the polygon's vertices). The first and last points are identical.
- Ellipse: A collection of points describing the outline of the ellipse.
- Lasso: A collection of points following a free-hand drawn shape.

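The selector outline can be used to re-test which low-dimensional points lie inside it. The sketch below uses a plain even-odd ray-casting test in numpy; matplotlib's `matplotlib.path.Path.contains_points` offers a ready-made alternative. This is an illustration, not necessarily how encodermap selects points, and the square outline is made up:

```python
import numpy as np

def points_in_outline(points, outline):
    # Even-odd ray casting: count how often a horizontal ray from each point crosses the outline.
    points = np.asarray(points, dtype=float)
    outline = np.asarray(outline, dtype=float)
    x, y = points[:, 0], points[:, 1]
    inside = np.zeros(len(points), dtype=bool)
    for i in range(len(outline)):
        x1, y1 = outline[i]
        x2, y2 = outline[(i + 1) % len(outline)]  # wrap around to close the outline
        crosses = (y1 > y) != (y2 > y)            # edge spans the point's y coordinate
        with np.errstate(divide='ignore', invalid='ignore'):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
        inside ^= crosses & (x < x_cross)         # toggle for every crossing right of the point
    return inside

square = [[0, 0], [1, 0], [1, 1], [0, 1]]  # hypothetical Rectangle selector
print(points_in_outline([[0.5, 0.5], [2, 2], [-0.1, 0.5]], square))  # [ True False False]
```

Horizontal edges never count as crossings, so the repeated first/last point of a Polygon selector does no harm.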
## {{current_clustering}}

This is a numpy array containing the cluster numbers of all previously selected clusters. It has as many entries as there are frames in the analyzed trajectories. If this cluster has a cluster id of 0, the array only contains 0s and -1s.

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
>>> current_clustering = np.load('{{current_clustering}}')
>>> len(current_clustering) == trajs.n_frames
True
```

If this cluster has a higher cluster id, all previously selected clusters can be accessed with this array:

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
>>> current_clustering = np.load('{{current_clustering}}') # load the cluster ids of all selected clusters
>>> trajs.load_CVs(current_clustering, 'cluster_membership') # load the cluster membership as collective variables
>>> indices_some_other_cluster = np.where(trajs.cluster_membership == 2)[0] # frames of cluster 2
>>> traj_indices = trajs.id[indices_some_other_cluster] # more on this line in a separate section
>>> cluster_trajs = trajs[traj_indices]
>>> len(traj_indices) == cluster_trajs.n_frames
True
```

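For a quick overview of all clusters stored in this array, counting the members of each cluster id is often enough. A sketch on a made-up array:

```python
import numpy as np

# Hypothetical clustering array: -1 = outlier, 0, 1 and 2 = previously selected clusters.
current_clustering = np.array([-1, 0, 0, 2, 1, -1, 2, 2])
ids, counts = np.unique(current_clustering, return_counts=True)
for cluster_id, count in zip(ids, counts):
    print(cluster_id, count)
# Frame indices belonging to cluster 2, as used in the snippet above:
print(np.where(current_clustering == 2)[0])  # [3 6 7]
```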
## {{png_name}}

This is just an image of the cluster. Here it is:

![Cluster Image]({{png_name}})

## Why the `trajs.id[indices]` part?

This comes down to the question: What should be returned if a `TrajEnsemble` object is indexed via a list or numpy array? To answer it, we first step back and figure out what should happen if the `TrajEnsemble` class is indexed via a single integer. The most sensible behavior is to return the `SingleTraj` object at that index. Consider this example:

```python
>>> import encodermap as em
>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb')
>>> print(traj1.basename)
traj1
>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb')
>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb')
>>> trajs = em.TrajEnsemble([traj1, traj2, traj3])
>>> print([t.basename for t in trajs])
['traj1', 'traj2', 'traj3']
>>> integer_indexing = trajs[2]
>>> print(integer_indexing == traj3)
True
>>> print(integer_indexing.basename)
traj3
```

Indexing with a list or numpy array of ints therefore returns a new `TrajEnsemble` object that contains only the `SingleTraj` objects at these indices. Consider this example:

```python
>>> import encodermap as em
>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb')
>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb')
>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb')
>>> trajs = em.TrajEnsemble([traj1, traj2, traj3])
>>> print([t.basename for t in trajs])
['traj1', 'traj2', 'traj3']
>>> indices = [1, 2]
>>> new_trajs = trajs[indices]
>>> print([t.basename for t in new_trajs])
['traj2', 'traj3']
```

Finally, we arrive at the `traj_indices = trajs.id[indices]` syntax used in section {{indices_npy_name}}. `trajs.id[indices]` returns a numpy array with ndim = 2, which can be used to index single frames. Let's say we want a `TrajEnsemble` object containing only frame 10 of traj 0, frame 20 of traj 1 and frame 30 of traj 2. Maybe we also add frames 2 to 5 of traj 1. The syntax is as follows:

```python
>>> import encodermap as em
>>> import numpy as np
>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb')
>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb')
>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb')
>>> trajs = em.TrajEnsemble([traj1, traj2, traj3])
>>> print([t.basename for t in trajs])
['traj1', 'traj2', 'traj3']
>>> print([t.n_frames for t in trajs])
[100, 100, 100]
>>> indices = np.array([
...     [0, 10],
...     [1, 20],
...     [2, 30],
...     [1, 2],
...     [1, 3],
...     [1, 4],
...     [1, 5],
... ])
>>> new_trajs = trajs[indices]
>>> print([t.basename for t in new_trajs])
['traj1', 'traj2', 'traj3', 'traj2', 'traj2', 'traj2', 'traj2']
>>> print([t.n_frames for t in new_trajs])
[1, 1, 1, 1, 1, 1, 1]
>>> print(all(isinstance(t, em.SingleTraj) for t in new_trajs))
True
```

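The two-column array returned by `trajs.id` pairs every frame with its trajectory. The sketch below builds such an array by hand for a made-up ensemble, assuming (as in the examples above) that trajectories are numbered from 0 and each row is `[trajectory number, frame number]`. It shows how flat frame indices turn into `[trajectory, frame]` rows:

```python
import numpy as np

frames_per_traj = [3, 2, 4]  # hypothetical frame counts of trajectories 0, 1 and 2
id_array = np.concatenate([
    # one [trajectory number, frame number] row per frame
    np.column_stack([np.full(n, traj_num), np.arange(n)])
    for traj_num, n in enumerate(frames_per_traj)
])
flat_indices = np.array([1, 3, 8])  # e.g. the content of the indices .npy file
print(id_array[flat_indices].tolist())  # [[0, 1], [1, 0], [2, 3]]
```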
In summary, a 1D array of ints indexes whole trajectories, while a 2D array of ints indexes (trajectory, frame) pairs.

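The dispatch rule summarized above can be illustrated with a toy container. `ToyEnsemble` is hypothetical and far simpler than encodermap's `TrajEnsemble` (each trajectory is just a list of frame labels); it only mirrors the three indexing cases described in this section:

```python
import numpy as np

class ToyEnsemble:
    # Hypothetical stand-in for TrajEnsemble: each trajectory is a list of frame labels.
    def __init__(self, trajs):
        self.trajs = list(trajs)

    def __getitem__(self, key):
        if isinstance(key, (int, np.integer)):
            return self.trajs[key]  # int -> a single trajectory
        key = np.asarray(key)
        if key.ndim == 1:
            # 1D -> a new ensemble with a subset of the trajectories
            return ToyEnsemble(self.trajs[i] for i in key)
        if key.ndim == 2:
            # 2D -> a new ensemble of single frames, one per [trajectory, frame] row
            return ToyEnsemble([self.trajs[t][f]] for t, f in key)
        raise IndexError('only int, 1D or 2D integer indices are supported')

trajs = ToyEnsemble([['a0', 'a1'], ['b0', 'b1'], ['c0', 'c1']])
print(trajs[2])                       # ['c0', 'c1']
print(trajs[[0, 2]].trajs)            # [['a0', 'a1'], ['c0', 'c1']]
print(trajs[[[1, 0], [2, 1]]].trajs)  # [['b0'], ['c1']]
```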
## What is a `common_str`?

Encodermap's `TrajEnsemble` and `SingleTraj` classes contain an attribute called `common_str`. The common string is a way to group trajectory files that share the same topology. This comes in handy when you run many simulations with the same topology and want to compare them to simulations with a similar, but different topology. Consider this scenario: you run simulations of the short peptides AFFA and FAAF. Both peptides have the same number of atoms but different topologies. Still, they share some joint phase space and can be considered similar in some regards. You set up simulations from your AFFA.pdb and FAAF.pdb files and run them. Now you have these files to consider:

- AFFA.pdb: AFFA_traj1.xtc, AFFA_traj2.xtc, AFFA_traj3.xtc
- FAAF.pdb: FAAF_traj.xtc

You want to compare them. For this, you need to assign the pdb files to the corresponding xtc files. Luckily, you chose a naming scheme that lets you group them by the substrings AFFA and FAAF. You can load all trajectories with encodermap using the `TrajEnsemble` class:

```python
import encodermap as em
trajs = em.TrajEnsemble(
    ['AFFA_traj1.xtc', 'AFFA_traj2.xtc', 'AFFA_traj3.xtc', 'FAAF_traj.xtc'],  # trajectory files
    ['AFFA.pdb', 'FAAF.pdb'],  # topology files, matched to the trajectories via common_str
    common_str=['FAAF', 'AFFA'],
)
```

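Conceptually, `common_str` lets each trajectory be matched to the topology whose file name contains the same substring. The function below is a hypothetical illustration of that matching, not encodermap's implementation:

```python
def match_topologies(traj_files, top_files, common_str):
    # Hypothetical sketch: pair each trajectory with the topology sharing its common string.
    pairs = []
    for traj in traj_files:
        matches = [s for s in common_str if s in traj]
        if len(matches) != 1:
            raise ValueError('expected exactly one common string in ' + traj)
        tops = [t for t in top_files if matches[0] in t]
        if not tops:
            raise ValueError('no topology contains ' + matches[0])
        pairs.append((traj, tops[0], matches[0]))
    return pairs

pairs = match_topologies(
    ['AFFA_traj1.xtc', 'AFFA_traj2.xtc', 'AFFA_traj3.xtc', 'FAAF_traj.xtc'],
    ['AFFA.pdb', 'FAAF.pdb'],
    ['FAAF', 'AFFA'],
)
for traj, top, common in pairs:
    print(traj, top, common)
```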
## Rendering this document

If you don't like viewing plain markdown files in a text viewer, there are many viewers that render markdown nicely. I am currently using Typora:

https://typora.io/

If you want to convert this document to HTML or PDF, you can use pandoc. For PDF output, pandoc additionally needs either a LaTeX installation or groff.

### HTML

```bash
pandoc {{filename}}.md -o {{filename}}.html
```

### LaTeX

```bash
pandoc {{filename}}.md -o {{filename}}.pdf
```

### Groff

```bash
pandoc {{filename}}.md -t ms -o {{filename}}.pdf
```

## Debug Info

```
encodermap.__version__ = {{encodermap_version}}
system_user = {{system_user}}
platform = {{platform}}
platform_release = {{platform_release}}
platform_version = {{platform_version}}
architecture = {{architecture}}
hostname = {{hostname}}
ip_address = {{ip_address}}
mac_address = {{mac_address}}
processor = {{processor}}
ram = {{ram}}
pip freeze = {{pip_freeze}}

```

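To compare the environment in which this cluster was written with your current one, you can collect similar values with Python's standard library. This is a sketch; which calls encodermap itself uses to fill the fields above is an assumption here:

```python
import platform
import socket

# Assumption: these standard-library calls roughly correspond to the fields above.
current = dict(
    platform=platform.system(),
    platform_release=platform.release(),
    platform_version=platform.version(),
    architecture=platform.machine(),
    hostname=socket.gethostname(),
    processor=platform.processor(),
)
for key, value in current.items():
    print(key, '=', value)
```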


"""