Coverage for encodermap/plot/jinja_template.py: 100%
6 statements
coverage.py v7.4.1, created at 2024-12-31 16:54 +0100
# -*- coding: utf-8 -*-
# encodermap/plot/jinja_template.py
################################################################################
# EncoderMap: A python library for dimensionality reduction.
#
# Copyright 2019-2024 University of Konstanz and the Authors
#
# Authors:
# Kevin Sawade
#
# Encodermap is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as
# published by the Free Software Foundation, either version 2.1
# of the License, or (at your option) any later version.
# This package is distributed in the hope that it will be useful to other
# researchers. IT DOES NOT COME WITH ANY WARRANTY WHATSOEVER; without even the
# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# See the GNU Lesser General Public License for more details.
#
# See <http://www.gnu.org/licenses/>.
################################################################################
"""This is a template for a README.md generated when a user writes a cluster to disk."""

xtc_rebuild = """\
This file can be used to rebuild the clustering of the trajectories like so:

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
>>> cluster_membership = np.full(trajs.n_frames, -1) # fill array with -1, meaning outliers
>>> indices = np.load('{{indices_npy_name}}') # load the indices
>>> cluster_membership[indices] = {{cluster_id}} # set the cluster number of the indices
>>> trajs.load_CVs(cluster_membership, 'cluster_membership') # load the cluster membership as collective variables
>>> traj_indices = trajs.id[indices] # more on this line in a separate section
>>> cluster_trajs = trajs[traj_indices]
```
"""

h5_rebuild = """\
This file can be used to rebuild the clustering of the trajectories like so:

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_dataset('{{h5_file}}')
>>> cluster_membership = np.full(trajs.n_frames, -1) # fill array with -1, meaning outliers
>>> indices = np.load('{{indices_npy_name}}') # load the indices
>>> cluster_membership[indices] = {{cluster_id}} # set the cluster number of the indices
>>> trajs.load_CVs(cluster_membership, 'cluster_membership') # load the cluster membership as collective variables
>>> traj_indices = trajs.id[indices] # more on this line in a separate section
>>> cluster_trajs = trajs[traj_indices]
```
"""

xtc_parents = """\
## {{parents_trajs}}

This plain text document contains the absolute paths to all trajectory files, their corresponding topology files, and their corresponding `common_str` that were considered during the clustering. Some of the trajectory files listed here might not contribute frames to the actual cluster, but they are included in this file nonetheless. You can reload the trajectories with the `from_textfile()` alternative constructor of the `TrajEnsemble` class.

```python
import encodermap as em
trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
```
"""

h5_parents = """\
## Parents Trajs

The parent trajectories are all saved in a single h5 file.

```python
import encodermap as em
trajs = em.TrajEnsemble.from_dataset('{{h5_file}}')
```
"""

template = """# Cluster {{cluster_id}} generated at {{now}}

## What just happened?

You either selected a cluster with the `InteractivePlotting` class of `encodermap` or you called the `_unpack_cluster_info()` function from `encodermap.plot.utils`. Many files have been put into a directory at {{cluster_abspath}}, which can be used to rebuild the cluster. The cluster you selected has been assigned the number {{cluster_id}}. If your cluster number is 0, your cluster is the first selected cluster of these MD trajectories (outliers are assigned -1). If your cluster has a number other than 0, you have selected another cluster, and its cluster membership is given by this unique identifier.

Here is a general rundown of the files created:

## {{h5_name}}

This file contains roughly 10 frames. They were selected from the original {{cluster_n_points}} points inside the cluster by evenly slicing them, which is why it is only roughly 10 structures; sometimes it is more. You can load this pdb whichever way you like and render a nice image of the cluster.

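The even slicing can be pictured with the following minimal sketch. It is an assumption about how such a selection may work, not necessarily the exact code that `encodermap` runs, and `evenly_spaced_indices` is a hypothetical helper introduced only for illustration.

```python
def evenly_spaced_indices(n_points, n_target=10):
    # Hypothetical helper: pick roughly `n_target` evenly spaced indices.
    # Integer division fixes the step, so the result can hold a few more
    # than `n_target` entries when `n_points` is not a multiple of it.
    step = max(n_points // n_target, 1)
    return list(range(0, n_points, step))


exact = evenly_spaced_indices(100)  # step 10, exactly 10 indices
more = evenly_spaced_indices(105)   # step 10, one additional index
```

With a step derived by integer division, a cluster whose size is not a multiple of the target ends up with slightly more than 10 structures.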
## Other pdb and xtc files

The other pdb and xtc files contain data to rebuild not only the ca. 10 frames from the pdb, but the whole cluster. They are enumerated the same way they are enumerated in {{parents_trajs}}. The file `cluster_id_{{cluster_id}}_traj_0.xtc` corresponds to `cluster_id_{{cluster_id}}_start_traj_0_from_{{basename}}.pdb`, `cluster_id_{{cluster_id}}_traj_1.xtc` corresponds to `cluster_id_{{cluster_id}}_start_traj_1_from_{{basename}}.pdb`, and so on.

## {{lowd_npy_name}}

A 2D numpy array with as many points as there are frames in all `cluster_id_{{cluster_id}}_traj_X.xtc` files combined. This is the low-dimensional representation of the whole cluster.

## {{indices_npy_name}}

{{rebuild_clustering_info}}

## {{pdb_origin_names}}

This file is only created when the structures inside the cluster have different numbers of atoms and thus cannot be loaded with the same topology. This plain text file records where the pdb files were copied from. This might only be useful in very niche scenarios.

## {{csv_name}}

This .csv table contains information about every point inside the cluster. Its columns give the following information:

| Column | Description |
| ------------------------------------- | ------------------------------------------------------------ |
| trajectory file | Contains the trajectory data (file formats such as .xtc, .dcd, .h5). |
| topology file | Contains the topology of the file (i.e. atom types, masses, residues) (file formats such as .pdb, .gro, .h5). Some trajectory files (.h5) might also contain the topology. In that case `trajectory file` and `topology file` are identical. |
| frame number | The number of the frame of the `trajectory file`. If you index your trajectories by frame number, use this number to reload this specific trajectory frame. `import mdtraj as md; frame = md.load_frame(trajectory_file, index=frame, top=topology_file)`<br />or<br />`import MDAnalysis as mda; frame = mda.Universe(topology_file, trajectory_file).trajectory[frame]` |
| time | The time of the frame. This can be used for time-based indexing of trajectories. `gmx trjconv -f $traj_file -s $top_file -dump $time` |
| cluster id | The id of the cluster. This column is identical in the whole csv file but can be used to merge multiple csv files to analyze multiple clusters at once. |
| trajectory number | The number of the trajectory in the full dataset. This corresponds to the line number in the file {{parents_trajs}}. If many trajectories have been loaded, the first trajectory is 0, and so on. If only one trajectory is loaded, its `trajectory number` might also be `None`. |

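Merging the tables of several clusters, as mentioned for the `cluster id` column, could look like the sketch below. It uses only the standard library and small in-memory stand-in tables. The column headers `frame number` and `cluster id` are assumed to match the names in the table above, and both helper functions are hypothetical.

```python
import csv
import io


def make_table(cluster_id, frames):
    # Build a tiny stand-in table; real files contain more columns.
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["frame number", "cluster id"])
    for frame in frames:
        writer.writerow([frame, cluster_id])
    return buffer.getvalue()


def merge_cluster_tables(csv_texts):
    # Read every table and collect all rows into one list of dicts.
    rows = []
    for text in csv_texts:
        rows.extend(csv.DictReader(io.StringIO(text)))
    return rows


merged = merge_cluster_tables([make_table(0, [3, 4]), make_table(1, [7])])

# Group the frame numbers by the value of the shared `cluster id` column.
frames_per_cluster = {}
for row in merged:
    frames_per_cluster.setdefault(row["cluster id"], []).append(int(row["frame number"]))
```

Note that `csv.DictReader` returns every value as a string, so the cluster ids are the keys `"0"` and `"1"` here.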
## {{selector_npy_name}}

This is a 2D numpy array of the points of the selector used. The selector is a widget from `matplotlib.widgets` that can interactively select points in a 2D scatter plot. In `encodermap`, 4 selectors are available:

- Rectangle: The selector contains 4 points, the xy coordinates of the corners of the rectangle in data coordinates.
- Polygon: Similar to Rectangle, a collection of points. The first and last points are identical.
- Ellipse: A collection of points describing the outline of the ellipse.
- Lasso: A collection of points following a free-hand drawn shape.

## {{current_clustering}}

This is a numpy array containing the cluster numbers of all previously selected clusters. If this cluster has a cluster id of 0, this array will only contain 0s and -1s and will have as many entries as there are frames in the analyzed trajectories.

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
>>> current_clustering = np.load('{{current_clustering}}')
>>> len(current_clustering) == trajs.n_frames
True
```

If this cluster has a higher cluster id, all previously selected clusters can be accessed with this array:

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
>>> current_clustering = np.load('{{current_clustering}}') # load the cluster numbers of all frames
>>> trajs.load_CVs(current_clustering, 'cluster_membership') # load the cluster membership as collective variables
>>> indices_some_other_cluster = np.where(trajs.cluster_membership == 2)[0]
>>> traj_indices = trajs.id[indices_some_other_cluster] # more on this line in a separate section
>>> cluster_trajs = trajs[traj_indices]
>>> len(traj_indices) == cluster_trajs.n_frames
True
```

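To get a quick overview of such a cluster membership array, the frames per cluster can be counted. The sketch below uses a short hand-written list as a stand-in for the loaded array; with a real numpy array, convert it with `.tolist()` first or use `np.unique(current_clustering, return_counts=True)`.

```python
from collections import Counter

# Stand-in for the loaded array: -1 marks outliers, 0 and 1 are clusters.
current_clustering = [-1, 0, 0, -1, 1, 0, -1, 1]

# Count the frames assigned to every cluster id.
frames_per_cluster = Counter(current_clustering)

# Collect the frame indices that belong to cluster 1.
indices_cluster_1 = [i for i, c in enumerate(current_clustering) if c == 1]
```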
## {{png_name}}

This is just an image. Here it is

## Why the `trajs.id[indices]` part?

This comes down to the question: What should be returned if a `TrajEnsemble` object is indexed via a list or numpy array? To answer it, we first take a step back and figure out what should happen if the `TrajEnsemble` class is indexed via a single integer. The most sensible behavior is that you get the `SingleTraj` at the position of this integer. Consider this example:

```python
>>> import encodermap as em
>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb')
>>> print(traj1.basename)
traj1
>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb')
>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb')
>>> trajs = em.TrajEnsemble([traj1, traj2, traj3])
>>> print([t.basename for t in trajs])
['traj1', 'traj2', 'traj3']
>>> integer_indexing = trajs[2]
>>> print(integer_indexing == traj3)
True
>>> print(integer_indexing.basename)
traj3
```

Using a list of int or a numpy array of int thus returns a new `TrajEnsemble` class that contains only the `SingleTraj` classes selected by the ints. Consider this example:

```python
>>> import encodermap as em
>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb')
>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb')
>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb')
>>> trajs = em.TrajEnsemble([traj1, traj2, traj3])
>>> print([t.basename for t in trajs])
['traj1', 'traj2', 'traj3']
>>> indices = [1, 2]
>>> new_trajs = trajs[indices]
>>> print([t.basename for t in new_trajs])
['traj2', 'traj3']
```

And finally we arrive at the `traj_indices = trajs.id[indices]` syntax used in section {{indices_npy_name}}. This returns a numpy array with ndim = 2 with which you can index single frames. Let's say we want a `TrajEnsemble` class that only contains frame 10 of traj 0, frame 20 of traj 1, and frame 30 of traj 2. Maybe we will also add frames 2 to 5 from traj 1. The syntax is as follows:

```python
>>> import encodermap as em
>>> import numpy as np
>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb')
>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb')
>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb')
>>> trajs = em.TrajEnsemble([traj1, traj2, traj3])
>>> print([t.basename for t in trajs])
['traj1', 'traj2', 'traj3']
>>> print([t.n_frames for t in trajs])
[100, 100, 100]
>>> indices = np.array([  # each row is [trajectory index, frame index]
...     [0, 10],
...     [1, 20],
...     [2, 30],
...     [1, 2],
...     [1, 3],
...     [1, 4],
...     [1, 5],
... ])
>>> new_trajs = trajs[indices]
>>> print([t.basename for t in new_trajs])
['traj1', 'traj2', 'traj3', 'traj2', 'traj2', 'traj2', 'traj2']
>>> print([t.n_frames for t in new_trajs])
[1, 1, 1, 1, 1, 1, 1]
>>> print(all(isinstance(t, em.SingleTraj) for t in new_trajs))
True
```

So all in all, a 1D array of ints indexes whole trajectories, and a 2D array of ints indexes trajectories and frames.

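The difference between the three kinds of indexing can be reproduced with a small toy class. This is a hypothetical stand-in written in plain Python to illustrate the semantics described above; it is not how `TrajEnsemble` is implemented.

```python
class ToyEnsemble:
    # Hypothetical stand-in, not EncoderMap's implementation.
    def __init__(self, frames_per_traj):
        # Maps a trajectory name to its list of frames.
        self.names = list(frames_per_traj)
        self.frames = frames_per_traj

    def __getitem__(self, key):
        if isinstance(key, int):
            # A single int returns one whole trajectory.
            name = self.names[key]
            return (name, self.frames[name])
        if all(isinstance(k, int) for k in key):
            # A 1D sequence of ints returns several whole trajectories.
            return [self[k] for k in key]
        # A 2D sequence of (traj, frame) pairs returns single frames.
        return [(self.names[t], [self.frames[self.names[t]][f]]) for t, f in key]


toy = ToyEnsemble(dict(traj1=list(range(100)), traj2=list(range(100)), traj3=list(range(100))))
whole = toy[[1, 2]]                               # two whole trajectories
single_frames = toy[[[0, 10], [1, 20], [2, 30]]]  # three single frames
```

Here `whole` holds two trajectories with 100 frames each, while `single_frames` holds three entries with one frame each.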
## What is a `common_str`?

Encodermap's `TrajEnsemble` and `SingleTraj` classes contain a class variable called `common_str`. The common string is a way to group trajectory files that share the same topology. This comes in handy when you run many simulations with the same topology and want to compare them to simulations with a similar, but different topology. Let's consider this scenario: You run simulations of the short peptides AFFA and FAAF. Both peptides have the same number of atoms but different topologies. They still share some joint phase space and can be considered similar in some regards. You set up some simulations from your AFFA.pdb and FAAF.pdb files and run them. Now you have these files to consider:

- AFFA.pdb: AFFA_traj1.xtc, AFFA_traj2.xtc, AFFA_traj3.xtc
- FAAF.pdb: FAAF_traj.xtc

And you want to compare them. For this you need to assign the pdb files to the corresponding xtc files. Luckily, you have chosen a naming scheme that lets you group them by the substrings AFFA and FAAF. You can load all trajectories with encodermap using the `TrajEnsemble` class.

```python
import encodermap as em

# the common_str entries match each trajectory file to its topology file
trajs = em.TrajEnsemble(
    ['AFFA_traj1.xtc', 'AFFA_traj2.xtc', 'AFFA_traj3.xtc', 'FAAF_traj.xtc'],
    ['AFFA.pdb', 'FAAF.pdb'],
    common_str=['FAAF', 'AFFA']
)
```

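The matching by substring can be sketched in a few lines of plain Python. The helper `group_by_common_str` is hypothetical and only illustrates the idea; the matching inside `encodermap` may differ in its details.

```python
def group_by_common_str(traj_files, top_files, common_str):
    # Hypothetical helper: pair each topology with the trajectories
    # that share one of the common substrings with it.
    groups = {}
    for cs in common_str:
        tops = [top for top in top_files if cs in top]
        if len(tops) != 1:
            raise ValueError("Expected exactly one topology for " + cs)
        groups[tops[0]] = [traj for traj in traj_files if cs in traj]
    return groups


groups = group_by_common_str(
    ["AFFA_traj1.xtc", "AFFA_traj2.xtc", "AFFA_traj3.xtc", "FAAF_traj.xtc"],
    ["AFFA.pdb", "FAAF.pdb"],
    ["FAAF", "AFFA"],
)
```

The order of the entries in `common_str` does not need to follow the order of the files, because every file is matched by its substring.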
## Rendering this document

If you don't like to view plain markdown files with a text viewer, there are many viewers available that render markdown nicely. I am currently using typora:

https://typora.io/

If you want to create an HTML or pdf file from this document, you can try pandoc, optionally combined with LaTeX or groff.

### HTML

```bash
pandoc {{filename}}.md -o {{filename}}.html
```

### LaTeX

```bash
pandoc {{filename}}.md -o {{filename}}.pdf
```

### Groff

```bash
pandoc {{filename}}.md -t ms -o {{filename}}.pdf
```

## Debug Info

```
encodermap.__version__ = {{encodermap_version}}
system_user = {{system_user}}
platform = {{platform}}
platform_release = {{platform_release}}
platform_version = {{platform_version}}
architecture = {{architecture}}
hostname = {{hostname}}
ip_address = {{ip_address}}
mac_address = {{mac_address}}
processor = {{processor}}
ram = {{ram}}
pip freeze = {{pip_freeze}}

```

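If you want to compare these values with your current machine, several of them can be queried with Python's standard library. This is a minimal sketch under the assumption that the fields above map to the `platform` and `socket` calls shown; it is not necessarily how `encodermap` gathers them.

```python
import platform
import socket

# Collect a few of the values listed above with the standard library.
debug_info = dict(
    platform=platform.system(),
    platform_release=platform.release(),
    platform_version=platform.version(),
    architecture=platform.machine(),
    hostname=socket.gethostname(),
    processor=platform.processor(),
)

for key, value in debug_info.items():
    print(key, "=", value)
```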
"""