Coverage for encodermap/plot/jinja_template.py: 100%

2 statements  

coverage.py v7.1.0, created at 2023-02-07 11:05 +0000

1# -*- coding: utf-8 -*- 

2# encodermap/plot/jinja_template.py 

3################################################################################ 

4# Encodermap: A python library for dimensionality reduction. 

5# 

6# Copyright 2019-2022 University of Konstanz and the Authors 

7# 

8# Authors: 

9# Kevin Sawade, Tobias Lemke 

10# 

11# Encodermap is free software: you can redistribute it and/or modify 

12# it under the terms of the GNU Lesser General Public License as 

13# published by the Free Software Foundation, either version 2.1 

14# of the License, or (at your option) any later version. 

15# This package is distributed in the hope that it will be useful to other 

16# researchers. IT DOES NOT COME WITH ANY WARRANTY WHATSOEVER; without even the 

17# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 

18# See the GNU Lesser General Public License for more details. 

19# 

20# See <http://www.gnu.org/licenses/>. 

21################################################################################ 

22"""This is a template for a README.md generated when a user writes a cluster to disk.""" 

23 

24template = """# Cluster {{cluster_id}} generated at {{now}} 

25 

26## What just happened? 

27 

28You either selected a cluster with the `InteractivePlotting` class of `encodermap` or you called the `_unpack_cluster_info()` function from `encodermap.plot.utils`. The files that can be used to rebuild the cluster have been put into the directory {{cluster_abspath}}. The cluster you selected has been assigned the number {{cluster_id}}. If your cluster number is 0, your cluster is the first cluster selected from these MD trajectories (outliers are assigned -1). If your cluster has a number different from 0, other clusters have been selected before, and this number is the unique identifier of your cluster in `cluster_membership`. 

29 

30Here is a general rundown of the files created: 

31 

32## {{parents_trajs}} 

33 

34This plain text document contains the absolute paths to all trajectory files, their corresponding topology files and their corresponding `common_str`, that were considered during the clustering. Some of the trajectory files here might not take part in the actual cluster, but they are here in this file nonetheless. You can reload the trajectories with the `from_textfile()` alternative constructor of the `TrajEnsemble` class. 

35 

36```python 

37import encodermap as em 

38trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}') 

39``` 

40 

41## {{pdb_name}} 

42 

43This file contains roughly 10 frames. They were selected from the original {{cluster_n_points}} points inside the cluster by slicing them evenly, which is why the file sometimes contains a few more than 10 structures. You can load this pdb whichever way you like and render a nice image of the cluster. 

44 
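The even slicing described above can be sketched as follows. This is only an illustration, assuming that the step size is the integer quotient of the number of points and 10. `evenly_spaced_indices` is a hypothetical helper and not part of `encodermap`.

```python
def evenly_spaced_indices(n_points, target=10):
    # Hypothetical helper: pick roughly `target` evenly spaced indices.
    # Assumption: the step is the integer quotient, at least 1.
    step = max(n_points // target, 1)
    return list(range(0, n_points, step))


print(len(evenly_spaced_indices(100)))  # 10
print(len(evenly_spaced_indices(105)))  # 11, one more than the target
```

With an integer step, any remainder of the division yields an additional frame, which would explain why the pdb file can hold more than 10 structures.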

45## Other pdb and xtc files 

46 

47The other pdb and xtc files contain data to rebuild not only the roughly 10 frames from the pdb, but the whole cluster. They are enumerated the same way they are enumerated in {{parents_trajs}}. The file `cluster_id_{{cluster_id}}_traj_0.xtc` corresponds to `cluster_id_{{cluster_id}}_start_traj_0_from_{{basename}}.pdb`, `cluster_id_{{cluster_id}}_traj_1.xtc` corresponds to `cluster_id_{{cluster_id}}_start_traj_1_from_{{basename}}.pdb` and so on. 

48 

49## {{lowd_npy_name}} 

50 

51A 2D numpy array with as many points as there are frames in all `cluster_id_{{cluster_id}}_traj_X.xtc` files combined. This is the low-dimensional representation of the whole cluster. 

52 

53## {{indices_npy_name}} 

54 

55This file can be used to rebuild the clustering of the trajectories like so: 

56 

57```python 

58>>> import encodermap as em 

59>>> import numpy as np 

60>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}') 

61>>> cluster_membership = np.full(trajs.n_frames, -1) # fill array with -1, meaning outliers 

62>>> indices = np.load('{{indices_npy_name}}') # load the indices 

63>>> cluster_membership[indices] = {{cluster_id}} # set the cluster number of the indices 

64>>> trajs.load_CVs(cluster_membership, 'cluster_membership') # load the cluster membership as collective variables 

65>>> traj_indices = trajs.id[indices] # more on this line in a separate section 

66>>> cluster_trajs = trajs[traj_indices] 

67``` 

68 

69## {{pdb_origin_names}} 

70 

71This file is only created when the structures inside the cluster have different numbers of atoms and thus cannot be loaded with the same topology. This plain text file records where the pdb files were copied from. It is probably only useful in niche scenarios. 

72 

73## {{csv_name}} 

74 

75This .csv table contains information about every point inside the cluster. Its columns give the following information: 

76 

77| trajectory file | Contains the trajectory data (file formats such as .xtc, .dcd, .h5). | 

78| ------------------------------------- | ------------------------------------------------------------ | 

79| topology file | Contains the topology of the file (i.e. atom types, masses, residues) (file formats such as .pdb, .gro, .h5). Some trajectory files (.h5) might also contain the topology. In that case `trajectory file` and `topology file` are identical. | 

80| frame number | The number of the frame in the `trajectory file`. If you index your trajectories by frame number, use this number to reload this specific trajectory frame: `import mdtraj as md; frame = md.load_frame(trajectory_file, index=frame, top=topology_file)`<br />or<br />`import MDAnalysis as mda; frame = mda.Universe(topology_file, trajectory_file).trajectory[frame]` | 

81| time | The time of the frame. This can be used for time-based indexing of trajectories. `gmx trjconv -f $traj_file -s $top_file -dump $time` | 

82| cluster id | The id of the cluster. This column is identical in the whole csv file but can be used to merge multiple csv files to analyze multiple clusters at once. | 

83| trajectory number | The number of the trajectory in the full dataset. This corresponds to the line number in the file {{parents_trajs}}. If many trajectories have been loaded, the first trajectory is 0, and so on. If only one trajectory is loaded, its `trajectory number` might also be `None`. | 

84| unique id in set of {{n_trajs}} trajs | This is an integer number with a unique identifier of every frame of every trajectory given in {{parents_trajs}}. The frames of trajectory number 0 are enumerated 0, 1, ... n. The frames of the next trajectory (trajectory number 1) are enumerated n + 1, n + 2, ... n + m. The frames of trajectory number 2 are enumerated n + m + 1, n + m + 2, and so on. This way every frame gets a unique integer identifier. | 

85 
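The enumeration of the unique ids in the last column can be sketched in plain Python. This is an illustration and not the code `encodermap` uses. `unique_frame_ids` is a hypothetical helper, and the frame counts are made up.

```python
from itertools import accumulate


def unique_frame_ids(n_frames_per_traj):
    # Hypothetical helper: returns (trajectory number, frame number, unique id).
    # The offset of a trajectory is the number of frames in all trajectories before it.
    offsets = [0] + list(accumulate(n_frames_per_traj))[:-1]
    ids = []
    for traj_num, (offset, n_frames) in enumerate(zip(offsets, n_frames_per_traj)):
        for frame in range(n_frames):
            ids.append((traj_num, frame, offset + frame))
    return ids


ids = unique_frame_ids([3, 2, 4])  # three trajectories with 3, 2 and 4 frames
print(ids[0])   # (0, 0, 0)
print(ids[3])   # (1, 0, 3)
print(ids[-1])  # (2, 3, 8)
```

Every frame gets exactly one id, and the ids continue across trajectory boundaries without gaps.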

86## {{selector_npy_name}} 

87 

88This is a 2D numpy array of the points of the selector used. The selector is a widget from `matplotlib.widgets` that can interactively select points in a 2D scatter plot. In `encodermap`, 4 selectors are available: 

89 

90- Rectangle: The selector contains 4 points, the xy coordinates of the corners of the rectangle in data coordinates. 

91- Polygon: Similar to Rectangle, a collection of points. The first and last point are identical. 

92- Ellipse: A collection of points describing the outline of the Ellipse. 

93- Lasso: A collection of points following a free-hand drawn shape. 

94 
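Because every selector is stored as a collection of outline points, you can test on your own whether a low-dimensional point lies inside the selection. The following is a minimal sketch using the ray casting algorithm in plain Python. It is not how `encodermap` selects points. `point_in_polygon` is a hypothetical helper, and the vertices are example values.

```python
def point_in_polygon(point, vertices):
    # Ray casting: count how often a horizontal ray starting at `point`
    # crosses the outline. An odd number of crossings means inside.
    x, y = point
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that span the y coordinate of the point can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# A closed outline: the first and the last point are identical.
outline = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
print(point_in_polygon((1.0, 0.5), outline))  # True
print(point_in_polygon((3.0, 0.5), outline))  # False
```

The repeated first point forms an edge of length zero, which the check skips, so closed and open outlines give the same result.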

95## {{current_clustering}} 

96 

97This is a numpy array containing the cluster numbers of all previously selected clusters. If this cluster has a cluster id of 0, this array contains only 0s and -1s. Its length equals the number of frames in the analyzed trajectories. 

98 

99```python 

100>>> import encodermap as em 

101>>> import numpy as np 

102>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}') 

103>>> current_clustering = np.load('{{current_clustering}}') 

104>>> len(current_clustering) == trajs.n_frames 

105True 

106``` 

107 

108If this cluster has a higher cluster id, all previously selected clusters can be accessed with this array: 

109 

110```python 

111>>> import encodermap as em 

112>>> import numpy as np 

113>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}') 

114>>> current_clustering = np.load('{{current_clustering}}') # load the cluster numbers of all frames 

115>>> trajs.load_CVs(current_clustering, 'cluster_membership') # load the cluster membership as collective variables 

116>>> indices_some_other_cluster = np.where(trajs.cluster_membership == 2)[0] 

117>>> traj_indices = trajs.id[indices_some_other_cluster] # more on this line in a separate section 

118>>> cluster_trajs = trajs[traj_indices] 

119>>> len(traj_indices) == cluster_trajs.n_frames 

120True 

121``` 

122 

123## {{png_name}} 

124 

125This is just an image. Here it is: 

126 

127![Cluster Image]({{png_name}}) 

128 

129## Why the `trajs.id[indices]` part? 

130 

131This comes down to the question: What should be returned if a `TrajEnsemble` object is indexed via a list or numpy array? To answer it, we first take a step back and figure out what should happen if the `TrajEnsemble` class is indexed via a single integer. The most sensible result is the `SingleTraj` object indexed by this integer. Consider this example: 

132 

133```python 

134>>> import encodermap as em 

135>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb') 

136>>> print(traj1.basename) 

137traj1 

138>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb') 

139>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb') 

140>>> trajs = em.TrajEnsemble([traj1, traj2, traj3]) 

141>>> print([t.basename for t in trajs]) 

142['traj1', 'traj2', 'traj3'] 

143>>> integer_indexing = trajs[2] 

144>>> print(integer_indexing == traj3) 

145True 

146>>> print(integer_indexing.basename) 

147traj3 

148``` 

149 

150Using a list of ints or a numpy array of ints thus returns a new `TrajEnsemble`, but only with the `SingleTraj` objects indexed by the ints. Consider this example: 

151 

152```python 

153>>> import encodermap as em 

154>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb') 

155>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb') 

156>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb') 

157>>> trajs = em.TrajEnsemble([traj1, traj2, traj3]) 

158>>> print([t.basename for t in trajs]) 

159['traj1', 'traj2', 'traj3'] 

160>>> indices = [1, 2] 

161>>> new_trajs = trajs[indices] 

162>>> print([t.basename for t in new_trajs]) 

163['traj2', 'traj3'] 

164``` 

165 

166This finally brings us to the `traj_indices = trajs.id[indices]` syntax used in section {{indices_npy_name}}. It returns a numpy array with ndim = 2, in which every row is a pair of trajectory number and frame number, so you can index single frames. Let's say we want to have a `TrajEnsemble`, but only with frame 10 of trajectory 0, frame 20 of trajectory 1 and frame 30 of trajectory 2. We also add the frames 2 to 5 of trajectory 1. The syntax is as follows: 

167 

168```python 

169>>> import encodermap as em 

170>>> import numpy as np 

171>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb') 

172>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb') 

173>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb') 

174>>> trajs = em.TrajEnsemble([traj1, traj2, traj3]) 

175>>> print([t.basename for t in trajs]) 

176['traj1', 'traj2', 'traj3'] 

177>>> print([t.n_frames for t in trajs]) 

178[100, 100, 100] 

179>>> indices = np.array([  # every row is [trajectory number, frame number] 

180...     [0, 10], 

181...     [1, 20], 

182...     [2, 30], 

183...     [1, 2], 

184...     [1, 3], 

185...     [1, 4], 

186...     [1, 5]]) 

187>>> new_trajs = trajs[indices] 

188>>> print([t.basename for t in new_trajs]) 

189['traj1', 'traj2', 'traj3', 'traj2', 'traj2', 'traj2', 'traj2'] 

190>>> print([t.n_frames for t in new_trajs]) 

191[1, 1, 1, 1, 1, 1, 1] 

192>>> print(all(isinstance(t, em.SingleTraj) for t in new_trajs)) 

193True 

194``` 

195 

196So all in all, a 1D array of ints indexes whole trajectories, while a 2D array of ints indexes trajectories and frames. 

197 
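The three indexing modes can be summarized with a toy model in plain Python. It only mimics the behavior described in this section with nested lists and is not the implementation of `TrajEnsemble`. `index_ensemble` and the frame labels are hypothetical.

```python
def index_ensemble(trajs, key):
    # Toy model: `trajs` is a list of trajectories, each a list of frames.
    if isinstance(key, int):
        # A single int returns one trajectory.
        return trajs[key]
    if all(isinstance(k, int) for k in key):
        # A 1D collection of ints returns whole trajectories.
        return [trajs[k] for k in key]
    # A 2D collection of (trajectory, frame) pairs returns single frames,
    # each wrapped as its own trajectory with one frame.
    return [[trajs[traj_num][frame]] for traj_num, frame in key]


trajs = [["a0", "a1", "a2"], ["b0", "b1", "b2"], ["c0", "c1", "c2"]]
print(index_ensemble(trajs, 2))                 # ['c0', 'c1', 'c2']
print(index_ensemble(trajs, [1, 2]))            # [['b0', 'b1', 'b2'], ['c0', 'c1', 'c2']]
print(index_ensemble(trajs, [(0, 1), (2, 0)]))  # [['a1'], ['c0']]
```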

198## What is a `common_str`? 

199 

200Encodermap's `TrajEnsemble` and `SingleTraj` classes contain a class variable called `common_str`. The common string is a way to group trajectory files that share the same topology. This comes in handy when you run many simulations with the same topology and want to compare them to simulations with a similar, but different topology. Let's consider this scenario: You run simulations of the short peptides AFFA and FAAF. Both peptides have the same number of atoms but different topologies. They still share some joint phase space and can be considered similar in some regards. You set up some simulations from your AFFA.pdb and FAAF.pdb files and run them. Now you have these files to consider: 

201 

202- AFFA.pdb: AFFA_traj1.xtc, AFFA_traj2.xtc, AFFA_traj3.xtc 

203- FAAF.pdb: FAAF_traj.xtc 

204 

205And you want to compare them. For this you need to assign the pdb files to the corresponding xtc files. Luckily, you have chosen a naming scheme that lets you group them by the substrings AFFA and FAAF. You can load all trajectories with encodermap using the `TrajEnsemble` class. 

206 

207```python 

208import encodermap as em 

209trajs = em.TrajEnsemble( 

210    ['AFFA_traj1.xtc', 'AFFA_traj2.xtc', 'AFFA_traj3.xtc', 'FAAF_traj.xtc'], 

211    ['AFFA.pdb', 'FAAF.pdb'], 

212    common_str=['FAAF', 'AFFA'] 

213) 

214``` 

215 
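The idea behind the grouping can be sketched without `encodermap`. The following plain Python example assigns every trajectory file to the topology file that contains the same substring. It only illustrates the concept; the matching done inside `encodermap` may differ. `group_by_common_str` is a hypothetical helper.

```python
def group_by_common_str(traj_files, top_files, common_str):
    # Hypothetical helper: maps every common string to a tuple of
    # (topology file, list of trajectory files).
    groups = dict()
    for cs in common_str:
        tops = [top for top in top_files if cs in top]
        if len(tops) != 1:
            raise ValueError(f"Expected exactly one topology containing {cs!r}.")
        groups[cs] = (tops[0], [traj for traj in traj_files if cs in traj])
    return groups


groups = group_by_common_str(
    ["AFFA_traj1.xtc", "AFFA_traj2.xtc", "AFFA_traj3.xtc", "FAAF_traj.xtc"],
    ["AFFA.pdb", "FAAF.pdb"],
    ["FAAF", "AFFA"],
)
print(groups["FAAF"])  # ('FAAF.pdb', ['FAAF_traj.xtc'])
print(groups["AFFA"])  # ('AFFA.pdb', ['AFFA_traj1.xtc', 'AFFA_traj2.xtc', 'AFFA_traj3.xtc'])
```

This sketch raises an error if a common string matches no topology file or more than one, because the assignment would then be ambiguous.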

216## Rendering this document 

217 

218If you don't like viewing plain markdown files in a text viewer, there are many viewers available that render markdown nicely. I am currently using Typora: 

219 

220https://typora.io/ 

221 

222If you want to create a pdf from this document you can try a combination of pandoc, latex and groff. 

223 

224### HTML 

225 

226```bash 

227pandoc {{filename}}.md -o {{filename}}.html 

228``` 

229 

230### Latex 

231 

232```bash 

233pandoc {{filename}}.md -o {{filename}}.pdf 

234``` 

235 

236### Groff 

237 

238```bash 

239pandoc {{filename}}.md -t ms -o {{filename}}.pdf 

240``` 

241 

242## Debug Info 

243 

244``` 

245encodermap.__version__ = {{encodermap_version}} 

246system_user = {{system_user}} 

247platform = {{platform}} 

248platform_release = {{platform_release}} 

249platform_version = {{platform_version}} 

250architecture = {{architecture}} 

251hostname = {{hostname}} 

252ip_address = {{ip_address}} 

253mac_address = {{mac_address}} 

254processor = {{processor}} 

255ram = {{ram}} 

256pip freeze = {{pip_freeze}} 

257 

258``` 

259 

260 

261 

262"""