Coverage for encodermap/plot/jinja_template.py: 100%

6 statements  

coverage.py v7.4.1, created at 2024-12-31 16:54 +0100

# -*- coding: utf-8 -*-
# encodermap/plot/jinja_template.py
################################################################################
# EncoderMap: A python library for dimensionality reduction.
#
# Copyright 2019-2024 University of Konstanz and the Authors
#
# Authors:
# Kevin Sawade
#
# Encodermap is free software: you can redistribute it and/or modify
# it under the terms of the GNU Lesser General Public License as
# published by the Free Software Foundation, either version 2.1
# of the License, or (at your option) any later version.
# This package is distributed in the hope that it will be useful to other
# researchers. IT DOES NOT COME WITH ANY WARRANTY WHATSOEVER; without even the
# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# See the GNU Lesser General Public License for more details.
#
# See <http://www.gnu.org/licenses/>.
################################################################################
"""This is a template for a README.md generated when a user writes a cluster to disk."""

xtc_rebuild = """\
This file can be used to rebuild the clustering of the trajectories like so:

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
>>> cluster_membership = np.full(trajs.n_frames, -1)  # fill array with -1, meaning outliers
>>> indices = np.load('{{indices_npy_name}}')  # load the indices
>>> cluster_membership[indices] = {{cluster_id}}  # set the cluster number of the indices
>>> trajs.load_CVs(cluster_membership, 'cluster_membership')  # load the cluster membership as collective variables
>>> traj_indices = trajs.id[indices]  # more on this line in a separate section
>>> cluster_trajs = trajs[traj_indices]
```
"""

h5_rebuild = """\
This file can be used to rebuild the clustering of the trajectories like so:

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_dataset('{{h5_file}}')
>>> cluster_membership = np.full(trajs.n_frames, -1)  # fill array with -1, meaning outliers
>>> indices = np.load('{{indices_npy_name}}')  # load the indices
>>> cluster_membership[indices] = {{cluster_id}}  # set the cluster number of the indices
>>> trajs.load_CVs(cluster_membership, 'cluster_membership')  # load the cluster membership as collective variables
>>> traj_indices = trajs.id[indices]  # more on this line in a separate section
>>> cluster_trajs = trajs[traj_indices]
```
"""

xtc_parents = """\
## {{parents_trajs}}

This plain-text document lists the absolute paths of all trajectory files that were considered during the clustering, together with their corresponding topology files and `common_str` values. Some of the trajectories listed here might not contribute any frames to this cluster, but they are included in this file nonetheless. You can reload the trajectories with the `from_textfile()` alternative constructor of the `TrajEnsemble` class.

```python
import encodermap as em
trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
```
"""

h5_parents = """\
## Parents Trajs

The parent trajectories are all saved in a single h5 file.

```python
import encodermap as em
trajs = em.TrajEnsemble.from_dataset('{{h5_file}}')
```
"""

template = """# Cluster {{cluster_id}} generated at {{now}}

## What just happened?

You either selected a cluster with the `InteractivePlotting` class of `encodermap` or you called the `_unpack_cluster_info()` function from `encodermap.plot.utils`. Several files that can be used to rebuild the cluster have been written to the directory {{cluster_abspath}}. The cluster you selected has been assigned the number {{cluster_id}}. If your cluster number is 0, your cluster is the first cluster selected from these MD trajectories (outliers are assigned -1). If your cluster number is different from 0, other clusters have been selected before, and this number is the unique identifier of your cluster in `cluster_membership`.

Here is a general rundown of the files created:

## {{h5_name}}

This file contains roughly 10 frames, selected from the original {{cluster_n_points}} points inside the cluster by slicing them evenly. Because of the even slicing the number of structures is only approximately 10; sometimes it is more. You can load this pdb whichever way you like and render a nice image of the cluster.
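
The following minimal sketch illustrates why even slicing yields only roughly 10 frames. It assumes a constant stride of `n_points // 10`; the function name `evenly_sliced_indices` is hypothetical and this is not necessarily how `encodermap` picks the frames.

```python
def evenly_sliced_indices(n_points, target=10):
    # Assumption: a constant stride is used so that about `target` frames remain.
    step = max(n_points // target, 1)
    return list(range(0, n_points, step))


print(len(evenly_sliced_indices(100)))  # 10
print(len(evenly_sliced_indices(105)))  # 11, one more than the target
print(len(evenly_sliced_indices(7)))    # 7, fewer points than the target
```

With 105 points the stride is still 10, so the slice picks up an eleventh frame at index 100.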

## Other pdb and xtc files

The other pdb and xtc files contain the data to rebuild not only the ca. 10 frames from the pdb, but the whole cluster. They are enumerated the same way they are enumerated in {{parents_trajs}}. The file `cluster_id_{{cluster_id}}_traj_0.xtc` corresponds to `cluster_id_{{cluster_id}}_start_traj_0_from_{{basename}}.pdb`, `cluster_id_{{cluster_id}}_traj_1.xtc` corresponds to `cluster_id_{{cluster_id}}_start_traj_1_from_{{basename}}.pdb` and so on.

## {{lowd_npy_name}}

A 2D numpy array with as many points as there are frames in all `cluster_id_{{cluster_id}}_traj_X.xtc` files combined. This is the low-dimensional representation of the whole cluster.

## {{indices_npy_name}}

{{rebuild_clustering_info}}

## {{pdb_origin_names}}

This file is only created when the structures inside the cluster have different numbers of atoms and thus cannot be loaded with the same topology. This plain-text file records where the pdb files were copied from. This might only be useful in very niche scenarios.

## {{csv_name}}

This .csv table contains information about every point inside the cluster. Its columns give the following information:

| column | description |
| ------------------------------------- | ------------------------------------------------------------ |
| trajectory file | Contains the trajectory data (file formats such as .xtc, .dcd, .h5). |
| topology file | Contains the topology of the trajectory (i.e. atom types, masses, residues) (file formats such as .pdb, .gro, .h5). Some trajectory files (.h5) might also contain the topology. In that case `trajectory file` and `topology file` are identical. |
| frame number | The number of the frame in the `trajectory file`. If you index your trajectories by frame number, use this number to reload this specific trajectory frame. `import mdtraj as md; frame = md.load_frame(trajectory_file, index=frame, top=topology_file)`<br />or<br />`import MDAnalysis as mda; frame = mda.Universe(topology_file, trajectory_file).trajectory[frame]` |
| time | The time of the frame. This can be used for time-based indexing of trajectories. `gmx trjconv -f $traj_file -s $top_file -dump $time` |
| cluster id | The id of the cluster. This column is identical in the whole csv file but can be used to merge multiple csv files to analyze multiple clusters at once. |
| trajectory number | The number of the trajectory in the full dataset. This corresponds to the line number in the file {{parents_trajs}}. If many trajectories have been loaded, the first trajectory is 0, and so on. If only one trajectory is loaded, its `trajectory number` might also be `None`. |
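
As a sketch of the merging mentioned for the `cluster id` column, the example below groups the rows of several cluster tables by their cluster id using only the standard library. It assumes that the csv header uses the column names from the table above verbatim; check the header of your file first. The helper names `merge_cluster_tables` and `make_demo_table` are hypothetical, and the demo tables are built in memory instead of being read from disk.

```python
import csv
import io
from collections import defaultdict


def merge_cluster_tables(csv_files):
    # Group the rows of several cluster tables by their cluster id.
    # Assumption: the header contains a column named 'cluster id'.
    rows_by_cluster = defaultdict(list)
    for f in csv_files:
        for row in csv.DictReader(f):
            rows_by_cluster[int(row['cluster id'])].append(row)
    return dict(rows_by_cluster)


def make_demo_table(cluster_id, frames):
    # Build a small in-memory table that stands in for a real csv file.
    fieldnames = ['trajectory file', 'frame number', 'cluster id']
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for frame in frames:
        writer.writerow({'trajectory file': 'traj1.xtc', 'frame number': frame, 'cluster id': cluster_id})
    buffer.seek(0)
    return buffer


merged = merge_cluster_tables([make_demo_table(0, [3, 4]), make_demo_table(1, [10])])
print(sorted(merged))  # [0, 1]
print(len(merged[0]))  # 2
```

For real files, pass objects opened with `open(path, newline='')` instead of the in-memory buffers.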

## {{selector_npy_name}}

This is a 2D numpy array of the points of the selector used. The selector is a matplotlib widget (from `matplotlib.widgets`) that can interactively select points in a 2D scatter plot. In `encodermap` 4 selectors are available:

- Rectangle: The selector contains 4 points, the xy coordinates of the corners of the rectangle in data coordinates.
- Polygon: Similar to Rectangle, a collection of points. The first and last points are identical.
- Ellipse: A collection of points describing the outline of the ellipse.
- Lasso: A collection of points following a free-hand drawn shape.
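
Since every selector is stored as an outline of xy points, the selection can in principle be recomputed from it. The sketch below uses a plain ray-casting test and is an illustration only, not the routine `encodermap` or matplotlib uses; the function name `point_in_polygon` and the example coordinates are made up.

```python
def point_in_polygon(point, vertices):
    # Ray casting: count how often a horizontal ray that starts at the point
    # crosses the outline. An odd number of crossings means "inside".
    x, y = point
    pts = [tuple(v) for v in vertices]
    inside = False
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangle selector: four corners in data coordinates.
selector = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
# Hypothetical low-dimensional points.
lowd = [(1.0, 0.5), (3.0, 0.5), (0.5, 2.0)]
selected = [i for i, p in enumerate(lowd) if point_in_polygon(p, selector)]
print(selected)  # [0]
```

A closed outline whose first and last points are identical, as for the Polygon selector, works as well, because the duplicated point only adds a zero-length edge that is never counted.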

## {{current_clustering}}

This is a numpy array containing the cluster numbers of all previously selected clusters. If this cluster has a cluster id of 0, this array will only contain 0s and -1s and will have as many entries as there are frames in the analyzed trajectories.

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
>>> current_clustering = np.load('{{current_clustering}}')
>>> len(current_clustering) == trajs.n_frames
True
```

If this cluster has a higher cluster id, all previously selected clusters can be accessed with this array:

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
>>> current_clustering = np.load('{{current_clustering}}')  # load the cluster numbers of all frames
>>> trajs.load_CVs(current_clustering, 'cluster_membership')  # load the cluster membership as collective variables
>>> indices_some_other_cluster = np.where(trajs.cluster_membership == 2)[0]
>>> traj_indices = trajs.id[indices_some_other_cluster]  # more on this line in a separate section
>>> cluster_trajs = trajs[traj_indices]
>>> len(traj_indices) == cluster_trajs.n_frames
True
```

## {{png_name}}

This is just an image. Here it is:

![Cluster Image]({{png_name}})

## Why the `trajs.id[indices]` part?

This comes down to the question of what should be returned when a `TrajEnsemble` object is indexed with a list or numpy array. To answer it, we first step back and figure out what should happen when the `TrajEnsemble` class is indexed with a single integer. The most sensible result is the `SingleTraj` object at that position. Consider this example:

```python
>>> import encodermap as em
>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb')
>>> print(traj1.basename)
traj1
>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb')
>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb')
>>> trajs = em.TrajEnsemble([traj1, traj2, traj3])
>>> print([t.basename for t in trajs])
['traj1', 'traj2', 'traj3']
>>> integer_indexing = trajs[2]
>>> print(integer_indexing == traj3)
True
>>> print(integer_indexing.basename)
traj3
```

Indexing with a list of ints or a 1D numpy array of ints thus returns a new `TrajEnsemble`, which contains only the `SingleTraj` objects at those positions. Consider this example:

```python
>>> import encodermap as em
>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb')
>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb')
>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb')
>>> trajs = em.TrajEnsemble([traj1, traj2, traj3])
>>> print([t.basename for t in trajs])
['traj1', 'traj2', 'traj3']
>>> indices = [1, 2]
>>> new_trajs = trajs[indices]
>>> print([t.basename for t in new_trajs])
['traj2', 'traj3']
```

Finally, we arrive at the `traj_indices = trajs.id[indices]` syntax used in section {{indices_npy_name}}. It returns a numpy array with ndim = 2, with which you can index single frames. Each row holds a trajectory number and a frame number, both counted from 0. Let's say we want a `TrajEnsemble` that contains only frame 10 of traj 0, frame 20 of traj 1 and frame 30 of traj 2. Maybe we also add the frames 2 to 5 of traj 1. The syntax is as follows:

```python
>>> import encodermap as em
>>> import numpy as np
>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb')
>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb')
>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb')
>>> trajs = em.TrajEnsemble([traj1, traj2, traj3])
>>> print([t.basename for t in trajs])
['traj1', 'traj2', 'traj3']
>>> print([t.n_frames for t in trajs])
[100, 100, 100]
>>> indices = np.array([
...     [0, 10],
...     [1, 20],
...     [2, 30],
...     [1, 2],
...     [1, 3],
...     [1, 4],
...     [1, 5],
... ])
>>> new_trajs = trajs[indices]
>>> print([t.basename for t in new_trajs])
['traj1', 'traj2', 'traj3', 'traj2', 'traj2', 'traj2', 'traj2']
>>> print([t.n_frames for t in new_trajs])
[1, 1, 1, 1, 1, 1, 1]
>>> print(all(isinstance(t, em.SingleTraj) for t in new_trajs))
True
```

So all in all: a 1D array of ints indexes whole trajectories, while a 2D array of ints indexes pairs of trajectory and frame.
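
The three indexing modes can be summarized with a toy model in plain Python. This is only an illustration of the semantics described above, not the implementation used by `TrajEnsemble`; the function `index_ensemble` and the frame labels are made up, and trajectories are modelled as simple lists of frames.

```python
def index_ensemble(ensemble, key):
    # `ensemble` is a plain list of trajectories, each a list of frames.
    if isinstance(key, int):
        # A single int picks one trajectory.
        return ensemble[key]
    if all(isinstance(k, int) for k in key):
        # A 1D sequence of ints picks whole trajectories.
        return [ensemble[k] for k in key]
    # A 2D sequence of (trajectory, frame) pairs picks single frames,
    # each wrapped in a list so it still looks like a one-frame trajectory.
    return [[ensemble[traj][frame]] for traj, frame in key]


# Three toy trajectories with 100 frames each; the frames are just labels.
toy = [[f'traj{t}_frame{f}' for f in range(100)] for t in (1, 2, 3)]

print(index_ensemble(toy, 2)[0])                # traj3_frame0
print(len(index_ensemble(toy, [1, 2])))         # 2
print(index_ensemble(toy, [[0, 10], [1, 20]]))  # [['traj1_frame10'], ['traj2_frame20']]
```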

## What is a `common_str`?

Encodermap's `TrajEnsemble` and `SingleTraj` classes contain a class variable called `common_str`. The common string is a way to group trajectory files that belong to the same topology. This comes in handy when you run many simulations with the same topology and want to compare them to simulations with a similar, but different topology. Let's consider this scenario: You run simulations of the short peptides AFFA and FAAF. Both peptides have the same number of atoms but different topologies. They still share some joint phase space and can be considered similar in some regards. You set up some simulations from your AFFA.pdb and FAAF.pdb files and run them. Now you have these files to consider:

- AFFA.pdb: AFFA_traj1.xtc, AFFA_traj2.xtc, AFFA_traj3.xtc
- FAAF.pdb: FAAF_traj.xtc

And you want to compare them. For this you need to assign the pdb files to the corresponding xtc files. Luckily, you have chosen a naming scheme that lets you group them by the substrings AFFA and FAAF. You can load all trajectories with encodermap using the `TrajEnsemble` class.

```python
import encodermap as em
trajs = em.TrajEnsemble(
    ['AFFA_traj1.xtc', 'AFFA_traj2.xtc', 'AFFA_traj3.xtc', 'FAAF_traj.xtc'],
    ['AFFA.pdb', 'FAAF.pdb'],
    common_str=['FAAF', 'AFFA']
)
```
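
The grouping idea can be sketched without `encodermap`. The example below assumes that matching is a simple substring test on the file names; that is an assumption made for illustration, and the helper `group_by_common_str` is hypothetical, not part of the library.

```python
def group_by_common_str(traj_files, top_files, common_str):
    # Map each common string to the one topology and to all trajectories
    # whose file names contain that string.
    groups = {}
    for cs in common_str:
        tops = [top for top in top_files if cs in top]
        if len(tops) != 1:
            raise ValueError(f'Expected exactly one topology for {cs!r}, found {len(tops)}.')
        groups[cs] = {'top': tops[0], 'trajs': [t for t in traj_files if cs in t]}
    return groups


groups = group_by_common_str(
    ['AFFA_traj1.xtc', 'AFFA_traj2.xtc', 'AFFA_traj3.xtc', 'FAAF_traj.xtc'],
    ['AFFA.pdb', 'FAAF.pdb'],
    common_str=['FAAF', 'AFFA'],
)
print(groups['AFFA']['trajs'])  # ['AFFA_traj1.xtc', 'AFFA_traj2.xtc', 'AFFA_traj3.xtc']
print(groups['FAAF']['top'])    # FAAF.pdb
```

The order of the entries in `common_str` does not matter here, because each file is matched by name rather than by position.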

## Rendering this document

If you don't like viewing plain markdown files in a text viewer, there are many viewers available that render markdown nicely. I am currently using Typora:

https://typora.io/

If you want to create a pdf from this document, you can try a combination of pandoc, LaTeX and groff.

### HTML

```bash
pandoc {{filename}}.md -o {{filename}}.html
```

### LaTeX

```bash
pandoc {{filename}}.md -o {{filename}}.pdf
```

### Groff

```bash
pandoc {{filename}}.md -t ms -o {{filename}}.pdf
```

## Debug Info

```
encodermap.__version__ = {{encodermap_version}}
system_user = {{system_user}}
platform = {{platform}}
platform_release = {{platform_release}}
platform_version = {{platform_version}}
architecture = {{architecture}}
hostname = {{hostname}}
ip_address = {{ip_address}}
mac_address = {{mac_address}}
processor = {{processor}}
ram = {{ram}}
pip freeze = {{pip_freeze}}

```

"""
296"""