Coverage for encodermap/plot/jinja

1# -*- coding: utf-8 -*-

2# encodermap/plot/jinja_template.py

3################################################################################

4# EncoderMap: A python library for dimensionality reduction.

8# Authors:

9# Kevin Sawade

10#

11# Encodermap is free software: you can redistribute it and/or modify

12# it under the terms of the GNU Lesser General Public License as

13# published by the Free Software Foundation, either version 2.1

14# of the License, or (at your option) any later version.

15# This package is distributed in the hope that it will be useful to other

# researchers. IT DOES NOT COME WITH ANY WARRANTY WHATSOEVER; without even the

17# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

18# See the GNU Lesser General Public License for more details.

19#

20# See <http://www.gnu.org/licenses/>.

21################################################################################

22"""This is a template for a README.md generated when a user writes a cluster to disk."""

24xtc_rebuild = """\

25This file can be used to rebuild the clustering of the trajectories like so:

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
>>> cluster_membership = np.full(trajs.n_frames, -1) # fill array with -1, meaning outliers
>>> indices = np.load('{{indices_npy_name}}') # load the indices
>>> cluster_membership[indices] = {{cluster_id}} # set the cluster number of the indices
>>> trajs.load_CVs(cluster_membership, 'cluster_membership') # load the cluster membership as collective variables
>>> traj_indices = trajs.id[indices] # more on this line in a separate section
>>> cluster_trajs = trajs[traj_indices]
```

38"""

40h5_rebuild = """\

41This file can be used to rebuild the clustering of the trajectories like so:

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_dataset('{{h5_file}}')
>>> cluster_membership = np.full(trajs.n_frames, -1) # fill array with -1, meaning outliers
>>> indices = np.load('{{indices_npy_name}}') # load the indices
>>> cluster_membership[indices] = {{cluster_id}} # set the cluster number of the indices
>>> trajs.load_CVs(cluster_membership, 'cluster_membership') # load the cluster membership as collective variables
>>> traj_indices = trajs.id[indices] # more on this line in a separate section
>>> cluster_trajs = trajs[traj_indices]
```

54"""

56xtc_parents = """\

57## {{parents_trajs}}

59This plain text document contains the absolute paths to all trajectory files, their corresponding topology files and their corresponding `common_str`, that were considered during the clustering. Some of the trajectory files here might not take part in the actual cluster, but they are here in this file nonetheless. You can reload the trajectories with the `from_textfile()` alternative constructor of the `TrajEnsemble` class.

61```python

62import encodermap as em

63trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')

64```

65"""

67h5_parents = """\

68## Parents Trajs

70The parent trajectories are all saved in a single h5 file.

72```python

73import encodermap as em

74trajs = em.TrajEnsemble.from_dataset('{{h5_file}}')

75```

76"""

78template = """# Cluster {{cluster_id}} generated at {{now}}

80## What just happened?

You either selected a cluster with the `InteractivePlotting` class of `encodermap` or you called the `_unpack_cluster_info()` function from `encodermap.plot.utils`. Many files have been put into a directory at {{cluster_abspath}}, which can be used to rebuild the cluster. The cluster you selected has been assigned the number {{cluster_id}}. If your cluster number is 0, your cluster is the first selected cluster of these MD trajectories (outliers are assigned -1). If your cluster has a number other than 0, you have selected another cluster before, and the cluster_membership is given by this unique identifier.

84Here is a general rundown of the files created:

88## {{h5_name}}

This file contains roughly 10 frames. They were selected from the original {{cluster_n_points}} points inside the cluster by slicing them evenly, which is why the number is only approximately 10 and is sometimes higher. You can load this pdb whichever way you like and render a nice image of the cluster.
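
Why the count is only roughly 10 can be illustrated with a small sketch. It assumes, purely for illustration, that the frames are picked with an integer step of `n_points // 10`; the helper `evenly_sliced_indices` is hypothetical and not part of `encodermap`.

```python
def evenly_sliced_indices(n_points, target=10):
    # Hypothetical helper: pick roughly `target` evenly spaced frame indices.
    # The integer step is an assumption about how the slicing is done.
    step = max(n_points // target, 1)
    return list(range(0, n_points, step))

print(len(evenly_sliced_indices(100)))  # 10 frames, because the step is 10
print(len(evenly_sliced_indices(25)))   # 13 frames, because the step is only 2
```

Whenever the number of points is not a multiple of 10, an integer step like this overshoots the target.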

92## Other pdb and xtc files

The other pdb and xtc files contain data to rebuild not only the ca. 10 frames from the pdb, but the whole cluster. They are enumerated the same way they are enumerated in {{parents_trajs}}. The file `cluster_id_{{cluster_id}}_traj_0.xtc` corresponds to `cluster_id_{{cluster_id}}_start_traj_0_from_{{basename}}.pdb`, `cluster_id_{{cluster_id}}_traj_1.xtc` corresponds to `cluster_id_{{cluster_id}}_start_traj_1_from_{{basename}}.pdb` and so on.

96## {{lowd_npy_name}}

A 2D numpy array with as many points as there are frames in all `cluster_id_{{cluster_id}}_traj_X.xtc` files combined. This is the low-dimensional representation of the whole cluster.

100## {{indices_npy_name}}

101

102{{rebuild_clustering_info}}

103

104## {{pdb_origin_names}}

105

This file is only created when the structures inside the cluster have different numbers of atoms and thus cannot be loaded with the same topology. This plain text file records where the pdb files were copied from. This might only be useful in very niche scenarios.

107

108## {{csv_name}}

109

This .csv table contains information about every point inside the cluster. Its columns give the following information:

111

| Column | Description |
| ------------------------------------- | ------------------------------------------------------------ |
| trajectory file | Contains the trajectory data (file formats such as .xtc, .dcd, .h5). |
| topology file | Contains the topology (i.e. atom types, masses, residues) (file formats such as .pdb, .gro, .h5). Some trajectory files (.h5) might also contain the topology. In that case `trajectory file` and `topology file` are identical. |
| frame number | The number of the frame of the `trajectory file`. If you index your trajectories by frame number, use this number to reload this specific trajectory frame. `import mdtraj as md; frame = md.load_frame(trajectory_file, index=frame, top=topology_file)`<br />or<br />`import MDAnalysis as mda; frame = mda.Universe(topology_file, trajectory_file).trajectory[frame]` |
| time | The time of the frame. This can be used for time-based indexing of trajectories. `gmx trjconv -f $traj_file -s $top_file -dump $time` |
| cluster id | The id of the cluster. This column is identical in the whole csv file but can be used to merge multiple csv files to analyze multiple clusters at once. |
| trajectory number | The number of the trajectory in the full dataset. This corresponds to the line number in the file {{parents_trajs}}. If many trajectories have been loaded, the first trajectory is 0, and so on. If only one trajectory is loaded, its `trajectory number` might also be `None`. |
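
Merging several of these tables, as mentioned for the `cluster id` column, can be sketched with the standard library alone. The column name `cluster id` is taken from the table and assumed to be the literal csv header; the helpers `merge_cluster_tables` and `points_per_cluster` as well as the file names in the usage comment are hypothetical.

```python
import csv
from collections import Counter

def merge_cluster_tables(csv_paths):
    # Hypothetical helper: read several cluster csv files into one list of rows.
    rows = []
    for path in csv_paths:
        with open(path, newline='') as handle:
            rows.extend(csv.DictReader(handle))
    return rows

def points_per_cluster(rows, column='cluster id'):
    # Count how many points each cluster contributes to the merged table.
    # The column name is an assumption based on the table above.
    return Counter(row[column] for row in rows)

# Usage, with placeholder file names:
# rows = merge_cluster_tables(['cluster_id_0.csv', 'cluster_id_1.csv'])
# print(points_per_cluster(rows))
```

Note that `csv.DictReader` returns every value as a string, so the cluster ids are counted as `'0'`, `'1'` and so on.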

119

120## {{selector_npy_name}}

121

This is a 2D numpy array of the points of the Selector used. The selector is a matplotlib widget that can interactively select points in a 2D scatter plot. In `encodermap`, 4 selectors are available:

123

124- Rectangle: For Rectangle the Selector will contain 4 points. The xy coordinates of the corners of the rectangle in data coordinates.

125- Polygon: Similar to Rectangle a collection of points. The first and last point are identical.

126- Ellipse: A collection of points describing the outline of the Ellipse.

127- Lasso: A collection of points following a free-hand drawn shape.
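
The saved outline can be used to check which low-dimensional points fall inside the selection. Below is a minimal pure-Python sketch using ray casting; `point_in_outline` is a hypothetical helper and not part of `encodermap`. With matplotlib installed, `matplotlib.path.Path(outline).contains_points(points)` serves the same purpose.

```python
def point_in_outline(point, outline):
    # Hypothetical helper: ray-casting test for a point inside a closed outline.
    # `outline` is a sequence of (x, y) vertices, such as the selector array.
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(outline, list(outline[1:]) + [outline[0]]):
        # Check whether a horizontal ray starting at the point crosses this edge.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

rectangle = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
print(point_in_outline((1.0, 0.5), rectangle))  # True
print(point_in_outline((3.0, 0.5), rectangle))  # False
```

Outlines whose first and last point are identical, such as the Polygon selector, work as well, because the resulting zero-length closing edge is never crossed.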

128

129## {{current_clustering}}

130

This is a numpy array containing the cluster numbers of all previously selected clusters. If this cluster has a cluster id of 0, this array will only contain 0s and -1s, and its length will equal the number of frames in the analyzed trajectories.

132

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
>>> current_clustering = np.load('{{current_clustering}}')
>>> len(current_clustering) == trajs.n_frames
True
```

141

If this cluster has a higher cluster id, all previously selected clusters can be accessed with this array:

143

```python
>>> import encodermap as em
>>> import numpy as np
>>> trajs = em.TrajEnsemble.from_textfile('{{parents_trajs}}')
>>> current_clustering = np.load('{{current_clustering}}') # load the cluster numbers of all selected clusters
>>> trajs.load_CVs(current_clustering, 'cluster_membership') # load the cluster membership as collective variables
>>> indices_some_other_cluster = np.where(trajs.cluster_membership == 2)[0]
>>> traj_indices = trajs.id[indices_some_other_cluster] # more on this line in a separate section
>>> cluster_trajs = trajs[traj_indices]
>>> len(traj_indices) == cluster_trajs.n_frames
True
```

156

157## {{png_name}}

158

This is just an image. Here it is:

160

161![Cluster Image]({{png_name}})

162

163## Why the `trajs.id[indices]` part?

164

This comes down to the question: What should be returned if a `TrajEnsemble` object is indexed via a list or numpy array? To answer it, we first take a step back and figure out what should happen if the `TrajEnsemble` class is indexed via a single integer. The most sensible behavior is to return the `SingleTraj` object at that position. Consider this example:

166

```python
>>> import encodermap as em
>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb')
>>> print(traj1.basename)
traj1
>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb')
>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb')
>>> trajs = em.TrajEnsemble([traj1, traj2, traj3])
>>> print([t.basename for t in trajs])
['traj1', 'traj2', 'traj3']
>>> integer_indexing = trajs[2]
>>> print(integer_indexing == traj3)
True
>>> print(integer_indexing.basename)
traj3
```

183

184Using a list of int or a numpy array of int thus returns a new `TrajEnsemble` class, but with the `SingleTraj` classes indexed by the ints. Consider this example:

185

```python
>>> import encodermap as em
>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb')
>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb')
>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb')
>>> trajs = em.TrajEnsemble([traj1, traj2, traj3])
>>> print([t.basename for t in trajs])
['traj1', 'traj2', 'traj3']
>>> indices = [1, 2]
>>> new_trajs = trajs[indices]
>>> print([t.basename for t in new_trajs])
['traj2', 'traj3']
```

199

Finally, we arrive at the `traj_indices = trajs.id[indices]` syntax used in section {{indices_npy_name}}. This returns a numpy array with ndim = 2 with which you can index single frames. Let's say we want to have a `TrajEnsemble` class, but only with frame 10 of `traj1`, frame 20 of `traj2` and frame 30 of `traj3`. We also add frames 2 to 5 of `traj2`. The syntax will be as follows:

201

```python
>>> import encodermap as em
>>> import numpy as np
>>> traj1 = em.SingleTraj('path/to/traj1.xtc', top='path/to/top1.pdb')
>>> traj2 = em.SingleTraj('path/to/traj2.xtc', top='path/to/top2.pdb')
>>> traj3 = em.SingleTraj('path/to/traj3.xtc', top='path/to/top3.pdb')
>>> trajs = em.TrajEnsemble([traj1, traj2, traj3])
>>> print([t.basename for t in trajs])
['traj1', 'traj2', 'traj3']
>>> print([t.n_frames for t in trajs])
[100, 100, 100]
>>> # each row is [trajectory number, frame number]; trajectory numbers start at 0
>>> indices = np.array([
...     [0, 10],
...     [1, 20],
...     [2, 30],
...     [1, 2],
...     [1, 3],
...     [1, 4],
...     [1, 5],
... ])
>>> new_trajs = trajs[indices]
>>> print([t.basename for t in new_trajs])
['traj1', 'traj2', 'traj3', 'traj2', 'traj2', 'traj2', 'traj2']
>>> print([t.n_frames for t in new_trajs])
[1, 1, 1, 1, 1, 1, 1]
>>> print(all(isinstance(t, em.SingleTraj) for t in new_trajs))
True
```

229

So all in all, a 1D array of ints indexes single trajectories, while a 2D array of ints indexes trajectories and frames.
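
How a flat array of frame indices turns into such a 2D array can be sketched without `encodermap`. The helper `build_id_table` below is a hypothetical stand-in for `trajs.id`; it assumes that `trajs.id` holds one (trajectory number, frame number) pair per frame of the ensemble, in loading order.

```python
def build_id_table(frames_per_traj):
    # Hypothetical stand-in for `trajs.id`: one (traj number, frame number)
    # pair for every frame, in the order the trajectories were loaded.
    return [
        (traj_num, frame)
        for traj_num, n_frames in enumerate(frames_per_traj)
        for frame in range(n_frames)
    ]

id_table = build_id_table([100, 100, 100])
indices = [10, 120, 230]  # flat frame indices, like the ones in the indices file
traj_indices = [id_table[i] for i in indices]
print(traj_indices)  # [(0, 10), (1, 20), (2, 30)]
```

The flat index 120 lands in the second trajectory because the first one already accounts for frames 0 to 99.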

231

232## What is a `common_str`?

233

Encodermap's `TrajEnsemble` and `SingleTraj` classes contain a class variable called `common_str`. The common string is a way to order trajectory files from the same topology. This comes in handy when you run many simulations with the same topology and want to compare them to simulations with a similar, but different topology. Let's consider this scenario: You run simulations of the short peptides AFFA and FAAF. Both peptides have the same number of atoms but different topologies. They still share some joint phase space and can be considered similar in some regards. You set up some simulations from your AFFA.pdb and FAAF.pdb files, and now you have these files to consider:

235

236- AFFA.pdb: AFFA_traj1.xtc, AFFA_traj2.xtc, AFFA_traj3.xtc

237- FAAF.pdb: FAAF_traj.xtc

238

And you want to compare them. For this you need to assign the pdb files to the corresponding xtc files. Luckily, you have chosen a naming scheme that lets you group them by the substrings AFFA and FAAF. You can load all trajectories with encodermap using the `TrajEnsemble` class.

240

```python
import encodermap as em
trajs = em.TrajEnsemble(
    ['AFFA_traj1.xtc', 'AFFA_traj2.xtc', 'AFFA_traj3.xtc', 'FAAF_traj.xtc'],
    ['AFFA.pdb', 'FAAF.pdb'],
    common_str=['FAAF', 'AFFA'],
)
```
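
The grouping idea itself can be sketched in plain Python: every trajectory is assigned to the topology whose file name contains the same common string. The helper `group_by_common_str` is hypothetical and only illustrates the matching; it makes no claim about how `encodermap` implements it.

```python
def group_by_common_str(traj_files, top_files, common_str):
    # Hypothetical helper: pair each topology with the trajectories that
    # share one of the substrings in `common_str`.
    groups = {}
    for cs in common_str:
        tops = [top for top in top_files if cs in top]
        trajs = [traj for traj in traj_files if cs in traj]
        if len(tops) != 1:
            raise ValueError('expected exactly one topology for ' + cs)
        groups[tops[0]] = trajs
    return groups

groups = group_by_common_str(
    ['AFFA_traj1.xtc', 'AFFA_traj2.xtc', 'AFFA_traj3.xtc', 'FAAF_traj.xtc'],
    ['AFFA.pdb', 'FAAF.pdb'],
    common_str=['FAAF', 'AFFA'],
)
print(groups['FAAF.pdb'])  # ['FAAF_traj.xtc']
```

Matching by substring is also why the naming scheme matters: a common string that appears in more than one topology file name would make the assignment ambiguous.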

249

250## Rendering this document

251

If you don't like viewing plain markdown files in a text viewer, there are many viewers available that render markdown nicely. I am currently using Typora:

253

254https://typora.io/

255

256If you want to create a pdf from this document you can try a combination of pandoc, latex and groff.

257

258### HTML

259

260```bash

261pandoc {{filename}}.md -o {{filename}}.html

262```

263

264### Latex

265

266```bash

267pandoc {{filename}}.md -o {{filename}}.pdf

268```

269

270### Groff

271

272```bash

273pandoc {{filename}}.md -t ms -o {{filename}}.pdf

274```

275

276## Debug Info

277

278```

279encodermap.__version__ = {{encodermap_version}}

280system_user = {{system_user}}

281platform = {{platform}}

282platform_release = {{platform_release}}

283platform_version = {{platform_version}}

284architecture = {{architecture}}

285hostname = {{hostname}}

286ip_address = {{ip_address}}

287mac_address = {{mac_address}}

288processor = {{processor}}

289ram = {{ram}}

290pip freeze = {{pip_freeze}}

292```

296"""

Coverage for encodermap/plot/jinja_template.py: 100%

6 statements