Your Data#
Find the documentation of EncoderMap:
https://ag-peter.github.io/encodermap
For Google Colab only:#
If you’re on Google Colab, please uncomment these lines and install EncoderMap.
[1]:
# !wget https://raw.githubusercontent.com/AG-Peter/encodermap/main/tutorials/install_encodermap_google_colab.sh
# !sudo bash install_encodermap_google_colab.sh
Primer#
Now it’s time to apply what you have learned about dimensionality reduction with EncoderMap. Load your own data and get started! The data set should be a table in which each row contains one sample and the number of columns equals the dimensionality of the data set.
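As a minimal sketch of the expected table layout, the following builds a small synthetic data set with NumPy, writes it to a CSV file, and reads it back. The file name example_data.csv, the sizes, and the random values are assumptions made for illustration only and are not part of EncoderMap.

```python
import os
import tempfile

import numpy as np

# Hypothetical data set: 200 samples (rows), each with 6 features (columns).
rng = np.random.default_rng(seed=0)
n_samples, n_dimensions = 200, 6
example_data = rng.uniform(-np.pi, np.pi, size=(n_samples, n_dimensions))

# Write one sample per line, with the columns separated by commas.
example_path = os.path.join(tempfile.mkdtemp(), "example_data.csv")
np.savetxt(example_path, example_data, delimiter=",")

# Reading the file back gives an array of shape (n_samples, n_dimensions).
loaded = np.loadtxt(example_path, delimiter=",")
print(loaded.shape)
```

A file in this layout can be passed to the loading cell further down by setting csv_path to its location.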
Load Libraries#
[2]:
import encodermap as em
import matplotlib.pyplot as plt
import numpy as np
from math import pi
%config Completer.use_jedi=False
Load Your Data#
[3]:
csv_path = "path/to/your/data.csv"
high_d_data = np.loadtxt(csv_path, delimiter=",")
With the placeholder path left unchanged, this cell fails with FileNotFoundError: path/to/your/data.csv not found. Set csv_path to the location of your own file before running the remaining cells.
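Before training, it can help to check that the loaded table has the expected form. The sketch below wraps np.loadtxt in a hypothetical helper, load_table, that reports a missing file early, always returns a 2D array, and rejects NaN or infinite entries. The helper name, its checks, and the demo file are assumptions for illustration, not part of EncoderMap.

```python
import os
import tempfile

import numpy as np


def load_table(path, delimiter=",", skiprows=0):
    """Load a numeric table and check that it is usable as input data."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"No file at {path!r}; set csv_path to your own data file."
        )
    # ndmin=2 keeps the (samples, dimensions) layout even for a single row.
    data = np.loadtxt(path, delimiter=delimiter, skiprows=skiprows, ndmin=2)
    if not np.all(np.isfinite(data)):
        raise ValueError("The table contains NaN or infinite values.")
    return data


# Demonstration with a temporary file that has one header line.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.csv")
with open(demo_path, "w") as handle:
    handle.write("phi,psi,omega\n0.1,0.2,0.3\n-1.0,2.0,3.0\n")

demo_data = load_table(demo_path, skiprows=1)
print(demo_data.shape)
```

If your file starts with a header line, pass skiprows=1 as in the demonstration; np.loadtxt cannot parse column names as numbers.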
Set Parameters#
[4]:
parameters = em.Parameters()
parameters.main_path = em.misc.run_path("runs/my_data")
parameters.n_steps = 1000
parameters.dist_sig_parameters = (4.5, 12, 6, 1, 2, 6)
parameters.periodicity = 2*pi  # for angles in radians; use float("inf") for non-periodic data
# if your data set is large you should not try to calculate
# the pairwise distance histogram with the complete data.
em.plot.distance_histogram(high_d_data, # e.g. use high_d_data[::10] to use every 10th point
parameters.periodicity,
parameters.dist_sig_parameters)
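To make the quantities behind the distance histogram more concrete, here is a NumPy-only sketch that computes periodic pairwise distances on a random subsample and evaluates a sigmoid at a few distances. It assumes that dist_sig_parameters is ordered as (sigma, a, b) for the high-dimensional space followed by (sigma, a, b) for the low-dimensional space, and that the sigmoid has the sketch-map style form shown in the code. The helper names and the synthetic data are hypothetical; em.plot.distance_histogram remains the reference for the actual plot.

```python
import numpy as np


def periodic_pairwise_distances(points, periodicity=2 * np.pi):
    """Euclidean pairwise distances with per-coordinate periodic wrapping."""
    diff = np.abs(points[:, None, :] - points[None, :, :])
    # In a periodic space the shorter way around is the relevant distance.
    diff = np.minimum(diff, periodicity - diff)
    return np.sqrt(np.sum(diff ** 2, axis=-1))


def sigmoid(r, sigma, a, b):
    """Assumed sketch-map style sigmoid; it equals 0.5 at r == sigma."""
    return 1 - (1 + (2 ** (a / b) - 1) * (r / sigma) ** a) ** (-b / a)


rng = np.random.default_rng(seed=0)
data = rng.uniform(-np.pi, np.pi, size=(5000, 6))  # stand-in for high_d_data

# A random subsample keeps the number of pairs manageable (500**2, not 5000**2).
subsample = data[rng.choice(len(data), size=500, replace=False)]
distances = periodic_pairwise_distances(subsample)

# Keep each pair once (upper triangle without the diagonal) before histogramming.
pair_distances = distances[np.triu_indices(len(subsample), k=1)]
counts, bin_edges = np.histogram(pair_distances, bins=50)

sig_h, a_h, b_h = 4.5, 12, 6  # first three entries of dist_sig_parameters
print(sigmoid(np.array([1.0, sig_h, 10.0]), sig_h, a_h, b_h))
```

A random subsample is used here instead of a fixed stride such as high_d_data[::10] because strided sampling can miss structure in ordered data; either choice reduces the cost of the pairwise computation.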
Run the Dimensionality Reduction#
[5]:
e_map = em.EncoderMap(parameters, high_d_data)
e_map.train()
low_d_projection = e_map.encode(high_d_data)
Plot the Results#
[6]:
%matplotlib notebook
fig, axe = plt.subplots()
axe.scatter(low_d_projection[:, 0], low_d_projection[:, 1], s=5, marker="o", linewidths=0)
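For large data sets a scatter plot can hide how densely a region of the map is populated. As an optional alternative, the sketch below bins a 2D projection with np.histogram2d and draws the counts with pcolormesh. The synthetic two-cluster array stands in for low_d_projection, and the output file name is hypothetical; both are assumptions for illustration.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for low_d_projection: two clusters in 2D.
rng = np.random.default_rng(seed=0)
projection = np.vstack([
    rng.normal(loc=(-2.0, 0.0), scale=0.5, size=(3000, 2)),
    rng.normal(loc=(2.0, 1.0), scale=0.8, size=(3000, 2)),
])

# Count the points per bin; counts has shape (x bins, y bins).
counts, x_edges, y_edges = np.histogram2d(
    projection[:, 0], projection[:, 1], bins=60
)

fig, axe = plt.subplots()
# pcolormesh expects rows to run along y, hence the transpose.
mesh = axe.pcolormesh(x_edges, y_edges, counts.T, cmap="viridis")
fig.colorbar(mesh, ax=axe, label="points per bin")
axe.set_xlabel("low-d dimension 1")
axe.set_ylabel("low-d dimension 2")

figure_path = os.path.join(tempfile.mkdtemp(), "projection_density.png")
fig.savefig(figure_path, dpi=150)
```

In a notebook you would typically drop the matplotlib.use("Agg") line and the savefig call and let the figure render inline.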