Your Data#

Find the documentation of EncoderMap:

https://ag-peter.github.io/encodermap

For Google Colab only:#

If you’re on Google Colab, uncomment these lines to install EncoderMap.

[1]:
# !wget https://raw.githubusercontent.com/AG-Peter/encodermap/main/tutorials/install_encodermap_google_colab.sh
# !sudo bash install_encodermap_google_colab.sh

Primer#

Now it’s time to apply what you have learned about dimensionality reduction with EncoderMap. Load your own data and get started! The data set should be a table in which each row contains one sample and the number of columns equals the dimensionality of the data set.
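To make the expected table layout concrete, here is a minimal sketch that writes a toy CSV file and reads it back with `np.loadtxt`. The toy values, the column names in the header, and the temporary file name are assumptions for illustration only; they are not part of EncoderMap.

```python
import os
import tempfile

import numpy as np

# Hypothetical toy data: 5 samples (rows) with 3 dimensions (columns).
rng = np.random.default_rng(seed=0)
toy_data = rng.uniform(-np.pi, np.pi, size=(5, 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_csv_path = os.path.join(tmp_dir, "toy_data.csv")
    # One sample per line, values separated by commas.
    # np.savetxt prefixes the header line with "# " by default.
    np.savetxt(toy_csv_path, toy_data, delimiter=",", header="phi,psi,omega")
    # np.loadtxt treats lines starting with "#" as comments and skips them.
    loaded = np.loadtxt(toy_csv_path, delimiter=",")

n_samples, n_dimensions = loaded.shape
print(n_samples, n_dimensions)
```

If your own file has a header line that does not start with `#`, the `skiprows` argument of `np.loadtxt` can be used to skip it.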

Load Libraries#

[2]:
import encodermap as em
import matplotlib.pyplot as plt
import numpy as np
from math import pi
%config Completer.use_jedi=False

Load Your Data#

[3]:
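# Replace this placeholder with the path to your own comma-separated file.
csv_path = "path/to/your/data.csv"
# The result is a 2D array: one row per sample, one column per dimension.
high_d_data = np.loadtxt(csv_path, delimiter=",")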
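Before training, it can help to confirm that the loaded array really is a table of samples. The following is a minimal sketch of such a check; `check_table` is a hypothetical helper written for this example and is not an EncoderMap function.

```python
import numpy as np


def check_table(data):
    """Return `data` as a 2D float array, or raise ValueError if it is not a clean table."""
    array = np.asarray(data, dtype=float)
    # A single-row or single-column file is loaded as a 1D array by np.loadtxt,
    # so the number of dimensions is checked explicitly.
    if array.ndim != 2:
        raise ValueError(f"expected a 2D table, got {array.ndim} dimension(s)")
    # NaN or infinite entries would propagate into the distance calculations.
    if not np.isfinite(array).all():
        raise ValueError("table contains NaN or infinite values")
    return array


example = check_table([[0.1, 0.2, 0.3], [1.0, 1.1, 1.2]])
print(example.shape)
```

With your own data this would be called as `high_d_data = check_table(high_d_data)` right after loading.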

Set Parameters#

[4]:
parameters = em.Parameters()
parameters.main_path = em.misc.run_path("runs/my_data")
parameters.n_steps = 1000
parameters.dist_sig_parameters = (4.5, 12, 6, 1, 2, 6)
parameters.periodicity = 2*pi

# If your data set is large, do not calculate the pairwise
# distance histogram with the complete data; use a subsample instead.
em.plot.distance_histogram(high_d_data,  # e.g. use high_d_data[::10] to use every 10th point
                           parameters.periodicity,
                           parameters.dist_sig_parameters)
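To see what the periodicity and the sigmoid parameters act on, the sketch below computes pairwise distances on a subsample in a periodic space and passes them through a sigmoid. It makes several assumptions: the sigmoid has the form 1 - (1 + (2^(a/b) - 1) (r/sigma)^a)^(-b/a), the first three entries of `dist_sig_parameters` are read as (sigma, a, b) for the high-dimensional space, and the toy data are random angles. Both helper functions are written for this example; they are not the library's implementation.

```python
from math import pi

import numpy as np


def periodic_pairwise_distances(points, periodicity):
    """Euclidean pairwise distances where each coordinate wraps around at `periodicity`.

    Assumes all values of one coordinate lie within a single period.
    """
    diff = np.abs(points[:, np.newaxis, :] - points[np.newaxis, :, :])
    # Take the shorter way around the period for every coordinate.
    # With periodicity = float("inf") this leaves the differences unchanged.
    diff = np.minimum(diff, periodicity - diff)
    return np.sqrt((diff ** 2).sum(axis=-1))


def sigmoid(r, sigma, a, b):
    """Assumed sigmoid form; it equals 0 at r = 0 and 0.5 at r = sigma."""
    return 1 - (1 + (2 ** (a / b) - 1) * (r / sigma) ** a) ** (-b / a)


# Hypothetical toy data: 200 samples of 4 angles in radians.
rng = np.random.default_rng(seed=0)
toy_angles = rng.uniform(-pi, pi, size=(200, 4))

# Use every 10th point so the distance matrix stays small.
subsample = toy_angles[::10]
distances = periodic_pairwise_distances(subsample, 2 * pi)

# Assumed high-dimensional part of dist_sig_parameters: sigma=4.5, a=12, b=6.
transformed = sigmoid(distances, 4.5, 12, 6)
print(distances.shape, transformed.min(), transformed.max())
```

The distance matrix grows quadratically with the number of samples, which is why a subsample is used for large data sets.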

Run the Dimensionality Reduction#

[5]:
e_map = em.EncoderMap(parameters, high_d_data)
e_map.train()

low_d_projection = e_map.encode(high_d_data)

Plot the Results#

[6]:
%matplotlib notebook
fig, axe = plt.subplots()
axe.scatter(low_d_projection[:, 0], low_d_projection[:, 1], s=5, marker="o", linewidths=0)