Your Data

Run this notebook on Google Colab:

Open in Colab

Find the documentation of EncoderMap:

For Google Colab only:

If you’re on Google Colab, please uncomment these lines and install EncoderMap.

# !wget
# !sudo bash


Now it’s time to apply what you have learned about dimensionality reduction with EncoderMap. Load your own data and get started! Your data set should be a table in which each row holds one sample and the number of columns equals the dimensionality of the data.
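To make the expected layout concrete, here is a minimal sketch that writes a small table to a CSV file and reads it back with NumPy. The toy values, the file name `toy_data.csv`, and the variable names are made up for illustration; only the shape convention (rows are samples, columns are dimensions) comes from the text above.

```python
import os
import tempfile

import numpy as np

# Hypothetical toy data: 5 samples (rows), each with 3 dimensions (columns).
rng = np.random.default_rng(seed=0)
toy_data = rng.uniform(-np.pi, np.pi, size=(5, 3))

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_csv_path = os.path.join(tmp_dir, "toy_data.csv")
    # One sample per line, values separated by commas.
    np.savetxt(toy_csv_path, toy_data, delimiter=",")
    loaded = np.loadtxt(toy_csv_path, delimiter=",")

n_samples, n_dimensions = loaded.shape
print(f"{n_samples} samples with {n_dimensions} dimensions each")
```

If `loaded.shape` reports the samples along the second axis instead, transposing the array with `loaded.T` before passing it on is probably the simplest fix.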

Load Libraries

import encodermap as em
import matplotlib.pyplot as plt
import numpy as np
from math import pi
%config Completer.use_jedi=False

Load Your Data

csv_path = "path/to/your/data.csv"
high_d_data = np.loadtxt(csv_path, delimiter=",")
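The placeholder path has to be replaced with the location of your own file before this cell can run. If that file has a header row or might contain missing values, a small loader along the following lines can catch problems early. This is only a sketch: `load_table` is a hypothetical helper and not part of EncoderMap, and the column names in the example text are invented. `skiprows` and `ndmin` are regular `np.loadtxt` arguments.

```python
import io

import numpy as np


def load_table(source, delimiter=",", skiprows=0):
    """Load a numeric table and check that it is usable as input data.

    `source` may be a file path or an open text file.
    """
    # ndmin=2 keeps the (samples, dimensions) layout even for a single row.
    data = np.loadtxt(source, delimiter=delimiter, skiprows=skiprows, ndmin=2)
    if not np.all(np.isfinite(data)):
        raise ValueError("data contains NaN or infinite values")
    return data


# Stand-in for a CSV file with one header line and two samples.
csv_text = "phi,psi,omega\n0.1,0.2,0.3\n-1.0,2.0,3.0\n"
high_d_example = load_table(io.StringIO(csv_text), skiprows=1)
print(high_d_example.shape)
```

With a real file you would pass `csv_path` instead of the `io.StringIO` object.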

Set Parameters

parameters = em.Parameters()
parameters.main_path = em.misc.run_path("runs/my_data")
parameters.n_steps = 1000
parameters.dist_sig_parameters = (4.5, 12, 6, 1, 2, 6)
parameters.periodicity = 2*pi

# if your data set is large you should not try to calculate
# the pairwise distance histogram with the complete data.
em.plot.distance_histogram(high_d_data,  # e.g. use high_d_data[::10] to use every 10th point
                           parameters.periodicity,
                           parameters.dist_sig_parameters)
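To see what goes into such a histogram, the following sketch computes the pairwise distances for a subsampled data set by hand and applies a sigmoid to them. It rests on several assumptions that you should check against your installed EncoderMap version: that distances are wrapped per coordinate to the shortest path across the periodic boundary, that the six values in `dist_sig_parameters` are ordered `(sig_h, a_h, b_h, sig_l, a_l, b_l)`, and that the sigmoid has the sketch-map form used below. The helper names and the random stand-in data are hypothetical.

```python
import numpy as np


def periodic_pairwise_distances(points, periodicity):
    """Distances between all unique pairs of rows, respecting periodicity.

    Assumes every coordinate is already wrapped into one periodic window.
    """
    diff = np.abs(points[:, None, :] - points[None, :, :])
    if np.isfinite(periodicity):
        # Going around the periodic boundary may be the shorter path.
        diff = np.minimum(diff, periodicity - diff)
    dist = np.sqrt(np.sum(diff ** 2, axis=-1))
    # Keep each pair once and drop the zero self-distances.
    rows, cols = np.triu_indices(len(points), k=1)
    return dist[rows, cols]


def sketchmap_sigmoid(r, sig, a, b):
    """Sigmoid that is 0 at r = 0, 0.5 at r = sig and approaches 1 for large r."""
    return 1 - (1 + (2 ** (a / b) - 1) * (r / sig) ** a) ** (-b / a)


# Random stand-in for angular data with values between -pi and pi.
rng = np.random.default_rng(seed=0)
stand_in_data = rng.uniform(-np.pi, np.pi, size=(2000, 4))

# Use every 10th point: the number of pairs grows quadratically with the
# number of samples, so the full data set quickly becomes too expensive.
subset = stand_in_data[::10]
distances = periodic_pairwise_distances(subset, 2 * np.pi)
counts, bin_edges = np.histogram(distances, bins=50)

# Assumed ordering of the six sigmoid parameters (see the note above).
sig_h, a_h, b_h, sig_l, a_l, b_l = (4.5, 12, 6, 1, 2, 6)
high_d_weights = sketchmap_sigmoid(distances, sig_h, a_h, b_h)
print(len(distances), float(distances.max()))
```

A reasonable starting point is usually a `sig_h` that lies within the range of distances where the histogram has most of its weight, so that the sigmoid separates near from far neighbors instead of saturating.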

Run the Dimensionality Reduction

e_map = em.EncoderMap(parameters, high_d_data)
e_map.train()

low_d_projection = e_map.encode(high_d_data)

Plot the Results

%matplotlib notebook
fig, axe = plt.subplots()
axe.scatter(low_d_projection[:, 0], low_d_projection[:, 1], s=5, marker="o", linewidths=0)
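The interactive `%matplotlib notebook` backend may not be available in every front end. As an alternative, the sketch below draws the same kind of scatter plot with a non-interactive backend and writes it to an image file. The random `stand_in_projection` array and the output file name are placeholders; in your notebook you would plot `low_d_projection` instead.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, selected before importing pyplot
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in for a 2D projection with one row per sample.
rng = np.random.default_rng(seed=0)
stand_in_projection = rng.normal(size=(500, 2))

fig, axe = plt.subplots()
axe.scatter(stand_in_projection[:, 0], stand_in_projection[:, 1],
            s=5, marker="o", linewidths=0)
axe.set_xlabel("x")
axe.set_ylabel("y")

# Save the figure to disk instead of displaying it interactively.
output_path = os.path.join(tempfile.gettempdir(), "low_d_projection.png")
fig.savefig(output_path, dpi=150)
plt.close(fig)
print(output_path)
```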