Writing Custom Loss Functions#
Find the documentation of EncoderMap:
https://ag-peter.github.io/encodermap
For Google Colab only:#
If you’re on Google Colab, please uncomment these lines to install EncoderMap.
[1]:
# !wget https://raw.githubusercontent.com/AG-Peter/encodermap/main/tutorials/install_encodermap_google_colab.sh
# !sudo bash install_encodermap_google_colab.sh
Primer#
In this tutorial we will learn how to write our own loss functions and add them to EncoderMap. Let us start with the imports:
[2]:
import numpy as np
import encodermap as em
import pandas as pd
import tensorflow as tf
Adding a unit circle loss#
To show how to implement loss functions, we will replace EncoderMap’s center_cost with a loss that tries to push the low-dimensional points onto the unit circle. Every point (x, y) on the unit circle satisfies the following equation:
\begin{align} x^2 + y^2 &= 1\\ x^2 + y^2 - 1 &= 0 \end{align}
Let us first plot a unit circle with matplotlib.
[3]:
import matplotlib.pyplot as plt
%matplotlib inline
t = np.linspace(0,np.pi*2,100)
plt.close('all')
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1, aspect='equal')
ax.plot(np.cos(t), np.sin(t), linewidth=1)
[3]:
[<matplotlib.lines.Line2D at 0x7f773b720160>]
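As a quick numerical check of the circle equation, the following sketch samples points with the same parametrization used for the plot and confirms that x^2 + y^2 - 1 vanishes up to floating-point error. The variable names `x`, `y`, and `residual` are chosen here for illustration only.

```python
import numpy as np

# Sample the unit circle with the same parametrization as the plot.
t = np.linspace(0, 2 * np.pi, 100)
x, y = np.cos(t), np.sin(t)

# x^2 + y^2 - 1 should be zero (up to rounding) for every sampled point.
residual = x**2 + y**2 - 1
print(np.allclose(residual, 0.0))
```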
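How do we put this information into a loss function?
We need a function that measures how far any (x, y) coordinate is from the unit circle. The absolute value of the left-hand side of the second equation, |x^2 + y^2 - 1|, is zero on the circle and grows as a point moves away from it.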
[4]:
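def distance_to_unit_circle_2D(x, y):
    # |x^2 + y^2 - 1| is zero exactly on the unit circle
    return np.abs((np.square(x) + np.square(y)) - 1)

def distance_to_unit_circle(points):
    # the coordinate axis comes first, e.g. points has shape (2, ...)
    return np.abs(np.sum(np.square(points), axis=0) - 1)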
[5]:
xx = np.linspace(-2, 2, 250)
yy = np.linspace(-2, 2, 250)
grid = np.meshgrid(xx, yy)
z = distance_to_unit_circle(grid)
plt.close('all')
plt.contourf(xx, yy, z, levels=60)
[5]:
<matplotlib.contour.QuadContourSet at 0x7f773962b3d0>
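Note that |x^2 + y^2 - 1| is an algebraic measure and not the geometric (Euclidean) distance to the circle, which would be |sqrt(x^2 + y^2) - 1|. Both are zero exactly on the circle, so either can serve as a penalty. The sketch below compares the two on three hand-picked points; the function names are hypothetical and only used for this comparison.

```python
import numpy as np

def algebraic_residual(points):
    # Same measure as in the tutorial: |x^2 + y^2 - 1|, coordinate axis first.
    return np.abs(np.sum(np.square(points), axis=0) - 1)

def euclidean_distance_to_circle(points):
    # Geometric distance to the unit circle: |sqrt(x^2 + y^2) - 1|.
    return np.abs(np.sqrt(np.sum(np.square(points), axis=0)) - 1)

# Three test points as columns: on the circle, at the origin, and at radius 2.
points = np.array([[1.0, 0.0, 2.0],
                   [0.0, 0.0, 0.0]])

print(algebraic_residual(points))            # on circle: 0, origin: 1, radius 2: 3
print(euclidean_distance_to_circle(points))  # on circle: 0, origin: 1, radius 2: 1
```

Outside the circle the algebraic measure grows quadratically with the radius, so it penalizes far-away points more strongly than the geometric distance would.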
Build a loss function from that:#
Cost functions in EncoderMap are almost always closures, meaning that calling them returns a function rather than a value. Let’s look at an example closure:
[6]:
def print_msg(msg):
    # This is the outer enclosing function
    # The variable msg is part of the function's namespace
    # This namespace is accessible by the nested function `printer`
    def printer():
        # This is the nested function
        print(msg)
    printer()

# We execute the function
# Output: Hello
print_msg("Hello")
Hello
The printer function was able to access the non-local variable msg. EncoderMap’s loss functions use the non-local variables model and parameters (often abbreviated to p).
We will also add tf.reduce_mean() to get the mean distance from the unit circle for all points, because a loss is always a scalar value.
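The print_msg example calls its inner function right away. A loss closure instead returns the inner function so that it can be called later, once per batch. A minimal sketch of that pattern, using the hypothetical names `make_scaled_loss` and `loss_fn` (they are not part of EncoderMap):

```python
def make_scaled_loss(scale):
    # Outer function: `scale` lives in the enclosing namespace.
    def loss_fn(value):
        # Inner function: reads the non-local variable `scale`.
        return scale * abs(value)
    # Return the inner function itself instead of calling it.
    return loss_fn

loss_fn = make_scaled_loss(5)
print(loss_fn(-2))
```

The returned `loss_fn` still remembers `scale` after `make_scaled_loss` has finished, which is the same mechanism that lets a loss function keep access to a model and its parameters.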
[7]:
def circle_loss(model, parameters):
    """Circle loss outer function. Takes model and parameters. Parameters is only here for demonstration purposes.
    It is not actually needed in the closure.
    """
    # use the model's encoder part to create low-dimensional data
    latent = model.encoder

    def circle_loss_fn(y_true, y_pred=None):
        """Circle loss inner function. Takes y_true and y_pred. y_pred will not be used. y_true will be used to get
        the latent space of the autoencoder.
        """
        # get latent output
        lowd = latent(y_true)
        # get circle cost: sum the squared coordinates of each point
        # (axis=1, assuming lowd has shape (batch, 2)), then average over the batch
        circle_cost = tf.reduce_mean(tf.abs(tf.reduce_sum(tf.square(lowd), axis=1) - 1))
        # bump up the cost to make it stronger than the other contributions
        circle_cost *= 5
        # write to tensorboard
        tf.summary.scalar('Circle Cost', circle_cost)
        # return circle cost
        return circle_cost

    # return inner function
    return circle_loss_fn
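The logic of such a loss can be checked without training a network. The following is a NumPy analogue, not EncoderMap's API: it assumes the encoder output has shape (n_points, 2), so the squared coordinates are summed along the last axis, and it uses a hypothetical stand-in encoder `angle_encoder` that places every input on the unit circle.

```python
import numpy as np

def circle_loss_numpy(encoder, scale=5.0):
    # `encoder` is any callable mapping a batch to points of shape (n_points, 2).
    def circle_loss_fn(batch):
        lowd = encoder(batch)
        # Sum squared coordinates per point (last axis), then average over points.
        per_point = np.abs(np.sum(np.square(lowd), axis=-1) - 1)
        return scale * np.mean(per_point)
    return circle_loss_fn

def angle_encoder(angles):
    # Stand-in encoder: interprets each input value as an angle on the unit circle.
    return np.stack([np.cos(angles), np.sin(angles)], axis=-1)

angles = np.linspace(0, 2 * np.pi, 10)

# Points on the unit circle should give a loss of (almost) zero.
on_circle = circle_loss_numpy(angle_encoder)(angles)

# At radius 2 every point contributes |4 - 1| = 3, scaled by 5.
loss_radius_2 = circle_loss_numpy(lambda a: 2 * angle_encoder(a))(angles)
print(on_circle, loss_radius_2)
```

A loss that is zero for points on the circle and positive elsewhere is what the optimizer needs in order to pull the latent points toward the circle.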
Include the loss function in EncoderMap#
First, let us load the dihedral data from ../notebooks_easy and define some parameters. We set center_cost_scale to 0 so that the center cost does not interfere with our new circle cost.
[8]:
df = pd.read_csv('../notebooks_easy/asp7.csv')
dihedrals = df.iloc[:,:-1].values.astype(np.float32)
cluster_ids = df.iloc[:,-1].values
print(dihedrals.shape, cluster_ids.shape)
print(df.shape)
[9]:
parameters = em.Parameters(
    tensorboard=True,
    center_cost_scale=0,
    n_steps=100,
    periodicity=2*np.pi,
    main_path=em.misc.run_path('runs/custom_losses')
)
Now we can instantiate the EncoderMap class. For visualization purposes we will also make TensorBoard write images.
[10]:
e_map = em.EncoderMap(parameters, dihedrals)
e_map.add_images_to_tensorboard(dihedrals, image_step=1)
The loss is created by giving it the model and parameters of the parent EncoderMap instance. To avoid a clash with the name of the circle_loss function, we will call the result _circle_loss.
[11]:
_circle_loss = circle_loss(e_map.model, e_map.p)
print(_circle_loss)
Now we add this loss to EncoderMap’s losses:
[12]:
print(e_map.loss)
e_map.loss.append(_circle_loss)
print(e_map.loss)
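Because the losses are kept in a plain Python list, any callable with the same signature can be appended. The toy sketch below illustrates how such a list of loss callables could be evaluated and summed. It is an assumption made for illustration, with hypothetical loss functions, and EncoderMap's actual training loop may combine its losses differently.

```python
def constant_loss(y_true, y_pred=None):
    # Hypothetical stand-in for an already registered loss term.
    return 1.5

def length_loss(y_true, y_pred=None):
    # Hypothetical second term with the same call signature.
    return 0.1 * len(y_true)

# Start with one loss and append another, as done with e_map.loss above.
losses = [constant_loss]
losses.append(length_loss)

def total_loss(y_true, y_pred=None):
    # Evaluate every registered loss on the same batch and add the results.
    return sum(loss(y_true, y_pred) for loss in losses)

print(total_loss([0.0, 1.0, 2.0]))
```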
Train#
Also make sure to execute tensorboard in the correct directory:
$ tensorboard --logdir . --reload_multifile True
If you’re on Google Colab, you can use TensorBoard by activating the TensorBoard extension:
[13]:
# %load_ext tensorboard
# %tensorboard --logdir .
[14]:
e_map.train()
Output#
Here’s what TensorBoard should show:
Conclusion#
Using the closure method, you can easily add new loss functions to EncoderMap.