# Learning Rate Schedulers

**Welcome**

Welcome to the Learning Rate Schedulers tutorial. Learning rate schedulers let us dynamically adjust the learning rate of the Adam optimization algorithm during training. That way, we can decrease the learning rate as we approach a minimum of the cost function.

Run this notebook on Google Colab:

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/AG-Peter/encodermap/blob/main/tutorials/notebooks_customization/04_learning_rate_schedulers.ipynb)

Find the documentation of EncoderMap:

https://ag-peter.github.io/encodermap

**Goals:**

In this tutorial you will learn:

* [Why we can profit from learning rate schedulers](#why)
* [How to log the current learning rate to TensorBoard](#log_to_tb)
* [How to implement a learning rate scheduler with an exponentially decaying learning rate](#lr_implementation)

### For Google Colab only:

If you're on Google Colab, please uncomment these lines and install EncoderMap.

In [1]:
# !wget https://gist.githubusercontent.com/kevinsawade/deda578a3c6f26640ae905a3557e4ed1/raw/b7403a37710cb881839186da96d4d117e50abf36/install_encodermap_google_colab.sh
# !sudo bash install_encodermap_google_colab.sh

If you're on Google Colab, you also want to download the data we will use:

In [2]:
# !wget https://raw.githubusercontent.com/AG-Peter/encodermap/main/tutorials/notebooks_starter/asp7.csv

## Import Libraries

Before we can start exploring the learning rate scheduler, we need to import some libraries.

In [3]:
import os
import numpy as np
import encodermap as em
import tensorflow as tf
import pandas as pd
from pathlib import Path
%load_ext autoreload
%autoreload 2


To enable GPU support, run `import os; os.environ['ENCODERMAP_ENABLE_GPU'] = 'True'` before importing encodermap.




We will work in the directory `runs/lr_scheduler`. We will create it now.

In [4]:
(Path.cwd() / "runs/lr_scheduler").mkdir(parents=True, exist_ok=True)

<a id="why"></a>

## Why learning rate schedulers? A linear regression example
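
As a minimal sketch of the idea (an illustration with made-up toy data, not EncoderMap code), consider fitting the slope of a line with plain gradient descent. The names `fit`, `gradient`, and `initial_lr` are hypothetical, and the starting learning rate is deliberately chosen close to the value at which gradient descent becomes unstable for this loss. With a constant learning rate the estimate keeps overshooting the optimum, while an exponentially decaying rate takes large steps early and small steps late.

```python
import numpy as np

# Noise-free toy data for y = 2 * x, so the optimal slope is 2.0.
x = np.linspace(-1.0, 1.0, 21)
y = 2.0 * x
TRUE_SLOPE = 2.0


def gradient(w):
    """Gradient of the mean squared error mean((w * x - y) ** 2) with respect to w."""
    return 2.0 * np.mean(x * (w * x - y))


def fit(lr_for_step, n_steps=50, w=0.0):
    """Run plain gradient descent; lr_for_step(step) supplies each step's learning rate."""
    for step in range(n_steps):
        w = w - lr_for_step(step) * gradient(w)
    return w


# The loss is quadratic in w with curvature 2 * mean(x ** 2). Gradient descent
# diverges for learning rates above 2 / curvature, so we start just below that limit.
curvature = 2.0 * np.mean(x ** 2)
initial_lr = 0.95 * 2.0 / curvature

w_constant = fit(lambda step: initial_lr)
w_decayed = fit(lambda step: initial_lr * np.exp(-0.1 * step))

error_constant = abs(w_constant - TRUE_SLOPE)
error_decayed = abs(w_decayed - TRUE_SLOPE)
print(f"constant learning rate: |w - 2| = {error_constant:.2e}")
print(f"decaying learning rate: |w - 2| = {error_decayed:.2e}")
```

Under these assumptions, the run with the decaying learning rate ends closer to the optimal slope after the same number of steps. The decay constant matters: a rate that shrinks too quickly can stall training before the minimum is reached.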

<a id="log_to_tb"></a>

## Log the current learning rate to TensorBoard

Before we implement dynamic learning rates, we want to find a way to log the learning rate to TensorBoard.

### Running TensorBoard on Google Colab

To use TensorBoard in Google Colab notebooks, you need to first load the TensorBoard extension:

```python
%load_ext tensorboard
```

And then activate it with:

```python
%tensorboard --logdir .
```

The next code cell contains these commands. Uncomment them and then continue.

### Running TensorBoard locally

TensorBoard is a visualization tool from the machine learning library TensorFlow, which is used by the EncoderMap package. During the dimensionality reduction step, when the neural network autoencoder is trained, several readings are saved in a TensorBoard format. Set the parameter `tensorboard` to `True` to make EncoderMap log to TensorBoard. All output files are saved to the path defined in `parameters.main_path`. Navigate to this location in a shell and start TensorBoard.

In case you run this tutorial in the provided Docker container you can open a new console inside the container by typing the following command in a new system shell.
```shell
docker exec -it emap bash
```
Navigate to the location where all the runs are saved. e.g.:
```shell
cd notebooks_easy/runs/asp7/
```
Start TensorBoard in this directory with:
```shell
tensorboard --logdir .
```

You should now be able to open TensorBoard in your web browser on port 6006, at `0.0.0.0:6006` or `127.0.0.1:6006`.

In the SCALARS tab of TensorBoard you should see among other values the overall cost and different contributions to the cost. The two most important contributions are `auto_cost` and `distance_cost`. `auto_cost` indicates differences between the inputs and outputs of the autoencoder. `distance_cost` is the part of the cost function which compares pairwise distances in the input space and the low-dimensional (latent) space.

**Fixing Reloading issues**
Using TensorBoard, we often encountered issues when training multiple models and writing multiple runs to TensorBoard's logdir. Reloading the data and even refreshing the web page did not display the data of the current run. We needed to kill TensorBoard and restart it in order to see the new data. This issue was fixed by setting `reload_multifile` to `True`.

```bash
tensorboard --logdir . --reload_multifile True
```

**When you're on Google Colab, you can load the TensorBoard extension with:**

In [5]:
# %load_ext tensorboard
# %tensorboard --logdir .

### Subclassing EncoderMap's `EncoderMapBaseCallback`

The easiest way to implement and log a new variable to TensorBoard is by subclassing EncoderMap's `EncoderMapBaseCallback` from the `callbacks` submodule.

In [6]:
?em.callbacks.EncoderMapBaseCallback

As per the docstring of the `EncoderMapBaseCallback` class, we create the `LearningRateLogger` class and implement a piece of code in the `on_summary_step` method.

In [7]:
class LearningRateLogger(em.callbacks.EncoderMapBaseCallback):
    def on_summary_step(self, step, logs=None):
        with tf.name_scope("Learning Rate"):
            tf.summary.scalar('current learning rate', self.model.optimizer.lr, step=step)

We can now create an `EncoderMap` instance and add our new callback with the `add_callback` method.

In [8]:
df = pd.read_csv('asp7.csv')
dihedrals = df.iloc[:,:-1].values.astype(np.float32)
cluster_ids = df.iloc[:,-1].values

parameters = em.Parameters(
    tensorboard=True,
    periodicity=2*np.pi,
    main_path=em.misc.run_path('runs/lr_scheduler'),
    n_steps=100,
    summary_step=5
)

# create an instance of EncoderMap
e_map = em.EncoderMap(parameters, dihedrals)

# Add an instance of the new Callback
e_map.add_callback(LearningRateLogger)

Output files are saved to runs/lr_scheduler/run0 as defined in 'main_path' in the parameters.
Saved a text-summary of the model and an image in runs/lr_scheduler/run0, as specified in 'main_path' in the parameters.


We train the model.

In [9]:
history = e_map.train()

100%|█████████████████████████| 100/100 [00:03<00:00, 26.30it/s, Loss after step 100=31.4]




Saving the model to runs/lr_scheduler/run0/saved_model_2024-12-29T13:56:23+01:00.keras. Use `em.EncoderMap.from_checkpoint('runs/lr_scheduler/run0')` to load the most recent model, or `em.EncoderMap.from_checkpoint('runs/lr_scheduler/run0/saved_model_2024-12-29T13:56:23+01:00.keras')` to load the model with specific weights..
This model has a subclassed encoder, which can be loaded independently. Use `tf.keras.load_model('runs/lr_scheduler/run0/saved_model_2024-12-29T13:56:23+01:00_encoder.keras')` to load only this model.
This model has a subclassed decoder, which can be loaded independently. Use `tf.keras.load_model('runs/lr_scheduler/run0/saved_model_2024-12-29T13:56:23+01:00_decoder.keras')` to load only this model.


And now, we can see our current learning rate in TensorBoard:

<img src="lr_scheduler_1.png" width="800">

A constant learning rate of 0.001

<a id="lr_implementation"></a>

## Write a learning rate scheduler

We can write a learning rate scheduler either by providing intervals of training steps and the associated learning rate:

```python
def lr_schedule(step):
    """
    Returns a custom learning rate that decreases as steps progress.
    """
    learning_rate = 0.2
    if step > 10:
        learning_rate = 0.02
    if step > 20:
        learning_rate = 0.01
    if step > 50:
        learning_rate = 0.005
    # Return the rate selected for the interval that contains `step`.
    return learning_rate
```

Or by using a function that gives us a learning rate:

```python
def scheduler(step, lr=1, n_steps=1000):
    """
    Returns a custom learning rate that decreases based on an exp function as steps progress.
    """
    if step < 10:
        return lr
    else:
        return lr * tf.math.exp(-step / n_steps)
```
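
To get a feeling for the two approaches, the following sketch evaluates both at a few steps. It is an illustration only: `interval_lr`, `exponential_lr`, `BOUNDARIES`, and `RATES` are hypothetical names, the interval schedule is written as a lookup table with the same boundaries and rates as above, and `math.exp` stands in for `tf.math.exp` so that the snippet runs without TensorFlow.

```python
import bisect
import math

# Table form of the interval schedule: the rate changes once the step
# exceeds a boundary (step > 10, step > 20, step > 50).
BOUNDARIES = [10, 20, 50]
RATES = [0.2, 0.02, 0.01, 0.005]


def interval_lr(step):
    """Look up the learning rate for the interval that contains `step`."""
    return RATES[bisect.bisect_left(BOUNDARIES, step)]


def exponential_lr(step, lr=1.0, n_steps=1000):
    """Constant for the first 10 steps, then lr * exp(-step / n_steps)."""
    if step < 10:
        return lr
    return lr * math.exp(-step / n_steps)


# Print both schedules side by side for a few representative steps.
for step in (0, 10, 11, 21, 51, 1000):
    print(step, interval_lr(step), exponential_lr(step))
```

The lookup table keeps boundaries and rates in one place, which makes it easier to add or move an interval than a chain of `if` statements.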

Below is an example combining both:

In [10]:
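def scheduler(step, lr=1):
    """
    Returns a custom learning rate that decreases based on an exp function as steps progress.
    """
    # Keep the learning rate unchanged for the first 10 steps.
    if step < 10:
        return lr
    else:
        # Afterwards, scale the rate that was passed in by a constant factor of exp(-0.1).
        return lr * tf.math.exp(-0.1)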
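
Keras' `LearningRateScheduler` calls the schedule function with the step index and the optimizer's current learning rate, so the factor `exp(-0.1)` compounds from call to call. The sketch below imitates that feedback loop in plain Python. It is a hedged illustration: `simulate` is a hypothetical helper, `math.exp` replaces `tf.math.exp`, and the starting rate of 0.001 is taken from the constant learning rate logged in the previous run.

```python
import math


def scheduler(step, lr=1.0):
    """Same logic as the scheduler above, using math.exp so it runs without TensorFlow."""
    if step < 10:
        return lr
    return lr * math.exp(-0.1)


def simulate(schedule, initial_lr, n_steps):
    """Feed each returned learning rate back in as the next call's `lr`."""
    lr = initial_lr
    rates = []
    for step in range(n_steps):
        lr = schedule(step, lr)
        rates.append(lr)
    return rates


rates = simulate(scheduler, initial_lr=0.001, n_steps=50)
print(rates[9], rates[10], rates[-1])
```

In this simulation the rate stays at its initial value for the first 10 steps and then shrinks by the same factor on every following step, which is an exponential decay in the step number.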

This scheduler function can simply be provided to the built-in `tf.keras.callbacks.LearningRateScheduler` callback.

In [11]:
callback = tf.keras.callbacks.LearningRateScheduler(scheduler)

The callback is then added to the `EncoderMap` instance with the `add_callback` method.

In [12]:
parameters = em.Parameters(
    tensorboard=True,
    periodicity=2*np.pi,
    main_path=em.misc.run_path('runs/lr_scheduler'),
    n_steps=50,
    summary_step=1
)

e_map = em.EncoderMap(parameters, dihedrals)
e_map.add_callback(LearningRateLogger)
e_map.add_callback(callback)

Output files are saved to runs/lr_scheduler/run1 as defined in 'main_path' in the parameters.


Saved a text-summary of the model and an image in runs/lr_scheduler/run1, as specified in 'main_path' in the parameters.


In [13]:
history = e_map.train()

100%|████████████████████████████| 50/50 [00:03<00:00, 15.64it/s, Loss after step 50=38.3]




Saving the model to runs/lr_scheduler/run1/saved_model_2024-12-29T13:56:26+01:00.keras. Use `em.EncoderMap.from_checkpoint('runs/lr_scheduler/run1')` to load the most recent model, or `em.EncoderMap.from_checkpoint('runs/lr_scheduler/run1/saved_model_2024-12-29T13:56:26+01:00.keras')` to load the model with specific weights..
This model has a subclassed encoder, which can be loaded independently. Use `tf.keras.load_model('runs/lr_scheduler/run1/saved_model_2024-12-29T13:56:26+01:00_encoder.keras')` to load only this model.
This model has a subclassed decoder, which can be loaded independently. Use `tf.keras.load_model('runs/lr_scheduler/run1/saved_model_2024-12-29T13:56:26+01:00_decoder.keras')` to load only this model.


Here's what TensorBoard should look like:

<img src="lr_scheduler_2.png" width="800">

And here's the learning rate plotted from the history.

In [14]:
import plotly.express as px

px.line(history.history["lr"])

## Conclusion

Learning rate schedulers can help prevent overtraining while still slightly increasing the predictive power of your neural network model. EncoderMap's modularity makes them simple plug-in solutions.