{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "7A6j7pHKEWgC" }, "source": [ "# Getting started: Basic Cube\n", "\n", "**Welcome**\n", "\n", "Welcome to your first EncoderMap tutorial. All EncoderMap tutorials are provided as Jupyter notebooks that you can run locally, on BinderHub, or even on Google Colab.\n", "\n", "Run this notebook on Google Colab:\n", "\n", "[Open in Colab](https://colab.research.google.com/github/AG-Peter/encodermap/blob/main/tutorials/notebooks_starter/01_Basic_Usage-Cube_Example.ipynb)\n", "\n", "Find the EncoderMap documentation here:\n", "\n", "https://ag-peter.github.io/encodermap\n", "\n", "**Goals:**\n", "\n", "In this tutorial you will learn:\n", "- [How to set training parameters for EncoderMap.](#select-parameters)\n", "- [How to train EncoderMap.](#perform-dimensionality-reduction)\n", "- [How to use the decoder part of the network to create high-dimensional data.](#generate-high-dimensional-data)" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": "CNvfnyZyEWgD", "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "**For Google Colab only:**\n", "\n", "If you're on Google Colab, uncomment the lines below to install EncoderMap." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": { "execution": { "iopub.execute_input": "2024-12-30T09:52:18.205709Z", "iopub.status.busy": "2024-12-30T09:52:18.205585Z", "iopub.status.idle": "2024-12-30T09:52:18.207633Z", "shell.execute_reply": "2024-12-30T09:52:18.207327Z" } }, "outputs": [], "source": [ "# !wget https://raw.githubusercontent.com/AG-Peter/encodermap/main/tutorials/install_encodermap_google_colab.sh\n", "# !sudo bash install_encodermap_google_colab.sh" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": "ek5-hP-WEWgF", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Import Libraries\n", "Before we get started, we need to import the EncoderMap library:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "execution": { "iopub.execute_input": "2024-12-30T09:52:18.209228Z", "iopub.status.busy": "2024-12-30T09:52:18.209094Z", "iopub.status.idle": "2024-12-30T09:52:21.685666Z", "shell.execute_reply": "2024-12-30T09:52:21.684898Z" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/kevin/git/encoder_map_private/encodermap/__init__.py:194: GPUsAreDisabledWarning: EncoderMap disables the GPU per default because most tensorflow code runs with a higher compatibility when the GPU is disabled. If you want to enable GPUs manually, set the environment variable 'ENCODERMAP_ENABLE_GPU' to 'True' before importing EncoderMap. To do this in python you can run:\n", "\n", "import os; os.environ['ENCODERMAP_ENABLE_GPU'] = 'True'\n", "\n", "before importing encodermap.\n", " _warnings.warn(\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "d7e3f95fb45749329ce02d3fef48f777", "version_major": 2, "version_minor": 0 }, "text/plain": [] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "import encodermap as em" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": "QE_obd3YEWgF", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "We also need some additional imports for plotting. The `try` block checks whether we are running on Google Colab; if so, it enables Colab's interactive formatter for pandas DataFrames and its custom widget manager, and selects the `colab` plotly renderer. Otherwise, plotly falls back to the `plotly_mimetype+notebook` renderer." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2024-12-30T09:52:21.688088Z", "iopub.status.busy": "2024-12-30T09:52:21.687673Z", "iopub.status.idle": "2024-12-30T09:52:21.737483Z", "shell.execute_reply": "2024-12-30T09:52:21.736981Z" } }, "outputs": [], "source": [ "import plotly\n", "import plotly.graph_objs as go\n", "import plotly.express as px\n", "import plotly.io as pio\n", "import pandas as pd\n", "import numpy as np\n", "try:\n", "    from google.colab import data_table, output\n", "    data_table.enable_dataframe_formatter()\n", "    output.enable_custom_widget_manager()\n", "    renderer = \"colab\"\n", "except ModuleNotFoundError:\n", "    renderer = \"plotly_mimetype+notebook\"\n", "pio.renderers.default = renderer" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": "HMWIMDMZEWgG", "pycharm": { "name": "#%% md\n" }, "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "To make the output of this notebook reproducible, we fix TensorFlow's global random seed." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "execution": { "iopub.execute_input": "2024-12-30T09:52:21.739096Z", "iopub.status.busy": "2024-12-30T09:52:21.738956Z", "iopub.status.idle": "2024-12-30T09:52:21.741023Z", "shell.execute_reply": "2024-12-30T09:52:21.740655Z" } }, "outputs": [], "source": [ "import tensorflow as tf\n", "tf.random.set_seed(3)" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": "9k3lArfCEWgG", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "## Load Data\n", "Next, we load our data. EncoderMap expects the input data to be a 2D array: each row holds one data point, and the number of columns is the dimensionality of the data set. You could load such data from any source; in this tutorial, however, we generate a toy data set. The function `em.misc.create_n_cube` distributes points on the edges of a cube and returns them together with an array of ids, one per point. Gaussian noise can be added to the points by specifying a sigma value." ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "execution": { "iopub.execute_input": "2024-12-30T09:52:21.742388Z", "iopub.status.busy": "2024-12-30T09:52:21.742191Z", "iopub.status.idle": "2024-12-30T09:52:21.747218Z", "shell.execute_reply": "2024-12-30T09:52:21.746960Z" } }, "outputs": [], "source": [ "high_d_data, ids = em.misc.create_n_cube()" ] }, { "cell_type": "markdown", "metadata": { "editable": true, "id": "vkyraGl3EWgG", "slideshow": { "slide_type": "" }, "tags": [] }, "source": [ "Let's look at the data we just created:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "execution": { "iopub.execute_input": "2024-12-30T09:52:21.748620Z", "iopub.status.busy": "2024-12-30T09:52:21.748513Z", "iopub.status.idle": "2024-12-30T09:52:21.757800Z", "shell.execute_reply": "2024-12-30T09:52:21.757520Z" } }, "outputs": [ { "data": { "text/html": [ "
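<div>\n", "<style scoped>\n", "    .dataframe tbody tr th:only-of-type {\n", "        vertical-align: middle;\n", "    }\n", "\n", "    .dataframe tbody tr th {\n", "        vertical-align: top;\n", "    }\n", "\n", "    .dataframe thead th {\n", "        text-align: right;\n", "    }\n", "</style>\n", "<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>id</th>\n", "      <th>x</th>\n", "      <th>y</th>\n", "      <th>z</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>Point 0</th>\n", "      <td>1</td>\n", "      <td>-0.005684</td>\n", "      <td>-0.003473</td>\n", "      <td>0.050832</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>Point 1</th>\n", "      <td>1</td>\n", "      <td>-0.023902</td>\n", "      <td>-0.001094</td>\n", "      <td>-0.069661</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>Point 2</th>\n", "      <td>1</td>\n", "      <td>0.011999</td>\n", "      <td>0.012342</td>\n", "      <td>0.018002</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>Point 3</th>\n", "      <td>1</td>\n", "      <td>0.000477</td>\n", "      <td>-0.066432</td>\n", "      <td>0.054068</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>Point 4</th>\n", "      <td>1</td>\n", "      <td>-0.025671</td>\n", "      <td>-0.013453</td>\n", "      <td>0.045888</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>...</th>\n", "      <td>...</td>\n", "      <td>...</td>\n", "      <td>...</td>\n", "      <td>...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>Point 5995</th>\n", "      <td>11</td>\n", "      <td>0.972860</td>\n", "      <td>1.003848</td>\n", "      <td>0.985437</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>Point 5996</th>\n", "      <td>11</td>\n", "      <td>1.044156</td>\n", "      <td>0.974762</td>\n", "      <td>0.987097</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>Point 5997</th>\n", "      <td>11</td>\n", "      <td>0.918292</td>\n", "      <td>1.016864</td>\n", "      <td>1.010103</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>Point 5998</th>\n", "      <td>11</td>\n", "      <td>1.040325</td>\n", "      <td>0.969910</td>\n", "      <td>1.035233</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>Point 5999</th>\n", "      <td>11</td>\n", "      <td>1.006821</td>\n", "      <td>1.024599</td>\n", "      <td>1.044481</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "<p>6000 rows × 4 columns</p>\n", "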