From b11496db600097142e58f6ab9ab51c3c06f6b3ba Mon Sep 17 00:00:00 2001 From: Chaselzxy <47959406+Chaselzxy@users.noreply.github.com> Date: Wed, 17 Aug 2022 20:11:24 +0800 Subject: [PATCH] Chronos: How-to-Guides (How to train forecaster on single node) (#5224) * how to train forecaster * add a notebook * add new notebook * add index * fix colab link * change content Co-authored-by: Xinyi Zhang Co-authored-by: theaperdeng --- docs/readthedocs/source/_toc.yml | 1 + .../how_to_train_forecaster_on_one_node.ipynb | 269 ++++++++++++++++++ .../source/doc/Chronos/Howto/index.rst | 16 ++ 3 files changed, 286 insertions(+) create mode 100644 docs/readthedocs/source/doc/Chronos/Howto/how_to_train_forecaster_on_one_node.ipynb create mode 100644 docs/readthedocs/source/doc/Chronos/Howto/index.rst diff --git a/docs/readthedocs/source/_toc.yml b/docs/readthedocs/source/_toc.yml index 57a1c9c3..4c1dbdd6 100644 --- a/docs/readthedocs/source/_toc.yml +++ b/docs/readthedocs/source/_toc.yml @@ -55,6 +55,7 @@ subtrees: - file: doc/Chronos/Overview/windows_guide - file: doc/Chronos/Overview/quick-tour - file: doc/Chronos/Overview/deep_dive + - file: doc/Chronos/Howto/index - file: doc/Chronos/QuickStart/index - file: doc/Chronos/Overview/chronos_known_issue diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_train_forecaster_on_one_node.ipynb b/docs/readthedocs/source/doc/Chronos/Howto/how_to_train_forecaster_on_one_node.ipynb new file mode 100644 index 00000000..6224772f --- /dev/null +++ b/docs/readthedocs/source/doc/Chronos/Howto/how_to_train_forecaster_on_one_node.ipynb @@ -0,0 +1,269 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "SQRh9TDkmexb" + }, + "source": [ + "![image.png]()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "[Open in Colab Here](https://colab.research.google.com/github/intel-analytics/BigDL/blob/main/docs/readthedocs/source/doc/Chronos/Howto/how_to_train_forecaster_on_one_node.ipynb)" + ] + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "# Train forecaster on single node" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "onZXfhnkmexc" + }, + "source": [ + "## Introduction\n", + "\n", + "In Chronos, Forecaster (`bigdl.chronos.forecaster.Forecaster`) is the forecasting abstraction. It hides the complex logic of model creation, training, scaling to a cluster, tuning, optimization and inference, while exposing some APIs (e.g. `fit` in this guide) for users to control.\n", + "\n", + "In this guide, **we demonstrate how to train forecasters on one node**. During training, the forecaster learns the patterns (such as the period and scale) in historical data. Although Chronos supports training on a cluster, it's highly recommended to try a single node first, before allocating a cluster, to keep things simple.\n", + "\n", + "We will use `TCNForecaster` and the nyc_taxi dataset as an example in this guide." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Setup\n", + "\n", + "Before we begin, we need to install Chronos if it isn’t already available. We choose PyTorch as the deep learning backend." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "!pip install --pre --upgrade bigdl-chronos[pytorch]\n", + "!pip uninstall -y torchtext # uninstall torchtext to avoid version conflict\n", + "exit()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Data preparation\n", + "\n", + "First, we load the nyc_taxi dataset, then impute missing values and standardize it with a scaler fitted on the training set only." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bigdl.chronos.data.repo_dataset import get_public_dataset\n", + "from sklearn.preprocessing import StandardScaler\n", + "\n", + "tsdata_train, tsdata_val, _ = get_public_dataset(name='nyc_taxi')\n", + "\n", + "stand = StandardScaler()\n", + "for tsdata in [tsdata_train, tsdata_val]:\n", + "    tsdata.impute()\\\n", + "          .scale(stand, fit=tsdata is tsdata_train)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "i8kjsyL4mexd" + }, + "source": [ + "Forecaster supports learning from historical data in many formats, including:\n", + "\n", + "1. `bigdl.chronos.data.TSDataset` (**recommended**)\n", + "2. PyTorch dataloader\n", + "3. TensorFlow dataset\n", + "4. NumPy ndarray\n", + "\n", + "It's always recommended to use `TSDataset` directly when possible; the other formats are supported for users who want to customize their own data processing and feature engineering." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# comment this line if you want to try other formats\n", + "train_data = tsdata_train\n", + "\n", + "# uncomment this line to use a pytorch dataloader as `train_data`\n", + "# train_data = tsdata_train.to_torch_data_loader(roll=True, lookback=48, horizon=1)\n", + "\n", + "# uncomment this line to use numpy ndarrays as `train_data`\n", + "# train_data = tsdata_train.roll(lookback=48, horizon=1).to_numpy()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "IswBKf39mexg" + }, + "source": [ + "## Training\n", + "\n", + "First, we create a `TCNForecaster`. Its `past_seq_len` and `future_seq_len` match the `lookback` and `horizon` used when rolling the data.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from bigdl.chronos.forecaster.tcn_forecaster import TCNForecaster\n", + "\n", + "forecaster = TCNForecaster(past_seq_len=48,\n", + "                           future_seq_len=1,\n", + "                           input_feature_num=1,\n", + "                           output_feature_num=1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Then just call `fit` on the training data!\n", + "\n", + "There are some other parameters you may want to change, including `epochs` and `batch_size`. How to change these hyperparameters can be tricky and relies heavily on experience.\n", + "\n", + "Here is a \"rule of thumb\" for users who are not familiar with manual hyperparameter tuning: adjust `epochs` until the MSE training loss on scaled data drops below 0.1 (the loss is shown on the right side of the progress bar, and \"mse\" is the default loss function), and leave `batch_size` at its default value." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "yc34x9dcmexg", + "outputId": "bba44274-e6d0-408a-bf9b-32d549fe1ea9" + }, + "outputs": [], + "source": [ + "forecaster.fit(train_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ZiolCDSmmexj" + }, + "source": [ + "## Validation (optional)\n", + "\n", + "Overfitting is a common issue in time series analysis because the datasets are often not large enough. One (and probably the most effective) way to avoid overfitting is to validate your model during the training process. The `Forecaster.fit` API provides a way to validate the forecaster and find the best stopping point before the model overfits.\n", + "\n", + "In the following example, we set `validation_mode` to \"best_epoch\" to load back the checkpoint with the best loss on the validation data." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# comment this line if you want to try other formats\n", + "val_data = tsdata_val.to_torch_data_loader(roll=True, lookback=48, horizon=1)\n", + "\n", + "# uncomment this line to use numpy ndarrays as `val_data`\n", + "# val_data = tsdata_val.roll(lookback=48, horizon=1).to_numpy()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "o7jWojYNmexk", + "outputId": "eb4cc68f-4d38-410d-b007-c507da30b97f" + }, + "outputs": [], + "source": [ + "# create a new forecaster\n", + "forecaster = TCNForecaster(past_seq_len=48,\n", + "                           future_seq_len=1,\n", + "                           input_feature_num=1,\n", + "                           output_feature_num=1)\n", + "\n", + "# train the forecaster with validation data to avoid overfitting\n", + "forecaster.fit(train_data,\n", + "               epochs=3,\n", + "               validation_data=val_data,\n", + "               validation_mode='best_epoch')" + ] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "## Acceleration on Intel CPU\n", + "\n", + "Chronos only supports CPU (with transparent acceleration on Intel CPUs) for training. Since time series data is usually not large, CPU training is sufficient in most cases.\n", + "\n", + "No code is needed in this section, since the acceleration happens transparently (i.e. no specific code is needed for acceleration settings) inside the forecasters in Chronos. Chronos **automatically** sets up:\n", + "\n", + "1. a better memory allocator\n", + "2. hardware-aware system variables for multithread utilization\n", + "3. multiprocess training on powerful CPUs\n", + "4. usage of IPEX (Intel Extension for PyTorch) and Intel-optimized TensorFlow." + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "train_forecaster_on_one_node.ipynb", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3.7.13 ('chronos')", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.13" + }, + "vscode": { + "interpreter": { + "hash": "f7cbcfcf124497a723b2fc91b0dad8cd6ed41af955928289a9d3478af9690021" + } + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/docs/readthedocs/source/doc/Chronos/Howto/index.rst b/docs/readthedocs/source/doc/Chronos/Howto/index.rst new file mode 100644 index 00000000..0163c7c5 --- /dev/null +++ b/docs/readthedocs/source/doc/Chronos/Howto/index.rst @@ -0,0 +1,16 @@ +Chronos How-to Guides +========================= +How-to guides are bite-sized, executable examples that users can consult when they run into a specific topic while using Chronos. 
+ +Forecasting +------------------------- +* `Train forecaster on single node `__ + + In this guide, **we demonstrate how to train forecasters on one node**. During training, the forecaster learns the patterns (such as the period and scale) in historical data. Although Chronos supports training on a cluster, it's highly recommended to try a single node first, before allocating a cluster, to keep things simple. + + +.. toctree:: + :maxdepth: 1 + :hidden: + + how_to_train_forecaster_on_one_node