Accelerated Training and Inference
Chronos provides transparent acceleration for both built-in and customized time series models. In this deep-dive page, we introduce how to enable or disable these optimizations.
This page focuses on single-node acceleration for training and inferencing of forecasting models. Other topics are covered in their own pages, listed below:
- Distributed time series data processing - XShardsTSDataset (based on Spark, powered by bigdl.orca.data)
- Distributed training on a cluster - distributed training (based on Ray/Spark/Horovod, powered by bigdl.orca.learn)
- Non-forecasting / non-deep-learning models - Prophet with Intel Python, DBScan Detector with Intel-optimized scikit-learn, DPGANSimulator PyTorch implementation
You may refer to the pages listed above for these topics.
1. Overview
Time series models, especially deep learning models, often suffer from slow training and unsatisfying inference speed. Chronos integrates many optimized libraries and best known methods (BKMs) to improve the performance of both built-in and customized models.
2. Training Acceleration
Training acceleration is transparent in Chronos's API, which means users enjoy the acceleration without changing their code (unless expert users want to tweak some advanced settings).
.. note::
**Write your script under** ``if __name__=="__main__":``:
Chronos will automatically utilize the computation resources of the hardware, which may include multi-process training on a single node. Guarding your script with this header prevents many strange behaviors.
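For example, here is a minimal runnable sketch of the recommended script structure (the TCNForecaster and dummy numpy data below are illustrative assumptions; replace them with your own model and data):
import numpy as np
from bigdl.chronos.forecaster import TCNForecaster

def main():
    # dummy data with shapes (samples, past_seq_len, features) and (samples, future_seq_len, features)
    x = np.random.randn(128, 48, 1).astype(np.float32)
    y = np.random.randn(128, 5, 1).astype(np.float32)
    forecaster = TCNForecaster(past_seq_len=48, future_seq_len=5,
                               input_feature_num=1, output_feature_num=1)
    forecaster.fit((x, y), epochs=1)

if __name__ == "__main__":
    main()  # multi-process training is only spawned from inside this guard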
2.1 Forecaster Training Acceleration
Currently, transparent acceleration for LSTMForecaster, Seq2SeqForecaster, TCNForecaster and NBeatsForecaster is automatically enabled and tested. Chronos will set various environment variables and configure multi-processing training according to the hardware parameters (e.g. number of cores).
This functionality is under active development, and some expert users may want to change the configuration or disable some acceleration tricks. Here are some instructions.
Users may unset these environment variables by:
source bigdl-nano-unset-env
Users may set the number of processes to use in training by:
print(forecaster.num_processes) # num_processes is automatically optimized by Chronos
forecaster.num_processes = 1 # disable multi-processing training
forecaster.num_processes = 10 # You may set it to any number you want
Users may control whether IPEX (Intel® Extension for PyTorch) is used in training by:
print(forecaster.use_ipex) # use_ipex is automatically optimized by Chronos
forecaster.use_ipex = True # enable ipex during training
forecaster.use_ipex = False # disable ipex during training
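Both attributes are read when training starts, so set them before calling fit (a short sketch in the same fragment style as above, where forecaster is any supported built-in forecaster and train_data is your prepared training data):
forecaster.num_processes = 1   # e.g. single-process training
forecaster.use_ipex = True     # e.g. train with IPEX enabled
forecaster.fit(train_data)     # this fit call picks up the settings above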
2.2 Customized Model Training Acceleration
We provide an optimized pytorch-lightning Trainer, TSTrainer, to accelerate customized time series models defined in PyTorch. A typical use case is taking pytorch-forecasting's built-in models (which are defined as pytorch-lightning LightningModules) and using Chronos's TSTrainer to accelerate the training process.
TSTrainer requires very few changes to your original code. Here is a quick guide:
# from pytorch_lightning import Trainer
from bigdl.chronos.pytorch import TSTrainer as Trainer
trainer = Trainer(...
# set number of processes for training
num_processes=8,
# disable GPU training; TSTrainer is currently only available for CPU
gpus=0,
...)
We have examples adapted from pytorch-forecasting's examples that show the significant speed-up achieved by using TSTrainer in our use cases.
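For reference, here is a self-contained sketch of the workflow (the tiny LightningModule and random data are our own placeholders, not a pytorch-forecasting model); apart from the Trainer import, everything stays plain pytorch-lightning:
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl
from bigdl.chronos.pytorch import TSTrainer as Trainer

class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(48, 5))

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return nn.functional.mse_loss(self(x), y)

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

if __name__ == "__main__":
    # random data: 256 windows of length 48 with 1 feature, predicting the next 5 points
    x = torch.randn(256, 48, 1)
    y = torch.randn(256, 5)
    loader = DataLoader(TensorDataset(x, y), batch_size=32)
    trainer = Trainer(max_epochs=1, num_processes=2, gpus=0)
    trainer.fit(TinyModel(), loader)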
2.3 Auto Tuning Acceleration
Acceleration for AutoModel and AutoTSEstimator is still under development. For now, please unset the environment variables by:
source bigdl-nano-unset-env
3. Inference Acceleration
Inference has become a critical part of a time series model's performance. It may be divided into two aspects:
- Throughput: how many samples can be predicted in a certain amount of time.
- Latency: how much time is used to predict 1 sample.
Typically, throughput and latency are a trade-off pair. Chronos provides three optimization options for inferencing.
- Default: Generally useful for both throughput and latency.
- ONNX Runtime: Users may export their trained (with or without auto tuning) model to an ONNX file and deploy it on other services. Chronos also provides internal onnxruntime inference support for users who pursue low latency and higher throughput during inference on a single node.
- Quantization: Quantization refers to processes that enable lower-precision inference. In Chronos, post-training quantization is supported, relying on Intel® Neural Compressor.
.. note::
**Additional Dependencies**:
You need to install ``neural-compressor`` to enable quantization related methods.
``pip install neural-compressor==1.8.1``
3.1 Forecaster Inference Acceleration
3.1.1 Default Acceleration
Nothing needs to be done; Chronos has deployed acceleration for inferencing by default. Some expert users may want to change the configuration or disable some acceleration tricks. Here are some instructions:
Users may unset these environment variables by:
source bigdl-nano-unset-env
3.1.2 ONNX Runtime
LSTM, TCN, Seq2seq and NBeats have supported ONNX in their forecasters. When users use these built-in models, they may call predict_with_onnx/evaluate_with_onnx for prediction or evaluation. They may also call export_onnx_file to export the ONNX model file and build_onnx to change the onnxruntime settings (not necessary).
f = Forecaster(...)
f.fit(...)
f.predict_with_onnx(...)
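The other ONNX-related methods mentioned above follow the same fragment style (a sketch; the directory name is arbitrary and argument names may differ slightly across Chronos versions):
f.evaluate_with_onnx(test_data)           # evaluation backed by onnxruntime
f.build_onnx(thread_num=1)                # optional: rebuild the onnxruntime session, e.g. to limit threads
f.export_onnx_file(dirname="fcst_onnx")   # export the ONNX model file for deployment elsewhere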
3.1.3 Quantization
LSTM, TCN and NBeats have supported quantization in their forecasters.
# init
f = Forecaster(...)
# train the forecaster
f.fit(train_data, ...)
# quantize the forecaster
f.quantize(train_data, ..., framework=...)
# predict with int8 model with better inference throughput
f.predict/predict_with_onnx(test_data, quantize=True)
# predict with fp32
f.predict/predict_with_onnx(test_data, quantize=False)
# save
f.save(checkpoint_file="fp32.model"
quantize_checkpoint_file="int8.model")
# load
f.load(checkpoint_file="fp32.model"
quantize_checkpoint_file="int8.model")
Please refer to Forecaster API Docs for details.
3.2 TSPipeline Inference Acceleration
Basically the same as Forecaster.
3.2.1 Default Acceleration
Basically the same as Forecaster.
3.2.2 ONNX Runtime
tsppl.predict_with_onnx(...)
3.2.3 Quantization
tsppl.quantize(...)
tsppl.predict/predict_with_onnx(test_data, quantize=True/False)
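Put together, a sketch (here tsppl is assumed to be a TSPipeline obtained from AutoTSEstimator.fit() or loaded from a saved pipeline, and tsdata_train/tsdata_test are assumed TSDataset objects):
tsppl.quantize(tsdata_train)                          # post-training quantization, as for Forecaster
tsppl.predict(tsdata_test, quantize=True)             # int8 inference
tsppl.predict_with_onnx(tsdata_test, quantize=False)  # fp32 inference through onnxruntime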
Please refer to TSPipeline API doc for details.