diff --git a/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_benchmark_tool.md b/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_benchmark_tool.md
index 21ae9157..87032714 100644
--- a/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_benchmark_tool.md
+++ b/docs/readthedocs/source/doc/Chronos/Howto/how_to_use_benchmark_tool.md
@@ -28,10 +28,16 @@
 benchmark-chronos -m lstm -l 96 -o 720
 ```
 ### Stage
-Regarding a model, training and inference stages are most concerned. By setting `-s/--stage` parameter, users can obtain knowledge of throughput during training (`-s train`), throughput during inference (`-s throughput`) and latency of inference (`-s latency`). If not specified, train is used as the default.
+For a model, the training and inference stages are usually of most interest. By setting the `-s/--stage` parameter, users can measure throughput during training (`-s train`), accuracy after training (`-s accuracy`), throughput during inference (`-s throughput`) and latency of inference (`-s latency`). If not specified, train is used as the default.
 ```bash
 benchmark-chronos -s latency -l 96 -o 720
 ```
+```eval_rst
+.. note::
+    **More About Accuracy Results**:
+
+    After setting ``-s accuracy``, the tool loads the dataset and splits it into train, validation and test sets with a ratio of 7:1:2. Validation loss is monitored during the training epochs, and the checkpoint of the epoch with the smallest loss is loaded back after training. The trained forecaster is then evaluated with the metrics specified by ``--metrics``.
+```
 ### Dataset
 Several built-in datasets can be chosen, including nyc_taxi and tsinghua_electricity. If users are with poor Internet connection and hard to download dataset, run benchmark tool with `-d synthetic_dataset` to use synthetic dataset. Default to be tsinghua_electricity if `-d/--dataset` parameter is not specified.
@@ -74,12 +80,26 @@
 benchmark-chronos -l 96 -o 720
 ```
 ## Advanced Options
+When `-s/--stage accuracy` is set, users can further specify evaluation metrics through `--metrics`, which defaults to mse and mae.
+```bash
+benchmark-chronos --stage accuracy --metrics mse rmse -l 96 -o 720
+```
+
+To improve model accuracy, the tool provides a normalization trick to alleviate distribution shift. Once `--normalization` is enabled, the normalization trick will be applied to the forecaster.
+```bash
+benchmark-chronos --stage accuracy --normalization -l 96 -o 720
+```
+```eval_rst
+.. note::
+    Only TCNForecaster supports the normalization trick for now.
+```
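+
+For example, the accuracy options above can be combined in a single run. The invocation below is only an illustrative combination of flags documented on this page; `-m tcn` is spelled out because only TCNForecaster supports the normalization trick:
+```bash
+# illustrative: accuracy stage with explicit metrics and the normalization trick
+benchmark-chronos -m tcn --stage accuracy --metrics mse mae --normalization -l 96 -o 720
+```
+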
 Besides, number of processes and epoches can be set by `--training_processes` and `--training_epochs`. Users can also tune batchsize during training and inference through `--training_batchsize` and `--inference_batchsize` respectively.
 ```bash
 benchmark-chronos --training_processes 2 --training_epochs 3 --training_batchsize 32 --inference_batchsize 128 -l 96 -o 720
 ```
-To speed up inference, accelerators like ONNXRuntime and OpenVINO are usually used. To benchmark inference performance with or without accelerator, run tool with `--inference_framework` to specify without accelerator (`--inference_framework torch`)or with ONNXRuntime (`--inference_framework onnx`) or with OpenVINO (`--inference_framework openvino`).
+To speed up inference, accelerators like ONNXRuntime and OpenVINO are usually used. To benchmark inference performance with or without an accelerator, run the tool with `--inference_framework`: without an accelerator (`--inference_framework torch`), with ONNXRuntime (`--inference_framework onnx`), with OpenVINO (`--inference_framework openvino`), or with jit (`--inference_framework jit`).
 ```bash
 benchmark-chronos --inference_framework onnx -l 96 -o 720
 ```
@@ -101,21 +121,21 @@
 benchmark-chronos -h
 ```eval_rst
 .. code-block:: python
 
-   usage: benchmark-chronos.py [-h] [-m] [-s] [-d] [-f] [-c] -l lookback -o
-                               horizon [--training_processes]
-                               [--training_batchsize] [--training_epochs]
-                               [--inference_batchsize] [--quantize]
-                               [--inference_framework [...]] [--ipex]
-                               [--quantize_type] [--ckpt]
+   usage: benchmark-chronos [-h] [-m] [-s] [-d] [-f] [-c] -l lookback -o horizon
+                            [--training_processes] [--training_batchsize]
+                            [--training_epochs] [--inference_batchsize]
+                            [--quantize] [--inference_framework [...]] [--ipex]
+                            [--quantize_type] [--ckpt] [--metrics [...]]
+                            [--normalization]
 
-   Benchmarking Parameters
+   Benchmarking Parameters
 
-   optional arguments:
+   optional arguments:
    -h, --help            show this help message and exit
    -m, --model           model name, choose from
                          tcn/lstm/seq2seq/nbeats/autoformer, default to "tcn".
-   -s, --stage           stage name, choose from train/latency/throughput,
-                         default to "train".
+   -s, --stage           stage name, choose from
+                         train/latency/throughput/accuracy, default to "train".
    -d, --dataset         dataset name, choose from
                          nyc_taxi/tsinghua_electricity/synthetic_dataset,
                          default to "tsinghua_electricity".
@@ -137,7 +157,7 @@
                          False.
    --inference_framework [ ...]
                          predict without/with accelerator, choose from
-                         torch/onnx/openvino, default to "torch" (i.e. predict
+                         torch/onnx/openvino/jit, default to "torch" (i.e. predict
                          without accelerator).
    --ipex                if use ipex as accelerator for trainer, default to
                          False.
@@ -146,5 +166,9 @@
                          default to "pytorch_fx".
    --ckpt                checkpoint path of a trained model, e.g.
                          "checkpoints/tcn", default to "checkpoints/tcn".
+   --metrics [ ...]      evaluation metrics of a trained model, e.g.
+                         "mse"/"mae", default to "mse, mae".
+   --normalization       if to use normalization trick to alleviate
+                         distribution shift.
    ```
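+
+Putting the pieces together, the following is an illustrative end-to-end run built only from options documented on this page, benchmarking inference throughput of a TCN model on nyc_taxi with OpenVINO acceleration:
+```bash
+# illustrative: throughput stage with an accelerated inference framework
+benchmark-chronos -m tcn -s throughput -d nyc_taxi --inference_framework openvino -l 96 -o 720
+```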