From db00e79cdf232ea1e7036d9bd1a72566cb39a945 Mon Sep 17 00:00:00 2001
From: WeiguangHan
Date: Thu, 7 Mar 2024 18:50:29 +0800
Subject: [PATCH] LLM: add user guide for benchmarking (#10284)

* add user guide for benchmarking
* change the name and place of the benchmark user guide
* resolve some comments
* resolve new comments
* modify some typo
* resolve some new comments
* modify some descriptions
---
 .../source/_templates/sidebar_quicklinks.html |   3 +
 docs/readthedocs/source/_toc.yml              |   1 +
 .../LLM/Quickstart/benchmark_quickstart.md    | 113 ++++++++++++++++++
 .../source/doc/LLM/Quickstart/index.rst       |   3 +-
 4 files changed, 119 insertions(+), 1 deletion(-)
 create mode 100644 docs/readthedocs/source/doc/LLM/Quickstart/benchmark_quickstart.md

diff --git a/docs/readthedocs/source/_templates/sidebar_quicklinks.html b/docs/readthedocs/source/_templates/sidebar_quicklinks.html
index 5da4603d..80fc064d 100644
--- a/docs/readthedocs/source/_templates/sidebar_quicklinks.html
+++ b/docs/readthedocs/source/_templates/sidebar_quicklinks.html
@@ -15,6 +15,9 @@
         <li>
           <a href="doc/LLM/Quickstart/webui_quickstart.html">Use Text Generation WebUI on Windows with Intel GPU</a>
         </li>
+        <li>
+          <a href="doc/LLM/Quickstart/benchmark_quickstart.html">BigDL-LLM Benchmarking</a>
+        </li>
diff --git a/docs/readthedocs/source/_toc.yml b/docs/readthedocs/source/_toc.yml
index 89a29102..ba24efc4 100644
--- a/docs/readthedocs/source/_toc.yml
+++ b/docs/readthedocs/source/_toc.yml
@@ -40,6 +40,7 @@ subtrees:
           - entries:
             - file: doc/LLM/Quickstart/install_windows_gpu
             - file: doc/LLM/Quickstart/webui_quickstart
+            - file: doc/LLM/Quickstart/benchmark_quickstart
       - file: doc/LLM/Overview/KeyFeatures/index
         title: "Key Features"
         subtrees:
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/benchmark_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/benchmark_quickstart.md
new file mode 100644
index 00000000..d929e4b2
--- /dev/null
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/benchmark_quickstart.md
@@ -0,0 +1,113 @@
# BigDL-LLM Benchmarking

You can benchmark BigDL-LLM on Intel CPUs and GPUs with the benchmark scripts we provide.

## Prepare The Environment

Refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install.html) to install BigDL-LLM in your environment. The benchmark scripts also require the following dependencies:

```
pip install pandas
pip install omegaconf
```

## Prepare The Scripts

Navigate to your local workspace and clone BigDL from GitHub, then modify `config.yaml` under the `all-in-one` folder to match your own benchmark configuration.

```
cd your/local/workspace
git clone https://github.com/intel-analytics/BigDL.git
cd BigDL/python/llm/dev/benchmark/all-in-one/
```

## Configure YAML File

```yaml
repo_id:
  - 'meta-llama/Llama-2-7b-chat-hf'
local_model_hub: '/mnt/disk1/models'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
in_out_pairs:
  - '32-32'
  - '1024-128'
  - '2048-256'
test_api:
  - "transformer_int4_gpu"
cpu_embedding: False
```

Parameters you can configure in the yaml file:

- repo_id: The name of the model together with its organization.
- local_model_hub: The folder path where the models are stored on your machine.
- low_bit: The low-bit precision to convert the model to for benchmarking.
- batch_size: The number of samples the model predicts on in one forward pass.
- in_out_pairs: Input sequence length and output sequence length joined by '-'.
- test_api: The test function to use, which depends on your device:
  - `transformer_int4_gpu` on Intel GPU for Linux
  - `transformer_int4_gpu_win` on Intel GPU for Windows
  - `transformer_int4` on Intel CPU
- cpu_embedding: Whether to put the embedding layer on CPU (currently only available for the Windows GPU related test_api).

## Run on Windows

Please refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration) to configure the oneAPI environment variables.

```eval_rst
.. tabs::
    .. tab:: Intel iGPU

        .. code-block:: bash

            set SYCL_CACHE_PERSISTENT=1
            set BIGDL_LLM_XMX_DISABLED=1

            python run.py

    .. tab:: Intel Arc™ A300-Series or Pro A60

        .. code-block:: bash

            set SYCL_CACHE_PERSISTENT=1
            python run.py

    .. tab:: Other Intel dGPU Series (e.g. Arc™ A770)

        .. code-block:: bash

            python run.py

```

## Run on Linux

```eval_rst
.. tabs::
    .. tab:: Intel Arc™ A-Series and Intel Data Center GPU Flex

        For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series, we recommend:

        .. code-block:: bash

            ./run-arc.sh

    .. tab:: Intel Data Center GPU Max

        For Intel Data Center GPU Max Series, we recommend:

        .. code-block:: bash

            ./run-max-gpu.sh

        Please note that you need to run ``conda install -c conda-forge -y gperftools=2.10`` first to install the essential dependencies for Intel Data Center GPU Max.

```

## Result

After the script finishes running, a CSV result file is generated in the current folder. The `1st token avg latency (ms)` and `2+ avg latency (ms/token)` columns hold the main performance results. Also check that the `actual input/output tokens` column is consistent with the `input/output tokens` column, and that the parameters you specified in `config.yaml` were applied during benchmarking.
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/index.rst b/docs/readthedocs/source/doc/LLM/Quickstart/index.rst
index 021c1012..ef595249 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/index.rst
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/index.rst
@@ -8,4 +8,5 @@ BigDL-LLM Quickstart
 This section includes efficient guide to show you how to:
 
 * `Install BigDL-LLM on Windows with Intel GPU <./install_windows_gpu.html>`_
-* `Use Text Generation WebUI on Windows with Intel GPU <./webui_quickstart.html>`_
\ No newline at end of file
+* `Use Text Generation WebUI on Windows with Intel GPU <./webui_quickstart.html>`_
+* `Conduct Performance Benchmarking with BigDL-LLM <./benchmark_quickstart.html>`_
\ No newline at end of file
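
As a companion to the Result section of the patched guide, the consistency check it describes can be sketched with pandas (which the guide already installs). This is an illustrative helper, not part of the benchmark scripts: the column names are taken from the guide, while the function name `check_results` and the CSV file name are assumptions.

```python
import pandas as pd

def check_results(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose actual token counts differ from the configured ones.

    Column names follow the benchmark guide's Result section; the helper
    itself is a hypothetical example, not part of run.py.
    """
    return df[df["actual input/output tokens"] != df["input/output tokens"]]

# Typical usage after a benchmark run (file name is a placeholder):
#   df = pd.read_csv("benchmark_results.csv")
#   print(df[["1st token avg latency (ms)", "2+ avg latency (ms/token)"]])
#   mismatched = check_results(df)
```

Rows returned by `check_results` indicate runs where the configured `in_out_pairs` were not honored, which is the sanity check the guide recommends before trusting the latency columns.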