LLM: add user guide for benchmarking (#10284)
* add user guide for benchmarking
* change the name and place of the benchmark user guide
* resolve some comments
* resolve new comments
* modify some typo
* resolve some new comments
* modify some descriptions
This commit is contained in: parent 1ac193ba02, commit db00e79cdf
4 changed files with 119 additions and 1 deletion
@@ -15,6 +15,9 @@
        <li>
          <a href="doc/LLM/Quickstart/webui_quickstart.html">Use Text Generation WebUI on Windows with Intel GPU</a>
        </li>
        <li>
          <a href="doc/LLM/Quickstart/benchmark_quickstart.html">BigDL-LLM Benchmarking</a>
        </li>
      </ul>
    </li>
    <li>
@@ -40,6 +40,7 @@ subtrees:
      - entries:
        - file: doc/LLM/Quickstart/install_windows_gpu
        - file: doc/LLM/Quickstart/webui_quickstart
        - file: doc/LLM/Quickstart/benchmark_quickstart
  - file: doc/LLM/Overview/KeyFeatures/index
    title: "Key Features"
    subtrees:
@@ -0,0 +1,113 @@
# BigDL-LLM Benchmarking

We can do benchmarking for BigDL-LLM on Intel CPUs and GPUs using the benchmark scripts we provide.

## Prepare The Environment

You can refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install.html) to install BigDL-LLM in your environment. The following dependencies are also needed to run the benchmark scripts:

```
pip install pandas
pip install omegaconf
```

## Prepare The Scripts

Navigate to your local workspace and then download BigDL from GitHub. Modify the `config.yaml` under the `all-in-one` folder for your own benchmark configurations.

```
cd your/local/workspace
git clone https://github.com/intel-analytics/BigDL.git
cd BigDL/python/llm/dev/benchmark/all-in-one/
```

## Configure YAML File

```yaml
repo_id:
  - 'meta-llama/Llama-2-7b-chat-hf'
local_model_hub: '/mnt/disk1/models'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
in_out_pairs:
  - '32-32'
  - '1024-128'
  - '2048-256'
test_api:
  - "transformer_int4_gpu"
cpu_embedding: False
```

Some parameters in the YAML file that you can configure:

- repo_id: The name of the model and its organization.
- local_model_hub: The folder path where the models are stored on your machine.
- low_bit: The low-bit precision to which you want to convert the model for benchmarking.
- batch_size: The number of samples on which the model makes predictions in one forward pass.
- in_out_pairs: Input sequence length and output sequence length, joined by '-'.
- test_api: Use different test functions on different machines:
  - `transformer_int4_gpu` on Intel GPU for Linux
  - `transformer_int4_gpu_win` on Intel GPU for Windows
  - `transformer_int4` on Intel CPU
- cpu_embedding: Whether to put the embedding layer on CPU (currently only available for Windows GPU related test APIs).
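
Before starting a long benchmark run, it can help to sanity-check the configuration first. Below is a minimal sketch using the `omegaconf` dependency installed earlier; the field names follow the `config.yaml` example above, and the validation logic itself is only an illustration, not part of the benchmark scripts:

```python
from omegaconf import OmegaConf

# Load the benchmark configuration (assumes config.yaml is in the current folder)
conf = OmegaConf.load('config.yaml')

# Basic sanity checks before kicking off a long run
assert len(conf.repo_id) > 0, "specify at least one model in repo_id"
for pair in conf.in_out_pairs:
    in_len, out_len = pair.split('-')
    assert in_len.isdigit() and out_len.isdigit(), f"malformed in_out_pairs entry: {pair}"

print(f"Benchmarking {len(conf.repo_id)} model(s) with test_api={list(conf.test_api)}")
```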

## Run on Windows

Please refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration) to configure oneAPI environment variables.

```eval_rst
.. tabs::

   .. tab:: Intel iGPU

      .. code-block:: bash

         set SYCL_CACHE_PERSISTENT=1
         set BIGDL_LLM_XMX_DISABLED=1

         python run.py

   .. tab:: Intel Arc™ A300-Series or Pro A60

      .. code-block:: bash

         set SYCL_CACHE_PERSISTENT=1

         python run.py

   .. tab:: Other Intel dGPU Series (e.g. Arc™ A770)

      .. code-block:: bash

         python run.py
```

## Run on Linux

```eval_rst
.. tabs::

   .. tab:: Intel Arc™ A-Series and Intel Data Center GPU Flex

      For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series, we recommend:

      .. code-block:: bash

         ./run-arc.sh

   .. tab:: Intel Data Center GPU Max

      For Intel Data Center GPU Max Series, we recommend:

      .. code-block:: bash

         ./run-max-gpu.sh

      Please note that you need to run ``conda install -c conda-forge -y gperftools=2.10`` first to install essential dependencies for Intel Data Center GPU Max.
```

## Result

After the script finishes running, you can find a CSV result file under the current folder. For performance results, look mainly at the `1st token avg latency (ms)` and `2+ avg latency (ms/token)` columns. You can also check whether the `actual input/output tokens` column is consistent with the `input/output tokens` column, and whether the parameters you specified in `config.yaml` have been successfully applied in the benchmarking.
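
For example, you can inspect the result file with the `pandas` dependency installed earlier. A minimal sketch (the `*.csv` glob pattern is an assumption, since the exact filename is generated by the script; the column names are the ones described above):

```python
import glob

import pandas as pd

# Pick up the benchmark result CSV(s) from the current folder
for path in glob.glob('*.csv'):
    df = pd.read_csv(path)
    print(f"== {path} ==")
    # The two main performance columns described above
    print(df[['1st token avg latency (ms)', '2+ avg latency (ms/token)']])
    # Check that requested and actual token counts agree
    mismatches = df[df['input/output tokens'] != df['actual input/output tokens']]
    if not mismatches.empty:
        print("Warning: some runs did not hit the requested token counts")
```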

@@ -8,4 +8,5 @@ BigDL-LLM Quickstart
This section includes efficient guides to show you how to:

* `Install BigDL-LLM on Windows with Intel GPU <./install_windows_gpu.html>`_
* `Use Text Generation WebUI on Windows with Intel GPU <./webui_quickstart.html>`_
* `Conduct Performance Benchmarking with BigDL-LLM <./benchmark_quickstart.html>`_