From db00e79cdf232ea1e7036d9bd1a72566cb39a945 Mon Sep 17 00:00:00 2001
From: WeiguangHan
Date: Thu, 7 Mar 2024 18:50:29 +0800
Subject: [PATCH] LLM: add user guide for benchmarking (#10284)

* add user guide for benchmarking
* change the name and place of the benchmark user guide
* resolve some comments
* resolve new comments
* modify some typo
* resolve some new comments
* modify some descriptions
---
 .../source/_templates/sidebar_quicklinks.html |   3 +
 docs/readthedocs/source/_toc.yml              |   1 +
 .../LLM/Quickstart/benchmark_quickstart.md    | 113 ++++++++++++++++++
 .../source/doc/LLM/Quickstart/index.rst       |   3 +-
 4 files changed, 119 insertions(+), 1 deletion(-)
 create mode 100644 docs/readthedocs/source/doc/LLM/Quickstart/benchmark_quickstart.md

diff --git a/docs/readthedocs/source/_templates/sidebar_quicklinks.html b/docs/readthedocs/source/_templates/sidebar_quicklinks.html
index 5da4603d..80fc064d 100644
--- a/docs/readthedocs/source/_templates/sidebar_quicklinks.html
+++ b/docs/readthedocs/source/_templates/sidebar_quicklinks.html
@@ -15,6 +15,9 @@
         <li>
           <a href="doc/LLM/Quickstart/webui_quickstart.html">Use Text Generation WebUI on Windows with Intel GPU</a>
         </li>
+        <li>
+          <a href="doc/LLM/Quickstart/benchmark_quickstart.html">BigDL-LLM Benchmarking</a>
+        </li>
diff --git a/docs/readthedocs/source/_toc.yml b/docs/readthedocs/source/_toc.yml
index 89a29102..ba24efc4 100644
--- a/docs/readthedocs/source/_toc.yml
+++ b/docs/readthedocs/source/_toc.yml
@@ -40,6 +40,7 @@ subtrees:
           - entries:
             - file: doc/LLM/Quickstart/install_windows_gpu
             - file: doc/LLM/Quickstart/webui_quickstart
+            - file: doc/LLM/Quickstart/benchmark_quickstart
       - file: doc/LLM/Overview/KeyFeatures/index
         title: "Key Features"
         subtrees:
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/benchmark_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/benchmark_quickstart.md
new file mode 100644
index 00000000..d929e4b2
--- /dev/null
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/benchmark_quickstart.md
@@ -0,0 +1,113 @@
# BigDL-LLM Benchmarking

You can benchmark BigDL-LLM on Intel CPUs and GPUs with the benchmark scripts we provide.

## Prepare The Environment

Refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install.html) to install BigDL-LLM in your environment. The benchmark scripts also require the following dependencies:

```
pip install pandas
pip install omegaconf
```

## Prepare The Scripts

Navigate to your local workspace and clone BigDL from GitHub, then modify `config.yaml` under the `all-in-one` folder to match your own benchmark configuration.

```
cd your/local/workspace
git clone https://github.com/intel-analytics/BigDL.git
cd BigDL/python/llm/dev/benchmark/all-in-one/
```

## Configure YAML File

```yaml
repo_id:
  - 'meta-llama/Llama-2-7b-chat-hf'
local_model_hub: '/mnt/disk1/models'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
in_out_pairs:
  - '32-32'
  - '1024-128'
  - '2048-256'
test_api:
  - "transformer_int4_gpu"
cpu_embedding: False
```

Parameters you can configure in the yaml file:

- repo_id: The name of the model together with its organization.
- local_model_hub: The folder path where the models are stored on your machine.
- low_bit: The low-bit precision to convert the model to for benchmarking.
- batch_size: The number of samples the model predicts on in one forward pass.
- in_out_pairs: Input sequence length and output sequence length joined by '-'.
- test_api: The test function to use, which depends on your device:
  - `transformer_int4_gpu` on Intel GPU for Linux
  - `transformer_int4_gpu_win` on Intel GPU for Windows
  - `transformer_int4` on Intel CPU
- cpu_embedding: Whether to put the embedding layer on CPU (currently only available for the Windows GPU related test_api).

## Run on Windows

Please refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration) to configure the oneAPI environment variables.

```eval_rst
.. tabs::
    .. tab:: Intel iGPU

        .. code-block:: bash

            set SYCL_CACHE_PERSISTENT=1
            set BIGDL_LLM_XMX_DISABLED=1

            python run.py

    .. tab:: Intel Arc™ A300-Series or Pro A60

        .. code-block:: bash

            set SYCL_CACHE_PERSISTENT=1
            python run.py

    .. tab:: Other Intel dGPU Series (e.g. Arc™ A770)

        .. code-block:: bash

            python run.py

```

## Run on Linux

```eval_rst
.. tabs::
    .. tab:: Intel Arc™ A-Series and Intel Data Center GPU Flex

        For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series, we recommend:

        .. code-block:: bash

            ./run-arc.sh

    .. tab:: Intel Data Center GPU Max

        For Intel Data Center GPU Max Series, we recommend:

        .. code-block:: bash

            ./run-max-gpu.sh

        Please note that you need to run ``conda install -c conda-forge -y gperftools=2.10`` first to install the essential dependencies for Intel Data Center GPU Max.

```

## Result

After the script finishes running, a CSV result file is generated in the current folder. The `1st token avg latency (ms)` and `2+ avg latency (ms/token)` columns hold the main performance results. Also check that the `actual input/output tokens` column is consistent with the `input/output tokens` column, and that the parameters you specified in `config.yaml` were applied during benchmarking.
\ No newline at end of file
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/index.rst b/docs/readthedocs/source/doc/LLM/Quickstart/index.rst
index 021c1012..ef595249 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/index.rst
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/index.rst
@@ -8,4 +8,5 @@ BigDL-LLM Quickstart
 This section includes efficient guide to show you how to:
 
 * `Install BigDL-LLM on Windows with Intel GPU <./install_windows_gpu.html>`_
-* `Use Text Generation WebUI on Windows with Intel GPU <./webui_quickstart.html>`_
\ No newline at end of file
+* `Use Text Generation WebUI on Windows with Intel GPU <./webui_quickstart.html>`_
+* `Conduct Performance Benchmarking with BigDL-LLM <./benchmark_quickstart.html>`_
\ No newline at end of file
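
As a companion to the Result section of the patched guide, the consistency check it describes can be sketched with pandas (which the guide already installs). This is an illustrative helper, not part of the benchmark scripts: the column names are taken from the guide, while the function name `check_results` and the CSV file name are assumptions.

```python
import pandas as pd

def check_results(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose actual token counts differ from the configured ones.

    Column names follow the benchmark guide's Result section; the helper
    itself is a hypothetical example, not part of run.py.
    """
    return df[df["actual input/output tokens"] != df["input/output tokens"]]

# Typical usage after a benchmark run (file name is a placeholder):
#   df = pd.read_csv("benchmark_results.csv")
#   print(df[["1st token avg latency (ms)", "2+ avg latency (ms/token)"]])
#   mismatched = check_results(df)
```

Rows returned by `check_results` indicate runs where the configured `in_out_pairs` were not honored, which is the sanity check the guide recommends before trusting the latency columns.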