LLM: add user guide for benchmarking (#10284)
* add user guide for benchmarking
* change the name and place of the benchmark user guide
* resolve some comments
* resolve new comments
* fix some typos
* resolve some new comments
* modify some descriptions
This commit is contained in:
parent 1ac193ba02
commit db00e79cdf

4 changed files with 119 additions and 1 deletions
			@ -15,6 +15,9 @@
                    <li>
                        <a href="doc/LLM/Quickstart/webui_quickstart.html">Use Text Generation WebUI on Windows with Intel GPU</a>
                    </li>
                    <li>
                        <a href="doc/LLM/Quickstart/benchmark_quickstart.html">BigDL-LLM Benchmarking</a>
                    </li>
                </ul>
            </li>
            <li>
			@ -40,6 +40,7 @@ subtrees:
              - entries:
                - file: doc/LLM/Quickstart/install_windows_gpu
                - file: doc/LLM/Quickstart/webui_quickstart
                - file: doc/LLM/Quickstart/benchmark_quickstart
          - file: doc/LLM/Overview/KeyFeatures/index
            title: "Key Features"
            subtrees:
			@ -0,0 +1,113 @@
# BigDL-LLM Benchmarking

We can benchmark BigDL-LLM on Intel CPUs and GPUs using the benchmark scripts we provide.

## Prepare The Environment

You can refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install.html) to install BigDL-LLM in your environment. The following dependencies are also required to run the benchmark scripts:

```bash
pip install pandas
pip install omegaconf
```

## Prepare The Scripts

Navigate to your local workspace, clone BigDL from GitHub, and change into the `all-in-one` benchmark folder. Then modify the `config.yaml` under that folder with your own benchmark configurations:

```bash
cd your/local/workspace
git clone https://github.com/intel-analytics/BigDL.git
cd BigDL/python/llm/dev/benchmark/all-in-one/
```

## Configure YAML File

```yaml
repo_id:
  - 'meta-llama/Llama-2-7b-chat-hf'
local_model_hub: '/mnt/disk1/models'
warm_up: 1
num_trials: 3
num_beams: 1 # default to greedy search
low_bit: 'sym_int4' # default to use 'sym_int4' (i.e. symmetric int4)
batch_size: 1 # default to 1
in_out_pairs:
  - '32-32'
  - '1024-128'
  - '2048-256'
test_api:
  - "transformer_int4_gpu"
cpu_embedding: False
```

Some parameters in the YAML file that you can configure:

- `repo_id`: The name of the model and its organization.
- `local_model_hub`: The folder path where the models are stored on your machine.
- `low_bit`: The low-bit precision to convert the model to for benchmarking.
- `batch_size`: The number of samples on which the model makes predictions in one forward pass.
- `in_out_pairs`: The input and output sequence lengths, joined by '-' (e.g. '1024-128' means 1024 input tokens and 128 output tokens).
- `test_api`: Which test function to use, depending on your machine:
  - `transformer_int4_gpu` on Intel GPU for Linux
  - `transformer_int4_gpu_win` on Intel GPU for Windows
  - `transformer_int4` on Intel CPU
- `cpu_embedding`: Whether to put the embedding layer on CPU (currently only available for the Windows GPU-related `test_api` options).
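
For illustration, the trial grid implied by `repo_id` and `in_out_pairs` can be sketched in plain Python. The helper names below are hypothetical and are not part of the BigDL all-in-one scripts; they only show how each model is paired with each input/output setting:

```python
def parse_in_out_pair(pair: str) -> tuple[int, int]:
    # "1024-128" -> (1024, 128): 1024 input tokens, 128 output tokens
    in_len, out_len = pair.split("-")
    return int(in_len), int(out_len)

def expand_trials(config: dict) -> list[dict]:
    # One trial per (model, in/out pair) combination, mirroring how a
    # runner could iterate the fields of config.yaml.
    trials = []
    for repo_id in config["repo_id"]:
        for pair in config["in_out_pairs"]:
            in_len, out_len = parse_in_out_pair(pair)
            trials.append({
                "repo_id": repo_id,
                "in_len": in_len,
                "out_len": out_len,
                "low_bit": config["low_bit"],
                "batch_size": config["batch_size"],
            })
    return trials

# Values taken from the example config.yaml above.
config = {
    "repo_id": ["meta-llama/Llama-2-7b-chat-hf"],
    "in_out_pairs": ["32-32", "1024-128", "2048-256"],
    "low_bit": "sym_int4",
    "batch_size": 1,
}
trials = expand_trials(config)  # 3 trials for the single model
```

With the example config unchanged, this grid contains three trials, one per in/out pair.
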
## Run on Windows

Please refer to [here](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration) for instructions on configuring oneAPI environment variables.

```eval_rst
.. tabs::
   .. tab:: Intel iGPU

      .. code-block:: bash

         set SYCL_CACHE_PERSISTENT=1
         set BIGDL_LLM_XMX_DISABLED=1

         python run.py

   .. tab:: Intel Arc™ A300-Series or Pro A60

      .. code-block:: bash

         set SYCL_CACHE_PERSISTENT=1
         python run.py

   .. tab:: Other Intel dGPU Series (e.g. Arc™ A770)

      .. code-block:: bash

         python run.py

```

## Run on Linux

```eval_rst
.. tabs::
   .. tab:: Intel Arc™ A-Series and Intel Data Center GPU Flex

      For Intel Arc™ A-Series Graphics and Intel Data Center GPU Flex Series, we recommend:

      .. code-block:: bash

         ./run-arc.sh

   .. tab:: Intel Data Center GPU Max

      For Intel Data Center GPU Max Series, we recommend:

      .. code-block:: bash

         ./run-max-gpu.sh

      Please note that you need to run ``conda install -c conda-forge -y gperftools=2.10`` to install essential dependencies for Intel Data Center GPU Max.

```

## Result

After the script finishes running, you can find a CSV result file under the current folder. For performance numbers, focus mainly on the `1st token avg latency (ms)` and `2+ avg latency (ms/token)` columns. You can also check whether the `actual input/output tokens` column is consistent with the `input/output tokens` column, to verify that the parameters you specified in `config.yaml` were successfully applied during benchmarking.
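
As a reading aid, the consistency check described above can be sketched with Python's standard `csv` module. Only the `input/output tokens` and `actual input/output tokens` column names come from the result file; the `model` column, the sample rows, and the latency values below are made up for illustration:

```python
import csv
import io

def inconsistent_pairs(csv_file) -> list[str]:
    # Return the requested in/out settings whose actual token counts
    # differ, signalling that config.yaml was not fully applied.
    return [
        row["input/output tokens"]
        for row in csv.DictReader(csv_file)
        if row["actual input/output tokens"] != row["input/output tokens"]
    ]

# Made-up sample in the shape of the result CSV (the `model` column
# and the latency values are assumptions for illustration).
sample = io.StringIO(
    "model,input/output tokens,actual input/output tokens,"
    "1st token avg latency (ms),2+ avg latency (ms/token)\n"
    "Llama-2-7b-chat-hf,32-32,32-32,45.1,21.3\n"
    "Llama-2-7b-chat-hf,1024-128,1000-128,210.5,23.8\n"
)
mismatches = inconsistent_pairs(sample)
```

Here the second row would be flagged, since only 1000 of the requested 1024 input tokens were actually used.
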
			@ -8,4 +8,5 @@ BigDL-LLM Quickstart
This section includes efficient guides to show you how to:

* `Install BigDL-LLM on Windows with Intel GPU <./install_windows_gpu.html>`_
* `Use Text Generation WebUI on Windows with Intel GPU <./webui_quickstart.html>`_
* `Conduct Performance Benchmarking with BigDL-LLM <./benchmark_quickstart.html>`_