Add SYCL_CACHE_PERSISTENT in doc and explain warmup in benchmark quickstart (#10571)
* update doc
* update
parent c450c85489
commit e619142a16
3 changed files with 10 additions and 2 deletions
		| 
						 | 
					@ -499,6 +499,7 @@ To use GPU acceleration on Linux, several environment variables are required or
 | 
				
			||||||
         # Recommended Environment Variables for optimal performance
 | 
					         # Recommended Environment Variables for optimal performance
 | 
				
			||||||
         export USE_XETLA=OFF
 | 
					         export USE_XETLA=OFF
 | 
				
			||||||
         export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 | 
					         export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 | 
				
			||||||
 | 
					         export SYCL_CACHE_PERSISTENT=1
 | 
				
			||||||
 | 
					
 | 
				
			||||||
   .. tab:: Intel Data Center GPU Max
 | 
					   .. tab:: Intel Data Center GPU Max
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					@ -513,6 +514,7 @@ To use GPU acceleration on Linux, several environment variables are required or
 | 
				
			||||||
         # Recommended Environment Variables for optimal performance
 | 
					         # Recommended Environment Variables for optimal performance
 | 
				
			||||||
         export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 | 
					         export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 | 
				
			||||||
         export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 | 
					         export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 | 
				
			||||||
 | 
					         export SYCL_CACHE_PERSISTENT=1
 | 
				
			||||||
         export ENABLE_SDP_FUSION=1
 | 
					         export ENABLE_SDP_FUSION=1
 | 
				
			||||||
 | 
					
 | 
				
			||||||
      Please note that ``libtcmalloc.so`` can be installed by ``conda install -c conda-forge -y gperftools=2.10``
 | 
					      Please note that ``libtcmalloc.so`` can be installed by ``conda install -c conda-forge -y gperftools=2.10``
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -21,7 +21,7 @@ git clone https://github.com/intel-analytics/ipex-llm.git
 | 
				
			||||||
cd ipex-llm/python/llm/dev/benchmark/all-in-one/
 | 
					cd ipex-llm/python/llm/dev/benchmark/all-in-one/
 | 
				
			||||||
```
 | 
					```
 | 
				
			||||||
 | 
					
 | 
				
			||||||
## Configure YAML File
 | 
					## config.yaml
 | 
				
			||||||
 | 
					
 | 
				
			||||||
```yaml
 | 
					```yaml
 | 
				
			||||||
repo_id:
 | 
					repo_id:
 | 
				
			||||||
| 
						 | 
					@ -44,6 +44,8 @@ Some parameters in the yaml file that you can configure:
 | 
				
			||||||
 | 
					
 | 
				
			||||||
- repo_id: The name of the model and its organization.
 | 
					- repo_id: The name of the model and its organization.
 | 
				
			||||||
- local_model_hub: The folder path where the models are stored on your machine.
 | 
					- local_model_hub: The folder path where the models are stored on your machine.
 | 
				
			||||||
 | 
					- warm_up: The number of runs as warmup trials, executed before performance benchmarking.
 | 
				
			||||||
 | 
					- num_trials: The number of runs for performance benchmarking. The final benchmark result would be the average of all the trials.
 | 
				
			||||||
- low_bit: The low_bit precision you want to convert to for benchmarking.
 | 
					- low_bit: The low_bit precision you want to convert to for benchmarking.
 | 
				
			||||||
- batch_size: The number of samples on which the model makes predictions in one forward pass.
 | 
					- batch_size: The number of samples on which the model makes predictions in one forward pass.
 | 
				
			||||||
- in_out_pairs: Input sequence length and output sequence length combined by '-'.
 | 
					- in_out_pairs: Input sequence length and output sequence length combined by '-'.
 | 
				
			||||||
| 
						 | 
					@ -53,6 +55,8 @@ Some parameters in the yaml file that you can configure:
 | 
				
			||||||
  - `transformer_int4` on Intel CPU
 | 
					  - `transformer_int4` on Intel CPU
 | 
				
			||||||
- cpu_embedding: Whether to put embedding on CPU (only available now for windows gpu related test_api).
 | 
					- cpu_embedding: Whether to put embedding on CPU (only available now for windows gpu related test_api).
 | 
				
			||||||
 | 
					
 | 
				
			||||||
 | 
					Remark: If you want to benchmark the performance without warmup, you can set `warm_up: 0` as well as `num_trials: 1` in `config.yaml`, and run each single model and in_out_pair separately.
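
The interaction between `warm_up` and `num_trials` described above can be sketched as follows. This is a minimal illustration, not the benchmark script's actual code: the function `benchmark` and the callable `run_once` are hypothetical names, and wall-clock timing with `time.perf_counter` is an assumption about how a trial might be measured.

```python
import time
from statistics import mean


def benchmark(run_once, warm_up=1, num_trials=3):
    """Hypothetical helper: run warmup trials untimed, then average timed trials.

    Returns the mean wall-clock latency in milliseconds over num_trials runs.
    """
    if num_trials < 1:
        raise ValueError("num_trials must be at least 1")

    # Warmup runs: executed before benchmarking, their timings are discarded.
    for _ in range(warm_up):
        run_once()

    # Timed runs: the reported result is the average of all trials.
    latencies_ms = []
    for _ in range(num_trials):
        start = time.perf_counter()
        run_once()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return mean(latencies_ms)


# With warm_up=0 and num_trials=1, the single timed run is also the first run,
# so any one-time startup cost is included in the reported number.
calls = []
cold_ms = benchmark(lambda: calls.append(1), warm_up=0, num_trials=1)
```

With `warm_up: 0` and `num_trials: 1`, the workload runs exactly once and that run is the measurement, which is why the remark suggests running each model and in_out_pair separately.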
 | 
				
			||||||
 | 
					
 | 
				
			||||||
## Run on Windows
 | 
					## Run on Windows
 | 
				
			||||||
 | 
					
 | 
				
			||||||
Please refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration) to configure oneAPI environment variables.
 | 
					Please refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration) to configure oneAPI environment variables.
 | 
				
			||||||
| 
						 | 
					@ -144,4 +148,4 @@ Please refer to [here](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Overvie
 | 
				
			||||||
 | 
					
 | 
				
			||||||
## Result
 | 
					## Result
 | 
				
			||||||
 | 
					
 | 
				
			||||||
After the script runnning is completed, you can obtain a CSV result file under the current folder. You can mainly look at the results of columns `1st token avg latency (ms)` and `2+ avg latency (ms/token)` for  performance results. You can also check whether the column `actual input/output tokens` is consistent with the column `input/output tokens` and whether the parameters you specified in config.yaml have been successfully applied in the benchmarking.
 | 
					After the benchmarking completes, you can obtain a CSV result file under the current folder. You can mainly look at the results of columns `1st token avg latency (ms)` and `2+ avg latency (ms/token)` for the benchmark results. You can also check whether the column `actual input/output tokens` is consistent with the column `input/output tokens` and whether the parameters you specified in `config.yaml` have been successfully applied in the benchmarking.
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||
| 
						 | 
					@ -128,6 +128,7 @@ To use GPU acceleration on Linux, several environment variables are required or
 | 
				
			||||||
         # Recommended Environment Variables for optimal performance
 | 
					         # Recommended Environment Variables for optimal performance
 | 
				
			||||||
         export USE_XETLA=OFF
 | 
					         export USE_XETLA=OFF
 | 
				
			||||||
         export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 | 
					         export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 | 
				
			||||||
 | 
					         export SYCL_CACHE_PERSISTENT=1
 | 
				
			||||||
 | 
					
 | 
				
			||||||
   .. tab:: Intel Data Center GPU Max
 | 
					   .. tab:: Intel Data Center GPU Max
 | 
				
			||||||
 | 
					
 | 
				
			||||||
| 
						 | 
					@ -142,6 +143,7 @@ To use GPU acceleration on Linux, several environment variables are required or
 | 
				
			||||||
         # Recommended Environment Variables for optimal performance
 | 
					         # Recommended Environment Variables for optimal performance
 | 
				
			||||||
         export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 | 
					         export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
 | 
				
			||||||
         export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 | 
					         export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 | 
				
			||||||
 | 
					         export SYCL_CACHE_PERSISTENT=1
 | 
				
			||||||
         export ENABLE_SDP_FUSION=1
 | 
					         export ENABLE_SDP_FUSION=1
 | 
				
			||||||
 | 
					
 | 
				
			||||||
      Please note that ``libtcmalloc.so`` can be installed by ``conda install -c conda-forge -y gperftools=2.10``
 | 
					      Please note that ``libtcmalloc.so`` can be installed by ``conda install -c conda-forge -y gperftools=2.10``
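
Because a variable assigned in the shell without `export` is not passed to child processes, it can help to confirm from inside Python that the recommended settings are actually visible. The sketch below is an illustration only: the helper `missing_recommended` is a hypothetical name, and the variable names and values are simply those shown in the export lines above.

```python
import os

# Names and values copied from the recommended export lines in the docs;
# this dictionary is an assumption about what is worth checking, not an official list.
RECOMMENDED = {
    "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
    "SYCL_CACHE_PERSISTENT": "1",
}


def missing_recommended(env=None):
    """Return the sorted names of recommended variables that are unset or differ."""
    env = os.environ if env is None else env
    return sorted(
        name for name, value in RECOMMENDED.items() if env.get(name) != value
    )


if __name__ == "__main__":
    for name in missing_recommended():
        print(f"{name} is not set to the recommended value")
```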
 | 
				
			||||||
| 
						 | 
					
 | 
				
			||||||