doc for Multi gpu selection (#9414)
parent 50b01058f1
commit 0f9a440b06
4 changed files with 61 additions and 0 deletions
			@ -51,6 +51,7 @@ subtrees:
                    - entries:
                      - file: doc/LLM/Overview/KeyFeatures/inference_on_gpu
                      - file: doc/LLM/Overview/KeyFeatures/finetune
                      - file: doc/LLM/Overview/KeyFeatures/multi_gpus_selection
          - file: doc/LLM/Overview/examples
            title: "Examples"
            subtrees:
			@ -5,6 +5,10 @@ BigDL-LLM not only supports running large language models for inference, but als

* |inference_on_gpu|_
* `Finetune (QLoRA) <./finetune.html>`_
* `Multi GPUs selection <./multi_gpus_selection.html>`_

.. |inference_on_gpu| replace:: Inference on GPU
.. _inference_on_gpu: ./inference_on_gpu.html

.. |multi_gpus_selection| replace:: Multi GPUs selection
.. _multi_gpus_selection: ./multi_gpus_selection.html
			@ -14,6 +14,8 @@ You may run the LLMs using ``bigdl-llm`` through one of the following APIs:

  * |inference_on_gpu|_
  * `Finetune (QLoRA) <./finetune.html>`_
  * `Multi GPUs selection <./multi_gpus_selection.html>`_

.. |transformers_style_api| replace:: ``transformers``-style API
.. _transformers_style_api: ./transformers_style_api.html
			@ -26,3 +28,6 @@ You may run the LLMs using ``bigdl-llm`` through one of the following APIs:

.. |inference_on_gpu| replace:: Inference on GPU
.. _inference_on_gpu: ./inference_on_gpu.html

.. |multi_gpus_selection| replace:: Multi GPUs selection
.. _multi_gpus_selection: ./multi_gpus_selection.html
			@ -0,0 +1,51 @@

# Multi Intel GPUs selection

In [Inference on GPU](inference_on_gpu.md) and [Finetune (QLoRA)](finetune.md), you have learned how to run inference and finetune LLMs on Intel GPUs. In this section, we show two approaches to selecting the GPU devices to use.

## List devices

The `sycl-ls` tool enumerates the devices available in the system. You can use it after you have set up the oneAPI environment:
```bash
source /opt/intel/oneapi/setvars.sh
sycl-ls
```
If you have two Arc A770 GPUs, the output will look something like this:
```
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.7.0.21_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) i9-14900K 3.0 [2023.16.7.0.21_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics 3.0 [23.17.26241.33]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics 3.0 [23.17.26241.33]
[opencl:gpu:4] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics 770 3.0 [23.17.26241.33]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:2] Intel(R) Level-Zero, Intel(R) UHD Graphics 770 1.3 [1.3.26241]
```
This output shows there are two Arc A770 GPUs on this machine.
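If you need the Level Zero GPU ids programmatically, for example to validate a configured device id, the `sycl-ls` output can be parsed with a few lines of Python. This is a minimal sketch; the `level_zero_gpus` helper and the captured `SAMPLE` text are illustrative, not part of BigDL-LLM (in practice you could capture the output with `subprocess.run(["sycl-ls"], capture_output=True, text=True)`):

```python
import re

# The sample sycl-ls output shown above, captured as a string.
SAMPLE = """\
[opencl:acc:0] Intel(R) FPGA Emulation Platform for OpenCL(TM), Intel(R) FPGA Emulation Device 1.2 [2023.16.7.0.21_160000]
[opencl:cpu:1] Intel(R) OpenCL, Intel(R) Core(TM) i9-14900K 3.0 [2023.16.7.0.21_160000]
[opencl:gpu:2] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics 3.0 [23.17.26241.33]
[opencl:gpu:3] Intel(R) OpenCL Graphics, Intel(R) Arc(TM) A770 Graphics 3.0 [23.17.26241.33]
[opencl:gpu:4] Intel(R) OpenCL Graphics, Intel(R) UHD Graphics 770 3.0 [23.17.26241.33]
[ext_oneapi_level_zero:gpu:0] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:1] Intel(R) Level-Zero, Intel(R) Arc(TM) A770 Graphics 1.3 [1.3.26241]
[ext_oneapi_level_zero:gpu:2] Intel(R) Level-Zero, Intel(R) UHD Graphics 770 1.3 [1.3.26241]
"""

def level_zero_gpus(sycl_ls_output):
    """Return {gpu_id: device_name} for every Level Zero GPU line."""
    pattern = re.compile(r"\[ext_oneapi_level_zero:gpu:(\d+)\]\s*(.*)")
    gpus = {}
    for line in sycl_ls_output.splitlines():
        m = pattern.match(line)
        if m:
            gpus[int(m.group(1))] = m.group(2).strip()
    return gpus

gpus = level_zero_gpus(SAMPLE)
for gpu_id, name in sorted(gpus.items()):
    print(f"level_zero:{gpu_id} -> {name}")
```

For the sample above this lists three Level Zero GPUs: ids 0 and 1 are the two Arc A770 cards and id 2 is the integrated UHD Graphics.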

## Devices selection

To run on an XPU, you need to move your model and input tensors to the XPU device:
```python
model = model.to('xpu')
input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')
```
There are two ways to select the desired device: change the code, or set an environment variable.

### 1. Select device in Python

To specify an XPU, change `to('xpu')` to `to('xpu:[device_id]')`, where `device_id` is counted from zero.
If you want to use the second device, change the code like this:
```python
model = model.to('xpu:1')
input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu:1')
```

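If the device id comes from configuration rather than being hard-coded, a small helper can build and validate the device string in one place. The `xpu_device` function below is hypothetical, not part of any BigDL-LLM API:

```python
def xpu_device(device_id=None):
    """Build a PyTorch-style XPU device string; device_id is counted from zero."""
    if device_id is None:
        return "xpu"  # default XPU device
    if not isinstance(device_id, int) or device_id < 0:
        raise ValueError(f"device_id must be a non-negative int, got {device_id!r}")
    return f"xpu:{device_id}"

# On an XPU machine, usage would be: model = model.to(xpu_device(1))
print(xpu_device(), xpu_device(1))  # prints: xpu xpu:1
```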
### 2. oneAPI device selector

The environment variable `ONEAPI_DEVICE_SELECTOR` can be used to limit the choice of Intel GPU devices. As the `sycl-ls` output above shows, the last three lines are three Level Zero GPU devices, so you can use `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]` to select a device.
For example, if you want to use the second A770 GPU, run your Python script like this:
```bash
ONEAPI_DEVICE_SELECTOR=level_zero:1 python generate.py
```
`ONEAPI_DEVICE_SELECTOR=level_zero:1` in the command above only affects that Python process. Alternatively, you can export the environment variable and then run your script:
```bash
export ONEAPI_DEVICE_SELECTOR=level_zero:1
python generate.py
```
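Both forms simply scope the selection to the processes that inherit the variable. The same effect can be achieved when launching a script from Python by passing an explicit environment to the child process; in this sketch the inline child script just echoes the variable and stands in for a real `generate.py`:

```python
import os
import subprocess
import sys

# Run a child process with ONEAPI_DEVICE_SELECTOR set, without modifying the
# parent's environment. The inline "-c" script stands in for generate.py.
env = dict(os.environ, ONEAPI_DEVICE_SELECTOR="level_zero:1")
child = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ.get('ONEAPI_DEVICE_SELECTOR'))"],
    env=env, capture_output=True, text=True, check=True,
)
print(child.stdout.strip())  # prints: level_zero:1
```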
 | 
			
		||||
		Loading…
	
		Reference in a new issue