update cpp quickstart about ONEAPI_DEVICE_SELECTOR (#11630)

* update
* update
* small fix

parent af6d406178
commit ac97b31664

3 changed files with 20 additions and 0 deletions
				
			
@@ -52,6 +52,8 @@ To use GPU acceleration, several environment variables are required or recommended
   source /opt/intel/oneapi/setvars.sh
   export SYCL_CACHE_PERSISTENT=1
   export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+  # [optional] if you want to run on a single GPU, the command below limits execution to one GPU and may improve performance
+  export ONEAPI_DEVICE_SELECTOR=level_zero:0
   ```

 - For **Windows users**:
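If you are unsure which index `level_zero:0` refers to on your machine, you can list the SYCL devices first. A minimal sketch, assuming `sycl-ls` from the oneAPI toolkit is on your PATH after sourcing `setvars.sh` (the exact output format varies between oneAPI versions):

```
source /opt/intel/oneapi/setvars.sh

# each GPU is listed with its backend and index, e.g. an entry containing "level_zero" and ":0"
sycl-ls

# export the index you want before launching llama.cpp, e.g. the second GPU:
export ONEAPI_DEVICE_SELECTOR=level_zero:1
```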
				
			
@@ -63,6 +65,8 @@ To use GPU acceleration, several environment variables are required or recommended
   set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
   ```

+> [!TIP]
+> If your machine has multiple GPUs and you want to run on only one of them, set `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]`, where `[gpu_id]` is the index of the GPU you want to use. For more details, refer to [this section](../Overview/KeyFeatures/multi_gpus_selection.md#2-oneapi-device-selector).

 ##### Run llama3
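The Windows counterpart uses the same variable via `set`. A minimal sketch; the indices below are placeholders (check your own device list), and a comma-separated list such as `level_zero:0,1` should also be accepted by the selector:

```
rem run on the GPU with index 1 only
set ONEAPI_DEVICE_SELECTOR=level_zero:1

rem or expose the first two GPUs (comma-separated indices)
set ONEAPI_DEVICE_SELECTOR=level_zero:0,1
```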
				
			
@@ -128,6 +132,8 @@ Launch the Ollama service:
   source /opt/intel/oneapi/setvars.sh
   export SYCL_CACHE_PERSISTENT=1
   export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+  # [optional] if you want to run on a single GPU, the command below limits execution to one GPU and may improve performance
+  export ONEAPI_DEVICE_SELECTOR=level_zero:0

   ./ollama serve
   ```
				
			
@@ -151,6 +157,10 @@ Launch the Ollama service:
 >
 > To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.

+> [!TIP]
+> If your machine has multiple GPUs and you want to run on only one of them, set `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]`, where `[gpu_id]` is the index of the GPU you want to use. For more details, refer to [this section](../Overview/KeyFeatures/multi_gpus_selection.md#2-oneapi-device-selector).
+
+
 ##### 2.2.2 Using Ollama Run Llama3

 Keep the Ollama service on and open another terminal and run llama3 with `ollama run`:
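For reference, a minimal sketch of that second terminal, assuming the service from the previous step is still running and listening on Ollama's default port 11434:

```
# quick check that the service is reachable (lists locally available models)
curl http://localhost:11434/api/tags

# start an interactive llama3 session against the running service
./ollama run llama3
```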
				
			
@@ -118,6 +118,8 @@ To use GPU acceleration, several environment variables are required or recommended
   source /opt/intel/oneapi/setvars.sh
   export SYCL_CACHE_PERSISTENT=1
   export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+  # [optional] if you want to run on a single GPU, the command below limits execution to one GPU and may improve performance
+  export ONEAPI_DEVICE_SELECTOR=level_zero:0
   ```

 - For **Windows users**:
				
			
@@ -129,6 +131,9 @@ To use GPU acceleration, several environment variables are required or recommended
   set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
   ```

+> [!TIP]
+> If your machine has multiple GPUs and you want to run on only one of them, set `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]`, where `[gpu_id]` is the index of the GPU you want to use. For more details, refer to [this section](../Overview/KeyFeatures/multi_gpus_selection.md#2-oneapi-device-selector).
+
 ### 3. Example: Running community GGUF models with IPEX-LLM

 Here we provide a simple example to show how to run a community GGUF model with IPEX-LLM.
				
			
@@ -73,6 +73,8 @@ You may launch the Ollama service as below:
   source /opt/intel/oneapi/setvars.sh
   export SYCL_CACHE_PERSISTENT=1
   export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+  # [optional] if you want to run on a single GPU, the command below limits execution to one GPU and may improve performance
+  export ONEAPI_DEVICE_SELECTOR=level_zero:0

   ./ollama serve
   ```
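If you prefer not to keep this terminal occupied, one option is to run the service in the background and capture its output. A minimal sketch using plain shell redirection (this is not part of the quickstart itself):

```
nohup ./ollama serve > ollama.log 2>&1 &

# follow the startup messages referred to below
tail -f ollama.log
```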
				
			
@@ -97,6 +99,9 @@ You may launch the Ollama service as below:
 > [!NOTE]
 > To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.

+> [!TIP]
+> If your machine has multiple GPUs and you want to run on only one of them, set `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]`, where `[gpu_id]` is the index of the GPU you want to use. For more details, refer to [this section](../Overview/KeyFeatures/multi_gpus_selection.md#2-oneapi-device-selector).
+
 The console will display messages similar to the following:

 <a href="https://llm-assets.readthedocs.io/en/latest/_images/ollama_serve.png" target="_blank">
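The NOTE and TIP above can also be combined into a single launch by prefixing the command with one-off environment assignments. A minimal sketch, assuming you want GPU 0 and want the service reachable from other machines:

```
# inline VAR=value assignments apply only to this one command
ONEAPI_DEVICE_SELECTOR=level_zero:0 OLLAMA_HOST=0.0.0.0 ./ollama serve
```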
				
			
			