diff --git a/docs/mddocs/Quickstart/llama3_llamacpp_ollama_quickstart.md b/docs/mddocs/Quickstart/llama3_llamacpp_ollama_quickstart.md
index f42ffe46..40fcd31f 100644
--- a/docs/mddocs/Quickstart/llama3_llamacpp_ollama_quickstart.md
+++ b/docs/mddocs/Quickstart/llama3_llamacpp_ollama_quickstart.md
@@ -52,6 +52,8 @@ To use GPU acceleration, several environment variables are required or recommend
   source /opt/intel/oneapi/setvars.sh
   export SYCL_CACHE_PERSISTENT=1
   export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+  # [optional] if you want to run on a single GPU, the command below limits execution to that GPU, which may improve performance
+  export ONEAPI_DEVICE_SELECTOR=level_zero:0
   ```
 
 - For **Windows users**:
@@ -63,6 +65,8 @@ To use GPU acceleration, several environment variables are required or recommend
   set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
   ```
 
+> [!TIP]
+> If your machine has multiple GPUs and you want to run on only one of them, set `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]`, where `[gpu_id]` is the index of the GPU you want to use. For more details, refer to [this section](../Overview/KeyFeatures/multi_gpus_selection.md#2-oneapi-device-selector).
 
 ##### Run llama3
 
@@ -128,6 +132,8 @@ Launch the Ollama service:
   source /opt/intel/oneapi/setvars.sh
   export SYCL_CACHE_PERSISTENT=1
   export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+  # [optional] if you want to run on a single GPU, the command below limits execution to that GPU, which may improve performance
+  export ONEAPI_DEVICE_SELECTOR=level_zero:0
 
   ./ollama serve
   ```
@@ -151,6 +157,10 @@ Launch the Ollama service:
 > 
 > To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.
 
+> [!TIP]
+> If your machine has multiple GPUs and you want to run on only one of them, set `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]`, where `[gpu_id]` is the index of the GPU you want to use. For more details, refer to [this section](../Overview/KeyFeatures/multi_gpus_selection.md#2-oneapi-device-selector).
+
+
 ##### 2.2.2 Using Ollama Run Llama3
 
 Keep the Ollama service on and open another terminal and run llama3 with `ollama run`:
diff --git a/docs/mddocs/Quickstart/llama_cpp_quickstart.md b/docs/mddocs/Quickstart/llama_cpp_quickstart.md
index afac3bf1..2bd16289 100644
--- a/docs/mddocs/Quickstart/llama_cpp_quickstart.md
+++ b/docs/mddocs/Quickstart/llama_cpp_quickstart.md
@@ -118,6 +118,8 @@ To use GPU acceleration, several environment variables are required or recommend
   source /opt/intel/oneapi/setvars.sh
   export SYCL_CACHE_PERSISTENT=1
   export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+  # [optional] if you want to run on a single GPU, the command below limits execution to that GPU, which may improve performance
+  export ONEAPI_DEVICE_SELECTOR=level_zero:0
   ```
 
 - For **Windows users**:
@@ -129,6 +131,9 @@ To use GPU acceleration, several environment variables are required or recommend
   set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
   ```
 
+> [!TIP]
+> If your machine has multiple GPUs and you want to run on only one of them, set `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]`, where `[gpu_id]` is the index of the GPU you want to use. For more details, refer to [this section](../Overview/KeyFeatures/multi_gpus_selection.md#2-oneapi-device-selector).
+
 ### 3. Example: Running community GGUF models with IPEX-LLM
 
 Here we provide a simple example to show how to run a community GGUF model with IPEX-LLM.
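A side note on the Windows portions of these quickstarts: the hunks above add the optional `ONEAPI_DEVICE_SELECTOR` line only to the Linux `export` blocks, while the new tip also applies to the Windows `set` blocks. The sketch below shows what the equivalent `cmd` lines might look like; it is illustrative only, and the device index `0` is an assumption to be replaced with the index of your target GPU.

```cmd
:: List the devices the oneAPI runtime can see (sycl-ls ships with the oneAPI toolkit);
:: Level Zero GPUs appear as level_zero:0, level_zero:1, and so on.
sycl-ls

:: [optional] pin execution to a single GPU; replace 0 with the index reported by sycl-ls
set ONEAPI_DEVICE_SELECTOR=level_zero:0
```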
diff --git a/docs/mddocs/Quickstart/ollama_quickstart.md b/docs/mddocs/Quickstart/ollama_quickstart.md
index 8f940c43..d4519be8 100644
--- a/docs/mddocs/Quickstart/ollama_quickstart.md
+++ b/docs/mddocs/Quickstart/ollama_quickstart.md
@@ -73,6 +73,8 @@ You may launch the Ollama service as below:
   source /opt/intel/oneapi/setvars.sh
   export SYCL_CACHE_PERSISTENT=1
   export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
+  # [optional] if you want to run on a single GPU, the command below limits execution to that GPU, which may improve performance
+  export ONEAPI_DEVICE_SELECTOR=level_zero:0
 
   ./ollama serve
   ```
@@ -97,6 +99,9 @@ You may launch the Ollama service as below:
 > [!NOTE]
 > To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.
 
+> [!TIP]
+> If your machine has multiple GPUs and you want to run on only one of them, set `ONEAPI_DEVICE_SELECTOR=level_zero:[gpu_id]`, where `[gpu_id]` is the index of the GPU you want to use. For more details, refer to [this section](../Overview/KeyFeatures/multi_gpus_selection.md#2-oneapi-device-selector).
+
 The console will display messages similar to the following:
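To make the new tip concrete on Linux, here is a minimal sketch of picking a specific GPU before launching the Ollama service. `sycl-ls` ships with the oneAPI toolkit and lists the devices each backend exposes; the index `1` below is purely illustrative and should be replaced with whatever `sycl-ls` reports for the GPU you actually want.

```bash
# Source oneAPI first, then list the visible devices; Level Zero GPUs
# are numbered level_zero:0, level_zero:1, ... in the sycl-ls output.
source /opt/intel/oneapi/setvars.sh
sycl-ls

# Pin execution to one GPU (the index 1 here is an example, not a requirement).
export ONEAPI_DEVICE_SELECTOR=level_zero:1

# Launch the service as usual; it now sees only the selected GPU.
./ollama serve
```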