Run Ollama Portable Zip on Intel GPU with IPEX-LLM
< English | 中文 >
This guide demonstrates how to use Ollama portable zip to directly run Ollama on Intel GPU with ipex-llm (without the need of manual installations).
Note
Ollama portable zip has been verified on:
- Intel Core Ultra processors
- Intel Core 11th - 14th gen processors
- Intel Arc A-Series GPU
- Intel Arc B-Series GPU
Windows Quickstart
Note
We recommend using Windows 11 for Windows users.
Prerequisites
We recommend updating your GPU driver to the latest.
Step 1: Download and Unzip
Download IPEX-LLM Ollama portable zip for Windows users from the link.
Then, extract the zip file to a folder.
Step 2: Start Ollama Serve
Start Ollama serve as follows:
- Open "Command Prompt" (cmd), and enter the extracted folder through
cd /d PATH\TO\EXTRACTED\FOLDER - Run
start-ollama.batin the "Command Prompt. A window will then pop up as shown below:
Step 3: Run Ollama
You could then use Ollama to run LLMs on Intel GPUs by running `ollama run deepseek-r1:7b` in the same "Command Prompt" (not the pop-up window). You may use any other model.
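Putting the two steps together, a typical "Command Prompt" session looks like the following sketch (the folder path is a placeholder for your actual extracted location):

```cmd
REM enter the extracted portable zip folder
cd /d PATH\TO\EXTRACTED\FOLDER

REM start the Ollama service (a separate service window will pop up)
start-ollama.bat

REM run a model in this same "Command Prompt", not in the pop-up window
ollama run deepseek-r1:7b
```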
Linux Quickstart
Prerequisites
Check your GPU driver version, and update it if needed; we recommend following the Intel client GPU driver installation guide to install your GPU driver.
Step 1: Download and Extract
Download IPEX-LLM Ollama portable tgz for Ubuntu users from the link.
Then open a terminal and extract the tgz file to a folder:
tar -xvf [Downloaded tgz file path]
Step 2: Start Ollama Serve
Enter the extracted folder, and run `start-ollama.sh` to start the Ollama service.
cd PATH/TO/EXTRACTED/FOLDER
./start-ollama.sh
Step 3: Run Ollama
You could then use Ollama to run LLMs on Intel GPUs as follows:
- Open another terminal, and enter the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
- Run `./ollama run deepseek-r1:7b` (you may use any other model)
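Once the model is loaded, you can also talk to the Ollama service over its HTTP API instead of the interactive CLI. The sketch below assumes the service is listening on Ollama's default port 11434 on the same machine; adjust the address if your setup differs:

```bash
# send a single non-streaming generation request to the local Ollama service
curl http://localhost:11434/api/generate -d '{
  "model": "deepseek-r1:7b",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```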
Tips & Troubleshooting
Speed up model download using alternative sources
Ollama by default downloads models from the Ollama library. By setting the environment variable `OLLAMA_MODEL_SOURCE` to `modelscope` or `ollama` before running Ollama, you could switch the source from which the model is downloaded.
For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you could download the model from ModelScope as follows:
- For Windows users:
  - In the "Command Prompt", navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
  - Run `set OLLAMA_MODEL_SOURCE=modelscope` in the "Command Prompt"
  - Run `ollama run deepseek-r1:7b`
- For Linux users:
  - In a terminal other than the one for Ollama serve, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
  - Run `export OLLAMA_MODEL_SOURCE=modelscope` in the terminal
  - Run `./ollama run deepseek-r1:7b`
Tip
A model downloaded with `set OLLAMA_MODEL_SOURCE=modelscope` will still show its actual model id in `ollama list`, e.g.

NAME                                                            ID              SIZE      MODIFIED
modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M   f482d5af6aec    4.7 GB    About a minute ago

Except for `ollama run` and `ollama pull`, the model should be identified through its actual id, e.g. `ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M`
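As a concrete Linux example that combines the setting above with this tip, the sketch below downloads the model from ModelScope and later removes it by its actual id (the id shown is taken from the example output above; yours may differ):

```bash
cd PATH/TO/EXTRACTED/FOLDER

# download and run the model from ModelScope instead of the Ollama library
export OLLAMA_MODEL_SOURCE=modelscope
./ollama run deepseek-r1:7b

# the model is listed and removed under its actual ModelScope id
./ollama list
./ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M
```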
Increase context length in Ollama
By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
To increase the context length, you could set the environment variable `OLLAMA_NUM_CTX` before starting Ollama serve, as shown below (if Ollama serve is already running, please make sure to stop it first):
- For Windows users:
  - Open "Command Prompt", and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
  - Set `OLLAMA_NUM_CTX` to the desired length in the "Command Prompt", e.g. `set OLLAMA_NUM_CTX=16384`
  - Start Ollama serve through `start-ollama.bat`
- For Linux users:
  - In a terminal, navigate to the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
  - Set `OLLAMA_NUM_CTX` to the desired length in the terminal, e.g. `export OLLAMA_NUM_CTX=16384`
  - Start Ollama serve through `./start-ollama.sh`
Tip
`OLLAMA_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
Note
For versions earlier than 2.7.0b20250429, please use `IPEX_LLM_NUM_CTX` instead.
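For example, a minimal Linux sketch that restarts Ollama serve with a 16384-token context window (stop any running Ollama serve first):

```bash
cd PATH/TO/EXTRACTED/FOLDER

# set the desired context length before starting the service
export OLLAMA_NUM_CTX=16384   # use IPEX_LLM_NUM_CTX instead for versions earlier than 2.7.0b20250429
./start-ollama.sh
```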
Select specific GPU(s) to run Ollama when multiple ones are available
If your machine has multiple Intel GPUs, Ollama will by default run on all of them.
To specify which Intel GPU(s) you would like Ollama to use, you could set the environment variable `ONEAPI_DEVICE_SELECTOR` before starting Ollama serve, as follows (if Ollama serve is already running, please make sure to stop it first):
- Identify the id (e.g. 0, 1, etc.) for each of your GPUs. You could find them in the logs of Ollama serve when loading any model.
- For Windows users:
  - Open "Command Prompt", and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
  - In the "Command Prompt", set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU(s) you want to use, e.g. `set ONEAPI_DEVICE_SELECTOR=level_zero:0` (on single Intel GPU), or `set ONEAPI_DEVICE_SELECTOR=level_zero:0;level_zero:1` (on multiple Intel GPUs), in which `0`, `1` should be changed to your desired GPU id
  - Start Ollama serve through `start-ollama.bat`
- For Linux users:
  - In a terminal, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
  - Set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU(s) you want to use, e.g. `export ONEAPI_DEVICE_SELECTOR=level_zero:0` (on single Intel GPU), or `export ONEAPI_DEVICE_SELECTOR="level_zero:0;level_zero:1"` (on multiple Intel GPUs), in which `0`, `1` should be changed to your desired GPU id
  - Start Ollama serve through `./start-ollama.sh`
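For example, a minimal Linux sketch that restricts Ollama to the GPU with id 0 (replace `0` with your desired GPU id):

```bash
cd PATH/TO/EXTRACTED/FOLDER

# expose only the selected Level Zero device(s) to Ollama
export ONEAPI_DEVICE_SELECTOR="level_zero:0"
./start-ollama.sh
```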
Tune performance
Here are some settings you could try to tune the performance:
Environment variable SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS
The environment variable `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` determines the usage of immediate command lists for task submission to the GPU. You could experiment with `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1` or `0` for best performance.
To enable `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS`, set it before starting Ollama serve, as shown below (if Ollama serve is already running, please make sure to stop it first):
- For Windows users:
  - Open "Command Prompt", and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
  - Run `set SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1` in "Command Prompt"
  - Start Ollama serve through `start-ollama.bat`
- For Linux users:
  - In a terminal, navigate to the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
  - Run `export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1` in the terminal
  - Start Ollama serve through `./start-ollama.sh`
Tip
You could refer to the Level Zero documentation for more information about Level Zero Immediate Command Lists.
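For example, a minimal Linux sketch that enables immediate command lists before starting the service (set the variable to `0` instead to compare performance on your GPU):

```bash
cd PATH/TO/EXTRACTED/FOLDER

# try both 1 and 0 to find the better-performing setting for your hardware
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
./start-ollama.sh
```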
Additional models supported after Ollama v0.6.2
The current Ollama Portable Zip is based on Ollama v0.6.2; in addition, the following new models have also been supported in the Ollama Portable Zip:
| Model | Download (Windows) | Download (Linux) | Model Link |
|---|---|---|---|
| DeepSeek-R1 | `ollama run deepseek-r1` | `./ollama run deepseek-r1` | deepseek-r1 |
| Openthinker | `ollama run openthinker` | `./ollama run openthinker` | openthinker |
| DeepScaleR | `ollama run deepscaler` | `./ollama run deepscaler` | deepscaler |
| Phi-4 | `ollama run phi4` | `./ollama run phi4` | phi4 |
| Dolphin 3.0 | `ollama run dolphin3` | `./ollama run dolphin3` | dolphin3 |
| Smallthinker | `ollama run smallthinker` | `./ollama run smallthinker` | smallthinker |
| Granite3.1-Dense | `ollama run granite3-dense` | `./ollama run granite3-dense` | granite3.1-dense |
| Granite3.1-Moe-3B | `ollama run granite3-moe` | `./ollama run granite3-moe` | granite3.1-moe |
| Gemma 3 1B | `set IPEX_LLM_MODEL_SOURCE=modelscope`<br>`ollama run gemma3:1b` | `export IPEX_LLM_MODEL_SOURCE=modelscope`<br>`./ollama run gemma3:1b` | gemma3:1b |
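For instance, the Gemma 3 1B row expands to the following commands on Linux (run in the extracted folder, in a terminal other than the one running Ollama serve):

```bash
# Gemma 3 1B is fetched from ModelScope, so the model source must be set first
export IPEX_LLM_MODEL_SOURCE=modelscope
./ollama run gemma3:1b
```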
Signature Verification
For portable zip/tgz version 2.2.0, you could verify its signature with the following command:
openssl cms -verify -in <portable-zip-or-tgz-file-name>.pkcs1.sig -inform DER -content <portable-zip-or-tgz-file-name> -out nul -noverify
Note
Please ensure that `openssl` is installed on your system before verifying the signature.
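For example, on Linux the command could look like the following (the archive name is a placeholder for the file you actually downloaded; `/dev/null` is used in place of `nul`):

```bash
# verify the detached signature file shipped alongside the portable archive
# (file names below are placeholders for illustration)
openssl cms -verify -in ollama-portable.tgz.pkcs1.sig -inform DER \
  -content ollama-portable.tgz -out /dev/null -noverify
```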