Run Ollama Portable Zip on Intel GPU with IPEX-LLM
< English | 中文 >
This guide demonstrates how to use the Ollama portable zip to run Ollama directly on Intel GPU with ipex-llm (without the need for manual installation).
Note
Currently, IPEX-LLM only provides Ollama portable zip on Windows.
Table of Contents
- Prerequisites
- Step 1: Download and Unzip
- Step 2: Start Ollama Serve
- Step 3: Run Ollama
- Tips & Troubleshooting
Prerequisites
Check your GPU driver version, and update it if needed:
- For Intel Core Ultra processors (Series 2) or Intel Arc B-Series GPUs, we recommend updating your GPU driver to the latest version.
- For other Intel iGPUs/dGPUs, we recommend using GPU driver version 32.0.101.6078.
Step 1: Download and Unzip
Download the IPEX-LLM Ollama portable zip from the link.
Then, extract the zip file to a folder.
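If you prefer the command line, the extraction can also be done with the built-in `tar` utility on Windows 10/11; the download path and destination folder below are placeholder examples, not names from this guide:

```cmd
:: Placeholder paths; adjust to where the zip was downloaded and where you want it extracted
mkdir C:\ollama-ipex-llm
tar -xf "%USERPROFILE%\Downloads\ollama-ipex-llm-portable.zip" -C C:\ollama-ipex-llm
```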
Step 2: Start Ollama Serve
Double-click start-ollama.bat in the extracted folder to start the Ollama service. A window will then pop up as shown below:
Step 3: Run Ollama
You could then use Ollama to run LLMs on Intel GPUs as follows:
- Open "Command Prompt" (cmd), and enter the extracted folder through
cd /d PATH\TO\EXTRACTED\FOLDER - Run
ollama run deepseek-r1:7bin the "Command Prompt" (you may use any other model)
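For example, a complete Step 3 session could look like the following, assuming the zip was extracted to `C:\ollama-ipex-llm` (a placeholder path) and the service from Step 2 is already running:

```cmd
:: Enter the extracted folder (example path) and run a model;
:: the model is downloaded automatically on first use
cd /d C:\ollama-ipex-llm
ollama run deepseek-r1:7b
```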
Tips & Troubleshooting
Speed up model download using alternative sources
Ollama by default downloads models from the Ollama library. By setting the environment variable IPEX_LLM_MODEL_SOURCE to modelscope or ollama before running Ollama, you can choose which source the model is downloaded from.
For example, if you would like to run deepseek-r1:7b but the download speed from the Ollama library is quite slow, you could use its model source from ModelScope instead, through the steps below (a combined example follows the list):
- Open "Command Prompt" (cmd), and navigate to the extracted folder by
cd /d PATH\TO\EXTRACTED\FOLDER - Run
set IPEX_LLM_MODEL_SOURCE=modelscopein "Command Prompt" - Run
ollama run deepseek-r1:7b
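Putting these steps together, a minimal session could look like this (`C:\ollama-ipex-llm` is a placeholder for your extracted folder):

```cmd
cd /d C:\ollama-ipex-llm
:: The variable only affects commands run in this Command Prompt session
set IPEX_LLM_MODEL_SOURCE=modelscope
ollama run deepseek-r1:7b
```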
Tip
Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show their actual model ID in `ollama list`, e.g.

```
NAME                                                             ID              SIZE      MODIFIED
modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M    f482d5af6aec    4.7 GB    About a minute ago
```

Except for `ollama run` and `ollama pull`, the model should be identified through its actual ID, e.g. `ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M`
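For example, to find and then remove a model that was pulled from ModelScope:

```cmd
:: List downloaded models to see their full IDs, then remove one by its full ID
ollama list
ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M
```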
Increase context length in Ollama
By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
To increase the context length, you could set the environment variable IPEX_LLM_NUM_CTX before starting Ollama serve, as shown below:
- Open "Command Prompt" (cmd), and navigate to the extracted folder through
cd /d PATH\TO\EXTRACTED\FOLDER - Set
IPEX_LLM_NUM_CTXto the desired length in the "Command Prompt, e.g.set IPEX_LLM_NUM_CTX=16384 - Start Ollama serve through
start-ollama.bat
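As a concrete example, the full sequence in a single "Command Prompt" could be (using the same placeholder folder as above):

```cmd
cd /d C:\ollama-ipex-llm
:: Raise the context window to 16384 tokens for models served by this instance
set IPEX_LLM_NUM_CTX=16384
start-ollama.bat
```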
Tip
`IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
Additional models supported after Ollama v0.5.4
The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models are also supported in the Ollama Portable Zip:
| Model | Download | Model Link |
|---|---|---|
| DeepSeek-R1 | `ollama run deepseek-r1` | deepseek-r1 |
| Openthinker | `ollama run openthinker` | openthinker |
| DeepScaleR | `ollama run deepscaler` | deepscaler |
| Phi-4 | `ollama run phi4` | phi4 |
| Dolphin 3.0 | `ollama run dolphin3` | dolphin3 |
| Smallthinker | `ollama run smallthinker` | smallthinker |
| Granite3.1-Dense | `ollama run granite3-dense` | granite3.1-dense |
| Granite3.1-Moe-3B | `ollama run granite3-moe` | granite3.1-moe |