Run Ollama Portable Zip on Intel GPU with IPEX-LLM
< English | 中文 >
This guide demonstrates how to use Ollama portable zip to directly run Ollama on Intel GPU with ipex-llm (without the need for manual installation).
Note
Currently, IPEX-LLM only provides Ollama portable zip on Windows.
Table of Contents
- Prerequisites
- Step 1: Download and Unzip
- Step 2: Start Ollama Serve
- Step 3: Run Ollama
- Tips & Troubleshooting
Prerequisites
Check your GPU driver version, and update it if needed:
- For Intel Core Ultra processors (Series 2) or Intel Arc B-Series GPU, we recommend updating your GPU driver to the latest
- For other Intel iGPU/dGPU, we recommend using GPU driver version 32.0.101.6078
Step 1: Download and Unzip
Download IPEX-LLM Ollama portable zip from the link.
Then, extract the zip file to a folder.
Step 2: Start Ollama Serve
Double-click `start-ollama.bat` in the extracted folder to start the Ollama service. A console window will then pop up, indicating that the service is running.
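If you prefer a terminal, you could equally start the service from "Command Prompt"; a minimal sketch, where the placeholder path stands in for your actual extracted folder:

```cmd
rem enter the extracted portable zip folder (placeholder path)
cd /d PATH\TO\EXTRACTED\FOLDER

rem launch the Ollama service
start-ollama.bat
```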
Step 3: Run Ollama
You could then use Ollama to run LLMs on Intel GPUs as follows (a combined session sketch appears after the steps):
- Open "Command Prompt" (cmd), and enter the extracted folder through
cd /d PATH\TO\EXTRACTED\FOLDER - Run
ollama run deepseek-r1:7bin the "Command Prompt" (you may use any other model)
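Put together, a complete "Command Prompt" session might look like the sketch below (the path is a placeholder for your actual extracted folder):

```cmd
rem enter the extracted portable zip folder (placeholder path)
cd /d PATH\TO\EXTRACTED\FOLDER

rem download (on first use) and chat with the model on the Intel GPU
ollama run deepseek-r1:7b
```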
Tips & Troubleshooting
Speed up model download using alternative sources
By default, Ollama downloads models from the Ollama library. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` before running Ollama, you could switch the source from which models are downloaded first.
For example, if you would like to run deepseek-r1:7b but the download speed from the Ollama library is quite slow, you could pull the model from ModelScope instead, through the steps below (combined into one sketch afterwards):
- Open "Command Prompt" (cmd), and navigate to the extracted folder by
cd /d PATH\TO\EXTRACTED\FOLDER - Run
set IPEX_LLM_MODEL_SOURCE=modelscopein "Command Prompt" - Run
ollama run deepseek-r1:7b
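Combined into a single session, the steps above might look like this sketch (placeholder path as before):

```cmd
rem enter the extracted portable zip folder (placeholder path)
cd /d PATH\TO\EXTRACTED\FOLDER

rem prefer ModelScope as the model download source for this session
set IPEX_LLM_MODEL_SOURCE=modelscope

rem the model is now pulled from ModelScope instead of the Ollama library
ollama run deepseek-r1:7b
```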
Tip
Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show their actual model id in `ollama list`, e.g.

```
NAME                                                            ID              SIZE      MODIFIED
modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M   f482d5af6aec    4.7 GB    About a minute ago
```

Except for `ollama run` and `ollama pull`, the model should be identified through its actual id, e.g. `ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M`
Increase context length in Ollama
By default, Ollama runs models with a context window of 2048 tokens. That is, a model can "remember" at most 2048 tokens of context.
To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` before starting Ollama serve, as shown below (a combined sketch follows the steps):
- Open "Command Prompt" (cmd), and navigate to the extracted folder through
cd /d PATH\TO\EXTRACTED\FOLDER - Set
IPEX_LLM_NUM_CTXto the desired length in the "Command Prompt, e.g.set IPEX_LLM_NUM_CTX=16384 - Start Ollama serve through
start-ollama.bat
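As one sketch, assuming the same placeholder path:

```cmd
rem enter the extracted portable zip folder (placeholder path)
cd /d PATH\TO\EXTRACTED\FOLDER

rem request a 16384-token context window for models served in this session
set IPEX_LLM_NUM_CTX=16384

rem start the Ollama service; it inherits the variable set above
start-ollama.bat
```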
Tip
`IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.