Run Ollama Portable Zip on Intel GPU with IPEX-LLM

< English | 中文 >

This guide demonstrates how to use the Ollama portable zip to run Ollama directly on Intel GPU with ipex-llm (without the need for manual installation).

Note

Currently, IPEX-LLM only provides Ollama portable zip on Windows.

Table of Contents

  • Prerequisites
  • Step 1: Download and Unzip
  • Step 2: Start Ollama Serve
  • Step 3: Run Ollama
  • Tips & Troubleshooting
  • Additional models supported after Ollama v0.5.4

Prerequisites

Check your GPU driver version, and update it if needed (a quick way to check the installed version is sketched after this list):

  • For Intel Core Ultra processors (Series 2) or Intel Arc B-Series GPU, we recommend updating your GPU driver to the latest

  • For other Intel iGPU/dGPU, we recommend using GPU driver version 32.0.101.6078
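
If you are unsure which driver version is currently installed, the following is a minimal sketch for checking it from "Command Prompt" (assuming PowerShell is available, as it is on standard Windows 10/11 installations):

```cmd
:: List display adapters and their installed driver versions via WMI
powershell -Command "Get-CimInstance Win32_VideoController | Select-Object Name, DriverVersion"
```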

Step 1: Download and Unzip

Download the IPEX-LLM Ollama portable zip from the link.

Then, extract the zip file to a folder.
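
If you prefer the command line, here is a minimal sketch using the tar utility bundled with recent Windows 10/11, which can also read zip archives (the actual zip file name depends on the release you downloaded):

```cmd
:: Extract the downloaded zip into a target folder (the target folder must already exist)
tar -xf PATH\TO\DOWNLOADED\ZIP -C PATH\TO\EXTRACTED\FOLDER
```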

Step 2: Start Ollama Serve

Double-click start-ollama.bat in the extracted folder to start the Ollama service. A console window will then pop up and remain open while the service is running.
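
Alternatively, you can start the service from "Command Prompt" instead of double-clicking:

```cmd
:: Navigate to the extracted folder and launch the Ollama service
cd /d PATH\TO\EXTRACTED\FOLDER
start-ollama.bat
```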

Step 3: Run Ollama

You can then use Ollama to run LLMs on Intel GPUs as follows:

  • Open "Command Prompt" (cmd), and enter the extracted folder through cd /d PATH\TO\EXTRACTED\FOLDER
  • Run ollama run deepseek-r1:7b in the "Command Prompt" (you may use any other model), as shown below
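
The complete sequence in "Command Prompt" looks like:

```cmd
:: Navigate to the extracted folder, then run a model on the Intel GPU
cd /d PATH\TO\EXTRACTED\FOLDER
ollama run deepseek-r1:7b
```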

Tips & Troubleshooting

Speed up model download using alternative sources

By default, Ollama downloads models from the Ollama library. By setting the environment variable IPEX_LLM_MODEL_SOURCE to modelscope or ollama before running Ollama, you can switch the source from which models are downloaded first.

For example, if you would like to run deepseek-r1:7b but the download speed from the Ollama library is quite slow, you can download it from ModelScope instead, through the following steps:

  • Open "Command Prompt" (cmd), and navigate to the extracted folder by cd /d PATH\TO\EXTRACTED\FOLDER
  • Run set IPEX_LLM_MODEL_SOURCE=modelscope in "Command Prompt"
  • Run ollama run deepseek-r1:7b, as shown in the combined example below
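
Putting these steps together in "Command Prompt":

```cmd
:: Switch the model source to ModelScope for this session, then pull and run the model
cd /d PATH\TO\EXTRACTED\FOLDER
set IPEX_LLM_MODEL_SOURCE=modelscope
ollama run deepseek-r1:7b
```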

Tip

Models downloaded with set IPEX_LLM_MODEL_SOURCE=modelscope will still show their actual model IDs in ollama list, e.g.

NAME                                                             ID              SIZE      MODIFIED
modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M    f482d5af6aec    4.7 GB    About a minute ago

In commands other than ollama run and ollama pull, the model should be identified by its actual ID, e.g. ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M

Increase context length in Ollama

By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.

To increase the context length, you can set the environment variable IPEX_LLM_NUM_CTX before starting Ollama serve, as shown below:

  • Open "Command Prompt" (cmd), and navigate to the extracted folder through cd /d PATH\TO\EXTRACTED\FOLDER
  • Set IPEX_LLM_NUM_CTX to the desired length in the "Command Prompt", e.g. set IPEX_LLM_NUM_CTX=16384
  • Start Ollama serve through start-ollama.bat, as in the example below
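
The complete sequence looks like:

```cmd
:: Set a 16384-token context window for this session, then start the Ollama service
cd /d PATH\TO\EXTRACTED\FOLDER
set IPEX_LLM_NUM_CTX=16384
start-ollama.bat
```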

Tip

IPEX_LLM_NUM_CTX takes precedence over the num_ctx setting in a model's Modelfile.

Additional models supported after Ollama v0.5.4

The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models are also supported in the Ollama Portable Zip:

| Model | Download | Model Link |
|---|---|---|
| DeepSeek-R1 | ollama run deepseek-r1 | deepseek-r1 |
| Openthinker | ollama run openthinker | openthinker |
| DeepScaleR | ollama run deepscaler | deepscaler |
| Phi-4 | ollama run phi4 | phi4 |
| Dolphin 3.0 | ollama run dolphin3 | dolphin3 |
| Smallthinker | ollama run smallthinker | smallthinker |
| Granite3.1-Dense | ollama run granite3-dense | granite3.1-dense |
| Granite3.1-Moe-3B | ollama run granite3-moe | granite3.1-moe |