Add Portable zip Linux QuickStart (#12849)

* linux doc

* update

* Update ollama_portablze_zip_quickstart.md

* Update ollama_portablze_zip_quickstart.md

* Update ollama_portablze_zip_quickstart.zh-CN.md

* Update ollama_portablze_zip_quickstart.md

* meet code review

* update

* Add tips & troubleshooting sections for both Linux & Windows

* Rebase

* Fix based on comments

* Small fix

* Fix img

* Update table for linux

* Small fix

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
Xin Qiu 2025-02-19 19:13:55 +08:00 committed by GitHub
parent b26409d53f
commit c81b7fc003
2 changed files with 194 additions and 54 deletions

ollama_portablze_zip_quickstart.md

@@ -5,17 +5,22 @@
This guide demonstrates how to use [Ollama portable zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel GPU with `ipex-llm` (without the need of manual installations).
> [!NOTE]
> Currently, IPEX-LLM only provides Ollama portable zip on Windows.
## Table of Contents
- [Prerequisites](#prerequisites)
- [Step 1: Download and Unzip](#step-1-download-and-unzip)
- [Step 2: Start Ollama Serve](#step-2-start-ollama-serve)
- [Step 3: Run Ollama](#step-3-run-ollama)
- [Tips & Troubleshooting](#tips--troubleshootings)
- [Windows Quickstart](#windows-quickstart)
- [Prerequisites](#prerequisites)
- [Step 1: Download and Unzip](#step-1-download-and-unzip)
- [Step 2: Start Ollama Serve](#step-2-start-ollama-serve)
- [Step 3: Run Ollama](#step-3-run-ollama)
- [Linux Quickstart](#linux-quickstart)
- [Prerequisites](#prerequisites-1)
- [Step 1: Download and Extract](#step-1-download-and-extract)
- [Step 2: Start Ollama Serve](#step-2-start-ollama-serve-1)
- [Step 3: Run Ollama](#step-3-run-ollama-1)
- [Tips & Troubleshooting](#tips--troubleshooting)
## Prerequisites
## Windows Quickstart
### Prerequisites
Check your GPU driver version, and update it if needed:
@@ -23,13 +28,13 @@ Check your GPU driver version, and update it if needed:
- For other Intel iGPU/dGPU, we recommend using GPU driver version [32.0.101.6078](https://www.intel.com/content/www/us/en/download/785597/834050/intel-arc-iris-xe-graphics-windows.html)
## Step 1: Download and Unzip
### Step 1: Download and Unzip
Download IPEX-LLM Ollama portable zip from the [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly).
Download IPEX-LLM Ollama portable zip for Windows users from the [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly).
Then, extract the zip file to a folder.
## Step 2: Start Ollama Serve
### Step 2: Start Ollama Serve
Double-click `start-ollama.bat` in the extracted folder to start the Ollama service. A window will then pop up as shown below:
@@ -37,7 +42,7 @@ Double-click `start-ollama.bat` in the extracted folder to start the Ollama serv
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_start_ollama.png" width=80%/>
</div>
## Step 3: Run Ollama
### Step 3: Run Ollama
You could then use Ollama to run LLMs on Intel GPUs as follows:
@@ -48,17 +53,70 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_run_ollama.png" width=80%/>
</div>
## Linux Quickstart
### Prerequisites
Check your GPU driver version, and update it if needed:
- For client GPUs, such as A-series, B-series and integrated GPUs, we recommend following the [Intel client GPU driver installation guide](https://dgpu-docs.intel.com/driver/client/overview.html) to install your GPU driver.
### Step 1: Download and Extract
Download IPEX-LLM Ollama portable tgz for Ubuntu users from the [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly).
Then open a terminal and extract the tgz file to a folder:
```bash
cd PATH/TO/DOWNLOADED/TGZ
tar xvf [Downloaded tgz file]
```
### Step 2: Start Ollama Serve
Enter the extracted folder, and run `start-ollama.sh` to start the Ollama service:
```bash
cd PATH/TO/EXTRACTED/FOLDER
./start-ollama.sh
```
<div align="center">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_start_ollama_ubuntu.png" width=80%/>
</div>
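To quickly confirm the service is up, you could query Ollama's default local endpoint from another terminal. This is a minimal check, assuming the default `127.0.0.1:11434` address; adjust it if you have changed `OLLAMA_HOST`:

```bash
# Check that the Ollama service is reachable (assumes the default address/port)
curl http://127.0.0.1:11434
# A running server replies with: Ollama is running
```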
### Step 3: Run Ollama
You could then use Ollama to run LLMs on Intel GPUs as follows:
- Open another terminal, and enter the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
- Run `./ollama run deepseek-r1:7b` (you may use any other model; a combined example follows the screenshot below)
<div align="center">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_run_ollama_ubuntu.png" width=80%/>
</div>
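Put together, the Linux steps above could look like the following sketch (the folder path is a placeholder, and `deepseek-r1:7b` is only an example model):

```bash
# In a new terminal, while start-ollama.sh keeps running in the other one
cd PATH/TO/EXTRACTED/FOLDER   # replace with your extracted folder
./ollama run deepseek-r1:7b   # replace with any other model you prefer
```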
## Tips & Troubleshooting
### Speed up model download using alternative sources
Ollama by default downloads models from the [Ollama library](https://ollama.com/library). By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope`/`ollama` before [run Ollama](#step-3-run-ollama), you could switch the source from which the model is downloaded first.
Ollama by default downloads models from the [Ollama library](https://ollama.com/library). By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope`/`ollama` **before running Ollama**, you could switch the source from which the model is downloaded first.
For example, if you would like to run `deepseek-r1:7b` but the download speed from Ollama library is quite slow, you could use [its model source](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF) from [ModelScope](https://www.modelscope.cn/models) instead, through:
- Open "Command Prompt" (cmd), and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
- Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
- Run `ollama run deepseek-r1:7b`
- For **Windows** users:
- Open "Command Prompt", and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
- Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
- Run `ollama run deepseek-r1:7b`
- For **Linux** users:
- In a terminal other than the one for Ollama serve, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
- Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
- Run `./ollama run deepseek-r1:7b` (see the combined sketch below)
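For instance, the Linux steps above could be combined into the following sketch (the folder path is a placeholder, and `deepseek-r1:7b` is only an example model):

```bash
cd PATH/TO/EXTRACTED/FOLDER               # extracted folder of the portable tgz
export IPEX_LLM_MODEL_SOURCE=modelscope   # prefer ModelScope as the download source
./ollama run deepseek-r1:7b               # the model is pulled from ModelScope first
```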
> [!TIP]
> Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show the actual model id in `ollama list`, e.g.
@@ -72,26 +130,58 @@ For example, if you would like to run `deepseek-r1:7b` but the download speed fr
By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` before [starting Ollama serve](#step-2-start-ollama-serve), as shown below:
To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` **before starting Ollama serve**, as shown below (if Ollama serve is already running, please make sure to stop it first):
- Open "Command Prompt" (cmd), and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
- Set `IPEX_LLM_NUM_CTX` to the desired length in the "Command Prompt", e.g. `set IPEX_LLM_NUM_CTX=16384`
- Start Ollama serve through `start-ollama.bat`
- For **Windows** users:
- Open "Command Prompt", and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
- Set `IPEX_LLM_NUM_CTX` to the desired length in the "Command Prompt", e.g. `set IPEX_LLM_NUM_CTX=16384`
- Start Ollama serve through `start-ollama.bat`
- For **Linux** users:
- In a terminal, navigate to the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
- Set `IPEX_LLM_NUM_CTX` to the desired length in the terminal, e.g. `export IPEX_LLM_NUM_CTX=16384`
- Start Ollama serve through `./start-ollama.sh` (see the combined sketch below)
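On Linux, for example, the steps above could look like the following sketch (`16384` is only an illustrative value):

```bash
cd PATH/TO/EXTRACTED/FOLDER
export IPEX_LLM_NUM_CTX=16384   # desired context length in tokens
./start-ollama.sh               # start (or restart) Ollama serve with the new setting
```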
> [!TIP]
> `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
### Select specific GPU to run Ollama when multiple ones are available
If your machine has multiple Intel GPUs, Ollama will by default run on all of them.
To specify which Intel GPU you would like Ollama to use, you could set the environment variable `ONEAPI_DEVICE_SELECTOR` **before starting Ollama serve**, as follows (if Ollama serve is already running, please make sure to stop it first):
- Identify the id (e.g. 0, 1, etc.) of each of your GPUs. You could find the ids in the logs of Ollama serve when any model is loaded, e.g.:
<div align="center">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_multi_gpus.png" width=80%/>
</div>
- For **Windows** users:
- Open "Command Prompt", and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
- In the "Command Prompt", set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU you want to use, e.g. `set ONEAPI_DEVICE_SELECTOR=level_zero:0`, in which `0` should be changed to your desired GPU id
- Start Ollama serve through `start-ollama.bat`
- For **Linux** users:
- In a terminal, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
- Set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU you want to use, e.g. `export ONEAPI_DEVICE_SELECTOR=level_zero:0`, in which `0` should be changed to your desired GPU id
- Start Ollama serve through `./start-ollama.sh` (see the sketch below)
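As a minimal Linux sketch of the steps above (assuming the GPU id found in the Ollama serve logs is `0`; replace it with your desired id):

```bash
cd PATH/TO/EXTRACTED/FOLDER
export ONEAPI_DEVICE_SELECTOR=level_zero:0   # restrict Ollama to the GPU with id 0
./start-ollama.sh
```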
### Additional models supported after Ollama v0.5.4
The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models are also supported in the Ollama Portable Zip:
| Model | Download | Model Link |
| - | - | - |
| DeepSeek-R1 | `ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
| Openthinker | `ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
| DeepScaleR | `ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
| Phi-4 | `ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
| Dolphin 3.0 | `ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
| Smallthinker | `ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
| Granite3.1-Dense | `ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
| Granite3.1-Moe-3B | `ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |
| Model | Download (Windows) | Download (Linux) | Model Link |
| - | - | - | - |
| DeepSeek-R1 | `ollama run deepseek-r1` | `./ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
| Openthinker | `ollama run openthinker` | `./ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
| DeepScaleR | `ollama run deepscaler` | `./ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
| Phi-4 | `ollama run phi4` | `./ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
| Dolphin 3.0 | `ollama run dolphin3` | `./ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
| Smallthinker | `ollama run smallthinker` | `./ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
| Granite3.1-Dense | `ollama run granite3-dense` | `./ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
| Granite3.1-Moe-3B | `ollama run granite3-moe` | `./ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |

ollama_portablze_zip_quickstart.zh-CN.md

@@ -5,17 +5,22 @@
This guide demonstrates how to use [Ollama portable zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to run Ollama directly on Intel GPU with `ipex-llm` (without the need for manual installation).
> [!NOTE]
> Currently, IPEX-LLM only provides Ollama portable zip on Windows.
## Table of Contents
- [Prerequisites](#系统环境准备)
- [Step 1: Download and Unzip](#步骤-1下载和解压)
- [Step 2: Start Ollama Serve](#步骤-2启动-ollama-serve)
- [Step 3: Run Ollama](#步骤-3运行-ollama)
- [Tips & Troubleshooting](#提示和故障排除)
- [Windows Quickstart](#windows用户指南)
- [Prerequisites](#系统环境准备)
- [Step 1: Download and Unzip](#步骤-1下载和解压)
- [Step 2: Start Ollama Serve](#步骤-2启动-ollama-serve)
- [Step 3: Run Ollama](#步骤-3运行-ollama)
- [Tips & Troubleshooting](#提示和故障排除)
- [Linux Quickstart](#linux用户指南)
- [Prerequisites](#系统环境准备-1)
- [Step 1: Download and Extract](#步骤-1下载和解压-1)
- [Step 2: Start Ollama Serve](#步骤-2启动-ollama-serve-1)
- [Step 3: Run Ollama](#步骤-3运行-ollama-1)
## Prerequisites
## Windows Quickstart
### Prerequisites
Check your GPU driver version, and update it if needed:
@@ -23,13 +28,13 @@
- For other Intel iGPU/dGPU, we recommend using GPU driver version [32.0.101.6078](https://www.intel.com/content/www/us/en/download/785597/834050/intel-arc-iris-xe-graphics-windows.html)
## Step 1: Download and Unzip
### Step 1: Download and Unzip
Download IPEX-LLM Ollama portable zip from this [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly).
Then, extract the zip file to a folder.
## Step 2: Start Ollama Serve
### Step 2: Start Ollama Serve
Double-click `start-ollama.bat` in the extracted folder to start Ollama Serve. A window will then pop up as shown below:
@@ -37,27 +42,27 @@
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_start_ollama.png" width=80%/>
</div>
## Step 3: Run Ollama
### Step 3: Run Ollama
You could then use Ollama to run LLMs on Intel GPUs as follows:
- Open "Command Prompt" (cmd), and navigate to the extracted folder by entering `cd /d PATH\TO\EXTRACTED\FOLDER`
- Run `ollama run deepseek-r1:7b` in the Command Prompt (you may replace the current model with any model you need)
- Run `ollama run deepseek-r1:7b` in the Command Prompt (you may replace the current model with any model you need)
<div align="center">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_run_ollama.png" width=80%/>
</div>
## Tips & Troubleshooting
### Tips & Troubleshooting
### Speed up model download using alternative sources
#### Speed up model download using alternative sources
Ollama downloads models from the [Ollama library](https://ollama.com/library) by default. Setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope`/`ollama` before [Run Ollama](#步骤-3运行-ollama), you could switch the preferred source from which the model is downloaded.
Ollama downloads models from the [Ollama library](https://ollama.com/library) by default. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope`/`ollama` before [Run Ollama](#步骤-3运行-ollama), you could switch the preferred source from which the model is downloaded.
For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you could use the [model source](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF) from [ModelScope](https://www.modelscope.cn/models) instead, as follows
For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you could use the [model source](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF) from [ModelScope](https://www.modelscope.cn/models) instead, as follows:
- Open "Command Prompt" (cmd), and navigate to the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
- Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in the Command Prompt
- Open "Command Prompt" (cmd), and navigate to the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
- Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in the Command Prompt
- Run `ollama run deepseek-r1:7b`
> [!Tip]
@@ -68,20 +73,20 @@ Ollama downloads models from the [Ollama library](https://ollama.com/library) by default. Before [
> ```
> Except for `ollama run` and `ollama pull`, the model should be identified by its actual ID in other operations, e.g. `ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M`
### Increase context length in Ollama
#### Increase context length in Ollama
By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` before [starting Ollama serve](#步骤-2启动-ollama-serve), as follows:
- Open "Command Prompt" (cmd), and navigate to the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
- Set `IPEX_LLM_NUM_CTX` to the desired length in the Command Prompt, e.g. `set IPEX_LLM_NUM_CTX=16384`
- Open "Command Prompt" (cmd), and navigate to the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
- Set `IPEX_LLM_NUM_CTX` to the desired length in the Command Prompt, e.g. `set IPEX_LLM_NUM_CTX=16384`
- Start Ollama serve by running `start-ollama.bat`
> [!Tip]
> `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` set in a model's `Modelfile`
### Additional models supported after Ollama v0.5.4
#### Additional models supported after Ollama v0.5.4
The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models are also supported in the Ollama Portable Zip:
@@ -95,3 +100,48 @@ Ollama downloads models from the [Ollama library](https://ollama.com/library) by default. Before [
| Smallthinker | `ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
| Granite3.1-Dense | `ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
| Granite3.1-Moe-3B | `ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |
## Linux Quickstart
### Prerequisites
Check your GPU driver version, and update it if needed:
- For client GPUs, such as A-series, B-series and integrated GPUs, we recommend following the [Intel client GPU driver installation guide](https://dgpu-docs.intel.com/driver/client/overview.html) to install your GPU driver.
### Step 1: Download and Extract
Download IPEX-LLM Ollama portable tgz from this [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly).
Then open a terminal and run the following commands to extract the tgz file to a folder.
```bash
cd PATH/TO/DOWNLOADED/TGZ
tar xvf [Downloaded tgz file]
```
### Step 2: Start Ollama Serve
Enter the extracted folder and run `./start-ollama.sh` to start Ollama Serve
[Optional] For users with multiple GPUs, please edit start-ollama.sh in the extracted folder and modify ONEAPI_DEVICE_SELECTOR according to your machine configuration (an illustrative snippet follows the commands below). By default, Ollama uses all GPUs.
```bash
cd PATH/TO/EXTRACTED/FOLDER
./start-ollama.sh
```
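As an illustration of the optional step above (this assumes `start-ollama.sh` exports `ONEAPI_DEVICE_SELECTOR` before launching Ollama; the exact contents of the script may differ), the edited line could look like:

```bash
# Inside start-ollama.sh (assumed location of the setting):
# keep only the GPU with id 0; replace 0 with your desired GPU id
export ONEAPI_DEVICE_SELECTOR=level_zero:0
```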
<div align="center">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_start_ollama_ubuntu.png" width=80%/>
</div>
### Step 3: Run Ollama
You could then use Ollama to run LLMs on Intel GPUs as follows:
- Open another terminal, and navigate to the extracted folder by entering `cd PATH/TO/EXTRACTED/FOLDER`
- Run `./ollama run deepseek-r1:7b` in the terminal (you may replace the current model with any model you need)
<div align="center">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_run_ollama_ubuntu.png" width=80%/>
</div>