Add Portable zip Linux QuickStart (#12849)

* linux doc
* update
* Update ollama_portablze_zip_quickstart.md
* Update ollama_portablze_zip_quickstart.md
* Update ollama_portablze_zip_quickstart.zh-CN.md
* Update ollama_portablze_zip_quickstart.md
* meet code review
* update
* Add tips & troubleshooting sections for both Linux & Windows
* Rebase
* Fix based on comments
* Small fix
* Fix img
* Update table for linux
* Small fix

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>

This commit is contained in:

parent b26409d53f
commit c81b7fc003

2 changed files with 194 additions and 54 deletions

@@ -5,17 +5,22 @@

This guide demonstrates how to use [Ollama portable zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to directly run Ollama on Intel GPU with `ipex-llm` (without the need for manual installation).

## Table of Contents
- [Windows Quickstart](#windows-quickstart)
  - [Prerequisites](#prerequisites)
  - [Step 1: Download and Unzip](#step-1-download-and-unzip)
  - [Step 2: Start Ollama Serve](#step-2-start-ollama-serve)
  - [Step 3: Run Ollama](#step-3-run-ollama)
- [Linux Quickstart](#linux-quickstart)
  - [Prerequisites](#prerequisites-1)
  - [Step 1: Download and Extract](#step-1-download-and-extract)
  - [Step 2: Start Ollama Serve](#step-2-start-ollama-serve-1)
  - [Step 3: Run Ollama](#step-3-run-ollama-1)
- [Tips & Troubleshooting](#tips--troubleshooting)

## Windows Quickstart

### Prerequisites

Check your GPU driver version, and update it if needed:

@@ -23,13 +28,13 @@ Check your GPU driver version, and update it if needed:

- For other Intel iGPU/dGPU, we recommend using GPU driver version [32.0.101.6078](https://www.intel.com/content/www/us/en/download/785597/834050/intel-arc-iris-xe-graphics-windows.html)

### Step 1: Download and Unzip

Download the IPEX-LLM Ollama portable zip for Windows from the [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly).

Then, extract the zip file to a folder.

### Step 2: Start Ollama Serve

Double-click `start-ollama.bat` in the extracted folder to start the Ollama service. A window will then pop up as shown below:

@@ -37,7 +42,7 @@ Double-click `start-ollama.bat` in the extracted folder to start the Ollama serv

  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_start_ollama.png"  width=80%/>
</div>

### Step 3: Run Ollama

You could then use Ollama to run LLMs on Intel GPUs as follows:

@@ -48,18 +53,71 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:

  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_run_ollama.png"  width=80%/>
</div>

## Linux Quickstart

### Prerequisites

Check your GPU driver version, and update it if needed:

- For client GPUs, such as the A-series, B-series and integrated GPUs, we recommend following the [Intel client GPU driver installation guide](https://dgpu-docs.intel.com/driver/client/overview.html) to install your GPU driver.

### Step 1: Download and Extract

Download the IPEX-LLM Ollama portable tgz for Ubuntu from the [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly).

Then open a terminal and extract the tgz file to a folder:

```bash
cd PATH/TO/DOWNLOADED/TGZ
tar xvf [Downloaded tgz file]
```

### Step 2: Start Ollama Serve

Enter the extracted folder, and run `start-ollama.sh` to start the Ollama service:

```bash
cd PATH/TO/EXTRACTED/FOLDER
./start-ollama.sh
```

<div align="center">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_start_ollama_ubuntu.png"  width=80%/>
</div>

### Step 3: Run Ollama

You could then use Ollama to run LLMs on Intel GPUs as follows:

- Open another terminal, and enter the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
- Run `./ollama run deepseek-r1:7b` (you may use any other model), as in the short sketch below
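
Putting the two steps together, a typical session in the second terminal might look like the following sketch (the folder path is a placeholder for wherever you extracted the tgz):

```bash
# Terminal 2: the Ollama service started by start-ollama.sh keeps running in terminal 1
cd PATH/TO/EXTRACTED/FOLDER
./ollama run deepseek-r1:7b   # replace deepseek-r1:7b with any other supported model
```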

<div align="center">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_run_ollama_ubuntu.png"  width=80%/>
</div>

## Tips & Troubleshooting

### Speed up model download using alternative sources

By default, Ollama downloads models from the [Ollama library](https://ollama.com/library). By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before the Run Ollama step**, you could switch the source from which the model is downloaded first.

For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is quite slow, you could use [its model source](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF) on [ModelScope](https://www.modelscope.cn/models) instead, as follows:

- For **Windows** users:

  - Open "Command Prompt", and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
  - Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
  - Run `ollama run deepseek-r1:7b`

- For **Linux** users (see the sketch after this list):

  - In a terminal other than the one for Ollama serve, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
  - Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
  - Run `./ollama run deepseek-r1:7b`
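
On Linux, the full sequence might look like the following sketch (the folder path is a placeholder):

```bash
# In a terminal other than the one running start-ollama.sh
cd PATH/TO/EXTRACTED/FOLDER
export IPEX_LLM_MODEL_SOURCE=modelscope   # prefer ModelScope as the download source
./ollama run deepseek-r1:7b
```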

> [!TIP]
> Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show the actual model id in `ollama list`, e.g.
> ```

@@ -72,26 +130,58 @@ For example, if you would like to run `deepseek-r1:7b` but the download speed fr

By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.

To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` **before the Start Ollama Serve step**, as shown below (if Ollama serve is already running, please make sure to stop it first):

- For **Windows** users:

  - Open "Command Prompt", and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
  - Set `IPEX_LLM_NUM_CTX` to the desired length in the "Command Prompt", e.g. `set IPEX_LLM_NUM_CTX=16384`
  - Start Ollama serve through `start-ollama.bat`

- For **Linux** users (see the sketch after this list):

  - In a terminal, navigate to the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
  - Set `IPEX_LLM_NUM_CTX` to the desired length in the terminal, e.g. `export IPEX_LLM_NUM_CTX=16384`
  - Start Ollama serve through `./start-ollama.sh`
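
As a concrete Linux sketch (16384 is only an example value; pick the context length you need):

```bash
cd PATH/TO/EXTRACTED/FOLDER
export IPEX_LLM_NUM_CTX=16384   # context window of 16384 tokens
./start-ollama.sh
```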

> [!TIP]
> `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.

### Select specific GPU to run Ollama when multiple ones are available

If your machine has multiple Intel GPUs, Ollama will by default run on all of them.

To specify which Intel GPU you would like Ollama to use, you could set the environment variable `ONEAPI_DEVICE_SELECTOR` **before the Start Ollama Serve step**, as follows (if Ollama serve is already running, please make sure to stop it first):

- Identify the id (e.g. 0, 1, etc.) for your multiple GPUs. You could find them in the logs of Ollama serve when loading any models, e.g.:

  <div align="center">
    <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_multi_gpus.png"  width=80%/>
  </div>

- For **Windows** users:

  - Open "Command Prompt", and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
  - In the "Command Prompt", set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU you want to use, e.g. `set ONEAPI_DEVICE_SELECTOR=level_zero:0`, in which `0` should be changed to your desired GPU id
  - Start Ollama serve through `start-ollama.bat`

- For **Linux** users (see the sketch after this list):

  - In a terminal, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
  - Set `ONEAPI_DEVICE_SELECTOR` to define the Intel GPU you want to use, e.g. `export ONEAPI_DEVICE_SELECTOR=level_zero:0`, in which `0` should be changed to your desired GPU id
  - Start Ollama serve through `./start-ollama.sh`
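
For example, restricting Ollama to the GPU with id 0 on Linux might look like the following sketch:

```bash
cd PATH/TO/EXTRACTED/FOLDER
export ONEAPI_DEVICE_SELECTOR=level_zero:0   # replace 0 with the GPU id shown in the serve logs
./start-ollama.sh
```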

### Additional models supported after Ollama v0.5.4

The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models are also supported in the Ollama Portable Zip:

  | Model  | Download (Windows) | Download (Linux) | Model Link |
  | - | - | - | - |
  | DeepSeek-R1 | `ollama run deepseek-r1` | `./ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
  | Openthinker | `ollama run openthinker` | `./ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
  | DeepScaleR | `ollama run deepscaler` | `./ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
  | Phi-4 | `ollama run phi4` | `./ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
  | Dolphin 3.0 | `ollama run dolphin3` | `./ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
  | Smallthinker | `ollama run smallthinker` | `./ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
  | Granite3.1-Dense | `ollama run granite3-dense` | `./ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
  | Granite3.1-Moe-3B | `ollama run granite3-moe` | `./ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |

@@ -5,17 +5,22 @@

This guide demonstrates how to use [Ollama portable zip](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly) to run Ollama directly on Intel GPU with `ipex-llm`, without any manual installation.

## Table of Contents
- [Windows User Guide](#windows用户指南)
  - [Prerequisites](#系统环境准备)
  - [Step 1: Download and Unzip](#步骤-1下载和解压)
  - [Step 2: Start Ollama Serve](#步骤-2启动-ollama-serve)
  - [Step 3: Run Ollama](#步骤-3运行-ollama)
  - [Tips & Troubleshooting](#提示和故障排除)
- [Linux User Guide](#linux用户指南)
  - [Prerequisites](#系统环境准备-1)
  - [Step 1: Download and Unzip](#步骤-1下载和解压-1)
  - [Step 2: Start Ollama Serve](#步骤-2启动-ollama-serve-1)
  - [Step 3: Run Ollama](#步骤-3运行-ollama-1)

## Windows User Guide

### Prerequisites

Check your GPU driver version, and update it if needed:

@@ -23,13 +28,13 @@

- For other Intel iGPUs and dGPUs, we recommend using GPU driver version [32.0.101.6078](https://www.intel.com/content/www/us/en/download/785597/834050/intel-arc-iris-xe-graphics-windows.html)

### Step 1: Download and Unzip

Download the IPEX-LLM Ollama portable zip from this [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly).

Then, extract the zip file to a folder.

### Step 2: Start Ollama Serve

Double-click `start-ollama.bat` in the extracted folder to start Ollama Serve. A window will then pop up, as shown below:

@@ -37,27 +42,27 @@

  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_start_ollama.png"  width=80%/>
</div>

### Step 3: Run Ollama

You could then use Ollama to run LLMs on Intel GPUs as follows:

- Open Command Prompt (cmd), and enter the extracted folder by typing `cd /d PATH\TO\EXTRACTED\FOLDER`
- Run `ollama run deepseek-r1:7b` in the Command Prompt (you may replace it with any other model you need)

<div align="center">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_run_ollama.png"  width=80%/>
</div>

### Tips & Troubleshooting

#### Speed up model downloads by switching sources

Ollama downloads models from the [Ollama library](https://ollama.com/library) by default. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope`/`ollama` before [Run Ollama](#步骤-3运行-ollama), you could switch the preferred source from which the model is downloaded.

For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is quite slow, you could instead use [its model source](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF) on [ModelScope](https://www.modelscope.cn/models) as follows:

- Open Command Prompt (cmd), and enter the extracted folder with the `cd /d PATH\TO\EXTRACTED\FOLDER` command
- Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in the Command Prompt
- Run `ollama run deepseek-r1:7b`

> [!Tip]

@@ -68,20 +73,20 @@

> ```
> Except for `ollama run` and `ollama pull`, the model should be identified by its actual ID in other operations, e.g. `ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M`

#### Increase the context length in Ollama

By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.

To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` before [starting Ollama serve](#步骤-2启动-ollama-serve), as follows:

- Open Command Prompt (cmd), and enter the extracted folder with the `cd /d PATH\TO\EXTRACTED\FOLDER` command
- Set `IPEX_LLM_NUM_CTX` to the desired length in the Command Prompt, e.g. `set IPEX_LLM_NUM_CTX=16384`
- Start Ollama serve by running `start-ollama.bat`

> [!Tip]
> `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` set in a model's `Modelfile`.

#### Additional models supported after Ollama v0.5.4

The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models are also supported in the Ollama Portable Zip:

@@ -95,3 +100,48 @@

  | Smallthinker | `ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
  | Granite3.1-Dense | `ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
  | Granite3.1-Moe-3B | `ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |


## Linux User Guide

### Prerequisites

Check your GPU driver version, and update it if needed:

- For client GPUs, such as the A-series, B-series and integrated GPUs, we recommend following the [client GPU driver installation guide](https://dgpu-docs.intel.com/driver/client/overview.html) to install your GPU driver.


### Step 1: Download and Unzip

Download the IPEX-LLM Ollama portable tgz from this [link](https://github.com/intel/ipex-llm/releases/tag/v2.2.0-nightly).

Then, open a terminal and extract the tgz file to a folder with the following commands:

```bash
cd PATH/TO/DOWNLOADED/TGZ
tar xvf [Downloaded tgz file]
```

### Step 2: Start Ollama Serve

Enter the extracted folder and run `./start-ollama.sh` to start Ollama Serve:

[Optional] For users with multiple GPUs, please edit `start-ollama.sh` in the extracted folder and modify `ONEAPI_DEVICE_SELECTOR` according to your machine configuration (a sketch follows the commands below). By default, Ollama will use all GPUs.

```bash
cd PATH/TO/EXTRACTED/FOLDER
./start-ollama.sh
```
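
For the optional multi-GPU step above, the relevant line inside `start-ollama.sh` might look like the sketch below; the exact contents of the script may differ, so treat this as an assumption, and replace `0` with the GPU id you want to keep:

```bash
# Hypothetical excerpt of start-ollama.sh: restrict Ollama to the GPU with id 0
export ONEAPI_DEVICE_SELECTOR=level_zero:0
```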

<div align="center">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_start_ollama_ubuntu.png"  width=80%/>
</div>

### Step 3: Run Ollama

You could then use Ollama to run LLMs on Intel GPUs as follows:

- Open another terminal, and enter the extracted folder by typing `cd PATH/TO/EXTRACTED/FOLDER`
- Run `./ollama run deepseek-r1:7b` in the terminal (you may replace it with any other model you need), as in the short sketch below
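
A minimal second-terminal session might look like this (the folder path is a placeholder; the service started by `./start-ollama.sh` keeps running in the first terminal):

```bash
cd PATH/TO/EXTRACTED/FOLDER
./ollama run deepseek-r1:7b   # or any other supported model
```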

<div align="center">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_run_ollama_ubuntu.png"  width=80%/>
</div>