Update CN Ollama portable zip QuickStart for troubleshooting & tips (#12860)
* Small fix for english version
* Update CN ollama portable zip quickstart for troubleshooting & tips
* Small fix

parent 38a682adb1
commit 0f2706be42
2 changed files with 73 additions and 35 deletions
@@ -107,7 +107,7 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:

Ollama by default downloads models from the Ollama library. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you could switch the source from which the model is downloaded.

For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you could download the model from [ModelScope](https://www.modelscope.cn/models) as follows:

For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you could download the model from ModelScope as follows:
- For **Windows** users:
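
For the corresponding Linux flow, a minimal sketch (assuming the portable zip layout with `./ollama` in the extracted folder, and that Ollama serve has already been started with `./start-ollama.sh` in another terminal) might look like:

```bash
cd PATH/TO/EXTRACTED/FOLDER               # enter the extracted portable zip folder
export IPEX_LLM_MODEL_SOURCE=modelscope   # switch the model download source to ModelScope
./ollama run deepseek-r1:7b               # the model is now pulled from ModelScope
```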
@@ -133,7 +133,7 @@ For example, if you would like to run `deepseek-r1:7b` but the download speed fr

By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.

To increase the context length, you could set environment variable `IPEX_LLM_NUM_CTX` **before Start Ollama Serve**, as shwon below (if Ollama serve is already running, please make sure to stop it first):

To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` **before starting Ollama Serve**, as shown below (if Ollama serve is already running, please make sure to stop it first):
- For **Windows** users:
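
On Linux, these steps might boil down to the following sketch (assuming the portable zip layout; stop any already-running Ollama serve first):

```bash
cd PATH/TO/EXTRACTED/FOLDER     # enter the extracted portable zip folder
export IPEX_LLM_NUM_CTX=16384   # desired context length in tokens
./start-ollama.sh               # start Ollama serve with the larger context window
```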
@@ -154,7 +154,7 @@ To increase the context length, you could set environment variable `IPEX_LLM_NUM

If your machine has multiple Intel GPUs, Ollama will by default run on all of them.

To specify which Intel GPU you would like Ollama to use, you could set environment variable `ONEAPI_DEVICE_SELECTOR` **before Start Ollama Serve**, as follows (if Ollama serve is already running, please make sure to stop it first):

To specify which Intel GPU you would like Ollama to use, you could set the environment variable `ONEAPI_DEVICE_SELECTOR` **before starting Ollama Serve**, as follows (if Ollama serve is already running, please make sure to stop it first):
- Identify the id (e.g. 0, 1, etc.) for your multiple GPUs. You could find them in the logs of Ollama serve when loading any models, e.g.:
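
For instance, once the serve logs show that the GPU you want has id `0`, a minimal Linux sketch (assuming the portable zip layout) would be:

```bash
cd PATH/TO/EXTRACTED/FOLDER                  # enter the extracted portable zip folder
export ONEAPI_DEVICE_SELECTOR=level_zero:0   # pin Ollama to the GPU with id 0
./start-ollama.sh                            # Ollama serve now uses only that GPU
```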
@@ -178,13 +178,13 @@ To specify which Intel GPU you would like Ollama to use, you could set environme

The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models have also been supported in the Ollama Portable Zip:

  | Model  | Download (Windows) | Download (Linux) | Model Link |
  | - | - | - | - |
  | DeepSeek-R1 | `ollama run deepseek-r1` | `./ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
  | Openthinker | `ollama run openthinker` | `./ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
  | DeepScaleR | `ollama run deepscaler` | `./ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
  | Phi-4 | `ollama run phi4` | `./ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
  | Dolphin 3.0 | `ollama run dolphin3` | `./ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
  | Smallthinker | `ollama run smallthinker` | `./ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
  | Granite3.1-Dense | `ollama run granite3-dense` | `./ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
  | Granite3.1-Moe-3B | `ollama run granite3-moe` | `./ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |

| Model  | Download (Windows) | Download (Linux) | Model Link |
| - | - | - | - |
| DeepSeek-R1 | `ollama run deepseek-r1` | `./ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
| Openthinker | `ollama run openthinker` | `./ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
| DeepScaleR | `ollama run deepscaler` | `./ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
| Phi-4 | `ollama run phi4` | `./ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
| Dolphin 3.0 | `ollama run dolphin3` | `./ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
| Smallthinker | `ollama run smallthinker` | `./ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
| Granite3.1-Dense | `ollama run granite3-dense` | `./ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
| Granite3.1-Moe-3B | `ollama run granite3-moe` | `./ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |
@@ -19,6 +19,7 @@

- [Tips and Troubleshooting](#提示和故障排除)
  - [Speed up model downloads by switching the download source](#通过切换源提升模型下载速度)
  - [Increase the context length in Ollama](#在-ollama-中增加上下文长度)
  - [Select a specific GPU to run Ollama when multiple GPUs are available](#在多块-gpu-可用时选择特定的-gpu-来运行-ollama)
  - [Additional models supported after Ollama v0.5.4](#ollama-v054-之后新增模型支持)
- [More information](ollama_quickstart.zh-CN.md)
@@ -61,9 +62,7 @@

### System Environment Preparation

Check your GPU driver version and update it if needed:

- For users with consumer GPUs, such as the A-series, B-series, and integrated GPUs, we recommend following the [consumer GPU driver installation guide](https://dgpu-docs.intel.com/driver/client/overview.html) to install your GPU driver.

Check your GPU driver version and update it if needed; we recommend following the [consumer GPU driver installation guide](https://dgpu-docs.intel.com/driver/client/overview.html) to install the GPU driver.


### Step 1: Download and Extract
@@ -80,7 +79,6 @@ tar xvf [Downloaded tgz file]

Enter the extracted folder and run `./start-ollama.sh` to start Ollama Serve:

[Optional] For users with multiple GPUs, please edit start-ollama.sh in the extracted folder and modify ONEAPI_DEVICE_SELECTOR according to your machine's configuration. By default, Ollama will use all GPUs.

```bash
cd PATH/TO/EXTRACTED/FOLDER
./start-ollama.sh
```
@@ -105,13 +103,21 @@ tar xvf [Downloaded tgz file]

### Speed up model downloads by switching the download source

Ollama downloads models from the Ollama library by default. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` before running Ollama, you can switch the preferred source from which models are downloaded.

Ollama downloads models from the Ollama library by default. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.

For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you can switch to the ModelScope source as follows:

- Open Command Prompt (cmd) and enter the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
- Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in Command Prompt
- Run `ollama run deepseek-r1:7b`

- For **Windows** users:

  - Open Command Prompt (cmd) and enter the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
  - Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in Command Prompt
  - Run `ollama run deepseek-r1:7b`

- For **Linux** users:

  - In another terminal (other than the one running Ollama serve), enter the extracted folder with `cd PATH/TO/EXTRACTED/FOLDER`
  - Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
  - Run `./ollama run deepseek-r1:7b`
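
A rough sketch of these Linux steps, using `ollama pull` to fetch the model without immediately starting a chat (the `pull` and `list` subcommands are standard Ollama CLI; the ModelScope behaviour itself is as described above):

```bash
cd PATH/TO/EXTRACTED/FOLDER               # enter the extracted portable zip folder
export IPEX_LLM_MODEL_SOURCE=modelscope   # download models from ModelScope instead of the Ollama library
./ollama pull deepseek-r1:7b              # fetch the model only
./ollama list                             # the listed entry shows the actual model ID (see the Tip below)
```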

> [!Tip]
> Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show their actual model IDs when you run `ollama list`, e.g.:
@@ -125,26 +131,58 @@ Ollama downloads models from the Ollama library by default. By setting the envi…

By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.

To increase the context length, you can set the environment variable `IPEX_LLM_NUM_CTX` before [starting Ollama serve](#步骤-2启动-ollama-serve), as follows:

To increase the context length, you can set the environment variable `IPEX_LLM_NUM_CTX` **before starting Ollama serve**, as follows (if Ollama serve is already running, please make sure to stop it first):

- Open Command Prompt (cmd) and enter the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
- Set `IPEX_LLM_NUM_CTX` to the desired length in Command Prompt, e.g. `set IPEX_LLM_NUM_CTX=16384`
- Start Ollama serve by running `start-ollama.bat`

- For **Windows** users:

  - Open Command Prompt (cmd) and enter the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
  - Set `IPEX_LLM_NUM_CTX` to the desired length in Command Prompt, e.g. `set IPEX_LLM_NUM_CTX=16384`
  - Start Ollama serve by running `start-ollama.bat`

- For **Linux** users:

  - Enter the extracted folder with `cd PATH/TO/EXTRACTED/FOLDER` in the terminal
  - Set `IPEX_LLM_NUM_CTX` to the desired length in the terminal, e.g. `export IPEX_LLM_NUM_CTX=16384`
  - Start Ollama serve by running `./start-ollama.sh`

> [!Tip]
> `IPEX_LLM_NUM_CTX` takes precedence over the `num_ctx` setting in a model's `Modelfile`.
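
As a one-line variant of the Linux steps above (a sketch; the inline assignment is equivalent to the `export` form, and per the Tip it overrides any `num_ctx` from the Modelfile):

```bash
cd PATH/TO/EXTRACTED/FOLDER
IPEX_LLM_NUM_CTX=16384 ./start-ollama.sh   # start Ollama serve with a 16384-token context window
```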
### Select a specific GPU to run Ollama when multiple GPUs are available

If your machine has multiple GPUs, Ollama will run on all of them by default.

You can tell Ollama to run on a specific Intel GPU by setting the environment variable `ONEAPI_DEVICE_SELECTOR` **before starting Ollama serve**, as follows (if Ollama serve is already running, please make sure to stop it first):

- Identify the ids (e.g. 0, 1, etc.) of your GPUs. You can find them in the logs of Ollama serve when any model is being loaded, e.g.:

  <div align="center">
    <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_multi_gpus.png"  width=80%/>
  </div>

- For **Windows** users:

  - Open Command Prompt (cmd) and enter the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
  - Set `ONEAPI_DEVICE_SELECTOR` in Command Prompt to specify the Intel GPU you want to use, e.g. `set ONEAPI_DEVICE_SELECTOR=level_zero:0`, where `0` should be replaced with the id of your desired GPU
  - Start Ollama serve by running `start-ollama.bat`

- For **Linux** users:

  - Enter the extracted folder with `cd PATH/TO/EXTRACTED/FOLDER` in the terminal
  - Set `ONEAPI_DEVICE_SELECTOR` in the terminal to specify the Intel GPU you want to use, e.g. `export ONEAPI_DEVICE_SELECTOR=level_zero:0`, where `0` should be replaced with the id of your desired GPU
  - Start Ollama serve by running `./start-ollama.sh`
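
To restrict Ollama to a subset of GPUs rather than a single one, the selector can list several ids; the exact string below follows the general oneAPI device selector syntax and is an assumption to verify against your serve logs:

```bash
cd PATH/TO/EXTRACTED/FOLDER
export ONEAPI_DEVICE_SELECTOR="level_zero:0,1"   # assumed syntax: use only GPU ids 0 and 1
./start-ollama.sh
```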
### Additional models supported after Ollama v0.5.4

The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models are also supported in the Ollama Portable Zip:

  | Model  | Download | Model Link |
  | - | - | - |
  | DeepSeek-R1 | `ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
  | Openthinker | `ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
  | DeepScaleR | `ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
  | Phi-4 | `ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
  | Dolphin 3.0 | `ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
  | Smallthinker | `ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
  | Granite3.1-Dense | `ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
  | Granite3.1-Moe-3B | `ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |

| Model  | Download (Windows) | Download (Linux) | Model Link |
| - | - | - | - |
| DeepSeek-R1 | `ollama run deepseek-r1` | `./ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
| Openthinker | `ollama run openthinker` | `./ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
| DeepScaleR | `ollama run deepscaler` | `./ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
| Phi-4 | `ollama run phi4` | `./ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
| Dolphin 3.0 | `ollama run dolphin3` | `./ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
| Smallthinker | `ollama run smallthinker` | `./ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
| Granite3.1-Dense | `ollama run granite3-dense` | `./ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
| Granite3.1-Moe-3B | `ollama run granite3-moe` | `./ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |