Update CN Ollama portable zip QuickStart for troubleshooting & tips (#12860)

* Small fix for english version

* Update CN ollama portable zip quickstart for troubleshooting & tips

* Small fix
Yuwen Hu 2025-02-20 11:32:06 +08:00 committed by GitHub
parent 38a682adb1
commit 0f2706be42
2 changed files with 73 additions and 35 deletions


@@ -107,7 +107,7 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:
Ollama by default downloads models from the Ollama library. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you could switch the source from which the model is downloaded.
For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you could download the model from [ModelScope](https://www.modelscope.cn/models) as follows:
- For **Windows** users:
@@ -133,7 +133,7 @@ For example, if you would like to run `deepseek-r1:7b` but the download speed fr
By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` **before starting Ollama Serve**, as shown below (if Ollama serve is already running, please make sure to stop it first):
- For **Windows** users:
@@ -154,7 +154,7 @@ To increase the context length, you could set environment variable `IPEX_LLM_NUM
If your machine has multiple Intel GPUs, Ollama will by default run on all of them.
To specify which Intel GPU you would like Ollama to use, you could set the environment variable `ONEAPI_DEVICE_SELECTOR` **before starting Ollama Serve**, as follows (if Ollama serve is already running, please make sure to stop it first):
- Identify the ids (e.g. 0, 1, etc.) of your GPUs. You could find them in the logs of Ollama serve when loading any models, e.g.:
@@ -178,13 +178,13 @@ To specify which Intel GPU you would like Ollama to use, you could set environme
The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models are also supported in the Ollama Portable Zip:
| Model | Download (Windows) | Download (Linux) | Model Link |
| - | - | - | - |
| DeepSeek-R1 | `ollama run deepseek-r1` | `./ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
| Openthinker | `ollama run openthinker` | `./ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
| DeepScaleR | `ollama run deepscaler` | `./ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
| Phi-4 | `ollama run phi4` | `./ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
| Dolphin 3.0 | `ollama run dolphin3` | `./ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
| Smallthinker | `ollama run smallthinker` | `./ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
| Granite3.1-Dense | `ollama run granite3-dense` | `./ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
| Granite3.1-Moe-3B | `ollama run granite3-moe` | `./ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |


@@ -19,6 +19,7 @@
- [Tips and Troubleshooting](#提示和故障排除)
  - [Speed up model downloads by switching the download source](#通过切换源提升模型下载速度)
  - [Increase the context length in Ollama](#在-ollama-中增加上下文长度)
  - [Select a specific GPU to run Ollama when multiple GPUs are available](#在多块-gpu-可用时选择特定的-gpu-来运行-ollama)
  - [Additional models supported since Ollama v0.5.4](#ollama-v054-之后新增模型支持)
- [More Information](ollama_quickstart.zh-CN.md)
@@ -61,9 +62,7 @@
### System Environment Preparation
Check your GPU driver version and update it if needed; we recommend following the [consumer GPU driver installation guide](https://dgpu-docs.intel.com/driver/client/overview.html) to install your GPU driver.
### Step 1: Download and Extract
@@ -80,7 +79,6 @@ tar xvf [Downloaded tgz file]
Enter the extracted folder and run `./start-ollama.sh` to start Ollama Serve:
```bash
cd PATH/TO/EXTRACTED/FOLDER
./start-ollama.sh
```
@@ -105,13 +103,21 @@ tar xvf [Downloaded tgz file]
### Speed Up Model Downloads by Switching the Source
Ollama downloads models from the Ollama library by default. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.
For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you can use the model source on ModelScope instead, as follows:
- For **Windows** users:
  - Open Command Prompt (cmd) and enter the extracted folder via the `cd /d PATH\TO\EXTRACTED\FOLDER` command
  - Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in Command Prompt
  - Run `ollama run deepseek-r1:7b`
- For **Linux** users:
  - In another terminal (different from the one running Ollama serve), enter the extracted folder via `cd PATH/TO/EXTRACTED/FOLDER`
  - Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
  - Run `./ollama run deepseek-r1:7b`
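For the Linux case, the steps above amount to roughly the following commands, shown here as a minimal sketch (the extracted-folder path is a placeholder to adjust for your setup):
```bash
# Run in a terminal other than the one running Ollama serve
cd PATH/TO/EXTRACTED/FOLDER

# Download models from ModelScope instead of the Ollama library
export IPEX_LLM_MODEL_SOURCE=modelscope

# Pull and run the model through the portable Ollama binary
./ollama run deepseek-r1:7b
```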
> [!Tip]
> Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show their actual model IDs when you run `ollama list`.
@@ -125,26 +131,58 @@ Ollama downloads models from the Ollama library by default. By setting the environment
By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
To increase the context length, set the environment variable `IPEX_LLM_NUM_CTX` **before starting Ollama serve**, as follows (if Ollama serve is already running, please make sure to stop it first):
- For **Windows** users:
  - Open Command Prompt (cmd) and enter the extracted folder via the `cd /d PATH\TO\EXTRACTED\FOLDER` command
  - In Command Prompt, set `IPEX_LLM_NUM_CTX` to the desired length, for example: `set IPEX_LLM_NUM_CTX=16384`
  - Start Ollama serve by running `start-ollama.bat`
- For **Linux** users:
  - In a terminal, enter the extracted folder via `cd PATH/TO/EXTRACTED/FOLDER`
  - In the terminal, set `IPEX_LLM_NUM_CTX` to the desired length, for example: `export IPEX_LLM_NUM_CTX=16384`
  - Start Ollama serve by running `./start-ollama.sh`
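On Linux, that sequence corresponds to roughly the following minimal sketch (the folder path and the value `16384` are placeholders; pick the context length you actually need):
```bash
cd PATH/TO/EXTRACTED/FOLDER

# Make the larger context window visible to the serve process
export IPEX_LLM_NUM_CTX=16384

# Start Ollama serve with the increased context length
./start-ollama.sh
```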
> [!Tip]
> `IPEX_LLM_NUM_CTX` takes precedence over the `num_ctx` set in a model's `Modelfile`.
### Select a Specific GPU to Run Ollama When Multiple GPUs Are Available
If your machine has multiple GPUs, Ollama will run on all of them by default.
To run Ollama on a specific Intel GPU, set the environment variable `ONEAPI_DEVICE_SELECTOR` **before starting Ollama serve**, as follows (if Ollama serve is already running, please make sure to stop it first):
- Identify the ids (e.g. 0, 1, etc.) of your GPUs. You can find them in the logs of Ollama serve when any model is loaded, e.g.:
<div align="center">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_multi_gpus.png" width=80%/>
</div>
- For **Windows** users:
  - Open Command Prompt (cmd) and enter the extracted folder via the `cd /d PATH\TO\EXTRACTED\FOLDER` command
  - In Command Prompt, set `ONEAPI_DEVICE_SELECTOR` to the Intel GPU you want to use, for example `set ONEAPI_DEVICE_SELECTOR=level_zero:0`, where `0` should be replaced with the id of the desired GPU
  - Start Ollama serve by running `start-ollama.bat`
- For **Linux** users:
  - In a terminal, enter the extracted folder via `cd PATH/TO/EXTRACTED/FOLDER`
  - In the terminal, set `ONEAPI_DEVICE_SELECTOR` to the Intel GPU you want to use, for example `export ONEAPI_DEVICE_SELECTOR=level_zero:0`, where `0` should be replaced with the id of the desired GPU
  - Start Ollama serve by running `./start-ollama.sh`
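On Linux, the selection boils down to something like the following minimal sketch (the GPU id `0` is a placeholder; use the id reported in the Ollama serve logs):
```bash
cd PATH/TO/EXTRACTED/FOLDER

# Restrict the oneAPI Level Zero backend to a single GPU; replace 0 with your GPU id
export ONEAPI_DEVICE_SELECTOR=level_zero:0

# Start Ollama serve on the selected GPU only
./start-ollama.sh
```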
### Additional Models Supported Since Ollama v0.5.4
The current Ollama Portable Zip is based on Ollama v0.5.4; in addition, the following new models are also supported in the Ollama Portable Zip:
| Model | Download (Windows) | Download (Linux) | Model Link |
| - | - | - | - |
| DeepSeek-R1 | `ollama run deepseek-r1` | `./ollama run deepseek-r1` | [deepseek-r1](https://ollama.com/library/deepseek-r1) |
| Openthinker | `ollama run openthinker` | `./ollama run openthinker` | [openthinker](https://ollama.com/library/openthinker) |
| DeepScaleR | `ollama run deepscaler` | `./ollama run deepscaler` | [deepscaler](https://ollama.com/library/deepscaler) |
| Phi-4 | `ollama run phi4` | `./ollama run phi4` | [phi4](https://ollama.com/library/phi4) |
| Dolphin 3.0 | `ollama run dolphin3` | `./ollama run dolphin3` | [dolphin3](https://ollama.com/library/dolphin3) |
| Smallthinker | `ollama run smallthinker` | `./ollama run smallthinker` | [smallthinker](https://ollama.com/library/smallthinker) |
| Granite3.1-Dense | `ollama run granite3-dense` | `./ollama run granite3-dense` | [granite3.1-dense](https://ollama.com/library/granite3.1-dense) |
| Granite3.1-Moe-3B | `ollama run granite3-moe` | `./ollama run granite3-moe` | [granite3.1-moe](https://ollama.com/library/granite3.1-moe) |