Modify ollama num_ctx related doc (#13139)

* Modify ollama num_ctx related doc

* Address review comments
SONG Ge 2025-05-07 16:44:58 +08:00 committed by GitHub
parent 3a28b69202
commit e88a2aa65b
2 changed files with 22 additions and 16 deletions

First changed file (the English documentation):

@@ -114,24 +114,24 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:
 ### Speed up model download using alternative sources
-Ollama by default downloads models from the Ollama library. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.
+Ollama by default downloads models from the Ollama library. By setting the environment variable `OLLAMA_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.
 For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you could download the model from ModelScope as follows:
 - For **Windows** users:
   - In the "Command Prompt", navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
+  - Run `set OLLAMA_MODEL_SOURCE=modelscope` in "Command Prompt"
   - Run `ollama run deepseek-r1:7b`
 - For **Linux** users:
   - In a terminal other than the one for Ollama serve, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
-  - Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
+  - Run `export OLLAMA_MODEL_SOURCE=modelscope` in the terminal
   - Run `./ollama run deepseek-r1:7b`
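For reference, the Linux steps above combined into a single shell session. This is a minimal sketch, assuming the extracted portable folder layout described in this guide; the path is a placeholder.

```bash
# Minimal sketch; replace the path with your actual extracted folder.
cd PATH/TO/EXTRACTED/FOLDER

# Switch the model download source to ModelScope before running Ollama.
export OLLAMA_MODEL_SOURCE=modelscope

# Pull and run the model; it is now fetched from modelscope.cn instead of the Ollama library.
./ollama run deepseek-r1:7b
```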
 > [!TIP]
-> Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show the actual model id in `ollama list`, e.g.
+> Models downloaded with `set OLLAMA_MODEL_SOURCE=modelscope` will still show the actual model id in `ollama list`, e.g.
 > ```
 > NAME                                                            ID              SIZE      MODIFIED
 > modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M   f482d5af6aec    4.7 GB    About a minute ago
@@ -142,22 +142,25 @@ For example, if you would like to run `deepseek-r1:7b` but the download speed fr
 By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
-To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` **before starting Ollama serve**, as shown below (if Ollama serve is already running, please make sure to stop it first):
+To increase the context length, you could set the environment variable `OLLAMA_NUM_CTX` **before starting Ollama serve**, as shown below (if Ollama serve is already running, please make sure to stop it first):
 - For **Windows** users:
   - Open "Command Prompt", and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - Set `IPEX_LLM_NUM_CTX` to the desired length in the "Command Prompt", e.g. `set IPEX_LLM_NUM_CTX=16384`
+  - Set `OLLAMA_NUM_CTX` to the desired length in the "Command Prompt", e.g. `set OLLAMA_NUM_CTX=16384`
   - Start Ollama serve through `start-ollama.bat`
 - For **Linux** users:
   - In a terminal, navigate to the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
-  - Set `IPEX_LLM_NUM_CTX` to the desired length in the terminal, e.g. `export IPEX_LLM_NUM_CTX=16384`
+  - Set `OLLAMA_NUM_CTX` to the desired length in the terminal, e.g. `export OLLAMA_NUM_CTX=16384`
   - Start Ollama serve through `./start-ollama.sh`
 > [!TIP]
-> `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
+> `OLLAMA_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
+> [!NOTE]
+> For versions earlier than 2.7.0b20250429, please use `IPEX_LLM_NUM_CTX` instead.
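Likewise, the Linux steps for raising the context length, as a minimal sketch; the path is a placeholder, and the older variable name only applies to the releases mentioned in the note above.

```bash
# Minimal sketch; replace the path with your actual extracted folder.
cd PATH/TO/EXTRACTED/FOLDER

# Set the desired context length before starting Ollama serve
# (stop any Ollama serve instance that is already running first).
export OLLAMA_NUM_CTX=16384
# On versions earlier than 2.7.0b20250429, use IPEX_LLM_NUM_CTX instead.

# Start Ollama serve; OLLAMA_NUM_CTX takes priority over num_ctx in a model's Modelfile.
./start-ollama.sh
```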
 ### Select specific GPU(s) to run Ollama when multiple ones are available

Second changed file (the Chinese, zh-CN, version of the same documentation):

@@ -112,24 +112,24 @@ cd PATH/TO/EXTRACTED/FOLDER
 ### Speed up model download by switching sources
-Ollama downloads models from the Ollama library by default. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.
+Ollama downloads models from the Ollama library by default. By setting the environment variable `OLLAMA_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.
 For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you can switch to the ModelScope source as follows:
 - For **Windows** users:
   - Open the Command Prompt and enter the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in the Command Prompt
+  - Run `set OLLAMA_MODEL_SOURCE=modelscope` in the Command Prompt
   - Run `ollama run deepseek-r1:7b`
 - For **Linux** users:
   - In a terminal other than the one running Ollama serve, enter the extracted folder with `cd PATH/TO/EXTRACTED/FOLDER`
-  - Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
+  - Run `export OLLAMA_MODEL_SOURCE=modelscope` in the terminal
   - Run `./ollama run deepseek-r1:7b`
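For reference, the Windows steps above combined into a single Command Prompt session. This is a minimal sketch; the path is a placeholder.

```bat
REM Minimal sketch; replace the path with your actual extracted folder.
cd /d PATH\TO\EXTRACTED\FOLDER

REM Switch the model download source to ModelScope before running Ollama.
set OLLAMA_MODEL_SOURCE=modelscope

REM Pull and run the model; it is now fetched from modelscope.cn instead of the Ollama library.
ollama run deepseek-r1:7b
```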
 > [!Tip]
-> Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show the actual model ID when you run `ollama list`, e.g.
+> Models downloaded with `set OLLAMA_MODEL_SOURCE=modelscope` will still show the actual model ID when you run `ollama list`, e.g.
 > ```
 > NAME                                                            ID              SIZE      MODIFIED
 > modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M   f482d5af6aec    4.7 GB    About a minute ago
@@ -140,22 +140,25 @@ Ollama downloads models from the Ollama library by default. By setting the environment variable
 By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
-To increase the context length, set the environment variable `IPEX_LLM_NUM_CTX` **before starting Ollama serve**, as shown below (if Ollama serve is already running, make sure to stop it first):
+To increase the context length, set the environment variable `OLLAMA_NUM_CTX` **before starting Ollama serve**, as shown below (if Ollama serve is already running, make sure to stop it first):
 - For **Windows** users:
   - Open the Command Prompt and enter the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - Set `IPEX_LLM_NUM_CTX` to the desired length in the Command Prompt, e.g. `set IPEX_LLM_NUM_CTX=16384`
+  - Set `OLLAMA_NUM_CTX` to the desired length in the Command Prompt, e.g. `set OLLAMA_NUM_CTX=16384`
   - Start Ollama serve by running `start-ollama.bat`
 - For **Linux** users:
   - In a terminal, enter the extracted folder with `cd PATH/TO/EXTRACTED/FOLDER`
-  - Set `IPEX_LLM_NUM_CTX` to the desired length in the terminal, e.g. `export IPEX_LLM_NUM_CTX=16384`
+  - Set `OLLAMA_NUM_CTX` to the desired length in the terminal, e.g. `export OLLAMA_NUM_CTX=16384`
   - Start Ollama serve by running `./start-ollama.sh`
 > [!Tip]
-> `IPEX_LLM_NUM_CTX` takes precedence over the `num_ctx` setting in a model's `Modelfile`.
+> `OLLAMA_NUM_CTX` takes precedence over the `num_ctx` setting in a model's `Modelfile`.
+> [!NOTE]
+> For versions earlier than 2.7.0b20250429, please use `IPEX_LLM_NUM_CTX` instead.
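And the Windows steps for raising the context length, again as a minimal sketch with a placeholder path; the older variable name only applies to the releases mentioned in the note above.

```bat
REM Minimal sketch; replace the path with your actual extracted folder.
cd /d PATH\TO\EXTRACTED\FOLDER

REM Set the desired context length before starting Ollama serve
REM (stop any Ollama serve instance that is already running first).
set OLLAMA_NUM_CTX=16384
REM On versions earlier than 2.7.0b20250429, set IPEX_LLM_NUM_CTX instead.

REM Start Ollama serve; OLLAMA_NUM_CTX takes priority over num_ctx in a model's Modelfile.
start-ollama.bat
```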
 ### Select specific GPU(s) to run Ollama when multiple ones are available