Modify ollama num_ctx related doc (#13139)

* Modify ollama num_ctx related doc

* Address review comments
SONG Ge 2025-05-07 16:44:58 +08:00 committed by GitHub
parent 3a28b69202
commit e88a2aa65b
2 changed files with 22 additions and 16 deletions

First changed file (the English documentation):

@@ -114,24 +114,24 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:
 ### Speed up model download using alternative sources
-Ollama by default downloads models from the Ollama library. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.
+Ollama by default downloads models from the Ollama library. By setting the environment variable `OLLAMA_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.
 For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you could download the model from ModelScope as follows:
 - For **Windows** users:
   - In the "Command Prompt", navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
+  - Run `set OLLAMA_MODEL_SOURCE=modelscope` in "Command Prompt"
   - Run `ollama run deepseek-r1:7b`
 - For **Linux** users:
   - In a terminal other than the one for Ollama serve, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
-  - Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
+  - Run `export OLLAMA_MODEL_SOURCE=modelscope` in the terminal
   - Run `./ollama run deepseek-r1:7b`
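For reference, the Linux steps above combined into a single shell session. This is a minimal sketch, assuming the extracted portable folder layout described in this guide; the path is a placeholder.

```bash
# Minimal sketch; replace the path with your actual extracted folder.
cd PATH/TO/EXTRACTED/FOLDER

# Switch the model download source to ModelScope before running Ollama.
export OLLAMA_MODEL_SOURCE=modelscope

# Pull and run the model; it is now fetched from modelscope.cn instead of the Ollama library.
./ollama run deepseek-r1:7b
```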
 > [!TIP]
-> Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show the actual model id in `ollama list`, e.g.
+> Models downloaded with `set OLLAMA_MODEL_SOURCE=modelscope` will still show the actual model id in `ollama list`, e.g.
 > ```
 > NAME                                                            ID              SIZE      MODIFIED
 > modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M   f482d5af6aec    4.7 GB    About a minute ago
@@ -142,22 +142,25 @@ For example, if you would like to run `deepseek-r1:7b` but the download speed fr
 By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
-To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` **before starting Ollama serve**, as shown below (if Ollama serve is already running, please make sure to stop it first):
+To increase the context length, you could set the environment variable `OLLAMA_NUM_CTX` **before starting Ollama serve**, as shown below (if Ollama serve is already running, please make sure to stop it first):
 - For **Windows** users:
   - Open "Command Prompt", and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - Set `IPEX_LLM_NUM_CTX` to the desired length in the "Command Prompt", e.g. `set IPEX_LLM_NUM_CTX=16384`
+  - Set `OLLAMA_NUM_CTX` to the desired length in the "Command Prompt", e.g. `set OLLAMA_NUM_CTX=16384`
   - Start Ollama serve through `start-ollama.bat`
 - For **Linux** users:
   - In a terminal, navigate to the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
-  - Set `IPEX_LLM_NUM_CTX` to the desired length in the terminal, e.g. `export IPEX_LLM_NUM_CTX=16384`
+  - Set `OLLAMA_NUM_CTX` to the desired length in the terminal, e.g. `export OLLAMA_NUM_CTX=16384`
   - Start Ollama serve through `./start-ollama.sh`
 > [!TIP]
-> `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
+> `OLLAMA_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
+> [!NOTE]
+> For versions earlier than 2.7.0b20250429, please use `IPEX_LLM_NUM_CTX` instead.
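Likewise, the Linux steps for raising the context length, as a minimal sketch; the path is a placeholder, and the older variable name only applies to the releases mentioned in the note above.

```bash
# Minimal sketch; replace the path with your actual extracted folder.
cd PATH/TO/EXTRACTED/FOLDER

# Set the desired context length before starting Ollama serve
# (stop any Ollama serve instance that is already running first).
export OLLAMA_NUM_CTX=16384
# On versions earlier than 2.7.0b20250429, use IPEX_LLM_NUM_CTX instead.

# Start Ollama serve; OLLAMA_NUM_CTX takes priority over num_ctx in a model's Modelfile.
./start-ollama.sh
```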
 ### Select specific GPU(s) to run Ollama when multiple ones are available

Second changed file (the Chinese, zh-CN, version of the same documentation):

@@ -112,24 +112,24 @@ cd PATH/TO/EXTRACTED/FOLDER
 ### Speed up model download by switching sources
-Ollama downloads models from the Ollama library by default. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.
+Ollama downloads models from the Ollama library by default. By setting the environment variable `OLLAMA_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.
 For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you can switch to the ModelScope source as follows:
 - For **Windows** users:
   - Open the Command Prompt and enter the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in the Command Prompt
+  - Run `set OLLAMA_MODEL_SOURCE=modelscope` in the Command Prompt
   - Run `ollama run deepseek-r1:7b`
 - For **Linux** users:
   - In a terminal other than the one running Ollama serve, enter the extracted folder with `cd PATH/TO/EXTRACTED/FOLDER`
-  - Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
+  - Run `export OLLAMA_MODEL_SOURCE=modelscope` in the terminal
   - Run `./ollama run deepseek-r1:7b`
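For reference, the Windows steps above combined into a single Command Prompt session. This is a minimal sketch; the path is a placeholder.

```bat
REM Minimal sketch; replace the path with your actual extracted folder.
cd /d PATH\TO\EXTRACTED\FOLDER

REM Switch the model download source to ModelScope before running Ollama.
set OLLAMA_MODEL_SOURCE=modelscope

REM Pull and run the model; it is now fetched from modelscope.cn instead of the Ollama library.
ollama run deepseek-r1:7b
```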
 > [!Tip]
-> Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show the actual model ID when you run `ollama list`, e.g.
+> Models downloaded with `set OLLAMA_MODEL_SOURCE=modelscope` will still show the actual model ID when you run `ollama list`, e.g.
 > ```
 > NAME                                                            ID              SIZE      MODIFIED
 > modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M   f482d5af6aec    4.7 GB    About a minute ago
@@ -140,22 +140,25 @@ Ollama downloads models from the Ollama library by default. By setting the environment variable
 By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
-To increase the context length, set the environment variable `IPEX_LLM_NUM_CTX` **before starting Ollama serve**, as shown below (if Ollama serve is already running, make sure to stop it first):
+To increase the context length, set the environment variable `OLLAMA_NUM_CTX` **before starting Ollama serve**, as shown below (if Ollama serve is already running, make sure to stop it first):
 - For **Windows** users:
   - Open the Command Prompt and enter the extracted folder with `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - Set `IPEX_LLM_NUM_CTX` to the desired length in the Command Prompt, e.g. `set IPEX_LLM_NUM_CTX=16384`
+  - Set `OLLAMA_NUM_CTX` to the desired length in the Command Prompt, e.g. `set OLLAMA_NUM_CTX=16384`
   - Start Ollama serve by running `start-ollama.bat`
 - For **Linux** users:
   - In a terminal, enter the extracted folder with `cd PATH/TO/EXTRACTED/FOLDER`
-  - Set `IPEX_LLM_NUM_CTX` to the desired length in the terminal, e.g. `export IPEX_LLM_NUM_CTX=16384`
+  - Set `OLLAMA_NUM_CTX` to the desired length in the terminal, e.g. `export OLLAMA_NUM_CTX=16384`
   - Start Ollama serve by running `./start-ollama.sh`
 > [!Tip]
-> `IPEX_LLM_NUM_CTX` takes precedence over the `num_ctx` setting in a model's `Modelfile`.
+> `OLLAMA_NUM_CTX` takes precedence over the `num_ctx` setting in a model's `Modelfile`.
+> [!NOTE]
+> For versions earlier than 2.7.0b20250429, please use `IPEX_LLM_NUM_CTX` instead.
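And the Windows steps for raising the context length, again as a minimal sketch with a placeholder path; the older variable name only applies to the releases mentioned in the note above.

```bat
REM Minimal sketch; replace the path with your actual extracted folder.
cd /d PATH\TO\EXTRACTED\FOLDER

REM Set the desired context length before starting Ollama serve
REM (stop any Ollama serve instance that is already running first).
set OLLAMA_NUM_CTX=16384
REM On versions earlier than 2.7.0b20250429, set IPEX_LLM_NUM_CTX instead.

REM Start Ollama serve; OLLAMA_NUM_CTX takes priority over num_ctx in a model's Modelfile.
start-ollama.bat
```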
 ### Select specific GPU(s) to run Ollama when multiple ones are available