Modify ollama num_ctx related doc (#13139)

* Modify ollama num_ctx related doc
* meet comments

This commit is contained in:
parent 3a28b69202
commit e88a2aa65b

2 changed files with 22 additions and 16 deletions

@@ -114,24 +114,24 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:
 
 ### Speed up model download using alternative sources
 
-Ollama by default downloads models from the Ollama library. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.
+Ollama by default downloads models from the Ollama library. By setting the environment variable `OLLAMA_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which models are downloaded.
 
 For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you could download the model from ModelScope as follows:
 
 - For **Windows** users:
 
   - In "Command Prompt", navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
+  - Run `set OLLAMA_MODEL_SOURCE=modelscope` in "Command Prompt"
   - Run `ollama run deepseek-r1:7b`
 
 - For **Linux** users:
 
   - In a terminal other than the one for Ollama serve, navigate to the extracted folder by `cd PATH/TO/EXTRACTED/FOLDER`
-  - Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
+  - Run `export OLLAMA_MODEL_SOURCE=modelscope` in the terminal
   - Run `./ollama run deepseek-r1:7b`
 
 > [!TIP]
-> Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show the actual model ID in `ollama list`, e.g.
+> Models downloaded with `set OLLAMA_MODEL_SOURCE=modelscope` will still show the actual model ID in `ollama list`, e.g.
 > ```
 > NAME                                                             ID              SIZE      MODIFIED
 > modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M    f482d5af6aec    4.7 GB    About a minute ago
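On Linux, the renamed flow collapses into the short shell session below. This is a sketch assuming the portable extracted-folder layout this guide uses and the post-rename `OLLAMA_MODEL_SOURCE` name; `deepseek-r1:7b` is just the example model from the steps above:

```bash
# Go to the extracted portable folder (placeholder path from the doc)
cd PATH/TO/EXTRACTED/FOLDER

# Point model downloads at ModelScope; set this in the shell
# before invoking ollama
export OLLAMA_MODEL_SOURCE=modelscope

# Pull (if needed) and run the model; the download now comes
# from modelscope.cn
./ollama run deepseek-r1:7b
```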
 
@@ -142,22 +142,25 @@ For example, if you would like to run `deepseek-r1:7b` but the download speed fr
 
 By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
 
-To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` **before starting Ollama serve**, as shown below (if Ollama serve is already running, please make sure to stop it first):
+To increase the context length, you could set the environment variable `OLLAMA_NUM_CTX` **before starting Ollama serve**, as shown below (if Ollama serve is already running, please make sure to stop it first):
 
 - For **Windows** users:
 
   - Open "Command Prompt", and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - Set `IPEX_LLM_NUM_CTX` to the desired length in "Command Prompt", e.g. `set IPEX_LLM_NUM_CTX=16384`
+  - Set `OLLAMA_NUM_CTX` to the desired length in "Command Prompt", e.g. `set OLLAMA_NUM_CTX=16384`
   - Start Ollama serve through `start-ollama.bat`
 
 - For **Linux** users:
 
   - In a terminal, navigate to the extracted folder through `cd PATH/TO/EXTRACTED/FOLDER`
-  - Set `IPEX_LLM_NUM_CTX` to the desired length in the terminal, e.g. `export IPEX_LLM_NUM_CTX=16384`
+  - Set `OLLAMA_NUM_CTX` to the desired length in the terminal, e.g. `export OLLAMA_NUM_CTX=16384`
   - Start Ollama serve through `./start-ollama.sh`
 
 > [!TIP]
-> `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
+> `OLLAMA_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
 
+> [!NOTE]
+> For versions earlier than 2.7.0b20250429, please use `IPEX_LLM_NUM_CTX` instead.
+
 ### Select specific GPU(s) to run Ollama when multiple ones are available
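As a quick end-to-end check of the renamed variable, a minimal Linux session could look like the sketch below (same assumed extracted-folder layout; 16384 is just the example value from the steps above):

```bash
# Stop any running Ollama serve first: OLLAMA_NUM_CTX is only
# read when the server starts
cd PATH/TO/EXTRACTED/FOLDER

# Request a 16384-token context window; per the TIP above, this
# takes priority over num_ctx in a model's Modelfile
export OLLAMA_NUM_CTX=16384

# Start the server with the larger context window
./start-ollama.sh
```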

@@ -112,24 +112,24 @@ cd PATH/TO/EXTRACTED/FOLDER
 
 ### Speed up model download by switching the source
 
-Ollama downloads models from the Ollama library by default. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the download source for models.
+Ollama downloads models from the Ollama library by default. By setting the environment variable `OLLAMA_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the download source for models.
 
 For example, if you want to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you can switch to the model source on ModelScope as follows:
 
 - For **Windows** users:
 
   - Open "Command Prompt" and enter the extracted folder with the `cd /d PATH\TO\EXTRACTED\FOLDER` command
-  - Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
+  - Run `set OLLAMA_MODEL_SOURCE=modelscope` in "Command Prompt"
   - Run `ollama run deepseek-r1:7b`
 
 - For **Linux** users:
 
   - In a terminal other than the one running Ollama serve, enter the extracted folder with `cd PATH/TO/EXTRACTED/FOLDER`
-  - Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
+  - Run `export OLLAMA_MODEL_SOURCE=modelscope` in the terminal
   - Run `./ollama run deepseek-r1:7b`
 
 > [!Tip]
-> Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show the actual model ID when you run `ollama list`, e.g.:
+> Models downloaded with `set OLLAMA_MODEL_SOURCE=modelscope` will still show the actual model ID when you run `ollama list`, e.g.:
 > ```
 > NAME                                                             ID              SIZE      MODIFIED
 > modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M    f482d5af6aec    4.7 GB    About a minute ago
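The same switch also works for a single invocation. This is a hedged sketch relying on standard POSIX-shell semantics for `VAR=value command`, and it assumes the variable is read by the `ollama run` process, as the steps above imply; the doc itself only describes the `export` form:

```bash
# One-off override: the VAR=value prefix applies only to this
# single command's environment, leaving the shell untouched
OLLAMA_MODEL_SOURCE=modelscope ./ollama run deepseek-r1:7b

# A later run without the prefix falls back to the default
# Ollama library source
./ollama run deepseek-r1:7b
```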
 
@@ -140,22 +140,25 @@ Ollama downloads models from the Ollama library by default. By setting the envi
 
 By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
 
-To increase the context length, set the environment variable `IPEX_LLM_NUM_CTX` **before starting Ollama serve**, as follows (if Ollama serve is already running, make sure to stop it first):
+To increase the context length, set the environment variable `OLLAMA_NUM_CTX` **before starting Ollama serve**, as follows (if Ollama serve is already running, make sure to stop it first):
 
 - For **Windows** users:
 
   - Open "Command Prompt" and enter the extracted folder with the `cd /d PATH\TO\EXTRACTED\FOLDER` command
-  - In "Command Prompt", set `IPEX_LLM_NUM_CTX` to the desired length, e.g. `set IPEX_LLM_NUM_CTX=16384`
+  - In "Command Prompt", set `OLLAMA_NUM_CTX` to the desired length, e.g. `set OLLAMA_NUM_CTX=16384`
   - Start Ollama serve by running `start-ollama.bat`
 
 - For **Linux** users:
 
   - In a terminal, enter the extracted folder with `cd PATH/TO/EXTRACTED/FOLDER`
-  - In the terminal, set `IPEX_LLM_NUM_CTX` to the desired length, e.g. `export IPEX_LLM_NUM_CTX=16384`
+  - In the terminal, set `OLLAMA_NUM_CTX` to the desired length, e.g. `export OLLAMA_NUM_CTX=16384`
   - Start Ollama serve by running `./start-ollama.sh`
 
 > [!Tip]
-> `IPEX_LLM_NUM_CTX` takes priority over the `num_ctx` setting in a model's `Modelfile`.
+> `OLLAMA_NUM_CTX` takes priority over the `num_ctx` setting in a model's `Modelfile`.
 
+> [!NOTE]
+> For versions earlier than 2.7.0b20250429, please use the `IPEX_LLM_NUM_CTX` variable instead.
+
 ### Select specific GPU(s) to run Ollama when multiple are available
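Given the NOTE about releases on either side of 2.7.0b20250429, a startup wrapper could simply export both names. A hedged sketch, on the assumption (not stated in the doc) that each build ignores the variable name it does not recognize:

```bash
#!/usr/bin/env bash
# Start Ollama serve with a 16384-token context window regardless
# of version: newer builds read OLLAMA_NUM_CTX, builds earlier
# than 2.7.0b20250429 read IPEX_LLM_NUM_CTX.
export OLLAMA_NUM_CTX=16384
export IPEX_LLM_NUM_CTX=16384
./start-ollama.sh
```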