diff --git a/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md b/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md
index b98d4aff..5de4607e 100644
--- a/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md
+++ b/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.md
@@ -114,24 +114,24 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:
 
 ### Speed up model download using alternative sources
 
-Ollama by default downloads model from the Ollama library. By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you could switch the source where the model is downloaded.
+Ollama by default downloads models from the Ollama library. By setting the environment variable `OLLAMA_MODEL_SOURCE` to `modelscope` or `ollama` **before running Ollama**, you can switch the source from which the model is downloaded.
 
 For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is slow, you could download the model from ModelScope as follows:
 
 - For **Windows** users:
 
   - In the "Command Prompt", navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
+  - Run `set OLLAMA_MODEL_SOURCE=modelscope` in "Command Prompt"
   - Run `ollama run deepseek-r1:7b`
 
 - For **Linux** users:
 
   - In a terminal other than the one for Ollama serve, navigate to the extracted folder by `cd PATH\TO\EXTRACTED\FOLDER`
-  - Run `export IPEX_LLM_MODEL_SOURCE=modelscope` in the terminal
+  - Run `export OLLAMA_MODEL_SOURCE=modelscope` in the terminal
   - Run `./ollama run deepseek-r1:7b`
 
 > [!TIP]
-> Model downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show actual model id in `ollama list`, e.g.
+> A model downloaded with `set OLLAMA_MODEL_SOURCE=modelscope` will still show its actual model ID in `ollama list`, e.g.
 > ```
 > NAME                                                              ID              SIZE      MODIFIED
 > modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M     f482d5af6aec    4.7 GB    About a minute ago
@@ -142,22 +142,25 @@ For example, if you would like to run `deepseek-r1:7b` but the download speed fr
 
 By default, Ollama runs model with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
 
-To increase the context length, you could set environment variable `IPEX_LLM_NUM_CTX` **before staring Ollama Serve**, as shwon below (if Ollama serve is already running, please make sure to stop it first):
+To increase the context length, you can set the environment variable `OLLAMA_NUM_CTX` **before starting Ollama serve**, as shown below (if Ollama serve is already running, please make sure to stop it first):
 
 - For **Windows** users:
 
   - Open "Command Prompt", and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
-  - Set `IPEX_LLM_NUM_CTX` to the desired length in the "Command Prompt, e.g. `set IPEX_LLM_NUM_CTX=16384`
+  - Set `OLLAMA_NUM_CTX` to the desired length in the "Command Prompt", e.g. `set OLLAMA_NUM_CTX=16384`
   - Start Ollama serve through `start-ollama.bat`
 
 - For **Linux** users:
 
   - In a terminal, navigate to the extracted folder through `cd PATH\TO\EXTRACTED\FOLDER`
-  - Set `IPEX_LLM_NUM_CTX` to the desired length in the terminal, e.g. `export IPEX_LLM_NUM_CTX=16384`
+  - Set `OLLAMA_NUM_CTX` to the desired length in the terminal, e.g. `export OLLAMA_NUM_CTX=16384`
   - Start Ollama serve through `./start-ollama.sh`
 
 > [!TIP]
-> `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` settings in a models' `Modelfile`. 
+> `OLLAMA_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
+
+> [!NOTE]
+> For versions earlier than 2.7.0b20250429, please use `IPEX_LLM_NUM_CTX` instead.
 
 ### Select specific GPU(s) to run Ollama when multiple ones are available
 
diff --git a/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md b/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md
index 3e5db7dd..334946f6 100644
--- a/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md
+++ b/docs/mddocs/Quickstart/ollama_portable_zip_quickstart.zh-CN.md
@@ -112,24 +112,24 @@ cd PATH/TO/EXTRACTED/FOLDER
 
 ### 通过切换源提升模型下载速度
 
-Ollama 默认从 Ollama 库下载模型。通过在**运行 Ollama 之前**设置环境变量 `IPEX_LLM_MODEL_SOURCE` 为 `modelscope` 或 `ollama`,你可以切换模型的下载源。
+Ollama 默认从 Ollama 库下载模型。通过在**运行 Ollama 之前**设置环境变量 `OLLAMA_MODEL_SOURCE` 为 `modelscope` 或 `ollama`,你可以切换模型的下载源。
 
 例如,如果你想运行 `deepseek-r1:7b` 但从 Ollama 库的下载速度较慢,可以通过如下方式改用 ModelScope 上的模型源:
 
 - 对于 **Windows** 用户:
 
   - 打开命令提示符通过 `cd /d PATH\TO\EXTRACTED\FOLDER` 命令进入解压后的文件夹
-  - 在命令提示符中运行 `set IPEX_LLM_MODEL_SOURCE=modelscope`
+  - 在命令提示符中运行 `set OLLAMA_MODEL_SOURCE=modelscope`
   - 运行 `ollama run deepseek-r1:7b`
 
 - 对于 **Linux** 用户:
 
   - 在另一个终端(不同于运行 Ollama serve 的终端)中,输入指令 `cd PATH/TO/EXTRACTED/FOLDER` 进入解压后的文件夹
-  - 在终端中运行 `export IPEX_LLM_MODEL_SOURCE=modelscope`
+  - 在终端中运行 `export OLLAMA_MODEL_SOURCE=modelscope`
   - 运行 `./ollama run deepseek-r1:7b`
 
 > [!Tip]
-> 使用 `set IPEX_LLM_MODEL_SOURCE=modelscope` 下载的模型,在执行 `ollama list` 时仍会显示实际的模型 ID,例如:
+> 使用 `set OLLAMA_MODEL_SOURCE=modelscope` 下载的模型,在执行 `ollama list` 时仍会显示实际的模型 ID,例如:
 > ```
 > NAME                                                              ID              SIZE      MODIFIED
 > modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M     f482d5af6aec    4.7 GB    About a minute ago
@@ -140,22 +140,25 @@ Ollama 默认从 Ollama 库下载模型。通过在**运行 Ollama 之前**设
 
 默认情况下,Ollama 使用 2048 个 token 的上下文窗口运行模型。也就是说,模型最多能 "记住" 2048 个 token 的上下文。
 
-要增加上下文长度,可以在**启动 Ollama serve 之前**设置环境变量 `IPEX_LLM_NUM_CTX`,步骤如下(如果 Ollama serve 已经在运行,请确保先将其停止):
+要增加上下文长度,可以在**启动 Ollama serve 之前**设置环境变量 `OLLAMA_NUM_CTX`,步骤如下(如果 Ollama serve 已经在运行,请确保先将其停止):
 
 - 对于 **Windows** 用户:
 
   - 打开命令提示符,并通过 `cd /d PATH\TO\EXTRACTED\FOLDER` 命令进入解压后的文件夹
-  - 在命令提示符中将 `IPEX_LLM_NUM_CTX` 设置为所需长度,例如:`set IPEX_LLM_NUM_CTX=16384`
+  - 在命令提示符中将 `OLLAMA_NUM_CTX` 设置为所需长度,例如:`set OLLAMA_NUM_CTX=16384`
   - 通过运行 `start-ollama.bat` 启动 Ollama serve
 
 - 对于 **Linux** 用户:
 
   - 在终端中输入指令 `cd PATH/TO/EXTRACTED/FOLDER` 进入解压后的文件夹
-  - 在终端中将 `IPEX_LLM_NUM_CTX` 设置为所需长度,例如:`export IPEX_LLM_NUM_CTX=16384`
+  - 在终端中将 `OLLAMA_NUM_CTX` 设置为所需长度,例如:`export OLLAMA_NUM_CTX=16384`
   - 通过运行 `./start-ollama.sh` 启动 Ollama serve
 
 > [!Tip]
-> `IPEX_LLM_NUM_CTX` 的优先级高于模型 `Modelfile` 中设置的 `num_ctx`。
+> `OLLAMA_NUM_CTX` 的优先级高于模型 `Modelfile` 中设置的 `num_ctx`。
+
+> [!NOTE]
+> 对早于 2.7.0b20250429 的版本,请改用 `IPEX_LLM_NUM_CTX` 变量。
 
 ### 在多块 GPU 可用时选择特定的 GPU 来运行 Ollama
 
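For quick reference, below is a minimal Linux sketch of the renamed model-source switch, mirroring the steps documented above. `PATH/TO/EXTRACTED/FOLDER` is the same placeholder the docs use, and the `OLLAMA_MODEL_SOURCE` name only applies once this rename has landed; pre-rename releases documented `IPEX_LLM_MODEL_SOURCE` instead.

```bash
# Run in a terminal other than the one hosting Ollama serve,
# from the folder where the portable zip was extracted.
cd PATH/TO/EXTRACTED/FOLDER
export OLLAMA_MODEL_SOURCE=modelscope   # pull from ModelScope instead of the Ollama library
./ollama run deepseek-r1:7b             # the model still lists under its ModelScope ID in `ollama list`
```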
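Similarly, a minimal sketch of raising the context window before starting Ollama serve (stop any running instance first). The value 16384 is just the example from the docs, and on versions earlier than 2.7.0b20250429 the variable is `IPEX_LLM_NUM_CTX` rather than `OLLAMA_NUM_CTX`.

```bash
# Start Ollama serve with a 16384-token context window instead of the default 2048.
cd PATH/TO/EXTRACTED/FOLDER
export OLLAMA_NUM_CTX=16384
./start-ollama.sh
```

Because `OLLAMA_NUM_CTX` takes priority over `num_ctx` in a model's `Modelfile`, the value exported here applies to the models served by this instance.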