From 637543e1356f7c9dcf00ae58548286debe3005a0 Mon Sep 17 00:00:00 2001
From: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>
Date: Wed, 19 Feb 2025 11:04:03 +0800
Subject: [PATCH] Update Ollama portable zip QuickStart with troubleshooting (#12846)

* Update ollama portable zip quickstart with runtime configurations

* Small fix

* Update based on comments

* Small fix

* Small fix
---
 .../ollama_portablze_zip_quickstart.md | 35 ++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md b/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md
index 71aaac71..f34c974f 100644
--- a/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md
+++ b/docs/mddocs/Quickstart/ollama_portablze_zip_quickstart.md
@@ -13,6 +13,7 @@ This guide demonstrates how to use [Ollama portable zip](https://github.com/inte
 - [Step 1: Download and Unzip](#step-1-download-and-unzip)
 - [Step 2: Start Ollama Serve](#step-2-start-ollama-serve)
 - [Step 3: Run Ollama](#step-3-run-ollama)
+- [Tips & Troubleshooting](#tips--troubleshooting)
 
 ## Prerequisites
 
@@ -36,7 +37,6 @@ Double-click `start-ollama.bat` in the extracted folder to start the Ollama serv
 
 
-
 ## Step 3: Run Ollama
 
 You could then use Ollama to run LLMs on Intel GPUs as follows:
 
@@ -47,3 +47,36 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:
+
+## Tips & Troubleshooting
+
+### Speed up model download using alternative sources
+
+By default, Ollama downloads models from the [Ollama library](https://ollama.com/library). By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` before [running Ollama](#step-3-run-ollama), you can switch the source from which the model is downloaded first.
+
+For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is quite slow, you could instead use [its model source](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF) on [ModelScope](https://www.modelscope.cn/models), as follows:
+
+- Open "Command Prompt" (cmd), and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
+- Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
+- Run `ollama run deepseek-r1:7b`
+
+> [!TIP]
+> A model downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show its actual model ID in `ollama list`, e.g.
+> ```
+> NAME                                                             ID              SIZE      MODIFIED
+> modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M    f482d5af6aec    4.7 GB    About a minute ago
+> ```
+> Except for `ollama run` and `ollama pull`, the model should be identified through its actual ID, e.g. `ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M`
+
+### Increase context length in Ollama
+
+By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.
+
+To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` before [starting Ollama serve](#step-2-start-ollama-serve), as shown below:
+
+- Open "Command Prompt" (cmd), and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
+- Set `IPEX_LLM_NUM_CTX` to the desired length in "Command Prompt", e.g. `set IPEX_LLM_NUM_CTX=16384`
+- Start Ollama serve through `start-ollama.bat`
+
+> [!TIP]
+> `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
\ No newline at end of file
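
For quick reference, the model-source steps added by this patch can be run back-to-back in a single Command Prompt session. This is only a sketch that collects the commands already documented above; `PATH\TO\EXTRACTED\FOLDER` remains a placeholder for wherever the portable zip was extracted, and it assumes `IPEX_LLM_MODEL_SOURCE` is picked up by the portable zip's Ollama as described:

```cmd
REM Download and run deepseek-r1:7b from ModelScope instead of the Ollama library
cd /d PATH\TO\EXTRACTED\FOLDER
set IPEX_LLM_MODEL_SOURCE=modelscope
ollama run deepseek-r1:7b

REM Afterwards the model is listed (and removed) under its actual model ID
ollama list
ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M
```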
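
The context-length steps can be summarized in the same way, using the example value of 16384 tokens from the patch; any other length could be substituted:

```cmd
REM Set the desired context length, then start Ollama serve with it applied
cd /d PATH\TO\EXTRACTED\FOLDER
set IPEX_LLM_NUM_CTX=16384
start-ollama.bat
```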