Update Ollama portable zip QuickStart with troubleshooting (#12846)
* Update ollama portable zip quickstart with runtime configurations
* Small fix
* Update based on comments
* Small fix
* Small fix
parent bde8acc303
commit 637543e135

1 changed file with 34 additions and 1 deletion
@@ -13,6 +13,7 @@ This guide demonstrates how to use [Ollama portable zip](https://github.com/inte
- [Step 1: Download and Unzip](#step-1-download-and-unzip)
- [Step 2: Start Ollama Serve](#step-2-start-ollama-serve)
- [Step 3: Run Ollama](#step-3-run-ollama)
- [Tips & Troubleshooting](#tips--troubleshooting)

## Prerequisites

@@ -36,7 +37,6 @@ Double-click `start-ollama.bat` in the extracted folder to start the Ollama serv
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_start_ollama.png"  width=80%/>
</div>

## Step 3: Run Ollama

You could then use Ollama to run LLMs on Intel GPUs as follows:
@@ -47,3 +47,36 @@ You could then use Ollama to run LLMs on Intel GPUs as follows:
<div align="center">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_portable_run_ollama.png"  width=80%/>
</div>

## Tips & Troubleshooting

### Speed up model download using alternative sources

Ollama by default downloads models from the [Ollama library](https://ollama.com/library). By setting the environment variable `IPEX_LLM_MODEL_SOURCE` to `modelscope` or `ollama` before [running Ollama](#step-3-run-ollama), you could switch which source the model is downloaded from first.

For example, if you would like to run `deepseek-r1:7b` but the download speed from the Ollama library is quite slow, you could use [its model source](https://www.modelscope.cn/models/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF) on [ModelScope](https://www.modelscope.cn/models) instead, as shown below:

- Open "Command Prompt" (cmd), and navigate to the extracted folder by `cd /d PATH\TO\EXTRACTED\FOLDER`
- Run `set IPEX_LLM_MODEL_SOURCE=modelscope` in "Command Prompt"
- Run `ollama run deepseek-r1:7b`
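
Put together, the steps above would look like the following in a single "Command Prompt" session (the extraction path is a placeholder to replace with your own):

```cmd
:: download deepseek-r1:7b from ModelScope instead of the Ollama library
cd /d PATH\TO\EXTRACTED\FOLDER
set IPEX_LLM_MODEL_SOURCE=modelscope
ollama run deepseek-r1:7b
```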

> [!TIP]
> Models downloaded with `set IPEX_LLM_MODEL_SOURCE=modelscope` will still show their actual model id in `ollama list`, e.g.
> ```
> NAME                                                             ID              SIZE      MODIFIED
> modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M    f482d5af6aec    4.7 GB    About a minute ago
> ```
> For all commands other than `ollama run` and `ollama pull`, the model should be identified by its actual id, e.g. `ollama rm modelscope.cn/unsloth/DeepSeek-R1-Distill-Qwen-7B-GGUF:Q4_K_M`

### Increase context length in Ollama

By default, Ollama runs models with a context window of 2048 tokens. That is, the model can "remember" at most 2048 tokens of context.

To increase the context length, you could set the environment variable `IPEX_LLM_NUM_CTX` before [starting Ollama serve](#step-2-start-ollama-serve), as shown below:

- Open "Command Prompt" (cmd), and navigate to the extracted folder through `cd /d PATH\TO\EXTRACTED\FOLDER`
- Set `IPEX_LLM_NUM_CTX` to the desired length in "Command Prompt", e.g. `set IPEX_LLM_NUM_CTX=16384`
- Start Ollama serve through `start-ollama.bat`
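
As with the model source setting above, these steps can be run as one "Command Prompt" session (again, the extraction path is a placeholder):

```cmd
:: start the Ollama service with a 16384-token context window
cd /d PATH\TO\EXTRACTED\FOLDER
set IPEX_LLM_NUM_CTX=16384
start-ollama.bat
```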

> [!TIP]
> `IPEX_LLM_NUM_CTX` has a higher priority than the `num_ctx` setting in a model's `Modelfile`.
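
For reference, the `num_ctx` setting mentioned in the tip is the per-model way to change the context window. A minimal `Modelfile` sketch (the derived model name below is only an illustration) might look like:

```
FROM deepseek-r1:7b
PARAMETER num_ctx 16384
```

which could be built with `ollama create deepseek-r1-16k -f Modelfile`; even then, `IPEX_LLM_NUM_CTX` would take priority over this value, as noted above.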