add ollama quickstart (#10649)
Co-authored-by: arda <arda@arda-arc12.sh.intel.com>
This commit is contained in:
parent 1ae519ec69
commit f789c2eee4
1 changed file with 44 additions and 0 deletions
@@ -0,0 +1,44 @@
					# Run Ollama on Intel GPU

### 1 Install Ollama integrated with IPEX-LLM

First, ensure that IPEX-LLM is installed by following the instructions in the [IPEX-LLM Installation Quickstart for Windows with Intel GPU](install_windows_gpu.html), then activate your conda environment.

Run `pip install --pre --upgrade ipex-llm[cpp]`, then execute `init-ollama`; this creates a symbolic link to `ollama` in your current directory.
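
Taken together, the installation step looks roughly like this (the conda environment name is only a placeholder for whatever you created during the IPEX-LLM installation):

```shell
# Activate the conda environment created during IPEX-LLM installation
# (the environment name below is only an example)
conda activate llm

# Install the IPEX-LLM C++ backend, then create the `ollama` symlink
pip install --pre --upgrade ipex-llm[cpp]
init-ollama
```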

### 2 Verify Ollama Serve

To avoid potential proxy issues, run `export no_proxy=localhost,127.0.0.1`. Then execute `export ZES_ENABLE_SYSMAN=1` and `source /opt/intel/oneapi/setvars.sh` to enable driver initialization and the dependencies required for system management.
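
Putting the three settings together:

```shell
# Keep requests to the local service from going through a proxy
export no_proxy=localhost,127.0.0.1

# Enable system management (SYSMAN) support for the GPU driver
export ZES_ENABLE_SYSMAN=1

# Set up the oneAPI runtime environment
source /opt/intel/oneapi/setvars.sh
```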

Start the service using `./ollama serve`.

To expose the `ollama` service port and access it from another machine, use `OLLAMA_HOST=0.0.0.0 ./ollama serve`.
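
For reference, the two ways of starting the service (by default Ollama listens on `localhost:11434`, the address used by the verification command below):

```shell
# Serve on the default local address (localhost:11434)
./ollama serve

# Or expose the service to other machines on the network
OLLAMA_HOST=0.0.0.0 ./ollama serve
```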

Open another terminal and use `./ollama pull <model_name>` to download a model locally.
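
For example, assuming you want a model from the Ollama library (the model name here is only an illustration):

```shell
# Download a model into the local model store
./ollama pull llama2
```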

Verify the setup with the following command:

```shell
curl http://localhost:11434/api/generate -d '
{
  "model": "<model_name>",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

Expected results: a single JSON response containing the generated answer.

### 3 Example: Ollama Run

You can use `./ollama run <model_name>` to automatically pull and load the model for a streaming chat.
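
A minimal sketch, again using an illustrative model name:

```shell
# Pull the model if it is not already present, load it, and start an interactive chat
./ollama run llama2
```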