Ollama quickstart update (#10806)
* add ollama doc for OLLAMA_NUM_GPU
* remove useless params
* revert unexpected changes back
* move env setting to server part
* update
parent 08458b4f74
commit fbd1743b5e

1 changed file with 10 additions and 19 deletions
@@ -44,8 +44,7 @@ Activate the `llm-cpp` conda environment and initialize Ollama by executing the
 
 ### 3 Run Ollama Serve
 
-You may launch the Ollama service as below:
-
+Launch the Ollama service:
 
 ```eval_rst
 .. tabs::
@@ -53,6 +52,7 @@ Launch the Ollama service:
 
       .. code-block:: bash
 
+         export OLLAMA_NUM_GPU=999
          export no_proxy=localhost,127.0.0.1
          export ZES_ENABLE_SYSMAN=1
          source /opt/intel/oneapi/setvars.sh
@@ -65,6 +65,7 @@ Launch the Ollama service:
 
       .. code-block:: bash
 
+         set OLLAMA_NUM_GPU=999
          set no_proxy=localhost,127.0.0.1
          set ZES_ENABLE_SYSMAN=1
          call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
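Taken together, the two hunks above mean the Linux tab now initializes the environment and starts the server roughly like this (a sketch; the final `./ollama serve` line sits outside the hunks' context but is quoted in the note further down):

```bash
# Offload all model layers to the Intel GPU (999 effectively means "every layer")
export OLLAMA_NUM_GPU=999
# Keep traffic to the local server off any HTTP proxy
export no_proxy=localhost,127.0.0.1
# Enable the Level Zero Sysman API so GPU devices can be queried
export ZES_ENABLE_SYSMAN=1
# Load the Intel oneAPI toolchain and runtime libraries
source /opt/intel/oneapi/setvars.sh

# Start the Ollama server (listens on localhost:11434 by default)
./ollama serve
```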
@@ -74,6 +75,11 @@ Launch the Ollama service:
 ```
 
 ```eval_rst
+.. note::
+
+  Please set the environment variable ``OLLAMA_NUM_GPU`` to ``999`` to make sure all layers of your model are running on Intel GPU; otherwise, some layers may run on CPU.
+```
+
 .. note::
 
   To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.
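To illustrate the second note: exposing the server on the network and probing it from another machine might look like this (`<server-ip>` is a placeholder; `/api/tags` is Ollama's standard model-listing endpoint and 11434 its default port):

```bash
# Bind to all interfaces instead of the default localhost-only binding
OLLAMA_HOST=0.0.0.0 ./ollama serve

# From another machine: list the models the server knows about
curl http://<server-ip>:11434/api/tags
```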
@@ -111,8 +117,7 @@ model**, e.g. `dolphin-phi`.
          { 
             "model": "<model_name>", 
             "prompt": "Why is the sky blue?", 
-            "stream": false,
-            "options":{"num_gpu": 999}
+            "stream": false
          }'
 
    .. tab:: Windows
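After this change, the complete Linux request would read roughly as follows (the `curl` invocation itself lies above the hunk's context; `http://localhost:11434/api/generate` is Ollama's standard generate endpoint):

```bash
# num_gpu no longer needs to be passed per request, because
# OLLAMA_NUM_GPU=999 is now exported where the server is launched
curl http://localhost:11434/api/generate -d '
{
   "model": "<model_name>",
   "prompt": "Why is the sky blue?",
   "stream": false
}'
```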
@@ -125,19 +130,12 @@ model**, e.g. `dolphin-phi`.
          {
             \"model\": \"<model_name>\",
             \"prompt\": \"Why is the sky blue?\",
-            \"stream\": false,
-            \"options\":{\"num_gpu\": 999}
+            \"stream\": false
          }"
 
 ```
 
-```eval_rst
-.. note::
-
-  Please don't forget to set ``"options":{"num_gpu": 999}`` to make sure all layers of your model are running on Intel GPU, otherwise, some layers may run on CPU.
-```
-
 
 #### Using Ollama Run GGUF models
 
 Ollama supports importing GGUF models in the Modelfile, for example, suppose you have downloaded a `mistral-7b-instruct-v0.1.Q4_K_M.gguf` from [Mistral-7B-Instruct-v0.1-GGUF](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/tree/main), then you can create a file named `Modelfile`:
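A portable alternative to the cmd-style quote escaping in the Windows hunk is to keep the body in a file and let curl read it (a sketch; `body.json` is a hypothetical filename):

```bash
# Write the request body once...
cat > body.json <<'EOF'
{
  "model": "<model_name>",
  "prompt": "Why is the sky blue?",
  "stream": false
}
EOF

# ...and post it; -d @file makes curl read the body from the file
curl http://localhost:11434/api/generate -d @body.json
```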
@@ -145,16 +143,9 @@ Ollama supports importing GGUF models in the Modelfile, for example, suppose you
 ```bash
 FROM ./mistral-7b-instruct-v0.1.Q4_K_M.gguf
 TEMPLATE [INST] {{ .Prompt }} [/INST]
-PARAMETER num_gpu 999
 PARAMETER num_predict 64
 ```
 
-```eval_rst
-.. note::
-
-  Please don't forget to set ``PARAMETER num_gpu 999`` to make sure all layers of your model are running on Intel GPU, otherwise, some layers may run on CPU.
-```
-
 Then you can create the model in Ollama by `ollama create example -f Modelfile` and use `ollama run` to run the model directly on console.
 
 ```eval_rst
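The create-and-run flow mentioned in that context line, as a quick sketch (`example` is simply the model name the sentence uses):

```bash
# Register the GGUF model with Ollama under the name "example"
ollama create example -f Modelfile

# Run it in the console; a prompt can also be passed directly
ollama run example "Why is the sky blue?"
```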