Ollama quickstart update (#10806)

* add ollama doc for OLLAMA_NUM_GPU

* remove useless params

* revert unexpected changes back

* move env setting to server part

* update
SONG Ge 2024-04-19 15:00:25 +08:00 committed by GitHub
parent 08458b4f74
commit fbd1743b5e


@@ -44,8 +44,7 @@ Activate the `llm-cpp` conda environment and initialize Ollama by executing the
 ### 3 Run Ollama Serve
 
-You may launch the Ollama service as below:
+Launch the Ollama service:
 
 ```eval_rst
 .. tabs::
@@ -53,6 +52,7 @@ Launch the Ollama service:
       .. code-block:: bash
 
+         export OLLAMA_NUM_GPU=999
          export no_proxy=localhost,127.0.0.1
          export ZES_ENABLE_SYSMAN=1
          source /opt/intel/oneapi/setvars.sh
@@ -65,6 +65,7 @@ Launch the Ollama service:
       .. code-block:: bash
 
+         set OLLAMA_NUM_GPU=999
          set no_proxy=localhost,127.0.0.1
          set ZES_ENABLE_SYSMAN=1
          call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
@@ -74,6 +75,11 @@ Launch the Ollama service:
 ```
 ```eval_rst
+.. note::
+
+   Please set the environment variable ``OLLAMA_NUM_GPU`` to ``999`` to make sure all layers of your model run on the Intel GPU; otherwise, some layers may run on the CPU.
+```
+```eval_rst
 .. note::
 
    To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.
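Put together, the updated Linux launch sequence now looks roughly like the sketch below. It assumes the default oneAPI install path and that the `ollama` binary was created in the working directory during the initialization step; adjust paths to your setup.

```bash
# Sketch of the updated Linux launch sequence (binary location and oneAPI path are assumptions)
export OLLAMA_NUM_GPU=999            # run all model layers on the Intel GPU
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
source /opt/intel/oneapi/setvars.sh  # load the oneAPI environment
./ollama serve                       # prefix with OLLAMA_HOST=0.0.0.0 to accept remote connections
```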
@@ -111,8 +117,7 @@ model**, e.g. `dolphin-phi`.
          {
             "model": "<model_name>",
             "prompt": "Why is the sky blue?",
-            "stream": false,
-            "options":{"num_gpu": 999}
+            "stream": false
          }'
 
    .. tab:: Windows
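For reference, the full request after this change would look roughly like the following sketch, assuming the service listens on Ollama's default `localhost:11434` and a model such as `dolphin-phi` has already been pulled:

```bash
# Sketch: one-shot generation request against a running Ollama service
# (the bind address localhost:11434 is the Ollama default, assumed here)
curl http://localhost:11434/api/generate -d '
{
   "model": "dolphin-phi",
   "prompt": "Why is the sky blue?",
   "stream": false
}'
```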
@@ -125,19 +130,12 @@ model**, e.g. `dolphin-phi`.
          {
             \"model\": \"<model_name>\",
             \"prompt\": \"Why is the sky blue?\",
-            \"stream\": false,
-            \"options\":{\"num_gpu\": 999}
+            \"stream\": false
          }"
 ```
-```eval_rst
-.. note::
-
-  Please don't forget to set ``"options":{"num_gpu": 999}`` to make sure all layers of your model are running on Intel GPU, otherwise, some layers may run on CPU.
-```
 
 #### Using Ollama Run GGUF models
 
 Ollama supports importing GGUF models in the Modelfile, for example, suppose you have downloaded a `mistral-7b-instruct-v0.1.Q4_K_M.gguf` from [Mistral-7B-Instruct-v0.1-GGUF](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/tree/main), then you can create a file named `Modelfile`:
@@ -145,16 +143,9 @@ Ollama supports importing GGUF models in the Modelfile, for example, suppose you
 ```bash
 FROM ./mistral-7b-instruct-v0.1.Q4_K_M.gguf
 TEMPLATE [INST] {{ .Prompt }} [/INST]
-PARAMETER num_gpu 999
 PARAMETER num_predict 64
 ```
-
-```eval_rst
-.. note::
-
-  Please don't forget to set ``PARAMETER num_gpu 999`` to make sure all layers of your model are running on Intel GPU, otherwise, some layers may run on CPU.
-```
 
 Then you can create the model in Ollama by `ollama create example -f Modelfile` and use `ollama run` to run the model directly on console.
 
 ```eval_rst
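As a usage sketch for the Modelfile above: once it sits next to the downloaded `.gguf` file, the model can be registered and run from the console (the model name `example` is just an illustration):

```bash
# Sketch: register the GGUF model from the Modelfile, then run it interactively
ollama create example -f Modelfile
ollama run example
```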