Ollama quickstart update (#10806)
* add ollama doc for OLLAMA_NUM_GPU
* remove useless params
* revert unexpected changes back
* move env setting to server part
* update
Parent: 08458b4f74
Commit: fbd1743b5e

1 changed file with 10 additions and 19 deletions
@@ -44,8 +44,7 @@ Activate the `llm-cpp` conda environment and initialize Ollama by executing the

 ### 3 Run Ollama Serve

-You may launch the Ollama service as below:
+Launch the Ollama service:

 ```eval_rst
 .. tabs::
@@ -53,6 +52,7 @@ Launch the Ollama service:

 .. code-block:: bash

+   export OLLAMA_NUM_GPU=999
    export no_proxy=localhost,127.0.0.1
    export ZES_ENABLE_SYSMAN=1
    source /opt/intel/oneapi/setvars.sh
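To see how the new export slots into the overall Linux flow, here is a minimal sketch of the full launch sequence; the `./ollama serve` invocation itself sits outside this hunk, so the binary path is an assumption:

```bash
# Hypothetical end-to-end launch on Linux, assuming the ollama binary
# created during the earlier init step sits in the current directory.
export OLLAMA_NUM_GPU=999            # keep all model layers on the Intel GPU
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
source /opt/intel/oneapi/setvars.sh
./ollama serve
```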
@@ -65,6 +65,7 @@ Launch the Ollama service:

 .. code-block:: bash

+   set OLLAMA_NUM_GPU=999
    set no_proxy=localhost,127.0.0.1
    set ZES_ENABLE_SYSMAN=1
    call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
@@ -74,6 +75,11 @@ Launch the Ollama service:
 ```

+```eval_rst
+.. note::
+
+   Please set environment variable ``OLLAMA_NUM_GPU`` to ``999`` to make sure all layers of your model are running on Intel GPU, otherwise, some layers may run on CPU.
+```
 ```eval_rst
 .. note::

    To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.
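The second note above can be illustrated with a one-line variant; running the binary as `./ollama` from the init directory is an assumption carried over from the sketch above:

```bash
# Bind the service to all network interfaces instead of localhost only
# (the same environment setup as before is assumed to be in place).
OLLAMA_HOST=0.0.0.0 ./ollama serve
```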
@@ -111,8 +117,7 @@ model**, e.g. `dolphin-phi`.
 {
   "model": "<model_name>",
   "prompt": "Why is the sky blue?",
-  "stream": false,
-  "options":{"num_gpu": 999}
+  "stream": false
 }'

 .. tab:: Windows
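Since the hunk only shows the JSON body, here is a hedged sketch of the complete request it belongs to; the endpoint URL and the leading `curl ... -d` portion sit outside this diff and are assumed from Ollama's standard generate API:

```bash
# Hypothetical full request on Linux, assuming the service listens on the
# default port 11434 and <model_name> has already been pulled or created.
curl http://localhost:11434/api/generate -d '
{
  "model": "<model_name>",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```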
@@ -125,19 +130,12 @@ model**, e.g. `dolphin-phi`.
 {
   \"model\": \"<model_name>\",
   \"prompt\": \"Why is the sky blue?\",
-  \"stream\": false,
-  \"options\":{\"num_gpu\": 999}
+  \"stream\": false
 }"

 ```

-```eval_rst
-.. note::
-
-   Please don't forget to set ``"options":{"num_gpu": 999}`` to make sure all layers of your model are running on Intel GPU, otherwise, some layers may run on CPU.
-```
-
 #### Using Ollama Run GGUF models

 Ollama supports importing GGUF models in the Modelfile, for example, suppose you have downloaded a `mistral-7b-instruct-v0.1.Q4_K_M.gguf` from [Mistral-7B-Instruct-v0.1-GGUF](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/tree/main), then you can create a file named `Modelfile`:
@@ -145,16 +143,9 @@ Ollama supports importing GGUF models in the Modelfile, for example, suppose you
 ```bash
 FROM ./mistral-7b-instruct-v0.1.Q4_K_M.gguf
 TEMPLATE [INST] {{ .Prompt }} [/INST]
-PARAMETER num_gpu 999
 PARAMETER num_predict 64
 ```

-```eval_rst
-.. note::
-
-   Please don't forget to set ``PARAMETER num_gpu 999`` to make sure all layers of your model are running on Intel GPU, otherwise, some layers may run on CPU.
-```
-
 Then you can create the model in Ollama by `ollama create example -f Modelfile` and use `ollama run` to run the model directly on console.

 ```eval_rst
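For completeness, a minimal sketch of the create-and-run step described in that last context line; it assumes the `ollama` binary is on your PATH (prefix it with `./` if you run it from the init directory):

```bash
# Build an Ollama model named "example" from the Modelfile in the current
# directory, then chat with it interactively in the console.
ollama create example -f Modelfile
ollama run example
```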