From fbd1743b5ec26edf86d349a9f8d165696eea27b0 Mon Sep 17 00:00:00 2001
From: SONG Ge <38711238+sgwhat@users.noreply.github.com>
Date: Fri, 19 Apr 2024 15:00:25 +0800
Subject: [PATCH] Ollama quickstart update (#10806)

* add ollama doc for OLLAMA_NUM_GPU

* remove useless params

* revert unexpected changes back

* move env setting to server part

* update
---
 .../doc/LLM/Quickstart/ollama_quickstart.md   | 29 +++++++------------
 1 file changed, 10 insertions(+), 19 deletions(-)

diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/ollama_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/ollama_quickstart.md
index 4c893a93..a150c0df 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/ollama_quickstart.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/ollama_quickstart.md
@@ -44,8 +44,7 @@ Activate the `llm-cpp` conda environment and initialize Ollama by executing the

 ### 3 Run Ollama Serve

-
-Launch the Ollama service:
+You may launch the Ollama service as below:

 ```eval_rst
 .. tabs::
@@ -53,6 +52,7 @@ Launch the Ollama service:

       .. code-block:: bash

+         export OLLAMA_NUM_GPU=999
          export no_proxy=localhost,127.0.0.1
          export ZES_ENABLE_SYSMAN=1
          source /opt/intel/oneapi/setvars.sh
@@ -65,6 +65,7 @@ Launch the Ollama service:

       .. code-block:: bash

+         set OLLAMA_NUM_GPU=999
          set no_proxy=localhost,127.0.0.1
          set ZES_ENABLE_SYSMAN=1
          call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
@@ -74,6 +75,11 @@ Launch the Ollama service:
 ```

 ```eval_rst
+.. note::
+
+  Please set environment variable ``OLLAMA_NUM_GPU`` to ``999`` to make sure all layers of your model are running on Intel GPU, otherwise, some layers may run on CPU.
+```
+
 .. note::

   To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.
@@ -111,8 +117,7 @@ model**, e.g. `dolphin-phi`.
          {
            "model": "<model_name>",
            "prompt": "Why is the sky blue?",
-           "stream": false,
-           "options":{"num_gpu": 999}
+           "stream": false
          }'

   .. tab:: Windows
@@ -125,19 +130,12 @@ model**, e.g. `dolphin-phi`.
          {
            \"model\": \"<model_name>\",
            \"prompt\": \"Why is the sky blue?\",
-           \"stream\": false,
-           \"options\":{\"num_gpu\": 999}
+           \"stream\": false
          }"

 ```


-```eval_rst
-.. note::
-
-  Please don't forget to set ``"options":{"num_gpu": 999}`` to make sure all layers of your model are running on Intel GPU, otherwise, some layers may run on CPU.
-```
-
 #### Using Ollama Run GGUF models

 Ollama supports importing GGUF models in the Modelfile, for example, suppose you have downloaded a `mistral-7b-instruct-v0.1.Q4_K_M.gguf` from [Mistral-7B-Instruct-v0.1-GGUF](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/tree/main), then you can create a file named `Modelfile`:
@@ -145,16 +143,9 @@ Ollama supports importing GGUF models in the Modelfile, for example, suppose you
 ```bash
 FROM ./mistral-7b-instruct-v0.1.Q4_K_M.gguf
 TEMPLATE [INST] {{ .Prompt }} [/INST]
-PARAMETER num_gpu 999
 PARAMETER num_predict 64
 ```

-```eval_rst
-.. note::
-
-  Please don't forget to set ``PARAMETER num_gpu 999`` to make sure all layers of your model are running on Intel GPU, otherwise, some layers may run on CPU.
-```
-
 Then you can create the model in Ollama by `ollama create example -f Modelfile` and use `ollama run` to run the model directly on console.

 ```eval_rst
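
For reference, the Linux flow that the updated quickstart describes looks roughly like the sketch below. This is only a minimal illustration under assumptions: it assumes Ollama has already been initialized with `init-ollama` inside the `llm-cpp` conda environment, and it uses `dolphin-phi` (the example model named in the doc) as a stand-in model name.

```bash
# Terminal 1: start the Ollama service with all model layers offloaded to the Intel GPU
export OLLAMA_NUM_GPU=999
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
source /opt/intel/oneapi/setvars.sh
./ollama serve

# Terminal 2: pull the example model, then send a non-streaming generate request
export no_proxy=localhost,127.0.0.1
./ollama pull dolphin-phi
curl http://localhost:11434/api/generate -d '
{
  "model": "dolphin-phi",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```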