Update ollama quickstart (#10756)
* update windows part
* update ollama quickstart
* update ollama
* update
* small fix
* update
* meet review
This commit is contained in:
parent 47622c6a92
commit 1bd431976d
2 changed files with 125 additions and 49 deletions
@@ -15,7 +15,7 @@ IPEX-LLM's support for `llama.cpp` is now available for Linux and Windows

#### Linux

For Linux, we recommend Ubuntu 20.04 or later (Ubuntu 22.04 is preferred).

Visit the [Install IPEX-LLM on Linux with Intel GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html), follow [Install Intel GPU Driver](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-intel-gpu-driver) and [Install oneAPI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-oneapi) to install the GPU driver and Intel® oneAPI Base Toolkit 2024.0.

#### Windows

Visit the [Install IPEX-LLM on Windows with Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html), and follow [Install Prerequisites](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html#install-prerequisites) to install [Visual Studio 2022](https://visualstudio.microsoft.com/downloads/) Community Edition, the latest [GPU driver](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html), and Intel® oneAPI Base Toolkit 2024.0.
@@ -1,46 +1,76 @@

# Run Ollama with IPEX-LLM on Intel GPU

[ollama/ollama](https://github.com/ollama/ollama) is a popular framework for building and running language models on a local machine. You can now use the C++ interface of [`ipex-llm`](https://github.com/intel-analytics/ipex-llm) as an accelerated backend for `ollama` running on Intel **GPU** *(e.g., a local PC with iGPU, or a discrete GPU such as Arc, Flex and Max)*.

See the demo of running LLaMA2-7B on Intel Arc GPU below.

<video src="https://llm-assets.readthedocs.io/en/latest/_images/ollama-linux-arc.mp4" width="100%" controls></video>

## Quickstart

### 1 Install IPEX-LLM for Ollama

IPEX-LLM's support for `ollama` is now available on both Linux and Windows.

Visit the [Run llama.cpp with IPEX-LLM on Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html), follow the instructions in the [Prerequisites](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#prerequisites) section to set up your environment, and then the [Install IPEX-LLM cpp](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#install-ipex-llm-for-llama-cpp) section to install IPEX-LLM with the Ollama binaries.

**After the installation, you should have created a conda environment, named `llm-cpp` for instance, for running `ollama` commands with IPEX-LLM.**
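
In practice, the setup in that guide usually boils down to something like the following minimal sketch (the linked quickstart is authoritative; the package spec and Python version below are assumptions based on it):

```bash
# Illustrative only -- follow the linked llama.cpp quickstart for your platform.
conda create -n llm-cpp python=3.11          # create the environment the guide calls llm-cpp
conda activate llm-cpp
pip install --pre --upgrade "ipex-llm[cpp]"  # installs IPEX-LLM together with the ollama/llama.cpp binaries
```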

### 2 Initialize Ollama

Activate the `llm-cpp` conda environment and initialize Ollama by executing the commands below. A symbolic link to `ollama` will appear in your current directory.

```eval_rst
.. tabs::

   .. tab:: Linux

      .. code-block:: bash

         conda activate llm-cpp
         init-ollama

   .. tab:: Windows

      Please run the following command with **administrator privilege in Anaconda Prompt**.

      .. code-block:: bash

         conda activate llm-cpp
         init-ollama.bat
```

**Now you can use this executable file following standard ollama usage.**
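
As a quick, optional sanity check (Linux shown; the commands are only an illustration), you can confirm the link exists and responds:

```bash
ls -l ollama      # should show the symbolic link created by init-ollama
./ollama --help   # the linked binary behaves like a standard ollama install
```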

### 3 Run Ollama Serve

Launch the Ollama service:

```eval_rst
.. tabs::

   .. tab:: Linux

      .. code-block:: bash

         export no_proxy=localhost,127.0.0.1
         export ZES_ENABLE_SYSMAN=1
         source /opt/intel/oneapi/setvars.sh

         ./ollama serve

   .. tab:: Windows

      Please run the following command in Anaconda Prompt.

      .. code-block:: bash

         set no_proxy=localhost,127.0.0.1
         set ZES_ENABLE_SYSMAN=1
         call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

         ollama.exe serve
```
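
To confirm the service came up, you can hit it from another terminal (a simple illustration; 11434 is Ollama's default port):

```bash
# Keep the serve terminal running and check from a second terminal.
curl http://localhost:11434
# A healthy service typically replies with a short "Ollama is running" message.
```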

```eval_rst
@@ -56,55 +86,101 @@ The console will display messages similar to the following:

</a>

### 4 Pull Model

Keep the Ollama service running, open another terminal, and run `./ollama pull <model_name>` on Linux (`ollama.exe pull <model_name>` on Windows) to automatically pull a model, e.g. `dolphin-phi:latest`:
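
For example, pulling the model used below and then checking what is available locally might look like this (Linux shown; substitute `ollama.exe` on Windows):

```bash
./ollama pull dolphin-phi:latest   # download the model from the Ollama library
./ollama list                      # verify it now shows up locally
```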

<a href="https://llm-assets.readthedocs.io/en/latest/_images/ollama_pull.png" target="_blank">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_pull.png" width=100%; />
</a>

### 5 Using Ollama

#### Using Curl

Using `curl` is the easiest way to verify the API service and model. Execute the following commands in a terminal. **Replace `<model_name>` with your pulled model**, e.g. `dolphin-phi`.

```eval_rst
.. tabs::

   .. tab:: Linux

      .. code-block:: bash

         curl http://localhost:11434/api/generate -d '
         {
            "model": "<model_name>",
            "prompt": "Why is the sky blue?",
            "stream": false,
            "options":{"num_gpu": 999}
         }'

   .. tab:: Windows

      Please run the following command in Anaconda Prompt.

      .. code-block:: bash

         curl http://localhost:11434/api/generate -d "
         {
            \"model\": \"<model_name>\",
            \"prompt\": \"Why is the sky blue?\",
            \"stream\": false,
            \"options\":{\"num_gpu\": 999}
         }"
```

```eval_rst
.. note::

   Please don't forget to set ``"options":{"num_gpu": 999}`` to make sure all layers of your model are running on Intel GPU; otherwise, some layers may run on CPU.
```
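
If you only want the generated text rather than the whole JSON reply, one optional convenience (assuming `jq` is installed; it is not required by this guide) is to pipe the output through it:

```bash
curl -s http://localhost:11434/api/generate -d '
{
   "model": "<model_name>",
   "prompt": "Why is the sky blue?",
   "stream": false,
   "options":{"num_gpu": 999}
}' | jq -r '.response'   # print just the "response" field
```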

#### Using Ollama Run with GGUF Models

Ollama supports importing GGUF models via a Modelfile. For example, suppose you have downloaded `mistral-7b-instruct-v0.1.Q4_K_M.gguf` from [Mistral-7B-Instruct-v0.1-GGUF](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/tree/main); you can then create a file named `Modelfile`:

```bash
FROM ./mistral-7b-instruct-v0.1.Q4_K_M.gguf
TEMPLATE [INST] {{ .Prompt }} [/INST]
PARAMETER num_gpu 999
PARAMETER num_predict 64
```

```eval_rst
.. note::

   Please don't forget to set ``PARAMETER num_gpu 999`` to make sure all layers of your model are running on Intel GPU; otherwise, some layers may run on CPU.
```

Then you can create the model in Ollama with `ollama create example -f Modelfile` and use `ollama run` to run the model directly in the console.

```eval_rst
.. tabs::

   .. tab:: Linux

      .. code-block:: bash

         export no_proxy=localhost,127.0.0.1
         ./ollama create example -f Modelfile
         ./ollama run example

   .. tab:: Windows

      Please run the following command in Anaconda Prompt.

      .. code-block:: bash

         set no_proxy=localhost,127.0.0.1
         ollama.exe create example -f Modelfile
         ollama.exe run example
```
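
Optionally, you can double-check how Ollama registered the imported model before chatting with it (flag availability may vary across Ollama versions; Linux shown):

```bash
./ollama show example --modelfile   # print the Modelfile Ollama stored for "example"
./ollama list                       # "example:latest" should now be listed
```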

An example process of interacting with the model via `ollama run example` looks like the following:

<a href="https://llm-assets.readthedocs.io/en/latest/_images/ollama_gguf_demo_image.png" target="_blank">
  <img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_gguf_demo_image.png" width=100%; />
</a>