revise ollama quickstart (#10653)
This commit is contained in:
parent
f789c2eee4
commit
f84e72e7af
4 changed files with 83 additions and 18 deletions
@@ -43,6 +43,9 @@
<li>
<a href="doc/LLM/Quickstart/llama_cpp_quickstart.html">Run llama.cpp with IPEX-LLM on Intel GPU</a>
</li>
<li>
<a href="doc/LLM/Quickstart/ollama_quickstart.html">Run Ollama with IPEX-LLM on Intel GPU</a>
</li>
</ul>
</li>
<li>
@@ -28,6 +28,7 @@ subtrees:
- file: doc/LLM/Quickstart/continue_quickstart
- file: doc/LLM/Quickstart/benchmark_quickstart
- file: doc/LLM/Quickstart/llama_cpp_quickstart
- file: doc/LLM/Quickstart/ollama_quickstart
- file: doc/LLM/Overview/KeyFeatures/index
  title: "Key Features"
  subtrees:
@@ -17,6 +17,7 @@ This section includes efficient guide to show you how to:
* `Run Text Generation WebUI on Intel GPU <./webui_quickstart.html>`_
* `Run Code Copilot (Continue) in VSCode with Intel GPU <./continue_quickstart.html>`_
* `Run llama.cpp with IPEX-LLM on Intel GPU <./llama_cpp_quickstart.html>`_
* `Run Ollama with IPEX-LLM on Intel GPU <./ollama_quickstart.html>`_

.. |bigdl_llm_migration_guide| replace:: ``bigdl-llm`` Migration Guide
.. _bigdl_llm_migration_guide: bigdl_llm_migration.html
@@ -1,26 +1,70 @@
# Run Ollama on Intel GPU
# Run Ollama on Linux with Intel GPU

### 1 Install Ollama integrated with IPEX-LLM
[ollama/ollama](https://github.com/ollama/ollama) is a popular framework for building and running language models on a local machine. Now you can run Ollama with [`ipex-llm`](https://github.com/intel-analytics/ipex-llm) on Intel GPU (e.g., a local PC with iGPU, or a discrete GPU such as Arc, Flex or Max); see the demo of running LLaMA2-7B on an Intel Arc A770 GPU below.

First ensure that IPEX-LLM is installed by following the instructions in the [IPEX-LLM Installation Quickstart for Windows with Intel GPU](install_windows_gpu.html), then activate your conda environment.
<video src="https://llm-assets.readthedocs.io/en/latest/_images/ollama-linux-arc.mp4" width="100%" controls></video>
Run `pip install --pre --upgrade ipex-llm[cpp]`, then execute `init-ollama`; a symbolic link to `ollama` will appear in your current directory.
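In other words, inside the activated conda environment:

```bash
pip install --pre --upgrade ipex-llm[cpp]   # install IPEX-LLM together with the llama.cpp/Ollama binaries
init-ollama                                 # create an `ollama` symlink in the current directory
```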
### 2 Verify Ollama Serve
## Quickstart

To avoid potential proxy issues, run `export no_proxy=localhost,127.0.0.1`. Then run `export ZES_ENABLE_SYSMAN=1` to enable driver support for system management, and `source /opt/intel/oneapi/setvars.sh` to set up the required oneAPI dependencies.
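Taken together, that preparation is simply:

```bash
export no_proxy=localhost,127.0.0.1    # avoid routing local requests through a proxy
export ZES_ENABLE_SYSMAN=1             # enable Level Zero system management support
source /opt/intel/oneapi/setvars.sh    # load the oneAPI environment
```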
### 1 Install IPEX-LLM with Ollama Binaries

Start the service using `./ollama serve`. It should display something like:
Visit the [Run llama.cpp with IPEX-LLM on Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html) and follow the instructions in its [Install Prerequisites on Linux](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#linux) and [Install IPEX-LLM for llama.cpp](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#install-ipex-llm-for-llama-cpp) sections to install IPEX-LLM with Ollama binaries.
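If you only need the commands, a minimal sketch of that installation looks roughly like the following; the environment name `llm-cpp` matches the note below, while the Python version is an assumption, so follow the linked guide for the exact prerequisites:

```bash
conda create -n llm-cpp python=3.11         # Python version is an assumption; see the linked guide
conda activate llm-cpp
pip install --pre --upgrade ipex-llm[cpp]   # installs IPEX-LLM together with the llama.cpp/Ollama binaries
```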

**After the installation, you should have created a conda environment, e.g. `llm-cpp`, for running `llama.cpp` commands with IPEX-LLM.**

To expose the `ollama` service port and access it from another machine, use `OLLAMA_HOST=0.0.0.0 ./ollama serve`.
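As a quick check from the other machine, you could then query the service, for example (here `<server-ip>` is a placeholder for the host's address, and `/api/tags` simply lists the locally available models):

```bash
curl http://<server-ip>:11434/api/tags   # should return a JSON list of the pulled models
```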
### 2 Initialize Ollama

Open another terminal and use `./ollama pull <model_name>` to download a model locally.
Activate the `llm-cpp` conda environment and initialize Ollama by executing the commands below. A symbolic link to `ollama` will appear in your current directory.


```bash
conda activate llm-cpp
init-ollama
```
Verify the setup with the following command:
### 3 Run Ollama Serve
Launch the Ollama service:
```bash
conda activate llm-cpp

export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
source /opt/intel/oneapi/setvars.sh

./ollama serve
```
```eval_rst
.. note::

   To allow the service to accept connections from all IP addresses, use ``OLLAMA_HOST=0.0.0.0 ./ollama serve`` instead of just ``./ollama serve``.
```
The console will display messages similar to the following:
<a href="https://llm-assets.readthedocs.io/en/latest/_images/ollama_serve.png" target="_blank">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_serve.png" width="100%" />
</a>
### 4 Pull Model
Keep the Ollama service running, open another terminal, and pull a model, e.g. `dolphin-phi:latest`:
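For example, from the directory containing the `ollama` symlink:

```bash
./ollama pull dolphin-phi:latest   # downloads the model locally
```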
<a href="https://llm-assets.readthedocs.io/en/latest/_images/ollama_pull.png" target="_blank">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_pull.png" width="100%" />
</a>
### 5 Using Ollama
#### Using Curl
Using `curl` is the easiest way to verify the API service and model. Execute the following command in a terminal. **Replace `<model_name>` with your pulled model**, e.g. `dolphin-phi`.
```shell
curl http://localhost:11434/api/generate -d '
@@ -31,14 +75,30 @@ curl http://localhost:11434/api/generate -d '
}'
```
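For reference, a complete request might look like the following; the prompt text here is only illustrative, and `dolphin-phi` stands in for whatever model you pulled:

```shell
curl http://localhost:11434/api/generate -d '
{
  "model": "dolphin-phi",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```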
Expected results:
An example output using the `dolphin-phi` model looks like the following:

<a href="https://llm-assets.readthedocs.io/en/latest/_images/ollama_curl.png" target="_blank">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_curl.png" width="100%" />
</a>
### 3 Example: Ollama Run
You can use `./ollama run <model_name>` to automatically pull and load the model for a streaming chat.
#### Using Ollama Run

You can also use `ollama run` to run the model directly from the console. **Replace `<model_name>` with your pulled model**, e.g. `dolphin-phi`. This command will seamlessly download and load the model, and let you interact with it through a streaming conversation.

```bash
conda activate llm-cpp

export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
source /opt/intel/oneapi/setvars.sh

./ollama run <model_name>
```
An example of interacting with the model using `ollama run` looks like the following:
<a href="https://llm-assets.readthedocs.io/en/latest/_images/ollama_run_1.png" target="_blank">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_run_1.png" width="100%" /><img src="https://llm-assets.readthedocs.io/en/latest/_images/ollama_run_2.png" width="100%" />
</a>