
# Run Ollama on Intel GPU

## 1. Install Ollama integrated with IPEX-LLM

First, ensure that IPEX-LLM is installed by following the instructions in the IPEX-LLM Installation Quickstart for Windows with Intel GPU, then activate your conda environment.
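For example, assuming the environment was named `llm` during installation (substitute your own environment name):

```bash
# "llm" is a placeholder; use the environment name you created during installation
conda activate llm
```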

Run `pip install --pre --upgrade ipex-llm[cpp]`, then execute `init-ollama`; this creates a symbolic link named `ollama` in your current directory:
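```bash
pip install --pre --upgrade ipex-llm[cpp]
init-ollama   # creates the ollama symlink in the current directory
```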

## 2. Verify Ollama Serve

To avoid potential proxy issues when talking to the local server, exclude localhost from the proxy; set `ZES_ENABLE_SYSMAN=1` to enable GPU system management; and source the oneAPI environment to load the driver dependencies:
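```bash
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
source /opt/intel/oneapi/setvars.sh
```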

Start the service by running the `ollama` symlink created in the previous step:
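```bash
./ollama serve
```

On startup the server prints its configuration and listens on port 11434 by default.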


To expose the Ollama service port and access it from another machine, bind the server to all interfaces instead:
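```bash
OLLAMA_HOST=0.0.0.0 ./ollama serve
```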

Open another terminal and use `./ollama pull <model_name>` to download a model locally.
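For example, pulling a model by name (`llama2` here is only an illustrative choice):

```bash
# "llama2" is an example model name; replace it with the model you want
./ollama pull llama2
```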


Verify the setup with the following command:

```bash
curl http://localhost:11434/api/generate -d '
{
  "model": "<model_name>",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```

Expected result: a JSON object whose `response` field contains the generated answer and whose `done` field is `true`, since streaming is disabled.

## 3. Example: Ollama Run

You can use `./ollama run <model_name>` to automatically pull and load the model and start an interactive streaming chat:
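```bash
./ollama run <model_name>
```

Responses are streamed to the terminal as they are generated; type `/bye` to end the session.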
