small fix of cpp quickstart(#10829)
parent 61c67af386 · commit 1edb19c1dd
3 changed files with 41 additions and 3 deletions
@@ -15,7 +15,7 @@ This quickstart guide walks you through how to run Llama 3 on Intel GPU using `l
#### 1.1 Install IPEX-LLM for llama.cpp and Initialize
-Visit [Run llama.cpp with IPEX-LLM on Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html), and follow the instructions in section [Prerequisites](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#prerequisites) to setup and section [Install IPEX-LLM for llama.cpp](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#install-ipex-llm-for-llama-cpp) to install the IPEX-LLM with llama.cpp binaries, then follow the instructions in section [Initialize llama.cpp with IPEX-LLM](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#prerequisites) to initialize.
+Visit [Run llama.cpp with IPEX-LLM on Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html), and follow the instructions in section [Prerequisites](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#prerequisites) to setup and section [Install IPEX-LLM for llama.cpp](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#install-ipex-llm-for-llama-cpp) to install the IPEX-LLM with llama.cpp binaries, then follow the instructions in section [Initialize llama.cpp with IPEX-LLM](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#initialize-llama-cpp-with-ipex-llm) to initialize.
**After the above steps, you should have created a conda environment, named `llm-cpp` for instance, and have llama.cpp binaries in your current directory.**
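For orientation, the setup that the linked guide walks through amounts to roughly the shell session below. This is a hedged sketch: the conda environment name, Python version, package extra, and the `init-llama-cpp` helper are taken from that quickstart and may differ between releases, so follow the guide itself for the authoritative steps.

```bash
# Sketch of the setup from the linked quickstart (assumes conda is installed).
conda create -n llm-cpp python=3.11
conda activate llm-cpp

# Install IPEX-LLM with llama.cpp support
pip install --pre --upgrade ipex-llm[cpp]

# Initialize the llama.cpp binaries in an empty working directory
mkdir llama-cpp && cd llama-cpp
init-llama-cpp
```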
@@ -29,6 +29,33 @@ Suppose you have downloaded a [Meta-Llama-3-8B-Instruct-Q4_K_M.gguf](https://hug
#### 1.3 Run Llama3 on Intel GPU using llama.cpp
+##### Set Environment Variables(optional)
+
+```eval_rst
+.. note::
+
+   This is a required step for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+```
+
+Configure oneAPI variables by running the following command:
+
+```eval_rst
+.. tabs::
+
+   .. tab:: Linux
+
+      .. code-block:: bash
+
+         source /opt/intel/oneapi/setvars.sh
+
+   .. tab:: Windows
+
+      .. code-block:: bash
+
+         call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
+```
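As a quick sanity check after configuring oneAPI (assuming the oneAPI Base Toolkit provides the `sycl-ls` utility on your PATH once `setvars` has run), you can list the visible SYCL devices and confirm the Intel GPU shows up:

```bash
# Lists SYCL backends/devices; the Intel GPU should appear as a level_zero device
# when the oneAPI environment is configured correctly.
sycl-ls
```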
+##### Run llama3
Under your current directory, execute the command below to run inference with Llama3:
```eval_rst
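Purely as an illustration of the kind of invocation this section describes, a llama.cpp run with the downloaded GGUF looks roughly like the following; the flags, prompt, and model path here are assumptions rather than the guide's authoritative command.

```bash
# Illustrative only: run the downloaded GGUF on the Intel GPU via the
# IPEX-LLM-initialized llama.cpp binaries. -ngl offloads layers to the GPU;
# adjust the model path, prompt, and thread count for your machine.
./main -m ./Meta-Llama-3-8B-Instruct-Q4_K_M.gguf \
       -n 32 --prompt "Once upon a time" \
       -t 8 -e -ngl 33 --color
```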
@@ -99,6 +126,7 @@ Launch the Ollama service:
export ZES_ENABLE_SYSMAN=1
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export OLLAMA_NUM_GPU=999
+# Below is a required step for APT or offline installed oneAPI. Skip below step for PIP-installed oneAPI.
source /opt/intel/oneapi/setvars.sh

./ollama serve
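Once the service is up it listens on Ollama's default port 11434, so a minimal way to confirm it is reachable (no model needs to be pulled yet) is:

```bash
# Query the local Ollama HTTP API; an empty model list is fine on a fresh setup.
curl http://localhost:11434/api/tags
```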
@@ -112,6 +140,7 @@ Launch the Ollama service:
set no_proxy=localhost,127.0.0.1
set ZES_ENABLE_SYSMAN=1
set OLLAMA_NUM_GPU=999
+# Below is a required step for APT or offline installed oneAPI. Skip below step for PIP-installed oneAPI.
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

ollama serve
@@ -124,7 +153,7 @@ Launch the Ollama service:
To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.
```
-#### 2.2.2 Using Ollama Run Llama3
+##### 2.2.2 Using Ollama Run Llama3
Keep the Ollama service on, open another terminal, and run llama3 with `ollama run`:
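A minimal sketch of that second-terminal step; `llama3` here is the stock Ollama model name, so substitute whichever Llama 3 model or tag you actually set up:

```bash
# Run Llama 3 through the Ollama front end started above.
export no_proxy=localhost,127.0.0.1
./ollama run llama3
```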
@@ -82,7 +82,14 @@ Then you can use following command to initialize `llama.cpp` with IPEX-LLM:
Here we provide a simple example to show how to run a community GGUF model with IPEX-LLM.
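If you do not yet have a GGUF file locally, one way to fetch a community model is with `huggingface-cli`; the repository id and filename below are illustrative assumptions, not part of the guide.

```bash
# Example only: download a quantized Llama 3 GGUF from the Hugging Face Hub.
pip install -U huggingface_hub
huggingface-cli download QuantFactory/Meta-Llama-3-8B-Instruct-GGUF \
    Meta-Llama-3-8B-Instruct.Q4_K_M.gguf --local-dir ./models
```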
-#### Set Environment Variables
+#### Set Environment Variables(optional)
+```eval_rst
+.. note::
+
+   This is a required step for APT or offline installed oneAPI. Skip this step for PIP-installed oneAPI.
+```
Configure oneAPI variables by running the following command:
```eval_rst
@@ -55,6 +55,7 @@ You may launch the Ollama service as below:
export OLLAMA_NUM_GPU=999
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
+# Below is a required step for APT or offline installed oneAPI. Skip below step for PIP-installed oneAPI.
source /opt/intel/oneapi/setvars.sh

./ollama serve
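Because these variables have to be exported in every new shell, one optional convenience (a sketch, not part of the guide) is to wrap the launch steps above in a small script:

```bash
#!/bin/bash
# start-ollama.sh - illustrative wrapper around the launch steps above.
export OLLAMA_NUM_GPU=999
export no_proxy=localhost,127.0.0.1
export ZES_ENABLE_SYSMAN=1
# Required for APT or offline installed oneAPI; skip for PIP-installed oneAPI.
source /opt/intel/oneapi/setvars.sh
exec ./ollama serve
```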
@@ -68,6 +69,7 @@ You may launch the Ollama service as below:
set OLLAMA_NUM_GPU=999
set no_proxy=localhost,127.0.0.1
set ZES_ENABLE_SYSMAN=1
+# Below is a required step for APT or offline installed oneAPI. Skip below step for PIP-installed oneAPI.
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

ollama serve