Update quickstart (#10654)
This commit is contained in: parent f84e72e7af, commit 7c08d83d9e
2 changed files with 22 additions and 19 deletions
@@ -1,25 +1,26 @@
# Run llama.cpp with IPEX-LLM on Intel GPU
Now you can use IPEX-LLM as an Intel GPU accelerated backend of [llama.cpp](https://github.com/ggerganov/llama.cpp). This quickstart guide walks you through setting up and using [llama.cpp](https://github.com/ggerganov/llama.cpp) with `ipex-llm` on Intel GPU (both iGPU and dGPU).

[ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp) provides fast LLM inference in pure C++ across a variety of hardware; you can now use the C++ interface of `ipex-llm` as an accelerated backend for `llama.cpp` running on Intel **GPU** *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*.
```eval_rst
.. note::

   See the demo of running LLaMA2-7B on Intel Arc GPU below.

   ``llama.cpp`` now provides Q-series (Q4_0 / Q4_1 / Q8_0 / Q4_K / Q5_K / Q6_K / ...) and IQ-series (IQ2 / IQ3 / IQ4 / ...) quantization types. Only Q-series GGUF models are currently supported in IPEX-LLM; support for the IQ-series is still a work in progress.
```
<video src="https://llm-assets.readthedocs.io/en/latest/_images/llama-cpp-arc.mp4" width="100%" controls></video>
## 0 Prerequisites

## Quick Start

This quickstart guide walks you through installing and running `llama.cpp` with `ipex-llm`.

### 0 Prerequisites

IPEX-LLM's support for `llama.cpp` is now available on both Linux and Windows.
### Linux

#### Linux

For Linux systems, we recommend Ubuntu 20.04 or later (Ubuntu 22.04 is preferred).
Visit the [Install IPEX-LLM on Linux with Intel GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html) guide, and follow [Install Intel GPU Driver](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-intel-gpu-driver) and [Install oneAPI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-oneapi) to install the GPU driver and [Intel® oneAPI Base Toolkit 2024.0](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html).
### Windows

#### Windows

Visit the [Install IPEX-LLM on Windows with Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html), and follow [Install Prerequisites](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html#install-prerequisites) to install [Visual Studio 2022](https://visualstudio.microsoft.com/downloads/) Community Edition, the latest [GPU driver](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html), and Intel® oneAPI Base Toolkit 2024.0.
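As an optional sanity check after these prerequisites (assuming the oneAPI environment has been activated, e.g. via `setvars`), you can list the SYCL devices and confirm the Intel GPU is visible:

```bash
# sycl-ls ships with the oneAPI Base Toolkit; an Intel iGPU/dGPU should
# appear as a level_zero (and usually also an opencl) GPU device
sycl-ls
```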
## 1 Install IPEX-LLM for llama.cpp

### 1 Install IPEX-LLM for llama.cpp

To use `llama.cpp` with IPEX-LLM, first ensure that `ipex-llm[cpp]` is installed.
```cmd
@@ -30,7 +31,7 @@ pip install --pre --upgrade ipex-llm[cpp]
**After the installation, you should have created a conda environment (named `llm-cpp`, for instance) for running `llama.cpp` commands with IPEX-LLM.**
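Putting the pieces together, a minimal install sketch might look like the following (assumes conda is available; the environment name follows the guide's `llm-cpp` convention, and the Python version is illustrative):

```bash
# Create and activate a dedicated conda environment (name per the guide;
# the Python version is an assumption, not mandated by this quickstart)
conda create -n llm-cpp python=3.11
conda activate llm-cpp

# Install IPEX-LLM with llama.cpp support, as shown in the hunk context above
pip install --pre --upgrade ipex-llm[cpp]
```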
## 2 Setup for running llama.cpp

### 2 Setup for running llama.cpp

First, create a directory to use for `llama.cpp`; for instance, use the following command to create a `llama-cpp` directory and enter it.
```cmd
@@ -38,7 +39,7 @@ mkdir llama-cpp
cd llama-cpp
```
### Initialize llama.cpp with IPEX-LLM

#### Initialize llama.cpp with IPEX-LLM

Then you can use the following command to initialize `llama.cpp` with IPEX-LLM:
```eval_rst
@@ -75,11 +76,11 @@ Then you can use following command to initialize `llama.cpp` with IPEX-LLM:
**Now you can use these executable files following standard llama.cpp usage.**
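As a sketch of what this initialization step typically looks like on Linux (the `init-llama-cpp` script is assumed to come with the `ipex-llm[cpp]` package; on Windows the counterpart would be `init-llama-cpp.bat`):

```bash
# Run inside the llama-cpp directory created above, with the conda env active;
# this links the llama.cpp executables (e.g. main, quantize) into the
# current directory so they can be used the standard llama.cpp way
init-llama-cpp
ls
```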
## 3 Example: Running community GGUF models with IPEX-LLM

### 3 Example: Running community GGUF models with IPEX-LLM

Here we provide a simple example to show how to run a community GGUF model with IPEX-LLM.
### Set Environment Variables

#### Set Environment Variables

Configure oneAPI variables by running the following command:
```eval_rst
@@ -97,10 +98,10 @@ Configure oneAPI variables by running the following command:
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
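The tab shown above is the Windows command; on Linux the corresponding step is typically to source the oneAPI environment script (the path below assumes the default oneAPI install location):

```bash
# Source the oneAPI environment (default install path assumed)
source /opt/intel/oneapi/setvars.sh
```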
### Model Download

#### Model Download

Before running, you should download or copy a community GGUF model to your current directory. For instance, use `mistral-7b-instruct-v0.1.Q4_K_M.gguf` from [Mistral-7B-Instruct-v0.1-GGUF](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/tree/main).
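One way to fetch that file from the command line (assuming the `huggingface_hub` package and its CLI are installed; the repo and file names come from the link above):

```bash
# Download the GGUF file into the current directory
pip install huggingface_hub
huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.1-GGUF \
    mistral-7b-instruct-v0.1.Q4_K_M.gguf --local-dir .
```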
### Run the quantized model

#### Run the quantized model
```eval_rst
.. tabs::
@@ -125,7 +126,7 @@ Before running, you should download or copy community GGUF model to your current
For more details about the meaning of each parameter, you can use ``main.exe -h``.
```
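For illustration, a representative Linux invocation might look like the following (flag values are illustrative, not prescriptive; `-ngl` controls how many layers are offloaded to the GPU):

```bash
# Generate 32 tokens from the downloaded model with GPU offload;
# -t sets CPU threads, -e enables escape processing in the prompt
./main -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -n 32 \
    --prompt "Once upon a time, there existed a little girl" \
    -t 8 -e -ngl 33 --color
```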
### Sample Output

#### Sample Output
```
Log start
main: build = 1 (38bcbd4)
@@ -254,7 +255,7 @@ llama_print_timings: total time = xx.xx ms / 62 tokens
Log end
```
## Troubleshooting

### Troubleshooting

### Fail to quantize model

#### Fail to quantize model

If you encounter `main: failed to quantize model from xxx`, please make sure you have created the related output directory.
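For example, if you pass the quantizer an output path in a directory that does not yet exist (the paths below are hypothetical), create the directory first:

```bash
# Create the output directory before quantizing (paths are hypothetical)
mkdir -p ./output
./quantize mistral-7b-instruct-v0.1.gguf ./output/mistral-7b-q4_0.gguf Q4_0
```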
@@ -1,6 +1,8 @@
# Run Text Generation WebUI on Intel GPU
The [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) provides a user-friendly GUI for anyone to run LLM locally; by porting it to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm), users can now easily run LLM in [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui) on Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); see the demo of running LLaMA2-7B on an Intel Core Ultra laptop below.

The [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) provides a user-friendly GUI for anyone to run LLM locally; by porting it to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm), users can now easily run LLM in [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui) on Intel **GPU** *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*.

See the demo of running LLaMA2-7B on an Intel Core Ultra laptop below.
<video src="https://llm-assets.readthedocs.io/en/latest/_images/webui-mtl.mp4" width="100%" controls></video>