Update quickstart (#10654)
This commit is contained in:
parent f84e72e7af
commit 7c08d83d9e
2 changed files with 22 additions and 19 deletions

@ -1,25 +1,26 @@
# Run llama.cpp with IPEX-LLM on Intel GPU 

Now you can use IPEX-LLM as an Intel GPU accelerated backend of [llama.cpp](https://github.com/ggerganov/llama.cpp). This quickstart guide walks you through setting up and using [llama.cpp](https://github.com/ggerganov/llama.cpp) with `ipex-llm` on Intel GPU (both iGPU and dGPU).
[ggerganov/llama.cpp](https://github.com/ggerganov/llama.cpp) provides fast LLM inference in pure C++ across a variety of hardware; you can now use the C++ interface of `ipex-llm` as an accelerated backend for `llama.cpp` running on Intel **GPU** *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*.

```eval_rst
.. note::
See the demo of running LLaMA2-7B on Intel Arc GPU below.

   ``llama.cpp`` now provides Q-series (Q4_0 / Q4_1 / Q8_0 / Q4_K / Q5_K / Q6_K / ...) and IQ-series (IQ2 / IQ3 / IQ4 / ...) quantization types. Only Q-series GGUF models are supported in IPEX-LLM for now; support for the IQ-series is still a work in progress.
```

<video src="https://llm-assets.readthedocs.io/en/latest/_images/llama-cpp-arc.mp4" width="100%" controls></video>

## 0 Prerequisites
## Quick Start
This quickstart guide walks you through installing and running `llama.cpp` with `ipex-llm`.

### 0 Prerequisites
IPEX-LLM's support for `llama.cpp` is now available on both Linux and Windows.

### Linux
#### Linux
For Linux, we recommend Ubuntu 20.04 or later (Ubuntu 22.04 is preferred).

Visit [Install IPEX-LLM on Linux with Intel GPU](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html), and follow [Install Intel GPU Driver](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-intel-gpu-driver) and [Install oneAPI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_linux_gpu.html#install-oneapi) to install the GPU driver and [Intel® oneAPI Base Toolkit 2024.0](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html).
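
For reference, once the Intel graphics and oneAPI apt repositories have been added as described in the linked guide, installing the toolkit is typically a single package install; a minimal sketch for Ubuntu (the `intel-basekit` package pulls in the full Base Toolkit, and the linked guide remains authoritative):

```bash
# Assumes the Intel oneAPI apt repository has already been configured (see the linked guide)
sudo apt update
sudo apt install -y intel-basekit   # Intel oneAPI Base Toolkit, including the SYCL runtime used by llama.cpp with IPEX-LLM
```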

### Windows
#### Windows
Visit the [Install IPEX-LLM on Windows with Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html), and follow [Install Prerequisites](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/install_windows_gpu.html#install-prerequisites) to install [Visual Studio 2022](https://visualstudio.microsoft.com/downloads/) Community Edition, the latest [GPU driver](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html), and Intel® oneAPI Base Toolkit 2024.0.

## 1 Install IPEX-LLM for llama.cpp
### 1 Install IPEX-LLM for llama.cpp

To use `llama.cpp` with IPEX-LLM, first ensure that `ipex-llm[cpp]` is installed.
```cmd

@ -30,7 +31,7 @@ pip install --pre --upgrade ipex-llm[cpp]

**After the installation, you should have created a conda environment (e.g., `llm-cpp`) for running `llama.cpp` commands with IPEX-LLM.**
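
For reference, a typical environment setup might look like the following; the environment name and Python version here are illustrative:

```cmd
conda create -n llm-cpp python=3.11
conda activate llm-cpp
pip install --pre --upgrade ipex-llm[cpp]
```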

## 2 Setup for running llama.cpp
### 2 Setup for running llama.cpp

First, create a directory for `llama.cpp`; for instance, use the following command to create a `llama-cpp` directory and enter it.
```cmd

@ -38,7 +39,7 @@ mkdir llama-cpp
cd llama-cpp
```

### Initialize llama.cpp with IPEX-LLM
#### Initialize llama.cpp with IPEX-LLM

Then you can use the following command to initialize `llama.cpp` with IPEX-LLM:
```eval_rst

@ -75,11 +76,11 @@ Then you can use following command to initialize `llama.cpp` with IPEX-LLM:

**Now you can use these executable files following standard llama.cpp usage.**
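
For reference, the initialization step above typically boils down to running the `init-llama-cpp` script that ships with `ipex-llm[cpp]` inside the conda environment; a minimal Linux sketch (the Windows counterpart is `init-llama-cpp.bat`):

```bash
conda activate llm-cpp
# Places links to the IPEX-LLM build of the llama.cpp executables (main, quantize, ...) in the current directory
init-llama-cpp
```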

## 3 Example: Running community GGUF models with IPEX-LLM
### 3 Example: Running community GGUF models with IPEX-LLM

Here we provide a simple example to show how to run a community GGUF model with IPEX-LLM.

### Set Environment Variables
#### Set Environment Variables
Configure oneAPI variables by running the following command:

```eval_rst

@ -97,10 +98,10 @@ Configure oneAPI variables by running the following command:
         call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```
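
On Linux, the corresponding step is typically to source the oneAPI environment script; the path below assumes the default install location:

```bash
source /opt/intel/oneapi/setvars.sh
```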

### Model Download
#### Model Download
Before running, you should download or copy a community GGUF model to your current directory. For instance, `mistral-7b-instruct-v0.1.Q4_K_M.gguf` from [Mistral-7B-Instruct-v0.1-GGUF](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/tree/main).
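
For reference, one way to fetch that file is with the `huggingface-cli` tool from the `huggingface_hub` package (any download method works; the tool choice here is an assumption):

```cmd
pip install huggingface_hub
huggingface-cli download TheBloke/Mistral-7B-Instruct-v0.1-GGUF mistral-7b-instruct-v0.1.Q4_K_M.gguf --local-dir .
```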

### Run the quantized model
#### Run the quantized model

```eval_rst
.. tabs::

@ -125,7 +126,7 @@ Before running, you should download or copy community GGUF model to your current
      For more details about the meaning of each parameter, you can use ``main.exe -h``.
```
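
For reference, a typical invocation on Linux might look like the following (use `main.exe` on Windows); the prompt, thread count, and `-ngl` value are illustrative and should be tuned for your device:

```bash
./main -m mistral-7b-instruct-v0.1.Q4_K_M.gguf -n 32 --prompt "Once upon a time, there existed a little girl" -t 8 -e -ngl 33 --color
```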

### Sample Output
#### Sample Output
```
Log start
main: build = 1 (38bcbd4)

@ -254,7 +255,7 @@ llama_print_timings:       total time =    xx.xx ms /    62 tokens
Log end
```

## Troubleshooting
### Troubleshooting

### Fail to quantize model
#### Fail to quantize model
If you encounter `main: failed to quantize model from xxx`, please make sure you have created the related output directory.
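
For instance, if the quantized model is to be written under an `output/` directory, create that directory before running `quantize`; the file names and quantization type below are illustrative:

```bash
mkdir output
./quantize mistral-7b-f16.gguf output/mistral-7b-q4_0.gguf Q4_0
```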

@ -1,6 +1,8 @@
# Run Text Generation WebUI on Intel GPU

The [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) provides a user-friendly GUI for anyone to run LLMs locally; by porting it to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm), users can now easily run LLMs in [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui) on Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); see the demo of running LLaMA2-7B on an Intel Core Ultra laptop below.
The [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) provides a user-friendly GUI for anyone to run LLMs locally; by porting it to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm), users can now easily run LLMs in [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui) on Intel **GPU** *(e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max)*.

See the demo of running LLaMA2-7B on an Intel Core Ultra laptop below.

<video src="https://llm-assets.readthedocs.io/en/latest/_images/webui-mtl.mp4" width="100%" controls></video>