Update WebUI Quickstart (#10630)

Jason Dai 2024-04-02 21:49:19 +08:00 committed by GitHub
parent fd384ddfb8
commit e184c480d2
2 changed files with 42 additions and 27 deletions


@@ -12,9 +12,9 @@ This section includes efficient guide to show you how to:
* `Install IPEX-LLM on Linux with Intel GPU <./install_linux_gpu.html>`_
* `Install IPEX-LLM on Windows with Intel GPU <./install_windows_gpu.html>`_
* `Install IPEX-LLM in Docker on Windows with Intel GPU <./docker_windows_gpu.html>`_
* `Use Text Generation WebUI on Windows with Intel GPU <./webui_quickstart.html>`_
* `Conduct Performance Benchmarking with IPEX-LLM <./benchmark_quickstart.html>`_
* `Use llama.cpp with IPEX-LLM on Intel GPU <./llama_cpp_quickstart.html>`_
* `Run Text Generation WebUI on Intel GPU <./webui_quickstart.html>`_
* `Run Performance Benchmarking with IPEX-LLM <./benchmark_quickstart.html>`_
* `Run llama.cpp with IPEX-LLM on Intel GPU <./llama_cpp_quickstart.html>`_
.. |bigdl_llm_migration_guide| replace:: ``bigdl-llm`` Migration Guide
.. _bigdl_llm_migration_guide: bigdl_llm_migration.html


@@ -1,29 +1,45 @@
# Run Text Generation WebUI on Intel GPU
# Use Text Generation WebUI on Windows with Intel GPU
The [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) provides a user-friendly GUI for anyone to run LLMs locally; by porting it to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm), users can now easily run LLMs in [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui) on Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); see the demo of running LLaMA2-7B on an Intel Core Ultra laptop below.
This quickstart guide walks you through setting up and using the [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui) (a Gradio WebUI for running Large Language Models) with `ipex-llm`.
```eval_rst
.. raw:: html

   <table width="100%">
     <tr>
       <td>
         <video width=100% controls>
           <source src="https://llm-assets.readthedocs.io/en/latest/_images/webui-mtl.mp4">
           Your browser does not support the video tag.
         </video>
       </td>
     </tr>
   </table>
```
## Quickstart
This quickstart guide walks you through setting up and using the [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui) with `ipex-llm`.
A preview of the WebUI in action is shown below:
<a href="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_chat.png" target="_blank">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_chat.png" width=100%; />
<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_chat.png" width=80%; />
</a>
## 1 Install IPEX-LLM
### 1 Install IPEX-LLM
To use the WebUI, first ensure that IPEX-LLM is installed. Follow the instructions on the [IPEX-LLM Installation Quickstart for Windows with Intel GPU](install_windows_gpu.html).
**After the installation, you should have created a conda environment, named `llm` for instance, for running `ipex-llm` applications.**
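As a quick reference, the installation steps from that guide look roughly like the following sketch (follow the linked guide for the exact, up-to-date commands and the correct wheel index for your setup):

```
conda create -n llm python=3.11 libuv
conda activate llm
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```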
## 2 Install the WebUI
### 2 Install the WebUI
### Download the WebUI
#### Download the WebUI
Download the `text-generation-webui` with IPEX-LLM integrations from [this link](https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/ipex-llm.zip). Unzip the content into a directory, e.g., `C:\text-generation-webui`.
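If you prefer the command line, recent Windows releases bundle `curl` and `tar` (which can extract zip archives), so a hypothetical equivalent is:

```
curl -L -o text-generation-webui.zip https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/ipex-llm.zip
tar -xf text-generation-webui.zip
ren text-generation-webui-ipex-llm text-generation-webui
```

The folder name `text-generation-webui-ipex-llm` assumes GitHub's usual `<repo>-<branch>` naming for branch archives.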
### Install Dependencies
#### Install Dependencies
Open **Anaconda Prompt** and activate the conda environment you have created in [section 1](#1-install-ipex-llm), e.g., `llm`.
```
@@ -35,9 +51,9 @@ cd C:\text-generation-webui
pip install -r requirements_cpu_only.txt
```
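Putting the steps together, the dependency installation typically looks like this in **Anaconda Prompt** (the folder path assumes the unzip location suggested above):

```
conda activate llm
cd C:\text-generation-webui
pip install -r requirements_cpu_only.txt
```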
## 3 Start the WebUI Server
### 3 Start the WebUI Server
### Set Environment Variables
#### Set Environment Variables
Configure oneAPI variables by running the following command in **Anaconda Prompt**:
```eval_rst
@@ -55,7 +71,7 @@ set SYCL_CACHE_PERSISTENT=1
set BIGDL_LLM_XMX_DISABLED=1
```
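For reference, a complete environment setup in **Anaconda Prompt** might look like the sketch below; the `setvars.bat` path is the default oneAPI installation location and may differ on your machine:

```
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
set SYCL_CACHE_PERSISTENT=1
set BIGDL_LLM_XMX_DISABLED=1
```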
### Launch the Server
#### Launch the Server
In **Anaconda Prompt** with the conda environment `llm` activated, navigate to the text-generation-webui folder and start the server using the following command:
```eval_rst
@@ -68,16 +84,16 @@ In **Anaconda Prompt** with the conda environment `llm` activated, navigate to t
python server.py --load-in-4bit
```
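`server.py` accepts further standard text-generation-webui flags; for example, adding `--listen` exposes the server on your local network instead of only on localhost:

```
python server.py --load-in-4bit --listen
```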
### Access the WebUI
#### Access the WebUI
Upon successful launch, URLs to access the WebUI will be displayed in the terminal as shown below. Open the provided local URL in your browser to interact with the WebUI.
<a href="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_launch_server.png" target="_blank">
<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_launch_server.png" width=100%; />
</a>
## 4. Using the WebUI
### 4. Using the WebUI
### Model Download
#### Model Download
Place Hugging Face models in `C:\text-generation-webui\models` by either copying them locally or downloading them via the WebUI. To download, navigate to the **Model** tab, enter the model's Hugging Face id (for instance, `microsoft/phi-1_5`) in the **Download model or LoRA** section, and click **Download**, as illustrated below.
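Alternatively, models can be fetched from the command line with `huggingface-cli` (assuming the `huggingface_hub` package is installed); the target folder below mirrors the WebUI's expected `models` layout:

```
pip install huggingface_hub
huggingface-cli download microsoft/phi-1_5 --local-dir C:\text-generation-webui\models\phi-1_5
```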
@@ -91,7 +107,7 @@ After copying or downloading the models, click on the blue **refresh** button to
<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_select_model.png" width=100%; />
</a>
### Load Model
#### Load Model
Default settings are recommended for most users. Click **Load** to activate the model. Address any errors by installing missing packages as prompted, and ensure compatibility with your version of the transformers package. Refer to the [troubleshooting section](#troubleshooting) for more details.
@@ -101,7 +117,7 @@ If everything goes well, you will get a message as shown below.
<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_load_model_success.png" width=100%; />
</a>
### Chat with the Model
#### Chat with the Model
In the **Chat** tab, start new conversations with **New chat**.
@@ -118,13 +134,13 @@ Enter prompts into the textbox at the bottom and press the **Generate** button t
* Please see [Chat-Tab Wiki](https://github.com/oobabooga/text-generation-webui/wiki/01-%E2%80%90-Chat-Tab) for more details. -->
### Exit the WebUI
#### Exit the WebUI
To shut down the WebUI server, use **Ctrl+C** in the **Anaconda Prompt** terminal where the WebUI Server is running, then close your browser tab.
## 5. Advanced Usage
### Using Instruct mode
### 5. Advanced Usage
#### Using Instruct mode
Instruction-following models are fine-tuned with specific prompt formats.
For these models, you should ideally use the `instruct` chat mode.
In this mode, user prompts are formatted according to the prompt format the model was trained with, as illustrated below.
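For illustration, Llama-2-chat models were trained with a format that wraps each turn in special tags; in `instruct` mode the WebUI applies such a template for you, so a prompt is rendered roughly as:

```
[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

What is IPEX-LLM? [/INST]
```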
@@ -148,7 +164,7 @@ You can also manually select an instruction template from `Saved instruction tem
You can add custom template files to this list in `/instruction-templates/` [folder](https://github.com/intel-analytics/text-generation-webui/tree/ipex-llm/instruction-templates).
<!-- For instance, the automatically loaded instruction template for `chatGLM3` model is incorrect, and you should manually select the `chatGLM3` instruction template. -->
### Tested models
#### Tested models
We have tested the following models with `ipex-llm` using Text Generation WebUI.
| Model | Notes |
@@ -159,13 +175,13 @@ We have tested the following models with `ipex-llm` using Text Generation WebUI.
| qwen-7B-Chat | |
## Troubleshooting
### Troubleshooting
### Potentially slower first response
The first response to a user prompt might be slower than expected, with delays of up to several minutes before the response is generated. This delay occurs because the GPU kernels require compilation and initialization, which varies across different GPU types.
### Missing Required Dependencies
#### Missing Required Dependencies
During model loading, you may encounter an **ImportError** like `ImportError: This modeling file requires the following packages that were not found in your environment`. This indicates that certain packages required by the model are absent from your environment. Detailed instructions for installing the necessary packages can be found at the bottom of the error message. Take the following steps to fix these errors:
@@ -175,8 +191,7 @@ During model loading, you may encounter an **ImportError** like `ImportError: Th
If errors about missing packages persist, repeat the installation process for any additional required packages.
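For example, Qwen-family models commonly report `tiktoken` and `transformers_stream_generator` as missing; a hypothetical fix-up (the exact package names depend on your error message) would be:

```
pip install tiktoken transformers_stream_generator einops
```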
### Compatibility issues
#### Compatibility issues
If you encounter **AttributeError** errors like `AttributeError: 'BaichuanTokenizer' object has no attribute 'sp_model'`, this usually means the model's code is outdated and incompatible with the current version of the transformers package. In such instances, using a more recent model is recommended.
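If you need to stay on an older model, one possible workaround is pinning an older transformers release (the version below is illustrative; check the model card for the version it was tested against):

```
pip install transformers==4.33.3
```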
<!--
<a href="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_load_model_error.png">