Add webUI quickstart (#10266)
* Add webUI quickstart
* Add GPU driver install
* Move images to readthedocs assets

docs/readthedocs/source/doc/LLM/Quickstart/webui_quickstart.md (new file, 139 additions)

# WebUI quickstart on Windows

This quickstart tutorial provides a step-by-step guide on how to use Text-Generation-WebUI to run Hugging Face transformers-based applications on BigDL-LLM.

The WebUI is ported from [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui).

## 1. Install and set up WebUI

### 1.1 Install GPU driver

* Download and install Visual Studio 2022 Community Edition from the [official Microsoft Visual Studio website](https://visualstudio.microsoft.com/downloads/). Ensure you select the **Desktop development with C++** workload during the installation process.

> Note: The installation may take around 15 minutes and requires at least 7GB of free disk space.
> If you accidentally skip adding the **Desktop development with C++** workload during the initial setup, you can add it afterward by navigating to **Tools > Get Tools and Features...**. Follow the instructions on [this Microsoft guide](https://learn.microsoft.com/en-us/cpp/build/vscpp-step-0-installation?view=msvc-170#step-4---choose-workloads) to update your installation.
>
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_1.png" alt="image-20240221102252560" width=100%; />

* Download and install the latest GPU driver from the [official Intel download page](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html). A system reboot is necessary to apply the changes after the installation is complete.

> Note: The process could take around 10 minutes. After reboot, check for the **Intel Arc Control** application to verify that the driver has been installed correctly. If the installation was successful, you should see the **Arc Control** interface similar to the figure below.

> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_3.png" width=80%; />

* To monitor your GPU's performance and status, you can use either the **Windows Task Manager** (see the left side of the figure below) or the **Arc Control** application (see the right side of the figure below):
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_4.png" width=70%; />

### 1.2 Set up Python Environment

* Visit the [Miniconda installation page](https://docs.anaconda.com/free/miniconda/), download the **Miniconda installer for Windows**, and follow the instructions to complete the installation.

<!-- > <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_5.png" width=50%; /> -->
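
If you prefer the command line, the Miniconda installer for Windows also supports a silent mode. A minimal sketch, assuming a plain **Command Prompt** and that the default user-profile install path is acceptable (the paths here are our choice, not part of the official instructions):

```bash
curl -L -o miniconda-installer.exe https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe
:: /S runs the installer silently; /D sets the install directory and must come last
start /wait "" miniconda-installer.exe /S /D=%UserProfile%\Miniconda3
```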

* After installation, open the **Anaconda Prompt** and create a new Python environment named `llm`:
```bash
conda create -n llm python=3.9 libuv
```

* Activate the newly created environment `llm`:
```bash
conda activate llm
```

### 1.3 Install oneAPI and `bigdl-llm`

* With the `llm` environment active, use `pip` to install the [**Intel oneAPI Base Toolkit**](https://www.intel.com/content/www/us/en/developer/tools/oneapi/overview.html):
```bash
pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
```

* Use `pip` to install `bigdl-llm` for GPU:
```bash
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu
```
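
With the `llm` environment still active, you can optionally run a quick sanity check that the GPU stack is wired up. This is our suggestion rather than part of the official guide; if the imports succeed and `True` is printed, `bigdl-llm` should be able to see your GPU:

```bash
python -c "import torch; import intel_extension_for_pytorch as ipex; print(torch.xpu.is_available())"
```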

### 1.4 Download WebUI

Download text-generation-webui with `BigDL-LLM` optimizations from [here](https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/bigdl-llm.zip) and unzip it to a folder. In this example, the text-generation-webui folder is `C:\text-generation-webui`.
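
Alternatively, recent Windows builds ship `curl` and `tar`, so the download and unzip can also be done from the terminal. A sketch, assuming you want the folder at `C:\text-generation-webui` (the GitHub archive unpacks to a folder named after the branch, hence the rename):

```bash
cd C:\
curl -L -o text-generation-webui.zip https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/bigdl-llm.zip
tar -xf text-generation-webui.zip
:: the archive unpacks to text-generation-webui-bigdl-llm; rename it to match this guide
ren text-generation-webui-bigdl-llm text-generation-webui
```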

### 1.5 Install other dependencies

In your **Anaconda Prompt** terminal, navigate to your unzipped text-generation-webui folder. Then use `pip` to install other WebUI dependencies:
```bash
pip install -r requirements_cpu_only.txt
```

## 2. Start the WebUI Server

* Step 1: Open the **Anaconda Prompt** and activate the Python environment `llm` you previously created:
```bash
conda activate llm
```

* Step 2: If you're running on an iGPU, set the following environment variables by running the commands below:
> For more details about runtime configurations, refer to [this guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration).
```bash
set SYCL_CACHE_PERSISTENT=1
set BIGDL_LLM_XMX_DISABLED=1
```
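
Note that `set` only affects the current terminal session, so these commands need repeating in every new **Anaconda Prompt**. If you prefer to persist them (our suggestion, not part of the official guide), the standard Windows `setx` command writes them to your user environment; they take effect in terminals opened afterwards:

```bash
setx SYCL_CACHE_PERSISTENT 1
setx BIGDL_LLM_XMX_DISABLED 1
```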

* Step 3: Navigate to your unzipped text-generation-webui folder (`C:\text-generation-webui` in this example) and launch the WebUI. Models will be optimized and run at 4-bit precision.
```bash
cd C:\text-generation-webui
python server.py --load-in-4bit
```
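
The server picks a local port at startup (you may see 7860, 7861, and so on, depending on availability). Upstream text-generation-webui accepts a `--listen-port` flag to pin the port; assuming this fork inherits it, a fixed-port launch would look like:

```bash
python server.py --load-in-4bit --listen-port 7860
```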

* Step 4: After the WebUI server has started successfully, links for accessing the WebUI are displayed in the terminal.
<!-- ```bash
Running on local URL: http://127.0.0.1:7860
``` -->

<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_launch_server.png" width=80%; />

Open the local URL (e.g., http://127.0.0.1:7864) in your web browser to access the WebUI interface.

## 3. Using WebUI

### 3.1 Select the Model

First, you need to place Hugging Face models in `C:\text-generation-webui\models`.
You can either copy a local model into that folder or download a model from the Hugging Face Hub using the WebUI (a VPN connection might be required).
To download a model, navigate to the `Model` tab, enter the Hugging Face model path in the form `username/model` under `Download model or LoRA` (for instance, `Qwen/Qwen-7B-Chat`), and click `Download`.

<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_download_model.png" width=80%; />

After the models have been obtained, click the blue icon to refresh the `Model` drop-down list.
Then select the model you want from the list.

<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_select_model.png" width=80%; />

### 3.2 Load the Model

Using the default model settings is recommended.
Click `Load` to load your model.

* For some models, you might see an `ImportError: This modeling file requires the following packages that were not found in your environment` error message (scroll down to the end of the error messages) and instructions for installing the packages. This is because the models require additional pip packages; see the example after the figure below.
Stop the WebUI Server in the **Anaconda Prompt** terminal with `Ctrl+C`, install the pip packages, and then run the WebUI Server again.
If errors about missing packages persist, repeat the process of installing them.

* Some models are too old and do not support the installed version of the `transformers` package.
In this case, errors like `AttributeError` may appear; you should use a more recent model.

<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_load_model_error.png" width=80%; />
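
As a concrete illustration of the first case: the exact packages come from the error message itself, so treat the names below (which Qwen models commonly request) only as an example:

```bash
:: stop the server with Ctrl+C first, then, with the llm environment active:
pip install tiktoken einops transformers_stream_generator
:: then relaunch the server
python server.py --load-in-4bit
```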

When the model is loaded successfully, you will see a confirmation message like the one below.

<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_load_model_success.png" width=80%; />

### 3.3 Run the Model on WebUI

Select the `Chat` tab. This interface supports having multi-turn conversations with the model.
You may simply enter prompts and click the `Generate` button to get responses.
You can start a new conversation by clicking `New chat`.

<img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_chat.png" width=80%; />

<!-- Notes:
* Multi-turn conversations may consume GPU memory. You may specify the `Truncate the prompt up to this length` value in `Parameters` tab to reduce the GPU memory usage.

* You may switch to a single-turn conversation mode by turning off `Activate text streaming` in the Parameters tab.

* Please see [Chat-Tab Wiki](https://github.com/oobabooga/text-generation-webui/wiki/01-%E2%80%90-Chat-Tab) for more details. -->

### 3.4 Ending the program

Go to the **Anaconda Prompt** terminal where the WebUI Server was launched and press `Ctrl+C` to stop the server.
Then close the WebUI browser tab.