Change quickstart documentation to use oneapi offline installer (#10350)
* Change to oneapi offline installer
* Fixes
* Add "call"
* Fixes
parent 9026c08633
commit 6829efd350
2 changed files with 41 additions and 27 deletions
@@ -4,7 +4,7 @@ This guide demonstrates how to install BigDL-LLM on Windows with Intel GPUs.
 It applies to Intel Core Ultra and 12th-14th Gen Intel Core integrated GPUs (iGPUs), as well as Intel Arc Series GPUs.
 
-## Install GPU Driver
+## Install Visual Studio 2022
 
 * Download and install Visual Studio 2022 Community Edition from the [official Microsoft Visual Studio website](https://visualstudio.microsoft.com/downloads/). Ensure you select the **Desktop development with C++ workload** during the installation process.
 
@@ -13,6 +13,8 @@ It applies to Intel Core Ultra and 12th-14th Gen Intel Core integrated GPUs (iGPUs), as
 >
 > <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_1.png" alt="image-20240221102252560" width=100%; />
 
+## Install GPU Driver
+
 * Download and install the latest GPU driver from the [official Intel download page](https://www.intel.com/content/www/us/en/download/785597/intel-arc-iris-xe-graphics-windows.html). A system reboot is necessary to apply the changes after the installation is complete.
 
 > Note: The process could take around 10 minutes. After reboot, check for the **Intel Arc Control** application to verify the driver has been installed correctly. If the installation was successful, you should see the **Arc Control** interface similar to the figure below.
@@ -22,6 +24,17 @@ It applies to Intel Core Ultra and 12th-14th Gen Intel Core integrated GPUs (iGPUs), as
 * To monitor your GPU's performance and status, you can use either the **Windows Task Manager** (see the left side of the figure below) or the **Arc Control** application (see the right side of the figure below):
 > <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_4.png" width=70%; />
 
+## Install oneAPI
+
+<!-- * With the `llm` environment active, use `pip` to install the [**Intel oneAPI Base Toolkit**](https://www.intel.com/content/www/us/en/developer/tools/oneapi/overview.html):
+```cmd
+pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
+``` -->
+
+* Download and install the [**Intel oneAPI Base Toolkit**](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html?operatingsystem=window&distributions=offline). During installation, you can continue with the default installation settings.
+
+> <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_oneapi_offline_installer.png" width=90%; />
+
 ## Setup Python Environment
 
 * Visit the [Miniconda installation page](https://docs.anaconda.com/free/miniconda/), download the **Miniconda installer for Windows**, and follow the instructions to complete the installation.
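As an optional sanity check for the oneAPI step added above (a suggestion, not part of the original guide), you can confirm the offline installer placed `setvars.bat` in the default location that later steps call, assuming the default installation settings were kept:

```cmd
dir "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
```

If the file is listed, the `call "...setvars.bat"` command used later in this guide should work unchanged.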
@@ -29,31 +42,24 @@ It applies to Intel Core Ultra and 12th-14th Gen Intel Core integrated GPUs (iGPUs), as
 > <img src="https://llm-assets.readthedocs.io/en/latest/_images/quickstart_windows_gpu_5.png" width=50%; />
 
 * After installation, open the **Anaconda Prompt** and create a new Python environment named `llm`:
-```bash
+```cmd
 conda create -n llm python=3.9 libuv
 ```
 * Activate the newly created environment `llm`:
-```bash
+```cmd
 conda activate llm
 ```
-
-## Install oneAPI
-
-* With the `llm` environment active, use `pip` to install the [**Intel oneAPI Base Toolkit**](https://www.intel.com/content/www/us/en/developer/tools/oneapi/overview.html):
-```bash
-pip install dpcpp-cpp-rt==2024.0.2 mkl-dpcpp==2024.0.0 onednn==2024.0.0
-```
 
 ## Install `bigdl-llm`
 
 * With the `llm` environment active, use `pip` to install `bigdl-llm` for GPU.
 Choose either the US or CN website for `extra-index-url`:
 * US:
-```bash
+```cmd
 pip install --pre --upgrade bigdl-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
 ```
 * CN:
-```bash
+```cmd
 pip install --pre --upgrade bigdl-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/
 ```
 > Note: If you encounter network issues while installing IPEX, refer to [this guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#install-bigdl-llm-from-wheel) for troubleshooting advice.
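As a quick optional check (an addition, not part of the original steps), you can confirm the wheel landed in the `llm` environment by asking `pip` for the package metadata:

```cmd
pip show bigdl-llm
```

The output should report the installed version and its location inside the `llm` environment.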
@@ -68,17 +74,21 @@ It applies to Intel Core Ultra and 12th-14th Gen Intel Core integrated GPUs (iGPUs), as
 Now let's play with a real LLM. We'll be using the [phi-1.5](https://huggingface.co/microsoft/phi-1_5) model, a 1.3-billion-parameter LLM, for this demonstration. Follow the steps below to set up and run the model, and observe how it responds to the prompt "What is AI?".
 
 * Step 1: Open the **Anaconda Prompt** and activate the Python environment `llm` you previously created:
-```bash
+```cmd
 conda activate llm
 ```
-* Step 2: If you're running on an iGPU, set some environment variables by running the commands below:
-> For more details about runtime configurations, refer to [this guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration):
-```bash
+* Step 2: Configure oneAPI variables by running the following command:
+> For more details about runtime configurations, refer to [this guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration):
+```cmd
+call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
+```
+If you're running on an iGPU, set additional environment variables by running the following commands:
+```cmd
 set SYCL_CACHE_PERSISTENT=1
 set BIGDL_LLM_XMX_DISABLED=1
 ```
 * Step 3: To ensure compatibility with `phi-1.5`, update the transformers library to version 4.37.0:
-```bash
+```cmd
 pip install -U transformers==4.37.0
 ```
 * Step 4: Create a new file named `demo.py` and insert the code snippet below.
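Once `setvars.bat` has been called in the current prompt (Step 2 above), one optional way to confirm that the GPU is visible to the oneAPI runtime is the `sycl-ls` tool that ships with the Base Toolkit (a suggested check, not part of the original guide):

```cmd
sycl-ls
```

Your Intel iGPU or Arc GPU should appear among the listed SYCL devices; if it does not, recheck the driver and oneAPI installations.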
@@ -111,7 +121,7 @@ Now let's play with a real LLM. We'll be using the [phi-1.5](https://huggingface
 > This will allow the memory-intensive embedding layer to utilize the CPU instead of the GPU.
 
 * Step 5: Run `demo.py` within the activated Python environment using the following command:
-```bash
+```cmd
 python demo.py
 ```
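The body of `demo.py` falls outside this diff's hunks. For orientation only, here is a minimal sketch consistent with the `bigdl-llm` transformers-style API this quickstart describes; details such as `max_new_tokens=32` are illustrative choices, not taken from the original file:

```python
# demo.py - illustrative sketch, not the original file from the quickstart
import torch
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "microsoft/phi-1_5"

# Load phi-1.5 with BigDL-LLM 4-bit optimizations and move it to the Intel GPU.
# On an iGPU, adding cpu_embedding=True here keeps the memory-intensive
# embedding layer on the CPU (see the note in the hunk above).
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_4bit=True,
                                             trust_remote_code=True)
model = model.to('xpu')

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "What is AI?"
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt").to('xpu')
    output = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Running `python demo.py` as in Step 5 should then print the model's completion of the prompt.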
@@ -29,7 +29,7 @@ Open **Anaconda Prompt** and activate the conda environment you have created in
 conda activate llm
 ```
 Then, change to the directory of the WebUI (e.g., `C:\text-generation-webui`) and install the necessary dependencies:
-```bash
+```cmd
 cd C:\text-generation-webui
 pip install -r requirements_cpu_only.txt
 ```
@@ -37,23 +37,27 @@ pip install -r requirements_cpu_only.txt
 ## 3 Start the WebUI Server
 
 ### Set Environment Variables
-If you're running on an iGPU, set some environment variables by running the commands below in **Anaconda Prompt**:
-> Note: For more details about runtime configurations, refer to [this link](../Overview/install_gpu.html#runtime-configuration):
-```bash
-set SYCL_CACHE_PERSISTENT=1
-set BIGDL_LLM_XMX_DISABLED=1
-```
+Configure oneAPI variables by running the following command in **Anaconda Prompt**:
+> Note: For more details about runtime configurations, refer to [this guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#runtime-configuration):
+```cmd
+call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
+```
+If you're running on an iGPU, set additional environment variables by running the following commands:
+```cmd
+set SYCL_CACHE_PERSISTENT=1
+set BIGDL_LLM_XMX_DISABLED=1
+```
 
 ### Launch the Server
 In **Anaconda Prompt**, with the conda environment `llm` activated, navigate to the text-generation-webui folder and start the server using the following command:
 > Note: With the `--load-in-4bit` option, the models will be optimized and run at 4-bit precision. For configuration of other formats and precisions, refer to [this link](https://github.com/intel-analytics/text-generation-webui?tab=readme-ov-file#32-optimizations-for-other-percisions).
-```bash
+```cmd
 python server.py --load-in-4bit
 ```
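If the server needs to be reachable from other machines, upstream text-generation-webui also accepts networking flags; the example below uses the upstream flag names `--listen` and `--listen-port`, which are worth verifying against the intel-analytics fork linked above before relying on them:

```cmd
python server.py --load-in-4bit --listen --listen-port 7860
```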
 ### Access the WebUI
 Upon successful launch, URLs to access the WebUI will be displayed in the terminal as shown below. Open the provided local URL in your browser to interact with the WebUI.
-<!-- ```bash
+<!-- ```cmd
 Running on local URL: http://127.0.0.1:7860
 ``` -->
 <img src="https://llm-assets.readthedocs.io/en/latest/_images/webui_quickstart_launch_server.png" width=80%; />