Update Linux Quickstart (#10499)

* fix quick start

* update toc

* expose docker
Kai Huang 2024-03-21 20:13:21 +08:00 committed by GitHub
parent 3a3756b51d
commit 92ee2077b3
5 changed files with 27 additions and 19 deletions

View file

@@ -9,9 +9,15 @@
<i class="fa-solid fa-chevron-down"></i>
</label>
<ul class="nav bigdl-quicklinks-section-nav">
<li>
<a href="doc/LLM/Quickstart/install_linux_gpu.html">Install BigDL-LLM on Linux with Intel GPU</a>
</li>
<li>
<a href="doc/LLM/Quickstart/install_windows_gpu.html">Install BigDL-LLM on Windows with Intel GPU</a>
</li>
<li>
<a href="doc/LLM/Quickstart/docker_windows_gpu.html">Install BigDL-LLM in Docker on Windows with Intel GPU</a>
</li>
<li>
<a href="doc/LLM/Quickstart/webui_quickstart.html">Use Text Generation WebUI on Windows with Intel GPU</a>
</li>

View file

@@ -38,7 +38,9 @@ subtrees:
title: "Quickstart"
subtrees:
- entries:
- file: doc/LLM/Quickstart/install_linux_gpu
- file: doc/LLM/Quickstart/install_windows_gpu
- file: doc/LLM/Quickstart/docker_windows_gpu
- file: doc/LLM/Quickstart/webui_quickstart
- file: doc/LLM/Quickstart/benchmark_quickstart
- file: doc/LLM/Quickstart/llama_cpp_quickstart

View file

@@ -7,7 +7,9 @@ BigDL-LLM Quickstart
This section includes efficient guides that show you how to:
* `Install BigDL-LLM on Linux with Intel GPU <./install_linux_gpu.html>`_
* `Install BigDL-LLM on Windows with Intel GPU <./install_windows_gpu.html>`_
* `Install BigDL-LLM in Docker on Windows with Intel GPU <./docker_windows_gpu.html>`_
* `Use Text Generation WebUI on Windows with Intel GPU <./webui_quickstart.html>`_
* `Conduct Performance Benchmarking with BigDL-LLM <./benchmark_quickstart.html>`_
* `Use llama.cpp with BigDL-LLM on Intel GPU <./llama_cpp_quickstart.html>`_

View file

@@ -2,14 +2,12 @@
This guide demonstrates how to install BigDL-LLM on Linux with Intel GPUs. It applies to Intel Data Center GPU Flex Series and Max Series, as well as Intel Arc Series GPUs.
BigDL-LLM currently supports the Ubuntu 20.04 operating system and later, and supports PyTorch 2.0 and PyTorch 2.1 on Linux. This example installs BigDL-LLM with PyTorch 2.1 using `pip`. For more details and other options like installing with wheel, please refer to the [Installation Webpage](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux).
BigDL-LLM currently supports the Ubuntu 20.04 operating system and later, and supports PyTorch 2.0 and PyTorch 2.1 on Linux. This page demonstrates BigDL-LLM with PyTorch 2.1. Check the [Installation](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#linux) page for more details.
## Install Intel GPU Driver
This guide demonstrates how to install driver on linux with **kernel version 6.2** on Intel GPU.
We assume that you have the 6.2 kernel on your linux machine.
### For Linux kernel 6.2
* Install arc driver
@@ -20,7 +18,7 @@ We assume that you have the 6.2 kernel on your linux machine.

```bash
sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
```
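The `sudo tee` line above is only the tail of a pipeline; the commands that fetch Intel's signing key are elided in this hunk. For reference, a typical full sequence looks like the sketch below (the key URL and repository line are assumptions based on Intel's standard client-GPU repository setup for Ubuntu 22.04 "jammy"):

```bash
# Fetch Intel's graphics signing key and register the GPU package repository (assumed standard setup)
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
    sudo gpg --dearmor --output /usr/share/keyrings/intel-graphics.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy client" | \
    sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
```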
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/driver_install1.png" width=100%; />
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/wget.png" width=100%; />
* Install drivers
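The package-installation commands for this step are elided in the hunk below; as a sketch, the user-space driver stack is typically installed with something like the following (package names are assumptions based on Intel's client GPU stack, so adjust them to your distribution):

```bash
sudo apt update
# compute and media runtimes for Intel GPUs (assumed package set)
sudo apt install -y intel-opencl-icd intel-level-zero-gpu level-zero \
    intel-media-va-driver-non-free libmfx1
```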
@@ -45,9 +43,9 @@ We assume that you have the 6.2 kernel on your linux machine.

```bash
sudo reboot
```
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/driver_install2.png" width=100%; />
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/i915.png" width=100%; />
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/driver_install3.png" width=100%; />
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/gawk.png" width=100%; />
* Configure permissions
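The command for this step is not shown in the hunk; a common sketch (the `render` group is an assumption, being the group that typically owns Intel GPU device nodes):

```bash
# let the current user access the /dev/dri render nodes (group name assumed)
sudo gpasswd -a ${USER} render
newgrp render   # or log out and back in for the group change to take effect
```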
@@ -63,7 +61,7 @@ We assume that you have the 6.2 kernel on your linux machine.
## Setup Python Environment
* Install the Miniconda as follows
Install the Miniconda as follows if you don't have conda installed on your machine:
```bash
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
```

@@ -75,7 +73,7 @@ We assume that you have the 6.2 kernel on your linux machine.

```bash
conda --version
# rm Miniconda3-latest-Linux-x86_64.sh # if you don't need this file any longer
```
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/python_env1.png" alt="image-20240221102252569" width=100%; />
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/install_conda.png" alt="image-20240221102252569" width=100%; />
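The installer invocation between the `wget` and `conda --version` lines is elided above; an end-to-end sketch (the install prefix and silent-mode flags are assumptions) is:

```bash
wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/miniconda3   # -b installs non-interactively (prefix assumed)
source $HOME/miniconda3/bin/activate                            # make conda available in this shell
conda --version
rm Miniconda3-latest-Linux-x86_64.sh                            # optional cleanup
```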
## Install oneAPI
@@ -88,9 +86,9 @@ We assume that you have the 6.2 kernel on your linux machine.

```bash
sudo apt install intel-basekit
```
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/install_oneapi_1.png" alt="image-20240221102252565" width=100%; />
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/oneapi.png" alt="image-20240221102252565" width=100%; />
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/install_oneapi_2.png" alt="image-20240221102252565" width=100%; />
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/basekit.png" alt="image-20240221102252565" width=100%; />
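`intel-basekit` comes from Intel's oneAPI APT repository, which must be registered first. A sketch of the usual setup (the key and repository URLs are assumptions based on Intel's standard oneAPI instructions):

```bash
# register Intel's oneAPI repository, then install the Base Toolkit (assumed standard setup)
wget -qO - https://apt.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB | \
    sudo gpg --dearmor --output /usr/share/keyrings/oneapi-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/oneapi-archive-keyring.gpg] https://apt.repos.intel.com/oneapi all main" | \
    sudo tee /etc/apt/sources.list.d/oneAPI.list
sudo apt update
sudo apt install intel-basekit
```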
## Install `bigdl-llm`
@@ -103,24 +101,24 @@ We assume that you have the 6.2 kernel on your linux machine.

```bash
pip install --pre --upgrade bigdl-llm[xpu] --extra-index-url https://developer.intel.com/ipex-whl-stable-xpu
```
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/big_dl1.png" alt="image-20240221102252564" width=100%; />
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/create_conda_env.png" alt="image-20240221102252564" width=100%; />
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/big_dl2.png" alt="image-20240221102252564" width=100%; />
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/create_conda_env.png" alt="image-20240221102252564" width=100%; />
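As the `create_conda_env` screenshots suggest, the `pip install` is normally run inside a fresh conda environment; a sketch of the full sequence (the environment name and Python version are assumptions) is:

```bash
conda create -n llm python=3.9 -y   # "llm" and 3.9 are placeholders; pick your own
conda activate llm
pip install --pre --upgrade bigdl-llm[xpu] --extra-index-url https://developer.intel.com/ipex-whl-stable-xpu
```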
* You can verfy if bigdl-llm is successfully by simply importing a few classes from the library. For example, execute the following import command in terminal:
* You can verify if bigdl-llm is successfully installed by simply importing a few classes from the library. For example, execute the following import command in the terminal:
```bash
source /opt/intel/oneapi/setvars.sh
python
> from bigdl.llm.transformers import AutoModel,AutoModelForCausalLM
> from bigdl.llm.transformers import AutoModel, AutoModelForCausalLM
```
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/big_dl3.png" alt="image-20240221102252562" width=100%; />
> <img src="https://llm-assets.readthedocs.io/en/latest/_images/verify_bigdl_import.png" alt="image-20240221102252562" width=100%; />
## Runtime Configuration
## Runtime Configurations
To use GPU acceleration on Linux, several environment variables are required or recommended before running a GPU example.
@@ -147,7 +145,7 @@ To use GPU acceleration on Linux, several environment variables are required or

```bash
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
export ENABLE_SDP_FUSION=1
```
Please note that libtcmalloc.so can be installed by conda install -c conda-forge -y gperftools=2.10
Please note that `libtcmalloc.so` can be installed by `conda install -c conda-forge -y gperftools=2.10`.
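Installing the package alone does not take effect by itself; the library is typically preloaded before running, as sketched below (the path under `$CONDA_PREFIX` is an assumption about where conda places it):

```bash
conda install -c conda-forge -y gperftools=2.10
# preload tcmalloc from the active conda environment (library path assumed)
export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
```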
## A Quick Example
@@ -213,5 +211,5 @@ Now let's play with a real LLM. We'll be using the [phi-1.5](https://huggingface
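The body of the example is elided here; a minimal sketch of such a run, in the same REPL style as the verification step above (the model ID, prompt, and generation parameters are assumptions), is:

```bash
source /opt/intel/oneapi/setvars.sh
python
> from bigdl.llm.transformers import AutoModelForCausalLM
> from transformers import AutoTokenizer
> # load phi-1.5 in 4-bit and move it to the Intel GPU (model ID and options assumed)
> model = AutoModelForCausalLM.from_pretrained("microsoft/phi-1_5", load_in_4bit=True, trust_remote_code=True).to("xpu")
> tokenizer = AutoTokenizer.from_pretrained("microsoft/phi-1_5", trust_remote_code=True)
> input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu")
> output = model.generate(input_ids, max_new_tokens=32)
> print(tokenizer.decode(output[0], skip_special_tokens=True))
```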
## Tips & Troubleshooting
### Warmup for optimal performance on first run
When running LLMs on GPU for the first time, you might notice the performance is lower than expected, with delays up to several minutes before the first token is generated. This delay occurs because the GPU kernels require compilation and initialization, which varies across different GPU models. To achieve optimal and consistent performance, we recommend a one-time warm-up by running `model.generate(...)` an additional time before starting your actual generation tasks. If you're developing an application, you can incorporate this warmup step into start-up or loading routine to enhance the user experience.
When running LLMs on GPU for the first time, you might notice the performance is lower than expected, with delays up to several minutes before the first token is generated. This delay occurs because the GPU kernels require compilation and initialization, which varies across different GPU types. To achieve optimal and consistent performance, we recommend a one-time warm-up by running `model.generate(...)` an additional time before starting your actual generation tasks. If you're developing an application, you can incorporate this warmup step into your start-up or loading routine to enhance the user experience.
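Continuing the hypothetical session above, the warmup pattern is simply an extra, discarded `generate` call:

```bash
> _ = model.generate(input_ids, max_new_tokens=32)       # warm-up: first call compiles and initializes GPU kernels
> output = model.generate(input_ids, max_new_tokens=32)  # actual generation now runs at the expected speed
```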