Reorganize GPU examples (#8844)
parent
a386ad984e
commit
aab7deab1f
26 changed files with 34 additions and 28 deletions
|
|
@ -9,14 +9,16 @@ _**Fast, Distributed, Secure AI for Big Data**_
|
||||||
---
|
---
|
||||||
## Latest News
|
## Latest News
|
||||||
|
|
||||||
- **Try the latest [`bigdl-llm`](python/llm) library for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
|
- **Try the latest [`bigdl-llm`](python/llm) library for running LLM (large language model) on your Intel laptop or GPU using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), etc., and supports any Hugging Face Transformers model)*
|
||||||
|
|
||||||
<p align="center">
|
<p align="center">
|
||||||
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llama-2-13b-chat.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='30%' />
|
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llama-2-13b-chat.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='30%' />
|
||||||
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models3.png" width='76%'/>
|
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models3.png" width='76%'/>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
- **[Update] Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models).
|
- **[Update] `bigdl-llm` now supports Intel Arc or Flex GPU; see the latest GPU examples [here](https://github.com/jason-dai/BigDL/tree/main/python/llm/example/gpu).**
|
||||||
|
|
||||||
|
- **Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models).
|
||||||
|
|
||||||
---
|
---
|
||||||
## Overview
|
## Overview
|
||||||
|
|
|
||||||
|
|
@ -1,9 +1,12 @@
|
||||||
## BigDL-LLM
|
## BigDL-LLM
|
||||||
|
|
||||||
**`bigdl-llm`** is a library for running ***LLM*** (large language model) on your Intel ***laptop*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model).
|
**`bigdl-llm`** is a library for running ***LLM*** (large language model) on your Intel ***laptop*** or ***GPU*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model).
|
||||||
|
|
||||||
>*(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.)*
|
> *It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.*
|
||||||
|
|
||||||
|
### Latest update
|
||||||
|
- `bigdl-llm` now supports Intel Arc or Flex GPU; see the latest GPU examples [here](https://github.com/jason-dai/BigDL/tree/main/python/llm/example/gpu).
|
||||||
|
|
||||||
### Demos
|
### Demos
|
||||||
See the ***optimized performance*** of `chatglm2-6b`, `llama-2-13b-chat`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.
|
See the ***optimized performance*** of `chatglm2-6b`, `llama-2-13b-chat`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.
|
||||||
|
|
||||||
|
|
|
||||||
24
python/llm/example/gpu/README.md
Normal file
24
python/llm/example/gpu/README.md
Normal file
|
|
@ -0,0 +1,24 @@
|
||||||
|
# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs
|
||||||
|
You can use BigDL-LLM to run almost any Hugging Face Transformers model with INT4 optimizations on a laptop with an Intel GPU. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it.
|
||||||
|
|
||||||
|
## Verified Hardware Platforms
|
||||||
|
|
||||||
|
- Intel Arc™ A-Series Graphics
|
||||||
|
- Intel Data Center GPU Flex Series
|
||||||
|
|
||||||
|
## Recommended Requirements
|
||||||
|
To enable Intel GPU acceleration, several tool installation and environment preparation steps are required.
|
||||||
|
|
||||||
|
Step 1: only Linux is currently supported; Ubuntu 22.04 is preferred.
|
||||||
|
|
||||||
|
Step 2: please refer to the [driver installation](https://dgpu-docs.intel.com/driver/installation.html) guide for general-purpose GPU capabilities.
|
||||||
|
|
||||||
|
Step 3: you also need to download and install the [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). oneMKL and the DPC++ compiler are required; the other components are optional.
|
||||||
|
> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0.
|
||||||
|
|
||||||
|
## Best Known Configuration on Linux
|
||||||
|
For better performance, it is recommended to set environment variables on Linux:
|
||||||
|
```bash
|
||||||
|
export USE_XETLA=OFF
|
||||||
|
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
|
||||||
|
```
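
The configuration above can be combined with the oneAPI environment setup from Step 3 in one small script. This is a minimal sketch, not part of the original README: the `ONEAPI_SETVARS` variable is a hypothetical name introduced here, and `/opt/intel/oneapi/setvars.sh` is assumed to be the install location of the oneAPI environment script, so adjust it to match your system.

```bash
#!/usr/bin/env bash
# Sketch only: ONEAPI_SETVARS is a hypothetical helper variable, and the
# default path below is an assumption about where oneAPI was installed.
ONEAPI_SETVARS="${ONEAPI_SETVARS:-/opt/intel/oneapi/setvars.sh}"

# Load the oneAPI environment (oneMKL, DPC++ compiler) if the script exists.
if [ -f "$ONEAPI_SETVARS" ]; then
    # shellcheck disable=SC1090
    source "$ONEAPI_SETVARS"
else
    echo "oneAPI setvars.sh not found at $ONEAPI_SETVARS" >&2
fi

# Environment variables recommended in this section.
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

# Print the resulting values so they can be checked before running a model.
echo "USE_XETLA=$USE_XETLA"
echo "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=$SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS"
```

Source the script (for example with `source ./your-script.sh`, where the file name is your choice) rather than executing it, so that the exported variables persist in the current shell session.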
|
||||||
|
|
@ -1,24 +1 @@
|
||||||
# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs
|
### The GPU examples for `bigdl-llm` have been moved to [here](../../../gpu).
|
||||||
You can use BigDL-LLM to run almost any Hugging Face Transformers model with INT4 optimizations on a laptop with an Intel GPU. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it.
|
|
||||||
|
|
||||||
## Verified Hardware Platforms
|
|
||||||
|
|
||||||
- Intel Arc™ A-Series Graphics
|
|
||||||
- Intel Data Center GPU Flex Series
|
|
||||||
|
|
||||||
## Recommended Requirements
|
|
||||||
To enable Intel GPU acceleration, several tool installation and environment preparation steps are required.
|
|
||||||
|
|
||||||
Step 1: only Linux is currently supported; Ubuntu 22.04 is preferred.
|
|
||||||
|
|
||||||
Step 2: please refer to the [driver installation](https://dgpu-docs.intel.com/driver/installation.html) guide for general-purpose GPU capabilities.
|
|
||||||
|
|
||||||
Step 3: you also need to download and install the [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). oneMKL and the DPC++ compiler are required; the other components are optional.
|
|
||||||
> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0.
|
|
||||||
|
|
||||||
## Best Known Configuration on Linux
|
|
||||||
For better performance, it is recommended to set environment variables on Linux:
|
|
||||||
```bash
|
|
||||||
export USE_XETLA=OFF
|
|
||||||
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
|
|
||||||
```
|
|
||||||
|
|
|
||||||