Reorganize GPU examples (#8844)

2023-08-30 08:32:08 +08:00 · 2023-08-30 08:32:08 +08:00 · aab7deab1f
commit aab7deab1f
parent a386ad984e
26 changed files with 34 additions and 28 deletions
--- a/README.md
+++ b/README.md
@ -9,14 +9,16 @@ _**Fast, Distributed, Secure AI for Big Data**_
 ---
 ## Latest News
- **Try the latest [`bigdl-llm`](python/llm) library for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
+- **Try the latest [`bigdl-llm`](python/llm) library for running LLM (large language model) on your Intel laptop or GPU using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), etc., and supports any Hugging Face Transformers model)*
 <p align="center">
            <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llama-2-13b-chat.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='30%' />
            <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models3.png" width='76%'/>
 </p>
- **[Update] Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models).
+- **[Update] `bigdl-llm` now supports Intel Arc or Flex GPU; see the latest GPU examples [here](https://github.com/jason-dai/BigDL/tree/main/python/llm/example/gpu).**
 - **Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models).
 ---
 ## Overview
--- a/python/llm/README.md
+++ b/python/llm/README.md
@ -1,9 +1,12 @@
 ## BigDL-LLM
-**`bigdl-llm`** is a library for running ***LLM*** (large language model) on your Intel ***laptop*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model).
+**`bigdl-llm`** is a library for running ***LLM*** (large language model) on your Intel ***laptop*** or ***GPU*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model).
->*(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.)*
+> *It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.*
 ### Latest update
 - `bigdl-llm` now supports Intel Arc or Flex GPU; see the latest GPU examples [here](https://github.com/jason-dai/BigDL/tree/main/python/llm/example/gpu).
 ### Demos
 See the ***optimized performance*** of `chatglm2-6b`, `llama-2-13b-chat`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.
--- a/python/llm/example/gpu/README.md
+++ b/python/llm/example/gpu/README.md
@ -0,0 +1,24 @@
 # BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs
 You can use BigDL-LLM to run almost any Hugging Face Transformers model with INT4 optimizations on a laptop with an Intel GPU. This directory contains example scripts to help you get started quickly with some popular open-source models. Each model has its own dedicated folder with detailed instructions on how to install and run it.
 ## Verified Hardware Platforms
 - Intel Arc™ A-Series Graphics
 - Intel Data Center GPU Flex Series
 ## Recommended Requirements
 To use Intel GPU acceleration, install the required tools and prepare the environment in the following steps.
 Step 1: Only Linux is supported for now; Ubuntu 22.04 is preferred.
 Step 2: Install the GPU driver by following the [driver installation](https://dgpu-docs.intel.com/driver/installation.html) guide for general purpose GPU capabilities.
 Step 3: Download and install the [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). oneMKL and the DPC++ compiler are required; the other components are optional.
 > **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0.
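The requirements above can be sketched as a small pre-flight check. This is only an illustration and not part of the example scripts: `parse_version` and `check_prerequisites` are hypothetical helper names, and the oneAPI version string is assumed to be supplied by hand, since the steps above do not say how to query the installed toolkit.

```python
import platform

# Minimum oneAPI Base Toolkit version, taken from the note above.
MIN_ONEAPI = (2023, 2, 0)


def parse_version(text):
    """Turn '2023.2.0' into (2023, 2, 0); missing parts count as 0."""
    parts = [int(p) for p in text.strip().split(".")[:3]]
    return tuple(parts + [0] * (3 - len(parts)))


def check_prerequisites(system, oneapi_version):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if system != "Linux":
        problems.append(f"unsupported OS {system!r}: only Linux is supported")
    if parse_version(oneapi_version) < MIN_ONEAPI:
        problems.append(
            f"oneAPI Base Toolkit {oneapi_version} is older than "
            + ".".join(str(n) for n in MIN_ONEAPI)
        )
    return problems


if __name__ == "__main__":
    # The version string here is a placeholder typed in by the user.
    for problem in check_prerequisites(platform.system(), "2023.2.0"):
        print("WARNING:", problem)
```

Comparing versions as integer tuples avoids the pitfall of plain string comparison, where `"2023.10.0"` would sort before `"2023.2.0"`.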
 ## Best Known Configuration on Linux
 For better performance, set the following environment variables on Linux:
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```
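When an example is launched from a Python entry point rather than a shell, the same settings could be applied in-process. The sketch below is an assumption-laden illustration: `apply_recommended_env` is a hypothetical helper, and it assumes the variables are read when the GPU runtime initializes, so it would have to run before any GPU-related import.

```python
import os

# Values copied from the shell snippet above.
RECOMMENDED_ENV = {
    "USE_XETLA": "OFF",
    "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
}


def apply_recommended_env(env=os.environ):
    """Set any recommended variable that differs; return what was changed."""
    changed = {}
    for name, value in RECOMMENDED_ENV.items():
        if env.get(name) != value:
            env[name] = value
            changed[name] = value
    return changed


if __name__ == "__main__":
    # Assumption: call this before importing the GPU runtime libraries.
    for name, value in apply_recommended_env().items():
        print(f"set {name}={value}")
```

Passing a plain dictionary as `env` makes the helper easy to try out without touching the real process environment.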
--- a/python/llm/example/transformers/transformers_int4/GPU/baichuan/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/baichuan/README.md
--- a/python/llm/example/transformers/transformers_int4/GPU/baichuan/generate.py
+++ b/python/llm/example/transformers/transformers_int4/GPU/baichuan/generate.py
--- a/python/llm/example/transformers/transformers_int4/GPU/chatglm2/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/chatglm2/README.md
--- a/python/llm/example/transformers/transformers_int4/GPU/chatglm2/generate.py
+++ b/python/llm/example/transformers/transformers_int4/GPU/chatglm2/generate.py
--- a/python/llm/example/transformers/transformers_int4/GPU/chatglm2/streamchat.py
+++ b/python/llm/example/transformers/transformers_int4/GPU/chatglm2/streamchat.py
--- a/python/llm/example/transformers/transformers_int4/GPU/falcon/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/falcon/README.md
--- a/python/llm/example/transformers/transformers_int4/GPU/falcon/falcon-7b-instruct/modelling_RW.py
+++ b/python/llm/example/transformers/transformers_int4/GPU/falcon/falcon-7b-instruct/modelling_RW.py
--- a/python/llm/example/transformers/transformers_int4/GPU/falcon/generate.py
+++ b/python/llm/example/transformers/transformers_int4/GPU/falcon/generate.py
--- a/python/llm/example/transformers/transformers_int4/GPU/internlm/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/internlm/README.md
--- a/python/llm/example/transformers/transformers_int4/GPU/internlm/generate.py
+++ b/python/llm/example/transformers/transformers_int4/GPU/internlm/generate.py
--- a/python/llm/example/transformers/transformers_int4/GPU/llama2/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/llama2/README.md
--- a/python/llm/example/transformers/transformers_int4/GPU/llama2/generate.py
+++ b/python/llm/example/transformers/transformers_int4/GPU/llama2/generate.py
--- a/python/llm/example/transformers/transformers_int4/GPU/mpt/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/mpt/README.md
--- a/python/llm/example/transformers/transformers_int4/GPU/mpt/generate.py
+++ b/python/llm/example/transformers/transformers_int4/GPU/mpt/generate.py
--- a/python/llm/example/transformers/transformers_int4/GPU/qwen/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/qwen/README.md
--- a/python/llm/example/transformers/transformers_int4/GPU/qwen/generate.py
+++ b/python/llm/example/transformers/transformers_int4/GPU/qwen/generate.py
--- a/python/llm/example/transformers/transformers_int4/GPU/starcoder/generate.py
+++ b/python/llm/example/transformers/transformers_int4/GPU/starcoder/generate.py
--- a/python/llm/example/transformers/transformers_int4/GPU/starcoder/readme.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/starcoder/readme.md
--- a/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/README.md
--- a/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/generate.py
+++ b/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/generate.py
--- a/python/llm/example/transformers/transformers_int4/GPU/whisper/readme.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/whisper/readme.md
--- a/python/llm/example/transformers/transformers_int4/GPU/whisper/recognize.py
+++ b/python/llm/example/transformers/transformers_int4/GPU/whisper/recognize.py
--- a/python/llm/example/transformers/transformers_int4/GPU/README.md
+++ b/python/llm/example/transformers/transformers_int4/GPU/README.md
@ -1,24 +1 @@
-# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs
+### The GPU examples for `bigdl-llm` have been moved to [here](../../../gpu).
 You can use BigDL-LLM to run almost any Hugging Face Transformers model with INT4 optimizations on a laptop with an Intel GPU. This directory contains example scripts to help you get started quickly with some popular open-source models. Each model has its own dedicated folder with detailed instructions on how to install and run it.
 ## Verified Hardware Platforms
 - Intel Arc™ A-Series Graphics
 - Intel Data Center GPU Flex Series
 ## Recommended Requirements
 To use Intel GPU acceleration, install the required tools and prepare the environment in the following steps.
 Step 1: Only Linux is supported for now; Ubuntu 22.04 is preferred.
 Step 2: Install the GPU driver by following the [driver installation](https://dgpu-docs.intel.com/driver/installation.html) guide for general purpose GPU capabilities.
 Step 3: Download and install the [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). oneMKL and the DPC++ compiler are required; the other components are optional.
 > **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0.
 ## Best Known Configuration on Linux
 For better performance, set the following environment variables on Linux:
 ```bash
 export USE_XETLA=OFF
 export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
 ```