diff --git a/README.md b/README.md index 90bcea27..259dbd54 100644 --- a/README.md +++ b/README.md @@ -9,14 +9,16 @@ _**Fast, Distributed, Secure AI for Big Data**_ --- ## Latest News -- **Try the latest [`bigdl-llm`](python/llm) library for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)* +- **Try the latest [`bigdl-llm`](python/llm) library for running LLM (large language model) on your Intel laptop or GPU using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), etc., and supports any Hugging Face Transformers model)*

-- **[Update] Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models). +- **[Update] `bigdl-llm` now supports Intel Arc and Flex GPUs; see the latest GPU examples [here](https://github.com/jason-dai/BigDL/tree/main/python/llm/example/gpu).** + +- **Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models). --- ## Overview diff --git a/python/llm/README.md b/python/llm/README.md index 29293084..7e3e98a7 100644 --- a/python/llm/README.md +++ b/python/llm/README.md @@ -1,9 +1,12 @@ ## BigDL-LLM -**`bigdl-llm`** is a library for running ***LLM*** (large language model) on your Intel ***laptop*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model). +**`bigdl-llm`** is a library for running ***LLM*** (large language model) on your Intel ***laptop*** or ***GPU*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model). 
->*(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.)* +> *It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.* +### Latest update + - `bigdl-llm` now supports Intel Arc and Flex GPUs; see the latest GPU examples [here](https://github.com/jason-dai/BigDL/tree/main/python/llm/example/gpu). + ### Demos See the ***optimized performance*** of `chatglm2-6b`, `llama-2-13b-chat`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below. 
diff --git a/python/llm/example/gpu/README.md b/python/llm/example/gpu/README.md new file mode 100644 index 00000000..d5c4dff3 --- /dev/null +++ b/python/llm/example/gpu/README.md @@ -0,0 +1,24 @@ +# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs +You can use BigDL-LLM to run almost any Hugging Face Transformers model with INT4 optimizations on laptops with Intel GPUs. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. + +## Verified Hardware Platforms + +- Intel Arc™ A-Series Graphics +- Intel Data Center GPU Flex Series + +## Recommended Requirements +To apply Intel GPU acceleration, there are several steps for tool installation and environment preparation. + +Step 1: only Linux is supported for now; Ubuntu 22.04 is preferred. + +Step 2: please refer to our [driver installation](https://dgpu-docs.intel.com/driver/installation.html) guide for general-purpose GPU capabilities. + +Step 3: you also need to download and install the [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). oneMKL and the DPC++ compiler are required; the other components are optional. +> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit version >= 2023.2.0. 
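On a fresh Ubuntu 22.04 machine, the three preparation steps above can be sketched roughly as follows. The apt package names are assumptions based on Intel's public GPU driver instructions at the time of this PR; the linked driver and oneAPI guides remain the authoritative source:

```bash
# Step 1: confirm a supported OS (only Linux; Ubuntu 22.04 preferred)
lsb_release -ds

# Step 2: install the Intel GPU driver/runtime stack
# (package names are assumptions -- follow the linked driver guide
# for your exact distribution and kernel)
sudo apt-get update
sudo apt-get install -y intel-opencl-icd intel-level-zero-gpu level-zero

# Step 3: after installing Intel oneAPI Base Toolkit >= 2023.2.0
# (oneMKL and the DPC++ compiler), load its environment
# (default install prefix assumed)
source /opt/intel/oneapi/setvars.sh
```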
+ +## Best Known Configuration on Linux +For better performance, it is recommended to set environment variables on Linux: +```bash +export USE_XETLA=OFF +export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 +``` diff --git a/python/llm/example/transformers/transformers_int4/GPU/baichuan/README.md b/python/llm/example/gpu/baichuan/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/baichuan/README.md rename to python/llm/example/gpu/baichuan/README.md diff --git a/python/llm/example/transformers/transformers_int4/GPU/baichuan/generate.py b/python/llm/example/gpu/baichuan/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/baichuan/generate.py rename to python/llm/example/gpu/baichuan/generate.py diff --git a/python/llm/example/transformers/transformers_int4/GPU/chatglm2/README.md b/python/llm/example/gpu/chatglm2/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/chatglm2/README.md rename to python/llm/example/gpu/chatglm2/README.md diff --git a/python/llm/example/transformers/transformers_int4/GPU/chatglm2/generate.py b/python/llm/example/gpu/chatglm2/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/chatglm2/generate.py rename to python/llm/example/gpu/chatglm2/generate.py diff --git a/python/llm/example/transformers/transformers_int4/GPU/chatglm2/streamchat.py b/python/llm/example/gpu/chatglm2/streamchat.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/chatglm2/streamchat.py rename to python/llm/example/gpu/chatglm2/streamchat.py diff --git a/python/llm/example/transformers/transformers_int4/GPU/falcon/README.md b/python/llm/example/gpu/falcon/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/falcon/README.md rename to python/llm/example/gpu/falcon/README.md diff --git 
a/python/llm/example/transformers/transformers_int4/GPU/falcon/falcon-7b-instruct/modelling_RW.py b/python/llm/example/gpu/falcon/falcon-7b-instruct/modelling_RW.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/falcon/falcon-7b-instruct/modelling_RW.py rename to python/llm/example/gpu/falcon/falcon-7b-instruct/modelling_RW.py diff --git a/python/llm/example/transformers/transformers_int4/GPU/falcon/generate.py b/python/llm/example/gpu/falcon/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/falcon/generate.py rename to python/llm/example/gpu/falcon/generate.py diff --git a/python/llm/example/transformers/transformers_int4/GPU/internlm/README.md b/python/llm/example/gpu/internlm/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/internlm/README.md rename to python/llm/example/gpu/internlm/README.md diff --git a/python/llm/example/transformers/transformers_int4/GPU/internlm/generate.py b/python/llm/example/gpu/internlm/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/internlm/generate.py rename to python/llm/example/gpu/internlm/generate.py diff --git a/python/llm/example/transformers/transformers_int4/GPU/llama2/README.md b/python/llm/example/gpu/llama2/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/llama2/README.md rename to python/llm/example/gpu/llama2/README.md diff --git a/python/llm/example/transformers/transformers_int4/GPU/llama2/generate.py b/python/llm/example/gpu/llama2/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/llama2/generate.py rename to python/llm/example/gpu/llama2/generate.py diff --git a/python/llm/example/transformers/transformers_int4/GPU/mpt/README.md b/python/llm/example/gpu/mpt/README.md similarity index 100% rename from 
python/llm/example/transformers/transformers_int4/GPU/mpt/README.md rename to python/llm/example/gpu/mpt/README.md diff --git a/python/llm/example/transformers/transformers_int4/GPU/mpt/generate.py b/python/llm/example/gpu/mpt/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/mpt/generate.py rename to python/llm/example/gpu/mpt/generate.py diff --git a/python/llm/example/transformers/transformers_int4/GPU/qwen/README.md b/python/llm/example/gpu/qwen/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/qwen/README.md rename to python/llm/example/gpu/qwen/README.md diff --git a/python/llm/example/transformers/transformers_int4/GPU/qwen/generate.py b/python/llm/example/gpu/qwen/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/qwen/generate.py rename to python/llm/example/gpu/qwen/generate.py diff --git a/python/llm/example/transformers/transformers_int4/GPU/starcoder/generate.py b/python/llm/example/gpu/starcoder/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/starcoder/generate.py rename to python/llm/example/gpu/starcoder/generate.py diff --git a/python/llm/example/transformers/transformers_int4/GPU/starcoder/readme.md b/python/llm/example/gpu/starcoder/readme.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/starcoder/readme.md rename to python/llm/example/gpu/starcoder/readme.md diff --git a/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/README.md b/python/llm/example/gpu/voiceassistant/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/voiceassistant/README.md rename to python/llm/example/gpu/voiceassistant/README.md diff --git a/python/llm/example/transformers/transformers_int4/GPU/voiceassistant/generate.py b/python/llm/example/gpu/voiceassistant/generate.py 
similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/voiceassistant/generate.py rename to python/llm/example/gpu/voiceassistant/generate.py diff --git a/python/llm/example/transformers/transformers_int4/GPU/whisper/readme.md b/python/llm/example/gpu/whisper/readme.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/whisper/readme.md rename to python/llm/example/gpu/whisper/readme.md diff --git a/python/llm/example/transformers/transformers_int4/GPU/whisper/recognize.py b/python/llm/example/gpu/whisper/recognize.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/GPU/whisper/recognize.py rename to python/llm/example/gpu/whisper/recognize.py diff --git a/python/llm/example/transformers/transformers_int4/GPU/README.md b/python/llm/example/transformers/transformers_int4/GPU/README.md index d5c4dff3..f12e7824 100644 --- a/python/llm/example/transformers/transformers_int4/GPU/README.md +++ b/python/llm/example/transformers/transformers_int4/GPU/README.md @@ -1,24 +1 @@ -# BigDL-LLM Transformers INT4 Optimization for Large Language Model on Intel GPUs -You can use BigDL-LLM to run almost every Huggingface Transformer models with INT4 optimizations on your laptops with Intel GPUs. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. - -## Verified Hardware Platforms - -- Intel Arc™ A-Series Graphics -- Intel Data Center GPU Flex Series - -## Recommended Requirements -To apply Intel GPU acceleration, there’re several steps for tools installation and environment preparation. - -Step 1, only Linux system is supported now, Ubuntu 22.04 is prefered. 
- -Step 2, please refer to our [drive installation](https://dgpu-docs.intel.com/driver/installation.html) for general purpose GPU capabilities. - -Step 3, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and DPC++ compiler are needed, others are optional. -> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0. - -## Best Known Configuration on Linux -For better performance, it is recommended to set environment variables on Linux: -```bash -export USE_XETLA=OFF -export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 -``` +### The GPU examples for `bigdl-llm` have been moved to [here](../../../gpu).
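Taken together, the relocated GPU examples can be tried end to end with a session along these lines. The pip wheel index URL is an assumption based on the install conventions of this era of `bigdl-llm`, and `generate.py` is one of the scripts renamed in this PR; the per-model READMEs should be checked for exact commands:

```bash
# Activate oneAPI (default install prefix assumed)
source /opt/intel/oneapi/setvars.sh

# Best-known configuration from the new GPU README
export USE_XETLA=OFF
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1

# Install bigdl-llm with XPU support (wheel index URL is an assumption)
pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-whl-stable-xpu

# Run one of the relocated examples, e.g. LLaMA 2
cd python/llm/example/gpu/llama2
python generate.py
```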