From 361781bcd0ca0ae41fb9134b2290e3a5a4992390 Mon Sep 17 00:00:00 2001
From: Jason Dai
Date: Tue, 26 Dec 2023 19:46:11 +0800
Subject: [PATCH] Update readme (#9788)

---
 README.md                                    | 13 ++++++-------
 docs/readthedocs/source/index.rst            | 12 ++++++++----
 .../CPU/HF-Transformers-AutoModels/README.md |  1 +
 .../GPU/HF-Transformers-AutoModels/README.md |  1 +
 4 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/README.md b/README.md
index a26b6080..2ba750a4 100644
--- a/README.md
+++ b/README.md
@@ -12,14 +12,14 @@
 > *It is built on the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [gptq](https://github.com/IST-DASLab/gptq), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [awq](https://github.com/mit-han-lab/llm-awq), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [vLLM](https://github.com/vllm-project/vllm), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.*
 
 ### Latest update :fire:
+- [2023/12] `bigdl-llm` now supports [ReLoRA](python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora#relora) (see *["ReLoRA: High-Rank Training Through Low-Rank Updates"](https://arxiv.org/abs/2307.05695)*)
 - [2023/12] `bigdl-llm` now supports [Mixtral-8x7B](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) on both Intel [GPU](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) and [CPU](python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral).
 - [2023/12] `bigdl-llm` now supports [QA-LoRA](python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora#qa-lora) (see *["QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models"](https://arxiv.org/abs/2309.14717)*)
 - [2023/12] `bigdl-llm` now supports [FP8 and FP4 inference](python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types) on Intel ***GPU***.
-- [2023/11] Initial support for directly loading [GGUF](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF), [AWQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ) and [GPTQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ) models in to `bigdl-llm` is available.
-- [2023/11] Initial support for [vLLM continuous batching](python/llm/example/CPU/vLLM-Serving) is availabe on Intel ***CPU***.
-- [2023/11] Initial support for [vLLM continuous batching](python/llm/example/GPU/vLLM-Serving) is availabe on Intel ***GPU***.
-- [2023/10] [QLoRA finetuning](python/llm/example/CPU/QLoRA-FineTuning) on Intel ***CPU*** is available.
-- [2023/10] [QLoRA finetuning](python/llm/example/GPU/QLoRA-FineTuning) on Intel ***GPU*** is available.
+- [2023/11] Initial support for directly loading [GGUF](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF), [AWQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ) and [GPTQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ) models into `bigdl-llm` is available.
+- [2023/11] `bigdl-llm` now supports [vLLM continuous batching](python/llm/example/GPU/vLLM-Serving) on both Intel [GPU](python/llm/example/GPU/vLLM-Serving) and [CPU](python/llm/example/CPU/vLLM-Serving).
+- [2023/10] `bigdl-llm` now supports [QLoRA finetuning](python/llm/example/GPU/QLoRA-FineTuning) on both Intel [GPU](python/llm/example/GPU/QLoRA-FineTuning) and [CPU](python/llm/example/CPU/QLoRA-FineTuning).
+- [2023/10] `bigdl-llm` now supports [FastChat serving](python/llm/src/bigdl/llm/serving) on both Intel CPU and GPU.
 - [2023/09] `bigdl-llm` now supports [Intel GPU](python/llm/example/GPU) (including Arc, Flex and MAX)
 - [2023/09] `bigdl-llm` [tutorial](https://github.com/intel-analytics/bigdl-llm-tutorial) is released.
 - [2023/09] Over 30 models have been optimized/verified on `bigdl-llm`, including *LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, LLaVA, WizardCoder, Dolly, Whisper, Baichuan/Baichuan2, InternLM, Skywork, QWen/Qwen-VL, Aquila, MOSS,* and more; see the complete list [here](#verified-models).
@@ -89,6 +89,7 @@ output = tokenizer.batch_decode(output_ids)
 #### GPU INT4
 ##### Install
 You may install **`bigdl-llm`** on Intel GPU as follows:
+> Note: See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.
 ```bash
 # below command will install intel_extension_for_pytorch==2.0.110+xpu as default
 # you can install specific ipex/torch version for your need
@@ -96,8 +97,6 @@ pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-w
 ```
 > Note: `bigdl-llm` has been tested on Python 3.9
 
-See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for mode details.
-
 ##### Run Model
 You may apply INT4 optimizations to any Hugging Face *Transformers* models as follows.
 
diff --git a/docs/readthedocs/source/index.rst b/docs/readthedocs/source/index.rst
index 4a696bfc..1ccae5db 100644
--- a/docs/readthedocs/source/index.rst
+++ b/docs/readthedocs/source/index.rst
@@ -24,14 +24,14 @@ BigDL-LLM: low-Bit LLM library
 ============================================
 Latest update
 ============================================
+- [2023/12] ``bigdl-llm`` now supports `ReLoRA `_ (see `"ReLoRA: High-Rank Training Through Low-Rank Updates" `_)
 - [2023/12] ``bigdl-llm`` now supports `Mixtral-8x7B `_ on both Intel `GPU `_ and `CPU `_.
 - [2023/12] ``bigdl-llm`` now supports `QA-LoRA `_ (see `"QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models" `_).
 - [2023/12] ``bigdl-llm`` now supports `FP8 and FP4 inference `_ on Intel **GPU**.
 - [2023/11] Initial support for directly loading `GGUF `_, `AWQ `_ and `GPTQ `_ models into ``bigdl-llm`` is available.
-- [2023/11] Initial support for `vLLM continuous batching `_ is availabe on Intel **CPU**.
-- [2023/11] Initial support for `vLLM continuous batching `_ is availabe on Intel **GPU**.
-- [2023/10] ``bigdl-llm`` now supports `QLoRA finetuning `_ on Intel **CPU**.
-- [2023/10] ``bigdl-llm`` now supports `QLoRA finetuning `_ on Intel **GPU**.
+- [2023/11] ``bigdl-llm`` now supports `vLLM continuous batching `_ on both Intel `GPU `_ and `CPU `_.
+- [2023/10] ``bigdl-llm`` now supports `QLoRA finetuning `_ on both Intel `GPU `_ and `CPU `_.
+- [2023/10] ``bigdl-llm`` now supports `FastChat serving `_ on both Intel CPU and GPU.
 - [2023/09] ``bigdl-llm`` now supports `Intel GPU `_ (including Arc, Flex and MAX)
 - [2023/09] ``bigdl-llm`` `tutorial `_ is released.
 - Over 30 models have been verified on ``bigdl-llm``, including *LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, LLaVA, WizardCoder, Dolly, Whisper, Baichuan/Baichuan2, InternLM, Skywork, QWen/Qwen-VL, Aquila, MOSS* and more; see the complete list `here `_.
@@ -113,6 +113,10 @@ GPU Quickstart
 
 You may install ``bigdl-llm`` on Intel GPU as follows:
 
+.. note::
+
+   See the `GPU installation guide `_ for more details.
+
 .. code-block:: console
 
    # below command will install intel_extension_for_pytorch==2.0.110+xpu as default
diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/README.md
index e0cebde5..ebe9f774 100644
--- a/python/llm/example/CPU/HF-Transformers-AutoModels/README.md
+++ b/python/llm/example/CPU/HF-Transformers-AutoModels/README.md
@@ -5,3 +5,4 @@ This folder contains examples of running any Hugging Face Transformers model on
 - [Model](Model): examples of running Hugging Face Transformers models (e.g., LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using INT4 optimizations
 - [More-Data-Types](More-Data-Types): examples of applying other low bit optimizations (NF4/INT5/INT8, etc.)
 - [Save-Load](Save-Load): examples of saving and loading low-bit models
+- [Advanced-Quantizations](Advanced-Quantizations): examples of loading GGUF/AWQ/GPTQ models
diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/README.md
index da1a13d6..18281167 100644
--- a/python/llm/example/GPU/HF-Transformers-AutoModels/README.md
+++ b/python/llm/example/GPU/HF-Transformers-AutoModels/README.md
@@ -5,3 +5,4 @@ This folder contains examples of running any Hugging Face Transformers model on
 - [Model](Model): examples of running Hugging Face Transformers models (e.g., LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using INT4 optimizations
 - [More-Data-Types](More-Data-Types): examples of applying other low bit optimizations (NF4/INT5/INT8, etc.)
 - [Save-Load](Save-Load): examples of saving and loading low-bit models
+- [Advanced-Quantizations](Advanced-Quantizations): examples of loading GGUF/AWQ/GPTQ models
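
The `##### Run Model` hunk above is cut off just before the README's actual code snippet, so for reference here is a minimal sketch of the usage that section describes: applying `bigdl-llm` INT4 optimizations to a Hugging Face *Transformers* model and running it on an Intel GPU. It assumes the standard `bigdl-llm` Transformers-style API (`from bigdl.llm.transformers import AutoModelForCausalLM`, `load_in_4bit=True`, and the `xpu` device); the model path, prompt, and generation length are illustrative placeholders, not values taken from this patch.

```python
# Minimal sketch (assumed API): load a Hugging Face Transformers model with
# bigdl-llm INT4 optimizations and run it on an Intel GPU ("xpu" device).
import intel_extension_for_pytorch as ipex  # imported for its side effect: registers the xpu device
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM  # drop-in replacement for the HF class

model_path = "path/to/your/model"  # placeholder: any supported Hugging Face checkpoint

# load_in_4bit=True applies the low-bit (INT4) optimization to the weights at load time
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
model = model.to("xpu")  # move the optimized model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path)
input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to("xpu")
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.batch_decode(output_ids))
```

On CPU the same pattern applies, simply without the `.to("xpu")` calls.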