Update readme (#9788)
parent c38e18f2ff
commit 361781bcd0
4 changed files with 16 additions and 11 deletions
README.md
@@ -12,14 +12,14 @@
> *It is built on the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [gptq](https://github.com/IST-DASLab/gptq), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [awq](https://github.com/mit-han-lab/llm-awq), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [vLLM](https://github.com/vllm-project/vllm), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.*
### Latest update :fire:
- [2023/12] `bigdl-llm` now supports [ReLoRA](python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora#relora) (see *["ReLoRA: High-Rank Training Through Low-Rank Updates"](https://arxiv.org/abs/2307.05695)*)
- [2023/12] `bigdl-llm` now supports [Mixtral-8x7B](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) on both Intel [GPU](python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral) and [CPU](python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral).
- [2023/12] `bigdl-llm` now supports [QA-LoRA](python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora#qa-lora) (see *["QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models"](https://arxiv.org/abs/2309.14717)*)
- [2023/12] `bigdl-llm` now supports [FP8 and FP4 inference](python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types) on Intel ***GPU*** (see the sketch after this list).
- [2023/11] Initial support for directly loading [GGUF](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF), [AWQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ) and [GPTQ](python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ) models into `bigdl-llm` is available.
- [2023/11] Initial support for [vLLM continuous batching](python/llm/example/CPU/vLLM-Serving) is available on Intel ***CPU***.
- [2023/11] Initial support for [vLLM continuous batching](python/llm/example/GPU/vLLM-Serving) is available on Intel ***GPU***.
- [2023/10] [QLoRA finetuning](python/llm/example/CPU/QLoRA-FineTuning) on Intel ***CPU*** is available.
- [2023/10] [QLoRA finetuning](python/llm/example/GPU/QLoRA-FineTuning) on Intel ***GPU*** is available.
- [2023/11] `bigdl-llm` now supports [vLLM continuous batching](python/llm/example/GPU/vLLM-Serving) on both Intel [GPU](python/llm/example/GPU/vLLM-Serving) and [CPU](python/llm/example/CPU/vLLM-Serving).
- [2023/10] `bigdl-llm` now supports [QLoRA finetuning](python/llm/example/GPU/QLoRA-FineTuning) on both Intel [GPU](python/llm/example/GPU/QLoRA-FineTuning) and [CPU](python/llm/example/CPU/QLoRA-FineTuning).
- [2023/10] `bigdl-llm` now supports [FastChat serving](python/llm/src/bigdl/llm/serving) on both Intel CPU and GPU.
- [2023/09] `bigdl-llm` now supports [Intel GPU](python/llm/example/GPU) (including Arc, Flex and MAX).
- [2023/09] `bigdl-llm` [tutorial](https://github.com/intel-analytics/bigdl-llm-tutorial) is released.
- [2023/09] Over 30 models have been optimized/verified on `bigdl-llm`, including *LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, LLaVA, WizardCoder, Dolly, Whisper, Baichuan/Baichuan2, InternLM, Skywork, QWen/Qwen-VL, Aquila, MOSS,* and more; see the complete list [here](#verified-models).
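The low-bit data types mentioned above (beyond the default INT4) are selected at load time. A rough, hedged sketch of what this looks like with the `bigdl.llm.transformers` API follows; the model path and the exact accepted value strings are assumptions to verify against the [More-Data-Types](python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types) example:

```python
# Hedged sketch: pick a low-bit format (e.g. FP4/FP8) when loading the model.
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "/path/to/model",        # placeholder path
    load_in_low_bit="fp8",   # e.g. "fp4", "fp8", "nf4", "sym_int4" (assumed values)
)
# For Intel GPU inference, follow with: model = model.to("xpu")
```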
@@ -89,6 +89,7 @@ output = tokenizer.batch_decode(output_ids)
#### GPU INT4
##### Install
You may install **`bigdl-llm`** on Intel GPU as follows:
> Note: See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.
```bash
# below command will install intel_extension_for_pytorch==2.0.110+xpu as default
# you can install specific ipex/torch version for your need
@@ -96,8 +97,6 @@ pip install --pre --upgrade bigdl-llm[xpu] -f https://developer.intel.com/ipex-w
```
> Note: `bigdl-llm` has been tested on Python 3.9
See the [GPU installation guide](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html) for more details.
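After installation, a quick sanity check that PyTorch can see the Intel GPU may look like the following (a minimal sketch; it assumes the `intel_extension_for_pytorch` XPU build installed by the command above):

```python
# Hedged sketch: confirm the XPU backend is available after installing bigdl-llm[xpu].
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch

print("torch:", torch.__version__, "ipex:", ipex.__version__)
print("XPU available:", torch.xpu.is_available())
print("XPU device count:", torch.xpu.device_count())
```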
##### Run Model
You may apply INT4 optimizations to any Hugging Face *Transformers* models as follows.
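As an illustration only (not necessarily the README's exact snippet), a minimal sketch of GPU INT4 loading and generation with `bigdl-llm` could look like this; the model path and prompt are placeholder assumptions:

```python
# Hedged sketch: load a Hugging Face model with INT4 optimizations via bigdl-llm
# and run generation on an Intel GPU (XPU).
import torch
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-chat-hf"   # placeholder; any supported HF model

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
model = model.to("xpu")                        # move the INT4-optimized model to the Intel GPU
tokenizer = AutoTokenizer.from_pretrained(model_path)

with torch.inference_mode():
    input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids.to("xpu")
    output_ids = model.generate(input_ids, max_new_tokens=32)

output = tokenizer.batch_decode(output_ids.cpu())
print(output[0])
```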
@@ -24,14 +24,14 @@ BigDL-LLM: low-Bit LLM library
============================================
Latest update
============================================
- [2023/12] ``bigdl-llm`` now supports `ReLoRA <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora#relora>`_ (see `"ReLoRA: High-Rank Training Through Low-Rank Updates" <https://arxiv.org/abs/2307.05695>`_)
- [2023/12] ``bigdl-llm`` now supports `Mixtral-8x7B <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral>`_ on both Intel `GPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mixtral>`_ and `CPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mixtral>`_.
- [2023/12] ``bigdl-llm`` now supports `QA-LoRA <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora#qa-lora>`_ (see `"QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models" <https://arxiv.org/abs/2309.14717>`_).
- [2023/12] ``bigdl-llm`` now supports `FP8 and FP4 inference <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types>`_ on Intel **GPU**.
- [2023/11] Initial support for directly loading `GGUF <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GGUF>`_, `AWQ <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/AWQ>`_ and `GPTQ <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Advanced-Quantizations/GPTQ>`_ models into ``bigdl-llm`` is available.
- [2023/11] Initial support for `vLLM continuous batching <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/vLLM-Serving>`_ is available on Intel **CPU**.
- [2023/11] Initial support for `vLLM continuous batching <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/vLLM-Serving>`_ is available on Intel **GPU**.
- [2023/10] ``bigdl-llm`` now supports `QLoRA finetuning <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning>`_ on Intel **CPU**.
- [2023/10] ``bigdl-llm`` now supports `QLoRA finetuning <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/QLoRA-FineTuning>`_ on Intel **GPU**.
- [2023/11] ``bigdl-llm`` now supports `vLLM continuous batching <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/vLLM-Serving>`_ on both Intel `GPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/vLLM-Serving>`_ and `CPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/vLLM-Serving>`_.
- [2023/10] ``bigdl-llm`` now supports `QLoRA finetuning <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/QLoRA-FineTuning>`_ on both Intel `GPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/QLoRA-FineTuning>`_ and `CPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning>`_.
- [2023/10] ``bigdl-llm`` now supports `FastChat serving <https://github.com/intel-analytics/BigDL/tree/main/python/llm/src/bigdl/llm/serving>`_ on both Intel CPU and GPU.
- [2023/09] ``bigdl-llm`` now supports `Intel GPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU>`_ (including Arc, Flex and MAX).
- [2023/09] ``bigdl-llm`` `tutorial <https://github.com/intel-analytics/bigdl-llm-tutorial>`_ is released.
- Over 30 models have been verified on ``bigdl-llm``, including *LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, LLaVA, WizardCoder, Dolly, Whisper, Baichuan/Baichuan2, InternLM, Skywork, QWen/Qwen-VL, Aquila, MOSS* and more; see the complete list `here <https://github.com/intel-analytics/bigdl#verified-models>`_.
@@ -113,6 +113,10 @@ GPU Quickstart
You may install ``bigdl-llm`` on Intel GPU as follows:
.. note::
   See the `GPU installation guide <https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html>`_ for more details.
.. code-block:: console
   # below command will install intel_extension_for_pytorch==2.0.110+xpu as default
@@ -5,3 +5,4 @@ This folder contains examples of running any Hugging Face Transformers model on
- [Model](Model): examples of running Hugging Face Transformers models (e.g., LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using INT4 optimizations
- [More-Data-Types](More-Data-Types): examples of applying other low bit optimizations (NF4/INT5/INT8, etc.)
- [Save-Load](Save-Load): examples of saving and loading low-bit models
- [Advanced-Quantizations](Advanced-Quantizations): examples of loading GGUF/AWQ/GPTQ models
@@ -5,3 +5,4 @@ This folder contains examples of running any Hugging Face Transformers model on
- [Model](Model): examples of running Hugging Face Transformers models (e.g., LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using INT4 optimizations
- [More-Data-Types](More-Data-Types): examples of applying other low bit optimizations (NF4/INT5/INT8, etc.)
- [Save-Load](Save-Load): examples of saving and loading low-bit models
- [Advanced-Quantizations](Advanced-Quantizations): examples of loading GGUF/AWQ/GPTQ models