Update README (#10186)
parent f7c96b19ef
commit 4655005f24

2 changed files with 12 additions and 6 deletions
README (Markdown):

```diff
@@ -7,11 +7,12 @@
 ---
 ## BigDL-LLM
 
-**[`bigdl-llm`](python/llm)** is a library for running **LLM** (large language model) on Intel **XPU** (from *Laptop* to *GPU* to *Cloud*) using **INT4/FP4/INT8/FP8** with very low latency[^1] (for any **PyTorch** model).
+**`bigdl-llm`** is a library for running **LLM** (large language model) on Intel **XPU** (from *Laptop* to *GPU* to *Cloud*) using **INT4/FP4/INT8/FP8** with very low latency[^1] (for any **PyTorch** model).
 
 > *It is built on the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [gptq](https://github.com/IST-DASLab/gptq), [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ), [awq](https://github.com/mit-han-lab/llm-awq), [AutoAWQ](https://github.com/casper-hansen/AutoAWQ), [vLLM](https://github.com/vllm-project/vllm), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.*
 
 ### Latest update 🔥
+- [2024/02] Users can now use `bigdl-llm` through [Text-Generation-WebUI](https://github.com/intel-analytics/text-generation-webui) GUI.
 - [2024/02] `bigdl-llm` now supports *[Self-Speculative Decoding](https://bigdl.readthedocs.io/en/latest/doc/LLM/Inference/Self_Speculative_Decoding.html)*, which in practice brings **~30% speedup** for FP16 and BF16 inference latency on Intel [GPU](python/llm/example/GPU/Speculative-Decoding) and [CPU](python/llm/example/CPU/Speculative-Decoding) respectively.
 - [2024/02] `bigdl-llm` now supports a comprehensive list of LLM finetuning on Intel GPU (including [LoRA](python/llm/example/GPU/LLM-Finetuning/LoRA), [QLoRA](python/llm/example/GPU/LLM-Finetuning/QLoRA), [DPO](python/llm/example/GPU/LLM-Finetuning/DPO), [QA-LoRA](python/llm/example/GPU/LLM-Finetuning/QA-LoRA) and [ReLoRA](python/llm/example/GPU/LLM-Finetuning/ReLora)).
 - [2024/01] Using `bigdl-llm` [QLoRA](python/llm/example/GPU/LLM-Finetuning/QLoRA), we managed to finetune LLaMA2-7B in **21 minutes** and LLaMA2-70B in **3.14 hours** on 8 Intel Max 1550 GPUs for [Stanford-Alpaca](python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora) (see the blog [here](https://www.intel.com/content/www/us/en/developer/articles/technical/finetuning-llms-on-intel-gpus-using-bigdl-llm.html)).
```
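The intro line above advertises low-bit inference "for any **PyTorch** model". That generic path is not shown in this diff; the sketch below is a rough illustration based on the library's documented `optimize_model` helper, where the model id, prompt, and the INT4 default are placeholders/assumptions rather than part of this commit:

```python
# Minimal sketch of applying bigdl-llm low-bit optimization to an arbitrary
# Hugging Face / PyTorch model; model id and prompt are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from bigdl.llm import optimize_model

model_id = "openlm-research/open_llama_3b"   # placeholder model
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", low_cpu_mem_usage=True)
model = optimize_model(model)                # applies low-bit (INT4 by default) optimization

tokenizer = AutoTokenizer.from_pretrained(model_id)
inputs = tokenizer("What is AI?", return_tensors="pt")
output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```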
Documentation landing page (reStructuredText):

```diff
@@ -8,13 +8,13 @@ The BigDL Project
 ------
 
 ************************************************
-BigDL-LLM: low-Bit LLM library
+BigDL-LLM
 ************************************************
 
 .. raw:: html
 
    <p>
-      <a href="https://github.com/intel-analytics/BigDL/tree/main/python/llm"><code><span>bigdl-llm</span></code></a> is a library for running <strong>LLM</strong> (large language model) on Intel <strong>XPU</strong> (from <em>Laptop</em> to <em>GPU</em> to <em>Cloud</em>) using <strong>INT4/FP4/INT8/FP8</strong> with very low latency <sup><a href="#footnote-perf" id="ref-perf">[1]</a></sup> (for any <strong>PyTorch</strong> model).
+      <a href="https://github.com/intel-analytics/BigDL/"><code><span>bigdl-llm</span></code></a> is a library for running <strong>LLM</strong> (large language model) on Intel <strong>XPU</strong> (from <em>Laptop</em> to <em>GPU</em> to <em>Cloud</em>) using <strong>INT4/FP4/INT8/FP8</strong> with very low latency <sup><a href="#footnote-perf" id="ref-perf">[1]</a></sup> (for any <strong>PyTorch</strong> model).
    </p>
 
 .. note::
```
```diff
@@ -24,6 +24,7 @@ BigDL-LLM: low-Bit LLM library
 ============================================
 Latest update 🔥
 ============================================
+- [2024/02] Users can now use ``bigdl-llm`` through `Text-Generation-WebUI <https://github.com/intel-analytics/text-generation-webui>`_ GUI.
 - [2024/02] ``bigdl-llm`` now supports `Self-Speculative Decoding <doc/LLM/Inference/Self_Speculative_Decoding.html>`_, which in practice brings **~30% speedup** for FP16 and BF16 inference latency on Intel `GPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/Speculative-Decoding>`_ and `CPU <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/Speculative-Decoding>`_ respectively.
 - [2024/02] ``bigdl-llm`` now supports a comprehensive list of LLM finetuning on Intel GPU (including `LoRA <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/LLM-Finetuning/LoRA>`_, `QLoRA <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/LLM-Finetuning/QLoRA>`_, `DPO <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/LLM-Finetuning/DPO>`_, `QA-LoRA <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/LLM-Finetuning/QA-LoRA>`_ and `ReLoRA <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/LLM-Finetuning/ReLora>`_).
 - [2024/01] Using ``bigdl-llm`` `QLoRA <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/LLM-Finetuning/QLoRA>`_, we managed to finetune LLaMA2-7B in **21 minutes** and LLaMA2-70B in **3.14 hours** on 8 Intel Max 1550 GPUs for `Stanford-Alpaca <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/LLM-Finetuning/QLoRA/alpaca-qlora>`_ (see the blog `here <https://www.intel.com/content/www/us/en/developer/articles/technical/finetuning-llms-on-intel-gpus-using-bigdl-llm.html>`_).
```
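The 2024/01 QLoRA item links to a full example rather than showing code. The following is a minimal sketch of that setup, modelled on the linked GPU QLoRA example; the module path `bigdl.llm.transformers.qlora`, the base model, and the LoRA hyperparameters are illustrative assumptions, and the linked example remains the reference:

```python
# Illustrative sketch of bigdl-llm QLoRA setup on an Intel GPU; base model and
# LoRA hyperparameters are placeholders, not values from this commit.
import torch
from peft import LoraConfig
from bigdl.llm.transformers import AutoModelForCausalLM
from bigdl.llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training

base = "meta-llama/Llama-2-7b-hf"                 # placeholder base model
model = AutoModelForCausalLM.from_pretrained(
    base,
    load_in_low_bit="nf4",                        # 4-bit NF4 weights for QLoRA
    optimize_model=False,
    torch_dtype=torch.bfloat16,
)
model = model.to("xpu")                           # Intel GPU device
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05, bias="none",
    target_modules=["q_proj", "k_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
# ...then run a standard transformers Trainer loop, as in the linked alpaca-qlora example.
```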
```diff
@@ -88,13 +89,17 @@ CPU Quickstart
 
 You may install ``bigdl-llm`` on Intel CPU as follows:
 
+.. note::
+
+   See the `CPU installation guide <doc/LLM/Overview/install_cpu.html>`_ for more details.
+
 .. code-block:: console
 
    pip install --pre --upgrade bigdl-llm[all]
 
 .. note::
 
-   ``bigdl-llm`` has been tested on Python 3.9.
+   ``bigdl-llm`` has been tested on Python 3.9, 3.10 and 3.11.
 
 You can then apply INT4 optimizations to any Hugging Face *Transformers* models as follows.
 
```
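The code that "as follows" points to sits outside this hunk. For orientation, the CPU INT4 flow is roughly the sketch below, using the library's transformers-style API; the model id and prompt are placeholders:

```python
# Sketch of the CPU INT4 flow referenced above; model id and prompt are placeholders.
from bigdl.llm.transformers import AutoModelForCausalLM   # drop-in replacement for transformers' class
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"                 # placeholder Hugging Face model
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)  # quantize to INT4 on load
tokenizer = AutoTokenizer.from_pretrained(model_id)

input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```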
```diff
@@ -119,7 +124,7 @@ You may install ``bigdl-llm`` on Intel GPU as follows:
 
 .. note::
 
-   See the `GPU installation guide <https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html>`_ for more details.
+   See the `GPU installation guide <doc/LLM/Overview/install_gpu.html>`_ for more details.
 
 .. code-block:: console
 
```
			
		||||
```diff
@@ -128,7 +133,7 @@ You may install ``bigdl-llm`` on Intel GPU as follows:
 
 .. note::
 
-   ``bigdl-llm`` has been tested on Python 3.9.
+   ``bigdl-llm`` has been tested on Python 3.9, 3.10 and 3.11.
 
 You can then apply INT4 optimizations to any Hugging Face *Transformers* models on Intel GPU as follows.
 
```
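Likewise, the GPU code referenced by the last context line is outside this hunk. A minimal sketch, assuming the same transformers-style API plus a move to the `xpu` device; the model id and prompt are placeholders:

```python
# Sketch of the GPU INT4 flow referenced above; model id and prompt are placeholders.
import torch
import intel_extension_for_pytorch as ipex                 # import registers the Intel 'xpu' device
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"                 # placeholder Hugging Face model
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True)
model = model.to("xpu")                                    # move the INT4 model to the Intel GPU
tokenizer = AutoTokenizer.from_pretrained(model_id)

input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids.to("xpu")
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0].cpu(), skip_special_tokens=True))
```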