Update llm document (#8784)

Jason Dai authored on 2023-08-21 22:34:44 +08:00; committed by GitHub
parent 611c1fb628
commit dcadd09154
3 changed files with 50 additions and 32 deletions

README.md

@@ -9,11 +9,11 @@ _**Fast, Distributed, Secure AI for Big Data**_
---
## Latest News
-- **Try the latest [`bigdl-llm`](python/llm) for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
+- **Try the latest [`bigdl-llm`](python/llm) library for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
<p align="center">
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-13b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='30%' />
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models2.png" width='76%'/>
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llama-2-13b-chat.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='30%' />
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models3.png" width='76%'/>
</p>
- **[Update] Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models).
@@ -23,9 +23,11 @@ _**Fast, Distributed, Secure AI for Big Data**_
BigDL seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries:
+- [LLM](python/llm): Low-bit (INT3/INT4/INT5/INT8) large language model library for Intel CPU/GPU
- [Orca](#orca): Distributed Big Data & AI (TF & PyTorch) Pipeline on Spark and Ray
-- [Nano](#nano): Transparent Acceleration of Tensorflow & PyTorch Programs on XPU
+- [Nano](#nano): Transparent Acceleration of Tensorflow & PyTorch Programs on Intel CPU/GPU
- [DLlib](#dllib): “Equivalent of Spark MLlib” for Deep Learning
@@ -33,7 +35,7 @@ BigDL seamlessly scales your data analytics & AI applications from laptop to cloud
- [Friesian](#friesian): End-to-End Recommendation Systems
-- [PPML](#ppml): Secure Big Data and AI (with SGX Hardware Security)
+- [PPML](#ppml): Secure Big Data and AI (with SGX/TDX Hardware Security)
For more information, you may [read the docs](https://bigdl.readthedocs.io/).
@@ -47,13 +49,15 @@ flowchart TD;
Feature1-- "Yes" -->ReferPPML([<em><strong>PPML</strong></em>]);
Feature2-- Python -->Feature3{{What type of application?}};
Feature2-- Scala/Java -->ReferDLlib([<em><strong>DLlib</strong></em>]);
Feature3-- "Distributed Big Data + AI (TF/PyTorch)" -->ReferOrca([<em><strong>Orca</strong></em>]);
Feature3-- "Large Language Model" -->ReferLLM([<em><strong>LLM</strong></em>]);
Feature3-- "Big Data + AI (TF/PyTorch)" -->ReferOrca([<em><strong>Orca</strong></em>]);
Feature3-- Accelerate TensorFlow / PyTorch -->ReferNano([<em><strong>Nano</strong></em>]);
Feature3-- DL for Spark MLlib -->ReferDLlib2([<em><strong>DLlib</strong></em>]);
Feature3-- High Level App Framework -->Feature4{{Domain?}};
Feature4-- Time Series -->ReferChronos([<em><strong>Chronos</strong></em>]);
Feature4-- Recommender System -->ReferFriesian([<em><strong>Friesian</strong></em>]);
+click ReferLLM "https://github.com/intel-analytics/bigdl/tree/main/python/llm"
click ReferNano "https://github.com/intel-analytics/bigdl#nano"
click ReferOrca "https://github.com/intel-analytics/bigdl#orca"
click ReferDLlib "https://github.com/intel-analytics/bigdl#dllib"
@@ -64,7 +68,7 @@ flowchart TD;
classDef ReferStyle1 fill:#5099ce,stroke:#5099ce;
classDef Feature fill:#FFF,stroke:#08409c,stroke-width:1px;
-class ReferNano,ReferOrca,ReferDLlib,ReferDLlib2,ReferChronos,ReferFriesian,ReferPPML ReferStyle1;
+class ReferLLM,ReferNano,ReferOrca,ReferDLlib,ReferDLlib2,ReferChronos,ReferFriesian,ReferPPML ReferStyle1;
class Feature1,Feature2,Feature3,Feature4,Feature5,Feature6,Feature7 Feature;
```
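For readers skimming this diff, the `transformers`-style INT4 usage that the news bullet advertises looks roughly like the sketch below. This is an illustrative sketch, not part of the commit: it assumes `bigdl-llm` is installed (e.g. via `pip install bigdl-llm[all]`), and the model path and prompt are placeholders.

```python
# Sketch of the transformers-style INT4 API referenced in the news bullet.
# '/path/to/model/' and the prompt are placeholders.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# load any Hugging Face Transformers model, quantizing it to INT4 on load
model = AutoModelForCausalLM.from_pretrained('/path/to/model/',
                                             load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained('/path/to/model/')

# run inference exactly as with a stock Transformers model
input_ids = tokenizer.encode("What is AI?", return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```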

docs/readthedocs/source/index.rst

@@ -4,10 +4,20 @@
BigDL: fast, distributed, secure AI for Big Data
=================================================
+Latest News
+---------------------------------
+- **Try the latest** `bigdl-llm <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_ **library for running LLM (large language model) on your Intel laptop using INT4 with very low latency!** [*]_ *(It is built on top of the excellent work of* `llama.cpp <https://github.com/ggerganov/llama.cpp>`_, `gptq <https://github.com/IST-DASLab/gptq>`_, `bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`_, *etc., and supports any Hugging Face Transformers model.)*
+- **[Update] Over a dozen models have been verified on** `bigdl-llm <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list `here <https://github.com/intel-analytics/BigDL/tree/main/python/llm/README.md#verified-models>`_.
+------
Overview
---------------------------------
`BigDL <https://github.com/intel-analytics/bigdl>`_ seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries:
+- `LLM <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_: Low-bit (INT3/INT4/INT5/INT8) large language model library for Intel CPU/GPU
- `Orca <doc/Orca/index.html>`_: Distributed Big Data & AI (TF & PyTorch) Pipeline on Spark and Ray
-- `Nano <doc/Nano/index.html>`_: Transparent Acceleration of Tensorflow & PyTorch Programs on XPU
+- `Nano <doc/Nano/index.html>`_: Transparent Acceleration of Tensorflow & PyTorch Programs on Intel CPU/GPU
- `DLlib <doc/DLlib/index.html>`_: "Equivalent of Spark MLlib" for Deep Learning
- `Chronos <doc/Chronos/index.html>`_: Scalable Time Series Analysis using AutoML
- `Friesian <doc/Friesian/index.html>`_: End-to-End Recommendation Systems
@@ -30,6 +40,7 @@ Choosing the right BigDL library
Feature3 [label="What type of application?"]
Feature4 [label="Domain?"]
+LLM[href="https://github.com/intel-analytics/BigDL/blob/main/python/llm" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-LLM document"]
Orca[href="../doc/Orca/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Orca document"]
Nano[href="../doc/Nano/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Nano document"]
DLlib1[label="DLlib" href="../doc/DLlib/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-DLlib document"]
@@ -42,12 +53,13 @@ Choosing the right BigDL library
ArrowLabel2[label="Yes" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel3[label="Python" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel4[label="Scala/Java" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel5[label="Distributed Big Data \n + \n AI (TF/PyTorch)" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel6[label="Accelerate \n TensorFlow / PyTorch" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel7[label="DL for Spark MLlib" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel8[label="High Level App Framework" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel9[label="Time Series" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel10[label="Recommender System" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel5[label="Large Language Model" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel6[label="Big Data + \n AI (TF/PyTorch)" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel7[label="Accelerate \n TensorFlow / PyTorch" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel8[label="DL for Spark MLlib" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel9[label="High Level App Framework" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel10[label="Time Series" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel11[label="Recommender System" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
Feature1 -> ArrowLabel1[dir=none]
ArrowLabel1 -> Feature2
@@ -60,16 +72,22 @@ Choosing the right BigDL library
ArrowLabel4 -> DLlib1
Feature3 -> ArrowLabel5[dir=none]
-ArrowLabel5 -> Orca
+ArrowLabel5 -> LLM
Feature3 -> ArrowLabel6[dir=none]
-ArrowLabel6 -> Nano
+ArrowLabel6 -> Orca
Feature3 -> ArrowLabel7[dir=none]
-ArrowLabel7 -> DLlib2
+ArrowLabel7 -> Nano
Feature3 -> ArrowLabel8[dir=none]
-ArrowLabel8 -> Feature4
-Feature4 -> ArrowLabel9[dir=none]
-ArrowLabel9 -> Chronos
+ArrowLabel8 -> DLlib2
+Feature3 -> ArrowLabel9[dir=none]
+ArrowLabel9 -> Feature4
Feature4 -> ArrowLabel10[dir=none]
-ArrowLabel10 -> Friesian
+ArrowLabel10 -> Chronos
+Feature4 -> ArrowLabel11[dir=none]
+ArrowLabel11 -> Friesian
}
------
.. [*] Performance varies by use, configuration and other factors. `bigdl-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.

python/llm/README.md

@@ -5,11 +5,11 @@
>*(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.)*
### Demos
-See the ***optimized performance*** of `chatglm2-6b`, `vicuna-13b-v1.1`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.
+See the ***optimized performance*** of `chatglm2-6b`, `llama-2-13b-chat`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.
<p align="center">
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='33%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-13b.gif" width='33%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='33%' />
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models2.png" width='85%'/>
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='33%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llama-2-13b-chat.gif" width='33%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='33%' />
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models3.png" width='85%'/>
</p>
### Verified models
@@ -111,13 +111,9 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
->**Notes**:
+>**Notes**: Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; you may use the corresponding API to load the converted model. (For other models, you can use the Transformers INT4 format as described above).
-* Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; for other models, you may use the Transformers INT4 format as described above).
-* You may choose the corresponding API developed for specific native models to load the converted model.
```python
#convert the model
from bigdl.llm import llm_convert
bigdl_llm_path = llm_convert(model='/path/to/model/',
@@ -126,7 +122,7 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
#load the converted model
#switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
from bigdl.llm.transformers import LlamaForCausalLM
-llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", ...)
+llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", native=True, ...)
#run the converted model
input_ids = llm.tokenize(prompt)
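Pieced together, the native INT4 flow shown across the two hunks above would look roughly like this. Again an illustrative sketch, not part of the commit: the `outtype`/`model_family` arguments and the trailing generate/decode calls follow the conventions visible in the snippet, and all paths and the prompt are placeholders.

```python
# convert a Hugging Face model into native INT4 format (llama family shown;
# paths are placeholders)
from bigdl.llm import llm_convert
bigdl_llm_path = llm_convert(model='/path/to/model/',
                             outfile='/path/to/output/',
                             outtype='int4',
                             model_family='llama')

# load the converted model with native=True; switch to ChatGLMForCausalLM/
# GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM for other families
from bigdl.llm.transformers import LlamaForCausalLM
llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", native=True)

# run the converted model
prompt = "What is AI?"  # placeholder
input_ids = llm.tokenize(prompt)
output_ids = llm.generate(input_ids, max_new_tokens=32)
print(llm.batch_decode(output_ids))
```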