Update llm document (#8784)

Jason Dai authored on 2023-08-21 22:34:44 +08:00; committed by GitHub
parent 611c1fb628
commit dcadd09154
3 changed files with 50 additions and 32 deletions

README.md

@@ -9,11 +9,11 @@ _**Fast, Distributed, Secure AI for Big Data**_
---
## Latest News
-- **Try the latest [`bigdl-llm`](python/llm) for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
+- **Try the latest [`bigdl-llm`](python/llm) library for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
<p align="center">
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-13b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='30%' />
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models2.png" width='76%'/>
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llama-2-13b-chat.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='30%' />
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models3.png" width='76%'/>
</p>
- **[Update] Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models).
@@ -23,9 +23,11 @@ _**Fast, Distributed, Secure AI for Big Data**_
BigDL seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries:
+- [LLM](python/llm): Low-bit (INT3/INT4/INT5/INT8) large language model library for Intel CPU/GPU
- [Orca](#orca): Distributed Big Data & AI (TF & PyTorch) Pipeline on Spark and Ray
-- [Nano](#nano): Transparent Acceleration of Tensorflow & PyTorch Programs on XPU
+- [Nano](#nano): Transparent Acceleration of Tensorflow & PyTorch Programs on Intel CPU/GPU
- [DLlib](#dllib): “Equivalent of Spark MLlib” for Deep Learning
@@ -33,7 +35,7 @@ BigDL seamlessly scales your data analytics & AI applications from laptop to cloud
- [Friesian](#friesian): End-to-End Recommendation Systems
-- [PPML](#ppml): Secure Big Data and AI (with SGX Hardware Security)
+- [PPML](#ppml): Secure Big Data and AI (with SGX/TDX Hardware Security)
For more information, you may [read the docs](https://bigdl.readthedocs.io/).
@@ -47,13 +49,15 @@ flowchart TD;
Feature1-- "Yes" -->ReferPPML([<em><strong>PPML</strong></em>]);
Feature2-- Python -->Feature3{{What type of application?}};
Feature2-- Scala/Java -->ReferDLlib([<em><strong>DLlib</strong></em>]);
Feature3-- "Distributed Big Data + AI (TF/PyTorch)" -->ReferOrca([<em><strong>Orca</strong></em>]);
Feature3-- "Large Language Model" -->ReferLLM([<em><strong>LLM</strong></em>]);
Feature3-- "Big Data + AI (TF/PyTorch)" -->ReferOrca([<em><strong>Orca</strong></em>]);
Feature3-- Accelerate TensorFlow / PyTorch -->ReferNano([<em><strong>Nano</strong></em>]);
Feature3-- DL for Spark MLlib -->ReferDLlib2([<em><strong>DLlib</strong></em>]);
Feature3-- High Level App Framework -->Feature4{{Domain?}};
Feature4-- Time Series -->ReferChronos([<em><strong>Chronos</strong></em>]);
Feature4-- Recommender System -->ReferFriesian([<em><strong>Friesian</strong></em>]);
+click ReferLLM "https://github.com/intel-analytics/bigdl/tree/main/python/llm"
click ReferNano "https://github.com/intel-analytics/bigdl#nano"
click ReferOrca "https://github.com/intel-analytics/bigdl#orca"
click ReferDLlib "https://github.com/intel-analytics/bigdl#dllib"
@@ -64,7 +68,7 @@ flowchart TD;
classDef ReferStyle1 fill:#5099ce,stroke:#5099ce;
classDef Feature fill:#FFF,stroke:#08409c,stroke-width:1px;
-class ReferNano,ReferOrca,ReferDLlib,ReferDLlib2,ReferChronos,ReferFriesian,ReferPPML ReferStyle1;
+class ReferLLM,ReferNano,ReferOrca,ReferDLlib,ReferDLlib2,ReferChronos,ReferFriesian,ReferPPML ReferStyle1;
class Feature1,Feature2,Feature3,Feature4,Feature5,Feature6,Feature7 Feature;
```
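For readers skimming this diff, the `transformers`-style INT4 usage that the news bullet advertises looks roughly like the sketch below. This is an illustrative sketch, not part of the commit: it assumes `bigdl-llm` is installed (e.g. via `pip install bigdl-llm[all]`), and the model path and prompt are placeholders.

```python
# Sketch of the transformers-style INT4 API referenced in the news bullet.
# '/path/to/model/' and the prompt are placeholders.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# load any Hugging Face Transformers model, quantizing it to INT4 on load
model = AutoModelForCausalLM.from_pretrained('/path/to/model/',
                                             load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained('/path/to/model/')

# run inference exactly as with a stock Transformers model
input_ids = tokenizer.encode("What is AI?", return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```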

docs/readthedocs/source/index.rst

@@ -4,10 +4,20 @@
BigDL: fast, distributed, secure AI for Big Data
=================================================
+Latest News
+---------------------------------
+- **Try the latest** `bigdl-llm <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_ **library for running LLM (large language model) on your Intel laptop using INT4 with very low latency!** [*]_ *(It is built on top of the excellent work of* `llama.cpp <https://github.com/ggerganov/llama.cpp>`_, `gptq <https://github.com/IST-DASLab/gptq>`_, `bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`_, *etc., and supports any Hugging Face Transformers model.)*
+- **[Update] Over a dozen models have been verified on** `bigdl-llm <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list `here <https://github.com/intel-analytics/BigDL/tree/main/python/llm/README.md#verified-models>`_.
+------
Overview
---------------------------------
`BigDL <https://github.com/intel-analytics/bigdl>`_ seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries:
+- `LLM <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_: Low-bit (INT3/INT4/INT5/INT8) large language model library for Intel CPU/GPU
- `Orca <doc/Orca/index.html>`_: Distributed Big Data & AI (TF & PyTorch) Pipeline on Spark and Ray
-- `Nano <doc/Nano/index.html>`_: Transparent Acceleration of Tensorflow & PyTorch Programs on XPU
+- `Nano <doc/Nano/index.html>`_: Transparent Acceleration of Tensorflow & PyTorch Programs on Intel CPU/GPU
- `DLlib <doc/DLlib/index.html>`_: "Equivalent of Spark MLlib" for Deep Learning
- `Chronos <doc/Chronos/index.html>`_: Scalable Time Series Analysis using AutoML
- `Friesian <doc/Friesian/index.html>`_: End-to-End Recommendation Systems
@@ -30,6 +40,7 @@ Choosing the right BigDL library
Feature3 [label="What type of application?"]
Feature4 [label="Domain?"]
+LLM[href="https://github.com/intel-analytics/BigDL/blob/main/python/llm" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-LLM document"]
Orca[href="../doc/Orca/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Orca document"]
Nano[href="../doc/Nano/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Nano document"]
DLlib1[label="DLlib" href="../doc/DLlib/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-DLlib document"]
@@ -42,12 +53,13 @@ Choosing the right BigDL library
ArrowLabel2[label="Yes" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel3[label="Python" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel4[label="Scala/Java" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel5[label="Distributed Big Data \n + \n AI (TF/PyTorch)" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel6[label="Accelerate \n TensorFlow / PyTorch" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel7[label="DL for Spark MLlib" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel8[label="High Level App Framework" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel9[label="Time Series" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel10[label="Recommender System" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel5[label="Large Language Model" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel6[label="Big Data + \n AI (TF/PyTorch)" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel7[label="Accelerate \n TensorFlow / PyTorch" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel8[label="DL for Spark MLlib" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel9[label="High Level App Framework" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel10[label="Time Series" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel11[label="Recommender System" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
Feature1 -> ArrowLabel1[dir=none]
ArrowLabel1 -> Feature2
@@ -60,16 +72,22 @@ Choosing the right BigDL library
ArrowLabel4 -> DLlib1
Feature3 -> ArrowLabel5[dir=none]
-ArrowLabel5 -> Orca
+ArrowLabel5 -> LLM
Feature3 -> ArrowLabel6[dir=none]
-ArrowLabel6 -> Nano
+ArrowLabel6 -> Orca
Feature3 -> ArrowLabel7[dir=none]
-ArrowLabel7 -> DLlib2
+ArrowLabel7 -> Nano
Feature3 -> ArrowLabel8[dir=none]
-ArrowLabel8 -> Feature4
-Feature4 -> ArrowLabel9[dir=none]
-ArrowLabel9 -> Chronos
+ArrowLabel8 -> DLlib2
+Feature3 -> ArrowLabel9[dir=none]
+ArrowLabel9 -> Feature4
Feature4 -> ArrowLabel10[dir=none]
-ArrowLabel10 -> Friesian
+ArrowLabel10 -> Chronos
+Feature4 -> ArrowLabel11[dir=none]
+ArrowLabel11 -> Friesian
}
------
.. [*] Performance varies by use, configuration and other factors. `bigdl-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.

python/llm/README.md

@@ -5,11 +5,11 @@
>*(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.)*
### Demos
-See the ***optimized performance*** of `chatglm2-6b`, `vicuna-13b-v1.1`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.
+See the ***optimized performance*** of `chatglm2-6b`, `llama-2-13b-chat`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.
<p align="center">
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='33%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-13b.gif" width='33%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='33%' />
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models2.png" width='85%'/>
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='33%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llama-2-13b-chat.gif" width='33%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='33%' />
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models3.png" width='85%'/>
</p>
### Verified models
@@ -111,13 +111,9 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
->**Notes**:
+>**Notes**: Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; you may use the corresponding API to load the converted model. (For other models, you can use the Transformers INT4 format as described above).
-* Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; for other models, you may use the Transformers INT4 format as described above).
-* You may choose the corresponding API developed for specific native models to load the converted model.
```python
#convert the model
from bigdl.llm import llm_convert
bigdl_llm_path = llm_convert(model='/path/to/model/',
@@ -126,7 +122,7 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
#load the converted model
#switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
from bigdl.llm.transformers import LlamaForCausalLM
-llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", ...)
+llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", native=True, ...)
#run the converted model
input_ids = llm.tokenize(prompt)
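Pieced together, the native INT4 flow shown across the two hunks above would look roughly like this. Again an illustrative sketch, not part of the commit: the `outtype`/`model_family` arguments and the trailing generate/decode calls follow the conventions visible in the snippet, and all paths and the prompt are placeholders.

```python
# convert a Hugging Face model into native INT4 format (llama family shown;
# paths are placeholders)
from bigdl.llm import llm_convert
bigdl_llm_path = llm_convert(model='/path/to/model/',
                             outfile='/path/to/output/',
                             outtype='int4',
                             model_family='llama')

# load the converted model with native=True; switch to ChatGLMForCausalLM/
# GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM for other families
from bigdl.llm.transformers import LlamaForCausalLM
llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", native=True)

# run the converted model
prompt = "What is AI?"  # placeholder
input_ids = llm.tokenize(prompt)
output_ids = llm.generate(input_ids, max_new_tokens=32)
print(llm.batch_decode(output_ids))
```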