diff --git a/README.md b/README.md
index ef40fdbe..90bcea27 100644
--- a/README.md
+++ b/README.md
@@ -9,11 +9,11 @@ _**Fast, Distributed, Secure AI for Big Data**_
---
## Latest News
-- **Try the latest [`bigdl-llm`](python/llm) for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
+- **Try the latest [`bigdl-llm`](python/llm) library for running LLMs (large language models) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model.)*
-
-
+
+
- **[Update] Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models).
@@ -23,9 +23,11 @@ _**Fast, Distributed, Secure AI for Big Data**_
BigDL seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries:
+- [LLM](python/llm): Low-bit (INT3/INT4/INT5/INT8) large language model library for Intel CPU/GPU
+
- [Orca](#orca): Distributed Big Data & AI (TF & PyTorch) Pipeline on Spark and Ray
-- [Nano](#nano): Transparent Acceleration of Tensorflow & PyTorch Programs on XPU
+- [Nano](#nano): Transparent Acceleration of TensorFlow & PyTorch Programs on Intel CPU/GPU
- [DLlib](#dllib): “Equivalent of Spark MLlib” for Deep Learning
@@ -33,7 +35,7 @@ BigDL seamlessly scales your data analytics & AI applications from laptop to clo
- [Friesian](#friesian): End-to-End Recommendation Systems
-- [PPML](#ppml): Secure Big Data and AI (with SGX Hardware Security)
+- [PPML](#ppml): Secure Big Data and AI (with SGX/TDX Hardware Security)
For more information, you may [read the docs](https://bigdl.readthedocs.io/).
@@ -47,13 +49,15 @@ flowchart TD;
Feature1-- "Yes" -->ReferPPML([PPML]);
Feature2-- Python -->Feature3{{What type of application?}};
Feature2-- Scala/Java -->ReferDLlib([DLlib]);
- Feature3-- "Distributed Big Data + AI (TF/PyTorch)" -->ReferOrca([Orca]);
+ Feature3-- "Large Language Model" -->ReferLLM([LLM]);
+ Feature3-- "Big Data + AI (TF/PyTorch)" -->ReferOrca([Orca]);
Feature3-- Accelerate TensorFlow / PyTorch -->ReferNano([Nano]);
Feature3-- DL for Spark MLlib -->ReferDLlib2([DLlib]);
Feature3-- High Level App Framework -->Feature4{{Domain?}};
Feature4-- Time Series -->ReferChronos([Chronos]);
Feature4-- Recommender System -->ReferFriesian([Friesian]);
+ click ReferLLM "https://github.com/intel-analytics/bigdl/tree/main/python/llm"
click ReferNano "https://github.com/intel-analytics/bigdl#nano"
click ReferOrca "https://github.com/intel-analytics/bigdl#orca"
click ReferDLlib "https://github.com/intel-analytics/bigdl#dllib"
@@ -64,7 +68,7 @@ flowchart TD;
classDef ReferStyle1 fill:#5099ce,stroke:#5099ce;
classDef Feature fill:#FFF,stroke:#08409c,stroke-width:1px;
- class ReferNano,ReferOrca,ReferDLlib,ReferDLlib2,ReferChronos,ReferFriesian,ReferPPML ReferStyle1;
+ class ReferLLM,ReferNano,ReferOrca,ReferDLlib,ReferDLlib2,ReferChronos,ReferFriesian,ReferPPML ReferStyle1;
class Feature1,Feature2,Feature3,Feature4,Feature5,Feature6,Feature7 Feature;
```
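
For context on the `bigdl-llm` items above: the library exposes a `transformers`-style API, so loading a Hugging Face model in INT4 is essentially a one-argument change. Below is a minimal sketch, not part of the diff itself; the checkpoint path and prompt are placeholders, and it assumes the `load_in_4bit=True` flow described in python/llm/README.md:

```python
# Minimal sketch of the bigdl-llm `transformers`-style INT4 flow.
# Placeholder path/prompt; see python/llm/README.md for the full API.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "/path/to/hf-model"  # any Hugging Face Transformers checkpoint
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

input_ids = tokenizer.encode("Once upon a time,", return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```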
diff --git a/docs/readthedocs/source/index.rst b/docs/readthedocs/source/index.rst
index a0bd5394..30a6050e 100644
--- a/docs/readthedocs/source/index.rst
+++ b/docs/readthedocs/source/index.rst
@@ -4,10 +4,21 @@
BigDL: fast, distributed, secure AI for Big Data
=================================================
+Latest News
+---------------------------------
+- **Try the latest** `bigdl-llm <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_ **library for running LLMs (large language models) on your Intel laptop using INT4 with very low latency!** [*]_ *(It is built on top of the excellent work of* `llama.cpp <https://github.com/ggerganov/llama.cpp>`_, `gptq <https://github.com/IST-DASLab/gptq>`_, `bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`_, *etc., and supports any Hugging Face Transformers model.)*
+
+- **[Update] Over a dozen models have been verified on** `bigdl-llm <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list `here <https://github.com/intel-analytics/BigDL/tree/main/python/llm#verified-models>`_.
+
+------
+
+Overview
+---------------------------------
`BigDL <https://github.com/intel-analytics/BigDL>`_ seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries:
+- `LLM <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_: Low-bit (INT3/INT4/INT5/INT8) large language model library for Intel CPU/GPU
- `Orca `_: Distributed Big Data & AI (TF & PyTorch) Pipeline on Spark and Ray
-- `Nano `_: Transparent Acceleration of Tensorflow & PyTorch Programs on XPU
+- `Nano `_: Transparent Acceleration of TensorFlow & PyTorch Programs on Intel CPU/GPU
- `DLlib `_: "Equivalent of Spark MLlib" for Deep Learning
- `Chronos `_: Scalable Time Series Analysis using AutoML
- `Friesian `_: End-to-End Recommendation Systems
@@ -30,6 +40,7 @@ Choosing the right BigDL library
Feature3 [label="What type of application?"]
Feature4 [label="Domain?"]
+    LLM[href="https://github.com/intel-analytics/BigDL/blob/main/python/llm" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-LLM document"]
Orca[href="../doc/Orca/index.html" target="_blank" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Orca document"]
Nano[href="../doc/Nano/index.html" target="_blank" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Nano document"]
DLlib1[label="DLlib" href="../doc/DLlib/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-DLlib document"]
@@ -42,12 +53,13 @@ Choosing the right BigDL library
ArrowLabel2[label="Yes" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel3[label="Python" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
ArrowLabel4[label="Scala/Java" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
- ArrowLabel5[label="Distributed Big Data \n + \n AI (TF/PyTorch)" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
- ArrowLabel6[label="Accelerate \n TensorFlow / PyTorch" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
- ArrowLabel7[label="DL for Spark MLlib" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
- ArrowLabel8[label="High Level App Framework" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
- ArrowLabel9[label="Time Series" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
- ArrowLabel10[label="Recommender System" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+ ArrowLabel5[label="Large Language Model" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+ ArrowLabel6[label="Big Data + \n AI (TF/PyTorch)" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+ ArrowLabel7[label="Accelerate \n TensorFlow / PyTorch" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+ ArrowLabel8[label="DL for Spark MLlib" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+ ArrowLabel9[label="High Level App Framework" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+ ArrowLabel10[label="Time Series" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+ ArrowLabel11[label="Recommender System" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
Feature1 -> ArrowLabel1[dir=none]
ArrowLabel1 -> Feature2
@@ -60,16 +72,22 @@ Choosing the right BigDL library
ArrowLabel4 -> DLlib1
Feature3 -> ArrowLabel5[dir=none]
- ArrowLabel5 -> Orca
+ ArrowLabel5 -> LLM
Feature3 -> ArrowLabel6[dir=none]
- ArrowLabel6 -> Nano
+ ArrowLabel6 -> Orca
Feature3 -> ArrowLabel7[dir=none]
- ArrowLabel7 -> DLlib2
+ ArrowLabel7 -> Nano
Feature3 -> ArrowLabel8[dir=none]
- ArrowLabel8 -> Feature4
-
- Feature4 -> ArrowLabel9[dir=none]
- ArrowLabel9 -> Chronos
+ ArrowLabel8 -> DLlib2
+ Feature3 -> ArrowLabel9[dir=none]
+ ArrowLabel9 -> Feature4
+
Feature4 -> ArrowLabel10[dir=none]
- ArrowLabel10 -> Friesian
+ ArrowLabel10 -> Chronos
+ Feature4 -> ArrowLabel11[dir=none]
+ ArrowLabel11 -> Friesian
}
+
+------
+
+.. [*] Performance varies by use, configuration and other factors. ``bigdl-llm`` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.
diff --git a/python/llm/README.md b/python/llm/README.md
index 34a8b4da..d563809b 100644
--- a/python/llm/README.md
+++ b/python/llm/README.md
@@ -5,11 +5,11 @@
>*(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.)*
### Demos
-See the ***optimized performance*** of `chatglm2-6b`, `vicuna-13b-v1.1`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.
+See the ***optimized performance*** of `chatglm2-6b`, `llama-2-13b-chat`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.
-
-
+
+
### Verified models
@@ -111,13 +111,9 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
- >**Notes**:
+ >**Note**: Currently only the llama/bloom/gptneox/starcoder/chatglm model families are supported; use the corresponding model class to load a converted model. (For other models, use the Transformers INT4 format described above.)
- * Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; for other models, you may use the Transformers INT4 format as described above).
-
- * You may choose the corresponding API developed for specific native models to load the converted model.
-
- ```python
+ ```python
#convert the model
from bigdl.llm import llm_convert
bigdl_llm_path = llm_convert(model='/path/to/model/',
@@ -126,7 +122,7 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
#load the converted model
#switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
from bigdl.llm.transformers import LlamaForCausalLM
- llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", ...)
+ llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", native=True, ...)
#run the converted model
input_ids = llm.tokenize(prompt)
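
The hunk above ends mid-snippet at tokenization. For completeness, here is a sketch of how the native-model run typically continues, reusing the snippet's own `llm` object; the generation arguments are illustrative assumptions:

```python
# Continue the converted native model above (illustrative arguments).
output_ids = llm.generate(input_ids, max_new_tokens=32)  # generate from token ids
output = llm.batch_decode(output_ids)                    # decode back to text
print(output)
```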