From dcadd0915454cf033f40bde6968a613a481a9de2 Mon Sep 17 00:00:00 2001
From: Jason Dai
Date: Mon, 21 Aug 2023 22:34:44 +0800
Subject: [PATCH] Update llm document (#8784)

---
 README.md                         | 18 +++++++-----
 docs/readthedocs/source/index.rst | 48 +++++++++++++++++++++----------
 python/llm/README.md              | 16 ++++-------
 3 files changed, 50 insertions(+), 32 deletions(-)

diff --git a/README.md b/README.md
index ef40fdbe..90bcea27 100644
--- a/README.md
+++ b/README.md
@@ -9,11 +9,11 @@ _**Fast, Distributed, Secure AI for Big Data**_
 
 ---
 ## Latest News
-- **Try the latest [`bigdl-llm`](python/llm) for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
+- **Try the latest [`bigdl-llm`](python/llm) library for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
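To make the bullet above concrete, here is a minimal sketch of the `transformers`-style INT4 loading flow in `bigdl-llm` (a sketch only, assuming the API documented in [python/llm](python/llm); the model path and prompt are placeholders):

```python
# A minimal sketch, assuming the `transformers`-style INT4 API documented in
# python/llm/README.md; the model path and prompt are placeholders.
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

# load a Hugging Face Transformers model in INT4 format
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained('/path/to/model/')

# run the quantized model through the standard generate() API
input_ids = tokenizer.encode('Once upon a time,', return_tensors='pt')
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```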
 - **[Update] Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models).
@@ -23,9 +23,11 @@ _**Fast, Distributed, Secure AI for Big Data**_
 
 BigDL seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries:
 
+- [LLM](python/llm): Low-bit (INT3/INT4/INT5/INT8) large language model library for Intel CPU/GPU
+
 - [Orca](#orca): Distributed Big Data & AI (TF & PyTorch) Pipeline on Spark and Ray
 
-- [Nano](#nano): Transparent Acceleration of Tensorflow & PyTorch Programs on XPU
+- [Nano](#nano): Transparent Acceleration of TensorFlow & PyTorch Programs on Intel CPU/GPU
 
 - [DLlib](#dllib): “Equivalent of Spark MLlib” for Deep Learning
 
@@ -33,7 +35,7 @@ BigDL seamlessly scales your data analytics & AI applications from laptop to clo
 
 - [Friesian](#friesian): End-to-End Recommendation Systems
 
-- [PPML](#ppml): Secure Big Data and AI (with SGX Hardware Security)
+- [PPML](#ppml): Secure Big Data and AI (with SGX/TDX Hardware Security)
 
 For more information, you may [read the docs](https://bigdl.readthedocs.io/).
 
@@ -47,13 +49,15 @@ flowchart TD;
     Feature1-- "Yes" -->ReferPPML([PPML]);
     Feature2-- Python -->Feature3{{What type of application?}};
     Feature2-- Scala/Java -->ReferDLlib([DLlib]);
-    Feature3-- "Distributed Big Data + AI (TF/PyTorch)" -->ReferOrca([Orca]);
+    Feature3-- "Large Language Model" -->ReferLLM([LLM]);
+    Feature3-- "Big Data + AI (TF/PyTorch)" -->ReferOrca([Orca]);
     Feature3-- Accelerate TensorFlow / PyTorch -->ReferNano([Nano]);
     Feature3-- DL for Spark MLlib -->ReferDLlib2([DLlib]);
     Feature3-- High Level App Framework -->Feature4{{Domain?}};
     Feature4-- Time Series -->ReferChronos([Chronos]);
     Feature4-- Recommender System -->ReferFriesian([Friesian]);
 
+    click ReferLLM "https://github.com/intel-analytics/bigdl/tree/main/python/llm"
     click ReferNano "https://github.com/intel-analytics/bigdl#nano"
     click ReferOrca "https://github.com/intel-analytics/bigdl#orca"
     click ReferDLlib "https://github.com/intel-analytics/bigdl#dllib"
 
@@ -64,7 +68,7 @@ flowchart TD;
 
     classDef ReferStyle1 fill:#5099ce,stroke:#5099ce;
     classDef Feature fill:#FFF,stroke:#08409c,stroke-width:1px;
-    class ReferNano,ReferOrca,ReferDLlib,ReferDLlib2,ReferChronos,ReferFriesian,ReferPPML ReferStyle1;
+    class ReferLLM,ReferNano,ReferOrca,ReferDLlib,ReferDLlib2,ReferChronos,ReferFriesian,ReferPPML ReferStyle1;
     class Feature1,Feature2,Feature3,Feature4,Feature5,Feature6,Feature7 Feature;
 ```

diff --git a/docs/readthedocs/source/index.rst b/docs/readthedocs/source/index.rst
index a0bd5394..30a6050e 100644
--- a/docs/readthedocs/source/index.rst
+++ b/docs/readthedocs/source/index.rst
@@ -4,10 +4,20 @@ BigDL: fast, distributed, secure AI for Big Data
 =================================================
 
+Latest News
+---------------------------------
+- **Try the latest** `bigdl-llm <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_ **library for running LLM (large language model) on your Intel laptop using INT4 with very low latency!** [*]_ *(It is built on top of the excellent work of* `llama.cpp <https://github.com/ggerganov/llama.cpp>`_, `gptq <https://github.com/IST-DASLab/gptq>`_, `bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`_, *etc., and supports any Hugging Face Transformers model.)*
+
+- **[Update] Over a dozen models have been verified on** `bigdl-llm <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list `here <https://github.com/intel-analytics/BigDL/blob/main/python/llm/README.md#verified-models>`_.
+
+------
+
+Overview
+---------------------------------
 `BigDL <https://github.com/intel-analytics/bigdl>`_ seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries:
 
+- `LLM <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_: Low-bit (INT3/INT4/INT5/INT8) large language model library for Intel CPU/GPU
 - `Orca <doc/Orca/index.html>`_: Distributed Big Data & AI (TF & PyTorch) Pipeline on Spark and Ray
-- `Nano <doc/Nano/index.html>`_: Transparent Acceleration of Tensorflow & PyTorch Programs on XPU
+- `Nano <doc/Nano/index.html>`_: Transparent Acceleration of TensorFlow & PyTorch Programs on Intel CPU/GPU
 - `DLlib <doc/DLlib/index.html>`_: "Equivalent of Spark MLlib" for Deep Learning
 - `Chronos <doc/Chronos/index.html>`_: Scalable Time Series Analysis using AutoML
 - `Friesian <doc/Friesian/index.html>`_: End-to-End Recommendation Systems
 
@@ -30,6 +40,7 @@ Choosing the right BigDL library
         Feature3 [label="What type of application?"]
         Feature4 [label="Domain?"]
 
+        LLM[href="https://github.com/intel-analytics/BigDL/blob/main/python/llm" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-LLM document"]
         Orca[href="../doc/Orca/index.html" target="_blank" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Orca document"]
         Nano[href="../doc/Nano/index.html" target="_blank" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-Nano document"]
         DLlib1[label="DLlib" href="../doc/DLlib/index.html" target="_blank" style="rounded,filled" fontcolor="#ffffff" tooltip="Go to BigDL-DLlib document"]
 
@@ -42,12 +53,13 @@ Choosing the right BigDL library
         ArrowLabel2[label="Yes" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
         ArrowLabel3[label="Python" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
         ArrowLabel4[label="Scala/Java" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
-        ArrowLabel5[label="Distributed Big Data \n + \n AI (TF/PyTorch)" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
-        ArrowLabel6[label="Accelerate \n TensorFlow / PyTorch" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
-        ArrowLabel7[label="DL for Spark MLlib" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
-        ArrowLabel8[label="High Level App Framework" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
-        ArrowLabel9[label="Time Series" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
-        ArrowLabel10[label="Recommender System" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+        ArrowLabel5[label="Large Language Model" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+        ArrowLabel6[label="Big Data + \n AI (TF/PyTorch)" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+        ArrowLabel7[label="Accelerate \n TensorFlow / PyTorch" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+        ArrowLabel8[label="DL for Spark MLlib" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+        ArrowLabel9[label="High Level App Framework" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+        ArrowLabel10[label="Time Series" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
+        ArrowLabel11[label="Recommender System" fontsize=12 width=0.1 height=0.1 style=filled color="#c9c9c9"]
 
         Feature1 -> ArrowLabel1[dir=none]
         ArrowLabel1 -> Feature2
 
@@ -60,16 +72,22 @@ Choosing the right BigDL library
         ArrowLabel4 -> DLlib1
 
         Feature3 -> ArrowLabel5[dir=none]
-        ArrowLabel5 -> Orca
+        ArrowLabel5 -> LLM
         Feature3 -> ArrowLabel6[dir=none]
-        ArrowLabel6 -> Nano
+        ArrowLabel6 -> Orca
         Feature3 -> ArrowLabel7[dir=none]
-        ArrowLabel7 -> DLlib2
+        ArrowLabel7 -> Nano
         Feature3 -> ArrowLabel8[dir=none]
-        ArrowLabel8 -> Feature4
-
-        Feature4 -> ArrowLabel9[dir=none]
-        ArrowLabel9 -> Chronos
+        ArrowLabel8 -> DLlib2
+        Feature3 -> ArrowLabel9[dir=none]
+        ArrowLabel9 -> Feature4
+
         Feature4 -> ArrowLabel10[dir=none]
-        ArrowLabel10 -> Friesian
+        ArrowLabel10 -> Chronos
+        Feature4 -> ArrowLabel11[dir=none]
+        ArrowLabel11 -> Friesian
     }
+
+------
+
+.. [*] Performance varies by use, configuration and other factors. `bigdl-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.

diff --git a/python/llm/README.md b/python/llm/README.md
index 34a8b4da..d563809b 100644
--- a/python/llm/README.md
+++ b/python/llm/README.md
@@ -5,11 +5,11 @@
 >*(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.)*
 
 ### Demos
-See the ***optimized performance*** of `chatglm2-6b`, `vicuna-13b-v1.1`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.
+See the ***optimized performance*** of `chatglm2-6b`, `llama-2-13b-chat`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.
 ### Verified models
@@ -111,13 +111,9 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
   You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
 
-  >**Notes**:
+  >**Notes**: Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; you may use the corresponding API to load the converted model. (For other models, you can use the Transformers INT4 format as described above.)
 
-  * Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; for other models, you may use the Transformers INT4 format as described above).
-
-  * You may choose the corresponding API developed for specific native models to load the converted model.
-
-  ```python
+  ```python
   #convert the model
   from bigdl.llm import llm_convert
   bigdl_llm_path = llm_convert(model='/path/to/model/',
@@ -126,7 +122,7 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
   #load the converted model
   #switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
   from bigdl.llm.transformers import LlamaForCausalLM
-  llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", ...)
+  llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", native=True, ...)
 
   #run the converted model
   input_ids = llm.tokenize(prompt)
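The two hunks above cut off the tail of the `llm_convert` call and the generation arguments at the hunk boundaries. For reference, here is a minimal end-to-end sketch of the native INT4 flow, assuming the keyword names (`outfile`/`outtype`/`model_family`, a HF-style `max_new_tokens`) used elsewhere in the `bigdl-llm` README; all paths and values are placeholders, not taken verbatim from the patch:

```python
# A minimal sketch of the complete native INT4 flow; keyword names are
# assumptions based on the bigdl-llm README, and all paths are placeholders.
from bigdl.llm import llm_convert
from bigdl.llm.transformers import LlamaForCausalLM

# convert the Hugging Face model into native INT4 format
bigdl_llm_path = llm_convert(model='/path/to/model/',
                             outfile='/path/to/output/',
                             outtype='int4',
                             model_family='llama')

# load the converted model with the matching model-family class
llm = LlamaForCausalLM.from_pretrained(bigdl_llm_path, native=True)

# tokenize, generate, and decode with the llama.cpp-style native API
# (max_new_tokens is an assumed kwarg; the patch elides the generate arguments)
prompt = 'Once upon a time,'
input_ids = llm.tokenize(prompt)
output_ids = llm.generate(input_ids, max_new_tokens=32)
output = llm.batch_decode(output_ids)
print(output)
```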