[BigDL LLM] Update readme for unifying transformers API (#8737)

* update readme doc

* fix readthedocs error

* update comment

* update exception error info

* invalidInputError instead

* fix readme typo error and remove import error

* fix more typo
SONG Ge 2023-08-16 14:22:32 +08:00 committed by GitHub
parent c1f9af6d97
commit f4164e4492
4 changed files with 98 additions and 36 deletions


@@ -9,7 +9,7 @@ _**Fast, Distributed, Secure AI for Big Data**_
---
## Latest News
- **Try the latest [`bigdl-llm`](python/llm) for running LLM (language language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
- **Try the latest [`bigdl-llm`](python/llm) for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
<p align="center">
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-13b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='30%' />


@@ -48,7 +48,45 @@ llm.transformers.model
llm.transformers.modelling_bigdl
----------------------------------------
.. automodule:: bigdl.llm.transformers.modelling_bigdl

.. autoclass:: bigdl.llm.transformers.modelling_bigdl.LlamaForCausalLM
    :members:
    :undoc-members:
    :show-inheritance:
    :exclude-members: GGML_Model, GGML_Module, HF_Class

    .. automethod:: from_pretrained

.. autoclass:: bigdl.llm.transformers.modelling_bigdl.ChatGLMForCausalLM
    :members:
    :undoc-members:
    :show-inheritance:
    :exclude-members: GGML_Model, GGML_Module, HF_Class

    .. automethod:: from_pretrained

.. autoclass:: bigdl.llm.transformers.modelling_bigdl.GptneoxForCausalLM
    :members:
    :undoc-members:
    :show-inheritance:
    :exclude-members: GGML_Model, GGML_Module, HF_Class

    .. automethod:: from_pretrained

.. autoclass:: bigdl.llm.transformers.modelling_bigdl.BloomForCausalLM
    :members:
    :undoc-members:
    :show-inheritance:
    :exclude-members: GGML_Model, GGML_Module, HF_Class

    .. automethod:: from_pretrained

.. autoclass:: bigdl.llm.transformers.modelling_bigdl.StarcoderForCausalLM
    :members:
    :undoc-members:
    :show-inheritance:
    :exclude-members: GGML_Model, GGML_Module, HF_Class

    .. automethod:: from_pretrained
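
All five classes above share the `from_pretrained` entry point documented here. As orientation, a minimal sketch of calling it on a converted native model (the model path and prompt are placeholders; the call pattern follows the README example in this commit):

```python
# Hedged sketch: load a converted native model through the documented API
from bigdl.llm.transformers import LlamaForCausalLM

llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin")
input_ids = llm.tokenize("What is AI?")
output_ids = llm.generate(input_ids)
output = llm.batch_decode(output_ids)
print(output)
```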


@@ -1,6 +1,6 @@
## BigDL-LLM
**`bigdl-llm`** is a library for running ***LLM*** (language language model) on your Intel ***laptop*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model).
**`bigdl-llm`** is a library for running ***LLM*** (large language model) on your Intel ***laptop*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model).
>*(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.)*
@@ -76,7 +76,11 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
#load Hugging Face Transformers model with INT4 optimizations
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)
```
After loading the Hugging Face Transformers model, you may easily run the optimized model as follows.
```python
#run the optimized model
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path)
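# The remaining steps follow the standard Hugging Face generate loop; a hedged
# sketch (`input_str` and the generation arguments here are placeholders, not
# taken from this commit):
#   input_ids = tokenizer.encode(input_str, return_tensors="pt")
#   output_ids = model.generate(input_ids, max_new_tokens=32)
#   output_str = tokenizer.decode(output_ids[0], skip_special_tokens=True)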
@@ -88,13 +92,14 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
See the complete examples [here](example/transformers/transformers_int4/).
>**Note**: You may apply more low bit optimizations (including INT8, INT5 and INT4) as follows:
>```python
>model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
>```
>See the complete example [here](example/transformers/transformers_low_bit/).
After the model is optimized using INT4 (or INT5/INT8), you may save and load the optimized model as follows:
After the model is optimized using INT4 (or INT8/INT5), you may save and load the optimized model as follows:
```python
model.save_low_bit(model_path)
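# A hedged sketch of the matching load step; the `load_low_bit` name mirrors
# `save_low_bit` above and is an assumption here, not taken from this commit:
#   new_model = AutoModelForCausalLM.load_low_bit(model_path)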
@@ -106,19 +111,24 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
>**Note**: Currently only llama/bloom/gptneox/starcoder model family is supported; for other models, you may use the Transformers INT4 format as described above.
>**Notes**:
>* Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; for other models, you may use the Transformers INT4 format as described above.
>* You may choose the corresponding API developed for specific native models to load the converted model.
```python
#convert the model
from bigdl.llm import llm_convert
bigdl_llm_path = llm_convert(model='/path/to/model/',
        outfile='/path/to/output/', outtype='int4', model_family="llama")
#load the converted model
from bigdl.llm.transformers import BigdlNativeForCausalLM
llm = BigdlNativeForCausalLM.from_pretrained("/path/to/output/model.bin",...)
#switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
from bigdl.llm.transformers import LlamaForCausalLM
llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", ...)
#run the converted model
input_ids = llm.tokenize(prompt)
output_ids = llm.generate(input_ids, ...)
output = llm.batch_decode(output_ids)
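# Note: the CausalLM class chosen here must match the `model_family` passed to
# llm_convert (e.g. GptneoxForCausalLM for model_family="gptneox"); a mismatch
# is reported via invalidInputError, as implemented in modelling_bigdl below.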
@@ -243,8 +253,9 @@ See the initial `bigdl-llm` API Doc [here](https://bigdl.readthedocs.io/en/latest
[^1]: Performance varies by use, configuration and other factors. `bigdl-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.
### `bigdl-llm` Dependence
The native code/lib in `bigdl-llm` has been built using the following tools; in particular, lower `LIBC` version on your Linux system may be incompatible with `bigdl-llm`.
### `bigdl-llm` Dependencies
The native code/lib in `bigdl-llm` has been built using the following tools.
Note that a lower `LIBC` version on your Linux system may be incompatible with `bigdl-llm`.
| Model family | Platform | Compiler | GLIBC |
| ------------ | -------- | ------------------ | ----- |


@@ -19,7 +19,9 @@
# Otherwise there would be module not found error in non-pip's setting as Python would
# only search the first bigdl package and end up finding only one sub-package.
import importlib
import logging
from bigdl.llm.utils.common import invalidInputError
from .model import *
@@ -107,42 +109,53 @@ class _BaseGGMLClass:
        :return: a model instance
        """
        if native:
            invalidInputError(dtype.lower() in ['int4', 'int8'],
                              "Now we only support int4 and int8 as data type for weight")
            ggml_model_path = pretrained_model_name_or_path
            return cls.GGML_Model(model_path=ggml_model_path,
                                  **kwargs)
        else:
            return cls.HF_Class.from_pretrained(pretrained_model_name_or_path,
                                                *args, **kwargs)
        try:
            module = importlib.import_module(cls.GGML_Module)
            class_ = getattr(module, cls.GGML_Model)
            if native:
                invalidInputError(dtype.lower() in ['int4', 'int8'],
                                  "Now we only support int4 and int8 as data type for weight")
                ggml_model_path = pretrained_model_name_or_path
                model = class_(model_path=ggml_model_path, **kwargs)
            else:
                model = cls.HF_Class.from_pretrained(pretrained_model_name_or_path,
                                                     *args, **kwargs)
        except Exception as e:
            invalidInputError(
                False,
                f"Could not load model from path: {pretrained_model_name_or_path}. "
                "Please make sure the CausalLM class matches "
                "the model you want to load. "
                f"Received error: {e}"
            )
        return model
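
# A minimal sketch of the lazy-import pattern used above, assuming the module
# path "bigdl.llm.models" exposes a "Llama" class (as the registry entries
# below declare):
#
#   import importlib
#   module = importlib.import_module("bigdl.llm.models")
#   Llama = getattr(module, "Llama")
#
# Resolving the backend class from strings defers importing the native
# libraries until from_pretrained is called, rather than at module import
# time, which is what removes the import error mentioned in the commit message.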
class LlamaForCausalLM(_BaseGGMLClass):
    from bigdl.llm.ggml.model.llama import Llama
    GGML_Model = Llama
    GGML_Module = "bigdl.llm.models"
    GGML_Model = "Llama"
    HF_Class = AutoModelForCausalLM


class ChatGLMForCausalLM(_BaseGGMLClass):
    from bigdl.llm.ggml.model.chatglm import ChatGLM
    GGML_Model = ChatGLM
    GGML_Module = "bigdl.llm.ggml.model.chatglm"
    GGML_Model = "ChatGLM"
    HF_Class = AutoModel


class GptneoxForCausalLM(_BaseGGMLClass):
    from bigdl.llm.ggml.model.gptneox import Gptneox
    GGML_Model = Gptneox
    GGML_Module = "bigdl.llm.models"
    GGML_Model = "Gptneox"
    HF_Class = AutoModelForCausalLM


class BloomForCausalLM(_BaseGGMLClass):
    from bigdl.llm.ggml.model.bloom import Bloom
    GGML_Model = Bloom
    GGML_Module = "bigdl.llm.models"
    GGML_Model = "Bloom"
    HF_Class = AutoModelForCausalLM


class StarcoderForCausalLM(_BaseGGMLClass):
    from bigdl.llm.ggml.model.starcoder import Starcoder
    GGML_Model = Starcoder
    GGML_Module = "bigdl.llm.models"
    GGML_Model = "Starcoder"
    HF_Class = AutoModelForCausalLM
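
# With native=False, each wrapper above delegates to the Hugging Face class in
# HF_Class; a hedged sketch (the model id and trust_remote_code kwarg are
# assumptions forwarded via **kwargs, not taken from this commit):
#
#   model = ChatGLMForCausalLM.from_pretrained("THUDM/chatglm2-6b",
#                                              native=False,
#                                              trust_remote_code=True)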