Update READMEs (#8554)

Jason Dai 2023-07-18 11:06:06 +08:00 committed by GitHub
parent ee70977c07
commit 1ebc43b151
4 changed files with 49 additions and 8 deletions


@@ -12,6 +12,22 @@ See the ***optimized performance*** of `phoenix-inst-chat-7b`, `vicuna-13b-v1.1`
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models.png" width='85%'/>
</p>
### Verified models
You may use any Hugging Face *Transformers* model on `bigdl-llm`, and the following models have been verified on Intel laptops.
| Model | Example |
|-----------|----------------------------------------------------------|
| LLaMA | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/vicuna) |
| MPT | [link](example/transformers/transformers_int4/mpt) |
| Falcon | [link](example/transformers/transformers_int4/falcon) |
| ChatGLM | [link](example/transformers/transformers_int4/chatglm) |
| ChatGLM2 | [link](example/transformers/transformers_int4/chatglm2) |
| MOSS | [link](example/transformers/transformers_int4/moss) |
| Baichuan | [link](example/transformers/transformers_int4/baichuan) |
| Dolly-v1 | [link](example/transformers/transformers_int4/dolly_v1) |
| RedPajama | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/redpajama) |
| Phoenix | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/phoenix) |
| StarCoder | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/starcoder) |
### Working with `bigdl-llm`
@@ -100,13 +116,23 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
output = tokenizer.batch_decode(output_ids)
```
See the complete example [here](example/transformers/transformers_int4/transformers_int4_pipeline.py).
See the complete examples [here](example/transformers/transformers_int4/).
>**Note**: You may apply more low-bit optimizations (including INT8, INT5 and INT4) as follows:
>```python
>model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
>```
>See the complete example [here](example/transformers/transformers_low_bit/).
Note: For other quantization precisions, you may use the `load_in_low_bit` parameter instead. Available types are `sym_int4`, `asym_int4`, `sym_int5`, `asym_int5` and `sym_int8`.
```python
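# "sym_int5" loads the model with symmetric INT5 quantization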
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
```
After the model is optimized using INT4 (or INT5/INT8), you may save and load the optimized model as follows:
```python
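# save the low-bit model once; load_low_bit then restores it without re-quantizing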
model.save_low_bit(model_path)
new_model = AutoModelForCausalLM.load_low_bit(model_path)
```
See the example [here](example/transformers/transformers_low_bit/).
- ##### Using native INT4 format
You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
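For reference, a minimal conversion sketch (paths are placeholders; the `llm_convert` entry point and its arguments should be checked against your installed `bigdl-llm` version):

```python
from bigdl.llm import llm_convert

# convert a Hugging Face checkpoint into BigDL-LLM native INT4 format;
# model_family must be one of the supported families (e.g. 'llama')
bigdl_llm_path = llm_convert(model='/path/to/model/',
                             outfile='/path/to/output/',
                             outtype='int4',
                             model_family='llama')
```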


@@ -2,7 +2,7 @@
In this example, we show a pipeline to convert a large language model to BigDL-LLM native INT4 format, and then run inference on the converted INT4 model.
> **Note**: BigDL-LLM native INT4 format currently supports model family LLaMA, GPT-NeoX, BLOOM and StarCoder.
> **Note**: BigDL-LLM native INT4 format currently supports the model families **LLaMA** (such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.), **GPT-NeoX** (such as RedPajama), **BLOOM** (such as Phoenix) and **StarCoder**.
## Prepare Environment
We suggest using conda to manage the environment:


@@ -1,6 +1,21 @@
# BigDL-LLM Transformers INT4 Optimization for Large Language Model
You can use BigDL-LLM to run any Hugging Face *Transformers* model with INT4 optimizations on either servers or laptops. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it; a minimal sketch of the common pattern appears after the table below.
## Verified models
| Model | Example |
|-----------|----------------------------------------------------------|
| LLaMA | [link](vicuna) |
| MPT | [link](mpt) |
| Falcon | [link](falcon) |
| ChatGLM | [link](chatglm) |
| ChatGLM2 | [link](chatglm2) |
| MOSS | [link](moss) |
| Baichuan | [link](baichuan) |
| Dolly-v1 | [link](dolly_v1) |
| RedPajama | [link](redpajama) |
| Phoenix | [link](phoenix) |
| StarCoder | [link](starcoder) |
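Each folder contains its own runnable script; as a quick orientation, here is a minimal sketch of the common pattern (the model path and prompt are placeholders, and the sketch assumes the `load_in_4bit=True` loading flag of `bigdl-llm`'s `transformers`-style API):

```python
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = '/path/to/model/'  # placeholder: any verified model above

# load the model with BigDL-LLM INT4 optimizations applied
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# run a short generation on the optimized model
input_ids = tokenizer.encode("Once upon a time, ", return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.batch_decode(output_ids)[0])
```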
## Recommended Requirements
To run the examples, we recommend using Intel® Xeon® processors (server), or >= 12th Gen Intel® Core™ processor (client).


@@ -1,6 +1,6 @@
# BigDL-LLM Transformers INT4 Inference Pipeline for Large Language Model
# BigDL-LLM Transformers Low-Bit Inference Pipeline for Large Language Model
In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations to any Hugging Face Transformers model, and then run inference on the optimized low-bit model.
In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations (including INT8/INT5/INT4) to any Hugging Face Transformers model, and then run inference on the optimized low-bit model.
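The flow below sketches what the pipeline does, using the `load_in_low_bit` / `save_low_bit` / `load_low_bit` API shown in the top-level README (the precision and paths are illustrative; see the script in this folder for the actual flow):

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# load any Hugging Face Transformers model with a chosen low-bit precision
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")

# save the optimized model, then reload it later without re-quantizing
model.save_low_bit('/path/to/saved/model/')
model = AutoModelForCausalLM.load_low_bit('/path/to/saved/model/')
```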
## Prepare Environment
We suggest using conda to manage the environment: