Update READMEs (#8554)
This commit is contained in:
parent ee70977c07
commit 1ebc43b151
4 changed files with 49 additions and 8 deletions
@@ -12,6 +12,22 @@ See the ***optimized performance*** of `phoenix-inst-chat-7b`, `vicuna-13b-v1.1`
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models.png" width='85%'/>
</p>
### Verified models
You may use any Hugging Face *Transformers* model with `bigdl-llm`, and the following models have been verified on Intel laptops.
| Model | Example |
|-----------|----------------------------------------------------------|
| LLaMA | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/vicuna) |
| MPT | [link](example/transformers/transformers_int4/mpt) |
| Falcon | [link](example/transformers/transformers_int4/falcon) |
| ChatGLM | [link](example/transformers/transformers_int4/chatglm) |
| ChatGLM2 | [link](example/transformers/transformers_int4/chatglm2) |
| MOSS | [link](example/transformers/transformers_int4/moss) |
| Baichuan | [link](example/transformers/transformers_int4/baichuan) |
| Dolly-v1 | [link](example/transformers/transformers_int4/dolly_v1) |
| RedPajama | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/redpajama) |
| Phoenix | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/phoenix) |
| StarCoder | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/starcoder) |
### Working with `bigdl-llm`
@@ -100,13 +116,23 @@ You may run the models using the `transformers`-style API in `bigdl-llm`.
output = tokenizer.batch_decode(output_ids)
```
See the complete example [here](example/transformers/transformers_int4/transformers_int4_pipeline.py).
See the complete examples [here](example/transformers/transformers_int4/).
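
For reference, here is a minimal end-to-end sketch of this `transformers`-style API (a sketch only; the model path, prompt and generation arguments are placeholders):

```python
# a minimal sketch; '/path/to/model/' and the prompt are placeholders
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# load a Hugging Face checkpoint with INT4 optimization applied at load time
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained('/path/to/model/')

prompt = "Once upon a time"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=32)
output = tokenizer.batch_decode(output_ids)
print(output)
```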
>**Note**: You may apply more low-bit optimizations (including INT8, INT5 and INT4) as follows:
>```python
>model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
>```
>See the complete example [here](example/transformers/transformers_low_bit/).
Notice: For other quantization precisions, you can use the `load_in_low_bit` parameter instead. Available types are `sym_int4`, `asym_int4`, `sym_int5`, `asym_int5` and `sym_int8`.
```python
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
```
After the model is optimized using INT4 (or INT8/INT5), you may save and load the optimized model as follows:
```python
model.save_low_bit(model_path)
new_model = AutoModelForCausalLM.load_low_bit(model_path)
```
See the example [here](example/transformers/transformers_low_bit/).
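
As a quick follow-up sketch (assuming the tokenizer files are available at the same `model_path`), the reloaded low-bit model can be used for inference exactly like the original optimized one:

```python
# sketch: inference with a reloaded low-bit model; model_path is a placeholder
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

new_model = AutoModelForCausalLM.load_low_bit(model_path)  # no re-quantization needed
tokenizer = AutoTokenizer.from_pretrained(model_path)      # assumes tokenizer saved alongside
input_ids = tokenizer.encode("Hello", return_tensors="pt")
output_ids = new_model.generate(input_ids, max_new_tokens=16)
print(tokenizer.batch_decode(output_ids))
```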
- ##### Using native INT4 format
You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
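
As a rough sketch of the conversion step (assuming the `llm_convert` helper and a LLaMA-family checkpoint; all paths are placeholders):

```python
# sketch: convert a Hugging Face checkpoint into BigDL-LLM native INT4 format
from bigdl.llm import llm_convert

bigdl_llm_path = llm_convert(model='/path/to/model/',    # original HF checkpoint
                             outfile='/path/to/output/', # destination of the INT4 file
                             outtype='int4',
                             model_family='llama')       # llama/gptneox/bloom/starcoder
```

The converted file can then be loaded for native inference, as sketched in the native INT4 example README below.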
@@ -2,7 +2,7 @@
In this example, we show a pipeline to convert a large language model to BigDL-LLM native INT4 format, and then run inference on the converted INT4 model.
> **Note**: BigDL-LLM native INT4 format currently supports model family LLaMA, GPT-NeoX, BLOOM and StarCoder.
> **Note**: BigDL-LLM native INT4 format currently supports the model families **LLaMA** (such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.), **GPT-NeoX** (such as RedPajama), **BLOOM** (such as Phoenix) and **StarCoder**.
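
As a hedged sketch of the inference step on a converted model (the class and method names below follow one version of the native API and may differ across releases; the model path is a placeholder):

```python
# sketch: run inference on a converted native INT4 model (LLaMA family assumed)
from bigdl.llm.transformers import BigdlNativeForCausalLM

llm = BigdlNativeForCausalLM.from_pretrained('/path/to/output/model.bin',
                                             model_family='llama')
input_ids = llm.tokenize("Once upon a time")
output_ids = llm.generate(input_ids, max_new_tokens=32)
output = llm.batch_decode(output_ids)
print(output)
```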
## Prepare Environment
We suggest using conda to manage the environment:
@@ -1,6 +1,21 @@
# BigDL-LLM Transformers INT4 Optimization for Large Language Model
You can use BigDL-LLM to run any Hugging Face *Transformers* model with INT4 optimizations on either servers or laptops. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it.
## Verified models
| Model | Example |
|-----------|----------------------------------------------------------|
| LLaMA | [link](vicuna) |
| MPT | [link](mpt) |
| Falcon | [link](falcon) |
| ChatGLM | [link](chatglm) |
| ChatGLM2 | [link](chatglm2) |
| MOSS | [link](moss) |
| Baichuan | [link](baichuan) |
| Dolly-v1 | [link](dolly_v1) |
| RedPajama | [link](redpajama) |
| Phoenix | [link](phoenix) |
| StarCoder | [link](starcoder) |
## Recommended Requirements
To run the examples, we recommend using Intel® Xeon® processors (server) or >= 12th Gen Intel® Core™ processors (client).
@@ -1,6 +1,6 @@
# BigDL-LLM Transformers INT4 Inference Pipeline for Large Language Model
# BigDL-LLM Transformers Low-Bit Inference Pipeline for Large Language Model
In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations to any Hugging Face Transformers model, and then run inference on the optimized low-bit model.
In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations (including INT8/INT5/INT4) to any Hugging Face Transformers model, and then run inference on the optimized low-bit model.
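
For orientation, a minimal sketch of such a low-bit pipeline (the model path, prompt and chosen precision are placeholders; the available `load_in_low_bit` types are `sym_int4`, `asym_int4`, `sym_int5`, `asym_int5` and `sym_int8`):

```python
# sketch: apply a chosen low-bit optimization at load time, then run inference
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
tokenizer = AutoTokenizer.from_pretrained('/path/to/model/')
input_ids = tokenizer.encode("What is AI?", return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.batch_decode(output_ids))
```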
## Prepare Environment
We suggest using conda to manage the environment: