Update READMEs (#8554)
This commit is contained in:
parent ee70977c07
commit 1ebc43b151
4 changed files with 49 additions and 8 deletions
@@ -12,6 +12,22 @@ See the ***optimized performance*** of `phoenix-inst-chat-7b`, `vicuna-13b-v1.1`
<img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models.png" width='85%'/>
</p>
### Verified models
You may use any Hugging Face *Transformers* model with `bigdl-llm`, and the following models have been verified on Intel laptops.
| Model | Example |
|-----------|----------------------------------------------------------|
| LLaMA | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/vicuna) |
| MPT | [link](example/transformers/transformers_int4/mpt) |
| Falcon | [link](example/transformers/transformers_int4/falcon) |
| ChatGLM | [link](example/transformers/transformers_int4/chatglm) |
| ChatGLM2 | [link](example/transformers/transformers_int4/chatglm2) |
| MOSS | [link](example/transformers/transformers_int4/moss) |
| Baichuan | [link](example/transformers/transformers_int4/baichuan) |
| Dolly-v1 | [link](example/transformers/transformers_int4/dolly_v1) |
| RedPajama | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/redpajama) |
| Phoenix | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/phoenix) |
| StarCoder | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/starcoder) |
### Working with `bigdl-llm`
@@ -100,13 +116,23 @@ You may run the models using the `transformers`-style API in `bigdl-llm`.
output = tokenizer.batch_decode(output_ids)
```
See the complete example [here](example/transformers/transformers_int4/transformers_int4_pipeline.py).
See the complete examples [here](example/transformers/transformers_int4/).
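
For reference, here is a minimal end-to-end sketch of this `transformers`-style API (a sketch only; the model path, prompt and generation arguments are placeholders):

```python
# a minimal sketch; '/path/to/model/' and the prompt are placeholders
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# load a Hugging Face checkpoint with INT4 optimization applied at load time
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained('/path/to/model/')

prompt = "Once upon a time"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=32)
output = tokenizer.batch_decode(output_ids)
print(output)
```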
>**Note**: You may apply more low-bit optimizations (including INT8, INT5 and INT4) as follows:
>```python
>model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
>```
>See the complete example [here](example/transformers/transformers_low_bit/).
Notice: For other quantization precisions, you can use the `load_in_low_bit` parameter instead. Available types are `sym_int4`, `asym_int4`, `sym_int5`, `asym_int5` and `sym_int8`.
```python
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
```
After the model is optimized using INT4 (or INT8/INT5), you may save and load the optimized model as follows:
```python
model.save_low_bit(model_path)
new_model = AutoModelForCausalLM.load_low_bit(model_path)
```
See the example [here](example/transformers/transformers_low_bit/).
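
As a quick follow-up sketch (assuming the tokenizer files are available at the same `model_path`), the reloaded low-bit model can be used for inference exactly like the original optimized one:

```python
# sketch: inference with a reloaded low-bit model; model_path is a placeholder
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

new_model = AutoModelForCausalLM.load_low_bit(model_path)  # no re-quantization needed
tokenizer = AutoTokenizer.from_pretrained(model_path)      # assumes tokenizer saved alongside
input_ids = tokenizer.encode("Hello", return_tensors="pt")
output_ids = new_model.generate(input_ids, max_new_tokens=16)
print(tokenizer.batch_decode(output_ids))
```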
- ##### Using native INT4 format
You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
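
As a rough sketch of the conversion step (assuming the `llm_convert` helper and a LLaMA-family checkpoint; all paths are placeholders):

```python
# sketch: convert a Hugging Face checkpoint into BigDL-LLM native INT4 format
from bigdl.llm import llm_convert

bigdl_llm_path = llm_convert(model='/path/to/model/',    # original HF checkpoint
                             outfile='/path/to/output/', # destination of the INT4 file
                             outtype='int4',
                             model_family='llama')       # llama/gptneox/bloom/starcoder
```

The converted file can then be loaded for native inference, as sketched in the native INT4 example README below.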
@@ -2,7 +2,7 @@
In this example, we show a pipeline to convert a large language model to BigDL-LLM native INT4 format, and then run inference on the converted INT4 model.
> **Note**: BigDL-LLM native INT4 format currently supports model family LLaMA, GPT-NeoX, BLOOM and StarCoder.
> **Note**: BigDL-LLM native INT4 format currently supports the model families **LLaMA** (such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.), **GPT-NeoX** (such as RedPajama), **BLOOM** (such as Phoenix) and **StarCoder**.
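
As a hedged sketch of the inference step on a converted model (the class and method names below follow one version of the native API and may differ across releases; the model path is a placeholder):

```python
# sketch: run inference on a converted native INT4 model (LLaMA family assumed)
from bigdl.llm.transformers import BigdlNativeForCausalLM

llm = BigdlNativeForCausalLM.from_pretrained('/path/to/output/model.bin',
                                             model_family='llama')
input_ids = llm.tokenize("Once upon a time")
output_ids = llm.generate(input_ids, max_new_tokens=32)
output = llm.batch_decode(output_ids)
print(output)
```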
## Prepare Environment
We suggest using conda to manage the environment:
@@ -1,6 +1,21 @@
# BigDL-LLM Transformers INT4 Optimization for Large Language Model
You can use BigDL-LLM to run any Hugging Face *Transformers* model with INT4 optimizations on either servers or laptops. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it.
## Verified models
| Model | Example |
|-----------|----------------------------------------------------------|
| LLaMA | [link](vicuna) |
| MPT | [link](mpt) |
| Falcon | [link](falcon) |
| ChatGLM | [link](chatglm) |
| ChatGLM2 | [link](chatglm2) |
| MOSS | [link](moss) |
| Baichuan | [link](baichuan) |
| Dolly-v1 | [link](dolly_v1) |
| RedPajama | [link](redpajama) |
| Phoenix | [link](phoenix) |
| StarCoder | [link](starcoder) |
## Recommended Requirements
To run the examples, we recommend using Intel® Xeon® processors (server) or >= 12th Gen Intel® Core™ processors (client).
@@ -1,6 +1,6 @@
# BigDL-LLM Transformers INT4 Inference Pipeline for Large Language Model
# BigDL-LLM Transformers Low-Bit Inference Pipeline for Large Language Model
In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations to any Hugging Face Transformers model, and then run inference on the optimized low-bit model.
In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations (including INT8/INT5/INT4) to any Hugging Face Transformers model, and then run inference on the optimized low-bit model.
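
For orientation, a minimal sketch of such a low-bit pipeline (the model path, prompt and chosen precision are placeholders; the available `load_in_low_bit` types are `sym_int4`, `asym_int4`, `sym_int5`, `asym_int5` and `sym_int8`):

```python
# sketch: apply a chosen low-bit optimization at load time, then run inference
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
tokenizer = AutoTokenizer.from_pretrained('/path/to/model/')
input_ids = tokenizer.encode("What is AI?", return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.batch_decode(output_ids))
```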
## Prepare Environment
We suggest using conda to manage the environment: