diff --git a/python/llm/README.md b/python/llm/README.md
index d4116274..9aa00e19 100644
--- a/python/llm/README.md
+++ b/python/llm/README.md
@@ -12,6 +12,22 @@ See the ***optimized performance*** of `phoenix-inst-chat-7b`, `vicuna-13b-v1.1`

+### Verified models
+You may use any Hugging Face *Transformers* model with `bigdl-llm`, and the following models have been verified on Intel laptops.
+| Model     | Example                                                  |
+|-----------|----------------------------------------------------------|
+| LLaMA     | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/vicuna) |
+| MPT       | [link](example/transformers/transformers_int4/mpt)       |
+| Falcon    | [link](example/transformers/transformers_int4/falcon)    |
+| ChatGLM   | [link](example/transformers/transformers_int4/chatglm)   |
+| ChatGLM2  | [link](example/transformers/transformers_int4/chatglm2)  |
+| MOSS      | [link](example/transformers/transformers_int4/moss)      |
+| Baichuan  | [link](example/transformers/transformers_int4/baichuan)  |
+| Dolly-v1  | [link](example/transformers/transformers_int4/dolly_v1)  |
+| RedPajama | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/redpajama) |
+| Phoenix   | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/phoenix) |
+| StarCoder | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/starcoder) |
+
 ### Working with `bigdl-llm`
@@ -100,13 +116,23 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
       output = tokenizer.batch_decode(output_ids)
   ```
 
-  See the complete example [here](example/transformers/transformers_int4/transformers_int4_pipeline.py).
+  See the complete examples [here](example/transformers/transformers_int4/).
+
+  >**Note**: You may apply additional low-bit optimizations (including INT8, INT5 and INT4) with the `load_in_low_bit` parameter as follows:
+  >```python
+  >model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
+  >```
+  >See the complete example [here](example/transformers/transformers_low_bit/).
 
-  Notice: For more quantized precision, you can use another parameter `load_in_low_bit`. Available types are `sym_int4`, `asym_int4`, `sym_int5`, `asym_int5` and `sym_int8`.
-  ```python
-  model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
-  ```
+  After the model has been optimized with INT4 (or INT8/INT5), you may save and load the optimized model as follows:
+  ```python
+  model.save_low_bit(model_path)
+
+  new_model = AutoModelForCausalLM.load_low_bit(model_path)
+  ```
+  See the example [here](example/transformers/transformers_low_bit/).
+
 - ##### Using native INT4 format
 
   You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
diff --git a/python/llm/example/transformers/native_int4/native_int4_pipeline_readme.md b/python/llm/example/transformers/native_int4/README.md
similarity index 97%
rename from python/llm/example/transformers/native_int4/native_int4_pipeline_readme.md
rename to python/llm/example/transformers/native_int4/README.md
index 9b042536..3152ff31 100644
--- a/python/llm/example/transformers/native_int4/native_int4_pipeline_readme.md
+++ b/python/llm/example/transformers/native_int4/README.md
@@ -2,7 +2,7 @@
 In this example, we show a pipeline to convert a large language model to BigDL-LLM native INT4 format, and then run inference on the converted INT4 model.
 
-> **Note**: BigDL-LLM native INT4 format currently supports model family LLaMA, GPT-NeoX, BLOOM and StarCoder.
+> **Note**: BigDL-LLM native INT4 format currently supports the model families **LLaMA** (such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.), **GPT-NeoX** (such as RedPajama), **BLOOM** (such as Phoenix) and **StarCoder**.
 
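+At a glance, the pipeline in this example looks roughly like the sketch below. This is only an illustration: the function names and arguments shown (`llm_convert`, `BigdlNativeForCausalLM`, `model_family`, the paths, etc.) are assumptions and may differ from the actual API, so please follow the example script in this folder for the exact, runnable version.
+
+```python
+from bigdl.llm import llm_convert                           # assumed conversion helper
+from bigdl.llm.transformers import BigdlNativeForCausalLM   # assumed native INT4 loader
+
+# 1. Convert a Hugging Face checkpoint to BigDL-LLM native INT4 format
+#    (the paths and the model_family value below are placeholders).
+bigdl_llm_path = llm_convert(model='/path/to/llama-7b-hf/',
+                             outfile='/path/to/output/',
+                             outtype='int4',
+                             model_family='llama')
+
+# 2. Load the converted INT4 model and run inference on a sample prompt.
+llm = BigdlNativeForCausalLM.from_pretrained(bigdl_llm_path, model_family='llama')
+tokens_id = llm.tokenize("Once upon a time, there existed a little girl who")
+output_tokens_id = llm.generate(tokens_id, max_new_tokens=32)
+print(llm.batch_decode(output_tokens_id))
+```
+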
 ## Prepare Environment
 We suggest using conda to manage environment:
diff --git a/python/llm/example/transformers/transformers_int4/README.md b/python/llm/example/transformers/transformers_int4/README.md
index b319f89b..9c207a7d 100644
--- a/python/llm/example/transformers/transformers_int4/README.md
+++ b/python/llm/example/transformers/transformers_int4/README.md
@@ -1,6 +1,21 @@
 # BigDL-LLM Transformers INT4 Optimization for Large Language Model
 You can use BigDL-LLM to run any Huggingface Transformer models with INT4 optimizations on either servers or laptops. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it.
+## Verified models
+| Model     | Example           |
+|-----------|-------------------|
+| LLaMA     | [link](vicuna)    |
+| MPT       | [link](mpt)       |
+| Falcon    | [link](falcon)    |
+| ChatGLM   | [link](chatglm)   |
+| ChatGLM2  | [link](chatglm2)  |
+| MOSS      | [link](moss)      |
+| Baichuan  | [link](baichuan)  |
+| Dolly-v1  | [link](dolly_v1)  |
+| RedPajama | [link](redpajama) |
+| Phoenix   | [link](phoenix)   |
+| StarCoder | [link](starcoder) |
+
 ## Recommended Requirements
 To run the examples, we recommend using Intel® Xeon® processors (server), or >= 12th Gen Intel® Core™ processor (client).
diff --git a/python/llm/example/transformers/transformers_low_bit/README.md b/python/llm/example/transformers/transformers_low_bit/README.md
index 4bf23fae..46cd9406 100644
--- a/python/llm/example/transformers/transformers_low_bit/README.md
+++ b/python/llm/example/transformers/transformers_low_bit/README.md
@@ -1,6 +1,6 @@
-# BigDL-LLM Transformers INT4 Inference Pipeline for Large Language Model
+# BigDL-LLM Transformers Low-Bit Inference Pipeline for Large Language Model
 
-In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations to any Hugging Face Transformers model, and then run inference on the optimized low-bit model.
+In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations (including INT8/INT5/INT4) to any Hugging Face Transformers model, and then run inference on the optimized low-bit model.
 
 ## Prepare Environment
 We suggest using conda to manage environment: