Update READMEs (#8554)
This commit is contained in: parent ee70977c07 · commit 1ebc43b151

4 changed files with 49 additions and 8 deletions
@@ -12,6 +12,22 @@ See the ***optimized performance*** of `phoenix-inst-chat-7b`, `vicuna-13b-v1.1`

            <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-models.png" width='85%'/>
</p>

### Verified models
You may use any Hugging Face Transformers model with `bigdl-llm`, and the following models have been verified on Intel laptops.

| Model     | Example                                                  |
|-----------|----------------------------------------------------------|
| LLaMA     | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/vicuna)    |
| MPT       | [link](example/transformers/transformers_int4/mpt)       |
| Falcon    | [link](example/transformers/transformers_int4/falcon)    |
| ChatGLM   | [link](example/transformers/transformers_int4/chatglm)   |
| ChatGLM2  | [link](example/transformers/transformers_int4/chatglm2)  |
| MOSS      | [link](example/transformers/transformers_int4/moss)      |
| Baichuan  | [link](example/transformers/transformers_int4/baichuan)  |
| Dolly-v1  | [link](example/transformers/transformers_int4/dolly_v1)  |
| RedPajama | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/redpajama) |
| Phoenix   | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/phoenix)   |
| StarCoder | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/starcoder) |

### Working with `bigdl-llm`
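Before diving into the details below, here is a minimal end-to-end sketch of the `transformers`-style flow this section describes. The model path and prompt are placeholders, and the `load_in_4bit` flag is an assumption based on the INT4 examples referenced in this README:

```python
# A minimal sketch of the transformers-style API described in this section.
# '/path/to/model/' and the prompt are placeholders; load_in_4bit is an
# assumption based on the INT4 examples linked from this README.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained('/path/to/model/')

input_ids = tokenizer.encode("Once upon a time,", return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=32)
output = tokenizer.batch_decode(output_ids)
print(output[0])
```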

@@ -100,12 +116,22 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
  output = tokenizer.batch_decode(output_ids)
  ```

-  See the complete example [here](example/transformers/transformers_int4/transformers_int4_pipeline.py).
+  See the complete examples [here](example/transformers/transformers_int4/).

-  Notice: For more quantized precision, you can use another parameter `load_in_low_bit`. Available types are `sym_int4`, `asym_int4`, `sym_int5`, `asym_int5` and `sym_int8`.
+  >**Note**: You may apply more low-bit optimizations (including INT8, INT5 and INT4) as follows:
+  >```python
+  >model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
+  >```
+  >See the complete example [here](example/transformers/transformers_low_bit/).

  After the model is optimized using INT4 (or INT8/INT5), you may save and load the optimized model as follows:
  ```python
  # Optimize once, save the low-bit model, then reload it directly later
  # without re-optimizing
  model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
  model.save_low_bit(model_path)

  new_model = AutoModelForCausalLM.load_low_bit(model_path)
  ```
  See the example [here](example/transformers/transformers_low_bit/).

- ##### Using native INT4 format

@@ -2,7 +2,7 @@

In this example, we show a pipeline to convert a large language model to BigDL-LLM native INT4 format, and then run inference on the converted INT4 model.

-> **Note**: BigDL-LLM native INT4 format currently supports model family LLaMA, GPT-NeoX, BLOOM and StarCoder.
+> **Note**: BigDL-LLM native INT4 format currently supports the model families **LLaMA** (such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.), **GPT-NeoX** (such as RedPajama), **BLOOM** (such as Phoenix) and **StarCoder**.

## Prepare Environment
We suggest using conda to manage the environment:
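Once the environment is ready, the convert-then-infer flow described above looks roughly like the sketch below. The `llm_convert` and `BigdlNativeForCausalLM` names follow the native INT4 example referenced earlier, and the paths, `model_family`, and generation arguments are illustrative assumptions; check the example folder for the exact, verified usage:

```python
# A sketch of the native INT4 pipeline: convert a checkpoint, then run it.
# Paths and model_family='llama' are placeholders/assumptions; see the
# native_int4 example for verified usage.
from bigdl.llm import llm_convert
from bigdl.llm.transformers import BigdlNativeForCausalLM

# Convert the Hugging Face checkpoint to the BigDL-LLM native INT4 format.
bigdl_llm_path = llm_convert(model='/path/to/model/',
                             outfile='/path/to/output/',
                             outtype='int4',
                             model_family='llama')

# Load the converted model and run inference on it.
llm = BigdlNativeForCausalLM.from_pretrained(bigdl_llm_path,
                                             model_family='llama')
input_ids = llm.tokenize("Once upon a time,")
output_ids = llm.generate(input_ids, max_new_tokens=32)
print(llm.batch_decode(output_ids))
```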

@@ -1,6 +1,21 @@
# BigDL-LLM Transformers INT4 Optimization for Large Language Model

You can use BigDL-LLM to run any Hugging Face Transformers model with INT4 optimizations on either servers or laptops. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it.
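As a taste of what these folders contain, the INT4 load typically boils down to one changed line. This is a sketch: the model path is a placeholder and the `load_in_4bit` flag is assumed from the examples, so prefer the per-model instructions in each folder:

```python
# Drop-in replacement for the Hugging Face class; the one changed line is
# the load_in_4bit flag (assumed here; see each model folder for specifics).
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)
```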

## Verified models

| Model     | Example                                                  |
|-----------|----------------------------------------------------------|
| LLaMA     | [link](vicuna)    |
| MPT       | [link](mpt)       |
| Falcon    | [link](falcon)    |
| ChatGLM   | [link](chatglm)   |
| ChatGLM2  | [link](chatglm2)  |
| MOSS      | [link](moss)      |
| Baichuan  | [link](baichuan)  |
| Dolly-v1  | [link](dolly_v1)  |
| RedPajama | [link](redpajama) |
| Phoenix   | [link](phoenix)   |
| StarCoder | [link](starcoder) |

## Recommended Requirements
To run the examples, we recommend using Intel® Xeon® processors (server) or a 12th Gen Intel® Core™ processor or later (client).

@@ -1,6 +1,6 @@
-# BigDL-LLM Transformers INT4 Inference Pipeline for Large Language Model
+# BigDL-LLM Transformers Low-Bit Inference Pipeline for Large Language Model

-In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations to any Hugging Face Transformers model, and then run inference on the optimized low-bit model.
+In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations (including INT8/INT5/INT4) to any Hugging Face Transformers model, and then run inference on the optimized low-bit model.

## Prepare Environment
We suggest using conda to manage the environment:
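In outline, the pipeline in this example does the following; this is a sketch assembled from the snippets in the top-level README, with the save path as a placeholder:

```python
# Apply a low-bit optimization, save the result, and reload it later
# without re-optimizing. '/path/to/low_bit_model/' is a placeholder path.
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
model.save_low_bit('/path/to/low_bit_model/')

new_model = AutoModelForCausalLM.load_low_bit('/path/to/low_bit_model/')
```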