[BigDL LLM] Update readme for unifying transformers API (#8737)
* update readme doc
* fix readthedocs error
* update comment
* update exception error info
* invalidInputError instead
* fix readme typo error and remove import error
* fix more typo
This commit is contained in:

parent c1f9af6d97
commit f4164e4492

4 changed files with 98 additions and 36 deletions
@@ -9,7 +9,7 @@ _**Fast, Distributed, Secure AI for Big Data**_
 ---
 ## Latest News
 
-- **Try the latest [`bigdl-llm`](python/llm) for running LLM (language language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
+- **Try the latest [`bigdl-llm`](python/llm) for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
 
 <p align="center">
             <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/chatglm2-6b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-13b.gif" width='30%' /> <img src="https://github.com/bigdl-project/bigdl-project.github.io/blob/master/assets/llm-15b5.gif" width='30%' />
@@ -48,7 +48,45 @@ llm.transformers.model
 llm.transformers.modelling_bigdl
 ----------------------------------------
 
-.. automodule:: bigdl.llm.transformers.modelling_bigdl
+.. autoclass:: bigdl.llm.transformers.modelling_bigdl.LlamaForCausalLM
     :members:
     :undoc-members:
     :show-inheritance:
+    :exclude-members: GGML_Model, GGML_Module, HF_Class
+
+    .. automethod:: from_pretrained
+
+
+.. autoclass:: bigdl.llm.transformers.modelling_bigdl.ChatGLMForCausalLM
+    :members:
+    :undoc-members:
+    :show-inheritance:
+    :exclude-members: GGML_Model, GGML_Module, HF_Class
+
+    .. automethod:: from_pretrained
+
+
+.. autoclass:: bigdl.llm.transformers.modelling_bigdl.GptneoxForCausalLM
+    :members:
+    :undoc-members:
+    :show-inheritance:
+    :exclude-members: GGML_Model, GGML_Module, HF_Class
+
+    .. automethod:: from_pretrained
+
+
+.. autoclass:: bigdl.llm.transformers.modelling_bigdl.BloomForCausalLM
+    :members:
+    :undoc-members:
+    :show-inheritance:
+    :exclude-members: GGML_Model, GGML_Module, HF_Class
+
+    .. automethod:: from_pretrained
+
+.. autoclass:: bigdl.llm.transformers.modelling_bigdl.StarcoderForCausalLM
+    :members:
+    :undoc-members:
+    :show-inheritance:
+    :exclude-members: GGML_Model, GGML_Module, HF_Class
+
+    .. automethod:: from_pretrained
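The switch from one `automodule` to per-class `autoclass` entries mirrors the unified `from_pretrained` entry point each model-family class now exposes. A minimal usage sketch of that documented API (the path is a placeholder; `native` and `dtype` are the parameters visible in the code change further down):

```python
# Hypothetical usage of the unified API documented above; the path is a
# placeholder, and the keyword arguments follow the code change below.
from bigdl.llm.transformers.modelling_bigdl import LlamaForCausalLM

# native=True loads a converted GGML binary for this model family;
# native=False falls back to the Hugging Face class instead.
model = LlamaForCausalLM.from_pretrained('/path/to/converted/model.bin',
                                         native=True, dtype='int4')
```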
@@ -1,6 +1,6 @@
 ## BigDL-LLM
 
-**`bigdl-llm`** is a library for running ***LLM*** (language language model) on your Intel ***laptop*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model).
+**`bigdl-llm`** is a library for running ***LLM*** (large language model) on your Intel ***laptop*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model).
 
 >*(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.)*
 
@@ -76,7 +76,11 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
   #load Hugging Face Transformers model with INT4 optimizations
   from bigdl.llm.transformers import AutoModelForCausalLM
   model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)
-
+  ```
+
+  After loading the Hugging Face Transformers model, you may easily run the optimized model as follows.
+
+  ```python
   #run the optimized model
   from transformers import AutoTokenizer
   tokenizer = AutoTokenizer.from_pretrained(model_path)
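Put together, the two snippets in this hunk form a load-then-generate flow. A self-contained sketch under the README's assumptions (the path and prompt are placeholders; the generation arguments are illustrative, not mandated by the README):

```python
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = '/path/to/model/'  # placeholder checkpoint path

# load the Hugging Face Transformers model with INT4 optimizations
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# run the optimized model through the standard transformers generate API
input_ids = tokenizer.encode("Once upon a time", return_tensors="pt")
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```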
@@ -94,7 +98,8 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
     >See the complete example [here](example/transformers/transformers_low_bit/).
 
+
-  After the model is optimizaed using INT4 (or INT5/INT8), you may save and load the optimized model as follows:
+  After the model is optimized using INT4 (or INT8/INT5), you may save and load the optimized model as follows:
 
   ```python
   model.save_low_bit(model_path)
 
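The hunk is cut off before the load half of the round trip. A sketch of both halves, assuming the loader is named `load_low_bit` by symmetry with the `save_low_bit` call shown above (treat that name as an assumption):

```python
from bigdl.llm.transformers import AutoModelForCausalLM

# optimize the model at load time, then persist the low-bit weights
model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)
model_path = '/path/to/saved/low-bit/model/'  # placeholder output directory
model.save_low_bit(model_path)

# later, reload the already-quantized weights directly; load_low_bit is
# assumed here, since the hunk ends before the load step
new_model = AutoModelForCausalLM.load_low_bit(model_path)
```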
@@ -106,7 +111,11 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
 
   You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
 
-  >**Note**: Currently only llama/bloom/gptneox/starcoder model family is supported; for other models, you may use the Transformers INT4 format as described above).
+  >**Notes**:
+
+  * Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; for other models, you may use the Transformers INT4 format as described above.
+
+  * You may choose the corresponding API developed for specific native models to load the converted model.
 
   ```python
   #convert the model
@@ -115,8 +124,9 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
           outfile='/path/to/output/', outtype='int4', model_family="llama")
 
   #load the converted model
-  from bigdl.llm.transformers import BigdlNativeForCausalLM
-  llm = BigdlNativeForCausalLM.from_pretrained("/path/to/output/model.bin",...)
+  #switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
+  from bigdl.llm.transformers import LlamaForCausalLM
+  llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", ...)
 
   #run the converted model
   input_ids = llm.tokenize(prompt)
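Reassembled, the native-format flow in the last two hunks is convert, load, run. A sketch of the whole sequence; the diff truncates the convert call, so the `llm_convert` name and its `model=` argument are assumptions, while `outfile`, `outtype`, and `model_family` are taken from the continuation line shown above:

```python
# convert the model; llm_convert and model= are assumed, since the diff
# only shows the continuation line of this call
from bigdl.llm import llm_convert
from bigdl.llm.transformers import LlamaForCausalLM

bigdl_llm_path = llm_convert(model='/path/to/model/',
                             outfile='/path/to/output/', outtype='int4',
                             model_family="llama")

# load the converted model; switch to ChatGLMForCausalLM/GptneoxForCausalLM/
# BloomForCausalLM/StarcoderForCausalLM for other model families
llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin")

# run the converted model
prompt = "Once upon a time"
input_ids = llm.tokenize(prompt)
```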
@@ -243,8 +253,9 @@ See the initial `bigdl-llm` API Doc [here](https://bigdl.readthedocs.io/en/latest
 
 [^1]: Performance varies by use, configuration and other factors. `bigdl-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.
 
-### `bigdl-llm` Dependence 
-The native code/lib in `bigdl-llm` has been built using the following tools; in particular, lower  `LIBC` version on your Linux system may be incompatible with `bigdl-llm`.
+### `bigdl-llm` Dependencies
+The native code/lib in `bigdl-llm` has been built using the following tools.
+Note that a lower `LIBC` version on your Linux system may be incompatible with `bigdl-llm`.
 
 | Model family | Platform | Compiler           | GLIBC |
 | ------------ | -------- | ------------------ | ----- |
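Since GLIBC is the compatibility constraint called out here, a quick way to check what your Linux system provides before installing (a minimal sketch; `platform.libc_ver()` reports the libc the running Python interpreter was linked against):

```python
# Print the libc name and version, e.g. ('glibc', '2.31'), and compare
# the version against the GLIBC column in the table above.
import platform

libc_name, libc_version = platform.libc_ver()
print(libc_name, libc_version)
```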
@@ -19,7 +19,9 @@
 # Otherwise there would be module not found error in non-pip's setting as Python would
 # only search the first bigdl package and end up finding only one sub-package.
 
+import importlib
 import logging
 
 from bigdl.llm.utils.common import invalidInputError
 from .model import *
@@ -107,42 +109,53 @@ class _BaseGGMLClass:
 
         :return: a model instance
         """
-        if native:
-            invalidInputError(dtype.lower() in ['int4', 'int8'],
-                              "Now we only support int4 and int8 as date type for weight")
-            ggml_model_path = pretrained_model_name_or_path
-            return cls.GGML_Model(model_path=ggml_model_path,
-                                  **kwargs)
-        else:
-            return cls.HF_Class.from_pretrained(pretrained_model_name_or_path,
-                                                *args, **kwargs)
+        try:
+            module = importlib.import_module(cls.GGML_Module)
+            class_ = getattr(module, cls.GGML_Model)
+            if native:
+                invalidInputError(dtype.lower() in ['int4', 'int8'],
+                                  "Now we only support int4 and int8 as data type for weight")
+                ggml_model_path = pretrained_model_name_or_path
+                model = class_(model_path=ggml_model_path, **kwargs)
+            else:
+                model = cls.HF_Class.from_pretrained(pretrained_model_name_or_path,
+                                                     *args, **kwargs)
+        except Exception as e:
+            invalidInputError(
+                False,
+                f"Could not load model from path: {pretrained_model_name_or_path}. "
+                "Please make sure the CausalLM class matches "
+                "the model you want to load. "
+                f"Received error {e}"
+            )
+        return model
 
 
 class LlamaForCausalLM(_BaseGGMLClass):
-    from bigdl.llm.ggml.model.llama import Llama
-    GGML_Model = Llama
+    GGML_Module = "bigdl.llm.models"
+    GGML_Model = "Llama"
     HF_Class = AutoModelForCausalLM
 
 
 class ChatGLMForCausalLM(_BaseGGMLClass):
-    from bigdl.llm.ggml.model.chatglm import ChatGLM
-    GGML_Model = ChatGLM
+    GGML_Module = "bigdl.llm.ggml.model.chatglm"
+    GGML_Model = "ChatGLM"
     HF_Class = AutoModel
 
 
 class GptneoxForCausalLM(_BaseGGMLClass):
-    from bigdl.llm.ggml.model.gptneox import Gptneox
-    GGML_Model = Gptneox
+    GGML_Module = "bigdl.llm.models"
+    GGML_Model = "Gptneox"
     HF_Class = AutoModelForCausalLM
 
 
 class BloomForCausalLM(_BaseGGMLClass):
-    from bigdl.llm.ggml.model.bloom import Bloom
-    GGML_Model = Bloom
+    GGML_Module = "bigdl.llm.models"
+    GGML_Model = "Bloom"
     HF_Class = AutoModelForCausalLM
 
 
 class StarcoderForCausalLM(_BaseGGMLClass):
-    from bigdl.llm.ggml.model.starcoder import Starcoder
-    GGML_Model = Starcoder
+    GGML_Module = "bigdl.llm.models"
+    GGML_Model = "Starcoder"
     HF_Class = AutoModelForCausalLM
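The substance of this hunk is replacing eager class imports with (module, class) string pairs resolved via `importlib` only when `from_pretrained` runs, so importing `modelling_bigdl` no longer fails if one backend is missing. A standalone sketch of the same pattern, using stand-in stdlib names rather than the repo's classes:

```python
# Lazy-import pattern from the change above: keep the module path and class
# name as strings, resolve them with importlib only at call time.
import importlib


class _LazyCausalLM:
    MODULE = "json"        # stand-in; the real code uses e.g. "bigdl.llm.models"
    CLASS = "JSONDecoder"  # stand-in; the real code uses e.g. "Llama"

    @classmethod
    def from_pretrained(cls, *args, **kwargs):
        module = importlib.import_module(cls.MODULE)  # import happens here, not at module load
        class_ = getattr(module, cls.CLASS)           # resolve the class by name
        return class_(*args, **kwargs)


decoder = _LazyCausalLM.from_pretrained()  # the backing module is imported only now
```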