diff --git a/README.md b/README.md
index 1e1add49..ef40fdbe 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ _**Fast, Distributed, Secure AI for Big Data**_
 ---
 ## Latest News
-- **Try the latest [`bigdl-llm`](python/llm) for running LLM (language language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*
+- **Try the latest [`bigdl-llm`](python/llm) for running LLM (large language model) on your Intel laptop using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), etc., and supports any Hugging Face Transformers model)*

diff --git a/docs/readthedocs/source/doc/PythonAPI/LLM/transformers.rst b/docs/readthedocs/source/doc/PythonAPI/LLM/transformers.rst
index 10438632..519b2d27 100644
--- a/docs/readthedocs/source/doc/PythonAPI/LLM/transformers.rst
+++ b/docs/readthedocs/source/doc/PythonAPI/LLM/transformers.rst
@@ -48,7 +48,45 @@ llm.transformers.model
 llm.transformers.modelling_bigdl
 ----------------------------------------
 
-.. automodule:: bigdl.llm.transformers.modelling_bigdl
+.. autoclass:: bigdl.llm.transformers.modelling_bigdl.LlamaForCausalLM
     :members:
     :undoc-members:
     :show-inheritance:
+    :exclude-members: GGML_Model, GGML_Module, HF_Class
+
+    .. automethod:: from_pretrained
+
+
+.. autoclass:: bigdl.llm.transformers.modelling_bigdl.ChatGLMForCausalLM
+    :members:
+    :undoc-members:
+    :show-inheritance:
+    :exclude-members: GGML_Model, GGML_Module, HF_Class
+
+    .. automethod:: from_pretrained
+
+
+.. autoclass:: bigdl.llm.transformers.modelling_bigdl.GptneoxForCausalLM
+    :members:
+    :undoc-members:
+    :show-inheritance:
+    :exclude-members: GGML_Model, GGML_Module, HF_Class
+
+    .. automethod:: from_pretrained
+
+
+.. autoclass:: bigdl.llm.transformers.modelling_bigdl.BloomForCausalLM
+    :members:
+    :undoc-members:
+    :show-inheritance:
+    :exclude-members: GGML_Model, GGML_Module, HF_Class
+
+    .. automethod:: from_pretrained
+
+.. autoclass:: bigdl.llm.transformers.modelling_bigdl.StarcoderForCausalLM
+    :members:
+    :undoc-members:
+    :show-inheritance:
+    :exclude-members: GGML_Model, GGML_Module, HF_Class
+
+    .. automethod:: from_pretrained
diff --git a/python/llm/README.md b/python/llm/README.md
index 84ee4f2b..34a8b4da 100644
--- a/python/llm/README.md
+++ b/python/llm/README.md
@@ -1,6 +1,6 @@
 ## BigDL-LLM
 
-**`bigdl-llm`** is a library for running ***LLM*** (language language model) on your Intel ***laptop*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model).
+**`bigdl-llm`** is a library for running ***LLM*** (large language model) on your Intel ***laptop*** using INT4 with very low latency[^1] (for any Hugging Face *Transformers* model).
 
 >*(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.)*
@@ -76,7 +76,11 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
   #load Hugging Face Transformers model with INT4 optimizations
   from bigdl.llm.transformers import AutoModelForCausalLM
   model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_4bit=True)
+  ```
+  After loading the Hugging Face Transformers model, you may easily run the optimized model as follows.
+
+  ```python
   #run the optimized model
   from transformers import AutoTokenizer
   tokenizer = AutoTokenizer.from_pretrained(model_path)
@@ -88,13 +92,14 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
   See the complete examples [here](example/transformers/transformers_int4/).
 
   >**Note**: You may apply more low bit optimizations (including INT8, INT5 and INT4) as follows:
-  >```python
-  >model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
-  >```
-  >See the complete example [here](example/transformers/transformers_low_bit/).
-
+  >```python
+  >model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
+  >```
+  >See the complete example [here](example/transformers/transformers_low_bit/).
-  After the model is optimizaed using INT4 (or INT5/INT8), you may save and load the optimized model as follows:
+
+  After the model is optimized using INT4 (or INT8/INT5), you may save and load the optimized model as follows:
+
   ```python
   model.save_low_bit(model_path)
@@ -106,19 +111,24 @@ You may run the models using `transformers`-style API in `bigdl-llm`.
 
   You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
 
-  >**Note**: Currently only llama/bloom/gptneox/starcoder model family is supported; for other models, you may use the Transformers INT4 format as described above).
+  >**Notes**:
+
+  * Currently only the llama/bloom/gptneox/starcoder/chatglm model families are supported; for other models, you may use the Transformers INT4 format as described above.
+
+  * You may choose the corresponding API developed for specific native models to load the converted model.
 
-  ```python
+  ```python
   #convert the model
   from bigdl.llm import llm_convert
   bigdl_llm_path = llm_convert(model='/path/to/model/', outfile='/path/to/output/', outtype='int4', model_family="llama")
 
   #load the converted model
-  from bigdl.llm.transformers import BigdlNativeForCausalLM
-  llm = BigdlNativeForCausalLM.from_pretrained("/path/to/output/model.bin",...)
-
-  #run the converted model
+  #switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
+  from bigdl.llm.transformers import LlamaForCausalLM
+  llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", ...)
+
+  #run the converted model
   input_ids = llm.tokenize(prompt)
   output_ids = llm.generate(input_ids, ...)
   output = llm.batch_decode(output_ids)
@@ -243,8 +253,9 @@ See the inital `bigdl-llm` API Doc [here](https://bigdl.readthedocs.io/en/latest
 [^1]: Performance varies by use, configuration and other factors. `bigdl-llm` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.
 
-### `bigdl-llm` Dependence
-The native code/lib in `bigdl-llm` has been built using the following tools; in particular, lower `LIBC` version on your Linux system may be incompatible with `bigdl-llm`.
+### `bigdl-llm` Dependencies
+The native code/lib in `bigdl-llm` has been built using the following tools.
+Note that a lower `LIBC` version on your Linux system may be incompatible with `bigdl-llm`.
 
 | Model family | Platform | Compiler           | GLIBC |
 | ------------ | -------- | ------------------ | ----- |
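
The hunk above cuts off right after `model.save_low_bit(model_path)`, so the reload half of the flow is not shown. Below is a minimal sketch of the save/reload cycle the README describes; the paths are hypothetical and the `load_low_bit` call is assumed as the counterpart to `save_low_bit` (it does not appear in this hunk), so treat it as illustrative rather than authoritative.

```python
# Illustrative sketch only: paths are hypothetical, and `load_low_bit` is an
# assumed counterpart to `save_low_bit` that this hunk does not show.
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = '/path/to/model/'            # original Hugging Face checkpoint (hypothetical)
low_bit_path = '/path/to/low-bit-model/'  # where the optimized weights are persisted (hypothetical)

# first run: optimize while loading, then persist the low-bit weights
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_low_bit="sym_int5")
model.save_low_bit(low_bit_path)

# later runs: load the already-optimized weights directly,
# without converting the original checkpoint again
model = AutoModelForCausalLM.load_low_bit(low_bit_path)
```
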
diff --git a/python/llm/src/bigdl/llm/transformers/modelling_bigdl.py b/python/llm/src/bigdl/llm/transformers/modelling_bigdl.py
index 098de5e8..4c7ba671 100644
--- a/python/llm/src/bigdl/llm/transformers/modelling_bigdl.py
+++ b/python/llm/src/bigdl/llm/transformers/modelling_bigdl.py
@@ -19,7 +19,9 @@
 # Otherwise there would be module not found error in non-pip's setting as Python would
 # only search the first bigdl package and end up finding only one sub-package.
+import importlib
 import logging
+
 from bigdl.llm.utils.common import invalidInputError
 from .model import *
@@ -107,42 +109,53 @@ class _BaseGGMLClass:
         :return: a model instance
         """
-        if native:
-            invalidInputError(dtype.lower() in ['int4', 'int8'],
-                              "Now we only support int4 and int8 as date type for weight")
-            ggml_model_path = pretrained_model_name_or_path
-            return cls.GGML_Model(model_path=ggml_model_path,
-                                  **kwargs)
-        else:
-            return cls.HF_Class.from_pretrained(pretrained_model_name_or_path,
-                                                *args, **kwargs)
+        try:
+            module = importlib.import_module(cls.GGML_Module)
+            class_ = getattr(module, cls.GGML_Model)
+            if native:
+                invalidInputError(dtype.lower() in ['int4', 'int8'],
+                                  "Now we only support int4 and int8 as data type for weight")
+                ggml_model_path = pretrained_model_name_or_path
+                model = class_(model_path=ggml_model_path, **kwargs)
+            else:
+                model = cls.HF_Class.from_pretrained(pretrained_model_name_or_path,
+                                                     *args, **kwargs)
+        except Exception as e:
+            invalidInputError(
+                False,
+                f"Could not load model from path: {pretrained_model_name_or_path}. "
+                f"Please make sure the CausalLM class matches "
+                "the model you want to load. "
+                f"Received error {e}"
+            )
+        return model
 
 
 class LlamaForCausalLM(_BaseGGMLClass):
-    from bigdl.llm.ggml.model.llama import Llama
-    GGML_Model = Llama
+    GGML_Module = "bigdl.llm.models"
+    GGML_Model = "Llama"
     HF_Class = AutoModelForCausalLM
 
 
 class ChatGLMForCausalLM(_BaseGGMLClass):
-    from bigdl.llm.ggml.model.chatglm import ChatGLM
-    GGML_Model = ChatGLM
+    GGML_Module = "bigdl.llm.ggml.model.chatglm"
+    GGML_Model = "ChatGLM"
     HF_Class = AutoModel
 
 
 class GptneoxForCausalLM(_BaseGGMLClass):
-    from bigdl.llm.ggml.model.gptneox import Gptneox
-    GGML_Model = Gptneox
+    GGML_Module = "bigdl.llm.models"
+    GGML_Model = "Gptneox"
     HF_Class = AutoModelForCausalLM
 
 
 class BloomForCausalLM(_BaseGGMLClass):
-    from bigdl.llm.ggml.model.bloom import Bloom
-    GGML_Model = Bloom
+    GGML_Module = "bigdl.llm.models"
+    GGML_Model = "Bloom"
     HF_Class = AutoModelForCausalLM
 
 
 class StarcoderForCausalLM(_BaseGGMLClass):
-    from bigdl.llm.ggml.model.starcoder import Starcoder
-    GGML_Model = Starcoder
+    GGML_Module = "bigdl.llm.models"
+    GGML_Model = "Starcoder"
     HF_Class = AutoModelForCausalLM
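
To make the refactoring above easier to follow, here is a small standalone sketch of the lazy-loading pattern the new `_BaseGGMLClass` uses: the GGML backend class is looked up by module path and class name only when `from_pretrained` is called, instead of being imported when `modelling_bigdl` itself is imported. The helper name below is hypothetical and exists only for illustration.

```python
# Standalone illustration of the importlib pattern used above; `resolve_ggml_class`
# is a hypothetical helper, not part of bigdl-llm.
import importlib


def resolve_ggml_class(module_name: str, class_name: str):
    """Import `module_name` lazily and return its `class_name` attribute."""
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Mirrors what LlamaForCausalLM.from_pretrained(..., native=True) now does with
# GGML_Module = "bigdl.llm.models" and GGML_Model = "Llama":
Llama = resolve_ggml_class("bigdl.llm.models", "Llama")
llm = Llama(model_path="/path/to/output/model.bin")  # hypothetical converted-model path
```

Deferring the import this way means a missing or broken native backend only surfaces when that particular model family is actually requested, which is what lets the `try/except` in `from_pretrained` translate the failure into a single `invalidInputError` message.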