diff --git a/README.md b/README.md
index f3e02023..e47e05d0 100644
--- a/README.md
+++ b/README.md
@@ -2,26 +2,74 @@


-_**Fast, Distributed, Secure AI for Big Data**_
-
 ---
-## Latest News
+## BigDL-LLM
-- **Try the latest [`bigdl-llm`](python/llm) library for running LLM (large language model) on your Intel laptop or GPU using INT4 with very low latency!**[^1] *(It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), etc., and supports any Hugging Face Transformers model)*
+**[`bigdl-llm`](python/llm)** is a library for running ***LLMs*** (large language models) on your Intel ***laptop*** or ***GPU*** using INT4 with very low latency[^1] (for any **PyTorch** model).
+> *It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [gptq](https://github.com/IST-DASLab/gptq), [ggml](https://github.com/ggerganov/ggml), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.*
+
+### Latest update
+- `bigdl-llm` now supports Intel Arc or Flex GPU; see the latest GPU examples [here](python/llm/example/gpu).
+- The `bigdl-llm` tutorial is now available [here](https://github.com/intel-analytics/bigdl-llm-tutorial).
+- Over 20 models have been optimized/verified on `bigdl-llm`, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan, MOSS,* and more; see the complete list [here](python/llm/README.md#verified-models).
+
+### `bigdl-llm` Demos
+See the ***optimized performance*** of `chatglm2-6b`, `llama-2-13b-chat`, and `starcoder-15b` models on a 12th Gen Intel Core CPU below.

-- **[Update] `bigdl-llm` now supports Intel Arc or Flex GPU; see the the latest GPU examples [here](python/llm/example/gpu).**
+### `bigdl-llm` quick start
-- **Over a dozen models have been verified on [`bigdl-llm`](python/llm)**, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list [here](python/llm/README.md#verified-models).
+
+#### Install
+You may install **`bigdl-llm`** as follows:
+```bash
+pip install --pre --upgrade bigdl-llm[all]
+```
+> Note: `bigdl-llm` has been tested on Python 3.9.
+
+#### Run Model
+
+You may apply INT4 optimizations to any Hugging Face *Transformers* model as follows.
+
+```python
+# load a Hugging Face Transformers model with INT4 optimizations
+from bigdl.llm.transformers import AutoModelForCausalLM
+model_path = '/path/to/model/'
+model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
+
+# run the optimized model
+from transformers import AutoTokenizer
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+input_ids = tokenizer.encode(input_str, ...)
+output_ids = model.generate(input_ids, ...)
+output = tokenizer.batch_decode(output_ids)
+```
+*See the complete examples [here](python/llm/example/transformers/transformers_int4/).*
+
+> **Note**: You may apply other low-bit optimizations (including INT8, INT5 and INT4 variants) as follows:
+> ```python
+> model = AutoModelForCausalLM.from_pretrained(model_path, load_in_low_bit="sym_int5")
+> ```
+> *See the complete example [here](python/llm/example/transformers/transformers_low_bit/).*
+
+After the model is optimized using INT4 (or INT8/INT5), you may also save and load the optimized model as follows:
+
+```python
+# save the low-bit model to, and load it back from, a folder of your choice
+model.save_low_bit(save_path)
+
+new_model = AutoModelForCausalLM.load_low_bit(save_path)
+```
+*See the complete example [here](python/llm/example/transformers/transformers_low_bit/).*
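+
+You can put these pieces together as a minimal end-to-end sketch (the model path, prompt, and generation parameters below are illustrative; any supported Hugging Face *Transformers* checkpoint should work):
+
+```python
+from bigdl.llm.transformers import AutoModelForCausalLM
+from transformers import AutoTokenizer
+
+model_path = '/path/to/model/'  # e.g. a local llama-2-7b checkpoint folder
+
+# load the model with INT4 optimizations, plus its tokenizer
+model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+# encode a prompt, generate a completion, and decode it back to text
+prompt = "What is AI?"
+input_ids = tokenizer.encode(prompt, return_tensors="pt")
+output_ids = model.generate(input_ids, max_new_tokens=32)
+print(tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0])
+```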
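+
+A similar flow also runs on Intel GPUs (e.g. Arc or Flex). The sketch below is illustrative and assumes a GPU-enabled `bigdl-llm` installation with a working Intel Extension for PyTorch (`xpu`) runtime; see the GPU examples linked above for the exact setup:
+
+```python
+import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch
+from bigdl.llm.transformers import AutoModelForCausalLM
+from transformers import AutoTokenizer
+
+model_path = '/path/to/model/'
+# load with INT4 optimizations, then move the model to the Intel GPU
+model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True).to('xpu')
+tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+# inputs must live on the same 'xpu' device as the model
+input_ids = tokenizer.encode("What is AI?", return_tensors="pt").to('xpu')
+output_ids = model.generate(input_ids, max_new_tokens=32)
+print(tokenizer.batch_decode(output_ids.cpu(), skip_special_tokens=True)[0])
+```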
+
+***For more details, please refer to the `bigdl-llm` [Readme](python/llm), [Tutorial](https://github.com/intel-analytics/bigdl-llm-tutorial) and [API Doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/LLM/index.html).***
 
 ---
-## Overview
+## Overview of the complete BigDL project
 
 BigDL seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries:
diff --git a/docs/readthedocs/source/index.rst b/docs/readthedocs/source/index.rst
index 30a6050e..4be55297 100644
--- a/docs/readthedocs/source/index.rst
+++ b/docs/readthedocs/source/index.rst
@@ -1,17 +1,56 @@
 .. meta::
    :google-site-verification: S66K6GAclKw1RroxU0Rka_2d1LZFVe27M0gRneEsIVI
 
-BigDL: fast, distributed, secure AI for Big Data
+BigDL: fast and secure AI
 =================================================
 
-Latest News
+BigDL-LLM
 ---------------------------------
-- **Try the latest** `bigdl-llm <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_ **library for running LLM (large language model) on your Intel laptop using INT4 with very low latency!** [*]_. *(It is built on top of the excellent work of* `llama.cpp <https://github.com/ggerganov/llama.cpp>`_, `gptq <https://github.com/IST-DASLab/gptq>`_, `bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`_, *etc., and supports any Hugging Face Transformers model.)*
+`bigdl-llm <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_ is a library for running **LLMs** (large language models) on your Intel **laptop** or **GPU** using INT4 with very low latency [*]_ (for any **PyTorch** model).
+
+.. note::
+
+   It is built on top of the excellent work of `llama.cpp <https://github.com/ggerganov/llama.cpp>`_, `gptq <https://github.com/IST-DASLab/gptq>`_, `bitsandbytes <https://github.com/TimDettmers/bitsandbytes>`_, `qlora <https://github.com/artidoro/qlora>`_, etc.
+
+Latest update
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+- ``bigdl-llm`` now supports Intel Arc and Flex GPU; see the latest GPU examples `here <https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/gpu>`_.
+- The ``bigdl-llm`` tutorial is now available `here <https://github.com/intel-analytics/bigdl-llm-tutorial>`_.
+- Over 20 models have been optimized/verified on ``bigdl-llm``, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan, MOSS,* and more; see the complete list `here <https://github.com/intel-analytics/BigDL/tree/main/python/llm#verified-models>`_.
+
+bigdl-llm quickstart
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You may install ``bigdl-llm`` as follows:
+
+.. code-block:: console
+
+   pip install --pre --upgrade bigdl-llm[all]
+
+.. note::
+
+   ``bigdl-llm`` has been tested on Python 3.9.
+
+You can then apply INT4 optimizations to any Hugging Face *Transformers* model as follows.
+
+.. code-block:: python
+
+   # load a Hugging Face Transformers model with INT4 optimizations
+   from bigdl.llm.transformers import AutoModelForCausalLM
+   model_path = '/path/to/model/'
+   model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
+
+   # run the optimized model
+   from transformers import AutoTokenizer
+   tokenizer = AutoTokenizer.from_pretrained(model_path)
+   input_ids = tokenizer.encode(input_str, ...)
+   output_ids = model.generate(input_ids, ...)
+   output = tokenizer.batch_decode(output_ids)
+
+**For more details, please refer to the bigdl-llm** `Readme <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_, `Tutorial <https://github.com/intel-analytics/bigdl-llm-tutorial>`_ and `API Doc <https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/LLM/index.html>`_.
 
-- **[Update] Over a dozen models have been verified on** `bigdl-llm <https://github.com/intel-analytics/BigDL/tree/main/python/llm>`_, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly-v1/Dolly-v2, StarCoder, Whisper, QWen, Baichuan,* and more; see the complete list `here <https://github.com/intel-analytics/BigDL/tree/main/python/llm#verified-models>`_.
 
 ------
 
-Overview
----------------------------------
+Overview of the complete BigDL project
+----------------------------------------
 
 `BigDL <https://github.com/intel-analytics/BigDL>`_ seamlessly scales your data analytics & AI applications from laptop to cloud, with the following libraries: