diff --git a/README.md b/README.md index 841d9bcd..3b637337 100644 --- a/README.md +++ b/README.md @@ -12,8 +12,8 @@ > *It is built on top of the excellent work of [llama.cpp](https://github.com/ggerganov/llama.cpp), [ggml](https://github.com/ggerganov/ggml), [gptq](https://github.com/IST-DASLab/gptq), [bitsandbytes](https://github.com/TimDettmers/bitsandbytes), [qlora](https://github.com/artidoro/qlora), [llama-cpp-python](https://github.com/abetlen/llama-cpp-python), [gptq_for_llama](https://github.com/qwopqwop200/GPTQ-for-LLaMa), [chatglm.cpp](https://github.com/li-plus/chatglm.cpp), [redpajama.cpp](https://github.com/togethercomputer/redpajama.cpp), [gptneox.cpp](https://github.com/byroneverson/gptneox.cpp), [bloomz.cpp](https://github.com/NouamaneTazi/bloomz.cpp/), etc.* ### Latest update -- **[New]** `bigdl-llm` now supports QLoRA fintuning on Intel GPU; see the the example [here](python/llm/example/gpu/qlora_finetuning). -- `bigdl-llm` now supports Intel GPU (including Arc, Flex and MAX); see the the latest GPU examples [here](python/llm/example/gpu). +- **[New]** `bigdl-llm` now supports QLoRA fine-tuning on Intel GPU; see the example [here](python/llm/example/GPU/QLoRA-FineTuning). +- `bigdl-llm` now supports Intel GPU (including Arc, Flex and MAX); see the latest GPU examples [here](python/llm/example/GPU). - `bigdl-llm` tutorial is released [here](https://github.com/intel-analytics/bigdl-llm-tutorial). - Over 20 models have been optimized/verified on `bigdl-llm`, including *LLaMA/LLaMA2, ChatGLM/ChatGLM2, MPT, Falcon, Dolly, StarCoder, Whisper, InternLM, QWen, Baichuan, Aquila, MOSS,* and more; see the complete list [here](python/llm/README.md#verified-models). @@ -76,7 +76,7 @@ input_ids = tokenizer.encode(input_str, ...) output_ids = model.generate(input_ids, ...) output = tokenizer.batch_decode(output_ids) ``` -*See the complete examples [here](python/llm/example/transformers/transformers_int4/).* +*See the complete examples [here](python/llm/example/CPU/HF-Transformers-AutoModels/Model).* #### GPU INT4 ##### Install @@ -105,7 +105,7 @@ input_ids = tokenizer.encode(input_str, ...).to('xpu') output_ids = model.generate(input_ids, ...)
output = tokenizer.batch_decode(output_ids.cpu()) ``` -*See the complete examples [here](python/llm/example/gpu/).* +*See the complete examples [here](python/llm/example/GPU).* #### More Low-Bit Support ##### Save and load @@ -115,7 +115,7 @@ After the model is optimized using `bigdl-llm`, you may save and load the model model.save_low_bit(model_path) new_model = AutoModelForCausalLM.load_low_bit(model_path) ``` -*See the complete example [here](python/llm/example/transformers/transformers_low_bit/).* +*See the complete example [here](python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load).* ##### Additional data types @@ -123,7 +123,7 @@ In addition to INT4, you may apply other low bit optimizations (such as *INT8*, ```python model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int8") ``` -*See the complete example [here](python/llm/example/transformers/transformers_low_bit/).* +*See the complete example [here](python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types).* ***For more details, please refer to the `bigdl-llm` [Document](https://test-bigdl-llm.readthedocs.io/en/main/doc/LLM/index.html), [Readme](python/llm), [Tutorial](https://github.com/intel-analytics/bigdl-llm-tutorial) and [API Doc](https://bigdl.readthedocs.io/en/latest/doc/PythonAPI/LLM/index.html).*** diff --git a/python/llm/README.md b/python/llm/README.md index bb19f43b..63cd54fa 100644 --- a/python/llm/README.md +++ b/python/llm/README.md @@ -40,23 +40,24 @@ Over 20 models have been optimized/verified on `bigdl-llm`, including *LLaMA/LLa | Model | Example | |-----------|----------------------------------------------------------| -| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/vicuna) | -| LLaMA 2 | [link](example/transformers/transformers_int4/llama2) | -| MPT | [link](example/transformers/transformers_int4/mpt) | -| Falcon | [link](example/transformers/transformers_int4/falcon) | -| ChatGLM | [link](example/transformers/transformers_int4/chatglm) | -| ChatGLM2 | [link](example/transformers/transformers_int4/chatglm2) | -| Qwen | [link](example/transformers/transformers_int4/qwen) | -| MOSS | [link](example/transformers/transformers_int4/moss) | -| Baichuan | [link](example/transformers/transformers_int4/baichuan) | -| Baichuan2 | [link](example/transformers/transformers_int4/baichuan2) | -| Dolly-v1 | [link](example/transformers/transformers_int4/dolly_v1) | -| Dolly-v2 | [link](example/transformers/transformers_int4/dolly_v2) | -| RedPajama | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/redpajama) | -| Phoenix | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/phoenix) | -| StarCoder | [link1](example/transformers/native_int4), [link2](example/transformers/transformers_int4/starcoder) | -| InternLM | [link](example/transformers/transformers_int4/internlm) | -| Whisper | [link](example/transformers/transformers_int4/whisper) | +| LLaMA *(such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)* | [link1](example/CPU/Native-Models), [link2](example/CPU/HF-Transformers-AutoModels/Model/vicuna) | +| LLaMA 2 | [link](example/CPU/HF-Transformers-AutoModels/Model/llama2) | +| MPT | [link](example/CPU/HF-Transformers-AutoModels/Model/mpt) | +| Falcon | [link](example/CPU/HF-Transformers-AutoModels/Model/falcon) | +| ChatGLM | [link](example/CPU/HF-Transformers-AutoModels/Model/chatglm) |
+| ChatGLM2 | [link](example/CPU/HF-Transformers-AutoModels/Model/chatglm2) | +| Qwen | [link](example/CPU/HF-Transformers-AutoModels/Model/qwen) | +| MOSS | [link](example/CPU/HF-Transformers-AutoModels/Model/moss) | +| Baichuan | [link](example/CPU/HF-Transformers-AutoModels/Model/baichuan) | +| Baichuan2 | [link](example/CPU/HF-Transformers-AutoModels/Model/baichuan2) | +| Dolly-v1 | [link](example/CPU/HF-Transformers-AutoModels/Model/dolly_v1) | +| Dolly-v2 | [link](example/CPU/HF-Transformers-AutoModels/Model/dolly_v2) | +| RedPajama | [link1](example/CPU/Native-Models), [link2](example/CPU/HF-Transformers-AutoModels/Model/redpajama) | +| Phoenix | [link1](example/CPU/Native-Models), [link2](example/CPU/HF-Transformers-AutoModels/Model/phoenix) | +| StarCoder | [link1](example/CPU/Native-Models), [link2](example/CPU/HF-Transformers-AutoModels/Model/starcoder) | +| InternLM | [link](example/CPU/HF-Transformers-AutoModels/Model/internlm) | +| Whisper | [link](example/CPU/HF-Transformers-AutoModels/Model/whisper) | +| Aquila | [link](example/CPU/HF-Transformers-AutoModels/Model/aquila) | @@ -119,7 +120,7 @@ output_ids = model.generate(input_ids, ...) output = tokenizer.batch_decode(output_ids) ``` -See the complete examples [here](example/transformers/transformers_int4/). +See the complete examples [here](example/CPU/HF-Transformers-AutoModels/Model/). ###### GPU INT4 You may apply INT4 optimizations to any Hugging Face *Transformers* model on Intel GPU as follows. @@ -138,7 +139,7 @@ input_ids = tokenizer.encode(input_str, ...).to('xpu') output_ids = model.generate(input_ids, ...) output = tokenizer.batch_decode(output_ids.cpu()) ``` -See the complete examples [here](example/gpu/). +See the complete examples [here](example/GPU). ###### More Low-Bit Support - Save and load @@ -148,7 +149,7 @@ See the complete examples [here](example/gpu/). model.save_low_bit(model_path) new_model = AutoModelForCausalLM.load_low_bit(model_path) ``` - *See the complete example [here](example/transformers/transformers_low_bit/).* + *See the complete example [here](example/CPU/HF-Transformers-AutoModels/Save-Load).* - Additional data types @@ -157,7 +158,7 @@ See the complete examples [here](example/gpu/). ```python model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int8") ``` - *See the complete example [here](example/transformers/transformers_low_bit/).* + *See the complete example [here](example/CPU/HF-Transformers-AutoModels/More-Data-Types).* ##### 2. Native INT4 model @@ -182,7 +183,7 @@ output_ids = llm.generate(input_ids, ...) output = llm.batch_decode(output_ids) ``` -See the complete example [here](example/transformers/native_int4/native_int4_pipeline.py). +See the complete example [here](example/CPU/Native-Models/native_int4_pipeline.py). ##### 3. LangChain API You may run the models using the LangChain API in `bigdl-llm`. @@ -202,7 +203,7 @@ You may run the models using the LangChain API in `bigdl-llm`. doc_chain = load_qa_chain(bigdl_llm, ...) output = doc_chain.run(...) ``` - See the examples [here](example/langchain/transformers_int4). + See the examples [here](example/CPU/LangChain/transformers_int4). - **Using native INT4 model** @@ -224,7 +225,7 @@ You may run the models using the LangChain API in `bigdl-llm`. doc_chain.run(...) ``` - See the examples [here](example/langchain/native_int4). + See the examples [here](example/CPU/LangChain/native_int4). ##### 4.
CLI Tool >**Note**: Currently `bigdl-llm` CLI supports *LLaMA* (e.g., *vicuna*), *GPT-NeoX* (e.g., *redpajama*), *BLOOM* (e.g., *phoenix*) and *GPT2* (e.g., *starcoder*) model architectures; for other models, you may use the Hugging Face `transformers` or LangChain APIs. diff --git a/python/llm/example/transformers/transformers_int4/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/README.md similarity index 97% rename from python/llm/example/transformers/transformers_int4/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/README.md index 79e23fd1..497e7e6c 100644 --- a/python/llm/example/transformers/transformers_int4/README.md +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/README.md @@ -21,6 +21,7 @@ You can use BigDL-LLM to run any Huggingface Transformer models with INT4 optimi | InternLM | [link](internlm) | | Whisper | [link](whisper) | | Qwen | [link](qwen) | +| Aquila | [link](aquila) | ## Recommended Requirements To run the examples, we recommend using Intel® Xeon® processors (server), or >= 12th Gen Intel® Core™ processor (client). diff --git a/python/llm/example/transformers/transformers_int4/aquila/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/aquila/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila/README.md diff --git a/python/llm/example/transformers/transformers_int4/aquila/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/aquila/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/aquila/generate.py diff --git a/python/llm/example/transformers/transformers_int4/baichuan/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/baichuan/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan/README.md diff --git a/python/llm/example/transformers/transformers_int4/baichuan/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/baichuan/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan/generate.py diff --git a/python/llm/example/transformers/transformers_int4/baichuan2/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/baichuan2/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2/README.md diff --git a/python/llm/example/transformers/transformers_int4/baichuan2/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/baichuan2/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py diff --git a/python/llm/example/transformers/transformers_int4/chatglm/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/chatglm/README.md rename to
python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm/README.md diff --git a/python/llm/example/transformers/transformers_int4/chatglm/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/chatglm/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm/generate.py diff --git a/python/llm/example/transformers/transformers_int4/chatglm2/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/chatglm2/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2/README.md diff --git a/python/llm/example/transformers/transformers_int4/chatglm2/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/chatglm2/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2/generate.py diff --git a/python/llm/example/transformers/transformers_int4/chatglm2/streamchat.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2/streamchat.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/chatglm2/streamchat.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/chatglm2/streamchat.py diff --git a/python/llm/example/transformers/transformers_int4/dolly_v1/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/dolly_v1/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1/README.md diff --git a/python/llm/example/transformers/transformers_int4/dolly_v1/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/dolly_v1/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v1/generate.py diff --git a/python/llm/example/transformers/transformers_int4/dolly_v2/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/dolly_v2/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2/README.md diff --git a/python/llm/example/transformers/transformers_int4/dolly_v2/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/dolly_v2/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/dolly_v2/generate.py diff --git a/python/llm/example/transformers/transformers_int4/falcon/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/falcon/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon/README.md diff --git a/python/llm/example/transformers/transformers_int4/falcon/falcon-40b-instruct/modelling_RW.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon/falcon-40b-instruct/modelling_RW.py similarity index 100% rename from 
python/llm/example/transformers/transformers_int4/falcon/falcon-40b-instruct/modelling_RW.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon/falcon-40b-instruct/modelling_RW.py diff --git a/python/llm/example/gpu/hf-transformers-models/falcon/falcon-7b-instruct/modelling_RW.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon/falcon-7b-instruct/modelling_RW.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/falcon/falcon-7b-instruct/modelling_RW.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon/falcon-7b-instruct/modelling_RW.py diff --git a/python/llm/example/transformers/transformers_int4/falcon/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/falcon/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/falcon/generate.py diff --git a/python/llm/example/transformers/transformers_int4/internlm/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/internlm/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm/README.md diff --git a/python/llm/example/transformers/transformers_int4/internlm/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/internlm/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/internlm/generate.py diff --git a/python/llm/example/transformers/transformers_int4/llama2/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/llama2/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2/README.md diff --git a/python/llm/example/transformers/transformers_int4/llama2/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/llama2/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/llama2/generate.py diff --git a/python/llm/example/transformers/transformers_int4/moss/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/moss/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss/README.md diff --git a/python/llm/example/transformers/transformers_int4/moss/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/moss/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/moss/generate.py diff --git a/python/llm/example/transformers/transformers_int4/mpt/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/mpt/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt/README.md diff --git a/python/llm/example/transformers/transformers_int4/mpt/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt/generate.py similarity index 100% rename from 
python/llm/example/transformers/transformers_int4/mpt/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/mpt/generate.py diff --git a/python/llm/example/transformers/transformers_int4/phoenix/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/phoenix/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix/README.md diff --git a/python/llm/example/transformers/transformers_int4/phoenix/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/phoenix/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/phoenix/generate.py diff --git a/python/llm/example/transformers/transformers_int4/qwen/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/qwen/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen/README.md diff --git a/python/llm/example/transformers/transformers_int4/qwen/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/qwen/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/qwen/generate.py diff --git a/python/llm/example/transformers/transformers_int4/redpajama/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/redpajama/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama/README.md diff --git a/python/llm/example/transformers/transformers_int4/redpajama/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/redpajama/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/redpajama/generate.py diff --git a/python/llm/example/transformers/transformers_int4/starcoder/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/starcoder/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder/README.md diff --git a/python/llm/example/transformers/transformers_int4/starcoder/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/starcoder/generate.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/starcoder/generate.py diff --git a/python/llm/example/transformers/transformers_int4/vicuna/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna/README.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/vicuna/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna/README.md diff --git a/python/llm/example/transformers/transformers_int4/vicuna/generate.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna/generate.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/vicuna/generate.py rename to 
python/llm/example/CPU/HF-Transformers-AutoModels/Model/vicuna/generate.py diff --git a/python/llm/example/transformers/transformers_int4/whisper/long-segment-recognize.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper/long-segment-recognize.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/whisper/long-segment-recognize.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper/long-segment-recognize.py diff --git a/python/llm/example/transformers/transformers_int4/whisper/readme.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper/readme.md similarity index 100% rename from python/llm/example/transformers/transformers_int4/whisper/readme.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper/readme.md diff --git a/python/llm/example/transformers/transformers_int4/whisper/recognize.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper/recognize.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/whisper/recognize.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/Model/whisper/recognize.py diff --git a/python/llm/example/transformers/transformers_low_bit/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types/README.md similarity index 100% rename from python/llm/example/transformers/transformers_low_bit/README.md rename to python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types/README.md diff --git a/python/llm/example/transformers/transformers_low_bit/transformers_low_bit_pipeline.py b/python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types/transformers_low_bit_pipeline.py similarity index 100% rename from python/llm/example/transformers/transformers_low_bit/transformers_low_bit_pipeline.py rename to python/llm/example/CPU/HF-Transformers-AutoModels/More-Data-Types/transformers_low_bit_pipeline.py diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/README.md new file mode 100644 index 00000000..e0cebde5 --- /dev/null +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/README.md @@ -0,0 +1,7 @@ +# Running Hugging Face Transformers model using BigDL-LLM on Intel CPU + +This folder contains examples of running any Hugging Face Transformers model on BigDL-LLM (using the standard AutoModel APIs): + +- [Model](Model): examples of running Hugging Face Transformers models (e.g., LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using INT4 optimizations +- [More-Data-Types](More-Data-Types): examples of applying other low bit optimizations (NF4/INT5/INT8, etc.) +- [Save-Load](Save-Load): examples of saving and loading low-bit models diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load/README.md b/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load/README.md new file mode 100644 index 00000000..6a992c85 --- /dev/null +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load/README.md @@ -0,0 +1,43 @@ +# BigDL-LLM Transformers Low-Bit Inference Pipeline for Large Language Model + +In this example, we show a pipeline to apply BigDL-LLM low-bit optimizations (including INT8/INT5/INT4) to any Hugging Face Transformers model, and then run inference on the optimized low-bit model. 
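+
+A minimal sketch of the flow (the model path is illustrative; `transformers_low_bit_pipeline.py` in this folder is the complete, runnable example):
+```python
+from bigdl.llm.transformers import AutoModelForCausalLM
+
+# quantize the model while loading it, e.g. to symmetric INT5
+model = AutoModelForCausalLM.from_pretrained('/path/to/model/', load_in_low_bit="sym_int5")
+# save the quantized weights so the conversion only happens once
+model.save_low_bit('/path/to/low-bit-model/')
+# later, reload the low-bit model directly; the original checkpoint is no longer needed
+new_model = AutoModelForCausalLM.load_low_bit('/path/to/low-bit-model/')
+```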
+ +## Prepare Environment +We suggest using conda to manage the environment: +```bash +conda create -n llm python=3.9 +conda activate llm + +pip install --pre --upgrade bigdl-llm[all] +``` + +## Run Example +```bash +python ./transformers_low_bit_pipeline.py --repo-id-or-model-path decapoda-research/llama-7b-hf --low-bit sym_int5 --save-path ./llama-7b-sym_int5 +``` +arguments info: +- `--repo-id-or-model-path`: str value, argument defining the huggingface repo id for the large language model to be downloaded, or the path to the huggingface checkpoint folder; the value is 'decapoda-research/llama-7b-hf' by default. +- `--low-bit`: str value, options are sym_int4, asym_int4, sym_int5, asym_int5 or sym_int8 (sym_int4 means symmetric int 4, asym_int4 means asymmetric int 4, etc.). Relevant low bit optimizations will be applied to the model. +- `--save-path`: str value, the path to save the low-bit model; you can then load the low-bit model directly from this path. +- `--load-path`: optional str value, the path to load the low-bit model. + + +## Sample Output for Inference +### 'decapoda-research/llama-7b-hf' Model +```log +Prompt: Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun +Output: Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to be a princess, and she wanted to be a pirate. She wanted to be a superhero, and she wanted to be +Model and tokenizer are saved to ./llama-7b-sym_int5 +``` + +### Load low-bit model +Command to run: +```bash +python ./transformers_low_bit_pipeline.py --load-path ./llama-7b-sym_int5 +``` +Output log: +```log +Prompt: Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun +Output: Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun. She wanted to be a princess, and she wanted to be a pirate. She wanted to be a superhero, and she wanted to be +``` + diff --git a/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load/transformers_low_bit_pipeline.py b/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load/transformers_low_bit_pipeline.py new file mode 100644 index 00000000..9cf9cffb --- /dev/null +++ b/python/llm/example/CPU/HF-Transformers-AutoModels/Save-Load/transformers_low_bit_pipeline.py @@ -0,0 +1,56 @@ +# +# Copyright 2016 The BigDL Authors. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License.
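+#
+# This example demonstrates the bigdl-llm low-bit save/load flow:
+#   1. load a Hugging Face checkpoint with `load_in_low_bit` to quantize it on the fly,
+#   2. optionally save the quantized model and tokenizer with `save_low_bit`,
+#   3. or, if `--load-path` is given, reload a previously saved low-bit model with `load_low_bit`.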
+# + +import argparse +from bigdl.llm.transformers import AutoModelForCausalLM +from transformers import LlamaTokenizer, TextGenerationPipeline + +if __name__ == '__main__': + parser = argparse.ArgumentParser(description='Transformer save_load example') + parser.add_argument('--repo-id-or-model-path', type=str, default="decapoda-research/llama-7b-hf", + help='The huggingface repo id for the large language model to be downloaded' + ', or the path to the huggingface checkpoint folder') + parser.add_argument('--low-bit', type=str, default="sym_int4", + choices=['sym_int4', 'asym_int4', 'sym_int5', 'asym_int5', 'sym_int8'], + help='The quantization type the model will convert to.') + parser.add_argument('--save-path', type=str, default=None, + help='The path to save the low-bit model.') + parser.add_argument('--load-path', type=str, default=None, + help='The path to load the low-bit model.') + args = parser.parse_args() + model_path = args.repo_id_or_model_path + low_bit = args.low_bit + load_path = args.load_path + if load_path: + model = AutoModelForCausalLM.load_low_bit(load_path) + tokenizer = LlamaTokenizer.from_pretrained(load_path) + else: + # load_in_low_bit in bigdl.llm.transformers will convert + # the relevant layers in the model into corresponding int X format + model = AutoModelForCausalLM.from_pretrained(model_path, load_in_low_bit=low_bit, trust_remote_code=True) + tokenizer = LlamaTokenizer.from_pretrained(model_path, trust_remote_code=True) + + pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer, max_new_tokens=32) + input_str = "Once upon a time, there existed a little girl who liked to have adventures. She wanted to go to places and meet new people, and have fun" + output = pipeline(input_str)[0]["generated_text"] + print(f"Prompt: {input_str}") + print(f"Output: {output}") + + save_path = args.save_path + if save_path: + model.save_low_bit(save_path) + tokenizer.save_pretrained(save_path) + print(f"Model and tokenizer are saved to {save_path}") diff --git a/python/llm/example/langchain/README.md b/python/llm/example/CPU/LangChain/README.md similarity index 100% rename from python/llm/example/langchain/README.md rename to python/llm/example/CPU/LangChain/README.md diff --git a/python/llm/example/langchain/native_int4/docqa.py b/python/llm/example/CPU/LangChain/native_int4/docqa.py similarity index 100% rename from python/llm/example/langchain/native_int4/docqa.py rename to python/llm/example/CPU/LangChain/native_int4/docqa.py diff --git a/python/llm/example/langchain/native_int4/streamchat.py b/python/llm/example/CPU/LangChain/native_int4/streamchat.py similarity index 100% rename from python/llm/example/langchain/native_int4/streamchat.py rename to python/llm/example/CPU/LangChain/native_int4/streamchat.py diff --git a/python/llm/example/langchain/native_int4/voiceassistant.py b/python/llm/example/CPU/LangChain/native_int4/voiceassistant.py similarity index 100% rename from python/llm/example/langchain/native_int4/voiceassistant.py rename to python/llm/example/CPU/LangChain/native_int4/voiceassistant.py diff --git a/python/llm/example/langchain/transformers_int4/chat.py b/python/llm/example/CPU/LangChain/transformers_int4/chat.py similarity index 100% rename from python/llm/example/langchain/transformers_int4/chat.py rename to python/llm/example/CPU/LangChain/transformers_int4/chat.py diff --git a/python/llm/example/langchain/transformers_int4/docqa.py b/python/llm/example/CPU/LangChain/transformers_int4/docqa.py similarity index 100% rename from 
python/llm/example/langchain/transformers_int4/docqa.py rename to python/llm/example/CPU/LangChain/transformers_int4/docqa.py diff --git a/python/llm/example/langchain/transformers_int4/llm_math.py b/python/llm/example/CPU/LangChain/transformers_int4/llm_math.py similarity index 100% rename from python/llm/example/langchain/transformers_int4/llm_math.py rename to python/llm/example/CPU/LangChain/transformers_int4/llm_math.py diff --git a/python/llm/example/langchain/transformers_int4/voiceassistant.py b/python/llm/example/CPU/LangChain/transformers_int4/voiceassistant.py similarity index 100% rename from python/llm/example/langchain/transformers_int4/voiceassistant.py rename to python/llm/example/CPU/LangChain/transformers_int4/voiceassistant.py diff --git a/python/llm/example/transformers/native_int4/README.md b/python/llm/example/CPU/Native-Models/README.md similarity index 100% rename from python/llm/example/transformers/native_int4/README.md rename to python/llm/example/CPU/Native-Models/README.md diff --git a/python/llm/example/transformers/native_int4/native_int4_pipeline.py b/python/llm/example/CPU/Native-Models/native_int4_pipeline.py similarity index 100% rename from python/llm/example/transformers/native_int4/native_int4_pipeline.py rename to python/llm/example/CPU/Native-Models/native_int4_pipeline.py diff --git a/python/llm/example/pytorch-models/README.md b/python/llm/example/CPU/PyTorch-Models/Model/README.md similarity index 100% rename from python/llm/example/pytorch-models/README.md rename to python/llm/example/CPU/PyTorch-Models/Model/README.md diff --git a/python/llm/example/pytorch-models/bark/README.md b/python/llm/example/CPU/PyTorch-Models/Model/bark/README.md similarity index 100% rename from python/llm/example/pytorch-models/bark/README.md rename to python/llm/example/CPU/PyTorch-Models/Model/bark/README.md diff --git a/python/llm/example/pytorch-models/bark/synthesize_speech.py b/python/llm/example/CPU/PyTorch-Models/Model/bark/synthesize_speech.py similarity index 100% rename from python/llm/example/pytorch-models/bark/synthesize_speech.py rename to python/llm/example/CPU/PyTorch-Models/Model/bark/synthesize_speech.py diff --git a/python/llm/example/pytorch-models/bert/README.md b/python/llm/example/CPU/PyTorch-Models/Model/bert/README.md similarity index 100% rename from python/llm/example/pytorch-models/bert/README.md rename to python/llm/example/CPU/PyTorch-Models/Model/bert/README.md diff --git a/python/llm/example/pytorch-models/bert/extract_feature.py b/python/llm/example/CPU/PyTorch-Models/Model/bert/extract_feature.py similarity index 100% rename from python/llm/example/pytorch-models/bert/extract_feature.py rename to python/llm/example/CPU/PyTorch-Models/Model/bert/extract_feature.py diff --git a/python/llm/example/pytorch-models/chatglm/README.md b/python/llm/example/CPU/PyTorch-Models/Model/chatglm/README.md similarity index 100% rename from python/llm/example/pytorch-models/chatglm/README.md rename to python/llm/example/CPU/PyTorch-Models/Model/chatglm/README.md diff --git a/python/llm/example/pytorch-models/chatglm/generate.py b/python/llm/example/CPU/PyTorch-Models/Model/chatglm/generate.py similarity index 100% rename from python/llm/example/pytorch-models/chatglm/generate.py rename to python/llm/example/CPU/PyTorch-Models/Model/chatglm/generate.py diff --git a/python/llm/example/pytorch-models/llama2/README.md b/python/llm/example/CPU/PyTorch-Models/Model/llama2/README.md similarity index 100% rename from 
python/llm/example/pytorch-models/llama2/README.md rename to python/llm/example/CPU/PyTorch-Models/Model/llama2/README.md diff --git a/python/llm/example/pytorch-models/llama2/generate.py b/python/llm/example/CPU/PyTorch-Models/Model/llama2/generate.py similarity index 100% rename from python/llm/example/pytorch-models/llama2/generate.py rename to python/llm/example/CPU/PyTorch-Models/Model/llama2/generate.py diff --git a/python/llm/example/pytorch-models/openai-whisper/readme.md b/python/llm/example/CPU/PyTorch-Models/Model/openai-whisper/readme.md similarity index 100% rename from python/llm/example/pytorch-models/openai-whisper/readme.md rename to python/llm/example/CPU/PyTorch-Models/Model/openai-whisper/readme.md diff --git a/python/llm/example/pytorch-models/openai-whisper/recognize.py b/python/llm/example/CPU/PyTorch-Models/Model/openai-whisper/recognize.py similarity index 100% rename from python/llm/example/pytorch-models/openai-whisper/recognize.py rename to python/llm/example/CPU/PyTorch-Models/Model/openai-whisper/recognize.py diff --git a/python/llm/example/CPU/PyTorch-Models/More-Data-Types/.keep b/python/llm/example/CPU/PyTorch-Models/More-Data-Types/.keep new file mode 100644 index 00000000..e69de29b diff --git a/python/llm/example/CPU/PyTorch-Models/README.md b/python/llm/example/CPU/PyTorch-Models/README.md new file mode 100644 index 00000000..06860d45 --- /dev/null +++ b/python/llm/example/CPU/PyTorch-Models/README.md @@ -0,0 +1,7 @@ +# Running PyTorch model using BigDL-LLM on Intel CPU + +This folder contains examples of running any PyTorch model on BigDL-LLM (with "one-line code change"): + +- [Model](Model): examples of running PyTorch models (e.g., OpenAI Whisper, LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using INT4 optimizations +- [More-Data-Types](More-Data-Types): examples of applying other low bit optimizations (NF4/INT5/INT8, etc.)
+- [Save-Load](Save-Load): examples of saving and loading low-bit models diff --git a/python/llm/example/CPU/PyTorch-Models/Save-Load/.keep b/python/llm/example/CPU/PyTorch-Models/Save-Load/.keep new file mode 100644 index 00000000..e69de29b diff --git a/python/llm/example/CPU/README.md b/python/llm/example/CPU/README.md new file mode 100644 index 00000000..1344cbb6 --- /dev/null +++ b/python/llm/example/CPU/README.md @@ -0,0 +1,18 @@ +# BigDL-LLM Examples on Intel CPU + +This folder contains examples of running BigDL-LLM on Intel CPU: + +- [HF-Transformers-AutoModels](HF-Transformers-AutoModels): running any Hugging Face Transformers model on BigDL-LLM (using the standard AutoModel APIs) +- [PyTorch-Models](PyTorch-Models): running any PyTorch model on BigDL-LLM (with "one-line code change") +- [Native-Models](Native-Models): converting & running LLMs in the `llama`/`chatglm`/`bloom`/`gptneox`/`starcoder` model families using the native (cpp) implementation +- [LangChain](LangChain): running LangChain applications on BigDL-LLM + +## System Support +**Hardware**: +- Intel® Core™ processors +- Intel® Xeon® processors + +**Operating System**: +- Ubuntu 20.04 or later +- CentOS 7 or later +- Windows 10/11, with or without WSL diff --git a/python/llm/example/gpu/hf-transformers-models/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/README.md similarity index 98% rename from python/llm/example/gpu/hf-transformers-models/README.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/README.md index 0798745b..a2164718 100644 --- a/python/llm/example/gpu/hf-transformers-models/README.md +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/README.md @@ -21,6 +21,7 @@ You can use BigDL-LLM to run almost every Huggingface Transformer models with IN - Intel Arc™ A-Series Graphics - Intel Data Center GPU Flex Series +- Intel Data Center GPU Max Series ## Recommended Requirements To apply Intel GPU acceleration, there are several steps for tool installation and environment preparation.
diff --git a/python/llm/example/gpu/hf-transformers-models/baichuan/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/baichuan/README.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/README.md diff --git a/python/llm/example/gpu/hf-transformers-models/baichuan/generate.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/generate.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/baichuan/generate.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan/generate.py diff --git a/python/llm/example/gpu/hf-transformers-models/baichuan2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/baichuan2/README.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/README.md diff --git a/python/llm/example/gpu/hf-transformers-models/baichuan2/generate.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/baichuan2/generate.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/baichuan2/generate.py diff --git a/python/llm/example/gpu/hf-transformers-models/chatglm2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/chatglm2/README.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/README.md diff --git a/python/llm/example/gpu/hf-transformers-models/chatglm2/generate.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/generate.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/chatglm2/generate.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/generate.py diff --git a/python/llm/example/gpu/hf-transformers-models/chatglm2/streamchat.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/streamchat.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/chatglm2/streamchat.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/chatglm2/streamchat.py diff --git a/python/llm/example/gpu/hf-transformers-models/chinese-llama2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/chinese-llama2/README.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/README.md diff --git a/python/llm/example/gpu/hf-transformers-models/chinese-llama2/generate.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/generate.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/chinese-llama2/generate.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/chinese-llama2/generate.py diff --git a/python/llm/example/gpu/hf-transformers-models/falcon/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/falcon/README.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/README.md diff --git 
a/python/llm/example/transformers/transformers_int4/falcon/falcon-7b-instruct/modelling_RW.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/falcon-7b-instruct/modelling_RW.py similarity index 100% rename from python/llm/example/transformers/transformers_int4/falcon/falcon-7b-instruct/modelling_RW.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/falcon-7b-instruct/modelling_RW.py diff --git a/python/llm/example/gpu/hf-transformers-models/falcon/generate.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/generate.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/falcon/generate.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/falcon/generate.py diff --git a/python/llm/example/gpu/hf-transformers-models/gpt-j/generate.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/generate.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/gpt-j/generate.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/generate.py diff --git a/python/llm/example/gpu/hf-transformers-models/gpt-j/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/gpt-j/readme.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/gpt-j/readme.md diff --git a/python/llm/example/gpu/hf-transformers-models/internlm/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/internlm/README.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/README.md diff --git a/python/llm/example/gpu/hf-transformers-models/internlm/generate.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/generate.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/internlm/generate.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/internlm/generate.py diff --git a/python/llm/example/gpu/hf-transformers-models/llama2/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/llama2/README.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/README.md diff --git a/python/llm/example/gpu/hf-transformers-models/llama2/generate.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/generate.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/llama2/generate.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2/generate.py diff --git a/python/llm/example/gpu/hf-transformers-models/mpt/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/mpt/README.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/README.md diff --git a/python/llm/example/gpu/hf-transformers-models/mpt/generate.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/generate.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/mpt/generate.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/mpt/generate.py diff --git a/python/llm/example/gpu/hf-transformers-models/qwen/README.md 
b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/qwen/README.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/README.md diff --git a/python/llm/example/gpu/hf-transformers-models/qwen/generate.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/generate.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/qwen/generate.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/qwen/generate.py diff --git a/python/llm/example/gpu/hf-transformers-models/starcoder/generate.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/generate.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/starcoder/generate.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/generate.py diff --git a/python/llm/example/gpu/hf-transformers-models/starcoder/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/starcoder/readme.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/starcoder/readme.md diff --git a/python/llm/example/gpu/hf-transformers-models/voiceassistant/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/voiceassistant/README.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/README.md diff --git a/python/llm/example/gpu/hf-transformers-models/voiceassistant/generate.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/generate.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/voiceassistant/generate.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/voiceassistant/generate.py diff --git a/python/llm/example/gpu/hf-transformers-models/whisper/readme.md b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/whisper/readme.md rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/readme.md diff --git a/python/llm/example/gpu/hf-transformers-models/whisper/recognize.py b/python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/recognize.py similarity index 100% rename from python/llm/example/gpu/hf-transformers-models/whisper/recognize.py rename to python/llm/example/GPU/HF-Transformers-AutoModels/Model/whisper/recognize.py diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types/.keep b/python/llm/example/GPU/HF-Transformers-AutoModels/More-Data-Types/.keep new file mode 100644 index 00000000..e69de29b diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/README.md b/python/llm/example/GPU/HF-Transformers-AutoModels/README.md new file mode 100644 index 00000000..da1a13d6 --- /dev/null +++ b/python/llm/example/GPU/HF-Transformers-AutoModels/README.md @@ -0,0 +1,7 @@ +# Running Hugging Face Transformers model using BigDL-LLM on Intel GPU + +This folder contains examples of running any Hugging Face Transformers model on BigDL-LLM (using the standard AutoModel APIs): + +- [Model](Model): examples of running Hugging Face Transformers models (e.g., LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) 
using INT4 optimizations +- [More-Data-Types](More-Data-Types): examples of applying other low bit optimizations (NF4/INT5/INT8, etc.) +- [Save-Load](Save-Load): examples of saving and loading low-bit models diff --git a/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load/.keep b/python/llm/example/GPU/HF-Transformers-AutoModels/Save-Load/.keep new file mode 100644 index 00000000..e69de29b diff --git a/python/llm/example/GPU/PyTorch-Models/Model/.keep b/python/llm/example/GPU/PyTorch-Models/Model/.keep new file mode 100644 index 00000000..e69de29b diff --git a/python/llm/example/GPU/PyTorch-Models/More-Data-Types/.keep b/python/llm/example/GPU/PyTorch-Models/More-Data-Types/.keep new file mode 100644 index 00000000..e69de29b diff --git a/python/llm/example/GPU/PyTorch-Models/README.md b/python/llm/example/GPU/PyTorch-Models/README.md new file mode 100644 index 00000000..ce5cd50e --- /dev/null +++ b/python/llm/example/GPU/PyTorch-Models/README.md @@ -0,0 +1,7 @@ +# Running PyTorch model using BigDL-LLM on Intel GPU + +This folder contains examples of running any PyTorch model on BigDL-LLM (with "one-line code change"): + +- [Model](Model): examples of running PyTorch models (e.g., OpenAI Whisper, LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using INT4 optimizations +- [More-Data-Types](More-Data-Types): examples of applying other low bit optimizations (NF4/INT5/INT8, etc.) +- [Save-Load](Save-Load): examples of saving and loading low-bit models diff --git a/python/llm/example/GPU/PyTorch-Models/Save-Load/.keep b/python/llm/example/GPU/PyTorch-Models/Save-Load/.keep new file mode 100644 index 00000000..e69de29b diff --git a/python/llm/example/gpu/qlora_finetuning/README.md b/python/llm/example/GPU/QLoRA-FineTuning/README.md similarity index 100% rename from python/llm/example/gpu/qlora_finetuning/README.md rename to python/llm/example/GPU/QLoRA-FineTuning/README.md diff --git a/python/llm/example/gpu/qlora_finetuning/export_merged_model.py b/python/llm/example/GPU/QLoRA-FineTuning/export_merged_model.py similarity index 100% rename from python/llm/example/gpu/qlora_finetuning/export_merged_model.py rename to python/llm/example/GPU/QLoRA-FineTuning/export_merged_model.py diff --git a/python/llm/example/gpu/qlora_finetuning/qlora_finetuning.py b/python/llm/example/GPU/QLoRA-FineTuning/qlora_finetuning.py similarity index 100% rename from python/llm/example/gpu/qlora_finetuning/qlora_finetuning.py rename to python/llm/example/GPU/QLoRA-FineTuning/qlora_finetuning.py diff --git a/python/llm/example/GPU/README.md b/python/llm/example/GPU/README.md new file mode 100644 index 00000000..8cb7c721 --- /dev/null +++ b/python/llm/example/GPU/README.md @@ -0,0 +1,26 @@ +# BigDL-LLM Examples on Intel GPU + +This folder contains examples of running BigDL-LLM on Intel GPU: + +- [HF-Transformers-AutoModels](HF-Transformers-AutoModels): running any Hugging Face Transformers model on BigDL-LLM (using the standard AutoModel APIs) +- [PyTorch-Models](PyTorch-Models): running any PyTorch model on BigDL-LLM (with "one-line code change") +- [QLoRA-FineTuning](QLoRA-FineTuning): running QLoRA fine-tuning on BigDL-LLM + + +## System Support +**Hardware**: +- Intel Arc™ A-Series Graphics +- Intel Data Center GPU Flex Series +- Intel Data Center GPU Max Series + +**Operating System**: +- Ubuntu 20.04 or later (Ubuntu 22.04 is preferred) + +## Requirements +To apply Intel GPU acceleration, there are several steps for tool installation and environment preparation.
+ +Step 1, please refer to our [driver installation](https://dgpu-docs.intel.com/driver/installation.html) for general purpose GPU capabilities. +> **Note**: IPEX 2.0.110+xpu requires Intel GPU Driver version [Stable 647.21](https://dgpu-docs.intel.com/releases/stable_647_21_20230714.html). + +Step 2, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and the DPC++ compiler are required; the other components are optional. +> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit version >= 2023.2.0. diff --git a/python/llm/example/cpp-python/README.md b/python/llm/example/cpp-python/README.md deleted file mode 100644 index 60d51707..00000000 --- a/python/llm/example/cpp-python/README.md +++ /dev/null @@ -1,28 +0,0 @@ -# BigDL-LLM INT4 Inference Using Llama-Cpp-Python Format API - -In this example, we show how to run inference on converted INT4 model using llama-cpp-python format API. - -> **Note**: Currently model family LLaMA, GPT-NeoX, BLOOM and StarCoder are supported. - -## Prepare Environment -We suggest using conda to manage environment: -```bash -conda create -n llm python=3.9 -conda activate llm - -pip install --pre --upgrade bigdl-llm[all] -``` - -## Convert Models using bigdl-llm -Follow the instructions in [Convert model](https://github.com/intel-analytics/BigDL/tree/main/python/llm#convert-model). - - -## Run the example -```bash -python ./int4_inference.py -m CONVERTED_MODEL_PATH -x MODEL_FAMILY -p PROMPT -t THREAD_NUM -``` -arguments info: -- `-m CONVERTED_MODEL_PATH`: **required**, path to the converted model -- `-x MODEL_FAMILY`: **required**, the model family of the model specified in `-m`, available options are `llama`, `gptneox`, `bloom` and `starcoder` -- `-p PROMPT`: question to ask. Default is `What is AI?`. -- `-t THREAD_NUM`: specify the number of threads to use for inference. Default is `2`. diff --git a/python/llm/example/cpp-python/int4_inference.py b/python/llm/example/cpp-python/int4_inference.py deleted file mode 100644 index b7edcb68..00000000 --- a/python/llm/example/cpp-python/int4_inference.py +++ /dev/null @@ -1,60 +0,0 @@ -# -# Copyright 2016 The BigDL Authors. -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# http://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. -# - -# This would makes sure Python is aware there is more than one sub-package within bigdl, -# physically located elsewhere. -# Otherwise there would be module not found error in non-pip's setting as Python would -# only search the first bigdl package and end up finding only one sub-package.
- -import argparse - -def main(args): - model_family = args.model_family - model_path = args.model_path - prompt = args.prompt - n_threads = args.thread_num - - if model_family == "llama": - from bigdl.llm.models import Llama - modelclass = Llama - if model_family == "bloom": - from bigdl.llm.models import Bloom - modelclass = Bloom - if model_family == "gptneox": - from bigdl.llm.models import Gptneox - modelclass = Gptneox - if model_family == "starcoder": - from bigdl.llm.models import Starcoder - modelclass = Starcoder - - model = modelclass(model_path, n_threads=n_threads) - response=model(prompt) - print(response) - -if __name__ == '__main__': - parser = argparse.ArgumentParser(description='Llama-CPP-Python style API Simple Example') - parser.add_argument('-x','--model-family', type=str, required=True, - choices=["llama", "bloom", "gptneox", "starcoder"], - help='the model family') - parser.add_argument('-m','--model-path', type=str, required=True, - help='the path to the converted llm model') - parser.add_argument('-p', '--prompt', type=str, default='What is AI?', - help='qustion you want to ask.') - parser.add_argument('-t','--thread-num', type=int, default=2, - help='number of threads to use for inference') - args = parser.parse_args() - - main(args) diff --git a/python/llm/example/gpu/README.md b/python/llm/example/gpu/README.md deleted file mode 100644 index 1abff7e5..00000000 --- a/python/llm/example/gpu/README.md +++ /dev/null @@ -1,41 +0,0 @@ -# BigDL-LLM INT4 Optimization for Large Language Model on Intel GPUs -You can use BigDL-LLM to run almost every Huggingface Transformer models with INT4 optimizations on your laptops with Intel GPUs. Moreover, you can also use `optimize_model` API to accelerate general PyTorch models on Intel GPUs. - -## Verified models -| Model | Example | -|------------|----------------------------------------------------------| -| Baichuan | [link](hf-transformers-models/baichuan) | -| Baichuan2 | [link](hf-transformers-models/baichuan2) | -| ChatGLM2 | [link](hf-transformers-models/chatglm2) | -| Chinese Llama2 | [link](hf-transformers-models/chinese-llama2)| -| Falcon | [link](hf-transformers-models/falcon) | -| GPT-J | [link](hf-transformers-models/gpt-j) | -| InternLM | [link](hf-transformers-models/internlm) | -| LLaMA 2 | [link](hf-transformers-models/llama2) | -| MPT | [link](hf-transformers-models/mpt) | -| Qwen | [link](hf-transformers-models/qwen) | -| StarCoder | [link](hf-transformers-models/starcoder) | -| Whisper | [link](hf-transformers-models/whisper) | - -## Verified Hardware Platforms - -- Intel Arc™ A-Series Graphics -- Intel Data Center GPU Flex Series - -## Recommended Requirements -To apply Intel GPU acceleration, there’re several steps for tools installation and environment preparation. - -Step 1, only Linux system is supported now, Ubuntu 22.04 is prefered. - -Step 2, please refer to our [driver installation](https://dgpu-docs.intel.com/driver/installation.html) for general purpose GPU capabilities. -> **Note**: IPEX 2.0.110+xpu requires Intel GPU Driver version is [Stable 647.21](https://dgpu-docs.intel.com/releases/stable_647_21_20230714.html). - -Step 3, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and DPC++ compiler are needed, others are optional. -> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0. 
- -## Best Known Configuration on Linux -For better performance, it is recommended to set environment variables on Linux: -```bash -export USE_XETLA=OFF -export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 -``` diff --git a/python/llm/example/gpu/pytorch-models/README.md b/python/llm/example/gpu/pytorch-models/README.md deleted file mode 100644 index 6c958e7a..00000000 --- a/python/llm/example/gpu/pytorch-models/README.md +++ /dev/null @@ -1,25 +0,0 @@ -# BigDL-LLM INT4 Optimization for Large Language Model on Intel GPUs -You can use `optimize_model` API to accelerate general PyTorch models on Intel servers and PCs. This directory contains example scripts to help you quickly get started using BigDL-LLM to run some popular open-source models in the community. Each model has its own dedicated folder, where you can find detailed instructions on how to install and run it. - -## Verified Hardware Platforms - -- Intel Arc™ A-Series Graphics -- Intel Data Center GPU Flex Series - -## Recommended Requirements -To apply Intel GPU acceleration, there’re several steps for tools installation and environment preparation. - -Step 1, only Linux system is supported now, Ubuntu 22.04 is prefered. - -Step 2, please refer to our [driver installation](https://dgpu-docs.intel.com/driver/installation.html) for general purpose GPU capabilities. -> **Note**: IPEX 2.0.110+xpu requires Intel GPU Driver version is [Stable 647.21](https://dgpu-docs.intel.com/releases/stable_647_21_20230714.html). - -Step 3, you also need to download and install [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit-download.html). OneMKL and DPC++ compiler are needed, others are optional. -> **Note**: IPEX 2.0.110+xpu requires Intel® oneAPI Base Toolkit's version >= 2023.2.0. - -## Best Known Configuration on Linux -For better performance, it is recommended to set environment variables on Linux: -```bash -export USE_XETLA=OFF -export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1 -``` diff --git a/python/llm/example/transformers/transformers_int4/GPU/README.md b/python/llm/example/transformers/transformers_int4/GPU/README.md deleted file mode 100644 index f12e7824..00000000 --- a/python/llm/example/transformers/transformers_int4/GPU/README.md +++ /dev/null @@ -1 +0,0 @@ -### The GPU examples for `bigdl-llm` have been moved to [here](../../../gpu).