412 commits

Author SHA1 Message Date
Ziteng Zhang
44b4a0c9c5 [LLM] Correct prompt format of Yi, Llama2 and Qwen in generate.py (#9786)
* correct prompt format of Yi

* correct prompt format of llama2 in cpu generate.py

* correct prompt format of Qwen in GPU example
2023-12-26 16:57:55 +08:00
Heyang Sun
66e286a73d Support for Mixtral AWQ (#9775)
* Support for Mixtral AWQ

* Update README.md

* Update README.md

* Update awq_config.py

* Update README.md

* Update README.md
2023-12-25 16:08:09 +08:00
Ruonan Wang
1917bbe626 LLM: fix BF16Linear related training & inference issue (#9755)
* fix bf16 related issue

* fix

* update based on comment & add arc lora script

* update readme

* update based on comment

* update based on comment

* update

* force to bf16

* fix style

* move check input dtype into function

* update convert

* meet code review

* meet code review

* update merged model to support new training_mode api

* fix typo
2023-12-25 14:49:30 +08:00
Yina Chen
449b387125 Support relora in bigdl-llm (#9687)
* init

* fix style

* update

* support resume & update readme

* update

* update

* remove important

* add training mode

* meet comments
2023-12-25 14:04:28 +08:00
Yishuo Wang
be13b162fe add codeshell example (#9743) 2023-12-25 10:54:01 +08:00
binbin Deng
ed8ed76d4f LLM: update deepspeed autotp usage (#9733) 2023-12-25 09:41:14 +08:00
Qiyuan Gong
4c487313f2 Revert "[LLM] IPEX auto importer turn on by default for XPU (#9730)" (#9759)
This reverts commit 0284801fbd.
2023-12-22 16:38:24 +08:00
Qiyuan Gong
0284801fbd [LLM] IPEX auto importer turn on by default for XPU (#9730)
* Set BIGDL_IMPORT_IPEX default to true, i.e., auto import IPEX for XPU.
* Remove import intel_extension_for_pytorch as ipex from GPU example.
* Add support for bigdl-core-xe-21.
2023-12-22 16:20:32 +08:00
Ruonan Wang
2f36769208 LLM: bigdl-llm lora support & lora example (#9740)
* lora support and single card example

* support multi-card, refactor code

* fix model id and style

* remove torch patch, add two new class for bf16, update example

* fix style

* change to training_mode

* small fix

* add more info in help

* fix style, update readme

* fix ut

* fix ut

* Handling compatibility issues with default LoraConfig
2023-12-22 11:05:39 +08:00
Wang, Jian4
984697afe2 LLM: Add bloom gguf support (#9734)
* init

* update bloom add merges

* update

* update readme

* update for llama error

* update
2023-12-21 14:06:25 +08:00
Heyang Sun
1fa7793fc0 Load Mixtral GGUF Model (#9690)
* Load Mixtral GGUF Model

* refactor

* fix empty tensor when to cpu

* update gpu and cpu readmes

* add dtype when set tensor into module
2023-12-19 13:54:38 +08:00
binbin Deng
12df70953e LLM: add resume_from_checkpoint related section (#9705) 2023-12-18 12:27:02 +08:00
Wang, Jian4
b8437a1c1e LLM: Add gguf mistral model support (#9691)
* add mistral support

* need to upgrade transformers version

* update
2023-12-15 13:37:39 +08:00
Wang, Jian4
496bb2e845 LLM: Support load BaiChuan model family gguf model (#9685)
* support baichuan model family gguf model

* update gguf generate.py

* add verify models

* add support model_family

* update

* update style

* update type

* update readme

* update

* remove support model_family
2023-12-15 13:34:33 +08:00
Lilac09
3afed99216 fix path issue (#9696) 2023-12-15 11:21:49 +08:00
Jason Dai
37f509bb95 Update readme (#9692) 2023-12-14 19:50:21 +08:00
Ziteng Zhang
21c7503a42 [LLM] Correct prompt format of Qwen in generate.py (#9678)
* Change qwen prompt format to chatml
2023-12-14 14:01:30 +08:00
Qiyuan Gong
223c9622f7 [LLM] Mixtral CPU examples (#9673)
* Mixtral CPU PyTorch and hugging face examples, based on #9661 and #9671
2023-12-14 10:35:11 +08:00
ZehuaCao
877229f3be [LLM] Add Yi-34B-AWQ to verified AWQ models. (#9676)
* verify Yi-34B-AWQ

* update
2023-12-14 09:55:47 +08:00
binbin Deng
68a4be762f remove disco mixtral, update oneapi version (#9671) 2023-12-13 23:24:59 +08:00
ZehuaCao
503880809c verify codeLlama (#9668) 2023-12-13 15:39:31 +08:00
Heyang Sun
c64e2248ef fix str returned by get_int_from_str rather than expected int (#9667) 2023-12-13 11:01:21 +08:00
binbin Deng
bf1bcf4a14 add official Mixtral model support (#9663) 2023-12-12 22:27:07 +08:00
binbin Deng
2fe38b4b9b LLM: add mixtral GPU examples (#9661) 2023-12-12 20:26:36 +08:00
ZehuaCao
45721f3473 verify llava (#9649) 2023-12-11 14:26:05 +08:00
Heyang Sun
9f02f96160 [LLM] support for Yi AWQ model (#9648) 2023-12-11 14:07:34 +08:00
Yina Chen
70f5e7bf0d Support peft LoraConfig (#9636)
* support peft loraconfig

* use testcase to test

* fix style

* meet comments
2023-12-08 16:13:03 +08:00
binbin Deng
499100daf1 LLM: Add solution to fix oneccl related error (#9630) 2023-12-08 10:51:55 +08:00
ZehuaCao
6eca8a8bb5 update transformer version (#9631) 2023-12-08 09:36:00 +08:00
Heyang Sun
3811cf43c9 [LLM] update AWQ documents (#9623)
* [LLM] update AWQ and verified models' documents

* refine

* refine links

* refine
2023-12-07 16:02:20 +08:00
Jason Dai
51b668f229 Update GGUF readme (#9611) 2023-12-06 18:21:54 +08:00
dingbaorong
a7bc89b3a1 remove q4_1 in gguf example (#9610)
* remove q4_1

* fixes
2023-12-06 16:00:05 +08:00
Yina Chen
404e101ded QALora example (#9551)
* Support qa-lora

* init

* update

* update

* update

* update

* update

* update merge

* update

* fix style & update scripts

* update

* address comments

* fix typo

* fix typo

---------

Co-authored-by: Yang Wang <yang3.wang@intel.com>
2023-12-06 15:36:21 +08:00
dingbaorong
89069d6173 Add gpu gguf example (#9603)
* add gpu gguf example

* some fixes

* address kai's comments

* address json's comments
2023-12-06 15:17:54 +08:00
Ziteng Zhang
aeb77b2ab1 Add minimum Qwen model version (#9606) 2023-12-06 11:49:14 +08:00
Heyang Sun
4e70e33934 [LLM] code and document for distributed qlora (#9585)
* [LLM] code and document for distributed qlora

* doc

* refine for gradient checkpoint

* refine

* Update alpaca_qlora_finetuning_cpu.py

* Update alpaca_qlora_finetuning_cpu.py

* Update alpaca_qlora_finetuning_cpu.py

* add link in doc
2023-12-06 09:23:17 +08:00
Zheng, Yi
d154b38bf9 Add llama2 gpu low memory example (#9514)
* Add low memory example

* Minor fixes

* Update readme.md
2023-12-05 17:29:48 +08:00
Jason Dai
06febb5fa7 Update readme for FP8/FP4 inference examples (#9601) 2023-12-05 15:59:03 +08:00
dingbaorong
a66fbedd7e add gpu more data types example (#9592)
* add gpu more data types example

* add int8
2023-12-05 15:45:38 +08:00
Jinyi Wan
b721138132 Add cpu and gpu examples for BlueLM (#9589)
* Add cpu int4 example for BlueLM

* add example optimize_model cpu for bluelm

* add example gpu int4 blueLM

* add example optimize_model GPU for bluelm

* Fixing naming issues and BigDL package version.

* Fixing naming issues...

* Add BlueLM in README.md "Verified Models"
2023-12-05 13:59:02 +08:00
Guancheng Fu
8b00653039 fix doc (#9599) 2023-12-05 13:49:31 +08:00
Wang, Jian4
ed0dc57c6e LLM: Add cpu qlora support other models guide (#9567)
* use bf16 flag

* add using baichuan model

* update merge

* remove

* update
2023-12-01 11:18:04 +08:00
Jason Dai
bda404fc8f Update readme (#9575) 2023-11-30 22:45:52 +08:00
Yishuo Wang
66f5b45f57 [LLM] add a llama2 gguf example (#9553) 2023-11-30 16:37:17 +08:00
Wang, Jian4
a0a80d232e LLM: Add qlora cpu distributed readme (#9561)
* init readme

* add distributed guide

* update
2023-11-30 13:42:30 +08:00
Qiyuan Gong
d85a430a8c Using bigdl-llm-init instead of bigdl-nano-init (#9558)
* Replace `bigdl-nano-init` with `bigdl-llm-init`.
* Install `bigdl-llm` instead of `bigdl-nano`.
* Remove nano in README.
2023-11-30 10:10:29 +08:00
binbin Deng
4ff2ca9d0d LLM: fix loss error on Arc (#9550) 2023-11-29 15:16:18 +08:00
Wang, Jian4
b824754256 LLM: Update for cpu qlora mpirun (#9548) 2023-11-29 10:56:17 +08:00
Guancheng Fu
963a5c8d79 Add vLLM-XPU version's README/examples (#9536)
* test

* test

* fix last kv cache

* add xpu readme

* remove numactl for xpu example

* fix link error

* update max_num_batched_tokens logic

* add explanation

* add xpu environment version requirement

* refine gpu memory

* fix

* fix style
2023-11-28 09:44:03 +08:00
Guancheng Fu
b6c3520748 Remove xformers from vLLM-CPU (#9535) 2023-11-27 11:21:25 +08:00
binbin Deng
2b9c7d2a59 LLM: quick fix alpaca qlora finetuning script (#9534) 2023-11-27 11:04:27 +08:00
binbin Deng
6bec0faea5 LLM: support Mistral AWQ models (#9520) 2023-11-24 16:20:22 +08:00
Jason Dai
b3178d449f Update README.md (#9525) 2023-11-23 21:45:20 +08:00
Jason Dai
82898a4203 Update GPU example README (#9524) 2023-11-23 21:20:26 +08:00
Jason Dai
064848028f Update README.md (#9523) 2023-11-23 21:16:21 +08:00
Guancheng Fu
bf579507c2 Integrate vllm (#9310)
* done

* Rename structure

* add models

* Add structure/sampling_params,sequence

* add input_metadata

* add outputs

* Add policy,logger

* add and update

* add parallelconfig back

* core/scheduler.py

* Add llm_engine.py

* Add async_llm_engine.py

* Add tested entrypoint

* fix minor error

* Fix everything

* fix kv cache view

* fix

* fix

* fix

* format&refine

* remove logger from repo

* try to add token latency

* remove logger

* Refine config.py

* finish worker.py

* delete utils.py

* add license

* refine

* refine sequence.py

* remove sampling_params.py

* finish

* add license

* format

* add license

* refine

* refine

* Refine line too long

* remove exception

* so dumb style-check

* refine

* refine

* refine

* refine

* refine

* refine

* add README

* refine README

* add warning instead error

* fix padding

* add license

* format

* format

* format fix

* Refine vllm dependency (#1)

vllm dependency clear

* fix licence

* fix format

* fix format

* fix

* adapt LLM engine

* fix

* add license

* fix format

* fix

* Moving README.md to the correct position

* Fix readme.md

* done

* guide for adding models

* fix

* Fix README.md

* Add new model readme

* remove ray-logic

* refactor arg_utils.py

* remove distributed_init_method logic

* refactor entrypoints

* refactor input_metadata

* refactor model_loader

* refactor utils.py

* refactor models

* fix api server

* remove vllm.structure

* revert by txy 1120

* remove utils

* format

* fix license

* add bigdl model

* Refer to a specific commit

* Change code base

* add comments

* add async_llm_engine comment

* refine

* formatted

* add worker comments

* add comments

* add comments

* fix style

* add changes

---------

Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2023-11-23 16:46:45 +08:00
Heyang Sun
48fbb1eb94 support ccl (MPI) distributed mode in alpaca_qlora_finetuning_cpu (#9507) 2023-11-23 10:58:09 +08:00
Heyang Sun
11fa5a8a0e Fix QLoRA CPU dispatch_model issue about accelerate (#9506) 2023-11-23 08:41:25 +08:00
Heyang Sun
1453046938 install bigdl-llm in deepspeed cpu inference example (#9508) 2023-11-23 08:39:21 +08:00
binbin Deng
86743fb57b LLM: fix transformers version in CPU finetuning example (#9511) 2023-11-22 15:53:07 +08:00
binbin Deng
1a2129221d LLM: support resume from checkpoint in Alpaca QLoRA (#9502) 2023-11-22 13:49:14 +08:00
Ruonan Wang
076d106ef5 LLM: GPU QLoRA update to bf16 to accelerate gradient checkpointing (#9499)
* update to bf16 to accelerate gradient checkpoint

* add utils and fix ut
2023-11-21 17:08:36 +08:00
binbin Deng
b7ae572ac3 LLM: update Alpaca QLoRA finetuning example on GPU (#9492) 2023-11-21 14:22:19 +08:00
Wang, Jian4
c5cb3ab82e LLM : Add CPU alpaca qlora example (#9469)
* init

* update xpu to cpu

* update

* update readme

* update example

* update

* add refer

* add guide to train different datasets

* update readme

* update
2023-11-21 09:19:58 +08:00
binbin Deng
96fd26759c LLM: fix QLoRA finetuning example on CPU (#9489) 2023-11-20 14:31:24 +08:00
binbin Deng
3dac21ac7b LLM: add more example usages about alpaca qlora on different hardware (#9458) 2023-11-17 09:56:43 +08:00
Heyang Sun
921b263d6a update deepspeed install and run guide in README (#9441) 2023-11-17 09:11:39 +08:00
Yina Chen
d5263e6681 Add awq load support (#9453)
* Support directly loading GPTQ models from huggingface

* fix style

* fix tests

* change example structure

* address comments

* fix style

* init

* address comments

* add examples

* fix style

* fix style

* fix style

* fix style

* update

* remove

* meet comments

* fix style

---------

Co-authored-by: Yang Wang <yang3.wang@intel.com>
2023-11-16 14:06:25 +08:00
Ruonan Wang
0f82b8c3a0 LLM: update qlora example (#9454)
* update qlora example

* fix loss=0
2023-11-15 09:24:15 +08:00
Yang Wang
51d07a9fd8 Support directly loading gptq models from huggingface (#9391)
* Support directly loading GPTQ models from huggingface

* fix style

* fix tests

* change example structure

* address comments

* fix style

* address comments
2023-11-13 20:48:12 -08:00
Heyang Sun
da6bbc8c11 fix deepspeed dependencies to install (#9400)
* remove redundant parameter from deepspeed install

* Update install.sh

* Update install.sh
2023-11-13 16:42:50 +08:00
Zheng, Yi
9b5d0e9c75 Add examples for Yi-6B (#9421) 2023-11-13 10:53:15 +08:00
Wang, Jian4
ac7fbe77e2 Update qlora readme (#9416) 2023-11-12 19:29:29 +08:00
Zheng, Yi
0674146cfb Add cpu and gpu examples of distil-whisper (#9374)
* Add distil-whisper examples

* Fixes based on comments

* Minor fixes

---------

Co-authored-by: Ariadne330 <wyn2000330@126.com>
2023-11-10 16:09:55 +08:00
Ziteng Zhang
ad81b5d838 Update qlora README.md (#9422) 2023-11-10 15:19:25 +08:00
Heyang Sun
b23b91407c fix llm-init on deepspeed missing lib (#9419) 2023-11-10 13:51:24 +08:00
dingbaorong
36fbe2144d Add CPU examples of fuyu (#9393)
* add fuyu cpu examples

* add gpu example

* add comments

* add license

* remove gpu example

* fix inference time
2023-11-09 15:29:19 +08:00
binbin Deng
54d95e4907 LLM: add alpaca qlora finetuning example (#9276) 2023-11-08 16:25:17 +08:00
binbin Deng
97316bbb66 LLM: highlight transformers version requirement in mistral examples (#9380) 2023-11-08 16:05:03 +08:00
Heyang Sun
af94058203 [LLM] Support CPU deepspeed distributed inference (#9259)
* [LLM] Support CPU Deepspeed distributed inference

* Update run_deepspeed.py

* Rename

* fix style

* add new codes

* refine

* remove annotated codes

* refine

* Update README.md

* refine doc and example code
2023-11-06 17:56:42 +08:00
Jin Qiao
e6b6afa316 LLM: add aquila2 model example (#9356) 2023-11-06 15:47:39 +08:00
Yining Wang
9377b9c5d7 add CodeShell CPU example (#9345)
* add CodeShell CPU example

* fix some problems
2023-11-03 13:15:54 +08:00
Zheng, Yi
63411dff75 Add cpu examples of WizardCoder (#9344)
* Add wizardcoder example

* Minor fixes
2023-11-02 20:22:43 +08:00
dingbaorong
2e3bfbfe1f Add internlm_xcomposer cpu examples (#9337)
* add internlm-xcomposer cpu examples

* use chat

* some fixes

* add license

* address shengsheng's comments

* use demo.jpg
2023-11-02 15:50:02 +08:00
Jin Qiao
97a38958bd LLM: add CodeLlama CPU and GPU examples (#9338)
* LLM: add codellama CPU pytorch examples

* LLM: add codellama CPU transformers examples

* LLM: add codellama GPU transformers examples

* LLM: add codellama GPU pytorch examples

* LLM: add codellama in readme

* LLM: add LLaVA link
2023-11-02 15:34:25 +08:00
Zheng, Yi
63b2556ce2 Add cpu examples of skywork (#9340) 2023-11-02 15:10:45 +08:00
dingbaorong
f855a864ef add llava gpu example (#9324)
* add llava gpu example

* use 7b model

* fix typo

* add in README
2023-11-02 14:48:29 +08:00
Wang, Jian4
149146004f LLM: Add qlora finetuning CPU example (#9275)
* add qlora finetuning example

* update readme

* update example

* remove merge.py and update readme
2023-11-02 09:45:42 +08:00
Cengguang Zhang
9f3d4676c6 LLM: Add qwen-vl gpu example (#9290)
* create qwen-vl gpu example.

* add readme.

* fix.

* change input figure and update outputs.

* add qwen-vl pytorch model gpu example.

* fix.

* add readme.
2023-11-01 11:01:39 +08:00
Jin Qiao
96f8158fe2 LLM: adjust dolly v2 GPU example README (#9318) 2023-11-01 09:50:22 +08:00
Jin Qiao
c44c6dc43a LLM: add chatglm3 examples (#9305) 2023-11-01 09:50:05 +08:00
Ruonan Wang
d383ee8efb LLM: update QLoRA example about accelerate version(#9314) 2023-10-31 13:54:38 +08:00
dingbaorong
ee5becdd61 use coco image in Qwen-VL (#9298)
* use coco image

* add output

* address yuwen's comments
2023-10-30 14:32:35 +08:00
Yang Wang
8838707009 Add deepspeed autotp example readme (#9289)
* Add deepspeed autotp example readme

* change word
2023-10-27 13:04:38 -07:00
dingbaorong
f053688cad add cpu example of LLaVA (#9269)
* add LLaVA cpu example

* Small text updates

* update link

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2023-10-27 18:59:20 +08:00
Zheng, Yi
7f2ad182fd Minor Fixes of README (#9294) 2023-10-27 18:25:46 +08:00
Zheng, Yi
1bff54a378 Display demo.jpg in the README.md of HuggingFace Transformers Agent (#9293)
* Display demo.jpg

* remove demo.jpg
2023-10-27 18:00:03 +08:00
Zheng, Yi
a4a1dec064 Add a cpu example of HuggingFace Transformers Agent (use vicuna-7b-v1.5) (#9284)
* Add examples of HF Agent

* Modify folder structure and add link of demo.jpg

* Fixes of readme

* Merge applications and Applications
2023-10-27 17:14:12 +08:00
Guoqiong Song
aa319de5e8 Add streaming-llm using llama2 on CPU (#9265)
Enable streaming-llm to let model take infinite inputs, tested on desktop and SPR10
2023-10-27 01:30:39 -07:00
Yang Wang
067c7e8098 Support deepspeed AutoTP (#9230)
* Support deepspeed

* add test script

* refactor convert

* refine example

* refine

* refine example

* fix style

* refine example and adapt to latest ipex

* fix style
2023-10-24 23:46:28 -07:00
Yining Wang
a6a8afc47e Add qwen vl CPU example (#9221)
* eee

* add examples on CPU and GPU

* fix

* fix

* optimize model examples

* add Qwen-VL-Chat CPU example

* Add Qwen-VL CPU example

* fix optimize problem

* fix error

* Have updated, benchmark fix removed from this PR

* add generate API example

* Change formats in qwen-vl example

* Add CPU transformer int4 example for qwen-vl

* fix repo-id problem and add Readme

* change picture url

* Remove unnecessary file

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2023-10-25 13:22:12 +08:00
dingbaorong
5a2ce421af add cpu and gpu examples of flan-t5 (#9171)
* add cpu and gpu examples of flan-t5

* address yuwen's comments
* Add explanation of why we add modules to not convert
* Refine prompt and add a translation example
* Add an empty line at the end of files

* add examples of flan-t5 using optimize_model api

* address bin's comments

* address binbin's comments

* add flan-t5 in readme
2023-10-24 15:24:01 +08:00
Yining Wang
4a19f50d16 phi-1_5 CPU and GPU examples (#9173)
* eee

* add examples on CPU and GPU

* fix

* fix

* optimize model examples

* have updated

* Warmup and configs added

* Update two tables
2023-10-24 15:08:04 +08:00
Xin Qiu
0c5055d38c add position_ids and fuse embedding for falcon (#9242)
* add position_ids for falcon

* add cpu

* add cpu

* add license
2023-10-24 09:58:20 +08:00
Jin Qiao
a3b664ed03 LLM: add GPU More-Data-Types and Save/Load example (#9199) 2023-10-18 13:13:45 +08:00
Jin Qiao
d946bd7c55 LLM: add CPU More-Data-Types and Save-Load examples (#9179) 2023-10-17 14:38:52 +08:00
Ruonan Wang
c0497ab41b LLM: support kv_cache optimization for Qwen-VL-Chat (#9193)
* support qwen_vl_chat

* fix style
2023-10-17 13:33:56 +08:00
Yang Wang
7a2de00b48 Fixes for xpu Bf16 training (#9156)
* Support bf16 training

* Use a stable transformer version

* remove env

* fix style
2023-10-14 21:28:59 -07:00
Jin Qiao
db7f938fdc LLM: add replit and starcoder to gpu pytorch model example (#9154) 2023-10-13 15:44:17 +08:00
Jin Qiao
797b156a0d LLM: add dolly-v1 and dolly-v2 to gpu pytorch model example (#9153) 2023-10-13 15:43:35 +08:00
Jin Qiao
f754ab3e60 LLM: add baichuan and baichuan2 to gpu pytorch model example (#9152) 2023-10-13 13:44:31 +08:00
JIN Qiao
1a1ddc4144 LLM: Add Replit CPU and GPU example (#9028) 2023-10-12 13:42:14 +08:00
JIN Qiao
d74834ff4c LLM: add gpu pytorch-models example llama2 and chatglm2 (#9142) 2023-10-12 13:41:48 +08:00
binbin Deng
995b0f119f LLM: update some gpu examples (#9136) 2023-10-11 14:23:56 +08:00
binbin Deng
2ad67a18b1 LLM: add mistral examples (#9121) 2023-10-11 13:38:15 +08:00
Guoqiong Song
e8c5645067 add LLM example of aquila on GPU (#9056)
* aquila, dolly-v1, dolly-v2, vicuna
2023-10-10 17:01:35 -07:00
binbin Deng
5e9962b60e LLM: update example layout (#9046) 2023-10-09 15:36:39 +08:00
Yang Wang
88565c76f6 add export merged model example (#9018)
* add export merged model example

* add sources

* add script

* fix style
2023-10-04 21:18:52 -07:00
Ruonan Wang
b943d73844 LLM: refactor kv cache (#9030)
* refactor utils

* meet code review; update all models

* small fix
2023-09-21 21:28:03 +08:00
Ruonan Wang
bf51ec40b2 LLM: Fix empty cache (#9024)
* fix

* fix

* update example
2023-09-21 17:16:07 +08:00
binbin Deng
edb225530b add bark (#9016) 2023-09-21 12:24:58 +08:00
JinBridge
48b503c630 LLM: add example of aquila (#9006)
* LLM: add example of aquila

* LLM: replace AquilaChat with Aquila

* LLM: shorten prompt of aquila example
2023-09-20 15:52:56 +08:00
Yang Wang
c88f6ec457 Experiment XPU QLora Finetuning (#8937)
* Support xpu finetuning

* support xpu finetuning

* fix style

* fix style

* fix style

* refine example

* add readme

* refine readme

* refine api

* fix fp16

* fix example

* refactor

* fix style

* fix compute type

* add qlora

* refine training args

* fix example

* fix style

* fast path for inference

* address comments

* refine readme

* revert lint
2023-09-19 10:15:44 -07:00
Jason Dai
51518e029d Update llm readme (#9005) 2023-09-19 20:01:33 +08:00
Ruonan Wang
249386261c LLM: add Baichuan2 cpu example (#9002)
* add baichuan2 cpu examples

* add link

* update prompt
2023-09-19 18:08:30 +08:00
binbin Deng
c1d25a51a8 LLM: add optimize_model example for bert (#8975) 2023-09-18 16:18:35 +08:00
Ruonan Wang
cabe7c0358 LLM: add baichuan2 example for arc (#8994)
* add baichuan2 examples

* add link

* small fix
2023-09-18 14:32:27 +08:00
JinBridge
c12b8f24b6 LLM: add use_cache=True for all gpu examples (#8971) 2023-09-15 09:54:38 +08:00
binbin Deng
be29c75c18 LLM: refactor gpu examples (#8963)
* restructure

* change to hf-transformers-models/
2023-09-13 14:47:47 +08:00
Ruonan Wang
4de73f592e LLM: add gpu example of chinese-llama-2-7b (#8960)
* add gpu example of chinese -llama2

* update model name and link

* update name
2023-09-13 10:16:51 +08:00
binbin Deng
2d81521019 LLM: add optimize_model examples for llama2 and chatglm (#8894)
* add llama2 and chatglm optimize_model examples

* update default usage

* update command and some descriptions

* move folder and remove general_int4 descriptions

* change folder name
2023-09-12 10:36:29 +08:00
Yuwen Hu
ca35c93825 [LLM] Fix langchain UT (#8929)
* Change dependency version for langchain uts

* Downgrade pandas version instead; and update example readme accordingly
2023-09-08 13:51:04 +08:00
Zhao Changmin
8bc1d8a17c LLM: Fix discards in optimize_model with non-hf models and add openai whisper example (#8877)
* openai-whisper
2023-09-07 10:35:59 +08:00
Yina Chen
bfc71fbc15 Add known issue in arc voice assistant example (#8902)
* add known issue in voice assistant example

* update cpu
2023-09-07 09:28:26 +08:00
Yina Chen
74a2c2ddf5 Update optimize_model=True in llama2 chatglm2 arc examples (#8878)
* add optimize_model=True in llama2 chatglm2 examples

* add ipex optimize in gpt-j example
2023-09-05 10:35:37 +08:00
Zhao Changmin
9c652fbe95 LLM: Whisper long segment recognize example (#8826)
* LLM: Long segment recognize example
2023-08-31 16:41:25 +08:00
Yina Chen
3462fd5c96 Add arc gpt-j example (#8840) 2023-08-30 10:31:24 +08:00
Ruonan Wang
f42c0bad1b LLM: update GPU doc (#8845) 2023-08-30 09:24:19 +08:00
Jason Dai
aab7deab1f Reorganize GPU examples (#8844) 2023-08-30 08:32:08 +08:00
Yang Wang
a386ad984e Add Data Center GPU Flex Series to Readme (#8835)
* Add Data Center GPU Flex Series to Readme

* remove

* update starcoder
2023-08-29 11:19:09 -07:00
Ruonan Wang
ddff7a6f05 Update readme of GPU to specify oneapi version(#8820) 2023-08-29 13:14:22 +08:00
Yina Chen
35fdf94031 [LLM]Arc starcoder example (#8814)
* arc starcoder example init

* add log

* meet comments
2023-08-28 16:48:00 +08:00
Ruonan Wang
eae92bc7da llm: quick fix path (#8810) 2023-08-25 16:02:31 +08:00
Ruonan Wang
0186f3ab2f llm: update all ARC int4 examples (#8809)
* update GPU examples

* update other examples

* fix

* update based on comment
2023-08-25 15:26:10 +08:00
Yang Wang
9d0f6a8cce rename math.py in example to avoid conflict (#8805) 2023-08-24 21:06:31 -07:00
SONG Ge
d2926c7672 [LLM] Unify Langchain Native and Transformers LLM API (#8752)
* deprecate BigDLNativeTransformers and add specific LMEmbedding method

* deprecate and add LM methods for langchain llms

* add native params to native langchain

* new imple for embedding

* move ut from bigdlnative to causal llm

* rename embeddings api and examples update align with usage updating

* docqa example hot-fix

* add more api docs

* add langchain ut for starcoder

* support model_kwargs for transformer methods when calling causalLM and add ut

* ut fix for transformers embedding

* update for langchain causal supporting transformers

* remove model_family in readme doc

* add model_families params to support more models

* update api docs and remove chatglm embeddings for now

* remove chatglm embeddings in examples

* new refactor for ut to add bloom and transformers llama ut

* disable llama transformers embedding ut
2023-08-25 11:14:21 +08:00
binbin Deng
5582872744 LLM: update chatglm example to be more friendly for beginners (#8795) 2023-08-25 10:55:01 +08:00
Yina Chen
7c37424a63 Fix voice assistant example input error on Linux (#8799)
* fix linux error

* update

* remove alsa log
2023-08-25 10:47:27 +08:00
Ruonan Wang
e9aa2bd890 LLM: reduce GPU 1st token latency and update example (#8763)
* reduce 1st token latency

* update example

* fix

* fix style

* update readme of gpu benchmark
2023-08-16 18:01:23 +08:00
binbin Deng
06609d9260 LLM: add qwen example on arc (#8757) 2023-08-16 17:11:08 +08:00
Song Jiaming
c1f9af6d97 [LLM] chatglm example and transformers low-bit examples (#8751) 2023-08-16 11:41:44 +08:00
binbin Deng
97283c033c LLM: add falcon example on arc (#8742) 2023-08-15 17:38:38 +08:00
binbin Deng
8c55911308 LLM: add baichuan-13B on arc example (#8755) 2023-08-15 15:07:04 +08:00
binbin Deng
be2ae6eb7c LLM: fix langchain native int4 voiceassistant example (#8750) 2023-08-14 17:23:33 +08:00
Ruonan Wang
d28ad8f7db LLM: add whisper example for arc transformer int4 (#8749)
* add whisper example for arc int4

* fix
2023-08-14 17:05:48 +08:00
Ruonan Wang
faaccb64a2 LLM: add chatglm2 example for Arc (#8741)
* add chatglm2 example

* update

* fix readme
2023-08-14 10:43:08 +08:00
binbin Deng
b10d7e1adf LLM: add mpt example on arc (#8723) 2023-08-14 09:40:01 +08:00
binbin Deng
e9a1afffc5 LLM: add internlm example on arc (#8722) 2023-08-14 09:39:39 +08:00
SONG Ge
aceea4dc29 [LLM] Unify Transformers and Native API (#8713)
* re-open pr to run on latest runner

* re-add examples and ut

* rename ut and move deprecate to warning instead of raising an error info

* ut fix
2023-08-11 19:45:47 +08:00
Shengsheng Huang
7c56c39e36 Fix GPU examples READ to use bigdl-core-xe (#8714)
* Update README.md

* Update README.md
2023-08-10 12:53:49 +08:00
Yina Chen
6d1ca88aac add voice assistant example (#8711) 2023-08-10 12:42:14 +08:00
Ruonan Wang
1a7b698a83 [LLM] support ipex arc int4 & add basic llama2 example (#8700)
* first support of xpu

* make it works on gpu

update setup

update

add GPU llama2 examples

add use_optimize flag to disable optimize for gpu

fix style

update gpu example readme

fix

* update example, and update env

* fix setup to add cpp files

* replace jit with aot to avoid data leak

* rename to bigdl-core-xe

* update installation in example readme
2023-08-09 22:20:32 +08:00
binbin Deng
4c44153584 LLM: add Qwen transformers int4 example (#8699) 2023-08-08 11:23:09 +08:00
binbin Deng
6fc31bb4cf LLM: first update descriptions for ChatGLM transformers int4 example (#8646) 2023-08-02 11:00:56 +08:00
binbin Deng
39994738d1 LLM: add chat & stream chat example for ChatGLM2 transformers int4 (#8636) 2023-08-01 14:57:45 +08:00
Zhao Changmin
d6cbfc6d2c LLM: Add requirements in whisper example (#8644)
* LLM: Add requirements in whisper example
2023-08-01 12:07:14 +08:00
binbin Deng
3dbab9087b LLM: add llama2-7b native int4 example (#8629) 2023-07-28 10:56:16 +08:00
binbin Deng
fcf8c085e3 LLM: add llama2-13b native int4 example (#8613) 2023-07-26 10:12:52 +08:00
binbin Deng
3f24202e4c [LLM] Add more transformers int4 example (Llama 2) (#8602) 2023-07-25 09:21:12 +08:00
Jason Dai
0f8201c730 llm readme update (#8595) 2023-07-24 09:47:49 +08:00
Yuwen Hu
6504e31a97 Small fix (#8577) 2023-07-20 16:37:04 +08:00
Yuwen Hu
cad78740a7 [LLM] Small fixes to the Whisper transformers INT4 example (#8573)
* Small fixes to the whisper example

* Small fix

* Small fix
2023-07-20 10:11:33 +08:00
binbin Deng
7a9fdf74df [LLM] Add more transformers int4 example (Dolly v2) (#8571)
* add

* add trust_remote_mode
2023-07-19 18:20:16 +08:00
Zhao Changmin
e680af45ea LLM: Optimize Langchain Pipeline (#8561)
* LLM: Optimize Langchain Pipeline

* load in low bit
2023-07-19 17:43:13 +08:00
Shengsheng Huang
616b7cb0a2 add more langchain examples (#8542)
* update langchain descriptions

* add mathchain example

* update readme

* update readme
2023-07-19 17:42:18 +08:00
binbin Deng
457571b44e [LLM] Add more transformers int4 example (InternLM) (#8557) 2023-07-19 15:15:38 +08:00
Zhao Changmin
3dbe3bf18e transformer_int4 (#8553) 2023-07-19 08:33:58 +08:00
Zhao Changmin
49d636e295 [LLM] whisper model transformer int4 verification and example (#8511)
* LLM: transformer api support

* va

* example

* revert

* pep8

* pep8
2023-07-19 08:33:20 +08:00
Jason Dai
1ebc43b151 Update READMEs (#8554) 2023-07-18 11:06:06 +08:00
Yuwen Hu
ee70977c07 [LLM] Transformers int4 example small typo fixes (#8550) 2023-07-17 18:15:32 +08:00
Yuwen Hu
1344f50f75 [LLM] Add more transformers int4 examples (Falcon) (#8546)
* Initial commit

* Add Falcon examples and other small fix

* Small fix

* Small fix

* Update based on comments

* Small fix
2023-07-17 17:36:21 +08:00
Yuwen Hu
de772e7a80 Update mpt for prompt tuning (#8547) 2023-07-17 17:33:54 +08:00
binbin Deng
f1fd746722 [LLM] Add more transformers int4 example (vicuna) (#8544) 2023-07-17 16:59:55 +08:00
Xin Qiu
fccae91461 Add load_low_bit save_load_bit to AutoModelForCausalLM (#8531)
* transformers save_low_bit load_low_bit

* update example and add readme

* update

* update

* update

* add ut

* update
2023-07-17 15:29:55 +08:00
binbin Deng
808a64d53a [LLM] Add more transformers int4 example (starcoder) (#8540) 2023-07-17 14:41:19 +08:00
binbin Deng
f56b5ade4c [LLM] Add more transformers int4 example (chatglm2) (#8539) 2023-07-14 17:58:33 +08:00
binbin Deng
92d33cf35a [LLM] Add more transformers int4 example (phoenix) (#8520) 2023-07-14 17:58:04 +08:00
Yuwen Hu
e0f0def279 Remove unused example for now (#8538) 2023-07-14 17:32:50 +08:00
binbin Deng
b397e40015 [LLM] Add more transformers int4 example (RedPajama) (#8523) 2023-07-14 17:30:28 +08:00
Yuwen Hu
7bf3e10415 [LLM] Add more int4 transformers examples (MOSS) (#8532)
* Add Moss example

* Small fix
2023-07-14 16:41:41 +08:00
Yuwen Hu
59b7287ef5 [LLM] Add more transformers int4 example (Baichuan) (#8522)
* Add example model Baichuan

* Small updates to client windows settings

* Small refactor

* Small fix
2023-07-14 16:41:29 +08:00
Yuwen Hu
ca6e38607c [LLM] Add more transformers examples (ChatGLM) (#8521)
* Add example for chatglm v1 and other small fixes

* Small fix

* Small further fix

* Small fix

* Update based on comments & updates for client windows recommended settings

* Small fix

* Small refactor

* Small fix

* Small fix

* Small fix to dolly v1

* Small fix
2023-07-14 16:41:13 +08:00
Yuwen Hu
349bcb4bae [LLM] Add more transformers int4 example (Dolly v1) (#8517)
* Initial commit for dolly v1

* Add example for Dolly v1 and other small fix

* Small output updates

* Small fix

* fix based on comments
2023-07-13 16:13:47 +08:00
Yuwen Hu
bcde8ec83e [LLM] Small fix to MPT Example (#8513) 2023-07-13 14:33:21 +08:00
Yuwen Hu
fcc352eee3 [LLM] Add more transformers_int4 examples (MPT) (#8498)
* Update transformers_int4 readme, and initial commit for mpt

* Update example for mpt

* Small fix and recover transformers_int4_pipeline_readme.md for now

* Update based on comments

* Small fix

* Small fix

* Update based on comments
2023-07-13 09:41:16 +08:00
Yuwen Hu
52c6b057d6 Initial LLM Transformers example refactor (#8491) 2023-07-10 17:53:57 +08:00
Junwei Deng
254a7aa3c4 bigdl-llm: add voice-assistant example that is migrated from langchain use-case document (#8468) 2023-07-10 16:51:45 +08:00
Ruonan Wang
2f77d485d8 Llm: Initial support of langchain transformer int4 API (#8459)
* first commit of transformer int4 and pipeline

* basic examples

temp save for embeddings

support embeddings and docqa example

* fix based on comment

* small fix
2023-07-06 17:50:05 +08:00
binbin Deng
14626fe05b LLM: refactor transformers and langchain class name (#8470) 2023-07-06 17:16:44 +08:00
binbin Deng
70bc8ea8ae LLM: update langchain and cpp-python style API examples (#8456) 2023-07-06 14:36:42 +08:00
binbin Deng
1970bcf14e LLM: add readme for transformer examples (#8444) 2023-07-04 17:25:58 +08:00
binbin Deng
c956a46c40 LLM: first fix example/transformers (#8438) 2023-07-03 14:13:33 +08:00
binbin Deng
ca5a4b6e3a LLM: update bloom and starcoder usage in transformers_int4_pipeline (#8406) 2023-06-28 13:15:50 +08:00
Ruonan Wang
4be784a49d LLM: add UT for starcoder (convert, inference) update examples and readme (#8379)
* first commit to add path

* update example and readme

* update path

* fix

* update based on comment
2023-06-27 12:12:11 +08:00
Ruonan Wang
b9eae23c79 LLM: add chatglm-6b example for transformer_int4 usage (#8392)
* add example for chatglm-6b

* fix
2023-06-26 13:46:43 +08:00
Shengsheng Huang
446175cc05 transformer api refactor (#8389)
* transformer api refactor

* fix style

* add huggingface tokenizer usage in example and make ggml tokenizer as option 1 and huggingface tokenizer as option 2

* fix style
2023-06-25 17:15:33 +08:00
Yang Wang
ce6d06eb0a Support directly quantizing huggingface transformers into 4bit format (#8371)
* Support directly quantizing huggingface transformers into 4bit format

* refine example

* license

* fix bias

* address comments

* move to ggml transformers

* fix example

* fix style

* fix style

* address comments

* rename

* change API

* fix style

* add lm head to conversion

* address comments
2023-06-25 16:35:06 +08:00
Yuwen Hu
7ef1c890eb [LLM] Supports GPTQ convert in transformers-like API, and supports folder outfile for llm-convert (#8366)
* Add docstrings to llm_convert

* Small docstrings fix

* Unify outfile type to be a folder path for either gptq or pth model_format

* Supports gptq model input for from_pretrained

* Fix example and readme

* Small fix

* Python style fix

* Bug fix in llm_convert

* Python style check

* Fix based on comments

* Small fix
2023-06-20 17:42:38 +08:00
Zhao Changmin
4d177ca0a1 LLM: Merge convert pth/gptq model script into one shell script (#8348)
* convert model in one

* model type

* license

* readme and pep8

* ut path

* rename

* readme

* fix docs

* without lines
2023-06-19 11:50:05 +08:00
Shengsheng Huang
02c583144c [LLM] langchain integrations and examples (#8256)
* langchain integrations and examples

* add licences and rename

* add licences

* fix license issues and change backbone to model_family

* update examples to use model_family param

* fix linting

* fix code style

* exclude langchain integration from stylecheck

* update langchain examples and update integrations based on latest changes

* update simple llama-cpp-python style API example

* remove bloom in README

* change default n_threads to 2 and remove redundant code

---------

Co-authored-by: leonardozcm <changmin.zhao@intel.com>
2023-06-12 19:22:07 +08:00
Yuwen Hu
f83c48280f [LLM] Unify transformers-like API example for 3 different model families (#8315)
* Refactor bigdl-llm transformers-like API to unify them

* Small fix
2023-06-12 17:20:30 +08:00
Yuwen Hu
c619315131 [LLM] Add examples for gptneox, llama, and bloom family model using transformers-like API (#8286)
* First push of bigdl-llm example for gptneox model family

* Add some args and other small updates

* Small updates

* Add example for llama family models

* Small fix

* Small fix

* Update for batch_decode api and change default model for llama example

* Small fix

* Small fix

* Small fix

* Small model family name fix and add example for bloom

* Small fix

* Small default prompt fix

* Small fix

* Change default prompt

* Add sample output for inference

* Hide example inference time
2023-06-09 15:48:22 +08:00