Commit graph

279 commits

Author SHA1 Message Date
Heyang Sun
5184f400f9 Fix Mixtral GGUF Wrong Output Issue (#9930)
* Fix Mixtral GGUF Wrong Output Issue

* fix style

* fix style
2024-01-18 14:11:27 +08:00
Jinyi Wan
07485eff5a Add SOLAR-10.7B to README (#9869) 2024-01-11 14:28:41 +08:00
ZehuaCao
e76d984164 [LLM] Support llm-awq vicuna-7b-1.5 on arc (#9874)
* support llm-awq vicuna-7b-1.5 on arc

* support llm-awq vicuna-7b-1.5 on arc
2024-01-10 14:28:39 +08:00
Yuwen Hu
023679459e [LLM] Small fixes for finetune related examples and UTs (#9870) 2024-01-09 18:05:03 +08:00
Yuwen Hu
23fc888abe Update llm gpu xpu default related info to PyTorch 2.1 (#9866) 2024-01-09 15:38:47 +08:00
ZehuaCao
146076bdb5 Support llm-awq backend (#9856)
* Support for LLM-AWQ Backend

* fix

* Update README.md

* Add awqconfig

* modify init

* update

* support llm-awq

* fix style

* fix style

* update

* fix AwqBackendPackingMethod not found error

* fix style

* update README

* fix style

---------

Co-authored-by: Uxito-Ada <414416158@qq.com>
Co-authored-by: Heyang Sun <60865256+Uxito-Ada@users.noreply.github.com>
Co-authored-by: cyita <yitastudy@gmail.com>
2024-01-09 13:07:32 +08:00
binbin Deng
294fd32787 LLM: update DeepSpeed AutoTP example with GPU memory optimization (#9823) 2024-01-09 09:22:49 +08:00
Mingyu Wei
ed81baa35e LLM: Use default typing-extension in LangChain examples (#9857)
* remove typing extension downgrade in readme; minor fixes of code

* fix typos in README

* change default question of docqa.py
2024-01-08 16:50:55 +08:00
Jinyi Wan
3147ebe63d Add cpu and gpu examples for SOLAR-10.7B (#9821) 2024-01-05 09:50:28 +08:00
Ruonan Wang
8504a2bbca LLM: update qlora alpaca example to change lora usage (#9835)
* update example

* fix style
2024-01-04 15:22:20 +08:00
Ziteng Zhang
05b681fa85 [LLM] IPEX auto importer set on by default (#9832)
* Set BIGDL_IMPORT_IPEX default to True

* Remove import intel_extension_for_pytorch as ipex from GPU example
2024-01-04 13:33:29 +08:00
Wang, Jian4
4ceefc9b18 LLM: Support bitsandbytes config on qlora finetune (#9715)
* test support bitsandbytesconfig

* update style

* update cpu example

* update example

* update readme

* update unit test

* use bfloat16

* update logic

* use int4

* set defalut bnb_4bit_use_double_quant

* update

* update example

* update model.py

* update

* support lora example
2024-01-04 11:23:16 +08:00
Wang, Jian4
a54cd767b1 LLM: Add gguf falcon (#9801)
* init falcon

* update convert.py

* update style
2024-01-03 14:49:02 +08:00
binbin Deng
6584539c91 LLM: fix installation of codellama (#9813) 2024-01-02 14:32:50 +08:00
Wang, Jian4
7ed9538b9f LLM: support gguf mpt (#9773)
* add gguf mpt

* update
2023-12-28 09:22:39 +08:00
binbin Deng
40edb7b5d7 LLM: fix get environment variables setting (#9787) 2023-12-27 09:11:37 +08:00
Jason Dai
361781bcd0 Update readme (#9788) 2023-12-26 19:46:11 +08:00
Ziteng Zhang
44b4a0c9c5 [LLM] Correct prompt format of Yi, Llama2 and Qwen in generate.py (#9786)
* correct prompt format of Yi

* correct prompt format of llama2 in cpu generate.py

* correct prompt format of Qwen in GPU example
2023-12-26 16:57:55 +08:00
Heyang Sun
66e286a73d Support for Mixtral AWQ (#9775)
* Support for Mixtral AWQ

* Update README.md

* Update README.md

* Update awq_config.py

* Update README.md

* Update README.md
2023-12-25 16:08:09 +08:00
Ruonan Wang
1917bbe626 LLM: fix BF16Linear related training & inference issue (#9755)
* fix bf16 related issue

* fix

* update based on comment & add arc lora script

* update readme

* update based on comment

* update based on comment

* update

* force to bf16

* fix style

* move check input dtype into function

* update convert

* meet code review

* meet code review

* update merged model to support new training_mode api

* fix typo
2023-12-25 14:49:30 +08:00
Yina Chen
449b387125 Support relora in bigdl-llm (#9687)
* init

* fix style

* update

* support resume & update readme

* update

* update

* remove important

* add training mode

* meet comments
2023-12-25 14:04:28 +08:00
Yishuo Wang
be13b162fe add codeshell example (#9743) 2023-12-25 10:54:01 +08:00
binbin Deng
ed8ed76d4f LLM: update deepspeed autotp usage (#9733) 2023-12-25 09:41:14 +08:00
Qiyuan Gong
4c487313f2 Revert "[LLM] IPEX auto importer turn on by default for XPU (#9730)" (#9759)
This reverts commit 0284801fbd.
2023-12-22 16:38:24 +08:00
Qiyuan Gong
0284801fbd [LLM] IPEX auto importer turn on by default for XPU (#9730)
* Set BIGDL_IMPORT_IPEX default to true, i.e., auto import IPEX for XPU.
* Remove import intel_extension_for_pytorch as ipex from GPU example.
* Add support for bigdl-core-xe-21.
2023-12-22 16:20:32 +08:00
Ruonan Wang
2f36769208 LLM: bigdl-llm lora support & lora example (#9740)
* lora support and single card example

* support multi-card, refactor code

* fix model id and style

* remove torch patch, add two new class for bf16, update example

* fix style

* change to training_mode

* small fix

* add more info in help

* fixstyle, update readme

* fix ut

* fix ut

* Handling compatibility issues with default LoraConfig
2023-12-22 11:05:39 +08:00
Wang, Jian4
984697afe2 LLM: Add bloom gguf support (#9734)
* init

* update bloom add merges

* update

* update readme

* update for llama error

* update
2023-12-21 14:06:25 +08:00
Heyang Sun
1fa7793fc0 Load Mixtral GGUF Model (#9690)
* Load Mixtral GGUF Model

* refactor

* fix empty tensor when to cpu

* update gpu and cpu readmes

* add dtype when set tensor into module
2023-12-19 13:54:38 +08:00
binbin Deng
12df70953e LLM: add resume_from_checkpoint related section (#9705) 2023-12-18 12:27:02 +08:00
Wang, Jian4
b8437a1c1e LLM: Add gguf mistral model support (#9691)
* add mistral support

* need to upgrade transformers version

* update
2023-12-15 13:37:39 +08:00
Wang, Jian4
496bb2e845 LLM: Support load BaiChuan model family gguf model (#9685)
* support baichuan model family gguf model

* update gguf generate.py

* add verify models

* add support model_family

* update

* update style

* update type

* update readme

* update

* remove support model_family
2023-12-15 13:34:33 +08:00
Lilac09
3afed99216 fix path issue (#9696) 2023-12-15 11:21:49 +08:00
Jason Dai
37f509bb95 Update readme (#9692) 2023-12-14 19:50:21 +08:00
Ziteng Zhang
21c7503a42 [LLM] Correct prompt format of Qwen in generate.py (#9678)
* Change qwen prompt format to chatml
2023-12-14 14:01:30 +08:00
Qiyuan Gong
223c9622f7 [LLM] Mixtral CPU examples (#9673)
* Mixtral CPU PyTorch and hugging face examples, based on #9661 and #9671
2023-12-14 10:35:11 +08:00
ZehuaCao
877229f3be [LLM]Add Yi-34B-AWQ to verified AWQ model. (#9676)
* verfiy Yi-34B-AWQ

* update
2023-12-14 09:55:47 +08:00
binbin Deng
68a4be762f remove disco mixtral, update oneapi version (#9671) 2023-12-13 23:24:59 +08:00
ZehuaCao
503880809c verfiy codeLlama (#9668) 2023-12-13 15:39:31 +08:00
Heyang Sun
c64e2248ef fix str returned by get_int_from_str rather than expected int (#9667) 2023-12-13 11:01:21 +08:00
binbin Deng
bf1bcf4a14 add official Mixtral model support (#9663) 2023-12-12 22:27:07 +08:00
binbin Deng
2fe38b4b9b LLM: add mixtral GPU examples (#9661) 2023-12-12 20:26:36 +08:00
ZehuaCao
45721f3473 verfiy llava (#9649) 2023-12-11 14:26:05 +08:00
Heyang Sun
9f02f96160 [LLM] support for Yi AWQ model (#9648) 2023-12-11 14:07:34 +08:00
Yina Chen
70f5e7bf0d Support peft LoraConfig (#9636)
* support peft loraconfig

* use testcase to test

* fix style

* meet comments
2023-12-08 16:13:03 +08:00
binbin Deng
499100daf1 LLM: Add solution to fix oneccl related error (#9630) 2023-12-08 10:51:55 +08:00
ZehuaCao
6eca8a8bb5 update transformer version (#9631) 2023-12-08 09:36:00 +08:00
Heyang Sun
3811cf43c9 [LLM] update AWQ documents (#9623)
* [LLM] update AWQ and verified models' documents

* refine

* refine links

* refine
2023-12-07 16:02:20 +08:00
Jason Dai
51b668f229 Update GGUF readme (#9611) 2023-12-06 18:21:54 +08:00
dingbaorong
a7bc89b3a1 remove q4_1 in gguf example (#9610)
* remove q4_1

* fixes
2023-12-06 16:00:05 +08:00
Yina Chen
404e101ded QALora example (#9551)
* Support qa-lora

* init

* update

* update

* update

* update

* update

* update merge

* update

* fix style & update scripts

* update

* address comments

* fix typo

* fix typo

---------

Co-authored-by: Yang Wang <yang3.wang@intel.com>
2023-12-06 15:36:21 +08:00
dingbaorong
89069d6173 Add gpu gguf example (#9603)
* add gpu gguf example

* some fixes

* address kai's comments

* address json's comments
2023-12-06 15:17:54 +08:00
Ziteng Zhang
aeb77b2ab1 Add minimum Qwen model version (#9606) 2023-12-06 11:49:14 +08:00
Heyang Sun
4e70e33934 [LLM] code and document for distributed qlora (#9585)
* [LLM] code and document for distributed qlora

* doc

* refine for gradient checkpoint

* refine

* Update alpaca_qlora_finetuning_cpu.py

* Update alpaca_qlora_finetuning_cpu.py

* Update alpaca_qlora_finetuning_cpu.py

* add link in doc
2023-12-06 09:23:17 +08:00
Zheng, Yi
d154b38bf9 Add llama2 gpu low memory example (#9514)
* Add low memory example

* Minor fixes

* Update readme.md
2023-12-05 17:29:48 +08:00
Jason Dai
06febb5fa7 Update readme for FP8/FP4 inference examples (#9601) 2023-12-05 15:59:03 +08:00
dingbaorong
a66fbedd7e add gpu more data types example (#9592)
* add gpu more data types example

* add int8
2023-12-05 15:45:38 +08:00
Jinyi Wan
b721138132 Add cpu and gpu examples for BlueLM (#9589)
* Add cpu int4 example for BlueLM

* addexample optimize_model cpu for bluelm

* add example gpu int4 blueLM

* add example optimiza_model GPU for bluelm

* Fixing naming issues and BigDL package version.

* Fixing naming issues...

* Add BlueLM in README.md "Verified Models"
2023-12-05 13:59:02 +08:00
Guancheng Fu
8b00653039 fix doc (#9599) 2023-12-05 13:49:31 +08:00
Wang, Jian4
ed0dc57c6e LLM: Add cpu qlora support other models guide (#9567)
* use bf16 flag

* add using baichuan model

* update merge

* remove

* update
2023-12-01 11:18:04 +08:00
Jason Dai
bda404fc8f Update readme (#9575) 2023-11-30 22:45:52 +08:00
Yishuo Wang
66f5b45f57 [LLM] add a llama2 gguf example (#9553) 2023-11-30 16:37:17 +08:00
Wang, Jian4
a0a80d232e LLM: Add qlora cpu distributed readme (#9561)
* init readme

* add distributed guide

* update
2023-11-30 13:42:30 +08:00
Qiyuan Gong
d85a430a8c Uing bigdl-llm-init instead of bigdl-nano-init (#9558)
* Replace `bigdl-nano-init` with `bigdl-llm-init`.
* Install `bigdl-llm` instead of `bigdl-nano`.
* Remove nano in README.
2023-11-30 10:10:29 +08:00
binbin Deng
4ff2ca9d0d LLM: fix loss error on Arc (#9550) 2023-11-29 15:16:18 +08:00
Wang, Jian4
b824754256 LLM: Update for cpu qlora mpirun (#9548) 2023-11-29 10:56:17 +08:00
Guancheng Fu
963a5c8d79 Add vLLM-XPU version's README/examples (#9536)
* test

* test

* fix last kv cache

* add xpu readme

* remove numactl for xpu example

* fix link error

* update max_num_batched_tokens logic

* add explaination

* add xpu environement version requirement

* refine gpu memory

* fix

* fix style
2023-11-28 09:44:03 +08:00
Guancheng Fu
b6c3520748 Remove xformers from vLLM-CPU (#9535) 2023-11-27 11:21:25 +08:00
binbin Deng
2b9c7d2a59 LLM: quick fix alpaca qlora finetuning script (#9534) 2023-11-27 11:04:27 +08:00
binbin Deng
6bec0faea5 LLM: support Mistral AWQ models (#9520) 2023-11-24 16:20:22 +08:00
Jason Dai
b3178d449f Update README.md (#9525) 2023-11-23 21:45:20 +08:00
Jason Dai
82898a4203 Update GPU example README (#9524) 2023-11-23 21:20:26 +08:00
Jason Dai
064848028f Update README.md (#9523) 2023-11-23 21:16:21 +08:00
Guancheng Fu
bf579507c2 Integrate vllm (#9310)
* done

* Rename structure

* add models

* Add structure/sampling_params,sequence

* add input_metadata

* add outputs

* Add policy,logger

* add and update

* add parallelconfig back

* core/scheduler.py

* Add llm_engine.py

* Add async_llm_engine.py

* Add tested entrypoint

* fix minor error

* Fix everything

* fix kv cache view

* fix

* fix

* fix

* format&refine

* remove logger from repo

* try to add token latency

* remove logger

* Refine config.py

* finish worker.py

* delete utils.py

* add license

* refine

* refine sequence.py

* remove sampling_params.py

* finish

* add license

* format

* add license

* refine

* refine

* Refine line too long

* remove exception

* so dumb style-check

* refine

* refine

* refine

* refine

* refine

* refine

* add README

* refine README

* add warning instead error

* fix padding

* add license

* format

* format

* format fix

* Refine vllm dependency (#1)

vllm dependency clear

* fix licence

* fix format

* fix format

* fix

* adapt LLM engine

* fix

* add license

* fix format

* fix

* Moving README.md to the correct position

* Fix readme.md

* done

* guide for adding models

* fix

* Fix README.md

* Add new model readme

* remove ray-logic

* refactor arg_utils.py

* remove distributed_init_method logic

* refactor entrypoints

* refactor input_metadata

* refactor model_loader

* refactor utils.py

* refactor models

* fix api server

* remove vllm.stucture

* revert by txy 1120

* remove utils

* format

* fix license

* add bigdl model

* Refer to a specfic commit

* Change code base

* add comments

* add async_llm_engine comment

* refine

* formatted

* add worker comments

* add comments

* add comments

* fix style

* add changes

---------

Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2023-11-23 16:46:45 +08:00
Heyang Sun
48fbb1eb94 support ccl (MPI) distributed mode in alpaca_qlora_finetuning_cpu (#9507) 2023-11-23 10:58:09 +08:00
Heyang Sun
11fa5a8a0e Fix QLoRA CPU dispatch_model issue about accelerate (#9506) 2023-11-23 08:41:25 +08:00
Heyang Sun
1453046938 install bigdl-llm in deepspeed cpu inference example (#9508) 2023-11-23 08:39:21 +08:00
binbin Deng
86743fb57b LLM: fix transformers version in CPU finetuning example (#9511) 2023-11-22 15:53:07 +08:00
binbin Deng
1a2129221d LLM: support resume from checkpoint in Alpaca QLoRA (#9502) 2023-11-22 13:49:14 +08:00
Ruonan Wang
076d106ef5 LLM: GPU QLoRA update to bf16 to accelerate gradient checkpointing (#9499)
* update to bf16 to accelerate gradient checkpoint

* add utils and fix ut
2023-11-21 17:08:36 +08:00
binbin Deng
b7ae572ac3 LLM: update Alpaca QLoRA finetuning example on GPU (#9492) 2023-11-21 14:22:19 +08:00
Wang, Jian4
c5cb3ab82e LLM : Add CPU alpaca qlora example (#9469)
* init

* update xpu to cpu

* update

* update readme

* update example

* update

* add refer

* add guide to train different datasets

* update readme

* update
2023-11-21 09:19:58 +08:00
binbin Deng
96fd26759c LLM: fix QLoRA finetuning example on CPU (#9489) 2023-11-20 14:31:24 +08:00
binbin Deng
3dac21ac7b LLM: add more example usages about alpaca qlora on different hardware (#9458) 2023-11-17 09:56:43 +08:00
Heyang Sun
921b263d6a update deepspeed install and run guide in README (#9441) 2023-11-17 09:11:39 +08:00
Yina Chen
d5263e6681 Add awq load support (#9453)
* Support directly loading GPTQ models from huggingface

* fix style

* fix tests

* change example structure

* address comments

* fix style

* init

* address comments

* add examples

* fix style

* fix style

* fix style

* fix style

* update

* remove

* meet comments

* fix style

---------

Co-authored-by: Yang Wang <yang3.wang@intel.com>
2023-11-16 14:06:25 +08:00
Ruonan Wang
0f82b8c3a0 LLM: update qlora example (#9454)
* update qlora example

* fix loss=0
2023-11-15 09:24:15 +08:00
Yang Wang
51d07a9fd8 Support directly loading gptq models from huggingface (#9391)
* Support directly loading GPTQ models from huggingface

* fix style

* fix tests

* change example structure

* address comments

* fix style

* address comments
2023-11-13 20:48:12 -08:00
Heyang Sun
da6bbc8c11 fix deepspeed dependencies to install (#9400)
* remove reductant parameter from deepspeed install

* Update install.sh

* Update install.sh
2023-11-13 16:42:50 +08:00
Zheng, Yi
9b5d0e9c75 Add examples for Yi-6B (#9421) 2023-11-13 10:53:15 +08:00
Wang, Jian4
ac7fbe77e2 Update qlora readme (#9416) 2023-11-12 19:29:29 +08:00
Zheng, Yi
0674146cfb Add cpu and gpu examples of distil-whisper (#9374)
* Add distil-whisper examples

* Fixes based on comments

* Minor fixes

---------

Co-authored-by: Ariadne330 <wyn2000330@126.com>
2023-11-10 16:09:55 +08:00
Ziteng Zhang
ad81b5d838 Update qlora README.md (#9422) 2023-11-10 15:19:25 +08:00
Heyang Sun
b23b91407c fix llm-init on deepspeed missing lib (#9419) 2023-11-10 13:51:24 +08:00
dingbaorong
36fbe2144d Add CPU examples of fuyu (#9393)
* add fuyu cpu examples

* add gpu example

* add comments

* add license

* remove gpu example

* fix inference time
2023-11-09 15:29:19 +08:00
binbin Deng
54d95e4907 LLM: add alpaca qlora finetuning example (#9276) 2023-11-08 16:25:17 +08:00
binbin Deng
97316bbb66 LLM: highlight transformers version requirement in mistral examples (#9380) 2023-11-08 16:05:03 +08:00
Heyang Sun
af94058203 [LLM] Support CPU deepspeed distributed inference (#9259)
* [LLM] Support CPU Deepspeed distributed inference

* Update run_deepspeed.py

* Rename

* fix style

* add new codes

* refine

* remove annotated codes

* refine

* Update README.md

* refine doc and example code
2023-11-06 17:56:42 +08:00
Jin Qiao
e6b6afa316 LLM: add aquila2 model example (#9356) 2023-11-06 15:47:39 +08:00
Yining Wang
9377b9c5d7 add CodeShell CPU example (#9345)
* add CodeShell CPU example

* fix some problems
2023-11-03 13:15:54 +08:00
Zheng, Yi
63411dff75 Add cpu examples of WizardCoder (#9344)
* Add wizardcoder example

* Minor fixes
2023-11-02 20:22:43 +08:00