Commit graph

1811 commits

binbin Deng
bf1bcf4a14 add official Mixtral model support (#9663) 2023-12-12 22:27:07 +08:00
Ziteng Zhang
8931f2eb62 [LLM] Fix transformer qwen size mismatch and rename causal_mask (#9655)
* Fix size mismatching caused by context_layer
* Change registered_causal_mask to causal_mask
2023-12-12 20:57:40 +08:00
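The fix above is about keeping the cached causal mask's shape in step with the current attention scores. A minimal sketch of that pattern, with hypothetical names (`apply_causal_mask`, a boolean lower-triangular `causal_mask`) rather than the actual Qwen patch:

```python
import torch

def apply_causal_mask(attn_scores: torch.Tensor, causal_mask: torch.Tensor) -> torch.Tensor:
    # Slice the full cached boolean mask down to the current query/key
    # lengths so its shape matches attn_scores (batch, heads, q_len, k_len);
    # a stale full-size mask is the usual source of this kind of mismatch.
    q_len, k_len = attn_scores.shape[-2], attn_scores.shape[-1]
    mask = causal_mask[..., k_len - q_len:k_len, :k_len]
    return attn_scores.masked_fill(~mask, torch.finfo(attn_scores.dtype).min)
```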
binbin Deng
2fe38b4b9b LLM: add mixtral GPU examples (#9661) 2023-12-12 20:26:36 +08:00
Yuwen Hu
968d99e6f5 Remove empty cache between each iteration of generation (#9660) 2023-12-12 17:24:06 +08:00
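For context, a minimal sketch of the pattern this commit removes, assuming an Intel GPU (XPU) generation loop; `generate_n_times` is illustrative, not code from the repo:

```python
import torch
import intel_extension_for_pytorch as ipex  # noqa: F401  (registers the XPU backend)

def generate_n_times(model, inputs, n=3, max_new_tokens=32):
    outputs = []
    for _ in range(n):
        outputs.append(model.generate(**inputs, max_new_tokens=max_new_tokens))
        # torch.xpu.empty_cache()  # removed: flushing the allocator every
        # iteration forces cached blocks to be re-allocated and slows runs
    return outputs
```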
Xin Qiu
0e639b920f disable test_optimized_model.py temporarily due to out of memory on A730M(pr validation machine) (#9658)
* disable test_optimized_model.py

* disable seq2seq
2023-12-12 17:13:52 +08:00
binbin Deng
59ce86d292 LLM: support optimize_model=True for Mixtral model (#9657) 2023-12-12 16:41:26 +08:00
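Usage follows the usual bigdl-llm loading path; a sketch assuming the public Mixtral checkpoint name:

```python
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # example checkpoint id
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,    # INT4 weight quantization
    optimize_model=True,  # the model-level optimizations enabled by this PR
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```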
Yuwen Hu
017932a7fb Small fix for html generation (#9656) 2023-12-12 14:06:18 +08:00
WeiguangHan
1e25499de0 LLM: test new oneapi (#9654)
* test new oneapi

* revert llm_performance_tests.yml
2023-12-12 11:12:14 +08:00
Yuwen Hu
d272b6dc47 [LLM] Enable generation of html again for win igpu tests (#9652)
* Enable generation of html again and comment out rwkv for 32-512 as it is not very stable

* Small fix
2023-12-11 19:15:17 +08:00
WeiguangHan
afa895877c LLM: fix the issue that may generate blank html (#9650)
* LLM: fix the issue that may generate blank html

* resolve some comments
2023-12-11 19:14:57 +08:00
Yining Wang
a04a027b4c Edit gpu doc (#9583)
* harness: run llama2-7b

* harness: run llama2-7b

* harness: run llama2-7b

* harness: run llama2-7b

* edit-gpu-doc

* fix some format problem

* fix spelling problems

* fix evaluation yml

* delete redundant space

* fix some problems

* address comments

* change link
2023-12-11 14:59:07 +08:00
ZehuaCao
45721f3473 verify llava (#9649) 2023-12-11 14:26:05 +08:00
Heyang Sun
9f02f96160 [LLM] support for Yi AWQ model (#9648) 2023-12-11 14:07:34 +08:00
Xin Qiu
82255f9726 Enable fused layernorm (#9614)
* bloom layernorm

* fix

* layernorm

* fix

* fix

* fix

* style fix

* fix

* replace nn.LayerNorm
2023-12-11 09:26:13 +08:00
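The "replace nn.LayerNorm" step boils down to a recursive module swap. A sketch under the assumption that `FusedLayerNorm` stands in for the real fused-kernel wrapper:

```python
import torch
from torch import nn

class FusedLayerNorm(nn.Module):
    # Stand-in for the fused implementation; a real version dispatches to a
    # single fused kernel instead of separate mean/variance/scale ops.
    def __init__(self, ln: nn.LayerNorm):
        super().__init__()
        self.weight, self.bias = ln.weight, ln.bias
        self.normalized_shape, self.eps = ln.normalized_shape, ln.eps

    def forward(self, x):
        return nn.functional.layer_norm(
            x, self.normalized_shape, self.weight, self.bias, self.eps)

def replace_layernorm(model: nn.Module):
    # Walk the module tree and swap every nn.LayerNorm for the fused version.
    for name, child in model.named_children():
        if isinstance(child, nn.LayerNorm):
            setattr(model, name, FusedLayerNorm(child))
        else:
            replace_layernorm(child)
```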
Jason Dai
84a19705a6 Update readme (#9617) 2023-12-09 19:23:14 +08:00
Yuwen Hu
894d0aaf5e [LLM] iGPU win perf test reorg based on in-out pairs (#9645)
* trigger pr temporarily

* Separate benchmark run for win igpu based on in-out pairs

* Rename fix

* Test workflow

* Small fix

* Skip generation of html for now

* Change back to nightly triggered
2023-12-08 20:46:40 +08:00
Chen, Zhentao
972cdb9992 gsm8k OOM workaround (#9597)
* update bigdl_llm.py

* update the installation of harness

* fix partial function

* import ipex

* force seq len in decreasing order

* put func outside class

* move comments

* default 'trust_remote_code' to True

* Update llm-harness-evaluation.yml
2023-12-08 18:47:25 +08:00
WeiguangHan
1ff4bc43a6 degrade pandas version (#9643) 2023-12-08 17:44:51 +08:00
Yina Chen
70f5e7bf0d Support peft LoraConfig (#9636)
* support peft loraconfig

* use testcase to test

* fix style

* meet comments
2023-12-08 16:13:03 +08:00
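A sketch of what this enables, assuming the module path from the QLoRA examples (`bigdl.llm.transformers.qlora`) and an example base checkpoint:

```python
from peft import LoraConfig
from bigdl.llm.transformers import AutoModelForCausalLM
from bigdl.llm.transformers.qlora import get_peft_model  # path assumed from the QLoRA examples

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # example base checkpoint
    load_in_low_bit="nf4",
    trust_remote_code=True,
)
config = LoraConfig(
    r=8, lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj"],
    lora_dropout=0.05, bias="none", task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)  # a vanilla peft.LoraConfig is now accepted directly
```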
Xin Qiu
0b6f29a7fc add fused rms norm for Yi and Qwen (#9640) 2023-12-08 16:04:38 +08:00
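Fused RMS norm replaces the separate pow/mean/rsqrt/mul ops with one kernel. For reference, the unfused math it is equivalent to (the usual Llama-style RMSNorm):

```python
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Normalize by the root-mean-square over the last dimension, then scale;
    # a fused kernel computes this in a single pass.
    variance = x.pow(2).mean(-1, keepdim=True)
    return weight * x * torch.rsqrt(variance + eps)
```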
Xin Qiu
5636b0ba80 set new linear status (#9639) 2023-12-08 11:02:49 +08:00
binbin Deng
499100daf1 LLM: Add solution to fix oneccl related error (#9630) 2023-12-08 10:51:55 +08:00
ZehuaCao
d204125e88 [LLM] Build a slimmer docker image for k8s (#9608)
* Create Dockerfile.k8s

* Update Dockerfile

Slimmer standalone image

* Update Dockerfile

* Update Dockerfile.k8s

* Update bigdl-qlora-finetuing-entrypoint.sh

* Update qlora_finetuning_cpu.py

* Update alpaca_qlora_finetuning_cpu.py

Following this [pr](https://github.com/intel-analytics/BigDL/pull/9551/files#diff-2025188afa54672d21236e6955c7c7f7686bec9239532e41c7983858cc9aaa89), update the LoraConfig

* update

* update

* update

* update

* update

* update

* update

* update transformer version

* update Dockerfile

* update Docker image name

* fix error
2023-12-08 10:25:36 +08:00
ZehuaCao
6eca8a8bb5 update transformer version (#9631) 2023-12-08 09:36:00 +08:00
WeiguangHan
e9299adb3b LLM: Highlight some values in the html (#9635)
* highlight some values in the html

* revert the llm_performance_tests.yml
2023-12-07 19:02:41 +08:00
Yuwen Hu
6f34978b94 [LLM] Add more performance tests for win iGPU (more in-out pairs, RWKV model) (#9626)
* Add supports for loading rwkv models using from_pretrained api

* Temporarily enable pr tests

* Add RWKV in tests and more in-out pairs

* Add rwkv for 512 tests

* Make iterations smaller

* Change back to nightly trigger
2023-12-07 18:55:16 +08:00
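With this change RWKV goes through the same loading path as other architectures; a sketch, with a hypothetical checkpoint id:

```python
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "RWKV/rwkv-4-world-7b",  # hypothetical checkpoint id
    load_in_4bit=True,
    trust_remote_code=True,
)
```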
Ruonan Wang
d9b0c01de3 LLM: fix unlora module in qlora finetune (#9621)
* fix unlora module

* split train and inference
2023-12-07 16:32:02 +08:00
Heyang Sun
3811cf43c9 [LLM] update AWQ documents (#9623)
* [LLM] update AWQ and verified models' documents

* refine

* refine links

* refine
2023-12-07 16:02:20 +08:00
Yishuo Wang
7319f2c227 use fused mlp in baichuan2 (#9620) 2023-12-07 15:50:57 +08:00
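Baichuan2's MLP is a SwiGLU block; "fused MLP" means computing its projections and activation in fewer kernel launches. For reference, a sketch of the unfused math being fused:

```python
import torch
from torch import nn

class SwiGLUMLP(nn.Module):
    # Reference (unfused) Baichuan2-style MLP: down(SiLU(gate(x)) * up(x)).
    # A fused implementation merges the gate/up projections and activation
    # into one kernel launch instead of three separate ops.
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x):
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))
```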
Xiangyu Tian
deee65785c [LLM] vLLM: Delete last_kv_cache before prefilling (#9619)
Remove last_kv_cache before prefilling to reduce peak memory usage.
2023-12-07 11:32:33 +08:00
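A minimal sketch of the pattern, with hypothetical names (`prefill`, `last_kv_cache`); the point is that the old cache is released before the prefill allocation so the two never coexist:

```python
def prefill(self, input_ids):
    # Drop the reference to the previous step's KV cache before prefilling;
    # otherwise the old and new caches coexist and raise peak memory.
    self.last_kv_cache = None
    return self.model(input_ids, use_cache=True)
```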
Yuwen Hu
48b85593b3 Update all-in-one benchmark readme (#9618) 2023-12-07 10:32:09 +08:00
Xiangyu Tian
0327169b50 [LLM] vLLM: fix memory leak in prepare_kv_cache (#9616)
Revert modification in prepare_kv_cache to fix memory leak.
2023-12-07 10:08:18 +08:00
Xin Qiu
13d47955a8 use fused rms norm in chatglm2 and baichuan (#9613)
* use fused rms norm in chatglm2 and baichuan

* style fix
2023-12-07 09:21:41 +08:00
Jason Dai
51b668f229 Update GGUF readme (#9611) 2023-12-06 18:21:54 +08:00
dingbaorong
a7bc89b3a1 remove q4_1 in gguf example (#9610)
* remove q4_1

* fixes
2023-12-06 16:00:05 +08:00
Yina Chen
404e101ded QALora example (#9551)
* Support qa-lora

* init

* update

* update

* update

* update

* update

* update merge

* update

* fix style & update scripts

* update

* address comments

* fix typo

* fix typo

---------

Co-authored-by: Yang Wang <yang3.wang@intel.com>
2023-12-06 15:36:21 +08:00
Guancheng Fu
6978b2c316 [VLLM] Change padding patterns for vLLM & clean code (#9609)
* optimize

* fix minor error

* optimizations

* fix style
2023-12-06 15:27:26 +08:00
dingbaorong
89069d6173 Add gpu gguf example (#9603)
* add gpu gguf example

* some fixes

* address kai's comments

* address json's comments
2023-12-06 15:17:54 +08:00
Yuwen Hu
0e8f4020e5 Add traceback error output for win igpu test api in benchmark (#9607) 2023-12-06 14:35:16 +08:00
Ziteng Zhang
aeb77b2ab1 Add minimum Qwen model version (#9606) 2023-12-06 11:49:14 +08:00
Yuwen Hu
c998f5f2ba [LLM] iGPU long context tests (#9598)
* Temp enable PR

* Enable tests for 256-64

* Try again 128-64

* Empty cache after each iteration for igpu benchmark scripts

* Try tests for 512

* change order for 512

* Skip chatglm3 and llama2 for now

* Separate tests for 512-64

* Small fix

* Further fixes

* Change back to nightly again
2023-12-06 10:19:20 +08:00
Heyang Sun
4e70e33934 [LLM] code and document for distributed qlora (#9585)
* [LLM] code and document for distributed qlora

* doc

* refine for gradient checkpoint

* refine

* Update alpaca_qlora_finetuning_cpu.py

* Update alpaca_qlora_finetuning_cpu.py

* Update alpaca_qlora_finetuning_cpu.py

* add link in doc
2023-12-06 09:23:17 +08:00
Zheng, Yi
d154b38bf9 Add llama2 gpu low memory example (#9514)
* Add low memory example

* Minor fixes

* Update readme.md
2023-12-05 17:29:48 +08:00
Jason Dai
06febb5fa7 Update readme for FP8/FP4 inference examples (#9601) 2023-12-05 15:59:03 +08:00
dingbaorong
a66fbedd7e add gpu more data types example (#9592)
* add gpu more data types example

* add int8
2023-12-05 15:45:38 +08:00
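The data type is selected through load_in_low_bit; a sketch with an example checkpoint, showing the INT8 variant this commit adds:

```python
import intel_extension_for_pytorch as ipex  # noqa: F401  (enables the "xpu" device)
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # example checkpoint id
    load_in_low_bit="sym_int8",       # other values include "sym_int4", "fp8", "fp4"
    trust_remote_code=True,
).to("xpu")
```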
Ziteng Zhang
65934c9f4f [LLM] Fix Qwen causal_mask and attention_mask size mismatching (#9600)
* Fix #9582, caused by Qwen's modification of modeling_qwen.py in 7f62181c94 (d2h-049182)
2023-12-05 15:15:54 +08:00
Jinyi Wan
b721138132 Add cpu and gpu examples for BlueLM (#9589)
* Add cpu int4 example for BlueLM

* add example optimize_model cpu for bluelm

* add example gpu int4 blueLM

* add example optimize_model GPU for bluelm

* Fixing naming issues and BigDL package version.

* Fixing naming issues...

* Add BlueLM in README.md "Verified Models"
2023-12-05 13:59:02 +08:00
Guancheng Fu
8b00653039 fix doc (#9599) 2023-12-05 13:49:31 +08:00
Qiyuan Gong
f211f136b6 Configurable TORCH_LINEAR_THRESHOLD from env (#9588)
* Add TORCH_LINEAR_THRESHOLD from env (BIGDL_LLM_LINEAR_THRESHOLD)
* Change default to 512
2023-12-05 13:19:47 +08:00
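A sketch of the pattern this commit introduces, showing only the env read and the new default (the routing logic itself lives in the linear implementation):

```python
import os

# Threshold now comes from the BIGDL_LLM_LINEAR_THRESHOLD environment
# variable, with the new default of 512; inputs past the threshold take
# the plain torch linear path instead of the custom kernel.
TORCH_LINEAR_THRESHOLD = int(os.getenv("BIGDL_LLM_LINEAR_THRESHOLD", "512"))
```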
Yuwen Hu
1012507a40 [LLM] Fix performance tests (#9596)
* Fix missing key for cpu_embedding

* Remove 512 as it stuck for now

* Small fix
2023-12-05 10:59:28 +08:00