Commit graph

962 commits

Author SHA1 Message Date
Xin Qiu
30795bdfbc Gemma optimization: rms_norm, kv_cache, fused_rope, fused_rope+qkv (#10212)
* gemma optimization

* update

* update

* fix style

* meet code review
2024-02-23 10:07:24 +08:00
Guoqiong Song
63681af97e falcon for transformers 4.36 (#9960)
* falcon for transformers 4.36
2024-02-22 17:04:40 -08:00
Jason Dai
84d5f40936 Update README.md (#10213) 2024-02-22 17:22:59 +08:00
Yina Chen
ce5840a8b7 GPT-J rope optimization on xpu (#10182)
* optimize

* update

* fix style & move use_fuse_rope

* add ipex version check

* fix style

* update

* fix style

* meet comments

* address comments

* fix style
2024-02-22 16:25:12 +08:00
Xiangyu Tian
f445217d02 LLM: Update IPEX to 2.2.0+cpu and Refactor for _ipex_optimize (#10189)
Update IPEX to 2.2.0+cpu and refactor for _ipex_optimize.
2024-02-22 16:01:11 +08:00
Heyang Sun
c876d9b5ca Support for MPT rotary embedding (#10208) 2024-02-22 15:16:31 +08:00
Ruonan Wang
5e1fee5e05 LLM: add GGUF-IQ2 examples (#10207)
* add iq2 examples

* small fix

* meet code review

* fix

* meet review

* small fix
2024-02-22 14:18:45 +08:00
Yuwen Hu
21de2613ce [LLM] Add model loading time record for all-in-one benchmark (#10201)
* Add model loading time record in csv for all-in-one benchmark

* Small fix

* Small fix to number after .
2024-02-22 13:57:18 +08:00
Ovo233
60e11b6739 LLM: Add mlp layer unit tests (#10200)
* add mlp layer unit tests

* add download baichuan-13b

* exclude llama for now

* install additional packages

* rename bash file

* switch to Baichuan2

* delete attention related code

* fix name errors in yml file
2024-02-22 13:44:45 +08:00
SONG Ge
ca1166a0e5 [LLM] Add quantize kv_cache for Baichuan2-13B (#10203)
* add quantize kv_cache for baichuan2-13b

* style fix
2024-02-22 13:43:35 +08:00
Ruonan Wang
34ee1aa91f LLM: add esimd sdp support for chatglm3 (#10205)
* add esimd sdp support

* fix style
2024-02-22 13:37:16 +08:00
Yuxuan Xia
7cbc2429a6 Fix C-Eval ChatGLM loading issue (#10206)
* Add c-eval workflow and modify running files

* Modify the chatglm evaluator file

* Modify the ceval workflow for triggering test

* Modify the ceval workflow file

* Modify the ceval workflow file

* Modify ceval workflow

* Adjust the ceval dataset download

* Add ceval workflow dependencies

* Modify ceval workflow dataset download

* Add ceval test dependencies

* Add ceval test dependencies

* Correct the result print

* Fix the nightly test trigger time

* Fix ChatGLM loading issue
2024-02-22 10:00:43 +08:00
Yuwen Hu
94cb16fe40 [LLM] Small updates to Win GPU Install Doc (#10199)
* Make Offline installer as default for win gpu doc for oneAPI

* Small other fixes
2024-02-21 17:58:40 +08:00
binbin Deng
9975b029c5 LLM: add qlora finetuning example using trl.SFTTrainer (#10183) 2024-02-21 16:40:04 +08:00
Ruonan Wang
f7c96b19ef LLM: support iq2 for mixtral (#10191)
* support name mapping for mixtral

* support mixtral mixed quantization

* fix style

* fix
2024-02-21 16:00:29 +08:00
yb-peng
b1a97b71a9 Harness eval: Add is_last parameter and fix logical operator in highlight_vals (#10192)
* Add is_last parameter and fix logical operator in highlight_vals

* Add script to update HTML files in parent folder

* Add running update_html_in_parent_folder.py in summarize step

* Add licence info

* Remove update_html_in_parent_folder.py in Summarize the results for pull request
2024-02-21 14:45:32 +08:00
Zhicun
c7e839e66c Add Qwen1.5-7B-Chat (#10113)
* add Qwen1.5-7B-Chat

* modify Qwen1.5 example

* update README

* update prompt format

* update folder name and example README

* add Chinese prompt sample output

* update link in README

* correct the link

* update transformer version
2024-02-21 13:29:29 +08:00
Xin Qiu
56ad781f2f qwen2 cpu fix (#10187) 2024-02-21 11:23:51 +08:00
Chen, Zhentao
39d37bd042 upgrade harness package version in workflow (#10188)
* upgrade harness

* update readme
2024-02-21 11:21:30 +08:00
Yuwen Hu
001c13243e [LLM] Add support for low_low_bit benchmark on Windows GPU (#10167)
* Add support for low_low_bit performance test on Windows GPU

* Small fix

* Small fix

* Save memory during converting model process

* Drop the results for first time when loading in low bit on mtl igpu for better performance

* Small fix
2024-02-21 10:51:52 +08:00
Ziteng Zhang
276ef0e885 Speculative Ziya on CPU (#10160)
* Speculative Ziya on CPU

* Without part of Accelerate with BIGDL_OPT_IPEX
2024-02-21 10:30:39 +08:00
Zhao Changmin
4fbf449c2d for rwkv4 (#10179) 2024-02-21 10:11:10 +08:00
yb-peng
de3dc609ee Modify harness evaluation workflow (#10174)
* Modify table head in harness

* Specify the file path of fp16.csv

* change run to run nightly and run pr to debug

* Modify the way to get fp16.csv to downloading from github

* Change the method to calculate diff in html table

* Change the method to calculate diff in html table

* Re-arrange job order

* Re-arrange job order

* Change limit

* Change fp16.csv  path

* Change highlight rules

* Change limit
2024-02-20 18:55:43 +08:00
Ruonan Wang
3288acb8de LLM : Support embedding quantization (only q2k now) (#10170)
* basic logic added

* basic support

* support save&load, update mixed strategy

* fix style

* use int8 for lm_head

* add check for xpu
2024-02-20 16:56:57 +08:00
hxsz1997
6e10d98a8d Fix some typos (#10175)
* add llm-ppl workflow

* update the DATASET_DIR

* test multiple precisions

* modify nightly test

* match the updated ppl code

* add matrix.include

* fix the include error

* update the include

* add more model

* update the precision of include

* update nightly time and add more models

* fix the workflow_dispatch description, change default model of pr and modify the env

* modify workflow_dispatch language options

* modify options

* modify language options

* modeify workflow_dispatch type

* modify type

* modify the type of language

* change seq_len type

* fix some typos

* revert changes to stress_test.txt
2024-02-20 14:14:53 +08:00
Zhicun
add3899311 Add ziya CPU example (#10114)
* ziya on CPU

* add README for ziya

* specify use_cache

* add arc CPU

* update prompt format

* update link

* add comments to emphasize use_cache

* update pip cmd
2024-02-20 13:59:52 +08:00
binbin Deng
2bb96c775c LLM: fix device setting during saving optimized model (#10154) 2024-02-20 09:52:59 +08:00
Xin Qiu
1f6d5b9f30 enable fused rmsnorm and rope qwen2 (#10163)
* qwen2

* change convert

* cleanup
2024-02-20 08:33:09 +08:00
yb-peng
e31210ba00 Modify html table style and add fp16.csv in harness (#10169)
* Specify the version of pandas in harness evaluation workflow

* Specify the version of pandas in harness evaluation workflow

* Modify html table style and add fp16.csv in harness

* Modify comments
2024-02-19 18:13:40 +08:00
WeiguangHan
6c09aed90d LLM: add qwen_1.5_7b model for arc perf test (#10166)
* LLM: add qwen_1.5_7b model for arc perf test

* small fix

* revert some codes
2024-02-19 17:21:00 +08:00
Yuxuan Xia
209122559a Add Ceval workflow and modify the result printing (#10140)
* Add c-eval workflow and modify running files

* Modify the chatglm evaluator file

* Modify the ceval workflow for triggering test

* Modify the ceval workflow file

* Modify the ceval workflow file

* Modify ceval workflow

* Adjust the ceval dataset download

* Add ceval workflow dependencies

* Modify ceval workflow dataset download

* Add ceval test dependencies

* Add ceval test dependencies

* Correct the result print
2024-02-19 17:06:53 +08:00
Zhao Changmin
f8730e8dc1 Skip rescale rwkv linear when load_low_bit (#10164)
* rwkv_ld
2024-02-19 15:56:42 +08:00
Heyang Sun
3e2af5ec0a Fix IPEX Baichuan Speculative (#10162)
* Fix IPEX Baichuan Speculative

* compatible with 13B

* Update speculative.py
2024-02-19 15:27:34 +08:00
Yina Chen
23c91cdce6 [LLM] Add min_step_draft in speculative decoding (#10142)
* Fix gptj kvcache & position id

* Add min_draft_tokens in speculative decoding

* fix style

* update
2024-02-19 14:31:41 +08:00
Chen, Zhentao
14ba2c5135 Harness: remove deprecated files (#10165) 2024-02-19 14:27:49 +08:00
Wang, Jian4
d3591383d5 LLM : Add CPU chatglm3 speculative example (#10004)
* init chatglm

* update

* update
2024-02-19 13:38:52 +08:00
Wang, Jian4
f2417e083c LLM: enable chatglm3-6b target_model ipex (#10085)
* init

* always make casual_mask

* not return last tensor

* update

* optimize_model = False

* enable optimized=False

* enable optimized_model=true

* speed_up ipex target_model

* remove if True

* use group_size

* update python style

* update

* update
2024-02-19 13:38:32 +08:00
Heyang Sun
177273c1a4 IPEX Speculative Support for Baichuan2 7B (#10112)
* IPEX Speculative Support for Baichuan2 7B

* fix license problems

* refine
2024-02-19 09:12:57 +08:00
Yina Chen
1508d6b089 Fix gptj kvcache & position id (#10141) 2024-02-18 10:02:49 +08:00
yb-peng
b4dc33def6 In harness-evaluation workflow, add statistical tables (#10118)
* chnage storage

* fix typo

* change label

* change label to arc03

* change needs in the last step

* add generate csv in harness/make_table_results.py

* modify needs in the last job

* add csv to html

* mfix path issue in llm-harness-summary-nightly

* modify output_path

* modify args in make_table_results.py

* modify make table command in summary

* change pr env label

* remove irrelevant code in summary; add set output path step; add limit in harness run

* re-organize code structure

* modify limit in run harness

* modify csv_to_html input path

* modify needs in summary-nightly
2024-02-08 19:01:05 +08:00
Yishuo Wang
4d33aac7f9 quick fix qwen2 fp8 kv cache (#10135) 2024-02-08 17:04:59 +08:00
Cengguang Zhang
39d90839aa LLM: add quantize kv cache for llama. (#10086)
* feat: add quantize kv cache for llama.

* fix style.

* add quantized attention forward function.

* revert style.

* fix style.

* fix style.

* update quantized kv cache and add quantize_qkv

* fix style.

* fix style.

* optimize quantize kv cache.

* fix style.
2024-02-08 16:49:22 +08:00
Yishuo Wang
d848efe17c add quantize kv cache support for qwen2 (#10134) 2024-02-08 16:17:21 +08:00
SONG Ge
3f79128ed7 [LLM] Enable kv_cache optimization for Qwen2 on transformers-v4.37.0 (#10131)
* add support for kv_cache optimization on transformers-v4.37.0

* enable attention forward

* style fix

* disable rotary for now
2024-02-08 14:20:26 +08:00
Ruonan Wang
063dc145ac LLM: basic support for q2k (#10132)
* basic support for q2k

* fix style
2024-02-08 13:52:01 +08:00
binbin Deng
11fe5a87ec LLM: add Modelscope model example (#10126) 2024-02-08 11:18:07 +08:00
Cengguang Zhang
0cf6a12691 LLM: add default torch_dtype for fp16. (#10124)
* set default torch_dtype for fp16.

* fix style.

* bug fix.

* update bug fix.
2024-02-08 10:24:16 +08:00
Yishuo Wang
1aa0c623ce disable fused layer norm on UHD (#10130) 2024-02-08 10:20:01 +08:00
Yuwen Hu
a8450fc300 [LLM] Support MLP optimization for Qwen1.5 (#10123) 2024-02-08 09:15:34 +08:00
Yuwen Hu
81ed65fbe7 [LLM] Add qwen1.5-7B in iGPU perf (#10127)
* Add qwen1.5 test config yaml with transformers 4.37.0

* Update for yaml file
2024-02-07 22:31:20 +08:00
Jin Qiao
0fcfbfaf6f LLM: add rwkv5 eagle GPU HF example (#10122)
* LLM: add rwkv5 eagle example

* fix

* fix link
2024-02-07 16:58:29 +08:00
binbin Deng
925f82107e LLM: support models hosted by modelscope (#10106) 2024-02-07 16:46:36 +08:00
binbin Deng
c1ec3d8921 LLM: update FAQ about too many open files (#10119) 2024-02-07 15:02:24 +08:00
Keyan (Kyrie) Zhang
2e80701f58 Unit test on final logits and the logits of the last attention layer (#10093)
* Add unit test on final logits and attention

* Add unit test on final logits and attention

* Modify unit test on final logits and attention
2024-02-07 14:25:36 +08:00
Yuxuan Xia
3832eb0ce0 Add ChatGLM C-Eval Evaluator (#10095)
* Add ChatGLM ceval evaluator

* Modify ChatGLM Evaluator Reference
2024-02-07 11:27:06 +08:00
Jin Qiao
63050c954d fix (#10117) 2024-02-07 11:05:11 +08:00
Jin Qiao
d3d2ee1b63 LLM: add speech T5 GPU example (#10090)
* add speech t5 example

* fix

* fix
2024-02-07 10:50:02 +08:00
Jin Qiao
2f4c754759 LLM: add bark gpu example (#10091)
* add bark gpu example

* fix

* fix license

* add bark

* add example

* fix

* another way
2024-02-07 10:47:11 +08:00
Xiangyu Tian
8953acd7d6 [LLM] Fix log condition for BIGDL_OPT_IPEX (#10115)
Fix log condition for BIGDL_OPT_IPEX
2024-02-07 10:27:10 +08:00
SONG Ge
0eccb94d75 remove text-generation-webui from bigdl repo (#10107) 2024-02-06 17:46:52 +08:00
Ovo233
2aaa21c41d LLM: Update ppl tests (#10092)
* update ppl tests

* use load_dataset api

* add exception handling

* add language argument

* address comments
2024-02-06 17:31:48 +08:00
Yuwen Hu
3a46b57253 [LLM] Add RWKV4 HF GPU Example (#10105)
* Add GPU HF example for RWKV 4

* Add link to rwkv4

* fix
2024-02-06 16:30:24 +08:00
Yuwen Hu
518ef95abc Small fix for Nonetype error (#10104) 2024-02-06 14:58:52 +08:00
Ruonan Wang
d61f4905ac LLM: 2bit quantization initial support (#10042)
* basis quantize support

* fix new module name

* small update

* and mixed int4 with iq2_xxs

* remove print

* code refactor

* fix style

* meet code review
2024-02-06 14:58:32 +08:00
dingbaorong
36c9442c6d Arc Stable version test (#10087)
* add batch_size in stable version test

* add batch_size in excludes

* add excludes for batch_size

* fix ci

* triger regression test

* fix xpu version

* disable ci

* address kai's comment

---------

Co-authored-by: Ariadne <wyn2000330@126.com>
2024-02-06 10:23:50 +08:00
Jiao Wang
33b9e7744d fix dimension (#10097) 2024-02-05 15:07:38 -08:00
SONG Ge
4b02ff188b [WebUI] Add prompt format and stopping words for Qwen (#10066)
* add prompt format and stopping_words for qwen mdoel

* performance optimization

* optimize

* update

* meet comments
2024-02-05 18:23:13 +08:00
WeiguangHan
0aecd8637b LLM: small fix for the html script (#10094) 2024-02-05 17:27:34 +08:00
Zhicun
7d2be7994f add phixtral and optimize phi-moe (#10052) 2024-02-05 11:12:47 +08:00
Zhicun
676d6923f2 LLM: modify transformersembeddings.embed() in langchain (#10051) 2024-02-05 10:42:10 +08:00
Jin Qiao
ad050107b3 LLM: fix mpt load_low_bit issue (#10075)
* fix

* retry

* retry
2024-02-05 10:17:07 +08:00
SONG Ge
9050991e4e fix gradio check issue temply (#10082) 2024-02-04 16:46:29 +08:00
WeiguangHan
c2e562d037 LLM: add batch_size to the csv and html (#10080)
* LLM: add batch_size to the csv and html

* small fix
2024-02-04 16:35:44 +08:00
binbin Deng
7e49fbc5dd LLM: make finetuning examples more common for other models (#10078) 2024-02-04 16:03:52 +08:00
Heyang Sun
90f004b80b remove benchmarkwrapper form deepspeed example (#10079) 2024-02-04 15:42:15 +08:00
Ruonan Wang
8e33cb0f38 LLM: support speecht5_tts (#10077)
* support speecht5_tts

* fix
2024-02-04 13:26:42 +08:00
ivy-lv11
428b7105f6 Add HF and PyTorch example InternLM2 (#10061) 2024-02-04 10:25:55 +08:00
Yina Chen
77be19bb97 LLM: Support gpt-j in speculative decoding (#10067)
* gptj

* support gptj in speculative decoding

* fix

* update readme

* small fix
2024-02-02 14:54:55 +08:00
SONG Ge
19183ef476 [WebUI] Reset bigdl-llm loader options with default value (#10064)
* reset bigdl-llm loader options with default value

* remove options which maybe complex for naive users
2024-02-01 15:45:39 +08:00
Xin Qiu
6e0f1a1e92 use apply_rotary_pos_emb_cache_freq_xpu in mixtral (#10060)
* use apply_rotary_pos_emb_cache_freq_xpu in mixtral

* fix style
2024-02-01 15:40:49 +08:00
binbin Deng
aae20d728e LLM: Add initial DPO finetuning example (#10021) 2024-02-01 14:18:08 +08:00
Heyang Sun
601024f418 Mistral CPU example of speculative decoding (#10024)
* Mistral CPU example of speculative decoding

* update transformres version

* update example

* Update README.md
2024-02-01 10:52:32 +08:00
Heyang Sun
968e70544d Enable IPEX Mistral in Speculative (#10059) 2024-02-01 10:48:16 +08:00
Yina Chen
3ca03d4e97 Add deepmind sample into bigdl-llm speculative decoding (#10041)
* migrate deepmind sample

* update

* meet comments

* fix style

* fix style
2024-02-01 09:57:02 +08:00
WeiguangHan
d2d3f6b091 LLM: ensure the result of daily arc perf test (#10016)
* ensure the result of daily arc perf test

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* concat more csvs

* small fix

* revert some files
2024-01-31 18:26:21 +08:00
WeiguangHan
9724939499 temporarily disable bloom 2k input (#10056) 2024-01-31 17:49:12 +08:00
Jin Qiao
8c8fc148c9 LLM: add rwkv 5 (#10048) 2024-01-31 15:54:55 +08:00
WeiguangHan
a9018a0e95 LLM: modify the GPU example for redpajama model (#10044)
* LLM: modify the GPU example for redpajama model

* small fix
2024-01-31 14:32:08 +08:00
Yuxuan Xia
95636cad97 Add AutoGen CPU and XPU Example (#9980)
* Add AutoGen example

* Adjust AutoGen README

* Adjust AutoGen README

* Change AutoGen README

* Change AutoGen README
2024-01-31 11:31:18 +08:00
Heyang Sun
7284edd9b7 Vicuna CPU example of speculative decoding (#10018)
* Vicuna CPU example of speculative decoding

* Update speculative.py

* Update README.md

* add requirements for ipex

* Update README.md

* Update speculative.py

* Update speculative.py
2024-01-31 11:23:50 +08:00
Wang, Jian4
7e5cd42a5c LLM : Update optimize ipex bf16 (#10038)
* use 4.35.2 and remove

* update rmsnorm

* remove

* remove

* update python style

* update

* update python style

* update

* fix style

* update

* remove whitespace
2024-01-31 10:59:55 +08:00
Wang, Jian4
fb53b994f8 LLM : Add llama ipex optimized (#10046)
* init ipex

* remove padding
2024-01-31 10:38:46 +08:00
Ruonan Wang
3685622f29 LLM: fix llama 4.36 forward(#10047) 2024-01-31 10:31:10 +08:00
Yishuo Wang
53a5140eff Optimize rwkv v5 rest token again (#10043) 2024-01-31 10:01:11 +08:00
Heyang Sun
b1ff28ceb6 LLama2 CPU example of speculative decoding (#9962)
* LLama2 example of speculative decoding

* add docs

* Update speculative.py

* Update README.md

* Update README.md

* Update speculative.py

* remove autocast
2024-01-31 09:45:20 +08:00
WeiguangHan
0fcad6ce14 LLM: add gpu example for redpajama models (#10040) 2024-01-30 19:39:28 +08:00
Xiangyu Tian
9978089796 [LLM] Enable BIGDL_OPT_IPEX in speculative baichuan2 13b example (#10028)
Enable BIGDL_OPT_IPEX in speculative baichuan2 13b example
2024-01-30 17:11:37 +08:00
Ovo233
226f398c2a fix ppl test errors (#10036) 2024-01-30 16:26:21 +08:00
Xin Qiu
13e61738c5 hide detail memory for each token in benchmark_utils.py (#10037) 2024-01-30 16:04:17 +08:00
Ruonan Wang
6b63ba23d1 LLM: add full module name during convert (#10035) 2024-01-30 14:43:07 +08:00