Commit graph

87 commits

Author SHA1 Message Date
RyuKosei
05a8d051f6
Fix run.py run_ipex_fp16_gpu (#11361)
* fix a bug on run.py

* Update run.py

fixed the format problem

---------

Co-authored-by: sgwhat <ge.song@intel.com>
2024-06-20 10:29:32 +08:00
hxsz1997
44f22cba70
add config and default value (#11344)
* add config and default value

* add config in yaml

* remove lookahead and max_matching_ngram_size in config

* remove streaming and use_fp16_torch_dtype in test yaml

* update task in readme

* update commit of task
2024-06-18 15:28:57 +08:00
hxsz1997
99b309928b
Add lookahead in test_api: transformer_int4_fp16_gpu (#11337)
* add lookahead in test_api:transformer_int4_fp16_gpu

* change the short prompt of summarize

* change short prompt to cnn_64

* change short prompt of summarize
2024-06-17 17:41:41 +08:00
binbin Deng
6ea1e71af0
Update PP inference benchmark script (#11323) 2024-06-17 09:59:36 +08:00
Ruonan Wang
986af21896
fix perf test (#11295) 2024-06-13 10:35:48 +08:00
Ruonan Wang
14b1e6b699
Fix gguf_q4k (#11293)
* update embedding parameter

* update benchmark
2024-06-12 20:43:08 +08:00
Yuwen Hu
fac49f15e3
Remove manual importing ipex in all-in-one benchmark (#11272) 2024-06-11 09:32:13 +08:00
Shaojun Liu
85df5e7699
fix nightly perf test (#11251) 2024-06-07 09:33:14 +08:00
hxsz1997
b6234eb4e2
Add task in allinone (#11226)
* add task

* update prompt

* modify typos

* add more cases in summarize

* Make the summarize & QA prompt preprocessing as a util function
2024-06-06 17:22:40 +08:00
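The "Make the summarize & QA prompt preprocessing as a util function" step above can be sketched as a small helper. This is an illustration only: the function name `build_prompt` and the exact templates are assumptions, not the real util in the all-in-one benchmark.

```python
def build_prompt(task: str, context: str, question: str = "") -> str:
    """Build a benchmark prompt for a given task (hypothetical templates;
    the real all-in-one benchmark's wording may differ)."""
    if task == "summarize":
        return f"Summarize the following text:\n{context}\nSummary:"
    if task == "QA":
        return (f"Answer the question based on the context.\n"
                f"Context: {context}\nQuestion: {question}\nAnswer:")
    raise ValueError(f"unknown task: {task}")
```

Centralizing the templates this way keeps the per-API runner functions free of task-specific string handling.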
Wenjing Margaret Mao
231b968aba
Modify the check_results.py to support batch 2&4 (#11133)
* add batch 2&4 and exclude to perf_test

* modify the perf-test&437 yaml

* modify llm_performance_test.yml

* remove batch 4

* modify check_results.py to support batch 2&4

* change the batch_size format

* remove genxir

* add str(batch_size)

* change actual_test_cases in check_results file to support batch_size

* change html highlight

* less models to test html and html_path

* delete the moe model

* split batch html

* split

* use installing from pypi

* use installing from pypi - batch2

* revert cpp

* revert cpp

* merge two jobs into one, test batch_size in one job

* merge two jobs into one, test batch_size in one job

* change file directory in workflow

* try catch deal with odd file without batch_size

* modify pandas version

* change the dir

* organize the code

* organize the code

* remove Qwen-MOE

* modify based on feedback

* modify based on feedback

* modify based on second round of feedback

* modify based on second round of feedback + change run-arc.sh mode

* modify based on second round of feedback + revert config

* modify based on second round of feedback + revert config

* modify based on second round of feedback + remove comments

* modify based on second round of feedback + remove comments

* modify based on second round of feedback + revert arc-perf-test

* modify based on third round of feedback

* change error type

* change error type

* modify check_results.html

* split batch into two folders

* add all models

* move csv_name

* revert pr test

* revert pr test

---------

Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-05 15:04:55 +08:00
Cengguang Zhang
3eb13ccd8c
LLM: fix input length condition in deepspeed all-in-one benchmark. (#11185) 2024-06-03 10:05:43 +08:00
hxsz1997
62b2d8af6b
Add lookahead in all-in-one (#11142)
* add lookahead in allinone

* delete save to csv in run_transformer_int4_gpu

* change lookup to lookahead

* fix the error of add model.peak_memory

* Set transformer_int4_gpu as the default option

* add comment of transformer_int4_fp16_lookahead_gpu
2024-05-28 15:39:58 +08:00
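The lookahead-related commits above (adding `lookahead` and `max_matching_ngram_size` to the config, then selecting them per test API) suggest a dispatch like the following sketch. The function shape and kwarg names are assumptions drawn only from the option names mentioned in these commit messages.

```python
def generation_kwargs(test_api: str, n_predict: int) -> dict:
    """Pick generate() kwargs per test API (a sketch; the exact kwargs
    ipex-llm's lookahead generation accepts are assumptions)."""
    kwargs = {"max_new_tokens": n_predict}
    if "lookahead" in test_api:
        # hypothetical option names, taken from the config keys above
        kwargs["lookahead"] = 2
        kwargs["max_matching_ngram_size"] = 2
    return kwargs
```

With this shape, `transformer_int4_fp16_lookahead_gpu` picks up the extra options while the default `transformer_int4_gpu` path is unchanged.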
Wang, Jian4
d9f71f1f53
Update benchmark util for example using (#11027)
* mv benchmark_util.py to utils/

* remove

* update
2024-05-15 14:16:35 +08:00
Xin Qiu
dfa3147278
update (#10944) 2024-05-08 14:28:05 +08:00
Cengguang Zhang
0edef1f94c
LLM: add min_new_tokens to all in one benchmark. (#10911) 2024-05-06 09:32:59 +08:00
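Adding `min_new_tokens` to the benchmark makes the measured output length deterministic: pinning it to the same value as `max_new_tokens` stops an early EOS from shortening a run and skewing per-token latency. A minimal sketch (the helper name is an assumption):

```python
def fixed_length_kwargs(n_predict: int) -> dict:
    """Pin output length so latency is comparable across runs:
    min_new_tokens == max_new_tokens prevents early-EOS shortening."""
    return {"min_new_tokens": n_predict, "max_new_tokens": n_predict}
```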
Yuwen Hu
1a8a93d5e0
Further fix nightly perf (#10901) 2024-04-28 10:18:58 +08:00
Yuwen Hu
ddfdaec137
Fix nightly perf (#10899)
* Fix nightly perf by adding default value in benchmark for use_fp16_torch_dtype

* further fixes
2024-04-28 09:39:29 +08:00
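The fix above (a default value for `use_fp16_torch_dtype`) is the usual pattern for keeping older `config.yaml` files working when a new key is introduced. A sketch, assuming `dict.get` defaults; the key names follow these commits, the default values are assumptions:

```python
def parse_api_params(conf: dict) -> dict:
    """Fill in defaults for optional config keys so older config.yaml
    files keep working (defaults here are assumptions)."""
    return {
        "use_fp16_torch_dtype": conf.get("use_fp16_torch_dtype", False),
        "streaming": conf.get("streaming", False),
    }
```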
binbin Deng
f51bf018eb
Add benchmark script for pipeline parallel inference (#10873) 2024-04-26 15:28:11 +08:00
Cengguang Zhang
eb39c61607
LLM: add min new token to perf test. (#10869) 2024-04-24 14:32:02 +08:00
Wang, Jian4
23c6a52fb0
LLM: Fix ipex torchscript=True error (#10832)
* remove

* update

* remove torchscript
2024-04-22 15:53:09 +08:00
yb-peng
2685c41318
Modify all-in-one benchmark (#10726)
* Update 8192 prompt in all-in-one

* Add cpu_embedding param for linux api

* Update run.py

* Update README.md
2024-04-11 13:38:50 +08:00
yb-peng
2d88bb9b4b
add test api transformer_int4_fp16_gpu (#10627)
* add test api transformer_int4_fp16_gpu

* update config.yaml and README.md in all-in-one

* modify run.py in all-in-one

* re-order test-api

* re-order test-api in config

* modify README.md in all-in-one

* modify README.md in all-in-one

* modify config.yaml

---------

Co-authored-by: pengyb2001 <arda@arda-arc21.sh.intel.com>
Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-04-07 15:47:17 +08:00
binbin Deng
d9a1153b4e
LLM: upgrade deepspeed in AutoTP on GPU (#10647) 2024-04-07 14:05:19 +08:00
binbin Deng
27be448920
LLM: add cpu_embedding and peak memory record for deepspeed autotp script (#10621) 2024-04-02 17:32:50 +08:00
Ruonan Wang
d6af4877dd
LLM: remove ipex.optimize for gpt-j (#10606)
* remove ipex.optimize

* fix

* fix
2024-04-01 12:21:49 +08:00
Ruonan Wang
ea4bc450c4
LLM: add esimd sdp for pvc (#10543)
* add esimd sdp for pvc

* update

* fix

* fix batch
2024-03-26 19:04:40 +08:00
Wang, Jian4
9df70d95eb
Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm

* rm python/llm/src/bigdl

* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
binbin Deng
85ef3f1d99 LLM: add empty cache in deepspeed autotp benchmark script (#10488) 2024-03-21 10:51:23 +08:00
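Emptying the device cache between runs keeps one model's leftover allocations from inflating the next run's peak-memory reading. A best-effort sketch of what the deepspeed autotp script presumably does on XPU; the `torch` import is guarded so the sketch runs anywhere:

```python
import gc

def empty_device_cache() -> bool:
    """Best-effort cache cleanup between benchmark runs (a sketch;
    the real script targets XPU specifically)."""
    gc.collect()
    try:
        import torch  # optional in this sketch
        if hasattr(torch, "xpu") and torch.xpu.is_available():
            torch.xpu.empty_cache()
        elif torch.cuda.is_available():
            torch.cuda.empty_cache()
    except Exception:
        pass
    return True
```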
Xiangyu Tian
5a5fd5af5b LLM: Add speculative benchmark on CPU/XPU (#10464)
Add speculative benchmark on CPU/XPU.
2024-03-21 09:51:06 +08:00
Xiangyu Tian
cbe24cc7e6 LLM: Enable BigDL IPEX Int8 (#10480)
Enable BigDL IPEX Int8
2024-03-20 15:59:54 +08:00
Jin Qiao
e41d556436 LLM: change fp16 benchmark to model.half (#10477)
* LLM: change fp16 benchmark to model.half

* fix
2024-03-20 13:38:39 +08:00
Jin Qiao
e9055c32f9 LLM: fix fp16 mem record in benchmark (#10461)
* LLM: fix fp16 mem record in benchmark

* change style
2024-03-19 16:17:23 +08:00
Jin Qiao
0451103a43 LLM: add int4+fp16 benchmark script for windows benchmarking (#10449)
* LLM: add fp16 for benchmark script

* remove transformer_int4_fp16_loadlowbit_gpu_win
2024-03-19 11:11:25 +08:00
Xiangyu Tian
0ded0b4b13 LLM: Enable BigDL IPEX optimization for int4 (#10319)
Enable BigDL IPEX optimization for int4
2024-03-12 17:08:50 +08:00
binbin Deng
5d996a5caf LLM: add benchmark script for deepspeed autotp on gpu (#10380) 2024-03-12 15:19:57 +08:00
WeiguangHan
fd81d66047 LLM: Compress some models to save space (#10315)
* LLM: compress some models to save space

* add deleted comments
2024-03-04 17:53:03 +08:00
Yuwen Hu
27d9a14989 [LLM] all-on-one update: memory optimize and streaming output (#10302)
* Memory saving for continuous in-out pair run and add support for streaming output on MTL iGPU

* Small fix

* Small fix

* Add things back
2024-03-01 18:02:30 +08:00
Yuwen Hu
21de2613ce [LLM] Add model loading time record for all-in-one benchmark (#10201)
* Add model loading time record in csv for all-in-one benchmark

* Small fix

* Small fix to number after .
2024-02-22 13:57:18 +08:00
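Recording model loading time in the CSV, as in the commit above, can be sketched with a simple timer around the loader. The CSV column name and the `timed_load` helper are assumptions for illustration:

```python
import csv
import io
import time

def timed_load(load_fn):
    """Measure model loading time (helper name is an assumption)."""
    start = time.perf_counter()
    model = load_fn()
    return model, time.perf_counter() - start

# Usage sketch: write the timing into the benchmark CSV row.
model, load_s = timed_load(lambda: object())  # stand-in for the real loader
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["model", "load time (s)"])
writer.writeheader()
writer.writerow({"model": "demo", "load time (s)": round(load_s, 2)})
```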
Yuwen Hu
001c13243e [LLM] Add support for low_low_bit benchmark on Windows GPU (#10167)
* Add support for low_low_bit performance test on Windows GPU

* Small fix

* Small fix

* Save memory during converting model process

* Drop the results for first time when loading in low bit on mtl igpu for better performance

* Small fix
2024-02-21 10:51:52 +08:00
dingbaorong
36c9442c6d Arc Stable version test (#10087)
* add batch_size in stable version test

* add batch_size in excludes

* add excludes for batch_size

* fix ci

* trigger regression test

* fix xpu version

* disable ci

* address kai's comment

---------

Co-authored-by: Ariadne <wyn2000330@126.com>
2024-02-06 10:23:50 +08:00
WeiguangHan
c2e562d037 LLM: add batch_size to the csv and html (#10080)
* LLM: add batch_size to the csv and html

* small fix
2024-02-04 16:35:44 +08:00
WeiguangHan
d2d3f6b091 LLM: ensure the result of daily arc perf test (#10016)
* ensure the result of daily arc perf test

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* concat more csvs

* small fix

* revert some files
2024-01-31 18:26:21 +08:00
Xin Qiu
7952bbc919 add conf batch_size to run_model (#10010) 2024-01-26 15:48:48 +08:00
Ziteng Zhang
8b08ad408b Add batch_size in all_in_one (#9999)
Add batch_size in all_in_one, except run_native_int4
2024-01-25 17:43:49 +08:00
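The commit above threads `batch_size` through every test API except `run_native_int4`. A sketch of that exclusion logic; the function shape is an assumption, while the API names come from the commit message:

```python
def apply_batch_size(test_api: str, kwargs: dict, batch_size: int) -> dict:
    """Add batch_size to a runner's kwargs, except for run_native_int4
    (per the commit above; function shape is an assumption)."""
    if test_api != "run_native_int4":
        kwargs = {**kwargs, "batch_size": batch_size}
    return kwargs
```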
Xin Qiu
610b5226be move reserved memory to benchmark_utils.py (#9907)
* move reserved memory to benchmark_utils.py

* meet code review
2024-01-19 09:44:30 +08:00
WeiguangHan
100e0a87e5 LLM: add compressed chatglm3 model (#9892)
* LLM: add compressed chatglm3 model

* small fix

* revert github action
2024-01-18 17:48:15 +08:00
Ruonan Wang
b059a32fff LLM: add benchmark api for bigdl-llm fp16 on GPU (#9919)
* add bmk for bigdl fp16

* fix
2024-01-17 14:24:35 +08:00
WeiguangHan
0e69bfe6b0 LLM: fix the performance drop of starcoder (#9889)
* LLM: fix the performance drop of starcoder

* small fix

* small fix
2024-01-12 09:14:15 +08:00
Ziteng Zhang
4f4ce73f31 [LLM] Add transformer_autocast_bf16 into all-in-one (#9890)
* Add transformer_autocast_bf16 into all-in-one
2024-01-11 17:51:07 +08:00
WeiguangHan
33fd1f9c76 LLM: fix input length logic for run_transformer_int4_gpu (#9864)
* LLM: fix input length logic for run_transformer_int4_gpu

* small fix

* small fix

* small fix
2024-01-10 18:20:14 +08:00