Commit graph

118 commits

Author SHA1 Message Date
Ruonan Wang
c267355b35
fix three NPU benchmark issues (#12350)
* fix three issues

* limit mixed_precision for CW only
2024-11-06 19:01:01 +08:00
Jin, Qiao
7240c283a3
Add dummy model in iGPU perf (#12341)
* Add dummy model in iGPU perf

* Add dummy model in iGPU perf

* Fix
2024-11-05 17:56:10 +08:00
Ch1y0q
e54af44ed6
Add transformers_int4_npu_pipeline_win in all-in-one benchmark (#12325)
* add transformers_int4_npu_pipeline_win

* bugfix

* bugfix: wrong actual_output_len

* fix format

* bugfix & update `README.md`
2024-11-04 16:00:20 +08:00
Yuwen Hu
20755e8077
Small fix to all-in-one benchmark scripts (#12317) 2024-11-01 19:16:25 +08:00
Ch1y0q
48123af463
add npu_group_size for transformers_int4_npu_win in all-in-one benchmark api (#12316)
* add `npu_group_size` for `transformers_int4_npu_win`
small bugfix

* update
2024-11-01 18:44:27 +08:00
Ruonan Wang
3fe2ea3081
[NPU] Reuse prefill of acc lib for pipeline (#12279)
* first commit

* update example

* fix style

* update example

* embedding as const

* fix generate

* code refactor

* meet code review

* fix style

* change max_output_len to max_context_len

* fix all-in-one

* fix example

* add check for new tokens
2024-10-28 16:05:49 +08:00
Zijie Li
f7f62a3fef
Add OpenVINO performance tests to all-in-one benchmark (#12238)
* add-openvino-to-all-in-one

* update on openvino API

* Update save_openvino.py

* Update save_openvino.py

* Update save_openvino.py

* update on run.py and save_openvino

* update references

* Create openvino-requirements.txt

* fix on comments

* Small updates

* Small fix

* Fix

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-10-25 13:53:53 +08:00
Chu,Youcheng
f17cc4fdee
feat: add llama3.2-11b-vision in all in one (#12207)
* feat: add llama3.2-11b-vision in all in one

* fix: change model

* fix: change name

* fix: add a space

* fix: switch import
2024-10-16 10:32:11 +08:00
Jinhe
02399021d6
add npu load_low_bit api in all-in-one benchmark (#12103) 2024-09-20 17:56:08 +08:00
Ch1y0q
9650bf616a
add transpose_value_cache for NPU benchmark (#12092)
* add `transpose_value_cache`

* update

* update
2024-09-19 18:45:05 +08:00
binbin Deng
7f7f6c89f5
Quick fix benchmark script (#11938) 2024-08-27 15:29:27 +08:00
binbin Deng
7c8c9a0670
Update benchmark script for NPU (#11932) 2024-08-27 14:41:14 +08:00
Yuwen Hu
a0bbd8e28d
All-in-one benchmark update regarding performance mode for input length threshold (#11920)
* All-in-one benchmark update regarding performance mode input length threshold

* typo fix
2024-08-26 18:52:13 +08:00
Ruonan Wang
a0fbda5bc8
add MiniCPM-Llama3-V-2_5 into all-in-one benchmark (#11849) 2024-08-19 17:51:16 +08:00
Yuwen Hu
cfc959defa
Fixes regarding utf-8 in all-in-one benchmark (#11839) 2024-08-19 10:38:00 +08:00
Jin, Qiao
9f17234f3b
Add MiniCPM-V-2_6 to iGPU Perf (#11810)
* Add MiniCPM-V-2_6 to iGPU Perf

* keep last model in yaml

* fix MINICPM_V_IDS

* Restore tested model list

* Small fix

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-08-16 18:41:21 +08:00
Yuwen Hu
96796f95cb
Update all-in-one benchmark prompts for continuation task & lookup update for minicpmv (#11827)
* Update all-in-one benchmark prompts for continuation task

* Small fix

* Add pure-text benchmark support for minicpm-v-2_6

* Support lookahead for model.llm generate of minicpmv

* Add prompt reference

* Small update

* Small fix
2024-08-16 17:16:35 +08:00
Yuwen Hu
356281cb80
Further all-in-one benchmark update continuation task (#11784)
* Further update prompt for continuation task, and disable lookup candidate update strategy on MTL

* style fix
2024-08-14 14:39:34 +08:00
Yuwen Hu
81824ff8c9
Fix stdout in all-in-one benchmark to utf-8 (#11772) 2024-08-13 10:51:08 +08:00
Yuwen Hu
f97a77ea4e
Update all-in-one benchmark for continuation task input preparation (#11760)
* All use 8192.txt for prompt preparation for now

* Small fix

* Fix text encoding mode to utf-8

* Small update
2024-08-12 17:49:45 +08:00
Jin, Qiao
05989ad0f9
Update npu example and all in one benchmark (#11766) 2024-08-12 16:46:46 +08:00
Ruonan Wang
66fe2ee464
initial support of IPEX_LLM_PERFORMANCE_MODE (#11754)
* add perf mode

* update

* fix style
2024-08-09 19:04:09 +08:00
Zijie Li
8fb36b9f4a
add new benchmark_util.py (#11713)
* add new benchmark_util.py
2024-08-05 16:18:48 +08:00
Qiyuan Gong
0c6e0b86c0
Refine continuation get input_str (#11652)
* Remove duplicate code in continuation get input_str.
* Avoid infinite loop in all-in-one due to test_length not in the list.
2024-07-25 14:41:19 +08:00
Xu, Shuo
7f80db95eb
Change run.py in benchmark to support phi-3-vision in arc-perf (#11638)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-07-23 09:51:36 +08:00
Zhao Changmin
06745e5742
Add npu benchmark all-in-one script (#11571)
* npu benchmark
2024-07-15 10:42:37 +08:00
Xu, Shuo
1355b2ce06
Add model Qwen-VL-Chat to iGPU-perf (#11558)
* Add model Qwen-VL-Chat to iGPU-perf

* small fix

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-07-11 15:39:02 +08:00
Xu, Shuo
028ad4f63c
Add model phi-3-vision-128k-instruct to iGPU-perf benchmark (#11554)
* try to improve MiniCPM performance

* Add model phi-3-vision-128k-instruct to iGPU-perf benchmark

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-07-10 17:26:30 +08:00
Cengguang Zhang
fa81dbefd3
LLM: update multi gpu write csv in all-in-one benchmark. (#11538) 2024-07-09 11:14:17 +08:00
Jun Wang
1efb6ebe93
[ADD] add transformer_int4_fp16_loadlowbit_gpu_win api (#11511)
* [ADD] add transformer_int4_fp16_loadlowbit_gpu_win api

* [UPDATE] add int4_fp16_lowbit config and description

* [FIX] fix run.py mistake

* [FIX] fix run.py mistake

* [FIX] fix indent; change dtype=float16 to model.half()
2024-07-05 16:38:41 +08:00
Cengguang Zhang
d0b801d7bc
LLM: change write mode in all-in-one benchmark. (#11444)
* LLM: change write mode in all-in-one benchmark.

* update output style.
2024-06-27 19:36:38 +08:00
RyuKosei
05a8d051f6
Fix run.py run_ipex_fp16_gpu (#11361)
* fix a bug on run.py

* Update run.py

fixed the format problem

---------

Co-authored-by: sgwhat <ge.song@intel.com>
2024-06-20 10:29:32 +08:00
hxsz1997
44f22cba70
add config and default value (#11344)
* add config and default value

* add config in yaml

* remove lookahead and max_matching_ngram_size in config

* remove streaming and use_fp16_torch_dtype in test yaml

* update task in readme

* update commit of task
2024-06-18 15:28:57 +08:00
hxsz1997
99b309928b
Add lookahead in test_api: transformer_int4_fp16_gpu (#11337)
* add lookahead in test_api:transformer_int4_fp16_gpu

* change the short prompt of summarize

* change short prompt to cnn_64

* change short prompt of summarize
2024-06-17 17:41:41 +08:00
binbin Deng
6ea1e71af0
Update PP inference benchmark script (#11323) 2024-06-17 09:59:36 +08:00
Ruonan Wang
986af21896
fix perf test (#11295) 2024-06-13 10:35:48 +08:00
Ruonan Wang
14b1e6b699
Fix gguf_q4k (#11293)
* update embedding parameter

* update benchmark
2024-06-12 20:43:08 +08:00
Yuwen Hu
fac49f15e3
Remove manual importing ipex in all-in-one benchmark (#11272) 2024-06-11 09:32:13 +08:00
Shaojun Liu
85df5e7699
fix nightly perf test (#11251) 2024-06-07 09:33:14 +08:00
hxsz1997
b6234eb4e2
Add task in allinone (#11226)
* add task

* update prompt

* modify typos

* add more cases in summarize

* Make the summarize & QA prompt preprocessing as a util function
2024-06-06 17:22:40 +08:00
Wenjing Margaret Mao
231b968aba
Modify the check_results.py to support batch 2&4 (#11133)
* add batch 2&4 and exclude to perf_test

* modify the perf-test&437 yaml

* modify llm_performance_test.yml

* remove batch 4

* modify check_results.py to support batch 2&4

* change the batch_size format

* remove genxir

* add str(batch_size)

* change actual_test_casese in check_results file to support batch_size

* change html highlight

* less models to test html and html_path

* delete the moe model

* split batch html

* split

* use installing from pypi

* use installing from pypi - batch2

* revert cpp

* revert cpp

* merge two jobs into one, test batch_size in one job

* merge two jobs into one, test batch_size in one job

* change file directory in workflow

* try catch deal with odd file without batch_size

* modify pandas version

* change the dir

* organize the code

* organize the code

* remove Qwen-MOE

* modify based on feedback

* modify based on feedback

* modify based on second round of feedback

* modify based on second round of feedback + change run-arc.sh mode

* modify based on second round of feedback + revert config

* modify based on second round of feedback + revert config

* modify based on second round of feedback + remove comments

* modify based on second round of feedback + remove comments

* modify based on second round of feedback + revert arc-perf-test

* modify based on third round of feedback

* change error type

* change error type

* modify check_results.html

* split batch into two folders

* add all models

* move csv_name

* revert pr test

* revert pr test

---------

Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-06-05 15:04:55 +08:00
Cengguang Zhang
3eb13ccd8c
LLM: fix input length condition in deepspeed all-in-one benchmark. (#11185) 2024-06-03 10:05:43 +08:00
hxsz1997
62b2d8af6b
Add lookahead in all-in-one (#11142)
* add lookahead in allinone

* delete save to csv in run_transformer_int4_gpu

* change lookup to lookahead

* fix the error of add model.peak_memory

* Set transformer_int4_gpu as the default option

* add comment of transformer_int4_fp16_lookahead_gpu
2024-05-28 15:39:58 +08:00
Wang, Jian4
d9f71f1f53
Update benchmark util for example using (#11027)
* mv benchmark_util.py to utils/

* remove

* update
2024-05-15 14:16:35 +08:00
Xin Qiu
dfa3147278
update (#10944) 2024-05-08 14:28:05 +08:00
Cengguang Zhang
0edef1f94c
LLM: add min_new_tokens to all in one benchmark. (#10911) 2024-05-06 09:32:59 +08:00
Yuwen Hu
1a8a93d5e0
Further fix nightly perf (#10901) 2024-04-28 10:18:58 +08:00
Yuwen Hu
ddfdaec137
Fix nightly perf (#10899)
* Fix nightly perf by adding default value in benchmark for use_fp16_torch_dtype

* further fixes
2024-04-28 09:39:29 +08:00
binbin Deng
f51bf018eb
Add benchmark script for pipeline parallel inference (#10873) 2024-04-26 15:28:11 +08:00
Cengguang Zhang
eb39c61607
LLM: add min new token to perf test. (#10869) 2024-04-24 14:32:02 +08:00