Commit graph

250 commits

Author SHA1 Message Date
binbin Deng
f99f188023
Hotfix of benchmark script (#12467) 2024-11-29 14:00:59 +08:00
binbin Deng
c911026f03
[NPU C++] Update model support & examples & benchmark (#12466) 2024-11-29 13:35:58 +08:00
binbin Deng
7a97fbb779
Support vpm and resampler module of minicpm-v on NPU (#12375) 2024-11-12 15:59:55 +08:00
Yuwen Hu
8fe294e01f
Small fix to all-in-one benchmark (#12362) 2024-11-07 18:56:34 +08:00
Jinhe
79f2877413
add minicpm-v models to transformers_int4_npu_win api (#12352)
* add minicpm npu

* optimize model
2024-11-07 10:05:10 +08:00
Ruonan Wang
c267355b35
fix three NPU benchmark issues (#12350)
* fix three issues

* limit mixed_precision for CW only
2024-11-06 19:01:01 +08:00
Jin, Qiao
7240c283a3
Add dummy model in iGPU perf (#12341)
* Add dummy model in iGPU perf

* Add dummy model in iGPU perf

* Fix
2024-11-05 17:56:10 +08:00
Zijie Li
45b0d371aa
update benchmark readme (#12323)
* update benchmark readme

update new comment with memory usage included

* Update README.md
2024-11-05 08:19:08 +08:00
Ch1y0q
e54af44ed6
Add transformers_int4_npu_pipeline_win in all-in-one benchmark (#12325)
* add transformers_int4_npu_pipeline_win

* bugfix

* bugfix: wrong actual_output_len

* fix format

* bugfix & update `README.md`
2024-11-04 16:00:20 +08:00
Chu,Youcheng
a01371f90b
Doc: update harness readme (#12324) 2024-11-04 14:58:54 +08:00
Yuwen Hu
20755e8077
Small fix to all-in-one benchmark scripts (#12317) 2024-11-01 19:16:25 +08:00
Ch1y0q
48123af463
add npu_group_size for transformers_int4_npu_win in all-in-one benchmark api (#12316)
* add `npu_group_size` for `transformers_int4_npu_win`
small bugfix

* update
2024-11-01 18:44:27 +08:00
Ruonan Wang
3fe2ea3081
[NPU] Reuse prefill of acc lib for pipeline (#12279)
* first commit

* update example

* fix style

* update example

* embedding as const

* fix generate

* code refactor

* meet code review

* fix style

* change max_output_len to max_context_len

* fix all-in-one

* fix example

* add check for new tokens
2024-10-28 16:05:49 +08:00
Yuwen Hu
e713296090
Update all-in-one benchmark (#12272)
* Update all-in-one benchmark

* Small fix

* Small fix

* Small fix
2024-10-25 16:52:59 +08:00
Yuwen Hu
93895b2ac2
Openvino all in one benchmark small fix (#12269)
* Small update for all-in-one benchmark readme to support OpenVINO tests

* Small fix
2024-10-25 14:13:52 +08:00
Zijie Li
f7f62a3fef
Add OpenVINO performance tests to all-in-one benchmark (#12238)
* add-openvino-to-all-in-one

* update on openvino API

* Update save_openvino.py

* Update save_openvino.py

* Update save_openvino.py

* update on run.py and save_openvino

* update references

* Create openvino-requirements.txt

* fix on comments

* Small updates

* Small fix

* Fix

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-10-25 13:53:53 +08:00
Yina Chen
e37f951cce
[NPU] Groupwise (#12241)
* dq divide

* fix

* support attn divide

* update qwen2 7b

* divide down_proj & other linear

* use concat & reduce sum

* support scale after

* support qwen2

* w/ mm

* update reshape

* spda

* split

* split 2+

* update

* lm head-> 28

* no scale

* update

* update

* update

* fix style

* fix style

* to split linear

* update

* update code

* address comments

* fix style & remove redundant code & revert benchmark scripts

* fix style & remove code

* update save & load

---------

Co-authored-by: Yang Wang <yang3.wang@intel.com>
2024-10-23 14:10:58 +08:00
Chu,Youcheng
f17cc4fdee
feat: add llama3.2-11b-vision in all in one (#12207)
* feat: add llama3.2-11b-vision in all in one

* fix: change model

* fix: change name

* fix: add a space

* fix: switch import
2024-10-16 10:32:11 +08:00
Zijie Li
7d80db710e
Add benchmark_util for transformers >= 4.44.0 (#12171)
* Create benchmark_util_4_45.py

* Update __init__.py

* Update lint-python

* Update benchmark_util_4_45.py

* Update benchmark_util_4_45.py

* Create benchmark_util_4_44.py
2024-10-14 15:40:12 +08:00
Jinhe
02399021d6
add npu load_low_bit api in all-in-one benchmark (#12103) 2024-09-20 17:56:08 +08:00
Ch1y0q
9650bf616a
add transpose_value_cache for NPU benchmark (#12092)
* add `transpose_value_cache`

* update

* update
2024-09-19 18:45:05 +08:00
Xu, Shuo
ee33b93464
Longbench: NV code to ipex-llm (#11662)
* add nv longbench

* LongBench: NV code to ipex-llm

* amend

* add more models support

* amend

* optimize LongBench's user experience

* amend

* amend

* fix typo

* amend

* remove cuda related information & add a readme

* add license to python scripts & polish the readme

* amend

* amend

---------

Co-authored-by: cyita <yitastudy@gmail.com>
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2024-09-18 15:55:14 +08:00
Chu,Youcheng
649390c464
fix: textual and env variable adjustment (#12038) 2024-09-11 13:38:01 +08:00
Guancheng Fu
69c8d36f16
Switching from vLLM v0.3.3 to vLLM 0.5.4 (#12042)
* Enable single card sync engine

* enable ipex-llm optimizations for vllm

* enable optimizations for lm_head

* Fix chatglm multi-reference problem

* Remove duplicate layer

* LLM: Update vLLM to v0.5.4 (#11746)

* Enable single card sync engine

* enable ipex-llm optimizations for vllm

* enable optimizations for lm_head

* Fix chatglm multi-reference problem

* update 0.5.4 api_server

* add dockerfile

* fix

* fix

* refine

* fix

---------

Co-authored-by: gc-fu <guancheng.fu@intel.com>

* Add vllm-0.5.4 Dockerfile (#11838)

* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)

* Fix vLLM not convert issues (#11817) (#11918)

* Fix not convert issues

* refine

Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>

* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969)

* init

* update mlp forward

* fix minicpm error in vllm 0.5.4

* fix dependabot alerts (#12008)

* Update 0.5.4 dockerfile (#12021)

* Add vllm awq loading logic (#11987)

* [ADD] Add vllm awq loading logic

* [FIX] fix the module.linear_method path

* [FIX] fix quant_config path error

* Enable Qwen padding mlp to 256 to support batch_forward (#12030)

* Enable padding mlp

* padding to 256

* update style

* Install 27191 runtime in 0.5.4 docker image (#12040)

* fix rebase error

* fix rebase error

* vLLM: format for 0.5.4 rebase (#12043)

* format

* Update model_convert.py

* Fix serving docker related modifications (#12046)

* Fix undesired modifications (#12048)

* fix

* Refine offline_inference arguments

---------

Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: Jun Wang <thoughts.times@gmail.com>
Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
2024-09-10 15:37:43 +08:00
Chu,Youcheng
16c658e732
LLM: add known issues to harness evaluation (#12036)
* feat: add known issue in harness

* fix: resolve comments

* fix: small fixes
2024-09-09 14:15:42 +08:00
Chu,Youcheng
ae7302a654
add gptq option for ppl test (#11921)
* feat:add gptq for ppl

* fix: add an empty line

* fix: add an empty line

* fix: remove an empty line

* Resolve comments

* Resolve comments

* Resolve comments
2024-08-30 13:43:48 +08:00
binbin Deng
7f7f6c89f5
Quick fix benchmark script (#11938) 2024-08-27 15:29:27 +08:00
binbin Deng
7c8c9a0670
Update benchmark script for NPU (#11932) 2024-08-27 14:41:14 +08:00
Yuwen Hu
a0bbd8e28d
All-in-one benchmark update regarding performance mode for input length threshold (#11920)
* All-in-one benchmark update regarding performance mode input length threshold

* typo fix
2024-08-26 18:52:13 +08:00
Yina Chen
0236de3ac2
set IPEX_LLM_LAST_LM_HEAD=1 as default (#11885) 2024-08-21 15:06:12 +08:00
Chu,Youcheng
32f0a77846
feat: update readme for ppl test (#11865)
* feat: update readme for ppl test

* fix: textual adjustments

* fix: textual adjustments

* Add ipex-llm npu option in setup.py (#11858)

* add ipex-llm npu release

* update example doc

* meet latest release changes

* optimize phi3 memory usage (#11867)

* Update `ipex-llm` default transformers version to 4.37.0 (#11859)

* Update default transformers version to 4.37.0

* Add dependency requirements for qwen and qwen-vl

* Temp fix transformers version for these not yet verified models

* Skip qwen test in UT for now as it requires transformers<4.37.0

* Update performance test regarding updated default `transformers==4.37.0` (#11869)

* Update igpu performance from transformers 4.36.2 to 4.37.0 (#11841)

* upgrade arc perf test to transformers 4.37 (#11842)

* fix load low bit com dtype (#11832)

* feat: add mixed_precision argument on ppl longbench evaluation

* fix: delete extra code

* feat: upgrade arc perf test to transformers 4.37

* fix: add missing codes

* fix: keep perf test for qwen-vl-chat in transformers 4.36

* fix: remove extra space

* fix: resolve pr comment

* fix: add empty line

* fix: add pip install for spr and core test

* fix: delete extra comments

* fix: remove python -m for pip

* Revert "fix load low bit com dtype (#11832)"

This reverts commit 6841a9ac8f.

---------

Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>

* add transformers==4.36 for qwen vl in igpu-perf (#11846)

* add transformers==4.36.2 for qwen-vl

* Small update

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>

* fix: remove qwen-7b on core test (#11851)

* fix: remove qwen-7b on core test

* fix: change delete to comment

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>

* replace filename (#11854)

* fix: remove qwen-7b on core test

* fix: change delete to comment

* fix: replace filename

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>

* fix: delete extra comments (#11863)

* Remove transformers installation for temp test purposes

* Small fix

* Small update

---------

Co-authored-by: Chu,Youcheng <70999398+cranechu0131@users.noreply.github.com>
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Co-authored-by: Zijie Li <michael20001122@gmail.com>
Co-authored-by: Chu,Youcheng <1340390339@qq.com>

* Pytorch models transformers version update (#11860)

* yi sync

* delete 4.34 constraint

* delete 4.34 constraint

* delete 4.31 constraint

* delete 4.34 constraint

* delete 4.35 constraint

* added <=4.33.3 constraint

* added <=4.33.3 constraint

* switched to chinese prompt

* Update compresskv model forward type logic (#11868)

* update

* fix

* Update local import for ppl (#11866)

Co-authored-by: jenniew <jenniewang123@gmail.com>

* fix: textual adjustment

---------

Co-authored-by: SONG Ge <38711238+sgwhat@users.noreply.github.com>
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
Co-authored-by: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Co-authored-by: Zijie Li <michael20001122@gmail.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: RyuKosei <70006706+RyuKosei@users.noreply.github.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-08-20 20:13:54 +08:00
RyuKosei
5df00869de
Update local import for ppl (#11866)
Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-08-20 18:50:00 +08:00
Ruonan Wang
a0fbda5bc8
add MiniCPM-Llama3-V-2_5 into all-in-one benchmark (#11849) 2024-08-19 17:51:16 +08:00
Yuwen Hu
cfc959defa
Fixes regarding utf-8 in all-in-one benchmark (#11839) 2024-08-19 10:38:00 +08:00
Chu,Youcheng
46a1cbfa64
feat: add mixed_precision argument on ppl longbench evaluation (#11837)
* feat: add mixed_precision argument on ppl longbench evaluation

* fix: delete two spaces

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-19 10:00:44 +08:00
Jin, Qiao
9f17234f3b
Add MiniCPM-V-2_6 to iGPU Perf (#11810)
* Add MiniCPM-V-2_6 to iGPU Perf

* keep last model in yaml

* fix MINICPM_V_IDS

* Restore tested model list

* Small fix

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-08-16 18:41:21 +08:00
Yuwen Hu
96796f95cb
Update all-in-one benchmark prompts for continuation task & lookup update for minicpmv (#11827)
* Update all-in-one benchmark prompts for continuation task

* Small fix

* Add pure-text benchmark support for minicpm-v-2_6

* Support lookahead for model.llm generate of minicpmv

* Add prompt reference

* Small update

* Small fix
2024-08-16 17:16:35 +08:00
RyuKosei
3b630fb9df
updated ppl README (#11807)
* edit README.md

* update the branch

* edited README.md

* updated

* updated description

---------

Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-08-16 15:49:25 +08:00
Chu,Youcheng
28d1c972da
add mixed_precision argument on ppl wikitext evaluation (#11813)
* fix: delete ipex extension import in ppl wikitext evaluation

* feat: add mixed_precision argument on ppl wikitext evaluation

* fix: delete mix_precision command in perplex evaluation for wikitext

* fix: remove fp16 mixed-precision argument

* fix: Add a space.

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-15 17:58:53 +08:00
Chu,Youcheng
3ac83f8396
fix: delete ipex extension import in ppl wikitext evaluation (#11806)
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-15 13:40:01 +08:00
Yuwen Hu
356281cb80
Further all-in-one benchmark update continuation task (#11784)
* Further update prompt for continuation task, and disable lookup candidate update strategy on MTL

* style fix
2024-08-14 14:39:34 +08:00
Yuwen Hu
81824ff8c9
Fix stdout in all-in-one benchmark to utf-8 (#11772) 2024-08-13 10:51:08 +08:00
Yuwen Hu
f97a77ea4e
Update all-in-one benchmark for continuation task input preparation (#11760)
* All use 8192.txt for prompt preparation for now

* Small fix

* Fix text encoding mode to utf-8

* Small update
2024-08-12 17:49:45 +08:00
Jin, Qiao
05989ad0f9
Update npu example and all in one benchmark (#11766) 2024-08-12 16:46:46 +08:00
Ruonan Wang
66fe2ee464
initial support of IPEX_LLM_PERFORMANCE_MODE (#11754)
* add perf mode

* update

* fix style
2024-08-09 19:04:09 +08:00
Zijie Li
e7f7141781
Add benchmark util for transformers 4.42 (#11725)
* add new benchmark_util.py

Add new benchmark_util.py for transformers>=4.43.1. The old one is renamed to benchmark_util_prev.py.

* Small fix to import code

* Update __init__.py

* fix file names

* Update lint-python

Update lint-python to exclude benchmark_util_4_29.py
benchmark_util_4_43.py

* Update benchmark_util_4_43.py

* add benchmark_util for transformers 4.42
2024-08-07 08:48:07 +08:00
Zijie Li
8fb36b9f4a
add new benchmark_util.py (#11713)
* add new benchmark_util.py
2024-08-05 16:18:48 +08:00
RyuKosei
1da1f1dd0e
Combine two versions of run_wikitext.py (#11597)
* Combine two versions of run_wikitext.py

* Update run_wikitext.py

* Update run_wikitext.py

* aligned the format

* update error display

* simplified argument parser

---------

Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-07-29 15:56:16 +08:00
Yishuo Wang
7f88ce23cd
add more gemma2 optimization (#11673) 2024-07-29 11:13:00 +08:00
Yishuo Wang
3e8819734b
add basic gemma2 optimization (#11672) 2024-07-29 10:46:51 +08:00