3331 commits

Author SHA1 Message Date
Jinhe
18662dca1c
change 5 pytorch/huggingface models to fp16 (#11894) 2024-08-22 16:12:09 +08:00
Wang, Jian4
5c4ed00593
Add lightweight-serving whisper asr example (#11847)
* add asr init

* update for pp

* update style

* update readme

* update readme
2024-08-22 15:46:28 +08:00
Jinhe
a8e2573421
added tokenization file for codegeex2-6b in pytorch-models (#11875)
* added tokenization file

* tokenization file readme update

* optional
2024-08-22 14:37:56 +08:00
Yuwen Hu
bac98baab9
Make performance test install specific ipex-llm version from pypi (#11892) 2024-08-22 11:10:12 +08:00
binbin Deng
72a7bf624b
Support qwen2-1.5b with fused decoderlayer optimization on NPU (#11888) 2024-08-22 11:09:12 +08:00
Zijie Li
bdbe995b01
Update README.md (#11889)
Set datasets version to 2.16.1. Clear out the transformers version requirement.
2024-08-22 09:40:16 +08:00
Yina Chen
cc27321441
support chatglm4 in lookup (#11855) 2024-08-21 15:53:17 +08:00
Yina Chen
0236de3ac2
set IPEX_LLM_LAST_LM_HEAD=1 as default (#11885) 2024-08-21 15:06:12 +08:00
SONG Ge
8c5c7f32dd
Update doc for running npu generate example with ipex-llm[npu] (#11876)
* update doc for running npu generate example with ipex-llm[npu]

* switch max_prompt_len to 512 to fix compile error on mtl
2024-08-21 13:45:29 +08:00
Yang Wang
209d42ab79
Refactor npu mp to make it easier to integrate new models (#11873)
* Refactor npu mp to make it easier to integrate new models

* fix style

* move layer functions to base
2024-08-20 20:58:47 -07:00
Guancheng Fu
537c0d2767
fix vllm qwen2 models (#11879) 2024-08-21 11:05:24 +08:00
Yishuo Wang
bd1e490d62
fix phi3 (#11878) 2024-08-21 10:31:41 +08:00
Yuwen Hu
eab6f6dde4
Spr perf small fix (#11874) 2024-08-21 09:35:26 +08:00
Yuwen Hu
37106a877c
igpu performance test small fix (#11872) 2024-08-21 03:09:14 +08:00
Yang Wang
bdaeee1d63
Fix run_decoders bug (#11871) 2024-08-20 12:04:59 -07:00
Chu,Youcheng
32f0a77846
feat: update readme for ppl test (#11865)
* feat: update readme for ppl test

* fix: textual adjustments

* fix: textual adjustments

* Add ipex-llm npu option in setup.py (#11858)

* add ipex-llm npu release

* update example doc

* meet latest release changes

* optimize phi3 memory usage (#11867)

* Update `ipex-llm` default transformers version to 4.37.0 (#11859)

* Update default transformers version to 4.37.0

* Add dependency requirements for qwen and qwen-vl

* Temp fix transformers version for these not yet verified models

* Skip qwen test in UT for now as it requires transformers<4.37.0

* Update performance test regarding updated default `transformers==4.37.0` (#11869)

* Update igpu performance from transformers 4.36.2 to 4.37.0 (#11841)

* upgrade arc perf test to transformers 4.37 (#11842)

* fix load low bit com dtype (#11832)

* feat: add mixed_precision argument on ppl longbench evaluation

* fix: delete extra code

* feat: upgrade arc perf test to transformers 4.37

* fix: add missing codes

* fix: keep perf test for qwen-vl-chat in transformers 4.36

* fix: remove extra space

* fix: resolve pr comment

* fix: add empty line

* fix: add pip install for spr and core test

* fix: delete extra comments

* fix: remove python -m for pip

* Revert "fix load low bit com dtype (#11832)"

This reverts commit 6841a9ac8f.

---------

Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>

* add transformers==4.36 for qwen vl in igpu-perf (#11846)

* add transformers==4.36.2 for qwen-vl

* Small update

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>

* fix: remove qwen-7b on core test (#11851)

* fix: remove qwen-7b on core test

* fix: change delete to comment

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>

* replace filename (#11854)

* fix: remove qwen-7b on core test

* fix: change delete to comment

* fix: replace filename

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>

* fix: delete extra comments (#11863)

* Remove transformers installation for temp test purposes

* Small fix

* Small update

---------

Co-authored-by: Chu,Youcheng <70999398+cranechu0131@users.noreply.github.com>
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Co-authored-by: Zijie Li <michael20001122@gmail.com>
Co-authored-by: Chu,Youcheng <1340390339@qq.com>

* Pytorch models transformers version update (#11860)

* yi sync

* delete 4.34 constraint

* delete 4.34 constraint

* delete 4.31 constraint

* delete 4.34 constraint

* delete 4.35 constraint

* added <=4.33.3 constraint

* added <=4.33.3 constraint

* switched to chinese prompt

* Update compresskv model forward type logic (#11868)

* update

* fix

* Update local import for ppl (#11866)

Co-authored-by: jenniew <jenniewang123@gmail.com>

* fix: textual adjustment

---------

Co-authored-by: SONG Ge <38711238+sgwhat@users.noreply.github.com>
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
Co-authored-by: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Co-authored-by: Zijie Li <michael20001122@gmail.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: RyuKosei <70006706+RyuKosei@users.noreply.github.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-08-20 20:13:54 +08:00
RyuKosei
5df00869de
Update local import for ppl (#11866)
Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-08-20 18:50:00 +08:00
Yina Chen
c3c058373f
Update compresskv model forward type logic (#11868)
* update

* fix
2024-08-20 18:11:37 +08:00
Jinhe
3ee194d983
Pytorch models transformers version update (#11860)
* yi sync

* delete 4.34 constraint

* delete 4.34 constraint

* delete 4.31 constraint

* delete 4.34 constraint

* delete 4.35 constraint

* added <=4.33.3 constraint

* added <=4.33.3 constraint

* switched to chinese prompt
2024-08-20 18:01:42 +08:00
Yuwen Hu
0d58c2fdf9
Update performance test regarding updated default transformers==4.37.0 (#11869)
* Update igpu performance from transformers 4.36.2 to 4.37.0 (#11841)

* upgrade arc perf test to transformers 4.37 (#11842)

* fix load low bit com dtype (#11832)

* feat: add mixed_precision argument on ppl longbench evaluation

* fix: delete extra code

* feat: upgrade arc perf test to transformers 4.37

* fix: add missing codes

* fix: keep perf test for qwen-vl-chat in transformers 4.36

* fix: remove extra space

* fix: resolve pr comment

* fix: add empty line

* fix: add pip install for spr and core test

* fix: delete extra comments

* fix: remove python -m for pip

* Revert "fix load low bit com dtype (#11832)"

This reverts commit 6841a9ac8f.

---------

Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>

* add transformers==4.36 for qwen vl in igpu-perf (#11846)

* add transformers==4.36.2 for qwen-vl

* Small update

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>

* fix: remove qwen-7b on core test (#11851)

* fix: remove qwen-7b on core test

* fix: change delete to comment

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>

* replace filename (#11854)

* fix: remove qwen-7b on core test

* fix: change delete to comment

* fix: replace filename

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>

* fix: delete extra comments (#11863)

* Remove transformers installation for temp test purposes

* Small fix

* Small update

---------

Co-authored-by: Chu,Youcheng <70999398+cranechu0131@users.noreply.github.com>
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Co-authored-by: Zijie Li <michael20001122@gmail.com>
Co-authored-by: Chu,Youcheng <1340390339@qq.com>
2024-08-20 17:59:28 +08:00
Yuwen Hu
5e8286f72d
Update ipex-llm default transformers version to 4.37.0 (#11859)
* Update default transformers version to 4.37.0

* Add dependency requirements for qwen and qwen-vl

* Temp fix transformers version for these not yet verified models

* Skip qwen test in UT for now as it requires transformers<4.37.0
2024-08-20 17:37:58 +08:00
Yishuo Wang
d4ee0a89f3
optimize phi3 memory usage (#11867) 2024-08-20 17:32:51 +08:00
SONG Ge
5b83493b1a
Add ipex-llm npu option in setup.py (#11858)
* add ipex-llm npu release

* update example doc

* meet latest release changes
2024-08-20 17:29:49 +08:00
Heyang Sun
ee6852c915
Fix typo (#11862) 2024-08-20 16:38:11 +08:00
Yishuo Wang
2946420e14
add minicpmv 2.6 load_low_bit workaround (#11856) 2024-08-20 11:16:02 +08:00
SONG Ge
7380823f3f
Update Llama2 multi-processes example (#11852)
* update llama2 multi-processes examples

* update

* update readme

* update
2024-08-19 19:49:01 +08:00
Yang Wang
99b05ba1dc
separate prefill into a process (#11787)
* separate prefill into a process

* using model.share_memory()

* might work

* worked

* use long prompt

* refactor

* cleanup

* fix bug

* clean up

* changeable inter and intra process stages

* refactor

* add max output len

* fix npu_model changes that may cause generate down

* fix npu_model generate import error

* fix generate forward error

---------

Co-authored-by: sgwhat <ge.song@intel.com>
2024-08-19 17:53:36 +08:00
Jinhe
da3d7a3a53
delete transformers version requirement (#11845)
* delete transformers version requirement

* delete transformers version requirement
2024-08-19 17:53:02 +08:00
Ruonan Wang
a0fbda5bc8
add MiniCPM-Llama3-V-2_5 into all-in-one benchmark (#11849) 2024-08-19 17:51:16 +08:00
Yishuo Wang
9490781aec
optimize phi3 memory usage again (#11848) 2024-08-19 17:26:59 +08:00
Yina Chen
3cd4e87168
Support compress KV with quantize KV (#11812)
* update llama

* support llama 4.41

* fix style

* support minicpm

* support qwen2

* support minicpm & update

* support chatglm4

* support chatglm

* remove print

* add DynamicCompressFp8Cache & support qwen

* support llama

* support minicpm phi3

* update chatglm2/4

* small fix & support qwen 4.42

* remove print
2024-08-19 15:32:32 +08:00
Zhao Changmin
6841a9ac8f
fix load low bit com dtype (#11832) 2024-08-19 13:43:19 +08:00
Yuwen Hu
cfc959defa
Fixes regarding utf-8 in all-in-one benchmark (#11839) 2024-08-19 10:38:00 +08:00
Chu,Youcheng
46a1cbfa64
feat: add mixed_precision argument on ppl longbench evaluation (#11837)
* feat: add mixed_precision argument on ppl longbench evaluation

* fix: delete two spaces

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-19 10:00:44 +08:00
Yuwen Hu
580c94d0e2
Remove gemma-2-9b-it 3k input from igpu-perf (#11834) 2024-08-17 13:10:05 +08:00
Jin, Qiao
9f17234f3b
Add MiniCPM-V-2_6 to iGPU Perf (#11810)
* Add MiniCPM-V-2_6 to iGPU Perf

* keep last model in yaml

* fix MINICPM_V_IDS

* Restore tested model list

* Small fix

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-08-16 18:41:21 +08:00
Yuwen Hu
96796f95cb
Update all-in-one benchmark prompts for continuation task & lookup update for minicpmv (#11827)
* Update all-in-one benchmark prompts for continuation task

* Small fix

* Add pure-text benchmark support for minicpm-v-2_6

* Support lookahead for model.llm generate of minicpmv

* Add prompt reference

* Small update

* Small fix
2024-08-16 17:16:35 +08:00
Yishuo Wang
e966e85df8
force lm_head optimization in any model if set environment variable (#11830) 2024-08-16 16:48:45 +08:00
RyuKosei
3b630fb9df
updated ppl README (#11807)
* edit README.md

* update the branch

* edited README.md

* updated

* updated description

---------

Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-08-16 15:49:25 +08:00
Jinhe
e07a55665c
Codegeex2 tokenization fix (#11831)
* updated tokenizer file

* updated tokenizer file

* updated tokenizer file

* updated tokenizer file

* new folder
2024-08-16 15:48:47 +08:00
Jinhe
a508b0a902
added link to minicpm-v-2_6 example (#11829) 2024-08-16 14:49:23 +08:00
Jinhe
adfbb9124a
Reorganize MiniCPM-V-2_6 example & update other MiniCPM-V-2 examples (#11815)
* model to fp16 & 2_6 reorganize

* revisions

* revisions

* half

* deleted transformer version requirements

* deleted transformer version requirements

---------

Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-08-16 14:48:56 +08:00
Chu,Youcheng
f463268e36
fix: add run oneAPI instruction for the example of codeshell (#11828)
* fix: delete ipex extension import in ppl wikitext evaluation

* feat: add mixed_precision argument on ppl wikitext evaluation

* fix: delete mix_precision command in perplex evaluation for wikitext

* fix: remove fp16 mixed-precision argument

* fix: Add a space.

* fix: add run oneAPI instruction for the example of codeshell

* fix: textual adjustments

* fix: Textual adjustment

---------

Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-16 14:29:06 +08:00
Yishuo Wang
17a0beb21f
optimize qwen2-audio again (#11825) 2024-08-16 11:11:35 +08:00
Jason Dai
6a8d07ddb4
Update README.md (#11824) 2024-08-16 10:22:02 +08:00
Yuwen Hu
9e9086cc2a
Update IPEX_LLM_PERFORMANCE_MODE (#11823) 2024-08-16 09:48:36 +08:00
Wang, Jian4
5a80fd2633
Fix lightweight-serving no streaming resp on mtl (#11822) 2024-08-16 09:43:03 +08:00
Guancheng Fu
e70ae0638e
Fix vLLM not convert issues (#11817)
* Fix not convert issues

* refine
2024-08-15 19:04:05 +08:00
Yishuo Wang
750d4ad5dc
fix minicpm-v-2 fp16 (#11819) 2024-08-15 18:34:40 +08:00
Yuwen Hu
6543321f04
Remove 4k igpu perf on gemma-2-9b-it (#11820) 2024-08-15 18:06:19 +08:00