Yishuo Wang
d8c044e79d
optimize minicpm3 kv cache ( #12052 )
2024-09-10 16:51:21 +08:00
Wang, Jian4
5d3ab16a80
Add vllm glm and baichuan padding ( #12053 )
2024-09-10 15:57:28 +08:00
Guancheng Fu
69c8d36f16
Switching from vLLM v0.3.3 to vLLM 0.5.4 ( #12042 )
...
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* Remove duplicate layer
* LLM: Update vLLM to v0.5.4 (#11746 )
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* update 0.5.4 api_server
* add dockerfile
* fix
* fix
* refine
* fix
---------
Co-authored-by: gc-fu <guancheng.fu@intel.com>
* Add vllm-0.5.4 Dockerfile (#11838 )
* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957 )
* Fix vLLM not convert issues (#11817 ) (#11918 )
* Fix not convert issues
* refine
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969 )
* init
* update mlp forward
* fix minicpm error in vllm 0.5.4
* fix dependabot alerts (#12008 )
* Update 0.5.4 dockerfile (#12021 )
* Add vllm awq loading logic (#11987 )
* [ADD] Add vllm awq loading logic
* [FIX] fix the module.linear_method path
* [FIX] fix quant_config path error
* Enable Qwen padding mlp to 256 to support batch_forward (#12030 )
* Enable padding mlp
* padding to 256
* update style
* Install 27191 runtime in 0.5.4 docker image (#12040 )
* fix rebase error
* fix rebase error
* vLLM: format for 0.5.4 rebase (#12043 )
* format
* Update model_convert.py
* Fix serving docker related modifications (#12046 )
* Fix undesired modifications (#12048 )
* fix
* Refine offline_inference arguments
---------
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: Jun Wang <thoughts.times@gmail.com>
Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
2024-09-10 15:37:43 +08:00
Ch1y0q
73a4360f3f
update lowbit path for baichuan2, qwen2, generate.py ( #12051 )
...
* update lowbit path for baichuan2, qwen2, `generate.py`
* update readme
2024-09-10 15:35:24 +08:00
Ruonan Wang
dc4af02b2a
Fix qwen2 1.5B NPU load error ( #12049 )
2024-09-10 14:41:18 +08:00
Yishuo Wang
abc370728c
optimize minicpm3 again ( #12047 )
2024-09-10 14:19:57 +08:00
Ch1y0q
f0061a9916
remove local import os to fix Baichuan NPU load issue ( #12044 )
2024-09-10 14:13:24 +08:00
Ruonan Wang
640998edea
update inter_pp of qwen2 ( #12041 )
2024-09-10 10:34:17 +08:00
Yishuo Wang
048b4590aa
add basic minicpm3 optimization ( #12039 )
2024-09-09 17:25:08 +08:00
Chu,Youcheng
16c658e732
LLM: add known issues to harness evaluation ( #12036 )
...
* feat: add known issue to harness
* fix: resolve comments
* fix: small fixes
2024-09-09 14:15:42 +08:00
Yishuo Wang
6cedb601e4
remove some useless code ( #12035 )
2024-09-06 17:51:08 +08:00
binbin Deng
d2e1b9aaff
Add input padding during prefill for qwen2-7b ( #12033 )
2024-09-06 16:39:59 +08:00
Yuwen Hu
f61b1785fb
Small update to NPU example readme ( #12034 )
...
* Small update to NPU example readme
* Small fix
2024-09-06 15:54:23 +08:00
Ruonan Wang
0d04531ae0
update NPU readme of Qwen2 ( #12032 )
...
* update readme
* update broadcast
2024-09-06 15:02:39 +08:00
Yang Wang
58555bd9de
Optimize broadcast for npu llama ( #12028 )
2024-09-06 13:28:20 +08:00
binbin Deng
5b18bb3c4a
Add recommend version for mtl npu ( #12024 )
2024-09-05 16:28:53 +08:00
binbin Deng
845e5dc89e
Support lm_head of minicpm-2b on NPU ( #12019 )
2024-09-05 16:19:22 +08:00
Ch1y0q
820f8a4554
add --lowbit-path option for NPU llama example ( #12020 )
...
* add option `--lowbit-path`
* add descriptions in `README.md` and formatting
* Update llama.py
2024-09-05 15:31:01 +08:00
Guoqiong Song
8803242f5c
fix llama on cpu ( #12018 )
2024-09-04 19:17:54 -07:00
Wang, Jian4
b3b2cd64b4
Support lightweight-serving glm-4v-9b ( #11994 )
...
* enable glm-4v-9b serving
* update readme
* update for no image input
2024-09-05 09:25:08 +08:00
Yishuo Wang
b1408a1f1c
fix UT ( #12005 )
2024-09-04 18:02:49 +08:00
Wang, Jian4
2b993ad479
vllm update for glm-4 model automatic not_convert ( #12003 )
2024-09-04 13:50:32 +08:00
Ruonan Wang
9eaff5e47d
add save & load support for NPU optimized model ( #11999 )
...
* add save & load support
* fix style
2024-09-03 20:53:22 +08:00
Yuwen Hu
6eb55653ba
Performance mode strategy update for input_embeds input ( #11997 )
2024-09-03 17:46:16 +08:00
Jinhe
164f47adbd
MiniCPM-V-2 & MiniCPM-Llama3-V-2_5 example updates ( #11988 )
...
* minicpm example updates
* --stream
2024-09-03 17:02:06 +08:00
Jin, Qiao
2e54f4402b
Rename MiniCPM-V-2_6 CPU example ( #11998 )
2024-09-03 16:50:42 +08:00
binbin Deng
01099f08ee
Revert prefill logic of qwen2-7b ( #11992 )
2024-09-03 14:45:01 +08:00
Yuwen Hu
659d15defc
Fix wrong attention mask and garbage output for inputs_embeds inputs during lookup generation ( #11989 )
...
* Fix garbage output for input_embeds inputs during lookup generation
* Fix on sliding windows
* Simplify code
2024-09-02 19:09:12 +08:00
binbin Deng
2f3d1bd0ec
hotfix qwen2-7b weight setting ( #11991 )
2024-09-02 18:11:08 +08:00
binbin Deng
a40ea7038d
Fix AttributeError of qwen2-1.5B ( #11990 )
2024-09-02 17:55:10 +08:00
Yang Wang
c48817bd43
Support Qwen2-7b MLP in int4 and transpose_value_cache=True ( #11968 )
2024-09-02 14:37:44 +08:00
Jin, Qiao
65e281bb29
Add MiniCPM-V cpu example ( #11975 )
...
* Add MiniCPM-V cpu example
* fix
* fix
* fix
* fix
2024-09-02 10:17:57 +08:00
Ruonan Wang
79978e6f36
update npu multimodal readme ( #11979 )
...
* update npu readme of multimodal
* small fix
* meet comment
2024-08-30 19:02:06 +08:00
Ruonan Wang
4811a490ef
small fix ( #11978 )
...
* fix
* meet comment
2024-08-30 17:55:15 +08:00
Ruonan Wang
573c20bae6
fix npu lm_head cpu condition ( #11976 )
...
* fix
* fix
* fix
* fix style
* fix style
* fix style
2024-08-30 17:11:26 +08:00
Ruonan Wang
60aa1a2c0f
Initial NPU support for MiniCPM-V-2_6 ( #11966 )
...
* initial pr
* update npu model
* fix
* fix kv cache type
* fix
* small fix
* fix style
* fix model id
* change inter_pp=4
* address comment
* fix
* fix style
* fix
* rebase
2024-08-30 16:34:35 +08:00
SONG Ge
158289d205
[NPU] Add initial support for minicpm-llama-v2.5 ( #11962 )
...
* add initial support for minicpm-llama-v2.5
* update impl
* add minicpm-llama3-v2.5 example
2024-08-30 16:00:33 +08:00
Chu,Youcheng
ae7302a654
add gptq option for ppl test ( #11921 )
...
* feat:add gptq for ppl
* fix: add an empty line
* fix: add an empty line
* fix: remove an empty line
* Resolve comments
* Resolve comments
* Resolve comments
2024-08-30 13:43:48 +08:00
binbin Deng
cd077881f1
Disable lm head ( #11972 )
2024-08-30 11:05:18 +08:00
Wang, Jian4
7d103417b8
Fix glm4-9b-chat nan error on vllm 0.3.3 ( #11970 )
...
* fix nan value
* update
2024-08-30 09:50:18 +08:00
Yang Wang
fbf088f61e
remove obsolete npu code ( #11967 )
2024-08-29 14:16:44 -07:00
Yuwen Hu
a9e485eb1b
Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer ( #11963 )
...
* Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer
* Style fixes
2024-08-29 19:22:09 +08:00
Yuwen Hu
2e49e1f8e9
Further fix for MiniCPM-V-2_6 example ( #11965 )
2024-08-29 19:14:13 +08:00
Jason Dai
431affd0a0
Update README.md ( #11964 )
2024-08-29 18:56:35 +08:00
binbin Deng
14b2c8dc32
Update qwen2-7b example script ( #11961 )
2024-08-29 18:25:17 +08:00
Yuwen Hu
7abe17d6f7
Update MiniCPM-V-2_6 Example ( #11958 )
...
* Update example scripts regarding warmup, stream generate, modules to not convert, etc.
* Update readme accordingly
* Fix based on comments
* Small fix
* Remove n_predict
2024-08-29 18:23:48 +08:00
Yina Chen
5f7ff76ea5
update troubleshooting ( #11960 )
2024-08-29 17:44:22 +08:00
Yina Chen
882f4a5ff7
Add lnl npu driver recommend version and enable cpu_lm_head on llama3 ( #11952 )
...
* update lnl npu driver version and enable cpu_lm_head on llama3
* update
* fix style
* typo
* address comments
* update
* add qwen2-7b
2024-08-29 15:01:18 +08:00
binbin Deng
71f03dcc39
Support qwen2-7b with fused decoderlayer optimization on NPU ( #11912 )
2024-08-29 13:34:20 +08:00
Jiao Wang
63ac5f64bb
Refactor NPU baichuan multiple-process ( #11945 )
...
* update
* add baichuan mp
* clean
* refactor
* merge
* style
* update
* update
2024-08-28 11:33:40 -07:00
SONG Ge
5ca7390082
[NPU] Add minicpm-2b support for npu multi-processing ( #11949 )
...
* add minicpm-2b support
* update example for minicpm-2b
* add LNL NPU driver requirement in readme
2024-08-28 18:08:49 +08:00
Yishuo Wang
0fbb10259a
use sdp_causal to reduce internvl2-4b memory usage if set environment variable ( #11953 )
2024-08-28 17:35:05 +08:00
Guancheng Fu
0a7bd274e2
Add vllm awq loading logic ( #11950 )
...
* add vllm awq loading logic
* fix
* refine
2024-08-28 16:46:18 +08:00
Yina Chen
b38fb67bec
[NPU] lm head to cpu ( #11943 )
...
* lm head to cpu
* qwen2
* mv logic and add param to disable cpu_lm_head
* use env and lm_head opt to mp file
* fix
* update
* remove print
2024-08-28 16:34:07 +08:00
hxsz1997
e23549f63f
Update llamaindex examples ( #11940 )
...
* modify rag.py
* update readme of gpu example
* update llamaindex cpu example and readme
* add llamaindex doc
* update note style
* import before instancing IpexLLMEmbedding
* update index in readme
* update links
* update link
* update related links
2024-08-28 14:03:44 +08:00
binbin Deng
bec00e2015
Improve baichuan2 NPU performance ( #11942 )
2024-08-27 18:37:08 +08:00
Zijie Li
90f692937d
Update npu baichuan2 ( #11939 )
2024-08-27 16:56:26 +08:00
binbin Deng
7f7f6c89f5
Quick fix benchmark script ( #11938 )
2024-08-27 15:29:27 +08:00
Jiao Wang
b4b6ddf73c
NPU Baichuan2 Multi-Process example ( #11928 )
2024-08-27 15:25:49 +08:00
SONG Ge
e211a5b076
update minicpm to meet latest refactor ( #11937 )
2024-08-27 15:08:01 +08:00
SONG Ge
a81a329a5f
[NPU] Add example for NPU multi-processing minicpm-1b model ( #11935 )
...
* add minicpm example
2024-08-27 14:57:46 +08:00
binbin Deng
7c8c9a0670
Update benchmark script for NPU ( #11932 )
2024-08-27 14:41:14 +08:00
Ch1y0q
730d9ec811
Add Qwen2-audio example ( #11835 )
...
* add draft for qwen2-audio
* update example for `Qwen2-Audio`
* update
* update
* add warmup
2024-08-27 13:35:24 +08:00
Shaojun Liu
b11b28e9a9
update CORE_XE_VERSION to 2.6.0 ( #11929 )
2024-08-27 13:10:13 +08:00
Yina Chen
e246f1e258
update llama3 npu example ( #11933 )
2024-08-27 13:03:18 +08:00
binbin Deng
14dddfc0d6
Update NPU example readme ( #11931 )
2024-08-27 12:44:58 +08:00
Zijie Li
6c3eb1e1e8
refactor from_pretrained API for NPU ( #11927 )
2024-08-27 09:50:30 +08:00
Xiangyu Tian
7ca557aada
LLM: Fix vLLM CPU convert error ( #11926 )
2024-08-27 09:22:19 +08:00
Yuwen Hu
c1d07bc626
Support streaming for lookup generation ( #11922 )
...
* Support streaming for lookup generation
* Small update
* Style fixes
* Add fallback to the original generate for batch inference and beam search; support input length threshold judgement for direct input with input_ids
* Fix lookup stream generate with eos token
* Small fixes
* Small fix
* index fix
* Small fix
2024-08-26 19:33:31 +08:00
Yuwen Hu
a0bbd8e28d
All-in-one benchmark update regarding performance mode for input length threshold ( #11920 )
...
* All-in-one benchmark update regarding performance mode input length threshold
* typo fix
2024-08-26 18:52:13 +08:00
SONG Ge
019f725d4d
[NPU] Add support for running mp minicpm model on npu ( #11909 )
...
* add initial support for npu minicpm mp
* fix minicpm-1b abnormal output error
2024-08-26 17:52:55 +08:00
binbin Deng
dd303776cf
Add troubleshooting about transpose value setting
2024-08-26 16:06:32 +08:00
Yuwen Hu
24c279e0ae
Update IPEX_LLM_PERFORMANCE_MODE with input length threshold ( #11908 )
...
* Update IPEX_LLM_PERFORMANCE_MODE with input length threshold
* Update based on comments. Add judgement for inputs_embeds
* Fix for benchmarking purposes
* Update based on comments
* Small fix
2024-08-23 20:49:15 +08:00
binbin Deng
303a090a6b
Add lm_head optimization on NPU ( #11903 )
2024-08-23 15:51:07 +08:00
Yina Chen
23631cd357
disable lm_head opt for baichuan2-13b ( #11905 )
2024-08-23 15:39:47 +08:00
hxsz1997
650e6e6ce4
Merge pull request #11891 from hxsz1997/baichuan2-compresskv
...
Add compress_kv for Baichuan2
2024-08-23 06:09:58 +03:00
Ruonan Wang
4a61f7d20d
update mlp of llama ( #11897 )
...
* update mlp of llama
* relax threshold of mlp test
* revert code
2024-08-22 20:34:53 +08:00
Yuwen Hu
420ce7d164
Fix non-stop at eos token problem for lookup generation ( #11896 )
...
* Fix non-stop by eos_token_id problem for lookup
* Small fix
* Add judgement when generation_config.eos_token_id is None
* Fix based on comments
2024-08-22 18:55:59 +08:00
Huang, Xinshengzi
4cf03d6212
update baichuan-7b
2024-08-22 18:16:33 +08:00
Zijie Li
794abe2ce8
update npu-readme ( #11900 )
2024-08-22 17:49:35 +08:00
Guancheng Fu
278b191dc1
Fix optimize lm head error ( #11899 )
2024-08-22 17:45:26 +08:00
Shaojun Liu
c5b51d41fb
Update pypi tag to 2.2.0.dev0 ( #11895 )
2024-08-22 16:48:09 +08:00
Jinhe
18662dca1c
change 5 pytorch/huggingface models to fp16 ( #11894 )
2024-08-22 16:12:09 +08:00
Wang, Jian4
5c4ed00593
Add lightweight-serving whisper asr example ( #11847 )
...
* add asr init
* update for pp
* update style
* update readme
* update readme
2024-08-22 15:46:28 +08:00
Huang, Xinshengzi
eb1e65f8a9
add comment
2024-08-22 15:14:47 +08:00
Huang, Xinshengzi
a2be3d7501
add comment of compress kv in attention forward
2024-08-22 15:11:55 +08:00
Jinhe
a8e2573421
added tokenization file for codegeex2-6b in pytorch-models( #11875 )
...
* added tokenization file
* tokenization file readme update
* optional
2024-08-22 14:37:56 +08:00
Huang, Xinshengzi
ce7de77085
add comment of change in model forward
2024-08-22 14:29:27 +08:00
Huang, Xinshengzi
42398a0045
add comment
2024-08-22 13:17:13 +08:00
Huang, Xinshengzi
48a827aa07
fix typos
2024-08-22 11:35:47 +08:00
Huang, Xinshengzi
8a5df93de2
fix typos
2024-08-22 11:33:07 +08:00
Huang, Xinshengzi
01ed397e7a
fix typos
2024-08-22 11:31:25 +08:00
Huang, Xinshengzi
c6ed1c412d
fix typos
2024-08-22 11:26:49 +08:00
Huang, Xinshengzi
2a0aa9271b
fix typos
2024-08-22 11:23:22 +08:00
Huang, Xinshengzi
4adadddbbc
fix typos
2024-08-22 11:12:23 +08:00
Huang, Xinshengzi
6a5ca17afc
fix typos
2024-08-22 11:09:58 +08:00
binbin Deng
72a7bf624b
Support qwen2-1.5b with fused decoderlayer optimization on NPU ( #11888 )
2024-08-22 11:09:12 +08:00
Huang, Xinshengzi
6bb9035788
fix typos
2024-08-22 11:08:48 +08:00
Huang, Xinshengzi
86248b0505
add compress_kv for baichuan2
2024-08-22 10:59:08 +08:00
Zijie Li
bdbe995b01
Update README.md ( #11889 )
...
Set datasets version to 2.16.1. Clear out the transformers version requirement.
2024-08-22 09:40:16 +08:00
Yina Chen
cc27321441
support chatglm4 in lookup ( #11855 )
2024-08-21 15:53:17 +08:00
Yina Chen
0236de3ac2
set IPEX_LLM_LAST_LM_HEAD=1 as default ( #11885 )
2024-08-21 15:06:12 +08:00
SONG Ge
8c5c7f32dd
Update doc for running npu generate example with ipex-llm[npu] ( #11876 )
...
* update doc for running npu generate example with ipex-llm[npu]
* switch max_prompt_len to 512 to fix compile error on mtl
2024-08-21 13:45:29 +08:00
Yang Wang
209d42ab79
Refactor npu mp to make it easier to integrate new models ( #11873 )
...
* Refactor npu mp to make it easier to integrate new models
* fix style
* move layer functions to base
2024-08-20 20:58:47 -07:00
Guancheng Fu
537c0d2767
fix vllm qwen2 models ( #11879 )
2024-08-21 11:05:24 +08:00
Yishuo Wang
bd1e490d62
fix phi3 ( #11878 )
2024-08-21 10:31:41 +08:00
Yuwen Hu
eab6f6dde4
Spr perf small fix ( #11874 )
2024-08-21 09:35:26 +08:00
Yang Wang
bdaeee1d63
Fix run_decoders bug ( #11871 )
2024-08-20 12:04:59 -07:00
Chu,Youcheng
32f0a77846
feat: update readme for ppl test ( #11865 )
...
* feat: update readme for ppl test
* fix: textual adjustments
* fix: textual adjustments
* Add ipex-llm npu option in setup.py (#11858 )
* add ipex-llm npu release
* update example doc
* meet latest release changes
* optimize phi3 memory usage (#11867 )
* Update `ipex-llm` default transformers version to 4.37.0 (#11859 )
* Update default transformers version to 4.37.0
* Add dependency requirements for qwen and qwen-vl
* Temp fix transformers version for these not yet verified models
* Skip qwen test in UT for now as it requires transformers<4.37.0
* Update performance test regarding updated default `transformers==4.37.0` (#11869 )
* Update igpu performance from transformers 4.36.2 to 4.37.0 (#11841 )
* upgrade arc perf test to transformers 4.37 (#11842 )
* fix load low bit com dtype (#11832 )
* feat: add mixed_precision argument on ppl longbench evaluation
* fix: delete extra code
* feat: upgrade arc perf test to transformers 4.37
* fix: add missing codes
* fix: keep perf test for qwen-vl-chat in transformers 4.36
* fix: remove extra space
* fix: resolve pr comment
* fix: add empty line
* fix: add pip install for spr and core test
* fix: delete extra comments
* fix: remove python -m for pip
* Revert "fix load low bit com dtype (#11832 )"
This reverts commit 6841a9ac8f .
---------
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
* add transformers==4.36 for qwen vl in igpu-perf (#11846 )
* add transformers==4.36.2 for qwen-vl
* Small update
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
* fix: remove qwen-7b on core test (#11851 )
* fix: remove qwen-7b on core test
* fix: change delete to comment
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
* replace filename (#11854 )
* fix: remove qwen-7b on core test
* fix: change delete to comment
* fix: replace filename
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
* fix: delete extra comments (#11863 )
* Remove transformers installation for temp test purposes
* Small fix
* Small update
---------
Co-authored-by: Chu,Youcheng <70999398+cranechu0131@users.noreply.github.com>
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Co-authored-by: Zijie Li <michael20001122@gmail.com>
Co-authored-by: Chu,Youcheng <1340390339@qq.com>
* Pytorch models transformers version update (#11860 )
* yi sync
* delete 4.34 constraint
* delete 4.34 constraint
* delete 4.31 constraint
* delete 4.34 constraint
* delete 4.35 constraint
* added <=4.33.3 constraint
* added <=4.33.3 constraint
* switched to chinese prompt
* Update compresskv model forward type logic (#11868 )
* update
* fix
* Update local import for ppl (#11866 )
Co-authored-by: jenniew <jenniewang123@gmail.com>
* fix: textual adjustment
---------
Co-authored-by: SONG Ge <38711238+sgwhat@users.noreply.github.com>
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
Co-authored-by: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Co-authored-by: Zijie Li <michael20001122@gmail.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: RyuKosei <70006706+RyuKosei@users.noreply.github.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-08-20 20:13:54 +08:00
RyuKosei
5df00869de
Update local import for ppl ( #11866 )
...
Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-08-20 18:50:00 +08:00
Yina Chen
c3c058373f
Update compresskv model forward type logic ( #11868 )
...
* update
* fix
2024-08-20 18:11:37 +08:00
Jinhe
3ee194d983
Pytorch models transformers version update ( #11860 )
...
* yi sync
* delete 4.34 constraint
* delete 4.34 constraint
* delete 4.31 constraint
* delete 4.34 constraint
* delete 4.35 constraint
* added <=4.33.3 constraint
* added <=4.33.3 constraint
* switched to chinese prompt
2024-08-20 18:01:42 +08:00
Yuwen Hu
0d58c2fdf9
Update performance test regarding updated default transformers==4.37.0 ( #11869 )
...
* Update igpu performance from transformers 4.36.2 to 4.37.0 (#11841 )
* upgrade arc perf test to transformers 4.37 (#11842 )
* fix load low bit com dtype (#11832 )
* feat: add mixed_precision argument on ppl longbench evaluation
* fix: delete extra code
* feat: upgrade arc perf test to transformers 4.37
* fix: add missing codes
* fix: keep perf test for qwen-vl-chat in transformers 4.36
* fix: remove extra space
* fix: resolve pr comment
* fix: add empty line
* fix: add pip install for spr and core test
* fix: delete extra comments
* fix: remove python -m for pip
* Revert "fix load low bit com dtype (#11832 )"
This reverts commit 6841a9ac8f .
---------
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
* add transformers==4.36 for qwen vl in igpu-perf (#11846 )
* add transformers==4.36.2 for qwen-vl
* Small update
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
* fix: remove qwen-7b on core test (#11851 )
* fix: remove qwen-7b on core test
* fix: change delete to comment
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
* replace filename (#11854 )
* fix: remove qwen-7b on core test
* fix: change delete to comment
* fix: replace filename
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
* fix: delete extra comments (#11863 )
* Remove transformers installation for temp test purposes
* Small fix
* Small update
---------
Co-authored-by: Chu,Youcheng <70999398+cranechu0131@users.noreply.github.com>
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Co-authored-by: Zijie Li <michael20001122@gmail.com>
Co-authored-by: Chu,Youcheng <1340390339@qq.com>
2024-08-20 17:59:28 +08:00
Yuwen Hu
5e8286f72d
Update ipex-llm default transformers version to 4.37.0 ( #11859 )
...
* Update default transformers version to 4.37.0
* Add dependency requirements for qwen and qwen-vl
* Temp fix transformers version for these not yet verified models
* Skip qwen test in UT for now as it requires transformers<4.37.0
2024-08-20 17:37:58 +08:00
Yishuo Wang
d4ee0a89f3
optimize phi3 memory usage ( #11867 )
2024-08-20 17:32:51 +08:00
SONG Ge
5b83493b1a
Add ipex-llm npu option in setup.py ( #11858 )
...
* add ipex-llm npu release
* update example doc
* meet latest release changes
2024-08-20 17:29:49 +08:00
Heyang Sun
ee6852c915
Fix typo ( #11862 )
2024-08-20 16:38:11 +08:00
Yishuo Wang
2946420e14
add minicpmv 2.6 load_low_bit workaround ( #11856 )
2024-08-20 11:16:02 +08:00
SONG Ge
7380823f3f
Update Llama2 multi-processes example ( #11852 )
...
* update llama2 multi-processes examples
* update
* update readme
* update
2024-08-19 19:49:01 +08:00
Yang Wang
99b05ba1dc
separate prefill into a process ( #11787 )
...
* separate prefill into a process
* using model.share_memory()
* might work
* worked
* use long prompt
* refactor
* cleanup
* fix bug
* clean up
* changeable inter and intra process stages
* refactor
* add max output len
* fix npu_model changes that may cause generate down
* fix npu_model generate import error
* fix generate forward error
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2024-08-19 17:53:36 +08:00
Jinhe
da3d7a3a53
delete transformers version requirement ( #11845 )
...
* delete transformers version requirement
* delete transformers version requirement
2024-08-19 17:53:02 +08:00
Ruonan Wang
a0fbda5bc8
add MiniCPM-Llama3-V-2_5 into all-in-one benchmark ( #11849 )
2024-08-19 17:51:16 +08:00
Yishuo Wang
9490781aec
optimize phi3 memory usage again ( #11848 )
2024-08-19 17:26:59 +08:00
Yina Chen
3cd4e87168
Support compress KV with quantize KV ( #11812 )
...
* update llama
* support llama 4.41
* fix style
* support minicpm
* support qwen2
* support minicpm & update
* support chatglm4
* support chatglm
* remove print
* add DynamicCompressFp8Cache & support qwen
* support llama
* support minicpm phi3
* update chatglm2/4
* small fix & support qwen 4.42
* remove print
2024-08-19 15:32:32 +08:00
Zhao Changmin
6841a9ac8f
fix load low bit com dtype ( #11832 )
2024-08-19 13:43:19 +08:00
Yuwen Hu
cfc959defa
Fixes regarding utf-8 in all-in-one benchmark ( #11839 )
2024-08-19 10:38:00 +08:00
Chu,Youcheng
46a1cbfa64
feat: add mixed_precision argument on ppl longbench evaluation ( #11837 )
...
* feat: add mixed_precision argument on ppl longbench evaluation
* fix: delete two spaces
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-19 10:00:44 +08:00
Yuwen Hu
580c94d0e2
Remove gemma-2-9b-it 3k input from igpu-perf ( #11834 )
2024-08-17 13:10:05 +08:00
Jin, Qiao
9f17234f3b
Add MiniCPM-V-2_6 to iGPU Perf ( #11810 )
...
* Add MiniCPM-V-2_6 to iGPU Perf
* keep last model in yaml
* fix MINICPM_V_IDS
* Restore tested model list
* Small fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-08-16 18:41:21 +08:00
Yuwen Hu
96796f95cb
Update all-in-one benchmark prompts for continuation task & lookup update for minicpmv ( #11827 )
...
* Update all-in-one benchmark prompts for continuation task
* Small fix
* Add pure-text benchmark support for minicpm-v-2_6
* Support lookahead for model.llm generate of minicpmv
* Add prompt reference
* Small update
* Small fix
2024-08-16 17:16:35 +08:00
Yishuo Wang
e966e85df8
force lm_head optimization in any model if set environment variable ( #11830 )
2024-08-16 16:48:45 +08:00
RyuKosei
3b630fb9df
updated ppl README ( #11807 )
...
* edit README.md
* update the branch
* edited README.md
* updated
* updated description
---------
Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-08-16 15:49:25 +08:00
Jinhe
e07a55665c
Codegeex2 tokenization fix ( #11831 )
...
* updated tokenizer file
* updated tokenizer file
* updated tokenizer file
* updated tokenizer file
* new folder
2024-08-16 15:48:47 +08:00
Jinhe
adfbb9124a
Reorganize MiniCPM-V-2_6 example & update other MiniCPM-V-2 examples ( #11815 )
...
* model to fp16 & 2_6 reorganize
* revisions
* revisions
* half
* deleted transformer version requirements
* deleted transformer version requirements
---------
Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>
2024-08-16 14:48:56 +08:00
Chu,Youcheng
f463268e36
fix: add run oneAPI instruction for the example of codeshell ( #11828 )
...
* fix: delete ipex extension import in ppl wikitext evaluation
* feat: add mixed_precision argument on ppl wikitext evaluation
* fix: delete mix_precision command in perplex evaluation for wikitext
* fix: remove fp16 mixed-precision argument
* fix: Add a space.
* fix: add run oneAPI instruction for the example of codeshell
* fix: textual adjustments
* fix: Textual adjustment
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-16 14:29:06 +08:00
Yishuo Wang
17a0beb21f
optimize qwen2-audio again ( #11825 )
2024-08-16 11:11:35 +08:00
Yuwen Hu
9e9086cc2a
Update IPEX_LLM_PERFORMANCE_MODE ( #11823 )
2024-08-16 09:48:36 +08:00
Wang, Jian4
5a80fd2633
Fix lightweight-serving no streaming resp on mtl ( #11822 )
2024-08-16 09:43:03 +08:00
Guancheng Fu
e70ae0638e
Fix vLLM not convert issues ( #11817 )
...
* Fix not convert issues
* refine
2024-08-15 19:04:05 +08:00
Yishuo Wang
750d4ad5dc
fix minicpm-v-2 fp16 ( #11819 )
2024-08-15 18:34:40 +08:00
Yuwen Hu
6543321f04
Remove 4k igpu perf on gemma-2-9b-it ( #11820 )
2024-08-15 18:06:19 +08:00
Chu,Youcheng
28d1c972da
add mixed_precision argument on ppl wikitext evaluation ( #11813 )
...
* fix: delete ipex extension import in ppl wikitext evaluation
* feat: add mixed_precision argument on ppl wikitext evaluation
* fix: delete mix_precision command in perplex evaluation for wikitext
* fix: remove fp16 mixed-precision argument
* fix: Add a space.
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-15 17:58:53 +08:00
Yishuo Wang
828ab16537
fix phi3 and minicpmv cpu ( #11818 )
2024-08-15 17:43:29 +08:00
Yishuo Wang
4e178f0c5d
rewrite minicpmv optimization ( #11816 )
2024-08-15 17:27:12 +08:00
Ch1y0q
447c8ed324
update transformers version for replit-code-v1-3b, `internlm2-chat-… ( #11811 )
...
* update transformers version for `replit-code-v1-3b`, `internlm2-chat-7b` and mistral
* remove for default transformers version
2024-08-15 16:40:48 +08:00
Jinhe
2fbbb51e71
transformers==4.37, yi & yuan2 & vicuna ( #11805 )
...
* transformers==4.37
* added yi model
* added yi model
* xxxx
* delete prompt template
* / and delete
2024-08-15 15:39:24 +08:00
Jinhe
f43da2d455
deletion of specification of transformers version ( #11808 )
2024-08-15 15:23:32 +08:00
Yishuo Wang
07b7f13982
support and optimize qwen2-audio ( #11809 )
2024-08-15 14:59:04 +08:00
Chu,Youcheng
3ac83f8396
fix: delete ipex extension import in ppl wikitext evaluation ( #11806 )
...
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
2024-08-15 13:40:01 +08:00
Yishuo Wang
9a93808fc5
fix and optimize minicpm v 2 ( #11799 )
2024-08-14 17:27:23 +08:00