Yina Chen
e37f951cce
[NPU] Groupwise (#12241)
* dq divide
* fix
* support attn divide
* update qwen2 7b
* divide down_proj & other linear
* use concat & reduce sum
* support scale after
* support qwen2
* w/ mm
* update reshape
* sdpa
* split
* split 2+
* update
* lm head-> 28
* no scale
* update
* update
* update
* fix style
* fix style
* to split linear
* update
* update code
* address comments
* fix style & remove redundant code & revert benchmark scripts
* fix style & remove code
* update save & load
---------
Co-authored-by: Yang Wang <yang3.wang@intel.com>
2024-10-23 14:10:58 +08:00
Jin, Qiao
8fa98e2742
Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" (#12245)
* Remove qwen2-7b from npu example readme
* fix
2024-10-22 17:07:51 +08:00
Yina Chen
ec465fbcd7
Add lookup generate in load_low_bit (#12243)
* add lookup generate in load_low_bit
* update comment
2024-10-22 15:51:52 +08:00
Yuwen Hu
b3df47486d
Fix Gemma 2 on LNL (#12240)
* Fix gemma 2 on LNL
* Python style fix
2024-10-21 18:25:53 +08:00
Yuwen Hu
5935b25622
Further update windows gpu perf test regarding results integrity check (#12232)
2024-10-18 18:15:13 +08:00
Yuwen Hu
b88c1df324
Add Llama 3.1 & 3.2 to Arc Performance test (#12225)
* Add llama3.1 and llama3.2 in arc perf (#12202)
* Add llama3.1 and llama3.2 in arc perf
* Uninstall trl after arc test on transformers>=4.40
* Fix arc llama3 perf (#12212)
* Fix pip uninstall
* Uninstall trl after test on transformers==4.43.1
* Fix llama3 arc perf (#12218)
---------
Co-authored-by: Jin, Qiao <89779290+JinBridger@users.noreply.github.com>
2024-10-17 21:12:45 +08:00
Yishuo Wang
9ea694484d
refactor to remove old rope usage (#12224)
2024-10-17 17:06:09 +08:00
Yishuo Wang
324bcb057e
refactor to reduce old rope usage (#12219)
2024-10-17 14:45:09 +08:00
Jiao Wang
667f0db466
Update Eagle example to Eagle2+ipex-llm integration (#11717)
* update to e2 example
* update
* update
2024-10-16 23:16:14 -07:00
Yishuo Wang
a4a758656a
refactor gemma to reduce old fuse rope usage (#12215)
2024-10-16 17:40:28 +08:00
Yishuo Wang
9104a168f6
refactor phi-2 to reduce old fuse rope usage (#12214)
2024-10-16 17:08:14 +08:00
Yishuo Wang
bb247e991b
refactor merge_qkv and attention_softmax (#12213)
2024-10-16 15:58:14 +08:00
Yishuo Wang
e279148aa0
optimize llama3.2 vision again (#12211)
2024-10-16 14:29:48 +08:00
Chu,Youcheng
f17cc4fdee
feat: add llama3.2-11b-vision in all in one (#12207)
* feat: add llama3.2-11b-vision in all in one
* fix: change model
* fix: change name
* fix: add a space
* fix: switch import
2024-10-16 10:32:11 +08:00
Yuwen Hu
c9ac39fc1e
Add Llama 3.2 to iGPU performance test (transformers 4.45) (#12209)
* Add Llama 3.2 to iGPU Perf (#12200)
* Add Llama 3.2 to iGPU Perf
* Downgrade accelerate after step
* Temporarily disable model for test
* Temporarily change ERRORLEVEL check (#12201)
* Restore llama3.2 perf (#12206)
* Revert "Temporarily change ERRORLEVEL check"
This reverts commit 909dbbc930ab4283737161a55bb32006e6ca1991.
* Revert "Temporarily disable model for test"
This reverts commit 95322dc3c6429aa836f21bda0b5ba8d9b48592f8.
---------
Co-authored-by: Jin, Qiao <89779290+JinBridger@users.noreply.github.com>
2024-10-15 17:44:46 +08:00
Yishuo Wang
f6611f9d3a
optimize llama3.2 vision attention again (#12204)
2024-10-15 16:08:20 +08:00
Yishuo Wang
9b81236a2e
optimize qwen2-vl vision (#12203)
2024-10-15 15:54:25 +08:00
Yishuo Wang
d5344587ab
optimize internvl2 vision model's attention (#12198)
2024-10-15 10:51:00 +08:00
Yuwen Hu
f8d1adc573
Fix Llama 3.2 & 3.1 on LNL (#12196)
2024-10-14 17:39:20 +08:00
Yuwen Hu
516b578104
Support cpp release for ARL on Windows (#12189)
* Support cpp Windows release for ARL
* Temp commit for test
* Remove temp commit
2024-10-14 17:20:31 +08:00
Zijie Li
7d80db710e
Add benchmark_util for transformers >= 4.44.0 (#12171)
* Create benchmark_util_4_45.py
* Update __init__.py
* Update lint-python
* Update benchmark_util_4_45.py
* Update benchmark_util_4_45.py
* Create benchmark_util_4_44.py
2024-10-14 15:40:12 +08:00
Jin, Qiao
8e35800abe
Add llama 3.1 in igpu perf (#12194)
2024-10-14 15:14:34 +08:00
Yuwen Hu
ddcdf47539
Support Windows ARL release (#12183)
* Support release for ARL
* Small fix
* Small fix to doc
* Temp for test
* Remove temp commit for test
2024-10-11 18:30:52 +08:00
Jinhe
f983f1a8f4
Add Qwen2-VL gpu example (#12135)
* qwen2-vl readme
* add qwen2-vl example
* fix
* fix
* fix
* add link
* Update regarding modules_to_not_convert and readme
* Further fix
* Small fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-10-11 18:25:23 +08:00
Ruonan Wang
310f18c8af
update NPU pipeline generate (#12182)
* update
* fix style
2024-10-11 17:39:20 +08:00
Shaojun Liu
724b2ae66d
add npu-level0 pipeline.dll to ipex-llm (#12181)
* add npu-level0 pipeline.dll to ipex-llm
* test
* update runner label
* fix
* update
* fix
* fix
2024-10-11 16:05:20 +08:00
Ruonan Wang
4d93bb81fe
Initial support of NPU level0 Model (#12177)
* first commit to support load dll and init llm pipeline
* add init generate
* fix style
* small updates
* fix style and check tokens number
2024-10-11 09:45:53 +08:00
Yuwen Hu
890662610b
Fix auto importer for LNL release (#12175)
2024-10-10 15:17:43 +08:00
Yishuo Wang
535bee5381
fix qwen2 vl again (#12174)
2024-10-10 13:50:01 +08:00
Yuwen Hu
aef1f671bd
Support LNL Windows release (#12169)
* Release for LNL on Windows
* Temp commit for release test
* Change option name
* Remove temp commit and change option name
* temp commit for test again
* Remove temp commit
2024-10-09 17:41:10 +08:00
Yishuo Wang
78d253165d
optimize qwen2 vl perf again (#12167)
2024-10-09 16:43:48 +08:00
Zijie Li
3d044dbf53
add llama3.2-vision PyTorch example (#12165)
2024-10-09 09:20:42 +08:00
Yishuo Wang
644af2a76e
add basic llama 3.2 vision support (#12163)
2024-10-08 10:46:48 +08:00
Ch1y0q
17c23cd759
add llama3.2 GPU example (#12137)
* add llama3.2 GPU example
* change prompt format reference url
* update
* add Meta-Llama-3.2-1B-Instruct sample output
* update wording
2024-09-29 14:41:54 +08:00
Yuwen Hu
f71b38a994
Update MiniCPM_V_26 GPU example with save & load (#12127)
2024-09-26 17:40:22 +08:00
Yishuo Wang
669ff1a97b
fix sd1.5 (#12129)
2024-09-26 17:15:16 +08:00
Yishuo Wang
a266528719
optimize llama 3.2 rope (#12128)
2024-09-26 16:08:10 +08:00
Yishuo Wang
584c3489e7
add basic support for llama3.2 (#12125)
2024-09-26 15:46:19 +08:00
Yishuo Wang
66f419f8b7
fix qwen2 vl (#12126)
2024-09-26 15:44:02 +08:00
Ch1y0q
2ea13d502f
Add minicpm3 gpu example (#12114)
* add minicpm3 gpu example
* update GPU example
* update
---------
Co-authored-by: Huang, Xinshengzi <xinshengzi.huang@intel.com>
2024-09-26 13:51:37 +08:00
Yishuo Wang
77af9bc5fa
support passing None to low_bit in optimize_model (#12121)
2024-09-26 11:09:35 +08:00
Yishuo Wang
47e0b83cbf
optimize sd 1.5 (#12119)
2024-09-25 15:45:13 +08:00
Jin, Qiao
2bedb17be7
Add Qwen2.5 NPU Example (#12110)
* Add Qwen2.5 NPU Example
* fix
* Merge qwen2.py and qwen2.5.py into qwen.py
* Fix description
2024-09-25 15:20:03 +08:00
Yishuo Wang
5d63aef60b
optimize qwen2 vl again (#12109)
2024-09-23 13:22:01 +08:00
Ruonan Wang
03bd01c99c
optimize npu qwen2 (#12107)
2024-09-20 19:46:16 +08:00
Jinhe
02399021d6
add npu load_low_bit api in all-in-one benchmark (#12103)
2024-09-20 17:56:08 +08:00
Yishuo Wang
9239fd4f12
add basic support and optimization for qwen2-vl (#12104)
2024-09-20 17:23:06 +08:00
Yuwen Hu
828fa01ad3
[NPU] Add mixed_precision for Qwen2 7B (#12098)
* Add mix_precision argument to control whether to use INT8 lm_head for Qwen2-7B-Instruct
* Small fix
* Fix load low bit with mixed precision
* Small fix
* Update example accordingly
* Update for default prompt
* Update based on comments
* Final fix
2024-09-20 16:36:21 +08:00
Ch1y0q
2269768e71
add internvl2 example (#12102)
* add internvl2 example
* add to README.md
* update
* add link to zh-CN readme
2024-09-20 16:31:54 +08:00
Ruonan Wang
09b8c80d9d
update code for NPU qwen2 (#12094)
* update code
* fix
2024-09-20 15:58:32 +08:00