Commit graph

3571 commits

Author SHA1 Message Date
Yuwen Hu
16074ae2a4
Update Linux prerequisites installation guide for MTL iGPU (#12263)
* Update Linux prerequisites installation guide for MTL iGPU

* Further link update

* Small fixes

* Small fix

* Update based on comments

* Small fix

* Make oneAPI installation a shared section for both MTL iGPU and other GPU

* Small fix

* Small fix

* Clarify description
2024-10-28 09:27:14 +08:00
binbin Deng
ec362e6133
Add llama3 level0 example (#12275) 2024-10-28 09:24:51 +08:00
SONG Ge
08cb065370
hot-fix redundant import funasr (#12277) 2024-10-25 19:40:39 +08:00
SONG Ge
a0c6432899
[NPU] Add support for loading a FunASR model (#12073)
* add support for loading funasr model

* add initial support for paraformer-encoder

* add npu ops impl

* add encoder-decoder npu pipeline

* move paraformer encoders prefix 30 layers to npu and keep the rest layers on cpu
2024-10-25 17:22:01 +08:00
Ruonan Wang
854398f6e0
update example to reduce peak memory usage (#12274) 2024-10-25 17:09:26 +08:00
Yuwen Hu
e713296090
Update all-in-one benchmark (#12272)
* Update all-in-one benchmark

* Small fix

* Small fix

* Small fix
2024-10-25 16:52:59 +08:00
Yuwen Hu
43b25a2fe7
Fix llama 3.2 vision on LNL (#12264)
* Fix llama 3.2 vision on LNL

* Small fix
2024-10-25 16:23:31 +08:00
Yuwen Hu
94c4568988
Update windows installation guide regarding troubleshooting (#12270) 2024-10-25 14:32:38 +08:00
Yuwen Hu
93895b2ac2
Openvino all in one benchmark small fix (#12269)
* Small update for all-in-one benchmark readme to support OpenVINO tests

* Small fix
2024-10-25 14:13:52 +08:00
Zijie Li
f7f62a3fef
Add OpenVINO performance tests to all-in-one benchmark (#12238)
* add-openvino-to-all-in-one

* update on openvino API

* Update save_openvino.py

* Update save_openvino.py

* Update save_openvino.py

* update on run.py and save_openvino

* update references

* Create openvino-requirements.txt

* fix on comments

* Small updates

* Small fix

* Fix

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-10-25 13:53:53 +08:00
Ruonan Wang
ae57e23e4f
fix incompatibility between llama GW & llama pipeline (#12267)
* fix

* fix
2024-10-25 10:31:44 +08:00
Yina Chen
b5e663854b
[NPU] Support llama groupwise (#12260)
* support llama gw

* support llama gw lm_head

* fix style

* remove unused code
2024-10-24 18:06:45 +08:00
Shaojun Liu
48fc63887d
use oneccl 0.0.5.1 (#12262) 2024-10-24 16:12:24 +08:00
joan726
e0a95eb2d6
Add llama_cpp_quickstart.zh-CN.md (#12221) 2024-10-24 16:08:31 +08:00
Xin Qiu
39c9d1de52
fix code geex (#12261) 2024-10-24 14:34:01 +08:00
Yishuo Wang
f3a2b20e6b
Optimize gpt2 (#12259) 2024-10-24 13:44:24 +08:00
Ruonan Wang
821fd96367
Initial integrate our L0 Llama impl into ipex-llm (#12255)
* temp save

* initial support

* fix

* simplify code

* fix style

* fix example

* make default value of pipeline as False
2024-10-24 09:49:27 +08:00
Yishuo Wang
cacc891962
Fix PR validation (#12253) 2024-10-23 18:10:47 +08:00
binbin Deng
b685cf4349
Fix npu group size setting of optimize_model=False (#12256) 2024-10-23 17:53:54 +08:00
binbin Deng
567b77a76b
Support IR and blob format for llama level0 pipeline (#12251) 2024-10-23 16:02:35 +08:00
Yishuo Wang
578aef245d
Fix models auto choose SdpaAttention with ipex 2.3 (#12252) 2024-10-23 15:33:45 +08:00
Yishuo Wang
88dc120a4c
fix fp16 linear (#12250) 2024-10-23 14:35:19 +08:00
Yina Chen
e8cf7f32f5
npu gw small fix (#12249) 2024-10-23 14:26:01 +08:00
Shaojun Liu
aae2490cb8
fix UT (#12247)
* fix ut

* Update test_transformers_api_attention.py

* Update test_transformers_api_mlp.py
2024-10-23 14:13:06 +08:00
Yina Chen
e37f951cce
[NPU] Groupwise (#12241)
* dq divide

* fix

* support attn divide

* update qwen2 7b

* divide down_proj & other linear

* use concat & reduce sum

* support scale after

* support qwen2

* w/ mm

* update reshape

* spda

* split

* split 2+

* update

* lm head-> 28

* no scale

* update

* update

* update

* fix style

* fix style

* to split linear

* update

* update code

* address comments

* fix style & remove redundant code & revert benchmark scripts

* fix style & remove code

* update save & load

---------

Co-authored-by: Yang Wang <yang3.wang@intel.com>
2024-10-23 14:10:58 +08:00
Jun Wang
aedc4edfba
[ADD] add open webui + vllm serving (#12246) 2024-10-23 10:13:14 +08:00
Jin, Qiao
8fa98e2742
Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" (#12245)
* Remove qwen2-7b from npu example readme

* fix
2024-10-22 17:07:51 +08:00
Yina Chen
ec465fbcd7
Add lookup generate in load_low_bit (#12243)
* add lookup generate in load_low_bit

* update comment
2024-10-22 15:51:52 +08:00
Yuwen Hu
d8c1287335
Further update for Windows dGPU performance tests (#12244) 2024-10-22 15:07:21 +08:00
Jason Dai
a35cf4d533
Update README.md (#12242) 2024-10-22 10:19:07 +08:00
Yuwen Hu
b3df47486d
Fix Gemma 2 on LNL (#12240)
* Fix gemma 2 on LNL

* Python style fix
2024-10-21 18:25:53 +08:00
Yuwen Hu
ac2dac857c
Disable 4k input test for now for Windows dGPU performance test (#12239) 2024-10-21 15:03:26 +08:00
Yuwen Hu
ea5154d85e
Further update to Windows dGPU perf test (#12237) 2024-10-21 10:27:16 +08:00
Yuwen Hu
da9270be2d
Further update to Windows dGPU perf test (#12233) 2024-10-18 23:20:17 +08:00
Yuwen Hu
5935b25622
Further update windows gpu perf test regarding results integrity check (#12232) 2024-10-18 18:15:13 +08:00
Yuwen Hu
ef659629f3
Small update to Windows dGPU perf test (#12230)
* Small update to Windows dGPU perf test

* Small fix

* Small fixes

* Remove unnecessary file
2024-10-18 16:39:59 +08:00
Yuwen Hu
9d7f42fd0f
Support manually trigger of dGPU perf test on Windows (#12229)
* Support manually trigger of dgpu perf test on Windows

* Small fix

* Small fix

* Small update
2024-10-18 15:38:21 +08:00
Jun Wang
b10fc892e1
Update new reference link of xpu/docker/readme.md (#12188)
* [ADD] rewrite new vllm docker quick start

* [ADD] lora adapter doc finished

* [ADD] multi lora adapter test successfully

* [ADD] add ipex-llm quantization doc

* [Merge] rebase main

* [REMOVE] rm tmp file

* [Merge] rebase main

* [ADD] add prefix caching experiment and result

* [REMOVE] rm cpu offloading chapter

* [ADD] rewrite new vllm docker quick start

* [ADD] lora adapter doc finished

* [ADD] multi lora adapter test successfully

* [ADD] add ipex-llm quantization doc

* [Merge] rebase main

* [REMOVE] rm tmp file

* [Merge] rebase main

* [ADD] rewrite new vllm docker quick start

* [ADD] lora adapter doc finished

* [ADD] multi lora adapter test successfully

* [ADD] add ipex-llm quantization doc

* [Merge] rebase main

* [REMOVE] rm tmp file

* [Merge] rebase main

* [UPDATE] update the link to new vllm-docker-quickstart
2024-10-18 13:18:08 +08:00
Jun Wang
fe3b5cd89b
[Update] mmdocs/dockerguide vllm-quick-start awq,gptq online serving document (#12227)
* [FIX] fix the docker start script error

* [ADD] add awq online serving doc

* [ADD] add gptq online serving doc

* [Fix] small fix
2024-10-18 09:46:59 +08:00
Shaojun Liu
7825dc1398
Upgrade oneccl to 0.0.5 (#12223) 2024-10-18 09:29:19 +08:00
Yuwen Hu
b88c1df324
Add Llama 3.1 & 3.2 to Arc Performance test (#12225)
* Add llama3.1 and llama3.2 in arc perf (#12202)

* Add llama3.1 and llama3.2 in arc perf

* Uninstall trl after arc test on transformers>=4.40

* Fix arc llama3 perf (#12212)

* Fix pip uninstall

* Uninstall trl after test on transformers==4.43.1

* Fix llama3 arc perf (#12218)

---------

Co-authored-by: Jin, Qiao <89779290+JinBridger@users.noreply.github.com>
2024-10-17 21:12:45 +08:00
Yishuo Wang
9ea694484d
refactor to remove old rope usage (#12224) 2024-10-17 17:06:09 +08:00
Yishuo Wang
324bcb057e
refactor to reduce old rope usage (#12219) 2024-10-17 14:45:09 +08:00
Jiao Wang
667f0db466
Update Eagle example to Eagle2+ipex-llm integration (#11717)
* update to e2 example

* update

* update
2024-10-16 23:16:14 -07:00
Shaojun Liu
26390f9213
Update oneccl_wks_installer to 2024.0.0.4.1 (#12217) 2024-10-17 10:11:55 +08:00
Yishuo Wang
a4a758656a
refactor gemma to reduce old fuse rope usage (#12215) 2024-10-16 17:40:28 +08:00
Yishuo Wang
9104a168f6
refactor phi-2 to reduce old fuse rope usage (#12214) 2024-10-16 17:08:14 +08:00
Yishuo Wang
bb247e991b
refactor merge_qkv and attention_softmax (#12213) 2024-10-16 15:58:14 +08:00
Yishuo Wang
e279148aa0
optimize llama3.2 vision again (#12211) 2024-10-16 14:29:48 +08:00
Chu,Youcheng
f17cc4fdee
feat: add llama3.2-11b-vision in all in one (#12207)
* feat: add llama3.2-11b-vision in all in one

* fix: change model

* fix: change name

* fix: add a space

* fix: switch import
2024-10-16 10:32:11 +08:00