Shaojun Liu | 48fc63887d | use oneccl 0.0.5.1 (#12262) | 2024-10-24 16:12:24 +08:00
joan726 | e0a95eb2d6 | Add llama_cpp_quickstart.zh-CN.md (#12221) | 2024-10-24 16:08:31 +08:00
Xin Qiu | 39c9d1de52 | fix code geex (#12261) | 2024-10-24 14:34:01 +08:00
Yishuo Wang | f3a2b20e6b | Optimize gpt2 (#12259) | 2024-10-24 13:44:24 +08:00
Ruonan Wang | 821fd96367 | Initial integration of our L0 Llama impl into ipex-llm (#12255) | 2024-10-24 09:49:27 +08:00
  * temp save
  * initial support
  * fix
  * simplify code
  * fix style
  * fix example
  * set default value of pipeline to False
Yishuo Wang | cacc891962 | Fix PR validation (#12253) | 2024-10-23 18:10:47 +08:00
binbin Deng | b685cf4349 | Fix npu group size setting when optimize_model=False (#12256) | 2024-10-23 17:53:54 +08:00
binbin Deng | 567b77a76b | Support IR and blob format for llama level0 pipeline (#12251) | 2024-10-23 16:02:35 +08:00
Yishuo Wang | 578aef245d | Fix models automatically choosing SdpaAttention with ipex 2.3 (#12252) | 2024-10-23 15:33:45 +08:00
Yishuo Wang | 88dc120a4c | fix fp16 linear (#12250) | 2024-10-23 14:35:19 +08:00
Yina Chen | e8cf7f32f5 | npu gw small fix (#12249) | 2024-10-23 14:26:01 +08:00
Shaojun Liu | aae2490cb8 | fix UT (#12247) | 2024-10-23 14:13:06 +08:00
  * fix ut
  * Update test_transformers_api_attention.py
  * Update test_transformers_api_mlp.py
Yina Chen | e37f951cce | [NPU] Groupwise (#12241) | 2024-10-23 14:10:58 +08:00
  * dq divide
  * fix
  * support attn divide
  * update qwen2 7b
  * divide down_proj & other linear
  * use concat & reduce sum
  * support scale after
  * support qwen2
  * w/ mm
  * update reshape
  * spda
  * split
  * split 2+
  * update
  * lm head-> 28
  * no scale
  * update
  * update
  * update
  * fix style
  * fix style
  * to split linear
  * update
  * update code
  * address comments
  * fix style & remove redundant code & revert benchmark scripts
  * fix style & remove code
  * update save & load
  Co-authored-by: Yang Wang <yang3.wang@intel.com>
Jun Wang | aedc4edfba | [ADD] add open webui + vllm serving (#12246) | 2024-10-23 10:13:14 +08:00
Jin, Qiao | 8fa98e2742 | Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" (#12245) | 2024-10-22 17:07:51 +08:00
  * Remove qwen2-7b from npu example readme
  * fix
Yina Chen | ec465fbcd7 | Add lookup generate in load_low_bit (#12243) | 2024-10-22 15:51:52 +08:00
  * add lookup generate in load_low_bit
  * update comment
Yuwen Hu | d8c1287335 | Further update for Windows dGPU performance tests (#12244) | 2024-10-22 15:07:21 +08:00
Jason Dai | a35cf4d533 | Update README.md (#12242) | 2024-10-22 10:19:07 +08:00
Yuwen Hu | b3df47486d | Fix Gemma 2 on LNL (#12240) | 2024-10-21 18:25:53 +08:00
  * Fix gemma 2 on LNL
  * Python style fix
Yuwen Hu | ac2dac857c | Disable 4k input test for now for Windows dGPU performance test (#12239) | 2024-10-21 15:03:26 +08:00
Yuwen Hu | ea5154d85e | Further update to Windows dGPU perf test (#12237) | 2024-10-21 10:27:16 +08:00
Yuwen Hu | da9270be2d | Further update to Windows dGPU perf test (#12233) | 2024-10-18 23:20:17 +08:00
Yuwen Hu | 5935b25622 | Further update Windows GPU perf test regarding results integrity check (#12232) | 2024-10-18 18:15:13 +08:00
Yuwen Hu | ef659629f3 | Small update to Windows dGPU perf test (#12230) | 2024-10-18 16:39:59 +08:00
  * Small update to Windows dGPU perf test
  * Small fix
  * Small fixes
  * Remove unnecessary file
Yuwen Hu | 9d7f42fd0f | Support manual trigger of dGPU perf test on Windows (#12229) | 2024-10-18 15:38:21 +08:00
  * Support manual trigger of dGPU perf test on Windows
  * Small fix
  * Small fix
  * Small update
Jun Wang | b10fc892e1 | Update new reference link of xpu/docker/readme.md (#12188) | 2024-10-18 13:18:08 +08:00
  * [ADD] rewrite new vllm docker quick start
  * [ADD] lora adapter doc finished
  * [ADD] multi lora adapter tested successfully
  * [ADD] add ipex-llm quantization doc
  * [Merge] rebase main
  * [REMOVE] rm tmp file
  * [Merge] rebase main
  * [ADD] add prefix caching experiment and result
  * [REMOVE] rm cpu offloading chapter
  * [UPDATE] update the link to new vllm-docker-quickstart
Jun Wang | fe3b5cd89b | [Update] mmdocs/dockerguide vllm-quick-start awq,gptq online serving document (#12227) | 2024-10-18 09:46:59 +08:00
  * [FIX] fix the docker start script error
  * [ADD] add awq online serving doc
  * [ADD] add gptq online serving doc
  * [Fix] small fix
Shaojun Liu | 7825dc1398 | Upgrade oneccl to 0.0.5 (#12223) | 2024-10-18 09:29:19 +08:00
Yuwen Hu | b88c1df324 | Add Llama 3.1 & 3.2 to Arc Performance test (#12225) | 2024-10-17 21:12:45 +08:00
  * Add llama3.1 and llama3.2 in arc perf (#12202)
  * Add llama3.1 and llama3.2 in arc perf
  * Uninstall trl after arc test on transformers>=4.40
  * Fix arc llama3 perf (#12212)
  * Fix pip uninstall
  * Uninstall trl after test on transformers==4.43.1
  * Fix llama3 arc perf (#12218)
  Co-authored-by: Jin, Qiao <89779290+JinBridger@users.noreply.github.com>
Yishuo Wang | 9ea694484d | refactor to remove old rope usage (#12224) | 2024-10-17 17:06:09 +08:00
Yishuo Wang | 324bcb057e | refactor to reduce old rope usage (#12219) | 2024-10-17 14:45:09 +08:00
Jiao Wang | 667f0db466 | Update Eagle example to Eagle2+ipex-llm integration (#11717) | 2024-10-16 23:16:14 -07:00
  * update to e2 example
  * update
  * update
Shaojun Liu | 26390f9213 | Update oneccl_wks_installer to 2024.0.0.4.1 (#12217) | 2024-10-17 10:11:55 +08:00
Yishuo Wang | a4a758656a | refactor gemma to reduce old fuse rope usage (#12215) | 2024-10-16 17:40:28 +08:00
Yishuo Wang | 9104a168f6 | refactor phi-2 to reduce old fuse rope usage (#12214) | 2024-10-16 17:08:14 +08:00
Yishuo Wang | bb247e991b | refactor merge_qkv and attention_softmax (#12213) | 2024-10-16 15:58:14 +08:00
Yishuo Wang | e279148aa0 | optimize llama3.2 vision again (#12211) | 2024-10-16 14:29:48 +08:00
Chu,Youcheng | f17cc4fdee | feat: add llama3.2-11b-vision in all in one (#12207) | 2024-10-16 10:32:11 +08:00
  * feat: add llama3.2-11b-vision in all in one
  * fix: change model
  * fix: change name
  * fix: add a space
  * fix: switch import
Yuwen Hu | c9ac39fc1e | Add Llama 3.2 to iGPU performance test (transformers 4.45) (#12209) | 2024-10-15 17:44:46 +08:00
  * Add Llama 3.2 to iGPU Perf (#12200)
  * Add Llama 3.2 to iGPU Perf
  * Downgrade accelerate after step
  * Temporarily disable model for test
  * Temporarily change ERRORLEVEL check (#12201)
  * Restore llama3.2 perf (#12206)
  * Revert "Temporarily change ERRORLEVEL check" (reverts commit 909dbbc930ab4283737161a55bb32006e6ca1991)
  * Revert "Temporarily disable model for test" (reverts commit 95322dc3c6429aa836f21bda0b5ba8d9b48592f8)
  Co-authored-by: Jin, Qiao <89779290+JinBridger@users.noreply.github.com>
Yishuo Wang | f6611f9d3a | optimize llama3.2 vision attention again (#12204) | 2024-10-15 16:08:20 +08:00
Yishuo Wang | 9b81236a2e | optimize qwen2-vl vision (#12203) | 2024-10-15 15:54:25 +08:00
Yishuo Wang | d5344587ab | optimize internvl2 vision model's attention (#12198) | 2024-10-15 10:51:00 +08:00
Yuwen Hu | f8d1adc573 | Fix Llama 3.2 & 3.1 on LNL (#12196) | 2024-10-14 17:39:20 +08:00
Yuwen Hu | 516b578104 | Support cpp release for ARL on Windows (#12189) | 2024-10-14 17:20:31 +08:00
  * Support cpp Windows release for ARL
  * Temp commit for test
  * Remove temp commit
Yuwen Hu | 7da3ab7322 | Add missing link for Llama3.2-Vision (#12197) | 2024-10-14 17:19:49 +08:00
Zijie Li | 7d80db710e | Add benchmark_util for transformers >= 4.44.0 (#12171) | 2024-10-14 15:40:12 +08:00
  * Create benchmark_util_4_45.py
  * Update __init__.py
  * Update lint-python
  * Update benchmark_util_4_45.py
  * Update benchmark_util_4_45.py
  * Create benchmark_util_4_44.py
Jin, Qiao | 8e35800abe | Add llama 3.1 in igpu perf (#12194) | 2024-10-14 15:14:34 +08:00
Yuwen Hu | a768d71581 | Small fix to LNL installation guide (#12192) | 2024-10-14 12:03:03 +08:00
Shaojun Liu | 49eb20613a | add --blocksize to doc and script (#12187) | 2024-10-12 09:17:42 +08:00
Jun Wang | 6ffaec66a2 | [UPDATE] add prefix caching document into vllm_docker_quickstart.md (#12173) | 2024-10-11 19:12:22 +08:00
  * [ADD] rewrite new vllm docker quick start
  * [ADD] lora adapter doc finished
  * [ADD] multi lora adapter tested successfully
  * [ADD] add ipex-llm quantization doc
  * [Merge] rebase main
  * [REMOVE] rm tmp file
  * [Merge] rebase main
  * [ADD] add prefix caching experiment and result
  * [REMOVE] rm cpu offloading chapter