Commit graph

  • 8fe01c9e4d
    [NPU pipeline] update cmake usage of pipeline (#12320) Ruonan Wang 2024-11-04 10:30:03 +0800
  • c8679ad592
    Qwen layernorm as input (#12309) Kai Huang 2024-11-04 09:51:15 +0800
  • 94ce447794
    Fix performance tests regarding trl version (#12319) Yuwen Hu 2024-11-04 09:42:18 +0800
  • 20755e8077
    Small fix to all-in-one benchmark scripts (#12317) Yuwen Hu 2024-11-01 19:16:25 +0800
  • 48123af463
    add npu_group_size for transformers_int4_npu_win in all-in-one benchmark api (#12316) Ch1y0q 2024-11-01 18:44:27 +0800
  • cd5e22cee5
    Update Llava GPU Example (#12311) Zijie Li 2024-11-01 05:06:00 -0400
  • f53bb4ea0b
    [NPU L0] Update 1st token generation (#12314) binbin Deng 2024-11-01 17:02:07 +0800
  • d409d9d0eb
    [NPU L0] Update streaming mode of example (#12312) binbin Deng 2024-11-01 15:38:10 +0800
  • 126f95be80
    Fix DPO finetuning example (#12313) Jin, Qiao 2024-11-01 13:29:44 +0800
  • 05c5d0267a
    [NPU] Llama2 prefill use ov sdp (#12310) Yina Chen 2024-11-01 05:05:20 +0200
  • eda764909c
    Add minicpm-2b in L0 pipeline (#12308) binbin Deng 2024-11-01 09:30:01 +0800
  • b9853f98b3
    fix qwen2 attention_mask slice (#12307) Yishuo Wang 2024-10-31 17:00:05 +0800
  • 3df6195cb0
    Fix application quickstart (#12305) Jin, Qiao 2024-10-31 16:57:35 +0800
  • 4892df61c9
    Add qwen2-1.5b in l0 pipeline example (#12306) binbin Deng 2024-10-31 16:44:25 +0800
  • 30f668c206
    updated transformers & accelerate requirements (#12301) Jinhe 2024-10-31 15:59:40 +0800
  • 97a0f7fd35
    Codegeex support (#12303) Xin Qiu 2024-10-31 15:28:56 +0800
  • 72605c7016
    fix llama3.1/3.2 quantize kv check (#12302) Yishuo Wang 2024-10-31 11:55:07 +0800
  • 416c19165c
    Add Qwen pipeline and example (#12292) Kai Huang 2024-10-31 11:25:25 +0800
  • 4cf1ccc43a
Update DPO README.md (#12162) Rahul Nair 2024-10-30 18:56:46 -0800
  • 29400e2e75
    feat: change oneccl to internal (#12296) Chu,Youcheng 2024-10-31 09:51:43 +0800
  • 6f22133efc
    Update AWQ and GPTQ GPU example (#12300) Zijie Li 2024-10-30 21:35:31 -0400
  • 0763268e4c
[NPU] Qwen2 groupwise performance opt (#12299) Yina Chen 2024-10-30 11:40:21 +0200
  • 41b8064554
    Support minicpm-1B in level0 pipeline (#12297) binbin Deng 2024-10-30 17:21:47 +0800
  • 46d8300f6b
    bugfix for qlora finetuning on GPU (#12298) Jinhe 2024-10-30 16:54:10 +0800
  • 70037ad55f
    Groupwise prefill optimization (#12291) Yina Chen 2024-10-30 08:59:45 +0200
  • 540eaeb12c
    refactor attention_softmax (#12295) Yishuo Wang 2024-10-30 13:20:50 +0800
  • 2b2cb9c693
    [NPU pipeline] Support save & load and update examples (#12293) Ruonan Wang 2024-10-30 10:02:00 +0800
  • 5a15098835
    Initial support for quantized forward on CPU when quantization_group_size=0 (#12282) Yuwen Hu 2024-10-29 19:40:17 +0800
  • 3feb58d1e4
    Support baichuan2 for level0 pipeline (#12289) binbin Deng 2024-10-29 19:24:16 +0800
  • 546f455e8e
    Patch sdpa check function in specific module attributes table (#12285) Zhao Changmin 2024-10-29 18:41:09 +0800
  • 3700e81977
    [fix] vllm-online-benchmark first token latency error (#12271) Jun Wang 2024-10-29 17:54:36 +0800
  • 0bbc04b5ec
    Add ollama_quickstart.zh-CN.md (#12284) joan726 2024-10-29 15:12:44 +0800
  • 821b0033ed
    [NPU L0] update layernorm & code refactor (#12287) Ruonan Wang 2024-10-29 15:01:45 +0800
  • 4467645088
    [NPU] Support l0 Llama groupwise (#12276) Yina Chen 2024-10-28 11:06:55 +0200
  • 1cef0c4948
    Update README.md (#12286) Jason Dai 2024-10-28 17:06:16 +0800
  • 67014cb29f
    Add benchmark_latency.py to docker serving image (#12283) Guancheng Fu 2024-10-28 16:19:59 +0800
  • 3fe2ea3081
    [NPU] Reuse prefill of acc lib for pipeline (#12279) Ruonan Wang 2024-10-28 16:05:49 +0800
  • 42a528ded9
    Small update to MTL iGPU Linux Prerequisites installation guide (#12281) Yuwen Hu 2024-10-28 14:12:07 +0800
  • 16074ae2a4
    Update Linux prerequisites installation guide for MTL iGPU (#12263) Yuwen Hu 2024-10-28 09:27:14 +0800
  • ec362e6133
    Add llama3 level0 example (#12275) binbin Deng 2024-10-28 09:24:51 +0800
  • 08cb065370
    hot-fix redundant import funasr (#12277) SONG Ge 2024-10-25 19:40:39 +0800
  • a0c6432899
    [NPU] Add support for loading a FunASR model (#12073) SONG Ge 2024-10-25 17:22:01 +0800
  • 854398f6e0
    update example to reduce peak memory usage (#12274) Ruonan Wang 2024-10-25 17:09:26 +0800
  • e713296090
    Update all-in-one benchmark (#12272) Yuwen Hu 2024-10-25 16:52:59 +0800
  • 43b25a2fe7
    Fix llama 3.2 vision on LNL (#12264) Yuwen Hu 2024-10-25 16:23:31 +0800
  • 94c4568988
    Update windows installation guide regarding troubleshooting (#12270) Yuwen Hu 2024-10-25 14:32:38 +0800
  • 93895b2ac2
OpenVINO all-in-one benchmark small fix (#12269) Yuwen Hu 2024-10-25 14:13:52 +0800
  • f7f62a3fef
    Add OpenVINO performance tests to all-in-one benchmark (#12238) Zijie Li 2024-10-25 01:53:53 -0400
  • ae57e23e4f
    fix incompatibility between llama GW & llama pipeline (#12267) Ruonan Wang 2024-10-25 10:31:44 +0800
  • b5e663854b
    [NPU] Support llama groupwise (#12260) Yina Chen 2024-10-24 13:06:45 +0300
  • 48fc63887d
    use oneccl 0.0.5.1 (#12262) Shaojun Liu 2024-10-24 16:12:24 +0800
  • e0a95eb2d6
    Add llama_cpp_quickstart.zh-CN.md (#12221) joan726 2024-10-24 16:08:31 +0800
  • 39c9d1de52
    fix code geex (#12261) Xin Qiu 2024-10-24 14:34:01 +0800
  • f3a2b20e6b
    Optimize gpt2 (#12259) Yishuo Wang 2024-10-24 13:44:24 +0800
  • 821fd96367
Initial integration of our L0 Llama impl into ipex-llm (#12255) Ruonan Wang 2024-10-24 09:49:27 +0800
  • cacc891962
    Fix PR validation (#12253) Yishuo Wang 2024-10-23 18:10:47 +0800
  • b685cf4349
    Fix npu group size setting of optimize_model=False (#12256) binbin Deng 2024-10-23 17:53:54 +0800
  • 567b77a76b
    Support IR and blob format for llama level0 pipeline (#12251) binbin Deng 2024-10-23 16:02:35 +0800
  • 578aef245d
Fix models auto-choosing SdpaAttention with ipex 2.3 (#12252) Yishuo Wang 2024-10-23 15:33:45 +0800
  • 88dc120a4c
    fix fp16 linear (#12250) Yishuo Wang 2024-10-23 14:35:19 +0800
  • e8cf7f32f5
    npu gw small fix (#12249) Yina Chen 2024-10-23 09:26:01 +0300
  • aae2490cb8
    fix UT (#12247) Shaojun Liu 2024-10-23 14:13:06 +0800
  • e37f951cce
    [NPU] Groupwise (#12241) Yina Chen 2024-10-23 09:10:58 +0300
  • aedc4edfba
[ADD] add Open WebUI + vLLM serving (#12246) Jun Wang 2024-10-23 10:13:14 +0800
  • 8fa98e2742
    Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" (#12245) Jin, Qiao 2024-10-22 17:07:51 +0800
  • ec465fbcd7
    Add lookup generate in load_low_bit (#12243) Yina Chen 2024-10-22 10:51:52 +0300
  • d8c1287335
    Further update for Windows dGPU performance tests (#12244) Yuwen Hu 2024-10-22 15:07:21 +0800
  • a35cf4d533
    Update README.md (#12242) Jason Dai 2024-10-22 10:19:07 +0800
  • b3df47486d
    Fix Gemma 2 on LNL (#12240) Yuwen Hu 2024-10-21 18:25:53 +0800
  • ac2dac857c
    Disable 4k input test for now for Windows dGPU performance test (#12239) Yuwen Hu 2024-10-21 15:03:26 +0800
  • ea5154d85e
    Further update to Windows dGPU perf test (#12237) Yuwen Hu 2024-10-21 10:27:16 +0800
  • da9270be2d
    Further update to Windows dGPU perf test (#12233) Yuwen Hu 2024-10-18 23:20:17 +0800
  • 5935b25622
    Further update windows gpu perf test regarding results integrity check (#12232) Yuwen Hu 2024-10-18 18:15:13 +0800
  • ef659629f3
    Small update to Windows dGPU perf test (#12230) Yuwen Hu 2024-10-18 16:39:59 +0800
  • 9d7f42fd0f
    Support manually trigger of dGPU perf test on Windows (#12229) Yuwen Hu 2024-10-18 15:38:21 +0800
  • b10fc892e1
    Update new reference link of xpu/docker/readme.md (#12188) Jun Wang 2024-10-18 13:18:08 +0800
  • fe3b5cd89b
[Update] mmdocs/dockerguide vllm-quick-start AWQ, GPTQ online serving document (#12227) Jun Wang 2024-10-18 09:46:59 +0800
  • 7825dc1398
    Upgrade oneccl to 0.0.5 (#12223) Shaojun Liu 2024-10-18 09:29:19 +0800
  • b88c1df324
    Add Llama 3.1 & 3.2 to Arc Performance test (#12225) Yuwen Hu 2024-10-17 21:12:45 +0800
  • 9ea694484d
refactor to remove old rope usage (#12224) Yishuo Wang 2024-10-17 17:06:09 +0800
  • 324bcb057e
    refactor to reduce old rope usage (#12219) Yishuo Wang 2024-10-17 14:45:09 +0800
  • 667f0db466
    Update Eagle example to Eagle2+ipex-llm integration (#11717) Jiao Wang 2024-10-17 14:16:14 +0800
  • 26390f9213
    Update oneccl_wks_installer to 2024.0.0.4.1 (#12217) Shaojun Liu 2024-10-17 10:11:55 +0800
  • a4a758656a
    refactor gemma to reduce old fuse rope usage (#12215) Yishuo Wang 2024-10-16 17:40:28 +0800
  • 9104a168f6
    refactor phi-2 to reduce old fuse rope usage (#12214) Yishuo Wang 2024-10-16 17:08:14 +0800
  • bb247e991b
    refactor merge_qkv and attention_softmax (#12213) Yishuo Wang 2024-10-16 15:58:14 +0800
  • e279148aa0
    optimize llama3.2 vision again (#12211) Yishuo Wang 2024-10-16 14:29:48 +0800
  • f17cc4fdee
feat: add llama3.2-11b-vision in all-in-one (#12207) Chu,Youcheng 2024-10-16 10:32:11 +0800
  • c9ac39fc1e
    Add Llama 3.2 to iGPU performance test (transformers 4.45) (#12209) Yuwen Hu 2024-10-15 17:44:46 +0800
  • f6611f9d3a
optimize llama3.2 vision attention again (#12204) Yishuo Wang 2024-10-15 16:08:20 +0800
  • 9b81236a2e
optimize qwen2-vl vision (#12203) Yishuo Wang 2024-10-15 15:54:25 +0800
  • d5344587ab
    optimize internvl2 vision model's attention (#12198) Yishuo Wang 2024-10-15 10:51:00 +0800
  • f8d1adc573
    Fix Llama 3.2 & 3.1 on LNL (#12196) Yuwen Hu 2024-10-14 17:39:20 +0800
  • 516b578104
    Support cpp release for ARL on Windows (#12189) Yuwen Hu 2024-10-14 17:20:31 +0800
  • 7da3ab7322
    Add missing link for Llama3.2-Vision (#12197) Yuwen Hu 2024-10-14 17:19:49 +0800
  • 7d80db710e
    Add benchmark_util for transformers >= 4.44.0 (#12171) Zijie Li 2024-10-14 03:40:12 -0400
  • 8e35800abe
    Add llama 3.1 in igpu perf (#12194) Jin, Qiao 2024-10-14 15:14:34 +0800
  • a768d71581
    Small fix to LNL installation guide (#12192) Yuwen Hu 2024-10-14 12:03:03 +0800
  • 49eb20613a
    add --blocksize to doc and script (#12187) Shaojun Liu 2024-10-12 09:17:42 +0800
  • 6ffaec66a2
    [UPDATE] add prefix caching document into vllm_docker_quickstart.md (#12173) Jun Wang 2024-10-11 19:12:22 +0800