Commit graph

  • 8fe01c9e4d
    [NPU pipeline] update cmake usage of pipeline (#12320) Ruonan Wang 2024-11-04 10:30:03 +0800
  • c8679ad592
    Qwen layernorm as input (#12309) Kai Huang 2024-11-04 09:51:15 +0800
  • 94ce447794
    Fix performance tests regarding trl version (#12319) Yuwen Hu 2024-11-04 09:42:18 +0800
  • 20755e8077
    Small fix to all-in-one benchmark scripts (#12317) Yuwen Hu 2024-11-01 19:16:25 +0800
  • 48123af463
    add npu_group_size for transformers_int4_npu_win in all-in-one benchmark api (#12316) Ch1y0q 2024-11-01 18:44:27 +0800
  • cd5e22cee5
    Update Llava GPU Example (#12311) Zijie Li 2024-11-01 05:06:00 -0400
  • f53bb4ea0b
    [NPU L0] Update 1st token generation (#12314) binbin Deng 2024-11-01 17:02:07 +0800
  • d409d9d0eb
    [NPU L0] Update streaming mode of example (#12312) binbin Deng 2024-11-01 15:38:10 +0800
  • 126f95be80
    Fix DPO finetuning example (#12313) Jin, Qiao 2024-11-01 13:29:44 +0800
  • 05c5d0267a
    [NPU] Llama2 prefill use ov sdp (#12310) Yina Chen 2024-11-01 05:05:20 +0200
  • eda764909c
    Add minicpm-2b in L0 pipeline (#12308) binbin Deng 2024-11-01 09:30:01 +0800
  • b9853f98b3
    fix qwen2 attention_mask slice (#12307) Yishuo Wang 2024-10-31 17:00:05 +0800
  • 3df6195cb0
    Fix application quickstart (#12305) Jin, Qiao 2024-10-31 16:57:35 +0800
  • 4892df61c9
    Add qwen2-1.5b in l0 pipeline example (#12306) binbin Deng 2024-10-31 16:44:25 +0800
  • 30f668c206
    updated transformers & accelerate requirements (#12301) Jinhe 2024-10-31 15:59:40 +0800
  • 97a0f7fd35
    Codegeex support (#12303) Xin Qiu 2024-10-31 15:28:56 +0800
  • 72605c7016
    fix llama3.1/3.2 quantize kv check (#12302) Yishuo Wang 2024-10-31 11:55:07 +0800
  • 416c19165c
    Add Qwen pipeline and example (#12292) Kai Huang 2024-10-31 11:25:25 +0800
  • 4cf1ccc43a
Update DPO README.md (#12162) Rahul Nair 2024-10-30 18:56:46 -0800
  • 29400e2e75
    feat: change oneccl to internal (#12296) Chu,Youcheng 2024-10-31 09:51:43 +0800
  • 6f22133efc
    Update AWQ and GPTQ GPU example (#12300) Zijie Li 2024-10-30 21:35:31 -0400
  • 0763268e4c
[NPU] Qwen2 groupwise performance opt (#12299) Yina Chen 2024-10-30 11:40:21 +0200
  • 41b8064554
    Support minicpm-1B in level0 pipeline (#12297) binbin Deng 2024-10-30 17:21:47 +0800
  • 46d8300f6b
    bugfix for qlora finetuning on GPU (#12298) Jinhe 2024-10-30 16:54:10 +0800
  • 70037ad55f
    Groupwise prefill optimization (#12291) Yina Chen 2024-10-30 08:59:45 +0200
  • 540eaeb12c
    refactor attention_softmax (#12295) Yishuo Wang 2024-10-30 13:20:50 +0800
  • 2b2cb9c693
    [NPU pipeline] Support save & load and update examples (#12293) Ruonan Wang 2024-10-30 10:02:00 +0800
  • 5a15098835
    Initial support for quantized forward on CPU when quantization_group_size=0 (#12282) Yuwen Hu 2024-10-29 19:40:17 +0800
  • 3feb58d1e4
    Support baichuan2 for level0 pipeline (#12289) binbin Deng 2024-10-29 19:24:16 +0800
  • 546f455e8e
    Patch sdpa check function in specific module attributes table (#12285) Zhao Changmin 2024-10-29 18:41:09 +0800
  • 3700e81977
    [fix] vllm-online-benchmark first token latency error (#12271) Jun Wang 2024-10-29 17:54:36 +0800
  • 0bbc04b5ec
    Add ollama_quickstart.zh-CN.md (#12284) joan726 2024-10-29 15:12:44 +0800
  • 821b0033ed
    [NPU L0] update layernorm & code refactor (#12287) Ruonan Wang 2024-10-29 15:01:45 +0800
  • 4467645088
    [NPU] Support l0 Llama groupwise (#12276) Yina Chen 2024-10-28 11:06:55 +0200
  • 1cef0c4948
    Update README.md (#12286) Jason Dai 2024-10-28 17:06:16 +0800
  • 67014cb29f
    Add benchmark_latency.py to docker serving image (#12283) Guancheng Fu 2024-10-28 16:19:59 +0800
  • 3fe2ea3081
    [NPU] Reuse prefill of acc lib for pipeline (#12279) Ruonan Wang 2024-10-28 16:05:49 +0800
  • 42a528ded9
    Small update to MTL iGPU Linux Prerequisites installation guide (#12281) Yuwen Hu 2024-10-28 14:12:07 +0800
  • 16074ae2a4
    Update Linux prerequisites installation guide for MTL iGPU (#12263) Yuwen Hu 2024-10-28 09:27:14 +0800
  • ec362e6133
    Add llama3 level0 example (#12275) binbin Deng 2024-10-28 09:24:51 +0800
  • 08cb065370
    hot-fix redundant import funasr (#12277) SONG Ge 2024-10-25 19:40:39 +0800
  • a0c6432899
    [NPU] Add support for loading a FunASR model (#12073) SONG Ge 2024-10-25 17:22:01 +0800
  • 854398f6e0
    update example to reduce peak memory usage (#12274) Ruonan Wang 2024-10-25 17:09:26 +0800
  • e713296090
    Update all-in-one benchmark (#12272) Yuwen Hu 2024-10-25 16:52:59 +0800
  • 43b25a2fe7
    Fix llama 3.2 vision on LNL (#12264) Yuwen Hu 2024-10-25 16:23:31 +0800
  • 94c4568988
    Update windows installation guide regarding troubleshooting (#12270) Yuwen Hu 2024-10-25 14:32:38 +0800
  • 93895b2ac2
OpenVINO all-in-one benchmark small fix (#12269) Yuwen Hu 2024-10-25 14:13:52 +0800
  • f7f62a3fef
    Add OpenVINO performance tests to all-in-one benchmark (#12238) Zijie Li 2024-10-25 01:53:53 -0400
  • ae57e23e4f
    fix incompatibility between llama GW & llama pipeline (#12267) Ruonan Wang 2024-10-25 10:31:44 +0800
  • b5e663854b
    [NPU] Support llama groupwise (#12260) Yina Chen 2024-10-24 13:06:45 +0300
  • 48fc63887d
    use oneccl 0.0.5.1 (#12262) Shaojun Liu 2024-10-24 16:12:24 +0800
  • e0a95eb2d6
    Add llama_cpp_quickstart.zh-CN.md (#12221) joan726 2024-10-24 16:08:31 +0800
  • 39c9d1de52
    fix code geex (#12261) Xin Qiu 2024-10-24 14:34:01 +0800
  • f3a2b20e6b
    Optimize gpt2 (#12259) Yishuo Wang 2024-10-24 13:44:24 +0800
  • 821fd96367
Initial integration of our L0 Llama impl into ipex-llm (#12255) Ruonan Wang 2024-10-24 09:49:27 +0800
  • cacc891962
    Fix PR validation (#12253) Yishuo Wang 2024-10-23 18:10:47 +0800
  • b685cf4349
    Fix npu group size setting of optimize_model=False (#12256) binbin Deng 2024-10-23 17:53:54 +0800
  • 567b77a76b
    Support IR and blob format for llama level0 pipeline (#12251) binbin Deng 2024-10-23 16:02:35 +0800
  • 578aef245d
Fix models auto-choosing SdpaAttention with ipex 2.3 (#12252) Yishuo Wang 2024-10-23 15:33:45 +0800
  • 88dc120a4c
    fix fp16 linear (#12250) Yishuo Wang 2024-10-23 14:35:19 +0800
  • e8cf7f32f5
    npu gw small fix (#12249) Yina Chen 2024-10-23 09:26:01 +0300
  • aae2490cb8
    fix UT (#12247) Shaojun Liu 2024-10-23 14:13:06 +0800
  • e37f951cce
    [NPU] Groupwise (#12241) Yina Chen 2024-10-23 09:10:58 +0300
  • aedc4edfba
[ADD] add Open WebUI + vLLM serving (#12246) Jun Wang 2024-10-23 10:13:14 +0800
  • 8fa98e2742
    Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" (#12245) Jin, Qiao 2024-10-22 17:07:51 +0800
  • ec465fbcd7
    Add lookup generate in load_low_bit (#12243) Yina Chen 2024-10-22 10:51:52 +0300
  • d8c1287335
    Further update for Windows dGPU performance tests (#12244) Yuwen Hu 2024-10-22 15:07:21 +0800
  • a35cf4d533
    Update README.md (#12242) Jason Dai 2024-10-22 10:19:07 +0800
  • b3df47486d
    Fix Gemma 2 on LNL (#12240) Yuwen Hu 2024-10-21 18:25:53 +0800
  • ac2dac857c
    Disable 4k input test for now for Windows dGPU performance test (#12239) Yuwen Hu 2024-10-21 15:03:26 +0800
  • ea5154d85e
    Further update to Windows dGPU perf test (#12237) Yuwen Hu 2024-10-21 10:27:16 +0800
  • da9270be2d
    Further update to Windows dGPU perf test (#12233) Yuwen Hu 2024-10-18 23:20:17 +0800
  • 5935b25622
    Further update windows gpu perf test regarding results integrity check (#12232) Yuwen Hu 2024-10-18 18:15:13 +0800
  • ef659629f3
    Small update to Windows dGPU perf test (#12230) Yuwen Hu 2024-10-18 16:39:59 +0800
  • 9d7f42fd0f
    Support manually trigger of dGPU perf test on Windows (#12229) Yuwen Hu 2024-10-18 15:38:21 +0800
  • b10fc892e1
    Update new reference link of xpu/docker/readme.md (#12188) Jun Wang 2024-10-18 13:18:08 +0800
  • fe3b5cd89b
[Update] mmdocs/dockerguide vllm-quick-start AWQ, GPTQ online serving document (#12227) Jun Wang 2024-10-18 09:46:59 +0800
  • 7825dc1398
    Upgrade oneccl to 0.0.5 (#12223) Shaojun Liu 2024-10-18 09:29:19 +0800
  • b88c1df324
    Add Llama 3.1 & 3.2 to Arc Performance test (#12225) Yuwen Hu 2024-10-17 21:12:45 +0800
  • 9ea694484d
refactor to remove old rope usage (#12224) Yishuo Wang 2024-10-17 17:06:09 +0800
  • 324bcb057e
    refactor to reduce old rope usage (#12219) Yishuo Wang 2024-10-17 14:45:09 +0800
  • 667f0db466
    Update Eagle example to Eagle2+ipex-llm integration (#11717) Jiao Wang 2024-10-17 14:16:14 +0800
  • 26390f9213
    Update oneccl_wks_installer to 2024.0.0.4.1 (#12217) Shaojun Liu 2024-10-17 10:11:55 +0800
  • a4a758656a
    refactor gemma to reduce old fuse rope usage (#12215) Yishuo Wang 2024-10-16 17:40:28 +0800
  • 9104a168f6
    refactor phi-2 to reduce old fuse rope usage (#12214) Yishuo Wang 2024-10-16 17:08:14 +0800
  • bb247e991b
    refactor merge_qkv and attention_softmax (#12213) Yishuo Wang 2024-10-16 15:58:14 +0800
  • e279148aa0
    optimize llama3.2 vision again (#12211) Yishuo Wang 2024-10-16 14:29:48 +0800
  • f17cc4fdee
feat: add llama3.2-11b-vision in all-in-one (#12207) Chu,Youcheng 2024-10-16 10:32:11 +0800
  • c9ac39fc1e
    Add Llama 3.2 to iGPU performance test (transformers 4.45) (#12209) Yuwen Hu 2024-10-15 17:44:46 +0800
  • f6611f9d3a
optimize llama3.2 vision attention again (#12204) Yishuo Wang 2024-10-15 16:08:20 +0800
  • 9b81236a2e
optimize qwen2-vl vision (#12203) Yishuo Wang 2024-10-15 15:54:25 +0800
  • d5344587ab
    optimize internvl2 vision model's attention (#12198) Yishuo Wang 2024-10-15 10:51:00 +0800
  • f8d1adc573
    Fix Llama 3.2 & 3.1 on LNL (#12196) Yuwen Hu 2024-10-14 17:39:20 +0800
  • 516b578104
    Support cpp release for ARL on Windows (#12189) Yuwen Hu 2024-10-14 17:20:31 +0800
  • 7da3ab7322
    Add missing link for Llama3.2-Vision (#12197) Yuwen Hu 2024-10-14 17:19:49 +0800
  • 7d80db710e
    Add benchmark_util for transformers >= 4.44.0 (#12171) Zijie Li 2024-10-14 03:40:12 -0400
  • 8e35800abe
    Add llama 3.1 in igpu perf (#12194) Jin, Qiao 2024-10-14 15:14:34 +0800
  • a768d71581
    Small fix to LNL installation guide (#12192) Yuwen Hu 2024-10-14 12:03:03 +0800
  • 49eb20613a
    add --blocksize to doc and script (#12187) Shaojun Liu 2024-10-12 09:17:42 +0800
  • 6ffaec66a2
    [UPDATE] add prefix caching document into vllm_docker_quickstart.md (#12173) Jun Wang 2024-10-11 19:12:22 +0800