Commit graph

  • ddcdf47539
    Support Windows ARL release (#12183) Yuwen Hu 2024-10-11 18:30:52 +0800
  • f983f1a8f4
    Add Qwen2-VL gpu example (#12135) Jinhe 2024-10-11 18:25:23 +0800
  • 310f18c8af
    update NPU pipeline generate (#12182) Ruonan Wang 2024-10-11 17:39:20 +0800
  • 1daab4531f
    Upgrade oneccl to 0.0.4 in serving-xpu image (#12185) Shaojun Liu 2024-10-11 16:54:50 +0800
  • 724b2ae66d
    add npu-level0 pipeline.dll to ipex-llm (#12181) Shaojun Liu 2024-10-11 16:05:20 +0800
  • 4d93bb81fe
    Initial support of NPU level0 Model (#12177) Ruonan Wang 2024-10-11 09:45:53 +0800
  • ac44e98b7d
    Update Windows guide regarding LNL support (#12178) Yuwen Hu 2024-10-11 09:20:08 +0800
  • 0ef7e1d101
    fix vllm docs (#12176) Guancheng Fu 2024-10-10 15:44:36 +0800
  • 890662610b
    Fix auto importer for LNL release (#12175) Yuwen Hu 2024-10-10 15:17:43 +0800
  • 535bee5381
    fix qwen2 vl again (#12174) Yishuo Wang 2024-10-10 13:50:01 +0800
  • aef1f671bd
    Support LNL Windows release (#12169) Yuwen Hu 2024-10-09 17:41:10 +0800
  • 78d253165d
    optimize qwen2 vl perf again (#12167) Yishuo Wang 2024-10-09 16:43:48 +0800
  • 412cf8e20c
    [UPDATE] update mddocs/DockerGuides/vllm_docker_quickstart.md (#12166) Jun Wang 2024-10-09 11:19:32 +0800
  • 3d044dbf53
    add llama3.2-vision Pytorch example (#12165) Zijie Li 2024-10-08 21:20:42 -0400
  • e2ef9e938e
    Delete deprecated docs/readthedocs directory (#12164) Shaojun Liu 2024-10-08 14:48:02 +0800
  • 644af2a76e
    add basic llama 3.2 vision support (#12163) Yishuo Wang 2024-10-08 10:46:48 +0800
  • 9b75806d14
    Update Windows GPU quickstart regarding demo (#12124) Ch1y0q 2024-09-29 18:08:49 +0800
  • 17c23cd759
    add llama3.2 GPU example (#12137) Ch1y0q 2024-09-29 14:41:54 +0800
  • f71b38a994
    Update MiniCPM_V_26 GPU example with save & load (#12127) Yuwen Hu 2024-09-26 17:40:22 +0800
  • 669ff1a97b
    fix sd1.5 (#12129) Yishuo Wang 2024-09-26 17:15:16 +0800
  • a266528719
    optimize llama 3.2 rope (#12128) Yishuo Wang 2024-09-26 16:08:10 +0800
  • 584c3489e7
    add basic support for llama3.2 (#12125) Yishuo Wang 2024-09-26 15:46:19 +0800
  • 66f419f8b7
    fix qwen2 vl (#12126) Yishuo Wang 2024-09-26 15:44:02 +0800
  • 2ea13d502f
    Add minicpm3 gpu example (#12114) Ch1y0q 2024-09-26 13:51:37 +0800
  • 77af9bc5fa
    support passing None to low_bit in optimize_model (#12121) Yishuo Wang 2024-09-26 11:09:35 +0800
  • 47e0b83cbf
    optimize sd 1.5 (#12119) Yishuo Wang 2024-09-25 15:45:13 +0800
  • 2bedb17be7
    Add Qwen2.5 NPU Example (#12110) Jin, Qiao 2024-09-25 15:20:03 +0800
  • 657889e3e4
    use english prompt by default (#12115) Shaojun Liu 2024-09-24 17:40:50 +0800
  • 5d63aef60b
    optimize qwen2 vl again (#12109) Yishuo Wang 2024-09-23 13:22:01 +0800
  • 03bd01c99c
    optimize npu qwen2 (#12107) Ruonan Wang 2024-09-20 04:46:16 -0700
  • 02399021d6
    add npu load_low_bit api in all-in-one benchmark (#12103) Jinhe 2024-09-20 17:56:08 +0800
  • 47a9597f24
    Add missing link for Qwen2.5 to CN-ZH readme (#12106) Yuwen Hu 2024-09-20 17:30:30 +0800
  • 9239fd4f12
    add basic support and optimization for qwen2-vl (#12104) Yishuo Wang 2024-09-20 17:23:06 +0800
  • 828fa01ad3
    [NPU] Add mixed_precision for Qwen2 7B (#12098) Yuwen Hu 2024-09-20 16:36:21 +0800
  • 2269768e71
    add internvl2 example (#12102) Ch1y0q 2024-09-20 16:31:54 +0800
  • ad1fe77fe6
    Add language switching (#12096) joan726 2024-09-20 16:05:20 +0800
  • 09b8c80d9d
    update code for NPU qwen2 (#12094) Ruonan Wang 2024-09-20 00:58:32 -0700
  • db7500bfd4
    Add Qwen2.5 GPU example (#12101) Jin, Qiao 2024-09-20 15:55:57 +0800
  • b36359e2ab
    Fix xpu serving image oneccl (#12100) Guancheng Fu 2024-09-20 15:25:41 +0800
  • 54b973c744
    fix ipex_llm import in transformers 4.45 (#12099) Yishuo Wang 2024-09-20 15:24:59 +0800
  • a6cbc01911
    Use new oneccl for ipex-llm serving image (#12097) Guancheng Fu 2024-09-20 14:52:49 +0800
  • 1295898830
    update vllm_online_benchmark script to support long input (#12095) Shaojun Liu 2024-09-20 14:18:30 +0800
  • 9650bf616a
    add transpose_value_cache for NPU benchmark (#12092) Ch1y0q 2024-09-19 18:45:05 +0800
  • f7fb3c896c
    Update lm_head optimization for Qwen2 7B (#12090) Yuwen Hu 2024-09-18 17:02:02 +0800
  • ee33b93464
    Longbench: NV code to ipex-llm (#11662) Xu, Shuo 2024-09-18 15:55:14 +0800
  • 40e463c66b
    Enable vllm load gptq model (#12083) Wang, Jian4 2024-09-18 14:41:00 +0800
  • c2774e1a43
    Update oneccl to 0.0.3 in serving-xpu image (#12088) Xiangyu Tian 2024-09-18 14:29:17 +0800
  • 081af41def
    [NPU] Optimize Qwen2 lm_head to use INT4 (#12072) Ruonan Wang 2024-09-14 00:26:46 -0700
  • 18714ceac7
    Update README.md (#12084) joan726 2024-09-14 15:24:08 +0800
  • b4b8c3e495
    add lowbit_path for generate.py, fix npu_model (#12077) Ch1y0q 2024-09-13 17:28:05 +0800
  • d703e4f127
    Enable vllm multimodal minicpm-v-2-6 (#12074) Wang, Jian4 2024-09-13 13:28:35 +0800
  • a767438546
    fix typo (#12076) Ruonan Wang 2024-09-12 20:44:42 -0700
  • 3f0b24ae2b
    update cpp quickstart (#12075) Ruonan Wang 2024-09-12 20:35:32 -0700
  • 9b4fee8b5b
    disable nightly release for finetune images (#12070) Shaojun Liu 2024-09-12 15:10:50 +0800
  • beb876665d
    pin gradio version to fix connection error (#12069) Shaojun Liu 2024-09-12 14:36:09 +0800
  • 48d9092b5a
    upgrade OneAPI version for cpp Windows (#12063) Ruonan Wang 2024-09-11 20:12:12 -0700
  • e78e45ee01
    update NPU readme: run conhost as administrator (#12066) Jinhe 2024-09-11 17:54:04 +0800
  • 4ca330da15
    Fix NPU load error message and add minicpm npu lowbit feat (#12064) Jinhe 2024-09-11 16:56:35 +0800
  • 32e8362da7
    added minicpm cpu examples (#12027) Jinhe 2024-09-11 15:51:21 +0800
  • a0c73c26d8
    clean NPU code (#12060) Ruonan Wang 2024-09-11 00:10:35 -0700
  • c75f3dd874
    vllm no padding glm4 to avoid nan error (#12062) Wang, Jian4 2024-09-11 13:44:40 +0800
  • 649390c464
    fix: textual and env variable adjustment (#12038) Chu,Youcheng 2024-09-11 13:38:01 +0800
  • c94032f97e
    Try to fix llamaindex ut again (#12061) Yuwen Hu 2024-09-11 12:11:04 +0800
  • 7e1e51d91a
    Update vllm setting (#12059) Shaojun Liu 2024-09-11 11:45:08 +0800
  • 30a8680645
    Update for vllm one card padding (#12058) Wang, Jian4 2024-09-11 10:52:55 +0800
  • c5fdfde1bd
    fix npu-model prompt (#12057) Zijie Li 2024-09-11 10:06:45 +0800
  • 94dade9aca
    Fix UT of ipex_llm.llamaindex (#12055) Yuwen Hu 2024-09-11 09:58:43 +0800
  • 52863dd567
    fix vllm_online_benchmark.py (#12056) Shaojun Liu 2024-09-11 09:45:30 +0800
  • d8c044e79d
    optimize minicpm3 kv cache (#12052) Yishuo Wang 2024-09-10 16:51:21 +0800
  • 5d3ab16a80
    Add vllm glm and baichuan padding (#12053) Wang, Jian4 2024-09-10 15:57:28 +0800
  • 69c8d36f16
    Switching from vLLM v0.3.3 to vLLM 0.5.4 (#12042) Guancheng Fu 2024-09-10 15:37:43 +0800
  • 73a4360f3f
    update lowbit path for baichuan2, qwen2, generate.py (#12051) Ch1y0q 2024-09-10 15:35:24 +0800
  • dc4af02b2a
    Fix qwen2 1.5B NPU load error (#12049) Ruonan Wang 2024-09-09 23:41:18 -0700
  • abc370728c
    optimize minicpm3 again (#12047) Yishuo Wang 2024-09-10 14:19:57 +0800
  • f0061a9916
    remove local import os to fix Baichuan NPU load issue (#12044) Ch1y0q 2024-09-10 14:13:24 +0800
  • 640998edea
    update inter_pp of qwen2 (#12041) Ruonan Wang 2024-09-09 19:34:17 -0700
  • 048b4590aa
    add basic minicpm3 optimization (#12039) Yishuo Wang 2024-09-09 17:25:08 +0800
  • 16c658e732
    LLM: add known issues to harness evaluation (#12036) Chu,Youcheng 2024-09-09 14:15:42 +0800
  • 6cedb601e4
    remove some useless code (#12035) Yishuo Wang 2024-09-06 17:51:08 +0800
  • d2e1b9aaff
    Add input padding during prefill for qwen2-7b (#12033) binbin Deng 2024-09-06 16:39:59 +0800
  • f61b1785fb
    Small update to NPU example readme (#12034) Yuwen Hu 2024-09-06 15:54:23 +0800
  • 0d04531ae0
    update NPU readme of Qwen2 (#12032) Ruonan Wang 2024-09-06 00:02:39 -0700
  • 58555bd9de
    Optimize broadcast for npu llama (#12028) Yang Wang 2024-09-05 22:28:20 -0700
  • e5581e6ded
    Select the Appropriate APT Repository Based on CPU Type (#12023) Shaojun Liu 2024-09-05 17:06:07 +0800
  • 5b18bb3c4a
    Add recommend version for mtl npu (#12024) binbin Deng 2024-09-05 16:28:53 +0800
  • 845e5dc89e
    Support lm_head of minicpm-2b on NPU (#12019) binbin Deng 2024-09-05 16:19:22 +0800
  • 820f8a4554
    add --lowbit-path option for NPU llama example (#12020) Ch1y0q 2024-09-05 15:31:01 +0800
  • 8803242f5c
    fix llama on cpu (#12018) Guoqiong Song 2024-09-04 19:17:54 -0700
  • b3b2cd64b4
    Support lightweight-serving glm-4v-9b (#11994) Wang, Jian4 2024-09-05 09:25:08 +0800
  • 75b19f8522
    revert actions/download-artifact version to 3 (#12017) Shaojun Liu 2024-09-04 22:39:07 +0800
  • c6348a4666
    Update action.yml (#12016) Shaojun Liu 2024-09-04 22:12:24 +0800
  • b1408a1f1c
    fix UT (#12005) Yishuo Wang 2024-09-04 18:02:49 +0800
  • 77cb348220
    fix dependabot alerts (#12006) Shaojun Liu 2024-09-04 17:13:45 +0800
  • 2b993ad479
    vllm update for glm-4 model automatic not_convert (#12003) Wang, Jian4 2024-09-04 13:50:32 +0800
  • 9eaff5e47d
    add save & load support for NPU optimized model (#11999) Ruonan Wang 2024-09-03 05:53:22 -0700
  • 6eb55653ba
    Performance mode strategy update for input_embeds input (#11997) Yuwen Hu 2024-09-03 17:46:16 +0800
  • 164f47adbd
    MiniCPM-V-2 & MiniCPM-Llama3-V-2_5 example updates (#11988) Jinhe 2024-09-03 17:02:06 +0800
  • 2e54f4402b
    Rename MiniCPM-V-2_6 CPU example (#11998) Jin, Qiao 2024-09-03 16:50:42 +0800
  • 643458d8f0
    Update GraphRAG QuickStart (#11995) Yuwen Hu 2024-09-03 15:52:08 +0800
  • 01099f08ee
    Revert prefill logic of qwen2-7b (#11992) binbin Deng 2024-09-03 14:45:01 +0800