Commit graph

  • 4135b895b3
    refactor chatglm2, internlm, stablelm and qwen (#12604) Yishuo Wang 2024-12-24 18:18:00 +0800
  • 073f936c37
    refactor mistral and phi3 (#12605) Yishuo Wang 2024-12-24 17:52:32 +0800
  • 45f8f72a28
    [NPU] Fix minicpm on MTL (#12599) binbin Deng 2024-12-24 15:37:56 +0800
  • ad2dc965c5
    refactor mllama, gpt2 and internvl (#12602) Yishuo Wang 2024-12-24 14:18:31 +0800
  • 7aaf02f602
    refactor baichuan, glm4 and minicpm3 (#12600) Yishuo Wang 2024-12-24 14:16:30 +0800
  • c410d9cf73
    [NPU] support asym_int4 for baichuan (#12576) Zijie Li 2024-12-23 20:17:50 -0500
  • 098eb335b2
    refactor sd 1.5 and qwen2-vl and fix (#12590) Yishuo Wang 2024-12-20 17:34:55 +0800
  • b050368efc
    refactor yuan2 and starcoder2 and fix (#12589) Yishuo Wang 2024-12-20 16:41:50 +0800
  • 6ea8033635
    refactor glm edge (#12588) Yishuo Wang 2024-12-20 15:36:57 +0800
  • b0338c5529
    Add --modelscope option for glm-v4 MiniCPM-V-2_6 glm-edge and internvl2 (#12583) Xu, Shuo 2024-12-20 13:54:17 +0800
  • f3b5fad3be
    refactor qwen2 and llama3 (#12587) Yishuo Wang 2024-12-20 13:25:25 +0800
  • 51ff9ebd8a
    Upgrade oneccl version to 0.0.6.3 (#12560) Shaojun Liu 2024-12-20 09:29:16 +0800
  • 47da3c999f
    Add --modelscope in GPU examples for minicpm, minicpm3, baichuan2 (#12564) Xu, Shuo 2024-12-19 17:25:46 +0800
  • 3eeb02f1be
    support Megrez-3B-Omni (#12582) Yishuo Wang 2024-12-19 17:23:01 +0800
  • 4e7e988f70
    [NPU] Fix MTL and ARL support (#12580) binbin Deng 2024-12-19 16:55:30 +0800
  • 80f2fdc37b
    optimize new minicpm model (#12579) Yishuo Wang 2024-12-19 14:22:47 +0800
  • 4540424271
    optimize siglip attention again (#12578) Yishuo Wang 2024-12-19 13:40:48 +0800
  • e0921f80c1
    padding mask on torch side (#12577) Yishuo Wang 2024-12-19 10:53:02 +0800
  • 47e90a362f
    Add --modelscope in GPU examples for glm4, codegeex2, qwen2 and qwen2.5 (#12561) Xu, Shuo 2024-12-19 10:00:39 +0800
  • 28e81fda8e
    Replace runner doc in ollama quickstart (#12575) SONG Ge 2024-12-18 19:05:28 +0800
  • f7a2bd21cf
    Update ollama and llama.cpp readme (#12574) SONG Ge 2024-12-18 17:33:20 +0800
  • e2ae42929a
    small fix (#12573) Yishuo Wang 2024-12-18 15:48:22 +0800
  • a4eb561f36
    optimize siglip attention on arc (#12569) Yishuo Wang 2024-12-18 14:19:43 +0800
  • 1a2ab12876
    [NPU] support asym_int4 for minicpm (#12567) Zijie Li 2024-12-17 21:55:35 -0500
  • 6e801bc4e1
    Update readme (#12565) Jason Dai 2024-12-18 09:33:16 +0800
  • 6278cafc25
    Add setuptools as a basic dependency (#12563) Yuwen Hu 2024-12-17 16:56:41 +0800
  • 694d14b2b4
    [NPU doc] Add ARL runtime configuration (#12562) binbin Deng 2024-12-17 16:08:42 +0800
  • 429bf1ffeb
    Change: Use cn mirror for PyTorch extension installation to resolve network issues (#12559) Shaojun Liu 2024-12-17 14:22:50 +0800
  • fcb474820d
    [NPU] support asym_int4 for llama (#12556) Zijie Li 2024-12-17 01:01:17 -0500
  • d127a8654c
    Small typo fixes (#12558) Yuwen Hu 2024-12-17 13:54:13 +0800
  • a608f26cc8
    use new fused layer norm (#12553) Yishuo Wang 2024-12-17 13:52:35 +0800
  • 680ea7e4a8
    [NPU doc] Update configuration for different platforms (#12554) binbin Deng 2024-12-17 10:15:09 +0800
  • ccc18eefb5
    Add Modelscope option for chatglm3 on GPU (#12545) Xu, Shuo 2024-12-16 20:00:37 +0800
  • 5ae0006103
    remove old rope usage (#12552) Yishuo Wang 2024-12-16 15:59:36 +0800
  • a86487c539
    Add GLM-Edge GPU example (#12483) Chu,Youcheng 2024-12-16 14:39:19 +0800
  • 0b953e61ef
    [REFINE] graphmode code (#12540) Jun Wang 2024-12-16 09:17:01 +0800
  • caf15cc5ef
    [NPU] Add IPEX_LLM_NPU_MTL to enable support on mtl (#12543) binbin Deng 2024-12-13 17:01:13 +0800
  • c090d167dc
    remove old rope usage (#12544) Yishuo Wang 2024-12-13 16:54:58 +0800
  • 5402fc65c8
    [Ollama] Update ipex-llm ollama readme to v0.4.6 (#12542) SONG Ge 2024-12-13 16:26:12 +0800
  • d20a968ce2
    [NPU] Fix generate example (#12541) binbin Deng 2024-12-13 14:07:24 +0800
  • 15219944b8
    optimize glm edge again (#12539) Yishuo Wang 2024-12-13 13:52:39 +0800
  • 6596c18489
    [NPU] Modify IPEX_LLM_NPU_DISABLE_COMPILE_OPT setting for long input (#12537) binbin Deng 2024-12-13 13:49:56 +0800
  • 7cc01fdc86
    [NPU] further fix of new_value_states (#12538) Ruonan Wang 2024-12-12 21:42:00 -0800
  • fa261b8af1
    torch 2.3 inference docker (#12517) Heyang Sun 2024-12-13 10:47:04 +0800
  • b747f3f6b8
    Small fix to GPU installation guide (#12536) Yuwen Hu 2024-12-13 10:02:47 +0800
  • f36c23664f
    [NPU] Fix abnormal output with latest driver (#12530) binbin Deng 2024-12-12 17:56:30 +0800
  • ffce86d69f
    add basic glm-edge-v support (#12533) Yishuo Wang 2024-12-12 17:25:48 +0800
  • 3e0823d2ae
    add basic glm-edge support (#12531) Yishuo Wang 2024-12-12 16:02:22 +0800
  • dbaf4abcb3
    [NPU] Update C++ example with repetition_penalty & update Python code accordingly (#12528) Yuwen Hu 2024-12-12 13:42:55 +0800
  • 2cce89691a
    Enable use_batch_forward Optimization on Battlemage GPU (#12516) Shaojun Liu 2024-12-12 12:44:36 +0800
  • 6fc27da9c1
    [NPU] Update glm-edge support in docs (#12529) binbin Deng 2024-12-12 11:14:09 +0800
  • 509bdb4661
    [NPU] Fix minicpm-2B error (#12527) binbin Deng 2024-12-11 16:49:32 +0800
  • fd9cf767ed
    All-in-one Benchmark run.py: Ignore error if import BenchmarkWrapper failed. (#12526) Xu, Shuo 2024-12-11 16:20:55 +0800
  • 41ef4974ab
    [NPU] fix transpose_value = False for NPU optimize_model=True (#12525) Ruonan Wang 2024-12-10 23:51:39 -0800
  • 588bfa24dc
    support hqq (#12518) Ruonan Wang 2024-12-10 23:43:02 -0800
  • 68f2873bd3
    [NPU] Support repetition penalty for simple generate, Python (cpp backend) (#12522) Yuwen Hu 2024-12-11 14:55:25 +0800
  • 77404d2a63
    support new model (#12523) Yishuo Wang 2024-12-11 13:41:15 +0800
  • 922958c018
    vllm oneccl upgrade to b9 (#12520) Wang, Jian4 2024-12-10 15:02:56 +0800
  • ea55235cbd
    [NPU] Support glm-edge models (#12511) binbin Deng 2024-12-09 14:06:27 +0800
  • 12c78978dd
    [NPU C++] Update example with conversation mode support (#12510) binbin Deng 2024-12-06 12:46:37 +0800
  • 0918d3baca
    [NPU] Fix hf generate with save/load generation config for Python (cpp backend) (#12509) Yuwen Hu 2024-12-05 19:19:58 +0800
  • 49ab8974fa
    [NPU] initial support of asym_int4_rtn (#12484) Ruonan Wang 2024-12-05 01:40:36 -0800
  • 60bafab855
    Small fixes to main readme (#12508) Yuwen Hu 2024-12-05 16:08:43 +0800
  • 0a3eda06d0
    Update README.md (#12507) Jason Dai 2024-12-05 15:46:53 +0800
  • 5e1416c9aa
    fix readme for npu cpp examples and llama.cpp (#12505) Jinhe 2024-12-05 12:32:42 +0800
  • 727f29968c
    Add NPU demo gif to main readme (#12503) Yuwen Hu 2024-12-05 12:24:27 +0800
  • f56a111aa2
    [NPU] Fix load-low-bit benchmark script (#12502) binbin Deng 2024-12-05 10:01:32 +0800
  • 84f1c4ad57
    Small fix for NPU Python cpp simple generate regarding eos tokens (#12501) Yuwen Hu 2024-12-04 18:54:06 +0800
  • d8b14a6305
    Update save/load comments (#12500) Kai Huang 2024-12-04 18:51:38 +0800
  • b89ea1b0cf
    Support save/load model for hf generate (#12499) Kai Huang 2024-12-04 18:26:39 +0800
  • 7d27f134dd
    Fix hf generate for llama3.2 (#12497) Kai Huang 2024-12-04 17:54:40 +0800
  • ffa9a9e1b3
    Update streaming in npu examples (#12495) Chu,Youcheng 2024-12-04 17:51:10 +0800
  • a9e3f7f14c
    optimize minicpm (#12496) Yishuo Wang 2024-12-04 17:14:16 +0800
  • ae9c2154f4
    Added cross-links (#12494) joan726 2024-12-04 16:53:13 +0800
  • e0bf0054e1
    small fix (#12493) Yishuo Wang 2024-12-04 16:37:39 +0800
  • 7ff4533b39
    Support hf generate (#12477) Kai Huang 2024-12-04 16:31:09 +0800
  • ef4028ac2d
    [NPU] Support split lm_head for Qwen2 with CPP (#12491) Yuwen Hu 2024-12-04 14:41:08 +0800
  • 5629fdd518
    optimize qwen2_vl multiple image input or video input (#12487) Yishuo Wang 2024-12-04 09:24:38 +0800
  • c59284418c
    Hotfix of BCE-Embedding model (#12490) binbin Deng 2024-12-03 18:16:04 +0800
  • 80f15e41f5
    Update README.md (#12489) Jason Dai 2024-12-03 18:02:28 +0800
  • 4ac66db034
    [NPU] Support streaming in Python (cpp backend) (#12488) Yuwen Hu 2024-12-03 17:17:26 +0800
  • 7082844f3f
    Fix NPU LLM example save/load tokenizer (#12485) Jin, Qiao 2024-12-03 16:30:55 +0800
  • 5fe766788e
    Fix MiniCPM-V-2_6 running on NPU (#12486) Jin, Qiao 2024-12-03 16:16:29 +0800
  • 598603bea6
    small fix of imatrix (#12480) Ruonan Wang 2024-12-02 18:46:36 -0800
  • ab01753b1c
    [NPU] update save-load API usage (#12473) binbin Deng 2024-12-03 09:46:15 +0800
  • 26adb82ee3
    [NPU] Remove hard code (#12479) Yuwen Hu 2024-12-02 18:26:07 +0800
  • b2e56a2e03
    Add release support for option xpu_arc (#12422) Yuwen Hu 2024-12-02 17:16:04 +0800
  • aee9acb303
    Add NPU QuickStart & update example links (#12470) Yuwen Hu 2024-12-02 17:03:10 +0800
  • 31c69a8d31
    Fix MiniCPM-V models running on NPU (#12478) Jin, Qiao 2024-12-02 16:29:46 +0800
  • 54d9a590d4
    [NPU] Fix eos_token setting (#12475) binbin Deng 2024-12-02 14:18:22 +0800
  • 59bd4a214f
    add vLLM glm4 fix (#12474) Guancheng Fu 2024-12-02 14:05:16 +0800
  • 4b6c3160be
    Support imatrix-guided quantization for NPU CW (#12468) Ruonan Wang 2024-12-01 19:31:26 -0800
  • f99f188023
    Hotfix of benchmark script (#12467) binbin Deng 2024-11-29 14:00:59 +0800
  • c911026f03
    [NPU C++] Update model support & examples & benchmark (#12466) binbin Deng 2024-11-29 13:35:58 +0800
  • 14d8d3d8af
    Integrate NPU C++ imple into ipex-llm (#12461) binbin Deng 2024-11-29 09:25:37 +0800
  • 490bb0ca53
    [NPU] update fused layers for GW (#12459) Ruonan Wang 2024-11-28 01:14:30 -0800
  • 1b533a105c
    [NPU] Add env to enable scale search (#12462) Yina Chen 2024-11-28 11:06:00 +0200
  • d272f6b471
    remove nf4 unsupport comment in cpu finetuning (#12460) Heyang Sun 2024-11-28 13:26:46 +0800
  • b29da30205
    [NPU] Update C++ L0 (#12458) Ruonan Wang 2024-11-27 06:08:48 -0800
  • a2272b70d3
    Small fix in llama.cpp troubleshooting guide (#12457) Yuwen Hu 2024-11-27 19:22:11 +0800