Commit graph

  • a1e7bfc638
    Update Readme (#12770) Jason Dai 2025-02-05 19:19:57 +0800
  • 0237ffb302
    refactor xpu linear forward (#12768) Yishuo Wang 2025-02-05 17:40:38 +0800
  • 413d6c2b66
    Update check.py removing a twice defined function (#12760) Danciu Georgian 2025-02-05 05:37:59 +0200
  • 184adb2653
    Small fix to MiniCPM-o-2_6 GPU example (#12766) Yuwen Hu 2025-02-05 11:32:26 +0800
  • ee809e71df
    add troubleshooting section (#12755) Shaojun Liu 2025-01-26 11:03:58 +0800
  • 5fb87d7486
    remove ${HF_TOKEN} (#12742) Shaojun Liu 2025-01-26 10:31:42 +0800
  • f924880694
    vLLM: Fix vLLM-CPU docker image (#12741) Xiangyu Tian 2025-01-24 10:00:29 +0800
  • 69f13c78b8
    [NPU] Update layernorm node on MTL/ARL (#12738) Yuwen Hu 2025-01-23 17:25:19 +0800
  • d11f257ee7
    Add GPU example for MiniCPM-o-2_6 (#12735) Yuwen Hu 2025-01-23 16:10:19 +0800
  • dcca522618
    Remove sdpa available patch (#12734) Yuwen Hu 2025-01-22 17:22:28 +0800
  • c9b6c94a59
    vLLM: Update vLLM-cpu to v0.6.6-post1 (#12728) Xiangyu Tian 2025-01-22 15:03:01 +0800
  • 78cca0a68c
    [NPU] update llm-npu-cli example (#12729) Ruonan Wang 2025-01-22 09:59:27 +0800
  • 7e29edcc4b
    Update Readme (#12730) Jason Dai 2025-01-22 08:43:32 +0800
  • 6789e5d92f
    small fix (#12727) Yishuo Wang 2025-01-21 17:27:18 +0800
  • 412bfd6644
    Update readme (#12724) Jason Dai 2025-01-21 10:59:14 +0800
  • 716d4fe563
    Add vllm 0.6.2 vision offline example (#12721) Wang, Jian4 2025-01-21 09:58:01 +0800
  • 085974e307
    fix nf4 to cpu (#12722) Yishuo Wang 2025-01-21 09:23:22 +0800
  • 9aa4be8ced
    Update runtime configuration on MTL (#12720) Yuwen Hu 2025-01-20 11:06:37 +0800
  • bda87c21eb
    add support and optimization for minicpmo audio part (#12716) Yishuo Wang 2025-01-16 16:39:00 +0800
  • 53aae24616
    Add note about enabling Resizable BAR in BIOS for GPU setup (#12715) Shaojun Liu 2025-01-16 16:22:35 +0800
  • 534e0e6774
    Update dependency for PyTorch 2.6 RC support for woq int4 (#12714) Yuwen Hu 2025-01-16 15:51:57 +0800
  • 54d6328b3c
    woq int4 fwd (#12711) Zhao Changmin 2025-01-16 15:48:05 +0800
  • b62734748f
    add support and optimization for minicpmo vision part (#12713) Yishuo Wang 2025-01-16 14:51:00 +0800
  • c52bdff76b
    Update Deepseek coder GPU example (#12712) Yuwen Hu 2025-01-16 14:05:31 +0800
  • 9d65dcd7ef
    Fix deepseek coder with linear rope type support on GPU (#12709) Yuwen Hu 2025-01-15 21:12:34 +0800
  • 36bf3d8e29
    [NPU doc] Update ARL product in QuickStart (#12708) binbin Deng 2025-01-15 15:57:06 +0800
  • 9930351112
    LLM: add new qtype woq_int4 to support gemm int4 temporary. (#12706) Cengguang Zhang 2025-01-15 14:41:33 +0800
  • 6d03d06ebb
    Change runtime configurations for perf test on Windows (#12705) Yuwen Hu 2025-01-14 17:54:57 +0800
  • 350fae285d
    Add Qwen2-VL HF GPU example with ModelScope Support (#12606) Xu, Shuo 2025-01-13 15:42:04 +0800
  • a1da7908b9
    Fix name device is not found bug (#12703) Yuwen Hu 2025-01-13 10:11:02 +0800
  • e2d58f733e
    Update ollama v0.5.1 document (#12699) SONG Ge 2025-01-10 18:04:49 +0800
  • db9db51e2c
    fix lnl perf (#12700) Yishuo Wang 2025-01-10 18:00:58 +0800
  • 4bf93c66e8
    Support install from source for PyTorch 2.6 RC in UT (#12697) Yuwen Hu 2025-01-10 16:44:18 +0800
  • da8bcb7db1
[NPU] fix load logic of glm-edge models (#12698) binbin Deng 2025-01-10 16:08:37 +0800
  • 584c1c5373
    Update B580 CN doc (#12695) joan726 2025-01-10 11:20:47 +0800
  • cbb8e2a2d5
    Update documents (#12693) Jason Dai 2025-01-10 10:47:11 +0800
  • f8dc408888
    fix user issue (#12692) Yishuo Wang 2025-01-10 10:18:47 +0800
  • 68857494a5
    refactor to simplify following upgrade 2 (#12685) Yishuo Wang 2025-01-10 09:29:03 +0800
  • 2673792de6
    Update Dockerfile (#12688) Shaojun Liu 2025-01-10 09:01:29 +0800
  • f9b29a4f56
    Update B580 doc (#12691) Jason Dai 2025-01-10 08:59:35 +0800
  • 66d4385cc9
    Update B580 CN Doc (#12686) joan726 2025-01-09 19:10:57 +0800
  • c24741584d
    Support PyTorch 2.6 RC perf test on Windows (#12683) Yuwen Hu 2025-01-09 18:17:23 +0800
  • 7234c9b27b
    update quantize kv cache condition (#12681) Yishuo Wang 2025-01-09 15:23:04 +0800
  • 5d8081afbc
    Remove dummy model from performance tests (#12682) Yuwen Hu 2025-01-09 14:50:17 +0800
  • 1ec40cd09e
    refactor to simplify following upgrade (#12680) Yishuo Wang 2025-01-09 13:34:30 +0800
  • aa9e70a347
    Update B580 Doc (#12678) Jason Dai 2025-01-08 22:36:48 +0800
  • c6f57ad6ed
    Update README.md (#12677) Jason Dai 2025-01-08 21:55:52 +0800
  • 2321e8d60c
    Update README.md (#12676) Jason Dai 2025-01-08 21:54:31 +0800
  • 5c24276fc4
    fix custom kernel registration (#12674) Yishuo Wang 2025-01-08 17:39:17 +0800
  • a22a8c21bb
small fix and remove unused code about ipex (#12671) Yishuo Wang 2025-01-08 17:39:04 +0800
  • c11f5f0fcd
    also convert SdpaAttention in optimize_model (#12673) Yishuo Wang 2025-01-08 16:48:03 +0800
  • 2c23ce2553
    Create a BattleMage QuickStart (#12663) Shaojun Liu 2025-01-08 14:58:37 +0800
  • 7dd156d292
    small fix and add comment (#12670) Yishuo Wang 2025-01-08 10:56:50 +0800
  • ccf618ff4a
    Remove all ipex usage (#12666) Yishuo Wang 2025-01-08 10:31:18 +0800
  • 0534d7254f
    Update docker_cpp_xpu_quickstart.md (#12667) logicat 2025-01-08 09:56:56 +0800
  • 5db6f9dcde
    Add option with PyTorch 2.6 RC version for testing purposes (#12668) Yuwen Hu 2025-01-07 18:28:55 +0800
  • f9ee7898c8
    fix onednn dependency bug (#12665) Yishuo Wang 2025-01-07 16:26:56 +0800
  • 29ad5c449e
    refactor codegeex to remove ipex kernel usage (#12664) Yishuo Wang 2025-01-07 16:17:40 +0800
  • 525b0ee991
    [NPU] Tiny fixes on examples (#12661) Yuwen Hu 2025-01-07 14:30:38 +0800
  • ebdf19fa7e
    [NPU] Further fix saving of generation config (#12657) Yuwen Hu 2025-01-07 13:53:54 +0800
  • 381d448ee2
    [NPU] Example & Quickstart updates (#12650) Yuwen Hu 2025-01-07 13:52:41 +0800
  • ddc0ef3993
    refactor device check and remove cohere/mixtral support (#12659) Yishuo Wang 2025-01-07 11:15:51 +0800
  • ea65e4fecc
    remove falcon support and related UT (#12656) Yishuo Wang 2025-01-07 09:26:00 +0800
  • fae73eee79
    [NPU] Support save npu quantized model without npu dependency (#12647) Yina Chen 2025-01-06 12:06:22 +0200
  • 502461d836
    remove unnecessary ipex kernel usage (#12649) Yishuo Wang 2025-01-03 16:45:24 +0800
  • 9f8b134889
    add ipex-llm custom kernel registration (#12648) Yishuo Wang 2025-01-03 16:45:04 +0800
  • 0b377100c5
    Add guide for save-load usage (#12498) binbin Deng 2025-01-03 16:30:15 +0800
  • 6711a48a36
Enable internvl2-8b on vllm (#12645) Wang, Jian4 2025-01-03 14:49:36 +0800
  • 8fd2dcba86
    Add benchmark_util for transformers >= 4.47.0 (#12644) Zijie Li 2025-01-03 10:48:29 +0800
  • 550fa01649
    [Doc] Update ipex-llm ollama troubleshooting for v0.4.6 (#12642) SONG Ge 2025-01-02 17:28:54 +0800
  • 8e5328e9b4
    add disable opts for awq (#12641) Yina Chen 2025-01-02 09:45:22 +0200
  • 62318964fa
    Update llama example information (#12640) Xu, Shuo 2025-01-02 13:48:39 +0800
  • 81211fd010
    remove unused code (#12635) Yishuo Wang 2025-01-02 13:31:09 +0800
  • 534566e290
    [NPU] Support minicpm-v with python cpp backend (#12637) binbin Deng 2025-01-02 11:13:15 +0800
  • f289f68d57
    small fix (#12634) Yishuo Wang 2024-12-30 17:14:25 +0800
  • 2d08155513
    remove bmm, which is only required in ipex 2.0 (#12630) Yishuo Wang 2024-12-27 17:28:57 +0800
  • f17ccfa61a
    [NPU] Fix save-load usage of minicpm models (#12628) binbin Deng 2024-12-27 15:56:46 +0800
  • c72a5db757
    remove unused code again (#12624) Yishuo Wang 2024-12-27 14:17:11 +0800
  • 46eeab4479
    [NPU] Fix regression caused by layer_norm change (#12627) binbin Deng 2024-12-27 14:08:49 +0800
  • 90f6709486
[NPU] remove pipeline examples (#12626) Ruonan Wang 2024-12-26 21:42:28 -0800
  • 5f04ed7254
[NPU] Update prompt format for baichuan2-pipeline (#12625) Zijie Li 2024-12-27 11:30:54 +0800
  • 34dbdb8ee3
    small fix (#12623) Yishuo Wang 2024-12-27 10:19:27 +0800
  • 55ce091242
    Add GLM4-Edge-V GPU example (#12596) Xu, Shuo 2024-12-27 09:40:29 +0800
  • 796ee571a5
    [NPU doc] Update verified platforms (#12621) binbin Deng 2024-12-26 17:39:13 +0800
  • bbdbbb0d88
    [NPU] Compatible with other third-party models like auto-round (#12620) Ruonan Wang 2024-12-26 01:25:18 -0800
  • a9abde0b5d
    support passing attn_scale to sdpa (#12619) Yishuo Wang 2024-12-26 16:58:09 +0800
  • 40a7d2b4f0
    Consolidated C-Eval Benchmark Guide for Single-GPU and Multi-GPU Environments (#12618) Shaojun Liu 2024-12-26 15:23:32 +0800
  • ccc4055058
    [NPU] Update prompt format for baichuan2 (#12615) Zijie Li 2024-12-26 11:41:37 +0800
  • 1604b4ead8
    small fix (#12616) Yishuo Wang 2024-12-26 11:35:12 +0800
  • d841e1dc0d
    [NPU] update convert script based on latest usage (#12617) Ruonan Wang 2024-12-25 19:23:04 -0800
  • ef585d3360
    Polish Readme for ModelScope-related examples (#12603) Xu, Shuo 2024-12-26 10:52:47 +0800
  • 28737c250c
    Update Dockerfile (#12585) Shaojun Liu 2024-12-26 10:20:52 +0800
  • a596f1ae5f
    remove bigdl-llm test to fix langchain UT (#12613) Yishuo Wang 2024-12-26 10:17:25 +0800
  • 9e895f04ec
    [NPU] fix npu save (#12614) Ruonan Wang 2024-12-25 17:21:16 -0800
  • 0477fe6480
    [docs] Update doc for latest open webui: 0.4.8 (#12591) Mingqi Hu 2024-12-26 09:18:20 +0800
  • 6249c1e373
    rewrite llama optimization (#12609) Yishuo Wang 2024-12-25 17:04:32 +0800
  • 5f5ac8a856
    fix llama related import (#12611) Yishuo Wang 2024-12-25 16:23:52 +0800
  • 54b1d7d333
    Update README.zh-CN.md (#12610) Jason Dai 2024-12-25 15:38:59 +0800
  • 4e6b9d804f
    add compresskv back for mistral (#12607) Yishuo Wang 2024-12-25 11:06:08 +0800
  • 9c9800be31
    Update README.zh-CN.md (#12570) joan726 2024-12-24 20:32:36 +0800