Commit graph

  • 56cb992497
    LLM: Modify CPU Installation Command for most examples (#11049) ZehuaCao 2024-05-17 15:52:20 +0800
  • f1156e6b20
    support gguf_q4k_m / gguf_q4k_s (#10887) Ruonan Wang 2024-05-17 06:30:09 +0000
  • 981d668be6
    refactor baichuan2-7b (#11062) Yishuo Wang 2024-05-17 13:01:34 +0800
  • 84239d0bd3
    Update docker image tags in Docker Quickstart (#11061) Shaojun Liu 2024-05-17 11:06:11 +0800
  • b3027e2d60
    Update for cpu install option in performance tests (#11060) Yuwen Hu 2024-05-17 10:33:43 +0800
  • d963e95363
    LLM: Modify CPU Installation Command for documentation (#11042) Xiangyu Tian 2024-05-17 10:14:00 +0800
  • fff067d240
    Make install ut for cpu exactly the same as what we want for users (#11051) Yuwen Hu 2024-05-17 10:11:01 +0800
  • 3a72e5df8c
    disable mlp fusion of fp6 on mtl (#11059) Ruonan Wang 2024-05-17 10:10:16 +0800
  • 192ae35012
    Add support for llama2 quantize_kv with transformers 4.38.0 (#11054) SONG Ge 2024-05-16 22:23:39 +0800
  • 16b2a418be
    hotfix native_sdp ut (#11046) SONG Ge 2024-05-16 17:15:37 +0800
  • 6be70283b7
    fix chatglm run error (#11045) Xin Qiu 2024-05-16 15:39:18 +0800
  • 8cae897643
    use new rope in phi3 (#11047) Yishuo Wang 2024-05-16 15:12:35 +0800
  • 00d4410746
    Update cpp docker quickstart (#11040) Wang, Jian4 2024-05-16 14:55:13 +0800
  • c62e828281
    Create release-ipex-llm.yaml (#11039) Shaojun Liu 2024-05-16 11:10:10 +0800
  • 4638682140
    Fix xpu finetune image path in action (#11037) Qiyuan Gong 2024-05-16 10:48:02 +0800
  • 9a96af4232
    Remove oneAPI pip install command in related examples (#11030) Jin Qiao 2024-05-16 10:46:29 +0800
  • 612a365479
    LLM: Install CPU version torch with extras [all] (#10868) Xiangyu Tian 2024-05-16 10:39:55 +0800
  • 59df750326
    Use new sdp again (#11025) Yishuo Wang 2024-05-16 09:33:34 +0800
  • 7e29928865
    refactor serving docker image (#11028) Guancheng Fu 2024-05-16 09:30:36 +0800
  • 9942a4ba69
    [WIP] Support llama2 with transformers==4.38.0 (#11024) SONG Ge 2024-05-15 18:07:00 +0800
  • 686f6038a8
    Support fp6 save & load (#11034) Yina Chen 2024-05-15 17:52:02 +0800
  • ac384e0f45
    add fp6 mlp fusion (#11032) Ruonan Wang 2024-05-15 17:42:50 +0800
  • 2084ebe4ee
    Enable fastchat benchmark latency (#11017) Wang, Jian4 2024-05-15 14:52:09 +0800
  • 93d40ab127
    Update lookahead strategy (#11021) hxsz1997 2024-05-15 14:48:05 +0800
  • 1d73fc8106
    update cpp quickstart (#11031) Ruonan Wang 2024-05-15 14:33:36 +0800
  • d9f71f1f53
    Update benchmark util for example using (#11027) Wang, Jian4 2024-05-15 14:16:35 +0800
  • 86cec80b51
    LLM: Add llm inference_cpp_xpu_docker (#10933) Wang, Jian4 2024-05-15 11:10:22 +0800
  • 4053a6ef94
    Update environment variable setting in AutoTP with arc (#11018) binbin Deng 2024-05-15 10:23:58 +0800
  • fad1dbaf60
    use sdp fp8 causal kernel (#11023) Yishuo Wang 2024-05-15 10:22:35 +0800
  • c34f85e7d0
    [Doc] Simplify installation on Windows for Intel GPU (#11004) Yuwen Hu 2024-05-15 09:55:41 +0800
  • 1e00bd7bbe
    Re-org XPU finetune images (#10971) Qiyuan Gong 2024-05-15 09:42:43 +0800
  • ee325e9cc9
    fix phi3 (#11022) Yishuo Wang 2024-05-15 09:32:12 +0800
  • 7d3791c819
    [LLM] Add llama3 alpaca qlora example (#11011) Ziteng Zhang 2024-05-15 09:17:32 +0800
  • 0a732bebe7
    Add phi3 cached RotaryEmbedding (#11013) Zhao Changmin 2024-05-15 08:16:43 +0800
  • 0b7e78b592
    revise the benchmark part in python inference docker (#11020) Shengsheng Huang 2024-05-14 18:43:41 +0800
  • 586a151f9c
    update the README and reorganize the docker guides structure. (#11016) Shengsheng Huang 2024-05-14 17:56:11 +0800
  • 893197434d
    Add fp6 support on gpu (#11008) Yina Chen 2024-05-14 16:31:44 +0800
  • b03c859278
    Add phi3RMS (#10988) Zhao Changmin 2024-05-14 15:16:27 +0800
  • 170e3d65e0
    use new sdp and fp32 sdp (#11007) Yishuo Wang 2024-05-14 14:29:18 +0800
  • 8010af700f
    Update igpu performance test to use pypi installed oneAPI (#11010) Yuwen Hu 2024-05-14 14:05:33 +0800
  • c957ea3831
    Add axolotl main support and axolotl Llama-3-8B QLoRA example (#10984) Qiyuan Gong 2024-05-14 13:43:59 +0800
  • fb656fbf74
    Add requirements for oneAPI pypi packages for windows Intel GPU users (#11009) Yuwen Hu 2024-05-14 13:40:54 +0800
  • 7f8c5b410b
    Quickstart: Run PyTorch Inference on Intel GPU using Docker (on Linux or WSL) (#10970) Shaojun Liu 2024-05-14 12:58:31 +0800
  • a465111cf4
    Update README.md (#11003) Guancheng Fu 2024-05-13 16:44:48 +0800
  • 74997a3ed1
    Adding load_low_bit interface for ipex_llm_worker (#11000) Guancheng Fu 2024-05-13 15:30:19 +0800
  • 1b3c7a6928
    remove phi3 empty cache (#10997) Yishuo Wang 2024-05-13 14:09:55 +0800
  • 99255fe36e
    fix ppl (#10996) ZehuaCao 2024-05-13 13:57:19 +0800
  • 04d5a900e1
    update troubleshooting of llama.cpp (#10990) Ruonan Wang 2024-05-13 11:18:38 +0800
  • f8dd2e52ad
    Fix Langchain upstream ut (#10985) Kai Huang 2024-05-11 14:40:37 +0800
  • 9f6358e4c2
    Deprecate support for pytorch 2.0 on Linux for ipex-llm >= 2.1.0b20240511 (#10986) Yuwen Hu 2024-05-11 12:33:35 +0800
  • 5e0872073e
    add version for llama.cpp and ollama (#10982) Ruonan Wang 2024-05-11 09:20:31 +0800
  • ad96f32ce0
    optimize phi3 1st token performance (#10981) Yishuo Wang 2024-05-10 17:33:46 +0800
  • cfed76b2ed
    LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. (#10937) Cengguang Zhang 2024-05-10 16:40:15 +0800
  • f9615f12d1
    Add driver related packages version check in env script (#10977) binbin Deng 2024-05-10 15:02:58 +0800
  • a6342cc068
    Empty cache after phi first attention to support 4k input (#10972) Kai Huang 2024-05-09 19:50:04 +0800
  • e753125880
    use fp16_sdp when head_dim=96 (#10976) Yishuo Wang 2024-05-09 17:02:59 +0800
  • b7f7d05a7e
    update llama.cpp usage of llama3 (#10975) Ruonan Wang 2024-05-09 16:44:12 +0800
  • 697ca79eca
    use quantize kv and sdp in phi3-mini (#10973) Yishuo Wang 2024-05-09 15:16:18 +0800
  • e3159c45e4
    update private gpt quickstart and a small fix for dify (#10969) Shengsheng Huang 2024-05-09 13:57:45 +0800
  • 459b764406
    Remove munually_build_for_test push outside (#10968) Wang, Jian4 2024-05-09 10:40:34 +0800
  • 11df5f9773
    revise private GPT quickstart and a few fixes for other quickstart (#10967) Shengsheng Huang 2024-05-08 21:18:20 +0800
  • 37820e1d86
    Add privateGPT quickstart (#10932) Keyan (Kyrie) Zhang 2024-05-08 05:48:00 -0700
  • f4c615b1ee
    Add cohere example (#10954) Wang, Jian4 2024-05-08 17:19:59 +0800
  • 7e7d969dcb
    a experimental for workflow abuse step1 fix a typo (#10965) Zephyr1101 2024-05-08 17:12:50 +0800
  • 3209d6b057
    Fix speculative llama3 no stop error (#10963) Wang, Jian4 2024-05-08 17:09:47 +0800
  • 02870dc385
    LLM: Refine README of AutoTP-FastAPI example (#10960) Xiangyu Tian 2024-05-08 16:55:23 +0800
  • 2ebec0395c
    optimize phi-3-mini-128 (#10959) Yishuo Wang 2024-05-08 16:33:17 +0800
  • dfa3147278
    update (#10944) Xin Qiu 2024-05-08 14:28:05 +0800
  • 5973d6c753
    make gemma's output better (#10943) Xin Qiu 2024-05-08 14:27:51 +0800
  • 15ee3fd542
    Update igpu perf internlm (#10958) Jin Qiao 2024-05-08 14:16:43 +0800
  • 0d6e12036f
    Disable fast_init_ in load_low_bit (#10945) Zhao Changmin 2024-05-08 10:46:19 +0800
  • 164e6957af
    Refine axolotl quickstart (#10957) Qiyuan Gong 2024-05-08 09:34:02 +0800
  • c801c37bc6
    optimize phi3 again: use quantize kv if possible (#10953) Yishuo Wang 2024-05-07 17:26:19 +0800
  • aa2fa9fde1
    optimize phi3 again: use sdp if possible (#10951) Yishuo Wang 2024-05-07 15:53:08 +0800
  • c11170b96f
    Upgrade Peft to 0.10.0 in finetune examples and docker (#10930) Qiyuan Gong 2024-05-07 15:12:26 +0800
  • d7ca5d935b
    Upgrade Peft version to 0.10.0 for LLM finetune (#10886) Qiyuan Gong 2024-05-07 15:09:14 +0800
  • 0efe26c3b6
    Change order of chatglm2-6b and chatglm3-6b in iGPU perf test for more stable performance (#10948) Yuwen Hu 2024-05-07 13:48:39 +0800
  • 245c7348bc
    Add codegemma example (#10884) hxsz1997 2024-05-07 13:35:42 +0800
  • 08ad40b251
    improve ipex-llm-init for Linux (#10928) Shaojun Liu 2024-05-07 12:55:14 +0800
  • 33b8f524c2
    Add cpp docker manually_test (#10946) Wang, Jian4 2024-05-07 11:23:28 +0800
  • 191b184341
    LLM: Optimize cohere model (#10878) Wang, Jian4 2024-05-07 10:19:50 +0800
  • 13a44cdacb
    LLM: Refine DeepSpeed-AutoTP-FastAPI example (#10916) Xiangyu Tian 2024-05-07 09:37:31 +0800
  • 1de878bee1
    LLM: Fix speculative llama3 long input error (#10934) Wang, Jian4 2024-05-07 09:25:20 +0800
  • 49ab5a2b0e
    Add embeddings (#10931) Guancheng Fu 2024-05-07 09:07:02 +0800
  • d649236321
    make images clickable (#10939) Shengsheng Huang 2024-05-06 20:24:15 +0800
  • 64938c2ca7
    Dify quickstart revision (#10938) Shengsheng Huang 2024-05-06 19:59:17 +0800
  • 3f438495e4
    update llama.cpp and ollama quickstart (#10929) Ruonan Wang 2024-05-06 15:01:06 +0800
  • 41ffe1526c
    Modify CPU finetune docker for bz2 error (#10919) Qiyuan Gong 2024-05-06 10:41:50 +0800
  • 0e0bd309e2
    LLM: Enable Speculative on Fastchat (#10909) Wang, Jian4 2024-05-06 10:06:20 +0800
  • 8379f02a74
    Add Dify quickstart (#10903) Zhicun 2024-05-06 10:01:34 +0800
  • 0edef1f94c
    LLM: add min_new_tokens to all in one benchmark. (#10911) Cengguang Zhang 2024-05-06 09:32:59 +0800
  • c78a8e3677
    update quickstart (#10923) Shengsheng Huang 2024-04-30 18:19:31 +0800
  • 282d676561
    update continue quickstart (#10922) Shengsheng Huang 2024-04-30 17:51:21 +0800
  • 75dbf240ec
    LLM: update split tensor conditions. (#10872) Cengguang Zhang 2024-04-30 17:07:21 +0800
  • 71f51ce589
    Initial Update for Continue Quickstart with Ollama backend (#10918) Yuwen Hu 2024-04-30 15:10:30 +0800
  • 2c64754eb0
    Add vLLM to ipex-llm serving image (#10807) Guancheng Fu 2024-04-29 17:25:42 +0800
  • 1f876fd837
    Add example for phi-3 (#10881) Jin Qiao 2024-04-29 16:43:55 +0800
  • c936ba3b64
    Small fix for supporting workflow dispatch in nightly perf (#10908) Yuwen Hu 2024-04-29 13:25:14 +0800
  • d884c62dc4
    remove new_layout parameter (#10906) Yishuo Wang 2024-04-29 10:31:50 +0800
  • fbcd7bc737
    Fix Loader issue with dtype fp16 (#10907) Guancheng Fu 2024-04-29 10:16:02 +0800