Commit graph

  • 659d15defc
    Fix wrong attention mask and garbage output for inputs_embeds inputs during lookup generation (#11989) Yuwen Hu 2024-09-02 19:09:12 +0800
  • 2f3d1bd0ec
    hotfix qwen2-7b weight setting (#11991) binbin Deng 2024-09-02 18:11:08 +0800
  • a40ea7038d
    Fix AttributeError of qwen2-1.5B (#11990) binbin Deng 2024-09-02 17:55:10 +0800
  • c48817bd43
    Support Qwen2-7b MLP in int4 and transpose_value_cache=True (#11968) Yang Wang 2024-09-01 23:37:44 -0700
  • 65e281bb29
    Add MiniCPM-V cpu example (#11975) Jin, Qiao 2024-09-02 10:17:57 +0800
  • 79978e6f36
    update npu multimodal readme (#11979) Ruonan Wang 2024-08-30 04:02:06 -0700
  • 4811a490ef
    small fix (#11978) Ruonan Wang 2024-08-30 02:55:15 -0700
  • 573c20bae6
    fix npu lm_head cpu condition (#11976) Ruonan Wang 2024-08-30 02:11:26 -0700
  • 60aa1a2c0f
    Initial NPU support for MiniCPM-V-2_6 (#11966) Ruonan Wang 2024-08-30 01:34:35 -0700
  • 158289d205
    [NPU] Add initial support for minicpm-llama-v2.5 (#11962) SONG Ge 2024-08-30 16:00:33 +0800
  • ae7302a654
    add gptq option for ppl test (#11921) Chu,Youcheng 2024-08-30 13:43:48 +0800
  • 1e8c87050f
    fix model path (#11973) Shaojun Liu 2024-08-30 13:28:28 +0800
  • e895e1b4c5
    modification on llamacpp readme after Ipex-llm latest update (#11971) Jinhe 2024-08-30 11:36:45 +0800
  • cd077881f1
    Disable lm head (#11972) binbin Deng 2024-08-30 11:05:18 +0800
  • 7d103417b8
    Fix glm4-9b-chat nan error on vllm 0.3.3 (#11970) Wang, Jian4 2024-08-30 09:50:18 +0800
  • 77b04efcc5
    add notes for SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS (#11936) Ch1y0q 2024-08-30 09:26:47 +0800
  • fbf088f61e
    remove obselete npu code (#11967) Yang Wang 2024-08-29 14:16:44 -0700
  • a9e485eb1b
    Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer (#11963) Yuwen Hu 2024-08-29 19:22:09 +0800
  • 2e49e1f8e9
    Further fix for MiniCPM-V-2_6 example (#11965) Yuwen Hu 2024-08-29 19:14:13 +0800
  • 431affd0a0
    Update README.md (#11964) Jason Dai 2024-08-29 18:56:35 +0800
  • 14b2c8dc32
    Update qwen2-7b example script (#11961) binbin Deng 2024-08-29 18:25:17 +0800
  • 7abe17d6f7
    Update MiniCPM-V-2_6 Example (#11958) Yuwen Hu 2024-08-29 18:23:48 +0800
  • 6fc9340d53
    restore ollama webui quickstart (#11955) Jinhe 2024-08-29 17:53:19 +0800
  • 5f7ff76ea5
    update troubleshooting (#11960) Yina Chen 2024-08-29 12:44:22 +0300
  • 882f4a5ff7
    Add lnl npu driver recommend version and enable cpu_lm_head on llama3 (#11952) Yina Chen 2024-08-29 10:01:18 +0300
  • 71f03dcc39
    Support qwen2-7b with fused decoderlayer optimization on NPU (#11912) binbin Deng 2024-08-29 13:34:20 +0800
  • 63ac5f64bb
    Refactor NPU baichuan multiple-process (#11945) Jiao Wang 2024-08-28 11:33:40 -0700
  • 5ca7390082
    [NPU] Add minicpm-2b support for npu multi-processing (#11949) SONG Ge 2024-08-28 18:08:49 +0800
  • 0fbb10259a
    use sdp_causal to reduce internvl2-4b memory usage if set environment variable (#11953) Yishuo Wang 2024-08-28 17:35:05 +0800
  • 0a7bd274e2
    Add vllm awq loading logic (#11950) Guancheng Fu 2024-08-28 16:46:18 +0800
  • b38fb67bec
    [NPU] lm head to cpu (#11943) Yina Chen 2024-08-28 11:34:07 +0300
  • ec67ee7177
    added accelerate version specification in open webui quickstart(#11948) Jinhe 2024-08-28 15:02:39 +0800
  • e23549f63f
    Update llamaindex examples (#11940) hxsz1997 2024-08-28 09:03:44 +0300
  • 23f51f87f0
    update tag to 2.2.0-SNAPSHOT (#11947) Shaojun Liu 2024-08-28 09:20:32 +0800
  • 460bc96d32
    update version of llama.cpp / ollama (#11930) Ruonan Wang 2024-08-27 06:21:44 -0700
  • bec00e2015
    Improve baichuan2 NPU performance (#11942) binbin Deng 2024-08-27 18:37:08 +0800
  • 90f692937d
    Update npu baichuan2 (#11939) Zijie Li 2024-08-27 16:56:26 +0800
  • 7f7f6c89f5
    Quick fix benchmark script (#11938) binbin Deng 2024-08-27 15:29:27 +0800
  • b4b6ddf73c
    NPU Baichuan2 Multi- Process example (#11928) Jiao Wang 2024-08-27 00:25:49 -0700
  • e211a5b076
    update minicpm to meet latest refactor (#11937) SONG Ge 2024-08-27 15:08:01 +0800
  • a81a329a5f
    [NPU] Add example for NPU multi-processing minicpm-1b model (#11935) SONG Ge 2024-08-27 14:57:46 +0800
  • 7c8c9a0670
    Update benchmark script for NPU (#11932) binbin Deng 2024-08-27 14:41:14 +0800
  • 730d9ec811
    Add Qwen2-audio example (#11835) Ch1y0q 2024-08-27 13:35:24 +0800
  • b11b28e9a9
    update CORE_XE_VERSION to 2.6.0 (#11929) Shaojun Liu 2024-08-27 13:10:13 +0800
  • e246f1e258
    update llama3 npu example (#11933) Yina Chen 2024-08-27 08:03:18 +0300
  • 14dddfc0d6
    Update NPU example readme (#11931) binbin Deng 2024-08-27 12:44:58 +0800
  • 6c3eb1e1e8
    refactor from_pretrained API for NPU (#11927) Zijie Li 2024-08-27 09:50:30 +0800
  • 7ca557aada
    LLM: Fix vLLM CPU convert error (#11926) Xiangyu Tian 2024-08-27 09:22:19 +0800
  • 5a8fc1baa2
    update troubleshooting for llama.cpp and ollama (#11890) Ch1y0q 2024-08-26 20:55:23 +0800
  • c1d07bc626
    Support streaming for lookup generation (#11922) Yuwen Hu 2024-08-26 19:33:31 +0800
  • a0bbd8e28d
    All-in-one benchmark update regarding performance mode for input length threshold (#11920) Yuwen Hu 2024-08-26 18:52:13 +0800
  • 019f725d4d
    [NPU] Add support for running mp minicpm model on npu (#11909) SONG Ge 2024-08-26 17:52:55 +0800
  • dd303776cf
    Add troubleshooting about transpose value setting binbin Deng 2024-08-26 16:06:32 +0800
  • e5dc4e9123
    disable outdated scheduled workflow (#11915) Shaojun Liu 2024-08-24 07:17:42 +0800
  • 24c279e0ae
    Update IPEX_LLM_PERFORMANCE_MODE with input length threshold (#11908) Yuwen Hu 2024-08-23 20:49:15 +0800
  • 303a090a6b
    Add lm_head optimization on NPU (#11903) binbin Deng 2024-08-23 15:51:07 +0800
  • 23631cd357
    disable lm_head opt for baichuan2-13b (#11905) Yina Chen 2024-08-23 10:39:47 +0300
  • 4cf640c548
    update docker image tag to 2.2.0-SNAPSHOT (#11904) Shaojun Liu 2024-08-23 13:57:41 +0800
  • 650e6e6ce4
    Merge pull request #11891 from hxsz1997/baichuan2-compresskv hxsz1997 2024-08-23 06:09:58 +0300
  • 4a61f7d20d
    update mlp of llama (#11897) Ruonan Wang 2024-08-22 05:34:53 -0700
  • 420ce7d164
    Fix non-stop at eos token problem for lookup generation (#11896) Yuwen Hu 2024-08-22 18:55:59 +0800
  • 4cf03d6212
    update baichuan-7b Huang, Xinshengzi 2024-08-22 18:16:33 +0800
  • 794abe2ce8
    update npu-readme (#11900) Zijie Li 2024-08-22 17:49:35 +0800
  • 278b191dc1
    Fix optimize lm head error (#11899) Guancheng Fu 2024-08-22 17:45:26 +0800
  • c5b51d41fb
    Update pypi tag to 2.2.0.dev0 (#11895) Shaojun Liu 2024-08-22 16:48:09 +0800
  • 18662dca1c
    change 5 pytorch/huggingface models to fp16 (#11894) Jinhe 2024-08-22 16:12:09 +0800
  • 5c4ed00593
    Add lightweight-serving whisper asr example (#11847) Wang, Jian4 2024-08-22 15:46:28 +0800
  • eb1e65f8a9
    add comment Huang, Xinshengzi 2024-08-22 15:14:47 +0800
  • a2be3d7501
    add comment of compress kv in attention forward Huang, Xinshengzi 2024-08-22 15:11:55 +0800
  • a8e2573421
    added tokenization file for codegeex2-6b in pytorch-models(#11875) Jinhe 2024-08-22 14:37:56 +0800
  • ce7de77085
    add comment of change in model forward Huang, Xinshengzi 2024-08-22 14:29:27 +0800
  • 42398a0045
    add comment Huang, Xinshengzi 2024-08-22 13:17:13 +0800
  • 48a827aa07
    fix typos Huang, Xinshengzi 2024-08-22 11:35:47 +0800
  • 8a5df93de2
    fix typos Huang, Xinshengzi 2024-08-22 11:33:07 +0800
  • 01ed397e7a
    fix typos Huang, Xinshengzi 2024-08-22 11:31:25 +0800
  • c6ed1c412d
    fix typos Huang, Xinshengzi 2024-08-22 11:26:49 +0800
  • 2a0aa9271b
    fix typos Huang, Xinshengzi 2024-08-22 11:23:22 +0800
  • 4adadddbbc
    fix typos Huang, Xinshengzi 2024-08-22 11:12:23 +0800
  • bac98baab9
    Make performance test install specific ipex-llm version from pypi (#11892) Yuwen Hu 2024-08-22 11:10:12 +0800
  • 6a5ca17afc
    fix typoes Huang, Xinshengzi 2024-08-22 11:09:58 +0800
  • 72a7bf624b
    Support qwen2-1.5b with fused decoderlayer optimization on NPU (#11888) binbin Deng 2024-08-22 11:09:12 +0800
  • 6bb9035788
    fix typos Huang, Xinshengzi 2024-08-22 11:08:48 +0800
  • 86248b0505
    add compress_kv for baichuan2 Huang, Xinshengzi 2024-08-22 10:59:08 +0800
  • bdbe995b01
    Update README.md (#11889) Zijie Li 2024-08-22 09:40:16 +0800
  • cc27321441
    support chatglm4 in lookup (#11855) Yina Chen 2024-08-21 10:53:17 +0300
  • 0236de3ac2
    set IPEX_LLM_LAST_LM_HEAD=1 as default (#11885) Yina Chen 2024-08-21 10:06:12 +0300
  • 8c5c7f32dd
    Update doc for running npu generate example with ipex-llm[npu] (#11876) SONG Ge 2024-08-21 13:45:29 +0800
  • 209d42ab79
    Refactor npu mp to make it easier to integrate new models (#11873) Yang Wang 2024-08-20 20:58:47 -0700
  • 537c0d2767
    fix vllm qwen2 models (#11879) Guancheng Fu 2024-08-21 11:05:24 +0800
  • bd1e490d62
    fix phi3 (#11878) Yishuo Wang 2024-08-21 10:31:41 +0800
  • eab6f6dde4
    Spr perf small fix (#11874) Yuwen Hu 2024-08-21 09:35:26 +0800
  • 37106a877c
    igpu performance test smal fix (#11872) Yuwen Hu 2024-08-21 03:09:14 +0800
  • bdaeee1d63
    Fix run_decoders bug (#11871) Yang Wang 2024-08-20 12:04:59 -0700
  • 32f0a77846
    feat: update readme for ppl test (#11865) Chu,Youcheng 2024-08-20 20:13:54 +0800
  • 5df00869de
    Update local import for ppl (#11866) RyuKosei 2024-08-20 18:50:00 +0800
  • c3c058373f
    Update compresskv model forward type logic (#11868) Yina Chen 2024-08-20 13:11:37 +0300
  • 3ee194d983
    Pytorch models transformers version update (#11860) Jinhe 2024-08-20 18:01:42 +0800
  • 0d58c2fdf9
    Update performance test regarding updated default transformers==4.37.0 (#11869) Yuwen Hu 2024-08-20 17:59:28 +0800
  • 5e8286f72d
    Update ipex-llm default transformers version to 4.37.0 (#11859) Yuwen Hu 2024-08-20 17:37:58 +0800
  • d4ee0a89f3
    optimize phi3 memory usage (#11867) Yishuo Wang 2024-08-20 17:32:51 +0800