Commit graph

  • 5b83493b1a
    Add ipex-llm npu option in setup.py (#11858) SONG Ge 2024-08-20 17:29:49 +0800
  • ee6852c915
    Fix typo (#11862) Heyang Sun 2024-08-20 16:38:11 +0800
  • 2946420e14
    add minicpmv 2.6 load_low_bit workaround (#11856) Yishuo Wang 2024-08-20 11:16:02 +0800
  • 7380823f3f
    Update Llama2 multi-processes example (#11852) SONG Ge 2024-08-19 19:49:01 +0800
  • 99b05ba1dc
    separate prefill into a process (#11787) Yang Wang 2024-08-19 02:53:36 -0700
  • da3d7a3a53
    delete transformers version requirement (#11845) Jinhe 2024-08-19 17:53:02 +0800
  • a0fbda5bc8
    add MiniCPM-Llama3-V-2_5 into all-in-one benchmark (#11849) Ruonan Wang 2024-08-19 02:51:16 -0700
  • 9490781aec
    optimize phi3 memory usage again (#11848) Yishuo Wang 2024-08-19 17:26:59 +0800
  • 3cd4e87168
    Support compress KV with quantize KV (#11812) Yina Chen 2024-08-19 10:32:32 +0300
  • 6841a9ac8f
    fix load low bit com dtype (#11832) Zhao Changmin 2024-08-19 13:43:19 +0800
  • cfc959defa
    Fixes regarding utf-8 in all-in-one benchmark (#11839) Yuwen Hu 2024-08-19 10:38:00 +0800
  • 46a1cbfa64
    feat: add mixed_precision argument on ppl longbench evaluation (#11837) Chu,Youcheng 2024-08-19 10:00:44 +0800
  • 580c94d0e2
    Remove gemma-2-9b-it 3k input from igpu-perf (#11834) Yuwen Hu 2024-08-17 13:10:05 +0800
  • 9f17234f3b
    Add MiniCPM-V-2_6 to iGPU Perf (#11810) Jin, Qiao 2024-08-16 18:41:21 +0800
  • 96796f95cb
    Update all-in-one benchmark prompts for continuation task & lookup update for minicpmv (#11827) Yuwen Hu 2024-08-16 17:16:35 +0800
  • e966e85df8
    force lm_head optimization in any model if environment variable is set (#11830) Yishuo Wang 2024-08-16 16:48:45 +0800
  • 3b630fb9df
    updated ppl README (#11807) RyuKosei 2024-08-16 15:49:25 +0800
  • e07a55665c
    Codegeex2 tokenization fix (#11831) Jinhe 2024-08-16 15:48:47 +0800
  • a508b0a902
    added link to minicpm-v-2_6 example (#11829) Jinhe 2024-08-16 14:49:23 +0800
  • adfbb9124a
    Reorganize MiniCPM-V-2_6 example & update other MiniCPM-V-2 examples (#11815) Jinhe 2024-08-16 14:48:56 +0800
  • f463268e36
    fix: add run oneAPI instruction for the example of codeshell (#11828) Chu,Youcheng 2024-08-16 14:29:06 +0800
  • 17a0beb21f
    optimize qwen2-audio again (#11825) Yishuo Wang 2024-08-16 11:11:35 +0800
  • 6a8d07ddb4
    Update README.md (#11824) Jason Dai 2024-08-16 10:22:02 +0800
  • 9e9086cc2a
    Update IPEX_LLM_PERFORMANCE_MODE (#11823) Yuwen Hu 2024-08-16 09:48:36 +0800
  • 5a80fd2633
    Fix lightweight-serving no streaming resp on mtl (#11822) Wang, Jian4 2024-08-16 09:43:03 +0800
  • e70ae0638e
    Fix vLLM not convert issues (#11817) Guancheng Fu 2024-08-15 19:04:05 +0800
  • 750d4ad5dc
    fix minicpm-v-2 fp16 (#11819) Yishuo Wang 2024-08-15 18:34:40 +0800
  • 6543321f04
    Remove 4k igpu perf on gemma-2-9b-it (#11820) Yuwen Hu 2024-08-15 18:06:19 +0800
  • 28d1c972da
    add mixed_precision argument on ppl wikitext evaluation (#11813) Chu,Youcheng 2024-08-15 17:58:53 +0800
  • 828ab16537
    fix phi3 and minicpmv cpu (#11818) Yishuo Wang 2024-08-15 17:43:29 +0800
  • 4e178f0c5d
    rewrite minicpmv optimization (#11816) Yishuo Wang 2024-08-15 17:27:12 +0800
  • 447c8ed324
    update transformers version for replit-code-v1-3b, `internlm2-chat-… (#11811) Ch1y0q 2024-08-15 16:40:48 +0800
  • 2fbbb51e71
    transformers==4.37, yi & yuan2 & vicuna (#11805) Jinhe 2024-08-15 15:39:24 +0800
  • f43da2d455
    deletion of specification of transformers version (#11808) Jinhe 2024-08-15 15:23:32 +0800
  • 07b7f13982
    support and optimize qwen2-audio (#11809) Yishuo Wang 2024-08-15 14:59:04 +0800
  • 3ac83f8396
    fix: delete ipex extension import in ppl wikitext evaluation (#11806) Chu,Youcheng 2024-08-15 13:40:01 +0800
  • 016e840eed
    Fix performance tests (#11802) Yuwen Hu 2024-08-15 01:37:01 +0800
  • e3c1dae619
    Fix Windows Unit Test (#11801) Shaojun Liu 2024-08-14 19:16:48 +0800
  • 9a93808fc5
    fix and optimize minicpm v 2 (#11799) Yishuo Wang 2024-08-14 17:27:23 +0800
  • d8d887edd2
    added minicpm-v-2_6 (#11794) Jinhe 2024-08-14 16:23:44 +0800
  • 3d6cfa291d
    optimize minicpm v 2.5 (#11793) Yishuo Wang 2024-08-14 16:07:24 +0800
  • 356281cb80
    Further all-in-one benchmark update continuation task (#11784) Yuwen Hu 2024-08-14 14:39:34 +0800
  • 43cca3be27
    fix gemma2 runtime error caused by sliding window (#11788) Ruonan Wang 2024-08-14 05:43:33 +0300
  • dbd14251dd
    Troubleshoot for sycl not found (#11774) Jinhe 2024-08-14 10:26:01 +0800
  • 51bcac1229
    follow up on experimental support of fused decoder layer for llama2 (#11785) Yang Wang 2024-08-13 18:53:55 -0700
  • cb79dcda93
    refactor llama convert to fix minicpm-v 2.5 optimization (#11783) Yishuo Wang 2024-08-14 09:29:57 +0800
  • 7cd6ec9723
    MiniCPM-V support compresskv (#11779) Yina Chen 2024-08-13 14:03:40 +0300
  • 3998de14f0
    Fix mistral forward_qkv in q4_0 (#11781) Qiyuan Gong 2024-08-13 16:48:19 +0800
  • 70c828b87c
    deepspeed zero3 QLoRA finetuning (#11625) Heyang Sun 2024-08-13 16:15:29 +0800
  • a184b120c9
    fix minicpm-v 2.5 (#11780) Yishuo Wang 2024-08-13 16:14:00 +0800
  • ec184af243
    Add gemma-2-2b-it and gemma-2-9b-it to igpu nightly performance test (#11778) Yuwen Hu 2024-08-13 15:39:56 +0800
  • a88c132e54
    Reduce Mistral softmax memory only in low memory mode (#11775) Qiyuan Gong 2024-08-13 14:50:54 +0800
  • aa861df066
    use new fp32 softmax kernel (#11776) Yishuo Wang 2024-08-13 14:48:11 +0800
  • 23d3acdc77
    Add experimental support of fused decoder layer for llama2 (#11768) binbin Deng 2024-08-13 14:41:36 +0800
  • c28b3389e6
    Update npu multimodal example (#11773) Jin, Qiao 2024-08-13 14:14:59 +0800
  • 81824ff8c9
    Fix stdout in all-in-one benchmark to utf-8 (#11772) Yuwen Hu 2024-08-13 10:51:08 +0800
  • a1eb793f70
    optimize minicpm v 2_6 first token perf (#11770) Yishuo Wang 2024-08-13 09:51:18 +0800
  • 841dbcdf3a
    Fix compresskv with lookahead issue (#11767) Yina Chen 2024-08-12 13:53:55 +0300
  • f97a77ea4e
    Update all-in-one benchmark for continuation task input preparation (#11760) Yuwen Hu 2024-08-12 17:49:45 +0800
  • 1b05caba2b
    Set mistral fuse rope to false except fp6 & fp16 (#11765) Xu, Shuo 2024-08-12 17:25:07 +0800
  • 8db34057b4
    optimize lookahead init time (#11769) Ruonan Wang 2024-08-12 12:19:12 +0300
  • 05989ad0f9
    Update npu example and all in one benchmark (#11766) Jin, Qiao 2024-08-12 16:46:46 +0800
  • 57d177738d
    optimize minicpm-v-2_6 repetition penalty (#11763) Yishuo Wang 2024-08-12 14:10:10 +0800
  • fac4c01a6e
    Revert to use out-of-tree GPU driver (#11761) Shaojun Liu 2024-08-12 13:41:47 +0800
  • 245dba0abc
    Fix lightweight-serving codegeex error (#11759) Wang, Jian4 2024-08-12 10:35:37 +0800
  • 66fe2ee464
    initial support of IPEX_LLM_PERFORMANCE_MODE (#11754) Ruonan Wang 2024-08-09 14:04:09 +0300
  • 4b9c57cc60
    Support compress kv with lookahead (#11752) Yina Chen 2024-08-09 12:39:57 +0300
  • 93455aac09
    fix minicpm V 2.6 repeat output (#11753) Yishuo Wang 2024-08-09 17:39:24 +0800
  • 7e917d6cfb
    fix gptq of llama (#11749) Ruonan Wang 2024-08-09 11:39:25 +0300
  • dd46c141bd
    Phi3 support compresskv (#11733) Yina Chen 2024-08-09 10:43:43 +0300
  • d8808cc2e3
    Mistral apply_rotary_pos_emb_no_cache_xpu use rope_theta from config (#11747) Qiyuan Gong 2024-08-09 10:35:51 +0800
  • 044e486480
    Fix vLLM CPU /chat endpoint (#11748) Xiangyu Tian 2024-08-09 10:33:52 +0800
  • 27b4b104ed
    Add qwen2-1.5b-instruct into igpu performance (#11735) Jinhe 2024-08-08 16:42:18 +0800
  • 107f7aafd0
    enable inference mode for deepspeed tp serving (#11742) Shaojun Liu 2024-08-08 14:38:30 +0800
  • 9e65cf00b3
    Add openai-whisper pytorch gpu (#11736) Zijie Li 2024-08-08 12:32:59 +0800
  • 7e61fa1af7
    Revise GPU driver related guide for Windows users (#11740) Yuwen Hu 2024-08-08 11:26:26 +0800
  • d0c89fb715
    updated llama.cpp and ollama quickstart (#11732) Jinhe 2024-08-08 11:04:01 +0800
  • 54cc9353db
    support and optimize minicpm-v-2_6 (#11738) Yishuo Wang 2024-08-07 18:21:16 +0800
  • e956e71fc1
    fix conflict with quant kv (#11737) Yina Chen 2024-08-07 13:10:30 +0300
  • 00a5574c8a
    Use merge_qkv to replace fused_qkv for llama2 (#11727) Ruonan Wang 2024-08-07 13:04:01 +0300
  • d2abc9711b
    Fix MTL 4k input qwen2 compresskv error (#11734) Yina Chen 2024-08-07 11:21:57 +0300
  • a71ae7c22b
    Support minicpm compresskv & modify default compresskv config & default enable compresskv on mtl 2.5k~4.5k (#11726) Yina Chen 2024-08-07 06:35:39 +0300
  • c093f7d980
    fix phi3 (#11729) Yishuo Wang 2024-08-07 09:39:46 +0800
  • e32d13d78c
    Remove Out of tree Driver from GPU driver installation document (#11728) Qiyuan Gong 2024-08-07 09:38:19 +0800
  • e7f7141781
    Add benchmark util for transformers 4.42 (#11725) Zijie Li 2024-08-07 08:48:07 +0800
  • 4676af2054
    add gemma2 example (#11724) Ch1y0q 2024-08-06 21:17:50 +0800
  • 985213614b
    Removed no longer needed models for Arc nightly perf (#11722) SichengStevenLi 2024-08-06 16:12:00 +0800
  • 929675aa6b
    support latest phi3 (#11721) Yishuo Wang 2024-08-06 15:52:55 +0800
  • 11650b6f81
    upgrade glm-4v example transformers version (#11719) Jin, Qiao 2024-08-06 14:55:09 +0800
  • bbdff6edeb
    optimize internvl2 4b performance (#11720) Yishuo Wang 2024-08-06 14:25:08 +0800
  • f44b732aa8
    support internvl2-4b (#11718) Yishuo Wang 2024-08-06 13:36:32 +0800
  • 7f241133da
    Add MiniCPM-Llama3-V-2_5 GPU example (#11693) Jin, Qiao 2024-08-06 10:22:41 +0800
  • 808d9a7bae
    Add MiniCPM-V-2 GPU example (#11699) Jin, Qiao 2024-08-06 10:22:33 +0800
  • 8fb36b9f4a
    add new benchmark_util.py (#11713) Zijie Li 2024-08-05 16:18:48 +0800
  • 493cbd9a36
    Support lightweight-serving with internlm-xcomposer2-vl-7b multimodal input (#11703) Wang, Jian4 2024-08-05 09:36:04 +0800
  • aa98ef96fe
    change mixed_precision to q6_k (#11706) Ruonan Wang 2024-08-02 10:55:16 +0300
  • 1baa3efe0e
    Optimizations for Pipeline Parallel Serving (#11702) Xiangyu Tian 2024-08-02 12:06:59 +0800
  • 8d1e0bd2f4
    add sdp causal support in llama (#11705) Yina Chen 2024-08-02 05:27:40 +0300
  • 736a7ef72e
    add sdp_causal for mistral 4.36 (#11686) Ruonan Wang 2024-08-01 13:57:31 +0300
  • 45c730ff39
    Chatglm support compresskv (#11690) Yina Chen 2024-08-01 13:20:20 +0300