Commit graph

  • b2f62a8561
    Add batch 4 perf test (#11355) Wenjing Margaret Mao 2024-06-20 09:48:52 +0800
  • a721c1ae43
    minor fix of ragflow_quickstart.md (#11364) Shengsheng Huang 2024-06-19 22:30:33 +0800
  • 13727635e8
    revise ragflow quickstart (#11363) Shengsheng Huang 2024-06-19 22:24:31 +0800
  • 5283df0078
    LLM: Add RAGFlow with Ollama Example QuickStart (#11338) Zijie Li 2024-06-19 20:00:50 +0800
  • ae452688c2
    Add NPU HF example (#11358) Zijie Li 2024-06-19 18:07:28 +0800
  • 1eb884a249
    IPEX Duplicate importer V2 (#11310) Qiyuan Gong 2024-06-19 16:29:19 +0800
  • 271d82a4fc
    Update readme (#11357) Jason Dai 2024-06-19 10:05:42 +0800
  • ae7b662ed2
    add fp16 NPU Linear support and fix intel_npu_acceleration_library version 1.0 support (#11352) Yishuo Wang 2024-06-19 09:14:59 +0800
  • c44b1942ed
    fix mistral for transformers>=4.39 (#11191) Guoqiong Song 2024-06-18 13:39:35 -0700
  • 67a1e05876
    Remove zero3 context manager from LoRA (#11346) Heyang Sun 2024-06-18 17:24:43 +0800
  • f6cd628cd8
    Fix script usage in vLLM CPU Quickstart (#11353) Xiangyu Tian 2024-06-18 16:50:48 +0800
  • ef9f740801
    Docs: Fix CPU Serving Docker README (#11351) Xiangyu Tian 2024-06-18 16:27:51 +0800
  • c9b4cadd81
    fix vLLM/docker issues (#11348) Guancheng Fu 2024-06-18 16:23:53 +0800
  • 83082e5cc7
    add initial support for intel npu acceleration library (#11347) Yishuo Wang 2024-06-18 16:07:16 +0800
  • 694912698e
    Upgrade scikit-learn to 1.5.0 to fix dependabot issue (#11349) Shaojun Liu 2024-06-18 15:47:25 +0800
  • 44f22cba70
    add config and default value (#11344) hxsz1997 2024-06-18 15:28:57 +0800
  • 1f39bb84c7
    update readthedocs perf data (#11345) Shengsheng Huang 2024-06-18 13:23:47 +0800
  • 00f322d8ee
    Finetune ChatGLM with Deepspeed Zero3 LoRA (#11314) Heyang Sun 2024-06-18 12:31:26 +0800
  • 5dad33e5af
    Support fp8_e4m3 scale search (#11339) Yina Chen 2024-06-18 11:47:43 +0800
  • e50c890e1f
    Support finishing PP inference once eos_token_id is found (#11336) binbin Deng 2024-06-18 09:55:40 +0800
  • de4bb97b4f
    Remove accelerate 0.23.0 install command in readme and docker (#11333) Qiyuan Gong 2024-06-17 17:52:12 +0800
  • ef4b6519fb
    Add phi-3 model support for pipeline parallel inference (#11334) SONG Ge 2024-06-17 17:44:24 +0800
  • 99b309928b
    Add lookahead in test_api: transformer_int4_fp16_gpu (#11337) hxsz1997 2024-06-17 17:41:41 +0800
  • bc4bafffc7
    Update README.md (#11335) Jason Dai 2024-06-17 16:24:23 +0800
  • 5d7c9bf901
    Upgrade accelerate to 0.23.0 (#11331) Qiyuan Gong 2024-06-17 15:03:11 +0800
  • 183e0c6cf5
    glm-4v-9b support (#11327) Xin Qiu 2024-06-17 13:52:37 +0800
  • bca5cbd96c
    Modify arc nightly perf to fp16 (#11275) Wenjing Margaret Mao 2024-06-17 13:47:22 +0800
  • a2a5890b48
    Make manually-triggered perf test able to choose which test to run (#11324) Yuwen Hu 2024-06-17 10:23:13 +0800
  • 1978f63f6b
    Fix igpu performance guide regarding html generation (#11328) Yuwen Hu 2024-06-17 10:21:30 +0800
  • 6ea1e71af0
    Update PP inference benchmark script (#11323) binbin Deng 2024-06-17 09:59:36 +0800
  • be00380f1a
    Fix pipeline parallel inference past_key_value error in Baichuan (#11318) SONG Ge 2024-06-17 09:29:32 +0800
  • 0af0102e61
    Add quantization scale search switch (#11326) Yina Chen 2024-06-14 18:46:52 +0800
  • 8a3247ac71
    support batch forward for q4_k, q6_k (#11325) Ruonan Wang 2024-06-14 18:25:50 +0800
  • e8dd8e97ef
    fix chatglm lookahead on ARC (#11320) Yishuo Wang 2024-06-14 16:26:11 +0800
  • f5ef94046e
    exclude dolly-v2-12b for arc perf test (#11315) Shaojun Liu 2024-06-14 15:35:56 +0800
  • 77809be946
    Install packages for ipex-llm-serving-cpu docker image (#11321) Shaojun Liu 2024-06-14 15:26:01 +0800
  • 4359ab3172
    LLM: Add /generate_stream endpoint for Pipeline-Parallel-FastAPI example (#11187) Xiangyu Tian 2024-06-14 15:15:32 +0800
  • 9e4d87a696
    Langchain-chatchat QuickStart small link fix (#11317) Yuwen Hu 2024-06-14 14:02:17 +0800
  • 0e7a31a09c
    ChatGLM Examples Restructure regarding Installation Steps (#11285) Jin Qiao 2024-06-14 12:37:05 +0800
  • 91965b5d05
    add glm_sdpa back to fix chatglm-6b (#11313) Yishuo Wang 2024-06-14 10:31:43 +0800
  • 7f65836cb9
    fix chatglm2/3-32k/128k fp16 (#11311) Yishuo Wang 2024-06-14 09:58:07 +0800
  • 1b0c4c8cb8
    use new rotary two in chatglm4 (#11312) Xin Qiu 2024-06-13 19:02:18 +0800
  • f1410d6823
    refactor chatglm4 (#11301) Xin Qiu 2024-06-13 18:06:04 +0800
  • 5e25766855
    fix and optimize chatglm2-32k and chatglm3-128k (#11306) Yishuo Wang 2024-06-13 17:37:58 +0800
  • 60cb1dac7c
    Support PP for qwen1.5 (#11300) binbin Deng 2024-06-13 17:35:24 +0800
  • f97cce2642
    Fix import error of ds autotp (#11307) binbin Deng 2024-06-13 16:22:52 +0800
  • 3682c6a979
    add glm4 and qwen2 to igpu perf (#11304) Jin Qiao 2024-06-13 16:16:35 +0800
  • a24666b8f3
    fix chatglm3-6b-32k (#11303) Yishuo Wang 2024-06-13 16:01:34 +0800
  • 9760ffc256
    Fix SDLe CT222 Vulnerabilities (#11237) Shaojun Liu 2024-06-13 15:31:22 +0800
  • bfab294f08
    Update langchain-chatchat QuickStart to include Core Ultra iGPU Linux Guide (#11302) Yuwen Hu 2024-06-13 15:09:55 +0800
  • 84f04087fb
    Add intelanalytics/ipex-llm:sources image for OSPDT (#11296) Shaojun Liu 2024-06-13 14:29:14 +0800
  • 01fe0fc1a2
    refactor chatglm2/3 (#11290) Yishuo Wang 2024-06-13 12:22:58 +0800
  • ea372cc472
    update demos section (#11298) Shengsheng Huang 2024-06-13 11:58:19 +0800
  • 57a023aadc
    Fix vllm tp (#11297) Guancheng Fu 2024-06-13 10:47:48 +0800
  • 986af21896
    fix perf test (#11295) Ruonan Wang 2024-06-13 10:35:48 +0800
  • 220151e2a1
    Refactor pipeline parallel multi-stage implementation (#11286) binbin Deng 2024-06-13 10:00:23 +0800
  • 14b1e6b699
    Fix gguf_q4k (#11293) Ruonan Wang 2024-06-12 20:43:08 +0800
  • 8edcdeb0e7
    Fix bug that torch.ops.torch_ipex.matmul_bias_out cannot work on Linux MTL for short input (#11292) Yuwen Hu 2024-06-12 19:12:57 +0800
  • b61f6e3ab1
    Add update_parent_folder for nightly_perf_test (#11287) Wenjing Margaret Mao 2024-06-12 17:58:13 +0800
  • 2e75bbccf9
    Add more control arguments for benchmark_vllm_throughput (#11291) Guancheng Fu 2024-06-12 17:43:06 +0800
  • 592f7aa61e
    Refine glm1-4 sdp (#11276) Xin Qiu 2024-06-12 17:11:56 +0800
  • cffb932f05
    Expose timeout for streamer for fastchat worker (#11288) Yuwen Hu 2024-06-12 17:02:40 +0800
  • d99423b75a
    Readme demo (#11283) Shengsheng Huang 2024-06-12 17:01:53 +0800
  • e7a4e2296f
    Add Stable Diffusion examples on GPU and CPU (#11166) ivy-lv11 2024-06-12 16:33:25 +0800
  • f224e98297
    Add GLM-4 CPU example (#11223) Jin Qiao 2024-06-12 15:30:51 +0800
  • 40fc8704c4
    Add GPU example for GLM-4 (#11267) Zijie Li 2024-06-12 14:29:50 +0800
  • 0d9cc9c106
    Remove duplicate check for ipex (#11281) Qiyuan Gong 2024-06-12 13:52:02 +0800
  • 10e480ee96
    refactor internlm and internlm2 (#11274) Yishuo Wang 2024-06-11 14:19:19 +0800
  • fac49f15e3
    Remove manual importing ipex in all-in-one benchmark (#11272) Yuwen Hu 2024-06-11 09:32:13 +0800
  • 70b17c87be
    Merge multiple batches (#11264) Wenjing Margaret Mao 2024-06-07 18:38:45 +0800
  • 4b07712fd8
    LLM: Fix vLLM CPU model convert mismatch (#11254) Xiangyu Tian 2024-06-07 15:54:34 +0800
  • 42fab480ea
    support stablm2 12b (#11265) Yishuo Wang 2024-06-07 15:46:00 +0800
  • dbc3c2d72d
    glm4 sdp (#11253) Xin Qiu 2024-06-07 15:42:23 +0800
  • 151fcf37bb
    check device name in use_flash_attention (#11263) Xin Qiu 2024-06-07 15:07:47 +0800
  • 2623944604
    qwen2 sdpa small fix (#11261) Yishuo Wang 2024-06-07 14:42:18 +0800
  • ea0d03fd28
    Refactor baichuan1 7B and 13B (#11258) Yishuo Wang 2024-06-07 14:29:20 +0800
  • 1aa9c9597a
    Avoid duplicate import in IPEX auto importer (#11227) Qiyuan Gong 2024-06-07 14:08:00 +0800
  • 6f2684e5c9
    Update pp llama.py to save memory (#11233) Wang, Jian4 2024-06-07 13:18:16 +0800
  • ef8e9b2ecd
    Refactor qwen2 moe (#11244) Yishuo Wang 2024-06-07 13:14:54 +0800
  • 7b753dc8ca
    Update sample output for HF Qwen2 GPU and CPU (#11257) Zijie Li 2024-06-07 11:36:22 +0800
  • b7948671de
    [WIP] Add look up table in 1st token stage (#11193) Zhao Changmin 2024-06-07 10:51:05 +0800
  • 375174af33
    Small qwen2 link fix (#11255) Yuwen Hu 2024-06-07 10:37:32 +0800
  • 8c36b5bdde
    Add qwen2 example (#11252) Yuwen Hu 2024-06-07 10:29:33 +0800
  • 85df5e7699
    fix nightly perf test (#11251) Shaojun Liu 2024-06-07 09:33:14 +0800
  • 2f809116e2
    optimize Chatglm4 (#11239) Xin Qiu 2024-06-06 18:25:20 +0800
  • eeffeeb2e2
    fix benchmark script (#11243) Guancheng Fu 2024-06-06 17:44:19 +0800
  • 8aabb5bac7
    Enable CodeQL Check for CT39 (#11242) Shaojun Liu 2024-06-06 17:41:12 +0800
  • b6234eb4e2
    Add task in allinone (#11226) hxsz1997 2024-06-06 17:22:40 +0800
  • c825a7e1e9
    change the workflow file to test ftp (#11241) Wenjing Margaret Mao 2024-06-06 16:53:19 +0800
  • 2e4ccd541c
    fix qwen2 cpu (#11240) Yishuo Wang 2024-06-06 16:24:19 +0800
  • e738ec38f4
    disable quantize kv in specific qwen model (#11238) Yishuo Wang 2024-06-06 14:08:39 +0800
  • c4e5806e01
    add latest optimization in starcoder2 (#11236) Yishuo Wang 2024-06-06 14:02:17 +0800
  • ba27e750b1
    refactor yuan2 (#11235) Yishuo Wang 2024-06-06 13:17:54 +0800
  • 6be24fdd28
    OSPDT: add tpp licenses (#11165) Shaojun Liu 2024-06-06 10:59:06 +0800
  • 09c6780d0c
    phi-2 transformers 4.37 (#11161) Guoqiong Song 2024-06-05 13:36:41 -0700
  • f6d5c6af78
    fix issue 1407 (#11171) Guoqiong Song 2024-06-05 13:35:57 -0700
  • bfa1367149
    Add CPU and GPU example for MiniCPM (#11202) Zijie Li 2024-06-05 18:09:53 +0800
  • a27a559650
    Add some information in FAQ to help users solve "RuntimeError: could not create a primitive" error on Windows (#11221) Xu, Shuo 2024-06-05 17:57:42 +0800
  • af96579c76
    Update installation guide for pipeline parallel inference (#11224) Yuwen Hu 2024-06-05 17:54:29 +0800
  • ed67435491
    Support Fp6 k in ipex-llm (#11222) Yina Chen 2024-06-05 17:34:36 +0800