Commit graph

  • 7d8bc83415
    LLM: Partial Prefilling for Pipeline Parallel Serving (#11457) Xiangyu Tian 2024-07-05 13:10:35 +0800
  • 72b4efaad4
    Enhanced XPU Dockerfiles: Optimized Environment Variables and Documentation (#11506) Shaojun Liu 2024-07-04 20:18:38 +0800
  • 60de428b37
    Support pipeline parallel for qwen-vl (#11503) binbin Deng 2024-07-04 18:03:57 +0800
  • 57b8adb189
    [WIP] Support npu load_low_bit method (#11502) Zhao Changmin 2024-07-04 17:15:34 +0800
  • f07937945f
    [REMOVE] remove all useless repo-id in benchmark/igpu-perf (#11508) Jun Wang 2024-07-04 16:38:34 +0800
  • 1a8bab172e
    add minicpm 1B/2B npu support (#11507) Yishuo Wang 2024-07-04 16:31:04 +0800
  • bb0a84044b
    add qwen2 npu support (#11504) Yishuo Wang 2024-07-04 11:01:25 +0800
  • 932ef78131
    Update Workflow Inputs, Runner, and PR Validation Process (#11501) Shaojun Liu 2024-07-03 16:49:54 +0800
  • f84ca99b9f
    optimize gemma2 rmsnorm (#11500) Xin Qiu 2024-07-03 15:21:03 +0800
  • 61c36ba085
    Add pp_serving verified models (#11498) Wang, Jian4 2024-07-03 14:57:09 +0800
  • 9274282ef7
    Support pipeline parallel for glm-4-9b-chat (#11463) binbin Deng 2024-07-03 14:25:28 +0800
  • e7ab93b55c
    Update pull_request_template.md (#11484) Shaojun Liu 2024-07-03 11:13:16 +0800
  • d97c2664ce
    use new fuse rope in stablelm family (#11497) Yishuo Wang 2024-07-03 11:08:26 +0800
  • 18c973dc3e
    Wang jun/ipex llm workflow (#11499) Jun Wang 2024-07-03 10:13:42 +0800
  • e53bd4401c
    Small typo fixes in binary build workflow (#11494) Yuwen Hu 2024-07-02 19:11:43 +0800
  • 4e32c92979
    Further fix for triggering perf test from commit (#11493) Yuwen Hu 2024-07-02 18:56:53 +0800
  • 52519e07df
    remove models we no longer need in benchmark. (#11492) Xu, Shuo 2024-07-02 17:20:48 +0800
  • 6a0134a9b2
    support q4_0_rtn (#11477) Zhao Changmin 2024-07-02 16:57:02 +0800
  • 6352c718f3
    [update] merge manually build for testing function to manually build (#11491) Jun Wang 2024-07-02 16:28:15 +0800
  • 5e967205ac
    remove the code that converts input to fp16 before calling the batch forward kernel (#11489) Yishuo Wang 2024-07-02 16:23:53 +0800
  • 1638573f56
    Update llama cpp quickstart regarding windows prerequisites to avoid misleading (#11490) Yuwen Hu 2024-07-02 16:15:47 +0800
  • 986b10e397
    Further fix for performance tests triggered by pr (#11488) Yuwen Hu 2024-07-02 15:29:42 +0800
  • bb6953c19e
    Support pr validate perf test (#11486) Yuwen Hu 2024-07-02 15:20:42 +0800
  • 4390e7dc49
    Fix codegeex2 transformers version (#11487) Wang, Jian4 2024-07-02 15:09:28 +0800
  • 4fbb0d33ae
    Pin compute runtime version for xpu images (#11479) Guancheng Fu 2024-07-01 21:41:02 +0800
  • a1164e45b6
    Enable Release Pypi workflow to be called in another repo (#11483) Shaojun Liu 2024-07-01 19:48:21 +0800
  • fb4774b076
    Update pull request template for manually-triggered Unit tests (#11482) Yuwen Hu 2024-07-01 19:06:29 +0800
  • ca24794dd0
    Fixes for performance test triggering (#11481) Yuwen Hu 2024-07-01 18:39:54 +0800
  • 6bdc562f4c
    Enable triggering nightly tests/performance tests from another repo (#11480) Yuwen Hu 2024-07-01 17:45:42 +0800
  • ec3a912ab6
    optimize npu llama long context performance (#11478) Yishuo Wang 2024-07-01 16:49:23 +0800
  • 913e750b01
    fix non-string deepspeed config path bug (#11476) Heyang Sun 2024-07-01 15:53:50 +0800
  • 48ad482d3d
    Fix import error caused by pydantic on cpu (#11474) binbin Deng 2024-07-01 15:49:49 +0800
  • dbba51f455
    Enable LLM UT workflow to be called in another repo (#11475) Yuwen Hu 2024-07-01 15:26:17 +0800
  • 39bcb33a67
    add sdp support for stablelm 3b (#11473) Yishuo Wang 2024-07-01 14:56:15 +0800
  • cf8eb7b128
    Init NPU quantize method and support q8_0_rtn (#11452) Zhao Changmin 2024-07-01 13:45:07 +0800
  • 319a3b36b2
    fix npu llama2 (#11471) Yishuo Wang 2024-07-01 10:14:11 +0800
  • 07362ffffc
    ChatGLM3-6B LoRA Fine-tuning Demo (#11450) Heyang Sun 2024-07-01 09:18:39 +0800
  • e000ac90c4
    Add pp_serving example to serving image (#11433) Wang, Jian4 2024-06-28 16:45:25 +0800
  • fd933c92d8
    Fix: Correct num_requests in benchmark for Pipeline Parallel Serving (#11462) Xiangyu Tian 2024-06-28 16:10:51 +0800
  • b7bc1023fb
    Add vllm_online_benchmark.py (#11458) Wang, Jian4 2024-06-28 14:59:06 +0800
  • 86b81c09d9
    Table of Contents in Quickstart Files (#11437) SichengStevenLi 2024-06-28 10:41:00 +0800
  • a414e3ff8a
    add pipeline parallel support with load_low_bit (#11414) SONG Ge 2024-06-28 10:17:56 +0800
  • d0b801d7bc
    LLM: change write mode in all-in-one benchmark. (#11444) Cengguang Zhang 2024-06-27 19:36:38 +0800
  • 987017ef47
    Update pipeline parallel serving for more model support (#11428) binbin Deng 2024-06-27 18:21:01 +0800
  • 029ff15d28
    optimize npu llama2 first token performance (#11451) Yishuo Wang 2024-06-27 17:37:33 +0800
  • 4e4ecd5095
    Control sys.modules ipex duplicate check with BIGDL_CHECK_DUPLICATE_IMPORT (#11453) Qiyuan Gong 2024-06-27 17:21:45 +0800
  • c6e5ad668d
    fix internlm xcomposer meta-instruction typo (#11448) Yishuo Wang 2024-06-27 15:29:43 +0800
  • f89ca23748
    optimize npu llama2 perf again (#11445) Yishuo Wang 2024-06-27 15:13:42 +0800
  • 13f59ae6b4
    Fix llm binary build linux-build-avxvnni failure (#11447) Shaojun Liu 2024-06-27 14:12:14 +0800
  • cf0f5c4322
    change npu document (#11446) Yishuo Wang 2024-06-27 13:59:59 +0800
  • 508c364a79
    Add precision option in PP inference examples (#11440) binbin Deng 2024-06-27 09:24:27 +0800
  • e9e8f9b4d4
    Update Readme (#11441) Jason Dai 2024-06-26 19:48:07 +0800
  • 2939f1ac60
    Update README.md (#11439) Jason Dai 2024-06-26 19:25:58 +0800
  • 2a0f8087e3
    optimize qwen2 gpu memory usage again (#11435) Yishuo Wang 2024-06-26 16:52:29 +0800
  • ab9f7f3ac5
    FIX: Qwen1.5-GPTQ-Int4 inference error (#11432) Shaojun Liu 2024-06-26 15:36:22 +0800
  • 99cd16ef9f
    Fix error while using pipeline parallelism (#11434) Guancheng Fu 2024-06-26 15:33:47 +0800
  • a45ceac4e4
    Update main readme for missing quickstarts (#11427) Yuwen Hu 2024-06-26 13:51:42 +0800
  • 40fa23560e
    Fix LLAVA example on CPU (#11271) Jiao Wang 2024-06-25 20:04:59 -0700
  • ca0e69c3a7
    optimize npu llama perf again (#11431) Yishuo Wang 2024-06-26 10:52:54 +0800
  • 9f6e5b4fba
    optimize llama npu perf (#11426) Yishuo Wang 2024-06-25 17:43:20 +0800
  • e473b8d946
    Add more qwen1.5 and qwen2 support for pipeline parallel inference (#11423) binbin Deng 2024-06-25 15:49:32 +0800
  • aacc1fd8c0
    Fix shape error when run qwen1.5-14b using deepspeed autotp (#11420) binbin Deng 2024-06-25 13:48:37 +0800
  • 3b23de684a
    update npu examples (#11422) Yishuo Wang 2024-06-25 13:32:53 +0800
  • 8ddae22cfb
    LLM: Refactor Pipeline-Parallel-FastAPI example (#11319) Xiangyu Tian 2024-06-25 13:30:36 +0800
  • 34c15d3a10
    update pp document (#11421) SONG Ge 2024-06-25 10:17:20 +0800
  • 9e4ee61737
    rename BIGDL_OPTIMIZE_LM_HEAD to IPEX_LLM_LAST_LM_HEAD and add qwen2 (#11418) Xin Qiu 2024-06-24 18:42:37 +0800
  • 75f836f288
    Add extra warmup for THUDM/glm-4-9b-chat in igpu-performance test (#11417) Yuwen Hu 2024-06-24 18:08:05 +0800
  • ecb9efde65
    Workaround if demo preview image load slow in mddocs (#11412) Yuwen Hu 2024-06-24 16:17:50 +0800
  • 5e823ef2ce
    Fix nightly arc perf (#11404) Shaojun Liu 2024-06-24 15:58:41 +0800
  • ccb3fb357a
    Add mddocs index (#11411) Yuwen Hu 2024-06-24 15:35:18 +0800
  • c985912ee3
    Add Deepspeed LoRA dependencies in document (#11410) Heyang Sun 2024-06-24 15:29:59 +0800
  • abe53eaa4f
    optimize qwen1.5/2 memory usage when running long input with fp16 (#11403) Yishuo Wang 2024-06-24 13:43:04 +0800
  • 7507000ef2
    Fix 1383 Llama model on transformers=4.41 [WIP] (#11280) Guoqiong Song 2024-06-21 11:24:10 -0700
  • 475b0213d2
    README update (API doc and FAQ and minor fixes) (#11397) Shengsheng Huang 2024-06-21 19:46:32 +0800
  • 0c67639539
    Add more examples for pipeline parallel inference (#11372) SONG Ge 2024-06-21 17:55:16 +0800
  • 2004fe1a43
    Small fix (#11395) Yuwen Hu 2024-06-21 17:45:10 +0800
  • 4cb9a4728e
    Add index page for API doc & links update in mddocs (#11393) Yuwen Hu 2024-06-21 17:34:34 +0800
  • b200e11e21
    Add initial python api doc in mddoc (2/2) (#11388) Xu, Shuo 2024-06-21 17:15:05 +0800
  • aafd6d55cd
    Add initial python api doc in mddoc (1/2) (#11389) Yuwen Hu 2024-06-21 17:14:42 +0800
  • a027121530
    Small mddoc fixed based on review (#11391) Yuwen Hu 2024-06-21 17:09:30 +0800
  • 072ce7e66d
    update README links to mddocs (#11387) Shengsheng Huang 2024-06-21 13:59:27 +0800
  • 54f9d07d8f
    Further mddocs fixes (#11386) Yuwen Hu 2024-06-21 13:27:43 +0800
  • b30bf7648e
    Fix vLLM CPU api_server params (#11384) Xiangyu Tian 2024-06-21 13:00:06 +0800
  • 21fc781fce
    Add GLM-4V example (#11343) ivy-lv11 2024-06-21 12:54:31 +0800
  • 9b475c07db
    Add missing ragflow quickstart in mddocs and update legacy contents (#11385) Yuwen Hu 2024-06-21 12:28:26 +0800
  • fed79f106b
    Update mddocs for DockerGuides (#11380) Xu, Shuo 2024-06-21 12:10:35 +0800
  • 1a1a97c9e4
    Update mddocs for part of Overview (2/2) and Inference (#11377) SichengStevenLi 2024-06-21 12:07:50 +0800
  • 33b9a9c4c9
    Update part of Overview guide in mddocs (1/2) (#11378) Zijie Li 2024-06-21 10:45:17 +0800
  • 4ba82191f2
    Support PP inference for chatglm3 (#11375) binbin Deng 2024-06-21 09:59:01 +0800
  • 9a3a21e4fc
    Update part of Quickstart guide in mddocs (2/2) (#11376) Jin Qiao 2024-06-20 19:03:06 +0800
  • 8c9f877171
    Update part of Quickstart guide in mddocs (1/2) Yuwen Hu 2024-06-20 18:43:23 +0800
  • f0fdfa081b
    Optimize qwen 1.5 14B batch performance (#11370) Yishuo Wang 2024-06-20 17:23:39 +0800
  • 5aa3e427a9
    Fix docker images (#11362) Shaojun Liu 2024-06-20 15:44:55 +0800
  • d9dd1b70bd
    Remove example page in mddocs (#11373) Yuwen Hu 2024-06-20 14:23:43 +0800
  • c0e86c523a
    Add qwen-moe batch1 to nightly perf (#11369) Wenjing Margaret Mao 2024-06-20 14:17:41 +0800
  • 769728c1eb
    Add initial md docs (#11371) Yuwen Hu 2024-06-20 13:47:49 +0800
  • 9601fae5d5
    fix system note (#11368) Shengsheng Huang 2024-06-20 11:09:53 +0800
  • a5e7d93242
    Add initial save/load low bit support for NPU (now only fp16 is supported) (#11359) Yishuo Wang 2024-06-20 10:49:39 +0800
  • ed4c439497
    small fix (#11366) Shengsheng Huang 2024-06-20 10:38:20 +0800
  • 05a8d051f6
    Fix run.py run_ipex_fp16_gpu (#11361) RyuKosei 2024-06-20 10:29:32 +0800