Commit graph

  • a6674f5bce
    Fix should_use_fuse_rope error of Qwen1.5-MoE-A2.7B-Chat (#11216) binbin Deng 2024-06-05 15:56:10 +0800
  • 231b968aba
    Modify the check_results.py to support batch 2&4 (#11133) Wenjing Margaret Mao 2024-06-05 15:04:55 +0800
  • dc4fea7e3f
    always cleanup conda env after build (#11211) Shaojun Liu 2024-06-05 13:46:30 +0800
  • 1f2057b16a
    Fix ipex-llm-cpu docker image (#11213) Shaojun Liu 2024-06-05 11:13:17 +0800
  • 566691c5a3
    quantized attention forward for minicpm (#11200) Xin Qiu 2024-06-05 09:15:25 +0800
  • bb83bc23fd
    Fix Starcoder issue on CPU on transformers 4.36+ (#11190) Jiao Wang 2024-06-04 10:05:40 -0700
  • f93664147c
    Update config.yaml (#11208) Kai Huang 2024-06-04 19:58:18 +0800
  • ac3d53ff5d
    LLM: Fix vLLM CPU version error (#11206) Xiangyu Tian 2024-06-04 19:10:23 +0800
  • 3ef4aa98d1
    Refine vllm_quickstart doc (#11199) Guancheng Fu 2024-06-04 18:46:27 +0800
  • 744042d1b2
    remove software-properties-common from Dockerfile (#11203) Shaojun Liu 2024-06-04 17:37:42 +0800
  • 1dde204775
    update q6k (#11205) Ruonan Wang 2024-06-04 17:14:33 +0800
  • ce3f08b25a
    Fix IPEX auto importer (#11192) Qiyuan Gong 2024-06-04 16:57:18 +0800
  • 711fa0199e
    Fix fp6k phi3 ppl core dump (#11204) Yina Chen 2024-06-04 16:44:27 +0800
  • f02f097002
    Fix vLLM version in CPU/vLLM-Serving example README (#11201) Xiangyu Tian 2024-06-04 15:56:55 +0800
  • 6454655dcc
    use sdp in baichuan2 13b (#11198) Yishuo Wang 2024-06-04 15:39:00 +0800
  • 9f8074c653
    Add extra warmup for chatglm3-6b in igpu-performance test (#11197) Yuwen Hu 2024-06-04 14:06:09 +0800
  • d90cd977d0
    refactor stablelm (#11195) Yishuo Wang 2024-06-04 13:14:43 +0800
  • a644e9409b
    Miniconda/Anaconda -> Miniforge update in examples (#11194) Zijie Li 2024-06-04 10:14:02 +0800
  • 5f13700c9f
    optimize Minicpm (#11189) Xin Qiu 2024-06-03 18:28:29 +0800
  • ff83fad400
    Fix typo in vLLM CPU docker guide (#11188) Xiangyu Tian 2024-06-03 15:55:27 +0800
  • 15a6205790
    Fix LoRA tokenizer for Llama and chatglm (#11186) Qiyuan Gong 2024-06-03 15:35:38 +0800
  • 3eb13ccd8c
    LLM: fix input length condition in deepspeed all-in-one benchmark. (#11185) Cengguang Zhang 2024-06-03 10:05:43 +0800
  • 401013a630
    Remove chatglm_C Module to Eliminate LGPL Dependency (#11178) Shaojun Liu 2024-05-31 17:03:11 +0800
  • 50b5f4476f
    update q4k convert (#11179) Ruonan Wang 2024-05-31 11:36:53 +0800
  • f0aaa130a9
    Update miniconda/anaconda -> miniforge in documentation (#11176) Yuwen Hu 2024-05-30 17:40:18 +0800
  • c0f1be6aea
    Fix pp logic (#11175) Wang, Jian4 2024-05-30 16:40:59 +0800
  • 4127b99ed6
    Fix null pointer dereferences error. (#11125) ZehuaCao 2024-05-30 16:16:10 +0800
  • 50ee004ac7
    Fix vllm condition (#11169) Guancheng Fu 2024-05-30 15:23:17 +0800
  • dcbf4d3d0a
    Add phi-3-vision example (#11156) Jin Qiao 2024-05-30 10:02:47 +0800
  • 93146b9433
    Reconstruct Speculative Decoding example directory (#11136) Jiao Wang 2024-05-29 13:15:27 -0700
  • 2299698b45
    Refine Pipeline Parallel FastAPI example (#11168) Xiangyu Tian 2024-05-29 17:16:50 +0800
  • 9bfbf78bf4
    update api usage of xe_batch & fp16 (#11164) Ruonan Wang 2024-05-29 07:15:14 +0000
  • e29e2f1c78
    Support new fp8 e4m3 (#11158) Yina Chen 2024-05-29 14:27:14 +0800
  • 8e25de1126
    LLM: Add codegeex2 example (#11143) Wang, Jian4 2024-05-29 10:00:26 +0800
  • 751e1a4e29
    Fix concurrent issue in autoTP streaming. (#11150) ZehuaCao 2024-05-29 08:22:38 +0800
  • 7cc43aa67a
    Update readme (#11160) Jason Dai 2024-05-28 21:16:36 +0800
  • bc5008f0d5
    disable sdp_causal in phi-3 to fix overflow (#11157) Yishuo Wang 2024-05-28 17:25:53 +0800
  • 33852bd23e
    Refactor pipeline parallel device config (#11149) SONG Ge 2024-05-28 16:52:46 +0800
  • 62b2d8af6b
    Add lookahead in all-in-one (#11142) hxsz1997 2024-05-28 15:39:58 +0800
  • 83bd9cb681
    add new version for cpp quickstart and keep an old version (#11151) Ruonan Wang 2024-05-28 07:29:34 +0000
  • b44cf405e2
    Refine Pipeline-Parallel-Fastapi example README (#11155) Xiangyu Tian 2024-05-28 15:18:21 +0800
  • d307622797
    fix first token sdp with batch (#11153) Yishuo Wang 2024-05-28 15:03:06 +0800
  • 3464440839
    fix qwen import error (#11154) Yina Chen 2024-05-28 14:50:12 +0800
  • 25b6402315
    Add Windows GPU unit test (#11050) Jin Qiao 2024-05-28 13:29:47 +0800
  • b6b70d1ba0
    Divide core-xe packages (#11131) Yina Chen 2024-05-28 12:00:18 +0800
  • c9168b85b7
    Fix error during merging adapter (#11145) binbin Deng 2024-05-27 19:41:42 +0800
  • daf7b1cd56
    [Docker] Fix image using two cards error (#11144) Guancheng Fu 2024-05-27 16:20:13 +0800
  • 34dab3b4ef
    Update readme (#11141) Jason Dai 2024-05-27 15:41:02 +0800
  • 5c8ccf0ba9
    LLM: Add Pipeline-Parallel-FastAPI example (#10917) Xiangyu Tian 2024-05-27 14:46:29 +0800
  • d550af957a
    fix security issue of eagle (#11140) Ruonan Wang 2024-05-27 02:15:28 +0000
  • 367de141f2
    Fix mixtral-8x7b with transformers=4.37.0 (#11132) binbin Deng 2024-05-27 09:50:54 +0800
  • ab476c7fe2
    Eagle Speculative Sampling examples (#11104) Jean Yu 2024-05-24 13:13:43 -0500
  • fabc395d0d
    add langchain vllm interface (#11121) Guancheng Fu 2024-05-24 17:19:27 +0800
  • 63e95698eb
    [LLM] Reopen autotp generate_stream (#11120) ZehuaCao 2024-05-24 17:16:14 +0800
  • 1dc680341b
    fix phi-3-vision import (#11129) Yishuo Wang 2024-05-24 15:57:15 +0800
  • 7f772c5a4f
    Add half precision for fastchat models (#11130) Guancheng Fu 2024-05-24 15:41:14 +0800
  • 65f4212f89
    Fix qwen 14b run into register attention fwd (#11128) Zhao Changmin 2024-05-24 14:45:07 +0800
  • 373f9e6c79
    add ipex-llm-init.bat for Windows (#11082) Shaojun Liu 2024-05-24 14:26:25 +0800
  • 85491907f3
    Update GIF link (#11119) Shaojun Liu 2024-05-24 14:26:18 +0800
  • 120a0035ac
    Fix type mismatch in eval for Baichuan2 QLora example (#11117) Qiyuan Gong 2024-05-24 14:14:30 +0800
  • 21a1a973c1
    Remove axolotl and python3-blinker (#11127) Qiyuan Gong 2024-05-24 13:54:19 +0800
  • 1db9d9a63b
    optimize internlm2 xcomposer again (#11124) Yishuo Wang 2024-05-24 13:44:52 +0800
  • 9372ce87ce
    fix internlm xcomposer2 fp16 (#11123) Yishuo Wang 2024-05-24 11:03:31 +0800
  • 011b9faa5c
    LLM: unify baichuan2-13b alibi mask dtype with model dtype. (#11107) Cengguang Zhang 2024-05-24 10:27:53 +0800
  • 0a06a6e1d4
    Update tests for transformers 4.36 (#10858) Jiao Wang 2024-05-23 19:26:38 -0700
  • 1291165720
    LLM: Add quickstart for vLLM cpu (#11122) Xiangyu Tian 2024-05-24 10:21:21 +0800
  • 1443b802cc
    Docker: Fix building cpp_docker and remove unimportant dependencies (#11114) Wang, Jian4 2024-05-24 09:49:44 +0800
  • b3f6faa038
    LLM: Add CPU vLLM entrypoint (#11083) Xiangyu Tian 2024-05-24 09:16:59 +0800
  • 7ed270a4d8
    update readme docker section, fix quickstart title, remove chs figure (#11044) Shengsheng Huang 2024-05-24 00:18:20 +0800
  • 797dbc48b8
    fix phi-2 and phi-3 convert (#11116) Yishuo Wang 2024-05-23 17:37:37 +0800
  • 37b98a531f
    support running internlm xcomposer2 on gpu and add sdp optimization (#11115) Yishuo Wang 2024-05-23 17:26:24 +0800
  • c5e8b90c8d
    Add Qwen register attention implementation (#11110) Zhao Changmin 2024-05-23 17:17:45 +0800
  • 0e53f20edb
    support running internlm-xcomposer2 on cpu (#11111) Yishuo Wang 2024-05-23 16:36:09 +0800
  • e0f401d97d
    FIX: APT Repository not working (signatures invalid) (#11112) Shaojun Liu 2024-05-23 16:15:45 +0800
  • d36b41d59e
    Add setuptools limitation for ipex-llm[xpu] (#11102) Yuwen Hu 2024-05-22 18:20:30 +0800
  • cd4dff09ee
    support phi-3 vision (#11101) Yishuo Wang 2024-05-22 17:43:50 +0800
  • 15d906a97b
    Update linux igpu run script (#11098) Zhao Changmin 2024-05-22 17:18:07 +0800
  • f63172ef63
    Align ppl with llama.cpp (#11055) Kai Huang 2024-05-22 16:43:11 +0800
  • f6c9ffe4dc
    Add WANDB_MODE and HF_HUB_OFFLINE to XPU finetune README (#11097) Qiyuan Gong 2024-05-22 15:20:53 +0800
  • 1c5ed9b6cf
    Fix arc ut (#11096) Yuwen Hu 2024-05-22 14:13:13 +0800
  • 4fd1df9cf6
    Add toc for docker quickstarts (#11095) Guancheng Fu 2024-05-22 11:23:22 +0800
  • 584439e498
    update homepage url for ipex-llm (#11094) Shaojun Liu 2024-05-22 11:10:44 +0800
  • bf0f904e66
    Update level_zero on MTL linux (#11085) Zhao Changmin 2024-05-22 11:01:56 +0800
  • 8fdc8fb197
    Quickstart: Run/Develop PyTorch in VSCode with Docker on Intel GPU (#11070) Shaojun Liu 2024-05-22 09:29:42 +0800
  • 71bcd18f44
    fix qwen vl (#11090) Xin Qiu 2024-05-21 18:40:29 +0800
  • f654f7e08c
    Add serving docker quickstart (#11072) Guancheng Fu 2024-05-21 17:00:58 +0800
  • f00625f9a4
    refactor qwen2 (#11087) Yishuo Wang 2024-05-21 16:53:42 +0800
  • 492ed3fd41
    Add verified models to GPU finetune README (#11088) Qiyuan Gong 2024-05-21 15:49:15 +0800
  • 1210491748
    ChatGLM3, Baichuan2 and Qwen1.5 QLoRA example (#11078) Qiyuan Gong 2024-05-21 15:29:43 +0800
  • ecb16dcf14
    Add deepspeed autotp support for xpu docker (#11077) binbin Deng 2024-05-21 14:49:54 +0800
  • 842d6dfc2d
    Further Modify CPU example (#11081) ZehuaCao 2024-05-21 13:55:47 +0800
  • d830a63bb7
    refactor qwen (#11074) Yishuo Wang 2024-05-20 18:08:37 +0800
  • 74950a152a
    Fix tgi_api_server error file name (#11075) Wang, Jian4 2024-05-20 16:48:40 +0800
  • 4e97047d70
    fix baichuan2 13b fp16 (#11071) Yishuo Wang 2024-05-20 11:21:20 +0800
  • 7170dd9192
    Update guide for running qwen with AutoTP (#11065) binbin Deng 2024-05-20 10:53:17 +0800
  • a2e1578fd9
    Merge tgi_api_server to main (#11036) Wang, Jian4 2024-05-20 09:15:03 +0800
  • f60565adc7
    Fix toc for vllm serving quickstart (#11068) Yuwen Hu 2024-05-17 17:12:48 +0800
  • dfac168d5f
    fix format/typo (#11067) Guancheng Fu 2024-05-17 16:52:17 +0800
  • 31ce3e0c13
    refactor baichuan2-13b (#11064) Yishuo Wang 2024-05-17 16:25:30 +0800
  • 67db925112
    Add vllm quickstart (#10978) Guancheng Fu 2024-05-17 16:16:42 +0800