Commit graph

  • 7d8bc83415
    LLM: Partial Prefilling for Pipeline Parallel Serving (#11457) Xiangyu Tian 2024-07-05 13:10:35 +0800
  • 72b4efaad4
    Enhanced XPU Dockerfiles: Optimized Environment Variables and Documentation (#11506) Shaojun Liu 2024-07-04 20:18:38 +0800
  • 60de428b37
    Support pipeline parallel for qwen-vl (#11503) binbin Deng 2024-07-04 18:03:57 +0800
  • 57b8adb189
    [WIP] Support npu load_low_bit method (#11502) Zhao Changmin 2024-07-04 17:15:34 +0800
  • f07937945f
    [REMOVE] remove all useless repo-id in benchmark/igpu-perf (#11508) Jun Wang 2024-07-04 16:38:34 +0800
  • 1a8bab172e
    add minicpm 1B/2B npu support (#11507) Yishuo Wang 2024-07-04 16:31:04 +0800
  • bb0a84044b
    add qwen2 npu support (#11504) Yishuo Wang 2024-07-04 11:01:25 +0800
  • 932ef78131
    Update Workflow Inputs, Runner, and PR Validation Process (#11501) Shaojun Liu 2024-07-03 16:49:54 +0800
  • f84ca99b9f
    optimize gemma2 rmsnorm (#11500) Xin Qiu 2024-07-03 15:21:03 +0800
  • 61c36ba085
    Add pp_serving verified models (#11498) Wang, Jian4 2024-07-03 14:57:09 +0800
  • 9274282ef7
    Support pipeline parallel for glm-4-9b-chat (#11463) binbin Deng 2024-07-03 14:25:28 +0800
  • e7ab93b55c
    Update pull_request_template.md (#11484) Shaojun Liu 2024-07-03 11:13:16 +0800
  • d97c2664ce
    use new fuse rope in stablelm family (#11497) Yishuo Wang 2024-07-03 11:08:26 +0800
  • 18c973dc3e
    Wang jun/ipex llm workflow (#11499) Jun Wang 2024-07-03 10:13:42 +0800
  • e53bd4401c
    Small typo fixes in binary build workflow (#11494) Yuwen Hu 2024-07-02 19:11:43 +0800
  • 4e32c92979
    Further fix for triggering perf test from commit (#11493) Yuwen Hu 2024-07-02 18:56:53 +0800
  • 52519e07df
    remove models we no longer need in benchmark. (#11492) Xu, Shuo 2024-07-02 17:20:48 +0800
  • 6a0134a9b2
    support q4_0_rtn (#11477) Zhao Changmin 2024-07-02 16:57:02 +0800
  • 6352c718f3
    [update] merge manually build for testing function to manually build (#11491) Jun Wang 2024-07-02 16:28:15 +0800
  • 5e967205ac
    remove the code that converts input to fp16 before calling the batch forward kernel (#11489) Yishuo Wang 2024-07-02 16:23:53 +0800
  • 1638573f56
    Update llama cpp quickstart regarding windows prerequisites to avoid misleading (#11490) Yuwen Hu 2024-07-02 16:15:47 +0800
  • 986b10e397
    Further fix for performance tests triggered by pr (#11488) Yuwen Hu 2024-07-02 15:29:42 +0800
  • bb6953c19e
    Support pr validate perf test (#11486) Yuwen Hu 2024-07-02 15:20:42 +0800
  • 4390e7dc49
    Fix codegeex2 transformers version (#11487) Wang, Jian4 2024-07-02 15:09:28 +0800
  • 4fbb0d33ae
    Pin compute runtime version for xpu images (#11479) Guancheng Fu 2024-07-01 21:41:02 +0800
  • a1164e45b6
    Enable Release Pypi workflow to be called in another repo (#11483) Shaojun Liu 2024-07-01 19:48:21 +0800
  • fb4774b076
    Update pull request template for manually-triggered Unit tests (#11482) Yuwen Hu 2024-07-01 19:06:29 +0800
  • ca24794dd0
    Fixes for performance test triggering (#11481) Yuwen Hu 2024-07-01 18:39:54 +0800
  • 6bdc562f4c
    Enable triggering nightly tests/performance tests from another repo (#11480) Yuwen Hu 2024-07-01 17:45:42 +0800
  • ec3a912ab6
    optimize npu llama long context performance (#11478) Yishuo Wang 2024-07-01 16:49:23 +0800
  • 913e750b01
    fix non-string deepspeed config path bug (#11476) Heyang Sun 2024-07-01 15:53:50 +0800
  • 48ad482d3d
    Fix import error caused by pydantic on cpu (#11474) binbin Deng 2024-07-01 15:49:49 +0800
  • dbba51f455
    Enable LLM UT workflow to be called in another repo (#11475) Yuwen Hu 2024-07-01 15:26:17 +0800
  • 39bcb33a67
    add sdp support for stablelm 3b (#11473) Yishuo Wang 2024-07-01 14:56:15 +0800
  • cf8eb7b128
    Init NPU quantize method and support q8_0_rtn (#11452) Zhao Changmin 2024-07-01 13:45:07 +0800
  • 319a3b36b2
    fix npu llama2 (#11471) Yishuo Wang 2024-07-01 10:14:11 +0800
  • 07362ffffc
    ChatGLM3-6B LoRA Fine-tuning Demo (#11450) Heyang Sun 2024-07-01 09:18:39 +0800
  • e000ac90c4
    Add pp_serving example to serving image (#11433) Wang, Jian4 2024-06-28 16:45:25 +0800
  • fd933c92d8
    Fix: Correct num_requests in benchmark for Pipeline Parallel Serving (#11462) Xiangyu Tian 2024-06-28 16:10:51 +0800
  • b7bc1023fb
    Add vllm_online_benchmark.py (#11458) Wang, Jian4 2024-06-28 14:59:06 +0800
  • 86b81c09d9
    Table of Contents in Quickstart Files (#11437) SichengStevenLi 2024-06-28 10:41:00 +0800
  • a414e3ff8a
    add pipeline parallel support with load_low_bit (#11414) SONG Ge 2024-06-28 10:17:56 +0800
  • d0b801d7bc
    LLM: change write mode in all-in-one benchmark. (#11444) Cengguang Zhang 2024-06-27 19:36:38 +0800
  • 987017ef47
    Update pipeline parallel serving for more model support (#11428) binbin Deng 2024-06-27 18:21:01 +0800
  • 029ff15d28
    optimize npu llama2 first token performance (#11451) Yishuo Wang 2024-06-27 17:37:33 +0800
  • 4e4ecd5095
    Control sys.modules ipex duplicate check with BIGDL_CHECK_DUPLICATE_IMPORT (#11453) Qiyuan Gong 2024-06-27 17:21:45 +0800
  • c6e5ad668d
    fix internlm xcomposer meta-instruction typo (#11448) Yishuo Wang 2024-06-27 15:29:43 +0800
  • f89ca23748
    optimize npu llama2 perf again (#11445) Yishuo Wang 2024-06-27 15:13:42 +0800
  • 13f59ae6b4
    Fix llm binary build linux-build-avxvnni failure (#11447) Shaojun Liu 2024-06-27 14:12:14 +0800
  • cf0f5c4322
    change npu document (#11446) Yishuo Wang 2024-06-27 13:59:59 +0800
  • 508c364a79
    Add precision option in PP inference examples (#11440) binbin Deng 2024-06-27 09:24:27 +0800
  • e9e8f9b4d4
    Update Readme (#11441) Jason Dai 2024-06-26 19:48:07 +0800
  • 2939f1ac60
    Update README.md (#11439) Jason Dai 2024-06-26 19:25:58 +0800
  • 2a0f8087e3
    optimize qwen2 gpu memory usage again (#11435) Yishuo Wang 2024-06-26 16:52:29 +0800
  • ab9f7f3ac5
    FIX: Qwen1.5-GPTQ-Int4 inference error (#11432) Shaojun Liu 2024-06-26 15:36:22 +0800
  • 99cd16ef9f
    Fix error while using pipeline parallelism (#11434) Guancheng Fu 2024-06-26 15:33:47 +0800
  • a45ceac4e4
    Update main readme for missing quickstarts (#11427) Yuwen Hu 2024-06-26 13:51:42 +0800
  • 40fa23560e
    Fix LLAVA example on CPU (#11271) Jiao Wang 2024-06-25 20:04:59 -0700
  • ca0e69c3a7
    optimize npu llama perf again (#11431) Yishuo Wang 2024-06-26 10:52:54 +0800
  • 9f6e5b4fba
    optimize llama npu perf (#11426) Yishuo Wang 2024-06-25 17:43:20 +0800
  • e473b8d946
    Add more qwen1.5 and qwen2 support for pipeline parallel inference (#11423) binbin Deng 2024-06-25 15:49:32 +0800
  • aacc1fd8c0
    Fix shape error when run qwen1.5-14b using deepspeed autotp (#11420) binbin Deng 2024-06-25 13:48:37 +0800
  • 3b23de684a
    update npu examples (#11422) Yishuo Wang 2024-06-25 13:32:53 +0800
  • 8ddae22cfb
    LLM: Refactor Pipeline-Parallel-FastAPI example (#11319) Xiangyu Tian 2024-06-25 13:30:36 +0800
  • 34c15d3a10
    update pp document (#11421) SONG Ge 2024-06-25 10:17:20 +0800
  • 9e4ee61737
    rename BIGDL_OPTIMIZE_LM_HEAD to IPEX_LLM_LAST_LM_HEAD and add qwen2 (#11418) Xin Qiu 2024-06-24 18:42:37 +0800
  • 75f836f288
    Add extra warmup for THUDM/glm-4-9b-chat in igpu-performance test (#11417) Yuwen Hu 2024-06-24 18:08:05 +0800
  • ecb9efde65
    Workaround if demo preview image load slow in mddocs (#11412) Yuwen Hu 2024-06-24 16:17:50 +0800
  • 5e823ef2ce
    Fix nightly arc perf (#11404) Shaojun Liu 2024-06-24 15:58:41 +0800
  • ccb3fb357a
    Add mddocs index (#11411) Yuwen Hu 2024-06-24 15:35:18 +0800
  • c985912ee3
    Add Deepspeed LoRA dependencies in document (#11410) Heyang Sun 2024-06-24 15:29:59 +0800
  • abe53eaa4f
    optimize qwen1.5/2 memory usage when running long input with fp16 (#11403) Yishuo Wang 2024-06-24 13:43:04 +0800
  • 7507000ef2
    Fix 1383 Llama model on transformers=4.41 [WIP] (#11280) Guoqiong Song 2024-06-21 11:24:10 -0700
  • 475b0213d2
    README update (API doc and FAQ and minor fixes) (#11397) Shengsheng Huang 2024-06-21 19:46:32 +0800
  • 0c67639539
    Add more examples for pipeline parallel inference (#11372) SONG Ge 2024-06-21 17:55:16 +0800
  • 2004fe1a43
    Small fix (#11395) Yuwen Hu 2024-06-21 17:45:10 +0800
  • 4cb9a4728e
    Add index page for API doc & links update in mddocs (#11393) Yuwen Hu 2024-06-21 17:34:34 +0800
  • b200e11e21
    Add initial python api doc in mddoc (2/2) (#11388) Xu, Shuo 2024-06-21 17:15:05 +0800
  • aafd6d55cd
    Add initial python api doc in mddoc (1/2) (#11389) Yuwen Hu 2024-06-21 17:14:42 +0800
  • a027121530
    Small mddoc fixed based on review (#11391) Yuwen Hu 2024-06-21 17:09:30 +0800
  • 072ce7e66d
    update README links to mddocs (#11387) Shengsheng Huang 2024-06-21 13:59:27 +0800
  • 54f9d07d8f
    Further mddocs fixes (#11386) Yuwen Hu 2024-06-21 13:27:43 +0800
  • b30bf7648e
    Fix vLLM CPU api_server params (#11384) Xiangyu Tian 2024-06-21 13:00:06 +0800
  • 21fc781fce
    Add GLM-4V example (#11343) ivy-lv11 2024-06-21 12:54:31 +0800
  • 9b475c07db
    Add missing ragflow quickstart in mddocs and update legacy contents (#11385) Yuwen Hu 2024-06-21 12:28:26 +0800
  • fed79f106b
    Update mddocs for DockerGuides (#11380) Xu, Shuo 2024-06-21 12:10:35 +0800
  • 1a1a97c9e4
    Update mddocs for part of Overview (2/2) and Inference (#11377) SichengStevenLi 2024-06-21 12:07:50 +0800
  • 33b9a9c4c9
    Update part of Overview guide in mddocs (1/2) (#11378) Zijie Li 2024-06-21 10:45:17 +0800
  • 4ba82191f2
    Support PP inference for chatglm3 (#11375) binbin Deng 2024-06-21 09:59:01 +0800
  • 9a3a21e4fc
    Update part of Quickstart guide in mddocs (2/2) (#11376) Jin Qiao 2024-06-20 19:03:06 +0800
  • 8c9f877171
    Update part of Quickstart guide in mddocs (1/2) Yuwen Hu 2024-06-20 18:43:23 +0800
  • f0fdfa081b
    Optimize qwen 1.5 14B batch performance (#11370) Yishuo Wang 2024-06-20 17:23:39 +0800
  • 5aa3e427a9
    Fix docker images (#11362) Shaojun Liu 2024-06-20 15:44:55 +0800
  • d9dd1b70bd
    Remove example page in mddocs (#11373) Yuwen Hu 2024-06-20 14:23:43 +0800
  • c0e86c523a
    Add qwen-moe batch1 to nightly perf (#11369) Wenjing Margaret Mao 2024-06-20 14:17:41 +0800
  • 769728c1eb
    Add initial md docs (#11371) Yuwen Hu 2024-06-20 13:47:49 +0800
  • 9601fae5d5
    fix system note (#11368) Shengsheng Huang 2024-06-20 11:09:53 +0800
  • a5e7d93242
    Add initial save/load low bit support for NPU (now only fp16 is supported) (#11359) Yishuo Wang 2024-06-20 10:49:39 +0800
  • ed4c439497
    small fix (#11366) Shengsheng Huang 2024-06-20 10:38:20 +0800
  • 05a8d051f6
    Fix run.py run_ipex_fp16_gpu (#11361) RyuKosei 2024-06-20 10:29:32 +0800