-
6f3441ba4c
fix glm4-9b overflow (#12455)
Yishuo Wang
2024-11-27 17:39:13 +0800
-
281c9b0bb9
[NPU] Add L0 support for NPU C++ (#12454)
Ruonan Wang
2024-11-27 01:04:13 -0800
-
ce6fcaa9ba
update transformers version in example of glm4 (#12453)
Chu,Youcheng
2024-11-27 15:02:25 +0800
-
effb9bb41c
Small update to LangChain examples readme (#12452)
Yuwen Hu
2024-11-27 14:02:25 +0800
-
acd77d9e87
Remove env variable BIGDL_LLM_XMX_DISABLED in documentation (#12445)
Chu,Youcheng
2024-11-27 11:16:36 +0800
-
f8c2bb2943
[NPU] optimize qwen2 prefill performance for C++ (#12451)
Ruonan Wang
2024-11-26 18:46:18 -0800
-
8331875f34
Fix (#12390)
Guancheng Fu
2024-11-27 10:41:58 +0800
-
cb7b08948b
update vllm-docker-quick-start for vllm0.6.2 (#12392)
Jun Wang
2024-11-27 08:47:03 +0800
-
7b40f9b372
[NPU] Support GW for NPU C++ (#12450)
Ruonan Wang
2024-11-26 01:46:40 -0800
-
c2efa264d9
Update LangChain examples to use upstream (#12388)
Jin, Qiao
2024-11-26 16:43:15 +0800
-
24b46b2b19
[NPU] further fix of qwen2 int8 pipeline & C++ (#12449)
Ruonan Wang
2024-11-26 00:39:39 -0800
-
303b104c10
Fix abnormal output for Qwen2-7B when sym_int8 (#12446)
Yuwen Hu
2024-11-26 15:53:04 +0800
-
71e1f11aa6
update serving image runtime (#12433)
Pepijn de Vos
2024-11-26 07:55:30 +0100
-
52c17fe104
Optimize first token of C++ NPU by adding npu_dpu_groups (#12443)
Ruonan Wang
2024-11-25 19:41:32 -0800
-
66bd7abae4
add sdxl and lora-lcm optimization (#12444)
Jinhe
2024-11-26 11:38:09 +0800
-
0e23bd779f
Add support of llama3.2 for NPU C++ (#12442)
Ruonan Wang
2024-11-25 17:26:55 -0800
-
cdd41f5e4c
optimize sdxl again (#12441)
Yishuo Wang
2024-11-25 17:46:46 +0800
-
b9abb8a285
Support qwen2.5 3B for NPU & update related examples (#12438)
Ruonan Wang
2024-11-25 00:38:31 -0800
-
b633fbf26c
add chinese prompt troubleshooting for npu cpp examples (#12437)
Jinhe
2024-11-25 15:28:47 +0800
-
8164aed802
small change (#12439)
Yishuo Wang
2024-11-25 14:35:49 +0800
-
be132c4209
fix and optimize sd (#12436)
Yishuo Wang
2024-11-25 14:09:48 +0800
-
f41405368a
Support minicpm for NPU C++ (#12434)
Ruonan Wang
2024-11-24 18:42:02 -0800
-
0819fad34e
support Llama2-7B / Llama3-8B for NPU C++ (#12431)
Ruonan Wang
2024-11-22 02:47:19 -0800
-
4ffa6c752c
New convert support for C++ NPU (#12430)
Ruonan Wang
2024-11-21 22:28:30 -0800
-
c089b6c10d
Update english prompt to 34k (#12429)
Shaojun Liu
2024-11-22 11:20:35 +0800
-
e61ae88c5b
Upgrade dependency for xpu_lnl and xpu_arl option (#12424)
Yuwen Hu
2024-11-21 18:37:15 +0800
-
2935e97610
small fix of cpp readme (#12425)
Ruonan Wang
2024-11-21 02:21:34 -0800
-
8fdc36c140
Optimize with new batch kernel when batch_size=1 on LNL (#12419)
Yuwen Hu
2024-11-21 16:21:35 +0800
-
7e0a840f74
add optimization to openjourney (#12423)
Jinhe
2024-11-21 15:23:51 +0800
-
145e8b480f
update batch kernel condition (#12421)
Yishuo Wang
2024-11-21 10:12:46 +0800
-
7288c759ce
Initial NPU C++ Example (#12417)
Ruonan Wang
2024-11-20 18:09:26 -0800
-
d2a37b6ab2
add Stable diffusion examples (#12418)
Jinhe
2024-11-20 17:18:36 +0800
-
54c62feb74
[NPU] dump prefill IR for further C++ solution (#12402)
Ruonan Wang
2024-11-19 23:20:05 -0800
-
1bfcbc0640
Add multimodal benchmark (#12415)
Wang, Jian4
2024-11-20 14:21:13 +0800
-
ff3f7cb25f
Fix speech_paraformer issue with unexpected changes (#12416)
SONG Ge
2024-11-18 23:01:20 -0800
-
a9cb70a71c
Add install_windows_gpu.zh-CN.md and install_linux_gpu.zh-CN.md (#12409)
joan726
2024-11-19 14:39:53 +0800
-
d6057f6dd2
Update benchmark_vllm_throughput.py (#12414)
Guancheng Fu
2024-11-19 10:41:43 +0800
-
a69395f31f
Support performance mode of GLM4 model (#12401)
Yuwen Hu
2024-11-18 18:46:52 +0800
-
d2c821d458
Add missing arguments in pipeline parallel generate method (#12142)
Song Fuchang
2024-11-18 13:50:18 +0800
-
3d5fbf2069
update batch kernel condition (#12408)
Yishuo Wang
2024-11-15 13:47:05 +0800
-
6c5e8fc70c
fix again (#12407)
Ruonan Wang
2024-11-15 11:57:58 +0800
-
fcc0fa7316
fix workflow again (#12406)
Ruonan Wang
2024-11-15 11:01:35 +0800
-
d1cde7fac4
Tiny doc fix (#12405)
Yuwen Hu
2024-11-15 10:28:38 +0800
-
548dec5185
fix npu pipeline workflow (#12404)
Ruonan Wang
2024-11-15 10:01:33 +0800
-
d4d949443f
[NPU] change attention_mask to fp16 (#12400)
binbin Deng
2024-11-14 17:20:29 +0800
-
7e50ff113c
Add padding_token=eos_token for GPU trl QLora example (#12398)
Qiyuan Gong
2024-11-14 10:51:30 +0800
-
d2cbcb060c
Add initial support for modeling_xlm encoder on NPU (#12393)
SONG Ge
2024-11-14 10:50:27 +0800
-
6726b198fd
Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
Xu, Shuo
2024-11-14 10:28:15 +0800
-
59b01fa7d2
small fix (#12397)
Yina Chen
2024-11-14 04:03:36 +0200
-
00fce5c940
use new q4_0 batch kernel (#12396)
Yishuo Wang
2024-11-13 18:37:34 +0800
-
d6d63d6b84
[NPU] Qwen prefill attn_mask type hotfix (#12395)
Yina Chen
2024-11-13 11:51:34 +0200
-
9220babaab
qwen prefill attn_mask type fp16 (#12394)
Yina Chen
2024-11-13 11:45:26 +0200
-
1158f91648
Fix llava with multi-image inputs (#12384)
Yuwen Hu
2024-11-13 09:27:50 +0800
-
27152476e1
minor fix (#12389)
Shaojun Liu
2024-11-12 22:36:43 +0800
-
dd8964ba9c
changed inference-cpp/Dockerfile (#12386)
Xu, Shuo
2024-11-12 20:40:21 +0800
-
0ee54fc55f
Upgrade to vllm 0.6.2 (#12338)
Guancheng Fu
2024-11-12 20:35:34 +0800
-
4376fdee62
Decouple openwebui and ollama in inference-cpp-xpu dockerfile (#12382)
Jun Wang
2024-11-12 20:15:23 +0800
-
6bf5a8c230
[NPU] Update qwen2 compile config (#12383)
Ruonan Wang
2024-11-12 16:59:44 +0800
-
7a97fbb779
Support vpm and resampler module of minicpm-v on NPU (#12375)
binbin Deng
2024-11-12 15:59:55 +0800
-
85c9279e6e
Update llama-cpp docker usage (#12387)
Wang, Jian4
2024-11-12 15:30:17 +0800
-
c92d76b997
Update oneccl-binding.patch (#12377)
Shaojun Liu
2024-11-11 22:34:08 +0800
-
e0918934c8
Add fused_mlp to glm4v models (#12378)
Yuwen Hu
2024-11-11 17:10:25 +0800
-
dc34e8c51f
optimize glm4v vision attention (#12369)
Yishuo Wang
2024-11-08 17:01:57 +0800
-
2dfcc36825
Fix trl version and padding in trl qlora example (#12368)
Qiyuan Gong
2024-11-08 16:05:17 +0800
-
fad15c8ca0
Update fastchat demo script (#12367)
Shaojun Liu
2024-11-08 15:42:17 +0800
-
51f7f87768
fix ipex 2.3 bug (#12366)
Yishuo Wang
2024-11-08 13:29:15 +0800
-
b2e69a896c
[NPU] Support Baichuan groupwise & gw code refactor (#12337)
Yina Chen
2024-11-08 05:42:42 +0200
-
812d5cc32e
[NPU L0] Support llama3.2 in L0 pipeline (#12361)
binbin Deng
2024-11-08 10:01:23 +0800
-
7ef7696956
update linux installation doc (#12365)
Xin Qiu
2024-11-08 09:44:58 +0800
-
8fe294e01f
Small fix to all-in-one benchmark (#12362)
Yuwen Hu
2024-11-07 18:56:34 +0800
-
1a6cbc473f
Add fused mlp optimizations to glm4 models (#12360)
Yuwen Hu
2024-11-07 18:52:47 +0800
-
520af4e9b5
Update install_linux_gpu.md (#12353)
Xin Qiu
2024-11-07 16:08:01 +0800
-
ad68c56573
small improvement (#12359)
Yishuo Wang
2024-11-07 15:57:41 +0800
-
71ea539351
Add troubleshootings for ollama and llama.cpp (#12358)
Jinhe
2024-11-07 15:49:20 +0800
-
ce0c6ae423
Update Readme for FastChat docker demo (#12354)
Xu, Shuo
2024-11-07 15:22:42 +0800
-
d880e534d2
[NPU] acclib llama3.2 support groupwise (#12355)
Yina Chen
2024-11-07 05:19:55 +0200
-
79f2877413
add minicpm-v models to transformers_int4_npu_win api (#12352)
Jinhe
2024-11-07 10:05:10 +0800
-
a7b66683f1
[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU (#12339)
SONG Ge
2024-11-06 19:21:40 +0800
-
872a74481a
Small optimization to glm4 models (#12351)
Yuwen Hu
2024-11-06 19:16:58 +0800
-
c267355b35
fix three NPU benchmark issues (#12350)
Ruonan Wang
2024-11-06 19:01:01 +0800
-
f24352aef9
llama 3.1/3.2 support compresskv (#12347)
Yina Chen
2024-11-06 11:33:43 +0200
-
d984c0672a
Add MiniCPM-V-2_6 to arc perf test (#12349)
Jin, Qiao
2024-11-06 16:32:28 +0800
-
e23ef7d088
optimize glm4v's vision part (#12346)
Yishuo Wang
2024-11-06 15:43:40 +0800
-
c8b7265359
Add basic glm4v support (#12345)
Yishuo Wang
2024-11-06 13:50:10 +0800
-
69e3a56943
[NPU] Hot fix of load_low_bit (#12344)
binbin Deng
2024-11-06 10:07:00 +0800
-
899a30331a
Replace gradio_web_server.patch to adjust webui (#12329)
Xu, Shuo
2024-11-06 09:16:32 +0800
-
7240c283a3
Add dummy model in iGPU perf (#12341)
Jin, Qiao
2024-11-05 17:56:10 +0800
-
8e9a3a1158
fix chatglm2 cpu ut (#12336)
Zhao Changmin
2024-11-05 16:43:57 +0800
-
d872639395
[NPU] Llama3, Qwen2 1.5b, MiniCPM 1/2B groupwise support (#12327)
Yina Chen
2024-11-05 09:51:31 +0200
-
82a61b5cf3
Limit trl version in example (#12332)
Jin, Qiao
2024-11-05 14:50:10 +0800
-
923d696854
Small fix to LNL performance tests (#12333)
Yuwen Hu
2024-11-05 13:24:58 +0800
-
45b0d371aa
update benchmark readme (#12323)
Zijie Li
2024-11-04 19:19:08 -0500
-
e2adc974fd
Small fix to LNL performance tests (#12331)
Yuwen Hu
2024-11-04 19:22:41 +0800
-
522cdf8e9d
Add initial support for LNL nightly performance tests (#12326)
Yuwen Hu
2024-11-04 18:53:51 +0800
-
1b637e4477
Add chatglm2&3 fuse mlp (#12328)
Zhao Changmin
2024-11-04 18:04:41 +0800
-
94c4ce389f
[NPU] Add env to disable compile opt (#12330)
Yina Chen
2024-11-04 11:46:17 +0200
-
e54af44ed6
Add transformers_int4_npu_pipeline_win in all-in-one benchmark (#12325)
Ch1y0q
2024-11-04 16:00:20 +0800
-
5ee6f97d6f
[NPU L0] Add layernorm weight as const / input setting (#12322)
binbin Deng
2024-11-04 15:46:24 +0800
-
a01371f90b
Doc: update harness readme (#12324)
Chu,Youcheng
2024-11-04 14:58:54 +0800
-
4644cb640c
Perf test further fix regarding trl version (#12321)
Yuwen Hu
2024-11-04 11:01:25 +0800