Commit graph

762 commits

Author SHA1 Message Date
zxue2
6ba3138d7c
Fix ambiguous boolean evaluation in bert.py (#13236)
Signed-off-by: Xue, Zhan <zhan.xue@intel.com>
2025-06-30 14:14:01 +08:00
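The "ambiguous boolean evaluation" fixed above is the classic PyTorch pitfall of using a multi-element tensor directly in an `if` test. The actual diff is not shown in this log; the sketch below, with a hypothetical `attention_mask`, only illustrates the failure mode and the usual fixes.

```python
import torch

attention_mask = torch.ones(1, 16)

# Ambiguous: a multi-element tensor has no single truth value and raises
# "Boolean value of Tensor with more than one element is ambiguous".
# if attention_mask:
#     ...

# Unambiguous alternatives, depending on the intended check:
if attention_mask is not None:   # "was a mask passed at all?"
    pass
if attention_mask.any():         # "is at least one position set?"
    pass
```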
Guancheng Fu
3f6d407be4
Fix engine.py (#13215) 2025-06-09 09:03:17 +08:00
Guancheng Fu
ac04992278
Update engine.py (#13209) 2025-06-06 15:47:33 +08:00
Wang, Jian4
5a1c1297e1
Fix internvl fp16 error (#13205) 2025-06-05 11:17:44 +08:00
Wang, Jian4
45864790f7
Enable phi-4 with vision and audio (#13203)
* add phi4

* update

* enable audio

* update and add readme
2025-06-05 10:15:20 +08:00
Yina Chen
e032156518
Support torch_fp8 (#13196)
* support torch_fp8
2025-06-04 20:08:01 +08:00
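The torch_fp8 support above presumably builds on PyTorch's native float8 dtypes. As a rough illustration only (per-tensor scaling, a made-up weight, not ipex-llm's actual kernel path), storing a weight in `torch.float8_e4m3fn` looks like this:

```python
import torch

# Hypothetical weight tensor; the real code path in ipex-llm may differ.
w = torch.randn(4096, 4096, dtype=torch.float16)

# Per-tensor scale so values fit the e4m3 range (max normal value is 448).
scale = w.abs().max() / 448.0
w_fp8 = (w / scale).to(torch.float8_e4m3fn)   # 1 byte per element

# Dequantize back to half precision before (or inside) the matmul.
w_dq = w_fp8.to(torch.float16) * scale
```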
Guancheng Fu
bb50cd0881
Update api_server.py (#13198) 2025-05-30 09:26:53 +08:00
Ruonan Wang
9df610f80d
fix trl import when not running speculative (#13187)
* fix trl import when not running speculative

* fix style
2025-05-26 13:21:54 +08:00
Xiangyu Tian
531bef2810
vLLM: Fix convert_to_half condition (#13177)
* fix

* format
2025-05-22 15:44:10 +08:00
Wang, Jian4
e3130a06ed
Fix multimodal errors (#13178)
* fix glm4v int4 output error

* fix glm-4v qwen2.5-vl fp16 error

* update
2025-05-22 15:39:27 +08:00
Xiangyu Tian
154af7d7f7
vLLM: set convert_to_half to False by default (#13172)
* init

* remove

* fix
2025-05-21 18:41:28 +08:00
Wang, Jian4
d83e5068d2
Enable whisper (#13162)
* fix error

* update dockerfile
2025-05-19 14:07:51 +08:00
Yina Chen
8ba57b41cd
Add merge quantized qkv (#13160)
* add merge quantized qkv

* fix style & device

* add check
2025-05-16 15:46:47 +08:00
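Merging quantized q/k/v replaces three projection GEMMs with a single one. The helper added above works on ipex-llm's quantized weight layout, which is not shown here; a plain-float sketch of the same idea, with hypothetical `q_proj`/`k_proj`/`v_proj` modules, is:

```python
import torch
import torch.nn as nn

def merge_qkv(q_proj: nn.Linear, k_proj: nn.Linear, v_proj: nn.Linear) -> nn.Linear:
    # Concatenate the three projection weights along the output dimension
    # so one matmul produces q, k and v in a single pass.
    out_features = q_proj.out_features + k_proj.out_features + v_proj.out_features
    qkv = nn.Linear(q_proj.in_features, out_features, bias=q_proj.bias is not None)
    qkv.weight.data = torch.cat([q_proj.weight, k_proj.weight, v_proj.weight], dim=0)
    if q_proj.bias is not None:
        qkv.bias.data = torch.cat([q_proj.bias, k_proj.bias, v_proj.bias], dim=0)
    return qkv

# Downstream, split the fused output back into the three tensors:
# q, k, v = qkv(x).split([q_dim, k_dim, v_dim], dim=-1)
```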
Emmanuel Ferdman
1e4e1353a0
Resolve messages formatting issues (#13095)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-15 16:46:52 +08:00
Kai Huang
35b49e4d91
Add trl version in error message (#13049)
* add version in error msg

* fix style
2025-05-15 09:16:27 +08:00
Yina Chen
f6441b4e3d
Add moe_softmax_topk (#13157)
* add moe_softmax_topk

* address comments

* update
2025-05-13 14:50:59 +08:00
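`moe_softmax_topk` fuses the standard MoE routing step: softmax over the router logits followed by a per-token top-k. A reference (unfused) version, assuming a `[tokens, num_experts]` logits tensor, would be:

```python
import torch

def moe_softmax_topk_ref(router_logits: torch.Tensor, top_k: int):
    """Reference routing: softmax over experts, then keep the top-k per token."""
    probs = torch.softmax(router_logits, dim=-1)            # [tokens, num_experts]
    topk_weights, topk_ids = torch.topk(probs, top_k, dim=-1)
    # Renormalize the selected weights so they sum to 1 per token.
    topk_weights = topk_weights / topk_weights.sum(dim=-1, keepdim=True)
    return topk_weights, topk_ids
```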
Ruonan Wang
f5d9c49a2a
add rotary_half_with_cache_inplaced to ipex_llm.transformers.models.common (#13143)
* update

* small fix
2025-05-09 09:20:44 +08:00
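`rotary_half_with_cache_inplaced` presumably applies LLaMA-style "rotate half" rotary embeddings in place using precomputed cos/sin caches. The unfused, out-of-place reference it replaces is roughly:

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_half(q, k, cos, sin):
    # Unfused reference of "rotate half" RoPE; the fused op updates q/k in place.
    q_embed = q * cos + rotate_half(q) * sin
    k_embed = k * cos + rotate_half(k) * sin
    return q_embed, k_embed
```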
Wang, Jian4
f2598b119e
update for bge-m3 (#13138) 2025-05-07 16:59:52 +08:00
Yishuo Wang
3a28b69202
Add qwen3 support (#13137) 2025-05-07 14:03:16 +08:00
Wang, Jian4
01bc7e9eb9
Fix 083 lm_head error (#13132)
* fix no quantize error

* update

* update style
2025-05-06 15:47:20 +08:00
Xiangyu Tian
51b41faad7
vLLM: update vLLM XPU to 0.8.3 version (#13118) 2025-04-30 14:40:53 +08:00
Wang, Jian4
16fa778e65
enable glm4v and gemma-3 on vllm 083 (#13114)
* enable glm4v and gemma-3

* update

* add qwen2.5-vl
2025-04-27 17:10:56 +08:00
Yishuo Wang
908fdb982e
small refactor and fix (#13101) 2025-04-22 14:45:31 +08:00
Ruonan Wang
2f78afcd2a
Refactor some functions to ipex_llm.transformers.models.common (#13091)
* add quantize_linear & linear_forward

* add moe_group_topk

* rotary_two_with_cache_inplaced

* fix code style

* update related models
2025-04-18 11:15:43 +08:00
Ruonan Wang
e08c6bd018
Fix several models based on sdp api change (#13075)
* fix baichuan based on sdp api change

* fix several models based on api change

* fix style
2025-04-15 11:13:12 +08:00
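The fixes above track a change in the sdp (scaled dot-product attention) helper API; the exact signature change is not shown in this log. For orientation, PyTorch's built-in fused SDPA entry point looks like this:

```python
import torch
import torch.nn.functional as F

# [batch, heads, seq, head_dim]
q = torch.randn(1, 32, 128, 64)
k = torch.randn(1, 32, 128, 64)
v = torch.randn(1, 32, 128, 64)

# Causal attention in one fused call; the backend (flash, mem-efficient, math)
# is selected automatically.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```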
Yishuo Wang
10c30cdba9
set woq_int4 as default int4 (#13021) 2025-04-14 14:10:59 +08:00
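`woq_int4` is weight-only int4 quantization. ipex-llm's packed storage and XPU kernels are not shown here; a minimal symmetric per-output-channel sketch of the numerics is:

```python
import torch

def quantize_woq_int4(w: torch.Tensor):
    """Symmetric per-output-channel int4 quantization (values in [-8, 7])."""
    scale = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-8) / 7.0
    q = torch.clamp(torch.round(w / scale), -8, 7).to(torch.int8)  # stored unpacked here
    return q, scale

def dequantize_woq_int4(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale
```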
Ruonan Wang
6693e8ab04
Deepseek kv / sdp support (#13068)
* update kv

* fix

* fix style
2025-04-11 11:26:15 +08:00
Yishuo Wang
ef852dcb4a
add audio optimization for qwen2.5-omni (#13037) 2025-04-07 17:20:26 +08:00
Yishuo Wang
300eb01d98
Add basic optimization for Qwen2.5 omni (#13022) 2025-03-28 17:21:52 +08:00
Guancheng Fu
f437b36678
Fix vllm glm edge model (#13007)
* fix done

* fix
2025-03-26 09:25:32 +08:00
Yuwen Hu
374747b492
Update bert optimization to fit higher transformers/torch version (#13006) 2025-03-25 16:12:03 +08:00
Yuwen Hu
5bdf57327d
Remove ipex import in fastchat loader (#12984) 2025-03-20 18:29:00 +08:00
Wang, Jian4
c9ecb7a113
Fix qwen nan value issue on vllm (#12971)
* add to fix qwen nan value issue

* update
2025-03-14 14:43:54 +08:00
Wang, Jian4
c8a0462507
Add vllm api_server input output log (#12962) 2025-03-12 20:58:04 +08:00
Yishuo Wang
b6f33d5c4d
optimize moonlight again (#12909) 2025-03-03 09:21:15 +08:00
Yishuo Wang
39e360fe9d
add grouped topk optimization for moonlight (#12903) 2025-02-28 13:25:56 +08:00
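Grouped top-k routing (used by DeepSeek-style MoE models such as Moonlight) first selects the best expert groups per token and only then picks the top-k experts inside those groups. A simplified reference, using the per-group maximum as the group score (the production scoring may differ), is:

```python
import torch

def grouped_topk_ref(scores, num_groups: int, topk_groups: int, top_k: int):
    """Reference group-limited routing: keep the best expert groups per token,
    mask the rest, then take the top-k experts from the surviving groups."""
    tokens, num_experts = scores.shape
    group_scores = scores.view(tokens, num_groups, -1).amax(dim=-1)    # best expert per group
    group_idx = torch.topk(group_scores, topk_groups, dim=-1).indices  # groups to keep
    group_mask = torch.zeros_like(group_scores).scatter_(1, group_idx, 1.0)
    expert_mask = group_mask.unsqueeze(-1).expand(
        tokens, num_groups, num_experts // num_groups).reshape(tokens, num_experts)
    masked = scores.masked_fill(expert_mask == 0, float("-inf"))
    return torch.topk(masked, top_k, dim=-1)
```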
Xin Qiu
e946127613
glm 4v 1st sdp for vision (#12904)
* glm4v 1st sdp

* update glm4v example

* meet code review

* fix style
2025-02-28 13:23:27 +08:00
Yishuo Wang
be1f073866
add fuse moe optimization for moonlight (#12898) 2025-02-27 09:15:24 +08:00
Yishuo Wang
5faba06409
simple optimization for moonlight moe decoding forward (#12891) 2025-02-25 16:18:27 +08:00
Yishuo Wang
ab3fc66eb7
optimize attention part of moonlight-14B-A3B (#12886) 2025-02-25 09:38:13 +08:00
Yishuo Wang
3f6ecce508
support using xgrammar to get json output (#12870) 2025-02-24 14:10:58 +08:00
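xgrammar integration drives constrained decoding through the library's own grammar matcher, whose API is not reproduced here. Conceptually, guaranteed-JSON output means masking every token the grammar disallows at each step; a generic sketch with a hypothetical `allowed_token_ids` set supplied by such a matcher:

```python
import torch

def constrained_sample(logits: torch.Tensor, allowed_token_ids: torch.Tensor) -> int:
    """Generic grammar-constrained decoding step (not xgrammar's actual API):
    mask every token the JSON grammar disallows in the current state,
    then sample only from the remainder."""
    mask = torch.full_like(logits, float("-inf"))
    mask[allowed_token_ids] = 0.0
    probs = torch.softmax(logits + mask, dim=-1)
    return int(torch.multinomial(probs, num_samples=1))
```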
Wang, Jian4
3ea5389a99
Fix vllm api_server v1/models error (#12867) 2025-02-21 11:08:29 +08:00
Wang, Jian4
348dc8056d
Fix vllm gptq awq error (#12863)
* fix gptq awq error

* fix python style
2025-02-20 16:27:23 +08:00
Guancheng Fu
4eed0c7d99
initial implementation for low_bit_loader vLLM (#12838)
* initial

* add logic for handling tensor parallel models

* fix

* Add some comments

* add doc

* fix done
2025-02-19 19:45:34 +08:00
Xiangyu Tian
b26409d53f
R1 Hybrid: Add Benchmark for DeepSeek R1 transformers example (#12854)
* init

* fix

* update

* update

* fix

* fix
2025-02-19 18:33:21 +08:00
Yishuo Wang
aee2db30f9
update sdp support (#12847) 2025-02-19 12:07:00 +08:00
Xiangyu Tian
93c10be762
LLM: Support hybrid convert for DeepSeek V3/R1 (#12834) 2025-02-19 11:31:19 +08:00
Wang, Jian4
e1809a6295
Update multimodal on vllm 0.6.6 (#12816)
* add glm4v and minicpmv example

* fix
2025-02-19 10:04:42 +08:00
Yishuo Wang
8418450300
optimize minicpm-o's tts part (#12833) 2025-02-17 14:53:37 +08:00
Wang, Jian4
1083fe5508
Reenable pp and lightweight-serving on 0.6.6 (#12814)
* reenable pp and lightweight-serving on 0.6.6

* update readme

* update

* update tag
2025-02-13 10:16:00 +08:00