Author | Commit | Message | Date
zxue2 | 6ba3138d7c | Fix ambiguous boolean evaluation in bert.py (#13236) | 2025-06-30 14:14:01 +08:00
    Signed-off-by: Xue, Zhan <zhan.xue@intel.com>
Guancheng Fu | 3f6d407be4 | Fix engine.py (#13215) | 2025-06-09 09:03:17 +08:00
Guancheng Fu | ac04992278 | Update engine.py (#13209) | 2025-06-06 15:47:33 +08:00
Wang, Jian4 | 5a1c1297e1 | Fix internvl fp16 error (#13205) | 2025-06-05 11:17:44 +08:00
Wang, Jian4 | 45864790f7 | Enable phi-4 with vision and audio (#13203) | 2025-06-05 10:15:20 +08:00
    * add phi4
    * update
    * enable audio
    * update and add readme
Yina Chen | e032156518 | Support torch_fp8 (#13196) | 2025-06-04 20:08:01 +08:00
    * support torch_fp8
Guancheng Fu | bb50cd0881 | Update api_server.py (#13198) | 2025-05-30 09:26:53 +08:00
Ruonan Wang | 9df610f80d | fix trl import when not running speculative (#13187) | 2025-05-26 13:21:54 +08:00
    * fix trl import when not running speculative
    * fix style
Xiangyu Tian | 531bef2810 | vLLM: Fix conver_to_half condition (#13177) | 2025-05-22 15:44:10 +08:00
    * fix
    * format
Wang, Jian4 | e3130a06ed | Fix multimodal errors (#13178) | 2025-05-22 15:39:27 +08:00
    * fix glm4v int4 output error
    * fix glm-4v qwen2.5-vl fp16 error
    * update
Xiangyu Tian | 154af7d7f7 | vLLM: set convert_to_half to False by default (#13172) | 2025-05-21 18:41:28 +08:00
    * init
    * remove
    * fix
Wang, Jian4 | d83e5068d2 | Enable whisper (#13162) | 2025-05-19 14:07:51 +08:00
    * fix error
    * update dockerfile
Yina Chen | 8ba57b41cd | Add merge quantized qkv (#13160) | 2025-05-16 15:46:47 +08:00
    * add merge quantized qkv
    * fix style & device
    * add check
Emmanuel Ferdman | 1e4e1353a0 | Resolve messages formatting issues (#13095) | 2025-05-15 16:46:52 +08:00
    Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Kai Huang | 35b49e4d91 | Add trl version in error message (#13049) | 2025-05-15 09:16:27 +08:00
    * add version in error msg
    * fix style
Yina Chen | f6441b4e3d | Add moe_softmax_topk (#13157) | 2025-05-13 14:50:59 +08:00
    * add moe_softmax_topk
    * address comments
    * update
Ruonan Wang | f5d9c49a2a | add rotary_half_with_cache_inplaced to ipex_llm.transformers.models.common (#13143) | 2025-05-09 09:20:44 +08:00
    * update
    * small fix
Wang, Jian4 | f2598b119e | update for bge-m3 (#13138) | 2025-05-07 16:59:52 +08:00
Yishuo Wang | 3a28b69202 | Add qwen3 support (#13137) | 2025-05-07 14:03:16 +08:00
Wang, Jian4 | 01bc7e9eb9 | Fix 083 lm_head error (#13132) | 2025-05-06 15:47:20 +08:00
    * fix no quantize error
    * update
    * update style
Xiangyu Tian | 51b41faad7 | vLLM: update vLLM XPU to 0.8.3 version (#13118) | 2025-04-30 14:40:53 +08:00
Wang, Jian4 | 16fa778e65 | enable glm4v and gemma-3 on vllm 083 (#13114) | 2025-04-27 17:10:56 +08:00
    * enable glm4v and gemma-3
    * update
    * add qwen2.5-vl
Yishuo Wang | 908fdb982e | small refactor and fix (#13101) | 2025-04-22 14:45:31 +08:00
Ruonan Wang | 2f78afcd2a | Refactor some functions to ipex_llm.transformers.models.common (#13091) | 2025-04-18 11:15:43 +08:00
    * add quantize_linear & linear_forward
    * add moe_group_topk
    * rotary_two_with_cache_inplaced
    * fix code style
    * update related models
Ruonan Wang | e08c6bd018 | Fix several models based on sdp api change (#13075) | 2025-04-15 11:13:12 +08:00
    * fix baichuan based on sdp api change
    * fix several models based on api change
    * fix style
Yishuo Wang | 10c30cdba9 | set woq_int4 as default int4 (#13021) | 2025-04-14 14:10:59 +08:00
Ruonan Wang | 6693e8ab04 | Deepseek kv / sdp support (#13068) | 2025-04-11 11:26:15 +08:00
    * update kv
    * fix
    * fix style
Yishuo Wang | ef852dcb4a | add audio optimization for qwen2.5-omni (#13037) | 2025-04-07 17:20:26 +08:00
Yishuo Wang | 300eb01d98 | Add basic optimization for Qwen2.5 omni (#13022) | 2025-03-28 17:21:52 +08:00
Guancheng Fu | f437b36678 | Fix vllm glm edge model (#13007) | 2025-03-26 09:25:32 +08:00
    * fix done
    * fix
Yuwen Hu | 374747b492 | Update bert optimization to fit higher transformers/torch version (#13006) | 2025-03-25 16:12:03 +08:00
Yuwen Hu | 5bdf57327d | Remove ipex import in fastchat loader (#12984) | 2025-03-20 18:29:00 +08:00
Wang, Jian4 | c9ecb7a113 | Fix qwen nan value issue on vllm (#12971) | 2025-03-14 14:43:54 +08:00
    * add to fix qwen nan value issue
    * update
Wang, Jian4 | c8a0462507 | Add vllm api_server input output log (#12962) | 2025-03-12 20:58:04 +08:00
Yishuo Wang | b6f33d5c4d | optimize moonlight again (#12909) | 2025-03-03 09:21:15 +08:00
Yishuo Wang | 39e360fe9d | add grouped topk optimization for moonlight (#12903) | 2025-02-28 13:25:56 +08:00
Xin Qiu | e946127613 | glm 4v 1st sdp for vision (#12904) | 2025-02-28 13:23:27 +08:00
    * glm4v 1st sdp
    * update glm4v example
    * meet code review
    * fix style
Yishuo Wang | be1f073866 | add fuse moe optimization for moonlight (#12898) | 2025-02-27 09:15:24 +08:00
Yishuo Wang | 5faba06409 | simple optimization for moonlight moe decoding forward (#12891) | 2025-02-25 16:18:27 +08:00
Yishuo Wang | ab3fc66eb7 | optimize attention part of moonlight-14B-A3B (#12886) | 2025-02-25 09:38:13 +08:00
Yishuo Wang | 3f6ecce508 | support using xgrammar to get json output (#12870) | 2025-02-24 14:10:58 +08:00
Wang, Jian4 | 3ea5389a99 | Fix vllm api_server v1/models error (#12867) | 2025-02-21 11:08:29 +08:00
Wang, Jian4 | 348dc8056d | Fix vllm gptq awq error (#12863) | 2025-02-20 16:27:23 +08:00
    * fix gptq awq error
    * fix python style
Guancheng Fu | 4eed0c7d99 | initial implementation for low_bit_loader vLLM (#12838) | 2025-02-19 19:45:34 +08:00
    * initial
    * add logic for handling tensor parallel models
    * fix
    * Add some comments
    * add doc
    * fix done
Xiangyu Tian | b26409d53f | R1 Hybrid: Add Benchmark for DeepSeek R1 transformers example (#12854) | 2025-02-19 18:33:21 +08:00
    * init
    * fix
    * update
    * update
    * fix
    * fix
Yishuo Wang | aee2db30f9 | update sdp support (#12847) | 2025-02-19 12:07:00 +08:00
Xiangyu Tian | 93c10be762 | LLM: Support hybrid convert for DeepSeek V3/R1 (#12834) | 2025-02-19 11:31:19 +08:00
Wang, Jian4 | e1809a6295 | Update multimodal on vllm 0.6.6 (#12816) | 2025-02-19 10:04:42 +08:00
    * add glm4v and minicpmv example
    * fix
Yishuo Wang | 8418450300 | optimize minicpm-o's tts part (#12833) | 2025-02-17 14:53:37 +08:00
Wang, Jian4 | 1083fe5508 | Reenable pp and lightweight-serving serving on 0.6.6 (#12814) | 2025-02-13 10:16:00 +08:00
    * reenable pp ang lightweight serving on 066
    * update readme
    * updat
    * update tag