Commit graph (4109 commits)

Author / SHA1 / Message / Date
Shaojun Liu
25e1709050
Avoid errors caused by a Transformers version that is too new (#13291) 2025-08-14 14:52:47 +08:00
Shaojun Liu
cac90a9238
update patches (#13290)
Signed-off-by: liu-shaojun <shaojun.liu@intel.com>
2025-08-14 10:15:48 +08:00
Yina Chen
9cfdf143a2
delete the deprecated llm win test (#13275) 2025-08-01 11:27:46 +08:00
Qiyuan Gong
891e1f511b
[Doc] Add note about avoiding sourcing oneAPI for flashmoe and llama.cpp portable zip (#13274)
* Add note about avoiding sourcing oneAPI
* Move note ahead of cli
2025-07-30 13:58:52 +08:00
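The note added in #13274 warns users not to source the oneAPI environment before running the flashmoe / llama.cpp portable zips, since the zips bundle their own runtime libraries. A minimal sketch of such a guard, assuming only that `setvars.sh` exports `ONEAPI_ROOT` (the check below is illustrative, not the project's actual launcher code):

```python
import os
import sys

def check_clean_environment(env=None):
    """Return True when no oneAPI environment is active in `env`."""
    if env is None:
        env = os.environ
    if env.get("ONEAPI_ROOT"):
        # setvars.sh exports ONEAPI_ROOT; its libraries could shadow
        # the ones bundled inside the portable zip.
        print("oneAPI environment detected; start from a clean shell "
              "before running the portable zip", file=sys.stderr)
        return False
    return True
```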
SheldonChen
951c23739d
update quickstart md related to llama.cpp/ollama (#13265)
* update quickstart md related to llama.cpp/ollama

* update troubleshooting

* update quickstart/troubleshooting according to RuonanWang's comments
2025-07-21 16:20:20 +08:00
Emmanuel Ferdman
68c5103a0a
[NPU] Update quickstart reference (#13262)
Fix the wrong QuickStart URLs in NPU `Save-Load/README.md`.

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-07-21 09:55:40 +08:00
Jason Dai
b229e5ad60
Update README.md (#13258) 2025-07-18 07:27:01 +08:00
Yina Chen
f0b600da77
update llama.cpp version (#13251)
* update llama.cpp version
2025-07-09 17:30:27 +08:00
Ruonan Wang
28f72123bd
update ollama version (#13244) 2025-07-01 09:20:46 +08:00
zxue2
6ba3138d7c
Fix ambiguous boolean evaluation in bert.py (#13236)
Signed-off-by: Xue, Zhan <zhan.xue@intel.com>
2025-06-30 14:14:01 +08:00
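The "ambiguous boolean evaluation" fixed in #13236 is a common PyTorch pitfall: using a multi-element tensor in boolean context raises at runtime. A minimal stdlib stand-in reproducing the bug class and the usual fix (an explicit `None` check); the class and function names are illustrative, not the actual `bert.py` code:

```python
class FakeTensor:
    """Mimics PyTorch's behavior when a tensor is used as a bool."""
    def __init__(self, values):
        self.values = list(values)

    def __bool__(self):
        if len(self.values) != 1:
            raise RuntimeError("Boolean value of Tensor with more than "
                               "one element is ambiguous")
        return bool(self.values[0])

def mask_is_present_buggy(mask):
    return bool(mask)          # raises on multi-element tensors

def mask_is_present_fixed(mask):
    return mask is not None    # explicit check, no element-wise ambiguity
```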
Guancheng Fu
3f6d407be4
Fix engine.py (#13215) 2025-06-09 09:03:17 +08:00
Shaojun Liu
5a629ae470
update vllm patch (#13211)
Co-authored-by: gc-fu <guancheng.fu@intel.com>
2025-06-06 17:20:45 +08:00
Guancheng Fu
ac04992278
Update engine.py (#13209) 2025-06-06 15:47:33 +08:00
Ruonan Wang
dd49368e0c
only install onednn for windows when torch 2.6 (#13207) 2025-06-05 17:28:21 +08:00
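The condition described in #13207 (install onednn on Windows only when torch is 2.6) can be sketched as a version gate like the following; the helper name and parsing are assumptions for illustration, not the project's actual setup code:

```python
import platform

def should_install_onednn(torch_version, system=None):
    """Gate onednn installation on Windows + torch 2.6.x."""
    if system is None:
        system = platform.system()
    # "2.6.0+xpu" -> (2, 6); drop any local version tag after "+"
    major, minor = (int(p) for p in
                    torch_version.split("+")[0].split(".")[:2])
    return system == "Windows" and (major, minor) == (2, 6)
```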
Wang, Jian4
5a1c1297e1
Fix internvl fp16 error (#13205) 2025-06-05 11:17:44 +08:00
Wang, Jian4
45864790f7
Enable phi-4 with vision and audio (#13203)
* add phi4

* update

* enable audio

* update and add readme
2025-06-05 10:15:20 +08:00
Yina Chen
e032156518
Support torch_fp8 (#13196)
* support torch_fp8
2025-06-04 20:08:01 +08:00
Guancheng Fu
3accc31b86
Update 1ccl_for_multi_arc.patch (#13199) 2025-05-30 17:13:59 +08:00
Guancheng Fu
bb50cd0881
Update api_server.py (#13198) 2025-05-30 09:26:53 +08:00
Ruonan Wang
9df610f80d
fix trl import when not running speculative (#13187)
* fix trl import when not running speculative

* fix style
2025-05-26 13:21:54 +08:00
Shaojun Liu
c5d919b151
update vllm patch (#13185)
Co-authored-by: gc-fu <guancheng.fu@intel.com>
2025-05-23 15:02:50 +08:00
Xiangyu Tian
531bef2810
vLLM: Fix convert_to_half condition (#13177)
* fix

* format
2025-05-22 15:44:10 +08:00
Wang, Jian4
e3130a06ed
Fix multimodal errors (#13178)
* fix glm4v int4 output error

* fix glm-4v qwen2.5-vl fp16 error

* update
2025-05-22 15:39:27 +08:00
Xiangyu Tian
154af7d7f7
vLLM: set convert_to_half to False by default (#13172)
* init

* remove

* fix
2025-05-21 18:41:28 +08:00
Shaojun Liu
1576347892
Update Dockerfile (#13168) 2025-05-20 16:41:13 +08:00
Wang, Jian4
66eb054988
Update vllm patch (#13164) 2025-05-19 16:54:21 +08:00
Wang, Jian4
d83e5068d2
Enable whisper (#13162)
* fix error

* update dockerfile
2025-05-19 14:07:51 +08:00
Yina Chen
8ba57b41cd
Add merge quantized qkv (#13160)
* add merge quantized qkv

* fix style & device

* add check
2025-05-16 15:46:47 +08:00
Emmanuel Ferdman
1e4e1353a0
Resolve messages formatting issues (#13095)
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-15 16:46:52 +08:00
Kai Huang
35b49e4d91
Add trl version in error message (#13049)
* add version in error msg

* fix style
2025-05-15 09:16:27 +08:00
Pranav Singh
bd45bf7584
Update llama_cpp_quickstart.md (#13145)
Signed-off-by: Pranav Singh <pranav.singh@intel.com>
2025-05-15 08:40:53 +08:00
Shaojun Liu
bd71739e64
Update docs and scripts to align with new Docker image release (#13156)
* Update vllm_docker_quickstart.md

* Update start-vllm-service.sh

* Update vllm_docker_quickstart.md

* Update start-vllm-service.sh
2025-05-13 17:06:29 +08:00
Yina Chen
f6441b4e3d
Add moe_softmax_topk (#13157)
* add moe_softmax_topk

* address comments

* update
2025-05-13 14:50:59 +08:00
Yuwen Hu
aa12f69bbf
Update Ollama portable zip QuickStart regarding saving VRAM (#13155)
* Update Ollama portable zip quickstart regarding saving VRAM

* Small fix
2025-05-13 13:25:22 +08:00
Jason Dai
086a8b3ab9
Update flashmoe_quickstart (#13154) 2025-05-13 07:56:09 +08:00
Xiangyu Tian
886c7632b2
Add IPEX_LLM_FORCE_BATCH_FORWARD for vLLM docker image (#13151) 2025-05-12 13:44:33 +08:00
Wang, Jian4
5df03ced2c
Update vllm patch to fix telechat2 and baichuan2 error (#13150) 2025-05-12 10:54:22 +08:00
Jason Dai
9da1c56fa8
Create flashmoe quickstart (#13147) 2025-05-12 10:11:22 +08:00
Guancheng Fu
da08c9ca60
Update Dockerfile (#13148) 2025-05-12 09:19:18 +08:00
Yuwen Hu
0438e39f3e
Add PyTorch 2.6 support in Latest Update (#13144) 2025-05-09 13:26:49 +08:00
Shaojun Liu
45f7bf6688
Refactor vLLM Documentation: Centralize Benchmarking and Improve Readability (#13141)
* update vllm doc

* update image name

* update

* update

* update

* update
2025-05-09 10:19:42 +08:00
Ruonan Wang
f5d9c49a2a
add rotary_half_with_cache_inplaced to ipex_llm.transformers.models.common (#13143)
* update

* small fix
2025-05-09 09:20:44 +08:00
Wang, Jian4
f2598b119e
update for bge-m3 (#13138) 2025-05-07 16:59:52 +08:00
SONG Ge
e88a2aa65b
Modify ollama num_ctx related doc (#13139)
* Modify ollama num_ctx related doc

* meet comments
2025-05-07 16:44:58 +08:00
Yishuo Wang
3a28b69202
Add qwen3 support (#13137) 2025-05-07 14:03:16 +08:00
Wang, Jian4
be76918b61
Update 083 multimodal benchmark (#13135)
* update multimodal benchmark

* update
2025-05-07 09:35:09 +08:00
Wang, Jian4
01bc7e9eb9
Fix 083 lm_head error (#13132)
* fix no quantize error

* update

* update style
2025-05-06 15:47:20 +08:00
SONG Ge
685a749adb
Update ollama-release doc into v0.6.2 (#13094)
* Update ollama-release doc into v0.6.2

* update

* revert signature changes
2025-04-30 16:22:42 +08:00
Xiangyu Tian
51b41faad7
vLLM: update vLLM XPU to 0.8.3 version (#13118)
2025-04-30 14:40:53 +08:00
Yuwen Hu
f66eee1d1d
Update BMG troubleshooting guides regarding PPA installation (#13119)
* Update bmg troubleshooting guides regarding PPA installation

* Small fix

* Update based on comments

* Small fix
2025-04-28 15:48:17 +08:00