Shaojun Liu | 25e1709050 | 2025-08-14 14:52:47 +08:00
    To avoid errors caused by a Transformers version that is too new. (#13291)

Shaojun Liu | cac90a9238 | 2025-08-14 10:15:48 +08:00
    update patches (#13290)
    Signed-off-by: liu-shaojun <shaojun.liu@intel.com>

Yina Chen | 9cfdf143a2 | 2025-08-01 11:27:46 +08:00
    delete the deprecated llm win test (#13275)

Qiyuan Gong | 891e1f511b | 2025-07-30 13:58:52 +08:00
    [Doc] Add note about avoiding sourcing oneAPI for flashmoe and llama.cpp portable zip (#13274)
    * Add note about avoiding sourcing oneAPI
    * Move note ahead of cli

SheldonChen | 951c23739d | 2025-07-21 16:20:20 +08:00
    update quickstart md related to llama.cpp/ollama (#13265)
    * update quickstart md related to llama.cpp/ollama
    * update troubleshooting
    * update quickstart/troubleshooting according to RuonanWang's comments

Emmanuel Ferdman | 68c5103a0a | 2025-07-21 09:55:40 +08:00
    [NPU] Update quickstart reference (#13262)
    Fix the wrong QuickStart URLs in NPU `Save-Load/README.md`.
    Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

Jason Dai | b229e5ad60 | 2025-07-18 07:27:01 +08:00
    Update README.md (#13258)

Yina Chen | f0b600da77 | 2025-07-09 17:30:27 +08:00
    update llama.cpp version (#13251)

Ruonan Wang | 28f72123bd | 2025-07-01 09:20:46 +08:00
    update ollama version (#13244)

zxue2 | 6ba3138d7c | 2025-06-30 14:14:01 +08:00
    Fix ambiguous boolean evaluation in bert.py (#13236)
    Signed-off-by: Xue, Zhan <zhan.xue@intel.com>

Guancheng Fu | 3f6d407be4 | 2025-06-09 09:03:17 +08:00
    Fix engine.py (#13215)

Shaojun Liu | 5a629ae470 | 2025-06-06 17:20:45 +08:00
    update vllm patch (#13211)
    Co-authored-by: gc-fu <guancheng.fu@intel.com>

Guancheng Fu | ac04992278 | 2025-06-06 15:47:33 +08:00
    Update engine.py (#13209)

Ruonan Wang | dd49368e0c | 2025-06-05 17:28:21 +08:00
    only install onednn for windows when torch 2.6 (#13207)

Wang, Jian4 | 5a1c1297e1 | 2025-06-05 11:17:44 +08:00
    Fix internvl fp16 error (#13205)

Wang, Jian4 | 45864790f7 | 2025-06-05 10:15:20 +08:00
    Enable phi-4 with vision and audio (#13203)
    * add phi4
    * update
    * enable audio
    * update and add readme

Yina Chen | e032156518 | 2025-06-04 20:08:01 +08:00
    Support torch_fp8 (#13196)

Guancheng Fu | 3accc31b86 | 2025-05-30 17:13:59 +08:00
    Update 1ccl_for_multi_arc.patch (#13199)

Guancheng Fu | bb50cd0881 | 2025-05-30 09:26:53 +08:00
    Update api_server.py (#13198)

Ruonan Wang | 9df610f80d | 2025-05-26 13:21:54 +08:00
    fix trl import when not running speculative (#13187)
    * fix trl import when not running speculative
    * fix style

Shaojun Liu | c5d919b151 | 2025-05-23 15:02:50 +08:00
    update vllm patch (#13185)
    Co-authored-by: gc-fu <guancheng.fu@intel.com>

Xiangyu Tian | 531bef2810 | 2025-05-22 15:44:10 +08:00
    vLLM: Fix conver_to_half condition (#13177)
    * fix
    * format

Wang, Jian4 | e3130a06ed | 2025-05-22 15:39:27 +08:00
    Fix multimodal errors (#13178)
    * fix glm4v int4 output error
    * fix glm-4v qwen2.5-vl fp16 error
    * update

Xiangyu Tian | 154af7d7f7 | 2025-05-21 18:41:28 +08:00
    vLLM: set convert_to_half to False by default (#13172)
    * init
    * remove
    * fix

Shaojun Liu | 1576347892 | 2025-05-20 16:41:13 +08:00
    Update Dockerfile (#13168)

Wang, Jian4 | 66eb054988 | 2025-05-19 16:54:21 +08:00
    Update vllm patch (#13164)

Wang, Jian4 | d83e5068d2 | 2025-05-19 14:07:51 +08:00
    Enable whisper (#13162)
    * fix error
    * update dockerfile

Yina Chen | 8ba57b41cd | 2025-05-16 15:46:47 +08:00
    Add merge quantized qkv (#13160)
    * add merge quantized qkv
    * fix style & device
    * add check

Emmanuel Ferdman | 1e4e1353a0 | 2025-05-15 16:46:52 +08:00
    Resolve messages formatting issues (#13095)
    Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

Kai Huang | 35b49e4d91 | 2025-05-15 09:16:27 +08:00
    Add trl version in error message (#13049)
    * add version in error msg
    * fix style

Pranav Singh | bd45bf7584 | 2025-05-15 08:40:53 +08:00
    Update llama_cpp_quickstart.md (#13145)
    Signed-off-by: Pranav Singh <pranav.singh@intel.com>

Shaojun Liu | bd71739e64 | 2025-05-13 17:06:29 +08:00
    Update docs and scripts to align with new Docker image release (#13156)
    * Update vllm_docker_quickstart.md
    * Update start-vllm-service.sh
    * Update vllm_docker_quickstart.md
    * Update start-vllm-service.sh

Yina Chen | f6441b4e3d | 2025-05-13 14:50:59 +08:00
    Add moe_softmax_topk (#13157)
    * add moe_softmax_topk
    * address comments
    * update

Yuwen Hu | aa12f69bbf | 2025-05-13 13:25:22 +08:00
    Update Ollama portable zip QuickStart regarding saving VRAM (#13155)
    * Update Ollama portable zip quickstart regarding saving VRAM
    * Small fix

Jason Dai | 086a8b3ab9 | 2025-05-13 07:56:09 +08:00
    Update flashmoe_quickstart (#13154)

Xiangyu Tian | 886c7632b2 | 2025-05-12 13:44:33 +08:00
    Add IPEX_LLM_FORCE_BATCH_FORWARD for vLLM docker image (#13151)

Wang, Jian4 | 5df03ced2c | 2025-05-12 10:54:22 +08:00
    Update vllm patch for fix telechat2 and baichuan2 error (#13150)

Jason Dai | 9da1c56fa8 | 2025-05-12 10:11:22 +08:00
    Create flashmoe quickstart (#13147)

Guancheng Fu | da08c9ca60 | 2025-05-12 09:19:18 +08:00
    Update Dockerfile (#13148)

Yuwen Hu | 0438e39f3e | 2025-05-09 13:26:49 +08:00
    Add PyTorch 2.6 support in Latest Update (#13144)

Shaojun Liu | 45f7bf6688 | 2025-05-09 10:19:42 +08:00
    Refactor vLLM Documentation: Centralize Benchmarking and Improve Readability (#13141)
    * update vllm doc
    * update image name
    * update

Ruonan Wang | f5d9c49a2a | 2025-05-09 09:20:44 +08:00
    add rotary_half_with_cache_inplaced to ipex_llm.transformers.models.common (#13143)
    * update
    * small fix

Wang, Jian4 | f2598b119e | 2025-05-07 16:59:52 +08:00
    update for bge-m3 (#13138)

SONG Ge | e88a2aa65b | 2025-05-07 16:44:58 +08:00
    Modify ollama num_ctx related doc (#13139)
    * Modify ollama num_ctx related doc
    * meet comments

Yishuo Wang | 3a28b69202 | 2025-05-07 14:03:16 +08:00
    Add qwen3 support (#13137)

Wang, Jian4 | be76918b61 | 2025-05-07 09:35:09 +08:00
    Update 083 multimodal benchmark (#13135)
    * update multimodal benchmark
    * update

Wang, Jian4 | 01bc7e9eb9 | 2025-05-06 15:47:20 +08:00
    Fix 083 lm_head error (#13132)
    * fix no quantize error
    * update
    * update style

SONG Ge | 685a749adb | 2025-04-30 16:22:42 +08:00
    Update ollama-release doc into v0.6.2 (#13094)
    * Update ollama-release doc into v0.6.2
    * update
    * revert signature changes

Xiangyu Tian | 51b41faad7 | 2025-04-30 14:40:53 +08:00
    vLLM: update vLLM XPU to 0.8.3 version (#13118)

Yuwen Hu | f66eee1d1d | 2025-04-28 15:48:17 +08:00
    Update BMG troubleshooting guides regarding PPA installation (#13119)
    * Update bmg troubleshooting guides regarding PPA installation
    * Small fix
    * Update based on comments
    * Small fix