Yina Chen
e032156518
Support torch_fp8 ( #13196 )
* support torch_fp8
2025-06-04 20:08:01 +08:00
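The torch_fp8 commit above only names the change; as background, fp8 paths generally follow a scale-then-cast pattern: compute a per-tensor scale from the max magnitude, then cast into the narrow format. A dependency-free sketch of that pattern (an int8-style range stands in for torch's actual `torch.float8_e4m3fn` dtype, and none of this is ipex-llm's implementation):

```python
def quantize_per_tensor(values, qmax=127.0):
    """Return quantized values plus the scale needed to recover them."""
    amax = max(abs(v) for v in values) or 1.0  # avoid div-by-zero on all-zero input
    scale = amax / qmax
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Map quantized values back to approximate originals."""
    return [q * scale for q in quantized]
```

Round-tripping `[0.0, 0.5, -1.0]` recovers the extremes exactly and mid-range values to within the quantization step.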
Guancheng Fu
3accc31b86
Update 1ccl_for_multi_arc.patch ( #13199 )
2025-05-30 17:13:59 +08:00
Guancheng Fu
bb50cd0881
Update api_server.py ( #13198 )
2025-05-30 09:26:53 +08:00
Ruonan Wang
9df610f80d
fix trl import when not running speculative ( #13187 )
* fix trl import when not running speculative
* fix style
2025-05-26 13:21:54 +08:00
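The trl fix above reflects a common pattern: import an optional dependency only on the code path that needs it, so the default path works without it installed. A minimal sketch (function name and behavior are illustrative, not ipex-llm's actual code):

```python
def generate(prompt, speculative=False):
    """Import trl lazily: only the speculative path needs it, so plain
    generation still works on machines where trl is not installed."""
    if speculative:
        try:
            import trl  # optional dependency, required only here
        except ImportError as exc:
            raise ImportError(
                "Speculative decoding requires trl: pip install trl"
            ) from exc
    # ... generation would happen here; placeholder result for the sketch ...
    return prompt.upper()
```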
Shaojun Liu
c5d919b151
update vllm patch ( #13185 )
Co-authored-by: gc-fu <guancheng.fu@intel.com>
2025-05-23 15:02:50 +08:00
Xiangyu Tian
531bef2810
vLLM: Fix convert_to_half condition ( #13177 )
* fix
* format
2025-05-22 15:44:10 +08:00
Wang, Jian4
e3130a06ed
Fix multimodal errors ( #13178 )
* fix glm4v int4 output error
* fix glm-4v and qwen2.5-vl fp16 errors
* update
2025-05-22 15:39:27 +08:00
Xiangyu Tian
154af7d7f7
vLLM: set convert_to_half to False by default ( #13172 )
* init
* remove
* fix
2025-05-21 18:41:28 +08:00
Shaojun Liu
1576347892
Update Dockerfile ( #13168 )
2025-05-20 16:41:13 +08:00
Wang, Jian4
66eb054988
Update vllm patch ( #13164 )
2025-05-19 16:54:21 +08:00
Wang, Jian4
d83e5068d2
Enable whisper ( #13162 )
* fix error
* update dockerfile
2025-05-19 14:07:51 +08:00
Yina Chen
8ba57b41cd
Add merge quantized qkv ( #13160 )
* add merge quantized qkv
* fix style & device
* add check
2025-05-16 15:46:47 +08:00
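Merging the quantized q/k/v projections amounts to concatenating their weight matrices so a single matmul replaces three kernel launches. A plain-Python sketch of the idea (real kernels operate on quantized weight blocks, which this glosses over):

```python
def matmul(x, w):
    """Naive (n, d) x (d, m) matrix multiply on nested lists."""
    return [[sum(row[k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))] for row in x]

def merge_qkv(wq, wk, wv):
    """Concatenate three (d, h) projection weights column-wise into one
    (d, 3h) matrix, so one launch produces [q | k | v]."""
    return [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]
```

Slicing the merged output into thirds gives the same q, k, v as three separate multiplies.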
Emmanuel Ferdman
1e4e1353a0
Resolve messages formatting issues ( #13095 )
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-15 16:46:52 +08:00
Kai Huang
35b49e4d91
Add trl version in error message ( #13049 )
* add version in error msg
* fix style
2025-05-15 09:16:27 +08:00
Pranav Singh
bd45bf7584
Update llama_cpp_quickstart.md ( #13145 )
Signed-off-by: Pranav Singh <pranav.singh@intel.com>
2025-05-15 08:40:53 +08:00
Shaojun Liu
bd71739e64
Update docs and scripts to align with new Docker image release ( #13156 )
* Update vllm_docker_quickstart.md
* Update start-vllm-service.sh
* Update vllm_docker_quickstart.md
* Update start-vllm-service.sh
2025-05-13 17:06:29 +08:00
Yina Chen
f6441b4e3d
Add moe_softmax_topk ( #13157 )
* add moe_softmax_topk
* address comments
* update
2025-05-13 14:50:59 +08:00
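moe_softmax_topk fuses the standard MoE gating steps: softmax over the router logits, then top-k expert selection. A plain-Python sketch of the math (the real op is a fused GPU kernel; renormalizing the k weights to sum to 1 is a common but model-dependent choice):

```python
import math

def moe_softmax_topk(router_logits, k):
    """Softmax the router logits, pick the k most probable experts,
    and renormalize their routing weights to sum to 1."""
    m = max(router_logits)                      # subtract max for stability
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    indices = sorted(range(len(probs)),
                     key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in indices]
    norm = sum(weights)
    return indices, [w / norm for w in weights]
```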
Yuwen Hu
aa12f69bbf
Update Ollama portable zip QuickStart regarding saving VRAM ( #13155 )
* Update Ollama portable zip quickstart regarding saving VRAM
* Small fix
2025-05-13 13:25:22 +08:00
Jason Dai
086a8b3ab9
Update flashmoe_quickstart ( #13154 )
2025-05-13 07:56:09 +08:00
Xiangyu Tian
886c7632b2
Add IPEX_LLM_FORCE_BATCH_FORWARD for vLLM docker image ( #13151 )
2025-05-12 13:44:33 +08:00
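Flags like IPEX_LLM_FORCE_BATCH_FORWARD are typically read once as boolean environment variables. Exactly how ipex-llm parses this one is not shown in the log, but the usual pattern looks like:

```python
import os

def env_flag(name, default="0"):
    """Interpret the common truthy spellings of a boolean env variable."""
    return os.environ.get(name, default).strip().lower() in ("1", "true", "yes", "on")

# e.g. gate a code path on the flag:
force_batch = env_flag("IPEX_LLM_FORCE_BATCH_FORWARD")
```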
Wang, Jian4
5df03ced2c
Update vllm patch to fix telechat2 and baichuan2 errors ( #13150 )
2025-05-12 10:54:22 +08:00
Jason Dai
9da1c56fa8
Create flashmoe quickstart ( #13147 )
2025-05-12 10:11:22 +08:00
Guancheng Fu
da08c9ca60
Update Dockerfile ( #13148 )
2025-05-12 09:19:18 +08:00
Yuwen Hu
0438e39f3e
Add PyTorch 2.6 support in Latest Update ( #13144 )
2025-05-09 13:26:49 +08:00
Shaojun Liu
45f7bf6688
Refactor vLLM Documentation: Centralize Benchmarking and Improve Readability ( #13141 )
* update vllm doc
* update image name
* update
* update
* update
* update
2025-05-09 10:19:42 +08:00
Ruonan Wang
f5d9c49a2a
add rotary_half_with_cache_inplaced to ipex_llm.transformers.models.common ( #13143 )
* update
* small fix
2025-05-09 09:20:44 +08:00
Wang, Jian4
f2598b119e
update for bge-m3 ( #13138 )
2025-05-07 16:59:52 +08:00
SONG Ge
e88a2aa65b
Modify ollama num_ctx related doc ( #13139 )
* Modify ollama num_ctx related doc
* meet comments
2025-05-07 16:44:58 +08:00
Yishuo Wang
3a28b69202
Add qwen3 support ( #13137 )
2025-05-07 14:03:16 +08:00
Wang, Jian4
be76918b61
Update 083 multimodal benchmark ( #13135 )
* update multimodal benchmark
* update
2025-05-07 09:35:09 +08:00
Wang, Jian4
01bc7e9eb9
Fix 083 lm_head error ( #13132 )
* fix no quantize error
* update
* update style
2025-05-06 15:47:20 +08:00
SONG Ge
685a749adb
Update ollama-release doc to v0.6.2 ( #13094 )
* Update ollama-release doc to v0.6.2
* update
* revert signature changes
2025-04-30 16:22:42 +08:00
Xiangyu Tian
51b41faad7
vLLM: update vLLM XPU to 0.8.3 version ( #13118 )
2025-04-30 14:40:53 +08:00
Yuwen Hu
f66eee1d1d
Update BMG troubleshooting guides regarding PPA installation ( #13119 )
* Update bmg troubleshooting guides regarding PPA installation
* Small fix
* Update based on comments
* Small fix
2025-04-28 15:48:17 +08:00
Jason Dai
ad741503a9
Update bmg_quickstart.md ( #13117 )
2025-04-27 22:03:14 +08:00
Jason Dai
6b033f8982
Update readme ( #13116 )
2025-04-27 18:18:19 +08:00
Guancheng Fu
d222eaffd7
Update README.md ( #13113 )
2025-04-27 17:13:18 +08:00
Wang, Jian4
16fa778e65
enable glm4v and gemma-3 on vllm 083 ( #13114 )
* enable glm4v and gemma-3
* update
* add qwen2.5-vl
2025-04-27 17:10:56 +08:00
Guancheng Fu
cf97d8f1d7
Update start-vllm-service.sh ( #13109 )
2025-04-25 15:42:15 +08:00
Ruonan Wang
9808fb1ac2
update doc about flash-moe ( #13103 )
* update doc about flashmoe
* revert toc
* meet review, add version note
* small fix
2025-04-24 17:53:14 +08:00
Guancheng Fu
0cfdd399e7
Update README.md ( #13104 )
2025-04-24 10:21:17 +08:00
Yishuo Wang
908fdb982e
small refactor and fix ( #13101 )
2025-04-22 14:45:31 +08:00
Guancheng Fu
14cd613fe1
Update vLLM docs with some new features ( #13092 )
* done
* fix
* done
* Update README.md
2025-04-22 14:39:28 +08:00
Yuwen Hu
0801d27a6f
Remove PyTorch 2.3 support for Intel GPU ( #13097 )
* Remove PyTorch 2.3 installation option for GPU
* Remove xpu_lnl option in installation guides for docs
* Update BMG quickstart
* Remove PyTorch 2.3 dependencies for GPU examples
* Update the graphmode example to use stable version 2.2.0
* Fix based on comments
2025-04-22 10:26:16 +08:00
Yina Chen
a2a35fdfad
Update portable zip link ( #13098 )
* update portable zip link
* update CN
* address comments
* update latest updates
* revert
2025-04-21 17:25:35 +08:00
Ruonan Wang
2f78afcd2a
Refactor some functions to ipex_llm.transformers.models.common ( #13091 )
* add quantize_linear & linear_forward
* add moe_group_topk
* rotary_two_with_cache_inplaced
* fix code style
* update related models
2025-04-18 11:15:43 +08:00
Shaojun Liu
73198d5b80
Update to b17 image ( #13085 )
* update vllm patch
* fix
* fix triton
---------
Co-authored-by: gc-fu <guancheng.fu@intel.com>
2025-04-17 16:18:22 +08:00
Shaojun Liu
db5edba786
Update Dockerfile ( #13081 )
2025-04-16 09:18:46 +08:00
Shaojun Liu
fa56212bb3
Update vLLM patch ( #13079 )
* update vllm patch
* Update Dockerfile
2025-04-15 16:55:29 +08:00
Shaojun Liu
f5aaa83649
Update serving-xpu Dockerfile ( #13077 )
* Update Dockerfile
* Update Dockerfile
2025-04-15 13:34:14 +08:00