Commit graph

4100 commits

Author | SHA1 | Message | Date
zxue2 | 6ba3138d7c | Fix ambiguous boolean evaluation in bert.py (#13236) | 2025-06-30 14:14:01 +08:00
    Signed-off-by: Xue, Zhan <zhan.xue@intel.com>
Guancheng Fu | 3f6d407be4 | Fix engine.py (#13215) | 2025-06-09 09:03:17 +08:00
Shaojun Liu | 5a629ae470 | update vllm patch (#13211) | 2025-06-06 17:20:45 +08:00
    Co-authored-by: gc-fu <guancheng.fu@intel.com>
Guancheng Fu | ac04992278 | Update engine.py (#13209) | 2025-06-06 15:47:33 +08:00
Ruonan Wang | dd49368e0c | only install onednn for windows when torch 2.6 (#13207) | 2025-06-05 17:28:21 +08:00
Wang, Jian4 | 5a1c1297e1 | Fix internvl fp16 error (#13205) | 2025-06-05 11:17:44 +08:00
Wang, Jian4 | 45864790f7 | Enable phi-4 with vision and audio (#13203) | 2025-06-05 10:15:20 +08:00
    * add phi4
    * update
    * enable audio
    * update and add readme
Yina Chen | e032156518 | Support torch_fp8 (#13196) | 2025-06-04 20:08:01 +08:00
    * support torch_fp8
Guancheng Fu | 3accc31b86 | Update 1ccl_for_multi_arc.patch (#13199) | 2025-05-30 17:13:59 +08:00
Guancheng Fu | bb50cd0881 | Update api_server.py (#13198) | 2025-05-30 09:26:53 +08:00
Ruonan Wang | 9df610f80d | fix trl import when not running speculative (#13187) | 2025-05-26 13:21:54 +08:00
    * fix trl import when not running speculative
    * fix style
Shaojun Liu | c5d919b151 | update vllm patch (#13185) | 2025-05-23 15:02:50 +08:00
    Co-authored-by: gc-fu <guancheng.fu@intel.com>
Xiangyu Tian | 531bef2810 | vLLM: Fix conver_to_half condition (#13177) | 2025-05-22 15:44:10 +08:00
    * fix
    * format
Wang, Jian4 | e3130a06ed | Fix multimodal errors (#13178) | 2025-05-22 15:39:27 +08:00
    * fix glm4v int4 output error
    * fix glm-4v qwen2.5-vl fp16 error
    * update
Xiangyu Tian | 154af7d7f7 | vLLM: set convert_to_half to False by default (#13172) | 2025-05-21 18:41:28 +08:00
    * init
    * remove
    * fix
Shaojun Liu | 1576347892 | Update Dockerfile (#13168) | 2025-05-20 16:41:13 +08:00
Wang, Jian4 | 66eb054988 | Update vllm patch (#13164) | 2025-05-19 16:54:21 +08:00
Wang, Jian4 | d83e5068d2 | Enable whisper (#13162) | 2025-05-19 14:07:51 +08:00
    * fix error
    * update dockerfile
Yina Chen | 8ba57b41cd | Add merge quantized qkv (#13160) | 2025-05-16 15:46:47 +08:00
    * add merge quantized qkv
    * fix style & device
    * add check
Emmanuel Ferdman | 1e4e1353a0 | Resolve messages formatting issues (#13095) | 2025-05-15 16:46:52 +08:00
    Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Kai Huang | 35b49e4d91 | Add trl version in error message (#13049) | 2025-05-15 09:16:27 +08:00
    * add version in error msg
    * fix style
Pranav Singh | bd45bf7584 | Update llama_cpp_quickstart.md (#13145) | 2025-05-15 08:40:53 +08:00
    Signed-off-by: Pranav Singh <pranav.singh@intel.com>
Shaojun Liu | bd71739e64 | Update docs and scripts to align with new Docker image release (#13156) | 2025-05-13 17:06:29 +08:00
    * Update vllm_docker_quickstart.md
    * Update start-vllm-service.sh
    * Update vllm_docker_quickstart.md
    * Update start-vllm-service.sh
Yina Chen | f6441b4e3d | Add moe_softmax_topk (#13157) | 2025-05-13 14:50:59 +08:00
    * add moe_softmax_topk
    * address comments
    * update
Yuwen Hu | aa12f69bbf | Update Ollama portable zip QuickStart regarding saving VRAM (#13155) | 2025-05-13 13:25:22 +08:00
    * Update Ollama portable zip quickstart regarding saving VRAM
    * Small fix
Jason Dai | 086a8b3ab9 | Update flashmoe_quickstart (#13154) | 2025-05-13 07:56:09 +08:00
Xiangyu Tian | 886c7632b2 | Add IPEX_LLM_FORCE_BATCH_FORWARD for vLLM docker image (#13151) | 2025-05-12 13:44:33 +08:00
Wang, Jian4 | 5df03ced2c | Update vllm patch for fix telechat2 and baichuan2 error (#13150) | 2025-05-12 10:54:22 +08:00
Jason Dai | 9da1c56fa8 | Create flashmoe quickstart (#13147) | 2025-05-12 10:11:22 +08:00
Guancheng Fu | da08c9ca60 | Update Dockerfile (#13148) | 2025-05-12 09:19:18 +08:00
Yuwen Hu | 0438e39f3e | Add PyTorch 2.6 support in Latest Update (#13144) | 2025-05-09 13:26:49 +08:00
Shaojun Liu | 45f7bf6688 | Refactor vLLM Documentation: Centralize Benchmarking and Improve Readability (#13141) | 2025-05-09 10:19:42 +08:00
    * update vllm doc
    * update image name
    * update
    * update
    * update
    * update
Ruonan Wang | f5d9c49a2a | add rotary_half_with_cache_inplaced to ipex_llm.transformers.models.common (#13143) | 2025-05-09 09:20:44 +08:00
    * update
    * small fix
Wang, Jian4 | f2598b119e | update for bge-m3 (#13138) | 2025-05-07 16:59:52 +08:00
SONG Ge | e88a2aa65b | Modify ollama num_ctx related doc (#13139) | 2025-05-07 16:44:58 +08:00
    * Modify ollama num_ctx related doc
    * meet comments
Yishuo Wang | 3a28b69202 | Add qwen3 support (#13137) | 2025-05-07 14:03:16 +08:00
Wang, Jian4 | be76918b61 | Update 083 multimodal benchmark (#13135) | 2025-05-07 09:35:09 +08:00
    * update multimodal benchmark
    * update
Wang, Jian4 | 01bc7e9eb9 | Fix 083 lm_head error (#13132) | 2025-05-06 15:47:20 +08:00
    * fix no quantize error
    * update
    * update style
SONG Ge | 685a749adb | Update ollama-release doc into v0.6.2 (#13094) | 2025-04-30 16:22:42 +08:00
    * Update ollama-release doc into v0.6.2
    * update
    * revert signature changes
Xiangyu Tian | 51b41faad7 | vLLM: update vLLM XPU to 0.8.3 version (#13118) | 2025-04-30 14:40:53 +08:00
Yuwen Hu | f66eee1d1d | Update BMG troubleshooting guides regarding PPA installation (#13119) | 2025-04-28 15:48:17 +08:00
    * Update bmg troubleshooting guides regarding PPA installation
    * Small fix
    * Update based on comments
    * Small fix
Jason Dai | ad741503a9 | Update bmg_quickstart.md (#13117) | 2025-04-27 22:03:14 +08:00
Jason Dai | 6b033f8982 | Update readme (#13116) | 2025-04-27 18:18:19 +08:00
Guancheng Fu | d222eaffd7 | Update README.md (#13113) | 2025-04-27 17:13:18 +08:00
Wang, Jian4 | 16fa778e65 | enable glm4v and gemma-3 on vllm 083 (#13114) | 2025-04-27 17:10:56 +08:00
    * enable glm4v and gemma-3
    * update
    * add qwen2.5-vl
Guancheng Fu | cf97d8f1d7 | Update start-vllm-service.sh (#13109) | 2025-04-25 15:42:15 +08:00
Ruonan Wang | 9808fb1ac2 | update doc about flash-moe (#13103) | 2025-04-24 17:53:14 +08:00
    * update doc about flashmoe
    * revert toc
    * meet review, add version note
    * small fix
Guancheng Fu | 0cfdd399e7 | Update README.md (#13104) | 2025-04-24 10:21:17 +08:00
Yishuo Wang | 908fdb982e | small refactor and fix (#13101) | 2025-04-22 14:45:31 +08:00
Guancheng Fu | 14cd613fe1 | Update vLLM docs with some new features (#13092) | 2025-04-22 14:39:28 +08:00
    * done
    * fix
    * done
    * Update README.md