Yina Chen
e032156518
Support torch_fp8 ( #13196 )
* support torch_fp8
2025-06-04 20:08:01 +08:00
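The torch_fp8 commit above only names the change; as background, fp8 paths generally follow a scale-then-cast pattern: compute a per-tensor scale from the max magnitude, then cast into the narrow format. A dependency-free sketch of that pattern (an int8-style range stands in for torch's actual `torch.float8_e4m3fn` dtype, and none of this is ipex-llm's implementation):

```python
def quantize_per_tensor(values, qmax=127.0):
    """Return quantized values plus the scale needed to recover them."""
    amax = max(abs(v) for v in values) or 1.0  # avoid div-by-zero on all-zero input
    scale = amax / qmax
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Map quantized values back to approximate originals."""
    return [q * scale for q in quantized]
```

Round-tripping `[0.0, 0.5, -1.0]` recovers the extremes exactly and mid-range values to within the quantization step.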
Guancheng Fu
3accc31b86
Update 1ccl_for_multi_arc.patch ( #13199 )
2025-05-30 17:13:59 +08:00
Guancheng Fu
bb50cd0881
Update api_server.py ( #13198 )
2025-05-30 09:26:53 +08:00
Ruonan Wang
9df610f80d
fix trl import when not running speculative ( #13187 )
* fix trl import when not running speculative
* fix style
2025-05-26 13:21:54 +08:00
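The trl fix above reflects a common pattern: import an optional dependency only on the code path that needs it, so the default path works without it installed. A minimal sketch (function name and behavior are illustrative, not ipex-llm's actual code):

```python
def generate(prompt, speculative=False):
    """Import trl lazily: only the speculative path needs it, so plain
    generation still works on machines where trl is not installed."""
    if speculative:
        try:
            import trl  # optional dependency, required only here
        except ImportError as exc:
            raise ImportError(
                "Speculative decoding requires trl: pip install trl"
            ) from exc
    # ... generation would happen here; placeholder result for the sketch ...
    return prompt.upper()
```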
Shaojun Liu
c5d919b151
update vllm patch ( #13185 )
Co-authored-by: gc-fu <guancheng.fu@intel.com>
2025-05-23 15:02:50 +08:00
Xiangyu Tian
531bef2810
vLLM: Fix convert_to_half condition ( #13177 )
* fix
* format
2025-05-22 15:44:10 +08:00
Wang, Jian4
e3130a06ed
Fix multimodal errors ( #13178 )
* fix glm4v int4 output error
* fix glm-4v and qwen2.5-vl fp16 errors
* update
2025-05-22 15:39:27 +08:00
Xiangyu Tian
154af7d7f7
vLLM: set convert_to_half to False by default ( #13172 )
* init
* remove
* fix
2025-05-21 18:41:28 +08:00
Shaojun Liu
1576347892
Update Dockerfile ( #13168 )
2025-05-20 16:41:13 +08:00
Wang, Jian4
66eb054988
Update vllm patch ( #13164 )
2025-05-19 16:54:21 +08:00
Wang, Jian4
d83e5068d2
Enable whisper ( #13162 )
* fix error
* update dockerfile
2025-05-19 14:07:51 +08:00
Yina Chen
8ba57b41cd
Add merge quantized qkv ( #13160 )
* add merge quantized qkv
* fix style & device
* add check
2025-05-16 15:46:47 +08:00
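Merging the quantized q/k/v projections amounts to concatenating their weight matrices so a single matmul replaces three kernel launches. A plain-Python sketch of the idea (real kernels operate on quantized weight blocks, which this glosses over):

```python
def matmul(x, w):
    """Naive (n, d) x (d, m) matrix multiply on nested lists."""
    return [[sum(row[k] * w[k][j] for k in range(len(w)))
             for j in range(len(w[0]))] for row in x]

def merge_qkv(wq, wk, wv):
    """Concatenate three (d, h) projection weights column-wise into one
    (d, 3h) matrix, so one launch produces [q | k | v]."""
    return [rq + rk + rv for rq, rk, rv in zip(wq, wk, wv)]
```

Slicing the merged output into thirds gives the same q, k, v as three separate multiplies.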
Emmanuel Ferdman
1e4e1353a0
Resolve messages formatting issues ( #13095 )
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
2025-05-15 16:46:52 +08:00
Kai Huang
35b49e4d91
Add trl version in error message ( #13049 )
* add version in error msg
* fix style
2025-05-15 09:16:27 +08:00
Pranav Singh
bd45bf7584
Update llama_cpp_quickstart.md ( #13145 )
Signed-off-by: Pranav Singh <pranav.singh@intel.com>
2025-05-15 08:40:53 +08:00
Shaojun Liu
bd71739e64
Update docs and scripts to align with new Docker image release ( #13156 )
* Update vllm_docker_quickstart.md
* Update start-vllm-service.sh
* Update vllm_docker_quickstart.md
* Update start-vllm-service.sh
2025-05-13 17:06:29 +08:00
Yina Chen
f6441b4e3d
Add moe_softmax_topk ( #13157 )
* add moe_softmax_topk
* address comments
* update
2025-05-13 14:50:59 +08:00
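moe_softmax_topk fuses the standard MoE gating steps: softmax over the router logits, then top-k expert selection. A plain-Python sketch of the math (the real op is a fused GPU kernel; renormalizing the k weights to sum to 1 is a common but model-dependent choice):

```python
import math

def moe_softmax_topk(router_logits, k):
    """Softmax the router logits, pick the k most probable experts,
    and renormalize their routing weights to sum to 1."""
    m = max(router_logits)                      # subtract max for stability
    exps = [math.exp(x - m) for x in router_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    indices = sorted(range(len(probs)),
                     key=lambda i: probs[i], reverse=True)[:k]
    weights = [probs[i] for i in indices]
    norm = sum(weights)
    return indices, [w / norm for w in weights]
```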
Yuwen Hu
aa12f69bbf
Update Ollama portable zip QuickStart regarding saving VRAM ( #13155 )
* Update Ollama portable zip quickstart regarding saving VRAM
* Small fix
2025-05-13 13:25:22 +08:00
Jason Dai
086a8b3ab9
Update flashmoe_quickstart ( #13154 )
2025-05-13 07:56:09 +08:00
Xiangyu Tian
886c7632b2
Add IPEX_LLM_FORCE_BATCH_FORWARD for vLLM docker image ( #13151 )
2025-05-12 13:44:33 +08:00
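Flags like IPEX_LLM_FORCE_BATCH_FORWARD are typically read once as boolean environment variables. Exactly how ipex-llm parses this one is not shown in the log, but the usual pattern looks like:

```python
import os

def env_flag(name, default="0"):
    """Interpret the common truthy spellings of a boolean env variable."""
    return os.environ.get(name, default).strip().lower() in ("1", "true", "yes", "on")

# e.g. gate a code path on the flag:
force_batch = env_flag("IPEX_LLM_FORCE_BATCH_FORWARD")
```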
Wang, Jian4
5df03ced2c
Update vllm patch to fix telechat2 and baichuan2 errors ( #13150 )
2025-05-12 10:54:22 +08:00
Jason Dai
9da1c56fa8
Create flashmoe quickstart ( #13147 )
2025-05-12 10:11:22 +08:00
Guancheng Fu
da08c9ca60
Update Dockerfile ( #13148 )
2025-05-12 09:19:18 +08:00
Yuwen Hu
0438e39f3e
Add PyTorch 2.6 support in Latest Update ( #13144 )
2025-05-09 13:26:49 +08:00
Shaojun Liu
45f7bf6688
Refactor vLLM Documentation: Centralize Benchmarking and Improve Readability ( #13141 )
* update vllm doc
* update image name
* update
* update
* update
* update
2025-05-09 10:19:42 +08:00
Ruonan Wang
f5d9c49a2a
add rotary_half_with_cache_inplaced to ipex_llm.transformers.models.common ( #13143 )
* update
* small fix
2025-05-09 09:20:44 +08:00
Wang, Jian4
f2598b119e
update for bge-m3 ( #13138 )
2025-05-07 16:59:52 +08:00
SONG Ge
e88a2aa65b
Modify ollama num_ctx related doc ( #13139 )
* Modify ollama num_ctx related doc
* meet comments
2025-05-07 16:44:58 +08:00
Yishuo Wang
3a28b69202
Add qwen3 support ( #13137 )
2025-05-07 14:03:16 +08:00
Wang, Jian4
be76918b61
Update 083 multimodal benchmark ( #13135 )
* update multimodal benchmark
* update
2025-05-07 09:35:09 +08:00
Wang, Jian4
01bc7e9eb9
Fix 083 lm_head error ( #13132 )
* fix no quantize error
* update
* update style
2025-05-06 15:47:20 +08:00
SONG Ge
685a749adb
Update ollama-release doc to v0.6.2 ( #13094 )
* Update ollama-release doc to v0.6.2
* update
* revert signature changes
2025-04-30 16:22:42 +08:00
Xiangyu Tian
51b41faad7
vLLM: update vLLM XPU to 0.8.3 version ( #13118 )
2025-04-30 14:40:53 +08:00
Yuwen Hu
f66eee1d1d
Update BMG troubleshooting guides regarding PPA installation ( #13119 )
* Update bmg troubleshooting guides regarding PPA installation
* Small fix
* Update based on comments
* Small fix
2025-04-28 15:48:17 +08:00
Jason Dai
ad741503a9
Update bmg_quickstart.md ( #13117 )
2025-04-27 22:03:14 +08:00
Jason Dai
6b033f8982
Update readme ( #13116 )
2025-04-27 18:18:19 +08:00
Guancheng Fu
d222eaffd7
Update README.md ( #13113 )
2025-04-27 17:13:18 +08:00
Wang, Jian4
16fa778e65
enable glm4v and gemma-3 on vllm 083 ( #13114 )
* enable glm4v and gemma-3
* update
* add qwen2.5-vl
2025-04-27 17:10:56 +08:00
Guancheng Fu
cf97d8f1d7
Update start-vllm-service.sh ( #13109 )
2025-04-25 15:42:15 +08:00
Ruonan Wang
9808fb1ac2
update doc about flash-moe ( #13103 )
* update doc about flashmoe
* revert toc
* meet review, add version note
* small fix
2025-04-24 17:53:14 +08:00
Guancheng Fu
0cfdd399e7
Update README.md ( #13104 )
2025-04-24 10:21:17 +08:00
Yishuo Wang
908fdb982e
small refactor and fix ( #13101 )
2025-04-22 14:45:31 +08:00
Guancheng Fu
14cd613fe1
Update vLLM docs with some new features ( #13092 )
* done
* fix
* done
* Update README.md
2025-04-22 14:39:28 +08:00
Yuwen Hu
0801d27a6f
Remove PyTorch 2.3 support for Intel GPU ( #13097 )
* Remove PyTorch 2.3 installation option for GPU
* Remove xpu_lnl option in installation guides for docs
* Update BMG quickstart
* Remove PyTorch 2.3 dependencies for GPU examples
* Update the graphmode example to use stable version 2.2.0
* Fix based on comments
2025-04-22 10:26:16 +08:00
Yina Chen
a2a35fdfad
Update portable zip link ( #13098 )
* update portable zip link
* update CN
* address comments
* update latest updates
* revert
2025-04-21 17:25:35 +08:00
Ruonan Wang
2f78afcd2a
Refactor some functions to ipex_llm.transformers.models.common ( #13091 )
* add quantize_linear & linear_forward
* add moe_group_topk
* rotary_two_with_cache_inplaced
* fix code style
* update related models
2025-04-18 11:15:43 +08:00
Shaojun Liu
73198d5b80
Update to b17 image ( #13085 )
* update vllm patch
* fix
* fix triton
---------
Co-authored-by: gc-fu <guancheng.fu@intel.com>
2025-04-17 16:18:22 +08:00
Shaojun Liu
db5edba786
Update Dockerfile ( #13081 )
2025-04-16 09:18:46 +08:00
Shaojun Liu
fa56212bb3
Update vLLM patch ( #13079 )
* update vllm patch
* Update Dockerfile
2025-04-15 16:55:29 +08:00
Shaojun Liu
f5aaa83649
Update serving-xpu Dockerfile ( #13077 )
* Update Dockerfile
* Update Dockerfile
2025-04-15 13:34:14 +08:00