Wang, Jian4
16fa778e65
enable glm4v and gemma-3 on vllm 0.8.3 ( #13114 )
* enable glm4v and gemma-3
* update
* add qwen2.5-vl
2025-04-27 17:10:56 +08:00
Yishuo Wang
908fdb982e
small refactor and fix ( #13101 )
2025-04-22 14:45:31 +08:00
Ruonan Wang
2f78afcd2a
Refactor some functions to ipex_llm.transformers.models.common ( #13091 )
* add quantize_linear & linear_forward
* add moe_group_topk
* rotary_two_with_cache_inplaced
* fix code style
* update related models
2025-04-18 11:15:43 +08:00
Ruonan Wang
e08c6bd018
Fix several models based on sdp api change ( #13075 )
* fix baichuan based on sdp api change
* fix several models based on api change
* fix style
2025-04-15 11:13:12 +08:00
Yishuo Wang
10c30cdba9
set woq_int4 as default int4 ( #13021 )
2025-04-14 14:10:59 +08:00
Ruonan Wang
6693e8ab04
Deepseek kv / sdp support ( #13068 )
* update kv
* fix
* fix style
2025-04-11 11:26:15 +08:00
Yishuo Wang
ef852dcb4a
add audio optimization for qwen2.5-omni ( #13037 )
2025-04-07 17:20:26 +08:00
Yishuo Wang
300eb01d98
Add basic optimization for Qwen2.5 omni ( #13022 )
2025-03-28 17:21:52 +08:00
Guancheng Fu
f437b36678
Fix vllm glm edge model ( #13007 )
* fix done
* fix
2025-03-26 09:25:32 +08:00
Yuwen Hu
374747b492
Update bert optimization to fit higher transformers/torch version ( #13006 )
2025-03-25 16:12:03 +08:00
Yuwen Hu
5bdf57327d
Remove ipex import in fastchat loader ( #12984 )
2025-03-20 18:29:00 +08:00
Wang, Jian4
c9ecb7a113
Fix qwen nan value issue on vllm ( #12971 )
* add to fix qwen nan value issue
* update
2025-03-14 14:43:54 +08:00
Wang, Jian4
c8a0462507
Add vllm api_server input output log ( #12962 )
2025-03-12 20:58:04 +08:00
Yishuo Wang
b6f33d5c4d
optimize moonlight again ( #12909 )
2025-03-03 09:21:15 +08:00
Yishuo Wang
39e360fe9d
add grouped topk optimization for moonlight ( #12903 )
2025-02-28 13:25:56 +08:00
Xin Qiu
e946127613
glm 4v 1st sdp for vision ( #12904 )
* glm4v 1st sdp
* update glm4v example
* meet code review
* fix style
2025-02-28 13:23:27 +08:00
Yishuo Wang
be1f073866
add fuse moe optimization for moonlight ( #12898 )
2025-02-27 09:15:24 +08:00
Yishuo Wang
5faba06409
simple optimization for moonlight moe decoding forward ( #12891 )
2025-02-25 16:18:27 +08:00
Yishuo Wang
ab3fc66eb7
optimize attention part of moonlight-14B-A3B ( #12886 )
2025-02-25 09:38:13 +08:00
Yishuo Wang
3f6ecce508
support using xgrammar to get json output ( #12870 )
2025-02-24 14:10:58 +08:00
Wang, Jian4
3ea5389a99
Fix vllm api_server v1/models error ( #12867 )
2025-02-21 11:08:29 +08:00
Wang, Jian4
348dc8056d
Fix vllm gptq awq error ( #12863 )
* fix gptq awq error
* fix python style
2025-02-20 16:27:23 +08:00
Guancheng Fu
4eed0c7d99
initial implementation for low_bit_loader vLLM ( #12838 )
* initial
* add logic for handling tensor parallel models
* fix
* Add some comments
* add doc
* fix done
2025-02-19 19:45:34 +08:00
Xiangyu Tian
b26409d53f
R1 Hybrid: Add Benchmark for DeepSeek R1 transformers example ( #12854 )
* init
* fix
* update
* update
* fix
* fix
2025-02-19 18:33:21 +08:00
Yishuo Wang
aee2db30f9
update sdp support ( #12847 )
2025-02-19 12:07:00 +08:00
Xiangyu Tian
93c10be762
LLM: Support hybrid convert for DeepSeek V3/R1 ( #12834 )
2025-02-19 11:31:19 +08:00
Wang, Jian4
e1809a6295
Update multimodal on vllm 0.6.6 ( #12816 )
* add glm4v and minicpmv example
* fix
2025-02-19 10:04:42 +08:00
Yishuo Wang
8418450300
optimize minicpm-o's tts part ( #12833 )
2025-02-17 14:53:37 +08:00
Wang, Jian4
1083fe5508
Reenable pp and lightweight-serving on 0.6.6 ( #12814 )
* reenable pp and lightweight serving on 0.6.6
* update readme
* update
* update tag
2025-02-13 10:16:00 +08:00
Guancheng Fu
af693425f1
Upgrade to vLLM 0.6.6 ( #12796 )
* init
* update engine init
* fix serving load_in_low_bit problem
* temp
* fix
* fixed
* done
* fix
* fix all arguments
* fix
* fix throughput script
* fix
* fix
* use official ipex-llm
* Fix readme
* fix
---------
Co-authored-by: hzjane <a1015616934@qq.com>
2025-02-12 16:47:51 +08:00
Yishuo Wang
f8ab833f74
support and optimize janus pro ( #12813 )
2025-02-12 15:07:24 +08:00
Yishuo Wang
73cfe293fa
add basic support for Baichuan-M1-14B-Instruct ( #12808 )
2025-02-11 17:27:42 +08:00
Xiangyu Tian
b70ad902b4
Fix ipex-llm CPU linear dtype mismatch ( #12805 )
2025-02-11 10:34:44 +08:00
Yishuo Wang
e4ceb722b6
fix qwen2 vl ( #12798 )
2025-02-10 13:25:53 +08:00
Ruonan Wang
e90a9ad196
[NPU] Support non-const parameter for decoder layers when keep_ir=True ( #12789 )
* support layernorm=False for decoder layers
* rename to meet review
* fix style
* rename to const_parameter
* fix rebase error
* fix rebase error
2025-02-08 09:58:42 +08:00
binbin Deng
ca1d7b7c2c
[NPU] Support qwen models with cos_sin_input=True ( #12788 )
2025-02-07 16:41:13 +08:00
Yishuo Wang
d0d9c9d636
remove load_in_8bit usage as it has not been supported for a long time ( #12779 )
2025-02-07 11:21:29 +08:00
Yishuo Wang
9697197f3e
fix qlora finetune example ( #12769 )
2025-02-06 11:18:28 +08:00
Ruonan Wang
094a25b740
[NPU] Expose parameter to control blob / IR save logic ( #12767 )
* update api
* fix convert.py
* fix style
* remove unnecessary bin file
* fix style
2025-02-06 10:07:45 +08:00
Yishuo Wang
0237ffb302
refactor xpu linear forward ( #12768 )
2025-02-05 17:40:38 +08:00
Yuwen Hu
69f13c78b8
[NPU] Update layernorm node on MTL/ARL ( #12738 )
* Update layernorm node on MTL/ARL
* Fix on style
2025-01-23 17:25:19 +08:00
Yuwen Hu
dcca522618
Remove sdpa available patch ( #12734 )
2025-01-22 17:22:28 +08:00
Xiangyu Tian
c9b6c94a59
vLLM: Update vLLM-cpu to v0.6.6-post1 ( #12728 )
2025-01-22 15:03:01 +08:00
Yishuo Wang
6789e5d92f
small fix ( #12727 )
2025-01-21 17:27:18 +08:00
Yishuo Wang
085974e307
fix nf4 to cpu ( #12722 )
2025-01-21 09:23:22 +08:00
Yuwen Hu
9aa4be8ced
Update runtime configuration on MTL ( #12720 )
2025-01-20 11:06:37 +08:00
Yishuo Wang
bda87c21eb
add support and optimization for minicpmo audio part ( #12716 )
2025-01-16 16:39:00 +08:00
Zhao Changmin
54d6328b3c
woq int4 fwd ( #12711 )
2025-01-16 15:48:05 +08:00
Yishuo Wang
b62734748f
add support and optimization for minicpmo vision part ( #12713 )
2025-01-16 14:51:00 +08:00
Yuwen Hu
9d65dcd7ef
Fix deepseek coder with linear rope type support on GPU ( #12709 )
* Fix deepseek coder with linear rope type
* Style fix
* Move to optimize_pre
* Small fix
* Small fix
* Small fix to not affect other cases
* Style fixes
* Update function name
* Small fix
* Small fix
* Small fix
* Fix for low transformers version first
* Style fix
* Small fix
2025-01-15 21:12:34 +08:00