ipex-llm

Author	SHA1	Message	Date
Wang, Jian4	16fa778e65	enable glm4v and gemma-3 on vllm 083 (#13114 ) * enable glm4v and gemma-3 * update * add qwen2.5-vl	2025-04-27 17:10:56 +08:00
Guancheng Fu	0cfdd399e7	Update README.md (#13104 )	2025-04-24 10:21:17 +08:00
Yishuo Wang	908fdb982e	small refactor and fix (#13101 )	2025-04-22 14:45:31 +08:00
Guancheng Fu	14cd613fe1	Update vLLM docs with some new features (#13092 ) * done * fix * done * Update README.md	2025-04-22 14:39:28 +08:00
Yuwen Hu	0801d27a6f	Remove PyTorch 2.3 support for Intel GPU (#13097 ) * Remove PyTorch 2.3 installation option for GPU * Remove xpu_lnl option in installation guides for docs * Update BMG quickstart * Remove PyTorch 2.3 dependencies for GPU examples * Update the graphmode example to use stable version 2.2.0 * Fix based on comments	2025-04-22 10:26:16 +08:00
Ruonan Wang	2f78afcd2a	Refactor some functions to `ipex_llm.transformers.models.common` (#13091 ) * add quantize_linear & linear_forward * add moe_group_topk * rotary_two_with_cache_inplaced * fix code style * update related models	2025-04-18 11:15:43 +08:00
Ruonan Wang	e08c6bd018	Fix several models based on sdp api change (#13075 ) * fix baichuan based on sdp api change * fix several models based on api change * fix style	2025-04-15 11:13:12 +08:00
Yishuo Wang	10c30cdba9	set woq_int4 as default int4 (#13021 )	2025-04-14 14:10:59 +08:00
Ruonan Wang	6693e8ab04	Deepseek kv / sdp support (#13068 ) * update kv * fix * fix style	2025-04-11 11:26:15 +08:00
Yuwen Hu	cd0d4857b8	`ipex-llm` 2.2.0 post-release update (#13053 ) * Update ollama/llama.cpp release link to 2.2.0 (#13052) * Post-update for releasing ipex-llm 2.2.0	2025-04-07 17:41:22 +08:00
Yishuo Wang	ef852dcb4a	add audio optimization for qwen2.5-omni (#13037 )	2025-04-07 17:20:26 +08:00
Yishuo Wang	300eb01d98	Add basic optimization for Qwen2.5 omni (#13022 )	2025-03-28 17:21:52 +08:00
Wang, Jian4	7809ca9864	Reuse --privileged (#13015 ) * fix * add	2025-03-27 10:00:50 +08:00
Guancheng Fu	f437b36678	Fix vllm glm edge model (#13007 ) * fix done * fix	2025-03-26 09:25:32 +08:00
Yuwen Hu	374747b492	Update bert optimization to fit higher transformers/torch version (#13006 )	2025-03-25 16:12:03 +08:00
Ruonan Wang	27d669210f	remove fschat in EAGLE example (#13005 ) * update fschat version * fix	2025-03-25 15:48:48 +08:00
Shaojun Liu	08f96a5139	Rename LICENSE-Intel®-OpenMP*-Runtime-Library.txt to LICENSE-Intel®-OpenMP-Runtime-Library.txt (#13002 )	2025-03-25 10:07:55 +08:00
Shaojun Liu	46a4f53967	OSPDT: add tpp licenses for release 2.2.0 (#12840 ) * Create LICENSE-zstd.txt * Create LICENSE-libcxx.txt * Create LICENSE-libcxxabi.txt * Create LICENSE-safestring.txt * Create LICENSE-stb-image.txt * Create LICENSE-cluster-agent.txt * Create LICENSE-hd-agent.txt * Create LICENSE-platform-telemetry-agent.txt * Create LICENSE-platform-update-agent.txt * Create LICENSE-OpenCL-ICD-Loader.txt * Create LICENSE-xptifw.txt * Create LICENSE-intel-openmp.txt * Create LICENSE-Intel®-OpenMP-Runtime-Library.txt Create LICENSE-Intel®-C-C++-Fortran-Compiler-Mainline.txt * add TPP files * Add TPP files * add tpp * add tpp * update * update	2025-03-21 15:52:22 +08:00
Yuwen Hu	5bdf57327d	Remove ipex import in fastchat loader (#12984 )	2025-03-20 18:29:00 +08:00
Wang, Jian4	c9ecb7a113	Fix qwen nan value issue on vllm (#12971 ) * add to fix qwen nan value issue * update	2025-03-14 14:43:54 +08:00
Heyang Sun	cd109bb061	Gemma QLoRA example (#12969 ) * Gemma QLoRA example * Update README.md * Update README.md --------- Co-authored-by: sgwhat <ge.song@intel.com>	2025-03-14 14:27:51 +08:00
Yuwen Hu	8bc41c13ab	Support PyTorch 2.6 with Arrow Lake-H AOT on Windows (#12967 )	2025-03-13 15:29:47 +08:00
Wang, Jian4	c8a0462507	Add vllm api_server input output log (#12962 )	2025-03-12 20:58:04 +08:00
Shaojun Liu	6a2d87e40f	add `--entrypoint /bin/bash` (#12957 ) Co-authored-by: gc-fu <guancheng.fu@intel.com>	2025-03-10 10:10:27 +08:00
Yuwen Hu	7c0c77cce3	Tiny fixes (#12936 )	2025-03-05 14:55:26 +08:00
Yuwen Hu	68a770745b	Add moonlight GPU example (#12929 ) * Add moonlight GPU example and update table * Small fix * Fix based on comments * Small fix	2025-03-05 11:31:14 +08:00
Shaojun Liu	f81d89d908	Remove Unnecessary --privileged Flag While Keeping It for WSL Users (#12920 )	2025-03-03 11:11:42 +08:00
Yishuo Wang	b6f33d5c4d	optimize moonlight again (#12909 )	2025-03-03 09:21:15 +08:00
Yuwen Hu	443cb5d4e0	Update Janus-Pro GPU example (#12906 )	2025-02-28 15:39:03 +08:00
Yishuo Wang	39e360fe9d	add grouped topk optimization for moonlight (#12903 )	2025-02-28 13:25:56 +08:00
Xin Qiu	e946127613	glm 4v 1st sdp for vision (#12904 ) * glm4v 1st sdp * update glm4v example * meet code review * fix style	2025-02-28 13:23:27 +08:00
Yishuo Wang	be1f073866	add fuse moe optimization for moonlight (#12898 )	2025-02-27 09:15:24 +08:00
Yishuo Wang	5faba06409	simple optimization for moonlight moe decoding forward (#12891 )	2025-02-25 16:18:27 +08:00
Yishuo Wang	ab3fc66eb7	optimize attention part of moonlight-14B-A3B (#12886 )	2025-02-25 09:38:13 +08:00
Yishuo Wang	3f6ecce508	support using xgrammar to get json output (#12870 )	2025-02-24 14:10:58 +08:00
Guancheng Fu	02ec313eab	Update README.md (#12877 )	2025-02-24 09:59:17 +08:00
Xu, Shuo	1e00bed001	Add GPU example for Janus-Pro (#12869 ) * Add example for Janus-Pro * Update model link * Fixes * Fixes --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com> Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2025-02-21 18:36:50 +08:00
Wang, Jian4	3ea5389a99	Fix vllm api_server v1/models error (#12867 )	2025-02-21 11:08:29 +08:00
binbin Deng	8077850452	[NPU GGUF] Add simple example (#12853 )	2025-02-21 09:58:00 +08:00
Wang, Jian4	348dc8056d	Fix vllm gptq awq error (#12863 ) * fix gptq awq error * fix python style	2025-02-20 16:27:23 +08:00
Guancheng Fu	4eed0c7d99	initial implementation for low_bit_loader vLLM (#12838 ) * initial * add logic for handling tensor parallel models * fix * Add some comments * add doc * fix done	2025-02-19 19:45:34 +08:00
Xiangyu Tian	b26409d53f	R1 Hybrid: Add Benchmark for DeepSeek R1 transformers example (#12854 ) * init * fix * update * update * fix * fix	2025-02-19 18:33:21 +08:00
Yishuo Wang	aee2db30f9	update sdp support (#12847 )	2025-02-19 12:07:00 +08:00
Xiangyu Tian	93c10be762	LLM: Support hybrid convert for DeepSeek V3/R1 (#12834 ) LLM: Support hybrid convert for DeepSeek V3/R1	2025-02-19 11:31:19 +08:00
Wang, Jian4	e1809a6295	Update multimodal on vllm 0.6.6 (#12816 ) * add glm4v and minicpmv example * fix	2025-02-19 10:04:42 +08:00
Xiangyu Tian	09150b6058	Initiate CPU-XPU Hybrid Inference for DeepSeek-R1 (#12832 ) Initiate CPU-XPU Hybrid Inference for DeepSeek-R1 with DeepseekV3Attention and DeepseekV3MLP to XPU	2025-02-18 13:34:14 +08:00
Xiangyu Tian	09ed96082b	Add DeepSeek V3/R1 CPU example (#12836 ) Add DeepSeek V3/R1 CPU example for bf16 model	2025-02-18 12:45:49 +08:00
Yishuo Wang	8418450300	optimize minicpm-o's tts part (#12833 )	2025-02-17 14:53:37 +08:00
Wang, Jian4	1083fe5508	Reenable pp and lightweight-serving serving on 0.6.6 (#12814 ) * reenable pp ang lightweight serving on 066 * update readme * updat * update tag	2025-02-13 10:16:00 +08:00
Guancheng Fu	af693425f1	Upgrade to vLLM 0.6.6 (#12796 ) * init * update engine init * fix serving load_in_low_bit problem * temp * temp * temp * temp * temp * fix * fixed * done * fix * fix all arguments * fix * fix throughput script * fix * fix * use official ipex-llm * Fix readme * fix --------- Co-authored-by: hzjane <a1015616934@qq.com>	2025-02-12 16:47:51 +08:00

1 2 3 4 5 ...

2250 commits