ipex-llm

Author	SHA1	Message	Date
Yishuo Wang	9697197f3e	fix qlora finetune example (#12769 )	2025-02-06 11:18:28 +08:00
Ruonan Wang	094a25b740	[NPU] Expose parameter to control blob / IR save logic (#12767 ) * update api * fix convert.py * fix style * remove unnecessary bin file * fix style	2025-02-06 10:07:45 +08:00
Yishuo Wang	0237ffb302	refactor xpu linear forward (#12768 )	2025-02-05 17:40:38 +08:00
Danciu Georgian	413d6c2b66	Update check.py removing a twice defined function (#12760 ) Remove duplicate function	2025-02-05 11:37:59 +08:00
Yuwen Hu	184adb2653	Small fix to MiniCPM-o-2_6 GPU example (#12766 )	2025-02-05 11:32:26 +08:00
Shaojun Liu	5fb87d7486	remove ${HF_TOKEN} (#12742 )	2025-01-26 10:31:42 +08:00
Yuwen Hu	69f13c78b8	[NPU] Update layernorm node on MTL/ARL (#12738 ) * Update layernorm node on MTL/ARL * Fix on style	2025-01-23 17:25:19 +08:00
Yuwen Hu	d11f257ee7	Add GPU example for MiniCPM-o-2_6 (#12735 ) * Add init example for omni mode * Small fix * Small fix * Add chat example * Remove lagecy link * Further update link * Add readme * Small fix * Update main readme link * Update based on comments * Small fix * Small fix * Small fix	2025-01-23 16:10:19 +08:00
Yuwen Hu	dcca522618	Remove sdpa available patch (#12734 )	2025-01-22 17:22:28 +08:00
Xiangyu Tian	c9b6c94a59	vLLM: Update vLLM-cpu to v0.6.6-post1 (#12728 ) Update vLLM-cpu to v0.6.6-post1	2025-01-22 15:03:01 +08:00
Ruonan Wang	78cca0a68c	[NPU] update llm-npu-cli example (#12729 ) * update cli example * add license * rename * update readme sample output	2025-01-22 09:59:27 +08:00
Yishuo Wang	6789e5d92f	small fix (#12727 )	2025-01-21 17:27:18 +08:00
Yishuo Wang	085974e307	fix nf4 to cpu (#12722 )	2025-01-21 09:23:22 +08:00
Yuwen Hu	9aa4be8ced	Update runtime configuration on MTL (#12720 )	2025-01-20 11:06:37 +08:00
Yishuo Wang	bda87c21eb	add support and optimization for minicpmo audio part (#12716 )	2025-01-16 16:39:00 +08:00
Yuwen Hu	534e0e6774	Update dependency for PyTorch 2.6 RC support for woq int4 (#12714 )	2025-01-16 15:51:57 +08:00
Zhao Changmin	54d6328b3c	woq int4 fwd (#12711 )	2025-01-16 15:48:05 +08:00
Yishuo Wang	b62734748f	add support and optimization for minicpmo vision part (#12713 )	2025-01-16 14:51:00 +08:00
Yuwen Hu	c52bdff76b	Update Deepseek coder GPU example (#12712 ) * Update Deepseek coder GPU example * Fix based on comment	2025-01-16 14:05:31 +08:00
Yuwen Hu	9d65dcd7ef	Fix deepseek coder with linear rope type support on GPU (#12709 ) * Fix deepseek coder with linear rope type * Style fix * Move to optimize_pre * Small fix * Small fix * Small fix to not affect other cases * Style fixes * Update function name * Small fix * Small fix * Small fix * Fix for low transformers version first * Style fix * Small fix	2025-01-15 21:12:34 +08:00
Cengguang Zhang	9930351112	LLM: add new qtype woq_int4 to support gemm int4 temporary. (#12706 ) This PR add temporary qtype woq_int4 to avoid affecting other qtype and models. Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>	2025-01-15 14:41:33 +08:00
Xu, Shuo	350fae285d	Add Qwen2-VL HF GPU example with ModelScope Support (#12606 ) * Add qwen2-vl example * complete generate.py & readme * improve lint style * update 1-6 * update main readme * Format and other small fixes --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2025-01-13 15:42:04 +08:00
Yuwen Hu	a1da7908b9	Fix name device is not found bug (#12703 )	2025-01-13 10:11:02 +08:00
Yishuo Wang	db9db51e2c	fix lnl perf (#12700 )	2025-01-10 18:00:58 +08:00
binbin Deng	da8bcb7db1	[NPU ] fix load logic of glm-edge models (#12698 )	2025-01-10 16:08:37 +08:00
Yishuo Wang	f8dc408888	fix user issue (#12692 )	2025-01-10 10:18:47 +08:00
Yishuo Wang	68857494a5	refactor to simplify following upgrade 2 (#12685 )	2025-01-10 09:29:03 +08:00
Yishuo Wang	7234c9b27b	update quantize kv cache condition (#12681 )	2025-01-09 15:23:04 +08:00
Yuwen Hu	5d8081afbc	Remove dummy model from performance tests (#12682 )	2025-01-09 14:50:17 +08:00
Yishuo Wang	1ec40cd09e	refactor to simplify following upgrade (#12680 )	2025-01-09 13:34:30 +08:00
Yishuo Wang	5c24276fc4	fix custom kernel registration (#12674 )	2025-01-08 17:39:17 +08:00
Yishuo Wang	a22a8c21bb	small fix and remove ununsed code about ipex (#12671 )	2025-01-08 17:39:04 +08:00
Yishuo Wang	c11f5f0fcd	also convert SdpaAttention in optimize_model (#12673 )	2025-01-08 16:48:03 +08:00
Yishuo Wang	7dd156d292	small fix and add comment (#12670 )	2025-01-08 10:56:50 +08:00
Yishuo Wang	ccf618ff4a	Remove all ipex usage (#12666 )	2025-01-08 10:31:18 +08:00
Yuwen Hu	5db6f9dcde	Add option with PyTorch 2.6 RC version for testing purposes (#12668 ) * Add option with PyTorch 2.6 RC version for testing purposes * Small update	2025-01-07 18:28:55 +08:00
Yishuo Wang	f9ee7898c8	fix onednn dependency bug (#12665 )	2025-01-07 16:26:56 +08:00
Yishuo Wang	29ad5c449e	refactor codegeex to remove ipex kernel usage (#12664 )	2025-01-07 16:17:40 +08:00
Yuwen Hu	525b0ee991	[NPU] Tiny fixes on examples (#12661 )	2025-01-07 14:30:38 +08:00
Yuwen Hu	ebdf19fa7e	[NPU] Further fix saving of generation config (#12657 ) * Further fix saving of generation config * Fix based on comments * Small fix	2025-01-07 13:53:54 +08:00
Yuwen Hu	381d448ee2	[NPU] Example & Quickstart updates (#12650 ) * Remove model with optimize_model=False in NPU verified models tables, and remove related example * Remove experimental in run optimized model section title * Unify model table order & example cmd * Move embedding example to separate folder & update quickstart example link * Add Quickstart reference in main NPU readme * Small fix * Small fix * Move save/load examples under NPU/HF-Transformers-AutoModels * Add low-bit and polish arguments for LLM Python examples * Small fix * Add low-bit and polish arguments for Multi-Model examples * Polish argument for Embedding models * Polish argument for LLM CPP examples * Add low-bit and polish argument for Save-Load examples * Add accuracy tuning tips for examples * Update NPU qucikstart accuracy tuning with low-bit optimizations * Add save/load section to qucikstart * Update CPP example sample output to EN * Add installation regarding cmake for CPP examples * Small fix * Small fix * Small fix * Small fix * Small fix * Small fix * Unify max prompt length to 512 * Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4 * Update based on comments * Small fix	2025-01-07 13:52:41 +08:00
Yishuo Wang	ddc0ef3993	refactor device check and remove cohere/mixtral support (#12659 )	2025-01-07 11:15:51 +08:00
Yishuo Wang	ea65e4fecc	remove falcon support and related UT (#12656 )	2025-01-07 09:26:00 +08:00
Yina Chen	fae73eee79	[NPU] Support save npu quantized model without npu dependency (#12647 ) * support save awq * load quantized model & save npu compiled model * fix style * update * fix dll load issue * update error message * fix style	2025-01-06 18:06:22 +08:00
Yishuo Wang	502461d836	remove unnecessary ipex kernel usage (#12649 )	2025-01-03 16:45:24 +08:00
Yishuo Wang	9f8b134889	add ipex-llm custom kernel registration (#12648 )	2025-01-03 16:45:04 +08:00
binbin Deng	0b377100c5	Add guide for save-load usage (#12498 )	2025-01-03 16:30:15 +08:00
Wang, Jian4	6711a48a36	Enable internvl2-8b on vllm(#12645 )	2025-01-03 14:49:36 +08:00
Zijie Li	8fd2dcba86	Add benchmark_util for `transformers >= 4.47.0` (#12644 )	2025-01-03 10:48:29 +08:00
Yina Chen	8e5328e9b4	add disable opts for awq (#12641 )	2025-01-02 15:45:22 +08:00

2183 commits