ipex-llm

Author	SHA1	Message	Date
Yuwen Hu	c52bdff76b	Update Deepseek coder GPU example (#12712 ) * Update Deepseek coder GPU example * Fix based on comment	2025-01-16 14:05:31 +08:00
Yuwen Hu	9d65dcd7ef	Fix deepseek coder with linear rope type support on GPU (#12709 ) * Fix deepseek coder with linear rope type * Style fix * Move to optimize_pre * Small fix * Small fix * Small fix to not affect other cases * Style fixes * Update function name * Small fix * Small fix * Small fix * Fix for low transformers version first * Style fix * Small fix	2025-01-15 21:12:34 +08:00
Cengguang Zhang	9930351112	LLM: add new qtype woq_int4 to support gemm int4 temporary. (#12706 ) This PR add temporary qtype woq_int4 to avoid affecting other qtype and models. Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>	2025-01-15 14:41:33 +08:00
Xu, Shuo	350fae285d	Add Qwen2-VL HF GPU example with ModelScope Support (#12606 ) * Add qwen2-vl example * complete generate.py & readme * improve lint style * update 1-6 * update main readme * Format and other small fixes --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2025-01-13 15:42:04 +08:00
Yuwen Hu	a1da7908b9	Fix name device is not found bug (#12703 )	2025-01-13 10:11:02 +08:00
Yishuo Wang	db9db51e2c	fix lnl perf (#12700 )	2025-01-10 18:00:58 +08:00
binbin Deng	da8bcb7db1	[NPU ] fix load logic of glm-edge models (#12698 )	2025-01-10 16:08:37 +08:00
Yishuo Wang	f8dc408888	fix user issue (#12692 )	2025-01-10 10:18:47 +08:00
Yishuo Wang	68857494a5	refactor to simplify following upgrade 2 (#12685 )	2025-01-10 09:29:03 +08:00
Yishuo Wang	7234c9b27b	update quantize kv cache condition (#12681 )	2025-01-09 15:23:04 +08:00
Yuwen Hu	5d8081afbc	Remove dummy model from performance tests (#12682 )	2025-01-09 14:50:17 +08:00
Yishuo Wang	1ec40cd09e	refactor to simplify following upgrade (#12680 )	2025-01-09 13:34:30 +08:00
Yishuo Wang	5c24276fc4	fix custom kernel registration (#12674 )	2025-01-08 17:39:17 +08:00
Yishuo Wang	a22a8c21bb	small fix and remove ununsed code about ipex (#12671 )	2025-01-08 17:39:04 +08:00
Yishuo Wang	c11f5f0fcd	also convert SdpaAttention in optimize_model (#12673 )	2025-01-08 16:48:03 +08:00
Yishuo Wang	7dd156d292	small fix and add comment (#12670 )	2025-01-08 10:56:50 +08:00
Yishuo Wang	ccf618ff4a	Remove all ipex usage (#12666 )	2025-01-08 10:31:18 +08:00
Yuwen Hu	5db6f9dcde	Add option with PyTorch 2.6 RC version for testing purposes (#12668 ) * Add option with PyTorch 2.6 RC version for testing purposes * Small update	2025-01-07 18:28:55 +08:00
Yishuo Wang	f9ee7898c8	fix onednn dependency bug (#12665 )	2025-01-07 16:26:56 +08:00
Yishuo Wang	29ad5c449e	refactor codegeex to remove ipex kernel usage (#12664 )	2025-01-07 16:17:40 +08:00
Yuwen Hu	525b0ee991	[NPU] Tiny fixes on examples (#12661 )	2025-01-07 14:30:38 +08:00
Yuwen Hu	ebdf19fa7e	[NPU] Further fix saving of generation config (#12657 ) * Further fix saving of generation config * Fix based on comments * Small fix	2025-01-07 13:53:54 +08:00
Yuwen Hu	381d448ee2	[NPU] Example & Quickstart updates (#12650 ) * Remove model with optimize_model=False in NPU verified models tables, and remove related example * Remove experimental in run optimized model section title * Unify model table order & example cmd * Move embedding example to separate folder & update quickstart example link * Add Quickstart reference in main NPU readme * Small fix * Small fix * Move save/load examples under NPU/HF-Transformers-AutoModels * Add low-bit and polish arguments for LLM Python examples * Small fix * Add low-bit and polish arguments for Multi-Model examples * Polish argument for Embedding models * Polish argument for LLM CPP examples * Add low-bit and polish argument for Save-Load examples * Add accuracy tuning tips for examples * Update NPU qucikstart accuracy tuning with low-bit optimizations * Add save/load section to qucikstart * Update CPP example sample output to EN * Add installation regarding cmake for CPP examples * Small fix * Small fix * Small fix * Small fix * Small fix * Small fix * Unify max prompt length to 512 * Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4 * Update based on comments * Small fix	2025-01-07 13:52:41 +08:00
Yishuo Wang	ddc0ef3993	refactor device check and remove cohere/mixtral support (#12659 )	2025-01-07 11:15:51 +08:00
Yishuo Wang	ea65e4fecc	remove falcon support and related UT (#12656 )	2025-01-07 09:26:00 +08:00
Yina Chen	fae73eee79	[NPU] Support save npu quantized model without npu dependency (#12647 ) * support save awq * load quantized model & save npu compiled model * fix style * update * fix dll load issue * update error message * fix style	2025-01-06 18:06:22 +08:00
Yishuo Wang	502461d836	remove unnecessary ipex kernel usage (#12649 )	2025-01-03 16:45:24 +08:00
Yishuo Wang	9f8b134889	add ipex-llm custom kernel registration (#12648 )	2025-01-03 16:45:04 +08:00
binbin Deng	0b377100c5	Add guide for save-load usage (#12498 )	2025-01-03 16:30:15 +08:00
Wang, Jian4	6711a48a36	Enable internvl2-8b on vllm(#12645 )	2025-01-03 14:49:36 +08:00
Zijie Li	8fd2dcba86	Add benchmark_util for `transformers >= 4.47.0` (#12644 )	2025-01-03 10:48:29 +08:00
Yina Chen	8e5328e9b4	add disable opts for awq (#12641 )	2025-01-02 15:45:22 +08:00
Xu, Shuo	62318964fa	Update llama example information (#12640 ) Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2025-01-02 13:48:39 +08:00
Yishuo Wang	81211fd010	remove unused code (#12635 )	2025-01-02 13:31:09 +08:00
binbin Deng	534566e290	[NPU] Support minicpm-v with python cpp backend (#12637 )	2025-01-02 11:13:15 +08:00
Yishuo Wang	f289f68d57	small fix (#12634 )	2024-12-30 17:14:25 +08:00
Yishuo Wang	2d08155513	remove bmm, which is only required in ipex 2.0 (#12630 )	2024-12-27 17:28:57 +08:00
binbin Deng	f17ccfa61a	[NPU] Fix save-load usage of minicpm models (#12628 )	2024-12-27 15:56:46 +08:00
Yishuo Wang	c72a5db757	remove unused code again (#12624 )	2024-12-27 14:17:11 +08:00
binbin Deng	46eeab4479	[NPU] Fix regression caused by layer_norm change (#12627 )	2024-12-27 14:08:49 +08:00
Ruonan Wang	90f6709486	remove pipeline examples (#12626 )	2024-12-27 13:42:28 +08:00
Zijie Li	5f04ed7254	[NPU] Update prompt format for baichuan2-pipeline (#12625 )	2024-12-27 11:30:54 +08:00
Yishuo Wang	34dbdb8ee3	small fix (#12623 )	2024-12-27 10:19:27 +08:00
Xu, Shuo	55ce091242	Add GLM4-Edge-V GPU example (#12596 ) * Add GLM4-Edge-V examples * polish readme * revert wrong changes * polish readme * polish readme * little polish in reference info and indent * Small fix and sample output updates * Update main readme --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-12-27 09:40:29 +08:00
binbin Deng	796ee571a5	[NPU doc] Update verified platforms (#12621 )	2024-12-26 17:39:13 +08:00
Ruonan Wang	bbdbbb0d88	[NPU] Compatible with other third-party models like auto-round (#12620 ) * support third party model * simplify code * fix sty;e * fix sym int4 GW * code refactor * fix	2024-12-26 17:25:18 +08:00
Yishuo Wang	a9abde0b5d	support passing attn_scale to sdpa (#12619 )	2024-12-26 16:58:09 +08:00
Shaojun Liu	40a7d2b4f0	Consolidated C-Eval Benchmark Guide for Single-GPU and Multi-GPU Environments (#12618 ) * run c-eval on multi-GPUs * Update README.md	2024-12-26 15:23:32 +08:00
Zijie Li	ccc4055058	[NPU] Update prompt format for baichuan2 (#12615 ) * Update baichuan2.py * style fix	2024-12-26 11:41:37 +08:00
Yishuo Wang	1604b4ead8	small fix (#12616 )	2024-12-26 11:35:12 +08:00


2165 commits