ipex-llm

Author	SHA1	Message	Date
Yuwen Hu	525b0ee991	[NPU] Tiny fixes on examples (#12661 )	2025-01-07 14:30:38 +08:00
Yuwen Hu	ebdf19fa7e	[NPU] Further fix saving of generation config (#12657 ) * Further fix saving of generation config * Fix based on comments * Small fix	2025-01-07 13:53:54 +08:00
Yuwen Hu	381d448ee2	[NPU] Example & Quickstart updates (#12650 ) * Remove model with optimize_model=False in NPU verified models tables, and remove related example * Remove experimental in run optimized model section title * Unify model table order & example cmd * Move embedding example to separate folder & update quickstart example link * Add Quickstart reference in main NPU readme * Small fix * Small fix * Move save/load examples under NPU/HF-Transformers-AutoModels * Add low-bit and polish arguments for LLM Python examples * Small fix * Add low-bit and polish arguments for Multi-Model examples * Polish argument for Embedding models * Polish argument for LLM CPP examples * Add low-bit and polish argument for Save-Load examples * Add accuracy tuning tips for examples * Update NPU quickstart accuracy tuning with low-bit optimizations * Add save/load section to quickstart * Update CPP example sample output to EN * Add installation regarding cmake for CPP examples * Small fix * Small fix * Small fix * Small fix * Small fix * Small fix * Unify max prompt length to 512 * Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4 * Update based on comments * Small fix	2025-01-07 13:52:41 +08:00
Yishuo Wang	ddc0ef3993	refactor device check and remove cohere/mixtral support (#12659 )	2025-01-07 11:15:51 +08:00
Yishuo Wang	ea65e4fecc	remove falcon support and related UT (#12656 )	2025-01-07 09:26:00 +08:00
Yina Chen	fae73eee79	[NPU] Support save npu quantized model without npu dependency (#12647 ) * support save awq * load quantized model & save npu compiled model * fix style * update * fix dll load issue * update error message * fix style	2025-01-06 18:06:22 +08:00
Yishuo Wang	502461d836	remove unnecessary ipex kernel usage (#12649 )	2025-01-03 16:45:24 +08:00
Yishuo Wang	9f8b134889	add ipex-llm custom kernel registration (#12648 )	2025-01-03 16:45:04 +08:00
binbin Deng	0b377100c5	Add guide for save-load usage (#12498 )	2025-01-03 16:30:15 +08:00
Wang, Jian4	6711a48a36	Enable internvl2-8b on vllm(#12645 )	2025-01-03 14:49:36 +08:00
Zijie Li	8fd2dcba86	Add benchmark_util for `transformers >= 4.47.0` (#12644 )	2025-01-03 10:48:29 +08:00
Yina Chen	8e5328e9b4	add disable opts for awq (#12641 )	2025-01-02 15:45:22 +08:00
Xu, Shuo	62318964fa	Update llama example information (#12640 ) Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2025-01-02 13:48:39 +08:00
Yishuo Wang	81211fd010	remove unused code (#12635 )	2025-01-02 13:31:09 +08:00
binbin Deng	534566e290	[NPU] Support minicpm-v with python cpp backend (#12637 )	2025-01-02 11:13:15 +08:00
Yishuo Wang	f289f68d57	small fix (#12634 )	2024-12-30 17:14:25 +08:00
Yishuo Wang	2d08155513	remove bmm, which is only required in ipex 2.0 (#12630 )	2024-12-27 17:28:57 +08:00
binbin Deng	f17ccfa61a	[NPU] Fix save-load usage of minicpm models (#12628 )	2024-12-27 15:56:46 +08:00
Yishuo Wang	c72a5db757	remove unused code again (#12624 )	2024-12-27 14:17:11 +08:00
binbin Deng	46eeab4479	[NPU] Fix regression caused by layer_norm change (#12627 )	2024-12-27 14:08:49 +08:00
Ruonan Wang	90f6709486	remove pipeline examples (#12626 )	2024-12-27 13:42:28 +08:00
Zijie Li	5f04ed7254	[NPU] Update prompt format for baichuan2-pipeline (#12625 )	2024-12-27 11:30:54 +08:00
Yishuo Wang	34dbdb8ee3	small fix (#12623 )	2024-12-27 10:19:27 +08:00
Xu, Shuo	55ce091242	Add GLM4-Edge-V GPU example (#12596 ) * Add GLM4-Edge-V examples * polish readme * revert wrong changes * polish readme * polish readme * little polish in reference info and indent * Small fix and sample output updates * Update main readme --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-12-27 09:40:29 +08:00
binbin Deng	796ee571a5	[NPU doc] Update verified platforms (#12621 )	2024-12-26 17:39:13 +08:00
Ruonan Wang	bbdbbb0d88	[NPU] Compatible with other third-party models like auto-round (#12620 ) * support third party model * simplify code * fix style * fix sym int4 GW * code refactor * fix	2024-12-26 17:25:18 +08:00
Yishuo Wang	a9abde0b5d	support passing attn_scale to sdpa (#12619 )	2024-12-26 16:58:09 +08:00
Shaojun Liu	40a7d2b4f0	Consolidated C-Eval Benchmark Guide for Single-GPU and Multi-GPU Environments (#12618 ) * run c-eval on multi-GPUs * Update README.md	2024-12-26 15:23:32 +08:00
Zijie Li	ccc4055058	[NPU] Update prompt format for baichuan2 (#12615 ) * Update baichuan2.py * style fix	2024-12-26 11:41:37 +08:00
Yishuo Wang	1604b4ead8	small fix (#12616 )	2024-12-26 11:35:12 +08:00
Ruonan Wang	d841e1dc0d	[NPU] update convert script based on latest usage (#12617 )	2024-12-26 11:23:04 +08:00
Xu, Shuo	ef585d3360	Polish Readme for ModelScope-related examples (#12603 )	2024-12-26 10:52:47 +08:00
Yishuo Wang	a596f1ae5f	remove bigdl-llm test to fix langchain UT (#12613 )	2024-12-26 10:17:25 +08:00
Ruonan Wang	9e895f04ec	[NPU] fix npu save (#12614 ) * fix npu save * update	2024-12-26 09:21:16 +08:00
Yishuo Wang	6249c1e373	rewrite llama optimization (#12609 )	2024-12-25 17:04:32 +08:00
Yishuo Wang	5f5ac8a856	fix llama related import (#12611 )	2024-12-25 16:23:52 +08:00
Yishuo Wang	4e6b9d804f	add compresskv back for mistral (#12607 ) * add compresskv back for mistral * fix * fix	2024-12-25 11:06:08 +08:00
Yishuo Wang	4135b895b3	refactor chatglm2, internlm, stablelm and qwen (#12604 )	2024-12-24 18:18:00 +08:00
Yishuo Wang	073f936c37	refactor mistral and phi3 (#12605 )	2024-12-24 17:52:32 +08:00
binbin Deng	45f8f72a28	[NPU] Fix minicpm on MTL (#12599 )	2024-12-24 15:37:56 +08:00
Yishuo Wang	ad2dc965c5	refactor mllama, gpt2 and internvl (#12602 )	2024-12-24 14:18:31 +08:00
Yishuo Wang	7aaf02f602	refactor baichuan, glm4 and minicpm3 (#12600 )	2024-12-24 14:16:30 +08:00
Zijie Li	c410d9cf73	[NPU] support asym_int4 for baichuan (#12576 ) * add npu support for baichuan * Update baichuan_mp.py * Update baichuan_mp.py	2024-12-24 09:17:50 +08:00
Yishuo Wang	098eb335b2	refactor sd 1.5 and qwen2-vl and fix (#12590 )	2024-12-20 17:34:55 +08:00
Yishuo Wang	b050368efc	refactor yuan2 and starcoder2 and fix (#12589 )	2024-12-20 16:41:50 +08:00
Yishuo Wang	6ea8033635	refactor glm edge (#12588 )	2024-12-20 15:36:57 +08:00
Xu, Shuo	b0338c5529	Add --modelscope option for glm-v4 MiniCPM-V-2_6 glm-edge and internvl2 (#12583 ) * Add --modelscope option for glm-v4 and MiniCPM-V-2_6 * glm-edge * minicpm-v-2_6:don't use model_hub=modelscope when use lowbit; internvl2 --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-12-20 13:54:17 +08:00
Yishuo Wang	f3b5fad3be	refactor qwen2 and llama3 (#12587 )	2024-12-20 13:25:25 +08:00
Xu, Shuo	47da3c999f	Add `--modelscope` in GPU examples for minicpm, minicpm3, baichuan2 (#12564 ) * Add --modelscope for more models * minicpm --------- Co-authored-by: ATMxsp01 <shou.xu@intel.com>	2024-12-19 17:25:46 +08:00
Yishuo Wang	3eeb02f1be	support Megrez-3B-Omni (#12582 )	2024-12-19 17:23:01 +08:00

2145 commits