ipex-llm

Author	SHA1	Message	Date
Ruonan Wang	0819fad34e	support Llama2-7B / Llama3-8B for NPU C++ (#12431 ) * support llama2 * update * support fused_layers=4 for Llama2-7B	2024-11-22 18:47:19 +08:00
Ruonan Wang	4ffa6c752c	New convert support for C++ NPU (#12430 ) * initial commit * fix * fix style * fix style * fix * fix	2024-11-22 14:28:30 +08:00
Ruonan Wang	2935e97610	small fix of cpp readme(#12425 )	2024-11-21 18:21:34 +08:00
Ruonan Wang	7288c759ce	Initial NPU C++ Example (#12417 ) * temp save * meet review, update * update * meet review, add license * typo	2024-11-21 10:09:26 +08:00
SONG Ge	ff3f7cb25f	Fix speech_paraformer issue with unexpected changes (#12416 ) * Fix speech_paraformer issue with unexpected changes * Add paraformer version specified	2024-11-19 15:01:20 +08:00
SONG Ge	d2cbcb060c	Add initial support for modeling_xlm encoder on NPU (#12393 ) * Add initial support for modeling_xlm encoder on NPU * Add EmbeddingModel class to keep the same usage with bce and npu fp16 linear convert * Optimize currently implementation to support EmbeddingModel.encode API and convert other torch modules to NPU * Add related example and documents	2024-11-14 10:50:27 +08:00
binbin Deng	7a97fbb779	Support vpm and resampler module of minicpm-v on NPU (#12375 )	2024-11-12 15:59:55 +08:00
Yina Chen	b2e69a896c	[NPU] Support Baichuan groupwise & gw code refactor (#12337 ) * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * baichuan part * update * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * baichuan part * update * update * update * baichuan support * code refactor * remove code * fix style * address comments * revert	2024-11-08 11:42:42 +08:00
binbin Deng	812d5cc32e	[NPU L0] Support llama3.2 in L0 pipeline (#12361 )	2024-11-08 10:01:23 +08:00
SONG Ge	a7b66683f1	[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU (#12339 ) * Add initial support for llama3.2-1b/3b * move llama3.2 support into current llama_mp impl	2024-11-06 19:21:40 +08:00
Yina Chen	d872639395	[NPU] Llama3, Qwen2 1.5b, MiniCPM 1/2B groupwise support (#12327 ) * support minicpm 1b & qwen 1.5b gw * support minicpm 1b * support minicpm 2b * fix style & error * fix style & update * remove print	2024-11-05 15:51:31 +08:00
Kai Huang	c8679ad592	Qwen layernorm as input (#12309 ) * qwen layernorm as input * add group size	2024-11-04 09:51:15 +08:00
binbin Deng	d409d9d0eb	[NPU L0] Update streaming mode of example (#12312 )	2024-11-01 15:38:10 +08:00
binbin Deng	eda764909c	Add minicpm-2b in L0 pipeline (#12308 )	2024-11-01 09:30:01 +08:00
binbin Deng	4892df61c9	Add qwen2-1.5b in l0 pipeline example (#12306 )	2024-10-31 16:44:25 +08:00
Kai Huang	416c19165c	Add Qwen pipeline and example (#12292 ) * support qwen pipeline * update error msg * style * meet review * minor	2024-10-31 11:25:25 +08:00
binbin Deng	41b8064554	Support minicpm-1B in level0 pipeline (#12297 )	2024-10-30 17:21:47 +08:00
Ruonan Wang	2b2cb9c693	[NPU pipeline] Support save & load and update examples (#12293 ) * support save & load, update llama examples * update baichuan2 example * update readme	2024-10-30 10:02:00 +08:00
binbin Deng	3feb58d1e4	Support baichuan2 for level0 pipeline (#12289 )	2024-10-29 19:24:16 +08:00
Yina Chen	4467645088	[NPU] Support l0 Llama groupwise (#12276 ) * except lm_head * remove * support gw lm_head * update * fix * remove run.bat * fix style * support llama3	2024-10-28 17:06:55 +08:00
Ruonan Wang	3fe2ea3081	[NPU] Reuse prefill of acc lib for pipeline (#12279 ) * first commit * update example * fix style * update example * embedding as const * fix generate * code refactor * meet code review * fix style * change max_output_len to max_context_len * fix all-in-one * fix example * add check for new tokens	2024-10-28 16:05:49 +08:00
binbin Deng	ec362e6133	Add llama3 level0 example (#12275 )	2024-10-28 09:24:51 +08:00
SONG Ge	a0c6432899	[NPU] Add support for loading a FunASR model (#12073 ) * add support for loading funasr model * add initial support for paraformer-encoder * add npu ops impl * add encoder-decoder npu pipeline * move paraformer encoders prefix 30 layers to npu and keep the rest layers on cpu	2024-10-25 17:22:01 +08:00
Ruonan Wang	854398f6e0	update example to reduce peak memory usage (#12274 )	2024-10-25 17:09:26 +08:00
Ruonan Wang	ae57e23e4f	fix incompatibility between llama GW & llama pipeline (#12267 ) * fix * fix	2024-10-25 10:31:44 +08:00
Ruonan Wang	821fd96367	Initial integrate our L0 Llama impl into ipex-llm (#12255 ) * temp save * initial support * fix * simplify code * fix style * fix example * make default value of pipeline as False	2024-10-24 09:49:27 +08:00
Jin, Qiao	8fa98e2742	Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" (#12245 ) * Remove qwen2-7b from npu example readme * fix	2024-10-22 17:07:51 +08:00
Ruonan Wang	4d93bb81fe	Initial support of NPU level0 Model (#12177 ) * first commit to support load dll and init llm pipeline * add init generate * fix style * small updates * fix style and check tokens number	2024-10-11 09:45:53 +08:00
Jin, Qiao	2bedb17be7	Add Qwen2.5 NPU Example (#12110 ) * Add Qwen2.5 NPU Example * fix * Merge qwen2.py and qwen2.5.py into qwen.py * Fix description	2024-09-25 15:20:03 +08:00
Yuwen Hu	828fa01ad3	[NPU] Add `mixed_precision` for Qwen2 7B (#12098 ) * Add mix_precision argument to control whether use INT8 lm_head for Qwen2-7B-Instruct * Small fix * Fixed on load low bit with mixed precision * Small fix * Update example accordingly * Update for default prompt * Update base on comments * Final fix	2024-09-20 16:36:21 +08:00
Ch1y0q	b4b8c3e495	add `lowbit_path` for `generate.py`, fix `npu_model` (#12077 ) * add `lowbit_path` for `generate.py`, fix `npu_model` * update `README.md`	2024-09-13 17:28:05 +08:00
Jinhe	e78e45ee01	update NPU readme: run conhost as administrator (#12066 )	2024-09-11 17:54:04 +08:00
Jinhe	4ca330da15	Fix NPU load error message and add minicpm npu lowbit feat (#12064 ) * fix npu_model raise sym_int4 error * add load_lowbit * remove print&perf	2024-09-11 16:56:35 +08:00
Zijie Li	c5fdfde1bd	fix npu-model prompt (#12057 )	2024-09-11 10:06:45 +08:00
Ch1y0q	73a4360f3f	update lowbit path for baichuan2, qwen2, `generate.py` (#12051 ) * update lowbit path for baichuan2, qwen2, `generate.py` * update readme	2024-09-10 15:35:24 +08:00
Yuwen Hu	f61b1785fb	Small update to NPU example readme (#12034 ) * Small update to NPU example readme * Small fix	2024-09-06 15:54:23 +08:00
Ruonan Wang	0d04531ae0	update NPU readme of Qwen2 (#12032 ) * update readme * update broadcast	2024-09-06 15:02:39 +08:00
binbin Deng	5b18bb3c4a	Add recommend version for mtl npu (#12024 )	2024-09-05 16:28:53 +08:00
Ch1y0q	820f8a4554	add `--lowbit-path` option for NPU llama example (#12020 ) * add option" `--lowbit-path` * add descriptions in `README.md` and formatting * Update llama.py	2024-09-05 15:31:01 +08:00
Ruonan Wang	79978e6f36	update npu multimodal readme (#11979 ) * update npu readme of multimodal * small fix * meet comment	2024-08-30 19:02:06 +08:00
Ruonan Wang	4811a490ef	small fix (#11978 ) * fix * meet comment	2024-08-30 17:55:15 +08:00
Ruonan Wang	573c20bae6	fix npu lm_head cpu condition (#11976 ) * fix * fix * fix * fix stype * fix style * fix style	2024-08-30 17:11:26 +08:00
Ruonan Wang	60aa1a2c0f	Initial NPU support for MiniCPM-V-2_6 (#11966 ) * initial pr * update npu model * fix * fix kv cache type * fix * small fix * fix style * fix model id * change inter_pp=4 * address comment * fix * fix style * fix * rebase	2024-08-30 16:34:35 +08:00
SONG Ge	158289d205	[NPU] Add initial support for minicpm-llama-v2.5 (#11962 ) * add initial support for minicpm-llama-v2.5 * update impl * add minicpm-llama3-v2.5 example	2024-08-30 16:00:33 +08:00
binbin Deng	cd077881f1	Disable lm head (#11972 )	2024-08-30 11:05:18 +08:00
Jason Dai	431affd0a0	Update README.md (#11964 )	2024-08-29 18:56:35 +08:00
binbin Deng	14b2c8dc32	Update qwen2-7b example script (#11961 )	2024-08-29 18:25:17 +08:00
Yina Chen	5f7ff76ea5	update troubleshooting (#11960 )	2024-08-29 17:44:22 +08:00
Yina Chen	882f4a5ff7	Add lnl npu driver recommend version and enable cpu_lm_head on llama3 (#11952 ) * update lnl npu driver version and enable cpu_lm_head on llama3 * update * fix style * typo * address comments * update * add qwen2-7b	2024-08-29 15:01:18 +08:00
binbin Deng	71f03dcc39	Support qwen2-7b with fused decoderlayer optimization on NPU (#11912 )	2024-08-29 13:34:20 +08:00

78 commits