ipex-llm

Author	SHA1	Message	Date
Yuwen Hu	ef4028ac2d	[NPU] Support split `lm_head` for Qwen2 with CPP (#12491) * Use split for Qwen2 lm_head instead of slice in optimize_pre * Support split lm_head in Qwen2 python cpp backend * Fit with Python acc lib pipeline * Removed default mixed_precision=True in all-in-one and related examples * Small fix * Style fix * Fix based on comments * Fix based on comments * Stype fix	2024-12-04 14:41:08 +08:00
Jin, Qiao	7082844f3f	Fix NPU LLM example save/load tokenizer (#12485)	2024-12-03 16:30:55 +08:00
binbin Deng	ab01753b1c	[NPU] update save-load API usage (#12473)	2024-12-03 09:46:15 +08:00
Ruonan Wang	0e23bd779f	Add support of llama3.2 for NPU C++ (#12442) * initial support of llama3.2 * update * update * fix style * fix style * fix * small fix	2024-11-26 09:26:55 +08:00
Ruonan Wang	b9abb8a285	Support qwen2.5 3B for NPU & update related examples (#12438) * update qwen2.5-3B * update convert * small fix * replace load_in_low_bit with low_bit * small fix	2024-11-25 16:38:31 +08:00
Kai Huang	c8679ad592	Qwen layernorm as input (#12309) * qwen layernorm as input * add group size	2024-11-04 09:51:15 +08:00
binbin Deng	d409d9d0eb	[NPU L0] Update streaming mode of example (#12312)	2024-11-01 15:38:10 +08:00
binbin Deng	4892df61c9	Add qwen2-1.5b in l0 pipeline example (#12306)	2024-10-31 16:44:25 +08:00
Kai Huang	416c19165c	Add Qwen pipeline and example (#12292) * support qwen pipeline * update error msg * style * meet review * minor	2024-10-31 11:25:25 +08:00