ipex-llm

Author	SHA1	Message	Date
binbin Deng	680ea7e4a8	[NPU doc] Update configuration for different platforms (#12554)	2024-12-17 10:15:09 +08:00
binbin Deng	ab01753b1c	[NPU] update save-load API usage (#12473)	2024-12-03 09:46:15 +08:00
Yuwen Hu	aee9acb303	Add NPU QuickStart & update example links (#12470) * Add initial NPU quickstart (c++ part unfinished) * Small update * Update based on comments * Update main readme * Remove LLaMA description * Small fix * Small fix * Remove subsection link in main README * Small fix * Update based on comments * Small fix * TOC update and other small fixes * Update for Chinese main readme * Update based on comments and other small fixes * Change order	2024-12-02 17:03:10 +08:00
SONG Ge	ff3f7cb25f	Fix speech_paraformer issue with unexpected changes (#12416) * Fix speech_paraformer issue with unexpected changes * Add paraformer version specified	2024-11-19 15:01:20 +08:00
SONG Ge	d2cbcb060c	Add initial support for modeling_xlm encoder on NPU (#12393) * Add initial support for modeling_xlm encoder on NPU * Add EmbeddingModel class to keep the same usage with bce and npu fp16 linear convert * Optimize currently implementation to support EmbeddingModel.encode API and convert other torch modules to NPU * Add related example and documents	2024-11-14 10:50:27 +08:00
binbin Deng	7a97fbb779	Support vpm and resampler module of minicpm-v on NPU (#12375)	2024-11-12 15:59:55 +08:00
Ruonan Wang	3fe2ea3081	[NPU] Reuse prefill of acc lib for pipeline (#12279) * first commit * update example * fix style * update example * embedding as const * fix generate * code refactor * meet code review * fix style * change max_output_len to max_context_len * fix all-in-one * fix example * add check for new tokens	2024-10-28 16:05:49 +08:00
SONG Ge	a0c6432899	[NPU] Add support for loading a FunASR model (#12073) * add support for loading funasr model * add initial support for paraformer-encoder * add npu ops impl * add encoder-decoder npu pipeline * move paraformer encoders prefix 30 layers to npu and keep the rest layers on cpu	2024-10-25 17:22:01 +08:00
Ch1y0q	b4b8c3e495	add `lowbit_path` for `generate.py`, fix `npu_model` (#12077) * add `lowbit_path` for `generate.py`, fix `npu_model` * update `README.md`	2024-09-13 17:28:05 +08:00
Yuwen Hu	f61b1785fb	Small update to NPU example readme (#12034) * Small update to NPU example readme * Small fix	2024-09-06 15:54:23 +08:00
Ruonan Wang	79978e6f36	update npu multimodal readme (#11979) * update npu readme of multimodal * small fix * meet comment	2024-08-30 19:02:06 +08:00
Ruonan Wang	4811a490ef	small fix (#11978) * fix * meet comment	2024-08-30 17:55:15 +08:00
Ruonan Wang	573c20bae6	fix npu lm_head cpu condition (#11976) * fix * fix * fix * fix stype * fix style * fix style	2024-08-30 17:11:26 +08:00
Ruonan Wang	60aa1a2c0f	Initial NPU support for MiniCPM-V-2_6 (#11966) * initial pr * update npu model * fix * fix kv cache type * fix * small fix * fix style * fix model id * change inter_pp=4 * address comment * fix * fix style * fix * rebase	2024-08-30 16:34:35 +08:00
SONG Ge	158289d205	[NPU] Add initial support for minicpm-llama-v2.5 (#11962) * add initial support for minicpm-llama-v2.5 * update impl * add minicpm-llama3-v2.5 example	2024-08-30 16:00:33 +08:00
Jin, Qiao	c28b3389e6	Update npu multimodal example (#11773)	2024-08-13 14:14:59 +08:00
Jin, Qiao	a44ab32153	Switch to conhost when running on NPU (#11687)	2024-07-30 17:08:06 +08:00
Zhao Changmin	06745e5742	Add npu benchmark all-in-one script (#11571) * npu benchmark	2024-07-15 10:42:37 +08:00
Zhao Changmin	105e124752	optimize phi3-v encoder npu performance and add multimodal example (#11553) * phi3-v * readme	2024-07-11 13:59:14 +08:00

19 commits