ipex-llm

Author	SHA1	Message	Date
Jiao Wang	b4b6ddf73c	NPU Baichuan2 Multi-Process example (#11928)	2024-08-27 15:25:49 +08:00
SONG Ge	a81a329a5f	[NPU] Add example for NPU multi-processing minicpm-1b model (#11935) * add minicpm example	2024-08-27 14:57:46 +08:00
Yina Chen	e246f1e258	update llama3 npu example (#11933)	2024-08-27 13:03:18 +08:00
binbin Deng	14dddfc0d6	Update NPU example readme (#11931)	2024-08-27 12:44:58 +08:00
Zijie Li	6c3eb1e1e8	refactor from_pretrained API for NPU (#11927)	2024-08-27 09:50:30 +08:00
binbin Deng	dd303776cf	Add troubleshooting about transpose value setting	2024-08-26 16:06:32 +08:00
Zijie Li	794abe2ce8	update npu-readme (#11900)	2024-08-22 17:49:35 +08:00
binbin Deng	72a7bf624b	Support qwen2-1.5b with fused decoderlayer optimization on NPU (#11888)	2024-08-22 11:09:12 +08:00
SONG Ge	8c5c7f32dd	Update doc for running npu generate example with ipex-llm[npu] (#11876) * update doc for running npu generate example with ipex-llm[npu] * switch max_prompt_len to 512 to fix compile error on mtl	2024-08-21 13:45:29 +08:00
SONG Ge	5b83493b1a	Add ipex-llm npu option in setup.py (#11858) * add ipex-llm npu release * update example doc * meet latest release changes	2024-08-20 17:29:49 +08:00
SONG Ge	7380823f3f	Update Llama2 multi-processes example (#11852) * update llama2 multi-processes examples * update * update readme * update	2024-08-19 19:49:01 +08:00
Yang Wang	99b05ba1dc	separate prefill into a process (#11787) * separate prefill into a process * using model.share_memory() * might work * worked * use long prompt * refactor * cleanup * fix bug * clean up * changeable inter and intra process stages * refactor * add max output len * fix npu_model changes that may cause generate down * fix npu_model generate import error * fix generate forward error --------- Co-authored-by: sgwhat <ge.song@intel.com>	2024-08-19 17:53:36 +08:00
Yang Wang	51bcac1229	follow up on experimental support of fused decoder layer for llama2 (#11785) * clean up and support transpose value cache * refine * fix style * fix style	2024-08-13 18:53:55 -07:00
binbin Deng	23d3acdc77	Add experimental support of fused decoder layer for llama2 (#11768)	2024-08-13 14:41:36 +08:00
Jin, Qiao	05989ad0f9	Update npu example and all in one benchmark (#11766)	2024-08-12 16:46:46 +08:00
Jin, Qiao	a44ab32153	Switch to conhost when running on NPU (#11687)	2024-07-30 17:08:06 +08:00
Zhao Changmin	06745e5742	Add npu benchmark all-in-one script (#11571) * npu benchmark	2024-07-15 10:42:37 +08:00
Zhao Changmin	b9c66994a5	add npu sdp (#11562)	2024-07-11 16:57:35 +08:00
Zhao Changmin	3c16c9f725	Optimize baichuan on NPU (#11548) * baichuan_npu	2024-07-10 13:18:48 +08:00
Zhao Changmin	76a5802acf	update NPU examples (#11540) * update NPU examples	2024-07-09 17:19:42 +08:00

20 commits