Chu, Youcheng
ffa9a9e1b3
Update streaming in npu examples ( #12495 )
* feat: add streaming
* Update readme accordingly
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-04 17:51:10 +08:00
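For reference, streaming in these examples follows the standard Hugging Face pattern: pass a `TextStreamer` to `generate` so tokens print as they are produced. A minimal sketch, assuming a plain transformers model (the actual examples load through ipex-llm's NPU `AutoModelForCausalLM` instead); the model id and prompt are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("What is an NPU?", return_tensors="pt")
# TextStreamer prints each decoded token to stdout as soon as it is
# generated, instead of waiting for the full completion.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**inputs, max_new_tokens=64, streamer=streamer)
```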
Jin, Qiao
7082844f3f
Fix NPU LLM example save/load tokenizer ( #12485 )
2024-12-03 16:30:55 +08:00
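The fix presumably concerns the usual save/load round trip: the tokenizer must be saved next to the converted model so a later load reads it from disk rather than from the original hub repo. A minimal sketch (path and model id are illustrative):

```python
from transformers import AutoTokenizer

save_dir = "./npu-model-saved"  # illustrative path
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer.save_pretrained(save_dir)  # save alongside the converted model

# later, e.g. in the load script:
tokenizer = AutoTokenizer.from_pretrained(save_dir)
```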
binbin Deng
ab01753b1c
[NPU] update save-load API usage ( #12473 )
2024-12-03 09:46:15 +08:00
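A sketch of the save/load flow these examples use, assuming ipex-llm's low-bit `save_low_bit` / `load_low_bit` API; the exact NPU-specific arguments may differ by version, so treat this as illustrative rather than the updated API itself:

```python
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

save_dir = "./llama2-npu-sym-int4"  # illustrative path

# one-time conversion: quantize for the NPU, then save the low-bit weights
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
)
model.save_low_bit(save_dir)

# later runs skip the conversion and load the saved low-bit weights directly
model = AutoModelForCausalLM.load_low_bit(save_dir)
```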
binbin Deng
c911026f03
[NPU C++] Update model support & examples & benchmark ( #12466 )
2024-11-29 13:35:58 +08:00
Yina Chen
e246f1e258
update llama3 npu example ( #11933 )
2024-08-27 13:03:18 +08:00
Zijie Li
6c3eb1e1e8
refactor from_pretrained API for NPU ( #11927 )
2024-08-27 09:50:30 +08:00
SONG Ge
8c5c7f32dd
Update doc for running npu generate example with ipex-llm[npu] ( #11876 )
* update doc for running npu generate example with ipex-llm[npu]
* switch max_prompt_len to 512 to fix compile error on MTL
2024-08-21 13:45:29 +08:00
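A sketch of the documented generate flow, assuming ipex-llm is installed with the `npu` extra (`pip install --pre --upgrade ipex-llm[npu]`); parameter names follow the public NPU examples (`max_prompt_len` is the one this commit pins to 512 for Meteor Lake) and may vary across versions:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"  # illustrative model id
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_low_bit="sym_int4",  # low-bit weights for the NPU
    max_prompt_len=512,          # the value this commit switches to for MTL
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("What is AI?", return_tensors="pt")
with torch.inference_mode():
    output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```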
SONG Ge
7380823f3f
Update Llama2 multi-process example ( #11852 )
* update llama2 multi-process examples
* update readme
2024-08-19 19:49:01 +08:00
Yang Wang
99b05ba1dc
separate prefill into a process ( #11787 )
* separate prefill into a process
* using model.share_memory()
* might work
* worked
* use long prompt
* refactor
* cleanup
* fix bug
* clean up
* changeable inter- and intra-process stages
* refactor
* add max output len
* fix npu_model changes that may break generate
* fix npu_model generate import error
* fix generate forward error
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2024-08-19 17:53:36 +08:00
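The idea in this commit is to run the expensive long-prompt prefill pass in its own process, with the weights placed in shared memory via `model.share_memory()` so the child does not copy them. A minimal sketch of that pattern with a stand-in model; the queue protocol and all names are illustrative, not the example's exact code:

```python
import torch
import torch.multiprocessing as mp
from torch import nn

class TinyLM(nn.Module):
    # Stand-in for the real llama2 model; just enough to show the pattern.
    def __init__(self, vocab=100, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.head = nn.Linear(dim, vocab)

    def forward(self, input_ids):
        h = self.emb(input_ids).mean(dim=1, keepdim=True)
        return self.head(h)

def prefill_worker(model, prompt_q, result_q):
    # Child process: run the prefill pass and send back the logits the
    # decode stage needs to pick the first generated token.
    while True:
        input_ids = prompt_q.get()
        if input_ids is None:  # sentinel: shut down
            break
        with torch.no_grad():
            result_q.put(model(input_ids))

if __name__ == "__main__":
    mp.set_start_method("spawn")
    model = TinyLM()
    model.share_memory()  # share weights with the child instead of copying
    prompt_q, result_q = mp.Queue(), mp.Queue()
    worker = mp.Process(target=prefill_worker, args=(model, prompt_q, result_q))
    worker.start()

    prompt_q.put(torch.randint(0, 100, (1, 16)))  # a "long prompt"
    logits = result_q.get()   # decode stage continues from here
    prompt_q.put(None)
    worker.join()
```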
Yang Wang
51bcac1229
follow up on experimental support of fused decoder layer for llama2 ( #11785 )
* clean up and support transpose value cache
* refine
* fix style
* fix style
2024-08-13 18:53:55 -07:00
binbin Deng
23d3acdc77
Add experimental support of fused decoder layer for llama2 ( #11768 )
2024-08-13 14:41:36 +08:00