ipex-llm

Author	SHA1	Message	Date
Yuwen Hu	381d448ee2	[NPU] Example & Quickstart updates (#12650) * Remove model with optimize_model=False in NPU verified models tables, and remove related example * Remove experimental in run optimized model section title * Unify model table order & example cmd * Move embedding example to separate folder & update quickstart example link * Add Quickstart reference in main NPU readme * Small fix * Small fix * Move save/load examples under NPU/HF-Transformers-AutoModels * Add low-bit and polish arguments for LLM Python examples * Small fix * Add low-bit and polish arguments for Multi-Model examples * Polish argument for Embedding models * Polish argument for LLM CPP examples * Add low-bit and polish argument for Save-Load examples * Add accuracy tuning tips for examples * Update NPU quickstart accuracy tuning with low-bit optimizations * Add save/load section to quickstart * Update CPP example sample output to EN * Add installation regarding cmake for CPP examples * Small fix * Small fix * Small fix * Small fix * Small fix * Small fix * Unify max prompt length to 512 * Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4 * Update based on comments * Small fix	2025-01-07 13:52:41 +08:00
Chu,Youcheng	ffa9a9e1b3	Update streaming in npu examples (#12495) * feat: add streaming * Update readme accordingly --------- Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>	2024-12-04 17:51:10 +08:00
Jin, Qiao	7082844f3f	Fix NPU LLM example save/load tokenizer (#12485)	2024-12-03 16:30:55 +08:00
binbin Deng	ab01753b1c	[NPU] update save-load API usage (#12473)	2024-12-03 09:46:15 +08:00
binbin Deng	c911026f03	[NPU C++] Update model support & examples & benchmark (#12466)	2024-11-29 13:35:58 +08:00
Yina Chen	e246f1e258	update llama3 npu example (#11933)	2024-08-27 13:03:18 +08:00
Zijie Li	6c3eb1e1e8	refactor from_pretrained API for NPU (#11927)	2024-08-27 09:50:30 +08:00
SONG Ge	8c5c7f32dd	Update doc for running npu generate example with ipex-llm[npu] (#11876) * update doc for running npu generate example with ipex-llm[npu] * switch max_prompt_len to 512 to fix compile error on mtl	2024-08-21 13:45:29 +08:00
SONG Ge	7380823f3f	Update Llama2 multi-processes example (#11852) * update llama2 multi-processes examples * update * update readme * update	2024-08-19 19:49:01 +08:00
Yang Wang	99b05ba1dc	separate prefill into a process (#11787) * separate prefill into a process * using model.share_memory() * might work * worked * use long prompt * refactor * cleanup * fix bug * clean up * changeable inter and intra process stages * refactor * add max output len * fix npu_model changes that may cause generate down * fix npu_model generate import error * fix generate forward error --------- Co-authored-by: sgwhat <ge.song@intel.com>	2024-08-19 17:53:36 +08:00
Yang Wang	51bcac1229	follow up on experimental support of fused decoder layer for llama2 (#11785) * clean up and support transpose value cache * refine * fix style * fix style	2024-08-13 18:53:55 -07:00
binbin Deng	23d3acdc77	Add experimental support of fused decoder layer for llama2 (#11768)	2024-08-13 14:41:36 +08:00

12 commits