ipex-llm

Author	SHA1	Message	Date
Wang, Jian4	9df70d95eb	Refactor bigdl.llm to ipex_llm (#24) * Rename bigdl/llm to ipex_llm * rm python/llm/src/bigdl * from bigdl.llm to from ipex_llm	2024-03-22 15:41:21 +08:00
Heyang Sun	36a9e88104	Speculative Starcoder on CPU (#10138) * Speculative Starcoder on CPU * enable kv-cache pre-allocation * refine codes * refine * fix style * fix style * fix style * refine * refine * Update speculative.py * Update gptbigcode.py * fix style * Update speculative.py * enable mixed-datatype layernorm on top of torch API * adaptive dtype * Update README.md	2024-02-27 09:57:29 +08:00
Qiyuan Gong	9e18ea187f	[LLM] Avoid KV Cache OOM when seq len is larger than 1 (#10006) * Avoid OOM during multi-round streaming chat with kv cache * For llama-like kv cache, i.e., [bs, n_head, seq_len, head_dim], use is_enough_kv_cache_room_4_31. * Other models need to compare kv cache size with kv_len.	2024-01-26 17:30:08 +08:00
Yishuo Wang	7bbb98abb6	Disable fused layer norm when using XMX to fix mpt UT (#9933)	2024-01-18 16:22:12 +08:00
Xin Qiu	320110d158	handle empty fused norm result (#9688) * handle empty fused norm result * remove fast_rms_norm * fix style	2023-12-18 09:56:11 +08:00
Xin Qiu	82255f9726	Enable fused layernorm (#9614) * bloom layernorm * fix * layernorm * fix * fix * fix * style fix * fix * replace nn.LayerNorm	2023-12-11 09:26:13 +08:00
Ruonan Wang	b943d73844	LLM: refactor kv cache (#9030) * refactor utils * meet code review; update all models * small fix	2023-09-21 21:28:03 +08:00
Cengguang Zhang	868511cf02	LLM: fix kv cache issue of bloom and falcon. (#9029)	2023-09-21 18:12:20 +08:00
Ruonan Wang	bf51ec40b2	LLM: Fix empty cache (#9024) * fix * fix * update example	2023-09-21 17:16:07 +08:00
Cengguang Zhang	b3cad7de57	LLM: add bloom kv cache support (#9012) * LLM: add bloom kv cache support * fix style.	2023-09-20 21:10:53 +08:00

10 commits