ipex-llm/python/llm/src/bigdl
Qiyuan Gong 9e18ea187f [LLM] Avoid KV Cache OOM when seq len is larger than 1 (#10006)
* Avoid OOM during multi-round streaming chat with kv cache.
* For llama-like kv cache, i.e., [bs, n_head, seq_len, head_dim], use is_enough_kv_cache_room_4_31.
* Other models need to compare kv cache size with kv_len (see the sketch after the file listing).
2024-01-26 17:30:08 +08:00
llm [LLM] Avoid KV Cache OOM when seq len is larger than 1 (#10006) 2024-01-26 17:30:08 +08:00
__init__.py LLM: add first round files (#8225) 2023-05-25 11:29:18 +08:00
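For context, below is a minimal sketch of the room check the commit describes, assuming PyTorch tensors and a cache buffer pre-allocated along the seq_len dimension. The helper name is_enough_kv_cache_room_4_31 and the [bs, n_head, seq_len, head_dim] layout come from the commit message itself; the function body and the example names (buf, cached, capacity) are illustrative assumptions, not bigdl-llm's actual implementation.

import torch

def is_enough_kv_cache_room_4_31(past_key_value, seq_len=1):
    """Return True if the cached key tensor's pre-allocated buffer can
    absorb seq_len more tokens without reallocation.

    Sketch only: assumes a llama-like cache [bs, n_head, seq_len, head_dim]
    over-allocated along dim 2, so stride(1) (elements reserved per head)
    exceeds the elements currently occupied, size(2) * size(3).
    """
    if past_key_value is None:
        return False
    key = past_key_value[0]
    # Room left per head = reserved capacity (stride over the head dim)
    # minus the elements already used by cached tokens.
    return key.stride(1) >= (key.size(2) + seq_len) * key.size(3)

# Usage: a buffer pre-allocated for 16 tokens with 10 already cached.
bs, n_head, head_dim, capacity, used = 1, 8, 64, 16, 10
buf = torch.empty(bs, n_head, capacity, head_dim)
cached = buf[:, :, :used, :]  # slicing keeps the buffer's strides
print(is_enough_kv_cache_room_4_31((cached, cached), seq_len=4))  # True: (10+4)*64 <= 1024
print(is_enough_kv_cache_room_4_31((cached, cached), seq_len=8))  # False: (10+8)*64 > 1024

When seq_len is larger than 1 (e.g. a multi-token prompt appended mid-conversation during streaming chat), the check must account for all incoming tokens at once rather than a single decode step, which is the OOM case the commit title refers to.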