ipex-llm/python/llm/src/bigdl
Qiyuan Gong 9e18ea187f [LLM] Avoid KV Cache OOM when seq len is larger than 1 (#10006)
* Avoid OOM during multi-round streaming chat with kv cache.
* For llama-like kv cache, i.e., [bs, n_head, seq_len, head_dim], use is_enough_kv_cache_room_4_31.
* Other models need to compare kv cache size with kv_len (see the sketch after the file listing).
2024-01-26 17:30:08 +08:00
llm [LLM] Avoid KV Cache OOM when seq len is larger than 1 (#10006) 2024-01-26 17:30:08 +08:00
__init__.py LLM: add first round files (#8225) 2023-05-25 11:29:18 +08:00
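For context, below is a minimal sketch of the room check the commit describes, assuming PyTorch tensors and a cache buffer pre-allocated along the seq_len dimension. The helper name is_enough_kv_cache_room_4_31 and the [bs, n_head, seq_len, head_dim] layout come from the commit message itself; the function body and the example names (buf, cached, capacity) are illustrative assumptions, not bigdl-llm's actual implementation.

import torch

def is_enough_kv_cache_room_4_31(past_key_value, seq_len=1):
    """Return True if the cached key tensor's pre-allocated buffer can
    absorb seq_len more tokens without reallocation.

    Sketch only: assumes a llama-like cache [bs, n_head, seq_len, head_dim]
    over-allocated along dim 2, so stride(1) (elements reserved per head)
    exceeds the elements currently occupied, size(2) * size(3).
    """
    if past_key_value is None:
        return False
    key = past_key_value[0]
    # Room left per head = reserved capacity (stride over the head dim)
    # minus the elements already used by cached tokens.
    return key.stride(1) >= (key.size(2) + seq_len) * key.size(3)

# Usage: a buffer pre-allocated for 16 tokens with 10 already cached.
bs, n_head, head_dim, capacity, used = 1, 8, 64, 16, 10
buf = torch.empty(bs, n_head, capacity, head_dim)
cached = buf[:, :, :used, :]  # slicing keeps the buffer's strides
print(is_enough_kv_cache_room_4_31((cached, cached), seq_len=4))  # True: (10+4)*64 <= 1024
print(is_enough_kv_cache_room_4_31((cached, cached), seq_len=8))  # False: (10+8)*64 > 1024

When seq_len is larger than 1 (e.g. a multi-token prompt appended mid-conversation during streaming chat), the check must account for all incoming tokens at once rather than a single decode step, which is the OOM case the commit title refers to.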