ipex-llm/python
Guancheng Fu daf536fb2d vLLM: Apply attention optimizations for selective batching (#9758)
* fuse_rope for prefill

* apply kv_cache optimizations

* apply fast_decoding_path

* Re-enable kv_cache optimizations for prefill

* reduce KV_CACHE_ALLOC_BLOCK for selective_batching
2023-12-25 10:29:31 +08:00
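
For context, here is a minimal sketch of the idea behind the last bullet of the commit message: the KV cache is grown in fixed-size blocks along the sequence axis, and a smaller block constant is used when selective batching is active, since each request then owns its own cache and a coarse granularity wastes memory. The constant name `KV_CACHE_ALLOC_BLOCK` comes from the commit; the values, the `alloc_kv_cache` helper, and its signature below are illustrative assumptions, not ipex-llm's actual implementation.

```python
# Sketch only: block-granular KV cache allocation, with a reduced block
# size under selective batching (all names and values are hypothetical).
import torch

KV_CACHE_ALLOC_BLOCK_DEFAULT = 256     # assumed default growth granularity
KV_CACHE_ALLOC_BLOCK_SELECTIVE = 64    # assumed reduced granularity

def alloc_kv_cache(batch, heads, cur_len, head_dim,
                   selective_batching, dtype=torch.float16):
    """Allocate K/V buffers with capacity rounded up to the block size."""
    block = (KV_CACHE_ALLOC_BLOCK_SELECTIVE if selective_batching
             else KV_CACHE_ALLOC_BLOCK_DEFAULT)
    capacity = ((cur_len + block - 1) // block) * block
    k = torch.empty(batch, heads, capacity, head_dim, dtype=dtype)
    v = torch.empty(batch, heads, capacity, head_dim, dtype=dtype)
    return k, v

# With selective batching each sequence gets its own small cache, so a
# 70-token prompt reserves 128 slots instead of 256.
k, v = alloc_kv_cache(1, 32, 70, 128, selective_batching=True)
print(k.shape)  # torch.Size([1, 32, 128, 128])
```

The design point the sketch illustrates: per-request caches (as in selective batching) make over-allocation per block multiply with batch size, so shrinking the block trades a few extra reallocations for a smaller memory footprint.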
llm/	vLLM: Apply attention optimizations for selective batching (#9758)	2023-12-25 10:29:31 +08:00