ipex-llm/python
Guancheng Fu daf536fb2d vLLM: Apply attention optimizations for selective batching (#9758)
* fuse_rope for prefill

* apply kv_cache optimizations

* apply fast_decoding_path

* Re-enable kv_cache optimizations for prefill

* reduce KV_CACHE_ALLOC_BLOCK for selective_batching
2023-12-25 10:29:31 +08:00
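
For context, here is a minimal sketch of the idea behind the last bullet of the commit message: the KV cache is grown in fixed-size blocks along the sequence axis, and a smaller block constant is used when selective batching is active, since each request then owns its own cache and a coarse granularity wastes memory. The constant name `KV_CACHE_ALLOC_BLOCK` comes from the commit; the values, the `alloc_kv_cache` helper, and its signature below are illustrative assumptions, not ipex-llm's actual implementation.

```python
# Sketch only: block-granular KV cache allocation, with a reduced block
# size under selective batching (all names and values are hypothetical).
import torch

KV_CACHE_ALLOC_BLOCK_DEFAULT = 256     # assumed default growth granularity
KV_CACHE_ALLOC_BLOCK_SELECTIVE = 64    # assumed reduced granularity

def alloc_kv_cache(batch, heads, cur_len, head_dim,
                   selective_batching, dtype=torch.float16):
    """Allocate K/V buffers with capacity rounded up to the block size."""
    block = (KV_CACHE_ALLOC_BLOCK_SELECTIVE if selective_batching
             else KV_CACHE_ALLOC_BLOCK_DEFAULT)
    capacity = ((cur_len + block - 1) // block) * block
    k = torch.empty(batch, heads, capacity, head_dim, dtype=dtype)
    v = torch.empty(batch, heads, capacity, head_dim, dtype=dtype)
    return k, v

# With selective batching each sequence gets its own small cache, so a
# 70-token prompt reserves 128 slots instead of 256.
k, v = alloc_kv_cache(1, 32, 70, 128, selective_batching=True)
print(k.shape)  # torch.Size([1, 32, 128, 128])
```

The design point the sketch illustrates: per-request caches (as in selective batching) make over-allocation per block multiply with batch size, so shrinking the block trades a few extra reallocations for a smaller memory footprint.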
llm/	vLLM: Apply attention optimizations for selective batching (#9758)	2023-12-25 10:29:31 +08:00