ipex-llm/python
Yang Wang cbeae97a26 Optimize Llama Attention to reduce KV cache memory copy (#8580)
* Optimize llama attention to reduce KV cache memory copy

* fix bug

* fix style

* remove git

* fix style

* fix style

* fix style

* fix tests

* move llama attention to another file

* revert

* fix style

* remove jit

* fix
2023-08-01 16:37:58 -07:00
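The actual change for #8580 lives in the llama attention code under python/llm. As a minimal sketch of the general technique the commit title names: rather than torch.cat-ing past and new key/value states on every decoding step (which copies the entire cache each time), pre-allocate a larger buffer once and write only the new token's states in place. The helper names and buffer layout below (init_kv_cache, append_kv) are illustrative assumptions, not the ipex-llm API.

    import torch

    def init_kv_cache(batch, num_heads, head_dim, max_len, dtype, device):
        # Allocate the full-capacity cache once; the filled length is
        # tracked separately instead of growing the tensor per step.
        k_cache = torch.empty(batch, num_heads, max_len, head_dim,
                              dtype=dtype, device=device)
        v_cache = torch.empty_like(k_cache)
        return k_cache, v_cache

    def append_kv(k_cache, v_cache, cur_len, new_k, new_v):
        # Copy only the new token(s) into the pre-allocated buffer:
        # O(new tokens) memory traffic per step instead of copying the
        # whole past cache, as torch.cat([past_k, new_k], dim=2) would.
        n = new_k.size(2)
        k_cache[:, :, cur_len:cur_len + n, :] = new_k
        v_cache[:, :, cur_len:cur_len + n, :] = new_v
        cur_len += n
        # Views over the filled region act as past_key_value for attention.
        return k_cache[:, :, :cur_len, :], v_cache[:, :, :cur_len, :], cur_len

When the pre-allocated buffer fills up, a real implementation would allocate a larger one and copy the old contents over once, amortizing that cost across many decoding steps.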
..
llm Optimize Llama Attention to reduce KV cache memory copy (#8580) 2023-08-01 16:37:58 -07:00