ipex-llm/python
Wang, Jian4 191b184341
LLM: Optimize cohere model (#10878)
* use mlp and rms

* optimize kv_cache

* add fuse qkv

* add flash attention and fp16 sdp

* error fp8 sdp

* fix optimized

* fix style

* update

* add for pp
2024-05-07 10:19:50 +08:00
llm LLM: Optimize cohere model (#10878) 2024-05-07 10:19:50 +08:00