ipex-llm/python
Ruonan Wang b63aae8a8e LLM: add flash attention support for llama (#9518)
* add initial flash attention support for llama
* accelerate the fp32 first token by converting to fp16 in advance
* support fp32
2023-11-23 18:40:18 +08:00
llm LLM: add flash attention support for llama (#9518) 2023-11-23 18:40:18 +08:00
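The commit above describes two optimizations for llama inference: routing attention through a flash-attention kernel, and speeding up the fp32 first-token (prefill) pass by casting to fp16 in advance. The snippet below is a minimal sketch of that idea, not the actual ipex-llm code; it assumes PyTorch's torch.nn.functional.scaled_dot_product_attention as the fused/flash attention entry point, and the function name llama_sdpa_attention plus the fp16-cast guard are illustrative.

```python
# Sketch only: illustrates the two ideas in commit b63aae8a8e, not the
# ipex-llm implementation itself.
import torch
import torch.nn.functional as F


def llama_sdpa_attention(query, key, value, is_causal=True):
    """Fused attention for (batch, heads, seq_len, head_dim) tensors.

    If the inputs are fp32 and live on an accelerator, they are temporarily
    cast to fp16 so a flash-attention kernel can be selected, then the result
    is cast back to the original dtype ("changing to fp16 in advance").
    """
    orig_dtype = query.dtype
    if orig_dtype == torch.float32 and query.device.type != "cpu":
        # hypothetical fp16 fast path for the fp32 first-token pass
        query, key, value = (t.to(torch.float16) for t in (query, key, value))

    # scaled_dot_product_attention dispatches to a flash-attention backend
    # when the inputs and device allow it, otherwise falls back to math.
    out = F.scaled_dot_product_attention(query, key, value, is_causal=is_causal)
    return out.to(orig_dtype)


if __name__ == "__main__":
    b, h, s, d = 1, 32, 128, 128           # batch, heads, seq_len, head_dim
    q = torch.randn(b, h, s, d)            # fp32 inputs, as in an fp32 llama
    k = torch.randn(b, h, s, d)
    v = torch.randn(b, h, s, d)
    print(llama_sdpa_attention(q, k, v).shape)  # torch.Size([1, 32, 128, 128])
```

On CPU the sketch keeps fp32 and lets the math backend run; on CUDA or XPU the fp16 cast lets the fused kernel handle the long prefill sequence, which is where the first-token speedup in the commit comes from.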