* add initial flash attention for llama
* accelerate fp32 first token by changing to fp16 in advance
* support fp32
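The fp32 first-token change above can be sketched as follows. This is a minimal, hypothetical illustration (the function and weight names are not from the source): the idea is to cast fp32 tensors to fp16 once, ahead of time, so the first forward pass does not pay the conversion cost inline.

```python
import numpy as np

def preconvert_weights(weights):
    """Cast fp32 arrays to fp16 in advance (hypothetical helper;
    names are illustrative, not from the actual change)."""
    return {
        name: w.astype(np.float16) if w.dtype == np.float32 else w
        for name, w in weights.items()
    }

# Example: one fp32 projection weight, converted before the first token.
weights = {"q_proj": np.ones((4, 4), dtype=np.float32)}
fp16_weights = preconvert_weights(weights)
print(fp16_weights["q_proj"].dtype)  # float16
```

With the conversion done up front, the first-token latency no longer includes the fp32-to-fp16 cast.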