5 commits

Author: Ruonan Wang
SHA1: 2f78afcd2a
Date: 2025-04-18 11:15:43 +08:00
Message: Refactor some functions to ipex_llm.transformers.models.common (#13091)
* add quantize_linear & linear_forward
* add moe_group_topk
* rotary_two_with_cache_inplaced
* fix code style
* update related models

Author: Yishuo Wang
SHA1: 39e360fe9d
Date: 2025-02-28 13:25:56 +08:00
Message: add grouped topk optimization for moonlight (#12903)

Author: Yishuo Wang
SHA1: be1f073866
Date: 2025-02-27 09:15:24 +08:00
Message: add fuse moe optimization for moonlight (#12898)

Author: Yishuo Wang
SHA1: 5faba06409
Date: 2025-02-25 16:18:27 +08:00
Message: simple optimization for moonlight moe decoding forward (#12891)

Author: Yishuo Wang
SHA1: ab3fc66eb7
Date: 2025-02-25 09:38:13 +08:00
Message: optimize attention part of moonlight-14B-A3B (#12886)