Ruonan Wang
2f78afcd2a
Refactor some functions to ipex_llm.transformers.models.common (#13091)
* add quantize_linear & linear_forward
* add moe_group_topk
* rotary_two_with_cache_inplaced
* fix code style
* update related models
2025-04-18 11:15:43 +08:00

Yishuo Wang
39e360fe9d
add grouped topk optimization for moonlight (#12903)
2025-02-28 13:25:56 +08:00

Yishuo Wang
be1f073866
add fuse moe optimization for moonlight (#12898)
2025-02-27 09:15:24 +08:00

Yishuo Wang
5faba06409
simple optimization for moonlight moe decoding forward (#12891)
2025-02-25 16:18:27 +08:00

Yishuo Wang
ab3fc66eb7
optimize attention part of moonlight-14B-A3B (#12886)
2025-02-25 09:38:13 +08:00