Commit graph

15 commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Yishuo Wang | 3eeb02f1be | support Megrez-3B-Omni (#12582) | 2024-12-19 17:23:01 +08:00 |
| Yishuo Wang | 80f2fdc37b | optimize new minicpm model (#12579) | 2024-12-19 14:22:47 +08:00 |
| Yishuo Wang | 4540424271 | optimize siglip attention again (#12578) | 2024-12-19 13:40:48 +08:00 |
| Yishuo Wang | e0921f80c1 | padding mask on torch side (#12577) | 2024-12-19 10:53:02 +08:00 |
| Yishuo Wang | a608f26cc8 | use new fused layer norm (#12553) | 2024-12-17 13:52:35 +08:00 |
| Yishuo Wang | e0bf0054e1 | small fix (#12493) | 2024-12-04 16:37:39 +08:00 |
| Yishuo Wang | dc34e8c51f | optimize glm4v vision attention (#12369) | 2024-11-08 17:01:57 +08:00 |
| Yishuo Wang | 540eaeb12c | refactor attention_softmax (#12295) | 2024-10-30 13:20:50 +08:00 |
| Yishuo Wang | abc370728c | optimize minicpm3 again (#12047) | 2024-09-10 14:19:57 +08:00 |
| Yishuo Wang | 048b4590aa | add basic minicpm3 optimization (#12039) | 2024-09-09 17:25:08 +08:00 |
| Yishuo Wang | 828ab16537 | fix phi3 and minicpmv cpu (#11818) | 2024-08-15 17:43:29 +08:00 |
| Yishuo Wang | a1eb793f70 | optimize minicpm v 2_6 firs token perf (#11770) | 2024-08-13 09:51:18 +08:00 |
| Ruonan Wang | 7e917d6cfb | fix gptq of llama (#11749) (commit body: fix gptq of llama; small fix) | 2024-08-09 16:39:25 +08:00 |
| Yishuo Wang | c02003925b | add mlp for gemma2 (#11678) | 2024-07-29 16:10:23 +08:00 |
| Yishuo Wang | 7f88ce23cd | add more gemma2 optimization (#11673) | 2024-07-29 11:13:00 +08:00 |