Ruonan Wang
2f78afcd2a
Refactor some functions to ipex_llm.transformers.models.common ( #13091 )
* add quantize_linear & linear_forward
* add moe_group_topk
* rotary_two_with_cache_inplaced
* fix code style
* update related models
2025-04-18 11:15:43 +08:00
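The `moe_group_topk` helper added above presumably implements group-limited expert routing for MoE models (the scheme used by DeepSeek-style routers): experts are split into groups, only the best-scoring groups are kept, and the top-k experts are chosen from those. A minimal sketch of that idea, with illustrative names and no relation to the actual ipex_llm API:

```python
def group_topk(scores, n_groups, topk_groups, topk):
    """Group-limited top-k routing sketch.

    scores: flat list of per-expert router scores, split evenly into
    n_groups groups. Keep only the topk_groups best groups (ranked by
    their max score), then return indices of the global top-k experts
    among the surviving groups.
    """
    group_size = len(scores) // n_groups
    # rank groups by their best expert score
    group_best = [(max(scores[g * group_size:(g + 1) * group_size]), g)
                  for g in range(n_groups)]
    kept = {g for _, g in sorted(group_best, reverse=True)[:topk_groups]}
    # drop experts in discarded groups, then take the global top-k
    candidates = [(s, i) for i, s in enumerate(scores)
                  if i // group_size in kept]
    return [i for _, i in sorted(candidates, reverse=True)[:topk]]
```

With `topk_groups` equal to `n_groups` this degrades to plain top-k; restricting groups is what makes the kernel cheaper for models like Moonlight.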
Ruonan Wang
e08c6bd018
Fix several models based on sdp api change ( #13075 )
* fix baichuan based on sdp api change
* fix several models based on api change
* fix style
2025-04-15 11:13:12 +08:00
Ruonan Wang
6693e8ab04
Deepseek kv / sdp support ( #13068 )
* update kv
* fix
* fix style
2025-04-11 11:26:15 +08:00
Yishuo Wang
ef852dcb4a
add audio optimization for qwen2.5-omni ( #13037 )
2025-04-07 17:20:26 +08:00
Yishuo Wang
300eb01d98
Add basic optimization for Qwen2.5 omni ( #13022 )
2025-03-28 17:21:52 +08:00
Yuwen Hu
374747b492
Update bert optimization to fit higher transformers/torch version ( #13006 )
2025-03-25 16:12:03 +08:00
Yishuo Wang
39e360fe9d
add grouped topk optimization for moonlight ( #12903 )
2025-02-28 13:25:56 +08:00
Xin Qiu
e946127613
glm 4v 1st sdp for vision ( #12904 )
* glm4v 1st sdp
* update glm4v example
* meet code review
* fix style
2025-02-28 13:23:27 +08:00
Yishuo Wang
be1f073866
add fuse moe optimization for moonlight ( #12898 )
2025-02-27 09:15:24 +08:00
Yishuo Wang
5faba06409
simple optimization for moonlight moe decoding forward ( #12891 )
2025-02-25 16:18:27 +08:00
Yishuo Wang
ab3fc66eb7
optimize attention part of moonlight-14B-A3B ( #12886 )
2025-02-25 09:38:13 +08:00
Yishuo Wang
aee2db30f9
update sdp support ( #12847 )
2025-02-19 12:07:00 +08:00
Xiangyu Tian
93c10be762
LLM: Support hybrid convert for DeepSeek V3/R1 ( #12834 )
2025-02-19 11:31:19 +08:00
Yishuo Wang
f8ab833f74
support and optimize janus pro ( #12813 )
2025-02-12 15:07:24 +08:00
Yishuo Wang
73cfe293fa
add basic support for Baichuan-M1-14B-Instruct ( #12808 )
2025-02-11 17:27:42 +08:00
Yishuo Wang
e4ceb722b6
fix qwen2 vl ( #12798 )
2025-02-10 13:25:53 +08:00
Yishuo Wang
0237ffb302
refactor xpu linear forward ( #12768 )
2025-02-05 17:40:38 +08:00
Yishuo Wang
bda87c21eb
add support and optimization for minicpmo audio part ( #12716 )
2025-01-16 16:39:00 +08:00
Yuwen Hu
9d65dcd7ef
Fix deepseek coder with linear rope type support on GPU ( #12709 )
* Fix deepseek coder with linear rope type
* Style fix
* Move to optimize_pre
* Small fix
* Small fix
* Small fix to not affect other cases
* Style fixes
* Update function name
* Small fix
* Small fix
* Small fix
* Fix for low transformers version first
* Style fix
* Small fix
2025-01-15 21:12:34 +08:00
Cengguang Zhang
9930351112
LLM: add new qtype woq_int4 to temporarily support gemm int4. ( #12706 )
This PR adds a temporary qtype, woq_int4, to avoid affecting other qtypes and models.
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2025-01-15 14:41:33 +08:00
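The woq_int4 qtype above is a weight-only int4 format for gemm. As a rough illustration of what weight-only int4 quantization means (this is the general technique, not the ipex_llm kernel): each weight row is mapped to 4-bit integers plus a floating-point scale, while activations stay in floating point.

```python
def quantize_row_int4(row):
    """Per-row symmetric int4 quantization sketch.

    Returns (qvals, scale) with qvals in [-8, 7] so that
    row[i] is approximately qvals[i] * scale.
    """
    amax = max(abs(x) for x in row) or 1.0  # guard against an all-zero row
    scale = amax / 7.0
    q = [max(-8, min(7, round(x / scale))) for x in row]
    return q, scale

def dequantize_row(q, scale):
    """Recover approximate float weights from int4 values and scale."""
    return [v * scale for v in q]
```

At matmul time the int4 weights are dequantized (or the scale is folded into the accumulation), which cuts weight memory roughly 4x versus fp16 at some accuracy cost.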
Yishuo Wang
68857494a5
refactor to simplify following upgrade 2 ( #12685 )
2025-01-10 09:29:03 +08:00
Yishuo Wang
7234c9b27b
update quantize kv cache condition ( #12681 )
2025-01-09 15:23:04 +08:00
Yishuo Wang
1ec40cd09e
refactor to simplify following upgrade ( #12680 )
2025-01-09 13:34:30 +08:00
Yishuo Wang
a22a8c21bb
small fix and remove unused code about ipex ( #12671 )
2025-01-08 17:39:04 +08:00
Yishuo Wang
c11f5f0fcd
also convert SdpaAttention in optimize_model ( #12673 )
2025-01-08 16:48:03 +08:00
Yishuo Wang
ccf618ff4a
Remove all ipex usage ( #12666 )
2025-01-08 10:31:18 +08:00
Yishuo Wang
29ad5c449e
refactor codegeex to remove ipex kernel usage ( #12664 )
2025-01-07 16:17:40 +08:00
Yishuo Wang
ddc0ef3993
refactor device check and remove cohere/mixtral support ( #12659 )
2025-01-07 11:15:51 +08:00
Yishuo Wang
ea65e4fecc
remove falcon support and related UT ( #12656 )
2025-01-07 09:26:00 +08:00
Yishuo Wang
502461d836
remove unnecessary ipex kernel usage ( #12649 )
2025-01-03 16:45:24 +08:00
Yishuo Wang
81211fd010
remove unused code ( #12635 )
2025-01-02 13:31:09 +08:00
Yishuo Wang
f289f68d57
small fix ( #12634 )
2024-12-30 17:14:25 +08:00
Yishuo Wang
c72a5db757
remove unused code again ( #12624 )
2024-12-27 14:17:11 +08:00
Yishuo Wang
34dbdb8ee3
small fix ( #12623 )
2024-12-27 10:19:27 +08:00
Yishuo Wang
a9abde0b5d
support passing attn_scale to sdpa ( #12619 )
2024-12-26 16:58:09 +08:00
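The attn_scale change above lets callers override the softmax scaling factor in scaled dot-product attention instead of always using the default 1/sqrt(head_dim). A dependency-free sketch of the computation (illustrative only; the real code path would use an optimized sdpa kernel):

```python
import math

def sdpa(q, k, v, scale=None):
    """Scaled dot-product attention over plain lists.

    q, k, v: seq_len x head_dim lists of vectors. If scale is None,
    fall back to the conventional 1/sqrt(head_dim).
    """
    d = len(q[0])
    scale = scale if scale is not None else 1.0 / math.sqrt(d)
    out = []
    for qi in q:
        # scaled attention scores of this query against all keys
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        m = max(scores)  # subtract max for numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # weighted sum of value vectors
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out
```

Passing an explicit scale matters for models whose attention uses a non-standard scaling (for example a tuned or per-layer factor) rather than the head-dim default.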
Yishuo Wang
1604b4ead8
small fix ( #12616 )
2024-12-26 11:35:12 +08:00
Yishuo Wang
6249c1e373
rewrite llama optimization ( #12609 )
2024-12-25 17:04:32 +08:00
Yishuo Wang
5f5ac8a856
fix llama related import ( #12611 )
2024-12-25 16:23:52 +08:00
Yishuo Wang
4e6b9d804f
add compresskv back for mistral ( #12607 )
* add compresskv back for mistral
* fix
* fix
2024-12-25 11:06:08 +08:00
Yishuo Wang
4135b895b3
refactor chatglm2, internlm, stablelm and qwen ( #12604 )
2024-12-24 18:18:00 +08:00
Yishuo Wang
073f936c37
refactor mistral and phi3 ( #12605 )
2024-12-24 17:52:32 +08:00
Yishuo Wang
ad2dc965c5
refactor mllama, gpt2 and internvl ( #12602 )
2024-12-24 14:18:31 +08:00
Yishuo Wang
7aaf02f602
refactor baichuan, glm4 and minicpm3 ( #12600 )
2024-12-24 14:16:30 +08:00
Yishuo Wang
098eb335b2
refactor sd 1.5 and qwen2-vl and fix ( #12590 )
2024-12-20 17:34:55 +08:00
Yishuo Wang
b050368efc
refactor yuan2 and starcoder2 and fix ( #12589 )
2024-12-20 16:41:50 +08:00
Yishuo Wang
6ea8033635
refactor glm edge ( #12588 )
2024-12-20 15:36:57 +08:00
Yishuo Wang
f3b5fad3be
refactor qwen2 and llama3 ( #12587 )
2024-12-20 13:25:25 +08:00
Yishuo Wang
3eeb02f1be
support Megrez-3B-Omni ( #12582 )
2024-12-19 17:23:01 +08:00
Yishuo Wang
80f2fdc37b
optimize new minicpm model ( #12579 )
2024-12-19 14:22:47 +08:00
Yishuo Wang
4540424271
optimize siglip attention again ( #12578 )
2024-12-19 13:40:48 +08:00