Yishuo Wang
f8ab833f74
support and optimize janus pro ( #12813 )
2025-02-12 15:07:24 +08:00
Yishuo Wang
73cfe293fa
add basic support for Baichuan-M1-14B-Instruct ( #12808 )
2025-02-11 17:27:42 +08:00
Xiangyu Tian
c9b6c94a59
vLLM: Update vLLM-cpu to v0.6.6-post1 ( #12728 )
2025-01-22 15:03:01 +08:00
Yishuo Wang
bda87c21eb
add support and optimization for minicpmo audio part ( #12716 )
2025-01-16 16:39:00 +08:00
Yishuo Wang
b62734748f
add support and optimization for minicpmo vision part ( #12713 )
2025-01-16 14:51:00 +08:00
Yuwen Hu
9d65dcd7ef
Fix deepseek coder with linear rope type support on GPU ( #12709 )
* Fix deepseek coder with linear rope type
* Style fix
* Move to optimize_pre
* Small fix
* Small fix
* Small fix to not affect other cases
* Style fixes
* Update function name
* Small fix
* Small fix
* Small fix
* Fix for low transformers version first
* Style fix
* Small fix
2025-01-15 21:12:34 +08:00
Yishuo Wang
68857494a5
refactor to simplify following upgrade 2 ( #12685 )
2025-01-10 09:29:03 +08:00
Yishuo Wang
1ec40cd09e
refactor to simplify following upgrade ( #12680 )
2025-01-09 13:34:30 +08:00
Yishuo Wang
a22a8c21bb
small fix and remove unused code about ipex ( #12671 )
2025-01-08 17:39:04 +08:00
Yishuo Wang
c11f5f0fcd
also convert SdpaAttention in optimize_model ( #12673 )
2025-01-08 16:48:03 +08:00
Yishuo Wang
ccf618ff4a
Remove all ipex usage ( #12666 )
2025-01-08 10:31:18 +08:00
Yishuo Wang
29ad5c449e
refactor codegeex to remove ipex kernel usage ( #12664 )
2025-01-07 16:17:40 +08:00
Yishuo Wang
ddc0ef3993
refactor device check and remove cohere/mixtral support ( #12659 )
2025-01-07 11:15:51 +08:00
Yishuo Wang
ea65e4fecc
remove falcon support and related UT ( #12656 )
2025-01-07 09:26:00 +08:00
Yishuo Wang
502461d836
remove unnecessary ipex kernel usage ( #12649 )
2025-01-03 16:45:24 +08:00
Yina Chen
8e5328e9b4
add disable opts for awq ( #12641 )
2025-01-02 15:45:22 +08:00
Yishuo Wang
2d08155513
remove bmm, which is only required in ipex 2.0 ( #12630 )
2024-12-27 17:28:57 +08:00
Yishuo Wang
c72a5db757
remove unused code again ( #12624 )
2024-12-27 14:17:11 +08:00
Yishuo Wang
1604b4ead8
small fix ( #12616 )
2024-12-26 11:35:12 +08:00
Yishuo Wang
6249c1e373
rewrite llama optimization ( #12609 )
2024-12-25 17:04:32 +08:00
Yishuo Wang
073f936c37
refactor mistral and phi3 ( #12605 )
2024-12-24 17:52:32 +08:00
Yishuo Wang
3eeb02f1be
support Megrez-3B-Omni ( #12582 )
2024-12-19 17:23:01 +08:00
Yishuo Wang
a608f26cc8
use new fused layer norm ( #12553 )
2024-12-17 13:52:35 +08:00
Yishuo Wang
ffce86d69f
add basic glm-edge-v support ( #12533 )
2024-12-12 17:25:48 +08:00
Yishuo Wang
3e0823d2ae
add basic glm-edge support ( #12531 )
2024-12-12 16:02:22 +08:00
Yishuo Wang
77404d2a63
support new model ( #12523 )
2024-12-11 13:41:15 +08:00
Yishuo Wang
a9e3f7f14c
optimize minicpm ( #12496 )
2024-12-04 17:14:16 +08:00
Yishuo Wang
6f3441ba4c
fix glm4-9b overflow ( #12455 )
2024-11-27 17:39:13 +08:00
Yishuo Wang
cdd41f5e4c
optimize sdxl again ( #12441 )
2024-11-25 17:46:46 +08:00
Yishuo Wang
8164aed802
small change ( #12439 )
2024-11-25 14:35:49 +08:00
Yishuo Wang
be132c4209
fix and optimize sd ( #12436 )
2024-11-25 14:09:48 +08:00
Yuwen Hu
e0918934c8
Add fused_mlp to glm4v models ( #12378 )
2024-11-11 17:10:25 +08:00
Yuwen Hu
1a6cbc473f
Add fused mlp optimizations to glm4 models ( #12360 )
* Add fused mlp to glm4 models
* Small fix
2024-11-07 18:52:47 +08:00
Yuwen Hu
872a74481a
Small optimization to glm4 models ( #12351 )
2024-11-06 19:16:58 +08:00
Yishuo Wang
e23ef7d088
optimize glm4v's vision part ( #12346 )
2024-11-06 15:43:40 +08:00
Yishuo Wang
c8b7265359
Add basic glm4v support ( #12345 )
2024-11-06 13:50:10 +08:00
Zhao Changmin
1b637e4477
Add chatglm2&3 fuse mlp ( #12328 )
* add chatglm fuse mlp
2024-11-04 18:04:41 +08:00
Xin Qiu
97a0f7fd35
Codegeex support ( #12303 )
* new codegeex attn
* use kv cache
* add compress/quantize kv
* remove compress/quantize kv
* fix style check
* fix style
* fix codegeex
2024-10-31 15:28:56 +08:00
Yuwen Hu
43b25a2fe7
Fix llama 3.2 vision on LNL ( #12264 )
* Fix llama 3.2 vision on LNL
* Small fix
2024-10-25 16:23:31 +08:00
Yishuo Wang
f3a2b20e6b
Optimize gpt2 ( #12259 )
2024-10-24 13:44:24 +08:00
Yuwen Hu
b3df47486d
Fix Gemma 2 on LNL ( #12240 )
* Fix gemma 2 on LNL
* Python style fix
2024-10-21 18:25:53 +08:00
Yishuo Wang
a4a758656a
refactor gemma to reduce old fuse rope usage ( #12215 )
2024-10-16 17:40:28 +08:00
Yishuo Wang
e279148aa0
optimize llama3.2 vision again ( #12211 )
2024-10-16 14:29:48 +08:00
Yishuo Wang
d5344587ab
optimize internvl2 vision model's attention ( #12198 )
2024-10-15 10:51:00 +08:00
Yuwen Hu
f8d1adc573
Fix Llama 3.2 & 3.1 on LNL ( #12196 )
2024-10-14 17:39:20 +08:00
Yishuo Wang
535bee5381
fix qwen2 vl again ( #12174 )
2024-10-10 13:50:01 +08:00
Yishuo Wang
78d253165d
optimize qwen2 vl perf again ( #12167 )
2024-10-09 16:43:48 +08:00
Yishuo Wang
644af2a76e
add basic llama 3.2 vision support ( #12163 )
2024-10-08 10:46:48 +08:00
Yishuo Wang
584c3489e7
add basic support for llama3.2 ( #12125 )
2024-09-26 15:46:19 +08:00
Yishuo Wang
77af9bc5fa
support passing None to low_bit in optimize_model ( #12121 )
2024-09-26 11:09:35 +08:00