Commit graph

114 commits

Author SHA1 Message Date
Xin Qiu
dbc3c2d72d
glm4 sdp (#11253)
* glm4 sdp

* fix style

* update comment
2024-06-07 15:42:23 +08:00
Xin Qiu
151fcf37bb
check device name in use_flash_attention (#11263) 2024-06-07 15:07:47 +08:00
Yishuo Wang
2623944604
qwen2 sdpa small fix (#11261) 2024-06-07 14:42:18 +08:00
Yishuo Wang
ea0d03fd28
Refactor baichuan1 7B and 13B (#11258) 2024-06-07 14:29:20 +08:00
Yishuo Wang
ef8e9b2ecd
Refactor qwen2 moe (#11244) 2024-06-07 13:14:54 +08:00
Xin Qiu
2f809116e2
optimize Chatglm4 (#11239)
* chatglm4

* update

* update

* add rms norm

* chatglm4
2024-06-06 18:25:20 +08:00
Yishuo Wang
2e4ccd541c
fix qwen2 cpu (#11240) 2024-06-06 16:24:19 +08:00
Yishuo Wang
e738ec38f4
disable quantize kv in specific qwen model (#11238) 2024-06-06 14:08:39 +08:00
Yishuo Wang
c4e5806e01
add latest optimization in starcoder2 (#11236) 2024-06-06 14:02:17 +08:00
Yishuo Wang
ba27e750b1
refactor yuan2 (#11235) 2024-06-06 13:17:54 +08:00
binbin Deng
a6674f5bce
Fix should_use_fuse_rope error of Qwen1.5-MoE-A2.7B-Chat (#11216) 2024-06-05 15:56:10 +08:00
Xin Qiu
566691c5a3
quantized attention forward for minicpm (#11200)
* quantized minicpm

* fix style check
2024-06-05 09:15:25 +08:00
Jiao Wang
bb83bc23fd
Fix Starcoder issue on CPU on transformers 4.36+ (#11190)
* fix starcoder for sdpa

* update

* style
2024-06-04 10:05:40 -07:00
Yishuo Wang
6454655dcc
use sdp in baichuan2 13b (#11198) 2024-06-04 15:39:00 +08:00
Yishuo Wang
d90cd977d0
refactor stablelm (#11195) 2024-06-04 13:14:43 +08:00
Xin Qiu
5f13700c9f
optimize Minicpm (#11189)
* minicpm optimize

* update
2024-06-03 18:28:29 +08:00
Yishuo Wang
bc5008f0d5
disable sdp_causal in phi-3 to fix overflow (#11157) 2024-05-28 17:25:53 +08:00
Yishuo Wang
d307622797
fix first token sdp with batch (#11153) 2024-05-28 15:03:06 +08:00
Yina Chen
3464440839
fix qwen import error (#11154) 2024-05-28 14:50:12 +08:00
Yina Chen
b6b70d1ba0
Divide core-xe packages (#11131)
* temp

* add batch

* fix style

* update package name

* fix style

* add workflow

* use temp version to run uts

* trigger performance test

* trigger win igpu perf

* revert workflow & setup
2024-05-28 12:00:18 +08:00
binbin Deng
367de141f2
Fix mixtral-8x7b with transformers=4.37.0 (#11132) 2024-05-27 09:50:54 +08:00
Yishuo Wang
1db9d9a63b
optimize internlm2 xcomposer again (#11124) 2024-05-24 13:44:52 +08:00
Yishuo Wang
9372ce87ce
fix internlm xcomposer2 fp16 (#11123) 2024-05-24 11:03:31 +08:00
Cengguang Zhang
011b9faa5c
LLM: unify baichuan2-13b alibi mask dtype with model dtype. (#11107)
* LLM: unify alibi mask dtype.

* fix comments.
2024-05-24 10:27:53 +08:00
Yishuo Wang
37b98a531f
support running internlm xcomposer2 on gpu and add sdp optimization (#11115) 2024-05-23 17:26:24 +08:00
Zhao Changmin
c5e8b90c8d
Add Qwen register attention implementation (#11110)
* qwen_register
2024-05-23 17:17:45 +08:00
Yishuo Wang
0e53f20edb
support running internlm-xcomposer2 on cpu (#11111) 2024-05-23 16:36:09 +08:00
Yishuo Wang
cd4dff09ee
support phi-3 vision (#11101) 2024-05-22 17:43:50 +08:00
Xin Qiu
71bcd18f44
fix qwen vl (#11090) 2024-05-21 18:40:29 +08:00
Yishuo Wang
f00625f9a4
refactor qwen2 (#11087) 2024-05-21 16:53:42 +08:00
Yishuo Wang
d830a63bb7
refactor qwen (#11074) 2024-05-20 18:08:37 +08:00
Yishuo Wang
4e97047d70
fix baichuan2 13b fp16 (#11071) 2024-05-20 11:21:20 +08:00
Yishuo Wang
31ce3e0c13
refactor baichuan2-13b (#11064) 2024-05-17 16:25:30 +08:00
Yishuo Wang
981d668be6
refactor baichuan2-7b (#11062) 2024-05-17 13:01:34 +08:00
Ruonan Wang
3a72e5df8c
disable mlp fusion of fp6 on mtl (#11059) 2024-05-17 10:10:16 +08:00
SONG Ge
192ae35012
Add support for llama2 quantize_kv with transformers 4.38.0 (#11054)
* add support for llama2 quantize_kv with transformers 4.38.0

* fix code style

* fix code style
2024-05-16 22:23:39 +08:00
SONG Ge
16b2a418be
hotfix native_sdp ut (#11046)
* hotfix native_sdp

* update
2024-05-16 17:15:37 +08:00
Xin Qiu
6be70283b7
fix chatglm run error (#11045)
* fix chatglm

* update

* fix style
2024-05-16 15:39:18 +08:00
Yishuo Wang
8cae897643
use new rope in phi3 (#11047) 2024-05-16 15:12:35 +08:00
Yishuo Wang
59df750326
Use new sdp again (#11025) 2024-05-16 09:33:34 +08:00
SONG Ge
9942a4ba69
[WIP] Support llama2 with transformers==4.38.0 (#11024)
* support llama2 with transformers==4.38.0

* add support for quantize_qkv

* add original support for 4.38.0 now

* code style fix
2024-05-15 18:07:00 +08:00
Ruonan Wang
ac384e0f45
add fp6 mlp fusion (#11032)
* add fp6 fusion

* add qkv fusion for fp6

* remove qkv first
2024-05-15 17:42:50 +08:00
Yishuo Wang
fad1dbaf60
use sdp fp8 causal kernel (#11023) 2024-05-15 10:22:35 +08:00
Zhao Changmin
0a732bebe7
Add phi3 cached RotaryEmbedding (#11013)
* phi3cachedrotaryembed

* pep8
2024-05-15 08:16:43 +08:00
Zhao Changmin
b03c859278
Add phi3RMS (#10988)
* phi3RMS
2024-05-14 15:16:27 +08:00
Yishuo Wang
170e3d65e0
use new sdp and fp32 sdp (#11007) 2024-05-14 14:29:18 +08:00
Yishuo Wang
ad96f32ce0
optimize phi3 1st token performance (#10981) 2024-05-10 17:33:46 +08:00
Cengguang Zhang
cfed76b2ed
LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. (#10937)
* LLM: add split tensor support for baichuan2-7b and qwen1.5-7b.

* fix style.

* fix style.

* fix style.

* add support for mistral and fix condition threshold.

* fix style.

* fix comments.
2024-05-10 16:40:15 +08:00
Yishuo Wang
e753125880
use fp16_sdp when head_dim=96 (#10976) 2024-05-09 17:02:59 +08:00
Yishuo Wang
697ca79eca
use quantize kv and sdp in phi3-mini (#10973) 2024-05-09 15:16:18 +08:00