Yishuo Wang | 0fbb10259a | use sdp_causal to reduce internvl2-4b memory usage if set environment variable (#11953) | 2024-08-28 17:35:05 +08:00
Yishuo Wang | bd1e490d62 | fix phi3 (#11878) | 2024-08-21 10:31:41 +08:00
Yina Chen | c3c058373f | Update compresskv model forward type logic (#11868) | 2024-08-20 18:11:37 +08:00
  * update
  * fix
Yishuo Wang | d4ee0a89f3 | optimize phi3 memory usage (#11867) | 2024-08-20 17:32:51 +08:00
Yishuo Wang | 9490781aec | optimize phi3 memory usage again (#11848) | 2024-08-19 17:26:59 +08:00
Yina Chen | 3cd4e87168 | Support compress KV with quantize KV (#11812) | 2024-08-19 15:32:32 +08:00
  * update llama
  * support llama 4.41
  * fix style
  * support minicpm
  * support qwen2
  * support minicpm & update
  * support chatglm4
  * support chatglm
  * remove print
  * add DynamicCompressFp8Cache & support qwen
  * support llama
  * support minicpm phi3
  * update chatglm2/4
  * small fix & support qwen 4.42
  * remove print
Yishuo Wang | 828ab16537 | fix phi3 and minicpmv cpu (#11818) | 2024-08-15 17:43:29 +08:00
Yina Chen | 7cd6ec9723 | MiniCPM-V support compresskv (#11779) | 2024-08-13 19:03:40 +08:00
  * fix check error
  * fix other models
  * remove print
Yishuo Wang | aa861df066 | use new fp32 softmax kernel (#11776) | 2024-08-13 14:48:11 +08:00
Yina Chen | 4b9c57cc60 | Support compress kv with lookahead (#11752) | 2024-08-09 17:39:57 +08:00
  * support compress kv with lookahead
  * enough kv miss param
Yina Chen | dd46c141bd | Phi3 support compresskv (#11733) | 2024-08-09 15:43:43 +08:00
  * phi3 support compresskv
  * fix phi3 mtl error
  * fix conflict with quant kv
  * fix abnormal on mtl
  * fix style
  * use slide windows size to compress kv
  * support sliding window
  * fix style
  * fix style
  * temp: partial support quant kv
  * support quant kv with compress kv, todo: model check
  * temp
  * fix style
  * fix style
  * remove prepare
  * address comment
  * default -> 1.8k
Yishuo Wang | c093f7d980 | fix phi3 (#11729) | 2024-08-07 09:39:46 +08:00
Yishuo Wang | 929675aa6b | support latest phi3 (#11721) | 2024-08-06 15:52:55 +08:00
SONG Ge | ef4b6519fb | Add phi-3 model support for pipeline parallel inference (#11334) | 2024-06-17 17:44:24 +08:00
  * add phi-3 model support
  * add phi3 example
Yishuo Wang | bc5008f0d5 | disable sdp_causal in phi-3 to fix overflow (#11157) | 2024-05-28 17:25:53 +08:00
Yishuo Wang | d307622797 | fix first token sdp with batch (#11153) | 2024-05-28 15:03:06 +08:00
Yina Chen | b6b70d1ba0 | Divide core-xe packages (#11131) | 2024-05-28 12:00:18 +08:00
  * temp
  * add batch
  * fix style
  * update package name
  * fix style
  * add workflow
  * use temp version to run uts
  * trigger performance test
  * trigger win igpu perf
  * revert workflow & setup
Yishuo Wang | cd4dff09ee | support phi-3 vision (#11101) | 2024-05-22 17:43:50 +08:00
Yishuo Wang | 8cae897643 | use new rope in phi3 (#11047) | 2024-05-16 15:12:35 +08:00
Yishuo Wang | 59df750326 | Use new sdp again (#11025) | 2024-05-16 09:33:34 +08:00
Yishuo Wang | fad1dbaf60 | use sdp fp8 causal kernel (#11023) | 2024-05-15 10:22:35 +08:00
Zhao Changmin | 0a732bebe7 | Add phi3 cached RotaryEmbedding (#11013) | 2024-05-15 08:16:43 +08:00
  * phi3cachedrotaryembed
  * pep8
Zhao Changmin | b03c859278 | Add phi3RMS (#10988) | 2024-05-14 15:16:27 +08:00
  * phi3RMS
Yishuo Wang | 170e3d65e0 | use new sdp and fp32 sdp (#11007) | 2024-05-14 14:29:18 +08:00
Yishuo Wang | ad96f32ce0 | optimize phi3 1st token performance (#10981) | 2024-05-10 17:33:46 +08:00
Yishuo Wang | 697ca79eca | use quantize kv and sdp in phi3-mini (#10973) | 2024-05-09 15:16:18 +08:00
Yishuo Wang | 2ebec0395c | optimize phi-3-mini-128 (#10959) | 2024-05-08 16:33:17 +08:00
Yishuo Wang | c801c37bc6 | optimize phi3 again: use quantize kv if possible (#10953) | 2024-05-07 17:26:19 +08:00
Yishuo Wang | aa2fa9fde1 | optimize phi3 again: use sdp if possible (#10951) | 2024-05-07 15:53:08 +08:00
Yishuo Wang | 2d210817ff | add phi3 optimization (#10871) | 2024-04-24 15:17:40 +08:00