Author | Commit | Message | Date
Yishuo Wang | f3b5fad3be | refactor qwen2 and llama3 (#12587) | 2024-12-20 13:25:25 +08:00
Yishuo Wang | 4540424271 | optimize siglip attention again (#12578) | 2024-12-19 13:40:48 +08:00
Yishuo Wang | c090d167dc | remove old rope usage (#12544) | 2024-12-13 16:54:58 +08:00
Yishuo Wang | 15219944b8 | optimize glm edge again (#12539) | 2024-12-13 13:52:39 +08:00
Yishuo Wang | e0bf0054e1 | small fix (#12493) | 2024-12-04 16:37:39 +08:00
Yishuo Wang | 8164aed802 | small change (#12439) | 2024-11-25 14:35:49 +08:00
Yuwen Hu | 8fdc36c140 | Optimize with new batch kernel when batch_size=1 on LNL (#12419) | 2024-11-21 16:21:35 +08:00
  * Add use batch kernel condition for LNL
  * Fix for other device judgement
  * Fix based on comment
Yishuo Wang | 9ea694484d | refactor to remove old rope usage (#12224) | 2024-10-17 17:06:09 +08:00
Yishuo Wang | 9b81236a2e | optimize qwen2-vl vision (#12203) | 2024-10-15 15:54:25 +08:00
Yishuo Wang | 6cedb601e4 | remove some useless code (#12035) | 2024-09-06 17:51:08 +08:00
Yuwen Hu | 9e9086cc2a | Update IPEX_LLM_PERFORMANCE_MODE (#11823) | 2024-08-16 09:48:36 +08:00
Yina Chen | 841dbcdf3a | Fix compresskv with lookahead issue (#11767) | 2024-08-12 18:53:55 +08:00
  * fix compresskv + lookahead attn_mask qwen2
  * support llama chatglm
  * support mistral & chatglm
  * address comments
  * revert run.py
Yina Chen | dd46c141bd | Phi3 support compresskv (#11733) | 2024-08-09 15:43:43 +08:00
  * phi3 support compresskv
  * fix phi3 mtl error
  * fix conflict with quant kv
  * fix abnormal on mtl
  * fix style
  * use sliding window size to compress kv
  * support sliding window
  * fix style
  * fix style
  * temp: partial support quant kv
  * support quant kv with compress kv, todo: model check
  * temp
  * fix style
  * fix style
  * remove prepare
  * address comment
  * default -> 1.8k
Ruonan Wang | 00a5574c8a | Use merge_qkv to replace fused_qkv for llama2 (#11727) | 2024-08-07 18:04:01 +08:00
  * update 4.38
  * support new versions
  * update
  * fix style
  * fix style
  * update rope
  * temp test sdpa
  * fix style
  * fix cpu ut
Yina Chen | a71ae7c22b | Support minicpm compresskv & modify default compresskv config & default enable compresskv on mtl 2.5k~4.5k (#11726) | 2024-08-07 11:35:39 +08:00
  * support minicpm & modify default & default enable on mtl 2.5k~4.5k
  * fix style
hxsz1997 | 9b36877897 | disable default quantize_kv of GQA on MTL (#11679) | 2024-07-30 09:38:46 +08:00
  * disable default quantize_kv of gqa in mtl
  * fix style
  * fix style
  * fix style
  * fix style
  * fix style
  * fix style
Yishuo Wang | 6f999e6e90 | add sdp for gemma2 (#11677) | 2024-07-29 15:15:47 +08:00
Yina Chen | fc7f8feb83 | Support compress kv (#11642) | 2024-07-26 16:02:00 +08:00
  * mistral snapkv
  * update
  * mtl update
  * update
  * update
  * update
  * add comments
  * style fix
  * fix style
  * support llama
  * llama use compress kv
  * support mistral 4.40
  * fix style
  * support diff transformers versions
  * move snapkv util to kv
  * fix style
  * meet comments & small fix
  * revert all in one
  * fix indent
  Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
Yishuo Wang | 39bcb33a67 | add sdp support for stablelm 3b (#11473) | 2024-07-01 14:56:15 +08:00
Yishuo Wang | 01fe0fc1a2 | refactor chatglm2/3 (#11290) | 2024-06-13 12:22:58 +08:00
Xin Qiu | 151fcf37bb | check device name in use_flash_attention (#11263) | 2024-06-07 15:07:47 +08:00
Yina Chen | b6b70d1ba0 | Divide core-xe packages (#11131) | 2024-05-28 12:00:18 +08:00
  * temp
  * add batch
  * fix style
  * update package name
  * fix style
  * add workflow
  * use temp version to run uts
  * trigger performance test
  * trigger win igpu perf
  * revert workflow & setup
Yishuo Wang | 1db9d9a63b | optimize internlm2 xcomposer again (#11124) | 2024-05-24 13:44:52 +08:00
Yishuo Wang | 31ce3e0c13 | refactor baichuan2-13b (#11064) | 2024-05-17 16:25:30 +08:00
Ruonan Wang | 3a72e5df8c | disable mlp fusion of fp6 on mtl (#11059) | 2024-05-17 10:10:16 +08:00
Yishuo Wang | 59df750326 | Use new sdp again (#11025) | 2024-05-16 09:33:34 +08:00
SONG Ge | 9942a4ba69 | [WIP] Support llama2 with transformers==4.38.0 (#11024) | 2024-05-15 18:07:00 +08:00
  * support llama2 with transformers==4.38.0
  * add support for quantize_qkv
  * add original support for 4.38.0 now
  * code style fix
Ruonan Wang | ac384e0f45 | add fp6 mlp fusion (#11032) | 2024-05-15 17:42:50 +08:00
  * add fp6 fusion
  * add qkv fusion for fp6
  * remove qkv first
Yishuo Wang | 170e3d65e0 | use new sdp and fp32 sdp (#11007) | 2024-05-14 14:29:18 +08:00
Yishuo Wang | ad96f32ce0 | optimize phi3 1st token performance (#10981) | 2024-05-10 17:33:46 +08:00
Yishuo Wang | e753125880 | use fp16_sdp when head_dim=96 (#10976) | 2024-05-09 17:02:59 +08:00
Cengguang Zhang | 75dbf240ec | LLM: update split tensor conditions. (#10872) | 2024-04-30 17:07:21 +08:00
  * LLM: update split tensor condition.
  * add cond for split tensor.
  * update priority of env.
  * fix style.
  * update env name.
Yishuo Wang | d884c62dc4 | remove new_layout parameter (#10906) | 2024-04-29 10:31:50 +08:00
Yishuo Wang | 46ba962168 | use new quantize kv (#10888) | 2024-04-26 14:42:17 +08:00
Yina Chen | 8811f268ff | Use new fp16 sdp in Qwen and modify the constraint (#10882) | 2024-04-25 19:23:37 +08:00
Yina Chen | dc27b3bc35 | Use sdp when rest token seq_len > 1 in llama & mistral (for lookup & spec) (#10790) | 2024-04-24 17:24:01 +08:00
  * update sdp condition
  * update
  * fix
  * update & test llama
  * mistral
  * fix style
  * update
  * fix style
  * remove pvc constraint
  * update ds on arc
  * fix style
Yishuo Wang | 2d210817ff | add phi3 optimization (#10871) | 2024-04-24 15:17:40 +08:00
Wang, Jian4 | 209c3501e6 | LLM: Optimize qwen1.5 moe model (#10706) | 2024-04-18 14:54:05 +08:00
  * update moe block
  * fix style
  * enable optimize MLP
  * enable kv_cache
  * enable fuse rope
  * enable fused qkv
  * enable flash_attention
  * error sdp quantize
  * use old api
  * use fuse
  * use xetla
  * fix python style
  * update moe_blocks num
  * fix output error
  * add cpu sdpa
  * update
  * update
  * update
Xin Qiu | e764f9b1b1 | Disable fast fused rope on UHD (#10780) | 2024-04-18 10:03:53 +08:00
  * use decoding fast path
  * update
  * update
  * cleanup
Yishuo Wang | 8f45e22072 | fix llama2 (#10710) | 2024-04-09 17:28:37 +08:00
Xin Qiu | 3a9ab8f1ae | fix stablelm logits diff (#10636) | 2024-04-03 15:08:12 +08:00
  * fix logits diff
  * Small fixes
  Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
binbin Deng | 2bbd8a1548 | LLM: fix llama2 FP16 & bs>1 & autotp on PVC and ARC (#10611) | 2024-04-03 09:28:04 +08:00
Yuwen Hu | fd384ddfb8 | Optimize StableLM (#10619) | 2024-04-02 18:58:38 +08:00
  * Initial commit for stablelm optimizations
  * Small style fix
  * add dependency
  * Add mlp optimizations
  * Small fix
  * add attention forward
  * Remove quantize kv for now as head_dim=80
  * Add merged qkv
  * fix license
  * Python style fix
  Co-authored-by: qiuxin2012 <qiuxin2012cs@gmail.com>
Shaojun Liu | a10f5a1b8d | add python style check (#10620) | 2024-04-02 16:17:56 +08:00
  * add python style check
  * fix style checks
  * update runner
  * add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow
  * update tag to 2.1.0-SNAPSHOT
Cengguang Zhang | 58b57177e3 | LLM: support bigdl quantize kv cache env and add warning. (#10623) | 2024-04-02 15:41:08 +08:00
  * LLM: support bigdl quantize kv cache env and add warning.
  * fix style.
  * fix comments.
Cengguang Zhang | e567956121 | LLM: add memory optimization for llama. (#10592) | 2024-04-02 09:07:50 +08:00
  * add initial memory optimization.
  * fix logic.
  * fix logic.
  * remove env var check in mlp split.
Qiyuan Gong | f4537798c1 | Enable kv cache quantization by default for flex when 1 < batch <= 8 (#10584) | 2024-03-29 09:43:42 +08:00
  * Enable kv cache quantization by default for flex when 1 < batch <= 8.
  * Change up bound from <8 to <=8.
Cengguang Zhang | b44f7adbad | LLM: Disable esimd sdp for PVC GPU when batch size>1 (#10579) | 2024-03-28 22:55:48 +08:00
  * llm: disable esimd sdp for pvc bz>1.
  * fix logic.
  * fix: avoid call get device name twice.
Ruonan Wang | ea4bc450c4 | LLM: add esimd sdp for pvc (#10543) | 2024-03-26 19:04:40 +08:00
  * add esimd sdp for pvc
  * update
  * fix
  * fix batch
Xin Qiu | 1dd40b429c | enable fp4 fused mlp and qkv (#10531) | 2024-03-26 08:34:00 +08:00
  * enable fp4 fused mlp and qkv
  * update qwen
  * update qwen2