Yishuo Wang | 6f999e6e90 | add sdp for gemma2 (#11677) | 2024-07-29 15:15:47 +08:00

Yina Chen | fc7f8feb83 | Support compress kv (#11642) | 2024-07-26 16:02:00 +08:00
    * mistral snapkv
    * update
    * mtl update
    * update
    * update
    * update
    * add comments
    * style fix
    * fix style
    * support llama
    * llama use compress kv
    * support mistral 4.40
    * fix style
    * support diff transformers versions
    * move snapkv util to kv
    * fix style
    * meet comments & small fix
    * revert all in one
    * fix indent
    Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>

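For context on what these commits implement: SnapKV compresses the KV cache by scoring each prefix position with the attention that the last few queries pay to it, then keeping only the top-scoring positions plus the recent observation window. Below is a minimal PyTorch sketch of that selection step; the function name, `window_size`, and `max_capacity` are illustrative simplifications (no causal masking inside the window, no grouped-query handling), not the actual ipex-llm code.

```python
import torch
import torch.nn.functional as F

def snapkv_compress(key, value, query, window_size=32, max_capacity=1024):
    # Hypothetical helper. Shapes: [batch, num_heads, seq_len, head_dim].
    bsz, num_heads, seq_len, head_dim = key.shape
    if seq_len <= max_capacity:
        return key, value  # nothing to compress yet

    # Attention of the last `window_size` queries over all cached keys.
    obs_q = query[:, :, -window_size:, :]
    attn = torch.matmul(obs_q, key.transpose(2, 3)) / head_dim ** 0.5
    attn = attn.softmax(dim=-1)                       # [b, h, window, seq]

    # Vote: total attention each prefix position received, smoothed over
    # neighbors so whole phrases tend to be kept together.
    scores = attn[..., :-window_size].sum(dim=2)      # [b, h, seq - window]
    scores = F.avg_pool1d(scores.reshape(bsz * num_heads, 1, -1),
                          kernel_size=5, padding=2, stride=1)
    scores = scores.reshape(bsz, num_heads, -1)

    # Keep the highest-scoring prefix positions, in original order.
    keep = max_capacity - window_size
    idx = scores.topk(keep, dim=-1).indices.sort(dim=-1).values
    idx = idx.unsqueeze(-1).expand(-1, -1, -1, head_dim)

    k_keep = key[:, :, :-window_size, :].gather(2, idx)
    v_keep = value[:, :, :-window_size, :].gather(2, idx)
    key = torch.cat([k_keep, key[:, :, -window_size:, :]], dim=2)
    value = torch.cat([v_keep, value[:, :, -window_size:, :]], dim=2)
    return key, value
```

Run once at the end of prefill, decoding then proceeds against the compressed cache.
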
Yishuo Wang | 39bcb33a67 | add sdp support for stablelm 3b (#11473) | 2024-07-01 14:56:15 +08:00

Yishuo Wang | 01fe0fc1a2 | refactor chatglm2/3 (#11290) | 2024-06-13 12:22:58 +08:00

Xin Qiu | 151fcf37bb | check device name in use_flash_attention (#11263) | 2024-06-07 15:07:47 +08:00

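A plausible shape for the device-name check referenced here, sketched under the assumption that an XPU runtime is present; the exclusion string is hypothetical, not the real condition from the commit.

```python
import torch

def use_flash_attention(query: torch.Tensor) -> bool:
    # Hypothetical guard: flash attention only pays off on certain GPUs,
    # so inspect the XPU device name before taking the fast path.
    if query.device.type != "xpu" or query.dtype not in (torch.float16,
                                                         torch.bfloat16):
        return False
    try:
        # torch.xpu is provided by intel_extension_for_pytorch.
        name = torch.xpu.get_device_name(query.device)
    except (AttributeError, RuntimeError):
        return False  # no XPU runtime available
    # Illustrative rule: skip integrated "UHD" graphics.
    return "UHD" not in name
```
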
Yina Chen | b6b70d1ba0 | Divide core-xe packages (#11131) | 2024-05-28 12:00:18 +08:00
    * temp
    * add batch
    * fix style
    * update package name
    * fix style
    * add workflow
    * use temp version to run uts
    * trigger performance test
    * trigger win igpu perf
    * revert workflow & setup

Yishuo Wang | 1db9d9a63b | optimize internlm2 xcomposer again (#11124) | 2024-05-24 13:44:52 +08:00

Yishuo Wang | 31ce3e0c13 | refactor baichuan2-13b (#11064) | 2024-05-17 16:25:30 +08:00

Ruonan Wang | 3a72e5df8c | disable mlp fusion of fp6 on mtl (#11059) | 2024-05-17 10:10:16 +08:00

Yishuo Wang | 59df750326 | Use new sdp again (#11025) | 2024-05-16 09:33:34 +08:00

SONG Ge | 9942a4ba69 | [WIP] Support llama2 with transformers==4.38.0 (#11024) | 2024-05-15 18:07:00 +08:00
    * support llama2 with transformers==4.38.0
    * add support for quantize_qkv
    * add original support for 4.38.0 now
    * code style fix

Ruonan Wang | ac384e0f45 | add fp6 mlp fusion (#11032) | 2024-05-15 17:42:50 +08:00
    * add fp6 fusion
    * add qkv fusion for fp6
    * remove qkv first

Yishuo Wang | 170e3d65e0 | use new sdp and fp32 sdp (#11007) | 2024-05-14 14:29:18 +08:00

Yishuo Wang | ad96f32ce0 | optimize phi3 1st token performance (#10981) | 2024-05-10 17:33:46 +08:00

Yishuo Wang | e753125880 | use fp16_sdp when head_dim=96 (#10976) | 2024-05-09 17:02:59 +08:00

Cengguang Zhang | 75dbf240ec | LLM: update split tensor conditions. (#10872) | 2024-04-30 17:07:21 +08:00
    * LLM: update split tensor condition.
    * add cond for split tensor.
    * update priority of env.
    * fix style.
    * update env name.

Yishuo Wang | d884c62dc4 | remove new_layout parameter (#10906) | 2024-04-29 10:31:50 +08:00

Yishuo Wang | 46ba962168 | use new quantize kv (#10888) | 2024-04-26 14:42:17 +08:00

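The "quantize kv" line of commits stores the cache in int8 rather than fp16. A minimal sketch of symmetric per-token quantization follows, with hypothetical helper names; the real kernels quantize in place into a packed cache layout.

```python
import torch

def quantize_kv(t: torch.Tensor):
    # One scale per token vector (last dim), so each cached key/value
    # keeps its own dynamic range; roughly halves cache memory vs fp16.
    scale = t.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 127.0
    q = torch.round(t / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q: torch.Tensor, scale: torch.Tensor,
                  dtype: torch.dtype = torch.float16) -> torch.Tensor:
    return q.to(dtype) * scale.to(dtype)
```

Attention is then computed either on dequantized tiles or directly on the int8 cache inside a fused kernel.
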
Yina Chen | 8811f268ff | Use new fp16 sdp in Qwen and modify the constraint (#10882) | 2024-04-25 19:23:37 +08:00

Yina Chen | dc27b3bc35 | Use sdp when rest token seq_len > 1 in llama & mistral (for lookup & spec) (#10790) | 2024-04-24 17:24:01 +08:00
    * update sdp condition
    * update
    * fix
    * update & test llama
    * mistral
    * fix style
    * update
    * fix style
    * remove pvc constraint
    * update ds on arc
    * fix style

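This change widens the SDP fast path from q_len == 1 to any small decode-phase query, which is exactly what lookup and speculative decoding produce when several draft tokens are verified at once. A sketch of what such a gate can look like; the thresholds and head-dim list are illustrative, not the repo's actual condition.

```python
import torch

def can_use_sdp(query: torch.Tensor, key: torch.Tensor,
                training: bool = False) -> bool:
    q_len, kv_len = query.size(2), key.size(2)
    if training or query.requires_grad:
        return False                      # inference-only kernel
    if query.dtype not in (torch.float16, torch.bfloat16):
        return False
    if query.size(-1) not in (64, 80, 96, 128):
        return False                      # kernel built for common head sizes
    # Decode phase: small query against a long cache. Accepting q_len > 1
    # (instead of q_len == 1) lets lookup/speculative decoding use SDP too.
    return 1 <= q_len < kv_len and q_len <= 32
```
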
Yishuo Wang | 2d210817ff | add phi3 optimization (#10871) | 2024-04-24 15:17:40 +08:00

Wang, Jian4 | 209c3501e6 | LLM: Optimize qwen1.5 moe model (#10706) | 2024-04-18 14:54:05 +08:00
    * update moe block
    * fix style
    * enable optimized MLP
    * enable kv_cache
    * enable fuse rope
    * enable fused qkv
    * enable flash_attention
    * error sdp quantize
    * use old api
    * use fuse
    * use xetla
    * fix python style
    * update moe_blocks num
    * fix output error
    * add cpu sdpa
    * update
    * update
    * update

Xin Qiu | e764f9b1b1 | Disable fast fused rope on UHD (#10780) | 2024-04-18 10:03:53 +08:00
    * use decoding fast path
    * update
    * update
    * cleanup

Yishuo Wang | 8f45e22072 | fix llama2 (#10710) | 2024-04-09 17:28:37 +08:00

Xin Qiu | 3a9ab8f1ae | fix stablelm logits diff (#10636) | 2024-04-03 15:08:12 +08:00
    * fix logits diff
    * Small fixes
    Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>

binbin Deng | 2bbd8a1548 | LLM: fix llama2 FP16 & bs>1 & autotp on PVC and ARC (#10611) | 2024-04-03 09:28:04 +08:00

Yuwen Hu | fd384ddfb8 | Optimize StableLM (#10619) | 2024-04-02 18:58:38 +08:00
    * Initial commit for stablelm optimizations
    * Small style fix
    * add dependency
    * Add mlp optimizations
    * Small fix
    * add attention forward
    * Remove quantize kv for now as head_dim=80
    * Add merged qkv
    * fix license
    * Python style fix
    Co-authored-by: qiuxin2012 <qiuxin2012cs@gmail.com>

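"Add merged qkv" here refers to folding the three attention projections into a single matmul so the input activation is read once per layer. A sketch under the assumption of bias-free nn.Linear projections; `merge_qkv` is a hypothetical helper, and the real code also has to handle quantized weights.

```python
import torch
from torch import nn

def merge_qkv(attn: nn.Module) -> None:
    # Concatenate q/k/v weights row-wise into one [3 * hidden, hidden]
    # matrix; one matmul then replaces three per attention layer.
    q, k, v = attn.q_proj, attn.k_proj, attn.v_proj
    qkv = nn.Linear(q.in_features,
                    q.out_features + k.out_features + v.out_features,
                    bias=False)
    qkv.weight.data = torch.cat([q.weight.data, k.weight.data,
                                 v.weight.data], dim=0)
    attn.qkv_proj = qkv
    del attn.q_proj, attn.k_proj, attn.v_proj

# At forward time the fused output is split back into the three states:
#   fused = attn.qkv_proj(hidden_states)
#   q, k, v = fused.split([q_size, kv_size, kv_size], dim=-1)
```
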
Shaojun Liu | a10f5a1b8d | add python style check (#10620) | 2024-04-02 16:17:56 +08:00
    * add python style check
    * fix style checks
    * update runner
    * add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow
    * update tag to 2.1.0-SNAPSHOT

Cengguang Zhang | 58b57177e3 | LLM: support bigdl quantize kv cache env and add warning. (#10623) | 2024-04-02 15:41:08 +08:00
    * LLM: support bigdl quantize kv cache env and add warning.
    * fix style.
    * fix comments.

Cengguang Zhang | e567956121 | LLM: add memory optimization for llama. (#10592) | 2024-04-02 09:07:50 +08:00
    * add initial memory optimization.
    * fix logic.
    * fix logic.
    * remove env var check in mlp split.

Qiyuan Gong | f4537798c1 | Enable kv cache quantization by default for flex when 1 < batch <= 8 (#10584) | 2024-03-29 09:43:42 +08:00
    * Enable kv cache quantization by default for flex when 1 < batch <= 8.
    * Change upper bound from <8 to <=8.

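Combined with the env-var commit above (#10623), a plausible shape for this policy: quantize the KV cache by default in the mid-batch range where cache memory starts to dominate, and let the environment variable override in either direction. The device test and fallback behavior below are illustrative, not the repo's exact logic.

```python
import os
import warnings

def quantize_kv_enabled(batch_size: int, device_name: str) -> bool:
    # Explicit user setting always wins; otherwise quantize by default
    # on Flex GPUs for 1 < batch <= 8.
    env = os.environ.get("BIGDL_QUANTIZE_KV_CACHE")
    if env is not None:
        if env in ("0", "1"):
            return env == "1"
        warnings.warn("BIGDL_QUANTIZE_KV_CACHE should be '0' or '1'; "
                      "falling back to the default policy")
    return "Flex" in device_name and 1 < batch_size <= 8
```
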
Cengguang Zhang | b44f7adbad | LLM: Disable esimd sdp for PVC GPU when batch size > 1 (#10579) | 2024-03-28 22:55:48 +08:00
    * llm: disable esimd sdp for pvc batch size > 1.
    * fix logic.
    * fix: avoid calling get device name twice.

Ruonan Wang | ea4bc450c4 | LLM: add esimd sdp for pvc (#10543) | 2024-03-26 19:04:40 +08:00
    * add esimd sdp for pvc
    * update
    * fix
    * fix batch

Xin Qiu | 1dd40b429c | enable fp4 fused mlp and qkv (#10531) | 2024-03-26 08:34:00 +08:00
    * enable fp4 fused mlp and qkv
    * update qwen
    * update qwen2

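The MLP half of this fusion concatenates the gate and up projections of a SwiGLU block so they run as one matmul over the shared input; the fp4 variant does the same fusion inside a quantized kernel. An fp16 sketch for a LLaMA/Qwen-style MLP, with hypothetical class and argument names:

```python
import torch
from torch import nn
import torch.nn.functional as F

class FusedSwiGLU(nn.Module):
    # gate_proj and up_proj consume the same input, so their weights can
    # be stacked and evaluated in a single matmul.
    def __init__(self, gate_proj: nn.Linear, up_proj: nn.Linear,
                 down_proj: nn.Linear):
        super().__init__()
        self.intermediate = gate_proj.out_features
        self.gate_up = nn.Linear(gate_proj.in_features,
                                 2 * self.intermediate, bias=False)
        self.gate_up.weight.data = torch.cat(
            [gate_proj.weight.data, up_proj.weight.data], dim=0)
        self.down_proj = down_proj

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate, up = self.gate_up(x).split(self.intermediate, dim=-1)
        return self.down_proj(F.silu(gate) * up)
```
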
Wang, Jian4 | 9df70d95eb | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
    * Rename bigdl/llm to ipex_llm
    * rm python/llm/src/bigdl
    * from bigdl.llm to from ipex_llm