43 commits

Author SHA1 Message Date
Yishuo Wang
9b81236a2e
optimize qwen2-vl vision (#12203) 2024-10-15 15:54:25 +08:00
Yishuo Wang
6cedb601e4
remove some useless code (#12035) 2024-09-06 17:51:08 +08:00
Yuwen Hu
9e9086cc2a
Update IPEX_LLM_PERFORMANCE_MODE (#11823) 2024-08-16 09:48:36 +08:00
Yina Chen
841dbcdf3a
Fix compresskv with lookahead issue (#11767)
* fix compresskv + lookahead attn_mask qwen2

* support llama chatglm

* support mistral & chatglm

* address comments

* revert run.py
2024-08-12 18:53:55 +08:00
Yina Chen
dd46c141bd
Phi3 support compresskv (#11733)
* phi3 support compresskv

* fix phi3 mtl error

* fix conflict with quant kv

* fix abnormal on mtl

* fix style

* use sliding window size to compress kv

* support sliding window

* fix style

* fix style

* temp: partial support quant kv

* support quant kv with compress kv, todo: model check

* temp

* fix style

* fix style

* remove prepare

* address comment

* default -> 1.8k
2024-08-09 15:43:43 +08:00
Ruonan Wang
00a5574c8a
Use merge_qkv to replace fused_qkv for llama2 (#11727)
* update 4.38

* support new versions

* update

* fix style

* fix style

* update rope

* temp test sdpa

* fix style

* fix cpu ut
2024-08-07 18:04:01 +08:00
Yina Chen
a71ae7c22b
Support minicpm compresskv & modify default compresskv config & default enable compresskv on mtl 2.5k~4.5k (#11726)
* support minicpm & modify default & default enable on mtl 2.5k~4.5k

* fix style
2024-08-07 11:35:39 +08:00
hxsz1997
9b36877897
disable default quantize_kv of GQA on MTL (#11679)
* disable default quantizekv of gqa in mtl

* fix style

* fix style

* fix style

* fix style

* fix style

* fix style
2024-07-30 09:38:46 +08:00
Yishuo Wang
6f999e6e90
add sdp for gemma2 (#11677) 2024-07-29 15:15:47 +08:00
Yina Chen
fc7f8feb83
Support compress kv (#11642)
* mistral snapkv

* update

* mtl update

* update

* update

* update

* add comments

* style fix

* fix style

* support llama

* llama use compress kv

* support mistral 4.40

* fix style

* support diff transformers versions

* move snapkv util to kv

* fix style

* meet comments & small fix

* revert all in one

* fix indent

---------

Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2024-07-26 16:02:00 +08:00
Yishuo Wang
39bcb33a67
add sdp support for stablelm 3b (#11473) 2024-07-01 14:56:15 +08:00
Yishuo Wang
01fe0fc1a2
refactor chatglm2/3 (#11290) 2024-06-13 12:22:58 +08:00
Xin Qiu
151fcf37bb
check device name in use_flash_attention (#11263) 2024-06-07 15:07:47 +08:00
Yina Chen
b6b70d1ba0
Divide core-xe packages (#11131)
* temp

* add batch

* fix style

* update package name

* fix style

* add workflow

* use temp version to run uts

* trigger performance test

* trigger win igpu perf

* revert workflow & setup
2024-05-28 12:00:18 +08:00
Yishuo Wang
1db9d9a63b
optimize internlm2 xcomposer again (#11124) 2024-05-24 13:44:52 +08:00
Yishuo Wang
31ce3e0c13
refactor baichuan2-13b (#11064) 2024-05-17 16:25:30 +08:00
Ruonan Wang
3a72e5df8c
disable mlp fusion of fp6 on mtl (#11059) 2024-05-17 10:10:16 +08:00
Yishuo Wang
59df750326
Use new sdp again (#11025) 2024-05-16 09:33:34 +08:00
SONG Ge
9942a4ba69
[WIP] Support llama2 with transformers==4.38.0 (#11024)
* support llama2 with transformers==4.38.0

* add support for quantize_qkv

* add original support for 4.38.0 now

* code style fix
2024-05-15 18:07:00 +08:00
Ruonan Wang
ac384e0f45
add fp6 mlp fusion (#11032)
* add fp6 fusion

* add qkv fusion for fp6

* remove qkv first
2024-05-15 17:42:50 +08:00
Yishuo Wang
170e3d65e0
use new sdp and fp32 sdp (#11007) 2024-05-14 14:29:18 +08:00
Yishuo Wang
ad96f32ce0
optimize phi3 1st token performance (#10981) 2024-05-10 17:33:46 +08:00
Yishuo Wang
e753125880
use fp16_sdp when head_dim=96 (#10976) 2024-05-09 17:02:59 +08:00
Cengguang Zhang
75dbf240ec
LLM: update split tensor conditions. (#10872)
* LLM: update split tensor condition.

* add cond for split tensor.

* update priority of env.

* fix style.

* update env name.
2024-04-30 17:07:21 +08:00
Yishuo Wang
d884c62dc4
remove new_layout parameter (#10906) 2024-04-29 10:31:50 +08:00
Yishuo Wang
46ba962168
use new quantize kv (#10888) 2024-04-26 14:42:17 +08:00
Yina Chen
8811f268ff
Use new fp16 sdp in Qwen and modify the constraint (#10882) 2024-04-25 19:23:37 +08:00
Yina Chen
dc27b3bc35
Use sdp when rest token seq_len > 1 in llama & mistral (for lookup & spec) (#10790)
* update sdp condition

* update

* fix

* update & test llama

* mistral

* fix style

* update

* fix style

* remove pvc constraint

* update ds on arc

* fix style
2024-04-24 17:24:01 +08:00
Yishuo Wang
2d210817ff
add phi3 optimization (#10871) 2024-04-24 15:17:40 +08:00
Wang, Jian4
209c3501e6
LLM: Optimize qwen1.5 moe model (#10706)
* update moe block

* fix style

* enable optimized MLP

* enable kv_cache

* enable fuse rope

* enable fused qkv

* enable flash_attention

* error sdp quantize

* use old api

* use fuse

* use xetla

* fix python style

* update moe_blocks num

* fix output error

* add cpu sdpa

* update

* update

* update
2024-04-18 14:54:05 +08:00
Xin Qiu
e764f9b1b1
Disable fast fused rope on UHD (#10780)
* use decoding fast path

* update

* update

* cleanup
2024-04-18 10:03:53 +08:00
Yishuo Wang
8f45e22072
fix llama2 (#10710) 2024-04-09 17:28:37 +08:00
Xin Qiu
3a9ab8f1ae
fix stablelm logits diff (#10636)
* fix logits diff

* Small fixes

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-04-03 15:08:12 +08:00
binbin Deng
2bbd8a1548
LLM: fix llama2 FP16 & bs>1 & autotp on PVC and ARC (#10611) 2024-04-03 09:28:04 +08:00
Yuwen Hu
fd384ddfb8
Optimize StableLM (#10619)
* Initial commit for stablelm optimizations

* Small style fix

* add dependency

* Add mlp optimizations

* Small fix

* add attention forward

* Remove quantize kv for now as head_dim=80

* Add merged qkv

* fix license

* Python style fix

---------

Co-authored-by: qiuxin2012 <qiuxin2012cs@gmail.com>
2024-04-02 18:58:38 +08:00
Shaojun Liu
a10f5a1b8d
add python style check (#10620)
* add python style check

* fix style checks

* update runner

* add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow

* update tag to 2.1.0-SNAPSHOT
2024-04-02 16:17:56 +08:00
Cengguang Zhang
58b57177e3
LLM: support bigdl quantize kv cache env and add warning. (#10623)
* LLM: support bigdl quantize kv cache env and add warning.

* fix style.

* fix comments.
2024-04-02 15:41:08 +08:00
Cengguang Zhang
e567956121
LLM: add memory optimization for llama. (#10592)
* add initial memory optimization.

* fix logic.

* fix logic.

* remove env var check in mlp split.
2024-04-02 09:07:50 +08:00
Qiyuan Gong
f4537798c1
Enable kv cache quantization by default for flex when 1 < batch <= 8 (#10584)
* Enable kv cache quantization by default for flex when 1 < batch <= 8.
* Change upper bound from <8 to <=8.
2024-03-29 09:43:42 +08:00
Cengguang Zhang
b44f7adbad
LLM: Disable esimd sdp for PVC GPU when batch size>1 (#10579)
* llm: disable esimd sdp for pvc bz>1.

* fix logic.

* fix: avoid call get device name twice.
2024-03-28 22:55:48 +08:00
Ruonan Wang
ea4bc450c4
LLM: add esimd sdp for pvc (#10543)
* add esimd sdp for pvc

* update

* fix

* fix batch
2024-03-26 19:04:40 +08:00
Xin Qiu
1dd40b429c
enable fp4 fused mlp and qkv (#10531)
* enable fp4 fused mlp and qkv

* update qwen

* update qwen2
2024-03-26 08:34:00 +08:00
Wang, Jian4
9df70d95eb
Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm

* rm python/llm/src/bigdl

* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
Renamed from python/llm/src/bigdl/llm/transformers/models/utils.py