230 commits

Author SHA1 Message Date
Yishuo Wang
47e0b83cbf
optimize sd 1.5 (#12119) 2024-09-25 15:45:13 +08:00
Yishuo Wang
5d63aef60b
optimize qwen2 vl again (#12109) 2024-09-23 13:22:01 +08:00
Yishuo Wang
9239fd4f12
add basic support and optimization for qwen2-vl (#12104) 2024-09-20 17:23:06 +08:00
Yishuo Wang
d8c044e79d
optimize minicpm3 kv cache (#12052) 2024-09-10 16:51:21 +08:00
Yishuo Wang
abc370728c
optimize minicpm3 again (#12047) 2024-09-10 14:19:57 +08:00
Yishuo Wang
048b4590aa
add basic minicpm3 optimization (#12039) 2024-09-09 17:25:08 +08:00
Yishuo Wang
6cedb601e4
remove some useless code (#12035) 2024-09-06 17:51:08 +08:00
Guoqiong Song
8803242f5c
fix llama on cpu (#12018) 2024-09-04 19:17:54 -07:00
Yuwen Hu
a9e485eb1b
Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer (#11963)
* Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer

* Style fixes
2024-08-29 19:22:09 +08:00
Yishuo Wang
0fbb10259a
use sdp_causal to reduce internvl2-4b memory usage if set environment variable (#11953) 2024-08-28 17:35:05 +08:00
hxsz1997
650e6e6ce4
Merge pull request #11891 from hxsz1997/baichuan2-compresskv
Add compress_kv for Baichuan2
2024-08-23 06:09:58 +03:00
Ruonan Wang
4a61f7d20d
update mlp of llama (#11897)
* update mlp of llama

* relax threshold of mlp test

* revert code
2024-08-22 20:34:53 +08:00
Huang, Xinshengzi
eb1e65f8a9
add comment 2024-08-22 15:14:47 +08:00
Huang, Xinshengzi
a2be3d7501
add comment of compress kv in attention forward 2024-08-22 15:11:55 +08:00
Huang, Xinshengzi
ce7de77085
add comment of change in model forward 2024-08-22 14:29:27 +08:00
Huang, Xinshengzi
42398a0045
add comment 2024-08-22 13:17:13 +08:00
Huang, Xinshengzi
48a827aa07
fix typos 2024-08-22 11:35:47 +08:00
Huang, Xinshengzi
8a5df93de2
fix typos 2024-08-22 11:33:07 +08:00
Huang, Xinshengzi
01ed397e7a
fix typos 2024-08-22 11:31:25 +08:00
Huang, Xinshengzi
c6ed1c412d
fix typos 2024-08-22 11:26:49 +08:00
Huang, Xinshengzi
2a0aa9271b
fix typos 2024-08-22 11:23:22 +08:00
Huang, Xinshengzi
4adadddbbc
fix typos 2024-08-22 11:12:23 +08:00
Huang, Xinshengzi
6a5ca17afc
fix typoes 2024-08-22 11:09:58 +08:00
Huang, Xinshengzi
6bb9035788
fix typos 2024-08-22 11:08:48 +08:00
Huang, Xinshengzi
86248b0505
add compress_kv for baichuan2 2024-08-22 10:59:08 +08:00
Yishuo Wang
bd1e490d62
fix phi3 (#11878) 2024-08-21 10:31:41 +08:00
Yina Chen
c3c058373f
Update compresskv model forward type logic (#11868)
* update

* fix
2024-08-20 18:11:37 +08:00
Yishuo Wang
d4ee0a89f3
optimize phi3 memory usage (#11867) 2024-08-20 17:32:51 +08:00
Yishuo Wang
2946420e14
add minicpmv 2.6 load_low_bit workaround (#11856) 2024-08-20 11:16:02 +08:00
Yishuo Wang
9490781aec
optimize phi3 memory usage again (#11848) 2024-08-19 17:26:59 +08:00
Yina Chen
3cd4e87168
Support compress KV with quantize KV (#11812)
* update llama

* support llama 4.41

* fix style

* support minicpm

* support qwen2

* support minicpm & update

* support chatglm4

* support chatglm

* remove print

* add DynamicCompressFp8Cache & support qwen

* support llama

* support minicpm phi3

* update chatglm2/4

* small fix & support qwen 4.42

* remove print
2024-08-19 15:32:32 +08:00
Yishuo Wang
17a0beb21f
optimize qwen2-audio again (#11825) 2024-08-16 11:11:35 +08:00
Yuwen Hu
9e9086cc2a
Update IPEX_LLM_PERFORMANCE_MODE (#11823) 2024-08-16 09:48:36 +08:00
Yishuo Wang
750d4ad5dc
fix minicpm-v-2 fp16 (#11819) 2024-08-15 18:34:40 +08:00
Yishuo Wang
828ab16537
fix phi3 and minicpmv cpu (#11818) 2024-08-15 17:43:29 +08:00
Yishuo Wang
4e178f0c5d
rewrite minicpmv optimization (#11816) 2024-08-15 17:27:12 +08:00
Yishuo Wang
07b7f13982
support and optimize qwen2-audio (#11809) 2024-08-15 14:59:04 +08:00
Yishuo Wang
9a93808fc5
fix and optimize minicpm v 2 (#11799) 2024-08-14 17:27:23 +08:00
Yishuo Wang
3d6cfa291d
optimize minicpm v 2.5 (#11793) 2024-08-14 16:07:24 +08:00
Ruonan Wang
43cca3be27
fix gemma2 runtime error caused by sliding window (#11788)
* fix runtime error

* revert workflow
2024-08-14 10:43:33 +08:00
Yina Chen
7cd6ec9723
MiniCPM-V support compresskv (#11779)
* fix check error

* fix other models

* remove print
2024-08-13 19:03:40 +08:00
Qiyuan Gong
3998de14f0
Fix mistral forward_qkv in q4_0 (#11781)
* Fix mistral forward_qkv without self.rotary_emb.base in q4_0.
* Replace apply_rotary_pos_emb_no_cache_xpu with rotary_half_inplaced.
* Revert https://github.com/intel-analytics/ipex-llm/pull/11765
2024-08-13 16:48:19 +08:00
Yishuo Wang
a184b120c9
fix minicpm-v 2.5 (#11780) 2024-08-13 16:14:00 +08:00
Qiyuan Gong
a88c132e54
Reduce Mistral softmax memory only in low memory mode (#11775)
* Reduce Mistral softmax memory only in low memory mode
2024-08-13 14:50:54 +08:00
Yishuo Wang
aa861df066
use new fp32 softmax kernel (#11776) 2024-08-13 14:48:11 +08:00
Yishuo Wang
a1eb793f70
optimize minicpm v 2_6 firs token perf (#11770) 2024-08-13 09:51:18 +08:00
Yina Chen
841dbcdf3a
Fix compresskv with lookahead issue (#11767)
* fix compresskv + lookahead attn_mask qwen2

* support llama chatglm

* support mistral & chatglm

* address comments

* revert run.py
2024-08-12 18:53:55 +08:00
Xu, Shuo
1b05caba2b
Set mistral fuse rope to false except fp6 & fp16 (#11765)
* set mistral fuse rope to false except fp6 & fp16

* lint

* lint

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-08-12 17:25:07 +08:00
Yishuo Wang
57d177738d
optimize minicpm-v-2_6 repetition penalty (#11763) 2024-08-12 14:10:10 +08:00
Yina Chen
4b9c57cc60
Support compress kv with lookahead (#11752)
* support compress kv with lookahead

* enough kv miss param
2024-08-09 17:39:57 +08:00