| Author | Commit | Message | Date |
|---|---|---|---|
| Yishuo Wang | 535bee5381 | fix qwen2 vl again (#12174) | 2024-10-10 13:50:01 +08:00 |
| Yishuo Wang | 78d253165d | optimize qwen2 vl perf again (#12167) | 2024-10-09 16:43:48 +08:00 |
| Yishuo Wang | 644af2a76e | add basic llama 3.2 vision support (#12163) | 2024-10-08 10:46:48 +08:00 |
| Yishuo Wang | 669ff1a97b | fix sd1.5 (#12129) | 2024-09-26 17:15:16 +08:00 |
| Yishuo Wang | a266528719 | optimize llama 3.2 rope (#12128) | 2024-09-26 16:08:10 +08:00 |
| Yishuo Wang | 584c3489e7 | add basic support for llama3.2 (#12125) | 2024-09-26 15:46:19 +08:00 |
| Yishuo Wang | 66f419f8b7 | fix qwen2 vl (#12126) | 2024-09-26 15:44:02 +08:00 |
| Yishuo Wang | 47e0b83cbf | optimize sd 1.5 (#12119) | 2024-09-25 15:45:13 +08:00 |
| Yishuo Wang | 5d63aef60b | optimize qwen2 vl again (#12109) | 2024-09-23 13:22:01 +08:00 |
| Yishuo Wang | 9239fd4f12 | add basic support and optimization for qwen2-vl (#12104) | 2024-09-20 17:23:06 +08:00 |
| Yishuo Wang | d8c044e79d | optimize minicpm3 kv cache (#12052) | 2024-09-10 16:51:21 +08:00 |
| Yishuo Wang | abc370728c | optimize minicpm3 again (#12047) | 2024-09-10 14:19:57 +08:00 |
| Yishuo Wang | 048b4590aa | add basic minicpm3 optimization (#12039) | 2024-09-09 17:25:08 +08:00 |
| Yishuo Wang | 6cedb601e4 | remove some useless code (#12035) | 2024-09-06 17:51:08 +08:00 |
| Guoqiong Song | 8803242f5c | fix llama on cpu (#12018) | 2024-09-04 19:17:54 -07:00 |
| Yuwen Hu | a9e485eb1b | Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer (#11963): Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer; Style fixes | 2024-08-29 19:22:09 +08:00 |
| Yishuo Wang | 0fbb10259a | use sdp_causal to reduce internvl2-4b memory usage if set environment variable (#11953) | 2024-08-28 17:35:05 +08:00 |
| hxsz1997 | 650e6e6ce4 | Merge pull request #11891 from hxsz1997/baichuan2-compresskv: Add compress_kv for Baichuan2 | 2024-08-23 06:09:58 +03:00 |
| Ruonan Wang | 4a61f7d20d | update mlp of llama (#11897): update mlp of llama; relax threshold of mlp test; revert code | 2024-08-22 20:34:53 +08:00 |
| Huang, Xinshengzi | eb1e65f8a9 | add comment | 2024-08-22 15:14:47 +08:00 |
| Huang, Xinshengzi | a2be3d7501 | add comment of compress kv in attention forward | 2024-08-22 15:11:55 +08:00 |
| Huang, Xinshengzi | ce7de77085 | add comment of change in model forward | 2024-08-22 14:29:27 +08:00 |
| Huang, Xinshengzi | 42398a0045 | add comment | 2024-08-22 13:17:13 +08:00 |
| Huang, Xinshengzi | 48a827aa07 | fix typos | 2024-08-22 11:35:47 +08:00 |
| Huang, Xinshengzi | 8a5df93de2 | fix typos | 2024-08-22 11:33:07 +08:00 |
| Huang, Xinshengzi | 01ed397e7a | fix typos | 2024-08-22 11:31:25 +08:00 |
| Huang, Xinshengzi | c6ed1c412d | fix typos | 2024-08-22 11:26:49 +08:00 |
| Huang, Xinshengzi | 2a0aa9271b | fix typos | 2024-08-22 11:23:22 +08:00 |
| Huang, Xinshengzi | 4adadddbbc | fix typos | 2024-08-22 11:12:23 +08:00 |
| Huang, Xinshengzi | 6a5ca17afc | fix typoes | 2024-08-22 11:09:58 +08:00 |
| Huang, Xinshengzi | 6bb9035788 | fix typos | 2024-08-22 11:08:48 +08:00 |
| Huang, Xinshengzi | 86248b0505 | add compress_kv for baichuan2 | 2024-08-22 10:59:08 +08:00 |
| Yishuo Wang | bd1e490d62 | fix phi3 (#11878) | 2024-08-21 10:31:41 +08:00 |
| Yina Chen | c3c058373f | Update compresskv model forward type logic (#11868): update; fix | 2024-08-20 18:11:37 +08:00 |
| Yishuo Wang | d4ee0a89f3 | optimize phi3 memory usage (#11867) | 2024-08-20 17:32:51 +08:00 |
| Yishuo Wang | 2946420e14 | add minicpmv 2.6 load_low_bit workaround (#11856) | 2024-08-20 11:16:02 +08:00 |
| Yishuo Wang | 9490781aec | optimize phi3 memory usage again (#11848) | 2024-08-19 17:26:59 +08:00 |
| Yina Chen | 3cd4e87168 | Support compress KV with quantize KV (#11812): update llama; support llama 4.41; fix style; support minicpm; support qwen2; support minicpm & update; support chatglm4; support chatglm; remove print; add DynamicCompressFp8Cache & support qwen; support llama; support minicpm phi3; update chatglm2/4; small fix & support qwen 4.42; remove print | 2024-08-19 15:32:32 +08:00 |
| Yishuo Wang | 17a0beb21f | optimize qwen2-audio again (#11825) | 2024-08-16 11:11:35 +08:00 |
| Yuwen Hu | 9e9086cc2a | Update IPEX_LLM_PERFORMANCE_MODE (#11823) | 2024-08-16 09:48:36 +08:00 |
| Yishuo Wang | 750d4ad5dc | fix minicpm-v-2 fp16 (#11819) | 2024-08-15 18:34:40 +08:00 |
| Yishuo Wang | 828ab16537 | fix phi3 and minicpmv cpu (#11818) | 2024-08-15 17:43:29 +08:00 |
| Yishuo Wang | 4e178f0c5d | rewrite minicpmv optimization (#11816) | 2024-08-15 17:27:12 +08:00 |
| Yishuo Wang | 07b7f13982 | support and optimize qwen2-audio (#11809) | 2024-08-15 14:59:04 +08:00 |
| Yishuo Wang | 9a93808fc5 | fix and optimize minicpm v 2 (#11799) | 2024-08-14 17:27:23 +08:00 |
| Yishuo Wang | 3d6cfa291d | optimize minicpm v 2.5 (#11793) | 2024-08-14 16:07:24 +08:00 |
| Ruonan Wang | 43cca3be27 | fix gemma2 runtime error caused by sliding window (#11788): fix runtime error; revert workflow | 2024-08-14 10:43:33 +08:00 |
| Yina Chen | 7cd6ec9723 | MiniCPM-V support compresskv (#11779): fix check error; fix other models; remove print | 2024-08-13 19:03:40 +08:00 |
| Qiyuan Gong | 3998de14f0 | Fix mistral forward_qkv in q4_0 (#11781): Fix mistral forward_qkv without self.rotary_emb.base in q4_0; Replace apply_rotary_pos_emb_no_cache_xpu with rotary_half_inplaced; Revert https://github.com/intel-analytics/ipex-llm/pull/11765 | 2024-08-13 16:48:19 +08:00 |
| Yishuo Wang | a184b120c9 | fix minicpm-v 2.5 (#11780) | 2024-08-13 16:14:00 +08:00 |