Yina Chen
|
f24352aef9
|
llama 3.1/3.2 support compresskv (#12347)
* llama 3.1/3.2 support compresskv
* update
* fix transformers 4.45 error
* fix style
* fix typo
* disable llama3.2 1b compresskv
|
2024-11-06 17:33:43 +08:00 |
|
Yishuo Wang
|
584c3489e7
|
add basic support for llama3.2 (#12125)
|
2024-09-26 15:46:19 +08:00 |
|
Yishuo Wang
|
d4ee0a89f3
|
optimize phi3 memory usage (#11867)
|
2024-08-20 17:32:51 +08:00 |
|
Yina Chen
|
3cd4e87168
|
Support compress KV with quantize KV (#11812)
* update llama
* support llama 4.41
* fix style
* support minicpm
* support qwen2
* support minicpm & update
* support chatglm4
* support chatglm
* remove print
* add DynamicCompressFp8Cache & support qwen
* support llama
* support minicpm phi3
* update chatglm2/4
* small fix & support qwen 4.42
* remove print
|
2024-08-19 15:32:32 +08:00 |
|
Yina Chen
|
dd46c141bd
|
Phi3 support compresskv (#11733)
* phi3 support compresskv
* fix phi3 mtl error
* fix conflict with quant kv
* fix abnormal on mtl
* fix style
* use slide windows size to compress kv
* support sliding window
* fix style
* fix style
* temp: partial support quant kv
* support quant kv with compress kv, todo: model check
* temp
* fix style
* fix style
* remove prepare
* address comment
* default -> 1.8k
|
2024-08-09 15:43:43 +08:00 |
|
Yina Chen
|
d2abc9711b
|
Fix MTL 4k input qwen2 compresskv error (#11734)
* fix
* fix style
|
2024-08-07 16:21:57 +08:00 |
|
Yina Chen
|
a71ae7c22b
|
Support minicpm compresskv & modify default compresskv config & default enable compresskv on mtl 2.5k~4.5k (#11726)
* support minicpm & modify default & default enable on mtl 2.5k~4.5k
* fix style
|
2024-08-07 11:35:39 +08:00 |
|
Yina Chen
|
670ad887fc
|
Qwen support compress kv (#11680)
* Qwen support compress kv
* fix style
* fix
|
2024-07-30 11:16:42 +08:00 |
|
Yina Chen
|
fc7f8feb83
|
Support compress kv (#11642)
* mistral snapkv
* update
* mtl update
* update
* update
* update
* add comments
* style fix
* fix style
* support llama
* llama use compress kv
* support mistral 4.40
* fix style
* support diff transformers versions
* move snapkv util to kv
* fix style
* meet comments & small fix
* revert all in one
* fix indent
---------
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
|
2024-07-26 16:02:00 +08:00 |
|
Yishuo Wang
|
d884c62dc4
|
remove new_layout parameter (#10906)
|
2024-04-29 10:31:50 +08:00 |
|
Yishuo Wang
|
702e686901
|
optimize starcoder normal kv cache (#10642)
|
2024-04-03 15:27:02 +08:00 |
|
Yishuo Wang
|
ba8cc6bd68
|
optimize starcoder2-3b (#10625)
|
2024-04-02 17:16:29 +08:00 |
|
Wang, Jian4
|
9df70d95eb
|
Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* from bigdl.llm to from ipex_llm
|
2024-03-22 15:41:21 +08:00 |
|