Author | Commit | Message | Date

Xin Qiu | 39c9d1de52 | fix code geex (#12261) | 2024-10-24 14:34:01 +08:00

Yina Chen | 3cd4e87168 | Support compress KV with quantize KV (#11812) | 2024-08-19 15:32:32 +08:00
* update llama
* support llama 4.41
* fix style
* support minicpm
* support qwen2
* support minicpm & update
* support chatglm4
* support chatglm
* remove print
* add DynamicCompressFp8Cache & support qwen
* support llama
* support minicpm phi3
* update chatglm2/4
* small fix & support qwen 4.42
* remove print

Yina Chen | 7cd6ec9723 | MiniCPM-V support compresskv (#11779) | 2024-08-13 19:03:40 +08:00
* fix check error
* fix other models
* remove print

Yina Chen | 841dbcdf3a | Fix compresskv with lookahead issue (#11767) | 2024-08-12 18:53:55 +08:00
* fix compresskv + lookahead attn_mask qwen2
* support llama chatglm
* support mistral & chatglm
* address comments
* revert run.py

Yina Chen | 4b9c57cc60 | Support compress kv with lookahead (#11752) | 2024-08-09 17:39:57 +08:00
* support compress kv with lookahead
* enough kv miss param

Yina Chen | a71ae7c22b | Support minicpm compresskv & modify default compresskv config & default enable compresskv on mtl 2.5k~4.5k (#11726) | 2024-08-07 11:35:39 +08:00
* support minicpm & modify default & default enable on mtl 2.5k~4.5k
* fix style

Yina Chen | 45c730ff39 | Chatglm support compresskv (#11690) | 2024-08-01 18:20:20 +08:00
* chatglm4 support compresskv
* fix
* fix style
* support chatglm2
* fix quantkv conflict
* fix style

binbin Deng | 9274282ef7 | Support pipeline parallel for glm-4-9b-chat (#11463) | 2024-07-03 14:25:28 +08:00

binbin Deng | 4ba82191f2 | Support PP inference for chatglm3 (#11375) | 2024-06-21 09:59:01 +08:00

Yishuo Wang | e8dd8e97ef | fix chatglm lookahead on ARC (#11320) | 2024-06-14 16:26:11 +08:00

Yishuo Wang | 7f65836cb9 | fix chatglm2/3-32k/128k fp16 (#11311) | 2024-06-14 09:58:07 +08:00

Yishuo Wang | 5e25766855 | fix and optimize chatglm2-32k and chatglm3-128k (#11306) | 2024-06-13 17:37:58 +08:00

Yishuo Wang | a24666b8f3 | fix chatglm3-6b-32k (#11303) | 2024-06-13 16:01:34 +08:00

Yishuo Wang | 01fe0fc1a2 | refactor chatglm2/3 (#11290) | 2024-06-13 12:22:58 +08:00

Xin Qiu | 592f7aa61e | Refine glm1-4 sdp (#11276) | 2024-06-12 17:11:56 +08:00
* chatglm
* update
* update
* change chatglm
* update sdpa
* update
* fix style
* fix
* fix glm
* update glm2-32k
* update glm2-32k
* fix cpu
* update
* change lower_bound

Yina Chen | b6b70d1ba0 | Divide core-xe packages (#11131) | 2024-05-28 12:00:18 +08:00
* temp
* add batch
* fix style
* update package name
* fix style
* add workflow
* use temp version to run uts
* trigger performance test
* trigger win igpu perf
* revert workflow & setup

Yishuo Wang | 59df750326 | Use new sdp again (#11025) | 2024-05-16 09:33:34 +08:00

Cengguang Zhang | 75dbf240ec | LLM: update split tensor conditions. (#10872) | 2024-04-30 17:07:21 +08:00
* LLM: update split tensor condition.
* add cond for split tensor.
* update priority of env.
* fix style.
* update env name.

Yishuo Wang | d884c62dc4 | remove new_layout parameter (#10906) | 2024-04-29 10:31:50 +08:00

Cengguang Zhang | 9752ffe979 | LLM: update split qkv native sdp. (#10895) | 2024-04-26 18:47:35 +08:00
* LLM: update split qkv native sdp.
* fix typo.

Cengguang Zhang | 763413b7e1 | LLM: support llama split tensor for long context in transformers>=4.36. (#10844) | 2024-04-23 16:13:25 +08:00
* LLm: support llama split tensor for long context in transformers>=4.36.
* fix dtype.
* fix style.
* fix style.
* fix style.
* fix style.
* fix dtype.
* fix style.

Yishuo Wang | 08458b4f74 | remove rms norm copy (#10793) | 2024-04-19 13:57:48 +08:00

Cengguang Zhang | 3e2662c87e | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00

Cengguang Zhang | 4b024b7aac | LLM: optimize chatglm2 8k input. (#10723) | 2024-04-10 16:59:06 +08:00
* LLM: optimize chatglm2 8k input.
* rename.

Keyan (Kyrie) Zhang | 585c174e92 | Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables (#10707) | 2024-04-10 10:48:46 +08:00
* Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables.
* Fix style

Cengguang Zhang | 1a9b8204a4 | LLM: support int4 fp16 chatglm2-6b 8k input. (#10648) | 2024-04-07 09:39:21 +08:00

Shaojun Liu | a10f5a1b8d | add python style check (#10620) | 2024-04-02 16:17:56 +08:00
* add python style check
* fix style checks
* update runner
* add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow
* update tag to 2.1.0-SNAPSHOT

ZehuaCao | 52a2135d83 | Replace ipex with ipex-llm (#10554) | 2024-03-28 13:54:40 +08:00
* fix ipex with ipex_llm
* fix ipex with ipex_llm
* update
* update
* update
* update
* update
* update
* update
* update

Yishuo Wang | 69a28d6b4c | fix chatglm (#10540) | 2024-03-26 16:01:00 +08:00

Wang, Jian4 | 9df70d95eb | Refactor bigdl.llm to ipex_llm (#24) | 2024-03-22 15:41:21 +08:00
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* from bigdl.llm to from ipex_llm