Yuwen Hu
|
fd384ddfb8
|
Optimize StableLM (#10619)
* Initial commit for stablelm optimizations
* Small style fix
* add dependency
* Add mlp optimizations
* Small fix
* add attention forward
* Remove quantize kv for now as head_dim=80
* Add merged qkv
* fix lisence
* Python style fix
---------
Co-authored-by: qiuxin2012 <qiuxin2012cs@gmail.com>
|
2024-04-02 18:58:38 +08:00 |
|
Shaojun Liu
|
a10f5a1b8d
|
add python style check (#10620)
* add python style check
* fix style checks
* update runner
* add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow
* update tag to 2.1.0-SNAPSHOT
|
2024-04-02 16:17:56 +08:00 |
|
Cengguang Zhang
|
58b57177e3
|
LLM: support bigdl quantize kv cache env and add warning. (#10623)
* LLM: support bigdl quantize kv cache env and add warnning.
* fix style.
* fix comments.
|
2024-04-02 15:41:08 +08:00 |
|
Cengguang Zhang
|
e567956121
|
LLM: add memory optimization for llama. (#10592)
* add initial memory optimization.
* fix logic.
* fix logic,
* remove env var check in mlp split.
|
2024-04-02 09:07:50 +08:00 |
|
Qiyuan Gong
|
f4537798c1
|
Enable kv cache quantization by default for flex when 1 < batch <= 8 (#10584)
* Enable kv cache quantization by default for flex when 1 < batch <= 8.
* Change up bound from <8 to <=8.
|
2024-03-29 09:43:42 +08:00 |
|
Cengguang Zhang
|
b44f7adbad
|
LLM: Disable esimd sdp for PVC GPU when batch size>1 (#10579)
* llm: disable esimd sdp for pvc bz>1.
* fix logic.
* fix: avoid call get device name twice.
|
2024-03-28 22:55:48 +08:00 |
|
Ruonan Wang
|
ea4bc450c4
|
LLM: add esimd sdp for pvc (#10543)
* add esimd sdp for pvc
* update
* fix
* fix batch
|
2024-03-26 19:04:40 +08:00 |
|
Xin Qiu
|
1dd40b429c
|
enable fp4 fused mlp and qkv (#10531)
* enable fp4 fused mlp and qkv
* update qwen
* update qwen2
|
2024-03-26 08:34:00 +08:00 |
|
Wang, Jian4
|
9df70d95eb
|
Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* from bigdl.llm to from ipex_llm
|
2024-03-22 15:41:21 +08:00 |
|