Yishuo Wang
c72a5db757
remove unused code again ( #12624 )
2024-12-27 14:17:11 +08:00
Zhao Changmin
546f455e8e
Patch sdpa check function in specific module attributes table ( #12285 )
2024-10-29 18:41:09 +08:00
Yishuo Wang
cacc891962
Fix PR validation ( #12253 )
2024-10-23 18:10:47 +08:00
Yishuo Wang
578aef245d
Fix models auto choose SdpaAttention with ipex 2.3 ( #12252 )
2024-10-23 15:33:45 +08:00
Yina Chen
ec465fbcd7
Add lookup generate in load_low_bit ( #12243 )
...
* add lookup generate in load_low_bit
* update comment
2024-10-22 15:51:52 +08:00
Yuwen Hu
96796f95cb
Update all-in-one benchmark prompts for continuation task & lookup update for minicpmv ( #11827 )
...
* Update all-in-one benchmark prompts for continuation task
* Small fix
* Add pure-text benchmark support for minicpm-v-2_6
* Support lookahead for model.llm generate of minicpmv
* Add prompt reference
* Small update
* Small fix
2024-08-16 17:16:35 +08:00
Xiangyu Tian
d27a8cd08c
Fix Pipeline Parallel dtype ( #11623 )
2024-07-19 13:07:40 +08:00
Yishuo Wang
d020ad6397
add save_low_bit support for DiskEmbedding ( #11621 )
2024-07-19 10:34:53 +08:00
Yishuo Wang
0209427cf4
Add disk_embedding parameter to support putting the Embedding layer on CPU ( #11617 )
2024-07-18 17:06:06 +08:00
Zhao Changmin
f7e957aaf9
Clean npu dtype branch ( #11515 )
...
* clean branch
* create_npu_kernels
2024-07-05 15:45:26 +08:00
SONG Ge
a414e3ff8a
add pipeline parallel support with load_low_bit ( #11414 )
2024-06-28 10:17:56 +08:00
binbin Deng
220151e2a1
Refactor pipeline parallel multi-stage implementation ( #11286 )
2024-06-13 10:00:23 +08:00
Yina Chen
ed67435491
Support Fp6 k in ipex-llm ( #11222 )
...
* support fp6_k
* support fp6_k
* remove
* fix style
2024-06-05 17:34:36 +08:00
SONG Ge
33852bd23e
Refactor pipeline parallel device config ( #11149 )
...
* refactor pipeline parallel device config
* meet comments
* update example
* add warnings and update code doc
2024-05-28 16:52:46 +08:00
Yishuo Wang
1dc680341b
fix phi-3-vision import ( #11129 )
2024-05-24 15:57:15 +08:00
Ruonan Wang
f1156e6b20
support gguf_q4k_m / gguf_q4k_s ( #10887 )
...
* initial commit
* UPDATE
* fix style
* fix style
* add gguf_q4k_s
* update comment
* fix
2024-05-17 14:30:09 +08:00
Yina Chen
893197434d
Add fp6 support on gpu ( #11008 )
...
* add fp6 support
* fix style
2024-05-14 16:31:44 +08:00
Zhao Changmin
0d6e12036f
Disable fast_init_ in load_low_bit ( #10945 )
...
* fast_init_ disable
2024-05-08 10:46:19 +08:00
Yang Wang
1ce8d7bcd9
Support the desc_act feature in GPTQ model ( #10851 )
...
* support act_order
* update versions
* fix style
* fix bug
* clean up
2024-04-24 10:17:13 -07:00
Ruonan Wang
439c834ed3
LLM: add mixed precision for lm_head ( #10795 )
...
* add mixed_quantization
* meet code review
* update
* fix style
* meet review
2024-04-18 19:11:31 +08:00
Yina Chen
8796401b08
Support q4k in ipex-llm ( #10796 )
...
* support q4k
* update
2024-04-18 18:55:28 +08:00
Ruonan Wang
0e8aac19e3
add q6k precision in ipex-llm ( #10792 )
...
* add q6k
* add initial 16k
* update
* fix style
2024-04-18 16:52:09 +08:00
Yina Chen
899d392e2f
Support prompt lookup in ipex-llm ( #10768 )
...
* lookup init
* add lookup
* fix style
* remove redundant code
* change param name
* fix style
2024-04-16 16:52:38 +08:00
Zhicun
b4147a97bb
Fix dtype mismatch error ( #10609 )
...
* fix llama
* fix
* fix code style
* add torch type in model.py
---------
Co-authored-by: arda <arda@arda-arc19.sh.intel.com>
2024-04-09 17:50:33 +08:00
Shaojun Liu
a10f5a1b8d
add python style check ( #10620 )
...
* add python style check
* fix style checks
* update runner
* add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow
* update tag to 2.1.0-SNAPSHOT
2024-04-02 16:17:56 +08:00
Ruonan Wang
0136fad1d4
LLM: support iq1_s ( #10564 )
...
* init version
* update utils
* remove unused code
2024-03-29 09:43:55 +08:00
Wang, Jian4
9df70d95eb
Refactor bigdl.llm to ipex_llm ( #24 )
...
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00