4b6c3160be · Ruonan Wang · 2024-12-02 11:31:26 +08:00
Support imatrix-guided quantization for NPU CW (#12468)
* init commit
* remove print
* add interface
* fix
* fix
* fix style

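Background for the commit above: an importance matrix ("imatrix") is a per-channel statistic, typically the mean squared activation gathered from calibration runs, that tells the quantizer which channels' rounding errors matter most. A minimal collection sketch, assuming a PyTorch model with `torch.nn.Linear` layers; the hook and dict names here are illustrative, not ipex-llm's actual API:

```python
import torch

def collect_imatrix(model: torch.nn.Module, calib_batches) -> dict:
    """Accumulate per-channel mean squared inputs for every Linear layer."""
    sums, counts, handles = {}, {}, []

    def make_hook(name):
        def hook(module, inputs, output):
            # flatten to (n_tokens, n_channels) and accumulate squared inputs
            x = inputs[0].detach().float().reshape(-1, inputs[0].shape[-1])
            sums[name] = sums.get(name, 0) + (x ** 2).sum(dim=0)
            counts[name] = counts.get(name, 0) + x.shape[0]
        return hook

    for name, mod in model.named_modules():
        if isinstance(mod, torch.nn.Linear):
            handles.append(mod.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        for batch in calib_batches:   # a handful of representative inputs
            model(batch)
    for h in handles:
        h.remove()
    return {n: sums[n] / counts[n] for n in sums}
```

During quantization, the rounding error of each weight channel is then weighted by this statistic, so high-importance channels get the more faithful scales.
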
cf8eb7b128 · Zhao Changmin · 2024-07-01 13:45:07 +08:00
Init NPU quantize method and support q8_0_rtn (#11452)
* q8_0_rtn
* fix floating point

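For reference, q8_0_rtn is round-to-nearest (RTN) 8-bit quantization in 32-element blocks with one scale per block, following the GGUF Q8_0 convention. A numpy sketch of the math (the actual NPU kernel is more involved); it assumes the input length is a multiple of 32:

```python
import numpy as np

BLOCK = 32

def quantize_q8_0_rtn(x: np.ndarray):
    """Per-block abs-max scale, int8 codes; len(x) must be divisible by 32."""
    x = x.reshape(-1, BLOCK)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0
    scale[scale == 0] = 1.0   # avoid div-by-zero on all-zero blocks
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_q8_0(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scale).reshape(-1)
```
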
0af0102e61 · Yina Chen · 2024-06-14 18:46:52 +08:00
Add quantization scale search switch (#11326)
* add scale_search switch
* remove llama3 instruct
* remove print

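The switch added here chooses between plain abs-max scaling and a small search over nearby scales, keeping whichever gives the lowest reconstruction error. A sketch of the idea; `scale_search` mirrors the switch name from the commit, but the function itself is illustrative:

```python
import numpy as np

def quantize_block(x: np.ndarray, n_bits: int = 4, scale_search: bool = False):
    """Round-to-nearest with optional grid search around the abs-max scale."""
    qmax = 2 ** (n_bits - 1) - 1
    base = np.abs(x).max() / qmax
    if base == 0:
        return np.zeros_like(x, dtype=np.int8), 1.0
    candidates = base * np.linspace(0.85, 1.15, 13) if scale_search \
        else np.array([base])
    best_q, best_scale, best_err = None, None, np.inf
    for s in candidates:
        q = np.clip(np.round(x / s), -qmax - 1, qmax)
        err = ((x - q * s) ** 2).sum()   # plain MSE; imatrix would weight this
        if err < best_err:
            best_q, best_scale, best_err = q, s, err
    return best_q.astype(np.int8), float(best_scale)
```

The search buys a small accuracy gain at extra quantization-time cost, which is presumably why it is opt-in.
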
ed67435491 · Yina Chen · 2024-06-05 17:34:36 +08:00
Support fp6_k in ipex-llm (#11222)
* support fp6_k
* support fp6_k
* remove
* fix style

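fp6 stores each weight in six bits. Assuming a sign + 3-bit-exponent + 2-bit-mantissa (e3m2) layout with bias 3 — the real fp6_k format may differ, and the `_k` suffix suggests k-quant-style grouped scales on top — rounding to the nearest representable value can be sketched by enumerating the whole grid:

```python
import numpy as np

def fp6_e3m2_values(bias: int = 3) -> np.ndarray:
    """All values representable in an assumed s1e3m2 format (no inf/nan)."""
    vals = []
    for e in range(8):
        for m in range(4):
            if e == 0:   # subnormals
                mag = (m / 4.0) * 2.0 ** (1 - bias)
            else:        # normals: (1 + m/4) * 2^(e - bias)
                mag = (1 + m / 4.0) * 2.0 ** (e - bias)
            vals.extend([mag, -mag])
    return np.unique(np.array(vals))

FP6_GRID = fp6_e3m2_values()

def round_to_fp6(x: np.ndarray) -> np.ndarray:
    """Nearest-grid-point rounding (brute force; fine for a sketch)."""
    idx = np.abs(x[..., None] - FP6_GRID).argmin(axis=-1)
    return FP6_GRID[idx]
```
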
401013a630 · Shaojun Liu · 2024-05-31 17:03:11 +08:00
Remove chatglm_C Module to Eliminate LGPL Dependency (#11178)
* remove chatglm_C.**.pyd to resolve the LGPL weak-copyleft issue
* fix style check error
* remove chatglm native int4 from langchain

f1156e6b20 · Ruonan Wang · 2024-05-17 14:30:09 +08:00
support gguf_q4k_m / gguf_q4k_s (#10887)
* initial commit
* update
* fix style
* fix style
* add gguf_q4k_s
* update comment
* fix

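At the user level, these gguf-style types are selected through ipex-llm's `load_in_low_bit` argument. Whether the exact option strings below are accepted depends on the installed version, so treat this as a sketch rather than a guaranteed contract:

```python
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",     # any HF checkpoint id or local directory
    load_in_low_bit="gguf_q4k_m",   # or "gguf_q4k_s", per this commit's title
    trust_remote_code=True,
)
```

The same knob covers the other precisions in this history (e.g. "q4_k", "q6_k", "fp6"), subject to version support.
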
893197434d · Yina Chen · 2024-05-14 16:31:44 +08:00
Add fp6 support on gpu (#11008)
* add fp6 support
* fix style

8796401b08 · Yina Chen · 2024-04-18 18:55:28 +08:00
Support q4k in ipex-llm (#10796)
* support q4k
* update

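q4k follows llama.cpp's k-quant design: 256-element super-blocks split into eight 32-element sub-blocks, each holding 4-bit codes plus a small scale/min that is itself quantized against one floating-point scale per super-block. A numpy sketch of that two-level structure (bit packing omitted, and mins kept signed for simplicity, unlike the real GGUF layout):

```python
import numpy as np

SUB, NSUB = 32, 8   # 8 sub-blocks of 32 -> 256-element super-block

def quantize_q4k_like(x: np.ndarray):
    """len(x) must be divisible by 256; returns codes plus two-level scales."""
    x = x.reshape(-1, NSUB, SUB)
    xmin = x.min(axis=-1)                       # per-sub-block min
    scale = (x.max(axis=-1) - xmin) / 15.0      # per-sub-block scale
    scale[scale == 0] = 1.0
    q = np.clip(np.round((x - xmin[..., None]) / scale[..., None]), 0, 15)
    # second level: sub-block scales/mins become 6-bit ints against one
    # floating-point scale per super-block
    d = scale.max(axis=-1, keepdims=True) / 63.0
    dmin = np.abs(xmin).max(axis=-1, keepdims=True) / 63.0
    d[d == 0] = 1.0
    dmin[dmin == 0] = 1.0
    qs = np.round(scale / d).astype(np.uint8)   # 6-bit sub-scales
    qm = np.round(xmin / dmin).astype(np.int8)  # 6-bit sub-mins (signed here)
    return q.astype(np.uint8), qs, d, qm, dmin
```
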
0e8aac19e3 · Ruonan Wang · 2024-04-18 16:52:09 +08:00
add q6k precision in ipex-llm (#10792)
* add q6k
* add initial 16k
* update
* fix style

0136fad1d4 · Ruonan Wang · 2024-03-29 09:43:55 +08:00
LLM: support iq1_s (#10564)
* init version
* update utils
* remove unused code

9df70d95eb · Wang, Jian4 · 2024-03-22 15:41:21 +08:00
Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* change `from bigdl.llm` imports to `from ipex_llm`

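Practically, this refactor means every `bigdl.llm` import becomes `ipex_llm`, with module paths otherwise unchanged:

```python
# Old (bigdl-llm), removed by this refactor:
#   from bigdl.llm.transformers import AutoModelForCausalLM
#   from bigdl.llm import optimize_model
# New (ipex-llm): the same modules under the ipex_llm package
from ipex_llm.transformers import AutoModelForCausalLM
from ipex_llm import optimize_model
```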