ipex-llm/python/llm/src/ipex_llm/ggml
Ruonan Wang 49ab8974fa
[NPU] initial support of asym_int4_rtn (#12484)
* initiail support of q4_1

* fix

* fix

* update

* update min to Z1

* update

* fix

* update

* fix style

* fix

* support qwen2 optimize_model=True mp version

* temp save

* fix

* fix style

* replace min with zero

* support split linear for q4_1

* fix lm_head with mixed_precision=True

* fix style

* revert test code

* add down proj back for q4_0

* remove print
2024-12-05 17:40:36 +08:00
..
model Support imatrix-guided quantization for NPU CW (#12468) 2024-12-02 11:31:26 +08:00
__init__.py Refactor bigdl.llm to ipex_llm (#24) 2024-03-22 15:41:21 +08:00
convert.py Remove chatglm_C Module to Eliminate LGPL Dependency (#11178) 2024-05-31 17:03:11 +08:00
convert_model.py Remove chatglm_C Module to Eliminate LGPL Dependency (#11178) 2024-05-31 17:03:11 +08:00
quantize.py [NPU] initial support of asym_int4_rtn (#12484) 2024-12-05 17:40:36 +08:00