ipex-llm/python/llm/src/ipex_llm
Latest commit ef4028ac2d by Yuwen Hu (2024-12-04 14:41:08 +08:00):
[NPU] Support split lm_head for Qwen2 with CPP (#12491)
* Use split for Qwen2 lm_head instead of slice in optimize_pre (see the sketch after the commit message)

* Support split lm_head in the Qwen2 Python C++ backend

* Fit with Python acc lib pipeline

* Removed default mixed_precision=True in all-in-one and related examples

* Small fix

* Style fix

* Fix based on comments

* Fix based on comments

* Style fix
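The bullets above describe the substantive change in #12491: in optimize_pre, the Qwen2 lm_head weight is partitioned along the vocab (output) dimension with a split rather than carved out slice by slice, so the large logits matmul can run as several smaller ones on the NPU. Below is a minimal PyTorch sketch of that idea, not the actual ipex-llm code; `split_lm_head` and `split_count` are hypothetical names used only for illustration.

```python
import torch
import torch.nn as nn

def split_lm_head(lm_head: nn.Linear, split_count: int) -> nn.ModuleList:
    """Partition an lm_head along the vocab (output) dimension."""
    out_features, in_features = lm_head.weight.shape
    assert out_features % split_count == 0, "vocab size must divide evenly"
    chunk = out_features // split_count
    # torch.split yields all partitions in one call, rather than
    # building each part with an explicit slice of the weight.
    parts = torch.split(lm_head.weight.data, chunk, dim=0)
    heads = nn.ModuleList()
    for part in parts:
        head = nn.Linear(in_features, chunk, bias=False)
        head.weight.data = part.contiguous()
        heads.append(head)
    return heads

# Concatenating the per-part logits reproduces the original output.
lm_head = nn.Linear(1024, 8000, bias=False)
heads = split_lm_head(lm_head, split_count=2)
x = torch.randn(1, 1024)
ref = lm_head(x)
out = torch.cat([h(x) for h in heads], dim=-1)
assert torch.allclose(ref, out, atol=1e-5)
```

The assertion at the end checks that splitting preserves the logits; how the parts are actually scheduled on the NPU (sequentially or pipelined) is left to the backend.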
Name               Last commit message                                             Last commit date
cli/               Refactor bigdl.llm to ipex_llm (#24)                            2024-03-22 15:41:21 +08:00
ggml/              Support imatrix-guided quantization for NPU CW (#12468)         2024-12-02 11:31:26 +08:00
gptq/              Refactor bigdl.llm to ipex_llm (#24)                            2024-03-22 15:41:21 +08:00
langchain/         Remove chatglm_C Module to Eliminate LGPL Dependency (#11178)   2024-05-31 17:03:11 +08:00
llamaindex/        Llamaindex: add tokenizer_id and support chat (#10590)          2024-04-07 13:51:34 +08:00
serving/           Upgrade to vllm 0.6.2 (#12338)                                  2024-11-12 20:35:34 +08:00
transformers/      [NPU] Support split lm_head for Qwen2 with CPP (#12491)         2024-12-04 14:41:08 +08:00
utils/             fix ipex 2.3 bug (#12366)                                       2024-11-08 13:29:15 +08:00
vllm/              add vLLM glm4 fix (#12474)                                      2024-12-02 14:05:16 +08:00
__init__.py        IPEX Duplicate importer V2 (#11310)                             2024-06-19 16:29:19 +08:00
convert_model.py   Refactor bigdl.llm to ipex_llm (#24)                            2024-03-22 15:41:21 +08:00
format.sh          Refactor bigdl.llm to ipex_llm (#24)                            2024-03-22 15:41:21 +08:00
llm_patching.py    Upgrade Peft version to 0.10.0 for LLM finetune (#10886)        2024-05-07 15:09:14 +08:00
models.py          Remove chatglm_C Module to Eliminate LGPL Dependency (#11178)   2024-05-31 17:03:11 +08:00
optimize.py        fix and optimize sd (#12436)                                    2024-11-25 14:09:48 +08:00