ipex-llm/python/llm
Ruonan Wang 081af41def
[NPU] Optimize Qwen2 lm_head to use INT4 (#12072)
* temp save

* update

* fix

* fix

* Split lm_head into 7 parts & remove int8 for lm_head when sym_int4

* Simplify and add condition to code

* Small fix

* refactor some code

* fix style

* fix style

* fix style

* fix

* fix

* temp save

* refactor

* fix style

* further refactor

* simplify code

* meet code review

* fix style

---------

Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-09-14 15:26:46 +08:00
Name | Last commit | Date
--- | --- | ---
dev | fix: textual and env variable adjustment (#12038) | 2024-09-11 13:38:01 +08:00
example | add lowbit_path for generate.py, fix npu_model (#12077) | 2024-09-13 17:28:05 +08:00
portable-zip | Fix null pointer dereferences error. (#11125) | 2024-05-30 16:16:10 +08:00
scripts | fix typo in python/llm/scripts/README.md (#11536) | 2024-07-09 09:53:14 +08:00
src/ipex_llm | [NPU] Optimize Qwen2 lm_head to use INT4 (#12072) | 2024-09-14 15:26:46 +08:00
test | fix UT (#12005) | 2024-09-04 18:02:49 +08:00
tpp | OSPDT: add tpp licenses (#11165) | 2024-06-06 10:59:06 +08:00
.gitignore | [LLM] add chatglm pybinding binary file release (#8677) | 2023-08-04 11:45:27 +08:00
setup.py | upgrade OneAPI version for cpp Windows (#12063) | 2024-09-12 11:12:12 +08:00
version.txt | Update pypi tag to 2.2.0.dev0 (#11895) | 2024-08-22 16:48:09 +08:00