ipex-llm/python/llm/example/NPU
Yuwen Hu ef4028ac2d
[NPU] Support split lm_head for Qwen2 with CPP (#12491)
* Use split for Qwen2 lm_head instead of slice in optimize_pre

* Support split lm_head in Qwen2 python cpp backend

* Fit with Python acc lib pipeline

* Remove default mixed_precision=True in all-in-one and related examples

* Small fix

* Style fix

* Fix based on comments

* Fix based on comments

* Style fix
2024-12-04 14:41:08 +08:00
HF-Transformers-AutoModels [NPU] Support split lm_head for Qwen2 with CPP (#12491) 2024-12-04 14:41:08 +08:00