ipex-llm

History

Yuwen Hu ef4028ac2d [NPU] Support split `lm_head` for Qwen2 with CPP (#12491)	2024-12-04 14:41:08 +08:00

* Use split for Qwen2 lm_head instead of slice in optimize_pre
* Support split lm_head in Qwen2 python cpp backend
* Fit with Python acc lib pipeline
* Removed default mixed_precision=True in all-in-one and related examples
* Small fix
* Style fix
* Fix based on comments
* Fix based on comments
* Style fix
..
HF-Transformers-AutoModels	[NPU] Support split `lm_head` for Qwen2 with CPP (#12491)	2024-12-04 14:41:08 +08:00

