* Use split for Qwen2 lm_head instead of slice in optimize_pre
* Support split lm_head in Qwen2 python cpp backend
* Fit with Python acc lib pipeline
* Removed default mixed_precision=True in all-in-one and related examples
* Small fix
* Style fix
* Fix based on comments
* Fix based on comments
* Style fix

| Name |
|---|
| CPU |
| GPU |
| NPU/HF-Transformers-AutoModels |