- Use split for Qwen2 `lm_head` instead of slice in `optimize_pre`
- Support split `lm_head` in Qwen2 Python cpp backend
- Fit with Python acc lib pipeline
- Removed default `mixed_precision=True` in all-in-one and related examples
- Small fix
- Style fix
- Fix based on comments
- Fix based on comments
- Style fix

| Name |
|---|
| cli |
| ggml |
| gptq |
| langchain |
| llamaindex |
| serving |
| transformers |
| utils |
| vllm |
| __init__.py |
| convert_model.py |
| format.sh |
| llm_patching.py |
| models.py |
| optimize.py |