Commit graph

6 commits

Author SHA1 Message Date
Yuwen Hu
381d448ee2
[NPU] Example & Quickstart updates (#12650)
* Remove model with optimize_model=False in NPU verified models tables, and remove related example

* Remove experimental in run optimized model section title

* Unify model table order & example cmd

* Move embedding example to separate folder & update quickstart example link

* Add Quickstart reference in main NPU readme

* Small fix

* Small fix

* Move save/load examples under NPU/HF-Transformers-AutoModels

* Add low-bit and polish arguments for LLM Python examples

* Small fix

* Add low-bit and polish arguments for Multi-Model  examples

* Polish argument for Embedding models

* Polish argument for LLM CPP examples

* Add low-bit and polish argument for Save-Load examples

* Add accuracy tuning tips for examples

* Update NPU qucikstart accuracy tuning with low-bit optimizations

* Add save/load section to qucikstart

* Update CPP example sample output to EN

* Add installation regarding cmake for CPP examples

* Small fix

* Small fix

* Small fix

* Small fix

* Small fix

* Small fix

* Unify max prompt length to 512

* Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4

* Update based on comments

* Small fix
2025-01-07 13:52:41 +08:00
binbin Deng
6fc27da9c1
[NPU] Update glm-edge support in docs (#12529) 2024-12-12 11:14:09 +08:00
Yuwen Hu
aee9acb303
Add NPU QuickStart & update example links (#12470)
* Add initial NPU quickstart (c++ part unfinished)

* Small update

* Update based on comments

* Update main readme

* Remove LLaMA description

* Small fix

* Small fix

* Remove subsection link in main README

* Small fix

* Update based on comments

* Small fix

* TOC update and other small fixes

* Update for Chinese main readme

* Update based on comments and other small fixes

* Change order
2024-12-02 17:03:10 +08:00
SONG Ge
d2cbcb060c
Add initial support for modeling_xlm encoder on NPU (#12393)
* Add initial support for modeling_xlm encoder on NPU

* Add EmbeddingModel class to keep the same usage with bce and npu fp16 linear convert

* Optimize currently implementation to support EmbeddingModel.encode API and convert other torch modules to NPU

* Add related example and documents
2024-11-14 10:50:27 +08:00
SONG Ge
a7b66683f1
[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU (#12339)
* Add initial support for llama3.2-1b/3b

* move llama3.2 support into current llama_mp impl
2024-11-06 19:21:40 +08:00
SONG Ge
a0c6432899
[NPU] Add support for loading a FunASR model (#12073)
* add support for loading funasr model

* add initial support for paraformer-encoder

* add npu ops impl

* add encoder-decoder npu pipeline

* move paraformer encoders prefix 30 layers  to npu and keep the rest layers on cpu
2024-10-25 17:22:01 +08:00