Commit graph

6 commits

Author SHA1 Message Date
Yuwen Hu
381d448ee2
[NPU] Example & Quickstart updates (#12650)
* Remove model with optimize_model=False in NPU verified models tables, and remove related example

* Remove experimental in run optimized model section title

* Unify model table order & example cmd

* Move embedding example to separate folder & update quickstart example link

* Add Quickstart reference in main NPU readme

* Small fix

* Small fix

* Move save/load examples under NPU/HF-Transformers-AutoModels

* Add low-bit and polish arguments for LLM Python examples

* Small fix

* Add low-bit and polish arguments for Multi-Model  examples

* Polish argument for Embedding models

* Polish argument for LLM CPP examples

* Add low-bit and polish argument for Save-Load examples

* Add accuracy tuning tips for examples

* Update NPU qucikstart accuracy tuning with low-bit optimizations

* Add save/load section to qucikstart

* Update CPP example sample output to EN

* Add installation regarding cmake for CPP examples

* Small fix

* Small fix

* Small fix

* Small fix

* Small fix

* Small fix

* Unify max prompt length to 512

* Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4

* Update based on comments

* Small fix
2025-01-07 13:52:41 +08:00
binbin Deng
ab01753b1c
[NPU] update save-load API usage (#12473) 2024-12-03 09:46:15 +08:00
binbin Deng
7a97fbb779
Support vpm and resampler module of minicpm-v on NPU (#12375) 2024-11-12 15:59:55 +08:00
Ruonan Wang
3fe2ea3081
[NPU] Reuse prefill of acc lib for pipeline (#12279)
* first commit

* update example

* fix style

* update example

* embedding as const

* fix generate

* code  refactor

* meet code review

* fix style

* change max_output_len to max_context_len

* fix all-in-one

* fix example

* add check for new tokens
2024-10-28 16:05:49 +08:00
Ruonan Wang
573c20bae6
fix npu lm_head cpu condition (#11976)
* fix

* fix

* fix

* fix stype

* fix style

* fix style
2024-08-30 17:11:26 +08:00
SONG Ge
158289d205
[NPU] Add initial support for minicpm-llama-v2.5 (#11962)
* add initial support for minicpm-llama-v2.5

* update impl

* add minicpm-llama3-v2.5 example
2024-08-30 16:00:33 +08:00