IPEX-LLM Examples on Intel NPU
This folder contains examples of running IPEX-LLM on Intel NPU:
- LLM: examples of running large language models using IPEX-LLM optimizations
- CPP: examples of running large language models using IPEX-LLM optimizations through the C++ API
- Multimodal: examples of running large multimodal models using IPEX-LLM optimizations
- Embedding: examples of running embedding models using IPEX-LLM optimizations
- Save-Load: examples of saving and loading low-bit models with IPEX-LLM optimizations

Tip
Please refer to the IPEX-LLM NPU Quickstart for more information about running ipex-llm on Intel NPU.
Verified Models on Intel NPU
| Model | Example Link | 
|---|---|
| Llama2 | Python link, C++ link | 
| Llama3 | Python link, C++ link | 
| Llama3.2 | Python link, C++ link | 
| GLM-Edge | Python link | 
| Qwen2 | Python link, C++ link | 
| Qwen2.5 | Python link, C++ link | 
| MiniCPM | Python link, C++ link | 
| Baichuan2 | Python link | 
| MiniCPM-Llama3-V-2_5 | Python link | 
| MiniCPM-V-2_6 | Python link | 
| Speech_Paraformer-Large | Python link | 
| Bce-Embedding-Base-V1 | Python link |