# IPEX-LLM Examples on Intel NPU
This folder contains examples of running IPEX-LLM on Intel NPU:
- [LLM](./LLM): examples of running large language models using IPEX-LLM optimizations
- [CPP](./LLM/CPP_Examples/): examples of running large language models using IPEX-LLM optimizations through C++ API
- [Multimodal](./Multimodal): examples of running large multimodal models using IPEX-LLM optimizations
- [Embedding](./Embedding): examples of running embedding models using IPEX-LLM optimizations
- [Save-Load](./Save-Load): examples of saving and loading low-bit models with IPEX-LLM optimizations
> [!TIP]
> Please refer to [IPEX-LLM NPU Quickstart](../../../../../docs/mddocs/Quickstart/npu_quickstart.md) regarding more information about running `ipex-llm` on Intel NPU.
## Verified Models on Intel NPU

| Model                   | Example Link                                          |
|-------------------------|-------------------------------------------------------|
| Llama2                  | [Python link](./LLM), [C++ link](./LLM/CPP_Examples/) |
| Llama3                  | [Python link](./LLM), [C++ link](./LLM/CPP_Examples/) |
| Llama3.2                | [Python link](./LLM), [C++ link](./LLM/CPP_Examples/) |
| GLM-Edge                | [Python link](./LLM)                                  |
| Qwen2                   | [Python link](./LLM), [C++ link](./LLM/CPP_Examples/) |
| Qwen2.5                 | [Python link](./LLM), [C++ link](./LLM/CPP_Examples/) |
| MiniCPM                 | [Python link](./LLM), [C++ link](./LLM/CPP_Examples/) |
| Baichuan2               | [Python link](./LLM)                                  |
| MiniCPM-Llama3-V-2_5    | [Python link](./Multimodal/)                          |
| MiniCPM-V-2_6           | [Python link](./Multimodal/)                          |
| Speech_Paraformer-Large | [Python link](./Multimodal/)                          |
| Bce-Embedding-Base-V1   | [Python link](./Embedding/)                           |