# IPEX-LLM Examples on Intel NPU

This folder contains examples of running IPEX-LLM on Intel NPU:

- [LLM](./LLM): examples of running large language models using IPEX-LLM optimizations
- [CPP](./LLM/CPP_Examples/): examples of running large language models with IPEX-LLM optimizations through the C++ API
- [Multimodal](./Multimodal): examples of running large multimodal models using IPEX-LLM optimizations
- [Embedding](./Embedding): examples of running embedding models using IPEX-LLM optimizations
- [Save-Load](./Save-Load): examples of saving and loading low-bit models with IPEX-LLM optimizations (see the save/load sketch at the end of this README)

> [!TIP]
> Please refer to the [IPEX-LLM NPU Quickstart](../../../../../docs/mddocs/Quickstart/npu_quickstart.md) for more information about running `ipex-llm` on Intel NPU.
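
The Python LLM examples generally follow the loading pattern below. This is a minimal sketch, assuming an `ipex-llm[npu]` installation; the model ID and the `load_in_low_bit`, `max_context_len`, and `max_prompt_len` values here are illustrative, so check each example's README for the settings verified per model:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model_path = "meta-llama/Llama-2-7b-chat-hf"  # any verified model from the table below

# Load the model with IPEX-LLM low-bit optimizations for Intel NPU
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    trust_remote_code=True,
    attn_implementation="eager",
    load_in_low_bit="sym_int4",  # some models are verified with other formats, e.g. asym_int4
    optimize_model=True,
    max_context_len=1024,
    max_prompt_len=512,          # the examples unify the max prompt length to 512
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Run a short generation to verify the setup
inputs = tokenizer("What is AI?", return_tensors="pt")
output = model.generate(inputs.input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
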
## Verified Models on Intel NPU
| Model | Example Link |
|------------|----------------------------------------------------------------|
| Llama2 | [Python link](./LLM), [C++ link](./LLM/CPP_Examples/) |
| Llama3 | [Python link](./LLM), [C++ link](./LLM/CPP_Examples/) |
| Llama3.2 | [Python link](./LLM), [C++ link](./LLM/CPP_Examples/) |
| GLM-Edge | [Python link](./LLM) |
| Qwen2 | [Python link](./LLM), [C++ link](./LLM/CPP_Examples/) |
| Qwen2.5 | [Python link](./LLM), [C++ link](./LLM/CPP_Examples/) |
| MiniCPM | [Python link](./LLM), [C++ link](./LLM/CPP_Examples/) |
| Baichuan2 | [Python link](./LLM) |
| MiniCPM-Llama3-V-2_5 | [Python link](./Multimodal/) |
| MiniCPM-V-2_6 | [Python link](./Multimodal/) |
| Speech_Paraformer-Large | [Python link](./Multimodal/) |
| Bce-Embedding-Base-V1 | [Python link](./Embedding/) |
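
## Save & Load Low-Bit Models

As referenced in the [Save-Load](./Save-Load) item above, a converted low-bit model can be saved once and reloaded later without repeating the conversion. The sketch below is a hedged illustration continuing from the loading sketch above; the save path is hypothetical, and the exact `save_low_bit`/`load_low_bit` flow for each model is documented in the [Save-Load](./Save-Load) examples:

```python
import torch
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

# Save the converted low-bit model after the first load (path is illustrative);
# `model` here is the object loaded in the sketch above
model.save_low_bit("./llama-2-7b-npu-low-bit")

# Later, reload the saved low-bit model directly, skipping the conversion step
model = AutoModelForCausalLM.load_low_bit(
    "./llama-2-7b-npu-low-bit",
    attn_implementation="eager",
    torch_dtype=torch.float16,
    optimize_model=True,
    max_context_len=1024,
    max_prompt_len=512,
)
```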