ipex-llm/python/llm/example/NPU/HF-Transformers-AutoModels
Yuwen Hu 381d448ee2
[NPU] Example & Quickstart updates (#12650)
* Remove model with optimize_model=False in NPU verified models tables, and remove related example

* Remove experimental in run optimized model section title

* Unify model table order & example cmd

* Move embedding example to separate folder & update quickstart example link

* Add Quickstart reference in main NPU readme

* Small fix

* Small fix

* Move save/load examples under NPU/HF-Transformers-AutoModels

* Add low-bit and polish arguments for LLM Python examples

* Small fix

* Add low-bit and polish arguments for Multi-Model examples

* Polish argument for Embedding models

* Polish argument for LLM CPP examples

* Add low-bit and polish argument for Save-Load examples

* Add accuracy tuning tips for examples

* Update NPU quickstart accuracy tuning with low-bit optimizations

* Add save/load section to quickstart

* Update CPP example sample output to EN

* Add installation regarding cmake for CPP examples

* Small fix

* Small fix

* Small fix

* Small fix

* Small fix

* Small fix

* Unify max prompt length to 512

* Change recommended low-bit for Qwen2.5-3B-Instruct to asym_int4

* Update based on comments

* Small fix
2025-01-07 13:52:41 +08:00

IPEX-LLM Examples on Intel NPU

This folder contains examples of running IPEX-LLM on Intel NPU:

  • LLM: examples of running large language models using IPEX-LLM optimizations
    • CPP: examples of running large language models using IPEX-LLM optimizations through the C++ API
  • Multimodal: examples of running large multimodal models using IPEX-LLM optimizations
  • Embedding: examples of running embedding models using IPEX-LLM optimizations
  • Save-Load: examples of saving and loading low-bit models with IPEX-LLM optimizations
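
As a rough illustration of what the Python LLM examples do, the sketch below collects the load-time options mentioned in this folder (low-bit format, maximum prompt length) and passes them to the NPU model class. It assumes the `ipex_llm.transformers.npu_model.AutoModelForCausalLM` entry point and the keyword names `optimize_model`, `load_in_low_bit`, `max_context_len` and `max_prompt_len`; the helper names `build_npu_load_kwargs` and `load_model_on_npu`, and the default of 1024 for `max_context_len`, are hypothetical and chosen for illustration only. Check the scripts in the LLM folder for the exact arguments each model expects.

```python
def build_npu_load_kwargs(low_bit="sym_int4", max_context_len=1024, max_prompt_len=512):
    """Collect keyword arguments for loading a model on Intel NPU.

    The keyword names are assumptions based on the ipex-llm NPU examples;
    this helper itself is hypothetical and not part of ipex-llm.
    """
    return {
        "optimize_model": True,            # enable IPEX-LLM NPU optimizations
        "load_in_low_bit": low_bit,        # e.g. "sym_int4" or "asym_int4"
        "max_context_len": max_context_len,
        "max_prompt_len": max_prompt_len,  # 512 is the unified maximum prompt length
        "trust_remote_code": True,
    }


def load_model_on_npu(model_path, **overrides):
    """Load a causal LM for Intel NPU (requires ipex-llm with NPU support)."""
    # Imported lazily so build_npu_load_kwargs works without ipex-llm installed.
    from ipex_llm.transformers.npu_model import AutoModelForCausalLM

    kwargs = build_npu_load_kwargs(**overrides)
    return AutoModelForCausalLM.from_pretrained(model_path, **kwargs)
```

On a machine with an Intel NPU and ipex-llm installed, a call such as `load_model_on_npu(model_path, low_bit="asym_int4")` would then load the model at `model_path` with asymmetric INT4 weights.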

Tip

For more information about running ipex-llm on Intel NPU, please refer to the IPEX-LLM NPU Quickstart.

Verified Models on Intel NPU

| Model | Example Link |
|---|---|
| Llama2 | Python link, C++ link |
| Llama3 | Python link, C++ link |
| Llama3.2 | Python link, C++ link |
| GLM-Edge | Python link |
| Qwen2 | Python link, C++ link |
| Qwen2.5 | Python link, C++ link |
| MiniCPM | Python link, C++ link |
| Baichuan2 | Python link |
| MiniCPM-Llama3-V-2_5 | Python link |
| MiniCPM-V-2_6 | Python link |
| Speech_Paraformer-Large | Python link |
| Bce-Embedding-Base-V1 | Python link |