Self-Speculative Decoding for Large Language Model BF16 Inference using IPEX-LLM on Intel CPUs

You can use IPEX-LLM to run BF16 inference for any Hugging Face Transformers model with self-speculative decoding on Intel CPUs; the BF16 model itself serves as the target model while its own low-bit (e.g. INT4) counterpart serves as the draft model, so no separate draft model is needed. This directory contains example scripts to help you quickly get started running some popular open-source models with self-speculative decoding, including baichuan2, chatglm3, llama2, mistral, mixtral, qwen, starcoder, vicuna and ziya. Each model has its own dedicated folder with detailed instructions on how to install and run it.
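For reference, the example scripts in the model folders share a common pattern; the following is a minimal sketch of it (the model path, prompt, and generation parameters are illustrative placeholders, not values taken from this repository). Loading the model with load_in_low_bit="bf16" and speculative=True is what enables BF16 self-speculative decoding:

import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# Placeholder: any model covered by the example folders in this directory.
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load in BF16 with IPEX-LLM optimizations; speculative=True turns on
# self-speculative decoding (a low-bit copy of the model acts as the draft).
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    optimize_model=True,
    torch_dtype=torch.bfloat16,
    load_in_low_bit="bf16",
    speculative=True,
    trust_remote_code=True,
    use_cache=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))

See the per-model folders for the exact prompt formats and generation arguments each model expects.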

Verified Hardware Platforms

  • Intel Xeon SPR (Sapphire Rapids) server

To run these examples with IPEX-LLM, there are some recommended requirements for your machine; please refer to here for more information. Before running the examples, make sure ipex-llm is installed:

pip install --pre --upgrade ipex-llm[all]

Moreover, install IPEX (Intel Extension for PyTorch) 2.1.0:

pip install intel_extension_for_pytorch==2.1.0
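After these installs, each model folder's README walks through launching its example script. On a multi-socket Xeon server, such scripts are typically run pinned to a single socket; a hypothetical invocation for a 48-core socket might look like the following (the core count, script name, and argument name are illustrative assumptions; use the exact command from the model folder you are running):

export OMP_NUM_THREADS=48
numactl -C 0-47 -m 0 python ./speculative.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH

Pinning compute and memory to one socket with numactl avoids cross-socket traffic, which is a common requirement for stable BF16 inference performance on these servers.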