
# Self-Speculative Decoding for Large Language Model BF16 Inference using IPEX-LLM on Intel CPUs

You can use IPEX-LLM to run BF16 inference with self-speculative decoding for any Hugging Face *Transformers* model on Intel CPUs. This directory contains example scripts to help you quickly get started with several popular open-source models. Each model has its own dedicated folder with detailed instructions on how to install and run it.
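The core draft-then-verify loop behind speculative decoding can be illustrated with a small pure-Python sketch (this is a toy illustration, not the IPEX-LLM API): a cheap draft model proposes a few tokens ahead, the full-precision target model verifies them, and the longest agreeing prefix is accepted in one step. In self-speculative decoding, the draft model is a lower-precision view of the same model, so no separate draft model is needed.

```python
def speculative_decode(draft_model, target_model, prompt, n_draft=4, max_new=10):
    """Toy speculative decoding loop.

    draft_model / target_model: functions mapping a token list to the
    next token (stand-ins for cheap and full-precision forward passes).
    Accepted tokens always match what the target model alone would
    produce; a good draft model just accepts more tokens per step.
    """
    tokens = list(prompt)
    target_len = len(prompt) + max_new
    while len(tokens) < target_len:
        # 1) Draft n_draft tokens autoregressively with the cheap model.
        draft = []
        for _ in range(n_draft):
            draft.append(draft_model(tokens + draft))
        # 2) Verify each drafted position with the target model;
        #    keep tokens until the first disagreement, then substitute
        #    the target model's own token and start a new round.
        accepted = []
        for i, t in enumerate(draft):
            expect = target_model(tokens + draft[:i])
            if expect == t:
                accepted.append(t)
            else:
                accepted.append(expect)  # target's correction
                break
        tokens.extend(accepted)
    return tokens[:target_len]
```

Note the key property: the output is identical to greedy decoding with the target model alone; the draft model only affects speed, not results.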

## Verified Hardware Platforms

- Intel Xeon SPR server

To run these examples with IPEX-LLM, we have some recommended requirements for your machine; please refer to here for more information. Make sure you have installed ipex-llm first:

```bash
pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
```

Moreover, install IPEX 2.1.0 via `pip install intel_extension_for_pytorch==2.1.0`.