Self-Speculative Decoding for Large Language Model BF16 Inference using IPEX-LLM on Intel CPUs

You can use IPEX-LLM to run BF16 inference for any Hugging Face Transformers model with self-speculative decoding on Intel CPUs; the BF16 model itself serves as the target model while its own low-bit (e.g. INT4) counterpart serves as the draft model, so no separate draft model is needed. This directory contains example scripts to help you quickly get started running some popular open-source models with self-speculative decoding, including baichuan2, chatglm3, llama2, mistral, mixtral, qwen, starcoder, vicuna and ziya. Each model has its own dedicated folder with detailed instructions on how to install and run it.
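For reference, the example scripts in the model folders share a common pattern; the following is a minimal sketch of it (the model path, prompt, and generation parameters are illustrative placeholders, not values taken from this repository). Loading the model with load_in_low_bit="bf16" and speculative=True is what enables BF16 self-speculative decoding:

import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

# Placeholder: any model covered by the example folders in this directory.
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load in BF16 with IPEX-LLM optimizations; speculative=True turns on
# self-speculative decoding (a low-bit copy of the model acts as the draft).
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    optimize_model=True,
    torch_dtype=torch.bfloat16,
    load_in_low_bit="bf16",
    speculative=True,
    trust_remote_code=True,
    use_cache=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

with torch.inference_mode():
    input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)

print(tokenizer.decode(output[0], skip_special_tokens=True))

See the per-model folders for the exact prompt formats and generation arguments each model expects.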

Verified Hardware Platforms

  • Intel Xeon SPR (Sapphire Rapids) server

To run these examples with IPEX-LLM, there are some recommended requirements for your machine; please refer to here for more information. Before running the examples, make sure ipex-llm is installed:

pip install --pre --upgrade ipex-llm[all]

Moreover, install IPEX (Intel Extension for PyTorch) 2.1.0:

pip install intel_extension_for_pytorch==2.1.0
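After these installs, each model folder's README walks through launching its example script. On a multi-socket Xeon server, such scripts are typically run pinned to a single socket; a hypothetical invocation for a 48-core socket might look like the following (the core count, script name, and argument name are illustrative assumptions; use the exact command from the model folder you are running):

export OMP_NUM_THREADS=48
numactl -C 0-47 -m 0 python ./speculative.py --repo-id-or-model-path REPO_ID_OR_MODEL_PATH

Pinning compute and memory to one socket with numactl avoids cross-socket traffic, which is a common requirement for stable BF16 inference performance on these servers.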