
# Self-Speculative Decoding for Large Language Model BF16 Inference using BigDL-LLM on Intel CPUs

You can use BigDL-LLM to run BF16 inference for any Hugging Face Transformers model with self-speculative decoding on Intel CPUs. This directory contains example scripts to help you quickly get started with several popular open-source models, including baichuan2, chatglm3, llama2, mistral, qwen, starcoder, vicuna, and ziya. Each model has its own dedicated folder with detailed instructions on how to install and run it.

## Verified Hardware Platforms

- Intel Xeon SPR (Sapphire Rapids) server

To run these examples with BigDL-LLM, we have some recommended machine requirements; please refer to here for more information. Make sure you have installed bigdl-llm before running the examples:

```bash
pip install --pre --upgrade bigdl-llm[all]
```

Moreover, install IPEX 2.1.0:

```bash
pip install intel_extension_for_pytorch==2.1.0
```
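
With the dependencies in place, the per-model example scripts share a common pattern: load the model in BF16 through BigDL-LLM with speculative decoding enabled, then call `generate` as usual. Below is a minimal sketch of that pattern; the model path, prompt, and generation length are placeholders, and the exact arguments differ slightly between model folders, so follow each folder's README for the authoritative version.

```python
import torch
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

# Placeholder: substitute the path or Hub id of any verified model.
model_path = "meta-llama/Llama-2-7b-chat-hf"

# Load the model in BF16 with BigDL-LLM optimizations and
# self-speculative decoding enabled.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    optimize_model=True,          # apply BigDL-LLM model optimizations
    torch_dtype=torch.bfloat16,   # BF16 inference
    load_in_low_bit="bf16",
    speculative=True,             # enable self-speculative decoding
    trust_remote_code=True,
    use_cache=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Once upon a time"  # placeholder prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Generation works exactly as with a plain transformers model.
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```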