Moonlight
In this directory, you will find examples of how to apply IPEX-LLM INT4 optimizations to the Moonlight model on Intel GPUs. For illustration purposes, we utilize moonshotai/Moonlight-16B-A3B-Instruct as the reference Moonlight model.
0. Requirements & Installation
To run these examples with IPEX-LLM on Intel GPUs, there are some recommended requirements for your machine; please refer to here (Windows or Linux) for more information.
0.1 Installation
Visit Install IPEX-LLM on Intel GPU with PyTorch 2.6, and follow Install ipex-llm (Windows or Linux).
Then, install other dependencies for Moonlight model with IPEX-LLM optimizations:
conda activate llm-pt26
pip install transformers==4.45.0
pip install accelerate==0.33.0
pip install "trl<0.12.0" 
pip install tiktoken blobfile
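After installation, you could sanity-check that PyTorch sees the Intel GPU. The snippet below is a minimal sketch, assuming the llm-pt26 environment from above is active; torch.xpu is the XPU device API in recent PyTorch builds with Intel GPU support:

```python
import torch

# Should print True when the XPU (Intel GPU) backend is usable
print(torch.xpu.is_available())

# Name of the first visible Intel GPU device
print(torch.xpu.get_device_name(0))
```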
0.2 Runtime Configuration
Visit Install IPEX-LLM on Intel GPU with PyTorch 2.6, and follow Runtime Configurations (Windows or Linux).
1. Download & Convert Model
To run the Moonlight model with IPEX-LLM optimizations, we need to download and convert it first to make sure it can be successfully loaded by transformers.
1.1 Download Model
To download moonshotai/Moonlight-16B-A3B-Instruct from Hugging Face, you could use download.py:
python download.py --repo-id moonshotai/Moonlight-16B-A3B-Instruct --commit-id 95583251e616c46a80715897a705cd38659afc27
By default, Moonlight-16B-A3B-Instruct will be downloaded to the current folder. You could also specify the download folder path with --download-dir-path DOWNLOAD_DIR_PATH.
Tip
Refer to here for alternative methods to download models from Hugging Face.
For moonshotai/Moonlight-16B-A3B-Instruct, please make sure to use its revision/commit id
95583251e616c46a80715897a705cd38659afc27.
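If you prefer scripting the download yourself, the huggingface_hub library offers an equivalent. The sketch below is an assumption about one workable approach, not necessarily what download.py does internally, and the local directory name is a placeholder:

```python
from huggingface_hub import snapshot_download

# Pin the recommended revision so the conversion step sees the expected files
snapshot_download(
    repo_id="moonshotai/Moonlight-16B-A3B-Instruct",
    revision="95583251e616c46a80715897a705cd38659afc27",
    local_dir="Moonlight-16B-A3B-Instruct",  # placeholder download folder
)
```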
1.2 Convert Model
Next, convert the downloaded model by convert.py:
python convert.py --model-path DOWNLOAD_DIR_PATH
The converted model will be saved at <DOWNLOAD_DIR_PATH>-converted.
2. Example: Predict Tokens using generate() API
In the example generate.py, we show a basic use case for a Moonlight model to predict the next N tokens using the generate() API, with IPEX-LLM INT4 optimizations on Intel GPUs.
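At its core, the script follows the usual IPEX-LLM loading pattern. The sketch below is a simplified outline under that assumption, not the exact contents of generate.py; the converted-model path and prompt are placeholders, and the real script additionally applies the integrated chat prompt format:

```python
import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "Moonlight-16B-A3B-Instruct-converted"  # placeholder: your <DOWNLOAD_DIR_PATH>-converted folder

# load_in_4bit=True applies the IPEX-LLM INT4 optimization at load time
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,
    trust_remote_code=True,
)
model = model.to("xpu")  # move the optimized model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "What is AI?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")

with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)

print(tokenizer.decode(output[0], skip_special_tokens=True))
```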
2.1 Running example
python generate.py --converted-model-path `<DOWNLOAD_DIR_PATH>-converted` --prompt PROMPT --n-predict N_PREDICT
Arguments info:
- --converted-model-path CONVERTED_MODEL_PATH: argument defining the converted model path produced by convert.py.
- --prompt PROMPT: argument defining the prompt to be inferred (with integrated prompt format for chat). Defaults to 'What is AI?'.
- --n-predict N_PREDICT: argument defining the max number of tokens to predict. Defaults to 32.
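For example, to reproduce the sample output below (the converted-model path here is a placeholder for whatever convert.py produced on your machine):
python generate.py --converted-model-path ./Moonlight-16B-A3B-Instruct-converted --prompt "Is 123 a prime?"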
2.2 Sample Outputs
moonshotai/Moonlight-16B-A3B-Instruct
Inference time: xxxx s
-------------------- Prompt --------------------
Is 123 a prime?
-------------------- Output --------------------
<|im_system|>system<|im_middle|>You are a helpful assistant provided by Moonshot-AI.<|im_end|><|im_user|>user<|im_middle|>Is 123 a prime?<|im_end|><|im_assistant|>assistant<|im_middle|>No, 123 is not a prime number. A prime number is a number greater than 1 that has no positive divisors other than 1 and itself