.. meta::
   :google-site-verification: S66K6GAclKw1RroxU0Rka_2d1LZFVe27M0gRneEsIVI
.. important::

   ``bigdl-llm`` has now become ``ipex-llm`` (see the migration guide `here `_); you may find the original ``BigDL`` project `here `_.
------
################################################
💫 Intel® LLM library for PyTorch*
################################################
**IPEX-LLM** is a PyTorch library for running LLMs on Intel CPU and GPU (e.g., local PC with iGPU, or discrete GPUs such as Arc, Flex and Max) with very low latency [1].
.. note::

   - It is built on top of the excellent work of ``llama.cpp``, ``transformers``, ``bitsandbytes``, ``vLLM``, ``qlora``, ``AutoGPTQ``, ``AutoAWQ``, etc.
   - It provides seamless integration with llama.cpp, ollama, Text-Generation-WebUI, HuggingFace transformers, HuggingFace PEFT, LangChain, LlamaIndex, DeepSpeed-AutoTP, vLLM, FastChat, HuggingFace TRL, AutoGen, ModelScope, etc. (a minimal usage sketch follows below).
   - 50+ models have been optimized/verified on ``ipex-llm`` (including LLaMA2, Mistral, Mixtral, Gemma, LLaVA, Whisper, ChatGLM, Baichuan, Qwen, RWKV, and more); see the complete list `here `_.
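As a minimal sketch of the HuggingFace ``transformers`` integration mentioned above (the model id and prompt are placeholders; see the quickstart guides for platform-specific steps):

.. code-block:: python

   # Minimal sketch: INT4 inference with ipex-llm on an Intel GPU.
   # Assumes an ipex-llm[xpu] environment; model id and prompt are placeholders.
   import torch
   from ipex_llm.transformers import AutoModelForCausalLM  # drop-in replacement for transformers
   from transformers import AutoTokenizer

   model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

   # load_in_4bit=True applies INT4 optimizations while loading the checkpoint
   model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
   tokenizer = AutoTokenizer.from_pretrained(model_path)

   model = model.to("xpu")  # move the optimized model to the Intel GPU; omit this line for CPU

   with torch.inference_mode():
       input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids.to("xpu")
       output = model.generate(input_ids, max_new_tokens=32)
       print(tokenizer.decode(output[0], skip_special_tokens=True))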
************************************************
Latest update 🔥
************************************************
* [2024/05] ``ipex-llm`` now supports **Axolotl** for LLM finetuning on Intel GPU; see the quickstart `here `_.
* [2024/04] You can now run **Open WebUI** on Intel GPU using ``ipex-llm``; see the quickstart `here `_.
* [2024/04] You can now run **Llama 3** on Intel GPU using ``llama.cpp`` and ``ollama``; see the quickstart `here `_.
* [2024/04] ``ipex-llm`` now supports **Llama 3** on Intel `GPU `_ and `CPU `_.
* [2024/04] ``ipex-llm`` now provides a C++ interface, which can be used as an accelerated backend for running `llama.cpp `_ and `ollama `_ on Intel GPU.
* [2024/03] ``bigdl-llm`` has now become ``ipex-llm`` (see the migration guide `here `_); you may find the original ``BigDL`` project `here `_.
* [2024/02] ``ipex-llm`` now supports directly loading model from `ModelScope `_ (`魔搭 `_).
* [2024/02] ``ipex-llm`` added initial **INT2** support (based on the llama.cpp `IQ2 `_ mechanism), which makes it possible to run large-size LLMs (e.g., Mixtral-8x7B) on Intel GPU with 16GB VRAM.
* [2024/02] Users can now use ``ipex-llm`` through `Text-Generation-WebUI `_ GUI.
* [2024/02] ``ipex-llm`` now supports `Self-Speculative Decoding `_, which in practice brings **~30% speedup** for FP16 and BF16 inference latency on Intel `GPU `_ and `CPU `_ respectively.
* [2024/02] ``ipex-llm`` now supports a comprehensive list of LLM finetuning on Intel GPU (including `LoRA `_, `QLoRA `_, `DPO `_, `QA-LoRA `_ and `ReLoRA `_).
* [2024/01] Using ``ipex-llm`` `QLoRA `_, we managed to finetune LLaMA2-7B in **21 minutes** and LLaMA2-70B in **3.14 hours** on 8 Intel Max 1550 GPUs for `Stanford-Alpaca `_ (see the blog `here `_).
.. dropdown:: More updates
   :color: primary

   * [2023/12] ``ipex-llm`` now supports `ReLoRA `_ (see `"ReLoRA: High-Rank Training Through Low-Rank Updates" `_).
   * [2023/12] ``ipex-llm`` now supports `Mixtral-8x7B `_ on both Intel `GPU `_ and `CPU `_.
   * [2023/12] ``ipex-llm`` now supports `QA-LoRA `_ (see `"QA-LoRA: Quantization-Aware Low-Rank Adaptation of Large Language Models" `_).
   * [2023/12] ``ipex-llm`` now supports `FP8 and FP4 inference `_ on Intel **GPU**.
   * [2023/11] Initial support for directly loading `GGUF `_, `AWQ `_ and `GPTQ `_ models into ``ipex-llm`` is available.
   * [2023/11] ``ipex-llm`` now supports `vLLM continuous batching `_ on both Intel `GPU  `_ and `CPU `_.
   * [2023/10] ``ipex-llm`` now supports `QLoRA finetuning `_ on both Intel `GPU `_ and `CPU `_.
   * [2023/10] ``ipex-llm`` now supports `FastChat serving `_ on both Intel CPU and GPU.
   * [2023/09] ``ipex-llm`` now supports `Intel GPU `_ (including iGPU, Arc, Flex and MAX).
   * [2023/09] ``ipex-llm`` `tutorial `_ is released.
************************************************
``ipex-llm`` Performance
************************************************
See the Token Generation Speed on Intel Core Ultra and Intel Arc GPU below [1] (and refer to [2][3][4] for more details).

*(Performance charts: Token Generation Speed on Intel Core Ultra and on Intel Arc GPU.)*
You may follow the `guide `_ to run the ``ipex-llm`` performance benchmark yourself.
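As a simplified illustration only (not the official benchmark scripts linked above; the model id is a placeholder and warm-up/averaging are kept minimal), per-token latency can be estimated along these lines:

.. code-block:: python

   # Rough first-token / next-token latency measurement; not the official ipex-llm benchmark.
   import time
   import torch
   from ipex_llm.transformers import AutoModelForCausalLM
   from transformers import AutoTokenizer

   model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id
   model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True).to("xpu")
   tokenizer = AutoTokenizer.from_pretrained(model_path)
   input_ids = tokenizer("Tell me a story.", return_tensors="pt").input_ids.to("xpu")

   with torch.inference_mode():
       model.generate(input_ids, max_new_tokens=32).cpu()   # warm-up run

       start = time.perf_counter()
       model.generate(input_ids, max_new_tokens=1).cpu()    # .cpu() forces device sync
       first = time.perf_counter() - start                  # ~ first-token latency

       start = time.perf_counter()
       model.generate(input_ids, max_new_tokens=33).cpu()   # first + 32 next tokens
       total = time.perf_counter() - start

   print(f"first token: {first*1000:.0f} ms, "
         f"next token: {(total - first) / 32 * 1000:.1f} ms/token")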
************************************************
``ipex-llm`` Demos
************************************************
See demos of running local LLMs *on Intel Iris iGPU, Intel Core Ultra iGPU, single-card Arc GPU, or multi-card Arc GPUs* using ``ipex-llm`` below.
*(Demo videos: Intel Iris iGPU, Intel Core Ultra iGPU, single-card Arc GPU, and multi-card Arc GPUs.)*
************************************************
``ipex-llm`` Quickstart
************************************************
============================================
Docker
============================================
* `GPU Inference in C++ `_: running ``llama.cpp``, ``ollama``, ``OpenWebUI``, etc., with ``ipex-llm`` on Intel GPU
* `GPU Inference in Python `_: running HuggingFace ``transformers``, ``LangChain``, ``LlamaIndex``, ``ModelScope``, etc. with ``ipex-llm`` on Intel GPU
* `vLLM on GPU `_: running ``vLLM`` serving with ``ipex-llm`` on Intel GPU 
* `FastChat on GPU `_: running ``FastChat`` serving with ``ipex-llm`` on Intel GPU
============================================
Run
============================================
* `llama.cpp `_: running **llama.cpp** (*using C++ interface of* ``ipex-llm`` *as an accelerated backend for* ``llama.cpp``) on Intel GPU
* `ollama `_: running **ollama** (*using C++ interface of* ``ipex-llm`` *as an accelerated backend for* ``ollama``) on Intel GPU
* `vLLM `_: running ``ipex-llm`` in ``vLLM`` on both Intel `GPU `_ and `CPU `_
* `FastChat `_: running ``ipex-llm`` in ``FastChat`` serving on both Intel GPU and CPU
* `LangChain-Chatchat RAG `_: running ``ipex-llm`` in ``LangChain-Chatchat`` (*Knowledge Base QA using* **RAG** *pipeline*)
* `Text-Generation-WebUI `_: running ``ipex-llm`` in ``oobabooga`` **WebUI**
* `Benchmarking `_: running latency and throughput benchmarks for ``ipex-llm`` on Intel CPU and GPU
============================================
Install
============================================
* `Windows GPU `_: installing ``ipex-llm`` on Windows with Intel GPU
* `Linux GPU `_: installing ``ipex-llm`` on Linux with Intel GPU
.. seealso::
   For more details, please refer to the `installation guide `_
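After installation, a quick sanity check that PyTorch can see the Intel GPU may look like the following (assuming an ``ipex-llm[xpu]`` environment is active):

.. code-block:: python

   # Verify that the XPU (Intel GPU) backend is available.
   import torch
   import intel_extension_for_pytorch as ipex  # registers the 'xpu' device with PyTorch

   print(torch.xpu.is_available())       # True if the GPU driver/runtime are correctly set up
   print(torch.xpu.get_device_name(0))   # e.g. the name of your Arc / Flex / Max GPU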
============================================
Code Examples
============================================
* Low bit inference
  * `INT4 inference `_: **INT4** LLM inference on Intel `GPU `_ and `CPU `_
  * `FP8/FP4 inference `_: **FP8** and **FP4** LLM inference on Intel `GPU `_
  * `INT8 inference `_: **INT8** LLM inference on Intel `GPU `_ and `CPU `_
  * `INT2 inference `_: **INT2** LLM inference (based on llama.cpp IQ2 mechanism) on Intel `GPU `_
* FP16/BF16 inference
  * **FP16** LLM inference on Intel `GPU `_, with possible `self-speculative decoding `_ optimization
  * **BF16** LLM inference on Intel `CPU `_, with possible `self-speculative decoding `_ optimization 
* Save and load
  * `Low-bit models `_: saving and loading ``ipex-llm`` low-bit models (see the sketch after this list)
  * `GGUF `_: directly loading GGUF models into ``ipex-llm``
  * `AWQ `_: directly loading AWQ models into ``ipex-llm``
  * `GPTQ `_: directly loading GPTQ models into ``ipex-llm``
* Finetuning
  * LLM finetuning on Intel `GPU `_, including `LoRA `_, `QLoRA `_, `DPO `_, `QA-LoRA `_ and `ReLoRA `_
  * QLoRA finetuning on Intel `CPU `_
* Integration with community libraries
  * `HuggingFace transformers `_
  * `Standard PyTorch model `_
  * `DeepSpeed-AutoTP `_
  * `HuggingFace PEFT `_
  * `HuggingFace TRL `_
  * `LangChain `_
  * `LlamaIndex `_
  * `AutoGen `_
  * `ModelScope `_
* `Tutorials `_
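As a minimal sketch of the save-and-load flow listed above (``save_low_bit``/``load_low_bit`` usage as assumed here; model id and paths are placeholders):

.. code-block:: python

   # Convert a model to low-bit once, save it, and reload it directly in later runs.
   from ipex_llm.transformers import AutoModelForCausalLM
   from transformers import AutoTokenizer

   model_path = "meta-llama/Llama-2-7b-chat-hf"   # placeholder model id
   save_dir = "./llama-2-7b-int4"                 # placeholder output directory

   # One-time conversion: load with INT4 optimizations, then persist the low-bit weights
   model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
   model.save_low_bit(save_dir)
   AutoTokenizer.from_pretrained(model_path).save_pretrained(save_dir)

   # Later runs: load the saved low-bit model directly (no re-quantization needed)
   model = AutoModelForCausalLM.load_low_bit(save_dir)
   tokenizer = AutoTokenizer.from_pretrained(save_dir)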
.. seealso::
   For more details, please refer to the |ipex_llm_document|_.
.. |ipex_llm_document| replace:: ``ipex-llm`` document
.. _ipex_llm_document: doc/LLM/index.html
************************************************
Verified Models
************************************************
.. csv-table::
   :header: "Model", "CPU Example", "GPU Example"
   :widths: 30, 35, 35

   "LLaMA (such as Vicuna, Guanaco, Koala, Baize, WizardLM, etc.)", "link1, link2", "link, link"
   "LLaMA 2", "link1, link2", "link, link"
   "LLaMA 3", "link", "link"
   "ChatGLM", "link", ""
   "ChatGLM2", "link", "link"
   "ChatGLM3", "link", "link"
   "GLM-4", "link", "link"
   "Mistral", "link", "link"
   "Mixtral", "link", "link"
   "Falcon", "link", "link"
   "MPT", "link", "link"
   "Dolly-v1", "link", "link"
   "Dolly-v2", "link", "link"
   "Replit Code", "link", "link"
   "RedPajama", "link1, link2", ""
   "Phoenix", "link1, link2", ""
   "StarCoder", "link1, link2", "link"
   "Baichuan", "link", "link"
   "Baichuan2", "link", "link"
   "InternLM", "link", "link"
   "Qwen", "link", "link"
   "Qwen1.5", "link", "link"
   "Qwen2", "link", "link"
   "Qwen-VL", "link", "link"
   "Aquila", "link", "link"
   "Aquila2", "link", "link"
   "MOSS", "link", ""
   "Whisper", "link", "link"
   "Phi-1_5", "link", "link"
   "Flan-t5", "link", "link"
   "LLaVA", "link", "link"
   "CodeLlama", "link", "link"
   "Skywork", "link", ""
   "InternLM-XComposer", "link", ""
   "WizardCoder-Python", "link", ""
   "CodeShell", "link", ""
   "Fuyu", "link", ""
   "Distil-Whisper", "link", "link"
   "Yi", "link", "link"
   "BlueLM", "link", "link"
   "Mamba", "link", "link"
   "SOLAR", "link", "link"
   "Phixtral", "link", "link"
   "InternLM2", "link", "link"
   "RWKV4", "", "link"
   "RWKV5", "", "link"
   "Bark", "link", "link"
   "SpeechT5", "", "link"
   "DeepSeek-MoE", "link", ""
   "Ziya-Coding-34B-v1.0", "link", ""
   "Phi-2", "link", "link"
   "Phi-3", "link", "link"
   "Phi-3-vision", "link", "link"
   "Yuan2", "link", "link"
   "Gemma", "link", "link"
   "DeciLM-7B", "link", "link"
   "Deepseek", "link", "link"
   "StableLM", "link", "link"
   "CodeGemma", "link", "link"
   "Command-R/cohere", "link", "link"
   "CodeGeeX2", "link", "link"
   "MiniCPM", "link", "link"
************************************************
Get Support
************************************************
* Please report a bug or raise a feature request by opening a `GitHub Issue `_
* Please report a vulnerability by opening a draft `GitHub Security Advisory `_
------
Performance varies by use, configuration and other factors. ``ipex-llm`` may not optimize to the same degree for non-Intel products. Learn more at www.Intel.com/PerformanceIndex.