diff --git a/docs/readthedocs/source/index.rst b/docs/readthedocs/source/index.rst
index 57712a11..0e58a69e 100644
--- a/docs/readthedocs/source/index.rst
+++ b/docs/readthedocs/source/index.rst
@@ -20,7 +20,7 @@ .. raw:: html

- IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency [1].
+ IPEX-LLM is a PyTorch library for running LLM on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) with very low latency [1].

 .. note::
@@ -48,7 +48,7 @@ Latest update 🔥
 * [2024/02] ``ipex-llm`` now supports directly loading models from `ModelScope `_ (`魔搭 `_).
 * [2024/02] ``ipex-llm`` added initial **INT2** support (based on the llama.cpp `IQ2 `_ mechanism), which makes it possible to run large-size LLM (e.g., Mixtral-8x7B) on Intel GPU with 16GB VRAM.
 * [2024/02] Users can now use ``ipex-llm`` through the `Text-Generation-WebUI `_ GUI.
-* [2024/02] ``ipex-llm`` now supports `*Self-Speculative Decoding* `_, which in practice brings **~30% speedup** for FP16 and BF16 inference latency on Intel `GPU `_ and `CPU `_ respectively.
+* [2024/02] ``ipex-llm`` now supports `Self-Speculative Decoding `_, which in practice brings **~30% speedup** for FP16 and BF16 inference latency on Intel `GPU `_ and `CPU `_ respectively.
 * [2024/02] ``ipex-llm`` now supports a comprehensive list of LLM finetuning on Intel GPU (including `LoRA `_, `QLoRA `_, `DPO `_, `QA-LoRA `_ and `ReLoRA `_).
 * [2024/01] Using ``ipex-llm`` `QLoRA `_, we managed to finetune LLaMA2-7B in **21 minutes** and LLaMA2-70B in **3.14 hours** on 8 Intel Max 1550 GPUs for `Stanford-Alpaca `_ (see the blog `here `_).
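The paragraph touched by the first hunk is the library's one-line pitch. For context, here is a minimal sketch of the Hugging Face-style API that pitch refers to, using the documented ``ipex_llm.transformers`` drop-in classes; the model path is a hypothetical example, and the ``to("xpu")`` line applies only when an Intel GPU is present.

.. code-block:: python

   # A minimal sketch of ipex-llm's Hugging Face-style loading API.
   # The checkpoint below is a hypothetical choice; any HF-format model works.
   import torch
   from transformers import AutoTokenizer
   from ipex_llm.transformers import AutoModelForCausalLM

   model_path = "meta-llama/Llama-2-7b-chat-hf"  # hypothetical example
   tokenizer = AutoTokenizer.from_pretrained(model_path)

   # load_in_4bit=True quantizes the weights to INT4 as they are loaded,
   # which is what lets large models fit in iGPU/Arc-class memory.
   model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
   model = model.to("xpu")  # move to an Intel GPU; drop this line to stay on CPU

   with torch.inference_mode():
       inputs = tokenizer("What is IPEX-LLM?", return_tensors="pt").to("xpu")
       output = model.generate(inputs.input_ids, max_new_tokens=32)
       print(tokenizer.decode(output[0], skip_special_tokens=True))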
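The Self-Speculative Decoding bullet that the second hunk edits also lends itself to a short sketch. This is a hedged approximation only: the ``speculative=True`` and ``load_in_low_bit="fp16"`` flags are assumptions modeled on ipex-llm's speculative-decoding examples, so verify the exact signature against the GPU/CPU examples the bullet links to.

.. code-block:: python

   # A hedged sketch of Self-Speculative Decoding on an Intel GPU.
   # ASSUMPTION: the speculative=True / load_in_low_bit="fp16" flags follow
   # ipex-llm's speculative-decoding examples; check the linked docs for
   # the exact signature in your version.
   import torch
   from transformers import AutoTokenizer
   from ipex_llm.transformers import AutoModelForCausalLM

   model_path = "meta-llama/Llama-2-7b-chat-hf"  # hypothetical example
   tokenizer = AutoTokenizer.from_pretrained(model_path)

   # Keep full-precision FP16 weights and let ipex-llm draft tokens with a
   # low-bit version of the *same* model (hence "self"-speculative decoding).
   model = AutoModelForCausalLM.from_pretrained(
       model_path,
       optimize_model=True,
       torch_dtype=torch.float16,
       load_in_low_bit="fp16",
       speculative=True,
   ).to("xpu")

   inputs = tokenizer("What is IPEX-LLM?", return_tensors="pt").to("xpu")
   output = model.generate(inputs.input_ids, max_new_tokens=32, do_sample=False)
   print(tokenizer.decode(output[0], skip_special_tokens=True))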