diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/continue_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/continue_quickstart.md
index c80121bb..c747c82c 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/continue_quickstart.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/continue_quickstart.md
@@ -1,16 +1,78 @@
-# Use Continue in VS Code with Intel GPU
+# Run Code Copilot on Windows with Intel GPU
+
+[**Continue**](https://marketplace.visualstudio.com/items?itemName=Continue.continue) is a coding copilot extension for [Microsoft Visual Studio Code](https://code.visualstudio.com/). By porting it to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm), users can now easily leverage local LLMs running on Intel GPU (e.g., a local PC with iGPU, or a discrete GPU such as Arc, Flex or Max) for code explanation and code generation/completion. A snapshot of Continue in action is shown below (using [`CodeLlama-7b`](https://huggingface.co/codellama/CodeLlama-7b-hf)).
-This quickstart guide walks you through setting up and using the **Continue** extension in VS Code (a coding assistant using Large Language Models) with **local LLMs** using `Text Generation WebUI` and `ipex-llm`.
-A preview of Continue in action is shown below:
-## 0. Install Continue Extension
+
+## Quickstart
+
+This guide walks you through setting up and running **Continue** within _Visual Studio Code_, empowered by local large language models served via [Text Generation WebUI](https://github.com/intel-analytics/text-generation-webui/) with `ipex-llm` optimizations.
+
+
+### 1. Install and Run Text Generation WebUI
+
+Visit the [Run Text Generation WebUI Quickstart Guide](webui_quickstart.html), and follow the steps 1) [Install IPEX-LLM](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#install-ipex-llm), 2) [Install the WebUI](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#install-the-webui) and 3) [Start the Server](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#start-the-webui-server) to install and start the **Text Generation WebUI API Service**, with a few exceptions, as listed below:
+
+- The Text Generation WebUI API service requires Python 3.10 or higher, but the [IPEX-LLM installation instructions](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#install-ipex-llm) use ``python=3.9`` by default when creating the conda environment. We recommend changing it to ``3.11``, using the command below:
+  ```bash
+  conda create -n llm python=3.11 libuv
+  ```
+- When following the instructions in [Install Python Dependencies](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#install-dependencies), install an extra dependency for the API service, i.e. `extensions/openai/requirements.txt`:
+  ```cmd
+  cd C:\text-generation-webui
+  pip install -r requirements_cpu_only.txt
+  pip install -r extensions/openai/requirements.txt
+  ```
+- When following the instructions in [Launch the Server](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#launch-the-server), add a few additional command line arguments to enable the API service:
+  ```cmd
+  python server.py --load-in-4bit --api --api-port 5000 --listen
+  ```
+
+```eval_rst
+.. note::
+
+   The API server will by default use port ``5000``. To change the port, use ``--api-port 1234`` in the command above. You can also specify using SSL or an API key in the command. Please see `this guide `_ for the full list of arguments.
+```
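+
+Once the server is up, you can optionally verify the API service from Python before connecting Continue to it. The snippet below is a minimal sketch (not part of the original guide); it assumes the default port `5000` and the OpenAI-compatible endpoints provided by the `extensions/openai` dependency installed above:
+
+```python
+import requests
+
+# Request a short completion from the local server; a valid JSON response
+# containing a "choices" entry confirms the API service is running.
+response = requests.post(
+    "http://localhost:5000/v1/chat/completions",
+    json={
+        "messages": [{"role": "user", "content": "Explain Python list comprehensions."}],
+        "max_tokens": 128,
+    },
+    timeout=300,
+)
+response.raise_for_status()
+print(response.json()["choices"][0]["message"]["content"])
+```
+
+If the request fails, double-check that the server terminal shows the API running on the expected port.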
+
+
+### 2. Use WebUI to Load Model
+
+#### Access the WebUI
+
+Upon successful launch, URLs to access the WebUI will be displayed in the terminal as shown below. Open the provided local URL in your browser to interact with the WebUI.
+
+#### Model Download and Loading
+
+Here is a list of model series that can be used as a coding copilot on a local PC:
+
+- Code Llama
+- WizardCoder
+- Mistral
+- StarCoder
+- DeepSeek Coder
+
+Follow the steps in [Model Download](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#model-download) and [Load Model](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#load-model) to download and load your coding model.
+
+```eval_rst
+.. note::
+
+   If you don't need to use the API service anymore, you can follow the instructions in `Exit WebUI <https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/webui_quickstart.html#exit-the-webui>`_ to stop the service.
+```
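+
+Optionally, before moving on to the Continue extension, you can confirm from Python that the server sees your model. This is a minimal sketch (not part of the original guide), again assuming the default API port `5000`:
+
+```python
+import requests
+
+# List the models known to the server; the model you loaded in the
+# WebUI should appear among the returned ids.
+models = requests.get("http://localhost:5000/v1/models", timeout=30).json()
+for entry in models["data"]:
+    print(entry["id"])
+```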
+
+
+### 3. Install `Continue` Extension
+
 1. Click `Install` on the [Continue extension in the Visual Studio Marketplace](https://marketplace.visualstudio.com/items?itemName=Continue.continue)
 2. This will open the Continue extension page in VS Code, where you will need to click `Install` again
 3. Once you do this, you will see the Continue logo show up on the left side bar. If you click it, the Continue extension will open up:
@@ -25,123 +87,9 @@ A preview of Continue in action is shown below:
 
    Note: We strongly recommend moving Continue to VS Code's right sidebar. This helps keep the file explorer open while using Continue, and the sidebar can be toggled with a simple keyboard shortcut.
 ```
-## 1. Install IPEX-LLM
-
-To use Continue with local LLMs on Intel GPU, first ensure that IPEX-LLM is installed. Follow the instructions on the [IPEX-LLM Installation Quickstart for Windows with Intel GPU](install_windows_gpu.html).
-
-**After the installation, you should have created a conda environment, named `llm` for instance, for running `ipex-llm` applications.**
-
-```eval_rst
-.. note::
-
-   Please note that Text Generation WebUI API service only supports ``Python >= 3.10``. We recommend using ``Python 3.11``here.
-```
-
-## 2. Install Text Generation WebUI
-### Download the WebUI
-Download the `text-generation-webui` with IPEX-LLM integrations from [this link](https://github.com/intel-analytics/text-generation-webui/archive/refs/heads/ipex-llm.zip). Unzip the content into a directory, e.g.,`C:\text-generation-webui`.
-
-### Install Dependencies
-
-Open **Anaconda Prompt** and activate the conda environment you have created in [section 1](#1-install-ipex-llm), e.g., `llm`.
-```
-conda activate llm
-```
-Then, change to the directory of WebUI (e.g.,`C:\text-generation-webui`) and install the necessary dependencies:
-```cmd
-cd C:\text-generation-webui
-pip install -r requirements_cpu_only.txt
-pip install -r extensions/openai/requirements.txt
-```
-
-## 3. Start the WebUI Server
-
-### Set Environment Variables
-Configure oneAPI variables by running the following command in **Anaconda Prompt**:
-
-```cmd
-call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
-```
-
-```eval_rst
-.. note::
-
-   For more details about runtime configurations, `refer to this guide `_
-```
-
-If you're running on iGPU, set additional environment variables by running the following commands:
-```cmd
-set SYCL_CACHE_PERSISTENT=1
-set BIGDL_LLM_XMX_DISABLED=1
-```
-
-### Launch the Server
-In **Anaconda Prompt** with the conda environment `llm` activated, navigate to the text-generation-webui folder and start the server using the following command:
-
-```cmd
-python server.py --load-in-4bit --api --api-port 5000 --listen
-```
-
-```eval_rst
-.. note::
-
-   with ``--load-in-4bit`` option, the models will be optimized and run at 4-bit precision. For configuration for other formats and precisions, refer to `this link `_
-```
-
-```eval_rst
-.. note::
-
-   The API server will by default use port ``5000``. To change the port, use ``--api-port 1234`` in the command above. You can also specify using SSL or API Key in the command. Please see `this guide `_ for the full list of arguments.
-```
-
-### Access the WebUI
-Upon successful launch, URLs to access the WebUI will be displayed in the terminal as shown below. Open the provided local URL in your browser to interact with the WebUI.
-
-## 4. Use WebUI to Load Model
-
-### Recommended Model Series
-- Code Llama
-- WizardCoder
-- Mistral
-- StarCoder
-- DeepSeek Coder
-
-### Model Download
-
-Place Huggingface models in `C:\text-generation-webui\models` by either copying locally or downloading via the WebUI. To download, navigate to the **Model** tab, enter the model's huggingface id (for instance, `microsoft/phi-1_5`) in the **Download model or LoRA** section, and click **Download**, as illustrated below.
-
-After copying or downloading the models, click on the blue **refresh** button to update the **Model** drop-down menu. Then, choose your desired model from the newly updated list.
-
-### Load Model
-
-Default settings are recommended for most users. Click **Load** to activate the model. Address any errors by installing missing packages as prompted, and ensure compatibility with your version of the transformers package. Refer to [troubleshooting section](#troubleshooting) for more details.
-
-If everything goes well, you will get a message as shown below.
-
-### Exit the WebUI
-
-To shut down the WebUI server, use **Ctrl+C** in the **Anaconda Prompt** terminal where the WebUI Server is runing, then close your browser tab.
-
-## 5. Configure Continue
+## 4. Configure `Continue`
@@ -164,7 +112,7 @@ In `config.json`, you'll find the `models` property, a list of the models that y
 }
 ```
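+
+For reference, here is a sketch of what a complete `models` entry pointing at the local API server might look like. This example is not from the original guide: `title` and `model` are placeholders for the model you actually loaded, and it assumes the server is reachable on the default port `5000` through Continue's OpenAI-compatible `openai` provider:
+
+```json
+{
+  "models": [
+    {
+      "title": "Text Generation WebUI API Server",
+      "provider": "openai",
+      "model": "CodeLlama-7b",
+      "apiBase": "http://localhost:5000/v1"
+    }
+  ]
+}
+```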
-## 6. How to Use Continue
+## 5. How to Use Continue
 For detailed tutorials please refer to [this link](https://continue.dev/docs/how-to-use-continue). Here we are only showing the most common scenarios.
 
 ### Ask about highlighted code or an entire file
@@ -188,24 +136,6 @@ You can ask Continue to edit your highlighted code with the command `/edit`.
 
 If you encounter `TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'` when you run `python server.py --load-in-4bit --api`, please make sure you are using `Python 3.11` instead of lower versions.
 
-### Potentially slower first response
-
-The first response to user prompt might be slower than expected, with delays of up to several minutes before the response is generated. This delay occurs because the GPU kernels require compilation and initialization, which varies across different GPU types.
-
-### Missing Required Dependencies
-
-During model loading, you may encounter an **ImportError** like `ImportError: This modeling file requires the following packages that were not found in your environment`. This indicates certain packages required by the model are absent from your environment. Detailed instructions for installing these necessary packages can be found at the bottom of the error messages. Take the following steps to fix these errors:
-
-- Exit the WebUI Server by pressing **Ctrl+C** in the **Anaconda Prompt** terminal.
-- Install the missing pip packages as specified in the error message
-- Restart the WebUI Server.
-
-If there are still errors on missing packages, repeat the installation process for any additional required packages.
-
-### Compatiblity issues
-
-If you encounter **AttributeError** errors like `AttributeError: 'BaichuanTokenizer' object has no attribute 'sp_model'`, it may be due to some models being incompatible with the current version of the transformers package because the models are outdated. In such instances, using a more recent model is recommended.
-
+
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/webui_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/webui_quickstart.md
index 86c3205d..65d28bb5 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/webui_quickstart.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/webui_quickstart.md
@@ -104,6 +104,12 @@ If everything goes well, you will get a message as shown below.
 
+```eval_rst
+.. note::
+
+   Model loading might take a few minutes, as it includes a **warm-up** phase. This warm-up step improves the speed of subsequent uses of the model.
+```
+
 #### Chat with the Model
 
 In the **Chat** tab, start new conversations with **New chat**.