Initial Update for Continue Quickstart with Ollama backend (#10918)

* Initial continue quickstart with ollama backend updates

* Small fix

* Small fix
Yuwen Hu 2024-04-30 15:10:30 +08:00 committed by GitHub
parent 2c64754eb0
commit 71f51ce589

@@ -1,5 +1,5 @@
# Run Coding Copilot in VSCode with Intel GPU
[**Continue**](https://marketplace.visualstudio.com/items?itemName=Continue.continue) is a coding copilot extension in [Microsoft Visual Studio Code](https://code.visualstudio.com/); by porting it to [`ipex-llm`](https://github.com/intel-analytics/ipex-llm), users can now easily leverage local LLMs running on Intel GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max) for code explanation, code generation/completion, etc.
@@ -18,42 +18,111 @@ See the demos of using Continue with [Mistral-7B-Instruct-v0.1](https://huggingf
## Quickstart
This guide walks you through setting up and running **Continue** within _Visual Studio Code_, empowered by local large language models served via [Ollama](./ollama_quickstart.html) with `ipex-llm` optimizations.
### 1. Install and Run Ollama Serve
Visit [Run Ollama with IPEX-LLM on Intel GPU](./ollama_quickstart.html), and follow the steps 1) [Install IPEX-LLM for Ollama](./ollama_quickstart.html#install-ipex-llm-for-ollama), 2) [Initialize Ollama](./ollama_quickstart.html#initialize-ollama) and 3) [Run Ollama Serve](./ollama_quickstart.html#run-ollama-serve) to install, initialize, and start the Ollama service.
```eval_rst
.. important::

   Please make sure you have set ``OLLAMA_HOST=0.0.0.0`` before starting the Ollama service, so that connections from all IP addresses can be accepted.

.. tip::

   If your local LLM is running on Intel Arc™ A-Series Graphics with Linux OS, it is recommended to additionally set the following environment variable for optimal performance before the Ollama service is started:

   .. code-block:: bash

      export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
```
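For reference, here is a minimal sketch of starting the Ollama service on Linux with the settings above. It assumes `./ollama` is the binary prepared in the [Initialize Ollama](./ollama_quickstart.html#initialize-ollama) step and reuses the `no_proxy` setting that the pull commands below also use; adjust paths and variables to your own setup.

```bash
# Minimal sketch (Linux): start the Ollama service for use with Continue.
# Assumes ./ollama is the IPEX-LLM-initialized binary from the Ollama quickstart.
export OLLAMA_HOST=0.0.0.0                              # accept connections from all IP addresses
export no_proxy=localhost,127.0.0.1                     # keep local requests off any proxy
export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1  # recommended on Intel Arc A-Series Graphics with Linux
./ollama serve
```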
### 2. Prepare and Run Model
#### Pull [`codeqwen:latest`](https://ollama.com/library/codeqwen)
In a new terminal window:
```eval_rst
.. tabs::

   .. tab:: Linux

      .. code-block:: bash

         export no_proxy=localhost,127.0.0.1
         ./ollama pull codeqwen:latest

   .. tab:: Windows

      Please run the following command in Anaconda Prompt.

      .. code-block:: cmd

         set no_proxy=localhost,127.0.0.1
         ollama pull codeqwen:latest
.. seealso::

   Here's a list of models that can be used for coding copilot on local PC:

   - Code Llama
   - WizardCoder
   - Mistral
   - StarCoder
   - DeepSeek Coder

   You can find them in the `Ollama model library <https://ollama.com/library>`_ and give them a try.
```
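If you would rather try one of the alternative models listed above, the same pull command applies. The example below is only a sketch, and the model name/tag is an assumption that you should verify against the [Ollama model library](https://ollama.com/library).

```bash
# Example only: pull an alternative coding model instead of codeqwen.
# The deepseek-coder:6.7b tag is an assumption; check the Ollama library for published tags.
export no_proxy=localhost,127.0.0.1
./ollama pull deepseek-coder:6.7b
```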
#### Create and Run Model
First, create a file named `Modelfile` with the following contents:
```
FROM codeqwen:latest
PARAMETER num_ctx 4096
```
then create the model from the `Modelfile` (the `num_ctx` parameter sets the context window to 4096 tokens):
```eval_rst
.. tabs::

   .. tab:: Linux

      .. code-block:: bash

         ./ollama create codeqwen:latest-continue -f Modelfile

   .. tab:: Windows

      Please run the following command in Anaconda Prompt.

      .. code-block:: cmd

         ollama create codeqwen:latest-continue -f Modelfile
```
You can now find `codeqwen:latest-continue` in `ollama list`.
Finally, run `codeqwen:latest-continue`:
```eval_rst
.. tabs::

   .. tab:: Linux

      .. code-block:: bash

         ./ollama run codeqwen:latest-continue

   .. tab:: Windows

      Please run the following command in Anaconda Prompt.

      .. code-block:: cmd

         ollama run codeqwen:latest-continue
```
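Before setting up the extension, you can optionally verify that the model is being served and responds to requests. The snippet below is a minimal sketch assuming the Ollama service from step 1 is listening on its default port `11434` on the same machine:

```bash
# Optional sanity check: request a short completion from the served model.
# Assumes the Ollama service is reachable at localhost:11434 (the default).
curl http://localhost:11434/api/generate -d '{
  "model": "codeqwen:latest-continue",
  "prompt": "Write a hello world program in Python.",
  "stream": false
}'
```

A JSON response containing generated text confirms the backend is ready for Continue.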
### 3. Install `Continue` Extension