diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/llama3_llamacpp_ollama_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/llama3_llamacpp_ollama_quickstart.md
index 6ec9252a..813e81bc 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/llama3_llamacpp_ollama_quickstart.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/llama3_llamacpp_ollama_quickstart.md
@@ -15,7 +15,7 @@ This quickstart guide walks you through how to run Llama 3 on Intel GPU using `l
 
 #### 1.1 Install IPEX-LLM for llama.cpp and Initialize
 
-Visit [Run llama.cpp with IPEX-LLM on Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html), and follow the instructions in section [Prerequisites](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#prerequisites) to setup and section [Install IPEX-LLM for llama.cpp](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#install-ipex-llm-for-llama-cpp) to install the IPEX-LLM with llama.cpp binaries, then follow the instructions in section [Initialize llama.cpp with IPEX-LLM](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#prerequisites) to initialize.
+Visit [Run llama.cpp with IPEX-LLM on Intel GPU Guide](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html), and follow the instructions in section [Prerequisites](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#prerequisites) to set up your environment, section [Install IPEX-LLM for llama.cpp](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#install-ipex-llm-for-llama-cpp) to install IPEX-LLM with llama.cpp binaries, and then section [Initialize llama.cpp with IPEX-LLM](https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html#initialize-llama-cpp-with-ipex-llm) to initialize.
 
 **After above steps, you should have created a conda environment, named `llm-cpp` for instance and have llama.cpp binaries in your current directory.**
 
@@ -29,6 +29,33 @@ Suppose you have downloaded a [Meta-Llama-3-8B-Instruct-Q4_K_M.gguf](https://hug
 
 #### 1.3 Run Llama3 on Intel GPU using llama.cpp
 
+##### Set Environment Variables (Optional)
+
+```eval_rst
+.. note::
+
+   This step is only required for APT or offline-installed oneAPI; skip it for pip-installed oneAPI.
+```
+
+Configure oneAPI variables by running the following command:
+
+```eval_rst
+.. tabs::
+   .. tab:: Linux
+
+      .. code-block:: bash
+
+         source /opt/intel/oneapi/setvars.sh
+
+   .. tab:: Windows
+
+      .. code-block:: bash
+
+         call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
+```
+
+##### Run Llama3
+
 Under your current directory, exceuting below command to do inference with Llama3:
 
 ```eval_rst
@@ -99,6 +126,7 @@ Launch the Ollama service:
          export ZES_ENABLE_SYSMAN=1
          export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
          export OLLAMA_NUM_GPU=999
+         # The following step is only required for APT or offline-installed oneAPI; skip it for pip-installed oneAPI.
          source /opt/intel/oneapi/setvars.sh
 
          ./ollama serve
@@ -112,6 +140,7 @@ Launch the Ollama service:
          set no_proxy=localhost,127.0.0.1
          set ZES_ENABLE_SYSMAN=1
          set OLLAMA_NUM_GPU=999
+         rem The following step is only required for APT or offline-installed oneAPI; skip it for pip-installed oneAPI.
          call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
 
          ollama serve
@@ -124,7 +153,7 @@ Launch the Ollama service:
    To allow the service to accept connections from all IP addresses, use `OLLAMA_HOST=0.0.0.0 ./ollama serve` instead of just `./ollama serve`.
 ```
 
-#### 2.2.2 Using Ollama Run Llama3
+##### 2.2.2 Using Ollama Run Llama3
 
 Keep the Ollama service on and open another terminal and run llama3 with `ollama run`:
 
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/llama_cpp_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/llama_cpp_quickstart.md
index f3d064a5..14600853 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/llama_cpp_quickstart.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/llama_cpp_quickstart.md
@@ -82,7 +82,14 @@ Then you can use following command to initialize `llama.cpp` with IPEX-LLM:
 
 Here we provide a simple example to show how to run a community GGUF model with IPEX-LLM.
 
-#### Set Environment Variables
+#### Set Environment Variables (Optional)
+
+```eval_rst
+.. note::
+
+   This step is only required for APT or offline-installed oneAPI; skip it for pip-installed oneAPI.
+```
+
 Configure oneAPI variables by running the following command:
 
 ```eval_rst
diff --git a/docs/readthedocs/source/doc/LLM/Quickstart/ollama_quickstart.md b/docs/readthedocs/source/doc/LLM/Quickstart/ollama_quickstart.md
index a150c0df..fc899216 100644
--- a/docs/readthedocs/source/doc/LLM/Quickstart/ollama_quickstart.md
+++ b/docs/readthedocs/source/doc/LLM/Quickstart/ollama_quickstart.md
@@ -55,6 +55,7 @@ You may launch the Ollama service as below:
          export OLLAMA_NUM_GPU=999
          export no_proxy=localhost,127.0.0.1
          export ZES_ENABLE_SYSMAN=1
+         # The following step is only required for APT or offline-installed oneAPI; skip it for pip-installed oneAPI.
          source /opt/intel/oneapi/setvars.sh
 
          ./ollama serve
@@ -68,6 +69,7 @@ You may launch the Ollama service as below:
          set OLLAMA_NUM_GPU=999
          set no_proxy=localhost,127.0.0.1
          set ZES_ENABLE_SYSMAN=1
+         rem The following step is only required for APT or offline-installed oneAPI; skip it for pip-installed oneAPI.
          call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
 
          ollama serve