Small mddoc fixes based on review (#11391)

* Fix based on review

* Further fix

* Small fix

* Small fix
Yuwen Hu 2024-06-21 17:09:30 +08:00 committed by GitHub
parent 072ce7e66d
commit a027121530
10 changed files with 91 additions and 97 deletions

@@ -35,19 +35,19 @@ Choose one of the following methods to start the container:
export DOCKER_IMAGE=intelanalytics/ipex-llm-inference-cpp-xpu:latest
export CONTAINER_NAME=ipex-llm-inference-cpp-xpu-container
sudo docker run -itd \
--net=host \
--device=/dev/dri \
-v /path/to/models:/models \
-e no_proxy=localhost,127.0.0.1 \
--memory="32G" \
--name=$CONTAINER_NAME \
-e bench_model="mistral-7b-v0.1.Q4_0.gguf" \
-e DEVICE=Arc \
--shm-size="16g" \
$DOCKER_IMAGE
```
- For **Windows WSL users**:
To map the `xpu` into the container, specify `--device=/dev/dri` when booting the container, and change `/path/to/models` to mount your models. In addition, add `--privileged` and map `/usr/lib/wsl` into the container.
@@ -56,18 +56,18 @@ Choose one of the following methods to start the container:
export DOCKER_IMAGE=intelanalytics/ipex-llm-inference-cpp-xpu:latest
export CONTAINER_NAME=ipex-llm-inference-cpp-xpu-container
sudo docker run -itd \
--net=host \
--device=/dev/dri \
--privileged \
-v /path/to/models:/models \
-v /usr/lib/wsl:/usr/lib/wsl \
-e no_proxy=localhost,127.0.0.1 \
--memory="32G" \
--name=$CONTAINER_NAME \
-e bench_model="mistral-7b-v0.1.Q4_0.gguf" \
-e DEVICE=Arc \
--shm-size="16g" \
$DOCKER_IMAGE
```
After the container is booted, you can enter it with `docker exec`.
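For example, with the container name exported above, a typical way in (assuming a bash shell is available in the image) is:

```bash
sudo docker exec -it $CONTAINER_NAME /bin/bash
```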

@@ -26,13 +26,13 @@ Start ipex-llm-xpu Docker Container. Choose one of the following commands to sta
export MODEL_PATH=/llm/models[change to your model path]
docker run -itd \
--net=host \
--device=/dev/dri \
--memory="32G" \
--name=$CONTAINER_NAME \
--shm-size="16g" \
-v $MODEL_PATH:/llm/models \
$DOCKER_IMAGE
```
- For **Windows WSL users**:
@@ -44,15 +44,15 @@ Start ipex-llm-xpu Docker Container. Choose one of the following commands to sta
export MODEL_PATH=/llm/models[change to your model path]
sudo docker run -itd \
--net=host \
--privileged \
--device /dev/dri \
--memory="32G" \
--name=$CONTAINER_NAME \
--shm-size="16g" \
-v $MODEL_PATH:/llm/llm-models \
-v /usr/lib/wsl:/usr/lib/wsl \
$DOCKER_IMAGE
```
Access the container:

@@ -60,13 +60,13 @@ Start ipex-llm-xpu Docker Container. Choose one of the following commands to sta
export MODEL_PATH=/llm/models[change to your model path]
docker run -itd \
--net=host \
--device=/dev/dri \
--memory="32G" \
--name=$CONTAINER_NAME \
--shm-size="16g" \
-v $MODEL_PATH:/llm/models \
$DOCKER_IMAGE
```
- For **Windows WSL users**:
@@ -78,15 +78,15 @@ Start ipex-llm-xpu Docker Container. Choose one of the following commands to sta
export MODEL_PATH=/llm/models[change to your model path]
sudo docker run -itd \
--net=host \
--privileged \
--device /dev/dri \
--memory="32G" \
--name=$CONTAINER_NAME \
--shm-size="16g" \
-v $MODEL_PATH:/llm/llm-models \
-v /usr/lib/wsl:/usr/lib/wsl \
$DOCKER_IMAGE
```

@@ -23,7 +23,7 @@ output = tokenizer.batch_decode(output_ids)
```
> [!TIP]
> See the complete CPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/HF-Transformers-AutoModels) and GPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels).
> [!NOTE]
> You may apply more low bit optimizations (including INT8, INT5 and INT4) as follows:
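As a rough sketch only, assuming the `load_in_low_bit` argument of `from_pretrained` with string values such as `"sym_int8"` or `"sym_int5"`:

```python
# Sketch: request a different low-bit format via load_in_low_bit
# (the value names such as "sym_int5" are assumptions; check the API doc).
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder model id or local path
    load_in_low_bit="sym_int5",       # e.g. INT5 instead of the default INT4
)
```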

@@ -66,7 +66,8 @@ You could choose to use [PyTorch API](./optimize_model.md) or [`transformers`-st
model = model.to('xpu') # Important after obtaining the optimized model
```
> [!TIP]
>
> For Windows users running LLMs on Intel iGPUs, we recommend setting `cpu_embedding=True` in the `from_pretrained` function. This allows the memory-intensive embedding layer to run on the CPU instead of the iGPU.
>
> See the [API doc](https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/transformers.html) to find more information.
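A minimal sketch of that tip (the model id is a placeholder):

```python
# Sketch: INT4 optimization with the embedding layer kept on the CPU,
# as recommended for Intel iGPUs on Windows.
from ipex_llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder model id or local path
    load_in_4bit=True,
    cpu_embedding=True,
)
model = model.to('xpu')  # important after obtaining the optimized model
```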
@@ -81,7 +82,8 @@ You could choose to use [PyTorch API](./optimize_model.md) or [`transformers`-st
model = model.to('xpu') # Important after obtaining the optimized model
```
> [!TIP]
>
> For Windows users running saved optimized models on Intel iGPUs, we also recommend setting `cpu_embedding=True` in the `load_low_bit` function.
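Sketched out, the save/load round trip might look like this (the paths are placeholders, and `save_low_bit` on the optimized model is assumed):

```python
from ipex_llm.transformers import AutoModelForCausalLM

# Save the optimized low-bit model once...
model.save_low_bit('/path/to/saved_model')

# ...and reload it later without re-optimizing, keeping the embedding on the CPU.
model = AutoModelForCausalLM.load_low_bit('/path/to/saved_model', cpu_embedding=True)
model = model.to('xpu')
```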

@@ -19,7 +19,7 @@ output = doc_chain.run(...)
```
> [!TIP]
> See the examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/LangChain)
## Using Native INT4 Format
@@ -41,7 +41,4 @@ ipex_llm = LlamaLLM(model_path='/path/to/converted/model.bin')
doc_chain = load_qa_chain(ipex_llm, ...)
doc_chain.run(...)
```

@@ -10,7 +10,9 @@ You may also convert Hugging Face *Transformers* models into native INT4 format
# convert the model
from ipex_llm import llm_convert
ipex_llm_path = llm_convert(model='/path/to/model/',
                            outfile='/path/to/output/',
                            outtype='int4',
                            model_family="llama")
# load the converted model
# switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models

@@ -55,13 +55,14 @@ First we recommend using [Conda](https://conda-forge.org/download/) to create a
pip install --pre --upgrade ipex-llm[all] --extra-index-url https://download.pytorch.org/whl/cpu
```
- For **Windows users**:
```cmd
conda create -n llm python=3.11
conda activate llm
pip install --pre --upgrade ipex-llm[all]
```
Then, for running an LLM model with IPEX-LLM optimizations (taking `example.py` as an example):
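As an illustration of what such an `example.py` might contain (a sketch reusing the API shown earlier; the model id and prompt are placeholders):

```python
# example.py -- minimal CPU inference sketch with IPEX-LLM INT4 optimization
from ipex_llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder; any HF model path works
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

input_ids = tokenizer.encode("What is AI?", return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.batch_decode(output_ids)[0])
```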

@@ -20,7 +20,7 @@
## Langchain-Chatchat Architecture
See the Langchain-Chatchat architecture below ([source](https://github.com/chatchat-space/Langchain-Chatchat/blob/master/docs/img/langchain%2Bchatglm.png)).
<img src="https://llm-assets.readthedocs.io/en/latest/_images/langchain-arch.png" height="50%" />

@@ -139,15 +139,12 @@ You can now open a browser and access the RAGflow web portal. With the default s
If this is your first time using RAGFlow, you will need to register. After registering, log in with your new account to access the portal.
<table width="100%">
<tr>
<td><a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-login.png"><img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-login.png"/></a></td>
<td><a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-login2.png"><img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-login2.png"/></a></td>
</tr>
</table>
#### Configure `Ollama` service URL
@@ -180,26 +177,21 @@ Go to **Knowledge Base** by clicking on **Knowledge Base** in the top bar. Click
After entering a name, you will be directed to edit the knowledge base. Click on **Dataset** on the left, then click **+ Add file -> Local files**. Upload your file in the pop-up window and click **OK**.
<table width="100%">
<tr>
<td><a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase2.png"><img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase2.png"/></a></td>
<td><a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase3.png"><img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase3.png"/></a></td>
</tr>
</table>
After the upload is successful, you will see a new record in the dataset. The _**Parsing Status**_ column will show `UNSTARTED`. Click the green start button in the _**Action**_ column to begin file parsing. Once parsing is finished, the _**Parsing Status**_ column will change to **SUCCESS**.
<table width="100%">
<tr>
<td><a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase4.pngg"><img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase4.png"/></a></td>
<td><a href="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase5.png"><img src="https://llm-assets.readthedocs.io/en/latest/_images/ragflow-knowledgebase5.png"/></a></td>
</tr>
</table>
Next, go to **Configuration** on the left menu and click **Save** at the bottom to save the changes.