Add mddocs index (#11411)

parent c985912ee3
commit ccb3fb357a

4 changed files with 49 additions and 79 deletions

@@ -1,44 +0,0 @@
# LangChain API

You may run models using the LangChain API in `ipex-llm`.

## Using Hugging Face `transformers` INT4 Format

You may run any Hugging Face *Transformers* model (with INT4 optimizations applied) using the LangChain API as follows:

```python
from ipex_llm.langchain.llms import TransformersLLM
from ipex_llm.langchain.embeddings import TransformersEmbeddings
from langchain.chains.question_answering import load_qa_chain

embeddings = TransformersEmbeddings.from_model_id(model_id=model_path)
ipex_llm = TransformersLLM.from_model_id(model_id=model_path, ...)

doc_chain = load_qa_chain(ipex_llm, ...)
output = doc_chain.run(...)
```
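A more self-contained sketch of the same flow is shown below. It is only illustrative: the model path, document text, and question are placeholders, and it assumes LangChain's standard `load_qa_chain` question-answering interface with the `stuff` chain type.

```python
# Illustrative sketch only: the model path, document text, and question are placeholders.
from langchain.chains.question_answering import load_qa_chain
from langchain.schema import Document

from ipex_llm.langchain.llms import TransformersLLM

model_path = "/path/to/local/model"  # hypothetical Hugging Face model directory

# load the model with INT4 optimizations applied
ipex_llm = TransformersLLM.from_model_id(model_id=model_path)

# build a simple "stuff" QA chain over an in-memory document
doc_chain = load_qa_chain(ipex_llm, chain_type="stuff")
docs = [Document(page_content="IPEX-LLM accelerates LLM inference on Intel CPUs and GPUs.")]

output = doc_chain.run(input_documents=docs, question="What does IPEX-LLM do?")
print(output)
```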
> [!TIP]
> See the examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/LangChain)

## Using Native INT4 Format

You may also convert Hugging Face *Transformers* models into native INT4 format, and then run the converted models using the LangChain API as follows.

> [!NOTE]
> - Currently only the llama/bloom/gptneox/starcoder model families are supported; for other models, you may use the Hugging Face `transformers` INT4 format as described [above](./langchain_api.md#using-hugging-face-transformers-int4-format).
> - You may choose the corresponding API developed for specific native models to load the converted model.

```python
from ipex_llm.langchain.llms import LlamaLLM
from ipex_llm.langchain.embeddings import LlamaEmbeddings
from langchain.chains.question_answering import load_qa_chain

# switch to GptneoxEmbeddings/BloomEmbeddings/StarcoderEmbeddings to load other models
embeddings = LlamaEmbeddings(model_path='/path/to/converted/model.bin')
# switch to GptneoxLLM/BloomLLM/StarcoderLLM to load other models
ipex_llm = LlamaLLM(model_path='/path/to/converted/model.bin')

doc_chain = load_qa_chain(ipex_llm, ...)
doc_chain.run(...)
```

@@ -1,29 +0,0 @@
# Native Format

You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.

> [!NOTE]
> Currently only the llama/bloom/gptneox/starcoder/chatglm model families are supported; you may use the corresponding API to load the converted model. (For other models, you can use the Hugging Face `transformers` format as described [here](./hugging_face_format.md).)

```python
# convert the model
from ipex_llm import llm_convert
ipex_llm_path = llm_convert(model='/path/to/model/',
                            outfile='/path/to/output/',
                            outtype='int4',
                            model_family="llama")

# load the converted model
# switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
from ipex_llm.transformers import LlamaForCausalLM
llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", native=True, ...)

# run the converted model
input_ids = llm.tokenize(prompt)
output_ids = llm.generate(input_ids, ...)
output = llm.batch_decode(output_ids)
```

> [!NOTE]
> See the complete example [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Native-Models)

@@ -1,6 +0,0 @@
# `transformers`-style API

You may run LLMs using the `transformers`-style API in `ipex-llm`; a minimal usage sketch follows the links below.

* [Hugging Face `transformers` Format](./hugging_face_format.md)
* [Native Format](./native_format.md)
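As a quick illustration, below is a minimal, hedged sketch of running a model through the `transformers`-style API via the `ipex_llm.transformers.AutoModelForCausalLM` loader with `load_in_4bit=True`. The model path, prompt, and generation arguments are placeholders; see the linked pages for the supported options.

```python
# Illustrative sketch only: the model path, prompt, and generation arguments are placeholders.
import torch
from transformers import AutoTokenizer

from ipex_llm.transformers import AutoModelForCausalLM

model_path = "/path/to/local/model"  # hypothetical Hugging Face model directory

# load the model, applying INT4 optimizations on the fly
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(model_path)

prompt = "What is IPEX-LLM?"
with torch.inference_mode():
    input_ids = tokenizer.encode(prompt, return_tensors="pt")
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```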

docs/mddocs/README.md (new file, +49 lines)
@@ -0,0 +1,49 @@
# IPEX-LLM Documentation

## Table of Contents

- [LLM in 5 minutes](./Overview/llm.md)
- [Installation](./Overview/install.md)
  - [CPU](./Overview/install_cpu.md)
  - [GPU](./Overview/install_gpu.md)
- [Docker Guides](./DockerGuides/)
  - [Overview of IPEX-LLM Containers for Intel GPU](./DockerGuides/docker_windows_gpu.md)
  - [Python Inference using IPEX-LLM on Intel GPU](./DockerGuides/docker_pytorch_inference_gpu.md)
  - [Run/Develop PyTorch in VSCode with Docker on Intel GPU](./DockerGuides/docker_run_pytorch_inference_in_vscode.md)
  - [Run llama.cpp/Ollama/Open-WebUI on an Intel GPU via Docker](./DockerGuides/docker_cpp_xpu_quickstart.md)
  - [FastChat Serving with IPEX-LLM on Intel GPUs via Docker](./DockerGuides/fastchat_docker_quickstart.md)
  - [vLLM Serving with IPEX-LLM on Intel GPUs via Docker](./DockerGuides/vllm_docker_quickstart.md)
  - [vLLM Serving with IPEX-LLM on Intel CPU via Docker](./DockerGuides/vllm_cpu_docker_quickstart.md)
- [Quickstart](https://github.com/intel-analytics/ipex-llm/tree/main/docs/mddocs/Quickstart/)
  - [`bigdl-llm` Migration Guide](./Quickstart/bigdl_llm_migration.md)
  - [Install IPEX-LLM on Linux with Intel GPU](./Quickstart/install_linux_gpu.md)
  - [Install IPEX-LLM on Windows with Intel GPU](./Quickstart/install_windows_gpu.md)
  - [Run Local RAG using Langchain-Chatchat on Intel CPU and GPU](./Quickstart/chatchat_quickstart.md)
  - [Run Text Generation WebUI on Intel GPU](./Quickstart/webui_quickstart.md)
  - [Run Open WebUI with Intel GPU](./Quickstart/open_webui_with_ollama_quickstart.md)
  - [Run PrivateGPT with IPEX-LLM on Intel GPU](./Quickstart/privateGPT_quickstart.md)
  - [Run Coding Copilot in VSCode with Intel GPU](./Quickstart/continue_quickstart.md)
  - [Run Dify on Intel GPU](./Quickstart/dify_quickstart.md)
  - [Run Performance Benchmarking with IPEX-LLM](./Quickstart/benchmark_quickstart.md)
  - [Run llama.cpp with IPEX-LLM on Intel GPU](./Quickstart/llama_cpp_quickstart.md)
  - [Run Ollama with IPEX-LLM on Intel GPU](./Quickstart/ollama_quickstart.md)
  - [Run Llama 3 on Intel GPU using llama.cpp and Ollama with IPEX-LLM](./Quickstart/llama3_llamacpp_ollama_quickstart.md)
  - [Serving using IPEX-LLM and FastChat](./Quickstart/fastchat_quickstart.md)
  - [Serving using IPEX-LLM and vLLM on Intel GPU](./Quickstart/vLLM_quickstart.md)
  - [Finetune LLM with Axolotl on Intel GPU](./Quickstart/axolotl_quickstart.md)
  - [Run IPEX-LLM Serving on Multiple Intel GPUs using DeepSpeed AutoTP and FastAPI](./Quickstart/deepspeed_autotp_fastapi_quickstart.md)
  - [Run RAGFlow with IPEX-LLM on Intel GPU](./Quickstart/ragflow_quickstart.md)
- [Key Features](./Overview/KeyFeatures/)
  - [PyTorch API](./Overview/KeyFeatures/optimize_model.md)
  - [`transformers`-style API](./Overview/KeyFeatures/hugging_face_format.md)
  - [GPU Supports](./Overview/KeyFeatures/gpu_supports.md)
    - [Inference on GPU](./Overview/KeyFeatures/inference_on_gpu.md)
    - [Finetune (QLoRA)](./Overview/KeyFeatures/finetune.md)
    - [Multi Intel GPUs Selection](./Overview/KeyFeatures/multi_gpus_selection.md)
- [Examples](../../python/llm/example/)
  - [CPU](../../python/llm/example/CPU/)
  - [GPU](../../python/llm/example/GPU/)
- [API Reference](./PythonAPI/)
  - [IPEX-LLM PyTorch API](./PythonAPI/optimize.md)
  - [IPEX-LLM `transformers`-style API](./PythonAPI/transformers.md)
- [FAQ](./Overview/FAQ/faq.md)