From ccb3fb357ae1335bce04d778fd157c34a3baa5b0 Mon Sep 17 00:00:00 2001
From: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>
Date: Mon, 24 Jun 2024 15:35:18 +0800
Subject: [PATCH] Add mddocs index (#11411)

---
 .../Overview/KeyFeatures/langchain_api.md     | 44 -----------------
 .../Overview/KeyFeatures/native_format.md     | 29 -----------
 .../KeyFeatures/transformers_style_api.md     |  6 ---
 docs/mddocs/README.md                         | 49 +++++++++++++++++++
 4 files changed, 49 insertions(+), 79 deletions(-)
 delete mode 100644 docs/mddocs/Overview/KeyFeatures/langchain_api.md
 delete mode 100644 docs/mddocs/Overview/KeyFeatures/native_format.md
 delete mode 100644 docs/mddocs/Overview/KeyFeatures/transformers_style_api.md
 create mode 100644 docs/mddocs/README.md

diff --git a/docs/mddocs/Overview/KeyFeatures/langchain_api.md b/docs/mddocs/Overview/KeyFeatures/langchain_api.md
deleted file mode 100644
index dd8732b2..00000000
--- a/docs/mddocs/Overview/KeyFeatures/langchain_api.md
+++ /dev/null
@@ -1,44 +0,0 @@
-# LangChain API
-
-You may run the models using the LangChain API in `ipex-llm`.
-
-## Using Hugging Face `transformers` INT4 Format
-
-You may run any Hugging Face *Transformers* model (with INT4 optimizations applied) using the LangChain API as follows:
-
-```python
-from ipex_llm.langchain.llms import TransformersLLM
-from ipex_llm.langchain.embeddings import TransformersEmbeddings
-from langchain.chains.question_answering import load_qa_chain
-
-embeddings = TransformersEmbeddings.from_model_id(model_id=model_path)
-ipex_llm = TransformersLLM.from_model_id(model_id=model_path, ...)
-
-doc_chain = load_qa_chain(ipex_llm, ...)
-output = doc_chain.run(...)
-```
-
-> [!TIP]
-> See the examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/LangChain)
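-
-The snippet above leaves several arguments as `...` placeholders. A minimal end-to-end sketch is shown below; the model path, document text, question, and `model_kwargs` values are illustrative assumptions rather than required settings, and retrieval/embeddings are omitted for brevity:
-
-```python
-from ipex_llm.langchain.llms import TransformersLLM
-from langchain.chains.question_answering import load_qa_chain
-from langchain.docstore.document import Document
-
-model_path = "/path/to/hf/model"  # hypothetical local model path
-
-# a toy in-memory corpus; in practice documents come from a loader / text splitter
-docs = [Document(page_content="IPEX-LLM runs LLMs with INT4 optimizations on Intel CPUs and GPUs.")]
-
-# load the model with INT4 optimizations applied; the kwargs shown are illustrative
-ipex_llm = TransformersLLM.from_model_id(
-    model_id=model_path,
-    model_kwargs={"temperature": 0, "max_length": 512, "trust_remote_code": True},
-)
-
-# "stuff" simply concatenates the documents into the prompt
-doc_chain = load_qa_chain(ipex_llm, chain_type="stuff")
-output = doc_chain.run(input_documents=docs, question="What does IPEX-LLM do?")
-print(output)
-```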
-
-## Using Native INT4 Format
-
-You may also convert Hugging Face *Transformers* models into native INT4 format, and then run the converted models using the LangChain API as follows.
-
-> [!NOTE]
-> - Currently only llama/bloom/gptneox/starcoder model families are supported; for other models, you may use the Hugging Face `transformers` INT4 format as described [above](./langchain_api.md#using-hugging-face-transformers-int4-format).
-> - You may choose the corresponding API developed for specific native models to load the converted model.
-
-```python
-from ipex_llm.langchain.llms import LlamaLLM
-from ipex_llm.langchain.embeddings import LlamaEmbeddings
-from langchain.chains.question_answering import load_qa_chain
-
-# switch to GptneoxEmbeddings/BloomEmbeddings/StarcoderEmbeddings to load other models
-embeddings = LlamaEmbeddings(model_path='/path/to/converted/model.bin')
-# switch to GptneoxLLM/BloomLLM/StarcoderLLM to load other models
-ipex_llm = LlamaLLM(model_path='/path/to/converted/model.bin')
-
-doc_chain = load_qa_chain(ipex_llm, ...)
-doc_chain.run(...)
-```
\ No newline at end of file
diff --git a/docs/mddocs/Overview/KeyFeatures/native_format.md b/docs/mddocs/Overview/KeyFeatures/native_format.md
deleted file mode 100644
index dd98560d..00000000
--- a/docs/mddocs/Overview/KeyFeatures/native_format.md
+++ /dev/null
@@ -1,29 +0,0 @@
-# Native Format
-
-You may also convert Hugging Face *Transformers* models into native INT4 format for maximum performance as follows.
-
-> [!NOTE]
-> Currently only llama/bloom/gptneox/starcoder/chatglm model families are supported; you may use the corresponding API to load the converted model. (For other models, you can use the Hugging Face `transformers` format as described [here](./hugging_face_format.md))
-
-
-```python
-# convert the model
-from ipex_llm import llm_convert
-ipex_llm_path = llm_convert(model='/path/to/model/',
-                            outfile='/path/to/output/',
-                            outtype='int4',
-                            model_family="llama")
-
-# load the converted model
-# switch to ChatGLMForCausalLM/GptneoxForCausalLM/BloomForCausalLM/StarcoderForCausalLM to load other models
-from ipex_llm.transformers import LlamaForCausalLM
-llm = LlamaForCausalLM.from_pretrained("/path/to/output/model.bin", native=True, ...)
-
-# run the converted model
-input_ids = llm.tokenize(prompt)
-output_ids = llm.generate(input_ids, ...)
-output = llm.batch_decode(output_ids)
-```
-
-> [!NOTE]
-> See the complete example [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/Native-Models)
diff --git a/docs/mddocs/Overview/KeyFeatures/transformers_style_api.md b/docs/mddocs/Overview/KeyFeatures/transformers_style_api.md
deleted file mode 100644
index 22ae08ac..00000000
--- a/docs/mddocs/Overview/KeyFeatures/transformers_style_api.md
+++ /dev/null
@@ -1,6 +0,0 @@
-# `transformers`-style API
-
-You may run the LLMs using the `transformers`-style API in `ipex-llm`.
-
-* [Hugging Face `transformers` Format](./hugging_face_format.md)
-* [Native Format](./native_format.md)
\ No newline at end of file
diff --git a/docs/mddocs/README.md b/docs/mddocs/README.md
new file mode 100644
index 00000000..9569fa68
--- /dev/null
+++ b/docs/mddocs/README.md
@@ -0,0 +1,49 @@
+# IPEX-LLM Documentation
+
+## Table of Contents
+
+- [LLM in 5 minutes](./Overview/llm.md)
+- [Installation](./Overview/install.md)
+  - [CPU](./Overview/install_cpu.md)
+  - [GPU](./Overview/install_gpu.md)
+- [Docker Guides](./DockerGuides/)
+  - [Overview of IPEX-LLM Containers for Intel GPU](./DockerGuides/docker_windows_gpu.md)
+  - [Python Inference using IPEX-LLM on Intel GPU](./DockerGuides/docker_pytorch_inference_gpu.md)
+  - [Run/Develop PyTorch in VSCode with Docker on Intel GPU](./DockerGuides/docker_run_pytorch_inference_in_vscode.md)
+  - [Run llama.cpp/Ollama/Open-WebUI on an Intel GPU via Docker](./DockerGuides/docker_cpp_xpu_quickstart.md)
+  - [FastChat Serving with IPEX-LLM on Intel GPUs via docker](./DockerGuides/fastchat_docker_quickstart.md)
+  - [vLLM Serving with IPEX-LLM on Intel GPUs via Docker](./DockerGuides/vllm_docker_quickstart.md)
+  - [vLLM Serving with IPEX-LLM on Intel CPU via Docker](./DockerGuides/vllm_cpu_docker_quickstart.md)
+- [Quickstart](https://github.com/intel-analytics/ipex-llm/tree/main/docs/mddocs/Quickstart/)
+  - [`bigdl-llm` Migration Guide](./Quickstart/bigdl_llm_migration.md)
+  - [Install IPEX-LLM on Linux with Intel GPU](./Quickstart/install_linux_gpu.md)
+  - [Install IPEX-LLM on Windows with Intel GPU](./Quickstart/install_windows_gpu.md)
+  - [Run Local RAG using Langchain-Chatchat on Intel CPU and GPU](./Quickstart/chatchat_quickstart.md)
+  - [Run Text Generation WebUI on Intel GPU](./Quickstart/webui_quickstart.md)
+  - [Run Open WebUI with Intel GPU](./Quickstart/open_webui_with_ollama_quickstart.md)
+  - [Run PrivateGPT with IPEX-LLM on Intel GPU](./Quickstart/privateGPT_quickstart.md)
+  - [Run Coding Copilot in VSCode with Intel GPU](./Quickstart/continue_quickstart.md)
+  - [Run Dify on Intel GPU](./Quickstart/dify_quickstart.md)
+  - [Run Performance Benchmarking with IPEX-LLM](./Quickstart/benchmark_quickstart.md)
+  - [Run llama.cpp with IPEX-LLM on Intel GPU](./Quickstart/llama_cpp_quickstart.md)
+  - [Run Ollama with IPEX-LLM on Intel GPU](./Quickstart/ollama_quickstart.md)
+  - [Run Llama 3 on Intel GPU using llama.cpp and ollama with IPEX-LLM](./Quickstart/llama3_llamacpp_ollama_quickstart.md)
+  - [Serving using IPEX-LLM and FastChat](./Quickstart/fastchat_quickstart.md)
+  - [Serving using IPEX-LLM and vLLM on Intel GPU](./Quickstart/vLLM_quickstart.md)
+  - [Finetune LLM with Axolotl on Intel GPU](./Quickstart/axolotl_quickstart.md)
+  - [Run IPEX-LLM serving on Multiple Intel GPUs using DeepSpeed AutoTP and FastApi](./Quickstart/deepspeed_autotp_fastapi_quickstart.md)
+  - [Run RAGFlow with IPEX-LLM on Intel GPU](./Quickstart/ragflow_quickstart.md)
+- [Key Features](./Overview/KeyFeatures/)
+  - [PyTorch API](./Overview/KeyFeatures/optimize_model.md)
+  - [`transformers`-style API](./Overview/KeyFeatures/hugging_face_format.md)
+  - [GPU Supports](./Overview/KeyFeatures/gpu_supports.md)
+    - [Inference on GPU](./Overview/KeyFeatures/inference_on_gpu.md)
+    - [Finetune (QLoRA)](./Overview/KeyFeatures/finetune.md)
+    - [Multi Intel GPUs selection](./Overview/KeyFeatures/multi_gpus_selection.md)
+- [Examples](../../python/llm/example/)
+  - [CPU](../../python/llm/example/CPU/)
+  - [GPU](../../python/llm/example/GPU/)
+- [API Reference](./PythonAPI/)
+  - [IPEX-LLM PyTorch API](./PythonAPI/optimize.md)
+  - [IPEX-LLM `transformers`-style API](./PythonAPI/transformers.md)
+- [FAQ](./Overview/FAQ/faq.md)
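+
+## Quick Example
+
+As a starting point before diving into the guides above, the snippet below is a minimal sketch of the IPEX-LLM `transformers`-style API. The model path, prompt, and generation settings are illustrative assumptions; see the installation and GPU guides above for environment setup.
+
+```python
+import torch
+from transformers import AutoTokenizer
+from ipex_llm.transformers import AutoModelForCausalLM
+
+model_path = "/path/to/hf/model"  # hypothetical local or Hugging Face Hub path
+
+# load_in_4bit=True applies IPEX-LLM INT4 optimizations when the model is loaded
+model = AutoModelForCausalLM.from_pretrained(model_path,
+                                             load_in_4bit=True,
+                                             trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
+
+with torch.inference_mode():
+    input_ids = tokenizer.encode("What is IPEX-LLM?", return_tensors="pt")
+    output = model.generate(input_ids, max_new_tokens=32)
+    print(tokenizer.decode(output[0], skip_special_tokens=True))
+```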