Add index page for API doc & links update in mddocs (#11393)

* Small fixes
* Add initial api doc index
* Change index.md -> README.md
* Fix on API links
2024-06-21 17:34:34 +08:00 · 2024-06-21 17:34:34 +08:00 · 4cb9a4728e
commit 4cb9a4728e
parent b200e11e21
8 changed files with 104 additions and 88 deletions
--- a/docs/mddocs/DockerGuides/README.md
+++ b/docs/mddocs/DockerGuides/README.md
--- a/docs/mddocs/Overview/KeyFeatures/README.md
+++ b/docs/mddocs/Overview/KeyFeatures/README.md
--- a/docs/mddocs/Overview/KeyFeatures/inference_on_gpu.md
+++ b/docs/mddocs/Overview/KeyFeatures/inference_on_gpu.md
@@ -34,7 +34,7 @@ You could choose to use [PyTorch API](./optimize_model.md) or [`transformers`-st
  >
  > For Windows users running LLMs on Intel iGPUs, we recommend setting `cpu_embedding=True` in the `optimize_model` function. This allows the memory-intensive embedding layer to use the CPU instead of the iGPU.
  >
-  > See the [API doc](https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/optimize.html) for ``optimize_model`` to find more information.
+  > See the [API doc](../../PythonAPI/optimize.md) for ``optimize_model`` to find more information.

  In particular, if you have saved the optimized model following the steps [here](./optimize_model.md#save), loading it on Intel GPUs may look as follows:

@@ -70,7 +70,7 @@ You could choose to use [PyTorch API](./optimize_model.md) or [`transformers`-st
  >
  > For Windows users running LLMs on Intel iGPUs, we recommend setting `cpu_embedding=True` in the `from_pretrained` function. This allows the memory-intensive embedding layer to use the CPU instead of the iGPU.
  >
-  > See the [API doc](https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/transformers.html) to find more information.
+  > See the [API doc](../../PythonAPI/transformers.md) to find more information.

  In particular, if you have saved the optimized model following the steps [here](./hugging_face_format.md#save--load), loading it on Intel GPUs may look as follows:

--- a/docs/mddocs/Overview/KeyFeatures/optimize_model.md
+++ b/docs/mddocs/Overview/KeyFeatures/optimize_model.md
@@ -61,6 +61,6 @@ model = load_low_bit(model, saved_dir) # Load the optimized model


 > [!NOTE]
-> - Please refer to the [API documentation](https://ipex-llm.readthedocs.io/en/latest/doc/PythonAPI/LLM/optimize.html) for more details.
+> - Please refer to the [API documentation](../../PythonAPI/optimize.md) for more details.
 > - We also provide detailed examples on how to run PyTorch models (e.g., OpenAI Whisper, LLaMA2, ChatGLM2, Falcon, MPT, Baichuan2, etc.) using IPEX-LLM. See the complete CPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/CPU/PyTorch-Models) and GPU examples [here](https://github.com/intel-analytics/ipex-llm/tree/main/python/llm/example/GPU/PyTorch-Models).
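
The hunks above replace absolute readthedocs URLs with relative links. One way to sanity-check such a change is to resolve each relative link against the directory of the file that contains it, which is how GitHub resolves relative Markdown links. The sketch below does this with Python's standard `posixpath` module; the `resolve_link` helper and the `DOC_FILES` set are illustrative assumptions (`transformers.md` is linked by this commit but not added by it, so it is assumed to exist already).

```python
import posixpath

# Target files assumed to exist in the repository (illustrative subset).
DOC_FILES = {
    "docs/mddocs/PythonAPI/README.md",
    "docs/mddocs/PythonAPI/optimize.md",
    "docs/mddocs/PythonAPI/transformers.md",
}

def resolve_link(source_file: str, link: str) -> str:
    """Resolve a relative Markdown link against the directory of the file
    that contains it, dropping any '#anchor' suffix."""
    target = link.split("#", 1)[0]
    return posixpath.normpath(posixpath.join(posixpath.dirname(source_file), target))

# Links updated in this commit: (file containing the link, link target)
updated_links = [
    ("docs/mddocs/Overview/KeyFeatures/inference_on_gpu.md", "../../PythonAPI/optimize.md"),
    ("docs/mddocs/Overview/KeyFeatures/inference_on_gpu.md", "../../PythonAPI/transformers.md"),
    ("docs/mddocs/Overview/KeyFeatures/optimize_model.md", "../../PythonAPI/optimize.md"),
]

for source, link in updated_links:
    resolved = resolve_link(source, link)
    print(f"{source}: {link} -> {resolved}", "OK" if resolved in DOC_FILES else "MISSING")
```

Because `posixpath.normpath` collapses the `..` segments, two levels up from `Overview/KeyFeatures/` lands in `docs/mddocs/`, which is why these links need the `../../` prefix.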

--- a/docs/mddocs/PythonAPI/PyTorch-API.md
+++ b/docs/mddocs/PythonAPI/PyTorch-API.md
@@ -1,85 +0,0 @@
-# IPEX-LLM PyTorch API
-
-## Optimize Model
-You can run any PyTorch model with `optimize_model` through only one-line code change to benefit from IPEX-LLM optimization, regardless of the library or API you are using.
-
-### `ipex_llm.optimize_model`_`(model, low_bit='sym_int4', optimize_llm=True, modules_to_not_convert=None, cpu_embedding=False, lightweight_bmm=False, **kwargs)`_
-
-A method to optimize any pytorch model.
-    
- **Parameters**:
-
-  - **model**: The original PyTorch model (nn.module) 
-  
-  - **low_bit**: str value, options are `'sym_int4'`, `'asym_int4'`, `'sym_int5'`, `'asym_int5'`, `'sym_int8'`, `'nf3'`, `'nf4'`, `'fp4'`, `'fp8'`, `'fp8_e4m3'`, `'fp8_e5m2'`, `'fp16'` or `'bf16'`, `'sym_int4'` means symmetric int 4, `'asym_int4'` means asymmetric int 4, `'nf4'` means 4-bit NormalFloat, etc. Relevant low bit optimizations will be applied to the model.
-  
-  - **optimize_llm**: Whether to further optimize llm model. 
- 
-    Default to be `True`.
-  
-  - **modules_to_not_convert**: list of str value, modules (`nn.Module`) that are skipped when conducting model optimizations. 
-  
-    Default to be `None`.
-  
-  - **cpu_embedding**: Whether to replace the Embedding layer, may need to set it to `True` when running BigDL-LLM on GPU on Windows. 
-  
-    Default to be `False`.
-  
-  - **lightweight_bmm**: Whether to replace the `torch.bmm` ops, may need to set it to `True` when running BigDL-LLM on GPU on Windows. 
-  
-    Default to be `False`.
-  
- **Returns**: The optimized model.
-
- **Example**:
-    ```python
-    # Take OpenAI Whisper model as an example
-    from ipex_llm import optimize_model
-    model = whisper.load_model('tiny') # Load whisper model under pytorch framework
-    model = optimize_model(model) # With only one line code change
-    # Use the optimized model without other API change
-    result = model.transcribe(audio, verbose=True, language="English")
-    # (Optional) you can also save the optimized model by calling 'save_low_bit'
-    model.save_low_bit(saved_dir)
-    ```
-
-## Load Optimized Model
-
-To avoid high resource consumption during the loading processes of the original model, we provide save/load API to support the saving of model after low-bit optimization and the loading of the saved low-bit model. Saving and loading operations are platform-independent, regardless of their operating systems.
-
-### `ipex_llm.optimize.load_low_bit`_`(model, model_path)`_
-
-Load the optimized pytorch model.
-
- **Parameters**:
-
-  - **model**: The PyTorch model instance.
-  
-  - **model_path**: The path of saved optimized model.
-
-
- **Returns**: The optimized model.
-
- **Example**:
-    ```python
-    # Example 1:
-    # Take ChatGLM2-6B model as an example
-    # Make sure you have saved the optimized model by calling 'save_low_bit'
-    from ipex_llm.optimize import low_memory_init, load_low_bit
-    with low_memory_init(): # Fast and low cost by loading model on meta device
-        model = AutoModel.from_pretrained(saved_dir,
-                                          torch_dtype="auto",
-                                          trust_remote_code=True)
-    model = load_low_bit(model, saved_dir) # Load the optimized model
-    ```
-
-    ```python
-    # Example 2:
-    # If the model doesn't fit 'low_memory_init' method,
-    # alternatively, you can obtain the model instance through traditional loading method.
-    # Take OpenAI Whisper model as an example
-    # Make sure you have saved the optimized model by calling 'save_low_bit'
-    from ipex_llm.optimize import load_low_bit
-    model = whisper.load_model('tiny') # A model instance through traditional loading method
-    model = load_low_bit(model, saved_dir) # Load the optimized model
-    ```
--- a/docs/mddocs/PythonAPI/README.md
+++ b/docs/mddocs/PythonAPI/README.md
@@ -0,0 +1,22 @@
+# IPEX-LLM API
+
+- [IPEX-LLM `transformers`-style API](./transformers.md)
+
+  - [Hugging Face `transformers` AutoModel](./transformers.md#hugging-face-transformers-automodel)
+
+    - AutoModelForCausalLM
+    - AutoModel
+    - AutoModelForSpeechSeq2Seq
+    - AutoModelForSeq2SeqLM
+    - AutoModelForSequenceClassification
+    - AutoModelForMaskedLM
+    - AutoModelForQuestionAnswering
+    - AutoModelForNextSentencePrediction
+    - AutoModelForMultipleChoice
+    - AutoModelForTokenClassification
+
+- [IPEX-LLM PyTorch API](./optimize.md)
+  
+  - [Optimize Model](./optimize.md#optimize-model)
+
+  - [Load Optimized Model](./optimize.md#load-optimized-model)
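
The index links to section headings by anchor (for example `#optimize-model`). GitHub derives these anchors from the heading text. The hypothetical `github_slug` helper below approximates that rule (lowercase, strip punctuation such as backticks, turn spaces into hyphens) so the linked anchors can be checked against the headings. It is a sketch, not GitHub's exact implementation, and the `transformers.md` heading is inferred from its anchor rather than taken from this commit.

```python
import re

def github_slug(heading: str) -> str:
    """Approximate GitHub's heading anchor: lowercase the text, drop characters
    other than letters, digits, underscores, spaces and hyphens, then turn
    spaces into hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # removes backticks and other punctuation
    return slug.replace(" ", "-")

# Headings assumed to exist in optimize.md and transformers.md
headings = ["Optimize Model", "Load Optimized Model", "Hugging Face `transformers` AutoModel"]
linked_anchors = ["optimize-model", "load-optimized-model", "hugging-face-transformers-automodel"]

available = {github_slug(h) for h in headings}
for anchor in linked_anchors:
    print(f"#{anchor}:", "found" if anchor in available else "missing")
```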
--- a/docs/mddocs/PythonAPI/optimize.md
+++ b/docs/mddocs/PythonAPI/optimize.md
@@ -0,0 +1,79 @@
+# IPEX-LLM PyTorch API
+
+## Optimize Model
+You can run any PyTorch model with `optimize_model` and benefit from IPEX-LLM optimizations with only a one-line code change, regardless of the library or API you are using.
+
+### `ipex_llm.optimize_model`_`(model, low_bit='sym_int4', optimize_llm=True, modules_to_not_convert=None, cpu_embedding=False, lightweight_bmm=False, **kwargs)`_
+
+Optimize any PyTorch model.
+    
+- **Parameters**:
+
+  - **model**: The original PyTorch model (`nn.Module`).
+  
+  - **low_bit**: str value. Options are `'sym_int4'`, `'asym_int4'`, `'sym_int5'`, `'asym_int5'`, `'sym_int8'`, `'nf3'`, `'nf4'`, `'fp4'`, `'fp8'`, `'fp8_e4m3'`, `'fp8_e5m2'`, `'fp16'` or `'bf16'`. `'sym_int4'` means symmetric int4, `'asym_int4'` means asymmetric int4, `'nf4'` means 4-bit NormalFloat, etc. The corresponding low-bit optimizations are applied to the model. Defaults to `'sym_int4'`.
+  
+  - **optimize_llm**: Whether to further optimize the LLM. Defaults to `True`.
+  
+  - **modules_to_not_convert**: list of str values specifying the modules (`nn.Module`) to skip during model optimization. Defaults to `None`.
+  
+  - **cpu_embedding**: Whether to replace the Embedding layer so that it runs on the CPU. You may need to set it to `True` when running IPEX-LLM on GPU. Defaults to `False`.
+  
+  - **lightweight_bmm**: Whether to replace the `torch.bmm` ops. You may need to set it to `True` when running IPEX-LLM on GPU on Windows. Defaults to `False`.
+  
+- **Returns**: The optimized model.
+
+- **Example**:
+
+  ```python
+  # Take the OpenAI Whisper model as an example
+  import whisper
+  from ipex_llm import optimize_model
+  model = whisper.load_model('tiny')  # Load the Whisper model with PyTorch
+  model = optimize_model(model)       # The only line of code that changes
+  # Use the optimized model without any other API changes
+  # ('audio' is your input, e.g. the path to an audio file)
+  result = model.transcribe(audio, verbose=True, language="English")
+  # (Optional) Save the optimized model to 'saved_dir' by calling 'save_low_bit'
+  model.save_low_bit(saved_dir)
+  ```
+
+## Load Optimized Model
+
+To avoid the high resource consumption of loading the original model, we provide a save/load API: you can save a model after low-bit optimization and later load the saved low-bit model directly. Saving and loading are platform-independent, so they work regardless of the operating system.
+
+### `ipex_llm.optimize.load_low_bit`_`(model, model_path)`_
+
+Load an optimized PyTorch model.
+
+- **Parameters**:
+
+  - **model**: The PyTorch model instance.
+  
+  - **model_path**: The path to the saved optimized model.
+
+
+- **Returns**: The optimized model.
+
+- **Example**:
+
+  ```python
+  # Example 1:
+  # Take the ChatGLM2-6B model as an example
+  # Make sure you have saved the optimized model by calling 'save_low_bit'
+  from transformers import AutoModel
+  from ipex_llm.optimize import low_memory_init, load_low_bit
+  with low_memory_init(): # Fast, low-cost loading by initializing the model on the meta device
+      model = AutoModel.from_pretrained(saved_dir,
+                                        torch_dtype="auto",
+                                        trust_remote_code=True)
+  model = load_low_bit(model, saved_dir) # Load the optimized model
+  ```
+
+  ```python
+  # Example 2:
+  # If the model does not work with 'low_memory_init',
+  # you can instead obtain the model instance through the traditional loading method.
+  # Take the OpenAI Whisper model as an example
+  # Make sure you have saved the optimized model by calling 'save_low_bit'
+  import whisper
+  from ipex_llm.optimize import load_low_bit
+  model = whisper.load_model('tiny') # A model instance from the traditional loading method
+  model = load_low_bit(model, saved_dir) # Load the optimized model
+  ```
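
The `low_bit` option distinguishes symmetric (`'sym_int4'`) from asymmetric (`'asym_int4'`) 4-bit quantization. As a rough illustration of the conventional meaning of those terms, the sketch below quantizes a list of weights per tensor in plain Python: the symmetric variant uses a single scale with the zero point fixed at 0 (codes in [-8, 7]), while the asymmetric variant adds a zero point so the codes [0, 15] span the actual min/max range. This is not IPEX-LLM's kernel; the real implementation may differ in grouping, rounding, and storage, and all function names here are hypothetical.

```python
def quantize_sym_int4(weights):
    """Symmetric int4: one scale, zero point fixed at 0, codes clamped to [-8, 7]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs else 1.0  # avoid division by zero for all-zero input
    codes = [max(-8, min(7, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_sym(codes, scale):
    return [c * scale for c in codes]

def quantize_asym_int4(weights):
    """Asymmetric int4: scale plus zero point, codes clamped to [0, 15]."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    zero_point = round(-lo / scale)  # the code that represents the value 0.0
    codes = [max(0, min(15, round(w / scale) + zero_point)) for w in weights]
    return codes, scale, zero_point

def dequantize_asym(codes, scale, zero_point):
    return [(c - zero_point) * scale for c in codes]

weights = [-0.6, -0.1, 0.2, 0.9]
print(quantize_sym_int4(weights))
print(quantize_asym_int4(weights))
```

For skewed weight distributions the asymmetric variant can use all 16 codes, whereas the symmetric variant wastes part of its range on values that never occur; the trade-off is storing and applying an extra zero point.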
--- a/docs/mddocs/Quickstart/README.md
+++ b/docs/mddocs/Quickstart/README.md