Add initial python api doc in mddoc (2/2) (#11388)

* add PyTorch-API.md
* small change
* small change

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>

parent aafd6d55cd
commit b200e11e21

1 changed file with 85 additions and 0 deletions

85  docs/mddocs/PythonAPI/PyTorch-API.md  (Normal file)

@ -0,0 +1,85 @@

# IPEX-LLM PyTorch API

## Optimize Model

You can apply IPEX-LLM optimizations to any PyTorch model with a one-line code change by calling `optimize_model`, regardless of the library or API you are using.

### `ipex_llm.optimize_model`_`(model, low_bit='sym_int4', optimize_llm=True, modules_to_not_convert=None, cpu_embedding=False, lightweight_bmm=False, **kwargs)`_

A method to optimize any PyTorch model.

- **Parameters**:

  - **model**: The original PyTorch model (`nn.Module`).

  - **low_bit**: `str` value. Options are `'sym_int4'`, `'asym_int4'`, `'sym_int5'`, `'asym_int5'`, `'sym_int8'`, `'nf3'`, `'nf4'`, `'fp4'`, `'fp8'`, `'fp8_e4m3'`, `'fp8_e5m2'`, `'fp16'` or `'bf16'`. `'sym_int4'` means symmetric int 4, `'asym_int4'` means asymmetric int 4, `'nf4'` means 4-bit NormalFloat, etc. The corresponding low-bit optimizations are applied to the model.

  - **optimize_llm**: Whether to further optimize the LLM.

    Defaults to `True`.

  - **modules_to_not_convert**: List of `str` values naming the modules (`nn.Module`) to skip when optimizing the model.

    Defaults to `None`.

  - **cpu_embedding**: Whether to replace the Embedding layer. You may need to set it to `True` when running IPEX-LLM on GPU on Windows.

    Defaults to `False`.

  - **lightweight_bmm**: Whether to replace the `torch.bmm` ops. You may need to set it to `True` when running IPEX-LLM on GPU on Windows.

    Defaults to `False`.

- **Returns**: The optimized model.

- **Example**:
    ```python
    # Take OpenAI Whisper model as an example
    import whisper
    from ipex_llm import optimize_model

    audio = "audio.wav"                # placeholder: path to your own audio file
    saved_dir = "./whisper-tiny-low-bit"  # placeholder: where to save the optimized model

    model = whisper.load_model('tiny') # Load whisper model under pytorch framework
    model = optimize_model(model) # With only one line code change
    # Use the optimized model without other API change
    result = model.transcribe(audio, verbose=True, language="English")
    # (Optional) you can also save the optimized model by calling 'save_low_bit'
    model.save_low_bit(saved_dir)
    ```
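
The `low_bit` option names distinguish symmetric from asymmetric integer formats. As a rough, conceptual illustration of that difference only, the sketch below quantizes a small list of weights to 4 bits both ways in plain Python. It is not IPEX-LLM's implementation: the helper names, the single per-list scale, and the sample weights are all assumptions made for the example, and real kernels typically work on blocks of weights with their own storage layout.

```python
def quantize_sym_int4(values):
    """Symmetric 4-bit: one scale, signed codes in [-8, 7], zero maps to code 0."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs else 1.0
    codes = [max(-8, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize_sym_int4(codes, scale):
    return [c * scale for c in codes]


def quantize_asym_int4(values):
    """Asymmetric 4-bit: a scale plus a minimum offset, unsigned codes in [0, 15]."""
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 15 if hi > lo else 1.0
    codes = [max(0, min(15, round((v - lo) / scale))) for v in values]
    return codes, scale, lo


def dequantize_asym_int4(codes, scale, lo):
    return [lo + c * scale for c in codes]


def max_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


# Hypothetical, all-positive (skewed) weights chosen for illustration.
weights = [0.1, 0.4, 0.9, 1.5]

sym_codes, sym_scale = quantize_sym_int4(weights)
asym_codes, asym_scale, asym_min = quantize_asym_int4(weights)

sym_err = max_error(weights, dequantize_sym_int4(sym_codes, sym_scale))
asym_err = max_error(weights, dequantize_asym_int4(asym_codes, asym_scale, asym_min))

print("symmetric codes: ", sym_codes, "max error:", sym_err)
print("asymmetric codes:", asym_codes, "max error:", asym_err)
```

For this skewed sample the asymmetric variant reconstructs the weights more closely, because the symmetric variant spends half of its code range on negative values that never occur. The asymmetric variant has to store an extra offset in exchange.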

## Load Optimized Model

Loading the original model can consume significant resources. To avoid paying that cost every time, IPEX-LLM provides save/load APIs: you can save a model after low-bit optimization and later load the saved low-bit model directly. Saving and loading are platform-independent, so they work the same way regardless of the operating system.
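
As a back-of-the-envelope illustration of why a saved low-bit model is cheaper to load, the sketch below estimates weight storage for a hypothetical 6-billion-parameter model at 16 bits and at 4 bits per weight. The parameter count and the `weight_bytes` helper are assumptions for the example. The estimate covers raw weights only and ignores quantization scales and offsets, any layers left unconverted, and runtime memory.

```python
def weight_bytes(num_params, bits_per_weight):
    """Raw storage for the weights alone, in bytes."""
    return num_params * bits_per_weight / 8


GIB = 1024 ** 3
num_params = 6_000_000_000  # assumption: a model with roughly 6 billion parameters

fp16_gib = weight_bytes(num_params, 16) / GIB
int4_gib = weight_bytes(num_params, 4) / GIB

print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {int4_gib:.1f} GiB")
print(f"ratio:         {fp16_gib / int4_gib:.0f}x")
```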

### `ipex_llm.optimize.load_low_bit`_`(model, model_path)`_

Load the optimized PyTorch model.

- **Parameters**:

  - **model**: The PyTorch model instance.

  - **model_path**: The path of the saved optimized model.

- **Returns**: The optimized model.

- **Example**:
    ```python
    # Example 1:
    # Take ChatGLM2-6B model as an example
    # Make sure you have saved the optimized model by calling 'save_low_bit'
    from transformers import AutoModel
    from ipex_llm.optimize import low_memory_init, load_low_bit

    saved_dir = "./chatglm2-6b-low-bit"  # placeholder: directory passed to 'save_low_bit'

    with low_memory_init(): # Fast and low cost by loading model on meta device
        model = AutoModel.from_pretrained(saved_dir,
                                          torch_dtype="auto",
                                          trust_remote_code=True)
    model = load_low_bit(model, saved_dir) # Load the optimized model
    ```

    ```python
    # Example 2:
    # If the model doesn't fit the 'low_memory_init' method,
    # you can instead obtain the model instance through the traditional loading method.
    # Take OpenAI Whisper model as an example
    # Make sure you have saved the optimized model by calling 'save_low_bit'
    import whisper
    from ipex_llm.optimize import load_low_bit

    saved_dir = "./whisper-tiny-low-bit"  # placeholder: directory passed to 'save_low_bit'

    model = whisper.load_model('tiny') # A model instance through traditional loading method
    model = load_low_bit(model, saved_dir) # Load the optimized model
    ```
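
The save and load calls above can be combined into an "optimize once, reuse afterwards" pattern. The helper below is a minimal sketch of that idea, not part of IPEX-LLM: `get_low_bit_model` is a hypothetical name, the real functions are passed in as arguments, and treating a non-empty directory as "already saved" is an assumption about how you manage `saved_dir`.

```python
from pathlib import Path


def get_low_bit_model(saved_dir, create_model, optimize_model, load_low_bit):
    """Return an optimized model, reusing a saved low-bit copy when one exists.

    create_model:   zero-argument callable returning a fresh model instance
    optimize_model: e.g. ipex_llm.optimize_model
    load_low_bit:   e.g. ipex_llm.optimize.load_low_bit
    """
    saved = Path(saved_dir)
    # Assumption: a non-empty directory means an earlier run called 'save_low_bit'.
    if saved.is_dir() and any(saved.iterdir()):
        return load_low_bit(create_model(), str(saved))
    # First run: optimize the original model, then save it for later runs.
    model = optimize_model(create_model())
    model.save_low_bit(str(saved))
    return model


# Possible usage (requires ipex_llm and openai-whisper to be installed):
#   import whisper
#   from ipex_llm import optimize_model
#   from ipex_llm.optimize import load_low_bit
#   model = get_low_bit_model("./whisper-tiny-low-bit",
#                             lambda: whisper.load_model("tiny"),
#                             optimize_model, load_low_bit)
```

Passing the functions in as arguments keeps the helper independent of how the model is created, so the same code covers both the `low_memory_init` path and the traditional loading path.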