* Enhance model loading with device support and error handling
Updated device handling for model loading and added support for MPS. Improved error handling and fallback mechanisms for attention implementations.
* Improve device handling and model loading logic
Updated device argument handling to support MPS and added validation for MPS availability. Enhanced model loading logic based on the selected device type.
* Fall back only when the attention implementation is flash_attention_2, and restore some comments
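
The device validation and attention fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from this change: `resolve_device` and `load_with_fallback` are hypothetical helper names, the CPU fallback for an unavailable device and the choice of `"sdpa"` as the fallback implementation are assumptions, and in real use the availability flags would likely come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def resolve_device(requested: str, cuda_available: bool, mps_available: bool) -> str:
    """Validate the requested device (hypothetical helper).

    Assumption: an unavailable accelerator degrades to CPU rather than
    raising an error; the actual change may behave differently.
    """
    requested = requested.lower()
    if requested == "mps" and not mps_available:
        return "cpu"
    if requested == "cuda" and not cuda_available:
        return "cpu"
    return requested


def load_with_fallback(loader: Callable[[str], T], attn_implementation: str) -> T:
    """Call loader(attn_implementation), retrying once with a fallback.

    The retry happens only when flash_attention_2 was requested; any other
    implementation re-raises the original error. "sdpa" as the fallback
    value is an assumption made for this sketch.
    """
    try:
        return loader(attn_implementation)
    except Exception:
        if attn_implementation != "flash_attention_2":
            raise
        return loader("sdpa")
```

Here `loader` stands in for whatever call actually loads the model with a given attention implementation, so the retry logic stays independent of the model class.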
---------
Co-authored-by: YaoyaoChang <cyy574006791@qq.com>