ipex-llm

Qiyuan Gong 762ad49362 Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM (#11704) * DeepSpeed-AutoTP starts multiple processes that each load the model and convert it in CPU memory. If the model or the number of ranks is large, this can lead to OOM. Add RANK_WAIT_TIME to reduce memory usage by controlling model reading parallelism.	2024-08-01 18:16:21 +08:00
CPU	Add Llama3.1 example (#11689)	2024-07-31 10:53:30 +08:00
GPU	Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM (#11704)	2024-08-01 18:16:21 +08:00
NPU/HF-Transformers-AutoModels	Switch to conhost when running on NPU (#11687)	2024-07-30 17:08:06 +08:00
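
The RANK_WAIT_TIME commit (#11704) describes limiting how many ranks read the model at the same time. A minimal sketch of that idea follows; it is not the code from the repository. It assumes each rank delays its load by `rank * RANK_WAIT_TIME` seconds and that the launcher sets `LOCAL_RANK`; the helper names `load_delay_seconds` and `staggered_load` are hypothetical.

```python
import os
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def load_delay_seconds(rank: int, wait_time: float) -> float:
    """Delay before a rank starts loading (assumed rule: rank * wait_time)."""
    # Negative inputs are clamped so a bad setting never yields a negative sleep.
    return max(rank, 0) * max(wait_time, 0.0)


def staggered_load(
    load_fn: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run load_fn after a rank-dependent delay (hypothetical helper).

    LOCAL_RANK is assumed to be set by the distributed launcher.
    RANK_WAIT_TIME is the variable named in the commit; treating it as
    seconds and defaulting it to 0 (no staggering) are assumptions.
    """
    rank = int(os.environ.get("LOCAL_RANK", "0"))
    wait_time = float(os.environ.get("RANK_WAIT_TIME", "0"))
    # Rank 0 starts immediately; later ranks wait progressively longer.
    sleep(load_delay_seconds(rank, wait_time))
    return load_fn()
```

Staggering lowers peak CPU memory only if an earlier rank has finished loading and released (or moved to the device) most of its CPU copy before the next rank starts, so the wait time would need to be tuned to the model's load time.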