ipex-llm

Qiyuan Gong 762ad49362 Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM (#11704) * DeepSpeed-AutoTP starts multiple processes to load models and convert them in CPU memory. If the model or rank_num is large, this can lead to OOM. Add RANK_WAIT_TIME to reduce memory usage by controlling model reading parallelism.		2024-08-01 18:16:21 +08:00
..
llm	Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM (#11704)	2024-08-01 18:16:21 +08:00
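
The commit message above describes limiting how many ranks read the model at once. One way such a control could work is to stagger the start of each rank's load. The following is a minimal sketch under assumptions, not the code from the commit: the helper names `rank_wait_seconds` and `staggered_load` are hypothetical, reading the rank from `LOCAL_RANK` is an assumption about the launcher, and treating `RANK_WAIT_TIME` as a per-rank delay in seconds is a guess at its meaning.

```python
import os
import time


def rank_wait_seconds(local_rank: int, wait_time: float) -> float:
    """Delay before a rank starts loading: rank k waits k * wait_time.

    Hypothetical helper; a non-positive wait_time disables staggering.
    """
    if wait_time <= 0:
        return 0.0
    return local_rank * wait_time


def staggered_load(load_model, sleep=time.sleep):
    """Sleep for this rank's delay, then call load_model().

    Assumptions: LOCAL_RANK is set by the launcher, and RANK_WAIT_TIME
    is a per-rank delay in seconds (0 or unset means no delay).
    """
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    wait_time = float(os.environ.get("RANK_WAIT_TIME", "0"))
    sleep(rank_wait_seconds(local_rank, wait_time))
    return load_model()
```

With this scheme, ranks begin loading one after another instead of all at once, so peak CPU memory can be lower if earlier ranks have finished converting and released their buffers before later ranks start. The trade-off is a longer total startup time.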