ipex-llm/python
Qiyuan Gong 762ad49362
Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM (#11704)
* DeepSpeed-AutoTP starts multiple processes, each of which loads the model and converts it in CPU memory. If the model or the number of ranks is large, this can lead to OOM. Add RANK_WAIT_TIME to reduce peak memory usage by limiting how many ranks read the model in parallel.
2024-08-01 18:16:21 +08:00
..
llm Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM (#11704) 2024-08-01 18:16:21 +08:00
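
The commit message above describes throttling how many ranks load the model at once. One way this could work is sketched below, under assumptions: that `RANK_WAIT_TIME` is a number of seconds read from the environment, and that each rank sleeps for `LOCAL_RANK * RANK_WAIT_TIME` before loading. Neither detail is confirmed by this listing, and the helper names `rank_delay_seconds` and `wait_for_turn` are hypothetical, not ipex-llm APIs.

```python
import os
import time


def rank_delay_seconds(local_rank: int, rank_wait_time: float) -> float:
    """Return how long a rank waits before it starts loading the model.

    Assumed scheme: rank 0 starts immediately and rank k waits
    k * rank_wait_time seconds, so the ranks start loading one after another.
    """
    if local_rank < 0 or rank_wait_time < 0:
        raise ValueError("local_rank and rank_wait_time must be non-negative")
    return local_rank * rank_wait_time


def wait_for_turn(environ=os.environ, sleep=time.sleep) -> float:
    """Sleep for this rank's delay and return the delay in seconds.

    LOCAL_RANK is the per-node rank index set by distributed launchers.
    RANK_WAIT_TIME defaults to 0 here, which keeps fully parallel loading.
    """
    local_rank = int(environ.get("LOCAL_RANK", "0"))
    rank_wait_time = float(environ.get("RANK_WAIT_TIME", "0"))
    delay = rank_delay_seconds(local_rank, rank_wait_time)
    if delay > 0:
        sleep(delay)
    return delay


if __name__ == "__main__":
    waited = wait_for_turn()
    print(f"waited {waited:.1f}s before loading the model")
    # Model loading and CPU-side conversion would follow here.
```

Under this scheme a rank calls `wait_for_turn()` just before it reads the model. Staggering lowers peak CPU memory only if each rank releases its temporary load and conversion buffers before later ranks reach their own peak, so the wait time would need to be tuned to the model size.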