* DeepSpeed-AutoTP starts multiple processes, each of which loads the model and converts it in CPU memory. If the model is large or many ranks are used, this can lead to out-of-memory (OOM) errors. Set `RANK_WAIT_TIME` to reduce peak memory usage by limiting how many ranks read the model in parallel.
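The idea behind `RANK_WAIT_TIME` can be illustrated with a staggered-loading sketch. This is a hypothetical illustration, not the actual logic of the example scripts: it assumes each rank reads its local rank from the `LOCAL_RANK` environment variable and waits `local_rank * RANK_WAIT_TIME` seconds before loading, so ranks read the model one after another instead of all at once. The helper names `rank_wait_seconds` and `staggered_load` are made up for this sketch.

```python
import os
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def rank_wait_seconds(local_rank: int, wait_time: float) -> float:
    """Hypothetical staggering policy: rank 0 starts immediately,
    rank k waits k * wait_time seconds before loading."""
    if local_rank < 0 or wait_time < 0:
        raise ValueError("local_rank and wait_time must be non-negative")
    return local_rank * wait_time


def staggered_load(
    load_fn: Callable[[], T],
    local_rank: Optional[int] = None,
    wait_time: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Sleep for this rank's delay, then call load_fn().

    When not passed explicitly, local_rank and wait_time are read from the
    LOCAL_RANK and RANK_WAIT_TIME environment variables (both default to 0).
    The sleep function is injectable so the delay can be tested without waiting.
    """
    if local_rank is None:
        local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    if wait_time is None:
        wait_time = float(os.environ.get("RANK_WAIT_TIME", "0"))
    delay = rank_wait_seconds(local_rank, wait_time)
    if delay > 0:
        sleep(delay)
    return load_fn()
```

For example, with `export RANK_WAIT_TIME=60` (an illustrative value), rank 0 loads immediately, rank 1 after 60 seconds, rank 2 after 120 seconds, and so on. Staggering only lowers peak CPU memory if each rank releases most of its CPU copy of the model (for instance, after moving its shard to the device) before the next rank starts reading. Pick a wait time long enough for one rank to finish loading and converting.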