* DeepSpeed-AutoTP starts multiple processes, each of which loads the model and converts it in CPU memory. If the model or rank_num is large, this can lead to OOM. Add RANK_WAIT_TIME to reduce memory usage by controlling the parallelism of model reading.
| Name |
|---|
| dev |
| example |
| portable-zip |
| scripts |
| src/ipex_llm |
| test |
| tpp |
| .gitignore |
| setup.py |
| version.txt |
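
The RANK_WAIT_TIME note above describes staggering model loading across ranks so that fewer processes hold a full copy of the model in CPU memory at once. The following is a minimal sketch of that idea, not the repository's actual implementation: it assumes the variable holds a number of seconds and that each rank waits `rank * RANK_WAIT_TIME` before loading. The helper names `stagger_delay` and `wait_for_turn` are hypothetical.

```python
import os
import time


def stagger_delay(rank: int, wait_time: float) -> float:
    """Seconds a rank waits before loading (assumed scheme: rank * wait_time)."""
    return rank * wait_time


def wait_for_turn(rank: int, env=os.environ, sleep=time.sleep) -> float:
    """Sleep for this rank's staggered delay and return the delay used.

    Assumption: RANK_WAIT_TIME is given in seconds; an unset variable means no wait.
    """
    wait_time = float(env.get("RANK_WAIT_TIME", "0"))
    delay = stagger_delay(rank, wait_time)
    if delay > 0:
        sleep(delay)
    return delay
```

Under this assumed scheme, a process would call `wait_for_turn(rank)` just before loading the model, so rank 0 starts immediately and later ranks start at fixed intervals. A larger wait time lowers peak CPU memory at the cost of a longer startup.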