ipex-llm/python/llm/example
Qiyuan Gong 762ad49362
Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM (#11704)
* DeepSpeed-AutoTP starts multiple processes that each load the model and convert it in CPU memory. If the model or the number of ranks is large, this can lead to OOM. Add RANK_WAIT_TIME to reduce peak memory usage by limiting how many ranks read the model in parallel.
2024-08-01 18:16:21 +08:00
CPU Add Llama3.1 example (#11689) 2024-07-31 10:53:30 +08:00
GPU Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM (#11704) 2024-08-01 18:16:21 +08:00
NPU/HF-Transformers-AutoModels Switch to conhost when running on NPU (#11687) 2024-07-30 17:08:06 +08:00
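
The RANK_WAIT_TIME idea from the commit message above (staggering model loading across ranks so they do not all hold a full copy in CPU memory at once) can be sketched roughly as follows. This is a minimal illustration, not the code from #11704: the helper names are hypothetical, and it assumes the rank is read from LOCAL_RANK, that RANK_WAIT_TIME is given in seconds, and that each rank simply waits rank * RANK_WAIT_TIME before loading.

```python
import os
import time


def rank_wait_seconds(rank: int, wait_per_rank: float) -> float:
    """Delay before a rank starts loading (hypothetical helper).

    Assumption: rank 0 starts immediately and each later rank waits
    one extra `wait_per_rank` interval. Negative inputs are clamped to 0.
    """
    return max(rank, 0) * max(wait_per_rank, 0.0)


def wait_for_turn(env=os.environ, sleep=time.sleep) -> float:
    """Sleep for this rank's delay, then return the delay that was used.

    Assumption: the launcher exports LOCAL_RANK and the user exports
    RANK_WAIT_TIME (seconds). Both default to 0, which means no waiting.
    """
    rank = int(env.get("LOCAL_RANK", "0"))
    wait_per_rank = float(env.get("RANK_WAIT_TIME", "0"))
    delay = rank_wait_seconds(rank, wait_per_rank)
    if delay > 0:
        sleep(delay)
    return delay


if __name__ == "__main__":
    # Call this before the memory-heavy model load in each rank's process.
    used = wait_for_turn()
    print(f"waited {used:.1f}s before loading the model")
```

A fixed per-rank delay is the simplest way to bound loading parallelism, but it trades startup time for memory: a larger RANK_WAIT_TIME lowers the peak but makes the last rank start later.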