ipex-llm

Qiyuan Gong 762ad49362 Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM (#11704) * DeepSpeed-AutoTP starts multiple processes that each load the model and convert it in CPU memory. If the model or rank_num is large, this can lead to OOM. Add RANK_WAIT_TIME to reduce memory usage by controlling model reading parallelism.		2024-08-01 18:16:21 +08:00
dev	Combine two versions of run_wikitext.py (#11597)	2024-07-29 15:56:16 +08:00
example	Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM (#11704)	2024-08-01 18:16:21 +08:00
portable-zip	Fix null pointer dereferences error. (#11125)	2024-05-30 16:16:10 +08:00
scripts	fix typo in python/llm/scripts/README.md (#11536)	2024-07-09 09:53:14 +08:00
src/ipex_llm	Fix import vllm condition (#11682)	2024-07-31 13:50:01 +08:00
test	add 3k and 4k input of nightly perf test on iGPU (#11701)	2024-08-01 14:17:46 +08:00
tpp	OSPDT: add tpp licenses (#11165)	2024-06-06 10:59:06 +08:00
.gitignore
setup.py	update doc/setup to use onednn gemm for cpp (#11598)	2024-07-18 13:04:38 +08:00
version.txt	Update setup.py and add new actions and add compatible mode (#25)	2024-03-22 15:44:59 +08:00