ipex-llm

Author	SHA1	Message	Date
Shaojun Liu	7f8c5b410b	Quickstart: Run PyTorch Inference on Intel GPU using Docker (on Linux or WSL) (#10970 ) * add entrypoint.sh * add quickstart * remove entrypoint * update * Install related library of benchmarking * update * print out results * update docs * minor update * update * update quickstart * update * update * update * update * update * update * add chat & example section * add more details * minor update * rename quickstart * update * minor update * update * update config.yaml * update readme * use --gpu * add tips * minor update * update	2024-05-14 12:58:31 +08:00
binbin Deng	f51bf018eb	Add benchmark script for pipeline parallel inference (#10873 )	2024-04-26 15:28:11 +08:00
yb-peng	2685c41318	Modify all-in-one benchmark (#10726 ) * Update 8192 prompt in all-in-one * Add cpu_embedding param for linux api * Update run.py * Update README.md	2024-04-11 13:38:50 +08:00
yb-peng	2d88bb9b4b	add test api transformer_int4_fp16_gpu (#10627 ) * add test api transformer_int4_fp16_gpu * update config.yaml and README.md in all-in-one * modify run.py in all-in-one * re-order test-api * re-order test-api in config * modify README.md in all-in-one * modify README.md in all-in-one * modify config.yaml --------- Co-authored-by: pengyb2001 <arda@arda-arc21.sh.intel.com> Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>	2024-04-07 15:47:17 +08:00
Xiangyu Tian	5a5fd5af5b	LLM: Add speculative benchmark on CPU/XPU (#10464 ) Add speculative benchmark on CPU/XPU.	2024-03-21 09:51:06 +08:00
Xiangyu Tian	cbe24cc7e6	LLM: Enable BigDL IPEX Int8 (#10480 ) Enable BigDL IPEX Int8	2024-03-20 15:59:54 +08:00
Jin Qiao	0451103a43	LLM: add int4+fp16 benchmark script for windows benchmarking (#10449 ) * LLM: add fp16 for benchmark script * remove transformer_int4_fp16_loadlowbit_gpu_win	2024-03-19 11:11:25 +08:00
Xiangyu Tian	0ded0b4b13	LLM: Enable BigDL IPEX optimization for int4 (#10319 ) Enable BigDL IPEX optimization for int4	2024-03-12 17:08:50 +08:00
binbin Deng	5d996a5caf	LLM: add benchmark script for deepspeed autotp on gpu (#10380 )	2024-03-12 15:19:57 +08:00
Yuwen Hu	27d9a14989	[LLM] all-in-one update: memory optimize and streaming output (#10302 ) * Memory saving for continuous in-out pair run and add support for streaming output on MTL iGPU * Small fix * Small fix * Add things back	2024-03-01 18:02:30 +08:00
Keyan (Kyrie) Zhang	59861f73e5	Add Deepseek-6.7B (#9991 ) * Add new example Deepseek * Add new example Deepseek * Add new example Deepseek * Add new example Deepseek * Add new example Deepseek * modify deepseek * modify deepseek * Add verified model in README * Turn cpu_embedding=True in Deepseek example --------- Co-authored-by: Shengsheng Huang <shengsheng.huang@intel.com>	2024-02-28 11:36:39 +08:00
Yuwen Hu	001c13243e	[LLM] Add support for `low_low_bit` benchmark on Windows GPU (#10167 ) * Add support for low_low_bit performance test on Windows GPU * Small fix * Small fix * Save memory during converting model process * Drop the results for first time when loading in low bit on mtl igpu for better performance * Small fix	2024-02-21 10:51:52 +08:00
Ziteng Zhang	8b08ad408b	Add batch_size in all_in_one (#9999 ) Add batch_size in all_in_one, except run_native_int4	2024-01-25 17:43:49 +08:00
Ruonan Wang	b059a32fff	LLM: add benchmark api for bigdl-llm fp16 on GPU (#9919 ) * add bmk for bigdl fp16 * fix	2024-01-17 14:24:35 +08:00
Ziteng Zhang	4f4ce73f31	[LLM] Add transformer_autocast_bf16 into all-in-one (#9890 ) * Add transformer_autocast_bf16 into all-in-one	2024-01-11 17:51:07 +08:00
Yuwen Hu	3f4ad97929	[LLM] Add performance tests for windows iGPU (#9584 ) * Add support for win gpu benchmark with peak gpu memory monitoring * Add win igpu tests * Small fix * Forward outputs * Small fix * Test and small fixes * Small fix * Small fix and test * Small fixes * Add tests for 512-64 and change back to nightly tests * Small fix	2023-12-04 20:50:02 +08:00
Heyang Sun	af94058203	[LLM] Support CPU deepspeed distributed inference (#9259 ) * [LLM] Support CPU Deepspeed distributed inference * Update run_deepspeed.py * Rename * fix style * add new codes * refine * remove annotated codes * refine * Update README.md * refine doc and example code	2023-11-06 17:56:42 +08:00
binbin Deng	770ac70b00	LLM: add `low_bit` option in benchmark scripts (#9257 )	2023-10-25 10:27:48 +08:00
Ruonan Wang	4f34557224	LLM: support num_beams in all-in-one benchmark (#9141 ) * support num_beams * fix	2023-10-12 13:35:12 +08:00
Ruonan Wang	ad7d9231f5	LLM: add benchmark script for Max gpu and ipex fp16 gpu (#9112 ) * add pvc bash * meet code review * rename to run-max-gpu.sh	2023-10-10 10:18:41 +08:00
Cengguang Zhang	cca84b0a64	LLM: update llm benchmark scripts. (#8943 ) * update llm benchmark scripts. * change tranformer_bf16 to pytorch_autocast_bf16. * add autocast in transformer int4. * revert autocast. * add "pytorch_autocast_bf16" to doc * fix comments.	2023-09-13 12:23:28 +08:00
binbin Deng	7897eb4b51	LLM: add benchmark scripts on GPU (#8916 )	2023-09-07 18:08:17 +08:00
Xin Qiu	e9de9d9950	benchmark for native int4 (#8918 ) * native4 * update * update * update	2023-09-07 15:56:15 +08:00
Xin Qiu	5d9942a3ca	transformer int4 and native int4's benchmark script for 32 256 1k 2k input (#8871 ) * transformer * move * update * add header * update all-in-one * clean up	2023-09-07 09:49:55 +08:00
Song Jiaming	7b3ac66e17	[LLM] auto performance test fix specific settings to template (#8876 )	2023-09-01 15:49:04 +08:00
Song Jiaming	c06f1ca93e	[LLM] auto perf test to output to csv (#8846 )	2023-09-01 10:48:00 +08:00

26 commits