ipex-llm

Author	SHA1	Message	Date
Xin Qiu	37bb0cbf8f	Speed up gpt-j in gpubenchmark (#9000) * Speedup gpt-j in gpubenchmark * meet code review	2023-09-19 14:22:28 +08:00
Cengguang Zhang	8299b68fea	update readme. (#8996)	2023-09-18 17:06:15 +08:00
Cengguang Zhang	74338fd291	LLM: add auto torch dtype in benchmark. (#8981)	2023-09-18 15:48:25 +08:00
Ruonan Wang	32716106e0	update use_cahce=True (#8986)	2023-09-18 07:59:33 +08:00
Xin Qiu	64ee1d7689	update run_transformer_int4_gpu (#8983) * xpuperf * update run.py * clean upo * uodate * update * meet code review	2023-09-15 15:10:04 +08:00
Cengguang Zhang	cca84b0a64	LLM: update llm benchmark scripts. (#8943) * update llm benchmark scripts. * change tranformer_bf16 to pytorch_autocast_bf16. * add autocast in transformer int4. * revert autocast. * add "pytorch_autocast_bf16" to doc * fix comments.	2023-09-13 12:23:28 +08:00
Xin Qiu	ea0853c0b5	update benchmark_utils readme (#8925) * update readme * meet code review	2023-09-08 10:30:26 +08:00
Cengguang Zhang	3d2efe9608	LLM: update llm latency benchmark. (#8922)	2023-09-07 19:00:19 +08:00
binbin Deng	7897eb4b51	LLM: add benchmark scripts on GPU (#8916)	2023-09-07 18:08:17 +08:00
Xin Qiu	d8a01d7c4f	fix chatglm in run.pu (#8919)	2023-09-07 16:44:10 +08:00
Xin Qiu	e9de9d9950	benchmark for native int4 (#8918) * native4 * update * update * update	2023-09-07 15:56:15 +08:00
Ruonan Wang	057e77e229	LLM: update benchmark_utils.py to handle do_sample=True (#8903)	2023-09-07 14:20:47 +08:00
Xin Qiu	5d9942a3ca	transformer int4 and native int4's benchmark script for 32 256 1k 2k input (#8871) * transformer * move * update * add header * update all-in-one * clean up	2023-09-07 09:49:55 +08:00
Xin Qiu	49a39452c6	update benchmark (#8899)	2023-09-06 15:11:43 +08:00
Song Jiaming	7b3ac66e17	[LLM] auto performance test fix specific settings to template (#8876)	2023-09-01 15:49:04 +08:00
Song Jiaming	c06f1ca93e	[LLM] auto perf test to output to csv (#8846)	2023-09-01 10:48:00 +08:00
Song Jiaming	b8b1b6888b	[LLM] Performance test (#8796)	2023-08-25 14:31:45 +08:00
Ruonan Wang	e9aa2bd890	LLM: reduce GPU 1st token latency and update example (#8763) * reduce 1st token latency * update example * fix * fix style * update readme of gpu benchmark	2023-08-16 18:01:23 +08:00
Ruonan Wang	8805186f2f	LLM: add benchmark tool for gpu (#8760) * add benchmark tool for gpu * update	2023-08-16 11:22:10 +08:00
Ruonan Wang	64b38e1dc8	llm: benchmark tool for transformers int4 (separate 1st token and rest) (#8460) * add benchmark utils * fix * fix bug and add readme * hidden latency data	2023-07-06 09:49:52 +08:00

20 commits