Commit graph

30 commits

Author SHA1 Message Date
Yuwen Hu
968d99e6f5 Remove empty cache between each iteration of generation (#9660) 2023-12-12 17:24:06 +08:00
WeiguangHan
e9299adb3b LLM: Highlight some values in the html (#9635)
* highlight some values in the html

* revert the llm_performance_tests.yml
2023-12-07 19:02:41 +08:00
Yuwen Hu
0e8f4020e5 Add traceback error output for win igpu test api in benchmark (#9607) 2023-12-06 14:35:16 +08:00
Yuwen Hu
c998f5f2ba [LLM] iGPU long context tests (#9598)
* Temp enable PR

* Enable tests for 256-64

* Try again 128-64

* Empty cache after each iteration for igpu benchmark scripts

* Try tests for 512

* change order for 512

* Skip chatglm3 and llama2 for now

* Separate tests for 512-64

* Small fix

* Further fixes

* Change back to nightly again
2023-12-06 10:19:20 +08:00
Yuwen Hu
3f4ad97929 [LLM] Add performance tests for Windows iGPU (#9584)
* Add support for win gpu benchmark with peak gpu memory monitoring

* Add win igpu tests

* Small fix

* Forward outputs

* Small fix

* Test and small fixes

* Small fix

* Small fix and test

* Small fixes

* Add tests for 512-64 and change back to nightly tests

* Small fix
2023-12-04 20:50:02 +08:00
Ruonan Wang
139e98aa18 LLM: quick fix benchmark (#9509) 2023-11-22 10:19:57 +08:00
WeiguangHan
c2aeb4d1e8 del model after test (#9504) 2023-11-21 18:41:50 +08:00
Cengguang Zhang
ece5805572 LLM: add chatglm3-6b to latency benchmark test. (#9442) 2023-11-13 17:24:37 +08:00
WeiguangHan
84ab614aab LLM: add more models and skip runtime error (#9349)
* add more models and skip runtime error

* upgrade transformers

* temporarily removed Mistral-7B-v0.1

* temporarily disable the upload of arc perf result
2023-11-08 09:45:53 +08:00
Heyang Sun
af94058203 [LLM] Support CPU deepspeed distributed inference (#9259)
* [LLM] Support CPU Deepspeed distributed inference

* Update run_deepspeed.py

* Rename

* fix style

* add new codes

* refine

* remove annotated codes

* refine

* Update README.md

* refine doc and example code
2023-11-06 17:56:42 +08:00
binbin Deng
770ac70b00 LLM: add low_bit option in benchmark scripts (#9257) 2023-10-25 10:27:48 +08:00
WeiguangHan
ec9195da42 LLM: using html to visualize the perf result for Arc (#9228)
* LLM: using html to visualize the perf result for Arc

* deploy the html file

* add python license

* resolve some comments
2023-10-24 18:05:25 +08:00
Ruonan Wang
b15656229e LLM: fix benchmark issue (#9255) 2023-10-24 14:15:05 +08:00
WeiguangHan
b9194c5786 LLM: skip some model tests using certain api (#9163)
* LLM: Skip some model tests using certain api

* initialize variable named result
2023-10-18 09:39:27 +08:00
Ruonan Wang
4f34557224 LLM: support num_beams in all-in-one benchmark (#9141)
* support num_beams

* fix
2023-10-12 13:35:12 +08:00
Ruonan Wang
62ac7ae444 LLM: fix inaccurate input / output tokens of current all-in-one benchmark (#9137)
* first fix

* fix all apis

* fix
2023-10-11 17:13:34 +08:00
Ruonan Wang
1c8d5da362 LLM: fix llama tokenizer for all-in-one benchmark (#9129)
* fix tokenizer for gpu benchmark

* fix ipex fp16

* meet code review

* fix
2023-10-11 13:39:39 +08:00
Ruonan Wang
ad7d9231f5 LLM: add benchmark script for Max gpu and ipex fp16 gpu (#9112)
* add pvc bash

* meet code review

* rename to run-max-gpu.sh
2023-10-10 10:18:41 +08:00
Cengguang Zhang
26213a5829 LLM: Change benchmark bf16 load format. (#9035)
* LLM: Change benchmark bf16 load format.

* comment on bf16 chatglm.

* fix.
2023-09-22 17:38:38 +08:00
Xin Qiu
37bb0cbf8f Speed up gpt-j in gpu benchmark (#9000)
* Speed up gpt-j in gpu benchmark

* meet code review
2023-09-19 14:22:28 +08:00
Cengguang Zhang
74338fd291 LLM: add auto torch dtype in benchmark. (#8981) 2023-09-18 15:48:25 +08:00
Ruonan Wang
32716106e0 update use_cache=True (#8986) 2023-09-18 07:59:33 +08:00
Xin Qiu
64ee1d7689 update run_transformer_int4_gpu (#8983)
* xpuperf

* update run.py

* clean up

* update

* update

* meet code review
2023-09-15 15:10:04 +08:00
Cengguang Zhang
cca84b0a64 LLM: update llm benchmark scripts. (#8943)
* update llm benchmark scripts.

* change transformer_bf16 to pytorch_autocast_bf16.

* add autocast in transformer int4.

* revert autocast.

* add "pytorch_autocast_bf16" to doc

* fix comments.
2023-09-13 12:23:28 +08:00
Cengguang Zhang
3d2efe9608 LLM: update llm latency benchmark. (#8922) 2023-09-07 19:00:19 +08:00
binbin Deng
7897eb4b51 LLM: add benchmark scripts on GPU (#8916) 2023-09-07 18:08:17 +08:00
Xin Qiu
d8a01d7c4f fix chatglm in run.py (#8919) 2023-09-07 16:44:10 +08:00
Xin Qiu
e9de9d9950 benchmark for native int4 (#8918)
* native4

* update

* update

* update
2023-09-07 15:56:15 +08:00
Xin Qiu
5d9942a3ca transformer int4 and native int4 benchmark script for 32/256/1k/2k input (#8871)
* transformer

* move

* update

* add header

* update all-in-one

* clean up
2023-09-07 09:49:55 +08:00
Song Jiaming
c06f1ca93e [LLM] auto perf test to output to csv (#8846) 2023-09-01 10:48:00 +08:00