Commit graph

169 commits

Author SHA1 Message Date
Chen, Zhentao
f315c7f93a Move harness nightly-related files to llm/test folder (#10209)
* move harness nightly files to test folder

* change workflow file path accordingly

* use arc01 when running on PR

* fix path

* fix fp16 csv path
2024-02-23 11:12:36 +08:00
Yuwen Hu
21de2613ce [LLM] Add model loading time record for all-in-one benchmark (#10201)
* Add model loading time record in csv for all-in-one benchmark

* Small fix

* Small fix to the number of digits after the decimal point
2024-02-22 13:57:18 +08:00
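The commit above times the model load and writes it into the benchmark CSV. A minimal sketch of the idea, assuming a plain transformers load; the helper name and CSV layout are illustrative, not the benchmark's actual schema:

```python
import csv
import time

from transformers import AutoModelForCausalLM, AutoTokenizer

def record_load_time(repo_id: str, csv_path: str = "results.csv"):
    """Time how long the model takes to load and append it to the CSV."""
    start = time.perf_counter()
    model = AutoModelForCausalLM.from_pretrained(repo_id)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    load_time = round(time.perf_counter() - start, 2)  # two digits after the decimal point

    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerow([repo_id, load_time])
    return model, tokenizer
```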
Yuxuan Xia
7cbc2429a6 Fix C-Eval ChatGLM loading issue (#10206)
* Add c-eval workflow and modify running files

* Modify the chatglm evaluator file

* Modify the ceval workflow for triggering test

* Modify the ceval workflow file

* Modify the ceval workflow file

* Modify ceval workflow

* Adjust the ceval dataset download

* Add ceval workflow dependencies

* Modify ceval workflow dataset download

* Add ceval test dependencies

* Add ceval test dependencies

* Correct the result printing

* Fix the nightly test trigger time

* Fix ChatGLM loading issue
2024-02-22 10:00:43 +08:00
yb-peng
b1a97b71a9 Harness eval: Add is_last parameter and fix logical operator in highlight_vals (#10192)
* Add is_last parameter and fix logical operator in highlight_vals

* Add script to update HTML files in parent folder

* Add running update_html_in_parent_folder.py in summarize step

* Add license info

* Remove update_html_in_parent_folder.py from the "Summarize the results" step for pull requests
2024-02-21 14:45:32 +08:00
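highlight_vals here is the pandas-styling helper used when the harness summary is rendered to HTML; the fix combines the per-cell conditions with correct boolean logic, and is_last restricts coloring to the newest column. A hedged sketch, with thresholds and column names as illustrative assumptions:

```python
import pandas as pd

def highlight_vals(val, is_last: bool = False, low: float = -0.05, high: float = 0.05) -> str:
    """Return a CSS style for one cell: red for drops, green for gains.

    is_last marks the newest (current-run) column so only it gets colored;
    the threshold checks are combined with plain per-cell boolean logic.
    """
    if not is_last or not isinstance(val, float):
        return ""
    if val <= low:
        return "background-color: #f8696b"   # regression
    if val >= high:
        return "background-color: #63be7b"   # improvement
    return ""

# usage: style only the diff column of the summary table
df = pd.DataFrame({"task": ["hellaswag"], "diff": [-0.08]})
html = df.style.applymap(lambda v: highlight_vals(v, is_last=True), subset=["diff"]).to_html()
```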
Chen, Zhentao
39d37bd042 upgrade harness package version in workflow (#10188)
* upgrade harness

* update readme
2024-02-21 11:21:30 +08:00
Yuwen Hu
001c13243e [LLM] Add support for load_low_bit benchmark on Windows GPU (#10167)
* Add support for load_low_bit performance test on Windows GPU

* Small fix

* Small fix

* Save memory during converting model process

* Drop the first-run results when loading in low bit on MTL iGPU for better performance

* Small fix
2024-02-21 10:51:52 +08:00
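"Drop the first-run results" is the standard warm-up trick: the first generation after loading in low bit pays one-off setup costs, so it is excluded from the reported numbers. A minimal sketch, assuming the model and inputs are prepared elsewhere by the benchmark:

```python
import time

def timed_generate(model, inputs):
    """One timed generation; stands in for the benchmark's real measurement."""
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=32)
    return time.perf_counter() - start

def benchmark(model, inputs, trials: int = 3, warmup: int = 1):
    latencies = [timed_generate(model, inputs) for _ in range(warmup + trials)]
    return latencies[warmup:]  # drop the warm-up iteration(s) from the results
```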
yb-peng
de3dc609ee Modify harness evaluation workflow (#10174)
* Modify table header in harness

* Specify the file path of fp16.csv

* change run to run nightly, and the PR run to debug

* Modify the way fp16.csv is obtained: download it from GitHub

* Change the method to calculate diff in html table

* Change the method to calculate diff in html table

* Re-arrange job order

* Re-arrange job order

* Change limit

* Change fp16.csv path

* Change highlight rules

* Change limit
2024-02-20 18:55:43 +08:00
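The diff column compares the current harness numbers against an fp16 baseline (fp16.csv, downloaded from GitHub as of this PR). A hedged sketch of the relative-difference calculation; file and column names here are illustrative:

```python
import pandas as pd

cur = pd.read_csv("harness_results.csv")   # current run
fp16 = pd.read_csv("fp16.csv")             # baseline downloaded from GitHub

merged = cur.merge(fp16, on=["model", "task"], suffixes=("", "_fp16"))
# relative difference of each metric against the fp16 baseline
merged["diff"] = (merged["acc"] - merged["acc_fp16"]) / merged["acc_fp16"]
merged.to_html("harness_results.html", index=False)
```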
hxsz1997
6e10d98a8d Fix some typos (#10175)
* add llm-ppl workflow

* update the DATASET_DIR

* test multiple precisions

* modify nightly test

* match the updated ppl code

* add matrix.include

* fix the include error

* update the include

* add more models

* update the precision of include

* update nightly time and add more models

* fix the workflow_dispatch description, change the default model for PR runs, and modify the env

* modify workflow_dispatch language options

* modify options

* modify language options

* modify workflow_dispatch type

* modify type

* modify the type of language

* change seq_len type

* fix some typos

* revert changes to stress_test.txt
2024-02-20 14:14:53 +08:00
yb-peng
e31210ba00 Modify html table style and add fp16.csv in harness (#10169)
* Specify the version of pandas in harness evaluation workflow

* Specify the version of pandas in harness evaluation workflow

* Modify html table style and add fp16.csv in harness

* Modify comments
2024-02-19 18:13:40 +08:00
Yuxuan Xia
209122559a Add Ceval workflow and modify the result printing (#10140)
* Add c-eval workflow and modify running files

* Modify the chatglm evaluator file

* Modify the ceval workflow for triggering test

* Modify the ceval workflow file

* Modify the ceval workflow file

* Modify ceval workflow

* Adjust the ceval dataset download

* Add ceval workflow dependencies

* Modify ceval workflow dataset download

* Add ceval test dependencies

* Add ceval test dependencies

* Correct the result printing
2024-02-19 17:06:53 +08:00
yb-peng
b4dc33def6 In harness-evaluation workflow, add statistical tables (#10118)
* change storage

* fix typo

* change label

* change label to arc03

* change needs in the last step

* add generate csv in harness/make_table_results.py

* modify needs in the last job

* add csv to html

* fix path issue in llm-harness-summary-nightly

* modify output_path

* modify args in make_table_results.py

* modify make table command in summary

* change pr env label

* remove irrelevant code in summary; add set output path step; add limit in harness run

* re-organize code structure

* modify limit in run harness

* modify csv_to_html input path

* modify needs in summary-nightly
2024-02-08 19:01:05 +08:00
Yuxuan Xia
3832eb0ce0 Add ChatGLM C-Eval Evaluator (#10095)
* Add ChatGLM ceval evaluator

* Modify ChatGLM Evaluator Reference
2024-02-07 11:27:06 +08:00
Ovo233
2aaa21c41d LLM: Update ppl tests (#10092)
* update ppl tests

* use load_dataset api

* add exception handling

* add language argument

* address comments
2024-02-06 17:31:48 +08:00
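The ppl tests above moved to the datasets load_dataset API with a language argument and exception handling. A hedged sketch; the dataset name and split are assumptions, not what the tests necessarily use:

```python
from datasets import load_dataset

def get_ppl_dataset(language: str = "en"):
    """Fetch a per-language validation split for perplexity measurement."""
    try:
        # assumption: a multilingual corpus keyed by a language subset
        return load_dataset("mc4", language, split="validation", streaming=True)
    except Exception as err:
        raise RuntimeError(f"failed to load ppl dataset for '{language}'") from err
```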
dingbaorong
36c9442c6d Arc Stable version test (#10087)
* add batch_size in stable version test

* add batch_size in excludes

* add excludes for batch_size

* fix ci

* trigger regression test

* fix xpu version

* disable ci

* address Kai's comment

---------

Co-authored-by: Ariadne <wyn2000330@126.com>
2024-02-06 10:23:50 +08:00
WeiguangHan
c2e562d037 LLM: add batch_size to the csv and html (#10080)
* LLM: add batch_size to the csv and html

* small fix
2024-02-04 16:35:44 +08:00
WeiguangHan
d2d3f6b091 LLM: ensure the result of daily arc perf test (#10016)
* ensure the result of daily arc perf test

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* small fix

* concat more csvs

* small fix

* revert some files
2024-01-31 18:26:21 +08:00
Ovo233
226f398c2a fix ppl test errors (#10036) 2024-01-30 16:26:21 +08:00
Xin Qiu
13e61738c5 hide detail memory for each token in benchmark_utils.py (#10037) 2024-01-30 16:04:17 +08:00
Xin Qiu
7952bbc919 add conf batch_size to run_model (#10010) 2024-01-26 15:48:48 +08:00
Chen, Zhentao
762adc4f9d Reformat summary table (#9942)
* reformat the table

* refactor the file

* read result.json only
2024-01-25 23:49:00 +08:00
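"Read result.json only" means the summary table is rebuilt from the harness output JSONs rather than parsed logs. A minimal sketch of collecting those files into one table; the directory layout is an assumption:

```python
import json
from pathlib import Path

import pandas as pd

rows = []
for path in Path("outputs").rglob("result.json"):  # one JSON per model/precision run
    data = json.loads(path.read_text())
    for task, metrics in data.get("results", {}).items():
        rows.append({"run": path.parent.name, "task": task, **metrics})

table = pd.DataFrame(rows)
print(table.to_string(index=False))
```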
Ziteng Zhang
8b08ad408b Add batch_size in all_in_one (#9999)
Add batch_size in all_in_one, except run_native_int4
2024-01-25 17:43:49 +08:00
Chen, Zhentao
86055d76d5 fix optimize_model not working (#9995) 2024-01-25 16:39:05 +08:00
Chen, Zhentao
301425e377 harness tests on PVC with multiple XPUs (#9908)
* add run_multi_llb.py

* update readme

* add job hint
2024-01-23 13:20:37 +08:00
Cheen Hau, 俊豪
947b1e27b7 Add readme for Whisper Test (#9944)
* Fix local data path

* Remove non-essential files

* Add readme

* Minor fixes to script

* Bugfix, refactor

* Add references to original source. Bugfixes.

* Reviewer comments

* Properly print and explain output

* Move files to dev/benchmark

* Fixes
2024-01-22 15:11:33 +08:00
Xin Qiu
6fb3f40f7e fix error for benchmark_util.py running on cpu (#9949) 2024-01-22 10:14:40 +08:00
binbin Deng
db8e90796a LLM: add avg token latency information and benchmark guide of autotp (#9940) 2024-01-19 15:09:57 +08:00
Xin Qiu
610b5226be move reserved memory to benchmark_utils.py (#9907)
* move reserved memory to benchmark_utils.py

* meet code review
2024-01-19 09:44:30 +08:00
Chen, Zhentao
a8c866c32b add ppl benchmark (#9914)
* add ppl benchmark

* add license

* add readme

* add dataset argument

* add dataset usage

* fixed low bit args

* correct result

* fix terminal display

* fix ppl update

* enable fp16 fp32 bf16

* format the desc

* fix model_kwargs

* add more readme
2024-01-18 17:54:28 +08:00
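A perplexity (ppl) benchmark boils down to the exponentiated average negative log-likelihood per token over a corpus. A hedged sketch of the usual recipe; the non-overlapping chunking here is simplified (real harnesses typically use a sliding window):

```python
import math

import torch

@torch.no_grad()
def perplexity(model, tokenizer, text: str, window: int = 512) -> float:
    """Corpus perplexity = exp(mean negative log-likelihood per token)."""
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    nll_sum, n_tokens = 0.0, 0
    for start in range(0, input_ids.size(1), window):
        chunk = input_ids[:, start:start + window]
        if chunk.size(1) < 2:              # need at least one label to score
            break
        out = model(chunk, labels=chunk)   # out.loss is the mean NLL over the chunk
        nll_sum += out.loss.item() * chunk.size(1)
        n_tokens += chunk.size(1)
    return math.exp(nll_sum / n_tokens)
```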
WeiguangHan
100e0a87e5 LLM: add compressed chatglm3 model (#9892)
* LLM: add compressed chatglm3 model

* small fix

* revert github action
2024-01-18 17:48:15 +08:00
Ruonan Wang
b059a32fff LLM: add benchmark api for bigdl-llm fp16 on GPU (#9919)
* add benchmark for bigdl fp16

* fix
2024-01-17 14:24:35 +08:00
Cengguang Zhang
511cbcf773 LLM: add Ceval benchmark test. (#9872)
* init ceval benchmark test.

* upload dataset.

* add other tests.

* add qwen evaluator.

* fix qwen evaluator style.

* fix qwen evaluator style.

* update qwen evaluator.

* add llama evaluator.

* update eval

* fix typo.

* fix

* fix typo.

* fix llama evaluator.

* fix bug.

* fix style.

* delete dataset.

* fix style.

* fix style.

* add README.md and fix typo.

* fix comments.

* remove run scripts
2024-01-16 19:14:26 +08:00
WeiguangHan
0e69bfe6b0 LLM: fix the performance drop of starcoder (#9889)
* LLM: fix the performance drop of starcoder

* small fix

* small fix
2024-01-12 09:14:15 +08:00
Ziteng Zhang
4f4ce73f31 [LLM] Add transformer_autocast_bf16 into all-in-one (#9890)
* Add transformer_autocast_bf16 into all-in-one
2024-01-11 17:51:07 +08:00
WeiguangHan
33fd1f9c76 LLM: fix input length logic for run_transformer_int4_gpu (#9864)
* LLM: fix input length logic for run_transformer_int4_gpu

* small fix

* small fix

* small fix
2024-01-10 18:20:14 +08:00
Cheen Hau, 俊豪
b2aa267f50 Enhance LLM GPU installation document (#9828)
* Improve gpu install doc

* Add troubleshooting - setvars.sh not done properly.

* Further improvements

* 2024.x.x -> 2024.0

* Fixes

* Fix Install BigDL-LLM From Wheel: bigdl-llm[xpu_2.0]

* Remove "export USE_XETLA=OFF" for Max GPU
2024-01-09 16:30:50 +08:00
dingbaorong
f6bb4ab313 Arc stress test (#9795)
* add arc stress test

* trigger CI

* trigger CI

* trigger CI

* disable ci
2023-12-27 21:02:41 +08:00
Shaojun Liu
6c75c689ea bigdl-llm stress test for stable version (#9781)
* 1k-512 2k-512 baseline

* add cpu stress test

* update yaml name

* update

* update

* clean up

* test

* update

* update

* update

* test

* update
2023-12-27 15:40:53 +08:00
dingbaorong
5cfb4c4f5b Arc stable version performance regression test (#9785)
* add arc stable version regression test

* empty gpu mem between different models

* trigger CI

* comment spr test

* trigger CI

* address Kai's comments and disable CI

* merge fp8 and int4

* disable ci
2023-12-27 11:01:56 +08:00
WeiguangHan
c05d7e1532 LLM: add star_corder_15.5b model (#9772)
* LLM: add star_corder_15.5b model

* revert llm_performance_tests.yml
2023-12-26 18:55:56 +08:00
dingbaorong
64d05e581c add peak gpu mem stats in transformer_int4_gpu (#9766)
* add peak gpu mem stats in transformer_int4_gpu

* address weiguang's comments
2023-12-26 15:38:28 +08:00
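Peak GPU memory for the transformer_int4_gpu API is typically read from the allocator's high-water mark. A sketch under the assumption that an XPU build of PyTorch with IPEX exposes the CUDA-style memory hooks:

```python
import torch
import intel_extension_for_pytorch  # noqa: F401 (registers the torch.xpu backend)

torch.xpu.reset_peak_memory_stats()        # assumption: CUDA-style hooks exist on xpu
# ... run the timed generation here ...
peak_gib = torch.xpu.max_memory_allocated() / (1 << 30)
print(f"peak gpu mem: {peak_gib:.2f} GiB")
```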
Chen, Zhentao
7fd7c37e1b Enable fp8e5 harness (#9761)
* fix precision format like fp8e5

* match fp8_e5m2
2023-12-22 16:59:48 +08:00
WeiguangHan
474c099559 LLM: using separate threads to do inference (#9727)
* using separate threads to do inference

* resolve some comments

* resolve some comments

* revert llm_performance_tests.yml file
2023-12-21 17:56:43 +08:00
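Running each inference in a separate thread lets the benchmark driver keep control of the run, for example to time out a hung model, instead of blocking the main thread. A minimal sketch; model and inputs are assumed to come from the surrounding harness:

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def infer(model, inputs):
    return model.generate(**inputs, max_new_tokens=32)

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(infer, model, inputs)
    try:
        output = future.result(timeout=600)  # abandon the run instead of hanging forever
    except FutureTimeout:
        print("inference timed out")
```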
Chen, Zhentao
b06a3146c8 Fix 70b oom (#9738)
* add default value to bigdl llm

* fix model oom
2023-12-21 10:40:52 +08:00
WeiguangHan
3e8d198b57 LLM: add eval func (#9662)
* Add eval func

* add remaining eval
2023-12-14 14:59:02 +08:00
Yuwen Hu
cbdd49f229 [LLM] win igpu performance for ipex 2.1 and oneapi 2024.0 (#9679)
* Change igpu win tests for ipex 2.1 and oneapi 2024.0

* Qwen model repo id updates; updates model list for 512-64

* Add .eval for win igpu all-in-one benchmark for best performance
2023-12-13 18:52:29 +08:00
Mingyu Wei
16febc949c [LLM] Add exclude option in all-in-one performance test (#9632)
* add exclude option in all-in-one perf test

* update arc-perf-test.yaml

* Exclude in_out_pairs in main function

* fix some bugs

* address Kai's comments

* define excludes at the beginning

* add bloomz:2048 to exclude
2023-12-13 18:13:06 +08:00
Yuwen Hu
968d99e6f5 Remove empty cache between each iteration of generation (#9660) 2023-12-12 17:24:06 +08:00
Chen, Zhentao
972cdb9992 gsm8k OOM workaround (#9597)
* update bigdl_llm.py

* update the installation of harness

* fix partial function

* import ipex

* force seq len in decreasing order

* put func outside class

* move comments

* default 'trust_remote_code' to True

* Update llm-harness-evaluation.yml
2023-12-08 18:47:25 +08:00
WeiguangHan
e9299adb3b LLM: Highlight some values in the html (#9635)
* highlight some values in the html

* revert the llm_performance_tests.yml
2023-12-07 19:02:41 +08:00
Yuwen Hu
48b85593b3 Update all-in-one benchmark readme (#9618) 2023-12-07 10:32:09 +08:00
Yuwen Hu
0e8f4020e5 Add traceback error output for win igpu test api in benchmark (#9607) 2023-12-06 14:35:16 +08:00
Yuwen Hu
c998f5f2ba [LLM] iGPU long context tests (#9598)
* Temp enable PR

* Enable tests for 256-64

* Try again 128-64

* Empty cache after each iteration for igpu benchmark scripts

* Try tests for 512

* change order for 512

* Skip chatglm3 and llama2 for now

* Separate tests for 512-64

* Small fix

* Further fixes

* Change back to nightly again
2023-12-06 10:19:20 +08:00
Chen, Zhentao
8c8a27ded7 Add harness summary job (#9457)
* format yml

* add make_table_results

* add summary job

* add a job to print single result

* upload full directory
2023-12-05 10:04:10 +08:00
Yuwen Hu
3f4ad97929 [LLM] Add performance tests for windows iGPU (#9584)
* Add support for win gpu benchmark with peak gpu memory monitoring

* Add win igpu tests

* Small fix

* Forward outputs

* Small fix

* Test and small fixes

* Small fix

* Small fix and test

* Small fixes

* Add tests for 512-64 and change back to nightly tests

* Small fix
2023-12-04 20:50:02 +08:00
Chen, Zhentao
cb228c70ea Add harness nightly (#9552)
* modify output_path as a directory

* schedule nightly at 21:00 on Friday

* add tasks and models for nightly

* add accuracy regression

* comment out the if-condition for testing

* mixed fp4

* for test

* add missing delimiter

* remove comma

* fixed golden results

* add mixed 4 golden result

* add more options

* add mistral results

* get golden result of stable lm

* move nightly scripts and results to test folder

* add license

* add fp8 stable lm golden

* run on all available devices

* trigger only when ready for review

* fix new line

* update golden

* add mistral
2023-12-01 14:16:35 +08:00
Chen, Zhentao
4d7d5d4c59 Add 3 leaderboard tasks (#9566)
* update leaderboard map

* download model and dataset without overwriting

* fix task drop

* run on all available devices
2023-12-01 14:01:14 +08:00
Chen, Zhentao
c8e0c2ed48 Fixed dumped logs in harness (#9549)
* install transformers==4.34.0

* modify output_path as a directory

* add device and task to output dir parents
2023-11-30 12:47:56 +08:00
Chen, Zhentao
45820cf3b9 add optimize model option (#9530) 2023-11-24 17:10:49 +08:00
Guancheng Fu
bf579507c2 Integrate vllm (#9310)
* done

* Rename structure

* add models

* Add structure/sampling_params,sequence

* add input_metadata

* add outputs

* Add policy,logger

* add and update

* add parallelconfig back

* core/scheduler.py

* Add llm_engine.py

* Add async_llm_engine.py

* Add tested entrypoint

* fix minor error

* Fix everything

* fix kv cache view

* fix

* fix

* fix

* format&refine

* remove logger from repo

* try to add token latency

* remove logger

* Refine config.py

* finish worker.py

* delete utils.py

* add license

* refine

* refine sequence.py

* remove sampling_params.py

* finish

* add license

* format

* add license

* refine

* refine

* Refine lines that are too long

* remove exception

* so dumb style-check

* refine

* refine

* refine

* refine

* refine

* refine

* add README

* refine README

* add warning instead of error

* fix padding

* add license

* format

* format

* format fix

* Refine vllm dependency (#1)

vllm dependency cleanup

* fix license

* fix format

* fix format

* fix

* adapt LLM engine

* fix

* add license

* fix format

* fix

* Move README.md to the correct position

* Fix readme.md

* done

* guide for adding models

* fix

* Fix README.md

* Add new model readme

* remove ray-logic

* refactor arg_utils.py

* remove distributed_init_method logic

* refactor entrypoints

* refactor input_metadata

* refactor model_loader

* refactor utils.py

* refactor models

* fix api server

* remove vllm.structure

* revert by txy 1120

* remove utils

* format

* fix license

* add bigdl model

* Refer to a specific commit

* Change code base

* add comments

* add async_llm_engine comment

* refine

* formatted

* add worker comments

* add comments

* add comments

* fix style

* add changes

---------

Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2023-11-23 16:46:45 +08:00
Ruonan Wang
139e98aa18 LLM: quick fix benchmark (#9509) 2023-11-22 10:19:57 +08:00
WeiguangHan
c2aeb4d1e8 del model after test (#9504) 2023-11-21 18:41:50 +08:00
Cheen Hau, 俊豪
3e39828420 Update all in one benchmark readme (#9496)
* Add gperftools install to all in one benchmark readme

* Update readme
2023-11-21 14:57:16 +08:00
WeiguangHan
c487b53f21 LLM: only run arc perf test nightly (#9448)
* LLM: only run arc perf test nightly

* deleted unused python scripts

* rebase main
2023-11-15 19:38:14 +08:00
Chen, Zhentao
dbbdb53a18 fix multiple gpu usage (#9459) 2023-11-14 17:06:27 +08:00
Chen, Zhentao
d19ca21957 patch bigdl-llm model to harness by binding instead of patch file (#9420)
* add run_llb.py

* fix args interpretation

* modify outputs

* update workflow

* add license

* test mixed 4 bit

* update readme

* use AutoTokenizer

* add timeout

* refactor workflow file

* fix working directory

* fix env

* throw exception if some jobs failed

* improve terminal outputs

* Disable var which causes the run to get stuck

* fix unknown precision

* fix key error

* directly output config instead

* rm harness submodule
2023-11-14 12:51:39 +08:00
Chen, Zhentao
0ecb9efb05 use AutoTokenizer to enable more models (#9446) 2023-11-13 17:47:43 +08:00
Cengguang Zhang
ece5805572 LLM: add chatglm3-6b to latency benchmark test. (#9442) 2023-11-13 17:24:37 +08:00
Chen, Zhentao
5747e2fe69 fix multiple gpu usage of harness (#9444) 2023-11-13 16:53:23 +08:00
Heyang Sun
b23b91407c fix llm-init on deepspeed missing lib (#9419) 2023-11-10 13:51:24 +08:00
Chen, Zhentao
298b64217e add auto triggered acc test (#9364)
* add auto triggered acc test

* use llama 7b instead

* fix env

* debug download

* fix download prefix

* add cut dirs

* fix env of model path

* fix dataset download

* full job

* source xpu env vars

* use matrix to trigger model run

* reset batch=1

* remove redirect

* remove some trigger

* add task matrix

* add precision list

* test llama-7b-chat

* use /mnt/disk1 to store model and datasets

* remove installation test

* correct downloading path

* fix HF vars

* add bigdl-llm env vars

* rename file

* fix hf_home

* fix script path

* rename as harness evaluation

* rerun
2023-11-08 10:22:27 +08:00
WeiguangHan
84ab614aab LLM: add more models and skip runtime error (#9349)
* add more models and skip runtime error

* upgrade transformers

* temporarily removed Mistral-7B-v0.1

* temporarily disable the upload of arc perf result
2023-11-08 09:45:53 +08:00
Heyang Sun
af94058203 [LLM] Support CPU deepspeed distributed inference (#9259)
* [LLM] Support CPU Deepspeed distributed inference

* Update run_deepspeed.py

* Rename

* fix style

* add new codes

* refine

* remove commented-out code

* refine

* Update README.md

* refine doc and example code
2023-11-06 17:56:42 +08:00
Chen, Zhentao
d4dffbdb62 Merge harness (#9319)
* add harness patch and llb script

* add readme

* add license

* use patch instead

* update readme

* rename tests to evaluation

* fix typo

* remove nano dependency

* add original harness link

* rename title of usage

* rename BigDLGPULM as BigDLLM

* empty commit to rerun job
2023-11-02 15:14:19 +08:00
Ruonan Wang
7e73c354a6 LLM: decoupling bigdl-llm and bigdl-nano (#9306) 2023-11-01 11:00:54 +08:00
binbin Deng
770ac70b00 LLM: add low_bit option in benchmark scripts (#9257) 2023-10-25 10:27:48 +08:00
WeiguangHan
ec9195da42 LLM: using html to visualize the perf result for Arc (#9228)
* LLM: using html to visualize the perf result for Arc

* deploy the html file

* add python license

* resolve some comments
2023-10-24 18:05:25 +08:00
Ruonan Wang
b15656229e LLM: fix benchmark issue (#9255) 2023-10-24 14:15:05 +08:00
WeiguangHan
b9194c5786 LLM: skip some model tests using certain api (#9163)
* LLM: Skip some model tests using certain api

* initialize variable named result
2023-10-18 09:39:27 +08:00
Ruonan Wang
4f34557224 LLM: support num_beams in all-in-one benchmark (#9141)
* support num_beams

* fix
2023-10-12 13:35:12 +08:00
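Supporting num_beams mostly means threading a config value through to generate(); num_beams=1 keeps the previous greedy behavior. A sketch, where the config lookup, input_ids, and out_len are assumptions standing in for the benchmark's own plumbing:

```python
# num_beams comes from the benchmark's config (assumption: default 1)
num_beams = conf.get("num_beams", 1)

output_ids = model.generate(
    input_ids,
    max_new_tokens=out_len,
    num_beams=num_beams,   # 1 = greedy decoding, >1 = beam search
)
```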
Ruonan Wang
62ac7ae444 LLM: fix inaccurate input/output tokens of current all-in-one benchmark (#9137)
* first fix

* fix all apis

* fix
2023-10-11 17:13:34 +08:00
Ruonan Wang
1c8d5da362 LLM: fix llama tokenizer for all-in-one benchmark (#9129)
* fix tokenizer for gpu benchmark

* fix ipex fp16

* meet code review

* fix
2023-10-11 13:39:39 +08:00
Ruonan Wang
1363e666fc LLM: update benchmark_util.py for beam search (#9126)
* update reorder_cache

* fix
2023-10-11 09:41:53 +08:00
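Beam search requires the benchmarked model to reorder its kv cache whenever beams are re-ranked, which is what the reorder_cache update above addresses. A sketch of that hook, mirroring the standard transformers _reorder_cache pattern:

```python
import torch

def reorder_cache(past_key_values, beam_idx: torch.Tensor):
    """Select, per layer, the cached keys/values of the surviving beams."""
    return tuple(
        tuple(state.index_select(0, beam_idx) for state in layer)
        for layer in past_key_values
    )
```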
Yuwen Hu
0e09dd926b [LLM] Fix example test (#9118)
* Update llm example test link due to example layout change

* Add better change detection
2023-10-10 13:24:18 +08:00
Ruonan Wang
ad7d9231f5 LLM: add benchmark script for Max gpu and ipex fp16 gpu (#9112)
* add pvc bash

* meet code review

* rename to run-max-gpu.sh
2023-10-10 10:18:41 +08:00
Yuwen Hu
65212451cc [LLM] Small update to performance tests (#9106)
* small updates to llm performance tests regarding model handling

* Small fix
2023-10-09 16:55:25 +08:00
Kai Huang
78ea7ddb1c Combine apply_rotary_pos_emb for gpt-neox (#9074) 2023-10-07 16:27:46 +08:00
Cengguang Zhang
ad62c58b33 LLM: Enable jemalloc in benchmark scripts. (#9058)
* enable jemalloc.

* fix readme.
2023-09-26 15:37:49 +08:00
Cengguang Zhang
26213a5829 LLM: Change benchmark bf16 load format. (#9035)
* LLM: Change benchmark bf16 load format.

* comment on bf16 chatglm.

* fix.
2023-09-22 17:38:38 +08:00
Kai Huang
6981745fe4 Optimize kv_cache for gpt-neox model family (#9015)
* override gptneox

* style

* move to utils

* revert
2023-09-20 19:59:19 +08:00
Xin Qiu
37bb0cbf8f Speed up gpt-j in GPU benchmark (#9000)
* Speed up gpt-j in GPU benchmark

* meet code review
2023-09-19 14:22:28 +08:00
Cengguang Zhang
8299b68fea update readme. (#8996) 2023-09-18 17:06:15 +08:00
Cengguang Zhang
74338fd291 LLM: add auto torch dtype in benchmark. (#8981) 2023-09-18 15:48:25 +08:00
Ruonan Wang
32716106e0 update use_cache=True (#8986) 2023-09-18 07:59:33 +08:00
Xin Qiu
64ee1d7689 update run_transformer_int4_gpu (#8983)
* xpuperf

* update run.py

* clean up

* update

* update

* meet code review
2023-09-15 15:10:04 +08:00
Cengguang Zhang
cca84b0a64 LLM: update llm benchmark scripts. (#8943)
* update llm benchmark scripts.

* change transformer_bf16 to pytorch_autocast_bf16.

* add autocast in transformer int4.

* revert autocast.

* add "pytorch_autocast_bf16" to doc

* fix comments.
2023-09-13 12:23:28 +08:00
Xin Qiu
ea0853c0b5 update benchmark_utils readme (#8925)
* update readme

* meet code review
2023-09-08 10:30:26 +08:00
Cengguang Zhang
3d2efe9608 LLM: update llm latency benchmark. (#8922) 2023-09-07 19:00:19 +08:00
binbin Deng
7897eb4b51 LLM: add benchmark scripts on GPU (#8916) 2023-09-07 18:08:17 +08:00
Xin Qiu
d8a01d7c4f fix chatglm in run.py (#8919) 2023-09-07 16:44:10 +08:00
Xin Qiu
e9de9d9950 benchmark for native int4 (#8918)
* native4

* update

* update

* update
2023-09-07 15:56:15 +08:00