| Author | Commit | Message | Date |
|---|---|---|---|
| Ruonan Wang | 057e77e229 | LLM: update benchmark_utils.py to handle do_sample=True (#8903) | 2023-09-07 14:20:47 +08:00 |
| Xin Qiu | 5d9942a3ca | transformer int4 and native int4's benchmark script for 32 256 1k 2k input (#8871): transformer; move; update; add header; update all-in-one; clean up | 2023-09-07 09:49:55 +08:00 |
| Xin Qiu | 49a39452c6 | update benchmark (#8899) | 2023-09-06 15:11:43 +08:00 |
| Song Jiaming | 7b3ac66e17 | [LLM] auto performance test fix specific settings to template (#8876) | 2023-09-01 15:49:04 +08:00 |
| Song Jiaming | c06f1ca93e | [LLM] auto perf test to output to csv (#8846) | 2023-09-01 10:48:00 +08:00 |
| Song Jiaming | b8b1b6888b | [LLM] Performance test (#8796) | 2023-08-25 14:31:45 +08:00 |
| Ruonan Wang | e9aa2bd890 | LLM: reduce GPU 1st token latency and update example (#8763): reduce 1st token latency; update example; fix; fix style; update readme of gpu benchmark | 2023-08-16 18:01:23 +08:00 |
| Ruonan Wang | 8805186f2f | LLM: add benchmark tool for gpu (#8760): add benchmark tool for gpu; update | 2023-08-16 11:22:10 +08:00 |
| Ruonan Wang | 64b38e1dc8 | llm: benchmark tool for transformers int4 (separate 1st token and rest) (#8460): add benchmark utils; fix; fix bug and add readme; hidden latency data | 2023-07-06 09:49:52 +08:00 |