Commit graph

30 commits

Author SHA1 Message Date
Yuwen Hu
968d99e6f5 Remove empty cache between each iteration of generation (#9660) 2023-12-12 17:24:06 +08:00
WeiguangHan
e9299adb3b LLM: Highlight some values in the html (#9635)
* highlight some values in the html

* revert the llm_performance_tests.yml
2023-12-07 19:02:41 +08:00
Yuwen Hu
0e8f4020e5 Add traceback error output for win igpu test api in benchmark (#9607) 2023-12-06 14:35:16 +08:00
Yuwen Hu
c998f5f2ba [LLM] iGPU long context tests (#9598)
* Temp enable PR

* Enable tests for 256-64

* Try again 128-64

* Empty cache after each iteration for igpu benchmark scripts

* Try tests for 512

* change order for 512

* Skip chatglm3 and llama2 for now

* Separate tests for 512-64

* Small fix

* Further fixes

* Change back to nightly again
2023-12-06 10:19:20 +08:00
Yuwen Hu
3f4ad97929 [LLM] Add performance tests for Windows iGPU (#9584)
* Add support for win gpu benchmark with peak gpu memory monitoring

* Add win igpu tests

* Small fix

* Forward outputs

* Small fix

* Test and small fixes

* Small fix

* Small fix and test

* Small fixes

* Add tests for 512-64 and change back to nightly tests

* Small fix
2023-12-04 20:50:02 +08:00
Ruonan Wang
139e98aa18 LLM: quick fix benchmark (#9509) 2023-11-22 10:19:57 +08:00
WeiguangHan
c2aeb4d1e8 del model after test (#9504) 2023-11-21 18:41:50 +08:00
Cengguang Zhang
ece5805572 LLM: add chatglm3-6b to latency benchmark test. (#9442) 2023-11-13 17:24:37 +08:00
WeiguangHan
84ab614aab LLM: add more models and skip runtime error (#9349)
* add more models and skip runtime error

* upgrade transformers

* temporarily removed Mistral-7B-v0.1

* temporarily disable the upload of arc perf result
2023-11-08 09:45:53 +08:00
Heyang Sun
af94058203 [LLM] Support CPU deepspeed distributed inference (#9259)
* [LLM] Support CPU Deepspeed distributed inference

* Update run_deepspeed.py

* Rename

* fix style

* add new codes

* refine

* remove annotated codes

* refine

* Update README.md

* refine doc and example code
2023-11-06 17:56:42 +08:00
binbin Deng
770ac70b00 LLM: add low_bit option in benchmark scripts (#9257) 2023-10-25 10:27:48 +08:00
WeiguangHan
ec9195da42 LLM: using html to visualize the perf result for Arc (#9228)
* LLM: using html to visualize the perf result for Arc

* deploy the html file

* add python license

* resolve some comments
2023-10-24 18:05:25 +08:00
Ruonan Wang
b15656229e LLM: fix benchmark issue (#9255) 2023-10-24 14:15:05 +08:00
WeiguangHan
b9194c5786 LLM: skip some model tests using certain api (#9163)
* LLM: Skip some model tests using certain api

* initialize variable named result
2023-10-18 09:39:27 +08:00
Ruonan Wang
4f34557224 LLM: support num_beams in all-in-one benchmark (#9141)
* support num_beams

* fix
2023-10-12 13:35:12 +08:00
Ruonan Wang
62ac7ae444 LLM: fix inaccurate input / output tokens of current all-in-one benchmark (#9137)
* first fix

* fix all apis

* fix
2023-10-11 17:13:34 +08:00
Ruonan Wang
1c8d5da362 LLM: fix llama tokenizer for all-in-one benchmark (#9129)
* fix tokenizer for gpu benchmark

* fix ipex fp16

* meet code review

* fix
2023-10-11 13:39:39 +08:00
Ruonan Wang
ad7d9231f5 LLM: add benchmark script for Max gpu and ipex fp16 gpu (#9112)
* add pvc bash

* meet code review

* rename to run-max-gpu.sh
2023-10-10 10:18:41 +08:00
Cengguang Zhang
26213a5829 LLM: Change benchmark bf16 load format. (#9035)
* LLM: Change benchmark bf16 load format.

* comment on bf16 chatglm.

* fix.
2023-09-22 17:38:38 +08:00
Xin Qiu
37bb0cbf8f Speed up gpt-j in gpu benchmark (#9000)
* Speed up gpt-j in gpu benchmark

* meet code review
2023-09-19 14:22:28 +08:00
Cengguang Zhang
74338fd291 LLM: add auto torch dtype in benchmark. (#8981) 2023-09-18 15:48:25 +08:00
Ruonan Wang
32716106e0 update use_cache=True (#8986) 2023-09-18 07:59:33 +08:00
Xin Qiu
64ee1d7689 update run_transformer_int4_gpu (#8983)
* xpuperf

* update run.py

* clean up

* update

* update

* meet code review
2023-09-15 15:10:04 +08:00
Cengguang Zhang
cca84b0a64 LLM: update llm benchmark scripts. (#8943)
* update llm benchmark scripts.

* change transformer_bf16 to pytorch_autocast_bf16.

* add autocast in transformer int4.

* revert autocast.

* add "pytorch_autocast_bf16" to doc

* fix comments.
2023-09-13 12:23:28 +08:00
Cengguang Zhang
3d2efe9608 LLM: update llm latency benchmark. (#8922) 2023-09-07 19:00:19 +08:00
binbin Deng
7897eb4b51 LLM: add benchmark scripts on GPU (#8916) 2023-09-07 18:08:17 +08:00
Xin Qiu
d8a01d7c4f fix chatglm in run.py (#8919) 2023-09-07 16:44:10 +08:00
Xin Qiu
e9de9d9950 benchmark for native int4 (#8918)
* native4

* update

* update

* update
2023-09-07 15:56:15 +08:00
Xin Qiu
5d9942a3ca transformer int4 and native int4 benchmark script for 32/256/1k/2k input (#8871)
* transformer

* move

* update

* add header

* update all-in-one

* clean up
2023-09-07 09:49:55 +08:00
Song Jiaming
c06f1ca93e [LLM] auto perf test to output to csv (#8846) 2023-09-01 10:48:00 +08:00