Commit graph

82 commits

Author SHA1 Message Date
binbin Deng
db8e90796a LLM: add avg token latency information and benchmark guide of autotp (#9940) 2024-01-19 15:09:57 +08:00
Xin Qiu
610b5226be move reserved memory to benchmark_utils.py (#9907)
* move reserved memory to benchmark_utils.py

* meet code review
2024-01-19 09:44:30 +08:00
Chen, Zhentao
a8c866c32b add ppl benchmark (#9914)
* add ppl benchmark

* add license

* add readme

* add dataset argument

* add dataset usage

* fixed low bit args

* correct result

* fix terminal display

* fix ppl update

* enable fp16 fp32 bf16

* format the desc

* fix model_kwargs

* add more readme
2024-01-18 17:54:28 +08:00
WeiguangHan
100e0a87e5 LLM: add compressed chatglm3 model (#9892)
* LLM: add compressed chatglm3 model

* small fix

* revert github action
2024-01-18 17:48:15 +08:00
Ruonan Wang
b059a32fff LLM: add benchmark api for bigdl-llm fp16 on GPU (#9919)
* add bmk for bigdl fp16

* fix
2024-01-17 14:24:35 +08:00
Cengguang Zhang
511cbcf773 LLM: add Ceval benchmark test. (#9872)
* init ceval benchmark test.

* upload dataset.

* add other tests.

* add qwen evaluator.

* fix qwen evaluator style.

* fix qwen evaluator style.

* update qwen evaluator.

* add llama evaluator.

* update eval

* fix typo.

* fix

* fix typo.

* fix llama evaluator.

* fix bug.

* fix style.

* delete dataset.

* fix style.

* fix style.

* add README.md and fix typo.

* fix comments.

* remove run scripts
2024-01-16 19:14:26 +08:00
WeiguangHan
0e69bfe6b0 LLM: fix the performance drop of starcoder (#9889)
* LLM: fix the performance drop of starcoder

* small fix

* small fix
2024-01-12 09:14:15 +08:00
Ziteng Zhang
4f4ce73f31 [LLM] Add transformer_autocast_bf16 into all-in-one (#9890)
* Add transformer_autocast_bf16 into all-in-one
2024-01-11 17:51:07 +08:00
WeiguangHan
33fd1f9c76 LLM: fix input length logic for run_transformer_int4_gpu (#9864)
* LLM: fix input length logic for run_transformer_int4_gpu

* small fix

* small fix

* small fix
2024-01-10 18:20:14 +08:00
Cheen Hau, 俊豪
b2aa267f50 Enhance LLM GPU installation document (#9828)
* Improve gpu install doc

* Add troubleshooting - setvars.sh not done properly.

* Further improvements

* 2024.x.x -> 2024.0

* Fixes

* Fix Install BigDL-LLM From Wheel : bigdl-llm[xpu_2.0]

* Remove "export USE_XETLA=OFF" for Max GPU
2024-01-09 16:30:50 +08:00
dingbaorong
f6bb4ab313 Arc stress test (#9795)
* add arc stress test

* triger ci

* triger CI

* triger ci

* disable ci
2023-12-27 21:02:41 +08:00
Shaojun Liu
6c75c689ea bigdl-llm stress test for stable version (#9781)
* 1k-512 2k-512 baseline

* add cpu stress test

* update yaml name

* update

* update

* clean up

* test

* update

* update

* update

* test

* update
2023-12-27 15:40:53 +08:00
dingbaorong
5cfb4c4f5b Arc stable version performance regression test (#9785)
* add arc stable version regression test

* empty gpu mem between different models

* triger ci

* comment spr test

* triger ci

* address kai's comments and disable ci

* merge fp8 and int4

* disable ci
2023-12-27 11:01:56 +08:00
WeiguangHan
c05d7e1532 LLM: add star_corder_15.5b model (#9772)
* LLM: add star_corder_15.5b model

* revert llm_performance_tests.yml
2023-12-26 18:55:56 +08:00
dingbaorong
64d05e581c add peak gpu mem stats in transformer_int4_gpu (#9766)
* add peak gpu mem stats in transformer_int4_gpu

* address weiguang's comments
2023-12-26 15:38:28 +08:00
Chen, Zhentao
7fd7c37e1b Enable fp8e5 harness (#9761)
* fix precision format like fp8e5

* match fp8_e5m2
2023-12-22 16:59:48 +08:00
WeiguangHan
474c099559 LLM: using separate threads to do inference (#9727)
* using separate threads to do inference

* resolve some comments

* resolve some comments

* revert llm_performance_tests.yml file
2023-12-21 17:56:43 +08:00
Chen, Zhentao
b06a3146c8 Fix 70b oom (#9738)
* add default value to bigdl llm

* fix model oom
2023-12-21 10:40:52 +08:00
WeiguangHan
3e8d198b57 LLM: add eval func (#9662)
* Add eval func

* add left eval
2023-12-14 14:59:02 +08:00
Yuwen Hu
cbdd49f229 [LLM] win igpu performance for ipex 2.1 and oneapi 2024.0 (#9679)
* Change igpu win tests for ipex 2.1 and oneapi 2024.0

* Qwen model repo id updates; updates model list for 512-64

* Add .eval for win igpu all-in-one benchmark for best performance
2023-12-13 18:52:29 +08:00
Mingyu Wei
16febc949c [LLM] Add exclude option in all-in-one performance test (#9632)
* add exclude option in all-in-one perf test

* update arc-perf-test.yaml

* Exclude in_out_pairs in main function

* fix some bugs

* address Kai's comments

* define excludes at the beginning

* add bloomz:2048 to exclude
2023-12-13 18:13:06 +08:00
Yuwen Hu
968d99e6f5 Remove empty cache between each iteration of generation (#9660) 2023-12-12 17:24:06 +08:00
Chen, Zhentao
972cdb9992 gsm8k OOM workaround (#9597)
* update bigdl_llm.py

* update the installation of harness

* fix partial function

* import ipex

* force seq len in decrease order

* put func outside class

* move comments

* default 'trust_remote_code' as True

* Update llm-harness-evaluation.yml
2023-12-08 18:47:25 +08:00
WeiguangHan
e9299adb3b LLM: Highlight some values in the html (#9635)
* highlight some values in the html

* revert the llm_performance_tests.yml
2023-12-07 19:02:41 +08:00
Yuwen Hu
48b85593b3 Update all-in-one benchmark readme (#9618) 2023-12-07 10:32:09 +08:00
Yuwen Hu
0e8f4020e5 Add traceback error output for win igpu test api in benchmark (#9607) 2023-12-06 14:35:16 +08:00
Yuwen Hu
c998f5f2ba [LLM] iGPU long context tests (#9598)
* Temp enable PR

* Enable tests for 256-64

* Try again 128-64

* Empty cache after each iteration for igpu benchmark scripts

* Try tests for 512

* change order for 512

* Skip chatglm3 and llama2 for now

* Separate tests for 512-64

* Small fix

* Further fixes

* Change back to nightly again
2023-12-06 10:19:20 +08:00
Chen, Zhentao
8c8a27ded7 Add harness summary job (#9457)
* format yml

* add make_table_results

* add summary job

* add a job to print single result

* upload full directory
2023-12-05 10:04:10 +08:00
Yuwen Hu
3f4ad97929 [LLM] Add performance tests for windows iGPU (#9584)
* Add support for win gpu benchmark with peak gpu memory monitoring

* Add win igpu tests

* Small fix

* Forward outputs

* Small fix

* Test and small fixes

* Small fix

* Small fix and test

* Small fixes

* Add tests for 512-64 and change back to nightly tests

* Small fix
2023-12-04 20:50:02 +08:00
Chen, Zhentao
cb228c70ea Add harness nightly (#9552)
* modify output_path as a directory

* schedule nightly at 21 on Friday

* add tasks and models for nightly

* add accuracy regression

* comment out if to test

* mixed fp4

* for test

* add  missing delimiter

* remove comma

* fixed golden results

* add mixed 4 golden result

* add more options

* add mistral results

* get golden result of stable lm

* move nightly scripts and results to test folder

* add license

* add fp8 stable lm golden

* run on all available devices

* trigger only when ready for review

* fix new line

* update golden

* add mistral
2023-12-01 14:16:35 +08:00
Chen, Zhentao
4d7d5d4c59 Add 3 leaderboard tasks (#9566)
* update leaderboard map

* download model and dataset without overwritten

* fix task drop

* run on all available devices
2023-12-01 14:01:14 +08:00
Chen, Zhentao
c8e0c2ed48 Fixed dumped logs in harness (#9549)
* install transformers==4.34.0

* modify output_path as a directory

* add device and task to output dir parents
2023-11-30 12:47:56 +08:00
Chen, Zhentao
45820cf3b9 add optimize model option (#9530) 2023-11-24 17:10:49 +08:00
Ruonan Wang
139e98aa18 LLM: quick fix benchmark (#9509) 2023-11-22 10:19:57 +08:00
WeiguangHan
c2aeb4d1e8 del model after test (#9504) 2023-11-21 18:41:50 +08:00
Cheen Hau, 俊豪
3e39828420 Update all in one benchmark readme (#9496)
* Add gperftools install to all in one benchmark readme

* Update readme
2023-11-21 14:57:16 +08:00
WeiguangHan
c487b53f21 LLM: only run arc perf test nightly (#9448)
* LLM: only run arc perf test nightly

* deleted unused python scripts

* rebase main
2023-11-15 19:38:14 +08:00
Chen, Zhentao
dbbdb53a18 fix multiple gpu usage (#9459) 2023-11-14 17:06:27 +08:00
Chen, Zhentao
d19ca21957 patch bigdl-llm model to harness by binding instead of patch file (#9420)
* add run_llb.py

* fix args interpret

* modify outputs

* update workflow

* add license

* test mixed 4 bit

* update readme

* use autotokenizer

* add timeout

* refactor workflow file

* fix working directory

* fix env

* throw exception if some jobs failed

* improve terminal outputs

* Disable var which cause the run stuck

* fix unknown precision

* fix key error

* directly output config instead

* rm harness submodule
2023-11-14 12:51:39 +08:00
Chen, Zhentao
0ecb9efb05 use AutoTokenizer to enable more models (#9446) 2023-11-13 17:47:43 +08:00
Cengguang Zhang
ece5805572 LLM: add chatglm3-6b to latency benchmark test. (#9442) 2023-11-13 17:24:37 +08:00
Chen, Zhentao
5747e2fe69 fix multiple gpu usage of harness (#9444) 2023-11-13 16:53:23 +08:00
Heyang Sun
b23b91407c fix llm-init on deepspeed missing lib (#9419) 2023-11-10 13:51:24 +08:00
Chen, Zhentao
298b64217e add auto triggered acc test (#9364)
* add auto triggered acc test

* use llama 7b instead

* fix env

* debug download

* fix download prefix

* add cut dirs

* fix env of model path

* fix dataset download

* full job

* source xpu env vars

* use matrix to trigger model run

* reset batch=1

* remove redirect

* remove some trigger

* add task matrix

* add precision list

* test llama-7b-chat

* use /mnt/disk1 to store model and datasets

* remove installation test

* correct downloading path

* fix HF vars

* add bigdl-llm env vars

* rename file

* fix hf_home

* fix script path

* rename as harness evalution

* rerun
2023-11-08 10:22:27 +08:00
WeiguangHan
84ab614aab LLM: add more models and skip runtime error (#9349)
* add more models and skip runtime error

* upgrade transformers

* temporarily removed Mistral-7B-v0.1

* temporarily disable the upload of arc perf result
2023-11-08 09:45:53 +08:00
Heyang Sun
af94058203 [LLM] Support CPU deepspeed distributed inference (#9259)
* [LLM] Support CPU Deepspeed distributed inference

* Update run_deepspeed.py

* Rename

* fix style

* add new codes

* refine

* remove annotated codes

* refine

* Update README.md

* refine doc and example code
2023-11-06 17:56:42 +08:00
Chen, Zhentao
d4dffbdb62 Merge harness (#9319)
* add harness patch and llb script

* add readme

* add license

* use patch instead

* update readme

* rename tests to evaluation

* fix typo

* remove nano dependency

* add original harness link

* rename title of usage

* rename BigDLGPULM as BigDLLM

* empty commit to rerun job
2023-11-02 15:14:19 +08:00
Ruonan Wang
7e73c354a6 LLM: decoupling bigdl-llm and bigdl-nano (#9306) 2023-11-01 11:00:54 +08:00
binbin Deng
770ac70b00 LLM: add low_bit option in benchmark scripts (#9257) 2023-10-25 10:27:48 +08:00
WeiguangHan
ec9195da42 LLM: using html to visualize the perf result for Arc (#9228)
* LLM: using html to visualize the perf result for Arc

* deploy the html file

* add python license

* reslove some comments
2023-10-24 18:05:25 +08:00