Chen, Zhentao
86055d76d5
fix optimize_model not working ( #9995 )
2024-01-25 16:39:05 +08:00
Chen, Zhentao
301425e377
harness tests on pvc multiple xpus ( #9908 )
...
* add run_multi_llb.py
* update readme
* add job hint
2024-01-23 13:20:37 +08:00
Cheen Hau, 俊豪
947b1e27b7
Add readme for Whisper Test ( #9944 )
...
* Fix local data path
* Remove non-essential files
* Add readme
* Minor fixes to script
* Bugfix, refactor
* Add references to original source. Bugfixes.
* Reviewer comments
* Properly print and explain output
* Move files to dev/benchmark
* Fixes
2024-01-22 15:11:33 +08:00
Xin Qiu
6fb3f40f7e
fix error for benchmark_util.py running on cpu ( #9949 )
2024-01-22 10:14:40 +08:00
binbin Deng
db8e90796a
LLM: add avg token latency information and benchmark guide of autotp ( #9940 )
2024-01-19 15:09:57 +08:00
Xin Qiu
610b5226be
move reserved memory to benchmark_utils.py ( #9907 )
...
* move reserved memory to benchmark_utils.py
* meet code review
2024-01-19 09:44:30 +08:00
Chen, Zhentao
a8c866c32b
add ppl benchmark ( #9914 )
...
* add ppl benchmark
* add license
* add readme
* add dataset argument
* add dataset usage
* fixed low bit args
* correct result
* fix terminal display
* fix ppl update
* enable fp16 fp32 bf16
* format the desc
* fix model_kwargs
* add more readme
2024-01-18 17:54:28 +08:00
WeiguangHan
100e0a87e5
LLM: add compressed chatglm3 model ( #9892 )
...
* LLM: add compressed chatglm3 model
* small fix
* revert github action
2024-01-18 17:48:15 +08:00
Ruonan Wang
b059a32fff
LLM: add benchmark api for bigdl-llm fp16 on GPU ( #9919 )
...
* add bmk for bigdl fp16
* fix
2024-01-17 14:24:35 +08:00
Cengguang Zhang
511cbcf773
LLM: add Ceval benchmark test. ( #9872 )
...
* init ceval benchmark test.
* upload dataset.
* add other tests.
* add qwen evaluator.
* fix qwen evaluator style.
* fix qwen evaluator style.
* update qwen evaluator.
* add llama evaluator.
* update eval
* fix typo.
* fix
* fix typo.
* fix llama evaluator.
* fix bug.
* fix style.
* delete dataset.
* fix style.
* fix style.
* add README.md and fix typo.
* fix comments.
* remove run scripts
2024-01-16 19:14:26 +08:00
WeiguangHan
0e69bfe6b0
LLM: fix the performance drop of starcoder ( #9889 )
...
* LLM: fix the performance drop of starcoder
* small fix
* small fix
2024-01-12 09:14:15 +08:00
Ziteng Zhang
4f4ce73f31
[LLM] Add transformer_autocast_bf16 into all-in-one ( #9890 )
...
* Add transformer_autocast_bf16 into all-in-one
2024-01-11 17:51:07 +08:00
WeiguangHan
33fd1f9c76
LLM: fix input length logic for run_transformer_int4_gpu ( #9864 )
...
* LLM: fix input length logic for run_transformer_int4_gpu
* small fix
* small fix
* small fix
2024-01-10 18:20:14 +08:00
Cheen Hau, 俊豪
b2aa267f50
Enhance LLM GPU installation document ( #9828 )
...
* Improve gpu install doc
* Add troubleshooting - setvars.sh not done properly.
* Further improvements
* 2024.x.x -> 2024.0
* Fixes
* Fix Install BigDL-LLM From Wheel : bigdl-llm[xpu_2.0]
* Remove "export USE_XETLA=OFF" for Max GPU
2024-01-09 16:30:50 +08:00
dingbaorong
f6bb4ab313
Arc stress test ( #9795 )
...
* add arc stress test
* trigger CI
* trigger CI
* trigger CI
* disable ci
2023-12-27 21:02:41 +08:00
Shaojun Liu
6c75c689ea
bigdl-llm stress test for stable version ( #9781 )
...
* 1k-512 2k-512 baseline
* add cpu stress test
* update yaml name
* update
* update
* clean up
* test
* update
* update
* update
* test
* update
2023-12-27 15:40:53 +08:00
dingbaorong
5cfb4c4f5b
Arc stable version performance regression test ( #9785 )
...
* add arc stable version regression test
* empty gpu mem between different models
* trigger CI
* comment spr test
* trigger CI
* address Kai's comments and disable CI
* merge fp8 and int4
* disable ci
2023-12-27 11:01:56 +08:00
WeiguangHan
c05d7e1532
LLM: add star_corder_15.5b model ( #9772 )
...
* LLM: add star_corder_15.5b model
* revert llm_performance_tests.yml
2023-12-26 18:55:56 +08:00
dingbaorong
64d05e581c
add peak gpu mem stats in transformer_int4_gpu ( #9766 )
...
* add peak gpu mem stats in transformer_int4_gpu
* address weiguang's comments
2023-12-26 15:38:28 +08:00
Chen, Zhentao
7fd7c37e1b
Enable fp8e5 harness ( #9761 )
...
* fix precision format like fp8e5
* match fp8_e5m2
2023-12-22 16:59:48 +08:00
WeiguangHan
474c099559
LLM: using separate threads to do inference ( #9727 )
...
* using separate threads to do inference
* resolve some comments
* resolve some comments
* revert llm_performance_tests.yml file
2023-12-21 17:56:43 +08:00
Chen, Zhentao
b06a3146c8
Fix 70b oom ( #9738 )
...
* add default value to bigdl llm
* fix model oom
2023-12-21 10:40:52 +08:00
WeiguangHan
3e8d198b57
LLM: add eval func ( #9662 )
...
* Add eval func
* add left eval
2023-12-14 14:59:02 +08:00
Yuwen Hu
cbdd49f229
[LLM] win igpu performance for ipex 2.1 and oneapi 2024.0 ( #9679 )
...
* Change igpu win tests for ipex 2.1 and oneapi 2024.0
* Qwen model repo id updates; updates model list for 512-64
* Add .eval for win igpu all-in-one benchmark for best performance
2023-12-13 18:52:29 +08:00
Mingyu Wei
16febc949c
[LLM] Add exclude option in all-in-one performance test ( #9632 )
...
* add exclude option in all-in-one perf test
* update arc-perf-test.yaml
* Exclude in_out_pairs in main function
* fix some bugs
* address Kai's comments
* define excludes at the beginning
* add bloomz:2048 to exclude
2023-12-13 18:13:06 +08:00
Yuwen Hu
968d99e6f5
Remove empty cache between each iteration of generation ( #9660 )
2023-12-12 17:24:06 +08:00
Chen, Zhentao
972cdb9992
gsm8k OOM workaround ( #9597 )
...
* update bigdl_llm.py
* update the installation of harness
* fix partial function
* import ipex
* force seq len in decreasing order
* put func outside class
* move comments
* default 'trust_remote_code' as True
* Update llm-harness-evaluation.yml
2023-12-08 18:47:25 +08:00
WeiguangHan
e9299adb3b
LLM: Highlight some values in the html ( #9635 )
...
* highlight some values in the html
* revert the llm_performance_tests.yml
2023-12-07 19:02:41 +08:00
Yuwen Hu
48b85593b3
Update all-in-one benchmark readme ( #9618 )
2023-12-07 10:32:09 +08:00
Yuwen Hu
0e8f4020e5
Add traceback error output for win igpu test api in benchmark ( #9607 )
2023-12-06 14:35:16 +08:00
Yuwen Hu
c998f5f2ba
[LLM] iGPU long context tests ( #9598 )
...
* Temp enable PR
* Enable tests for 256-64
* Try again 128-64
* Empty cache after each iteration for igpu benchmark scripts
* Try tests for 512
* change order for 512
* Skip chatglm3 and llama2 for now
* Separate tests for 512-64
* Small fix
* Further fixes
* Change back to nightly again
2023-12-06 10:19:20 +08:00
Chen, Zhentao
8c8a27ded7
Add harness summary job ( #9457 )
...
* format yml
* add make_table_results
* add summary job
* add a job to print single result
* upload full directory
2023-12-05 10:04:10 +08:00
Yuwen Hu
3f4ad97929
[LLM] Add performance tests for windows iGPU ( #9584 )
...
* Add support for win gpu benchmark with peak gpu memory monitoring
* Add win igpu tests
* Small fix
* Forward outputs
* Small fix
* Test and small fixes
* Small fix
* Small fix and test
* Small fixes
* Add tests for 512-64 and change back to nightly tests
* Small fix
2023-12-04 20:50:02 +08:00
Chen, Zhentao
cb228c70ea
Add harness nightly ( #9552 )
...
* modify output_path as a directory
* schedule nightly at 21 on Friday
* add tasks and models for nightly
* add accuracy regression
* comment out if to test
* mixed fp4
* for test
* add missing delimiter
* remove comma
* fixed golden results
* add mixed 4 golden result
* add more options
* add mistral results
* get golden result of stable lm
* move nightly scripts and results to test folder
* add license
* add fp8 stable lm golden
* run on all available devices
* trigger only when ready for review
* fix new line
* update golden
* add mistral
2023-12-01 14:16:35 +08:00
Chen, Zhentao
4d7d5d4c59
Add 3 leaderboard tasks ( #9566 )
...
* update leaderboard map
* download model and dataset without overwriting
* fix task drop
* run on all available devices
2023-12-01 14:01:14 +08:00
Chen, Zhentao
c8e0c2ed48
Fixed dumped logs in harness ( #9549 )
...
* install transformers==4.34.0
* modify output_path as a directory
* add device and task to output dir parents
2023-11-30 12:47:56 +08:00
Chen, Zhentao
45820cf3b9
add optimize model option ( #9530 )
2023-11-24 17:10:49 +08:00
Ruonan Wang
139e98aa18
LLM: quick fix benchmark ( #9509 )
2023-11-22 10:19:57 +08:00
WeiguangHan
c2aeb4d1e8
del model after test ( #9504 )
2023-11-21 18:41:50 +08:00
Cheen Hau, 俊豪
3e39828420
Update all in one benchmark readme ( #9496 )
...
* Add gperftools install to all in one benchmark readme
* Update readme
2023-11-21 14:57:16 +08:00
WeiguangHan
c487b53f21
LLM: only run arc perf test nightly ( #9448 )
...
* LLM: only run arc perf test nightly
* deleted unused python scripts
* rebase main
2023-11-15 19:38:14 +08:00
Chen, Zhentao
dbbdb53a18
fix multiple gpu usage ( #9459 )
2023-11-14 17:06:27 +08:00
Chen, Zhentao
d19ca21957
patch bigdl-llm model to harness by binding instead of patch file ( #9420 )
...
* add run_llb.py
* fix args interpret
* modify outputs
* update workflow
* add license
* test mixed 4 bit
* update readme
* use autotokenizer
* add timeout
* refactor workflow file
* fix working directory
* fix env
* throw exception if some jobs failed
* improve terminal outputs
* Disable var which causes the run to get stuck
* fix unknown precision
* fix key error
* directly output config instead
* rm harness submodule
2023-11-14 12:51:39 +08:00
Chen, Zhentao
0ecb9efb05
use AutoTokenizer to enable more models ( #9446 )
2023-11-13 17:47:43 +08:00
Cengguang Zhang
ece5805572
LLM: add chatglm3-6b to latency benchmark test. ( #9442 )
2023-11-13 17:24:37 +08:00
Chen, Zhentao
5747e2fe69
fix multiple gpu usage of harness ( #9444 )
2023-11-13 16:53:23 +08:00
Heyang Sun
b23b91407c
fix llm-init on deepspeed missing lib ( #9419 )
2023-11-10 13:51:24 +08:00
Chen, Zhentao
298b64217e
add auto triggered acc test ( #9364 )
...
* add auto triggered acc test
* use llama 7b instead
* fix env
* debug download
* fix download prefix
* add cut dirs
* fix env of model path
* fix dataset download
* full job
* source xpu env vars
* use matrix to trigger model run
* reset batch=1
* remove redirect
* remove some trigger
* add task matrix
* add precision list
* test llama-7b-chat
* use /mnt/disk1 to store model and datasets
* remove installation test
* correct downloading path
* fix HF vars
* add bigdl-llm env vars
* rename file
* fix hf_home
* fix script path
* rename as harness evaluation
* rerun
2023-11-08 10:22:27 +08:00
WeiguangHan
84ab614aab
LLM: add more models and skip runtime error ( #9349 )
...
* add more models and skip runtime error
* upgrade transformers
* temporarily removed Mistral-7B-v0.1
* temporarily disable the upload of arc perf result
2023-11-08 09:45:53 +08:00
Heyang Sun
af94058203
[LLM] Support CPU deepspeed distributed inference ( #9259 )
...
* [LLM] Support CPU Deepspeed distributed inference
* Update run_deepspeed.py
* Rename
* fix style
* add new code
* refine
* remove annotated code
* refine
* Update README.md
* refine doc and example code
2023-11-06 17:56:42 +08:00