261 commits

Author SHA1 Message Date
Shaojun Liu
b6222404b8 bigdl-llm stable version: let the perf test fail if the difference between perf and baseline is greater than 5% (#9750)
* test

* test

* test

* update

* revert
2023-12-25 13:47:11 +08:00
Chen, Zhentao
4a98bfa5ae fix harness manual run env typo (#9763) 2023-12-22 18:42:35 +08:00
Yuwen Hu
02436c6cce [LLM] Enable more long context in-out pairs for iGPU perf tests (#9765)
* Add test for 1024-128 and enable more tests for 512-64

* Fix date in results csv name to the time when the performance is triggered

* Small fix

* Small fix

* further fixes
2023-12-22 18:18:23 +08:00
Yuwen Hu
1c8c104bb8 [LLM] Small fixes for igpu win perf tests (#9756) 2023-12-22 15:51:03 +08:00
Chen, Zhentao
86a69e289c fix harness runner label of manual trigger (#9754)
* fix runner

* update golden
2023-12-22 15:09:22 +08:00
WeiguangHan
2d1bf20309 LLM: small fix llm_performance_tests.html (#9753)
* LLM: small fix llm_performance_tests.html

* resolve some comments

* revert the llm_performance_test.yaml
2023-12-22 13:55:01 +08:00
Shaojun Liu
bb52239e0a bigdl-llm stable version release & test (#9732)
* stable version test

* trigger spr test

* update

* trigger

* test

* test

* test

* test

* test

* refine

* release linux first
2023-12-21 22:55:33 +08:00
WeiguangHan
34bb804189 LLM: check csv and its corresponding yaml file (#9702)
* LLM: check csv and its corresponding yaml file

* run PR arc perf test

* modify the name of some variables

* execute the check results script in right place

* use cp to replace mv command

* resolve some comments

* resolve more comments

* revert the llm_performance_test.yaml file
2023-12-21 09:54:33 +08:00
WeiguangHan
3aa8b66bc3 LLM: remove starcoder-15.5b model temporarily (#9720) 2023-12-19 20:14:46 +08:00
Chen, Zhentao
b3647507c0 Fix harness workflow (#9704)
* error when larger than 0.001

* fix env setup

* fix typo

* fix typo
2023-12-18 15:42:10 +08:00
WeiguangHan
1f0245039d LLM: check the final csv results for arc perf test (#9684)
* LLM: check the final csv results for arc perf test

* delete useless python script

* change threshold

* revert the llm_performance_tests.yml
2023-12-14 19:46:08 +08:00
Yuwen Hu
82ac2dbf55 [LLM] Small fixes for win igpu test for ipex 2.1 (#9686)
* Fixes to install for igpu performance tests

* Small update for core performance tests model lists
2023-12-14 15:39:51 +08:00
Yuwen Hu
cbdd49f229 [LLM] win igpu performance for ipex 2.1 and oneapi 2024.0 (#9679)
* Change igpu win tests for ipex 2.1 and oneapi 2024.0

* Qwen model repo id updates; updates model list for 512-64

* Add .eval for win igpu all-in-one benchmark for best performance
2023-12-13 18:52:29 +08:00
Yuwen Hu
017932a7fb Small fix for html generation (#9656) 2023-12-12 14:06:18 +08:00
WeiguangHan
1e25499de0 LLM: test new oneapi (#9654)
* test new oneapi

* revert llm_performance_tests.yml
2023-12-12 11:12:14 +08:00
Yuwen Hu
d272b6dc47 [LLM] Enable generation of html again for win igpu tests (#9652)
* Enable generation of html again and comment out rwkv for 32-512 as it is not very stable

* Small fix
2023-12-11 19:15:17 +08:00
Yuwen Hu
894d0aaf5e [LLM] iGPU win perf test reorg based on in-out pairs (#9645)
* trigger pr temporarily

* Separate benchmark run for win igpu based on in-out pairs

* Rename fix

* Test workflow

* Small fix

* Skip generation of html for now

* Change back to nightly triggered
2023-12-08 20:46:40 +08:00
Chen, Zhentao
972cdb9992 gsm8k OOM workaround (#9597)
* update bigdl_llm.py

* update the installation of harness

* fix partial function

* import ipex

* force seq len in decreasing order

* put func outside class

* move comments

* default 'trust_remote_code' as True

* Update llm-harness-evaluation.yml
2023-12-08 18:47:25 +08:00
WeiguangHan
1ff4bc43a6 degrade pandas version (#9643) 2023-12-08 17:44:51 +08:00
Yuwen Hu
c998f5f2ba [LLM] iGPU long context tests (#9598)
* Temp enable PR

* Enable tests for 256-64

* Try again 128-64

* Empty cache after each iteration for igpu benchmark scripts

* Try tests for 512

* change order for 512

* Skip chatglm3 and llama2 for now

* Separate tests for 512-64

* Small fix

* Further fixes

* Change back to nightly again
2023-12-06 10:19:20 +08:00
Chen, Zhentao
8c8a27ded7 Add harness summary job (#9457)
* format yml

* add make_table_results

* add summary job

* add a job to print single result

* upload full directory
2023-12-05 10:04:10 +08:00
Yuwen Hu
3f4ad97929 [LLM] Add performance tests for windows iGPU (#9584)
* Add support for win gpu benchmark with peak gpu memory monitoring

* Add win igpu tests

* Small fix

* Forward outputs

* Small fix

* Test and small fixes

* Small fix

* Small fix and test

* Small fixes

* Add tests for 512-64 and change back to nightly tests

* Small fix
2023-12-04 20:50:02 +08:00
Chen, Zhentao
29d5bb8df4 Harness workflow dispatch (#9591)
* add set-matrix job

* add workflow_dispatch

* fix context

* fix manual run

* rename step

* add quotes

* add runner option

* not required labels

* add runner label to output

* use double quote
2023-12-04 15:53:29 +08:00
Chen, Zhentao
9557aa9c21 Fix harness nightly (#9586)
* update golden

* loosen the restriction of diff

* only compare results when scheduled
2023-12-04 11:45:00 +08:00
Chen, Zhentao
5de92090b3 try to fix deps installation of bigdl (#9578) 2023-12-01 15:25:47 +08:00
Chen, Zhentao
cb228c70ea Add harness nightly (#9552)
* modify output_path as a directory

* schedule nightly at 21 on Friday

* add tasks and models for nightly

* add accuracy regression

* comment out if to test

* mixed fp4

* for test

* add missing delimiter

* remove comma

* fixed golden results

* add mixed 4 golden result

* add more options

* add mistral results

* get golden result of stable lm

* move nightly scripts and results to test folder

* add license

* add fp8 stable lm golden

* run on all available devices

* trigger only when ready for review

* fix new line

* update golden

* add mistral
2023-12-01 14:16:35 +08:00
Chen, Zhentao
4d7d5d4c59 Add 3 leaderboard tasks (#9566)
* update leaderboard map

* download model and dataset without overwriting

* fix task drop

* run on all available devices
2023-12-01 14:01:14 +08:00
Chen, Zhentao
c8e0c2ed48 Fixed dumped logs in harness (#9549)
* install transformers==4.34.0

* modify output_path as a directory

* add device and task to output dir parents
2023-11-30 12:47:56 +08:00
WeiguangHan
5098bc3544 LLM: enable previous models (#9505)
* enable previous models

* test mistral model

* for test

* run models separately

* test all models

* for test

* revert the llm_performance_test.yaml
2023-11-28 10:21:07 +08:00
Wang, Jian4
40ec9f7ead Add qlora cpu docker manually build (#9501) 2023-11-21 14:39:16 +08:00
WeiguangHan
c487b53f21 LLM: only run arc perf test nightly (#9448)
* LLM: only run arc perf test nightly

* deleted unused python scripts

* rebase main
2023-11-15 19:38:14 +08:00
Chen, Zhentao
d19ca21957 patch bigdl-llm model to harness by binding instead of patch file (#9420)
* add run_llb.py

* fix args interpret

* modify outputs

* update workflow

* add license

* test mixed 4 bit

* update readme

* use autotokenizer

* add timeout

* refactor workflow file

* fix working directory

* fix env

* throw exception if some jobs failed

* improve terminal outputs

* Disable var which causes the run to get stuck

* fix unknown precision

* fix key error

* directly output config instead

* rm harness submodule
2023-11-14 12:51:39 +08:00
Chen, Zhentao
f36d7b2d59 Fix harness stuck (#9435)
* remove env to avoid being stuck

* use small model for test
2023-11-13 15:29:53 +08:00
Yuwen Hu
4faf5af8f1 [LLM] Add perf test for core on Windows (#9397)
* temporarily stop other perf tests

* Add framework for core performance test with one test model

* Small fix and add platform control

* Comment out lp for now

* Add missing yaml file

* Small fix

* Fix sed contents

* Small fix

* Small path fixes

* Small fix

* Add update to ftp

* Small upload fix

* add chatglm3-6b

* LLM: add model names

* Keep repo id same as ftp and temporarily make baichuan2 first priority

* change order

* Remove temp if false and separate pr and nightly results

* Small fix

---------

Co-authored-by: jinbridge <2635480475@qq.com>
2023-11-13 13:58:40 +08:00
WeiguangHan
2cfef5ef1e LLM: store the nightly test and pr results separately (#9404)
* LLM: store the csv results separately

* modify the trigger files of LLM Performance Test
2023-11-11 06:35:27 +08:00
Yuwen Hu
3d107f6d25 [LLM] Separate windows build UT and build runner (#9403)
* Separate windows build UT and build runner

* Small fix
2023-11-09 18:47:38 +08:00
WeiguangHan
34449cb4bb LLM: add remaining models to the arc perf test (#9384)
* add remaining models

* modify the filepath which stores the test result on ftp server

* resolve some comments
2023-11-09 14:28:42 +08:00
Yuwen Hu
d4b248fcd4 Add windows binary build label AVX_VNNI (#9387) 2023-11-08 18:13:35 +08:00
Chen, Zhentao
298b64217e add auto triggered acc test (#9364)
* add auto triggered acc test

* use llama 7b instead

* fix env

* debug download

* fix download prefix

* add cut dirs

* fix env of model path

* fix dataset download

* full job

* source xpu env vars

* use matrix to trigger model run

* reset batch=1

* remove redirect

* remove some trigger

* add task matrix

* add precision list

* test llama-7b-chat

* use /mnt/disk1 to store model and datasets

* remove installation test

* correct downloading path

* fix HF vars

* add bigdl-llm env vars

* rename file

* fix hf_home

* fix script path

* rename as harness evaluation

* rerun
2023-11-08 10:22:27 +08:00
WeiguangHan
84ab614aab LLM: add more models and skip runtime error (#9349)
* add more models and skip runtime error

* upgrade transformers

* temporarily removed Mistral-7B-v0.1

* temporarily disable the upload of arc perf result
2023-11-08 09:45:53 +08:00
Shaojun Liu
833e4dbc8d fix llm-performance-test-on-arc bug (#9357) 2023-11-06 10:00:25 +08:00
ZehuaCao
ef83c3302e Use to test llm-performance on spr-perf (#9316)
* Update llm_performance_tests.yml

* Update llm_performance_tests.yml

* Update action.yml

* Create cpu-perf-test.yaml

* Update action.yml

* Update action.yml

* Update llm_performance_tests.yml

* Update llm_performance_tests.yml

* Update llm_performance_tests.yml

* Update llm_performance_tests.yml

* Update llm_performance_tests.yml

* Update llm_performance_tests.yml

* Update llm_performance_tests.yml

* Update llm_performance_tests.yml

* Update llm_performance_tests.yml

* Update llm_performance_tests.yml

* Update llm_performance_tests.yml
2023-11-03 11:17:16 +08:00
Cheen Hau, 俊豪
8f23fb04dc Add inference test for Whisper model on Arc (#9330)
* Add inference test for Whisper model

* Remove unnecessary inference time measurement
2023-11-03 10:15:52 +08:00
Ziteng Zhang
dd3cf2f153 LLM: Add python 3.10 & 3.11 UT
2023-11-02 14:09:29 +08:00
Jasonzzt
d1bdc0ef72 spr & arc ut with python 3.9 & 3.10 & 3.11 2023-11-01 22:57:48 +08:00
Jasonzzt
687da21467 test 3.11 2023-11-01 19:14:53 +08:00
WeiguangHan
9722e811be LLM: add more models to the arc perf test (#9297)
* LLM: add more models to the arc perf test

* remove some old models

* install some dependencies
2023-11-01 16:56:32 +08:00
Jasonzzt
3c3329010d add conda update -n base conda 2023-11-01 16:36:35 +08:00
Jasonzzt
2fff0e8c21 use runner avx2 with linux 2023-11-01 16:28:29 +08:00
Jasonzzt
964a8e6dc1 update conda 2023-11-01 16:20:19 +08:00