10 commits

Author SHA1 Message Date
Chen, Zhentao
972cdb9992 gsm8k OOM workaround (#9597)
* update bigdl_llm.py

* update the installation of harness

* fix partial function

* import ipex

* force seq len in decrease order

* put func outside class

* move comments

* default 'trust_remote_code' as True

* Update llm-harness-evaluation.yml
2023-12-08 18:47:25 +08:00
Chen, Zhentao
8c8a27ded7 Add harness summary job (#9457)
* format yml

* add make_table_results

* add summary job

* add a job to print single result

* upload full directory
2023-12-05 10:04:10 +08:00
Chen, Zhentao
29d5bb8df4 Harness workflow dispatch (#9591)
* add set-matrix job

* add workflow_dispatch

* fix context

* fix manual run

* rename step

* add quotes

* add runner option

* not required labels

* add runner label to output

* use double quote
2023-12-04 15:53:29 +08:00
Chen, Zhentao
9557aa9c21 Fix harness nightly (#9586)
* update golden

* loosen the restriction of diff

* only compare results when scheduled
2023-12-04 11:45:00 +08:00
Chen, Zhentao
cb228c70ea Add harness nightly (#9552)
* modify output_path as a directory

* schedule nightly at 21 on Friday

* add tasks and models for nightly

* add accuracy regression

* comment out if to test

* mixed fp4

* for test

* add missing delimiter

* remove comma

* fixed golden results

* add mixed 4 golden result

* add more options

* add mistral results

* get golden result of stable lm

* move nightly scripts and results to test folder

* add license

* add fp8 stable lm golden

* run on all available devices

* trigger only when ready for review

* fix new line

* update golden

* add mistral
2023-12-01 14:16:35 +08:00
Chen, Zhentao
4d7d5d4c59 Add 3 leaderboard tasks (#9566)
* update leaderboard map

* download model and dataset without overwriting

* fix task drop

* run on all available devices
2023-12-01 14:01:14 +08:00
Chen, Zhentao
c8e0c2ed48 Fixed dumped logs in harness (#9549)
* install transformers==4.34.0

* modify output_path as a directory

* add device and task to output dir parents
2023-11-30 12:47:56 +08:00
Chen, Zhentao
d19ca21957 patch bigdl-llm model to harness by binding instead of patch file (#9420)
* add run_llb.py

* fix args interpret

* modify outputs

* update workflow

* add license

* test mixed 4 bit

* update readme

* use autotokenizer

* add timeout

* refactor workflow file

* fix working directory

* fix env

* throw exception if some jobs failed

* improve terminal outputs

* Disable var which causes the run to get stuck

* fix unknown precision

* fix key error

* directly output config instead

* rm harness submodule
2023-11-14 12:51:39 +08:00
Chen, Zhentao
f36d7b2d59 Fix harness stuck (#9435)
* remove env to avoid being stuck

* use small model for test
2023-11-13 15:29:53 +08:00
Chen, Zhentao
298b64217e add auto triggered acc test (#9364)
* add auto triggered acc test

* use llama 7b instead

* fix env

* debug download

* fix download prefix

* add cut dirs

* fix env of model path

* fix dataset download

* full job

* source xpu env vars

* use matrix to trigger model run

* reset batch=1

* remove redirect

* remove some trigger

* add task matrix

* add precision list

* test llama-7b-chat

* use /mnt/disk1 to store model and datasets

* remove installation test

* correct downloading path

* fix HF vars

* add bigdl-llm env vars

* rename file

* fix hf_home

* fix script path

* rename as harness evaluation

* rerun
2023-11-08 10:22:27 +08:00