# C-Eval Benchmark Test

The C-Eval benchmark test allows users to evaluate models on the C-Eval datasets, a multi-level, multi-discipline Chinese evaluation suite for foundation models. It consists of 13,948 multiple-choice questions spanning 52 diverse disciplines and four difficulty levels. Please check the [paper](https://arxiv.org/abs/2305.08322) and [GitHub repo](https://github.com/hkust-nlp/ceval) for more information.

## Download dataset

Please download and unzip the dataset for evaluation.

```bash
wget https://huggingface.co/datasets/ceval/ceval-exam/resolve/main/ceval-exam.zip
mkdir data
mv ceval-exam.zip data
cd data; unzip ceval-exam.zip
```
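
After unzipping, you can sanity-check the layout. The sketch below assumes the standard C-Eval release, which ships `dev/`, `val/` and `test/` folders of per-subject CSV files:

```bash
# Run from the directory containing data/ (the commands above leave you inside data/,
# so you may need to `cd ..` first).
ls data             # expected: dev  test  val
ls data/val | head  # per-subject CSV files, e.g. computer_network_val.csv
```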

## Run

You can run the evaluation with the following command:

```bash
bash run.sh
```

- `run.sh`

```bash
python eval.py \
    --model_path "path to model" \
    --eval_type validation \
    --device xpu \
    --eval_data_path data \
    --qtype sym_int4
```

### Note

`eval_type`: there are two types of evaluation. The first is `validation`, which runs on the validation dataset and outputs evaluation scores. The second is `test`, which runs on the test dataset and outputs a `submission.json` file for submission to https://cevalbenchmark.com to get the evaluation score.
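
For example, to produce a `submission.json` from the test split, a run could look like the following sketch (the model path is hypothetical; the remaining flags mirror `run.sh`):

```bash
python eval.py \
    --model_path ./models/chatglm3-6b \
    --eval_type test \
    --device xpu \
    --eval_data_path data \
    --qtype sym_int4
# Then upload the generated submission.json at https://cevalbenchmark.com
```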