# C-Eval Benchmark Test

The C-Eval benchmark test allows users to evaluate models on the C-Eval datasets, a multi-level, multi-discipline Chinese evaluation suite for foundation models. It consists of 13,948 multiple-choice questions spanning 52 diverse disciplines and four difficulty levels. Please refer to the paper and GitHub repo for more information.
## Download dataset
Please download and unzip the dataset for evaluation.
```bash
wget https://huggingface.co/datasets/ceval/ceval-exam/resolve/main/ceval-exam.zip
mkdir data
mv ceval-exam.zip data
cd data; unzip ceval-exam.zip
```
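After unzipping, it can help to confirm the dataset layout before running the evaluation. The snippet below is a minimal sanity check, assuming the archive unpacks into `dev/`, `val/`, and `test/` folders of per-subject CSV files; adjust the paths if your extraction differs.

```bash
# Minimal sanity check of the extracted dataset (run from the repo root).
# Assumes ceval-exam.zip unpacks into dev/, val/ and test/ subfolders of
# per-subject CSV files; adjust if the actual layout differs.
ls data                       # expect dev/, val/ and test/ next to the zip
ls data/val | head -n 5       # a few per-subject validation CSVs
head -n 2 data/val/*.csv 2>/dev/null | head -n 10   # peek at the first rows
```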
## Run

You can run the evaluation with the following command.

```bash
bash run.sh
```
`run.sh`:

```bash
python eval.py \
    --model_family llama \
    --model_path "path to model" \
    --eval_type validation \
    --device xpu \
    --eval_data_path data \
    --qtype sym_int4
```
## Note

- `eval_type`: there are two types of evaluation. The first is `validation`, which runs on the validation dataset and outputs evaluation scores. The second is `test`, which runs on the test dataset and outputs a `submission.json` file for submission on https://cevalbenchmark.com to get the evaluation score.
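For example, to produce a `submission.json` for the leaderboard, the same script can be invoked with the `test` split. This is a sketch that keeps the remaining arguments as in `run.sh`; where `eval.py` writes `submission.json` depends on its implementation.

```bash
# Sketch: run on the test split to generate submission.json for
# https://cevalbenchmark.com (other arguments kept as in run.sh above).
python eval.py \
    --model_family llama \
    --model_path "path to model" \
    --eval_type test \
    --device xpu \
    --eval_data_path data \
    --qtype sym_int4
```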