## C-Eval Benchmark Test
The C-Eval benchmark test allows users to evaluate models on the [C-Eval](https://cevalbenchmark.com) dataset, a multi-level, multi-discipline Chinese evaluation suite for foundation models. It consists of 13,948 multiple-choice questions spanning 52 diverse disciplines and four difficulty levels. Please check the [paper](https://arxiv.org/abs/2305.08322) and [GitHub repo](https://github.com/hkust-nlp/ceval) for more information.
### Download dataset
Please download and unzip the dataset for evaluation.
```shell
wget https://huggingface.co/datasets/ceval/ceval-exam/resolve/main/ceval-exam.zip
mkdir data
mv ceval-exam.zip data
cd data; unzip ceval-exam.zip
```
### Run
You can run the evaluation with the following command.
```shell
bash run.sh
```
+ `run.sh`
```shell
python eval.py \
    --model_path "path to model" \
    --eval_type validation \
    --device xpu \
    --eval_data_path data \
    --qtype sym_int4
```
> **Note**
>
> There are two types of evaluation selected via `eval_type`. The first is `validation`, which runs on the validation dataset and outputs evaluation scores. The second is `test`, which runs on the test dataset and outputs a `submission.json` file for submission to https://cevalbenchmark.com to get the evaluation score.
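
For example, to generate a submission file from the test split, you could run `eval.py` with `--eval_type test`. This is a minimal sketch: the model path is a placeholder, and the remaining arguments simply mirror the provided `run.sh`.

```shell
# Run on the test split; this is expected to produce submission.json
# for upload to https://cevalbenchmark.com
python eval.py \
    --model_path "path to model" \
    --eval_type test \
    --device xpu \
    --eval_data_path data \
    --qtype sym_int4
```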