Chen, Zhentao a8c866c32b add ppl benchmark (#9914)	2024-01-18 17:54:28 +08:00
* add ppl benchmark
* add license
* add readme
* add dataset argument
* add dataset usage
* fixed low bit args
* correct result
* fix terminal display
* fix ppl update
* enable fp16 fp32 bf16
* format the desc
* fix model_kwargs
* add more readme
ppl.py	add ppl benchmark (#9914)	2024-01-18 17:54:28 +08:00
README.md	add ppl benchmark (#9914)	2024-01-18 17:54:28 +08:00
run.py	add ppl benchmark (#9914)	2024-01-18 17:54:28 +08:00

README.md

Perplexity

Perplexity (PPL) is one of the most common metrics for evaluating language models. This benchmark implementation is derived from transformers/perplexity and llm_perplexity.py.
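
For readers unfamiliar with the metric, perplexity is conventionally defined as the exponential of the average negative log-likelihood per token. The sketch below illustrates that definition only; it is not the code in ppl.py, and the function name `perplexity` and the sample inputs are made up for illustration.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical input: a model that assigns probability 0.25 to each of 8 tokens.
# Such a model is as uncertain as a uniform choice among 4 options.
uniform_log_probs = [math.log(0.25)] * 8
ppl_uniform = perplexity(uniform_log_probs)

# A more confident (hypothetical) model yields a lower perplexity.
confident_log_probs = [math.log(0.9)] * 8
ppl_confident = perplexity(confident_log_probs)
```

Lower values mean the model assigns higher probability to the reference text, which is why perplexity is useful for comparing a low-bit model against its full-precision baseline.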

HOW TO RUN

python run.py --model_path <path/to/model> --low_bit sym_int4 fp4 mixed_fp4 sym_int8 fp8_e5m2 fp8_e4m3 mixed_fp8 --device xpu --dataset path=<dataset_path>,name=<dataset_name>

A more specific example that evaluates the perplexity of Llama-2-7b-chat on the wikitext dataset:

python run.py --model_path meta-llama/Llama-2-7b-chat-hf --low_bit float16 sym_int4 --device xpu --dataset path=wikitext,name=wikitext-2-raw-v1
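
The `--dataset` value is a comma-separated list of `key=value` pairs. The following is a minimal sketch of how such a string could be split into keyword arguments; it is an assumption about the format, not the parsing code used by run.py, and `parse_dataset_arg` is a hypothetical helper name.

```python
def parse_dataset_arg(spec):
    """Split a 'key=value,key=value' string into a dict (illustrative helper)."""
    kwargs = {}
    for pair in spec.split(","):
        key, sep, value = pair.partition("=")
        key, value = key.strip(), value.strip()
        # Reject entries such as "path", "=wikitext", or "path=".
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {pair!r}")
        kwargs[key] = value
    return kwargs


dataset_kwargs = parse_dataset_arg("path=wikitext,name=wikitext-2-raw-v1")
```

The keys `path` and `name` match parameter names of `datasets.load_dataset` in the Hugging Face datasets library, so a dict like this could plausibly be forwarded as `load_dataset(**dataset_kwargs)`. Whether run.py does so is not confirmed here.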