* feat: update readme for ppl test
* fix: textual adjustments
* fix: textual adjustments
* Add ipex-llm npu option in setup.py (#11858)
* add ipex-llm npu release
* update example doc
* meet latest release changes
* optimize phi3 memory usage (#11867)
* Update `ipex-llm` default transformers version to 4.37.0 (#11859)
* Update default transformers version to 4.37.0
* Add dependency requirements for qwen and qwen-vl
* Temp fix transformers version for these not yet verified models
* Skip qwen test in UT for now as it requires transformers<4.37.0
* Update performance test regarding updated default `transformers==4.37.0` (#11869)
* Update igpu performance from transformers 4.36.2 to 4.37.0 (#11841)
* upgrade arc perf test to transformers 4.37 (#11842)
* fix load low bit com dtype (#11832)
* feat: add mixed_precision argument on ppl longbench evaluation
* fix: delete extra code
* feat: upgrade arc perf test to transformers 4.37
* fix: add missing codes
* fix: keep perf test for qwen-vl-chat in transformers 4.36
* fix: remove extra space
* fix: resolve pr comment
* fix: add empty line
* fix: add pip install for spr and core test
* fix: delete extra comments
* fix: remove python -m for pip
* Revert "fix load low bit com dtype (#11832)"
This reverts commit 6841a9ac8f.
---------
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
* add transformers==4.36 for qwen vl in igpu-perf (#11846)
* add transformers==4.36.2 for qwen-vl
* Small update
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
* fix: remove qwen-7b on core test (#11851)
* fix: remove qwen-7b on core test
* fix: change delete to comment
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
* replce filename (#11854)
* fix: remove qwen-7b on core test
* fix: change delete to comment
* fix: replace filename
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
* fix: delete extra comments (#11863)
* Remove transformers installation for temp test purposes
* Small fix
* Small update
---------
Co-authored-by: Chu,Youcheng <70999398+cranechu0131@users.noreply.github.com>
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Co-authored-by: Zijie Li <michael20001122@gmail.com>
Co-authored-by: Chu,Youcheng <1340390339@qq.com>
* Pytorch models transformers version update (#11860)
* yi sync
* delete 4.34 constraint
* delete 4.34 constraint
* delete 4.31 constraint
* delete 4.34 constraint
* delete 4.35 constraint
* added <=4.33.3 constraint
* added <=4.33.3 constraint
* switched to chinese prompt
* Update compresskv model forward type logic (#11868)
* update
* fix
* Update local import for ppl (#11866)
Co-authored-by: jenniew <jenniewang123@gmail.com>
* fix: textual adjustment
---------
Co-authored-by: SONG Ge <38711238+sgwhat@users.noreply.github.com>
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
Co-authored-by: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Co-authored-by: Zijie Li <michael20001122@gmail.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: RyuKosei <70006706+RyuKosei@users.noreply.github.com>
Co-authored-by: jenniew <jenniewang123@gmail.com>
# Perplexity
Perplexity (PPL) is one of the most common metrics for evaluating language models. This benchmark implementation is adapted from [transformers/perplexity](https://huggingface.co/docs/transformers/perplexity#perplexity-of-fixed-length-models) and [benchmark_patch_llm.py](https://github.com/insuhan/hyper-attn/blob/main/benchmark_patch_llm.py).
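As a quick refresher, perplexity is the exponential of the average negative log-likelihood per token. A minimal sketch in plain Python (illustrative only, not part of this benchmark's scripts):

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp of the average negative log-likelihood per token."""
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)

# Sanity check: a model that assigns every token uniform probability 1/V
# over a vocabulary of size V has perplexity exactly V.
V = 50
logprobs = [math.log(1 / V)] * 10
print(perplexity(logprobs))  # ≈ 50.0 (up to floating-point rounding)
```

Lower perplexity means the model is less "surprised" by the evaluation text.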
## Environment Preparation
```bash
pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
pip install datasets
```

The following step is required on Linux when oneAPI was installed via APT or an offline installer; skip it if oneAPI was installed via pip:

```bash
source /opt/intel/oneapi/setvars.sh
```

## PPL Evaluation

### 1. Run on Wikitext

An example of running perplexity on [wikitext](https://paperswithcode.com/dataset/wikitext-2):

```bash
python run_wikitext.py --model_path meta-llama/Meta-Llama-3-8B --dataset path=wikitext,name=wikitext-2-raw-v1 --precision sym_int4 --device xpu --stride 512 --max_length 4096
```

### 2. Run on the [THUDM/LongBench](https://github.com/THUDM/LongBench) dataset

An example of running perplexity on chatglm3-6b using the default Chinese datasets ("multifieldqa_zh", "dureader", "vcsum", "lsht", "passage_retrieval_zh"):

```bash
python run_longbench.py --model_path THUDM/chatglm3-6b --precisions float16 sym_int4 --device xpu --language zh
```

Notes:

- To test model perplexity on a selected subset of the `LongBench` datasets, list them explicitly:

  ```bash
  --datasets narrativeqa qasper ...
  ```

- The `language` argument only takes effect when `datasets` is `None`. Its choices are `en`, `zh`, and `all`, which select all the English datasets, all the Chinese datasets, and all the datasets, respectively.
- To test perplexity on pre-downloaded datasets, specify the `<path/to/dataset>` via the `dataset_path` argument in your command.
- You can run `python make_table.py <input_dir>` to summarize the results.
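If you want a quick ad-hoc summary instead of `make_table.py`, something like the following works, assuming each run wrote a JSON file mapping dataset name to perplexity. The file layout and names here are illustrative assumptions, not the benchmark's actual output format:

```python
import json
import pathlib

def summarize(input_dir):
    """Collect hypothetical {dataset: ppl} JSON files into a plain-text table."""
    rows = []
    for path in sorted(pathlib.Path(input_dir).glob("*.json")):
        scores = json.loads(path.read_text())
        for dataset, ppl in sorted(scores.items()):
            rows.append((path.stem, dataset, ppl))
    # Lower perplexity means the model is less "surprised" by the text.
    header = f"{'run':<24} {'dataset':<24} {'ppl':>8}"
    return "\n".join([header] + [f"{r:<24} {d:<24} {p:>8.2f}" for r, d, p in rows])
```

For anything beyond a quick look, prefer the provided `make_table.py`, which understands the scripts' real output layout.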