LLM: update llm latency benchmark. (#8922)
parent 7897eb4b51
commit 3d2efe9608

3 changed files with 13 additions and 5 deletions

@@ -1,7 +1,7 @@
 # All in One Benchmark Test
 All in one benchmark test allows users to test all the benchmarks and record them in a result CSV. Users can provide models and related information in `config.yaml`.
 
-Before running, make sure to have [bigdl-llm](../../../README.md) installed.
+Before running, make sure to have [bigdl-llm](../../../README.md) and [bigdl-nano](../../../../nano/README.md) installed.
 
 ## Config
 Config YAML file has the following format
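
The `test_api:` context in the next hunk's header shows that the config block in this README ends with a `test_api` field. Below is a minimal sketch of how a driver script such as `run.py` could consume such a file; the `yaml.safe_load` usage and the `repo_id` key are assumptions for illustration only (`repo_id` is taken from the `run_transformer_int4(repo_id, ...)` signature patched further down), not the benchmark's actual schema.

```python
# Hypothetical reader for config.yaml -- a sketch, not the actual run.py logic.
import yaml  # requires PyYAML

with open("config.yaml") as f:
    conf = yaml.safe_load(f)

# `test_api` is visible in the diff's hunk header; `repo_id` is assumed from
# the run_transformer_int4(repo_id, ...) signature patched below.
for api in conf.get("test_api", []):
    for repo_id in conf.get("repo_id", []):
        print(f"would run {api} benchmark for {repo_id}")
```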

@@ -28,4 +28,10 @@ test_api:
 run `python run.py`; this will output results to `results.csv`.
 
 For SPR performance, run `bash run-spr.sh`.
-For ARC performance, run `bash run-arc.sh`
+> **Note**
+>
+> In `run-spr.sh`, we set the optimal environment variables via `source bigdl-nano-init -c`, where `-c` stands for disabling jemalloc. Enabling jemalloc may lead to a latency increase after multiple trials.
+>
+> The value of `OMP_NUM_THREADS` should be the same as the number of CPU cores specified by `numactl -C`.
+
+For ARC performance, run `bash run-arc.sh`.
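
`run.py` writes its measurements to `results.csv`. A quick way to inspect that file after a run, making no assumptions about its columns beyond whatever header the script emits:

```python
# Print each recorded benchmark row; purely illustrative.
import csv

with open("results.csv", newline="") as f:
    for row in csv.DictReader(f):
        print(row)
```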

@@ -1,5 +1,7 @@
 #!/bin/bash
+source bigdl-nano-init -c
+export OMP_NUM_THREADS=48
+export TRANSFORMERS_OFFLINE=1
 
 # set following parameters according to the actual specs of the test machine
 numactl -C 0-47 -m 0 python $(dirname "$0")/run.py
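
The note in the README above says `OMP_NUM_THREADS` should match the cores passed to `numactl -C`. One way to sanity-check this from inside the launched Python process is sketched below; it assumes a Linux host, since `os.sched_getaffinity` is Linux-only.

```python
# Compare the OpenMP thread count with the cores numactl actually pinned us to.
import os

omp_threads = int(os.environ.get("OMP_NUM_THREADS", "0"))
pinned_cores = len(os.sched_getaffinity(0))  # cores visible after numactl -C

print(f"OMP_NUM_THREADS={omp_threads}, pinned cores={pinned_cores}")
if omp_threads != pinned_cores:
    print("warning: OMP_NUM_THREADS does not match the numactl -C core count")
```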

@@ -116,8 +116,8 @@ def run_transformer_int4(repo_id,
         model = AutoModel.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True, torch_dtype='auto')
         tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
     else:
-        model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True)
-        tokenizer = AutoTokenizer.from_pretrained(model_path)
+        model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
+        tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
     end = time.perf_counter()
     print(">> loading of model costs {}s".format(end - st))
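
The patched branch above loads a model in 4-bit and times the load with `time.perf_counter()`. A self-contained sketch of that pattern follows; it assumes `AutoModelForCausalLM` comes from bigdl-llm's transformers-style API (`bigdl.llm.transformers`) and uses a placeholder model path, so treat it as an outline rather than the benchmark's exact code.

```python
# Minimal latency sketch following the pattern in run.py above.
# Assumption: AutoModelForCausalLM is bigdl-llm's transformers-style wrapper;
# MODEL_PATH is a placeholder for a local Hugging Face checkpoint.
import time

from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

MODEL_PATH = "/path/to/local/model"

st = time.perf_counter()
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, load_in_4bit=True,
                                             trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
end = time.perf_counter()
print(">> loading of model costs {}s".format(end - st))

# Time a single greedy generation as a rough latency probe.
input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
st = time.perf_counter()
output = model.generate(input_ids, max_new_tokens=32)
end = time.perf_counter()
print(">> generating 32 tokens costs {}s".format(end - st))
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Loading with `load_in_4bit=True` is what distinguishes these runs from full-precision baselines, which is why the script reports the model load time separately.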