update benchmark readme (#12323)
* update benchmark readme: update new comment with memory usage included
* Update README.md

parent e2adc974fd
commit 45b0d371aa

1 changed file with 17 additions and 7 deletions

@@ -59,6 +59,23 @@ with torch.inference_mode():
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
```
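
Since the hunk above shows only the tail of the snippet, here is a minimal sketch of how the wrapper is typically applied end to end. The import paths, model loading details, and prompt are illustrative assumptions, not part of this patch:

```python
import torch
from transformers import AutoTokenizer

# assumption: the benchmark file is saved as benchmark_util.py in this directory
from benchmark_util import BenchmarkWrapper
# illustrative optimized-model loader; swap in whichever loader your benchmark uses
from ipex_llm.transformers import AutoModelForCausalLM

model_path = "path/to/model"   # hypothetical model path
prompt = "Once upon a time"    # hypothetical prompt

model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
model = BenchmarkWrapper(model)  # wrap once; timings are printed for every generate call

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
input_ids = tokenizer.encode(prompt, return_tensors="pt")
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
    output_str = tokenizer.decode(output[0], skip_special_tokens=True)
```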
### Sample Output

```bash
=========First token cost xx.xxxxs and 3.595703125 GB=========
=========Rest tokens cost average xx.xxxxs (31 tokens in all) and 3.595703125 GB=========
```

You can also set `verbose = True` to additionally print the peak memory for every token:

```python
model = BenchmarkWrapper(model, do_print=True, verbose=True)
```

```bash
=========First token cost xx.xxxxs and 3.595703125 GB=========
=========Rest token cost average xx.xxxxs (31 tokens in all) and 3.595703125 GB=========
Peak memory for every token: [3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125]
```

### Inference on multiple GPUs
Similarly, put this file into your benchmark directory, and then wrap your optimized model with `BenchmarkWrapper` (`model = BenchmarkWrapper(model)`).
For example, you just need to apply the following code patch to the [DeepSpeed AutoTP example code](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py) to calculate the 1st and rest token performance:
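
Since the hunk below only shows unchanged context around the patch, here is a hedged sketch of the idea: wrap the model after DeepSpeed has sharded it, so the 1st/rest token timings cover the distributed forward pass. The `init_inference` arguments and surrounding names follow common AutoTP usage and are assumptions here, not the literal patch:

```python
import torch
import deepspeed

# assumption: the benchmark file is saved as benchmark_util.py in this directory
from benchmark_util import BenchmarkWrapper

# ...model and world_size come from the existing deepspeed_autotp.py setup...
model = deepspeed.init_inference(model, mp_size=world_size,
                                 dtype=torch.float16,
                                 replace_with_kernel_inject=False)
model = BenchmarkWrapper(model)  # measures per-rank 1st and rest token performance
```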
@@ -79,10 +96,3 @@ For example, just need to apply following code patch on [Deepspeed Autotp exampl
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```

### Sample Output
Output will look like:

```bash
=========First token cost xx.xxxxs=========
=========Last token cost average xx.xxxxs (31 tokens in all)=========
```