update benchmark readme (#12323)

* update benchmark readme

update the new comment to include memory usage

* Update README.md
Zijie Li 2024-11-04 19:19:08 -05:00 committed by GitHub
parent e2adc974fd
commit 45b0d371aa
@@ -59,6 +59,23 @@ with torch.inference_mode():
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
```
### Sample Output
```bash
=========First token cost xx.xxxxs and 3.595703125 GB=========
=========Rest tokens cost average xx.xxxxs (31 tokens in all) and 3.595703125 GB=========
```
You can also set `verbose=True` when creating the wrapper to print the peak memory usage for every token:
```python
model = BenchmarkWrapper(model, do_print=True, verbose=True)
```
```bash
=========First token cost xx.xxxxs and 3.595703125 GB=========
=========Rest token cost average xx.xxxxs (31 tokens in all) and 3.595703125 GB=========
Peak memory for every token: [3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125, 3.595703125]
```
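For intuition, the wrapper's reporting boils down to timing the first generated token separately from the average of the remaining ones. Below is a minimal pure-Python sketch of that idea; the `TimingWrapper` class, its names, and the one-token callable interface are illustrative assumptions, not the real `BenchmarkWrapper` API:

```python
import time

class TimingWrapper:
    """Illustrative sketch (not the real BenchmarkWrapper): time the first
    token separately from the average of the remaining tokens."""

    def __init__(self, step_fn, do_print=False):
        self.step_fn = step_fn          # callable producing one token per call
        self.do_print = do_print
        self.first_token_time = None
        self.rest_token_times = []

    def generate(self, n_tokens):
        tokens = []
        for i in range(n_tokens):
            start = time.perf_counter()
            tokens.append(self.step_fn())
            elapsed = time.perf_counter() - start
            if i == 0:
                self.first_token_time = elapsed
            else:
                self.rest_token_times.append(elapsed)
        if self.do_print:
            rest_avg = sum(self.rest_token_times) / max(len(self.rest_token_times), 1)
            print(f"=========First token cost {self.first_token_time:.4f}s=========")
            print(f"=========Rest tokens cost average {rest_avg:.4f}s "
                  f"({len(self.rest_token_times)} tokens in all)=========")
        return tokens

# Usage: generating 32 tokens yields one first-token measurement
# plus 31 rest-token samples, matching "(31 tokens in all)" above.
wrapper = TimingWrapper(lambda: "tok")
output = wrapper.generate(32)
```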
### Inference on multiple GPUs
Similarly, put this file into your benchmark directory and wrap your optimized model with `BenchmarkWrapper` (`model = BenchmarkWrapper(model)`).
For example, you only need to apply the following code patch to the [Deepspeed AutoTP example code](https://github.com/intel-analytics/ipex-llm/blob/main/python/llm/example/GPU/Deepspeed-AutoTP/deepspeed_autotp.py) to measure the performance of the first token and the rest of the tokens:
@@ -79,10 +96,3 @@ For example, just need to apply following code patch on [Deepspeed Autotp exampl
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
```
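Concretely, the patch amounts to importing the benchmark file and wrapping the model right after it is built. A minimal fragment of what that change looks like; the `benchmark_util` module name and the exact insertion point inside `deepspeed_autotp.py` are assumptions for illustration:

```python
from benchmark_util import BenchmarkWrapper  # assumed name of the file copied into your benchmark directory

# ... after the optimized model has been built in deepspeed_autotp.py ...
model = BenchmarkWrapper(model)  # 1st and rest token latency are then reported during generation
```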
### Sample Output
The output will look like:
```bash
=========First token cost xx.xxxxs=========
=========Last token cost average xx.xxxxs (31 tokens in all)=========
```