From 289cc99cd6717155573f8a40cba8cf6a56315ea2 Mon Sep 17 00:00:00 2001
From: Wenjing Margaret Mao
Date: Tue, 9 Apr 2024 16:01:12 +0800
Subject: [PATCH] Update README.md (#10700)

Edit "summarize the results"
---
 python/llm/dev/benchmark/harness/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/python/llm/dev/benchmark/harness/README.md b/python/llm/dev/benchmark/harness/README.md
index 4dfcf09a..50ec4b86 100644
--- a/python/llm/dev/benchmark/harness/README.md
+++ b/python/llm/dev/benchmark/harness/README.md
@@ -30,6 +30,6 @@ Taking example above, the script will fork 3 processes, each for one xpu, to exe
 ## Results
 We follow [Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard) to record our metrics, `acc_norm` for `hellaswag` and `arc_challenge`, `mc2` for `truthful_qa` and `acc` for `mmlu`. For `mmlu`, there are 57 subtasks which means users may need to average them manually to get final result.
 ## Summarize the results
-"""python
+```python
 python make_table.py
-"""
\ No newline at end of file
+```
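
Note: the README section touched by this patch says the 57 `mmlu` subtasks may need to be averaged manually to get the final `acc`. Below is a minimal sketch of that averaging, assuming the harness writes a JSON file whose `"results"` key maps subtask names to metric dicts; the file name `results.json` and the `hendrycksTest-` subtask prefix are assumptions (the prefix in particular varies across lm-evaluation-harness versions), not part of this patch.

```python
import json

# Load the harness output (hypothetical file name; adjust to your run's output path).
with open("results.json") as f:
    results = json.load(f)["results"]

# Collect the per-subtask `acc` scores for mmlu. Older lm-evaluation-harness
# versions name the subtasks "hendrycksTest-*"; newer ones use "mmlu_*".
mmlu_accs = [
    metrics["acc"]
    for task, metrics in results.items()
    if task.startswith("hendrycksTest-")
]

# Unweighted mean over the 57 subtasks.
mmlu_avg = sum(mmlu_accs) / len(mmlu_accs)
print(f"mmlu average acc over {len(mmlu_accs)} subtasks: {mmlu_avg:.4f}")
```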