From 5f7ff76ea512ba156caad7f083ce1bea3a2ea136 Mon Sep 17 00:00:00 2001
From: Yina Chen <33650826+cyita@users.noreply.github.com>
Date: Thu, 29 Aug 2024 12:44:22 +0300
Subject: [PATCH] update troubleshooting (#11960)

---
 .../llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
index 2127a34d..59a2c52f 100644
--- a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
+++ b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
@@ -126,6 +126,7 @@ Arguments info:
 
 ### Troubleshooting
 
+#### Output Problem
 If you encounter output problem, please try to disable the optimization of transposing value cache with following command:
 ```bash
 # to run Llama-2-7b-chat-hf
@@ -144,6 +145,9 @@ python minicpm.py --disable-transpose-value-cache
 python minicpm.py --repo-id-or-model-path openbmb/MiniCPM-2B-sft-bf16 --disable-transpose-value-cache
 ```
 
+#### High CPU Utilization
+You can reduce CPU utilization by setting the environment variable `IPEX_LLM_CPU_LM_HEAD=0`, e.g. with `set IPEX_LLM_CPU_LM_HEAD=0` in the command prompt before running an example.
+
 ### Sample Output
 
 #### [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
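
For readers trying out the new "High CPU Utilization" tip, a minimal usage sketch is shown below. It assumes a Windows command prompt (matching the `set` syntax in the patch), and it simply reuses the MiniCPM invocation already present in the Troubleshooting section as an illustrative example; the exact effect of `IPEX_LLM_CPU_LM_HEAD` on output or performance is not described by the patch itself.

```cmd
:: Illustrative sketch, not part of the patch above: set the variable suggested in the
:: "High CPU Utilization" tip for the current console session, then run an example as usual.
set IPEX_LLM_CPU_LM_HEAD=0

:: e.g. the MiniCPM command already shown in the Troubleshooting section
python minicpm.py --repo-id-or-model-path openbmb/MiniCPM-2B-sft-bf16 --disable-transpose-value-cache
```

Note that cmd's `set` affects only the current console session, so the variable needs to be set again in each new prompt before launching an example.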