From 5f7ff76ea512ba156caad7f083ce1bea3a2ea136 Mon Sep 17 00:00:00 2001
From: Yina Chen <33650826+cyita@users.noreply.github.com>
Date: Thu, 29 Aug 2024 12:44:22 +0300
Subject: [PATCH] update troubleshooting (#11960)

---
 .../llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
index 2127a34d..59a2c52f 100644
--- a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
+++ b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
@@ -126,6 +126,7 @@ Arguments info:
 
 ### Troubleshooting
 
+#### Output Problem
 If you encounter output problem, please try to disable the optimization of transposing value cache with following command:
 ```bash
 # to run Llama-2-7b-chat-hf
@@ -144,6 +145,9 @@ python minicpm.py --disable-transpose-value-cache
 python minicpm.py --repo-id-or-model-path openbmb/MiniCPM-2B-sft-bf16 --disable-transpose-value-cache
 ```
 
+#### High CPU Utilization
+You can reduce CPU utilization by setting the environment variable `IPEX_LLM_CPU_LM_HEAD=0`, e.g. with `set IPEX_LLM_CPU_LM_HEAD=0` in the command prompt before running an example.
+
 ### Sample Output
 
 #### [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
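
For readers trying out the new "High CPU Utilization" tip, a minimal usage sketch is shown below. It assumes a Windows command prompt (matching the `set` syntax in the patch), and it simply reuses the MiniCPM invocation already present in the Troubleshooting section as an illustrative example; the exact effect of `IPEX_LLM_CPU_LM_HEAD` on output or performance is not described by the patch itself.

```cmd
:: Illustrative sketch, not part of the patch above: set the variable suggested in the
:: "High CPU Utilization" tip for the current console session, then run an example as usual.
set IPEX_LLM_CPU_LM_HEAD=0

:: e.g. the MiniCPM command already shown in the Troubleshooting section
python minicpm.py --repo-id-or-model-path openbmb/MiniCPM-2B-sft-bf16 --disable-transpose-value-cache
```

Note that cmd's `set` affects only the current console session, so the variable needs to be set again in each new prompt before launching an example.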