diff --git a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
index 2127a34d..59a2c52f 100644
--- a/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
+++ b/python/llm/example/NPU/HF-Transformers-AutoModels/LLM/README.md
@@ -126,6 +126,7 @@ Arguments info:
 
 ### Troubleshooting
 
+#### Output Problem
 If you encounter an output problem, please try disabling the optimization that transposes the value cache, using the following command:
 ```bash
 # to run Llama-2-7b-chat-hf
@@ -144,6 +145,9 @@ python minicpm.py --disable-transpose-value-cache
 python minicpm.py --repo-id-or-model-path openbmb/MiniCPM-2B-sft-bf16 --disable-transpose-value-cache
 ```
 
+#### High CPU Utilization
+You can reduce CPU utilization by setting the environment variable `IPEX_LLM_CPU_LM_HEAD=0`, e.g. with `set IPEX_LLM_CPU_LM_HEAD=0` before running the example.
+
 ### Sample Output
 
 #### [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
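
A minimal end-to-end sketch of the new High CPU Utilization tip, assuming a Windows command prompt (implied by the `set` syntax in the added line) and reusing the `minicpm.py` example script shown elsewhere in this README; the comment on the variable's effect is an inference from its name, not documented behavior:

```cmd
:: Presumably keeps the LM head computation off the CPU, lowering CPU utilization.
set IPEX_LLM_CPU_LM_HEAD=0

:: Then run one of the example scripts as usual.
python minicpm.py
```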