[NPU] Small update about zip doc (#12951)
parent 015a4c8c43
commit 5ee09b4b28

2 changed files with 8 additions and 0 deletions
````diff
@@ -65,6 +65,10 @@ You could then use cli tool to run GGUF models on Intel NPU through running `lla
 llama-cli-npu.exe -m DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf -n 32 --prompt "What is AI?"
 ```
 
+> [!Note]
+> 
+> - The supported maximum number of input tokens is 960, and maximum sequence length for both input and output tokens is 1024 currently.
+
 ## Troubleshooting
 
 ### `L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004` error
````
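The added note states two limits: at most 960 input tokens, and at most 1024 tokens for input and output combined. A minimal sketch of how a caller might validate a request against those limits before invoking the CLI (the `check_limits` helper and the constant names are hypothetical, not part of IPEX-LLM; token counts are assumed to come from the model's own tokenizer):

```python
# Limits quoted from the doc note above.
MAX_INPUT_TOKENS = 960   # maximum supported input tokens
MAX_SEQ_LEN = 1024       # maximum input + output tokens

def check_limits(prompt_tokens: int, n_predict: int) -> bool:
    """Return True if a request fits the current NPU limits.

    prompt_tokens: token count of the prompt (from the model's tokenizer).
    n_predict:     tokens to generate, i.e. the value passed to `-n`.
    """
    if prompt_tokens > MAX_INPUT_TOKENS:
        return False
    return prompt_tokens + n_predict <= MAX_SEQ_LEN

print(check_limits(900, 32))   # fits both limits
print(check_limits(1000, 32))  # input alone exceeds 960
print(check_limits(960, 100))  # total 1060 exceeds 1024
```

So `llama-cli-npu.exe ... -n 32` in the example above leaves room for prompts of up to 960 tokens, since 960 + 32 would still need to be trimmed to stay within the 1024-token sequence cap.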
The second file is the Chinese translation of the same quickstart; its added lines mirror the English note (translated below):

````diff
@@ -65,6 +65,10 @@ IPEX-LLM provides llama.cpp support for running GGUF mod[els] on Intel NPU
 llama-cli-npu.exe -m DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf -n 32 --prompt "What is AI?"
 ```
 
+> [!Note]
+> 
+> - The currently supported maximum number of input tokens is 960, and the maximum total number of input and output tokens is 1024.
+
 ## Troubleshooting
 
 ### `L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004` error
````