From 5ee09b4b28893d73b22f1751970c4690cefc7727 Mon Sep 17 00:00:00 2001
From: binbin Deng <108676127+plusbang@users.noreply.github.com>
Date: Fri, 7 Mar 2025 15:22:14 +0800
Subject: [PATCH] [NPU] Small update about zip doc (#12951)

---
 .../Quickstart/llama_cpp_npu_portable_zip_quickstart.md       | 4 ++++
 .../Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.md b/docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.md
index 9212ec0f..da2aeb63 100644
--- a/docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.md
+++ b/docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.md
@@ -65,6 +65,10 @@ You could then use cli tool to run GGUF models on Intel NPU through running `lla
 llama-cli-npu.exe -m DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf -n 32 --prompt "What is AI?"
 ```
 
+> [!Note]
+>
+> - Currently, the maximum number of input tokens is 960, and the maximum sequence length for input and output tokens combined is 1024.
+
 ## Troubleshooting
 
 ### `L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004` error

diff --git a/docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md b/docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md
index df5a012b..8620bd43 100644
--- a/docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md
+++ b/docs/mddocs/Quickstart/llama_cpp_npu_portable_zip_quickstart.zh-CN.md
@@ -65,6 +65,10 @@ IPEX-LLM 提供了 llama.cpp 的相关支持以在 Intel NPU 上运行 GGUF 模
 llama-cli-npu.exe -m DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf -n 32 --prompt "What is AI?"
 ```
 
+> [!Note]
+>
+> - 目前支持的输入token数上限是960,输入和输出总token数上限是1024。
+
 ## 故障排除
 
 ### `L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004` 报错
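The token limits this patch documents (at most 960 input tokens, and at most 1024 tokens for input plus output) can be sketched as a simple budget check. The helper below is purely illustrative and is not part of `llama-cli-npu.exe` or IPEX-LLM; it just shows how a caller might pre-validate a prompt and cap the `-n` value before invoking the tool.

```python
# Illustrative sketch of the documented NPU token budget:
# input <= 960 tokens, and input + output <= 1024 tokens total.
MAX_INPUT_TOKENS = 960
MAX_SEQ_LEN = 1024

def clamp_new_tokens(n_input_tokens: int, n_requested: int) -> int:
    """Return how many output tokens can actually be generated
    for a prompt of n_input_tokens, given the documented limits."""
    if n_input_tokens > MAX_INPUT_TOKENS:
        raise ValueError(
            f"prompt too long: {n_input_tokens} > {MAX_INPUT_TOKENS} tokens"
        )
    # Remaining budget within the 1024-token sequence length.
    return min(n_requested, MAX_SEQ_LEN - n_input_tokens)

# Example: a 960-token prompt leaves room for at most 64 generated tokens,
# so a request such as `-n 100` would effectively be capped at 64.
```

With a short prompt (well under 960 tokens), a request like `-n 32` from the quickstart command fits comfortably inside the budget and is returned unchanged.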