[NPU] Small update about zip doc (#12951)
parent 015a4c8c43
commit 5ee09b4b28

2 changed files with 8 additions and 0 deletions
````diff
@@ -65,6 +65,10 @@ You could then use cli tool to run GGUF models on Intel NPU through running `lla
 llama-cli-npu.exe -m DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf -n 32 --prompt "What is AI?"
 ```
 
+> [!Note]
+> 
+> - The supported maximum number of input tokens is 960, and maximum sequence length for both input and output tokens is 1024 currently.
+
 ## Troubleshooting
 
 ### `L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004` error
````
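The added note states two limits: at most 960 input tokens, and at most 1024 tokens for input and output combined. A minimal sketch of how a caller might validate a request against those limits before invoking the CLI (the `check_limits` helper and the constant names are hypothetical, not part of IPEX-LLM; token counts are assumed to come from the model's own tokenizer):

```python
# Limits quoted from the doc note above.
MAX_INPUT_TOKENS = 960   # maximum supported input tokens
MAX_SEQ_LEN = 1024       # maximum input + output tokens

def check_limits(prompt_tokens: int, n_predict: int) -> bool:
    """Return True if a request fits the current NPU limits.

    prompt_tokens: token count of the prompt (from the model's tokenizer).
    n_predict:     tokens to generate, i.e. the value passed to `-n`.
    """
    if prompt_tokens > MAX_INPUT_TOKENS:
        return False
    return prompt_tokens + n_predict <= MAX_SEQ_LEN

print(check_limits(900, 32))   # fits both limits
print(check_limits(1000, 32))  # input alone exceeds 960
print(check_limits(960, 100))  # total 1060 exceeds 1024
```

So `llama-cli-npu.exe ... -n 32` in the example above leaves room for prompts of up to 960 tokens, since 960 + 32 would still need to be trimmed to stay within the 1024-token sequence cap.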
The second file is the Chinese translation of the same quickstart; its added lines mirror the English note (translated below):

````diff
@@ -65,6 +65,10 @@ IPEX-LLM provides llama.cpp support for running GGUF mod[els] on Intel NPU
 llama-cli-npu.exe -m DeepSeek-R1-Distill-Qwen-7B-Q6_K.gguf -n 32 --prompt "What is AI?"
 ```
 
+> [!Note]
+> 
+> - The currently supported maximum number of input tokens is 960, and the maximum total number of input and output tokens is 1024.
+
 ## Troubleshooting
 
 ### `L0 pfnCreate2 result: ZE_RESULT_ERROR_INVALID_ARGUMENT, code 0x78000004` error
````