add --blocksize to doc and script (#12187)
parent 6ffaec66a2
commit 49eb20613a

2 changed files with 2 additions and 0 deletions
@@ -19,6 +19,7 @@ python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
   --port 8000 \
   --model $model \
   --trust-remote-code \
+  --block-size 8 \
   --gpu-memory-utilization 0.9 \
   --device xpu \
   --dtype float16 \
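Applied, the hunk above yields a launch command like the following; a minimal sketch, assuming ipex-llm's XPU vLLM entrypoint is installed and that `$model` points at a local model directory (the path below is hypothetical, not the repository's actual example value):

```bash
#!/bin/bash
# Sketch of the patched serving script; the model path is a
# placeholder for illustration only.
model="/llm/models/Llama-2-7b-chat-hf"

python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
  --port 8000 \
  --model $model \
  --trust-remote-code \
  --block-size 8 \
  --gpu-memory-utilization 0.9 \
  --device xpu \
  --dtype float16
```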
@@ -103,6 +103,7 @@ Before performing benchmark or starting the service, you can refer to this [sect
 |`--max-model-len`| Model context length. If unspecified, will be automatically derived from the model config.|
 |`--max-num-batched-token`| Maximum number of batched tokens per iteration.|
 |`--max-num-seq`| Maximum number of sequences per iteration. Default: 256|
+|`--block-size`| vLLM block size. Set to 8 to achieve a performance boost.|
 
 #### Single card serving
 
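With the server from the script hunk running, the new `--block-size 8` setting can be sanity-checked end to end by sending one request to the OpenAI-compatible completions endpoint; a minimal sketch, assuming the server listens on localhost:8000 and `$model` matches the `--model` value used at launch:

```bash
# One-off smoke test against the OpenAI-compatible server.
# Assumes localhost:8000 and that $model matches --model above.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "'"$model"'",
        "prompt": "San Francisco is a",
        "max_tokens": 32
      }'
```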