diff --git a/python/llm/example/GPU/vLLM-Serving/README.md b/python/llm/example/GPU/vLLM-Serving/README.md
index b29da5a4..333708ff 100644
--- a/python/llm/example/GPU/vLLM-Serving/README.md
+++ b/python/llm/example/GPU/vLLM-Serving/README.md
@@ -336,6 +336,8 @@ python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
 ```
 
 After seaching has completed, it will show the recommended maximum context length in the log like:
+
+![max_length](./max_length.png)
 
 Then, you can start the service with this maximum length: