From d222eaffd7d73c46c55ff4fb760d5f6d5cd780fb Mon Sep 17 00:00:00 2001
From: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
Date: Sun, 27 Apr 2025 17:13:18 +0800
Subject: [PATCH] Update README.md (#13113)

---
 python/llm/example/GPU/vLLM-Serving/README.md | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/python/llm/example/GPU/vLLM-Serving/README.md b/python/llm/example/GPU/vLLM-Serving/README.md
index b29da5a4..333708ff 100644
--- a/python/llm/example/GPU/vLLM-Serving/README.md
+++ b/python/llm/example/GPU/vLLM-Serving/README.md
@@ -336,6 +336,8 @@ python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
 ```
 
 After seaching has completed, it will show the recommended maximum context length in the log like:
+
+![max_length](./max_length.png)
 
 Then, you can start the service with this maximum length:
 