diff --git a/docker/llm/serving/xpu/docker/README.md b/docker/llm/serving/xpu/docker/README.md
index c506dcad..23d6aff6 100644
--- a/docker/llm/serving/xpu/docker/README.md
+++ b/docker/llm/serving/xpu/docker/README.md
@@ -179,6 +179,26 @@ The following example files are available in `/llm/` within the container:

 ## 4. Benchmarking

+> [!TIP]
+> Before running benchmarks, it's recommended to lock CPU and GPU frequencies to ensure more stable, reliable, and better performance data.
+>
+> **Lock CPU Frequency:**
+> Use the following command to set the minimum CPU frequency (adjust based on your CPU model):
+>
+> ```bash
+> sudo cpupower frequency-set -d 3.8GHz
+> ```
+>
+> **Lock GPU Frequencies:**
+> Use these commands to lock GPU frequencies to 2400MHz:
+>
+> ```bash
+> sudo xpu-smi config -d 0 -t 0 --frequencyrange 2400,2400
+> sudo xpu-smi config -d 1 -t 0 --frequencyrange 2400,2400
+> sudo xpu-smi config -d 2 -t 0 --frequencyrange 2400,2400
+> sudo xpu-smi config -d 3 -t 0 --frequencyrange 2400,2400
+> ```
+
 ### 4.1 Online Benchmark through API Server

 To benchmark the API server and estimate TPS (transactions per second), follow these steps:
diff --git a/docs/mddocs/DockerGuides/vllm_docker_quickstart.md b/docs/mddocs/DockerGuides/vllm_docker_quickstart.md
index dd05c68f..06025d93 100644
--- a/docs/mddocs/DockerGuides/vllm_docker_quickstart.md
+++ b/docs/mddocs/DockerGuides/vllm_docker_quickstart.md
@@ -59,6 +59,26 @@ root@arda-arc12:/# sycl-ls

 ## Running vLLM serving with IPEX-LLM on Intel GPU in Docker

+> [!TIP]
+> Before running benchmarks, it's recommended to lock CPU and GPU frequencies to ensure more stable, reliable, and better performance data.
+>
+> **Lock CPU Frequency:**
+> Use the following command to set the minimum CPU frequency (adjust based on your CPU model):
+>
+> ```bash
+> sudo cpupower frequency-set -d 3.8GHz
+> ```
+>
+> **Lock GPU Frequencies:**
+> Use these commands to lock GPU frequencies to 2400MHz:
+>
+> ```bash
+> sudo xpu-smi config -d 0 -t 0 --frequencyrange 2400,2400
+> sudo xpu-smi config -d 1 -t 0 --frequencyrange 2400,2400
+> sudo xpu-smi config -d 2 -t 0 --frequencyrange 2400,2400
+> sudo xpu-smi config -d 3 -t 0 --frequencyrange 2400,2400
+> ```
+
 We have included multiple vLLM-related files in `/llm/`:

 1. `vllm_offline_inference.py`: Used for vLLM offline inference example,
diff --git a/docs/mddocs/Quickstart/vLLM_quickstart.md b/docs/mddocs/Quickstart/vLLM_quickstart.md
index 86d1daff..9fbb7615 100644
--- a/docs/mddocs/Quickstart/vLLM_quickstart.md
+++ b/docs/mddocs/Quickstart/vLLM_quickstart.md
@@ -21,6 +21,26 @@ Currently, IPEX-LLM integrated vLLM only supports the following models:

 ## Quick Start

+> [!TIP]
+> Before running benchmarks, it's recommended to lock CPU and GPU frequencies to ensure more stable, reliable, and better performance data.
+>
+> **Lock CPU Frequency:**
+> Use the following command to set the minimum CPU frequency (adjust based on your CPU model):
+>
+> ```bash
+> sudo cpupower frequency-set -d 3.8GHz
+> ```
+>
+> **Lock GPU Frequencies:**
+> Use these commands to lock GPU frequencies to 2400MHz:
+>
+> ```bash
+> sudo xpu-smi config -d 0 -t 0 --frequencyrange 2400,2400
+> sudo xpu-smi config -d 1 -t 0 --frequencyrange 2400,2400
+> sudo xpu-smi config -d 2 -t 0 --frequencyrange 2400,2400
+> sudo xpu-smi config -d 3 -t 0 --frequencyrange 2400,2400
+> ```
+
 This quickstart guide walks you through installing and running `vLLM` with `ipex-llm`.

 ### 1. Install IPEX-LLM for vLLM
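
Note on the GPU lock commands in the tip: the four `xpu-smi` lines differ only in the device index, so on a machine with a different number of GPUs they could be generated rather than copied. The sketch below is one possible way to do that. `print_gpu_lock_cmds` is a hypothetical helper name, not part of `xpu-smi` or IPEX-LLM; the flags and the 2400 MHz value are copied unchanged from the diff, and the helper only prints the commands so they can be reviewed before anything is run.

```bash
# print_gpu_lock_cmds is a hypothetical helper (an assumption for illustration).
# It prints, but does not execute, one lock command per GPU.
print_gpu_lock_cmds() {
  num_gpus="$1"   # number of GPUs to cover (the tip uses 4: devices 0-3)
  freq_mhz="$2"   # target frequency in MHz (the tip uses 2400)
  i=0
  while [ "$i" -lt "$num_gpus" ]; do
    # Same flags as in the tip; min and max of the range are set to the same value.
    echo "sudo xpu-smi config -d $i -t 0 --frequencyrange $freq_mhz,$freq_mhz"
    i=$((i + 1))
  done
}

# Reproduces the four commands from the tip.
print_gpu_lock_cmds 4 2400
```

After checking the printed lines, they can be piped to a shell (for example `print_gpu_lock_cmds 4 2400 | sh`) on a host where `xpu-smi` is installed.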