| File | Last commit | Date |
|------|-------------|------|
| 1ccl_for_multi_arc.patch | vLLM: update vLLM XPU to 0.8.3 version (#13118) | 2025-04-30 14:40:53 +08:00 |
| benchmark.sh | Merge CPU & XPU Dockerfiles with Serving Images and Refactor (#12815) | 2025-02-17 14:23:22 +08:00 |
| benchmark_vllm_latency.py | Upgrade to vLLM 0.6.6 (#12796) | 2025-02-12 16:47:51 +08:00 |
| benchmark_vllm_throughput.py | Upgrade to vLLM 0.6.6 (#12796) | 2025-02-12 16:47:51 +08:00 |
| ccl_torch.patch | Upgrade to vLLM 0.6.6 (#12796) | 2025-02-12 16:47:51 +08:00 |
| chat.py | Merge CPU & XPU Dockerfiles with Serving Images and Refactor (#12815) | 2025-02-17 14:23:22 +08:00 |
| Dockerfile | Update Dockerfile (#13168) | 2025-05-20 16:41:13 +08:00 |
| oneccl-binding.patch | Update oneccl-binding.patch (#12377) | 2024-11-11 22:34:08 +08:00 |
| payload-1024.lua | Add vLLM to ipex-llm serving image (#10807) | 2024-04-29 17:25:42 +08:00 |
| README.md | Refactor vLLM Documentation: Centralize Benchmarking and Improve Readability (#13141) | 2025-05-09 10:19:42 +08:00 |
| setvars.sh | Refactor docker image by applying patch method (#13011) | 2025-03-28 08:13:50 +08:00 |
| start-lightweight_serving-service.sh | Reenable pp and lightweight-serving serving on 0.6.6 (#12814) | 2025-02-13 10:16:00 +08:00 |
| start-pp_serving-service.sh | Reenable pp and lightweight-serving serving on 0.6.6 (#12814) | 2025-02-13 10:16:00 +08:00 |
| start-vllm-service.sh | Update docs and scripts to align with new Docker image release (#13156) | 2025-05-13 17:06:29 +08:00 |
| vllm_for_multi_arc.patch | Update vllm patch (#13164) | 2025-05-19 16:54:21 +08:00 |
| vllm_offline_inference.py | Fix (#12390) | 2024-11-27 10:41:58 +08:00 |
| vllm_offline_inference_vision_language.py | Fix 083 lm_head error (#13132) | 2025-05-06 15:47:20 +08:00 |
| vllm_online_benchmark.py | Update english prompt to 34k (#12429) | 2024-11-22 11:20:35 +08:00 |
| vllm_online_benchmark_multimodal.py | Update 083 multimodal benchmark (#13135) | 2025-05-07 09:35:09 +08:00 |