Commit graph

187 commits

Shaojun Liu
5c100ac105
Add ENTRYPOINT to Dockerfile to auto-start vllm service on container launch (for CVTE customer) (#12901)
* Add ENTRYPOINT to Dockerfile to auto-start service on container launch (for CVTE client)

* Update start-vllm-service.sh

* Update README.md
2025-02-27 17:33:58 +08:00
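
For context, a minimal sketch of the pattern this entry adds. The script name comes from the commit itself; the /llm path is an assumption:

```dockerfile
# Sketch only: /llm is an assumed location; start-vllm-service.sh is the
# script named in this commit.
WORKDIR /llm
COPY start-vllm-service.sh /llm/start-vllm-service.sh
RUN chmod +x /llm/start-vllm-service.sh

# With an ENTRYPOINT set, `docker run <image>` launches the vLLM service
# immediately, with no explicit command required.
ENTRYPOINT ["/bin/bash", "/llm/start-vllm-service.sh"]
```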
Xiangyu Tian
ae9f5320da
vLLM CPU: Fix Triton Version to Resolve Related Error (#12893) 2025-02-25 15:00:41 +08:00
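
The fix here is a version pin. A hedged sketch of the pattern; the pinned version below is invented for illustration, since the commit does not state it:

```dockerfile
# Pin Triton so pip cannot resolve to an incompatible newer release.
# The version shown is an assumption, not taken from the commit.
RUN pip install --no-cache-dir "triton==3.1.0"
```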
Shaojun Liu
dd30d12cb6
Fix serving-cpu image: setuptools-scm requires setuptools>=61 (#12876)
* setuptools-scm requires setuptools>=61

* Update Dockerfile
2025-02-25 09:10:14 +08:00
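
The commit title states the whole fix: setuptools-scm needs setuptools>=61, so the image must upgrade setuptools before anything that builds with setuptools-scm. A sketch:

```dockerfile
# setuptools-scm requires setuptools>=61 (per this commit), so upgrade
# setuptools first, then install setuptools-scm.
RUN pip install --no-cache-dir --upgrade "setuptools>=61" && \
    pip install --no-cache-dir setuptools-scm
```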
Wang, Jian4
4f2f92afa3
Update inference-cpp docker (#12882)
* remove unused run.py

* add WORKDIR /llm
2025-02-24 14:32:44 +08:00
Shaojun Liu
afad979168
Add Apache 2.0 License Information in Dockerfile to Comply with OSPDT Requirements (#12878)
* OSPDT: add Header for Dockerfile
2025-02-24 14:00:46 +08:00
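
The OSPDT change amounts to prefixing each Dockerfile with the standard Apache 2.0 short-form notice; the copyright line below is an assumption, the rest is the stock license text:

```dockerfile
# Copyright 2016 The BigDL Authors. (copyright holder/year assumed)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```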
Wang, Jian4
e1809a6295
Update multimodal on vllm 0.6.6 (#12816)
* add glm4v and minicpmv example

* fix
2025-02-19 10:04:42 +08:00
Shaojun Liu
f7b5a093a7
Merge CPU & XPU Dockerfiles with Serving Images and Refactor (#12815)
* Update Dockerfile

* Ensure scripts are executable

* remove inference-cpu and inference-xpu

* update README
2025-02-17 14:23:22 +08:00
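
One idiomatic way to merge per-device Dockerfiles is a build argument that selects the final stage. Everything below (stage contents, base images, package extras) is assumed for illustration; the commit only records that the CPU and XPU Dockerfiles were merged:

```dockerfile
# Hypothetical sketch: one Dockerfile, two device targets.
# Build with: docker build --build-arg DEVICE=xpu .
ARG DEVICE=cpu

FROM python:3.11-slim AS cpu
RUN pip install --no-cache-dir ipex-llm

FROM intel/oneapi-basekit AS xpu
RUN apt-get update && apt-get install -y python3-pip && \
    pip3 install --no-cache-dir "ipex-llm[xpu]"

# The final image is whichever stage DEVICE names.
FROM ${DEVICE}
```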
Wang, Jian4
1083fe5508
Re-enable pp and lightweight-serving on 0.6.6 (#12814)
* re-enable pp and lightweight-serving on 0.6.6

* update readme

* update

* update tag
2025-02-13 10:16:00 +08:00
Guancheng Fu
af693425f1
Upgrade to vLLM 0.6.6 (#12796)
* init

* update engine init

* fix serving load_in_low_bit problem

* temp

* fix

* fixed

* done

* fix all arguments

* fix throughput script

* use official ipex-llm

* Fix readme

---------

Co-authored-by: hzjane <a1015616934@qq.com>
2025-02-12 16:47:51 +08:00
Shaojun Liu
bd815a4d96
Update the base image of inference-cpp image to oneapi 2025.0.2 (#12802)
* Update Dockerfile
2025-02-12 14:15:08 +08:00
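
The change itself is a one-line base-image bump; the exact Docker Hub tag below is an assumption about how oneAPI 2025.0.2 is published:

```dockerfile
# Base-image bump to oneAPI 2025.0.2; the tag format is assumed.
FROM intel/oneapi-basekit:2025.0.2-0-devel-ubuntu22.04
```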
Xiangyu Tian
9e9b6c9f2b
Fix cpu serving docker image (#12783) 2025-02-07 11:12:42 +08:00
Xiangyu Tian
f924880694
vLLM: Fix vLLM-CPU docker image (#12741) 2025-01-24 10:00:29 +08:00
Xiangyu Tian
c9b6c94a59
vLLM: Update vLLM-cpu to v0.6.6-post1 (#12728)
Update vLLM-cpu to v0.6.6-post1
2025-01-22 15:03:01 +08:00
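
Bumping the CPU image to a new vLLM tag generally means rebuilding vLLM from source at that tag. A sketch assuming vLLM's documented CPU build flow, not taken from this repo's Dockerfile:

```dockerfile
# Assumed recipe: build vLLM's CPU backend at the v0.6.6.post1 tag.
RUN git clone -b v0.6.6.post1 https://github.com/vllm-project/vllm.git && \
    cd vllm && \
    pip install --no-cache-dir -r requirements-cpu.txt && \
    VLLM_TARGET_DEVICE=cpu pip install --no-build-isolation .
```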
Wang, Jian4
716d4fe563
Add vllm 0.6.2 vision offline example (#12721)
* add vision offline example

* add to docker
2025-01-21 09:58:01 +08:00
Shaojun Liu
2673792de6
Update Dockerfile (#12688) 2025-01-10 09:01:29 +08:00
Shaojun Liu
28737c250c
Update Dockerfile (#12585) 2024-12-26 10:20:52 +08:00
Shaojun Liu
51ff9ebd8a
Upgrade oneccl version to 0.0.6.3 (#12560)
* Update Dockerfile

* Update start-vllm-service.sh
2024-12-20 09:29:16 +08:00
Shaojun Liu
429bf1ffeb
Change: Use cn mirror for PyTorch extension installation to resolve network issues (#12559)
* Update Dockerfile
2024-12-17 14:22:50 +08:00
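
A sketch of the mirror switch; the mirror URL and the package name (intel-extension-for-pytorch) are both assumptions, since the commit records only that a cn mirror is used:

```dockerfile
# Assumed mirror and package name; the commit only says installs go
# through a cn mirror to avoid network timeouts.
RUN pip install --no-cache-dir \
    -i https://pypi.tuna.tsinghua.edu.cn/simple \
    intel-extension-for-pytorch
```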
Heyang Sun
fa261b8af1
torch 2.3 inference docker (#12517)
* torch 2.3 inference docker

* Update README.md

* add convert code

* rename image

* remove 2.1 and add graph example

* Update README.md
2024-12-13 10:47:04 +08:00
Wang, Jian4
922958c018
vllm oneccl upgrade to b9 (#12520) 2024-12-10 15:02:56 +08:00
Guancheng Fu
8331875f34
Fix (#12390) 2024-11-27 10:41:58 +08:00
Pepijn de Vos
71e1f11aa6
update serving image runtime (#12433) 2024-11-26 14:55:30 +08:00
Shaojun Liu
c089b6c10d
Update english prompt to 34k (#12429) 2024-11-22 11:20:35 +08:00
Wang, Jian4
1bfcbc0640
Add multimodal benchmark (#12415)
* add benchmark multimodal

* update
2024-11-20 14:21:13 +08:00
Guancheng Fu
d6057f6dd2
Update benchmark_vllm_throughput.py (#12414) 2024-11-19 10:41:43 +08:00
Xu, Shuo
6726b198fd
Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-14 10:28:15 +08:00
Shaojun Liu
27152476e1
minor fix (#12389) 2024-11-12 22:36:43 +08:00
Xu, Shuo
dd8964ba9c
changed inference-cpp/Dockerfile (#12386)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
2024-11-12 20:40:21 +08:00
Guancheng Fu
0ee54fc55f
Upgrade to vllm 0.6.2 (#12338)
* Initial updates for vllm 0.6.2

* fix

* Change Dockerfile to support v0.6.2

* Fix

* fix examples

* Fix

* done

* fix

* Update engine.py

* Fix Dockerfile to original path

* fix

* add option

* fix

---------

Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-11-12 20:35:34 +08:00
Jun Wang
4376fdee62
Decouple the openwebui and the ollama in the inference-cpp-xpu dockerfile (#12382)
* remove the openwebui in inference-cpp-xpu dockerfile

* update docker_cpp_xpu_quickstart.md

* add sample output in inference-cpp/readme

* remove the openwebui in main readme
2024-11-12 20:15:23 +08:00
Wang, Jian4
85c9279e6e
Update llama-cpp docker usage (#12387) 2024-11-12 15:30:17 +08:00
Shaojun Liu
c92d76b997
Update oneccl-binding.patch (#12377)
* Add files via upload

* upload oneccl-binding.patch

* Update Dockerfile
2024-11-11 22:34:08 +08:00
Shaojun Liu
fad15c8ca0
Update fastchat demo script (#12367)
* Update README.md

* Update vllm_docker_quickstart.md
2024-11-08 15:42:17 +08:00
Xu, Shuo
ce0c6ae423
Update Readme for FastChat docker demo (#12354)
* update Readme for FastChat docker demo

* update readme

* add 'Serving with FastChat' part in docs

* polish docs

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-07 15:22:42 +08:00
Xu, Shuo
899a30331a
Replace gradio_web_server.patch to adjust webui (#12329)
* replace gradio_web_server.patch to adjust webui

* fix patch problem

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-06 09:16:32 +08:00
Jun Wang
3700e81977
[fix] vllm-online-benchmark first token latency error (#12271) 2024-10-29 17:54:36 +08:00
Guancheng Fu
67014cb29f
Add benchmark_latency.py to docker serving image (#12283) 2024-10-28 16:19:59 +08:00
Shaojun Liu
48fc63887d
use oneccl 0.0.5.1 (#12262) 2024-10-24 16:12:24 +08:00
Jun Wang
b10fc892e1
Update new reference link of xpu/docker/readme.md (#12188)
* [ADD] rewrite new vllm docker quick start

* [ADD] lora adapter doc finished

* [ADD] multi lora adapter test successfully

* [ADD] add ipex-llm quantization doc

* [Merge] rebase main

* [REMOVE] rm tmp file

* [ADD] add prefix caching experiment and result

* [REMOVE] rm cpu offloading chapter

* [UPDATE] update the link to new vllm-docker-quickstart
2024-10-18 13:18:08 +08:00
Shaojun Liu
7825dc1398
Upgrade oneccl to 0.0.5 (#12223) 2024-10-18 09:29:19 +08:00
Shaojun Liu
26390f9213
Update oneccl_wks_installer to 2024.0.0.4.1 (#12217) 2024-10-17 10:11:55 +08:00
Shaojun Liu
49eb20613a
add --blocksize to doc and script (#12187) 2024-10-12 09:17:42 +08:00
Shaojun Liu
1daab4531f
Upgrade oneccl to 0.0.4 in serving-xpu image (#12185)
* Update oneccl to 0.0.4

* upgrade transformers to 4.44.2
2024-10-11 16:54:50 +08:00
Shaojun Liu
657889e3e4
use english prompt by default (#12115) 2024-09-24 17:40:50 +08:00
Guancheng Fu
b36359e2ab
Fix xpu serving image oneccl (#12100) 2024-09-20 15:25:41 +08:00
Guancheng Fu
a6cbc01911
Use new oneccl for ipex-llm serving image (#12097) 2024-09-20 14:52:49 +08:00
Shaojun Liu
1295898830
update vllm_online_benchmark script to support long input (#12095)
* update vllm_online_benchmark script to support long input

* update guide
2024-09-20 14:18:30 +08:00
Xiangyu Tian
c2774e1a43
Update oneccl to 0.0.3 in serving-xpu image (#12088) 2024-09-18 14:29:17 +08:00
Shaojun Liu
beb876665d
pin gradio version to fix connection error (#12069) 2024-09-12 14:36:09 +08:00
Shaojun Liu
7e1e51d91a
Update vllm setting (#12059)
* revert

* update
2024-09-11 11:45:08 +08:00