Commit graph

220 commits

Author SHA1 Message Date
Guancheng Fu
3accc31b86
Update 1ccl_for_multi_arc.patch (#13199) 2025-05-30 17:13:59 +08:00
Shaojun Liu
c5d919b151
update vllm patch (#13185)
Co-authored-by: gc-fu <guancheng.fu@intel.com>
2025-05-23 15:02:50 +08:00
Shaojun Liu
1576347892
Update Dockerfile (#13168) 2025-05-20 16:41:13 +08:00
Wang, Jian4
66eb054988
Update vllm patch (#13164) 2025-05-19 16:54:21 +08:00
Wang, Jian4
d83e5068d2
Enable whisper (#13162)
* fix error

* update dockerfile
2025-05-19 14:07:51 +08:00
Shaojun Liu
bd71739e64
Update docs and scripts to align with new Docker image release (#13156)
* Update vllm_docker_quickstart.md

* Update start-vllm-service.sh

* Update vllm_docker_quickstart.md

* Update start-vllm-service.sh
2025-05-13 17:06:29 +08:00
Xiangyu Tian
886c7632b2
Add IPEX_LLM_FORCE_BATCH_FORWARD for vLLM docker image (#13151) 2025-05-12 13:44:33 +08:00
Wang, Jian4
5df03ced2c
Update vllm patch to fix telechat2 and baichuan2 error (#13150) 2025-05-12 10:54:22 +08:00
Guancheng Fu
da08c9ca60
Update Dockerfile (#13148) 2025-05-12 09:19:18 +08:00
Shaojun Liu
45f7bf6688
Refactor vLLM Documentation: Centralize Benchmarking and Improve Readability (#13141)
* update vllm doc

* update image name

* update

* update

* update

* update
2025-05-09 10:19:42 +08:00
Wang, Jian4
be76918b61
Update 083 multimodal benchmark (#13135)
* update multimodal benchmark

* update
2025-05-07 09:35:09 +08:00
Wang, Jian4
01bc7e9eb9
Fix 083 lm_head error (#13132)
* fix no quantize error

* update

* update style
2025-05-06 15:47:20 +08:00
Xiangyu Tian
51b41faad7
vLLM: update vLLM XPU to 0.8.3 version (#13118)
2025-04-30 14:40:53 +08:00
Wang, Jian4
16fa778e65
enable glm4v and gemma-3 on vllm 083 (#13114)
* enable glm4v and gemma-3

* update

* add qwen2.5-vl
2025-04-27 17:10:56 +08:00
Guancheng Fu
cf97d8f1d7
Update start-vllm-service.sh (#13109) 2025-04-25 15:42:15 +08:00
Shaojun Liu
73198d5b80
Update to b17 image (#13085)
* update vllm patch

* fix

* fix triton

---------

Co-authored-by: gc-fu <guancheng.fu@intel.com>
2025-04-17 16:18:22 +08:00
Shaojun Liu
db5edba786
Update Dockerfile (#13081) 2025-04-16 09:18:46 +08:00
Shaojun Liu
fa56212bb3
Update vLLM patch (#13079)
* update vllm patch

* Update Dockerfile
2025-04-15 16:55:29 +08:00
Shaojun Liu
f5aaa83649
Update serving-xpu Dockerfile (#13077)
* Update Dockerfile

* Update Dockerfile
2025-04-15 13:34:14 +08:00
Shaojun Liu
cfadf3f2f7
upgrade linux-libc-dev to fix CVEs (#13076) 2025-04-15 11:43:53 +08:00
Shaojun Liu
7826152f5a
update vllm patch (#13072) 2025-04-14 14:56:10 +08:00
Guancheng Fu
3ee6dec0f8
update vllm patch (#13064) 2025-04-10 15:03:37 +08:00
Shaojun Liu
1d7f4a83ac
Update documentation to build Docker image from Dockerfile instead of pulling from registry (#13057)
* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update docker_cpp_xpu_quickstart.md

* Update vllm_cpu_docker_quickstart.md

* Update docker_cpp_xpu_quickstart.md

* Update vllm_docker_quickstart.md

* Update fastchat_docker_quickstart.md

* Update docker_pytorch_inference_gpu.md
2025-04-09 16:40:20 +08:00
Xiangyu Tian
34b1b14225
vLLM: Fix vLLM CPU dockerfile to resolve cmake deprecated issue (#13026) 2025-03-31 16:09:25 +08:00
Guancheng Fu
61c2e9c271
Refactor docker image by applying patch method (#13011)
* first stage try

* second try

* add ninja

* Done

* fix
2025-03-28 08:13:50 +08:00
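The "patch method" refactor above (#13011) changed the image build to take upstream vLLM sources and overlay ipex-llm's modifications as a patch at build time, which is why so many later commits are simply "update vllm patch". A minimal Dockerfile sketch of that pattern — the repository URL is real, but the pinned revision and patch filename here are illustrative, not the actual values from #13011:

```dockerfile
# Hypothetical sketch of the patch-based build (pinned revision and
# patch name are placeholders, not the values used in #13011).
FROM intel/oneapi-basekit:2025.0.2-0-devel-ubuntu22.04

WORKDIR /llm
COPY vllm.patch /tmp/vllm.patch

# Fetch upstream vLLM at a pinned revision, apply the local patch,
# then build/install from the patched tree.
RUN git clone https://github.com/vllm-project/vllm.git && \
    cd vllm && \
    git checkout <pinned-commit> && \
    git apply /tmp/vllm.patch && \
    pip install -v .
```

Keeping the delta as a single patch file means version bumps and fixes only touch the patch and the pinned revision, rather than a full fork of the vLLM tree.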
Wang, Jian4
7809ca9864
Reuse --privileged (#13015)
* fix

* add
2025-03-27 10:00:50 +08:00
Shaojun Liu
7a86dd0569
Remove unused Gradio (#12995) 2025-03-24 10:51:06 +08:00
Shaojun Liu
b0d56273a8
Fix Docker build failure due to outdated ipex-llm pip index URL (#12977) 2025-03-17 10:46:01 +08:00
Shaojun Liu
760abc47aa
Fix Docker build failure due to outdated ipex-llm pip index URL (#12976) 2025-03-17 09:50:09 +08:00
Shaojun Liu
6a2d87e40f
add --entrypoint /bin/bash (#12957)
Co-authored-by: gc-fu <guancheng.fu@intel.com>
2025-03-10 10:10:27 +08:00
Shaojun Liu
015a4c8c43
Add CPU and GPU Frequency Locking Instructions to Documentation (#12947) 2025-03-07 09:20:40 +08:00
Shaojun Liu
f81d89d908
Remove Unnecessary --privileged Flag While Keeping It for WSL Users (#12920) 2025-03-03 11:11:42 +08:00
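For context on the `--privileged` commits above (#12920, #13015): on bare-metal Linux the container typically only needs the GPU device nodes mapped in, while WSL users keep the flag. A hedged sketch of the two launch variants — the image name below is illustrative, not an actual published tag:

```shell
# Bare-metal Linux: map the Intel GPU device nodes instead of --privileged.
docker run -it --rm \
    --device=/dev/dri \
    illustrative/ipex-llm-serving-xpu:latest /bin/bash

# WSL: keep --privileged, as the commit above retains it for WSL users.
docker run -it --rm --privileged \
    illustrative/ipex-llm-serving-xpu:latest /bin/bash
```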
Shaojun Liu
7810b8fb49
OSPDT: update dockerfile header (#12908)
* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile
2025-03-03 09:59:11 +08:00
Shaojun Liu
5c100ac105
Add ENTRYPOINT to Dockerfile to auto-start vllm service on container launch (for CVTE customer) (#12901)
* Add ENTRYPOINT to Dockerfile to auto-start service on container launch (for CVTE client)

* Update start-vllm-service.sh

* Update README.md

* Update README.md

* Update start-vllm-service.sh

* Update README.md
2025-02-27 17:33:58 +08:00
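The ENTRYPOINT change above (#12901) makes the serving container start the vLLM service automatically on launch. A minimal sketch of the pattern, assuming the `start-vllm-service.sh` script named in the commit (the `/llm` path is an assumption based on the WORKDIR used elsewhere in this log):

```dockerfile
# Sketch: auto-start the vLLM service when the container launches.
# Script name is from #12901; the /llm path is assumed.
COPY start-vllm-service.sh /llm/
RUN chmod +x /llm/start-vllm-service.sh
ENTRYPOINT ["/bin/bash", "-c", "/llm/start-vllm-service.sh"]
```

An ENTRYPOINT like this can still be overridden at run time with `docker run --entrypoint /bin/bash <image>` to get an interactive shell instead of the service, which is what the later `add --entrypoint /bin/bash` commit (#12957) documents.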
Xiangyu Tian
ae9f5320da
vLLM CPU: Fix Triton Version to Resolve Related Error (#12893) 2025-02-25 15:00:41 +08:00
Shaojun Liu
dd30d12cb6
Fix serving-cpu image: setuptools-scm requires setuptools>=61 (#12876)
* setuptools-scm requires setuptools>=61

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile
2025-02-25 09:10:14 +08:00
Wang, Jian4
4f2f92afa3
Update inference-cpp docker (#12882)
* remove unused run.py

* add WORKDIR /llm
2025-02-24 14:32:44 +08:00
Shaojun Liu
afad979168
Add Apache 2.0 License Information in Dockerfile to Comply with OSPDT Requirements (#12878)
* ospdt: add Header for Dockerfile

* OSPDT: add Header for Dockerfile

* OSPDT: add Header for Dockerfile

* OSPDT: add Header for Dockerfile
2025-02-24 14:00:46 +08:00
Wang, Jian4
e1809a6295
Update multimodal on vllm 0.6.6 (#12816)
* add glm4v and minicpmv example

* fix
2025-02-19 10:04:42 +08:00
Shaojun Liu
f7b5a093a7
Merge CPU & XPU Dockerfiles with Serving Images and Refactor (#12815)
* Update Dockerfile

* Update Dockerfile

* Ensure scripts are executable

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* update

* Update Dockerfile

* remove inference-cpu and inference-xpu

* update README
2025-02-17 14:23:22 +08:00
Wang, Jian4
1083fe5508
Reenable pp and lightweight-serving serving on 0.6.6 (#12814)
* reenable pp and lightweight serving on 066

* update readme

* update

* update tag
2025-02-13 10:16:00 +08:00
Guancheng Fu
af693425f1
Upgrade to vLLM 0.6.6 (#12796)
* init

* update engine init

* fix serving load_in_low_bit problem

* temp

* temp

* temp

* temp

* temp

* fix

* fixed

* done

* fix

* fix all arguments

* fix

* fix throughput script

* fix

* fix

* use official ipex-llm

* Fix readme

* fix

---------

Co-authored-by: hzjane <a1015616934@qq.com>
2025-02-12 16:47:51 +08:00
Shaojun Liu
bd815a4d96
Update the base image of inference-cpp image to oneapi 2025.0.2 (#12802)
* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile

* Update Dockerfile
2025-02-12 14:15:08 +08:00
Xiangyu Tian
9e9b6c9f2b
Fix cpu serving docker image (#12783) 2025-02-07 11:12:42 +08:00
Xiangyu Tian
f924880694
vLLM: Fix vLLM-CPU docker image (#12741) 2025-01-24 10:00:29 +08:00
Xiangyu Tian
c9b6c94a59
vLLM: Update vLLM-cpu to v0.6.6-post1 (#12728)
2025-01-22 15:03:01 +08:00
Wang, Jian4
716d4fe563
Add vllm 0.6.2 vision offline example (#12721)
* add vision offline example

* add to docker
2025-01-21 09:58:01 +08:00
Shaojun Liu
2673792de6
Update Dockerfile (#12688) 2025-01-10 09:01:29 +08:00
Shaojun Liu
28737c250c
Update Dockerfile (#12585) 2024-12-26 10:20:52 +08:00
Shaojun Liu
51ff9ebd8a
Upgrade oneccl version to 0.0.6.3 (#12560)
* Update Dockerfile

* Update Dockerfile

* Update start-vllm-service.sh
2024-12-20 09:29:16 +08:00