Shaojun Liu
7810b8fb49
OSPDT: update Dockerfile header ( #12908 )
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
2025-03-03 09:59:11 +08:00
Shaojun Liu
5c100ac105
Add ENTRYPOINT to Dockerfile to auto-start vllm service on container launch (for CVTE customer) ( #12901 )
* Add ENTRYPOINT to Dockerfile to auto-start service on container launch (for CVTE client)
* Update start-vllm-service.sh
* Update README.md
* Update README.md
* Update start-vllm-service.sh
* Update README.md
2025-02-27 17:33:58 +08:00
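For readers wondering what this change amounts to in practice, a minimal sketch of such an auto-start ENTRYPOINT follows. The script name comes from the bullets above; the COPY destination and WORKDIR are assumptions, not the commit's exact contents.

```dockerfile
# Sketch: launch the vLLM service automatically when the container starts.
# start-vllm-service.sh is named in the commit; /llm is an assumed location.
COPY start-vllm-service.sh /llm/
WORKDIR /llm
ENTRYPOINT ["/bin/bash", "start-vllm-service.sh"]
```

With an ENTRYPOINT like this, `docker run` brings the service up directly instead of requiring an interactive shell first.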
Xiangyu Tian
ae9f5320da
vLLM CPU: Fix Triton Version to Resolve Related Error ( #12893 )
2025-02-25 15:00:41 +08:00
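Fixes of this kind usually come down to pinning the package during the image build. A hedged sketch; the version number below is a placeholder, not taken from the commit.

```dockerfile
# Sketch: pin Triton to a known-good release so the build does not pull
# an incompatible latest version. 3.1.0 is illustrative only.
RUN pip install --no-cache-dir "triton==3.1.0"
```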
Shaojun Liu
dd30d12cb6
Fix serving-cpu image: setuptools-scm requires setuptools>=61 ( #12876 )
* setuptools-scm requires setuptools>=61
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
2025-02-25 09:10:14 +08:00
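The error class here is a build-backend version floor: setuptools-scm declares setuptools>=61, so the image must satisfy that before anything builds. A minimal sketch, assuming pip is the installer used in the Dockerfile:

```dockerfile
# Sketch: upgrade setuptools past the floor that setuptools-scm declares.
RUN pip install --no-cache-dir --upgrade "setuptools>=61" setuptools-scm
```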
Wang, Jian4
4f2f92afa3
Update inference-cpp docker ( #12882 )
* remove unused run.py
* add WORKDIR /llm
2025-02-24 14:32:44 +08:00
Shaojun Liu
afad979168
Add Apache 2.0 License Information in Dockerfile to Comply with OSPDT Requirements ( #12878 )
* OSPDT: add Header for Dockerfile
* OSPDT: add Header for Dockerfile
* OSPDT: add Header for Dockerfile
* OSPDT: add Header for Dockerfile
2025-02-24 14:00:46 +08:00
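For reference, the OSPDT-style header these commits add is the standard Apache-2.0 comment block. The sketch below uses a hypothetical copyright holder and year; the real header text is in the Dockerfiles themselves.

```dockerfile
# Sketch of the header; the holder and year below are hypothetical.
#
# Copyright 2025 The IPEX-LLM Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
```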
Wang, Jian4
e1809a6295
Update multimodal on vllm 0.6.6 ( #12816 )
* add glm4v and minicpmv example
* fix
2025-02-19 10:04:42 +08:00
Shaojun Liu
f7b5a093a7
Merge CPU & XPU Dockerfiles with Serving Images and Refactor ( #12815 )
* Update Dockerfile
* Update Dockerfile
* Ensure scripts are executable
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* update
* Update Dockerfile
* remove inference-cpu and inference-xpu
* update README
2025-02-17 14:23:22 +08:00
Wang, Jian4
1083fe5508
Reenable pp and lightweight-serving on 0.6.6 ( #12814 )
* reenable pp and lightweight-serving on 0.6.6
* update readme
* update
* update tag
2025-02-13 10:16:00 +08:00
Guancheng Fu
af693425f1
Upgrade to vLLM 0.6.6 ( #12796 )
* init
* update engine init
* fix serving load_in_low_bit problem
* temp
* temp
* temp
* temp
* temp
* fix
* fixed
* done
* fix
* fix all arguments
* fix
* fix throughput script
* fix
* fix
* use official ipex-llm
* Fix readme
* fix
---------
Co-authored-by: hzjane <a1015616934@qq.com>
2025-02-12 16:47:51 +08:00
Shaojun Liu
bd815a4d96
Update the base image of the inference-cpp image to oneAPI 2025.0.2 ( #12802 )
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
2025-02-12 14:15:08 +08:00
Xiangyu Tian
9e9b6c9f2b
Fix cpu serving docker image ( #12783 )
2025-02-07 11:12:42 +08:00
Xiangyu Tian
f924880694
vLLM: Fix vLLM-CPU docker image ( #12741 )
2025-01-24 10:00:29 +08:00
Xiangyu Tian
c9b6c94a59
vLLM: Update vLLM-cpu to v0.6.6-post1 ( #12728 )
2025-01-22 15:03:01 +08:00
Wang, Jian4
716d4fe563
Add vllm 0.6.2 vision offline example ( #12721 )
* add vision offline example
* add to docker
2025-01-21 09:58:01 +08:00
Shaojun Liu
2673792de6
Update Dockerfile ( #12688 )
2025-01-10 09:01:29 +08:00
Shaojun Liu
28737c250c
Update Dockerfile ( #12585 )
2024-12-26 10:20:52 +08:00
Shaojun Liu
51ff9ebd8a
Upgrade oneccl version to 0.0.6.3 ( #12560 )
* Update Dockerfile
* Update Dockerfile
* Update start-vllm-service.sh
2024-12-20 09:29:16 +08:00
Shaojun Liu
429bf1ffeb
Change: Use CN mirror for PyTorch extension installation to resolve network issues ( #12559 )
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
2024-12-17 14:22:50 +08:00
Heyang Sun
fa261b8af1
torch 2.3 inference docker ( #12517 )
* torch 2.3 inference docker
* Update README.md
* add convert code
* rename image
* remove 2.1 and add graph example
* Update README.md
2024-12-13 10:47:04 +08:00
Wang, Jian4
922958c018
vllm oneccl upgrade to b9 ( #12520 )
2024-12-10 15:02:56 +08:00
Guancheng Fu
8331875f34
Fix ( #12390 )
2024-11-27 10:41:58 +08:00
Pepijn de Vos
71e1f11aa6
update serving image runtime ( #12433 )
2024-11-26 14:55:30 +08:00
Shaojun Liu
c089b6c10d
Update English prompt to 34k ( #12429 )
2024-11-22 11:20:35 +08:00
Wang, Jian4
1bfcbc0640
Add multimodal benchmark ( #12415 )
* add benchmark multimodal
* update
* update
* update
2024-11-20 14:21:13 +08:00
Guancheng Fu
d6057f6dd2
Update benchmark_vllm_throughput.py ( #12414 )
2024-11-19 10:41:43 +08:00
Xu, Shuo
6726b198fd
Update readme & doc for the vllm upgrade to v0.6.2 ( #12399 )
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-14 10:28:15 +08:00
Shaojun Liu
27152476e1
minor fix ( #12389 )
2024-11-12 22:36:43 +08:00
Xu, Shuo
dd8964ba9c
changed inference-cpp/Dockerfile ( #12386 )
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
2024-11-12 20:40:21 +08:00
Guancheng Fu
0ee54fc55f
Upgrade to vllm 0.6.2 ( #12338 )
* Initial updates for vllm 0.6.2
* fix
* Change Dockerfile to support v0.6.2
* Fix
* fix examples
* Fix
* done
* fix
* Update engine.py
* Fix Dockerfile to original path
* fix
* add option
* fix
* fix
* fix
* fix
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-11-12 20:35:34 +08:00
Jun Wang
4376fdee62
Decouple the openwebui and the ollama in the inference-cpp-xpu Dockerfile ( #12382 )
* remove the openwebui in inference-cpp-xpu dockerfile
* update docker_cpp_xpu_quickstart.md
* add sample output in inference-cpp/readme
* remove the openwebui in main readme
* remove the openwebui in main readme
2024-11-12 20:15:23 +08:00
Wang, Jian4
85c9279e6e
Update llama-cpp docker usage ( #12387 )
2024-11-12 15:30:17 +08:00
Shaojun Liu
c92d76b997
Update oneccl-binding.patch ( #12377 )
* Add files via upload
* upload oneccl-binding.patch
* Update Dockerfile
2024-11-11 22:34:08 +08:00
Shaojun Liu
fad15c8ca0
Update fastchat demo script ( #12367 )
* Update README.md
* Update vllm_docker_quickstart.md
2024-11-08 15:42:17 +08:00
Xu, Shuo
ce0c6ae423
Update Readme for FastChat docker demo ( #12354 )
* update Readme for FastChat docker demo
* update readme
* add 'Serving with FastChat' part in docs
* polish docs
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-07 15:22:42 +08:00
Xu, Shuo
899a30331a
Replace gradio_web_server.patch to adjust webui ( #12329 )
* replace gradio_web_server.patch to adjust webui
* fix patch problem
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-06 09:16:32 +08:00
Jun Wang
3700e81977
[fix] vllm-online-benchmark first token latency error ( #12271 )
2024-10-29 17:54:36 +08:00
Guancheng Fu
67014cb29f
Add benchmark_latency.py to docker serving image ( #12283 )
2024-10-28 16:19:59 +08:00
Shaojun Liu
48fc63887d
use oneccl 0.0.5.1 ( #12262 )
2024-10-24 16:12:24 +08:00
Jun Wang
b10fc892e1
Update the reference link in xpu/docker/readme.md ( #12188 )
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [ADD] add prefix caching experiment and result
* [REMOVE] rm cpu offloading chapter
* [UPDATE] update the link to new vllm-docker-quickstart
2024-10-18 13:18:08 +08:00
Shaojun Liu
7825dc1398
Upgrade oneccl to 0.0.5 ( #12223 )
2024-10-18 09:29:19 +08:00
Shaojun Liu
26390f9213
Update oneccl_wks_installer to 2024.0.0.4.1 ( #12217 )
2024-10-17 10:11:55 +08:00
Shaojun Liu
49eb20613a
add --blocksize to doc and script ( #12187 )
2024-10-12 09:17:42 +08:00
Shaojun Liu
1daab4531f
Upgrade oneccl to 0.0.4 in serving-xpu image ( #12185 )
* Update oneccl to 0.0.4
* upgrade transformers to 4.44.2
2024-10-11 16:54:50 +08:00
Shaojun Liu
657889e3e4
use English prompt by default ( #12115 )
2024-09-24 17:40:50 +08:00
Guancheng Fu
b36359e2ab
Fix xpu serving image oneccl ( #12100 )
2024-09-20 15:25:41 +08:00
Guancheng Fu
a6cbc01911
Use new oneccl for ipex-llm serving image ( #12097 )
2024-09-20 14:52:49 +08:00
Shaojun Liu
1295898830
update vllm_online_benchmark script to support long input ( #12095 )
* update vllm_online_benchmark script to support long input
* update guide
2024-09-20 14:18:30 +08:00
Xiangyu Tian
c2774e1a43
Update oneccl to 0.0.3 in serving-xpu image ( #12088 )
2024-09-18 14:29:17 +08:00
Shaojun Liu
beb876665d
pin gradio version to fix connection error ( #12069 )
2024-09-12 14:36:09 +08:00