Shaojun Liu
|
7810b8fb49
|
OSPDT: update dockerfile header (#12908)
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
|
2025-03-03 09:59:11 +08:00 |
|
Shaojun Liu
|
5c100ac105
|
Add ENTRYPOINT to Dockerfile to auto-start vllm service on container launch (for CVTE customer) (#12901)
* Add ENTRYPOINT to Dockerfile to auto-start service on container launch (for CVTE client)
* Update start-vllm-service.sh
* Update README.md
* Update README.md
* Update start-vllm-service.sh
* Update README.md
|
2025-02-27 17:33:58 +08:00 |
|
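The ENTRYPOINT change above wires the vLLM service into container startup. A minimal sketch of what such a Dockerfile fragment could look like, assuming the `start-vllm-service.sh` script named in the commit (the actual paths and base image are not shown in this log):

```dockerfile
# Sketch only, not the actual ipex-llm Dockerfile.
# Copy the service launcher into the image and make it the container
# entrypoint, so `docker run` starts the vLLM service automatically.
COPY start-vllm-service.sh /llm/
RUN chmod +x /llm/start-vllm-service.sh
ENTRYPOINT ["/bin/bash", "/llm/start-vllm-service.sh"]
```

With an ENTRYPOINT in place, no explicit command is needed at `docker run` time; arguments passed to `docker run` are forwarded to the script.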
Xiangyu Tian
|
ae9f5320da
|
vLLM CPU: Fix Triton Version to Resolve Related Error (#12893)
|
2025-02-25 15:00:41 +08:00 |
|
Shaojun Liu
|
dd30d12cb6
|
Fix serving-cpu image: setuptools-scm requires setuptools>=61 (#12876)
* setuptools-scm requires setuptools>=61
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
|
2025-02-25 09:10:14 +08:00 |
|
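The fix above raises the build-dependency floor: setuptools-scm requires setuptools>=61 (the release that added PEP 621 metadata support). A hedged sketch of the kind of Dockerfile line such a fix involves (the exact package list is an assumption):

```dockerfile
# Sketch only: upgrade setuptools before anything pulls in
# setuptools-scm, which fails against setuptools older than 61.
RUN pip install --upgrade "setuptools>=61" && \
    pip install setuptools-scm
```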
Shaojun Liu
|
afad979168
|
Add Apache 2.0 License Information in Dockerfile to Comply with OSPDT Requirements (#12878)
* OSPDT: add Header for Dockerfile
* OSPDT: add Header for Dockerfile
* OSPDT: add Header for Dockerfile
* OSPDT: add Header for Dockerfile
|
2025-02-24 14:00:46 +08:00 |
|
Wang, Jian4
|
e1809a6295
|
Update multimodal on vllm 0.6.6 (#12816)
* add glm4v and minicpmv example
* fix
|
2025-02-19 10:04:42 +08:00 |
|
Shaojun Liu
|
f7b5a093a7
|
Merge CPU & XPU Dockerfiles with Serving Images and Refactor (#12815)
* Update Dockerfile
* Update Dockerfile
* Ensure scripts are executable
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* update
* Update Dockerfile
* remove inference-cpu and inference-xpu
* update README
|
2025-02-17 14:23:22 +08:00 |
|
Wang, Jian4
|
1083fe5508
|
Reenable pp and lightweight-serving serving on 0.6.6 (#12814)
* reenable pp and lightweight serving on 0.6.6
* update readme
* update
* update tag
|
2025-02-13 10:16:00 +08:00 |
|
Guancheng Fu
|
af693425f1
|
Upgrade to vLLM 0.6.6 (#12796)
* init
* update engine init
* fix serving load_in_low_bit problem
* temp
* temp
* temp
* temp
* temp
* fix
* fixed
* done
* fix
* fix all arguments
* fix
* fix throughput script
* fix
* fix
* use official ipex-llm
* Fix readme
* fix
---------
Co-authored-by: hzjane <a1015616934@qq.com>
|
2025-02-12 16:47:51 +08:00 |
|
Xiangyu Tian
|
9e9b6c9f2b
|
Fix cpu serving docker image (#12783)
|
2025-02-07 11:12:42 +08:00 |
|
Xiangyu Tian
|
f924880694
|
vLLM: Fix vLLM-CPU docker image (#12741)
|
2025-01-24 10:00:29 +08:00 |
|
Xiangyu Tian
|
c9b6c94a59
|
vLLM: Update vLLM-cpu to v0.6.6-post1 (#12728)
Update vLLM-cpu to v0.6.6-post1
|
2025-01-22 15:03:01 +08:00 |
|
Wang, Jian4
|
716d4fe563
|
Add vllm 0.6.2 vision offline example (#12721)
* add vision offline example
* add to docker
|
2025-01-21 09:58:01 +08:00 |
|
Shaojun Liu
|
28737c250c
|
Update Dockerfile (#12585)
|
2024-12-26 10:20:52 +08:00 |
|
Shaojun Liu
|
51ff9ebd8a
|
Upgrade oneccl version to 0.0.6.3 (#12560)
* Update Dockerfile
* Update Dockerfile
* Update start-vllm-service.sh
|
2024-12-20 09:29:16 +08:00 |
|
Shaojun Liu
|
429bf1ffeb
|
Change: Use cn mirror for PyTorch extension installation to resolve network issues (#12559)
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
|
2024-12-17 14:22:50 +08:00 |
|
Wang, Jian4
|
922958c018
|
vllm oneccl upgrade to b9 (#12520)
|
2024-12-10 15:02:56 +08:00 |
|
Guancheng Fu
|
8331875f34
|
Fix (#12390)
|
2024-11-27 10:41:58 +08:00 |
|
Pepijn de Vos
|
71e1f11aa6
|
update serving image runtime (#12433)
|
2024-11-26 14:55:30 +08:00 |
|
Shaojun Liu
|
c089b6c10d
|
Update english prompt to 34k (#12429)
|
2024-11-22 11:20:35 +08:00 |
|
Wang, Jian4
|
1bfcbc0640
|
Add multimodal benchmark (#12415)
* add benchmark multimodal
* update
* update
* update
|
2024-11-20 14:21:13 +08:00 |
|
Guancheng Fu
|
d6057f6dd2
|
Update benchmark_vllm_throughput.py (#12414)
|
2024-11-19 10:41:43 +08:00 |
|
Xu, Shuo
|
6726b198fd
|
Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
|
2024-11-14 10:28:15 +08:00 |
|
Guancheng Fu
|
0ee54fc55f
|
Upgrade to vllm 0.6.2 (#12338)
* Initial updates for vllm 0.6.2
* fix
* Change Dockerfile to support v062
* Fix
* fix examples
* Fix
* done
* fix
* Update engine.py
* Fix Dockerfile to original path
* fix
* add option
* fix
* fix
* fix
* fix
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
|
2024-11-12 20:35:34 +08:00 |
|
Shaojun Liu
|
c92d76b997
|
Update oneccl-binding.patch (#12377)
* Add files via upload
* upload oneccl-binding.patch
* Update Dockerfile
|
2024-11-11 22:34:08 +08:00 |
|
Shaojun Liu
|
fad15c8ca0
|
Update fastchat demo script (#12367)
* Update README.md
* Update vllm_docker_quickstart.md
|
2024-11-08 15:42:17 +08:00 |
|
Xu, Shuo
|
ce0c6ae423
|
Update Readme for FastChat docker demo (#12354)
* update Readme for FastChat docker demo
* update readme
* add 'Serving with FastChat' part in docs
* polish docs
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
|
2024-11-07 15:22:42 +08:00 |
|
Xu, Shuo
|
899a30331a
|
Replace gradio_web_server.patch to adjust webui (#12329)
* replace gradio_web_server.patch to adjust webui
* fix patch problem
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
|
2024-11-06 09:16:32 +08:00 |
|
Jun Wang
|
3700e81977
|
[fix] vllm-online-benchmark first token latency error (#12271)
|
2024-10-29 17:54:36 +08:00 |
|
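The first-token-latency fix above concerns where a streaming benchmark takes its timestamp. The actual vllm-online-benchmark script is not reproduced in this log; the toy Python sketch below (all names hypothetical) shows the measurement pattern the fix implies: time-to-first-token (TTFT) must be stamped when the first streamed chunk arrives, not after the whole response is consumed.

```python
import time

def stream_tokens(n_tokens, first_token_delay=0.05, inter_token_delay=0.01):
    """Toy generator standing in for a streaming completion endpoint."""
    time.sleep(first_token_delay)  # simulated prefill cost before token 0
    yield "tok0"
    for i in range(1, n_tokens):
        time.sleep(inter_token_delay)  # simulated decode cost per token
        yield f"tok{i}"

def measure_latency(stream):
    """Return (TTFT, total latency, token count) for a token stream."""
    start = time.perf_counter()
    first_token_time = None
    count = 0
    for _ in stream:
        if first_token_time is None:
            # Stamp TTFT on the FIRST chunk, not at end of stream.
            first_token_time = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first_token_time, total, count

ttft, total, n = measure_latency(stream_tokens(5))
print(f"TTFT={ttft:.3f}s total={total:.3f}s tokens={n}")
```

Stamping TTFT at the end of the loop instead would report total latency as first-token latency, which is the class of error the commit title describes.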
Guancheng Fu
|
67014cb29f
|
Add benchmark_latency.py to docker serving image (#12283)
|
2024-10-28 16:19:59 +08:00 |
|
Shaojun Liu
|
48fc63887d
|
use oneccl 0.0.5.1 (#12262)
|
2024-10-24 16:12:24 +08:00 |
|
Jun Wang
|
b10fc892e1
|
Update new reference link of xpu/docker/readme.md (#12188)
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [ADD] add prefix caching experiment and result
* [REMOVE] rm cpu offloading chapter
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [UPDATE] update the link to new vllm-docker-quickstart
|
2024-10-18 13:18:08 +08:00 |
|
Shaojun Liu
|
7825dc1398
|
Upgrade oneccl to 0.0.5 (#12223)
|
2024-10-18 09:29:19 +08:00 |
|
Shaojun Liu
|
26390f9213
|
Update oneccl_wks_installer to 2024.0.0.4.1 (#12217)
|
2024-10-17 10:11:55 +08:00 |
|
Shaojun Liu
|
49eb20613a
|
add --blocksize to doc and script (#12187)
|
2024-10-12 09:17:42 +08:00 |
|
Shaojun Liu
|
1daab4531f
|
Upgrade oneccl to 0.0.4 in serving-xpu image (#12185)
* Update oneccl to 0.0.4
* upgrade transformers to 4.44.2
|
2024-10-11 16:54:50 +08:00 |
|
Shaojun Liu
|
657889e3e4
|
use english prompt by default (#12115)
|
2024-09-24 17:40:50 +08:00 |
|
Guancheng Fu
|
b36359e2ab
|
Fix xpu serving image oneccl (#12100)
|
2024-09-20 15:25:41 +08:00 |
|
Guancheng Fu
|
a6cbc01911
|
Use new oneccl for ipex-llm serving image (#12097)
|
2024-09-20 14:52:49 +08:00 |
|
Shaojun Liu
|
1295898830
|
update vllm_online_benchmark script to support long input (#12095)
* update vllm_online_benchmark script to support long input
* update guide
|
2024-09-20 14:18:30 +08:00 |
|
Xiangyu Tian
|
c2774e1a43
|
Update oneccl to 0.0.3 in serving-xpu image (#12088)
|
2024-09-18 14:29:17 +08:00 |
|
Shaojun Liu
|
beb876665d
|
pin gradio version to fix connection error (#12069)
|
2024-09-12 14:36:09 +08:00 |
|
Shaojun Liu
|
7e1e51d91a
|
Update vllm setting (#12059)
* revert
* update
* update
* update
|
2024-09-11 11:45:08 +08:00 |
|
Shaojun Liu
|
52863dd567
|
fix vllm_online_benchmark.py (#12056)
|
2024-09-11 09:45:30 +08:00 |
|
Guancheng Fu
|
69c8d36f16
|
Switching from vLLM v0.3.3 to vLLM 0.5.4 (#12042)
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* Remove duplicate layer
* LLM: Update vLLM to v0.5.4 (#11746)
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* update 0.5.4 api_server
* add dockerfile
* fix
* fix
* refine
* fix
---------
Co-authored-by: gc-fu <guancheng.fu@intel.com>
* Add vllm-0.5.4 Dockerfile (#11838)
* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)
* Fix vLLM not convert issues (#11817) (#11918)
* Fix not convert issues
* refine
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969)
* init
* update mlp forward
* fix minicpm error in vllm 0.5.4
* fix dependabot alerts (#12008)
* Update 0.5.4 dockerfile (#12021)
* Add vllm awq loading logic (#11987)
* [ADD] Add vllm awq loading logic
* [FIX] fix the module.linear_method path
* [FIX] fix quant_config path error
* Enable Qwen padding mlp to 256 to support batch_forward (#12030)
* Enable padding mlp
* padding to 256
* update style
* Install 27191 runtime in 0.5.4 docker image (#12040)
* fix rebase error
* fix rebase error
* vLLM: format for 0.5.4 rebase (#12043)
* format
* Update model_convert.py
* Fix serving docker related modifications (#12046)
* Fix undesired modifications (#12048)
* fix
* Refine offline_inference arguments
---------
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: Jun Wang <thoughts.times@gmail.com>
Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
|
2024-09-10 15:37:43 +08:00 |
|
Shaojun Liu
|
1e8c87050f
|
fix model path (#11973)
|
2024-08-30 13:28:28 +08:00 |
|
Shaojun Liu
|
4cf640c548
|
update docker image tag to 2.2.0-SNAPSHOT (#11904)
|
2024-08-23 13:57:41 +08:00 |
|
Wang, Jian4
|
b119825152
|
Remove tgi parameter validation (#11688)
* remove validation
* add min warm up
* remove no need source
|
2024-07-30 16:37:44 +08:00 |
|
Guancheng Fu
|
86fc0492f4
|
Update oneccl used (#11647)
* Add internal oneccl
* fix
* fix
* add oneccl
|
2024-07-26 09:38:39 +08:00 |
|
Wang, Jian4
|
1eed0635f2
|
Add lightweight serving and support tgi parameter (#11600)
* init tgi request
* update openai api
* update for pp
* update and add readme
* add to docker
* add start bash
* update
* update
* update
|
2024-07-19 13:15:56 +08:00 |
|