Commit graph

164 commits

Author SHA1 Message Date
Wang, Jian4
1bfcbc0640
Add multimodal benchmark (#12415)
* add multimodal benchmark

* update

* update

* update
2024-11-20 14:21:13 +08:00
Guancheng Fu
d6057f6dd2
Update benchmark_vllm_throughput.py (#12414) 2024-11-19 10:41:43 +08:00
Xu, Shuo
6726b198fd
Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-14 10:28:15 +08:00
Shaojun Liu
27152476e1
minor fix (#12389) 2024-11-12 22:36:43 +08:00
Xu, Shuo
dd8964ba9c
changed inference-cpp/Dockerfile (#12386)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
2024-11-12 20:40:21 +08:00
Guancheng Fu
0ee54fc55f
Upgrade to vllm 0.6.2 (#12338)
* Initial updates for vllm 0.6.2

* fix

* Change Dockerfile to support v0.6.2

* Fix

* fix examples

* Fix

* done

* fix

* Update engine.py

* Restore Dockerfile to original path

* fix

* add option

* fix

* fix

* fix

* fix

---------

Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-11-12 20:35:34 +08:00
Jun Wang
4376fdee62
Decouple openwebui and ollama in inference-cpp-xpu dockerfile (#12382)
* remove the openwebui in inference-cpp-xpu dockerfile

* update docker_cpp_xpu_quickstart.md

* add sample output in inference-cpp/readme

* remove the openwebui in main readme
2024-11-12 20:15:23 +08:00
Wang, Jian4
85c9279e6e
Update llama-cpp docker usage (#12387) 2024-11-12 15:30:17 +08:00
Shaojun Liu
c92d76b997
Update oneccl-binding.patch (#12377)
* Add files via upload

* upload oneccl-binding.patch

* Update Dockerfile
2024-11-11 22:34:08 +08:00
Shaojun Liu
fad15c8ca0
Update fastchat demo script (#12367)
* Update README.md

* Update vllm_docker_quickstart.md
2024-11-08 15:42:17 +08:00
Xu, Shuo
ce0c6ae423
Update Readme for FastChat docker demo (#12354)
* update Readme for FastChat docker demo

* update readme

* add 'Serving with FastChat' part in docs

* polish docs

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-07 15:22:42 +08:00
Xu, Shuo
899a30331a
Replace gradio_web_server.patch to adjust webui (#12329)
* replace gradio_web_server.patch to adjust webui

* fix patch problem

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-06 09:16:32 +08:00
Jun Wang
3700e81977
[fix] vllm-online-benchmark first token latency error (#12271) 2024-10-29 17:54:36 +08:00
Guancheng Fu
67014cb29f
Add benchmark_latency.py to docker serving image (#12283) 2024-10-28 16:19:59 +08:00
Shaojun Liu
48fc63887d
use oneccl 0.0.5.1 (#12262) 2024-10-24 16:12:24 +08:00
Jun Wang
b10fc892e1
Update new reference link of xpu/docker/readme.md (#12188)
* [ADD] rewrite new vllm docker quick start

* [ADD] lora adapter doc finished

* [ADD] multi lora adapter test successfully

* [ADD] add ipex-llm quantization doc

* [Merge] rebase main

* [REMOVE] rm tmp file

* [Merge] rebase main

* [ADD] add prefix caching experiment and result

* [REMOVE] rm cpu offloading chapter

* [UPDATE] update the link to new vllm-docker-quickstart
2024-10-18 13:18:08 +08:00
Shaojun Liu
7825dc1398
Upgrade oneccl to 0.0.5 (#12223) 2024-10-18 09:29:19 +08:00
Shaojun Liu
26390f9213
Update oneccl_wks_installer to 2024.0.0.4.1 (#12217) 2024-10-17 10:11:55 +08:00
Shaojun Liu
49eb20613a
add --blocksize to doc and script (#12187) 2024-10-12 09:17:42 +08:00
Shaojun Liu
1daab4531f
Upgrade oneccl to 0.0.4 in serving-xpu image (#12185)
* Update oneccl to 0.0.4

* upgrade transformers to 4.44.2
2024-10-11 16:54:50 +08:00
Shaojun Liu
657889e3e4
use english prompt by default (#12115) 2024-09-24 17:40:50 +08:00
Guancheng Fu
b36359e2ab
Fix xpu serving image oneccl (#12100) 2024-09-20 15:25:41 +08:00
Guancheng Fu
a6cbc01911
Use new oneccl for ipex-llm serving image (#12097) 2024-09-20 14:52:49 +08:00
Shaojun Liu
1295898830
update vllm_online_benchmark script to support long input (#12095)
* update vllm_online_benchmark script to support long input

* update guide
2024-09-20 14:18:30 +08:00
Xiangyu Tian
c2774e1a43
Update oneccl to 0.0.3 in serving-xpu image (#12088) 2024-09-18 14:29:17 +08:00
Shaojun Liu
beb876665d
pin gradio version to fix connection error (#12069) 2024-09-12 14:36:09 +08:00
Shaojun Liu
7e1e51d91a
Update vllm setting (#12059)
* revert

* update

* update

* update
2024-09-11 11:45:08 +08:00
Shaojun Liu
52863dd567
fix vllm_online_benchmark.py (#12056) 2024-09-11 09:45:30 +08:00
Guancheng Fu
69c8d36f16
Switching from vLLM v0.3.3 to vLLM v0.5.4 (#12042)
* Enable single card sync engine

* enable ipex-llm optimizations for vllm

* enable optimizations for lm_head

* Fix chatglm multi-reference problem

* Remove duplicate layer

* LLM: Update vLLM to v0.5.4 (#11746)

* update 0.5.4 api_server

* add dockerfile

* fix

* fix

* refine

* fix

---------

Co-authored-by: gc-fu <guancheng.fu@intel.com>

* Add vllm-0.5.4 Dockerfile (#11838)

* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)

* Fix vLLM not-convert issues (#11817) (#11918)

* Fix not-convert issues

* refine

Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>

* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969)

* init

* update mlp forward

* fix minicpm error in vllm 0.5.4

* fix dependabot alerts (#12008)

* Update 0.5.4 dockerfile (#12021)

* Add vllm awq loading logic (#11987)

* [ADD] Add vllm awq loading logic

* [FIX] fix the module.linear_method path

* [FIX] fix quant_config path error

* Enable Qwen padding mlp to 256 to support batch_forward (#12030)

* Enable padding mlp

* padding to 256

* update style

* Install 27191 runtime in 0.5.4 docker image (#12040)

* fix rebase error

* fix rebase error

* vLLM: format for 0.5.4 rebase (#12043)

* format

* Update model_convert.py

* Fix serving docker related modifications (#12046)

* Fix undesired modifications (#12048)

* fix

* Refine offline_inference arguments

---------

Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: Jun Wang <thoughts.times@gmail.com>
Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
2024-09-10 15:37:43 +08:00
Shaojun Liu
1e8c87050f
fix model path (#11973) 2024-08-30 13:28:28 +08:00
Shaojun Liu
23f51f87f0
update tag to 2.2.0-SNAPSHOT (#11947) 2024-08-28 09:20:32 +08:00
Shaojun Liu
4cf640c548
update docker image tag to 2.2.0-SNAPSHOT (#11904) 2024-08-23 13:57:41 +08:00
Wang, Jian4
b119825152
Remove tgi parameter validation (#11688)
* remove validation

* add min warm up

* remove no need source
2024-07-30 16:37:44 +08:00
Guancheng Fu
86fc0492f4
Update oneccl used (#11647)
* Add internal oneccl

* fix

* fix

* add oneccl
2024-07-26 09:38:39 +08:00
Wang, Jian4
1eed0635f2
Add lightweight serving and support tgi parameter (#11600)
* init tgi request

* update openai api

* update for pp

* update and add readme

* add to docker

* add start bash

* update

* update

* update
2024-07-19 13:15:56 +08:00
Wang, Jian4
9c15abf825
Refactor fastapi-serving and add one-card serving (#11581)
* init fastapi-serving one card

* mv api code to source

* update worker

* update for style-check

* add worker

* update bash

* update

* update worker name and add readme

* rename update

* rename to fastapi
2024-07-17 11:12:43 +08:00
Xiangyu Tian
7f5111a998
LLM: Refine start script for Pipeline Parallel Serving (#11557)
Refine start script and readme for Pipeline Parallel Serving
2024-07-11 15:45:27 +08:00
binbin Deng
66f6ffe4b2
Update GPU HF-Transformers example structure (#11526) 2024-07-08 17:58:06 +08:00
Shaojun Liu
72b4efaad4
Enhanced XPU Dockerfiles: Optimized Environment Variables and Documentation (#11506)
* Added SYCL_CACHE_PERSISTENT=1 to xpu Dockerfile

* Update the document to add explanations for environment variables.

* update quickstart
2024-07-04 20:18:38 +08:00
Guancheng Fu
4fbb0d33ae
Pin compute runtime version for xpu images (#11479)
* pin compute runtime version

* fix done
2024-07-01 21:41:02 +08:00
Wang, Jian4
e000ac90c4
Add pp_serving example to serving image (#11433)
* init pp

* update

* update

* do not clone ipex-llm again
2024-06-28 16:45:25 +08:00
Wang, Jian4
b7bc1023fb
Add vllm_online_benchmark.py (#11458)
* init

* update and add

* update
2024-06-28 14:59:06 +08:00
Shaojun Liu
5aa3e427a9
Fix docker images (#11362)
* Fix docker images

* add-apt-repository requires gnupg, gpg-agent, software-properties-common

* update

* avoid importing ipex again
2024-06-20 15:44:55 +08:00
Xiangyu Tian
ef9f740801
Docs: Fix CPU Serving Docker README (#11351)
Fix CPU Serving Docker README
2024-06-18 16:27:51 +08:00
Guancheng Fu
c9b4cadd81
fix vLLM/docker issues (#11348)
* fix

* fix

* fix
2024-06-18 16:23:53 +08:00
Qiyuan Gong
de4bb97b4f
Remove accelerate 0.23.0 install command in readme and docker (#11333)
* ipex-llm's accelerate has been upgraded to 0.23.0; remove the accelerate 0.23.0 install command from the README and docker.
2024-06-17 17:52:12 +08:00
Shaojun Liu
77809be946
Install packages for ipex-llm-serving-cpu docker image (#11321)
* apt-get install patch

* Update Dockerfile

* Update Dockerfile

* revert
2024-06-14 15:26:01 +08:00
Shaojun Liu
9760ffc256
Fix SDLe CT222 Vulnerabilities (#11237)
* fix ct222 vuln

* update

* fix

* update ENTRYPOINT

* revert ENTRYPOINT

* Fix CT222 Vulns

* fix

* revert changes

* fix

* revert

* add sudo permission to ipex-llm user

* do not use ipex-llm user
2024-06-13 15:31:22 +08:00
Shaojun Liu
84f04087fb
Add intelanalytics/ipex-llm:sources image for OSPDT (#11296)
* Add intelanalytics/ipex-llm:sources image

* apt-get source
2024-06-13 14:29:14 +08:00
Guancheng Fu
2e75bbccf9
Add more control arguments for benchmark_vllm_throughput (#11291) 2024-06-12 17:43:06 +08:00