Commit graph

17 commits

Author SHA1 Message Date
Wang, Jian4
1bfcbc0640
Add multimodal benchmark (#12415)
* add benchmark multimodal

* update

* update

* update
2024-11-20 14:21:13 +08:00
Xu, Shuo
6726b198fd
Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-14 10:28:15 +08:00
Shaojun Liu
fad15c8ca0
Update fastchat demo script (#12367)
* Update README.md

* Update vllm_docker_quickstart.md
2024-11-08 15:42:17 +08:00
Xu, Shuo
ce0c6ae423
Update Readme for FastChat docker demo (#12354)
* update Readme for FastChat docker demo

* update readme

* add 'Serving with FastChat' part in docs

* polish docs

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-07 15:22:42 +08:00
Jun Wang
b10fc892e1
Update new reference link of xpu/docker/readme.md (#12188)
* [ADD] rewrite new vllm docker quick start

* [ADD] lora adapter doc finished

* [ADD] multi lora adapter test successfully

* [ADD] add ipex-llm quantization doc

* [Merge] rebase main

* [REMOVE] rm tmp file

* [Merge] rebase main

* [ADD] add prefix caching experiment and result

* [REMOVE] rm cpu offloading chapter

* [ADD] rewrite new vllm docker quick start

* [ADD] lora adapter doc finished

* [ADD] multi lora adapter test successfully

* [ADD] add ipex-llm quantization doc

* [Merge] rebase main

* [REMOVE] rm tmp file

* [Merge] rebase main

* [ADD] rewrite new vllm docker quick start

* [ADD] lora adapter doc finished

* [ADD] multi lora adapter test successfully

* [ADD] add ipex-llm quantization doc

* [Merge] rebase main

* [REMOVE] rm tmp file

* [Merge] rebase main

* [UPDATE] update the link to new vllm-docker-quickstart
2024-10-18 13:18:08 +08:00
Shaojun Liu
1295898830
update vllm_online_benchmark script to support long input (#12095)
* update vllm_online_benchmark script to support long input

* update guide
2024-09-20 14:18:30 +08:00
Shaojun Liu
4cf640c548
update docker image tag to 2.2.0-SNAPSHOT (#11904)
2024-08-23 13:57:41 +08:00
Wang, Jian4
1eed0635f2
Add lightweight serving and support tgi parameter (#11600)
* init tgi request

* update openai api

* update for pp

* update and add readme

* add to docker

* add start bash

* update

* update

* update
2024-07-19 13:15:56 +08:00
Xiangyu Tian
7f5111a998
LLM: Refine start script for Pipeline Parallel Serving (#11557)
Refine start script and readme for Pipeline Parallel Serving
2024-07-11 15:45:27 +08:00
Wang, Jian4
e000ac90c4
Add pp_serving example to serving image (#11433)
* init pp

* update

* update

* no clone ipex-llm again
2024-06-28 16:45:25 +08:00
Wang, Jian4
b7bc1023fb
Add vllm_online_benchmark.py (#11458)
* init

* update and add

* update
2024-06-28 14:59:06 +08:00
Guancheng Fu
7e29928865
refactor serving docker image (#11028)
2024-05-16 09:30:36 +08:00
Guancheng Fu
2c64754eb0
Add vLLM to ipex-llm serving image (#10807)
* add vllm

* done

* doc work

* fix done

* temp

* add docs

* format

* add start-fastchat-service.sh

* fix
2024-04-29 17:25:42 +08:00
Shaojun Liu
59058bb206
replace 2.5.0-SNAPSHOT with 2.1.0-SNAPSHOT for llm docker images (#10603)
2024-04-01 09:58:51 +08:00
Wang, Jian4
e2d25de17d
Update_docker by heyang (#29)
2024-03-25 10:05:46 +08:00
Shaojun Liu
0e5ab5ebfc
update docker tag to 2.5.0-SNAPSHOT (#9443)
2023-11-13 16:53:40 +08:00
Guancheng Fu
cc84ed70b3
Create serving images (#9048)
* Finished & Tested

* Install latest pip from base images

* Add blank line

* Delete unused comment

* fix typos
2023-09-25 15:51:45 +08:00