13 commits

Author SHA1 Message Date
Shaojun Liu
015a4c8c43
Add CPU and GPU Frequency Locking Instructions to Documentation (#12947) 2025-03-07 09:20:40 +08:00
Shaojun Liu
f81d89d908
Remove Unnecessary --privileged Flag While Keeping It for WSL Users (#12920) 2025-03-03 11:11:42 +08:00
Jun Wang
cb7b08948b
update vllm-docker-quick-start for vllm0.6.2 (#12392)
* update vllm-docker-quick-start for vllm0.6.2

* [UPDATE] rm max-num-seqs parameter in vllm-serving script
2024-11-27 08:47:03 +08:00
Xu, Shuo
6726b198fd
Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-14 10:28:15 +08:00
Shaojun Liu
fad15c8ca0
Update fastchat demo script (#12367)
* Update README.md

* Update vllm_docker_quickstart.md
2024-11-08 15:42:17 +08:00
Xu, Shuo
ce0c6ae423
Update Readme for FastChat docker demo (#12354)
* update Readme for FastChat docker demo

* update readme

* add 'Serving with FastChat' part in docs

* polish docs

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-07 15:22:42 +08:00
Jun Wang
aedc4edfba
[ADD] add open webui + vllm serving (#12246) 2024-10-23 10:13:14 +08:00
Jun Wang
fe3b5cd89b
[Update] mmdocs/dockerguide vllm-quick-start awq,gptq online serving document (#12227)
* [FIX] fix the docker start script error

* [ADD] add awq online serving doc

* [ADD] add gptq online serving doc

* [Fix] small fix
2024-10-18 09:46:59 +08:00
Shaojun Liu
49eb20613a
add --blocksize to doc and script (#12187) 2024-10-12 09:17:42 +08:00
Jun Wang
6ffaec66a2
[UPDATE] add prefix caching document into vllm_docker_quickstart.md (#12173)
* [ADD] rewrite new vllm docker quick start

* [ADD] lora adapter doc finished

* [ADD] mulit lora adapter test successfully

* [ADD] add ipex-llm quantization doc

* [Merge] rebase main

* [REMOVE] rm tmp file

* [Merge] rebase main

* [ADD] add prefix caching experiment and result

* [REMOVE] rm cpu offloading chapter
2024-10-11 19:12:22 +08:00
Jun Wang
412cf8e20c
[UPDATE] update mddocs/DockerGuides/vllm_docker_quickstart.md (#12166)
* [ADD] rewrite new vllm docker quick start

* [ADD] lora adapter doc finished

* [ADD] mulit lora adapter test successfully

* [ADD] add ipex-llm quantization doc

* [UPDATE] update mmdocs vllm_docker_quickstart content

* [REMOVE] rm tmp file

* [UPDATE] tp and pp explaination and readthedoc link change

* [FIX] fix the error description of tp+pp and quantization part

* [FIX] fix the table of verifed model

* [UPDATE] add full low bit para list

* [UPDATE] update the load_in_low_bit params to verifed dtype
2024-10-09 11:19:32 +08:00
Xu, Shuo
fed79f106b
Update mddocs for DockerGuides (#11380)
* transfer files in DockerGuides from rst to md

* add some dividing lines

* adjust the title hierarchy in docker_cpp_xpu_quickstart.md

* restore

* switch to the correct branch

* small change

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-06-21 12:10:35 +08:00
Yuwen Hu
769728c1eb
Add initial md docs (#11371) 2024-06-20 13:47:49 +08:00