Shaojun Liu
48fc63887d
use oneccl 0.0.5.1 ( #12262 )
2024-10-24 16:12:24 +08:00
Jun Wang
b10fc892e1
Update new reference link of xpu/docker/readme.md ( #12188 )
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter tested successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [ADD] add prefix caching experiment and result
* [REMOVE] rm cpu offloading chapter
* [UPDATE] update the link to new vllm-docker-quickstart
2024-10-18 13:18:08 +08:00
Shaojun Liu
7825dc1398
Upgrade oneccl to 0.0.5 ( #12223 )
2024-10-18 09:29:19 +08:00
Shaojun Liu
26390f9213
Update oneccl_wks_installer to 2024.0.0.4.1 ( #12217 )
2024-10-17 10:11:55 +08:00
Shaojun Liu
49eb20613a
add --blocksize to doc and script ( #12187 )
2024-10-12 09:17:42 +08:00
Shaojun Liu
1daab4531f
Upgrade oneccl to 0.0.4 in serving-xpu image ( #12185 )
* Update oneccl to 0.0.4
* upgrade transformers to 4.44.2
2024-10-11 16:54:50 +08:00
Shaojun Liu
657889e3e4
use english prompt by default ( #12115 )
2024-09-24 17:40:50 +08:00
Guancheng Fu
b36359e2ab
Fix xpu serving image oneccl ( #12100 )
2024-09-20 15:25:41 +08:00
Guancheng Fu
a6cbc01911
Use new oneccl for ipex-llm serving image ( #12097 )
2024-09-20 14:52:49 +08:00
Shaojun Liu
1295898830
update vllm_online_benchmark script to support long input ( #12095 )
* update vllm_online_benchmark script to support long input
* update guide
2024-09-20 14:18:30 +08:00
Xiangyu Tian
c2774e1a43
Update oneccl to 0.0.3 in serving-xpu image ( #12088 )
2024-09-18 14:29:17 +08:00
Shaojun Liu
beb876665d
pin gradio version to fix connection error ( #12069 )
2024-09-12 14:36:09 +08:00
Shaojun Liu
7e1e51d91a
Update vllm setting ( #12059 )
* revert
* update
* update
* update
2024-09-11 11:45:08 +08:00
Shaojun Liu
52863dd567
fix vllm_online_benchmark.py ( #12056 )
2024-09-11 09:45:30 +08:00
Guancheng Fu
69c8d36f16
Switching from vLLM v0.3.3 to vLLM 0.5.4 ( #12042 )
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* Remove duplicate layer
* LLM: Update vLLM to v0.5.4 (#11746 )
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* update 0.5.4 api_server
* add dockerfile
* fix
* fix
* refine
* fix
---------
Co-authored-by: gc-fu <guancheng.fu@intel.com>
* Add vllm-0.5.4 Dockerfile (#11838 )
* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957 )
* Fix vLLM not convert issues (#11817 ) (#11918 )
* Fix not convert issues
* refine
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969 )
* init
* update mlp forward
* fix minicpm error in vllm 0.5.4
* fix dependabot alerts (#12008 )
* Update 0.5.4 dockerfile (#12021 )
* Add vllm awq loading logic (#11987 )
* [ADD] Add vllm awq loading logic
* [FIX] fix the module.linear_method path
* [FIX] fix quant_config path error
* Enable Qwen padding mlp to 256 to support batch_forward (#12030 )
* Enable padding mlp
* padding to 256
* update style
* Install 27191 runtime in 0.5.4 docker image (#12040 )
* fix rebase error
* fix rebase error
* vLLM: format for 0.5.4 rebase (#12043 )
* format
* Update model_convert.py
* Fix serving docker related modifications (#12046 )
* Fix undesired modifications (#12048 )
* fix
* Refine offline_inference arguments
---------
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: Jun Wang <thoughts.times@gmail.com>
Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
2024-09-10 15:37:43 +08:00
Shaojun Liu
1e8c87050f
fix model path ( #11973 )
2024-08-30 13:28:28 +08:00
Shaojun Liu
4cf640c548
update docker image tag to 2.2.0-SNAPSHOT ( #11904 )
2024-08-23 13:57:41 +08:00
Wang, Jian4
b119825152
Remove tgi parameter validation ( #11688 )
* remove validation
* add min warm up
* remove no need source
2024-07-30 16:37:44 +08:00
Guancheng Fu
86fc0492f4
Update oneccl used ( #11647 )
* Add internal oneccl
* fix
* fix
* add oneccl
2024-07-26 09:38:39 +08:00
Wang, Jian4
1eed0635f2
Add lightweight serving and support tgi parameter ( #11600 )
* init tgi request
* update openai api
* update for pp
* update and add readme
* add to docker
* add start bash
* update
* update
* update
2024-07-19 13:15:56 +08:00
Xiangyu Tian
7f5111a998
LLM: Refine start script for Pipeline Parallel Serving ( #11557 )
Refine start script and readme for Pipeline Parallel Serving
2024-07-11 15:45:27 +08:00
Wang, Jian4
e000ac90c4
Add pp_serving example to serving image ( #11433 )
* init pp
* update
* update
* no clone ipex-llm again
2024-06-28 16:45:25 +08:00
Wang, Jian4
b7bc1023fb
Add vllm_online_benchmark.py ( #11458 )
* init
* update and add
* update
2024-06-28 14:59:06 +08:00
Shaojun Liu
5aa3e427a9
Fix docker images ( #11362 )
* Fix docker images
* add-apt-repository requires gnupg, gpg-agent, software-properties-common
* update
* avoid importing ipex again
2024-06-20 15:44:55 +08:00
Guancheng Fu
c9b4cadd81
fix vLLM/docker issues ( #11348 )
* fix
* fix
* fix
2024-06-18 16:23:53 +08:00
Shaojun Liu
9760ffc256
Fix SDLe CT222 Vulnerabilities ( #11237 )
* fix ct222 vuln
* update
* fix
* update ENTRYPOINT
* revert ENTRYPOINT
* Fix CT222 Vulns
* fix
* revert changes
* fix
* revert
* add sudo permission to ipex-llm user
* do not use ipex-llm user
2024-06-13 15:31:22 +08:00
Guancheng Fu
2e75bbccf9
Add more control arguments for benchmark_vllm_throughput ( #11291 )
2024-06-12 17:43:06 +08:00
Guancheng Fu
eeffeeb2e2
fix benchmark script ( #11243 )
2024-06-06 17:44:19 +08:00
Guancheng Fu
3ef4aa98d1
Refine vllm_quickstart doc ( #11199 )
* refine doc
* refine
2024-06-04 18:46:27 +08:00
Xiangyu Tian
b3f6faa038
LLM: Add CPU vLLM entrypoint ( #11083 )
Add CPU vLLM entrypoint and update CPU vLLM serving example.
2024-05-24 09:16:59 +08:00
Guancheng Fu
7e29928865
refactor serving docker image ( #11028 )
2024-05-16 09:30:36 +08:00
Guancheng Fu
2c64754eb0
Add vLLM to ipex-llm serving image ( #10807 )
* add vllm
* done
* doc work
* fix done
* temp
* add docs
* format
* add start-fastchat-service.sh
* fix
2024-04-29 17:25:42 +08:00
Shaojun Liu
59058bb206
replace 2.5.0-SNAPSHOT with 2.1.0-SNAPSHOT for llm docker images ( #10603 )
2024-04-01 09:58:51 +08:00
Wang, Jian4
e2d25de17d
Update_docker by heyang ( #29 )
2024-03-25 10:05:46 +08:00
Wang, Jian4
9df70d95eb
Refactor bigdl.llm to ipex_llm ( #24 )
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
Shaojun Liu
0e388f4b91
Fix Trivy Docker Image Vulnerabilities for BigDL Release 2.5.0 ( #10447 )
* Update pypi version to fix trivy issues
* refine
2024-03-19 14:52:15 +08:00
Lilac09
052962dfa5
Using original fastchat and add bigdl worker in docker image ( #9967 )
* add vllm worker
* add options in entrypoint
2024-01-23 14:17:05 +08:00
Shaojun Liu
0e5ab5ebfc
update docker tag to 2.5.0-SNAPSHOT ( #9443 )
2023-11-13 16:53:40 +08:00
Lilac09
74a8ad32dc
Add entry point to llm-serving-xpu ( #9339 )
* add entry point to llm-serving-xpu
* manually build
* manually build
* add entry point to llm-serving-xpu
* manually build
* add entry point to llm-serving-xpu
* add entry point to llm-serving-xpu
* add entry point to llm-serving-xpu
2023-11-02 16:31:07 +08:00
Lilac09
2c2bc959ad
add tools into previously built images ( #9317 )
* modify Dockerfile
* manually build
* modify Dockerfile
* add chat.py into inference-xpu
* add benchmark into inference-cpu
* manually build
* add benchmark into inference-cpu
* add benchmark into inference-cpu
* add benchmark into inference-cpu
* add chat.py into inference-xpu
* add chat.py into inference-xpu
* change ADD to COPY in dockerfile
* fix dependency issue
* temporarily remove run-spr in llm-cpu
* temporarily remove run-spr in llm-cpu
2023-10-31 16:35:18 +08:00
Guancheng Fu
cc84ed70b3
Create serving images ( #9048 )
* Finished & Tested
* Install latest pip from base images
* Add blank line
* Delete unused comment
* fix typos
2023-09-25 15:51:45 +08:00