Shaojun Liu
c089b6c10d
Update english prompt to 34k ( #12429 )
2024-11-22 11:20:35 +08:00
Wang, Jian4
1bfcbc0640
Add multimodal benchmark ( #12415 )
...
* add benchmark multimodal
* update
* update
* update
2024-11-20 14:21:13 +08:00
Guancheng Fu
d6057f6dd2
Update benchmark_vllm_throughput.py ( #12414 )
2024-11-19 10:41:43 +08:00
Xu, Shuo
6726b198fd
Update readme & doc for the vllm upgrade to v0.6.2 ( #12399 )
...
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-14 10:28:15 +08:00
Shaojun Liu
27152476e1
minor fix ( #12389 )
2024-11-12 22:36:43 +08:00
Xu, Shuo
dd8964ba9c
changed inference-cpp/Dockerfile ( #12386 )
...
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
2024-11-12 20:40:21 +08:00
Guancheng Fu
0ee54fc55f
Upgrade to vllm 0.6.2 ( #12338 )
...
* Initial updates for vllm 0.6.2
* fix
* Change Dockerfile to support v062
* Fix
* fix examples
* Fix
* done
* fix
* Update engine.py
* Fix Dockerfile to original path
* fix
* add option
* fix
* fix
* fix
* fix
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-11-12 20:35:34 +08:00
Jun Wang
4376fdee62
Decouple openwebui and ollama in the inference-cpp-xpu dockerfile ( #12382 )
...
* remove the openwebui in inference-cpp-xpu dockerfile
* update docker_cpp_xpu_quickstart.md
* add sample output in inference-cpp/readme
* remove the openwebui in main readme
2024-11-12 20:15:23 +08:00
Wang, Jian4
85c9279e6e
Update llama-cpp docker usage ( #12387 )
2024-11-12 15:30:17 +08:00
Shaojun Liu
c92d76b997
Update oneccl-binding.patch ( #12377 )
...
* Add files via upload
* upload oneccl-binding.patch
* Update Dockerfile
2024-11-11 22:34:08 +08:00
Shaojun Liu
fad15c8ca0
Update fastchat demo script ( #12367 )
...
* Update README.md
* Update vllm_docker_quickstart.md
2024-11-08 15:42:17 +08:00
Xu, Shuo
ce0c6ae423
Update Readme for FastChat docker demo ( #12354 )
...
* update Readme for FastChat docker demo
* update readme
* add 'Serving with FastChat' part in docs
* polish docs
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-07 15:22:42 +08:00
Xu, Shuo
899a30331a
Replace gradio_web_server.patch to adjust webui ( #12329 )
...
* replace gradio_web_server.patch to adjust webui
* fix patch problem
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-06 09:16:32 +08:00
Jun Wang
3700e81977
[fix] vllm-online-benchmark first token latency error ( #12271 )
2024-10-29 17:54:36 +08:00
Guancheng Fu
67014cb29f
Add benchmark_latency.py to docker serving image ( #12283 )
2024-10-28 16:19:59 +08:00
Shaojun Liu
48fc63887d
use oneccl 0.0.5.1 ( #12262 )
2024-10-24 16:12:24 +08:00
Jun Wang
b10fc892e1
Update new reference link of xpu/docker/readme.md ( #12188 )
...
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter tested successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [ADD] add prefix caching experiment and result
* [REMOVE] rm cpu offloading chapter
* [UPDATE] update the link to new vllm-docker-quickstart
2024-10-18 13:18:08 +08:00
Shaojun Liu
7825dc1398
Upgrade oneccl to 0.0.5 ( #12223 )
2024-10-18 09:29:19 +08:00
Shaojun Liu
26390f9213
Update oneccl_wks_installer to 2024.0.0.4.1 ( #12217 )
2024-10-17 10:11:55 +08:00
Shaojun Liu
49eb20613a
add --blocksize to doc and script ( #12187 )
2024-10-12 09:17:42 +08:00
Shaojun Liu
1daab4531f
Upgrade oneccl to 0.0.4 in serving-xpu image ( #12185 )
...
* Update oneccl to 0.0.4
* upgrade transformers to 4.44.2
2024-10-11 16:54:50 +08:00
Shaojun Liu
657889e3e4
use english prompt by default ( #12115 )
2024-09-24 17:40:50 +08:00
Guancheng Fu
b36359e2ab
Fix xpu serving image oneccl ( #12100 )
2024-09-20 15:25:41 +08:00
Guancheng Fu
a6cbc01911
Use new oneccl for ipex-llm serving image ( #12097 )
2024-09-20 14:52:49 +08:00
Shaojun Liu
1295898830
update vllm_online_benchmark script to support long input ( #12095 )
...
* update vllm_online_benchmark script to support long input
* update guide
2024-09-20 14:18:30 +08:00
Xiangyu Tian
c2774e1a43
Update oneccl to 0.0.3 in serving-xpu image ( #12088 )
2024-09-18 14:29:17 +08:00
Shaojun Liu
beb876665d
pin gradio version to fix connection error ( #12069 )
2024-09-12 14:36:09 +08:00
Shaojun Liu
7e1e51d91a
Update vllm setting ( #12059 )
...
* revert
* update
* update
* update
2024-09-11 11:45:08 +08:00
Shaojun Liu
52863dd567
fix vllm_online_benchmark.py ( #12056 )
2024-09-11 09:45:30 +08:00
Guancheng Fu
69c8d36f16
Switching from vLLM v0.3.3 to vLLM 0.5.4 ( #12042 )
...
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* Remove duplicate layer
* LLM: Update vLLM to v0.5.4 (#11746 )
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* update 0.5.4 api_server
* add dockerfile
* fix
* fix
* refine
* fix
---------
Co-authored-by: gc-fu <guancheng.fu@intel.com>
* Add vllm-0.5.4 Dockerfile (#11838 )
* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957 )
* Fix vLLM not convert issues (#11817 ) (#11918 )
* Fix not convert issues
* refine
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969 )
* init
* update mlp forward
* fix minicpm error in vllm 0.5.4
* fix dependabot alerts (#12008 )
* Update 0.5.4 dockerfile (#12021 )
* Add vllm awq loading logic (#11987 )
* [ADD] Add vllm awq loading logic
* [FIX] fix the module.linear_method path
* [FIX] fix quant_config path error
* Enable Qwen padding mlp to 256 to support batch_forward (#12030 )
* Enable padding mlp
* padding to 256
* update style
* Install 27191 runtime in 0.5.4 docker image (#12040 )
* fix rebase error
* fix rebase error
* vLLM: format for 0.5.4 rebase (#12043 )
* format
* Update model_convert.py
* Fix serving docker related modifications (#12046 )
* Fix undesired modifications (#12048 )
* fix
* Refine offline_inference arguments
---------
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: Jun Wang <thoughts.times@gmail.com>
Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
2024-09-10 15:37:43 +08:00
Shaojun Liu
1e8c87050f
fix model path ( #11973 )
2024-08-30 13:28:28 +08:00
Shaojun Liu
23f51f87f0
update tag to 2.2.0-SNAPSHOT ( #11947 )
2024-08-28 09:20:32 +08:00
Shaojun Liu
4cf640c548
update docker image tag to 2.2.0-SNAPSHOT ( #11904 )
2024-08-23 13:57:41 +08:00
Wang, Jian4
b119825152
Remove tgi parameter validation ( #11688 )
...
* remove validation
* add min warm up
* remove unneeded source
2024-07-30 16:37:44 +08:00
Guancheng Fu
86fc0492f4
Update oneccl used ( #11647 )
...
* Add internal oneccl
* fix
* fix
* add oneccl
2024-07-26 09:38:39 +08:00
Wang, Jian4
1eed0635f2
Add lightweight serving and support tgi parameter ( #11600 )
...
* init tgi request
* update openai api
* update for pp
* update and add readme
* add to docker
* add start bash
* update
* update
* update
2024-07-19 13:15:56 +08:00
Wang, Jian4
9c15abf825
Refactor fastapi-serving and add one card serving ( #11581 )
...
* init fastapi-serving one card
* mv api code to source
* update worker
* update for style-check
* add worker
* update bash
* update
* update worker name and add readme
* rename update
* rename to fastapi
2024-07-17 11:12:43 +08:00
Xiangyu Tian
7f5111a998
LLM: Refine start script for Pipeline Parallel Serving ( #11557 )
...
Refine start script and readme for Pipeline Parallel Serving
2024-07-11 15:45:27 +08:00
binbin Deng
66f6ffe4b2
Update GPU HF-Transformers example structure ( #11526 )
2024-07-08 17:58:06 +08:00
Shaojun Liu
72b4efaad4
Enhanced XPU Dockerfiles: Optimized Environment Variables and Documentation ( #11506 )
...
* Added SYCL_CACHE_PERSISTENT=1 to xpu Dockerfile
* Update the document to add explanations for environment variables.
* update quickstart
2024-07-04 20:18:38 +08:00
Guancheng Fu
4fbb0d33ae
Pin compute runtime version for xpu images ( #11479 )
...
* pin compute runtime version
* fix done
2024-07-01 21:41:02 +08:00
Wang, Jian4
e000ac90c4
Add pp_serving example to serving image ( #11433 )
...
* init pp
* update
* update
* no clone ipex-llm again
2024-06-28 16:45:25 +08:00
Wang, Jian4
b7bc1023fb
Add vllm_online_benchmark.py ( #11458 )
...
* init
* update and add
* update
2024-06-28 14:59:06 +08:00
Shaojun Liu
5aa3e427a9
Fix docker images ( #11362 )
...
* Fix docker images
* add-apt-repository requires gnupg, gpg-agent, software-properties-common
* update
* avoid importing ipex again
2024-06-20 15:44:55 +08:00
Xiangyu Tian
ef9f740801
Docs: Fix CPU Serving Docker README ( #11351 )
...
Fix CPU Serving Docker README
2024-06-18 16:27:51 +08:00
Guancheng Fu
c9b4cadd81
fix vLLM/docker issues ( #11348 )
...
* fix
* fix
* fix
2024-06-18 16:23:53 +08:00
Qiyuan Gong
de4bb97b4f
Remove accelerate 0.23.0 install command in readme and docker ( #11333 )
...
* ipex-llm's accelerate has been upgraded to 0.23.0. Remove the accelerate 0.23.0 install command in README and docker.
2024-06-17 17:52:12 +08:00
Shaojun Liu
77809be946
Install packages for ipex-llm-serving-cpu docker image ( #11321 )
...
* apt-get install patch
* Update Dockerfile
* Update Dockerfile
* revert
2024-06-14 15:26:01 +08:00
Shaojun Liu
9760ffc256
Fix SDLe CT222 Vulnerabilities ( #11237 )
...
* fix ct222 vuln
* update
* fix
* update ENTRYPOINT
* revert ENTRYPOINT
* Fix CT222 Vulns
* fix
* revert changes
* fix
* revert
* add sudo permission to ipex-llm user
* do not use ipex-llm user
2024-06-13 15:31:22 +08:00
Shaojun Liu
84f04087fb
Add intelanalytics/ipex-llm:sources image for OSPDT ( #11296 )
...
* Add intelanalytics/ipex-llm:sources image
* apt-get source
2024-06-13 14:29:14 +08:00
Guancheng Fu
2e75bbccf9
Add more control arguments for benchmark_vllm_throughput ( #11291 )
2024-06-12 17:43:06 +08:00
Guancheng Fu
eeffeeb2e2
fix benchmark script ( #11243 )
2024-06-06 17:44:19 +08:00
Shaojun Liu
1f2057b16a
Fix ipex-llm-cpu docker image ( #11213 )
...
* fix
* fix ipex-llm-cpu image
2024-06-05 11:13:17 +08:00
Xiangyu Tian
ac3d53ff5d
LLM: Fix vLLM CPU version error ( #11206 )
...
Fix vLLM CPU version error
2024-06-04 19:10:23 +08:00
Guancheng Fu
3ef4aa98d1
Refine vllm_quickstart doc ( #11199 )
...
* refine doc
* refine
2024-06-04 18:46:27 +08:00
Shaojun Liu
744042d1b2
remove software-properties-common from Dockerfile ( #11203 )
2024-06-04 17:37:42 +08:00
Guancheng Fu
daf7b1cd56
[Docker] Fix image using two cards error ( #11144 )
...
* fix all
* done
2024-05-27 16:20:13 +08:00
Qiyuan Gong
21a1a973c1
Remove axolotl and python3-blinker ( #11127 )
...
* Remove axolotl from image to reduce image size.
* Remove python3-blinker to avoid axolotl lib conflict.
2024-05-24 13:54:19 +08:00
Wang, Jian4
1443b802cc
Docker: Fix building cpp_docker and remove unimportant dependencies ( #11114 )
...
* test build
* update
2024-05-24 09:49:44 +08:00
Xiangyu Tian
b3f6faa038
LLM: Add CPU vLLM entrypoint ( #11083 )
...
Add CPU vLLM entrypoint and update CPU vLLM serving example.
2024-05-24 09:16:59 +08:00
Shaojun Liu
e0f401d97d
FIX: APT Repository not working (signatures invalid) ( #11112 )
...
* chmod 644 gpg key
* chmod 644 gpg key
2024-05-23 16:15:45 +08:00
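The fix in #11112 comes down to a permissions change: APT's sandboxed `_apt` user must be able to read the repository signing key, otherwise verification fails with "signatures invalid". A minimal sketch, using a temp file in place of the real keyring path (which is an implementation detail of the Dockerfile):

```shell
# Stand-in for the real keyring file used in the Dockerfile.
KEY=$(mktemp)
chmod 600 "$KEY"        # too strict: only the owner can read the key
chmod 644 "$KEY"        # the fix: world-readable, so _apt can verify signatures
stat -c '%a' "$KEY"     # prints 644
rm -f "$KEY"
```

The same `chmod 644` applies to any key dropped under a path like `/usr/share/keyrings/` in an image build.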
binbin Deng
ecb16dcf14
Add deepspeed autotp support for xpu docker ( #11077 )
2024-05-21 14:49:54 +08:00
Wang, Jian4
00d4410746
Update cpp docker quickstart ( #11040 )
...
* add sample output
* update link
* update
* update header
* update
2024-05-16 14:55:13 +08:00
Guancheng Fu
7e29928865
refactor serving docker image ( #11028 )
2024-05-16 09:30:36 +08:00
Wang, Jian4
86cec80b51
LLM: Add llm inference_cpp_xpu_docker ( #10933 )
...
* test_cpp_docker
* update
* update
* update
* update
* add sudo
* update nodejs version
* no need npm
* remove blinker
* new cpp docker
* restore
* add line
* add manually_build
* update and add mtl
* update for workdir llm
* add benchmark part
* update readme
* update 1024-128
* update readme
* update
* fix
* update
* update
* update readme too
* update readme
* no change
* update dir_name
* update readme
2024-05-15 11:10:22 +08:00
Qiyuan Gong
1e00bd7bbe
Re-org XPU finetune images ( #10971 )
...
* Rename xpu finetune image from `ipex-llm-finetune-qlora-xpu` to `ipex-llm-finetune-xpu`.
* Add axolotl to xpu finetune image.
* Upgrade peft to 0.10.0, transformers to 4.36.0.
* Add accelerate default config to home.
2024-05-15 09:42:43 +08:00
Shengsheng Huang
0b7e78b592
revise the benchmark part in python inference docker ( #11020 )
2024-05-14 18:43:41 +08:00
Shengsheng Huang
586a151f9c
update the README and reorganize the docker guides structure. ( #11016 )
...
* update the README and reorganize the docker guides structure.
* modified docker install guide into overview
2024-05-14 17:56:11 +08:00
Shaojun Liu
7f8c5b410b
Quickstart: Run PyTorch Inference on Intel GPU using Docker (on Linux or WSL) ( #10970 )
...
* add entrypoint.sh
* add quickstart
* remove entrypoint
* update
* Install related library of benchmarking
* update
* print out results
* update docs
* minor update
* update
* update quickstart
* update
* update
* update
* update
* update
* update
* add chat & example section
* add more details
* minor update
* rename quickstart
* update
* minor update
* update
* update config.yaml
* update readme
* use --gpu
* add tips
* minor update
* update
2024-05-14 12:58:31 +08:00
Zephyr1101
7e7d969dcb
An experimental fix for workflow abuse, step 1: fix a typo ( #10965 )
...
* Update llm_unit_tests.yml
* Update README.md
* Update llm_unit_tests.yml
* Update llm_unit_tests.yml
2024-05-08 17:12:50 +08:00
Qiyuan Gong
c11170b96f
Upgrade Peft to 0.10.0 in finetune examples and docker ( #10930 )
...
* Upgrade Peft to 0.10.0 in finetune examples.
* Upgrade Peft to 0.10.0 in docker.
2024-05-07 15:12:26 +08:00
Qiyuan Gong
41ffe1526c
Modify CPU finetune docker for bz2 error ( #10919 )
...
* Avoid bz2 error
* change to cpu torch
2024-05-06 10:41:50 +08:00
Guancheng Fu
2c64754eb0
Add vLLM to ipex-llm serving image ( #10807 )
...
* add vllm
* done
* doc work
* fix done
* temp
* add docs
* format
* add start-fastchat-service.sh
* fix
2024-04-29 17:25:42 +08:00
Heyang Sun
751f6d11d8
fix typos in qlora README ( #10893 )
2024-04-26 14:03:06 +08:00
Guancheng Fu
3b82834aaf
Update README.md ( #10838 )
2024-04-22 14:18:51 +08:00
Shaojun Liu
7297036c03
upgrade python ( #10769 )
2024-04-16 09:28:10 +08:00
Shaojun Liu
3590e1be83
revert python to 3.9 for finetune image ( #10758 )
2024-04-15 10:37:10 +08:00
Shaojun Liu
29bf28bd6f
Upgrade python to 3.11 in Docker Image ( #10718 )
...
* install python 3.11 for cpu-inference docker image
* update xpu-inference dockerfile
* update cpu-serving image
* update qlora image
* update lora image
* update document
2024-04-10 14:41:27 +08:00
Heyang Sun
4f6df37805
fix wrong cpu core count seen by docker ( #10645 )
2024-04-03 15:52:25 +08:00
Shaojun Liu
1aef3bc0ab
verify and refine ipex-llm-finetune-qlora-xpu docker document ( #10638 )
...
* verify and refine finetune-xpu document
* update export_merged_model.py link
* update link
2024-04-03 11:33:13 +08:00
Heyang Sun
b8b923ed04
move chown step to after the add-script step in qlora Dockerfile
2024-04-02 23:04:51 +08:00
Shaojun Liu
a10f5a1b8d
add python style check ( #10620 )
...
* add python style check
* fix style checks
* update runner
* add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow
* update tag to 2.1.0-SNAPSHOT
2024-04-02 16:17:56 +08:00
Shaojun Liu
20a5e72da0
refine and verify ipex-llm-serving-xpu docker document ( #10615 )
...
* refine serving on cpu/xpu
* minor fix
* replace localhost with 0.0.0.0 so that service can be accessed through ip address
2024-04-02 11:45:45 +08:00
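The localhost-to-0.0.0.0 change in #10615 matters because a server bound to 127.0.0.1 inside a container only accepts connections originating in that container; binding to 0.0.0.0 listens on all interfaces, so Docker's port mapping can reach it from the host's IP. A sketch using Python's stdlib server rather than the actual serving script, with an arbitrary port:

```shell
# Bind a throwaway HTTP server to all interfaces (port 18080 is arbitrary).
python3 -m http.server 18080 --bind 0.0.0.0 >/dev/null 2>&1 &
SRV=$!
sleep 1
# Reachable via loopback here; from outside the container it would be
# reachable through the mapped port as well, which 127.0.0.1 would not be.
curl -sf http://127.0.0.1:18080/ >/dev/null && echo reachable
kill "$SRV"
```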
Shaojun Liu
59058bb206
replace 2.5.0-SNAPSHOT with 2.1.0-SNAPSHOT for llm docker images ( #10603 )
2024-04-01 09:58:51 +08:00
Shaojun Liu
b06de94a50
verify xpu-inference image and refine document ( #10593 )
2024-03-29 16:11:12 +08:00
Shaojun Liu
52f1b541cf
refine and verify ipex-inference-cpu docker document ( #10565 )
...
* restructure the index
* refine and verify cpu-inference document
* update
2024-03-29 10:16:10 +08:00
ZehuaCao
52a2135d83
Replace ipex with ipex-llm ( #10554 )
...
* fix ipex with ipex_llm
* fix ipex with ipex_llm
* update
* update
* update
* update
* update
* update
* update
* update
2024-03-28 13:54:40 +08:00
Cheen Hau, 俊豪
1c5eb14128
Update pip install to use --extra-index-url for ipex package ( #10557 )
...
* Change to 'pip install .. --extra-index-url' for readthedocs
* Change to 'pip install .. --extra-index-url' for examples
* Change to 'pip install .. --extra-index-url' for remaining files
* Fix URL for ipex
* Add links for ipex US and CN servers
* Update ipex cpu url
* remove readme
* Update for github actions
* Update for dockerfiles
2024-03-28 09:56:23 +08:00
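The pattern adopted in #10557 uses `--extra-index-url`, which tells pip to consult Intel's wheel server in addition to PyPI, so the ipex wheels resolve from the extra index while ordinary dependencies still come from PyPI. A sketch of the pattern; the URL shown is an assumption for illustration, not verified against the current docs:

```shell
# --extra-index-url adds a secondary package index alongside PyPI.
# The index URL below is illustrative (Intel's US XPU wheel server).
pip install ipex-llm \
  --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

The commit also added a CN mirror link for the same reason: pip has no fallback between full index overrides, but extra indexes compose cleanly with the default one.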
Wang, Jian4
e2d25de17d
Update_docker by heyang ( #29 )
2024-03-25 10:05:46 +08:00
Wang, Jian4
9df70d95eb
Refactor bigdl.llm to ipex_llm ( #24 )
...
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
Heyang Sun
c672e97239
Fix CPU finetuning docker ( #10494 )
...
* Fix CPU finetuning docker
* Update README.md
2024-03-21 11:53:30 +08:00
Shaojun Liu
0e388f4b91
Fix Trivy Docker Image Vulnerabilities for BigDL Release 2.5.0 ( #10447 )
...
* Update pypi version to fix trivy issues
* refine
2024-03-19 14:52:15 +08:00
Wang, Jian4
1de13ea578
LLM: remove CPU english_quotes dataset and update docker example ( #10399 )
...
* update dataset
* update readme
* update docker cpu
* update xpu docker
2024-03-18 10:45:14 +08:00
ZehuaCao
146b77f113
fix qlora-finetune Dockerfile ( #10379 )
2024-03-12 13:20:06 +08:00
ZehuaCao
267de7abc3
fix fschat DEP version error ( #10325 )
2024-03-06 16:15:27 +08:00
Lilac09
a2ed4d714e
Fix vllm service error ( #10279 )
2024-02-29 15:45:04 +08:00
Ziteng Zhang
e08c74f1d1
Fix build error of bigdl-llm-cpu ( #10228 )
2024-02-23 16:30:21 +08:00
Ziteng Zhang
f7e2591f15
[LLM] change IPEX230 to IPEX220 in dockerfile ( #10222 )
...
* change IPEX230 to IPEX220 in dockerfile
2024-02-23 15:02:08 +08:00
Shaojun Liu
079f2011ea
Update bigdl-llm-finetune-qlora-xpu Docker Image ( #10194 )
...
* Bump oneapi version to 2024.0
* pip install bitsandbytes scipy
* Pin level-zero-gpu version
* Pin accelerate version 0.23.0
2024-02-21 15:18:27 +08:00
Lilac09
eca69a6022
Fix build error of bigdl-llm-cpu ( #10176 )
...
* fix build error
* fix build error
* fix build error
* fix build error
2024-02-20 14:50:12 +08:00