Shaojun Liu
7e1e51d91a
Update vllm setting ( #12059 )
* revert
* update
* update
* update
2024-09-11 11:45:08 +08:00
Shaojun Liu
52863dd567
fix vllm_online_benchmark.py ( #12056 )
2024-09-11 09:45:30 +08:00
Guancheng Fu
69c8d36f16
Switch from vLLM v0.3.3 to vLLM v0.5.4 ( #12042 )
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* Remove duplicate layer
* LLM: Update vLLM to v0.5.4 (#11746)
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* update 0.5.4 api_server
* add dockerfile
* fix
* fix
* refine
* fix
---------
Co-authored-by: gc-fu <guancheng.fu@intel.com>
* Add vllm-0.5.4 Dockerfile (#11838)
* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)
* Fix vLLM not-convert issues (#11817) (#11918)
* Fix not-convert issues
* refine
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
* Fix glm4-9b-chat NaN error on vllm 0.5.4 (#11969)
* init
* update mlp forward
* fix minicpm error in vllm 0.5.4
* fix dependabot alerts (#12008)
* Update 0.5.4 dockerfile (#12021)
* Add vllm awq loading logic (#11987)
* [ADD] Add vllm awq loading logic
* [FIX] fix the module.linear_method path
* [FIX] fix quant_config path error
* Enable Qwen MLP padding to 256 to support batch_forward (#12030)
* Enable padding mlp
* padding to 256
* update style
* Install 27191 runtime in 0.5.4 docker image (#12040)
* fix rebase error
* fix rebase error
* vLLM: format for 0.5.4 rebase (#12043)
* format
* Update model_convert.py
* Fix serving docker related modifications (#12046)
* Fix undesired modifications (#12048)
* fix
* Refine offline_inference arguments
---------
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: Jun Wang <thoughts.times@gmail.com>
Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
2024-09-10 15:37:43 +08:00
Shaojun Liu
1e8c87050f
fix model path ( #11973 )
2024-08-30 13:28:28 +08:00
Shaojun Liu
23f51f87f0
update tag to 2.2.0-SNAPSHOT ( #11947 )
2024-08-28 09:20:32 +08:00
Shaojun Liu
4cf640c548
update docker image tag to 2.2.0-SNAPSHOT ( #11904 )
2024-08-23 13:57:41 +08:00
Wang, Jian4
b119825152
Remove tgi parameter validation ( #11688 )
* remove validation
* add min warm up
* remove unneeded source
2024-07-30 16:37:44 +08:00
Guancheng Fu
86fc0492f4
Update the oneccl version used ( #11647 )
* Add internal oneccl
* fix
* fix
* add oneccl
2024-07-26 09:38:39 +08:00
Wang, Jian4
1eed0635f2
Add lightweight serving and support tgi parameter ( #11600 )
* init tgi request
* update openai api
* update for pp
* update and add readme
* add to docker
* add start bash
* update
* update
* update
2024-07-19 13:15:56 +08:00
Wang, Jian4
9c15abf825
Refactor fastapi-serving and add one-card serving ( #11581 )
* init fastapi-serving one card
* mv api code to source
* update worker
* update for style-check
* add worker
* update bash
* update
* update worker name and add readme
* rename update
* rename to fastapi
2024-07-17 11:12:43 +08:00
Xiangyu Tian
7f5111a998
LLM: Refine start script for Pipeline Parallel Serving ( #11557 )
Refine start script and readme for Pipeline Parallel Serving
2024-07-11 15:45:27 +08:00
binbin Deng
66f6ffe4b2
Update GPU HF-Transformers example structure ( #11526 )
2024-07-08 17:58:06 +08:00
Shaojun Liu
72b4efaad4
Enhanced XPU Dockerfiles: Optimized Environment Variables and Documentation ( #11506 )
* Added SYCL_CACHE_PERSISTENT=1 to xpu Dockerfile
* Update the document to add explanations for environment variables.
* update quickstart
2024-07-04 20:18:38 +08:00
Guancheng Fu
4fbb0d33ae
Pin compute runtime version for xpu images ( #11479 )
* pin compute runtime version
* fix done
2024-07-01 21:41:02 +08:00
Wang, Jian4
e000ac90c4
Add pp_serving example to serving image ( #11433 )
* init pp
* update
* update
* do not clone ipex-llm again
2024-06-28 16:45:25 +08:00
Wang, Jian4
b7bc1023fb
Add vllm_online_benchmark.py ( #11458 )
* init
* update and add
* update
2024-06-28 14:59:06 +08:00
Shaojun Liu
5aa3e427a9
Fix docker images ( #11362 )
* Fix docker images
* add-apt-repository requires gnupg, gpg-agent, software-properties-common
* update
* avoid importing ipex again
2024-06-20 15:44:55 +08:00
Xiangyu Tian
ef9f740801
Docs: Fix CPU Serving Docker README ( #11351 )
Fix CPU Serving Docker README
2024-06-18 16:27:51 +08:00
Guancheng Fu
c9b4cadd81
fix vLLM/docker issues ( #11348 )
* fix
* fix
* fix
2024-06-18 16:23:53 +08:00
Qiyuan Gong
de4bb97b4f
Remove accelerate 0.23.0 install command in readme and docker ( #11333 )
* ipex-llm's accelerate has been upgraded to 0.23.0. Remove accelerate 0.23.0 install command in README and docker.
2024-06-17 17:52:12 +08:00
Shaojun Liu
77809be946
Install packages for ipex-llm-serving-cpu docker image ( #11321 )
* apt-get install patch
* Update Dockerfile
* Update Dockerfile
* revert
2024-06-14 15:26:01 +08:00
Shaojun Liu
9760ffc256
Fix SDLe CT222 Vulnerabilities ( #11237 )
* fix ct222 vuln
* update
* fix
* update ENTRYPOINT
* revert ENTRYPOINT
* Fix CT222 Vulns
* fix
* revert changes
* fix
* revert
* add sudo permission to ipex-llm user
* do not use ipex-llm user
2024-06-13 15:31:22 +08:00
Shaojun Liu
84f04087fb
Add intelanalytics/ipex-llm:sources image for OSPDT ( #11296 )
* Add intelanalytics/ipex-llm:sources image
* apt-get source
2024-06-13 14:29:14 +08:00
Guancheng Fu
2e75bbccf9
Add more control arguments for benchmark_vllm_throughput ( #11291 )
2024-06-12 17:43:06 +08:00
Guancheng Fu
eeffeeb2e2
fix benchmark script ( #11243 )
2024-06-06 17:44:19 +08:00
Shaojun Liu
1f2057b16a
Fix ipex-llm-cpu docker image ( #11213 )
* fix
* fix ipex-llm-cpu image
2024-06-05 11:13:17 +08:00
Xiangyu Tian
ac3d53ff5d
LLM: Fix vLLM CPU version error ( #11206 )
Fix vLLM CPU version error
2024-06-04 19:10:23 +08:00
Guancheng Fu
3ef4aa98d1
Refine vllm_quickstart doc ( #11199 )
* refine doc
* refine
2024-06-04 18:46:27 +08:00
Shaojun Liu
744042d1b2
remove software-properties-common from Dockerfile ( #11203 )
2024-06-04 17:37:42 +08:00
Guancheng Fu
daf7b1cd56
[Docker] Fix error when using two cards ( #11144 )
* fix all
* done
2024-05-27 16:20:13 +08:00
Qiyuan Gong
21a1a973c1
Remove axolotl and python3-blinker ( #11127 )
* Remove axolotl from image to reduce image size.
* Remove python3-blinker to avoid axolotl lib conflict.
2024-05-24 13:54:19 +08:00
Wang, Jian4
1443b802cc
Docker: Fix building cpp_docker and remove unnecessary dependencies ( #11114 )
* test build
* update
2024-05-24 09:49:44 +08:00
Xiangyu Tian
b3f6faa038
LLM: Add CPU vLLM entrypoint ( #11083 )
Add CPU vLLM entrypoint and update CPU vLLM serving example.
2024-05-24 09:16:59 +08:00
Shaojun Liu
e0f401d97d
FIX: APT Repository not working (signatures invalid) ( #11112 )
* chmod 644 gpg key
* chmod 644 gpg key
2024-05-23 16:15:45 +08:00
binbin Deng
ecb16dcf14
Add deepspeed autotp support for xpu docker ( #11077 )
2024-05-21 14:49:54 +08:00
Wang, Jian4
00d4410746
Update cpp docker quickstart ( #11040 )
* add sample output
* update link
* update
* update header
* update
2024-05-16 14:55:13 +08:00
Guancheng Fu
7e29928865
refactor serving docker image ( #11028 )
2024-05-16 09:30:36 +08:00
Wang, Jian4
86cec80b51
LLM: Add llm inference_cpp_xpu_docker ( #10933 )
* test_cpp_docker
* update
* update
* update
* update
* add sudo
* update nodejs version
* no need npm
* remove blinker
* new cpp docker
* restore
* add line
* add manually_build
* update and add mtl
* update for workdir llm
* add benchmark part
* update readme
* update 1024-128
* update readme
* update
* fix
* update
* update
* update readme too
* update readme
* no change
* update dir_name
* update readme
2024-05-15 11:10:22 +08:00
Qiyuan Gong
1e00bd7bbe
Re-org XPU finetune images ( #10971 )
* Rename xpu finetune image from `ipex-llm-finetune-qlora-xpu` to `ipex-llm-finetune-xpu`.
* Add axolotl to xpu finetune image.
* Upgrade peft to 0.10.0, transformers to 4.36.0.
* Add accelerate default config to home.
2024-05-15 09:42:43 +08:00
Shengsheng Huang
0b7e78b592
revise the benchmark part in python inference docker ( #11020 )
2024-05-14 18:43:41 +08:00
Shengsheng Huang
586a151f9c
update the README and reorganize the docker guides structure ( #11016 )
* update the README and reorganize the docker guides structure.
* modified docker install guide into overview
2024-05-14 17:56:11 +08:00
Shaojun Liu
7f8c5b410b
Quickstart: Run PyTorch Inference on Intel GPU using Docker (on Linux or WSL) ( #10970 )
* add entrypoint.sh
* add quickstart
* remove entrypoint
* update
* Install related library of benchmarking
* update
* print out results
* update docs
* minor update
* update
* update quickstart
* update
* update
* update
* update
* update
* update
* add chat & example section
* add more details
* minor update
* rename quickstart
* update
* minor update
* update
* update config.yaml
* update readme
* use --gpu
* add tips
* minor update
* update
2024-05-14 12:58:31 +08:00
Zephyr1101
7e7d969dcb
An experimental fix for workflow abuse, step 1: fix a typo ( #10965 )
* Update llm_unit_tests.yml
* Update README.md
* Update llm_unit_tests.yml
* Update llm_unit_tests.yml
2024-05-08 17:12:50 +08:00
Qiyuan Gong
c11170b96f
Upgrade Peft to 0.10.0 in finetune examples and docker ( #10930 )
* Upgrade Peft to 0.10.0 in finetune examples.
* Upgrade Peft to 0.10.0 in docker.
2024-05-07 15:12:26 +08:00
Qiyuan Gong
41ffe1526c
Modify CPU finetune docker for bz2 error ( #10919 )
* Avoid bz2 error
* change to cpu torch
2024-05-06 10:41:50 +08:00
Guancheng Fu
2c64754eb0
Add vLLM to ipex-llm serving image ( #10807 )
* add vllm
* done
* doc work
* fix done
* temp
* add docs
* format
* add start-fastchat-service.sh
* fix
2024-04-29 17:25:42 +08:00
Heyang Sun
751f6d11d8
fix typos in qlora README ( #10893 )
2024-04-26 14:03:06 +08:00
Guancheng Fu
3b82834aaf
Update README.md ( #10838 )
2024-04-22 14:18:51 +08:00
Shaojun Liu
7297036c03
upgrade python ( #10769 )
2024-04-16 09:28:10 +08:00
Shaojun Liu
3590e1be83
revert python to 3.9 for finetune image ( #10758 )
2024-04-15 10:37:10 +08:00
Shaojun Liu
29bf28bd6f
Upgrade python to 3.11 in Docker Image ( #10718 )
* install python 3.11 for cpu-inference docker image
* update xpu-inference dockerfile
* update cpu-serving image
* update qlora image
* update lora image
* update document
2024-04-10 14:41:27 +08:00
Heyang Sun
4f6df37805
fix wrong CPU core count seen by docker ( #10645 )
2024-04-03 15:52:25 +08:00
Shaojun Liu
1aef3bc0ab
verify and refine ipex-llm-finetune-qlora-xpu docker document ( #10638 )
* verify and refine finetune-xpu document
* update export_merged_model.py link
* update link
2024-04-03 11:33:13 +08:00
Heyang Sun
b8b923ed04
move chown step to after the add script step in qlora Dockerfile
2024-04-02 23:04:51 +08:00
Shaojun Liu
a10f5a1b8d
add python style check ( #10620 )
* add python style check
* fix style checks
* update runner
* add ipex-llm-finetune-qlora-cpu-k8s to manually_build workflow
* update tag to 2.1.0-SNAPSHOT
2024-04-02 16:17:56 +08:00
Shaojun Liu
20a5e72da0
refine and verify ipex-llm-serving-xpu docker document ( #10615 )
* refine serving on cpu/xpu
* minor fix
* replace localhost with 0.0.0.0 so that service can be accessed through ip address
2024-04-02 11:45:45 +08:00
Shaojun Liu
59058bb206
replace 2.5.0-SNAPSHOT with 2.1.0-SNAPSHOT for llm docker images ( #10603 )
2024-04-01 09:58:51 +08:00
Shaojun Liu
b06de94a50
verify xpu-inference image and refine document ( #10593 )
2024-03-29 16:11:12 +08:00
Shaojun Liu
52f1b541cf
refine and verify ipex-inference-cpu docker document ( #10565 )
* restructure the index
* refine and verify cpu-inference document
* update
2024-03-29 10:16:10 +08:00
ZehuaCao
52a2135d83
Replace ipex with ipex-llm ( #10554 )
* fix ipex with ipex_llm
* fix ipex with ipex_llm
* update
* update
* update
* update
* update
* update
* update
* update
2024-03-28 13:54:40 +08:00
Cheen Hau, 俊豪
1c5eb14128
Update pip install to use --extra-index-url for ipex package ( #10557 )
* Change to 'pip install .. --extra-index-url' for readthedocs
* Change to 'pip install .. --extra-index-url' for examples
* Change to 'pip install .. --extra-index-url' for remaining files
* Fix URL for ipex
* Add links for ipex US and CN servers
* Update ipex cpu url
* remove readme
* Update for github actions
* Update for dockerfiles
2024-03-28 09:56:23 +08:00
Wang, Jian4
e2d25de17d
Update docker by heyang ( #29 )
2024-03-25 10:05:46 +08:00
Wang, Jian4
9df70d95eb
Refactor bigdl.llm to ipex_llm ( #24 )
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* from bigdl.llm to from ipex_llm
2024-03-22 15:41:21 +08:00
Heyang Sun
c672e97239
Fix CPU finetuning docker ( #10494 )
* Fix CPU finetuning docker
* Update README.md
2024-03-21 11:53:30 +08:00
Shaojun Liu
0e388f4b91
Fix Trivy Docker Image Vulnerabilities for BigDL Release 2.5.0 ( #10447 )
* Update pypi version to fix trivy issues
* refine
2024-03-19 14:52:15 +08:00
Wang, Jian4
1de13ea578
LLM: remove CPU english_quotes dataset and update docker example ( #10399 )
* update dataset
* update readme
* update docker cpu
* update xpu docker
2024-03-18 10:45:14 +08:00
ZehuaCao
146b77f113
fix qlora-finetune Dockerfile ( #10379 )
2024-03-12 13:20:06 +08:00
ZehuaCao
267de7abc3
fix fschat DEP version error ( #10325 )
2024-03-06 16:15:27 +08:00
Lilac09
a2ed4d714e
Fix vllm service error ( #10279 )
2024-02-29 15:45:04 +08:00
Ziteng Zhang
e08c74f1d1
Fix build error of bigdl-llm-cpu ( #10228 )
2024-02-23 16:30:21 +08:00
Ziteng Zhang
f7e2591f15
[LLM] change IPEX230 to IPEX220 in dockerfile ( #10222 )
* change IPEX230 to IPEX220 in dockerfile
2024-02-23 15:02:08 +08:00
Shaojun Liu
079f2011ea
Update bigdl-llm-finetune-qlora-xpu Docker Image ( #10194 )
* Bump oneapi version to 2024.0
* pip install bitsandbytes scipy
* Pin level-zero-gpu version
* Pin accelerate version 0.23.0
2024-02-21 15:18:27 +08:00
Lilac09
eca69a6022
Fix build error of bigdl-llm-cpu ( #10176 )
* fix build error
* fix build error
* fix build error
* fix build error
2024-02-20 14:50:12 +08:00
Lilac09
f8dcaff7f4
use default python ( #10070 )
2024-02-05 09:06:59 +08:00
Lilac09
72e67eedbb
Add speculative decoding support in docker ( #10058 )
* add speculative environment
* add speculative environment
* add speculative environment
2024-02-01 09:53:53 +08:00
binbin Deng
171fb2d185
LLM: reorganize GPU finetuning examples ( #9952 )
2024-01-25 19:02:38 +08:00
ZehuaCao
51aa8b62b2
add gradio_web_ui to llm-serving image ( #9918 )
2024-01-25 11:11:39 +08:00
Lilac09
de27ddd81a
Update Dockerfile ( #9981 )
2024-01-24 11:10:06 +08:00
Lilac09
a2718038f7
Fix qwen model adapter in docker ( #9969 )
* fix qwen in docker
* add patch for model_adapter.py in fastchat
* add patch for model_adapter.py in fastchat
2024-01-24 11:01:29 +08:00
Lilac09
052962dfa5
Use original fastchat and add bigdl worker in docker image ( #9967 )
* add vllm worker
* add options in entrypoint
2024-01-23 14:17:05 +08:00
Shaojun Liu
32c56ffc71
pip install deps ( #9916 )
2024-01-17 11:03:57 +08:00
ZehuaCao
05ea0ecd70
add pv for llm-serving k8s deployment ( #9906 )
2024-01-16 11:32:54 +08:00
Guancheng Fu
0396fafed1
Update BigDL-LLM-inference image ( #9805 )
* upgrade to oneapi 2024
* Pin level-zero-gpu version
* add flag
2024-01-03 14:00:09 +08:00
Lilac09
a5c481fedd
add chat.py dependency in Dockerfile ( #9699 )
2023-12-18 09:00:22 +08:00
Lilac09
3afed99216
fix path issue ( #9696 )
2023-12-15 11:21:49 +08:00
ZehuaCao
d204125e88
[LLM] Use Dockerfile.k8s to build a slimmer docker image for k8s ( #9608 )
* Create Dockerfile.k8s
* Update Dockerfile
Slimmer standalone image
* Update Dockerfile
* Update Dockerfile.k8s
* Update bigdl-qlora-finetuing-entrypoint.sh
* Update qlora_finetuning_cpu.py
* Update alpaca_qlora_finetuning_cpu.py
Refer to this [pr](https://github.com/intel-analytics/BigDL/pull/9551/files#diff-2025188afa54672d21236e6955c7c7f7686bec9239532e41c7983858cc9aaa89) and update the LoraConfig
* update
* update
* update
* update
* update
* update
* update
* update transformer version
* update Dockerfile
* update Docker image name
* fix error
2023-12-08 10:25:36 +08:00
Heyang Sun
4e70e33934
[LLM] code and document for distributed qlora ( #9585 )
* [LLM] code and document for distributed qlora
* doc
* refine for gradient checkpoint
* refine
* Update alpaca_qlora_finetuning_cpu.py
* Update alpaca_qlora_finetuning_cpu.py
* Update alpaca_qlora_finetuning_cpu.py
* add link in doc
2023-12-06 09:23:17 +08:00
Guancheng Fu
8b00653039
fix doc ( #9599 )
2023-12-05 13:49:31 +08:00
Heyang Sun
74fd7077a2
[LLM] Multi-process and distributed QLoRA on CPU platform ( #9491 )
* [LLM] Multi-process and distributed QLoRA on CPU platform
* Update README.md
* Update README.md
* Update README.md
* Update README.md
* enable llm-init and bind to socket
* refine
* Update Dockerfile
* add all files of qlora cpu example to /bigdl
* fix
* fix k8s
* Update bigdl-qlora-finetuing-entrypoint.sh
* Update bigdl-qlora-finetuing-entrypoint.sh
* Update bigdl-qlora-finetuning-job.yaml
* fix train sync and performance issues
* add node affinity
* prevent user from tuning cpu per pod
* Update bigdl-qlora-finetuning-job.yaml
2023-12-01 13:47:19 +08:00
Lilac09
b785376f5c
Add vllm-example to docker inference image ( #9570 )
* add vllm-serving to cpu image
* add vllm-serving to cpu image
* add vllm-serving
2023-11-30 17:04:53 +08:00
Lilac09
2554ba0913
Add usage of vllm ( #9564 )
* add usage of vllm
* add usage of vllm
* add usage of vllm
* add usage of vllm
* add usage of vllm
* add usage of vllm
2023-11-30 14:19:23 +08:00
Lilac09
557bb6bbdb
add check for running serve ( #9555 )
2023-11-29 16:57:00 +08:00
Guancheng Fu
2b200bf2f2
Add vllm_worker-related arguments to docker serving image's entrypoint ( #9500 )
* fix entrypoint
* fix missing long mode argument
2023-11-21 14:41:06 +08:00
Lilac09
566ec85113
add stream interval option to entrypoint ( #9498 )
2023-11-21 09:47:32 +08:00
Lilac09
13f6eb77b4
Add exec bash to entrypoint.sh to keep the container running after boot ( #9471 )
* add bigdl-llm-init
* boot bash
2023-11-15 16:09:16 +08:00
Lilac09
24146d108f
add bigdl-llm-init ( #9468 )
2023-11-15 14:55:33 +08:00
Lilac09
b2b085550b
Remove bigdl-nano and add ipex into inference-cpu image ( #9452 )
* remove bigdl-nano and add ipex into inference-cpu image
* remove bigdl-nano in docker
* remove bigdl-nano in docker
2023-11-14 10:50:52 +08:00
Wang, Jian4
0f78ebe35e
LLM: Add qlora cpu finetune docker image ( #9271 )
* init qlora cpu docker image
* update
* remove ipex and update
* update
* update readme
* update example and readme
2023-11-14 10:36:53 +08:00
Shaojun Liu
0e5ab5ebfc
update docker tag to 2.5.0-SNAPSHOT ( #9443 )
2023-11-13 16:53:40 +08:00
Lilac09
5d4ec44488
Add all-in-one benchmark into inference-cpu docker image ( #9433 )
* add all-in-one into inference-cpu image
* manually_build
* revise files
2023-11-13 13:07:56 +08:00