Shaojun Liu
|
7810b8fb49
|
OSPDT: update dockerfile header (#12908)
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
|
2025-03-03 09:59:11 +08:00 |
|
Shaojun Liu
|
5c100ac105
|
Add ENTRYPOINT to Dockerfile to auto-start vllm service on container launch (for CVTE customer) (#12901)
* Add ENTRYPOINT to Dockerfile to auto-start service on container launch (for CVTE client)
* Update start-vllm-service.sh
* Update README.md
* Update README.md
* Update start-vllm-service.sh
* Update README.md
|
2025-02-27 17:33:58 +08:00 |
|
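The ENTRYPOINT change above wires the vLLM service into container startup. A minimal sketch of what such a Dockerfile fragment could look like, assuming the `start-vllm-service.sh` script named in the commit (the actual paths and base image are not shown in this log):

```dockerfile
# Sketch only, not the actual ipex-llm Dockerfile.
# Copy the service launcher into the image and make it the container
# entrypoint, so `docker run` starts the vLLM service automatically.
COPY start-vllm-service.sh /llm/
RUN chmod +x /llm/start-vllm-service.sh
ENTRYPOINT ["/bin/bash", "/llm/start-vllm-service.sh"]
```

With an ENTRYPOINT in place, no explicit command is needed at `docker run` time; arguments passed to `docker run` are forwarded to the script.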
Xiangyu Tian
|
ae9f5320da
|
vLLM CPU: Fix Triton Version to Resolve Related Error (#12893)
|
2025-02-25 15:00:41 +08:00 |
|
Shaojun Liu
|
dd30d12cb6
|
Fix serving-cpu image: setuptools-scm requires setuptools>=61 (#12876)
* setuptools-scm requires setuptools>=61
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
|
2025-02-25 09:10:14 +08:00 |
|
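The fix above raises the build-dependency floor: setuptools-scm requires setuptools>=61 (the release that added PEP 621 metadata support). A hedged sketch of the kind of Dockerfile line such a fix involves (the exact package list is an assumption):

```dockerfile
# Sketch only: upgrade setuptools before anything pulls in
# setuptools-scm, which fails against setuptools older than 61.
RUN pip install --upgrade "setuptools>=61" && \
    pip install setuptools-scm
```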
Shaojun Liu
|
afad979168
|
Add Apache 2.0 License Information in Dockerfile to Comply with OSPDT Requirements (#12878)
* OSPDT: add Header for Dockerfile
* OSPDT: add Header for Dockerfile
* OSPDT: add Header for Dockerfile
* OSPDT: add Header for Dockerfile
|
2025-02-24 14:00:46 +08:00 |
|
Wang, Jian4
|
e1809a6295
|
Update multimodal on vllm 0.6.6 (#12816)
* add glm4v and minicpmv example
* fix
|
2025-02-19 10:04:42 +08:00 |
|
Shaojun Liu
|
f7b5a093a7
|
Merge CPU & XPU Dockerfiles with Serving Images and Refactor (#12815)
* Update Dockerfile
* Update Dockerfile
* Ensure scripts are executable
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
* update
* Update Dockerfile
* remove inference-cpu and inference-xpu
* update README
|
2025-02-17 14:23:22 +08:00 |
|
Wang, Jian4
|
1083fe5508
|
Reenable pp and lightweight-serving serving on 0.6.6 (#12814)
* reenable pp and lightweight serving on 0.6.6
* update readme
* update
* update tag
|
2025-02-13 10:16:00 +08:00 |
|
Guancheng Fu
|
af693425f1
|
Upgrade to vLLM 0.6.6 (#12796)
* init
* update engine init
* fix serving load_in_low_bit problem
* temp
* temp
* temp
* temp
* temp
* fix
* fixed
* done
* fix
* fix all arguments
* fix
* fix throughput script
* fix
* fix
* use official ipex-llm
* Fix readme
* fix
---------
Co-authored-by: hzjane <a1015616934@qq.com>
|
2025-02-12 16:47:51 +08:00 |
|
Xiangyu Tian
|
9e9b6c9f2b
|
Fix cpu serving docker image (#12783)
|
2025-02-07 11:12:42 +08:00 |
|
Xiangyu Tian
|
f924880694
|
vLLM: Fix vLLM-CPU docker image (#12741)
|
2025-01-24 10:00:29 +08:00 |
|
Xiangyu Tian
|
c9b6c94a59
|
vLLM: Update vLLM-cpu to v0.6.6-post1 (#12728)
Update vLLM-cpu to v0.6.6-post1
|
2025-01-22 15:03:01 +08:00 |
|
Wang, Jian4
|
716d4fe563
|
Add vllm 0.6.2 vision offline example (#12721)
* add vision offline example
* add to docker
|
2025-01-21 09:58:01 +08:00 |
|
Shaojun Liu
|
28737c250c
|
Update Dockerfile (#12585)
|
2024-12-26 10:20:52 +08:00 |
|
Shaojun Liu
|
51ff9ebd8a
|
Upgrade oneccl version to 0.0.6.3 (#12560)
* Update Dockerfile
* Update Dockerfile
* Update start-vllm-service.sh
|
2024-12-20 09:29:16 +08:00 |
|
Shaojun Liu
|
429bf1ffeb
|
Change: Use cn mirror for PyTorch extension installation to resolve network issues (#12559)
* Update Dockerfile
* Update Dockerfile
* Update Dockerfile
|
2024-12-17 14:22:50 +08:00 |
|
Wang, Jian4
|
922958c018
|
vllm oneccl upgrade to b9 (#12520)
|
2024-12-10 15:02:56 +08:00 |
|
Guancheng Fu
|
8331875f34
|
Fix (#12390)
|
2024-11-27 10:41:58 +08:00 |
|
Pepijn de Vos
|
71e1f11aa6
|
update serving image runtime (#12433)
|
2024-11-26 14:55:30 +08:00 |
|
Shaojun Liu
|
c089b6c10d
|
Update english prompt to 34k (#12429)
|
2024-11-22 11:20:35 +08:00 |
|
Wang, Jian4
|
1bfcbc0640
|
Add multimodal benchmark (#12415)
* add benchmark multimodal
* update
* update
* update
|
2024-11-20 14:21:13 +08:00 |
|
Guancheng Fu
|
d6057f6dd2
|
Update benchmark_vllm_throughput.py (#12414)
|
2024-11-19 10:41:43 +08:00 |
|
Xu, Shuo
|
6726b198fd
|
Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
|
2024-11-14 10:28:15 +08:00 |
|
Guancheng Fu
|
0ee54fc55f
|
Upgrade to vllm 0.6.2 (#12338)
* Initial updates for vllm 0.6.2
* fix
* Change Dockerfile to support v062
* Fix
* fix examples
* Fix
* done
* fix
* Update engine.py
* Fix Dockerfile to original path
* fix
* add option
* fix
* fix
* fix
* fix
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
|
2024-11-12 20:35:34 +08:00 |
|
Shaojun Liu
|
c92d76b997
|
Update oneccl-binding.patch (#12377)
* Add files via upload
* upload oneccl-binding.patch
* Update Dockerfile
|
2024-11-11 22:34:08 +08:00 |
|
Shaojun Liu
|
fad15c8ca0
|
Update fastchat demo script (#12367)
* Update README.md
* Update vllm_docker_quickstart.md
|
2024-11-08 15:42:17 +08:00 |
|
Xu, Shuo
|
ce0c6ae423
|
Update Readme for FastChat docker demo (#12354)
* update Readme for FastChat docker demo
* update readme
* add 'Serving with FastChat' part in docs
* polish docs
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
|
2024-11-07 15:22:42 +08:00 |
|
Xu, Shuo
|
899a30331a
|
Replace gradio_web_server.patch to adjust webui (#12329)
* replace gradio_web_server.patch to adjust webui
* fix patch problem
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
|
2024-11-06 09:16:32 +08:00 |
|
Jun Wang
|
3700e81977
|
[fix] vllm-online-benchmark first token latency error (#12271)
|
2024-10-29 17:54:36 +08:00 |
|
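The first-token-latency fix above concerns where a streaming benchmark takes its timestamp. The actual vllm-online-benchmark script is not reproduced in this log; the toy Python sketch below (all names hypothetical) shows the measurement pattern the fix implies: time-to-first-token (TTFT) must be stamped when the first streamed chunk arrives, not after the whole response is consumed.

```python
import time

def stream_tokens(n_tokens, first_token_delay=0.05, inter_token_delay=0.01):
    """Toy generator standing in for a streaming completion endpoint."""
    time.sleep(first_token_delay)  # simulated prefill cost before token 0
    yield "tok0"
    for i in range(1, n_tokens):
        time.sleep(inter_token_delay)  # simulated decode cost per token
        yield f"tok{i}"

def measure_latency(stream):
    """Return (TTFT, total latency, token count) for a token stream."""
    start = time.perf_counter()
    first_token_time = None
    count = 0
    for _ in stream:
        if first_token_time is None:
            # Stamp TTFT on the FIRST chunk, not at end of stream.
            first_token_time = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first_token_time, total, count

ttft, total, n = measure_latency(stream_tokens(5))
print(f"TTFT={ttft:.3f}s total={total:.3f}s tokens={n}")
```

Stamping TTFT at the end of the loop instead would report total latency as first-token latency, which is the class of error the commit title describes.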
Guancheng Fu
|
67014cb29f
|
Add benchmark_latency.py to docker serving image (#12283)
|
2024-10-28 16:19:59 +08:00 |
|
Shaojun Liu
|
48fc63887d
|
use oneccl 0.0.5.1 (#12262)
|
2024-10-24 16:12:24 +08:00 |
|
Jun Wang
|
b10fc892e1
|
Update new reference link of xpu/docker/readme.md (#12188)
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [ADD] add prefix caching experiment and result
* [REMOVE] rm cpu offloading chapter
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [UPDATE] update the link to new vllm-docker-quickstart
|
2024-10-18 13:18:08 +08:00 |
|
Shaojun Liu
|
7825dc1398
|
Upgrade oneccl to 0.0.5 (#12223)
|
2024-10-18 09:29:19 +08:00 |
|
Shaojun Liu
|
26390f9213
|
Update oneccl_wks_installer to 2024.0.0.4.1 (#12217)
|
2024-10-17 10:11:55 +08:00 |
|
Shaojun Liu
|
49eb20613a
|
add --blocksize to doc and script (#12187)
|
2024-10-12 09:17:42 +08:00 |
|
Shaojun Liu
|
1daab4531f
|
Upgrade oneccl to 0.0.4 in serving-xpu image (#12185)
* Update oneccl to 0.0.4
* upgrade transformers to 4.44.2
|
2024-10-11 16:54:50 +08:00 |
|
Shaojun Liu
|
657889e3e4
|
use english prompt by default (#12115)
|
2024-09-24 17:40:50 +08:00 |
|
Guancheng Fu
|
b36359e2ab
|
Fix xpu serving image oneccl (#12100)
|
2024-09-20 15:25:41 +08:00 |
|
Guancheng Fu
|
a6cbc01911
|
Use new oneccl for ipex-llm serving image (#12097)
|
2024-09-20 14:52:49 +08:00 |
|
Shaojun Liu
|
1295898830
|
update vllm_online_benchmark script to support long input (#12095)
* update vllm_online_benchmark script to support long input
* update guide
|
2024-09-20 14:18:30 +08:00 |
|
Xiangyu Tian
|
c2774e1a43
|
Update oneccl to 0.0.3 in serving-xpu image (#12088)
|
2024-09-18 14:29:17 +08:00 |
|
Shaojun Liu
|
beb876665d
|
pin gradio version to fix connection error (#12069)
|
2024-09-12 14:36:09 +08:00 |
|
Shaojun Liu
|
7e1e51d91a
|
Update vllm setting (#12059)
* revert
* update
* update
* update
|
2024-09-11 11:45:08 +08:00 |
|
Shaojun Liu
|
52863dd567
|
fix vllm_online_benchmark.py (#12056)
|
2024-09-11 09:45:30 +08:00 |
|
Guancheng Fu
|
69c8d36f16
|
Switching from vLLM v0.3.3 to vLLM 0.5.4 (#12042)
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* Remove duplicate layer
* LLM: Update vLLM to v0.5.4 (#11746)
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* update 0.5.4 api_server
* add dockerfile
* fix
* fix
* refine
* fix
---------
Co-authored-by: gc-fu <guancheng.fu@intel.com>
* Add vllm-0.5.4 Dockerfile (#11838)
* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)
* Fix vLLM not convert issues (#11817) (#11918)
* Fix not convert issues
* refine
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969)
* init
* update mlp forward
* fix minicpm error in vllm 0.5.4
* fix dependabot alerts (#12008)
* Update 0.5.4 dockerfile (#12021)
* Add vllm awq loading logic (#11987)
* [ADD] Add vllm awq loading logic
* [FIX] fix the module.linear_method path
* [FIX] fix quant_config path error
* Enable Qwen padding mlp to 256 to support batch_forward (#12030)
* Enable padding mlp
* padding to 256
* update style
* Install 27191 runtime in 0.5.4 docker image (#12040)
* fix rebase error
* fix rebase error
* vLLM: format for 0.5.4 rebase (#12043)
* format
* Update model_convert.py
* Fix serving docker related modifications (#12046)
* Fix undesired modifications (#12048)
* fix
* Refine offline_inference arguments
---------
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: Jun Wang <thoughts.times@gmail.com>
Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
|
2024-09-10 15:37:43 +08:00 |
|
Shaojun Liu
|
1e8c87050f
|
fix model path (#11973)
|
2024-08-30 13:28:28 +08:00 |
|
Shaojun Liu
|
4cf640c548
|
update docker image tag to 2.2.0-SNAPSHOT (#11904)
|
2024-08-23 13:57:41 +08:00 |
|
Wang, Jian4
|
b119825152
|
Remove tgi parameter validation (#11688)
* remove validation
* add min warm up
* remove no need source
|
2024-07-30 16:37:44 +08:00 |
|
Guancheng Fu
|
86fc0492f4
|
Update oneccl used (#11647)
* Add internal oneccl
* fix
* fix
* add oneccl
|
2024-07-26 09:38:39 +08:00 |
|
Wang, Jian4
|
1eed0635f2
|
Add lightweight serving and support tgi parameter (#11600)
* init tgi request
* update openai api
* update for pp
* update and add readme
* add to docker
* add start bash
* update
* update
* update
|
2024-07-19 13:15:56 +08:00 |
|