Commit graph

187 commits

Shaojun Liu
5c100ac105
Add ENTRYPOINT to Dockerfile to auto-start vllm service on container launch (for CVTE customer) (#12901)
* Add ENTRYPOINT to Dockerfile to auto-start service on container launch (for CVTE client)

* Update start-vllm-service.sh

* Update README.md
2025-02-27 17:33:58 +08:00
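
For context, a minimal sketch of the pattern this entry adds. The script name comes from the commit itself; the /llm path is an assumption:

```dockerfile
# Sketch only: /llm is an assumed location; start-vllm-service.sh is the
# script named in this commit.
WORKDIR /llm
COPY start-vllm-service.sh /llm/start-vllm-service.sh
RUN chmod +x /llm/start-vllm-service.sh

# With an ENTRYPOINT set, `docker run <image>` launches the vLLM service
# immediately, with no explicit command required.
ENTRYPOINT ["/bin/bash", "/llm/start-vllm-service.sh"]
```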
Xiangyu Tian
ae9f5320da
vLLM CPU: Fix Triton Version to Resolve Related Error (#12893) 2025-02-25 15:00:41 +08:00
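
The fix here is a version pin. A hedged sketch of the pattern; the pinned version below is invented for illustration, since the commit does not state it:

```dockerfile
# Pin Triton so pip cannot resolve to an incompatible newer release.
# The version shown is an assumption, not taken from the commit.
RUN pip install --no-cache-dir "triton==3.1.0"
```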
Shaojun Liu
dd30d12cb6
Fix serving-cpu image: setuptools-scm requires setuptools>=61 (#12876)
* setuptools-scm requires setuptools>=61

* Update Dockerfile
2025-02-25 09:10:14 +08:00
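
The commit title states the whole fix: setuptools-scm needs setuptools>=61, so the image must upgrade setuptools before anything that builds with setuptools-scm. A sketch:

```dockerfile
# setuptools-scm requires setuptools>=61 (per this commit), so upgrade
# setuptools first, then install setuptools-scm.
RUN pip install --no-cache-dir --upgrade "setuptools>=61" && \
    pip install --no-cache-dir setuptools-scm
```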
Wang, Jian4
4f2f92afa3
Update inference-cpp docker (#12882)
* remove unused run.py

* add WORKDIR /llm
2025-02-24 14:32:44 +08:00
Shaojun Liu
afad979168
Add Apache 2.0 License Information in Dockerfile to Comply with OSPDT Requirements (#12878)
* OSPDT: add Header for Dockerfile
2025-02-24 14:00:46 +08:00
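
The OSPDT change amounts to prefixing each Dockerfile with the standard Apache 2.0 short-form notice; the copyright line below is an assumption, the rest is the stock license text:

```dockerfile
# Copyright 2016 The BigDL Authors. (copyright holder/year assumed)
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
```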
Wang, Jian4
e1809a6295
Update multimodal on vllm 0.6.6 (#12816)
* add glm4v and minicpmv example

* fix
2025-02-19 10:04:42 +08:00
Shaojun Liu
f7b5a093a7
Merge CPU & XPU Dockerfiles with Serving Images and Refactor (#12815)
* Update Dockerfile

* Ensure scripts are executable

* remove inference-cpu and inference-xpu

* update README
2025-02-17 14:23:22 +08:00
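
One idiomatic way to merge per-device Dockerfiles is a build argument that selects the final stage. Everything below (stage contents, base images, package extras) is assumed for illustration; the commit only records that the CPU and XPU Dockerfiles were merged:

```dockerfile
# Hypothetical sketch: one Dockerfile, two device targets.
# Build with: docker build --build-arg DEVICE=xpu .
ARG DEVICE=cpu

FROM python:3.11-slim AS cpu
RUN pip install --no-cache-dir ipex-llm

FROM intel/oneapi-basekit AS xpu
RUN apt-get update && apt-get install -y python3-pip && \
    pip3 install --no-cache-dir "ipex-llm[xpu]"

# The final image is whichever stage DEVICE names.
FROM ${DEVICE}
```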
Wang, Jian4
1083fe5508
Re-enable pp and lightweight-serving on 0.6.6 (#12814)
* re-enable pp and lightweight-serving on 0.6.6

* update readme

* update

* update tag
2025-02-13 10:16:00 +08:00
Guancheng Fu
af693425f1
Upgrade to vLLM 0.6.6 (#12796)
* init

* update engine init

* fix serving load_in_low_bit problem

* temp

* fix

* fixed

* done

* fix all arguments

* fix throughput script

* use official ipex-llm

* Fix readme

---------

Co-authored-by: hzjane <a1015616934@qq.com>
2025-02-12 16:47:51 +08:00
Shaojun Liu
bd815a4d96
Update the base image of inference-cpp image to oneapi 2025.0.2 (#12802)
* Update Dockerfile
2025-02-12 14:15:08 +08:00
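
The change itself is a one-line base-image bump; the exact Docker Hub tag below is an assumption about how oneAPI 2025.0.2 is published:

```dockerfile
# Base-image bump to oneAPI 2025.0.2; the tag format is assumed.
FROM intel/oneapi-basekit:2025.0.2-0-devel-ubuntu22.04
```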
Xiangyu Tian
9e9b6c9f2b
Fix cpu serving docker image (#12783) 2025-02-07 11:12:42 +08:00
Xiangyu Tian
f924880694
vLLM: Fix vLLM-CPU docker image (#12741) 2025-01-24 10:00:29 +08:00
Xiangyu Tian
c9b6c94a59
vLLM: Update vLLM-cpu to v0.6.6-post1 (#12728)
Update vLLM-cpu to v0.6.6-post1
2025-01-22 15:03:01 +08:00
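
Bumping the CPU image to a new vLLM tag generally means rebuilding vLLM from source at that tag. A sketch assuming vLLM's documented CPU build flow, not taken from this repo's Dockerfile:

```dockerfile
# Assumed recipe: build vLLM's CPU backend at the v0.6.6.post1 tag.
RUN git clone -b v0.6.6.post1 https://github.com/vllm-project/vllm.git && \
    cd vllm && \
    pip install --no-cache-dir -r requirements-cpu.txt && \
    VLLM_TARGET_DEVICE=cpu pip install --no-build-isolation .
```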
Wang, Jian4
716d4fe563
Add vllm 0.6.2 vision offline example (#12721)
* add vision offline example

* add to docker
2025-01-21 09:58:01 +08:00
Shaojun Liu
2673792de6
Update Dockerfile (#12688) 2025-01-10 09:01:29 +08:00
Shaojun Liu
28737c250c
Update Dockerfile (#12585) 2024-12-26 10:20:52 +08:00
Shaojun Liu
51ff9ebd8a
Upgrade oneccl version to 0.0.6.3 (#12560)
* Update Dockerfile

* Update start-vllm-service.sh
2024-12-20 09:29:16 +08:00
Shaojun Liu
429bf1ffeb
Change: Use cn mirror for PyTorch extension installation to resolve network issues (#12559)
* Update Dockerfile
2024-12-17 14:22:50 +08:00
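
A sketch of the mirror switch; the mirror URL and the package name (intel-extension-for-pytorch) are both assumptions, since the commit records only that a cn mirror is used:

```dockerfile
# Assumed mirror and package name; the commit only says installs go
# through a cn mirror to avoid network timeouts.
RUN pip install --no-cache-dir \
    -i https://pypi.tuna.tsinghua.edu.cn/simple \
    intel-extension-for-pytorch
```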
Heyang Sun
fa261b8af1
torch 2.3 inference docker (#12517)
* torch 2.3 inference docker

* Update README.md

* add convert code

* rename image

* remove 2.1 and add graph example

* Update README.md
2024-12-13 10:47:04 +08:00
Wang, Jian4
922958c018
vllm oneccl upgrade to b9 (#12520) 2024-12-10 15:02:56 +08:00
Guancheng Fu
8331875f34
Fix (#12390) 2024-11-27 10:41:58 +08:00
Pepijn de Vos
71e1f11aa6
update serving image runtime (#12433) 2024-11-26 14:55:30 +08:00
Shaojun Liu
c089b6c10d
Update english prompt to 34k (#12429) 2024-11-22 11:20:35 +08:00
Wang, Jian4
1bfcbc0640
Add multimodal benchmark (#12415)
* add benchmark multimodal

* update
2024-11-20 14:21:13 +08:00
Guancheng Fu
d6057f6dd2
Update benchmark_vllm_throughput.py (#12414) 2024-11-19 10:41:43 +08:00
Xu, Shuo
6726b198fd
Update readme & doc for the vllm upgrade to v0.6.2 (#12399)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-14 10:28:15 +08:00
Shaojun Liu
27152476e1
minor fix (#12389) 2024-11-12 22:36:43 +08:00
Xu, Shuo
dd8964ba9c
changed inference-cpp/Dockerfile (#12386)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>
2024-11-12 20:40:21 +08:00
Guancheng Fu
0ee54fc55f
Upgrade to vllm 0.6.2 (#12338)
* Initial updates for vllm 0.6.2

* fix

* Change Dockerfile to support v0.6.2

* Fix

* fix examples

* Fix

* done

* fix

* Update engine.py

* Fix Dockerfile to original path

* fix

* add option

* fix

---------

Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
2024-11-12 20:35:34 +08:00
Jun Wang
4376fdee62
Decouple the openwebui and the ollama in the inference-cpp-xpu dockerfile (#12382)
* remove the openwebui in inference-cpp-xpu dockerfile

* update docker_cpp_xpu_quickstart.md

* add sample output in inference-cpp/readme

* remove the openwebui in main readme
2024-11-12 20:15:23 +08:00
Wang, Jian4
85c9279e6e
Update llama-cpp docker usage (#12387) 2024-11-12 15:30:17 +08:00
Shaojun Liu
c92d76b997
Update oneccl-binding.patch (#12377)
* Add files via upload

* upload oneccl-binding.patch

* Update Dockerfile
2024-11-11 22:34:08 +08:00
Shaojun Liu
fad15c8ca0
Update fastchat demo script (#12367)
* Update README.md

* Update vllm_docker_quickstart.md
2024-11-08 15:42:17 +08:00
Xu, Shuo
ce0c6ae423
Update Readme for FastChat docker demo (#12354)
* update Readme for FastChat docker demo

* update readme

* add 'Serving with FastChat' part in docs

* polish docs

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-07 15:22:42 +08:00
Xu, Shuo
899a30331a
Replace gradio_web_server.patch to adjust webui (#12329)
* replace gradio_web_server.patch to adjust webui

* fix patch problem

---------

Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-11-06 09:16:32 +08:00
Jun Wang
3700e81977
[fix] vllm-online-benchmark first token latency error (#12271) 2024-10-29 17:54:36 +08:00
Guancheng Fu
67014cb29f
Add benchmark_latency.py to docker serving image (#12283) 2024-10-28 16:19:59 +08:00
Shaojun Liu
48fc63887d
use oneccl 0.0.5.1 (#12262) 2024-10-24 16:12:24 +08:00
Jun Wang
b10fc892e1
Update new reference link of xpu/docker/readme.md (#12188)
* [ADD] rewrite new vllm docker quick start

* [ADD] lora adapter doc finished

* [ADD] multi lora adapter test successfully

* [ADD] add ipex-llm quantization doc

* [Merge] rebase main

* [REMOVE] rm tmp file

* [ADD] add prefix caching experiment and result

* [REMOVE] rm cpu offloading chapter

* [UPDATE] update the link to new vllm-docker-quickstart
2024-10-18 13:18:08 +08:00
Shaojun Liu
7825dc1398
Upgrade oneccl to 0.0.5 (#12223) 2024-10-18 09:29:19 +08:00
Shaojun Liu
26390f9213
Update oneccl_wks_installer to 2024.0.0.4.1 (#12217) 2024-10-17 10:11:55 +08:00
Shaojun Liu
49eb20613a
add --blocksize to doc and script (#12187) 2024-10-12 09:17:42 +08:00
Shaojun Liu
1daab4531f
Upgrade oneccl to 0.0.4 in serving-xpu image (#12185)
* Update oneccl to 0.0.4

* upgrade transformers to 4.44.2
2024-10-11 16:54:50 +08:00
Shaojun Liu
657889e3e4
use english prompt by default (#12115) 2024-09-24 17:40:50 +08:00
Guancheng Fu
b36359e2ab
Fix xpu serving image oneccl (#12100) 2024-09-20 15:25:41 +08:00
Guancheng Fu
a6cbc01911
Use new oneccl for ipex-llm serving image (#12097) 2024-09-20 14:52:49 +08:00
Shaojun Liu
1295898830
update vllm_online_benchmark script to support long input (#12095)
* update vllm_online_benchmark script to support long input

* update guide
2024-09-20 14:18:30 +08:00
Xiangyu Tian
c2774e1a43
Update oneccl to 0.0.3 in serving-xpu image (#12088) 2024-09-18 14:29:17 +08:00
Shaojun Liu
beb876665d
pin gradio version to fix connection error (#12069) 2024-09-12 14:36:09 +08:00
Shaojun Liu
7e1e51d91a
Update vllm setting (#12059)
* revert

* update
2024-09-11 11:45:08 +08:00