ipex-llm

Author	SHA1	Message	Date
Jun Wang	3700e81977	[fix] vllm-online-benchmark first token latency error (#12271 )	2024-10-29 17:54:36 +08:00
Guancheng Fu	67014cb29f	Add benchmark_latency.py to docker serving image (#12283 )	2024-10-28 16:19:59 +08:00
Shaojun Liu	48fc63887d	use oneccl 0.0.5.1 (#12262 )	2024-10-24 16:12:24 +08:00
Jun Wang	b10fc892e1	Update new reference link of xpu/docker/readme.md (#12188 ) * [ADD] rewrite new vllm docker quick start * [ADD] lora adapter doc finished * [ADD] mulit lora adapter test successfully * [ADD] add ipex-llm quantization doc * [Merge] rebase main * [REMOVE] rm tmp file * [Merge] rebase main * [ADD] add prefix caching experiment and result * [REMOVE] rm cpu offloading chapter * [ADD] rewrite new vllm docker quick start * [ADD] lora adapter doc finished * [ADD] mulit lora adapter test successfully * [ADD] add ipex-llm quantization doc * [Merge] rebase main * [REMOVE] rm tmp file * [Merge] rebase main * [ADD] rewrite new vllm docker quick start * [ADD] lora adapter doc finished * [ADD] mulit lora adapter test successfully * [ADD] add ipex-llm quantization doc * [Merge] rebase main * [REMOVE] rm tmp file * [Merge] rebase main * [UPDATE] update the link to new vllm-docker-quickstart	2024-10-18 13:18:08 +08:00
Shaojun Liu	7825dc1398	Upgrade oneccl to 0.0.5 (#12223 )	2024-10-18 09:29:19 +08:00
Shaojun Liu	26390f9213	Update oneccl_wks_installer to 2024.0.0.4.1 (#12217 )	2024-10-17 10:11:55 +08:00
Shaojun Liu	49eb20613a	add --blocksize to doc and script (#12187 )	2024-10-12 09:17:42 +08:00
Shaojun Liu	1daab4531f	Upgrade oneccl to 0.0.4 in serving-xpu image (#12185 ) * Update oneccl to 0.0.4 * upgrade transformers to 4.44.2	2024-10-11 16:54:50 +08:00
Shaojun Liu	657889e3e4	use english prompt by default (#12115 )	2024-09-24 17:40:50 +08:00
Guancheng Fu	b36359e2ab	Fix xpu serving image oneccl (#12100 )	2024-09-20 15:25:41 +08:00
Guancheng Fu	a6cbc01911	Use new oneccl for ipex-llm serving image (#12097 )	2024-09-20 14:52:49 +08:00
Shaojun Liu	1295898830	update vllm_online_benchmark script to support long input (#12095 ) * update vllm_online_benchmark script to support long input * update guide	2024-09-20 14:18:30 +08:00
Xiangyu Tian	c2774e1a43	Update oneccl to 0.0.3 in serving-xpu image (#12088 )	2024-09-18 14:29:17 +08:00
Shaojun Liu	beb876665d	pin gradio version to fix connection error (#12069 )	2024-09-12 14:36:09 +08:00
Shaojun Liu	7e1e51d91a	Update vllm setting (#12059 ) * revert * update * update * update	2024-09-11 11:45:08 +08:00
Shaojun Liu	52863dd567	fix vllm_online_benchmark.py (#12056 )	2024-09-11 09:45:30 +08:00
Guancheng Fu	69c8d36f16	Switching from vLLM v0.3.3 to vLLM 0.5.4 (#12042 ) * Enable single card sync engine * enable ipex-llm optimizations for vllm * enable optimizations for lm_head * Fix chatglm multi-reference problem * Remove duplicate layer * LLM: Update vLLM to v0.5.4 (#11746) * Enable single card sync engine * enable ipex-llm optimizations for vllm * enable optimizations for lm_head * Fix chatglm multi-reference problem * update 0.5.4 api_server * add dockerfile * fix * fix * refine * fix --------- Co-authored-by: gc-fu <guancheng.fu@intel.com> * Add vllm-0.5.4 Dockerfile (#11838) * Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957) * Fix vLLM not convert issues (#11817) (#11918) * Fix not convert issues * refine Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com> * Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969) * init * update mlp forward * fix minicpm error in vllm 0.5.4 * fix dependabot alerts (#12008) * Update 0.5.4 dockerfile (#12021) * Add vllm awq loading logic (#11987) * [ADD] Add vllm awq loading logic * [FIX] fix the module.linear_method path * [FIX] fix quant_config path error * Enable Qwen padding mlp to 256 to support batch_forward (#12030) * Enable padding mlp * padding to 256 * update style * Install 27191 runtime in 0.5.4 docker image (#12040) * fix rebase error * fix rebase error * vLLM: format for 0.5.4 rebase (#12043) * format * Update model_convert.py * Fix serving docker related modifications (#12046) * Fix undesired modifications (#12048) * fix * Refine offline_inference arguments --------- Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com> Co-authored-by: Jun Wang <thoughts.times@gmail.com> Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com> Co-authored-by: liu-shaojun <johnssalyn@outlook.com> Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com>	2024-09-10 15:37:43 +08:00
Shaojun Liu	1e8c87050f	fix model path (#11973 )	2024-08-30 13:28:28 +08:00
Shaojun Liu	23f51f87f0	update tag to 2.2.0-SNAPSHOT (#11947 )	2024-08-28 09:20:32 +08:00
Shaojun Liu	4cf640c548	update docker image tag to 2.2.0-SNAPSHOT (#11904 )	2024-08-23 13:57:41 +08:00
Wang, Jian4	b119825152	Remove tgi parameter validation (#11688 ) * remove validation * add min warm up * remove no need source	2024-07-30 16:37:44 +08:00
Guancheng Fu	86fc0492f4	Update oneccl used (#11647 ) * Add internal oneccl * fix * fix * add oneccl	2024-07-26 09:38:39 +08:00
Wang, Jian4	1eed0635f2	Add lightweight serving and support tgi parameter (#11600 ) * init tgi request * update openai api * update for pp * update and add readme * add to docker * add start bash * update * update * update	2024-07-19 13:15:56 +08:00
Wang, Jian4	9c15abf825	Refactor fastapi-serving and add one card serving(#11581 ) * init fastapi-serving one card * mv api code to source * update worker * update for style-check * add worker * update bash * update * update worker name and add readme * rename update * rename to fastapi	2024-07-17 11:12:43 +08:00
Xiangyu Tian	7f5111a998	LLM: Refine start script for Pipeline Parallel Serving (#11557 ) Refine start script and readme for Pipeline Parallel Serving	2024-07-11 15:45:27 +08:00
binbin Deng	66f6ffe4b2	Update GPU HF-Transformers example structure (#11526 )	2024-07-08 17:58:06 +08:00
Shaojun Liu	72b4efaad4	Enhanced XPU Dockerfiles: Optimized Environment Variables and Documentation (#11506 ) * Added SYCL_CACHE_PERSISTENT=1 to xpu Dockerfile * Update the document to add explanations for environment variables. * update quickstart	2024-07-04 20:18:38 +08:00
Guancheng Fu	4fbb0d33ae	Pin compute runtime version for xpu images (#11479 ) * pin compute runtime version * fix done	2024-07-01 21:41:02 +08:00
Wang, Jian4	e000ac90c4	Add pp_serving example to serving image (#11433 ) * init pp * update * update * no clone ipex-llm again	2024-06-28 16:45:25 +08:00
Wang, Jian4	b7bc1023fb	Add vllm_online_benchmark.py (#11458 ) * init * update and add * update	2024-06-28 14:59:06 +08:00
Shaojun Liu	5aa3e427a9	Fix docker images (#11362 ) * Fix docker images * add-apt-repository requires gnupg, gpg-agent, software-properties-common * update * avoid importing ipex again	2024-06-20 15:44:55 +08:00
Xiangyu Tian	ef9f740801	Docs: Fix CPU Serving Docker README (#11351 ) Fix CPU Serving Docker README	2024-06-18 16:27:51 +08:00
Guancheng Fu	c9b4cadd81	fix vLLM/docker issues (#11348 ) * fix * fix * ffix	2024-06-18 16:23:53 +08:00
Qiyuan Gong	de4bb97b4f	Remove accelerate 0.23.0 install command in readme and docker (#11333 ) *ipex-llm's accelerate has been upgraded to 0.23.0. Remove accelerate 0.23.0 install command in README and docker。	2024-06-17 17:52:12 +08:00
Shaojun Liu	77809be946	Install packages for ipex-llm-serving-cpu docker image (#11321 ) * apt-get install patch * Update Dockerfile * Update Dockerfile * revert	2024-06-14 15:26:01 +08:00
Shaojun Liu	9760ffc256	Fix SDLe CT222 Vulnerabilities (#11237 ) * fix ct222 vuln * update * fix * update ENTRYPOINT * revert ENTRYPOINT * Fix CT222 Vulns * fix * revert changes * fix * revert * add sudo permission to ipex-llm user * do not use ipex-llm user	2024-06-13 15:31:22 +08:00
Shaojun Liu	84f04087fb	Add intelanalytics/ipex-llm:sources image for OSPDT (#11296 ) * Add intelanalytics/ipex-llm:sources image * apt-get source	2024-06-13 14:29:14 +08:00
Guancheng Fu	2e75bbccf9	Add more control arguments for benchmark_vllm_throughput (#11291 )	2024-06-12 17:43:06 +08:00
Guancheng Fu	eeffeeb2e2	fix benchmark script(#11243 )	2024-06-06 17:44:19 +08:00
Shaojun Liu	1f2057b16a	Fix ipex-llm-cpu docker image (#11213 ) * fix * fix ipex-llm-cpu image	2024-06-05 11:13:17 +08:00
Xiangyu Tian	ac3d53ff5d	LLM: Fix vLLM CPU version error (#11206 ) Fix vLLM CPU version error	2024-06-04 19:10:23 +08:00
Guancheng Fu	3ef4aa98d1	Refine vllm_quickstart doc (#11199 ) * refine doc * refine	2024-06-04 18:46:27 +08:00
Shaojun Liu	744042d1b2	remove software-properties-common from Dockerfile (#11203 )	2024-06-04 17:37:42 +08:00
Guancheng Fu	daf7b1cd56	[Docker] Fix image using two cards error (#11144 ) * fix all * done	2024-05-27 16:20:13 +08:00
Qiyuan Gong	21a1a973c1	Remove axolotl and python3-blinker (#11127 ) * Remove axolotl from image to reduce image size. * Remove python3-blinker to avoid axolotl lib conflict.	2024-05-24 13:54:19 +08:00
Wang, Jian4	1443b802cc	Docker：Fix building cpp_docker and remove unimportant dependencies (#11114 ) * test build * update	2024-05-24 09:49:44 +08:00
Xiangyu Tian	b3f6faa038	LLM: Add CPU vLLM entrypoint (#11083 ) Add CPU vLLM entrypoint and update CPU vLLM serving example.	2024-05-24 09:16:59 +08:00
Shaojun Liu	e0f401d97d	FIX: APT Repository not working (signatures invalid) (#11112 ) * chmod 644 gpg key * chmod 644 gpg key	2024-05-23 16:15:45 +08:00
binbin Deng	ecb16dcf14	Add deepspeed autotp support for xpu docker (#11077 )	2024-05-21 14:49:54 +08:00
Wang, Jian4	00d4410746	Update cpp docker quickstart (#11040 ) * add sample output * update link * update * update header * update	2024-05-16 14:55:13 +08:00

152 commits