Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8418450300 
								
							 
						 
						
							
							
								
								optimize minicpm-o's tts part ( #12833 )  
							
							 
							
							
							
						 
						
							2025-02-17 14:53:37 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wang, Jian4 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1083fe5508 
								
							 
						 
						
							
							
								
								Reenable pp and lightweight-serving serving on 0.6.6 ( #12814 )  
							
							 
							
							
							
							
							* reenable pp and lightweight serving on 0.6.6
* update readme
* update
* update tag 
							
						 
						
							2025-02-13 10:16:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Guancheng Fu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								af693425f1 
								
							 
						 
						
							
							
								
								Upgrade to vLLM 0.6.6 ( #12796 )  
							
							 
							
							
							
							
							* init
* update engine init
* fix serving load_in_low_bit problem
* temp
* temp
* temp
* temp
* temp
* fix
* fixed
* done
* fix
* fix all arguments
* fix
* fix throughput script
* fix
* fix
* use official ipex-llm
* Fix readme
* fix
---------
Co-authored-by: hzjane <a1015616934@qq.com> 
							
						 
						
							2025-02-12 16:47:51 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f8ab833f74 
								
							 
						 
						
							
							
								
								support and optimize janus pro ( #12813 )  
							
							 
							
							
							
						 
						
							2025-02-12 15:07:24 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								73cfe293fa 
								
							 
						 
						
							
							
								
								add basic support for Baichuan-M1-14B-Instruct ( #12808 )  
							
							 
							
							
							
						 
						
							2025-02-11 17:27:42 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xiangyu Tian 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c9b6c94a59 
								
							 
						 
						
							
							
								
								vLLM: Update vLLM-cpu to v0.6.6-post1 ( #12728 )  
							
							 
							
							
							
							
							Update vLLM-cpu to v0.6.6-post1 
							
						 
						
							2025-01-22 15:03:01 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								bda87c21eb 
								
							 
						 
						
							
							
								
								add support and optimization for minicpmo audio part ( #12716 )  
							
							 
							
							
							
						 
						
							2025-01-16 16:39:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b62734748f 
								
							 
						 
						
							
							
								
								add support and optimization for minicpmo vision part ( #12713 )  
							
							 
							
							
							
						 
						
							2025-01-16 14:51:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9d65dcd7ef 
								
							 
						 
						
							
							
								
								Fix deepseek coder with linear rope type support on GPU ( #12709 )  
							
							 
							
							
							
							
							* Fix deepseek coder with linear rope type
* Style fix
* Move to optimize_pre
* Small fix
* Small fix
* Small fix to not affect other cases
* Style fixes
* Update function name
* Small fix
* Small fix
* Small fix
* Fix for low transformers version first
* Style fix
* Small fix 
							
						 
						
							2025-01-15 21:12:34 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								68857494a5 
								
							 
						 
						
							
							
								
								refactor to simplify following upgrade 2 ( #12685 )  
							
							 
							
							
							
						 
						
							2025-01-10 09:29:03 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1ec40cd09e 
								
							 
						 
						
							
							
								
								refactor to simplify following upgrade ( #12680 )  
							
							 
							
							
							
						 
						
							2025-01-09 13:34:30 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a22a8c21bb 
								
							 
						 
						
							
							
								
								small fix and remove unused code about ipex ( #12671 )  
							
							 
							
							
							
						 
						
							2025-01-08 17:39:04 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c11f5f0fcd 
								
							 
						 
						
							
							
								
								also convert SdpaAttention in optimize_model ( #12673 )  
							
							 
							
							
							
						 
						
							2025-01-08 16:48:03 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ccf618ff4a 
								
							 
						 
						
							
							
								
								Remove all ipex usage ( #12666 )  
							
							 
							
							
							
						 
						
							2025-01-08 10:31:18 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								29ad5c449e 
								
							 
						 
						
							
							
								
								refactor codegeex to remove ipex kernel usage ( #12664 )  
							
							 
							
							
							
						 
						
							2025-01-07 16:17:40 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ddc0ef3993 
								
							 
						 
						
							
							
								
								refactor device check and remove cohere/mixtral support ( #12659 )  
							
							 
							
							
							
						 
						
							2025-01-07 11:15:51 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ea65e4fecc 
								
							 
						 
						
							
							
								
								remove falcon support and related UT ( #12656 )  
							
							 
							
							
							
						 
						
							2025-01-07 09:26:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								502461d836 
								
							 
						 
						
							
							
								
								remove unnecessary ipex kernel usage ( #12649 )  
							
							 
							
							
							
						 
						
							2025-01-03 16:45:24 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8e5328e9b4 
								
							 
						 
						
							
							
								
								add disable opts for awq ( #12641 )  
							
							 
							
							
							
						 
						
							2025-01-02 15:45:22 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2d08155513 
								
							 
						 
						
							
							
								
								remove bmm, which is only required in ipex 2.0 ( #12630 )  
							
							 
							
							
							
						 
						
							2024-12-27 17:28:57 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c72a5db757 
								
							 
						 
						
							
							
								
								remove unused code again ( #12624 )  
							
							 
							
							
							
						 
						
							2024-12-27 14:17:11 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1604b4ead8 
								
							 
						 
						
							
							
								
								small fix ( #12616 )  
							
							 
							
							
							
						 
						
							2024-12-26 11:35:12 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6249c1e373 
								
							 
						 
						
							
							
								
								rewrite llama optimization ( #12609 )  
							
							 
							
							
							
						 
						
							2024-12-25 17:04:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								073f936c37 
								
							 
						 
						
							
							
								
								refactor mistral and phi3 ( #12605 )  
							
							 
							
							
							
						 
						
							2024-12-24 17:52:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3eeb02f1be 
								
							 
						 
						
							
							
								
								support Megrez-3B-Omni ( #12582 )  
							
							 
							
							
							
						 
						
							2024-12-19 17:23:01 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a608f26cc8 
								
							 
						 
						
							
							
								
								use new fused layer norm ( #12553 )  
							
							 
							
							
							
						 
						
							2024-12-17 13:52:35 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ffce86d69f 
								
							 
						 
						
							
							
								
								add basic glm-edge-v support ( #12533 )  
							
							 
							
							
							
						 
						
							2024-12-12 17:25:48 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3e0823d2ae 
								
							 
						 
						
							
							
								
								add basic glm-edge support ( #12531 )  
							
							 
							
							
							
						 
						
							2024-12-12 16:02:22 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								77404d2a63 
								
							 
						 
						
							
							
								
								support new model ( #12523 )  
							
							 
							
							
							
						 
						
							2024-12-11 13:41:15 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a9e3f7f14c 
								
							 
						 
						
							
							
								
								optimize minicpm ( #12496 )  
							
							 
							
							
							
						 
						
							2024-12-04 17:14:16 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6f3441ba4c 
								
							 
						 
						
							
							
								
								fix glm4-9b overflow ( #12455 )  
							
							 
							
							
							
						 
						
							2024-11-27 17:39:13 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								cdd41f5e4c 
								
							 
						 
						
							
							
								
								optimize sdxl again ( #12441 )  
							
							 
							
							
							
						 
						
							2024-11-25 17:46:46 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8164aed802 
								
							 
						 
						
							
							
								
								small change ( #12439 )  
							
							 
							
							
							
						 
						
							2024-11-25 14:35:49 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								be132c4209 
								
							 
						 
						
							
							
								
								fix and optimize sd ( #12436 )  
							
							 
							
							
							
						 
						
							2024-11-25 14:09:48 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e0918934c8 
								
							 
						 
						
							
							
								
								Add fused_mlp to glm4v models ( #12378 )  
							
							 
							
							
							
						 
						
							2024-11-11 17:10:25 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1a6cbc473f 
								
							 
						 
						
							
							
								
								Add fused mlp optimizations to glm4 models ( #12360 )  
							
							 
							
							
							
							
							* Add fused mlp to glm4 models
* Small fix 
							
						 
						
							2024-11-07 18:52:47 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								872a74481a 
								
							 
						 
						
							
							
								
								Small optimization to glm4 models ( #12351 )  
							
							 
							
							
							
						 
						
							2024-11-06 19:16:58 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e23ef7d088 
								
							 
						 
						
							
							
								
								optimize glm4v's vision part ( #12346 )  
							
							 
							
							
							
						 
						
							2024-11-06 15:43:40 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c8b7265359 
								
							 
						 
						
							
							
								
								Add basic glm4v support ( #12345 )  
							
							 
							
							
							
						 
						
							2024-11-06 13:50:10 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zhao Changmin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1b637e4477 
								
							 
						 
						
							
							
								
								Add chatglm2&3 fuse mlp ( #12328 )  
							
							 
							
							
							
							
							* add chatglm fuse mlp 
							
						 
						
							2024-11-04 18:04:41 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								97a0f7fd35 
								
							 
						 
						
							
							
								
								Codegeex support ( #12303 )  
							
							 
							
							
							
							
							* new codegeex attn
* use kv cache
* add compress/quantize kv
* remove compress/quantize kv
* fix style check
* fix style
* fix codegeex 
							
						 
						
							2024-10-31 15:28:56 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								43b25a2fe7 
								
							 
						 
						
							
							
								
								Fix llama 3.2 vision on LNL ( #12264 )  
							
							 
							
							
							
							
							* Fix llama 3.2 vision on LNL
* Small fix 
							
						 
						
							2024-10-25 16:23:31 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f3a2b20e6b 
								
							 
						 
						
							
							
								
								Optimize gpt2 ( #12259 )  
							
							 
							
							
							
						 
						
							2024-10-24 13:44:24 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b3df47486d 
								
							 
						 
						
							
							
								
								Fix Gemma 2 on LNL ( #12240 )  
							
							 
							
							
							
							
							* Fix gemma 2 on LNL
* Python style fix 
							
						 
						
							2024-10-21 18:25:53 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a4a758656a 
								
							 
						 
						
							
							
								
								refactor gemma to reduce old fuse rope usage ( #12215 )  
							
							 
							
							
							
						 
						
							2024-10-16 17:40:28 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e279148aa0 
								
							 
						 
						
							
							
								
								optimize llama3.2 vision again ( #12211 )  
							
							 
							
							
							
						 
						
							2024-10-16 14:29:48 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d5344587ab 
								
							 
						 
						
							
							
								
								optimize internvl2 vision model's attention ( #12198 )  
							
							 
							
							
							
						 
						
							2024-10-15 10:51:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f8d1adc573 
								
							 
						 
						
							
							
								
								Fix Llama 3.2 & 3.1 on LNL ( #12196 )  
							
							 
							
							
							
						 
						
							2024-10-14 17:39:20 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								535bee5381 
								
							 
						 
						
							
							
								
								fix qwen2 vl again ( #12174 )  
							
							 
							
							
							
						 
						
							2024-10-10 13:50:01 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								78d253165d 
								
							 
						 
						
							
							
								
								optimize qwen2 vl perf again ( #12167 )  
							
							 
							
							
							
						 
						
							2024-10-09 16:43:48 +08:00