Xu, Shuo 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								62318964fa 
								
							 
						 
						
							
							
								
								Update llama example information ( #12640 )  
							
							 
							
							
							
							
							Co-authored-by: ATMxsp01 <shou.xu@intel.com> 
							
						 
						
							2025-01-02 13:48:39 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								81211fd010 
								
							 
						 
						
							
							
								
								remove unused code ( #12635 )  
							
							 
							
							
							
						 
						
							2025-01-02 13:31:09 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								534566e290 
								
							 
						 
						
							
							
								
								[NPU] Support minicpm-v with python cpp backend ( #12637 )  
							
							 
							
							
							
						 
						
							2025-01-02 11:13:15 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f289f68d57 
								
							 
						 
						
							
							
								
								small fix ( #12634 )  
							
							 
							
							
							
						 
						
							2024-12-30 17:14:25 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2d08155513 
								
							 
						 
						
							
							
								
								remove bmm, which is only required in ipex 2.0 ( #12630 )  
							
							 
							
							
							
						 
						
							2024-12-27 17:28:57 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f17ccfa61a 
								
							 
						 
						
							
							
								
								[NPU] Fix save-load usage of minicpm models ( #12628 )  
							
							 
							
							
							
						 
						
							2024-12-27 15:56:46 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c72a5db757 
								
							 
						 
						
							
							
								
								remove unused code again ( #12624 )  
							
							 
							
							
							
						 
						
							2024-12-27 14:17:11 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								46eeab4479 
								
							 
						 
						
							
							
								
								[NPU] Fix regression caused by layer_norm change ( #12627 )  
							
							 
							
							
							
						 
						
							2024-12-27 14:08:49 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								90f6709486 
								
							 
						 
						
							
							
								
								remove pipeline examples ( #12626 )  
							
							 
							
							
							
						 
						
							2024-12-27 13:42:28 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zijie Li 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5f04ed7254 
								
							 
						 
						
							
							
								
								[NPU] Update prompt format for baichuan2-pipeline ( #12625 )  
							
							 
							
							
							
						 
						
							2024-12-27 11:30:54 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								34dbdb8ee3 
								
							 
						 
						
							
							
								
								small fix ( #12623 )  
							
							 
							
							
							
						 
						
							2024-12-27 10:19:27 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xu, Shuo 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								55ce091242 
								
							 
						 
						
							
							
								
								Add GLM4-Edge-V GPU example ( #12596 )  
							
							 
							
							
							
							
							* Add GLM4-Edge-V examples
* polish readme
* revert wrong changes
* polish readme
* polish readme
* little polish in reference info and indent
* Small fix and sample output updates
* Update main readme
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com> 
							
						 
						
							2024-12-27 09:40:29 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								796ee571a5 
								
							 
						 
						
							
							
								
								[NPU doc] Update verified platforms ( #12621 )  
							
							 
							
							
							
						 
						
							2024-12-26 17:39:13 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								bbdbbb0d88 
								
							 
						 
						
							
							
								
								[NPU] Compatible with other third-party models like auto-round ( #12620 )  
							
							 
							
							
							
							
							* support third party model
* simplify code
* fix style
* fix sym int4 GW
* code refactor
* fix 
							
						 
						
							2024-12-26 17:25:18 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a9abde0b5d 
								
							 
						 
						
							
							
								
								support passing attn_scale to sdpa ( #12619 )  
							
							 
							
							
							
						 
						
							2024-12-26 16:58:09 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Shaojun Liu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								40a7d2b4f0 
								
							 
						 
						
							
							
								
								Consolidated C-Eval Benchmark Guide for Single-GPU and Multi-GPU Environments ( #12618 )  
							
							 
							
							
							
							
							* run c-eval on multi-GPUs
* Update README.md 
							
						 
						
							2024-12-26 15:23:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zijie Li 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ccc4055058 
								
							 
						 
						
							
							
								
								[NPU] Update prompt format for baichuan2 ( #12615 )  
							
							 
							
							
							
							
							* Update baichuan2.py
* style fix 
							
						 
						
							2024-12-26 11:41:37 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1604b4ead8 
								
							 
						 
						
							
							
								
								small fix ( #12616 )  
							
							 
							
							
							
						 
						
							2024-12-26 11:35:12 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d841e1dc0d 
								
							 
						 
						
							
							
								
								[NPU] update convert script based on latest usage ( #12617 )  
							
							 
							
							
							
						 
						
							2024-12-26 11:23:04 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xu, Shuo 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ef585d3360 
								
							 
						 
						
							
							
								
								Polish Readme for ModelScope-related examples ( #12603 )  
							
							 
							
							
							
						 
						
							2024-12-26 10:52:47 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a596f1ae5f 
								
							 
						 
						
							
							
								
								remove bigdl-llm test to fix langchain UT ( #12613 )  
							
							 
							
							
							
						 
						
							2024-12-26 10:17:25 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9e895f04ec 
								
							 
						 
						
							
							
								
								[NPU] fix npu save ( #12614 )  
							
							 
							
							
							
							
							* fix npu save
* update 
							
						 
						
							2024-12-26 09:21:16 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6249c1e373 
								
							 
						 
						
							
							
								
								rewrite llama optimization ( #12609 )  
							
							 
							
							
							
						 
						
							2024-12-25 17:04:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5f5ac8a856 
								
							 
						 
						
							
							
								
								fix llama related import ( #12611 )  
							
							 
							
							
							
						 
						
							2024-12-25 16:23:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4e6b9d804f 
								
							 
						 
						
							
							
								
								add compresskv back for mistral ( #12607 )  
							
							 
							
							
							
							
							* add compresskv back for mistral
* fix
* fix 
							
						 
						
							2024-12-25 11:06:08 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4135b895b3 
								
							 
						 
						
							
							
								
								refactor chatglm2, internlm, stablelm and qwen ( #12604 )  
							
							 
							
							
							
						 
						
							2024-12-24 18:18:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								073f936c37 
								
							 
						 
						
							
							
								
								refactor mistral and phi3 ( #12605 )  
							
							 
							
							
							
						 
						
							2024-12-24 17:52:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								45f8f72a28 
								
							 
						 
						
							
							
								
								[NPU] Fix minicpm on MTL ( #12599 )  
							
							 
							
							
							
						 
						
							2024-12-24 15:37:56 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ad2dc965c5 
								
							 
						 
						
							
							
								
								refactor mllama, gpt2 and internvl ( #12602 )  
							
							 
							
							
							
						 
						
							2024-12-24 14:18:31 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7aaf02f602 
								
							 
						 
						
							
							
								
								refactor baichuan, glm4 and minicpm3 ( #12600 )  
							
							 
							
							
							
						 
						
							2024-12-24 14:16:30 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zijie Li 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c410d9cf73 
								
							 
						 
						
							
							
								
								[NPU] support asym_int4 for baichuan ( #12576 )  
							
							 
							
							
							
							
							* add npu support for baichuan
* Update baichuan_mp.py
* Update baichuan_mp.py 
							
						 
						
							2024-12-24 09:17:50 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								098eb335b2 
								
							 
						 
						
							
							
								
								refactor sd 1.5 and qwen2-vl and fix ( #12590 )  
							
							 
							
							
							
						 
						
							2024-12-20 17:34:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b050368efc 
								
							 
						 
						
							
							
								
								refactor yuan2 and starcoder2 and fix ( #12589 )  
							
							 
							
							
							
						 
						
							2024-12-20 16:41:50 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6ea8033635 
								
							 
						 
						
							
							
								
								refactor glm edge ( #12588 )  
							
							 
							
							
							
						 
						
							2024-12-20 15:36:57 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xu, Shuo 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b0338c5529 
								
							 
						 
						
							
							
								
								Add --modelscope option for glm-v4 MiniCPM-V-2_6 glm-edge and internvl2 ( #12583 )  
							
							 
							
							
							
							
							* Add --modelscope option for glm-v4 and MiniCPM-V-2_6
* glm-edge
* minicpm-v-2_6: don't use model_hub=modelscope when using lowbit; internvl2
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com> 
							
						 
						
							2024-12-20 13:54:17 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f3b5fad3be 
								
							 
						 
						
							
							
								
								refactor qwen2 and llama3 ( #12587 )  
							
							 
							
							
							
						 
						
							2024-12-20 13:25:25 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xu, Shuo 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								47da3c999f 
								
							 
						 
						
							
							
								
								Add --modelscope in GPU examples for minicpm, minicpm3, baichuan2 ( #12564 )  
							
							 
							
							
							
							
							* Add --modelscope for more models
* minicpm
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com> 
							
						 
						
							2024-12-19 17:25:46 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3eeb02f1be 
								
							 
						 
						
							
							
								
								support Megrez-3B-Omni ( #12582 )  
							
							 
							
							
							
						 
						
							2024-12-19 17:23:01 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4e7e988f70 
								
							 
						 
						
							
							
								
								[NPU] Fix MTL and ARL support ( #12580 )  
							
							 
							
							
							
						 
						
							2024-12-19 16:55:30 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								80f2fdc37b 
								
							 
						 
						
							
							
								
								optimize new minicpm model ( #12579 )  
							
							 
							
							
							
						 
						
							2024-12-19 14:22:47 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4540424271 
								
							 
						 
						
							
							
								
								optimize siglip attention again ( #12578 )  
							
							 
							
							
							
						 
						
							2024-12-19 13:40:48 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e0921f80c1 
								
							 
						 
						
							
							
								
								padding mask on torch side ( #12577 )  
							
							 
							
							
							
						 
						
							2024-12-19 10:53:02 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xu, Shuo 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								47e90a362f 
								
							 
						 
						
							
							
								
								Add --modelscope in GPU examples for glm4, codegeex2, qwen2 and qwen2.5 ( #12561 )  
							
							 
							
							
							
							
							* Add --modelscope for more models
* improve readme
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com> 
							
						 
						
							2024-12-19 10:00:39 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e2ae42929a 
								
							 
						 
						
							
							
								
								small fix ( #12573 )  
							
							 
							
							
							
						 
						
							2024-12-18 15:48:22 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a4eb561f36 
								
							 
						 
						
							
							
								
								optimize siglip attention on arc ( #12569 )  
							
							 
							
							
							
						 
						
							2024-12-18 14:19:43 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zijie Li 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1a2ab12876 
								
							 
						 
						
							
							
								
								[NPU] support asym_int4 for minicpm ( #12567 )  
							
							 
							
							
							
						 
						
							2024-12-18 10:55:35 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6278cafc25 
								
							 
						 
						
							
							
								
								Add setuptools as a basic dependency ( #12563 )  
							
							 
							
							
							
							
							* Add setuptools as a basic dependency
* Remove unnecessary requirements of setuptools in example/unit/nightly tests 
							
						 
						
							2024-12-17 16:56:41 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zijie Li 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fcb474820d 
								
							 
						 
						
							
							
								
								[NPU] support asym_int4 for llama ( #12556 )  
							
							 
							
							
							
							
							* add llama-imatrix
* fix bugs in llama.py
* style fix 
							
						 
						
							2024-12-17 14:01:17 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a608f26cc8 
								
							 
						 
						
							
							
								
								use new fused layer norm ( #12553 )  
							
							 
							
							
							
						 
						
							2024-12-17 13:52:35 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								680ea7e4a8 
								
							 
						 
						
							
							
								
								[NPU doc] Update configuration for different platforms ( #12554 )  
							
							 
							
							
							
						 
						
							2024-12-17 10:15:09 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xu, Shuo 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ccc18eefb5 
								
							 
						 
						
							
							
								
								Add Modelscope option for chatglm3 on GPU ( #12545 )  
							
							 
							
							
							
							
							* Add Modelscope option for GPU model chatglm3
* Update readme
* Update readme
* Update readme
* Update readme
* format update
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com> 
							
						 
						
							2024-12-16 20:00:37 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5ae0006103 
								
							 
						 
						
							
							
								
								remove old rope usage ( #12552 )  
							
							 
							
							
							
						 
						
							2024-12-16 15:59:36 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chu,Youcheng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a86487c539 
								
							 
						 
						
							
							
								
								Add GLM-Edge GPU example ( #12483 )  
							
							 
							
							
							
							
							* feat: initial commit
* generate.py and README updates
* Update link for main readme
* Update based on comments
* Small fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com> 
							
						 
						
							2024-12-16 14:39:19 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jun Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0b953e61ef 
								
							 
						 
						
							
							
								
								[REFINE] graphmode code ( #12540 )  
							
							 
							
							
							
						 
						
							2024-12-16 09:17:01 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								caf15cc5ef 
								
							 
						 
						
							
							
								
								[NPU] Add IPEX_LLM_NPU_MTL to enable support on mtl ( #12543 )  
							
							 
							
							
							
						 
						
							2024-12-13 17:01:13 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c090d167dc 
								
							 
						 
						
							
							
								
								remove old rope usage ( #12544 )  
							
							 
							
							
							
						 
						
							2024-12-13 16:54:58 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d20a968ce2 
								
							 
						 
						
							
							
								
								[NPU] Fix generate example ( #12541 )  
							
							 
							
							
							
						 
						
							2024-12-13 14:07:24 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								15219944b8 
								
							 
						 
						
							
							
								
								optimize glm edge again ( #12539 )  
							
							 
							
							
							
						 
						
							2024-12-13 13:52:39 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6596c18489 
								
							 
						 
						
							
							
								
								[NPU] Modify IPEX_LLM_NPU_DISABLE_COMPILE_OPT setting for long input ( #12537 )  
							
							 
							
							
							
						 
						
							2024-12-13 13:49:56 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7cc01fdc86 
								
							 
						 
						
							
							
								
								[NPU] further fix of new_value_states ( #12538 )  
							
							 
							
							
							
						 
						
							2024-12-13 13:42:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fa261b8af1 
								
							 
						 
						
							
							
								
								torch 2.3 inference docker ( #12517 )  
							
							 
							
							
							
							
							* torch 2.3 inference docker
* Update README.md
* add convert code
* rename image
* remove 2.1 and add graph example
* Update README.md 
							
						 
						
							2024-12-13 10:47:04 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f36c23664f 
								
							 
						 
						
							
							
								
								[NPU] Fix abnormal output with latest driver ( #12530 )  
							
							 
							
							
							
						 
						
							2024-12-12 17:56:30 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ffce86d69f 
								
							 
						 
						
							
							
								
								add basic glm-edge-v support ( #12533 )  
							
							 
							
							
							
						 
						
							2024-12-12 17:25:48 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3e0823d2ae 
								
							 
						 
						
							
							
								
								add basic glm-edge support ( #12531 )  
							
							 
							
							
							
						 
						
							2024-12-12 16:02:22 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								dbaf4abcb3 
								
							 
						 
						
							
							
								
								[NPU] Update C++ example with repetition_penalty & update Python code accordingly ( #12528 )  
							
							 
							
							
							
							
							* Update c++ npu examples with repetition penalty
* Fit python with updated C++ API
* Style fix
* Small fix
* Small fix 
							
						 
						
							2024-12-12 13:42:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Shaojun Liu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2cce89691a 
								
							 
						 
						
							
							
								
								Enable use_batch_forward Optimization on Battlemage GPU ( #12516 )  
							
							 
							
							
							
							
							* Update get_xpu_device_type() to support bmg
* enable use_batch_forward for bmg
* Update low_bit_linear.py
* Update utils.py
* use batch kernel for fp8e5 
							
						 
						
							2024-12-12 12:44:36 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6fc27da9c1 
								
							 
						 
						
							
							
								
								[NPU] Update glm-edge support in docs ( #12529 )  
							
							 
							
							
							
						 
						
							2024-12-12 11:14:09 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								509bdb4661 
								
							 
						 
						
							
							
								
								[NPU] Fix minicpm-2B error ( #12527 )  
							
							 
							
							
							
						 
						
							2024-12-11 16:49:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xu, Shuo 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fd9cf767ed 
								
							 
						 
						
							
							
								
								All-in-one Benchmark run.py: Ignore error if import BenchmarkWrapper failed. ( #12526 )  
							
							 
							
							
							
						 
						
							2024-12-11 16:20:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								41ef4974ab 
								
							 
						 
						
							
							
								
								[NPU] fix transpose_value = False for NPU optimize_model=True ( #12525 )  
							
							 
							
							
							
						 
						
							2024-12-11 15:51:39 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								588bfa24dc 
								
							 
						 
						
							
							
								
								support hqq ( #12518 )  
							
							 
							
							
							
							
							* support
* fix 
							
						 
						
							2024-12-11 15:43:02 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								68f2873bd3 
								
							 
						 
						
							
							
								
								[NPU] Support repetition penalty for simple generate, Python (cpp backend) ( #12522 )  
							
							 
							
							
							
							
							* Initial support of repetition penalty on NPU (cpp backend) for simple generate
* Bug fix for generation config and others
* Remove unnecessary print and style fix
* Remove unnecessary print
* Fix based on comments 
							
						 
						
							2024-12-11 14:55:25 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								77404d2a63 
								
							 
						 
						
							
							
								
								support new model ( #12523 )  
							
							 
							
							
							
						 
						
							2024-12-11 13:41:15 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ea55235cbd 
								
							 
						 
						
							
							
								
								[NPU] Support glm-edge models ( #12511 )  
							
							 
							
							
							
						 
						
							2024-12-09 14:06:27 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								12c78978dd 
								
							 
						 
						
							
							
								
								[NPU C++] Update example with conversation mode support ( #12510 )  
							
							 
							
							
							
						 
						
							2024-12-06 12:46:37 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0918d3baca 
								
							 
						 
						
							
							
								
								[NPU] Fix hf generate with save/load generation config for Python (cpp backend) ( #12509 )  
							
							 
							
							
							
							
							* Fix hf generate with save/load generation config
* Small fix
* Fix based on comments 
							
						 
						
							2024-12-05 19:19:58 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								49ab8974fa 
								
							 
						 
						
							
							
								
								[NPU] initial support of asym_int4_rtn ( #12484 )  
							
							 
							
							
							
							
							* initial support of q4_1
* fix
* fix
* update
* update min to Z1
* update
* fix
* update
* fix style
* fix
* support qwen2 optimize_model=True mp version
* temp save
* fix
* fix style
* replace min with zero
* support split linear for q4_1
* fix lm_head with mixed_precision=True
* fix style
* revert test code
* add down proj back for q4_0
* remove print 
							
						 
						
							2024-12-05 17:40:36 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jinhe 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5e1416c9aa 
								
							 
						 
						
							
							
								
								fix readme for npu cpp examples and llama.cpp ( #12505 )  
							
							 
							
							
							
							
							* fix cpp readme
* fix cpp readme
* fix cpp readme 
							
						 
						
							2024-12-05 12:32:42 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f56a111aa2 
								
							 
						 
						
							
							
								
								[NPU] Fix load-low-bit benchmark script ( #12502 )  
							
							 
							
							
							
						 
						
							2024-12-05 10:01:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								84f1c4ad57 
								
							 
						 
						
							
							
								
								Small fix for NPU Python cpp simple generate regarding eos tokens ( #12501 )  
							
							 
							
							
							
						 
						
							2024-12-04 18:54:06 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Kai Huang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d8b14a6305 
								
							 
						 
						
							
							
								
								Update save/load comments ( #12500 )  
							
							 
							
							
							
						 
						
							2024-12-04 18:51:38 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Kai Huang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b89ea1b0cf 
								
							 
						 
						
							
							
								
								Support save/load model for hf generate ( #12499 )  
							
							 
							
							
							
							
							* change dummy model
* style
* meet review 
							
						 
						
							2024-12-04 18:26:39 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Kai Huang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7d27f134dd 
								
							 
						 
						
							
							
								
								Fix hf generate for llama3.2 ( #12497 )  
							
							 
							
							
							
							
							* fix kv condition
* meet review 
							
						 
						
							2024-12-04 17:54:40 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chu,Youcheng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ffa9a9e1b3 
								
							 
						 
						
							
							
								
								Update streaming in npu examples ( #12495 )  
							
							 
							
							
							
							
							* feat: add streaming
* Update readme accordingly
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com> 
							
						 
						
							2024-12-04 17:51:10 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a9e3f7f14c 
								
							 
						 
						
							
							
								
								optimize minicpm ( #12496 )  
							
							 
							
							
							
						 
						
							2024-12-04 17:14:16 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e0bf0054e1 
								
							 
						 
						
							
							
								
								small fix ( #12493 )  
							
							 
							
							
							
						 
						
							2024-12-04 16:37:39 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Kai Huang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7ff4533b39 
								
							 
						 
						
							
							
								
								Support hf generate ( #12477 )  
							
							 
							
							
							
							
							* generate
* style
* update
* remove timing
* style
* style
* combine generate api
* simple in kwargs 
							
						 
						
							2024-12-04 16:31:09 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ef4028ac2d 
								
							 
						 
						
							
							
								
								[NPU] Support split lm_head for Qwen2 with CPP ( #12491 )  
							
							 
							
							
							
							
							* Use split for Qwen2 lm_head instead of slice in optimize_pre
* Support split lm_head in Qwen2 python cpp backend
* Fit with Python acc lib pipeline
* Removed default mixed_precision=True in all-in-one and related examples
* Small fix
* Style fix
* Fix based on comments
* Fix based on comments
* Style fix
							
						 
						
							2024-12-04 14:41:08 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5629fdd518 
								
							 
						 
						
							
							
								
								optimize qwen2_vl multiple image input or video input ( #12487 )  
							
							 
							
							
							
						 
						
							2024-12-04 09:24:38 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c59284418c 
								
							 
						 
						
							
							
								
								Hotfix of BCE-Embedding model ( #12490 )
							
							 
							
							
							
						 
						
							2024-12-03 18:16:04 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4ac66db034 
								
							 
						 
						
							
							
								
								[NPU] Support streaming in Python (cpp backend) ( #12488 )  
							
							 
							
							
							
							
							* Support streaming in NPU Python (cpp backend)
* Small fix 
							
						 
						
							2024-12-03 17:17:26 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin, Qiao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7082844f3f 
								
							 
						 
						
							
							
								
								Fix NPU LLM example save/load tokenizer ( #12485 )  
							
							 
							
							
							
						 
						
							2024-12-03 16:30:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin, Qiao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5fe766788e 
								
							 
						 
						
							
							
								
								Fix MiniCPM-V-2_6 running on NPU ( #12486 )  
							
							 
							
							
							
						 
						
							2024-12-03 16:16:29 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								598603bea6 
								
							 
						 
						
							
							
								
								small fix of imatrix ( #12480 )  
							
							 
							
							
							
						 
						
							2024-12-03 10:46:36 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ab01753b1c 
								
							 
						 
						
							
							
								
								[NPU] update save-load API usage ( #12473 )  
							
							 
							
							
							
						 
						
							2024-12-03 09:46:15 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								26adb82ee3 
								
							 
						 
						
							
							
								
								[NPU] Remove hard code ( #12479 )  
							
							 
							
							
							
						 
						
							2024-12-02 18:26:07 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b2e56a2e03 
								
							 
						 
						
							
							
								
								Add release support for option xpu_arc ( #12422 )  
							
							 
							
							
							
							
							* Add release support for xpu-arc
* Dependency update 
							
						 
						
							2024-12-02 17:16:04 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								aee9acb303 
								
							 
						 
						
							
							
								
								Add NPU QuickStart & update example links ( #12470 )  
							
							 
							
							
							
							
							* Add initial NPU quickstart (c++ part unfinished)
* Small update
* Update based on comments
* Update main readme
* Remove LLaMA description
* Small fix
* Small fix
* Remove subsection link in main README
* Small fix
* Update based on comments
* Small fix
* TOC update and other small fixes
* Update for Chinese main readme
* Update based on comments and other small fixes
* Change order 
							
						 
						
							2024-12-02 17:03:10 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin, Qiao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								31c69a8d31 
								
							 
						 
						
							
							
								
								Fix MiniCPM-V models running on NPU ( #12478 )  
							
							 
							
							
							
						 
						
							2024-12-02 16:29:46 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								54d9a590d4 
								
							 
						 
						
							
							
								
								[NPU] Fix eos_token setting ( #12475 )
							
							 
							
							
							
						 
						
							2024-12-02 14:18:22 +08:00