4892df61c9  Add qwen2-1.5b in l0 pipeline example (#12306)  (binbin Deng, 2024-10-31 16:44:25 +08:00)
30f668c206  updated transformers & accelerate requirements (#12301)  (Jinhe, 2024-10-31 15:59:40 +08:00)
97a0f7fd35  Codegeex support (#12303)  (Xin Qiu, 2024-10-31 15:28:56 +08:00)
    * new codegeex attn
    * use kv cache
    * add compress/quantize kv
    * remove compress/quantize kv
    * fix style check
    * fix style
    * fix codegeex
72605c7016  fix llama3.1/3.2 quantize kv check (#12302)  (Yishuo Wang, 2024-10-31 11:55:07 +08:00)
416c19165c  Add Qwen pipeline and example (#12292)  (Kai Huang, 2024-10-31 11:25:25 +08:00)
    * support qwen pipeline
    * update error msg
    * style
    * meet review
    * minor
4cf1ccc43a  Update DPO README.md (#12162)  (Rahul Nair, 2024-10-31 10:56:46 +08:00)
    bitsandbytes multi-backend is now available and is required; otherwise it would error out saying that no CUDA is available.
29400e2e75  feat: change oneccl to internal (#12296)  (Chu,Youcheng, 2024-10-31 09:51:43 +08:00)
    * feat: change oneccl
    * fix: restore llama-70b
    * fix: remove tab
    * fix: remove extra blank
    * small fix
    * add comments
    * fix: add a blank space
6f22133efc  Update AWQ and GPTQ GPU example (#12300)  (Zijie Li, 2024-10-31 09:35:31 +08:00)
0763268e4c  [NPU]Qwen2 groupwise performance opt (#12299)  (Yina Chen, 2024-10-30 17:40:21 +08:00)
    * qwen2 gw performance opt
    * remove debug
41b8064554  Support minicpm-1B in level0 pipeline (#12297)  (binbin Deng, 2024-10-30 17:21:47 +08:00)
46d8300f6b  bugfix for qlora finetuning on GPU (#12298)  (Jinhe, 2024-10-30 16:54:10 +08:00)
    * bugfix for qlora 100 step error
    * indent fix
    * annotation fix
70037ad55f  Groupwise prefill optimization (#12291)  (Yina Chen, 2024-10-30 14:59:45 +08:00)
    * except lm_head
    * remove
    * support gw lm_head
    * update
    * fix
    * remove run.bat
    * fix style
    * support llama3
    * slice -> split
    * remove debug
    * fix style
    * add dpu
540eaeb12c  refactor attention_softmax (#12295)  (Yishuo Wang, 2024-10-30 13:20:50 +08:00)
2b2cb9c693  [NPU pipeline] Support save & load and update examples (#12293)  (Ruonan Wang, 2024-10-30 10:02:00 +08:00)
    * support save & load, update llama examples
    * update baichuan2 example
    * update readme
5a15098835  Initial support for quantized forward on CPU when quantization_group_size=0 (#12282)  (Yuwen Hu, 2024-10-29 19:40:17 +08:00)
    * Initial support for quantized forward on CPU when quantization_group_size=0
    * Style fix
    * Style fix
    * Small fix
    * Small fix
3feb58d1e4  Support baichuan2 for level0 pipeline (#12289)  (binbin Deng, 2024-10-29 19:24:16 +08:00)
546f455e8e  Patch sdpa check function in specific module attributes table (#12285)  (Zhao Changmin, 2024-10-29 18:41:09 +08:00)
821b0033ed  [NPU L0] update layernorm & code refactor (#12287)  (Ruonan Wang, 2024-10-29 15:01:45 +08:00)
    * update layernorm & code refactor
    * fix style
    * add common utils
    * change to Pool()
    * remove print
4467645088  [NPU] Support l0 Llama groupwise (#12276)  (Yina Chen, 2024-10-28 17:06:55 +08:00)
    * except lm_head
    * remove
    * support gw lm_head
    * update
    * fix
    * remove run.bat
    * fix style
    * support llama3
3fe2ea3081  [NPU] Reuse prefill of acc lib for pipeline (#12279)  (Ruonan Wang, 2024-10-28 16:05:49 +08:00)
    * first commit
    * update example
    * fix style
    * update example
    * embedding as const
    * fix generate
    * code refactor
    * meet code review
    * fix style
    * change max_output_len to max_context_len
    * fix all-in-one
    * fix example
    * add check for new tokens
ec362e6133  Add llama3 level0 example (#12275)  (binbin Deng, 2024-10-28 09:24:51 +08:00)
08cb065370  hot-fix redundant import funasr (#12277)  (SONG Ge, 2024-10-25 19:40:39 +08:00)
a0c6432899  [NPU] Add support for loading a FunASR model (#12073)  (SONG Ge, 2024-10-25 17:22:01 +08:00)
    * add support for loading funasr model
    * add initial support for paraformer-encoder
    * add npu ops impl
    * add encoder-decoder npu pipeline
    * move paraformer encoders prefix 30 layers to npu and keep the rest layers on cpu
854398f6e0  update example to reduce peak memory usage (#12274)  (Ruonan Wang, 2024-10-25 17:09:26 +08:00)
e713296090  Update all-in-one benchmark (#12272)  (Yuwen Hu, 2024-10-25 16:52:59 +08:00)
    * Update all-in-one benchmark
    * Small fix
    * Small fix
    * Small fix
43b25a2fe7  Fix llama 3.2 vision on LNL (#12264)  (Yuwen Hu, 2024-10-25 16:23:31 +08:00)
    * Fix llama 3.2 vision on LNL
    * Small fix
93895b2ac2  Openvino all in one benchmark small fix (#12269)  (Yuwen Hu, 2024-10-25 14:13:52 +08:00)
    * Small update for all-in-one benchmark readme to support OpenVINO tests
    * Small fix
f7f62a3fef  Add OpenVINO performance tests to all-in-one benchmark (#12238)  (Zijie Li, 2024-10-25 13:53:53 +08:00)
    * add-openvino-to-all-in-one
    * update on openvino API
    * Update save_openvino.py
    * Update save_openvino.py
    * Update save_openvino.py
    * update on run.py and save_openvino
    * update references
    * Create openvino-requirements.txt
    * fix on comments
    * Small updates
    * Small fix
    * Fix
    Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
ae57e23e4f  fix incompatibility between llama GW & llama pipeline (#12267)  (Ruonan Wang, 2024-10-25 10:31:44 +08:00)
    * fix
    * fix
b5e663854b  [NPU] Support llama groupwise (#12260)  (Yina Chen, 2024-10-24 18:06:45 +08:00)
    * support llama gw
    * support llama gw lm_head
    * fix style
    * remove unused code
39c9d1de52  fix code geex (#12261)  (Xin Qiu, 2024-10-24 14:34:01 +08:00)
f3a2b20e6b  Optimize gpt2 (#12259)  (Yishuo Wang, 2024-10-24 13:44:24 +08:00)
821fd96367  Initial integrate our L0 Llama impl into ipex-llm (#12255)  (Ruonan Wang, 2024-10-24 09:49:27 +08:00)
    * temp save
    * initial support
    * fix
    * simplify code
    * fix style
    * fix example
    * make default value of pipeline as False
cacc891962  Fix PR validation (#12253)  (Yishuo Wang, 2024-10-23 18:10:47 +08:00)
b685cf4349  Fix npu group size setting of optimize_model=False (#12256)  (binbin Deng, 2024-10-23 17:53:54 +08:00)
567b77a76b  Support IR and blob format for llama level0 pipeline (#12251)  (binbin Deng, 2024-10-23 16:02:35 +08:00)
578aef245d  Fix models auto choose SdpaAttention with ipex 2.3 (#12252)  (Yishuo Wang, 2024-10-23 15:33:45 +08:00)
88dc120a4c  fix fp16 linear (#12250)  (Yishuo Wang, 2024-10-23 14:35:19 +08:00)
e8cf7f32f5  npu gw small fix (#12249)  (Yina Chen, 2024-10-23 14:26:01 +08:00)
aae2490cb8  fix UT (#12247)  (Shaojun Liu, 2024-10-23 14:13:06 +08:00)
    * fix ut
    * Update test_transformers_api_attention.py
    * Update test_transformers_api_mlp.py
e37f951cce  [NPU] Groupwise (#12241)  (Yina Chen, 2024-10-23 14:10:58 +08:00)
    * dq divide
    * fix
    * support attn divide
    * update qwen2 7b
    * divide down_proj & other linear
    * use concat & reduce sum
    * support scale after
    * support qwen2
    * w/ mm
    * update reshape
    * spda
    * split
    * split 2+
    * update
    * lm head-> 28
    * no scale
    * update
    * update
    * update
    * fix style
    * fix style
    * to split linear
    * update
    * update code
    * address comments
    * fix style & remove redundant code & revert benchmark scripts
    * fix style & remove code
    * update save & load
    Co-authored-by: Yang Wang <yang3.wang@intel.com>
8fa98e2742  Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" (#12245)  (Jin, Qiao, 2024-10-22 17:07:51 +08:00)
    * Remove qwen2-7b from npu example readme
    * fix
ec465fbcd7  Add lookup generate in load_low_bit (#12243)  (Yina Chen, 2024-10-22 15:51:52 +08:00)
    * add lookup generate in load_low_bit
    * update comment
b3df47486d  Fix Gemma 2 on LNL (#12240)  (Yuwen Hu, 2024-10-21 18:25:53 +08:00)
    * Fix gemma 2 on LNL
    * Python style fix
5935b25622  Further update windows gpu perf test regarding results integrity check (#12232)  (Yuwen Hu, 2024-10-18 18:15:13 +08:00)
b88c1df324  Add Llama 3.1 & 3.2 to Arc Performance test (#12225)  (Yuwen Hu, 2024-10-17 21:12:45 +08:00)
    * Add llama3.1 and llama3.2 in arc perf (#12202)
    * Add llama3.1 and llama3.2 in arc perf
    * Uninstall trl after arc test on transformers>=4.40
    * Fix arc llama3 perf (#12212)
    * Fix pip uninstall
    * Uninstall trl after test on transformers==4.43.1
    * Fix llama3 arc perf (#12218)
    Co-authored-by: Jin, Qiao <89779290+JinBridger@users.noreply.github.com>
9ea694484d  refactor to remove old rope usage (#12224)  (Yishuo Wang, 2024-10-17 17:06:09 +08:00)
324bcb057e  refactor to reduce old rope usage (#12219)  (Yishuo Wang, 2024-10-17 14:45:09 +08:00)
667f0db466  Update Eagle example to Eagle2+ipex-llm integration (#11717)  (Jiao Wang, 2024-10-16 23:16:14 -07:00)
    * update to e2 example
    * update
    * update
a4a758656a  refactor gemma to reduce old fuse rope usage (#12215)  (Yishuo Wang, 2024-10-16 17:40:28 +08:00)
9104a168f6  refactor phi-2 to reduce old fuse rope usage (#12214)  (Yishuo Wang, 2024-10-16 17:08:14 +08:00)
bb247e991b  refactor merge_qkv and attention_softmax (#12213)  (Yishuo Wang, 2024-10-16 15:58:14 +08:00)
e279148aa0  optimize llama3.2 vision again (#12211)  (Yishuo Wang, 2024-10-16 14:29:48 +08:00)
f17cc4fdee  feat: add llama3.2-11b-vision in all in one (#12207)  (Chu,Youcheng, 2024-10-16 10:32:11 +08:00)
    * feat: add llama3.2-11b-vision in all in one
    * fix: change model
    * fix: change name
    * fix: add a space
    * fix: switch import
c9ac39fc1e  Add Llama 3.2 to iGPU performance test (transformers 4.45) (#12209)  (Yuwen Hu, 2024-10-15 17:44:46 +08:00)
    * Add Llama 3.2 to iGPU Perf (#12200)
    * Add Llama 3.2 to iGPU Perf
    * Downgrade accelerate after step
    * Temporarily disable model for test
    * Temporarily change ERRORLEVEL check (#12201)
    * Restore llama3.2 perf (#12206)
    * Revert "Temporarily change ERRORLEVEL check"
      This reverts commit 909dbbc930ab4283737161a55bb32006e6ca1991.
    * Revert "Temporarily disable model for test"
      This reverts commit 95322dc3c6429aa836f21bda0b5ba8d9b48592f8.
    Co-authored-by: Jin, Qiao <89779290+JinBridger@users.noreply.github.com>
f6611f9d3a  optimize llama3.2 vision attention again (#12204)  (Yishuo Wang, 2024-10-15 16:08:20 +08:00)
9b81236a2e  optimize qwen2-vl vision (#12203)  (Yishuo Wang, 2024-10-15 15:54:25 +08:00)
d5344587ab  optimize internvl2 vision model's attention (#12198)  (Yishuo Wang, 2024-10-15 10:51:00 +08:00)
f8d1adc573  Fix Llama 3.2 & 3.1 on LNL (#12196)  (Yuwen Hu, 2024-10-14 17:39:20 +08:00)
516b578104  Support cpp release for ARL on Windows (#12189)  (Yuwen Hu, 2024-10-14 17:20:31 +08:00)
    * Support cpp Windows release for ARL
    * Temp commit for test
    * Remove temp commit
7d80db710e  Add benchmark_util for transformers >= 4.44.0 (#12171)  (Zijie Li, 2024-10-14 15:40:12 +08:00)
    * Create benchmark_util_4_45.py
    * Update __init__.py
    * Update lint-python
    * Update benchmark_util_4_45.py
    * Update benchmark_util_4_45.py
    * Create benchmark_util_4_44.py
8e35800abe  Add llama 3.1 in igpu perf (#12194)  (Jin, Qiao, 2024-10-14 15:14:34 +08:00)
ddcdf47539  Support Windows ARL release (#12183)  (Yuwen Hu, 2024-10-11 18:30:52 +08:00)
    * Support release for ARL
    * Small fix
    * Small fix to doc
    * Temp for test
    * Remove temp commit for test
f983f1a8f4  Add Qwen2-VL gpu example (#12135)  (Jinhe, 2024-10-11 18:25:23 +08:00)
    * qwen2-vl readme
    * add qwen2-vl example
    * fix
    * fix
    * fix
    * add link
    * Update regarding modules_to_not_convert and readme
    * Further fix
    * Small fix
    Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
310f18c8af  update NPU pipeline generate (#12182)  (Ruonan Wang, 2024-10-11 17:39:20 +08:00)
    * update
    * fix style
724b2ae66d  add npu-level0 pipeline.dll to ipex-llm (#12181)  (Shaojun Liu, 2024-10-11 16:05:20 +08:00)
    * add npu-level0 pipeline.dll to ipex-llm
    * test
    * update runner label
    * fix
    * update
    * fix
    * fix
4d93bb81fe  Initial support of NPU level0 Model (#12177)  (Ruonan Wang, 2024-10-11 09:45:53 +08:00)
    * first commit to support load dll and init llm pipeline
    * add init generate
    * fix style
    * small updates
    * fix style and check tokens number
890662610b  Fix auto importer for LNL release (#12175)  (Yuwen Hu, 2024-10-10 15:17:43 +08:00)
535bee5381  fix qwen2 vl again (#12174)  (Yishuo Wang, 2024-10-10 13:50:01 +08:00)
aef1f671bd  Support LNL Windows release (#12169)  (Yuwen Hu, 2024-10-09 17:41:10 +08:00)
    * Release for LNL on Windows
    * Temp commit for release test
    * Change option name
    * Remove temp commit and change option name
    * temp commit for test again
    * Remove temp commit
78d253165d  optimize qwen2 vl perf again (#12167)  (Yishuo Wang, 2024-10-09 16:43:48 +08:00)
3d044dbf53  add llama3.2-vision Pytorch example (#12165)  (Zijie Li, 2024-10-09 09:20:42 +08:00)
644af2a76e  add basic llama 3.2 vision support (#12163)  (Yishuo Wang, 2024-10-08 10:46:48 +08:00)
17c23cd759  add llama3.2 GPU example (#12137)  (Ch1y0q, 2024-09-29 14:41:54 +08:00)
    * add llama3.2 GPU example
    * change prompt format reference url
    * update
    * add Meta-Llama-3.2-1B-Instruct sample output
    * update wording
f71b38a994  Update MiniCPM_V_26 GPU example with save & load (#12127)  (Yuwen Hu, 2024-09-26 17:40:22 +08:00)
669ff1a97b  fix sd1.5 (#12129)  (Yishuo Wang, 2024-09-26 17:15:16 +08:00)
a266528719  optimize llama 3.2 rope (#12128)  (Yishuo Wang, 2024-09-26 16:08:10 +08:00)
584c3489e7  add basic support for llama3.2 (#12125)  (Yishuo Wang, 2024-09-26 15:46:19 +08:00)
66f419f8b7  fix qwen2 vl (#12126)  (Yishuo Wang, 2024-09-26 15:44:02 +08:00)
2ea13d502f  Add minicpm3 gpu example (#12114)  (Ch1y0q, 2024-09-26 13:51:37 +08:00)
    * add minicpm3 gpu example
    * update GPU example
    * update
    Co-authored-by: Huang, Xinshengzi <xinshengzi.huang@intel.com>
77af9bc5fa  support passing None to low_bit in optimize_model (#12121)  (Yishuo Wang, 2024-09-26 11:09:35 +08:00)
47e0b83cbf  optimize sd 1.5 (#12119)  (Yishuo Wang, 2024-09-25 15:45:13 +08:00)
2bedb17be7  Add Qwen2.5 NPU Example (#12110)  (Jin, Qiao, 2024-09-25 15:20:03 +08:00)
    * Add Qwen2.5 NPU Example
    * fix
    * Merge qwen2.py and qwen2.5.py into qwen.py
    * Fix description
5d63aef60b  optimize qwen2 vl again (#12109)  (Yishuo Wang, 2024-09-23 13:22:01 +08:00)
03bd01c99c  optimize npu qwen2 (#12107)  (Ruonan Wang, 2024-09-20 19:46:16 +08:00)
02399021d6  add npu load_low_bit api in all-in-one benchmark (#12103)  (Jinhe, 2024-09-20 17:56:08 +08:00)
9239fd4f12  add basic support and optimization for qwen2-vl (#12104)  (Yishuo Wang, 2024-09-20 17:23:06 +08:00)
828fa01ad3  [NPU] Add mixed_precision for Qwen2 7B (#12098)  (Yuwen Hu, 2024-09-20 16:36:21 +08:00)
    * Add mix_precision argument to control whether use INT8 lm_head for Qwen2-7B-Instruct
    * Small fix
    * Fixed on load low bit with mixed precision
    * Small fix
    * Update example accordingly
    * Update for default prompt
    * Update base on comments
    * Final fix
2269768e71  add internvl2 example (#12102)  (Ch1y0q, 2024-09-20 16:31:54 +08:00)
    * add internvl2 example
    * add to README.md
    * update
    * add link to zh-CN readme
09b8c80d9d  update code for NPU qwen2 (#12094)  (Ruonan Wang, 2024-09-20 15:58:32 +08:00)
    * update code
    * fix
db7500bfd4  Add Qwen2.5 GPU example (#12101)  (Jin, Qiao, 2024-09-20 15:55:57 +08:00)
    * Add Qwen2.5 GPU example
    * fix end line
    * fix description
54b973c744  fix ipex_llm import in transformers 4.45 (#12099)  (Yishuo Wang, 2024-09-20 15:24:59 +08:00)
9650bf616a  add transpose_value_cache for NPU benchmark (#12092)  (Ch1y0q, 2024-09-19 18:45:05 +08:00)
    * add `transpose_value_cache`
    * update
    * update
f7fb3c896c  Update lm_head optimization for Qwen2 7B (#12090)  (Yuwen Hu, 2024-09-18 17:02:02 +08:00)
ee33b93464  Longbench: NV code to ipex-llm (#11662)  (Xu, Shuo, 2024-09-18 15:55:14 +08:00)
    * add nv longbench
    * LongBench: NV code to ipex-llm
    * amend
    * add more models support
    * amend
    * optimize LongBench's user experience
    * amend
    * amend
    * fix typo
    * amend
    * remove cuda related information & add a readme
    * add license to python scripts & polish the readme
    * amend
    * amend
    Co-authored-by: cyita <yitastudy@gmail.com>
    Co-authored-by: ATMxsp01 <shou.xu@intel.com>
    Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
40e463c66b  Enable vllm load gptq model (#12083)  (Wang, Jian4, 2024-09-18 14:41:00 +08:00)
    * enable vllm load gptq model
    * update
    * update
    * update
    * update style
081af41def  [NPU] Optimize Qwen2 lm_head to use INT4 (#12072)  (Ruonan Wang, 2024-09-14 15:26:46 +08:00)
    * temp save
    * update
    * fix
    * fix
    * Split lm_head into 7 parts & remove int8 for lm_head when sym_int4
    * Simplify and add condition to code
    * Small fix
    * refactor some code
    * fix style
    * fix style
    * fix style
    * fix
    * fix
    * temp save
    * refactor
    * fix style
    * further refactor
    * simplify code
    * meet code review
    * fix style
    Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
b4b8c3e495  add lowbit_path for generate.py, fix npu_model (#12077)  (Ch1y0q, 2024-09-13 17:28:05 +08:00)
    * add `lowbit_path` for `generate.py`, fix `npu_model`
    * update `README.md`
d703e4f127  Enable vllm multimodal minicpm-v-2-6 (#12074)  (Wang, Jian4, 2024-09-13 13:28:35 +08:00)
    * enable minicpm-v-2-6
    * add image_url readme
48d9092b5a  upgrade OneAPI version for cpp Windows (#12063)  (Ruonan Wang, 2024-09-12 11:12:12 +08:00)
    * update version
    * update quickstart