Ruonan Wang
3fe2ea3081
[NPU] Reuse prefill of acc lib for pipeline (#12279)
* first commit
* update example
* fix style
* update example
* embedding as const
* fix generate
* code refactor
* meet code review
* fix style
* change max_output_len to max_context_len
* fix all-in-one
* fix example
* add check for new tokens
2024-10-28 16:05:49 +08:00

Yuwen Hu
42a528ded9
Small update to MTL iGPU Linux Prerequisites installation guide (#12281)
* Small update MTL iGPU Linux Prerequisites installation guide
* Small fix
2024-10-28 14:12:07 +08:00

Yuwen Hu
16074ae2a4
Update Linux prerequisites installation guide for MTL iGPU (#12263)
* Update Linux prerequisites installation guide for MTL iGPU
* Further link update
* Small fixes
* Small fix
* Update based on comments
* Small fix
* Make oneAPI installation a shared section for both MTL iGPU and other GPU
* Small fix
* Small fix
* Clarify description
2024-10-28 09:27:14 +08:00

binbin Deng
ec362e6133
Add llama3 level0 example (#12275)
2024-10-28 09:24:51 +08:00

SONG Ge
08cb065370
hot-fix redundant import funasr (#12277)
2024-10-25 19:40:39 +08:00

SONG Ge
a0c6432899
[NPU] Add support for loading a FunASR model (#12073)
* add support for loading funasr model
* add initial support for paraformer-encoder
* add npu ops impl
* add encoder-decoder npu pipeline
* move paraformer encoders prefix 30 layers to npu and keep the rest layers on cpu
2024-10-25 17:22:01 +08:00

Ruonan Wang
854398f6e0
update example to reduce peak memory usage (#12274)
2024-10-25 17:09:26 +08:00

Yuwen Hu
e713296090
Update all-in-one benchmark (#12272)
* Update all-in-one benchmark
* Small fix
* Small fix
* Small fix
2024-10-25 16:52:59 +08:00

Yuwen Hu
43b25a2fe7
Fix llama 3.2 vision on LNL (#12264)
* Fix llama 3.2 vision on LNL
* Small fix
2024-10-25 16:23:31 +08:00

Yuwen Hu
94c4568988
Update windows installation guide regarding troubleshooting (#12270)
2024-10-25 14:32:38 +08:00

Yuwen Hu
93895b2ac2
Openvino all in one benchmark small fix (#12269)
* Small update for all-in-one benchmark readme to support OpenVINO tests
* Small fix
2024-10-25 14:13:52 +08:00

Zijie Li
f7f62a3fef
Add OpenVINO performance tests to all-in-one benchmark (#12238)
* add-openvino-to-all-in-one
* update on openvino API
* Update save_openvino.py
* Update save_openvino.py
* Update save_openvino.py
* update on run.py and save_openvino
* update references
* Create openvino-requirements.txt
* fix on comments
* Small updates
* Small fix
* Fix
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-10-25 13:53:53 +08:00

Ruonan Wang
ae57e23e4f
fix incompatibility between llama GW & llama pipeline (#12267)
* fix
* fix
2024-10-25 10:31:44 +08:00

Yina Chen
b5e663854b
[NPU] Support llama groupwise (#12260)
* support llama gw
* support llama gw lm_head
* fix style
* remove unused code
2024-10-24 18:06:45 +08:00

Shaojun Liu
48fc63887d
use oneccl 0.0.5.1 (#12262)
2024-10-24 16:12:24 +08:00

joan726
e0a95eb2d6
Add llama_cpp_quickstart.zh-CN.md (#12221)
2024-10-24 16:08:31 +08:00

Xin Qiu
39c9d1de52
fix code geex (#12261)
2024-10-24 14:34:01 +08:00

Yishuo Wang
f3a2b20e6b
Optimize gpt2 (#12259)
2024-10-24 13:44:24 +08:00

Ruonan Wang
821fd96367
Initial integrate our L0 Llama impl into ipex-llm (#12255)
* temp save
* initial support
* fix
* simplify code
* fix style
* fix example
* make default value of pipeline as False
2024-10-24 09:49:27 +08:00

Yishuo Wang
cacc891962
Fix PR validation (#12253)
2024-10-23 18:10:47 +08:00

binbin Deng
b685cf4349
Fix npu group size setting of optimize_model=False (#12256)
2024-10-23 17:53:54 +08:00

binbin Deng
567b77a76b
Support IR and blob format for llama level0 pipeline (#12251)
2024-10-23 16:02:35 +08:00

Yishuo Wang
578aef245d
Fix models auto choose SdpaAttention with ipex 2.3 (#12252)
2024-10-23 15:33:45 +08:00

Yishuo Wang
88dc120a4c
fix fp16 linear (#12250)
2024-10-23 14:35:19 +08:00

Yina Chen
e8cf7f32f5
npu gw small fix (#12249)
2024-10-23 14:26:01 +08:00

Shaojun Liu
aae2490cb8
fix UT (#12247)
* fix ut
* Update test_transformers_api_attention.py
* Update test_transformers_api_mlp.py
2024-10-23 14:13:06 +08:00

Yina Chen
e37f951cce
[NPU] Groupwise (#12241)
* dq divide
* fix
* support attn divide
* update qwen2 7b
* divide down_proj & other linear
* use concat & reduce sum
* support scale after
* support qwen2
* w/ mm
* update reshape
* spda
* split
* split 2+
* update
* lm head-> 28
* no scale
* update
* update
* update
* fix style
* fix style
* to split linear
* update
* update code
* address comments
* fix style & remove redundant code & revert benchmark scripts
* fix style & remove code
* update save & load
---------
Co-authored-by: Yang Wang <yang3.wang@intel.com>
2024-10-23 14:10:58 +08:00

Jun Wang
aedc4edfba
[ADD] add open webui + vllm serving (#12246)
2024-10-23 10:13:14 +08:00

Jin, Qiao
8fa98e2742
Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" (#12245)
* Remove qwen2-7b from npu example readme
* fix
2024-10-22 17:07:51 +08:00

Yina Chen
ec465fbcd7
Add lookup generate in load_low_bit (#12243)
* add lookup generate in load_low_bit
* update comment
2024-10-22 15:51:52 +08:00

Yuwen Hu
d8c1287335
Further update for Windows dGPU performance tests (#12244)
2024-10-22 15:07:21 +08:00

Jason Dai
a35cf4d533
Update README.md (#12242)
2024-10-22 10:19:07 +08:00

Yuwen Hu
b3df47486d
Fix Gemma 2 on LNL (#12240)
* Fix gemma 2 on LNL
* Python style fix
2024-10-21 18:25:53 +08:00

Yuwen Hu
ac2dac857c
Disable 4k input test for now for Windows dGPU performance test (#12239)
2024-10-21 15:03:26 +08:00

Yuwen Hu
ea5154d85e
Further update to Windows dGPU perf test (#12237)
2024-10-21 10:27:16 +08:00

Yuwen Hu
da9270be2d
Further update to Windows dGPU perf test (#12233)
2024-10-18 23:20:17 +08:00

Yuwen Hu
5935b25622
Further update windows gpu perf test regarding results integrity check (#12232)
2024-10-18 18:15:13 +08:00

Yuwen Hu
ef659629f3
Small update to Windows dGPU perf test (#12230)
* Small update to Windows dGPU perf test
* Small fix
* Small fixes
* Remove unnecessary file
2024-10-18 16:39:59 +08:00

Yuwen Hu
9d7f42fd0f
Support manually trigger of dGPU perf test on Windows (#12229)
* Support manually trigger of dgpu perf test on Windows
* Small fix
* Small fix
* Small update
2024-10-18 15:38:21 +08:00

Jun Wang
b10fc892e1
Update new reference link of xpu/docker/readme.md (#12188)
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [ADD] add prefix caching experiment and result
* [REMOVE] rm cpu offloading chapter
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [ADD] rewrite new vllm docker quick start
* [ADD] lora adapter doc finished
* [ADD] multi lora adapter test successfully
* [ADD] add ipex-llm quantization doc
* [Merge] rebase main
* [REMOVE] rm tmp file
* [Merge] rebase main
* [UPDATE] update the link to new vllm-docker-quickstart
2024-10-18 13:18:08 +08:00

Jun Wang
fe3b5cd89b
[Update] mmdocs/dockerguide vllm-quick-start awq,gptq online serving document (#12227)
* [FIX] fix the docker start script error
* [ADD] add awq online serving doc
* [ADD] add gptq online serving doc
* [Fix] small fix
2024-10-18 09:46:59 +08:00

Shaojun Liu
7825dc1398
Upgrade oneccl to 0.0.5 (#12223)
2024-10-18 09:29:19 +08:00

Yuwen Hu
b88c1df324
Add Llama 3.1 & 3.2 to Arc Performance test (#12225)
* Add llama3.1 and llama3.2 in arc perf (#12202)
* Add llama3.1 and llama3.2 in arc perf
* Uninstall trl after arc test on transformers>=4.40
* Fix arc llama3 perf (#12212)
* Fix pip uninstall
* Uninstall trl after test on transformers==4.43.1
* Fix llama3 arc perf (#12218)
---------
Co-authored-by: Jin, Qiao <89779290+JinBridger@users.noreply.github.com>
2024-10-17 21:12:45 +08:00

Yishuo Wang
9ea694484d
refactor to remove old rope usage (#12224)
2024-10-17 17:06:09 +08:00

Yishuo Wang
324bcb057e
refactor to reduce old rope usage (#12219)
2024-10-17 14:45:09 +08:00

Jiao Wang
667f0db466
Update Eagle example to Eagle2+ipex-llm integration (#11717)
* update to e2 example
* update
* update
2024-10-16 23:16:14 -07:00

Shaojun Liu
26390f9213
Update oneccl_wks_installer to 2024.0.0.4.1 (#12217)
2024-10-17 10:11:55 +08:00

Yishuo Wang
a4a758656a
refactor gemma to reduce old fuse rope usage (#12215)
2024-10-16 17:40:28 +08:00

Yishuo Wang
9104a168f6
refactor phi-2 to reduce old fuse rope usage (#12214)
2024-10-16 17:08:14 +08:00

Yishuo Wang
bb247e991b
refactor merge_qkv and attention_softmax (#12213)
2024-10-16 15:58:14 +08:00