Author | Commit | Subject | Date

Yuwen Hu | e713296090 | Update all-in-one benchmark (#12272) | 2024-10-25 16:52:59 +08:00
    * Update all-in-one benchmark
    * Small fix
    * Small fix
    * Small fix

Yuwen Hu | 43b25a2fe7 | Fix llama 3.2 vision on LNL (#12264) | 2024-10-25 16:23:31 +08:00
    * Fix llama 3.2 vision on LNL
    * Small fix

Yuwen Hu | 94c4568988 | Update windows installation guide regarding troubleshooting (#12270) | 2024-10-25 14:32:38 +08:00

Yuwen Hu | 93895b2ac2 | Openvino all in one benchmark small fix (#12269) | 2024-10-25 14:13:52 +08:00
    * Small update for all-in-one benchmark readme to support OpenVINO tests
    * Small fix

Zijie Li | f7f62a3fef | Add OpenVINO performance tests to all-in-one benchmark (#12238) | 2024-10-25 13:53:53 +08:00
    * add-openvino-to-all-in-one
    * update on openvino API
    * Update save_openvino.py
    * Update save_openvino.py
    * Update save_openvino.py
    * update on run.py and save_openvino
    * update references
    * Create openvino-requirements.txt
    * fix on comments
    * Small updates
    * Small fix
    * Fix
    Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>

Ruonan Wang | ae57e23e4f | fix incompatibility between llama GW & llama pipeline (#12267) | 2024-10-25 10:31:44 +08:00
    * fix
    * fix

Yina Chen | b5e663854b | [NPU] Support llama groupwise (#12260) | 2024-10-24 18:06:45 +08:00
    * support llama gw
    * support llama gw lm_head
    * fix style
    * remove unused code

Shaojun Liu | 48fc63887d | use oneccl 0.0.5.1 (#12262) | 2024-10-24 16:12:24 +08:00

joan726 | e0a95eb2d6 | Add llama_cpp_quickstart.zh-CN.md (#12221) | 2024-10-24 16:08:31 +08:00

Xin Qiu | 39c9d1de52 | fix code geex (#12261) | 2024-10-24 14:34:01 +08:00

Yishuo Wang | f3a2b20e6b | Optimize gpt2 (#12259) | 2024-10-24 13:44:24 +08:00

Ruonan Wang | 821fd96367 | Initial integrate our L0 Llama impl into ipex-llm (#12255) | 2024-10-24 09:49:27 +08:00
    * temp save
    * initial support
    * fix
    * simplify code
    * fix style
    * fix example
    * make default value of pipeline as False

Yishuo Wang | cacc891962 | Fix PR validation (#12253) | 2024-10-23 18:10:47 +08:00

binbin Deng | b685cf4349 | Fix npu group size setting of optimize_model=False (#12256) | 2024-10-23 17:53:54 +08:00

binbin Deng | 567b77a76b | Support IR and blob format for llama level0 pipeline (#12251) | 2024-10-23 16:02:35 +08:00

Yishuo Wang | 578aef245d | Fix models auto choose SdpaAttention with ipex 2.3 (#12252) | 2024-10-23 15:33:45 +08:00

Yishuo Wang | 88dc120a4c | fix fp16 linear (#12250) | 2024-10-23 14:35:19 +08:00

Yina Chen | e8cf7f32f5 | npu gw small fix (#12249) | 2024-10-23 14:26:01 +08:00

Shaojun Liu | aae2490cb8 | fix UT (#12247) | 2024-10-23 14:13:06 +08:00
    * fix ut
    * Update test_transformers_api_attention.py
    * Update test_transformers_api_mlp.py

Yina Chen | e37f951cce | [NPU] Groupwise (#12241) | 2024-10-23 14:10:58 +08:00
    * dq divide
    * fix
    * support attn divide
    * update qwen2 7b
    * divide down_proj & other linear
    * use concat & reduce sum
    * support scale after
    * support qwen2
    * w/ mm
    * update reshape
    * spda
    * split
    * split 2+
    * update
    * lm head-> 28
    * no scale
    * update
    * update
    * update
    * fix style
    * fix style
    * to split linear
    * update
    * update code
    * address comments
    * fix style & remove redundant code & revert benchmark scripts
    * fix style & remove code
    * update save & load
    Co-authored-by: Yang Wang <yang3.wang@intel.com>

Jun Wang | aedc4edfba | [ADD] add open webui + vllm serving (#12246) | 2024-10-23 10:13:14 +08:00

Jin, Qiao | 8fa98e2742 | Remove Qwen2-7b from NPU example for "Run Optimized Models (Experimental)" (#12245) | 2024-10-22 17:07:51 +08:00
    * Remove qwen2-7b from npu example readme
    * fix

Yina Chen | ec465fbcd7 | Add lookup generate in load_low_bit (#12243) | 2024-10-22 15:51:52 +08:00
    * add lookup generate in load_low_bit
    * update comment

Yuwen Hu | d8c1287335 | Further update for Windows dGPU performance tests (#12244) | 2024-10-22 15:07:21 +08:00

Jason Dai | a35cf4d533 | Update README.md (#12242) | 2024-10-22 10:19:07 +08:00

Yuwen Hu | b3df47486d | Fix Gemma 2 on LNL (#12240) | 2024-10-21 18:25:53 +08:00
    * Fix gemma 2 on LNL
    * Python style fix

Yuwen Hu | ac2dac857c | Disable 4k input test for now for Windows dGPU performance test (#12239) | 2024-10-21 15:03:26 +08:00

Yuwen Hu | ea5154d85e | Further update to Windows dGPU perf test (#12237) | 2024-10-21 10:27:16 +08:00

Yuwen Hu | da9270be2d | Further update to Windows dGPU perf test (#12233) | 2024-10-18 23:20:17 +08:00

Yuwen Hu | 5935b25622 | Further update windows gpu perf test regarding results integrity check (#12232) | 2024-10-18 18:15:13 +08:00

Yuwen Hu | ef659629f3 | Small update to Windows dGPU perf test (#12230) | 2024-10-18 16:39:59 +08:00
    * Small update to Windows dGPU perf test
    * Small fix
    * Small fixes
    * Remove unnecessary file

Yuwen Hu | 9d7f42fd0f | Support manually trigger of dGPU perf test on Windows (#12229) | 2024-10-18 15:38:21 +08:00
    * Support manually trigger of dgpu perf test on Windows
    * Small fix
    * Small fix
    * Small update

Jun Wang | b10fc892e1 | Update new reference link of xpu/docker/readme.md (#12188) | 2024-10-18 13:18:08 +08:00
    * [ADD] rewrite new vllm docker quick start
    * [ADD] lora adapter doc finished
    * [ADD] mulit lora adapter test successfully
    * [ADD] add ipex-llm quantization doc
    * [Merge] rebase main
    * [REMOVE] rm tmp file
    * [Merge] rebase main
    * [ADD] add prefix caching experiment and result
    * [REMOVE] rm cpu offloading chapter
    * [ADD] rewrite new vllm docker quick start
    * [ADD] lora adapter doc finished
    * [ADD] mulit lora adapter test successfully
    * [ADD] add ipex-llm quantization doc
    * [Merge] rebase main
    * [REMOVE] rm tmp file
    * [Merge] rebase main
    * [ADD] rewrite new vllm docker quick start
    * [ADD] lora adapter doc finished
    * [ADD] mulit lora adapter test successfully
    * [ADD] add ipex-llm quantization doc
    * [Merge] rebase main
    * [REMOVE] rm tmp file
    * [Merge] rebase main
    * [UPDATE] update the link to new vllm-docker-quickstart

Jun Wang | fe3b5cd89b | [Update] mmdocs/dockerguide vllm-quick-start awq,gptq online serving document (#12227) | 2024-10-18 09:46:59 +08:00
    * [FIX] fix the docker start script error
    * [ADD] add awq online serving doc
    * [ADD] add gptq online serving doc
    * [Fix] small fix

Shaojun Liu | 7825dc1398 | Upgrade oneccl to 0.0.5 (#12223) | 2024-10-18 09:29:19 +08:00

Yuwen Hu | b88c1df324 | Add Llama 3.1 & 3.2 to Arc Performance test (#12225) | 2024-10-17 21:12:45 +08:00
    * Add llama3.1 and llama3.2 in arc perf (#12202)
    * Add llama3.1 and llama3.2 in arc perf
    * Uninstall trl after arc test on transformers>=4.40
    * Fix arc llama3 perf (#12212)
    * Fix pip uninstall
    * Uninstall trl after test on transformers==4.43.1
    * Fix llama3 arc perf (#12218)
    Co-authored-by: Jin, Qiao <89779290+JinBridger@users.noreply.github.com>

Yishuo Wang | 9ea694484d | refactor ot remove old rope usage (#12224) | 2024-10-17 17:06:09 +08:00

Yishuo Wang | 324bcb057e | refactor to reduce old rope usage (#12219) | 2024-10-17 14:45:09 +08:00

Jiao Wang | 667f0db466 | Update Eagle example to Eagle2+ipex-llm integration (#11717) | 2024-10-16 23:16:14 -07:00
    * update to e2 example
    * update
    * update

Shaojun Liu | 26390f9213 | Update oneccl_wks_installer to 2024.0.0.4.1 (#12217) | 2024-10-17 10:11:55 +08:00

Yishuo Wang | a4a758656a | refactor gemma to reduce old fuse rope usage (#12215) | 2024-10-16 17:40:28 +08:00

Yishuo Wang | 9104a168f6 | refactor phi-2 to reduce old fuse rope usage (#12214) | 2024-10-16 17:08:14 +08:00

Yishuo Wang | bb247e991b | refactor merge_qkv and attention_softmax (#12213) | 2024-10-16 15:58:14 +08:00

Yishuo Wang | e279148aa0 | optimize llama3.2 vision again (#12211) | 2024-10-16 14:29:48 +08:00

Chu,Youcheng | f17cc4fdee | feat: add llama3.2-11b-vision in all in one (#12207) | 2024-10-16 10:32:11 +08:00
    * feat: add llama3.2-11b-vision in all in one
    * fix: change model
    * fix: change name
    * fix: add a space
    * fix: switch import

Yuwen Hu | c9ac39fc1e | Add Llama 3.2 to iGPU performance test (transformers 4.45) (#12209) | 2024-10-15 17:44:46 +08:00
    * Add Llama 3.2 to iGPU Perf (#12200)
    * Add Llama 3.2 to iGPU Perf
    * Downgrade accelerate after step
    * Temporarily disable model for test
    * Temporarily change ERRORLEVEL check (#12201)
    * Restore llama3.2 perf (#12206)
    * Revert "Temporarily change ERRORLEVEL check"
      This reverts commit 909dbbc930ab4283737161a55bb32006e6ca1991.
    * Revert "Temporarily disable model for test"
      This reverts commit 95322dc3c6429aa836f21bda0b5ba8d9b48592f8.
    Co-authored-by: Jin, Qiao <89779290+JinBridger@users.noreply.github.com>

Yishuo Wang | f6611f9d3a | optimize llama3.2 vison attention again (#12204) | 2024-10-15 16:08:20 +08:00

Yishuo Wang | 9b81236a2e | optimzie qwen2-vl vision (#12203) | 2024-10-15 15:54:25 +08:00

Yishuo Wang | d5344587ab | optimize internvl2 vision model's attention (#12198) | 2024-10-15 10:51:00 +08:00

Yuwen Hu | f8d1adc573 | Fix Llama 3.2 & 3.1 on LNL (#12196) | 2024-10-14 17:39:20 +08:00