f61b1785fb  Yuwen Hu  2024-09-06 15:54:23 +08:00
  Small update to NPU example readme (#12034)
  * Small update to NPU example readme
  * Small fix

0d04531ae0  Ruonan Wang  2024-09-06 15:02:39 +08:00
  update NPU readme of Qwen2 (#12032)
  * update readme
  * update broadcast

58555bd9de  Yang Wang  2024-09-06 13:28:20 +08:00
  Optimize broadcast for npu llama (#12028)

5b18bb3c4a  binbin Deng  2024-09-05 16:28:53 +08:00
  Add recommend version for mtl npu (#12024)

845e5dc89e  binbin Deng  2024-09-05 16:19:22 +08:00
  Support lm_head of minicpm-2b on NPU (#12019)

820f8a4554  Ch1y0q  2024-09-05 15:31:01 +08:00
  add --lowbit-path option for NPU llama example (#12020)
  * add option `--lowbit-path`
  * add descriptions in `README.md` and formatting
  * Update llama.py

8803242f5c  Guoqiong Song  2024-09-04 19:17:54 -07:00
  fix llama on cpu (#12018)

b3b2cd64b4  Wang, Jian4  2024-09-05 09:25:08 +08:00
  Support lightweight-serving glm-4v-9b (#11994)
  * enable glm-4v-9b serving
  * update readme
  * update for no image input

b1408a1f1c  Yishuo Wang  2024-09-04 18:02:49 +08:00
  fix UT (#12005)

2b993ad479  Wang, Jian4  2024-09-04 13:50:32 +08:00
  vllm update for glm-4 model automatic not_convert (#12003)

9eaff5e47d  Ruonan Wang  2024-09-03 20:53:22 +08:00
  add save & load support for NPU optimized model (#11999)
  * add save & load support
  * fix style
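
The entry above only records that save & load support was added for NPU-optimized models; below is a minimal sketch of the usage pattern such a change typically enables. The module path ipex_llm.transformers.npu_model and the save_low_bit / load_low_bit method names are assumptions carried over from the existing ipex-llm CPU/GPU API and are not confirmed by this log; the model id and output folder are hypothetical.

```python
# Hedged sketch of a save-then-reload flow for an NPU-optimized model.
# Assumptions (not confirmed by this log): the npu_model module path and the
# save_low_bit / load_low_bit method names mirror the CPU/GPU ipex-llm API.
from ipex_llm.transformers.npu_model import AutoModelForCausalLM

model_id = "meta-llama/Llama-2-7b-chat-hf"   # hypothetical model id
save_dir = "./llama2-7b-npu-lowbit"          # hypothetical output folder

# First run: convert/optimize the checkpoint for NPU, then persist it.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_low_bit="sym_int4",
    trust_remote_code=True,
)
model.save_low_bit(save_dir)

# Later runs: reload the converted weights directly, skipping re-conversion.
model = AutoModelForCausalLM.load_low_bit(save_dir, trust_remote_code=True)
```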
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
6eb55653ba  Yuwen Hu  2024-09-03 17:46:16 +08:00
  Performance mode strategy update for input_embeds input (#11997)

164f47adbd  Jinhe  2024-09-03 17:02:06 +08:00
  MiniCPM-V-2 & MiniCPM-Llama3-V-2_5 example updates (#11988)
  * minicpm example updates
  * --stream

2e54f4402b  Jin, Qiao  2024-09-03 16:50:42 +08:00
  Rename MiniCPM-V-2_6 CPU example (#11998)

01099f08ee  binbin Deng  2024-09-03 14:45:01 +08:00
  Revert prefill logic of qwen2-7b (#11992)

659d15defc  Yuwen Hu  2024-09-02 19:09:12 +08:00
  Fix wrong attention mask and garbage output for inputs_embeds inputs during lookup generation (#11989)
  * Fix garbage output for input_embeds inputs during lookup generation
  * Fix on sliding windows
  * Simplify code

2f3d1bd0ec  binbin Deng  2024-09-02 18:11:08 +08:00
  hotfix qwen2-7b weight setting (#11991)

a40ea7038d  binbin Deng  2024-09-02 17:55:10 +08:00
  Fix AttributeError of qwen2-1.5B (#11990)

c48817bd43  Yang Wang  2024-09-02 14:37:44 +08:00
  Support Qwen2-7b MLP in int4 and transpose_value_cache=True (#11968)

65e281bb29  Jin, Qiao  2024-09-02 10:17:57 +08:00
  Add MiniCPM-V cpu example (#11975)
  * Add MiniCPM-V cpu example
  * fix
  * fix
  * fix
  * fix
79978e6f36  Ruonan Wang  2024-08-30 19:02:06 +08:00
  update npu multimodal readme (#11979)
  * update npu readme of multimodal
  * small fix
  * meet comment

4811a490ef  Ruonan Wang  2024-08-30 17:55:15 +08:00
  small fix (#11978)
  * fix
  * meet comment

573c20bae6  Ruonan Wang  2024-08-30 17:11:26 +08:00
  fix npu lm_head cpu condition (#11976)
  * fix
  * fix
  * fix
  * fix stype
  * fix style
  * fix style

60aa1a2c0f  Ruonan Wang  2024-08-30 16:34:35 +08:00
  Initial NPU support for MiniCPM-V-2_6 (#11966)
  * initial pr
  * update npu model
  * fix
  * fix kv cache type
  * fix
  * small fix
  * fix style
  * fix model id
  * change inter_pp=4
  * address comment
  * fix
  * fix style
  * fix
  * rebase

158289d205  SONG Ge  2024-08-30 16:00:33 +08:00
  [NPU] Add initial support for minicpm-llama-v2.5 (#11962)
  * add initial support for minicpm-llama-v2.5
  * update impl
  * add minicpm-llama3-v2.5 example

ae7302a654  Chu,Youcheng  2024-08-30 13:43:48 +08:00
  add gptq option for ppl test (#11921)
  * feat:add gptq for ppl
  * fix: add an empty line
  * fix: add an empty line
  * fix: remove an empty line
  * Resolve comments
  * Resolve comments
  * Resolve comments

cd077881f1  binbin Deng  2024-08-30 11:05:18 +08:00
  Disable lm head (#11972)

7d103417b8  Wang, Jian4  2024-08-30 09:50:18 +08:00
  Fix glm4-9b-chat nan error on vllm 0.3.3 (#11970)
  * fix nan value
  * update

fbf088f61e  Yang Wang  2024-08-29 14:16:44 -07:00
  remove obselete npu code (#11967)

a9e485eb1b  Yuwen Hu  2024-08-29 19:22:09 +08:00
  Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer (#11963)
  * Support MiniCPM-V-2_6 multi-modal benchmarking with latency text streamer
  * Style fixes

2e49e1f8e9  Yuwen Hu  2024-08-29 19:14:13 +08:00
  Further fix for MiniCPM-V-2_6 example (#11965)

431affd0a0  Jason Dai  2024-08-29 18:56:35 +08:00
  Update README.md (#11964)

14b2c8dc32  binbin Deng  2024-08-29 18:25:17 +08:00
  Update qwen2-7b example script (#11961)

7abe17d6f7  Yuwen Hu  2024-08-29 18:23:48 +08:00
  Update MiniCPM-V-2_6 Example (#11958)
  * Update example scripts regarding warmup, stream generate, moudles to not convert, etc.
  * Update readme accordingly
  * Fix based on comments
  * Small fix
  * Remove n_predict

5f7ff76ea5  Yina Chen  2024-08-29 17:44:22 +08:00
  update troubleshooting (#11960)

882f4a5ff7  Yina Chen  2024-08-29 15:01:18 +08:00
  Add lnl npu driver recommend version and enable cpu_lm_head on llama3 (#11952)
  * update lnl npu driver version and enable cpu_lm_head on llama3
  * update
  * fix style
  * typo
  * address comments
  * update
  * add qwen2-7b

71f03dcc39  binbin Deng  2024-08-29 13:34:20 +08:00
  Support qwen2-7b with fused decoderlayer optimization on NPU (#11912)

63ac5f64bb  Jiao Wang  2024-08-28 11:33:40 -07:00
  Refactor NPU baichuan multiple-process (#11945)
  * update
  * add baichuan mp
  * clean
  * refactor
  * merge
  * style
  * update
  * update
5ca7390082  SONG Ge  2024-08-28 18:08:49 +08:00
  [NPU] Add minicpm-2b support for npu multi-processing (#11949)
  * add minicpm-2b support
  * update example for minicpm-2b
  * add LNL NPU driver requirement in readme

0fbb10259a  Yishuo Wang  2024-08-28 17:35:05 +08:00
  use sdp_causal to reduce internvl2-4b memory usage if set environment variable (#11953)

0a7bd274e2  Guancheng Fu  2024-08-28 16:46:18 +08:00
  Add vllm awq loading logic (#11950)
  * add vllm awq loading logic
  * fix
  * refine

b38fb67bec  Yina Chen  2024-08-28 16:34:07 +08:00
  [NPU] lm head to cpu (#11943)
  * lm head to cpu
  * qwen2
  * mv logic and add param to disable cpu_lm_head
  * use env and lm_head opt to mp file
  * fix
  * update
  * remove print

e23549f63f  hxsz1997  2024-08-28 14:03:44 +08:00
  Update llamaindex examples (#11940)
  * modify rag.py
  * update readme of gpu example
  * update llamaindex cpu example and readme
  * add llamaindex doc
  * update note style
  * import before instancing IpexLLMEmbedding
  * update index in readme
  * update links
  * update link
  * update related links

bec00e2015  binbin Deng  2024-08-27 18:37:08 +08:00
  Improve baichuan2 NPU performance (#11942)

90f692937d  Zijie Li  2024-08-27 16:56:26 +08:00
  Update npu baichuan2 (#11939)

7f7f6c89f5  binbin Deng  2024-08-27 15:29:27 +08:00
  Quick fix benchmark script (#11938)

b4b6ddf73c  Jiao Wang  2024-08-27 15:25:49 +08:00
  NPU Baichuan2 Multi- Process example (#11928)

e211a5b076  SONG Ge  2024-08-27 15:08:01 +08:00
  update minicpm to meet latest refactor (#11937)

a81a329a5f  SONG Ge  2024-08-27 14:57:46 +08:00
  [NPU] Add example for NPU multi-processing minicpm-1b model (#11935)
  * add minicpm example

7c8c9a0670  binbin Deng  2024-08-27 14:41:14 +08:00
  Update benchmark script for NPU (#11932)

730d9ec811  Ch1y0q  2024-08-27 13:35:24 +08:00
  Add Qwen2-audio example (#11835)
  * add draft for qwen2-audio
  * update example for `Qwen2-Audio`
  * update
  * update
  * add warmup

b11b28e9a9  Shaojun Liu  2024-08-27 13:10:13 +08:00
  update CORE_XE_VERSION to 2.6.0 (#11929)

e246f1e258  Yina Chen  2024-08-27 13:03:18 +08:00
  update llama3 npu example (#11933)

14dddfc0d6  binbin Deng  2024-08-27 12:44:58 +08:00
  Update NPU example readme (#11931)

6c3eb1e1e8  Zijie Li  2024-08-27 09:50:30 +08:00
  refactor from_pretrained API for NPU (#11927)

7ca557aada  Xiangyu Tian  2024-08-27 09:22:19 +08:00
  LLM: Fix vLLM CPU convert error (#11926)

c1d07bc626  Yuwen Hu  2024-08-26 19:33:31 +08:00
  Support streaming for lookup generation (#11922)
  * Support streaming for lookup generation
  * Small update
  * Style fixes
  * Add origin generate full back for batch inference and beam search; support input length threshold judgement for directly input with input_ids
  * Fix lookup stream generate with eos token
  * Small fixes
  * Small fix
  * index fix
  * Small fix

a0bbd8e28d  Yuwen Hu  2024-08-26 18:52:13 +08:00
  All-in-one benchmark update regarding performance mode for input length threshold (#11920)
  * All-in-one benchmark update regarding performance mode input length threshold
  * typo fix
019f725d4d  SONG Ge  2024-08-26 17:52:55 +08:00
  [NPU] Add support for running mp minicpm model on npu (#11909)
  * add initial support for npu minicpm mp
  * fix minicpm-1b abnormal output error

dd303776cf  binbin Deng  2024-08-26 16:06:32 +08:00
  Add troubleshooting about transpose value setting

24c279e0ae  Yuwen Hu  2024-08-23 20:49:15 +08:00
  Update IPEX_LLM_PERFORMANCE_MODE with input length threshold (#11908)
  * Update IPEX_LLM_PERFORMANCE_MODE with input length threshold
  * Update based on comments. And and judgement for inputs_embeds
  * Fix for benchmarking purposes
  * Update based on comments
  * Small fix
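
The commit above names the IPEX_LLM_PERFORMANCE_MODE environment variable and adds an input-length threshold to it; the snippet below is only a sketch of how such a switch is commonly toggled from a benchmark or example script. Treating "1" as the enabling value, and setting it before the model is loaded, are assumptions not confirmed by this log.

```python
import os

# Hedged sketch: toggle the performance mode named in the commit title.
# Assumptions (not confirmed by this log): "1" enables the mode, and the
# variable must be set before ipex-llm reads it; per #11908 an input-length
# threshold additionally decides whether the mode applies to a given prompt.
os.environ["IPEX_LLM_PERFORMANCE_MODE"] = "1"

# To fall back to the default generation path, unset it again:
# os.environ.pop("IPEX_LLM_PERFORMANCE_MODE", None)
```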
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
303a090a6b  binbin Deng  2024-08-23 15:51:07 +08:00
  Add lm_head optimization on NPU (#11903)

23631cd357  Yina Chen  2024-08-23 15:39:47 +08:00
  disable lm_head opt for baichuan2-13b (#11905)

650e6e6ce4  hxsz1997  2024-08-23 06:09:58 +03:00
  Merge pull request #11891 from hxsz1997/baichuan2-compresskv
  Add compress_kv for Baichuan2

4a61f7d20d  Ruonan Wang  2024-08-22 20:34:53 +08:00
  update mlp of llama (#11897)
  * update mlp of llama
  * relax threshold of mlp test
  * revert code

420ce7d164  Yuwen Hu  2024-08-22 18:55:59 +08:00
  Fix non-stop at eos token problem for lookup generation (#11896)
  * Fix non-stop by eos_token_id problem for lookup
  * Small fix
  * Add judgement when generation_config.eos_token_id is None
  * Fix based on comments

4cf03d6212  Huang, Xinshengzi  2024-08-22 18:16:33 +08:00
  update baichuan-7b

794abe2ce8  Zijie Li  2024-08-22 17:49:35 +08:00
  update npu-readme (#11900)

278b191dc1  Guancheng Fu  2024-08-22 17:45:26 +08:00
  Fix optimize lm head error (#11899)

c5b51d41fb  Shaojun Liu  2024-08-22 16:48:09 +08:00
  Update pypi tag to 2.2.0.dev0 (#11895)

18662dca1c  Jinhe  2024-08-22 16:12:09 +08:00
  change 5 pytorch/huggingface models to fp16 (#11894)

5c4ed00593  Wang, Jian4  2024-08-22 15:46:28 +08:00
  Add lightweight-serving whisper asr example (#11847)
  * add asr init
  * update for pp
  * update style
  * update readme
  * update reamde

eb1e65f8a9  Huang, Xinshengzi  2024-08-22 15:14:47 +08:00
  add comment

a2be3d7501  Huang, Xinshengzi  2024-08-22 15:11:55 +08:00
  add comment of compress kv in attention forward

a8e2573421  Jinhe  2024-08-22 14:37:56 +08:00
  added tokenization file for codegeex2-6b in pytorch-models (#11875)
  * added tokenization file
  * tokenization file readme update
  * optional

ce7de77085  Huang, Xinshengzi  2024-08-22 14:29:27 +08:00
  add comment of change in model forward

42398a0045  Huang, Xinshengzi  2024-08-22 13:17:13 +08:00
  add comment
48a827aa07  Huang, Xinshengzi  2024-08-22 11:35:47 +08:00
  fix typos

8a5df93de2  Huang, Xinshengzi  2024-08-22 11:33:07 +08:00
  fix typos

01ed397e7a  Huang, Xinshengzi  2024-08-22 11:31:25 +08:00
  fix typos

c6ed1c412d  Huang, Xinshengzi  2024-08-22 11:26:49 +08:00
  fix typos

2a0aa9271b  Huang, Xinshengzi  2024-08-22 11:23:22 +08:00
  fix typos

4adadddbbc  Huang, Xinshengzi  2024-08-22 11:12:23 +08:00
  fix typos

6a5ca17afc  Huang, Xinshengzi  2024-08-22 11:09:58 +08:00
  fix typoes

72a7bf624b  binbin Deng  2024-08-22 11:09:12 +08:00
  Support qwen2-1.5b with fused decoderlayer optimization on NPU (#11888)

6bb9035788  Huang, Xinshengzi  2024-08-22 11:08:48 +08:00
  fix typos

86248b0505  Huang, Xinshengzi  2024-08-22 10:59:08 +08:00
  add compress_kv for baichuan2

bdbe995b01  Zijie Li  2024-08-22 09:40:16 +08:00
  Update README.md (#11889)
  Set datasets version to 2.16.1. Clear out the transformers version requirement.

cc27321441  Yina Chen  2024-08-21 15:53:17 +08:00
  support chatglm4 in lookup (#11855)

0236de3ac2  Yina Chen  2024-08-21 15:06:12 +08:00
  set IPEX_LLM_LAST_LM_HEAD=1 as default (#11885)

8c5c7f32dd  SONG Ge  2024-08-21 13:45:29 +08:00
  Update doc for running npu generate example with ipex-llm[npu] (#11876)
  * update doc for running npu generate example with ipex-llm[npu]
  * switch max_prompt_len to 512 to fix compile error on mtl

209d42ab79  Yang Wang  2024-08-20 20:58:47 -07:00
  Refactor npu mp to make it easier to integrate new models (#11873)
  * Refactor npu mp to make it easier to integrate new models
  * fix style
  * move layer functions to base

537c0d2767  Guancheng Fu  2024-08-21 11:05:24 +08:00
  fix vllm qwen2 models (#11879)

bd1e490d62  Yishuo Wang  2024-08-21 10:31:41 +08:00
  fix phi3 (#11878)

eab6f6dde4  Yuwen Hu  2024-08-21 09:35:26 +08:00
  Spr perf small fix (#11874)

bdaeee1d63  Yang Wang  2024-08-20 12:04:59 -07:00
  Fix run_decoders bug (#11871)
									Chu,Youcheng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								32f0a77846 
								
							 
						 
						
							
							
								
								feat: update readme for ppl test ( #11865 )  
							
							 
							
							... 
							
							
							
							* feat: update readme for ppl test
* fix: textual adjustments
* fix: textual adjustments
* Add ipex-llm npu option in setup.py (#11858 )
* add ipex-llm npu release
* update example doc
* meet latest release changes
* optimize phi3 memory usage (#11867 )
* Update `ipex-llm` default transformers version to 4.37.0 (#11859 )
* Update default transformers version to 4.37.0
* Add dependency requirements for qwen and qwen-vl
* Temp fix transformers version for these not yet verified models
* Skip qwen test in UT for now as it requires transformers<4.37.0
* Update performance test regarding updated default `transformers==4.37.0` (#11869 )
* Update igpu performance from transformers 4.36.2 to 4.37.0 (#11841 )
* upgrade arc perf test to transformers 4.37 (#11842 )
* fix load low bit com dtype (#11832 )
* feat: add mixed_precision argument on ppl longbench evaluation
* fix: delete extra code
* feat: upgrade arc perf test to transformers 4.37
* fix: add missing codes
* fix: keep perf test for qwen-vl-chat in transformers 4.36
* fix: remove extra space
* fix: resolve pr comment
* fix: add empty line
* fix: add pip install for spr and core test
* fix: delete extra comments
* fix: remove python -m for pip
* Revert "fix load low bit com dtype (#11832 )"
This reverts commit 6841a9ac8f .
---------
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
* add transformers==4.36 for qwen vl in igpu-perf (#11846 )
* add transformers==4.36.2 for qwen-vl
* Small update
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
* fix: remove qwen-7b on core test (#11851 )
* fix: remove qwen-7b on core test
* fix: change delete to comment
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
* replce filename (#11854 )
* fix: remove qwen-7b on core test
* fix: change delete to comment
* fix: replace filename
---------
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
* fix: delete extra comments (#11863 )
* Remove transformers installation for temp test purposes
* Small fix
* Small update
---------
Co-authored-by: Chu,Youcheng <70999398+cranechu0131@users.noreply.github.com>
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Co-authored-by: Zijie Li <michael20001122@gmail.com>
Co-authored-by: Chu,Youcheng <1340390339@qq.com>
* Pytorch models transformers version update (#11860 )
* yi sync
* delete 4.34 constraint
* delete 4.34 constraint
* delete 4.31 constraint
* delete 4.34 constraint
* delete 4.35 constraint
* added <=4.33.3 constraint
* added <=4.33.3 constraint
* switched to chinese prompt
* Update compresskv model forward type logic (#11868 )
* update
* fix
* Update local import for ppl (#11866 )
Co-authored-by: jenniew <jenniewang123@gmail.com>
* fix: textual adjustment
---------
Co-authored-by: SONG Ge <38711238+sgwhat@users.noreply.github.com>
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
Co-authored-by: Yuwen Hu <54161268+Oscilloscope98@users.noreply.github.com>
Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Co-authored-by: Zijie Li <michael20001122@gmail.com>
Co-authored-by: Yina Chen <33650826+cyita@users.noreply.github.com>
Co-authored-by: RyuKosei <70006706+RyuKosei@users.noreply.github.com>
Co-authored-by: jenniew <jenniewang123@gmail.com> 
							
						 
						
							2024-08-20 20:13:54 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
RyuKosei | 5df00869de | Update local import for ppl (#11866) | 2024-08-20 18:50:00 +08:00
    Co-authored-by: jenniew <jenniewang123@gmail.com>

Yina Chen | c3c058373f | Update compresskv model forward type logic (#11868) | 2024-08-20 18:11:37 +08:00
    * update
    * fix

Jinhe | 3ee194d983 | Pytorch models transformers version update (#11860) | 2024-08-20 18:01:42 +08:00
    * yi sync
    * delete 4.34 constraint
    * delete 4.34 constraint
    * delete 4.31 constraint
    * delete 4.34 constraint
    * delete 4.35 constraint
    * added <=4.33.3 constraint
    * added <=4.33.3 constraint
    * switched to chinese prompt
Yuwen Hu | 0d58c2fdf9 | Update performance test regarding updated default transformers==4.37.0 (#11869) | 2024-08-20 17:59:28 +08:00
    * Update igpu performance from transformers 4.36.2 to 4.37.0 (#11841)
    * upgrade arc perf test to transformers 4.37 (#11842)
    * fix load low bit com dtype (#11832)
    * feat: add mixed_precision argument on ppl longbench evaluation
    * fix: delete extra code
    * feat: upgrade arc perf test to transformers 4.37
    * fix: add missing codes
    * fix: keep perf test for qwen-vl-chat in transformers 4.36
    * fix: remove extra space
    * fix: resolve pr comment
    * fix: add empty line
    * fix: add pip install for spr and core test
    * fix: delete extra comments
    * fix: remove python -m for pip
    * Revert "fix load low bit com dtype (#11832)"
      This reverts commit 6841a9ac8f.
    ---------
    Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
    Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
    * add transformers==4.36 for qwen vl in igpu-perf (#11846)
    * add transformers==4.36.2 for qwen-vl
    * Small update
    ---------
    Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
    * fix: remove qwen-7b on core test (#11851)
    * fix: remove qwen-7b on core test
    * fix: change delete to comment
    ---------
    Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
    * replace filename (#11854)
    * fix: remove qwen-7b on core test
    * fix: change delete to comment
    * fix: replace filename
    ---------
    Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
    * fix: delete extra comments (#11863)
    * Remove transformers installation for temp test purposes
    * Small fix
    * Small update
    ---------
    Co-authored-by: Chu,Youcheng <70999398+cranechu0131@users.noreply.github.com>
    Co-authored-by: Zhao Changmin <changmin.zhao@intel.com>
    Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
    Co-authored-by: Zijie Li <michael20001122@gmail.com>
    Co-authored-by: Chu,Youcheng <1340390339@qq.com>

Yuwen Hu | 5e8286f72d | Update ipex-llm default transformers version to 4.37.0 (#11859) | 2024-08-20 17:37:58 +08:00
    * Update default transformers version to 4.37.0
    * Add dependency requirements for qwen and qwen-vl
    * Temp fix transformers version for these not yet verified models
    * Skip qwen test in UT for now as it requires transformers<4.37.0

Yishuo Wang | d4ee0a89f3 | optimize phi3 memory usage (#11867) | 2024-08-20 17:32:51 +08:00

SONG Ge | 5b83493b1a | Add ipex-llm npu option in setup.py (#11858) | 2024-08-20 17:29:49 +08:00
    * add ipex-llm npu release
    * update example doc
    * meet latest release changes

Heyang Sun | ee6852c915 | Fix typo (#11862) | 2024-08-20 16:38:11 +08:00

Yishuo Wang | 2946420e14 | add minicpmv 2.6 load_low_bit workaround (#11856) | 2024-08-20 11:16:02 +08:00

SONG Ge | 7380823f3f | Update Llama2 multi-processes example (#11852) | 2024-08-19 19:49:01 +08:00
    * update llama2 multi-processes examples
    * update
    * update readme
    * update
Yang Wang | 99b05ba1dc | separate prefill into a process (#11787) | 2024-08-19 17:53:36 +08:00
    * separate prefill into a process
    * using model.share_memory()
    * might work
    * worked
    * use long prompt
    * refactor
    * cleanup
    * fix bug
    * clean up
    * changeable inter and intra process stages
    * refactor
    * add max output len
    * fix npu_model changes that may cause generate down
    * fix npu_model generate import error
    * fix generate forward error
    ---------
    Co-authored-by: sgwhat <ge.song@intel.com>
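The prefill-in-a-separate-process change above relies on `model.share_memory()` so a worker process can read the same weights while the main process keeps decoding. A minimal sketch of that pattern follows; the names (`TinyModel`, `run_prefill`) are purely illustrative and this is not the actual NPU pipeline code.

```python
# Sketch: run the prompt (prefill) pass in a worker process over shared weights.
import torch
import torch.multiprocessing as mp


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = torch.nn.Linear(64, 64)

    def forward(self, x):
        return self.proj(x)


def run_prefill(model, prompt, queue):
    # The worker computes the prompt pass and hands the result back to the parent.
    with torch.no_grad():
        queue.put(model(prompt))


if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)
    model = TinyModel()
    model.share_memory()              # weights become visible to the child process
    queue = mp.Queue()
    prompt = torch.randn(1, 8, 64)
    worker = mp.Process(target=run_prefill, args=(model, prompt, queue))
    worker.start()
    hidden = queue.get()              # main process would continue decoding from here
    worker.join()
    print(hidden.shape)
```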
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
Jinhe | da3d7a3a53 | delete transformers version requirement (#11845) | 2024-08-19 17:53:02 +08:00
    * delete transformers version requirement
    * delete transformers version requirement

Ruonan Wang | a0fbda5bc8 | add MiniCPM-Llama3-V-2_5 into all-in-one benchmark (#11849) | 2024-08-19 17:51:16 +08:00

Yishuo Wang | 9490781aec | optimize phi3 memory usage again (#11848) | 2024-08-19 17:26:59 +08:00
Yina Chen | 3cd4e87168 | Support compress KV with quantize KV (#11812) | 2024-08-19 15:32:32 +08:00
    * update llama
    * support llama 4.41
    * fix style
    * support minicpm
    * support qwen2
    * support minicpm & update
    * support chatglm4
    * support chatglm
    * remove print
    * add DynamicCompressFp8Cache & support qwen
    * support llama
    * support minicpm phi3
    * update chatglm2/4
    * small fix & support qwen 4.42
    * remove print
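For context on what "compress KV with quantize KV" trades off, here is a minimal sketch of storing cached keys/values as int8 plus a per-slice scale. It only illustrates the memory saving and is not the `DynamicCompressFp8Cache` implementation referenced above.

```python
# Sketch: symmetric 8-bit quantization of a cached key tensor and its round trip.
import torch


def quantize_kv(t):
    # per-(batch, head, position) scale over the head dimension
    scale = t.abs().amax(dim=-1, keepdim=True).clamp(min=1e-6) / 127.0
    q = torch.clamp(torch.round(t / scale), -127, 127).to(torch.int8)
    return q, scale


def dequantize_kv(q, scale):
    return q.to(scale.dtype) * scale


if __name__ == "__main__":
    # [batch, heads, seq, head_dim]; real caches are usually fp16/bf16
    k = torch.randn(1, 32, 1024, 128)
    q, scale = quantize_kv(k)
    k_restored = dequantize_kv(q, scale)
    print(q.nelement() * q.element_size(), "bytes vs", k.nelement() * k.element_size())
    print("max abs error:", (k - k_restored).abs().max().item())
```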
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
Zhao Changmin | 6841a9ac8f | fix load low bit com dtype (#11832) | 2024-08-19 13:43:19 +08:00

Yuwen Hu | cfc959defa | Fixes regarding utf-8 in all-in-one benchmark (#11839) | 2024-08-19 10:38:00 +08:00

Chu,Youcheng | 46a1cbfa64 | feat: add mixed_precision argument on ppl longbench evaluation (#11837) | 2024-08-19 10:00:44 +08:00
    * feat: add mixed_precision argument on ppl longbench evaluation
    * fix: delete two spaces
    ---------
    Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>

Yuwen Hu | 580c94d0e2 | Remove gemma-2-9b-it 3k input from igpu-perf (#11834) | 2024-08-17 13:10:05 +08:00

Jin, Qiao | 9f17234f3b | Add MiniCPM-V-2_6 to iGPU Perf (#11810) | 2024-08-16 18:41:21 +08:00
    * Add MiniCPM-V-2_6 to iGPU Perf
    * keep last model in yaml
    * fix MINICPM_V_IDS
    * Restore tested model list
    * Small fix
    ---------
    Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>

Yuwen Hu | 96796f95cb | Update all-in-one benchmark prompts for continuation task & lookup update for minicpmv (#11827) | 2024-08-16 17:16:35 +08:00
    * Update all-in-one benchmark prompts for continuation task
    * Small fix
    * Add pure-text benchmark support for minicpm-v-2_6
    * Support lookahead for model.llm generate of minicpmv
    * Add prompt reference
    * Small update
    * Small fix
Yishuo Wang | e966e85df8 | force lm_head optimization in any model if set environment variable (#11830) | 2024-08-16 16:48:45 +08:00

RyuKosei | 3b630fb9df | updated ppl README (#11807) | 2024-08-16 15:49:25 +08:00
    * edit README.md
    * update the branch
    * edited README.md
    * updated
    * updated description
    ---------
    Co-authored-by: jenniew <jenniewang123@gmail.com>

Jinhe | e07a55665c | Codegeex2 tokenization fix (#11831) | 2024-08-16 15:48:47 +08:00
    * updated tokenizer file
    * updated tokenizer file
    * updated tokenizer file
    * updated tokenizer file
    * new folder

Jinhe | adfbb9124a | Reorganize MiniCPM-V-2_6 example & update other MiniCPM-V-2 examples (#11815) | 2024-08-16 14:48:56 +08:00
    * model to fp16 & 2_6 reorganize
    * revisions
    * revisions
    * half
    * deleted transformer version requirements
    * deleted transformer version requirements
    ---------
    Co-authored-by: ivy-lv11 <zhicunlv@gmail.com>

Chu,Youcheng | f463268e36 | fix: add run oneAPI instruction for the example of codeshell (#11828) | 2024-08-16 14:29:06 +08:00
    * fix: delete ipex extension import in ppl wikitext evaluation
    * feat: add mixed_precision argument on ppl wikitext evaluation
    * fix: delete mix_precision command in perplex evaluation for wikitext
    * fix: remove fp16 mixed-precision argument
    * fix: Add a space.
    * fix: add run oneAPI instruction for the example of codeshell
    * fix: textual adjustments
    * fix: Textual adjustment
    ---------
    Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>

Yishuo Wang | 17a0beb21f | optimize qwen2-audio again (#11825) | 2024-08-16 11:11:35 +08:00
Yuwen Hu | 9e9086cc2a | Update IPEX_LLM_PERFORMANCE_MODE (#11823) | 2024-08-16 09:48:36 +08:00

Wang, Jian4 | 5a80fd2633 | Fix lightweight-serving no streaming resp on mtl (#11822) | 2024-08-16 09:43:03 +08:00

Guancheng Fu | e70ae0638e | Fix vLLM not convert issues (#11817) | 2024-08-15 19:04:05 +08:00
    * Fix not convert issues
    * refine

Yishuo Wang | 750d4ad5dc | fix minicpm-v-2 fp16 (#11819) | 2024-08-15 18:34:40 +08:00

Yuwen Hu | 6543321f04 | Remove 4k igpu perf on gemma-2-9b-it (#11820) | 2024-08-15 18:06:19 +08:00

Chu,Youcheng | 28d1c972da | add mixed_precision argument on ppl wikitext evaluation (#11813) | 2024-08-15 17:58:53 +08:00
    * fix: delete ipex extension import in ppl wikitext evaluation
    * feat: add mixed_precision argument on ppl wikitext evaluation
    * fix: delete mix_precision command in perplex evaluation for wikitext
    * fix: remove fp16 mixed-precision argument
    * fix: Add a space.
    ---------
    Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>
Yishuo Wang | 828ab16537 | fix phi3 and minicpmv cpu (#11818) | 2024-08-15 17:43:29 +08:00

Yishuo Wang | 4e178f0c5d | rewrite minicpmv optimization (#11816) | 2024-08-15 17:27:12 +08:00

Ch1y0q | 447c8ed324 | update transformers version for replit-code-v1-3b, `internlm2-chat-… (#11811) | 2024-08-15 16:40:48 +08:00
    * update transformers version for `replit-code-v1-3b`, `internlm2-chat-7b` and mistral
    * remove for default transformers version

Jinhe | 2fbbb51e71 | transformers==4.37, yi & yuan2 & vicuna (#11805) | 2024-08-15 15:39:24 +08:00
    * transformers==4.37
    * added yi model
    * added yi model
    * xxxx
    * delete prompt template
    * / and delete

Jinhe | f43da2d455 | deletion of specification of transformers version (#11808) | 2024-08-15 15:23:32 +08:00

Yishuo Wang | 07b7f13982 | support and optimize qwen2-audio (#11809) | 2024-08-15 14:59:04 +08:00
Chu,Youcheng | 3ac83f8396 | fix: delete ipex extension import in ppl wikitext evaluation (#11806) | 2024-08-15 13:40:01 +08:00
    Co-authored-by: Jinhe Tang <jin.tang1337@gmail.com>

Yishuo Wang | 9a93808fc5 | fix and optimize minicpm v 2 (#11799) | 2024-08-14 17:27:23 +08:00

Jinhe | d8d887edd2 | added minicpm-v-2_6 (#11794) | 2024-08-14 16:23:44 +08:00

Yishuo Wang | 3d6cfa291d | optimize minicpm v 2.5 (#11793) | 2024-08-14 16:07:24 +08:00

Yuwen Hu | 356281cb80 | Further all-in-one benchmark update continuation task (#11784) | 2024-08-14 14:39:34 +08:00
    * Further update prompt for continuation task, and disable lookup candidate update strategy on MTL
    * style fix

Ruonan Wang | 43cca3be27 | fix gemma2 runtime error caused by sliding window (#11788) | 2024-08-14 10:43:33 +08:00
    * fix runtime error
    * revert workflow
Yang Wang | 51bcac1229 | follow up on experimental support of fused decoder layer for llama2 (#11785) | 2024-08-13 18:53:55 -07:00
    * clean up and support transpose value cache
    * refine
    * fix style
    * fix style

Yishuo Wang | cb79dcda93 | refactor llama convert to fix minicpm-v 2.5 optimization (#11783) | 2024-08-14 09:29:57 +08:00

Yina Chen | 7cd6ec9723 | MiniCPM-V support compresskv (#11779) | 2024-08-13 19:03:40 +08:00
    * fix check error
    * fix other models
    * remove print

Qiyuan Gong | 3998de14f0 | Fix mistral forward_qkv in q4_0 (#11781) | 2024-08-13 16:48:19 +08:00
    * Fix mistral forward_qkv without self.rotary_emb.base in q4_0.
    * Replace apply_rotary_pos_emb_no_cache_xpu with rotary_half_inplaced.
    * Revert https://github.com/intel-analytics/ipex-llm/pull/11765
Heyang Sun | 70c828b87c | deepspeed zero3 QLoRA finetuning (#11625) | 2024-08-13 16:15:29 +08:00
    * deepspeed zero3 QLoRA finetuning
    * Update convert.py
    * Update low_bit_linear.py
    * Update utils.py
    * Update qlora_finetune_llama2_13b_arch_2_card.sh
    * Update low_bit_linear.py
    * Update alpaca_qlora_finetuning.py
    * Update low_bit_linear.py
    * Update utils.py
    * Update convert.py
    * Update alpaca_qlora_finetuning.py
    * Update alpaca_qlora_finetuning.py
    * Update low_bit_linear.py
    * Update deepspeed_zero3.json
    * Update qlora_finetune_llama2_13b_arch_2_card.sh
    * Update low_bit_linear.py
    * Update low_bit_linear.py
    * Update utils.py
    * fix style
    * fix style
    * Update alpaca_qlora_finetuning.py
    * Update qlora_finetune_llama2_13b_arch_2_card.sh
    * Update convert.py
    * Update low_bit_linear.py
    * Update model.py
    * Update alpaca_qlora_finetuning.py
    * Update low_bit_linear.py
    * Update low_bit_linear.py
    * Update low_bit_linear.py
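The entry above touches a `deepspeed_zero3.json` file. For orientation only, here is a rough sketch of what a ZeRO-3 configuration can look like when expressed as a Python dict (DeepSpeed accepts either form); every value is a placeholder and not the setting used in this PR.

```python
# Sketch of a ZeRO-3 DeepSpeed config for QLoRA-style finetuning (placeholder values).
deepspeed_zero3_config = {
    "zero_optimization": {
        "stage": 3,                              # partition params, grads and optimizer states
        "offload_param": {"device": "cpu"},      # push sharded params to host memory
        "offload_optimizer": {"device": "cpu"},
    },
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 16,
}

# Typically handed to deepspeed.initialize(model=model, config=deepspeed_zero3_config, ...)
```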
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
Yishuo Wang | a184b120c9 | fix minicpm-v 2.5 (#11780) | 2024-08-13 16:14:00 +08:00

Yuwen Hu | ec184af243 | Add gemma-2-2b-it and gemma-2-9b-it to igpu nightly performance test (#11778) | 2024-08-13 15:39:56 +08:00
    * add yaml and modify `concat_csv.py` for `transformers` 4.43.1 (#11758)
    * add yaml and modify `concat_csv.py` for `transformers` 4.43.1
    * remove 4.43 for arc; fix;
    * remove 4096-512 for 4.43
    * comment some models
    * Small fix
    * uncomment models (#11777)
    ---------
    Co-authored-by: Ch1y0q <qiyue2001@gmail.com>
Qiyuan Gong | a88c132e54 | Reduce Mistral softmax memory only in low memory mode (#11775) | 2024-08-13 14:50:54 +08:00
    * Reduce Mistral softmax memory only in low memory mode
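The low-memory change above is in the spirit of the generic chunked-softmax trick sketched below: attention scores are materialized for one query block at a time, trading extra kernel launches for lower peak memory. This is only an illustration of the idea, not the Mistral code path itself; the causal mask is omitted for brevity.

```python
# Sketch: compute softmax(QK^T)V over query chunks to bound the score-matrix size.
import torch


def chunked_attention(q, k, v, chunk=256):
    # q, k, v: [batch, heads, seq, head_dim]
    scale = q.shape[-1] ** -0.5
    outputs = []
    for start in range(0, q.shape[2], chunk):
        q_blk = q[:, :, start:start + chunk]
        scores = (q_blk @ k.transpose(-1, -2)) * scale   # [b, h, chunk, seq]
        outputs.append(torch.softmax(scores, dim=-1) @ v)
    return torch.cat(outputs, dim=2)


if __name__ == "__main__":
    q = k = v = torch.randn(1, 8, 1024, 64)
    print(chunked_attention(q, k, v).shape)
```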
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
Yishuo Wang | aa861df066 | use new fp32 softmax kernel (#11776) | 2024-08-13 14:48:11 +08:00

binbin Deng | 23d3acdc77 | Add experimental support of fused decoder layer for llama2 (#11768) | 2024-08-13 14:41:36 +08:00

Jin, Qiao | c28b3389e6 | Update npu multimodal example (#11773) | 2024-08-13 14:14:59 +08:00
Yuwen Hu | 81824ff8c9 | Fix stdout in all-in-one benchmark to utf-8 (#11772) | 2024-08-13 10:51:08 +08:00
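The usual Python 3.7+ way to force UTF-8 on stdout, shown only as a generic illustration of the kind of fix this commit describes (not the benchmark script itself):

```python
# Sketch: make sure non-ASCII model output prints on consoles with a legacy code page.
import sys

if hasattr(sys.stdout, "reconfigure"):
    sys.stdout.reconfigure(encoding="utf-8")

print("模型输出 prints correctly once stdout is UTF-8")
```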
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
Yishuo Wang | a1eb793f70 | optimize minicpm v 2_6 first token perf (#11770) | 2024-08-13 09:51:18 +08:00

Yina Chen | 841dbcdf3a | Fix compresskv with lookahead issue (#11767) | 2024-08-12 18:53:55 +08:00
    * fix compresskv + lookahead attn_mask qwen2
    * support llama chatglm
    * support mistral & chatglm
    * address comments
    * revert run.py

Yuwen Hu | f97a77ea4e | Update all-in-one benchmark for continuation task input preparation (#11760) | 2024-08-12 17:49:45 +08:00
    * All use 8192.txt for prompt preparation for now
    * Small fix
    * Fix text encoding mode to utf-8
    * Small update

Xu, Shuo | 1b05caba2b | Set mistral fuse rope to false except fp6 & fp16 (#11765) | 2024-08-12 17:25:07 +08:00
    * set mistral fuse rope to false except fp6 & fp16
    * lint
    * lint
    ---------
    Co-authored-by: ATMxsp01 <shou.xu@intel.com>

Ruonan Wang | 8db34057b4 | optimize lookahead init time (#11769) | 2024-08-12 17:19:12 +08:00

Jin, Qiao | 05989ad0f9 | Update npu example and all in one benchmark (#11766) | 2024-08-12 16:46:46 +08:00
Yishuo Wang | 57d177738d | optimize minicpm-v-2_6 repetition penalty (#11763) | 2024-08-12 14:10:10 +08:00
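As a reference for what the optimized path has to compute, the standard repetition-penalty rule (positive logits of already-generated tokens are divided by the penalty, negative ones multiplied) is sketched below; this is the generic formulation, not the minicpm-v-2_6 kernel itself.

```python
# Sketch: apply repetition penalty to the logits of previously generated tokens.
import torch


def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    # logits: [vocab], generated_ids: [num_prev_tokens]
    scores = logits.gather(0, generated_ids)
    scores = torch.where(scores > 0, scores / penalty, scores * penalty)
    return logits.scatter(0, generated_ids, scores)


if __name__ == "__main__":
    logits = torch.randn(32000)
    prev = torch.tensor([5, 17, 5, 99])
    print(apply_repetition_penalty(logits, prev, penalty=1.2).shape)
```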
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
Wang, Jian4 | 245dba0abc | Fix lightweight-serving codegeex error (#11759) | 2024-08-12 10:35:37 +08:00
Ruonan Wang | 66fe2ee464 | initial support of IPEX_LLM_PERFORMANCE_MODE (#11754) | 2024-08-09 19:04:09 +08:00
    * add perf mode
    * update
    * fix style
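`IPEX_LLM_PERFORMANCE_MODE` is an environment-variable switch. The sketch below only shows the usual way such a flag is consumed; the accepted values ("1" here) and what exactly it toggles inside the library are assumptions, not documented behaviour.

```python
# Sketch: read an environment-variable feature flag and branch on it.
import os

performance_mode = os.environ.get("IPEX_LLM_PERFORMANCE_MODE", "0") == "1"

if performance_mode:
    # e.g. pick a latency-oriented code path
    print("performance mode enabled")
else:
    print("default mode")
```

In practice such a flag is set before launching the script, e.g. `IPEX_LLM_PERFORMANCE_MODE=1 python generate.py`.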
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
Yina Chen | 4b9c57cc60 | Support compress kv with lookahead (#11752) | 2024-08-09 17:39:57 +08:00
    * support compress kv with lookahead
    * enough kv miss param

Yishuo Wang | 93455aac09 | fix minicpm V 2.6 repeat output (#11753) | 2024-08-09 17:39:24 +08:00

Ruonan Wang | 7e917d6cfb | fix gptq of llama (#11749) | 2024-08-09 16:39:25 +08:00
    * fix gptq of llama
    * small fix

Yina Chen | dd46c141bd | Phi3 support compresskv (#11733) | 2024-08-09 15:43:43 +08:00
    * phi3 support compresskv
    * fix phi3 mtl error
    * fix conflict with quant kv
    * fix abnormal on mtl
    * fix style
    * use slide windows size to compress kv
    * support sliding window
    * fix style
    * fix style
    * temp: partial support quant kv
    * support quant kv with compress kv, todo: model check
    * temp
    * fix style
    * fix style
    * remove prepare
    * address comment
    * default -> 1.8k
Qiyuan Gong | d8808cc2e3 | Mistral apply_rotary_pos_emb_no_cache_xpu use rope_theta from config (#11747) | 2024-08-09 10:35:51 +08:00
    mistral-7B-instruct-v0.2 and mistral-7B-instruct-v0.1 use different rope_theta (0.2 is 1e, 0.1 is 1e5). Pass self.config.rope_theta to apply_rotary_pos_emb_no_cache_xpu to avoid output difference.
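Why passing `self.config.rope_theta` matters: the rotary-embedding frequencies are derived directly from it, so checkpoints shipping different `rope_theta` values produce different position encodings whenever the base is hard-coded. A sketch of the standard frequency computation follows; the theta values in the demo loop are illustrative examples, not a statement about the two Mistral checkpoints.

```python
# Sketch: rotary embedding inverse frequencies as a function of rope_theta.
import torch


def rotary_inv_freq(head_dim: int, rope_theta: float) -> torch.Tensor:
    return 1.0 / (rope_theta ** (torch.arange(0, head_dim, 2).float() / head_dim))


if __name__ == "__main__":
    # In model code, rope_theta would come from the loaded config rather than a constant.
    for theta in (10000.0, 1000000.0):
        print(theta, rotary_inv_freq(128, theta)[:3])
```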
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
Xiangyu Tian | 044e486480 | Fix vLLM CPU /chat endpoint (#11748) | 2024-08-09 10:33:52 +08:00

Jinhe | 27b4b104ed | Add qwen2-1.5b-instruct into igpu performance (#11735) | 2024-08-08 16:42:18 +08:00
    * updated qwen1.5B to all transformer==4.37 yaml
    * updated qwen1.5B to all transformer==4.37 yaml

Shaojun Liu | 107f7aafd0 | enable inference mode for deepspeed tp serving (#11742) | 2024-08-08 14:38:30 +08:00
Zijie Li | 9e65cf00b3 | Add openai-whisper pytorch gpu (#11736) | 2024-08-08 12:32:59 +08:00
    * Add openai-whisper pytorch gpu
    * Update README.md
    * Update README.md
    * fix typo
    * fix names update readme
    * Update README.md
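A rough sketch of what the added openai-whisper GPU example presumably looks like: load a whisper checkpoint, apply ipex-llm's `optimize_model`, move it to an Intel GPU and transcribe a file. The model size, audio path and device handling are placeholders; the example's README is the authoritative recipe.

```python
# Sketch: openai-whisper transcription with ipex-llm low-bit optimization (placeholders).
import whisper
from ipex_llm import optimize_model

model = whisper.load_model("medium")           # any whisper checkpoint
model = optimize_model(model)                  # low-bit optimization from ipex-llm
model = model.to("xpu")                        # Intel GPU device (requires an XPU-enabled setup)

result = model.transcribe("sample_audio.mp3")  # audio path is a placeholder
print(result["text"])
```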
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
Jinhe | d0c89fb715 | updated llama.cpp and ollama quickstart (#11732) | 2024-08-08 11:04:01 +08:00
    * updated llama.cpp and ollama quickstart.md
    * added qwen2-1.5B sample output
    * revision on quickstart updates
    * revision on quickstart updates
    * revision on qwen2 readme
    * added 2 troubleshoots
    * troubleshoot revision

Yishuo Wang | 54cc9353db | support and optimize minicpm-v-2_6 (#11738) | 2024-08-07 18:21:16 +08:00

Yina Chen | e956e71fc1 | fix conflict with quant kv (#11737) | 2024-08-07 18:10:30 +08:00
Ruonan Wang | 00a5574c8a | Use merge_qkv to replace fused_qkv for llama2 (#11727) | 2024-08-07 18:04:01 +08:00
    * update 4.38
    * support new versions
    * update
    * fix style
    * fix style
    * update rope
    * temp test sdpa
    * fix style
    * fix cpu ut
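What `merge_qkv` refers to in general: the separate q/k/v projections are fused into one matmul so a single kernel produces all three tensors. A self-contained sketch of that transformation (not the llama2-specific code in this commit):

```python
# Sketch: fuse three Linear projections into one and split the output back into q, k, v.
import torch


def merge_qkv(q_proj, k_proj, v_proj):
    merged = torch.nn.Linear(
        q_proj.in_features,
        q_proj.out_features + k_proj.out_features + v_proj.out_features,
        bias=False,
    )
    merged.weight.data = torch.cat(
        [q_proj.weight.data, k_proj.weight.data, v_proj.weight.data], dim=0
    )
    return merged


if __name__ == "__main__":
    hidden = 64
    q = torch.nn.Linear(hidden, hidden, bias=False)
    k = torch.nn.Linear(hidden, hidden, bias=False)
    v = torch.nn.Linear(hidden, hidden, bias=False)
    fused = merge_qkv(q, k, v)
    x = torch.randn(2, hidden)
    q_out, k_out, v_out = fused(x).split([hidden, hidden, hidden], dim=-1)
    assert torch.allclose(q_out, q(x), atol=1e-5)
```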
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
Yina Chen | d2abc9711b | Fix MTL 4k input qwen2 compresskv error (#11734) | 2024-08-07 16:21:57 +08:00
    * fix
    * fix style

Yina Chen | a71ae7c22b | Support minicpm compresskv & modify default compresskv config & default enable compresskv on mtl 2.5k~4.5k (#11726) | 2024-08-07 11:35:39 +08:00
    * support minicpm & modify default & default enable on mtl 2.5k~4.5k
    * fix style

Yishuo Wang | c093f7d980 | fix phi3 (#11729) | 2024-08-07 09:39:46 +08:00
Zijie Li | e7f7141781 | Add benchmark util for transformers 4.42 (#11725) | 2024-08-07 08:48:07 +08:00
    * add new benchmark_util.py
      Add new benchmark_util.py for transformers>=4.43.1. The old one renamed to benchmark_util_prev.py.
    * Small fix to import code
    * Update __init__.py
    * fix file names
    * Update lint-python
      Update lint-python to exclude benchmark_util_4_29.py benchmark_util_4_43.py
    * Update benchmark_util_4_43.py
    * add benchmark_util for transformers 4.42
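Keeping several `benchmark_util` variants side by side implies picking one at import time based on the installed transformers release. The sketch below illustrates that dispatch; the exact version boundaries and the way the benchmark harness actually selects its module may differ.

```python
# Sketch: choose a benchmark_util variant by the installed transformers version.
import transformers
from packaging import version

v = version.parse(transformers.__version__)

if v >= version.parse("4.43.0"):
    module_name = "benchmark_util_4_43"
elif v >= version.parse("4.42.0"):
    module_name = "benchmark_util_4_42"
else:
    module_name = "benchmark_util_4_29"

print("would load:", module_name)
```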
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
Ch1y0q | 4676af2054 | add gemma2 example (#11724) | 2024-08-06 21:17:50 +08:00
    * add `gemma2`
    * update `transformers` version
    * update `README.md`

SichengStevenLi | 985213614b | Removed no longer needed models for Arc nightly perf (#11722) | 2024-08-06 16:12:00 +08:00
    * removed LLMs that are no longer needed
      Removed: mistralai/Mistral-7B-v0.1, deepseek-ai/deepseek-coder-6.7b-instruct
    * Update arc-perf-test-batch4.yaml
      Removed: deepseek-ai/deepseek-coder-6.7b-instruct, mistralai/Mistral-7B-v0.1
    * Update arc-perf-test.yaml
      Removed: deepseek-ai/deepseek-coder-6.7b-instruct, mistralai/Mistral-7B-v0.1
    * Create arc-perf-transformers-438.yaml
    * Moved arc-perf-transformers-438.yaml location
    * Create arc-perf-transformers-438-batch2.yaml
    * Create arc-perf-transformers-438-batch4.yaml
    * Delete python/llm/test/benchmark/arc-perf-transformers-438-batch2.yaml
    * Delete python/llm/test/benchmark/arc-perf-transformers-438-batch4.yaml
    * Delete python/llm/test/benchmark/arc-perf-transformers-438.yaml

Yishuo Wang | 929675aa6b | support latest phi3 (#11721) | 2024-08-06 15:52:55 +08:00

Jin, Qiao | 11650b6f81 | upgrade glm-4v example transformers version (#11719) | 2024-08-06 14:55:09 +08:00

Yishuo Wang | bbdff6edeb | optimize internvl2 4b performance (#11720) | 2024-08-06 14:25:08 +08:00

Yishuo Wang | f44b732aa8 | support internvl2-4b (#11718) | 2024-08-06 13:36:32 +08:00
Jin, Qiao | 7f241133da | Add MiniCPM-Llama3-V-2_5 GPU example (#11693) | 2024-08-06 10:22:41 +08:00
    * Add MiniCPM-Llama3-V-2_5 GPU example
    * fix

Jin, Qiao | 808d9a7bae | Add MiniCPM-V-2 GPU example (#11699) | 2024-08-06 10:22:33 +08:00
    * Add MiniCPM-V-2 GPU example
    * add example in README.md
    * add example in README.md

Zijie Li | 8fb36b9f4a | add new benchmark_util.py (#11713) | 2024-08-05 16:18:48 +08:00
    * add new benchmark_util.py

Wang, Jian4 | 493cbd9a36 | Support lightweight-serving with internlm-xcomposer2-vl-7b multimodal input (#11703) | 2024-08-05 09:36:04 +08:00
    * init image_list
    * enable internlm-xcomposer2 image input
    * update style
    * add readme
    * update model
    * update readme

Ruonan Wang | aa98ef96fe | change mixed_precision to q6_k (#11706) | 2024-08-02 15:55:16 +08:00

Xiangyu Tian | 1baa3efe0e | Optimizations for Pipeline Parallel Serving (#11702) | 2024-08-02 12:06:59 +08:00
    Optimizations for Pipeline Parallel Serving
									Yina Chen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8d1e0bd2f4 
								
							 
						 
						
							
							
								
								add sdp causal support in llama ( #11705 )  
							
							 
							
							
							
						 
						
							2024-08-02 10:27:40 +08:00  
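For context on the "sdp causal" path referenced in the entry above: a minimal sketch of causal scaled-dot-product attention using the stock PyTorch API rather than ipex-llm's own fused kernels; the tensor shapes are purely illustrative.

    import torch
    import torch.nn.functional as F

    # Toy tensors laid out as (batch, num_heads, seq_len, head_dim).
    q = torch.randn(1, 32, 128, 64)
    k = torch.randn(1, 32, 128, 64)
    v = torch.randn(1, 32, 128, 64)

    # is_causal=True applies the lower-triangular mask inside the kernel,
    # so no explicit attention-mask tensor has to be materialised.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    print(out.shape)  # torch.Size([1, 32, 128, 64])
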
Ruonan Wang
736a7ef72e  add sdp_causal for mistral 4.36 (#11686)
* add sdp_causal for mistral
* fix
* update
2024-08-01 18:57:31 +08:00

Yina Chen
45c730ff39  Chatglm support compresskv (#11690)
* chatglm4 support compresskv
* fix
* fix style
* support chatglm2
* fix quantkv conflict
* fix style
2024-08-01 18:20:20 +08:00

Qiyuan Gong
762ad49362  Add RANK_WAIT_TIME into DeepSpeed-AutoTP to avoid CPU memory OOM (#11704)
* DeepSpeed-AutoTP starts multiple processes to load models and convert them in CPU memory. If model/rank_num is large, this will lead to OOM. Add RANK_WAIT_TIME to reduce memory usage by controlling model reading parallelism.
2024-08-01 18:16:21 +08:00

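The entry above describes staggering model loading across DeepSpeed-AutoTP ranks so that they do not all deserialize the checkpoint at once. A minimal sketch of that idea, assuming a per-rank delay read from a RANK_WAIT_TIME environment variable; the actual ipex-llm scripts may wire this up differently.

    import os
    import time

    # Each rank waits local_rank * RANK_WAIT_TIME seconds before loading,
    # so checkpoint reading is spread out across ranks and peak CPU memory
    # stays bounded.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    rank_wait_time = int(os.environ.get("RANK_WAIT_TIME", "0"))  # seconds per rank

    time.sleep(local_rank * rank_wait_time)
    # ... load and convert the model for this rank here ...
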
hxsz1997
8ef4caaf5d  add 3k and 4k input of nightly perf test on iGPU (#11701)
* Add 3k&4k input in workflow for iGPU (#11685)
* add 3k&4k input in workflow
* comment for test
* comment models for accelerate test
* remove OOM models
* modify typo
* change test model (#11696)
* reverse test models (#11700)
2024-08-01 14:17:46 +08:00

Guancheng Fu
afeca38a47  Fix import vllm condition (#11682)
2024-07-31 13:50:01 +08:00

Ruonan Wang
54bf3a23a6  add fallback for unsupported k-quants (#11691)
* add fallback
* fix style
* fix
2024-07-31 11:39:58 +08:00

Zijie Li
5079ed9e06  Add Llama3.1 example (#11689)
* Add Llama3.1 example for Linux arc and Windows MTL
* Changes made to adjust compatibilities (transformers changed to 4.43.1)
* Update index.rst
* Update README.md
* Update index.rst
* Update index.rst
* Update index.rst
2024-07-31 10:53:30 +08:00

Jin, Qiao
6e3ce28173  Upgrade glm-4 example transformers version (#11659)
* upgrade glm-4 example transformers version
* move pip install in one line
2024-07-31 10:24:50 +08:00

Jin, Qiao
a44ab32153  Switch to conhost when running on NPU (#11687)
2024-07-30 17:08:06 +08:00

Wang, Jian4
b119825152  Remove tgi parameter validation (#11688)
* remove validation
* add min warm up
* remove unneeded source
2024-07-30 16:37:44 +08:00

Yina Chen
670ad887fc  Qwen support compress kv (#11680)
* Qwen support compress kv
* fix style
* fix
2024-07-30 11:16:42 +08:00

hxsz1997
9b36877897  disable default quantize_kv of GQA on MTL (#11679)
* disable default quantize_kv of GQA on MTL
* fix style
2024-07-30 09:38:46 +08:00

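The entry above only changes the default quantize-kv behaviour for GQA models on MTL; ipex-llm also exposes an environment switch to force quantized KV cache on or off. The variable name below is taken from the project's documentation and should be treated as an assumption here, not as guaranteed for every release.

    import os

    # Force-disable quantized KV cache (set to "1" to force-enable), overriding
    # whatever per-device default ipex-llm would otherwise pick.
    # Assumed variable name; check the ipex-llm docs for your release.
    os.environ["IPEX_LLM_QUANTIZE_KV_CACHE"] = "0"
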
Yishuo Wang
c02003925b  add mlp for gemma2 (#11678)
2024-07-29 16:10:23 +08:00

RyuKosei
1da1f1dd0e  Combine two versions of run_wikitext.py (#11597)
* Combine two versions of run_wikitext.py
* Update run_wikitext.py
* Update run_wikitext.py
* aligned the format
* update error display
* simplified argument parser
Co-authored-by: jenniew <jenniewang123@gmail.com>
2024-07-29 15:56:16 +08:00

Yishuo Wang
6f999e6e90  add sdp for gemma2 (#11677)
2024-07-29 15:15:47 +08:00

Ruonan Wang
c11d5301d7  add sdp fp8 for llama (#11671)
* add sdp fp8 for llama
* fix style
* refactor
2024-07-29 13:46:22 +08:00

Yishuo Wang
7f88ce23cd  add more gemma2 optimization (#11673)
2024-07-29 11:13:00 +08:00

Yishuo Wang
3e8819734b  add basic gemma2 optimization (#11672)
2024-07-29 10:46:51 +08:00

Guoqiong Song
336dfc04b1  fix 1482 (#11661)
Co-authored-by: rnwang04 <ruonan1.wang@intel.com>
2024-07-26 12:39:09 -07:00

Heyang Sun
ba01b85c13  empty cache only for 1st token, not rest tokens, to speed up (#11665)
2024-07-26 16:46:21 +08:00

Yina Chen
fc7f8feb83  Support compress kv (#11642)
* mistral snapkv
* update
* mtl update
* update
* update
* update
* add comments
* style fix
* fix style
* support llama
* llama use compress kv
* support mistral 4.40
* fix style
* support diff transformers versions
* move snapkv util to kv
* fix style
* meet comments & small fix
* revert all in one
* fix indent
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
2024-07-26 16:02:00 +08:00

Yishuo Wang
6bcdc6cc8f  fix qwen2 cpu (#11663)
2024-07-26 13:41:51 +08:00

Wang, Jian4
23681fbf5c  Support codegeex4-9b for lightweight-serving (#11648)
* add options, support prompt and not return end_token
* enable openai parameter
* set do_sample None and update style
2024-07-26 09:41:03 +08:00

Guancheng Fu
a4d30a8211  Change logic for detecting if vllm is available (#11657)
* fix
* fix
2024-07-25 15:24:19 +08:00

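Both vLLM-related fixes in this log (#11682 and #11657) revolve around deciding whether vllm can be imported before enabling vLLM-specific code paths. A generic way to probe that, shown only as an illustration of the idea rather than the repository's actual detection logic:

    import importlib.util

    def is_vllm_available() -> bool:
        # Probe the import machinery instead of importing vllm eagerly,
        # so the availability check stays cheap and side-effect free.
        return importlib.util.find_spec("vllm") is not None

    if is_vllm_available():
        print("vllm detected, enabling vLLM-specific paths")
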
Qiyuan Gong
0c6e0b86c0  Refine continuation get input_str (#11652)
* Remove duplicate code in continuation get input_str.
* Avoid infinite loop in all-in-one due to test_length not in the list.
2024-07-25 14:41:19 +08:00

RyuKosei
2fbd375a94  update several models for nightly perf test (#11643)
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com>
2024-07-25 14:06:08 +08:00

Xiangyu Tian
4499d25c26  LLM: Fix ParallelLMHead convert in vLLM cpu (#11654)
2024-07-25 13:07:19 +08:00

binbin Deng
777e61d8c8  Fix qwen2 & int4 on NPU (#11646)
2024-07-24 13:14:39 +08:00

Yishuo Wang
1b3b46e54d  fix chatglm new model (#11639)
2024-07-23 13:44:56 +08:00

Xu, Shuo
7f80db95eb  Change run.py in benchmark to support phi-3-vision in arc-perf (#11638)
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-07-23 09:51:36 +08:00

Xiangyu Tian
060792a648  LLM: Refine Pipeline Parallel FastAPI (#11587)
2024-07-22 15:52:05 +08:00

Wang, Jian4
1eed0635f2  Add lightweight serving and support tgi parameter (#11600)
* init tgi request
* update openai api
* update for pp
* update and add readme
* add to docker
* add start bash
* update
* update
* update
2024-07-19 13:15:56 +08:00

Xiangyu Tian
d27a8cd08c  Fix Pipeline Parallel dtype (#11623)
2024-07-19 13:07:40 +08:00

Yishuo Wang
d020ad6397  add save_low_bit support for DiskEmbedding (#11621)
2024-07-19 10:34:53 +08:00

Guoqiong Song
380717f50d  fix gemma for 4.41 (#11531)
2024-07-18 15:02:50 -07:00

Guoqiong Song
5a6211fd56  fix minicpm for transformers>=4.39 (#11533)
2024-07-18 15:01:57 -07:00

Yishuo Wang
0209427cf4  Add disk_embedding parameter to support putting the Embedding layer on CPU (#11617)
2024-07-18 17:06:06 +08:00

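A hedged usage sketch for the disk_embedding option named in the commit above. The keyword itself comes from the commit title, but the exact signature and defaults in a given ipex-llm release may differ, and the model id is only a placeholder.

    from ipex_llm.transformers import AutoModelForCausalLM

    # Keep the (large) embedding table off the GPU: per the commit above,
    # disk_embedding=True leaves the Embedding layer on the CPU side.
    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model id
        load_in_4bit=True,
        disk_embedding=True,
    )
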
Yuwen Hu
2478e2c14b  Add check in iGPU perf workflow for results integrity (#11616)
* Add csv check for igpu benchmark workflow (#11610)
* add csv check for igpu benchmark workflow
* ready to test
* Restore the temporarily removed models in iGPU-perf (#11615)
Co-authored-by: Xu, Shuo <100334393+ATMxsp01@users.noreply.github.com>
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-07-18 14:13:16 +08:00

Xiangyu Tian
4594a3dd6c  LLM: Fix DummyLayer.weight device in Pipeline Parallel (#11612)
2024-07-18 13:39:34 +08:00

Ruonan Wang
4da93709b1  update doc/setup to use onednn gemm for cpp (#11598)
* update doc/setup to use onednn gemm
* small fix
* Change TOC of graphrag quickstart back
2024-07-18 13:04:38 +08:00

Yishuo Wang
f4077fa905  fix llama3-8b npu long input stuck (#11613)
2024-07-18 11:08:17 +08:00

Zhao Changmin
e5c0058c0e  fix baichuan (#11606)
2024-07-18 09:43:36 +08:00

Guoqiong Song
bfcdc35b04  phi-3 on "transformers>=4.37.0,<=4.42.3" (#11534)
2024-07-17 17:19:57 -07:00

Guoqiong Song
d64711900a  Fix cohere model on transformers>=4.41 (#11575)
* fix cohere model for 4.41
2024-07-17 17:18:59 -07:00

Guoqiong Song
5b6eb85b85  phi model readme (#11595)
Co-authored-by: rnwang04 <ruonan1.wang@intel.com>
2024-07-17 17:18:34 -07:00

Wang, Jian4
9c15abf825  Refactor fastapi-serving and add one card serving (#11581)
* init fastapi-serving one card
* mv api code to source
* update worker
* update for style-check
* add worker
* update bash
* update
* update worker name and add readme
* rename update
* rename to fastapi
2024-07-17 11:12:43 +08:00

Yishuo Wang
5837bc0014  fix chatglm3 npu output (#11590)
2024-07-16 18:16:30 +08:00

Guancheng Fu
06930ab258  Enable ipex-llm optimization for lm head (#11589)
* basic
* Modify convert.py
* fix
2024-07-16 16:48:44 +08:00

Heyang Sun
365adad59f  Support LoRA ChatGLM with Alpaca Dataset (#11580)
* Support LoRA ChatGLM with Alpaca Dataset
* refine
* fix
* add 2-card alpaca
2024-07-16 15:40:02 +08:00

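For orientation on the LoRA ChatGLM entry above, a generic PEFT LoRA configuration sketch. The adapter hyperparameters and the target module name (ChatGLM's fused query_key_value projection) are illustrative assumptions, not the exact settings of the scripts added by the commit.

    from peft import LoraConfig

    # Illustrative adapter settings only; the ChatGLM/Alpaca fine-tuning
    # scripts referenced above define their own hyperparameters.
    lora_config = LoraConfig(
        r=8,
        lora_alpha=32,
        lora_dropout=0.05,
        target_modules=["query_key_value"],  # ChatGLM's fused q/k/v projection
        task_type="CAUSAL_LM",
    )
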
Yina Chen
99c22745b2  fix qwen 14b fp6 abnormal output (#11583)
2024-07-16 10:59:00 +08:00

Yishuo Wang
c279849d27  add disk embedding api (#11585)
2024-07-16 10:43:39 +08:00

Xiangyu Tian
79c742dfd5  LLM: Add XPU Memory Optimizations for Pipeline Parallel (#11567)
2024-07-16 09:44:50 +08:00

Ch1y0q
50cf563a71  Add example: MiniCPM-V (#11570)
2024-07-15 10:55:48 +08:00

Zhao Changmin
06745e5742  Add npu benchmark all-in-one script (#11571)
* npu benchmark
2024-07-15 10:42:37 +08:00

Yishuo Wang
019da6c0ab  use mlp silu_mul fusion in qwen2 to optimize memory usage (#11574)
2024-07-13 16:32:54 +08:00

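On the silu_mul fusion mentioned above: a Qwen2-style MLP computes down(silu(gate(x)) * up(x)). The sketch below only spells out that unfused computation so the fusion target is clear; it is not the fused kernel itself, and the layer sizes are arbitrary.

    import torch
    import torch.nn.functional as F

    def qwen2_mlp_unfused(x, gate_proj, up_proj, down_proj):
        # silu(gate) and the elementwise product each materialise a full
        # intermediate tensor; a fused silu_mul kernel performs both in one
        # pass, which is where the memory saving comes from.
        return down_proj(F.silu(gate_proj(x)) * up_proj(x))

    hidden, intermediate = 64, 176
    gate = torch.nn.Linear(hidden, intermediate, bias=False)
    up = torch.nn.Linear(hidden, intermediate, bias=False)
    down = torch.nn.Linear(intermediate, hidden, bias=False)
    y = qwen2_mlp_unfused(torch.randn(2, hidden), gate, up, down)
    print(y.shape)  # torch.Size([2, 64])
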
Xu, Shuo
13a72dc51d  Test MiniCPM performance on iGPU in a more stable way (#11573)
* Test MiniCPM performance on iGPU in a more stable way
* small fix
Co-authored-by: ATMxsp01 <shou.xu@intel.com>
2024-07-12 17:07:41 +08:00

Xiangyu Tian
0981b72275  Fix /generate_stream api in Pipeline Parallel FastAPI (#11569)
2024-07-12 13:19:42 +08:00