SONG Ge 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ef4b6519fb 
								
							 
						 
						
							
							
								
								Add phi-3 model support for pipeline parallel inference ( #11334 )  
							
							 
							
							
							
							
							* add phi-3 model support
* add phi3 example 
							
						 
						
							2024-06-17 17:44:24 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									hxsz1997 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								99b309928b 
								
							 
						 
						
							
							
								
								Add lookahead in test_api: transformer_int4_fp16_gpu ( #11337 )  
							
							 
							
							
							
							
							* add lookahead in test_api:transformer_int4_fp16_gpu
* change the short prompt of summarize
* change short prompt to cnn_64
* change short prompt of summarize 
							
						 
						
							2024-06-17 17:41:41 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Qiyuan Gong 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5d7c9bf901 
								
							 
						 
						
							
							
								
								Upgrade accelerate to 0.23.0 ( #11331 )  
							
							 
							
							
							
							
							* Upgrade accelerate to 0.23.0 
							
						 
						
							2024-06-17 15:03:11 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								183e0c6cf5 
								
							 
						 
						
							
							
								
								glm-4v-9b support ( #11327 )  
							
							 
							
							
							
							
							* chatglm4v support
* fix style check
* update glm4v 
							
						 
						
							2024-06-17 13:52:37 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wenjing Margaret Mao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								bca5cbd96c 
								
							 
						 
						
							
							
								
								Modify arc nightly perf to fp16 ( #11275 )  
							
							 
							
							
							
							
							* change api
* move to pr mode and remove the build
* add batch4 yaml and remove the bigcode
* remove batch4
* revert the starcode
* remove the exclude
* revert
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com> 
							
						 
						
							2024-06-17 13:47:22 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6ea1e71af0 
								
							 
						 
						
							
							
								
								Update PP inference benchmark script ( #11323 )  
							
							 
							
							
							
						 
						
							2024-06-17 09:59:36 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									SONG Ge 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								be00380f1a 
								
							 
						 
						
							
							
								
								Fix pipeline parallel inference past_key_value error in Baichuan ( #11318 )  
							
							 
							
							
							
							
							* fix past_key_value error
* add baichuan2 example
* fix style
* update doc
* add script link in doc
* fix import error
* update 
							
						 
						
							2024-06-17 09:29:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0af0102e61 
								
							 
						 
						
							
							
								
								Add quantization scale search switch ( #11326 )  
							
							 
							
							
							
							
							* add scale_search switch
* remove llama3 instruct
* remove print 
							
						 
						
							2024-06-14 18:46:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8a3247ac71 
								
							 
						 
						
							
							
								
								support batch forward for q4_k, q6_k ( #11325 )  
							
							 
							
							
							
						 
						
							2024-06-14 18:25:50 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e8dd8e97ef 
								
							 
						 
						
							
							
								
								fix chatglm lookahead on ARC ( #11320 )  
							
							 
							
							
							
						 
						
							2024-06-14 16:26:11 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Shaojun Liu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f5ef94046e 
								
							 
						 
						
							
							
								
								exclude dolly-v2-12b for arc perf test ( #11315 )  
							
							 
							
							
							
							
							* test arc perf
* test
* test
* exclude dolly-v2-12b:2048
* revert changes 
							
						 
						
							2024-06-14 15:35:56 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xiangyu Tian 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4359ab3172 
								
							 
						 
						
							
							
								
								LLM: Add /generate_stream endpoint for Pipeline-Parallel-FastAPI example ( #11187 )  
							
							 
							
							
							
							
							Add /generate_stream and OpenAI-formatted endpoint for Pipeline-Parallel-FastAPI example 
							
						 
						
							2024-06-14 15:15:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin Qiao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0e7a31a09c 
								
							 
						 
						
							
							
								
								ChatGLM Examples Restructure regarding Installation Steps ( #11285 )
							
							 
							
							
							
							
							* merge install step in glm examples
* fix section
* fix section
* fix tiktoken 
							
						 
						
							2024-06-14 12:37:05 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								91965b5d05 
								
							 
						 
						
							
							
								
								add glm_sdpa back to fix chatglm-6b ( #11313 )  
							
							 
							
							
							
						 
						
							2024-06-14 10:31:43 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7f65836cb9 
								
							 
						 
						
							
							
								
								fix chatglm2/3-32k/128k fp16 ( #11311 )  
							
							 
							
							
							
						 
						
							2024-06-14 09:58:07 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1b0c4c8cb8 
								
							 
						 
						
							
							
								
								use new rotary two in chatglm4 ( #11312 )  
							
							 
							
							
							
							
							* use new rotary two in chatglm4
* remove
							
						 
						
							2024-06-13 19:02:18 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f1410d6823 
								
							 
						 
						
							
							
								
								refactor chatglm4 ( #11301 )
							
							 
							
							
							
							
							* glm4
* remove useless code
* style
* add rope_ratio
* update
* fix fp16
* fix style 
							
						 
						
							2024-06-13 18:06:04 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5e25766855 
								
							 
						 
						
							
							
								
								fix and optimize chatglm2-32k and chatglm3-128k ( #11306 )  
							
							 
							
							
							
						 
						
							2024-06-13 17:37:58 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								60cb1dac7c 
								
							 
						 
						
							
							
								
								Support PP for qwen1.5 ( #11300 )
							
							 
							
							
							
						 
						
							2024-06-13 17:35:24 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f97cce2642 
								
							 
						 
						
							
							
								
								Fix import error of ds autotp ( #11307 )  
							
							 
							
							
							
						 
						
							2024-06-13 16:22:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin Qiao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3682c6a979 
								
							 
						 
						
							
							
								
								add glm4 and qwen2 to igpu perf ( #11304 )  
							
							 
							
							
							
						 
						
							2024-06-13 16:16:35 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a24666b8f3 
								
							 
						 
						
							
							
								
								fix chatglm3-6b-32k ( #11303 )  
							
							 
							
							
							
						 
						
							2024-06-13 16:01:34 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								01fe0fc1a2 
								
							 
						 
						
							
							
								
								refactor chatglm2/3 ( #11290 )  
							
							 
							
							
							
						 
						
							2024-06-13 12:22:58 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Guancheng Fu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								57a023aadc 
								
							 
						 
						
							
							
								
								Fix vllm tp ( #11297 )  
							
							 
							
							
							
						 
						
							2024-06-13 10:47:48 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								986af21896 
								
							 
						 
						
							
							
								
								fix perf test ( #11295 )
							
							 
							
							
							
						 
						
							2024-06-13 10:35:48 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								220151e2a1 
								
							 
						 
						
							
							
								
								Refactor pipeline parallel multi-stage implementation ( #11286 )  
							
							 
							
							
							
						 
						
							2024-06-13 10:00:23 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								14b1e6b699 
								
							 
						 
						
							
							
								
								Fix gguf_q4k ( #11293 )  
							
							 
							
							
							
							
* update embedding parameter
* update benchmark 
							
						 
						
							2024-06-12 20:43:08 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8edcdeb0e7 
								
							 
						 
						
							
							
								
								Fix bug that torch.ops.torch_ipex.matmul_bias_out cannot work on Linux MTL for short input ( #11292 )  
							
							 
							
							
							
						 
						
							2024-06-12 19:12:57 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wenjing Margaret Mao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b61f6e3ab1 
								
							 
						 
						
							
							
								
								Add update_parent_folder for nightly_perf_test ( #11287 )  
							
							 
							
							
							
							
							* add update_parent_folder and change the workflow file
* add update_parent_folder and change the workflow file
* move to pr mode and comment the test
* use one model per config
* revert
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com> 
							
						 
						
							2024-06-12 17:58:13 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								592f7aa61e 
								
							 
						 
						
							
							
								
								Refine glm1-4 sdp ( #11276 )  
							
							 
							
							
							
							
							* chatglm
* update
* update
* change chatglm
* update sdpa
* update
* fix style
* fix
* fix glm
* update glm2-32k
* update glm2-32k
* fix cpu
* update
* change lower_bound 
							
						 
						
							2024-06-12 17:11:56 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								cffb932f05 
								
							 
						 
						
							
							
								
								Expose timeout for streamer for fastchat worker ( #11288 )  
							
							 
							
							
							
							
* Expose timeout for streamer for fastchat worker
* Change to read from env variables 
							
						 
						
							2024-06-12 17:02:40 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									ivy-lv11 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e7a4e2296f 
								
							 
						 
						
							
							
								
								Add Stable Diffusion examples on GPU and CPU ( #11166 )  
							
							 
							
							
							
							
							* add sdxl and lcm-lora
* readme
* modify
* add cpu
* add license
* modify
* add file 
							
						 
						
							2024-06-12 16:33:25 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin Qiao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f224e98297 
								
							 
						 
						
							
							
								
								Add GLM-4 CPU example ( #11223 )  
							
							 
							
							
							
							
							* Add GLM-4 example
* add tiktoken dependency
* fix
* fix 
							
						 
						
							2024-06-12 15:30:51 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zijie Li 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								40fc8704c4 
								
							 
						 
						
							
							
								
								Add GPU example for GLM-4 ( #11267 )  
							
							 
							
							
							
							
							* Add GPU example for GLM-4
* Update streamchat.py
* Fix pretrained arguments
Fix pretrained arguments in generate and streamchat.py
* Update Readme
Update install steps: tiktoken is required for GLM-4
* Update comments in generate.py 
							
						 
						
							2024-06-12 14:29:50 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Qiyuan Gong 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0d9cc9c106 
								
							 
						 
						
							
							
								
								Remove duplicate check for ipex ( #11281 )  
							
							 
							
							
							
							
							* Replacing builtin.import causes many unexpected problems. Remove this function.
							
						 
						
							2024-06-12 13:52:02 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								10e480ee96 
								
							 
						 
						
							
							
								
								refactor internlm and internlm2 ( #11274 )  
							
							 
							
							
							
						 
						
							2024-06-11 14:19:19 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fac49f15e3 
								
							 
						 
						
							
							
								
								Remove manual importing ipex in all-in-one benchmark ( #11272 )  
							
							 
							
							
							
						 
						
							2024-06-11 09:32:13 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wenjing Margaret Mao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								70b17c87be 
								
							 
						 
						
							
							
								
								Merge multiple batches ( #11264 )  
							
							 
							
							
							
							
							* add merge steps
* move to pr mode
* remove build + add merge.py
* add tohtml and change cp
* change test_batch folder path
* change merge_temp path
* change to html folder
* revert
* change place
* revert 437
* revert space
---------
Co-authored-by: Yishuo Wang <yishuo.wang@intel.com> 
							
						 
						
							2024-06-07 18:38:45 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xiangyu Tian 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4b07712fd8 
								
							 
						 
						
							
							
								
								LLM: Fix vLLM CPU model convert mismatch ( #11254 )  
							
							 
							
							
							
							
							Fix vLLM CPU model convert mismatch. 
							
						 
						
							2024-06-07 15:54:34 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								42fab480ea 
								
							 
						 
						
							
							
								
								support stablelm2 12b ( #11265 )
							
							 
							
							
							
						 
						
							2024-06-07 15:46:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								dbc3c2d72d 
								
							 
						 
						
							
							
								
								glm4 sdp ( #11253 )  
							
							 
							
							
							
							
							* glm4 sdp
* fix style
* update comment 
							
						 
						
							2024-06-07 15:42:23 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								151fcf37bb 
								
							 
						 
						
							
							
								
								check device name in use_flash_attention ( #11263 )
							
							 
							
							
							
						 
						
							2024-06-07 15:07:47 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2623944604 
								
							 
						 
						
							
							
								
								qwen2 sdpa small fix ( #11261 )  
							
							 
							
							
							
						 
						
							2024-06-07 14:42:18 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ea0d03fd28 
								
							 
						 
						
							
							
								
								Refactor baichuan1 7B and 13B ( #11258 )  
							
							 
							
							
							
						 
						
							2024-06-07 14:29:20 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Qiyuan Gong 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1aa9c9597a 
								
							 
						 
						
							
							
								
								Avoid duplicate import in IPEX auto importer ( #11227 )  
							
							 
							
							
							
							
							* Add custom import to avoid ipex duplicate importing
* Add scope limitation 
							
						 
						
							2024-06-07 14:08:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wang, Jian4 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6f2684e5c9 
								
							 
						 
						
							
							
								
								Update pp llama.py to save memory ( #11233 )  
							
							 
							
							
							
						 
						
							2024-06-07 13:18:16 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ef8e9b2ecd 
								
							 
						 
						
							
							
								
								Refactor qwen2 moe ( #11244 )  
							
							 
							
							
							
						 
						
							2024-06-07 13:14:54 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zijie Li 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7b753dc8ca 
								
							 
						 
						
							
							
								
								Update sample output for HF Qwen2 GPU and CPU ( #11257 )  
							
							 
							
							
							
						 
						
							2024-06-07 11:36:22 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zhao Changmin 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b7948671de 
								
							 
						 
						
							
							
								
								[WIP] Add look up table in 1st token stage ( #11193 )  
							
							 
							
							
							
							
							* lookuptb 
							
						 
						
							2024-06-07 10:51:05 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8c36b5bdde 
								
							 
						 
						
							
							
								
								Add qwen2 example ( #11252 )  
							
							 
							
							
							
							
							* Add GPU example for Qwen2
* Update comments in README
* Update README for Qwen2 GPU example
* Add CPU example for Qwen2
Sample Output under README pending
* Update generate.py and README for CPU Qwen2
* Update GPU example for Qwen2
* Small update
* Small fix
* Add Qwen2 table
* Update README for Qwen2 CPU and GPU
Update sample output under README
---------
Co-authored-by: Zijie Li <michael20001122@gmail.com> 
							
						 
						
							2024-06-07 10:29:33 +08:00