Yishuo Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								f2e6abb563 
								
							 
						 
						
							
							
								
								fix mlp batch size check ( #9718 )  
							
							 
							
							
							
						 
						
							2023-12-19 14:22:22 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								1fa7793fc0 
								
							 
						 
						
							
							
								
								Load Mixtral GGUF Model ( #9690 )  
							
							 
							
							
							
							
							* Load Mixtral GGUF Model
* refactor
* fix empty tensor when to cpu
* update gpu and cpu readmes
* add dtype when set tensor into module 
							
						 
						
							2023-12-19 13:54:38 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Qiyuan Gong 
								
							 
						 
						
							
							
							
							
								
							
							
								d0a3095b97 
								
							 
						 
						
							
							
								
								[LLM] IPEX auto importer ( #9706 )  
							
							 
							
							
							
							
							* IPEX auto importer and get_ipex_version.
* Add BIGDL_IMPORT_IPEX to control auto import, default is false. 
							
						 
						
							2023-12-19 13:39:38 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yang Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								f4fb58d99c 
								
							 
						 
						
							
							
								
								fusing qkv project and rope ( #9612 )  
							
							 
							
							
							
							
							* Try fusing qkv project and rope
* add fused mlp
* fuse append cache
* fix style and clean up code
* clean up 
							
						 
						
							2023-12-18 16:45:00 -08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Kai Huang 
								
							 
						 
						
							
							
							
							
								
							
							
								4c112ee70c 
								
							 
						 
						
							
							
								
								Rename qwen in model name for arc perf test ( #9712 )  
							
							 
							
							
							
						 
						
							2023-12-18 20:34:31 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Cengguang Zhang 
								
							 
						 
						
							
							
							
							
								
							
							
								4d22add4af 
								
							 
						 
						
							
							
								
								LLM: fix qwen efficiency issue in perf-test.  
							
							 
							
							
							
						 
						
							2023-12-18 18:32:54 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								8ed89557e5 
								
							 
						 
						
							
							
								
								LLM: add mlp optimization of mixtral ( #9709 )  
							
							 
							
							
							
						 
						
							2023-12-18 16:59:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chen, Zhentao 
								
							 
						 
						
							
							
							
							
								
							
							
								b3647507c0 
								
							 
						 
						
							
							
								
								Fix harness workflow ( #9704 )  
							
							 
							
							
							
							
							* error when larger than 0.001
* fix env setup
* fix typo
* fix typo 
							
						 
						
							2023-12-18 15:42:10 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
							
							
								
							
							
								12df70953e 
								
							 
						 
						
							
							
								
								LLM: add resume_from_checkpoint related section ( #9705 )  
							
							 
							
							
							
						 
						
							2023-12-18 12:27:02 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
							
							
								
							
							
								320110d158 
								
							 
						 
						
							
							
								
								handle empty fused norm result ( #9688 )  
							
							 
							
							
							
							
							* handle empty fused norm result
* remove fast_rms_norm
* fix style 
							
						 
						
							2023-12-18 09:56:11 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ziteng Zhang 
								
							 
						 
						
							
							
							
							
								
							
							
								67cc155771 
								
							 
						 
						
							
							
								
								[LLM] Correct chat format of llama and add llama_stream_chat in chat.py  
							
							 
							
							
							
							
							* correct chat format of llama
* add llama_stream_chat 
							
						 
						
							2023-12-15 16:36:46 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ziteng Zhang 
								
							 
						 
						
							
							
							
							
								
							
							
								0d41b7ba7b 
								
							 
						 
						
							
							
								
								[LLM] Correct chat format & add stop words for chatglm3 in chat.py  
							
							 
							
							
							
							
							* correct chat format of chatglm3
* correct stop words of chatglm3 
							
						 
						
							2023-12-15 16:35:17 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ziteng Zhang 
								
							 
						 
						
							
							
							
							
								
							
							
								d57efd8eb9 
								
							 
						 
						
							
							
								
								[LLM] Add stop_word for Qwen model and correct qwen chat format in chat.py ( #9642 )  
							
							 
							
							
							
							
							* add stop words list for qwen
* change qwen chat format 
							
						 
						
							2023-12-15 14:53:58 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									SONG Ge 
								
							 
						 
						
							
							
							
							
								
							
							
								d5b81af7bd 
								
							 
						 
						
							
							
								
								Support mixtral attention optimization on transformers-v4.36.0 ( #9674 )  
							
							 
							
							
							
							
							* add example code to support mistral/mixtral attention on transformers v4.36.0
* update
* style fix
* add update for seen-tokens
* support mixtral
* rm mistral change
* small fix
* add more comments and remove use_cache part
---------
Co-authored-by: plusbang <binbin1.deng@intel.com> 
							
						 
						
							2023-12-15 14:30:23 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Cengguang Zhang 
								
							 
						 
						
							
							
							
							
								
							
							
								adbef56001 
								
							 
						 
						
							
							
								
								LLM: update qwen attention forward. ( #9695 )  
							
							 
							
							
							
							
							* feat: update qwen attention forward.
* fix: style. 
							
						 
						
							2023-12-15 14:06:15 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wang, Jian4 
								
							 
						 
						
							
							
							
							
								
							
							
								b8437a1c1e 
								
							 
						 
						
							
							
								
								LLM: Add gguf mistral model support ( #9691 )  
							
							 
							
							
							
							
							* add mistral support
* need to upgrade transformers version
* update 
							
						 
						
							2023-12-15 13:37:39 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wang, Jian4 
								
							 
						 
						
							
							
							
							
								
							
							
								496bb2e845 
								
							 
						 
						
							
							
								
								LLM: Support load BaiChuan model family gguf model ( #9685 )  
							
							 
							
							
							
							
							* support baichuan model family gguf model
* update gguf generate.py
* add verify models
* add support model_family
* update
* update style
* update type
* update readme
* update
* remove support model_family 
							
						 
						
							2023-12-15 13:34:33 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Lilac09 
								
							 
						 
						
							
							
							
							
								
							
							
								3afed99216 
								
							 
						 
						
							
							
								
								fix path issue ( #9696 )  
							
							 
							
							
							
						 
						
							2023-12-15 11:21:49 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jason Dai 
								
							 
						 
						
							
							
							
							
								
							
							
								37f509bb95 
								
							 
						 
						
							
							
								
								Update readme ( #9692 )  
							
							 
							
							
							
						 
						
							2023-12-14 19:50:21 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								1f0245039d 
								
							 
						 
						
							
							
								
								LLM: check the final csv results for arc perf test ( #9684 )  
							
							 
							
							
							
							
							* LLM: check the final csv results for arc perf test
* delete useless python script
* change threshold
* revert the llm_performance_tests.yml 
							
						 
						
							2023-12-14 19:46:08 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								9a330bfc2b 
								
							 
						 
						
							
							
								
								fix fuse mlp when using q5_0 or fp8 ( #9689 )  
							
							 
							
							
							
						 
						
							2023-12-14 16:16:05 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								82ac2dbf55 
								
							 
						 
						
							
							
								
								[LLM] Small fixes for win igpu test for ipex 2.1 ( #9686 )  
							
							 
							
							
							
							
							* Fixes to install for igpu performance tests
* Small update for core performance tests model lists 
							
						 
						
							2023-12-14 15:39:51 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								3e8d198b57 
								
							 
						 
						
							
							
								
								LLM: add eval func ( #9662 )  
							
							 
							
							
							
							
							* Add eval func
* add left eval 
							
						 
						
							2023-12-14 14:59:02 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ziteng Zhang 
								
							 
						 
						
							
							
							
							
								
							
							
								21c7503a42 
								
							 
						 
						
							
							
								
								[LLM] Correct prompt format of Qwen in generate.py ( #9678 )  
							
							 
							
							
							
							
							* Change qwen prompt format to chatml 
							
						 
						
							2023-12-14 14:01:30 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Qiyuan Gong 
								
							 
						 
						
							
							
							
							
								
							
							
								223c9622f7 
								
							 
						 
						
							
							
								
								[LLM] Mixtral CPU examples ( #9673 )  
							
							 
							
							
							
							
							* Mixtral CPU PyTorch and hugging face examples, based on #9661  and #9671  
							
						 
						
							2023-12-14 10:35:11 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
							
							
								
							
							
								5e46e0e5af 
								
							 
						 
						
							
							
								
								fix baichuan2-7b 1st token performance regression on xpu ( #9683 )  
							
							 
							
							
							
							
							* fix baichuan2-7b 1st token performance regression
* add comments
* fix style 
							
						 
						
							2023-12-14 09:58:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									ZehuaCao 
								
							 
						 
						
							
							
							
							
								
							
							
								877229f3be 
								
							 
						 
						
							
							
								
								[LLM]Add Yi-34B-AWQ to verified AWQ model. ( #9676 )  
							
							 
							
							
							
							
							* verify Yi-34B-AWQ
* update 
							
						 
						
							2023-12-14 09:55:47 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
							
							
								
							
							
								68a4be762f 
								
							 
						 
						
							
							
								
								remove disco mixtral, update oneapi version ( #9671 )  
							
							 
							
							
							
						 
						
							2023-12-13 23:24:59 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								1456d30765 
								
							 
						 
						
							
							
								
								LLM: add dot to option name in setup ( #9682 )  
							
							 
							
							
							
						 
						
							2023-12-13 20:57:27 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								cbdd49f229 
								
							 
						 
						
							
							
								
								[LLM] win igpu performance for ipex 2.1 and oneapi 2024.0 ( #9679 )  
							
							 
							
							
							
							
							* Change igpu win tests for ipex 2.1 and oneapi 2024.0
* Qwen model repo id updates; updates model list for 512-64
* Add .eval for win igpu all-in-one benchmark for best performance 
							
						 
						
							2023-12-13 18:52:29 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Mingyu Wei 
								
							 
						 
						
							
							
							
							
								
							
							
								16febc949c 
								
							 
						 
						
							
							
								
								[LLM] Add exclude option in all-in-one performance test ( #9632 )  
							
							 
							
							
							
							
							* add exclude option in all-in-one perf test
* update arc-perf-test.yaml
* Exclude in_out_pairs in main function
* fix some bugs
* address Kai's comments
* define excludes at the beginning
* add bloomz:2048 to exclude 
							
						 
						
							2023-12-13 18:13:06 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								9b9cd51de1 
								
							 
						 
						
							
							
								
								LLM: update setup to provide new install option to support ipex 2.1 & oneapi 2024 ( #9647 )  
							
							 
							
							
							
							
							* update setup
* default to 2.0 now
* meet code review 
							
						 
						
							2023-12-13 17:31:56 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								09ca540f9b 
								
							 
						 
						
							
							
								
								use fuse mlp in qwen ( #9672 )  
							
							 
							
							
							
						 
						
							2023-12-13 17:20:08 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								c7741c4e84 
								
							 
						 
						
							
							
								
								LLM: update moe block convert to optimize rest token latency of Mixtral ( #9669 )  
							
							 
							
							
							
							
							* update moe block convert
* further accelerate final_hidden_states
* fix style
* fix style 
							
						 
						
							2023-12-13 16:17:06 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									ZehuaCao 
								
							 
						 
						
							
							
							
							
								
							
							
								503880809c 
								
							 
						 
						
							
							
								
								verify codeLlama ( #9668 )  
							
							 
							
							
							
						 
						
							2023-12-13 15:39:31 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xiangyu Tian 
								
							 
						 
						
							
							
							
							
								
							
							
								1c6499e880 
								
							 
						 
						
							
							
								
								[LLM] vLLM: Support Mixtral Model ( #9670 )  
							
							 
							
							
							
							
							Add Mixtral support for BigDL vLLM. 
							
						 
						
							2023-12-13 14:44:47 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								dc5b1d7e9d 
								
							 
						 
						
							
							
								
								LLM: integrate sdp kernel for FP16 rest token inference on GPU [DG2/ATSM] ( #9633 )  
							
							 
							
							
							
							
							* integrate sdp
* update api
* fix style
* meet code review
* fix
* distinguish mtl from arc
* small fix 
							
						 
						
							2023-12-13 11:29:57 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Qiyuan Gong 
								
							 
						 
						
							
							
							
							
								
							
							
								5b0e7e308c 
								
							 
						 
						
							
							
								
								[LLM] Add support for empty activation ( #9664 )  
							
							 
							
							
							
							
							* Add support for empty activation, e.g., [0, 4096]. Empty activation is allowed by PyTorch.
* Add comments. 
							
						 
						
							2023-12-13 11:07:45 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									SONG Ge 
								
							 
						 
						
							
							
							
							
								
							
							
								284e7697b1 
								
							 
						 
						
							
							
								
								[LLM] Optimize ChatGLM2 kv_cache to support beam_search on ARC ( #9579 )  
							
							 
							
							
							
							
							* optimize kv_cache to support beam_search on Arc
* correctness test update
* fix query_length issue
* simplify implementation
* only enable the optimization on gpu device
* enable beam_search support only with gpu device and batch_size > 1
* add comments for beam_search case and revert ut change
* meet comments
* add more comments to describe the difference between the multiple cases 
							
						 
						
							2023-12-13 11:02:14 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								c64e2248ef 
								
							 
						 
						
							
							
								
								fix str returned by get_int_from_str rather than expected int ( #9667 )  
							
							 
							
							
							
						 
						
							2023-12-13 11:01:21 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
							
							
								
							
							
								bf1bcf4a14 
								
							 
						 
						
							
							
								
								add official Mixtral model support ( #9663 )  
							
							 
							
							
							
						 
						
							2023-12-12 22:27:07 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ziteng Zhang 
								
							 
						 
						
							
							
							
							
								
							
							
								8931f2eb62 
								
							 
						 
						
							
							
								
								[LLM] Fix transformer qwen size mismatch and rename causal_mask ( #9655 )  
							
							 
							
							
							
							
							* Fix size mismatching caused by context_layer
* Change registered_causal_mask to causal_mask 
							
						 
						
							2023-12-12 20:57:40 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
							
							
								
							
							
								2fe38b4b9b 
								
							 
						 
						
							
							
								
								LLM: add mixtral GPU examples ( #9661 )  
							
							 
							
							
							
						 
						
							2023-12-12 20:26:36 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								968d99e6f5 
								
							 
						 
						
							
							
								
								Remove empty cache between each iteration of generation ( #9660 )  
							
							 
							
							
							
						 
						
							2023-12-12 17:24:06 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
							
							
								
							
							
								0e639b920f 
								
							 
						 
						
							
							
								
								disable test_optimized_model.py temporarily due to out of memory on A730M(pr validation machine) ( #9658 )  
							
							 
							
							
							
							
							* disable test_optimized_model.py
* disable seq2seq 
							
						 
						
							2023-12-12 17:13:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
							
							
								
							
							
								59ce86d292 
								
							 
						 
						
							
							
								
								LLM: support optimize_model=True for Mixtral model ( #9657 )  
							
							 
							
							
							
						 
						
							2023-12-12 16:41:26 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								d272b6dc47 
								
							 
						 
						
							
							
								
								[LLM] Enable generation of html again for win igpu tests ( #9652 )  
							
							 
							
							
							
							
							* Enable generation of html again and comment out rwkv for 32-512 as it is not very stable
* Small fix 
							
						 
						
							2023-12-11 19:15:17 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								afa895877c 
								
							 
						 
						
							
							
								
								LLM: fix the issue that may generate blank html ( #9650 )  
							
							 
							
							
							
							
							* LLM: fix the issue that may generate blank html
* resolve some comments 
							
						 
						
							2023-12-11 19:14:57 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									ZehuaCao 
								
							 
						 
						
							
							
							
							
								
							
							
								45721f3473 
								
							 
						 
						
							
							
								
								verify llava ( #9649 )  
							
							 
							
							
							
						 
						
							2023-12-11 14:26:05 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								9f02f96160 
								
							 
						 
						
							
							
								
								[LLM] support for Yi AWQ model ( #9648 )  
							
							 
							
							
							
						 
						
							2023-12-11 14:07:34 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
							
							
								
							
							
								82255f9726 
								
							 
						 
						
							
							
								
								Enable fused layernorm ( #9614 )  
							
							 
							
							
							
							
							* bloom layernorm
* fix
* layernorm
* fix
* fix
* fix
* style fix
* fix
* replace nn.LayerNorm 
							
						 
						
							2023-12-11 09:26:13 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								894d0aaf5e 
								
							 
						 
						
							
							
								
								[LLM] iGPU win perf test reorg based on in-out pairs ( #9645 )  
							
							 
							
							
							
							
							* trigger pr temporarily
* Separate benchmark run for win igpu based on in-out pairs
* Rename fix
* Test workflow
* Small fix
* Skip generation of html for now
* Change back to nightly triggered 
							
						 
						
							2023-12-08 20:46:40 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chen, Zhentao 
								
							 
						 
						
							
							
							
							
								
							
							
								972cdb9992 
								
							 
						 
						
							
							
								
								gsm8k OOM workaround ( #9597 )  
							
							 
							
							
							
							
							* update bigdl_llm.py
* update the installation of harness
* fix partial function
* import ipex
* force seq len in decreasing order
* put func outside class
* move comments
* default 'trust_remote_code' as True
* Update llm-harness-evaluation.yml 
							
						 
						
							2023-12-08 18:47:25 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								1ff4bc43a6 
								
							 
						 
						
							
							
								
								degrade pandas version ( #9643 )  
							
							 
							
							
							
						 
						
							2023-12-08 17:44:51 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
							
							
								
							
							
								70f5e7bf0d 
								
							 
						 
						
							
							
								
								Support peft LoraConfig ( #9636 )  
							
							 
							
							
							
							
							* support peft loraconfig
* use testcase to test
* fix style
* meet comments 
							
						 
						
							2023-12-08 16:13:03 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
							
							
								
							
							
								0b6f29a7fc 
								
							 
						 
						
							
							
								
								add fused rms norm for Yi and Qwen ( #9640 )  
							
							 
							
							
							
						 
						
							2023-12-08 16:04:38 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
							
							
								
							
							
								5636b0ba80 
								
							 
						 
						
							
							
								
								set new linear status ( #9639 )  
							
							 
							
							
							
						 
						
							2023-12-08 11:02:49 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
							
							
								
							
							
								499100daf1 
								
							 
						 
						
							
							
								
								LLM: Add solution to fix oneccl related error ( #9630 )  
							
							 
							
							
							
						 
						
							2023-12-08 10:51:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									ZehuaCao 
								
							 
						 
						
							
							
							
							
								
							
							
								6eca8a8bb5 
								
							 
						 
						
							
							
								
								update transformer version ( #9631 )  
							
							 
							
							
							
						 
						
							2023-12-08 09:36:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								e9299adb3b 
								
							 
						 
						
							
							
								
								LLM: Highlight some values in the html ( #9635 )  
							
							 
							
							
							
							
							* highlight some values in the html
* revert the llm_performance_tests.yml 
							
						 
						
							2023-12-07 19:02:41 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								6f34978b94 
								
							 
						 
						
							
							
								
								[LLM] Add more performance tests for win iGPU (more in-out pairs, RWKV model) ( #9626 )  
							
							 
							
							
							
							
							* Add supports for loading rwkv models using from_pretrained api
* Temporarily enable pr tests
* Add RWKV in tests and more in-out pairs
* Add rwkv for 512 tests
* Make iterations smaller
* Change back to nightly trigger 
							
						 
						
							2023-12-07 18:55:16 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								d9b0c01de3 
								
							 
						 
						
							
							
								
								LLM: fix unlora module in qlora finetune ( #9621 )  
							
							 
							
							
							
							
							* fix unlora module
* split train and inference 
							
						 
						
							2023-12-07 16:32:02 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								3811cf43c9 
								
							 
						 
						
							
							
								
								[LLM] update AWQ documents ( #9623 )  
							
							 
							
							
							
							
							* [LLM] update AWQ and verified models' documents
* refine
* refine links
* refine 
							
						 
						
							2023-12-07 16:02:20 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								7319f2c227 
								
							 
						 
						
							
							
								
								use fused mlp in baichuan2 ( #9620 )  
							
							 
							
							
							
						 
						
							2023-12-07 15:50:57 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xiangyu Tian 
								
							 
						 
						
							
							
							
							
								
							
							
								deee65785c 
								
							 
						 
						
							
							
								
								[LLM] vLLM: Delete last_kv_cache before prefilling ( #9619 )  
							
							 
							
							
							
							
							Remove last_kv_cache before prefilling to reduce peak memory usage. 
							
						 
						
							2023-12-07 11:32:33 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								48b85593b3 
								
							 
						 
						
							
							
								
								Update all-in-one benchmark readme ( #9618 )  
							
							 
							
							
							
						 
						
							2023-12-07 10:32:09 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xiangyu Tian 
								
							 
						 
						
							
							
							
							
								
							
							
								0327169b50 
								
							 
						 
						
							
							
								
								[LLM] vLLM: fix memory leak in prepare_kv_cache ( #9616 )  
							
							 
							
							
							
							
							Revert modification in prepare_kv_cache to fix memory leak. 
							
						 
						
							2023-12-07 10:08:18 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
							
							
								
							
							
								13d47955a8 
								
							 
						 
						
							
							
								
								use fused rms norm in chatglm2 and baichuan ( #9613 )  
							
							 
							
							
							
							
							* use fused rms norm in chatglm2 and baichuan
* style fix 
							
						 
						
							2023-12-07 09:21:41 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jason Dai 
								
							 
						 
						
							
							
							
							
								
							
							
								51b668f229 
								
							 
						 
						
							
							
								
								Update GGUF readme ( #9611 )  
							
							 
							
							
							
						 
						
							2023-12-06 18:21:54 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									dingbaorong 
								
							 
						 
						
							
							
							
							
								
							
							
								a7bc89b3a1 
								
							 
						 
						
							
							
								
								remove q4_1 in gguf example ( #9610 )  
							
							 
							
							
							
							
							* remove q4_1
* fixes 
							
						 
						
							2023-12-06 16:00:05 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
							
							
								
							
							
								404e101ded 
								
							 
						 
						
							
							
								
								QALora example ( #9551 )  
							
							 
							
							
							
							
							* Support qa-lora
* init
* update
* update
* update
* update
* update
* update merge
* update
* fix style & update scripts
* update
* address comments
* fix typo
* fix typo
---------
Co-authored-by: Yang Wang <yang3.wang@intel.com> 
							
						 
						
							2023-12-06 15:36:21 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Guancheng Fu 
								
							 
						 
						
							
							
							
							
								
							
							
								6978b2c316 
								
							 
						 
						
							
							
								
								[VLLM] Change padding patterns for vLLM & clean code ( #9609 )  
							
							 
							
							
							
							
							* optimize
* fix minor error
* optimizations
* fix style 
							
						 
						
							2023-12-06 15:27:26 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									dingbaorong 
								
							 
						 
						
							
							
							
							
								
							
							
								89069d6173 
								
							 
						 
						
							
							
								
								Add gpu gguf example ( #9603 )  
							
							 
							
							
							
							
							* add gpu gguf example
* some fixes
* address kai's comments
* address json's comments 
							
						 
						
							2023-12-06 15:17:54 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								0e8f4020e5 
								
							 
						 
						
							
							
								
								Add traceback error output for win igpu test api in benchmark ( #9607 )  
							
							 
							
							
							
						 
						
							2023-12-06 14:35:16 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ziteng Zhang 
								
							 
						 
						
							
							
							
							
								
							
							
								aeb77b2ab1 
								
							 
						 
						
							
							
								
								Add minimum Qwen model version ( #9606 )  
							
							 
							
							
							
						 
						
							2023-12-06 11:49:14 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								c998f5f2ba 
								
							 
						 
						
							
							
								
								[LLM] iGPU long context tests ( #9598 )  
							
							 
							
							
							
							
							* Temp enable PR
* Enable tests for 256-64
* Try again 128-64
* Empty cache after each iteration for igpu benchmark scripts
* Try tests for 512
* change order for 512
* Skip chatglm3 and llama2 for now
* Separate tests for 512-64
* Small fix
* Further fixes
* Change back to nightly again 
							
						 
						
							2023-12-06 10:19:20 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								4e70e33934 
								
							 
						 
						
							
							
								
								[LLM] code and document for distributed qlora ( #9585 )  
							
							 
							
							
							
							
							* [LLM] code and document for distributed qlora
* doc
* refine for gradient checkpoint
* refine
* Update alpaca_qlora_finetuning_cpu.py
* Update alpaca_qlora_finetuning_cpu.py
* Update alpaca_qlora_finetuning_cpu.py
* add link in doc 
							
						 
						
							2023-12-06 09:23:17 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zheng, Yi 
								
							 
						 
						
							
							
							
							
								
							
							
								d154b38bf9 
								
							 
						 
						
							
							
								
								Add llama2 gpu low memory example ( #9514 )  
							
							 
							
							
							
							
							* Add low memory example
* Minor fixes
* Update readme.md 
							
						 
						
							2023-12-05 17:29:48 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jason Dai 
								
							 
						 
						
							
							
							
							
								
							
							
								06febb5fa7 
								
							 
						 
						
							
							
								
								Update readme for FP8/FP4 inference examples ( #9601 )  
							
							 
							
							
							
						 
						
							2023-12-05 15:59:03 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									dingbaorong 
								
							 
						 
						
							
							
							
							
								
							
							
								a66fbedd7e 
								
							 
						 
						
							
							
								
								add gpu more data types example ( #9592 )  
							
							 
							
							
							
							
							* add gpu more data types example
* add int8 
							
						 
						
							2023-12-05 15:45:38 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ziteng Zhang 
								
							 
						 
						
							
							
							
							
								
							
							
								65934c9f4f 
								
							 
						 
						
							
							
								
								[LLM] Fix Qwen causal_mask and attention_mask size mismatching ( #9600 )  
							
							 
							
							
							
							
							* Fix #9582, caused by Qwen's modification of modeling_qwen.py (7f62181c94) 
							
						 
						
							2023-12-05 15:15:54 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jinyi Wan 
								
							 
						 
						
							
							
							
							
								
							
							
								b721138132 
								
							 
						 
						
							
							
								
								Add cpu and gpu examples for BlueLM ( #9589 )  
							
							 
							
							
							
							
							* Add cpu int4 example for BlueLM
* add example optimize_model cpu for bluelm
* add example gpu int4 blueLM
* add example optimize_model GPU for bluelm
* Fixing naming issues and BigDL package version.
* Fixing naming issues...
* Add BlueLM in README.md "Verified Models" 
							
						 
						
							2023-12-05 13:59:02 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Guancheng Fu 
								
							 
						 
						
							
							
							
							
								
							
							
								8b00653039 
								
							 
						 
						
							
							
								
								fix doc ( #9599 )  
							
							 
							
							
							
						 
						
							2023-12-05 13:49:31 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Qiyuan Gong 
								
							 
						 
						
							
							
							
							
								
							
							
								f211f136b6 
								
							 
						 
						
							
							
								
								Configurable TORCH_LINEAR_THRESHOLD from env ( #9588 )  
							
							 
							
							
							
							
							* Add TORCH_LINEAR_THRESHOLD from env (BIGDL_LLM_LINEAR_THRESHOLD)
* Change default to 512 
							
						 
						
							2023-12-05 13:19:47 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								1012507a40 
								
							 
						 
						
							
							
								
								[LLM] Fix performance tests ( #9596 )  
							
							 
							
							
							
							
							* Fix missing key for cpu_embedding
* Remove 512 as it is stuck for now
* Small fix 
							
						 
						
							2023-12-05 10:59:28 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chen, Zhentao 
								
							 
						 
						
							
							
							
							
								
							
							
								8c8a27ded7 
								
							 
						 
						
							
							
								
								Add harness summary job ( #9457 )  
							
							 
							
							
							
							
							* format yml
* add make_table_results
* add summary job
* add a job to print single result
* upload full directory 
							
						 
						
							2023-12-05 10:04:10 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								3f4ad97929 
								
							 
						 
						
							
							
								
								[LLM] Add performance tests for windows iGPU ( #9584 )  
							
							 
							
							
							
							
							* Add support for win gpu benchmark with peak gpu memory monitoring
* Add win igpu tests
* Small fix
* Forward outputs
* Small fix
* Test and small fixes
* Small fix
* Small fix and test
* Small fixes
* Add tests for 512-64 and change back to nightly tests
* Small fix 
							
						 
						
							2023-12-04 20:50:02 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chen, Zhentao 
								
							 
						 
						
							
							
							
							
								
							
							
								9557aa9c21 
								
							 
						 
						
							
							
								
								Fix harness nightly ( #9586 )  
							
							 
							
							
							
							
							* update golden
* loosen the restriction of diff
* only compare results when scheduled 
							
						 
						
							2023-12-04 11:45:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xiangyu Tian 
								
							 
						 
						
							
							
							
							
								
							
							
								5c03651309 
								
							 
						 
						
							
							
								
								[LLM] vLLM: Add Preempt for scheduler ( #9568 )  
							
							 
							
							
							
							
							Implement Preempt_by_recompute method for vllm. 
							
						 
						
							2023-12-03 20:16:25 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chen, Zhentao 
								
							 
						 
						
							
							
							
							
								
							
							
								cb228c70ea 
								
							 
						 
						
							
							
								
								Add harness nightly ( #9552 )  
							
							 
							
							
							
							
							* modify output_path as a directory
* schedule nightly at 21 on Friday
* add tasks and models for nightly
* add accuracy regression
* comment out if to test
* mixed fp4
* for test
* add missing delimiter
* remove comma
* fixed golden results
* add mixed 4 golden result
* add more options
* add mistral results
* get golden result of stable lm
* move nightly scripts and results to test folder
* add license
* add fp8 stable lm golden
* run on all available devices
* trigger only when ready for review
* fix new line
* update golden
* add mistral 
							
						 
						
							2023-12-01 14:16:35 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chen, Zhentao 
								
							 
						 
						
							
							
							
							
								
							
							
								4d7d5d4c59 
								
							 
						 
						
							
							
								
								Add 3 leaderboard tasks ( #9566 )  
							
							 
							
							
							
							
							* update leaderboard map
* download model and dataset without overwriting
* fix task drop
* run on all available devices 
							
						 
						
							2023-12-01 14:01:14 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wang, Jian4 
								
							 
						 
						
							
							
							
							
								
							
							
								ed0dc57c6e 
								
							 
						 
						
							
							
								
								LLM: Add cpu qlora support other models guide ( #9567 )  
							
							 
							
							
							
							
							* use bf16 flag
* add using baichuan model
* update merge
* remove
* update 
							
						 
						
							2023-12-01 11:18:04 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jason Dai 
								
							 
						 
						
							
							
							
							
								
							
							
								bda404fc8f 
								
							 
						 
						
							
							
								
								Update readme ( #9575 )  
							
							 
							
							
							
						 
						
							2023-11-30 22:45:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
							
							
								
							
							
								69c49d21f5 
								
							 
						 
						
							
							
								
								use fused rms norm ( #9572 )  
							
							 
							
							
							
							
							* use fused rms norm
* meet code review 
							
						 
						
							2023-11-30 21:47:41 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								66f5b45f57 
								
							 
						 
						
							
							
								
								[LLM] add a llama2 gguf example ( #9553 )  
							
							 
							
							
							
						 
						
							2023-11-30 16:37:17 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								7f6465518a 
								
							 
						 
						
							
							
								
								support loading llama tokenizer from gguf model ( #9565 )  
							
							 
							
							
							
						 
						
							2023-11-30 14:56:12 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wang, Jian4 
								
							 
						 
						
							
							
							
							
								
							
							
								a0a80d232e 
								
							 
						 
						
							
							
								
								LLM: Add qlora cpu distributed readme ( #9561 )  
							
							 
							
							
							
							
							* init readme
* add distributed guide
* update 
							
						 
						
							2023-11-30 13:42:30 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chen, Zhentao 
								
							 
						 
						
							
							
							
							
								
							
							
								c8e0c2ed48 
								
							 
						 
						
							
							
								
								Fixed dumped logs in harness ( #9549 )  
							
							 
							
							
							
							
							* install transformers==4.34.0
* modify output_path as a directory
* add device and task to output dir parents 
							
						 
						
							2023-11-30 12:47:56 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Qiyuan Gong 
								
							 
						 
						
							
							
							
							
								
							
							
								d85a430a8c 
								
							 
						 
						
							
							
								
								Using bigdl-llm-init instead of bigdl-nano-init ( #9558 )  
							
							 
							
							
							
							
							* Replace `bigdl-nano-init` with `bigdl-llm-init`.
* Install `bigdl-llm` instead of `bigdl-nano`.
* Remove nano in README. 
							
						 
						
							2023-11-30 10:10:29 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								34503efa6a 
								
							 
						 
						
							
							
								
								Fix cpu pinned embedding ( #9556 )  
							
							 
							
							
							
						 
						
							2023-11-29 18:27:56 +08:00