13e61738c5  Xin Qiu  2024-01-30 16:04:17 +08:00
    hide detail memory for each token in benchmark_utils.py (#10037)

6b63ba23d1  Ruonan Wang  2024-01-30 14:43:07 +08:00
    LLM: add full module name during convert (#10035)

7dfa6dbe46  Yishuo Wang  2024-01-30 14:10:55 +08:00
    add rwkv time shift optimization (#10032)

f57d0fda8b  Xiangyu Tian  2024-01-30 09:11:06 +08:00
    [LLM] Use IPEX Optimization for Self Speculative Decoding (#9997)
    Use IPEX Optimization for Self Speculative Decoding

ccf8f613fb  Ruonan Wang  2024-01-29 18:25:26 +08:00
    LLM: update fp16 Linear on ARC/FLEX (#10023)

824c8029d7  Shaojun Liu  2024-01-29 16:18:04 +08:00
    Fix "local variable 'model' referenced before assignment" (#10022)

cc3f122f6a  Heyang Sun  2024-01-29 14:21:09 +08:00
    Baichuan2 CPU example of speculative decoding (#10003)
    * Baichuan2 CPU example of speculative decoding
    * Update generate.py
    * Update README.md
    * Update generate.py
    * Update generate.py
    * Update generate.py
    * fix default model
    * fix wrong chinese coding
    * Update generate.py
    * update prompt
    * update sample outputs
    * baichuan 7b needs transformers==4.31.0
    * rename example file's name

f37e4702bc  Xiangyu Tian  2024-01-29 11:28:25 +08:00
    [LLM] Use IPEX Optimization for BF16 Model (#9988)
    Use IPEX Optimization for BF16 Model by env BIGDL_OPT_IPEX=true
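As a usage note for the two IPEX commits above: the BF16 path is gated by the BIGDL_OPT_IPEX environment variable named in the commit body. Below is a minimal sketch of how it might be enabled, assuming bigdl-llm's transformers-style loading API; the model id and loading arguments are illustrative placeholders, not taken from the commits.

import os

# The flag named in #9988; assumed to be read when bigdl.llm converts the
# model, so set it before importing/loading.
os.environ["BIGDL_OPT_IPEX"] = "true"

import torch
from bigdl.llm.transformers import AutoModelForCausalLM

# Placeholder model id; load in BF16 so the IPEX-optimized path can apply.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)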
440cfe18ed  Jin Qiao  2024-01-29 11:25:11 +08:00
    LLM: GPU Example Updates for Windows (#9992)
    * modify aquila
    * modify aquila2
    * add baichuan
    * modify baichuan2
    * modify blue-lm
    * modify chatglm3
    * modify chinese-llama2
    * modiy codellama
    * modify distil-whisper
    * modify dolly-v1
    * modify dolly-v2
    * modify falcon
    * modify flan-t5
    * modify gpt-j
    * modify internlm
    * modify llama2
    * modify mistral
    * modify mixtral
    * modify mpt
    * modify phi-1_5
    * modify qwen
    * modify qwen-vl
    * modify replit
    * modify solar
    * modify starcoder
    * modify vicuna
    * modify voiceassistant
    * modify whisper
    * modify yi
    * modify aquila2
    * modify baichuan
    * modify baichuan2
    * modify blue-lm
    * modify chatglm2
    * modify chatglm3
    * modify codellama
    * modify distil-whisper
    * modify dolly-v1
    * modify dolly-v2
    * modify flan-t5
    * modify llama2
    * modify llava
    * modify mistral
    * modify mixtral
    * modify phi-1_5
    * modify qwen-vl
    * modify replit
    * modify solar
    * modify starcoder
    * modify yi
    * correct the comments
    * remove cpu_embedding in code for whisper and distil-whisper
    * remove comment
    * remove cpu_embedding for voice assistant
    * revert modify voice assistant
    * modify for voice assistant
    * add comment for voice assistant
    * fix comments
    * fix comments

c6d4f91777  Yuwen Hu  2024-01-29 10:18:23 +08:00
    [LLM] Add UTs of load_low_bit for transformers-style API (#10001)
    * Add uts for transformers api load_low_bit generation
    * Small fixes
    * Remove replit-code for CPU tests due to current load_low_bit issue on MPT
    * Small change
    * Small reorganization to llm unit tests on CPU
    * Small fixes
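For context on the API those UTs exercise: a minimal sketch of the transformers-style low-bit save/load round trip, assuming the method names in the commit title; model id and paths are placeholders.

from bigdl.llm.transformers import AutoModelForCausalLM

# Quantize at load time, then persist the converted low-bit weights.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder model id
    load_in_4bit=True,
    trust_remote_code=True,
)
model.save_low_bit("./llama2-7b-int4")

# Reload later without touching the original fp16 checkpoint;
# this is the load_low_bit path the new UTs cover.
model = AutoModelForCausalLM.load_low_bit("./llama2-7b-int4")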
d720554d43  Yishuo Wang  2024-01-29 09:23:57 +08:00
    simplify quantize kv cache api (#10011)

a3322e2a6c  Yina Chen  2024-01-26 18:29:46 +08:00
    add fp8 e5 to use_xmx (#10015)

9e18ea187f  Qiyuan Gong  2024-01-26 17:30:08 +08:00
    [LLM] Avoid KV Cache OOM when seq len is larger than 1 (#10006)
    * Avoid OOM during muti-round streaming chat with kv cache
    * For llama like kv cache, i.e., [bs, n_head, seq_len, head_dim], use is_enough_kv_cache_room_4_31.
    * Other models need to compare kv cache size with kv_len.

e5ae6f2c13  binbin Deng  2024-01-26 16:56:02 +08:00
    LLM: fix truncation logic of past_key_values in chatglm multi turn chat (#10007)
    * Avoid frequently truncating past_key_values when its length is larger than required.

1eaaace2dc  Yuwen Hu  2024-01-26 16:46:36 +08:00
    Update perf test all-in-one config for batch_size arg (#10012)

7952bbc919  Xin Qiu  2024-01-26 15:48:48 +08:00
    add conf batch_size to run_model (#10010)

421e7cee80  SONG Ge  2024-01-26 15:12:49 +08:00
    [LLM] Add Text_Generation_WebUI Support (#9884)
    * initially add text_generation_webui support
    * add env requirements install
    * add necessary dependencies
    * update for starting webui
    * update shared and noted to place models
    * update heading of part3
    * meet comments
    * add copyright license
    * remove extensions
    * convert tutorial to windows side
    * add warm-up to optimize performance

f0da0c131b  Yuwen Hu  2024-01-26 14:42:11 +08:00
    Disable llama2 optimize model true or false test for now in Arc UTs (#10008)

a00efa0564  Ruonan Wang  2024-01-26 11:50:38 +08:00
    LLM: add mlp & qkv fusion for FP16 Llama-7B (#9932)
    * add mlp fusion for llama
    * add mlp fusion
    * fix style
    * update
    * add mm_qkv_out
    * fix style
    * update
    * meet code review
    * meet code review

98ea3459e5  Wang, Jian4  2024-01-26 10:59:48 +08:00
    LLM : Fix llama draft_model dtype error (#10005)
    * fix llama draft_model dtype error
    * updat

aae1870096  Yishuo Wang  2024-01-26 10:15:01 +08:00
    fix qwen kv cache length (#9998)
762adc4f9d  Chen, Zhentao  2024-01-25 23:49:00 +08:00
    Reformat summary table (#9942)
    * reformat the table
    * refactor the file
    * read result.json only

171fb2d185  binbin Deng  2024-01-25 19:02:38 +08:00
    LLM: reorganize GPU finetuning examples (#9952)

24b34b6e46  Yishuo Wang  2024-01-25 17:48:11 +08:00
    change xmx condition (#10000)

8b08ad408b  Ziteng Zhang  2024-01-25 17:43:49 +08:00
    Add batch_size in all_in_one (#9999)
    Add batch_size in all_in_one, except run_native_int4

093e6f8f73  Wang, Jian4  2024-01-25 17:01:34 +08:00
    LLM: Add qwen CPU speculative example (#9985)
    * init from gpu
    * update for cpu
    * update
    * update
    * fix xpu readme
    * update
    * update example prompt
    * update prompt and add 72b
    * update
    * update

bf65548d29  Yishuo Wang  2024-01-25 16:55:59 +08:00
    Add quantize kv cache support for chaglm2/3 (#9996)

86055d76d5  Chen, Zhentao  2024-01-25 16:39:05 +08:00
    fix optimize_model not working (#9995)

9bff84e6fd  Wang, Jian4  2024-01-25 11:20:27 +08:00
    LLM: Convert draft_model kv_cache from bf16 to fp32 (#9964)
    * convert bf16 to fp32
    * update
    * change when init
    * init first and cut off after
    * init and exchange
    * update python type
    * update
    * fix bug
    * update
    * update

99ff6cf048  Yina Chen  2024-01-25 11:05:04 +08:00
    Update gpu spec decoding baichuan2 example dependency (#9990)
    * add dependency
    * update
    * update

27338540c3  Yina Chen  2024-01-25 10:40:41 +08:00
    Fix repetition_penalty not activated issue (#9989)
3bc3d0bbcd  Jason Dai  2024-01-24 22:37:32 +08:00
    Update self-speculative readme (#9986)

b27e5a27b9  Yuwen Hu  2024-01-24 18:15:39 +08:00
    Remove the check for meta device in _replace_with_low_bit_linear (#9984)

d4f65a6033  Ruonan Wang  2024-01-24 17:35:15 +08:00
    LLM: add mistral speculative example (#9976)
    * add mistral example
    * update

b176cad75a  Yina Chen  2024-01-24 16:40:16 +08:00
    LLM: Add baichuan2 gpu spec example (#9973)
    * add baichuan2 gpu spec example
    * update readme & example
    * remove print
    * fix typo
    * meet comments
    * revert
    * update

ec2d9de0ea  Jinyi Wan  2024-01-24 15:50:54 +08:00
    Fix README.md for solar (#9957)

bc9cff51a8  Mingyu Wei  2024-01-24 13:42:27 +08:00
    LLM GPU Example Update for Windows Support (#9902)
    * Update README in LLM GPU Examples
    * Update reference of Intel GPU
    * add cpu_embedding=True in comment
    * small fixes
    * update GPU/README.md and add explanation for cpu_embedding=True
    * address comments
    * fix small typos
    * add backtick for cpu_embedding=True
    * remove extra backtick in the doc
    * add period mark
    * update readme

e0db44dcb6  Chen, Zhentao  2024-01-24 13:20:46 +08:00
    fix unexpected keyword argument 'device' (#9982)
    * add device for chatglm3 only
    * add comment for this change
    * fix style
    * fix style
    * fix style again..
    * finally fixed style

50a851e3b3  Mingyu Wei  2024-01-23 19:04:47 +08:00
    LLM: separate arc ut for disable XMX (#9953)
    * separate test_optimize_model api with disabled xmx
    * delete test_optimize_model in test_transformers_api.py
    * set env variable in .sh/ put back test_optimize_model
    * unset env variable
    * remove env setting in .py
    * address errors in action
    * remove import ipex
    * lower tolerance

8d28aa8e2b  Yuwen Hu  2024-01-23 18:51:11 +08:00
    [LLM] Fix the model.device problem when cpu_embedding=True (#9971)
    * Overwrite the device attribute for CPUPinnedParam
    * Expose cpu_embedding=True for Linux users
    * Fix python style
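cpu_embedding=True, which #9971 fixes model.device for and the Windows example updates above document, keeps the embedding layer in CPU memory (useful on iGPUs). A minimal usage sketch, assuming the transformers-style API; the model id is a placeholder.

from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # placeholder model id
    load_in_4bit=True,
    cpu_embedding=True,  # keep the memory-heavy embedding table on CPU
    trust_remote_code=True,
).to("xpu")

# After #9971, the device attribute is reported correctly even though
# the embedding weights are CPU-pinned.
print(model.device)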
f82782cd3b  Yishuo Wang  2024-01-23 17:24:53 +08:00
    fix starcoder (#9975)

be5836bee1  WeiguangHan  2024-01-23 17:04:13 +08:00
    LLM: fix outlier value (#9945)
    * fix outlier value
    * small fix

2c8a9aaf0d  Yishuo Wang  2024-01-23 16:34:05 +08:00
    fix qwen causal mask when quantize_kv_cache=True (#9968)

5aa4b32c1b  Yina Chen  2024-01-23 15:59:43 +08:00
    LLM: Add qwen spec gpu example (#9965)
    * add qwen spec gpu example
    * update readme
    Co-authored-by: rnwang04 <ruonan1.wang@intel.com>

36c665667d  Yina Chen  2024-01-23 15:57:28 +08:00
    Add logits processor & qwen eos stop in speculative decoding (#9963)
    * add logits processor & qwen eos
    * fix style
    * fix
    * fix
    * fix style
    * fix style
    * support transformers 4.31
    * fix style
    * fix style
    Co-authored-by: rnwang04 <ruonan1.wang@intel.com>

60b35db1f1  Ruonan Wang  2024-01-23 15:54:12 +08:00
    LLM: add chatglm3 speculative decoding example (#9966)
    * add chatglm3 example
    * update
    * fix

da4687c917  Xin Qiu  2024-01-23 15:53:32 +08:00
    fix fp16 (#9970)

301425e377  Chen, Zhentao  2024-01-23 13:20:37 +08:00
    harness tests on pvc multiple xpus (#9908)
    * add run_multi_llb.py
    * update readme
    * add job hint

27b19106f3  Ruonan Wang  2024-01-23 12:54:19 +08:00
    LLM: add readme for speculative decoding gpu examples (#9961)
    * add readme
    * add readme
    * meet code review

39219b7e9a  Chen, Zhentao  2024-01-23 11:00:49 +08:00
    add default device meta when lcmu enabled (#9941)

dacf680294  Xin Qiu  2024-01-23 10:37:56 +08:00
    add fused rotary pos emb for qwen (#9956)
    * add fused rotary pos emb for qwen
    * update

7b1d9ad7c0  Ruonan Wang  2024-01-23 09:28:23 +08:00
    LLM: limit esimd sdp usage for k_len < 8 (#9959)
    * update
    * fix

3e601f9a5d  Ruonan Wang  2024-01-22 19:14:56 +08:00
    LLM: Support speculative decoding in bigdl-llm (#9951)
    * first commit
    * fix error, add llama example
    * hidden print
    * update api usage
    * change to api v3
    * update
    * meet code review
    * meet code review, fix style
    * add reference, fix style
    * fix style
    * fix first token time
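A rough sketch of how the self-speculative decoding added in #9951 might be invoked, pieced together from the example commits above (which convert a draft_model and run generate); the speculative flag and other arguments here are assumptions for illustration, not a confirmed API.

import torch
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-2-7b-chat-hf"  # placeholder model id

# Assumption: speculative=True turns on the draft-model path from #9951;
# BF16 on CPU matches the speculative examples referenced in these commits.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    optimize_model=True,
    torch_dtype=torch.bfloat16,
    speculative=True,
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

input_ids = tokenizer("Once upon a time", return_tensors="pt").input_ids
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))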
947b1e27b7  Cheen Hau, 俊豪  2024-01-22 15:11:33 +08:00
    Add readme for Whisper Test (#9944)
    * Fix local data path
    * Remove non-essential files
    * Add readme
    * Minor fixes to script
    * Bugfix, refactor
    * Add references to original source. Bugfixes.
    * Reviewer comments
    * Properly print and explain output
    * Move files to dev/benchmark
    * Fixes

6fb3f40f7e  Xin Qiu  2024-01-22 10:14:40 +08:00
    fix error for benchmark_util.py running on cpu (#9949)

fb91c97fe8  Heyang Sun  2024-01-22 09:11:44 +08:00
    support for Baichuan/Baichuan2 13B Chat running speculative decoding (#9921)
    * support for Baichuan/Baichuan2 13B Chat running speculative decoding
    * fix stype

97f0cd8975  Xin Qiu  2024-01-19 17:31:13 +08:00
    optimize Decilm 7b (#9922)
    * optimize deci
    * update
    * decilm attension forward

bcaeb05272  Wang, Jian4  2024-01-19 16:54:59 +08:00
    Update optimize qwen (#9943)
    * update for n tokens input
    * fix dtype
    * update

db8e90796a  binbin Deng  2024-01-19 15:09:57 +08:00
    LLM: add avg token latency information and benchmark guide of autotp (#9940)

bf37b3a670  Ruonan Wang  2024-01-19 14:10:22 +08:00
    LLM: optimize CPU speculative decoding of chatglm3 (#9928)
    * update
    * fix style
    * meet code review

967714bac8  Shaojun Liu  2024-01-19 11:13:15 +08:00
    gguf memory optimization for mixtral (#9939)

610b5226be  Xin Qiu  2024-01-19 09:44:30 +08:00
    move reserved memory to benchmark_utils.py (#9907)
    * move reserved memory to benchmark_utils.py
    * meet code review

7032a2ad73  Lilac09  2024-01-19 09:14:39 +08:00
    Optimize gguf load memory for mistral (#9923)
    * optimize gguf load for mistral
    * fix output of gguf mistral
    * reset

9a46f019d7  Shaojun Liu  2024-01-19 09:11:02 +08:00
    gguf memory optimization for baichuan (#9937)
2e1448f08e  Guancheng Fu  2024-01-18 21:33:36 +08:00
    [Serving] Add vllm_worker to fastchat serving framework (#9934)
    * add worker
    * finish
    * finish
    * add license
    * add more comments

a8c866c32b  Chen, Zhentao  2024-01-18 17:54:28 +08:00
    add ppl benchmark (#9914)
    * add ppl benchmark
    * add license
    * add readme
    * add dataset argument
    * add dataset usage
    * fixed low bit args
    * correct result
    * fix terminal display
    * fix ppl update
    * enable fp16 fp32 bf16
    * format the desc
    * fix model_kwargs
    * add more readme

100e0a87e5  WeiguangHan  2024-01-18 17:48:15 +08:00
    LLM: add compressed chatglm3 model (#9892)
    * LLM: add compressed chatglm3 model
    * small fix
    * revert github action

9e2ac5291b  Yuwen Hu  2024-01-18 17:15:28 +08:00
    Add rwkv v4 back for igpu perf test 32-512 (#9938)

7bbb98abb6  Yishuo Wang  2024-01-18 16:22:12 +08:00
    Disable fused layer norm when using XMX to fix mpt UT (#9933)

1fc9dfa265  Wang, Jian4  2024-01-18 15:56:29 +08:00
    LLM: Update for Qwen n tokens inputs (#9931)
    * update for n tokens inputs
    * update style
    * update

5184f400f9  Heyang Sun  2024-01-18 14:11:27 +08:00
    Fix Mixtral GGUF Wrong Output Issue (#9930)
    * Fix Mixtral GGUF Wrong Output Issue
    * fix style
    * fix style

453df868c9  Yishuo Wang  2024-01-18 10:16:29 +08:00
    add rwkv v5 attention kernel (#9927)

054952f82f  Ruonan Wang  2024-01-18 09:28:10 +08:00
    LLM: Fix rope of chatglm3 to support speculative decoding on CPU (#9926)

18cd1f1432  Ziteng Zhang  2024-01-17 18:08:35 +08:00
    [LLM]Solve the problem of calling bmm operator in BF16Linear (#9924)
    * Solve the problem of calling bmm operator in BF16Linear
98b86f83d4  Yina Chen  2024-01-17 15:51:38 +08:00
    Support fast rope for training (#9745)
    * init
    * init
    * fix style
    * add test and fix
    * address comment
    * update
    * merge upstream main

0c498a7b64  Yuwen Hu  2024-01-17 14:58:45 +08:00
    Add llama2-13b to igpu perf test (#9920)

b059a32fff  Ruonan Wang  2024-01-17 14:24:35 +08:00
    LLM: add benchmark api for bigdl-llm fp16 on GPU (#9919)
    * add bmk for bigdl fp16
    * fix

427f75000b  Ruonan Wang  2024-01-17 13:37:28 +08:00
    LLM: fix sdp of chatglm3 (#9917)
    * fix
    * fix
    * fix

94767da7cf  Yishuo Wang  2024-01-17 09:27:41 +08:00
    optimize rwkv v4 first token performance (#9912)

511cbcf773  Cengguang Zhang  2024-01-16 19:14:26 +08:00
    LLM: add Ceval benchmark test. (#9872)
    * init ceval benchmark test.
    * upload dataset.
    * add other tests.
    * add qwen evaluator.
    * fix qwen evaluator style.
    * fix qwen evaluator style.
    * update qwen evaluator.
    * add llama evaluator.
    * update eval
    * fix typo.
    * fix
    * fix typo.
    * fix llama evaluator.
    * fix bug.
    * fix style.
    * delete dataset.
    * fix style.
    * fix style.
    * add README.md and fix typo.
    * fix comments.
    * remove run scripts

b909c5c9c2  Shaojun Liu  2024-01-16 18:54:39 +08:00
    GGUF load memory optimization (#9913)
    * block-wise
    * convert linear for module
    * revert
    * Fix PEP8 checks Error

8643b62521  Yuwen Hu  2024-01-16 17:48:37 +08:00
    [LLM] Support longer context in iGPU perf tests (2048-256) (#9910)

dee32f7d15  Xin Qiu  2024-01-16 16:54:08 +08:00
    copy fused rms norm's reuslt to avoid <unk> (#9909)

8d7326ae03  Ruonan Wang  2024-01-16 11:29:13 +08:00
    LLM: fix chatglm3 sdp to support speculative decoding (#9900)
    * fix chatglm3
    * fix
    * update
    * meet code review
    * fix
9f34da7cdb  Guancheng Fu  2024-01-15 15:42:15 +08:00
    Update PVC XMX condition (#9901)
    * update pvc xmx condition
    * update condition
    * update conditon

6637860ddf  Yishuo Wang  2024-01-12 19:51:48 +08:00
    change xmx condition (#9896)

0e69bfe6b0  WeiguangHan  2024-01-12 09:14:15 +08:00
    LLM: fix the performance drop of starcoder (#9889)
    * LLM: fix the performance drop of starcoder
    * small fix
    * small fix

d9cf55bce9  Ruonan Wang  2024-01-11 18:01:59 +08:00
    LLM: fix MLP check of mixtral (#9891)

4f4ce73f31  Ziteng Zhang  2024-01-11 17:51:07 +08:00
    [LLM] Add transformer_autocast_bf16 into all-in-one (#9890)
    * Add transformer_autocast_bf16 into all-in-one

4af88a67b9  Ziteng Zhang  2024-01-11 16:45:21 +08:00
    support chatglm3 with bf16 (#9888)
    * support chatglm3 with bigdl-bf16

0aef35a965  Yuwen Hu  2024-01-11 14:37:16 +08:00
    [LLM] Improve LLM doc regarding windows gpu related info (#9880)
    * Improve runtime configuration for windows
    * Add python 310/311 supports for wheel downloading
    * Add troubleshooting for windows gpu
    * Remove manually import ipex due to auto importer
    * Add info regarding cpu_embedding=True on iGPU
    * More info for Windows users
    * Small updates to API docs
    * Python style fix
    * Remove tip for loading from saved optimize_model for now
    * Updated based on comments
    * Update win info for multi-intel gpus selection
    * Small fix
    * Small fix

07485eff5a  Jinyi Wan  2024-01-11 14:28:41 +08:00
    Add SOLAR-10.7B to README (#9869)

33fd1f9c76  WeiguangHan  2024-01-10 18:20:14 +08:00
    LLM: fix input length logic for run_transformer_int4_gpu (#9864)
    * LLM: fix input length logic for run_transformer_int4_gpu
    * small fix
    * small fix
    * small fix

53531ae4ee  Ruonan Wang  2024-01-10 17:50:00 +08:00
    LLM: support qkv fusion for fp8e5 (#9878)
    * update
    * add mistral
    * meet code review

cb32b985ec  Lilac09  2024-01-10 15:38:42 +08:00
    add mistral and chatglm support to vllm (#9879)
    * add mistral and chatglm support to vllm
    * add mistral and chatglm support to vllm
e76d984164  ZehuaCao  2024-01-10 14:28:39 +08:00
    [LLM] Support llm-awq vicuna-7b-1.5 on arc (#9874)
    * support llm-awq vicuna-7b-1.5 on arc
    * support llm-awq vicuna-7b-1.5 on arc

3e05c9e11b  Ruonan Wang  2024-01-09 18:10:01 +08:00
    LLM: update esimd sdp kernel (#9871)

023679459e  Yuwen Hu  2024-01-09 18:05:03 +08:00
    [LLM] Small fixes for finetune related examples and UTs (#9870)

b2aa267f50  Cheen Hau, 俊豪  2024-01-09 16:30:50 +08:00
    Enhance LLM GPU installation document (#9828)
    * Improve gpu install doc
    * Add troubleshooting - setvars.sh not done properly.
    * Further improvements
    * 2024.x.x -> 2024.0
    * Fixes
    * Fix Install BigDL-LLM From Wheel : bigdl-llm[xpu_2.0]
    * Remove "export USE_XETLA=OFF" for Max GPU

23fc888abe  Yuwen Hu  2024-01-09 15:38:47 +08:00
    Update llm gpu xpu default related info to PyTorch 2.1 (#9866)