Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								518ef95abc 
								
							 
						 
						
							
							
								
								Small fix for Nonetype error ( #10104 )  
							
							 
							
							
							
						 
						
							2024-02-06 14:58:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								d61f4905ac 
								
							 
						 
						
							
							
								
								LLM: 2bit quantization initial support ( #10042 )  
							
							 
							
							
							
							
							* basic quantize support
* fix new module name
* small update
* and mixed int4 with iq2_xxs
* remove print
* code refactor
* fix style
* meet code review 
							
						 
						
							2024-02-06 14:58:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									dingbaorong 
								
							 
						 
						
							
							
							
							
								
							
							
								36c9442c6d 
								
							 
						 
						
							
							
								
								Arc Stable version test ( #10087 )  
							
							 
							
							
							
							
							* add batch_size in stable version test
* add batch_size in excludes
* add excludes for batch_size
* fix ci
* trigger regression test
* fix xpu version
* disable ci
* address kai's comment
---------
Co-authored-by: Ariadne <wyn2000330@126.com> 
							
						 
						
							2024-02-06 10:23:50 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jiao Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								33b9e7744d 
								
							 
						 
						
							
							
								
								fix dimension ( #10097 )  
							
							 
							
							
							
						 
						
							2024-02-05 15:07:38 -08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									SONG Ge 
								
							 
						 
						
							
							
							
							
								
							
							
								4b02ff188b 
								
							 
						 
						
							
							
								
								[WebUI] Add prompt format and stopping words for Qwen ( #10066 )  
							
							 
							
							
							
							
							* add prompt format and stopping_words for qwen model
* performance optimization
* optimize
* update
* meet comments 
							
						 
						
							2024-02-05 18:23:13 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								0aecd8637b 
								
							 
						 
						
							
							
								
								LLM: small fix for the html script ( #10094 )  
							
							 
							
							
							
						 
						
							2024-02-05 17:27:34 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zhicun 
								
							 
						 
						
							
							
							
							
								
							
							
								7d2be7994f 
								
							 
						 
						
							
							
								
								add phixtral and optimize phi-moe ( #10052 )  
							
							 
							
							
							
						 
						
							2024-02-05 11:12:47 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zhicun 
								
							 
						 
						
							
							
							
							
								
							
							
								676d6923f2 
								
							 
						 
						
							
							
								
								LLM: modify transformersembeddings.embed() in langchain ( #10051 )  
							
							 
							
							
							
						 
						
							2024-02-05 10:42:10 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin Qiao 
								
							 
						 
						
							
							
							
							
								
							
							
								ad050107b3 
								
							 
						 
						
							
							
								
								LLM: fix mpt load_low_bit issue ( #10075 )  
							
							 
							
							
							
							
							* fix
* retry
* retry 
							
						 
						
							2024-02-05 10:17:07 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									SONG Ge 
								
							 
						 
						
							
							
							
							
								
							
							
								9050991e4e 
								
							 
						 
						
							
							
								
								fix gradio check issue temporarily ( #10082 )  
							
							 
							
							
							
						 
						
							2024-02-04 16:46:29 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								c2e562d037 
								
							 
						 
						
							
							
								
								LLM: add batch_size to the csv and html ( #10080 )  
							
							 
							
							
							
							
							* LLM: add batch_size to the csv and html
* small fix 
							
						 
						
							2024-02-04 16:35:44 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
							
							
								
							
							
								7e49fbc5dd 
								
							 
						 
						
							
							
								
								LLM: make finetuning examples more common for other models ( #10078 )  
							
							 
							
							
							
						 
						
							2024-02-04 16:03:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								90f004b80b 
								
							 
						 
						
							
							
								
								remove benchmarkwrapper from deepspeed example ( #10079 )  
							
							 
							
							
							
						 
						
							2024-02-04 15:42:15 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								8e33cb0f38 
								
							 
						 
						
							
							
								
								LLM: support speecht5_tts ( #10077 )  
							
							 
							
							
							
							
							* support speecht5_tts
* fix 
							
						 
						
							2024-02-04 13:26:42 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									ivy-lv11 
								
							 
						 
						
							
							
							
							
								
							
							
								428b7105f6 
								
							 
						 
						
							
							
								
								Add HF and PyTorch example InternLM2 ( #10061 )  
							
							 
							
							
							
						 
						
							2024-02-04 10:25:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
							
							
								
							
							
								77be19bb97 
								
							 
						 
						
							
							
								
								LLM: Support gpt-j in speculative decoding ( #10067 )  
							
							 
							
							
							
							
							* gptj
* support gptj in speculative decoding
* fix
* update readme
* small fix 
							
						 
						
							2024-02-02 14:54:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									SONG Ge 
								
							 
						 
						
							
							
							
							
								
							
							
								19183ef476 
								
							 
						 
						
							
							
								
								[WebUI] Reset bigdl-llm loader options with default value ( #10064 )  
							
							 
							
							
							
							
							* reset bigdl-llm loader options with default value
* remove options which may be complex for naive users 
							
						 
						
							2024-02-01 15:45:39 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
							
							
								
							
							
								6e0f1a1e92 
								
							 
						 
						
							
							
								
								use apply_rotary_pos_emb_cache_freq_xpu in mixtral ( #10060 )  
							
							 
							
							
							
							
							* use apply_rotary_pos_emb_cache_freq_xpu in mixtral
* fix style 
							
						 
						
							2024-02-01 15:40:49 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
							
							
								
							
							
								aae20d728e 
								
							 
						 
						
							
							
								
								LLM: Add initial DPO finetuning example ( #10021 )  
							
							 
							
							
							
						 
						
							2024-02-01 14:18:08 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								601024f418 
								
							 
						 
						
							
							
								
								Mistral CPU example of speculative decoding ( #10024 )  
							
							 
							
							
							
							
							* Mistral CPU example of speculative decoding
* update transformers version
* update example
* Update README.md 
							
						 
						
							2024-02-01 10:52:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								968e70544d 
								
							 
						 
						
							
							
								
								Enable IPEX Mistral in Speculative ( #10059 )  
							
							 
							
							
							
						 
						
							2024-02-01 10:48:16 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
							
							
								
							
							
								3ca03d4e97 
								
							 
						 
						
							
							
								
								Add deepmind sample into bigdl-llm speculative decoding ( #10041 )  
							
							 
							
							
							
							
							* migrate deepmind sample
* update
* meet comments
* fix style
* fix style 
							
						 
						
							2024-02-01 09:57:02 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								d2d3f6b091 
								
							 
						 
						
							
							
								
								LLM: ensure the result of daily arc perf test ( #10016 )  
							
							 
							
							
							
							
							* ensure the result of daily arc perf test
* small fix
* small fix
* small fix
* small fix
* small fix
* small fix
* small fix
* small fix
* small fix
* small fix
* concat more csvs
* small fix
* revert some files 
							
						 
						
							2024-01-31 18:26:21 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								9724939499 
								
							 
						 
						
							
							
								
								temporarily disable bloom 2k input ( #10056 )  
							
							 
							
							
							
						 
						
							2024-01-31 17:49:12 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin Qiao 
								
							 
						 
						
							
							
							
							
								
							
							
								8c8fc148c9 
								
							 
						 
						
							
							
								
								LLM: add rwkv 5 ( #10048 )  
							
							 
							
							
							
						 
						
							2024-01-31 15:54:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								a9018a0e95 
								
							 
						 
						
							
							
								
								LLM: modify the GPU example for redpajama model ( #10044 )  
							
							 
							
							
							
							
							* LLM: modify the GPU example for redpajama model
* small fix 
							
						 
						
							2024-01-31 14:32:08 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuxuan Xia 
								
							 
						 
						
							
							
							
							
								
							
							
								95636cad97 
								
							 
						 
						
							
							
								
								Add AutoGen CPU and XPU Example ( #9980 )  
							
							 
							
							
							
							
							* Add AutoGen example
* Adjust AutoGen README
* Adjust AutoGen README
* Change AutoGen README
* Change AutoGen README 
							
						 
						
							2024-01-31 11:31:18 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								7284edd9b7 
								
							 
						 
						
							
							
								
								Vicuna CPU example of speculative decoding ( #10018 )  
							
							 
							
							
							
							
							* Vicuna CPU example of speculative decoding
* Update speculative.py
* Update README.md
* add requirements for ipex
* Update README.md
* Update speculative.py
* Update speculative.py 
							
						 
						
							2024-01-31 11:23:50 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wang, Jian4 
								
							 
						 
						
							
							
							
							
								
							
							
								7e5cd42a5c 
								
							 
						 
						
							
							
								
								LLM : Update optimize ipex bf16 ( #10038 )  
							
							 
							
							
							
							
							* use 4.35.2 and remove
* update rmsnorm
* remove
* remove
* update python style
* update
* update python style
* update
* fix style
* update
* remove whitespace 
							
						 
						
							2024-01-31 10:59:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wang, Jian4 
								
							 
						 
						
							
							
							
							
								
							
							
								fb53b994f8 
								
							 
						 
						
							
							
								
								LLM : Add llama ipex optimized ( #10046 )  
							
							 
							
							
							
							
							* init ipex
* remove padding 
							
						 
						
							2024-01-31 10:38:46 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								3685622f29 
								
							 
						 
						
							
							
								
								LLM: fix llama 4.36 forward ( #10047 )  
							
							 
							
							
							
						 
						
							2024-01-31 10:31:10 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								53a5140eff 
								
							 
						 
						
							
							
								
								Optimize rwkv v5 rest token again ( #10043 )  
							
							 
							
							
							
						 
						
							2024-01-31 10:01:11 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								b1ff28ceb6 
								
							 
						 
						
							
							
								
								LLama2 CPU example of speculative decoding ( #9962 )  
							
							 
							
							
							
							
							* LLama2 example of speculative decoding
* add docs
* Update speculative.py
* Update README.md
* Update README.md
* Update speculative.py
* remove autocast 
							
						 
						
							2024-01-31 09:45:20 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								0fcad6ce14 
								
							 
						 
						
							
							
								
								LLM: add gpu example for redpajama models ( #10040 )  
							
							 
							
							
							
						 
						
							2024-01-30 19:39:28 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xiangyu Tian 
								
							 
						 
						
							
							
							
							
								
							
							
								9978089796 
								
							 
						 
						
							
							
								
								[LLM] Enable BIGDL_OPT_IPEX in speculative baichuan2 13b example ( #10028 )  
							
							 
							
							
							
							
							Enable BIGDL_OPT_IPEX in speculative baichuan2 13b example 
							
						 
						
							2024-01-30 17:11:37 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ovo233 
								
							 
						 
						
							
							
							
							
								
							
							
								226f398c2a 
								
							 
						 
						
							
							
								
								fix ppl test errors ( #10036 )  
							
							 
							
							
							
						 
						
							2024-01-30 16:26:21 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
							
							
								
							
							
								13e61738c5 
								
							 
						 
						
							
							
								
								hide detail memory for each token in benchmark_utils.py ( #10037 )  
							
							 
							
							
							
						 
						
							2024-01-30 16:04:17 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								6b63ba23d1 
								
							 
						 
						
							
							
								
								LLM: add full module name during convert ( #10035 )  
							
							 
							
							
							
						 
						
							2024-01-30 14:43:07 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								7dfa6dbe46 
								
							 
						 
						
							
							
								
								add rwkv time shift optimization ( #10032 )  
							
							 
							
							
							
						 
						
							2024-01-30 14:10:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xiangyu Tian 
								
							 
						 
						
							
							
							
							
								
							
							
								f57d0fda8b 
								
							 
						 
						
							
							
								
								[LLM] Use IPEX Optimization for Self Speculative Decoding ( #9997 )  
							
							 
							
							
							
							
							Use IPEX Optimization for Self Speculative Decoding 
							
						 
						
							2024-01-30 09:11:06 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								ccf8f613fb 
								
							 
						 
						
							
							
								
								LLM: update fp16 Linear on ARC/FLEX ( #10023 )  
							
							 
							
							
							
						 
						
							2024-01-29 18:25:26 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Shaojun Liu 
								
							 
						 
						
							
							
							
							
								
							
							
								824c8029d7 
								
							 
						 
						
							
							
								
								Fix "local variable 'model' referenced before assignment" ( #10022 )  
							
							 
							
							
							
						 
						
							2024-01-29 16:18:04 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								cc3f122f6a 
								
							 
						 
						
							
							
								
								Baichuan2 CPU example of speculative decoding ( #10003 )  
							
							 
							
							
							
							
							* Baichuan2 CPU example of speculative decoding
* Update generate.py
* Update README.md
* Update generate.py
* Update generate.py
* Update generate.py
* fix default model
* fix wrong chinese coding
* Update generate.py
* update prompt
* update sample outputs
* baichuan 7b needs transformers==4.31.0
* rename example file's name 
							
						 
						
							2024-01-29 14:21:09 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xiangyu Tian 
								
							 
						 
						
							
							
							
							
								
							
							
								f37e4702bc 
								
							 
						 
						
							
							
								
								[LLM] Use IPEX Optimization for BF16 Model ( #9988 )  
							
							 
							
							
							
							
							Use IPEX Optimization for BF16 Model by env BIGDL_OPT_IPEX=true 
							
						 
						
							2024-01-29 11:28:25 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin Qiao 
								
							 
						 
						
							
							
							
							
								
							
							
								440cfe18ed 
								
							 
						 
						
							
							
								
								LLM: GPU Example Updates for Windows ( #9992 )  
							
							 
							
							
							
							
							* modify aquila
* modify aquila2
* add baichuan
* modify baichuan2
* modify blue-lm
* modify chatglm3
* modify chinese-llama2
* modify codellama
* modify distil-whisper
* modify dolly-v1
* modify dolly-v2
* modify falcon
* modify flan-t5
* modify gpt-j
* modify internlm
* modify llama2
* modify mistral
* modify mixtral
* modify mpt
* modify phi-1_5
* modify qwen
* modify qwen-vl
* modify replit
* modify solar
* modify starcoder
* modify vicuna
* modify voiceassistant
* modify whisper
* modify yi
* modify aquila2
* modify baichuan
* modify baichuan2
* modify blue-lm
* modify chatglm2
* modify chatglm3
* modify codellama
* modify distil-whisper
* modify dolly-v1
* modify dolly-v2
* modify flan-t5
* modify llama2
* modify llava
* modify mistral
* modify mixtral
* modify phi-1_5
* modify qwen-vl
* modify replit
* modify solar
* modify starcoder
* modify yi
* correct the comments
* remove cpu_embedding in code for whisper and distil-whisper
* remove comment
* remove cpu_embedding for voice assistant
* revert modify voice assistant
* modify for voice assistant
* add comment for voice assistant
* fix comments
* fix comments 
							
						 
						
							2024-01-29 11:25:11 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								c6d4f91777 
								
							 
						 
						
							
							
								
								[LLM] Add UTs of load_low_bit for transformers-style API ( #10001 )  
							
							 
							
							
							
							
							* Add uts for transformers api load_low_bit generation
* Small fixes
* Remove replit-code for CPU tests due to current load_low_bit issue on MPT
* Small change
* Small reorganization to llm unit tests on CPU
* Small fixes 
							
						 
						
							2024-01-29 10:18:23 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								d720554d43 
								
							 
						 
						
							
							
								
								simplify quantize kv cache api ( #10011 )  
							
							 
							
							
							
						 
						
							2024-01-29 09:23:57 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
							
							
								
							
							
								a3322e2a6c 
								
							 
						 
						
							
							
								
								add fp8 e5 to use_xmx ( #10015 )  
							
							 
							
							
							
						 
						
							2024-01-26 18:29:46 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Qiyuan Gong 
								
							 
						 
						
							
							
							
							
								
							
							
								9e18ea187f 
								
							 
						 
						
							
							
								
								[LLM] Avoid KV Cache OOM when seq len is larger than 1 ( #10006 )  
							
							 
							
							
							
							
							* Avoid OOM during multi-round streaming chat with kv cache
* For llama-like kv cache, i.e., [bs, n_head, seq_len, head_dim], use is_enough_kv_cache_room_4_31.
* Other models need to compare kv cache size with kv_len. 
							
						 
						
							2024-01-26 17:30:08 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
							
							
								
							
							
								e5ae6f2c13 
								
							 
						 
						
							
							
								
								LLM: fix truncation logic of past_key_values in chatglm multi turn chat ( #10007 )  
							
							 
							
							
							
							
							* Avoid frequently truncating past_key_values when its length is larger than required. 
							
						 
						
							2024-01-26 16:56:02 +08:00