commit 33fd1f9c76 | WeiguangHan | 2024-01-10 18:20:14 +08:00
LLM: fix input length logic for run_transformer_int4_gpu (#9864)
  * LLM: fix input length logic for run_transformer_int4_gpu
  * small fix
  * small fix
  * small fix

commit 53531ae4ee | Ruonan Wang | 2024-01-10 17:50:00 +08:00
LLM: support qkv fusion for fp8e5 (#9878)
  * update
  * add mistral
  * meet code review

commit cb32b985ec | Lilac09 | 2024-01-10 15:38:42 +08:00
add mistral and chatglm support to vllm (#9879)
  * add mistral and chatglm support to vllm
  * add mistral and chatglm support to vllm

commit e76d984164 | ZehuaCao | 2024-01-10 14:28:39 +08:00
[LLM] Support llm-awq vicuna-7b-1.5 on arc (#9874)
  * support llm-awq vicuna-7b-1.5 on arc
  * support llm-awq vicuna-7b-1.5 on arc

commit 3e05c9e11b | Ruonan Wang | 2024-01-09 18:10:01 +08:00
LLM: update esimd sdp kernel (#9871)

commit 023679459e | Yuwen Hu | 2024-01-09 18:05:03 +08:00
[LLM] Small fixes for finetune related examples and UTs (#9870)

commit b2aa267f50 | Cheen Hau, 俊豪 | 2024-01-09 16:30:50 +08:00
Enhance LLM GPU installation document (#9828)
  * Improve gpu install doc
  * Add troubleshooting - setvars.sh not done properly.
  * Further improvements
  * 2024.x.x -> 2024.0
  * Fixes
  * Fix Install BigDL-LLM From Wheel : bigdl-llm[xpu_2.0]
  * Remove "export USE_XETLA=OFF" for Max GPU

commit 23fc888abe | Yuwen Hu | 2024-01-09 15:38:47 +08:00
Update llm gpu xpu default related info to PyTorch 2.1 (#9866)

commit 36496d60ac | Yishuo Wang | 2024-01-09 13:24:02 +08:00
only use quantize kv cache on MTL (#9862)

commit 146076bdb5 | ZehuaCao | 2024-01-09 13:07:32 +08:00
Support llm-awq backend (#9856)
  * Support for LLM-AWQ Backend
  * fix
  * Update README.md
  * Add awqconfig
  * modify init
  * update
  * support llm-awq
  * fix style
  * fix style
  * update
  * fix AwqBackendPackingMethod not found error
  * fix style
  * update README
  * fix style
  Co-authored-by: Uxito-Ada <414416158@qq.com>
  Co-authored-by: Heyang Sun <60865256+Uxito-Ada@users.noreply.github.com>
  Co-authored-by: cyita <yitastudy@gmail.com>

commit fea6f16057 | Ruonan Wang | 2024-01-09 09:56:32 +08:00
LLM: add mlp fusion for fp8e5 and update related check (#9860)
  * update mlp fusion
  * fix style
  * update

commit 294fd32787 | binbin Deng | 2024-01-09 09:22:49 +08:00
LLM: update DeepSpeed AutoTP example with GPU memory optimization (#9823)

commit 5ba1dc38d4 | Yuwen Hu | 2024-01-08 17:16:17 +08:00
[LLM] Change default Linux GPU install option to PyTorch 2.1 (#9858)
  * Update default xpu to ipex 2.1
  * Update related install ut support correspondingly
  * Add arc ut tests for both ipex 2.0 and 2.1
  * Small fix
  * Diable ipex 2.1 test for now as oneapi 2024.0 has not beed installed on the test machine
  * Update document for default PyTorch 2.1
  * Small fix
  * Small fix
  * Small doc fixes
  * Small fixes

commit ed81baa35e | Mingyu Wei | 2024-01-08 16:50:55 +08:00
LLM: Use default typing-extension in LangChain examples (#9857)
  * remove typing extension downgrade in readme; minor fixes of code
  * fix typos in README
  * change default question of docqa.py

commit 3b6372ab12 | Jiao Wang | 2024-01-08 00:32:23 -08:00
Fix Llama transformers 4.36 support (#9852)
  * supoort 4.36
  * style
  * update
  * update
  * update
  * fix merge
  * update

commit 1b585b0d40 | Chen, Zhentao | 2024-01-08 15:53:57 +08:00
set fp8 default as e5m2 (#9859)

commit dc995006cc | Ruonan Wang | 2024-01-08 09:51:34 +08:00
LLM: add flash attention for mistral / mixtral (#9846)
  * add flash attention for mistral
  * update
  * add flash attn for mixtral
  * fix style

commit afaa871144 | Yishuo Wang | 2024-01-08 09:28:20 +08:00
[LLM] support quantize kv cache to fp8 (#9812)

commit 248ae7fad2 | Jiao Wang | 2024-01-05 11:30:18 -08:00
LLama optimize_model to support transformers 4.36 (#9818)
  * supoort 4.36
  * style
  * update
  * update
  * update

commit a60bda3324 | Ruonan Wang | 2024-01-05 16:44:10 +08:00
LLM: update check for deepspeed (#9838)

commit 16433dd959 | Ruonan Wang | 2024-01-05 13:49:37 +08:00
LLM: fix first token judgement of flash attention (#9841)
  * fix flash attention
  * meet code review
  * fix

commit f919f5792a | Yina Chen | 2024-01-05 12:38:57 +08:00
fix kv cache out of bound (#9827)

commit 5df31db773 | Ruonan Wang | 2024-01-05 10:52:05 +08:00
LLM: fix accuracy issue of chatglm3 (#9830)
  * add attn mask for first token
  * fix
  * fix
  * change attn calculation
  * fix
  * fix
  * fix style
  * fix style

commit 3147ebe63d | Jinyi Wan | 2024-01-05 09:50:28 +08:00
Add cpu and gpu examples for SOLAR-10.7B (#9821)

commit ad6b182916 | WeiguangHan | 2024-01-04 19:30:32 +08:00
LLM: change the color of peak diff (#9836)

commit 38c05be1c0 | Xiangyu Tian | 2024-01-04 15:34:42 +08:00
[LLM] Fix dtype mismatch in Baichuan2-13b (#9834)

commit 8504a2bbca | Ruonan Wang | 2024-01-04 15:22:20 +08:00
LLM: update qlora alpaca example to change lora usage (#9835)
  * update example
  * fix style

commit 05b681fa85 | Ziteng Zhang | 2024-01-04 13:33:29 +08:00
[LLM] IPEX auto importer set on by default (#9832)
  * Set BIGDL_IMPORT_IPEX default to True
  * Remove import intel_extension_for_pytorch as ipex from GPU example

commit 4ceefc9b18 | Wang, Jian4 | 2024-01-04 11:23:16 +08:00
LLM: Support bitsandbytes config on qlora finetune (#9715)
  * test support bitsandbytesconfig
  * update style
  * update cpu example
  * update example
  * update readme
  * update unit test
  * use bfloat16
  * update logic
  * use int4
  * set defalut bnb_4bit_use_double_quant
  * update
  * update example
  * update model.py
  * update
  * support lora example

commit 9a14465560 | WeiguangHan | 2024-01-03 18:18:19 +08:00
LLM: add peak diff (#9789)
  * add peak diff
  * small fix
  * revert yml file

commit f4eb5da42d | Mingyu Wei | 2024-01-03 18:10:34 +08:00
disable arc ut (#9825)

commit 20e9742fa0 | Ruonan Wang | 2024-01-03 16:15:55 +08:00
LLM: fix chatglm3 issue (#9820)
  * fix chatglm3 issue
  * small update

commit a54cd767b1 | Wang, Jian4 | 2024-01-03 14:49:02 +08:00
LLM: Add gguf falcon (#9801)
  * init falcon
  * update convert.py
  * update style

commit 668c2095b1 | Yuwen Hu | 2024-01-03 10:30:05 +08:00
Remove unnecessary warning when installing llm (#9815)

commit f5752ead36 | dingbaorong | 2024-01-02 16:36:05 +08:00
Add whisper test (#9808)
  * add whisper benchmark code
  * add librispeech_asr.py
  * add bigdl license

commit 6584539c91 | binbin Deng | 2024-01-02 14:32:50 +08:00
LLM: fix installation of codellama (#9813)

commit 4d01069302 | Kai Huang | 2023-12-29 12:54:13 +08:00
Temp remove baichuan2-13b 1k from arc perf test (#9810)

commit a2e668a61d | dingbaorong | 2023-12-28 16:55:34 +08:00
fix arc ut test (#9736)

commit f0f9d45eac | Qiyuan Gong | 2023-12-28 15:23:58 +08:00
[LLM] IPEX import support bigdl-core-xe-21 (#9769)
  Add support for bigdl-core-xe-21.

commit a8baf68865 | dingbaorong | 2023-12-28 14:58:51 +08:00
fix csv_to_html (#9802)

commit 5857a38321 | Guancheng Fu | 2023-12-28 14:41:47 +08:00
[vLLM] Add option to adjust KV_CACHE_ALLOC_BLOCK_LENGTH (#9782)
  * add option kv_cache_block
  * change var name

commit 99bddd3ab4 | Ruonan Wang | 2023-12-28 13:30:13 +08:00
LLM: better FP16 support for Intel GPUs (#9791)
  * initial support
  * fix
  * fix style
  * fix
  * limi esimd usage condition
  * refactor code
  * fix style
  * small fix
  * meet code review
  * small fix

commit 7d9f6c6efc | Yishuo Wang | 2023-12-28 09:23:44 +08:00
fix cpuinfo error (#9793)

commit 7ed9538b9f | Wang, Jian4 | 2023-12-28 09:22:39 +08:00
LLM: support gguf mpt (#9773)
  * add gguf mpt
  * update

commit d299f108d0 | Cengguang Zhang | 2023-12-28 09:11:59 +08:00
update falcon attention forward. (#9796)

commit a5e5c3daec | Shaojun Liu | 2023-12-28 08:55:43 +08:00
set warm_up: 3 num_trials: 50 for cpu stress test (#9799)

commit f6bb4ab313 | dingbaorong | 2023-12-27 21:02:41 +08:00
Arc stress test (#9795)
  * add arc stress test
  * triger ci
  * triger CI
  * triger ci
  * disable ci

commit 40eaf76ae3 | Kai Huang | 2023-12-27 19:38:53 +08:00
Add baichuan2-13b to Arc perf (#9794)
  * add baichuan2-13b
  * fix indent
  * revert

commit 6c75c689ea | Shaojun Liu | 2023-12-27 15:40:53 +08:00
bigdl-llm stress test for stable version (#9781)
  * 1k-512 2k-512 baseline
  * add cpu stress test
  * update yaml name
  * update
  * update
  * clean up
  * test
  * update
  * update
  * update
  * test
  * update

commit 5cfb4c4f5b | dingbaorong | 2023-12-27 11:01:56 +08:00
Arc stable version performance regression test (#9785)
  * add arc stable version regression test
  * empty gpu mem between different models
  * triger ci
  * comment spr test
  * triger ci
  * address kai's comments and disable ci
  * merge fp8 and int4
  * disable ci

commit 40edb7b5d7 | binbin Deng | 2023-12-27 09:11:37 +08:00
LLM: fix get environment variables setting (#9787)

commit 689889482c | Kai Huang | 2023-12-26 19:51:25 +08:00
Reduce max_cache_pos to reduce Baichuan2-13B memory (#9694)
  * optimize baichuan2 memory
  * fix
  * style
  * fp16 mask
  * disable fp16
  * fix style
  * empty cache
  * revert empty cache

commit 361781bcd0 | Jason Dai | 2023-12-26 19:46:11 +08:00
Update readme (#9788)

commit c38e18f2ff | Yuwen Hu | 2023-12-26 19:15:57 +08:00
[LLM] Migrate iGPU perf tests to new machine (#9784)
  * Move 1024 test just after 32-32 test; and enable all model for 1024-128
  * Make sure python output encoding in utf-8 so that redirect to txt can always be success
  * Upload results to ftp
  * Small fix

commit c05d7e1532 | WeiguangHan | 2023-12-26 18:55:56 +08:00
LLM: add star_corder_15.5b model (#9772)
  * LLM: add star_corder_15.5b model
  * revert llm_performance_tests.yml

commit 44b4a0c9c5 | Ziteng Zhang | 2023-12-26 16:57:55 +08:00
[LLM] Correct prompt format of Yi, Llama2 and Qwen in generate.py (#9786)
  * correct prompt format of Yi
  * correct prompt format of llama2 in cpu generate.py
  * correct prompt format of Qwen in GPU example

commit 0ea842231e | Xiangyu Tian | 2023-12-26 16:03:57 +08:00
[LLM] vLLM: Add api_server entrypoint (#9783)
  Add vllm.entrypoints.api_server for benchmark_serving.py in vllm.

64d05e581c  dingbaorong  2023-12-26 15:38:28 +08:00
add peak gpu mem stats in transformer_int4_gpu (#9766)
* add peak gpu mem stats in transformer_int4_gpu
* address weiguang's comments

87b4100054  Ziteng Zhang  2023-12-26 10:03:39 +08:00
[LLM] Support Yi model in chat.py (#9778)
* Suppot Yi model
* code style& add reference link

11d883301b  Ruonan Wang  2023-12-26 09:41:27 +08:00
LLM: fix wrong batch output caused by flash attention (#9780)
* fix
* meet code review
* move batch size check to the beginning
* move qlen check inside function
* meet code review

66e286a73d  Heyang Sun  2023-12-25 16:08:09 +08:00
Support for Mixtral AWQ (#9775)
* Support for Mixtral AWQ
* Update README.md
* Update README.md
* Update awq_config.py
* Update README.md
* Update README.md

1917bbe626  Ruonan Wang  2023-12-25 14:49:30 +08:00
LLM: fix BF16Linear related training & inference issue (#9755)
* fix bf16 related issue
* fix
* update based on comment & add arc lora script
* update readme
* update based on comment
* update based on comment
* update
* force to bf16
* fix style
* move check input dtype into function
* update convert
* meet code review
* meet code review
* update merged model to support new training_mode api
* fix typo

30dab36f76  Xiangyu Tian  2023-12-25 14:17:06 +08:00
[LLM] vLLM: Fix kv cache init (#9771)
Fix kv cache init

449b387125  Yina Chen  2023-12-25 14:04:28 +08:00
Support relora in bigdl-llm (#9687)
* init
* fix style
* update
* support resume & update readme
* update
* update
* remove important
* add training mode
* meet comments

b6222404b8  Shaojun Liu  2023-12-25 13:47:11 +08:00
bigdl-llm stable version: let the perf test fail if the difference between perf and baseline is greater than 5% (#9750)
* test
* test
* test
* update
* revert

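The 5% gate named in the commit title amounts to a relative-difference check against the baseline. A minimal sketch of that check (the helper name and sign convention are assumptions; the actual workflow script is not shown here), assuming larger numbers mean a slower run:

```python
def exceeds_baseline(perf: float, baseline: float, tolerance: float = 0.05) -> bool:
    """Return True when `perf` regresses more than `tolerance` (5% by default)
    relative to `baseline`. Larger values are assumed to mean slower runs."""
    return (perf - baseline) / baseline > tolerance
```

Under this check a run measuring 106 against a baseline of 100 fails, while 104 passes.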
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
986f65cea9  Ziteng Zhang  2023-12-25 11:31:14 +08:00
[LLM] Add trust_remote_code for local renamed model in bigdl_llm_model.py (#9762)

be13b162fe  Yishuo Wang  2023-12-25 10:54:01 +08:00
add codeshell example (#9743)

daf536fb2d  Guancheng Fu  2023-12-25 10:29:31 +08:00
vLLM: Apply attention optimizations for selective batching (#9758)
* fuse_rope for prefil
* apply kv_cache optimizations
* apply fast_decoding_path
* Re-enable kv_cache optimizations for prefill
* reduce KV_CACHE_ALLOC_BLOCK for selective_batching

ed8ed76d4f  binbin Deng  2023-12-25 09:41:14 +08:00
LLM: update deepspeed autotp usage (#9733)

02436c6cce  Yuwen Hu  2023-12-22 18:18:23 +08:00
[LLM] Enable more long context in-out pairs for iGPU perf tests (#9765)
* Add test for 1024-128 and enable more tests for 512-64
* Fix date in results csv name to the time when the performance is triggered
* Small fix
* Small fix
* further fixes

7fd7c37e1b  Chen, Zhentao  2023-12-22 16:59:48 +08:00
Enable fp8e5 harness (#9761)
* fix precision format like fp8e5
* match fp8_e5m2

4c487313f2  Qiyuan Gong  2023-12-22 16:38:24 +08:00
Revert "[LLM] IPEX auto importer turn on by default for XPU (#9730)" (#9759)
This reverts commit 0284801fbd.

0284801fbd  Qiyuan Gong  2023-12-22 16:20:32 +08:00
[LLM] IPEX auto importer turn on by default for XPU (#9730)
* Set BIGDL_IMPORT_IPEX default to true, i.e., auto import IPEX for XPU.
* Remove import intel_extension_for_pytorch as ipex from GPU example.
* Add support for bigdl-core-xe-21.

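The auto-importer this commit enables is essentially an env-gated import. A hypothetical sketch of that pattern (only the BIGDL_IMPORT_IPEX flag comes from the commit message; the helper name and truthy-string handling are assumptions):

```python
import importlib
import os

def maybe_import_ipex():
    # BIGDL_IMPORT_IPEX defaults to on per the commit message; the exact
    # truthy-string handling here is an assumption.
    if os.environ.get("BIGDL_IMPORT_IPEX", "true").lower() in ("1", "true", "yes"):
        try:
            # Import IPEX on behalf of the user so GPU examples no longer
            # need `import intel_extension_for_pytorch as ipex` themselves.
            return importlib.import_module("intel_extension_for_pytorch")
        except ImportError:
            return None
    return None
```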
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
86a69e289c  Chen, Zhentao  2023-12-22 15:09:22 +08:00
fix harness runner label of manual trigger (#9754)
* fix runner
* update golden

fdf93c9267  Guancheng Fu  2023-12-22 13:45:46 +08:00
Implement selective batching for vLLM (#9659)
* add control to load hf model
* finish initial version of selective_batching
* temp
* finish
* Remove print statement
* fix error
* Apply yang's optimization
* a version that works
* We need to check kv_cache passed in, this could be an error. TODO: add fast decoding path
* format
* temp solution: not batching prefill requests
* a version that works for prefill batching
* format
* a solid version: works normally
* a temp version
* Solid version: remove redundant functions
* fix format
* format
* solid: add option to enable selective_batching
* remove logic for using transformer models
* format
* format
* solid: enable argument VLLM_ENABLE_SELECTIVE_BATCHING
* format
* finish
* format

2f36769208  Ruonan Wang  2023-12-22 11:05:39 +08:00
LLM: bigdl-llm lora support & lora example (#9740)
* lora support and single card example
* support multi-card, refactor code
* fix model id and style
* remove torch patch, add two new class for bf16, update example
* fix style
* change to training_mode
* small fix
* add more info in help
* fixstyle, update readme
* fix ut
* fix ut
* Handling compatibility issues with default LoraConfig

ba0b939579  SONG Ge  2023-12-22 09:59:27 +08:00
[LLM] Support transformers-v4.36.0 on mistral model (#9744)
* add support transformers-v4.36.0 on mistral model
* python/llm/src/bigdl/llm/transformers/models/mistral.py
* make the redundant implementation as utils
* fix code style
* fix
* fix style
* update with utils enough_kv_room

e36111e713  Xin Qiu  2023-12-22 09:26:35 +08:00
mixstral fused qkv and rope (#9724)
* mixstral fused qkv and rope
* fix and clean
* fix style
* update
* update
* fix
* update
* fix

e4f6e43675  Jiao Wang  2023-12-21 14:41:51 -08:00
safetenor to false (#9728)

bb52239e0a  Shaojun Liu  2023-12-21 22:55:33 +08:00
bigdl-llm stable version release & test (#9732)
* stable version test
* trigger spr test
* update
* trigger
* test
* test
* test
* test
* test
* refine
* release linux first

d4d2ccdd9d  WeiguangHan  2023-12-21 18:52:52 +08:00
LLM: remove startcorder-15.5b (#9748)

474c099559  WeiguangHan  2023-12-21 17:56:43 +08:00
LLM: using separate threads to do inference (#9727)
* using separate threads to do inference
* resolve some comments
* resolve some comments
* revert llm_performance_tests.yml file

426660b88e  Yishuo Wang  2023-12-21 17:53:29 +08:00
simplify qwen attention (#9747)

984697afe2  Wang, Jian4  2023-12-21 14:06:25 +08:00
LLM: Add bloom gguf support (#9734)
* init
* update bloom add merges
* update
* update readme
* update for llama error
* update

df775cf316  Heyang Sun  2023-12-21 11:25:05 +08:00
fix python style (#9742)
* fix python style
* fix
* fix

b06a3146c8  Chen, Zhentao  2023-12-21 10:40:52 +08:00
Fix 70b oom (#9738)
* add default value to bigdl llm
* fix model oom

6c3e698bf1  Xin Qiu  2023-12-21 10:11:37 +08:00
mistral decoding_fast_path and fused mlp (#9714)
* mistral decoding_fast_path and fused mlp
* meet code review

d157f623b6  Heyang Sun  2023-12-21 10:03:23 +08:00
Load Mixtral gguf in a block-wise way (#9725)
* Load Mixtral gguf in a block-wise way
* refine

34bb804189  WeiguangHan  2023-12-21 09:54:33 +08:00
LLM: check csv and its corresponding yaml file (#9702)
* LLM: check csv and its corresponding yaml file
* run PR arc perf test
* modify the name of some variables
* execute the check results script in right place
* use cp to replace mv command
* resolve some comments
* resolve more comments
* revert the llm_performance_test.yaml file

4bda975a3e  Zhao Changmin  2023-12-21 09:48:58 +08:00
LLM: Align lowbit model config (#9735)
* align lowbit model config

e1e921f425  Wang, Jian4  2023-12-21 09:33:40 +08:00
LLM: gguf other model using dtype (#9729)

13ea6330bd  Yishuo Wang  2023-12-20 17:34:34 +08:00
optimize qwen rope (#9737)

4c032a433e  Ziteng Zhang  2023-12-20 16:52:43 +08:00
[LLM] Add glibc checker (#9624)
* Add glibc checker
* Add env BIGDL_GLIBC_CHECK to control glibc checker. The default is false, i.e., don't check.

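The BIGDL_GLIBC_CHECK switch described above follows a common opt-in env-flag pattern; a minimal sketch (the helper name is hypothetical, the default-off behavior comes from the commit message):

```python
import os

def glibc_check_enabled(default: bool = False) -> bool:
    # BIGDL_GLIBC_CHECK defaults to false, i.e. the checker does not run
    # unless the user explicitly opts in.
    value = os.environ.get("BIGDL_GLIBC_CHECK")
    if value is None:
        return default
    return value.strip().lower() in ("1", "true", "yes", "on")
```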
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
cd652a1710  Yina Chen  2023-12-20 16:26:17 +08:00
Support fp8 e5m2 on arc (#9711)
* init
* fix style
* update
* fix style
* update

e54c428d30  Yishuo Wang  2023-12-20 10:40:45 +08:00
add bf16/fp16 fuse mlp support (#9726)

612651cb5d  Heyang Sun  2023-12-20 09:41:59 +08:00
fix typo (#9723)

									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								3aa8b66bc3 
								
							 
						 
						
							
							
								
								LLM: remove starcoder-15.5b model temporarily ( #9720 )  
							
							 
							
							
							
						 
						
							2023-12-19 20:14:46 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
522cf5ed82  Yishuo Wang  2023-12-19 17:29:38 +08:00
[LLM] Improve chatglm2/3 rest token performance with long context (#9716)

f2e6abb563  Yishuo Wang  2023-12-19 14:22:22 +08:00
fix mlp batch size check (#9718)

1fa7793fc0  Heyang Sun  2023-12-19 13:54:38 +08:00
Load Mixtral GGUF Model (#9690)
* Load Mixtral GGUF Model
* refactor
* fix empty tensor when to cpu
* update gpu and cpu readmes
* add dtype when set tensor into module