Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								21de2613ce 
								
							 
						 
						
							
							
								
								[LLM] Add model loading time record for all-in-one benchmark ( #10201 )  
							
							 
							
							
							
							
							* Add model loading time record in csv for all-in-one benchmark
* Small fix
* Small fix to number after . 
							
						 
						
							2024-02-22 13:57:18 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ovo233 
								
							 
						 
						
							
							
							
							
								
							
							
								60e11b6739 
								
							 
						 
						
							
							
								
								LLM: Add mlp layer unit tests ( #10200 )  
							
							 
							
							
							
							
							* add mlp layer unit tests
* add download baichuan-13b
* exclude llama for now
* install additional packages
* rename bash file
* switch to Baichuan2
* delete attention related code
* fix name errors in yml file 
							
						 
						
							2024-02-22 13:44:45 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									SONG Ge 
								
							 
						 
						
							
							
							
							
								
							
							
								ca1166a0e5 
								
							 
						 
						
							
							
								
								[LLM] Add quantize kv_cache for Baichuan2-13B ( #10203 )  
							
							 
							
							
							
							
							* add quantize kv_cache for baichuan2-13b
* style fix 
							
						 
						
							2024-02-22 13:43:35 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								34ee1aa91f 
								
							 
						 
						
							
							
								
								LLM: add esimd sdp support for chatglm3 ( #10205 )  
							
							 
							
							
							
							
							* add esimd sdp support
* fix style 
							
						 
						
							2024-02-22 13:37:16 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuxuan Xia 
								
							 
						 
						
							
							
							
							
								
							
							
								7cbc2429a6 
								
							 
						 
						
							
							
								
								Fix C-Eval ChatGLM loading issue ( #10206 )  
							
							 
							
							
							
							
							* Add c-eval workflow and modify running files
* Modify the chatglm evaluator file
* Modify the ceval workflow for triggering test
* Modify the ceval workflow file
* Modify the ceval workflow file
* Modify ceval workflow
* Adjust the ceval dataset download
* Add ceval workflow dependencies
* Modify ceval workflow dataset download
* Add ceval test dependencies
* Add ceval test dependencies
* Correct the result print
* Fix the nightly test trigger time
* Fix ChatGLM loading issue 
							
						 
						
							2024-02-22 10:00:43 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								94cb16fe40 
								
							 
						 
						
							
							
								
								[LLM] Small updates to Win GPU Install Doc ( #10199 )  
							
							 
							
							
							
							
							* Make Offline installer as default for win gpu doc for oneAPI
* Small other fixes 
							
						 
						
							2024-02-21 17:58:40 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									hxsz1997 
								
							 
						 
						
							
							
							
							
								
							
							
								5b387bb71a 
								
							 
						 
						
							
							
								
								Change the nightly test time of ppl and harness ( #10198 )  
							
							 
							
							
							
							
							* remove include and language option, select the corresponding dataset based on the model name in Run
* change the nightly test time
* change the nightly test time of harness and ppl 
							
						 
						
							2024-02-21 17:39:33 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
							
							
								
							
							
								9975b029c5 
								
							 
						 
						
							
							
								
								LLM: add qlora finetuning example using trl.SFTTrainer ( #10183 )  
							
							 
							
							
							
						 
						
							2024-02-21 16:40:04 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jason Dai 
								
							 
						 
						
							
							
							
							
								
							
							
								4655005f24 
								
							 
						 
						
							
							
								
								Update README ( #10186 )  
							
							 
							
							
							
						 
						
							2024-02-21 16:35:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								f7c96b19ef 
								
							 
						 
						
							
							
								
								LLM: support iq2 for mixtral ( #10191 )  
							
							 
							
							
							
							
							* support name mapping for mixtral
* support mixtral mixed quantization
* fix style
* fix 
							
						 
						
							2024-02-21 16:00:29 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Shaojun Liu 
								
							 
						 
						
							
							
							
							
								
							
							
								079f2011ea 
								
							 
						 
						
							
							
								
								Update bigdl-llm-finetune-qlora-xpu Docker Image ( #10194 )  
							
							 
							
							
							
							
							* Bump oneapi version to 2024.0
* pip install bitsandbytes scipy
* Pin level-zero-gpu version
* Pin accelerate version 0.23.0 
							
						 
						
							2024-02-21 15:18:27 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									yb-peng 
								
							 
						 
						
							
							
							
							
								
							
							
								b1a97b71a9 
								
							 
						 
						
							
							
								
								Harness eval: Add is_last parameter and fix logical operator in highlight_vals ( #10192 )  
							
							 
							
							
							
							
							* Add is_last parameter and fix logical operator in highlight_vals
* Add script to update HTML files in parent folder
* Add running update_html_in_parent_folder.py in summarize step
* Add licence info
* Remove update_html_in_parent_folder.py in Summarize the results for pull request 
							
						 
						
							2024-02-21 14:45:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zhicun 
								
							 
						 
						
							
							
							
							
								
							
							
								c7e839e66c 
								
							 
						 
						
							
							
								
								Add Qwen1.5-7B-Chat ( #10113 )  
							
							 
							
							
							
							
							* add Qwen1.5-7B-Chat
* modify Qwen1.5 example
* update README
* update prompt format
* update folder name and example README
* add Chinese prompt sample output
* update link in README
* correct the link
* update transformer version 
							
						 
						
							2024-02-21 13:29:29 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
							
							
								
							
							
								56ad781f2f 
								
							 
						 
						
							
							
								
								qwen2 cpu fix ( #10187 )  
							
							 
							
							
							
						 
						
							2024-02-21 11:23:51 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chen, Zhentao 
								
							 
						 
						
							
							
							
							
								
							
							
								39d37bd042 
								
							 
						 
						
							
							
								
								upgrade harness package version in workflow ( #10188 )  
							
							 
							
							
							
							
							* upgrade harness
* update readme 
							
						 
						
							2024-02-21 11:21:30 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
							
							
								
							
							
								001c13243e 
								
							 
						 
						
							
							
								
								[LLM] Add support for low_low_bit benchmark on Windows GPU ( #10167 )  
							
							 
							
							
							
							
							* Add support for low_low_bit performance test on Windows GPU
* Small fix
* Small fix
* Save memory during converting model process
* Drop the first-run results when loading in low bit on MTL iGPU for better performance
* Small fix 
							
						 
						
							2024-02-21 10:51:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ziteng Zhang 
								
							 
						 
						
							
							
							
							
								
							
							
								276ef0e885 
								
							 
						 
						
							
							
								
								Speculative Ziya on CPU ( #10160 )  
							
							 
							
							
							
							
							* Speculative Ziya on CPU
* Without part of Accelerate with BIGDL_OPT_IPEX 
							
						 
						
							2024-02-21 10:30:39 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zhao Changmin 
								
							 
						 
						
							
							
							
							
								
							
							
								4fbf449c2d 
								
							 
						 
						
							
							
								
								for rwkv4 ( #10179 )  
							
							 
							
							
							
						 
						
							2024-02-21 10:11:10 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									yb-peng 
								
							 
						 
						
							
							
							
							
								
							
							
								de3dc609ee 
								
							 
						 
						
							
							
								
								Modify harness evaluation workflow ( #10174 )  
							
							 
							
							
							
							
							* Modify table head in harness
* Specify the file path of fp16.csv
* change run to run nightly and run pr to debug
* Get fp16.csv by downloading it from GitHub
* Change the method to calculate diff in html table
* Change the method to calculate diff in html table
* Re-arrange job order
* Re-arrange job order
* Change limit
* Change fp16.csv path
* Change highlight rules
* Change limit 
							
						 
						
							2024-02-20 18:55:43 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									hxsz1997 
								
							 
						 
						
							
							
							
							
								
							
							
								b55fd00fb1 
								
							 
						 
						
							
							
								
								remove include and language option, select the corresponding dataset based on the model name in Run ( #10181 )  
							
							 
							
							
							
						 
						
							2024-02-20 17:34:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								3288acb8de 
								
							 
						 
						
							
							
								
								LLM : Support embedding quantization (only q2k now) ( #10170 )  
							
							 
							
							
							
							
							* basic logic added
* basic support
* support save&load, update mixed strategy
* fix style
* use int8 for lm_head
* add check for xpu 
							
						 
						
							2024-02-20 16:56:57 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Lilac09 
								
							 
						 
						
							
							
							
							
								
							
							
								eca69a6022 
								
							 
						 
						
							
							
								
								Fix build error of bigdl-llm-cpu ( #10176 )  
							
							 
							
							
							
							
							* fix build error
* fix build error
* fix build error
* fix build error 
							
						 
						
							2024-02-20 14:50:12 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									hxsz1997 
								
							 
						 
						
							
							
							
							
								
							
							
								6e10d98a8d 
								
							 
						 
						
							
							
								
								Fix some typos ( #10175 )  
							
							 
							
							
							
							
							* add llm-ppl workflow
* update the DATASET_DIR
* test multiple precisions
* modify nightly test
* match the updated ppl code
* add matrix.include
* fix the include error
* update the include
* add more model
* update the precision of include
* update nightly time and add more models
* fix the workflow_dispatch description, change default model of pr and modify the env
* modify workflow_dispatch language options
* modify options
* modify language options
* modify workflow_dispatch type
* modify type
* modify the type of language
* change seq_len type
* fix some typos
* revert changes to stress_test.txt 
							
						 
						
							2024-02-20 14:14:53 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zhicun 
								
							 
						 
						
							
							
							
							
								
							
							
								add3899311 
								
							 
						 
						
							
							
								
								Add ziya CPU example ( #10114 )  
							
							 
							
							
							
							
							* ziya on CPU
* add README for ziya
* specify use_cache
* add arc CPU
* update prompt format
* update link
* add comments to emphasize use_cache
* update pip cmd 
							
						 
						
							2024-02-20 13:59:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuxuan Xia 
								
							 
						 
						
							
							
							
							
								
							
							
								71875ebc24 
								
							 
						 
						
							
							
								
								Fix the C-Eval nightly test trigger time ( #10172 )  
							
							 
							
							
							
							
							* Add c-eval workflow and modify running files
* Modify the chatglm evaluator file
* Modify the ceval workflow for triggering test
* Modify the ceval workflow file
* Modify the ceval workflow file
* Modify ceval workflow
* Adjust the ceval dataset download
* Add ceval workflow dependencies
* Modify ceval workflow dataset download
* Add ceval test dependencies
* Add ceval test dependencies
* Correct the result print
* Fix the nightly test trigger time 
							
						 
						
							2024-02-20 09:53:59 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
							
							
								
							
							
								2bb96c775c 
								
							 
						 
						
							
							
								
								LLM: fix device setting during saving optimized model ( #10154 )  
							
							 
							
							
							
						 
						
							2024-02-20 09:52:59 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
							
							
								
							
							
								1f6d5b9f30 
								
							 
						 
						
							
							
								
								enable fused rmsnorm and rope qwen2 ( #10163 )  
							
							 
							
							
							
							
							* qwen2
* change convert
* cleanup 
							
						 
						
							2024-02-20 08:33:09 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									yb-peng 
								
							 
						 
						
							
							
							
							
								
							
							
								e31210ba00 
								
							 
						 
						
							
							
								
								Modify html table style and add fp16.csv in harness ( #10169 )  
							
							 
							
							
							
							
							* Specify the version of pandas in harness evaluation workflow
* Specify the version of pandas in harness evaluation workflow
* Modify html table style and add fp16.csv in harness
* Modify comments 
							
						 
						
							2024-02-19 18:13:40 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									WeiguangHan 
								
							 
						 
						
							
							
							
							
								
							
							
								6c09aed90d 
								
							 
						 
						
							
							
								
								LLM: add qwen_1.5_7b model for arc perf test ( #10166 )  
							
							 
							
							
							
							
							* LLM: add qwen_1.5_7b model for arc perf test
* small fix
* revert some code 
							
						 
						
							2024-02-19 17:21:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuxuan Xia 
								
							 
						 
						
							
							
							
							
								
							
							
								209122559a 
								
							 
						 
						
							
							
								
								Add Ceval workflow and modify the result printing ( #10140 )  
							
							 
							
							
							
							
							* Add c-eval workflow and modify running files
* Modify the chatglm evaluator file
* Modify the ceval workflow for triggering test
* Modify the ceval workflow file
* Modify the ceval workflow file
* Modify ceval workflow
* Adjust the ceval dataset download
* Add ceval workflow dependencies
* Modify ceval workflow dataset download
* Add ceval test dependencies
* Add ceval test dependencies
* Correct the result print 
							
						 
						
							2024-02-19 17:06:53 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									yb-peng 
								
							 
						 
						
							
							
							
							
								
							
							
								50fa004ba5 
								
							 
						 
						
							
							
								
								Specify the version of pandas in harness evaluation workflow ( #10159 )  
							
							 
							
							
							
							
							* Specify the version of pandas in harness evaluation workflow
* Specify the version of pandas in harness evaluation workflow 
							
						 
						
							2024-02-19 16:27:08 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Zhao Changmin 
								
							 
						 
						
							
							
							
							
								
							
							
								f8730e8dc1 
								
							 
						 
						
							
							
								
								Skip rescale rwkv linear when load_low_bit ( #10164 )  
							
							 
							
							
							
							
							* rwkv_ld 
							
						 
						
							2024-02-19 15:56:42 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								3e2af5ec0a 
								
							 
						 
						
							
							
								
								Fix IPEX Baichuan Speculative ( #10162 )  
							
							 
							
							
							
							
							* Fix IPEX Baichuan Speculative
* compatible with 13B
* Update speculative.py 
							
						 
						
							2024-02-19 15:27:34 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Cheen Hau, 俊豪 
								
							 
						 
						
							
							
							
							
								
							
							
								6952847f68 
								
							 
						 
						
							
							
								
								GPU install doc - add pip install oneAPI for windows ( #10157 )  
							
							 
							
							
							
							
							* Add instructions for pip install oneAPI for windows
* Improve clarity
* Format fix
* Fix
* Fix in runtime configuration 
							
						 
						
							2024-02-19 14:46:08 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
							
							
								
							
							
								23c91cdce6 
								
							 
						 
						
							
							
								
								[LLM] Add min_step_draft in speculative decoding ( #10142 )  
							
							 
							
							
							
							
							* Fix gptj kvcache & position id
* Add min_draft_tokens in speculative decoding
* fix style
* update 
							
						 
						
							2024-02-19 14:31:41 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chen, Zhentao 
								
							 
						 
						
							
							
							
							
								
							
							
								14ba2c5135 
								
							 
						 
						
							
							
								
								Harness: remove deprecated files ( #10165 )  
							
							 
							
							
							
						 
						
							2024-02-19 14:27:49 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wang, Jian4 
								
							 
						 
						
							
							
							
							
								
							
							
								d3591383d5 
								
							 
						 
						
							
							
								
								LLM : Add CPU chatglm3 speculative example ( #10004 )  
							
							 
							
							
							
							
							* init chatglm
* update
* update 
							
						 
						
							2024-02-19 13:38:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wang, Jian4 
								
							 
						 
						
							
							
							
							
								
							
							
								f2417e083c 
								
							 
						 
						
							
							
								
								LLM: enable chatglm3-6b target_model ipex ( #10085 )  
							
							 
							
							
							
							
							* init
* always make causal_mask
* not return last tensor
* update
* optimize_model = False
* enable optimized=False
* enable optimized_model=true
* speed_up ipex target_model
* remove if True
* use group_size
* update python style
* update
* update 
							
						 
						
							2024-02-19 13:38:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
							
							
								
							
							
								177273c1a4 
								
							 
						 
						
							
							
								
								IPEX Speculative Support for Baichuan2 7B ( #10112 )  
							
							 
							
							
							
							
							* IPEX Speculative Support for Baichuan2 7B
* fix license problems
* refine 
							
						 
						
							2024-02-19 09:12:57 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jason Dai 
								
							 
						 
						
							
							
							
							
								
							
							
								6f38e604de 
								
							 
						 
						
							
							
								
								Fix README.md ( #10156 )  
							
							 
							
							
							
						 
						
							2024-02-18 21:51:40 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Shaojun Liu 
								
							 
						 
						
							
							
							
							
								
							
							
								7a3a20cf5b 
								
							 
						 
						
							
							
								
								Fix: GitHub-owned GitHubAction not pinned by hash ( #10152 )  
							
							 
							
							
							
						 
						
							2024-02-18 16:49:28 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Shaojun Liu 
								
							 
						 
						
							
							
							
							
								
							
							
								c3daacec6d 
								
							 
						 
						
							
							
								
								Fix Token Permission issues ( #10151 )  
							
							 
							
							
							
							
							Co-authored-by: Your Name <Your Email> 
							
						 
						
							2024-02-18 13:23:54 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
							
							
								
							
							
								1508d6b089 
								
							 
						 
						
							
							
								
								Fix gptj kvcache & position id ( #10141 )  
							
							 
							
							
							
						 
						
							2024-02-18 10:02:49 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Kai Huang 
								
							 
						 
						
							
							
							
							
								
							
							
								7400401706 
								
							 
						 
						
							
							
								
								Update gpu pip install oneapi doc ( #10137 )  
							
							 
							
							
							
							
							* fix link
* fix
* fix
* minor 
							
						 
						
							2024-02-09 11:27:40 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									yb-peng 
								
							 
						 
						
							
							
							
							
								
							
							
								b7c5104d98 
								
							 
						 
						
							
							
								
								remove limit in harness run ( #10139 )  
							
							 
							
							
							
						 
						
							2024-02-09 11:20:53 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									yb-peng 
								
							 
						 
						
							
							
							
							
								
							
							
								b4dc33def6 
								
							 
						 
						
							
							
								
								In harness-evaluation workflow, add statistical tables ( #10118 )  
							
							 
							
							
							
							
							* change storage
* fix typo
* change label
* change label to arc03
* change needs in the last step
* add generate csv in harness/make_table_results.py
* modify needs in the last job
* add csv to html
* fix path issue in llm-harness-summary-nightly
* modify output_path
* modify args in make_table_results.py
* modify make table command in summary
* change pr env label
* remove irrelevant code in summary; add set output path step; add limit in harness run
* re-organize code structure
* modify limit in run harness
* modify csv_to_html input path
* modify needs in summary-nightly 
							
						 
						
							2024-02-08 19:01:05 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Shaojun Liu 
								
							 
						 
						
							
							
							
							
								
							
							
								c2378a9546 
								
							 
						 
						
							
							
								
								Fix code scanning issues ( #10129 )  
							
							 
							
							
							
							
							* Fix code scanning issues
* update oneccl_bind_pt link
* update
* update
---------
Co-authored-by: Your Name <Your Email> 
							
						 
						
							2024-02-08 17:19:44 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								4d33aac7f9 
								
							 
						 
						
							
							
								
								quick fix qwen2 fp8 kv cache ( #10135 )  
							
							 
							
							
							
						 
						
							2024-02-08 17:04:59 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Cengguang Zhang 
								
							 
						 
						
							
							
							
							
								
							
							
								39d90839aa 
								
							 
						 
						
							
							
								
								LLM: add quantize kv cache for llama. ( #10086 )  
							
							 
							
							... 
							
							
							
							* feat: add quantize kv cache for llama.
* fix style.
* add quantized attention forward function.
* revert style.
* fix style.
* fix style.
* update quantized kv cache and add quantize_qkv
* fix style.
* fix style.
* optimize quantize kv cache.
* fix style. 
							
						 
						
							2024-02-08 16:49:22 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
							
							
								
							
							
								d848efe17c 
								
							 
						 
						
							
							
								
								add quantize kv cache support for qwen2 ( #10134 )  
							
							 
							
							
							
						 
						
							2024-02-08 16:17:21 +08:00