Yina Chen | 670ad887fc | Qwen support compress kv (#11680)
* Qwen support compress kv
* fix style
* fix
| 2024-07-30 11:16:42 +08:00 |
hxsz1997 | 9b36877897 | disable default quantize_kv of GQA on MTL (#11679)
* disable default quantizekv of gqa in mtl
* fix stype
* fix stype
* fix stype
* fix stype
* fix stype
* fix stype
| 2024-07-30 09:38:46 +08:00 |
Yina Chen | fc7f8feb83 | Support compress kv (#11642)
* mistral snapkv
* update
* mtl update
* update
* update
* update
* add comments
* style fix
* fix style
* support llama
* llama use compress kv
* support mistral 4.40
* fix style
* support diff transformers versions
* move snapkv util to kv
* fix style
* meet comments & small fix
* revert all in one
* fix indent
---------
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
| 2024-07-26 16:02:00 +08:00 |
Cengguang Zhang | 70ab1a6f1a | LLM: unify memory optimization env variables. (#11549)
* LLM: unify memory optimization env variables.
* fix comments.
| 2024-07-11 11:01:28 +08:00 |
Guoqiong Song | c44b1942ed | fix mistral for transformers>=4.39 (#11191)
* fix mistral for transformers>=4.39
| 2024-06-18 13:39:35 -07:00 |
Yina Chen | b6b70d1ba0 | Divide core-xe packages (#11131)
* temp
* add batch
* fix style
* update package name
* fix style
* add workflow
* use temp version to run uts
* trigger performance test
* trigger win igpu perf
* revert workflow & setup
| 2024-05-28 12:00:18 +08:00 |
Yishuo Wang | 170e3d65e0 | use new sdp and fp32 sdp (#11007) | 2024-05-14 14:29:18 +08:00 |
Cengguang Zhang | cfed76b2ed | LLM: add long-context support for Qwen1.5-7B/Baichuan2-7B/Mistral-7B. (#10937)
* LLM: add split tensor support for baichuan2-7b and qwen1.5-7b.
* fix style.
* fix style.
* fix style.
* add support for mistral and fix condition threshold.
* fix style.
* fix comments.
| 2024-05-10 16:40:15 +08:00 |
Yishuo Wang | d884c62dc4 | remove new_layout parameter (#10906) | 2024-04-29 10:31:50 +08:00 |
Yina Chen | dc27b3bc35 | Use sdp when rest token seq_len > 1 in llama & mistral (for lookup & spec) (#10790)
* update sdp condition
* update
* fix
* update & test llama
* mistral
* fix style
* update
* fix style
* remove pvc constrain
* update ds on arc
* fix style
| 2024-04-24 17:24:01 +08:00 |
Xin Qiu | e764f9b1b1 | Disable fast fused rope on UHD (#10780)
* use decoding fast path
* update
* update
* cleanup
| 2024-04-18 10:03:53 +08:00 |
Cengguang Zhang | 3e2662c87e | LLM: fix get env KV_CACHE_ALLOC_BLOCK_LENGTH type. (#10771) | 2024-04-16 09:32:30 +08:00 |
Yishuo Wang | 8086554d33 | use new fp16 sdp in llama and mistral (#10734) | 2024-04-12 10:49:02 +08:00 |
Keyan (Kyrie) Zhang | 585c174e92 | Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables (#10707)
* Read the value of KV_CACHE_ALLOC_BLOCK_LENGTH from the environment variables.
* Fix style
| 2024-04-10 10:48:46 +08:00 |
Yina Chen | c7422712fc | mistral 4.36 use fp16 sdp (#10704) | 2024-04-09 13:50:33 +08:00 |
Yang Wang | 5a1f446d3c | support fp8 in xetla (#10555)
* support fp8 in xetla
* change name
* adjust model file
* support convert back to cpu
* factor
* fix bug
* fix style
| 2024-04-08 13:22:09 -07:00 |
Jiao Wang | 69bdbf5806 | Fix vllm print error message issue (#10664)
* update chatglm readme
* Add condition to invalidInputError
* update
* update
* style
| 2024-04-05 15:08:13 -07:00 |
binbin Deng | 0a3e4e788f | LLM: fix mistral hidden_size setting for deepspeed autotp (#10527) | 2024-03-26 10:55:44 +08:00 |
Xin Qiu | 1dd40b429c | enable fp4 fused mlp and qkv (#10531)
* enable fp4 fused mlp and qkv
* update qwen
* update qwen2
| 2024-03-26 08:34:00 +08:00 |
Wang, Jian4 | 9df70d95eb | Refactor bigdl.llm to ipex_llm (#24)
* Rename bigdl/llm to ipex_llm
* rm python/llm/src/bigdl
* from bigdl.llm to from ipex_llm
| 2024-03-22 15:41:21 +08:00 |