Chu,Youcheng
ffa9a9e1b3
Update streaming in npu examples (#12495)
* feat: add streaming
* Update readme accordingly
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>
2024-12-04 17:51:10 +08:00

Jin, Qiao
7082844f3f
Fix NPU LLM example save/load tokenizer (#12485)
2024-12-03 16:30:55 +08:00

binbin Deng
ab01753b1c
[NPU] update save-load API usage (#12473)
2024-12-03 09:46:15 +08:00

binbin Deng
c911026f03
[NPU C++] Update model support & examples & benchmark (#12466)
2024-11-29 13:35:58 +08:00

Yina Chen
e246f1e258
update llama3 npu example (#11933)
2024-08-27 13:03:18 +08:00

Zijie Li
6c3eb1e1e8
refactor from_pretrained API for NPU (#11927)
2024-08-27 09:50:30 +08:00

SONG Ge
8c5c7f32dd
Update doc for running npu generate example with ipex-llm[npu] (#11876)
* update doc for running npu generate example with ipex-llm[npu]
* switch max_prompt_len to 512 to fix compile error on mtl
2024-08-21 13:45:29 +08:00

SONG Ge
7380823f3f
Update Llama2 multi-processes example (#11852)
* update llama2 multi-processes examples
* update
* update readme
* update
2024-08-19 19:49:01 +08:00

Yang Wang
99b05ba1dc
separate prefill into a process (#11787)
* separate prefill into a process
* using model.share_memory()
* might work
* worked
* use long prompt
* refactor
* cleanup
* fix bug
* clean up
* changeable inter and intra process stages
* refactor
* add max output len
* fix npu_model changes that may cause generate down
* fix npu_model generate import error
* fix generate forward error
---------
Co-authored-by: sgwhat <ge.song@intel.com>
2024-08-19 17:53:36 +08:00

Yang Wang
51bcac1229
follow up on experimental support of fused decoder layer for llama2 (#11785)
* clean up and support transpose value cache
* refine
* fix style
* fix style
2024-08-13 18:53:55 -07:00

binbin Deng
23d3acdc77
Add experimental support of fused decoder layer for llama2 (#11768)
2024-08-13 14:41:36 +08:00