Chu,Youcheng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ffa9a9e1b3 
								
							 
						 
						
							
							
								
								Update streaming in npu examples ( #12495 )  
							
							 
							
							
							
							
							* feat: add streaming
* Update readme accordingly
---------
Co-authored-by: Yuwen Hu <yuwen.hu@intel.com> 
							
						 
						
							2024-12-04 17:51:10 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a9e3f7f14c 
								
							 
						 
						
							
							
								
								optimize minicpm ( #12496 )  
							
							 
							
							
							
						 
						
							2024-12-04 17:14:16 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									joan726 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ae9c2154f4 
								
							 
						 
						
							
							
								
								Added cross-links ( #12494 )  
							
							 
							
							
							
							
							* Update install_linux_gpu.zh-CN.md
Add the link for guide of windows installation.
* Update install_windows_gpu.zh-CN.md
Add the link for guide of linux installation.
* Update install_windows_gpu.md
Add the link for guide of Linux installation.
* Update install_linux_gpu.md
Add the link for guide of Windows installation.
* Update install_linux_gpu.md
Modify based on comments.
* Update install_windows_gpu.md
Modify based on comments 
							
						 
						
							2024-12-04 16:53:13 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e0bf0054e1 
								
							 
						 
						
							
							
								
								small fix ( #12493 )  
							
							 
							
							
							
						 
						
							2024-12-04 16:37:39 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Kai Huang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7ff4533b39 
								
							 
						 
						
							
							
								
								Support hf generate ( #12477 )  
							
							 
							
							
							
							
							* generate
* style
* update
* remove timing
* style
* style
* combine generate api
* simple in kwargs 
							
						 
						
							2024-12-04 16:31:09 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ef4028ac2d 
								
							 
						 
						
							
							
								
								[NPU] Support split lm_head for Qwen2 with CPP ( #12491 )  
							
							 
							
							
							
							
							* Use split for Qwen2 lm_head instead of slice in optimize_pre
* Support split lm_head in Qwen2 python cpp backend
* Fit with Python acc lib pipeline
* Removed default mixed_precision=True in all-in-one and related examples
* Small fix
* Style fix
* Fix based on comments
* Fix based on comments
* Style fix 
							
						 
						
							2024-12-04 14:41:08 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5629fdd518 
								
							 
						 
						
							
							
								
								optimize qwen2_vl multiple image input or video input ( #12487 )  
							
							 
							
							
							
						 
						
							2024-12-04 09:24:38 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c59284418c 
								
							 
						 
						
							
							
								
								Hotfix of BCE-Embedding model ( #12490 )  
							
							 
							
							
							
						 
						
							2024-12-03 18:16:04 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jason Dai 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								80f15e41f5 
								
							 
						 
						
							
							
								
								Update README.md ( #12489 )  
							
							 
							
							
							
						 
						
							2024-12-03 18:02:28 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4ac66db034 
								
							 
						 
						
							
							
								
								[NPU] Support streaming in Python (cpp backend) ( #12488 )  
							
							 
							
							
							
							
							* Support streaming in NPU Python (cpp backend)
* Small fix 
							
						 
						
							2024-12-03 17:17:26 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin, Qiao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7082844f3f 
								
							 
						 
						
							
							
								
								Fix NPU LLM example save/load tokenizer ( #12485 )  
							
							 
							
							
							
						 
						
							2024-12-03 16:30:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin, Qiao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								5fe766788e 
								
							 
						 
						
							
							
								
								Fix MiniCPM-V-2_6 running on NPU ( #12486 )  
							
							 
							
							
							
						 
						
							2024-12-03 16:16:29 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								598603bea6 
								
							 
						 
						
							
							
								
								small fix of imatrix ( #12480 )  
							
							 
							
							
							
						 
						
							2024-12-03 10:46:36 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ab01753b1c 
								
							 
						 
						
							
							
								
								[NPU] update save-load API usage ( #12473 )  
							
							 
							
							
							
						 
						
							2024-12-03 09:46:15 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								26adb82ee3 
								
							 
						 
						
							
							
								
								[NPU] Remove hard code ( #12479 )  
							
							 
							
							
							
						 
						
							2024-12-02 18:26:07 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b2e56a2e03 
								
							 
						 
						
							
							
								
								Add release support for option xpu_arc ( #12422 )  
							
							 
							
							
							
							
							* Add release support for xpu-arc
* Dependency update 
							
						 
						
							2024-12-02 17:16:04 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								aee9acb303 
								
							 
						 
						
							
							
								
								Add NPU QuickStart & update example links ( #12470 )  
							
							 
							
							
							
							
							* Add initial NPU quickstart (c++ part unfinished)
* Small update
* Update based on comments
* Update main readme
* Remove LLaMA description
* Small fix
* Small fix
* Remove subsection link in main README
* Small fix
* Update based on comments
* Small fix
* TOC update and other small fixes
* Update for Chinese main readme
* Update based on comments and other small fixes
* Change order 
							
						 
						
							2024-12-02 17:03:10 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin, Qiao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								31c69a8d31 
								
							 
						 
						
							
							
								
								Fix MiniCPM-V models running on NPU ( #12478 )  
							
							 
							
							
							
						 
						
							2024-12-02 16:29:46 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								54d9a590d4 
								
							 
						 
						
							
							
								
								[NPU] Fix eos_token setting ( #12475 )  
							
							 
							
							
							
						 
						
							2024-12-02 14:18:22 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Guancheng Fu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								59bd4a214f 
								
							 
						 
						
							
							
								
								add vLLM glm4 fix ( #12474 )  
							
							 
							
							
							
						 
						
							2024-12-02 14:05:16 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4b6c3160be 
								
							 
						 
						
							
							
								
								Support imatrix-guided quantization for NPU CW ( #12468 )  
							
							 
							
							
							
							
							* init commit
* remove print
* add interface
* fix
* fix
* fix style 
							
						 
						
							2024-12-02 11:31:26 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f99f188023 
								
							 
						 
						
							
							
								
								Hotfix of benchmark script ( #12467 )  
							
							 
							
							
							
						 
						
							2024-11-29 14:00:59 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c911026f03 
								
							 
						 
						
							
							
								
								[NPU C++] Update model support & examples & benchmark ( #12466 )  
							
							 
							
							
							
						 
						
							2024-11-29 13:35:58 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								14d8d3d8af 
								
							 
						 
						
							
							
								
								Integrate NPU C++ implementation into ipex-llm ( #12461 )  
							
							 
							
							
							
						 
						
							2024-11-29 09:25:37 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								490bb0ca53 
								
							 
						 
						
							
							
								
								[NPU] update fused layers for GW ( #12459 )  
							
							 
							
							
							
							
							* update fused layers for GW
* fix
* fix llama condition for glm model
* update 
							
						 
						
							2024-11-28 17:14:30 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1b533a105c 
								
							 
						 
						
							
							
								
								[NPU] Add env to enable scale search ( #12462 )  
							
							 
							
							
							
							
							* add env enable scale search
* address comment
* move logic 
							
						 
						
							2024-11-28 17:06:00 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Heyang Sun 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d272f6b471 
								
							 
						 
						
							
							
								
								remove nf4 unsupport comment in cpu finetuning ( #12460 )  
							
							 
							
							
							
							
							Co-authored-by: Ariadne <wyn2000330@126.com> 
							
						 
						
							2024-11-28 13:26:46 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b29da30205 
								
							 
						 
						
							
							
								
								[NPU] Update C++ L0 ( #12458 )  
							
							 
							
							
							
							
							* update
* fix style 
							
						 
						
							2024-11-27 22:08:48 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a2272b70d3 
								
							 
						 
						
							
							
								
								Small fix in llama.cpp troubleshooting guide ( #12457 )  
							
							 
							
							
							
						 
						
							2024-11-27 19:22:11 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6f3441ba4c 
								
							 
						 
						
							
							
								
								fix glm4-9b overflow ( #12455 )  
							
							 
							
							
							
						 
						
							2024-11-27 17:39:13 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								281c9b0bb9 
								
							 
						 
						
							
							
								
								[NPU] Add L0 support for NPU C++ ( #12454 )  
							
							 
							
							
							
							
							* add L0 models support
* meet review
* fix style 
							
						 
						
							2024-11-27 17:04:13 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chu,Youcheng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ce6fcaa9ba 
								
							 
						 
						
							
							
								
								update transformers version in example of glm4 ( #12453 )  
							
							 
							
							
							
							
							* fix: update transformers version in example of glm4
* fix: textual adjustments
* fix: textual adjustment 
							
						 
						
							2024-11-27 15:02:25 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								effb9bb41c 
								
							 
						 
						
							
							
								
								Small update to LangChain examples readme ( #12452 )  
							
							 
							
							
							
						 
						
							2024-11-27 14:02:25 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Chu,Youcheng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								acd77d9e87 
								
							 
						 
						
							
							
								
								Remove env variable BIGDL_LLM_XMX_DISABLED in documentation ( #12445 )  
							
							 
							
							
							
							
							* fix: remove BIGDL_LLM_XMX_DISABLED in mddocs
* fix: remove set SYCL_CACHE_PERSISTENT=1 in example
* fix: remove BIGDL_LLM_XMX_DISABLED in workflows
* fix: merge igpu and A-series Graphics
* fix: remove set BIGDL_LLM_XMX_DISABLED=1 in example
* fix: remove BIGDL_LLM_XMX_DISABLED in workflows
* fix: merge igpu and A-series Graphics
* fix: textual adjustment
* fix: textual adjustment
* fix: textual adjustment 
							
						 
						
							2024-11-27 11:16:36 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								f8c2bb2943 
								
							 
						 
						
							
							
								
								[NPU] optimize qwen2 prefill performance for C++ ( #12451 )  
							
							 
							
							
							
						 
						
							2024-11-27 10:46:18 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Guancheng Fu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8331875f34 
								
							 
						 
						
							
							
								
								Fix ( #12390 )  
							
							 
							
							
							
						 
						
							2024-11-27 10:41:58 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jun Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								cb7b08948b 
								
							 
						 
						
							
							
								
								update vllm-docker-quick-start for vllm0.6.2 ( #12392 )  
							
							 
							
							
							
							
							* update vllm-docker-quick-start for vllm0.6.2
* [UPDATE] rm max-num-seqs parameter in vllm-serving script 
							
						 
						
							2024-11-27 08:47:03 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7b40f9b372 
								
							 
						 
						
							
							
								
								[NPU] Support GW for NPU C++ ( #12450 )  
							
							 
							
							
							
						 
						
							2024-11-26 17:46:40 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jin, Qiao 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c2efa264d9 
								
							 
						 
						
							
							
								
								Update LangChain examples to use upstream ( #12388 )  
							
							 
							
							
							
							
							* Update LangChain examples to use upstream
* Update README and fix links
* Update LangChain CPU examples to use upstream
* Update LangChain CPU voice_assistant example
* Update CPU README
* Update GPU README
* Remove GPU Langchain vLLM example and fix comments
* Change langchain -> LangChain
* Add reference for both upstream llms and embeddings
* Fix comments
* Fix comments
* Fix comments
* Fix comments
* Fix comment 
							
						 
						
							2024-11-26 16:43:15 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								24b46b2b19 
								
							 
						 
						
							
							
								
								[NPU] further fix of qwen2 int8 pipeline & C++ ( #12449 )  
							
							 
							
							
							
							
							* fix
* fix style 
							
						 
						
							2024-11-26 16:39:39 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								303b104c10 
								
							 
						 
						
							
							
								
								Fix abnormal output for Qwen2-7B when sym_int8 ( #12446 )  
							
							 
							
							
							
						 
						
							2024-11-26 15:53:04 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Pepijn de Vos 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								71e1f11aa6 
								
							 
						 
						
							
							
								
								update serving image runtime ( #12433 )  
							
							 
							
							
							
						 
						
							2024-11-26 14:55:30 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								52c17fe104 
								
							 
						 
						
							
							
								
								Optimize first token of C++ NPU by adding npu_dpu_groups ( #12443 )  
							
							 
							
							
							
							
							* add npu_dpu_groups
* add check for env
* fix style 
							
						 
						
							2024-11-26 11:41:32 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jinhe 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								66bd7abae4 
								
							 
						 
						
							
							
								
								add sdxl and lora-lcm optimization ( #12444 )  
							
							 
							
							
							
							
							* add sdxl and lora-lcm optimization
* fix openjourney speed drop 
							
						 
						
							2024-11-26 11:38:09 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0e23bd779f 
								
							 
						 
						
							
							
								
								Add support of llama3.2 for NPU C++ ( #12442 )  
							
							 
							
							
							
							
							* initial support of llama3.2
* update
* update
* fix style
* fix style
* fix
* small fix 
							
						 
						
							2024-11-26 09:26:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								cdd41f5e4c 
								
							 
						 
						
							
							
								
								optimize sdxl again ( #12441 )  
							
							 
							
							
							
						 
						
							2024-11-25 17:46:46 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b9abb8a285 
								
							 
						 
						
							
							
								
								Support qwen2.5 3B for NPU & update related examples ( #12438 )  
							
							 
							
							
							
							
							* update qwen2.5-3B
* update convert
* small fix
* replace load_in_low_bit with low_bit
* small fix 
							
						 
						
							2024-11-25 16:38:31 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jinhe 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b633fbf26c 
								
							 
						 
						
							
							
								
								add Chinese prompt troubleshooting for npu cpp examples ( #12437 )  
							
							 
							
							
							
							
							* add chinese prompt troubleshooting
* add chinese prompt troubleshooting 
							
						 
						
							2024-11-25 15:28:47 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8164aed802 
								
							 
						 
						
							
							
								
								small change ( #12439 )  
							
							 
							
							
							
						 
						
							2024-11-25 14:35:49 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								be132c4209 
								
							 
						 
						
							
							
								
								fix and optimize sd ( #12436 )  
							
							 
							
							
							
						 
						
							2024-11-25 14:09:48 +08:00