Heyang Sun
d272f6b471
remove nf4 unsupport comment in cpu finetuning (#12460)
Co-authored-by: Ariadne <wyn2000330@126.com>
2024-11-28 13:26:46 +08:00

Ruonan Wang
b29da30205
[NPU] Update C++ L0 (#12458)
* update
* fix style
2024-11-27 22:08:48 +08:00

Yuwen Hu
a2272b70d3
Small fix in llama.cpp troubleshooting guide (#12457)
2024-11-27 19:22:11 +08:00

Yishuo Wang
6f3441ba4c
fix glm4-9b overflow (#12455)
2024-11-27 17:39:13 +08:00

Ruonan Wang
281c9b0bb9
[NPU] Add L0 support for NPU C++ (#12454)
* add L0 models support
* meet review
* fix style
2024-11-27 17:04:13 +08:00

Chu,Youcheng
ce6fcaa9ba
update transformers version in example of glm4 (#12453)
* fix: update transformers version in example of glm4
* fix: textual adjustments
* fix: textual adjustment
2024-11-27 15:02:25 +08:00

Yuwen Hu
effb9bb41c
Small update to LangChain examples readme (#12452)
2024-11-27 14:02:25 +08:00

Chu,Youcheng
acd77d9e87
Remove env variable BIGDL_LLM_XMX_DISABLED in documentation (#12445)
* fix: remove BIGDL_LLM_XMX_DISABLED in mddocs
* fix: remove set SYCL_CACHE_PERSISTENT=1 in example
* fix: remove BIGDL_LLM_XMX_DISABLED in workflows
* fix: merge igpu and A-series Graphics
* fix: remove set BIGDL_LLM_XMX_DISABLED=1 in example
* fix: remove BIGDL_LLM_XMX_DISABLED in workflows
* fix: merge igpu and A-series Graphics
* fix: textual adjustment
* fix: textual adjustment
* fix: textual adjustment
2024-11-27 11:16:36 +08:00

Ruonan Wang
f8c2bb2943
[NPU] optimize qwen2 prefill performance for C++ (#12451)
2024-11-27 10:46:18 +08:00

Guancheng Fu
8331875f34
Fix (#12390)
2024-11-27 10:41:58 +08:00

Jun Wang
cb7b08948b
update vllm-docker-quick-start for vllm0.6.2 (#12392)
* update vllm-docker-quick-start for vllm0.6.2
* [UPDATE] rm max-num-seqs parameter in vllm-serving script
2024-11-27 08:47:03 +08:00

Ruonan Wang
7b40f9b372
[NPU] Support GW for NPU C++ (#12450)
2024-11-26 17:46:40 +08:00

Jin, Qiao
c2efa264d9
Update LangChain examples to use upstream (#12388)
* Update LangChain examples to use upstream
* Update README and fix links
* Update LangChain CPU examples to use upstream
* Update LangChain CPU voice_assistant example
* Update CPU README
* Update GPU README
* Remove GPU Langchain vLLM example and fix comments
* Change langchain -> LangChain
* Add reference for both upstream llms and embeddings
* Fix comments
* Fix comments
* Fix comments
* Fix comments
* Fix comment
2024-11-26 16:43:15 +08:00

Ruonan Wang
24b46b2b19
[NPU] further fix of qwen2 int8 pipeline & C++ (#12449)
* fix
* fix style
2024-11-26 16:39:39 +08:00

Yuwen Hu
303b104c10
Fix abnormal output for Qwen2-7B when sym_int8 (#12446)
2024-11-26 15:53:04 +08:00

Pepijn de Vos
71e1f11aa6
update serving image runtime (#12433)
2024-11-26 14:55:30 +08:00

Ruonan Wang
52c17fe104
Optimize first token of C++ NPU by adding npu_dpu_groups (#12443)
* add npu_dpu_groups
* add check for env
* fix style
2024-11-26 11:41:32 +08:00

Jinhe
66bd7abae4
add sdxl and lora-lcm optimization (#12444)
* add sdxl and lora-lcm optimization
* fix openjourney speed drop
2024-11-26 11:38:09 +08:00

Ruonan Wang
0e23bd779f
Add support of llama3.2 for NPU C++ (#12442)
* initial support of llama3.2
* update
* update
* fix style
* fix style
* fix
* small fix
2024-11-26 09:26:55 +08:00

Yishuo Wang
cdd41f5e4c
optimize sdxl again (#12441)
2024-11-25 17:46:46 +08:00

Ruonan Wang
b9abb8a285
Support qwen2.5 3B for NPU & update related examples (#12438)
* update qwen2.5-3B
* update convert
* small fix
* replace load_in_low_bit with low_bit
* small fix
2024-11-25 16:38:31 +08:00

Jinhe
b633fbf26c
add chinese prompt troubleshooting for npu cpp examples (#12437)
* add chinese prompt troubleshooting
* add chinese prompt troubleshooting
2024-11-25 15:28:47 +08:00

Yishuo Wang
8164aed802
small change (#12439)
2024-11-25 14:35:49 +08:00

Yishuo Wang
be132c4209
fix and optimize sd (#12436)
2024-11-25 14:09:48 +08:00

Ruonan Wang
f41405368a
Support minicpm for NPU C++ (#12434)
* support minicpm-1b
* update
* tune fused_layers
* update readme.md
2024-11-25 10:42:02 +08:00

Ruonan Wang
0819fad34e
support Llama2-7B / Llama3-8B for NPU C++ (#12431)
* support llama2
* update
* support fused_layers=4 for Llama2-7B
2024-11-22 18:47:19 +08:00

Ruonan Wang
4ffa6c752c
New convert support for C++ NPU (#12430)
* initial commit
* fix
* fix style
* fix style
* fix
* fix
2024-11-22 14:28:30 +08:00

Shaojun Liu
c089b6c10d
Update english prompt to 34k (#12429)
2024-11-22 11:20:35 +08:00

Yuwen Hu
e61ae88c5b
Upgrade dependency for xpu_lnl and xpu_arl option (#12424)
2024-11-21 18:37:15 +08:00

Ruonan Wang
2935e97610
small fix of cpp readme (#12425)
2024-11-21 18:21:34 +08:00

Yuwen Hu
8fdc36c140
Optimize with new batch kernel when batch_size=1 on LNL (#12419)
* Add use batch kernel condition for LNL
* Fix for other device judgement
* Fix based on comment
2024-11-21 16:21:35 +08:00

Jinhe
7e0a840f74
add optimization to openjourney (#12423)
* add optimization to openjourney
* add optimization to openjourney
2024-11-21 15:23:51 +08:00

Yishuo Wang
145e8b480f
update batch kernel condition (#12421)
2024-11-21 10:12:46 +08:00

Ruonan Wang
7288c759ce
Initial NPU C++ Example (#12417)
* temp save
* meet review, update
* update
* meet review, add license
* typo
2024-11-21 10:09:26 +08:00

Jinhe
d2a37b6ab2
add Stable diffusion examples (#12418)
* add openjourney example
* add timing
* add stable diffusion to model page
* 4.1 fix
* small fix
2024-11-20 17:18:36 +08:00

Ruonan Wang
54c62feb74
[NPU] dump prefill IR for further C++ solution (#12402)
* save prefill ir
* fix
* shorten convert time
* fix
* fix
* fix
* fix
* fix style
* dump config.json
* meet review
* small fix
2024-11-20 15:20:05 +08:00

Wang, Jian4
1bfcbc0640
Add multimodal benchmark (#12415)
* add benchmark multimodal
* update
* update
* update
2024-11-20 14:21:13 +08:00

SONG Ge
ff3f7cb25f
Fix speech_paraformer issue with unexpected changes (#12416)
* Fix speech_paraformer issue with unexpected changes
* Add paraformer version specified
2024-11-19 15:01:20 +08:00

joan726
a9cb70a71c
Add install_windows_gpu.zh-CN.md and install_linux_gpu.zh-CN.md (#12409)
* Add install_linux_gpu.zh-CN.md
* Add install_windows_gpu.zh-CN.md
* Update llama_cpp_quickstart.zh-CN.md
Related links updated to zh-CN version.
* Update install_linux_gpu.zh-CN.md
Added link to English version.
* Update install_windows_gpu.zh-CN.md
Add the link to English version.
* Update install_windows_gpu.md
Add the link to CN version.
* Update install_linux_gpu.md
Add the link to CN version.
* Update README.zh-CN.md
Modified the related link to zh-CN version.
2024-11-19 14:39:53 +08:00

Guancheng Fu
d6057f6dd2
Update benchmark_vllm_throughput.py (#12414)
2024-11-19 10:41:43 +08:00

Yuwen Hu
a69395f31f
Support performance mode of GLM4 model (#12401)
* Initial support of prepare generation args for transformers 445
* Small fix to chatglm4 model optimization
* Small fix
* fix glm4 position id
* fix glm4 error
* Small change in condition & fix based on comments
* Style fixes
---------
Co-authored-by: cyita <yitastudy@gmail.com>
2024-11-18 18:46:52 +08:00

Song Fuchang
d2c821d458
Add missing arguments in pipeline parallel generate method (#12142)
Add two arguments: negative_prompt_ids and negative_prompt_attention_mask to generate method in pipeline_parallel.py.
2024-11-18 13:50:18 +08:00

Yishuo Wang
3d5fbf2069
update batch kernel condition (#12408)
2024-11-15 13:47:05 +08:00

Ruonan Wang
6c5e8fc70c
fix again (#12407)
2024-11-15 11:57:58 +08:00

Ruonan Wang
fcc0fa7316
fix workflow again (#12406)
* fix again
* fix name
2024-11-15 11:01:35 +08:00

Yuwen Hu
d1cde7fac4
Tiny doc fix (#12405)
2024-11-15 10:28:38 +08:00

Ruonan Wang
548dec5185
fix npu pipeline workflow (#12404)
2024-11-15 10:01:33 +08:00

binbin Deng
d4d949443f
[NPU] change attention_mask to fp16 (#12400)
2024-11-14 17:20:29 +08:00

Qiyuan Gong
7e50ff113c
Add padding_token=eos_token for GPU trl QLora example (#12398)
* Avoid tokenizer doesn't have a padding token error.
2024-11-14 10:51:30 +08:00

SONG Ge
d2cbcb060c
Add initial support for modeling_xlm encoder on NPU (#12393)
* Add initial support for modeling_xlm encoder on NPU
* Add EmbeddingModel class to keep the same usage with bce and npu fp16 linear convert
* Optimize current implementation to support EmbeddingModel.encode API and convert other torch modules to NPU
* Add related example and documents
2024-11-14 10:50:27 +08:00