Jinhe 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7e0a840f74 
								
							 
						 
						
							
							
								
								add optimization to openjourney ( #12423 )  
							
							 
							
							
							
							
							* add optimization to openjourney
							
						 
						
							2024-11-21 15:23:51 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								145e8b480f 
								
							 
						 
						
							
							
								
								update batch kernel condition ( #12421 )  
							
							 
							
							
							
						 
						
							2024-11-21 10:12:46 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7288c759ce 
								
							 
						 
						
							
							
								
								Initial NPU C++ Example ( #12417 )  
							
							 
							
							
							
							
							* temp save
* meet review, update
* update
* meet review, add license
* typo 
							
						 
						
							2024-11-21 10:09:26 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jinhe 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d2a37b6ab2 
								
							 
						 
						
							
							
								
								add Stable diffusion examples ( #12418 )  
							
							 
							
							
							
							
							* add openjourney example
* add timing
* add stable diffusion to model page
* 4.1 fix
* small fix 
							
						 
						
							2024-11-20 17:18:36 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								54c62feb74 
								
							 
						 
						
							
							
								
								[NPU] dump prefill IR for further C++ solution ( #12402 )  
							
							 
							
							
							
							
							* save prefill ir
* fix
* shorten convert time
* fix
* fix
* fix
* fix
* fix style
* dump config.json
* meet review
* small fix 
							
						 
						
							2024-11-20 15:20:05 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wang, Jian4 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1bfcbc0640 
								
							 
						 
						
							
							
								
								Add multimodal benchmark  ( #12415 )  
							
							 
							
							
							
							
							* add benchmark multimodal
* update
* update
* update 
							
						 
						
							2024-11-20 14:21:13 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									SONG Ge 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ff3f7cb25f 
								
							 
						 
						
							
							
								
								Fix speech_paraformer issue with unexpected changes ( #12416 )  
							
							 
							
							
							
							
							* Fix speech_paraformer issue with unexpected changes
* Add paraformer version specified 
							
						 
						
							2024-11-19 15:01:20 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									joan726 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a9cb70a71c 
								
							 
						 
						
							
							
								
								Add install_windows_gpu.zh-CN.md and install_linux_gpu.zh-CN.md ( #12409 )  
							
							 
							
							
							
							
							* Add install_linux_gpu.zh-CN.md
* Add install_windows_gpu.zh-CN.md
* Update llama_cpp_quickstart.zh-CN.md
Related links updated to zh-CN version.
* Update install_linux_gpu.zh-CN.md
Added link to English version.
* Update install_windows_gpu.zh-CN.md
Add the link to English version.
* Update install_windows_gpu.md
Add the link to CN version.
* Update install_linux_gpu.md
Add the link to CN version.
* Update README.zh-CN.md
Modified the related link to zh-CN version. 
							
						 
						
							2024-11-19 14:39:53 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Guancheng Fu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d6057f6dd2 
								
							 
						 
						
							
							
								
								Update benchmark_vllm_throughput.py ( #12414 )  
							
							 
							
							
							
						 
						
							2024-11-19 10:41:43 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a69395f31f 
								
							 
						 
						
							
							
								
								Support performance mode of GLM4 model ( #12401 )  
							
							 
							
							
							
							
							* Initial support of prepare generation args for transformers 445
* Small fix to chatglm4 model optimization
* Small fix
* fix glm4 position id
* fix glm4 error
* Small change in condition & fix based on comments
* Style fixes
---------
Co-authored-by: cyita <yitastudy@gmail.com> 
							
						 
						
							2024-11-18 18:46:52 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Song Fuchang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d2c821d458 
								
							 
						 
						
							
							
								
								Add missing arguments in pipeline parallel generate method ( #12142 )  
							
							 
							
							
							
							
							Add two arguments: negative_prompt_ids and negative_prompt_attention_mask to generate method in pipeline_parallel.py. 
							
						 
						
							2024-11-18 13:50:18 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								3d5fbf2069 
								
							 
						 
						
							
							
								
								update batch kernel condition ( #12408 )  
							
							 
							
							
							
						 
						
							2024-11-15 13:47:05 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6c5e8fc70c 
								
							 
						 
						
							
							
								
								fix again ( #12407 )  
							
							 
							
							
							
						 
						
							2024-11-15 11:57:58 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fcc0fa7316 
								
							 
						 
						
							
							
								
								fix workflow again ( #12406 )  
							
							 
							
							
							
							
							* fix again
* fix name 
							
						 
						
							2024-11-15 11:01:35 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d1cde7fac4 
								
							 
						 
						
							
							
								
								Tiny doc fix ( #12405 )  
							
							 
							
							
							
						 
						
							2024-11-15 10:28:38 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								548dec5185 
								
							 
						 
						
							
							
								
								fix npu pipeline workflow ( #12404 )  
							
							 
							
							
							
						 
						
							2024-11-15 10:01:33 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d4d949443f 
								
							 
						 
						
							
							
								
								[NPU] change attention_mask to fp16 ( #12400 )  
							
							 
							
							
							
						 
						
							2024-11-14 17:20:29 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Qiyuan Gong 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7e50ff113c 
								
							 
						 
						
							
							
								
								Add padding_token=eos_token for GPU trl QLora example ( #12398 )  
							
							 
							
							
							
							
							* Avoid the "tokenizer doesn't have a padding token" error. 
							
						 
						
							2024-11-14 10:51:30 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									SONG Ge 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d2cbcb060c 
								
							 
						 
						
							
							
								
								Add initial support for modeling_xlm encoder on NPU ( #12393 )  
							
							 
							
							
							
							
							* Add initial support for modeling_xlm encoder on NPU
* Add EmbeddingModel class to keep the same usage as bce and npu fp16 linear convert
* Optimize current implementation to support EmbeddingModel.encode API and convert other torch modules to NPU
* Add related example and documents 
							
						 
						
							2024-11-14 10:50:27 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xu, Shuo 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6726b198fd 
								
							 
						 
						
							
							
								
								Update readme & doc for the vllm upgrade to v0.6.2 ( #12399 )  
							
							 
							
							
							
							
							Co-authored-by: ATMxsp01 <shou.xu@intel.com> 
							
						 
						
							2024-11-14 10:28:15 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								59b01fa7d2 
								
							 
						 
						
							
							
								
								small fix ( #12397 )  
							
							 
							
							
							
						 
						
							2024-11-14 10:03:36 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								00fce5c940 
								
							 
						 
						
							
							
								
								use new q4_0 batch kernel ( #12396 )  
							
							 
							
							
							
						 
						
							2024-11-13 18:37:34 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d6d63d6b84 
								
							 
						 
						
							
							
								
								[NPU] Qwen prefill attn_mask type hotfix ( #12395 )  
							
							 
							
							
							
							
							* qwen prefill attn_mask type fp16
* update 
							
						 
						
							2024-11-13 17:51:34 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9220babaab 
								
							 
						 
						
							
							
								
								qwen prefill attn_mask type fp16 ( #12394 )  
							
							 
							
							
							
						 
						
							2024-11-13 17:45:26 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1158f91648 
								
							 
						 
						
							
							
								
								Fix llava with multi-image inputs ( #12384 )  
							
							 
							
							
							
						 
						
							2024-11-13 09:27:50 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Shaojun Liu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								27152476e1 
								
							 
						 
						
							
							
								
								minor fix ( #12389 )  
							
							 
							
							
							
						 
						
							2024-11-12 22:36:43 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xu, Shuo 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								dd8964ba9c 
								
							 
						 
						
							
							
								
								changed inference-cpp/Dockerfile ( #12386 )  
							
							 
							
							
							
							
							Co-authored-by: ATMxsp01 <shou.xu@intel.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com> 
							
						 
						
							2024-11-12 20:40:21 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Guancheng Fu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0ee54fc55f 
								
							 
						 
						
							
							
								
								Upgrade to vllm 0.6.2 ( #12338 )  
							
							 
							
							
							
							
							* Initial updates for vllm 0.6.2
* fix
* Change Dockerfile to support v062
* Fix
* fix examples
* Fix
* done
* fix
* Update engine.py
* Fix Dockerfile to original path
* fix
* add option
* fix
* fix
* fix
* fix
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com> 
							
						 
						
							2024-11-12 20:35:34 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jun Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4376fdee62 
								
							 
						 
						
							
							
								
								Decouple openwebui and ollama in inference-cpp-xpu dockerfile ( #12382 )  
							
							 
							
							
							
							
							* remove the openwebui in inference-cpp-xpu dockerfile
* update docker_cpp_xpu_quickstart.md
* add sample output in inference-cpp/readme
* remove the openwebui in main readme
							
						 
						
							2024-11-12 20:15:23 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Ruonan Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6bf5a8c230 
								
							 
						 
						
							
							
								
								[NPU] Update qwen2 compile config ( #12383 )  
							
							 
							
							
							
							
							* update
* fix 
							
						 
						
							2024-11-12 16:59:44 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7a97fbb779 
								
							 
						 
						
							
							
								
								Support vpm and resampler module of minicpm-v on NPU ( #12375 )  
							
							 
							
							
							
						 
						
							2024-11-12 15:59:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Wang, Jian4 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								85c9279e6e 
								
							 
						 
						
							
							
								
								Update llama-cpp docker usage ( #12387 )  
							
							 
							
							
							
						 
						
							2024-11-12 15:30:17 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Shaojun Liu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								c92d76b997 
								
							 
						 
						
							
							
								
								Update oneccl-binding.patch ( #12377 )  
							
							 
							
							
							
							
							* Add files via upload
* upload oneccl-binding.patch
* Update Dockerfile 
							
						 
						
							2024-11-11 22:34:08 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								e0918934c8 
								
							 
						 
						
							
							
								
								Add fused_mlp to glm4v models ( #12378 )  
							
							 
							
							
							
						 
						
							2024-11-11 17:10:25 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								dc34e8c51f 
								
							 
						 
						
							
							
								
								optimize glm4v vision attention ( #12369 )  
							
							 
							
							
							
						 
						
							2024-11-08 17:01:57 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Qiyuan Gong 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								2dfcc36825 
								
							 
						 
						
							
							
								
								Fix trl version and padding in trl qlora example ( #12368 )  
							
							 
							
							
							
							
							* Change trl to 0.9.6
* Enable padding to avoid padding related errors. 
							
						 
						
							2024-11-08 16:05:17 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Shaojun Liu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fad15c8ca0 
								
							 
						 
						
							
							
								
								Update fastchat demo script ( #12367 )  
							
							 
							
							
							
							
							* Update README.md
* Update vllm_docker_quickstart.md 
							
						 
						
							2024-11-08 15:42:17 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								51f7f87768 
								
							 
						 
						
							
							
								
								fix ipex 2.3 bug ( #12366 )  
							
							 
							
							
							
						 
						
							2024-11-08 13:29:15 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								b2e69a896c 
								
							 
						 
						
							
							
								
								[NPU] Support Baichuan groupwise & gw code refactor ( #12337 )  
							
							 
							
							
							
							
							* support minicpm 1b & qwen 1.5b gw
* support minicpm 1b
* baichuan part
* update
* update
* update
* baichuan support
* code refactor
* remove code
* fix style
* address comments
* revert 
							
						 
						
							2024-11-08 11:42:42 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									binbin Deng 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								812d5cc32e 
								
							 
						 
						
							
							
								
								[NPU L0] Support llama3.2 in L0 pipeline ( #12361 )  
							
							 
							
							
							
						 
						
							2024-11-08 10:01:23 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7ef7696956 
								
							 
						 
						
							
							
								
								update linux installation doc ( #12365 )  
							
							 
							
							
							
							
							* update linux doc
* update 
							
						 
						
							2024-11-08 09:44:58 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8fe294e01f 
								
							 
						 
						
							
							
								
								Small fix to all-in-one benchmark ( #12362 )  
							
							 
							
							
							
						 
						
							2024-11-07 18:56:34 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yuwen Hu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								1a6cbc473f 
								
							 
						 
						
							
							
								
								Add fused mlp optimizations to glm4 models ( #12360 )  
							
							 
							
							
							
							
							* Add fused mlp to glm4 models
* Small fix 
							
						 
						
							2024-11-07 18:52:47 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xin Qiu 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								520af4e9b5 
								
							 
						 
						
							
							
								
								Update install_linux_gpu.md ( #12353 )  
							
							 
							
							
							
						 
						
							2024-11-07 16:08:01 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yishuo Wang 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ad68c56573 
								
							 
						 
						
							
							
								
								small improvement ( #12359 )  
							
							 
							
							
							
						 
						
							2024-11-07 15:57:41 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jinhe 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								71ea539351 
								
							 
						 
						
							
							
								
								Add troubleshooting for ollama and llama.cpp ( #12358 )  
							
							 
							
							
							
							
							* add ollama troubleshoot en
* zh ollama troubleshoot
* llamacpp trouble shoot
* fix
* save gpu memory 
							
						 
						
							2024-11-07 15:49:20 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Xu, Shuo 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								ce0c6ae423 
								
							 
						 
						
							
							
								
								Update Readme for FastChat docker demo ( #12354 )  
							
							 
							
							
							
							
							* update Readme for FastChat docker demo
* update readme
* add 'Serving with FastChat' part in docs
* polish docs
---------
Co-authored-by: ATMxsp01 <shou.xu@intel.com> 
							
						 
						
							2024-11-07 15:22:42 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Yina Chen 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								d880e534d2 
								
							 
						 
						
							
							
								
								[NPU] acclib llama3.2 support groupwise ( #12355 )  
							
							 
							
							
							
							
							* change inter_pp
* add comment 
							
						 
						
							2024-11-07 11:19:55 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									Jinhe 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								79f2877413 
								
							 
						 
						
							
							
								
								add minicpm-v models to transformers_int4_npu_win api ( #12352 )  
							
							 
							
							
							
							
							* add minicpm npu
* optimize model 
							
						 
						
							2024-11-07 10:05:10 +08:00  
						
						
							 
							
							
								 
							 
							
						 
					 
				
					
						
							
								
								
									 
									SONG Ge 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								a7b66683f1 
								
							 
						 
						
							
							
								
								[NPU] Add Optimized Support for Llama3.2-1B/3B on NPU ( #12339 )  
							
							 
							
							
							
							
							* Add initial support for llama3.2-1b/3b
* move llama3.2 support into current llama_mp impl 
							
						 
						
							2024-11-06 19:21:40 +08:00