Jinhe
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								32e8362da7
								
							
						 | 
						
							
							
								
								added minicpm cpu examples (#12027)
							
							
							
							
							
							
							
							* minicpm cpu examples
* add link for minicpm-2 
							
						 | 
						
							2024-09-11 15:51:21 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Ruonan Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								a0c73c26d8
								
							
						 | 
						
							
							
								
								clean NPU code (#12060)
							
							
							
							
							
							
							
							* clean code
* remove time.perf_counter() 
							
						 | 
						
							2024-09-11 15:10:35 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Wang, Jian4
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								c75f3dd874
								
							
						 | 
						
							
							
								
								vllm no padding glm4 to avoid nan error (#12062)
							
							
							
							
							
							
							
							* no padding glm4
* add codegeex 
							
						 | 
						
							2024-09-11 13:44:40 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Chu,Youcheng
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								649390c464
								
							
						 | 
						
							
							
								
								fix: textual and env variable adjustment (#12038)
							
							
							
							
							
						 | 
						
							2024-09-11 13:38:01 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yuwen Hu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								c94032f97e
								
							
						 | 
						
							
							
								
								Try to fix llamaindex ut again (#12061)
							
							
							
							
							
						 | 
						
							2024-09-11 12:11:04 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Shaojun Liu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								7e1e51d91a
								
							
						 | 
						
							
							
								
								Update vllm setting (#12059)
							
							
							
							
							
							
							
							* revert
* update
* update
* update 
							
						 | 
						
							2024-09-11 11:45:08 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Wang, Jian4
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								30a8680645
								
							
						 | 
						
							
							
								
								Update for vllm one card padding (#12058)
							
							
							
							
							
						 | 
						
							2024-09-11 10:52:55 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Zijie Li
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								c5fdfde1bd
								
							
						 | 
						
							
							
								
								fix npu-model prompt (#12057)
							
							
							
							
							
						 | 
						
							2024-09-11 10:06:45 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yuwen Hu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								94dade9aca
								
							
						 | 
						
							
							
								
								Fix UT of ipex_llm.llamaindex (#12055)
							
							
							
							
							
						 | 
						
							2024-09-11 09:58:43 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Shaojun Liu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								52863dd567
								
							
						 | 
						
							
							
								
								fix vllm_online_benchmark.py (#12056)
							
							
							
							
							
						 | 
						
							2024-09-11 09:45:30 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yishuo Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								d8c044e79d
								
							
						 | 
						
							
							
								
								optimize minicpm3 kv cache (#12052)
							
							
							
							
							
						 | 
						
							2024-09-10 16:51:21 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Wang, Jian4
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								5d3ab16a80
								
							
						 | 
						
							
							
								
								Add vllm glm and baichuan padding (#12053)
							
							
							
							
							
						 | 
						
							2024-09-10 15:57:28 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Guancheng Fu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								69c8d36f16
								
							
						 | 
						
							
							
								
								Switching from vLLM v0.3.3 to vLLM 0.5.4 (#12042)
							
							
							
							
							
							
							
							* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* Remove duplicate layer
* LLM: Update vLLM to v0.5.4 (#11746)
* Enable single card sync engine
* enable ipex-llm optimizations for vllm
* enable optimizations for lm_head
* Fix chatglm multi-reference problem
* update 0.5.4 api_server
* add dockerfile
* fix
* fix
* refine
* fix
---------
Co-authored-by: gc-fu <guancheng.fu@intel.com>
* Add vllm-0.5.4 Dockerfile (#11838)
* Update BIGDL_LLM_SDP_IGNORE_MASK in start-vllm-service.sh (#11957)
* Fix vLLM not convert issues (#11817) (#11918)
* Fix not convert issues
* refine
Co-authored-by: Guancheng Fu <110874468+gc-fu@users.noreply.github.com>
* Fix glm4-9b-chat nan error on vllm 0.5.4 (#11969)
* init
* update mlp forward
* fix minicpm error in vllm 0.5.4
* fix dependabot alerts (#12008)
* Update 0.5.4 dockerfile (#12021)
* Add vllm awq loading logic (#11987)
* [ADD] Add vllm awq loading logic
* [FIX] fix the module.linear_method path
* [FIX] fix quant_config path error
* Enable Qwen padding mlp to 256 to support batch_forward (#12030)
* Enable padding mlp
* padding to 256
* update style
* Install 27191 runtime in 0.5.4 docker image (#12040)
* fix rebase error
* fix rebase error
* vLLM: format for 0.5.4 rebase (#12043)
* format
* Update model_convert.py
* Fix serving docker related modifications (#12046)
* Fix undesired modifications (#12048)
* fix
* Refine offline_inference arguments
---------
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: Jun Wang <thoughts.times@gmail.com>
Co-authored-by: Wang, Jian4 <61138589+hzjane@users.noreply.github.com>
Co-authored-by: liu-shaojun <johnssalyn@outlook.com>
Co-authored-by: Shaojun Liu <61072813+liu-shaojun@users.noreply.github.com> 
							
						 | 
						
							2024-09-10 15:37:43 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Ch1y0q
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								73a4360f3f
								
							
						 | 
						
							
							
								
								update lowbit path for baichuan2, qwen2, generate.py (#12051)
							
							
							
							
							
							
							
							* update lowbit path for baichuan2, qwen2, `generate.py`
* update readme 
							
						 | 
						
							2024-09-10 15:35:24 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Ruonan Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								dc4af02b2a
								
							
						 | 
						
							
							
								
								Fix qwen2 1.5B NPU load error (#12049)
							
							
							
							
							
						 | 
						
							2024-09-10 14:41:18 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yishuo Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								abc370728c
								
							
						 | 
						
							
							
								
								optimize minicpm3 again (#12047)
							
							
							
							
							
						 | 
						
							2024-09-10 14:19:57 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Ch1y0q
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								f0061a9916
								
							
						 | 
						
							
							
								
								remove local import os to fix Baichuan NPU load issue (#12044)
							
							
							
							
							
						 | 
						
							2024-09-10 14:13:24 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Ruonan Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								640998edea
								
							
						 | 
						
							
							
								
								update inter_pp of qwen2 (#12041)
							
							
							
							
							
						 | 
						
							2024-09-10 10:34:17 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yishuo Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								048b4590aa
								
							
						 | 
						
							
							
								
								add basic minicpm3 optimization (#12039)
							
							
							
							
							
						 | 
						
							2024-09-09 17:25:08 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Chu,Youcheng
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								16c658e732
								
							
						 | 
						
							
							
								
								LLM: add known issues to harness evaluation (#12036)
							
							
							
							
							
							
							
							* feat: add known issue to harness
* fix: resolve comments
* fix: small fixes 
							
						 | 
						
							2024-09-09 14:15:42 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yishuo Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								6cedb601e4
								
							
						 | 
						
							
							
								
								remove some useless code (#12035)
							
							
							
							
							
						 | 
						
							2024-09-06 17:51:08 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									binbin Deng
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								d2e1b9aaff
								
							
						 | 
						
							
							
								
								Add input padding during prefill for qwen2-7b (#12033)
							
							
							
							
							
						 | 
						
							2024-09-06 16:39:59 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yuwen Hu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								f61b1785fb
								
							
						 | 
						
							
							
								
								Small update to NPU example readme (#12034)
							
							
							
							
							
							
							
							* Small update to NPU example readme
* Small fix 
							
						 | 
						
							2024-09-06 15:54:23 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Ruonan Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								0d04531ae0
								
							
						 | 
						
							
							
								
								update NPU readme of Qwen2 (#12032)
							
							
							
							
							
							
							
							* update readme
* update broadcast 
							
						 | 
						
							2024-09-06 15:02:39 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yang Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								58555bd9de
								
							
						 | 
						
							
							
								
								Optimize broadcast for npu llama (#12028)
							
							
							
							
							
						 | 
						
							2024-09-06 13:28:20 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Shaojun Liu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								e5581e6ded
								
							
						 | 
						
							
							
								
								Select the Appropriate APT Repository Based on CPU Type (#12023)
							
							
							
							
							
						 | 
						
							2024-09-05 17:06:07 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									binbin Deng
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								5b18bb3c4a
								
							
						 | 
						
							
							
								
								Add recommend version for mtl npu (#12024)
							
							
							
							
							
						 | 
						
							2024-09-05 16:28:53 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									binbin Deng
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								845e5dc89e
								
							
						 | 
						
							
							
								
								Support lm_head of minicpm-2b on NPU (#12019)
							
							
							
							
							
						 | 
						
							2024-09-05 16:19:22 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Ch1y0q
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								820f8a4554
								
							
						 | 
						
							
							
								
								add --lowbit-path option for NPU llama example (#12020)
							
							
							
							
							
							
							
							* add option `--lowbit-path`
* add descriptions in `README.md` and formatting
* Update llama.py 
							
						 | 
						
							2024-09-05 15:31:01 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Guoqiong Song
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								8803242f5c
								
							
						 | 
						
							
							
								
								fix llama on cpu (#12018)
							
							
							
							
							
						 | 
						
							2024-09-04 19:17:54 -07:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Wang, Jian4
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								b3b2cd64b4
								
							
						 | 
						
							
							
								
								Support lightweight-serving glm-4v-9b (#11994)
							
							
							
							
							
							
							
							* enable glm-4v-9b serving
* update readme
* update for no image input 
							
						 | 
						
							2024-09-05 09:25:08 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Shaojun Liu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								75b19f8522
								
							
						 | 
						
							
							
								
								revert actions/download-artifact version to 3 (#12017)
							
							
							
							
							
						 | 
						
							2024-09-04 22:39:07 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Shaojun Liu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								c6348a4666
								
							
						 | 
						
							
							
								
								Update action.yml (#12016)
							
							
							
							
							
						 | 
						
							2024-09-04 22:12:24 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yishuo Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								b1408a1f1c
								
							
						 | 
						
							
							
								
								fix UT (#12005)
							
							
							
							
							
						 | 
						
							2024-09-04 18:02:49 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Shaojun Liu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								77cb348220
								
							
						 | 
						
							
							
								
								fix dependabot alerts (#12006)
							
							
							
							
							
							
							
							* fix dependabot alerts
* update 
							
						 | 
						
							2024-09-04 17:13:45 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Wang, Jian4
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								2b993ad479
								
							
						 | 
						
							
							
								
								vllm update for glm-4 model automatic not_convert (#12003)
							
							
							
							
							
						 | 
						
							2024-09-04 13:50:32 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Ruonan Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								9eaff5e47d
								
							
						 | 
						
							
							
								
								add save & load support for NPU optimized model (#11999)
							
							
							
							
							
							
							
							* add save & load support
* fix style 
							
						 | 
						
							2024-09-03 20:53:22 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yuwen Hu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								6eb55653ba
								
							
						 | 
						
							
							
								
								Performance mode strategy update for input_embeds input (#11997)
							
							
							
							
							
						 | 
						
							2024-09-03 17:46:16 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Jinhe
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								164f47adbd
								
							
						 | 
						
							
							
								
								MiniCPM-V-2 & MiniCPM-Llama3-V-2_5 example updates (#11988)
							
							
							
							
							
							
							
							* minicpm example updates
* --stream 
							
						 | 
						
							2024-09-03 17:02:06 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Jin, Qiao
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								2e54f4402b
								
							
						 | 
						
							
							
								
								Rename MiniCPM-V-2_6 CPU example (#11998)
							
							
							
							
							
						 | 
						
							2024-09-03 16:50:42 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yuwen Hu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								643458d8f0
								
							
						 | 
						
							
							
								
								Update GraphRAG QuickStart (#11995)
							
							
							
							
							
							
							
							* Update GraphRAG QuickStart
* Further updates
* Small fixes
* Small fix 
							
						 | 
						
							2024-09-03 15:52:08 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									binbin Deng
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								01099f08ee
								
							
						 | 
						
							
							
								
								Revert prefill logic of qwen2-7b (#11992)
							
							
							
							
							
						 | 
						
							2024-09-03 14:45:01 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yuwen Hu
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								659d15defc
								
							
						 | 
						
							
							
								
								Fix wrong attention mask and garbage output for inputs_embeds inputs during lookup generation (#11989)
							
							
							
							
							
							
							
							* Fix garbage output for input_embeds inputs during lookup generation
* Fix on sliding windows
* Simplify code 
							
						 | 
						
							2024-09-02 19:09:12 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									binbin Deng
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								2f3d1bd0ec
								
							
						 | 
						
							
							
								
								hotfix qwen2-7b weight setting (#11991)
							
							
							
							
							
						 | 
						
							2024-09-02 18:11:08 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									binbin Deng
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								a40ea7038d
								
							
						 | 
						
							
							
								
								Fix AttributeError of qwen2-1.5B (#11990)
							
							
							
							
							
						 | 
						
							2024-09-02 17:55:10 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Yang Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								c48817bd43
								
							
						 | 
						
							
							
								
								Support Qwen2-7b MLP in int4 and transpose_value_cache=True (#11968)
							
							
							
							
							
						 | 
						
							2024-09-02 14:37:44 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Jin, Qiao
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								65e281bb29
								
							
						 | 
						
							
							
								
								Add MiniCPM-V cpu example (#11975)
							
							
							
							
							
							
							
							* Add MiniCPM-V cpu example
* fix
* fix
* fix
* fix 
							
						 | 
						
							2024-09-02 10:17:57 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Ruonan Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								79978e6f36
								
							
						 | 
						
							
							
								
								update npu multimodal readme (#11979)
							
							
							
							
							
							
							
							* update npu readme of multimodal
* small fix
* meet comment 
							
						 | 
						
							2024-08-30 19:02:06 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Ruonan Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								4811a490ef
								
							
						 | 
						
							
							
								
								small fix (#11978)
							
							
							
							
							
							
							
							* fix
* meet comment 
							
						 | 
						
							2024-08-30 17:55:15 +08:00 | 
						
						
							
							
							
								
							
							
						 | 
					
				
					
						
							
								
								
									 
									Ruonan Wang
								
							 
						 | 
						
							
							
								
								
							
							
							
								
							
							
								573c20bae6
								
							
						 | 
						
							
							
								
								fix npu lm_head cpu condition (#11976)
							
							
							
							
							
							
							
							* fix
* fix
* fix
* fix style
* fix style
* fix style 
							
						 | 
						
							2024-08-30 17:11:26 +08:00 | 
						
						
							
							
							
								
							
							
						 |