4d7d5d4c59 | Chen, Zhentao | 2023-12-01 14:01:14 +08:00
Add 3 leaderboard tasks (#9566)
* update leaderboard map
* download model and dataset without overwritten
* fix task drop
* run on all available devices

ed0dc57c6e | Wang, Jian4 | 2023-12-01 11:18:04 +08:00
LLM: Add cpu qlora support other models guide (#9567)
* use bf16 flag
* add using baichuan model
* update merge
* remove
* update

bda404fc8f | Jason Dai | 2023-11-30 22:45:52 +08:00
Update readme (#9575)

69c49d21f5 | Xin Qiu | 2023-11-30 21:47:41 +08:00
use fused rms norm (#9572)
* use fused rms norm
* meet code review

66f5b45f57 | Yishuo Wang | 2023-11-30 16:37:17 +08:00
[LLM] add a llama2 gguf example (#9553)
7f6465518a | Yishuo Wang | 2023-11-30 14:56:12 +08:00
support loading llama tokenizer from gguf model (#9565)

a0a80d232e | Wang, Jian4 | 2023-11-30 13:42:30 +08:00
LLM: Add qlora cpu distributed readme (#9561)
* init readme
* add distributed guide
* update

c8e0c2ed48 | Chen, Zhentao | 2023-11-30 12:47:56 +08:00
Fixed dumped logs in harness (#9549)
* install transformers==4.34.0
* modify output_path as a directory
* add device and task to output dir parents

d85a430a8c | Qiyuan Gong | 2023-11-30 10:10:29 +08:00
Uing bigdl-llm-init instead of bigdl-nano-init (#9558)
* Replace `bigdl-nano-init` with `bigdl-llm-init`.
* Install `bigdl-llm` instead of `bigdl-nano`.
* Remove nano in README.

34503efa6a | Yuwen Hu | 2023-11-29 18:27:56 +08:00
Fix cpu pinned embedding (#9556)
4ff2ca9d0d | binbin Deng | 2023-11-29 15:16:18 +08:00
LLM: fix loss error on Arc (#9550)

65121c7997 | Yishuo Wang | 2023-11-29 14:40:37 +08:00
support loading q4_1/q5_0/q5_1/q8_0 gguf model (#9546)

b824754256 | Wang, Jian4 | 2023-11-29 10:56:17 +08:00
LLM: Update for cpu qlora mpirun (#9548)

5f5ca38b74 | Yuwen Hu | 2023-11-29 09:17:09 +08:00
[LLM Doc] Fix api doc rendering error (#9542)
* Fix api rendering error
* Fix python style

a86c6e0b56 | Yishuo Wang | 2023-11-28 15:51:15 +08:00
[LLM] support loading gguf model (#9544)
916c338772 | Xiangyu Tian | 2023-11-28 11:09:54 +08:00
fix bugs in vllm length check (#9543)

5098bc3544 | WeiguangHan | 2023-11-28 10:21:07 +08:00
LLM: enable previous models (#9505)
* enable previous models
* test mistral model
* for test
* run models separately
* test all models
* for test
* revert the llm_performance_test.yaml

e7e0cd3b5e | Zhao Changmin | 2023-11-28 09:46:31 +08:00
CPU Pinned embedding Layer (#9538)
* CPU Pinned embedding

963a5c8d79 | Guancheng Fu | 2023-11-28 09:44:03 +08:00
Add vLLM-XPU version's README/examples (#9536)
* test
* test
* fix last kv cache
* add xpu readme
* remove numactl for xpu example
* fix link error
* update max_num_batched_tokens logic
* add explaination
* add xpu environement version requirement
* refine gpu memory
* fix
* fix style

b6c3520748 | Guancheng Fu | 2023-11-27 11:21:25 +08:00
Remove xformers from vLLM-CPU (#9535)
2b9c7d2a59 | binbin Deng | 2023-11-27 11:04:27 +08:00
LLM: quick fix alpaca qlora finetuning script (#9534)

11fa3de290 | Yuwen Hu | 2023-11-24 17:49:21 +08:00
Add sutup support of win gpu for bigdl-llm (#9512)

45820cf3b9 | Chen, Zhentao | 2023-11-24 17:10:49 +08:00
add optimize model option (#9530)

6bec0faea5 | binbin Deng | 2023-11-24 16:20:22 +08:00
LLM: support Mistral AWQ models (#9520)

914a5a5a27 | Ruonan Wang | 2023-11-24 15:37:50 +08:00
LLM: fix abnormal Mistral GPU accuracy by updating rms_norm (#9529)
3d24823cda | SONG Ge | 2023-11-24 14:33:04 +08:00
hot-fix mistral kv_cache (#9528)

42b7a16bc5 | Zhao Changmin | 2023-11-24 12:16:48 +08:00
Replace torch.bmm with safe_bmm (#9519)
* replace bmm with safe one
* rename args and deprecated warning

b3178d449f | Jason Dai | 2023-11-23 21:45:20 +08:00
Update README.md (#9525)

82898a4203 | Jason Dai | 2023-11-23 21:20:26 +08:00
Update GPU example README (#9524)

064848028f | Jason Dai | 2023-11-23 21:16:21 +08:00
Update README.md (#9523)

b63aae8a8e | Ruonan Wang | 2023-11-23 18:40:18 +08:00
LLM: add flash attention support for llama (#9518)
* add initial flash attention for llama
* accelerate fp32 first token by changing to fp16 in advance
* support fp32
bf579507c2 | Guancheng Fu | 2023-11-23 16:46:45 +08:00
Integrate vllm (#9310)
* done
* Rename structure
* add models
* Add structure/sampling_params,sequence
* add input_metadata
* add outputs
* Add policy,logger
* add and update
* add parallelconfig back
* core/scheduler.py
* Add llm_engine.py
* Add async_llm_engine.py
* Add tested entrypoint
* fix minor error
* Fix everything
* fix kv cache view
* fix
* fix
* fix
* format&refine
* remove logger from repo
* try to add token latency
* remove logger
* Refine config.py
* finish worker.py
* delete utils.py
* add license
* refine
* refine sequence.py
* remove sampling_params.py
* finish
* add license
* format
* add license
* refine
* refine
* Refine line too long
* remove exception
* so dumb style-check
* refine
* refine
* refine
* refine
* refine
* refine
* add README
* refine README
* add warning instead error
* fix padding
* add license
* format
* format
* format fix
* Refine vllm dependency (#1)
  vllm dependency clear
* fix licence
* fix format
* fix format
* fix
* adapt LLM engine
* fix
* add license
* fix format
* fix
* Moving README.md to the correct position
* Fix readme.md
* done
* guide for adding models
* fix
* Fix README.md
* Add new model readme
* remove ray-logic
* refactor arg_utils.py
* remove distributed_init_method logic
* refactor entrypoints
* refactor input_metadata
* refactor model_loader
* refactor utils.py
* refactor models
* fix api server
* remove vllm.stucture
* revert by txy 1120
* remove utils
* format
* fix license
* add bigdl model
* Refer to a specfic commit
* Change code base
* add comments
* add async_llm_engine comment
* refine
* formatted
* add worker comments
* add comments
* add comments
* fix style
* add changes
---------
Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>
48fbb1eb94 | Heyang Sun | 2023-11-23 10:58:09 +08:00
support ccl (MPI) distributed mode in alpaca_qlora_finetuning_cpu (#9507)

0f0c6bb631 | Qiyuan Gong | 2023-11-23 09:28:04 +08:00
[LLM] Fix Qwen registered_causal_mask is None (#9513)
* Add registered_causal_mask init based on 2abd8e5777.

11fa5a8a0e | Heyang Sun | 2023-11-23 08:41:25 +08:00
Fix QLoRA CPU dispatch_model issue about accelerate (#9506)

1453046938 | Heyang Sun | 2023-11-23 08:39:21 +08:00
install bigdl-llm in deepspeed cpu inference example (#9508)

86743fb57b | binbin Deng | 2023-11-22 15:53:07 +08:00
LLM: fix transformers version in CPU finetuning example (#9511)

1a2129221d | binbin Deng | 2023-11-22 13:49:14 +08:00
LLM: support resume from checkpoint in Alpaca QLoRA (#9502)

139e98aa18 | Ruonan Wang | 2023-11-22 10:19:57 +08:00
LLM: quick fix benchmark (#9509)

c2aeb4d1e8 | WeiguangHan | 2023-11-21 18:41:50 +08:00
del model after test (#9504)
076d106ef5 | Ruonan Wang | 2023-11-21 17:08:36 +08:00
LLM: GPU QLoRA update to bf16 to accelerate gradient checkpointing (#9499)
* update to bf16 to accelerate gradient checkpoint
* add utils and fix ut

3e39828420 | Cheen Hau, 俊豪 | 2023-11-21 14:57:16 +08:00
Update all in one benchmark readme (#9496)
* Add gperftools install to all in one benchmark readme
* Update readme

b7ae572ac3 | binbin Deng | 2023-11-21 14:22:19 +08:00
LLM: update Alpaca QLoRA finetuning example on GPU (#9492)

c5cb3ab82e | Wang, Jian4 | 2023-11-21 09:19:58 +08:00
LLM : Add CPU alpaca qlora example (#9469)
* init
* update xpu to cpu
* update
* update readme
* update example
* update
* add refer
* add guide to train different datasets
* update readme
* update

96fd26759c | binbin Deng | 2023-11-20 14:31:24 +08:00
LLM: fix QLoRA finetuning example on CPU (#9489)
50b01058f1 | Xin Qiu | 2023-11-17 14:58:57 +08:00
enable new q4_1 (#9479)

3dac21ac7b | binbin Deng | 2023-11-17 09:56:43 +08:00
LLM: add more example usages about alpaca qlora on different hardware (#9458)

921b263d6a | Heyang Sun | 2023-11-17 09:11:39 +08:00
update deepspeed install and run guide in README (#9441)

30abd304a7 | Zhao Changmin | 2023-11-16 21:57:28 +08:00
LLM: Fix baichuan pre-normalize model tensor assigning issue when loading (#9481)
* No need to normalized when loading

bc06bec90e | WeiguangHan | 2023-11-16 19:50:23 +08:00
LLM: modify the script to generate html results more accurately (#9445)
* modify the script to generate html results more accurately
* resolve some comments
* revert some codes