Guancheng Fu | b6c3520748 | Remove xformers from vLLM-CPU (#9535) | 2023-11-27 11:21:25 +08:00

binbin Deng | 2b9c7d2a59 | LLM: quick fix alpaca qlora finetuning script (#9534) | 2023-11-27 11:04:27 +08:00

binbin Deng | 6bec0faea5 | LLM: support Mistral AWQ models (#9520) | 2023-11-24 16:20:22 +08:00

Jason Dai | b3178d449f | Update README.md (#9525) | 2023-11-23 21:45:20 +08:00

Jason Dai | 82898a4203 | Update GPU example README (#9524) | 2023-11-23 21:20:26 +08:00

Jason Dai | 064848028f | Update README.md (#9523) | 2023-11-23 21:16:21 +08:00

Guancheng Fu | bf579507c2 | Integrate vllm (#9310) | 2023-11-23 16:46:45 +08:00
  * done
  * Rename structure
  * add models
  * Add structure/sampling_params,sequence
  * add input_metadata
  * add outputs
  * Add policy,logger
  * add and update
  * add parallelconfig back
  * core/scheduler.py
  * Add llm_engine.py
  * Add async_llm_engine.py
  * Add tested entrypoint
  * fix minor error
  * Fix everything
  * fix kv cache view
  * fix
  * fix
  * fix
  * format&refine
  * remove logger from repo
  * try to add token latency
  * remove logger
  * Refine config.py
  * finish worker.py
  * delete utils.py
  * add license
  * refine
  * refine sequence.py
  * remove sampling_params.py
  * finish
  * add license
  * format
  * add license
  * refine
  * refine
  * Refine line too long
  * remove exception
  * so dumb style-check
  * refine
  * refine
  * refine
  * refine
  * refine
  * refine
  * add README
  * refine README
  * add warning instead of error
  * fix padding
  * add license
  * format
  * format
  * format fix
  * Refine vllm dependency (#1)
    vllm dependency clear
  * fix license
  * fix format
  * fix format
  * fix
  * adapt LLM engine
  * fix
  * add license
  * fix format
  * fix
  * Move README.md to the correct position
  * Fix readme.md
  * done
  * guide for adding models
  * fix
  * Fix README.md
  * Add new model readme
  * remove ray-logic
  * refactor arg_utils.py
  * remove distributed_init_method logic
  * refactor entrypoints
  * refactor input_metadata
  * refactor model_loader
  * refactor utils.py
  * refactor models
  * fix api server
  * remove vllm.stucture
  * revert by txy 1120
  * remove utils
  * format
  * fix license
  * add bigdl model
  * Refer to a specific commit
  * Change code base
  * add comments
  * add async_llm_engine comment
  * refine
  * formatted
  * add worker comments
  * add comments
  * add comments
  * fix style
  * add changes
  ---------
  Co-authored-by: xiangyuT <xiangyu.tian@intel.com>
  Co-authored-by: Xiangyu Tian <109123695+xiangyuT@users.noreply.github.com>
  Co-authored-by: leonardozcm <leonardo1997zcm@gmail.com>

Heyang Sun | 48fbb1eb94 | support ccl (MPI) distributed mode in alpaca_qlora_finetuning_cpu (#9507) | 2023-11-23 10:58:09 +08:00

Heyang Sun | 11fa5a8a0e | Fix QLoRA CPU dispatch_model issue about accelerate (#9506) | 2023-11-23 08:41:25 +08:00

Heyang Sun | 1453046938 | install bigdl-llm in deepspeed cpu inference example (#9508) | 2023-11-23 08:39:21 +08:00

binbin Deng | 86743fb57b | LLM: fix transformers version in CPU finetuning example (#9511) | 2023-11-22 15:53:07 +08:00

binbin Deng | 1a2129221d | LLM: support resume from checkpoint in Alpaca QLoRA (#9502) | 2023-11-22 13:49:14 +08:00

Ruonan Wang | 076d106ef5 | LLM: GPU QLoRA update to bf16 to accelerate gradient checkpointing (#9499) | 2023-11-21 17:08:36 +08:00
  * update to bf16 to accelerate gradient checkpoint
  * add utils and fix ut

binbin Deng | b7ae572ac3 | LLM: update Alpaca QLoRA finetuning example on GPU (#9492) | 2023-11-21 14:22:19 +08:00

Wang, Jian4 | c5cb3ab82e | LLM: Add CPU alpaca qlora example (#9469) | 2023-11-21 09:19:58 +08:00
  * init
  * update xpu to cpu
  * update
  * update readme
  * update example
  * update
  * add refer
  * add guide to train different datasets
  * update readme
  * update

binbin Deng | 96fd26759c | LLM: fix QLoRA finetuning example on CPU (#9489) | 2023-11-20 14:31:24 +08:00

binbin Deng | 3dac21ac7b | LLM: add more example usages about alpaca qlora on different hardware (#9458) | 2023-11-17 09:56:43 +08:00

Heyang Sun | 921b263d6a | update deepspeed install and run guide in README (#9441) | 2023-11-17 09:11:39 +08:00

Yina Chen | d5263e6681 | Add awq load support (#9453) | 2023-11-16 14:06:25 +08:00
  * Support directly loading GPTQ models from huggingface
  * fix style
  * fix tests
  * change example structure
  * address comments
  * fix style
  * init
  * address comments
  * add examples
  * fix style
  * fix style
  * fix style
  * fix style
  * update
  * remove
  * meet comments
  * fix style
  ---------
  Co-authored-by: Yang Wang <yang3.wang@intel.com>

Ruonan Wang | 0f82b8c3a0 | LLM: update qlora example (#9454) | 2023-11-15 09:24:15 +08:00
  * update qlora example
  * fix loss=0

Yang Wang | 51d07a9fd8 | Support directly loading gptq models from huggingface (#9391) | 2023-11-13 20:48:12 -08:00
  * Support directly loading GPTQ models from huggingface
  * fix style
  * fix tests
  * change example structure
  * address comments
  * fix style
  * address comments

Heyang Sun | da6bbc8c11 | fix deepspeed dependencies to install (#9400) | 2023-11-13 16:42:50 +08:00
  * remove redundant parameter from deepspeed install
  * Update install.sh
  * Update install.sh

Zheng, Yi | 9b5d0e9c75 | Add examples for Yi-6B (#9421) | 2023-11-13 10:53:15 +08:00

Wang, Jian4 | ac7fbe77e2 | Update qlora readme (#9416) | 2023-11-12 19:29:29 +08:00

Zheng, Yi | 0674146cfb | Add cpu and gpu examples of distil-whisper (#9374) | 2023-11-10 16:09:55 +08:00
  * Add distil-whisper examples
  * Fixes based on comments
  * Minor fixes
  ---------
  Co-authored-by: Ariadne330 <wyn2000330@126.com>

Ziteng Zhang | ad81b5d838 | Update qlora README.md (#9422) | 2023-11-10 15:19:25 +08:00

Heyang Sun | b23b91407c | fix llm-init on deepspeed missing lib (#9419) | 2023-11-10 13:51:24 +08:00

dingbaorong | 36fbe2144d | Add CPU examples of fuyu (#9393) | 2023-11-09 15:29:19 +08:00
  * add fuyu cpu examples
  * add gpu example
  * add comments
  * add license
  * remove gpu example
  * fix inference time

binbin Deng | 54d95e4907 | LLM: add alpaca qlora finetuning example (#9276) | 2023-11-08 16:25:17 +08:00

binbin Deng | 97316bbb66 | LLM: highlight transformers version requirement in mistral examples (#9380) | 2023-11-08 16:05:03 +08:00

Heyang Sun | af94058203 | [LLM] Support CPU deepspeed distributed inference (#9259) | 2023-11-06 17:56:42 +08:00
  * [LLM] Support CPU Deepspeed distributed inference
  * Update run_deepspeed.py
  * Rename
  * fix style
  * add new codes
  * refine
  * remove annotated codes
  * refine
  * Update README.md
  * refine doc and example code

Jin Qiao | e6b6afa316 | LLM: add aquila2 model example (#9356) | 2023-11-06 15:47:39 +08:00

Yining Wang | 9377b9c5d7 | add CodeShell CPU example (#9345) | 2023-11-03 13:15:54 +08:00
  * add CodeShell CPU example
  * fix some problems

Zheng, Yi | 63411dff75 | Add cpu examples of WizardCoder (#9344) | 2023-11-02 20:22:43 +08:00
  * Add wizardcoder example
  * Minor fixes

dingbaorong | 2e3bfbfe1f | Add internlm_xcomposer cpu examples (#9337) | 2023-11-02 15:50:02 +08:00
  * add internlm-xcomposer cpu examples
  * use chat
  * some fixes
  * add license
  * address shengsheng's comments
  * use demo.jpg

Jin Qiao | 97a38958bd | LLM: add CodeLlama CPU and GPU examples (#9338) | 2023-11-02 15:34:25 +08:00
  * LLM: add codellama CPU pytorch examples
  * LLM: add codellama CPU transformers examples
  * LLM: add codellama GPU transformers examples
  * LLM: add codellama GPU pytorch examples
  * LLM: add codellama in readme
  * LLM: add LLaVA link

Zheng, Yi | 63b2556ce2 | Add cpu examples of skywork (#9340) | 2023-11-02 15:10:45 +08:00

dingbaorong | f855a864ef | add llava gpu example (#9324) | 2023-11-02 14:48:29 +08:00
  * add llava gpu example
  * use 7b model
  * fix typo
  * add in README

Wang, Jian4 | 149146004f | LLM: Add qlora finetuning CPU example (#9275) | 2023-11-02 09:45:42 +08:00
  * add qlora finetuning example
  * update readme
  * update example
  * remove merge.py and update readme

Cengguang Zhang | 9f3d4676c6 | LLM: Add qwen-vl gpu example (#9290) | 2023-11-01 11:01:39 +08:00
  * create qwen-vl gpu example
  * add readme
  * fix
  * change input figure and update outputs
  * add qwen-vl pytorch model gpu example
  * fix
  * add readme

Jin Qiao | 96f8158fe2 | LLM: adjust dolly v2 GPU example README (#9318) | 2023-11-01 09:50:22 +08:00

Jin Qiao | c44c6dc43a | LLM: add chatglm3 examples (#9305) | 2023-11-01 09:50:05 +08:00

Ruonan Wang | d383ee8efb | LLM: update QLoRA example about accelerate version (#9314) | 2023-10-31 13:54:38 +08:00

dingbaorong | ee5becdd61 | use coco image in Qwen-VL (#9298) | 2023-10-30 14:32:35 +08:00
  * use coco image
  * add output
  * address yuwen's comments

Yang Wang | 8838707009 | Add deepspeed autotp example readme (#9289) | 2023-10-27 13:04:38 -07:00
  * Add deepspeed autotp example readme
  * change word

dingbaorong | f053688cad | add cpu example of LLaVA (#9269) | 2023-10-27 18:59:20 +08:00
  * add LLaVA cpu example
  * Small text updates
  * update link
  ---------
  Co-authored-by: Yuwen Hu <yuwen.hu@intel.com>

Zheng, Yi | 7f2ad182fd | Minor Fixes of README (#9294) | 2023-10-27 18:25:46 +08:00

Zheng, Yi | 1bff54a378 | Display demo.jpg in the README.md of HuggingFace Transformers Agent (#9293) | 2023-10-27 18:00:03 +08:00
  * Display demo.jpg
  * remove demo.jpg

Zheng, Yi | a4a1dec064 | Add a cpu example of HuggingFace Transformers Agent (use vicuna-7b-v1.5) (#9284) | 2023-10-27 17:14:12 +08:00
  * Add examples of HF Agent
  * Modify folder structure and add link of demo.jpg
  * Fixes of readme
  * Merge applications and Applications

Guoqiong Song | aa319de5e8 | Add streaming-llm using llama2 on CPU (#9265) | 2023-10-27 01:30:39 -07:00
  Enable streaming-llm to let the model take infinite inputs; tested on desktop and SPR10