Commit graph

10 commits

Author SHA1 Message Date
binbin Deng
987017ef47
Update pipeline parallel serving for more model support (#11428) 2024-06-27 18:21:01 +08:00
binbin Deng
e473b8d946
Add more qwen1.5 and qwen2 support for pipeline parallel inference (#11423) 2024-06-25 15:49:32 +08:00
Xiangyu Tian
8ddae22cfb
LLM: Refactor Pipeline-Parallel-FastAPI example (#11319)
Initial refactor of the Pipeline-Parallel-FastAPI example
2024-06-25 13:30:36 +08:00
SONG Ge
0c67639539
Add more examples for pipeline parallel inference (#11372)
* add more model examples for pipeline parallel inference

* add mixtral and vicuna models

* add yi model and past_kv support for chatglm family

* add docs

* doc update

* add license

* update
2024-06-21 17:55:16 +08:00
binbin Deng
4ba82191f2
Support PP inference for chatglm3 (#11375) 2024-06-21 09:59:01 +08:00
binbin Deng
e50c890e1f
Support finishing PP inference once eos_token_id is found (#11336) 2024-06-18 09:55:40 +08:00
SONG Ge
ef4b6519fb
Add phi-3 model support for pipeline parallel inference (#11334)
* add phi-3 model support

* add phi3 example
2024-06-17 17:44:24 +08:00
binbin Deng
6ea1e71af0
Update PP inference benchmark script (#11323) 2024-06-17 09:59:36 +08:00
SONG Ge
be00380f1a
Fix pipeline parallel inference past_key_value error in Baichuan (#11318)
* fix past_key_value error

* add baichuan2 example

* fix style

* update doc

* add script link in doc

* fix import error

* update
2024-06-17 09:29:32 +08:00
binbin Deng
220151e2a1
Refactor pipeline parallel multi-stage implementation (#11286) 2024-06-13 10:00:23 +08:00