Wang, Jian4
|
9c15abf825
|
Refactor fastapi-serving and add one card serving (#11581)
* init fastapi-serving one card
* mv api code to source
* update worker
* update for style-check
* add worker
* update bash
* update
* update worker name and add readme
* rename update
* rename to fastapi
|
2024-07-17 11:12:43 +08:00 |
|
Xiangyu Tian
|
7f5111a998
|
LLM: Refine start script for Pipeline Parallel Serving (#11557)
Refine start script and readme for Pipeline Parallel Serving
|
2024-07-11 15:45:27 +08:00 |
|
Xiangyu Tian
|
7d8bc83415
|
LLM: Partial Prefilling for Pipeline Parallel Serving (#11457)
|
2024-07-05 13:10:35 +08:00 |
|
binbin Deng
|
987017ef47
|
Update pipeline parallel serving for more model support (#11428)
|
2024-06-27 18:21:01 +08:00 |
|
Xiangyu Tian
|
8ddae22cfb
|
LLM: Refactor Pipeline-Parallel-FastAPI example (#11319)
Initial refactor of the Pipeline-Parallel-FastAPI example
|
2024-06-25 13:30:36 +08:00 |
|
Xiangyu Tian
|
4359ab3172
|
LLM: Add /generate_stream endpoint for Pipeline-Parallel-FastAPI example (#11187)
Add /generate_stream and OpenAI-formatted endpoint for Pipeline-Parallel-FastAPI example
|
2024-06-14 15:15:32 +08:00 |
|
Xiangyu Tian
|
2299698b45
|
Refine Pipeline Parallel FastAPI example (#11168)
|
2024-05-29 17:16:50 +08:00 |
|
Xiangyu Tian
|
5c8ccf0ba9
|
LLM: Add Pipeline-Parallel-FastAPI example (#10917)
Add multi-stage Pipeline-Parallel-FastAPI example
---------
Co-authored-by: hzjane <a1015616934@qq.com>
|
2024-05-27 14:46:29 +08:00 |
|