Yang Wang
|
9e763b049c
|
Support running pipeline parallel inference by vertically partitioning the model across different devices (#10392)
* support pipeline parallel inference
* fix logging
* remove benchmark file
* fix
* need to warmup twice
* support qwen and qwen2
* fix lint
* remove genxir
* refine
|
2024-03-18 13:04:45 -07:00 |
|