ipex-llm/python
Yang Wang 9e763b049c Support running pipeline parallel inference by vertically partitioning model to different devices (#10392)
* support pipeline parallel inference

* fix logging

* remove benchmark file

* fix

* need to warmup twice

* support qwen and qwen2

* fix lint

* remove genxir

* refine
2024-03-18 13:04:45 -07:00
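The commit's core idea, pipeline-parallel inference by vertically partitioning a model's layers across devices, can be sketched in plain Python. This is a toy illustration under assumptions, not the IPEX-LLM implementation: the `Stage`, `partition`, and `pipeline_forward` names are hypothetical, devices are mocked as strings, and real code would move activations between devices at each stage boundary.

```python
# Hedged sketch of vertical model partitioning for pipeline-parallel
# inference. Layers are split into contiguous stages, each stage is
# notionally pinned to one device, and activations flow stage by stage.

class Stage:
    """One contiguous slice of the model's layers on one (mock) device."""
    def __init__(self, layers, device):
        self.layers = layers   # list of callables (the layer slice)
        self.device = device   # mock device label, e.g. "xpu:0"

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

def partition(layers, num_stages):
    """Vertically partition layers into num_stages contiguous slices."""
    per_stage = (len(layers) + num_stages - 1) // num_stages
    return [
        Stage(layers[i:i + per_stage], device=f"xpu:{s}")
        for s, i in enumerate(range(0, len(layers), per_stage))
    ]

def pipeline_forward(stages, x):
    """Run inference by handing activations from one stage to the next."""
    for stage in stages:
        # A real implementation would transfer x to stage.device here.
        x = stage.forward(x)
    return x

# Toy "model": 4 layers, each adding 1, split across 2 mock devices.
layers = [lambda v: v + 1 for _ in range(4)]
stages = partition(layers, num_stages=2)
print(pipeline_forward(stages, 0))   # 4
print([s.device for s in stages])    # ['xpu:0', 'xpu:1']
```

The "need to warmup twice" note in the commit body is consistent with this structure: each stage compiles or caches its own kernels, so early passes pay per-stage initialization costs.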