ipex-llm/python/llm/example/NPU/HF-Transformers-AutoModels
Yang Wang 99b05ba1dc
separate prefill into a process (#11787)
* separate prefill into a process

* using model.share_memory()

* might work

* worked

* use long prompt

* refactor

* cleanup

* fix bug

* clean up

* changeable inter- and intra-process stages

* refactor

* add max output len

* fix npu_model changes that may break generate

* fix npu_model generate import error

* fix generate forward error

---------

Co-authored-by: sgwhat <ge.song@intel.com>
2024-08-19 17:53:36 +08:00
LLM separate prefill into a process (#11787) 2024-08-19 17:53:36 +08:00
Multimodal Update npu multimodal example (#11773) 2024-08-13 14:14:59 +08:00