* separate prefill into its own process
* use model.share_memory()
* might work
* worked
* use long prompt
* refactor
* cleanup
* fix bug
* clean up
* changeable inter- and intra-process stages
* refactor
* add max output len
* fix npu_model changes that may break generate
* fix npu_model generate import error
* fix generate forward error
---------
Co-authored-by: sgwhat <ge.song@intel.com>