ipex-llm/python
SONG Ge 284e7697b1 [LLM] Optimize ChatGLM2 kv_cache to support beam_search on ARC (#9579)
* optimize kv_cache to support beam_search on Arc

* correctness test update

* fix query_length issue

* simplify implementation

* only enable the optimization on gpu device

* enable beam_search support only on gpu device and with batch_size > 1

* add comments for beam_search case and revert ut change

* address review comments

* add more comments to describe the difference between the multiple cases
2023-12-13 11:02:14 +08:00
llm [LLM] Optimize ChatGLM2 kv_cache to support beam_search on ARC (#9579) 2023-12-13 11:02:14 +08:00