* optimize kv_cache to support beam_search on Arc
* correctness test update
* fix query_length issue
* simplify implementation
* only enable the optimization on gpu device
* limit beam_search support to gpu device with batch_size > 1
* add comments for beam_search case and revert ut change
* address review comments
* add more comments to describe the difference between the multiple cases