* use fused qkv forward in qwen2
* support both
* fix style
* fix rope
* remove print
* fix style
* clean up