* add attn mask for first token
* fix
* fix
* change attn calculation
* fix
* fix
* fix style
* fix style