* add benchmark tool for gpu
* update
* add benchmark utils
* fix
* fix bug and add readme
* hidden latency data