* add argument load_in_low_bit
* add docs
* modify gpu doc
* done

---------

Co-authored-by: ivy-lv11 <lvzc@lamda.nju.edu.cn>