| Name |
|---|
| native_int4 |
| transformers_int4 |
| transformers_low_bit |