* Optimize memory usage when loading transformer models in int4
* Move cast to convert
* Set `low_cpu_mem_usage` by default