* initial
* add logic for handling tensor parallel models
* fix
* Add some comments
* add doc
* fix done