ipex-llm/python
Ruonan Wang c7741c4e84 LLM: update moe block convert to optimize rest token latency of Mixtral (#9669)
* update moe block convert

* further accelerate final_hidden_states

* fix style

* fix style
2023-12-13 16:17:06 +08:00
llm LLM: update moe block convert to optimize rest token latency of Mixtral (#9669) 2023-12-13 16:17:06 +08:00