ipex-llm/python
Wang, Jian4 209c3501e6
LLM: Optimize qwen1.5 moe model (#10706)
* update moe block

* fix style

* enable optimized MLP

* enable kv_cache

* enable fuse rope

* enable fused qkv

* enable flash_attention

* error sdp quantize

* use old api

* use fuse

* use xetla

* fix python style

* update moe_blocks num

* fix output error

* add cpu sdpa

* update

* update

* update
2024-04-18 14:54:05 +08:00
llm LLM: Optimize qwen1.5 moe model (#10706) 2024-04-18 14:54:05 +08:00