ipex-llm/python
Yuwen Hu a0150bb205 [LLM] Move embedding layer to CPU for iGPU inference (#9343)
* Move embedding layer to CPU for iGPU llm inference

* Empty cache after to cpu

* Remove empty cache as it seems to have some negative effect on the first token
2023-11-03 11:13:45 +08:00
llm [LLM] Move embedding layer to CPU for iGPU inference (#9343) 2023-11-03 11:13:45 +08:00