* Move embedding layer to CPU for iGPU LLM inference
* Empty cache after moving to CPU
* Remove empty cache, as it seems to have a negative effect on the first token

Changed file: `llm/__init__.py`
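For context, the idea behind the change is to keep the large embedding table in host memory and run only the token lookup on the CPU, so that the embedding weights do not consume the memory visible to the integrated GPU; only the small per-step embedding output is copied over to the iGPU. The sketch below is a minimal illustration of that pattern, assuming PyTorch with Intel's XPU backend; the helper name `move_embeddings_to_cpu` and the `"xpu"` device string are assumptions for illustration, not the project's actual API.

```python
import torch
import torch.nn as nn


def move_embeddings_to_cpu(model: nn.Module, compute_device: str = "xpu") -> nn.Module:
    """Pin nn.Embedding layers to CPU; keep the rest of the model on `compute_device`.

    Hypothetical helper sketching the technique described in the commit message.
    """
    for module in model.modules():
        if isinstance(module, nn.Embedding):
            # Keep the (large) embedding weight in host memory.
            module.to("cpu")

            # Token ids must live on the same device as the embedding weight,
            # so move them to CPU right before the lookup ...
            module.register_forward_pre_hook(
                lambda mod, args: tuple(t.to("cpu") for t in args)
            )
            # ... and copy the (small) embedding output back to the iGPU so
            # the following layers still run there.
            module.register_forward_hook(
                lambda mod, args, output: output.to(compute_device)
            )
    # Per the commit notes, an empty-cache call after the move was tried and
    # then removed because it seemed to hurt the first token, so no cache
    # clearing is done here.
    return model
```

Usage would look like `model = move_embeddings_to_cpu(model.to("xpu"))` before running generation; the hooks keep the data movement transparent to the caller.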