* Move embedding layer to CPU for iGPU LLM inference
* Empty cache after moving to CPU
* Remove empty cache, as it seems to have a negative effect on the first token

| Name |
|---|
| .. |
| llm |