* LLM: add readme to long-context examples. * add precision. * update wording. * add GPU type. * add Long-Context example to GPU examples. * fix comments. * update max input length. * update max length. * add output length. * fix wording.
		
			
				
	
	
	
	
		
			1.7 KiB
		
	
	
	
	
	
	
	
			
		
		
	
	
			1.7 KiB
		
	
	
	
	
	
	
	
Running Long-Context generation using IPEX-LLM on Intel Arc™ A770 Graphics
Long-Context Generation is a critical aspect in various applications, such as document summarization, extended conversation handling, and complex question answering. Effective long-context generation can lead to more coherent and contextually relevant responses, enhancing user experience and model utility.
This folder contains examples of running long-context generation with IPEX-LLM on Intel Arc™ A770 Graphics(16GB GPU memory):
- LLaMA2-32K: examples of running LLaMA2-32K models with INT4/FP8 precision.
 - ChatGLM3-32K: examples of running ChatGLM3-32K models with INT4/FP8 precision.
 
Maximum Input Length for Different Models with INT4/FP8 Precision.
- 
INT4
Model Name Low Memory Mode Maximum Input Length Output Length LLaMA2-7B-32K Disable 10K 512 Enable 12K 512 ChatGLM3-6B-32K Disable 9K 512 Enable 10K 512  - 
FP8
Model Name Low Memory Mode Maximum Input Length Output Length LLaMA2-7B-32K Disable 7K 512 Enable 9K 512 ChatGLM3-6B-32K Disable 8K 512 Enable 9K 512  
Note: If you need to run longer input or use less memory, please set
IPEX_LLM_LOW_MEM=1to enable low memory mode, which will enable memory optimization and may slightly affect the performance.