
# Running Long-Context Generation Using IPEX-LLM on Intel Arc™ A770 Graphics

Long-context generation is critical in many applications, such as document summarization, extended conversation handling, and complex question answering. Effective long-context generation leads to more coherent and contextually relevant responses, enhancing user experience and model utility.

This folder contains examples of running long-context generation with IPEX-LLM on Intel Arc™ A770 Graphics (16GB GPU memory); a minimal loading sketch follows the list:

- LLaMA2-32K: examples of running LLaMA2-32K models with INT4/FP8 precision.
- ChatGLM3-32K: examples of running ChatGLM3-32K models with INT4/FP8 precision.
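Below is a minimal sketch of how such an example loads a long-context model in low-bit precision with IPEX-LLM and runs generation on the GPU. The model id, prompt, and generation settings here are illustrative; see the example folders above for the runnable scripts.

```python
import torch
from transformers import AutoTokenizer

# IPEX-LLM provides a drop-in replacement for the Hugging Face
# AutoModelForCausalLM that adds low-bit loading options.
from ipex_llm.transformers import AutoModelForCausalLM

# Illustrative model id; the examples in this folder use
# LLaMA2-32K and ChatGLM3-32K checkpoints.
model_path = "togethercomputer/LLaMA-2-7B-32K"

# load_in_low_bit="sym_int4" quantizes weights to INT4;
# pass "fp8" instead for FP8 precision.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_low_bit="sym_int4",
    optimize_model=True,
    trust_remote_code=True,
    use_cache=True,
)
model = model.to("xpu")  # move the model to the Intel GPU

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

prompt = "Summarize the following document:\n..."
input_ids = tokenizer.encode(prompt, return_tensors="pt").to("xpu")

with torch.inference_mode():
    # The tables below were measured with a 512-token output budget.
    output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```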

## Maximum Input Length for Different Models with INT4/FP8 Precision

- INT4

  | Model Name      | Low Memory Mode | Maximum Input Length | Output Length |
  |-----------------|-----------------|----------------------|---------------|
  | LLaMA2-7B-32K   | Disabled        | 10K                  | 512           |
  | LLaMA2-7B-32K   | Enabled         | 12K                  | 512           |
  | ChatGLM3-6B-32K | Disabled        | 9K                   | 512           |
  | ChatGLM3-6B-32K | Enabled         | 10K                  | 512           |

- FP8

  | Model Name      | Low Memory Mode | Maximum Input Length | Output Length |
  |-----------------|-----------------|----------------------|---------------|
  | LLaMA2-7B-32K   | Disabled        | 7K                   | 512           |
  | LLaMA2-7B-32K   | Enabled         | 9K                   | 512           |
  | ChatGLM3-6B-32K | Disabled        | 8K                   | 512           |
  | ChatGLM3-6B-32K | Enabled         | 9K                   | 512           |

> **Note**: If you need to run longer inputs or use less GPU memory, set `IPEX_LLM_LOW_MEM=1` to enable low memory mode. This turns on additional memory optimizations and may slightly affect performance.
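
For example, the variable can be exported in the shell before launching a script (`export IPEX_LLM_LOW_MEM=1`), or set from Python. A minimal sketch, assuming the variable needs to be in the environment before the model is loaded:

```python
import os

# Enable IPEX-LLM low memory mode. Set this before loading the model
# so it is visible when IPEX-LLM applies its memory optimizations.
os.environ["IPEX_LLM_LOW_MEM"] = "1"
```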