ipex-llm/python/llm/example/GPU/Long-Context

Running Long-Context generation using IPEX-LLM on Intel Arc™ A770 Graphics

Long-context generation is critical in applications such as document summarization, extended conversation handling, and complex question answering. Handling long inputs well leads to more coherent and contextually relevant responses, which improves both user experience and model utility.

This folder contains examples of running long-context generation with IPEX-LLM on Intel Arc™ A770 Graphics (16GB GPU memory):

  • LLaMA2-32K: examples of running LLaMA2-32K models with INT4/FP8 precision.
  • ChatGLM3-32K: examples of running ChatGLM3-32K models with INT4/FP8 precision.

Maximum Input Length for Different Models with INT4/FP8 Precision.

  • INT4

    Model Name        Low Memory Mode   Maximum Input Length   Output Length
    LLaMA2-7B-32K     Disable           10K                    512
    LLaMA2-7B-32K     Enable            12K                    512
    ChatGLM3-6B-32K   Disable           9K                     512
    ChatGLM3-6B-32K   Enable            10K                    512
  • FP8

    Model Name        Low Memory Mode   Maximum Input Length   Output Length
    LLaMA2-7B-32K     Disable           7K                     512
    LLaMA2-7B-32K     Enable            9K                     512
    ChatGLM3-6B-32K   Disable           8K                     512
    ChatGLM3-6B-32K   Enable            9K                     512

Note: To run longer inputs or use less memory, set IPEX_LLM_LOW_MEM=1 to enable low memory mode. This turns on memory optimizations and may slightly affect performance.
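
As a rough illustration of how the limits in the tables and the IPEX_LLM_LOW_MEM switch fit together, the sketch below looks up the measured limit for a model and precision and sets the variable only when the requested input length needs it. The helper names (`needs_low_mem`, `configure_low_mem`) and the `MAX_INPUT_K` table are hypothetical and not part of IPEX-LLM; the numbers are copied from the tables above and apply only to Arc™ A770 (16GB) with an output length of 512. It also assumes the variable has to be set before the model is loaded, so in practice exporting it in the shell before launching the example script is the safer route.

```python
import os

# Maximum input lengths (in K, as in the tables above) measured on
# Intel Arc A770 (16GB) with an output length of 512.
MAX_INPUT_K = {
    ("INT4", "LLaMA2-7B-32K"): {"disable": 10, "enable": 12},
    ("INT4", "ChatGLM3-6B-32K"): {"disable": 9, "enable": 10},
    ("FP8", "LLaMA2-7B-32K"): {"disable": 7, "enable": 9},
    ("FP8", "ChatGLM3-6B-32K"): {"disable": 8, "enable": 9},
}


def needs_low_mem(precision, model, input_k):
    """Return True if input_k only fits with low memory mode enabled.

    Hypothetical helper: raises ValueError when the input exceeds the
    largest length listed for this model and precision.
    """
    limits = MAX_INPUT_K[(precision, model)]
    if input_k <= limits["disable"]:
        return False
    if input_k <= limits["enable"]:
        return True
    raise ValueError(
        f"{input_k}K exceeds the listed maximum of {limits['enable']}K "
        f"for {model} at {precision}"
    )


def configure_low_mem(precision, model, input_k, env=os.environ):
    """Set IPEX_LLM_LOW_MEM=1 in env when the input length requires it.

    Assumption: this must run before the model is loaded for the
    setting to take effect.
    """
    if needs_low_mem(precision, model, input_k):
        env["IPEX_LLM_LOW_MEM"] = "1"
        return True
    return False


if __name__ == "__main__":
    # An 11K input for LLaMA2-7B-32K at INT4 is above the 10K default limit.
    enabled = configure_low_mem("INT4", "LLaMA2-7B-32K", 11)
    print("low memory mode enabled:", enabled)
```

Passing a plain dictionary as `env` keeps the helper easy to check without touching the real process environment.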