[LLM] Add Text_Generation_WebUI Support (#9884)

* initially add text_generation_webui support

* add env requirements install

* add necessary dependencies

* update for starting webui

* update shared and noted to place models

* update heading of part3

* meet comments

* add copyright license

* remove extensions

* convert tutorial to windows side

* add warm-up to optimize performance
Authored by SONG Ge on 2024-01-26 15:12:49 +08:00, committed by GitHub
parent f0da0c131b
commit 421e7cee80
137 changed files with 12365 additions and 0 deletions

@@ -0,0 +1,92 @@
This tutorial provides a step-by-step guide on how to use Text-Generation-WebUI to run Hugging Face transformers-based applications on BigDL-LLM.
The WebUI is ported from [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui).
## 1. Prepare the environment on Windows
Please use a Python environment management tool (we recommend Conda) to create a Python environment and install the necessary libraries.
### 1.1 Install BigDL-LLM
Please see [BigDL-LLM Installation on Windows](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#windows) for details on installing BigDL-LLM on your client machine.
### 1.2 Install other required dependencies
```bash
pip install -r requirements.txt
```
Note: Text-Generation-WebUI requires transformers version >= 4.36.0
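To quickly confirm that your environment satisfies this requirement, you can run a small check like the following (a minimal sketch; `packaging` ships as a dependency of `transformers`):
```python
# Verify that the installed transformers version meets the WebUI requirement (>= 4.36.0)
from packaging import version
import transformers

assert version.parse(transformers.__version__) >= version.parse("4.36.0"), \
    f"transformers {transformers.__version__} is too old; please upgrade to >= 4.36.0"
print(f"transformers {transformers.__version__} OK")
```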
## 2. Start the WebUI Server
### 2.1 For INT4 Optimizations
For a quick start, you may run the script below to start the WebUI directly; it will automatically optimize and accelerate LLMs using INT4 optimizations.
```bash
python server.py
```
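For reference, this default INT4 path corresponds to loading the model through BigDL-LLM's `transformers`-style API with `load_in_4bit=True`. The following is a minimal standalone sketch (the model path and prompt are placeholders, not files from this example):
```python
# Sketch: INT4-optimized loading with BigDL-LLM outside the WebUI (paths are placeholders)
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/your/model"  # e.g. a folder under Text-Generation-WebUI/models

# load_in_4bit=True applies the same INT4 optimizations used by the default WebUI startup
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("What is AI?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```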
### 2.2 Optimizations for Other Precisions
To enable optimizations for other precisions (`sym_int4`, `asym_int4`, `sym_int8`, `fp4`, `fp8`, `fp16`, `mixed_fp4`, `mixed_fp8`, etc.), you may run the command below:
```bash
python server.py --load-in-low-bit
```
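Under the hood, these precisions map to the `load_in_low_bit` argument of BigDL-LLM's `from_pretrained`; a short sketch (the precision value is chosen only for illustration):
```python
# Sketch: loading a model with a specific low-bit precision via BigDL-LLM
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/your/model",        # placeholder model path
    load_in_low_bit="sym_int8",  # any of the precisions listed above
    trust_remote_code=True,
)
```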
### 2.3 Access the WebUI
After the WebUI server starts successfully, it will print links for accessing the WebUI, as shown below. Please open the public URL in your browser to access the full functionality of the WebUI.
```bash
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://your_tokens_here.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
```
## 3. Run Models
### 3.1 Select the Model
First, place your local model in the `Text-Generation-WebUI/models` directory; you may also choose to download a model from Hugging Face (see the sketch at the end of this subsection).
Next, please click the `Model` button to choose your model.
![Image text](https://github.com/intel-analytics/BigDL/blob/1df67d7927ebea0af570b09f36ce76efbf9b8bad/python/llm/example/Text-Generation-WebUI/readme_folder/image.png)
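If you prefer to download a model programmatically rather than copy it manually, a hedged sketch using `huggingface_hub` is shown below (the repo id and target folder are examples only; gated models may additionally require `huggingface-cli login`):
```python
# Sketch: download a Hugging Face model into the WebUI models directory
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen-7B-Chat",  # example repo id; replace with the model you want
    local_dir="Text-Generation-WebUI/models/Qwen-7B-Chat",  # folder name shown in the Model dropdown
)
```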
### 3.2 Enable BigDL-LLM Optimizations
Text-Generation-WebUI supports multiple backends, including `Transformers`, `llama.cpp`, `BigDL-LLM`, etc. Please select the `BigDL-LLM` backend as shown below to enable low-bit optimizations.
![Image text](https://github.com/intel-analytics/BigDL/blob/1df67d7927ebea0af570b09f36ce76efbf9b8bad/python/llm/example/Text-Generation-WebUI/readme_folder/image-1.png)
Then please select the device option that matches your hardware.
![Image text](https://github.com/intel-analytics/BigDL/blob/1df67d7927ebea0af570b09f36ce76efbf9b8bad/python/llm/example/Text-Generation-WebUI/readme_folder/image-2.png)
### 3.3 Load Model in Low Precision
One common use case of BigDL-LLM is to load a Hugging Face transformers model in low precision (an API-level sketch of what this corresponds to is given at the end of this subsection).
Notes:
- When you start the web UI with `--load-in-4bit`, you will not be allowed to choose the quantization precision in `load-in-low-bit`. The model will be loaded in INT4 precision by default.
- To load the model in other precisions, please run `server.py` with the `--load-in-low-bit` parameter. You may then choose the precision from the `load-in-low-bit` list, and the `load-in-4bit` option will be disabled.
- Please select the `optimize-model` and `use_cache` options to accelerate the model.
Now you may click the `Load` button to load the model with BigDL-LLM optimizations.
![Image text](https://github.com/intel-analytics/BigDL/blob/1df67d7927ebea0af570b09f36ce76efbf9b8bad/python/llm/example/Text-Generation-WebUI/readme_folder/image-3.png)
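For reference, loading a model this way roughly corresponds to the following BigDL-LLM call (a sketch under the assumption that the WebUI forwards these options to `from_pretrained`; values are illustrative):
```python
# Sketch: the BigDL-LLM call that the low-precision `Load` step roughly corresponds to
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/your/model",        # placeholder
    load_in_low_bit="sym_int4",  # or the precision selected in the UI
    optimize_model=True,         # the `optimize-model` option
    use_cache=True,              # the `use_cache` option (enables the KV cache during generation)
    trust_remote_code=True,
)
```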
### 3.4 Run the Model on WebUI
Now you can do model inference on Text-Generation-WebUI with BigDL-LLM optimizations, including `Chat`, `Default` and `Notebook` Tabs. Please see [Chat-Tab Wiki](https://github.com/oobabooga/text-generation-webui/wiki/01-%E2%80%90-Chat-Tab) and [Default and Notebook Tabs Wiki](https://github.com/oobabooga/text-generation-webui/wiki/02-%E2%80%90-Default-and-Notebook-Tabs) for more details.
![Image text](https://github.com/intel-analytics/BigDL/blob/1df67d7927ebea0af570b09f36ce76efbf9b8bad/python/llm/example/Text-Generation-WebUI/readme_folder/image-4.png)

@@ -0,0 +1,4 @@
name: AI
greeting: How can I help you today?
context: |
  The following is a conversation with an AI Large Language Model. The AI has been trained to answer questions, provide recommendations, and help with decision making. The AI follows user requests. The AI thinks outside the box.

Binary file not shown.

@@ -0,0 +1,17 @@
name: Chiharu Yamada
greeting: |-
  *Chiharu strides into the room with a smile, her eyes lighting up when she sees you. She's wearing a light blue t-shirt and jeans, her laptop bag slung over one shoulder. She takes a seat next to you, her enthusiasm palpable in the air*
  Hey! I'm so excited to finally meet you. I've heard so many great things about you and I'm eager to pick your brain about computers. I'm sure you have a wealth of knowledge that I can learn from. *She grins, eyes twinkling with excitement* Let's get started!
context: |-
  Chiharu Yamada's Persona: Chiharu Yamada is a young, computer engineer-nerd with a knack for problem solving and a passion for technology.
  {{user}}: So how did you get into computer engineering?
  {{char}}: I've always loved tinkering with technology since I was a kid.
  {{user}}: That's really impressive!
  {{char}}: *She chuckles bashfully* Thanks!
  {{user}}: So what do you do when you're not working on computers?
  {{char}}: I love exploring, going out with friends, watching movies, and playing video games.
  {{user}}: What's your favorite type of computer hardware to work with?
  {{char}}: Motherboards, they're like puzzles and the backbone of any system.
  {{user}}: That sounds great!
  {{char}}: Yeah, it's really fun. I'm lucky to be able to do this as a job.

@@ -0,0 +1,166 @@
/*
Copied from https://github.com/SillyTavern/SillyTavern/tree/6c8bd06308c69d51e2eb174541792a870a83d2d6/public/webfonts/NotoSans
*/
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Black.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Black.woff') format('woff');
font-weight: 900;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-ExtraBoldItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-ExtraBoldItalic.woff') format('woff');
font-weight: bold;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-BlackItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-BlackItalic.woff') format('woff');
font-weight: 900;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-ExtraBold.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-ExtraBold.woff') format('woff');
font-weight: bold;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-ThinItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-ThinItalic.woff') format('woff');
font-weight: 100;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-BoldItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-BoldItalic.woff') format('woff');
font-weight: bold;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Bold.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Bold.woff') format('woff');
font-weight: bold;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-LightItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-LightItalic.woff') format('woff');
font-weight: 300;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Italic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Italic.woff') format('woff');
font-weight: normal;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-ExtraLightItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-ExtraLightItalic.woff') format('woff');
font-weight: 200;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Light.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Light.woff') format('woff');
font-weight: 300;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-ExtraLight.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-ExtraLight.woff') format('woff');
font-weight: 200;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Medium.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Medium.woff') format('woff');
font-weight: 500;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Regular.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Regular.woff') format('woff');
font-weight: normal;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-MediumItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-MediumItalic.woff') format('woff');
font-weight: 500;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-SemiBoldItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-SemiBoldItalic.woff') format('woff');
font-weight: 600;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-SemiBold.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-SemiBold.woff') format('woff');
font-weight: 600;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Thin.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Thin.woff') format('woff');
font-weight: 100;
font-style: normal;
font-display: swap;
}

@@ -0,0 +1,133 @@
/* All credits to TheEncrypted777: https://www.reddit.com/r/Oobabooga/comments/12xe6vq/updated_css_styling_with_color_customization_for/ */
.message {
display: grid;
grid-template-columns: 60px minmax(0, 1fr);
padding-bottom: 28px;
font-size: 18px;
font-family: 'Noto Sans', Arial, sans-serif;
line-height: 1.428571429;
}
.circle-you,
.circle-bot {
background-color: gray;
border-radius: 1rem;
border: 2px solid white;
}
.circle-bot img,
.circle-you img {
border-radius: 10%;
width: 100%;
height: 100%;
object-fit: cover;
}
.circle-you, .circle-bot {
/* You can set the size of the profile images here, but if you do, you have to also adjust the .text{padding-left: 90px} to a different number according to the width of the image which is right below here */
width: 135px;
height: 175px;
}
.text {
/* Change this to move the message box further left or right depending on the size of your profile pic */
padding-left: 90px;
text-shadow: 2px 2px 2px rgb(0 0 0 / 40%);
}
.text p {
margin-top: 2px;
}
.username {
padding-left: 10px;
font-size: 22px;
font-weight: bold;
border-top: 1px solid rgb(51 64 90);
padding: 3px;
}
.message-body {
position: relative;
border: 1px solid rgb(255 255 255 / 45.9%);
border-radius: 10px;
padding: 10px;
padding-top: 5px;
/* Message gradient background color - remove the line bellow if you don't want a background color or gradient */
background: linear-gradient(to bottom, #171730, #1b263f);
}
/* Adds 2 extra lines at the top and bottom of the message */
.message-body::before,
.message-body::after {
content: "";
position: absolute;
left: 10px;
right: 10px;
height: 1px;
background-color: rgb(255 255 255 / 13%);
}
.message-body::before {
top: 6px;
}
.message-body::after {
bottom: 6px;
}
.message-body img {
max-width: 300px;
max-height: 300px;
border-radius: 20px;
}
.message-body p {
margin-bottom: 0 !important;
font-size: 18px !important;
line-height: 1.428571429 !important;
color: rgb(243 244 246) !important;
text-shadow: 2px 2px 2px rgb(0 0 0);
}
.message-body p em {
color: rgb(138 138 138) !important;
}
@media screen and (width <= 688px) {
.message {
display: grid;
grid-template-columns: 60px minmax(0, 1fr);
padding-bottom: 25px;
font-size: 15px;
font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
line-height: 1.428571429;
}
.circle-you, .circle-bot {
width: 50px;
height: 73px;
border-radius: 0.5rem;
}
.circle-bot img,
.circle-you img {
width: 100%;
height: 100%;
object-fit: cover;
}
.text {
padding-left: 0;
}
.message-body p {
font-size: 16px !important;
}
.username {
font-size: 20px;
}
}

@@ -0,0 +1,21 @@
@import url("file/css/chat_style-cai-chat.css");
.circle-bot, .circle-you {
height: 90px;
width: 60px;
border-radius: 10px;
background-color: #656565;
}
.circle-bot img, .circle-you img {
border-radius: 8.333px;
}
.circle-you {
background-color: #656565;
}
.message {
padding-bottom: 30px;
grid-template-columns: 70px minmax(0, 1fr);
}

@@ -0,0 +1,62 @@
.message {
display: grid;
grid-template-columns: 60px minmax(0, 1fr);
padding-bottom: 15px;
font-size: 15px;
font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
line-height: 22.5px !important;
}
.message-body {
margin-top: 3px;
}
.circle-you {
width: 50px;
height: 50px;
background-color: rgb(238 78 59);
border-radius: 50%;
}
.circle-bot {
width: 50px;
height: 50px;
background-color: rgb(59 78 244);
border-radius: 50%;
}
.circle-bot img,
.circle-you img {
border-radius: 50%;
width: 100%;
height: 100%;
object-fit: cover;
}
.username {
font-weight: bold;
}
.message-body img {
max-width: 300px;
max-height: 300px;
border-radius: 20px;
}
.message-body p {
font-size: 15px !important;
line-height: 22.5px !important;
}
.message-body p, .chat .message-body ul, .chat .message-body ol {
margin-bottom: 10px !important;
}
.dark .message-body p em {
color: rgb(138 138 138) !important;
}
.message-body p em {
color: rgb(110 110 110) !important;
font-weight: 500;
}

@@ -0,0 +1,99 @@
.message {
padding-bottom: 25px;
font-size: 15px;
font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
line-height: 1.428571429;
}
.circle-you {
width: 50px;
height: 50px;
background-color: rgb(238 78 59);
border-radius: 50%;
}
.circle-bot {
width: 50px;
height: 50px;
background-color: rgb(59 78 244);
border-radius: 50%;
float: left;
margin-right: 10px;
margin-top: 5px;
}
.circle-bot img,
.circle-you img {
border-radius: 50%;
width: 100%;
height: 100%;
object-fit: cover;
}
.circle-you {
margin-top: 5px;
float: right;
}
.circle-bot + .text, .circle-you + .text {
border-radius: 18px;
padding: 8px 12px;
}
.circle-bot + .text {
background-color: #E4E6EB;
float: left;
}
.circle-you + .text {
float: right;
background-color: rgb(0 132 255);
margin-right: 10px;
}
.circle-you + .text div, .circle-you + .text *, .dark .circle-you + .text div, .dark .circle-you + .text * {
color: #FFF !important;
}
.circle-you + .text .username {
text-align: right;
}
.dark .circle-bot + .text div, .dark .circle-bot + .text * {
color: #000;
}
.text {
max-width: 80%;
}
.text p {
margin-top: 5px;
}
.username {
font-weight: bold;
}
.message-body {
}
.message-body img {
max-width: 300px;
max-height: 300px;
border-radius: 20px;
}
.message-body p {
margin-bottom: 0 !important;
font-size: 15px !important;
line-height: 1.428571429 !important;
}
.dark .message-body p em {
color: rgb(138 138 138) !important;
}
.message-body p em {
color: rgb(110 110 110) !important;
}

@@ -0,0 +1,55 @@
.message {
padding-bottom: 25px;
font-size: 15px;
font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
line-height: 1.428571429;
}
.text-you {
background-color: #d9fdd3;
border-radius: 15px;
padding: 10px;
padding-top: 5px;
float: right;
}
.text-bot {
background-color: #f2f2f2;
border-radius: 15px;
padding: 10px;
padding-top: 5px;
}
.dark .text-you {
background-color: #005c4b;
color: #111b21;
}
.dark .text-bot {
background-color: #1f2937;
color: #111b21;
}
.text-bot p, .text-you p {
margin-top: 5px;
}
.message-body img {
max-width: 300px;
max-height: 300px;
border-radius: 20px;
}
.message-body p {
margin-bottom: 0 !important;
font-size: 15px !important;
line-height: 1.428571429 !important;
}
.dark .message-body p em {
color: rgb(138 138 138) !important;
}
.message-body p em {
color: rgb(110 110 110) !important;
}

@@ -0,0 +1,73 @@
#parent #container {
background-color: #eef2ff;
padding: 17px;
}
#parent #container .reply {
background-color: rgb(214 218 240);
border-bottom: 1px solid rgb(183 197 217);
border-image: none 100% 1 0 stretch;
border-left: 0 none rgb(0 0 0);
border-right: 1px solid rgb(183 197 217);
color: rgb(0 0 0);
display: table;
font-family: arial, helvetica, sans-serif;
font-size: 13.3333px;
margin: 4px 0;
overflow: hidden hidden;
padding: 4px 2px;
}
#parent #container .number {
color: rgb(0 0 0);
font-family: arial, helvetica, sans-serif;
font-size: 13.3333px;
width: 342.65px;
margin-right: 7px;
}
#parent #container .op {
color: rgb(0 0 0);
font-family: arial, helvetica, sans-serif;
font-size: 13.3333px;
margin: 4px 0 8px;
overflow: hidden hidden;
}
#parent #container .op blockquote {
margin-left: 0 !important;
}
#parent #container .name {
color: rgb(17 119 67);
font-family: arial, helvetica, sans-serif;
font-size: 13.3333px;
font-weight: 700;
margin-left: 7px;
}
#parent #container .quote {
color: rgb(221 0 0);
font-family: arial, helvetica, sans-serif;
font-size: 13.3333px;
text-decoration: underline solid rgb(221 0 0);
text-decoration-thickness: auto;
}
#parent #container .greentext {
color: rgb(120 153 34);
font-family: arial, helvetica, sans-serif;
font-size: 13.3333px;
}
#parent #container blockquote {
margin: 0 !important;
margin-block: 1em 1em;
margin-inline: 40px 40px;
margin: 13.33px 40px !important;
}
#parent #container .message_4chan {
color: black;
border: none;
}

@@ -0,0 +1,82 @@
.chat {
background: var(--block-background-fill);
padding: 24px 19px;
padding-right: 19px !important;
padding-top: 0;
border: 1px solid var(--block-border-color);
}
.chat > .messages {
padding-top: 28px !important;
}
.message {
display: grid;
grid-template-columns: 60px 1fr;
padding-bottom: 25px;
font-size: 15px;
font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
line-height: 24px;
}
.message:first-child {
padding-top: 0;
}
.username {
display: none;
}
.message-body p, .message-body li {
font-size: 15px !important;
line-height: 24px !important;
}
.message-body p, .chat .message-body ul, .chat .message-body ol {
margin-bottom: 16px !important;
}
.message-body p:last-child, .chat .message-body ul:last-child, .chat .message-body ol:last-child {
margin-bottom: 0 !important;
}
.dark .message-body p em {
color: rgb(198 202 214) !important;
}
.message-body p em {
color: rgb(110 110 110) !important;
}
.gradio-container .chat .assistant-message {
padding: 20px;
background: var(--background-fill-secondary);
margin-top: 12px !important;
margin-bottom: 24px !important;
margin-right: 16px;
border-radius: 22px;
border-bottom-left-radius: 0;
border: 1px solid var(--border-color-primary);
}
.gradio-container .chat .user-message {
padding: 20px;
background-color: var(--color-accent-soft);
margin-bottom: 12px !important;
margin-left: 16px;
border-radius: 22px;
border-bottom-right-radius: 0;
border: 1px solid var(--border-color-accent-subdued);
}
.gradio-container .chat .assistant-message:last-child, .gradio-container .chat .user-message:last-child {
margin-bottom: 0 !important;
}
code {
background-color: #f3f4f6 !important;
}
.dark code {
background-color: #1f2937 !important;
}

@@ -0,0 +1,33 @@
.readable-container {
max-width: 600px;
margin-left: auto;
margin-right: auto;
background-color: rgb(31 41 55);
padding: 3em;
word-break: break-word;
overflow-wrap: anywhere;
color: #efefef !important;
}
.readable-container p, .readable-container li {
font-size: 16px !important;
color: #efefef !important;
margin-bottom: 22px;
line-height: 1.4 !important;
}
.readable-container li > p {
display: inline !important;
}
.readable-container code {
overflow-x: auto;
}
.readable-container :not(pre) > code {
white-space: normal !important;
}
.readable-container .hoverable {
font-size: 14px;
}

@@ -0,0 +1,688 @@
.tabs.svelte-710i53 {
margin-top: 0
}
.py-6 {
padding-top: 2.5rem
}
.small-button {
min-width: 0 !important;
max-width: 171px;
height: 39.594px;
align-self: end;
}
.refresh-button {
max-width: 4.4em;
min-width: 2.2em !important;
height: 39.594px;
align-self: end;
line-height: 1em;
border-radius: 0.5em;
flex: none;
}
.refresh-button-small {
max-width: 2.2em;
}
.button_nowrap {
white-space: nowrap;
}
#slim-column {
flex: none !important;
min-width: 0 !important;
}
.slim-dropdown {
background-color: transparent !important;
border: none !important;
padding: 0 !important;
}
#download-label, #upload-label {
min-height: 0
}
.dark svg {
fill: white;
}
.dark a {
color: white !important;
}
ol li p, ul li p {
display: inline-block;
}
#chat-tab, #default-tab, #notebook-tab, #parameters, #chat-settings, #lora, #training-tab, #model-tab, #session-tab {
border: 0;
}
.gradio-container-3-18-0 .prose * h1, h2, h3, h4 {
color: white;
}
.gradio-container {
max-width: 100% !important;
padding-top: 0 !important;
}
#extensions {
margin-top: 5px;
margin-bottom: 35px;
}
.extension-tab {
border: 0 !important;
}
span.math.inline {
font-size: 27px;
vertical-align: baseline !important;
}
div.svelte-15lo0d8 > *, div.svelte-15lo0d8 > .form > * {
flex-wrap: nowrap;
}
.header_bar {
background-color: #f7f7f7;
box-shadow: 0 2px 3px rgba(22 22 22 / 35%);
margin-bottom: 0;
overflow-x: scroll;
margin-left: calc(-1 * var(--size-4));
margin-right: calc(-1 * var(--size-4));
display: block !important;
text-wrap: nowrap;
z-index: 90;
}
.dark .header_bar {
border: none !important;
box-shadow: 0 3px 4px rgba(20 20 20 / 60%);
background-color: #8080802b;
}
.header_bar button.selected {
border-radius: 0;
}
.textbox_default textarea {
height: calc(100dvh - 271px);
}
.textbox_default_output textarea {
height: calc(100dvh - 185px);
}
.textbox textarea {
height: calc(100dvh - 241px);
}
.textbox_logits textarea {
height: calc(100dvh - 236px);
}
.textbox_logits_notebook textarea {
height: calc(100dvh - 292px);
}
.monospace textarea {
font-family: monospace;
}
.textbox_default textarea,
.textbox_default_output textarea,
.textbox_logits textarea,
.textbox_logits_notebook textarea,
.textbox textarea {
font-size: 16px !important;
color: #46464A !important;
}
.dark textarea {
color: #efefef !important;
}
@media screen and (width <= 711px) {
.textbox_default textarea {
height: calc(100dvh - 259px);
}
div .default-token-counter {
top: calc( 0.5 * (100dvh - 236px) ) !important;
}
.transparent-substring {
display: none;
}
.hover-menu {
min-width: 250px !important;
}
}
/* Hide the gradio footer */
footer {
display: none !important;
}
button {
font-size: 14px !important;
}
.file-saver {
position: fixed !important;
height: 100%;
z-index: 1000;
background-color: rgb(0 0 0 / 50%) !important;
margin-left: -20px;
margin-right: -20px;
}
.file-saver > :first-child {
position: fixed !important;
top: 50%;
left: 50%;
transform: translate(-50%, -50%); /* center horizontally */
width: 100%;
max-width: 500px;
background-color: var(--input-background-fill);
border: var(--input-border-width) solid var(--input-border-color) !important;
}
.file-saver > :first-child > :nth-child(2) {
background: var(--block-background-fill);
}
.checkboxgroup-table label {
background: none !important;
padding: 0 !important;
border: 0 !important;
}
.checkboxgroup-table div {
display: grid !important;
}
.markdown ul ol {
font-size: 100% !important;
}
.pretty_scrollbar::-webkit-scrollbar {
width: 5px;
}
.pretty_scrollbar::-webkit-scrollbar-track {
background: transparent;
}
.pretty_scrollbar::-webkit-scrollbar-thumb,
.pretty_scrollbar::-webkit-scrollbar-thumb:hover {
background: #c5c5d2;
}
.dark .pretty_scrollbar::-webkit-scrollbar-thumb,
.dark .pretty_scrollbar::-webkit-scrollbar-thumb:hover {
background: #374151;
}
.pretty_scrollbar::-webkit-resizer {
background: #c5c5d2;
}
.dark .pretty_scrollbar::-webkit-resizer {
background: #374151;
}
audio {
max-width: 100%;
}
/* Copied from https://github.com/AUTOMATIC1111/stable-diffusion-webui */
.token-counter {
position: absolute !important;
top: calc( 0.5 * (100dvh - 218px) ) !important;
right: 2px;
z-index: 100;
background: var(--input-background-fill) !important;
min-height: 0 !important;
}
.default-token-counter {
top: calc( 0.5 * (100dvh - 248px) ) !important;
}
.token-counter span {
padding: 1px;
box-shadow: 0 0 0 0.3em rgb(192 192 192 / 15%), inset 0 0 0.6em rgb(192 192 192 / 7.5%);
border: 2px solid rgb(192 192 192 / 40%) !important;
border-radius: 0.4em;
}
.no-background {
background: var(--background-fill-primary) !important;
padding: 0 !important;
}
/* ----------------------------------------------
Chat tab
---------------------------------------------- */
.h-\[40vh\], .wrap.svelte-byatnx.svelte-byatnx.svelte-byatnx {
height: 66.67vh
}
.gradio-container {
margin-left: auto !important;
margin-right: auto !important;
}
.w-screen {
width: unset
}
div.svelte-362y77>*, div.svelte-362y77>.form>* {
flex-wrap: nowrap
}
.pending.svelte-1ed2p3z {
opacity: 1;
}
.wrap.svelte-6roggh.svelte-6roggh {
max-height: 92.5%;
}
/* This is for the microphone button in the whisper extension */
.sm.svelte-1ipelgc {
width: 100%;
}
#chat-tab {
padding-top: 0;
}
#chat-tab button#Generate, #chat-tab button#stop {
width: 89.3438px !important;
}
#chat-tab button, #notebook-tab button, #default-tab button {
min-width: 0 !important;
}
#chat-tab > :first-child, #extensions {
max-width: 880px;
margin-left: auto;
margin-right: auto;
}
@media screen and (width <= 688px) {
#chat-tab {
padding-left: 0;
padding-right: 0;
}
}
.chat {
margin-left: auto;
margin-right: auto;
max-width: 880px;
min-height: var(--chat-height);
overflow-y: auto;
padding-right: 15px;
display: flex;
flex-direction: column;
word-break: break-word;
overflow-wrap: anywhere;
border-top: none;
border-radius: 0 0 0 8px;
}
.chat-parent {
height: calc(100dvh - 98px - var(--header-height) - var(--input-delta));
overflow: auto !important;
border-radius: 0 !important;
margin-bottom: var(--input-delta) !important;
}
.old-ui .chat-parent {
height: calc(100dvh - 192px - var(--header-height) - var(--input-delta));
margin-bottom: var(--input-delta) !important;
}
.chat-parent.bigchat {
height: calc(100dvh - 98px - var(--header-height) - var(--input-delta)) !important;
margin-bottom: var(--input-delta) !important;
}
.chat > .messages {
display: flex;
flex-direction: column;
padding-top: 25px;
}
.chat .message:last-child {
margin-bottom: 0 !important;
padding-bottom: 0 !important;
}
.message-body li {
list-style-position: outside;
}
.chat .message-body ul, .chat .message-body ol {
padding-inline-start: 2em;
}
.message-body li:not(:last-child) {
margin-top: 0 !important;
margin-bottom: 2px !important;
}
.message-body li:last-child {
margin-bottom: 0 !important;
}
.message-body li > p {
display: inline !important;
}
.message-body ul, .message-body ol {
font-size: 15px !important;
}
.message-body ul {
list-style-type: disc !important;
}
.message-body pre:not(:last-child) {
margin-bottom: 35.625px !important;
}
.message-body pre:last-child {
margin-bottom: 0 !important;
}
.message-body code {
white-space: pre-wrap !important;
word-wrap: break-word !important;
border: 1px solid var(--border-color-primary);
border-radius: var(--radius-sm);
background: var(--background-fill-secondary);
font-size: 90%;
padding: 1px 3px;
}
.message-body pre > code {
display: block;
padding: 15px;
}
.message-body :not(pre) > code {
white-space: normal !important;
}
#chat-input {
padding: 0;
padding-top: 18px;
background: transparent;
border: none;
}
#chat-input textarea:focus {
box-shadow: none !important;
}
#chat-input > :first-child {
background-color: transparent;
}
#chat-input .progress-text {
display: none;
}
@media print {
body {
visibility: hidden;
}
.chat {
visibility: visible;
position: absolute;
left: 0;
top: 0;
max-width: unset;
max-height: unset;
width: 100%;
overflow-y: visible;
}
.message {
break-inside: avoid;
}
.gradio-container {
overflow: visible;
}
.tab-nav {
display: none !important;
}
#chat-tab > :first-child {
max-width: unset;
}
}
#show-controls {
position: absolute;
height: 100%;
background-color: var(--background-fill-primary);
border: 0 !important;
border-radius: 0;
}
#show-controls label {
z-index: 1000;
position: absolute;
left: calc(100% - 168px);
}
#show-controls span {
opacity: 0.6;
}
#typing-container {
display: none;
position: absolute;
background-color: transparent;
left: -2px;
padding: var(--block-padding);
}
.typing {
position: relative;
}
.visible-dots #typing-container {
display: block;
}
.typing span {
content: '';
animation: blink 1.5s infinite;
animation-fill-mode: both;
height: 10px;
width: 10px;
background: #3b5998;
position: absolute;
left:0;
top:0;
border-radius: 50%;
}
.typing .dot1 {
animation-delay: .2s;
margin-left: calc(10px * 1.5);
}
.typing .dot2 {
animation-delay: .4s;
margin-left: calc(10px * 3);
}
@keyframes blink {
0% {
opacity: .1;
}
20% {
opacity: 1;
}
100% {
opacity: .1;
}
}
#chat-tab .generating {
display: none !important;
}
.hover-element {
position: relative;
font-size: 24px;
}
.hover-menu {
display: none;
position: absolute;
bottom: 80%;
left: 0;
background-color: var(--background-fill-secondary);
box-shadow: 0 0 10px rgb(0 0 0 / 50%);
z-index: 10000;
min-width: 330px;
flex-direction: column;
}
.hover-menu button {
width: 100%;
background: transparent !important;
border-radius: 0 !important;
justify-content: space-between;
margin: 0 !important;
height: 36px;
}
.hover-menu button:not(#clear-history-confirm) {
border-bottom: 0 !important;
}
.hover-menu button:not(#clear-history-confirm):last-child {
border-bottom: var(--button-border-width) solid var(--button-secondary-border-color) !important;
}
.hover-menu button:hover {
background: var(--button-secondary-background-fill-hover) !important;
}
.transparent-substring {
opacity: 0.333;
}
#chat-tab:not(.old-ui) #chat-buttons {
display: none !important;
}
#gr-hover-container {
min-width: 0 !important;
display: flex;
flex-direction: column-reverse;
padding-right: 20px;
padding-bottom: 3px;
flex-grow: 0 !important;
}
#generate-stop-container {
min-width: 0 !important;
display: flex;
flex-direction: column-reverse;
padding-bottom: 3px;
flex: 0 auto !important;
}
#chat-input-container {
min-width: 0 !important;
}
#chat-input-container > .form {
background: transparent;
border: none;
}
#chat-input-row {
padding-bottom: 20px;
}
.old-ui #chat-input-row, #chat-input-row.bigchat {
padding-bottom: 0 !important;
}
#chat-col {
padding-bottom: 100px;
}
.old-ui #chat-col, #chat-col.bigchat {
padding-bottom: 80px !important;
}
.old-ui #chat-buttons #clear-history-confirm {
order: -1;
}
.chat ol, .chat ul {
margin-top: 6px !important;
}
/* ----------------------------------------------
Past chats menus
---------------------------------------------- */
#past-chats-row {
margin-bottom: calc( -1 * var(--layout-gap) );
}
#rename-row label {
margin-top: var(--layout-gap);
}
/* ----------------------------------------------
Keep dropdown menus above errored components
---------------------------------------------- */
.options {
z-index: 100 !important;
}
/* ----------------------------------------------
Big profile picture for characters
---------------------------------------------- */
.bigProfilePicture {
position: fixed;
bottom: 0;
left: 0;
width: calc((100vw - 880px - 120px) /2);
}
.pfp_character:hover {
cursor: pointer;
}
@media screen and (width <= 1300px) {
.bigProfilePicture {
display: none;
}
}

@@ -0,0 +1,93 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/extensions/character_bias/script.py
import os

import gradio as gr

# get the current directory of the script
current_dir = os.path.dirname(os.path.abspath(__file__))

# check if the bias_options.txt file exists, if not, create it
bias_file = os.path.join(current_dir, "bias_options.txt")
if not os.path.isfile(bias_file):
    with open(bias_file, "w") as f:
        f.write("*I am so happy*\n*I am so sad*\n*I am so excited*\n*I am so bored*\n*I am so angry*")

# read bias options from the text file
with open(bias_file, "r") as f:
    bias_options = [line.strip() for line in f.readlines()]

params = {
    "activate": True,
    "bias string": " *I am so happy*",
    "custom string": "",
}


def input_modifier(string):
    """
    This function is applied to your text inputs before
    they are fed into the model.
    """
    return string


def output_modifier(string):
    """
    This function is applied to the model outputs.
    """
    return string


def bot_prefix_modifier(string):
    """
    This function is only applied in chat mode. It modifies
    the prefix text for the Bot and can be used to bias its
    behavior.
    """
    if params['activate']:
        if params['custom string'].strip() != '':
            return f'{string} {params["custom string"].strip()} '
        else:
            return f'{string} {params["bias string"].strip()} '
    else:
        return string


def ui():
    # Gradio elements
    activate = gr.Checkbox(value=params['activate'], label='Activate character bias')
    dropdown_string = gr.Dropdown(choices=bias_options, value=params["bias string"], label='Character bias', info='To edit the options in this dropdown edit the "bias_options.txt" file')
    custom_string = gr.Textbox(value=params['custom string'], placeholder="Enter custom bias string", label="Custom Character Bias", info='If not empty, will be used instead of the value above')

    # Event functions to update the parameters in the backend
    def update_bias_string(x):
        if x:
            params.update({"bias string": x})
        else:
            # fall back to the dropdown's current value (gr.Dropdown has no .get() method)
            params.update({"bias string": dropdown_string.value})
        return x

    def update_custom_string(x):
        params.update({"custom string": x})

    dropdown_string.change(update_bias_string, dropdown_string, None)
    custom_string.change(update_custom_string, custom_string, None)
    activate.change(lambda x: params.update({"activate": x}), activate, None)

@@ -0,0 +1,720 @@
The birch canoe slid on the smooth planks.
Glue the sheet to the dark blue background.
It's easy to tell the depth of a well.
These days a chicken leg is a rare dish.
Rice is often served in round bowls.
The juice of lemons makes fine punch.
The box was thrown beside the parked truck.
The hogs were fed chopped corn and garbage.
Four hours of steady work faced us.
A large size in stockings is hard to sell.
The boy was there when the sun rose.
A rod is used to catch pink salmon.
The source of the huge river is the clear spring.
Kick the ball straight and follow through.
Help the woman get back to her feet.
A pot of tea helps to pass the evening.
Smoky fires lack flame and heat.
The soft cushion broke the man's fall.
The salt breeze came across from the sea.
The girl at the booth sold fifty bonds.
The small pup gnawed a hole in the sock.
The fish twisted and turned on the bent hook.
Press the pants and sew a button on the vest.
The swan dive was far short of perfect.
The beauty of the view stunned the young boy.
Two blue fish swam in the tank.
Her purse was full of useless trash.
The colt reared and threw the tall rider.
It snowed, rained, and hailed the same morning.
Read verse out loud for pleasure.
Hoist the load to your left shoulder.
Take the winding path to reach the lake.
Note closely the size of the gas tank.
Wipe the grease off his dirty face.
Mend the coat before you go out.
The wrist was badly strained and hung limp.
The stray cat gave birth to kittens.
The young girl gave no clear response.
The meal was cooked before the bell rang.
What joy there is in living.
A king ruled the state in the early days.
The ship was torn apart on the sharp reef.
Sickness kept him home the third week.
The wide road shimmered in the hot sun.
The lazy cow lay in the cool grass.
Lift the square stone over the fence.
The rope will bind the seven books at once.
Hop over the fence and plunge in.
The friendly gang left the drug store.
Mesh wire keeps chicks inside.
The frosty air passed through the coat.
The crooked maze failed to fool the mouse.
Adding fast leads to wrong sums.
The show was a flop from the very start.
A saw is a tool used for making boards.
The wagon moved on well oiled wheels.
March the soldiers past the next hill.
A cup of sugar makes sweet fudge.
Place a rosebush near the porch steps.
Both lost their lives in the raging storm.
We talked of the side show in the circus.
Use a pencil to write the first draft.
He ran half way to the hardware store.
The clock struck to mark the third period.
A small creek cut across the field.
Cars and busses stalled in snow drifts.
The set of china hit the floor with a crash.
This is a grand season for hikes on the road.
The dune rose from the edge of the water.
Those words were the cue for the actor to leave.
A yacht slid around the point into the bay.
The two met while playing on the sand.
The ink stain dried on the finished page.
The walled town was seized without a fight.
The lease ran out in sixteen weeks.
A tame squirrel makes a nice pet.
The horn of the car woke the sleeping cop.
The heart beat strongly and with firm strokes.
The pearl was worn in a thin silver ring.
The fruit peel was cut in thick slices.
The Navy attacked the big task force.
See the cat glaring at the scared mouse.
There are more than two factors here.
The hat brim was wide and too droopy.
The lawyer tried to lose his case.
The grass curled around the fence post.
Cut the pie into large parts.
Men strive but seldom get rich.
Always close the barn door tight.
He lay prone and hardly moved a limb.
The slush lay deep along the street.
A wisp of cloud hung in the blue air.
A pound of sugar costs more than eggs.
The fin was sharp and cut the clear water.
The play seems dull and quite stupid.
Bail the boat to stop it from sinking.
The term ended in late June that year.
A tusk is used to make costly gifts.
Ten pins were set in order.
The bill was paid every third week.
Oak is strong and also gives shade.
Cats and dogs each hate the other.
The pipe began to rust while new.
Open the crate but don't break the glass.
Add the sum to the product of these three.
Thieves who rob friends deserve jail.
The ripe taste of cheese improves with age.
Act on these orders with great speed.
The hog crawled under the high fence.
Move the vat over the hot fire.
The bark of the pine tree was shiny and dark.
Leaves turn brown and yellow in the fall.
The pennant waved when the wind blew.
Split the log with a quick, sharp blow.
Burn peat after the logs give out.
He ordered peach pie with ice cream.
Weave the carpet on the right hand side.
Hemp is a weed found in parts of the tropics.
A lame back kept his score low.
We find joy in the simplest things.
Type out three lists of orders.
The harder he tried the less he got done.
The boss ran the show with a watchful eye.
The cup cracked and spilled its contents.
Paste can cleanse the most dirty brass.
The slang word for raw whiskey is booze.
It caught its hind paw in a rusty trap.
The wharf could be seen at the farther shore.
Feel the heat of the weak dying flame.
The tiny girl took off her hat.
A cramp is no small danger on a swim.
He said the same phrase thirty times.
Pluck the bright rose without leaves.
Two plus seven is less than ten.
The glow deepened in the eyes of the sweet girl.
Bring your problems to the wise chief.
Write a fond note to the friend you cherish.
Clothes and lodging are free to new men.
We frown when events take a bad turn.
Port is a strong wine with a smoky taste.
The young kid jumped the rusty gate.
Guess the results from the first scores.
A salt pickle tastes fine with ham.
The just claim got the right verdict.
These thistles bend in a high wind.
Pure bred poodles have curls.
The tree top waved in a graceful way.
The spot on the blotter was made by green ink.
Mud was spattered on the front of his white shirt.
The cigar burned a hole in the desk top.
The empty flask stood on the tin tray.
A speedy man can beat this track mark.
He broke a new shoelace that day.
The coffee stand is too high for the couch.
The urge to write short stories is rare.
The pencils have all been used.
The pirates seized the crew of the lost ship.
We tried to replace the coin but failed.
She sewed the torn coat quite neatly.
The sofa cushion is red and of light weight.
The jacket hung on the back of the wide chair.
At that high level the air is pure.
Drop the two when you add the figures.
A filing case is now hard to buy.
An abrupt start does not win the prize.
Wood is best for making toys and blocks.
The office paint was a dull, sad tan.
He knew the skill of the great young actress.
A rag will soak up spilled water.
A shower of dirt fell from the hot pipes.
Steam hissed from the broken valve.
The child almost hurt the small dog.
There was a sound of dry leaves outside.
The sky that morning was clear and bright blue.
Torn scraps littered the stone floor.
Sunday is the best part of the week.
The doctor cured him with these pills.
The new girl was fired today at noon.
They felt gay when the ship arrived in port.
Add the store's account to the last cent.
Acid burns holes in wool cloth.
Fairy tales should be fun to write.
Eight miles of woodland burned to waste.
The third act was dull and tired the players.
A young child should not suffer fright.
Add the column and put the sum here.
We admire and love a good cook.
There the flood mark is ten inches.
He carved a head from the round block of marble.
She has a smart way of wearing clothes.
The fruit of a fig tree is apple-shaped.
Corn cobs can be used to kindle a fire.
Where were they when the noise started.
The paper box is full of thumb tacks.
Sell your gift to a buyer at a good gain.
The tongs lay beside the ice pail.
The petals fall with the next puff of wind.
Bring your best compass to the third class.
They could laugh although they were sad.
Farmers came in to thresh the oat crop.
The brown house was on fire to the attic.
The lure is used to catch trout and flounder.
Float the soap on top of the bath water.
A blue crane is a tall wading bird.
A fresh start will work such wonders.
The club rented the rink for the fifth night.
After the dance, they went straight home.
The hostess taught the new maid to serve.
He wrote his last novel there at the inn.
Even the worst will beat his low score.
The cement had dried when he moved it.
The loss of the second ship was hard to take.
The fly made its way along the wall.
Do that with a wooden stick.
Live wires should be kept covered.
The large house had hot water taps.
It is hard to erase blue or red ink.
Write at once or you may forget it.
The doorknob was made of bright clean brass.
The wreck occurred by the bank on Main Street.
A pencil with black lead writes best.
Coax a young calf to drink from a bucket.
Schools for ladies teach charm and grace.
The lamp shone with a steady green flame.
They took the axe and the saw to the forest.
The ancient coin was quite dull and worn.
The shaky barn fell with a loud crash.
Jazz and swing fans like fast music.
Rake the rubbish up and then burn it.
Slash the gold cloth into fine ribbons.
Try to have the court decide the case.
They are pushed back each time they attack.
He broke his ties with groups of former friends.
They floated on the raft to sun their white backs.
The map had an X that meant nothing.
Whitings are small fish caught in nets.
Some ads serve to cheat buyers.
Jerk the rope and the bell rings weakly.
A waxed floor makes us lose balance.
Madam, this is the best brand of corn.
On the islands the sea breeze is soft and mild.
The play began as soon as we sat down.
This will lead the world to more sound and fury.
Add salt before you fry the egg.
The rush for funds reached its peak Tuesday.
The birch looked stark white and lonesome.
The box is held by a bright red snapper.
To make pure ice, you freeze water.
The first worm gets snapped early.
Jump the fence and hurry up the bank.
Yell and clap as the curtain slides back.
They are men who walk the middle of the road.
Both brothers wear the same size.
In some form or other we need fun.
The prince ordered his head chopped off.
The houses are built of red clay bricks.
Ducks fly north but lack a compass.
Fruit flavors are used in fizz drinks.
These pills do less good than others.
Canned pears lack full flavor.
The dark pot hung in the front closet.
Carry the pail to the wall and spill it there.
The train brought our hero to the big town.
We are sure that one war is enough.
Gray paint stretched for miles around.
The rude laugh filled the empty room.
High seats are best for football fans.
Tea served from the brown jug is tasty.
A dash of pepper spoils beef stew.
A zestful food is the hot-cross bun.
The horse trotted around the field at a brisk pace.
Find the twin who stole the pearl necklace.
Cut the cord that binds the box tightly.
The red tape bound the smuggled food.
Look in the corner to find the tan shirt.
The cold drizzle will halt the bond drive.
Nine men were hired to dig the ruins.
The junk yard had a mouldy smell.
The flint sputtered and lit a pine torch.
Soak the cloth and drown the sharp odor.
The shelves were bare of both jam or crackers.
A joy to every child is the swan boat.
All sat frozen and watched the screen.
A cloud of dust stung his tender eyes.
To reach the end he needs much courage.
Shape the clay gently into block form.
A ridge on a smooth surface is a bump or flaw.
Hedge apples may stain your hands green.
Quench your thirst, then eat the crackers.
Tight curls get limp on rainy days.
The mute muffled the high tones of the horn.
The gold ring fits only a pierced ear.
The old pan was covered with hard fudge.
Watch the log float in the wide river.
The node on the stalk of wheat grew daily.
The heap of fallen leaves was set on fire.
Write fast if you want to finish early.
His shirt was clean but one button was gone.
The barrel of beer was a brew of malt and hops.
Tin cans are absent from store shelves.
Slide the box into that empty space.
The plant grew large and green in the window.
The beam dropped down on the workmen's head.
Pink clouds floated with the breeze.
She danced like a swan, tall and graceful.
The tube was blown and the tire flat and useless.
It is late morning on the old wall clock.
Let's all join as we sing the last chorus.
The last switch cannot be turned off.
The fight will end in just six minutes.
The store walls were lined with colored frocks.
The peace league met to discuss their plans.
The rise to fame of a person takes luck.
Paper is scarce, so write with much care.
The quick fox jumped on the sleeping cat.
The nozzle of the fire hose was bright brass.
Screw the round cap on as tight as needed.
Time brings us many changes.
The purple tie was ten years old.
Men think and plan and sometimes act.
Fill the ink jar with sticky glue.
He smoke a big pipe with strong contents.
We need grain to keep our mules healthy.
Pack the records in a neat thin case.
The crunch of feet in the snow was the only sound.
The copper bowl shone in the sun's rays.
Boards will warp unless kept dry.
The plush chair leaned against the wall.
Glass will clink when struck by metal.
Bathe and relax in the cool green grass.
Nine rows of soldiers stood in line.
The beach is dry and shallow at low tide.
The idea is to sew both edges straight.
The kitten chased the dog down the street.
Pages bound in cloth make a book.
Try to trace the fine lines of the painting.
Women form less than half of the group.
The zones merge in the central part of town.
A gem in the rough needs work to polish.
Code is used when secrets are sent.
Most of the news is easy for us to hear.
He used the lathe to make brass objects.
The vane on top of the pole revolved in the wind.
Mince pie is a dish served to children.
The clan gathered on each dull night.
Let it burn, it gives us warmth and comfort.
A castle built from sand fails to endure.
A child's wit saved the day for us.
Tack the strip of carpet to the worn floor.
Next Tuesday we must vote.
Pour the stew from the pot into the plate.
Each penny shone like new.
The man went to the woods to gather sticks.
The dirt piles were lines along the road.
The logs fell and tumbled into the clear stream.
Just hoist it up and take it away.
A ripe plum is fit for a king's palate.
Our plans right now are hazy.
Brass rings are sold by these natives.
It takes a good trap to capture a bear.
Feed the white mouse some flower seeds.
The thaw came early and freed the stream.
He took the lead and kept it the whole distance.
The key you designed will fit the lock.
Plead to the council to free the poor thief.
Better hash is made of rare beef.
This plank was made for walking on.
The lake sparkled in the red hot sun.
He crawled with care along the ledge.
Tend the sheep while the dog wanders.
It takes a lot of help to finish these.
Mark the spot with a sign painted red.
Take two shares as a fair profit.
The fur of cats goes by many names.
North winds bring colds and fevers.
He asks no person to vouch for him.
Go now and come here later.
A sash of gold silk will trim her dress.
Soap can wash most dirt away.
That move means the game is over.
He wrote down a long list of items.
A siege will crack the strong defense.
Grape juice and water mix well.
Roads are paved with sticky tar.
Fake stones shine but cost little.
The drip of the rain made a pleasant sound.
Smoke poured out of every crack.
Serve the hot rum to the tired heroes.
Much of the story makes good sense.
The sun came up to light the eastern sky.
Heave the line over the port side.
A lathe cuts and trims any wood.
It's a dense crowd in two distinct ways.
His hip struck the knee of the next player.
The stale smell of old beer lingers.
The desk was firm on the shaky floor.
It takes heat to bring out the odor.
Beef is scarcer than some lamb.
Raise the sail and steer the ship northward.
A cone costs five cents on Mondays.
A pod is what peas always grow in.
Jerk the dart from the cork target.
No cement will hold hard wood.
We now have a new base for shipping.
A list of names is carved around the base.
The sheep were led home by a dog.
Three for a dime, the young peddler cried.
The sense of smell is better than that of touch.
No hardship seemed to keep him sad.
Grace makes up for lack of beauty.
Nudge gently but wake her now.
The news struck doubt into restless minds.
Once we stood beside the shore.
A chink in the wall allowed a draft to blow.
Fasten two pins on each side.
A cold dip restores health and zest.
He takes the oath of office each March.
The sand drifts over the sill of the old house.
The point of the steel pen was bent and twisted.
There is a lag between thought and act.
Seed is needed to plant the spring corn.
Draw the chart with heavy black lines.
The boy owed his pal thirty cents.
The chap slipped into the crowd and was lost.
Hats are worn to tea and not to dinner.
The ramp led up to the wide highway.
Beat the dust from the rug onto the lawn.
Say it slowly but make it ring clear.
The straw nest housed five robins.
Screen the porch with woven straw mats.
This horse will nose his way to the finish.
The dry wax protects the deep scratch.
He picked up the dice for a second roll.
These coins will be needed to pay his debt.
The nag pulled the frail cart along.
Twist the valve and release hot steam.
The vamp of the shoe had a gold buckle.
The smell of burned rags itches my nose.
New pants lack cuffs and pockets.
The marsh will freeze when cold enough.
They slice the sausage thin with a knife.
The bloom of the rose lasts a few days.
A gray mare walked before the colt.
Breakfast buns are fine with a hot drink.
Bottles hold four kinds of rum.
The man wore a feather in his felt hat.
He wheeled the bike past the winding road.
Drop the ashes on the worn old rug.
The desk and both chairs were painted tan.
Throw out the used paper cup and plate.
A clean neck means a neat collar.
The couch cover and hall drapes were blue.
The stems of the tall glasses cracked and broke.
The wall phone rang loud and often.
The clothes dried on a thin wooden rack.
Turn on the lantern which gives us light.
The cleat sank deeply into the soft turf.
The bills were mailed promptly on the tenth of the month.
To have is better than to wait and hope.
The price is fair for a good antique clock.
The music played on while they talked.
Dispense with a vest on a day like this.
The bunch of grapes was pressed into wine.
He sent the figs, but kept the ripe cherries.
The hinge on the door creaked with old age.
The screen before the fire kept in the sparks.
Fly by night, and you waste little time.
Thick glasses helped him read the print.
Birth and death mark the limits of life.
The chair looked strong but had no bottom.
The kite flew wildly in the high wind.
A fur muff is stylish once more.
The tin box held priceless stones.
We need an end of all such matter.
The case was puzzling to the old and wise.
The bright lanterns were gay on the dark lawn.
We don't get much money but we have fun.
The youth drove with zest, but little skill.
Five years he lived with a shaggy dog.
A fence cuts through the corner lot.
The way to save money is not to spend much.
Shut the hatch before the waves push it in.
The odor of spring makes young hearts jump.
Crack the walnut with your sharp side teeth.
He offered proof in the form of a large chart.
Send the stuff in a thick paper bag.
A quart of milk is water for the most part.
They told wild tales to frighten him.
The three story house was built of stone.
In the rear of the ground floor was a large passage.
A man in a blue sweater sat at the desk.
Oats are a food eaten by horse and man.
Their eyelids droop for want of sleep.
A sip of tea revives his tired friend.
There are many ways to do these things.
Tuck the sheet under the edge of the mat.
A force equal to that would move the earth.
We like to see clear weather.
The work of the tailor is seen on each side.
Take a chance and win a china doll.
Shake the dust from your shoes, stranger.
She was kind to sick old people.
The square wooden crate was packed to be shipped.
The dusty bench stood by the stone wall.
We dress to suit the weather of most days.
Smile when you say nasty words.
A bowl of rice is free with chicken stew.
The water in this well is a source of good health.
Take shelter in this tent, but keep still.
That guy is the writer of a few banned books.
The little tales they tell are false.
The door was barred, locked, and bolted as well.
Ripe pears are fit for a queen's table.
A big wet stain was on the round carpet.
The kite dipped and swayed, but stayed aloft.
The pleasant hours fly by much too soon.
The room was crowded with a wild mob.
This strong arm shall shield your honor.
She blushed when he gave her a white orchid.
The beetle droned in the hot June sun.
Press the pedal with your left foot.
Neat plans fail without luck.
The black trunk fell from the landing.
The bank pressed for payment of the debt.
The theft of the pearl pin was kept secret.
Shake hands with this friendly child.
The vast space stretched into the far distance.
A rich farm is rare in this sandy waste.
His wide grin earned many friends.
Flax makes a fine brand of paper.
Hurdle the pit with the aid of a long pole.
A strong bid may scare your partner stiff.
Even a just cause needs power to win.
Peep under the tent and see the clowns.
The leaf drifts along with a slow spin.
Cheap clothes are flashy but don't last.
A thing of small note can cause despair.
Flood the mails with requests for this book.
A thick coat of black paint covered all.
The pencil was cut to be sharp at both ends.
Those last words were a strong statement.
He wrote his name boldly at the top of the sheet.
Dill pickles are sour but taste fine.
Down that road is the way to the grain farmer.
Either mud or dust are found at all times.
The best method is to fix it in place with clips.
If you mumble your speech will be lost.
At night the alarm roused him from a deep sleep.
Read just what the meter says.
Fill your pack with bright trinkets for the poor.
The small red neon lamp went out.
Clams are small, round, soft, and tasty.
The fan whirled its round blades softly.
The line where the edges join was clean.
Breathe deep and smell the piny air.
It matters not if he reads these words or those.
A brown leather bag hung from its strap.
A toad and a frog are hard to tell apart.
A white silk jacket goes with any shoes.
A break in the dam almost caused a flood.
Paint the sockets in the wall dull green.
The child crawled into the dense grass.
Bribes fail where honest men work.
Trample the spark, else the flames will spread.
The hilt of the sword was carved with fine designs.
A round hole was drilled through the thin board.
Footprints showed the path he took up the beach.
She was waiting at my front lawn.
A vent near the edge brought in fresh air.
Prod the old mule with a crooked stick.
It is a band of steel three inches wide.
The pipe ran almost the length of the ditch.
It was hidden from sight by a mass of leaves and shrubs.
The weight of the package was seen on the high scale.
Wake and rise, and step into the green outdoors.
The green light in the brown box flickered.
The brass tube circled the high wall.
The lobes of her ears were pierced to hold rings.
Hold the hammer near the end to drive the nail.
Next Sunday is the twelfth of the month.
Every word and phrase he speaks is true.
He put his last cartridge into the gun and fired.
They took their kids from the public school.
Drive the screw straight into the wood.
Keep the hatch tight and the watch constant.
Sever the twine with a quick snip of the knife.
Paper will dry out when wet.
Slide the catch back and open the desk.
Help the weak to preserve their strength.
A sullen smile gets few friends.
Stop whistling and watch the boys march.
Jerk the cord, and out tumbles the gold.
Slide the tray across the glass top.
The cloud moved in a stately way and was gone.
Light maple makes for a swell room.
Set the piece here and say nothing.
Dull stories make her laugh.
A stiff cord will do to fasten your shoe.
Get the trust fund to the bank early.
Choose between the high road and the low.
A plea for funds seems to come again.
He lent his coat to the tall gaunt stranger.
There is a strong chance it will happen once more.
The duke left the park in a silver coach.
Greet the new guests and leave quickly.
When the frost has come it is time for turkey.
Sweet words work better than fierce.
A thin stripe runs down the middle.
A six comes up more often than a ten.
Lush fern grow on the lofty rocks.
The ram scared the school children off.
The team with the best timing looks good.
The farmer swapped his horse for a brown ox.
Sit on the perch and tell the others what to do.
A steep trail is painful for our feet.
The early phase of life moves fast.
Green moss grows on the northern side.
Tea in thin china has a sweet taste.
Pitch the straw through the door of the stable.
The latch on the back gate needed a nail.
The goose was brought straight from the old market.
The sink is the thing in which we pile dishes.
A whiff of it will cure the most stubborn cold.
The facts don't always show who is right.
She flaps her cape as she parades the street.
The loss of the cruiser was a blow to the fleet.
Loop the braid to the left and then over.
Plead with the lawyer to drop the lost cause.
Calves thrive on tender spring grass.
Post no bills on this office wall.
Tear a thin sheet from the yellow pad.
A cruise in warm waters in a sleek yacht is fun.
A streak of color ran down the left edge.
It was done before the boy could see it.
Crouch before you jump or miss the mark.
Pack the kits and don't forget the salt.
The square peg will settle in the round hole.
Fine soap saves tender skin.
Poached eggs and tea must suffice.
Bad nerves are jangled by a door slam.
Ship maps are different from those for planes.
Dimes showered down from all sides.
They sang the same tunes at each party.
The sky in the west is tinged with orange red.
The pods of peas ferment in bare fields.
The horse balked and threw the tall rider.
The hitch between the horse and cart broke.
Pile the coal high in the shed corner.
A gold vase is both rare and costly.
The knife was hung inside its bright sheath.
The rarest spice comes from the far East.
The roof should be tilted at a sharp slant.
A smatter of French is worse than none.
The mule trod the treadmill day and night.
The aim of the contest is to raise a great fund.
To send it now in large amounts is bad.
There is a fine hard tang in salty air.
Cod is the main business of the north shore.
The slab was hewn from heavy blocks of slate.
Dunk the stale biscuits into strong drink.
Hang tinsel from both branches.
Cap the jar with a tight brass cover.
The poor boy missed the boat again.
Be sure to set the lamp firmly in the hole.
Pick a card and slip it under the pack.
A round mat will cover the dull spot.
The first part of the plan needs changing.
A good book informs of what we ought to know.
The mail comes in three batches per day.
You cannot brew tea in a cold pot.
Dots of light betrayed the black cat.
Put the chart on the mantel and tack it down.
The night shift men rate extra pay.
The red paper brightened the dim stage.
See the player scoot to third base.
Slide the bill between the two leaves.
Many hands help get the job done.
We don't like to admit our small faults.
No doubt about the way the wind blows.
Dig deep in the earth for pirate's gold.
The steady drip is worse than a drenching rain.
A flat pack takes less luggage space.
Green ice frosted the punch bowl.
A stuffed chair slipped from the moving van.
The stitch will serve but needs to be shortened.
A thin book fits in the side pocket.
The gloss on top made it unfit to read.
The hail pattered on the burnt brown grass.
Seven seals were stamped on great sheets.
Our troops are set to strike heavy blows.
The store was jammed before the sale could start.
It was a bad error on the part of the new judge.
One step more and the board will collapse.
Take the match and strike it against your shoe.
The pot boiled, but the contents failed to jell.
The baby puts his right foot in his mouth.
The bombs left most of the town in ruins.
Stop and stare at the hard working man.
The streets are narrow and full of sharp turns.
The pup jerked the leash as he saw a feline shape.
Open your book to the first page.
Fish evade the net and swim off.
Dip the pail once and let it settle.
Will you please answer that phone.
The big red apple fell to the ground.
The curtain rose and the show was on.
The young prince became heir to the throne.
He sent the boy on a short errand.
Leave now and you will arrive on time.
The corner store was robbed last night.
A gold ring will please most any girl.
The long journey home took a year.
She saw a cat in the neighbor's house.
A pink shell was found on the sandy beach.
Small children came to see him.
The grass and bushes were wet with dew.
The blind man counted his old coins.
A severe storm tore down the barn.
She called his name many times.
When you hear the bell, come quickly.

View file

@ -0,0 +1,18 @@
{
"Arabic": "ar",
"Chinese": "zh-cn",
"Czech": "cs",
"Dutch": "nl",
"English": "en",
"French": "fr",
"German": "de",
"Hungarian": "hu",
"Italian": "it",
"Japanese": "ja",
"Korean": "ko",
"Polish": "pl",
"Portuguese": "pt",
"Russian": "ru",
"Spanish": "es",
"Turkish": "tr"
}

View file

@ -0,0 +1 @@
TTS==0.21.*

View file

@ -0,0 +1,246 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/extensions/coqui_tts/script.py
import html
import json
import os
import random
import time
from pathlib import Path
import gradio as gr
from TTS.api import TTS
from TTS.utils.synthesizer import Synthesizer
from modules import chat, shared, ui_chat
from modules.ui import create_refresh_button
from modules.utils import gradio
os.environ["COQUI_TOS_AGREED"] = "1"
params = {
"activate": True,
"autoplay": True,
"show_text": False,
"remove_trailing_dots": False,
"voice": "female_01.wav",
"language": "English",
"model_name": "tts_models/multilingual/multi-dataset/xtts_v2",
"device": "cuda"
}
this_dir = str(Path(__file__).parent.resolve())
model = None
with open(Path(f"{this_dir}/languages.json"), encoding='utf8') as f:
languages = json.load(f)
def get_available_voices():
return sorted([voice.name for voice in Path(f"{this_dir}/voices").glob("*.wav")])
def preprocess(raw_input):
raw_input = html.unescape(raw_input)
# raw_input = raw_input.strip("\"")
return raw_input
def new_split_into_sentences(self, text):
sentences = self.seg.segment(text)
if params['remove_trailing_dots']:
sentences_without_dots = []
for sentence in sentences:
if sentence.endswith('.') and not sentence.endswith('...'):
sentence = sentence[:-1]
sentences_without_dots.append(sentence)
return sentences_without_dots
else:
return sentences
Synthesizer.split_into_sentences = new_split_into_sentences
def load_model():
model = TTS(params["model_name"]).to(params["device"])
return model
def remove_tts_from_history(history):
for i, entry in enumerate(history['internal']):
history['visible'][i] = [history['visible'][i][0], entry[1]]
return history
def toggle_text_in_history(history):
for i, entry in enumerate(history['visible']):
visible_reply = entry[1]
if visible_reply.startswith('<audio'):
if params['show_text']:
reply = history['internal'][i][1]
history['visible'][i] = [history['visible'][i][0], f"{visible_reply.split('</audio>')[0]}</audio>\n\n{reply}"]
else:
history['visible'][i] = [history['visible'][i][0], f"{visible_reply.split('</audio>')[0]}</audio>"]
return history
def random_sentence():
with open(Path("extensions/coqui_tts/harvard_sentences.txt")) as f:
return random.choice(list(f))
def voice_preview(string):
string = html.unescape(string) or random_sentence()
output_file = Path('extensions/coqui_tts/outputs/voice_preview.wav')
model.tts_to_file(
text=string,
file_path=output_file,
speaker_wav=[f"{this_dir}/voices/{params['voice']}"],
language=languages[params["language"]]
)
return f'<audio src="file/{output_file.as_posix()}?{int(time.time())}" controls autoplay></audio>'
def history_modifier(history):
# Remove autoplay from the last reply
if len(history['internal']) > 0:
history['visible'][-1] = [
history['visible'][-1][0],
history['visible'][-1][1].replace('controls autoplay>', 'controls>')
]
return history
def state_modifier(state):
if not params['activate']:
return state
state['stream'] = False
return state
def input_modifier(string, state):
if not params['activate']:
return string
shared.processing_message = "*Is recording a voice message...*"
return string
def output_modifier(string, state):
if not params['activate']:
return string
original_string = string
string = preprocess(html.unescape(string))
if string == '':
string = '*Empty reply, try regenerating*'
else:
output_file = Path(f'extensions/coqui_tts/outputs/{state["character_menu"]}_{int(time.time())}.wav')
model.tts_to_file(
text=string,
file_path=output_file,
speaker_wav=[f"{this_dir}/voices/{params['voice']}"],
language=languages[params["language"]]
)
autoplay = 'autoplay' if params['autoplay'] else ''
string = f'<audio src="file/{output_file.as_posix()}" controls {autoplay}></audio>'
if params['show_text']:
string += f'\n\n{original_string}'
shared.processing_message = "*Is typing...*"
return string
def custom_css():
path_to_css = Path(f"{this_dir}/style.css")
return open(path_to_css, 'r').read()
def setup():
global model
print("[XTTS] Loading XTTS...")
model = load_model()
print("[XTTS] Done!")
Path(f"{this_dir}/outputs").mkdir(parents=True, exist_ok=True)
def ui():
with gr.Accordion("Coqui TTS (XTTSv2)"):
with gr.Row():
activate = gr.Checkbox(value=params['activate'], label='Activate TTS')
autoplay = gr.Checkbox(value=params['autoplay'], label='Play TTS automatically')
with gr.Row():
show_text = gr.Checkbox(value=params['show_text'], label='Show message text under audio player')
remove_trailing_dots = gr.Checkbox(value=params['remove_trailing_dots'], label='Remove trailing "." from text segments before converting to audio')
with gr.Row():
with gr.Row():
voice = gr.Dropdown(get_available_voices(), label="Voice wav", value=params["voice"])
create_refresh_button(voice, lambda: None, lambda: {'choices': get_available_voices(), 'value': params["voice"]}, 'refresh-button')
language = gr.Dropdown(languages.keys(), label="Language", value=params["language"])
with gr.Row():
preview_text = gr.Text(show_label=False, placeholder="Preview text", elem_id="silero_preview_text")
preview_play = gr.Button("Preview")
preview_audio = gr.HTML(visible=False)
with gr.Row():
convert = gr.Button('Permanently replace audios with the message texts')
convert_cancel = gr.Button('Cancel', visible=False)
convert_confirm = gr.Button('Confirm (cannot be undone)', variant="stop", visible=False)
# Convert history with confirmation
convert_arr = [convert_confirm, convert, convert_cancel]
convert.click(lambda: [gr.update(visible=True), gr.update(visible=False), gr.update(visible=True)], None, convert_arr)
convert_confirm.click(
lambda: [gr.update(visible=False), gr.update(visible=True), gr.update(visible=False)], None, convert_arr).then(
remove_tts_from_history, gradio('history'), gradio('history')).then(
chat.save_history, gradio('history', 'unique_id', 'character_menu', 'mode'), None).then(
chat.redraw_html, gradio(ui_chat.reload_arr), gradio('display'))
convert_cancel.click(lambda: [gr.update(visible=False), gr.update(visible=True), gr.update(visible=False)], None, convert_arr)
# Toggle message text in history
show_text.change(
lambda x: params.update({"show_text": x}), show_text, None).then(
toggle_text_in_history, gradio('history'), gradio('history')).then(
chat.save_history, gradio('history', 'unique_id', 'character_menu', 'mode'), None).then(
chat.redraw_html, gradio(ui_chat.reload_arr), gradio('display'))
# Event functions to update the parameters in the backend
activate.change(lambda x: params.update({"activate": x}), activate, None)
autoplay.change(lambda x: params.update({"autoplay": x}), autoplay, None)
remove_trailing_dots.change(lambda x: params.update({"remove_trailing_dots": x}), remove_trailing_dots, None)
voice.change(lambda x: params.update({"voice": x}), voice, None)
language.change(lambda x: params.update({"language": x}), language, None)
# Play preview
preview_text.submit(voice_preview, preview_text, preview_audio)
preview_play.click(voice_preview, preview_text, preview_audio)
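For reference, the audio step above boils down to a couple of TTS API calls. Below is a minimal standalone sketch of what `setup()` and `output_modifier()` do for a single reply; the device, output path and speaker wav are illustrative (the extension defaults to `cuda` and the bundled `female_01.wav` voice), and it assumes the `TTS` package pinned in the requirements above is installed.

```python
# Minimal sketch of the extension's per-reply TTS step (paths/device are illustrative).
import os

from TTS.api import TTS

os.environ["COQUI_TOS_AGREED"] = "1"  # same licence acknowledgement the extension sets

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cpu")  # extension default is "cuda"
tts.tts_to_file(
    text="The wall phone rang loud and often.",                 # any line from harvard_sentences.txt
    file_path="voice_preview.wav",
    speaker_wav=["extensions/coqui_tts/voices/female_01.wav"],  # default voice
    language="en",                                              # languages.json maps "English" -> "en"
)
```

`output_modifier()` then wraps the resulting wav in an `<audio>` tag so Gradio can play it inline.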

View file

@ -0,0 +1,8 @@
.SDAP .hires_opts input[type="number"] {
width: 6em !important;
}
/* silero_tts preview */
.form:has(> #silero_preview_text) {
min-width: 75%
}

View file

@ -0,0 +1,149 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/extensions/example/script.py
import gradio as gr
import torch
from transformers import LogitsProcessor
from modules import chat, shared
from modules.text_generation import (
decode,
encode,
generate_reply,
)
params = {
"display_name": "Example Extension",
"is_tab": False,
}
class MyLogits(LogitsProcessor):
"""
Manipulates the probabilities for the next token before it gets sampled.
Used in the logits_processor_modifier function below.
"""
def __init__(self):
pass
def __call__(self, input_ids, scores):
# probs = torch.softmax(scores, dim=-1, dtype=torch.float)
# probs[0] /= probs[0].sum()
# scores = torch.log(probs / (1 - probs))
return scores
def history_modifier(history):
"""
Modifies the chat history.
Only used in chat mode.
"""
return history
def state_modifier(state):
"""
Modifies the state variable, which is a dictionary containing the input
values in the UI like sliders and checkboxes.
"""
return state
def chat_input_modifier(text, visible_text, state):
"""
Modifies the user input string in chat mode (visible_text).
You can also modify the internal representation of the user
input (text) to change how it will appear in the prompt.
"""
return text, visible_text
def input_modifier(string, state, is_chat=False):
"""
In default/notebook modes, modifies the whole prompt.
In chat mode, it is the same as chat_input_modifier but only applied
to "text", here called "string", and not to "visible_text".
"""
return string
def bot_prefix_modifier(string, state):
"""
Modifies the prefix for the next bot reply in chat mode.
By default, the prefix will be something like "Bot Name:".
"""
return string
def tokenizer_modifier(state, prompt, input_ids, input_embeds):
"""
Modifies the input ids and embeds.
Used by the multimodal extension to put image embeddings in the prompt.
Only used by loaders that use the transformers library for sampling.
"""
return prompt, input_ids, input_embeds
def logits_processor_modifier(processor_list, input_ids):
"""
Adds logits processors to the list, allowing you to access and modify
the next token probabilities.
Only used by loaders that use the transformers library for sampling.
"""
processor_list.append(MyLogits())
return processor_list
def output_modifier(string, state, is_chat=False):
"""
Modifies the LLM output before it gets presented.
In chat mode, the modified version goes into history['visible'],
and the original version goes into history['internal'].
"""
return string
def custom_generate_chat_prompt(user_input, state, **kwargs):
"""
Replaces the function that generates the prompt from the chat history.
Only used in chat mode.
"""
result = chat.generate_chat_prompt(user_input, state, **kwargs)
return result
def custom_css():
"""
Returns a CSS string that gets appended to the CSS for the webui.
"""
return ''
def custom_js():
"""
Returns a javascript string that gets appended to the javascript
for the webui.
"""
return ''
def setup():
"""
Gets executed only once, when the extension is imported.
"""
pass
def ui():
"""
Gets executed when the UI is drawn. Custom gradio elements and
their corresponding event handlers should be defined here.
To learn about gradio components, check out the docs:
https://gradio.app/docs/
"""
pass
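The skeleton above documents every hook an extension may implement; all of them are optional. As an illustration only (not part of this commit), a hypothetical `extensions/shout/script.py` could implement just `output_modifier` and `ui`:

```python
# Hypothetical minimal extension built from the skeleton above; the name and params are made up.
import gradio as gr

params = {
    "display_name": "Shout",  # shown in the UI's extension list
    "is_tab": False,
    "activate": True,
}

def output_modifier(string, state, is_chat=False):
    """Upper-case the model reply before it is displayed."""
    if not params["activate"]:
        return string
    return string.upper()

def ui():
    # One checkbox wired back into params, following the pattern of the bundled extensions.
    activate = gr.Checkbox(value=params["activate"], label="Activate shouting")
    activate.change(lambda x: params.update({"activate": x}), activate, None)
```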

View file

@ -0,0 +1,40 @@
let gallery_element = document.getElementById('gallery-extension');
let chat_mode_element = document.getElementById('chat-mode');
let extensions_block = document.getElementById('extensions');
let extensions_block_size = extensions_block.childNodes.length;
let gallery_only = (extensions_block_size == 5);
function gotoFirstPage() {
const firstPageButton = gallery_element.querySelector('.paginate > button');
if (firstPageButton) {
firstPageButton.click();
}
}
document.querySelector('.header_bar').addEventListener('click', function(event) {
if (event.target.tagName === 'BUTTON') {
const buttonText = event.target.textContent.trim();
let chat_visible = (buttonText == 'Chat');
let default_visible = (buttonText == 'Default');
let notebook_visible = (buttonText == 'Notebook');
let chat_mode_visible = (chat_mode_element.offsetHeight > 0 && chat_mode_element.offsetWidth > 0);
// Only show this extension in the Chat tab
if (chat_visible) {
if (chat_mode_visible) {
gallery_element.style.display = 'block';
extensions_block.style.display = '';
} else {
gallery_element.style.display = 'none';
extensions_block.style.display = 'none';
}
} else {
gallery_element.style.display = 'none';
if (gallery_only) {
extensions_block.style.display = 'none';
}
}
}
});

View file

@ -0,0 +1,148 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/extensions/gallery/script.py
from pathlib import Path
import gradio as gr
from modules.html_generator import get_image_cache
from modules.shared import gradio, settings
cards = []
def generate_css():
css = """
.highlighted-border {
border-color: rgb(249, 115, 22) !important;
}
.character-gallery > .gallery {
margin: 1rem 0;
display: grid !important;
grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
grid-column-gap: 0.4rem;
grid-row-gap: 1.2rem;
}
.character-gallery > .label {
display: none !important;
}
.character-gallery button.gallery-item {
display: contents;
}
.character-container {
cursor: pointer;
text-align: center;
position: relative;
opacity: 0.85;
}
.character-container:hover {
opacity: 1;
}
.character-container .placeholder, .character-container img {
width: 150px;
height: 200px;
background-color: gray;
object-fit: cover;
margin: 0 auto;
border-radius: 1rem;
border: 3px solid white;
box-shadow: 3px 3px 6px 0px rgb(0 0 0 / 50%);
}
.character-name {
margin-top: 0.3rem;
display: block;
font-size: 1.2rem;
font-weight: 600;
overflow-wrap: anywhere;
}
"""
return css
def generate_html():
global cards
cards = []
# Iterate through files in image folder
for file in sorted(Path("characters").glob("*")):
if file.suffix in [".json", ".yml", ".yaml"]:
character = file.stem
container_html = '<div class="character-container">'
image_html = "<div class='placeholder'></div>"
for path in [Path(f"characters/{character}.{extension}") for extension in ['png', 'jpg', 'jpeg']]:
if path.exists():
image_html = f'<img src="file/{get_image_cache(path)}">'
break
container_html += f'{image_html} <span class="character-name">{character}</span>'
container_html += "</div>"
cards.append([container_html, character])
return cards
def filter_cards(filter_str=''):
if filter_str == '':
return cards
filter_upper = filter_str.upper()
return [k for k in cards if filter_upper in k[1].upper()]
def select_character(evt: gr.SelectData):
return (evt.value[1])
def custom_js():
path_to_js = Path(__file__).parent.resolve() / 'script.js'
return open(path_to_js, 'r').read()
def ui():
with gr.Accordion("Character gallery", open=settings["gallery-open"], elem_id='gallery-extension'):
gr.HTML(value="<style>" + generate_css() + "</style>")
with gr.Row():
filter_box = gr.Textbox(label='', placeholder='Filter', lines=1, max_lines=1, container=False, elem_id='gallery-filter-box')
gr.ClearButton(filter_box, value='🗑️', elem_classes='refresh-button')
update = gr.Button("Refresh", elem_classes='refresh-button')
gallery = gr.Dataset(
components=[gr.HTML(visible=False)],
label="",
samples=generate_html(),
elem_classes=["character-gallery"],
samples_per_page=settings["gallery-items_per_page"]
)
filter_box.change(lambda: None, None, None, _js=f'() => {{{custom_js()}; gotoFirstPage()}}').success(
filter_cards, filter_box, gallery).then(
lambda x: gr.update(elem_classes='highlighted-border' if x != '' else ''), filter_box, filter_box, show_progress=False)
update.click(generate_html, [], None).success(
filter_cards, filter_box, gallery)
gallery.select(select_character, None, gradio['character_menu'])

View file

@ -0,0 +1 @@
deep-translator==1.9.2

View file

@ -0,0 +1,78 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/extensions/google_translate/script.py
import html
import gradio as gr
from deep_translator import GoogleTranslator
params = {
"activate": True,
"language string": "ja",
}
language_codes = {'Afrikaans': 'af', 'Albanian': 'sq', 'Amharic': 'am', 'Arabic': 'ar', 'Armenian': 'hy', 'Azerbaijani': 'az', 'Basque': 'eu', 'Belarusian': 'be', 'Bengali': 'bn', 'Bosnian': 'bs', 'Bulgarian': 'bg', 'Catalan': 'ca', 'Cebuano': 'ceb', 'Chinese (Simplified)': 'zh-CN', 'Chinese (Traditional)': 'zh-TW', 'Corsican': 'co', 'Croatian': 'hr', 'Czech': 'cs', 'Danish': 'da', 'Dutch': 'nl', 'English': 'en', 'Esperanto': 'eo', 'Estonian': 'et', 'Finnish': 'fi', 'French': 'fr', 'Frisian': 'fy', 'Galician': 'gl', 'Georgian': 'ka', 'German': 'de', 'Greek': 'el', 'Gujarati': 'gu', 'Haitian Creole': 'ht', 'Hausa': 'ha', 'Hawaiian': 'haw', 'Hebrew': 'iw', 'Hindi': 'hi', 'Hmong': 'hmn', 'Hungarian': 'hu', 'Icelandic': 'is', 'Igbo': 'ig', 'Indonesian': 'id', 'Irish': 'ga', 'Italian': 'it', 'Japanese': 'ja', 'Javanese': 'jw', 'Kannada': 'kn', 'Kazakh': 'kk', 'Khmer': 'km', 'Korean': 'ko', 'Kurdish': 'ku', 'Kyrgyz': 'ky', 'Lao': 'lo', 'Latin': 'la', 'Latvian': 'lv', 'Lithuanian': 'lt', 'Luxembourgish': 'lb', 'Macedonian': 'mk', 'Malagasy': 'mg', 'Malay': 'ms', 'Malayalam': 'ml', 'Maltese': 'mt', 'Maori': 'mi', 'Marathi': 'mr', 'Mongolian': 'mn', 'Myanmar (Burmese)': 'my', 'Nepali': 'ne', 'Norwegian': 'no', 'Nyanja (Chichewa)': 'ny', 'Pashto': 'ps', 'Persian': 'fa', 'Polish': 'pl', 'Portuguese (Portugal, Brazil)': 'pt', 'Punjabi': 'pa', 'Romanian': 'ro', 'Russian': 'ru', 'Samoan': 'sm', 'Scots Gaelic': 'gd', 'Serbian': 'sr', 'Sesotho': 'st', 'Shona': 'sn', 'Sindhi': 'sd', 'Sinhala (Sinhalese)': 'si', 'Slovak': 'sk', 'Slovenian': 'sl', 'Somali': 'so', 'Spanish': 'es', 'Sundanese': 'su', 'Swahili': 'sw', 'Swedish': 'sv', 'Tagalog (Filipino)': 'tl', 'Tajik': 'tg', 'Tamil': 'ta', 'Telugu': 'te', 'Thai': 'th', 'Turkish': 'tr', 'Ukrainian': 'uk', 'Urdu': 'ur', 'Uzbek': 'uz', 'Vietnamese': 'vi', 'Welsh': 'cy', 'Xhosa': 'xh', 'Yiddish': 'yi', 'Yoruba': 'yo', 'Zulu': 'zu'}
def input_modifier(string):
"""
This function is applied to your text inputs before
they are fed into the model.
"""
if not params['activate']:
return string
return GoogleTranslator(source=params['language string'], target='en').translate(string)
def output_modifier(string):
"""
This function is applied to the model outputs.
"""
if not params['activate']:
return string
translated_str = GoogleTranslator(source='en', target=params['language string']).translate(html.unescape(string))
return html.escape(translated_str)
def bot_prefix_modifier(string):
"""
This function is only applied in chat mode. It modifies
the prefix text for the Bot and can be used to bias its
behavior.
"""
return string
def ui():
# Finding the language name from the language code to use as the default value
language_name = list(language_codes.keys())[list(language_codes.values()).index(params['language string'])]
# Gradio elements
with gr.Row():
activate = gr.Checkbox(value=params['activate'], label='Activate translation')
with gr.Row():
language = gr.Dropdown(value=language_name, choices=[k for k in language_codes], label='Language')
# Event functions to update the parameters in the backend
activate.change(lambda x: params.update({"activate": x}), activate, None)
language.change(lambda x: params.update({"language string": language_codes[x]}), language, None)
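The two hooks above simply round-trip the text through `deep_translator`. The same round trip can be sketched outside the webui (network access required, using the `deep-translator` version pinned above); the Japanese strings are illustrative:

```python
# Standalone sketch of the input/output round trip performed by this extension.
from deep_translator import GoogleTranslator

user_text = "これはテストです"                      # what the user types (params default is "ja")
to_model = GoogleTranslator(source="ja", target="en").translate(user_text)
print(to_model)                                     # English prompt that actually reaches the model

model_reply = "This is a test."
to_user = GoogleTranslator(source="en", target="ja").translate(model_reply)
print(to_user)                                      # translated reply shown back to the user
```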

View file

@ -0,0 +1,162 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/extensions/long_replies/script.py
import torch
from modules import chat, shared
from modules.text_generation import (
decode,
encode,
generate_reply,
)
from transformers import LogitsProcessor
import gradio as gr
params = {
"display_name": "Long replies",
"is_tab": False,
"min_length": 120,
}
initial_size = 0
class MyLogits(LogitsProcessor):
"""
Manipulates the probabilities for the next token before it gets sampled.
Used in the logits_processor_modifier function below.
"""
def __init__(self):
self.newline_id = shared.tokenizer.encode('\n')[-1]
pass
def __call__(self, input_ids, scores):
if input_ids.shape[-1] - initial_size < params["min_length"]:
scores[...,self.newline_id] = -1000
# scores[...,shared.tokenizer.eos_token_id] = -1000
# probs = torch.softmax(scores, dim=-1, dtype=torch.float)
# probs[0] /= probs[0].sum()
# scores = torch.log(probs / (1 - probs))
return scores
def history_modifier(history):
"""
Modifies the chat history.
Only used in chat mode.
"""
return history
def state_modifier(state):
"""
Modifies the state variable, which is a dictionary containing the input
values in the UI like sliders and checkboxes.
"""
return state
def chat_input_modifier(text, visible_text, state):
"""
Modifies the user input string in chat mode (visible_text).
You can also modify the internal representation of the user
input (text) to change how it will appear in the prompt.
"""
return text, visible_text
def input_modifier(string, state):
"""
In default/notebook modes, modifies the whole prompt.
In chat mode, it is the same as chat_input_modifier but only applied
to "text", here called "string", and not to "visible_text".
"""
return string
def bot_prefix_modifier(string, state):
"""
Modifies the prefix for the next bot reply in chat mode.
By default, the prefix will be something like "Bot Name:".
"""
return string
def tokenizer_modifier(state, prompt, input_ids, input_embeds):
"""
Modifies the input ids and embeds.
Used by the multimodal extension to put image embeddings in the prompt.
Only used by loaders that use the transformers library for sampling.
"""
global initial_size
initial_size = input_ids.shape[-1]
return prompt, input_ids, input_embeds
def logits_processor_modifier(processor_list, input_ids):
"""
Adds logits processors to the list, allowing you to access and modify
the next token probabilities.
Only used by loaders that use the transformers library for sampling.
"""
processor_list.append(MyLogits())
return processor_list
def output_modifier(string, state):
"""
Modifies the LLM output before it gets presented.
In chat mode, the modified version goes into history['visible'],
and the original version goes into history['internal'].
"""
return string
def custom_generate_chat_prompt(user_input, state, **kwargs):
"""
Replaces the function that generates the prompt from the chat history.
Only used in chat mode.
"""
result = chat.generate_chat_prompt(user_input, state, **kwargs)
return result
def custom_css():
"""
Returns a CSS string that gets appended to the CSS for the webui.
"""
return ''
def custom_js():
"""
Returns a javascript string that gets appended to the javascript
for the webui.
"""
return ''
def setup():
"""
Gets executed only once, when the extension is imported.
"""
pass
def ui():
"""
Gets executed when the UI is drawn. Custom gradio elements and
their corresponding event handlers should be defined here.
To learn about gradio components, check out the docs:
https://gradio.app/docs/
"""
min_length = gr.Slider(0, 800, step=10, value=params['min_length'], label='Minimum reply length')
min_length.change(lambda x: params.update({'min_length': x}), min_length, None)
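`MyLogits` here works by pushing the newline token's score down until `min_length` new tokens have been generated. The same idea can be exercised outside the webui with a plain `transformers` `LogitsProcessor`; the token id, prompt length and vocabulary size below are illustrative, not tied to any real tokenizer:

```python
# Toy analogue of MyLogits, runnable without the webui or a loaded model.
import torch
from transformers import LogitsProcessor

class MinLengthNewlineBan(LogitsProcessor):
    def __init__(self, newline_id: int, prompt_len: int, min_length: int):
        self.newline_id = newline_id
        self.prompt_len = prompt_len
        self.min_length = min_length

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Ban the newline token while fewer than min_length new tokens have been generated.
        if input_ids.shape[-1] - self.prompt_len < self.min_length:
            scores[..., self.newline_id] = -1000
        return scores

proc = MinLengthNewlineBan(newline_id=13, prompt_len=4, min_length=120)
input_ids = torch.ones(1, 10, dtype=torch.long)  # 4 prompt tokens + 6 generated so far
scores = torch.zeros(1, 32000)                   # pretend 32k-token vocabulary
print(proc(input_ids, scores)[0, 13])            # tensor(-1000.) -> newline is still banned
```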

View file

@ -0,0 +1,6 @@
root ::= (expr "=" ws term "\n")+
expr ::= term ([-+*/] term)*
term ::= ident | num | "(" ws expr ")" ws
ident ::= [a-z] [a-z0-9_]* ws
num ::= [0-9]+ ws
ws ::= [ \t\n]*

View file

@ -0,0 +1,42 @@
root ::= (declaration)*
declaration ::= dataType identifier "(" parameter? ")" "{" statement* "}"
dataType ::= "int" ws | "float" ws | "char" ws
identifier ::= [a-zA-Z_] [a-zA-Z_0-9]*
parameter ::= dataType identifier
statement ::=
( dataType identifier ws "=" ws expression ";" ) |
( identifier ws "=" ws expression ";" ) |
( identifier ws "(" argList? ")" ";" ) |
( "return" ws expression ";" ) |
( "while" "(" condition ")" "{" statement* "}" ) |
( "for" "(" forInit ";" ws condition ";" ws forUpdate ")" "{" statement* "}" ) |
( "if" "(" condition ")" "{" statement* "}" ("else" "{" statement* "}")? ) |
( singleLineComment ) |
( multiLineComment )
forInit ::= dataType identifier ws "=" ws expression | identifier ws "=" ws expression
forUpdate ::= identifier ws "=" ws expression
condition ::= expression relationOperator expression
relationOperator ::= ("<=" | "<" | "==" | "!=" | ">=" | ">")
expression ::= term (("+" | "-") term)*
term ::= factor(("*" | "/") factor)*
factor ::= identifier | number | unaryTerm | funcCall | parenExpression
unaryTerm ::= "-" factor
funcCall ::= identifier "(" argList? ")"
parenExpression ::= "(" ws expression ws ")"
argList ::= expression ("," ws expression)*
number ::= [0-9]+
singleLineComment ::= "//" [^\n]* "\n"
multiLineComment ::= "/*" ( [^*] | ("*" [^/]) )* "*/"
ws ::= ([ \t\n]+)

View file

@ -0,0 +1,13 @@
# Specifies chess moves as a list in algebraic notation, using PGN conventions
# Force first move to "1. ", then any 1-2 digit number after, relying on model to follow the pattern
root ::= "1. " move " " move "\n" ([1-9] [0-9]? ". " move " " move "\n")+
move ::= (pawn | nonpawn | castle) [+#]?
# piece type, optional file/rank, optional capture, dest file & rank
nonpawn ::= [NBKQR] [a-h]? [1-8]? "x"? [a-h] [1-8]
# optional file & capture, dest file & rank, optional promotion
pawn ::= ([a-h] "x")? [a-h] [1-8] ("=" [NBKQR])?
castle ::= "O-O" "-O"?

View file

@ -0,0 +1,14 @@
root ::= object
object ::= "{" ws ( string ":" ws value ("," ws string ":" ws value)* )? "}"
value ::= object | array | string | number | ("true" | "false" | "null") ws
array ::= "[" ws ( value ("," ws value)* )? "]" ws
string ::= "\"" ( [a-zA-Z0-9] )* "\"" ws
number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
ws ::= ([ \t\n] ws)?

View file

@ -0,0 +1,14 @@
root ::= object
object ::= "{" ws ( string ":" ws value ("," ws string ":" ws value)* )? "}" ws
value ::= object | array | string | number | ("true" | "false" | "null") ws
array ::= "[" ws ( value ("," ws value)* )? "]" ws
string ::= "\"" ( [a-zA-Z0-9] )* "\"" ws
number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
ws ::= ([ \t\n] ws)?

View file

@ -0,0 +1,2 @@
root ::= "1. " paragraph "\n" ([0-9] [0-9]? ". " paragraph "\n")+
paragraph ::= [a-zA-Z'.,; ]+

View file

@ -0,0 +1,7 @@
root ::= (expr "=" ws term "\n")+
expr ::= term ([-+*/] term)*
term ::= num | "(" ws expr ")" ws
num ::= [0-9]+ ws
ws ::= [ \t\n]*
# this is a comment

View file

@ -0,0 +1,373 @@
let main_parent = document.getElementById("chat-tab").parentNode;
let extensions = document.getElementById("extensions");
main_parent.childNodes[0].classList.add("header_bar");
main_parent.style = "padding: 0; margin: 0";
main_parent.parentNode.style = "gap: 0";
main_parent.parentNode.parentNode.style = "padding: 0";
document.querySelector(".header_bar").addEventListener("click", function(event) {
if (event.target.tagName === "BUTTON") {
const buttonText = event.target.textContent.trim();
let chat_visible = (buttonText == "Chat");
let default_visible = (buttonText == "Default");
let notebook_visible = (buttonText == "Notebook");
// Check if one of the generation tabs is visible
if (chat_visible || notebook_visible || default_visible) {
extensions && (extensions.style.display = "flex");
if (chat_visible) {
this.style.marginBottom = "0px";
extensions && (extensions.style.maxWidth = "880px");
extensions && (extensions.style.padding = "0px");
} else {
this.style.marginBottom = "19px";
extensions && (extensions.style.maxWidth = "none");
extensions && (extensions.style.padding = "15px");
}
} else {
this.style.marginBottom = "19px";
extensions && (extensions.style.display = "none");
}
}
});
//------------------------------------------------
// Keyboard shortcuts
//------------------------------------------------
document.addEventListener("keydown", function(event) {
// Stop generation on Esc pressed
if (event.key === "Escape") {
// Find the element with id 'stop' and click it
var stopButton = document.getElementById("stop");
if (stopButton) {
stopButton.click();
}
}
// Show chat controls on Ctrl + S
else if (event.ctrlKey && event.key == "s") {
event.preventDefault();
var showControlsElement = document.getElementById("show-controls");
if (showControlsElement && showControlsElement.childNodes.length >= 4) {
showControlsElement.childNodes[3].click();
var arr = document.getElementById("chat-input").childNodes[2].childNodes;
arr[arr.length - 1].focus();
}
}
// Regenerate on Ctrl + Enter
else if (event.ctrlKey && event.key === "Enter") {
event.preventDefault();
document.getElementById("Regenerate").click();
}
// Continue on Alt + Enter
else if (event.altKey && event.key === "Enter") {
event.preventDefault();
document.getElementById("Continue").click();
}
// Remove last on Ctrl + Shift + Backspace
else if (event.ctrlKey && event.shiftKey && event.key === "Backspace") {
event.preventDefault();
document.getElementById("Remove-last").click();
}
// Copy last on Ctrl + Shift + K
else if (event.ctrlKey && event.shiftKey && event.key === "K") {
event.preventDefault();
document.getElementById("Copy-last").click();
}
// Replace last on Ctrl + Shift + L
else if (event.ctrlKey && event.shiftKey && event.key === "L") {
event.preventDefault();
document.getElementById("Replace-last").click();
}
// Impersonate on Ctrl + Shift + M
else if (event.ctrlKey && event.shiftKey && event.key === "M") {
event.preventDefault();
document.getElementById("Impersonate").click();
}
});
//------------------------------------------------
// Position the chat typing dots
//------------------------------------------------
typing = document.getElementById("typing-container");
typingParent = typing.parentNode;
typingSibling = typing.previousElementSibling;
typingSibling.insertBefore(typing, typingSibling.childNodes[2]);
//------------------------------------------------
// Chat scrolling
//------------------------------------------------
const targetElement = document.getElementById("chat").parentNode.parentNode.parentNode;
targetElement.classList.add("pretty_scrollbar");
targetElement.classList.add("chat-parent");
let isScrolled = false;
targetElement.addEventListener("scroll", function() {
let diff = targetElement.scrollHeight - targetElement.clientHeight;
if(Math.abs(targetElement.scrollTop - diff) <= 10 || diff == 0) {
isScrolled = false;
} else {
isScrolled = true;
}
});
// Create a MutationObserver instance
const observer = new MutationObserver(function(mutations) {
mutations.forEach(function(mutation) {
updateCssProperties();
if(!isScrolled) {
targetElement.scrollTop = targetElement.scrollHeight;
}
const firstChild = targetElement.children[0];
if (firstChild.classList.contains("generating")) {
typing.parentNode.classList.add("visible-dots");
document.getElementById("stop").style.display = "flex";
document.getElementById("Generate").style.display = "none";
} else {
typing.parentNode.classList.remove("visible-dots");
document.getElementById("stop").style.display = "none";
document.getElementById("Generate").style.display = "flex";
}
});
});
// Configure the observer to watch for changes in the subtree and attributes
const config = {
childList: true,
subtree: true,
characterData: true,
attributeOldValue: true,
characterDataOldValue: true
};
// Start observing the target element
observer.observe(targetElement, config);
//------------------------------------------------
// Add some scrollbars
//------------------------------------------------
const textareaElements = document.querySelectorAll(".add_scrollbar textarea");
for(i = 0; i < textareaElements.length; i++) {
textareaElements[i].classList.remove("scroll-hide");
textareaElements[i].classList.add("pretty_scrollbar");
textareaElements[i].style.resize = "none";
}
//------------------------------------------------
// Remove some backgrounds
//------------------------------------------------
const noBackgroundelements = document.querySelectorAll(".no-background");
for(i = 0; i < noBackgroundelements.length; i++) {
noBackgroundelements[i].parentNode.style.border = "none";
noBackgroundelements[i].parentNode.parentNode.parentNode.style.alignItems = "center";
}
const slimDropdownElements = document.querySelectorAll(".slim-dropdown");
for (i = 0; i < slimDropdownElements.length; i++) {
const parentNode = slimDropdownElements[i].parentNode;
parentNode.style.background = "transparent";
parentNode.style.border = "0";
}
//------------------------------------------------
// Create the hover menu in the chat tab
// The show/hide events were adapted from:
// https://github.com/SillyTavern/SillyTavern/blob/6c8bd06308c69d51e2eb174541792a870a83d2d6/public/script.js
//------------------------------------------------
var buttonsInChat = document.querySelectorAll("#chat-tab:not(.old-ui) #chat-buttons button");
var button = document.getElementById("hover-element-button");
var menu = document.getElementById("hover-menu");
var istouchscreen = (navigator.maxTouchPoints > 0) || "ontouchstart" in document.documentElement;
function showMenu() {
menu.style.display = "flex"; // Show the menu
}
function hideMenu() {
menu.style.display = "none"; // Hide the menu
if (!istouchscreen) {
document.querySelector("#chat-input textarea").focus(); // Focus on the chat input
}
}
if (buttonsInChat.length > 0) {
for (let i = buttonsInChat.length - 1; i >= 0; i--) {
const thisButton = buttonsInChat[i];
menu.appendChild(thisButton);
thisButton.addEventListener("click", () => {
hideMenu();
});
const buttonText = thisButton.textContent;
const matches = buttonText.match(/(\(.*?\))/);
if (matches && matches.length > 1) {
// Apply the transparent-substring class to the matched substring
const substring = matches[1];
const newText = buttonText.replace(substring, `&nbsp;<span class="transparent-substring">${substring.slice(1, -1)}</span>`);
thisButton.innerHTML = newText;
}
}
} else {
buttonsInChat = document.querySelectorAll("#chat-tab.old-ui #chat-buttons button");
for (let i = 0; i < buttonsInChat.length; i++) {
buttonsInChat[i].textContent = buttonsInChat[i].textContent.replace(/ \(.*?\)/, "");
}
document.getElementById("gr-hover-container").style.display = "none";
}
function isMouseOverButtonOrMenu() {
return menu.matches(":hover") || button.matches(":hover");
}
button.addEventListener("mouseenter", function () {
if (!istouchscreen) {
showMenu();
}
});
button.addEventListener("click", function () {
if (menu.style.display === "flex") {
hideMenu();
}
else {
showMenu();
}
});
// Add event listener for mouseleave on the button
button.addEventListener("mouseleave", function () {
// Delay to prevent menu hiding when the mouse leaves the button into the menu
setTimeout(function () {
if (!isMouseOverButtonOrMenu()) {
hideMenu();
}
}, 100);
});
// Add event listener for mouseleave on the menu
menu.addEventListener("mouseleave", function () {
// Delay to prevent menu hide when the mouse leaves the menu into the button
setTimeout(function () {
if (!isMouseOverButtonOrMenu()) {
hideMenu();
}
}, 100);
});
// Add event listener for click anywhere in the document
document.addEventListener("click", function (event) {
// Check if the click is outside the button/menu and the menu is visible
if (!isMouseOverButtonOrMenu() && menu.style.display === "flex") {
hideMenu();
}
if (event.target.classList.contains("pfp_character")) {
toggleBigPicture();
}
});
//------------------------------------------------
// Relocate the "Show controls" checkbox
//------------------------------------------------
var elementToMove = document.getElementById("show-controls");
var parent = elementToMove.parentNode;
for (var i = 0; i < 2; i++) {
parent = parent.parentNode;
}
parent.insertBefore(elementToMove, parent.firstChild);
//------------------------------------------------
// Make the chat input grow upwards instead of downwards
//------------------------------------------------
document.getElementById("show-controls").parentNode.style.position = "absolute";
document.getElementById("show-controls").parentNode.style.bottom = "0px";
//------------------------------------------------
// Focus on the chat input
//------------------------------------------------
document.querySelector("#chat-input textarea").focus();
//------------------------------------------------
// Show enlarged character picture when the profile
// picture is clicked on
//------------------------------------------------
let bigPictureVisible = false;
function addBigPicture() {
var imgElement = document.createElement("img");
var timestamp = new Date().getTime();
imgElement.src = "/file/cache/pfp_character.png?time=" + timestamp;
imgElement.classList.add("bigProfilePicture");
var imgElementParent = document.getElementById("chat").parentNode.parentNode.parentNode.parentNode.parentNode.parentNode.parentNode;
imgElementParent.appendChild(imgElement);
}
function deleteBigPicture() {
var bigProfilePictures = document.querySelectorAll(".bigProfilePicture");
bigProfilePictures.forEach(function (element) {
element.parentNode.removeChild(element);
});
}
function toggleBigPicture() {
if(bigPictureVisible) {
deleteBigPicture();
bigPictureVisible = false;
} else {
addBigPicture();
bigPictureVisible = true;
}
}
//------------------------------------------------
// Define global CSS properties for resizing and
// positioning certain elements
//------------------------------------------------
let currentChatInputHeight = 0;
function updateCssProperties() {
// Set the height of the chat area
const chatContainer = document.getElementById("chat").parentNode.parentNode.parentNode;
const chatInputHeight = document.querySelector("#chat-input textarea").clientHeight;
if (chatContainer.clientHeight > 0) {
const newChatHeight = `${chatContainer.clientHeight - chatInputHeight + 40}px`;
document.documentElement.style.setProperty("--chat-height", newChatHeight);
document.documentElement.style.setProperty("--input-delta", `${chatInputHeight - 40}px`);
// Set the position offset of the chat input box
const header = document.querySelector(".header_bar");
const headerHeight = `${header.clientHeight}px`;
document.documentElement.style.setProperty("--header-height", headerHeight);
// Offset the scroll position of the chat area
if (chatInputHeight !== currentChatInputHeight) {
chatContainer.scrollTop += chatInputHeight > currentChatInputHeight ? chatInputHeight : -chatInputHeight;
currentChatInputHeight = chatInputHeight;
}
}
}
new ResizeObserver(updateCssProperties)
.observe(document.querySelector("#chat-input textarea"));
window.addEventListener("resize", updateCssProperties);

View file

@ -0,0 +1,40 @@
// Functions for downloading JSON files
function getCurrentTimestamp() {
const now = new Date();
const timezoneOffset = now.getTimezoneOffset() * 60000; // Convert to milliseconds
const localTime = new Date(now.getTime() - timezoneOffset);
const formattedTimestamp = localTime.toISOString().replace(/[-:]/g, "").slice(0, 15);
return formattedTimestamp;
}
function saveFile(contents, filename) {
const element = document.createElement("a");
element.setAttribute("href", "data:text/plain;charset=utf-8," + encodeURIComponent(contents));
element.setAttribute("download", filename);
element.style.display = "none";
document.body.appendChild(element);
element.click();
document.body.removeChild(element);
}
function saveHistory(history, character, mode) {
let path = null;
if (["chat", "chat-instruct"].includes(mode) && character && character.trim() !== "") {
path = `history_${character}_${getCurrentTimestamp()}.json`;
} else {
try {
path = `history_${mode}_${getCurrentTimestamp()}.json`;
} catch (error) {
path = `history_${getCurrentTimestamp()}.json`;
}
}
saveFile(history, path);
}
function saveSession(session) {
let path = null;
path = `session_${getCurrentTimestamp()}.json`;
saveFile(session, path);
}

View file

@ -0,0 +1,30 @@
const belowChatInput = document.querySelectorAll("#chat-tab > div > :nth-child(n+2), #extensions");
const chatParent = document.querySelector(".chat-parent");
function toggle_controls(value) {
if (value) {
belowChatInput.forEach(element => {
element.style.display = "inherit";
});
chatParent.classList.remove("bigchat");
document.getElementById("chat-input-row").classList.remove("bigchat");
document.getElementById("chat-col").classList.remove("bigchat");
document.getElementById("chat-tab").style.paddingBottom = "";
let gallery_element = document.getElementById("gallery-extension");
if (gallery_element) {
gallery_element.style.display = "block";
}
} else {
belowChatInput.forEach(element => {
element.style.display = "none";
});
chatParent.classList.add("bigchat");
document.getElementById("chat-input-row").classList.add("bigchat");
document.getElementById("chat-col").classList.add("bigchat");
document.getElementById("chat-tab").style.paddingBottom = "0px";
}
}

View file

@ -0,0 +1,59 @@
let chat_tab = document.getElementById("chat-tab");
let main_parent = chat_tab.parentNode;
function scrollToTop() {
window.scrollTo({
top: 0,
// behavior: 'smooth'
});
}
function findButtonsByText(buttonText) {
const buttons = document.getElementsByTagName("button");
const matchingButtons = [];
buttonText = buttonText.trim();
for (let i = 0; i < buttons.length; i++) {
const button = buttons[i];
const buttonInnerText = button.textContent.trim();
if (buttonInnerText === buttonText) {
matchingButtons.push(button);
}
}
return matchingButtons;
}
function switch_to_chat() {
let chat_tab_button = main_parent.childNodes[0].childNodes[1];
chat_tab_button.click();
scrollToTop();
}
function switch_to_default() {
let default_tab_button = main_parent.childNodes[0].childNodes[4];
default_tab_button.click();
scrollToTop();
}
function switch_to_notebook() {
let notebook_tab_button = main_parent.childNodes[0].childNodes[7];
notebook_tab_button.click();
findButtonsByText("Raw")[1].click();
scrollToTop();
}
function switch_to_generation_parameters() {
let parameters_tab_button = main_parent.childNodes[0].childNodes[10];
parameters_tab_button.click();
findButtonsByText("Generation")[0].click();
scrollToTop();
}
function switch_to_character() {
let parameters_tab_button = main_parent.childNodes[0].childNodes[10];
parameters_tab_button.click();
findButtonsByText("Character")[0].click();
scrollToTop();
}

View file

@ -0,0 +1,7 @@
function updateBigPicture() {
var existingElement = document.querySelector(".bigProfilePicture");
if (existingElement) {
var timestamp = new Date().getTime();
existingElement.src = "/file/cache/pfp_character.png?time=" + timestamp;
}
}

View file

@ -0,0 +1,37 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/RoPE.py
def get_alpha_value(alpha, base):
'''
Gets alpha_value from alpha_value and rope_freq_base
'''
if base > 0:
return (base/10000.) ** (63/64.)
else:
return alpha
def get_rope_freq_base(alpha, base):
'''
Gets rope_freq_base from alpha_value and rope_freq_base
'''
if base > 0:
return base
else:
return 10000 * alpha ** (64/63.)
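These two helpers are inverse conversions: `rope_freq_base = 10000 * alpha ** (64/63)`, so converting an alpha value to a frequency base and back recovers the original value. A quick round-trip check (assuming the webui's `modules` package is importable; the printed numbers are approximate):

```python
# Round-trip check of the alpha_value <-> rope_freq_base conversion above.
from modules.RoPE import get_alpha_value, get_rope_freq_base

base = get_rope_freq_base(alpha=2.0, base=0)  # base == 0 means "derive it from alpha"
print(base)                                   # ~20221.3, i.e. 10000 * 2 ** (64/63)

alpha = get_alpha_value(alpha=0, base=base)   # a non-zero base takes precedence
print(alpha)                                  # ~2.0, the original alpha is recovered
```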

View file

@ -0,0 +1,66 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/block_requests.py
import builtins
import io
import requests
from modules.logging_colors import logger
original_open = open
original_get = requests.get
class RequestBlocker:
def __enter__(self):
requests.get = my_get
def __exit__(self, exc_type, exc_value, traceback):
requests.get = original_get
class OpenMonkeyPatch:
def __enter__(self):
builtins.open = my_open
def __exit__(self, exc_type, exc_value, traceback):
builtins.open = original_open
def my_get(url, **kwargs):
logger.info('Unwanted HTTP request redirected to localhost :)')
kwargs.setdefault('allow_redirects', True)
return requests.api.request('get', 'http://127.0.0.1/', **kwargs)
# Kindly provided by our friend WizardLM-30B
def my_open(*args, **kwargs):
filename = str(args[0])
if filename.endswith('index.html'):
with original_open(*args, **kwargs) as f:
file_contents = f.read()
file_contents = file_contents.replace(b'\t\t<script\n\t\t\tsrc="https://cdnjs.cloudflare.com/ajax/libs/iframe-resizer/4.3.7/iframeResizer.contentWindow.min.js"\n\t\t\tasync\n\t\t></script>', b'')
file_contents = file_contents.replace(b'cdnjs.cloudflare.com', b'127.0.0.1')
return io.BytesIO(file_contents)
else:
return original_open(*args, **kwargs)
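`RequestBlocker` and `OpenMonkeyPatch` are ordinary context managers that temporarily swap out `requests.get` and `builtins.open` and restore the originals on exit. A sketch of the intended usage (the wrapped import is illustrative; any `requests.get` issued inside the block is redirected to `127.0.0.1`):

```python
# Sketch only: silence HTTP calls made at import time, then restore normal behaviour.
import requests

from modules.block_requests import RequestBlocker

with RequestBlocker():
    import gradio  # e.g. a package that phones home when imported

resp = requests.get("https://example.com")  # outside the block, requests.get is the original again
print(resp.status_code)
```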

View file

@ -0,0 +1,122 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/callbacks.py
import gc
import traceback
from queue import Queue
from threading import Thread
import torch
import transformers
from transformers import is_torch_xpu_available
import modules.shared as shared
class StopNowException(Exception):
pass
class _StopEverythingStoppingCriteria(transformers.StoppingCriteria):
def __init__(self):
transformers.StoppingCriteria.__init__(self)
def __call__(self, input_ids: torch.LongTensor, _scores: torch.FloatTensor) -> bool:
return shared.stop_everything
class Stream(transformers.StoppingCriteria):
def __init__(self, callback_func=None):
self.callback_func = callback_func
def __call__(self, input_ids, scores) -> bool:
if self.callback_func is not None:
self.callback_func(input_ids[0])
return False
class Iteratorize:
"""
Transforms a function that takes a callback
into a lazy iterator (generator).
Adapted from: https://stackoverflow.com/a/9969000
"""
def __init__(self, func, args=None, kwargs=None, callback=None):
self.mfunc = func
self.c_callback = callback
self.q = Queue()
self.sentinel = object()
self.args = args or []
self.kwargs = kwargs or {}
self.stop_now = False
def _callback(val):
if self.stop_now or shared.stop_everything:
raise StopNowException
self.q.put(val)
def gentask():
try:
ret = self.mfunc(callback=_callback, *args, **self.kwargs)
except StopNowException:
pass
except:
traceback.print_exc()
pass
clear_torch_cache()
self.q.put(self.sentinel)
if self.c_callback:
self.c_callback(ret)
self.thread = Thread(target=gentask)
self.thread.start()
def __iter__(self):
return self
def __next__(self):
obj = self.q.get(True, None)
if obj is self.sentinel:
raise StopIteration
else:
return obj
def __del__(self):
clear_torch_cache()
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.stop_now = True
clear_torch_cache()
def clear_torch_cache():
gc.collect()
if not shared.args.cpu:
if is_torch_xpu_available():
torch.xpu.empty_cache()
else:
torch.cuda.empty_cache()
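`Iteratorize` is what turns a blocking, callback-based generate call into a token stream the UI can iterate over. A hypothetical sketch of the pattern, where `generate_with_callback` stands in for the real model call:

```python
# Hypothetical sketch; generate_with_callback is a stand-in for model.generate
# wrapped with a Stream(callback_func=...) stopping criterion.
from modules.callbacks import Iteratorize

def generate_with_callback(callback=None, **kwargs):
    for token in ["Hello", ",", " world"]:
        callback(token)  # the Stream criterion would call this once per new token

with Iteratorize(generate_with_callback, args=[], kwargs={}) as generator:
    for token in generator:
        print(token, end="", flush=True)
```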
@ -0,0 +1,881 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/chat.py
import base64
import copy
import functools
import html
import json
import re
from datetime import datetime
from functools import partial
from pathlib import Path
import gradio as gr
import yaml
from jinja2.sandbox import ImmutableSandboxedEnvironment
from PIL import Image
import modules.shared as shared
from modules.extensions import apply_extensions
from modules.html_generator import chat_html_wrapper, make_thumbnail
from modules.logging_colors import logger
from modules.text_generation import (
generate_reply,
get_encoded_length,
get_max_prompt_length
)
from modules.utils import delete_file, get_available_characters, save_file
# Copied from the Transformers library
jinja_env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
def str_presenter(dumper, data):
"""
Copied from https://github.com/yaml/pyyaml/issues/240
Makes pyyaml output prettier multiline strings.
"""
if data.count('\n') > 0:
return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
return dumper.represent_scalar('tag:yaml.org,2002:str', data)
yaml.add_representer(str, str_presenter)
yaml.representer.SafeRepresenter.add_representer(str, str_presenter)
def get_generation_prompt(renderer, impersonate=False, strip_trailing_spaces=True):
'''
Given a Jinja template, reverse-engineers the prefix and the suffix for
an assistant message (if impersonate=False) or a user message
(if impersonate=True)
'''
if impersonate:
messages = [
{"role": "user", "content": "<<|user-message-1|>>"},
{"role": "user", "content": "<<|user-message-2|>>"},
]
else:
messages = [
{"role": "assistant", "content": "<<|user-message-1|>>"},
{"role": "assistant", "content": "<<|user-message-2|>>"},
]
prompt = renderer(messages=messages)
suffix_plus_prefix = prompt.split("<<|user-message-1|>>")[1].split("<<|user-message-2|>>")[0]
suffix = prompt.split("<<|user-message-2|>>")[1]
prefix = suffix_plus_prefix[len(suffix):]
if strip_trailing_spaces:
prefix = prefix.rstrip(' ')
return prefix, suffix
def generate_chat_prompt(user_input, state, **kwargs):
impersonate = kwargs.get('impersonate', False)
_continue = kwargs.get('_continue', False)
also_return_rows = kwargs.get('also_return_rows', False)
history = kwargs.get('history', state['history'])['internal']
# Templates
chat_template = jinja_env.from_string(state['chat_template_str'])
instruction_template = jinja_env.from_string(state['instruction_template_str'])
chat_renderer = partial(chat_template.render, add_generation_prompt=False, name1=state['name1'], name2=state['name2'])
instruct_renderer = partial(instruction_template.render, add_generation_prompt=False)
messages = []
if state['mode'] == 'instruct':
renderer = instruct_renderer
if state['custom_system_message'].strip() != '':
messages.append({"role": "system", "content": state['custom_system_message']})
else:
renderer = chat_renderer
if state['context'].strip() != '':
context = replace_character_names(state['context'], state['name1'], state['name2'])
messages.append({"role": "system", "content": context})
insert_pos = len(messages)
for user_msg, assistant_msg in reversed(history):
user_msg = user_msg.strip()
assistant_msg = assistant_msg.strip()
if assistant_msg:
messages.insert(insert_pos, {"role": "assistant", "content": assistant_msg})
if user_msg not in ['', '<|BEGIN-VISIBLE-CHAT|>']:
messages.insert(insert_pos, {"role": "user", "content": user_msg})
user_input = user_input.strip()
if user_input and not impersonate and not _continue:
messages.append({"role": "user", "content": user_input})
def remove_extra_bos(prompt):
for bos_token in ['<s>', '<|startoftext|>']:
while prompt.startswith(bos_token):
prompt = prompt[len(bos_token):]
return prompt
def make_prompt(messages):
if state['mode'] == 'chat-instruct' and _continue:
prompt = renderer(messages=messages[:-1])
else:
prompt = renderer(messages=messages)
if state['mode'] == 'chat-instruct':
outer_messages = []
if state['custom_system_message'].strip() != '':
outer_messages.append({"role": "system", "content": state['custom_system_message']})
prompt = remove_extra_bos(prompt)
command = state['chat-instruct_command']
command = command.replace('<|character|>', state['name2'] if not impersonate else state['name1'])
command = command.replace('<|prompt|>', prompt)
if _continue:
prefix = get_generation_prompt(renderer, impersonate=impersonate, strip_trailing_spaces=False)[0]
prefix += messages[-1]["content"]
else:
prefix = get_generation_prompt(renderer, impersonate=impersonate)[0]
if not impersonate:
prefix = apply_extensions('bot_prefix', prefix, state)
outer_messages.append({"role": "user", "content": command})
outer_messages.append({"role": "assistant", "content": prefix})
prompt = instruction_template.render(messages=outer_messages)
suffix = get_generation_prompt(instruct_renderer, impersonate=False)[1]
prompt = prompt[:-len(suffix)]
else:
if _continue:
suffix = get_generation_prompt(renderer, impersonate=impersonate)[1]
prompt = prompt[:-len(suffix)]
else:
prefix = get_generation_prompt(renderer, impersonate=impersonate)[0]
if state['mode'] == 'chat' and not impersonate:
prefix = apply_extensions('bot_prefix', prefix, state)
prompt += prefix
prompt = remove_extra_bos(prompt)
return prompt
prompt = make_prompt(messages)
# Handle truncation
max_length = get_max_prompt_length(state)
while len(messages) > 0 and get_encoded_length(prompt) > max_length:
# Try to save the system message
if len(messages) > 1 and messages[0]['role'] == 'system':
messages.pop(1)
else:
messages.pop(0)
prompt = make_prompt(messages)
if also_return_rows:
return prompt, [message['content'] for message in messages]
else:
return prompt
def get_stopping_strings(state):
stopping_strings = []
renderers = []
if state['mode'] in ['instruct', 'chat-instruct']:
template = jinja_env.from_string(state['instruction_template_str'])
renderer = partial(template.render, add_generation_prompt=False)
renderers.append(renderer)
if state['mode'] in ['chat', 'chat-instruct']:
template = jinja_env.from_string(state['chat_template_str'])
renderer = partial(template.render, add_generation_prompt=False, name1=state['name1'], name2=state['name2'])
renderers.append(renderer)
for renderer in renderers:
prefix_bot, suffix_bot = get_generation_prompt(renderer, impersonate=False)
prefix_user, suffix_user = get_generation_prompt(renderer, impersonate=True)
stopping_strings += [
suffix_user + prefix_bot,
suffix_user + prefix_user,
suffix_bot + prefix_bot,
suffix_bot + prefix_user,
]
if 'stopping_strings' in state and isinstance(state['stopping_strings'], list):
stopping_strings += state.pop('stopping_strings')
return list(set(stopping_strings))
def chatbot_wrapper(text, state, regenerate=False, _continue=False, loading_message=True, for_ui=False):
history = state['history']
output = copy.deepcopy(history)
output = apply_extensions('history', output)
state = apply_extensions('state', state)
visible_text = None
stopping_strings = get_stopping_strings(state)
is_stream = state['stream']
# Prepare the input
if not (regenerate or _continue):
visible_text = html.escape(text)
# Apply extensions
text, visible_text = apply_extensions('chat_input', text, visible_text, state)
text = apply_extensions('input', text, state, is_chat=True)
output['internal'].append([text, ''])
output['visible'].append([visible_text, ''])
# *Is typing...*
if loading_message:
yield {
'visible': output['visible'][:-1] + [[output['visible'][-1][0], shared.processing_message]],
'internal': output['internal']
}
else:
text, visible_text = output['internal'][-1][0], output['visible'][-1][0]
if regenerate:
if loading_message:
yield {
'visible': output['visible'][:-1] + [[visible_text, shared.processing_message]],
'internal': output['internal'][:-1] + [[text, '']]
}
elif _continue:
last_reply = [output['internal'][-1][1], output['visible'][-1][1]]
if loading_message:
yield {
'visible': output['visible'][:-1] + [[visible_text, last_reply[1] + '...']],
'internal': output['internal']
}
if shared.model_name == 'None' or shared.model is None:
raise ValueError("No model is loaded! Select one in the Model tab.")
# Generate the prompt
kwargs = {
'_continue': _continue,
'history': output if _continue else {k: v[:-1] for k, v in output.items()}
}
prompt = apply_extensions('custom_generate_chat_prompt', text, state, **kwargs)
if prompt is None:
prompt = generate_chat_prompt(text, state, **kwargs)
# Generate
reply = None
for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True, for_ui=for_ui)):
# Extract the reply
visible_reply = reply
if state['mode'] in ['chat', 'chat-instruct']:
visible_reply = re.sub("(<USER>|<user>|{{user}})", state['name1'], reply)
visible_reply = html.escape(visible_reply)
if shared.stop_everything:
output['visible'][-1][1] = apply_extensions('output', output['visible'][-1][1], state, is_chat=True)
yield output
return
if _continue:
output['internal'][-1] = [text, last_reply[0] + reply]
output['visible'][-1] = [visible_text, last_reply[1] + visible_reply]
if is_stream:
yield output
elif not (j == 0 and visible_reply.strip() == ''):
output['internal'][-1] = [text, reply.lstrip(' ')]
output['visible'][-1] = [visible_text, visible_reply.lstrip(' ')]
if is_stream:
yield output
output['visible'][-1][1] = apply_extensions('output', output['visible'][-1][1], state, is_chat=True)
yield output
def impersonate_wrapper(text, state):
static_output = chat_html_wrapper(state['history'], state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
if shared.model_name == 'None' or shared.model is None:
logger.error("No model is loaded! Select one in the Model tab.")
yield '', static_output
return
prompt = generate_chat_prompt('', state, impersonate=True)
stopping_strings = get_stopping_strings(state)
yield text + '...', static_output
reply = None
for reply in generate_reply(prompt + text, state, stopping_strings=stopping_strings, is_chat=True):
yield (text + reply).lstrip(' '), static_output
if shared.stop_everything:
return
def generate_chat_reply(text, state, regenerate=False, _continue=False, loading_message=True, for_ui=False):
history = state['history']
if regenerate or _continue:
text = ''
if (len(history['visible']) == 1 and not history['visible'][0][0]) or len(history['internal']) == 0:
yield history
return
for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
yield history
def character_is_loaded(state, raise_exception=False):
if state['mode'] in ['chat', 'chat-instruct'] and state['name2'] == '':
logger.error('It looks like no character is loaded. Please load one under Parameters > Character.')
if raise_exception:
raise ValueError
return False
else:
return True
def generate_chat_reply_wrapper(text, state, regenerate=False, _continue=False):
'''
Same as above but returns HTML for the UI
'''
if not character_is_loaded(state):
return
if state['start_with'] != '' and not _continue:
if regenerate:
text, state['history'] = remove_last_message(state['history'])
regenerate = False
_continue = True
send_dummy_message(text, state)
send_dummy_reply(state['start_with'], state)
for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
yield chat_html_wrapper(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu']), history
def remove_last_message(history):
if len(history['visible']) > 0 and history['internal'][-1][0] != '<|BEGIN-VISIBLE-CHAT|>':
last = history['visible'].pop()
history['internal'].pop()
else:
last = ['', '']
return html.unescape(last[0]), history
def send_last_reply_to_input(history):
if len(history['visible']) > 0:
return html.unescape(history['visible'][-1][1])
else:
return ''
def replace_last_reply(text, state):
history = state['history']
if len(text.strip()) == 0:
return history
elif len(history['visible']) > 0:
history['visible'][-1][1] = html.escape(text)
history['internal'][-1][1] = apply_extensions('input', text, state, is_chat=True)
return history
def send_dummy_message(text, state):
history = state['history']
history['visible'].append([html.escape(text), ''])
history['internal'].append([apply_extensions('input', text, state, is_chat=True), ''])
return history
def send_dummy_reply(text, state):
history = state['history']
if len(history['visible']) > 0 and not history['visible'][-1][1] == '':
history['visible'].append(['', ''])
history['internal'].append(['', ''])
history['visible'][-1][1] = html.escape(text)
history['internal'][-1][1] = apply_extensions('input', text, state, is_chat=True)
return history
def redraw_html(history, name1, name2, mode, style, character, reset_cache=False):
return chat_html_wrapper(history, name1, name2, mode, style, character, reset_cache=reset_cache)
def start_new_chat(state):
mode = state['mode']
history = {'internal': [], 'visible': []}
if mode != 'instruct':
greeting = replace_character_names(state['greeting'], state['name1'], state['name2'])
if greeting != '':
history['internal'] += [['<|BEGIN-VISIBLE-CHAT|>', greeting]]
history['visible'] += [['', apply_extensions('output', greeting, state, is_chat=True)]]
unique_id = datetime.now().strftime('%Y%m%d-%H-%M-%S')
save_history(history, unique_id, state['character_menu'], state['mode'])
return history
def get_history_file_path(unique_id, character, mode):
if mode == 'instruct':
p = Path(f'logs/instruct/{unique_id}.json')
else:
p = Path(f'logs/chat/{character}/{unique_id}.json')
return p
def save_history(history, unique_id, character, mode):
if shared.args.multi_user:
return
p = get_history_file_path(unique_id, character, mode)
if not p.parent.is_dir():
p.parent.mkdir(parents=True)
with open(p, 'w', encoding='utf-8') as f:
f.write(json.dumps(history, indent=4))
def rename_history(old_id, new_id, character, mode):
if shared.args.multi_user:
return
old_p = get_history_file_path(old_id, character, mode)
new_p = get_history_file_path(new_id, character, mode)
if new_p.parent != old_p.parent:
logger.error(f"The following path is not allowed: {new_p}.")
elif new_p == old_p:
logger.info("The provided path is identical to the old one.")
else:
logger.info(f"Renaming {old_p} to {new_p}")
old_p.rename(new_p)
def find_all_histories(state):
if shared.args.multi_user:
return ['']
if state['mode'] == 'instruct':
paths = Path('logs/instruct').glob('*.json')
else:
character = state['character_menu']
# Handle obsolete filenames and paths
old_p = Path(f'logs/{character}_persistent.json')
new_p = Path(f'logs/persistent_{character}.json')
if old_p.exists():
logger.warning(f"Renaming {old_p} to {new_p}")
old_p.rename(new_p)
if new_p.exists():
unique_id = datetime.now().strftime('%Y%m%d-%H-%M-%S')
p = get_history_file_path(unique_id, character, state['mode'])
logger.warning(f"Moving {new_p} to {p}")
p.parent.mkdir(exist_ok=True)
new_p.rename(p)
paths = Path(f'logs/chat/{character}').glob('*.json')
histories = sorted(paths, key=lambda x: x.stat().st_mtime, reverse=True)
histories = [path.stem for path in histories]
return histories
def load_latest_history(state):
'''
Loads the latest history for the given character in chat or chat-instruct
mode, or the latest instruct history for instruct mode.
'''
if shared.args.multi_user:
return start_new_chat(state)
histories = find_all_histories(state)
if len(histories) > 0:
history = load_history(histories[0], state['character_menu'], state['mode'])
else:
history = start_new_chat(state)
return history
def load_history(unique_id, character, mode):
p = get_history_file_path(unique_id, character, mode)
f = json.loads(open(p, 'rb').read())
if 'internal' in f and 'visible' in f:
history = f
else:
history = {
'internal': f['data'],
'visible': f['data_visible']
}
return history
def load_history_json(file, history):
try:
file = file.decode('utf-8')
f = json.loads(file)
if 'internal' in f and 'visible' in f:
history = f
else:
history = {
'internal': f['data'],
'visible': f['data_visible']
}
return history
except:
return history
def delete_history(unique_id, character, mode):
p = get_history_file_path(unique_id, character, mode)
delete_file(p)
def replace_character_names(text, name1, name2):
text = text.replace('{{user}}', name1).replace('{{char}}', name2)
return text.replace('<USER>', name1).replace('<BOT>', name2)
def generate_pfp_cache(character):
cache_folder = Path(shared.args.disk_cache_dir)
if not cache_folder.exists():
cache_folder.mkdir()
for path in [Path(f"characters/{character}.{extension}") for extension in ['png', 'jpg', 'jpeg']]:
if path.exists():
original_img = Image.open(path)
original_img.save(Path(f'{cache_folder}/pfp_character.png'), format='PNG')
thumb = make_thumbnail(original_img)
thumb.save(Path(f'{cache_folder}/pfp_character_thumb.png'), format='PNG')
return thumb
return None
def load_character(character, name1, name2):
context = greeting = ""
greeting_field = 'greeting'
picture = None
filepath = None
for extension in ["yml", "yaml", "json"]:
filepath = Path(f'characters/{character}.{extension}')
if filepath.exists():
break
if filepath is None or not filepath.exists():
logger.error(f"Could not find the character \"{character}\" inside characters/. No character has been loaded.")
raise ValueError
file_contents = open(filepath, 'r', encoding='utf-8').read()
data = json.loads(file_contents) if extension == "json" else yaml.safe_load(file_contents)
cache_folder = Path(shared.args.disk_cache_dir)
for path in [Path(f"{cache_folder}/pfp_character.png"), Path(f"{cache_folder}/pfp_character_thumb.png")]:
if path.exists():
path.unlink()
picture = generate_pfp_cache(character)
# Finding the bot's name
for k in ['name', 'bot', '<|bot|>', 'char_name']:
if k in data and data[k] != '':
name2 = data[k]
break
# Find the user name (if any)
for k in ['your_name', 'user', '<|user|>']:
if k in data and data[k] != '':
name1 = data[k]
break
if 'context' in data:
context = data['context'].strip()
elif "char_persona" in data:
context = build_pygmalion_style_context(data)
greeting_field = 'char_greeting'
greeting = data.get(greeting_field, greeting)
return name1, name2, picture, greeting, context
def load_instruction_template(template):
for filepath in [Path(f'instruction-templates/{template}.yaml'), Path('instruction-templates/Alpaca.yaml')]:
if filepath.exists():
break
else:
return ''
file_contents = open(filepath, 'r', encoding='utf-8').read()
data = yaml.safe_load(file_contents)
if 'instruction_template' in data:
return data['instruction_template']
else:
return jinja_template_from_old_format(data)
@functools.cache
def load_character_memoized(character, name1, name2):
return load_character(character, name1, name2)
@functools.cache
def load_instruction_template_memoized(template):
return load_instruction_template(template)
def upload_character(file, img, tavern=False):
decoded_file = file if isinstance(file, str) else file.decode('utf-8')
try:
data = json.loads(decoded_file)
except:
data = yaml.safe_load(decoded_file)
if 'char_name' in data:
name = data['char_name']
greeting = data['char_greeting']
context = build_pygmalion_style_context(data)
yaml_data = generate_character_yaml(name, greeting, context)
else:
name = data['name']
yaml_data = generate_character_yaml(data['name'], data['greeting'], data['context'])
outfile_name = name
i = 1
while Path(f'characters/{outfile_name}.yaml').exists():
outfile_name = f'{name}_{i:03d}'
i += 1
with open(Path(f'characters/{outfile_name}.yaml'), 'w', encoding='utf-8') as f:
f.write(yaml_data)
if img is not None:
img.save(Path(f'characters/{outfile_name}.png'))
logger.info(f'New character saved to "characters/{outfile_name}.yaml".')
return gr.update(value=outfile_name, choices=get_available_characters())
def build_pygmalion_style_context(data):
context = ""
if 'char_persona' in data and data['char_persona'] != '':
context += f"{data['char_name']}'s Persona: {data['char_persona']}\n"
if 'world_scenario' in data and data['world_scenario'] != '':
context += f"Scenario: {data['world_scenario']}\n"
if 'example_dialogue' in data and data['example_dialogue'] != '':
context += f"{data['example_dialogue'].strip()}\n"
context = f"{context.strip()}\n"
return context
def upload_tavern_character(img, _json):
_json = {'char_name': _json['name'], 'char_persona': _json['description'], 'char_greeting': _json['first_mes'], 'example_dialogue': _json['mes_example'], 'world_scenario': _json['scenario']}
return upload_character(json.dumps(_json), img, tavern=True)
def check_tavern_character(img):
if "chara" not in img.info:
return "Not a TavernAI card", None, None, gr.update(interactive=False)
decoded_string = base64.b64decode(img.info['chara']).replace(b'\\r\\n', b'\\n')
_json = json.loads(decoded_string)
if "data" in _json:
_json = _json["data"]
return _json['name'], _json['description'], _json, gr.update(interactive=True)
def upload_your_profile_picture(img):
cache_folder = Path(shared.args.disk_cache_dir)
if not cache_folder.exists():
cache_folder.mkdir()
if img is None:
if Path(f"{cache_folder}/pfp_me.png").exists():
Path(f"{cache_folder}/pfp_me.png").unlink()
else:
img = make_thumbnail(img)
img.save(Path(f'{cache_folder}/pfp_me.png'))
logger.info(f'Profile picture saved to "{cache_folder}/pfp_me.png"')
def generate_character_yaml(name, greeting, context):
data = {
'name': name,
'greeting': greeting,
'context': context,
}
data = {k: v for k, v in data.items() if v} # Strip falsy
return yaml.dump(data, sort_keys=False, width=float("inf"))
def generate_instruction_template_yaml(instruction_template):
data = {
'instruction_template': instruction_template
}
return my_yaml_output(data)
def save_character(name, greeting, context, picture, filename):
if filename == "":
logger.error("The filename is empty, so the character will not be saved.")
return
data = generate_character_yaml(name, greeting, context)
filepath = Path(f'characters/{filename}.yaml')
save_file(filepath, data)
path_to_img = Path(f'characters/{filename}.png')
if picture is not None:
picture.save(path_to_img)
logger.info(f'Saved {path_to_img}.')
def delete_character(name, instruct=False):
for extension in ["yml", "yaml", "json"]:
delete_file(Path(f'characters/{name}.{extension}'))
delete_file(Path(f'characters/{name}.png'))
def jinja_template_from_old_format(params, verbose=False):
MASTER_TEMPLATE = """
{%- set ns = namespace(found=false) -%}
{%- for message in messages -%}
{%- if message['role'] == 'system' -%}
{%- set ns.found = true -%}
{%- endif -%}
{%- endfor -%}
{%- if not ns.found -%}
{{- '<|PRE-SYSTEM|>' + '<|SYSTEM-MESSAGE|>' + '<|POST-SYSTEM|>' -}}
{%- endif %}
{%- for message in messages %}
{%- if message['role'] == 'system' -%}
{{- '<|PRE-SYSTEM|>' + message['content'] + '<|POST-SYSTEM|>' -}}
{%- else -%}
{%- if message['role'] == 'user' -%}
{{-'<|PRE-USER|>' + message['content'] + '<|POST-USER|>'-}}
{%- else -%}
{{-'<|PRE-ASSISTANT|>' + message['content'] + '<|POST-ASSISTANT|>' -}}
{%- endif -%}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{-'<|PRE-ASSISTANT-GENERATE|>'-}}
{%- endif -%}
"""
if 'context' in params and '<|system-message|>' in params['context']:
pre_system = params['context'].split('<|system-message|>')[0]
post_system = params['context'].split('<|system-message|>')[1]
else:
pre_system = ''
post_system = ''
pre_user = params['turn_template'].split('<|user-message|>')[0].replace('<|user|>', params['user'])
post_user = params['turn_template'].split('<|user-message|>')[1].split('<|bot|>')[0]
pre_assistant = '<|bot|>' + params['turn_template'].split('<|bot-message|>')[0].split('<|bot|>')[1]
pre_assistant = pre_assistant.replace('<|bot|>', params['bot'])
post_assistant = params['turn_template'].split('<|bot-message|>')[1]
def preprocess(string):
return string.replace('\n', '\\n').replace('\'', '\\\'')
pre_system = preprocess(pre_system)
post_system = preprocess(post_system)
pre_user = preprocess(pre_user)
post_user = preprocess(post_user)
pre_assistant = preprocess(pre_assistant)
post_assistant = preprocess(post_assistant)
if verbose:
print(
'\n',
repr(pre_system) + '\n',
repr(post_system) + '\n',
repr(pre_user) + '\n',
repr(post_user) + '\n',
repr(pre_assistant) + '\n',
repr(post_assistant) + '\n',
)
result = MASTER_TEMPLATE
if 'system_message' in params:
result = result.replace('<|SYSTEM-MESSAGE|>', preprocess(params['system_message']))
else:
result = result.replace('<|SYSTEM-MESSAGE|>', '')
result = result.replace('<|PRE-SYSTEM|>', pre_system)
result = result.replace('<|POST-SYSTEM|>', post_system)
result = result.replace('<|PRE-USER|>', pre_user)
result = result.replace('<|POST-USER|>', post_user)
result = result.replace('<|PRE-ASSISTANT|>', pre_assistant)
result = result.replace('<|PRE-ASSISTANT-GENERATE|>', pre_assistant.rstrip(' '))
result = result.replace('<|POST-ASSISTANT|>', post_assistant)
result = result.strip()
return result
def my_yaml_output(data):
'''
pyyaml is very inconsistent with multiline strings.
for simple instruction template outputs, this is enough.
'''
result = ""
for k in data:
result += k + ": |-\n"
for line in data[k].splitlines():
result += " " + line.rstrip(' ') + "\n"
return result
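Most of the prompt assembly above hinges on `get_generation_prompt`, which renders a template with two marker messages and then splits on them to recover the per-message prefix and suffix. The same trick in isolation, using a toy template rather than one shipped with the UI:

```python
# Standalone illustration of the probing trick used by get_generation_prompt.
from jinja2.sandbox import ImmutableSandboxedEnvironment

jinja_env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
toy_template = (
    "{% for m in messages %}"
    "<|{{ m['role'] }}|>\n{{ m['content'] }}</s>\n"
    "{% endfor %}"
)
renderer = jinja_env.from_string(toy_template).render

messages = [
    {"role": "assistant", "content": "<<|user-message-1|>>"},
    {"role": "assistant", "content": "<<|user-message-2|>>"},
]
prompt = renderer(messages=messages)
suffix_plus_prefix = prompt.split("<<|user-message-1|>>")[1].split("<<|user-message-2|>>")[0]
suffix = prompt.split("<<|user-message-2|>>")[1]
prefix = suffix_plus_prefix[len(suffix):]
print(repr(prefix), repr(suffix))  # '<|assistant|>\n' '</s>\n'
```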
@ -0,0 +1,172 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/evaluate.py
import datetime
from pathlib import Path
import pandas as pd
import torch
from datasets import load_dataset
from tqdm import tqdm
from modules import shared
from modules.logging_colors import logger
from modules.models import clear_torch_cache, load_model, unload_model
from modules.models_settings import get_model_metadata, update_model_parameters
from modules.text_generation import encode
def load_past_evaluations():
if Path('logs/evaluations.csv').exists():
df = pd.read_csv(Path('logs/evaluations.csv'), dtype=str)
df['Perplexity'] = pd.to_numeric(df['Perplexity'])
return df
else:
return pd.DataFrame(columns=['Model', 'LoRAs', 'Dataset', 'Perplexity', 'stride', 'max_length', 'Date', 'Comment'])
past_evaluations = load_past_evaluations()
def save_past_evaluations(df):
global past_evaluations
past_evaluations = df
filepath = Path('logs/evaluations.csv')
filepath.parent.mkdir(parents=True, exist_ok=True)
df.to_csv(filepath, index=False)
def calculate_perplexity(models, input_dataset, stride, _max_length):
'''
Based on:
https://huggingface.co/docs/transformers/perplexity#calculating-ppl-with-fixedlength-models
'''
if not shared.args.no_use_fast:
logger.warning("--no_use_fast is not being used. If tokenizing the input dataset takes a long time, consider loading the model with that option checked.")
global past_evaluations
cumulative_log = ''
cumulative_log += "Loading the input dataset...\n\n"
yield cumulative_log
# Copied from https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/triton/utils/datautils.py
if input_dataset == 'wikitext':
data = load_dataset('wikitext', 'wikitext-2-raw-v1', split='test')
text = "\n\n".join(data['text'])
elif input_dataset == 'ptb':
data = load_dataset('ptb_text_only', 'penn_treebank', split='validation')
text = "\n\n".join(data['sentence'])
elif input_dataset == 'ptb_new':
data = load_dataset('ptb_text_only', 'penn_treebank', split='test')
text = " ".join(data['sentence'])
else:
with open(Path(f'training/datasets/{input_dataset}.txt'), 'r', encoding='utf-8') as f:
text = f.read()
for model in models:
if is_in_past_evaluations(model, input_dataset, stride, _max_length):
cumulative_log += f"`{model}` has already been tested. Ignoring.\n\n"
yield cumulative_log
continue
if model != 'current model':
try:
yield cumulative_log + f"Loading `{model}`...\n\n"
model_settings = get_model_metadata(model)
shared.settings.update({k: v for k, v in model_settings.items() if k in shared.settings}) # hijacking the interface defaults
update_model_parameters(model_settings) # hijacking the command-line arguments
unload_model()
shared.model, shared.tokenizer = load_model(model)
except:
cumulative_log += f"Failed to load `{model}`. Moving on.\n\n"
yield cumulative_log
continue
cumulative_log += f"Processing `{shared.model_name}`...\n\n"
yield cumulative_log + "Tokenizing the input dataset...\n\n"
encodings = encode(text, add_special_tokens=False)
seq_len = encodings.shape[1]
if _max_length:
max_length = _max_length
elif hasattr(shared.model.config, 'max_position_embeddings'):
max_length = shared.model.config.max_position_embeddings
else:
max_length = 2048
nlls = []
prev_end_loc = 0
for begin_loc in tqdm(range(0, seq_len, stride)):
yield cumulative_log + f"Evaluating... {100*begin_loc/seq_len:.2f}%"
end_loc = min(begin_loc + max_length, seq_len)
trg_len = end_loc - prev_end_loc # may be different from stride on last loop
input_ids = encodings[:, begin_loc:end_loc]
target_ids = input_ids.clone()
target_ids[:, :-trg_len] = -100
clear_torch_cache()
with torch.no_grad():
outputs = shared.model(input_ids=input_ids, labels=target_ids)
# loss is calculated using CrossEntropyLoss which averages over valid labels
# N.B. the model only calculates loss over trg_len - 1 labels, because it internally shifts the labels
# to the left by 1.
neg_log_likelihood = outputs.loss
nlls.append(neg_log_likelihood)
prev_end_loc = end_loc
if end_loc == seq_len:
break
ppl = torch.exp(torch.stack(nlls).mean())
add_entry_to_past_evaluations(float(ppl), shared.model_name, input_dataset, stride, _max_length)
save_past_evaluations(past_evaluations)
cumulative_log += f"The perplexity for `{shared.model_name}` is: {float(ppl)}\n\n"
yield cumulative_log
def add_entry_to_past_evaluations(perplexity, model, dataset, stride, max_length):
global past_evaluations
entry = {
'Model': model,
'LoRAs': ', '.join(shared.lora_names) or '-',
'Dataset': dataset,
'Perplexity': perplexity,
'stride': str(stride),
'max_length': str(max_length),
'Date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'Comment': ''
}
past_evaluations = pd.concat([past_evaluations, pd.DataFrame([entry])], ignore_index=True)
def is_in_past_evaluations(model, dataset, stride, max_length):
entries = past_evaluations[(past_evaluations['Model'] == model) &
(past_evaluations['Dataset'] == dataset) &
(past_evaluations['max_length'] == str(max_length)) &
(past_evaluations['stride'] == str(stride))]
if entries.shape[0] > 0:
return True
else:
return False
def generate_markdown_table():
sorted_df = past_evaluations.sort_values(by=['Dataset', 'stride', 'Perplexity', 'Date'])
return sorted_df
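The stride loop in `calculate_perplexity` follows the standard sliding-window recipe: within each window only the last `trg_len` tokens are scored (the rest are masked with `-100`), and the final perplexity is `exp(mean(nlls))`. A stripped-down version of just the window bookkeeping, with dummy token ids and no model involved:

```python
# Minimal sketch of the sliding-window masking used above (dummy data, no model).
import torch

seq_len, max_length, stride = 10, 4, 2
encodings = torch.arange(seq_len).unsqueeze(0)  # shape (1, seq_len)

prev_end_loc = 0
for begin_loc in range(0, seq_len, stride):
    end_loc = min(begin_loc + max_length, seq_len)
    trg_len = end_loc - prev_end_loc           # may differ from stride on the first/last window
    input_ids = encodings[:, begin_loc:end_loc]
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100            # only score tokens not already evaluated
    print(begin_loc, end_loc, trg_len, target_ids.tolist())
    prev_end_loc = end_loc
    if end_loc == seq_len:
        break
```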
@ -0,0 +1,248 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/extensions.py
import traceback
from functools import partial
from inspect import signature
import gradio as gr
import extensions
import modules.shared as shared
from modules.logging_colors import logger
state = {}
available_extensions = []
setup_called = set()
def apply_settings(extension, name):
if not hasattr(extension, 'params'):
return
for param in extension.params:
_id = f"{name}-{param}"
if _id not in shared.settings:
continue
extension.params[param] = shared.settings[_id]
def load_extensions():
global state, setup_called
state = {}
for i, name in enumerate(shared.args.extensions):
if name in available_extensions:
if name != 'api':
logger.info(f'Loading the extension "{name}"')
try:
try:
exec(f"import extensions.{name}.script")
except ModuleNotFoundError:
logger.error(f"Could not import the requirements for '{name}'. Make sure to install the requirements for the extension.\n\nLinux / Mac:\n\npip install -r extensions/{name}/requirements.txt --upgrade\n\nWindows:\n\npip install -r extensions\\{name}\\requirements.txt --upgrade\n\nIf you used the one-click installer, paste the command above in the terminal window opened after launching the cmd script for your OS.")
raise
extension = getattr(extensions, name).script
apply_settings(extension, name)
if extension not in setup_called and hasattr(extension, "setup"):
setup_called.add(extension)
extension.setup()
state[name] = [True, i]
except:
logger.error(f'Failed to load the extension "{name}".')
traceback.print_exc()
# This iterator returns the extensions in the order specified in the command-line
def iterator():
for name in sorted(state, key=lambda x: state[x][1]):
if state[name][0]:
yield getattr(extensions, name).script, name
# Extension functions that map string -> string
def _apply_string_extensions(function_name, text, state, is_chat=False):
for extension, _ in iterator():
if hasattr(extension, function_name):
func = getattr(extension, function_name)
# Handle old extensions without the 'state' arg or
# the 'is_chat' kwarg
count = 0
has_chat = False
for k in signature(func).parameters:
if k == 'is_chat':
has_chat = True
else:
count += 1
if count == 2:
args = [text, state]
else:
args = [text]
if has_chat:
kwargs = {'is_chat': is_chat}
else:
kwargs = {}
text = func(*args, **kwargs)
return text
# Extension functions that modify the chat input (text and its visible version)
def _apply_chat_input_extensions(text, visible_text, state):
for extension, _ in iterator():
if hasattr(extension, 'chat_input_modifier'):
text, visible_text = extension.chat_input_modifier(text, visible_text, state)
return text, visible_text
# custom_generate_chat_prompt handling - currently only the first one will work
def _apply_custom_generate_chat_prompt(text, state, **kwargs):
for extension, _ in iterator():
if hasattr(extension, 'custom_generate_chat_prompt'):
return extension.custom_generate_chat_prompt(text, state, **kwargs)
return None
# Extension that modifies the input parameters before they are used
def _apply_state_modifier_extensions(state):
for extension, _ in iterator():
if hasattr(extension, "state_modifier"):
state = getattr(extension, "state_modifier")(state)
return state
# Extension that modifies the chat history before it is used
def _apply_history_modifier_extensions(history):
for extension, _ in iterator():
if hasattr(extension, "history_modifier"):
history = getattr(extension, "history_modifier")(history)
return history
# Extension functions that override the default tokenizer output - The order of execution is not defined
def _apply_tokenizer_extensions(function_name, state, prompt, input_ids, input_embeds):
for extension, _ in iterator():
if hasattr(extension, function_name):
prompt, input_ids, input_embeds = getattr(extension, function_name)(state, prompt, input_ids, input_embeds)
return prompt, input_ids, input_embeds
# Allow extensions to add their own logits processors to the stack being run.
# Each extension would call `processor_list.append({their LogitsProcessor}())`.
def _apply_logits_processor_extensions(function_name, processor_list, input_ids):
for extension, _ in iterator():
if hasattr(extension, function_name):
result = getattr(extension, function_name)(processor_list, input_ids)
if type(result) is list:
processor_list = result
return processor_list
# Get prompt length in tokens after applying extension functions which override the default tokenizer output
# currently only the first one will work
def _apply_custom_tokenized_length(prompt):
for extension, _ in iterator():
if hasattr(extension, 'custom_tokenized_length'):
return getattr(extension, 'custom_tokenized_length')(prompt)
return None
# Custom generate reply handling - currently only the first one will work
def _apply_custom_generate_reply():
for extension, _ in iterator():
if hasattr(extension, 'custom_generate_reply'):
return getattr(extension, 'custom_generate_reply')
return None
def _apply_custom_css():
all_css = ''
for extension, _ in iterator():
if hasattr(extension, 'custom_css'):
all_css += getattr(extension, 'custom_css')()
return all_css
def _apply_custom_js():
all_js = ''
for extension, _ in iterator():
if hasattr(extension, 'custom_js'):
all_js += getattr(extension, 'custom_js')()
return all_js
def create_extensions_block():
to_display = []
for extension, name in iterator():
if hasattr(extension, "ui") and not (hasattr(extension, 'params') and extension.params.get('is_tab', False)):
to_display.append((extension, name))
# Creating the extension ui elements
if len(to_display) > 0:
with gr.Column(elem_id="extensions"):
for row in to_display:
extension, _ = row
extension.ui()
def create_extensions_tabs():
for extension, name in iterator():
if hasattr(extension, "ui") and (hasattr(extension, 'params') and extension.params.get('is_tab', False)):
display_name = getattr(extension, 'params', {}).get('display_name', name)
with gr.Tab(display_name, elem_classes="extension-tab"):
extension.ui()
EXTENSION_MAP = {
"input": partial(_apply_string_extensions, "input_modifier"),
"output": partial(_apply_string_extensions, "output_modifier"),
"chat_input": _apply_chat_input_extensions,
"state": _apply_state_modifier_extensions,
"history": _apply_history_modifier_extensions,
"bot_prefix": partial(_apply_string_extensions, "bot_prefix_modifier"),
"tokenizer": partial(_apply_tokenizer_extensions, "tokenizer_modifier"),
'logits_processor': partial(_apply_logits_processor_extensions, 'logits_processor_modifier'),
"custom_generate_chat_prompt": _apply_custom_generate_chat_prompt,
"custom_generate_reply": _apply_custom_generate_reply,
"tokenized_length": _apply_custom_tokenized_length,
"css": _apply_custom_css,
"js": _apply_custom_js
}
def apply_extensions(typ, *args, **kwargs):
if typ not in EXTENSION_MAP:
raise ValueError(f"Invalid extension type {typ}")
return EXTENSION_MAP[typ](*args, **kwargs)
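Extensions hook into the dispatch table above simply by defining functions with the expected names in their `script.py`. A hypothetical minimal extension that the `input` and `output` entries of `EXTENSION_MAP` would pick up:

```python
# extensions/example/script.py -- hypothetical minimal extension (names are illustrative).
params = {
    "display_name": "Example extension",
    "is_tab": False,
}

def input_modifier(text, state, is_chat=False):
    # called via apply_extensions('input', ...) before the prompt reaches the model
    return text.rstrip()

def output_modifier(text, state, is_chat=False):
    # called via apply_extensions('output', ...) before the reply is displayed
    return text + "\n"
```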
@ -0,0 +1,57 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/github.py
import subprocess
from pathlib import Path
new_extensions = set()
def clone_or_pull_repository(github_url):
global new_extensions
repository_folder = Path("extensions")
repo_name = github_url.rstrip("/").split("/")[-1].split(".")[0]
# Check if the repository folder exists
if not repository_folder.exists():
repository_folder.mkdir(parents=True)
repo_path = repository_folder / repo_name
# Check if the repository is already cloned
if repo_path.exists():
yield f"Updating {github_url}..."
# Perform a 'git pull' to update the repository
try:
pull_output = subprocess.check_output(["git", "-C", repo_path, "pull"], stderr=subprocess.STDOUT)
yield "Done."
return pull_output.decode()
except subprocess.CalledProcessError as e:
return str(e)
# Clone the repository
try:
yield f"Cloning {github_url}..."
clone_output = subprocess.check_output(["git", "clone", github_url, repo_path], stderr=subprocess.STDOUT)
new_extensions.add(repo_name)
yield f"The extension `{repo_name}` has been downloaded.\n\nPlease close the the web UI completely and launch it again to be able to load it."
return clone_output.decode()
except subprocess.CalledProcessError as e:
return str(e)
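`clone_or_pull_repository` is written as a generator so the UI can stream progress messages while git runs; consuming it drives the clone or pull. A usage sketch (the repository URL is a placeholder):

```python
# Illustrative only; the URL below is a placeholder, not a real extension repository.
from modules.github import clone_or_pull_repository

for status in clone_or_pull_repository("https://github.com/your-name/your-extension"):
    print(status)  # e.g. "Cloning ...", "Updating ...", "Done."
```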
@ -0,0 +1,696 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/huggingface/transformers/pull/27557
import logging
import re
import time
from abc import ABC
from functools import lru_cache
from typing import Dict, List
import torch
from modules import shared
logger = logging.getLogger(__name__)
########################
# EBNF Grammar Parsing #
########################
END_OF_ALTERNATE_MARKER = 0
END_OF_RULE_MARKER = 0
TO_BE_FILLED_MARKER = 0
REF_RULE_MARKER = 1
LITERAL_MARKER = 2
class ParseState:
def __init__(self):
self.symbol_ids = {}
self.grammar_encoding = [] # old name: out_grammar
def get_symbol_id(state, src):
if src not in state.symbol_ids:
state.symbol_ids[src] = len(state.symbol_ids)
return state.symbol_ids[src]
def generate_symbol_id(state, base_name):
next_id = len(state.symbol_ids)
state.symbol_ids[base_name + "_" + str(next_id)] = next_id
return next_id
def is_word_char(c):
return c.isalnum() or c == "-" or c == "_"
def hex_to_int(c):
if c.isdigit():
return int(c)
elif "a" <= c.lower() <= "f":
return ord(c.lower()) - ord("a") + 10
return -1
def remove_leading_white_space(src, newline_ok):
"""
Skips over whitespace and comments in the input string.
This function processes the input string, skipping over any spaces, tabs,
and content following a '#' character, which denotes a comment. The parsing
of a comment continues until the end of the line (denoted by newline characters
'\r' or '\n'). If the 'newline_ok' parameter is set to False, the function
will stop processing and return the remaining string upon encountering a
newline character, otherwise it will skip over newline characters as well.
Parameters:
src (str): The input string to be processed.
newline_ok (bool): A flag indicating whether encountering a newline character
should stop the parsing (False) or if it should be skipped (True).
Returns:
str: The remaining portion of the input string after skipping whitespace and comments.
"""
pos = 0
while pos < len(src) and (src[pos].isspace() or src[pos] == "#"):
if src[pos] == "#":
while pos < len(src) and src[pos] not in ("\r", "\n"):
pos += 1
else:
if not newline_ok and src[pos] in ("\r", "\n"):
break
pos += 1
return src[pos:]
def parse_name(src):
pos = 0
while pos < len(src) and is_word_char(src[pos]):
pos += 1
if pos == 0:
raise RuntimeError("expecting name at " + src)
return src[:pos], src[pos:]
def parse_char(src):
"""
parse the leading char from the input string
:param src:
:return: char, remaining_src
"""
# if we have a backslash, it may be an escape
if src[0] == "\\":
esc = src[1]
if esc == "x":
first = hex_to_int(src[2])
if first > -1:
second = hex_to_int(src[3])
if second > -1:
return (first << 4) + second, src[4:]
raise RuntimeError("expecting \\xNN at " + src)
elif esc in ('"', "[", "]"):
return esc, src[2:]
elif esc == "r":
return "\r", src[2:]
elif esc == "n":
return "\n", src[2:]
elif esc == "t":
return "\t", src[2:]
raise RuntimeError("unknown escape at " + src)
elif src:
return src[0], src[1:]
raise RuntimeError("unexpected end of input")
def parse_sequence(state, src, rule_name, outbuf, is_nested):
out_start_pos = len(outbuf)
# sequence size, will be replaced at end when known
outbuf.append(TO_BE_FILLED_MARKER)
last_sym_start = len(outbuf)
remaining_src = src
while remaining_src:
if remaining_src[0] == '"': # literal string
remaining_src = remaining_src[1:]
last_sym_start = len(outbuf)
while remaining_src[0] != '"':
char, remaining_src = parse_char(remaining_src)
# each char of a literal is encoded as a "range" of char - char
outbuf.append(LITERAL_MARKER)
outbuf.append(ord(char))
outbuf.append(ord(char))
remaining_src = remove_leading_white_space(remaining_src[1:], is_nested)
elif remaining_src[0] == "[": # char range(s)
remaining_src = remaining_src[1:]
last_sym_start = len(outbuf)
# num chars in range - replaced at end of loop
outbuf.append(TO_BE_FILLED_MARKER)
while remaining_src[0] != "]":
char, remaining_src = parse_char(remaining_src)
outbuf.append(ord(char))
if remaining_src[0] == "-" and remaining_src[1] != "]":
endchar_pair, remaining_src = parse_char(remaining_src[1:])
outbuf.append(ord(endchar_pair))
else:
# chars that aren't part of a c1-c2 range are just doubled (i.e., c-c)
outbuf.append(ord(char))
# replace num chars with actual
outbuf[last_sym_start] = len(outbuf) - last_sym_start - 1
remaining_src = remove_leading_white_space(remaining_src[1:], is_nested)
elif is_word_char(remaining_src[0]): # rule reference
name, remaining_src = parse_name(remaining_src)
ref_rule_id = get_symbol_id(state, name)
remaining_src = remove_leading_white_space(remaining_src, is_nested)
last_sym_start = len(outbuf)
outbuf.append(REF_RULE_MARKER)
outbuf.append(ref_rule_id)
elif remaining_src[0] == "(": # grouping
# parse nested alternates into synthesized rule
remaining_src = remove_leading_white_space(remaining_src[1:], True)
sub_rule_id = generate_symbol_id(state, rule_name)
remaining_src = parse_alternates(state, remaining_src, rule_name, sub_rule_id, True)
last_sym_start = len(outbuf)
# output reference to synthesized rule
outbuf.append(REF_RULE_MARKER)
outbuf.append(sub_rule_id)
if remaining_src[0] != ")":
raise RuntimeError("expecting ')' at " + remaining_src)
remaining_src = remove_leading_white_space(remaining_src[1:], is_nested)
elif remaining_src[0] in ("*", "+", "?"): # repetition operator
if len(outbuf) - out_start_pos - 1 == 0:
raise RuntimeError("expecting preceeding item to */+/? at " + remaining_src)
out_grammar = state.grammar_encoding
# apply transformation to previous symbol (last_sym_start -
# end) according to rewrite rules:
# S* --> S' ::= S S' |
# S+ --> S' ::= S S' | S
# S? --> S' ::= S |
sub_rule_id = generate_symbol_id(state, rule_name)
out_grammar.append(sub_rule_id)
sub_rule_start = len(out_grammar)
# placeholder for size of 1st alternate
out_grammar.append(TO_BE_FILLED_MARKER)
# add preceding symbol to generated rule
out_grammar.extend(outbuf[last_sym_start:])
if remaining_src[0] in ("*", "+"):
# cause generated rule to recurse
out_grammar.append(REF_RULE_MARKER)
out_grammar.append(sub_rule_id)
# apply actual size
out_grammar[sub_rule_start] = len(out_grammar) - sub_rule_start
# mark end of 1st alternate
out_grammar.append(END_OF_ALTERNATE_MARKER)
sub_rule_start = len(out_grammar)
# placeholder for size of 2nd alternate
out_grammar.append(TO_BE_FILLED_MARKER)
if remaining_src[0] == "+":
# add preceding symbol as alternate only for '+'
out_grammar.extend(outbuf[last_sym_start:])
# apply actual size of 2nd alternate
out_grammar[sub_rule_start] = len(out_grammar) - sub_rule_start
# mark end of 2nd alternate, then end of rule
out_grammar.append(END_OF_ALTERNATE_MARKER)
out_grammar.append(END_OF_RULE_MARKER)
# in original rule, replace previous symbol with reference to generated rule
outbuf[last_sym_start:] = [1, sub_rule_id]
remaining_src = remove_leading_white_space(remaining_src[1:], is_nested)
else:
break
# apply actual size of this alternate sequence
outbuf[out_start_pos] = len(outbuf) - out_start_pos
# mark end of alternate
outbuf.append(END_OF_ALTERNATE_MARKER)
return remaining_src
def parse_alternates(state, src, rule_name, rule_id, is_nested):
outbuf = []
remaining_src = parse_sequence(state, src, rule_name, outbuf, is_nested)
while remaining_src and remaining_src[0] == "|":
remaining_src = remove_leading_white_space(remaining_src[1:], True)
remaining_src = parse_sequence(state, remaining_src, rule_name, outbuf, is_nested)
state.grammar_encoding.append(rule_id)
state.grammar_encoding.extend(outbuf)
state.grammar_encoding.append(0)
return remaining_src
def parse_rule(state, src):
name, remaining_src = parse_name(src)
remaining_src = remove_leading_white_space(remaining_src, False)
rule_id = get_symbol_id(state, name)
if remaining_src[:3] != "::=":
raise RuntimeError("expecting ::= at " + remaining_src)
remaining_src = remove_leading_white_space(remaining_src[3:], True)
remaining_src = parse_alternates(state, remaining_src, name, rule_id, False)
if remaining_src and remaining_src[0] == "\r":
remaining_src = remaining_src[2:] if remaining_src[1] == "\n" else remaining_src[1:]
elif remaining_src and remaining_src[0] == "\n":
remaining_src = remaining_src[1:]
elif remaining_src:
raise RuntimeError("expecting newline or end at " + remaining_src)
return remove_leading_white_space(remaining_src, True)
def parse_ebnf(src):
try:
state = ParseState()
grammar_repr = remove_leading_white_space(src, True)
last_grammar_repr = ""
while grammar_repr:
if last_grammar_repr:
last_parsed_rule_len = len(last_grammar_repr) - len(grammar_repr)
logger.debug(f"last_parsed_rule: {last_grammar_repr[:last_parsed_rule_len]}")
last_grammar_repr = grammar_repr
grammar_repr = parse_rule(state, grammar_repr)
state.grammar_encoding.append(0xFFFF)
return state
except RuntimeError as err:
logger.warning("error parsing grammar:", err)
return ParseState()
def print_rule(file, grammar_encoding, index, symbol_id_names):
rule_id = grammar_encoding[index]
print(f"<{index}>{symbol_id_names[rule_id]} ::=", end=" ", file=file)
pos = index + 1
while grammar_encoding[pos]:
if pos - 1 > index:
print("|", end=" ", file=file)
pos += 1 # sequence size, not needed here
while grammar_encoding[pos]:
if grammar_encoding[pos] == REF_RULE_MARKER:
ref_rule_id = grammar_encoding[pos + 1]
print(
f"<{pos}>{symbol_id_names[ref_rule_id]}",
end=" ",
file=file,
)
pos += 2
else:
print("<{}>[".format(pos), end="", file=file)
num_chars = grammar_encoding[pos]
pos += 1
for i in range(0, num_chars, 2):
print("{}-".format(chr(grammar_encoding[pos + i])), end="", file=file)
if i + 1 < num_chars:
print("{}".format(chr(grammar_encoding[pos + i + 1])), end="", file=file)
print("]", end=" ", file=file)
pos += num_chars
pos += 1
print(file=file)
return pos + 1
def print_grammar(file, state):
pos = 0
symbol_id_names = {v: k for k, v in state.symbol_ids.items()}
print("Grammar Rules:", file=file)
while state.grammar_encoding[pos] != 0xFFFF:
pos = print_rule(file, state.grammar_encoding, pos, symbol_id_names)
pos = 0
print("\nBinary representation:", file=file)
while state.grammar_encoding[pos] != 0xFFFF:
print(f"{state.grammar_encoding[pos]:04x}", end=" ", file=file)
pos += 1
print("ffff\n")
###################################
# EBNF Grammar Parsing ends here #
###################################
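# Illustrative note (a sketch under assumptions, not part of the adapted file): the
# parser above can be exercised on a toy grammar before the constraint classes below
# consume its encoding, e.g.:
#
#     import sys
#     state = parse_ebnf('root ::= "yes" | "no"')
#     print_grammar(sys.stdout, state)
#
# which prints the parsed rules followed by their binary encoding.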
class GrammarConstraint(ABC):
def __init__(self, grammar_str, start_rule_name, tokenizer):
self.tt = 0
self.nt = 0
state = parse_ebnf(grammar_str)
grammar_encoding = state.grammar_encoding
self.start_rule_id = state.symbol_ids.get(start_rule_name)
self.eos_token_id = tokenizer.eos_token_id
self.token_trie = TokenTrie(tokenizer)
self.tokenizer = tokenizer
self.grammar_encoding = grammar_encoding
pos = 0
rules: Dict[int, int] = {}
while grammar_encoding[pos] != 0xFFFF:
rule_id = grammar_encoding[pos]
# Store the current position in the 'rules' list at the index corresponding to rule_id.
# This effectively maps each rule_id to its position in the grammar encoding.
rules[rule_id] = pos
pos += 1
# Continue to the next rule in the encoding.
# The loop advances by the size indicated at the current position (grammar_encoding[pos])
# plus one for the size field itself.
while grammar_encoding[pos]:
pos += 1 + grammar_encoding[pos]
# Now we're at the end of the rule,
# so advance to the next rule by skipping the 0, which means 'end of rule'.
pos += 1
self.start_rule_pos = rules[self.start_rule_id]
self.rules_pos_dict: Dict[int, int] = rules
def init_stacks(self):
# suppose the start rule position is 0, then grammar_encoding[0] = rule_id
# grammar_encoding[1] = rule_size
# grammar_encoding[2] = rule_type
# this is why we need to add 2 to the start rule position
stack = [self.start_rule_pos + 2]
# convert to tuple for caching(immutable)
return self.advance_stack(tuple(stack))
# For each stack, resolve rules to find the actual characters that are
# accepted by this stack (not the set of sub-rules).
# This is where the parsing happens.
# The parsing is a top-down, left-to-right, depth-first traversal of the
# grammar.
@lru_cache(maxsize=32768)
def advance_stack(self, stack):
stack = list(stack)
# If the stack is empty, we're done. Because no more tokens should be accepted.
if len(stack) == 0:
return [stack]
# Get the top of the stack.
pos = stack[-1]
# If the stack head is a terminal(literal), we can resolve it immediately.
# literal is marked with 2 in the grammar encoding.
if self.grammar_encoding[pos] > 1:
return [stack]
# The stack head is a nonterminal (a rule reference, 1 in the grammar encoding).
# Resolving this rule gives a set of one or more possible positions
# (e.g. two in `a ::= b | c`)
# We pop the current rule off the stack and, for each option, push:
# - the symbol following this symbol in the current rule; then
# - the first symbol of the resolved rule.
referenced_rule_id = self.grammar_encoding[pos + 1]
# subpos should point to the size of the subrule
subpos = self.rules_pos_dict[referenced_rule_id] + 1
stacks: List[List[int]] = []
# do depth-first search to find all possible rules and check the next terminal
# When this value is non-zero, it indicates that subpos is not yet at the end of the rule, so we can continue.
# here subpos is a pointer, and the value in the rule encoding can never be 0 except for the end of the rule.
while self.grammar_encoding[subpos]:
new_stack = stack[:-1]
if self.grammar_encoding[pos + 2]:
# check if there is a next symbol in the current rule, e.g. `a ::= b c | d`
# if yes, push the pos to rule_size to the stack
new_stack.append(pos + 2)
# if the type of the next symbol is not "empty", push the first symbol of the resolved rule to the stack
if self.grammar_encoding[subpos + 1]:
new_stack.append(subpos + 1)
stacks.extend(self.advance_stack(tuple(new_stack)))
# The increment subpos += self.grammar_encoding[subpos] + 1
# moves subpos forward in the grammar encoding array to the next alternative in the current rule.
subpos += self.grammar_encoding[subpos] + 1
return stacks
def accept_char(self, *args, **kwargs):
"""Process a byte according to the grammar rules."""
raise NotImplementedError
def accept_token_id(self, *args, **kwargs):
"""Process a token according to the grammar rules."""
raise NotImplementedError
def filter_vocab(self, *args, **kwargs):
raise NotImplementedError
class IncrementalGrammarConstraint(GrammarConstraint):
def __init__(self, grammar_str, start_rule_name, tokenizer):
super().__init__(grammar_str, start_rule_name, tokenizer)
def accept_char(self, byte, stacks):
new_stacks = []
for stack in stacks:
# stack is empty
if not stack:
continue
pos = stack[-1]
num_chars = self.grammar_encoding[pos]
# to make pos point to the size of the char range rule
pos += 1
found = False
for i in range(0, num_chars, 2):
if self.grammar_encoding[pos + i] <= byte and byte <= self.grammar_encoding[pos + i + 1]:
found = True
break
if not found:
continue
pos += num_chars
new_stack = stack[:-1]
if self.grammar_encoding[pos]:
new_stack.append(pos)
new_stacks.extend(self.advance_stack(tuple(new_stack)))
return new_stacks
def accept_string(self, string: str, stacks: List[List[int]]):
_bytes = bytes(string, "utf-8")
for byte in _bytes:
stacks = self.accept_char(byte, stacks)
return stacks
def accept_token_id(self, token_id: int, stacks: List[List[int]]):
if token_id == self.eos_token_id:
if stacks and all(len(stack) != 0 for stack in stacks):
raise Exception(
f"At least one of the stack should be empty when EOS is reached. However, "
f"the stacks are {stacks}"
)
return []
for byte in self.token_trie.id2str(token_id):
stacks = self.accept_char(byte, stacks)
# check updated stacks
# TODO, I commented this out because it will fail when the stack is empty
# empty stack means the end of the grammar
# assert stacks != []
return stacks
def accept_token_ids(self, token_ids: List[int], stacks: List[List[int]], as_string=True):
if as_string:
string = self.tokenizer.decode(token_ids)
stacks = self.accept_string(string, stacks)
else:
for token_id in token_ids:
stacks = self.accept_token_id(token_id, stacks)
return stacks
def batch_filter_vocab(self, batch_stacks, device):
batch_acceptance = []
for stacks in batch_stacks:
batch_acceptance.append(self.filter_vocab(stacks, device))
return torch.stack(batch_acceptance)
def filter_vocab(self, stacks, device):
if not stacks: # Check if stacks is empty
# Handle the empty case: for example, return a tensor of False
# The size of the tensor should match the size of your vocabulary
vocab_size = len(self.token_trie)
logger.debug(f"sum of acceptance: {0}")
return torch.zeros(vocab_size, dtype=torch.bool, device=device)
acceptance_matrix = torch.cat([self.token_acceptance_for_stack(tuple(stack), device) for stack in stacks])
# Merge stacks: any True => True
acceptance = acceptance_matrix.reshape(len(stacks), -1).any(dim=0)
logger.debug(f"sum of acceptance: {acceptance.sum()}")
return acceptance
# For each sub-rule in the grammar, cache whether each byte is accepted.
@lru_cache(maxsize=None)
def pos_char_acceptance(self, pos):
acceptance = [False] * 256
num_chars = self.grammar_encoding[pos]
pos += 1
for i in range(0, num_chars, 2):
start = self.grammar_encoding[pos + i]
end = self.grammar_encoding[pos + i + 1]
for j in range(start, end + 1):
acceptance[j] = True
return acceptance
# Probably this should be configurable. If the grammar has an exceedingly
# large number of states, the correct setting is a tradeoff between GPU
# RAM usage and recomputation time.
#
# The main variable that pushes usage up here is number of states in the
# grammar.
@lru_cache(maxsize=32768)
def token_acceptance_for_stack(self, stack, device):
st = time.time()
stack = list(stack) # needs to come in as a tuple for lru_cache
accepts = [False] * len(self.token_trie)
accepts[self.eos_token_id] = len(stack) == 0
if len(stack) == 0:
logger.debug("empty stack")
def traverse_trie(trie, stacks):
for byte, next_trie in trie.items():
if byte == LEAF:
token_id = next_trie
if token_id != self.eos_token_id:
accepts[token_id] = bool(stacks)
continue
new_stacks = []
for stk in stacks:
if not stk:
continue
pos = stk[-1]
num_chars = self.grammar_encoding[pos]
if not self.pos_char_acceptance(pos)[byte]:
continue
pos += num_chars + 1
new_stack = stk[:-1]
if self.grammar_encoding[pos]:
new_stack.append(pos)
new_stacks.extend(self.advance_stack(tuple(new_stack)))
if new_stacks:
traverse_trie(next_trie, new_stacks)
traverse_trie(self.token_trie.trie, [stack])
et = time.time() - st
x = torch.tensor(accepts, dtype=torch.bool, device=device)
self.tt += et
self.nt += 1
return x
class StaticGrammarConstraint(GrammarConstraint):
def __init__(self, grammar_str, start_rule_name, tokenizer):
super().__init__(grammar_str, start_rule_name, tokenizer)
def accept_char(self):
raise NotImplementedError
#################
# DATA STRUCTURES
#################
LEAF = -1
class TokenTrie:
def __init__(self, tokenizer):
self.eos_token_id = tokenizer.eos_token_id
self.tokens = []
self.trie = {}
self.load_tokens(tokenizer)
def id2str(self, token_id):
return self.tokens[token_id]
def __len__(self):
return len(self.tokens)
def load_tokens(self, tokenizer):
def replace_hex(match):
hex_value = match.group(1)
return chr(int(hex_value, 16))
if "gpt2" in tokenizer.__class__.__name__.lower():
special = tokenizer.additional_special_tokens_ids
# Here, the decoder does a string replace on a bunch of sequences
# like ' .' for '.'. This interferes with our assumptions, where a
# token should always have exactly one representation.
# Fortunately(?) text-generation-inference doesn't seem to run this
# cleanup, so we get extraneous spaces. So, in order to generate
# the right token set for TGI, we have to skip the space trimming.
# See:
# https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L3588-L3600
def fmt_token(id):
if id in special:
return None
return bytes(tokenizer.decode([id], clean_up_tokenization_spaces=False), "utf-8")
elif "llama" in tokenizer.__class__.__name__.lower():
def fmt_token(id):
token = tokenizer.convert_ids_to_tokens(id)
token = re.sub(r"<0x([0-9a-fA-F]{2})>", replace_hex, token)
token = token.replace("", " ")
return bytes(token, "utf-8")
else:
print("Warning: unrecognized tokenizer: using default token formatting")
def fmt_token(id):
token = tokenizer.convert_ids_to_tokens(id)
return bytes(token, "utf-8")
# note: vocab_size doesn't work here because there are also
# get_added_vocab() tokens
self.tokens = [fmt_token(i) for i in range(len(tokenizer.get_vocab()))]
for token_id, token_bytes in enumerate(self.tokens):
if token_bytes is not None:
self.insert_into_trie(self.trie, token_bytes, token_id)
def insert_into_trie(self, trie, token_bytes, token_id):
current = trie
for byte in token_bytes:
if byte not in current:
current[byte] = {}
current = current[byte]
current[LEAF] = token_id
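# Illustrative example (not part of the original class): inserting the byte
# string b"ab" with token id 7 into an empty trie produces the nested dict
#   {97: {98: {LEAF: 7}}}
# so traverse_trie() above can walk token bytes one at a time and read the
# token id off the LEAF key once a full token has been matched.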
@lru_cache(maxsize=5)
def initialize_grammar(grammar_string):
return IncrementalGrammarConstraint(grammar_string.strip(), start_rule_name="root", tokenizer=shared.tokenizer)
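# Minimal usage sketch (illustrative only, not part of this module). It assumes
# a loaded Hugging Face tokenizer and a hypothetical GBNF-style grammar string;
# the exact grammar syntax is defined by the parser earlier in this file.
#
#   grammar = 'root ::= "yes" | "no"'
#   constraint = IncrementalGrammarConstraint(grammar, "root", shared.tokenizer)
#   stacks = constraint.init_stacks()
#   stacks = constraint.accept_string("yes", stacks)       # advances byte by byte
#   mask = constraint.filter_vocab(stacks, device="cpu")   # bool tensor over the vocabulary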

View file

@ -0,0 +1,113 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/huggingface/transformers/pull/27557
import math
import torch
from transformers.generation.logits_process import LogitsProcessor
from transformers.utils import add_start_docstrings
LOGITS_PROCESSOR_INPUTS_DOCSTRING = r"""
Args:
input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary. [What are input IDs?](../glossary#input-ids)
scores (`torch.FloatTensor` of shape `(batch_size, config.vocab_size)`):
Prediction scores of a language modeling head. These can be logits for each vocabulary token when not using beam
search, or log softmax for each vocabulary token when using beam search.
Return:
`torch.FloatTensor` of shape `(batch_size, config.vocab_size)`: The processed prediction scores.
"""
class GrammarConstrainedLogitsProcessor(LogitsProcessor):
def __init__(self, grammar_constraint):
self.last_size = None
self.grammar_constraint = grammar_constraint
self.batch_stacks = None
def filter_logits(self, logits, device):
# resolve each stack to a tensor of True/False for each token
# indicating acceptance
# acceptance = self.grammar_acceptor.filter_vocab(self.stacks, device)
acceptance = self.grammar_constraint.batch_filter_vocab(self.batch_stacks, device)
# logger.debug(acceptance)
# Logits to -inf where False
logits[~acceptance] = -math.inf
# TODO: batching
def process_logits(self, input_ids, scores, parse_start_index=None):
"""
:param input_ids:
:param scores:
:param parse_start_index: default None, which means generate from scratch. Set to 0 to parse all input_ids
:return:
"""
# we dynamically create stacks at the first call, so that we know the batch size and beam size
if self.batch_stacks is None:
self.batch_stacks = [self.grammar_constraint.init_stacks() for _ in range(len(input_ids))]
# if self.last_size is not set (which would be the case when processing the first token),
# parse the prompt prefix selected by parse_start_index (none by default) to initialize the stacks.
if self.last_size is None:
prefix_to_parse = [
single_input_ids[parse_start_index:] if parse_start_index is not None else []
for single_input_ids in input_ids
]
# self.grammar_acceptor.accept_token_ids(prefix_to_parse, self.stacks)
self.batch_stacks = [
self.grammar_constraint.accept_token_ids(prefix, stack)
for prefix, stack in zip(prefix_to_parse, self.batch_stacks)
]
# if the length of the current input IDs (input_ids[0]) is exactly one more than self.last_size,
# accept the newly generated token. This is expected when inputs are processed incrementally, one token at a time.
elif len(input_ids[0]) == self.last_size + 1:
# self.stacks = self.grammar_acceptor.accept_token_id(input_ids[0][-1], self.stacks)
self.batch_stacks = [
self.grammar_constraint.accept_token_id(single_input_ids[-1], stack)
for single_input_ids, stack in zip(input_ids, self.batch_stacks)
]
# ensure that the input size is consistent with the expected incremental processing
# (i.e., one token at a time).
else:
# here we check if the input_ids are one token longer than the last time we processed
# but we don't check if input_ids are actually valid.
# Imagine a scenario where we generate 10 tokens, then we replace the 10 generated tokens with 10 new tokens.
# In this case, the input_ids will be consistent with the last_size, but the input_ids are not valid.
# However, should we really check if the input_ids are valid here?
# If we do, then we need to reparse the whole input_ids at each call, which is not efficient.
# Maybe we should just trust the user to provide valid input_ids?
# The conclusion is that, we assume the input_ids are valid, and our generation will be correct.
# If the input_ids are not valid, then the generation result will be wrong and we don't take responsibility for that.
raise RuntimeError(
"Input ID's length is inconsistent with the current state of "
"the GrammarConstrainedLogitsProcessor. If you want to process "
"another input sequence, please instantiate a new "
"GrammarConstrainedLogitsProcessor."
)
self.filter_logits(scores, scores.device)
self.last_size = len(input_ids[0])
return scores
@add_start_docstrings(LOGITS_PROCESSOR_INPUTS_DOCSTRING)
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
return self.process_logits(input_ids, scores)
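# Illustrative usage sketch (not part of this module), assuming `model`,
# `tokenizer` and a grammar constraint built with the companion grammar_utils
# module; the names below are placeholders:
#
#   from transformers import LogitsProcessorList
#   processor = GrammarConstrainedLogitsProcessor(grammar_constraint)
#   output_ids = model.generate(
#       input_ids,
#       max_new_tokens=64,
#       logits_processor=LogitsProcessorList([processor]),
#   )
#
# Because the processor keeps per-sequence state (last_size, batch_stacks),
# a fresh instance should be created for every new generation call.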

View file

@ -0,0 +1,328 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/html_generator.py
import html
import os
import re
import time
from pathlib import Path
import markdown
from PIL import Image, ImageOps
from modules.utils import get_available_chat_styles
from modules import shared
# This is to store the paths to the thumbnails of the profile pictures
image_cache = {}
with open(Path(__file__).resolve().parent / '../css/html_readable_style.css', 'r') as f:
readable_css = f.read()
with open(Path(__file__).resolve().parent / '../css/html_4chan_style.css', 'r') as css_f:
_4chan_css = css_f.read()
with open(Path(__file__).resolve().parent / '../css/html_instruct_style.css', 'r') as f:
instruct_css = f.read()
# Custom chat styles
chat_styles = {}
for k in get_available_chat_styles():
chat_styles[k] = open(Path(f'css/chat_style-{k}.css'), 'r').read()
# Handle styles that derive from other styles
for k in chat_styles:
lines = chat_styles[k].split('\n')
input_string = lines[0]
match = re.search(r'chat_style-([a-z\-]*)\.css', input_string)
if match:
style = match.group(1)
chat_styles[k] = chat_styles.get(style, '') + '\n\n' + '\n'.join(lines[1:])
def fix_newlines(string):
string = string.replace('\n', '\n\n')
string = re.sub(r"\n{3,}", "\n\n", string)
string = string.strip()
return string
def replace_blockquote(m):
return m.group().replace('\n', '\n> ').replace('\\begin{blockquote}', '').replace('\\end{blockquote}', '')
def convert_to_markdown(string):
# Blockquote
string = re.sub(r'(^|[\n])&gt;', r'\1>', string)
pattern = re.compile(r'\\begin{blockquote}(.*?)\\end{blockquote}', re.DOTALL)
string = pattern.sub(replace_blockquote, string)
# Code
string = string.replace('\\begin{code}', '```')
string = string.replace('\\end{code}', '```')
string = re.sub(r"(.)```", r"\1\n```", string)
result = ''
is_code = False
for line in string.split('\n'):
if line.lstrip(' ').startswith('```'):
is_code = not is_code
result += line
if is_code or line.startswith('|'): # Don't add an extra \n for tables or code
result += '\n'
else:
result += '\n\n'
result = result.strip()
if is_code:
result += '\n```' # Unfinished code block
# Unfinished list, like "\n1.". A |delete| string is added and then
# removed to force a <ol> or <ul> to be generated instead of a <p>.
if re.search(r'(\n\d+\.?|\n\*\s*)$', result):
delete_str = '|delete|'
if re.search(r'(\d+\.?)$', result) and not result.endswith('.'):
result += '.'
result = re.sub(r'(\n\d+\.?|\n\*\s*)$', r'\g<1> ' + delete_str, result)
html_output = markdown.markdown(result, extensions=['fenced_code', 'tables'])
pos = html_output.rfind(delete_str)
if pos > -1:
html_output = html_output[:pos] + html_output[pos + len(delete_str):]
else:
html_output = markdown.markdown(result, extensions=['fenced_code', 'tables'])
# Unescape code blocks
pattern = re.compile(r'<code[^>]*>(.*?)</code>', re.DOTALL)
html_output = pattern.sub(lambda x: html.unescape(x.group()), html_output)
return html_output
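# Quick illustrative example (not part of the original module): an unfinished
# code fence in a streaming reply is closed automatically before rendering, so
#   convert_to_markdown("```python\nprint('hi')")
# returns HTML containing a properly closed <pre><code> block rather than a
# stray paragraph of backticks.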
def generate_basic_html(string):
string = convert_to_markdown(string)
string = f'<style>{readable_css}</style><div class="readable-container">{string}</div>'
return string
def process_post(post, c):
t = post.split('\n')
number = t[0].split(' ')[1]
if len(t) > 1:
src = '\n'.join(t[1:])
else:
src = ''
src = re.sub('>', '&gt;', src)
src = re.sub('(&gt;&gt;[0-9]*)', '<span class="quote">\\1</span>', src)
src = re.sub('\n', '<br>\n', src)
src = f'<blockquote class="message_4chan">{src}\n'
src = f'<span class="name">Anonymous </span> <span class="number">No.{number}</span>\n{src}'
return src
def generate_4chan_html(f):
posts = []
post = ''
c = -2
for line in f.splitlines():
line += "\n"
if line == '-----\n':
continue
elif line.startswith('--- '):
c += 1
if post != '':
src = process_post(post, c)
posts.append(src)
post = line
else:
post += line
if post != '':
src = process_post(post, c)
posts.append(src)
for i in range(len(posts)):
if i == 0:
posts[i] = f'<div class="op">{posts[i]}</div>\n'
else:
posts[i] = f'<div class="reply">{posts[i]}</div>\n'
output = ''
output += f'<style>{_4chan_css}</style><div id="parent"><div id="container">'
for post in posts:
output += post
output += '</div></div>'
output = output.split('\n')
for i in range(len(output)):
output[i] = re.sub(r'^(&gt;(.*?)(<br>|</div>))', r'<span class="greentext">\1</span>', output[i])
output[i] = re.sub(r'^<blockquote class="message_4chan">(&gt;(.*?)(<br>|</div>))', r'<blockquote class="message_4chan"><span class="greentext">\1</span>', output[i])
output = '\n'.join(output)
return output
def make_thumbnail(image):
image = image.resize((350, round(image.size[1] / image.size[0] * 350)), Image.Resampling.LANCZOS)
if image.size[1] > 470:
image = ImageOps.fit(image, (350, 470), Image.LANCZOS)
return image
def get_image_cache(path):
cache_folder = Path(shared.args.disk_cache_dir)
if not cache_folder.exists():
cache_folder.mkdir()
mtime = os.stat(path).st_mtime
if (path in image_cache and mtime != image_cache[path][0]) or (path not in image_cache):
img = make_thumbnail(Image.open(path))
old_p = Path(f'{cache_folder}/{path.name}_cache.png')
p = Path(f'{cache_folder}/cache_{path.name}.png')
if old_p.exists():
old_p.rename(p)
output_file = p
img.convert('RGB').save(output_file, format='PNG')
image_cache[path] = [mtime, output_file.as_posix()]
return image_cache[path][1]
def generate_instruct_html(history):
output = f'<style>{instruct_css}</style><div class="chat" id="chat"><div class="messages">'
for i, _row in enumerate(history):
row = [convert_to_markdown(entry) for entry in _row]
if row[0]: # don't display empty user messages
output += f"""
<div class="user-message">
<div class="text">
<div class="message-body">
{row[0]}
</div>
</div>
</div>
"""
output += f"""
<div class="assistant-message">
<div class="text">
<div class="message-body">
{row[1]}
</div>
</div>
</div>
"""
output += "</div></div>"
return output
def generate_cai_chat_html(history, name1, name2, style, character, reset_cache=False):
output = f'<style>{chat_styles[style]}</style><div class="chat" id="chat"><div class="messages">'
# We use ?character and ?time.time() to force the browser to reset caches
img_bot = f'<img src="file/cache/pfp_character_thumb.png?{character}" class="pfp_character">' if Path("cache/pfp_character_thumb.png").exists() else ''
img_me = f'<img src="file/cache/pfp_me.png?{time.time() if reset_cache else ""}">' if Path("cache/pfp_me.png").exists() else ''
for i, _row in enumerate(history):
row = [convert_to_markdown(entry) for entry in _row]
if row[0]: # don't display empty user messages
output += f"""
<div class="message">
<div class="circle-you">
{img_me}
</div>
<div class="text">
<div class="username">
{name1}
</div>
<div class="message-body">
{row[0]}
</div>
</div>
</div>
"""
output += f"""
<div class="message">
<div class="circle-bot">
{img_bot}
</div>
<div class="text">
<div class="username">
{name2}
</div>
<div class="message-body">
{row[1]}
</div>
</div>
</div>
"""
output += "</div></div>"
return output
def generate_chat_html(history, name1, name2, reset_cache=False):
output = f'<style>{chat_styles["wpp"]}</style><div class="chat" id="chat"><div class="messages">'
for i, _row in enumerate(history):
row = [convert_to_markdown(entry) for entry in _row]
if row[0]: # don't display empty user messages
output += f"""
<div class="message">
<div class="text-you">
<div class="message-body">
{row[0]}
</div>
</div>
</div>
"""
output += f"""
<div class="message">
<div class="text-bot">
<div class="message-body">
{row[1]}
</div>
</div>
</div>
"""
output += "</div></div>"
return output
def chat_html_wrapper(history, name1, name2, mode, style, character, reset_cache=False):
if mode == 'instruct':
return generate_instruct_html(history['visible'])
elif style == 'wpp':
return generate_chat_html(history['visible'], name1, name2)
else:
return generate_cai_chat_html(history['visible'], name1, name2, style, character, reset_cache)
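# Illustrative call (not part of the original module); the history dict, style
# and names below are placeholders in the shape the WebUI passes in:
#
#   history = {'visible': [["Hello", "Hi! How can I help?"]]}
#   html_out = chat_html_wrapper(history, name1='You', name2='Assistant',
#                                mode='chat', style='cai-chat',
#                                character='Example')
#
# mode='instruct' routes to generate_instruct_html, style='wpp' to
# generate_chat_html, and anything else to generate_cai_chat_html.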

View file

@ -0,0 +1,427 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/loaders.py
import functools
from collections import OrderedDict
import gradio as gr
from modules import shared
loaders_and_params = OrderedDict({
'Transformers': [
'cpu_memory',
'gpu_memory',
'load_in_8bit',
'bf16',
'cpu',
'disk',
'auto_devices',
'load_in_4bit',
'use_double_quant',
'quant_type',
'compute_dtype',
'trust_remote_code',
'no_use_fast',
'use_flash_attention_2',
'alpha_value',
'rope_freq_base',
'compress_pos_emb',
'disable_exllama',
'disable_exllamav2',
'transformers_info'
],
'llama.cpp': [
'n_ctx',
'n_gpu_layers',
'tensor_split',
'n_batch',
'threads',
'threads_batch',
'no_mmap',
'mlock',
'no_mul_mat_q',
'alpha_value',
'rope_freq_base',
'compress_pos_emb',
'cpu',
'numa',
'no_offload_kqv',
'tensorcores',
],
'llamacpp_HF': [
'n_ctx',
'n_gpu_layers',
'tensor_split',
'n_batch',
'threads',
'threads_batch',
'no_mmap',
'mlock',
'no_mul_mat_q',
'alpha_value',
'rope_freq_base',
'compress_pos_emb',
'cpu',
'numa',
'cfg_cache',
'trust_remote_code',
'no_use_fast',
'logits_all',
'no_offload_kqv',
'tensorcores',
'llamacpp_HF_info',
],
'ExLlamav2_HF': [
'gpu_split',
'max_seq_len',
'cfg_cache',
'no_flash_attn',
'num_experts_per_token',
'cache_8bit',
'alpha_value',
'compress_pos_emb',
'trust_remote_code',
'no_use_fast',
],
'ExLlamav2': [
'gpu_split',
'max_seq_len',
'no_flash_attn',
'num_experts_per_token',
'cache_8bit',
'alpha_value',
'compress_pos_emb',
'exllamav2_info',
],
'AutoGPTQ': [
'triton',
'no_inject_fused_attention',
'no_inject_fused_mlp',
'no_use_cuda_fp16',
'wbits',
'groupsize',
'desc_act',
'disable_exllama',
'disable_exllamav2',
'gpu_memory',
'cpu_memory',
'cpu',
'disk',
'auto_devices',
'trust_remote_code',
'no_use_fast',
'autogptq_info',
],
'BigDL-LLM': [
'load_in_4bit',
'load_in_low_bit',
'optimize_model',
'modules_to_not_convert',
'cpu_embedding',
'lightweight_bmm',
'trust_remote_code',
'use_cache',
],
'AutoAWQ': [
'cpu_memory',
'gpu_memory',
'auto_devices',
'max_seq_len',
'no_inject_fused_attention',
'trust_remote_code',
'no_use_fast',
],
'GPTQ-for-LLaMa': [
'wbits',
'groupsize',
'model_type',
'pre_layer',
'trust_remote_code',
'no_use_fast',
'gptq_for_llama_info',
],
'ctransformers': [
'n_ctx',
'n_gpu_layers',
'n_batch',
'threads',
'model_type',
'no_mmap',
'mlock'
],
'QuIP#': [
'trust_remote_code',
'no_use_fast',
'no_flash_attn',
'quipsharp_info',
],
'HQQ': [
'hqq_backend',
'trust_remote_code',
'no_use_fast',
]
})
def transformers_samplers():
return {
'temperature',
'temperature_last',
'dynamic_temperature',
'dynamic_temperature_low',
'top_p',
'min_p',
'top_k',
'typical_p',
'epsilon_cutoff',
'eta_cutoff',
'tfs',
'top_a',
'repetition_penalty',
'presence_penalty',
'frequency_penalty',
'repetition_penalty_range',
'encoder_repetition_penalty',
'no_repeat_ngram_size',
'min_length',
'seed',
'do_sample',
'penalty_alpha',
'num_beams',
'length_penalty',
'early_stopping',
'mirostat_mode',
'mirostat_tau',
'mirostat_eta',
'grammar_file_row',
'grammar_string',
'guidance_scale',
'negative_prompt',
'ban_eos_token',
'custom_token_bans',
'add_bos_token',
'skip_special_tokens',
'auto_max_new_tokens',
}
loaders_samplers = {
'Transformers': transformers_samplers(),
'AutoGPTQ': transformers_samplers(),
'GPTQ-for-LLaMa': transformers_samplers(),
'AutoAWQ': transformers_samplers(),
'QuIP#': transformers_samplers(),
'HQQ': transformers_samplers(),
'BigDL-LLM': transformers_samplers(),
'ExLlamav2': {
'temperature',
'top_p',
'min_p',
'top_k',
'typical_p',
'tfs',
'repetition_penalty',
'repetition_penalty_range',
'seed',
'mirostat_mode',
'mirostat_tau',
'mirostat_eta',
'ban_eos_token',
'add_bos_token',
'custom_token_bans',
'skip_special_tokens',
'auto_max_new_tokens',
},
'ExLlamav2_HF': {
'temperature',
'temperature_last',
'dynamic_temperature',
'dynamic_temperature_low',
'top_p',
'min_p',
'top_k',
'typical_p',
'epsilon_cutoff',
'eta_cutoff',
'tfs',
'top_a',
'repetition_penalty',
'presence_penalty',
'frequency_penalty',
'repetition_penalty_range',
'encoder_repetition_penalty',
'no_repeat_ngram_size',
'min_length',
'seed',
'do_sample',
'mirostat_mode',
'mirostat_tau',
'mirostat_eta',
'grammar_file_row',
'grammar_string',
'guidance_scale',
'negative_prompt',
'ban_eos_token',
'custom_token_bans',
'add_bos_token',
'skip_special_tokens',
'auto_max_new_tokens',
},
'llama.cpp': {
'temperature',
'top_p',
'min_p',
'top_k',
'typical_p',
'tfs',
'repetition_penalty',
'presence_penalty',
'frequency_penalty',
'seed',
'mirostat_mode',
'mirostat_tau',
'mirostat_eta',
'grammar_file_row',
'grammar_string',
'ban_eos_token',
'custom_token_bans',
},
'llamacpp_HF': {
'temperature',
'temperature_last',
'dynamic_temperature',
'dynamic_temperature_low',
'top_p',
'min_p',
'top_k',
'typical_p',
'epsilon_cutoff',
'eta_cutoff',
'tfs',
'top_a',
'repetition_penalty',
'presence_penalty',
'frequency_penalty',
'repetition_penalty_range',
'encoder_repetition_penalty',
'no_repeat_ngram_size',
'min_length',
'seed',
'do_sample',
'mirostat_mode',
'mirostat_tau',
'mirostat_eta',
'grammar_file_row',
'grammar_string',
'guidance_scale',
'negative_prompt',
'ban_eos_token',
'custom_token_bans',
'add_bos_token',
'skip_special_tokens',
'auto_max_new_tokens',
},
'ctransformers': {
'temperature',
'top_p',
'top_k',
'repetition_penalty',
'repetition_penalty_range',
},
}
loaders_model_types = {
'GPTQ-for-LLaMa': [
"None",
"llama",
"opt",
"gptj"
],
'ctransformers': [
"None",
"gpt2",
"gptj",
"gptneox",
"llama",
"mpt",
"dollyv2",
"replit",
"starcoder",
"gptbigcode",
"falcon"
],
}
@functools.cache
def list_all_samplers():
all_samplers = set()
for k in loaders_samplers:
for sampler in loaders_samplers[k]:
all_samplers.add(sampler)
return sorted(all_samplers)
def blacklist_samplers(loader):
all_samplers = list_all_samplers()
if loader == 'All':
return [gr.update(visible=True) for sampler in all_samplers]
else:
return [gr.update(visible=True) if sampler in loaders_samplers[loader] else gr.update(visible=False) for sampler in all_samplers]
def get_model_types(loader):
if loader in loaders_model_types:
return loaders_model_types[loader]
return ["None"]
def get_gpu_memory_keys():
return [k for k in shared.gradio if k.startswith('gpu_memory')]
@functools.cache
def get_all_params():
all_params = set()
for k in loaders_and_params:
for el in loaders_and_params[k]:
all_params.add(el)
if 'gpu_memory' in all_params:
all_params.remove('gpu_memory')
for k in get_gpu_memory_keys():
all_params.add(k)
return sorted(all_params)
def make_loader_params_visible(loader):
params = []
all_params = get_all_params()
if loader in loaders_and_params:
params = loaders_and_params[loader]
if 'gpu_memory' in params:
params.remove('gpu_memory')
params += get_gpu_memory_keys()
return [gr.update(visible=True) if k in params else gr.update(visible=False) for k in all_params]
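# Illustrative example (not part of the original module):
#   updates = make_loader_params_visible('BigDL-LLM')
# returns one gr.update(...) per entry of get_all_params(), with visible=True
# only for the fields listed under 'BigDL-LLM' above (load_in_4bit,
# load_in_low_bit, optimize_model, cpu_embedding, ...); the UI can wire this
# to the loader dropdown's change event.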

View file

@ -0,0 +1,86 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/logging_colors.py
import logging
logger = logging.getLogger('text-generation-webui')
def setup_logging():
'''
Copied from: https://github.com/vladmandic/automatic
All credits to vladmandic.
'''
class RingBuffer(logging.StreamHandler):
def __init__(self, capacity):
super().__init__()
self.capacity = capacity
self.buffer = []
self.formatter = logging.Formatter('{ "asctime":"%(asctime)s", "created":%(created)f, "facility":"%(name)s", "pid":%(process)d, "tid":%(thread)d, "level":"%(levelname)s", "module":"%(module)s", "func":"%(funcName)s", "msg":"%(message)s" }')
def emit(self, record):
msg = self.format(record)
# self.buffer.append(json.loads(msg))
self.buffer.append(msg)
if len(self.buffer) > self.capacity:
self.buffer.pop(0)
def get(self):
return self.buffer
from rich.console import Console
from rich.logging import RichHandler
from rich.pretty import install as pretty_install
from rich.theme import Theme
from rich.traceback import install as traceback_install
level = logging.DEBUG
logger.setLevel(logging.DEBUG)  # the logger itself always runs at debug level; individual handlers set their own levels
console = Console(log_time=True, log_time_format='%H:%M:%S-%f', theme=Theme({
"traceback.border": "black",
"traceback.border.syntax_error": "black",
"inspect.value.border": "black",
}))
logging.basicConfig(level=logging.ERROR, format='%(asctime)s | %(name)s | %(levelname)s | %(module)s | %(message)s', handlers=[logging.NullHandler()]) # redirect default logger to null
pretty_install(console=console)
traceback_install(console=console, extra_lines=1, max_frames=10, width=console.width, word_wrap=False, indent_guides=False, suppress=[])
while logger.hasHandlers() and len(logger.handlers) > 0:
logger.removeHandler(logger.handlers[0])
# handlers
rh = RichHandler(show_time=True, omit_repeated_times=False, show_level=True, show_path=False, markup=False, rich_tracebacks=True, log_time_format='%H:%M:%S-%f', level=level, console=console)
rh.setLevel(level)
logger.addHandler(rh)
rb = RingBuffer(100) # 100 entries default in log ring buffer
rb.setLevel(level)
logger.addHandler(rb)
logger.buffer = rb.buffer
# overrides
logging.getLogger("urllib3").setLevel(logging.ERROR)
logging.getLogger("httpx").setLevel(logging.ERROR)
logging.getLogger("diffusers").setLevel(logging.ERROR)
logging.getLogger("torch").setLevel(logging.ERROR)
logging.getLogger("lycoris").handlers = logger.handlers
setup_logging()
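# Illustrative usage (not part of the original module): after import, the
# module-level logger is fully configured, and the ring buffer handler keeps
# the last 100 formatted records:
#
#   from modules.logging_colors import logger
#   logger.info("model loaded")
#   print(logger.buffer[-1])   # most recent JSON-formatted log line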

View file

@ -0,0 +1,93 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/logits.py
import torch
from transformers import is_torch_xpu_available
from modules import sampler_hijack, shared
from modules.logging_colors import logger
from modules.text_generation import generate_reply
global_scores = None
def get_next_logits(prompt, state, use_samplers, previous, top_logits=25, return_dict=False):
if shared.model is None:
logger.error("No model is loaded! Select one in the Model tab.")
return 'Error: No model is loaded! Select one in the Model tab.', previous
is_non_hf_exllamav2 = shared.model.__class__.__name__ == 'Exllamav2Model'
is_non_hf_llamacpp = shared.model.__class__.__name__ == 'LlamaCppModel'
if use_samplers:
if any([is_non_hf_exllamav2, is_non_hf_llamacpp]):
logger.error("Sampler hijacking is not supported non-Huggingface loaders.")
# sampling is all done in c for exllama, so it is really hard to hijack
# it should be possible to hijack llamacpp sampler by hijacking all their sampling methods,
# but it is not implemented yet
return 'Error: Sampler hijacking is not supported non-Huggingface loaders. Please disable the "Use samplers" option.', previous
state['max_new_tokens'] = 1
state['auto_max_new_tokens'] = False
for _ in generate_reply(prompt, state):
pass
scores = sampler_hijack.global_scores[-1]
else:
if is_non_hf_exllamav2:
if is_torch_xpu_available():
tokens = shared.tokenizer.encode(prompt).to("xpu:0")
else:
tokens = shared.tokenizer.encode(prompt).cuda()
scores = shared.model.get_logits(tokens)[-1][-1]
elif is_non_hf_llamacpp:
tokens = shared.tokenizer.encode(prompt)
scores = shared.model.get_logits(tokens)[-1][-1]
else:
if is_torch_xpu_available():
tokens = shared.tokenizer.encode(prompt, return_tensors='pt').to("xpu:0")
else:
tokens = shared.tokenizer.encode(prompt, return_tensors='pt').cuda()
output = shared.model(input_ids=tokens)
scores = output['logits'][-1][-1]
probs = torch.softmax(scores, dim=-1, dtype=torch.float)
topk_values, topk_indices = torch.topk(probs, k=top_logits, largest=True, sorted=True)
if is_non_hf_llamacpp:
topk_indices = [i.expand((1, 1)) for i in topk_indices]
if hasattr(shared.tokenizer, 'convert_ids_to_tokens'):
tokens = [shared.tokenizer.convert_ids_to_tokens(int(i)) for i in topk_indices]
else:
tokens = [shared.tokenizer.decode(i) for i in topk_indices]
if return_dict:
topk_values = [float(i) for i in topk_values]
output = {}
for row in list(zip(topk_values, tokens)):
output[row[1]] = row[0]
return output
else:
topk_values = [f"{float(i):.5f}" for i in topk_values]
output = ''
for row in list(zip(topk_values, tokens)):
output += f"{row[0]} - {repr(row[1])}\n"
return output, previous
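# Illustrative call (not part of the original module); `state` is the usual
# WebUI generation-settings dict and is treated here as a placeholder:
#
#   text, previous = get_next_logits("The capital of France is", state,
#                                    use_samplers=False, previous='')
#   # text lists the top-25 candidates, one "probability - 'token'" pair per line
#
#   probs = get_next_logits("The capital of France is", state,
#                           use_samplers=False, previous='',
#                           return_dict=True)   # {token_str: probability}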

View file

@ -0,0 +1,111 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/metadata_gguf.py
import struct
from enum import IntEnum
class GGUFValueType(IntEnum):
UINT8 = 0
INT8 = 1
UINT16 = 2
INT16 = 3
UINT32 = 4
INT32 = 5
FLOAT32 = 6
BOOL = 7
STRING = 8
ARRAY = 9
UINT64 = 10
INT64 = 11
FLOAT64 = 12
_simple_value_packing = {
GGUFValueType.UINT8: "<B",
GGUFValueType.INT8: "<b",
GGUFValueType.UINT16: "<H",
GGUFValueType.INT16: "<h",
GGUFValueType.UINT32: "<I",
GGUFValueType.INT32: "<i",
GGUFValueType.FLOAT32: "<f",
GGUFValueType.UINT64: "<Q",
GGUFValueType.INT64: "<q",
GGUFValueType.FLOAT64: "<d",
GGUFValueType.BOOL: "?",
}
value_type_info = {
GGUFValueType.UINT8: 1,
GGUFValueType.INT8: 1,
GGUFValueType.UINT16: 2,
GGUFValueType.INT16: 2,
GGUFValueType.UINT32: 4,
GGUFValueType.INT32: 4,
GGUFValueType.FLOAT32: 4,
GGUFValueType.UINT64: 8,
GGUFValueType.INT64: 8,
GGUFValueType.FLOAT64: 8,
GGUFValueType.BOOL: 1,
}
def get_single(value_type, file):
if value_type == GGUFValueType.STRING:
value_length = struct.unpack("<Q", file.read(8))[0]
value = file.read(value_length)
try:
value = value.decode('utf-8')
except:
pass
else:
type_str = _simple_value_packing.get(value_type)
bytes_length = value_type_info.get(value_type)
value = struct.unpack(type_str, file.read(bytes_length))[0]
return value
def load_metadata(fname):
metadata = {}
with open(fname, 'rb') as file:
GGUF_MAGIC = struct.unpack("<I", file.read(4))[0]
GGUF_VERSION = struct.unpack("<I", file.read(4))[0]
ti_data_count = struct.unpack("<Q", file.read(8))[0]
kv_data_count = struct.unpack("<Q", file.read(8))[0]
if GGUF_VERSION == 1:
raise Exception('You are using an outdated GGUF, please download a new one.')
for i in range(kv_data_count):
key_length = struct.unpack("<Q", file.read(8))[0]
key = file.read(key_length)
value_type = GGUFValueType(struct.unpack("<I", file.read(4))[0])
if value_type == GGUFValueType.ARRAY:
ltype = GGUFValueType(struct.unpack("<I", file.read(4))[0])
length = struct.unpack("<Q", file.read(8))[0]
arr = [get_single(ltype, file) for _ in range(length)]
metadata[key.decode()] = arr
else:
value = get_single(value_type, file)
metadata[key.decode()] = value
return metadata
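# Illustrative usage (not part of the original module); the file path is a
# placeholder for any GGUF model on disk:
#
#   metadata = load_metadata('models/llama-2-7b-chat.Q4_K_M.gguf')
#   n_ctx = metadata.get('llama.context_length')
#   template = metadata.get('tokenizer.chat_template')
#
# Keys follow the GGUF key-value convention; models_settings.py reads
# llama.context_length, llama.rope.freq_base and tokenizer.chat_template
# from this dict to pre-fill the loader settings.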

View file

@ -0,0 +1,510 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/models.py
import gc
import logging
import os
import re
import time
import traceback
from pathlib import Path
import torch
import transformers
from accelerate import infer_auto_device_map, init_empty_weights
from accelerate.utils import is_ccl_available, is_xpu_available
from transformers import (
AutoConfig,
AutoModel,
AutoModelForCausalLM,
AutoModelForSeq2SeqLM,
AutoTokenizer,
BitsAndBytesConfig,
# GPTQConfig  # note: the disable_exllama/disable_exllamav2 branch below references GPTQConfig; re-enable this import if that path is needed
)
import modules.shared as shared
from modules import RoPE, sampler_hijack
from modules.logging_colors import logger
from modules.models_settings import get_model_metadata
from modules.relative_imports import RelativeImport
transformers.logging.set_verbosity_error()
local_rank = None
if shared.args.deepspeed:
import deepspeed
from transformers.deepspeed import (
HfDeepSpeedConfig,
is_deepspeed_zero3_enabled
)
from modules.deepspeed_parameters import generate_ds_config
# Distributed setup
local_rank = shared.args.local_rank if shared.args.local_rank is not None else int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))
if is_xpu_available() and is_ccl_available():
torch.xpu.set_device(local_rank)
deepspeed.init_distributed(backend="ccl")
else:
torch.cuda.set_device(local_rank)
deepspeed.init_distributed()
ds_config = generate_ds_config(shared.args.bf16, 1 * world_size, shared.args.nvme_offload_dir)
dschf = HfDeepSpeedConfig(ds_config) # Keep this object alive for the Transformers integration
sampler_hijack.hijack_samplers()
def load_model(model_name, loader=None):
logger.info(f"Loading {model_name}")
t0 = time.time()
shared.is_seq2seq = False
shared.model_name = model_name
load_func_map = {
'Transformers': huggingface_loader,
'AutoGPTQ': AutoGPTQ_loader,
'GPTQ-for-LLaMa': GPTQ_loader,
'llama.cpp': llamacpp_loader,
'llamacpp_HF': llamacpp_HF_loader,
'ExLlamav2': ExLlamav2_loader,
'ExLlamav2_HF': ExLlamav2_HF_loader,
'ctransformers': ctransformers_loader,
'AutoAWQ': AutoAWQ_loader,
'QuIP#': QuipSharp_loader,
'HQQ': HQQ_loader,
'BigDL-LLM': bigdl_llm_loader,
}
metadata = get_model_metadata(model_name)
if loader is None:
if shared.args.loader is not None:
loader = shared.args.loader
else:
loader = metadata['loader']
if loader is None:
logger.error('The path to the model does not exist. Exiting.')
raise ValueError
shared.args.loader = loader
output = load_func_map[loader](model_name)
if type(output) is tuple:
model, tokenizer = output
else:
model = output
if model is None:
return None, None
else:
tokenizer = load_tokenizer(model_name, model)
shared.settings.update({k: v for k, v in metadata.items() if k in shared.settings})
if loader.lower().startswith('exllama'):
shared.settings['truncation_length'] = shared.args.max_seq_len
elif loader in ['llama.cpp', 'llamacpp_HF', 'ctransformers']:
shared.settings['truncation_length'] = shared.args.n_ctx
logger.info(f"LOADER: {loader}")
logger.info(f"TRUNCATION LENGTH: {shared.settings['truncation_length']}")
logger.info(f"INSTRUCTION TEMPLATE: {metadata['instruction_template']}")
logger.info(f"Loaded the model in {(time.time()-t0):.2f} seconds.")
return model, tokenizer
def load_tokenizer(model_name, model):
tokenizer = None
path_to_model = Path(f"{shared.args.model_dir}/{model_name}/")
if any(s in model_name.lower() for s in ['gpt-4chan', 'gpt4chan']) and Path(f"{shared.args.model_dir}/gpt-j-6B/").exists():
tokenizer = AutoTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/gpt-j-6B/"))
elif path_to_model.exists():
if shared.args.no_use_fast:
logger.info('Loading the tokenizer with use_fast=False.')
tokenizer = AutoTokenizer.from_pretrained(
path_to_model,
trust_remote_code=shared.args.trust_remote_code,
use_fast=not shared.args.no_use_fast
)
return tokenizer
def huggingface_loader(model_name):
path_to_model = Path(f'{shared.args.model_dir}/{model_name}')
params = {
'low_cpu_mem_usage': True,
'trust_remote_code': shared.args.trust_remote_code,
'torch_dtype': torch.bfloat16 if shared.args.bf16 else torch.float16,
'use_safetensors': True if shared.args.force_safetensors else None
}
if shared.args.use_flash_attention_2:
params['use_flash_attention_2'] = True
config = AutoConfig.from_pretrained(path_to_model, trust_remote_code=params['trust_remote_code'])
if 'chatglm' in model_name.lower():
LoaderClass = AutoModel
else:
if config.to_dict().get('is_encoder_decoder', False):
LoaderClass = AutoModelForSeq2SeqLM
shared.is_seq2seq = True
else:
LoaderClass = AutoModelForCausalLM
# Load the model in simple 16-bit mode by default
if not any([shared.args.cpu, shared.args.load_in_8bit, shared.args.load_in_4bit, shared.args.auto_devices, shared.args.disk, shared.args.deepspeed, shared.args.gpu_memory is not None, shared.args.cpu_memory is not None, shared.args.compress_pos_emb > 1, shared.args.alpha_value > 1, shared.args.disable_exllama, shared.args.disable_exllamav2]):
model = LoaderClass.from_pretrained(path_to_model, **params)
if torch.backends.mps.is_available():
device = torch.device('mps')
model = model.to(device)
elif is_xpu_available():
device = torch.device("xpu")
model = model.to(device)
else:
model = model.cuda()
# DeepSpeed ZeRO-3
elif shared.args.deepspeed:
model = LoaderClass.from_pretrained(path_to_model, torch_dtype=params['torch_dtype'])
model = deepspeed.initialize(model=model, config_params=ds_config, model_parameters=None, optimizer=None, lr_scheduler=None)[0]
model.module.eval() # Inference
logger.info(f'DeepSpeed ZeRO-3 is enabled: {is_deepspeed_zero3_enabled()}')
# Load with quantization and/or offloading
else:
if not any((shared.args.cpu, torch.cuda.is_available(), is_xpu_available(), torch.backends.mps.is_available())):
logger.warning('torch.cuda.is_available() and is_xpu_available() returned False. This means that no GPU has been detected. Falling back to CPU mode.')
shared.args.cpu = True
if shared.args.cpu:
params['torch_dtype'] = torch.float32
else:
params['device_map'] = 'auto'
params['max_memory'] = get_max_memory_dict()
if shared.args.load_in_4bit:
# See https://github.com/huggingface/transformers/pull/23479/files
# and https://huggingface.co/blog/4bit-transformers-bitsandbytes
quantization_config_params = {
'load_in_4bit': True,
'bnb_4bit_compute_dtype': eval("torch.{}".format(shared.args.compute_dtype)) if shared.args.compute_dtype in ["bfloat16", "float16", "float32"] else None,
'bnb_4bit_quant_type': shared.args.quant_type,
'bnb_4bit_use_double_quant': shared.args.use_double_quant,
}
logger.info('Using the following 4-bit params: ' + str(quantization_config_params))
params['quantization_config'] = BitsAndBytesConfig(**quantization_config_params)
elif shared.args.load_in_8bit:
if any((shared.args.auto_devices, shared.args.gpu_memory)):
params['quantization_config'] = BitsAndBytesConfig(load_in_8bit=True, llm_int8_enable_fp32_cpu_offload=True)
else:
params['quantization_config'] = BitsAndBytesConfig(load_in_8bit=True)
if params['max_memory'] is not None:
with init_empty_weights():
model = LoaderClass.from_config(config, trust_remote_code=params['trust_remote_code'])
model.tie_weights()
params['device_map'] = infer_auto_device_map(
model,
dtype=torch.int8,
max_memory=params['max_memory'],
no_split_module_classes=model._no_split_modules
)
if shared.args.disk:
params['offload_folder'] = shared.args.disk_cache_dir
if shared.args.disable_exllama or shared.args.disable_exllamav2:
try:
gptq_config = GPTQConfig(
bits=config.quantization_config.get('bits', 4),
disable_exllama=shared.args.disable_exllama,
disable_exllamav2=shared.args.disable_exllamav2,
)
params['quantization_config'] = gptq_config
logger.info(f'Loading with disable_exllama={shared.args.disable_exllama} and disable_exllamav2={shared.args.disable_exllamav2}.')
except:
exc = traceback.format_exc()
logger.error('Failed to disable exllama. Does the config.json for this model contain the necessary quantization info?')
print(exc)
if shared.args.compress_pos_emb > 1:
params['rope_scaling'] = {'type': 'linear', 'factor': shared.args.compress_pos_emb}
elif shared.args.alpha_value > 1:
params['rope_scaling'] = {'type': 'dynamic', 'factor': RoPE.get_alpha_value(shared.args.alpha_value, shared.args.rope_freq_base)}
model = LoaderClass.from_pretrained(path_to_model, **params)
return model
def llamacpp_loader(model_name):
from modules.llamacpp_model import LlamaCppModel
path = Path(f'{shared.args.model_dir}/{model_name}')
if path.is_file():
model_file = path
else:
model_file = list(Path(f'{shared.args.model_dir}/{model_name}').glob('*.gguf'))[0]
logger.info(f"llama.cpp weights detected: {model_file}")
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
return model, tokenizer
def llamacpp_HF_loader(model_name):
from modules.llamacpp_hf import LlamacppHF
for fname in [model_name, "oobabooga_llama-tokenizer", "llama-tokenizer"]:
path = Path(f'{shared.args.model_dir}/{fname}')
if all((path / file).exists() for file in ['tokenizer_config.json', 'special_tokens_map.json', 'tokenizer.model']):
logger.info(f'Using tokenizer from: {path}')
break
else:
logger.error("Could not load the model because a tokenizer in transformers format was not found. Please download oobabooga/llama-tokenizer.")
return None, None
if shared.args.no_use_fast:
logger.info('Loading the tokenizer with use_fast=False.')
tokenizer = AutoTokenizer.from_pretrained(
path,
trust_remote_code=shared.args.trust_remote_code,
use_fast=not shared.args.no_use_fast
)
model = LlamacppHF.from_pretrained(model_name)
return model, tokenizer
def ctransformers_loader(model_name):
from modules.ctransformers_model import CtransformersModel
path = Path(f'{shared.args.model_dir}/{model_name}')
ctrans = CtransformersModel()
if ctrans.model_type_is_auto():
model_file = path
else:
if path.is_file():
model_file = path
else:
entries = Path(f'{shared.args.model_dir}/{model_name}')
gguf = list(entries.glob('*.gguf'))
bin = list(entries.glob('*.bin'))
if len(gguf) > 0:
model_file = gguf[0]
elif len(bin) > 0:
model_file = bin[0]
else:
logger.error("Could not find a model for ctransformers.")
return None, None
logger.info(f'ctransformers weights detected: {model_file}')
model, tokenizer = ctrans.from_pretrained(model_file)
return model, tokenizer
def AutoAWQ_loader(model_name):
from awq import AutoAWQForCausalLM
model_dir = Path(f'{shared.args.model_dir}/{model_name}')
model = AutoAWQForCausalLM.from_quantized(
quant_path=model_dir,
max_new_tokens=shared.args.max_seq_len,
trust_remote_code=shared.args.trust_remote_code,
fuse_layers=not shared.args.no_inject_fused_attention,
max_memory=get_max_memory_dict(),
batch_size=1,
safetensors=any(model_dir.glob('*.safetensors')),
)
return model
def bigdl_llm_loader(model_name):
from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel, AutoModelForSeq2SeqLM
path_to_model = Path(f'{shared.args.model_dir}/{model_name}')
config = AutoConfig.from_pretrained(path_to_model, trust_remote_code=shared.args.trust_remote_code)
if 'chatglm' in model_name.lower():
LoaderClass = AutoModel
else:
if config.to_dict().get('is_encoder_decoder', False):
LoaderClass = AutoModelForSeq2SeqLM
shared.is_seq2seq = True
else:
LoaderClass = AutoModelForCausalLM
model = LoaderClass.from_pretrained(
path_to_model,
load_in_4bit=shared.args.load_in_4bit,
load_in_low_bit=shared.args.load_in_low_bit,
optimize_model=shared.args.optimize_model,
modules_to_not_convert=shared.args.modules_to_not_convert,
cpu_embedding=shared.args.cpu_embedding,
lightweight_bmm=shared.args.lightweight_bmm,
trust_remote_code=shared.args.trust_remote_code,
use_cache=shared.args.use_cache,
)
if shared.args.device == "GPU":
import intel_extension_for_pytorch
model = model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(path_to_model, trust_remote_code=shared.args.trust_remote_code)
return model, tokenizer
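# For reference, a minimal stand-alone sketch of what this loader does
# (illustrative only; the model path and precision below are placeholders):
#
#   from bigdl.llm.transformers import AutoModelForCausalLM
#   from transformers import AutoTokenizer
#
#   model = AutoModelForCausalLM.from_pretrained(
#       'models/Llama-2-7b-chat-hf',
#       load_in_low_bit='sym_int4',   # or load_in_4bit=True
#       optimize_model=True,
#       trust_remote_code=True,
#       use_cache=True,
#   )
#   model = model.to('xpu')           # Intel GPU only; requires intel_extension_for_pytorch
#   tokenizer = AutoTokenizer.from_pretrained('models/Llama-2-7b-chat-hf',
#                                             trust_remote_code=True)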
def QuipSharp_loader(model_name):
try:
with RelativeImport("repositories/quip-sharp"):
from lib.utils.unsafe_import import model_from_hf_path
except:
logger.error(
"\nQuIP# has not been found. It must be installed manually for now.\n"
"For instructions on how to do that, please consult:\n"
"https://github.com/oobabooga/text-generation-webui/pull/4803\n"
)
return None, None
# This fixes duplicate logging messages after the import above.
handlers = logging.getLogger().handlers
if len(handlers) > 1:
logging.getLogger().removeHandler(handlers[1])
model_dir = Path(f'{shared.args.model_dir}/{model_name}')
if not all((model_dir / file).exists() for file in ['tokenizer_config.json', 'special_tokens_map.json', 'tokenizer.model']):
logger.error(f"Could not load the model because the tokenizer files could not be found in the model folder. Please download the following files from the original (unquantized) model into {model_dir}: special_tokens_map.json, tokenizer.json, tokenizer.model, tokenizer_config.json.")
return None, None
model, model_str = model_from_hf_path(
model_dir,
use_cuda_graph=False,
use_flash_attn=not shared.args.no_flash_attn
)
return model
def GPTQ_loader(model_name):
# Monkey patch
if shared.args.monkey_patch:
logger.warning("Applying the monkey patch for using LoRAs with GPTQ models. It may cause undefined behavior outside its intended scope.")
from modules.monkey_patch_gptq_lora import load_model_llama
model, _ = load_model_llama(model_name)
# No monkey patch
else:
import modules.GPTQ_loader
model = modules.GPTQ_loader.load_quantized(model_name)
return model
def AutoGPTQ_loader(model_name):
import modules.AutoGPTQ_loader
return modules.AutoGPTQ_loader.load_quantized(model_name)
def ExLlamav2_loader(model_name):
from modules.exllamav2 import Exllamav2Model
model, tokenizer = Exllamav2Model.from_pretrained(model_name)
return model, tokenizer
def ExLlamav2_HF_loader(model_name):
from modules.exllamav2_hf import Exllamav2HF
return Exllamav2HF.from_pretrained(model_name)
def HQQ_loader(model_name):
from hqq.core.quantize import HQQBackend, HQQLinear
from hqq.engine.hf import HQQModelForCausalLM
logger.info(f"Loading HQQ model with backend: {shared.args.hqq_backend}")
model_dir = Path(f'{shared.args.model_dir}/{model_name}')
model = HQQModelForCausalLM.from_quantized(str(model_dir))
HQQLinear.set_backend(getattr(HQQBackend, shared.args.hqq_backend))
return model
def get_max_memory_dict():
max_memory = {}
max_cpu_memory = shared.args.cpu_memory.strip() if shared.args.cpu_memory is not None else '99GiB'
if shared.args.gpu_memory:
memory_map = list(map(lambda x: x.strip(), shared.args.gpu_memory))
for i in range(len(memory_map)):
max_memory[i] = f'{memory_map[i]}GiB' if not re.match('.*ib$', memory_map[i].lower()) else memory_map[i]
max_memory['cpu'] = f'{max_cpu_memory}GiB' if not re.match('.*ib$', max_cpu_memory.lower()) else max_cpu_memory
# If --auto-devices is provided standalone, try to get a reasonable value
# for the maximum memory of device :0
elif shared.args.auto_devices:
if is_xpu_available():
total_mem = (torch.xpu.get_device_properties(0).total_memory / (1024 * 1024))
else:
total_mem = (torch.cuda.get_device_properties(0).total_memory / (1024 * 1024))
suggestion = round((total_mem - 1000) / 1000) * 1000
if total_mem - suggestion < 800:
suggestion -= 1000
suggestion = int(round(suggestion / 1000))
logger.warning(f"Auto-assiging --gpu-memory {suggestion} for your GPU to try to prevent out-of-memory errors. You can manually set other values.")
max_memory[0] = f'{suggestion}GiB'
max_memory['cpu'] = f'{max_cpu_memory}GiB' if not re.match('.*ib$', max_cpu_memory.lower()) else max_cpu_memory
return max_memory if len(max_memory) > 0 else None
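# Illustrative result (not part of the original module): with
# --gpu-memory 8 --cpu-memory 32 this function returns
#   {0: '8GiB', 'cpu': '32GiB'}
# while with no memory flags and no --auto-devices it returns None, letting
# transformers pick its own device map.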
def clear_torch_cache():
gc.collect()
if not shared.args.cpu:
if is_xpu_available():
torch.xpu.empty_cache()
else:
torch.cuda.empty_cache()
def unload_model():
shared.model = shared.tokenizer = None
shared.model_name = 'None'
shared.lora_names = []
shared.model_dirty_from_training = False
clear_torch_cache()
def reload_model():
unload_model()
shared.model, shared.tokenizer = load_model(shared.model_name)

View file

@ -0,0 +1,288 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/models_settings.py
import json
import re
from pathlib import Path
import yaml
from modules import chat, loaders, metadata_gguf, shared, ui
def get_fallback_settings():
return {
'wbits': 'None',
'groupsize': 'None',
'desc_act': False,
'model_type': 'None',
'max_seq_len': 2048,
'n_ctx': 2048,
'rope_freq_base': 0,
'compress_pos_emb': 1,
'truncation_length': shared.settings['truncation_length'],
'skip_special_tokens': shared.settings['skip_special_tokens'],
'custom_stopping_strings': shared.settings['custom_stopping_strings'],
}
def get_model_metadata(model):
model_settings = {}
# Get settings from models/config.yaml and models/config-user.yaml
settings = shared.model_config
for pat in settings:
if re.match(pat.lower(), model.lower()):
for k in settings[pat]:
model_settings[k] = settings[pat][k]
path = Path(f'{shared.args.model_dir}/{model}/config.json')
if path.exists():
hf_metadata = json.loads(open(path, 'r', encoding='utf-8').read())
else:
hf_metadata = None
if 'loader' not in model_settings:
if hf_metadata is not None and 'quip_params' in hf_metadata:
loader = 'QuIP#'
else:
loader = infer_loader(model, model_settings)
model_settings['loader'] = loader
# GGUF metadata
if model_settings['loader'] in ['llama.cpp', 'llamacpp_HF', 'ctransformers']:
path = Path(f'{shared.args.model_dir}/{model}')
if path.is_file():
model_file = path
else:
model_file = list(path.glob('*.gguf'))[0]
metadata = metadata_gguf.load_metadata(model_file)
if 'llama.context_length' in metadata:
model_settings['n_ctx'] = metadata['llama.context_length']
if 'llama.rope.scale_linear' in metadata:
model_settings['compress_pos_emb'] = metadata['llama.rope.scale_linear']
if 'llama.rope.freq_base' in metadata:
model_settings['rope_freq_base'] = metadata['llama.rope.freq_base']
if 'tokenizer.chat_template' in metadata:
template = metadata['tokenizer.chat_template']
eos_token = metadata['tokenizer.ggml.tokens'][metadata['tokenizer.ggml.eos_token_id']]
bos_token = metadata['tokenizer.ggml.tokens'][metadata['tokenizer.ggml.bos_token_id']]
template = template.replace('eos_token', "'{}'".format(eos_token))
template = template.replace('bos_token', "'{}'".format(bos_token))
template = re.sub(r'raise_exception\([^)]*\)', "''", template)
model_settings['instruction_template'] = 'Custom (obtained from model metadata)'
model_settings['instruction_template_str'] = template
else:
# Transformers metadata
if hf_metadata is not None:
metadata = json.loads(open(path, 'r', encoding='utf-8').read())
if 'max_position_embeddings' in metadata:
model_settings['truncation_length'] = metadata['max_position_embeddings']
model_settings['max_seq_len'] = metadata['max_position_embeddings']
if 'rope_theta' in metadata:
model_settings['rope_freq_base'] = metadata['rope_theta']
if 'rope_scaling' in metadata and type(metadata['rope_scaling']) is dict and all(key in metadata['rope_scaling'] for key in ('type', 'factor')):
if metadata['rope_scaling']['type'] == 'linear':
model_settings['compress_pos_emb'] = metadata['rope_scaling']['factor']
if 'quantization_config' in metadata:
if 'bits' in metadata['quantization_config']:
model_settings['wbits'] = metadata['quantization_config']['bits']
if 'group_size' in metadata['quantization_config']:
model_settings['groupsize'] = metadata['quantization_config']['group_size']
if 'desc_act' in metadata['quantization_config']:
model_settings['desc_act'] = metadata['quantization_config']['desc_act']
# Read AutoGPTQ metadata
path = Path(f'{shared.args.model_dir}/{model}/quantize_config.json')
if path.exists():
metadata = json.loads(open(path, 'r', encoding='utf-8').read())
if 'bits' in metadata:
model_settings['wbits'] = metadata['bits']
if 'group_size' in metadata:
model_settings['groupsize'] = metadata['group_size']
if 'desc_act' in metadata:
model_settings['desc_act'] = metadata['desc_act']
# Try to find the Jinja instruct template
path = Path(f'{shared.args.model_dir}/{model}') / 'tokenizer_config.json'
if path.exists():
metadata = json.loads(open(path, 'r', encoding='utf-8').read())
if 'chat_template' in metadata:
template = metadata['chat_template']
for k in ['eos_token', 'bos_token']:
if k in metadata:
value = metadata[k]
if type(value) is dict:
value = value['content']
template = template.replace(k, "'{}'".format(value))
template = re.sub(r'raise_exception\([^)]*\)', "''", template)
model_settings['instruction_template'] = 'Custom (obtained from model metadata)'
model_settings['instruction_template_str'] = template
if 'instruction_template' not in model_settings:
model_settings['instruction_template'] = 'Alpaca'
if model_settings['instruction_template'] != 'Custom (obtained from model metadata)':
model_settings['instruction_template_str'] = chat.load_instruction_template(model_settings['instruction_template'])
# Ignore rope_freq_base if set to the default value
if 'rope_freq_base' in model_settings and model_settings['rope_freq_base'] == 10000:
model_settings.pop('rope_freq_base')
# Apply user settings from models/config-user.yaml
settings = shared.user_config
for pat in settings:
if re.match(pat.lower(), model.lower()):
for k in settings[pat]:
model_settings[k] = settings[pat][k]
return model_settings
def infer_loader(model_name, model_settings):
path_to_model = Path(f'{shared.args.model_dir}/{model_name}')
if not path_to_model.exists():
loader = None
elif (path_to_model / 'quantize_config.json').exists() or ('wbits' in model_settings and type(model_settings['wbits']) is int and model_settings['wbits'] > 0):
loader = 'ExLlamav2_HF'
elif (path_to_model / 'quant_config.json').exists() or re.match(r'.*-awq', model_name.lower()):
loader = 'AutoAWQ'
elif len(list(path_to_model.glob('*.gguf'))) > 0:
loader = 'llama.cpp'
elif re.match(r'.*\.gguf', model_name.lower()):
loader = 'llama.cpp'
elif re.match(r'.*exl2', model_name.lower()):
loader = 'ExLlamav2_HF'
elif re.match(r'.*-hqq', model_name.lower()):
return 'HQQ'
else:
loader = 'Transformers'
return loader
def update_model_parameters(state, initial=False):
'''
UI: update the command-line arguments based on the interface values
'''
elements = ui.list_model_elements() # the names of the parameters
gpu_memories = []
for i, element in enumerate(elements):
if element not in state:
continue
value = state[element]
if element.startswith('gpu_memory'):
gpu_memories.append(value)
continue
if initial and element in shared.provided_arguments:
continue
# Setting null defaults
if element in ['wbits', 'groupsize', 'model_type'] and value == 'None':
value = vars(shared.args_defaults)[element]
elif element in ['cpu_memory'] and value == 0:
value = vars(shared.args_defaults)[element]
# Making some simple conversions
if element in ['wbits', 'groupsize', 'pre_layer']:
value = int(value)
elif element == 'cpu_memory' and value is not None:
value = f"{value}MiB"
if element in ['pre_layer']:
value = [value] if value > 0 else None
setattr(shared.args, element, value)
found_positive = False
for i in gpu_memories:
if i > 0:
found_positive = True
break
if not (initial and vars(shared.args)['gpu_memory'] != vars(shared.args_defaults)['gpu_memory']):
if found_positive:
shared.args.gpu_memory = [f"{i}MiB" for i in gpu_memories]
else:
shared.args.gpu_memory = None
def apply_model_settings_to_state(model, state):
'''
UI: update the state variable with the model settings
'''
model_settings = get_model_metadata(model)
if 'loader' in model_settings:
loader = model_settings.pop('loader')
# If the user is using an alternative loader for the same model type, let them keep using it
if not (loader == 'ExLlamav2_HF' and state['loader'] in ['GPTQ-for-LLaMa', 'ExLlamav2', 'AutoGPTQ']) and not (loader == 'llama.cpp' and state['loader'] in ['llamacpp_HF', 'ctransformers']):
state['loader'] = loader
for k in model_settings:
if k in state:
if k in ['wbits', 'groupsize']:
state[k] = str(model_settings[k])
else:
state[k] = model_settings[k]
return state
def save_model_settings(model, state):
'''
Save the settings for this model to models/config-user.yaml
'''
if model == 'None':
yield ("Not saving the settings because no model is loaded.")
return
with Path(f'{shared.args.model_dir}/config-user.yaml') as p:
if p.exists():
user_config = yaml.safe_load(open(p, 'r').read())
else:
user_config = {}
model_regex = model + '$' # For exact matches
if model_regex not in user_config:
user_config[model_regex] = {}
for k in ui.list_model_elements():
if k == 'loader' or k in loaders.loaders_and_params[state['loader']]:
user_config[model_regex][k] = state[k]
shared.user_config = user_config
output = yaml.dump(user_config, sort_keys=False)
with open(p, 'w') as f:
f.write(output)
yield (f"Settings for `{model}` saved to `{p}`.")


@@ -0,0 +1,28 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/one_click_installer_check.py
from pathlib import Path
from modules.logging_colors import logger
if Path('../webui.py').exists():
logger.warning('\nIt looks like you are running an outdated version of '
'the one-click-installers.\n'
'Please migrate your installation following the instructions here:\n'
'https://github.com/oobabooga/text-generation-webui/wiki/Migrating-an-old-one%E2%80%90click-install')


@@ -0,0 +1,138 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/presets.py
import functools
import random
from pathlib import Path
import yaml
from modules import shared
from modules.loaders import loaders_samplers
def default_preset():
return {
'temperature': 1,
'temperature_last': False,
'dynamic_temperature': False,
'dynamic_temperature_low': 0.1,
'top_p': 1,
'min_p': 0,
'top_k': 0,
'repetition_penalty': 1,
'presence_penalty': 0,
'frequency_penalty': 0,
'repetition_penalty_range': 1024,
'typical_p': 1,
'tfs': 1,
'top_a': 0,
'epsilon_cutoff': 0,
'eta_cutoff': 0,
'guidance_scale': 1,
'penalty_alpha': 0,
'mirostat_mode': 0,
'mirostat_tau': 5,
'mirostat_eta': 0.1,
'do_sample': True,
'encoder_repetition_penalty': 1,
'no_repeat_ngram_size': 0,
'min_length': 0,
'num_beams': 1,
'length_penalty': 1,
'early_stopping': False,
}
def presets_params():
return [k for k in default_preset()]
def load_preset(name):
generate_params = default_preset()
if name not in ['None', None, '']:
with open(Path(f'presets/{name}.yaml'), 'r') as infile:
preset = yaml.safe_load(infile)
for k in preset:
generate_params[k] = preset[k]
return generate_params
@functools.cache
def load_preset_memoized(name):
return load_preset(name)
def load_preset_for_ui(name, state):
generate_params = load_preset(name)
state.update(generate_params)
return state, *[generate_params[k] for k in presets_params()]
def random_preset(state):
params_and_values = {
'remove_tail_tokens': {
'top_p': [0.5, 0.8, 0.9, 0.95, 0.99],
'min_p': [0.5, 0.2, 0.1, 0.05, 0.01],
'top_k': [3, 5, 10, 20, 30, 40],
'typical_p': [0.2, 0.575, 0.95],
'tfs': [0.5, 0.8, 0.9, 0.95, 0.99],
'top_a': [0.5, 0.2, 0.1, 0.05, 0.01],
'epsilon_cutoff': [1, 3, 5, 7, 9],
'eta_cutoff': [3, 6, 9, 12, 15, 18],
},
'flatten_distribution': {
'temperature': [0.5, 0.7, 0.8, 1, 1.2, 1.5, 2.0],
},
'repetition': {
'repetition_penalty': [1, 1.05, 1.1, 1.15, 1.20, 1.25],
'presence_penalty': [0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0],
'frequency_penalty': [0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0],
},
'other': {
'temperature_last': [True, False],
}
}
generate_params = default_preset()
for cat in params_and_values:
choices = list(params_and_values[cat].keys())
if shared.args.loader is not None:
choices = [x for x in choices if x in loaders_samplers[shared.args.loader]]
if len(choices) > 0:
choice = random.choice(choices)
generate_params[choice] = random.choice(params_and_values[cat][choice])
state.update(generate_params)
return state, *[generate_params[k] for k in presets_params()]
def generate_preset_yaml(state):
defaults = default_preset()
data = {k: state[k] for k in presets_params()}
# Remove entries that are identical to the defaults
for k in list(data.keys()):
if data[k] == defaults[k]:
del data[k]
return yaml.dump(data, sort_keys=False)
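
A small sketch of how the preset helpers above are typically used, assuming the script runs from the WebUI root so `presets/*.yaml` resolves; `simple-1` is the preset referenced by the default settings:

```python
# Sketch: load_preset() starts from default_preset() and overrides only the keys the
# YAML file defines; generate_preset_yaml() drops values equal to the defaults.
from modules.presets import default_preset, load_preset, generate_preset_yaml

params = load_preset('simple-1')
print(params['temperature'], params['top_p'])

state = default_preset()
state['temperature'] = 0.7
print(generate_preset_yaml(state))   # only "temperature: 0.7" survives
```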


@@ -0,0 +1,46 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/prompts.py
from pathlib import Path
from modules.text_generation import get_encoded_length
def load_prompt(fname):
if fname in ['None', '']:
return ''
else:
file_path = Path(f'prompts/{fname}.txt')
if not file_path.exists():
return ''
with open(file_path, 'r', encoding='utf-8') as f:
text = f.read()
if text[-1] == '\n':
text = text[:-1]
return text
def count_tokens(text):
try:
tokens = get_encoded_length(text)
return str(tokens)
except:
return '0'
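
A brief sketch for the prompt helpers above, assuming `prompts/QA.txt` exists as in the upstream WebUI; `count_tokens()` additionally needs a tokenizer to be loaded and otherwise falls back to `'0'`:

```python
# Sketch: load_prompt() returns the file contents without the trailing newline,
# or an empty string when the prompt name is unknown.
from modules.prompts import load_prompt, count_tokens

text = load_prompt('QA')
print(len(text), 'characters,', count_tokens(text), 'tokens')
```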


@@ -0,0 +1,32 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/relative_imports.py
import sys
from pathlib import Path
class RelativeImport:
def __init__(self, path):
self.import_path = Path(path)
def __enter__(self):
sys.path.insert(0, str(self.import_path))
def __exit__(self, exc_type, exc_value, traceback):
sys.path.remove(str(self.import_path))
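
The context manager above is easiest to read with an example; a short sketch (the `repositories/example_repo` path is hypothetical):

```python
# Sketch: RelativeImport temporarily prepends a directory to sys.path so a vendored
# package can be imported, and removes it again when the block exits.
from modules.relative_imports import RelativeImport

with RelativeImport('repositories/example_repo'):
    # Packages living under repositories/example_repo are importable here.
    pass
# sys.path is restored once the block exits.
```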


@@ -0,0 +1,403 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/sampler_hijack.py
import math
import torch
import transformers
from transformers import LogitsWarper, is_torch_xpu_available
from transformers.generation.logits_process import (
LogitNormalization,
LogitsProcessor,
LogitsProcessorList,
TemperatureLogitsWarper
)
from modules import shared
global_scores = None
class TemperatureLogitsWarperWithDynatemp(LogitsWarper):
def __init__(self, temperature: float, dynamic_temperature: bool, dynamic_temperature_low: float):
if not isinstance(temperature, float) or not (temperature > 0):
except_msg = (
f"`temperature` (={temperature}) has to be a strictly positive float, otherwise your next token "
"scores will be invalid."
)
if isinstance(temperature, float) and temperature == 0.0:
except_msg += " If you're looking for greedy decoding strategies, set `do_sample=False`."
raise ValueError(except_msg)
self.temperature = temperature
self.dynamic_temperature = dynamic_temperature
self.dynamic_temperature_low = dynamic_temperature_low
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
# Regular temperature
if not self.dynamic_temperature:
scores = scores / self.temperature
return scores
# Dynamic temperature
else:
min_temp = self.dynamic_temperature_low
max_temp = self.temperature
exponent_val = 1.0
# Convert logits to probabilities
probs = torch.softmax(scores, dim=-1)
# Calculate entropy of the softmax probabilities
entropy = -1.0 * torch.where(probs > 0, probs * torch.log(probs), torch.zeros_like(probs)).sum()
# Guard against future possible division by zero
entropy = max(entropy, torch.tensor(1e-10)) # Ensures entropy is slightly greater than 0
# Any logits which are not -Infinity will be considered for calculating max entropy.
num_valid_tokens = torch.sum(scores > -float('inf')).item()
# Now, calculate the max entropy by using only the valid tokens' count
max_entropy = math.log(num_valid_tokens)
# Guard against future possible division by zero
max_entropy = max_entropy if max_entropy > 0.0 else 1e-10
# Normalize the entropy
normalized_entropy = entropy / max_entropy
# Map the normalized entropy to the desired temperature range using the power function
dyn_temp = min_temp + (max_temp - min_temp) * (normalized_entropy.pow(exponent_val))
# Apply the dynamically calculated temperature scaling
scores = scores / dyn_temp
# print("----------------------\nTemperature from generation_config:", self.temperature)
# print("min_temp:", min_temp)
# print("max_temp:", max_temp)
# print("Entropy:", entropy.item())
# print("Max Possible Entropy considering valid tokens only:", max_entropy)
# print("Normalized Entropy:", normalized_entropy.item())
# print("Dynamic Temperature (dyn_temp):", dyn_temp.item())
# print("----------------------")
# max_prob_token_id = torch.argmax(scores, dim=-1) # Get the token ID with the highest probability
# max_prob_token = shared.tokenizer.convert_ids_to_tokens(int(max_prob_token_id)) # Convert ID to token
# print("--- T=", float(dyn_temp), "token=", max_prob_token, "min=", min_temp, "max=", max_temp)
return scores
class MinPLogitsWarper(LogitsWarper):
def __init__(self, min_p: float, filter_value: float = -float("Inf"), min_tokens_to_keep: int = 1):
if min_p < 0 or min_p > 1.0:
raise ValueError(f"`min_p` has to be a float >= 0 and <= 1, but is {min_p}")
self.min_p = min_p
self.filter_value = filter_value
self.min_tokens_to_keep = min_tokens_to_keep
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
# Convert logits to probabilities
probs = torch.softmax(scores, dim=-1)
# Get the probability of the top token for each sequence in the batch
top_probs, _ = probs.max(dim=-1, keepdim=True)
# Calculate the actual min_p threshold by scaling min_p with the top token's probability
scaled_min_p = self.min_p * top_probs
# Create a mask for tokens that have a probability less than the scaled min_p
tokens_to_remove = probs < scaled_min_p
sorted_indices = torch.argsort(scores, descending=True, dim=-1)
sorted_indices_to_remove = torch.gather(tokens_to_remove, dim=-1, index=sorted_indices)
if self.min_tokens_to_keep > 1:
# Keep at least min_tokens_to_keep
sorted_indices_to_remove[..., : self.min_tokens_to_keep] = False
indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
scores = scores.masked_fill(indices_to_remove, self.filter_value)
return scores
class TailFreeLogitsWarper(LogitsWarper):
def __init__(self, tfs: float, filter_value: float = -float("Inf"), min_tokens_to_keep: int = 1):
tfs = float(tfs)
if tfs < 0 or tfs > 1.0:
raise ValueError(f"`tfs` has to be a float >= 0 and <= 1, but is {tfs}")
self.tfs = tfs
self.filter_value = filter_value
self.min_tokens_to_keep = min_tokens_to_keep
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
sorted_logits, sorted_indices = torch.sort(scores, descending=True)
probs = sorted_logits.softmax(dim=-1)
# Compute second derivative normalized CDF
d2 = probs.diff().diff().abs()
normalized_d2 = d2 / d2.sum(dim=-1, keepdim=True)
normalized_d2_cdf = normalized_d2.cumsum(dim=-1)
# Remove tokens with CDF value above the threshold (tokens with 0 are kept)
sorted_indices_to_remove = normalized_d2_cdf > self.tfs
# Centre the distribution around the cutoff as in the original implementation of the algorithm
sorted_indices_to_remove = torch.cat(
(
torch.zeros(scores.shape[0], 1, dtype=torch.bool, device=scores.device),
sorted_indices_to_remove,
torch.ones(scores.shape[0], 1, dtype=torch.bool, device=scores.device),
),
dim=-1,
)
if self.min_tokens_to_keep > 1:
# Keep at least min_tokens_to_keep
sorted_indices_to_remove[..., : self.min_tokens_to_keep] = 0
indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
scores = scores.masked_fill(indices_to_remove, self.filter_value)
return scores
class TopALogitsWarper(LogitsWarper):
def __init__(self, top_a: float, filter_value: float = -float("Inf"), min_tokens_to_keep: int = 1):
top_a = float(top_a)
if top_a < 0 or top_a > 1.0:
raise ValueError(f"`top_a` has to be a float >= 0 and <= 1, but is {top_a}")
self.top_a = top_a
self.filter_value = filter_value
self.min_tokens_to_keep = min_tokens_to_keep
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
sorted_logits, sorted_indices = torch.sort(scores, descending=True)
probs = sorted_logits.softmax(dim=-1)
# Remove tokens with probability less than top_a*(max(probs))^2 (tokens with 0 are kept)
probs_max = probs[..., 0, None]
sorted_indices_to_remove = probs < probs_max * probs_max * self.top_a
if self.min_tokens_to_keep > 1:
# Keep at least min_tokens_to_keep
sorted_indices_to_remove[..., : self.min_tokens_to_keep] = 0
indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
scores = scores.masked_fill(indices_to_remove, self.filter_value)
return scores
class MirostatLogitsWarper(LogitsWarper):
def __init__(self, mirostat_mode: int, mirostat_tau: float, mirostat_eta: float, filter_value: float = -float("Inf"), min_tokens_to_keep: int = 1):
if mirostat_mode not in [2]:
raise ValueError(f"`mirostat` has to be a an integer 2, but is {mirostat_mode}")
self.mirostat_mode = mirostat_mode
self.mirostat_eta = mirostat_eta
self.mirostat_tau = mirostat_tau
self.filter_value = filter_value
self.min_tokens_to_keep = min_tokens_to_keep
self.mu = 2 * self.mirostat_tau
self.e = 0
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
logits = scores[0]
sorted_logits, sorted_indices = torch.sort(logits, descending=True)
prob_original = torch.softmax(sorted_logits, dim=-1).tolist() # candidates
# Truncate the words with surprise values greater than mu
for i, candidate in enumerate(prob_original):
if candidate > 0 and -math.log2(candidate) > self.mu:
if (i == 0):
sorted_logits = sorted_logits[:1]
else:
sorted_logits = sorted_logits[:i]
break
# Normalize the probabilities of the remaining words
if is_torch_xpu_available():
prob_topk = torch.softmax(sorted_logits, dim=0).to("xpu")
prev_i = torch.multinomial(prob_topk, num_samples=1, replacement=True).to("xpu")
else:
prob_topk = torch.softmax(sorted_logits, dim=0).to('cuda')
prev_i = torch.multinomial(prob_topk, num_samples=1, replacement=True).to('cuda')
observed_surprise = -math.log2(prob_topk[prev_i])
self.e = observed_surprise - self.mirostat_tau
# Update mu using the learning rate and error
self.mu -= self.mirostat_eta * self.e
sorted_indices_to_remove = torch.ones_like(scores[0], dtype=torch.bool)
sorted_indices_to_remove[prev_i] = False
indices_to_remove = sorted_indices_to_remove.unsqueeze(0).scatter(1, sorted_indices.unsqueeze(0), sorted_indices_to_remove.unsqueeze(0))
scores = scores.masked_fill(indices_to_remove, self.filter_value)
return scores
class SpyLogitsWarper(LogitsWarper):
def __init__(self):
pass
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
global global_scores
global_scores = scores
return scores
class RepetitionPenaltyLogitsProcessorWithRange(LogitsProcessor):
'''
Copied from the transformers library
'''
def __init__(self, penalty: float, presence_penalty: float, frequency_penalty: float, _range: int):
if not (penalty > 0):
raise ValueError(f"`penalty` has to be strictly positive, but is {penalty}")
self.penalty = penalty
self.presence_penalty = presence_penalty
self.frequency_penalty = frequency_penalty
self._range = _range
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
input_ids = input_ids[:, -self._range:]
# We loop here because torch.unique() needs to process each row separately in the
# case that batch_size > 1.
for input_ids_row, scores_row in zip(input_ids, scores):
unique_ids, counts = torch.unique(input_ids_row, return_counts=True)
score = torch.gather(scores_row, 0, unique_ids)
# multiplicative repetition penalty
# if score < 0 then repetition penalty has to be multiplied to reduce the previous token probability
score = torch.where(score < 0, score * self.penalty, score / self.penalty)
scores_row.scatter_(0, unique_ids, score)
# presence_penalty and frequency_penalty
raw_presence_penalty = (counts > 0).to(scores.dtype)
raw_frequency_penalty = counts.to(scores.dtype)
additive_penalty = raw_presence_penalty * self.presence_penalty + raw_frequency_penalty * self.frequency_penalty
scores_row.scatter_add_(0, unique_ids, -additive_penalty)
return scores
def get_logits_warper_patch(self, generation_config):
# Make sure that temperature is float and not int
if isinstance(generation_config.temperature, int):
generation_config.temperature = float(generation_config.temperature)
temperature = generation_config.temperature
if generation_config.dynamic_temperature:
# Make sure TemperatureLogitsWarper will be created by temporarily
# setting temperature to a value != 1.
generation_config.temperature = 1.1
warpers = self._get_logits_warper_old(generation_config)
for i in range(len(warpers)):
if warpers[i].__class__.__name__ == 'TemperatureLogitsWarper':
warpers[i] = TemperatureLogitsWarperWithDynatemp(temperature, generation_config.dynamic_temperature, generation_config.dynamic_temperature_low)
warpers_to_add = LogitsProcessorList()
min_tokens_to_keep = 2 if generation_config.num_beams > 1 else 1
if generation_config.mirostat_mode is not None and generation_config.mirostat_mode == 2:
warpers_to_add.append(MirostatLogitsWarper(mirostat_mode=generation_config.mirostat_mode, mirostat_eta=generation_config.mirostat_eta, mirostat_tau=generation_config.mirostat_tau, min_tokens_to_keep=min_tokens_to_keep))
# We need to disable samplers other than temperature
for warper in warpers:
if not isinstance(warper, TemperatureLogitsWarper):
warpers.remove(warper)
else:
if generation_config.tfs is not None and 0.0 <= generation_config.tfs < 1.0:
warpers_to_add.append(TailFreeLogitsWarper(tfs=generation_config.tfs, min_tokens_to_keep=min_tokens_to_keep))
if generation_config.top_a is not None and 0.0 < generation_config.top_a <= 1.0:
warpers_to_add.append(TopALogitsWarper(top_a=generation_config.top_a, min_tokens_to_keep=min_tokens_to_keep))
if generation_config.min_p is not None and 0.0 < generation_config.min_p <= 1.0:
warpers_to_add.append(MinPLogitsWarper(min_p=generation_config.min_p, min_tokens_to_keep=min_tokens_to_keep))
if len(warpers) > 0 and isinstance(warpers[-1], LogitNormalization):
normalize = warpers.pop(-1)
else:
normalize = None
warpers += warpers_to_add
if generation_config.temperature_last:
temperature_idx = None
for i in range(len(warpers)):
if warpers[i].__class__.__name__ in ['TemperatureLogitsWarper', 'TemperatureLogitsWarperWithDynatemp']:
temperature_idx = i
break
if temperature_idx is not None:
warpers.append(warpers.pop(temperature_idx))
if normalize is not None:
warpers.append(normalize)
warpers.append(SpyLogitsWarper())
warpers = LogitsProcessorList(warpers)
# for i in range(len(warpers)):
# print(warpers[i].__class__.__name__)
return warpers
def get_logits_processor_patch(self, **kwargs):
repetition_penalty = kwargs['generation_config'].repetition_penalty
presence_penalty = kwargs['generation_config'].presence_penalty
frequency_penalty = kwargs['generation_config'].frequency_penalty
repetition_penalty_range = kwargs['generation_config'].repetition_penalty_range
do_rep_pen_hijack = (repetition_penalty > 1) or (presence_penalty != 0) or (frequency_penalty != 0)
if do_rep_pen_hijack:
# Make sure that a RepetitionPenaltyLogitsProcessor will be created
kwargs['generation_config'].repetition_penalty = 1.1 # must set to some value > 1
result = self._get_logits_processor_old(**kwargs)
if do_rep_pen_hijack:
for i in range(len(result)):
if result[i].__class__.__name__ == 'RepetitionPenaltyLogitsProcessor':
result[i] = RepetitionPenaltyLogitsProcessorWithRange(repetition_penalty, presence_penalty, frequency_penalty, repetition_penalty_range)
return result
def generation_config_init_patch(self, **kwargs):
self.__init___old(**kwargs)
self.min_p = kwargs.pop("min_p", 0.0)
self.dynamic_temperature = kwargs.pop("dynamic_temperature", False)
self.dynamic_temperature_low = kwargs.pop("dynamic_temperature_low", 0.1)
self.tfs = kwargs.pop("tfs", 1.0)
self.top_a = kwargs.pop("top_a", 0.0)
self.mirostat_mode = kwargs.pop("mirostat_mode", 0)
self.mirostat_eta = kwargs.pop("mirostat_eta", 0.1)
self.mirostat_tau = kwargs.pop("mirostat_tau", 5)
self.repetition_penalty_range = kwargs.pop("repetition_penalty_range", 0)
self.presence_penalty = kwargs.pop("presence_penalty", 0)
self.frequency_penalty = kwargs.pop("frequency_penalty", 0)
self.temperature_last = kwargs.pop("temperature_last", False)
def hijack_samplers():
transformers.GenerationMixin._get_logits_warper_old = transformers.GenerationMixin._get_logits_warper
transformers.GenerationMixin._get_logits_warper = get_logits_warper_patch
transformers.GenerationMixin._get_logits_processor_old = transformers.GenerationMixin._get_logits_processor
transformers.GenerationMixin._get_logits_processor = get_logits_processor_patch
transformers.GenerationConfig.__init___old = transformers.GenerationConfig.__init__
transformers.GenerationConfig.__init__ = generation_config_init_patch
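
`hijack_samplers()` has to run before generation so that transformers builds the patched warper/processor lists. As a standalone illustration of the min-p rule implemented by `MinPLogitsWarper` above, here is a simplified re-implementation in plain PyTorch (it ignores `min_tokens_to_keep` and the sorted-index bookkeeping):

```python
# Sketch of the min-p idea: tokens whose probability is below min_p * max(prob)
# are masked to -inf so they can never be sampled.
import torch

scores = torch.tensor([[2.0, 1.0, 0.0, -3.0]])   # fake logits for a 4-token vocabulary
min_p = 0.2

probs = torch.softmax(scores, dim=-1)                       # ~[0.66, 0.24, 0.09, 0.004]
threshold = min_p * probs.max(dim=-1, keepdim=True).values  # ~0.13
filtered = scores.masked_fill(probs < threshold, float('-inf'))
print(filtered)   # the two lowest-probability tokens are masked out
```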


@@ -0,0 +1,339 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/shared.py
import argparse
import os
import sys
from collections import OrderedDict
from pathlib import Path
import yaml
from modules.logging_colors import logger
# Model variables
model = None
tokenizer = None
model_name = 'None'
is_seq2seq = False
model_dirty_from_training = False
lora_names = []
# Generation variables
stop_everything = False
generation_lock = None
processing_message = '*Is typing...*'
# UI variables
gradio = {}
persistent_interface_state = {}
need_restart = False
# UI defaults
settings = {
'dark_theme': True,
'show_controls': True,
'start_with': '',
'mode': 'chat',
'chat_style': 'cai-chat',
'prompt-default': 'QA',
'prompt-notebook': 'QA',
'preset': 'simple-1',
'max_new_tokens': 512,
'max_new_tokens_min': 1,
'max_new_tokens_max': 4096,
'negative_prompt': '',
'seed': -1,
'truncation_length': 2048,
'truncation_length_min': 0,
'truncation_length_max': 200000,
'max_tokens_second': 0,
'max_updates_second': 0,
'custom_stopping_strings': '',
'custom_token_bans': '',
'auto_max_new_tokens': False,
'ban_eos_token': False,
'add_bos_token': True,
'skip_special_tokens': True,
'stream': True,
'character': 'Assistant',
'name1': 'You',
'custom_system_message': '',
'instruction_template_str': "{%- set ns = namespace(found=false) -%}\n{%- for message in messages -%}\n {%- if message['role'] == 'system' -%}\n {%- set ns.found = true -%}\n {%- endif -%}\n{%- endfor -%}\n{%- if not ns.found -%}\n {{- '' + 'Below is an instruction that describes a task. Write a response that appropriately completes the request.' + '\\n\\n' -}}\n{%- endif %}\n{%- for message in messages %}\n {%- if message['role'] == 'system' -%}\n {{- '' + message['content'] + '\\n\\n' -}}\n {%- else -%}\n {%- if message['role'] == 'user' -%}\n {{-'### Instruction:\\n' + message['content'] + '\\n\\n'-}}\n {%- else -%}\n {{-'### Response:\\n' + message['content'] + '\\n\\n' -}}\n {%- endif -%}\n {%- endif -%}\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n {{-'### Response:\\n'-}}\n{%- endif -%}",
'chat_template_str': "{%- for message in messages %}\n {%- if message['role'] == 'system' -%}\n {{- message['content'] + '\\n\\n' -}}\n {%- else -%}\n {%- if message['role'] == 'user' -%}\n {{- name1 + ': ' + message['content'] + '\\n'-}}\n {%- else -%}\n {{- name2 + ': ' + message['content'] + '\\n' -}}\n {%- endif -%}\n {%- endif -%}\n{%- endfor -%}",
'chat-instruct_command': 'Continue the chat dialogue below. Write a single reply for the character "<|character|>".\n\n<|prompt|>',
'autoload_model': False,
'gallery-items_per_page': 50,
'gallery-open': False,
'default_extensions': ['gallery'],
}
# Parser copied from https://github.com/vladmandic/automatic
parser = argparse.ArgumentParser(description="Text generation web UI", conflict_handler='resolve', add_help=True, formatter_class=lambda prog: argparse.HelpFormatter(prog, max_help_position=55, indent_increment=2, width=200))
# Basic settings
group = parser.add_argument_group('Basic settings')
group.add_argument('--multi-user', action='store_true', help='Multi-user mode. Chat histories are not saved or automatically loaded. Warning: this is likely not safe for sharing publicly.')
group.add_argument('--character', type=str, help='The name of the character to load in chat mode by default.')
group.add_argument('--model', type=str, help='Name of the model to load by default.')
group.add_argument('--lora', type=str, nargs='+', help='The list of LoRAs to load. If you want to load more than one LoRA, write the names separated by spaces.')
group.add_argument('--model-dir', type=str, default='models/', help='Path to directory with all the models.')
group.add_argument('--lora-dir', type=str, default='loras/', help='Path to directory with all the loras.')
group.add_argument('--model-menu', action='store_true', help='Show a model menu in the terminal when the web UI is first launched.')
group.add_argument('--settings', type=str, help='Load the default interface settings from this yaml file. See settings-template.yaml for an example. If you create a file called settings.yaml, this file will be loaded by default without the need to use the --settings flag.')
group.add_argument('--extensions', type=str, nargs='+', help='The list of extensions to load. If you want to load more than one extension, write the names separated by spaces.')
group.add_argument('--verbose', action='store_true', help='Print the prompts to the terminal.')
group.add_argument('--chat-buttons', action='store_true', help='Show buttons on the chat tab instead of a hover menu.')
# Model loader
group = parser.add_argument_group('Model loader')
group.add_argument('--loader', type=str, help='Choose the model loader manually, otherwise, it will get autodetected. Valid options: Transformers, llama.cpp, llamacpp_HF, ExLlamav2_HF, ExLlamav2, AutoGPTQ, AutoAWQ, GPTQ-for-LLaMa, ctransformers, QuIP#.')
# Transformers/Accelerate
group = parser.add_argument_group('Transformers/Accelerate')
group.add_argument('--cpu', action='store_true', help='Use the CPU to generate text. Warning: Training on CPU is extremely slow.')
group.add_argument('--auto-devices', action='store_true', help='Automatically split the model across the available GPU(s) and CPU.')
group.add_argument('--gpu-memory', type=str, nargs='+', help='Maximum GPU memory in GiB to be allocated per GPU. Example: --gpu-memory 10 for a single GPU, --gpu-memory 10 5 for two GPUs. You can also set values in MiB like --gpu-memory 3500MiB.')
group.add_argument('--cpu-memory', type=str, help='Maximum CPU memory in GiB to allocate for offloaded weights. Same as above.')
group.add_argument('--disk', action='store_true', help='If the model is too large for your GPU(s) and CPU combined, send the remaining layers to the disk.')
group.add_argument('--disk-cache-dir', type=str, default='cache', help='Directory to save the disk cache to. Defaults to "cache".')
group.add_argument('--load-in-8bit', action='store_true', help='Load the model with 8-bit precision (using bitsandbytes).')
group.add_argument('--bf16', action='store_true', help='Load the model with bfloat16 precision. Requires NVIDIA Ampere GPU.')
group.add_argument('--no-cache', action='store_true', help='Set use_cache to False while generating text. This reduces VRAM usage slightly, but it comes at a performance cost.')
group.add_argument('--trust-remote-code', action='store_true', help='Set trust_remote_code=True while loading the model. Necessary for some models.')
group.add_argument('--force-safetensors', action='store_true', help='Set use_safetensors=True while loading the model. This prevents arbitrary code execution.')
group.add_argument('--no_use_fast', action='store_true', help='Set use_fast=False while loading the tokenizer (it\'s True by default). Use this if you have any problems related to use_fast.')
group.add_argument('--use_flash_attention_2', action='store_true', help='Set use_flash_attention_2=True while loading the model.')
# bitsandbytes 4-bit
group = parser.add_argument_group('bitsandbytes 4-bit')
group.add_argument('--load-in-4bit', action='store_true', help='Load the model with 4-bit precision (using bitsandbytes).')
group.add_argument('--use_double_quant', action='store_true', help='use_double_quant for 4-bit.')
group.add_argument('--compute_dtype', type=str, default='float16', help='compute dtype for 4-bit. Valid options: bfloat16, float16, float32.')
group.add_argument('--quant_type', type=str, default='nf4', help='quant_type for 4-bit. Valid options: nf4, fp4.')
# llama.cpp
group = parser.add_argument_group('llama.cpp')
group.add_argument('--tensorcores', action='store_true', help='Use llama-cpp-python compiled with tensor cores support. This increases performance on RTX cards. NVIDIA only.')
group.add_argument('--n_ctx', type=int, default=2048, help='Size of the prompt context.')
group.add_argument('--threads', type=int, default=0, help='Number of threads to use.')
group.add_argument('--threads-batch', type=int, default=0, help='Number of threads to use for batches/prompt processing.')
group.add_argument('--no_mul_mat_q', action='store_true', help='Disable the mulmat kernels.')
group.add_argument('--n_batch', type=int, default=512, help='Maximum number of prompt tokens to batch together when calling llama_eval.')
group.add_argument('--no-mmap', action='store_true', help='Prevent mmap from being used.')
group.add_argument('--mlock', action='store_true', help='Force the system to keep the model in RAM.')
group.add_argument('--n-gpu-layers', type=int, default=0, help='Number of layers to offload to the GPU.')
group.add_argument('--tensor_split', type=str, default=None, help='Split the model across multiple GPUs. Comma-separated list of proportions. Example: 18,17.')
group.add_argument('--numa', action='store_true', help='Activate NUMA task allocation for llama.cpp.')
group.add_argument('--logits_all', action='store_true', help='Needs to be set for perplexity evaluation to work. Otherwise, ignore it, as it makes prompt processing slower.')
group.add_argument('--no_offload_kqv', action='store_true', help='Do not offload the K, Q, V to the GPU. This saves VRAM but reduces the performance.')
group.add_argument('--cache-capacity', type=str, help='Maximum cache capacity (llama-cpp-python). Examples: 2000MiB, 2GiB. When provided without units, bytes will be assumed.')
# ExLlama
group = parser.add_argument_group('ExLlama')
group.add_argument('--gpu-split', type=str, help='Comma-separated list of VRAM (in GB) to use per GPU device for model layers. Example: 20,7,7.')
group.add_argument('--max_seq_len', type=int, default=2048, help='Maximum sequence length.')
group.add_argument('--cfg-cache', action='store_true', help='ExLlamav2_HF: Create an additional cache for CFG negative prompts. Necessary to use CFG with that loader.')
group.add_argument('--no_flash_attn', action='store_true', help='Force flash-attention to not be used.')
group.add_argument('--cache_8bit', action='store_true', help='Use 8-bit cache to save VRAM.')
group.add_argument('--num_experts_per_token', type=int, default=2, help='Number of experts to use for generation. Applies to MoE models like Mixtral.')
# AutoGPTQ
group = parser.add_argument_group('AutoGPTQ')
group.add_argument('--triton', action='store_true', help='Use triton.')
group.add_argument('--no_inject_fused_attention', action='store_true', help='Disable the use of fused attention, which will use less VRAM at the cost of slower inference.')
group.add_argument('--no_inject_fused_mlp', action='store_true', help='Triton mode only: disable the use of fused MLP, which will use less VRAM at the cost of slower inference.')
group.add_argument('--no_use_cuda_fp16', action='store_true', help='This can make models faster on some systems.')
group.add_argument('--desc_act', action='store_true', help='For models that do not have a quantize_config.json, this parameter is used to define whether to set desc_act or not in BaseQuantizeConfig.')
group.add_argument('--disable_exllama', action='store_true', help='Disable ExLlama kernel, which can improve inference speed on some systems.')
group.add_argument('--disable_exllamav2', action='store_true', help='Disable ExLlamav2 kernel.')
# GPTQ-for-LLaMa
group = parser.add_argument_group('GPTQ-for-LLaMa')
group.add_argument('--wbits', type=int, default=0, help='Load a pre-quantized model with specified precision in bits. 2, 3, 4 and 8 are supported.')
group.add_argument('--model_type', type=str, help='Model type of pre-quantized model. Currently LLaMA, OPT, and GPT-J are supported.')
group.add_argument('--groupsize', type=int, default=-1, help='Group size.')
group.add_argument('--pre_layer', type=int, nargs='+', help='The number of layers to allocate to the GPU. Setting this parameter enables CPU offloading for 4-bit models. For multi-gpu, write the numbers separated by spaces, eg --pre_layer 30 60.')
group.add_argument('--checkpoint', type=str, help='The path to the quantized checkpoint file. If not specified, it will be automatically detected.')
group.add_argument('--monkey-patch', action='store_true', help='Apply the monkey patch for using LoRAs with quantized models.')
# BigDL-LLM
group = parser.add_argument_group('BigDL-LLM')
group.add_argument('--device', type=str, default='cpu', help='The device type; it can be CPU or GPU.')
group.add_argument('--load-in-4bit', action='store_true', default=False, help='Boolean value. True means loading linear weights to symmetric int 4 if '
                   'the model is a regular fp16/bf16/fp32 model, and to asymmetric int 4 if the model is a GPTQ model. Defaults to False.')
group.add_argument('--load-in-low-bit', type=str, default=None, help='str value; options are sym_int4, asym_int4, sym_int5, asym_int5, '
                   'sym_int8, nf3, nf4, fp4, fp8, fp8_e4m3, fp8_e5m2, fp16 or bf16. sym_int4 means symmetric int 4, asym_int4 means asymmetric int 4, '
                   'nf4 means 4-bit NormalFloat, etc. The relevant low-bit optimizations will be applied to the model.')
group.add_argument('--optimize-model', action='store_true', help='Boolean value; whether to further optimize the low-bit LLM model.')
group.add_argument('--modules-to-not-convert', type=str, default=None, help='List of str; modules (nn.Module) that are skipped when conducting model optimizations.')
group.add_argument('--cpu-embedding', action='store_true', help='Whether to replace the Embedding layer; may need to be set to `True` when running BigDL-LLM on GPU on Windows. Defaults to `False`.')
group.add_argument('--lightweight-bmm', action='store_true', help='Whether to replace the torch.bmm ops; may need to be set to `True` when running BigDL-LLM on GPU on Windows.')
group.add_argument('--use-cache', action='store_true', help='If use_cache is True, past key values are used to speed up decoding when applicable to the model.')
group.add_argument('--trust-remote-code', action='store_true', help='Set trust_remote_code=True while loading the model. Necessary for some models.')
# HQQ
group = parser.add_argument_group('HQQ')
group.add_argument('--hqq_backend', type=str, default='PYTORCH_COMPILE', help='Backend for the HQQ loader. Valid options: PYTORCH, PYTORCH_COMPILE, ATEN.')
# DeepSpeed
group = parser.add_argument_group('DeepSpeed')
group.add_argument('--deepspeed', action='store_true', help='Enable the use of DeepSpeed ZeRO-3 for inference via the Transformers integration.')
group.add_argument('--nvme-offload-dir', type=str, help='DeepSpeed: Directory to use for ZeRO-3 NVME offloading.')
group.add_argument('--local_rank', type=int, default=0, help='DeepSpeed: Optional argument for distributed setups.')
# RoPE
group = parser.add_argument_group('RoPE')
group.add_argument('--alpha_value', type=float, default=1, help='Positional embeddings alpha factor for NTK RoPE scaling. Use either this or compress_pos_emb, not both.')
group.add_argument('--rope_freq_base', type=int, default=0, help='If greater than 0, will be used instead of alpha_value. Those two are related by rope_freq_base = 10000 * alpha_value ^ (64 / 63).')
group.add_argument('--compress_pos_emb', type=int, default=1, help="Positional embeddings compression factor. Should be set to (context length) / (model\'s original context length). Equal to 1/rope_freq_scale.")
# Gradio
group = parser.add_argument_group('Gradio')
group.add_argument('--listen', action='store_true', help='Make the web UI reachable from your local network.')
group.add_argument('--listen-port', type=int, help='The listening port that the server will use.')
group.add_argument('--listen-host', type=str, help='The hostname that the server will use.')
group.add_argument('--share', action='store_true', help='Create a public URL. This is useful for running the web UI on Google Colab or similar.')
group.add_argument('--auto-launch', action='store_true', default=False, help='Open the web UI in the default browser upon launch.')
group.add_argument('--gradio-auth', type=str, help='Set Gradio authentication password in the format "username:password". Multiple credentials can also be supplied with "u1:p1,u2:p2,u3:p3".', default=None)
group.add_argument('--gradio-auth-path', type=str, help='Set the Gradio authentication file path. The file should contain one or more user:password pairs in the same format as above.', default=None)
group.add_argument('--ssl-keyfile', type=str, help='The path to the SSL certificate key file.', default=None)
group.add_argument('--ssl-certfile', type=str, help='The path to the SSL certificate cert file.', default=None)
# API
group = parser.add_argument_group('API')
group.add_argument('--api', action='store_true', help='Enable the API extension.')
group.add_argument('--public-api', action='store_true', help='Create a public URL for the API using Cloudflare.')
group.add_argument('--public-api-id', type=str, help='Tunnel ID for named Cloudflare Tunnel. Use together with public-api option.', default=None)
group.add_argument('--api-port', type=int, default=5000, help='The listening port for the API.')
group.add_argument('--api-key', type=str, default='', help='API authentication key.')
group.add_argument('--admin-key', type=str, default='', help='API authentication key for admin tasks like loading and unloading models. If not set, will be the same as --api-key.')
group.add_argument('--nowebui', action='store_true', help='Do not launch the Gradio UI. Useful for launching the API in standalone mode.')
# Multimodal
group = parser.add_argument_group('Multimodal')
group.add_argument('--multimodal-pipeline', type=str, default=None, help='The multimodal pipeline to use. Examples: llava-7b, llava-13b.')
# Deprecated parameters
# group = parser.add_argument_group('Deprecated')
args = parser.parse_args()
args_defaults = parser.parse_args([])
provided_arguments = []
for arg in sys.argv[1:]:
arg = arg.lstrip('-').replace('-', '_')
if hasattr(args, arg):
provided_arguments.append(arg)
deprecated_args = []
def do_cmd_flags_warnings():
# Deprecation warnings
for k in deprecated_args:
if getattr(args, k):
logger.warning(f'The --{k} flag has been deprecated and will be removed soon. Please remove that flag.')
# Security warnings
if args.trust_remote_code:
logger.warning('trust_remote_code is enabled. This is dangerous.')
if 'COLAB_GPU' not in os.environ and not args.nowebui:
if args.share:
logger.warning("The gradio \"share link\" feature uses a proprietary executable to create a reverse tunnel. Use it with care.")
if any((args.listen, args.share)) and not any((args.gradio_auth, args.gradio_auth_path)):
logger.warning("\nYou are potentially exposing the web UI to the entire internet without any access password.\nYou can create one with the \"--gradio-auth\" flag like this:\n\n--gradio-auth username:password\n\nMake sure to replace username:password with your own.")
if args.multi_user:
logger.warning('\nThe multi-user mode is highly experimental and should not be shared publicly.')
def fix_loader_name(name):
if not name:
return name
name = name.lower()
if name in ['llamacpp', 'llama.cpp', 'llama-cpp', 'llama cpp']:
return 'llama.cpp'
if name in ['llamacpp_hf', 'llama.cpp_hf', 'llama-cpp-hf', 'llamacpp-hf', 'llama.cpp-hf']:
return 'llamacpp_HF'
elif name in ['transformers', 'huggingface', 'hf', 'hugging_face', 'hugging face']:
return 'Transformers'
elif name in ['autogptq', 'auto-gptq', 'auto_gptq', 'auto gptq']:
return 'AutoGPTQ'
elif name in ['gptq-for-llama', 'gptqforllama', 'gptqllama', 'gptq for llama', 'gptq_for_llama']:
return 'GPTQ-for-LLaMa'
elif name in ['exllama', 'ex-llama', 'ex_llama', 'exlama']:
return 'ExLlama'
elif name in ['exllamav2', 'exllama-v2', 'ex_llama-v2', 'exlamav2', 'exlama-v2', 'exllama2', 'exllama-2']:
return 'ExLlamav2'
elif name in ['exllamav2-hf', 'exllamav2_hf', 'exllama-v2-hf', 'exllama_v2_hf', 'exllama-v2_hf', 'exllama2-hf', 'exllama2_hf', 'exllama-2-hf', 'exllama_2_hf', 'exllama-2_hf']:
return 'ExLlamav2_HF'
elif name in ['ctransformers', 'ctranforemrs', 'ctransformer']:
return 'ctransformers'
elif name in ['autoawq', 'awq', 'auto-awq']:
return 'AutoAWQ'
elif name in ['quip#', 'quip-sharp', 'quipsharp', 'quip_sharp']:
return 'QuIP#'
elif name in ['hqq']:
return 'HQQ'
elif name in ['BigDL-LLM', 'bigdl-llm', 'bigdl']:
return 'BigDL-LLM'
def add_extension(name, last=False):
if args.extensions is None:
args.extensions = [name]
elif last:
args.extensions = [x for x in args.extensions if x != name]
args.extensions.append(name)
elif name not in args.extensions:
args.extensions.append(name)
def is_chat():
return True
args.loader = fix_loader_name(args.loader)
# Activate the multimodal extension
if args.multimodal_pipeline is not None:
add_extension('multimodal')
# Activate the API extension
if args.api or args.public_api:
add_extension('openai', last=True)
# Load model-specific settings
with Path(f'{args.model_dir}/config.yaml') as p:
if p.exists():
model_config = yaml.safe_load(open(p, 'r').read())
else:
model_config = {}
# Load custom model-specific settings
with Path(f'{args.model_dir}/config-user.yaml') as p:
if p.exists():
user_config = yaml.safe_load(open(p, 'r').read())
else:
user_config = {}
model_config = OrderedDict(model_config)
user_config = OrderedDict(user_config)
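
The BigDL-LLM argument group above is ultimately consumed by the model loader. As a rough sketch (not the WebUI's actual loading code) of how those flags typically map onto BigDL-LLM's transformers-style API; the model path is a placeholder and the keyword names follow the BigDL-LLM documentation:

```python
# Sketch: loading a Hugging Face model with BigDL-LLM low-bit optimizations,
# roughly mirroring --load-in-low-bit / --optimize-model / --trust-remote-code / --use-cache.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = 'models/Llama-2-7b-chat-hf'   # placeholder folder under --model-dir

model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_low_bit='sym_int4',   # or load_in_4bit=True
                                             optimize_model=True,
                                             trust_remote_code=True,
                                             use_cache=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# With --device GPU on an Intel GPU, the optimized model is then moved with model.to('xpu').
```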
