[LLM] Add Text_Generation_WebUI Support (#9884)

* initially add text_generation_webui support

* add env requirements install

* add necessary dependencies

* update for starting webui

* update shared and noted to place models

* update heading of part3

* meet comments

* add copyright license

* remove extensions

* convert tutorial to windows side

* add warm-up to optimize performance
Authored by SONG Ge on 2024-01-26 15:12:49 +08:00, committed by GitHub
parent f0da0c131b
commit 421e7cee80
137 changed files with 12365 additions and 0 deletions

@@ -0,0 +1,92 @@
This tutorial provides a step-by-step guide on how to use Text-Generation-WebUI to run Hugging Face transformers-based applications on BigDL-LLM.
The WebUI is ported from [Text-Generation-WebUI](https://github.com/oobabooga/text-generation-webui).
## 1. Prepare the environment on Windows
Please use a Python environment management tool (we recommend Conda) to create a Python environment and install the necessary libraries.
### 1.1 Install BigDL-LLM
Please see [BigDL-LLM Installation on Windows](https://bigdl.readthedocs.io/en/latest/doc/LLM/Overview/install_gpu.html#windows) for details on installing BigDL-LLM on your client machine.
### 1.2 Install other required dependencies
```bash
pip install -r requirements.txt
```
Note: Text-Generation-WebUI requires transformers version >= 4.36.0
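To quickly confirm that your environment satisfies this requirement, you can run a small check like the following (a minimal sketch; `packaging` ships as a dependency of `transformers`):
```python
# Verify that the installed transformers version meets the WebUI requirement (>= 4.36.0)
from packaging import version
import transformers

assert version.parse(transformers.__version__) >= version.parse("4.36.0"), \
    f"transformers {transformers.__version__} is too old; please upgrade to >= 4.36.0"
print(f"transformers {transformers.__version__} OK")
```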
## 2. Start the WebUI Server
### 2.1 For INT4 Optimizations
For a quick start, you may run the script below to start the WebUI directly; it will automatically optimize and accelerate LLMs using INT4 optimizations.
```bash
python server.py
```
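For reference, this default INT4 path corresponds to loading the model through BigDL-LLM's `transformers`-style API with `load_in_4bit=True`. The following is a minimal standalone sketch (the model path and prompt are placeholders, not files from this example):
```python
# Sketch: INT4-optimized loading with BigDL-LLM outside the WebUI (paths are placeholders)
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = "path/to/your/model"  # e.g. a folder under Text-Generation-WebUI/models

# load_in_4bit=True applies the same INT4 optimizations used by the default WebUI startup
model = AutoModelForCausalLM.from_pretrained(model_path, load_in_4bit=True, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

inputs = tokenizer("What is AI?", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```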
### 2.2 Optimizations for Other Precisions
To enable optimizations for other precisions (`sym_int4`, `asym_int4`, `sym_int8`, `fp4`, `fp8`, `fp16`, `mixed_fp4`, `mixed_fp8`, etc.), you may run the command below:
```bash
python server.py --load-in-low-bit
```
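Under the hood, these precisions map to the `load_in_low_bit` argument of BigDL-LLM's `from_pretrained`; a short sketch (the precision value is chosen only for illustration):
```python
# Sketch: loading a model with a specific low-bit precision via BigDL-LLM
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/your/model",        # placeholder model path
    load_in_low_bit="sym_int8",  # any of the precisions listed above
    trust_remote_code=True,
)
```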
### 2.3 Access the WebUI
After the WebUI server starts successfully, it will print links for accessing the WebUI, as shown below. Please open the public URL in your browser to access the full functionality of the WebUI.
```bash
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://your_tokens_here.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
```
## 3. Run Models
### 3.1 Select the Model
First, place your local model in the `Text-Generation-WebUI/models` directory; you may also choose to download a model from Hugging Face (see the sketch at the end of this subsection).
Next, please click the `Model` button to choose your model.
![Image text](https://github.com/intel-analytics/BigDL/blob/1df67d7927ebea0af570b09f36ce76efbf9b8bad/python/llm/example/Text-Generation-WebUI/readme_folder/image.png)
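If you prefer to download a model programmatically rather than copy it manually, a hedged sketch using `huggingface_hub` is shown below (the repo id and target folder are examples only; gated models may additionally require `huggingface-cli login`):
```python
# Sketch: download a Hugging Face model into the WebUI models directory
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Qwen/Qwen-7B-Chat",  # example repo id; replace with the model you want
    local_dir="Text-Generation-WebUI/models/Qwen-7B-Chat",  # folder name shown in the Model dropdown
)
```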
### 3.2 Enable BigDL-LLM Optimizations
Text-Generation-WebUI supports multiple backends, including `Transformers`, `llama.cpp`, `BigDL-LLM`, etc. Please select the `BigDL-LLM` backend as shown below to enable low-bit optimizations.
![Image text](https://github.com/intel-analytics/BigDL/blob/1df67d7927ebea0af570b09f36ce76efbf9b8bad/python/llm/example/Text-Generation-WebUI/readme_folder/image-1.png)
Then please select the device option that matches your hardware.
![Image text](https://github.com/intel-analytics/BigDL/blob/1df67d7927ebea0af570b09f36ce76efbf9b8bad/python/llm/example/Text-Generation-WebUI/readme_folder/image-2.png)
### 3.3 Load Model in Low Precision
One common use case of BigDL-LLM is to load a Hugging Face transformers model in low precision (an API-level sketch of what this corresponds to is given at the end of this subsection).
Notes:
- When you start the web UI with `--load-in-4bit`, you will not be allowed to choose the quantization precision in `load-in-low-bit`. The model will be loaded in INT4 precision by default.
- To load the model in other precisions, please run `server.py` with the `--load-in-low-bit` parameter. You may then choose the precision from the `load-in-low-bit` list, and the `load-in-4bit` option will be disabled.
- Please select the `optimize-model` and `use_cache` options to accelerate the model.
Now you may click the `Load` button to load the model with BigDL-LLM optimizations.
![Image text](https://github.com/intel-analytics/BigDL/blob/1df67d7927ebea0af570b09f36ce76efbf9b8bad/python/llm/example/Text-Generation-WebUI/readme_folder/image-3.png)
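For reference, loading a model this way roughly corresponds to the following BigDL-LLM call (a sketch under the assumption that the WebUI forwards these options to `from_pretrained`; values are illustrative):
```python
# Sketch: the BigDL-LLM call that the low-precision `Load` step roughly corresponds to
from bigdl.llm.transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "path/to/your/model",        # placeholder
    load_in_low_bit="sym_int4",  # or the precision selected in the UI
    optimize_model=True,         # the `optimize-model` option
    use_cache=True,              # the `use_cache` option (enables the KV cache during generation)
    trust_remote_code=True,
)
```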
### 3.4 Run the Model on WebUI
Now you can do model inference on Text-Generation-WebUI with BigDL-LLM optimizations, including `Chat`, `Default` and `Notebook` Tabs. Please see [Chat-Tab Wiki](https://github.com/oobabooga/text-generation-webui/wiki/01-%E2%80%90-Chat-Tab) and [Default and Notebook Tabs Wiki](https://github.com/oobabooga/text-generation-webui/wiki/02-%E2%80%90-Default-and-Notebook-Tabs) for more details.
![Image text](https://github.com/intel-analytics/BigDL/blob/1df67d7927ebea0af570b09f36ce76efbf9b8bad/python/llm/example/Text-Generation-WebUI/readme_folder/image-4.png)

@@ -0,0 +1,4 @@
name: AI
greeting: How can I help you today?
context: |
  The following is a conversation with an AI Large Language Model. The AI has been trained to answer questions, provide recommendations, and help with decision making. The AI follows user requests. The AI thinks outside the box.

Binary file not shown.

@@ -0,0 +1,17 @@
name: Chiharu Yamada
greeting: |-
  *Chiharu strides into the room with a smile, her eyes lighting up when she sees you. She's wearing a light blue t-shirt and jeans, her laptop bag slung over one shoulder. She takes a seat next to you, her enthusiasm palpable in the air*
  Hey! I'm so excited to finally meet you. I've heard so many great things about you and I'm eager to pick your brain about computers. I'm sure you have a wealth of knowledge that I can learn from. *She grins, eyes twinkling with excitement* Let's get started!
context: |-
  Chiharu Yamada's Persona: Chiharu Yamada is a young, computer engineer-nerd with a knack for problem solving and a passion for technology.
  {{user}}: So how did you get into computer engineering?
  {{char}}: I've always loved tinkering with technology since I was a kid.
  {{user}}: That's really impressive!
  {{char}}: *She chuckles bashfully* Thanks!
  {{user}}: So what do you do when you're not working on computers?
  {{char}}: I love exploring, going out with friends, watching movies, and playing video games.
  {{user}}: What's your favorite type of computer hardware to work with?
  {{char}}: Motherboards, they're like puzzles and the backbone of any system.
  {{user}}: That sounds great!
  {{char}}: Yeah, it's really fun. I'm lucky to be able to do this as a job.

@@ -0,0 +1,166 @@
/*
Copied from https://github.com/SillyTavern/SillyTavern/tree/6c8bd06308c69d51e2eb174541792a870a83d2d6/public/webfonts/NotoSans
*/
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Black.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Black.woff') format('woff');
font-weight: 900;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-ExtraBoldItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-ExtraBoldItalic.woff') format('woff');
font-weight: bold;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-BlackItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-BlackItalic.woff') format('woff');
font-weight: 900;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-ExtraBold.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-ExtraBold.woff') format('woff');
font-weight: bold;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-ThinItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-ThinItalic.woff') format('woff');
font-weight: 100;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-BoldItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-BoldItalic.woff') format('woff');
font-weight: bold;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Bold.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Bold.woff') format('woff');
font-weight: bold;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-LightItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-LightItalic.woff') format('woff');
font-weight: 300;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Italic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Italic.woff') format('woff');
font-weight: normal;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-ExtraLightItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-ExtraLightItalic.woff') format('woff');
font-weight: 200;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Light.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Light.woff') format('woff');
font-weight: 300;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-ExtraLight.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-ExtraLight.woff') format('woff');
font-weight: 200;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Medium.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Medium.woff') format('woff');
font-weight: 500;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Regular.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Regular.woff') format('woff');
font-weight: normal;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-MediumItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-MediumItalic.woff') format('woff');
font-weight: 500;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-SemiBoldItalic.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-SemiBoldItalic.woff') format('woff');
font-weight: 600;
font-style: italic;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-SemiBold.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-SemiBold.woff') format('woff');
font-weight: 600;
font-style: normal;
font-display: swap;
}
@font-face {
font-family: 'Noto Sans';
src: url('file/css/NotoSans/NotoSans-Thin.woff2') format('woff2'),
url('file/css/NotoSans/NotoSans-Thin.woff') format('woff');
font-weight: 100;
font-style: normal;
font-display: swap;
}

@@ -0,0 +1,133 @@
/* All credits to TheEncrypted777: https://www.reddit.com/r/Oobabooga/comments/12xe6vq/updated_css_styling_with_color_customization_for/ */
.message {
display: grid;
grid-template-columns: 60px minmax(0, 1fr);
padding-bottom: 28px;
font-size: 18px;
font-family: 'Noto Sans', Arial, sans-serif;
line-height: 1.428571429;
}
.circle-you,
.circle-bot {
background-color: gray;
border-radius: 1rem;
border: 2px solid white;
}
.circle-bot img,
.circle-you img {
border-radius: 10%;
width: 100%;
height: 100%;
object-fit: cover;
}
.circle-you, .circle-bot {
/* You can set the size of the profile images here, but if you do, you have to also adjust the .text{padding-left: 90px} to a different number according to the width of the image which is right below here */
width: 135px;
height: 175px;
}
.text {
/* Change this to move the message box further left or right depending on the size of your profile pic */
padding-left: 90px;
text-shadow: 2px 2px 2px rgb(0 0 0 / 40%);
}
.text p {
margin-top: 2px;
}
.username {
padding-left: 10px;
font-size: 22px;
font-weight: bold;
border-top: 1px solid rgb(51 64 90);
padding: 3px;
}
.message-body {
position: relative;
border: 1px solid rgb(255 255 255 / 45.9%);
border-radius: 10px;
padding: 10px;
padding-top: 5px;
/* Message gradient background color - remove the line bellow if you don't want a background color or gradient */
background: linear-gradient(to bottom, #171730, #1b263f);
}
/* Adds 2 extra lines at the top and bottom of the message */
.message-body::before,
.message-body::after {
content: "";
position: absolute;
left: 10px;
right: 10px;
height: 1px;
background-color: rgb(255 255 255 / 13%);
}
.message-body::before {
top: 6px;
}
.message-body::after {
bottom: 6px;
}
.message-body img {
max-width: 300px;
max-height: 300px;
border-radius: 20px;
}
.message-body p {
margin-bottom: 0 !important;
font-size: 18px !important;
line-height: 1.428571429 !important;
color: rgb(243 244 246) !important;
text-shadow: 2px 2px 2px rgb(0 0 0);
}
.message-body p em {
color: rgb(138 138 138) !important;
}
@media screen and (width <= 688px) {
.message {
display: grid;
grid-template-columns: 60px minmax(0, 1fr);
padding-bottom: 25px;
font-size: 15px;
font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
line-height: 1.428571429;
}
.circle-you, .circle-bot {
width: 50px;
height: 73px;
border-radius: 0.5rem;
}
.circle-bot img,
.circle-you img {
width: 100%;
height: 100%;
object-fit: cover;
}
.text {
padding-left: 0;
}
.message-body p {
font-size: 16px !important;
}
.username {
font-size: 20px;
}
}

@@ -0,0 +1,21 @@
@import url("file/css/chat_style-cai-chat.css");
.circle-bot, .circle-you {
height: 90px;
width: 60px;
border-radius: 10px;
background-color: #656565;
}
.circle-bot img, .circle-you img {
border-radius: 8.333px;
}
.circle-you {
background-color: #656565;
}
.message {
padding-bottom: 30px;
grid-template-columns: 70px minmax(0, 1fr);
}

@@ -0,0 +1,62 @@
.message {
display: grid;
grid-template-columns: 60px minmax(0, 1fr);
padding-bottom: 15px;
font-size: 15px;
font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
line-height: 22.5px !important;
}
.message-body {
margin-top: 3px;
}
.circle-you {
width: 50px;
height: 50px;
background-color: rgb(238 78 59);
border-radius: 50%;
}
.circle-bot {
width: 50px;
height: 50px;
background-color: rgb(59 78 244);
border-radius: 50%;
}
.circle-bot img,
.circle-you img {
border-radius: 50%;
width: 100%;
height: 100%;
object-fit: cover;
}
.username {
font-weight: bold;
}
.message-body img {
max-width: 300px;
max-height: 300px;
border-radius: 20px;
}
.message-body p {
font-size: 15px !important;
line-height: 22.5px !important;
}
.message-body p, .chat .message-body ul, .chat .message-body ol {
margin-bottom: 10px !important;
}
.dark .message-body p em {
color: rgb(138 138 138) !important;
}
.message-body p em {
color: rgb(110 110 110) !important;
font-weight: 500;
}

@@ -0,0 +1,99 @@
.message {
padding-bottom: 25px;
font-size: 15px;
font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
line-height: 1.428571429;
}
.circle-you {
width: 50px;
height: 50px;
background-color: rgb(238 78 59);
border-radius: 50%;
}
.circle-bot {
width: 50px;
height: 50px;
background-color: rgb(59 78 244);
border-radius: 50%;
float: left;
margin-right: 10px;
margin-top: 5px;
}
.circle-bot img,
.circle-you img {
border-radius: 50%;
width: 100%;
height: 100%;
object-fit: cover;
}
.circle-you {
margin-top: 5px;
float: right;
}
.circle-bot + .text, .circle-you + .text {
border-radius: 18px;
padding: 8px 12px;
}
.circle-bot + .text {
background-color: #E4E6EB;
float: left;
}
.circle-you + .text {
float: right;
background-color: rgb(0 132 255);
margin-right: 10px;
}
.circle-you + .text div, .circle-you + .text *, .dark .circle-you + .text div, .dark .circle-you + .text * {
color: #FFF !important;
}
.circle-you + .text .username {
text-align: right;
}
.dark .circle-bot + .text div, .dark .circle-bot + .text * {
color: #000;
}
.text {
max-width: 80%;
}
.text p {
margin-top: 5px;
}
.username {
font-weight: bold;
}
.message-body {
}
.message-body img {
max-width: 300px;
max-height: 300px;
border-radius: 20px;
}
.message-body p {
margin-bottom: 0 !important;
font-size: 15px !important;
line-height: 1.428571429 !important;
}
.dark .message-body p em {
color: rgb(138 138 138) !important;
}
.message-body p em {
color: rgb(110 110 110) !important;
}

@@ -0,0 +1,55 @@
.message {
padding-bottom: 25px;
font-size: 15px;
font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
line-height: 1.428571429;
}
.text-you {
background-color: #d9fdd3;
border-radius: 15px;
padding: 10px;
padding-top: 5px;
float: right;
}
.text-bot {
background-color: #f2f2f2;
border-radius: 15px;
padding: 10px;
padding-top: 5px;
}
.dark .text-you {
background-color: #005c4b;
color: #111b21;
}
.dark .text-bot {
background-color: #1f2937;
color: #111b21;
}
.text-bot p, .text-you p {
margin-top: 5px;
}
.message-body img {
max-width: 300px;
max-height: 300px;
border-radius: 20px;
}
.message-body p {
margin-bottom: 0 !important;
font-size: 15px !important;
line-height: 1.428571429 !important;
}
.dark .message-body p em {
color: rgb(138 138 138) !important;
}
.message-body p em {
color: rgb(110 110 110) !important;
}

@@ -0,0 +1,73 @@
#parent #container {
background-color: #eef2ff;
padding: 17px;
}
#parent #container .reply {
background-color: rgb(214 218 240);
border-bottom: 1px solid rgb(183 197 217);
border-image: none 100% 1 0 stretch;
border-left: 0 none rgb(0 0 0);
border-right: 1px solid rgb(183 197 217);
color: rgb(0 0 0);
display: table;
font-family: arial, helvetica, sans-serif;
font-size: 13.3333px;
margin: 4px 0;
overflow: hidden hidden;
padding: 4px 2px;
}
#parent #container .number {
color: rgb(0 0 0);
font-family: arial, helvetica, sans-serif;
font-size: 13.3333px;
width: 342.65px;
margin-right: 7px;
}
#parent #container .op {
color: rgb(0 0 0);
font-family: arial, helvetica, sans-serif;
font-size: 13.3333px;
margin: 4px 0 8px;
overflow: hidden hidden;
}
#parent #container .op blockquote {
margin-left: 0 !important;
}
#parent #container .name {
color: rgb(17 119 67);
font-family: arial, helvetica, sans-serif;
font-size: 13.3333px;
font-weight: 700;
margin-left: 7px;
}
#parent #container .quote {
color: rgb(221 0 0);
font-family: arial, helvetica, sans-serif;
font-size: 13.3333px;
text-decoration: underline solid rgb(221 0 0);
text-decoration-thickness: auto;
}
#parent #container .greentext {
color: rgb(120 153 34);
font-family: arial, helvetica, sans-serif;
font-size: 13.3333px;
}
#parent #container blockquote {
margin: 0 !important;
margin-block: 1em 1em;
margin-inline: 40px 40px;
margin: 13.33px 40px !important;
}
#parent #container .message_4chan {
color: black;
border: none;
}

@@ -0,0 +1,82 @@
.chat {
background: var(--block-background-fill);
padding: 24px 19px;
padding-right: 19px !important;
padding-top: 0;
border: 1px solid var(--block-border-color);
}
.chat > .messages {
padding-top: 28px !important;
}
.message {
display: grid;
grid-template-columns: 60px 1fr;
padding-bottom: 25px;
font-size: 15px;
font-family: 'Noto Sans', Helvetica, Arial, sans-serif;
line-height: 24px;
}
.message:first-child {
padding-top: 0;
}
.username {
display: none;
}
.message-body p, .message-body li {
font-size: 15px !important;
line-height: 24px !important;
}
.message-body p, .chat .message-body ul, .chat .message-body ol {
margin-bottom: 16px !important;
}
.message-body p:last-child, .chat .message-body ul:last-child, .chat .message-body ol:last-child {
margin-bottom: 0 !important;
}
.dark .message-body p em {
color: rgb(198 202 214) !important;
}
.message-body p em {
color: rgb(110 110 110) !important;
}
.gradio-container .chat .assistant-message {
padding: 20px;
background: var(--background-fill-secondary);
margin-top: 12px !important;
margin-bottom: 24px !important;
margin-right: 16px;
border-radius: 22px;
border-bottom-left-radius: 0;
border: 1px solid var(--border-color-primary);
}
.gradio-container .chat .user-message {
padding: 20px;
background-color: var(--color-accent-soft);
margin-bottom: 12px !important;
margin-left: 16px;
border-radius: 22px;
border-bottom-right-radius: 0;
border: 1px solid var(--border-color-accent-subdued);
}
.gradio-container .chat .assistant-message:last-child, .gradio-container .chat .user-message:last-child {
margin-bottom: 0 !important;
}
code {
background-color: #f3f4f6 !important;
}
.dark code {
background-color: #1f2937 !important;
}

@@ -0,0 +1,33 @@
.readable-container {
max-width: 600px;
margin-left: auto;
margin-right: auto;
background-color: rgb(31 41 55);
padding: 3em;
word-break: break-word;
overflow-wrap: anywhere;
color: #efefef !important;
}
.readable-container p, .readable-container li {
font-size: 16px !important;
color: #efefef !important;
margin-bottom: 22px;
line-height: 1.4 !important;
}
.readable-container li > p {
display: inline !important;
}
.readable-container code {
overflow-x: auto;
}
.readable-container :not(pre) > code {
white-space: normal !important;
}
.readable-container .hoverable {
font-size: 14px;
}

@@ -0,0 +1,688 @@
.tabs.svelte-710i53 {
margin-top: 0
}
.py-6 {
padding-top: 2.5rem
}
.small-button {
min-width: 0 !important;
max-width: 171px;
height: 39.594px;
align-self: end;
}
.refresh-button {
max-width: 4.4em;
min-width: 2.2em !important;
height: 39.594px;
align-self: end;
line-height: 1em;
border-radius: 0.5em;
flex: none;
}
.refresh-button-small {
max-width: 2.2em;
}
.button_nowrap {
white-space: nowrap;
}
#slim-column {
flex: none !important;
min-width: 0 !important;
}
.slim-dropdown {
background-color: transparent !important;
border: none !important;
padding: 0 !important;
}
#download-label, #upload-label {
min-height: 0
}
.dark svg {
fill: white;
}
.dark a {
color: white !important;
}
ol li p, ul li p {
display: inline-block;
}
#chat-tab, #default-tab, #notebook-tab, #parameters, #chat-settings, #lora, #training-tab, #model-tab, #session-tab {
border: 0;
}
.gradio-container-3-18-0 .prose * h1, h2, h3, h4 {
color: white;
}
.gradio-container {
max-width: 100% !important;
padding-top: 0 !important;
}
#extensions {
margin-top: 5px;
margin-bottom: 35px;
}
.extension-tab {
border: 0 !important;
}
span.math.inline {
font-size: 27px;
vertical-align: baseline !important;
}
div.svelte-15lo0d8 > *, div.svelte-15lo0d8 > .form > * {
flex-wrap: nowrap;
}
.header_bar {
background-color: #f7f7f7;
box-shadow: 0 2px 3px rgba(22 22 22 / 35%);
margin-bottom: 0;
overflow-x: scroll;
margin-left: calc(-1 * var(--size-4));
margin-right: calc(-1 * var(--size-4));
display: block !important;
text-wrap: nowrap;
z-index: 90;
}
.dark .header_bar {
border: none !important;
box-shadow: 0 3px 4px rgba(20 20 20 / 60%);
background-color: #8080802b;
}
.header_bar button.selected {
border-radius: 0;
}
.textbox_default textarea {
height: calc(100dvh - 271px);
}
.textbox_default_output textarea {
height: calc(100dvh - 185px);
}
.textbox textarea {
height: calc(100dvh - 241px);
}
.textbox_logits textarea {
height: calc(100dvh - 236px);
}
.textbox_logits_notebook textarea {
height: calc(100dvh - 292px);
}
.monospace textarea {
font-family: monospace;
}
.textbox_default textarea,
.textbox_default_output textarea,
.textbox_logits textarea,
.textbox_logits_notebook textarea,
.textbox textarea {
font-size: 16px !important;
color: #46464A !important;
}
.dark textarea {
color: #efefef !important;
}
@media screen and (width <= 711px) {
.textbox_default textarea {
height: calc(100dvh - 259px);
}
div .default-token-counter {
top: calc( 0.5 * (100dvh - 236px) ) !important;
}
.transparent-substring {
display: none;
}
.hover-menu {
min-width: 250px !important;
}
}
/* Hide the gradio footer */
footer {
display: none !important;
}
button {
font-size: 14px !important;
}
.file-saver {
position: fixed !important;
height: 100%;
z-index: 1000;
background-color: rgb(0 0 0 / 50%) !important;
margin-left: -20px;
margin-right: -20px;
}
.file-saver > :first-child {
position: fixed !important;
top: 50%;
left: 50%;
transform: translate(-50%, -50%); /* center horizontally */
width: 100%;
max-width: 500px;
background-color: var(--input-background-fill);
border: var(--input-border-width) solid var(--input-border-color) !important;
}
.file-saver > :first-child > :nth-child(2) {
background: var(--block-background-fill);
}
.checkboxgroup-table label {
background: none !important;
padding: 0 !important;
border: 0 !important;
}
.checkboxgroup-table div {
display: grid !important;
}
.markdown ul ol {
font-size: 100% !important;
}
.pretty_scrollbar::-webkit-scrollbar {
width: 5px;
}
.pretty_scrollbar::-webkit-scrollbar-track {
background: transparent;
}
.pretty_scrollbar::-webkit-scrollbar-thumb,
.pretty_scrollbar::-webkit-scrollbar-thumb:hover {
background: #c5c5d2;
}
.dark .pretty_scrollbar::-webkit-scrollbar-thumb,
.dark .pretty_scrollbar::-webkit-scrollbar-thumb:hover {
background: #374151;
}
.pretty_scrollbar::-webkit-resizer {
background: #c5c5d2;
}
.dark .pretty_scrollbar::-webkit-resizer {
background: #374151;
}
audio {
max-width: 100%;
}
/* Copied from https://github.com/AUTOMATIC1111/stable-diffusion-webui */
.token-counter {
position: absolute !important;
top: calc( 0.5 * (100dvh - 218px) ) !important;
right: 2px;
z-index: 100;
background: var(--input-background-fill) !important;
min-height: 0 !important;
}
.default-token-counter {
top: calc( 0.5 * (100dvh - 248px) ) !important;
}
.token-counter span {
padding: 1px;
box-shadow: 0 0 0 0.3em rgb(192 192 192 / 15%), inset 0 0 0.6em rgb(192 192 192 / 7.5%);
border: 2px solid rgb(192 192 192 / 40%) !important;
border-radius: 0.4em;
}
.no-background {
background: var(--background-fill-primary) !important;
padding: 0 !important;
}
/* ----------------------------------------------
Chat tab
---------------------------------------------- */
.h-\[40vh\], .wrap.svelte-byatnx.svelte-byatnx.svelte-byatnx {
height: 66.67vh
}
.gradio-container {
margin-left: auto !important;
margin-right: auto !important;
}
.w-screen {
width: unset
}
div.svelte-362y77>*, div.svelte-362y77>.form>* {
flex-wrap: nowrap
}
.pending.svelte-1ed2p3z {
opacity: 1;
}
.wrap.svelte-6roggh.svelte-6roggh {
max-height: 92.5%;
}
/* This is for the microphone button in the whisper extension */
.sm.svelte-1ipelgc {
width: 100%;
}
#chat-tab {
padding-top: 0;
}
#chat-tab button#Generate, #chat-tab button#stop {
width: 89.3438px !important;
}
#chat-tab button, #notebook-tab button, #default-tab button {
min-width: 0 !important;
}
#chat-tab > :first-child, #extensions {
max-width: 880px;
margin-left: auto;
margin-right: auto;
}
@media screen and (width <= 688px) {
#chat-tab {
padding-left: 0;
padding-right: 0;
}
}
.chat {
margin-left: auto;
margin-right: auto;
max-width: 880px;
min-height: var(--chat-height);
overflow-y: auto;
padding-right: 15px;
display: flex;
flex-direction: column;
word-break: break-word;
overflow-wrap: anywhere;
border-top: none;
border-radius: 0 0 0 8px;
}
.chat-parent {
height: calc(100dvh - 98px - var(--header-height) - var(--input-delta));
overflow: auto !important;
border-radius: 0 !important;
margin-bottom: var(--input-delta) !important;
}
.old-ui .chat-parent {
height: calc(100dvh - 192px - var(--header-height) - var(--input-delta));
margin-bottom: var(--input-delta) !important;
}
.chat-parent.bigchat {
height: calc(100dvh - 98px - var(--header-height) - var(--input-delta)) !important;
margin-bottom: var(--input-delta) !important;
}
.chat > .messages {
display: flex;
flex-direction: column;
padding-top: 25px;
}
.chat .message:last-child {
margin-bottom: 0 !important;
padding-bottom: 0 !important;
}
.message-body li {
list-style-position: outside;
}
.chat .message-body ul, .chat .message-body ol {
padding-inline-start: 2em;
}
.message-body li:not(:last-child) {
margin-top: 0 !important;
margin-bottom: 2px !important;
}
.message-body li:last-child {
margin-bottom: 0 !important;
}
.message-body li > p {
display: inline !important;
}
.message-body ul, .message-body ol {
font-size: 15px !important;
}
.message-body ul {
list-style-type: disc !important;
}
.message-body pre:not(:last-child) {
margin-bottom: 35.625px !important;
}
.message-body pre:last-child {
margin-bottom: 0 !important;
}
.message-body code {
white-space: pre-wrap !important;
word-wrap: break-word !important;
border: 1px solid var(--border-color-primary);
border-radius: var(--radius-sm);
background: var(--background-fill-secondary);
font-size: 90%;
padding: 1px 3px;
}
.message-body pre > code {
display: block;
padding: 15px;
}
.message-body :not(pre) > code {
white-space: normal !important;
}
#chat-input {
padding: 0;
padding-top: 18px;
background: transparent;
border: none;
}
#chat-input textarea:focus {
box-shadow: none !important;
}
#chat-input > :first-child {
background-color: transparent;
}
#chat-input .progress-text {
display: none;
}
@media print {
body {
visibility: hidden;
}
.chat {
visibility: visible;
position: absolute;
left: 0;
top: 0;
max-width: unset;
max-height: unset;
width: 100%;
overflow-y: visible;
}
.message {
break-inside: avoid;
}
.gradio-container {
overflow: visible;
}
.tab-nav {
display: none !important;
}
#chat-tab > :first-child {
max-width: unset;
}
}
#show-controls {
position: absolute;
height: 100%;
background-color: var(--background-fill-primary);
border: 0 !important;
border-radius: 0;
}
#show-controls label {
z-index: 1000;
position: absolute;
left: calc(100% - 168px);
}
#show-controls span {
opacity: 0.6;
}
#typing-container {
display: none;
position: absolute;
background-color: transparent;
left: -2px;
padding: var(--block-padding);
}
.typing {
position: relative;
}
.visible-dots #typing-container {
display: block;
}
.typing span {
content: '';
animation: blink 1.5s infinite;
animation-fill-mode: both;
height: 10px;
width: 10px;
background: #3b5998;
position: absolute;
left:0;
top:0;
border-radius: 50%;
}
.typing .dot1 {
animation-delay: .2s;
margin-left: calc(10px * 1.5);
}
.typing .dot2 {
animation-delay: .4s;
margin-left: calc(10px * 3);
}
@keyframes blink {
0% {
opacity: .1;
}
20% {
opacity: 1;
}
100% {
opacity: .1;
}
}
#chat-tab .generating {
display: none !important;
}
.hover-element {
position: relative;
font-size: 24px;
}
.hover-menu {
display: none;
position: absolute;
bottom: 80%;
left: 0;
background-color: var(--background-fill-secondary);
box-shadow: 0 0 10px rgb(0 0 0 / 50%);
z-index: 10000;
min-width: 330px;
flex-direction: column;
}
.hover-menu button {
width: 100%;
background: transparent !important;
border-radius: 0 !important;
justify-content: space-between;
margin: 0 !important;
height: 36px;
}
.hover-menu button:not(#clear-history-confirm) {
border-bottom: 0 !important;
}
.hover-menu button:not(#clear-history-confirm):last-child {
border-bottom: var(--button-border-width) solid var(--button-secondary-border-color) !important;
}
.hover-menu button:hover {
background: var(--button-secondary-background-fill-hover) !important;
}
.transparent-substring {
opacity: 0.333;
}
#chat-tab:not(.old-ui) #chat-buttons {
display: none !important;
}
#gr-hover-container {
min-width: 0 !important;
display: flex;
flex-direction: column-reverse;
padding-right: 20px;
padding-bottom: 3px;
flex-grow: 0 !important;
}
#generate-stop-container {
min-width: 0 !important;
display: flex;
flex-direction: column-reverse;
padding-bottom: 3px;
flex: 0 auto !important;
}
#chat-input-container {
min-width: 0 !important;
}
#chat-input-container > .form {
background: transparent;
border: none;
}
#chat-input-row {
padding-bottom: 20px;
}
.old-ui #chat-input-row, #chat-input-row.bigchat {
padding-bottom: 0 !important;
}
#chat-col {
padding-bottom: 100px;
}
.old-ui #chat-col, #chat-col.bigchat {
padding-bottom: 80px !important;
}
.old-ui #chat-buttons #clear-history-confirm {
order: -1;
}
.chat ol, .chat ul {
margin-top: 6px !important;
}
/* ----------------------------------------------
Past chats menus
---------------------------------------------- */
#past-chats-row {
margin-bottom: calc( -1 * var(--layout-gap) );
}
#rename-row label {
margin-top: var(--layout-gap);
}
/* ----------------------------------------------
Keep dropdown menus above errored components
---------------------------------------------- */
.options {
z-index: 100 !important;
}
/* ----------------------------------------------
Big profile picture for characters
---------------------------------------------- */
.bigProfilePicture {
position: fixed;
bottom: 0;
left: 0;
width: calc((100vw - 880px - 120px) /2);
}
.pfp_character:hover {
cursor: pointer;
}
@media screen and (width <= 1300px) {
.bigProfilePicture {
display: none;
}
}

@@ -0,0 +1,93 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/extensions/character_bias/script.py
import os

import gradio as gr

# get the current directory of the script
current_dir = os.path.dirname(os.path.abspath(__file__))

# check if the bias_options.txt file exists, if not, create it
bias_file = os.path.join(current_dir, "bias_options.txt")
if not os.path.isfile(bias_file):
    with open(bias_file, "w") as f:
        f.write("*I am so happy*\n*I am so sad*\n*I am so excited*\n*I am so bored*\n*I am so angry*")

# read bias options from the text file
with open(bias_file, "r") as f:
    bias_options = [line.strip() for line in f.readlines()]

params = {
    "activate": True,
    "bias string": " *I am so happy*",
    "custom string": "",
}


def input_modifier(string):
    """
    This function is applied to your text inputs before
    they are fed into the model.
    """
    return string


def output_modifier(string):
    """
    This function is applied to the model outputs.
    """
    return string


def bot_prefix_modifier(string):
    """
    This function is only applied in chat mode. It modifies
    the prefix text for the Bot and can be used to bias its
    behavior.
    """
    if params['activate']:
        if params['custom string'].strip() != '':
            return f'{string} {params["custom string"].strip()} '
        else:
            return f'{string} {params["bias string"].strip()} '
    else:
        return string


def ui():
    # Gradio elements
    activate = gr.Checkbox(value=params['activate'], label='Activate character bias')
    dropdown_string = gr.Dropdown(choices=bias_options, value=params["bias string"], label='Character bias', info='To edit the options in this dropdown edit the "bias_options.txt" file')
    custom_string = gr.Textbox(value=params['custom string'], placeholder="Enter custom bias string", label="Custom Character Bias", info='If not empty, will be used instead of the value above')

    # Event functions to update the parameters in the backend
    def update_bias_string(x):
        if x:
            params.update({"bias string": x})
        else:
            # fall back to the dropdown's current value (gr.Dropdown has no .get() method)
            params.update({"bias string": dropdown_string.value})
        return x

    def update_custom_string(x):
        params.update({"custom string": x})

    dropdown_string.change(update_bias_string, dropdown_string, None)
    custom_string.change(update_custom_string, custom_string, None)
    activate.change(lambda x: params.update({"activate": x}), activate, None)

@@ -0,0 +1,720 @@
The birch canoe slid on the smooth planks.
Glue the sheet to the dark blue background.
It's easy to tell the depth of a well.
These days a chicken leg is a rare dish.
Rice is often served in round bowls.
The juice of lemons makes fine punch.
The box was thrown beside the parked truck.
The hogs were fed chopped corn and garbage.
Four hours of steady work faced us.
A large size in stockings is hard to sell.
The boy was there when the sun rose.
A rod is used to catch pink salmon.
The source of the huge river is the clear spring.
Kick the ball straight and follow through.
Help the woman get back to her feet.
A pot of tea helps to pass the evening.
Smoky fires lack flame and heat.
The soft cushion broke the man's fall.
The salt breeze came across from the sea.
The girl at the booth sold fifty bonds.
The small pup gnawed a hole in the sock.
The fish twisted and turned on the bent hook.
Press the pants and sew a button on the vest.
The swan dive was far short of perfect.
The beauty of the view stunned the young boy.
Two blue fish swam in the tank.
Her purse was full of useless trash.
The colt reared and threw the tall rider.
It snowed, rained, and hailed the same morning.
Read verse out loud for pleasure.
Hoist the load to your left shoulder.
Take the winding path to reach the lake.
Note closely the size of the gas tank.
Wipe the grease off his dirty face.
Mend the coat before you go out.
The wrist was badly strained and hung limp.
The stray cat gave birth to kittens.
The young girl gave no clear response.
The meal was cooked before the bell rang.
What joy there is in living.
A king ruled the state in the early days.
The ship was torn apart on the sharp reef.
Sickness kept him home the third week.
The wide road shimmered in the hot sun.
The lazy cow lay in the cool grass.
Lift the square stone over the fence.
The rope will bind the seven books at once.
Hop over the fence and plunge in.
The friendly gang left the drug store.
Mesh wire keeps chicks inside.
The frosty air passed through the coat.
The crooked maze failed to fool the mouse.
Adding fast leads to wrong sums.
The show was a flop from the very start.
A saw is a tool used for making boards.
The wagon moved on well oiled wheels.
March the soldiers past the next hill.
A cup of sugar makes sweet fudge.
Place a rosebush near the porch steps.
Both lost their lives in the raging storm.
We talked of the side show in the circus.
Use a pencil to write the first draft.
He ran half way to the hardware store.
The clock struck to mark the third period.
A small creek cut across the field.
Cars and busses stalled in snow drifts.
The set of china hit the floor with a crash.
This is a grand season for hikes on the road.
The dune rose from the edge of the water.
Those words were the cue for the actor to leave.
A yacht slid around the point into the bay.
The two met while playing on the sand.
The ink stain dried on the finished page.
The walled town was seized without a fight.
The lease ran out in sixteen weeks.
A tame squirrel makes a nice pet.
The horn of the car woke the sleeping cop.
The heart beat strongly and with firm strokes.
The pearl was worn in a thin silver ring.
The fruit peel was cut in thick slices.
The Navy attacked the big task force.
See the cat glaring at the scared mouse.
There are more than two factors here.
The hat brim was wide and too droopy.
The lawyer tried to lose his case.
The grass curled around the fence post.
Cut the pie into large parts.
Men strive but seldom get rich.
Always close the barn door tight.
He lay prone and hardly moved a limb.
The slush lay deep along the street.
A wisp of cloud hung in the blue air.
A pound of sugar costs more than eggs.
The fin was sharp and cut the clear water.
The play seems dull and quite stupid.
Bail the boat to stop it from sinking.
The term ended in late June that year.
A tusk is used to make costly gifts.
Ten pins were set in order.
The bill was paid every third week.
Oak is strong and also gives shade.
Cats and dogs each hate the other.
The pipe began to rust while new.
Open the crate but don't break the glass.
Add the sum to the product of these three.
Thieves who rob friends deserve jail.
The ripe taste of cheese improves with age.
Act on these orders with great speed.
The hog crawled under the high fence.
Move the vat over the hot fire.
The bark of the pine tree was shiny and dark.
Leaves turn brown and yellow in the fall.
The pennant waved when the wind blew.
Split the log with a quick, sharp blow.
Burn peat after the logs give out.
He ordered peach pie with ice cream.
Weave the carpet on the right hand side.
Hemp is a weed found in parts of the tropics.
A lame back kept his score low.
We find joy in the simplest things.
Type out three lists of orders.
The harder he tried the less he got done.
The boss ran the show with a watchful eye.
The cup cracked and spilled its contents.
Paste can cleanse the most dirty brass.
The slang word for raw whiskey is booze.
It caught its hind paw in a rusty trap.
The wharf could be seen at the farther shore.
Feel the heat of the weak dying flame.
The tiny girl took off her hat.
A cramp is no small danger on a swim.
He said the same phrase thirty times.
Pluck the bright rose without leaves.
Two plus seven is less than ten.
The glow deepened in the eyes of the sweet girl.
Bring your problems to the wise chief.
Write a fond note to the friend you cherish.
Clothes and lodging are free to new men.
We frown when events take a bad turn.
Port is a strong wine with a smoky taste.
The young kid jumped the rusty gate.
Guess the results from the first scores.
A salt pickle tastes fine with ham.
The just claim got the right verdict.
These thistles bend in a high wind.
Pure bred poodles have curls.
The tree top waved in a graceful way.
The spot on the blotter was made by green ink.
Mud was spattered on the front of his white shirt.
The cigar burned a hole in the desk top.
The empty flask stood on the tin tray.
A speedy man can beat this track mark.
He broke a new shoelace that day.
The coffee stand is too high for the couch.
The urge to write short stories is rare.
The pencils have all been used.
The pirates seized the crew of the lost ship.
We tried to replace the coin but failed.
She sewed the torn coat quite neatly.
The sofa cushion is red and of light weight.
The jacket hung on the back of the wide chair.
At that high level the air is pure.
Drop the two when you add the figures.
A filing case is now hard to buy.
An abrupt start does not win the prize.
Wood is best for making toys and blocks.
The office paint was a dull, sad tan.
He knew the skill of the great young actress.
A rag will soak up spilled water.
A shower of dirt fell from the hot pipes.
Steam hissed from the broken valve.
The child almost hurt the small dog.
There was a sound of dry leaves outside.
The sky that morning was clear and bright blue.
Torn scraps littered the stone floor.
Sunday is the best part of the week.
The doctor cured him with these pills.
The new girl was fired today at noon.
They felt gay when the ship arrived in port.
Add the store's account to the last cent.
Acid burns holes in wool cloth.
Fairy tales should be fun to write.
Eight miles of woodland burned to waste.
The third act was dull and tired the players.
A young child should not suffer fright.
Add the column and put the sum here.
We admire and love a good cook.
There the flood mark is ten inches.
He carved a head from the round block of marble.
She has a smart way of wearing clothes.
The fruit of a fig tree is apple-shaped.
Corn cobs can be used to kindle a fire.
Where were they when the noise started.
The paper box is full of thumb tacks.
Sell your gift to a buyer at a good gain.
The tongs lay beside the ice pail.
The petals fall with the next puff of wind.
Bring your best compass to the third class.
They could laugh although they were sad.
Farmers came in to thresh the oat crop.
The brown house was on fire to the attic.
The lure is used to catch trout and flounder.
Float the soap on top of the bath water.
A blue crane is a tall wading bird.
A fresh start will work such wonders.
The club rented the rink for the fifth night.
After the dance, they went straight home.
The hostess taught the new maid to serve.
He wrote his last novel there at the inn.
Even the worst will beat his low score.
The cement had dried when he moved it.
The loss of the second ship was hard to take.
The fly made its way along the wall.
Do that with a wooden stick.
Live wires should be kept covered.
The large house had hot water taps.
It is hard to erase blue or red ink.
Write at once or you may forget it.
The doorknob was made of bright clean brass.
The wreck occurred by the bank on Main Street.
A pencil with black lead writes best.
Coax a young calf to drink from a bucket.
Schools for ladies teach charm and grace.
The lamp shone with a steady green flame.
They took the axe and the saw to the forest.
The ancient coin was quite dull and worn.
The shaky barn fell with a loud crash.
Jazz and swing fans like fast music.
Rake the rubbish up and then burn it.
Slash the gold cloth into fine ribbons.
Try to have the court decide the case.
They are pushed back each time they attack.
He broke his ties with groups of former friends.
They floated on the raft to sun their white backs.
The map had an X that meant nothing.
Whitings are small fish caught in nets.
Some ads serve to cheat buyers.
Jerk the rope and the bell rings weakly.
A waxed floor makes us lose balance.
Madam, this is the best brand of corn.
On the islands the sea breeze is soft and mild.
The play began as soon as we sat down.
This will lead the world to more sound and fury.
Add salt before you fry the egg.
The rush for funds reached its peak Tuesday.
The birch looked stark white and lonesome.
The box is held by a bright red snapper.
To make pure ice, you freeze water.
The first worm gets snapped early.
Jump the fence and hurry up the bank.
Yell and clap as the curtain slides back.
They are men who walk the middle of the road.
Both brothers wear the same size.
In some form or other we need fun.
The prince ordered his head chopped off.
The houses are built of red clay bricks.
Ducks fly north but lack a compass.
Fruit flavors are used in fizz drinks.
These pills do less good than others.
Canned pears lack full flavor.
The dark pot hung in the front closet.
Carry the pail to the wall and spill it there.
The train brought our hero to the big town.
We are sure that one war is enough.
Gray paint stretched for miles around.
The rude laugh filled the empty room.
High seats are best for football fans.
Tea served from the brown jug is tasty.
A dash of pepper spoils beef stew.
A zestful food is the hot-cross bun.
The horse trotted around the field at a brisk pace.
Find the twin who stole the pearl necklace.
Cut the cord that binds the box tightly.
The red tape bound the smuggled food.
Look in the corner to find the tan shirt.
The cold drizzle will halt the bond drive.
Nine men were hired to dig the ruins.
The junk yard had a mouldy smell.
The flint sputtered and lit a pine torch.
Soak the cloth and drown the sharp odor.
The shelves were bare of both jam or crackers.
A joy to every child is the swan boat.
All sat frozen and watched the screen.
A cloud of dust stung his tender eyes.
To reach the end he needs much courage.
Shape the clay gently into block form.
A ridge on a smooth surface is a bump or flaw.
Hedge apples may stain your hands green.
Quench your thirst, then eat the crackers.
Tight curls get limp on rainy days.
The mute muffled the high tones of the horn.
The gold ring fits only a pierced ear.
The old pan was covered with hard fudge.
Watch the log float in the wide river.
The node on the stalk of wheat grew daily.
The heap of fallen leaves was set on fire.
Write fast if you want to finish early.
His shirt was clean but one button was gone.
The barrel of beer was a brew of malt and hops.
Tin cans are absent from store shelves.
Slide the box into that empty space.
The plant grew large and green in the window.
The beam dropped down on the workmen's head.
Pink clouds floated with the breeze.
She danced like a swan, tall and graceful.
The tube was blown and the tire flat and useless.
It is late morning on the old wall clock.
Let's all join as we sing the last chorus.
The last switch cannot be turned off.
The fight will end in just six minutes.
The store walls were lined with colored frocks.
The peace league met to discuss their plans.
The rise to fame of a person takes luck.
Paper is scarce, so write with much care.
The quick fox jumped on the sleeping cat.
The nozzle of the fire hose was bright brass.
Screw the round cap on as tight as needed.
Time brings us many changes.
The purple tie was ten years old.
Men think and plan and sometimes act.
Fill the ink jar with sticky glue.
He smoke a big pipe with strong contents.
We need grain to keep our mules healthy.
Pack the records in a neat thin case.
The crunch of feet in the snow was the only sound.
The copper bowl shone in the sun's rays.
Boards will warp unless kept dry.
The plush chair leaned against the wall.
Glass will clink when struck by metal.
Bathe and relax in the cool green grass.
Nine rows of soldiers stood in line.
The beach is dry and shallow at low tide.
The idea is to sew both edges straight.
The kitten chased the dog down the street.
Pages bound in cloth make a book.
Try to trace the fine lines of the painting.
Women form less than half of the group.
The zones merge in the central part of town.
A gem in the rough needs work to polish.
Code is used when secrets are sent.
Most of the news is easy for us to hear.
He used the lathe to make brass objects.
The vane on top of the pole revolved in the wind.
Mince pie is a dish served to children.
The clan gathered on each dull night.
Let it burn, it gives us warmth and comfort.
A castle built from sand fails to endure.
A child's wit saved the day for us.
Tack the strip of carpet to the worn floor.
Next Tuesday we must vote.
Pour the stew from the pot into the plate.
Each penny shone like new.
The man went to the woods to gather sticks.
The dirt piles were lines along the road.
The logs fell and tumbled into the clear stream.
Just hoist it up and take it away.
A ripe plum is fit for a king's palate.
Our plans right now are hazy.
Brass rings are sold by these natives.
It takes a good trap to capture a bear.
Feed the white mouse some flower seeds.
The thaw came early and freed the stream.
He took the lead and kept it the whole distance.
The key you designed will fit the lock.
Plead to the council to free the poor thief.
Better hash is made of rare beef.
This plank was made for walking on.
The lake sparkled in the red hot sun.
He crawled with care along the ledge.
Tend the sheep while the dog wanders.
It takes a lot of help to finish these.
Mark the spot with a sign painted red.
Take two shares as a fair profit.
The fur of cats goes by many names.
North winds bring colds and fevers.
He asks no person to vouch for him.
Go now and come here later.
A sash of gold silk will trim her dress.
Soap can wash most dirt away.
That move means the game is over.
He wrote down a long list of items.
A siege will crack the strong defense.
Grape juice and water mix well.
Roads are paved with sticky tar.
Fake stones shine but cost little.
The drip of the rain made a pleasant sound.
Smoke poured out of every crack.
Serve the hot rum to the tired heroes.
Much of the story makes good sense.
The sun came up to light the eastern sky.
Heave the line over the port side.
A lathe cuts and trims any wood.
It's a dense crowd in two distinct ways.
His hip struck the knee of the next player.
The stale smell of old beer lingers.
The desk was firm on the shaky floor.
It takes heat to bring out the odor.
Beef is scarcer than some lamb.
Raise the sail and steer the ship northward.
A cone costs five cents on Mondays.
A pod is what peas always grow in.
Jerk the dart from the cork target.
No cement will hold hard wood.
We now have a new base for shipping.
A list of names is carved around the base.
The sheep were led home by a dog.
Three for a dime, the young peddler cried.
The sense of smell is better than that of touch.
No hardship seemed to keep him sad.
Grace makes up for lack of beauty.
Nudge gently but wake her now.
The news struck doubt into restless minds.
Once we stood beside the shore.
A chink in the wall allowed a draft to blow.
Fasten two pins on each side.
A cold dip restores health and zest.
He takes the oath of office each March.
The sand drifts over the sill of the old house.
The point of the steel pen was bent and twisted.
There is a lag between thought and act.
Seed is needed to plant the spring corn.
Draw the chart with heavy black lines.
The boy owed his pal thirty cents.
The chap slipped into the crowd and was lost.
Hats are worn to tea and not to dinner.
The ramp led up to the wide highway.
Beat the dust from the rug onto the lawn.
Say it slowly but make it ring clear.
The straw nest housed five robins.
Screen the porch with woven straw mats.
This horse will nose his way to the finish.
The dry wax protects the deep scratch.
He picked up the dice for a second roll.
These coins will be needed to pay his debt.
The nag pulled the frail cart along.
Twist the valve and release hot steam.
The vamp of the shoe had a gold buckle.
The smell of burned rags itches my nose.
New pants lack cuffs and pockets.
The marsh will freeze when cold enough.
They slice the sausage thin with a knife.
The bloom of the rose lasts a few days.
A gray mare walked before the colt.
Breakfast buns are fine with a hot drink.
Bottles hold four kinds of rum.
The man wore a feather in his felt hat.
He wheeled the bike past the winding road.
Drop the ashes on the worn old rug.
The desk and both chairs were painted tan.
Throw out the used paper cup and plate.
A clean neck means a neat collar.
The couch cover and hall drapes were blue.
The stems of the tall glasses cracked and broke.
The wall phone rang loud and often.
The clothes dried on a thin wooden rack.
Turn on the lantern which gives us light.
The cleat sank deeply into the soft turf.
The bills were mailed promptly on the tenth of the month.
To have is better than to wait and hope.
The price is fair for a good antique clock.
The music played on while they talked.
Dispense with a vest on a day like this.
The bunch of grapes was pressed into wine.
He sent the figs, but kept the ripe cherries.
The hinge on the door creaked with old age.
The screen before the fire kept in the sparks.
Fly by night, and you waste little time.
Thick glasses helped him read the print.
Birth and death mark the limits of life.
The chair looked strong but had no bottom.
The kite flew wildly in the high wind.
A fur muff is stylish once more.
The tin box held priceless stones.
We need an end of all such matter.
The case was puzzling to the old and wise.
The bright lanterns were gay on the dark lawn.
We don't get much money but we have fun.
The youth drove with zest, but little skill.
Five years he lived with a shaggy dog.
A fence cuts through the corner lot.
The way to save money is not to spend much.
Shut the hatch before the waves push it in.
The odor of spring makes young hearts jump.
Crack the walnut with your sharp side teeth.
He offered proof in the form of a large chart.
Send the stuff in a thick paper bag.
A quart of milk is water for the most part.
They told wild tales to frighten him.
The three story house was built of stone.
In the rear of the ground floor was a large passage.
A man in a blue sweater sat at the desk.
Oats are a food eaten by horse and man.
Their eyelids droop for want of sleep.
A sip of tea revives his tired friend.
There are many ways to do these things.
Tuck the sheet under the edge of the mat.
A force equal to that would move the earth.
We like to see clear weather.
The work of the tailor is seen on each side.
Take a chance and win a china doll.
Shake the dust from your shoes, stranger.
She was kind to sick old people.
The square wooden crate was packed to be shipped.
The dusty bench stood by the stone wall.
We dress to suit the weather of most days.
Smile when you say nasty words.
A bowl of rice is free with chicken stew.
The water in this well is a source of good health.
Take shelter in this tent, but keep still.
That guy is the writer of a few banned books.
The little tales they tell are false.
The door was barred, locked, and bolted as well.
Ripe pears are fit for a queen's table.
A big wet stain was on the round carpet.
The kite dipped and swayed, but stayed aloft.
The pleasant hours fly by much too soon.
The room was crowded with a wild mob.
This strong arm shall shield your honor.
She blushed when he gave her a white orchid.
The beetle droned in the hot June sun.
Press the pedal with your left foot.
Neat plans fail without luck.
The black trunk fell from the landing.
The bank pressed for payment of the debt.
The theft of the pearl pin was kept secret.
Shake hands with this friendly child.
The vast space stretched into the far distance.
A rich farm is rare in this sandy waste.
His wide grin earned many friends.
Flax makes a fine brand of paper.
Hurdle the pit with the aid of a long pole.
A strong bid may scare your partner stiff.
Even a just cause needs power to win.
Peep under the tent and see the clowns.
The leaf drifts along with a slow spin.
Cheap clothes are flashy but don't last.
A thing of small note can cause despair.
Flood the mails with requests for this book.
A thick coat of black paint covered all.
The pencil was cut to be sharp at both ends.
Those last words were a strong statement.
He wrote his name boldly at the top of the sheet.
Dill pickles are sour but taste fine.
Down that road is the way to the grain farmer.
Either mud or dust are found at all times.
The best method is to fix it in place with clips.
If you mumble your speech will be lost.
At night the alarm roused him from a deep sleep.
Read just what the meter says.
Fill your pack with bright trinkets for the poor.
The small red neon lamp went out.
Clams are small, round, soft, and tasty.
The fan whirled its round blades softly.
The line where the edges join was clean.
Breathe deep and smell the piny air.
It matters not if he reads these words or those.
A brown leather bag hung from its strap.
A toad and a frog are hard to tell apart.
A white silk jacket goes with any shoes.
A break in the dam almost caused a flood.
Paint the sockets in the wall dull green.
The child crawled into the dense grass.
Bribes fail where honest men work.
Trample the spark, else the flames will spread.
The hilt of the sword was carved with fine designs.
A round hole was drilled through the thin board.
Footprints showed the path he took up the beach.
She was waiting at my front lawn.
A vent near the edge brought in fresh air.
Prod the old mule with a crooked stick.
It is a band of steel three inches wide.
The pipe ran almost the length of the ditch.
It was hidden from sight by a mass of leaves and shrubs.
The weight of the package was seen on the high scale.
Wake and rise, and step into the green outdoors.
The green light in the brown box flickered.
The brass tube circled the high wall.
The lobes of her ears were pierced to hold rings.
Hold the hammer near the end to drive the nail.
Next Sunday is the twelfth of the month.
Every word and phrase he speaks is true.
He put his last cartridge into the gun and fired.
They took their kids from the public school.
Drive the screw straight into the wood.
Keep the hatch tight and the watch constant.
Sever the twine with a quick snip of the knife.
Paper will dry out when wet.
Slide the catch back and open the desk.
Help the weak to preserve their strength.
A sullen smile gets few friends.
Stop whistling and watch the boys march.
Jerk the cord, and out tumbles the gold.
Slide the tray across the glass top.
The cloud moved in a stately way and was gone.
Light maple makes for a swell room.
Set the piece here and say nothing.
Dull stories make her laugh.
A stiff cord will do to fasten your shoe.
Get the trust fund to the bank early.
Choose between the high road and the low.
A plea for funds seems to come again.
He lent his coat to the tall gaunt stranger.
There is a strong chance it will happen once more.
The duke left the park in a silver coach.
Greet the new guests and leave quickly.
When the frost has come it is time for turkey.
Sweet words work better than fierce.
A thin stripe runs down the middle.
A six comes up more often than a ten.
Lush fern grow on the lofty rocks.
The ram scared the school children off.
The team with the best timing looks good.
The farmer swapped his horse for a brown ox.
Sit on the perch and tell the others what to do.
A steep trail is painful for our feet.
The early phase of life moves fast.
Green moss grows on the northern side.
Tea in thin china has a sweet taste.
Pitch the straw through the door of the stable.
The latch on the back gate needed a nail.
The goose was brought straight from the old market.
The sink is the thing in which we pile dishes.
A whiff of it will cure the most stubborn cold.
The facts don't always show who is right.
She flaps her cape as she parades the street.
The loss of the cruiser was a blow to the fleet.
Loop the braid to the left and then over.
Plead with the lawyer to drop the lost cause.
Calves thrive on tender spring grass.
Post no bills on this office wall.
Tear a thin sheet from the yellow pad.
A cruise in warm waters in a sleek yacht is fun.
A streak of color ran down the left edge.
It was done before the boy could see it.
Crouch before you jump or miss the mark.
Pack the kits and don't forget the salt.
The square peg will settle in the round hole.
Fine soap saves tender skin.
Poached eggs and tea must suffice.
Bad nerves are jangled by a door slam.
Ship maps are different from those for planes.
Dimes showered down from all sides.
They sang the same tunes at each party.
The sky in the west is tinged with orange red.
The pods of peas ferment in bare fields.
The horse balked and threw the tall rider.
The hitch between the horse and cart broke.
Pile the coal high in the shed corner.
A gold vase is both rare and costly.
The knife was hung inside its bright sheath.
The rarest spice comes from the far East.
The roof should be tilted at a sharp slant.
A smatter of French is worse than none.
The mule trod the treadmill day and night.
The aim of the contest is to raise a great fund.
To send it now in large amounts is bad.
There is a fine hard tang in salty air.
Cod is the main business of the north shore.
The slab was hewn from heavy blocks of slate.
Dunk the stale biscuits into strong drink.
Hang tinsel from both branches.
Cap the jar with a tight brass cover.
The poor boy missed the boat again.
Be sure to set the lamp firmly in the hole.
Pick a card and slip it under the pack.
A round mat will cover the dull spot.
The first part of the plan needs changing.
A good book informs of what we ought to know.
The mail comes in three batches per day.
You cannot brew tea in a cold pot.
Dots of light betrayed the black cat.
Put the chart on the mantel and tack it down.
The night shift men rate extra pay.
The red paper brightened the dim stage.
See the player scoot to third base.
Slide the bill between the two leaves.
Many hands help get the job done.
We don't like to admit our small faults.
No doubt about the way the wind blows.
Dig deep in the earth for pirate's gold.
The steady drip is worse than a drenching rain.
A flat pack takes less luggage space.
Green ice frosted the punch bowl.
A stuffed chair slipped from the moving van.
The stitch will serve but needs to be shortened.
A thin book fits in the side pocket.
The gloss on top made it unfit to read.
The hail pattered on the burnt brown grass.
Seven seals were stamped on great sheets.
Our troops are set to strike heavy blows.
The store was jammed before the sale could start.
It was a bad error on the part of the new judge.
One step more and the board will collapse.
Take the match and strike it against your shoe.
The pot boiled, but the contents failed to jell.
The baby puts his right foot in his mouth.
The bombs left most of the town in ruins.
Stop and stare at the hard working man.
The streets are narrow and full of sharp turns.
The pup jerked the leash as he saw a feline shape.
Open your book to the first page.
Fish evade the net and swim off.
Dip the pail once and let it settle.
Will you please answer that phone.
The big red apple fell to the ground.
The curtain rose and the show was on.
The young prince became heir to the throne.
He sent the boy on a short errand.
Leave now and you will arrive on time.
The corner store was robbed last night.
A gold ring will please most any girl.
The long journey home took a year.
She saw a cat in the neighbor's house.
A pink shell was found on the sandy beach.
Small children came to see him.
The grass and bushes were wet with dew.
The blind man counted his old coins.
A severe storm tore down the barn.
She called his name many times.
When you hear the bell, come quickly.

View file

@ -0,0 +1,18 @@
{
"Arabic": "ar",
"Chinese": "zh-cn",
"Czech": "cs",
"Dutch": "nl",
"English": "en",
"French": "fr",
"German": "de",
"Hungarian": "hu",
"Italian": "it",
"Japanese": "ja",
"Korean": "ko",
"Polish": "pl",
"Portuguese": "pt",
"Russian": "ru",
"Spanish": "es",
"Turkish": "tr"
}

View file

@ -0,0 +1 @@
TTS==0.21.*

View file

@ -0,0 +1,246 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/extensions/coqui_tts/script.py
import html
import json
import os
import random
import time
from pathlib import Path
import gradio as gr
from TTS.api import TTS
from TTS.utils.synthesizer import Synthesizer
from modules import chat, shared, ui_chat
from modules.ui import create_refresh_button
from modules.utils import gradio
os.environ["COQUI_TOS_AGREED"] = "1"
params = {
"activate": True,
"autoplay": True,
"show_text": False,
"remove_trailing_dots": False,
"voice": "female_01.wav",
"language": "English",
"model_name": "tts_models/multilingual/multi-dataset/xtts_v2",
"device": "cuda"
}
this_dir = str(Path(__file__).parent.resolve())
model = None
with open(Path(f"{this_dir}/languages.json"), encoding='utf8') as f:
languages = json.load(f)
def get_available_voices():
return sorted([voice.name for voice in Path(f"{this_dir}/voices").glob("*.wav")])
def preprocess(raw_input):
raw_input = html.unescape(raw_input)
# raw_input = raw_input.strip("\"")
return raw_input
def new_split_into_sentences(self, text):
sentences = self.seg.segment(text)
if params['remove_trailing_dots']:
sentences_without_dots = []
for sentence in sentences:
if sentence.endswith('.') and not sentence.endswith('...'):
sentence = sentence[:-1]
sentences_without_dots.append(sentence)
return sentences_without_dots
else:
return sentences
Synthesizer.split_into_sentences = new_split_into_sentences
def load_model():
model = TTS(params["model_name"]).to(params["device"])
return model
def remove_tts_from_history(history):
for i, entry in enumerate(history['internal']):
history['visible'][i] = [history['visible'][i][0], entry[1]]
return history
def toggle_text_in_history(history):
for i, entry in enumerate(history['visible']):
visible_reply = entry[1]
if visible_reply.startswith('<audio'):
if params['show_text']:
reply = history['internal'][i][1]
history['visible'][i] = [history['visible'][i][0], f"{visible_reply.split('</audio>')[0]}</audio>\n\n{reply}"]
else:
history['visible'][i] = [history['visible'][i][0], f"{visible_reply.split('</audio>')[0]}</audio>"]
return history
def random_sentence():
with open(Path("extensions/coqui_tts/harvard_sentences.txt")) as f:
return random.choice(list(f))
def voice_preview(string):
string = html.unescape(string) or random_sentence()
output_file = Path('extensions/coqui_tts/outputs/voice_preview.wav')
model.tts_to_file(
text=string,
file_path=output_file,
speaker_wav=[f"{this_dir}/voices/{params['voice']}"],
language=languages[params["language"]]
)
return f'<audio src="file/{output_file.as_posix()}?{int(time.time())}" controls autoplay></audio>'
def history_modifier(history):
# Remove autoplay from the last reply
if len(history['internal']) > 0:
history['visible'][-1] = [
history['visible'][-1][0],
history['visible'][-1][1].replace('controls autoplay>', 'controls>')
]
return history
def state_modifier(state):
if not params['activate']:
return state
state['stream'] = False
return state
def input_modifier(string, state):
if not params['activate']:
return string
shared.processing_message = "*Is recording a voice message...*"
return string
def output_modifier(string, state):
if not params['activate']:
return string
original_string = string
string = preprocess(html.unescape(string))
if string == '':
string = '*Empty reply, try regenerating*'
else:
output_file = Path(f'extensions/coqui_tts/outputs/{state["character_menu"]}_{int(time.time())}.wav')
model.tts_to_file(
text=string,
file_path=output_file,
speaker_wav=[f"{this_dir}/voices/{params['voice']}"],
language=languages[params["language"]]
)
autoplay = 'autoplay' if params['autoplay'] else ''
string = f'<audio src="file/{output_file.as_posix()}" controls {autoplay}></audio>'
if params['show_text']:
string += f'\n\n{original_string}'
shared.processing_message = "*Is typing...*"
return string
def custom_css():
path_to_css = Path(f"{this_dir}/style.css")
return open(path_to_css, 'r').read()
def setup():
global model
print("[XTTS] Loading XTTS...")
model = load_model()
print("[XTTS] Done!")
Path(f"{this_dir}/outputs").mkdir(parents=True, exist_ok=True)
def ui():
with gr.Accordion("Coqui TTS (XTTSv2)"):
with gr.Row():
activate = gr.Checkbox(value=params['activate'], label='Activate TTS')
autoplay = gr.Checkbox(value=params['autoplay'], label='Play TTS automatically')
with gr.Row():
show_text = gr.Checkbox(value=params['show_text'], label='Show message text under audio player')
remove_trailing_dots = gr.Checkbox(value=params['remove_trailing_dots'], label='Remove trailing "." from text segments before converting to audio')
with gr.Row():
with gr.Row():
voice = gr.Dropdown(get_available_voices(), label="Voice wav", value=params["voice"])
create_refresh_button(voice, lambda: None, lambda: {'choices': get_available_voices(), 'value': params["voice"]}, 'refresh-button')
language = gr.Dropdown(languages.keys(), label="Language", value=params["language"])
with gr.Row():
preview_text = gr.Text(show_label=False, placeholder="Preview text", elem_id="silero_preview_text")
preview_play = gr.Button("Preview")
preview_audio = gr.HTML(visible=False)
with gr.Row():
convert = gr.Button('Permanently replace audios with the message texts')
convert_cancel = gr.Button('Cancel', visible=False)
convert_confirm = gr.Button('Confirm (cannot be undone)', variant="stop", visible=False)
# Convert history with confirmation
convert_arr = [convert_confirm, convert, convert_cancel]
convert.click(lambda: [gr.update(visible=True), gr.update(visible=False), gr.update(visible=True)], None, convert_arr)
convert_confirm.click(
lambda: [gr.update(visible=False), gr.update(visible=True), gr.update(visible=False)], None, convert_arr).then(
remove_tts_from_history, gradio('history'), gradio('history')).then(
chat.save_history, gradio('history', 'unique_id', 'character_menu', 'mode'), None).then(
chat.redraw_html, gradio(ui_chat.reload_arr), gradio('display'))
convert_cancel.click(lambda: [gr.update(visible=False), gr.update(visible=True), gr.update(visible=False)], None, convert_arr)
# Toggle message text in history
show_text.change(
lambda x: params.update({"show_text": x}), show_text, None).then(
toggle_text_in_history, gradio('history'), gradio('history')).then(
chat.save_history, gradio('history', 'unique_id', 'character_menu', 'mode'), None).then(
chat.redraw_html, gradio(ui_chat.reload_arr), gradio('display'))
# Event functions to update the parameters in the backend
activate.change(lambda x: params.update({"activate": x}), activate, None)
autoplay.change(lambda x: params.update({"autoplay": x}), autoplay, None)
remove_trailing_dots.change(lambda x: params.update({"remove_trailing_dots": x}), remove_trailing_dots, None)
voice.change(lambda x: params.update({"voice": x}), voice, None)
language.change(lambda x: params.update({"language": x}), language, None)
# Play preview
preview_text.submit(voice_preview, preview_text, preview_audio)
preview_play.click(voice_preview, preview_text, preview_audio)
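For reference, the audio step above boils down to a couple of TTS API calls. Below is a minimal standalone sketch of what `setup()` and `output_modifier()` do for a single reply; the device, output path and speaker wav are illustrative (the extension defaults to `cuda` and the bundled `female_01.wav` voice), and it assumes the `TTS` package pinned in the requirements above is installed.

```python
# Minimal sketch of the extension's per-reply TTS step (paths/device are illustrative).
import os

from TTS.api import TTS

os.environ["COQUI_TOS_AGREED"] = "1"  # same licence acknowledgement the extension sets

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to("cpu")  # extension default is "cuda"
tts.tts_to_file(
    text="The wall phone rang loud and often.",                 # any line from harvard_sentences.txt
    file_path="voice_preview.wav",
    speaker_wav=["extensions/coqui_tts/voices/female_01.wav"],  # default voice
    language="en",                                              # languages.json maps "English" -> "en"
)
```

`output_modifier()` then wraps the resulting wav in an `<audio>` tag so Gradio can play it inline.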

View file

@ -0,0 +1,8 @@
.SDAP .hires_opts input[type="number"] {
width: 6em !important;
}
/* silero_tts preview */
.form:has(> #silero_preview_text) {
min-width: 75%
}

View file

@ -0,0 +1,149 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/extensions/example/script.py
import gradio as gr
import torch
from transformers import LogitsProcessor
from modules import chat, shared
from modules.text_generation import (
decode,
encode,
generate_reply,
)
params = {
"display_name": "Example Extension",
"is_tab": False,
}
class MyLogits(LogitsProcessor):
"""
Manipulates the probabilities for the next token before it gets sampled.
Used in the logits_processor_modifier function below.
"""
def __init__(self):
pass
def __call__(self, input_ids, scores):
# probs = torch.softmax(scores, dim=-1, dtype=torch.float)
# probs[0] /= probs[0].sum()
# scores = torch.log(probs / (1 - probs))
return scores
def history_modifier(history):
"""
Modifies the chat history.
Only used in chat mode.
"""
return history
def state_modifier(state):
"""
Modifies the state variable, which is a dictionary containing the input
values in the UI like sliders and checkboxes.
"""
return state
def chat_input_modifier(text, visible_text, state):
"""
Modifies the user input string in chat mode (visible_text).
You can also modify the internal representation of the user
input (text) to change how it will appear in the prompt.
"""
return text, visible_text
def input_modifier(string, state, is_chat=False):
"""
In default/notebook modes, modifies the whole prompt.
In chat mode, it is the same as chat_input_modifier but only applied
to "text", here called "string", and not to "visible_text".
"""
return string
def bot_prefix_modifier(string, state):
"""
Modifies the prefix for the next bot reply in chat mode.
By default, the prefix will be something like "Bot Name:".
"""
return string
def tokenizer_modifier(state, prompt, input_ids, input_embeds):
"""
Modifies the input ids and embeds.
Used by the multimodal extension to put image embeddings in the prompt.
Only used by loaders that use the transformers library for sampling.
"""
return prompt, input_ids, input_embeds
def logits_processor_modifier(processor_list, input_ids):
"""
Adds logits processors to the list, allowing you to access and modify
the next token probabilities.
Only used by loaders that use the transformers library for sampling.
"""
processor_list.append(MyLogits())
return processor_list
def output_modifier(string, state, is_chat=False):
"""
Modifies the LLM output before it gets presented.
In chat mode, the modified version goes into history['visible'],
and the original version goes into history['internal'].
"""
return string
def custom_generate_chat_prompt(user_input, state, **kwargs):
"""
Replaces the function that generates the prompt from the chat history.
Only used in chat mode.
"""
result = chat.generate_chat_prompt(user_input, state, **kwargs)
return result
def custom_css():
"""
Returns a CSS string that gets appended to the CSS for the webui.
"""
return ''
def custom_js():
"""
Returns a javascript string that gets appended to the javascript
for the webui.
"""
return ''
def setup():
"""
Gets executed only once, when the extension is imported.
"""
pass
def ui():
"""
Gets executed when the UI is drawn. Custom gradio elements and
their corresponding event handlers should be defined here.
To learn about gradio components, check out the docs:
https://gradio.app/docs/
"""
pass
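The skeleton above documents every hook an extension may implement; all of them are optional. As an illustration only (not part of this commit), a hypothetical `extensions/shout/script.py` could implement just `output_modifier` and `ui`:

```python
# Hypothetical minimal extension built from the skeleton above; the name and params are made up.
import gradio as gr

params = {
    "display_name": "Shout",  # shown in the UI's extension list
    "is_tab": False,
    "activate": True,
}

def output_modifier(string, state, is_chat=False):
    """Upper-case the model reply before it is displayed."""
    if not params["activate"]:
        return string
    return string.upper()

def ui():
    # One checkbox wired back into params, following the pattern of the bundled extensions.
    activate = gr.Checkbox(value=params["activate"], label="Activate shouting")
    activate.change(lambda x: params.update({"activate": x}), activate, None)
```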

View file

@ -0,0 +1,40 @@
let gallery_element = document.getElementById('gallery-extension');
let chat_mode_element = document.getElementById('chat-mode');
let extensions_block = document.getElementById('extensions');
let extensions_block_size = extensions_block.childNodes.length;
let gallery_only = (extensions_block_size == 5);
function gotoFirstPage() {
const firstPageButton = gallery_element.querySelector('.paginate > button');
if (firstPageButton) {
firstPageButton.click();
}
}
document.querySelector('.header_bar').addEventListener('click', function(event) {
if (event.target.tagName === 'BUTTON') {
const buttonText = event.target.textContent.trim();
let chat_visible = (buttonText == 'Chat');
let default_visible = (buttonText == 'Default');
let notebook_visible = (buttonText == 'Notebook');
let chat_mode_visible = (chat_mode_element.offsetHeight > 0 && chat_mode_element.offsetWidth > 0);
// Only show this extension in the Chat tab
if (chat_visible) {
if (chat_mode_visible) {
gallery_element.style.display = 'block';
extensions_block.style.display = '';
} else {
gallery_element.style.display = 'none';
extensions_block.style.display = 'none';
}
} else {
gallery_element.style.display = 'none';
if (gallery_only) {
extensions_block.style.display = 'none';
}
}
}
});

View file

@ -0,0 +1,148 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/extensions/gallery/script.py
from pathlib import Path
import gradio as gr
from modules.html_generator import get_image_cache
from modules.shared import gradio, settings
cards = []
def generate_css():
css = """
.highlighted-border {
border-color: rgb(249, 115, 22) !important;
}
.character-gallery > .gallery {
margin: 1rem 0;
display: grid !important;
grid-template-columns: repeat(auto-fit, minmax(150px, 1fr));
grid-column-gap: 0.4rem;
grid-row-gap: 1.2rem;
}
.character-gallery > .label {
display: none !important;
}
.character-gallery button.gallery-item {
display: contents;
}
.character-container {
cursor: pointer;
text-align: center;
position: relative;
opacity: 0.85;
}
.character-container:hover {
opacity: 1;
}
.character-container .placeholder, .character-container img {
width: 150px;
height: 200px;
background-color: gray;
object-fit: cover;
margin: 0 auto;
border-radius: 1rem;
border: 3px solid white;
box-shadow: 3px 3px 6px 0px rgb(0 0 0 / 50%);
}
.character-name {
margin-top: 0.3rem;
display: block;
font-size: 1.2rem;
font-weight: 600;
overflow-wrap: anywhere;
}
"""
return css
def generate_html():
global cards
cards = []
# Iterate through files in image folder
for file in sorted(Path("characters").glob("*")):
if file.suffix in [".json", ".yml", ".yaml"]:
character = file.stem
container_html = '<div class="character-container">'
image_html = "<div class='placeholder'></div>"
for path in [Path(f"characters/{character}.{extension}") for extension in ['png', 'jpg', 'jpeg']]:
if path.exists():
image_html = f'<img src="file/{get_image_cache(path)}">'
break
container_html += f'{image_html} <span class="character-name">{character}</span>'
container_html += "</div>"
cards.append([container_html, character])
return cards
def filter_cards(filter_str=''):
if filter_str == '':
return cards
filter_upper = filter_str.upper()
return [k for k in cards if filter_upper in k[1].upper()]
def select_character(evt: gr.SelectData):
return (evt.value[1])
def custom_js():
path_to_js = Path(__file__).parent.resolve() / 'script.js'
return open(path_to_js, 'r').read()
def ui():
with gr.Accordion("Character gallery", open=settings["gallery-open"], elem_id='gallery-extension'):
gr.HTML(value="<style>" + generate_css() + "</style>")
with gr.Row():
filter_box = gr.Textbox(label='', placeholder='Filter', lines=1, max_lines=1, container=False, elem_id='gallery-filter-box')
gr.ClearButton(filter_box, value='🗑️', elem_classes='refresh-button')
update = gr.Button("Refresh", elem_classes='refresh-button')
gallery = gr.Dataset(
components=[gr.HTML(visible=False)],
label="",
samples=generate_html(),
elem_classes=["character-gallery"],
samples_per_page=settings["gallery-items_per_page"]
)
filter_box.change(lambda: None, None, None, _js=f'() => {{{custom_js()}; gotoFirstPage()}}').success(
filter_cards, filter_box, gallery).then(
lambda x: gr.update(elem_classes='highlighted-border' if x != '' else ''), filter_box, filter_box, show_progress=False)
update.click(generate_html, [], None).success(
filter_cards, filter_box, gallery)
gallery.select(select_character, None, gradio['character_menu'])

View file

@ -0,0 +1 @@
deep-translator==1.9.2

View file

@ -0,0 +1,78 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/extensions/google_translate/script.py
import html
import gradio as gr
from deep_translator import GoogleTranslator
params = {
"activate": True,
"language string": "ja",
}
language_codes = {'Afrikaans': 'af', 'Albanian': 'sq', 'Amharic': 'am', 'Arabic': 'ar', 'Armenian': 'hy', 'Azerbaijani': 'az', 'Basque': 'eu', 'Belarusian': 'be', 'Bengali': 'bn', 'Bosnian': 'bs', 'Bulgarian': 'bg', 'Catalan': 'ca', 'Cebuano': 'ceb', 'Chinese (Simplified)': 'zh-CN', 'Chinese (Traditional)': 'zh-TW', 'Corsican': 'co', 'Croatian': 'hr', 'Czech': 'cs', 'Danish': 'da', 'Dutch': 'nl', 'English': 'en', 'Esperanto': 'eo', 'Estonian': 'et', 'Finnish': 'fi', 'French': 'fr', 'Frisian': 'fy', 'Galician': 'gl', 'Georgian': 'ka', 'German': 'de', 'Greek': 'el', 'Gujarati': 'gu', 'Haitian Creole': 'ht', 'Hausa': 'ha', 'Hawaiian': 'haw', 'Hebrew': 'iw', 'Hindi': 'hi', 'Hmong': 'hmn', 'Hungarian': 'hu', 'Icelandic': 'is', 'Igbo': 'ig', 'Indonesian': 'id', 'Irish': 'ga', 'Italian': 'it', 'Japanese': 'ja', 'Javanese': 'jw', 'Kannada': 'kn', 'Kazakh': 'kk', 'Khmer': 'km', 'Korean': 'ko', 'Kurdish': 'ku', 'Kyrgyz': 'ky', 'Lao': 'lo', 'Latin': 'la', 'Latvian': 'lv', 'Lithuanian': 'lt', 'Luxembourgish': 'lb', 'Macedonian': 'mk', 'Malagasy': 'mg', 'Malay': 'ms', 'Malayalam': 'ml', 'Maltese': 'mt', 'Maori': 'mi', 'Marathi': 'mr', 'Mongolian': 'mn', 'Myanmar (Burmese)': 'my', 'Nepali': 'ne', 'Norwegian': 'no', 'Nyanja (Chichewa)': 'ny', 'Pashto': 'ps', 'Persian': 'fa', 'Polish': 'pl', 'Portuguese (Portugal, Brazil)': 'pt', 'Punjabi': 'pa', 'Romanian': 'ro', 'Russian': 'ru', 'Samoan': 'sm', 'Scots Gaelic': 'gd', 'Serbian': 'sr', 'Sesotho': 'st', 'Shona': 'sn', 'Sindhi': 'sd', 'Sinhala (Sinhalese)': 'si', 'Slovak': 'sk', 'Slovenian': 'sl', 'Somali': 'so', 'Spanish': 'es', 'Sundanese': 'su', 'Swahili': 'sw', 'Swedish': 'sv', 'Tagalog (Filipino)': 'tl', 'Tajik': 'tg', 'Tamil': 'ta', 'Telugu': 'te', 'Thai': 'th', 'Turkish': 'tr', 'Ukrainian': 'uk', 'Urdu': 'ur', 'Uzbek': 'uz', 'Vietnamese': 'vi', 'Welsh': 'cy', 'Xhosa': 'xh', 'Yiddish': 'yi', 'Yoruba': 'yo', 'Zulu': 'zu'}
def input_modifier(string):
"""
This function is applied to your text inputs before
they are fed into the model.
"""
if not params['activate']:
return string
return GoogleTranslator(source=params['language string'], target='en').translate(string)
def output_modifier(string):
"""
This function is applied to the model outputs.
"""
if not params['activate']:
return string
translated_str = GoogleTranslator(source='en', target=params['language string']).translate(html.unescape(string))
return html.escape(translated_str)
def bot_prefix_modifier(string):
"""
This function is only applied in chat mode. It modifies
the prefix text for the Bot and can be used to bias its
behavior.
"""
return string
def ui():
# Finding the language name from the language code to use as the default value
language_name = list(language_codes.keys())[list(language_codes.values()).index(params['language string'])]
# Gradio elements
with gr.Row():
activate = gr.Checkbox(value=params['activate'], label='Activate translation')
with gr.Row():
language = gr.Dropdown(value=language_name, choices=[k for k in language_codes], label='Language')
# Event functions to update the parameters in the backend
activate.change(lambda x: params.update({"activate": x}), activate, None)
language.change(lambda x: params.update({"language string": language_codes[x]}), language, None)
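The two hooks above simply round-trip the text through `deep_translator`. The same round trip can be sketched outside the webui (network access required, using the `deep-translator` version pinned above); the Japanese strings are illustrative:

```python
# Standalone sketch of the input/output round trip performed by this extension.
from deep_translator import GoogleTranslator

user_text = "これはテストです"                      # what the user types (params default is "ja")
to_model = GoogleTranslator(source="ja", target="en").translate(user_text)
print(to_model)                                     # English prompt that actually reaches the model

model_reply = "This is a test."
to_user = GoogleTranslator(source="en", target="ja").translate(model_reply)
print(to_user)                                      # translated reply shown back to the user
```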

View file

@ -0,0 +1,162 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/extensions/long_replies/script.py
import torch
from modules import chat, shared
from modules.text_generation import (
decode,
encode,
generate_reply,
)
from transformers import LogitsProcessor
import gradio as gr
params = {
"display_name": "Long replies",
"is_tab": False,
"min_length": 120,
}
initial_size = 0
class MyLogits(LogitsProcessor):
"""
Manipulates the probabilities for the next token before it gets sampled.
Used in the logits_processor_modifier function below.
"""
def __init__(self):
self.newline_id = shared.tokenizer.encode('\n')[-1]
pass
def __call__(self, input_ids, scores):
if input_ids.shape[-1] - initial_size < params["min_length"]:
scores[...,self.newline_id] = -1000
# scores[...,shared.tokenizer.eos_token_id] = -1000
# probs = torch.softmax(scores, dim=-1, dtype=torch.float)
# probs[0] /= probs[0].sum()
# scores = torch.log(probs / (1 - probs))
return scores
def history_modifier(history):
"""
Modifies the chat history.
Only used in chat mode.
"""
return history
def state_modifier(state):
"""
Modifies the state variable, which is a dictionary containing the input
values in the UI like sliders and checkboxes.
"""
return state
def chat_input_modifier(text, visible_text, state):
"""
Modifies the user input string in chat mode (visible_text).
You can also modify the internal representation of the user
input (text) to change how it will appear in the prompt.
"""
return text, visible_text
def input_modifier(string, state):
"""
In default/notebook modes, modifies the whole prompt.
In chat mode, it is the same as chat_input_modifier but only applied
to "text", here called "string", and not to "visible_text".
"""
return string
def bot_prefix_modifier(string, state):
"""
Modifies the prefix for the next bot reply in chat mode.
By default, the prefix will be something like "Bot Name:".
"""
return string
def tokenizer_modifier(state, prompt, input_ids, input_embeds):
"""
Modifies the input ids and embeds.
Used by the multimodal extension to put image embeddings in the prompt.
Only used by loaders that use the transformers library for sampling.
"""
global initial_size
initial_size = input_ids.shape[-1]
return prompt, input_ids, input_embeds
def logits_processor_modifier(processor_list, input_ids):
"""
Adds logits processors to the list, allowing you to access and modify
the next token probabilities.
Only used by loaders that use the transformers library for sampling.
"""
processor_list.append(MyLogits())
return processor_list
def output_modifier(string, state):
"""
Modifies the LLM output before it gets presented.
In chat mode, the modified version goes into history['visible'],
and the original version goes into history['internal'].
"""
return string
def custom_generate_chat_prompt(user_input, state, **kwargs):
"""
Replaces the function that generates the prompt from the chat history.
Only used in chat mode.
"""
result = chat.generate_chat_prompt(user_input, state, **kwargs)
return result
def custom_css():
"""
Returns a CSS string that gets appended to the CSS for the webui.
"""
return ''
def custom_js():
"""
Returns a javascript string that gets appended to the javascript
for the webui.
"""
return ''
def setup():
"""
Gets executed only once, when the extension is imported.
"""
pass
def ui():
"""
Gets executed when the UI is drawn. Custom gradio elements and
their corresponding event handlers should be defined here.
To learn about gradio components, check out the docs:
https://gradio.app/docs/
"""
min_length = gr.Slider(0, 800, step=10, value=params['min_length'], label='Minimum reply length')
min_length.change(lambda x: params.update({'min_length': x}), min_length, None)
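`MyLogits` here works by pushing the newline token's score down until `min_length` new tokens have been generated. The same idea can be exercised outside the webui with a plain `transformers` `LogitsProcessor`; the token id, prompt length and vocabulary size below are illustrative, not tied to any real tokenizer:

```python
# Toy analogue of MyLogits, runnable without the webui or a loaded model.
import torch
from transformers import LogitsProcessor

class MinLengthNewlineBan(LogitsProcessor):
    def __init__(self, newline_id: int, prompt_len: int, min_length: int):
        self.newline_id = newline_id
        self.prompt_len = prompt_len
        self.min_length = min_length

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # Ban the newline token while fewer than min_length new tokens have been generated.
        if input_ids.shape[-1] - self.prompt_len < self.min_length:
            scores[..., self.newline_id] = -1000
        return scores

proc = MinLengthNewlineBan(newline_id=13, prompt_len=4, min_length=120)
input_ids = torch.ones(1, 10, dtype=torch.long)  # 4 prompt tokens + 6 generated so far
scores = torch.zeros(1, 32000)                   # pretend 32k-token vocabulary
print(proc(input_ids, scores)[0, 13])            # tensor(-1000.) -> newline is still banned
```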

View file

@ -0,0 +1,6 @@
root ::= (expr "=" ws term "\n")+
expr ::= term ([-+*/] term)*
term ::= ident | num | "(" ws expr ")" ws
ident ::= [a-z] [a-z0-9_]* ws
num ::= [0-9]+ ws
ws ::= [ \t\n]*

View file

@ -0,0 +1,42 @@
root ::= (declaration)*
declaration ::= dataType identifier "(" parameter? ")" "{" statement* "}"
dataType ::= "int" ws | "float" ws | "char" ws
identifier ::= [a-zA-Z_] [a-zA-Z_0-9]*
parameter ::= dataType identifier
statement ::=
( dataType identifier ws "=" ws expression ";" ) |
( identifier ws "=" ws expression ";" ) |
( identifier ws "(" argList? ")" ";" ) |
( "return" ws expression ";" ) |
( "while" "(" condition ")" "{" statement* "}" ) |
( "for" "(" forInit ";" ws condition ";" ws forUpdate ")" "{" statement* "}" ) |
( "if" "(" condition ")" "{" statement* "}" ("else" "{" statement* "}")? ) |
( singleLineComment ) |
( multiLineComment )
forInit ::= dataType identifier ws "=" ws expression | identifier ws "=" ws expression
forUpdate ::= identifier ws "=" ws expression
condition ::= expression relationOperator expression
relationOperator ::= ("<=" | "<" | "==" | "!=" | ">=" | ">")
expression ::= term (("+" | "-") term)*
term ::= factor(("*" | "/") factor)*
factor ::= identifier | number | unaryTerm | funcCall | parenExpression
unaryTerm ::= "-" factor
funcCall ::= identifier "(" argList? ")"
parenExpression ::= "(" ws expression ws ")"
argList ::= expression ("," ws expression)*
number ::= [0-9]+
singleLineComment ::= "//" [^\n]* "\n"
multiLineComment ::= "/*" ( [^*] | ("*" [^/]) )* "*/"
ws ::= ([ \t\n]+)

View file

@ -0,0 +1,13 @@
# Specifies chess moves as a list in algebraic notation, using PGN conventions
# Force first move to "1. ", then any 1-2 digit number after, relying on model to follow the pattern
root ::= "1. " move " " move "\n" ([1-9] [0-9]? ". " move " " move "\n")+
move ::= (pawn | nonpawn | castle) [+#]?
# piece type, optional file/rank, optional capture, dest file & rank
nonpawn ::= [NBKQR] [a-h]? [1-8]? "x"? [a-h] [1-8]
# optional file & capture, dest file & rank, optional promotion
pawn ::= ([a-h] "x")? [a-h] [1-8] ("=" [NBKQR])?
castle ::= "O-O" "-O"?

View file

@ -0,0 +1,14 @@
root ::= object
object ::= "{" ws ( string ":" ws value ("," ws string ":" ws value)* )? "}"
value ::= object | array | string | number | ("true" | "false" | "null") ws
array ::= "[" ws ( value ("," ws value)* )? "]" ws
string ::= "\"" ( [a-zA-Z0-9] )* "\"" ws
number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
ws ::= ([ \t\n] ws)?

View file

@ -0,0 +1,14 @@
root ::= object
object ::= "{" ws ( string ":" ws value ("," ws string ":" ws value)* )? "}" ws
value ::= object | array | string | number | ("true" | "false" | "null") ws
array ::= "[" ws ( value ("," ws value)* )? "]" ws
string ::= "\"" ( [a-zA-Z0-9] )* "\"" ws
number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws
ws ::= ([ \t\n] ws)?

View file

@ -0,0 +1,2 @@
root ::= "1. " paragraph "\n" ([0-9] [0-9]? ". " paragraph "\n")+
paragraph ::= [a-zA-Z'.,; ]+

View file

@ -0,0 +1,7 @@
root ::= (expr "=" ws term "\n")+
expr ::= term ([-+*/] term)*
term ::= num | "(" ws expr ")" ws
num ::= [0-9]+ ws
ws ::= [ \t\n]*
# this is a comment

View file

@ -0,0 +1,373 @@
let main_parent = document.getElementById("chat-tab").parentNode;
let extensions = document.getElementById("extensions");
main_parent.childNodes[0].classList.add("header_bar");
main_parent.style = "padding: 0; margin: 0";
main_parent.parentNode.style = "gap: 0";
main_parent.parentNode.parentNode.style = "padding: 0";
document.querySelector(".header_bar").addEventListener("click", function(event) {
if (event.target.tagName === "BUTTON") {
const buttonText = event.target.textContent.trim();
let chat_visible = (buttonText == "Chat");
let default_visible = (buttonText == "Default");
let notebook_visible = (buttonText == "Notebook");
// Check if one of the generation tabs is visible
if (chat_visible || notebook_visible || default_visible) {
extensions && (extensions.style.display = "flex");
if (chat_visible) {
this.style.marginBottom = "0px";
extensions && (extensions.style.maxWidth = "880px");
extensions && (extensions.style.padding = "0px");
} else {
this.style.marginBottom = "19px";
extensions && (extensions.style.maxWidth = "none");
extensions && (extensions.style.padding = "15px");
}
} else {
this.style.marginBottom = "19px";
extensions && (extensions.style.display = "none");
}
}
});
//------------------------------------------------
// Keyboard shortcuts
//------------------------------------------------
document.addEventListener("keydown", function(event) {
// Stop generation on Esc pressed
if (event.key === "Escape") {
// Find the element with id 'stop' and click it
var stopButton = document.getElementById("stop");
if (stopButton) {
stopButton.click();
}
}
// Show chat controls on Ctrl + S
else if (event.ctrlKey && event.key == "s") {
event.preventDefault();
var showControlsElement = document.getElementById("show-controls");
if (showControlsElement && showControlsElement.childNodes.length >= 4) {
showControlsElement.childNodes[3].click();
var arr = document.getElementById("chat-input").childNodes[2].childNodes;
arr[arr.length - 1].focus();
}
}
// Regenerate on Ctrl + Enter
else if (event.ctrlKey && event.key === "Enter") {
event.preventDefault();
document.getElementById("Regenerate").click();
}
// Continue on Alt + Enter
else if (event.altKey && event.key === "Enter") {
event.preventDefault();
document.getElementById("Continue").click();
}
// Remove last on Ctrl + Shift + Backspace
else if (event.ctrlKey && event.shiftKey && event.key === "Backspace") {
event.preventDefault();
document.getElementById("Remove-last").click();
}
// Copy last on Ctrl + Shift + K
else if (event.ctrlKey && event.shiftKey && event.key === "K") {
event.preventDefault();
document.getElementById("Copy-last").click();
}
// Replace last on Ctrl + Shift + L
else if (event.ctrlKey && event.shiftKey && event.key === "L") {
event.preventDefault();
document.getElementById("Replace-last").click();
}
// Impersonate on Ctrl + Shift + M
else if (event.ctrlKey && event.shiftKey && event.key === "M") {
event.preventDefault();
document.getElementById("Impersonate").click();
}
});
//------------------------------------------------
// Position the chat typing dots
//------------------------------------------------
typing = document.getElementById("typing-container");
typingParent = typing.parentNode;
typingSibling = typing.previousElementSibling;
typingSibling.insertBefore(typing, typingSibling.childNodes[2]);
//------------------------------------------------
// Chat scrolling
//------------------------------------------------
const targetElement = document.getElementById("chat").parentNode.parentNode.parentNode;
targetElement.classList.add("pretty_scrollbar");
targetElement.classList.add("chat-parent");
let isScrolled = false;
targetElement.addEventListener("scroll", function() {
let diff = targetElement.scrollHeight - targetElement.clientHeight;
if(Math.abs(targetElement.scrollTop - diff) <= 10 || diff == 0) {
isScrolled = false;
} else {
isScrolled = true;
}
});
// Create a MutationObserver instance
const observer = new MutationObserver(function(mutations) {
mutations.forEach(function(mutation) {
updateCssProperties();
if(!isScrolled) {
targetElement.scrollTop = targetElement.scrollHeight;
}
const firstChild = targetElement.children[0];
if (firstChild.classList.contains("generating")) {
typing.parentNode.classList.add("visible-dots");
document.getElementById("stop").style.display = "flex";
document.getElementById("Generate").style.display = "none";
} else {
typing.parentNode.classList.remove("visible-dots");
document.getElementById("stop").style.display = "none";
document.getElementById("Generate").style.display = "flex";
}
});
});
// Configure the observer to watch for changes in the subtree and attributes
const config = {
childList: true,
subtree: true,
characterData: true,
attributeOldValue: true,
characterDataOldValue: true
};
// Start observing the target element
observer.observe(targetElement, config);
//------------------------------------------------
// Add some scrollbars
//------------------------------------------------
const textareaElements = document.querySelectorAll(".add_scrollbar textarea");
for(i = 0; i < textareaElements.length; i++) {
textareaElements[i].classList.remove("scroll-hide");
textareaElements[i].classList.add("pretty_scrollbar");
textareaElements[i].style.resize = "none";
}
//------------------------------------------------
// Remove some backgrounds
//------------------------------------------------
const noBackgroundelements = document.querySelectorAll(".no-background");
for(i = 0; i < noBackgroundelements.length; i++) {
noBackgroundelements[i].parentNode.style.border = "none";
noBackgroundelements[i].parentNode.parentNode.parentNode.style.alignItems = "center";
}
const slimDropdownElements = document.querySelectorAll(".slim-dropdown");
for (i = 0; i < slimDropdownElements.length; i++) {
const parentNode = slimDropdownElements[i].parentNode;
parentNode.style.background = "transparent";
parentNode.style.border = "0";
}
//------------------------------------------------
// Create the hover menu in the chat tab
// The show/hide events were adapted from:
// https://github.com/SillyTavern/SillyTavern/blob/6c8bd06308c69d51e2eb174541792a870a83d2d6/public/script.js
//------------------------------------------------
var buttonsInChat = document.querySelectorAll("#chat-tab:not(.old-ui) #chat-buttons button");
var button = document.getElementById("hover-element-button");
var menu = document.getElementById("hover-menu");
var istouchscreen = (navigator.maxTouchPoints > 0) || "ontouchstart" in document.documentElement;
function showMenu() {
menu.style.display = "flex"; // Show the menu
}
function hideMenu() {
menu.style.display = "none"; // Hide the menu
if (!istouchscreen) {
document.querySelector("#chat-input textarea").focus(); // Focus on the chat input
}
}
if (buttonsInChat.length > 0) {
for (let i = buttonsInChat.length - 1; i >= 0; i--) {
const thisButton = buttonsInChat[i];
menu.appendChild(thisButton);
thisButton.addEventListener("click", () => {
hideMenu();
});
const buttonText = thisButton.textContent;
const matches = buttonText.match(/(\(.*?\))/);
if (matches && matches.length > 1) {
// Apply the transparent-substring class to the matched substring
const substring = matches[1];
const newText = buttonText.replace(substring, `&nbsp;<span class="transparent-substring">${substring.slice(1, -1)}</span>`);
thisButton.innerHTML = newText;
}
}
} else {
buttonsInChat = document.querySelectorAll("#chat-tab.old-ui #chat-buttons button");
for (let i = 0; i < buttonsInChat.length; i++) {
buttonsInChat[i].textContent = buttonsInChat[i].textContent.replace(/ \(.*?\)/, "");
}
document.getElementById("gr-hover-container").style.display = "none";
}
function isMouseOverButtonOrMenu() {
return menu.matches(":hover") || button.matches(":hover");
}
button.addEventListener("mouseenter", function () {
if (!istouchscreen) {
showMenu();
}
});
button.addEventListener("click", function () {
if (menu.style.display === "flex") {
hideMenu();
}
else {
showMenu();
}
});
// Add event listener for mouseleave on the button
button.addEventListener("mouseleave", function () {
// Delay to prevent menu hiding when the mouse leaves the button into the menu
setTimeout(function () {
if (!isMouseOverButtonOrMenu()) {
hideMenu();
}
}, 100);
});
// Add event listener for mouseleave on the menu
menu.addEventListener("mouseleave", function () {
// Delay to prevent menu hide when the mouse leaves the menu into the button
setTimeout(function () {
if (!isMouseOverButtonOrMenu()) {
hideMenu();
}
}, 100);
});
// Add event listener for click anywhere in the document
document.addEventListener("click", function (event) {
// Check if the click is outside the button/menu and the menu is visible
if (!isMouseOverButtonOrMenu() && menu.style.display === "flex") {
hideMenu();
}
if (event.target.classList.contains("pfp_character")) {
toggleBigPicture();
}
});
//------------------------------------------------
// Relocate the "Show controls" checkbox
//------------------------------------------------
var elementToMove = document.getElementById("show-controls");
var parent = elementToMove.parentNode;
for (var i = 0; i < 2; i++) {
parent = parent.parentNode;
}
parent.insertBefore(elementToMove, parent.firstChild);
//------------------------------------------------
// Make the chat input grow upwards instead of downwards
//------------------------------------------------
document.getElementById("show-controls").parentNode.style.position = "absolute";
document.getElementById("show-controls").parentNode.style.bottom = "0px";
//------------------------------------------------
// Focus on the chat input
//------------------------------------------------
document.querySelector("#chat-input textarea").focus();
//------------------------------------------------
// Show enlarged character picture when the profile
// picture is clicked on
//------------------------------------------------
let bigPictureVisible = false;
function addBigPicture() {
var imgElement = document.createElement("img");
var timestamp = new Date().getTime();
imgElement.src = "/file/cache/pfp_character.png?time=" + timestamp;
imgElement.classList.add("bigProfilePicture");
var imgElementParent = document.getElementById("chat").parentNode.parentNode.parentNode.parentNode.parentNode.parentNode.parentNode;
imgElementParent.appendChild(imgElement);
}
function deleteBigPicture() {
var bigProfilePictures = document.querySelectorAll(".bigProfilePicture");
bigProfilePictures.forEach(function (element) {
element.parentNode.removeChild(element);
});
}
function toggleBigPicture() {
if(bigPictureVisible) {
deleteBigPicture();
bigPictureVisible = false;
} else {
addBigPicture();
bigPictureVisible = true;
}
}
//------------------------------------------------
// Define global CSS properties for resizing and
// positioning certain elements
//------------------------------------------------
let currentChatInputHeight = 0;
function updateCssProperties() {
// Set the height of the chat area
const chatContainer = document.getElementById("chat").parentNode.parentNode.parentNode;
const chatInputHeight = document.querySelector("#chat-input textarea").clientHeight;
if (chatContainer.clientHeight > 0) {
const newChatHeight = `${chatContainer.clientHeight - chatInputHeight + 40}px`;
document.documentElement.style.setProperty("--chat-height", newChatHeight);
document.documentElement.style.setProperty("--input-delta", `${chatInputHeight - 40}px`);
// Set the position offset of the chat input box
const header = document.querySelector(".header_bar");
const headerHeight = `${header.clientHeight}px`;
document.documentElement.style.setProperty("--header-height", headerHeight);
// Offset the scroll position of the chat area
if (chatInputHeight !== currentChatInputHeight) {
chatContainer.scrollTop += chatInputHeight > currentChatInputHeight ? chatInputHeight : -chatInputHeight;
currentChatInputHeight = chatInputHeight;
}
}
}
new ResizeObserver(updateCssProperties)
.observe(document.querySelector("#chat-input textarea"));
window.addEventListener("resize", updateCssProperties);

View file

@ -0,0 +1,40 @@
// Functions for downloading JSON files
function getCurrentTimestamp() {
const now = new Date();
const timezoneOffset = now.getTimezoneOffset() * 60000; // Convert to milliseconds
const localTime = new Date(now.getTime() - timezoneOffset);
const formattedTimestamp = localTime.toISOString().replace(/[-:]/g, "").slice(0, 15);
return formattedTimestamp;
}
function saveFile(contents, filename) {
const element = document.createElement("a");
element.setAttribute("href", "data:text/plain;charset=utf-8," + encodeURIComponent(contents));
element.setAttribute("download", filename);
element.style.display = "none";
document.body.appendChild(element);
element.click();
document.body.removeChild(element);
}
function saveHistory(history, character, mode) {
let path = null;
if (["chat", "chat-instruct"].includes(mode) && character && character.trim() !== "") {
path = `history_${character}_${getCurrentTimestamp()}.json`;
} else {
try {
path = `history_${mode}_${getCurrentTimestamp()}.json`;
} catch (error) {
path = `history_${getCurrentTimestamp()}.json`;
}
}
saveFile(history, path);
}
function saveSession(session) {
let path = null;
path = `session_${getCurrentTimestamp()}.json`;
saveFile(session, path);
}

View file

@ -0,0 +1,30 @@
const belowChatInput = document.querySelectorAll("#chat-tab > div > :nth-child(n+2), #extensions");
const chatParent = document.querySelector(".chat-parent");
function toggle_controls(value) {
if (value) {
belowChatInput.forEach(element => {
element.style.display = "inherit";
});
chatParent.classList.remove("bigchat");
document.getElementById("chat-input-row").classList.remove("bigchat");
document.getElementById("chat-col").classList.remove("bigchat");
document.getElementById("chat-tab").style.paddingBottom = "";
let gallery_element = document.getElementById("gallery-extension");
if (gallery_element) {
gallery_element.style.display = "block";
}
} else {
belowChatInput.forEach(element => {
element.style.display = "none";
});
chatParent.classList.add("bigchat");
document.getElementById("chat-input-row").classList.add("bigchat");
document.getElementById("chat-col").classList.add("bigchat");
document.getElementById("chat-tab").style.paddingBottom = "0px";
}
}

View file

@ -0,0 +1,59 @@
let chat_tab = document.getElementById("chat-tab");
let main_parent = chat_tab.parentNode;
function scrollToTop() {
window.scrollTo({
top: 0,
// behavior: 'smooth'
});
}
function findButtonsByText(buttonText) {
const buttons = document.getElementsByTagName("button");
const matchingButtons = [];
buttonText = buttonText.trim();
for (let i = 0; i < buttons.length; i++) {
const button = buttons[i];
const buttonInnerText = button.textContent.trim();
if (buttonInnerText === buttonText) {
matchingButtons.push(button);
}
}
return matchingButtons;
}
function switch_to_chat() {
let chat_tab_button = main_parent.childNodes[0].childNodes[1];
chat_tab_button.click();
scrollToTop();
}
function switch_to_default() {
let default_tab_button = main_parent.childNodes[0].childNodes[4];
default_tab_button.click();
scrollToTop();
}
function switch_to_notebook() {
let notebook_tab_button = main_parent.childNodes[0].childNodes[7];
notebook_tab_button.click();
findButtonsByText("Raw")[1].click();
scrollToTop();
}
function switch_to_generation_parameters() {
let parameters_tab_button = main_parent.childNodes[0].childNodes[10];
parameters_tab_button.click();
findButtonsByText("Generation")[0].click();
scrollToTop();
}
function switch_to_character() {
let parameters_tab_button = main_parent.childNodes[0].childNodes[10];
parameters_tab_button.click();
findButtonsByText("Character")[0].click();
scrollToTop();
}

View file

@ -0,0 +1,7 @@
function updateBigPicture() {
var existingElement = document.querySelector(".bigProfilePicture");
if (existingElement) {
var timestamp = new Date().getTime();
existingElement.src = "/file/cache/pfp_character.png?time=" + timestamp;
}
}

View file

@ -0,0 +1,37 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/RoPE.py
def get_alpha_value(alpha, base):
'''
Gets alpha_value from alpha_value and rope_freq_base
'''
if base > 0:
return (base/10000.) ** (63/64.)
else:
return alpha
def get_rope_freq_base(alpha, base):
'''
Gets rope_freq_base from alpha_value and rope_freq_base
'''
if base > 0:
return base
else:
return 10000 * alpha ** (64/63.)
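These two helpers are inverse conversions: `rope_freq_base = 10000 * alpha ** (64/63)`, so converting an alpha value to a frequency base and back recovers the original value. A quick round-trip check (assuming the webui's `modules` package is importable; the printed numbers are approximate):

```python
# Round-trip check of the alpha_value <-> rope_freq_base conversion above.
from modules.RoPE import get_alpha_value, get_rope_freq_base

base = get_rope_freq_base(alpha=2.0, base=0)  # base == 0 means "derive it from alpha"
print(base)                                   # ~20221.3, i.e. 10000 * 2 ** (64/63)

alpha = get_alpha_value(alpha=0, base=base)   # a non-zero base takes precedence
print(alpha)                                  # ~2.0, the original alpha is recovered
```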

View file

@ -0,0 +1,66 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/block_requests.py
import builtins
import io
import requests
from modules.logging_colors import logger
original_open = open
original_get = requests.get
class RequestBlocker:
def __enter__(self):
requests.get = my_get
def __exit__(self, exc_type, exc_value, traceback):
requests.get = original_get
class OpenMonkeyPatch:
def __enter__(self):
builtins.open = my_open
def __exit__(self, exc_type, exc_value, traceback):
builtins.open = original_open
def my_get(url, **kwargs):
logger.info('Unwanted HTTP request redirected to localhost :)')
kwargs.setdefault('allow_redirects', True)
return requests.api.request('get', 'http://127.0.0.1/', **kwargs)
# Kindly provided by our friend WizardLM-30B
def my_open(*args, **kwargs):
filename = str(args[0])
if filename.endswith('index.html'):
with original_open(*args, **kwargs) as f:
file_contents = f.read()
file_contents = file_contents.replace(b'\t\t<script\n\t\t\tsrc="https://cdnjs.cloudflare.com/ajax/libs/iframe-resizer/4.3.7/iframeResizer.contentWindow.min.js"\n\t\t\tasync\n\t\t></script>', b'')
file_contents = file_contents.replace(b'cdnjs.cloudflare.com', b'127.0.0.1')
return io.BytesIO(file_contents)
else:
return original_open(*args, **kwargs)
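`RequestBlocker` and `OpenMonkeyPatch` are ordinary context managers that temporarily swap out `requests.get` and `builtins.open` and restore the originals on exit. A sketch of the intended usage (the wrapped import is illustrative; any `requests.get` issued inside the block is redirected to `127.0.0.1`):

```python
# Sketch only: silence HTTP calls made at import time, then restore normal behaviour.
import requests

from modules.block_requests import RequestBlocker

with RequestBlocker():
    import gradio  # e.g. a package that phones home when imported

resp = requests.get("https://example.com")  # outside the block, requests.get is the original again
print(resp.status_code)
```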

View file

@ -0,0 +1,122 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/callbacks.py
import gc
import traceback
from queue import Queue
from threading import Thread
import torch
import transformers
from transformers import is_torch_xpu_available
import modules.shared as shared
class StopNowException(Exception):
pass
class _StopEverythingStoppingCriteria(transformers.StoppingCriteria):
def __init__(self):
transformers.StoppingCriteria.__init__(self)
def __call__(self, input_ids: torch.LongTensor, _scores: torch.FloatTensor) -> bool:
return shared.stop_everything
class Stream(transformers.StoppingCriteria):
def __init__(self, callback_func=None):
self.callback_func = callback_func
def __call__(self, input_ids, scores) -> bool:
if self.callback_func is not None:
self.callback_func(input_ids[0])
return False
class Iteratorize:
"""
Transforms a function that takes a callback
into a lazy iterator (generator).
Adapted from: https://stackoverflow.com/a/9969000
"""
def __init__(self, func, args=None, kwargs=None, callback=None):
self.mfunc = func
self.c_callback = callback
self.q = Queue()
self.sentinel = object()
self.args = args or []
self.kwargs = kwargs or {}
self.stop_now = False
def _callback(val):
if self.stop_now or shared.stop_everything:
raise StopNowException
self.q.put(val)
def gentask():
try:
ret = self.mfunc(callback=_callback, *args, **self.kwargs)
except StopNowException:
pass
except:
traceback.print_exc()
pass
clear_torch_cache()
self.q.put(self.sentinel)
if self.c_callback:
self.c_callback(ret)
self.thread = Thread(target=gentask)
self.thread.start()
def __iter__(self):
return self
def __next__(self):
obj = self.q.get(True, None)
if obj is self.sentinel:
raise StopIteration
else:
return obj
def __del__(self):
clear_torch_cache()
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.stop_now = True
clear_torch_cache()
def clear_torch_cache():
gc.collect()
if not shared.args.cpu:
if is_torch_xpu_available():
torch.xpu.empty_cache()
else:
torch.cuda.empty_cache()
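`Iteratorize` is what turns a blocking, callback-based generate call into a token stream the UI can iterate over. A hypothetical sketch of the pattern, where `generate_with_callback` stands in for the real model call:

```python
# Hypothetical sketch; generate_with_callback is a stand-in for model.generate
# wrapped with a Stream(callback_func=...) stopping criterion.
from modules.callbacks import Iteratorize

def generate_with_callback(callback=None, **kwargs):
    for token in ["Hello", ",", " world"]:
        callback(token)  # the Stream criterion would call this once per new token

with Iteratorize(generate_with_callback, args=[], kwargs={}) as generator:
    for token in generator:
        print(token, end="", flush=True)
```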
@ -0,0 +1,881 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/chat.py
import base64
import copy
import functools
import html
import json
import re
from datetime import datetime
from functools import partial
from pathlib import Path
import gradio as gr
import yaml
from jinja2.sandbox import ImmutableSandboxedEnvironment
from PIL import Image
import modules.shared as shared
from modules.extensions import apply_extensions
from modules.html_generator import chat_html_wrapper, make_thumbnail
from modules.logging_colors import logger
from modules.text_generation import (
generate_reply,
get_encoded_length,
get_max_prompt_length
)
from modules.utils import delete_file, get_available_characters, save_file
# Copied from the Transformers library
jinja_env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
def str_presenter(dumper, data):
"""
Copied from https://github.com/yaml/pyyaml/issues/240
Makes pyyaml output prettier multiline strings.
"""
if data.count('\n') > 0:
return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
return dumper.represent_scalar('tag:yaml.org,2002:str', data)
yaml.add_representer(str, str_presenter)
yaml.representer.SafeRepresenter.add_representer(str, str_presenter)
def get_generation_prompt(renderer, impersonate=False, strip_trailing_spaces=True):
'''
Given a Jinja template, reverse-engineers the prefix and the suffix for
an assistant message (if impersonate=False) or a user message
(if impersonate=True)
'''
if impersonate:
messages = [
{"role": "user", "content": "<<|user-message-1|>>"},
{"role": "user", "content": "<<|user-message-2|>>"},
]
else:
messages = [
{"role": "assistant", "content": "<<|user-message-1|>>"},
{"role": "assistant", "content": "<<|user-message-2|>>"},
]
prompt = renderer(messages=messages)
suffix_plus_prefix = prompt.split("<<|user-message-1|>>")[1].split("<<|user-message-2|>>")[0]
suffix = prompt.split("<<|user-message-2|>>")[1]
prefix = suffix_plus_prefix[len(suffix):]
if strip_trailing_spaces:
prefix = prefix.rstrip(' ')
return prefix, suffix
def generate_chat_prompt(user_input, state, **kwargs):
impersonate = kwargs.get('impersonate', False)
_continue = kwargs.get('_continue', False)
also_return_rows = kwargs.get('also_return_rows', False)
history = kwargs.get('history', state['history'])['internal']
# Templates
chat_template = jinja_env.from_string(state['chat_template_str'])
instruction_template = jinja_env.from_string(state['instruction_template_str'])
chat_renderer = partial(chat_template.render, add_generation_prompt=False, name1=state['name1'], name2=state['name2'])
instruct_renderer = partial(instruction_template.render, add_generation_prompt=False)
messages = []
if state['mode'] == 'instruct':
renderer = instruct_renderer
if state['custom_system_message'].strip() != '':
messages.append({"role": "system", "content": state['custom_system_message']})
else:
renderer = chat_renderer
if state['context'].strip() != '':
context = replace_character_names(state['context'], state['name1'], state['name2'])
messages.append({"role": "system", "content": context})
insert_pos = len(messages)
for user_msg, assistant_msg in reversed(history):
user_msg = user_msg.strip()
assistant_msg = assistant_msg.strip()
if assistant_msg:
messages.insert(insert_pos, {"role": "assistant", "content": assistant_msg})
if user_msg not in ['', '<|BEGIN-VISIBLE-CHAT|>']:
messages.insert(insert_pos, {"role": "user", "content": user_msg})
user_input = user_input.strip()
if user_input and not impersonate and not _continue:
messages.append({"role": "user", "content": user_input})
def remove_extra_bos(prompt):
for bos_token in ['<s>', '<|startoftext|>']:
while prompt.startswith(bos_token):
prompt = prompt[len(bos_token):]
return prompt
def make_prompt(messages):
if state['mode'] == 'chat-instruct' and _continue:
prompt = renderer(messages=messages[:-1])
else:
prompt = renderer(messages=messages)
if state['mode'] == 'chat-instruct':
outer_messages = []
if state['custom_system_message'].strip() != '':
outer_messages.append({"role": "system", "content": state['custom_system_message']})
prompt = remove_extra_bos(prompt)
command = state['chat-instruct_command']
command = command.replace('<|character|>', state['name2'] if not impersonate else state['name1'])
command = command.replace('<|prompt|>', prompt)
if _continue:
prefix = get_generation_prompt(renderer, impersonate=impersonate, strip_trailing_spaces=False)[0]
prefix += messages[-1]["content"]
else:
prefix = get_generation_prompt(renderer, impersonate=impersonate)[0]
if not impersonate:
prefix = apply_extensions('bot_prefix', prefix, state)
outer_messages.append({"role": "user", "content": command})
outer_messages.append({"role": "assistant", "content": prefix})
prompt = instruction_template.render(messages=outer_messages)
suffix = get_generation_prompt(instruct_renderer, impersonate=False)[1]
prompt = prompt[:-len(suffix)]
else:
if _continue:
suffix = get_generation_prompt(renderer, impersonate=impersonate)[1]
prompt = prompt[:-len(suffix)]
else:
prefix = get_generation_prompt(renderer, impersonate=impersonate)[0]
if state['mode'] == 'chat' and not impersonate:
prefix = apply_extensions('bot_prefix', prefix, state)
prompt += prefix
prompt = remove_extra_bos(prompt)
return prompt
prompt = make_prompt(messages)
# Handle truncation
max_length = get_max_prompt_length(state)
while len(messages) > 0 and get_encoded_length(prompt) > max_length:
# Try to save the system message
if len(messages) > 1 and messages[0]['role'] == 'system':
messages.pop(1)
else:
messages.pop(0)
prompt = make_prompt(messages)
if also_return_rows:
return prompt, [message['content'] for message in messages]
else:
return prompt
def get_stopping_strings(state):
stopping_strings = []
renderers = []
if state['mode'] in ['instruct', 'chat-instruct']:
template = jinja_env.from_string(state['instruction_template_str'])
renderer = partial(template.render, add_generation_prompt=False)
renderers.append(renderer)
if state['mode'] in ['chat', 'chat-instruct']:
template = jinja_env.from_string(state['chat_template_str'])
renderer = partial(template.render, add_generation_prompt=False, name1=state['name1'], name2=state['name2'])
renderers.append(renderer)
for renderer in renderers:
prefix_bot, suffix_bot = get_generation_prompt(renderer, impersonate=False)
prefix_user, suffix_user = get_generation_prompt(renderer, impersonate=True)
stopping_strings += [
suffix_user + prefix_bot,
suffix_user + prefix_user,
suffix_bot + prefix_bot,
suffix_bot + prefix_user,
]
if 'stopping_strings' in state and isinstance(state['stopping_strings'], list):
stopping_strings += state.pop('stopping_strings')
return list(set(stopping_strings))
def chatbot_wrapper(text, state, regenerate=False, _continue=False, loading_message=True, for_ui=False):
history = state['history']
output = copy.deepcopy(history)
output = apply_extensions('history', output)
state = apply_extensions('state', state)
visible_text = None
stopping_strings = get_stopping_strings(state)
is_stream = state['stream']
# Prepare the input
if not (regenerate or _continue):
visible_text = html.escape(text)
# Apply extensions
text, visible_text = apply_extensions('chat_input', text, visible_text, state)
text = apply_extensions('input', text, state, is_chat=True)
output['internal'].append([text, ''])
output['visible'].append([visible_text, ''])
# *Is typing...*
if loading_message:
yield {
'visible': output['visible'][:-1] + [[output['visible'][-1][0], shared.processing_message]],
'internal': output['internal']
}
else:
text, visible_text = output['internal'][-1][0], output['visible'][-1][0]
if regenerate:
if loading_message:
yield {
'visible': output['visible'][:-1] + [[visible_text, shared.processing_message]],
'internal': output['internal'][:-1] + [[text, '']]
}
elif _continue:
last_reply = [output['internal'][-1][1], output['visible'][-1][1]]
if loading_message:
yield {
'visible': output['visible'][:-1] + [[visible_text, last_reply[1] + '...']],
'internal': output['internal']
}
if shared.model_name == 'None' or shared.model is None:
raise ValueError("No model is loaded! Select one in the Model tab.")
# Generate the prompt
kwargs = {
'_continue': _continue,
'history': output if _continue else {k: v[:-1] for k, v in output.items()}
}
prompt = apply_extensions('custom_generate_chat_prompt', text, state, **kwargs)
if prompt is None:
prompt = generate_chat_prompt(text, state, **kwargs)
# Generate
reply = None
for j, reply in enumerate(generate_reply(prompt, state, stopping_strings=stopping_strings, is_chat=True, for_ui=for_ui)):
# Extract the reply
visible_reply = reply
if state['mode'] in ['chat', 'chat-instruct']:
visible_reply = re.sub("(<USER>|<user>|{{user}})", state['name1'], reply)
visible_reply = html.escape(visible_reply)
if shared.stop_everything:
output['visible'][-1][1] = apply_extensions('output', output['visible'][-1][1], state, is_chat=True)
yield output
return
if _continue:
output['internal'][-1] = [text, last_reply[0] + reply]
output['visible'][-1] = [visible_text, last_reply[1] + visible_reply]
if is_stream:
yield output
elif not (j == 0 and visible_reply.strip() == ''):
output['internal'][-1] = [text, reply.lstrip(' ')]
output['visible'][-1] = [visible_text, visible_reply.lstrip(' ')]
if is_stream:
yield output
output['visible'][-1][1] = apply_extensions('output', output['visible'][-1][1], state, is_chat=True)
yield output
def impersonate_wrapper(text, state):
static_output = chat_html_wrapper(state['history'], state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu'])
if shared.model_name == 'None' or shared.model is None:
logger.error("No model is loaded! Select one in the Model tab.")
yield '', static_output
return
prompt = generate_chat_prompt('', state, impersonate=True)
stopping_strings = get_stopping_strings(state)
yield text + '...', static_output
reply = None
for reply in generate_reply(prompt + text, state, stopping_strings=stopping_strings, is_chat=True):
yield (text + reply).lstrip(' '), static_output
if shared.stop_everything:
return
def generate_chat_reply(text, state, regenerate=False, _continue=False, loading_message=True, for_ui=False):
history = state['history']
if regenerate or _continue:
text = ''
if (len(history['visible']) == 1 and not history['visible'][0][0]) or len(history['internal']) == 0:
yield history
return
for history in chatbot_wrapper(text, state, regenerate=regenerate, _continue=_continue, loading_message=loading_message, for_ui=for_ui):
yield history
def character_is_loaded(state, raise_exception=False):
if state['mode'] in ['chat', 'chat-instruct'] and state['name2'] == '':
logger.error('It looks like no character is loaded. Please load one under Parameters > Character.')
if raise_exception:
raise ValueError
return False
else:
return True
def generate_chat_reply_wrapper(text, state, regenerate=False, _continue=False):
'''
Same as above but returns HTML for the UI
'''
if not character_is_loaded(state):
return
if state['start_with'] != '' and not _continue:
if regenerate:
text, state['history'] = remove_last_message(state['history'])
regenerate = False
_continue = True
send_dummy_message(text, state)
send_dummy_reply(state['start_with'], state)
for i, history in enumerate(generate_chat_reply(text, state, regenerate, _continue, loading_message=True, for_ui=True)):
yield chat_html_wrapper(history, state['name1'], state['name2'], state['mode'], state['chat_style'], state['character_menu']), history
def remove_last_message(history):
if len(history['visible']) > 0 and history['internal'][-1][0] != '<|BEGIN-VISIBLE-CHAT|>':
last = history['visible'].pop()
history['internal'].pop()
else:
last = ['', '']
return html.unescape(last[0]), history
def send_last_reply_to_input(history):
if len(history['visible']) > 0:
return html.unescape(history['visible'][-1][1])
else:
return ''
def replace_last_reply(text, state):
history = state['history']
if len(text.strip()) == 0:
return history
elif len(history['visible']) > 0:
history['visible'][-1][1] = html.escape(text)
history['internal'][-1][1] = apply_extensions('input', text, state, is_chat=True)
return history
def send_dummy_message(text, state):
history = state['history']
history['visible'].append([html.escape(text), ''])
history['internal'].append([apply_extensions('input', text, state, is_chat=True), ''])
return history
def send_dummy_reply(text, state):
history = state['history']
if len(history['visible']) > 0 and not history['visible'][-1][1] == '':
history['visible'].append(['', ''])
history['internal'].append(['', ''])
history['visible'][-1][1] = html.escape(text)
history['internal'][-1][1] = apply_extensions('input', text, state, is_chat=True)
return history
def redraw_html(history, name1, name2, mode, style, character, reset_cache=False):
return chat_html_wrapper(history, name1, name2, mode, style, character, reset_cache=reset_cache)
def start_new_chat(state):
mode = state['mode']
history = {'internal': [], 'visible': []}
if mode != 'instruct':
greeting = replace_character_names(state['greeting'], state['name1'], state['name2'])
if greeting != '':
history['internal'] += [['<|BEGIN-VISIBLE-CHAT|>', greeting]]
history['visible'] += [['', apply_extensions('output', greeting, state, is_chat=True)]]
unique_id = datetime.now().strftime('%Y%m%d-%H-%M-%S')
save_history(history, unique_id, state['character_menu'], state['mode'])
return history
def get_history_file_path(unique_id, character, mode):
if mode == 'instruct':
p = Path(f'logs/instruct/{unique_id}.json')
else:
p = Path(f'logs/chat/{character}/{unique_id}.json')
return p
def save_history(history, unique_id, character, mode):
if shared.args.multi_user:
return
p = get_history_file_path(unique_id, character, mode)
if not p.parent.is_dir():
p.parent.mkdir(parents=True)
with open(p, 'w', encoding='utf-8') as f:
f.write(json.dumps(history, indent=4))
def rename_history(old_id, new_id, character, mode):
if shared.args.multi_user:
return
old_p = get_history_file_path(old_id, character, mode)
new_p = get_history_file_path(new_id, character, mode)
if new_p.parent != old_p.parent:
logger.error(f"The following path is not allowed: {new_p}.")
elif new_p == old_p:
logger.info("The provided path is identical to the old one.")
else:
logger.info(f"Renaming {old_p} to {new_p}")
old_p.rename(new_p)
def find_all_histories(state):
if shared.args.multi_user:
return ['']
if state['mode'] == 'instruct':
paths = Path('logs/instruct').glob('*.json')
else:
character = state['character_menu']
# Handle obsolete filenames and paths
old_p = Path(f'logs/{character}_persistent.json')
new_p = Path(f'logs/persistent_{character}.json')
if old_p.exists():
logger.warning(f"Renaming {old_p} to {new_p}")
old_p.rename(new_p)
if new_p.exists():
unique_id = datetime.now().strftime('%Y%m%d-%H-%M-%S')
p = get_history_file_path(unique_id, character, state['mode'])
logger.warning(f"Moving {new_p} to {p}")
p.parent.mkdir(exist_ok=True)
new_p.rename(p)
paths = Path(f'logs/chat/{character}').glob('*.json')
histories = sorted(paths, key=lambda x: x.stat().st_mtime, reverse=True)
histories = [path.stem for path in histories]
return histories
def load_latest_history(state):
'''
Loads the latest history for the given character in chat or chat-instruct
mode, or the latest instruct history for instruct mode.
'''
if shared.args.multi_user:
return start_new_chat(state)
histories = find_all_histories(state)
if len(histories) > 0:
history = load_history(histories[0], state['character_menu'], state['mode'])
else:
history = start_new_chat(state)
return history
def load_history(unique_id, character, mode):
p = get_history_file_path(unique_id, character, mode)
f = json.loads(open(p, 'rb').read())
if 'internal' in f and 'visible' in f:
history = f
else:
history = {
'internal': f['data'],
'visible': f['data_visible']
}
return history
def load_history_json(file, history):
try:
file = file.decode('utf-8')
f = json.loads(file)
if 'internal' in f and 'visible' in f:
history = f
else:
history = {
'internal': f['data'],
'visible': f['data_visible']
}
return history
except:
return history
def delete_history(unique_id, character, mode):
p = get_history_file_path(unique_id, character, mode)
delete_file(p)
def replace_character_names(text, name1, name2):
text = text.replace('{{user}}', name1).replace('{{char}}', name2)
return text.replace('<USER>', name1).replace('<BOT>', name2)
def generate_pfp_cache(character):
cache_folder = Path(shared.args.disk_cache_dir)
if not cache_folder.exists():
cache_folder.mkdir()
for path in [Path(f"characters/{character}.{extension}") for extension in ['png', 'jpg', 'jpeg']]:
if path.exists():
original_img = Image.open(path)
original_img.save(Path(f'{cache_folder}/pfp_character.png'), format='PNG')
thumb = make_thumbnail(original_img)
thumb.save(Path(f'{cache_folder}/pfp_character_thumb.png'), format='PNG')
return thumb
return None
def load_character(character, name1, name2):
context = greeting = ""
greeting_field = 'greeting'
picture = None
filepath = None
for extension in ["yml", "yaml", "json"]:
filepath = Path(f'characters/{character}.{extension}')
if filepath.exists():
break
if filepath is None or not filepath.exists():
logger.error(f"Could not find the character \"{character}\" inside characters/. No character has been loaded.")
raise ValueError
file_contents = open(filepath, 'r', encoding='utf-8').read()
data = json.loads(file_contents) if extension == "json" else yaml.safe_load(file_contents)
cache_folder = Path(shared.args.disk_cache_dir)
for path in [Path(f"{cache_folder}/pfp_character.png"), Path(f"{cache_folder}/pfp_character_thumb.png")]:
if path.exists():
path.unlink()
picture = generate_pfp_cache(character)
# Finding the bot's name
for k in ['name', 'bot', '<|bot|>', 'char_name']:
if k in data and data[k] != '':
name2 = data[k]
break
# Find the user name (if any)
for k in ['your_name', 'user', '<|user|>']:
if k in data and data[k] != '':
name1 = data[k]
break
if 'context' in data:
context = data['context'].strip()
elif "char_persona" in data:
context = build_pygmalion_style_context(data)
greeting_field = 'char_greeting'
greeting = data.get(greeting_field, greeting)
return name1, name2, picture, greeting, context
def load_instruction_template(template):
for filepath in [Path(f'instruction-templates/{template}.yaml'), Path('instruction-templates/Alpaca.yaml')]:
if filepath.exists():
break
else:
return ''
file_contents = open(filepath, 'r', encoding='utf-8').read()
data = yaml.safe_load(file_contents)
if 'instruction_template' in data:
return data['instruction_template']
else:
return jinja_template_from_old_format(data)
@functools.cache
def load_character_memoized(character, name1, name2):
return load_character(character, name1, name2)
@functools.cache
def load_instruction_template_memoized(template):
return load_instruction_template(template)
def upload_character(file, img, tavern=False):
decoded_file = file if isinstance(file, str) else file.decode('utf-8')
try:
data = json.loads(decoded_file)
except:
data = yaml.safe_load(decoded_file)
if 'char_name' in data:
name = data['char_name']
greeting = data['char_greeting']
context = build_pygmalion_style_context(data)
yaml_data = generate_character_yaml(name, greeting, context)
else:
name = data['name']
yaml_data = generate_character_yaml(data['name'], data['greeting'], data['context'])
outfile_name = name
i = 1
while Path(f'characters/{outfile_name}.yaml').exists():
outfile_name = f'{name}_{i:03d}'
i += 1
with open(Path(f'characters/{outfile_name}.yaml'), 'w', encoding='utf-8') as f:
f.write(yaml_data)
if img is not None:
img.save(Path(f'characters/{outfile_name}.png'))
logger.info(f'New character saved to "characters/{outfile_name}.yaml".')
return gr.update(value=outfile_name, choices=get_available_characters())
def build_pygmalion_style_context(data):
context = ""
if 'char_persona' in data and data['char_persona'] != '':
context += f"{data['char_name']}'s Persona: {data['char_persona']}\n"
if 'world_scenario' in data and data['world_scenario'] != '':
context += f"Scenario: {data['world_scenario']}\n"
if 'example_dialogue' in data and data['example_dialogue'] != '':
context += f"{data['example_dialogue'].strip()}\n"
context = f"{context.strip()}\n"
return context
def upload_tavern_character(img, _json):
_json = {'char_name': _json['name'], 'char_persona': _json['description'], 'char_greeting': _json['first_mes'], 'example_dialogue': _json['mes_example'], 'world_scenario': _json['scenario']}
return upload_character(json.dumps(_json), img, tavern=True)
def check_tavern_character(img):
if "chara" not in img.info:
return "Not a TavernAI card", None, None, gr.update(interactive=False)
decoded_string = base64.b64decode(img.info['chara']).replace(b'\\r\\n', b'\\n')
_json = json.loads(decoded_string)
if "data" in _json:
_json = _json["data"]
return _json['name'], _json['description'], _json, gr.update(interactive=True)
def upload_your_profile_picture(img):
cache_folder = Path(shared.args.disk_cache_dir)
if not cache_folder.exists():
cache_folder.mkdir()
if img is None:
if Path(f"{cache_folder}/pfp_me.png").exists():
Path(f"{cache_folder}/pfp_me.png").unlink()
else:
img = make_thumbnail(img)
img.save(Path(f'{cache_folder}/pfp_me.png'))
logger.info(f'Profile picture saved to "{cache_folder}/pfp_me.png"')
def generate_character_yaml(name, greeting, context):
data = {
'name': name,
'greeting': greeting,
'context': context,
}
data = {k: v for k, v in data.items() if v} # Strip falsy
return yaml.dump(data, sort_keys=False, width=float("inf"))
def generate_instruction_template_yaml(instruction_template):
data = {
'instruction_template': instruction_template
}
return my_yaml_output(data)
def save_character(name, greeting, context, picture, filename):
if filename == "":
logger.error("The filename is empty, so the character will not be saved.")
return
data = generate_character_yaml(name, greeting, context)
filepath = Path(f'characters/{filename}.yaml')
save_file(filepath, data)
path_to_img = Path(f'characters/{filename}.png')
if picture is not None:
picture.save(path_to_img)
logger.info(f'Saved {path_to_img}.')
def delete_character(name, instruct=False):
for extension in ["yml", "yaml", "json"]:
delete_file(Path(f'characters/{name}.{extension}'))
delete_file(Path(f'characters/{name}.png'))
def jinja_template_from_old_format(params, verbose=False):
MASTER_TEMPLATE = """
{%- set ns = namespace(found=false) -%}
{%- for message in messages -%}
{%- if message['role'] == 'system' -%}
{%- set ns.found = true -%}
{%- endif -%}
{%- endfor -%}
{%- if not ns.found -%}
{{- '<|PRE-SYSTEM|>' + '<|SYSTEM-MESSAGE|>' + '<|POST-SYSTEM|>' -}}
{%- endif %}
{%- for message in messages %}
{%- if message['role'] == 'system' -%}
{{- '<|PRE-SYSTEM|>' + message['content'] + '<|POST-SYSTEM|>' -}}
{%- else -%}
{%- if message['role'] == 'user' -%}
{{-'<|PRE-USER|>' + message['content'] + '<|POST-USER|>'-}}
{%- else -%}
{{-'<|PRE-ASSISTANT|>' + message['content'] + '<|POST-ASSISTANT|>' -}}
{%- endif -%}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
{{-'<|PRE-ASSISTANT-GENERATE|>'-}}
{%- endif -%}
"""
if 'context' in params and '<|system-message|>' in params['context']:
pre_system = params['context'].split('<|system-message|>')[0]
post_system = params['context'].split('<|system-message|>')[1]
else:
pre_system = ''
post_system = ''
pre_user = params['turn_template'].split('<|user-message|>')[0].replace('<|user|>', params['user'])
post_user = params['turn_template'].split('<|user-message|>')[1].split('<|bot|>')[0]
pre_assistant = '<|bot|>' + params['turn_template'].split('<|bot-message|>')[0].split('<|bot|>')[1]
pre_assistant = pre_assistant.replace('<|bot|>', params['bot'])
post_assistant = params['turn_template'].split('<|bot-message|>')[1]
def preprocess(string):
return string.replace('\n', '\\n').replace('\'', '\\\'')
pre_system = preprocess(pre_system)
post_system = preprocess(post_system)
pre_user = preprocess(pre_user)
post_user = preprocess(post_user)
pre_assistant = preprocess(pre_assistant)
post_assistant = preprocess(post_assistant)
if verbose:
print(
'\n',
repr(pre_system) + '\n',
repr(post_system) + '\n',
repr(pre_user) + '\n',
repr(post_user) + '\n',
repr(pre_assistant) + '\n',
repr(post_assistant) + '\n',
)
result = MASTER_TEMPLATE
if 'system_message' in params:
result = result.replace('<|SYSTEM-MESSAGE|>', preprocess(params['system_message']))
else:
result = result.replace('<|SYSTEM-MESSAGE|>', '')
result = result.replace('<|PRE-SYSTEM|>', pre_system)
result = result.replace('<|POST-SYSTEM|>', post_system)
result = result.replace('<|PRE-USER|>', pre_user)
result = result.replace('<|POST-USER|>', post_user)
result = result.replace('<|PRE-ASSISTANT|>', pre_assistant)
result = result.replace('<|PRE-ASSISTANT-GENERATE|>', pre_assistant.rstrip(' '))
result = result.replace('<|POST-ASSISTANT|>', post_assistant)
result = result.strip()
return result
def my_yaml_output(data):
'''
pyyaml is very inconsistent with multiline strings.
for simple instruction template outputs, this is enough.
'''
result = ""
for k in data:
result += k + ": |-\n"
for line in data[k].splitlines():
result += " " + line.rstrip(' ') + "\n"
return result
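Most of the prompt assembly above hinges on `get_generation_prompt`, which renders a template with two marker messages and then splits on them to recover the per-message prefix and suffix. The same trick in isolation, using a toy template rather than one shipped with the UI:

```python
# Standalone illustration of the probing trick used by get_generation_prompt.
from jinja2.sandbox import ImmutableSandboxedEnvironment

jinja_env = ImmutableSandboxedEnvironment(trim_blocks=True, lstrip_blocks=True)
toy_template = (
    "{% for m in messages %}"
    "<|{{ m['role'] }}|>\n{{ m['content'] }}</s>\n"
    "{% endfor %}"
)
renderer = jinja_env.from_string(toy_template).render

messages = [
    {"role": "assistant", "content": "<<|user-message-1|>>"},
    {"role": "assistant", "content": "<<|user-message-2|>>"},
]
prompt = renderer(messages=messages)
suffix_plus_prefix = prompt.split("<<|user-message-1|>>")[1].split("<<|user-message-2|>>")[0]
suffix = prompt.split("<<|user-message-2|>>")[1]
prefix = suffix_plus_prefix[len(suffix):]
print(repr(prefix), repr(suffix))  # '<|assistant|>\n' '</s>\n'
```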
@ -0,0 +1,172 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/evaluate.py
import datetime
from pathlib import Path
import pandas as pd
import torch
from datasets import load_dataset
from tqdm import tqdm
from modules import shared
from modules.logging_colors import logger
from modules.models import clear_torch_cache, load_model, unload_model
from modules.models_settings import get_model_metadata, update_model_parameters
from modules.text_generation import encode
def load_past_evaluations():
if Path('logs/evaluations.csv').exists():
df = pd.read_csv(Path('logs/evaluations.csv'), dtype=str)
df['Perplexity'] = pd.to_numeric(df['Perplexity'])
return df
else:
return pd.DataFrame(columns=['Model', 'LoRAs', 'Dataset', 'Perplexity', 'stride', 'max_length', 'Date', 'Comment'])
past_evaluations = load_past_evaluations()
def save_past_evaluations(df):
global past_evaluations
past_evaluations = df
filepath = Path('logs/evaluations.csv')
filepath.parent.mkdir(parents=True, exist_ok=True)
df.to_csv(filepath, index=False)
def calculate_perplexity(models, input_dataset, stride, _max_length):
'''
Based on:
https://huggingface.co/docs/transformers/perplexity#calculating-ppl-with-fixedlength-models
'''
if not shared.args.no_use_fast:
logger.warning("--no_use_fast is not being used. If tokenizing the input dataset takes a long time, consider loading the model with that option checked.")
global past_evaluations
cumulative_log = ''
cumulative_log += "Loading the input dataset...\n\n"
yield cumulative_log
# Copied from https://github.com/qwopqwop200/GPTQ-for-LLaMa/blob/triton/utils/datautils.py
if input_dataset == 'wikitext':
data = load_dataset('wikitext', 'wikitext-2-raw-v1', split='test')
text = "\n\n".join(data['text'])
elif input_dataset == 'ptb':
data = load_dataset('ptb_text_only', 'penn_treebank', split='validation')
text = "\n\n".join(data['sentence'])
elif input_dataset == 'ptb_new':
data = load_dataset('ptb_text_only', 'penn_treebank', split='test')
text = " ".join(data['sentence'])
else:
with open(Path(f'training/datasets/{input_dataset}.txt'), 'r', encoding='utf-8') as f:
text = f.read()
for model in models:
if is_in_past_evaluations(model, input_dataset, stride, _max_length):
cumulative_log += f"`{model}` has already been tested. Ignoring.\n\n"
yield cumulative_log
continue
if model != 'current model':
try:
yield cumulative_log + f"Loading `{model}`...\n\n"
model_settings = get_model_metadata(model)
shared.settings.update({k: v for k, v in model_settings.items() if k in shared.settings}) # hijacking the interface defaults
update_model_parameters(model_settings) # hijacking the command-line arguments
unload_model()
shared.model, shared.tokenizer = load_model(model)
except:
cumulative_log += f"Failed to load `{model}`. Moving on.\n\n"
yield cumulative_log
continue
cumulative_log += f"Processing `{shared.model_name}`...\n\n"
yield cumulative_log + "Tokenizing the input dataset...\n\n"
encodings = encode(text, add_special_tokens=False)
seq_len = encodings.shape[1]
if _max_length:
max_length = _max_length
elif hasattr(shared.model.config, 'max_position_embeddings'):
max_length = shared.model.config.max_position_embeddings
else:
max_length = 2048
nlls = []
prev_end_loc = 0
for begin_loc in tqdm(range(0, seq_len, stride)):
yield cumulative_log + f"Evaluating... {100*begin_loc/seq_len:.2f}%"
end_loc = min(begin_loc + max_length, seq_len)
trg_len = end_loc - prev_end_loc # may be different from stride on last loop
input_ids = encodings[:, begin_loc:end_loc]
target_ids = input_ids.clone()
target_ids[:, :-trg_len] = -100
clear_torch_cache()
with torch.no_grad():
outputs = shared.model(input_ids=input_ids, labels=target_ids)
# loss is calculated using CrossEntropyLoss which averages over valid labels
# N.B. the model only calculates loss over trg_len - 1 labels, because it internally shifts the labels
# to the left by 1.
neg_log_likelihood = outputs.loss
nlls.append(neg_log_likelihood)
prev_end_loc = end_loc
if end_loc == seq_len:
break
ppl = torch.exp(torch.stack(nlls).mean())
add_entry_to_past_evaluations(float(ppl), shared.model_name, input_dataset, stride, _max_length)
save_past_evaluations(past_evaluations)
cumulative_log += f"The perplexity for `{shared.model_name}` is: {float(ppl)}\n\n"
yield cumulative_log
def add_entry_to_past_evaluations(perplexity, model, dataset, stride, max_length):
global past_evaluations
entry = {
'Model': model,
'LoRAs': ', '.join(shared.lora_names) or '-',
'Dataset': dataset,
'Perplexity': perplexity,
'stride': str(stride),
'max_length': str(max_length),
'Date': datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S'),
'Comment': ''
}
past_evaluations = pd.concat([past_evaluations, pd.DataFrame([entry])], ignore_index=True)
def is_in_past_evaluations(model, dataset, stride, max_length):
entries = past_evaluations[(past_evaluations['Model'] == model) &
(past_evaluations['Dataset'] == dataset) &
(past_evaluations['max_length'] == str(max_length)) &
(past_evaluations['stride'] == str(stride))]
if entries.shape[0] > 0:
return True
else:
return False
def generate_markdown_table():
sorted_df = past_evaluations.sort_values(by=['Dataset', 'stride', 'Perplexity', 'Date'])
return sorted_df
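The stride loop in `calculate_perplexity` follows the standard sliding-window recipe: within each window only the last `trg_len` tokens are scored (the rest are masked with `-100`), and the final perplexity is `exp(mean(nlls))`. A stripped-down version of just the window bookkeeping, with dummy token ids and no model involved:

```python
# Minimal sketch of the sliding-window masking used above (dummy data, no model).
import torch

seq_len, max_length, stride = 10, 4, 2
encodings = torch.arange(seq_len).unsqueeze(0)  # shape (1, seq_len)

prev_end_loc = 0
for begin_loc in range(0, seq_len, stride):
    end_loc = min(begin_loc + max_length, seq_len)
    trg_len = end_loc - prev_end_loc           # may differ from stride on the first/last window
    input_ids = encodings[:, begin_loc:end_loc]
    target_ids = input_ids.clone()
    target_ids[:, :-trg_len] = -100            # only score tokens not already evaluated
    print(begin_loc, end_loc, trg_len, target_ids.tolist())
    prev_end_loc = end_loc
    if end_loc == seq_len:
        break
```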
@ -0,0 +1,248 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/extensions.py
import traceback
from functools import partial
from inspect import signature
import gradio as gr
import extensions
import modules.shared as shared
from modules.logging_colors import logger
state = {}
available_extensions = []
setup_called = set()
def apply_settings(extension, name):
if not hasattr(extension, 'params'):
return
for param in extension.params:
_id = f"{name}-{param}"
if _id not in shared.settings:
continue
extension.params[param] = shared.settings[_id]
def load_extensions():
global state, setup_called
state = {}
for i, name in enumerate(shared.args.extensions):
if name in available_extensions:
if name != 'api':
logger.info(f'Loading the extension "{name}"')
try:
try:
exec(f"import extensions.{name}.script")
except ModuleNotFoundError:
logger.error(f"Could not import the requirements for '{name}'. Make sure to install the requirements for the extension.\n\nLinux / Mac:\n\npip install -r extensions/{name}/requirements.txt --upgrade\n\nWindows:\n\npip install -r extensions\\{name}\\requirements.txt --upgrade\n\nIf you used the one-click installer, paste the command above in the terminal window opened after launching the cmd script for your OS.")
raise
extension = getattr(extensions, name).script
apply_settings(extension, name)
if extension not in setup_called and hasattr(extension, "setup"):
setup_called.add(extension)
extension.setup()
state[name] = [True, i]
except:
logger.error(f'Failed to load the extension "{name}".')
traceback.print_exc()
# This iterator returns the extensions in the order specified in the command-line
def iterator():
for name in sorted(state, key=lambda x: state[x][1]):
if state[name][0]:
yield getattr(extensions, name).script, name
# Extension functions that map string -> string
def _apply_string_extensions(function_name, text, state, is_chat=False):
for extension, _ in iterator():
if hasattr(extension, function_name):
func = getattr(extension, function_name)
# Handle old extensions without the 'state' arg or
# the 'is_chat' kwarg
count = 0
has_chat = False
for k in signature(func).parameters:
if k == 'is_chat':
has_chat = True
else:
count += 1
if count == 2:
args = [text, state]
else:
args = [text]
if has_chat:
kwargs = {'is_chat': is_chat}
else:
kwargs = {}
text = func(*args, **kwargs)
return text
# Extension functions that modify the chat input (text and its visible version)
def _apply_chat_input_extensions(text, visible_text, state):
for extension, _ in iterator():
if hasattr(extension, 'chat_input_modifier'):
text, visible_text = extension.chat_input_modifier(text, visible_text, state)
return text, visible_text
# custom_generate_chat_prompt handling - currently only the first one will work
def _apply_custom_generate_chat_prompt(text, state, **kwargs):
for extension, _ in iterator():
if hasattr(extension, 'custom_generate_chat_prompt'):
return extension.custom_generate_chat_prompt(text, state, **kwargs)
return None
# Extension that modifies the input parameters before they are used
def _apply_state_modifier_extensions(state):
for extension, _ in iterator():
if hasattr(extension, "state_modifier"):
state = getattr(extension, "state_modifier")(state)
return state
# Extension that modifies the chat history before it is used
def _apply_history_modifier_extensions(history):
for extension, _ in iterator():
if hasattr(extension, "history_modifier"):
history = getattr(extension, "history_modifier")(history)
return history
# Extension functions that override the default tokenizer output - The order of execution is not defined
def _apply_tokenizer_extensions(function_name, state, prompt, input_ids, input_embeds):
for extension, _ in iterator():
if hasattr(extension, function_name):
prompt, input_ids, input_embeds = getattr(extension, function_name)(state, prompt, input_ids, input_embeds)
return prompt, input_ids, input_embeds
# Allow extensions to add their own logits processors to the stack being run.
# Each extension would call `processor_list.append({their LogitsProcessor}())`.
def _apply_logits_processor_extensions(function_name, processor_list, input_ids):
for extension, _ in iterator():
if hasattr(extension, function_name):
result = getattr(extension, function_name)(processor_list, input_ids)
if type(result) is list:
processor_list = result
return processor_list
# Get prompt length in tokens after applying extension functions which override the default tokenizer output
# currently only the first one will work
def _apply_custom_tokenized_length(prompt):
for extension, _ in iterator():
if hasattr(extension, 'custom_tokenized_length'):
return getattr(extension, 'custom_tokenized_length')(prompt)
return None
# Custom generate reply handling - currently only the first one will work
def _apply_custom_generate_reply():
for extension, _ in iterator():
if hasattr(extension, 'custom_generate_reply'):
return getattr(extension, 'custom_generate_reply')
return None
def _apply_custom_css():
all_css = ''
for extension, _ in iterator():
if hasattr(extension, 'custom_css'):
all_css += getattr(extension, 'custom_css')()
return all_css
def _apply_custom_js():
all_js = ''
for extension, _ in iterator():
if hasattr(extension, 'custom_js'):
all_js += getattr(extension, 'custom_js')()
return all_js
def create_extensions_block():
to_display = []
for extension, name in iterator():
if hasattr(extension, "ui") and not (hasattr(extension, 'params') and extension.params.get('is_tab', False)):
to_display.append((extension, name))
# Creating the extension ui elements
if len(to_display) > 0:
with gr.Column(elem_id="extensions"):
for row in to_display:
extension, _ = row
extension.ui()
def create_extensions_tabs():
for extension, name in iterator():
if hasattr(extension, "ui") and (hasattr(extension, 'params') and extension.params.get('is_tab', False)):
display_name = getattr(extension, 'params', {}).get('display_name', name)
with gr.Tab(display_name, elem_classes="extension-tab"):
extension.ui()
EXTENSION_MAP = {
"input": partial(_apply_string_extensions, "input_modifier"),
"output": partial(_apply_string_extensions, "output_modifier"),
"chat_input": _apply_chat_input_extensions,
"state": _apply_state_modifier_extensions,
"history": _apply_history_modifier_extensions,
"bot_prefix": partial(_apply_string_extensions, "bot_prefix_modifier"),
"tokenizer": partial(_apply_tokenizer_extensions, "tokenizer_modifier"),
'logits_processor': partial(_apply_logits_processor_extensions, 'logits_processor_modifier'),
"custom_generate_chat_prompt": _apply_custom_generate_chat_prompt,
"custom_generate_reply": _apply_custom_generate_reply,
"tokenized_length": _apply_custom_tokenized_length,
"css": _apply_custom_css,
"js": _apply_custom_js
}
def apply_extensions(typ, *args, **kwargs):
if typ not in EXTENSION_MAP:
raise ValueError(f"Invalid extension type {typ}")
return EXTENSION_MAP[typ](*args, **kwargs)
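Extensions hook into the dispatch table above simply by defining functions with the expected names in their `script.py`. A hypothetical minimal extension that the `input` and `output` entries of `EXTENSION_MAP` would pick up:

```python
# extensions/example/script.py -- hypothetical minimal extension (names are illustrative).
params = {
    "display_name": "Example extension",
    "is_tab": False,
}

def input_modifier(text, state, is_chat=False):
    # called via apply_extensions('input', ...) before the prompt reaches the model
    return text.rstrip()

def output_modifier(text, state, is_chat=False):
    # called via apply_extensions('output', ...) before the reply is displayed
    return text + "\n"
```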
@ -0,0 +1,57 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/github.py
import subprocess
from pathlib import Path
new_extensions = set()
def clone_or_pull_repository(github_url):
global new_extensions
repository_folder = Path("extensions")
repo_name = github_url.rstrip("/").split("/")[-1].split(".")[0]
# Check if the repository folder exists
if not repository_folder.exists():
repository_folder.mkdir(parents=True)
repo_path = repository_folder / repo_name
# Check if the repository is already cloned
if repo_path.exists():
yield f"Updating {github_url}..."
# Perform a 'git pull' to update the repository
try:
pull_output = subprocess.check_output(["git", "-C", repo_path, "pull"], stderr=subprocess.STDOUT)
yield "Done."
return pull_output.decode()
except subprocess.CalledProcessError as e:
return str(e)
# Clone the repository
try:
yield f"Cloning {github_url}..."
clone_output = subprocess.check_output(["git", "clone", github_url, repo_path], stderr=subprocess.STDOUT)
new_extensions.add(repo_name)
yield f"The extension `{repo_name}` has been downloaded.\n\nPlease close the the web UI completely and launch it again to be able to load it."
return clone_output.decode()
except subprocess.CalledProcessError as e:
return str(e)
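`clone_or_pull_repository` is written as a generator so the UI can stream progress messages while git runs; consuming it drives the clone or pull. A usage sketch (the repository URL is a placeholder):

```python
# Illustrative only; the URL below is a placeholder, not a real extension repository.
from modules.github import clone_or_pull_repository

for status in clone_or_pull_repository("https://github.com/your-name/your-extension"):
    print(status)  # e.g. "Cloning ...", "Updating ...", "Done."
```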
@ -0,0 +1,696 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/huggingface/transformers/pull/27557
import logging
import re
import time
from abc import ABC
from functools import lru_cache
from typing import Dict, List
import torch
from modules import shared
logger = logging.getLogger(__name__)
########################
# EBNF Grammar Parsing #
########################
END_OF_ALTERNATE_MARKER = 0
END_OF_RULE_MARKER = 0
TO_BE_FILLED_MARKER = 0
REF_RULE_MARKER = 1
LITERAL_MARKER = 2
class ParseState:
def __init__(self):
self.symbol_ids = {}
self.grammar_encoding = [] # old name: out_grammar
def get_symbol_id(state, src):
if src not in state.symbol_ids:
state.symbol_ids[src] = len(state.symbol_ids)
return state.symbol_ids[src]
def generate_symbol_id(state, base_name):
next_id = len(state.symbol_ids)
state.symbol_ids[base_name + "_" + str(next_id)] = next_id
return next_id
def is_word_char(c):
return c.isalnum() or c == "-" or c == "_"
def hex_to_int(c):
if c.isdigit():
return int(c)
elif "a" <= c.lower() <= "f":
return ord(c.lower()) - ord("a") + 10
return -1
def remove_leading_white_space(src, newline_ok):
"""
Skips over whitespace and comments in the input string.
This function processes the input string, skipping over any spaces, tabs,
and content following a '#' character, which denotes a comment. The parsing
of a comment continues until the end of the line (denoted by newline characters
'\r' or '\n'). If the 'newline_ok' parameter is set to False, the function
will stop processing and return the remaining string upon encountering a
newline character, otherwise it will skip over newline characters as well.
Parameters:
src (str): The input string to be processed.
newline_ok (bool): A flag indicating whether encountering a newline character
should stop the parsing (False) or if it should be skipped (True).
Returns:
str: The remaining portion of the input string after skipping whitespace and comments.
"""
pos = 0
while pos < len(src) and (src[pos].isspace() or src[pos] == "#"):
if src[pos] == "#":
while pos < len(src) and src[pos] not in ("\r", "\n"):
pos += 1
else:
if not newline_ok and src[pos] in ("\r", "\n"):
break
pos += 1
return src[pos:]
def parse_name(src):
pos = 0
while pos < len(src) and is_word_char(src[pos]):
pos += 1
if pos == 0:
raise RuntimeError("expecting name at " + src)
return src[:pos], src[pos:]
def parse_char(src):
"""
parse the leading char from the input string
:param src:
:return: char, remaining_src
"""
# if we have a backslash, it may be an escape
if src[0] == "\\":
esc = src[1]
if esc == "x":
first = hex_to_int(src[2])
if first > -1:
second = hex_to_int(src[3])
if second > -1:
return (first << 4) + second, src[4:]
raise RuntimeError("expecting \\xNN at " + src)
elif esc in ('"', "[", "]"):
return esc, src[2:]
elif esc == "r":
return "\r", src[2:]
elif esc == "n":
return "\n", src[2:]
elif esc == "t":
return "\t", src[2:]
raise RuntimeError("unknown escape at " + src)
elif src:
return src[0], src[1:]
raise RuntimeError("unexpected end of input")
def parse_sequence(state, src, rule_name, outbuf, is_nested):
out_start_pos = len(outbuf)
# sequence size, will be replaced at end when known
outbuf.append(TO_BE_FILLED_MARKER)
last_sym_start = len(outbuf)
remaining_src = src
while remaining_src:
if remaining_src[0] == '"': # literal string
remaining_src = remaining_src[1:]
last_sym_start = len(outbuf)
while remaining_src[0] != '"':
char, remaining_src = parse_char(remaining_src)
# each char of a literal is encoded as a "range" of char - char
outbuf.append(LITERAL_MARKER)
outbuf.append(ord(char))
outbuf.append(ord(char))
remaining_src = remove_leading_white_space(remaining_src[1:], is_nested)
elif remaining_src[0] == "[": # char range(s)
remaining_src = remaining_src[1:]
last_sym_start = len(outbuf)
# num chars in range - replaced at end of loop
outbuf.append(TO_BE_FILLED_MARKER)
while remaining_src[0] != "]":
char, remaining_src = parse_char(remaining_src)
outbuf.append(ord(char))
if remaining_src[0] == "-" and remaining_src[1] != "]":
endchar_pair, remaining_src = parse_char(remaining_src[1:])
outbuf.append(ord(endchar_pair))
else:
# chars that aren't part of a c1-c2 range are just doubled (i.e., c-c)
outbuf.append(ord(char))
# replace num chars with actual
outbuf[last_sym_start] = len(outbuf) - last_sym_start - 1
remaining_src = remove_leading_white_space(remaining_src[1:], is_nested)
elif is_word_char(remaining_src[0]): # rule reference
name, remaining_src = parse_name(remaining_src)
ref_rule_id = get_symbol_id(state, name)
remaining_src = remove_leading_white_space(remaining_src, is_nested)
last_sym_start = len(outbuf)
outbuf.append(REF_RULE_MARKER)
outbuf.append(ref_rule_id)
elif remaining_src[0] == "(": # grouping
# parse nested alternates into synthesized rule
remaining_src = remove_leading_white_space(remaining_src[1:], True)
sub_rule_id = generate_symbol_id(state, rule_name)
remaining_src = parse_alternates(state, remaining_src, rule_name, sub_rule_id, True)
last_sym_start = len(outbuf)
# output reference to synthesized rule
outbuf.append(REF_RULE_MARKER)
outbuf.append(sub_rule_id)
if remaining_src[0] != ")":
raise RuntimeError("expecting ')' at " + remaining_src)
remaining_src = remove_leading_white_space(remaining_src[1:], is_nested)
elif remaining_src[0] in ("*", "+", "?"): # repetition operator
if len(outbuf) - out_start_pos - 1 == 0:
raise RuntimeError("expecting preceeding item to */+/? at " + remaining_src)
out_grammar = state.grammar_encoding
# apply transformation to previous symbol (last_sym_start -
# end) according to rewrite rules:
# S* --> S' ::= S S' |
# S+ --> S' ::= S S' | S
# S? --> S' ::= S |
sub_rule_id = generate_symbol_id(state, rule_name)
out_grammar.append(sub_rule_id)
sub_rule_start = len(out_grammar)
# placeholder for size of 1st alternate
out_grammar.append(TO_BE_FILLED_MARKER)
# add preceding symbol to generated rule
out_grammar.extend(outbuf[last_sym_start:])
if remaining_src[0] in ("*", "+"):
# cause generated rule to recurse
out_grammar.append(REF_RULE_MARKER)
out_grammar.append(sub_rule_id)
# apply actual size
out_grammar[sub_rule_start] = len(out_grammar) - sub_rule_start
# mark end of 1st alternate
out_grammar.append(END_OF_ALTERNATE_MARKER)
sub_rule_start = len(out_grammar)
# placeholder for size of 2nd alternate
out_grammar.append(TO_BE_FILLED_MARKER)
if remaining_src[0] == "+":
# add preceding symbol as alternate only for '+'
out_grammar.extend(outbuf[last_sym_start:])
# apply actual size of 2nd alternate
out_grammar[sub_rule_start] = len(out_grammar) - sub_rule_start
# mark end of 2nd alternate, then end of rule
out_grammar.append(END_OF_ALTERNATE_MARKER)
out_grammar.append(END_OF_RULE_MARKER)
# in original rule, replace previous symbol with reference to generated rule
outbuf[last_sym_start:] = [1, sub_rule_id]
remaining_src = remove_leading_white_space(remaining_src[1:], is_nested)
else:
break
# apply actual size of this alternate sequence
outbuf[out_start_pos] = len(outbuf) - out_start_pos
# mark end of alternate
outbuf.append(END_OF_ALTERNATE_MARKER)
return remaining_src
def parse_alternates(state, src, rule_name, rule_id, is_nested):
outbuf = []
remaining_src = parse_sequence(state, src, rule_name, outbuf, is_nested)
while remaining_src and remaining_src[0] == "|":
remaining_src = remove_leading_white_space(remaining_src[1:], True)
remaining_src = parse_sequence(state, remaining_src, rule_name, outbuf, is_nested)
state.grammar_encoding.append(rule_id)
state.grammar_encoding.extend(outbuf)
state.grammar_encoding.append(0)
return remaining_src
def parse_rule(state, src):
name, remaining_src = parse_name(src)
remaining_src = remove_leading_white_space(remaining_src, False)
rule_id = get_symbol_id(state, name)
if remaining_src[:3] != "::=":
raise RuntimeError("expecting ::= at " + remaining_src)
remaining_src = remove_leading_white_space(remaining_src[3:], True)
remaining_src = parse_alternates(state, remaining_src, name, rule_id, False)
if remaining_src and remaining_src[0] == "\r":
remaining_src = remaining_src[2:] if remaining_src[1] == "\n" else remaining_src[1:]
elif remaining_src and remaining_src[0] == "\n":
remaining_src = remaining_src[1:]
elif remaining_src:
raise RuntimeError("expecting newline or end at " + remaining_src)
return remove_leading_white_space(remaining_src, True)
def parse_ebnf(src):
try:
state = ParseState()
grammar_repr = remove_leading_white_space(src, True)
last_grammar_repr = ""
while grammar_repr:
if last_grammar_repr:
last_parsed_rule_len = len(last_grammar_repr) - len(grammar_repr)
logger.debug(f"last_parsed_rule: {last_grammar_repr[:last_parsed_rule_len]}")
last_grammar_repr = grammar_repr
grammar_repr = parse_rule(state, grammar_repr)
state.grammar_encoding.append(0xFFFF)
return state
except RuntimeError as err:
logger.warning("error parsing grammar:", err)
return ParseState()
def print_rule(file, grammar_encoding, index, symbol_id_names):
rule_id = grammar_encoding[index]
print(f"<{index}>{symbol_id_names[rule_id]} ::=", end=" ", file=file)
pos = index + 1
while grammar_encoding[pos]:
if pos - 1 > index:
print("|", end=" ", file=file)
pos += 1 # sequence size, not needed here
while grammar_encoding[pos]:
if grammar_encoding[pos] == REF_RULE_MARKER:
ref_rule_id = grammar_encoding[pos + 1]
print(
f"<{pos}>{symbol_id_names[ref_rule_id]}",
end=" ",
file=file,
)
pos += 2
else:
print("<{}>[".format(pos), end="", file=file)
num_chars = grammar_encoding[pos]
pos += 1
for i in range(0, num_chars, 2):
print("{}-".format(chr(grammar_encoding[pos + i])), end="", file=file)
if i + 1 < num_chars:
print("{}".format(chr(grammar_encoding[pos + i + 1])), end="", file=file)
print("]", end=" ", file=file)
pos += num_chars
pos += 1
print(file=file)
return pos + 1
def print_grammar(file, state):
pos = 0
symbol_id_names = {v: k for k, v in state.symbol_ids.items()}
print("Grammar Rules:", file=file)
while state.grammar_encoding[pos] != 0xFFFF:
pos = print_rule(file, state.grammar_encoding, pos, symbol_id_names)
pos = 0
print("\nBinary representation:", file=file)
while state.grammar_encoding[pos] != 0xFFFF:
print(f"{state.grammar_encoding[pos]:04x}", end=" ", file=file)
pos += 1
print("ffff\n")
###################################
# EBNF Grammar Parsing ends here #
###################################
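# Illustrative note (a sketch under assumptions, not part of the adapted file): the
# parser above can be exercised on a toy grammar before the constraint classes below
# consume its encoding, e.g.:
#
#     import sys
#     state = parse_ebnf('root ::= "yes" | "no"')
#     print_grammar(sys.stdout, state)
#
# which prints the parsed rules followed by their binary encoding.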
class GrammarConstraint(ABC):
def __init__(self, grammar_str, start_rule_name, tokenizer):
self.tt = 0
self.nt = 0
state = parse_ebnf(grammar_str)
grammar_encoding = state.grammar_encoding
self.start_rule_id = state.symbol_ids.get(start_rule_name)
self.eos_token_id = tokenizer.eos_token_id
self.token_trie = TokenTrie(tokenizer)
self.tokenizer = tokenizer
self.grammar_encoding = grammar_encoding
pos = 0
rules: Dict[int, int] = {}
while grammar_encoding[pos] != 0xFFFF:
rule_id = grammar_encoding[pos]
# Store the current position in the 'rules' list at the index corresponding to rule_id.
# This effectively maps each rule_id to its position in the grammar encoding.
rules[rule_id] = pos
pos += 1
# Continue to the next rule in the encoding.
# The loop advances by the size indicated at the current position (grammar_encoding[pos])
# plus one for the size field itself.
while grammar_encoding[pos]:
pos += 1 + grammar_encoding[pos]
# Now we're at the end of the rule,
# so advance to the next rule by skipping the 0, which means 'end of rule'.
pos += 1
self.start_rule_pos = rules[self.start_rule_id]
self.rules_pos_dict: Dict[int, int] = rules
def init_stacks(self):
# suppose the start rule position is 0, then grammar_encoding[0] = rule_id
# grammar_encoding[1] = rule_size
# grammar_encoding[2] = rule_type
# this is why we need to add 2 to the start rule position
stack = [self.start_rule_pos + 2]
# convert to tuple for caching(immutable)
return self.advance_stack(tuple(stack))
# For each stack, resolve rules to find the actual characters that are
# accepted by this stack (not the set of sub-rules).
# This is where the parsing happens.
# The parsing is a top-down, left-to-right, depth-first traversal of the
# grammar.
@lru_cache(maxsize=32768)
def advance_stack(self, stack):
stack = list(stack)
# If the stack is empty, we're done. Because no more tokens should be accepted.
if len(stack) == 0:
return [stack]
# Get the top of the stack.
pos = stack[-1]
# If the stack head is a terminal(literal), we can resolve it immediately.
# literal is marked with 2 in the grammar encoding.
if self.grammar_encoding[pos] > 1:
return [stack]
# The stack head is a nonterminal (a rule reference, 1 in the grammar encoding).
# Resolving this rule gives a set of one or more possible positions
# (e.g. two in `a ::= b | c`)
# We pop the current rule off the stack and, for each option, push:
# - the symbol following this symbol in the current rule; then
# - the first symbol of the resolved rule.
referenced_rule_id = self.grammar_encoding[pos + 1]
# subpos should point to the size of the subrule
subpos = self.rules_pos_dict[referenced_rule_id] + 1
stacks: List[List[int]] = []
# do depth-first search to find all possible rules and check the next terminal
# When this value is non-zero, it indicates that subpos is not yet at the end of the rule, so we can continue.
# here subpos is a pointer, and the value in the rule encoding can never be 0 except for the end of the rule.
while self.grammar_encoding[subpos]:
new_stack = stack[:-1]
if self.grammar_encoding[pos + 2]:
# check if there is a next symbol in the current rule, e.g. `a ::= b c | d`
# if yes, push the pos to rule_size to the stack
new_stack.append(pos + 2)
# if the type of the next symbol is not "empty", push the first symbol of the resolved rule to the stack
if self.grammar_encoding[subpos + 1]:
new_stack.append(subpos + 1)
stacks.extend(self.advance_stack(tuple(new_stack)))
# The increment subpos += self.grammar_encoding[subpos] + 1
# moves subpos forward in the grammar encoding array to the next alternative in the current rule.
subpos += self.grammar_encoding[subpos] + 1
return stacks
def accept_char(self, *args, **kwargs):
"""Process a byte according to the grammar rules."""
raise NotImplementedError
def accept_token_id(self, *args, **kwargs):
"""Process a token according to the grammar rules."""
raise NotImplementedError
def filter_vocab(self, *args, **kwargs):
raise NotImplementedError
class IncrementalGrammarConstraint(GrammarConstraint):
def __init__(self, grammar_str, start_rule_name, tokenizer):
super().__init__(grammar_str, start_rule_name, tokenizer)
def accept_char(self, byte, stacks):
new_stacks = []
for stack in stacks:
# stack is empty
if not stack:
continue
pos = stack[-1]
num_chars = self.grammar_encoding[pos]
# to make pos point to the size of the char range rule
pos += 1
found = False
for i in range(0, num_chars, 2):
if self.grammar_encoding[pos + i] <= byte and byte <= self.grammar_encoding[pos + i + 1]:
found = True
break
if not found:
continue
pos += num_chars
new_stack = stack[:-1]
if self.grammar_encoding[pos]:
new_stack.append(pos)
new_stacks.extend(self.advance_stack(tuple(new_stack)))
return new_stacks
def accept_string(self, string: str, stacks: List[List[int]]):
_bytes = bytes(string, "utf-8")
for byte in _bytes:
stacks = self.accept_char(byte, stacks)
return stacks
def accept_token_id(self, token_id: int, stacks: List[List[int]]):
if token_id == self.eos_token_id:
if stacks and all(len(stack) != 0 for stack in stacks):
raise Exception(
f"At least one of the stack should be empty when EOS is reached. However, "
f"the stacks are {stacks}"
)
return []
for byte in self.token_trie.id2str(token_id):
stacks = self.accept_char(byte, stacks)
# check updated stacks
# TODO, I commented this out because it will fail when the stack is empty
# empty stack means the end of the grammar
# assert stacks != []
return stacks
def accept_token_ids(self, token_ids: List[int], stacks: List[List[int]], as_string=True):
if as_string:
string = self.tokenizer.decode(token_ids)
stacks = self.accept_string(string, stacks)
else:
for token_id in token_ids:
stacks = self.accept_token_id(token_id, stacks)
return stacks
def batch_filter_vocab(self, batch_stacks, device):
batch_acceptance = []
for stacks in batch_stacks:
batch_acceptance.append(self.filter_vocab(stacks, device))
return torch.stack(batch_acceptance)
def filter_vocab(self, stacks, device):
if not stacks: # Check if stacks is empty
# Handle the empty case: for example, return a tensor of False
# The size of the tensor should match the size of your vocabulary
vocab_size = len(self.token_trie)
logger.debug(f"sum of acceptance: {0}")
return torch.zeros(vocab_size, dtype=torch.bool, device=device)
acceptance_matrix = torch.cat([self.token_acceptance_for_stack(tuple(stack), device) for stack in stacks])
# Merge stacks: any True => True
acceptance = acceptance_matrix.reshape(len(stacks), -1).any(dim=0)
logger.debug(f"sum of acceptance: {acceptance.sum()}")
return acceptance
# For each sub-rule in the grammar, cache whether each byte is accepted.
@lru_cache(maxsize=None)
def pos_char_acceptance(self, pos):
acceptance = [False] * 256
num_chars = self.grammar_encoding[pos]
pos += 1
for i in range(0, num_chars, 2):
start = self.grammar_encoding[pos + i]
end = self.grammar_encoding[pos + i + 1]
for j in range(start, end + 1):
acceptance[j] = True
return acceptance
# Probably this should be configurable. If the grammar has an exceedingly
# large number of states, the correct setting is a tradeoff between GPU
# RAM usage and recomputation time.
#
# The main variable that pushes usage up here is number of states in the
# grammar.
@lru_cache(maxsize=32768)
def token_acceptance_for_stack(self, stack, device):
st = time.time()
stack = list(stack) # needs to come in as a tuple for lru_cache
accepts = [False] * len(self.token_trie)
accepts[self.eos_token_id] = len(stack) == 0
if len(stack) == 0:
logger.debug("empty stack")
def traverse_trie(trie, stacks):
for byte, next_trie in trie.items():
if byte == LEAF:
token_id = next_trie
if token_id != self.eos_token_id:
accepts[token_id] = bool(stacks)
continue
new_stacks = []
for stk in stacks:
if not stk:
continue
pos = stk[-1]
num_chars = self.grammar_encoding[pos]
if not self.pos_char_acceptance(pos)[byte]:
continue
pos += num_chars + 1
new_stack = stk[:-1]
if self.grammar_encoding[pos]:
new_stack.append(pos)
new_stacks.extend(self.advance_stack(tuple(new_stack)))
if new_stacks:
traverse_trie(next_trie, new_stacks)
traverse_trie(self.token_trie.trie, [stack])
et = time.time() - st
x = torch.tensor(accepts, dtype=torch.bool, device=device)
self.tt += et
self.nt += 1
return x
class StaticGrammarConstraint(GrammarConstraint):
def __init__(self, grammar_str, start_rule_name, tokenizer):
super().__init__(grammar_str, start_rule_name, tokenizer)
def accept_char(self):
raise NotImplementedError
#################
# DATA STRUCTURES
#################
LEAF = -1
class TokenTrie:
def __init__(self, tokenizer):
self.eos_token_id = tokenizer.eos_token_id
self.tokens = []
self.trie = {}
self.load_tokens(tokenizer)
def id2str(self, token_id):
return self.tokens[token_id]
def __len__(self):
return len(self.tokens)
def load_tokens(self, tokenizer):
def replace_hex(match):
hex_value = match.group(1)
return chr(int(hex_value, 16))
if "gpt2" in tokenizer.__class__.__name__.lower():
special = tokenizer.additional_special_tokens_ids
# Here, the decoder does a string replace on a bunch of sequences
# like ' .' for '.'. This interferes with our assumptions, where a
# token should always have exactly one representation.
# Fortunately(?) text-generation-inference doesn't seem to run this
# cleanup, so we get extraneous spaces. So, in order to generate
# the right token set for TGI, we have to skip the space trimming.
# See:
# https://github.com/huggingface/transformers/blob/main/src/transformers/tokenization_utils_base.py#L3588-L3600
def fmt_token(id):
if id in special:
return None
return bytes(tokenizer.decode([id], clean_up_tokenization_spaces=False), "utf-8")
elif "llama" in tokenizer.__class__.__name__.lower():
def fmt_token(id):
token = tokenizer.convert_ids_to_tokens(id)
token = re.sub(r"<0x([0-9a-fA-F]{2})>", replace_hex, token)
token = token.replace("", " ")
return bytes(token, "utf-8")
else:
print("Warning: unrecognized tokenizer: using default token formatting")
def fmt_token(id):
token = tokenizer.convert_ids_to_tokens(id)
return bytes(token, "utf-8")
# note: vocab_size doesn't work here because there are also
# get_added_vocab() tokens
self.tokens = [fmt_token(i) for i in range(len(tokenizer.get_vocab()))]
for token_id, token_bytes in enumerate(self.tokens):
if token_bytes is not None:
self.insert_into_trie(self.trie, token_bytes, token_id)
def insert_into_trie(self, trie, token_bytes, token_id):
current = trie
for byte in token_bytes:
if byte not in current:
current[byte] = {}
current = current[byte]
current[LEAF] = token_id
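# Illustrative example (not part of the original class): inserting the byte
# string b"ab" with token id 7 into an empty trie produces the nested dict
#   {97: {98: {LEAF: 7}}}
# so traverse_trie() above can walk token bytes one at a time and read the
# token id off the LEAF key once a full token has been matched.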
@lru_cache(maxsize=5)
def initialize_grammar(grammar_string):
return IncrementalGrammarConstraint(grammar_string.strip(), start_rule_name="root", tokenizer=shared.tokenizer)
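# Minimal usage sketch (illustrative only, not part of this module). It assumes
# a loaded Hugging Face tokenizer and a hypothetical GBNF-style grammar string;
# the exact grammar syntax is defined by the parser earlier in this file.
#
#   grammar = 'root ::= "yes" | "no"'
#   constraint = IncrementalGrammarConstraint(grammar, "root", shared.tokenizer)
#   stacks = constraint.init_stacks()
#   stacks = constraint.accept_string("yes", stacks)       # advances byte by byte
#   mask = constraint.filter_vocab(stacks, device="cpu")   # bool tensor over the vocabulary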

View file

@ -0,0 +1,113 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/huggingface/transformers/pull/27557
import math
import torch
from transformers.generation.logits_process import LogitsProcessor
from transformers.utils import add_start_docstrings
LOGITS_PROCESSOR_INPUTS_DOCSTRING = r"""
Args:
input_ids (`torch.LongTensor` of shape `(batch_size, sequence_length)`):
Indices of input sequence tokens in the vocabulary. [What are input IDs?](../glossary#input-ids)
scores (`torch.FloatTensor` of shape `(batch_size, config.vocab_size)`):
Prediction scores of a language modeling head. These can be logits for each vocabulary token when not using beam
search, or log softmax for each vocabulary token when using beam search.
Return:
`torch.FloatTensor` of shape `(batch_size, config.vocab_size)`: The processed prediction scores.
"""
class GrammarConstrainedLogitsProcessor(LogitsProcessor):
def __init__(self, grammar_constraint):
self.last_size = None
self.grammar_constraint = grammar_constraint
self.batch_stacks = None
def filter_logits(self, logits, device):
# resolve each stack to a tensor of True/False for each token
# indicating acceptance
# acceptance = self.grammar_acceptor.filter_vocab(self.stacks, device)
acceptance = self.grammar_constraint.batch_filter_vocab(self.batch_stacks, device)
# logger.debug(acceptance)
# Logits to -inf where False
logits[~acceptance] = -math.inf
# TODO: batching
def process_logits(self, input_ids, scores, parse_start_index=None):
"""
:param input_ids:
:param scores:
:param parse_start_index: default None, which means generate from scratch. Set to 0 to parse all input_ids
:return:
"""
# we dynamically create stacks at the first call, so that we know the batch size and beam size
if self.batch_stacks is None:
self.batch_stacks = [self.grammar_constraint.init_stacks() for _ in range(len(input_ids))]
# if self.last_size is not set (which would be the case when processing the first token),
# parse the prompt prefix selected by parse_start_index (none by default) to initialize the stacks.
if self.last_size is None:
prefix_to_parse = [
single_input_ids[parse_start_index:] if parse_start_index is not None else []
for single_input_ids in input_ids
]
# self.grammar_acceptor.accept_token_ids(prefix_to_parse, self.stacks)
self.batch_stacks = [
self.grammar_constraint.accept_token_ids(prefix, stack)
for prefix, stack in zip(prefix_to_parse, self.batch_stacks)
]
# if the length of the current input IDs (input_ids[0]) is exactly one more than self.last_size,
# accept the newly generated token. This is expected when inputs are processed incrementally, one token at a time.
elif len(input_ids[0]) == self.last_size + 1:
# self.stacks = self.grammar_acceptor.accept_token_id(input_ids[0][-1], self.stacks)
self.batch_stacks = [
self.grammar_constraint.accept_token_id(single_input_ids[-1], stack)
for single_input_ids, stack in zip(input_ids, self.batch_stacks)
]
# ensure that the input size is consistent with the expected incremental processing
# (i.e., one token at a time).
else:
# here we check if the input_ids are one token longer than the last time we processed
# but we don't check if input_ids are actually valid.
# Imagine a scenario where we generate 10 tokens, then we replace the 10 generated tokens with 10 new tokens.
# In this case, the input_ids will be consistent with the last_size, but the input_ids are not valid.
# However, should we really check if the input_ids are valid here?
# If we do, then we need to reparse the whole input_ids at each call, which is not efficient.
# Maybe we should just trust the user to provide valid input_ids?
# The conclusion is that, we assume the input_ids are valid, and our generation will be correct.
# If the input_ids are not valid, then the generation result will be wrong and we don't take responsibility for that.
raise RuntimeError(
"Input ID's length is inconsistent with the current state of "
"the GrammarConstrainedLogitsProcessor. If you want to process "
"another input sequence, please instantiate a new "
"GrammarConstrainedLogitsProcessor."
)
self.filter_logits(scores, scores.device)
self.last_size = len(input_ids[0])
return scores
@add_start_docstrings(LOGITS_PROCESSOR_INPUTS_DOCSTRING)
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
return self.process_logits(input_ids, scores)
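# Illustrative usage sketch (not part of this module), assuming `model`,
# `tokenizer` and a grammar constraint built with the companion grammar_utils
# module; the names below are placeholders:
#
#   from transformers import LogitsProcessorList
#   processor = GrammarConstrainedLogitsProcessor(grammar_constraint)
#   output_ids = model.generate(
#       input_ids,
#       max_new_tokens=64,
#       logits_processor=LogitsProcessorList([processor]),
#   )
#
# Because the processor keeps per-sequence state (last_size, batch_stacks),
# a fresh instance should be created for every new generation call.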

View file

@ -0,0 +1,328 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/html_generator.py
import html
import os
import re
import time
from pathlib import Path
import markdown
from PIL import Image, ImageOps
from modules.utils import get_available_chat_styles
from modules import shared
# This is to store the paths to the thumbnails of the profile pictures
image_cache = {}
with open(Path(__file__).resolve().parent / '../css/html_readable_style.css', 'r') as f:
readable_css = f.read()
with open(Path(__file__).resolve().parent / '../css/html_4chan_style.css', 'r') as css_f:
_4chan_css = css_f.read()
with open(Path(__file__).resolve().parent / '../css/html_instruct_style.css', 'r') as f:
instruct_css = f.read()
# Custom chat styles
chat_styles = {}
for k in get_available_chat_styles():
chat_styles[k] = open(Path(f'css/chat_style-{k}.css'), 'r').read()
# Handle styles that derive from other styles
for k in chat_styles:
lines = chat_styles[k].split('\n')
input_string = lines[0]
match = re.search(r'chat_style-([a-z\-]*)\.css', input_string)
if match:
style = match.group(1)
chat_styles[k] = chat_styles.get(style, '') + '\n\n' + '\n'.join(lines[1:])
def fix_newlines(string):
string = string.replace('\n', '\n\n')
string = re.sub(r"\n{3,}", "\n\n", string)
string = string.strip()
return string
def replace_blockquote(m):
return m.group().replace('\n', '\n> ').replace('\\begin{blockquote}', '').replace('\\end{blockquote}', '')
def convert_to_markdown(string):
# Blockquote
string = re.sub(r'(^|[\n])&gt;', r'\1>', string)
pattern = re.compile(r'\\begin{blockquote}(.*?)\\end{blockquote}', re.DOTALL)
string = pattern.sub(replace_blockquote, string)
# Code
string = string.replace('\\begin{code}', '```')
string = string.replace('\\end{code}', '```')
string = re.sub(r"(.)```", r"\1\n```", string)
result = ''
is_code = False
for line in string.split('\n'):
if line.lstrip(' ').startswith('```'):
is_code = not is_code
result += line
if is_code or line.startswith('|'): # Don't add an extra \n for tables or code
result += '\n'
else:
result += '\n\n'
result = result.strip()
if is_code:
result += '\n```' # Unfinished code block
# Unfinished list, like "\n1.". A |delete| string is added and then
# removed to force a <ol> or <ul> to be generated instead of a <p>.
if re.search(r'(\n\d+\.?|\n\*\s*)$', result):
delete_str = '|delete|'
if re.search(r'(\d+\.?)$', result) and not result.endswith('.'):
result += '.'
result = re.sub(r'(\n\d+\.?|\n\*\s*)$', r'\g<1> ' + delete_str, result)
html_output = markdown.markdown(result, extensions=['fenced_code', 'tables'])
pos = html_output.rfind(delete_str)
if pos > -1:
html_output = html_output[:pos] + html_output[pos + len(delete_str):]
else:
html_output = markdown.markdown(result, extensions=['fenced_code', 'tables'])
# Unescape code blocks
pattern = re.compile(r'<code[^>]*>(.*?)</code>', re.DOTALL)
html_output = pattern.sub(lambda x: html.unescape(x.group()), html_output)
return html_output
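# Quick illustrative example (not part of the original module): an unfinished
# code fence in a streaming reply is closed automatically before rendering, so
#   convert_to_markdown("```python\nprint('hi')")
# returns HTML containing a properly closed <pre><code> block rather than a
# stray paragraph of backticks.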
def generate_basic_html(string):
string = convert_to_markdown(string)
string = f'<style>{readable_css}</style><div class="readable-container">{string}</div>'
return string
def process_post(post, c):
t = post.split('\n')
number = t[0].split(' ')[1]
if len(t) > 1:
src = '\n'.join(t[1:])
else:
src = ''
src = re.sub('>', '&gt;', src)
src = re.sub('(&gt;&gt;[0-9]*)', '<span class="quote">\\1</span>', src)
src = re.sub('\n', '<br>\n', src)
src = f'<blockquote class="message_4chan">{src}\n'
src = f'<span class="name">Anonymous </span> <span class="number">No.{number}</span>\n{src}'
return src
def generate_4chan_html(f):
posts = []
post = ''
c = -2
for line in f.splitlines():
line += "\n"
if line == '-----\n':
continue
elif line.startswith('--- '):
c += 1
if post != '':
src = process_post(post, c)
posts.append(src)
post = line
else:
post += line
if post != '':
src = process_post(post, c)
posts.append(src)
for i in range(len(posts)):
if i == 0:
posts[i] = f'<div class="op">{posts[i]}</div>\n'
else:
posts[i] = f'<div class="reply">{posts[i]}</div>\n'
output = ''
output += f'<style>{_4chan_css}</style><div id="parent"><div id="container">'
for post in posts:
output += post
output += '</div></div>'
output = output.split('\n')
for i in range(len(output)):
output[i] = re.sub(r'^(&gt;(.*?)(<br>|</div>))', r'<span class="greentext">\1</span>', output[i])
output[i] = re.sub(r'^<blockquote class="message_4chan">(&gt;(.*?)(<br>|</div>))', r'<blockquote class="message_4chan"><span class="greentext">\1</span>', output[i])
output = '\n'.join(output)
return output
def make_thumbnail(image):
image = image.resize((350, round(image.size[1] / image.size[0] * 350)), Image.Resampling.LANCZOS)
if image.size[1] > 470:
image = ImageOps.fit(image, (350, 470), Image.LANCZOS)
return image
def get_image_cache(path):
cache_folder = Path(shared.args.disk_cache_dir)
if not cache_folder.exists():
cache_folder.mkdir()
mtime = os.stat(path).st_mtime
if (path in image_cache and mtime != image_cache[path][0]) or (path not in image_cache):
img = make_thumbnail(Image.open(path))
old_p = Path(f'{cache_folder}/{path.name}_cache.png')
p = Path(f'{cache_folder}/cache_{path.name}.png')
if old_p.exists():
old_p.rename(p)
output_file = p
img.convert('RGB').save(output_file, format='PNG')
image_cache[path] = [mtime, output_file.as_posix()]
return image_cache[path][1]
def generate_instruct_html(history):
output = f'<style>{instruct_css}</style><div class="chat" id="chat"><div class="messages">'
for i, _row in enumerate(history):
row = [convert_to_markdown(entry) for entry in _row]
if row[0]: # don't display empty user messages
output += f"""
<div class="user-message">
<div class="text">
<div class="message-body">
{row[0]}
</div>
</div>
</div>
"""
output += f"""
<div class="assistant-message">
<div class="text">
<div class="message-body">
{row[1]}
</div>
</div>
</div>
"""
output += "</div></div>"
return output
def generate_cai_chat_html(history, name1, name2, style, character, reset_cache=False):
output = f'<style>{chat_styles[style]}</style><div class="chat" id="chat"><div class="messages">'
# We use ?character and ?time.time() to force the browser to reset caches
img_bot = f'<img src="file/cache/pfp_character_thumb.png?{character}" class="pfp_character">' if Path("cache/pfp_character_thumb.png").exists() else ''
img_me = f'<img src="file/cache/pfp_me.png?{time.time() if reset_cache else ""}">' if Path("cache/pfp_me.png").exists() else ''
for i, _row in enumerate(history):
row = [convert_to_markdown(entry) for entry in _row]
if row[0]: # don't display empty user messages
output += f"""
<div class="message">
<div class="circle-you">
{img_me}
</div>
<div class="text">
<div class="username">
{name1}
</div>
<div class="message-body">
{row[0]}
</div>
</div>
</div>
"""
output += f"""
<div class="message">
<div class="circle-bot">
{img_bot}
</div>
<div class="text">
<div class="username">
{name2}
</div>
<div class="message-body">
{row[1]}
</div>
</div>
</div>
"""
output += "</div></div>"
return output
def generate_chat_html(history, name1, name2, reset_cache=False):
output = f'<style>{chat_styles["wpp"]}</style><div class="chat" id="chat"><div class="messages">'
for i, _row in enumerate(history):
row = [convert_to_markdown(entry) for entry in _row]
if row[0]: # don't display empty user messages
output += f"""
<div class="message">
<div class="text-you">
<div class="message-body">
{row[0]}
</div>
</div>
</div>
"""
output += f"""
<div class="message">
<div class="text-bot">
<div class="message-body">
{row[1]}
</div>
</div>
</div>
"""
output += "</div></div>"
return output
def chat_html_wrapper(history, name1, name2, mode, style, character, reset_cache=False):
if mode == 'instruct':
return generate_instruct_html(history['visible'])
elif style == 'wpp':
return generate_chat_html(history['visible'], name1, name2)
else:
return generate_cai_chat_html(history['visible'], name1, name2, style, character, reset_cache)
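# Illustrative call (not part of the original module); the history dict, style
# and names below are placeholders in the shape the WebUI passes in:
#
#   history = {'visible': [["Hello", "Hi! How can I help?"]]}
#   html_out = chat_html_wrapper(history, name1='You', name2='Assistant',
#                                mode='chat', style='cai-chat',
#                                character='Example')
#
# mode='instruct' routes to generate_instruct_html, style='wpp' to
# generate_chat_html, and anything else to generate_cai_chat_html.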

View file

@ -0,0 +1,427 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/loaders.py
import functools
from collections import OrderedDict
import gradio as gr
from modules import shared
loaders_and_params = OrderedDict({
'Transformers': [
'cpu_memory',
'gpu_memory',
'load_in_8bit',
'bf16',
'cpu',
'disk',
'auto_devices',
'load_in_4bit',
'use_double_quant',
'quant_type',
'compute_dtype',
'trust_remote_code',
'no_use_fast',
'use_flash_attention_2',
'alpha_value',
'rope_freq_base',
'compress_pos_emb',
'disable_exllama',
'disable_exllamav2',
'transformers_info'
],
'llama.cpp': [
'n_ctx',
'n_gpu_layers',
'tensor_split',
'n_batch',
'threads',
'threads_batch',
'no_mmap',
'mlock',
'no_mul_mat_q',
'alpha_value',
'rope_freq_base',
'compress_pos_emb',
'cpu',
'numa',
'no_offload_kqv',
'tensorcores',
],
'llamacpp_HF': [
'n_ctx',
'n_gpu_layers',
'tensor_split',
'n_batch',
'threads',
'threads_batch',
'no_mmap',
'mlock',
'no_mul_mat_q',
'alpha_value',
'rope_freq_base',
'compress_pos_emb',
'cpu',
'numa',
'cfg_cache',
'trust_remote_code',
'no_use_fast',
'logits_all',
'no_offload_kqv',
'tensorcores',
'llamacpp_HF_info',
],
'ExLlamav2_HF': [
'gpu_split',
'max_seq_len',
'cfg_cache',
'no_flash_attn',
'num_experts_per_token',
'cache_8bit',
'alpha_value',
'compress_pos_emb',
'trust_remote_code',
'no_use_fast',
],
'ExLlamav2': [
'gpu_split',
'max_seq_len',
'no_flash_attn',
'num_experts_per_token',
'cache_8bit',
'alpha_value',
'compress_pos_emb',
'exllamav2_info',
],
'AutoGPTQ': [
'triton',
'no_inject_fused_attention',
'no_inject_fused_mlp',
'no_use_cuda_fp16',
'wbits',
'groupsize',
'desc_act',
'disable_exllama',
'disable_exllamav2',
'gpu_memory',
'cpu_memory',
'cpu',
'disk',
'auto_devices',
'trust_remote_code',
'no_use_fast',
'autogptq_info',
],
'BigDL-LLM': [
'load_in_4bit',
'load_in_low_bit',
'optimize_model',
'modules_to_not_convert',
'cpu_embedding',
'lightweight_bmm',
'trust_remote_code',
'use_cache',
],
'AutoAWQ': [
'cpu_memory',
'gpu_memory',
'auto_devices',
'max_seq_len',
'no_inject_fused_attention',
'trust_remote_code',
'no_use_fast',
],
'GPTQ-for-LLaMa': [
'wbits',
'groupsize',
'model_type',
'pre_layer',
'trust_remote_code',
'no_use_fast',
'gptq_for_llama_info',
],
'ctransformers': [
'n_ctx',
'n_gpu_layers',
'n_batch',
'threads',
'model_type',
'no_mmap',
'mlock'
],
'QuIP#': [
'trust_remote_code',
'no_use_fast',
'no_flash_attn',
'quipsharp_info',
],
'HQQ': [
'hqq_backend',
'trust_remote_code',
'no_use_fast',
]
})
def transformers_samplers():
return {
'temperature',
'temperature_last',
'dynamic_temperature',
'dynamic_temperature_low',
'top_p',
'min_p',
'top_k',
'typical_p',
'epsilon_cutoff',
'eta_cutoff',
'tfs',
'top_a',
'repetition_penalty',
'presence_penalty',
'frequency_penalty',
'repetition_penalty_range',
'encoder_repetition_penalty',
'no_repeat_ngram_size',
'min_length',
'seed',
'do_sample',
'penalty_alpha',
'num_beams',
'length_penalty',
'early_stopping',
'mirostat_mode',
'mirostat_tau',
'mirostat_eta',
'grammar_file_row',
'grammar_string',
'guidance_scale',
'negative_prompt',
'ban_eos_token',
'custom_token_bans',
'add_bos_token',
'skip_special_tokens',
'auto_max_new_tokens',
}
loaders_samplers = {
'Transformers': transformers_samplers(),
'AutoGPTQ': transformers_samplers(),
'GPTQ-for-LLaMa': transformers_samplers(),
'AutoAWQ': transformers_samplers(),
'QuIP#': transformers_samplers(),
'HQQ': transformers_samplers(),
'BigDL-LLM': transformers_samplers(),
'ExLlamav2': {
'temperature',
'top_p',
'min_p',
'top_k',
'typical_p',
'tfs',
'repetition_penalty',
'repetition_penalty_range',
'seed',
'mirostat_mode',
'mirostat_tau',
'mirostat_eta',
'ban_eos_token',
'add_bos_token',
'custom_token_bans',
'skip_special_tokens',
'auto_max_new_tokens',
},
'ExLlamav2_HF': {
'temperature',
'temperature_last',
'dynamic_temperature',
'dynamic_temperature_low',
'top_p',
'min_p',
'top_k',
'typical_p',
'epsilon_cutoff',
'eta_cutoff',
'tfs',
'top_a',
'repetition_penalty',
'presence_penalty',
'frequency_penalty',
'repetition_penalty_range',
'encoder_repetition_penalty',
'no_repeat_ngram_size',
'min_length',
'seed',
'do_sample',
'mirostat_mode',
'mirostat_tau',
'mirostat_eta',
'grammar_file_row',
'grammar_string',
'guidance_scale',
'negative_prompt',
'ban_eos_token',
'custom_token_bans',
'add_bos_token',
'skip_special_tokens',
'auto_max_new_tokens',
},
'llama.cpp': {
'temperature',
'top_p',
'min_p',
'top_k',
'typical_p',
'tfs',
'repetition_penalty',
'presence_penalty',
'frequency_penalty',
'seed',
'mirostat_mode',
'mirostat_tau',
'mirostat_eta',
'grammar_file_row',
'grammar_string',
'ban_eos_token',
'custom_token_bans',
},
'llamacpp_HF': {
'temperature',
'temperature_last',
'dynamic_temperature',
'dynamic_temperature_low',
'top_p',
'min_p',
'top_k',
'typical_p',
'epsilon_cutoff',
'eta_cutoff',
'tfs',
'top_a',
'repetition_penalty',
'presence_penalty',
'frequency_penalty',
'repetition_penalty_range',
'encoder_repetition_penalty',
'no_repeat_ngram_size',
'min_length',
'seed',
'do_sample',
'mirostat_mode',
'mirostat_tau',
'mirostat_eta',
'grammar_file_row',
'grammar_string',
'guidance_scale',
'negative_prompt',
'ban_eos_token',
'custom_token_bans',
'add_bos_token',
'skip_special_tokens',
'auto_max_new_tokens',
},
'ctransformers': {
'temperature',
'top_p',
'top_k',
'repetition_penalty',
'repetition_penalty_range',
},
}
loaders_model_types = {
'GPTQ-for-LLaMa': [
"None",
"llama",
"opt",
"gptj"
],
'ctransformers': [
"None",
"gpt2",
"gptj",
"gptneox",
"llama",
"mpt",
"dollyv2",
"replit",
"starcoder",
"gptbigcode",
"falcon"
],
}
@functools.cache
def list_all_samplers():
all_samplers = set()
for k in loaders_samplers:
for sampler in loaders_samplers[k]:
all_samplers.add(sampler)
return sorted(all_samplers)
def blacklist_samplers(loader):
all_samplers = list_all_samplers()
if loader == 'All':
return [gr.update(visible=True) for sampler in all_samplers]
else:
return [gr.update(visible=True) if sampler in loaders_samplers[loader] else gr.update(visible=False) for sampler in all_samplers]
def get_model_types(loader):
if loader in loaders_model_types:
return loaders_model_types[loader]
return ["None"]
def get_gpu_memory_keys():
return [k for k in shared.gradio if k.startswith('gpu_memory')]
@functools.cache
def get_all_params():
all_params = set()
for k in loaders_and_params:
for el in loaders_and_params[k]:
all_params.add(el)
if 'gpu_memory' in all_params:
all_params.remove('gpu_memory')
for k in get_gpu_memory_keys():
all_params.add(k)
return sorted(all_params)
def make_loader_params_visible(loader):
params = []
all_params = get_all_params()
if loader in loaders_and_params:
params = loaders_and_params[loader]
if 'gpu_memory' in params:
params.remove('gpu_memory')
params += get_gpu_memory_keys()
return [gr.update(visible=True) if k in params else gr.update(visible=False) for k in all_params]
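# Illustrative example (not part of the original module):
#   updates = make_loader_params_visible('BigDL-LLM')
# returns one gr.update(...) per entry of get_all_params(), with visible=True
# only for the fields listed under 'BigDL-LLM' above (load_in_4bit,
# load_in_low_bit, optimize_model, cpu_embedding, ...); the UI can wire this
# to the loader dropdown's change event.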

View file

@ -0,0 +1,86 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/logging_colors.py
import logging
logger = logging.getLogger('text-generation-webui')
def setup_logging():
'''
Copied from: https://github.com/vladmandic/automatic
All credits to vladmandic.
'''
class RingBuffer(logging.StreamHandler):
def __init__(self, capacity):
super().__init__()
self.capacity = capacity
self.buffer = []
self.formatter = logging.Formatter('{ "asctime":"%(asctime)s", "created":%(created)f, "facility":"%(name)s", "pid":%(process)d, "tid":%(thread)d, "level":"%(levelname)s", "module":"%(module)s", "func":"%(funcName)s", "msg":"%(message)s" }')
def emit(self, record):
msg = self.format(record)
# self.buffer.append(json.loads(msg))
self.buffer.append(msg)
if len(self.buffer) > self.capacity:
self.buffer.pop(0)
def get(self):
return self.buffer
from rich.console import Console
from rich.logging import RichHandler
from rich.pretty import install as pretty_install
from rich.theme import Theme
from rich.traceback import install as traceback_install
level = logging.DEBUG
logger.setLevel(logging.DEBUG)  # the logger itself always runs at debug level; individual handlers set their own levels
console = Console(log_time=True, log_time_format='%H:%M:%S-%f', theme=Theme({
"traceback.border": "black",
"traceback.border.syntax_error": "black",
"inspect.value.border": "black",
}))
logging.basicConfig(level=logging.ERROR, format='%(asctime)s | %(name)s | %(levelname)s | %(module)s | %(message)s', handlers=[logging.NullHandler()]) # redirect default logger to null
pretty_install(console=console)
traceback_install(console=console, extra_lines=1, max_frames=10, width=console.width, word_wrap=False, indent_guides=False, suppress=[])
while logger.hasHandlers() and len(logger.handlers) > 0:
logger.removeHandler(logger.handlers[0])
# handlers
rh = RichHandler(show_time=True, omit_repeated_times=False, show_level=True, show_path=False, markup=False, rich_tracebacks=True, log_time_format='%H:%M:%S-%f', level=level, console=console)
rh.setLevel(level)
logger.addHandler(rh)
rb = RingBuffer(100) # 100 entries default in log ring buffer
rb.setLevel(level)
logger.addHandler(rb)
logger.buffer = rb.buffer
# overrides
logging.getLogger("urllib3").setLevel(logging.ERROR)
logging.getLogger("httpx").setLevel(logging.ERROR)
logging.getLogger("diffusers").setLevel(logging.ERROR)
logging.getLogger("torch").setLevel(logging.ERROR)
logging.getLogger("lycoris").handlers = logger.handlers
setup_logging()
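# Illustrative usage (not part of the original module): after import, the
# module-level logger is fully configured, and the ring buffer handler keeps
# the last 100 formatted records:
#
#   from modules.logging_colors import logger
#   logger.info("model loaded")
#   print(logger.buffer[-1])   # most recent JSON-formatted log line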

View file

@ -0,0 +1,93 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/logits.py
import torch
from transformers import is_torch_xpu_available
from modules import sampler_hijack, shared
from modules.logging_colors import logger
from modules.text_generation import generate_reply
global_scores = None
def get_next_logits(prompt, state, use_samplers, previous, top_logits=25, return_dict=False):
if shared.model is None:
logger.error("No model is loaded! Select one in the Model tab.")
return 'Error: No model is loaded! Select one in the Model tab.', previous
is_non_hf_exllamav2 = shared.model.__class__.__name__ == 'Exllamav2Model'
is_non_hf_llamacpp = shared.model.__class__.__name__ == 'LlamaCppModel'
if use_samplers:
if any([is_non_hf_exllamav2, is_non_hf_llamacpp]):
logger.error("Sampler hijacking is not supported non-Huggingface loaders.")
# sampling is all done in c for exllama, so it is really hard to hijack
# it should be possible to hijack llamacpp sampler by hijacking all their sampling methods,
# but it is not implemented yet
return 'Error: Sampler hijacking is not supported non-Huggingface loaders. Please disable the "Use samplers" option.', previous
state['max_new_tokens'] = 1
state['auto_max_new_tokens'] = False
for _ in generate_reply(prompt, state):
pass
scores = sampler_hijack.global_scores[-1]
else:
if is_non_hf_exllamav2:
if is_torch_xpu_available():
tokens = shared.tokenizer.encode(prompt).to("xpu:0")
else:
tokens = shared.tokenizer.encode(prompt).cuda()
scores = shared.model.get_logits(tokens)[-1][-1]
elif is_non_hf_llamacpp:
tokens = shared.tokenizer.encode(prompt)
scores = shared.model.get_logits(tokens)[-1][-1]
else:
if is_torch_xpu_available():
tokens = shared.tokenizer.encode(prompt, return_tensors='pt').to("xpu:0")
else:
tokens = shared.tokenizer.encode(prompt, return_tensors='pt').cuda()
output = shared.model(input_ids=tokens)
scores = output['logits'][-1][-1]
probs = torch.softmax(scores, dim=-1, dtype=torch.float)
topk_values, topk_indices = torch.topk(probs, k=top_logits, largest=True, sorted=True)
if is_non_hf_llamacpp:
topk_indices = [i.expand((1, 1)) for i in topk_indices]
if hasattr(shared.tokenizer, 'convert_ids_to_tokens'):
tokens = [shared.tokenizer.convert_ids_to_tokens(int(i)) for i in topk_indices]
else:
tokens = [shared.tokenizer.decode(i) for i in topk_indices]
if return_dict:
topk_values = [float(i) for i in topk_values]
output = {}
for row in list(zip(topk_values, tokens)):
output[row[1]] = row[0]
return output
else:
topk_values = [f"{float(i):.5f}" for i in topk_values]
output = ''
for row in list(zip(topk_values, tokens)):
output += f"{row[0]} - {repr(row[1])}\n"
return output, previous
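# Illustrative call (not part of the original module); `state` is the usual
# WebUI generation-settings dict and is treated here as a placeholder:
#
#   text, previous = get_next_logits("The capital of France is", state,
#                                    use_samplers=False, previous='')
#   # text lists the top-25 candidates, one "probability - 'token'" pair per line
#
#   probs = get_next_logits("The capital of France is", state,
#                           use_samplers=False, previous='',
#                           return_dict=True)   # {token_str: probability}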

View file

@ -0,0 +1,111 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/metadata_gguf.py
import struct
from enum import IntEnum
class GGUFValueType(IntEnum):
UINT8 = 0
INT8 = 1
UINT16 = 2
INT16 = 3
UINT32 = 4
INT32 = 5
FLOAT32 = 6
BOOL = 7
STRING = 8
ARRAY = 9
UINT64 = 10
INT64 = 11
FLOAT64 = 12
_simple_value_packing = {
GGUFValueType.UINT8: "<B",
GGUFValueType.INT8: "<b",
GGUFValueType.UINT16: "<H",
GGUFValueType.INT16: "<h",
GGUFValueType.UINT32: "<I",
GGUFValueType.INT32: "<i",
GGUFValueType.FLOAT32: "<f",
GGUFValueType.UINT64: "<Q",
GGUFValueType.INT64: "<q",
GGUFValueType.FLOAT64: "<d",
GGUFValueType.BOOL: "?",
}
value_type_info = {
GGUFValueType.UINT8: 1,
GGUFValueType.INT8: 1,
GGUFValueType.UINT16: 2,
GGUFValueType.INT16: 2,
GGUFValueType.UINT32: 4,
GGUFValueType.INT32: 4,
GGUFValueType.FLOAT32: 4,
GGUFValueType.UINT64: 8,
GGUFValueType.INT64: 8,
GGUFValueType.FLOAT64: 8,
GGUFValueType.BOOL: 1,
}
def get_single(value_type, file):
if value_type == GGUFValueType.STRING:
value_length = struct.unpack("<Q", file.read(8))[0]
value = file.read(value_length)
try:
value = value.decode('utf-8')
except:
pass
else:
type_str = _simple_value_packing.get(value_type)
bytes_length = value_type_info.get(value_type)
value = struct.unpack(type_str, file.read(bytes_length))[0]
return value
def load_metadata(fname):
metadata = {}
with open(fname, 'rb') as file:
GGUF_MAGIC = struct.unpack("<I", file.read(4))[0]
GGUF_VERSION = struct.unpack("<I", file.read(4))[0]
ti_data_count = struct.unpack("<Q", file.read(8))[0]
kv_data_count = struct.unpack("<Q", file.read(8))[0]
if GGUF_VERSION == 1:
raise Exception('You are using an outdated GGUF, please download a new one.')
for i in range(kv_data_count):
key_length = struct.unpack("<Q", file.read(8))[0]
key = file.read(key_length)
value_type = GGUFValueType(struct.unpack("<I", file.read(4))[0])
if value_type == GGUFValueType.ARRAY:
ltype = GGUFValueType(struct.unpack("<I", file.read(4))[0])
length = struct.unpack("<Q", file.read(8))[0]
arr = [get_single(ltype, file) for _ in range(length)]
metadata[key.decode()] = arr
else:
value = get_single(value_type, file)
metadata[key.decode()] = value
return metadata
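# Illustrative usage (not part of the original module); the file path is a
# placeholder for any GGUF model on disk:
#
#   metadata = load_metadata('models/llama-2-7b-chat.Q4_K_M.gguf')
#   n_ctx = metadata.get('llama.context_length')
#   template = metadata.get('tokenizer.chat_template')
#
# Keys follow the GGUF key-value convention; models_settings.py reads
# llama.context_length, llama.rope.freq_base and tokenizer.chat_template
# from this dict to pre-fill the loader settings.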

View file

@ -0,0 +1,510 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/models.py
import gc
import logging
import os
import re
import time
import traceback
from pathlib import Path
import torch
import transformers
from accelerate import infer_auto_device_map, init_empty_weights
from accelerate.utils import is_ccl_available, is_xpu_available
from transformers import (
AutoConfig,
AutoModel,
AutoModelForCausalLM,
AutoModelForSeq2SeqLM,
AutoTokenizer,
BitsAndBytesConfig,
# GPTQConfig  # note: the disable_exllama/disable_exllamav2 branch below references GPTQConfig; re-enable this import if that path is needed
)
import modules.shared as shared
from modules import RoPE, sampler_hijack
from modules.logging_colors import logger
from modules.models_settings import get_model_metadata
from modules.relative_imports import RelativeImport
transformers.logging.set_verbosity_error()
local_rank = None
if shared.args.deepspeed:
import deepspeed
from transformers.deepspeed import (
HfDeepSpeedConfig,
is_deepspeed_zero3_enabled
)
from modules.deepspeed_parameters import generate_ds_config
# Distributed setup
local_rank = shared.args.local_rank if shared.args.local_rank is not None else int(os.getenv("LOCAL_RANK", "0"))
world_size = int(os.getenv("WORLD_SIZE", "1"))
if is_xpu_available() and is_ccl_available():
torch.xpu.set_device(local_rank)
deepspeed.init_distributed(backend="ccl")
else:
torch.cuda.set_device(local_rank)
deepspeed.init_distributed()
ds_config = generate_ds_config(shared.args.bf16, 1 * world_size, shared.args.nvme_offload_dir)
dschf = HfDeepSpeedConfig(ds_config) # Keep this object alive for the Transformers integration
sampler_hijack.hijack_samplers()
def load_model(model_name, loader=None):
logger.info(f"Loading {model_name}")
t0 = time.time()
shared.is_seq2seq = False
shared.model_name = model_name
load_func_map = {
'Transformers': huggingface_loader,
'AutoGPTQ': AutoGPTQ_loader,
'GPTQ-for-LLaMa': GPTQ_loader,
'llama.cpp': llamacpp_loader,
'llamacpp_HF': llamacpp_HF_loader,
'ExLlamav2': ExLlamav2_loader,
'ExLlamav2_HF': ExLlamav2_HF_loader,
'ctransformers': ctransformers_loader,
'AutoAWQ': AutoAWQ_loader,
'QuIP#': QuipSharp_loader,
'HQQ': HQQ_loader,
'BigDL-LLM': bigdl_llm_loader,
}
metadata = get_model_metadata(model_name)
if loader is None:
if shared.args.loader is not None:
loader = shared.args.loader
else:
loader = metadata['loader']
if loader is None:
logger.error('The path to the model does not exist. Exiting.')
raise ValueError
shared.args.loader = loader
output = load_func_map[loader](model_name)
if type(output) is tuple:
model, tokenizer = output
else:
model = output
if model is None:
return None, None
else:
tokenizer = load_tokenizer(model_name, model)
shared.settings.update({k: v for k, v in metadata.items() if k in shared.settings})
if loader.lower().startswith('exllama'):
shared.settings['truncation_length'] = shared.args.max_seq_len
elif loader in ['llama.cpp', 'llamacpp_HF', 'ctransformers']:
shared.settings['truncation_length'] = shared.args.n_ctx
logger.info(f"LOADER: {loader}")
logger.info(f"TRUNCATION LENGTH: {shared.settings['truncation_length']}")
logger.info(f"INSTRUCTION TEMPLATE: {metadata['instruction_template']}")
logger.info(f"Loaded the model in {(time.time()-t0):.2f} seconds.")
return model, tokenizer
def load_tokenizer(model_name, model):
tokenizer = None
path_to_model = Path(f"{shared.args.model_dir}/{model_name}/")
if any(s in model_name.lower() for s in ['gpt-4chan', 'gpt4chan']) and Path(f"{shared.args.model_dir}/gpt-j-6B/").exists():
tokenizer = AutoTokenizer.from_pretrained(Path(f"{shared.args.model_dir}/gpt-j-6B/"))
elif path_to_model.exists():
if shared.args.no_use_fast:
logger.info('Loading the tokenizer with use_fast=False.')
tokenizer = AutoTokenizer.from_pretrained(
path_to_model,
trust_remote_code=shared.args.trust_remote_code,
use_fast=not shared.args.no_use_fast
)
return tokenizer
def huggingface_loader(model_name):
path_to_model = Path(f'{shared.args.model_dir}/{model_name}')
params = {
'low_cpu_mem_usage': True,
'trust_remote_code': shared.args.trust_remote_code,
'torch_dtype': torch.bfloat16 if shared.args.bf16 else torch.float16,
'use_safetensors': True if shared.args.force_safetensors else None
}
if shared.args.use_flash_attention_2:
params['use_flash_attention_2'] = True
config = AutoConfig.from_pretrained(path_to_model, trust_remote_code=params['trust_remote_code'])
if 'chatglm' in model_name.lower():
LoaderClass = AutoModel
else:
if config.to_dict().get('is_encoder_decoder', False):
LoaderClass = AutoModelForSeq2SeqLM
shared.is_seq2seq = True
else:
LoaderClass = AutoModelForCausalLM
# Load the model in simple 16-bit mode by default
if not any([shared.args.cpu, shared.args.load_in_8bit, shared.args.load_in_4bit, shared.args.auto_devices, shared.args.disk, shared.args.deepspeed, shared.args.gpu_memory is not None, shared.args.cpu_memory is not None, shared.args.compress_pos_emb > 1, shared.args.alpha_value > 1, shared.args.disable_exllama, shared.args.disable_exllamav2]):
model = LoaderClass.from_pretrained(path_to_model, **params)
if torch.backends.mps.is_available():
device = torch.device('mps')
model = model.to(device)
elif is_xpu_available():
device = torch.device("xpu")
model = model.to(device)
else:
model = model.cuda()
# DeepSpeed ZeRO-3
elif shared.args.deepspeed:
model = LoaderClass.from_pretrained(path_to_model, torch_dtype=params['torch_dtype'])
model = deepspeed.initialize(model=model, config_params=ds_config, model_parameters=None, optimizer=None, lr_scheduler=None)[0]
model.module.eval() # Inference
logger.info(f'DeepSpeed ZeRO-3 is enabled: {is_deepspeed_zero3_enabled()}')
# Load with quantization and/or offloading
else:
if not any((shared.args.cpu, torch.cuda.is_available(), is_xpu_available(), torch.backends.mps.is_available())):
logger.warning('torch.cuda.is_available() and is_xpu_available() returned False. This means that no GPU has been detected. Falling back to CPU mode.')
shared.args.cpu = True
if shared.args.cpu:
params['torch_dtype'] = torch.float32
else:
params['device_map'] = 'auto'
params['max_memory'] = get_max_memory_dict()
if shared.args.load_in_4bit:
# See https://github.com/huggingface/transformers/pull/23479/files
# and https://huggingface.co/blog/4bit-transformers-bitsandbytes
quantization_config_params = {
'load_in_4bit': True,
'bnb_4bit_compute_dtype': eval("torch.{}".format(shared.args.compute_dtype)) if shared.args.compute_dtype in ["bfloat16", "float16", "float32"] else None,
'bnb_4bit_quant_type': shared.args.quant_type,
'bnb_4bit_use_double_quant': shared.args.use_double_quant,
}
logger.info('Using the following 4-bit params: ' + str(quantization_config_params))
params['quantization_config'] = BitsAndBytesConfig(**quantization_config_params)
elif shared.args.load_in_8bit:
if any((shared.args.auto_devices, shared.args.gpu_memory)):
params['quantization_config'] = BitsAndBytesConfig(load_in_8bit=True, llm_int8_enable_fp32_cpu_offload=True)
else:
params['quantization_config'] = BitsAndBytesConfig(load_in_8bit=True)
if params['max_memory'] is not None:
with init_empty_weights():
model = LoaderClass.from_config(config, trust_remote_code=params['trust_remote_code'])
model.tie_weights()
params['device_map'] = infer_auto_device_map(
model,
dtype=torch.int8,
max_memory=params['max_memory'],
no_split_module_classes=model._no_split_modules
)
if shared.args.disk:
params['offload_folder'] = shared.args.disk_cache_dir
if shared.args.disable_exllama or shared.args.disable_exllamav2:
try:
gptq_config = GPTQConfig(
bits=config.quantization_config.get('bits', 4),
disable_exllama=shared.args.disable_exllama,
disable_exllamav2=shared.args.disable_exllamav2,
)
params['quantization_config'] = gptq_config
logger.info(f'Loading with disable_exllama={shared.args.disable_exllama} and disable_exllamav2={shared.args.disable_exllamav2}.')
except:
exc = traceback.format_exc()
logger.error('Failed to disable exllama. Does the config.json for this model contain the necessary quantization info?')
print(exc)
if shared.args.compress_pos_emb > 1:
params['rope_scaling'] = {'type': 'linear', 'factor': shared.args.compress_pos_emb}
elif shared.args.alpha_value > 1:
params['rope_scaling'] = {'type': 'dynamic', 'factor': RoPE.get_alpha_value(shared.args.alpha_value, shared.args.rope_freq_base)}
model = LoaderClass.from_pretrained(path_to_model, **params)
return model
def llamacpp_loader(model_name):
from modules.llamacpp_model import LlamaCppModel
path = Path(f'{shared.args.model_dir}/{model_name}')
if path.is_file():
model_file = path
else:
model_file = list(Path(f'{shared.args.model_dir}/{model_name}').glob('*.gguf'))[0]
logger.info(f"llama.cpp weights detected: {model_file}")
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
return model, tokenizer
def llamacpp_HF_loader(model_name):
from modules.llamacpp_hf import LlamacppHF
for fname in [model_name, "oobabooga_llama-tokenizer", "llama-tokenizer"]:
path = Path(f'{shared.args.model_dir}/{fname}')
if all((path / file).exists() for file in ['tokenizer_config.json', 'special_tokens_map.json', 'tokenizer.model']):
logger.info(f'Using tokenizer from: {path}')
break
else:
logger.error("Could not load the model because a tokenizer in transformers format was not found. Please download oobabooga/llama-tokenizer.")
return None, None
if shared.args.no_use_fast:
logger.info('Loading the tokenizer with use_fast=False.')
tokenizer = AutoTokenizer.from_pretrained(
path,
trust_remote_code=shared.args.trust_remote_code,
use_fast=not shared.args.no_use_fast
)
model = LlamacppHF.from_pretrained(model_name)
return model, tokenizer
def ctransformers_loader(model_name):
from modules.ctransformers_model import CtransformersModel
path = Path(f'{shared.args.model_dir}/{model_name}')
ctrans = CtransformersModel()
if ctrans.model_type_is_auto():
model_file = path
else:
if path.is_file():
model_file = path
else:
entries = Path(f'{shared.args.model_dir}/{model_name}')
gguf = list(entries.glob('*.gguf'))
bin = list(entries.glob('*.bin'))
if len(gguf) > 0:
model_file = gguf[0]
elif len(bin) > 0:
model_file = bin[0]
else:
logger.error("Could not find a model for ctransformers.")
return None, None
logger.info(f'ctransformers weights detected: {model_file}')
model, tokenizer = ctrans.from_pretrained(model_file)
return model, tokenizer
def AutoAWQ_loader(model_name):
from awq import AutoAWQForCausalLM
model_dir = Path(f'{shared.args.model_dir}/{model_name}')
model = AutoAWQForCausalLM.from_quantized(
quant_path=model_dir,
max_new_tokens=shared.args.max_seq_len,
trust_remote_code=shared.args.trust_remote_code,
fuse_layers=not shared.args.no_inject_fused_attention,
max_memory=get_max_memory_dict(),
batch_size=1,
safetensors=any(model_dir.glob('*.safetensors')),
)
return model
def bigdl_llm_loader(model_name):
from bigdl.llm.transformers import AutoModelForCausalLM, AutoModel, AutoModelForSeq2SeqLM
path_to_model = Path(f'{shared.args.model_dir}/{model_name}')
config = AutoConfig.from_pretrained(path_to_model, trust_remote_code=shared.args.trust_remote_code)
if 'chatglm' in model_name.lower():
LoaderClass = AutoModel
else:
if config.to_dict().get('is_encoder_decoder', False):
LoaderClass = AutoModelForSeq2SeqLM
shared.is_seq2seq = True
else:
LoaderClass = AutoModelForCausalLM
model = LoaderClass.from_pretrained(
path_to_model,
load_in_4bit=shared.args.load_in_4bit,
load_in_low_bit=shared.args.load_in_low_bit,
optimize_model=shared.args.optimize_model,
modules_to_not_convert=shared.args.modules_to_not_convert,
cpu_embedding=shared.args.cpu_embedding,
lightweight_bmm=shared.args.lightweight_bmm,
trust_remote_code=shared.args.trust_remote_code,
use_cache=shared.args.use_cache,
)
if shared.args.device == "GPU":
import intel_extension_for_pytorch
model = model.to("xpu")
tokenizer = AutoTokenizer.from_pretrained(path_to_model, trust_remote_code=shared.args.trust_remote_code)
return model, tokenizer
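# For reference, a minimal stand-alone sketch of what this loader does
# (illustrative only; the model path and precision below are placeholders):
#
#   from bigdl.llm.transformers import AutoModelForCausalLM
#   from transformers import AutoTokenizer
#
#   model = AutoModelForCausalLM.from_pretrained(
#       'models/Llama-2-7b-chat-hf',
#       load_in_low_bit='sym_int4',   # or load_in_4bit=True
#       optimize_model=True,
#       trust_remote_code=True,
#       use_cache=True,
#   )
#   model = model.to('xpu')           # Intel GPU only; requires intel_extension_for_pytorch
#   tokenizer = AutoTokenizer.from_pretrained('models/Llama-2-7b-chat-hf',
#                                             trust_remote_code=True)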
def QuipSharp_loader(model_name):
try:
with RelativeImport("repositories/quip-sharp"):
from lib.utils.unsafe_import import model_from_hf_path
except:
logger.error(
"\nQuIP# has not been found. It must be installed manually for now.\n"
"For instructions on how to do that, please consult:\n"
"https://github.com/oobabooga/text-generation-webui/pull/4803\n"
)
return None, None
# This fixes duplicate logging messages after the import above.
handlers = logging.getLogger().handlers
if len(handlers) > 1:
logging.getLogger().removeHandler(handlers[1])
model_dir = Path(f'{shared.args.model_dir}/{model_name}')
if not all((model_dir / file).exists() for file in ['tokenizer_config.json', 'special_tokens_map.json', 'tokenizer.model']):
logger.error(f"Could not load the model because the tokenizer files could not be found in the model folder. Please download the following files from the original (unquantized) model into {model_dir}: special_tokens_map.json, tokenizer.json, tokenizer.model, tokenizer_config.json.")
return None, None
model, model_str = model_from_hf_path(
model_dir,
use_cuda_graph=False,
use_flash_attn=not shared.args.no_flash_attn
)
return model
def GPTQ_loader(model_name):
# Monkey patch
if shared.args.monkey_patch:
logger.warning("Applying the monkey patch for using LoRAs with GPTQ models. It may cause undefined behavior outside its intended scope.")
from modules.monkey_patch_gptq_lora import load_model_llama
model, _ = load_model_llama(model_name)
# No monkey patch
else:
import modules.GPTQ_loader
model = modules.GPTQ_loader.load_quantized(model_name)
return model
def AutoGPTQ_loader(model_name):
import modules.AutoGPTQ_loader
return modules.AutoGPTQ_loader.load_quantized(model_name)
def ExLlamav2_loader(model_name):
from modules.exllamav2 import Exllamav2Model
model, tokenizer = Exllamav2Model.from_pretrained(model_name)
return model, tokenizer
def ExLlamav2_HF_loader(model_name):
from modules.exllamav2_hf import Exllamav2HF
return Exllamav2HF.from_pretrained(model_name)
def HQQ_loader(model_name):
from hqq.core.quantize import HQQBackend, HQQLinear
from hqq.engine.hf import HQQModelForCausalLM
logger.info(f"Loading HQQ model with backend: {shared.args.hqq_backend}")
model_dir = Path(f'{shared.args.model_dir}/{model_name}')
model = HQQModelForCausalLM.from_quantized(str(model_dir))
HQQLinear.set_backend(getattr(HQQBackend, shared.args.hqq_backend))
return model
def get_max_memory_dict():
max_memory = {}
max_cpu_memory = shared.args.cpu_memory.strip() if shared.args.cpu_memory is not None else '99GiB'
if shared.args.gpu_memory:
memory_map = list(map(lambda x: x.strip(), shared.args.gpu_memory))
for i in range(len(memory_map)):
max_memory[i] = f'{memory_map[i]}GiB' if not re.match('.*ib$', memory_map[i].lower()) else memory_map[i]
max_memory['cpu'] = f'{max_cpu_memory}GiB' if not re.match('.*ib$', max_cpu_memory.lower()) else max_cpu_memory
# If --auto-devices is provided standalone, try to get a reasonable value
# for the maximum memory of device :0
elif shared.args.auto_devices:
if is_xpu_available():
total_mem = (torch.xpu.get_device_properties(0).total_memory / (1024 * 1024))
else:
total_mem = (torch.cuda.get_device_properties(0).total_memory / (1024 * 1024))
suggestion = round((total_mem - 1000) / 1000) * 1000
if total_mem - suggestion < 800:
suggestion -= 1000
suggestion = int(round(suggestion / 1000))
logger.warning(f"Auto-assiging --gpu-memory {suggestion} for your GPU to try to prevent out-of-memory errors. You can manually set other values.")
max_memory[0] = f'{suggestion}GiB'
max_memory['cpu'] = f'{max_cpu_memory}GiB' if not re.match('.*ib$', max_cpu_memory.lower()) else max_cpu_memory
return max_memory if len(max_memory) > 0 else None
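# Illustrative result (not part of the original module): with
# --gpu-memory 8 --cpu-memory 32 this function returns
#   {0: '8GiB', 'cpu': '32GiB'}
# while with no memory flags and no --auto-devices it returns None, letting
# transformers pick its own device map.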
def clear_torch_cache():
gc.collect()
if not shared.args.cpu:
if is_xpu_available():
torch.xpu.empty_cache()
else:
torch.cuda.empty_cache()
def unload_model():
shared.model = shared.tokenizer = None
shared.model_name = 'None'
shared.lora_names = []
shared.model_dirty_from_training = False
clear_torch_cache()
def reload_model():
unload_model()
shared.model, shared.tokenizer = load_model(shared.model_name)

View file

@ -0,0 +1,288 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/models_settings.py
import json
import re
from pathlib import Path
import yaml
from modules import chat, loaders, metadata_gguf, shared, ui
def get_fallback_settings():
return {
'wbits': 'None',
'groupsize': 'None',
'desc_act': False,
'model_type': 'None',
'max_seq_len': 2048,
'n_ctx': 2048,
'rope_freq_base': 0,
'compress_pos_emb': 1,
'truncation_length': shared.settings['truncation_length'],
'skip_special_tokens': shared.settings['skip_special_tokens'],
'custom_stopping_strings': shared.settings['custom_stopping_strings'],
}
def get_model_metadata(model):
model_settings = {}
# Get settings from models/config.yaml and models/config-user.yaml
settings = shared.model_config
for pat in settings:
if re.match(pat.lower(), model.lower()):
for k in settings[pat]:
model_settings[k] = settings[pat][k]
path = Path(f'{shared.args.model_dir}/{model}/config.json')
if path.exists():
hf_metadata = json.loads(open(path, 'r', encoding='utf-8').read())
else:
hf_metadata = None
if 'loader' not in model_settings:
if hf_metadata is not None and 'quip_params' in hf_metadata:
loader = 'QuIP#'
else:
loader = infer_loader(model, model_settings)
model_settings['loader'] = loader
# GGUF metadata
if model_settings['loader'] in ['llama.cpp', 'llamacpp_HF', 'ctransformers']:
path = Path(f'{shared.args.model_dir}/{model}')
if path.is_file():
model_file = path
else:
model_file = list(path.glob('*.gguf'))[0]
metadata = metadata_gguf.load_metadata(model_file)
if 'llama.context_length' in metadata:
model_settings['n_ctx'] = metadata['llama.context_length']
if 'llama.rope.scale_linear' in metadata:
model_settings['compress_pos_emb'] = metadata['llama.rope.scale_linear']
if 'llama.rope.freq_base' in metadata:
model_settings['rope_freq_base'] = metadata['llama.rope.freq_base']
if 'tokenizer.chat_template' in metadata:
template = metadata['tokenizer.chat_template']
eos_token = metadata['tokenizer.ggml.tokens'][metadata['tokenizer.ggml.eos_token_id']]
bos_token = metadata['tokenizer.ggml.tokens'][metadata['tokenizer.ggml.bos_token_id']]
template = template.replace('eos_token', "'{}'".format(eos_token))
template = template.replace('bos_token', "'{}'".format(bos_token))
template = re.sub(r'raise_exception\([^)]*\)', "''", template)
model_settings['instruction_template'] = 'Custom (obtained from model metadata)'
model_settings['instruction_template_str'] = template
else:
# Transformers metadata
if hf_metadata is not None:
metadata = json.loads(open(path, 'r', encoding='utf-8').read())
if 'max_position_embeddings' in metadata:
model_settings['truncation_length'] = metadata['max_position_embeddings']
model_settings['max_seq_len'] = metadata['max_position_embeddings']
if 'rope_theta' in metadata:
model_settings['rope_freq_base'] = metadata['rope_theta']
if 'rope_scaling' in metadata and type(metadata['rope_scaling']) is dict and all(key in metadata['rope_scaling'] for key in ('type', 'factor')):
if metadata['rope_scaling']['type'] == 'linear':
model_settings['compress_pos_emb'] = metadata['rope_scaling']['factor']
if 'quantization_config' in metadata:
if 'bits' in metadata['quantization_config']:
model_settings['wbits'] = metadata['quantization_config']['bits']
if 'group_size' in metadata['quantization_config']:
model_settings['groupsize'] = metadata['quantization_config']['group_size']
if 'desc_act' in metadata['quantization_config']:
model_settings['desc_act'] = metadata['quantization_config']['desc_act']
# Read AutoGPTQ metadata
path = Path(f'{shared.args.model_dir}/{model}/quantize_config.json')
if path.exists():
metadata = json.loads(open(path, 'r', encoding='utf-8').read())
if 'bits' in metadata:
model_settings['wbits'] = metadata['bits']
if 'group_size' in metadata:
model_settings['groupsize'] = metadata['group_size']
if 'desc_act' in metadata:
model_settings['desc_act'] = metadata['desc_act']
# Try to find the Jinja instruct template
path = Path(f'{shared.args.model_dir}/{model}') / 'tokenizer_config.json'
if path.exists():
metadata = json.loads(open(path, 'r', encoding='utf-8').read())
if 'chat_template' in metadata:
template = metadata['chat_template']
for k in ['eos_token', 'bos_token']:
if k in metadata:
value = metadata[k]
if type(value) is dict:
value = value['content']
template = template.replace(k, "'{}'".format(value))
template = re.sub(r'raise_exception\([^)]*\)', "''", template)
model_settings['instruction_template'] = 'Custom (obtained from model metadata)'
model_settings['instruction_template_str'] = template
if 'instruction_template' not in model_settings:
model_settings['instruction_template'] = 'Alpaca'
if model_settings['instruction_template'] != 'Custom (obtained from model metadata)':
model_settings['instruction_template_str'] = chat.load_instruction_template(model_settings['instruction_template'])
# Ignore rope_freq_base if set to the default value
if 'rope_freq_base' in model_settings and model_settings['rope_freq_base'] == 10000:
model_settings.pop('rope_freq_base')
# Apply user settings from models/config-user.yaml
settings = shared.user_config
for pat in settings:
if re.match(pat.lower(), model.lower()):
for k in settings[pat]:
model_settings[k] = settings[pat][k]
return model_settings
def infer_loader(model_name, model_settings):
path_to_model = Path(f'{shared.args.model_dir}/{model_name}')
if not path_to_model.exists():
loader = None
elif (path_to_model / 'quantize_config.json').exists() or ('wbits' in model_settings and type(model_settings['wbits']) is int and model_settings['wbits'] > 0):
loader = 'ExLlamav2_HF'
elif (path_to_model / 'quant_config.json').exists() or re.match(r'.*-awq', model_name.lower()):
loader = 'AutoAWQ'
elif len(list(path_to_model.glob('*.gguf'))) > 0:
loader = 'llama.cpp'
elif re.match(r'.*\.gguf', model_name.lower()):
loader = 'llama.cpp'
elif re.match(r'.*exl2', model_name.lower()):
loader = 'ExLlamav2_HF'
elif re.match(r'.*-hqq', model_name.lower()):
return 'HQQ'
else:
loader = 'Transformers'
return loader
def update_model_parameters(state, initial=False):
'''
UI: update the command-line arguments based on the interface values
'''
elements = ui.list_model_elements() # the names of the parameters
gpu_memories = []
for i, element in enumerate(elements):
if element not in state:
continue
value = state[element]
if element.startswith('gpu_memory'):
gpu_memories.append(value)
continue
if initial and element in shared.provided_arguments:
continue
# Setting null defaults
if element in ['wbits', 'groupsize', 'model_type'] and value == 'None':
value = vars(shared.args_defaults)[element]
elif element in ['cpu_memory'] and value == 0:
value = vars(shared.args_defaults)[element]
# Making some simple conversions
if element in ['wbits', 'groupsize', 'pre_layer']:
value = int(value)
elif element == 'cpu_memory' and value is not None:
value = f"{value}MiB"
if element in ['pre_layer']:
value = [value] if value > 0 else None
setattr(shared.args, element, value)
found_positive = False
for i in gpu_memories:
if i > 0:
found_positive = True
break
if not (initial and vars(shared.args)['gpu_memory'] != vars(shared.args_defaults)['gpu_memory']):
if found_positive:
shared.args.gpu_memory = [f"{i}MiB" for i in gpu_memories]
else:
shared.args.gpu_memory = None
def apply_model_settings_to_state(model, state):
'''
UI: update the state variable with the model settings
'''
model_settings = get_model_metadata(model)
if 'loader' in model_settings:
loader = model_settings.pop('loader')
# If the user is using an alternative loader for the same model type, let them keep using it
if not (loader == 'ExLlamav2_HF' and state['loader'] in ['GPTQ-for-LLaMa', 'ExLlamav2', 'AutoGPTQ']) and not (loader == 'llama.cpp' and state['loader'] in ['llamacpp_HF', 'ctransformers']):
state['loader'] = loader
for k in model_settings:
if k in state:
if k in ['wbits', 'groupsize']:
state[k] = str(model_settings[k])
else:
state[k] = model_settings[k]
return state
def save_model_settings(model, state):
'''
Save the settings for this model to models/config-user.yaml
'''
if model == 'None':
yield ("Not saving the settings because no model is loaded.")
return
with Path(f'{shared.args.model_dir}/config-user.yaml') as p:
if p.exists():
user_config = yaml.safe_load(open(p, 'r').read())
else:
user_config = {}
model_regex = model + '$' # For exact matches
if model_regex not in user_config:
user_config[model_regex] = {}
for k in ui.list_model_elements():
if k == 'loader' or k in loaders.loaders_and_params[state['loader']]:
user_config[model_regex][k] = state[k]
shared.user_config = user_config
output = yaml.dump(user_config, sort_keys=False)
with open(p, 'w') as f:
f.write(output)
yield (f"Settings for `{model}` saved to `{p}`.")


@@ -0,0 +1,28 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/one_click_installer_check.py
from pathlib import Path
from modules.logging_colors import logger
if Path('../webui.py').exists():
logger.warning('\nIt looks like you are running an outdated version of '
'the one-click-installers.\n'
'Please migrate your installation following the instructions here:\n'
'https://github.com/oobabooga/text-generation-webui/wiki/Migrating-an-old-one%E2%80%90click-install')


@@ -0,0 +1,138 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/presets.py
import functools
import random
from pathlib import Path
import yaml
from modules import shared
from modules.loaders import loaders_samplers
def default_preset():
return {
'temperature': 1,
'temperature_last': False,
'dynamic_temperature': False,
'dynamic_temperature_low': 0.1,
'top_p': 1,
'min_p': 0,
'top_k': 0,
'repetition_penalty': 1,
'presence_penalty': 0,
'frequency_penalty': 0,
'repetition_penalty_range': 1024,
'typical_p': 1,
'tfs': 1,
'top_a': 0,
'epsilon_cutoff': 0,
'eta_cutoff': 0,
'guidance_scale': 1,
'penalty_alpha': 0,
'mirostat_mode': 0,
'mirostat_tau': 5,
'mirostat_eta': 0.1,
'do_sample': True,
'encoder_repetition_penalty': 1,
'no_repeat_ngram_size': 0,
'min_length': 0,
'num_beams': 1,
'length_penalty': 1,
'early_stopping': False,
}
def presets_params():
return [k for k in default_preset()]
def load_preset(name):
generate_params = default_preset()
if name not in ['None', None, '']:
with open(Path(f'presets/{name}.yaml'), 'r') as infile:
preset = yaml.safe_load(infile)
for k in preset:
generate_params[k] = preset[k]
return generate_params
@functools.cache
def load_preset_memoized(name):
return load_preset(name)
def load_preset_for_ui(name, state):
generate_params = load_preset(name)
state.update(generate_params)
return state, *[generate_params[k] for k in presets_params()]
def random_preset(state):
params_and_values = {
'remove_tail_tokens': {
'top_p': [0.5, 0.8, 0.9, 0.95, 0.99],
'min_p': [0.5, 0.2, 0.1, 0.05, 0.01],
'top_k': [3, 5, 10, 20, 30, 40],
'typical_p': [0.2, 0.575, 0.95],
'tfs': [0.5, 0.8, 0.9, 0.95, 0.99],
'top_a': [0.5, 0.2, 0.1, 0.05, 0.01],
'epsilon_cutoff': [1, 3, 5, 7, 9],
'eta_cutoff': [3, 6, 9, 12, 15, 18],
},
'flatten_distribution': {
'temperature': [0.5, 0.7, 0.8, 1, 1.2, 1.5, 2.0],
},
'repetition': {
'repetition_penalty': [1, 1.05, 1.1, 1.15, 1.20, 1.25],
'presence_penalty': [0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0],
'frequency_penalty': [0, 0.1, 0.2, 0.4, 0.6, 0.8, 1.0, 2.0],
},
'other': {
'temperature_last': [True, False],
}
}
generate_params = default_preset()
for cat in params_and_values:
choices = list(params_and_values[cat].keys())
if shared.args.loader is not None:
choices = [x for x in choices if x in loaders_samplers[shared.args.loader]]
if len(choices) > 0:
choice = random.choice(choices)
generate_params[choice] = random.choice(params_and_values[cat][choice])
state.update(generate_params)
return state, *[generate_params[k] for k in presets_params()]
def generate_preset_yaml(state):
defaults = default_preset()
data = {k: state[k] for k in presets_params()}
# Remove entries that are identical to the defaults
for k in list(data.keys()):
if data[k] == defaults[k]:
del data[k]
return yaml.dump(data, sort_keys=False)
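
A small sketch of how the preset helpers above are typically used, assuming the script runs from the WebUI root so `presets/*.yaml` resolves; `simple-1` is the preset referenced by the default settings:

```python
# Sketch: load_preset() starts from default_preset() and overrides only the keys the
# YAML file defines; generate_preset_yaml() drops values equal to the defaults.
from modules.presets import default_preset, load_preset, generate_preset_yaml

params = load_preset('simple-1')
print(params['temperature'], params['top_p'])

state = default_preset()
state['temperature'] = 0.7
print(generate_preset_yaml(state))   # only "temperature: 0.7" survives
```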


@@ -0,0 +1,46 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/prompts.py
from pathlib import Path
from modules.text_generation import get_encoded_length
def load_prompt(fname):
if fname in ['None', '']:
return ''
else:
file_path = Path(f'prompts/{fname}.txt')
if not file_path.exists():
return ''
with open(file_path, 'r', encoding='utf-8') as f:
text = f.read()
if text[-1] == '\n':
text = text[:-1]
return text
def count_tokens(text):
try:
tokens = get_encoded_length(text)
return str(tokens)
except:
return '0'
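
A brief sketch for the prompt helpers above, assuming `prompts/QA.txt` exists as in the upstream WebUI; `count_tokens()` additionally needs a tokenizer to be loaded and otherwise falls back to `'0'`:

```python
# Sketch: load_prompt() returns the file contents without the trailing newline,
# or an empty string when the prompt name is unknown.
from modules.prompts import load_prompt, count_tokens

text = load_prompt('QA')
print(len(text), 'characters,', count_tokens(text), 'tokens')
```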


@@ -0,0 +1,32 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/relative_imports.py
import sys
from pathlib import Path
class RelativeImport:
def __init__(self, path):
self.import_path = Path(path)
def __enter__(self):
sys.path.insert(0, str(self.import_path))
def __exit__(self, exc_type, exc_value, traceback):
sys.path.remove(str(self.import_path))
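
The context manager above is easiest to read with an example; a short sketch (the `repositories/example_repo` path is hypothetical):

```python
# Sketch: RelativeImport temporarily prepends a directory to sys.path so a vendored
# package can be imported, and removes it again when the block exits.
from modules.relative_imports import RelativeImport

with RelativeImport('repositories/example_repo'):
    # Packages living under repositories/example_repo are importable here.
    pass
# sys.path is restored once the block exits.
```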


@@ -0,0 +1,403 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/sampler_hijack.py
import math
import torch
import transformers
from transformers import LogitsWarper, is_torch_xpu_available
from transformers.generation.logits_process import (
LogitNormalization,
LogitsProcessor,
LogitsProcessorList,
TemperatureLogitsWarper
)
from modules import shared
global_scores = None
class TemperatureLogitsWarperWithDynatemp(LogitsWarper):
def __init__(self, temperature: float, dynamic_temperature: bool, dynamic_temperature_low: float):
if not isinstance(temperature, float) or not (temperature > 0):
except_msg = (
f"`temperature` (={temperature}) has to be a strictly positive float, otherwise your next token "
"scores will be invalid."
)
if isinstance(temperature, float) and temperature == 0.0:
except_msg += " If you're looking for greedy decoding strategies, set `do_sample=False`."
raise ValueError(except_msg)
self.temperature = temperature
self.dynamic_temperature = dynamic_temperature
self.dynamic_temperature_low = dynamic_temperature_low
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
# Regular temperature
if not self.dynamic_temperature:
scores = scores / self.temperature
return scores
# Dynamic temperature
else:
min_temp = self.dynamic_temperature_low
max_temp = self.temperature
exponent_val = 1.0
# Convert logits to probabilities
probs = torch.softmax(scores, dim=-1)
# Calculate entropy of the softmax probabilities
entropy = -1.0 * torch.where(probs > 0, probs * torch.log(probs), torch.zeros_like(probs)).sum()
# Guard against future possible division by zero
entropy = max(entropy, torch.tensor(1e-10)) # Ensures entropy is slightly greater than 0
# Any logits which are not -Infinity will be considered for calculating max entropy.
num_valid_tokens = torch.sum(scores > -float('inf')).item()
# Now, calculate the max entropy by using only the valid tokens' count
max_entropy = math.log(num_valid_tokens)
# Guard against future possible division by zero
max_entropy = max_entropy if max_entropy > 0.0 else 1e-10
# Normalize the entropy
normalized_entropy = entropy / max_entropy
# Map the normalized entropy to the desired temperature range using the power function
dyn_temp = min_temp + (max_temp - min_temp) * (normalized_entropy.pow(exponent_val))
# Apply the dynamically calculated temperature scaling
scores = scores / dyn_temp
# print("----------------------\nTemperature from generation_config:", self.temperature)
# print("min_temp:", min_temp)
# print("max_temp:", max_temp)
# print("Entropy:", entropy.item())
# print("Max Possible Entropy considering valid tokens only:", max_entropy)
# print("Normalized Entropy:", normalized_entropy.item())
# print("Dynamic Temperature (dyn_temp):", dyn_temp.item())
# print("----------------------")
# max_prob_token_id = torch.argmax(scores, dim=-1) # Get the token ID with the highest probability
# max_prob_token = shared.tokenizer.convert_ids_to_tokens(int(max_prob_token_id)) # Convert ID to token
# print("--- T=", float(dyn_temp), "token=", max_prob_token, "min=", min_temp, "max=", max_temp)
return scores
class MinPLogitsWarper(LogitsWarper):
def __init__(self, min_p: float, filter_value: float = -float("Inf"), min_tokens_to_keep: int = 1):
if min_p < 0 or min_p > 1.0:
raise ValueError(f"`min_p` has to be a float >= 0 and <= 1, but is {min_p}")
self.min_p = min_p
self.filter_value = filter_value
self.min_tokens_to_keep = min_tokens_to_keep
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
# Convert logits to probabilities
probs = torch.softmax(scores, dim=-1)
# Get the probability of the top token for each sequence in the batch
top_probs, _ = probs.max(dim=-1, keepdim=True)
# Calculate the actual min_p threshold by scaling min_p with the top token's probability
scaled_min_p = self.min_p * top_probs
# Create a mask for tokens that have a probability less than the scaled min_p
tokens_to_remove = probs < scaled_min_p
sorted_indices = torch.argsort(scores, descending=True, dim=-1)
sorted_indices_to_remove = torch.gather(tokens_to_remove, dim=-1, index=sorted_indices)
if self.min_tokens_to_keep > 1:
# Keep at least min_tokens_to_keep
sorted_indices_to_remove[..., : self.min_tokens_to_keep] = False
indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
scores = scores.masked_fill(indices_to_remove, self.filter_value)
return scores
class TailFreeLogitsWarper(LogitsWarper):
def __init__(self, tfs: float, filter_value: float = -float("Inf"), min_tokens_to_keep: int = 1):
tfs = float(tfs)
if tfs < 0 or tfs > 1.0:
raise ValueError(f"`tfs` has to be a float >= 0 and <= 1, but is {tfs}")
self.tfs = tfs
self.filter_value = filter_value
self.min_tokens_to_keep = min_tokens_to_keep
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
sorted_logits, sorted_indices = torch.sort(scores, descending=True)
probs = sorted_logits.softmax(dim=-1)
# Compute second derivative normalized CDF
d2 = probs.diff().diff().abs()
normalized_d2 = d2 / d2.sum(dim=-1, keepdim=True)
normalized_d2_cdf = normalized_d2.cumsum(dim=-1)
# Remove tokens with CDF value above the threshold (tokens with 0 are kept)
sorted_indices_to_remove = normalized_d2_cdf > self.tfs
# Centre the distribution around the cutoff as in the original implementation of the algorithm
sorted_indices_to_remove = torch.cat(
(
torch.zeros(scores.shape[0], 1, dtype=torch.bool, device=scores.device),
sorted_indices_to_remove,
torch.ones(scores.shape[0], 1, dtype=torch.bool, device=scores.device),
),
dim=-1,
)
if self.min_tokens_to_keep > 1:
# Keep at least min_tokens_to_keep
sorted_indices_to_remove[..., : self.min_tokens_to_keep] = 0
indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
scores = scores.masked_fill(indices_to_remove, self.filter_value)
return scores
class TopALogitsWarper(LogitsWarper):
def __init__(self, top_a: float, filter_value: float = -float("Inf"), min_tokens_to_keep: int = 1):
top_a = float(top_a)
if top_a < 0 or top_a > 1.0:
raise ValueError(f"`top_a` has to be a float >= 0 and <= 1, but is {top_a}")
self.top_a = top_a
self.filter_value = filter_value
self.min_tokens_to_keep = min_tokens_to_keep
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
sorted_logits, sorted_indices = torch.sort(scores, descending=True)
probs = sorted_logits.softmax(dim=-1)
# Remove tokens with probability less than top_a*(max(probs))^2 (tokens with 0 are kept)
probs_max = probs[..., 0, None]
sorted_indices_to_remove = probs < probs_max * probs_max * self.top_a
if self.min_tokens_to_keep > 1:
# Keep at least min_tokens_to_keep
sorted_indices_to_remove[..., : self.min_tokens_to_keep] = 0
indices_to_remove = sorted_indices_to_remove.scatter(1, sorted_indices, sorted_indices_to_remove)
scores = scores.masked_fill(indices_to_remove, self.filter_value)
return scores
class MirostatLogitsWarper(LogitsWarper):
def __init__(self, mirostat_mode: int, mirostat_tau: float, mirostat_eta: float, filter_value: float = -float("Inf"), min_tokens_to_keep: int = 1):
if mirostat_mode not in [2]:
raise ValueError(f"`mirostat` has to be a an integer 2, but is {mirostat_mode}")
self.mirostat_mode = mirostat_mode
self.mirostat_eta = mirostat_eta
self.mirostat_tau = mirostat_tau
self.filter_value = filter_value
self.min_tokens_to_keep = min_tokens_to_keep
self.mu = 2 * self.mirostat_tau
self.e = 0
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
logits = scores[0]
sorted_logits, sorted_indices = torch.sort(logits, descending=True)
prob_original = torch.softmax(sorted_logits, dim=-1).tolist() # candidates
# Truncate the words with surprise values greater than mu
for i, candidate in enumerate(prob_original):
if candidate > 0 and -math.log2(candidate) > self.mu:
if (i == 0):
sorted_logits = sorted_logits[:1]
else:
sorted_logits = sorted_logits[:i]
break
# Normalize the probabilities of the remaining words
if is_torch_xpu_available():
prob_topk = torch.softmax(sorted_logits, dim=0).to("xpu")
prev_i = torch.multinomial(prob_topk, num_samples=1, replacement=True).to("xpu")
else:
prob_topk = torch.softmax(sorted_logits, dim=0).to('cuda')
prev_i = torch.multinomial(prob_topk, num_samples=1, replacement=True).to('cuda')
observed_surprise = -math.log2(prob_topk[prev_i])
self.e = observed_surprise - self.mirostat_tau
# Update mu using the learning rate and error
self.mu -= self.mirostat_eta * self.e
sorted_indices_to_remove = torch.ones_like(scores[0], dtype=torch.bool)
sorted_indices_to_remove[prev_i] = False
indices_to_remove = sorted_indices_to_remove.unsqueeze(0).scatter(1, sorted_indices.unsqueeze(0), sorted_indices_to_remove.unsqueeze(0))
scores = scores.masked_fill(indices_to_remove, self.filter_value)
return scores
class SpyLogitsWarper(LogitsWarper):
def __init__(self):
pass
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
global global_scores
global_scores = scores
return scores
class RepetitionPenaltyLogitsProcessorWithRange(LogitsProcessor):
'''
Copied from the transformers library
'''
def __init__(self, penalty: float, presence_penalty: float, frequency_penalty: float, _range: int):
if not (penalty > 0):
raise ValueError(f"`penalty` has to be strictly positive, but is {penalty}")
self.penalty = penalty
self.presence_penalty = presence_penalty
self.frequency_penalty = frequency_penalty
self._range = _range
def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
input_ids = input_ids[:, -self._range:]
# We loop here because torch.unique() needs to process each row separately in the
# case that batch_size > 1.
for input_ids_row, scores_row in zip(input_ids, scores):
unique_ids, counts = torch.unique(input_ids_row, return_counts=True)
score = torch.gather(scores_row, 0, unique_ids)
# multiplicative repetition penalty
# if score < 0 then repetition penalty has to be multiplied to reduce the previous token probability
score = torch.where(score < 0, score * self.penalty, score / self.penalty)
scores_row.scatter_(0, unique_ids, score)
# presence_penalty and frequency_penalty
raw_presence_penalty = (counts > 0).to(scores.dtype)
raw_frequency_penalty = counts.to(scores.dtype)
additive_penalty = raw_presence_penalty * self.presence_penalty + raw_frequency_penalty * self.frequency_penalty
scores_row.scatter_add_(0, unique_ids, -additive_penalty)
return scores
def get_logits_warper_patch(self, generation_config):
# Make sure that temperature is float and not int
if isinstance(generation_config.temperature, int):
generation_config.temperature = float(generation_config.temperature)
temperature = generation_config.temperature
if generation_config.dynamic_temperature:
# Make sure TemperatureLogitsWarper will be created by temporarily
# setting temperature to a value != 1.
generation_config.temperature = 1.1
warpers = self._get_logits_warper_old(generation_config)
for i in range(len(warpers)):
if warpers[i].__class__.__name__ == 'TemperatureLogitsWarper':
warpers[i] = TemperatureLogitsWarperWithDynatemp(temperature, generation_config.dynamic_temperature, generation_config.dynamic_temperature_low)
warpers_to_add = LogitsProcessorList()
min_tokens_to_keep = 2 if generation_config.num_beams > 1 else 1
if generation_config.mirostat_mode is not None and generation_config.mirostat_mode == 2:
warpers_to_add.append(MirostatLogitsWarper(mirostat_mode=generation_config.mirostat_mode, mirostat_eta=generation_config.mirostat_eta, mirostat_tau=generation_config.mirostat_tau, min_tokens_to_keep=min_tokens_to_keep))
# We need to disable samplers other than temperature
for warper in warpers:
if not isinstance(warper, TemperatureLogitsWarper):
warpers.remove(warper)
else:
if generation_config.tfs is not None and 0.0 <= generation_config.tfs < 1.0:
warpers_to_add.append(TailFreeLogitsWarper(tfs=generation_config.tfs, min_tokens_to_keep=min_tokens_to_keep))
if generation_config.top_a is not None and 0.0 < generation_config.top_a <= 1.0:
warpers_to_add.append(TopALogitsWarper(top_a=generation_config.top_a, min_tokens_to_keep=min_tokens_to_keep))
if generation_config.min_p is not None and 0.0 < generation_config.min_p <= 1.0:
warpers_to_add.append(MinPLogitsWarper(min_p=generation_config.min_p, min_tokens_to_keep=min_tokens_to_keep))
if len(warpers) > 0 and isinstance(warpers[-1], LogitNormalization):
normalize = warpers.pop(-1)
else:
normalize = None
warpers += warpers_to_add
if generation_config.temperature_last:
temperature_idx = None
for i in range(len(warpers)):
if warpers[i].__class__.__name__ in ['TemperatureLogitsWarper', 'TemperatureLogitsWarperWithDynatemp']:
temperature_idx = i
break
if temperature_idx is not None:
warpers.append(warpers.pop(temperature_idx))
if normalize is not None:
warpers.append(normalize)
warpers.append(SpyLogitsWarper())
warpers = LogitsProcessorList(warpers)
# for i in range(len(warpers)):
# print(warpers[i].__class__.__name__)
return warpers
def get_logits_processor_patch(self, **kwargs):
repetition_penalty = kwargs['generation_config'].repetition_penalty
presence_penalty = kwargs['generation_config'].presence_penalty
frequency_penalty = kwargs['generation_config'].frequency_penalty
repetition_penalty_range = kwargs['generation_config'].repetition_penalty_range
do_rep_pen_hijack = (repetition_penalty > 1) or (presence_penalty != 0) or (frequency_penalty != 0)
if do_rep_pen_hijack:
# Make sure that a RepetitionPenaltyLogitsProcessor will be created
kwargs['generation_config'].repetition_penalty = 1.1 # must set to some value > 1
result = self._get_logits_processor_old(**kwargs)
if do_rep_pen_hijack:
for i in range(len(result)):
if result[i].__class__.__name__ == 'RepetitionPenaltyLogitsProcessor':
result[i] = RepetitionPenaltyLogitsProcessorWithRange(repetition_penalty, presence_penalty, frequency_penalty, repetition_penalty_range)
return result
def generation_config_init_patch(self, **kwargs):
self.__init___old(**kwargs)
self.min_p = kwargs.pop("min_p", 0.0)
self.dynamic_temperature = kwargs.pop("dynamic_temperature", False)
self.dynamic_temperature_low = kwargs.pop("dynamic_temperature_low", 0.1)
self.tfs = kwargs.pop("tfs", 1.0)
self.top_a = kwargs.pop("top_a", 0.0)
self.mirostat_mode = kwargs.pop("mirostat_mode", 0)
self.mirostat_eta = kwargs.pop("mirostat_eta", 0.1)
self.mirostat_tau = kwargs.pop("mirostat_tau", 5)
self.repetition_penalty_range = kwargs.pop("repetition_penalty_range", 0)
self.presence_penalty = kwargs.pop("presence_penalty", 0)
self.frequency_penalty = kwargs.pop("frequency_penalty", 0)
self.temperature_last = kwargs.pop("temperature_last", False)
def hijack_samplers():
transformers.GenerationMixin._get_logits_warper_old = transformers.GenerationMixin._get_logits_warper
transformers.GenerationMixin._get_logits_warper = get_logits_warper_patch
transformers.GenerationMixin._get_logits_processor_old = transformers.GenerationMixin._get_logits_processor
transformers.GenerationMixin._get_logits_processor = get_logits_processor_patch
transformers.GenerationConfig.__init___old = transformers.GenerationConfig.__init__
transformers.GenerationConfig.__init__ = generation_config_init_patch
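
`hijack_samplers()` has to run before generation so that transformers builds the patched warper/processor lists. As a standalone illustration of the min-p rule implemented by `MinPLogitsWarper` above, here is a simplified re-implementation in plain PyTorch (it ignores `min_tokens_to_keep` and the sorted-index bookkeeping):

```python
# Sketch of the min-p idea: tokens whose probability is below min_p * max(prob)
# are masked to -inf so they can never be sampled.
import torch

scores = torch.tensor([[2.0, 1.0, 0.0, -3.0]])   # fake logits for a 4-token vocabulary
min_p = 0.2

probs = torch.softmax(scores, dim=-1)                       # ~[0.66, 0.24, 0.09, 0.004]
threshold = min_p * probs.max(dim=-1, keepdim=True).values  # ~0.13
filtered = scores.masked_fill(probs < threshold, float('-inf'))
print(filtered)   # the two lowest-probability tokens are masked out
```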


@@ -0,0 +1,339 @@
#
# Copyright 2016 The BigDL Authors.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# This file is adapted from
# https://github.com/oobabooga/text-generation-webui/blob/main/modules/shared.py
import argparse
import os
import sys
from collections import OrderedDict
from pathlib import Path
import yaml
from modules.logging_colors import logger
# Model variables
model = None
tokenizer = None
model_name = 'None'
is_seq2seq = False
model_dirty_from_training = False
lora_names = []
# Generation variables
stop_everything = False
generation_lock = None
processing_message = '*Is typing...*'
# UI variables
gradio = {}
persistent_interface_state = {}
need_restart = False
# UI defaults
settings = {
'dark_theme': True,
'show_controls': True,
'start_with': '',
'mode': 'chat',
'chat_style': 'cai-chat',
'prompt-default': 'QA',
'prompt-notebook': 'QA',
'preset': 'simple-1',
'max_new_tokens': 512,
'max_new_tokens_min': 1,
'max_new_tokens_max': 4096,
'negative_prompt': '',
'seed': -1,
'truncation_length': 2048,
'truncation_length_min': 0,
'truncation_length_max': 200000,
'max_tokens_second': 0,
'max_updates_second': 0,
'custom_stopping_strings': '',
'custom_token_bans': '',
'auto_max_new_tokens': False,
'ban_eos_token': False,
'add_bos_token': True,
'skip_special_tokens': True,
'stream': True,
'character': 'Assistant',
'name1': 'You',
'custom_system_message': '',
'instruction_template_str': "{%- set ns = namespace(found=false) -%}\n{%- for message in messages -%}\n {%- if message['role'] == 'system' -%}\n {%- set ns.found = true -%}\n {%- endif -%}\n{%- endfor -%}\n{%- if not ns.found -%}\n {{- '' + 'Below is an instruction that describes a task. Write a response that appropriately completes the request.' + '\\n\\n' -}}\n{%- endif %}\n{%- for message in messages %}\n {%- if message['role'] == 'system' -%}\n {{- '' + message['content'] + '\\n\\n' -}}\n {%- else -%}\n {%- if message['role'] == 'user' -%}\n {{-'### Instruction:\\n' + message['content'] + '\\n\\n'-}}\n {%- else -%}\n {{-'### Response:\\n' + message['content'] + '\\n\\n' -}}\n {%- endif -%}\n {%- endif -%}\n{%- endfor -%}\n{%- if add_generation_prompt -%}\n {{-'### Response:\\n'-}}\n{%- endif -%}",
'chat_template_str': "{%- for message in messages %}\n {%- if message['role'] == 'system' -%}\n {{- message['content'] + '\\n\\n' -}}\n {%- else -%}\n {%- if message['role'] == 'user' -%}\n {{- name1 + ': ' + message['content'] + '\\n'-}}\n {%- else -%}\n {{- name2 + ': ' + message['content'] + '\\n' -}}\n {%- endif -%}\n {%- endif -%}\n{%- endfor -%}",
'chat-instruct_command': 'Continue the chat dialogue below. Write a single reply for the character "<|character|>".\n\n<|prompt|>',
'autoload_model': False,
'gallery-items_per_page': 50,
'gallery-open': False,
'default_extensions': ['gallery'],
}
# Parser copied from https://github.com/vladmandic/automatic
parser = argparse.ArgumentParser(description="Text generation web UI", conflict_handler='resolve', add_help=True, formatter_class=lambda prog: argparse.HelpFormatter(prog, max_help_position=55, indent_increment=2, width=200))
# Basic settings
group = parser.add_argument_group('Basic settings')
group.add_argument('--multi-user', action='store_true', help='Multi-user mode. Chat histories are not saved or automatically loaded. Warning: this is likely not safe for sharing publicly.')
group.add_argument('--character', type=str, help='The name of the character to load in chat mode by default.')
group.add_argument('--model', type=str, help='Name of the model to load by default.')
group.add_argument('--lora', type=str, nargs='+', help='The list of LoRAs to load. If you want to load more than one LoRA, write the names separated by spaces.')
group.add_argument('--model-dir', type=str, default='models/', help='Path to directory with all the models.')
group.add_argument('--lora-dir', type=str, default='loras/', help='Path to directory with all the loras.')
group.add_argument('--model-menu', action='store_true', help='Show a model menu in the terminal when the web UI is first launched.')
group.add_argument('--settings', type=str, help='Load the default interface settings from this yaml file. See settings-template.yaml for an example. If you create a file called settings.yaml, this file will be loaded by default without the need to use the --settings flag.')
group.add_argument('--extensions', type=str, nargs='+', help='The list of extensions to load. If you want to load more than one extension, write the names separated by spaces.')
group.add_argument('--verbose', action='store_true', help='Print the prompts to the terminal.')
group.add_argument('--chat-buttons', action='store_true', help='Show buttons on the chat tab instead of a hover menu.')
# Model loader
group = parser.add_argument_group('Model loader')
group.add_argument('--loader', type=str, help='Choose the model loader manually, otherwise, it will get autodetected. Valid options: Transformers, llama.cpp, llamacpp_HF, ExLlamav2_HF, ExLlamav2, AutoGPTQ, AutoAWQ, GPTQ-for-LLaMa, ctransformers, QuIP#.')
# Transformers/Accelerate
group = parser.add_argument_group('Transformers/Accelerate')
group.add_argument('--cpu', action='store_true', help='Use the CPU to generate text. Warning: Training on CPU is extremely slow.')
group.add_argument('--auto-devices', action='store_true', help='Automatically split the model across the available GPU(s) and CPU.')
group.add_argument('--gpu-memory', type=str, nargs='+', help='Maximum GPU memory in GiB to be allocated per GPU. Example: --gpu-memory 10 for a single GPU, --gpu-memory 10 5 for two GPUs. You can also set values in MiB like --gpu-memory 3500MiB.')
group.add_argument('--cpu-memory', type=str, help='Maximum CPU memory in GiB to allocate for offloaded weights. Same as above.')
group.add_argument('--disk', action='store_true', help='If the model is too large for your GPU(s) and CPU combined, send the remaining layers to the disk.')
group.add_argument('--disk-cache-dir', type=str, default='cache', help='Directory to save the disk cache to. Defaults to "cache".')
group.add_argument('--load-in-8bit', action='store_true', help='Load the model with 8-bit precision (using bitsandbytes).')
group.add_argument('--bf16', action='store_true', help='Load the model with bfloat16 precision. Requires NVIDIA Ampere GPU.')
group.add_argument('--no-cache', action='store_true', help='Set use_cache to False while generating text. This reduces VRAM usage slightly, but it comes at a performance cost.')
group.add_argument('--trust-remote-code', action='store_true', help='Set trust_remote_code=True while loading the model. Necessary for some models.')
group.add_argument('--force-safetensors', action='store_true', help='Set use_safetensors=True while loading the model. This prevents arbitrary code execution.')
group.add_argument('--no_use_fast', action='store_true', help='Set use_fast=False while loading the tokenizer (it\'s True by default). Use this if you have any problems related to use_fast.')
group.add_argument('--use_flash_attention_2', action='store_true', help='Set use_flash_attention_2=True while loading the model.')
# bitsandbytes 4-bit
group = parser.add_argument_group('bitsandbytes 4-bit')
group.add_argument('--load-in-4bit', action='store_true', help='Load the model with 4-bit precision (using bitsandbytes).')
group.add_argument('--use_double_quant', action='store_true', help='use_double_quant for 4-bit.')
group.add_argument('--compute_dtype', type=str, default='float16', help='compute dtype for 4-bit. Valid options: bfloat16, float16, float32.')
group.add_argument('--quant_type', type=str, default='nf4', help='quant_type for 4-bit. Valid options: nf4, fp4.')
# llama.cpp
group = parser.add_argument_group('llama.cpp')
group.add_argument('--tensorcores', action='store_true', help='Use llama-cpp-python compiled with tensor cores support. This increases performance on RTX cards. NVIDIA only.')
group.add_argument('--n_ctx', type=int, default=2048, help='Size of the prompt context.')
group.add_argument('--threads', type=int, default=0, help='Number of threads to use.')
group.add_argument('--threads-batch', type=int, default=0, help='Number of threads to use for batches/prompt processing.')
group.add_argument('--no_mul_mat_q', action='store_true', help='Disable the mulmat kernels.')
group.add_argument('--n_batch', type=int, default=512, help='Maximum number of prompt tokens to batch together when calling llama_eval.')
group.add_argument('--no-mmap', action='store_true', help='Prevent mmap from being used.')
group.add_argument('--mlock', action='store_true', help='Force the system to keep the model in RAM.')
group.add_argument('--n-gpu-layers', type=int, default=0, help='Number of layers to offload to the GPU.')
group.add_argument('--tensor_split', type=str, default=None, help='Split the model across multiple GPUs. Comma-separated list of proportions. Example: 18,17.')
group.add_argument('--numa', action='store_true', help='Activate NUMA task allocation for llama.cpp.')
group.add_argument('--logits_all', action='store_true', help='Needs to be set for perplexity evaluation to work. Otherwise, ignore it, as it makes prompt processing slower.')
group.add_argument('--no_offload_kqv', action='store_true', help='Do not offload the K, Q, V to the GPU. This saves VRAM but reduces the performance.')
group.add_argument('--cache-capacity', type=str, help='Maximum cache capacity (llama-cpp-python). Examples: 2000MiB, 2GiB. When provided without units, bytes will be assumed.')
# ExLlama
group = parser.add_argument_group('ExLlama')
group.add_argument('--gpu-split', type=str, help='Comma-separated list of VRAM (in GB) to use per GPU device for model layers. Example: 20,7,7.')
group.add_argument('--max_seq_len', type=int, default=2048, help='Maximum sequence length.')
group.add_argument('--cfg-cache', action='store_true', help='ExLlamav2_HF: Create an additional cache for CFG negative prompts. Necessary to use CFG with that loader.')
group.add_argument('--no_flash_attn', action='store_true', help='Force flash-attention to not be used.')
group.add_argument('--cache_8bit', action='store_true', help='Use 8-bit cache to save VRAM.')
group.add_argument('--num_experts_per_token', type=int, default=2, help='Number of experts to use for generation. Applies to MoE models like Mixtral.')
# AutoGPTQ
group = parser.add_argument_group('AutoGPTQ')
group.add_argument('--triton', action='store_true', help='Use triton.')
group.add_argument('--no_inject_fused_attention', action='store_true', help='Disable the use of fused attention, which will use less VRAM at the cost of slower inference.')
group.add_argument('--no_inject_fused_mlp', action='store_true', help='Triton mode only: disable the use of fused MLP, which will use less VRAM at the cost of slower inference.')
group.add_argument('--no_use_cuda_fp16', action='store_true', help='This can make models faster on some systems.')
group.add_argument('--desc_act', action='store_true', help='For models that do not have a quantize_config.json, this parameter is used to define whether to set desc_act or not in BaseQuantizeConfig.')
group.add_argument('--disable_exllama', action='store_true', help='Disable ExLlama kernel, which can improve inference speed on some systems.')
group.add_argument('--disable_exllamav2', action='store_true', help='Disable ExLlamav2 kernel.')
# GPTQ-for-LLaMa
group = parser.add_argument_group('GPTQ-for-LLaMa')
group.add_argument('--wbits', type=int, default=0, help='Load a pre-quantized model with specified precision in bits. 2, 3, 4 and 8 are supported.')
group.add_argument('--model_type', type=str, help='Model type of pre-quantized model. Currently LLaMA, OPT, and GPT-J are supported.')
group.add_argument('--groupsize', type=int, default=-1, help='Group size.')
group.add_argument('--pre_layer', type=int, nargs='+', help='The number of layers to allocate to the GPU. Setting this parameter enables CPU offloading for 4-bit models. For multi-gpu, write the numbers separated by spaces, eg --pre_layer 30 60.')
group.add_argument('--checkpoint', type=str, help='The path to the quantized checkpoint file. If not specified, it will be automatically detected.')
group.add_argument('--monkey-patch', action='store_true', help='Apply the monkey patch for using LoRAs with quantized models.')
# BigDL-LLM
group = parser.add_argument_group('BigDL-LLM')
group.add_argument('--device', type=str, default='cpu', help='The device type; it can be CPU or GPU.')
group.add_argument('--load-in-4bit', action='store_true', default=False, help='Boolean value. True means loading linear weights to symmetric int 4 if '
                   'the model is a regular fp16/bf16/fp32 model, and to asymmetric int 4 if the model is a GPTQ model. Defaults to False.')
group.add_argument('--load-in-low-bit', type=str, default=None, help='str value; options are sym_int4, asym_int4, sym_int5, asym_int5, '
                   'sym_int8, nf3, nf4, fp4, fp8, fp8_e4m3, fp8_e5m2, fp16 or bf16. sym_int4 means symmetric int 4, asym_int4 means asymmetric int 4, '
                   'nf4 means 4-bit NormalFloat, etc. The relevant low-bit optimizations will be applied to the model.')
group.add_argument('--optimize-model', action='store_true', help='Boolean value; whether to further optimize the low-bit LLM model.')
group.add_argument('--modules-to-not-convert', type=str, default=None, help='List of str; modules (nn.Module) that are skipped when conducting model optimizations.')
group.add_argument('--cpu-embedding', action='store_true', help='Whether to replace the Embedding layer; may need to be set to `True` when running BigDL-LLM on GPU on Windows. Defaults to `False`.')
group.add_argument('--lightweight-bmm', action='store_true', help='Whether to replace the torch.bmm ops; may need to be set to `True` when running BigDL-LLM on GPU on Windows.')
group.add_argument('--use-cache', action='store_true', help='If use_cache is True, past key values are used to speed up decoding when applicable to the model.')
group.add_argument('--trust-remote-code', action='store_true', help='Set trust_remote_code=True while loading the model. Necessary for some models.')
# HQQ
group = parser.add_argument_group('HQQ')
group.add_argument('--hqq_backend', type=str, default='PYTORCH_COMPILE', help='Backend for the HQQ loader. Valid options: PYTORCH, PYTORCH_COMPILE, ATEN.')
# DeepSpeed
group = parser.add_argument_group('DeepSpeed')
group.add_argument('--deepspeed', action='store_true', help='Enable the use of DeepSpeed ZeRO-3 for inference via the Transformers integration.')
group.add_argument('--nvme-offload-dir', type=str, help='DeepSpeed: Directory to use for ZeRO-3 NVME offloading.')
group.add_argument('--local_rank', type=int, default=0, help='DeepSpeed: Optional argument for distributed setups.')
# RoPE
group = parser.add_argument_group('RoPE')
group.add_argument('--alpha_value', type=float, default=1, help='Positional embeddings alpha factor for NTK RoPE scaling. Use either this or compress_pos_emb, not both.')
group.add_argument('--rope_freq_base', type=int, default=0, help='If greater than 0, will be used instead of alpha_value. Those two are related by rope_freq_base = 10000 * alpha_value ^ (64 / 63).')
group.add_argument('--compress_pos_emb', type=int, default=1, help="Positional embeddings compression factor. Should be set to (context length) / (model\'s original context length). Equal to 1/rope_freq_scale.")
# Gradio
group = parser.add_argument_group('Gradio')
group.add_argument('--listen', action='store_true', help='Make the web UI reachable from your local network.')
group.add_argument('--listen-port', type=int, help='The listening port that the server will use.')
group.add_argument('--listen-host', type=str, help='The hostname that the server will use.')
group.add_argument('--share', action='store_true', help='Create a public URL. This is useful for running the web UI on Google Colab or similar.')
group.add_argument('--auto-launch', action='store_true', default=False, help='Open the web UI in the default browser upon launch.')
group.add_argument('--gradio-auth', type=str, help='Set Gradio authentication password in the format "username:password". Multiple credentials can also be supplied with "u1:p1,u2:p2,u3:p3".', default=None)
group.add_argument('--gradio-auth-path', type=str, help='Set the Gradio authentication file path. The file should contain one or more user:password pairs in the same format as above.', default=None)
group.add_argument('--ssl-keyfile', type=str, help='The path to the SSL certificate key file.', default=None)
group.add_argument('--ssl-certfile', type=str, help='The path to the SSL certificate cert file.', default=None)
# API
group = parser.add_argument_group('API')
group.add_argument('--api', action='store_true', help='Enable the API extension.')
group.add_argument('--public-api', action='store_true', help='Create a public URL for the API using Cloudflare.')
group.add_argument('--public-api-id', type=str, help='Tunnel ID for named Cloudflare Tunnel. Use together with public-api option.', default=None)
group.add_argument('--api-port', type=int, default=5000, help='The listening port for the API.')
group.add_argument('--api-key', type=str, default='', help='API authentication key.')
group.add_argument('--admin-key', type=str, default='', help='API authentication key for admin tasks like loading and unloading models. If not set, will be the same as --api-key.')
group.add_argument('--nowebui', action='store_true', help='Do not launch the Gradio UI. Useful for launching the API in standalone mode.')
# Multimodal
group = parser.add_argument_group('Multimodal')
group.add_argument('--multimodal-pipeline', type=str, default=None, help='The multimodal pipeline to use. Examples: llava-7b, llava-13b.')
# Deprecated parameters
# group = parser.add_argument_group('Deprecated')
args = parser.parse_args()
args_defaults = parser.parse_args([])
provided_arguments = []
for arg in sys.argv[1:]:
arg = arg.lstrip('-').replace('-', '_')
if hasattr(args, arg):
provided_arguments.append(arg)
deprecated_args = []
def do_cmd_flags_warnings():
# Deprecation warnings
for k in deprecated_args:
if getattr(args, k):
logger.warning(f'The --{k} flag has been deprecated and will be removed soon. Please remove that flag.')
# Security warnings
if args.trust_remote_code:
logger.warning('trust_remote_code is enabled. This is dangerous.')
if 'COLAB_GPU' not in os.environ and not args.nowebui:
if args.share:
logger.warning("The gradio \"share link\" feature uses a proprietary executable to create a reverse tunnel. Use it with care.")
if any((args.listen, args.share)) and not any((args.gradio_auth, args.gradio_auth_path)):
logger.warning("\nYou are potentially exposing the web UI to the entire internet without any access password.\nYou can create one with the \"--gradio-auth\" flag like this:\n\n--gradio-auth username:password\n\nMake sure to replace username:password with your own.")
if args.multi_user:
logger.warning('\nThe multi-user mode is highly experimental and should not be shared publicly.')
def fix_loader_name(name):
if not name:
return name
name = name.lower()
if name in ['llamacpp', 'llama.cpp', 'llama-cpp', 'llama cpp']:
return 'llama.cpp'
if name in ['llamacpp_hf', 'llama.cpp_hf', 'llama-cpp-hf', 'llamacpp-hf', 'llama.cpp-hf']:
return 'llamacpp_HF'
elif name in ['transformers', 'huggingface', 'hf', 'hugging_face', 'hugging face']:
return 'Transformers'
elif name in ['autogptq', 'auto-gptq', 'auto_gptq', 'auto gptq']:
return 'AutoGPTQ'
elif name in ['gptq-for-llama', 'gptqforllama', 'gptqllama', 'gptq for llama', 'gptq_for_llama']:
return 'GPTQ-for-LLaMa'
elif name in ['exllama', 'ex-llama', 'ex_llama', 'exlama']:
return 'ExLlama'
elif name in ['exllamav2', 'exllama-v2', 'ex_llama-v2', 'exlamav2', 'exlama-v2', 'exllama2', 'exllama-2']:
return 'ExLlamav2'
elif name in ['exllamav2-hf', 'exllamav2_hf', 'exllama-v2-hf', 'exllama_v2_hf', 'exllama-v2_hf', 'exllama2-hf', 'exllama2_hf', 'exllama-2-hf', 'exllama_2_hf', 'exllama-2_hf']:
return 'ExLlamav2_HF'
elif name in ['ctransformers', 'ctranforemrs', 'ctransformer']:
return 'ctransformers'
elif name in ['autoawq', 'awq', 'auto-awq']:
return 'AutoAWQ'
elif name in ['quip#', 'quip-sharp', 'quipsharp', 'quip_sharp']:
return 'QuIP#'
elif name in ['hqq']:
return 'HQQ'
elif name in ['BigDL-LLM', 'bigdl-llm', 'bigdl']:
return 'BigDL-LLM'
def add_extension(name, last=False):
if args.extensions is None:
args.extensions = [name]
elif last:
args.extensions = [x for x in args.extensions if x != name]
args.extensions.append(name)
elif name not in args.extensions:
args.extensions.append(name)
def is_chat():
return True
args.loader = fix_loader_name(args.loader)
# Activate the multimodal extension
if args.multimodal_pipeline is not None:
add_extension('multimodal')
# Activate the API extension
if args.api or args.public_api:
add_extension('openai', last=True)
# Load model-specific settings
with Path(f'{args.model_dir}/config.yaml') as p:
if p.exists():
model_config = yaml.safe_load(open(p, 'r').read())
else:
model_config = {}
# Load custom model-specific settings
with Path(f'{args.model_dir}/config-user.yaml') as p:
if p.exists():
user_config = yaml.safe_load(open(p, 'r').read())
else:
user_config = {}
model_config = OrderedDict(model_config)
user_config = OrderedDict(user_config)
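
The BigDL-LLM argument group above is ultimately consumed by the model loader. As a rough sketch (not the WebUI's actual loading code) of how those flags typically map onto BigDL-LLM's transformers-style API; the model path is a placeholder and the keyword names follow the BigDL-LLM documentation:

```python
# Sketch: loading a Hugging Face model with BigDL-LLM low-bit optimizations,
# roughly mirroring --load-in-low-bit / --optimize-model / --trust-remote-code / --use-cache.
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import AutoTokenizer

model_path = 'models/Llama-2-7b-chat-hf'   # placeholder folder under --model-dir

model = AutoModelForCausalLM.from_pretrained(model_path,
                                             load_in_low_bit='sym_int4',   # or load_in_4bit=True
                                             optimize_model=True,
                                             trust_remote_code=True,
                                             use_cache=True)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# With --device GPU on an Intel GPU, the optimized model is then moved with model.to('xpu').
```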
