From ac426cbd7d8a4f360186e171e5a088a08a6f91e2 Mon Sep 17 00:00:00 2001
From: JianweiYu <tomasyu1994@outlook.com>
Date: Mon, 25 Aug 2025 09:33:01 -0700
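Subject: [PATCH] README: move figures above demo examples

Move the MOS preference and VibeVoice overview figures from below the
demo examples to directly after the model capability summary.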

---
 README.md | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/README.md b/README.md
index 331dec3..6f762a1 100644
--- a/README.md
+++ b/README.md
@@ -17,6 +17,13 @@ A core innovation of VibeVoice is its use of continuous speech tokenizers (Acous
 
 The model can synthesize speech up to **90 minutes** long with up to **4 distinct speakers**, surpassing the typical 1-2 speaker limits of many prior models. 
 
+
+<p align="left">
+  <img src="Figures/MOS-preference.png" alt="MOS Preference Results" height="260px">
+  <img src="Figures/VibeVoice.jpg" alt="VibeVoice Overview" height="250px" style="margin-right: 10px;">
+</p>
+
+
 ### 🎵 Demo Examples
 
 **Cross-Lingual**
@@ -29,11 +36,6 @@ https://github.com/user-attachments/assets/6f27a8a5-0c60-4f57-87f3-7dea2e11c730
 
 For more examples, try it out via [Demo](https://aka.ms/VibeVoice-Demo).
 
-<p align="left">
-  <img src="Figures/MOS-preference.png" alt="MOS Preference Results" height="260px">
-  <img src="Figures/VibeVoice.jpg" alt="VibeVoice Overview" height="250px" style="margin-right: 10px;">
-</p>
-
 ## Models
 | Model | Context Length | Generation Length |  Weight |
 |-------|----------------|----------|----------|