update
parent b711571f9f
commit ac426cbd7d
1 changed file with 7 additions and 5 deletions
12
README.md
@@ -17,6 +17,13 @@ A core innovation of VibeVoice is its use of continuous speech tokenizers (Acous
 
 The model can synthesize speech up to **90 minutes** long with up to **4 distinct speakers**, surpassing the typical 1-2 speaker limits of many prior models.
 
+
+<p align="left">
+<img src="Figures/MOS-preference.png" alt="MOS Preference Results" height="260px">
+<img src="Figures/VibeVoice.jpg" alt="VibeVoice Overview" height="250px" style="margin-right: 10px;">
+</p>
+
+
 ### 🎵 Demo Examples
 
 **Cross-Lingual**
@@ -29,11 +36,6 @@ https://github.com/user-attachments/assets/6f27a8a5-0c60-4f57-87f3-7dea2e11c730
 
 For more examples, try it out via [Demo](https://aka.ms/VibeVoice-Demo).
 
-<p align="left">
-<img src="Figures/MOS-preference.png" alt="MOS Preference Results" height="260px">
-<img src="Figures/VibeVoice.jpg" alt="VibeVoice Overview" height="250px" style="margin-right: 10px;">
-</p>
-
 ## Models
 
 | Model | Context Length | Generation Length | Weight |
 |-------|----------------|----------|----------|