VibeVoice/demo/VibeVoice_colab.ipynb
2025-08-27 18:57:33 -07:00

{
"cells": [
{
"cell_type": "markdown",
"source": [
"# VibeVoice Colab — T4 Quickstart (1.5B)\n",
"This notebook is a quickstart guide for running VibeVoice on Colab with a T4 GPU.\n",
"\n",
"The T4 supports only the 1.5B model because of its limited GPU memory. For the best TTS quality, try the 7B model on a more powerful GPU.\n"
],
"metadata": {
"id": "AHLptWHtQmw-"
},
"id": "AHLptWHtQmw-"
},
{
"cell_type": "markdown",
"source": [
"## Step 1: Use T4\n",
"\n"
],
"metadata": {
"id": "vzwhx5AtQ37g"
},
"id": "vzwhx5AtQ37g"
},
{
"cell_type": "markdown",
"source": [
"Use T4 in Colab: go to Runtime → Change runtime type → Hardware accelerator: GPU → T4."
],
"metadata": {
"id": "ryxffqxlVbbP"
},
"id": "ryxffqxlVbbP"
},
{
"cell_type": "code",
"source": [
"import torch\n",
"print(torch.cuda.is_available())\n",
"!nvidia-smi"
],
"metadata": {
"id": "Hek0yZKdVot_"
},
"id": "Hek0yZKdVot_",
"execution_count": null,
"outputs": []
},
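{
"cell_type": "markdown",
"source": [
"Optional: the cell below is a minimal sketch for confirming which GPU is attached and how much memory it has, since memory is the reason this guide uses the 1.5B model. It relies only on standard `torch.cuda` calls and assumes the GPU of interest is device index 0."
],
"metadata": {
"id": "gpuMemNote01"
},
"id": "gpuMemNote01"
},
{
"cell_type": "code",
"source": [
"import torch\n",
"\n",
"# Sketch: report the attached GPU's name and total memory.\n",
"# Assumption: the GPU of interest is device index 0.\n",
"if torch.cuda.is_available():\n",
"    props = torch.cuda.get_device_properties(0)\n",
"    print(f'{props.name}: {props.total_memory / 1024**3:.1f} GiB')\n",
"else:\n",
"    print('No CUDA GPU detected; switch the runtime to T4 before continuing.')"
],
"metadata": {
"id": "gpuMemCode01"
},
"id": "gpuMemCode01",
"execution_count": null,
"outputs": []
},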
{
"cell_type": "markdown",
"source": [
"## Step 2: Install the environment"
],
"metadata": {
"id": "S8D9WNSvWFwy"
},
"id": "S8D9WNSvWFwy"
},
{
"cell_type": "code",
"source": [
"!git clone https://github.com/microsoft/VibeVoice.git\n",
"\n",
"import os\n",
"os.chdir(\"./VibeVoice\")\n",
"\n",
"!apt update && apt install ffmpeg -y\n",
"!pip install -e ."
],
"metadata": {
"id": "2xGbc7gKMD7A"
},
"id": "2xGbc7gKMD7A",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Step 3: Run VibeVoice"
],
"metadata": {
"id": "YmxjRFSFW4aE"
},
"id": "YmxjRFSFW4aE"
},
{
"cell_type": "code",
"source": [
"# The first run downloads the checkpoint, which takes ~3 minutes\n",
"!python demo/inference_from_file.py --model_path microsoft/VibeVoice-1.5B --txt_path demo/text_examples/2p_short.txt --speaker_names Alice Frank\n",
"\n",
"from IPython.display import Audio\n",
"Audio(\"./outputs/2p_short_generated.wav\")"
],
"metadata": {
"id": "MfQ0geOJQNS5"
},
"id": "MfQ0geOJQNS5",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Create your own example"
],
"metadata": {
"id": "Pd6-KX2Hdswx"
},
"id": "Pd6-KX2Hdswx"
},
{
"cell_type": "code",
"source": [
"text = \"\"\"Speaker 1: Can I try VibeVoice with my own example?\n",
"Speaker 2: Of course! VibeVoice is open-source and built to benefit everyone, so you're welcome to try it out.\"\"\"\n",
"with open(\"demo/text_examples/my_example.txt\", \"w\", encoding=\"utf-8\") as f:\n",
"    f.write(text)"
],
"metadata": {
"id": "ZB482MvXbg8M"
},
"id": "ZB482MvXbg8M",
"execution_count": null,
"outputs": []
},
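{
"cell_type": "markdown",
"source": [
"Optional: a minimal sketch for sanity-checking the script file before running inference. It assumes every non-empty line should start with `Speaker <number>:`, as in the example above; the demo script's actual parsing rules may differ, so treat this as a rough check only."
],
"metadata": {
"id": "fmtCheckNote1"
},
"id": "fmtCheckNote1"
},
{
"cell_type": "code",
"source": [
"import re\n",
"\n",
"# Assumption: each non-empty line starts with 'Speaker <number>:' followed by text,\n",
"# mirroring the example above (not a confirmed rule of the demo script).\n",
"pattern = re.compile(r'^Speaker \\d+: \\S')\n",
"with open('demo/text_examples/my_example.txt', encoding='utf-8') as f:\n",
"    lines = [ln for ln in f.read().splitlines() if ln.strip()]\n",
"bad = [ln for ln in lines if not pattern.match(ln)]\n",
"print(f'{len(lines)} lines checked, {len(bad)} without a speaker label')"
],
"metadata": {
"id": "fmtCheckCode1"
},
"id": "fmtCheckCode1",
"execution_count": null,
"outputs": []
},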
{
"cell_type": "code",
"source": [
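"!python demo/inference_from_file.py --model_path microsoft/VibeVoice-1.5B --txt_path demo/text_examples/my_example.txt --speaker_names Alice Frank\n",
"\n",
"# Re-import so this cell also works when run on its own\n",
"from IPython.display import Audio\n",
"Audio(\"./outputs/my_example_generated.wav\")\n"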
],
"metadata": {
"id": "heoxL08yM-gf"
},
"id": "heoxL08yM-gf",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
},
"accelerator": "GPU"
},
"nbformat": 4,
"nbformat_minor": 5
}