{
"cells": [
{
"cell_type": "markdown",
"source": [
"# VibeVoice Colab — T4 Quickstart (1.5B)\n",
"This notebook is a quickstart guide for running VibeVoice on Google Colab with a T4 GPU.\n",
"\n",
"The T4 supports only the 1.5B model because of its limited GPU memory. For the best TTS quality, try the 7B model on a GPU with more memory.\n"
],
"metadata": {
"id": "AHLptWHtQmw-"
},
"id": "AHLptWHtQmw-"
},
{
"cell_type": "markdown",
"source": [
"## Step 1: Use T4\n",
"\n"
],
"metadata": {
"id": "vzwhx5AtQ37g"
},
"id": "vzwhx5AtQ37g"
},
{
"cell_type": "markdown",
"source": [
"To use a T4 in Colab, go to Runtime → Change runtime type, set the hardware accelerator to GPU, and select T4."
],
"metadata": {
"id": "ryxffqxlVbbP"
},
"id": "ryxffqxlVbbP"
},
{
"cell_type": "code",
"source": [
"import torch\n",
"\n",
"# Should print True when a GPU runtime is attached\n",
"print(torch.cuda.is_available())\n",
"# Show the GPU model and memory; the output should list a T4\n",
"!nvidia-smi"
],
"metadata": {
"id": "Hek0yZKdVot_"
},
"id": "Hek0yZKdVot_",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Step 2: Install the Environment"
],
"metadata": {
"id": "S8D9WNSvWFwy"
},
"id": "S8D9WNSvWFwy"
},
{
"cell_type": "code",
"source": [
"# Clone the VibeVoice repository\n",
"!git clone https://github.com/microsoft/VibeVoice.git\n",
"\n",
"# Make the repository root the working directory for the following cells\n",
"import os\n",
"os.chdir(\"./VibeVoice\")\n",
"\n",
"# Install ffmpeg, then install VibeVoice in editable mode\n",
"!apt update && apt install ffmpeg -y\n",
"!pip install -e ."
],
"metadata": {
"id": "2xGbc7gKMD7A"
},
"id": "2xGbc7gKMD7A",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"## Step 3: Run VibeVoice"
],
"metadata": {
"id": "YmxjRFSFW4aE"
},
"id": "YmxjRFSFW4aE"
},
{
"cell_type": "code",
"source": [
"# The first run downloads the checkpoint, which takes about 3 minutes\n",
"!python demo/inference_from_file.py --model_path microsoft/VibeVoice-1.5B --txt_path demo/text_examples/2p_short.txt --speaker_names Alice Frank\n",
"\n",
"# Play the generated audio inline\n",
"from IPython.display import Audio\n",
"Audio(\"./outputs/2p_short_generated.wav\")"
],
"metadata": {
"id": "MfQ0geOJQNS5"
},
"id": "MfQ0geOJQNS5",
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
"### Create your own example"
],
"metadata": {
"id": "Pd6-KX2Hdswx"
},
"id": "Pd6-KX2Hdswx"
},
{
"cell_type": "code",
"source": [
"text = \"\"\"Speaker 1: Can I try VibeVoice with my own example?\n",
"Speaker 2: Of course! VibeVoice is open-source, built to benefit everyone — you’re welcome to try it out.\"\"\"\n",
"with open(\"demo/text_examples/my_example.txt\", \"w\", encoding=\"utf-8\") as f:\n",
"    f.write(text)"
],
"metadata": {
"id": "ZB482MvXbg8M"
},
"id": "ZB482MvXbg8M",
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"source": [
"# Generate speech from the custom script, then play it (Audio was imported in Step 3)\n",
"!python demo/inference_from_file.py --model_path microsoft/VibeVoice-1.5B --txt_path demo/text_examples/my_example.txt --speaker_names Alice Frank\n",
"Audio(\"./outputs/my_example_generated.wav\")\n"
],
"metadata": {
"id": "heoxL08yM-gf"
},
"id": "heoxL08yM-gf",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"colab": {
"provenance": [],
"gpuType": "T4"
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.11"
},
"accelerator": "GPU"
},
"nbformat": 4,
"nbformat_minor": 5
}