
Chatterbox Streaming API Docker

A high-performance text-to-speech API with streaming capabilities, featuring OpenAI-compatible endpoints and multiple voice support. Built with FastAPI and optimised for Docker deployment with GPU acceleration.

πŸš€ Features

  • OpenAI-Compatible API: Drop-in replacement for OpenAI's TTS API endpoints
  • Real-time Streaming: Low-latency audio generation with chunked streaming
  • Multiple Voices: Six distinct voices (alloy, echo, fable, onyx, nova, shimmer)
  • GPU Acceleration: CUDA support for fast inference
  • Live Chat Integration: Real-time LLM + TTS chat with Ollama
  • Cross-Platform: Windows, macOS, and Linux support
  • Docker Ready: Easy deployment with Docker Compose
  • Voice Upload: Custom voice upload functionality

πŸ“‹ Prerequisites

  • Docker and Docker Compose
  • NVIDIA GPU with CUDA support (recommended)
  • Python 3.10+ (for local development)
  • Ollama (for chat functionality)

πŸ› οΈ Quick Start

1. Clone the Repository

git clone https://github.com/dwain-barnes/chatterbox-streaming-api-docker.git
cd chatterbox-streaming-api-docker

2. Build and Run with Docker

# Build the image with voice support
docker build -f Dockerfile.voices -t chatterbox-tts-voices .

# Run the container
docker run -d --gpus all -p 5001:5001 --name chatterbox-tts-voices chatterbox-tts-voices

Or use Docker Compose:

docker-compose up -d

3. Verify Installation

# Check server health
curl http://localhost:5001/health

# List available voices
curl http://localhost:5001/v1/voices
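
If you script against the API, it helps to wait for the model to finish loading before sending requests. A minimal sketch, assuming /health simply returns HTTP 200 once the server is ready:

import time
import requests

def wait_until_ready(base_url="http://localhost:5001", timeout=120):
    """Poll /health until the server responds with 200 or the timeout expires."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            if requests.get(f"{base_url}/health", timeout=5).status_code == 200:
                return True
        except requests.ConnectionError:
            pass  # server not up yet, keep polling
        time.sleep(2)
    return False

if wait_until_ready():
    print("Chatterbox is ready")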

4. Test Voice Generation

# Test with curl
curl -X POST http://localhost:5001/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello, this is a test of the text-to-speech system.","voice":"alloy","model":"tts-1"}' \
  --output test.wav

# Or use the provided test scripts
./voice_test_script.sh        # Linux/macOS
voice_test.bat               # Windows

🎯 Usage Examples

Basic TTS API Call

import requests

# Request a complete WAV file from the local Chatterbox server
response = requests.post(
    "http://localhost:5001/v1/audio/speech",
    json={
        "input": "Hello, world! This is Chatterbox speaking.",
        "voice": "nova",
        "model": "tts-1"
    }
)

# Save the returned audio to disk
with open("output.wav", "wb") as f:
    f.write(response.content)

Streaming TTS

import requests

# Request chunked audio from the streaming endpoint
response = requests.post(
    "http://localhost:5001/v1/audio/speech/stream",
    json={
        "input": "This is a streaming example with real-time audio generation.",
        "voice": "echo",
        "stream": True
    },
    stream=True
)

# Write chunks as they arrive instead of waiting for the full file
with open("streaming_output.wav", "wb") as f:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:
            f.write(chunk)
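
To see the latency benefit of streaming, you can time how quickly the first chunk arrives compared with the full download. A small sketch against the same endpoint:

import time
import requests

start = time.time()
response = requests.post(
    "http://localhost:5001/v1/audio/speech/stream",
    json={"input": "Measuring time to first audio chunk.", "voice": "echo", "stream": True},
    stream=True,
)

first_chunk_at = None
total_bytes = 0
for chunk in response.iter_content(chunk_size=1024):
    if chunk:
        if first_chunk_at is None:
            first_chunk_at = time.time() - start  # delay before audio starts flowing
        total_bytes += len(chunk)

print(f"First chunk after {first_chunk_at:.2f}s, {total_bytes} bytes in {time.time() - start:.2f}s total")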

Real-time Chat with LLM + TTS

First, ensure Ollama is running:

# Pull a model and start the Ollama server
ollama pull gemma3:latest
ollama serve

Then start the chat:

# Basic chat
python ultra_simple_chat.py

# With custom voice and model
python ultra_simple_chat.py --voice fable --llm-model gemma3:latest
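
Under the hood the chat script alternates between the two services: your prompt goes to Ollama, and the reply text is sent to the TTS endpoint. A minimal sketch of that round trip, assuming Ollama's standard /api/generate endpoint and the /v1/audio/speech endpoint shown above (the actual script adds turn-taking and playback on top of this):

import requests

OLLAMA_URL = "http://localhost:11434"
TTS_URL = "http://localhost:5001"

# Ask the LLM for a reply (non-streaming for simplicity)
llm = requests.post(
    f"{OLLAMA_URL}/api/generate",
    json={"model": "gemma3:latest", "prompt": "Say hello in one sentence.", "stream": False},
).json()
reply = llm["response"]

# Speak the reply with Chatterbox
audio = requests.post(
    f"{TTS_URL}/v1/audio/speech",
    json={"input": reply, "voice": "fable", "model": "tts-1"},
)
with open("reply.wav", "wb") as f:
    f.write(audio.content)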

πŸŽ™οΈ Available Voices

Voice     Characteristics
alloy     Professional, neutral, balanced
echo      Clear, articulate, precise
fable     Warm, friendly, storytelling
onyx      Deep, authoritative, commanding
nova      Energetic, enthusiastic, bright
shimmer   Soft, gentle, soothing

πŸ”§ Configuration

Environment Variables

# Docker environment
NVIDIA_VISIBLE_DEVICES=all
PYTHONUNBUFFERED=1

# Model configuration
TORCH_HOME=/app/.cache
HF_HOME=/app/.cache
TRANSFORMERS_CACHE=/app/.cache

Server Parameters

python server.py [voices_dir] [supported_voices] [options]

Options:
  --host          Host to bind to (default: 0.0.0.0)
  --port          Port to bind to (default: 5001)
  --device        Device to use: cuda/mps/cpu (default: cuda)
  --exaggeration  Exaggeration factor 0-1 (default: 0.5)
  --cfg-weight    CFG weight 0-1 (default: 0.5)
  --temperature   Temperature for sampling (default: 0.8)
  --chunk-size    Chunk size for streaming (default: 50)

Chat Configuration

python ultra_simple_chat.py [options]

Options:
  --llm-url       Ollama URL (default: http://localhost:11434)
  --llm-model     LLM model (default: gemma2:latest)
  --tts-url       TTS URL (default: http://localhost:5001)
  --voice         TTS voice (default: alloy)
  --system-prompt Custom system prompt
  --debug         Enable debug mode

πŸ“š API Reference

OpenAI-Compatible Endpoints

POST /v1/audio/speech

Generate complete audio from text.

Request:

{
  "model": "tts-1",
  "input": "Text to speak",
  "voice": "alloy",
  "response_format": "wav",
  "speed": 1.0
}

Response: Audio file (WAV/MP3)

POST /v1/audio/speech/stream

Generate streaming audio from text.

Request:

{
  "model": "tts-1", 
  "input": "Text to speak",
  "voice": "nova",
  "stream": true
}

Response: Streaming audio chunks

GET /v1/models

List available models.

GET /v1/voices

List available voices.
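
Both listing endpoints return JSON and can be queried the same way. A quick sketch that prints whatever identifiers the server reports (the exact JSON shape isn't documented here, so the raw responses are printed as-is):

import requests

models = requests.get("http://localhost:5001/v1/models").json()
voices = requests.get("http://localhost:5001/v1/voices").json()
print("Models:", models)
print("Voices:", voices)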

POST /v1/voices/upload

Upload custom voice file.
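
A sketch of a multipart upload from Python. The form field names ("file", "name") are assumptions for illustration; check the server code for the exact fields it expects:

import requests

# NOTE: field names below are illustrative, not confirmed by the API docs
with open("my_voice_sample.wav", "rb") as sample:
    resp = requests.post(
        "http://localhost:5001/v1/voices/upload",
        files={"file": ("my_voice_sample.wav", sample, "audio/wav")},
        data={"name": "my_voice"},
    )
print(resp.status_code, resp.text)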

πŸ§ͺ Development

Local Development Setup

# Install dependencies
pip install -r requirements.txt

# Install Chatterbox TTS
pip install git+https://github.com/davidbrowne17/chatterbox-streaming.git

# Run server locally
python server.py ./voices alloy,echo,fable,onyx,nova,shimmer --device cpu

Testing

# Test all voices
./voice_test_script.sh

# Debug mode (no chunking)
python debug_chat.py --debug

# Simple chat test
python ultra_simple_chat.py --debug

🐳 Docker Configuration

Build Options

# Standard build
docker build -t chatterbox-tts .

# With voice support
docker build -f Dockerfile.voices -t chatterbox-tts-voices .

# Development build
docker build --target development -t chatterbox-tts-dev .

Docker Compose

The docker-compose.yml includes:

  • GPU support
  • Volume mounting for voices
  • Health checks
  • Restart policies
  • Environment configuration

πŸ” Troubleshooting

Common Issues

GPU not detected:

# Check NVIDIA runtime
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi

# Fallback to CPU
docker run -p 5001:5001 chatterbox-tts-voices python server.py /app/voices alloy,echo --device cpu

Audio playback issues on Windows:

# Use debug version
python debug_chat.py

# Check audio format
file test.wav

Ollama connection issues:

# Check Ollama status
curl http://localhost:11434/api/tags

# Restart Ollama
ollama serve

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

Disclaimer

Don't use this model to do bad things.

⭐ Star this repository if you find it useful!
