Speech Transcription API is a RESTful service that processes audio input and converts speech into text using state-of-the-art speech recognition models. It is suited to building transcription tools, smart assistants, and voice-controlled applications.
- 🎤 Transcribe audio to text (STT, speech-to-text)
- 🔐 Secure JWT-based authentication
- ⚡ FastAPI backend with async support
- 🐳 Dockerized for easy deployment (CPU & GPU)
Follow the steps below to set up and run the Speech Transcription API using Docker (with optional GPU acceleration).
You can use either uv (recommended for speed) or pip.
With uv, install the dependencies:
uv sync
With pip:
- Create a virtual environment:
python -m venv .venv
- Activate the virtual environment:
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows
- Install the required packages:
pip install -r requirements.txt
Copy the example environment file and fill in the necessary values:
cp .env.example .env
Edit the .env file to set your environment variables. You can use the default values or customize them as needed.
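As a quick sanity check after editing, the sketch below compares the variable names in .env against those in .env.example and reports any that are missing. It assumes both files use plain KEY=VALUE lines with optional # comments; the helper names `parse_env_keys` and `missing_keys` are illustrative and not part of this project.

```python
from pathlib import Path


def parse_env_keys(text: str) -> set[str]:
    """Return the variable names defined in dotenv-style text."""
    keys = set()
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Tolerate an optional leading "export ".
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        name, sep, _value = line.partition("=")
        if sep and name.strip():
            keys.add(name.strip())
    return keys


def missing_keys(example_text: str, env_text: str) -> list[str]:
    """Keys present in the example file but absent from the real one."""
    return sorted(parse_env_keys(example_text) - parse_env_keys(env_text))


if __name__ == "__main__":
    example, env = Path(".env.example"), Path(".env")
    if example.exists() and env.exists():
        missing = missing_keys(example.read_text(), env.read_text())
        print("Missing keys:", ", ".join(missing) if missing else "none")
    else:
        print("Expected both .env.example and .env in the current directory.")
```

Run it from the project root, next to the two files. It only checks that each key exists, not that its value is valid.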
Start the Docker container with the following command:
docker-compose up --build
This command will build the Docker image and start the container.
To use GPU acceleration, first configure GPU support in the docker-compose.yml file, then run:
docker-compose up --build
This command will build the Docker image and start the container with GPU support.
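Before starting the GPU build, it can help to confirm that the host has the expected tools. The sketch below is a host-side check only, assuming an NVIDIA GPU whose driver ships the `nvidia-smi` utility; the helper name `find_tools` is illustrative. Containers usually also need the NVIDIA Container Toolkit to reach the GPU, which this script does not verify.

```python
import shutil
import subprocess


def find_tools(names: list[str]) -> dict[str, bool]:
    """Map each executable name to whether it is found on PATH."""
    return {name: shutil.which(name) is not None for name in names}


if __name__ == "__main__":
    status = find_tools(["docker", "nvidia-smi"])
    for tool, found in status.items():
        print(f"{tool}: {'found' if found else 'NOT found'}")
    if status["nvidia-smi"]:
        # "nvidia-smi -L" lists the GPUs the driver can see.
        result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True)
        print(result.stdout.strip() or result.stderr.strip())
```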
The API will then be available at http://localhost:8000.
Documentation will be available at http://localhost:8000/docs.
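Once the container is running, a short script can confirm that the service is reachable. The sketch below relies on FastAPI's default behaviour of serving an OpenAPI schema at /openapi.json (an assumption here; the project may have moved or disabled it) and prints the routes the schema declares, so no endpoint names need to be guessed. The function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # address given above


def list_routes(schema: dict) -> list[str]:
    """Format each operation in an OpenAPI schema as 'METHOD /path'."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in schema.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in http_methods:
                routes.append(f"{method.upper()} {path}")
    return sorted(routes)


def fetch_schema(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """Download the OpenAPI schema (assumes the default /openapi.json path)."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    try:
        for route in list_routes(fetch_schema()):
            print(route)
    except (OSError, ValueError) as exc:  # connection errors or invalid JSON
        print(f"Could not read the API schema at {BASE_URL}: {exc}")
```

The printed list should match what the interactive documentation page shows, which makes it a convenient smoke test after each rebuild.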