str2speech is a simple command-line tool for converting text to speech using Transformer-based text-to-speech (TTS) models. It supports multiple models and voice presets, allowing users to generate high-quality speech audio from text.
We just added support for Dia-1.6B and Hindi voices in Kokoro TTS.
We just added support for ByteDance's MegaTTS3. Here's how easy it is to use it:
str2speech --model megatts3 --text "This is awesome!"
It works fine on a CPU alone (it needs about 10 GB of RAM), but having CUDA available is always better.
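To see whether PyTorch can find a CUDA device before trying a large model, a quick check along these lines may help. This is a minimal sketch: `preferred_device` is a hypothetical helper name, and it only reports what PyTorch detects. It does not control how str2speech itself picks a device.

```python
import importlib.util


def preferred_device() -> str:
    """Return "cuda" if PyTorch is installed and sees a CUDA device, else "cpu"."""
    # Fall back to CPU when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


print(preferred_device())
```

If this prints `cpu`, expect the larger models to be slow and to need the amount of RAM mentioned above.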
We now support Microsoft's Speech T5. This is a very lightweight model, and sounds pretty good. Try it out with this:
str2speech --model "microsoft/speecht5_tts" \
--text "My dog is prettier than yours." \
--output "t5test.wav"
We now support Spark-TTS-0.5B. This is an awesome model. Here's how you use it:
str2speech --model "SparkAudio/Spark-TTS-0.5B" \
--text "Hello from Spark" \
--output "sparktest.wav"
Added support for Sesame CSM-1B. Here's how to use it:
export HF_TOKEN=<your huggingface token>
str2speech --text "Hello from Sesame" --model "sesame/csm-1b"
Added support for Kokoro-82M. This is how you run it:
str2speech --text "Hello again" --model "kokoro"
This is probably the easiest way to use Kokoro TTS.
Added support for Zyphra Zonos. Try this out:
str2speech --text "Hello from Zonos" \
--model "Zyphra/Zonos-v0.1-transformer" \
--output hellozonos.wav
Alternatively, you could write Python code to use it:
from str2speech.speaker import Speaker
speaker = Speaker("Zyphra/Zonos-v0.1-transformer")
speaker.text_to_speech("Hello, this is a test!", "output.wav")
You might need to install espeak. Here's how you can install it:
sudo apt install espeak-ng
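The Python usage shown above can be extended to several pieces of text. The sketch below assumes only the two calls that appear in this README, `Speaker(model_name)` and `speaker.text_to_speech(text, output_path)`. The helper name `synthesize_all` and the `line_001.wav` naming scheme are made up for illustration.

```python
from pathlib import Path


def synthesize_all(speaker, texts, out_dir="out"):
    """Write one .wav file per text by calling speaker.text_to_speech(text, path)."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    written = []
    for index, text in enumerate(texts, start=1):
        # Hypothetical naming scheme: line_001.wav, line_002.wav, ...
        target = out_path / f"line_{index:03d}.wav"
        speaker.text_to_speech(text, str(target))
        written.append(str(target))
    return written


# Usage with str2speech installed (not executed here):
#   from str2speech.speaker import Speaker
#   speaker = Speaker("Zyphra/Zonos-v0.1-transformer")
#   synthesize_all(speaker, ["Hello from Zonos", "Hello again"])
```

Passing the speaker in as an argument means the model is loaded once and reused for every line.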
- Supports multiple TTS models, including Sesame/CSM-1B, SparkAudio/Spark-TTS-0.5B, Kokoro, and various facebook/mms-tts models.
- Supports voice cloning with Spark-TTS and Zyphra Zonos.
- Allows selection of voice presets.
- Supports text input via command-line arguments or files.
- Outputs speech in .wav format.
- Works with both CPU and GPU.
The following models are supported:
- Sesame/CSM-1B
- MegaTTS3
- SparkAudio/Spark-TTS-0.5B
- Zyphra/Zonos-v0.1-transformer
- microsoft/speecht5_tts
- Kokoro (English, Hindi, and Spanish only)
- suno/bark-small (default TTS model)
- suno/bark
- facebook/mms-tts-eng (English only)
- facebook/mms-tts-deu (German only)
- facebook/mms-tts-fra (French only)
- facebook/mms-tts-spa (Spanish only)
- facebook/mms-tts-swe (Swedish only)
- nari-labs/dia-1.6b
To install str2speech, first make sure you have pip installed, then run:
pip install str2speech
Run the script via the command line:
str2speech --text "Hello, world!" --output hello.wav
- --text (-t): The text to convert to speech.
- --file (-f): A file containing text to convert to speech.
- --voice (-v): The voice preset to use (optional, defaults to a predefined voice).
- --output (-o): The output .wav file name (optional, defaults to output.wav).
- --model (-m): The TTS model to use (optional, defaults to suno/bark-small).
- --speed (-s): The speed of the speech (optional, defaults to 1.0). Supported only by Kokoro TTS currently.
- --clone (-c): The filename of a wav file that contains the voice to clone.
- --clone-voice-text (-p): The transcript of what's being said in the wav file provided.
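If you want to drive the command-line tool from a script, the options above map naturally onto an argument list. The following is a minimal sketch that uses only the flags documented here. `build_command` and `run` are hypothetical helper names, not part of str2speech.

```python
import subprocess


def build_command(text, output="output.wav", model=None, voice=None,
                  speed=None, clone=None, clone_text=None):
    """Build a str2speech argument list from the documented options."""
    cmd = ["str2speech", "--text", text, "--output", output]
    # Optional flags are added only when a value is given,
    # so the tool's own defaults apply otherwise.
    if model is not None:
        cmd += ["--model", model]
    if voice is not None:
        cmd += ["--voice", voice]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    if clone is not None:
        cmd += ["--clone", clone]
    if clone_text is not None:
        cmd += ["--clone-voice-text", clone_text]
    return cmd


def run(cmd):
    """Run the command; check=True raises CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)


print(build_command("Hello again", output="hello.wav", model="kokoro", speed=1.2))
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.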
Example:
str2speech --file input.txt --output speech.wav --model suno/bark
Example 2:
str2speech --text "This is my cloned voice" \
--model zyphra/zonos-v0.1-transformer \
--output clonetest.wav --clone "./lex.wav"
You can also use str2speech as a Python module:
from str2speech.speaker import Speaker
speaker = Speaker()
speaker.text_to_speech("Hello, this is a test.", "test.wav")
Dependencies:
- transformers==4.49.0
- torch==2.5.1+cu124
- numpy==1.26.4
- scipy==1.13.1
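To compare your environment with the pinned versions listed above (transformers, torch, numpy, scipy), something like the following sketch could be used. It relies only on the standard library's `importlib.metadata`. The `check_pins` helper and the shape of its report are illustrative assumptions.

```python
from importlib import metadata

# Pinned versions copied from the dependency list in this README.
PINNED = {
    "transformers": "4.49.0",
    "torch": "2.5.1+cu124",
    "numpy": "1.26.4",
    "scipy": "1.13.1",
}


def check_pins(pins=PINNED):
    """Map each package name to (wanted, installed, matches)."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_pins().items():
    print(f"{name}: wanted {wanted}, installed {installed}, match={ok}")
```

A mismatch does not necessarily mean str2speech will fail. It only shows where your environment differs from the versions listed here.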
This project is licensed under the GNU General Public License v3 (GPLv3).