Self-Supervised Speech Pre-training and Representation Learning Toolkit
[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling
[ACL 2024] Official PyTorch code for extracting features and training downstream models with emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
A Survey of Spoken Dialogue Models (60 pages)
A single-layer, streaming codec model providing SOTA audio quality and discrete tokens designed for superior downstream modelability.
Training, inference, and testing of the SAC speech codec model.
This is the code for the paper "XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs". Demos, technical insights, and experimental results are presented on the project page.
LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT
Ultra-low bitrate speech codec (0.27-1 kbps) with cross-modal alignment and real-time capabilities
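The codec entries above share a common interface: compress a waveform into a short stream of discrete token ids that an audio language model can consume. The sketch below illustrates that interface using EnCodec through Hugging Face transformers as a generic stand-in; it is not the SAC, XY-Tokenizer, or other codecs listed here, and the checkpoint name and audio path are illustrative.

```python
# Rough sketch: waveform -> discrete codec tokens for audio language modeling.
# EnCodec via Hugging Face transformers is a stand-in model; the checkpoint
# name and the audio path below are illustrative, not taken from the repos above.
import torch
import torchaudio
from transformers import EncodecModel, AutoProcessor

model = EncodecModel.from_pretrained("facebook/encodec_24khz").eval()
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

waveform, sample_rate = torchaudio.load("utterance.wav")  # illustrative path
waveform = waveform.mean(dim=0)  # downmix to mono; encodec_24khz expects one channel
waveform = torchaudio.functional.resample(waveform, sample_rate, processor.sampling_rate)

inputs = processor(
    raw_audio=waveform.numpy(),
    sampling_rate=processor.sampling_rate,
    return_tensors="pt",
)

with torch.inference_mode():
    encoded = model.encode(inputs["input_values"], inputs["padding_mask"])

# encoded.audio_codes holds the discrete token ids, one stream per codebook.
print(encoded.audio_codes.shape)
```

The number of codebooks and the frame rate set the token rate; the entries above advertise rates like 40/75 tokens per second precisely to keep these sequences short for language modeling.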
Survey of audio language models
Official implementation of Mockingjay in PyTorch
A mini, simple, and fast end-to-end automatic speech recognition toolkit.
DUSTED: Spoken-Term Discovery using Discrete Speech Units
Causal Speech Enhancement Based on a Two-Branch Nested U-Net Architecture Using Self-Supervised Speech Embeddings
Semi-supervised spoken language understanding (SLU) via self-supervised speech and language model pretraining
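Most of the remaining entries (the pre-training toolkit, emotion2vec, LightHuBERT, Mockingjay, the enhancement and SLU systems) start from the same step: run a pre-trained self-supervised model over a waveform and use its hidden states as features for a downstream task. A minimal sketch of that step, assuming torchaudio's bundled wav2vec 2.0 as a generic stand-in rather than any of the checkpoints above:

```python
# Minimal sketch: extract self-supervised speech representations as features
# for a downstream model (emotion, SLU, enhancement, ...). torchaudio's
# bundled wav2vec 2.0 is a generic stand-in; the file path is illustrative
# and the input is assumed to be a mono recording.
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

waveform, sample_rate = torchaudio.load("utterance.wav")  # illustrative path
if sample_rate != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

with torch.inference_mode():
    # List of per-layer features, each of shape (batch, frames, hidden_dim).
    features, _ = model.extract_features(waveform)

# A common recipe: pool a late layer over time for one utterance-level vector.
utterance_embedding = features[-1].mean(dim=1)
print(utterance_embedding.shape)  # torch.Size([1, 768]) for the base model
```

Which layer to use and how to pool it (mean, weighted sum over layers, attention) is task-dependent; toolkits of this kind typically expose that choice as configuration.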