Learning the Beauty in Songs: Neural Singing Voice Beautifier

This repository is the official PyTorch implementation of our ACL-2022 paper.

0. Dataset (PopBuTFy) Acquirement

Audio samples

You can download the dataset from here. To register, please send us an email (see apply_form).
Dataset preview.

Text labels

NeuralSVB does not take text as input, but the ASR model used to extract PPGs (phonetic posteriorgrams) needs text. We therefore also provide the text labels of PopBuTFy.

1. Preparation

Environment Preparation

Most of the required packages are listed in https://github.com/NATSpeech/NATSpeech/blob/main/requirements.txt

Alternatively, set up the environment with the Requirements.txt file in the repository root:

pip install -r Requirements.txt
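After installation, it can help to confirm that nothing listed in the requirements file is still missing. The following is a minimal sketch, not part of this repository: the helper name `missing_requirements` is hypothetical, and it assumes the file uses plain `name==version` style lines (URL or VCS requirements would be reported as missing).

```python
import re
from importlib import metadata
from pathlib import Path


def missing_requirements(requirements_path):
    """Return requirement names from the file that are not installed.

    Hypothetical helper for illustration; it only checks presence,
    not whether the installed version satisfies the pin.
    """
    missing = []
    for raw in Path(requirements_path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r / -e
        # Keep only the distribution name, dropping extras and version pins.
        name = re.split(r"[\[<>=!~; ]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Calling `missing_requirements("Requirements.txt")` from the repository root returns an empty list when every listed package is installed.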

Data Preparation

Extract embeddings of vocal timbre:

CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config egs/datasets/audio/PopBuTFy/save_emb.yaml

Pack the dataset:

CUDA_VISIBLE_DEVICES=0 python data_gen/tts/bin/binarize.py --config egs/datasets/audio/PopBuTFy/para_bin.yaml
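The two commands above can also be driven from a small script. This is a sketch under the assumption that the steps are meant to run in the listed order (embeddings first, then packing); the function names are hypothetical, while the script path and config paths are copied from the commands above.

```python
import os
import subprocess
import sys

BINARIZE_SCRIPT = "data_gen/tts/bin/binarize.py"
CONFIGS = [
    "egs/datasets/audio/PopBuTFy/save_emb.yaml",  # extract timbre embeddings
    "egs/datasets/audio/PopBuTFy/para_bin.yaml",  # pack the dataset
]


def binarize_commands(python=sys.executable):
    """Build one argv list per data-preparation step, in order."""
    return [[python, BINARIZE_SCRIPT, "--config", cfg] for cfg in CONFIGS]


def run_data_preparation(gpu="0", dry_run=True):
    """Print each step and, unless dry_run is set, execute it.

    check=True stops at the first failing step, so the dataset is
    not packed if embedding extraction fails.
    """
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    for cmd in binarize_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, env=env, check=True)
```

Run `run_data_preparation(dry_run=False)` from the repository root to execute both steps on GPU 0.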

Vocoder Preparation

We provide a pre-trained HifiGAN-Singing model, which is designed specifically for SVS and uses the NSF mechanism.

Please unzip the pre-trained vocoder into checkpoints before training your acoustic model.

This singing vocoder is trained on more than 100 hours of singing data (including Chinese and English songs).

PPG Extractor Preparation

We provide a pre-trained PPG Extractor model.

Please unzip the pre-trained PPG extractor into checkpoints before training your acoustic model.

After the instructions above, the directory structure should be as follows:

.
|--data
    |--processed
        |--PopBuTFy (unzip PopBuTFy.zip)
            |--data
                |--directories containing wavs
    |--binary
        |--PopBuTFyENSpkEM
|--checkpoints
    |--1009_pretrain_asr_english
        |--
        |--config.yaml
    |--1012_hifigan_all_songs_nsf
        |--
        |--config.yaml
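Before training, the layout above can be checked programmatically. The sketch below is illustrative only: `missing_paths` is a hypothetical helper, and it checks just the directories and `config.yaml` files named in the tree, since the remaining checkpoint file names are not listed there.

```python
from pathlib import Path

# Directories and files taken from the tree above.
EXPECTED_DIRS = [
    "data/processed/PopBuTFy/data",
    "data/binary/PopBuTFyENSpkEM",
    "checkpoints/1009_pretrain_asr_english",
    "checkpoints/1012_hifigan_all_songs_nsf",
]
EXPECTED_FILES = [
    "checkpoints/1009_pretrain_asr_english/config.yaml",
    "checkpoints/1012_hifigan_all_songs_nsf/config.yaml",
]


def missing_paths(root="."):
    """Return the expected directories and files that are absent under root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing
```

An empty result from `missing_paths()` in the repository root means every path named in the tree is in place.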

2. Training Example

CUDA_VISIBLE_DEVICES=0,1 python tasks/run.py --config egs/datasets/audio/PopBuTFy/vae_global_mle_eng.yaml --exp_name exp_name --reset

3. Inference

Inference from packed test set

CUDA_VISIBLE_DEVICES=0,1 python tasks/run.py --config egs/datasets/audio/PopBuTFy/vae_global_mle_eng.yaml --exp_name exp_name --reset --infer

Inference results will be saved in ./checkpoints/EXP_NAME/generated_ by default.
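To locate the output folders after inference, one option is to search the experiment directory for entries whose names start with `generated_`. This is a minimal sketch: `find_generated_dirs` is a hypothetical helper, and it makes no assumption about what follows the `generated_` prefix.

```python
from pathlib import Path


def find_generated_dirs(exp_name, checkpoints_dir="checkpoints"):
    """Return sorted directories named generated_* under checkpoints/<exp_name>."""
    exp_dir = Path(checkpoints_dir) / exp_name
    if not exp_dir.is_dir():
        return []  # experiment has not been run yet
    # Keep directories only; stray files with the same prefix are ignored.
    return sorted(p for p in exp_dir.glob("generated_*") if p.is_dir())
```

For example, `find_generated_dirs("exp_name")` lists the result folders for the experiment used in the commands above.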

We provide:

the pre-trained model of NSVB (English version).

Remember to put the pre-trained models in the checkpoints directory.

Inference from raw inputs

WIP.

Limitations

See Appendix D "Limitations and Solutions" in our paper.

Citation

If this repository helps your research, please cite:

@inproceedings{liu-etal-2022-learning-beauty,
title = "Learning the Beauty in Songs: Neural Singing Voice Beautifier",
author = "Liu, Jinglin  and
  Li, Chengxi  and
  Ren, Yi  and
  Zhu, Zhiying  and
  Zhao, Zhou",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
month = may,
year = "2022",
address = "Dublin, Ireland",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.acl-long.549",
pages = "7970--7983",}

Issues

Before raising an issue, please check our README and the existing issues for possible solutions.
We will try to handle your problem promptly, but we cannot guarantee a satisfying solution.
Please be friendly.

Acknowledgements

r9y9's wavenet_vocoder
Po-Hsun-Su's ssim
descriptinc's melgan
Official espnet
Official PyTorch Lightning

The framework of this repository is based on DiffSinger, and is a predecessor of NATSpeech.
