Thesis Proposal: Nanxin Chen
Note: This is a virtual presentation. Here is the link where the presentation will take place.
Title: Towards End-to-end Non-autoregressive speech applications
Abstract: Sequence labeling is a fascinating and challenging topic in the speech research community. Sequence-to-sequence models have been proposed for various sequence labeling tasks and are a particularly popular class of end-to-end models. Autoregressive models are the dominant approach: they predict the labels one by one, conditioning each prediction on the previous results. This makes training easier and more stable. However, this simplicity also makes inference inefficient, particularly for long output sequences. To speed up inference, researchers have become interested in another type of sequence-to-sequence model, known as non-autoregressive models. In contrast to autoregressive models, non-autoregressive models predict the whole sequence within a constant number of iterations.
In this proposal, two types of non-autoregressive models for speech applications are proposed: a mask-based approach and a noise-based approach. To demonstrate the effectiveness of the two proposed methods, we explore their use in two important tasks: speech recognition and speech synthesis. Experiments show that the proposed methods can match the performance of state-of-the-art autoregressive models with a much shorter inference time.
- Najim Dehak, Department of Electrical and Computer Engineering
- Sanjeev Khudanpur, Department of Electrical and Computer Engineering
- Hynek Hermansky, Department of Electrical and Computer Engineering
- Jesús Villalba, Department of Electrical and Computer Engineering