Povey, Daniel

Asst Research Prof
Electrical And Computer Engineering
www.danielpovey.com

Hackerman Hall 324
(410) 516-8038
dpovey1@jhu.edu

About

Education
  • Ph.D. 2003, Cambridge University
Experience
  • 2008 - 2012:  Researcher, Microsoft Research
  • 2006 - 2008:  Research Staff Member, Unspecified
  • 2003 - 2008:  Research Staff Member, Unspecified
  • 2002 - 2003:  Research Associate, Cambridge University
Research Areas
  • Speech recognition
Awards
  • 2014:  ISCA Best Paper published in Computer Speech and Language (2009-2013)

Publications

Journal Articles
  • Hadian H, Sameti H, Povey D, Khudanpur S (2018).  Flat-Start Single-Stage Discriminatively Trained HMM-Based Models for ASR.  IEEE/ACM Transactions on Audio Speech and Language Processing.  26(11).
  • Goel N, Carmiel Y, Povey D, Khudanpur S (2018).  A Pruned RNNLM Lattice-Rescoring Algorithm for Automatic Speech Recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2018-April.
  • Khudanpur S (2018).  A time-restricted self-attention layer for ASR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2018-April.
  • Xu H, Li K, Wang Y, Wang J, Kang S, Chen X, Povey D, Khudanpur S (2018).  Neural Network Language Modeling with Letter-Based Features and Importance Sampling.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2018-April.
  • Snyder D, Garcia-Romero D, Sell G, Povey D, Khudanpur S (2018).  X-Vectors: Robust DNN Embeddings for Speaker Recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2018-April.
  • Manohar V, Hadian H, Povey D, Khudanpur S (2018).  Semi-Supervised Training of Acoustic Models Using Lattice-Free MMI.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2018-April.
  • Peddinti V, Wang Y, Povey D, Khudanpur S (2018).  Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs.  IEEE Signal Processing Letters.  25(3).
  • Manohar V, Povey D, Khudanpur S (2018).  JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and transfer learning.  2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings.  2018-January.
  • Ghahremani P, Manohar V, Hadian H, Povey D, Khudanpur S (2018).  Investigation of transfer learning for ASR using LF-MMI trained neural networks.  2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings.  2018-January.
  • Watanabe S, Khudanpur S (2018).  Diarization is hard: Some experiences and lessons learned for the JHU team in the inaugural DIHARD challenge.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Li K, Xu H, Wang Y, Povey D, Khudanpur S (2018).  Recurrent neural network language model adaptation for conversational speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Cheng G, Povey D, Huang L, Xu J, Khudanpur S, Yan Y (2018).  Output-gate projected gated recurrent unit for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Hadian H, Sameti H, Povey D, Khudanpur S (2018).  End-to-end speech recognition using lattice-free MMI.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Ghahremani P, Nidadavolu PS, Chen N, Villalba J, Povey D, Khudanpur S, Dehak N (2018).  End-to-end deep neural network age estimation.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Chen Z, Luitjens J, Xu H, Wang Y, Povey D, Khudanpur S (2018).  A GPU-based WFST decoder with exact lattice generation.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Sarma M, Ghahremani P, Povey D, Goel NK, Sarma KK, Dehak N (2018).  Emotion identification from raw speech signals using DNNs.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Ghahremani P, Hadian H, Lv H, Povey D, Khudanpur S (2018).  Acoustic modeling from frequency-domain representations of speech.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Zhu Y, Ko T, Snyder D, Mak B, Povey D (2018).  Self-attentive speaker embeddings for text-independent speaker verification.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Povey D, Cheng G, Wang Y, Li K, Xu H, Yarmohammadi M, Khudanpur S (2018).  Semi-orthogonal low-rank matrix factorization for deep neural networks.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Ko T, Peddinti V, Povey D, Seltzer ML, Khudanpur S (2017).  A study on data augmentation of reverberant speech for robust speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Garcia-Romero D, Snyder D, Sell G, Povey D, McCree A (2017).  Speaker diarization using deep neural network embeddings.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Snyder D, Ghahremani P, Povey D, Garcia-Romero D, Carmiel Y, Khudanpur S (2017).  Deep neural network-based speaker embeddings for end-to-end speaker verification.  2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings.
  • Cheng G, Peddinti V, Povey D, Manohar V, Khudanpur S, Yan Y (2017).  An exploration of dropout with LSTMs.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2017-August.
  • Zhang X, Manohar V, Povey D, Khudanpur S (2017).  Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2017-August.
  • Snyder D, Garcia-Romero D, Povey D, Khudanpur S (2017).  Deep neural network embeddings for text-independent speaker verification.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2017-August.
  • Hadian H, Povey D, Sameti H, Khudanpur S (2017).  Phone duration modeling for LVCSR using neural networks.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2017-August.
  • Wang Y, Peddinti V, Xu H, Zhang X, Povey D, Khudanpur S (2017).  Backstitch: Counteracting finite-sample bias via negative steps.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2017-August.
  • Trmal J, Wiesner M, Peddinti V, Zhang X, Ghahremani P, Wang Y, Manohar V, Xu H, Povey D, Khudanpur S (2017).  The Kaldi OpenKWS System: Improving low resource keyword search.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2017-August.
  • Chen G, Povey D, Khudanpur S (2016).  Acoustic data-driven pronunciation lexicon generation for logographic languages.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2016-May.
  • Peddinti V, Chen G, Manohar V, Ko T, Povey D, Khudanpur S (2016).  JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs.  2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings.
  • Snyder D, Garcia-Romero D, Povey D (2016).  Time delay deep neural network-based universal background models for speaker recognition.  2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings.
  • Peddinti V, Manohar V, Wang Y, Povey D, Khudanpur S (2016).  Far-field ASR without parallel data.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  08-12-September-2016.
  • Ghahremani P, Manohar V, Povey D, Khudanpur S (2016).  Acoustic modelling from the signal domain using CNNs.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  08-12-September-2016.
  • Povey D, Peddinti V, Galvez D, Ghahremani P, Manohar V, Na X, Wang Y, Khudanpur S (2016).  Purely sequence-trained neural networks for ASR based on lattice-free MMI.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  08-12-September-2016.
  • Ko T, Peddinti V, Povey D, Khudanpur S (2015).  Audio augmentation for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Peddinti V, Povey D, Khudanpur S (2015).  A time delay neural network architecture for efficient modeling of long temporal contexts.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Kumar G, Blackwood G, Trmal J, Povey D, Khudanpur S (2015).  A coarse-grained model for optimal coupling of ASR and SMT systems for Speech translation.  Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing.
  • Zhang X, Povey D, Khudanpur S (2015).  A diversity-penalizing ensemble training method for deep learning.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Peddinti V, Chen G, Povey D, Khudanpur S (2015).  Reverberation robust acoustic modeling using i-vectors with time delay neural networks.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Manohar V, Povey D, Khudanpur S (2015).  Semi-supervised maximum mutual information training of deep neural network acoustic models.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Xu H, Chen G, Povey D, Khudanpur S (2015).  Modeling phonetic context with non-random forests for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Panayotov V, Chen G, Povey D, Khudanpur S (2015).  Librispeech: An ASR corpus based on public domain audio books.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2015-August.
  • Chen G, Xu H, Wu M, Povey D, Khudanpur S (2015).  Pronunciation and silence probability modeling for ASR.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Garcia-Romero D, Zhang X, McCree A, Povey D (2014).  Improving speaker recognition performance in the domain adaptation challenge using deep neural networks.  2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings.
  • Zhang X, Manohar V, Liu C, Jansen A, Klakow D, Yarowsky D, Metze F (2014).  A keyword search system using open source software.  2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings.
  • Kumar G, Post M, Povey D, Khudanpur S (2014).  Some insights from translating conversational telephone speech.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Nolden D, Soltau H, Povey D, Ghahremani P, Mangu L, Ney H (2014).  Removing redundancy from lattices.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Chiu J, Wang Y, Trmal J, Povey D, Chen G, Rudnicky A (2014).  Combination of FST and CN search in Spoken Term Detection.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Vu NT, Imseng D, Povey D, Motlicek P, Schultz T, Bourlard H (2014).  Multilingual deep neural network based acoustic modeling for rapid language adaptation.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Zhang X, Trmal J, Povey D, Khudanpur S (2014).  Improving deep neural network acoustic models using generalized maxout networks.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Ghahremani P, Babaali B, Povey D, Riedhammer K, Trmal J, Khudanpur S (2014).  A pitch extraction algorithm tuned for automatic speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Chen G, Yilmaz O, Trmal J, Povey D, Khudanpur S (2013).  Using proxies for OOV keywords in the keyword search task.  2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings.
  • Motlicek P, Povey D, Karafiat M (2013).  Feature and score level combination of subspace Gaussians in LVCSR task.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Chen G, Khudanpur S, Povey D, Trmal J, Yarowsky D, Yilmaz O (2013).  Quantifying the value of pronunciation lexicons for keyword search in low-resource languages.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Hannemann M, Povey D, Zweig G (2013).  Combining forward and backward search in decoding.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Rath SP, Povey D, Veselý K, Cernocký JH (2013).  Improved feature processing for deep neural networks.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Veselý K, Ghoshal A, Burget L, Povey D (2013).  Sequence-discriminative training of deep neural networks.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Weng C, Juang BH, Povey D (2012).  Discriminative training using non-uniform criteria for keyword spotting on spontaneous speech.  13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012.  1.
  • Vinyals O, Ravuri SV, Povey D (2012).  Revisiting recurrent neural networks for robust ASR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Riedhammer K, Bocklet T, Ghoshal A, Povey D (2012).  Revisiting semi-continuous hidden Markov models.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Vu NT, Schultz T, Povey D (2012).  Modeling gender dependency in the Subspace GMM framework.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Vinyals O, Povey D (2012).  Krylov subspace descent for deep learning.  Journal of Machine Learning Research.  22.
  • Povey D, Yao K (2012).  A basis representation of constrained MLLR transforms for robust adaptation.  Computer Speech and Language.  26(1).
  • Mikolov T, Deoras A, Povey D, Burget L, Cernocký J (2011).  Strategies for training large scale neural network language models.  2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings.
  • Povey D, Zweig G, Acero A (2011).  Speaker adaptation with an exponential transform.  2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings.
  • Qian Y, Povey D, Liu J (2011).  State-level data borrowing for low-resource speech recognition based on Subspace GMMs.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Qian Y, Xu J, Povey D, Liu J (2011).  Strategies for using MLP based features with limited target-language training data.  2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings.
  • Xu H, Povey D, Mangu L, Zhu J (2011).  Minimum Bayes Risk decoding and system combination based on a recursion for edit distance.  Computer Speech and Language.  25(4).
  • Povey D, Karafiát M, Ghoshal A, Schwarz P (2011).  A symmetrization of the subspace Gaussian mixture model.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Povey D, Yao K (2011).  A basis method for robust estimation of constrained MLLR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Povey D, Burget L, Agarwal M, Akyazi P, Kai F, Ghoshal A, Glembek O, Goel N, Karafiát M, Rastrow A, Rose RC, Schwarz P, Thomas S (2011).  The subspace Gaussian mixture model - A structured model for speech recognition.  Computer Speech and Language.  25(2).
  • Burget L, Schwarz P, Agarwal M, Akyazi P, Feng K, Ghoshal A, Glembek O, Goel N, Karafiát M, Povey D, Rastrow A, Rose RC, Thomas S (2010).  Multilingual acoustic modeling for speech recognition based on subspace Gaussian mixture models.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Ghoshal A, Povey D, Agarwal M, Akyazi P, Burget L, Feng K, Glembek O, Goel N, Karafiát M, Rastrow A, Rose RC, Schwarz P, Thomas S (2010).  A novel estimation of feature-space MLLR for full-covariance models.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Saon G, Soltau H, Chaudhari U, Chu S, Kingsbury B, Kuo HK, Mangu L, Povey D (2010).  The IBM 2008 GALE Arabic speech transcription system.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Povey D, Burget L, Agarwal M, Akyazi P, Feng K, Ghoshal A, Glembek O, Goel NK, Karafiát M, Rastrow A, Rose RC, Schwarz P, Thomas S (2010).  Subspace Gaussian mixture models for speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Goel N, Thomas S, Agarwal M, Akyazi P, Burget L, Feng K, Ghoshal A, Glembek O, Karafiát M, Povey D, Rastrow A, Rose RC, Schwarz P (2010).  Approaches to automatic lexicon learning with limited training examples.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Xu H, Povey D, Mangu L, Zhu J (2010).  An improved consensus-like method for minimum Bayes risk decoding and lattice combination.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Chu SM, Povey D (2010).  Speaking rate adaptation using continuous frame rate normalization.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Chu SM, Povey D, Kuo HK, Mangu L, Zhang S, Shi Q, Qin Y (2010).  The 2009 IBM GALE Mandarin broadcast transcription system.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Xu H, Povey D, Zhu J, Wu G (2009).  Minimum hypothesis phone error as a decoding method for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Saon G, Povey D, Soltau H (2009).  Large margin semi-tied covariance transforms for discriminative training.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Soltau H, Saon G, Kingsbury B, Kuo HKJ, Mangu L, Povey D, Emami A (2009).  Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program.  IEEE Transactions on Audio, Speech and Language Processing.  17(5).
  • Povey D, Kuo HKJ, Soltau H (2008).  Fast speaker adaptive training for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Saon G, Povey D (2008).  Penalty function maximization for large margin HMM training.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Povey D, Kingsbury B (2008).  Monte Carlo model-space noise adaptation for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Povey D, Kuo HKJ (2008).  XMLLR for improved speaker adaptation in speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Varadarajan B, Povey D, Chu SM (2008).  Quick FMLLR for speaker adaptation in speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Povey D, Kanevsky D, Kingsbury B, Ramabhadran B, Saon G, Visweswariah K (2008).  Boosted MMI for model and feature-space discriminative training.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Povey D, Chu SM, Varadarajan B (2008).  Universal background model based speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Sarikaya R, Zhou B, Povey D, Afify M, Gao Y (2007).  The impact of ASR on speech-to-speech translation performance.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4.
  • Povey D, Kingsbury B (2007).  Evaluation of proposed modifications to MPE for large scale discriminative training.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4.
  • Soltau H, Saon G, Kingsbury B, Kuo J, Mangu L, Povey D, Zweig G (2007).  The IBM 2006 GALE Arabic ASR system.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4.
  • Choueiter G, Povey D, Chen SF, Zweig G (2006).  Morpheme-based language modeling for Arabic LVCSR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Pelecanos J, Povey D, Ramaswamy G (2006).  Secondary classification for GMM based speaker recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Zweig G, Siohan O, Saon G, Ramabhadran B, Povey D, Mangu L, Kingsbury B (2006).  Automated quality monitoring in the call center with ASR and maximum entropy.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Huang J, Westphal M, Chen S, Siohan O, Povey D, Libal V, Soneiro A, Schulz H, Ross T, Potamianos G (2006).  The IBM rich transcription spring 2006 speech-to-text system for lecture meetings.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  4299 LNCS.
  • Chen SF, Kingsbury B, Mangu L, Povey D, Saon G, Soltau H, Zweig G (2006).  Advances in speech transcription at IBM under the DARPA EARS program.  IEEE Transactions on Audio, Speech and Language Processing.  14(5).
  • Povey D (2006).  SPAM and full covariance for speech recognition.  INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP.  3.
  • Povey D, Saon G (2006).  Feature and model space speaker adaptation with full covariance Gaussians.  INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP.  3.
  • Povey D (2005).  Improvements to fMPE for discriminative training of features.  9th European Conference on Speech Communication and Technology.
  • Huang J, Povey D (2005).  Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition.  9th European Conference on Speech Communication and Technology.
  • Gales MJF, Liu X, Moore GL, Povey D, Wang L (2005).  Automatic transcription of conversational telephone speech.  IEEE Transactions on Speech and Audio Processing.  13(6).
  • Soltau H, Kingsbury B, Mangu L, Povey D, Saon G, Zweig G (2005).  The IBM 2004 conversational telephony system for rich transcription.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  I.
  • Povey D, Kingsbury B, Mangu L, Saon G, Soltau H, Zweig G (2005).  FMPE: Discriminatively trained features for speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  I.
  • Saon G, Dharanipragada S, Povey D (2004).  Feature space gaussianization.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Povey D (2004).  Phone duration modeling for LVCSR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Povey D, Woodland PC, Gales MJF (2003).  Discriminative MAP for acoustic model adaptation.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Gales MJF, Dong Y, Povey D, Woodland PC (2003).  Porting: Switchboard to the VoiceMail task.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Nopsuwanchai R, Povey D (2003).  Discriminative training for HMM-Based offline handwritten character recognition.  Proceedings of the International Conference on Document Analysis and Recognition, ICDAR.  2003-January.
  • Povey D, Gales MJF, Kim DY, Woodland PC (2003).  MMI-MAP and MPE-MAP for acoustic model adaptation.  EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology.
  • Povey D, Woodland PC (2002).  Minimum phone error and I-smoothing for improved discriminative training.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Woodland PC, Povey D (2002).  Large scale discriminative training of hidden Markov models for speech recognition.  Computer Speech and Language.  16(1).
  • Povey D, Woodland PC (2001).  Improved discriminative training techniques for large vocabulary continuous speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Povey D, Woodland PC (1999).  Frame discrimination training of HMMs for large vocabulary speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.