Povey, Daniel

Assistant Research Professor
Electrical and Computer Engineering
www.danielpovey.com

Hackerman Hall 324
(410) 516-8038
dpovey1@jhu.edu

About

Education
  • Ph.D. 2003, Cambridge University
Experience
  • 2008 - 2012:  Researcher, Microsoft Research
  • 2006 - 2008:  Research Staff Member, Unspecified
  • 2003 - 2008:  Research Staff Member, Unspecified
  • 2002 - 2003:  Research Associate, Cambridge University
Research Areas
  • Speech recognition
Awards
  • 2014:  ISCA Best Paper published in Computer Speech and Language (2009-2013)

Publications

Journal Articles
  • Hadian H, Sameti H, Povey D, Khudanpur S (2018).  Flat-Start Single-Stage Discriminatively Trained HMM-Based Models for ASR.  IEEE/ACM Transactions on Audio Speech and Language Processing.  26(11).
  • Povey D, Hadian H, Ghahremani P, Li K, Khudanpur S (2018).  A time-restricted self-attention layer for ASR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2018-April.
  • Xu H, Chen T, Gao D, Wang Y, Li K, Goel N, Carmiel Y, Povey D, Khudanpur S (2018).  A Pruned RNNLM Lattice-Rescoring Algorithm for Automatic Speech Recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2018-April.
  • Manohar V, Hadian H, Povey D, Khudanpur S (2018).  Semi-Supervised Training of Acoustic Models Using Lattice-Free MMI.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2018-April.
  • Snyder D, Garcia-Romero D, Sell G, Povey D, Khudanpur S (2018).  X-Vectors: Robust DNN Embeddings for Speaker Recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2018-April.
  • Xu H, Li K, Wang Y, Wang J, Kang S, Chen X, Povey D, Khudanpur S (2018).  Neural Network Language Modeling with Letter-Based Features and Importance Sampling.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2018-April.
  • Peddinti V, Wang Y, Povey D, Khudanpur S (2018).  Low Latency Acoustic Modeling Using Temporal Convolution and LSTMs.  IEEE Signal Processing Letters.  25(3).
  • Manohar V, Povey D, Khudanpur S (2018).  JHU Kaldi system for Arabic MGB-3 ASR challenge using diarization, audio-transcript alignment and transfer learning.  2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings.  2018-January.
  • Ghahremani P, Manohar V, Hadian H, Povey D, Khudanpur S (2018).  Investigation of transfer learning for ASR using LF-MMI trained neural networks.  2017 IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2017 - Proceedings.  2018-January.
  • Cheng G, Povey D, Huang L, Xu J, Khudanpur S, Yan Y (2018).  Output-gate projected gated recurrent unit for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Povey D, Cheng G, Wang Y, Li K, Xu H, Yarmohammadi M, Khudanpur S (2018).  Semi-orthogonal low-rank matrix factorization for deep neural networks.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Hadian H, Sameti H, Povey D, Khudanpur S (2018).  End-to-end speech recognition using lattice-free MMI.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Li K, Xu H, Wang Y, Povey D, Khudanpur S (2018).  Recurrent neural network language model adaptation for conversational speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Chen Z, Luitjens J, Xu H, Wang Y, Povey D, Khudanpur S (2018).  A GPU-based WFST decoder with exact lattice generation.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Sarma M, Ghahremani P, Povey D, Goel NK, Sarma KK, Dehak N (2018).  Emotion identification from raw speech signals using DNNs.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Ghahremani P, Hadian H, Lv H, Povey D, Khudanpur S (2018).  Acoustic modeling from frequency-domain representations of speech.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Zhu Y, Ko T, Snyder D, Mak B, Povey D (2018).  Self-attentive speaker embeddings for text-independent speaker verification.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Sell G, Snyder D, McCree A, Garcia-Romero D, Villalba J, Maciejewski M, Manohar V, Dehak N, Povey D, Watanabe S, Khudanpur S (2018).  Diarization is hard: Some experiences and lessons learned for the JHU team in the inaugural DIHARD challenge.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Ghahremani P, Nidadavolu PS, Chen N, Villalba J, Povey D, Khudanpur S, Dehak N (2018).  End-to-end deep neural network age estimation.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.
  • Garcia-Romero D, Snyder D, Sell G, Povey D, McCree A (2017).  Speaker diarization using deep neural network embeddings.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Ko T, Peddinti V, Povey D, Seltzer ML, Khudanpur S (2017).  A study on data augmentation of reverberant speech for robust speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Snyder D, Ghahremani P, Povey D, Garcia-Romero D, Carmiel Y, Khudanpur S (2017).  Deep neural network-based speaker embeddings for end-to-end speaker verification.  2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings.
  • Trmal J, Wiesner M, Peddinti V, Zhang X, Ghahremani P, Wang Y, Manohar V, Xu H, Povey D, Khudanpur S (2017).  The Kaldi OpenKWS System: Improving low resource keyword search.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2017-August.
  • Wang Y, Peddinti V, Xu H, Zhang X, Povey D, Khudanpur S (2017).  Backstitch: Counteracting finite-sample bias via negative steps.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2017-August.
  • Hadian H, Povey D, Sameti H, Khudanpur S (2017).  Phone duration modeling for LVCSR using neural networks.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2017-August.
  • Snyder D, Garcia-Romero D, Povey D, Khudanpur S (2017).  Deep neural network embeddings for text-independent speaker verification.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2017-August.
  • Zhang X, Manohar V, Povey D, Khudanpur S (2017).  Acoustic data-driven lexicon learning based on a greedy pronunciation selection framework.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2017-August.
  • Cheng G, Peddinti V, Povey D, Manohar V, Khudanpur S, Yan Y (2017).  An exploration of dropout with LSTMs.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2017-August.
  • Chen G, Povey D, Khudanpur S (2016).  Acoustic data-driven pronunciation lexicon generation for logographic languages.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2016-May.
  • Peddinti V, Chen G, Manohar V, Ko T, Povey D, Khudanpur S (2016).  JHU ASpIRE system: Robust LVCSR with TDNNs, iVector adaptation and RNN-LMs.  2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings.
  • Snyder D, Garcia-Romero D, Povey D (2016).  Time delay deep neural network-based universal background models for speaker recognition.  2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings.
  • Ghahremani P, Manohar V, Povey D, Khudanpur S (2016).  Acoustic modelling from the signal domain using CNNs.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  08-12-September-2016.
  • Peddinti V, Manohar V, Wang Y, Povey D, Khudanpur S (2016).  Far-field ASR without parallel data.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  08-12-September-2016.
  • Povey D, Peddinti V, Galvez D, Ghahremani P, Manohar V, Na X, Wang Y, Khudanpur S (2016).  Purely sequence-trained neural networks for ASR based on lattice-free MMI.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  08-12-September-2016.
  • Xu H, Chen G, Povey D, Khudanpur S (2015).  Modeling phonetic context with non-random forests for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Peddinti V, Povey D, Khudanpur S (2015).  A time delay neural network architecture for efficient modeling of long temporal contexts.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Chen G, Xu H, Wu M, Povey D, Khudanpur S (2015).  Pronunciation and silence probability modeling for ASR.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Ko T, Peddinti V, Povey D, Khudanpur S (2015).  Audio augmentation for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Manohar V, Povey D, Khudanpur S (2015).  Semi-supervised maximum mutual information training of deep neural network acoustic models.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Peddinti V, Chen G, Povey D, Khudanpur S (2015).  Reverberation robust acoustic modeling using i-vectors with time delay neural networks.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Kumar G, Blackwood G, Trmal J, Povey D, Khudanpur S (2015).  A coarse-grained model for optimal coupling of ASR and SMT systems for Speech translation.  Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing.
  • Zhang X, Povey D, Khudanpur S (2015).  A diversity-penalizing ensemble training method for deep learning.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.
  • Panayotov V, Chen G, Povey D, Khudanpur S (2015).  Librispeech: An ASR corpus based on public domain audio books.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2015-August.
  • Trmal J, Chen G, Povey D, Khudanpur S, Ghahremani P, Zhang X, Manohar V, Liu C, Jansen A, Klakow D, Yarowsky D, Metze F (2014).  A keyword search system using open source software.  2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings.
  • Garcia-Romero D, Zhang X, McCree A, Povey D (2014).  Improving speaker recognition performance in the domain adaptation challenge using deep neural networks.  2014 IEEE Workshop on Spoken Language Technology, SLT 2014 - Proceedings.
  • Kumar G, Post M, Povey D, Khudanpur S (2014).  Some insights from translating conversational telephone speech.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Vu NT, Imseng D, Povey D, Motlicek P, Schultz T, Bourlard H (2014).  Multilingual deep neural network based acoustic modeling for rapid language adaptation.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Nolden D, Soltau H, Povey D, Ghahremani P, Mangu L, Ney H (2014).  Removing redundancy from lattices.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Zhang X, Trmal J, Povey D, Khudanpur S (2014).  Improving deep neural network acoustic models using generalized maxout networks.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Chiu J, Wang Y, Trmal J, Povey D, Chen G, Rudnicky A (2014).  Combination of FST and CN search in Spoken Term Detection.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Ghahremani P, Babaali B, Povey D, Riedhammer K, Trmal J, Khudanpur S (2014).  A pitch extraction algorithm tuned for automatic speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Chen G, Yilmaz O, Trmal J, Povey D, Khudanpur S (2013).  Using proxies for OOV keywords in the keyword search task.  2013 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2013 - Proceedings.
  • Motlicek P, Povey D, Karafiat M (2013).  Feature and score level combination of subspace Gaussians in LVCSR task.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Chen G, Khudanpur S, Povey D, Trmal J, Yarowsky D, Yilmaz O (2013).  Quantifying the value of pronunciation lexicons for keyword search in low-resource languages.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Hannemann M, Povey D, Zweig G (2013).  Combining forward and backward search in decoding.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Rath SP, Povey D, Veselý K, Cernocký JH (2013).  Improved feature processing for deep neural networks.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Veselý K, Ghoshal A, Burget L, Povey D (2013).  Sequence-discriminative training of deep neural networks.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Weng C, Juang BH, Povey D (2012).  Discriminative training using non-uniform criteria for keyword spotting on spontaneous speech.  13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012.  1.
  • Vu NT, Schultz T, Povey D (2012).  Modeling gender dependency in the Subspace GMM framework.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Riedhammer K, Bocklet T, Ghoshal A, Povey D (2012).  Revisiting semi-continuous hidden Markov models.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Povey D, Hannemann M, Boulianne G, Burget L, Ghoshal A, Janda M, Karafiát M, Kombrink S, Motlícek P, Qian Y, Riedhammer K, Veselý K, Vu NT (2012).  Generating exact lattices in the WFST framework.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Vinyals O, Ravuri SV, Povey D (2012).  Revisiting recurrent neural networks for robust ASR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Vinyals O, Povey D (2012).  Krylov subspace descent for deep learning.  Journal of Machine Learning Research.  22.
  • Povey D, Yao K (2012).  A basis representation of constrained MLLR transforms for robust adaptation.  Computer Speech and Language.  26(1).
  • Povey D, Zweig G, Acero A (2011).  Speaker adaptation with an exponential transform.  2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings.
  • Qian Y, Povey D, Liu J (2011).  State-level data borrowing for low-resource speech recognition based on Subspace GMMs.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Qian Y, Xu J, Povey D, Liu J (2011).  Strategies for using MLP based features with limited target-language training data.  2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings.
  • Mikolov T, Deoras A, Povey D, Burget L, Cernocký J (2011).  Strategies for training large scale neural network language models.  2011 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2011, Proceedings.
  • Xu H, Povey D, Mangu L, Zhu J (2011).  Minimum Bayes Risk decoding and system combination based on a recursion for edit distance.  Computer Speech and Language.  25(4).
  • Povey D, Karafiát M, Ghoshal A, Schwarz P (2011).  A symmetrization of the subspace Gaussian mixture model.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Povey D, Yao K (2011).  A basis method for robust estimation of constrained MLLR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Povey D, Burget L, Agarwal M, Akyazi P, Feng K, Ghoshal A, Glembek O, Goel N, Karafiát M, Rastrow A, Rose RC, Schwarz P, Thomas S (2011).  The subspace Gaussian mixture model - A structured model for speech recognition.  Computer Speech and Language.  25(2).
  • Saon G, Soltau H, Chaudhari U, Chu S, Kingsbury B, Kuo HK, Mangu L, Povey D (2010).  The IBM 2008 GALE Arabic speech transcription system.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Chu SM, Povey D, Kuo HK, Mangu L, Zhang S, Shi Q, Qin Y (2010).  The 2009 IBM GALE Mandarin broadcast transcription system.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Xu H, Povey D, Mangu L, Zhu J (2010).  An improved consensus-like method for minimum Bayes risk decoding and lattice combination.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Povey D, Burget L, Agarwal M, Akyazi P, Feng K, Ghoshal A, Glembek O, Goel NK, Karafiát M, Rastrow A, Rose RC, Schwarz P, Thomas S (2010).  Subspace Gaussian mixture models for speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Goel N, Thomas S, Agarwal M, Akyazi P, Burget L, Feng K, Ghoshal A, Glembek O, Karafiát M, Povey D, Rastrow A, Rose RC, Schwarz P (2010).  Approaches to automatic lexicon learning with limited training examples.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Chu SM, Povey D (2010).  Speaking rate adaptation using continuous frame rate normalization.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Burget L, Schwarz P, Agarwal M, Akyazi P, Feng K, Ghoshal A, Glembek O, Goel N, Karafiát M, Povey D, Rastrow A, Rose RC, Thomas S (2010).  Multilingual acoustic modeling for speech recognition based on subspace Gaussian mixture models.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Ghoshal A, Povey D, Agarwal M, Akyazi P, Burget L, Feng K, Glembek O, Goel N, Karafiát M, Rastrow A, Rose RC, Schwarz P, Thomas S (2010).  A novel estimation of feature-space MLLR for full-covariance models.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Xu H, Povey D, Zhu J, Wu G (2009).  Minimum hypothesis phone error as a decoding method for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Saon G, Povey D, Soltau H (2009).  Large margin semi-tied covariance transforms for discriminative training.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Soltau H, Saon G, Kingsbury B, Kuo HKJ, Mangu L, Povey D, Emami A (2009).  Advances in Arabic Speech Transcription at IBM Under the DARPA GALE Program.  IEEE Transactions on Audio, Speech and Language Processing.  17(5).
  • Povey D, Kuo HKJ (2008).  XMLLR for improved speaker adaptation in speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Povey D, Kingsbury B (2008).  Monte Carlo model-space noise adaptation for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Povey D, Kuo HKJ, Soltau H (2008).  Fast speaker adaptive training for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Saon G, Povey D (2008).  Penalty function maximization for large margin HMM training.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.
  • Povey D, Kanevsky D, Kingsbury B, Ramabhadran B, Saon G, Visweswariah K (2008).  Boosted MMI for model and feature-space discriminative training.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Povey D, Chu SM, Varadarajan B (2008).  Universal background model based speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Varadarajan B, Povey D, Chu SM (2008).  Quick FMLLR for speaker adaptation in speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.
  • Soltau H, Saon G, Kingsbury B, Kuo J, Mangu L, Povey D, Zweig G (2007).  The IBM 2006 GALE Arabic ASR system.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4.
  • Sarikaya R, Zhou B, Povey D, Afify M, Gao Y (2007).  The impact of ASR on speech-to-speech translation performance.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4.
  • Povey D, Kingsbury B (2007).  Evaluation of proposed modifications to MPE for large scale discriminative training.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4.
  • Pelecanos J, Povey D, Ramaswamy G (2006).  Secondary classification for GMM based speaker recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Huang J, Westphal M, Chen S, Siohan O, Povey D, Libal V, Soneiro A, Schulz H, Ross T, Potamianos G (2006).  The IBM rich transcription spring 2006 speech-to-text system for lecture meetings.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  4299 LNCS.
  • Choueiter G, Povey D, Chen SF, Zweig G (2006).  Morpheme-based language modeling for Arabic LVCSR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Zweig G, Siohan O, Saon G, Ramabhadran B, Povey D, Mangu L, Kingsbury B (2006).  Automated quality monitoring in the call center with ASR and maximum entropy.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Chen SF, Kingsbury B, Mangu L, Povey D, Saon G, Soltau H, Zweig G (2006).  Advances in speech transcription at IBM under the DARPA EARS program.  IEEE Transactions on Audio, Speech and Language Processing.  14(5).
  • Povey D, Saon G (2006).  Feature and model space speaker adaptation with full covariance Gaussians.  INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP.  3.
  • Povey D (2006).  SPAM and full covariance for speech recognition.  INTERSPEECH 2006 and 9th International Conference on Spoken Language Processing, INTERSPEECH 2006 - ICSLP.  3.
  • Saon G, Povey D, Zweig G (2005).  Anatomy of an extremely fast LVCSR decoder.  9th European Conference on Speech Communication and Technology.
  • Huang J, Povey D (2005).  Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition.  9th European Conference on Speech Communication and Technology.
  • Povey D (2005).  Improvements to fMPE for discriminative training of features.  9th European Conference on Speech Communication and Technology.
  • Hain T, Woodland PC, Evermann G, Gales MJF, Liu X, Moore GL, Povey D, Wang L (2005).  Automatic transcription of conversational telephone speech.  IEEE Transactions on Speech and Audio Processing.  13(6).
  • Soltau H, Kingsbury B, Mangu L, Povey D, Saon G, Zweig G (2005).  The IBM 2004 conversational telephony system for rich transcription.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  I.
  • Povey D, Kingsbury B, Mangu L, Saon G, Soltau H, Zweig G (2005).  FMPE: Discriminatively trained features for speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  I.
  • Povey D (2004).  Phone duration modeling for LVCSR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Saon G, Dharanipragada S, Povey D (2004).  Feature space gaussianization.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Povey D, Woodland PC, Gales MJF (2003).  Discriminative MAP for acoustic model adaptation.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Gales MJF, Dong Y, Povey D, Woodland PC (2003).  Porting: Switchboard to the VoiceMail task.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Povey D, Gales MJF, Kim DY, Woodland PC (2003).  MMI-MAP and MPE-MAP for acoustic model adaptation.  EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology.
  • Nopsuwanchai R, Povey D (2003).  Discriminative training for HMM-Based offline handwritten character recognition.  Proceedings of the International Conference on Document Analysis and Recognition, ICDAR.  2003-January.
  • Povey D, Woodland PC (2002).  Minimum phone error and I-smoothing for improved discriminative training.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Woodland PC, Povey D (2002).  Large scale discriminative training of hidden Markov models for speech recognition.  Computer Speech and Language.  16(1).
  • Povey D, Woodland PC (2001).  Improved discriminative training techniques for large vocabulary continuous speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Povey D, Woodland PC (1999).  Frame discrimination training of HMMs for large vocabulary speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.