Faculty

Hynek Hermansky

Julian S. Smith Professor

Primary Appointment: Electrical and Computer Engineering

Research Interests

  • Detect, identify, classify and transmit information in speech

Hynek Hermansky is the Julian S. Smith Professor of Electrical Engineering and the director of Johns Hopkins’ Center for Language and Speech Processing (CLSP). He has been at the forefront of research in human hearing and speech technology for more than three decades, both in industrial research labs and in academia.

Hermansky’s research aims at detecting, identifying, classifying, transmitting, and reconstructing information contained in sensory signals, and has far-reaching implications for machine-based artificial intelligence. His main focus is on using bio-inspired methods to recognize information in speech-related signals.

Hermansky leads an internationally acclaimed group of Johns Hopkins faculty, students, and visiting researchers at the CLSP, one of the largest and most prestigious speech- and language-oriented academic groups in the world. In addition to his leadership of the CLSP, Hermansky is affiliated with Hopkins’ Human Language Technology Center of Excellence and holds the position of research professor on leave at Brno University of Technology in the Czech Republic. His past positions include director of research at IDIAP Research Institute, Martigny, Switzerland (2003-2008), titular professor at the Swiss Federal Institute of Technology in Lausanne, Switzerland (2005-2008), professor at the Oregon Health and Science University (previously Oregon Graduate Institute), senior member of the research staff at U.S. WEST Advanced Technologies in Boulder, Colorado, and research engineer at Panasonic Technologies in Santa Barbara, California.

His achievements include more than 250 peer-reviewed papers with more than 16,000 citations, and 13 patents, with another eight applications pending, on topics such as a method for identifying keywords in machine recognition of speech based on the detection and classification of sparse speech sound events; a system for computing speech recognition for cell phones; and an auditory model to detect speech corrupted by background noise. Hermansky’s scientific contributions were recognized by the International Speech Communication Association (ISCA), which in 2013 awarded him its highest honor, the Medal for Scientific Achievement.

Hermansky’s service to the field is extensive and noteworthy. He is a Life Fellow of the Institute of Electrical and Electronics Engineers (IEEE), a Fellow of the ISCA, and an External Fellow of the International Computer Science Institute. Highly sought after by industry for his expertise, he is a current member of the advisory board for Germany’s Hearing4All Scientific Consortium Center of Excellence in Hearing Research, and he has served on advisory boards for Amazon, Audience, Inc., and VoiceBox Inc. His professional memberships include IEEE and ISCA, where he was twice elected as a board member. He is a member of the editorial board of Speech Communication, and he was previously an associate editor for IEEE Transactions on Speech and Audio Processing and a member of the editorial board of Phonetica.

Hermansky serves in leadership roles for the field’s key workshops and conferences, presents invited lectures and keynote presentations around the globe, and has lectured worldwide as a Distinguished Lecturer for ISCA and for IEEE. Hermansky is the General Chair of INTERSPEECH 2021 in Brno, Czech Republic, was General Chair of the 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU), and chaired the technical committee for ICASSP 2000. In addition to leading several CLSP workshops at Hopkins, he served on the organizing committees for ASRU 2017, ASRU 2013, and ASRU 2005, was for ten years the executive chair of the annual ISCA-sponsored workshops on Text, Speech and Dialogue in the Czech Republic, and was a tutorial speaker at Interspeech 2015.

He received an M.S. in Electrical Engineering (1972) from Technical University Brno, Czech Republic, and a Ph.D. in Electrical Engineering (1983) from the University of Tokyo, Japan.

Awards and Honors

2013 International Speech Communication Association Medal for Scientific Achievement

Secondary Appointment: Director of Center for Language and Speech Processing

Journal Articles
  • Castro Martinez AM, Gerlach L, Payá-Vayá G, Hermansky H, Ooster J, Meyer BT (2019).  DNN-based performance measures for predicting error rates in automatic speech recognition and optimizing hearing aid parameters.  Speech Communication.  106.  44-56.
  • Hermansky H (2019).  Coding and decoding of messages in human speech communication: Implications for machine recognition of speech.  Speech Communication.  106.  112-117.
  • Wang X, Li R, Hermansky H (2018).  Stream attention for distributed multi-microphone speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2018-September.  3033-3037.
  • Meyer BT, Mallidi SH, Kayser H, Hermansky H (2017).  Predicting error rates for unknown data in automatic speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  5330-5334.
  • Ogawa T, Mallidi SH, Dupoux E, Cohen J, Feldman NH, Hermansky H (2017).  A new efficient measure for accuracy prediction and its application to multistream-based unsupervised adaptation.  Proceedings - International Conference on Pattern Recognition.  2222-2227.
  • Meyer BT, Mallidi SH, Castro Martínez AM, Paya-Vaya G, Kayser H, Hermansky H (2017).  Performance monitoring for automatic speech recognition in noisy multi-channel environments.  2016 IEEE Workshop on Spoken Language Technology, SLT 2016 - Proceedings.  50-56.
  • Mallidi SH, Hermansky H (2016).  Novel neural network based fusion for multistream ASR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2016-May.  5680-5684.
  • Hsiao R, Ma J, Hartmann W, Karafiát M, Grézl F, Burget L, Szöke I, Cernocky JH, Watanabe S, Chen Z, Mallidi SH, Hermansky H, Tsakalidis S, Schwartz R (2016).  Robust speech recognition in unknown reverberant and noisy conditions.  2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings.  533-538.
  • Mallidi SH, Ogawa T, Hermansky H (2016).  Uncertainty estimation of DNN classifiers.  2015 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2015 - Proceedings.  283-288.
  • Spille C, Kayser H, Hermansky H, Meyer BT (2016).  Assessing speech quality in speech-aware hearing aids based on phoneme posteriorgrams.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  08-12-September-2016.  1755-1759.
  • Mallidi SH, Hermansky H (2016).  A framework for practical multistream ASR.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  08-12-September-2016.  3474-3478.
  • Mallidi SH, Ogawa T, Vesely K, Nidadavolu PS, Hermansky H (2015).  Autoencoder based multi-stream combination for noise robust speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2015-January.  3551-3555.
  • Hermansky H, Burget L, Cohen J, Dupoux E, Feldman N, Godfrey J, Khudanpur S, Maciejewski M, Mallidi SH, Menon A, Ogawa T, Peddinti V, Rose R, Stern R, Wiesner M, Veselý K (2015).  Towards machines that know when they do not know: Summary of work done at 2014 Frederick Jelinek Memorial Workshop.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2015-August.  5009-5013.
  • Ganapathy S, Mallidi SH, Hermansky H (2014).  Robust feature extraction using modulation filtering of autoregressive models.  IEEE Transactions on Audio, Speech and Language Processing.  22(8).  1285-1295.
  • Mahajan N, Mesgarani N, Hermansky H (2014).  Principal components of auditory spectro-temporal receptive fields.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  1983-1987.
  • Kintzley K, Jansen A, Hermansky H (2014).  Featherweight phonetic keyword search for conversational speech.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  7859-7863.
  • Li F, Nidadavolu PS, Hermansky H (2014).  A long, deep and wide artificial neural net for robust speech recognition in unknown noise.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  358-362.
  • Schatz T, Peddinti V, Cao XN, Bach F, Hermansky H, Dupoux E (2014).  Evaluating speech features with the Minimal-Pair ABX task (II): Resistance to noise.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  915-919.
  • Hermansky H, Variani E, Peddinti V (2013).  Mean temporal distance: Predicting ASR error from temporal properties of speech signal.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  7423-7426.
  • Peddinti V, Hermansky H (2013).  Filter-bank optimization for Frequency Domain Linear Prediction.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  7102-7106.
  • Clark P, Mallidi SH, Jansen A, Hermansky H (2013).  Frequency offset correction in speech without detecting pitch.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  7020-7024.
  • Plchot O, Matsoukas S, Matejka P, Dehak N, Ma J, Cumani S, Glembek O, Hermansky H, Mallidi SH, Mesgarani N, Schwartz R, Soufifar M, Tan ZH, Thomas S, Zhang B, Zhou X (2013).  Developing a speaker identification system for the DARPA RATS project.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  6768-6772.
  • Li F, Hermansky H (2013).  Effect of filter bandwidth and spectral sampling rate of analysis filterbank on automatic phoneme recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  7121-7124.
  • Jansen A, Thomas S, Hermansky H (2013).  Weak top-down constraints for unsupervised acoustic model training.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  8091-8095.
  • Jansen A, Dupoux E, Goldwater S, Johnson M, Khudanpur S, Church K, Feldman N, Hermansky H, Metze F, Rose R, Seltzer M, Clark P, McGraw I, Varadarajan B, Bennett E, Borschinger B, Chiu J, Dunbar E, Fourtassi A, Harwath D, Lee CY, Levin K, Norouzian A, Peddinti V, Richardson R, Schatz T, Thomas S (2013).  A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  8111-8115.
  • Thomas S, Seltzer ML, Church K, Hermansky H (2013).  Deep neural network features and semi-supervised training for low resource speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  6704-6708.
  • Hermansky H (2013).  Long, deep and wide artificial neural nets for dealing with unexpected noise in machine recognition of speech.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  8082 LNAI.  14-21.
  • Hermansky H, Cohen JR, Stern RM (2013).  Perceptual properties of current speech recognition technology.  Proceedings of the IEEE.  101(9).  1968-1985.
  • Garimella S, Hermansky H (2013).  Factor analysis of auto-associative neural networks with application in speaker verification.  IEEE Transactions on Neural Networks and Learning Systems.  24(4).  522-528.
  • Hermansky H (2013).  Multistream recognition of speech: Dealing with unknown unknowns.  Proceedings of the IEEE.  101(5).  1076-1088.
  • Mallidi SH, Ganapathy S, Hermansky H (2013).  Robust speaker recognition using spectro-temporal autoregressive models.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  3689-3693.
  • Ma J, Zhang B, Matsoukas S, Mallidi SH, Li F, Hermansky H (2013).  Improvements in language identification on the RATS noisy speech corpus.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  69-73.
  • Kintzley K, Jansen A, Hermansky H (2013).  Text-to-speech inspired duration modeling for improved whole-word acoustic models.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  1253-1257.
  • Schatz T, Peddinti V, Bach F, Jansen A, Hermansky H, Dupoux E (2013).  Evaluating speech features with the minimal-pair ABX task: Analysis of the classical MFC/PLP pipeline.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  1781-1785.
  • Variani E, Li F, Hermansky H (2013).  Multi-stream recognition of noisy speech with performance monitoring.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2978-2981.
  • Ogawa T, Li F, Hermansky H (2013).  Stream selection and integration in multistream ASR using GMM-based performance monitoring.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  3332-3336.
  • Thomas S, Ganapathy S, Jansen A, Hermansky H (2012).  Data-driven posterior features for low resource speech recognition applications.  13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012.  1.  790-793.
  • Jansen A, Thomas S, Hermansky H (2012).  Intrinsic spectral analysis for zero and high resource speech recognition.  13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012.  1.  878-881.
  • Kintzley K, Jansen A, Church K, Hermansky H (2012).  Inverting the point process model for fast phonetic keyword search.  13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012.  3.  2437-2440.
  • Ganapathy S, Hermansky H (2012).  Robust phoneme recognition using high resolution temporal envelopes.  13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012.  3.  1826-1829.
  • Li F, Mallidi SH, Hermansky H (2012).  Phone recognition in critical bands using sub-band temporal modulations.  13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012.  3.  1814-1817.
  • Kintzley K, Jansen A, Hermansky H (2012).  MAP estimation of whole-word acoustic models with dictionary priors.  13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012.  1.  786-789.
  • Variani E, Hermansky H (2012).  Estimating classifier performance in unknown noise.  13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012.  2.  1798-1801.
  • Thomas S, Mallidi SH, Janu T, Hermansky H, Mesgarani N, Zhou X, Shamma S, Ng T, Zhang B, Nguyen L, Matsoukas S (2012).  Acoustic and data-driven features for robust speech activity detection.  13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012.  3.  1983-1986.
  • Ganapathy S, Hermansky H (2012).  Temporal resolution analysis in frequency domain linear prediction.  Journal of the Acoustical Society of America.  132(5).
  • Garimella S, Mallidi SH, Hermansky H (2012).  Regularized auto-associative neural networks for speaker verification.  IEEE Signal Processing Letters.  19(12).  841-844.
  • Thomas S, Ganapathy S, Hermansky H (2012).  Multilingual MLP features for low-resource LVCSR systems.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4269-4272.
  • Garcia-Romero D, Zhou X, Zotkin D, Srinivasan B, Luo Y, Ganapathy S, Thomas S, Nemala S, Sivaram GSVS, Mirbagheri M, Mallidi SH, Janu T, Rajan P, Mesgarani N, Elhilali M, Hermansky H, Shamma S, Duraiswami R (2012).  The UMD-JHU 2011 speaker recognition system.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4229-4232.
  • Ikbal S, Misra H, Hermansky H, Magimai-Doss M (2012).  Phase AutoCorrelation (PAC) features for noise robust speech recognition.  Speech Communication.  54(7).  867-880.
  • Weinshall D, Zweig A, Hermansky H, Kombrink S, Ohl FW, Anemüller J, Bach JH, Van Gool L, Nater F, Pajdla T, Havlena M, Pavel M (2012).  Beyond novelty detection: Incongruent events, when general and specific classifiers disagree.  IEEE Transactions on Pattern Analysis and Machine Intelligence.  34(10).  1886-1901.
  • Anemüller J, Caputo B, Hermansky H, Ohl FW, Pajdla T, Pavel M, Van Gool L, Vogels R, Wabnik S, Weinshall D (2012).  DIRAC: Detection and identification of rare audio-visual events.  Studies in Computational Intelligence.  384.  3-35.
  • Sivaram GSVS, Hermansky H (2012).  Sparse multilayer perceptron for phoneme recognition.  IEEE Transactions on Audio, Speech and Language Processing.  20(1).  23-29.
  • Hirsch HG, Ganapathy S, Hermansky H (2012).  Comparison of different approaches for speech recognition in hands-free mode.  Proceedings of 10th ITG Symposium on Speech Communication.
  • Ganapathy S, Rajan P, Hermansky H (2011).  Multi-layer perceptron based speech activity detection for speaker verification.  IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.  321-324.
  • Sivaram GSVS, Thomas S, Hermansky H (2011).  Mixture of auto-associative neural networks for speaker verification.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2381-2384.
  • Mallidi SH, Ganapathy S, Hermansky H (2011).  Modulation spectrum analysis for recognition of reverberant speech.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  189-192.
  • Carlin MA, Thomas S, Jansen A, Hermansky H (2011).  Rapid evaluation of speech representations for spoken term discovery.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  821-824.
  • Mesgarani N, Thomas S, Hermansky H (2011).  Adaptive stream fusion in multistream recognition of speech.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2329-2332.
  • Kintzley K, Jansen A, Hermansky H (2011).  Event selection from phone posteriorgrams using matched filters.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  1905-1908.
  • Hermansky H (2011).  Speech recognition from spectral dynamics.  Sadhana - Academy Proceedings in Engineering Sciences.  36(5).  729-744.
  • Hermansky H (2011).  Dealing with unexpected words in automatic recognition of speech.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  6836 LNAI.  1-15.
  • Zweig G, Nguyen P, Van Compernolle D, Demuynck K, Atlas L, Clark P, Sell G, Wang M, Sha F, Hermansky H, Karakos D, Jansen A, Thomas S, Sivaram GSVS, Bowman S, Kao J (2011).  Speech recognition with segmental conditional random fields: A summary of the JHU CLSP 2010 Summer Workshop.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  5044-5047.
  • Sivaram GSVS, Hermansky H (2011).  Multilayer perceptron with sparse hidden outputs for phoneme recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  5336-5339.
  • Thomas S, Nguyen P, Zweig G, Hermansky H (2011).  MLP based phoneme detectors for automatic speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  5024-5027.
  • Mesgarani N, Thomas S, Hermansky H (2011).  Toward optimizing stream fusion in multistream recognition of speech.  Journal of the Acoustical Society of America.  130(1).
  • Pinto J, Garimella S, Magimai-Doss M, Hermansky H, Bourlard H (2011).  Analysis of MLP-based hierarchical phoneme posterior probability estimator.  IEEE Transactions on Audio, Speech and Language Processing.  19(2).  225-241.
  • Thomas S, Patil K, Ganapathy S, Mesgarani N, Hermansky H (2010).  A phoneme recognition framework based on auditory spectro-temporal receptive fields.  Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010.  2458-2461.
  • Mesgarani N, Thomas S, Hermansky H (2010).  A multistream multiresolution framework for phoneme recognition.  Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010.  318-321.
  • Sivaram GSVS, Ganapathy S, Hermansky H (2010).  Sparse auto-associative neural networks: Theory and application to speech recognition.  Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010.  2270-2273.
  • Ganapathy S, Thomas S, Hermansky H (2010).  Temporal envelope compensation for robust phoneme recognition using modulation spectrum.  Journal of the Acoustical Society of America.  128(6).  3769-3780.
  • Jansen A, Church K, Hermansky H (2010).  Towards spoken term discovery at scale with zero resources.  Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010.  1676-1679.
  • Thomas S, Ganapathy S, Hermansky H (2010).  Cross-lingual and multi-stream posterior features for low resource LVCSR systems.  Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010.  877-880.
  • Motlicek P, Ganapathy S, Hermansky H, Garudadri H (2010).  Wide-band audio coding based on frequency-domain linear prediction.  Eurasip Journal on Audio, Speech, and Music Processing.  2010.
  • Ganapathy S, Thomas S, Hermansky H (2010).  Comparison of modulation features for phoneme recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  5038-5041.
  • Hermansky H (2010).  History of modulation spectrum in ASR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  5458-5461.
  • Ganapathy S, Thomas S, Hermansky H (2010).  Robust spectro-temporal features based on autoregressive models of Hilbert envelopes.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4286-4289.
  • Sivaram GSVS, Nemala SK, Elhilali M, Tran TD, Hermansky H (2010).  Sparse coding for speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4346-4349.
  • Kombrink S, Hannemann M, Burget L, Hermanský H (2010).  Recovery of rare words in lecture speech.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  6231 LNAI.  330-337.
  • Sivaram GSVS, Nemala SK, Mesgarani N, Hermansky H (2010).  Data-driven and feedback based spectro-temporal features for speech recognition.  IEEE Signal Processing Letters.  17(11).  957-960.
  • Delbruck T, Koch T, Berner R, Hermansky H (2010).  Fully integrated 500uW speech detection wake-up circuit.  ISCAS 2010 - 2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems.  2015-2018.
  • Liu SC, Mesgarani N, Harris J, Hermansky H (2010).  The use of spike-based representations for hardware audition systems.  ISCAS 2010 - 2010 IEEE International Symposium on Circuits and Systems: Nano-Bio Circuit Fabrics and Systems.  505-508.
  • Ganapathy S, Motlicek P, Hermansky H (2010).  Autoregressive models of amplitude modulations in audio compression.  IEEE Transactions on Audio, Speech and Language Processing.  18(6).  1624-1631.
  • Weinshall D, Hermansky H, Zweig A, Luo J, Jimison H, Ohl F, Pavel M (2009).  Beyond novelty detection: Incongruent events, when general and specific classifiers disagree.  Advances in Neural Information Processing Systems 21 - Proceedings of the 2008 Conference.  1745-1752.
  • Ganapathy S, Thomas S, Hermansky H (2009).  Temporal envelope subtraction for robust speech recognition using modulation spectrum.  Proceedings of the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2009.  164-169.
  • Ganapathy S, Thomas S, Motlicek P, Hermansky H (2009).  Applications of signal analysis using autoregressive models for amplitude modulation.  IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.  341-344.
  • Ganapathy S, Motlicek P, Hermansky H (2009).  MDCT for encoding residual signals in frequency domain linear prediction.  127th Audio Engineering Society Convention 2009.  2.  1103-1110.
  • Thomas S, Ganapathy S, Hermansky H (2009).  Tandem representations of spectral envelope and modulation frequency features for ASR.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2955-2958.
  • Ganapathy S, Thomas S, Hermansky H (2009).  Static and dynamic modulation spectrum for speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2823-2826.
  • Kombrink S, Burget L, Matejka P, Karafiát M, Hermansky H (2009).  Posterior-based out of vocabulary word detection in telephone speech.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  80-83.
  • Mesgarani N, Sivaram GSVS, Nemala SK, Elhilali M, Hermansky H (2009).  Discriminant spectrotemporal features for phoneme recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2983-2986.
  • Motlicek P, Ganapathy S, Hermansky H (2009).  Arithmetic coding of sub-band residuals in FDLP speech/audio Codec.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2591-2594.
  • Stricker C, Wagen JF, Aradilla G, Bourlard H, Hermansky H, Pinto J, Rey PH, Théraulaz J (2009).  Intelligent multi-modal interfaces for mobile applications in hostile environment (IM-HOST).  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  5440 LNCS.  71-102.
  • Ganapathy S, Motlicek P, Hermansky H (2009).  Error resilient speech coding using sub-band Hilbert envelopes.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  5729 LNAI.  355-362.
  • Pavel M, Slaney M, Hermansky H (2009).  Reconciliation of human and machine speech recognition performance.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1669-1672.
  • Thomas S, Ganapathy S, Hermansky H (2009).  Phoneme recognition using spectral envelope and modulation frequency features.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4453-4456.
  • Pinto J, Sivaram GSVS, Hermansky H, Magimai-Doss M (2009).  Volterra series for analyzing MLP based phoneme posterior estimator.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1813-1816.
  • Ganapathy S, Thomas S, Hermansky H (2009).  Modulation frequency features for phoneme recognition in noisy speech.  Journal of the Acoustical Society of America.  125(1).
  • Thomas S, Ganapathy S, Hermansky H (2008).  Hilbert envelope based features for far-field speech recognition.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  5237 LNCS.  119-124.
  • Sivaram GSVS, Hermansky H (2008).  Introducing temporal asymmetries in feature extraction for automatic speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  890-893.
  • Sivaram GSVS, Hermansky H (2008).  Emulating temporal receptive fields of auditory mid-brain neurons for automatic speech recognition.  European Signal Processing Conference.
  • Valente F, Hermansky H (2008).  On the combination of auditory and modulation frequency channels for ASR applications.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2242-2245.
  • Tošic T, Magimai-Doss M, Hermansky H (2008).  Using comparison of parallel phoneme probability streams for OOV word detection.  European Signal Processing Conference.
  • Thomas S, Ganapathy S, Hermansky H (2008).  Recognition of reverberant speech using frequency domain linear prediction.  IEEE Signal Processing Letters.  15.  681-684.
  • Ganapathy S, Thomas S, Hermansky H (2008).  Front-end for far-field speech recognition based on frequency domain linear prediction.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  984-987.
  • Pinto J, Hermansky H (2008).  Combining evidence from a generative and a discriminative model in phoneme recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  2414-2417.
  • Anemüller J, Bach JH, Caputo B, Havlena M, Jie L, Kayser H, Leibe B, Motlicek P, Pajdla T, Pavel M, Torii A, Gool LV, Zweig A, Hermansky H (2008).  The DIRAC AWEAR audio-visual platform for detection of unexpected and incongruent events.  ICMI'08: Proceedings of the 10th International Conference on Multimodal Interfaces.  289-292.
  • Ganapathy S, Motlicek P, Hermansky H, Garudadri H (2008).  Spectral noise shaping: Improvements in speech/audio codec based on linear prediction in spectral domain.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  675-678.
  • Thomas S, Ganapathy S, Hermansky H (2008).  Hilbert envelope based spectro-temporal features for phoneme recognition in telephone speech.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  1521-1524.
  • Thomas S, Ganapathy S, Hermansky H (2008).  Spectro-temporal features for automatic speech recognition using linear prediction in spectral domain.  European Signal Processing Conference.
  • Ganapathy S, Motlicek P, Hermansky H, Garudadri H (2008).  Autoregressive modelling of Hilbert envelopes for wide-band audio coding.  Audio Engineering Society - 124th Audio Engineering Society Convention 2008.  3.  1620-1627.
  • Motlícek P, Ganapathy S, Hermansky H, Garudadri H, Athineos M (2008).  Perceptually motivated sub-band decomposition for FDLP audio coding.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  5246 LNAI.  435-442.
  • Sivaram GSVS, Hermansky H (2008).  Emulating temporal receptive fields of higher level auditory neurons for ASR.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  5246 LNAI.  509-516.
  • Pinto J, Sivaram GSVS, Hermansky H (2008).  Reverse correlation for analyzing MLP posterior features in ASR.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  5246 LNAI.  469-476.
  • Krishnan Parthasarathi SH, Motlícek P, Hermansky H (2008).  Exploiting contextual information for speech/non-speech detection.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  5246 LNAI.  451-459.
  • Burget L, Schwarz P, Matejka P, Hannemann M, Rastrow A, White C, Khudanpur S, Hermansky H, Cernocký J (2008).  Combination of strongly and weakly constrained recognizers for reliable detection of OOVs.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4081-4084.
  • Ganapathy S, Motlicek P, Hermansky H, Garudadri H (2008).  Temporal masking for bit-rate reduction in audio codec based on Frequency Domain Linear Prediction.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4781-4784.
  • White C, Zweig G, Burget L, Schwarz P, Hermansky H (2008).  Confidence estimation, OOV detection and language ID using phone-to-word transduction and phone-level alignments.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4085-4088.
  • Valente F, Hermansky H (2008).  Hierarchical and parallel processing of modulation spectrum for ASR applications.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4165-4168.
  • Pinto J, Yegnanarayana B, Hermansky H, Magimai-Doss M (2008).  Exploiting contextual information for improved phoneme recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4449-4452.
  • Motlicek P, Ganapathy S, Hermansky H, Garudadri H (2008).  Frequency domain linear prediction for QMF sub-bands and applications to audio coding.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  4892 LNCS.  248-258.
  • Valente F, Vepa J, Plahl C, Gollan C, Hermansky H, Schlüter R (2007).  Hierarchical Neural Networks feature extraction for LVCSR system.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  1.  265-268.
  • Ketabdar H, Hannemann M, Hermansky H (2007).  Detection of out-of-vocabulary words in posterior based ASR.  International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007.  4.  2772-2775.
  • Valente F, Vepa J, Hermansky H (2007).  Multi-stream features combination based on Dempster-Shafer rule for LVCSR system.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  1.  273-276.
  • Motlicek P, Hermansky H, Ganapathy S, Garudadri H (2007).  Non-uniform speech/audio coding exploiting predictability of temporal evolution of spectral envelopes.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  4629 LNAI.  350-357.
  • Pinto J, Lovitt A, Hermansky H (2007).  Exploiting phoneme similarities in hybrid HMM-ANN keyword spotting.  International Speech Communication Association - 8th Annual Conference of the International Speech Communication Association, Interspeech 2007.  4.  2388-2391.
  • Prasanna SRM, Hermansky H (2007).  MRASTA and PLP in automatic speech recognition.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  1.  137-140.
  • Motlicek P, Ullal V, Hermansky H (2007).  Wide-band perceptual audio coding based on frequency-domain linear prediction.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Valente F, Hermansky H (2007).  Combination of acoustic classifiers based on Dempster-Shafer theory of evidence.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4.
  • Fousek P, Hermansky H (2006).  Towards ASR based on hierarchical posterior-based keyword recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Valente F, Hermansky H (2006).  Discriminant linear processing of time-frequency plane.  Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH.  1.  349-352.
  • Motlícek P, Hermansky H, Garudadri H, Srinivasamurthy N (2006).  Speech coding based on spectral dynamics.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  4188 LNCS.  471-478.
  • Hermansky H, Fousek P (2005).  Multi-resolution RASTA filtering for TANDEM-based ASR.  9th European Conference on Speech Communication and Technology.  361-364.
  • Hermansky H, Fousek P, Lehtonen M (2005).  The role of speech in multimodal human-computer interaction (towards reliable rejection of non-keyword input).  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  3658 LNAI.  2-8.
  • Verhelst W, Herre J, Kubin G, Hermansky H, Jensen SH (2005).  Eurasip Journal on Applied Signal Processing: Editorial.  Eurasip Journal on Applied Signal Processing.  2005(9).  1289-1291.
  • Morgan N, Zhu Q, Stolcke A, Sönmez K, Sivadas S, Shinozaki T, Ostendorf M, Jain P, Hermansky H, Ellis D, Doddington G, Chen B, Çetin O, Bourlard H, Athineos M (2005).  Pushing the envelope - Aside.  IEEE Signal Processing Magazine.  22(5).  81-88.
  • Sivadas S, Hermansky H (2004).  On use of task independent training data in tandem feature extraction.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Misra H, Ikbal S, Bourlard H, Hermansky H (2004).  Spectral entropy based feature for robust ASR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Ikbal S, Misra H, Bourlard H, Hermansky H (2004).  Phase AutoCorrelation (PAC) features in entropy based multi-stream for robust speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Athineos M, Hermansky H, Ellis DPW (2004).  LP-TRAP: Linear predictive temporal patterns.  8th International Conference on Spoken Language Processing, ICSLP 2004.  949-952.
  • Fousek P, Svojanovský P, Grézl F, Hermansky H (2004).  New nonsense syllables database - Analyses and preliminary ASR experiments.  8th International Conference on Spoken Language Processing, ICSLP 2004.  2749-2752.
  • Ikbal S, Misra H, Sivadas S, Hermansky H, Bourlard H (2004).  Entropy based combination of tandem representations for noise robust ASR.  8th International Conference on Spoken Language Processing, ICSLP 2004.  2553-2556.
  • Matejka P, Schwarz P, Hermansky H, Cernocky J (2003).  Phoneme recognition using temporal patterns.  Lecture Notes in Artificial Intelligence (Subseries of Lecture Notes in Computer Science).  2807.  198-205.
  • Sivadas S, Hermansky H (2003).  Generalized Tandem feature extraction.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.  56-59.
  • Ikbal S, Hermansky H, Bourlard H (2003).  Nonlinear spectral transformations for robust speech recognition.  2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003.  393-398.
  • Sivadas S, Hermansky H (2003).  In search of target class definition in tandem feature extraction.  EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology.  837-840.
  • Adami AG, Hermansky H (2003).  Segmentation of speech for speaker and language recognition.  EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology.  841-844.
  • Jain P, Hermansky H (2003).  Beyond a single critical-band in TRAP based ASR.  EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology.  437-440.
  • Hermansky H, Jain P (2003).  Band-independent speech-event categories for TRAP based ASR.  EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology.  1013-1016.
  • Malayath N, Hermansky H (2003).  Data-driven spectral basis functions for automatic speech recognition.  Speech Communication.  40(4).  449-466.
  • Grézl F, Hermansky H (2003).  Local averaging and differentiating of spectral plane for TRAP-based ASR.  EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology.  1017-1020.
  • Kajarekar SS, Adami AG, Hermansky H (2003).  Novel approaches for one- and two-speaker detection.  EUROSPEECH 2003 - 8th European Conference on Speech Communication and Technology.  2661-2664.
  • Kajarekar SS, Hermansky H (2003).  Analysis of information in speech based on MANOVA.  Advances in Neural Information Processing Systems.
  • Hermansky H (2003).  TRAP-TANDEM: Data-driven extraction of temporal features from speech.  2003 IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU 2003.  255-260.
  • Sivadas S, Hermansky H (2002).  Hierarchical tandem feature extraction.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Adami AG, Kajarekar SS, Hermansky H (2002).  A new speaker change detection method for two-speaker segmentation.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  4.
  • Jain P, Hermansky H, Kingsbury B (2002).  Distributed speech recognition using noise-robust MFCC and TRAPS-estimated manner features.  7th International Conference on Spoken Language Processing, ICSLP 2002.  473-476.
  • Malayath N, Hermansky H (2002).  Bark resolution from speech data.  7th International Conference on Spoken Language Processing, ICSLP 2002.  2169-2172.
  • Adami A, Burget L, Dupont S, Garudadri H, Grezl F, Hermansky H, Jain P, Kajarekar S, Morgan N, Sivadas S (2002).  Qualcomm-ICSI-OGI features for ASR.  7th International Conference on Spoken Language Processing, ICSLP 2002.  21-24.
  • Kajarekar SS, Yegnanarayana B, Hermansky H (2001).  A study of two dimensional linear discriminants for ASR.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.  137-140.
  • Hermansky H (2000).  Preface.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  1902.  V.
  • Malayath N, Hermansky H, Kajarekar S, Yegnanarayana B (2000).  Data-driven temporal filters and alternatives to GMM in speaker verification.  Digital Signal Processing: A Review Journal.  10(1).  55-74.
  • Yang HH, Hermansky H (2000).  Search for information bearing components in speech.  Advances in Neural Information Processing Systems.  803-809.
  • Hermansky H, Ellis DPW, Sharma S (2000).  Tandem connectionist feature extraction for conventional HMM systems.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  3.  1635-1638.
  • Yang HH, Van Vuuren S, Sharma S, Hermansky H (2000).  Relevance of time-frequency features for phonetic and speaker-channel classification.  Speech Communication.  31(1).  35-50.
  • Kajarekar SS, Hermansky H (2000).  Analysis of information in speech and its application in speech recognition.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  1902.  283-288.
  • Sivadas S, Jain P, Hermansky H (2000).  Discriminative MLPs in HMM-based recognition of speech in cellular telephony.  6th International Conference on Spoken Language Processing, ICSLP 2000.
  • Sharma S, Ellis D, Kajarekar S, Jain P, Hermansky H (2000).  Feature extraction using non-linear transformation for robust speech recognition on the Aurora database.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2.  1117-1120.
  • Kajarekar SS, Hermansky H (2000).  Optimization of units for continuous-digit recognition task.  6th International Conference on Spoken Language Processing, ICSLP 2000.
  • Jain P, Hermansky H (2000).  Temporal patterns of critical-band spectrum for text-to-speech.  6th International Conference on Spoken Language Processing, ICSLP 2000.
  • Arai T, Pavel M, Hermansky H, Avendano C (1999).  Syllable intelligibility for temporally filtered LPC cepstral trajectories.  Journal of the Acoustical Society of America.  105(5).  2783-2791.
  • Hermansky H, Sharma S (1999).  TempoRAl Patterns (TRAPs) in ASR of noisy speech.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.  289-292.
  • Hermansky H (1999).  Data-driven analysis of speech.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics).  1692.  10-18.
  • Kanedera N, Arai T, Hermansky H, Pavel M (1999).  On the relative importance of various components of the modulation spectrum for automatic speech recognition.  Speech Communication.  28(1).  43-55.
  • Yegnanarayana B, Avendano C, Hermansky H, Satyanarayana Murthy P (1999).  Speech enhancement using linear prediction residual.  Speech Communication.  28(1).  25-42.
  • Yang H, van Vuuren S, Hermansky H (1999).  Relevancy of time-frequency features for phonetic classification measured by mutual information.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.  225-228.
  • Kanedera N, Hermansky H, Arai T (1998).  On properties of modulation spectrum for robust automatic speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2.  613-616.
  • Yegnanarayana B, Satyanarayana Murthy P, Avendano C, Hermansky H (1998).  Enhancement of reverberant speech using LP residual.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.  405-408.
  • Hermansky H (1998).  Should recognizers have ears?  Speech Communication.  25(1-3).  3-27.
  • Avendano C, Hermansky H (1997).  On the properties of temporal processing for speech in adverse environments.  IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics.
  • Avendano C, Hermansky H (1997).  On the effects of short-term spectrum smoothing in channel normalization.  IEEE Transactions on Speech and Audio Processing.  5(4).  372-374.
  • Hermansky H (1997).  Modulation spectrum in the automatic recognition of speech.  IEEE Workshop on Automatic Speech Recognition and Understanding Proceedings.  140-147.
  • Tibrewala S, Hermansky H (1997).  Sub-band based recognition of noisy speech.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2.  1255-1258.
  • Arai T, Pavel M, Hermansky H, Avendano C (1996).  Intelligibility of speech with filtered time trajectories of spectral envelopes.  International Conference on Spoken Language Processing, ICSLP, Proceedings.  4.  2490-2493.
  • Avendano C, van Vuuren S, Hermansky H (1996).  Data based filter design for RASTA-like channel normalization in ASR.  International Conference on Spoken Language Processing, ICSLP, Proceedings.  4.  2087-2090.
  • Avendano C, Hermansky H (1996).  Study on the dereverberation of speech based on temporal envelope filtering.  International Conference on Spoken Language Processing, ICSLP, Proceedings.  2.  889-892.
  • Hermansky H, Tibrewala S, Pavel M (1996).  Towards ASR on partially corrupted speech.  International Conference on Spoken Language Processing, ICSLP, Proceedings.  1.  462-465.
  • Avendano C, Hermansky H, Vis M, Bayya A (1996).  Adaptive speech enhancement using frequency-specific SNR estimates.  IEEE Workshop on Interactive Voice Technology for Telecommunications Applications, IVTTA.  65-68.
  • Bourlard H, Hermansky H, Morgan N (1996).  Towards increasing speech recognition error rates.  Speech Communication.  18(3).  205-231.
  • Morgan N, Bourlard H, Greenberg S, Hermansky H, Wu SL (1995).  Stochastic perceptual models of speech.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.  397-400.
  • Cole R, Hermansky H, Novick DG, Oviatt S, Hirschman L, Atlas L, Beckman M, Biermann A, Bush M, Clements M, Cohen J, Garcia O, Hanson B, Levinson S, McKeown K, Morgan N, Ostendorf M, Price P, Silverman H, Spitz J, Waibel A, Weinstein C, Zahorian S, Zue V (1995).  The Challenge of Spoken Language Systems: Research Directions for the Nineties.  IEEE Transactions on Speech and Audio Processing.  3(1).  1-21.
  • Hermansky H, Wan EA, Avendano C (1995).  Speech enhancement based on temporal processing.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.  405-408.
  • Hermansky H, Wan EA, Avendano C (1994).  Noise suppression in cellular communications.  Proceedings of the 2nd IEEE Workshop on Interactive Voice Technology for Telecommunications Applications (IVTTA 94).  85-88.
  • Hermansky H, Morgan N (1994).  RASTA Processing of Speech.  IEEE Transactions on Speech and Audio Processing.  2(4).  578-589.
  • Hermansky H, Morgan N, Hirsch HG (1993).  Recognition of speech in additive and convolutional noise based on RASTA spectral processing.  Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing.  2.
  • Hermansky H, Morgan N, Bayya A, Kohn P (1992).  RASTA-PLP speech analysis technique.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.  121-124.
  • Morgan N, Hermansky H, Bourlard H, Kohn P, Wooters C (1991).  Continuous speech recognition using PLP analysis with multilayer perceptrons.  Proceedings - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing.  1.  49-52.
  • Morgan N, Wooters C, Hermansky H (1991).  Experiments with temporal resolution for continuous speech recognition with multi-layer perceptrons.  Neural Networks for Signal Processing.  405-410.
  • Bayya A, Hermansky H (1990).  Towards feature-based speech metric.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2.  781-784.
  • Hermansky H (1990).  Perceptual linear predictive (PLP) analysis of speech.  Journal of the Acoustical Society of America.  87(4).  1738-1752.
  • Hermansky H, Broad DJ (1989).  Effective second formant F2′ and the vocal tract front-cavity.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.  480-483.
  • Hermansky H, Junqua JC (1988).  Optimization of perceptually-based ASR front-end.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  219-222.
  • Hermansky H (1987).  Efficient speaker-independent automatic speech recognition by simulation of some properties of human auditory perception.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1159-1162.
  • Hermansky H, Tsuga K, Makino S, Wakita H (1986).  Perceptually based processing in automatic speech recognition.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1971-1974.
  • Hermansky H, Hanson BA, Wakita H (1985).  Perceptually based linear predictive analysis of speech.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  509-512.
  • Hermansky H, Hanson BA, Wakita H (1985).  Low-dimensional representation of vowels based on all-pole modeling in the psychophysical domain.  Speech Communication.  4(1-3).  181-187.
  • Hermansky H, Hanson BA, Wakita H, Fujisaki H (1985).  Linear predictive modeling of speech in modified spectral domains.  IERE Conference Proceedings.  (62).  55-62.
  • Hermansky H, Fujisaki H, Sato Y (1984).  Spectral envelope sampling and interpolation in linear predictive analysis of speech.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  1.
  • Hermansky H, Fujisaki H, Sato Y (1983).  Analysis and synthesis of speech based on spectral transform linear predictive method.  ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings.  2.  777-780.
Other Publications
  • Elhilali M, Hermansky H, Andreou A, Anderson D, Vasilaki E, Slaney M, Lewicki M, Mesgarani N, Duangudom T, Shamma SA (2004).  Designing a biological key-word spotting system.  Neuromorphic Engineering Workshop, Telluride, CO.
Conference Proceedings
  • Garcia-Romero D, Zhou X, Zotkin D, Srinivasan B, Luo Y, Ganapathy S, Thomas S, Nemala S, Sivaram GSVS, Mirbagheri M, Mallidi SH, Janu T, Rajan P, Mesgarani N, Elhilali M, Hermansky H, Shamma S, Duraiswami R (2012).  The UMD-JHU 2011 speaker recognition system.  Proceedings of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).  4229-4232.
  • Sivaram GSVS, Nemala SK, Elhilali M, Tran T, Hermansky H (2010).  Sparse coding for speech recognition.  Proceedings of the Acoustics Speech and Signal Processing (ICASSP).  4346-4349.
  • Mesgarani N, Sivaram G, Nemala S, Elhilali M, Hermansky H (2009).  Discriminant Spectrotemporal Features for Phoneme Recognition.  Proceedings of the 10th Annual Conference of the International Speech Communication Association (INTERSPEECH).