Hynek Hermansky

Education

Doctor of Engineering in Electrical Engineering 1983

University of Tokyo, Japan

Dissertation Title: Improved Linear Predictive Analysis Based On Spectral Processing

Research Supervisor: Professor Hiroya Fujisaki

Master of Science in Electrical Engineering 1972

Technical University Brno, Czech Republic

Thesis Title: Synthesizer of Electronic Sounds

Current Appointments

· Julian S. Smith Professor of Electrical Engineering, the Johns Hopkins University, Whiting School of Engineering, Since April 2012

· Director, Center for Language and Speech Processing, the Johns Hopkins University, Since April 2012

· Research Professor (on leave of absence), Brno University of Technology, Czech Republic, Since 2000

Past Appointments

· Professor, Department of Electrical and Computer Engineering, the Johns Hopkins University, Whiting School of Engineering, October 2008 to March 2012

· Interim Director, Center for Language and Speech Processing, the Johns Hopkins University, November 2010 to March 2012

· Director of Research, IDIAP Research Institute, Martigny, Switzerland, 2003 to 2008

· Titular Professor, Swiss Federal Institute of Technology at Lausanne, 2005 to 2008

· Professor and Director, Center for Information Technology, Oregon Graduate Institute of Science and Technology, Portland, Oregon, 1997 to 2003 (Associate Professor, 1993 to 1997)

· Senior Member of Technical Staff, U S WEST Advanced Technologies, Boulder, Colorado, 1990 to 1993 (Member of Technical Staff, 1988 to 1990)

· Research Engineer, Panasonic Technologies, Speech Research Laboratory, Santa Barbara, California, 1983 to 1988

· Research Assistant, University of Tokyo, Japan, 1979 to 1983

· Assistant Professor, Brno University of Technology, Czech Republic, 1972 to 1978

Teaching

The Johns Hopkins University

· EN.520.315 Introduction to Information Processing of Sensory Signals (3rd year students), since 2009

· EN.520.515 Processing of Audio and Visual Signals (Graduate level), since 2011

· EN.520.680 Audio and visual processing by humans and by machines (Graduate level), since 2010

Other institutions

· Speech and Audio Processing by Humans and by Machines, (Graduate seminar), Swiss Federal Institute of Technology at Lausanne 2005 to 2007

· Speech and Audio Processing by Humans and by Machines, (Graduate level), Brno University of Technology, Czech Republic, since 2000

· Speech and Audio Processing by Humans and by Machines, (Graduate level), Oregon Graduate Institute of Science and Technology, Portland, Oregon 1994 to 2003

· Speech Processing (Graduate level)

· Speech Systems (Graduate level)

· Signals and systems for multimedia engineering (Graduate level), Brno University of Technology, Czech Republic 1972 to 1978

· Theory of Communications (5th year students)

· Electroacoustics (4th year students)

Invited lecturer in special courses and tutorials

· Invited lecture on neural networks for speech, Academia Sinica, Taipei, Taiwan, May 2014

· Invited talk, International Conference on Learning Representations 2014, Banff, Canada, April 2014

· Invited tutorial, Speech Processing, Indian Institute of Information Technology, Hyderabad, India, December 2013

· Invited Keynote Talk, Artificial Neural Networks: Deep, Long and Wide, Oriental COCOSDA, Gurgaon, India, November 2013

· ISCA Medalist Keynote Talk, Interspeech 2013, Lyon, France

· Invited lecture on neural networks, Carnegie-Mellon University, Colloquium of Language Technology Institute, October 2013

· Invited lecture on Speech Coding in ASR, Workshop on Computational Models of Early Language Acquisition, Ecole Normale Supérieure, Paris, July 2013

· Invited talk on Multistream Recognition of Speech, 2013 Telluride Workshop on Neuromorphic Cognition Engineering

· Tutorial on Speech Processing, 2009 Telluride Workshop on Neuromorphic Cognition Engineering

· Tutorial on Machine Recognition of Speech, 2009 Summer Workshop on Human Language Technology, The Johns Hopkins University, Baltimore

· Invited Plenary Lecture, Information Extraction from Cognitive Signals, IEEE INDICON 2011, Hyderabad, India

· Invited talk, Workshop on Image and Speech Processing 2012, Hyderabad, India

· Invited talk, Schlumberger workshop on Mathematical Models of Sound Analysis, IHES, Paris, June 2012

· Invited talk, Workshop on Image and Speech Processing 2011, Hyderabad, India

· Invited talk, Computational Hearing Workshop, University College London, May 2010

· Invited talk, Multistream Processing of Speech, IBM Watson Research Center, 2010

· Leading session on classification, learning, and self-organization in biological systems at the School on Neuromorphic Cognition 2009, Capo Caccia, Sardinia

· Tutorial on Speech Processing, 2008 Telluride Workshop on Neuromorphic Cognition Engineering

· Tutorial on modulation spectrum processing, Interspeech 2007, Antwerp, Belgium

· Tutorial at the Summer Workshop on Multi-Sensory Modalities in Cognitive Science, Gerzensee, Switzerland, 2007

· Invited talk, Extracting Information from Speech, IBM Watson Research Center, 2007

· Tutorial on Speech Processing, 2006 Telluride Workshop on Neuromorphic Cognition Engineering

· Tutorial at Indian National Academy of Engineering Workshop on Image and Speech Processing, 2006

· Tutorial on Speech Processing, 2005 Telluride Workshop on Neuromorphic Cognition Engineering

· Invited lecturer at International Graduate School for Neurosensory Science, Bad Zwischenahn, Germany 2004

· Tutorial on Speech Processing, 2004 Telluride Workshop on Neuromorphic Engineering

· Tutorial at the International Conference on Intelligent Sensors, Melbourne, Australia, 2004

· Invited talk, Carnegie-Mellon University, 2003

· Invited lecturer at the Language and Speech Engineering Postgraduate Course 2002, EPFL Lausanne

· Invited talk, Center for Language and Speech Processing Distinguished Lectures Series, 2002

· Invited lecturer at the Language and Speech Engineering Postgraduate Course 2001, EPFL Lausanne

· Invited lecturer at the LOT Summer School 2001, Nijmegen, Netherlands

· Invited lecturer at the International Graduate School for Neurosensory Science, Bad Zwischenahn, Germany 2001

· Invited lecturer at the NATO Workshop on Temporal Dynamics of Speech 2001, Il Ciocco, Italy

· Invited lecturer at the Speech Euromasters Summer School 2000, Chios, Greece

· Invited Keynote Talk, Advances in Neural Information Processing (NIPS2000), Denver, Colorado, December 2000

· Tutorial at the Fifth International Symposium on Signal Processing and its Applications, Brisbane, Australia, 1999

· Invited lecturer at the ESCA Workshop on Auditory Basis of Speech Perception 1996, Keele, U.K.

· Invited lecturer at the NATO Workshop on Robust Speech Recognition 1997, Il Ciocco, Italy

· Invited lecturer at the intensive course on speech processing 1998, KTH Stockholm, Sweden

· Invited lecturer at the Summer School "Speech Recognition and Neural Networks" 1998, Vietri-sul-Mare, Italy

Workshop and panel organization, invited panelist

· General Chair, 2013 IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, 2013

· Organizer, Johns Hopkins Summer Workshop on Speaker Recognition, June-July 2013

· Organizing and leading ONR sponsored Workshop on Temporal Dynamics in Speech and Hearing, Antwerp, Belgium 2007

· Organizing and leading a group at the 2007 JHU 6-week Summer Workshop on Human Language Technology

· Organizing and leading a panel on Handling Unexpected Acoustic Data in Machine Recognition of Speech, IEEE Workshop on Automatic Speech Recognition and Understanding, Kyoto, Japan 2007

· Organizing and leading a panel on Human-like Speech Processing, IEEE Workshop on Automatic Speech Recognition and Understanding, San Juan, Puerto Rico 2005

· Leading a panel on Industrial Applications of Speech Technology, Eurospeech 2001, Aalborg, Denmark

· Invited participant on the panel on Future of Speech Research, International Conference on Spoken Language Processing, Sydney, Australia 2000

· Invited senior scientist (seven times), 6-week JHU Summer Workshop on Human Language Technology

· Invited Faculty (six times), Telluride Summer Workshop on Neuromorphic Engineering

Publications

Articles in Refereed Journals

1. Hynek Hermansky, Jordan R. Cohen, Richard M. Stern, Perceptual Properties of Current Speech Recognition Technology, Invited Paper, Proceedings of IEEE, Vol. 101, No. 9, pp. 1968-1985, September 2013

2. Hynek Hermansky, Dealing with Unknown Unknowns: Multi-stream Recognition of Speech, Invited Paper, Proceedings of IEEE, Vol. 101, No. 5, pp. 1076-1088, May 2013

3. Sriram Ganapathy and Hynek Hermansky, "Temporal Resolution Analysis in Frequency Domain Linear Prediction", Express Letters section of the Journal of the Acoustical Society of America, September 2012

4. S. Garimella, S. Mallidi and H. Hermansky, "Regularized Auto-Associative Neural Networks for Speaker Verification", IEEE Signal Processing Letters, Dec 2012, pp. 841-844

5. D. Weinshall, A. Zweig, H. Hermansky, S. Kombrink, F. W. Ohl, J. Anemueller, J. H. Bach, L. Van Gool, F. Nater, T. Pajdla, M. Havlena and M. Pavel, Beyond Novelty Detection: Incongruent Events, when General and Specific Classifiers Disagree, IEEE Transactions On Pattern Analysis And Machine Intelligence, Vol. 34, No 10, pp. 1886-1901, October 2012

6. H. Hermansky, Speech recognition from spectral dynamics, Invited Paper, SADHANA, Indian Academy of Sciences, Vol. 36, Part 5, October 2011, pp. 729–744

7. N. Mesgarani, S. Thomas and H. Hermansky, Towards optimizing stream fusion in multistream recognition of speech, J. Acoust. Soc. Am. Volume 130, Issue 1, pp. EL14-EL18, 2011

8. G. Sivaram and H. Hermansky, Sparse Multilayer Perceptron for Phoneme Recognition, IEEE Transactions on Audio, Speech, And Language Processing, vol.20, no.1, pp.23-29, Jan. 2012

9. Sriram Ganapathy, Samuel Thomas, and Hynek Hermansky, "Temporal envelope compensation for robust phoneme recognition using modulation spectrum", the Journal of the Acoustical Society of America, 2011

10. G. Sivaram, S. Nemala, N. Mesgarani, H. Hermansky, “Data-driven and feedback based spectro-temporal features for speech recognition”, Signal Processing Letters, IEEE, volume 17, issue 11, 2010

11. J. Pinto, G. Sivaram, M. Magimai.-Doss, H. Hermansky, and H. Bourlard, Analyzing MLP Based Hierarchical Phoneme Posterior Probability Estimator, IEEE Transactions on Audio, Speech, and Language Processing, 2010

12. S. Ganapathy, P. Motlicek and H. Hermansky, Autoregressive Models of Amplitude Modulations in Audio Compression, IEEE Transactions on Audio, Speech, And Language Processing, 2010

13. P. Motlicek, S. Ganapathy, H. Hermansky and H. Garudadri, Wide-Band Audio Coding based on Frequency Domain Linear Prediction, in: EURASIP Journal on Audio Speech And Music Processing, Special Issue on, 2009

14. S. Ganapathy, S. Thomas, and H. Hermansky, "Modulation frequency features for phoneme recognition in noisy speech", the Journal of the Acoustical Society of America, Vol 125, No 1, pp EL8-EL12, 2009

15. W. Verhelst, J. Herre, G. Kubin, H. Hermansky, S. H. Jensen, Editorial, Special Issue on Anthropomorphic Processing of Audio and Speech, EURASIP Journal on Applied Signal Processing 2005:9, 1289–1291, 2005, Hindawi Publishing Corporation

16. S. Thomas, S. Ganapathy and H. Hermansky, "Recognition Of Reverberant Speech Using Frequency Domain Linear Prediction", IEEE Signal Processing Letters, 2008.

17. N. Morgan, Q. Zhu, A. Stolcke, K. Sonmez, S. Sivadas, T. Shinozaki, M. Ostendorf, P. Jain, H. Hermansky, D. Gelbart, D. Ellis, G. Doddington, B. Chen, O. Cetin, H. Bourlard and M. Athineos, “Pushing the Envelope – Aside: Beyond the Spectral Envelope as the Fundamental Representation for Speech Recognition,” Invited Paper in the IEEE Signal Processing Magazine, 2005.

18. H. Hermansky and N. Morgan, “Show What You Know: Musings on the Reporting of Negative Results in Speech Recognition Research,” Invited Editorial Note in Journal of Negative Results in Speech and Audio Sciences, 2004.

19. H. H. Yang, S. Sharma, S. van Vuuren, H. Hermansky, “Relevance of Time-Frequency Features for Phonetic and Speaker/Channel Classification,” Speech Communication, August 2000.

20. N. Malayath, H. Hermansky, S. Kajarekar and B. Yegnanarayana, “Data-Driven Temporal Filters and Alternatives to GMM in Speaker Verification,” in Digital Signal Processing, Vol. 10, pp 55-74, 2000.

21. N. Kanedera, T. Arai, H. Hermansky and M. Pavel, “On the Relative Importance of Various Components of the Modulation Spectrum of Speech,” Speech Communication 28 (1), pp. 43-56, May 1999.

22. B. Yegnanarayana, C. Avendano, H. Hermansky and P. S. Murthy, “Speech Enhancement Using Linear Prediction Residual,” Speech Communication 28 (1), pp. 25-42, May 1999.

23. T. Arai, M. Pavel, H. Hermansky and C. Avendano, Syllable Intelligibility for Temporally-Filtered LPC Cepstral Trajectories, Journal of the Acoustical Society of America, (105), 5, pp. 2783-2791, May 1999.

24. H. Hermansky, “Should recognizers have ears?” in Speech Communication, vol. 25, pp. 3-27, 1998.

25. H. Bourlard, H. Hermansky and N. Morgan, “Towards Increasing Speech Recognition Error Rates,” invited paper, Speech Communication, Vol 18 (4), May 1996.

26. H. Hermansky, “Robust Speech Recognition,” in Cole, Hirshman et al., “The Challenge of Spoken Language Systems, Research Directions for the Nineties,” IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 1, pp. 1-21, January 1995.

27. H. Hermansky and N. Morgan, RASTA Processing of Speech, IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 4, pp. 578-589, October 1994.

28. J. C. Junqua, H. Wakita and H. Hermansky, “Optimizing Perceptually Based ASR Front End,” IEEE Transactions on Speech and Audio Processing, Vol. 1, No. 1, pp. 39-49, January 1993.

29. H. Hermansky, “Perceptual Linear Predictive (PLP) Analysis of Speech,” Journal of the Acoustical Society of America, Vol. 87, 4, April 1990, pp. 1738-1752.

Book Chapters

1. Anemüller, J., Caputo, B., Hermansky, H., Ohl, F., Pajdla, T., Pavel, M., Van Gool, L., Vogels, R., Wabnik, S., Weinshall, D. "DIRAC: Detection and identification of rare audio-visual events." Studies in Computational Intelligence 384: 3-35, Anemüller, Weinshall, Van Gool, Eds., Springer 2012

2. F. Valente and H. Hermansky, “Data-driven extraction of spectral-dynamics based posteriors”, Handbook of Natural Language Processing and Machine Translation: DARPA Global Autonomous Language Exploitation, Olive, Christianson, McCary, Eds. Springer, March 2011.

3. H. Hermansky, “Data-driven extraction of temporal features from speech,” in Dynamics of Speech Production and Perception 5, P. Divenyi et al. (Eds.), IOS Press, 2006

4. H. Hermansky, “Speech and its processing,” in Language and Speech Engineering, M. Rajman (Ed.), EPFL Press, 2006.

5. C. Avendano, L. Deng, H. Hermansky and B. Gold, “Analysis and Representation of Speech,” in Speech Processing in the Auditory System, Greenberg and Ainsworth (Eds.), Springer 2004.

6. N. Morgan, H. Bourlard and H. Hermansky, “Automatic Speech Recognition: an Auditory Perspective,” in Speech Processing in the Auditory System, Greenberg and Ainsworth (Eds.), Springer 2004.

7. H. Hermansky and N. Morgan, “Automatic Speech Recognition,” in Encyclopedia of Cognitive Science, L. Nadel (Ed.), Nature Publishing Group, Macmillan Publishers, 2002.

8. H. Hermansky, “Modulation Spectrum in Speech Processing,” in Signal Analysis and Prediction, Prochazka, Uhlir, Rayner and Kingsbury (Eds.), Birkhauser Boston, 1998.

Peer Reviewed Papers in Conference Proceedings

1. Hynek Hermansky, Speech representation based on spectral dynamics, Proceedings of International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, Firenze, Italy, December 2013

2. Hynek Hermansky: Long, Deep and Wide Artificial Neural Nets for Dealing with Unexpected Noise in Machine Recognition of Speech, Proc. Text, Speech and Dialogue 2013, Springer 2013

3. Jeff Ma, Bing Zhang, Spyros Matsoukas, Sri Harish Mallidi, Feipeng Li, Hynek Hermansky, Improvements in Language Identification on the RATS Noisy Speech Corpus, Proc. INTERSPEECH 2013

4. Sri Harish Mallidi, Sriram Ganapathy, Hynek Hermansky, Robust Speaker Recognition Using Spectro-Temporal Autoregressive Models, Proc. INTERSPEECH 2013

5. Thomas Schatz, Vijayaditya Peddinti, Francis Bach, Aren Jansen, Hynek Hermansky, Emmanuel Dupoux, Evaluating speech features with the Minimal-Pair ABX task: Analysis of the classical MFC/PLP pipeline, Proc. INTERSPEECH 2013