A multi-resolution envelope-power based model for speech intelligibility

Søren Jørgensen; Stephan D Ewert; Torsten Dau

doi:10.1121/1.4807563

J Acoust Soc Am. 2013 Jul;134(1):436-46. doi: 10.1121/1.4807563.

Authors

Søren Jørgensen¹, Stephan D Ewert, Torsten Dau

Affiliation

¹ Centre for Applied Hearing Research, Department of Electrical Engineering, Technical University of Denmark, DK-2800 Kgs. Lyngby, Denmark. sjor@elektro.dtu.dk

PMID: 23862819

Abstract

The speech-based envelope power spectrum model (sEPSM) presented by Jørgensen and Dau [(2011). J. Acoust. Soc. Am. 130, 1475-1487] estimates the envelope power signal-to-noise ratio (SNRenv) after modulation-frequency selective processing. Changes in this metric were shown to account well for changes of speech intelligibility for normal-hearing listeners in conditions with additive stationary noise, reverberation, and nonlinear processing with spectral subtraction. In the latter condition, the standardized speech transmission index [(2003). IEC 60268-16] fails. However, the sEPSM is limited to conditions with stationary interferers, due to the long-term integration of the envelope power, and cannot account for increased intelligibility typically obtained with fluctuating maskers. Here, a multi-resolution version of the sEPSM is presented where the SNRenv is estimated in temporal segments with a modulation-filter dependent duration. The multi-resolution sEPSM is demonstrated to account for intelligibility obtained in conditions with stationary and fluctuating interferers, and noisy speech distorted by reverberation or spectral subtraction. The results support the hypothesis that the SNRenv is a powerful objective metric for speech intelligibility prediction.
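The segment-wise estimate described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `segment_snr_env`, the choice of segment duration as the inverse of the modulation-filter centre frequency, the use of the variance as envelope power, and the `floor` value are all assumptions made here for the example. The abstract states only that the segment duration depends on the modulation filter. The inputs are assumed to be the outputs of one modulation filter for the noisy speech and for the noise alone.

```python
import numpy as np

def segment_snr_env(env_mix, env_noise, fs, fc_mod, floor=1e-3):
    """Per-segment envelope-power SNR for one modulation-filter output.

    env_mix   : modulation-filtered envelope of noisy speech (hypothetical input)
    env_noise : modulation-filtered envelope of the noise alone (hypothetical input)
    fs        : sampling rate of the envelopes in Hz
    fc_mod    : centre frequency of the modulation filter in Hz
    floor     : assumed lower limit on envelope power, avoids division by zero
    """
    # Assumption: segment duration equals 1 / fc_mod.
    seg_len = max(1, int(round(fs / fc_mod)))
    n_seg = len(env_mix) // seg_len
    snrs = np.empty(n_seg)
    for i in range(n_seg):
        sl = slice(i * seg_len, (i + 1) * seg_len)
        # Envelope power taken as the variance (AC power) within the segment.
        p_mix = np.var(env_mix[sl])
        p_noise = max(np.var(env_noise[sl]), floor)
        # Speech envelope power is estimated as the excess over the noise.
        snrs[i] = max(p_mix - p_noise, floor) / p_noise
    return snrs

# Toy usage with synthetic 4 Hz envelope fluctuations (illustrative data only).
fs = 1000
t = np.arange(2000) / fs
env_noise = 0.1 * np.sin(2 * np.pi * 4 * t)
env_mix = env_noise + 0.5 * np.sin(2 * np.pi * 4 * t + 0.3)
snrs = segment_snr_env(env_mix, env_noise, fs, fc_mod=4.0)
```

With a fluctuating masker, segments that fall into masker dips would yield a high per-segment value, which a single long-term estimate would average away. How the per-segment values are combined across segments, modulation filters, and audio channels is not specified in the abstract and is left out of this sketch.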

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Acoustic Stimulation
Adult
Humans
Male
Models, Theoretical
Nonlinear Dynamics
Perceptual Masking*
Psychoacoustics
Social Environment
Sound Spectrography*
Speech Acoustics*
Speech Perception*