Cochlear-scaled entropy predicts robust neural coding of speech envelopes
Poster Session D, Saturday, September 13, 5:00 - 6:30 pm, Field House
Baorian Nuchged¹, Christian Stilp², Keith Kluender³, Fernando Llanos¹; ¹The University of Texas at Austin, ²Marquette University, ³Purdue University
During processing of connected speech, such as sentences or spoken narratives, low-frequency cortical oscillations synchronize in phase with amplitude fluctuations in the broadband speech envelope. The degree of synchronization between brain and envelope oscillations is modulated by higher-order cognitive factors, such as attention and memory, independently of the acoustic signal. This suggests that real-time encoding of speech envelopes reflects more than mere sensory transduction of fluctuations in signal amplitude. Previous research has proposed that neural tracking of the speech envelope reflects the parsing of speech primitives, such as phonemes, syllables, or peaks in the envelope derivative (rapid envelope changes, e.g., Oganian & Chang, 2019). Here, we propose that neural coding of temporal speech patterns is better predicted by spectrally-local changes in relative energy within the temporal fine structure, operationalized by cochlear-scaled entropy (CSE). This metric reflects the fact that sensorineural systems respond predominantly to change. Notably, prior research (Stilp & Kluender, 2010) has shown that speech intervals with higher CSE are more critical for intelligibility than those with lower CSE. To evaluate our hypothesis, we analyzed EEG data (32 channels) from 15 native English speakers as they listened to 36 randomized repetitions of 30 sentences. Each sentence was segmented into non-overlapping time frames ranked from high to low CSE. First, we tested whether amplitude modulation is more robustly tracked during the presentation of high-CSE frames, low-CSE frames, or peak-derivative frames (time frames conveying larger positive envelope changes per unit time). Neural tracking was assessed by correlating the speech envelopes of high-CSE, low-CSE, or peak-derivative frames with the low-frequency EEG oscillations evoked during their presentation.
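The frame-ranking and tracking steps described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes CSE can be approximated by the Euclidean distance between the spectral profiles of successive time slices (the general idea in Stilp & Kluender, 2010), that per-band energies are already available with one envelope and one EEG sample per slice, and that neural tracking is a Pearson correlation. All function names and the toy data are hypothetical.

```python
import math

def slice_changes(spectra):
    """Euclidean distance between each spectral slice and the one before it.
    The first slice has no predecessor, so its change is set to 0."""
    return [0.0] + [math.dist(prev, cur) for prev, cur in zip(spectra, spectra[1:])]

def frame_scores(changes, frame_len):
    """Sum slice-to-slice change within non-overlapping frames (a CSE-like score)."""
    n_frames = len(changes) // frame_len
    return [sum(changes[f * frame_len:(f + 1) * frame_len]) for f in range(n_frames)]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def tracking(envelope, eeg, frame_ids, frame_len):
    """Correlate envelope and EEG over the samples that fall in the chosen frames."""
    idx = [i for f in frame_ids for i in range(f * frame_len, (f + 1) * frame_len)]
    return pearson([envelope[i] for i in idx], [eeg[i] for i in idx])

# Toy data (hypothetical): 8 slices x 2 bands, one envelope/EEG sample per slice.
spectra = [[1, 1], [1, 1], [1, 1], [4, 5], [4, 5], [4, 5], [4, 5], [1, 1]]
envelope = [0.1, 0.2, 0.3, 0.9, 0.8, 0.7, 0.6, 0.2]
eeg = [2 * e + 1 for e in envelope]  # stand-in for low-frequency EEG

frame_len = 2
scores = frame_scores(slice_changes(spectra), frame_len)
ranked = sorted(range(len(scores)), key=lambda f: -scores[f])  # high to low
high_frames = ranked[: len(ranked) // 2]
r_high = tracking(envelope, eeg, high_frames, frame_len)
```

In a real analysis the EEG would be band-limited and aligned to the stimulus, and the correlation would be computed per participant and sentence before entering a mixed-effects model; none of those steps are shown here.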
High-CSE frames elicited stronger neural tracking than both low-CSE and peak-derivative frames (linear mixed-effects: t-ratios > 17.24, ps < .001). Next, we examined the neural tracking of envelope signals reconstructed via compressed sensing from sparse temporal representations of the original sentence envelopes. Reconstructions were derived from small percentages (20%, 35%, and 50%) of high-CSE, low-CSE, peak-derivative, or randomly selected time points. We found that reconstructed envelopes based on high-CSE frames elicited significantly stronger neural tracking than those derived from the other conditions (linear mixed-effects: t-ratios > 15.73, ps < .001). Together, our findings demonstrate that spectrally-local changes in relative energy of speech can drive robust cortical encoding of speech envelopes. These results support a model of temporal sparse coding in which the brain selectively enhances the processing of speech segments that convey greater informational value. By prioritizing high-CSE segments, which are the most critical for intelligibility, this encoding strategy enables listeners to maximize neural encoding bandwidth while minimizing intelligibility loss.
References:
Stilp, C. E., & Kluender, K. R. (2010). Cochlea-scaled entropy, not consonants, vowels, or time, best predicts speech intelligibility. Proceedings of the National Academy of Sciences, 107(27), 12387-12392.
Oganian, Y., & Chang, E. F. (2019). A speech envelope landmark for syllable encoding in human superior temporal gyrus. Science Advances, 5(11).
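The compressed-sensing reconstruction mentioned in the abstract can be illustrated with a small sketch. The abstract does not state which sparsifying basis or solver was used, so everything here is an assumption made for illustration: the envelope is taken to be sparse in a discrete cosine (DCT) basis, and recovery uses orthogonal matching pursuit (OMP) rather than whatever method the authors applied. The function names and the toy signal are hypothetical, and the retained time points are drawn at random instead of from CSE-ranked frames.

```python
import numpy as np

def dct_basis(n):
    """Orthonormal inverse DCT-II basis; column k is a cosine of frequency index k."""
    t = np.arange(n)[:, None] + 0.5
    k = np.arange(n)[None, :]
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * t * k / n)
    basis[:, 0] = np.sqrt(1.0 / n)
    return basis

def omp_reconstruct(samples, sample_idx, n, n_atoms):
    """Estimate a length-n signal from its values at sample_idx using OMP.
    Assumption: the signal is a combination of at most n_atoms DCT atoms."""
    basis = dct_basis(n)
    sensing = basis[sample_idx, :]  # basis rows at the retained time points
    norms = np.maximum(np.linalg.norm(sensing, axis=0), 1e-12)
    samples = np.asarray(samples, dtype=float)
    residual = samples.copy()
    support, coef = [], np.zeros(0)
    for _ in range(n_atoms):
        corr = np.abs(sensing.T @ residual) / norms
        if support:
            corr[support] = 0.0  # never select the same atom twice
        support.append(int(np.argmax(corr)))
        # Least-squares fit of the samples on the atoms selected so far.
        coef, *_ = np.linalg.lstsq(sensing[:, support], samples, rcond=None)
        residual = samples - sensing[:, support] @ coef
    full = np.zeros(n)
    full[support] = coef
    return basis @ full

# Toy envelope (hypothetical): exactly 3 active DCT components.
n = 64
true_coef = np.zeros(n)
true_coef[[2, 5, 9]] = [3.0, -2.0, 1.0]
envelope = dct_basis(n) @ true_coef

# Keep 50% of the time points, chosen at random here for simplicity.
rng = np.random.default_rng(0)
keep = np.sort(rng.choice(n, size=n // 2, replace=False))
recon_half = omp_reconstruct(envelope[keep], keep, n, n_atoms=3)

# Sanity check with every time point retained.
recon_full = omp_reconstruct(envelope, np.arange(n), n, n_atoms=3)
```

To mirror the comparison in the abstract, `keep` would instead hold the time points belonging to high-CSE, low-CSE, or peak-derivative frames, and each reconstruction would then be correlated with the EEG.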
Topic Areas: Speech Perception, Computational Approaches