Pre-processing of speech in the brain
Poster Session A, Friday, September 12, 11:00 am - 12:30 pm, Field House
This poster is part of the Sandbox Series.
Galit Agmon1; 1Bar-Ilan University
Spontaneous speech presents unique challenges that are absent from the carefully controlled linguistic stimuli typically used in laboratory settings. Produced "on the fly," unscripted and unedited, spontaneous speech is marked by pauses, fillers, and incomplete syntactic structures. Traditionally, such features have been dismissed as performance errors and largely overlooked in the study of linguistic competence. Yet disfluencies are not rare exceptions; they are a pervasive characteristic of everyday language use, highlighting the dramatic gap between idealized models of linguistic competence and the realities of natural speech. Despite the apparent "messiness" of spontaneous speech, everyday auditory communication occurs with remarkable ease, raising a fundamental question in the neuroscience of language: how does the brain contend with disfluent input during real-time processing? In this "sandbox presentation," I propose the framework of "speech pre-processing," which suggests that auditory brain regions help organize disfluent input into a cleaner, more structured version, removing disfluencies and segmenting speech into sentences, before higher-level linguistic analysis takes place. I will present two projects: one that has been published (Agmon et al., 2023), and one that is an ongoing long-term project in my lab. In Agmon et al. (2023), we used EEG to analyze temporal response functions (TRFs) to spontaneous speech in Hebrew. Estimating separate TRFs for disfluent and fluent segments, we found reduced TRF responses to disfluencies, possibly reflecting early suppression of disfluent input within auditory regions. We also observed that sentence boundaries increased the latency of the speech TRF. A possible explanation for both effects is that disfluencies and sentence boundaries have a distinctive prosodic profile.
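For readers unfamiliar with the method, the TRF approach can be sketched as a lagged regression between a stimulus feature (e.g., the speech envelope) and the neural signal. The simulation below is a generic, minimal illustration with ridge regularization; all parameters, variable names, and the simulated data are invented for this sketch and are not the pipeline or settings of Agmon et al. (2023).

```python
# Minimal TRF sketch: recover a lagged linear filter mapping a stimulus
# envelope to a simulated neural signal via ridge regression.
# Everything here (sampling rate, lags, noise level) is illustrative.
import numpy as np

rng = np.random.default_rng(0)
fs = 100                      # sampling rate (Hz), assumed for the example
n = 2000                      # number of samples (~20 s)
lags = np.arange(30)          # lags 0-290 ms at 100 Hz

# Simulated speech envelope and a "true" TRF peaking at ~100 ms
envelope = rng.standard_normal(n)
true_trf = np.exp(-0.5 * ((lags - 10) / 2.0) ** 2)

# Neural signal = envelope filtered by the TRF, plus noise
neural = np.convolve(envelope, true_trf)[:n] + 0.5 * rng.standard_normal(n)

# Lagged design matrix: X[t, k] = envelope[t - k]
X = np.zeros((n, len(lags)))
for k in lags:
    X[k:, k] = envelope[: n - k]

# Ridge solution: trf_hat = (X'X + lam*I)^(-1) X'y
lam = 1e2
trf_hat = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ neural)

# Peak latency of the estimated TRF; should fall near 100 ms here
peak_lag_ms = int(np.argmax(trf_hat)) * 1000 // fs
```

In this framing, a "reduced TRF response to disfluencies" corresponds to smaller filter weights when the model is fit on disfluent segments only, and a latency shift at sentence boundaries corresponds to a later peak in the estimated filter.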
Indeed, prosodic cues such as pauses and lengthening can disambiguate syntactic structures or convey hesitation. However, it remains unclear whether the observed effects are purely a bottom-up response to low-level acoustic cues, or whether they reflect top-down direction of auditory attention guided by lexical and syntactic information. The second, ongoing project therefore aims to disentangle the roles of prosody and syntax in the processing of spontaneous speech, in order to better interpret the findings of Agmon et al. (2023). In my lab, we are currently annotating corpora of spontaneous speech, marking disfluencies and sentence boundaries to advance two lines of research. First, using fMRI, we aim to localize brain responses to disfluencies, predicting reduced activation in language-related regions, possibly already in auditory regions. Second, leveraging large language models (LLMs), we are training two separate models to predict sentence boundaries based on either lexico-syntactic input or prosodic input alone. By generating regressors from these predictions, we will disentangle the contributions of acoustic versus lexico-syntactic cues to speech segmentation. Finally, looking ahead, I will present evidence of critical gaps in current LLMs' handling of spontaneous speech, and advocate for the development of NLP tools that explicitly account for speech pre-processing during training, enabling more reliable modeling of real-world language use.
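The regressor-generation step can be illustrated with a toy sketch: per-word sentence-boundary probabilities from two hypothetical models (one text-only, one audio-only) are placed as impulses at word offsets, yielding time series that could be entered jointly into an encoding analysis. All numbers, names, and the sampling rate below are invented for the example, not taken from the project.

```python
# Toy sketch: turn per-word boundary probabilities from two hypothetical
# models into impulse regressors for an encoding/TRF analysis.
import numpy as np

fs = 100  # regressor sampling rate (Hz), assumed for the example

# Invented word offsets (s) and boundary probabilities from the two models
word_offsets = np.array([0.4, 0.9, 1.5, 2.1, 2.8])
p_lexical = np.array([0.05, 0.10, 0.92, 0.08, 0.95])   # text-only model
p_prosodic = np.array([0.10, 0.05, 0.60, 0.20, 0.90])  # audio-only model

def to_regressor(offsets, probs, duration, fs):
    """Place each probability as an impulse at the word's offset sample."""
    reg = np.zeros(int(duration * fs))
    idx = np.round(offsets * fs).astype(int)
    reg[idx] = probs
    return reg

reg_lex = to_regressor(word_offsets, p_lexical, 3.0, fs)
reg_pro = to_regressor(word_offsets, p_prosodic, 3.0, fs)
```

Entering both regressors in the same model lets the unique variance explained by each cue type be assessed: wherever the two models disagree about a boundary (as at the third word above), the neural data can adjudicate between acoustic and lexico-syntactic accounts.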
Topic Areas: Speech Perception, Prosody