Poster Presentation

Cortical temporal dynamics of tracking linguistic and paralinguistic elements

Poster Session A, Friday, September 12, 11:00 am - 12:30 pm, Field House
This poster is part of the Sandbox Series.

Benjamin Lang1, Jeffrey Xing1, Lauren Ostrowski1, Kate Urrutia1, Mingxiong Huang1,2, Timothy Gentner1; 1University of California, San Diego, 2VA San Diego Healthcare System

Emerging evidence suggests that temporal neurodynamics track a broad spectrum of linguistic features, including acoustic-phonetic, syntactic, lexico-semantic, and prosodic elements (Gnanateja et al., 2025; Mai et al., 2024; Gwilliams et al., 2024, 2022; Bhaya-Grossman and Chang, 2021; Law and Pylkkänen, 2021), across multiple, hierarchically organized timescales (Gwilliams et al., 2022, 2024). While the neurodynamics of linguistic processing are becoming increasingly well characterized, it remains unclear how the brain encodes and tracks paralinguistic elements such as speaker emotion. Paralinguistic information varies temporally with speech, relies on acoustic elements shared with linguistic content, such as pitch and voice quality (Ní Chasaide and Gobl, 2004), and provides the social and affective context that contributes to a speaker’s intended meaning (Guyer et al., 2021). We hypothesize that paralinguistic and linguistic information are jointly encoded within a unified hierarchical structure, with paralinguistic processing unfolding at similar timescales and in parallel with linguistic processing. We investigate this hypothesis with a naturalistic listening task using magnetoencephalography (MEG). Stimuli consist of 90 speech excerpts (30–90 seconds each) from the Buckeye corpus (Pitt et al., 2007), first selected on the basis of computational estimates of emotional valence diversity (Kounadis-Bastian et al., 2024) and subsequently evaluated by naïve human raters (N = 156), who rated speaker valence continuously over time from most positive to most negative. Raters also evaluated excerpts in which semantic content was degraded by low-pass filtering and excerpts in which prosodic information was abolished (raters read time-locked transcripts). Excerpt conditions were presented in a between-subjects design. Preliminary results indicate strong inter-rater reliability of the valence ratings (mean standardized Cronbach’s α ≈ 0.86). As expected, perception of speaker emotion was modulated by the availability of acoustic and lexical content: all excerpts show significant condition differences in average time-resolved valence ratings relative to condition-shuffled controls. If speaker emotion exhibits cortical tracking similar to that of linguistic elements, we predict that the continuous valence rating for each speech excerpt can be uniquely decoded from whole-head MEG sensor data independently of linguistic elements (King et al., 2020). We also expect that temporal generalization (King and Dehaene, 2014) of valence-rating decoding will show larger windows of significant generalization than decoding of linguistic elements. Full-speech valence ratings should remain significantly decodable when either semantically or prosodically degraded excerpts are presented, and decoding performance should depend on the strength of the valence ratings, their divergence across conditions, and the specific linguistic features present in the signal. Finally, we will examine the time-varying cortical structures underlying both linguistic and paralinguistic processing by using relevant feature vectors as signal covariates to constrain MEG source imaging (Huang et al., 2024), asking how linguistic and paralinguistic processing interact across spatio-temporal loci.
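
The inter-rater reliability figure reduces to a short computation over a rater-by-time rating matrix: standardized Cronbach's α is derived from the mean pairwise inter-rater correlation. The sketch below is illustrative only, assuming one excerpt's ratings in a NumPy array of shape (n_timepoints, n_raters); the function name and the simulated data are hypothetical, not part of the authors' pipeline.

import numpy as np

def standardized_cronbach_alpha(ratings):
    # ratings: (n_timepoints, n_raters) continuous valence ratings
    # for one excerpt, one column per rater.
    n_raters = ratings.shape[1]
    corr = np.corrcoef(ratings, rowvar=False)            # rater-by-rater correlations
    r_bar = corr[np.triu_indices(n_raters, k=1)].mean()  # mean inter-rater r
    # Standardized alpha from the mean inter-item correlation.
    return (n_raters * r_bar) / (1 + (n_raters - 1) * r_bar)

# Hypothetical example: a 60 s excerpt sampled at 10 Hz, 20 raters
# who all track a common valence signal plus independent noise.
rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 6 * np.pi, 600))[:, None]
ratings = signal + 0.5 * rng.standard_normal((600, 20))
print(f"standardized alpha = {standardized_cronbach_alpha(ratings):.2f}")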
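The temporal generalization analysis (King and Dehaene, 2014) has a standard implementation in MNE-Python: a decoder is fit at every training time point and scored at every testing time point, yielding a train-by-test generalization matrix. The sketch below uses simulated stand-in data and an illustrative ridge-regression decoder; the epoching, estimator, scoring, and cross-validation choices are assumptions, not the authors' exact analysis.

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge
from mne.decoding import GeneralizingEstimator, cross_val_multiscore

# Simulated stand-ins: X is epoched MEG sensor data with shape
# (n_epochs, n_sensors, n_times); y is one valence value per epoch.
rng = np.random.default_rng(0)
n_epochs, n_sensors, n_times = 100, 64, 60
y = rng.standard_normal(n_epochs)
X = rng.standard_normal((n_epochs, n_sensors, n_times))
X[:, 0, :] += y[:, None]  # inject a weak, decodable valence signal

# Ridge decoder fit independently at each training time point and
# scored at every testing time point (temporal generalization).
clf = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
time_gen = GeneralizingEstimator(clf, scoring="r2", n_jobs=1)

# Cross-validated generalization matrices:
# shape (n_folds, n_times_train, n_times_test).
scores = cross_val_multiscore(time_gen, X, y, cv=5, n_jobs=1)
print(scores.mean(axis=0).shape)  # (60, 60)

Broader off-diagonal structure in the resulting matrix would correspond to the wider windows of significant generalization predicted here for valence relative to linguistic elements.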

Topic Areas: Speech Perception, Prosody
