Slide Session A
Friday, September 12, 1:45 - 2:30 pm, Elstad Auditorium
Talk 1: Seeing What’s Said: Divergent Effects of Visual Input on Acoustic and Semantic Speech Processing in Monolinguals and Bilinguals
Haoyin Xu1, Seana Coulson1; 1Department of Cognitive Science, University of California, San Diego
Successful speech comprehension often relies not only on auditory input but also on visual cues such as facial expressions and gestures. While the role of visual context in speech processing is well documented in monolinguals, its impact on bilingual listeners—especially in naturalistic settings—remains less understood. Given the increased cognitive demands and variability in L2 fluency among bilinguals, it is unclear whether visual input imposes additional cognitive load or facilitates comprehension by providing nonverbal cues such as gestures and mouth movements. In this study, we employed temporal response function (TRF) modeling to examine how visual context influences neural speech processing in monolingual and bilingual adults, focusing on both acoustic and semantic representations. Using EEG and TRF modeling, we analyzed participants' tracking of two key features of continuous speech: the amplitude envelope (acoustic tracking) and lexical surprisal (semantic tracking). EEG data were collected from 24 monolingual English speakers and 24 Mandarin-English bilinguals as they watched TED Talk excerpts presented in both audio-only (AO) and audiovisual (AV) formats. We applied backward modeling using the multivariate TRF (mTRF) toolbox to reconstruct the speech envelope and lexical surprisal from the EEG data. Decoding accuracy was interpreted as an index of how strongly each speech feature was neurally tracked under each condition. Our results revealed a clear dissociation in tracking performance across speaker groups. Bilingual participants showed significantly improved envelope tracking in the AV condition, suggesting that visual cues enhanced low-level auditory processing (Language experience [Bilingual] × Condition [AV]: estimate = 0.72, CI = [0.01, 1.43], p = 0.046). However, they did not exhibit improved surprisal tracking with visual input; instead, their semantic tracking was strongly predicted by English fluency (Language fluency [English]: estimate = 0.05, CI = [0.01, 0.08], p = 0.009). In contrast, monolingual participants demonstrated significantly enhanced surprisal tracking in the AV condition (Condition [AV]: estimate = 0.70, CI = [0.11, 1.30], p = 0.021), suggesting that visual input facilitated higher-level semantic processing, though they did not benefit in envelope tracking. Linear mixed-effects models confirmed these interactions between language experience and stimulus condition across both measures. These findings suggest that visual context modulates speech processing differently based on language background. For bilinguals, visual input may support early auditory processing, possibly as a compensatory mechanism for heightened perceptual demands in a second language. However, this benefit appears limited to acoustic features, potentially due to cognitive resource trade-offs that constrain semantic integration. Monolinguals, by contrast, seem to leverage visual input to enrich semantic predictions, highlighting the role of language proficiency in multimodal integration. Taken together, these findings demonstrate that language experience shapes distinct strategies for integrating visual information during speech comprehension. Building on this work, future research should further explore how the dynamics of audiovisual speech integration vary with language dominance and proficiency.
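To make the backward-modeling step concrete, the sketch below reconstructs a stimulus feature (e.g., the amplitude envelope or a lexical surprisal time series) from time-lagged EEG with ridge regression and scores decoding accuracy as the Pearson correlation between the reconstructed and actual feature. This is a minimal Python/NumPy illustration of the general approach rather than the mTRF toolbox pipeline used in the study; the lag range, regularization value, and variable names are illustrative assumptions.

```python
import numpy as np

def lag_matrix(eeg, lags):
    """Stack time-lagged copies of the EEG into a [time x (channels * lags)] design matrix."""
    n_times, n_chans = eeg.shape
    X = np.zeros((n_times, n_chans * len(lags)))
    for j, lag in enumerate(lags):
        shifted = np.roll(eeg, lag, axis=0)
        # zero out samples that wrapped around the array edges
        if lag > 0:
            shifted[:lag] = 0
        elif lag < 0:
            shifted[lag:] = 0
        X[:, j * n_chans:(j + 1) * n_chans] = shifted
    return X

def train_backward_trf(eeg, stim, lags, alpha=1e3):
    """Ridge-regression decoder mapping lagged EEG to a stimulus feature (backward model)."""
    X = lag_matrix(eeg, lags)
    XtX = X.T @ X + alpha * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ stim)

def decoding_accuracy(eeg, stim, weights, lags):
    """Pearson r between the reconstructed and the actual stimulus feature."""
    recon = lag_matrix(eeg, lags) @ weights
    return np.corrcoef(recon, stim)[0, 1]

# Example (hypothetical arrays): reconstruct the envelope from EEG sampled at 128 Hz,
# using EEG from 0 to 400 ms after each stimulus sample (negative lags shift EEG back).
# fs = 128
# lags = -np.arange(0, int(0.4 * fs))
# w = train_backward_trf(eeg_train, env_train, lags)
# r = decoding_accuracy(eeg_test, env_test, w, lags)
```

In practice, decoding accuracy of this kind would be estimated with cross-validation across trials or stimulus segments and compared across the AO and AV conditions.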
Talk 2: Seeing Speech in a New Light: An MEG Study on Augmenting Speech Performance using Rapid Invisible Frequency Tagging (RIFT)
Charlie Reynolds1, Yali Pan1, Ana Pesquita1, Ole Jensen2, Katrien Segaert1, Hyojin Park1; 1University of Birmingham, 2University of Oxford
In challenging listening environments, visual cues such as lip movements can enhance speech comprehension. Here, we hypothesise that external modulation of visual speech signals using non-invasive rhythmic stimulation can be harnessed to improve speech understanding. We tested this hypothesis directly with a novel paradigm based on Rapid Invisible Frequency Tagging (RIFT), a technique that modulates visual stimuli at specific frequencies below the threshold of conscious perception to influence neural processing. We used RIFT to manipulate visual speech signals and thereby influence the integration of visual and auditory information, measuring brain responses with magnetoencephalography (MEG) alongside speech comprehension performance. Forty participants viewed naturalistic speech videos under dichotic listening conditions: one ear was presented with speech that matched the visual speech information (task relevant) while the other was presented with speech that did not (task irrelevant). Both auditory speech streams were tagged at 40 Hz. The visual flicker (55 Hz) was applied to the region from which participants derive visual speech information, the speaker’s mouth, and was modulated by either the task-relevant or the task-irrelevant speech amplitude envelope. When modulated by relevant speech information, RIFT significantly enhanced performance on behavioural measures of speech comprehension. The MEG results showed significant effects of auditory and visual tagging in their respective sensory cortices across all experimental conditions. The visual tagging response was significantly stronger when the flicker amplitude was modulated by relevant speech, and this stronger tagging response predicted speech comprehension performance. These results suggest that modulating visual input with relevant auditory speech rhythms can increase the excitability of visual cortex, perhaps leading to enhanced crossmodal integration. Non-invasive sensory stimulation through RIFT may therefore serve as a promising tool for improving speech intelligibility in complex listening environments with multiple competing speakers, particularly for populations such as older adults, individuals with hearing impairments, or those with auditory processing disorders.
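The sketch below illustrates, under simplifying assumptions, how a RIFT-style tagging signal and its neural readout might be set up: a 55 Hz flicker whose modulation depth follows the speech amplitude envelope, with the tagging response quantified as coherence between an MEG sensor and the tagging signal at the tagging frequency. This is a schematic Python example, not the stimulation or analysis code used in the study; the modulation depth, filter settings, and coherence-based readout are illustrative choices.

```python
import numpy as np
from scipy.signal import butter, coherence, filtfilt, hilbert

def speech_envelope(audio, fs, cutoff=10.0):
    """Broadband amplitude envelope of the speech signal, low-pass filtered."""
    env = np.abs(hilbert(audio))
    b, a = butter(3, cutoff / (fs / 2), btype="low")
    return filtfilt(b, a, env)

def rift_luminance(env, fs, tag_freq=55.0, depth=0.5):
    """55 Hz flicker whose modulation depth follows the (normalised) speech envelope."""
    t = np.arange(len(env)) / fs
    env_norm = (env - env.min()) / (env.max() - env.min() + 1e-12)
    flicker = 0.5 + 0.5 * np.sin(2 * np.pi * tag_freq * t)    # flicker in [0, 1]
    return (1.0 - depth) + depth * env_norm * flicker          # luminance in [1 - depth, 1]

def tagging_response(meg_sensor, luminance, fs, tag_freq=55.0):
    """Coherence between an MEG sensor and the tagging signal at the tagging frequency."""
    f, cxy = coherence(meg_sensor, luminance, fs=fs, nperseg=int(4 * fs))
    return cxy[np.argmin(np.abs(f - tag_freq))]

# Example (hypothetical signals resampled to a common rate, e.g. 1000 Hz):
# env = speech_envelope(relevant_audio, fs=1000)
# lum = rift_luminance(env, fs=1000)
# resp = tagging_response(visual_cortex_sensor, lum, fs=1000)
```

Comparing such a tagging-response measure between relevant- and irrelevant-envelope modulation conditions, and relating it to comprehension scores, mirrors the logic of the analysis described above.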
Talk 3: Interbrain Synchrony as a Potential Biomarker of Communicative Success in Aphasia: Preliminary Evidence from fNIRS Hyperscanning
Grace Magee1, James Colwell1, Joseph Licata1, Shannon Kelley1, Berk Tellioglu2, David Boas1,3, Swathi Kiran2,4, Maria Varkanitsa2,4, Erin Meier5, Meryem Yucel1,3; 1Department of Biomedical Engineering, Boston University, Boston, MA, 2Center for Brain Recovery, Boston University, Boston, MA, 3Neurophotonics Center, Boston University, Boston, MA, 4Department of Speech, Language, and Hearing Sciences, Boston University, Boston, MA, 5Department of Communication Sciences and Disorders, Northeastern University, Boston, MA
Introduction
Communication is a two-way process, yet aphasia research has largely neglected the interaction dynamics between people with aphasia (PWA) and their communication partners. Hyperscanning research indicates that greater communicative success coincides with greater interbrain synchrony (IBS) in the prefrontal cortex (PFC) and temporoparietal junction (TPJ) in healthy dyads[1,2], and that IBS varies as a function of intersubject interactivity and familiarity/closeness[3]. To assess whether IBS is a viable biomarker of communicative success, we examined IBS across language tasks with different levels of mutual interaction and alignment, in dyads of healthy adults and in dyads including an individual with aphasia.
Methods
In Experiment 1, 32 young adults (16 dyads) completed four tasks that varied in the type of expected alignment and coupling: (1) Independent Reading of story passages (low/spurious coupling), (2) Joint Reading of story passages (entrainment-based coupling), (3) Joint Singing of familiar songs (entrainment-based coupling), and (4) Picture Guess, in which partners took turns asking questions and guessing the objects depicted on the other’s cards (semantic alignment-based coupling). Each task contained two 120-s blocks interleaved with 30-s rest periods. In Experiment 2, one individual with anomic aphasia completed the same tasks with a familiar, trained speech-language pathologist (dyad 1) and with a non-clinician stranger (dyad 2). fNIRS data were collected from 16 sources and 16 detectors per cap, arranged over bilateral inferior and middle frontal gyri (IFG/MFG), dorsolateral (dl) PFC, TPJ, and supramarginal gyrus (SMG). Standard preprocessing steps were followed in Homer3[4]. IBS was determined by computing wavelet coherence (R² values)[5] between homologous brain regions in each dyad.
Results
In Experiment 1, Joint Singing IBS was significantly greater than Joint Reading IBS in bilateral MFG (p<0.01) and TPJ (p<0.01), and greater than Picture Guess IBS in bilateral MFG (p<0.05), dlPFC (p<0.001), and TPJ (p<0.001). Bilateral SMG IBS was greater during Picture Guess than during Joint Reading (p<0.01). IBS in IFG did not vary by task (p>0.05). In Experiment 2, IBS increased from block 1 to block 2 across most tasks and regions for dyad 1 but decreased over time for dyad 2. Across the entire time series, IBS tended to be lower across regions for dyad 1 than for dyad 2 during Joint Singing, Joint Reading, and Picture Guess, but the opposite trend was observed for Independent Reading.
Conclusions
As expected, IBS between young adults was greatest in prefrontal regions and TPJ for Joint Singing, a task that stimulated entrainment (like Joint Reading) but also capitalized on stimulus familiarity and melody to promote alignment. The opposite trends of increasing versus decreasing IBS across interactive tasks for the PWA/clinician versus PWA/stranger dyads may be due to dyadic differences in leader/follower dynamics[6]. In dyad 1, the clinician acted as follower, modifying his speech rate and style to match the patient’s, with increasing behavioral alignment over time. In contrast, in dyad 2, the PWA acted as follower, trying (unsuccessfully) to keep up with the stranger (leader), who did not modify his speech rate or style, perhaps resulting in increasing disconnect over time. If replicated across additional dyads involving PWA, these findings have important implications for both aphasia care and two-person neuroscience.
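As an illustration of the IBS measure, the sketch below computes squared wavelet coherence (R² values) between two regional fNIRS time series using a from-scratch Morlet wavelet transform with time-domain smoothing only. It is a simplified Python stand-in for the Homer3 preprocessing and wavelet-coherence pipeline described above; the frequency range, smoothing window, and wavelet parameter are illustrative assumptions, and scale-domain smoothing is omitted for brevity.

```python
import numpy as np

def morlet_cwt(x, fs, freqs, w0=6.0):
    """Continuous wavelet transform with complex Morlet wavelets (computed via FFT)."""
    n = len(x)
    X = np.fft.fft(x)
    omega = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)
    W = np.empty((len(freqs), n), dtype=complex)
    for i, f in enumerate(freqs):
        s = w0 / (2 * np.pi * f)                 # scale for centre frequency f
        psi_hat = np.pi ** -0.25 * np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        W[i] = np.fft.ifft(X * psi_hat) * np.sqrt(s)
    return W

def smooth_time(a, win):
    """Moving-average smoothing along the time axis."""
    kernel = np.ones(win) / win
    return np.apply_along_axis(lambda row: np.convolve(row, kernel, mode="same"), 1, a)

def wavelet_coherence(x, y, fs, freqs, win):
    """Squared wavelet coherence R^2(f, t) between two regional time series."""
    Wx, Wy = morlet_cwt(x, fs, freqs), morlet_cwt(y, fs, freqs)
    Sxy = smooth_time(Wx * np.conj(Wy), win)
    Sxx = smooth_time(np.abs(Wx) ** 2, win)
    Syy = smooth_time(np.abs(Wy) ** 2, win)
    return np.abs(Sxy) ** 2 / (Sxx * Syy)

# Example (hypothetical HbO series from homologous regions of the two participants,
# sampled at ~5 Hz, over the low frequencies typical of fNIRS hyperscanning analyses):
# fs = 5.0
# freqs = np.linspace(0.02, 0.2, 20)
# R2 = wavelet_coherence(hbo_region_partnerA, hbo_region_partnerB, fs, freqs, win=int(30 * fs))
# task_ibs = R2.mean()   # e.g. averaged over a task block and frequency band
```

Averaging such R² maps within task blocks and comparing them across tasks and dyads corresponds to the contrasts reported in the Results.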