Poster Presentation

MEG–fMRI Integration in Naturalistic Speech Comprehension for High Spatial and Temporal Resolution

Poster Session E, Sunday, September 14, 11:00 am - 12:30 pm, Field House
This poster is part of the Sandbox Series.

Beige Jerry Jin¹, Leila Wehbe¹; ¹Carnegie Mellon University

Speech comprehension is a complex process that recruits multiple subprocesses, including lexical access, syntactic parsing, and semantic integration, which unfold on the order of milliseconds yet are distributed across cortical regions. The neuroimaging techniques traditionally used to study it trade one resolution for the other: MEG affords high temporal resolution, whereas fMRI provides high spatial resolution, and a high-fidelity portrait that preserves both remains elusive. We present a transformer-based encoding framework that combines MEG and fMRI within a naturalistic language experiment to estimate latent brain source responses with high spatiotemporal resolution. We collected whole-head MEG while participants passively listened to more than 10 hours of "Moth Radio Hour" stories. A participant-specific 3 T fMRI scan was also obtained for repeats of one anchor story. To obtain fMRI responses for the remaining stories, we projected a public dataset collected on identical stimuli (LeBel et al.) onto each participant's cortical surface via surface-based alignment and vertex-wise ridge regression. We built a transformer-based encoding model whose output represents a fixed set of cortical sources. A sliding-window self-attention mechanism lets the model leverage the preceding 10 s of stimulus features, including semantic embeddings, phonemes, and acoustic spectra. Predicted source activations are (i) mapped to MEG sensors through each subject's lead-field matrix and (ii) convolved with a canonical hemodynamic response to yield fMRI predictions. The network is trained end-to-end with a joint MEG and fMRI loss. On held-out stories, the model performs on par with single-modality baselines while simultaneously offering high spatial and temporal resolution. Simulation experiments show that it recovers source locations more accurately than classical minimum-norm estimates, and when trained on real data it learns time-locked source patterns following word onset that are a function of the word's contextualized features. By combining the power of naturalistic experiments, MEG, and fMRI with a transformer-based encoding model, we propose a practical route to millisecond-and-millimeter brain mapping. This framework opens new avenues for non-invasively probing the dynamics of language and cognition without sacrificing either spatial or temporal fidelity, and it will be released as a package along with the MEG dataset.
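To make the projection step concrete, here is a minimal NumPy sketch of the vertex-wise ridge mapping, assuming the public dataset's anchor-story responses have already been resampled onto the participant's surface; the array names, shapes, and the single shared regularization strength alpha are illustrative assumptions, not the authors' exact pipeline.

import numpy as np

def fit_vertexwise_ridge(X_anchor, Y_anchor, alpha=1.0):
    # X_anchor: (T, V_pub) public-dataset anchor-story responses, already
    #           resampled onto the participant's surface (assumed input).
    # Y_anchor: (T, V_sub) participant's own anchor-story fMRI responses.
    # Returns W: (V_pub, V_sub); each column is an independent ridge
    # solution for one target vertex, sharing the same design matrix.
    gram = X_anchor.T @ X_anchor + alpha * np.eye(X_anchor.shape[1])
    return np.linalg.solve(gram, X_anchor.T @ Y_anchor)

# Hypothetical usage for a story the participant never saw in the scanner:
# Y_hat = X_story @ W   # (T_story, V_sub) projected fMRI responses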
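The sliding-window self-attention can be sketched as a causal band mask that lets each stimulus frame attend only to itself and the preceding window frames (10 s worth at the feature rate). The single-layer PyTorch module below stands in for the full transformer; the dimensions, head count, and layer structure are assumptions for illustration.

import torch
import torch.nn as nn

class WindowedEncoder(nn.Module):
    # Minimal stand-in: one attention layer over a causal sliding window.
    def __init__(self, d_feat, d_model, n_sources, window, n_heads=8):
        super().__init__()
        self.window = window
        self.proj = nn.Linear(d_feat, d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.head = nn.Linear(d_model, n_sources)

    def forward(self, feats):
        # feats: (batch, time, d_feat) concatenated semantic, phoneme,
        # and acoustic-spectral features at a common frame rate.
        x = self.proj(feats)
        t = x.size(1)
        idx = torch.arange(t, device=feats.device)
        dist = idx[None, :] - idx[:, None]          # key index minus query index
        # True entries are disallowed: no future frames, no frames older
        # than `window` steps before the current one.
        mask = (dist > 0) | (dist < -self.window)
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return self.head(out)                       # (batch, time, n_sources)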
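The two forward maps and the joint objective might look like the following sketch, again in PyTorch; the canonical HRF kernel hrf (sampled at the source rate), the TR downsampling stride, and the loss weight lam are hypothetical placeholders rather than the reported configuration.

import torch
import torch.nn.functional as F

def meg_forward(sources, leadfield):
    # sources: (batch, time, n_src); leadfield: (n_sensors, n_src)
    # Instantaneous linear mixing of sources into MEG sensor space.
    return torch.einsum('bts,ms->btm', sources, leadfield)

def fmri_forward(sources, hrf, tr_stride):
    # Convolve each source with the HRF, then downsample to the TR grid.
    b, t, s = sources.shape
    x = sources.permute(0, 2, 1)                    # (batch, n_src, time)
    # conv1d computes cross-correlation, so flip the kernel for convolution.
    k = hrf.flip(0).view(1, 1, -1).repeat(s, 1, 1)  # one kernel per source
    y = F.conv1d(x, k, padding=hrf.numel() - 1, groups=s)[..., :t]
    return y[..., ::tr_stride].permute(0, 2, 1)     # (batch, n_TRs, n_src)

def joint_loss(sources, leadfield, hrf, meg_true, fmri_true, tr_stride, lam=1.0):
    # Weighted sum of MEG and fMRI reconstruction errors (lam is assumed).
    meg_err = F.mse_loss(meg_forward(sources, leadfield), meg_true)
    fmri_err = F.mse_loss(fmri_forward(sources, hrf, tr_stride), fmri_true)
    return meg_err + lam * fmri_err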

Topic Areas: Computational Approaches, Speech Perception
