Neural tracking of semantic and rhythmic information under temporal compression
Poster Session C, Saturday, September 13, 11:00 am - 12:30 pm, Field House
Zhengwu Ma¹, Qimiao Gao, Nan Wang, Jixing Li; ¹City University of Hong Kong
Introduction. Understanding how the human brain processes time-compressed auditory input is fundamental to revealing the limits and flexibility of temporal encoding across different cognitive domains. While both speech and music rely on temporally structured sequences—semantic and syntactic in speech, rhythmic and harmonic in music—the extent to which the brain can preserve and integrate these representations under temporal compression remains unclear. Recent advances in large-scale language and music models offer a novel opportunity to probe the encoding of semantic and rhythmic information in the brain. By aligning neural signals with model-derived embeddings, it is now possible to quantify how representational fidelity changes under different listening conditions. In this study, we leverage state-of-the-art neural language and music models to extract high-dimensional representations of words and music beats from AI-synthesized Mandarin songs. We then examine how these embeddings are tracked by EEG responses under varying levels of temporal compression. This approach enables a fine-grained comparison of the processing limits for speech and music, providing new insights into the neural basis of auditory comprehension under time pressure.
Methods. We recorded 256-channel EEG from 36 right-handed native Mandarin speakers (19 females, mean age = 24.8 years, SD = 6.9), all undergraduate or graduate students in Shanghai with no self-reported neurological disorders. Participants passively listened to a 2-minute AI-synthesized Mandarin song presented at four playback speeds (4×, 3×, 2×, and 1×, in that order). For each word and music beat in the stimuli, we extracted semantic and rhythmic embeddings using a large language model (LLaMA3-8B; Grattafiori et al., 2024) and a large music model (MusicLM; Agostinelli et al., 2023), respectively. Source-localized EEG epochs were then extracted from –100 ms to 500 ms relative to the offset of each word and beat in each speed condition. For each subject, we regressed word and beat embeddings against EEG signals sampled every 10 ms throughout the epoch, across all playback speeds. The regression was performed for each source within a bilateral language network mask. To assess the impact of temporal compression on speech and music processing, we compared the coefficient of determination (R²) at the faster speeds (2×, 3×, and 4×) against the baseline (1×) condition. Significant spatiotemporal clusters for the group-level contrasts between speed conditions were identified using a cluster-based permutation t-test (Maris & Oostenveld, 2007) with 10,000 permutations.
Results. For speech processing, we observed significantly reduced activity in the left anterior temporal lobe (LATL) at 2×, 3×, and 4× playback speeds compared to the original (1×) speed. A similar pattern was found for music processing at 2× and 3× speeds, with decreased LATL activity. However, at 4× speed, music processing elicited a distinct significant cluster in the superior temporal regions relative to the 1× condition. These findings suggest reduced semantic comprehension and integrative processing for both speech and music at moderately compressed rates (2× and 3×). At the highest compression (4×), music processing appears to engage more low-level auditory regions, potentially reflecting diminished processing of acoustic detail.
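The following sketches illustrate, under stated assumptions, how the main analysis steps could be implemented; they are not the authors' exact pipeline. The first shows one way to obtain per-word semantic embeddings from a Llama-3-8B checkpoint via Hugging Face Transformers. The model ID, layer choice, and mean-pooling of sub-word tokens into word embeddings are illustrative assumptions (the MusicLM beat embeddings would require a separate, model-specific procedure).

```python
# Hedged sketch: per-word semantic embeddings from a Llama-3-8B checkpoint.
# Model ID, layer, and sub-word pooling are assumptions, not the authors' code.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "meta-llama/Meta-Llama-3-8B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
model.eval()

def word_embeddings(words, layer=-1):
    """Return one embedding per word by mean-pooling its sub-word tokens."""
    text = "".join(words)  # Mandarin lyrics: no spaces between words
    enc = tokenizer(text, return_tensors="pt", return_offsets_mapping=True)
    with torch.no_grad():
        hidden = model(
            input_ids=enc["input_ids"], output_hidden_states=True
        ).hidden_states[layer][0]              # (n_tokens, hidden_dim)
    # Map each sub-word token back to its word via character offsets.
    bounds, start = [], 0
    for w in words:
        bounds.append((start, start + len(w)))
        start += len(w)
    embs = []
    for lo, hi in bounds:
        idx = [i for i, (a, b) in enumerate(enc["offset_mapping"][0].tolist())
               if a < hi and b > lo]           # tokens overlapping this word
        embs.append(hidden[idx].mean(dim=0))
    return torch.stack(embs)                   # (n_words, hidden_dim)
```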
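The next sketch corresponds to the time-resolved encoding analysis described in Methods: regressing embeddings against source-localized EEG at each 10-ms sample and source, and summarizing fit with R². The array names, the choice of ridge regression, and the cross-validated scoring are assumptions for illustration.

```python
# Hedged sketch: time-resolved regression of embeddings against EEG sources.
# `embeddings`: (n_events, n_features); `stc_epochs`: (n_events, n_sources, n_times),
# spanning -100..500 ms at a 10-ms step. Ridge + cross-validated R^2 are assumptions.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

def encoding_r2(embeddings: np.ndarray, stc_epochs: np.ndarray) -> np.ndarray:
    """Return an (n_sources, n_times) map of cross-validated R^2 scores."""
    n_events, n_sources, n_times = stc_epochs.shape
    r2 = np.zeros((n_sources, n_times))
    for s in range(n_sources):
        for t in range(n_times):
            y = stc_epochs[:, s, t]            # EEG amplitude at this source/latency
            scores = cross_val_score(
                Ridge(alpha=1.0), embeddings, y, scoring="r2", cv=5,
            )
            r2[s, t] = scores.mean()           # average R^2 across folds
    return r2
```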
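Finally, a minimal sketch of the group-level contrast between a compressed speed and the 1× baseline using a cluster-based permutation one-sample t-test (Maris & Oostenveld, 2007), as implemented in MNE-Python. The input arrays, source space object, and parameter settings are assumptions.

```python
# Hedged sketch: cluster-based permutation test of R^2 differences across speeds.
# `r2_fast`, `r2_base`: (n_subjects, n_times, n_sources) R^2 maps; `src`: source space.
import mne
from mne.stats import spatio_temporal_cluster_1samp_test

def speed_contrast(r2_fast, r2_base, src, n_permutations=10000):
    X = r2_fast - r2_base                       # paired difference per subject
    adjacency = mne.spatial_src_adjacency(src)  # spatial neighborhood of sources
    t_obs, clusters, cluster_pv, _ = spatio_temporal_cluster_1samp_test(
        X, adjacency=adjacency, n_permutations=n_permutations, tail=0,
    )
    return t_obs, clusters, cluster_pv          # clusters with p < .05 are reported
```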
Topic Areas: Computational Approaches, Speech Perception