Low-Rank Tensor Encoding Models Disentangle Natural Speech Comprehension Processes
Poster Session E, Sunday, September 14, 11:00 am - 12:30 pm, Field House
Lane Lewis1, Leila Wehbe, Xaq Pitkow; 1Carnegie Mellon University
Introduction: The recent emergence of LLM encoding models promises a new avenue to discover and characterize rich semantic information in the brain, yet interpretable methods for linking information in LLMs to language processing over time are limited. The non-interpretability of these methods, together with the fact that most experiments using them employ uncontrolled stimuli, has led to criticism of this approach as potentially reflecting low-level processing rather than high-level semantic or syntactic computations. In this work, we develop a low-rank tensor regression method that decomposes LLM encoding models into interpretable components of semantic features, time course, and brain-region activation, and we use this method to uncover components related to speech comprehension in a magnetoencephalography (MEG) dataset in which subjects listened to narrative stories. This approach lets us disentangle low-level and high-level language features and estimate their respective contributions to the recorded brain activity.

Methods: To investigate the effects of language over time, we use a time-delay encoding model with word embeddings from a large language model (Llama2-7b, 20-word context window) as inputs, along with low-level control variables such as word onset times and the audio spectrogram. This input structure, combined with an output prediction over channels, yields a natural 3-D regression weight tensor over time, language features, and MEG sensors. We decompose this tensor using a low-rank CP approximation, in which the weight tensor is a sum of rank-1 components, and fit it with stochastic gradient descent using an MSE reconstruction loss and a ridge penalty (see the first sketch below). To interpret the language drivers of each factor, we compute the maximally driving natural-language inputs for each embedding component. Instead of a generic beam search, which generates sequences with high linguistic probability, we introduce a procedure that samples the sequence in reverse order, starting with the most recent word, which contributes most to the MEG activity (see the second sketch below).

Results: Our method improves on a standard ridge-regression encoding model while using only a few factors, demonstrating that the low-dimensional constraint leads to a more accurate characterization of MEG brain responses. The model accounts for low-level aspects of the speech input known to affect MEG activity (word onsets, phrase boundaries, and audio signals) and disentangles them from factors that are smaller in magnitude but more complex (semantic attributes related to concepts such as pronouns, motion, and emotions). The model also reveals the time course and spatial locations of these factors.

Conclusions: We propose a novel low-rank tensor encoding model to extract and interpret complex semantic processing of language during naturalistic speech. The method outperforms ridge-regression encoding models while capturing more interpretable language computations, making it a powerful tool for exploratory language neuroscience. Crucially, our work addresses one of the main criticisms of naturalistic language experimentation, the correlation among stimulus variables, by separating confounding components from the high-level components of interest.
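First sketch: a minimal illustration of the low-rank CP tensor regression fit described in the Methods. The array shapes, the random placeholder data, the ridge strength, and the choice of PyTorch with plain SGD are assumptions made for illustration, not the authors' implementation; the point is how the 3-D weight tensor over features, time delays, and sensors is parameterized as a sum of rank-1 components and fit with an MSE loss plus a ridge penalty.

```python
import torch

torch.manual_seed(0)
n_feat, n_delay, n_sensor, rank = 4096, 10, 306, 8   # hypothetical sizes
n_words = 2000

# Stimulus tensor X: (words, features, delays); MEG responses Y: (words, sensors).
X = torch.randn(n_words, n_feat, n_delay)
Y = torch.randn(n_words, n_sensor)

# CP factors of the 3-D weight tensor W[f, d, s] = sum_r A[f, r] * B[d, r] * C[s, r]:
# A = feature loadings, B = time courses, C = sensor maps.
A = torch.nn.Parameter(0.01 * torch.randn(n_feat, rank))
B = torch.nn.Parameter(0.01 * torch.randn(n_delay, rank))
C = torch.nn.Parameter(0.01 * torch.randn(n_sensor, rank))

opt = torch.optim.SGD([A, B, C], lr=1e-3)
ridge = 1e-2  # hypothetical penalty strength

for step in range(1000):
    opt.zero_grad()
    # Predict without materializing W: contract features and delays per component,
    # Y_hat[n, s] = sum_r (sum_{f,d} X[n, f, d] A[f, r] B[d, r]) C[s, r].
    scores = torch.einsum('nfd,fr,dr->nr', X, A, B)
    Y_hat = scores @ C.T
    mse = ((Y_hat - Y) ** 2).mean()
    penalty = ridge * (A.pow(2).sum() + B.pow(2).sum() + C.pow(2).sum())
    loss = mse + penalty
    loss.backward()
    opt.step()
```

Each fitted component r can then be read out directly: A[:, r] gives its language-feature loading, B[:, r] its time course, and C[:, r] its sensor map.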
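Second sketch: one way the reverse-order search for maximally driving inputs could be organized. The toy embed() stand-in, the candidate vocabulary, and the temperature-scaled softmax sampling are assumptions introduced for illustration; in the actual method the scores would come from the Llama-2 contextual embeddings that feed the encoding model.

```python
import torch

def embed(words, dim=32):
    """Placeholder pseudo-random embedding of the most recent word in context.
    In the real procedure this would be the LLM's contextual embedding."""
    g = torch.Generator().manual_seed(abs(hash(tuple(words))) % (2**31))
    return torch.randn(dim, generator=g)

def sample_driving_sequence(a_r, vocab, context_len=20, temperature=0.1):
    """Build a word sequence back to front: the most recent word is chosen
    first, then earlier context words are prepended one at a time, each scored
    by how strongly the most recent word's contextual embedding projects onto
    the component's feature loading a_r."""
    seq = []
    for _ in range(context_len):
        scores = torch.stack([embed([w] + seq) @ a_r for w in vocab])
        probs = torch.softmax(scores / temperature, dim=0)
        w = vocab[torch.multinomial(probs, 1).item()]
        seq = [w] + seq  # prepend: earlier words are filled in later
    return seq

# Hypothetical usage with a tiny vocabulary and a random component loading.
vocab = ["she", "he", "ran", "said", "slowly", "house", "angry", "the"]
a_r = torch.randn(32)
print(" ".join(sample_driving_sequence(a_r, vocab, context_len=5)))
```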
Topic Areas: Speech Perception, Computational Approaches