Poster Presentation


When smaller language models align better with the brain: Is the scaling law universal?

Poster Session A, Friday, September 12, 11:00 am - 12:30 pm, Field House

Shaoyun Yu1, Ping Li1; 1The Hong Kong Polytechnic University

Recent studies suggest that bigger language models appear to mirror our linguistic brains more closely. The success of large language models (LLMs) has been driven in large part by the scaling law, which states that models' language capabilities increase with model size (Kaplan et al., 2020). This scaling law has been generalized to the domain of model-brain alignment: several studies report that larger LLMs, often with billions of parameters, consistently produce better predictions of neural activity during language comprehension (Antonello et al., 2023; Hong et al., 2024). At the same time, AI researchers have shown growing interest in small language models (SLMs), motivated by considerations of computational and resource efficiency (Nguyen et al., 2024). Given that the human brain is a remarkably energy-efficient system compared to LLMs (Mehonic & Kenyon, 2022), an open question arises: can smaller, more efficient, and human-learning-based models generate brain-like representations that rival those of their larger counterparts? The present study systematically assesses the impact of model size by leveraging ALBERT, a family of open-source SLMs designed for efficiency (Lan et al., 2020). ALBERT includes four variants, base, large, xlarge, and xxlarge, with parameter counts ranging from 12M to 235M. These relatively small models exhibit progressively higher language performance, comparable to or even better than that of heavier models such as BERT-large (334M parameters). In a previous model-brain alignment study, the ALBERT base model surprisingly ranked among the top models (behind only the larger GPT-2 models) in predicting ECoG recordings from five participants (Schrimpf et al., 2021). To further investigate how ALBERT's size relates to its neural prediction performance, we utilized Reading Brain, a large-scale fMRI and eye-tracking dataset of naturalistic reading collected from fifty-two native English speakers (Li et al., 2022). For each ALBERT variant, we trained encoding models to predict the participants' fMRI signals from model embeddings. Model-brain alignment was evaluated as the correlation between the predicted and actual fMRI time series. The results indicate a positive answer to our research question. Although model size increases from 12M (base) to 18M (large), 60M (xlarge), and 235M (xxlarge) parameters, the corresponding brain alignment scores followed a negative trend, with average correlations of .085, .086, .080, and .082 for the whole cortex, and .092, .091, .084, and .087 within the language network (Fedorenko et al., 2024). Particularly striking is the finding that, despite having nearly 20 times fewer parameters, the base model significantly outperformed the xxlarge model in brain alignment (p = .042, whole cortex; p = .024, language network). The case of ALBERT suggests that efficient models with as few as 12M parameters can not only perform well linguistically (Lan et al., 2020) but also achieve decent brain alignment. Our findings imply that the scaling law may not be universal when it comes to model-brain alignment. In line with recent studies (Yu et al., 2024; Aw & Toneva, 2023), we suggest that model size is not the only path toward brain-like representations, and that other factors, including model efficiency and human-learning-like mechanisms, will be crucial.
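
The encoding-model analysis described above (predicting each participant's fMRI signal from ALBERT embeddings and scoring alignment as the correlation between predicted and actual time series) could look roughly like the sketch below. This is only an illustrative sketch under stated assumptions: the abstract does not specify the regression method (ridge regression is assumed here), the cross-validation scheme, or the data dimensions, and the synthetic arrays stand in for real embeddings and preprocessed BOLD data.

```python
# Minimal sketch of a voxel-wise encoding model, assuming ridge regression
# and synthetic placeholder data; the actual preprocessing, hemodynamic
# modeling, and regression choices are not specified in the abstract.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Placeholders: in a real analysis these would be ALBERT embeddings aligned
# to fMRI acquisition times (e.g., averaged per TR and convolved with an HRF)
# and the preprocessed BOLD time series for each participant.
n_trs, n_features, n_voxels = 300, 768, 1000   # hypothetical dimensions
X = rng.standard_normal((n_trs, n_features))   # stimulus embeddings per TR
Y = rng.standard_normal((n_trs, n_voxels))     # BOLD signal per voxel

def encoding_alignment(X, Y, n_splits=5):
    """Cross-validated model-brain alignment: mean Pearson correlation
    between predicted and actual fMRI time series, averaged over voxels."""
    scores = np.zeros(Y.shape[1])
    kf = KFold(n_splits=n_splits, shuffle=False)  # keep temporal order intact
    for train, test in kf.split(X):
        model = RidgeCV(alphas=np.logspace(-2, 4, 7))
        model.fit(X[train], Y[train])
        pred = model.predict(X[test])
        # Pearson correlation per voxel for this fold (z-score product form)
        pred_z = (pred - pred.mean(0)) / pred.std(0)
        true_z = (Y[test] - Y[test].mean(0)) / Y[test].std(0)
        scores += (pred_z * true_z).mean(0) / n_splits
    return scores.mean()

print(f"mean model-brain alignment: {encoding_alignment(X, Y):.3f}")
```

In this sketch, alignment for a model variant is the fold-averaged correlation per voxel, then averaged across voxels; restricting the voxel set would give region-specific scores such as the whole-cortex versus language-network comparison reported above.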

Topic Areas: Computational Approaches, Reading
