Active Use of Latent Tree-structured Sentence Representation in Humans and Large Language Models
Poster Session E, Sunday, September 14, 11:00 am - 12:30 pm, Field House
Wei Liu1, Ming Xiang2, Nai Ding1; 1Zhejiang University, 2The University of Chicago
Understanding how sentences are internally represented in the human brain, as well as in large language models (LLMs) such as ChatGPT, is a major challenge for cognitive science. Classic linguistic theories propose that the brain encodes a sentence as a tree-structured representation, e.g., built from hierarchically organized constituents. LLMs, in contrast, do not explicitly parse sentences into constituents, and their latent sentence representations remain poorly understood. Here, we develop a method that reveals, by analyzing rule-inference behavior, that humans and LLMs construct similar tree-structured latent representations of sentences. The method is based on a novel one-shot learning task in which participants learn to delete a word string from a sentence based on a single demonstration. In the demonstration we always delete a constituent, but with just one demonstration the underlying deletion rule is highly ambiguous: participants may infer the rule from, e.g., the ordinal position, semantic properties, or syntactic properties of individual words. Alternatively, if participants internally encode the sentence as a tree-structured representation, they may rely on that representation when inferring the rule, e.g., by considering whether the deleted word string is a constituent. We apply the method to 372 human participants and 5 recently developed LLMs, and find that both humans and LLMs tend to delete a constituent rather than a non-constituent word string, and that the inferred constituent-deletion rule differs between languages, i.e., Chinese and English. These phenomena are not explained by models that only have access to word properties and word positions. From the word-deletion behavior, we can reconstruct the latent tree structure of a sentence for both humans and LLMs, and the reconstructed constituency tree is partly consistent with the constituency tree derived from linguistic analyses. Altogether, these results strongly support the view that both humans and LLMs develop a tree-structured latent representation of sentences and actively use it in a language-inference task. The data-driven method proposed here can also be applied in the future to characterize the internal representation of sentences in individuals or in special populations.
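
To make the constituency check concrete, the following is a minimal, hypothetical Python sketch (not the authors' code): given a toy bracketed parse of a sentence, it tests whether an observed word-string deletion corresponds to a constituent span. The parse format, function names, and example sentence are illustrative assumptions only.

    # Sketch: test whether a deleted word span is a constituent of a toy parse.
    # A parse is a nested tuple, e.g. ("NP", "the", "old", "dog"); tree[0] is the label.

    def constituent_spans(tree, start=0):
        """Return (set of (start, end) word spans covered by constituents, next position)."""
        if isinstance(tree, str):              # a single word covers one position
            return {(start, start + 1)}, start + 1
        spans, pos = set(), start
        for child in tree[1:]:                 # recurse over the children
            child_spans, pos = constituent_spans(child, pos)
            spans |= child_spans
        spans.add((start, pos))                # the whole subtree is itself a constituent
        return spans, pos

    def is_constituent_deletion(tree, deleted_span):
        spans, _ = constituent_spans(tree)
        return deleted_span in spans

    # Toy example: "the old dog chased the cat"
    parse = ("S",
             ("NP", "the", "old", "dog"),
             ("VP", "chased", ("NP", "the", "cat")))

    print(is_constituent_deletion(parse, (0, 3)))   # "the old dog"    -> True
    print(is_constituent_deletion(parse, (2, 5)))   # "dog chased the" -> False

In this sketch, a deletion observed in the task would be scored as constituent or non-constituent against such a parse; aggregating these judgments over many sentences is one plausible way to relate deletion behavior to a latent tree structure.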
Topic Areas: Syntax and Combinatorial Semantics, Methods