First steps towards simulating aphasia in the EARSHOT model of human speech processing
Poster Session C, Saturday, September 13, 11:00 am - 12:30 pm, Field House
This poster is part of the Sandbox Series.
Ihintza Malharin1,2, M. Belén Saavedra2,5, Linkai Peng3, Simona Mancini1,4, James S. Magnuson1,3,4; 1BCBL. Basque Center on Cognition, Brain and Language; 2University of the Basque Country UPV/EHU; 3University of Connecticut, Storrs, CT, USA; 4Ikerbasque, Basque Foundation for Science, Bilbao, Spain; 5Université de Lorraine, France
EARSHOT is a model of human speech processing that learns to map real speech to semantic representations (Magnuson et al., 2020). We report first steps towards simulating aspects of aphasia with EARSHOT by damaging increasing proportions of randomly selected weights in different model layers. Although the model is purely receptive, we simulated naming/identification tasks by presenting spoken words to damaged models and measuring the cosine similarity of the model's output to the semantic vector of every word in the lexicon. We operationalized the model's naming/identification response as the word with the highest cosine similarity. For errors, we evaluated the phonetic and semantic similarity of the response to the target. We were interested in how robust EARSHOT would be to damage, and whether we might observe systematic patterns of phonetic vs. semantic deficits following damage to different model components.

Method: We trained the model on 1000 high-frequency English words spoken by one talker. The inputs were 256-channel spectral slices. The model's task was to activate the target word's SkipGram semantic vector at each frame. The model has 512 hidden nodes (long short-term memory units, or LSTMs) and 300 semantic output nodes. As each word was presented, we identified the word whose semantic vector reached the peak cosine similarity to the output at any time step and operationalized this as the model's response. The undamaged model reached 95% accuracy. To emulate damage, we randomly masked progressively larger proportions of connections in three different model layers. We tested damaged models by presenting all 1000 words and assessing both accuracy and error patterns, measuring the semantic (cosine) similarity and the phonemic similarity (1 minus the Levenshtein distance normalized by the length of the longer word in the pair) of responses and targets.

Results: At all levels of damage and at all three locations, we observed mainly mixed errors: items with fairly low phonetic similarity and modest semantic similarity to the target, along with fewer items with high phonetic and semantic similarity. Relatively pure phonetic errors occurred at all damage locations but were most prevalent for input-to-hidden damage. Relatively pure semantic errors were rarer but also occurred at all damage locations. As damage increased, the mixture of error types was fairly stable for input-to-hidden damage, but at the other locations, errors became phonetically less similar to targets.

Conclusions: We can consider the input-to-hidden transformation primarily an encoding step, in which EARSHOT learns to convert spectral inputs to a complex distributed code with significant phonetic structure (despite never being trained on phonetic targets; see Magnuson et al., 2020), and the hidden-to-output transformation primarily a decoding step, in which the intermediate hidden code is transformed into a semantic representation. We might expect the hidden-to-hidden layer to play a combined or intermediate role, but in fact it patterns with the hidden-to-output layer. This division of labor is apparent in the persistence of phonetic errors at higher levels of damage only for the input-to-hidden layer. Our next focus will be examining the degree to which these patterns relate to patient data.
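To make the response rule and the two error measures concrete, here is a minimal sketch (not the authors' code); the function names, array shapes, and the dict of SkipGram vectors (`lexicon_vectors`) are assumptions for exposition only.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(outputs, lexicon_vectors):
    """Operationalized naming/identification response: the lexicon word
    whose semantic vector reaches the highest cosine similarity to the
    model's output at any time frame.

    outputs: array of shape (n_frames, 300), one output vector per frame.
    lexicon_vectors: dict mapping each word to its 300-d SkipGram vector.
    """
    best_word, best_sim = None, -np.inf
    for word, vec in lexicon_vectors.items():
        sim = max(cosine_similarity(frame, vec) for frame in outputs)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word, best_sim

def levenshtein(a, b):
    """Edit distance between two phoneme (or letter) sequences."""
    m, n = len(a), len(b)
    d = np.zeros((m + 1, n + 1), dtype=int)
    d[:, 0] = np.arange(m + 1)
    d[0, :] = np.arange(n + 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1,         # deletion
                          d[i, j - 1] + 1,         # insertion
                          d[i - 1, j - 1] + cost)  # substitution
    return int(d[m, n])

def phonemic_similarity(a, b):
    """1 minus Levenshtein distance normalized by the longer word's length."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))
```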
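The damage manipulation amounts to zeroing a random subset of weights in one layer. A minimal numpy sketch, assuming the trained weight matrices can be extracted from and written back to the model (`lesion` and its arguments are hypothetical names, not EARSHOT's API):

```python
import numpy as np

def lesion(weights, proportion, seed=0):
    """Emulate damage by masking a randomly selected proportion of the
    connections in one weight matrix (e.g., input-to-hidden,
    hidden-to-hidden, or hidden-to-output).
    """
    rng = np.random.default_rng(seed)
    mask = rng.random(weights.shape) < proportion  # True = connection removed
    damaged = weights.copy()
    damaged[mask] = 0.0
    return damaged
```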
Topic Areas: Computational Approaches, Disorders: Acquired