Decoding phonetic cue weighting from electroencephalography responses to naturalistic speech
Poster Session D, Saturday, September 13, 5:00 - 6:30 pm, Field House
This poster is part of the Sandbox Series.
Chiung-Yu Chang1, Lisa D. Sanders1; 1University of Massachusetts Amherst
Although multiple acoustic cues influence listeners’ phonetic categorization, their relative impact is often unequal. Inappropriate perceptual weighting of acoustic cues can underlie second language learners’ difficulties with phonetic categorization. For example, past research has suggested that Japanese learners’ difficulty in discriminating American English /r/ and /l/ results from overweighting the second-formant (F2) onset and underweighting the third-formant (F3) onset (Iverson et al., 2003). Thus, assessing learners’ cue weighting is critical for developing more targeted and effective phonetic training. Current behavioral methods of estimating cue weighting, however, are subject to variability in post-perceptual processes. To circumvent overt responses, we propose a novel method based on representational similarity analysis of electroencephalography (EEG) data. The method first calculates the pairwise distances between EEG epochs time-locked to different phonetic categories. The resulting pattern of distances, i.e., the representational dissimilarity matrix (RDM), is taken as the representational geometry of the phonetic categories. Analogous RDMs are constructed for the acoustic cues under investigation. The next step is to relate the EEG-based and cue-based RDMs. Given the possibly non-linear mapping between neural responses and acoustic cues, the dissimilarity values in each RDM are submitted to rank-rank regression (Wilhelm & Morgen, 2023): the ranks of the EEG dissimilarities serve as the response variable, and the ranks of the acoustic dissimilarities as the predictors. The resulting regression coefficients are taken as the relative perceptual weights of the acoustic cues. We will test the proposed method on 19 native English listeners’ EEG responses to an audiobook version of “The Old Man and the Sea,” annotated with timestamps of phoneme boundaries (Di Liberto et al., 2023). To evaluate whether the method generalizes to acoustic cues with different temporal alignments, two phonemic contrasts will be examined: (1) /r/ versus /l/ and (2) prevocalic voiced versus voiceless stops (e.g., /b/ versus /p/). In both cases, the EEG epoch length and channel(s) will be optimized by nested cross-validation. For the contrast between /r/ and /l/, F2 and F3 values will be measured at phoneme onset and exemplify synchronous cues. The contrast between voiced and voiceless stops, on the other hand, involves two asynchronous cues: voice onset time (VOT), aligned with the stop onset, and fundamental frequency (F0), aligned with the following vowel. Based on previous behavioral studies of cue weighting for the /r/-/l/ contrast (Iverson et al., 2003), we predict that the regression coefficient for F3 onset will be larger than that for F2 onset. Similarly, the regression coefficient for VOT is expected to be larger than that for F0 (e.g., Whalen et al., 1990). We also speculate that the optimal EEG epoch length will be shorter for synchronous cues (F2 and F3 onsets for /r/-/l/) than for asynchronous cues (VOT and F0 for voiced-voiceless stops), because synchronous cues can be integrated immediately.
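To make the two core steps concrete, the sketch below runs RDM construction and rank-rank regression on synthetic data for the /r/-/l/ contrast. It is a minimal illustration, not the authors' pipeline: the variable names, data shapes, Euclidean distance metric, and use of scikit-learn for the regression are all assumptions, and the epoch-length and channel optimization via nested cross-validation described above is not implemented here.

```python
# Minimal sketch of the proposed RSA pipeline on synthetic data.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import rankdata
from sklearn.linear_model import LinearRegression

def rdm(features, metric="euclidean"):
    """Condensed RDM: pairwise distances between rows (one row per phoneme token)."""
    return pdist(features, metric=metric)

# Hypothetical inputs for the /r/-/l/ contrast (all values synthetic).
rng = np.random.default_rng(0)
n_tokens = 200
eeg_epochs = rng.standard_normal((n_tokens, 64 * 100))  # flattened channels x samples
f2_onset = rng.normal(1200.0, 150.0, (n_tokens, 1))     # F2 at phoneme onset (Hz)
f3_onset = rng.normal(2200.0, 300.0, (n_tokens, 1))     # F3 at phoneme onset (Hz)

# Step 1: one RDM for the EEG epochs and an analogous RDM per acoustic cue.
eeg_rdm = rdm(eeg_epochs)
f2_rdm = rdm(f2_onset)
f3_rdm = rdm(f3_onset)

# Step 2: rank-rank regression -- rank-transform all dissimilarities, then
# regress EEG ranks on cue ranks; the coefficients are read as the
# relative perceptual weights of the cues.
y = rankdata(eeg_rdm)
X = np.column_stack([rankdata(f2_rdm), rankdata(f3_rdm)])
weights = LinearRegression().fit(X, y).coef_
print({"F2 onset": weights[0], "F3 onset": weights[1]})
```

Rank-transforming both sides makes the fit sensitive only to the monotonic structure shared by the neural and acoustic dissimilarities, which is the rationale the abstract gives for preferring rank-rank regression over a linear fit on raw distances.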
Topic Areas: Speech Perception