Lunch at 12:30pm, talk at 1pm, in 148 Fitzpatrick

Title: Language Complexity and Speech Recognition Accuracy: Orthographic Complexity Hurts, Phonological Complexity Doesn’t

Abstract: We investigate which linguistic factors affect the performance of Automatic Speech Recognition (ASR) models. We hypothesize that orthographic and phonological complexity both degrade accuracy. To test this, we fine-tune the multilingual self-supervised pretrained model Wav2Vec2-XLSR-53 on 12 languages written in 10 writing systems. We then compare each language's ASR accuracy against its number of graphemes, unigram grapheme entropy, logographicity (how much word- or morpheme-level information is encoded in the writing system), and number of phonemes. The results show that high logographicity correlates with low ASR accuracy, while phonological complexity has no significant effect.

Bio: Chihiro Taguchi is a second-year Ph.D. student in the NLP group, advised by Dr. David Chiang. His research interests span the language sciences, in particular text-based and speech-based NLP as well as theoretical linguistics. He is currently working on the project “Language Documentation with an AI Helper” and is investigating how to effectively apply speech recognition technologies to low-resource languages.