Lunch at 12:30pm, talk at 1pm, in 148 Fitzpatrick

Title: On Improving Latin Polarity Detection through Data Augmentation

Abstract: The task of sentiment analysis is quite well-studied in high-resource languages like English and in opinionated contexts like reviews; however, far less progress has been made in literary and low-resource environments. In this talk, I will describe the submissions from Nostra Domina (the Latin for Notre Dame, or “Our Lady”) to the EvaLatin 2024 shared task of emotion polarity detection. To overcome the lack of available sentiment resources and the complexity of the textual genres at hand, our team elected to augment pre-existing data through automatic polarity annotation. I will present our two methods for doing so based on the k-means algorithm. Moreover, I will describe our use of Latin large language models in a neural architecture to better capture the underlying contextual sentiment representations. Finally, I will discuss our results and future directions, noting that our best approach achieved the second-highest macro-averaged F1 score—and, thus, second place—on the shared task’s test set.
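To give a rough sense of the idea (this is a hedged, self-contained sketch, not the team's actual pipeline: the toy 2-D "embeddings," the seed vectors, and all function names here are invented for illustration), k-means-based automatic polarity annotation can be pictured as clustering sentence embeddings and then labeling each cluster by its nearest polarity seed:

```python
def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20):
    """Minimal k-means with deterministic farthest-first initialization.

    Returns (centroids, assignments), where assignments[i] is the
    cluster index of points[i].
    """
    # Farthest-first initialization: start from the first point, then
    # repeatedly add the point farthest from all chosen centroids.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))

    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: dist2(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return centroids, assign

def label_clusters(centroids, seeds):
    """Give each cluster the polarity of its nearest seed vector."""
    return [min(seeds, key=lambda name: dist2(c, seeds[name])) for c in centroids]

# Toy stand-ins for sentence embeddings (real ones would come from an LM).
points = [[0.9, 1.1], [1.2, 0.8], [1.0, 1.0],
          [-0.9, -1.1], [-1.2, -0.8], [-1.0, -1.0]]
# Hypothetical polarity seeds, e.g. centroids of known-sentiment words.
seeds = {"positive": [1.0, 1.0], "negative": [-1.0, -1.0]}

centroids, assign = kmeans(points, k=2)
labels = label_clusters(centroids, seeds)
auto_annotations = [labels[c] for c in assign]
# → ['positive', 'positive', 'positive', 'negative', 'negative', 'negative']
```

Every point in a cluster inherits that cluster's label, turning unlabeled text into (noisy) training data; the talk's two methods presumably differ in how the clusters are seeded and labeled.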

Bio: Stephen Bothwell is a fourth-year Ph.D. student advised by Dr. David Chiang in the NLP Group at the University of Notre Dame. His research focuses on applications of NLP techniques to ancient languages across the gamut of linguistic features. His prior work has been published or is forthcoming at major NLP conferences like EMNLP and LREC-COLING. He is also a senior fellow with the Navari Family Center for Digital Scholarship, where he builds and presents workshops pertaining to neural networks and textual data at large. His current research interests include enhancing language modeling for ancient languages, studying more linguistically inspired approaches to tokenization, investigating interpretability in neural networks, and consistently finding further uses for edit distance in NLP contexts.