Lunch at 12:30pm, talk at 1pm, in 148 Fitzpatrick

Title: Cross-Lingual Biases and Cultural Understanding in LLMs

Abstract: Human biases are ubiquitous but not uniform: disparities exist across linguistic, cultural, and societal borders. As a large body of recent literature suggests, language models (LMs) trained on human data can reflect, and often amplify, the effects of these social biases. However, most existing studies of bias are heavily skewed towards Western and European languages. In our work (EMNLP'23), we scale the Word Embedding Association Test (WEAT) to 24 languages, enabling broader studies and yielding interesting findings about LM bias. We additionally enhance this data with culturally relevant information for each language, capturing local contexts on a global scale. Further, to encompass more widely prevalent societal biases, we examine new bias dimensions spanning toxicity, ableism, and more. We also briefly discuss an extension of this work that explores cultural understanding in LLMs and how it is linked with language.

Bio: Anjishnu Mukherjee is a second-year Ph.D. student in the NLP group at George Mason University, advised by Dr. Antonios Anastasopoulos. His research centers on cross-lingual understanding of culture and the social biases related to it. He is currently working on developing metrics for nuanced measurement of cultural differences in language models and on ways to mitigate the associated biases.