Lunch at 12:30pm; (virtual) talk at 1pm in 148 Fitzpatrick

Title: Teaching Machines to Read with Less Supervision

Abstract: Pre-trained Transformers have recently led to significant improvements on benchmark datasets, suggesting the time may be right for an effort to realize the vision of computers that can read the web. In this talk, I will discuss several projects that aim to extract timely information from Twitter and other sources, with an emphasis on reducing human labeling effort. To extract relationships from text without mention-level supervision, I will first present an approach that combines the benefits of structured learning and neural networks, accurately predicting latent relation mentions given only indirect supervision from a knowledge base. In extensive experiments, we show that a combination of structured inference, missing data modeling, and end-to-end representation learning leads to state-of-the-art results. While minimally supervised learning can be an effective means of quickly prototyping extractors without relying on costly human annotations, supervised learning still typically yields state-of-the-art performance. Yet, it is well known that supervised models perform poorly when applied outside the domain of their training datasets. In the second part of the talk, I will discuss recent advances on domain adaptation from an economic perspective. Conventional wisdom suggests that labeled data is expensive, so methods that leverage large quantities of unlabeled text are a cost-effective way to adapt NLP models to new domains. We analyze this assumption in the context of domain-adaptive pre-training, where the goal is to adapt a single model as efficiently as possible. Our experiments suggest that for small budgets, allocating all available funds to annotation is more effective than procuring GPUs or TPUs for in-domain pre-training; however, as the available budget increases, a combination of pre-training and annotation becomes the best approach. As more tasks are added, model costs can be amortized, making pre-training an attractive option even for smaller budgets.

Bio: Alan Ritter is an associate professor in the School of Interactive Computing at Georgia Tech. His research interests include natural language processing, information extraction, and machine learning. He completed his Ph.D. at the University of Washington and was a postdoctoral fellow in the Machine Learning Department at Carnegie Mellon. His research aims to solve challenging technical problems that can help machines learn to read vast quantities of text with minimal supervision. In a recent project, covered by WIRED (https://www.wired.com/story/machine-learning-tweets-critical-security-flaws/), his group built a system that reads millions of tweets for mentions of new software vulnerabilities. He is the recipient of an NSF CAREER award and an Amazon Research Award.