Lunch at 12:30pm, talk at 1pm, in 148 Fitzpatrick
Title: Addressing Neural Text Degeneration by Moving Beyond Autoregressive Models
Abstract: It is well known that neural language models such as GPT-2 suffer from a phenomenon called “neural text degeneration”, in which they produce output that is repetitive, bland, or incoherent. Because of this problem, standard decoding algorithms, such as beam search and random sampling, fail to produce good results. Prior work has attempted to address this issue either by introducing more advanced sampling techniques or by retraining the model with a different objective function. To our knowledge, however, all previous work on this topic continues to use the standard autoregressive model structure. We believe that this structure may be responsible for the problems observed, since autoregressive models are locally normalized and may therefore suffer from label bias. The key to solving neural text degeneration may therefore be to switch to a more expressive, globally normalized class of models.
In this talk, I present ongoing work on this approach: I first introduce the problem of neural text degeneration, then explain how it might be caused by locally normalized models and label bias, and finally introduce our new model.
Bio: Darcey is a third-year PhD student in the Computer Science and Engineering department, where she is advised by David Chiang. Her research focuses on probabilistic modeling for NLP. This includes work on factor graph grammars, a new class of models that uses graph grammars to generate probabilistic graphical models. More recently, she has been working on globally normalized alternatives to autoregressive language models.