Archive | 2019

Fully Automatic Journalism: We Need to Talk About Nonfake News Generation

Abstract


It would have been hard this year not to be aware of the furore around OpenAI’s ‘too dangerous to release’ GPT-2 text generator, with one carefully selected text about unicorns in particular cropping up everywhere (Figure 1). Text generation using large neural language models trained on large datasets (Zellers et al., 2019; Radford et al., 2019) is reaching headline-grabbing levels of quality, with coverage focusing on the dangers of misuse, in particular for fake news generation. Several factors make these models less dangerous than the headlines imply. First, the content of generated texts is not fully controllable beyond general topic, and the output is highly unpredictable. Second, texts generated with models like GPT-2 are easy to spot, because they contain telltale inconsistencies and non sequiturs (e.g. ‘four-horned unicorns’ and ‘two centuries old mystery’ in Figure 1). Moreover, the very techniques that make the models work so well for generation also make it easy to automatically detect the text they generate, using tools like GLTR (Gehrmann et al., 2019) and Grover (Zellers et al., 2019). Meanwhile, the first fully automatic news generators are going into production without much media furore or awareness among readers, driven by very different text generation technologies that are fully controllable and as yet undetectable.

DOI 10.36370/tto.2019.29
Language English

Full Text