The Deepfake Detection Dilemma: A Multistakeholder Exploration of Adversarial Dynamics in Synthetic Media
Claire Leibowicz, Sean McGregor, and Aviv Ovadya
The Partnership on AI; XPRIZE Foundation, Syntiant Corp; Thoughtful Technology Project
* These authors contributed equally
Abstract
Synthetic media detection technologies label media as either synthetic or non-synthetic and are increasingly used by journalists, web platforms, and the general public to identify misinformation and other forms of problematic content. As both well-resourced organizations and the non-technical general public generate more sophisticated synthetic media, the capacity for purveyors of problematic content to adapt induces a detection dilemma: as detection practices become more accessible, they become more easily circumvented. This paper describes how a multistakeholder cohort from academia, technology platforms, media entities, and civil society organizations active in synthetic media detection and its socio-technical implications evaluates the detection dilemma. Specifically, we offer an assessment of detection contexts and adversary capacities sourced from the broader, global AI and media integrity community concerned with mitigating the spread of harmful synthetic media. A collection of personas illustrates the intersection between unsophisticated and highly-resourced sponsors of misinformation in the context of their technical capacities. This work concludes that there is no "best" approach to navigating the detection dilemma, but derives a set of implications from multistakeholder input to better inform detection process decisions and policies in practice.
Introduction

The information ecosystem comprises actors that collect, spread, and consume newsworthy or credible information. Increasingly, these actors are disrupted by persons, organizations, and governments aiming to spread misinformation via text, images, and video (Verdoliva 2020; Wardle 2019). Among the negative impacts are increases in violence, non-consensual sexual exploitation, financial loss, and political unrest (Ajder et al. 2019; Gregory 2019). Thus, many actors in the global information ecosystem, and in society in general, have an interest in detecting and stopping the spread of problematic content online, including misinformation. Increasingly, these actors look to programmatic tools that check whether source media has been manipulated or synthesized using AI techniques in order to do so. While synthetic media is not inherently harmful or malicious, and can be used for satirical and artistic purposes, such signals are used by tech platforms and others for evaluating the credibility and potential harmful impact of content (Saltz, Coleman, and Leibowicz 2020; Bickert 2020; Roth and Achuthan 2020). Journalists and fact-checkers want detection tools to determine the authenticity of a source video (Leibowicz, Stray, and Saltz 2020; Cohen 2020). Conversely, organizations concerned with documenting systemic abuses around the world want detection technologies to ensure that unmanipulated evidence of abuses cannot be dismissed as fake (Gregory 2018).

As one of many efforts intended to counteract the challenges that synthetic media presents, artifact detection tools and technologies have been developed by a variety of actors seeking to reduce the harm spawned by malicious synthetic media; for example, Microsoft has partnered with the Reality Defender tool and Google's Jigsaw has created its own tool called Assembler (Burt and Horvitz 2020; Alba 2020). These tools can be used to analyze videos, audio, or images to determine the likelihood that they were manipulated or entirely synthesized, without relying on external corroboration or context. We use the word detector throughout this document to mean "artifact detector tool or technology," although, as we discuss, they are not the only detection options.

While many research results show advances in detector technology, tests on media encountered "in the wild" often show detectors have serious limitations (Leibowicz, Stray, and Saltz 2020; Hwang 2020). Moreover, we face the challenge that a specific detector, or even an entire detection approach combining many detection models, may be compromised if the synthetic media producer actively works to defeat the detectors (Neekhara et al. 2020).

This leads us to the detection dilemma: the more accessible detection technology becomes, the more easily it can be circumvented. As a corollary, we are faced with many challenging questions that impact the practicability of using detection, and the equity in access and outcomes. How should the technical detection community share their tools and techniques, given that we would like them to be effective not just in research but also in the real world? How can we ensure that such techniques, if effective, are accessible to other actors in the global information ecosystem beyond the largest technology companies, including journalists, fact-checkers, and others in civil society?

People working in cryptography, computer security, and fraud detection think in terms of formal games to answer similar questions. In such games, an adversary actively attempts to defeat a defender (Petcher and Morrisett 2015). Both parties adapt their tactics in response to the capabilities and actions of their counterparty. Through time the balance of power can shift between parties, but security games rarely reach a definitive end in the real world. It is useful to adapt such frameworks and games to synthetic media detection, a more recent adversarial challenge. However, the synthetic media detection game will eventually end when synthetic media becomes indistinguishable from unmodified video (at least with respect to artifacts). While current indications from the research community suggest we have not reached the point where indistinguishable synthetic video is possible (Yang et al. 2019), our analysis should be viewed as capturing a moment in time that requires adaptation as synthetic content gets more sophisticated. New techniques for generating, concealing, and detecting synthetic media are all actively being developed (Ovadya and Whittlestone 2019). Our work therefore aims to explain the roles of various actors active in misinformation generation and detection to provide both insight into the current state of play and into the likely development of capacities as techniques continue to develop and become more broadly known. Increased sophistication and ubiquity of synthetic media will bring with it increased challenges to sharing robust detection tools with the global information integrity community, and therefore to mitigating malicious content online.

Multistakeholder Input
Coping with the real world dynamics and impacts of synthetic media requires multidisciplinary input and attention. Leibowicz (2020) described multistakeholder governance of the Deepfake Detection Challenge, a 2020 machine learning competition funded by Facebook to build better deepfake detection models. A group of nine experts ranging from computer vision researchers to misinformation trend experts weighed in on the challenge's governance, and in doing so, articulated the need for increased access to detection technologies for journalists, fact-checkers, and those in civil society, as well as the need to attend to the adversarial dynamics of synthetic media detection (Leibowicz 2020).

Building on Leibowicz's multistakeholder protocol for informing deepfake detection model creation, we consulted a multidisciplinary group of actors from media, civil society, and industry to inform a framework and assessment of the synthetic media detection dilemma; we facilitated three hour-long workshop sessions with this cohort and offered a three-week review period on the initial ideas informing this document. Of the ten individuals consulted, three worked in research and development at global media entities and had expertise in the needs of modern-day newsrooms, two were machine learning researchers, one worked in product management at a large technology company, two worked in AI and content policy at large technology companies, and the remaining two individuals were experts in human rights and the global threats of synthetic media and misinformation more generally.
Motivations
Although a detector that is unknown to the purveyors of misinformation is more effective than one that is published publicly, there is a need to facilitate an ecosystem of detection technology sharing that can reduce the negative impacts of synthetic media (Leibowicz, Stray, and Saltz 2020). This will involve difficult choices around information sharing for well-meaning societal actors including technology platforms, academics, journalists, and other detection organizations. In many cases, the best choice for society may be at odds with an organization's particular self-interest, and may require more restrictions—or more openness—than it might naturally be comfortable with.

The experience of governments attempting to limit the distribution of strong cryptography software indicates that governmental regulation is unlikely to succeed in requiring a detector exposure protocol (Dame-Boyle 2015). However, formalizing the current state of play in the detector community can support researchers so that they can best expose artifact detection models in a way that helps address societal needs (Leibowicz, Adler, and Eckersley 2019). Stakeholders who would directly be using these tools to improve societal outcomes, including the media and misinformation research community, have a greater capacity to know whether the models they work with are trustworthy for their intended use cases. This document seeks to help ground the conversation around these goals, providing useful context and frameworks for making sense of malicious synthetic media.

In the following sections, we map the current state of play of synthetic media detection. To our knowledge, this is the first framework informed by multistakeholder input that describes the technical and adversarial dynamics and implications for the technical detection community. We aim to inform recommendations for responsibly deploying detection technologies, and also to explain to media organizations and other non-technical actors using detection technology what can be concluded about deepfake detection in adversarial settings.

We first describe the existing synthetic media actors and explore their technical capabilities. We then develop personas to illustrate these actors and their capabilities. Next, we describe the types of detectors, their exposure levels, and options for defending effectively against synthetic media generators. We use scenarios with the personas to provide further context and implications for different detector exposure levels. Finally, we discuss the key lessons that emerge from this analysis and the implications for the detection dilemma. These involve taking into account not just what detection can do, but other levers and needs, such as app store policies and forensic tool user training, that will enable accessible and responsible detection deployment.
Detection Contexts

Grounding this work in operational, real world information contexts requires first detailing the reality of how detectors may be applied. Table 1 presents the detection contexts associated with a variety of stakeholders using detection. These contexts typically have a variety of human processes (e.g., making the determination whether a story should be published or not), but in some cases the detector may make content decisions in a completely automated fashion. These process decisions help determine the exposure of the model to adversaries. A model that is selectively applied to only nation-shaking media can be tightly controlled and applied to a select few pieces of media, while web platforms may apply a detector billions of times. This work focuses on the efficacy of detectors absent human processes to adapt their decisions to evolving adversarial capabilities.

Table 1: Detection Contexts. A description of detection contexts, some of the key actors that would be involved in each context, and the actions those actors are likely to take in order to achieve their goals. The rows are ordered according to increasing detector model exposure implied by their contexts.

• Nation-shaking media analysis: Content which could alter the lives of millions. Forensic actors: top media forensics experts. Forensic actions: extremely detailed examination with sophisticated and potentially customized tools.

• Suspect media analysis: Investigations into disinformation, abuse, and criminal conduct. Forensic actors: journalists, OSINT investigators, law enforcement. Forensic actions: close human examination, primarily with standard tools.

• Automation augmented investigations: Investigations where it is suspected that synthetic media may play a role (e.g., a disinformation campaign). Forensic actors: disinformation investigators (e.g., DFRLab, Graphika), platform threat intelligence analysts, disinformation researchers. Forensic actions: tools pull many pieces of media for detection, providing analysis for human review.

• Flagged content evaluation: Evaluation of content that has been flagged as suspicious by users for a platform. Forensic actors: platforms (e.g., Facebook), platform users. Forensic actions: flagging of suspect media by users, which platforms evaluate automatically (outputs of which may be revealed to users, or impact ranking, moderation, etc.).

• Full platform evaluation: Detection across entire platforms. Forensic actors: platforms. Forensic actions: automated evaluation of all platform content, which may just impact metrics, be revealed to users, or impact ranking, moderation, etc.

• General public evaluation: Detection tools provided directly to users, either within a platform or as a separate application or website. Forensic actors: everyday people, non-specialist journalists and other civil society actors. Forensic actions: usage of tools that can take a media item or online account as input and provide an evaluation meant for a non-expert.

The misinformation actors working to avoid detection in these contexts exhibit a spectrum of capabilities for both synthetic media generation and detection circumvention. We next introduce two classifications describing misinformation actors active in creating and deploying synthetic media technologies, tools, and content.
Technical competency spectrum
The success of actors in evading synthetic media detection depends in part on their technical resources. The spectrum of technical competency can be split into three main categories:

• Novel resource actors: Can do original research and implement complex systems from scratch.

• Common resource actors: Rely on models and implementations created by novel resource actors. Can build custom tools and pipelines that do not require researcher-level knowledge and understanding.

• Consumer resource actors: Rely on apps and websites created by others. Can only create content, not the tools for creating the content.

Most known examples at these sophistication levels are not purveyors of misinformation. The academic creators of a system called pix2pix (Isola et al. 2017) would be novel resource actors if they had developed the software for disinformation purposes. Pix2pix is a machine learning system for image-to-image translation developed for benign purposes. For example, a demo of pix2pix lets you translate a sketch drawing of a shoe into a photo-like image of a shoe matching that outline, and one can similarly use it to colorize grayscale images or to stylize a photograph (Hesse 2017).

However, it can also be used for more harmful purposes. "DeepNude" is an example of a deepfake generator that pipelines several machine learning advances to make a comprehensive framework for image manipulation. DeepNude is capable of translating photos of people to manipulated images without clothing, and notably does not require the subject's consent (Cole 2019). While the creators of DeepNude are certainly novel resource actors, their users are common resource actors because the software greatly reduces both the effort and expertise required to generate the deepfake. However, the most realistic results from DeepNude still require understanding the underlying system.

Finally, research advances are likely to enable true one-shot image translation—in other words, the creation of a system which can be given a single image pair, e.g. "shoe outline → shoe" or "clothed person → unclothed person," and then repeat that operation with any future input. A consumer grade tool that lets anyone use this capability could be incredibly valuable to artists and scientists—but would make creating non-consensual unclothed images easy for even consumer resource actors. Fundamental capability advances have many positive applications, but they can also be misused. Ultimately, the degree of technical competence required to create malicious synthetic media is likely to decrease as research results are translated into general purpose consumer products.
Anti-detection competence spectrum
Building on the general notion of technical competence, we are particularly interested here in how competent an actor is at avoiding detection, or at providing tools that help others avoid detection. We can define an anti-detection competence spectrum closely related to the technical competency spectrum:

• Novel anti-detection actor: Can develop methods of circumvention that require significant technical resources or researcher knowledge.

• Determined anti-detection actor: Can probe detector systems, execute strategies to defeat them, and plumb together pipelines to support this, building on work from novel anti-detection actors.

• Consumer anti-detection actor: Can use consumer anti-detection tools.

It is not only malicious actors who might be interested in overcoming detection. Curious, status-seeking, and even deeply principled actors who are not proponents of malicious use may intentionally do novel research or development to thwart detection too. For example, an academic or hobbyist may publicize methods on how to overcome detection systems, even if there is no way to address the flaws that they uncovered, under the principle that "security through obscurity" is harmful. Of course, they would be ignoring the history of fraud mitigation, which often relies on a period of obscurity to succeed. There is a long history of such activity in other domains within cybersecurity and fraud, where the net impact of such developments may be ambiguous. Actors with non-malicious intent may therefore significantly expand the availability of anti-detection technology, thereby unintentionally enabling increased malicious use. In fact, since one of the principal methods for generating synthetic media involves engineering detectors that a neural network model must then defeat, thousands of deep learning engineers are implicitly engaged in defeating detection.
Personas
Personas are fictional persons constructed to represent real persons, and are common in user experience engineering and market research. While persona methods are typically used to understand how people interface with a system, we employ them here as a means of fostering collective understanding among diverse stakeholders in the misinformation challenge domain. Figure 1 introduces six personas that will be referenced in the following sections. These personas are not meant to be exhaustive—there are many additional types of actors—but this is a set found to be useful in stakeholder discussions and covers a wide range of actor capabilities, motivations, and interactions between them.
Detecting Synthetic Media

Having outlined the personas filling roles in either the generation of misinformation or the required technical infrastructure, this section turns to the ways that one can defend against misuse with detection technology. At present, synthetic media can be detected due to a variety of factors including:

• Artifacts: Inconsistencies with the physical world (e.g., strange shifting facial lines, disappearing teeth), or statistical abnormalities (e.g., unnatural audio or video spectrograms). See Bully Bob's persona in Figure 1, part (f), for an example where Bob's hair blends into what appears to be a graduation cap.

• Identifiable source material: The synthetic media is derived from one or more modified source media that can be identified from the post-modification result (e.g., the modified source speech was from a speech delivered six months ago).

• External context: Knowledge of synthetic media subjects or environments can highlight inconsistencies (e.g., "the political candidate was out of the country during the event," "the political candidate has a shoulder impairment and cannot lift his arm above his head," or "the sun is not in the right place for the purported time of the year").

• Identifiable non-corroborating media: Other media evidence, such as additional filming angles, can show media as being collectively inconsistent.

These are all factors that can be used to detect synthetic media (i.e., a true positive for a detector), but there are other signals that can establish the veracity of media (i.e., a true negative). In this analysis we focus on the ways that the adversary can fool a detector into reporting a false negative, presuming the content is synthetic. We also specifically concentrate on artifact detection in this work, as it is the primary focus of most detection research. However, there is much promising work to be done beyond that, and we encourage additional resources in expanding the utility of complementary approaches. What follows is the state of play at the end of 2020 for synthetic media detection of artifacts with the most common and powerful form of detector: neural networks.
Detector Exposure Levels
Neural networks are biologically-inspired computer programs that network together many chained mathematical operations to perform tasks like classification (e.g., synthetic vs. non-synthetic) and synthetic media generation (e.g., transferring a face onto a person). As a tool for both generating and detecting synthetic media, neural networks are of immense interest to the media integrity community. Detector neural networks can be operationally and technically exposed to various degrees to the adversary. Each level of exposure is controlled by the person or organization developing the detector, and these choices may include (in order of decreasing exposure):

• Publicly shared trained models: This gives the public access to all information about the detector, including the ability to run the model themselves. The public then has access to both the full description of the neural network and the configuration (i.e., "weights" or "parameters" to neural network practitioners).

• Open access queries: Anyone can make a near-unlimited number of queries to the detector model, to see if the detector considers any number of pieces of content to be synthetic.

• Public untrained models: The detector model is fully described (e.g., it is published in a public venue) but it is not runnable without appropriate data to configure it (i.e., the network is not trained).

• Controlled access service: The detector host may limit access to querying the detector.

• Private use: A single organization controls the unpublished model, does not disclose the existence of the detector, and never allows external parties to infer the predictions of the detector.

Users of detector models must know the exposure of the model before they can reason about the capabilities of the model to detect synthetic media. In the following sections, we describe each of these exposure levels in more detail and give insight into their implications via persona examples.
Publicly Shared Trained Model
Neural network detector models "learn" to differentiate synthetic examples from non-synthetic examples by being shown examples of both types of content and iteratively updating their neurons, or weights, as they are known to machine learning practitioners. A similar process is executed when producing a synthetic media generator. However, instead of teaching the neural network to differentiate two types of content, the generator neural network is told to produce content that can fool a second neural network, called a discriminator, into failing to differentiate synthetic and non-synthetic media. As discriminators improve in their ability to differentiate synthetic and non-synthetic media, the synthetic media generators also improve.
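To make the generator-discriminator dynamic concrete, the following minimal sketch (assuming PyTorch, with toy tensor sizes standing in for real media) trains the two networks against each other; it is illustrative only, not the pipeline of any particular deepfake system.

    import torch
    import torch.nn as nn

    latent_dim, media_dim = 16, 64  # toy sizes, not real video dimensions

    generator = nn.Sequential(
        nn.Linear(latent_dim, 128), nn.ReLU(), nn.Linear(128, media_dim), nn.Tanh()
    )
    discriminator = nn.Sequential(  # the "detector" the generator learns to fool
        nn.Linear(media_dim, 128), nn.ReLU(), nn.Linear(128, 1)
    )

    loss_fn = nn.BCEWithLogitsLoss()
    g_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

    for step in range(1000):
        real = torch.randn(32, media_dim)  # stand-in batch of "real" media
        fake = generator(torch.randn(32, latent_dim))

        # Discriminator step: learn to score real as 1 and fake as 0.
        d_opt.zero_grad()
        d_loss = loss_fn(discriminator(real), torch.ones(32, 1)) + \
                 loss_fn(discriminator(fake.detach()), torch.zeros(32, 1))
        d_loss.backward()
        d_opt.step()

        # Generator step: produce media the discriminator scores as real.
        g_opt.zero_grad()
        g_loss = loss_fn(discriminator(fake), torch.ones(32, 1))
        g_loss.backward()
        g_opt.step()

As the discriminator (a detector) improves, its gradients push the generator toward media with fewer detectable artifacts, which is why a released trained detector is, in effect, training signal for the next generator.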
Persona Example: Researcher Roberta publishes a research article showing state-of-the-art performance in synthetic media generation. She achieved the performance by taking all publicly shared trained models and using them to train a new generator. The resulting generator defeats all publicly shared trained models. Open Ophelia doesn't care that the detectors are defeated, but she does want to use the source code to make funny TikTok videos and contribute the generator to her favorite open source video editing project. Now Propagandist Paul and Bully Bob can both easily use the same open source software to defeat detectors. While the open source software is openly available to them, they often don't go through the trouble since they already work with a different piece of software. However, Nation-state Nancy takes Open Ophelia's source code and integrates it within a growing codebase utilized by the government-funded propaganda teams.

∴ Novel anti-detection actors can use new detectors to improve the fidelity of synthetic media.

∴ If a trained detector neural network is available to the adversary, all determined adversaries can hide the artifacts of interest to the detector.

(a) Nation-state tech lead Nancy: a 35-year-old in a secret military intelligence organization who has access to a staff of top-notch machine learning researchers and security experts. If a person is not on her payroll, she can likely coerce, through money or extortion, anyone living in her country to provide technical assistance. She is in charge of providing tools for the information operators—some of their content is getting detected by platforms and OSINT experts, and she's supposed to make that problem go away. Her users incidentally have access to excellent surveillance imagery on their targets. The detector community does not know someone like Nancy exists, but the general consensus is that there must be a Nancy working for one or more of the major industrialized powers.

(b) Propagandist Paul: a 65-year-old who uses consumer apps to create emotionally evocative political content for profit—some of which is intentionally manipulated—and which he shares in the 30 online groups he manages, and which occasionally goes viral across many other groups, platforms, and even heavily trafficked partisan news sources. Sometimes when he posts manipulated media that he has created or found, the platform places a warning on the content, or removes it entirely. He sometimes tries to search around to find a way to prevent that from happening, and often compares notes with colleagues of similar viewpoints. He often does just as well remixing and resharing synthesized content created by others.

(c) Researcher Roberta: the head of a prominent adversarial machine learning research lab, Roberta is a 38-year-old woman interested in investigating and publishing methods for circumventing synthetic media detectors. She and many of her peers in computer security believe that sharing the details of how to break existing systems is the best way to build more robust systems going forward. As a side effect, her technical work and publicly available source code also make it easier for unsophisticated adversaries to apply her advanced circumvention techniques, including during a recent spate of political protests in another country.

(d) Open Ophelia: a 20-year-old undergrad on summer vacation without an internship, Ophelia is intensely interested in software engineering and artificial intelligence. She doesn't yet understand all the math behind synthetic media generation, but she is eager to learn it so she can produce more realistic digital art. She occasionally reads research papers and closely follows the work of Researcher Roberta. Ophelia plans on integrating Roberta's research code into her favorite open source video editing platform so she can make the most ridiculous (and realistic) TikTok videos ever seen. If/when she does, everyone will be able to make synthetic media with her codebase.

(e) Market-manipulator Mike: works with an irresponsible hedge fund (or perhaps an organized crime 'investment vehicle'), and has similar anti-detection competences to Nation-state Nancy. He may only need to fool automated-trading algorithms for a fraction of a second in order to make hundreds of millions of dollars.

(f) Bully Bob: a teenager who uses consumer apps to synthesize schoolmates saying terrible things in order to embarrass them, get them into fights, or get them in trouble with the authorities. Bob wants his media to look more realistic but is unwilling to learn how to do anything too complicated. Thanks to the work of Open Ophelia, Bob is starting to get easy access to more advanced techniques. Bob doesn't care if the content is eventually filtered, so long as it is not blocked for a few hours of fun.

Figure 1: Personas. The personas above are fictitious persons rooted in the motivations and capabilities of real persons in the world. They were selected to represent the full range of capabilities and motivations of the misinformation community. Inclusion of prosocial persons (e.g., Researcher Roberta) in this persona list does not constitute classification of those persons as adversaries, but as persons with complex interactions between both the detector community and the purveyors of misinformation. All portraits are of fictitious people and were rendered by ThisPersonDoesNotExist.com.
Open Access Queries
Detector models can be hosted by organizations like large social networks that then check all user-submitted content for synthetic media. Since the adversary can hide inside a large user population, it is possible to repeatedly submit synthetic media that is repeatedly and imperceptibly altered until it passes through the detector model. This black box attack allows the adversary to defeat a detector that is 99.9 percent accurate by testing, on average, 1,000 variations of the same synthetic media.
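The arithmetic behind that estimate is a geometric distribution, under the simplifying assumption that each imperceptibly perturbed resubmission slips past the detector independently with probability p = 1 - 0.999:

\[
\mathbb{E}[\text{queries}] \;=\; \sum_{k=1}^{\infty} k\,(1-p)^{k-1}\,p \;=\; \frac{1}{p} \;=\; \frac{1}{10^{-3}} \;=\; 1{,}000.
\]

Correlated rejections or per-account rate limits raise this number, but only by a constant factor that a patient adversary spread across many accounts can absorb.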
Persona Example: Nation-state Nancy wants a video showing an unfriendly political candidate making bigoted statements to go viral on social networks. She issues a contract to a security contractor, which generates a deepfake using a publicly available deepfake model. The deepfake generator is already known by the relevant platform, which has a policy against all synthetic media, so it automatically labels the video to users with a warning that the content may be synthetic. Nancy now gives the deepfake to another team in order to remove the warning label. They submit variations of the deepfake with subtle changes to the video (e.g., adding small amounts of random noise) until one or more of the videos is posted without labeling.

∴ If the public can repeatedly test the detector model, determined adversaries will eventually find a way to break it.
Public Untrained Models
Researchers studying detector models gain career advancement by publishing their research. Publication norms require at least a full description of what the detector model is doing, but in many cases it is possible to hold back the weights of the neural network from general availability (Campbell 2020). Without access to the weights, only novel resource actors will be able to train new generator models and detector models based on the knowledge from the research publication. Such actors have the engineering capacity, access to the datasets needed for training, and the compute power required to reimplement the detector.
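The distinction between a published description and a runnable detector can be made concrete with a small sketch (assuming PyTorch; the architecture is hypothetical and far smaller than a real detector):

    import torch
    import torch.nn as nn

    class ArtifactDetector(nn.Module):
        """The architecture: roughly what a publication describes in full."""
        def __init__(self):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=3, padding=1),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(32, 1)  # synthetic vs. non-synthetic logit

        def forward(self, frames):
            return self.classifier(self.features(frames).flatten(1))

    model = ArtifactDetector()  # anyone can instantiate this, but it is untrained

    # The trained weights are a separate artifact that authors can withhold:
    #   torch.save(model.state_dict(), "detector_weights.pt")
    # Recreating comparable weights requires the training data, compute, and
    # engineering effort that separate novel resource actors from everyone else.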
Persona Example: Researcher Roberta develops a new detector model and publishes it in the proceedings of the International Conference on Synthetic Media Detection (ICSMD). Facebook and Nation-state Nancy immediately begin training secret versions of the model for private use. The datasets amassed by Facebook and Nancy far exceed the datasets available to Roberta, so the resulting detectors are more complete than the one in the academic press. Differences in training mean the resulting networks at Facebook and Nancy have slightly different failings, and the detector model at Facebook may detect some synthetic media that passes Nancy's detector model. Still, when Nancy submits synthetic media to Facebook, the chances of synthetic content being detected and labeled as such are greatly diminished.

∴ Publishing a model without training it may significantly increase the effort required to circumvent the model relative to publishing trained models.

∴ The effectiveness of a detector model depends on the training of the model. The same model may be less effective if trained by actors with fewer resources.
Controlled Access Service
Preventing access to a networked computer program can be difficult, but every detector user can be formally qualified and granted access under controlled circumstances. The detector's exposure may increase if any member of the in-group leaks their code or account credentials. A middle ground permitting detector coalitions is to provide outside parties with a fixed number of queries on a detector that is controlled and hosted by the organization producing the detector. Provided the detector implementation is not exfiltrated, access to the detector can be withdrawn from outside parties in the future. When sharing the trained model externally, it is impossible to know whether the outside parties have been compromised.
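One plausible shape for such a metered service is sketched below; the names (Partner, handle_query, score_media) are hypothetical, not any real product's API. The model stays on the host's servers, each vetted partner's key carries a finite query budget, and access can be revoked without the model ever leaving the building.

    from dataclasses import dataclass

    @dataclass
    class Partner:
        key: str
        remaining_queries: int   # the fixed query budget granted to this party
        revoked: bool = False    # access can be withdrawn at any time

    PARTNERS = {"newsroom-123": Partner("newsroom-123", remaining_queries=500)}

    def score_media(media_bytes: bytes) -> float:
        """Stand-in for the privately hosted detector model."""
        return 0.5  # placeholder synthetic-media score

    def handle_query(key: str, media_bytes: bytes) -> float:
        partner = PARTNERS.get(key)
        if partner is None or partner.revoked or partner.remaining_queries <= 0:
            raise PermissionError("access withdrawn or query budget exhausted")
        partner.remaining_queries -= 1   # metering limits black-box probing
        return score_media(media_bytes)  # only the score leaves the host

The query budget is what separates this level from open access queries: it caps the number of probe-and-perturb iterations any single compromised credential affords an adversary.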
Persona Example: Only Nation-state Nancy poses a threat to detectors shared via controlled access. She already knows the Google Detector Suite password of the New York Times reporter due to a previous operation that installed a keylogger on several reporter laptops. After Nancy uses the access to check whether the misinformation campaign she is about to launch will be detected, Google notices the sudden spike in traffic, sends an email to the New York Times user, and locks the account. Propagandist Paul remains confused why The New York Times is so good at finding and debunking his synthetic media, because he doesn't know about the Google Detector Suite.

∴ Trusted parties can share detector results with each other without greatly compromising their effectiveness.
Private Use
A detector can potentially be held for private use, meaning no one outside the organization or detector coalition has access or perhaps even knowledge of the detector model's existence. This is the most restrictive use case for the detector model and therefore is the most capable of avoiding active circumvention by the adversary. However, even with such limited exposure there are limitations to what detector models can accomplish. In the case of the DeepFake Detection Challenge (DFDC), teams built their solutions based on a dataset filmed and modified by Facebook, but were ranked for a $1,000,000 prize purse based on data from the platform's real users (Dolhansky et al. 2019). The real user data was not created to circumvent detection, but the results are not promising, as shown in Table 2.

Table 2: Two competitor results from the Facebook Deepfake Detection Challenge. All models degraded significantly from their test set performance on Facebook-generated data to test set performance on user-generated data.

                   Accuracy                                  Ranking
                   Facebook-Generated Data   User Data       Facebook-Generated Data   User Data
    Competitor 1   ∼                         ∼57%            1                         905
    Competitor 2   ∼                         ∼65%            4                         1

The top performing competitor on user data has an accuracy of only about 65%, and a performance of 50% is achievable by flipping a coin. This performance is particularly troublesome in context, where there are millions of non-synthetic videos for every instance of a synthetic video. Thus, when a detector is applied to a large population of videos, the likelihood that a video which is labeled as synthetic is actually synthetic is very small. Conversely, a synthetic video generated by a model which was not used to generate training data will likely be missed by the detector.
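An illustrative Bayes calculation shows how severe this base-rate effect is. Assume, hypothetically, one synthetic video per million, a 65% true positive rate, and, pessimistically, a 35% false positive rate:

\[
P(\text{synthetic} \mid \text{flagged})
= \frac{0.65 \times 10^{-6}}{0.65 \times 10^{-6} + 0.35\,(1 - 10^{-6})}
\approx 1.9 \times 10^{-6}.
\]

Under these assumed numbers, roughly two in every million flagged videos would actually be synthetic; even a dramatically better false positive rate leaves human review, rather than automated action, as the realistic use of such a signal.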
Persona Example: Twitter develops an information health check dashboard that reports on the rate of synthetic media in user-submitted content. They do not take action on individual accounts (fast system actions constitute "Open Access Query" availability), but they do use the information to make community management decisions. The dashboard runs 15 different state-of-the-art synthetic media detectors and presents the results in tabular form. After private use of the tool begins, Researcher Roberta publicly publishes a generator that introduces specific people into crowds without swapping their faces. The health check tool does not detect these cases initially, but Twitter eventually adds a 16th detector trained to recognize this specific synthetic media case. Statistically, they know the prevalence of crowd injection generators, but the detector still only classifies two out of three crowd injections as being synthetic, so it is not used to programmatically label content.

∴ Even when the adversary does not know a detector model exists, the detector will perform poorly on synthetic media types it was not trained to recognize.

∴ Detector tools cannot currently confirm the integrity of non-synthetic media, but can be one part of a verification process.
Defender Signaling Options
Once a detection system is used to evaluate content, there are a number of actions that can be taken, which give different levels of signal to an adversary about what types of content are detectable:

• Strongest signal: Immediate platform actions on the content (remove, label, downrank, etc.).

• Strong signal: Direct actions that identify specific detected content. For example:
  – Delayed platform actions on the content (the delay may be intentional, to weaken the signal, or a true delay due to the time needed for human review or even queued automated review).
  – Calling out of detected content on either social media or established media, by other actors (e.g., civil society, corporate actors, government, individuals).
  – Offline or off-platform direct action on a creator or distributor, using detected content as evidence.

• Weak signal: Delayed response to a user or group, without identifying what piece of content triggered the response.

• No signal: Analytics outputs (tracking the extent of such content in aggregate).

In addition, if detail is provided about how the content was identified—for example, what tools were used, what manipulation was detected, or what signatures were identified—that provides additional information to the adversary. Of course, not providing that additional context might decrease trust in the claim that a piece of media was manipulated or synthesized (Saltz, Leibowicz, and Wardle 2020). The sketch below illustrates one way a platform might encode these options.
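A toy encoding of these options (hypothetical names; real platform policies are far more involved) makes the underlying tradeoff explicit: the more visible the response, the more the adversary learns about detector coverage.

    from enum import Enum

    class Signal(Enum):
        STRONGEST = "immediate content action"   # remove, label, or downrank now
        STRONG = "delayed or attributed action"  # human review queue, public callout
        WEAK = "delayed account-level response"  # no specific content identified
        NONE = "aggregate analytics only"        # adversary learns nothing

    def choose_response(detector_score: float, potential_harm: str) -> Signal:
        """Toy policy: reserve high-signal responses for high-confidence,
        high-harm cases so routine detections leak less about coverage."""
        if potential_harm == "severe" and detector_score > 0.99:
            return Signal.STRONGEST
        if detector_score > 0.9:
            return Signal.WEAK
        return Signal.NONE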
Implications

There are many obstacles to real world, accessible detection in an adversarial landscape, and detection is only one piece in a much larger suite of problematic content mitigations; yet, detection can act as a useful tool in our malicious content mitigation toolbox. The following summarizes key insights on reckoning with the detection dilemma, building on our analysis above.
Aim for “goldilocks exposure.”
Keeping detector model sharing closed can clearly be helpful for resisting adversaries—but only up to a point. If detectors are shared too little, then there will be less interest in researching detection and improving the state-of-the-art, most defenders will not get the latest advances, and few if any will keep up with the rate of improvement of synthetic media generation technology.

A norm for academics to avoid publishing state-of-the-art detection models publicly might not prevent the most powerful anti-detection actors, whose intelligence capabilities enable them to exfiltrate or recreate such models, but it would prevent novel resource actors like Open Ophelia from making more realistic synthetic media art tools—and thus incidentally prevent Propagandist Paul and Bully Bob from misusing those tools and defeating detectors. If one believes there are many such Pauls and Bobs, then it may be a tradeoff that one chooses to make, even though it may slow innovation and openness. Privately shared datasets are not uncommon in research, and similar (though potentially more rigorously monitored) structures could be used for sharing such models. Similar concerns and approaches apply to models used to generate synthetic media.
Every time a detector is used, it becomes less likely to be effective in the future—particularly against novel resource actors.
In a world where novel resource actors are true threats, the best detectors should be used sparingly, or in ways that provide the least signal to the adversary. One way to do that is with a hybrid model, with detectors of varying levels of quality being provided at different levels of trust and usage. Nation-shaking media, analysis of broad groups, and analytics would utilize the strongest detectors.

In practice, this might look like sharing the most powerful trained models among a small number of trusted actors developing and deploying detectors. Weaker versions of these detectors could also be provided as a service with rate-limited and monitored access controls to civil society hubs (e.g., via the misinformation fighting non-profit First Draft News), which would then vet users in their networks to give access and the necessary funding required to support their use. The civil society hubs would ideally be incentivized to share credentials as broadly as possible to mitigate gatekeeping—but with strong incentives, including threat of loss of access, to vet those who are given access thoroughly. If a piece of content satisfied a certain level of potential harm, it would be elevated up the trust chain to better and better detectors. It is also important that alongside access to the technology or tool, such users are granted capacity and training to use them effectively.

Beyond relying on hubs as vetting organizations, there is also the potential to support or create media forensics service providers with powerful tools and resources to support news organizations and other crucial actors. These might be new organizations, or additional programs for existing hubs; they would likely need to be externally funded in order to support under-resourced news and civil society organizations. They would need to not only provide access to tools, but provide training to ensure they can be used effectively and responsibly.
Play offense.
The less one needs to defend against an adversary, the better. Ideally, all synthesis tools might provide secret signatures, watermarks, consent mechanisms, and so on, including potential Content Authenticity Initiative and Project Origin standards (Ovadya 2019; Fischer 2020; Rosenthol et al. 2020). Many of these approaches can even be implemented without any decrease in user privacy, though it is vital to think carefully about potential tradeoffs and how such techniques might put individual journalists and rights defenders at risk (Gregory 2020).

Moreover, it appears to be possible to watermark training data such that the media outputs of the resulting model are also watermarked as synthetic, though there are also definite limitations to this approach requiring more thorough analysis (Yu, Skripniuk, and Abdelnabi 2020). Since most actors will need to rely on consumer grade tools, those making it easier to create and access such tools have the largest levers. It is far easier to simply identify signatures that have been intentionally added to synthetic media by a consumer tool than it is to identify something that may or may not have been created by a tool (see the toy sketch at the end of this section). Such signatures can also enable higher confidence and enable truly automated responses to synthetic media created through tools that provide signatures. App stores can encourage broad adoption of such signatures by requiring synthetic media tools to utilize them (Ovadya 2020).

These techniques are not capable of defeating Nation-State Nancy, but they will defeat Bully Bob and Propagandist Paul. Where "perfect is the enemy of the good," it is necessary to mitigate rather than solve.
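As a toy illustration of why provenance signatures are the easier problem, the sketch below embeds and checks a least-significant-bit tag (assuming numpy; the scheme and the SIGNATURE constant are illustrative only, and real schemes, such as the training-data fingerprinting cited above, are designed to survive re-encoding, which this one would not):

    import numpy as np

    SIGNATURE = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)  # assumed tag

    def embed(pixels: np.ndarray) -> np.ndarray:
        """A synthesis tool writes the tag into the least significant bits."""
        out = pixels.copy().ravel()
        out[:SIGNATURE.size] = (out[:SIGNATURE.size] & ~np.uint8(1)) | SIGNATURE
        return out.reshape(pixels.shape)

    def carries_signature(pixels: np.ndarray) -> bool:
        """A verifier checks for the tag: cheap, exact, and automatable."""
        bits = pixels.ravel()[:SIGNATURE.size] & np.uint8(1)
        return bool(np.array_equal(bits, SIGNATURE))

    frame = np.random.randint(0, 256, size=(4, 4), dtype=np.uint8)  # stand-in frame
    assert carries_signature(embed(frame))

Checking for a known, deliberately placed tag is a yes/no test; inferring synthesis from artifacts alone is an open-ended adversarial inference problem.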
Conclusion

The framework and explanations described above serve as a necessary prerequisite to establishing formal policy, processes, and institutions for detector exposure and collaboration. Having established the actors and capacities of the misinformation ecosystem, civil society organizations and journalists can begin exploring technical collaboration and sharing of tools developed by the largest technology companies. It may be valuable to apply a similar multistakeholder process to the issues of establishing the veracity of media—proving true as opposed to proving false—and the potential adversarial dynamics around that. Even within detection there are many more areas of difficulty which may need to be addressed; for example, adversarial tactics that make it easier to claim doubt, or harder to claim certainty, instead of attempting to fully defeat detection. There is also the challenge of even defining what counts as synthetic—as almost all media content has some form of synthesis or manipulation—and applying that level of nuance to this analysis. There are some efforts from the fact-checking community, like the MediaReview schema, that are beginning to do this, but there is much work to be done (Benton 2020).

However, even if we address the technical side of detection, which is the focus of this paper, it is crucial to remember that investment in detection tools will do little if those tools are not sufficiently available, understood, or trusted. Access restrictions to detection tools may help mitigate the detection dilemma in wealthier countries—but insufficient investment in vetting organizations and forensics providers could leave stakeholders in poorer countries with little access. While some might argue that this would be a better outcome than having useless detectors worldwide—a plausible outcome with no access restrictions—it behooves us to aim for equity in detection capacity. Future research should explore how to meaningfully equip stakeholders in less resourced environments to leverage detection technologies.

Moreover, while detection tools may enable content to be flagged on one platform or another, there are plenty of non-platform or closed network methods that nefarious actors have for broadly sharing media. Therefore, in addition to investing in tools, we must resource institutions that enable capacity and legitimacy for detection tools. This legitimacy can be built on a combination of clear messaging from tool makers and solid governance of the organizations determining who gets access to such tools and technologies. In sum, navigating the detection dilemma requires follow-up work that evaluates answers to three key questions:

• Who gets access?
• What sort of access do these groups or individuals have?
• How are these groups and individuals chosen?

Future work should look to prior instances where these questions have been answered in ways that succeed and fail, and interrogate the reasons why. We need an answer to why people and organizations should follow any best practices that emerge. In our world of democratized innovation, no single actor can truly control detection access, and there is no perfect answer for ensuring robustness and access to detection tools. There is simply the work of weighing tradeoffs and then building institutional structures and norms that can continue to navigate those uncertain waters—well enough—through collaboration across disciplines and stakeholders.
Acknowledgments

Thanks to Sam Gregory for extensive comments on an earlier document draft. Thanks to Emily Saltz, Jonathan Stray, the PAI AI and Media Integrity Steering Committee, and the entire Partnership on AI staff for their input into this project and resultant paper. Thanks to Paul Baranay for LaTeXing and editing.
References

Fischer, S. 2020. Project Origin is watermarking media to fight fake news ahead of the election. Axios.
Gregory, S. 2018. Prepare, Don't Panic. WITNESS Blog.
Gregory, S. 2019. Deepfakes and Synthetic Media: Updated Survey of Solutions Against Malicious Usages. WITNESS Blog.
Gregory, S. 2020. Tracing trust: Why we must build authenticity infrastructure that works for all. WITNESS Blog.
Hesse, C. 2017. Image-to-Image Demo: Interactive Image Translation with pix2pix-tensorflow. URL https://affinelayer.com/pixsrv/.
Hwang, T. 2020. Deepfakes: A Grounded Threat Assessment. Center for Security and Emerging Technology.
Isola, P.; Zhu, J.-Y.; Zhou, T.; and Efros, A. A. 2017. Image-to-Image Translation with Conditional Adversarial Networks. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE. doi:10.1109/cvpr.2017.632. URL https://doi.org/10.1109/cvpr.2017.632.
Leibowicz, C. 2020. The Deepfake Detection Challenge: Insights and Recommendations for AI and Media Integrity. PAI Research Paper.
Leibowicz, C.; Adler, S.; and Eckersley, P. 2019. When Is It Appropriate to Publish High-Stakes AI Research? PAI Blog.
Leibowicz, C.; Stray, J.; and Saltz, E. 2020. Manipulated Media Detection Requires More Than Tools: Community Insights on What's Needed. PAI Blog.
Neekhara, P.; Dolhansky, B.; Bitton, J.; and Ferrer, C. C. 2020. Adversarial Threats to DeepFake Detection: A Practical Perspective. arXiv preprint arXiv:2011.09957.
Ovadya, A. 2019. Making deepfake tools doesn't have to be irresponsible. Here's how. MIT Technology Review.
Ovadya, A. 2020. Making Sense of Deepfake Mitigations. Medium.
Ovadya, A.; and Whittlestone, J. 2019. Reducing Malicious Use of Synthetic Media Research: Considerations and Potential Release Practices for Machine Learning. arXiv e-prints.
Petcher, A.; and Morrisett, G. 2015. The Foundational Cryptography Framework. In International Conference on Principles of Security and Trust, 53–72. Springer.
Rosenthol, L.; Parsons, A.; Scouten, E.; Aythora, J.; MacCormack, B.; England, P.; Levallee, M.; Dotan, J.; Hanna, S.; Farid, H.; and Gregory, S. 2020. The Content Authenticity Initiative: Setting the Standard for Digital Content Attribution. Adobe Whitepaper.
Roth, Y.; and Achuthan, A. 2020. Building rules in public: Our approach to synthetic and manipulated media. Twitter Blog.
Saltz, E.; Coleman, L.; and Leibowicz, C. 2020. Making AI Art Responsibly: A Field Guide. Medium.
Saltz, E.; Leibowicz, C.; and Wardle, C. 2020. Encounters with Visual Misinformation and Labels Across Platforms: An Interview and Diary Study to Inform Ecosystem Approaches to Misinformation Interventions.
Verdoliva, L. 2020. Media Forensics and DeepFakes: An Overview. IEEE Journal of Selected Topics in Signal Processing.
Wardle, C. 2019. Understanding Information Disorder. First Draft News.
Yang, X.; Li, Y.; Qi, H.; and Lyu, S. 2019. Exposing GAN-Synthesized Faces Using Landmark Locations. In Proceedings of the ACM Workshop on Information Hiding and Multimedia Security.
Yu, N.; Skripniuk, V.; and Abdelnabi, S. 2020. Artificial GAN Fingerprints: Rooting Deepfake Attribution in Training Data. arXiv e-prints.