AMP: Authentication of Media via Provenance
Paul England, Henrique S. Malvar, Eric Horvitz, Jack W. Stokes, Cédric Fournet, Rebecca Burke-Aguero, Amaury Chamayou, Sylvan Clebsch, Manuel Costa, John Deutscher, Shabnam Erfani, Matt Gaylor, Andrew Jenks, Kevin Kane, Elissa Redmiles, Alex Shamis, Isha Sharma, Sam Wenker, Anika Zaman
Microsoft
Abstract—Advances in graphics and machine learning have led to the general availability of easy-to-use tools for modifying and synthesizing media. The proliferation of these tools threatens to cast doubt on the veracity of all media. One approach to thwarting the flow of fake media is to detect modified or synthesized media through machine learning methods. While detection may help in the short term, we believe that it is destined to fail as the quality of fake media generation continues to improve. Soon, neither humans nor algorithms will be able to reliably distinguish fake versus real content. Thus, pipelines for assuring the source and integrity of media will be required, and increasingly relied upon. We propose AMP, a system that ensures the authentication of media via certifying provenance. AMP creates one or more publisher-signed manifests for a media instance uploaded by a content provider. These manifests are stored in a database allowing fast lookup from applications such as browsers. For reference, the manifests are also registered and signed by a permissioned ledger, implemented using the Confidential Consortium Framework (CCF). CCF employs both software and hardware techniques to ensure the integrity and transparency of all registered manifests. AMP, through its use of CCF, enables a consortium of media providers to govern the service while making all its operations auditable. The authenticity of the media can be communicated to the user via visual elements in the browser, indicating that an AMP manifest has been successfully located and verified.
I. INTRODUCTION
Advances in graphics and machine learning have enabled the creation and distribution of easy-to-use tools for synthesizing fake media. These tools enable non-expert users to modify or synthesize audiovisual media that looks convincingly real. Although subtle artifacts may be detected in some cases by experts or by statistical classifiers developed with machine learning, we expect that the march of technical advances will soon make it impossible to distinguish fake media from real. Tools for media synthesis, coupled with wide-scale distribution of social media, threaten to cause harm to individuals, institutions, and nations. More generally, widespread distribution of fake media has the potential to undermine society's trust in the veracity of all media. With the rise of fake media, what can be done to protect the veracity of media and provide a pathway to trust?

We are pursuing an answer by providing users with reliable information about the source and authenticity of a media object, through a verifiable and trustworthy media authentication service. That should allow the consumer to rely on the reputation of the media producer to make informed decisions about the media's trustworthiness. For example, a media company or publisher can attest that it published a work in accordance with its editorial standards, or that content was captured at a certain location and time by cameras in the hands of a trusted reporting team.

The simplest building block for proving provenance is to sign the media object digitally. However, the variety of mechanisms for media distribution, many of which modify the media files or streams, means that maintaining digital signatures is difficult. Additional challenges are also involved. For example, in a typical redistribution scenario, media content is re-encoded by a content distribution network (CDN). Such re-encodings are needed to address variations in channel bandwidth, rendering device resolution, and other constraints.
To preserve provenance information, certificates must be tracked and re-inserted for each transformation. We present a practical system named AMP (for authentication of media via provenance) aimed at providing robust verification of provenance while supporting a wide variety of production and distribution scenarios at Internet scale. We propose AMP as an approach to mitigating the negative societal impact of fake/synthetic media, based on certifiable provenance. The AMP effort brings together expertise in security and media, leveraging advances in cryptography, watermarking, and recently released cloud security and ledger services.

Threats to the integrity of sources include the use of a range of techniques, from simple modifications of timing to more sophisticated uses of graphics and generative models, for manipulating or synthesizing audiovisual content that is perceived by consumers as capturing actual events. Approaches to securing media from a reputable provider to its consumption include (1) strong authentication and (2) fragile watermarking. A complementary approach involves (3) the detection of manipulation or synthesis via pattern recognition employing machine-learned classifiers. Additional opportunities include (4) event-certification methods for certifying that media as captured is linked to actual physical events, rooted in activities that are certified via a combination of methods to have occurred at a time and place. With AMP, we focus on securing media based on the joint use of (1) and (2), thereby providing the certification of the identity of the media provider.

The AMP system consists of four main modules: the AMP Service, the Media Provenance Ledger, the Manifest Database, and the AMP Authoring Tools. AMP authenticates media using a digitally signed data structure called a manifest, and the AMP Service allows content providers to upload their media manifests to AMP. Manifests are registered in the Media Provenance Ledger, which is a public distributed ledger based on the Confidential Consortium Framework (CCF) [1], [2]. Manifests can be distributed together with media contents, whereas the ledger ensures integrity and auditability of the full history of media publishing operations. In addition, manifests are indexed by media fragments in a Manifest Database for fast querying. Once a manifest or group of manifests has been uploaded, media players can then use the AMP Service to validate the authenticity of the corresponding media contents, even if the content is distributed without its manifest. A set of tools allows content providers to interact with the AMP Service when the content is published. In addition to the service and tools, media players (browsers, smartphone applications, etc.) need to be extended to check and display provenance information.

Enabling large-scale media provenance will require the cooperation of multiple participants, including content producers, publishers, and technology providers. We envision AMP supporting a media provenance consortium with open governance rules, where all the governance operations are recorded in the Media Provenance Ledger, for auditability and transparency. We hope that the AMP project will be a starting point for broadly adopted standards for media provenance verification.

In this paper we make the following novel contributions:
1) We describe an end-to-end solution for media producers to provide provenance information for each media item produced.
2) We describe and benchmark a novel system to track the provenance of videos uploaded to the Internet.
3) We describe how these videos can be distributed via CDNs or social media platforms while maintaining the required provenance information, without requiring coordination with the CDN providers.
4) We show how the use of novel ledger techniques can scale out to handle the majority of media items produced for distribution on the Internet.

II. AMP SYSTEM OVERVIEW
We provide an overview in this section of the core AMP concepts and how they are composed to form an end-to-end media authentication and verification system. Figure 1 illustrates how the AMP components are integrated into a production, distribution, and rendering pipeline. A content provider uses the AMP Authoring Tools to create signed manifests and register them as part of publication, so that the content can be authenticated by the AMP Service. The manifests can be created locally by the publisher, if it does not want to upload the media to the backend, or alternatively in the AMP Service itself. Other organizations such as a CDN, social media platform, or Internet service provider (ISP) can similarly record transformations that they apply to the content provider's original media content, using the AMP Authoring Tools. The AMP Service records the resulting publication metadata in a manifest, signed by the provider (or the transformer), and stored in a Manifest Database (DB) for fast verification. One or more cryptographic hashes of the media content are also stored in a verifiable ledger, called the Media Provenance Ledger, using CCF. Finally, consumer applications such as browsers, web sites, or media players use AMP manifests and libraries for verifying (i.e., authenticating) that a media item indicated as coming from a content provider has been previously registered in the AMP Service by that provider.

Fig. 1. High-level overview of the AMP system components (shown in blue). [Figure: a media publisher, CDN, ISP, or social media platform uses the AMP Authoring Tools on the client side; the backend AMP Service, running in a trusted execution environment, comprises the Manifest Database and the Media Provenance Ledger; players, viewers, and web pages (browser, photo tool, etc.) use the AMP client software and browser extension.]
A. AMP System Components
AMP Manifest.
The manifest is the central data structure in AMP. It authenticates media objects (including various cryptographic hashes of their encodings) and binds them to their publisher-provided metadata. Manifests support simple media objects, streaming media, progressive download, and adaptive bitrate streaming. A manifest can also record the attribution of derived works through "back-pointers" to one or more source objects, as well as descriptions of how the original works were transformed.

AMP includes two different sets of modules, one for a near-term solution and another for a long-term solution. The near-term modules indicate provenance using a detached manifest and can work with today's media formats and infrastructure. The long-term solution uses an embedded manifest, which is included in the media stream itself, but it will ultimately require extensions to both media and browser standards.
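As a concrete illustration, the manifest fields listed in Table I can be sketched as a simple data structure. This is a sketch only: the real AMP schema (described in the appendix) carries additional fields, and the Python names below are our own mapping of the Table I field names.

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative mapping of the Table I fields; not the actual AMP schema.
@dataclass
class Manifest:
    media_id: str                 # MediaID: publisher-assigned identifier
    master_copy_locator: str      # MasterCopyLocator: publisher URI
    encoding_information: str     # e.g., "JPEG", "MP4"
    origin_manifests: List[str] = field(default_factory=list)  # back-pointers
    copyright: str = ""           # Copyright string
    object_hashes: List[str] = field(default_factory=list)     # static only
    chunk_digest: List[str] = field(default_factory=list)      # streaming only

    def is_streaming(self) -> bool:
        # A streaming manifest carries a ChunkDigest instead of ObjectHash[].
        return len(self.chunk_digest) > 0
```

A static manifest populates `object_hashes`, while a streaming manifest populates `chunk_digest`; the remaining fields are shared.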
Media Provenance Ledger.
Manifests are recorded on a CCF blockchain. CCF operates the ledger (i.e., blockchain) of published works, which is essentially a list of manifests, relying on trusted hardware and providing high availability via the Raft [3] consensus protocol. Our implementation of CCF supports the registration of new manifests and issues signed manifest receipts. These receipts complement the producer's signatures; they enable any media consumer to independently verify that the media they receive has been published with the corresponding metadata. CCF natively supports online querying and validation of transactions along with their endorsing certificates.
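To illustrate why a receipt lets a consumer verify registration without contacting the service, the following sketch builds a binary Merkle tree over manifest hashes and checks an inclusion proof against the root. This is a simplified stand-in, not CCF's actual receipt format, which additionally covers the transaction and the service's signature over the root.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Root of a binary Merkle tree over leaf hashes, duplicating the
    last node at odd levels. A simplified stand-in for CCF's tree."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

def inclusion_proof(leaves, index):
    """Sibling hashes needed to recompute the root from leaves[index]."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        proof.append((level[index ^ 1], index % 2 == 0))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_receipt(leaf, proof, root) -> bool:
    """Recompute the root from one leaf and its sibling path."""
    acc = h(leaf)
    for sibling, leaf_is_left in proof:
        acc = h(acc + sibling) if leaf_is_left else h(sibling + acc)
    return acc == root
```

A verifier holding only the manifest, the proof, and the signed root can thus confirm that the manifest was recorded, with no round-trip to the ledger.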
Manifest Database.
In the long term, we expect manifests to be distributed with the media objects themselves, so that their provenance can be verified locally. To support a gradual transition, and to withstand the distribution of media without their associated manifests (e.g., media streamed from YouTube), AMP maintains an indexed manifest database, so that clients can retrieve manifests given media excerpts.
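The lookup path can be sketched as an in-memory index keyed the same way as the Manifest Database described in Section VII; the field names follow Table I, while the class itself is purely illustrative.

```python
import hashlib
from collections import defaultdict

class ManifestIndex:
    """In-memory sketch of the Manifest Database indexes: manifests are
    retrievable by MediaID or by any object/chunk hash they authenticate."""

    def __init__(self):
        self.by_media_id = {}
        self.by_hash = defaultdict(list)

    def register(self, manifest: dict):
        self.by_media_id[manifest["MediaID"]] = manifest
        for hx in manifest.get("ObjectHash", []) + manifest.get("ChunkDigest", []):
            self.by_hash[hx].append(manifest)

    def lookup_chunk(self, chunk: bytes):
        # A client hashes the media excerpt it received and queries the index.
        return self.by_hash.get(hashlib.sha256(chunk).hexdigest(), [])
```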
AMP Service.
The AMP Service exposes the Manifest Database and the Media Provenance Ledger to client applications through a set of REST APIs.
AMP Tools and Libraries.
We also provide a set of tools and libraries for interacting with the AMP Service. The tools cover: (a) the creation, signing, and ingestion of content/manifests into the AMP system, (b) querying the AMP system for media authentication information and checking that media objects are intact, and (c) AMP service governance (adding/removing members and users, etc.).
Fragile Watermarking.
In many cases, the media will be transformed without registering a manifest that records the transformation. To facilitate the retrieval of a manifest for the original media object, the publisher can insert a watermark using the AMP Watermark Tool. This watermark carries a unique manifest identifier, which may be used to retrieve the original contents and metadata, and to compare them with the transformed media.
User Experience Components.
To provide a good user experience, AMP includes three components: two variants of a browser extension, two demonstration web pages, and a modified Chromium browser capable of displaying a new variant of an HTML video element.
Implementation.
AMP has been implemented to run on Windows and Linux (Ubuntu LTS 18.04). The Media Provenance Ledger has been developed and tested on Ubuntu 18.04, since the CCF framework currently supports only this version of Linux. The core AMP components are primarily implemented in C

III. MANIFESTS
An AMP manifest is a data structure that cryptographically authenticates media objects and their associated metadata. Manifests are registered on the Media Provenance Ledger (Section V), optionally distributed by media providers and distributors, and recorded in a complementary Manifest Database (Section VII). The purpose of manifests is to allow media player clients to quickly and easily verify the publisher (and possibly the distributor) of a media object. The values stored in the manifest data structure are generated by the content provider as it publishes the media object.

AMP supports two types of manifests: static and streaming. A static manifest handles a simple media object (e.g., JPEG) or a collection of objects with different encodings (facsimiles), while a streaming manifest contains an array of cryptographic hashes corresponding to "chunks" of the associated media. For example, a chunk might correspond to one or more seconds of video or audio.

AMP manifests can be used to authenticate the original source material, or the transformation from one format to another. Note that checking whether a transformation is faithful is not discussed here.

AMP manifests are signed by publishers, CDNs, etc. The cryptographic hash of a manifest is called its AMP manifest ID (ManifestID). It serves as a unique identifier and a commitment for the manifest. ManifestIDs are also digitally signed by content producers or distributors, and recorded on the ledger. AMP uses X.509 to create all digital signatures and SHA-256 for all cryptographic hashes.
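The ManifestID computation can be sketched as follows. Note that the concrete manifest serialization is not specified here, so canonical JSON (sorted keys, no whitespace) is an assumption made for the example; only the use of SHA-256 is taken from the text above.

```python
import hashlib
import json

def manifest_id(manifest: dict) -> str:
    """ManifestID: SHA-256 over the serialized manifest. The serialization
    below (canonical JSON) is assumed for illustration."""
    encoded = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(encoded).hexdigest()
```

Because the keys are sorted before hashing, two manifests with the same fields always yield the same ManifestID, which is what makes the ID usable as a commitment.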
Static Manifests.
A few of the important fields of a static manifest (and the streaming manifest) are provided in Table I. A detailed description of the static manifest can be found in the appendix. The publisher assigns a MediaID to identify a particular media object. In addition, the MediaID is encoded into the media object as a watermark and may also be inserted into the media's metadata.

The EncodingInformation field contains a string which indicates the media type (e.g., "JPEG", "MP4"). This field helps to guard against the media's cryptographic hashes being wrongly interpreted.

AMP manifests can also authenticate media objects that are derived from other media objects by means of "back-pointers" to one or more source manifests. These "transformation manifests" can be used by publishers or CDNs to record transcodings and re-compressions of source material. Transformation manifests can also be used to record the original media objects that were edited together to make a composite derived work.

The value of the OriginManifests field is an array of one or more ManifestIDs that describe the source media used to create a derived work. If a media object is a simple transcoding of another media object, this will be a single-element array. If a media object is created from several source objects (e.g., a news video created from several original media objects), then additional ManifestIDs can be recorded in the array. Note that OriginManifests[] is not authoritative on its own: it should only be trusted if the ManifestID that describes the transform is signed by a trusted authority.

The AMP manifest includes a Copyright field which can be used to provide the copyright string associated with the media object. This field provides a simple and legally enforceable way of limiting fake or misleading manifests. Allowed strings may also be dictated in the AMP terms of service.

In the simplest case (e.g., a picture or a text file), the manifest contains the cryptographic hash of the image or text and its associated metadata in the ObjectHash array field. Optionally, the publisher can create and authenticate more than one encoding of a media object, to optimize for client screen resolutions or network conditions. We call these alternate representations facsimiles.

Streaming Manifests.
AMP authenticates media objects with digital signatures. It is straightforward to do this with text and images: we simply generate the cryptographic hash and then sign picture.jpg or doc.html. Streaming media is more problematic because (a) an application should not have to wait to download the entire file before it can check the signature, (b) streaming services support changing the stream resolution to match network constraints (adaptive bitrate streaming), (c) some transport layers are lossy, and (d) users can often navigate back and forth in streams. These issues imply that AMP must authenticate much smaller regions (i.e., "chunks") of the stream.

All of the fields of the streaming manifest match those of the static manifest in Table I, with the exception of the final field. While a static manifest contains one or more cryptographic hashes of an image or text document in the ObjectHash field, a streaming manifest contains a ChunkDigest, an ordered array of chunk-hashes. The appendix should be consulted for further details of the streaming manifest.

Clients must be able to quickly determine where individual chunks start and end, in order to calculate the cryptographic hashes of the chunks and compare them against the entries in an AMP manifest. Unfortunately, different media formats and network delivery mechanisms require different chunking strategies. The AMP system supports file offset-based chunking, which works well for HTTP GET-based streaming (the most common form on today's Internet). Lossy broadcast streaming requires different chunking strategies, such as I-frame-to-I-frame chunks for an MPEG stream. In practice, streaming players process a cryptographic hash of a chunk every few seconds. In most scenarios, consecutive chunks delivered to the client will map to consecutive chunk-hashes in a single manifest.

TABLE I
KEY MANIFEST FIELDS. A MORE DETAILED DESCRIPTION OF THE MANIFEST STRUCTURES CAN BE FOUND IN THE SUPPLEMENTARY MATERIAL.

Field               | Manifest Type    | Description
MediaID             | Static/Streaming | Publisher-assigned identifier for the media object.
MasterCopyLocator   | Static/Streaming | URI of a stable, publisher-provided location service or a generic URL redirector service.
EncodingInformation | Static/Streaming | String describing the media type (e.g., "JPEG", "MP4").
OriginManifests[]   | Static/Streaming | One or more ManifestIDs that describe the source media used to create a derived work.
Copyright           | Static/Streaming | Copyright string associated with the media object.
ObjectHash[]        | Static           | Cryptographic hash of the associated simple media object (or collection of related media objects).
ChunkDigest         | Streaming        | An ordered array of chunk-hashes starting from the beginning of the work.
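File offset-based chunking can be sketched as follows. The chunk size is an assumption made for the example (the choice is left to the publisher); SHA-256 matches the hash algorithm stated in Section III.

```python
import hashlib

def chunk_digest(data: bytes, chunk_size: int) -> list:
    """ChunkDigest: ordered chunk-hashes over fixed file offsets."""
    return [hashlib.sha256(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]

def verify_chunk(chunk: bytes, index: int, digest: list) -> bool:
    """A player hashes each received chunk and compares it against the
    corresponding manifest entry."""
    return index < len(digest) and hashlib.sha256(chunk).hexdigest() == digest[index]
```

Because each entry is independent, a player can verify chunks as they arrive and can seek to any offset without re-hashing the whole stream.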
However, if a server is dynamically switching streams, then more than one manifest may be needed to authenticate a stream. AMP also supports adaptive bitrate streaming protocols such as DASH and HLS. Adaptive bitrate streaming requires several different encodings of a media object, optimized for different network conditions and client capabilities. Adaptive bitrate streams are supported in AMP either by publishing several manifests authenticating the different encodings, or by using a single manifest that authenticates multiple facsimiles.

Detached and Embedded Manifests.
Initially, before encoding standards can be modified, manifests will be stored separately from the media itself; we call these "detached manifests". In the long term, we hope that "embedded manifests"
will be contained within the media's metadata and be transported within the media stream itself. We have implemented two versions of AMP, utilizing detached manifests and embedded manifests respectively.

Fig. 2. Example Public Key Infrastructure. [Figure: an AMP Alliance root issues intermediate CA credentials to the Transnational Press Syndicate (TPS UK, TPS USA) and the Western Broadcasting Company (WBC North, WBC South); individual credentials (Alice, Bob, Charles, Dee, Ellen, Frank, George, Heather) sit below the organizational units.]

IV. PROVENANCE BINDING
Authenticating that media has not been altered since the manifest was signed demonstrates the media's integrity, but tying the signer to an identity known and trusted by the consumer is what provides provenance, and allows the consumer's trust in that producer to be imputed to the media. We have deployed a public key infrastructure (PKI) of X.509 certificates [4], governed and administered by an alliance, to provide a root of trust for establishing identity. The alliance is then trusted to verify the identity of media-producing organizations and individuals, and to issue credentials from its Certificate Authority (CA) to those organizations and individuals, which can be used to sign manifests and authenticate to the AMP Service. We expect that this responsibility will be delegated to Certificate Authorities, who already provide these services for the authentication of secure web sites. They will perform due diligence in establishing the identity of media-producer applicants for credentials, under contractual obligations to the alliance. The hierarchical nature of a certificate-based identity system allows a single parent credential to be issued to an organization, which can then issue subordinate credentials for individuals or organizational units. The exact structure of the subtree of the PKI for a particular organization is beyond the scope of this design, as it is intended to be customized to the particular needs and structure of each media producer.
Initially, the root(s) of this PKI will be operated by the alliance and disconnected from the roots of trust currently used for the web PKI. Certificates used by participants will be given Extended Key Usage (EKU) extensions authorizing them for particular purposes. We have identified five uses and therefore five EKUs to use in this PKI: 1) server authentication, used by the AMP Service to authenticate itself to clients; 2) client authentication, used by clients to authenticate themselves to the AMP Service; 3) manifest signing, which will be used by producers to sign manifests; 4) time stamping, which will be used by the AMP Service and ledgers to attest to the publication time of a manifest; and 5) ledger registration, which will be used by ledgers to countersign manifests and attest that they have been registered on that ledger. Server authentication, client authentication, and time stamping already have EKUs defined by the standard, and we will use those. Manifest signing and ledger registration are new purposes for which permanent, unique EKUs have not yet been allocated. We expect some certificates will be issued with multiple purposes: for example, the signer of a manifest will frequently be the client who registers it with the
AMP Service, and so may use the same certificate for both purposes. Whether or not to combine these purposes in a single certificate becomes a governance decision for the alliance and media-producing entities, and the structure of our PKI allows for both possibilities.

One possible structure for such a PKI is given in Figure 2. A single root operated by the alliance sits at the top, and issues intermediate CA credentials to each participating organization: in this example, the Transnational Press Syndicate (TPS) and the Western Broadcasting Company (WBC). These organizations each in turn issue further credentials to units of their organization: UK and USA bureaus in the case of TPS, and North and South bureaus in the case of WBC. Below each of these intermediates are individuals, but they are enclosed in a dotted-line box because, as described above, they are optional: an organization may wish to issue signing credentials to individuals, in which case the organizational unit credentials are also intermediate CAs. Alternatively, organizations may wish to maintain centralized publication pipelines, ingest media from individuals through a mechanism external to AMP, and sign as part of this process. In this case, the organizational unit credentials themselves are leaf certificates.

V. MEDIA PROVENANCE LEDGER
AMP implements an instance of CCF [5] to build a ledger-based application which is designed to securely store a cryptographic hash and copyright string for each manifest. Any application built with CCF is designed to be administered by a group of consortium members via CCF's governance features. Additionally, AMP utilizes signed receipts as standalone proof that manifests are registered at a given index in the ledger.

CCF exposes to its users a key-value store. This store provides a simple abstraction in which each key is the cryptographic hash of a manifest (i.e., its ManifestID), and the corresponding value is a signature computed by the publisher over the concatenation of the ManifestID and the copyright string (i.e., Copyright in Table I). Once written, these key-value pairs are stored in a Merkle tree, and the Merkle tree is replicated and stored on persistent storage. To ensure that any tampering can be detected, CCF maintains a private key that the service protects and occasionally uses to sign the Merkle root in the distributed ledger.

One of the core features that AMP utilizes from CCF is its universally verifiable receipts. The receipt for a given request validates the query, its response, and, more importantly, certifies that its execution was recorded on the ledger. The key proposition of a receipt is that it is possible to cryptographically validate that the signature over the manifest's cryptographic hash and the copyright string was successfully recorded, based on just the manifest, the receipt, and the public key of the CCF service [1], without needing to contact the CCF service.

Our prototype is designed to be run in a cloud datacenter. In a real-world implementation, we expect and have designed the service to be run by an operator (such as Azure). CCF's utilization of trusted execution environments allows a CCF service to run in a public cloud while maintaining confidentiality from the cloud provider or operator.

VI. FRAGILE WATERMARKING
We use watermarking to modify the media content in an imperceptible way. Faint noise-like patterns are inserted within the media content at production, and they can be read back at rendering. We tune the watermarking parameters such that media editing that preserves reasonably high fidelity also preserves the detectability of watermarks, whereas heavier editing (such as partial content replacement or fake media insertions [6], [7]) will render the watermark undetectable. Hence the term fragile watermarking.

We propose the use of fragile watermarking techniques based on a spread-spectrum approach [8], which adds low-level pseudorandom noise patterns within the media payload, be it video, audio, or images. The added noise is low enough (comparable to the small distortions due to the compression formats) and can be embedded in such a way that makes it imperceptible to human eyes and ears.

For each type of media and application scenario, we can design watermarking parameters that influence the thresholds on allowed changes, so that various kinds of minor modifications are considered benign editing. In addition, we use keyless watermarking for AMP, which simplifies system design and makes watermark detection open, so it can be performed by any entity in the media distribution path.
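A minimal sketch of the spread-spectrum idea, on a one-dimensional signal: a faint pseudorandom pattern is added at production and recovered by correlation at rendering; heavy edits push the correlation below a threshold, so detection fails. The amplitude, threshold, and seed below are illustrative choices, not AMP's actual watermarking parameters.

```python
import random

def pn_sequence(length: int, seed: int = 42):
    """Keyless +/-1 pseudorandom pattern; the seed is public, so any
    entity in the distribution path can run detection."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(length)]

def embed(signal, bit: int, alpha: float = 0.05, seed: int = 42):
    """Add the faint pattern: +pattern encodes 1, -pattern encodes 0."""
    sign = 1.0 if bit else -1.0
    return [x + alpha * sign * p
            for x, p in zip(signal, pn_sequence(len(signal), seed))]

def detect(signal, alpha: float = 0.05, seed: int = 42):
    """Correlate with the known pattern; a weak correlation (media too
    heavily edited) returns None: the watermark is fragile by design."""
    p = pn_sequence(len(signal), seed)
    corr = sum(x * pi for x, pi in zip(signal, p)) / len(signal)
    if abs(corr) < alpha / 2:
        return None
    return 1 if corr > 0 else 0
```

Benign processing leaves most samples intact, so the correlation stays near +/- alpha; replacing large portions of the content averages the pattern away and detection fails, which is the desired fragility.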
Watermark Payload and Insertion.
The watermark payload string, which is inserted into the media item, is described in Table II and contains the following fields: a media object ID (MediaID), a publisher URI (MasterCopyLocator), and a signature over these two fields (WatermarkPayloadSignature). AMP does not provide a centralized database containing the MasterCopyLocator and MediaID. Instead, after decoding, the client extracts the payload and submits the MediaID to the publisher via the MasterCopyLocator. Both the MediaID and the MasterCopyLocator are specified by the publisher. The MasterCopyLocator is typically a URI for the publisher's Web service, which is used to locate media by their unique MediaID. The watermarking insertion process transforms a media object by embedding a signed watermark before its publication.

TABLE II
WATERMARK PAYLOAD.

Field                     | Description
MediaID                   | Publisher-assigned identifier; same as in Table I.
MasterCopyLocator         | Same as in Table I.
WatermarkPayloadSignature | Signature value over the MediaID and the PublisherID.
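The payload must be packed into a compact byte string before insertion. The paper does not specify a wire encoding, so the length-prefixed layout below is purely illustrative, and the signature is treated as opaque bytes.

```python
import struct

def pack_payload(media_id: str, locator: str, signature: bytes) -> bytes:
    """Length-prefixed encoding of the Table II fields (format assumed
    for illustration; not AMP's actual payload encoding)."""
    parts = [media_id.encode(), locator.encode(), signature]
    return b"".join(struct.pack(">H", len(p)) + p for p in parts)

def unpack_payload(blob: bytes):
    """Recover (MediaID, MasterCopyLocator, signature) from the payload."""
    fields, off = [], 0
    while off < len(blob):
        (n,) = struct.unpack_from(">H", blob, off)
        off += 2
        fields.append(blob[off:off + n])
        off += n
    media_id, locator, signature = fields
    return media_id.decode(), locator.decode(), signature
```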
Watermark Decoding.
The client inputs a media object to the Watermark Verification Module in the AMP libraries to extract the watermark payload fields depicted in Table II. The Watermark Verification Module uses the MasterCopyLocator to obtain a signing certificate. Then, the Watermark Verification Module uses this signing certificate to check the WatermarkPayloadSignature over the MediaID and MasterCopyLocator. If this cryptographic step succeeds, it finally returns the MediaID and the MasterCopyLocator back to the client. Once the client has recovered the MasterCopyLocator and the MediaID, it can then contact the publisher's provenance service to authenticate that the media is valid. Watermark extraction is keyless: either it fails, or it returns the watermark payload.

VII. MANIFEST DATABASE
Ideally, in the future, all AMP manifests and ledger receipts will be delivered as additional metadata with the media objects. Delivering the receipt along with the media allows the client to quickly validate that the media has been previously authenticated without contacting the AMP Service. Widespread adoption of adding the manifest and receipt to the metadata will most likely require support from one or more multimedia standards bodies. In the meantime, a client can use the Manifest Database to map a media object or chunk to a suitable manifest and receipt.

The AMP Manifest Database contains manifests and receipts. It is exposed as a public service that lets clients obtain one or more AMP manifests and receipts that authenticate a published or transcoded media object. To perform this function efficiently, the Manifest Database uses the following indexes: (a) the MediaID delivered via the metadata or a watermark, and (b) the media ObjectHash or, in the case of streaming media, the cryptographic hashes of all of the contained chunks (ChunkDigest). Media players can quickly and easily extract or calculate the ObjectHash or a ChunkDigest from the media, and then use the Manifest Database to find a matching manifest and the corresponding receipt. To validate the legitimacy of any manifest retrieved from the Manifest Database, the following steps must occur:
1) The contents of the manifest are hashed with a predetermined cryptographic hash function.
2) The receipt is checked to ensure that it contains the previously calculated hash.
3) The validator verifies that the receipt is endorsed by the media provenance ledger via a signature over the receipt by the private key of the CCF service.
These steps ensure the validity of the manifest returned by the Manifest Database by proving that it was produced and endorsed by the media provenance ledger.

The Manifest Database can be centralized or distributed. Because the authoritative truth is stored in the ledger, the security requirements for the Manifest Database are much weaker than for the ledger itself. Note that AMP manifests do not address problems that arise from more than one publisher signing the same original content (either the same simple object or one or more ChunkDigests). Similarly, the AMP Service does not stop a rogue CDN from claiming that one media object is a faithful transformation of an original when in fact it has been maliciously authored. We believe that these issues can be addressed by a combination of client policies (e.g., only consider the oldest manifest of a media object) and server-side terms-of-service.
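The three-step validation above can be sketched as follows. This is a minimal sketch: the key and an HMAC-based receipt signature stand in for the CCF service's actual public-key signature scheme, and the receipt field names are illustrative, not the normative AMP encoding.

```python
import hashlib
import hmac
import json

def validate_manifest(manifest_bytes: bytes, receipt: dict, ccf_service_key: bytes) -> bool:
    # 1) Hash the manifest with a predetermined hash function (SHA-256 here).
    manifest_hash = hashlib.sha256(manifest_bytes).hexdigest()
    # 2) Check that the receipt commits to that hash.
    if receipt["manifest_hash"] != manifest_hash:
        return False
    # 3) Check the ledger's endorsement over the receipt contents.
    #    HMAC is a stand-in for the CCF service's public-key signature.
    expected = hmac.new(ccf_service_key, receipt["manifest_hash"].encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])

# Usage: a receipt produced by a (simulated) ledger validates; tampered media does not.
key = b"ccf-demo-key"
manifest = json.dumps({"publisher": "ExamplePub", "work": "clip-01"}).encode()
h = hashlib.sha256(manifest).hexdigest()
receipt = {"manifest_hash": h,
           "signature": hmac.new(key, h.encode(), hashlib.sha256).hexdigest()}
```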
Transformation Services.
A Transformation Service takes one or more media objects and creates a derived object. A CDN is a simple example: a CDN can take a single media object and re-encode it into several derived objects with different compression parameters to optimize for bandwidth and network losses. AMP manifests support transformation services by allowing entities to indicate the ManifestID of one or more source objects that were used to create the derived object.

Note that a transformation manifest does not in itself guarantee that a derived object is indeed a high-fidelity transformation of a source object. It is entirely possible that the "purportedly derived" object is unrelated to the stated original. Trust assessments should involve the entity that signed the transformation manifest. In the simple case, this might be the original publisher; for example, a media publisher creates a master media object and a dozen copies with different compression factors. A more complex example might be a CDN acting on behalf of the media publisher.

Policies can be developed for transitive trust that work for common scenarios. These policies can be enforced with a combination of client- and server-side rules, as well as server-side terms-of-service. Other entities might create and sign transformation manifests. For example, a third-party service might use heuristics to compare the semantic content of two videos, and create and sign transformation manifests for the videos that it determines are semantically identical. Once more, AMP makes no trust assumptions: it is up to clients to use trust policies that are appropriate for a given scenario. In the case of streaming manifests, there is no requirement that source chunks map 1:1 to transformed chunks: chunks are "natural" for each stream.
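The linkage between a derived object and its sources can be sketched as follows. The field names (`signer`, `object_hash`, `source_works`) are hypothetical, and canonical JSON stands in for the CBOR encoding from which the paper derives the ManifestID:

```python
import hashlib
import json

def manifest_id(manifest: dict) -> str:
    # The paper derives the ManifestID from a hash of the encoded manifest;
    # canonical JSON stands in here for the CBOR encoding used by AMP.
    encoded = json.dumps(manifest, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()

def make_transformation_manifest(source_manifests, derived_object: bytes, signer: str) -> dict:
    # A derived-object manifest points back at its sources via their ManifestIDs.
    return {
        "signer": signer,
        "object_hash": hashlib.sha256(derived_object).hexdigest(),
        "source_works": [manifest_id(m) for m in source_manifests],
    }

# Usage: a CDN derives a re-encoded copy and records the original's ManifestID.
original = {"signer": "publisher",
            "object_hash": hashlib.sha256(b"master.mp4").hexdigest(),
            "source_works": []}
derived = make_transformation_manifest([original], b"master_480p.mp4", signer="cdn.example")
```

As the text notes, this linkage alone does not prove fidelity; a client still decides whether to trust the signer of the transformation manifest.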
Manifest Revocation.
As noted previously, CCF's ledger is immutable; once a manifest is stored on the CCF ledger, it cannot be removed. Therefore, when a publisher wants to revoke a manifest, it must insert a revocation object into the ledger. To enable efficient queries, the Manifest Database deletes the revoked manifest in this case.

VIII. USER EXPERIENCE
User experience is a critical part of the AMP system. AMP provides three separate components to facilitate a good user experience: a modified browser, browser extensions, and example web sites.
A. Modified Chromium Browser
For media with the embedded manifest, we first created a modified Edge Chromium browser which included a modified video element. This modification was done to evaluate the embedded manifest included in a modified video stream.
B. Provenance Browser Extensions
AMP includes separate browser extensions for displaying media using detached manifests and embedded manifests. The goal of the detached manifest is to allow authentication of the media without any modifications to the standardized media, the browser, or the media transport infrastructure. To do so, we have created a browser extension that works on both the Chrome and latest Edge Chromium browsers.

To support media authentication, the AMP detached-manifest browser extension monitors the web traffic for a particular site using the webRequest API. Typically, a media player embedded in either a browser or an application streams the media for playback or rendering using HTTP partial-GETs. This process fetches data from the media server based on different protocols, which may vary depending upon network quality conditions. Since AMP's detached manifest requires hashes to be computed on fixed chunk sizes so that the hashes of the received media match those in the manifest, AMP's browser extension must stream a second copy of the data. For demonstration purposes, we have implemented the browser extension to authenticate any video on YouTube after a manifest for that video has been uploaded to the AMP Service. An example of this browser extension is shown in Figure 3 for the case of a video that has been registered in the manifest. For this authenticated video, the browser extension displays a green check mark indicating that the video's manifest has indeed been located in the Manifest Database. If the video cannot be authenticated, the browser extension icon is blank. If a video can be authenticated, then clicking on the browser extension icon displays the core manifest information associated with its publication.

The browser extension for the embedded manifest has a similar user interface. However, the detection mechanism is much simpler than for the detached manifest because the browser signal is generated by the new video element in the modified Edge Chromium browser.
C. Example Web Sites
While browser extensions are able to convey whether a single video being played has been authenticated, they have difficulty conveying to a user whether two or more videos can be authenticated while being played simultaneously.

Fig. 3. Browser extension which supports a detached manifest. The green check mark in the upper right indicates that the video has been authenticated. Clicking on the browser extension icon causes the core manifest information to be displayed in the popup window.

Fig. 4. Example of a synthetic social media site that can be used to display fine-grained provenance signals.

Another challenge is that it may be difficult for the user to understand which video on a web page containing multiple videos is being authenticated. In this case, it may be useful for the web page developer to embed the provenance signal directly into the web page. To this end, we have developed two demonstration web sites to display AMP's authentication information in a fine-grained setting. The first is a synthetic social media web site shown in Figure 4, and the second is an example news site depicted in Figure 5.

IX. TOOLS AND GOVERNANCE
There are two parts to the authoring and management back-end. The first part supports the publishing flow. We have developed tools that create a signed manifest (AMP Manifest Creation Tool, AMP Signing Tool), watermark the media (AMP Watermark Tool), and record the manifest on a ledger (AMP Ledger Insertion Tool). The AMP Client Provenance Library can be used by the client application to chunk a video and compute the cryptographic hashes of these chunks. These tools and tool-chains can be used by an ISP, CDN, or another media editing tool to support "authenticated transformations" of an original work, as well as tools that allow authentication information to be added to legacy media (e.g., videos already hosted on YouTube).

Fig. 5. Demonstration news web site for displaying AMP's provenance signals.

Fig. 6. Integration of AMP tools and services into a media production and distribution pipeline.

The second part of the authoring back-end relates to governance. We use the Microsoft CCF (Confidential Consortium Framework) technology to maintain a ledger of published works and provide a governance model over it. CCF provides a flexible governance model, allowing a group of members to vote on everything from adding and removing users to updating the CCF service code. If AMP is adopted to provide media provenance, we will collaborate with our media partners at that time to create a governance model. When additional partners join the partnership, we will use CCF to evolve the governance rules as required.

X. EXAMPLE MEDIA PUBLISHING FLOW
The purpose and operation of the various AMP components are demonstrated by tracing a typical flow of media through the system. A typical flow is depicted in Figure 6. The media publishing flow consists of two phases: publishing and playback. We present below how various AMP Service components can be used during the publishing and playback phases.
Publishing
Assume a content producer generates two media objects: picture.jpg and video.mp4. The publisher:
1) Uses ffmpeg to convert video.mp4 into a set of re-compressions, video[n].mp4, at various quality levels (e.g., using DASH).
2) Generates a collection of unique ObjectIDs for the objects to be authenticated.
3) Uses the AMP Watermarking Tool to insert encoded versions of the ObjectIDs, their PublisherID, and the WatermarkPayloadSignature into the watermark payload of picture.jpg and all videos that are to be published.
4) Uses the AMP Manifest Creation Tool to create a manifest for the media objects.
5) Uses the AMP Signing Tool to sign the manifest with the publisher's key.
6) Registers the manifest's cryptographic hash and copyright string with the Media Provenance Ledger using the AMP Ledger Insertion Tool.
7) Uploads the manifests to the AMP Manifest Database using the AMP Ledger Insertion Tool.
8) Broadcasts (i.e., stages on a web site, etc.) picture.jpg and video[n].mp4.
Optional step for CDNs, ISPs, etc.:
1) CDNs take video[n].mp4 and picture.jpg and create further derived copies using steps 4 through 8, except that these manifests refer back to the original AMP manifest.
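Steps 2 and 4-6 of the publishing flow above can be sketched as follows. This is a sketch under stated assumptions: HMAC stands in for the publisher's public-key signature, a Python list stands in for the Media Provenance Ledger, and all field names are illustrative:

```python
import hashlib
import hmac
import json
import uuid

def publish(media: bytes, publisher_id: str, signing_key: bytes, ledger: list) -> dict:
    object_id = str(uuid.uuid4())                       # step 2: unique ObjectID
    manifest = {                                        # step 4: create the manifest
        "publisher": publisher_id,
        "object_id": object_id,
        "object_hash": hashlib.sha256(media).hexdigest(),
    }
    encoded = json.dumps(manifest, sort_keys=True).encode()
    # Step 5: sign the manifest (HMAC as a stand-in for the publisher's key).
    manifest["signature"] = hmac.new(signing_key, encoded, hashlib.sha256).hexdigest()
    # Step 6: register the manifest's hash and a copyright string on the ledger.
    manifest_id = hashlib.sha256(encoded).hexdigest()
    ledger.append({"manifest_id": manifest_id,
                   "copyright": f"Copyright (c) {publisher_id}. All rights reserved."})
    return manifest

# Usage: publish one media object and record it on the (simulated) ledger.
ledger = []
m = publish(b"video-bytes", "ExamplePublisher", b"demo-key", ledger)
```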
Playback
A client application (e.g., browser, media player, etc.):
1) Links to the AMP Client Provenance Library, or implements the functionality itself, to cryptographically hash a video object's "chunks" or a simple media object (e.g., JPEG, text).
2) Consults the AMP Service to obtain a suitable manifest or manifests.
3) Verifies the publisher's signature and the receipt generated by CCF to ensure that the manifest is valid. Successful verification ensures the authenticity of the media.
4) Displays the authentication information (simple or more complex information) if the media is authenticated.
5) Searches for a watermark in the media if a valid manifest is not found in the Manifest Database. It then attempts to validate the media object based on the PublisherID and ObjectID if the WatermarkPayloadSignature is valid.

XI. PERFORMANCE EVALUATION
Media Provenance Ledger.
We begin by measuring the time required to insert a manifest's relevant data into the Media Provenance Ledger. In this test, we insert strings, which consist of an example 256-bit cryptographic hash of a manifest (ManifestID) and a copyright string, into the ledger. These data structures do not need to be addressable in CCF, since the fact that they are recorded in the ledger is sufficient. To this end, we measure the maximum sustainable rate at which a manifest's data can be submitted.
Application.
We built a C++ application that customizes the CCF framework to produce a Media Provenance Ledger. The ledger application is small and can be expressed in several hundred lines of C++ code. The following is an example of the data that the ledger application stores:

{"method": "LOG_record",
 "params": {"id": 0,
  "msg": "88c3ba2b25cef698d9ca6775b7fd5c5e8bbc246098a55ad51b8078834c4add44 Copyright (c) CompanyName Corporation. All rights reserved."}}
Experimental Setup.
We ran the performance application in three cluster configurations:
1) Single Azure Region - Each computer has an Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz, and the application runs inside a 4-core virtual machine.
2) Each computer has an Intel(R) Xeon(R) E-2176G CPU @ 3.70GHz, and the application runs inside a 4-core virtual machine. The computers are evenly distributed between the east USA and west Europe Azure regions.
3) Emerging hardware - A cluster running in our own datacenter. All computers are under the same 40G switch, and each computer has an Intel(R) Xeon(R) E-2288G CPU @ 3.70GHz with 8 cores.
All of these VMs run Ubuntu 18.04, and the results are shown in Table III. We expect up to 1 billion entries to be added to the ledger every day, which results in an expected load of 11,575 operations per second. We can conclude from these results that our implementation of the Media Provenance Ledger can comfortably handle this load. Even with just a few nodes, we can achieve latencies that are low enough not to interfere with the user's experience in media consumption.
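The expected load figure above follows from simple arithmetic over a 24-hour day:

```python
# Back-of-the-envelope check of the expected ledger load: one billion
# manifest insertions per day spread evenly over 86,400 seconds.
entries_per_day = 1_000_000_000
seconds_per_day = 24 * 60 * 60          # 86,400
ops_per_second = entries_per_day / seconds_per_day
print(round(ops_per_second))            # about 11,574 ops/s, matching the figure above
```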
AMP Service.
Next, we estimate the maximum scale requirements for the AMP Service, assuming the following parameters:
• Every publisher uploads 100 ten-minute original video clips each day.
• Each video is divided into 10-second chunks (10 minutes is 60 chunks), and each chunk is cryptographically hashed.
• Each original video is transformed into 99 (100-1) variants by the CDN.
Using these parameters, this translates into:
• 370 million original videos/year
• 37 billion original and transformed videos/year
• 22 billion original chunks/year

TABLE III. MEDIA PROVENANCE LEDGER THROUGHPUT AND LATENCY. (For each of the three configurations, the table reports throughput in tx/s, average latency in ms, and the number of nodes.)

TABLE IV. OBJECTIVE DIFFERENCE GRADE SCORES FOR AUDIO WATERMARKING OF FOUR DIFFERENT YOUTUBE VIDEOS.

YouTube ID     ODG Mean   ODG Std
XFmn9kmZAWU    -0.74      0.0066
xn 8UQ1W6 c    -1.51      0.0043
bF nULoyi9o    -0.99      0.0039
iuX826AGXWU    -1.26      0.0026

The resulting index of known chunks is 1.4 TBytes. Azure offers VMs with enough memory and disk to hold the index in a single instance, and therefore the index will not require sharding. If the AMP Service exceeds these estimates, we can shard the index. Scaling through sharding is easy: the indices are cryptographic hashes, so they will be uniformly distributed. Therefore, we believe that it will be practical to have the Manifest Database indexed on chunk-hashes.
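Because the index keys are cryptographic hashes and therefore uniformly distributed, assigning a chunk hash to a shard reduces to modular arithmetic, as in this sketch (the shard count of 16 is an arbitrary illustrative choice):

```python
import hashlib

NUM_SHARDS = 16  # arbitrary choice for illustration

def shard_for(chunk_hash_hex: str, num_shards: int = NUM_SHARDS) -> int:
    # Interpret a prefix of the hash as an integer and reduce modulo the
    # shard count; uniform hashes give a balanced load across shards.
    return int(chunk_hash_hex[:16], 16) % num_shards

# Usage: route a chunk digest to its shard.
h = hashlib.sha256(b"some media chunk").hexdigest()
shard = shard_for(h)
```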
Audio Watermarking.
AMP's audio watermarking module inserts a watermark into the frequency-domain coefficients of the audio signal. It is important to measure the distortion introduced by the watermark, as we want it to be imperceptible. Table IV reports the Objective Difference Grade (ODG) [9] for the audio channel of four different YouTube videos. The ODG ranges from 0 (no distortion) to -4 (high perceptual distortion). The mean and standard deviation are computed over five different trials, with 1000 random bits of information inserted using 512 chips per information bit. Preliminary experiments show that watermarking generates no audible distortions.

XII. AMP PARTNERSHIP AND STANDARDS
We believe that the proposed AMP media provenance certification and verification system can only be successful if it becomes a widely adopted industry standard. Thus, we are forming a partnership with media organizations and additional technology providers. We plan to put this collaboration on a formal footing through the formation of an industry alliance similar to the Alliance for Open Media. Other companies can join, either as active contributors or supporters. Such a model can move quickly toward ratification of a more detailed design, with the goal of developing reference code and performing sufficient testing to assess the efficiency and performance of the proposed provenance certification and verification system. A key goal of such an effort should be to promote the development of an open standard, to motivate fast adoption. We believe that the implementation of a reliable provenance certification and verification system can be a significant step in increasing trust in media. It will also benefit the business models of all bona fide entities involved in the creation and distribution of media.

XIII. DISCUSSIONS
It will take a number of years before manifests for a large percentage of online media are stored in AMP. We believe this content gap, and the resulting inability to report on the authenticity of media, will be the biggest issue with adoption. We expect that this can be solved in the user interface, such that users are only informed when there is valuable information to provide to them. At the point when most consumed media does have authentication, it would become prudent to report that authentication for some media is missing. Another direction for future work is understanding how provenance information should best be conveyed to the user as another heuristic for evaluating content credibility [10], [11].

One area that AMP does not address is the detection of fake media. We believe that the quality of fake media will rapidly improve and that fake media will become more widely encountered. Additional fake-media detection algorithms will need to be incorporated into the media processing pipeline in the near future. A number of academic and industry efforts are currently underway to improve the detection of deep fakes. We see this work as orthogonal to the provenance solution proposed by AMP, and these detection methods can also be included in the future as part of the AMP Service.

We have designed AMP to authenticate that a media item was published by a known source. AMP is not a digital rights management (DRM) system designed to enforce the copyright of media content providers. Media provenance and AMP are about verifying the producing entity, not verifying/tracking/authorizing the consuming entity. While it is possible to use AMP in this way, functionality such as self-verifiable receipts would work against it, and this is a property we do not intend to change.

XIV. RELATED WORK
Previous research related to the AMP system and effort spans three main areas: previously proposed provenance systems, other provenance partnerships, and deep fake detection and content generation.
Provenance Systems. Provenance systems for the prevention of deep fakes are a new and relatively understudied area. The provenance-based system most closely related to AMP was recently proposed by Hasan and Salah [12]. Like AMP, this system also employs a blockchain. However, it is based on the Ethereum blockchain and smart contracts. Since AMP utilizes CCF, it is much more efficient, speeding up manifest insertion and queries by several orders of magnitude, which is required for widespread deployment.

In addition to [12], several startups have proposed provenance-based systems, including Amber and Truepic. Amber's technology [13], [14] is aimed at camera manufacturers and adds a cryptographic hash to the video at a user-specified rate. Similar to AMP, these hashes are then stored on an Ethereum blockchain. Similarly, Truepic [15] also provides a photo verification service where the cryptographic signature is written to a blockchain.
Provenance Partnerships. Several other partnerships have been created to ensure the provenance of media. The New York Times Company is working with IBM on The News Provenance Project [16]. This collaboration is also using a blockchain to provide a provenance solution for media. The Content Authenticity Initiative is a second partnership, between Adobe, The New York Times Company, and Twitter [17]. Witness is a non-governmental organization which aims to help ensure that human rights abuses can be documented in a verifiable manner. Witness published the ProofMode Android application [18] in 2017, which stores metadata about images and videos taken by those seeking to provide evidence of human rights abuses. The app includes a hash of the media and its metadata, along with a cryptographic signature that helps to ensure the chain of custody.
Deep Fake Detection. Deep fake detection is an alternative to provenance solutions and relies on the algorithmic detection of synthetically generated media. A number of deep fake detection algorithms have been proposed in the literature. In [19], Li et al. observed that deep fake videos created prior to the paper's publication in 2018 often had eyes that failed to blink, whereas blinking is natural for humans. Thus, they created an eye-blink detector and used it as a proxy to detect deep fake videos.

McCloskey and Albright [20] noted that generative adversarial networks (GANs) fail to accurately reproduce colors that are captured naturally by photosensitive cells in a camera's sensor. Their approach to detecting deep fakes is to train a convolutional neural network (CNN) to detect this mismatch in color.

Face warping artifacts can be introduced during the generation of deep fake videos. Li and Lyu trained a CNN to detect these artifacts and thereby detect some types of deep fake attacks in [21]. Similarly, Yang et al. [22] also trained a CNN to detect inconsistencies in head poses. In [23], Korshunov and Marcel explore jointly using the audio and video, but their experiments indicated that adding the audio did not help.

In the FaceForensics++ system proposed by Rössler et al. [24], the Xception computer vision object recognition model, which also employs CNNs, was used to detect various types of deep fakes. A leader board of deep fake detection algorithms on the FaceForensics++ dataset can be found at [25].
Content Generation. Generative adversarial networks were originally proposed by Goodfellow et al. [26]. Several important works [27], [28] have investigated using GANs for large-scale synthetic image generation. Recent research in GANs has enabled talking-head models to be quickly adapted with just a few frames [29]. Popular face-swap algorithms include Deepfakes [6] and FaceSwap [7]. Facial expressions can be transferred from one person to a person in a video in real time using the Face2Face algorithm [30].

XV. CONCLUSION
Synthesized and fake media have become a threat to individuals and to private and public institutions. The threat has increased because of rapid advances in methods for synthesizing media, coupled with the wide reach of the Web. Fake media has the potential to significantly undermine trust in media and journalism, threatening the foundations of democracy. We believe that algorithms for fake media detection will have limited success in the long term, so we propose the use of provenance certification and authentication, as that is a fundamental step in increasing trust.

We have proposed, designed, and built a prototype of the AMP system. AMP allows trusted content providers to form one or more consortiums that allow applications such as a media player or a browser to provide an indication to users that the source of the content they are viewing has been verified. Beyond the core security pipeline, human factors and design will play an important role in the success of AMP. Inspired by the TLS lock icon, we propose that applications such as browsers and media players include UI elements to alert users that the received content can be traced back to its original source.

For a provenance solution such as AMP to be successful, it must be formally adopted by a recognized standards body. We are seeking the development of such standards for the AMP system or a variant that provides similar functionality. We also believe that it is important to open source the code for a widely used provenance system. Thus, we plan to open source the AMP system in the near future to facilitate its widespread adoption.

REFERENCES
[1] Microsoft, "CCF: A framework for building confidential verifiable replicated services," https://github.com/microsoft/CCF/blob/master/CCF-TECHNICAL-REPORT.pdf, 2019.
[2] Microsoft, "CCF documentation," https://microsoft.github.io/CCF/, 2019.
[3] D. Ongaro and J. K. Ousterhout, "In search of an understandable consensus algorithm," in USENIX Annual Technical Conference (ATC), 2014.
[4] D. Cooper, S. Santesson, S. Farrell, S. Boeyen, R. Housley, and W. Polk, "RFC 5280: Internet X.509 public key infrastructure certificate and certificate revocation list (CRL) profile," https://tools.ietf.org/html/rfc5280.
[5] Microsoft, "Confidential Consortium Framework," https://github.com/Microsoft/CCF, 2019.
[6] "faceswap: Faceswap is a tool that utilizes deep learning to recognize and swap faces in pictures and videos," https://github.com/deepfakes/faceswap/.
[7] M. Kowalski, "FaceSwap," https://github.com/MarekKowalski/FaceSwap/.
[8] H. Malvar and D. Florencio, "Improved spread spectrum: A new modulation technique for robust watermarking," IEEE Transactions on Signal Processing.
[9] In Second International Conference on Web Delivering of Music (WEDELMUSIC 2002) Proceedings. IEEE, 2002, pp. 161-167.
[10] S. Zannettou, M. Sirivianos, J. Blackburn, and N. Kourtellis, "The web of false information: Rumors, fake news, hoaxes, clickbait, and various other shenanigans," Journal of Data and Information Quality (JDIQ), vol. 11, no. 3, pp. 1-37, 2019.
[11] J. Schwarz and M. Morris, "Augmenting web pages and search results to support credibility assessment," in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, 2011, pp. 1245-1254.
[12] H. R. Hasan and K. Salah, "Combating deepfake videos using blockchain and smart contracts," IEEE Access.
[19] In IEEE Workshop on Information Forensics and Security (WIFS), 2018.
[20] S. McCloskey and M. Albright, "Detecting GAN-generated imagery using color cues," arXiv preprint arXiv:1812.08247, 2018.
[21] Y. Li and S. Lyu, "Exposing deepfake videos by detecting face warping artifacts," in Workshop on Media Forensics, 2019.
[22] X. Yang, Y. Li, and S. Lyu, "Exposing deep fakes using inconsistent head poses," in IEEE International Conference on Acoustics, Speech and Signal Processing, 2019, pp. 8261-8265.
[23] P. Korshunov and S. Marcel, "Deepfakes: a new threat to face recognition? Assessment and detection," arXiv preprint arXiv:1812.08685, 2018.
[24] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, "FaceForensics++: Learning to detect manipulated facial images," in ICCV 2019, 2019.
[25] T. V. C. Group, "FaceForensics benchmark," http://kaldir.vc.in.tum.de/faceforensics benchmark//, 2019.
[26] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Conference on Neural Information Processing Systems (NIPS), 2014.
[27] T. Karras, T. Aila, S. Laine, and J. Lehtinen, "Progressive growing of GANs for improved quality, stability, and variation," in International Conference on Learning Representations (ICLR), 2018.
[28] A. Brock, J. Donahue, and K. Simonyan, "Large scale GAN training for high fidelity natural image synthesis," in International Conference on Learning Representations (ICLR), 2019.
[29] E. Zakharov, A. Shysheya, E. Burkov, and V. Lempitsky, "Few-shot adversarial learning of realistic neural talking head models," arXiv preprint arXiv:1905.08233v2, 2019.
[30] J. Thies, M. Zollhöfer, M. Stamminger, C. Theobalt, and M. Nießner, "Face2Face: Real-time face capture and reenactment of RGB videos," in Proc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2016.
[31] M. Castro and B. Liskov, "Practical Byzantine fault tolerance," in OSDI.

APPENDIX A: INTRODUCTION
The purpose of this appendix is to provide additional details about the AMP system design which are not covered in the submitted paper. In particular, Section B includes an extended description of the different types of manifests, and Section C describes AMP's structures in detail. Finally, CCF is used to provide additional functionality, which is described in Section D.

Fig. 7. A simplified illustration of a ManifestCore and related data structures (ManifestContainer, ManifestCore, PublisherAttestation, LedgerAttestation, FacsimileInformation, FacsimileDescriptor, FacsimileInfoDigest, WorkInfo, PublisherInfo, HashAlgorithm). Many fields and some data structures are omitted for clarity.

APPENDIX B: MANIFEST DETAILS
This section provides additional details about the full manifest, which is depicted in Figure 7. A manifest is actually a container (ManifestContainer) and includes a core manifest called the ManifestCore, a FacsimileInformation structure to describe facsimiles of the original media object, and two structures which provide supporting evidence of the publisher (PublisherAttestation) and the ledger (LedgerAttestation). Before describing the manifest details, we next provide an overview of how to use a manifest.
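The container structure from Figure 7 can be sketched with simple data classes. The field shapes below are illustrative assumptions; the real structures carry many more fields than are shown here:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FacsimileDescriptor:
    media_type: str
    object_digest: str          # hash of the facsimile's bytes

@dataclass
class ManifestCore:
    publisher_info: dict        # PublisherInfo: e.g., publisher name
    work_info: dict             # WorkInfo: e.g., title of the work
    hash_algorithm: str         # HashAlgorithm used throughout the manifest
    facsimile_digests: List[str] = field(default_factory=list)  # FacsimileInfoDigest entries

@dataclass
class ManifestContainer:
    core: ManifestCore
    facsimiles: List[FacsimileDescriptor] = field(default_factory=list)
    publisher_attestation: Optional[str] = None   # publisher's signature over the core
    ledger_attestation: Optional[str] = None      # ledger receipt for the core

# Usage: a minimal container with only the mandatory-style metadata.
mc = ManifestContainer(
    core=ManifestCore({"name": "ExamplePub"}, {"title": "Clip"}, "SHA-256"))
```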
A. Using Manifests
Manifests can be created by publishers, re-distributors (CDNs, ISPs), social media platforms, recording devices, etc., and manifests are signed by the entity that created them. Manifests can also be countersigned by cloud services: for example, the CCF cloud service produces a signed receipt to acknowledge that a media manifest has been recorded on a ledger. Manifests are conventionally JSON or CBOR encoded; the canonicalization rules are described later in this specification. If the canonicalization rules are followed, then manifests can be translated back and forth between JSON and CBOR as needed. For ease of development, the manifest authoring tools sign both the CBOR representation (using a COSE signature) and the JSON representation (using a JWT signature). The ManifestID is the hash of the CBOR-encoded manifest.

Manifests can be delivered to clients as metadata with the actual media objects. However, since the ecosystem for delivering media is complex, we expect that it will take time for this delivery infrastructure to become widespread. Considering this, AMP provides a Manifest Database that clients can use to search for a manifest for a work. There are several ways to query the Manifest Database, including querying by ManifestID, querying by the hash of the entire work, or querying by an array of hashes of chunks of the work.
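The three query paths just listed can be sketched as follows. An in-memory dictionary stands in for the deployed database; method and field names are illustrative:

```python
class ManifestDatabase:
    """Sketch of the three lookup paths: ManifestID, whole-work hash, chunk hash."""

    def __init__(self):
        self.by_manifest_id = {}
        self.by_object_hash = {}
        self.by_chunk_digest = {}

    def insert(self, manifest_id, manifest, object_hash=None, chunk_digests=()):
        self.by_manifest_id[manifest_id] = manifest
        if object_hash:
            self.by_object_hash[object_hash] = manifest
        for d in chunk_digests:
            self.by_chunk_digest[d] = manifest

    def query(self, manifest_id=None, object_hash=None, chunk_digests=()):
        # Try each index in turn; any single matching chunk digest suffices.
        if manifest_id in self.by_manifest_id:
            return self.by_manifest_id[manifest_id]
        if object_hash in self.by_object_hash:
            return self.by_object_hash[object_hash]
        for d in chunk_digests:
            if d in self.by_chunk_digest:
                return self.by_chunk_digest[d]
        return None

# Usage: register one manifest under all three indexes, then look it up.
db = ManifestDatabase()
db.insert("mid-1", {"work": "clip"}, object_hash="oh-1", chunk_digests=["c1", "c2"])
```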
B. Authenticating Works
Each manifest authenticates either precisely one work, or several facsimiles of a work. There are no technical restrictions on what constitutes a facsimile, but the intention is that facsimiles support the very common scenario in which web sites, CDNs, etc., prepare a family of media objects (images, video, audio) that are optimized for different devices and network conditions, but all of which represent the same content, just not the same exact bits. Manifests broadly contain two classes of data:
Metadata:
This is publisher-assigned data, such as a publisher name and a title for the work.
Media bindings:
This describes the facsimiles: for example, cryptographic digests of the media, or of subsets/chunks of the media, and media type information. These fields are described in more detail in the following sections.
C. Metadata
Most metadata is contained in the structures PublisherInfo and WorkInfo, with the option to include facsimile-specific information in the FacsimileDescriptor structure. This specification intentionally limits the metadata that is defined in this structure, and still less is mandatory. A minimal set of metadata would be the name of the publisher and the name of the work. If additional metadata needs to be attached, then it can be expressed in the OtherClaims data structures. The manifest supports an array of OtherClaims structures to be included in PublisherInfo (claims about the publisher), WorkInfo (claims about the work), FacsimileDescriptor (claims about the facsimile), and SourceWork (describing how a source work was transformed to produce a derived work). OtherClaims allows two sorts of claims to be associated with the manifest: claim-sets can be embedded directly into the manifest, or a URI (or other descriptor) can be used to associate claims outside the manifest. In the case of external claims, OtherData allows the option that the manifest can cryptographically commit to the external claims by including the hash of the external data in the OtherClaims structure. OtherClaims contains a string type descriptor. We expect to define a few standard descriptors, such as "XMP," "EIDR," and "SCHEMA," and then use a DNS-style namespace to allow extensions. For example, the current manifest tools use the type descriptor "com.microsoft.amp.youtube-info" to encode YouTube metadata.
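The two kinds of OtherClaims described above can be sketched as follows. The JSON field names and the URI are illustrative assumptions, not the normative AMP encoding:

```python
import hashlib
import json

# An external claims document, hosted outside the manifest (hypothetical content).
external_claims = json.dumps({"note": "metadata hosted elsewhere"}).encode()

other_claims = [
    {   # embedded claim-set, tagged with a DNS-style type descriptor
        "type": "com.microsoft.amp.youtube-info",
        "claims": {"channel": "ExampleChannel"},
    },
    {   # external claims: referenced by URI and committed to by hash
        "type": "SCHEMA",
        "uri": "https://example.com/claims.json",   # hypothetical URI
        "other_data_hash": hashlib.sha256(external_claims).hexdigest(),
    },
]

# A verifier fetching the external document recomputes its hash and compares
# it with the value the manifest committed to.
matches = hashlib.sha256(external_claims).hexdigest() == other_claims[1]["other_data_hash"]
```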
D. Media Bindings

1) Authentication using an Object Digest:
All facsimiles are authenticated by hashing the entirety of the data that constitutes the facsimile: e.g., the hash of the entire PDF, JPG, MP4, OGV, etc. file. Some commonly used multimedia standards allow multiple streams to be packaged in a single object. In some cases, it still makes sense to authenticate the entire container file or stream. In other cases, a subset of the underlying media file is authenticated. One important example of this is when the manifest is packaged in the media file itself, for example, when the manifest is embedded in an ISO/MPEG container. In all cases, the manifest directly or indirectly specifies exactly what parts of the media object are hashed.
2) Authentication using Chunking:
Most modern media players download and buffer a few seconds of media and then start playing almost immediately, so authenticating a media object based on the hash of the whole file is inappropriate. To support progressive/streaming playback of media, the system supports streaming authentication using a collection of the hashes of "chunks" of the media object. Different media delivery schemes demand different chunking schemes. Two chunking schemes are currently supported: file-offset-based chunking, and a Merkle-tree-based scheme for MP4-containerized video. Each facsimile can be authenticated using more than one chunking scheme, to allow a single work to be delivered in multiple ways.
3) File-Offset-Based Chunking:
The most common media rendering technology on the web today is the HTML5 <video> element. The simplest way of using an HTML5 video player is to configure the <video> element to fetch video data from a URL. In this case, the <video> element performs a sequence of HTTP partial-GET operations to fetch the video data. File-offset-based chunking can be used for progressive authentication in this case: the manifest contains an array of hashes of (say) 256KB chunks of the underlying video file, and the video player or browser calculates video-stream hashes and checks that they match a manifest. File-offset-based hashing can also work with Adaptive Bitrate (ABR) streaming in some circumstances. ABR on the web is enabled by video player logic (often a JavaScript library running in the web page) fetching audio and video data from a collection of files encoded at different bandwidths. File-offset-based chunking still works in this case: each of the underlying video files is chunked, hashed, and encoded in the manifest. The SimpleChunkList data structure is used to encode file-offset-based chunking. SimpleChunkLists contain an array of hashes and the size of the underlying chunks. The size of each chunk is recorded in the manifest, but we will additionally define some standard lengths to enable chunk hashes to be calculated by clients when they do not yet have a valid manifest. The final chunk in a file may be less than the chunk size.
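The SimpleChunkList scheme above can be sketched as follows. SHA-256 and the helper names are assumptions for illustration; in AMP the hash algorithm and chunk size come from the manifest.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # e.g., 256KB chunks, as in the text

def simple_chunk_list(media: bytes, chunk_size: int = CHUNK_SIZE):
    """Build the ordered array of chunk hashes for file-offset-based
    chunking. The final chunk may be smaller than chunk_size."""
    return [hashlib.sha256(media[i:i + chunk_size]).digest()
            for i in range(0, len(media), chunk_size)]

def verify_stream_chunk(media_chunk: bytes, index: int, chunk_hashes) -> bool:
    """Progressively authenticate one downloaded chunk against the
    manifest's chunk-hash array."""
    return hashlib.sha256(media_chunk).digest() == chunk_hashes[index]
```

A player performing HTTP partial-GETs can verify each range as it arrives, without waiting for the whole file.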
4) MP4-Container Hashing and Merkle Tree Authentication:
The MP4 ISO/IEC container format is a widely used standard for encoding any sort of media object in a file or stream. MP4 defines "box" types for holding multimedia data and metadata. For the purposes of this discussion, the following box types are important:
• MOOV: Basic stream metadata; one per container
• MDAT: Video or audio data; typically a few seconds
• MOOF: Describes the samples in the subsequent MDAT
The simplest fragmented MP4 container contains {MOOV [MOOF, MDAT]+}, but most containers have additional boxes. MP4-container chunking defines a chunk as the subset of the MOOF data that defines the samples, together with the corresponding video data, i.e., the MDAT. Chunk hashes defined in this way can be embedded in a ManifestCore using the MerkleTreeAuthenticator, described next.
5) Merkle Tree Authentication:
Typical chunk sizes for fragmented MP4 are a few seconds long, so the chunk-hash data can be quite large. If the authentication data is encoded as a simple array, then the array of chunk hashes must be available in its entirety before authentication can begin. The MerkleTreeAuthenticator is an alternative representation of the chunk hashes that allows authentication to begin when only a subset of the authentication data is available. This is achieved by encoding part of the authentication data in the manifest, and additional "evidence" in the media stream itself. Together, these allow a player to check that a media chunk is consistent with a manifest. This form of authentication is supported by encoding the authentication data as a Merkle hash tree. A Merkle tree, depicted in Figure 8, is a binary tree of hashes, where the leaves of the tree are the digests of the [MOOF MDAT] samples, and each row in the tree is the hash of the data or hashes in the row beneath.
The Merkle tree authenticator is encoded in two parts, which are typically distributed separately. The actual media manifest contains one row of hashes from the tree: for example, the D2,0 and D2,1 digests in Figure 8. This would be sufficient to authenticate the video data as long as the player can read and hash all of the data leading up to D2,0 or D2,1, but (in this example) the player would have to read, chunk, and hash half of the file before authentication could begin. To avoid the need for excessive read-ahead, the media can be distributed with the relevant missing parts of the tree, so that the player can validate that a particular chunk hash is consistent with the manifest. For example, in Figure 8, to prove that the first sample is consistent with D2,0, the evidence would be D0,1 and D1,1, because these hash values can be used to form the missing parts of the tree.
The tree is formed as follows. The depth of the tree is determined by the number of chunks in the file.
In general, the number of leaf hashes is not a power of two. In such cases, the tree depth is calculated by rounding up the number of leaf hashes to the next power of two. For example, if there are 5 chunks, then this rounds up to 8, which leads to a tree depth of 4, including the leaves of the tree. The general rules for forming the tree (in both the power-of-two and non-power-of-two cases) are as follows:
1) The leaf hashes are formed from the hash of the chunk data.
2) The "hash" of a non-present chunk is termed null.
To form intermediate node hashes in the tree:
1) If both inputs are non-null, then output = Hash(LHS || RHS).
2) If one input (the RHS) is null, then output = LHS.
3) If both inputs are null, then output = null.
The MerkleTreeAuthenticator data structure encodes one row of the hash tree in the media manifest, omitting null values. Encoding of the evidence hashes is described in the next section.
Fig. 8. Merkle hash tree formed over multimedia data. The "leaves" of the hash tree are the hashes of the media samples, and each row in the hash tree is formed from the hash of the concatenation of the two hashes in the lower layer. The top hash is called the root of the hash tree. If the number of samples is not a power of two, the leaves of the "missing" samples at the end of the file are null and are processed according to the rules in this section.
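The tree-formation and evidence rules above can be sketched as follows. SHA-256 is an assumption (AMP takes the algorithm from the manifest), and the `None` marker for a null sibling is an illustration device: the real ChunkIntegrityBox simply omits null hashes.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # SHA-256 assumed for this sketch

def build_tree(chunks):
    """Build the hash tree bottom-up. A node with a null right-hand
    sibling is promoted unchanged to the row above (rule 2 in the text).
    rows[0] is the leaf row; rows[-1] is [root]."""
    rows = [[h(c) for c in chunks]]
    while len(rows[-1]) > 1:
        row, nxt = rows[-1], []
        for i in range(0, len(row), 2):
            if i + 1 < len(row):
                nxt.append(h(row[i] + row[i + 1]))  # Hash(LHS || RHS)
            else:
                nxt.append(row[i])                  # RHS null: promote LHS
        rows.append(nxt)
    return rows

def verify_evidence(chunk, index, evidence, manifest_row):
    """Walk the leaf-to-root evidence hashes up to the row encoded in
    the manifest. A None entry marks a level where the sibling was
    null, so the node is promoted without hashing."""
    node = h(chunk)
    for sib in evidence:
        if sib is not None:
            node = h(node + sib) if index % 2 == 0 else h(sib + node)
        index //= 2
    return node == manifest_row[index]
```

With 7 chunks, proving chunk 0 against the manifest row {D2,0, D2,1} needs only the sibling leaf hash and one row-1 hash, matching the Figure 8 example.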
Encoding Evidence in an MP4 Container.
The evidence that allows a player to determine that a chunk is consistent with an associated manifest is encoded in an ISO/MP4 UUID box called a ChunkIntegrityBox. The ChunkIntegrityBox enables verification of a set of samples when combined with the manifest.

Box Type: 'uuid'
Box Extended Type: 469d22dfe1924defa71ef4c9f2ce3e71 (big-endian bytes)
Container: Movie Fragment Box ('traf')
Mandatory: No
Quantity: Zero or one

Syntax:

class ChunkIntegrityBox extends FullBox('uuid', 469d22dfe1924defa71ef4c9f2ce3e71, version=0, flags=0) {
    unsigned int(8)  hash_tree_id;
    unsigned int(16) hash_location;
    unsigned int(8)  hash_size;
    unsigned int(8)  hash_count;
    {
        unsigned int(8)[hash_size] hash;
    } [hash_count]
}

The ChunkIntegrityBox fields are as follows:
• hash_tree_id: The index into the list of hashed streams in the manifest
• hash_location: The zero-based chunk index
• hash_size: The size, in bytes, of the hash value
• hash[]: Every non-null hash from the tree required to get from this chunk's node to one of the nodes found in the manifest. These hashes are sequenced from leaf to root.

E. Adaptive Bitrate Streaming
MPEG-DASH and Microsoft Smooth Streaming are adaptive bitrate (ABR) streaming formats that allow a client player to select between different encodings of the same video object. Stream selection can happen when playing starts but, if network conditions change, can also happen during playback. Under the covers, these streaming standards are usually enabled by creating a set of underlying compressed media files and dynamically assembling them into HLS or DASH objects with CMAF (MP4) chunks. The individual files are encoded using different bandwidths/compression ratios, and, for each bandwidth, the original video is usually split into shorter files to allow client players to switch bandwidths every few seconds by fetching from a different source file. Adaptive streaming is supported by a set of ManifestCore structures, by treating each of the separately encoded constituent files as a facsimile. In some cases, this might be a Transformation Manifest with a back-pointer to the manifest for an original high-definition file that was used to create the ABR streams; in other cases the ABR streams will all be authenticated using a simple (non-transformation) manifest.
F. Transformation Manifests
Transformation Manifests are used to authenticate works that are transformed from other works. Transformation Manifests can be authored by the same publisher that created the original work, by an entity operating on behalf of another entity (e.g., a CDN), or by a completely unrelated entity, tool, or person. Such manifests allow an entity to apply a transformation to a work, establish the original work as its source, and make a signed claim that this transformation does not alter the meaning of the content of the original work. The manifest does not itself prove this assertion automatically but provides an auditable trail through which the assertion could be challenged. How such a challenge would be resolved is beyond the scope of this work (trust assessments when several parties are involved are not discussed here); the manifest only ensures that the transforming entity is accountable for transformed works it releases.
Transformation Manifests differ from original-work manifests in that they specify the ManifestID of the source work or works used to create the derived work, and also include the nature of the transformation applied. The primary initial scenario enabled by Transformation Manifests is re-encoding of a media object after the original manifest is created. However, we have allowed for future extensibility to express more complex sorts of derivation, such as editing and media-object composition. Such an extension of Transformation Manifests may allow the meaning of the original work to be altered, but in a specific and documented way that the transforming entity asserts is acceptable.
For example, a derivative work in the form of a news report might use a clip of a newsworthy event, and the producing entity could both assert the originality of its own content and make a claim that the clip of the event being described is unaltered, or itself transformed in some acceptable way, such as transcoded, or decorated with the entity's chyron or watermark.

G. Distributing Manifests and Manifest Containers
A simplified representation of a ManifestCore and related data structures is illustrated in Figure 7. The central data structure that cryptographically authenticates media is called the ManifestCore. A ManifestCore directly contains some data items, and cryptographic commitments to external data structures that may be distributed with the manifest or by other means. The ManifestCore uses commitments/hashes rather than embedding the data structure directly when the supplemental data is not always required. For example:
• The facsimile media authentication information is encoded in one or more external FacsimileDescriptors. This allows a media object to be distributed with only the FacsimileDescriptors that are relevant. For example, if a video object is encoded in WEBM and MP4, and each is encoded in 5 different bit rates and resolutions, this is 10 facsimiles. If a player is just playing one of these streams, then only the appropriate FacsimileDescriptor needs to be available to authenticate the stream.
• There is a wide range of media metadata formats, and there is a wide range of data that a publisher might want to associate with a work, some of which the publisher might not want to distribute. Commitment can be used to attach supplemental metadata to a manifest.
The consequence of this is that a ManifestCore always needs additional data structures before it can be used to authenticate media. The ManifestContainer data structure is an envelope that allows a ManifestCore to be distributed with the supplemental data structures that allow a work to be authenticated. Note that the ManifestCore cannot be modified after it is created, because the ManifestID would change and signatures would break. However, ManifestContainers (each of which contains a ManifestCore) can be freely created with just the data needed for the intended purpose. ManifestContainers can also contain signature blocks and certificates from the publisher (PublisherAttestation and LedgerAttestation).
H. Signing Manifests
Manifests are typically signed by the originator (publisher, redistributor, social media platform, etc.) and may be countersigned by distributed-ledger services. Manifest signatures are performed over the hash of a canonical representation of the manifest. JSON and CBOR representations are used by different parts of the system, so the manifest is signed twice: once to produce a JWT signature block (JSON) and once to produce a COSE signature block (CBOR). A PublisherAttestation optionally allows the signer certificate or certificate chain to be bundled in the ManifestContainer.
I. Canonicalization
ManifestIDs and signatures are computed over JSON or CBOR canonical encodings. JSON canonicalization follows the IETF JCS draft. CBOR canonicalization follows RFC 7049. COSE signatures follow RFC 8152.
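The role of canonicalization is to make the hash independent of incidental encoding choices such as key order. A minimal sketch follows; sorted-key, minimal-separator JSON is only a stand-in canonical form here, whereas AMP specifies JCS for JSON and canonical CBOR for the ManifestID.

```python
import hashlib
import json

def manifest_id(manifest: dict) -> str:
    """Sketch of a ManifestID: the hash of a canonical encoding of the
    manifest. Sorted-key, minimal-separator JSON is used as a stand-in
    canonical form; AMP actually uses canonical CBOR (RFC 7049) for the
    ManifestID and JCS for JSON signatures."""
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two manifests that differ only in key order produce the same ID, which is what makes the ID usable as a stable database key and signature target.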
APPENDIX C: MANIFEST STRUCTURES
This section includes the detailed definitions for some of the key AMP manifest structures.
A. ManifestContainer
A ManifestContainer (Table V) is a holder for the information needed to authenticate a media object. The ManifestContainer structure contains all the information necessary to validate a work. ManifestCore is the central structure: it is usually hashed and signed, in which case publisher signing information is held in PublisherAttestation. If the manifest is registered on a public ledger/blockchain, then additional evidence from the service provider can be stored in LedgerAttestation.
The actual multimedia data is hashed to allow it to be authenticated during playback or forensic analysis. The media hashes are not stored directly in the ManifestCore, but instead are stored in FacsimileDescriptor structures inside FacsimileInfo, with one FacsimileDescriptor structure per facsimile. The FacsimileDescriptors are cryptographically bound to the manifest by hashing. Keeping the FacsimileDescriptors separate allows ManifestContainers to be smaller in the case where a manifest is expected to be used with just one or a few facsimiles.
B. ManifestCore
A ManifestCore (Table VI) cryptographically authenticates a single work or a set of facsimiles of a work (e.g., a set of JPG images with different sizes and compression ratios, or, for video, different bandwidth encodings, different video frame sizes, and different encoding schemes). Supported work/media types include video, audio, image, text, PDF, and HTML. A ManifestCore structure will often be packaged inside an enveloping ManifestContainer structure. The enclosing ManifestContainer contains additional information to validate a media object, as well as signatures from the publisher and other parties.

TABLE V: MANIFESTCONTAINER STRUCTURE DESCRIPTION.
Name | Type | Description
Version | number | The structure version. This document describes version 1.
CoreManifest | ManifestCore | ManifestCore authenticates a media object and associated metadata. To authenticate a media object, a ManifestCore and one or more FacsimileDescriptors (embedded in the FacsimileInformation data structure) are required.
FacsimileInfo | FacsimileInformation | A container for one or more FacsimileDescriptors that cryptographically authenticate media objects. Note that FacsimileInformation may contain descriptors for a subset of the facsimiles described by the manifest (to reduce storage and bandwidth when not all FacsimileDescriptors are required).
PublisherAttestation | PublisherAttestation | Manifest signatures and certificates from the publisher (optional)
LedgerAttestation | LedgerAttestation | Manifest signatures and certificates from ledger (or other) services (optional)
ManifestLocator | string | An optional string that helps locate the manifest or additional FacsimileDescriptors (optional)

The ManifestCore does not directly contain the FacsimileDescriptors that authenticate a facsimile; instead, a ManifestCore contains (essentially) an array of hashes of FacsimileDescriptors. The FacsimileDescriptors themselves are stored outside the ManifestCore, often in an enclosing FacsimileInformation structure. This saves storage and bandwidth if a manifest is being used to authenticate just one or a subset of the defined facsimiles.
In addition to cryptographically authenticating a work, a ManifestCore contains optional publisher-assigned metadata identifying the publisher (PublisherInfo) and the work being authenticated (WorkInfo). These structures can also reference external metadata.
The ManifestCore allows the expression of "authorized derivation" of a work by services such as social platforms, CDNs, or publishing tools. To support this, a ManifestCore can contain back-pointers to other ManifestCores, called Origin Manifests. If the work is a simple transcoding of another work, then this will point to the manifest for the original work. If the work is a composite of several originals, then the ManifestCore can point back to several originals.
All cryptographic digests in a ManifestCore and related structures must use the hashing algorithm described in the ManifestCore (HashingAlgorithm). The ManifestID is the hash of a canonical representation of a manifest. Currently, this is the hash of a canonical CBOR encoding of the manifest.
In this structure, "FacsimileInfoDigests": { "type": ["array", "null"], "items": { "type": ["string", "null"] } }.
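The hash-commitment between a ManifestCore and an externally distributed FacsimileDescriptor can be checked as sketched below. The canonical-encoding step is simplified to sorted-key JSON for illustration (AMP uses canonical CBOR/JCS), and SHA-256 stands in for the manifest's named algorithm.

```python
import hashlib
import json

def descriptor_digest(facsimile_descriptor: dict) -> bytes:
    """Digest of a FacsimileDescriptor. Sorted-key JSON is a stand-in
    canonical encoding; AMP specifies canonical CBOR/JCS."""
    enc = json.dumps(facsimile_descriptor, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(enc.encode("utf-8")).digest()

def descriptor_committed(manifest_core: dict, facsimile_descriptor: dict) -> bool:
    """True if the ManifestCore's FacsimileInfoDigests array commits to
    this descriptor, i.e., contains the descriptor's digest."""
    return descriptor_digest(facsimile_descriptor) in manifest_core["FacsimileInfoDigests"]
```

This is why a ManifestContainer can safely ship only the FacsimileDescriptors a player actually needs: each one is independently verifiable against the signed core.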
C. PublisherInfo
The PublisherInfo structure in Table VII is a container for information about the publisher or redistributor of this manifest.
D. OtherClaims
OtherClaims (Table VIII) is a container for additional claims to be associated with a publisher, work, facsimile, or transformation. ManifestCores natively support a minimal amount of metadata. Publishers may choose to include or reference additional metadata about the work, the facsimile, the transformation, or the publisher using this data structure. Two types of extension are supported: (1) EmbeddedClaims is any string-encoded data that is embedded in the manifest itself, and (2) ExternalClaims is a pointer (e.g., a URL, file name, or GUID) to an external data object. Optionally, ExternalClaimsDigest can contain the digest of the external data if its integrity must be protected.
Specifically, "EmbeddedClaims": { "type": ["string", "null"] } and "ExternalClaimsDigest": { "type": ["string", "null"] }.
E. WorkInfo
The WorkInfo structure (Table IX) is a container for publisher- or redistributor-provided information about a work. This information is the same for all facsimiles described by a ManifestCore.
F. SourceWorkInfo
SourceWorkInfo (Table X) identifies one or more source works that were used to produce a derived work.
G. SourceWork
SourceWork (Table XI) identifies the source work in a transformation manifest. It also describes the type of transformation performed and may optionally contain details about the exact transformation applied.
H. ManifestReference
A ManifestReference (Table XII) is a description of a ManifestCore that is stored elsewhere. A ManifestReference MUST contain the ManifestID of the referenced manifest and may optionally include the URI of a service by which the manifest can be obtained.
I. TypedDigest
TypedDigest (Table XIII) is a container class for a typed digest. Most digests used in ManifestCore-related data structures are simple byte arrays, with the hash algorithm defined in the associated ManifestCore. This data structure is used when typed digests are required.
J. DerivationSort
The DerivationSort (Table XIV) is an enumeration that describes the type of transformation of the original work used to form a derived work.

TABLE VI: MANIFESTCORE STRUCTURE DESCRIPTION.
Name | Type | Description
Version | number | The structure version. This document describes version 1.
SerialNumber | byte[] | Statistically unique / random serial number for the manifest
DigestAlgorithm | string | All hashes in this manifest and the contained data structures use the algorithm stated here
MediaID | byte[] | A publisher-assigned quasi-unique ID for the work or family of works. The MediaID can be attached to works (e.g., in a metadata field in the file) or encoded using a watermark.
CreationTime | date-time | The date/time when this manifest was created. Note that WorkInfo can specify a different time for the creation of the work.
Publisher | PublisherInfo | Information about the publisher (or redistributor) that created this manifest
Work | WorkInfo | Information about the work or works described by this manifest
FacsimileInfoDigests | byte[][] | An array of hashes of FacsimileInfo structures that are typically delivered in an enveloping ManifestContainer. The FacsimileInfo structures authenticate the media objects described by this manifest.
OriginManifests | SourceWorkInfo | If the manifest is a derived work (transcoding or composite edited work), this data structure contains the original manifest or manifests, as well as the transformations that were applied. (optional)

TABLE VII: PUBLISHERINFO STRUCTURE DESCRIPTION.
Name | Type | Description
Name | string | The name of the publisher
OtherInfo | string | Any other information that the publisher needs to associate with the work or works (optional)
AdditionalClaims | OtherClaims[] | Any other information about the publisher that should be associated with this manifest (optional)

TABLE VIII: OTHERCLAIMS STRUCTURE DESCRIPTION.
Name | Type | Description
Name | string | The name of the publisher
ClaimSort | string | Publisher-chosen identifier for the sort of metadata encoded in this record.
EmbeddedClaims | string | String encoding of additional metadata (optional)
ExternalClaims | string | A locator (URI, etc.) of external metadata (optional)
ExternalClaimsDigest | byte[] | Optional digest of the external metadata. This can be used if the additional metadata is stable and the publisher wishes to cryptographically commit to the exact metadata at the time of manifest creation. If this is not required, then this field should be omitted or null. (optional)

TABLE IX: WORKINFO STRUCTURE DESCRIPTION.
Name | Type | Description
Title | string | The name or title of the work or family of works
Title2 | string | Additional name/title information (optional)
OtherInfo | string | Optional publisher-chosen data (optional)
Copyright | string | A copyright notice for the work or family of works (optional)
CreationTime | date-time | Publisher-chosen original publication or creation time. This need not be the same as the manifest creation time (optional)
MasterCopyLocator | string | A stable URI, etc. of a master original (facsimiles may have their own facsimile locators) (optional)
Duration | number | If the work is video or audio, this can be the length of the work in 100 ns (1e-7 s) units (optional)
AdditionalClaims | OtherClaims[] | Other publisher-provided metadata. (optional)

TABLE X: SOURCEWORKINFO STRUCTURE DESCRIPTION.
Name | Type | Description
OriginManifests | SourceWork[] | An array of identifiers for the source works, and how the source works were processed to create the derived work.

TABLE XI: SOURCEWORK STRUCTURE DESCRIPTION.
Name | Type | Description
OriginManifest | ManifestReference | A reference to the manifest of the origin work
DerivationType | DerivationSort | Describes the transformation of the source work to form the derived work: e.g., a simple transcoding.
AdditionalClaims | OtherClaims[] | Any other information about the transformation that was applied to the original to produce the derived work, for example, EIDR claims. (optional)

TABLE XII: MANIFESTREFERENCE STRUCTURE DESCRIPTION.
Name | Type | Description
Version | number | The structure version. This document describes version 1.
ManifestLocator | string | An optional field to encode a service, file, etc. that can be used to locate the referenced manifest (optional)
ManifestID | TypedDigest | The ManifestID (manifest digest) of the referenced manifest

TABLE XIII: TYPEDDIGEST STRUCTURE DESCRIPTION.
Name | Type | Description
DigestAlgorithm | string | The digest/hash algorithm used to create this digest
DigestValue | byte[] | The digest/hash value

TABLE XIV: DERIVATIONSORT STRUCTURE DESCRIPTION.
Name | Value | Description
Transcoded | 1 | The derived work is a simple transcoding of the original work
CompleteCopy | 2 | The entire original work is included in the derived work
PartialCopy | 3 | Part of the original is included in the derived work
EditedCopy | 4 | One or more named editing operations have been applied to the original to produce the derived work

K. FacsimileInformation
The FacsimileInformation structure (Table XV) is a container structure for one or more TaggedFacsimileDescriptors.
L. TaggedFacsimileDescriptor
A TaggedFacsimileDescriptor (Table XVI) is a container for a FacsimileDescriptor. The Index is the array index of the hash of the FacsimileDescriptor in the associated ManifestCore.
M. FacsimileDescriptor
A facsimile is a particular encoding or representation of a work and is represented by a FacsimileDescriptor (Table XVII). A ManifestCore can describe one facsimile, or a collection of facsimiles that the publisher deems equivalent: e.g., a set of videos with different encoding schemes or parameters. Non-streaming media is cryptographically bound to the hash/digest of the complete work. Streaming media can also be progressively authenticated. Progressive authentication is supported using an array of digests of "chunks" of the media stream. Different scenarios are best supported by different chunking schemes, so the ChunkAuthenticator data structures come in several forms. The simplest is file-offset-based chunking. Other chunking schemes are also defined, and more will be added as needed.
N. FacsimileType
A FacsimileType (Table XVIII) is the type of media object of a facsimile. Note that a video stream may be decomposed into separate video, audio, and muxed facsimiles.
O. ChunkAuthenticator
ChunkAuthenticator (Table XIX) is the base class for the various ways that the chunks of a streaming work can be authenticated. Chunks are always authenticated by the hash of a chunk, but the definition of a chunk (e.g., its size, or how chunk boundaries are established) can vary. Concrete variations are defined by different derived structures with different ChunkingScheme values.
P. SimpleChunkListAuthenticator
SimpleChunkListAuthenticator (Table XX) describes file/stream-offset-based chunking. For example, if ChunkSize is 1 MiByte, then the first chunk is the first MiByte of the media object/file, the second chunk is the second MiByte, etc. The last chunk in a file/stream can be smaller.
Q. IsoBoxAuthenticator
IsoBoxAuthenticator (Table XXI) describes chunks of multimedia data encoded in an MPEG/ISO container.

TABLE XV: FACSIMILEINFORMATION STRUCTURE DESCRIPTION.
Name | Type | Description
Version | number | The structure version. This document describes version 1.
Records | TaggedFacsimileDescriptor[] | An array of FacsimileDescriptors tagged with an index that is the location in ManifestCore.FacsimileInfoDigests.

TABLE XVI: TAGGEDFACSIMILEDESCRIPTOR STRUCTURE DESCRIPTION.
Name | Type | Description
Index | number | The zero-based array index into ManifestCore.FacsimileInfoDigests that contains the digest of this FacsimileDescriptor
Facsimile | FacsimileDescriptor | The cryptographic descriptor of a facsimile

TABLE XVII: FACSIMILEDESCRIPTOR STRUCTURE DESCRIPTION.
Name | Type | Description
FacsimileMajorType | FacsimileType | Media type of this facsimile (video, audio, muxed, etc.)
ContainerType | string | The name of the file/container format for this multimedia, e.g., JPG or MP4.
EncodingInformation | string | String-encoded encoding scheme and parameters for this particular facsimile. If this is a muxed stream, then this will contain the video encoder info, and EncodingInformation2 will contain the audio encoding info
EncodingInformation2 | string | If this facsimile contains more than one media type, then this is the secondary type, e.g., the audio encoder type for an AV muxed stream. (optional)
Length | number | Length, in bytes, of the facsimile
ObjectDigest | byte[] | Digest of the entire work using the hash algorithm in the containing AMP manifest
FacsimileLocator | string | Any other information about the facsimile that should be associated with this (optional)
ObjectContainers | string | If missing or null, then the data hashed to obtain the ObjectDigest is the entire object, e.g., the JPG or MP4 file. If the data to be hashed is wrapped in a container format and not all of the data in the enveloping file/stream should be hashed, then this field states which containers/streams should be hashed (placeholder/todo) (optional)
AdditionalClaims | OtherClaims[] | Any other data that the publisher wishes to associate with the facsimile. (optional)
ChunkData | array of SimpleChunkListAuthenticator, IsoBoxAuthenticator, or MerkleTreeAuthenticator | One or more chunked representations of the facsimile. Only needed for progressive authentication of streaming media objects, otherwise null or omitted. (optional)

TABLE XVIII: FACSIMILETYPE STRUCTURE DESCRIPTION.
Name | Value | Description
Unknown | 0 | Facsimile type is not known or is not specified
MuxedAV | 1 | Multiplexed AV stream
Video | 2 | Video stream (no audio)
Audio | 3 | Audio stream
Image | 4 | Any sort of image
Text | 5 | Any sort of text

TABLE XIX: CHUNKAUTHENTICATOR STRUCTURE DESCRIPTION.
Name | Type | Description
ChunkingScheme | number | This tag indicates the actual type of this structure
NumChunks | number | The number of chunks described by this authenticator
ChunkDigest | byte[][] | An ordered array of chunk hashes starting from the beginning of the work. All ChunkAuthenticators have a list of chunk digests, but specific authenticators may have additional data that describe exactly what each chunk maps to (e.g., file-offset-based, I-frame-to-I-frame, etc.)

TABLE XX: SIMPLECHUNKLISTAUTHENTICATOR STRUCTURE DESCRIPTION.
Name | Type | Description
ChunkingScheme | number | SimpleChunkList is ChunkingScheme 1
ChunkSize | number | All chunks are this size (optional)
NumChunks | number | The number of chunks described by this authenticator
ChunkDigest | byte[][] | An ordered array of chunk hashes starting from the beginning of the work. All ChunkAuthenticators have a list of chunk digests, but specific authenticators may have additional data that describe exactly what each chunk maps to (e.g., file-offset-based, I-frame-to-I-frame, etc.)

TABLE XXI: ISOBOXAUTHENTICATOR STRUCTURE DESCRIPTION.
Name | Type | Description
ChunkingScheme | number | Iso-box chunking is ChunkingScheme 2
NumChunks | number | The number of chunks described by this authenticator
ChunkDigest | byte[][] | An ordered array of chunk hashes starting from the beginning of the work. All ChunkAuthenticators have a list of chunk digests, but specific authenticators may have additional data that describe exactly what each chunk maps to (e.g., file-offset-based, I-frame-to-I-frame, etc.)

R. MerkleTreeAuthenticator
The MerkleTreeAuthenticator supports chunk authentication using a Merkle hash tree. This style of chunk authentication minimizes the amount of data that needs to be downloaded to authenticate the first chunk (including the case where playback starts in the middle of the file). We use the following terminology: the top hash is the root of the tree, also referred to as row zero. The two children of the top hash are row one, and so on. The hashes in the final row of the hash tree are also called the leaves of the tree. The leaves of the tree are the hashes of the media chunks using the hash algorithm specified in the manifest. If the number of chunks is not a power of two, then the tree is padded with leaves with zero values: e.g., 32 bytes of zero in the case of a SHA-256 hash tree. The number of hashes in the ChunkAuthenticator to support the MerkleTreeAuthenticator must always be a power of two (todo: we could relax this), and the hash algorithm used is the algorithm in the containing manifest. The hash tree is typically split into two parts at a row called the split-row. The upper part of the tree, from the root to the split-row, is encoded in the MerkleTreeAuthenticator (Table XXII) and hence distributed with the manifest. The lower part of the tree is distributed with the media itself. The MerkleTreeAuthenticator contains the row of hashes at the split-row, with the row number given by the EncodedRow field. The number of hashes encoded will always be a power of two. If needed, the tree up to the root can be calculated by repeated hashing. Rows below the split-row can be derived from the media itself, or can be derived from the media and fragments of the hash tree called "evidence" sent by other means. It is out of scope of the manifest specification to describe how evidence is encoded, but we expect that each media chunk will be distributed with an array of hashes that allow clients to verify that the hash of a chunk of media data is one of the leaves of the complete hash tree.
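Deriving the root from the split-row by "repeated hashing," as described above, can be sketched as follows (SHA-256 assumed; AMP takes the algorithm from the manifest):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # algorithm named by the manifest

def root_from_split_row(row):
    """Collapse the split-row encoded in a MerkleTreeAuthenticator up to
    the top hash by repeatedly hashing adjacent pairs. Per the spec, the
    encoded row length is a power of two, so every node has a sibling."""
    while len(row) > 1:
        row = [h(row[i] + row[i + 1]) for i in range(0, len(row), 2)]
    return row[0]
```

Because any row determines the root, a verifier holding the split-row can cross-check it against a signed or ledger-registered root hash without any media data.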
S. PublisherAttestation
PublisherAttestation (Table XXIII) is a container class for publisher-created signatures, etc.
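The shape of this container can be sketched as a simple record; the field names and types follow Table XXIII, while the class itself and the validity check are illustrative, not part of the AMP specification.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PublisherAttestation:
    """Container for publisher-created signatures (fields per Table XXIII; all optional)."""
    CoseSignatureToken: Optional[bytes] = None   # COSE Signature1 signature block
    JsonWebToken: Optional[str] = None           # string-encoded JSON signature block
    PemEncodedCertificates: list[str] = field(default_factory=list)  # root, then CAs, then signing key

    def has_signature(self) -> bool:
        # Illustrative check (an assumption, not specified by AMP):
        # at least one of the two signature encodings should be present.
        return self.CoseSignatureToken is not None or self.JsonWebToken is not None

att = PublisherAttestation(JsonWebToken="eyJ...",  # placeholder token, not a real JWT
                           PemEncodedCertificates=["-----BEGIN CERTIFICATE-----..."])
assert att.has_signature()
assert not PublisherAttestation().has_signature()
```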
T. LedgerAttestation
LedgerAttestation (Table XXIV) is a container class for ledger-created signatures, etc.

APPENDIX D
CCF DETAILS
Manifests are recorded on a public blockchain using CCF. CCF operates the public ledger (i.e., blockchain) of published works, essentially a list of manifests, relying on a distributed network of replicas running on trusted hardware and synchronized using Practical Byzantine Fault Tolerance (PBFT) [31] or Raft [3]. CCF supports the registration of new manifests and issues signed manifest receipts. These receipts complement the producer's signatures; they enable any media consumer to independently verify that the work they receive has been published with the corresponding metadata. CCF also supports online querying and validation of ledger transactions and their endorsing certificates, as well as the transparent governance of the service by a consortium of media producers.
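A consumer-side receipt check can be sketched as follows. Real CCF receipts are signed by the service's nodes and carry a proof over the ledger; to keep the sketch self-contained and runnable, an HMAC stands in for the ledger's signature, and all names and fields here are illustrative assumptions.

```python
import hashlib
import hmac
import json

LEDGER_KEY = b"demo-ledger-key"  # stand-in for the service's signing key (illustrative only)

def issue_receipt(manifest: dict, seqno: int) -> dict:
    """Ledger side: bind the manifest digest to a ledger position and sign the pair."""
    digest = hashlib.sha256(json.dumps(manifest, sort_keys=True).encode()).hexdigest()
    payload = f"{seqno}:{digest}".encode()
    return {"seqno": seqno, "digest": digest,
            "signature": hmac.new(LEDGER_KEY, payload, hashlib.sha256).hexdigest()}

def verify_receipt(manifest: dict, receipt: dict) -> bool:
    """Consumer side: recompute the digest from the received manifest and check the signature."""
    digest = hashlib.sha256(json.dumps(manifest, sort_keys=True).encode()).hexdigest()
    payload = f"{receipt['seqno']}:{digest}".encode()
    expected = hmac.new(LEDGER_KEY, payload, hashlib.sha256).hexdigest()
    return digest == receipt["digest"] and hmac.compare_digest(expected, receipt["signature"])

manifest = {"publisher": "example-news", "asset": "clip.mp4"}
receipt = issue_receipt(manifest, seqno=42)
assert verify_receipt(manifest, receipt)
assert not verify_receipt({"publisher": "forger", "asset": "clip.mp4"}, receipt)
```

The point of the sketch is the trust split: the publisher signs the manifest, while the ledger's receipt independently attests that this exact manifest was registered at a given position.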
a) Governance: CCF provides a flexible governance model, which allows AMP to define its governance by writing scripts in languages such as Lua [32] or JavaScript [33]. These scripts specify rules for actions such as adding new members, adding or removing users, adding and removing nodes from the system, user access control, etc. The specifics of the governance model will be defined by the media consortium that controls AMP, and these rules will evolve over time by modifying the governance scripts.
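The kind of rule such a script encodes can be illustrated with a short sketch. A real CCF constitution is written in Lua or JavaScript and runs inside the service, so the Python below, including its strict-majority threshold, is purely illustrative.

```python
def proposal_passes(votes_for: int, total_members: int) -> bool:
    """Illustrative governance rule: a proposal (e.g., adding a member or a node)
    passes once a strict majority of consortium members has voted for it."""
    return votes_for > total_members / 2

assert proposal_passes(3, 5)        # 3 of 5 members: strict majority
assert not proposal_passes(2, 5)    # not enough votes
assert not proposal_passes(2, 4)    # a tie is not a majority
```

Because the rule lives in a script registered on the ledger, changing the threshold is itself an auditable governance action rather than a code deployment.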
b) Trust and Integrity: CCF supports two types of consensus algorithms: Crash Fault Tolerance (CFT) and Byzantine Fault Tolerance (BFT). The CFT variant that CCF supports is a modified version of Raft [3], and the BFT variant implemented by CCF is a modified version of Practical Byzantine Fault Tolerance (PBFT) [31]. Raft leverages trusted execution environments (TEEs), specifically Intel's SGX. Its trust model is that a single TEE compromise destroys both confidentiality and integrity; under this model, CCF can use a variant of Raft that withstands malicious attacks as long as Intel's SGX is not compromised. PBFT is a consensus algorithm that can make progress as long as fewer than 1/3 of the nodes are actively malicious. PBFT's trust model is that a single TEE compromise destroys confidentiality, but f+1 compromises (in a 3f+1 network) are required to destroy integrity. This distinction means that even if some of the CCF nodes, each running in an SGX enclave, are compromised, the Media Provenance Ledger will not lose integrity. This added security comes at an increased performance and latency cost when committing data to the ledger. Critically, both of these consensus protocols offer finality: once a transaction has been committed, it cannot be reverted. Furthermore, a CCF receipt provides an additional finality proof.

TABLE XXII: MERKLE TREE AUTHENTICATOR STRUCTURE DESCRIPTION.

Name           | Type     | Description
ChunkingScheme | number   | Hash-tree chunking is ChunkingScheme 3
EncodedRow     | number   | The row of the tree that is encoded in this authenticator. Zero means that only the root hash is encoded; 1 means that the pair of hashes at row 1 of the Merkle tree is encoded; -1 means that the hashes are the leaf hashes.
NumChunks      | number   | The number of chunks described by this authenticator
ChunkDigest    | byte[][] | An ordered array of chunk-hashes starting from the beginning of the work. All ChunkAuthenticators have a list of chunk digests, but specific authenticators may have additional data that describe exactly what each chunk maps to (e.g., file offset-based, I-frame-to-I-frame, etc.)

TABLE XXIII: PUBLISHER ATTESTATION STRUCTURE DESCRIPTION.

Name                   | Type     | Description
CoseSignatureToken     | byte[]   | COSE Signature1 signature block (optional)
JsonWebToken           | string   | String-encoded JSON signature block (optional)
PemEncodedCertificates | string[] | Certificate chain for the signing key, ordered from the self-signed root, through the subordinate CAs, to the key used to sign the manifest (optional)

TABLE XXIV: LEDGER ATTESTATION STRUCTURE DESCRIPTION.

Name                   | Type   | Description
LedgerAttestationValue | string | (optional)

c) Distributed Execution: