Contour: A Practical System for Binary Transparency
Mustafa Al-Bassam
University College [email protected]
Sarah Meiklejohn
University College [email protected]
ABSTRACT
Transparency is crucial in security-critical applications that rely on authoritative information, as it provides a robust mechanism for holding these authorities accountable for their actions. A number of solutions have emerged in recent years that provide transparency in the setting of certificate issuance, and Bitcoin provides an example of how to enforce transparency in a financial setting. In this work we shift to a new setting, the distribution of software package binaries, and present a system for so-called "binary transparency." Our solution, Contour, uses proactive methods for providing transparency, privacy, and availability, even in the face of persistent man-in-the-middle attacks. We also demonstrate, via benchmarks and a test deployment for the Debian software repository, that Contour is the only system for binary transparency that satisfies the efficiency and coordination requirements that would make it possible to deploy today.
Full version of an extended abstract published in CBT 2018.
1 INTRODUCTION

Historically, functional societies have relied to a large degree on trust in their governing institutions, with participants in various systems (nation states, the Internet, financial markets, etc.) trusting those in charge to follow an agreed-upon set of rules and thus provide the system with some level of integrity. In recent years, however, increasing numbers of incidents have demonstrated that integrity cannot be meaningfully achieved solely by placing trust in a small number of entities. As a result, people are now demanding more active participation in the systems with which they interact, and more accountability for the entities that govern them. The main method that has been relatively successful thus far in achieving accountability is the idea of transparency, in which information about the decisions within the system is made globally visible, thus enabling any participant to check for themselves whether or not the decisions comply with what they perceive to be the rules.

One of the technical settings in which the idea of transparency has been most thoroughly, and successfully, deployed is the issuance of X.509 certificates. This is due partially to the nature of these certificates (which are themselves intended to be globally visible), and partially to the many publicized failures of major certificate authorities (CAs) [17, 22]. A long line of recent research [4, 9, 19, 21, 23, 24, 28, 31] has provided and analyzed solutions that bring transparency both to the issuance of X.509 certificates ("certificate transparency") and to the assignment of public keys to end users ("key transparency").

Despite their differences, many of these systems share a fundamentally similar architecture [6]: after being signed by CAs, certificates are stored by log servers in a globally visible append-only log; i.e., in a log in which entries cannot be deleted without detection.
Clients are told not to accept certificates unless they have been included in such a log, and to determine this they rely on auditors, who are responsible for checking inclusion of the specific certificates seen by clients. Because auditors are often thought of as software running on the client (e.g., a browser extension), they must be able to operate efficiently. Finally, in order to expose misbehavior, monitors (inefficiently) inspect the certificates stored in a given log to see if they satisfy the rules of the system.

To prevent clients from accepting bad certificates, such systems thus rely on monitors to expose them. Because auditors are the ones communicating with the client, however, to achieve this property an additional line of communication is needed between the auditor and monitor in the form of a gossip protocol [7, 27]. In such a protocol, the auditor and monitor periodically exchange information on their current and previous views of the log, which allows them to detect whether or not their views are consistent, and thus whether or not the log server is misbehaving by presenting "split" views of the log. If such attacks are possible, then the accountability of the system is destroyed, as a log server can present one log containing all certificates to auditors (thus convincing them that their certificates are in the log), and one log containing only "good" certificates to monitors (thus convincing them that all participants in the system are obeying the rules).

While gossiping can detect this misbehavior, it is ultimately a retroactive mechanism; i.e., it detects this behavior after an auditor has already accepted a certificate as valid and it is too late. It is thus most effective in settings where (1) no persistent man-in-the-middle (MitM) attack can occur, so the line of communication between an auditor and monitors remains open, and (2) some form of external punishment is possible, to sufficiently disincentivize misbehavior on the basis of detection.
Specifically for (1), if an auditor has no means of communication that is not under an adversary's control for the foreseeable future (a scenario we refer to as a persistent MitM attack), then the adversary may block all gossip being sent to and from the auditor, and thus monitors may never see evidence of log servers misbehaving.

Such a persistent MitM attack may be performed by an adversary who has compromised the cryptographic signing keys of the software distribution authority. This would enable them to compromise individual devices with malicious software updates, and then prevent gossiping between auditors and monitors, either by using the malicious software to disable the gossiping system or, if they control the network the device is connected to, by preventing gossip at the network level until the device stops gossiping. For example, the proposed gossip protocol for CT implements a fixed-size pool of items to gossip, with items eventually being removed from the pool, as it would be wasteful for devices to gossip about the same information permanently [27]. An adversary would then have to carry out an attack only until this pool were emptied.

Various systems have been proposed recently that use proactive transparency mechanisms designed to operate in settings where these assumptions cannot be made, such as Collective Signing (CoSi) [30], but perhaps the most prominent example of such a system is Bitcoin (and all cryptocurrencies based on the idea of a blockchain). In Bitcoin, all participants have historically played the simultaneous role of log servers (in storing all Bitcoin transactions), auditors, and monitors (in checking that no double-spending takes place).
The high level of integrity achieved by this comes at great expense to the participants, both in terms of storage costs (the Bitcoin blockchain is currently over 100 GB) and computational costs in the form of the expensive proof-of-work mechanism required to maintain the blockchain, but several recent proposals attempt to achieve the same level of integrity in a more scalable way [20, 31]. CoSi [30] achieves this property by allowing a group of witnesses to collectively sign statements to indicate that they have been "seen," but assumes the setup and maintenance of a Sybil-free set of witnesses, which introduces a large coordination effort.

Because of the effectiveness of these approaches, there has been interest in repurposing them to provide transparency not only for certificates or monetary transfers, but for more general classes of objects ("general transparency" [10]). One specific area that thus far has been relatively unexplored is the setting of software distribution ("binary transparency"). Bringing transparency to this setting is increasingly important, as there are an increasing number of cases in which actors target devices with malicious software signed by the authoritative keys of update servers. For example, the Flame malware, discovered in 2012, was signed by a rogue Microsoft certificate and masqueraded as a routine Microsoft software update [17]. In 2016, a US court compelled Apple to produce and sign custom firmware in order to disable security measures on a phone that the FBI wanted to unlock [13].

Challenges of binary transparency.
Aside from its growing relevance, binary transparency is particularly in need of exploration because the techniques described above for both certificate transparency and Bitcoin cannot be directly translated to this setting. Whereas certificates and Bitcoin transactions are small (on the order of kilobytes), software binaries can be arbitrarily large (often on the order of gigabytes), so they cannot be easily stored and replicated in a log or ledger.

Most importantly, by their very nature software packages have the ability to execute arbitrary code on a system, so malicious software packages can easily disable gossiping mechanisms, and we cannot assume that the auditor always has a means of communication that is not under an adversary's control. Specifically, as discussed earlier, a malicious adversary may perform a MitM attack to prevent gossip while presenting an auditor a malicious view of the log, and the log may itself contain a malicious software update that executes code to disable gossiping. This makes retroactive methods for detecting misbehavior uniquely poorly suited to this setting, in which clients need to know that a software package has been inspected by independent parties before installing it, not after. Binary transparency systems relying on such retroactive methods, based on Certificate Transparency, are currently being proposed for Firefox [1].
Our contributions.
We present Contour, a solution for binary transparency that utilizes the Bitcoin blockchain to proactively prevent clients from installing malicious software, even in the face of long-term MitM attacks. Concretely, we contribute a realistic threat model for this setting and demonstrate that Contour is able to meet it; we also show, via comparison with previous solutions, that Contour is currently the only solution able to satisfy these security properties while still maintaining efficiency and a minimal level of coordination among the various participants in the system. We also provide a prototype implementation that further demonstrates the efficiency of Contour, and finally provide an argument for its practicality via a test deployment for the Debian software repository. Putting everything together, we view Contour as a solution for binary transparency that is ready to be deployed today.

We begin in Section 4 by presenting our threat model. In addition to the goal of preventing split views, we highlight the importance of auditor privacy, in which auditors should not reveal the particular binaries in which they are interested (as this could reveal, for example, that a client has a susceptible version of some software), and of availability, in which auditors and monitors should still be able to do their job even if the original software update server loses its data or goes offline.

After then presenting the design of Contour in Section 5, we go on to analyze both its security and its efficiency in Section 6. Given the volume of related research on certificate transparency, we also present some comparisons here, and argue that ours is the first efficient solution to provide these security guarantees without requiring any coordination cost, in the form of selecting a central entity to perform authorization or otherwise trusting some party to form a Sybil-free set of nodes.

To validate our efficiency claims, in Section 7 we describe an implementation of
Contour and benchmark its performance, finding that almost all operations can be performed very quickly (on the order of microseconds), that auditors can store minimal information (on the order of kilobytes), and that arbitrary numbers of binaries can be represented by a single small (235-byte) Bitcoin transaction. We also validate our claims of real-world relevance by presenting, in Section 8, the application of Contour to the current package repository for the Debian operating system. We find that it would require minimal overhead for existing actors, and cost under 17 USD per day (even given the current high price of Bitcoin). Finally, in Section 9 we present some possible extensions to Contour, including a discussion of how to use it to achieve general transparency, and in Section 10 we conclude.

2 RELATED WORK

There is by now a significant volume of related work on the idea of transparency, particularly in the settings of certificates, keys, and Bitcoin. We briefly describe some of this work here, and provide a more thorough comparison to the most relevant work in Section 6.3. While Contour uses similar techniques to previous solutions within these other contexts, to the best of our knowledge it is the first fully deployable solution in the context of binary transparency.

In terms of certificate transparency, AKI [19] and ARPKI [4] provide a distributed infrastructure for the issuance of certificates, thus providing a way to prevent rather than just detect misbehavior. Certificate Transparency (CT) [21] focuses on the storage of certificates rather than their issuance, Ryan [28] demonstrated how to handle revocation within CT, and Dowling et al. [9] provided a proof of security for it. Eskandarian et al. [11] propose how to make some aspects of gossiping in CT more privacy-friendly using zero-knowledge proofs.
CONIKS [24] focuses instead on key transparency, and thus pays more attention to privacy and does not require the use of monitors (but rather has users monitor their own public keys).

In terms of solutions that avoid gossip, Fromknecht et al. [14] propose a decentralized PKI based on Bitcoin and Namecoin, and IKP [23] provides a way to issue certificates based on Ethereum. EthIKS [5] provides an Ethereum-based solution for key transparency, and Catena [31] provides one based on Bitcoin. While both Catena and Contour utilize similar recent features of Bitcoin to achieve efficiency, they differ in their focus (key vs. binary transparency), and thus in the proposed threat model; e.g., Catena dismisses eclipse attacks [29] on the Bitcoin network, whereas we consider them well within the scope of a MitM attacker. Chainiac [26] is a system for proactive software update transparency based on a verifiable data structure called a skipchain. Chainiac uses a consensus mechanism based on Collective Signing (CoSi) [30], leading to the need for an authority to maintain a Sybil-free set of nodes.

Finally, in terms of more general solutions, Chase and Meiklejohn abstract CT into the general idea of a "transparency overlay" [6] and prove its security. Similarly, CoSi [20, 30] is a general consensus mechanism that shares our goal of providing transparency even in the face of MitM attacks and thus avoids gossiping, but requires setting up a distributed set of "witnesses" that is free of Sybils. This is a deployment overhead that we avoid.

3 BACKGROUND

Software distribution on modern desktop and mobile operating systems is managed through centralized software repositories such as the Apple App Store, the Android Play Store, or the Microsoft Store.
Most Linux distributions such as Debian also have their own software repositories from which administrators can install and update software packages using command-line programs.

To reduce the trust required in these repositories, efforts such as deterministic builds allow users to verify that a compiled binary corresponds to the published source code of open-source software, a traditionally difficult process due to sources of non-determinism in build processes. Deterministic builds are achieved by recording the environment when building software, then replaying the behavior of this environment in later builds to achieve the same results [8]. While this prevents developers from inserting malicious code into the compiled binaries (i.e., making their code public but including a different version in the actual binary), it does not address the targeted malware threat that Contour aims to solve, in which the source code (or binary) for one targeted set of users is different from the copy received by everyone else.
The concept of a blockchain was first used in Bitcoin, which is designed to be a globally consistent append-only ledger of financial transactions [25]. Given our limited usage of Bitcoin, we focus for brevity only on the properties that we require for Contour.

Briefly, the Bitcoin blockchain is (literally) a chain of blocks. Each block contains two components: a header and a list of transactions. In addition to other metadata, the header stores the hash of the block (which, in compliance with the proof-of-work consensus mechanism, must be below some threshold in order to show that a certain amount of so-called "hashing power" has been expended to form the block), the hash of the previous block (thus enabling the chain property), and the root of the Merkle tree that consists of all transactions in the block.

On the constructive side, while the scripting language used by Bitcoin is (intentionally) limited in its functionality, Bitcoin transactions can nevertheless store small amounts of arbitrary data. This makes Bitcoin potentially useful for other applications that may require the properties of its ledger, such as certifying the ownership and timestamp of a document [3]. One mechanism that allows Bitcoin to store such data is the script opcode
OP_RETURN, which can be used to embed up to 80 bytes of arbitrary data.

Another aspect of Bitcoin that enables additional development is the idea of an SPV (Simplified Payment Verification) client. Rather than perform the expensive verification of the digital signatures contained in Bitcoin transactions, or the checks necessary to determine whether or not double-spending has taken place, these clients check only that a given transaction has made it into some block in the blockchain. As this can be achieved using only the root hashes stored in the block headers, such clients can store only these headers (which are small) and verify only Merkle proofs of inclusion obtained from "full" nodes (which is fast), and are thus significantly more efficient than their full node counterparts.

On the destructive side, various attacks have been demonstrated that undermine the security guarantees of Bitcoin. In eclipse attacks [2, 16, 18], an adversary exploits the topology of the Bitcoin network to interrupt, or at least delay, the delivery of announcements of new transactions and blocks to a victim node. More expensive "51%" attacks, in which the adversary controls more than half of the collective hashing power of the network, allow the adversary to fork the blockchain, and it has been demonstrated [12] that such attacks can in fact be carried out with far less than 51% of the hashing power.

4 THREAT MODEL

In this section, we describe the actors in the ecosystem for software distribution transparency (Section 4.1), along with the interactions between these actors (Section 4.2), and the goals we hope to achieve in this setting (Section 4.3).
4.1 Actors

We consider a system with five types of actors: services, authorities, monitors, auditors, and clients. We describe each of these types below in the singular, but for the correct and secure functioning of a transparency overlay we require a distributed set of auditors and monitors, each acting independently.

Service: The service is responsible for producing actions, such as the issuance of a software update. In order to have these binaries authorized, they must be sent to the authority.
Authority:
The authority is responsible for publishing statements that declare it has received a given software binary from a service. These statements also claim that the authority has, in some form, published these binaries in a way that allows them to be inspected by the monitor. The authority is also responsible for placing its statements into a public audit log, where they can be efficiently verified by the auditor.
Monitor:
The monitor is responsible for inspecting the binaries published by the authority and performing out-of-band tests to determine their validity (e.g., to ensure that software updates do not contain malware).
Auditor:
The auditor is responsible for checking specific binaries against the statements made by the authority that claim they are published.
Client:
The client receives software updates from either the authority or the service, along with a statement that claims the update has been published for inspection. It outsources all responsibility to the auditor, so in practice the auditor can be thought of as software that sits on the client (thus making the client and auditor the same actor, which we assume for the rest of the paper).
4.2 Interactions

In terms of the interactions between these entities, one of the main benefits of Contour, as discussed in the introduction, is that entities do not need to engage in prolonged multi-round interactions like gossiping, but rather pass messages atomically to one another. As we see in Section 6.1, this makes it significantly more expensive for an adversary to present undetected split views of a log by launching man-in-the-middle attacks. We therefore outline only the non-interactive algorithms needed to generate messages, rather than interactive protocols, and wait to specify the exact inputs and outputs until we present our construction in Section 5.
Authority.commit: The authority runs this algorithm to commit statements to the audit log.
Authority.prove_incl: The authority runs this algorithm to provide a proof that a specific statement is in the audit log.
Auditor.check_incl: The auditor runs this algorithm to check the proof of inclusion for a specific statement.
Monitor.get_commits: The monitor runs this algorithm to retrieve relevant commitments from the audit log.
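Taken together, these four algorithms form the message-generating interface of the system. As a purely illustrative sketch (the signatures below are our own assumptions, anticipating the construction presented later; the paper defers the exact inputs and outputs until Section 5), the interface might be typed as:

```python
from typing import List, NamedTuple


class InclusionProof(NamedTuple):
    # Hypothetical proof shape: a block header, the committing
    # transaction, and two Merkle paths (one over the block's
    # transactions, one over the batched binaries).
    head_B: bytes
    tx: bytes
    pi_tx: list
    pi_bin: list


class Authority:
    def commit(self, h_T: bytes) -> bytes:
        """Commit the batch root h_T to the audit log; return the raw tx."""
        raise NotImplementedError

    def prove_incl(self, tx: bytes, head_B: bytes, binary: bytes) -> InclusionProof:
        """Prove that a specific statement (binary) is in the audit log."""
        raise NotImplementedError


class Auditor:
    def check_incl(self, S: set, binary: bytes, proof: InclusionProof) -> bool:
        """Check a proof of inclusion against the known block headers S."""
        raise NotImplementedError


class Monitor:
    def get_commits(self) -> List[bytes]:
        """Retrieve the authority's commitments from the audit log."""
        raise NotImplementedError
```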
4.3 Goals

We break the goals of the system down into security goals (denoted with an S) and deployability goals (denoted with a D).

As discussed in the introduction, it is especially crucial in the setting of binary transparency to consider adversaries that can perform persistent man-in-the-middle attacks, as it is realistic that they would be able to compromise the client's machine. Like certificate transparency (but unlike key transparency), we do not need to make the contents of the audit log private, as binaries are assumed to be public information, but we do need to guarantee privacy for the specific binaries that a client downloads, as this could reveal that a client has a software version susceptible to malware. Finally, even though binaries are typically large, we nevertheless need to provide a solution efficient enough to be deployed in practice.

Keeping these requirements in mind, we aim in all our security goals to defend against the specified attacks in the face of malicious authorities that, in addition to performing all the usual actions of the authority, can also perform man-in-the-middle attacks on the auditor's network communications. If additional adversaries are considered, we state them explicitly.
S1: No split views.
We should prevent split-view attacks, in which the information contained in the audit log convinces the auditor that the authority published a binary, and thus that it is able to be inspected by monitors, whereas in fact it is not and only appears that way in the auditor's "split" view of the log.
S2: Availability.
We should prevent attacks on availability, in which the information contained in the audit log convinces the auditor that a binary is available to be inspected by monitors, when in fact the authority has not published it or has, after the initial publication, lost it or intentionally taken it down.
S3: Auditor privacy.
We should ensure that the specific binaries in which the auditor is interested are not revealed to any other parties. We thus consider how to achieve this not only in the face of malicious authorities, but also in the case in which all parties aside from the auditor are malicious.
D1: Efficiency.
Contour should operate as efficiently as possible, in terms of computational, storage, and communication costs. In particular, the overhead beyond the existing requirements for a software distribution system should be minimal.
D2: Minimal setup.
In addition to the computational overheads, we would like as little effort as possible (in terms of, e.g., coordination) to be needed in order to deploy Contour, and for it to require the minimal amount of change to the existing system.
[Figure 1 depicts the Bitcoin blockchain, auditor, authority, monitor, service, client, and n archival nodes, with numbered interactions: (1) service sends binary; (2) commit; (3) authority sends binary and proof of inclusion (prove_incl); (4) auditor gets headers; (5) check_incl; (6) get_commits; (7) monitor inspects binary data (from archival node or authority); (8) get_commits; (9) archival nodes mirror binary data; (10) get_arch_state.]

Figure 1: The overall structure of Contour. Dashed lines represent steps that are required only if archival nodes are used.
5 DESIGN

In this section we describe the overall design of Contour. An overview of all the interactions in the system can be seen in Figure 1.

5.1 Setup and instantiation

Contour and its security properties make use of a blockchain, whose primary purpose, as we see in Section 6.1, is to provide an immutable ledger that prevents split-view attacks. Because the Bitcoin blockchain is currently the most expensive to attack, we use it here and in our security analysis in Section 6.1, but observe that any blockchain could be used in its place. An authority must initially establish a known Bitcoin address with which Contour commitments are published. As knowledge of the private key associated with the Bitcoin address is required to sign transactions spending transaction outputs sent to the address, this acts as the root of trust for the authority. This address can be an embedded value in the auditor software. An initial amount of coins must be sent to the Bitcoin address to enable the authority to start making transactions from the address.
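The pinned address acting as a root of trust can be sketched as follows. This is a simplification: all key material here is hypothetical, and a double-SHA-256 digest stands in for Bitcoin's actual address encoding (HASH160 plus Base58Check) for portability, since RIPEMD-160 is not available in every Python build.

```python
import hashlib


def addr_digest(pubkey: bytes) -> bytes:
    # Stand-in for Bitcoin's address derivation; real addresses use
    # RIPEMD160(SHA256(pubkey)) with a checksum encoding.
    return hashlib.sha256(hashlib.sha256(pubkey).digest()).digest()


# Hypothetical authority public key; a real deployment would embed the
# authority's actual Bitcoin address in the auditor software.
AUTHORITY_PUBKEY = b"\x02" + b"\x11" * 32
PINNED_AUTHORITY_ADDR = addr_digest(AUTHORITY_PUBKEY)


def spends_from_authority(input_pubkey: bytes) -> bool:
    # The auditor treats a commitment as genuine only if it spends an
    # output controlled by the pinned authority address.
    return addr_digest(input_pubkey) == PINNED_AUTHORITY_ADDR


assert spends_from_authority(AUTHORITY_PUBKEY)
assert not spends_from_authority(b"\x02" + b"\x22" * 32)
```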
To start, the authority receives information from services; i.e., software binaries from the developers of the relevant packages (Step 1 of Figure 1). As it receives such a binary, it incorporates its hash as a leaf in a Merkle tree with root h_T. The root, coupled with the path down to the leaf representing the binary, thus proves that the authority has seen the binary, so we view the root as a batched statement attesting to the fact that the authority has seen all the binaries represented in the tree. Once the Merkle tree reaches some (dynamically chosen) threshold n in size, the authority runs the commit algorithm (Step 2 of Figure 1) as follows:

commit(h_T): Form a Bitcoin transaction in which one of the outputs embeds h_T by using OP_RETURN. One of the inputs must be a previous transaction output that can only be spent by the authority's Bitcoin address (i.e., a standard Bitcoin transaction to the authority's address). The other outputs are optional and may simply send the coins back to the authority's address, according to the miner's fees it wants to pay. (See Section 7.2 for some concrete choices.) Sign the transaction with the address's private key, publish it to the Bitcoin blockchain, and return the raw transaction data, denoted tx.

Crucially, the commit algorithm stores only the root hash in the transaction, meaning its size is independent of the number of statements it represents. Furthermore, if the blockchain is append-only, i.e., if double spending is prevented, then the log represented by the commitments in the blockchain is append-only as well. After committing a batch of binaries to the blockchain, the authority can now make these binaries accessible to clients.
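The batching behind commit can be sketched in a few lines. This is a simplification under stated assumptions: the leaf-hashing and odd-level padding conventions are ours (Bitcoin's own transaction trees use double SHA-256), transaction assembly and signing are omitted, and only the OP_RETURN output script is constructed.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    # Hash each binary into a leaf, then combine hashes pairwise level
    # by level, duplicating the last hash when a level has odd length.
    level = [H(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def op_return_script(h_T: bytes) -> bytes:
    # OP_RETURN (opcode 0x6a) followed by a single pushdata of the
    # 32-byte root: well under the 80-byte payload limit.
    assert len(h_T) <= 80
    return bytes([0x6a, len(h_T)]) + h_T


# Three toy "binaries" batched into a single 32-byte commitment, so the
# transaction's size is independent of the number of binaries.
binaries = [b"pkg-a_1.0.deb", b"pkg-b_2.3.deb", b"pkg-c_0.9.deb"]
h_T = merkle_root(binaries)
script = op_return_script(h_T)
assert len(h_T) == 32 and script[0] == 0x6a
```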
When a client requests a software update, the authority sends not only the relevant binary, but also an accompanying proof of inclusion, which asserts that the binary has been placed in the log and is thus accessible to monitors (Step 3 of Figure 1).

To generate this proof, the authority must first wait for its transaction to be included in the blockchain (or, for improved security, for it to be embedded k blocks into the chain). We denote the header of the block in which it was included as head_B. The proof then needs to convince anyone checking it of two things: (1) that the relevant binary is included in a Merkle tree produced by the authority, and (2) that the transaction representing this Merkle tree is in the blockchain. Thus, as illustrated in Figure 2, this means providing a path of hashes leading from the values retrieved from the blockchain to a hash of the statement itself.

[Figure 2 depicts the chain of block headers, each containing a transaction Merkle root; one block's transactions Merkle tree, with intermediate nodes leading down to the committing transaction; and that transaction's binaries Merkle tree, with intermediate nodes leading down to the individual binary.]

Figure 2: An example of a path of hashes leading from the block's transactions Merkle root to the hash of bin.

For a given binary bin, the algorithm prove_incl thus runs as follows:

prove_incl(tx, head_B, bin): First, form a Merkle proof for the inclusion of tx in the block represented by head_B. This means forming a path from the root hash stored in head_B to the leaf representing tx; denote these intermediate hashes by π_tx. Second, form a Merkle proof for the inclusion of bin in the Merkle tree represented by tx (using the hash h_T stored in the OP_RETURN output) by forming a path from h_T to the leaf representing bin; denote these intermediate hashes by π_bin. Return (head_B, tx, π_tx, π_bin).
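The path-of-hashes construction underlying prove_incl, and the recomputation a verifier performs, can be sketched as follows. The hashing conventions (single SHA-256, duplicating the last hash on odd levels) are illustrative only; Bitcoin's transaction trees use double SHA-256.

```python
import hashlib


def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_path(leaves, index):
    # Return (path, root): the sibling hashes from leaf `index` up to
    # the root, each tagged with the side the sibling sits on.
    level = [H(leaf) for leaf in leaves]
    path = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        sibling = index ^ 1
        side = "left" if sibling < index else "right"
        path.append((side, level[sibling]))
        level = [H(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return path, level[0]


def fold(leaf_hash, path):
    # Recompute the root from a leaf hash and a sibling path; the
    # proof is valid when the result matches the committed root.
    h = leaf_hash
    for side, sibling in path:
        h = H(sibling + h) if side == "left" else H(h + sibling)
    return h


leaves = [b"bin-a", b"bin-b", b"bin-c", b"bin-d"]
pi_bin, h_T = merkle_path(leaves, 2)
assert fold(H(b"bin-c"), pi_bin) == h_T   # honest proof recomputes the root
assert fold(H(b"bin-x"), pi_bin) != h_T   # a different binary is rejected
```

The same two routines serve both levels of the proof: π_tx over the block's transactions and π_bin over the batched binaries.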
To verify this proof, the auditor must check the Merkle proofs, and must also check the authority's version of the block header against its own knowledge of the Bitcoin blockchain. This means that the auditor must first keep up to date on the headers in the blockchain, which it does by running an SPV client (Step 4 in Figure 1). By running this client, the auditor builds up a set S = {head_{B_i}}_i of block headers, which it can check against the values in the proof of inclusion. This means that, for a binary bin, check_incl (Step 5 in Figure 1) runs as follows:

check_incl(S, bin, (head_B, tx, π_tx, π_bin)): First, check that head_B ∈ S; output 0 if not. Next, extract h_T from tx (using the hash stored in the OP_RETURN output), form h_bin ← H(bin), and check that π_bin forms a path from the leaf h_bin to the root h_T. Finally, form h_tx ← H(tx), and check that π_tx forms a path from the leaf h_tx to the root hash in head_B. If both these checks pass then output 1; otherwise output 0.

As well as verifying the inclusion proof, the auditor must also check that the address from which the proof's transaction was sent matches the authority's address (i.e., one of the transaction inputs must be a previous transaction output that can only be spent by the authority's address).

5.5 Ensuring availability

Independently of auditors, monitors must retrieve all commitments associated with the authority from the blockchain and mirror their binaries (Steps 6 and 7 of Figure 1). This means get_commits runs as follows:

get_commits(): Retrieve all transactions in the blockchain sent with the authority's address, and return the hashes stored in the
OP_RETURN outputs.

After checking the binaries against their commitments, the monitors then inspect them (to, e.g., ensure they are not malware) in ways we consider outside the scope of this paper.

While the system we have described thus far functions correctly and allows monitors to detect if an authority has committed to a binary but not published it, in order to make the binaries themselves available for inspection we assume the monitors can mirror the authority's logs. It therefore fails to satisfy our goal of availability in the event that the authority goes down at some point in time. We thus consider the case where the authority commits binaries to the blockchain but, either intentionally or because it loses the data sometime in the future, does not supply the data to monitors. While this is detectable, as monitors can see that there are commitments in the blockchain with no data behind them, disincentivizing this behavior requires some retroactive real-world method of punishment. More importantly, it prevents the monitor from pinpointing specific bad actions, such as malicious binaries, and thus from identifying potential victims of the authority's misbehavior.

Because of this, it is desirable not only to enable the detection of this form of misbehavior, but in fact to prevent it from happening in the first place. One way to achieve this is to have auditors mirror the binary themselves and send it to monitors before accepting it, to ensure that they have seen it and believe it to be benign.
While this would be effective, and is arguably practical in a setting such as Certificate Transparency (modulo concerns about privacy) where the objects being sent are relatively small, in the setting of software distribution, where the objects being sent are large binaries, it is too inefficient to be considered.

Instead, we propose a new actor in the ecosystem presented in Section 4: archival nodes, or archivists, that are responsible for mirroring all data from the authority (Steps 8 and 9 in Figure 1). To gain the extra guarantee that the data is available to monitors, auditors may thus use any archival nodes of which they are aware to check their state (i.e., the most recent block header for which they have data from the authority) and ensure that they cover the block headers relevant to the proofs they are checking (Step 10 in Figure 1). This means adding the following two algorithms to the list in Section 4.2:
Archivist.get_commits(): The archivist runs this algorithm to access the commitments made by the authority, just as is done by the monitor (using the same algorithm).
Auditor.get_arch_state(): The auditor (optionally) runs this algorithm to obtain the state of any archivists of which it is aware. This is simply the latest block header for which the archival node has mirrored the data behind the commitments held within.

Using archival nodes makes it possible to continue to pinpoint specific bad actions in the past (e.g., the publication of malware), even if the authority loses or stops providing this data, but we stress that their usage is optional and affects only availability. Essentially, archival nodes allow for a more granular detection of the misbehavior of an authority, but do come at the cost of requiring additional nodes to store a potentially large amount of data. If such granularity is not necessary, or if the system has no natural candidates with the necessary storage requirements, then archival nodes do not need to be used and the system remains secure. In Section 8 we explore the role of the archival nodes in the Debian ecosystem and discover that, while the storage costs are indeed expensive, there is already at least one entity playing this role.
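As an illustration of how a monitor or archivist might implement get_commits, the sketch below scans a toy, dict-shaped blockchain for transactions spending from the authority's address and collects the hashes in their OP_RETURN outputs. The data shapes are assumptions for the sketch, not Contour's actual structures.

```python
def get_commits(blockchain, authority_addr):
    """Scan every block for transactions that spend from the authority's
    address and collect the committed root hashes from their OP_RETURN
    outputs, in chain order."""
    commits = []
    for block in blockchain:
        for tx in block["txs"]:
            # A commitment transaction must be spendable only by the
            # authority's address and must carry an OP_RETURN output.
            if authority_addr in tx["inputs"] and "op_return" in tx:
                commits.append(tx["op_return"])
    return commits
```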
In this section, we evaluate Contour in terms of how well it meets the security goals (Section 6.1) and deployability goals (Section 6.2) specified in our threat model in Section 4.3. We also compare it with previous solutions in Section 6.3, and argue that it is the only system to achieve all our goals.
In order to prevent split views, we rely on the security of the Bitcoin blockchain and its associated proof-of-work-based consensus mechanism. If every party has the same view of the blockchain, then split views of the log are impossible, as there is a unique commitment to the state of the log at any given point in time. The ability to prevent split views therefore reduces to the ability to carry out attacks on the Bitcoin blockchain.

If, for whatever reason, the adversary cannot carry out an eclipse attack, then it can perform a split-view attack only if it can fork the Bitcoin blockchain. This naïvely requires it to control 51% of the network's mining power, which we estimate would cost roughly 2,043M USD in electricity and hardware as of December 2017 (see Appendix A for the analysis). Regardless of the exact number, it is generally agreed that carrying out such an attack is prohibitively expensive.

If an eclipse attack is possible, due to the adversary's MitM capability, the adversary can "pause" the auditor at a block height representing some previous state of the log, and can prevent the auditor from hearing about new blocks past this height. It is then free to mine blocks at its own pace, so performing a split-view attack would be significantly cheaper. As a key distinguishing property of Contour's threat model is that split-view attacks should be prevented even in the face of an adversary that can carry out such attacks, it is important to consider the nuances and costs of this attack, especially as we are not aware of any previous literature considering the costs of eclipse attacks on Bitcoin nodes. The cost of performing an eclipse attack depends on how much time the adversary has to perform a split-view attack, as the hash rate depends on the number of mining rigs available.
As a rough estimate (see Appendix A for calculations), if auditors consider a Bitcoin transaction to be confirmed after 6 blocks (the standard for most Bitcoin wallets), then as of December 2017 the attack would cost 8.3M USD if the adversary wants to perform the attack within a week. This would mean, however, that the auditor would receive a new block only every 1.4 days, which would be detectable as an eclipse attack. If auditors conservatively require that new blocks arrive in intervals of up to three hours before assuming that they are the victim of an eclipse attack, then as of December 2017 an attack would cost roughly 91.8M USD.

While the decentralized (and thus fully replicated) nature of the blockchain can guarantee availability, it does so only with respect to the commitments to statements made by the authority, rather than with respect to the statements, and thus the binaries, themselves. As discussed in Section 5.5, the use of the blockchain thus does not guarantee that binaries are actually available for inspection, or will continue to be into the future. Even just using monitors, Contour can already detect that an authority committed a statement without making the statement data (i.e., the actual binaries) available. Using the archival nodes introduced in Section 5.5, we can achieve a stronger notion of availability, in which as long as the binaries have been published at some point they can be retrieved indefinitely into the future, provided these nodes are honest about whether or not they have mirrored the relevant data.

In the setting of binary transparency, many ISPs and hosting providers already provide their customers with local mirrors of Debian repositories. We therefore envision that ISPs can act as archival nodes on behalf of their hosting clients, creating a decentralized network of archival nodes. We elaborate on the overheads required to do so in Sections 7.2 and 8.
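The three-hour arrival rule described above amounts to a simple client-side heuristic: if the SPV client has not heard of a new block within the allowed interval, it should suspect that it is being eclipsed. The sketch below is illustrative, not part of Contour's implementation; the threshold constant comes from the conservative figure in the text.

```python
MAX_BLOCK_INTERVAL = 3 * 60 * 60  # conservative 3-hour threshold, in seconds

def suspect_eclipse(header_arrival_times, now, max_interval=MAX_BLOCK_INTERVAL):
    """Flag a possible eclipse attack if no new block header has arrived
    within the allowed interval. header_arrival_times are local receipt
    timestamps in seconds, most recent last; an empty history is treated
    as suspicious."""
    if not header_arrival_times:
        return True
    return now - header_arrival_times[-1] > max_interval
```

Tightening the interval raises the adversary's cost (it must mine fake blocks faster) at the price of more false alarms during natural lulls in block production.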
Recall from Section 4.2 that one of the goals of Contour was to avoid prolonged interactions and engage only in the atomic exchange of messages. In particular, the auditor receives pre-formed proofs of inclusion from the authority (as opposed to having to request them for specific binaries, as it would in all certificate and key transparency systems), retrieves commitments directly from the blockchain, does not engage in any form of gossip with monitors, and receives the latest block hash from archival nodes without providing any input of its own. We thus achieve privacy by design, as at no point does the auditor reveal the statements in which it is interested to any other party. One particular point to highlight is that Contour achieves auditor privacy despite the fact that auditors run SPV clients, which are known to potentially introduce privacy issues due to the use of Bloom filtering and the reliance on full nodes. This is because the proofs of inclusion contain both the raw transaction data and the block header, so the auditor does not need to query a full node for the inclusion of the transaction and can instead verify it itself (and, as a bonus, saves the bandwidth costs of doing so).
Table 1 summarizes the computational complexity of each of the operations required to run Contour, and Table 2 summarizes the size complexity (which in turn informs the bandwidth requirements, as we explore further in Section 7.2). As we will see in Sections 7.2 and 8, in real deployments of Contour there are already significant storage costs for the authority and archival nodes, as they must store the full set of binaries. It therefore does not impose a significant additional burden to have them perform relatively inefficient (i.e., linear in n) operations or store relatively inefficient objects.

Operation                   Time complexity
commit                      O(n_S)
prove_incl (one-time)       O(log(n_T))
prove_incl (per statement)  O(log(n_S))
check_incl                  O(log(n_S) + log(n_T))

Table 1: Asymptotic computational costs for the operations of Contour, where n_S is the number of statements in a batch and n_T is the number of transactions in a block.

Object                        Size complexity
Inclusion proof               O(log(n_S) + log(n_T))
Log commitment (tx)           O(1)
Archival node data overhead   O(n_S)

Table 2: Asymptotic storage costs for the objects in Contour, where n_S is the number of statements in a batch and n_T is the number of transactions in a block.

As for the end-user devices on which the auditor is run, we impose a relatively minimal performance overhead (with everything logarithmic in n_S and/or n_T), and confirm this in Section 7.2.3. In terms of coordination, the only setup requirement in Contour is the role of the archival nodes, as the rest is just a matter of adding software. As we will see in Section 8 when we look at Debian, in some settings there are already natural candidates for these actors, but if these actors are not interested in the guarantees of Contour then we can still deploy it without requiring the existing actors to change their behavior.
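As a concrete sanity check on the inclusion-proof asymptotics, the proof's byte size can be computed from per-object constants, where each Merkle audit path contributes one 32-byte hash per tree level. The helper below is an illustrative sketch, not part of the Contour API; the 80-byte header and 235-byte transaction constants are the ones used in the benchmarks of Section 7.

```python
import math

HEADER_SIZE = 80   # bytes per Bitcoin block header
TX_SIZE = 235      # bytes for the commitment transaction (Section 7)
HASH_SIZE = 32     # SHA-256 digest size

def inclusion_proof_size(n_statements, n_block_txs):
    """Size in bytes of an inclusion proof: block header + raw transaction
    + the two Merkle audit paths, giving O(log n_S) + O(log n_T)."""
    path_tx = math.ceil(math.log2(n_block_txs)) * HASH_SIZE
    path_stmt = math.ceil(math.log2(n_statements)) * HASH_SIZE
    return HEADER_SIZE + TX_SIZE + path_tx + path_stmt
```

Doubling the batch size adds only one more hash (32 bytes) to the proof, which is the logarithmic scaling claimed in Table 2.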
More importantly, there are no trust requirements placed on these nodes to prevent log equivocation: even if archival nodes misbehave, monitors can still individually detect misbehavior by an authority that publishes commitments but not the underlying data. This is in stark contrast to previous solutions that require the initial establishment of a semi-trusted set of nodes.
To fully pinpoint both the benefits and tradeoffs of Contour, we compare it with several known systems designed to provide transparency. In particular, we consider the tradeoffs as compared to Certificate Transparency (CT), Collective Signing (CoSi) [30], CONIKS [24], and Bitcoin. We summarize these tradeoffs in Table 3.

          Security goals (S1-S3)                          Deployability goals (D1-D2)
System    Split views  Availability  Auditor privacy      Efficiency (cost)  Efficiency (size)  Minimal setup
CT        detect       no*           no                   log(n)             O(1)               yes
CoSi      detect       no*           no                   O(1)               O(1)               no
CONIKS    detect       no            no                   log(n)             O(1)               no
Bitcoin   prevent      yes           yes                  n                  n                  yes
Contour   prevent      yes           yes                  log(n)             b                  yes

Table 3: A comparison between existing solutions and Contour in terms of the five goals presented in Section 4.3. For efficiency, we measure the asymptotic costs for the auditor in terms of both the computations it must perform ('cost') and the data it must store ('size'). We use n to denote the number of statements and b to denote the number (but not size) of blocks in the Bitcoin blockchain. For CoSi, availability is not an explicit requirement, but can be satisfied as long as at least one witness retains the data, and for CT it is not satisfied by the basic design but could be if auditors and monitors gossiped about certificates.

Looking at Table 3, we first mention that the efficiency numbers for CoSi are somewhat misleading, as there is no global log and thus no notion of checking inclusion in the log; this is why we list the efficiency costs as constant. In fact, only Bitcoin and Contour ensure a globally consistent ledger, as certificates are stored in a distributed set of logs in CT and CONIKS and there is no proposed method for achieving consensus amongst them. Arguably the main benefit of both CT and CONIKS is their efficiency, as the auditor is required to store only a single hash. The tradeoff, however, is that they cannot prevent the authority from launching a split-view attack, but instead rely on gossiping mechanisms to detect such misbehavior after the fact. As discussed in
the introduction, this is problematic in a setting, like binary transparency, in which adversaries can launch persistent man-in-the-middle attacks. These systems also do not achieve robust privacy for the auditor, as it must periodically reveal information to the authority (or the monitor) about the objects in which it is interested.

The other main tradeoff we observe is, perhaps unsurprisingly, between efficiency and setup costs. The first three systems all require the establishment of some initial set of distributed entities that are trusted to some extent (if not individually, then as a group): in the case of CT, log servers are essentially authorized by Google; in the case of CONIKS, identity providers are chosen by users and listed in a PKI; and in the case of CoSi, witnesses must form a Sybil-free set. We require no such setup, which means Contour is much more easily integrated into existing systems. In contrast, in both Bitcoin and Contour, the blockchain is maintained by a decentralized network and is not subject to intervention by central authorities. While Contour mitigates the inefficiency of Bitcoin, it still requires the auditor to store some information from all the block headers. We show in the next two sections that Contour is nevertheless efficient enough to be practical, but leave it as an interesting open problem to investigate to what extent these tradeoffs between efficiency and decentralization are inherent.

To test Contour and analyze its performance, we have implemented and provided benchmarks for a prototype Python module and toolset that developers can use. We have released the implementation as an open-source project. The implementation consists of roughly 1,000 lines of Python code, and provides a set of developer APIs and corresponding command-line tools.
We used SHA-256 as the hashing algorithm to build Merkle trees, and modified versions (for Bitcoin compatibility) of an existing Merkle tree implementation (https://github.com/jvsteiner/merkletree) and a Python-based Bitcoin library, pycoinnet (https://github.com/richardkiss/pycoinnet/), in order to develop our Merkle tree and SPV client, respectively.
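To make the batching step concrete, the sketch below builds a SHA-256 Merkle root over a batch of statements. It is an illustration, not the library linked above: in particular, promoting an unpaired node unchanged to the next level is just one common convention and an assumption here.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(statements):
    """Build a SHA-256 Merkle tree over a batch of statements and return
    its root, the 32-byte value the authority embeds in OP_RETURN.
    An unpaired node at the end of a level is promoted unchanged."""
    if not statements:
        raise ValueError("empty batch")
    level = [sha256(s) for s in statements]  # leaves are statement hashes
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            if i + 1 < len(level):
                nxt.append(sha256(level[i] + level[i + 1]))
            else:
                nxt.append(level[i])  # odd node carried up
        level = nxt
    return level[0]
```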
Authority: We provide API calls for Authority.commit, which commits batches of statements to the Bitcoin blockchain, and Authority.prove_incl, which allows it to generate inclusion proofs for individual statements. (The implementation is available at https://github.com/musalbas/contour.)

Auditor: We provide an API call for Auditor.check_incl, which allows end-user software to verify proofs of inclusion. We also provide an Auditor.sync call that uses the Bitcoin SPV protocol to download and verify all the block hashes in the Bitcoin blockchain, so that inclusion proofs can be efficiently verified independently of third parties. (This call needs to be run only once.)

Monitor: We provide an API call for Monitor.get_commits, which gets all the statement batches associated with a specific authority. Monitors can then use these commitments to check the validity of the statement data (which they can retrieve from the authority or an archival node using a web server), and do whatever manual inspection is necessary; we consider this functionality outside the scope of this paper.

Archival node: The archival node API can be used to operate an archival node, by specifying the authority's Bitcoin address and the web address where statement data is published. The archival state and mirrored statement data are stored as flat files on disk, allowing the archival node to provide access to auditors and monitors by running a web server. By accessing the archival state via an HTTPS server, auditors can securely authenticate the state of the archival node using public-key cryptography.
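A flat-file mirror of the kind described above can be sketched in a few lines: each statement is stored under its own hash, so any auditor or monitor can re-hash a mirrored file to authenticate it. The layout here is illustrative, not the exact on-disk format used by the released implementation.

```python
import hashlib
import os

def store_statement(root_dir, data):
    """Mirror one statement as a flat file named by its SHA-256 hash
    (hex), and return that name. Content-addressing makes the mirror
    self-verifying."""
    name = hashlib.sha256(data).hexdigest()
    with open(os.path.join(root_dir, name), "wb") as f:
        f.write(data)
    return name

def verify_mirror(root_dir, expected_hash_hex):
    """Re-hash a mirrored file and compare against its claimed hash,
    as an auditor or monitor fetching from the archival node would."""
    with open(os.path.join(root_dir, expected_hash_hex), "rb") as f:
        return hashlib.sha256(f.read()).hexdigest() == expected_hash_hex
```

Because files are content-addressed, the archival node itself need not be trusted for integrity, only for availability, matching the trust discussion in Section 6.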
To evaluate the performance of our implementation, we tested all the operations listed above on a laptop with an Intel Core i5 2.60 GHz CPU and 12 GB of RAM, connected to a WiFi network with an Internet connection of 5 Mbit/s. We also assume that a batch to be committed contains 1 million statements; as was seen in Table 1, and will be confirmed later on in Figure 3, these numbers scale as expected (either logarithmically or linearly), so it is easy to extrapolate the results for other batch sizes given the ones we present here. We consider the complexity of these operations in terms of their computational, storage, and bandwidth requirements. A summary of our timing benchmarks can be found in Table 4, and our bandwidth requirements are in Table 5.
The overhead of both generating and verifying a proof of inclusion is dependent on the number of transactions in a Bitcoin block. To capture the worst-case scenario, we consider the maximum number of transactions that can fit into a block. Currently, the Bitcoin block size limit is 1 MB, up to 97 bytes of which is non-transaction data. The minimum transaction size is 166 bytes, so the upper bound on the number of transactions in a given block is 6,023. While this is far higher than the number of transactions that Bitcoin blocks currently contain, we nevertheless use it as a worst-case cost and an acknowledgment that Bitcoin is evolving and blocks may grow in the future. (See https://en.bitcoin.it/wiki/Block, https://en.bitcoin.it/wiki/Maximum_transaction_rate, and https://blockchain.info/charts/n-transactions-per-block.)

Operation                   Time      σ
commit                      5.93 s
prove_incl (one-time)       8 µs
prove_incl (per statement)  12 µs     6 µs
check_incl                  224 µs    62.14 µs

Table 4: Average time of individual operations, and standard deviation σ, when the batch size is 1M. The timings for commit were averaged over 20 runs, and for prove_incl and check_incl over 1M runs. Note that the timings for commit are in seconds, not microseconds.

Operation                                          Bandwidth
Authority.commit (using APIs)                      1 MB
Authority.commit (one-time setup for full node)    126 GB
Authority.commit (using full node)                 235 B
Auditor.sync                                       39.8 MB
Auditor.prove_incl                                 1,371 B

Table 5: The bandwidth cost of operations, when the batch size is 1M. The cost of Authority.commit depends on whether the authority is running a full Bitcoin node or relying on third-party APIs. For running a full node, there is a one-time setup cost to synchronize the blockchain.

To run commit and prove_incl, an authority must have access to the full blocks in the Bitcoin blockchain, as well as the ability to broadcast transactions to the network. Rather than achieve these by running the authority as a full node, our implementation uses external blockchain APIs supplied by blockchain.info and blockcypher.com. This decision was based on the improved efficiency and ease of development for prototyping, but it does not affect the security of the system: authorities do not need to validate the blockchain, as invalid blocks from a dishonest external API simply result in invalid inclusion proofs that are rejected by the auditor.

To run commit, an authority must first build the Merkle tree containing its statements. Sampled over 20 runs, the average time to build a Merkle tree for 1M statements was 5.90 s (σ = 0.29 s). After building the tree, an authority next embeds its root hash (which is 32 bytes) into an OP_RETURN Bitcoin transaction to broadcast to the network. Sampled over 1,000 runs, the average time to generate this transaction, in the standard case of one input and two outputs (one for OP_RETURN and one for the authority's change), was 0.03 s (σ = 0.007 s). The average total time to run commit was thus 5.93 s, as seen in Table 4, and it resulted in 235 bytes (the size of the transaction) being broadcast to the network.

Next, to run prove_incl, the authority proceeds in two phases: first constructing the Merkle proof for its transaction within the block where it eventually appears, and next constructing the Merkle proof for each statement represented in a transaction. The time for the first phase, averaged over 1M runs and for a block with 6,023 transactions (our upper bound from Section 7.2.1), was roughly 8 µs; the second phase averaged 12 µs per statement (σ = 6 µs), as seen in Table 4.

For the auditor, we considered two costs: the initial cost to retrieve the necessary header data (sync), and the cost to verify an inclusion proof (check_incl). We do not provide benchmarks for the Auditor.get_arch_state call, as this is a simple web request that returns a single 32-byte hash. To run sync, auditors use the Bitcoin SPV protocol to download and verify the headers of each block, which are 80 bytes each. As of December 5 2017, there are 497,723 valid mined blocks, which equates to roughly 39.8 MB of header data.

To run check_incl, we again use our upper bound from Section 7.2.1 and assume every block contains 6,023 transactions. This means the inclusion proof contains: (1) an 80-byte block header; (2) the raw transaction data, which is 235 bytes; (3) a Merkle proof for the transaction, which consists of ⌈log2(6023)⌉ = 13 32-byte hashes; and (4) a Merkle proof for the statement, which consists of ⌈log2(10^6)⌉ = 20 32-byte hashes, for a total of 1,371 bytes. Averaged over 1M runs, the time for the auditor to verify the inclusion proof was 224 µs (σ = 62.14 µs).

[Figure 3: The time to verify an inclusion proof with varying batch sizes, averaged over 100K runs.]

To confirm that the time to run check_incl scales logarithmically with the number of statements in the batch, we also ran it for varying numbers of statements. The results are in Figure 3.

Monitors must run a Bitcoin full node in order to get a complete uncensored view of the blockchain. As of December 2017, running a full node requires 145 GB of free disk space, increasing by up to 144 MB daily. It took us around three days to fully bootstrap a full node and verify all the blocks, although this operation needs to be performed only once per monitor.
Like monitors, archival nodes need to run a Bitcoin full node. Additionally, archival nodes must download and store all the data from the authority. The costs here are entirely dependent on the number and size of the statements; we examine the costs for Debian in Section 8.

Archival nodes must know which statement data to download from authorities so that they can independently rebuild the Merkle tree roots committed in Bitcoin transactions and check that they match the data provided; authorities must therefore point the archival nodes to the location of the data. Again, this is dependent on the mechanism that the authority uses to make the data available. As in Debian, however, archives use statements that represent files. We may therefore expect that, in addition to a Merkle tree, authorities would use metadata files to link each leaf in the tree to a file on the server that archival nodes then mirror; this would be particularly useful in a setting, like Debian, where it would be undesirable to reorganize files that are already stored. The metadata file would consist of a mapping of 32-byte hashes to filenames. The average Debian package filename is 60 bytes, so including such a metadata file would introduce an average storage overhead, for both authorities and archival nodes, of 92 bytes per statement.
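The metadata mapping just described can be sketched as a simple fixed-prefix record format: 32 bytes of hash followed by the filename. The exact serialization is an assumption of the sketch, but it makes the 92-byte average entry size easy to verify.

```python
HASH_LEN = 32          # SHA-256 digest length
AVG_FILENAME_LEN = 60  # average Debian package filename, per the text

def metadata_entries(mapping):
    """Serialize a {hash_bytes: filename} map as records of a 32-byte
    hash followed by the filename (illustrative format)."""
    out = []
    for h, fname in mapping.items():
        if len(h) != HASH_LEN:
            raise ValueError("leaf hashes must be 32 bytes")
        out.append(h + fname.encode())
    return out

def avg_entry_size(entries):
    """Average bytes per statement introduced by the metadata file."""
    return sum(len(e) for e in entries) / len(entries)
```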
To demonstrate how Contour can be used on a real system, we prototyped it for auditing software binaries in the Debian software repository. Our results show that Contour provides a way to add transparency to this repository without major changes to the existing infrastructure and with minimal overheads. It could be deployed on top of the Debian ecosystem today, without any participant who did not want to opt in having to change their behavior. We begin with an overview of how Debian currently works, and then go on to explain how existing actors in the ecosystem could play the roles necessary for Contour, along with the overheads.
Debian is a popular Linux distribution used by over 32% of websites that run Linux. Software packages are installed and updated on Debian machines using the apt command-line program. The Debian software repository contains Release files for various versions of Debian, which are updated every time any package in the repository is updated. Each Release file contains a checksum for a Packages file, which contains a list of available software packages and their associated checksums for integrity checking. Software packages are downloaded as .deb archives, which provide the compiled binaries and scripts required to install a package on a system. These files are hosted in directories on HTTP mirrors, of which hundreds exist around the world.

To cryptographically authenticate software packages, Debian has a set of tools called apt-secure. Debian installations come with a built-in set of PGP keys [15] that are used as trusted keys for validating software packages. Alongside the Release files in the repository, there are Release.gpg files that contain PGP signatures of the Release files under trusted PGP keys. Through the single signature of a Release file, apt can validate that individual .deb packages were authorised by a trusted PGP key, by checking that the checksums of packages are included in the Packages file whose checksum is included in the root Release file. This of course creates a central point of failure, as the owner of the signing key can serve individual users targeted Release files that link to malicious packages, for example if coerced to do so by law enforcement.
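The apt-secure checksum chain described above can be sketched as two hash comparisons: the signed Release file pins the Packages file, which in turn pins each .deb. The dict shapes below are illustrative stand-ins, not Debian's actual file formats.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def verify_package(release, packages_file, deb_name, deb_data):
    """Sketch of apt's integrity chain: Release pins the checksum of the
    Packages index, which pins the checksum of each .deb archive."""
    # 1. The Packages index must match the checksum in the signed Release.
    if release["Packages"] != sha256_hex(packages_file["raw"]):
        return False
    # 2. The downloaded .deb must match its entry in the Packages index.
    expected = packages_file["checksums"].get(deb_name)
    return expected is not None and expected == sha256_hex(deb_data)
```

The chain's weakness, as noted above, is that everything hangs off a single signature: a coerced key owner can serve a different, equally valid-looking Release file to a targeted user, which is exactly what a transparency log is meant to expose.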
In the case of Debian software distribution, the most natural operators for a Contour authority are the maintainers of the software repository. Specifically, the Contour authority would be the owner of the PGP key, as only this entity has the power to modify the software repository. Importantly, it is also possible for third parties to act as Contour authorities by proxy and commit binaries to the log on behalf of the maintainers of the Debian software repository. As committed binaries are transparent, the third party is not trusted any more than the maintainers of the Debian software repository would be, as any rogue additions to the log would still be detectable. This means it would be possible to deploy Contour today without any intervention or permission from the Debian project itself. (See https://w3techs.com/technologies/details/os-linux/all/all and https://wiki.debian.org/SecureApt.)

The authority would publish the hash of each package (.deb) file as metadata to be downloaded by Debian machines; at 980K software packages, this amounts to a 0.07% overhead.
On the end-user side, the apt program would need to be modified to integrate the Auditor.check_incl and Auditor.sync calls, as implemented and analyzed in Section 7. This would ensure that downloaded packages are in the log before being installed. In terms of overhead for end-user Debian machines, the extra data required is small relative to the average .deb archive size of 190 MB, and verifying that each downloaded package is in the log adds only a small computational cost.

Debian's reproducible builds project allows any interested parties to verify that binaries published in the software repositories are compiled from a given source code. There are no specific parties assigned to the role of monitoring builds to see if they can be built from the source code. Similarly, in Contour, any parties vested in the security of Debian may act as a monitor. Aside from end users, we anticipate that large organizations supplying critical infrastructure using Debian, national CERTs, and NGOs such as the Electronic Frontier Foundation would have an interest in monitoring the log. Generally, any party that wants extra guarantees about the software updates they are installing (e.g., in order to be sure that the updates that have been pushed to their machines are the same as those that have been pushed to other machines) should run a monitor. For example, if a party running Debian receives update1 and update2 on their machine for some software package, but the log contains update1, update2, and update3, then this raises a red flag as to why they did not receive update3. In particular, update3 may be a malicious update targeted to specific machines, and the party can check to see if the contents of update3 have been made available by the authority. If they have not, then the authority is considered to be misbehaving. Optionally, honest archival nodes would prevent auditors from accepting the update altogether.

There are 269 Debian mirrors around the world hosting the full package archive.
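The red-flag check from the update example above, comparing the updates a machine has received against those committed in the log, reduces to a simple set difference. This sketch is illustrative of the monitoring logic, not part of the Contour API.

```python
def missing_updates(received, logged):
    """Return updates that appear in the log for a package but were never
    delivered to this machine, preserving log order. Any such update is a
    red flag for targeted (and possibly malicious) distribution."""
    received_set = set(received)
    return [u for u in logged if u not in received_set]
```

A non-empty result prompts the party to check whether the authority has actually published the missing update's contents; if it has not, the authority is considered to be misbehaving.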
In summary, Contour could be deployed on top of the existing system for Debian software distribution with minimal changes to the existing infrastructure. In terms of operating costs, the biggest overhead required to enable Contour is the extra storage space required for archival nodes (and again, this cost is optional). All other costs are minimal, with only a 0.07% storage overhead required for the authority, and a 0.1% bandwidth overhead for the end user. The computational costs for these users are minimal as well. (See https://wiki.debian.org/ReproducibleBuilds and https://snapshot.debian.org/.)

One distinguishing feature of Contour is that no existing parties in the Debian infrastructure are required to participate if they do not want to, and as discussed earlier the security assumptions of the system would remain the same even if a third party acted as an authority. This places Contour in contrast to existing proposals for transparency (including some of the ones presented in Section 6.3), as they require the initial setup of some Sybil-free set of nodes. In contexts such as the distribution of Debian software packages, this assumption, and the security implications if it is violated, presents a significant obstacle to deployability; avoiding this obstacle was one of our main goals in designing Contour.

Selective disclosure.
When releasing software updates that patch critical security vulnerabilities, some software vendors may prefer not to reveal the vulnerability to potential attackers, who could otherwise take advantage of victims with the vulnerable software installed during the window of time in which a commitment has not yet been included in the blockchain. Contour accounts for this by allowing the authority to commit to a batch of binaries visibly on the blockchain, but delay the publication of the binaries themselves until the commitment is sufficiently deep in the blockchain.
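The commit-then-delay policy just described amounts to a depth gate on publication. The sketch below is illustrative; the 6-block threshold is an assumption borrowed from the standard confirmation depth mentioned in Section 6, not a value fixed by Contour.

```python
CONFIRMATIONS = 6  # assumed settlement depth, as for most Bitcoin wallets

def may_publish(commit_height, chain_tip_height, min_depth=CONFIRMATIONS):
    """Selective disclosure: the authority commits to the batch immediately,
    but releases the binaries only once the commitment transaction is
    buried under at least min_depth blocks (counting its own block)."""
    return chain_tip_height - commit_height + 1 >= min_depth
```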
Generalized transparency.
Although we have designed Contour for the specific application of binary transparency, the system is general enough to be applied to other applications requiring transparency. With the tradeoffs discussed in Section 6.3, it can even be applied to the setting of certificate transparency by using CAs as authorities, although it may be most beneficial in settings that present similar challenges to the ones discussed in the introduction (i.e., in which objects are large and persistent MitM attacks are a realistic threat).
Archival node scalability.
The current design of Contour requires archival nodes to store all data, which as we have discussed in Section 8 incurs a significant overhead. There are likely many alternative designs that alleviate these requirements, such as a sharded solution in which archival nodes store only the data for which they have sufficient space, and we leave an exploration of this space as an interesting open problem.
10 CONCLUSION
We have proposed Contour, a system that provides proactive transparency, scales logarithmically for auditors in the number of packages they have installed, and does not require the initial coordination of forming a Sybil-free set of nodes. We have demonstrated that, even for attackers capable of performing persistent man-in-the-middle attacks, compromising the integrity of the system requires millions of dollars in energy and hardware costs. We also saw that Contour could be applied today to the Debian software repository with relatively low overhead to existing infrastructure, and with no changes or coordination required for any participant (even the Debian server) who does not wish to opt in.
ACKNOWLEDGEMENTS
Mustafa Al-Bassam is supported by a scholarship from the Alan Turing Institute, and Sarah Meiklejohn is supported by EPSRC grant EP/N028104/1.
REFERENCES
[1] Security/Binary Transparency - MozillaWiki, 2017. https://wiki.mozilla.org/Security/Binary_Transparency.
[2] M. Apostolaki, A. Zohar, and L. Vanbever. Hijacking Bitcoin: Large-scale Network Attacks on Cryptocurrencies, 2016. arxiv.org/abs/1605.07524.
[3] M. Bartoletti and L. Pompianu. An analysis of Bitcoin OP_RETURN metadata. In , 2017.
[4] D. Basin, C. Cremers, T. H.-J. Kim, A. Perrig, R. Sasse, and P. Szalachowski. ARPKI: Attack Resilient Public-Key Infrastructure. In ACM CCS 2014, pages 382–393, 2014.
[5] J. Bonneau. EthIKS: Using Ethereum to audit a CONIKS key transparency log. In , 2016.
[6] M. Chase and S. Meiklejohn. Transparency Overlays and Applications. In ACM SIGSAC Conference on Computer and Communications Security, 2016.
[7] L. Chuat, P. Szalachowski, A. Perrig, B. Laurie, and E. Messeri. Efficient Gossip Protocols for Verifying the Consistency of Certificate Logs. In IEEE Conference on Communications and Network Security, 2015.
[8] X. de Carné de Carnavalet and M. Mannan. Challenges and implications of verifiable builds for security-critical open-source software. In , 2014.
[9] B. Dowling, F. Günther, U. Herath, and D. Stebila. Secure Logging Schemes and Certificate Transparency. In ESORICS 2016, 2016.
[10] A. Eijdenberg, B. Laurie, and A. Cutter. Verifiable Data Structures, 2015. github.com/google/trillian/blob/master/docs/VerifiableDataStructures.pdf.
[11] S. Eskandarian, E. Messeri, J. Bonneau, and D. Boneh. Certificate transparency with privacy. CoRR, abs/1703.02209, 2017.
[12] I. Eyal and E. G. Sirer. Majority is not enough: Bitcoin mining is vulnerable. In Financial Cryptography and Data Security, 2014.
[13] C. Farivar. Judge: Apple must help FBI unlock San Bernardino shooter’s iPhone, 2016. arstechnica.com/tech-policy/2016/02/judge-apple-must-help-fbi-unlock-san-bernardino-shooters-iphone/.
[14] C. Fromknecht, D. Velicanu, and S. Yakoubov. A decentralized public key infrastructure with identity retention. IACR Cryptology ePrint Archive, Report 2014/803, 2014. eprint.iacr.org/2014/803.pdf.
[15] S. Garfinkel. PGP: Pretty Good Privacy. O’Reilly Media, Sebastopol, CA, USA, 1st edition, 1996.
[16] A. Gervais, H. Ritzdorf, G. Karame, and S. Capkun. Tampering with the Delivery of Blocks and Transactions in Bitcoin. In ACM CCS 2015, 2015.
[17] D. Goodin. “Flame” malware was signed by rogue Microsoft certificate, 2012. arstechnica.com/security/2012/06/flame-malware-was-signed-by-rogue-microsoft-certificate/.
[18] E. Heilman, A. Kendler, A. Zohar, and S. Goldberg. Eclipse Attacks on Bitcoin’s Peer-to-Peer Network. In USENIX Security 2015, 2015.
[19] T. H.-J. Kim, L.-S. Huang, A. Perrig, C. Jackson, and V. Gligor. Accountable key infrastructure (AKI): a proposal for a public-key validation infrastructure. In WWW 2013, pages 679–690, 2013.
[20] E. K. Kogias, P. Jovanovic, N. Gailly, I. Khoffi, L. Gasser, and B. Ford. Enhancing Bitcoin Security and Performance with Strong Consistency via Collective Signing. In USENIX Security 2016.
USENIX Security 2015, 2015.
[25] S. Nakamoto. Bitcoin: A Peer-to-Peer Electronic Cash System, 2008. bitcoin.org/bitcoin.pdf.
[26] K. Nikitin, E. Kokoris-Kogias, P. Jovanovic, N. Gailly, L. Gasser, I. Khoffi, J. Cappos, and B. Ford. CHAINIAC: Proactive software-update transparency via collectively signed skipchains and verified builds. In , pages 1271–1287, Vancouver, BC, 2017. USENIX Association.
[27] L. Nordberg, D. Gillmor, and T. Ritter. Gossiping in CT, 2016. tools.ietf.org/html/draft-ietf-trans-gossip-03.
[28] M. D. Ryan. Enhanced Certificate Transparency and End-to-end Encrypted Mail. In NDSS 2014, 2014.
[29] A. Singh, T.-W. J. Ngan, P. Druschel, and D. S. Wallach. Eclipse attacks on overlay networks: Threats and defenses. In IEEE Conference on Computer Communications, 2006.
[30] E. Syta, I. Tamas, D. Visher, D. I. Wolinsky, P. Jovanovic, L. Gasser, N. Gailly, I. Khoffi, and B. Ford. Keeping Authorities “Honest or Bust” with Decentralized Witness Cosigning. In IEEE Symposium on Security and Privacy (“Oakland”), 2016.
[31] A. Tomescu and S. Devadas. Catena: Efficient Non-equivocation via Bitcoin. In IEEE Symposium on Security and Privacy (“Oakland”), 2017.
A COST OF A SPLIT-VIEW ATTACK
To support our argument in Section 6.1 about the infeasibility of carrying out a split-view attack, we provide here more concrete estimates for the associated costs of the attack. These are rough estimates, as they make assumptions about certain properties (e.g., electricity costs and choice of mining hardware) that are not guaranteed to hold in practice. We are not aware of any previous literature considering the costs of eclipse attacks on Bitcoin nodes, so we consider these estimates (even if rough) to be important.
We first calculate the cost to mine a single block, and then analyze the cost of performing a split-view attack in the case where the adversary is able to perform an eclipse attack and where it cannot.
Cost to mine a single block.
The probability of a miner finding a valid block after each hashing attempt is (D · 2^32)^(−1), where D is the periodically adjusted difficulty of the network. For a miner to mine a block, then, they must make on average D · 2^32 hashing attempts. The total electricity cost (C) of mining a block is thus

C = D · 2^32 · J · E, (1)

where J is the number of joules required per hashing attempt, and E is the electricity cost of one joule. As of December 2017, the most energy-efficient Bitcoin mining hardware is the Antminer S9, which has an energy cost of 9.9 · 10^(−11) joules per hash, and the average retail price of one kilowatt hour in the US is 0.10 USD. The cost per joule, E, is therefore 0.10 / (3.6 · 10^6) = 2.78 · 10^(−8) USD. As of December 2017, the Bitcoin mining difficulty (D) is 1,347,001,430,558. Plugging these numbers into Equation 1, the total electricity cost to mine a block, using the most efficient hardware and assuming standard electricity costs, is thus 15,908 USD.
To also take hardware costs into account, the number of mining rigs N needed to mine a block in S seconds is

N = (D · 2^32) / (H · S), (2)

where H is the number of hashes that the mining rig is capable of calculating per second. This formula is graphed in Figure 4 for the Antminer S9 rig, which is capable of calculating 14 terahashes per second and has a retail cost of 2,400 USD. We use these formulas to estimate the cost of split-view attacks in the following analysis.
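As a sanity check on these figures, Equations 1 and 2 can be evaluated directly. This is a rough sketch using the December 2017 values from the text; the per-hash energy figure of 9.9 · 10^(−11) joules is back-calculated from the 15,908 USD total, so small rounding differences are expected.

```python
import math

# December 2017 figures from the text (Antminer S9 assumptions).
D = 1_347_001_430_558   # Bitcoin mining difficulty
J = 9.9e-11             # joules per hashing attempt (Antminer S9)
E = 0.10 / 3.6e6        # USD per joule (0.10 USD per kilowatt hour)
H = 14e12               # hashes per second per rig (14 TH/s)

# Equation 1: expected electricity cost (USD) to mine one block.
C = D * 2**32 * J * E

# Equation 2: number of rigs needed to mine one block within S seconds.
def rigs_needed(S):
    return math.ceil((D * 2**32) / (H * S))

print(round(C))                  # close to the 15,908 USD figure in the text
print(rigs_needed(1.4 * 86400))  # rigs for one block every 1.4 days
```

The second call reproduces the 3,417-rig figure used in the eclipse-attack analysis below.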
Using eclipse attacks.
If an eclipse attack is possible, an adversary can launch a successful split-view attack solely by mining k blocks at its own pace, where k is the number of blocks the auditor requires to be mined after a block containing a given commitment in order to consider that commitment as valid. (It is standard in most Bitcoin wallets to use k = 6.) (Source for mining hardware figures: en.bitcoin.it/wiki/Mining_hardware_comparison.)
Figure 4: The number of Antminer S9 rigs required to produce blocks under a certain time limit.
Using our rough estimates above, it would cost the adversary 15,908 USD in electricity costs to mine a block, or 95,448 USD for k = 6. The hardware costs depend on how much time the adversary needs to conduct the attack, or how long they are able to continue their man-in-the-middle attack on the auditor. If, as a conservative number, the adversary wants to conduct the attack within a week, it must mine a block every 1.4 days to produce 6 blocks, which requires 3,417 mining rigs at a hardware cost of 8,200,800 USD. This brings the total cost of the attack to 8.3M USD. Moreover, this attack is also fundamentally targeted: if the adversary wants to later compromise previously non-eclipsed auditors, it must mine a new set of blocks (assuming these auditors have more up-to-date blocks) and pay the electricity costs again. Even for an adversary with few financial constraints, this makes it significantly more difficult to conduct such an attack on a wide scale.
Furthermore, if the adversary takes 1.4 days to mine a block, or in general the auditor sees no new blocks until long after the expected 10-minute interval, it may assume that an eclipse attack is being performed. We can thus greatly increase the cost of the attack by adding simple checks to the auditor to ensure that there is a maximum interval between blocks. If we generously set such a check to require a maximum of 3 hours between blocks, then a total of 38,263 mining rigs are required at a cost of 91.8M USD.
In addition, the blocks must still follow the same difficulty level as honest blocks, so by mining these only in the eclipsed view of the network the adversary is not only expending the energy needed to do so but is also forfeiting the mining reward associated with them. As of December 5 2017, the Bitcoin mining reward is 12.5 bitcoins, or roughly 145,250 USD, so for k = 6 the adversary additionally forfeits 871,500 USD in mining rewards.
Ignoring eclipse attacks.
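The eclipse-attack totals can be reproduced by combining the per-block electricity cost with the hardware needed to sustain the required mining pace. This sketch assumes the figures stated in the text (15,908 USD per block, Antminer S9 rigs at 2,400 USD and 14 TH/s, December 2017 difficulty).

```python
import math

# Assumed figures from the text (December 2017).
ELECTRICITY_PER_BLOCK = 15_908   # USD of electricity per mined block
RIG_PRICE = 2_400                # USD per Antminer S9
D = 1_347_001_430_558            # mining difficulty
H = 14e12                        # hashes per second per rig

def attack_cost(k, seconds_per_block):
    """Electricity for k blocks plus hardware to mine at the given pace."""
    rigs = math.ceil((D * 2**32) / (H * seconds_per_block))
    return k * ELECTRICITY_PER_BLOCK + rigs * RIG_PRICE

print(attack_cost(6, 1.4 * 86400))  # one block per 1.4 days: ~8.3M USD
print(attack_cost(6, 3 * 3600))     # 3-hour maximum interval: ~91.9M USD
```

The second figure shows how a simple maximum-interval check on the auditor multiplies the hardware cost by more than an order of magnitude.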
To perform a split-view attack without an eclipse attack, an adversary must fork the Bitcoin blockchain, which naïvely requires control of 51% of the network’s mining power. As of December 5 2017, the total hashing power of the Bitcoin network was 11,918,845 terahashes per second. Conducting a 51% attack would therefore require the adversary to be able to compute more than 11,918,845 terahashes per second. Per hour, the total electricity cost would be 11,918,845 · 10^12 · 3600 · J · E, or, using our earlier estimates for J and E, 117,979 USD per hour. In terms of hardware costs, if we use the figures for the Antminer S9 from before, the total number of mining rigs required would be greater than 11,918,845 / 14 = 851,346. (Source for network hash rate: blockchain.info/charts/hash-rate.)
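The 51% figures follow the same arithmetic as the per-block estimates. A rough sketch, assuming the December 5 2017 network hash rate and the same Antminer S9 figures as before:

```python
# 51% attack estimate (December 5 2017 network hash rate; Antminer S9
# assumptions as in the per-block analysis).
NETWORK_RATE = 11_918_845e12   # hashes per second (11,918,845 TH/s)
J = 9.9e-11                    # joules per hash
E = 0.10 / 3.6e6               # USD per joule
H = 14e12                      # hashes per second per rig
RIG_PRICE = 2_400              # USD per rig

electricity_per_hour = NETWORK_RATE * 3600 * J * E  # ~118,000 USD/hour
rigs = NETWORK_RATE / H                             # > 851,346 rigs
hardware_cost = rigs * RIG_PRICE                    # ~2 billion USD

print(round(electricity_per_hour))
print(round(hardware_cost))
```

At these prices the hardware alone runs to roughly two billion USD, which underscores why the eclipse-based variant, despite its own multi-million-dollar cost, is the cheaper path for an adversary.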