Auditing Australian Senate Ballots
Berj Chilingirian*, Zara Perumal, Ronald L. Rivest, Grahame Bowland†, Andrew Conway‡, Philip B. Stark, Michelle Blom, Chris Culnane, and Vanessa Teague
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology. [berjc,zperumal,rivest]@mit.edu
erinaceous.io, [email protected]
Silicon Econometrics Pty. Ltd., [email protected]
Department of Statistics, University of California, Berkeley. [email protected]
Department of Computing and Information Systems, University of Melbourne. [michelle.blom,christopher.culnane,vjteague]@unimelb.edu.au
November 8, 2016
Abstract
We explain why the AEC should perform an audit of the paper Senate ballots against the published preference data files. We suggest four different post-election audit methods appropriate for Australian Senate elections. We have developed prototype code for all of them and tested it on preference data from the 2016 election.

* Authors are grouped by institution, in alphabetical order, and then listed in alphabetical order within each institution.
† Grahame Bowland is a member of the Australian Greens. His contribution to this project has consisted entirely of help in implementing the Australian Senate counting rules and facilitating Bayesian audits using his code. The techniques here are non-political.
‡ Andrew Conway is a member of the Secular Party.

1 Introduction
A vote in the Australian Senate is a list of handwritten numbers indicating preferences for candidates. Voters typically list about six preferences, but may list any number from one to more than 200. Ballots are scanned, digitized and then counted electronically using the Single Transferable Vote (STV) algorithm [Aus16].

Automating the scanning and counting of Senate votes is a good idea. However, we need to update our notion of "scrutiny" when so much of the process is electronic. We suggest that, when the preference data file for a state is published, there should be a statistical audit of a random sample of paper ballots. This should be performed in an open and transparent manner, in front of scrutineers.

Election outcomes must be accompanied by evidence that they accurately reflect the will of the voters. At the very least, the system should be Software Independent [Riv08]:

    A voting system is software independent if an undetected change or error in its software cannot cause an undetectable change or error in an election outcome.

This principle was articulated after security analyses of electronic voting machines in the USA showed that the systems were insecure [FHF06, KSRW04, BEH+08, CAt07]. The researchers found opportunities for widespread vote manipulation that could remain hidden, even from well-intentioned electoral officials who did their best to secure the systems.

Followup research in Australia has shown election software, like any other software, to be prone to errors and security problems [HT15, CBNT]. For this reason, evidence of an accurate Senate outcome needs to be derived directly from the paper ballots.

Legislation around the scrutiny of the count has not kept pace with the technology and processes deployed to perform the count. As a result, scrutineering has lost a significant portion of its value. With the adoption of a new counting process, the scrutineering procedures need to be updated to target different aspects of the system. The current approach might comply with legislation, but it doesn't give scrutineers evidence that the output is correct.

This paper suggests four different techniques for auditing the paper Senate ballots to check the accuracy of the published preference data files. The techniques vary in their assumptions, the amount of work involved, and the confidence that can be obtained. These suggestions might be useful in two contexts:

• if there is a challenge to this year's Senate outcome,
• as an AEC investigation of options for future elections.

An audit should generate evidence that the election result is accurate, or detect that there has been a problem, in time for it to be corrected. We hope that these audits become a standard part of Australian election conduct.

1.1 Q & A

• Q: Why do post-election audits?
A: To derive confidence in the accuracy of the preference data files, or to find errors in time to correct them.

• Q: What can a post-election audit tell you about the election?
A: It can tell you with some confidence that the outcome is correct, or it can tell you that the error rate is high enough to warrant a careful re-examination of all the ballots.

• Q: Can the conclusion of the audit be wrong?
A: Yes, with small probability an audit can confirm an outcome that is, in fact, wrong. It can also raise an alarm about a large error rate, even if the errors do not in fact make the outcome wrong.

• Q: Who does post-election audits now?
A: Many US states require by law, and routinely conduct, post-election audits of voter-verified paper votes when the tallies are conducted electronically. Exact regulations vary—the best examples are the Risk-Limiting Audits [BFG+12] conducted by California and Colorado.

• Q: What is needed to do a post-election audit?
A: The audit begins with the electronic list of ballots, and (usually) relies on being able to retrieve the paper ballot corresponding to a particular randomly-chosen vote in the file. There must also be time and people to retrieve the paper ballots and reconcile them with the preference data file. A video of random ballot selection is available online.

• Q: How long does it take? How many ballots must be examined?
A: It depends on the audit method, the level of confidence derived, the size of the electoral margin and the number of errors in the sample. This is described carefully below.

• Q: What is the difference between a statistical post-election audit and a recount?
A: It's not feasible to do manual recounts; a statistical post-election audit would provide a comparable way of assessing the accuracy of the outcome.
1.2 Four suggested approaches

This paper describes four suggested approaches to auditing the paper evidence of Australian Senate votes, each described in more detail in Section 2:

Section 2.1 Bayesian audits [RS12],

Section 2.2 a "negative" audit based on an upper bound on the margin,

Section 2.3 a simple scheme with a fixed sample size,

Section 2.4 a "conditional" risk-limiting audit, which tests one particular alternative election outcome.

We have prototype code available for completing any of the above kinds of audit. This would be the first time these sorts of auditing steps are being applied, and so this year's efforts would be much more "exploratory" in character than "authoritative". We hope to be able to perform two or more kinds of audits on the same samples. However, we do not even know, at the time of writing, whether any audit will happen at all.

The key objective is to provide evidence that the announced election outcome is right, or, if it is wrong, to find out early enough to correct it by careful inspection of the paper evidence.
1.3 Security analysis of the current process

This very brief security analysis of the current process is based on documents on the AEC's website [AEC16]. The objective of the system is, in principle, extremely simple: capture the vote preferences from the ballot papers, and then publish and tally them.

The current implementation results in a number of points of trust, in which the integrity of the data is not checked by humans and is dependent on the secure and error-free operation of the software. Whilst internal audit steps are useful, there are many systematic errors and security problems they would not detect. We list the three most obvious examples below.
Image Scanning
There appears to be no verification that the scanned image is an accurate representation of the paper ballot. As such, a malicious, or buggy, component could alter or reuse a scanned image, which would then be utilised for both the automatic and manual data entry. This would pass all subsequent scrutiny, whilst not being an accurate representation of the paper ballot. We understand that scrutineers can ask to see the paper ballot, but this seems very unlikely to happen if the image is clear and the preferences match.
Ballot Data Storage
Whilst a cryptographic signature is produced at the end of the scanning and processing stage, and prior to submission to the counting system, this signature is based on whatever is in the database. There is no verification that the database accurately represents what was produced by the automatic recognition or the manual operator, nor that it was the same thing displayed to scrutineers on the screen. An error, or malicious component, with access to the database could undetectably alter the contents.
Signature Checking
Automatic signature generation is a problem in the presence of a misbehaving device. There is no restriction on the device creating signatures on alternative data. Likewise, there appears to be no scrutiny over the data being sent between the scanning process and the counting process, particularly, that the sets of data are equal. There appears to be logging emanating from both services, but no clear description of how such logs will be reconciled and independently scrutinised.

In summary, there are plenty of opportunities for accidental or deliberate software problems to cause a discrepancy between the preference files and the paper votes. This is why the paper ballots should be audited when the preference files are published.

1.4 Background on audits
The audit process begins with the electronic data file that describes full preferences for all votes in a state. This file implies a reported election outcome R, which is a set of winning candidates which we assume to be properly computed from the preferences in the data file. (Actually we don't have to assume—we can check by rerunning the electronic count.) Each line in the data file is a reported vote—we denote them r_1, ..., r_n, where n is the total number of voters in the state. Each reported vote r_i (including blank or informal ones) corresponds to an actual vote a_i expressed on paper, which can be retrieved to check whether it matches r_i. The whole collection of actual votes implies an actual election outcome A. We want to know whether A = R.

The audit proceeds by retrieving and inspecting a random sample of paper ballots. A comparison audit chooses random votes from the electronic data file and compares each one with its corresponding paper ballot. The auditor records discrepancies between the paper and electronic votes. A ballot polling audit chooses paper ballots at random and records the votes, without using the electronic vote data.

Although the security of paper ballot processing is important, it's independent of the audit we describe here. An audit checks whether the electronic result accurately reflects the paper evidence. Of course if the paper evidence wasn't properly secured, that won't be detected by this process. Our definition of "correct" is "matching the retained paper votes."

An election audit is an attempt to test the hypothesis "that the reported election outcome is incorrect," that is, that R ≠ A. There are two kinds of wrong answer: an audit may declare that the official election outcome is correct when in fact it is wrong, or it may declare that the official outcome is wrong when in fact it is correct. The latter problem is easily solved in simpler contexts by never declaring an election outcome wrong, but instead declaring that a full manual recount is required. The first problem, of mistakenly declaring an election outcome correct when it is not, is the main concern of this paper.

An audit is risk-limiting [LS12] if it guarantees an upper bound on the probability of mistakenly declaring a wrong outcome correct. A full manual recount is risk-limiting, but prohibitively expensive in our setting. None of the audits suggested in this paper is proven to be risk-limiting; however, all of them provide some way of estimating the rate of errors and hence the likelihood that the announced outcome is wrong. In some cases, the audit may not say conclusively whether the error rate is large enough to call the election result into question. In others, we can derive some confidence either that the announced outcome is correct or that a manual inspection of all ballots is warranted.
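To make the mechanics concrete, here is a minimal Python sketch of the comparison audit described above. The helper fetch_paper_ballot stands in for physically retrieving and hand-interpreting a ballot, and the representation of votes is left abstract; both are illustrative assumptions, not part of any AEC system.

```python
import random

def comparison_audit(reported_votes, fetch_paper_ballot, sample_size, seed=20161108):
    """Ballot-level comparison audit: draw a random sample of reported votes
    r_i from the preference data file and record discrepancies against the
    corresponding actual paper votes a_i."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    indices = rng.sample(range(len(reported_votes)), sample_size)
    discrepancies = []
    for i in indices:
        actual = fetch_paper_ballot(i)  # a human retrieves and interprets ballot i
        if actual != reported_votes[i]:
            discrepancies.append((i, reported_votes[i], actual))
    return discrepancies
```

The audit methods below differ mainly in how a list of discrepancies like this one is turned into a statistical statement about the announced outcome.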
Election auditing is well understood for US-style first-past-the-post elections but difficult for complex voting schemes. The Australian Senate uses the Single Transferable Vote (STV). There are many characteristics that make auditing challenging:

• It is hard to compute how many votes it takes to change the outcome. Calculating winning margins for STV is NP-hard in general [Xia12], and the parameters of Australian elections (sometimes more than 150 candidates) make exact solutions infeasible in practice. There are not even efficient methods for reliably computing good bounds.

• A full hand count is infeasible, since there are sometimes millions of votes in one constituency.

• In practice the margins can sometimes be remarkably small.
For example, in Western Australia in 2013 a single lost box of ballots was found to be enough to change the election outcome. In Tasmania in 2016 there were more than 300,000 votes, but the final seat was determined by a difference of 141 votes (meaning errors in the interpretation of 71 ballots might have altered the outcome). This makes it difficult to use existing post-election auditing methods.

To get an idea of the fiendish complexity of Australian Senate outcomes, consider the case of the last seat allocated to the State of Victoria in 2013. Ricky Muir from the Australian Motoring Enthusiasts Party won the seat, in a surprise result that ousted sitting Senator Helen Kroger of the Liberal Party. In the last elimination round (round 291), Muir had 51,758 more votes than Kroger, and this was generally reported in the media as the amount by which he won. However, the true margin was less than 3000 (about 0.1%). If Kroger had persuaded 1294 of her voters, and 1301 of Janet Rice (Greens)'s voters, to vote instead for Joe Zammit (Australian Fishing and Lifestyle Party), this would have prevented Zammit from being excluded in count 224. Muir, deprived of Zammit's preferences, would have been excluded in the next count, and Kroger would have won. (Our algorithm for searching for these small margins is described in the full version of this paper.)

This change could be made by altering 2595 ballots, in each case swapping two preferences, none of them first preferences, all below the line. First preferences are relatively well scrutinised in pollsite processes before dispatch to the central counting station. Other preferences are not. Also, lowering a particular candidate's preference wouldn't usually be expected to help that candidate (though we are not the first to notice STV's non-monotonicity). So the outcome could have been changed by swapping poorly-scrutinised preferences, half of which seemed to disadvantage the candidate they actually helped, in far fewer ballots than generally expected.
2 Proposed audit methods

This section describes four different proposals and compares them according to the degree of confidence derived, the amount of auditing required, and other assumptions they need to make. We have already implemented prototype software for running Bayesian audits (Section 2.1) and computing upper bounds on the winning margin (Section 2.2). We have tested the code on the AEC's full preference data from some states in the 2016 election—results are described briefly below.
2.1 Bayesian audits

Rivest and Shen's "Bayesian audit" [RS12] evaluates the accuracy of an announced election outcome without needing to know the electoral margin. It samples from the posterior distribution over profiles of cast ballots, given a prior and given a sample of the cast paper ballots (interpreted by hand). It only looks at a sample of the cast paper ballots—it does not compare the sampled paper ballots with an electronic interpretation of them.

A profile is a set of ballots. The auditor doesn't know the profile of cast (paper) ballots, and so he works with a probability distribution p over possible such profiles, which summarises everything the auditor believes about what the profile of cast ballots may be.

The Bayesian audit proceeds in stages. Successive stages consider increasingly larger samples of the cast ballots. Each stage of the Bayesian audit provides an answer to the question "what is the probability of various election outcomes (including the announced outcome), if we were to examine the complete profile of all cast ballots?" This question is answered by simulating elections on profiles chosen according to the posterior distribution based on p, and measuring the frequency of each outcome.

Each audit stage has three phases:

1. audit some randomly chosen paper ballots (that is, obtain their interpretations by a human),
2. update p using Bayes' Rule,
3. sample from the posterior distribution on profiles determined by p and determine the election outcome for each; measure the frequency of different outcomes.

Like any process that uses Bayes' Rule, choosing a prior is a key part of the initialization. The suggestion in [RS12] is to allow any political partisan to choose the prior that most supports their political beliefs. When everyone (who uses Bayes' Rule properly) is satisfied that the evidence points to the accuracy of the announced result, the audit can stop. For example, the auditors could agree to stop when 95% of simulated election outcomes match the reported outcome.

In the Australian Senate case, we assume that there will be only one apolitical auditing team (though in future candidate-appointed scrutineers could do the calculations themselves). Hence we suggest a prior that is neutral—if the announced outcome is correct, this probability distribution will be gradually corrected towards it.

An alternative, simpler version amounts to a bootstrap, treating the population of reported ballots as if it is the (prior) probability distribution of ballots, and then seeing how often one gets the same result for samples drawn from that prior. This gives an approximate indication of how much auditing of paper ballots would be necessary, assuming that the paper ballots were very similar to the electronic votes. We have run this version of the audit on the Senate outcome from 2016. Table 1 shows the number of samples needed in the bootstrapping version, in order to get 95% of trials to match the official outcome. Tasmania is the closest, and the only one that's really infeasible: a sample size of about 250,000 ballots is needed before 95% of trials produce the official outcome, which is not much better than a complete re-examination of all ballots. This is hardly surprising given the closeness of the result. Queensland requires 23,000, which is still only a tiny fraction of the total ballots.
Apart from that, all the other states require only a few thousand samples.

State   Number of votes (millions)   Audit sample size (thousands)
NSW     4.4                          4.6
NT      0.1                          1.5
Qld     2.7                          23
SA      1.1                          3
Tas     0.34                         250
Vic     3.5                          6
WA      1.4                          9

Table 1: Sample sizes for 95% agreement in bootstrap Bayesian audit.

We suggest a combination of the bootstrapping method with the retrieval of paper ballots: have a single short partial ballot in favor of each candidate, combined with an empirical Bayes approach that specifies that only ballots of the forms already seen in the sample (or the short singleton ballots) may appear in the posterior distribution.

Although these audits were designed for complex elections, there are significant challenges to adapting them to the Australian Senate. Running the simulations efficiently is challenging when the count itself takes some time to run. Answers to these challenges are described in the full version of the paper.
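As an illustration, the bootstrapping version can be sketched in a few lines of Python. The function stv_winners, which stands in for a full Senate STV count (for instance, the counting code used by our prototypes), is an assumption of this sketch rather than a fixed API.

```python
import random

def bootstrap_agreement(reported_ballots, official_winners, stv_winners,
                        sample_size, trials=100, seed=1):
    """Fraction of trials in which an STV count on a random sample of the
    reported ballots reproduces the official winners.  Treats the reported
    ballots as the (prior) distribution, per the bootstrap variant above."""
    rng = random.Random(seed)
    matches = 0
    for _ in range(trials):
        # Draw sample_size ballots with replacement from the reported ballots.
        sample = rng.choices(reported_ballots, k=sample_size)
        if stv_winners(sample) == official_winners:
            matches += 1
    return matches / trials
```

Increasing sample_size until the returned fraction reaches 0.95 yields figures like those in Table 1.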
2.2 An upper bound on the margin: a "negative" audit

We have implemented some efficient heuristics for searching for ways to change the election outcome by altering only a small number of votes—the code is available at https://github.com/SiliconEconometrics/PublicService. The Kroger/Muir margin described in the Introduction is an example. We can guarantee that the solution we find is genuine, i.e. a true way to change the outcome with that number of ballots, but we can't guarantee that it is minimal—there might be an even smaller margin that remains unknown. The algorithm produces a list of alternative outcomes together with an upper bound on the number of votes that need to change to produce them.

If the error rate is demonstrably higher than this upper bound on the margin, then we can be confident it is large enough to change the election result. Of course, it does not follow that the election result is wrong, especially if the errors are random rather than systematic or malicious. It means that all the paper evidence must be inspected. This allows a "negative audit," which can allow us to infer with high confidence that the number of errors is high enough.

Suppose there are N ballots in all. Suppose we know that the outcome could be altered by altering no more than X ballots in all, provided those ballots were suitably chosen. Suppose we think the true ballot error rate p (ballots with errors divided by total ballots, no matter how many errors each ballot has) is q, with qN ≫ X; that is, we think the error rate is large enough that the outcome could easily be wrong. Then a modest sample of size n should let us infer with high confidence that pN > X.

For example, consider the 2016 Tasmanian Senate result, in which the final margin was 71 out of 339,159 votes (a difference of 141 votes). We can compute the confidence bounds based on a binomial distribution. A lower 95% confidence bound for p if we find 3 ballots with errors in a sample of size 2500 is about 0.0003. That's much greater than the error rate of 71/339,159 ≈ 0.0002. Code for computing these bounds is at https://gist.github.com/pbstark/58653bbc26f269d4588ea7cd5b2e12bf.

2.3 A simple scheme with a fixed sample size

A much simpler alternative is to take a fixed sample size of paper ballots (e.g. 0.1% of the cast ballots), draw that many ballots at random and examine them all. This conveniently puts a "cap" on the number of randomly-chosen paper ballots to be examined, but the audit results may provide less certainty than an uncapped audit would provide.
Assume now that the aim is to try to find confidence that the election outcome is correct. This audit could quantify the confidence in that assertion, by computing binomial upper confidence bounds on the overall error rate. The idea is to find the p-value (or confidence level) that the sample you actually have gives you that the outcome is right.

Even an error rate of 0.0002, i.e., two ballots with errors per 10,000 ballots, could have changed the electoral result in Tasmania, depending on the exact nature of those errors. The sample size required to show that the error rate is below that threshold—if it is indeed below that threshold—is prohibitively large. If we take a sample of 1,000 ballots and we find no errors that affect the 71 margin, the measured risk is the chance of seeing no errors if the true error rate is 0.0002, i.e., (1 − 0.0002)^1000 ≈ 82%. If we took a sample of 2,000, the measured risk would be (1 − 0.0002)^2000 ≈ 67%.

However, this method might be quite informative for other contests. Manual inspection of a sample of 1,000 ballots could give 99% confidence that the error rate is below 0.0046 (46 ballots with errors per 10,000 ballots), if the inspection finds no errors at all. If it finds one ballot with an error, there would be 99% confidence that the error rate is below about 0.0066 (66 ballots with errors per 10,000 ballots).

Similarly, manual inspection of a sample of 500 ballots could give 99% confidence that the error rate is below 0.0092 (92 ballots with errors per 10,000 ballots), if the inspection finds no errors at all. If it finds one ballot with an error, there would be 99% confidence that the error rate is below about 0.0132 (132 ballots with errors per 10,000 ballots).

If more errors are found, this gives a way to estimate the error rate. If it is large, this would give a strong argument for larger audits in the future.
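The figures above can be reproduced with standard one-sided (Clopper-Pearson) binomial confidence bounds. The following sketch uses scipy; it matches the numbers quoted in Sections 2.2 and 2.3, though it is not necessarily the exact computation in the gist cited earlier.

```python
from scipy.stats import beta

def lower_bound(errors, n, conf=0.95):
    """One-sided lower confidence bound on the ballot error rate p
    after observing `errors` ballots with errors in a sample of n."""
    return 0.0 if errors == 0 else beta.ppf(1 - conf, errors, n - errors + 1)

def upper_bound(errors, n, conf=0.99):
    """One-sided upper confidence bound on the ballot error rate p."""
    return 1.0 if errors == n else beta.ppf(conf, errors + 1, n - errors)

print(lower_bound(3, 2500))   # ~0.0003: the Section 2.2 "negative audit" example
print(upper_bound(0, 1000))   # ~0.0046: no errors in 1,000 ballots
print(upper_bound(1, 1000))   # ~0.0066: one error in 1,000 ballots
print(upper_bound(0, 500))    # ~0.0092: no errors in 500 ballots
print(upper_bound(1, 500))    # ~0.0132: one error in 500 ballots
```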
We can also derive some partial confidence measures from the given sample. For example, you could list, for each candidate, the percentage of the time that candidate was elected across the Bayesian experiments. (Each experiment starts with a small urn filled with the 14,000 ballots, plus perhaps some prior ballots, and expands it out to a full-sized profile of 14M ballots with a Pólya urn method or equivalent. This is for a nationwide election; for the Senate the full-size profiles are the size of each Senate district.) Depending on the computation time involved, we might run say 100 such experiments. So, you might have a final output that says:

Joe Jones    99.1%
Bob Smith    96.2%
Lila Bean    82.1%
...
Rob Meek      2.1%
Sandy Slip    0.4%
Sara Tune     0.0%

Such results are meaningful at a human level, and show what can be reasonably concluded from the small sample. This allows us to have a commitment to a given level of audit effort, rather than a commitment to a given level of audit assurance, and then give results that say something about the assurance obtained for that level of effort.
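Here is a minimal sketch of how such a table could be produced, assuming once more a stand-in stv_winners function for the full count; polya_expand implements the urn expansion described in the parenthetical above.

```python
import random
from collections import Counter

def polya_expand(audited_sample, target_size, rng):
    """Grow the audited sample to a full-size profile: each new ballot is a
    copy of one drawn uniformly from the urn so far (Polya urn scheme)."""
    urn = list(audited_sample)
    while len(urn) < target_size:
        urn.append(rng.choice(urn))
    return urn

def election_rates(audited_sample, target_size, stv_winners, trials=100, seed=1):
    """Percentage of simulated full-size profiles in which each candidate wins."""
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(trials):
        wins.update(stv_winners(polya_expand(audited_sample, target_size, rng)))
    return {cand: 100.0 * wins[cand] / trials for cand in wins}
```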
2.4 A conditional risk-limiting audit

Back to the Tasmanian 2016 example again. One way to examine the issue is to consider the particular, most obvious, alternative hypothesis, i.e. that the correct election result differs only in changing the final tallies of the last two candidates. If we assume that all the other, earlier, elimination and seating orders are correct, we can conduct a risk-limiting audit that tests only for the one particular alternative hypothesis. (Of course, it isn't truly risk-limiting because it doesn't limit the risks of other hypotheses.) This may be relevant in a legal context in which a challenging candidate asserts a particular alternative. This method would provide evidence that the error rate is small enough to preclude that alternative (if indeed it is), without considering other alternatives.

This can be run as a ballot-level comparison audit, in which the electronic ballot record is directly compared with its paper source. When an error is detected, its impact on the final margin can be quantified (a computationally infeasible problem when considering all possible alternative outcomes). A risk-limiting audit could be based on the Kaplan-Markov method from [Sta08]. It allows the sample to continue to expand if errors are found: that is, it involves sequential testing. At a 1% risk limit, the method requires an initial sample size of about (10/margin), where the margin is expressed as a fraction of the total ballots cast. Here, that's about 0.0002. A risk limit of 5% would require hand inspection of roughly 16,000 ballots, assuming no errors were found.
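As a rough illustration of the arithmetic, the sketch below computes the sample size at which a clean sample satisfies a given risk limit, under the simplification that a with-replacement sample of n ballots misses every one of a fraction m of decisive misstated ballots with probability (1 − m)^n. This ignores the error padding of the full Kaplan-Markov method in [Sta08], so it gives lower-end estimates rather than the (10/margin) rule quoted above.

```python
import math

def clean_sample_size(diluted_margin, risk_limit):
    """Smallest n with (1 - margin)**n <= risk_limit: the sample size at
    which finding no errors would satisfy the risk limit, under the
    simplified bound described above."""
    return math.ceil(math.log(risk_limit) / math.log(1.0 - diluted_margin))

# Tasmania 2016: margin of roughly 0.0002 of the ballots cast.
print(clean_sample_size(0.0002, 0.05))  # ~15,000, near the 16,000 quoted above
print(clean_sample_size(0.0002, 0.01))  # ~23,000
```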
3 Conducting the audits

These four different audit methods could each be conducted on the same dataset. We would generate the sample by choosing random elements of the official preference data file, then fetching the corresponding paper ballot (a minimal sketch of seeded ballot selection appears at the end of this section). The Bayesian audit and the simple capped scheme would then simply treat the paper ballots as the random sample. The upper-bounds based scheme and the conditional risk-limiting audit would consider the errors relative to what had been reported.

There are important details in exactly how the audit is conducted. We suggest that the auditors not see the electronic vote before they are asked to digitize the paper—otherwise they are likely to be biased to agree. However, we also suggest that they are notified in the case of a discrepancy and asked to double-check their result—this should increase the accuracy of the audit itself. Details of this process are interesting future work. It is, of course, important that the audit itself should be software independent.

If the rate of error is high then a high level of auditing is required. With few or no errors, our best estimates of the necessary sample size for each technique applied to the Tasmanian 2016 Senate are:

• for Bayesian audits, about 250,000 samples until 95% of trials match the official outcome,
• for "negative" audits, a sample that found 3 or more errors out of 2500 ballots would give a 95% confidence bound on the error rate (being big enough),
• a fixed sample size of 500 or 1000, even with no errors, seems unlikely to be large enough to infer anything meaningful for Tasmania 2016, though it may be useful for other contexts,
• a conditional risk-limiting audit would require about 16,000 ballots for a risk limit of 5%, assuming no errors were found.

Most other states would probably be easier to audit as they do not seem to be as close.

All the tools necessary for conducting a Bayesian audit of Australian Senate votes are available as a Python package at https://pypi.python.org/pypi/aus-senate-audit, with code and instructions at https://github.com/berjc/aus-senate-audit. Code for searching for small successful manipulations is at https://github.com/SiliconEconometrics/PublicService.
Code for computing relevant statistical bounds is at https://gist.github.com/pbstark/58653bbc26f269d4588ea7cd5b2e12bf.
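For the sample-generation step described above, the choice of which reported votes to audit should be reproducible by anyone. A minimal sketch, assuming the seed would in practice be generated publicly (for example by dice rolls in front of scrutineers):

```python
import random

def select_ballots_to_audit(num_ballots, sample_size, public_seed):
    """Choose which lines of the preference data file to audit.  Anyone
    with the published seed can recompute the same selection."""
    rng = random.Random(public_seed)
    return sorted(rng.sample(range(num_ballots), sample_size))

# e.g. 2,500 of the 339,159 Tasmanian ballots; the seed string is illustrative
print(select_ballots_to_audit(339159, 2500, "published-dice-roll")[:10])
```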
4 Conclusion

Elections must come with evidence that the results are correct. This work contributes some techniques for producing such evidence for the partly-automated Australian Senate count. All of the audits discussed here can be conducted immediately, using code already available or specifically produced as a prototype for this project.
In the future we could expand the precision with which we record errors and make inferences about their implications. We are also pursuing an easier user interface for administering the audit.

References

[AEC16] Australian Electoral Commission, 2016.

[BEH+08] Kevin R. B. Butler, William Enck, Harri Hursti, Stephen E. McLaughlin, Patrick Traynor, and Patrick McDaniel. Systemic issues in the Hart InterCivic and Premier voting systems: Reflections on Project EVEREST. EVT, 8:1–14, 2008.

[BFG+12] Jennie Bretschneider, Sean Flaherty, Susannah Goodman, Mark Halvorson, Roger Johnston, Mark Lindeman, Ronald L. Rivest, Pam Smith, and Philip B. Stark. Risk-limiting post-election audits: Why and how, 2012.

[CAt07] California Top-to-Bottom Review of voting, 2007.

[CBNT] Andrew Conway, Michelle Blom, Lee Naish, and Vanessa Teague. An analysis of New South Wales electronic vote counting. https://siliconeconometrics.github.io/PublicService/CountVotes/NSWLGE2012MillionRun

[FHF06] Ariel J. Feldman, J. Alex Halderman, and Edward W. Felten. Security analysis of the Diebold AccuVote-TS voting machine. 2006.

[HT15] J. Alex Halderman and Vanessa Teague. The New South Wales iVote system: Security failures and verification flaws in a live online election. In International Conference on E-Voting and Identity, pages 35–53. Springer, 2015.

[KSRW04] Tadayoshi Kohno, Adam Stubblefield, Aviel D. Rubin, and Dan S. Wallach. Analysis of an electronic voting system. In Security and Privacy, 2004 IEEE Symposium on, pages 27–40. IEEE, 2004.

[LS12] M. Lindeman and P. B. Stark. A gentle introduction to risk-limiting audits. IEEE Security and Privacy, 10:42–49, 2012.

[Riv08] Ronald L. Rivest. On the notion of 'software independence' in voting systems. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 366(1881):3759–3767, 2008.

[RS12] Ronald L. Rivest and Emily Shen. A Bayesian method for auditing elections. In EVT/WOTE, 2012.

[Sta08] P. B. Stark. Conservative statistical post-election audits. Annals of Applied Statistics, 2008.

[Xia12] L. Xia. Computing the margin of victory for various voting rules. In Proceedings of the ACM Conference on Electronic Commerce, 2012.