
Automated Analysis of Election Audit Logs

Patrick Baxter
Clemson University

Anne Edmundson
Cornell University

Keishla Ortiz
University of Puerto Rico-Arecibo

Ana Maria Quevedo
Miami Dade College

Samuel Rodríguez
University of Puerto Rico-Mayagüez

Cynthia Sturton
University of California-Berkeley

David Wagner
University of California-Berkeley

Abstract

The voting audit logs produced by electronic voting systems contain data that could be useful for uncovering procedural errors and election anomalies, but they are currently unwieldy and difficult for election officials to use in post-election audits. In this work, we develop new methods to analyze these audit logs for the detection of both procedural errors and system deficiencies. Our methods can be used to detect votes that were not included in the final tally, machines that may have experienced hardware problems during the election, and polling locations that exhibited long lines. We tested our analyses on data from the South Carolina 2010 elections and were able to uncover, solely through the analysis of audit logs, a variety of problems, including vote miscounts. We created a public web application that applies these methods to uploaded audit logs and generates useful feedback on any detected issues.

1 Introduction

Election officials are tasked with the difficult job of ensuring fair and smooth elections. It is their responsibility to ensure that ballots are cast, collected, and tallied properly and that every registered voter who comes to a polling place to vote is given the opportunity to do so. This requires that every polling location is staffed with well-trained poll workers and provisioned with enough ballots and balloting stations to accommodate all voters. A number of election day events can thwart even the best efforts on the part of election officials; surges of voters all arriving to vote at the same time, malfunctioning machines, and poll worker errors are a few. Information about election day events can help officials, and researchers who study elections, better understand what worked and what did not work and better prepare for the next election.

In the November 2010 U.S. elections, 33% of registered voters were using Direct Recording Electronic (DRE) voting machines [22]. Federal standards require that electronic voting machines generate detailed audit logs for use during post-election audits. Unfortunately, while the logs contain large amounts of data, it is not immediately obvious what sort of useful information can be learned from the data. Furthermore, even simple tallies are cumbersome, time consuming, and prone to human error if done manually. For these reasons, election officials do not regularly perform countywide post-election analysis of the log data.

However, log data contain a trove of information that can shed light on what takes place at the polling place on election day. For example, election officials can use the information to learn about voting machines that may need maintenance, or ways that poll workers and other resources may be better allocated.

In this work, we aim to make DRE audit log analysis more useful and accessible to election officials and other interested parties. We develop new methods to analyze audit logs for the detection of both procedural errors and system deficiencies. We created AuditBear, a public web application that provides our fully automated analyses as a free service for use by election officials or interested third parties.

A strength of our tool is that, even in the face of missing data, it is still able to extract useful information. Our research contributes to the election audit process in the following ways.

• We introduce methods for identifying, solely from publicly available audit logs, potential errors in the software, hardware, and system configuration of DREs.

• We introduce methods for identifying instances of human error by poll workers. Furthermore, we differentiate between random errors and patterns of error that suggest shortcomings in the training election workers receive.


• We introduce a new method for conducting a statistical analysis of voter flow. This allows for improved resource allocation in future elections.

• We conduct a case study using our methods and identify instances of poor worker training, long voting lines, and missing votes during the 2010 South Carolina election.

• Using our experience with the case study, we suggest new content that, if included in election log files, would allow for additional useful analysis.

We implement these methods for the ES&S iVotronic DRE; the 2010 South Carolina data was already publicly available through a previous Freedom of Information Act request, and the iVotronic was used in that election. The iVotronic system is a standalone, portable, touchscreen system that records vote totals, ballot images, and an event log on internal flash memory. The iVotronic voting machine is one of the most widely deployed DREs in the U.S. In 2010, 422 jurisdictions tallying more than 22 million registered voters used this system [20]. In addition, the types of analyses we identify and our algorithms for analysis are applicable to other DRE voting systems that produce the necessary audit logs.

In this work we assume that DRE audit logs are complete, accurate, trustworthy, and free of accidental or malicious tampering. Many studies have examined the various security weaknesses of DRE machines. DREs have been found to be susceptible to poor software engineering practices leading directly to exploitable vulnerabilities [16, 7], insider attacks [5], viruses that spread between voting machines and the Election Management System server [8, 14], and return-oriented programming exploits [9]. In all these cases, the demonstrated payload is usually a vote-stealing or vote-altering attack, and often the associated logs and counters are modified to remove any traces of the attack. The iVotronic may be susceptible to many security exploits, and our tool does not aim to detect or prevent these; detecting and preventing audit log tampering is outside the scope of this work. Rather than making DREs more secure, our tool aims to make them more reliable. While it is crucial to discover machines that experience malicious tampering, finding a number of uncounted votes or learning where voters may be discouraged from voting by long lines is important too. While many states are moving away from the use of DREs precisely because of their security failings, DREs are still in widespread use, and our tool provides easy-to-apply techniques that make results and future elections more reliable.

2 Background

2.1 Introduction to the iVotronic DRE

A brief description of the iVotronic's functionality and its main system components follows.

Voting terminal. The voting terminal is a stand-alone touchscreen voting unit. It is equipped with an internal battery, which keeps the unit operational in the event of a power failure. The terminal features three internal flash memories, which store the votes and terminal audit data during the voting process; three are used for redundancy, as all three store the same data. A removable compact flash (CF) card is installed in the back of the terminal prior to deployment to the precinct; this is used to store audit data and ballot images (cast vote records). Typically, each polling location is assigned several iVotronic machines as well as one audio (ADA) terminal.

Personalized Electronic Ballot (PEB). The PEB is a proprietary cartridge required to operate the iVotronic terminal. The voting terminal is delivered to the precinct with no ballot style information; that is later supplied by the PEB. The PEBs are programmed at election central with the ballot data for each voting location. At the opening of the polls, poll workers download ballot style information from the PEB's internal flash memory to the terminal. When the PEB is placed in the machine, the terminal and the PEB communicate through an infrared port. Typically, counties deploy two types of PEBs to the precinct: a Master PEB and an Activator PEB. They are interchangeable, but poll workers are trained to keep them separate and use them for different purposes.

• Master PEB. Poll workers use the Master PEB to open and close all terminals on election day. The same Master PEB should be used to open all terminals in the polling location. In the same fashion, the Master PEB should be used to close all terminals in the polling location at the end of the voting day. When a terminal is closed, it uploads its vote totals onto the PEB inserted into it. The Master PEB accumulates the precinct totals so that they can later be transported to the tabulation center, where the vote totals are uploaded (through PEB readers) and included in the official tally.


• Activator PEBs. Activator PEBs are used by poll workers to activate ballots for voters. Each voter's session with the voting terminal starts with a poll worker inserting an Activator PEB into the terminal. Election officials provide each precinct with multiple Activator PEBs.

Internally, all PEBs at the precinct are identical. The only difference between them is the color of the rubber band on their exterior. Thus, a Master PEB can be used to activate a voter's ballot and an Activator PEB can be used to open and close terminals; though, as a matter of procedure and training, they should not be used this way. Poll workers are trained to put each precinct's Master PEB, CF cards, and totals tape in a designated bag that is transported to Election Central after polls close; Activator PEBs may be left behind. Thus, if an Activator PEB is used to close terminals, its vote data may not be uploaded to the aggregated totals on election night.

Removable Compact Flash (CF) card. The CF cards are programmed at Election Central and installed in the back of the voting terminal prior to deployment at the polling location. The CF cards contain graphic (bitmap) files read by the voting terminal during the voting process. The audio files required for the ADA terminals are also stored on the CF cards. The CF cards are also used as an external memory device, as the terminal's event log and ballot images are written to the CF card when the terminal is closed for voting. Once the polls close, the CF cards are removed from the back of the terminal and delivered to election headquarters on election night. The CF cards are uploaded to the Election Management System during the canvassing process. Election officials generate the election's ballot image and event log databases from each precinct's set of CF cards. For the remainder of this paper, any reference to the iVotronic logs will refer to the aggregated data from all precincts in a single county.

2.2 iVotronic Audit Data

The ES&S voting solution produces many log files, but in our analysis we focus on three: the event log (EL152.lst), the ballot image file (EL155.lst), and the system log (EL68a.lst). Other files produced by the iVotronic are the Unofficial Precinct Report-Group Details (EL30a.lst), the Official Precinct Report (EL30.lst), the Unofficial Summary Report-Group Details (EL45a.lst), the Official Summary Report (EL45.lst), and the Manual Changes Log Listing (EL68.lst). We did not use these for two primary reasons: first, unofficial data detracts from the validity of our analyses; second, these logs were not available for the majority of counties. There is limited documentation about the iVotronic logs; therefore there may be additional files that have not been released to the public.

The event log (EL152.lst) contains audit log entries from each iVotronic terminal used in the election. The log records, in chronological order, all events that occurred on that machine during the election. It typically begins at election headquarters, before the election, with a "clear and test" of the terminal to delete previous election data from the terminal's memory. It also records all election day events, including polls open, polls closing, and the number of ballots cast. Each event log entry includes the iVotronic's terminal serial number, the PEB's serial number, the date and time, the event that occurred, and a description of the event.
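To make the discussion concrete, the following is a minimal Python sketch of an event log parser. The exact column layout of EL152.lst is not documented here, so the regular expression and field names below are assumptions for illustration only.

    import re
    from collections import namedtuple
    from datetime import datetime

    # Hypothetical record layout; the real EL152.lst format must be
    # confirmed against actual log files before use.
    Event = namedtuple("Event", "terminal peb timestamp code description")

    LINE_RE = re.compile(
        r"^\s*(?P<terminal>\d+)\s+(?P<peb>\d+)\s+"
        r"(?P<date>\d{2}/\d{2}/\d{4})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
        r"(?P<code>\d+)\s+(?P<desc>.*)$")

    def parse_event_log(lines):
        """Yield an Event for each line matching the assumed layout."""
        for line in lines:
            m = LINE_RE.match(line)
            if m:
                ts = datetime.strptime(m["date"] + " " + m["time"],
                                       "%m/%d/%Y %H:%M:%S")
                yield Event(m["terminal"], m["peb"], ts, m["code"],
                            m["desc"].strip())

The sketches in Section 3.2 consume simple structures like this one.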

The ballot image file (EL155.lst) contains all ballot images saved by the iVotronic terminals during the voting process. An ES&S ballot image is a bit of a misnomer and might more rightfully be called a cast vote record: it is a list of all choices made for each vote cast; it is not a scanned or photographic image. However, we stick with the ES&S terminology and refer to each cast vote record as a ballot image. The ballot images are segregated by precinct and terminal where the votes were cast. The ballots are saved in a random order to protect the privacy of the voter.

The system log listing file (EL68a.lst) chronologically tracks activity in the election reporting database at the election headquarters. Its entries reflect the commands executed by the operators during pre-election testing, election night reporting, post-election testing, and canvassing. It also contains the totals accumulated in the various precincts during election night reporting, as well as any warnings or errors reported by the software during the tabulation process. The system log also tracks the uploading of PEBs and CF cards to the election-reporting database. Manual adjustments of precinct totals are also recorded in the system log file.

2.3 Voter Privacy

Our tool depends on the audit logs being publicly available. It is important that neither the logs themselves nor our analyses endanger voter privacy.

None of the logs we use in our analyses contain personally identifiable information. The event log contains the times and machines on which votes were cast, but contains no voter information and no form of a ballot image. It simply states that a vote was cast on machine X at time Y. The ballot image file is a record of all cast ballots. Each ballot image depicts the choices a voter made, which machine they used, and which precinct they voted in. In order to protect voter privacy, the system records the ballot images in a random order. As long as no machine was used by only a single voter, a ballot image cannot be traced back to a particular entry in the event log. Because the event log does not contain any cast vote records and the ballot images are randomly ordered, and assuming no machine was used by only a single voter, there is no way to link the two log files to determine how a voter voted. Public information in combination with these logs does not affect voter privacy either; even if a spectator views the order of voters, no voter can be linked to his or her respective ballot image.

The system log only reflects operator actions and vote totals, and therefore has no foreseeable impact on voter privacy.

2.4 Other DRE Audit Logs

Our tool focuses on the iVotronic system, but it can be applicable to other systems, provided their logs contain the information described in Section 3, as well as a way to collect the logs at a central location in an electronic format. Other machines that we considered, based on their popularity, were the ES&S (formerly Premier Election Solutions) AccuVote-TSX, Sequoia AVC Edge, and Hart eSlate. Both the AccuVote-TSX and AVC Edge lack the ability to export logs to a central location for analysis. Additionally, it is not clear from available documentation that these two machines support file formats that are suitable to our tool. The eSlate DRE machine is more conducive to a tool such as ours. This machine allows for the automatic collection of audit logs to a central location. It also contains the most complete logging of the three systems; it is more likely that the eSlate contains all of the information necessary to run our analyses. However, it is known that the eSlate contains a weakness in the protection of voter privacy: the audit log includes information that could connect voters to their votes [24]. These are not fundamental limitations of the idea of log analysis, but rather shortcomings of existing systems, so we scoped our work to reflect this. Suggestions for making logging systems more amenable to this type of analysis are discussed in Section 6.

3 Analysis

Our system is structured as a set of analyses, each one designed to shed light on one particular aspect of election-day and post-election-day activities. In this section we present a description of our analyses.

3.1 Analyses of Interest

We focus on analyses that we expect to be most useful to election officials or interested third parties. First, since incorrectly set internal clocks on the DREs make analysis of the audit log data more difficult and less reliable, we identify erroneous time stamps in the audit logs. Incorrectly set clocks may also prevent a voting machine from starting up on time [23].

Second, since vote-counting is fundamental to elections, we use the audit logs to detect instances of cast votes being under-counted. We do this by looking for discrepancies between the different log files that indicate some votes have been left out of the final tally.

Third, we use audit logs to identify incidents of lines at polling locations. Long lines at the polling place can negatively affect the fairness of elections. There is a positive correlation between line length and the likelihood of a voter reneging – that is, leaving the polling location without voting [19]. A study conducted by the Voting Rights Institute of the Democratic National Committee found that as many as 3% of voters in the 2004 general election in Ohio reneged [10]. A field study conducted during the 2008 presidential primary in California observed close to 2% of voters leaving the polling location without voting when there were lines [19]. In addition to reneging, voters may be deterred from coming to the polls at all if they expect long lines, which is known as balking. A case study of the 2004 presidential election estimates that between 4% and 4.9% of voters in Franklin County, Ohio may have balked for fear of encountering long lines [3]. Finally, incidents of long lines do not occur with equal probability at all precincts. Locations in lower socioeconomic neighborhoods have a higher chance of experiencing long lines on election day [19, 10]. For these reasons, understanding where and how often long lines occur is important for gauging the success and fairness of an election and can help election officials better allocate resources at the next election.

To this end, we conduct two analyses. In the first, we find all locations that were open past the official poll closing time and use this as a proxy for the existence of long lines at the end of the day. Election officials might also like to know when lines occurred throughout the day, or whether there were lines earlier in the day that had disappeared by closing time. We perform this second analysis for the subset of locations that stayed open late: for those locations, we are able to show, through analysis of the audit logs, at what other times of the day they likely had long lines. In Section 6 we suggest simple improvements to the log data that would make our long lines analysis possible for all precincts, not just those that stayed open late.


In our fourth set of analyses, we identify a number of seemingly small election-day issues that can have a very real, negative effect on the accuracy and fairness of an election. These include: malfunctioning displays that go unnoticed; failure to follow procedures on the part of poll workers; and audit data that was not recorded, but should have been.

Incorrect displays might cause a voter to cast a vote other than as intended. Failing to follow election-day protocol can lead to a loss of votes when a machine is not correctly closed out at the end of the day. Missing audit data is a concern, as it makes it difficult for our other analyses to give accurate results. Some of these errors could result in fewer working machines on election day, and fewer machines can mean longer lines at the polling place. The number of machines per voter also appears to have an effect on the percentage of votes not included in the final tally due to error: as the number of machines per voter increases, cast votes are more likely to be counted [10].

3.2 Algorithms

We describe here the algorithms used in each of our analyses. For the majority of these we only consider data from election day between the hours of 7 A.M. and 7 P.M., which are the times that polls open and close in South Carolina. In our preliminary analysis, we found examples of log entries with seemingly incorrect time stamps. We identified two types of time stamp errors: errors resulting from incorrectly set clocks and errors resulting from apparent bugs in the time stamp mechanism itself. Given only a time stamp in the logs, it is impossible to know whether it is correct; however, we developed a number of heuristics to find those terminals that likely do have an incorrect clock. We try to minimize the number of false positive reports we give; as a result, we may miss some terminals with an incorrect clock. We provide the user with a report detailing the results of this time stamp analysis.

This analysis is meant only to indicate whether a machine had a noticeably incorrect time; we are not concerned with whether multiple machines in a precinct have synchronized internal clocks. Our analyses never require piecing together the order of events that took place on two different machines; therefore we neither require nor suggest that election officials synchronize the clocks on different machines. While our tool can tolerate a fair amount of error, we do recommend that the machines be set with a reasonably correct time.
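The specific heuristics are tuned to the log data, but the flavor is easy to convey. The following sketch flags a terminal when its election-day events fall far outside the polling window or when its clock jumps backward; the election date, slack, and backward-jump threshold are illustrative assumptions, not the tool's actual parameters.

    from datetime import datetime, timedelta

    POLLS_OPEN = datetime(2010, 11, 2, 7, 0)    # assumed election day
    POLLS_CLOSE = datetime(2010, 11, 2, 19, 0)
    SLACK = timedelta(hours=3)                  # tolerated clock error

    def suspicious_terminals(events):
        """events: (terminal, timestamp) pairs in log order."""
        last_seen, flagged = {}, set()
        for terminal, ts in events:
            # Events far outside the polling window suggest a mis-set clock.
            if ts < POLLS_OPEN - SLACK or ts > POLLS_CLOSE + SLACK:
                flagged.add(terminal)
            # A backward jump suggests a buggy or manually adjusted clock.
            prev = last_seen.get(terminal)
            if prev is not None and ts < prev - timedelta(minutes=5):
                flagged.add(terminal)
            last_seen[terminal] = ts
        return flagged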

AuditBear detects whether any votes were left out of the tally. We assume the tabulation software is correct and instead use the audit logs to check that all cast votes are entered into the tabulation software. Recall that each voting terminal's vote totals are loaded onto a PEB when polls are closed and then all these PEBs' data are loaded into the election reporting database. There are two points in this process where votes could be omitted: a terminal may be forgotten and never closed, so that no PEB contains its vote totals; or a PEB used to close a terminal might be forgotten and not uploaded to the database. We show how to use the audit logs to detect both of these problems.

In order to find instances of PEBs that were not uploaded, we compare the contents of the event log and ballot image files to that of the system log listing file. We first identify, by parsing the event log, the set of PEBs used to close out all voting terminals in the county and then verify that each one appears as uploaded in the system log file. When a PEB is missing from the system log file, we report the case, because it signifies that the PEB was not uploaded and its votes may not be in the certified totals. Our tool only has the ability to detect missing PEBs when the corresponding CF card has been uploaded; there may be additional missing votes and audit information that we do not detect.

Looking for terminals that were never closed is a challenging problem: essentially we need to identify events that are missing from the logs. We do this by finding terminals that were opened, but never closed.
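Both checks reduce to set differences over identifiers extracted from the logs. A minimal sketch follows, assuming the closing events and upload records have already been parsed out; the function and parameter names are ours, not AuditBear's.

    def missing_peb_uploads(close_events, uploaded_pebs):
        """close_events: (terminal, peb) pairs for terminal-close events
        in the event log; uploaded_pebs: PEB serials the system log
        shows as uploaded. Returns PEBs whose totals may be missing."""
        closing_pebs = {peb for _terminal, peb in close_events}
        return closing_pebs - set(uploaded_pebs)

    def unclosed_terminals(opened, closed):
        """Terminals that were opened for voting but never closed, so
        no PEB ever received their vote totals."""
        return set(opened) - set(closed)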

AuditBear also reports on polling locations that stayed open late and that had long lines throughout the day. While the totals report for each precinct specifies what time the totals report was printed, that time may not be indicative of when the polls closed for that precinct. For example, a poll worker may work on other closing procedures before printing the totals report; if election officials were to refer to these tapes to identify polling locations open late, they would encounter many false positives. Our analysis therefore also gives election officials a more convenient way to find out which precincts were open after 7 P.M.

In order to identify locations that stayed open past 7 P.M., AuditBear first compiles a list of every terminal in the event log file for which the last vote was cast after 7 P.M. Then, using information from the ballot image file, the algorithm groups terminals by polling location and computes the mean time of the last cast vote for each group. We take the mean in order to account for any chance error in the time stamps. Finally, we report which polling locations stayed open late and also provide, for every county, a chart detailing the number of polling locations that stayed open late and by how long.
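A sketch of this grouping step, assuming the last-vote times and the terminal-to-precinct mapping have already been extracted (all names and the election date are illustrative):

    from collections import defaultdict
    from datetime import datetime

    CLOSE = datetime(2010, 11, 2, 19, 0)   # official 7 P.M. close

    def late_locations(last_vote, location_of):
        """last_vote: {terminal: time of last cast vote} from the event
        log; location_of: {terminal: precinct} from the ballot image
        file. Returns {precinct: mean last-vote time} for precincts
        with terminals still casting votes after the official close."""
        groups = defaultdict(list)
        for terminal, ts in last_vote.items():
            if ts > CLOSE:
                groups[location_of[terminal]].append(ts.timestamp())
        # The mean absorbs small per-terminal clock errors.
        return {loc: datetime.fromtimestamp(sum(v) / len(v))
                for loc, v in groups.items()}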

Identifying locations that had lines throughout the day is trickier. We start by positing that when there is a line of voters waiting, there will be negligible idle time for each machine between voters. We would like to identify windows of time where this was the case for a particular polling location. (Note that this does not allow us to differentiate between voters standing in line and voters arriving in a steady stream that keeps the machines at maximum capacity. It is a shortcoming of our approach, but seems difficult to avoid given only the log data.) The analysis is complicated, however, by the fact that the logs do not record an event when a new ballot is activated for a voter, only when a ballot is cast. Given the time stamps t1 and t2 of two cast vote events for voters v1 and v2, it could be the case that v2 walked up to the terminal as soon as v1 cast her vote and then spent t2 − t1 minutes marking her ballot before casting her vote. Or, it could be that the terminal was idle for most of t2 − t1 and at the last moment, voter v2 approached the terminal, quickly marked her ballot, and then cast her vote. If we knew how long the average voter took to mark her ballot we could use that to estimate the length of the idle time between two consecutive cast vote events. We do not know that information directly, but we can infer it from the logs we have. We know which locations were open after 7 P.M. and we also know that a polling location should only stay open late if there are people waiting in line at the official poll closing time. We assume this protocol is followed, and that the line moves efficiently, and therefore the terminals in a given location experience very little idle time between voters after 7 P.M. We also assume the time it takes to mark a ballot is a random variable and these late voters are a random sample of the entire voting population for that precinct. Therefore the average time it takes them to vote represents the time it takes the average voter in the precinct to vote.1 We then look for other times throughout the day where the time between votes is similar to or less than the time between votes after 7 P.M. Starting at 7 A.M., for each location, we look at each one-hour window that starts on the hour and compile the set S1 of times between consecutive votes for every machine in that location during the window. Next, we compare this set to S2, the times between consecutive votes for every machine in that location after 7 P.M. If the mean of S1 is less than the mean of S2, this suggests times in S1 were shorter than in S2 and there possibly were long lines in that window. We then perform the Mann-Whitney U statistical test to determine whether the observed difference in means is likely due to chance error. For windows where the two-tailed p-value is less than 0.05, there is evidence that the difference in means is real and there possibly were long lines during the window when S1 was collected. For those precincts open after 7 P.M., we report the hours during the day that possibly experienced long lines.

1 It is possible that voters who arrive later in the day are from a particular population that has a different average time-to-vote than voters who arrive earlier in the day; however, in their field study of the California 2008 primary, Spencer and Markovitz found that, at a given location, the average time to vote was constant throughout the day [19], so we feel our assumption is not unreasonable.
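The windowed comparison can be sketched as follows, using SciPy's implementation of the Mann-Whitney U test; we assume SciPy is available, and the input data structures are our own illustration.

    from scipy.stats import mannwhitneyu

    def likely_line_windows(gaps_by_hour, late_gaps, alpha=0.05):
        """gaps_by_hour: {hour: [seconds between consecutive cast votes
        over all machines at one location]}; late_gaps: the same gaps
        after 7 P.M. at that location, assumed to reflect a continuous
        line. Returns the hours that possibly had long lines."""
        late_mean = sum(late_gaps) / len(late_gaps)
        suspect = []
        for hour, gaps in sorted(gaps_by_hour.items()):
            if gaps and sum(gaps) / len(gaps) < late_mean:
                # Two-tailed test: is the difference between the gap
                # distributions larger than chance error would explain?
                _stat, p = mannwhitneyu(gaps, late_gaps,
                                        alternative="two-sided")
                if p < alpha:
                    suspect.append(hour)
        return suspect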

Note that we only perform this analysis on locations that were open after 7 P.M. This analysis would be useful to perform on all locations; after all, it is possible a polling place had long lines early in the day, but was still able to close on time. One way to do that would be to define S2 to be the times between consecutive votes after 7 P.M. for every machine in every location that is open late and then separately compare each location's S1 set to the global S2. However, doing so assumes that the late voters are a random sample of the entire voting population across all precincts in the county and have a representative average time-to-vote; this is not a safe assumption. The average time to vote depends on a number of factors that can vary widely from precinct to precinct, including the number of issues on the ballot, the clarity and length of the writing on the ballot, and the socioeconomic level of the polling location [19, 3]. Between locations, the average time to vote can vary by as much as 1.5 minutes [19]. In order to make the long lines analysis applicable to all voting locations, one would have to correct for each of these confounding factors. How one might do so is an interesting open question, and we feel that extending the long lines analysis to all voting locations might be amenable to deeper analysis.

We also report on machines that had an uncalibrated display at the time when a vote was cast; there is the possibility that those votes were not cast as intended. When detecting votes cast on uncalibrated machines, we look for three specific events in the event log: a machine uncalibrated event, a vote cast event, and a machine recalibrated event. We use a simple finite state machine with states {uncalibrated machine with no votes cast, uncalibrated machine with at least one vote cast, calibrated machine} and track the current state of each terminal as we iterate through the event log. We then report any machine that was ever in the state "uncalibrated machine with at least one vote cast."
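A sketch of this state machine follows, with the three states as described; mapping real iVotronic event codes onto the three event kinds below is an assumption left to the reader.

    UNCAL_NO_VOTES = "uncalibrated, no votes cast"
    UNCAL_VOTED = "uncalibrated, at least one vote cast"
    CALIBRATED = "calibrated"

    def uncalibrated_vote_terminals(events):
        """events: (terminal, kind) pairs in log order, where kind is
        'uncalibrated', 'recalibrated', or 'vote_cast'. Returns the
        terminals that ever accepted a vote while uncalibrated."""
        state, flagged = {}, set()
        for terminal, kind in events:
            s = state.get(terminal, CALIBRATED)
            if kind == "uncalibrated":
                s = UNCAL_NO_VOTES
            elif kind == "recalibrated":
                s = CALIBRATED
            elif kind == "vote_cast" and s != CALIBRATED:
                s = UNCAL_VOTED
                flagged.add(terminal)    # vote cast on uncalibrated display
            state[terminal] = s
        return flagged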

The procedural errors we are concerned with are: not printing zero tapes, casting votes with Master PEBs, and opening and closing a machine with different PEBs. For each polling location, we check that every machine in the location recorded printing zero tapes, that each machine was opened and closed with the same PEB, and that no PEB used to open or close a machine was also used to cast a vote.
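These three checks can be expressed over a per-machine summary distilled from the event log; the record fields here are our own illustration of the idea, not AuditBear's internals.

    from collections import namedtuple

    # Hypothetical per-machine summary built while scanning the event log.
    Machine = namedtuple(
        "Machine", "terminal printed_zero_tape open_peb close_peb voting_pebs")

    def procedural_errors(machines):
        """Yield (terminal, problem) pairs for each violation found."""
        for m in machines:
            if not m.printed_zero_tape:
                yield m.terminal, "no zero tape printed"
            if m.open_peb != m.close_peb:
                yield m.terminal, "opened and closed with different PEBs"
            if {m.open_peb, m.close_peb} & set(m.voting_pebs):
                yield m.terminal, "open/close PEB also used to cast votes"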

Last, we consider how to detect missing data: audit data that was not recorded, but should have been. In some cases, this may be impossible; if there is no trace of a terminal in any of the logs, we have no way of knowing that its data is missing given only the logs. We focus on the audit data for cast votes and find votes for which either the cast vote event is missing or the ballot image is missing. We cannot detect a cast vote that is missing both its cast vote event and its ballot image. For each voting terminal, we compare the number of cast vote events in the event log with the number of ballot images in the ballot image file. We report those terminals where the two values are not equal, as the logs must be missing data from those machines.
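This cross-check is a straightforward comparison of per-terminal counters, with terminals present in only one of the two files reported as well. A minimal sketch, assuming the input dictionaries were built by earlier parsing passes:

    def missing_audit_data(cast_vote_counts, ballot_image_counts):
        """cast_vote_counts: {terminal: cast-vote events in the event
        log}; ballot_image_counts: {terminal: ballot images recorded}.
        Returns {terminal: (events, images)} wherever the two disagree."""
        report = {}
        for t in set(cast_vote_counts) | set(ballot_image_counts):
            events = cast_vote_counts.get(t, 0)
            images = ballot_image_counts.get(t, 0)
            if events != images:
                report[t] = (events, images)
        return report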

4 Implementation

We built a web application called AuditBear to give election officials and advocacy groups easy access to our toolset. AuditBear requires the user to upload an event log and a ballot image file; we strongly suggest they also submit the system log to take advantage of the full range of analyses our tool provides.

AuditBear uses only publicly available log data and does not store any information from the logs. Even if a malicious person gains access to the tool, no harm can be done, because the logs contain no private information and none can be derived by combining multiple logs or public information. For the same reasons, voter privacy will not be harmed by a web security breach in the log upload process. The storage and distribution of these logs therefore does not pose a risk to voter privacy.

AuditBear produces reports that warn election officials about possible miscounts or procedural errors. Each report provides details about the errors found, explains the possible consequences of each error, and suggests, where applicable, steps the election officials might take to address it.

5 Results

This section discusses our findings after running our tool on the audit logs from the South Carolina 2010 General Election. We tested our analysis using log files downloaded from the website South Carolina Voting Information.2

While we report the anomalies AuditBear detects, there are limited resources for confirming them. There is no plausible way to confirm long lines or problematic machines in the 2010 elections. As we do not handle malicious tampering of the logs, we assume that they are correct. While we do detect incomplete logs, we have no means of knowing whether errors were introduced in fulfilling the FOIA request. With the assumption that the logs are correct, we can compare our detected number of missing votes to the number found in a previous study for corroboration.

2 www.scvotinginfo.com

5.1 Date/Time Errors

AuditBear found 1465 out of 4994 machines across 12 counties whose date was changed during election day voting. Figure 1 shows an example of a 1-hour time change in Georgetown County. This county had 125 out of 140 machines adjusted nearly exactly one hour back in time. This suggests the wrong Daylight Saving Time algorithm was in use, as mentioned in previous audits [6].

Figure 1: Event log entry for resetting an iVotronic clock by one hour in Georgetown County.

Anomalous time changes were detected in 18 machines. An anomaly is any occurrence of an unexplained date change while a machine is open for voting. Figure 2 shows an example that occurred in Richland County; the machine was manually corrected about 30 minutes later.

Figure 2: The event log shows a seemingly random date anomaly, which occurred in Richland County on election day.

5.2 Missing Votes

Our analysis shows that a total of 15 PEBs containing 2082 votes were not uploaded in the 14 counties that we audited in South Carolina. Figure 3 summarizes the PEBs not uploaded during the 2010 general election.

Figure 3: The number of PEBs, and their corresponding votes, per county that were not uploaded.

AuditBear also identified a few instances of machines not being closed. A machine must be closed in order to collect the votes and audit data from that machine. There was a single machine that was not closed in each of the following counties: Greenville County, Horry County, and Sumter County. In a close election, information such as this could be cause for an extensive audit or recount of the votes. In this particular case, we know from previous audits that the missing data amounts to 2082 missing votes, which would not have affected the outcome of any of the races in the South Carolina 2010 elections [18].

Figure 4 shows some of AuditBear's output on Greenville County's log files.

Figure 4: Feedback generated by AuditBear for officials when it detects that some votes were not uploaded.

5.3 Long Lines

We found that 671 out of a total of 942 South Carolina precincts stayed open late. Berkeley County had the highest incidence of delayed closing times, with 93% of polling locations closing after 7 P.M. Figure 5 depicts the precincts that closed late in Berkeley County. In the future, resources could be allocated to those polling locations that stayed open the latest to help move their lines more quickly. To detect possible lines before 7 P.M., we looked only at the precincts that were open late. Figure 6 shows the time periods when the Berkeley County precincts experienced long lines before 7 P.M. Figure 7 shows the detailed results of the Mann-Whitney U statistical test for long lines.

Figure 5: The number of precincts that closed within certain time intervals after 7:00 P.M. (late) in Berkeley County.

Figure 6: The number of precincts that may have experienced long lines within certain time intervals in Berkeley County.


Figure 7: Times when there were likely long lines in Berkeley County, on a per-precinct basis.

5.4 Calibration Errors

We found seven counties where at least one machine was possibly not calibrated when votes were cast on that machine. An uncalibrated display could potentially cause votes to not be cast as intended. Our report suggests that an election official or technician inspect these machines for possible calibration issues; see Figure 8.

Figure 8: Feedback generated by AuditBear for election officials when calibration issues are detected.

5.5 Procedural Errors

Our findings reveal the need for improvements in poll worker training. The same Master PEB should be used to open and close a machine, but in 11 counties there were machines opened and closed with different PEBs. Our results showed an association between this error and certain precincts where poll workers made the mistake repeatedly. When this error is made multiple times at a single precinct, it indicates that perhaps the poll workers do not know the procedures, whereas a random distribution of these errors across polling locations more likely reflects isolated mistakes. Colleton County had five instances of this procedural error, but four of those instances took place at one polling location. Figure 9 shows this report for Colleton County.

A poll worker assigns each voter a PEB to use when it is that voter's turn; voters should not cast votes with a Master PEB. In most precincts in Florence County this error never occurred; however, four precincts had large numbers of it: Mill Branch had this error on 56 out of 540 votes, Pamplico 2 on 82 out of 579 votes, McAllister Hill on 102 out of 749 votes, and Effingham on 86 out of 591 votes. This pattern of errors again suggests certain poll workers did not know the proper procedure. Figure 10 shows this relationship in Florence County.

Figure 9: An example response when AuditBear reports machines that were opened and closed with different PEBs.

5.6 Audit Data

In several counties, the audit logs appeared to be incomplete. Our analysis detected six counties that did not have the same set of machines in both the event log and the ballot image file. Florence County had the most inconsistencies, with 65 machines that had votes cast on them according to the event log, but no ballot images. We also saw cases where there were ballot images for votes cast on machines that did not record any events in the event log. In addition to an unusually large amount of missing data, the analysis of Florence County showed machines that did not have the same number of votes cast as ballot images. See Figure 11 for example output from AuditBear.

Figure 10: This histogram shows the percentage of votes at each precinct that were cast with a Master PEB. The precincts with the highest percentages had between 56 and 102 instances of this error.

Figure 11: Feedback generated by AuditBear when incomplete audit data is detected.

6 Future Voting Systems Suggestions

We believe the following recommendations will make audit files more usable.

Voting systems should support automatic generation and collection of audit logs in a central location. While many other DRE systems capture data in their audit logs similar to what the iVotronic does, no other widely deployed voting system makes it as easy to gather all the audit logs from all of the voting machines into a single place. As a result, while our methods are in principle applicable to other deployed voting systems, in practice this would require additional effort from election officials. In addition, audit logs should have a universal electronic file format, which would make tools like ours easier to extend to other systems.

Vendors should document the meaning of all events. We found audit logs with event messages such as "UNKNOWN," "Warning PEB I/O flag set," and "Warning I/O flagged PEB will be used," which sound ominous, but whose gravity we could not determine. Despite combing through all of ES&S's publicly available information about the iVotronic, the meaning of these events remains a mystery [21, 12, 13].

Accuracy of date and time logging needs improvement. When a machine has an incorrect clock, time stamps are inaccurate and it becomes difficult to recreate election day events. In addition, some audit log analyses, such as the open-late analysis, are made more difficult by unreliable time stamps.

Make system manuals available to the public. Voting machine audit logs are public information; the general public can request them under the Freedom of Information Act. In the same fashion, we recommend that voting system manuals be made freely available. This would allow the public to see for themselves whether there are any problems that should be addressed.

Capture the ballot activation event. Recording the time each ballot is activated (as opposed to only recording when the ballot is cast) would make it easier to learn when the voting machines were heavily used and when they were idle. Since the ballot image file would still be randomized, there would still be no way to connect a voter to his or her vote. Even if a spectator watches the polling location and records how long each voter spends in the voting booth, there are hundreds (if not thousands) of ballot images in the ballot image file; it is unlikely that a malicious person could use this information to determine a voter's choice. This is a simple change that would not compromise voter privacy, and it would make it easier for an automated tool such as AuditBear to extract information about long lines during the day.

Future voting system standards introduce stronger requirements for audit logs; this could make it easier to apply our analyses to other voting systems [24]. The current standards require voting systems to keep a permanent record of audit data, but do not specify a particular format. These standards also address the matter of time stamps: they state, "All systems shall include a real-time clock as part of the systems hardware. The system shall maintain an absolute record of the time and date or a record relative to some event whose time and data are known and recorded" [1]. Newly proposed, but not yet approved, standards add further requirements: voting systems must produce electronic records that are exportable and transmitted to the Election Management System. Another proposal under these standards is that voter privacy be maintained even when reports are combined [2]. Many of these standards align with our suggestions for future voting systems.

7 Related Work

Two recent studies used event logs from the iVotronic voting system to audit elections [6, 17]. Buell et al. [6] analyzed the same South Carolina elections that we did and also discovered votes not included in the certified counts and problems with the audit data. By consulting additional audit materials, such as the printed results tapes, the authors were able to offer possible explanations for why the problems occurred. Our work takes a slightly different approach: we focus on developing an automated analysis of the publicly available audit log data that can be used by anyone to detect other possible errors in addition to missing votes.

Sandler et al. [17] analyzed vote tallies by comparing each machine's protected vote count to the printed results tapes. Their report also found time stamps that were most likely inaccurate; with further investigation, they concluded that the machine hardware clock was incorrect. Our research provides analyses that identify similar problems, but in a way that can be automated.

There has also been research on using audit logs to analyze election-day procedure and activity. Antonyan et al. showed how event logs could be used to determine if a machine acted "normally" on election day [4]. They built a finite state machine that models the sequences of events that a well-behaved AccuVote-OS scanner might produce and used it to analyze AccuVote-OS logs. This type of analysis could be useful for the iVotronic systems that we studied, too.

Other work has focused specifically on detecting or predicting long lines. Formal models adapted from queuing theory and simulations of voter queues can be used to better understand the factors that affect the length and duration of lines at the polling place [3, 11]. Our work is complementary, and in fact information about election day events gathered by AuditBear can be used to inform the queuing models. A field study, such as that done by Spencer and Markovitz [19], can provide ground truth about the occurrence of long lines on election day, but in the absence of that ideal, our analysis can provide some information about the time and duration of long lines.

Voter Verified Paper Audit Trails (VVPATs) are a different type of audit log. Unlike the audit logs we used in our analyses, VVPATs are viewed and verified by the voter and are better suited to audits concerning a DRE incorrectly capturing a voter's intent. Our work is more concerned with identifying cases of cast votes not being included in the final count, or issues at the polling place that might prevent the voter from casting their vote in the first place. With VVPATs, as long as a certain percentage of voters check their paper ballot [15], the voting machine need not be assumed correct, whereas our analyses do make this assumption.

8 Conclusion

This paper develops methods to analyze audit data from DRE voting machines. It introduces new methods for extracting information about election-day activities and post-election anomalies from audit data. We conduct an audit on the 2010 South Carolina election using these methods and are able to detect instances of missing votes, procedural errors, and likely instances of lines throughout the day. With this information, election officials can improve poll worker training, resource allocation, election tabulation procedures, and voting machine preparation testing. Based on our experience during this research, we make suggestions for future audit logs that would make an automated analysis such as ours even more informative.

We built a web application, AuditBear, to perform these analyses. Users can upload the iVotronic log files to our website and run the analyses. By automating our analyses we can provide intelligent feedback to election officials during the canvassing process and help them quickly correct any problems in order to produce accurate election results. AuditBear is freely available online.

9 Acknowledgments

Special thanks to Dr. Kristen Gates and the TRUST program staff. Additionally, we want to recognize Dr. Duncan Buell for the helpful discussions we had with him about the topic of this paper. This research was funded in part by National Science Foundation grant CCF-0424422.

References

[1] Voluntary Voting System Guidelines. http://www.eac.gov/assets/1/AssetManager/2005%20voluntary%20voting%20system%20standards%20volume%201.pdf, 2005.

[2] VVSG Recommendations to the EAC. http://www.eac.gov/assets/1/Page/TGDC%20Draft%20Guidelines.pdf, 2007.

[3] Allen, T., and Bernshteyn, M. Mitigating voter waiting times. Chance Magazine 19, 4 (2006), 25–36.

[4] Antonyan, T., Davtyan, S., Kentros, S., Kiayias, A., Michel, L., Nicolaou, N., Russell, A., and Shvartsman, A. Automating voting terminal event log analysis. In Proceedings of EVT/WOTE 2009 (Aug. 2009).

[5] Appel, A. W., Ginsburg, M., Hursti, H., Kernighan, B. W., Richards, C. D., Tan, G., and Venetis, P. The New Jersey voting-machine lawsuit and the AVC Advantage DRE voting machine. In Proceedings of EVT/WOTE 2009 (Berkeley, CA, USA, 2009), USENIX Association, pp. 5–5.

[6] Buell, D. A., Hare, E., Heindel, F., Moore, C., and Zia, B. Auditing a DRE-based election in South Carolina. In Proceedings of EVT/WOTE 2011 (Aug. 2011).

[7] Butler, K., Enck, W., Hursti, H., McLaughlin, S., Traynor, P., and McDaniel, P. Systemic issues in the Hart InterCivic and Premier voting systems: Reflections following project EVEREST. In Proceedings of the USENIX/ACCURATE Electronic Voting Technology Workshop (2008), EVT, USENIX Association.

[8] Calandrino, J. A., Feldman, A. J., Halderman, J. A., Wagner, D., Yu, H., and Zeller, W. P. Source code review of the Diebold voting system, July 2007. Report commissioned as part of the California Secretary of State's Top-To-Bottom Review of California voting systems.

[9] Checkoway, S., Feldman, A. J., Kantor, B., Halderman, J. A., Felten, E. W., and Shacham, H. Can DREs provide long-lasting security? The case of return-oriented programming and the AVC Advantage. In Proceedings of EVT/WOTE 2009 (Aug. 2009).

[10] Democratic National Committee, Voting Rights Institute. Democracy at risk: The 2004 election in Ohio. http://www.votetrustusa.org/pdfs/DNCfullreport.pdf, June 22, 2005.

[11] Edelstein, W. A., and Edelstein, A. D. Queuing and elections: Long lines, DREs and paper ballots. In Proceedings of EVT/WOTE 2010 (Aug. 2010).

[12] Election Systems & Software. iVotronic Real Time Audit Log. http://www.essvote.com/HTML/docs/ESS_iVotronic_RTAL.pdf/, 2011.

[13] Election Systems & Software. iVotronic Touch Screen Voting System. http://www.essvote.com/HTML/docs/iVotronic.pdf/, 2011.

[14] Feldman, A. J., Halderman, J. A., and Felten, E. W. Security analysis of the Diebold AccuVote-TS voting machine. In Proceedings of EVT 2007 (Aug. 2007).

[15] Hall, J. L. Design and the support of transparency in VVPAT systems in the US voting systems market. http://vote.nist.gov/jlh-vvpat-design-transparency.pdf/, 2006.

[16] Kohno, T., Stubblefield, A., Rubin, A. D., and Wallach, D. S. Analysis of an electronic voting system. In Proceedings of the IEEE Symposium on Security and Privacy (2004).

[17] Sandler, D., and Wallach, D. S. Casting votes in the auditorium. In Proceedings of EVT 2007 (Aug. 2007).

[18] South Carolina State Election Commission. Statewide results. http://www.enr-scvotes.org/SC/19077/40477/en/summary.html, 2010.

[19] Spencer, D. M., and Markovitz, Z. S. Long lines at polling stations? Observations from an election day field study. Election Law Journal 9, 1 (2010), 3–17.

[20] Verified Voting Foundation. State election equipment. http://verifiedvoting.org/, 2010.

[21] Verified Voting Foundation. ES&S iVotronic full information sheet. http://verifiedvoting.org/article.php?id=5165/, 2011.

[22] Verified Voting Foundation. America's voting systems in 2010. http://www.verifiedvoting.org, accessed Oct. 2011.

[23] VotersUnite.org, VoteTrustUSA, Voter Action, Pollworkers for Democracy. E-voting failures in the 2006 mid-term elections, a sampling of problems across the nation. http://www.votersunite.org/info/E-VotingIn2006Mid-Term.pdf, Jan. 2007.

[24] Wagner, D. Voting systems audit log study. Report commissioned by the California Secretary of State, June 1, 2010.


A Event Log File


B Ballot Image File


C System Log File
