Using data published by the VA Department of Elections (“ELECT”), we plotted the Ballot Invalidation Rate (BIR) against the winner's percent vote share to test whether “Differential Invalidation” of ballots occurred in the 2024 VA General Election. The plotted data appear to show differential invalidation and suggest underlying issues that should be investigated and addressed. These include data reliability and consistency problems: for some localities, the number of reported total votes cast is greater than the number of reported ballots cast.
Details
“Differential invalidation” takes place when the ballots of one candidate or position are invalidated at a higher rate than those of other candidates or positions. Note that differential invalidation does not directly indicate any sort of fraud. It is, however, indicative of an unfairness or inequality in the rate of incomplete or invalid ballots conditioned on candidate choice. While it could be caused by fraud or malfeasance, it could also be caused by confusing ballot layout, poor procedural controls or a lack of uniformity, under-voting (the voter not choosing a candidate), or other compounding factors. (ref: [1] ch. 6)
The Free and Fair Hypothesis
In a democratic election, each person's vote counts the same. There are other requirements, but this is a necessary condition. In the presence of invalidation, the free and fair hypothesis reduces to each person's ballot having the same probability of being invalidated as any other person's ballot. From a statistical standpoint, this means that invalidation must be independent of the candidate chosen on the ballot (or of the person voting). [ref: 1, pg. 132]
The data used for this analysis are the “unofficial” election results (the certified results are not yet published) and come directly from the VA Department of Elections (“ELECT”). The data was downloaded on Nov 18th at 4:34 pm. We purposefully waited to perform this analysis until the localities had completed their canvass operations and the data feeds on the ELECT website had mostly stabilized. The actual certified results will not be available until at least Dec 2, after the State Electoral Board meets to finalize the certification. We will revisit this analysis at that time.
- The list of the number of Active Voters, Inactive Voters and total Ballots cast in a given locality can be found here: https://enr.elections.virginia.gov/results/public/Virginia/elections/2024NovemberGeneral/voters
- The Turnout statistics and Vote Tally reports CSV file reports can be downloaded from here: https://enr.elections.virginia.gov/results/public/Virginia/elections/2024NovemberGeneral/reports
- The direct link to the turnout report used for this computation is here: https://enr.elections.virginia.gov/cdn/results/d2c804ee-4ec2-46bb-91d7-5b41526eab03/Election%20Turnout_bb03ef46-1af5-4f14-86a8-ead1b9094036.csv
- The direct link for the vote tally results report is here: https://enr.elections.virginia.gov/cdn/results/d2c804ee-4ec2-46bb-91d7-5b41526eab03/Election%20Results_4b7d6963-0b3a-4b2f-b331-d9d14ca39bc0.csv
With this dataset in hand we can know how many ballots were cast, as well as how many votes were counted for each candidate in each race in each locality (at least as reported by the state). For a given race, we can then compute the number of incomplete or invalid ballots by subtracting the total number of votes recorded for that race in the locality from the total number of reported ballots cast.
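The subtraction described above can be sketched as follows. This is a minimal illustration, not the code used to produce the analysis. The record layout (tuples of locality, candidate, votes), the function name, and the locality names and numbers are all hypothetical and do not reflect the actual column names or contents of the ELECT CSV files.

```python
from collections import defaultdict


def invalid_ballot_counts(tally_rows, ballots_cast):
    """Return {locality: ballots cast minus votes recorded} for one race.

    tally_rows   -- iterable of (locality, candidate, votes) tuples (assumed layout)
    ballots_cast -- dict mapping locality -> total ballots cast (assumed layout)
    """
    # Sum the votes recorded for every candidate in the race, per locality.
    votes_by_locality = defaultdict(int)
    for locality, _candidate, votes in tally_rows:
        votes_by_locality[locality] += votes

    # Only localities present in both sources can be compared.
    return {
        locality: ballots_cast[locality] - total_votes
        for locality, total_votes in votes_by_locality.items()
        if locality in ballots_cast
    }


# Made-up numbers for illustration only.
example_tally = [
    ("Locality A", "Candidate 1", 600),
    ("Locality A", "Candidate 2", 380),
    ("Locality B", "Candidate 1", 210),
    ("Locality B", "Candidate 2", 295),
]
example_ballots = {"Locality A": 1000, "Locality B": 500}

example_invalid = invalid_ballot_counts(example_tally, example_ballots)
print(example_invalid)
```

A negative count, as for the made-up "Locality B" here, means more votes were recorded in the race than ballots were reported cast.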
Following the techniques presented in [1] and [2], we plotted the Invalidation Rate against the Percent Vote Share for the Winner to look for evidence of Differential Invalidation ([1], ch. 6). The approach is similar to the one in [2], which we have used previously to produce election fingerprints: 2D histograms of the winner's vote share vs. the turnout percentage. (The 2024 versions are coming, just not ready yet.)
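Each dot in Figure 3 below represents the ballots from a specific locality. The x axis is the percent vote share for the winner (Harris). The y axis is the ballot invalidation rate, computed as 100 - 100 * Nvotes / Nballots, where Nvotes is the total number of votes recorded for the race in the locality and Nballots is the total number of ballots reported cast there.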
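As a rough sketch of how the two plotted quantities could be computed for a single locality, consider the functions below. The function names are hypothetical, and the choice to measure the winner's share against the votes recorded in the race (rather than against ballots cast) is an assumption on our part for this illustration. The numbers in the example are made up.

```python
def invalidation_rate(n_votes, n_ballots):
    """Ballot invalidation rate in percent: 100 - 100 * Nvotes / Nballots."""
    if n_ballots <= 0:
        raise ValueError("n_ballots must be positive")
    return 100.0 - 100.0 * n_votes / n_ballots


def winner_vote_share(winner_votes, n_votes):
    """Winner's share of the votes recorded in the race, in percent.

    Using votes recorded (not ballots cast) as the denominator is an
    assumption made for this sketch.
    """
    if n_votes <= 0:
        raise ValueError("n_votes must be positive")
    return 100.0 * winner_votes / n_votes


# Made-up locality: 1000 ballots cast, 980 votes recorded, 490 for the winner.
bir = invalidation_rate(980, 1000)
share = winner_vote_share(490, 980)
print(bir, share)

# More votes than ballots yields a negative rate.
negative_bir = invalidation_rate(505, 500)
print(negative_bir)
```

Each locality then contributes one (share, bir) point to the scatter plot.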
A few things are immediately apparent from the plot in Figure 3:
- There is a clear difference in the invalidation rate between localities where Harris had a low vote share and localities where she had a high vote share.
- The localities where Harris had a low vote share show a narrow spread of invalidation rates, whereas the high vote share localities show a wide spread.
- A number of localities are reporting negative invalidation rates. How is this possible, you ask? Well, there are a number of localities in the CSV data that have higher vote totals than the corresponding reported number of total ballots cast in the locality.
This implies that there is something significantly wrong in the data and the reporting tools or procedures used by ELECT: all of this data was pulled nearly simultaneously, so it should be self-consistent. We understand that this is still unofficial data and that updates may occur over time, but a snapshot taken at any given point in time should still agree with itself.
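A self-consistency check of this kind is simple to express. The sketch below flags localities whose recorded votes for a race exceed their reported ballots cast. The function name, the dictionary layout, and the numbers are hypothetical and are not taken from the ELECT files.

```python
def find_inconsistent_localities(vote_totals, ballots_cast):
    """List (locality, excess_votes) pairs where votes exceed ballots cast.

    vote_totals  -- dict mapping locality -> votes recorded in one race (assumed layout)
    ballots_cast -- dict mapping locality -> ballots reported cast (assumed layout)
    Localities missing from ballots_cast are skipped, not flagged.
    """
    return sorted(
        (locality, votes - ballots_cast[locality])
        for locality, votes in vote_totals.items()
        if locality in ballots_cast and votes > ballots_cast[locality]
    )


# Made-up numbers for illustration only.
example_votes = {"Locality A": 980, "Locality B": 505, "Locality C": 300}
example_ballots_cast = {"Locality A": 1000, "Locality B": 500, "Locality C": 300}

flagged = find_inconsistent_localities(example_votes, example_ballots_cast)
print(flagged)
```

Any locality returned by such a check would show up in the plot as a point with a negative invalidation rate.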
Note that there are still a few localities that have not yet had their vote totals reflected in the CSV files from ELECT. Those localities were omitted from this analysis. The combined information from all of the data source files that was used to generate this plot is available below.
In conclusion, there do appear to be some indications that differential invalidation occurred in the 2024 VA General Election for President. Due to data inconsistencies and the fact that this data is still officially “unofficial,” it is hard to draw any definitive conclusions, but these results suggest multiple underlying issues that need to be examined, understood and/or resolved. We can definitively say, however, that this is yet another example of the data streams from ELECT lacking self-consistency, which is a big problem in and of itself.
References
- [1] Forsberg, O.J. (2020). Understanding Elections through Statistics: Polling, Prediction, and Testing (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9781003019695
- [2] Klimek, P., Yegorov, Y., Hanel, R., & Thurner, S. (2012). Statistical Detection of Systematic Election Irregularities. Proceedings of the National Academy of Sciences of the United States of America, 109, 16469-16473. https://doi.org/10.1073/pnas.1210722109