
Vote Discrepancy in VA 2020 Voting data

Note: As of Update 2021-08-13 (see below) the discrepancy is now 382,176 votes.

This needs to be explained. I hope this is just a mistake on my part or that it can be attributed to a documented clerical or reporting error. I’ve gone over this a number of times now and triple checked my data sources and methods (all described or linked here). I’m only using the publicly available data from the VA Dept of Elections website and doing a fairly basic consistency check between the machine readable data (csv) and what is presented on their site.

The VA Dept of Elections summarizes the voting results for the 2020 Presidential election as follows. Even just taking screen shots from the VA DoE website, we can see that the vote totals for the same election don’t agree. The first screen shot shows the total number of votes in the 2020 Presidential Election as 4,486,821 while the second shows the number of votes cast as 4,460,524, a difference of 26,297. Now that is not a major difference in relation to the margin of victory or the number of votes cast, so it’s not something that bothers me. The two reported numbers are close enough that it is plausible that one includes overvotes/undervotes and the other doesn’t, or something similar. But it IS a noted difference, nonetheless. (issue #1)

Screen Shot 1 (as of 2021-08-10)
Screen Shot 2 (as of 2021-08-10)

The VA DoE also publishes the 2020 General Election results (here) as a csv file. (I pulled a fresh copy as of 2021-08-10 to run this analysis, but it’s too large to upload to the site.) One would expect the data in the csv and the data presented on the VA DoE website to match, or at least be close enough to chalk up any errors to something simple like overvotes/undervotes, etc. They don’t. Not only do they not match, but they don’t match in a very peculiar way. Let’s walk through the process of grabbing the data from the csv and tabulating the metrics above in a very simple MATLAB script, shall we?

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Import the November General vote totals and collate ...
NovemberGeneral = importVAVotefile("SourceData/2020 November General.csv", [2, Inf]);
NovemberGeneral = NovemberGeneral(NovemberGeneral.OfficeTitle == 'President and Vice President',:);
LocalityName = NovemberGeneral.LocalityName;
DistrictName = NovemberGeneral.DistrictName;
PrecinctName = NovemberGeneral.PrecinctName;

uid = join(cat(2,string(LocalityName),string(DistrictName),string(PrecinctName)),' : ');

[uuids, ui,ua] = unique(uid);
[ulns, uci,uca] = unique(NovemberGeneral.LastName);
votes = accumarray([ua,uca],NovemberGeneral.TOTAL_VOTES);
votes(isnan(votes)) = 0;
votes(:,end+1) = sum(votes,2);

isAbsentee = contains(string(uuids),"Central Absentee Precinct");
isProvisional = contains(string(uuids),"Provisional");

% Compute the sum of all ballots
TotalVotes = sum(votes(:,end))

% Compute the sum of the in person ballots
InPersonVotes = sum(votes(~isAbsentee & ~isProvisional,end))

% Compute the sum of the provisional or absentee ballots
Absentee = sum(votes(isAbsentee | isProvisional,end))
>> TotalVotes = 4462600
>> InPersonVotes = 1630833
>> Absentee = 2831767

Wait a minute?? … the data from the first link show the Absentee vote being 2,687,304. Yet the “2020 November General.csv” file shows 2,831,767 absentee votes. That’s a difference of 144,463 votes! Even more interesting is the fact that the total number of votes is still consistent with the published summary screenshots above. That’s a significant enough discrepancy that I get my worry beads out. Now, granted, this could … in some convoluted Rube Goldberg-esque manner … just be a clerical error, or maybe I’m just missing something totally obvious about the data. But no matter the reason, this NEEDS to be explained.
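The arithmetic behind that comparison is easy to replay. The snippet below is only a minimal sketch: it hard-codes the totals printed by the script and the absentee figure from the first screen shot instead of re-reading the csv, and the name reportedAbsentee is introduced here purely for illustration.

```matlab
% Totals printed by the script above
TotalVotes    = 4462600;
InPersonVotes = 1630833;
Absentee      = 2831767;   % csv precincts flagged as absentee OR provisional

% Absentee count shown in the VA DoE report (first screen shot)
reportedAbsentee = 2687304;

% The two groups should partition the csv total exactly
assert(InPersonVotes + Absentee == TotalVotes);

% Gap between the csv-derived count and the reported count
absenteeGap   = Absentee - reportedAbsentee;      % 144463
gapPctOfTotal = 100 * absenteeGap / TotalVotes;   % roughly 3.2 percent of all votes
fprintf("Absentee gap: %d votes (%.2f%% of total)\n", absenteeGap, gapPctOfTotal);
```

Note that the csv-side count lumps in any precinct whose name contains "Provisional", so the two figures may not cover exactly the same category of ballots.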

Update 2021-08-10 6:57pm

I’ve received an update that further corroborates the discrepancy in these numbers. In response to a FOIA request, the VA Dept of Elections supplied a summary file of the counts of all requested and returned absentee ballots per county.

This data file contains only two data columns, the Total Requested and the Total Returned. Summing the Total Returned column produces a result of 2,829,037, which is only off from my result by 2,730 (<< 1%). So we have yet another official source of VA election data that shows a discrepancy from the official results that were certified by VA. No fancy MATLAB scripts required to process, just a simple sum of the table columns.
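For anyone who would rather script that column sum than do it in a spreadsheet, here is a rough sketch. The FOIA file itself is not reproduced here, so the snippet writes a tiny stand-in csv first; the locality names, the counts in it, and the column names TotalRequested / TotalReturned are all made up for illustration. Only the two totals in the final comparison come from the text above.

```matlab
% Toy stand-in for the FOIA summary file (names and counts are made up)
toy = table(["Locality A"; "Locality B"; "Locality C"], [120; 80; 50], [100; 75; 40], ...
    'VariableNames', {'Locality', 'TotalRequested', 'TotalReturned'});
tmpFile = [tempname, '.csv'];
writetable(toy, tmpFile);

% Read the file back and sum the returned-ballot column
foia = readtable(tmpFile);
totalReturned = sum(foia.TotalReturned, 'omitnan');   % 215 for the toy rows
delete(tmpFile);

% Comparison using the totals quoted in the text
foiaReturned = 2829037;   % sum of Total Returned in the FOIA file
csvAbsentee  = 2831767;   % absentee/provisional total from the election csv
gap    = csvAbsentee - foiaReturned;   % 2730
gapPct = 100 * gap / csvAbsentee;      % a bit under 0.1 percent
fprintf("Gap: %d ballots (%.3f%% of the csv total)\n", gap, gapPct);
```

With the real file, the first block would be replaced by a single readtable call on the FOIA csv and its actual column name.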

Update 2021-08-13 3:00 PM

Looking closer at the official certified results from the VA DoE (first graphic and link above), they report that in 2020 the Total Registered = 5,975,696, the Total Voting = 4,486,821, and the % Turnout = 81.48%. They define the “Turnout” metric in the header as “% Voting of Total Registered”. So according to VA DoE … Total Voting / Total Registered * 100 = 4486821 / 5975696 * 100 = 81.48. Simple, right?

Except it’s wrong. Way wrong. Go ahead … grab a calculator. Try it for yourself.

4486821 / 5975696 * 100 = 75.0845 is what you’ll come up with. That’s a difference of 6.3955 percentage points (about 6.40) from the reported turnout. 0.063955 * Total Registered = 0.063955 * 5,975,696 = 382,176 (rounded) unaccounted-for votes.
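If you’d rather not reach for the calculator, the same check takes a few lines of MATLAB. This is just a sketch using the three numbers quoted from the report; the variable names are mine.

```matlab
% Figures quoted from the VA DoE Registration / Turnout report
totalRegistered = 5975696;
totalVoting     = 4486821;
reportedTurnout = 81.48;    % percent, as published

% Turnout as the report's own header defines it
computedTurnout = totalVoting / totalRegistered * 100;   % about 75.0845

% Gap in percentage points, and the same gap expressed in votes
gapPoints = reportedTurnout - computedTurnout;            % about 6.3955
gapVotes  = round(gapPoints / 100 * totalRegistered);     % 382176

fprintf("Computed turnout: %.4f%%\n", computedTurnout);
fprintf("Gap: %.4f points, or about %d votes\n", gapPoints, gapVotes);
```

Put another way, a turnout of 81.48% would require roughly 4,868,997 voters out of 5,975,696 registered, which is about 382,176 more than the reported Total Voting.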

But Jon … you say … surely you are just misinterpreting their metric! They wouldn’t have published the certified results with that glaring of an error. There has to be some other variable that’s counted in the turnout numbers that they just forgot to describe in the header line of the report.

If that were the case, we would expect the other election totals in the subsequent rows to show similar errors then, right? Yeah … not so much. Every other year in this same report has a difference of less than 0.1%.

Remember these are the numbers that VA certified to Congress. Even if this is just a clerical error, we need to know how and why it happened and make sure it doesn’t happen again. We need a full audit of the VA 2020 Election.

| Year | Total Registered | Percentage Change from Previous Year | Total Voting | Turnout (% Voting of Total Registered) | Voting Absentee (Included in Total Voting) | Total Voting / Total Registered | Difference From Reported Turnout |
|---|---|---|---|---|---|---|---|


In summary, I can reconcile the “2020 November General” and the “Daily_Registrant_Count_By_Locality” data files to give the following results compared to the HTML web report. The additional cross check against the response to the FOIA request matches my number to within a few thousand. All of this data is provided by VA DoE. All of it is supposed to be authoritative and definitive. And none of it makes any sense.

| # | VA DoE Certified Results | Biden Vote | Trump Vote | Total Voting | Total Registered | % Change from previous year | (Reported) Turnout (Total Voting / Total Registered) | (Computed) Turnout | Voting Absentee |
|---|---|---|---|---|---|---|---|---|---|
| #1 | VA DoE Report: Registration / Turnout | | | 4,486,821 | 5,975,676 | 6.18% | 81.48% | 75.08% (~382,443 Votes) | 2,687,304 |
| #2 | VA DoE Elections Database | 2,413,568 | 1,962,430 | 4,460,524 | | | | | |
| #3 | CSV: 2020-November-General | 2,413,568 | 1,962,430 | 4,462,600 | | | | | 2,831,767 |
| #4 | CSV: Registrations 11-01-2020 | | | | 5,975,717 | 6.18% | | | |
| #5 | FOIA’d CSV | | | | | | | | 2,829,037 |
| #6 | Daily Absentee List (by Request) | | | | | | | | 2,826,484 |
| | Computed from #3 & #4 | | | | | | 74.68% | 74.68% | |
| | Max Abs Difference | 0 | 0 | 24,221 | 41 | 0 | 6.8% | 0.4% | 144,463 |
Inconsistencies in VA Dept of Elections Provided Data
Registration / Turnout Report:
Elections Database:
DAL (as of 2020-11-30):
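A couple of the derived entries in the table can be reproduced directly from the listed values. The following is a minimal sketch that only re-uses numbers from rows #1, #3, #4, #5 and #6; nothing here is read from the source files, and the variable names are illustrative.

```matlab
% Absentee counts from rows #1, #3, #5 and #6 of the table
absentee = [2687304, 2831767, 2829037, 2826484];
absenteeSpread = max(absentee) - min(absentee);   % 144463

% Turnout computed from the two csv files:
% Total Voting from #3 over Total Registered from #4
csvTurnout = 100 * 4462600 / 5975717;             % about 74.68

% Distance between the reported 81.48% turnout and the csv-derived turnout
reportedTurnoutGap = 81.48 - csvTurnout;          % about 6.8 points

fprintf("Absentee spread across sources: %d\n", absenteeSpread);
fprintf("csv-derived turnout: %.2f%% (reported gap: %.1f points)\n", ...
    csvTurnout, reportedTurnoutGap);
```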

Other Code Used in this analysis:

I will note for completeness here that the importVAVotefile(...) function referenced above is auto-generated by the MATLAB Import Data tool. I’ve listed it below for the record.

function NovemberGeneral = importVAVotefile(filename, dataLines)
%IMPORTVAVOTEFILE Import data from a text file
%  NOVEMBERGENERAL = IMPORTVAVOTEFILE(FILENAME) reads data from text file
%  FILENAME for the default selection.  Returns the data as a table.
%
%  NOVEMBERGENERAL = IMPORTVAVOTEFILE(FILENAME, DATALINES) reads data for
%  the specified row interval(s) of text file FILENAME. Specify DATALINES
%  as a positive scalar integer or a N-by-2 array of positive scalar
%  integers for dis-contiguous row intervals.
%  Example:
%  NovemberGeneral = importVAVotefile("2020 November General.csv", [2, Inf]);
%  See also READTABLE.
% Auto-generated by MATLAB on 07-Aug-2021 14:36:17

%% Input handling

% If dataLines is not specified, define defaults
if nargin < 2
    dataLines = [2, Inf];
end

%% Set up the Import Options and import the data
opts = delimitedTextImportOptions("NumVariables", 22);

% Specify range and delimiter
opts.DataLines = dataLines;
opts.Delimiter = ",";

% Specify column names and types
opts.VariableNames = ["CandidateUid", "FirstName", "MiddleName", "LastName", "Suffix", "TOTAL_VOTES", "Party", "WriteInVote", "LocalityUid", "LocalityCode", "LocalityName", "PrecinctUid", "PrecinctName", "DistrictUid", "DistrictType", "DistrictName", "OfficeUid", "OfficeTitle", "ElectionUid", "ElectionType", "ElectionDate", "ElectionName"];
opts.VariableTypes = ["categorical", "categorical", "categorical", "categorical", "categorical", "double", "categorical", "categorical", "categorical", "double", "categorical", "categorical", "categorical", "categorical", "categorical", "categorical", "categorical", "categorical", "categorical", "categorical", "string", "categorical"];

% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";

% Specify variable properties
opts = setvaropts(opts, "ElectionDate", "WhitespaceRule", "preserve");
opts = setvaropts(opts, ["CandidateUid", "FirstName", "MiddleName", "LastName", "Suffix", "Party", "WriteInVote", "LocalityUid", "LocalityName", "PrecinctUid", "PrecinctName", "DistrictUid", "DistrictType", "DistrictName", "OfficeUid", "OfficeTitle", "ElectionUid", "ElectionType", "ElectionDate", "ElectionName"], "EmptyFieldRule", "auto");

% Import the data
NovemberGeneral = readtable(filename, opts);
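
For anyone who wants to sanity check the collation logic in the main script without downloading the full csv, the unique/accumarray pivot can be exercised on a toy table. This is only a sketch: the precinct labels, candidate names and counts below are made up for illustration and are not election data, and "CAP" is just a stand-in for the "Central Absentee Precinct" match.

```matlab
% Toy records (made up for illustration; not real election results)
precinct  = ["A : 001"; "A : 001"; "A : CAP"; "A : CAP"; "B : 001"; "B : 001"];
candidate = ["Smith"; "Jones"; "Smith"; "Jones"; "Smith"; "Jones"];
count     = [10; 20; 30; 40; 5; 15];

% Map every record to a precinct row and a candidate column
[precincts, ~, rowIdx]  = unique(precinct);
[candidates, ~, colIdx] = unique(candidate);

% Sum the counts into a precinct-by-candidate matrix,
% then append a per-precinct total column
pivot = accumarray([rowIdx, colIdx], count);
pivot(:, end+1) = sum(pivot, 2);

% Split the precinct totals into absentee and in-person groups
isAbsentee    = contains(precincts, "CAP");
absenteeTotal = sum(pivot(isAbsentee, end));    % 70
inPersonTotal = sum(pivot(~isAbsentee, end));   % 50

% The two groups must add back up to the grand total
assert(absenteeTotal + inPersonTotal == sum(count));
```

The final assert is the same partition check that ties InPersonVotes and Absentee back to TotalVotes in the real run.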