
There’s More to the Story …

A while back I posted about the fact that the official turnout numbers on the VA Dept of Elections (“ELECT”) website were incorrect. ELECT’s webpage showed the turnout for the 2020 election as 81.48%, when their own stats on the same page clearly show the result should be 75.08%.

I first published this discrepancy on Aug 13 2021 via twitter (here), and then again on Sep 09 2021 on this blog (as well as twitter) in the first section of my 2020 analysis report (here). ELECT subsequently (and quietly) updated their results page by Sept 10th, but gave no public notice of the error or the revision. They also never responded to my questions about how the error came about in the first place, why it was still on their official results page for almost a year, and what they were doing to ensure these types of errors didn’t happen again. I subsequently updated my 2020 analysis report to include the silent update of the results by ELECT (here).

In going over and reviewing some of my 2020 work and trying to make sure all of my documentation is complete, I’ve come across another tidbit of interesting information regarding the results on the ELECT site:

There were other silent updates to the official results that predate my finding the wrong turnout numbers.

Using the Wayback Machine (here), the first record I can find that has the 2020 results is from March 10 2021. The changes to the posted values that can be seen in the Wayback Machine snapshots since then are summarized below, newest first.

Wayback Machine Date | Year | Total Registered | Percent Change from Previous Year | Total Voting | Turnout (% Voting of Total Registered) | Voting Absentee (included in Total Voting)
2021-09-10 | 2020 | 5,975,696 | 6.18% | 4,486,821 | 75.08% | 2,687,304
2021-09-08 | 2020 | 5,975,696 | 6.18% | 4,486,821 | 81.48% | 2,687,304
2021-04-09 | 2020 | 5,975,696 | 6.18% | 4,486,821 | 81.48% | 2,687,304
2021-04-02 | 2020 | 5,975,802 | 6.18% | 4,413,388 | 73.85% | 2,614,927
2021-03-10 | 2020 | 5,975,802 | 6.18% | 4,413,388 | 73.85% | 2,614,927
Statistics posted on https://www.elections.virginia.gov/resultsreports/registrationturnout-statistics/ (as per the Wayback Machine) at different times since the close of the 2020 election.

From the table above there looks to have been a change sometime between April 2nd and April 9th to the number of registered voters, the total number of voters in the 2020 election, the turnout statistic, and the number of voters voting absentee. There also was a change sometime between Sept 8th and Sept 10th to the turnout statistic (presumably in response to my documenting the error and publishing it).
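The arithmetic behind the table can be rechecked directly. The sketch below is only an illustration: it hard-codes the figures from the Wayback Machine snapshots in the table above (the variable names are my own, not anything from ELECT), recomputes turnout as Total Voting divided by Total Registered, and reports the size of the April revision.

```matlab
% Figures transcribed from the Wayback Machine snapshots in the table above.
% Variable names are illustrative; nothing here is read from an ELECT file.
snapshot   = {'2021-03-10'; '2021-04-02'; '2021-04-09'; '2021-09-08'; '2021-09-10'};
registered = [5975802; 5975802; 5975696; 5975696; 5975696];
voting     = [4413388; 4413388; 4486821; 4486821; 4486821];
absentee   = [2614927; 2614927; 2687304; 2687304; 2687304];
posted     = [73.85; 73.85; 81.48; 81.48; 75.08];   % turnout as displayed (%)

% Turnout as the page defines it: percent voting of total registered.
computed = 100 * voting ./ registered;

% Flag snapshots whose posted turnout differs from the recomputed value
% by more than 0.01 percentage points (the displayed precision).
mismatch = abs(computed - posted) > 0.01;

for k = 1:numel(snapshot)
    fprintf('%s  posted %.2f%%  computed %.2f%%  mismatch=%d\n', ...
        snapshot{k}, posted(k), computed(k), double(mismatch(k)));
end

% Size of the revision between the April 2nd and April 9th snapshots.
dRegistered = registered(3) - registered(2);   % change in registered voters
dVoting     = voting(3)     - voting(2);       % change in total voting
dAbsentee   = absentee(3)   - absentee(2);     % change in absentee voters
fprintf('April revision: registered %+d, voting %+d, absentee %+d\n', ...
    dRegistered, dVoting, dAbsentee);
```

With these figures, only the two snapshots showing 81.48% are flagged. The April revision works out to 106 fewer registered voters, 73,433 more total voters, and 72,377 more absentee voters, so nearly all of the added voters fall in the absentee column.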

The entry from 2021-03-10 agrees with the official Election 2020 Summary Report dated 2021-01-25. The entry from 2021-09-10 agrees with the Election 2020 Summary Report dated 2021-10-01. To their credit, ELECT has at least put a notice on the page for the Election Summary Reports that the report was updated on Oct 1st, but there is no explanation of the reasoning behind the revisions.

I would like to know why ELECT was modifying the results for months after the close of the election. Where did the extra ~73K ballots come from that were added to the results by April 9th? Where did the 81% come from?


Another Interesting VA Election Data Discrepancy

On a spur of curiosity I went back to some of the data provided by the VA Dept of Elections (“ELECT”) for both the 2020 and 2021 elections and ran a new data consistency test …

I have a copy of the final Daily Absentee List (DAL) for both 2020 and 2021. I also have a copy of the paired Registered Voter List (RVL) and Voter History List (VHL) generated shortly after the close of the 2021 General Election and within a few moments of each other.

I was curious what percentage of the approved and counted absentee ballots in the DAL do NOT have an associated “voter credit” in the VHL, for both 2020 and 2021. If ELECT’s data is accurate the number should ideally be 0, but most official thresholds for acceptable accuracy in election data systems that I’ve seen hover somewhere around 0.1%. (0.1% is a fairly consistent standard in the documentation I’ve seen for various localities’ Risk Limiting Audits, the Election Scanner Certification procedures, etc.) The VHL should cover all of the activity for the last four years, but to account for people who might have been officially removed from the RVL and VHL since the 2020 election (due to death, moving out of state, etc.), I only run this test on the subset of entries in the DAL that still have a valid listing in the RVL.

The results are below. Both years show a high number of discrepancies compared to the 0.1% threshold, with 2020’s discrepancy percentage being just over 3x the percentage computed for 2021.

Year | Percent of Counted DAL Ballots without Voter Credit
2020 | 1.352%
2021 | 0.449%
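To put the two percentages in context, the short sketch below expresses each one as a multiple of the approximate 0.1% threshold mentioned above, and compares the two years to each other. The values are copied from the table; the variable names are just for illustration.

```matlab
% Percentages copied from the table above; 0.1 is the approximate
% acceptability threshold cited in the text.
pctNoCredit = [1.352; 0.449];   % 2020, 2021 (percent)
threshold   = 0.1;              % percent

timesOverThreshold = pctNoCredit / threshold;          % multiples of 0.1%
ratio2020to2021    = pctNoCredit(1) / pctNoCredit(2);  % 2020 relative to 2021

fprintf('2020: %.2fx the threshold\n', timesOverThreshold(1));
fprintf('2021: %.2fx the threshold\n', timesOverThreshold(2));
fprintf('2020 vs 2021: %.2fx\n', ratio2020to2021);
```

That works out to roughly 13.5x the threshold for 2020 and 4.5x for 2021, with 2020 at about 3.01x the 2021 figure.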

For those interested in the computation, the MATLAB pseudo-code is given below. I can’t actually link to the source data files because of VA’s draconian restrictions on redistributing the contents of the DAL, RVL and VHL data files.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% We first compute the indices of the DAL entries that represent 
% approved and countable ballots ...
%
% 'dal2020' and 'dal2021' variables are the imported DAL tables 
% 'VAVoteHistory' is the imported Voter History List
% 'RegisteredVoterList' is the Registered Voter List
% 
% All four of the above are imported directly from the CSV 
% files provided from the VA Department of elections with 
% very little error checking save for obvious whitespace or 
% line ending checks, etc.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% string() lets the comparisons work whether the status columns were
% imported as categorical, string, or cell arrays of char.
app2021 = string(dal2021.APP_STATUS) == "Approved";
bal2021 = string(dal2021.BALLOT_STATUS);
aiv2021 = app2021 & (bal2021 == "Issued");        % issued only; not counted below
amv2021 = app2021 & (bal2021 == "Marked");
aomv2021 = app2021 & (bal2021 == "On Machine");
appv2021 = app2021 & (bal2021 == "Pre-Processed");
afwv2021 = app2021 & (bal2021 == "FWAB");
counted2021 = amv2021 | aomv2021 | appv2021 | afwv2021; % Approved and Countable

app2020 = string(dal2020.APP_STATUS) == "Approved";
bal2020 = string(dal2020.BALLOT_STATUS);
aiv2020 = app2020 & (bal2020 == "Issued");        % issued only; not counted below
amv2020 = app2020 & (bal2020 == "Marked");
aomv2020 = app2020 & (bal2020 == "On Machine");
appv2020 = app2020 & (bal2020 == "Pre-Processed");
afwv2020 = app2020 & (bal2020 == "FWAB");
counted2020 = amv2020 | aomv2020 | appv2020 | afwv2020; % Approved and Countable

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Next we compute the indices in the VHL that represent 
% 2020 and 2021 General Election entries for voter credit
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
valid_2020_entries = strcmpi(strtrim(string(VAVoteHistory.ELECTION_NAME)), '2020 November General');
valid_2021_entries = strcmpi(strtrim(string(VAVoteHistory.ELECTION_NAME)), '2021 November General');


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% We use the MATLAB intersect function to make sure that 
% we are only using DAL entries that are still in the RVL 
% and therefore are possible to be present in the VHL and 
% compute the percentages.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 2020: did = counted DAL voters still in the RVL, vid = VHL credit
% entries still in the RVL, matched = DAL voters that do have credit.
did = intersect(dal2020.identification_number(counted2020), ...
    RegisteredVoterList.IDENTIFICATION_NUMBER);
vid = intersect(VAVoteHistory.IDENTIFICATION_NUMBER(valid_2020_entries), ...
    RegisteredVoterList.IDENTIFICATION_NUMBER);
matched = intersect(did, vid);
pct2020 = (1 - numel(matched) / numel(did)) * 100

% 2021: same computation on the 2021 DAL and 2021 VHL entries.
did = intersect(dal2021.identification_number(counted2021), ...
    RegisteredVoterList.IDENTIFICATION_NUMBER);
vid = intersect(VAVoteHistory.IDENTIFICATION_NUMBER(valid_2021_entries), ...
    RegisteredVoterList.IDENTIFICATION_NUMBER);
matched = intersect(did, vid);
pct2021 = (1 - numel(matched) / numel(did)) * 100
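
Since the real data files can’t be shared, here is a toy run of the same logic on made-up ID numbers, so the steps can be checked by hand. Everything in it is hypothetical: the IDs, the variable names, and the resulting percentage. It uses setdiff rather than counting the intersection as above; the percentage is the same either way, but setdiff also lists which IDs are missing voter credit.

```matlab
% Toy data only: these ID numbers are made up for illustration.
dalCountedIds = [101; 102; 103; 104; 105; 106];  % counted absentee ballots (DAL)
rvlIds        = [101; 102; 103; 104; 105; 201];  % still registered (RVL); 106 is gone
vhlCreditIds  = [101; 102; 104; 106; 201];       % voter credit for the election (VHL)

% Keep only counted DAL voters who are still on the RVL.
did = intersect(dalCountedIds, rvlIds);

% Keep only VHL credit entries for voters who are still on the RVL.
vid = intersect(vhlCreditIds, rvlIds);

% Counted DAL voters with no matching voter credit.
missing = setdiff(did, vid);

% Same quantity as (1 - numel(intersect(did, vid)) / numel(did)) * 100.
pctNoCredit = 100 * numel(missing) / numel(did);
fprintf('%d of %d counted ballots lack voter credit (%.1f%%)\n', ...
    numel(missing), numel(did), pctNoCredit);
```

In this toy case ID 106 is dropped because it is no longer on the RVL, and IDs 103 and 105 are the two counted ballots without voter credit, giving 2 of 5, or 40%.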