
Another Interesting VA Election Data Discrepancy

Out of curiosity, I went back to some of the data provided by the VA Department of Elections (“ELECT”) for the 2020 and 2021 elections and ran a new data consistency test …

I have copies of the final Daily Absentee List (DAL) for both 2020 and 2021. I also have a paired copy of the Registered Voter List (RVL) and Voter History List (VHL); both were generated shortly after the close of the 2021 General Election, within a few moments of each other.

I wanted to know what percentage of the approved and counted absentee ballots in the DAL do NOT have an associated “voter credit” in the VHL, for both 2020 and 2021. If ELECT’s data is accurate, this number should ideally be 0; most official acceptability thresholds for accuracy in election data systems that I’ve seen hover around 0.1%. (0.1% is a fairly consistent standard in the documentation I’ve seen for various localities’ Risk-Limiting Audits, Election Scanner Certification procedures, etc.)

The VHL should cover all voting activity for the last four years. However, to account for people who may have been officially removed from the RVL and VHL since the 2020 election (due to death, moving out of state, etc.), I only run this test on the subset of DAL entries that still have a valid listing in the RVL.
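A minimal sketch of this test on made-up toy IDs (the variable names and numbers below are hypothetical, not ELECT data): keep the counted DAL voters who are still on the RVL, then measure what fraction of them have no VHL credit. Because intersect and setdiff return unique values, a voter with two DAL rows is counted only once.

```matlab
% Toy illustration of the consistency test described above.
% All IDs are made-up numbers, NOT real voter data.
dalCounted = [101 102 103 104 105 105];  % counted DAL ballots (105 appears twice)
rvl        = [101 102 103 104 200];      % voters still on the RVL (105 was removed)
vhlCredit  = [101 102 104 300];          % voters with General Election credit in the VHL

% Keep only DAL voters still listed in the RVL (unique, sorted IDs)
eligible = intersect(dalCounted, rvl);

% Eligible DAL voters with no matching voter credit in the VHL
noCredit = setdiff(eligible, vhlCredit);

% Percentage of eligible counted ballots lacking voter credit
pctNoCredit = numel(noCredit) / numel(eligible) * 100;
fprintf('%.3f%% of counted ballots lack voter credit\n', pctNoCredit);
```

Here voter 105 drops out because it is no longer on the RVL, and voter 103 is the single eligible ballot without credit, so the toy rate is 1 of 4.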

The results are below. Both years show a high rate of discrepancies compared to the 0.1% threshold, and 2020’s discrepancy percentage is over 3x the percentage computed for 2021.

Year    Percent of Counted DAL Ballots without Voter Credit
2020    1.352%
2021    0.449%
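The “over 3x” comparison and each year's distance from the 0.1% threshold follow directly from the two reported percentages. A quick MATLAB check (the variable names are illustrative):

```matlab
% Reported results from the table above (percent of counted DAL ballots
% with no voter credit), compared with the ~0.1% acceptability threshold
pct2020 = 1.352;
pct2021 = 0.449;
threshold = 0.1;

% How many times larger the 2020 rate is than the 2021 rate
ratio = pct2020 / pct2021;

% How many times each year's rate exceeds the threshold
overThreshold2020 = pct2020 / threshold;
overThreshold2021 = pct2021 / threshold;

fprintf('2020 rate is %.2fx the 2021 rate\n', ratio);
fprintf('2020: %.1fx threshold, 2021: %.1fx threshold\n', ...
    overThreshold2020, overThreshold2021);
```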

For those interested in the computation, the MATLAB pseudo-code is given below. I can’t actually link to the source data files because of VA’s draconian restrictions on redistributing the contents of the DAL, RVL and VHL data files.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% We first compute logical masks selecting the DAL entries 
% that represent approved and countable ballots ...
%
% 'dal2020' and 'dal2021' variables are the imported DAL tables 
% 'VAVoteHistory' is the imported Voter History List
% 'RegisteredVoterList' is the Registered Voter List
% 
% All four of the above are imported directly from the CSV 
% files provided by the VA Department of Elections with 
% very little error checking save for obvious whitespace or 
% line ending checks, etc.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% BALLOT_STATUS values treated as "approved and countable";
% 'Issued' is not part of this set
countable = ["Marked", "On Machine", "Pre-Processed", "FWAB"];

% string() lets '==' and ismember work whether readtable imported the
% status columns as cellstr, char, categorical or string arrays
counted2021 = strtrim(string(dal2021.APP_STATUS)) == "Approved" & ...
    ismember(strtrim(string(dal2021.BALLOT_STATUS)), countable); % Approved and Countable

counted2020 = strtrim(string(dal2020.APP_STATUS)) == "Approved" & ...
    ismember(strtrim(string(dal2020.BALLOT_STATUS)), countable); % Approved and Countable

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Next we compute logical masks selecting the VHL entries 
% that record 2020 and 2021 General Election voter credit
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
valid_2020_entries = strcmpi(strtrim(string(VAVoteHistory.ELECTION_NAME)), '2020 November General');
valid_2021_entries = strcmpi(strtrim(string(VAVoteHistory.ELECTION_NAME)), '2021 November General');


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% We use the MATLAB intersect function to keep only the DAL 
% entries that are still in the RVL (and therefore can still 
% appear in the VHL), then compute the percentage of those 
% with no matching voter credit. intersect and setdiff both 
% return unique IDs, so a voter with several DAL rows is 
% counted once.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% 2020: counted DAL voters still on the RVL
did2020 = intersect(dal2020.identification_number(counted2020), ...
    RegisteredVoterList.IDENTIFICATION_NUMBER);
% ... and of those, the ones with no 2020 General Election credit
noCredit2020 = setdiff(did2020, ...
    VAVoteHistory.IDENTIFICATION_NUMBER(valid_2020_entries));
pct2020 = numel(noCredit2020) / numel(did2020) * 100

% 2021: same computation
did2021 = intersect(dal2021.identification_number(counted2021), ...
    RegisteredVoterList.IDENTIFICATION_NUMBER);
noCredit2021 = setdiff(did2021, ...
    VAVoteHistory.IDENTIFICATION_NUMBER(valid_2021_entries));
pct2021 = numel(noCredit2021) / numel(did2021) * 100
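As a sanity check, the percentage can be computed two ways: as 1 minus the matched fraction (intersect) or as the unmatched fraction (setdiff). Because every DAL ID kept has already been intersected with the RVL, also restricting the VHL to the RVL changes nothing, so the two should agree. The sketch below runs both on a tiny made-up DAL/RVL/VHL (all IDs and statuses, including the 'Denied' status, are hypothetical) and also exercises the countable-status filter.

```matlab
% Self-check on made-up data (hypothetical IDs and statuses, not ELECT data)
appStatus    = {'Approved'; 'Approved'; 'Approved'; 'Denied'; 'Approved'; 'Approved'};
ballotStatus = {'Marked'; 'Issued'; 'FWAB'; 'Marked'; 'On Machine'; 'Pre-Processed'};
dalIds       = [11; 12; 13; 14; 15; 15];
rvlIds       = [11; 12; 13; 14; 15; 16];
vhlIds       = [11; 15; 16];

% Approved ballots whose status is one of the four countable states
countable = {'Marked', 'On Machine', 'Pre-Processed', 'FWAB'};
counted = strcmp(appStatus, 'Approved') & ismember(ballotStatus, countable);

% Formulation 1: intersect the VHL with the RVL too, then 1 - matched fraction
did = intersect(dalIds(counted), rvlIds);
vid = intersect(vhlIds, rvlIds);
common = intersect(did, vid);
pctIntersect = (1 - numel(common) / numel(did)) * 100;

% Formulation 2: directly count counted RVL voters missing from the VHL
pctSetdiff = numel(setdiff(did, vhlIds)) / numel(did) * 100;

fprintf('intersect: %.3f%%, setdiff: %.3f%%\n', pctIntersect, pctSetdiff);
```

On the real tables, the same comparison can be run by substituting the DAL, RVL and VHL ID columns for the toy vectors; any mismatch beyond floating-point rounding would point to an import or type problem rather than a data discrepancy.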