Updates on the In-Person Early Vote records added after the close of Early Voting in VA 2021

Previously I wrote about finding In-Person Early Vote records inserted into the Daily Absentee List (DAL) records after the close of early voting in VA in 2021. Well, there's been quite a bit of activity since then and I have some updates to share.

I originally discovered this issue and began digging into it around Nov 8th 2021, and finally published it to my blog on Dec 10th 2021. At the same time, queries were sent through the lawyers for the RPV to the ELECT Chairman (Chris Piper) and to a number of registrars to attempt to determine the cause of this issue, but no response was supplied. I also raised this issue with my local Board of Elections chair, and requested that ELECT comment on the matter through their official Twitter account.

Since that time I have continued to publish my findings, have continued to request responses from ELECT, and have offered to work with them to address and resolve these discrepancies. I know ELECT pays attention to this site and my Twitter account, as they have quietly corrected both their website and data files after I have pointed out other errors and discrepancies. Additionally, Chris Piper has continued to publicly insist that there were no major issues in either the 2020 or 2021 election (including under questioning by the VA Senate P&E Committee), and neither he nor any member of ELECT has publicly acknowledged any of the issues I have raised … besides the aforementioned changing of their site contents, of course. I have thankfully had a few local board of elections members work with me, as well as a few local registrars … but I did not see any meaningful response or engagement from anyone at ELECT until Feb 23rd 2022, as discussed below.

On Feb 22nd 2022, I was invited to participate in a meeting arranged by VA State Senator Amanda Chase with the VA Attorney General's office to discuss election integrity issues. I specifically cited a number of the issues that I’ve documented here on my blog, including the added DAL entries, as justification for my belief that there is an arguable case to be made for criminal gross negligence and maladministration at ELECT with respect to the administration and handling of VA election data.

That meeting apparently shook some things loose. Good.

The day after that meeting Chris Piper finally sent a response at 10:45am to our inquiry on the subject of the added DAL entries. It is quoted below:

While I am glad that he finally responded, his technical reasoning does not address all of the symptoms that are observed:

  • He states the cause was due to the EPBs from one vendor, but the data shows DAL additions being attributed to multiple localities that use different EPB vendors.
  • His explanation does not address the distribution of the numbers of DAL additions across all of the precincts observed. A misconfigured or malfunctioning poll book would affect all of the check-ins entered on it, not just a sporadic few.
  • This also does not seem like a minor issue, as it's affecting thousands of voters' ballots and voter history. So I’m rather concerned with Mr. Piper’s attitude toward this issue, as well as others. It needs to be addressed as a logic and accuracy testing issue, and as a matter of procedures and training, in addition to simply asking vendors to add checks to their software. Also, will this be addressed in VERIS at all, or if/when a replacement system is put in place?
  • In response to his last paragraph, I will simply note that I have been actively and consistently working to raise all of these issues through official channels … through official requests via the RPV, working with local election board members and registrars, and asking for input from ELECT through social media. I have not made accusations of malicious or nefarious intent, but I do think there is plenty of evidence to make the case of incompetence and/or gross negligence with our elections data … which is actually a crime in VA … and Chris Piper has been the head of ELECT and the man responsible for ensuring our election data during that time (he is stepping down effective March 11th).

Since his response was sent, a few additional things have occurred:

(A) The AG’s office informed us that they are actually required by law to treat ELECT as their client and defend them from any accusations of wrongdoing. This is frustrating, as there does not seem to be any responsive cognizant authority that is able to act on this matter in the interest of the public. This is not a local jurisdictional issue, as it affects voters statewide, and it is therefore in the purview of the AG, since the Department of Elections has heretofore been dismissive of, and non-responsive on, these matters. I am not a lawyer, however.

(B) I was able to connect with the Loudoun County Deputy Director of Elections (Richard Keech) as well as the Prince William County registrar (Eric Olsen) and have been working through the finer details of Chris’s explanation to verify and validate at the local level. [Note: I previously had Richard erroneously listed here as the Registrar instead of the Deputy Director]. Both Richard and Eric are continuing to look into the matter, and I continue to work with them to get to the bottom of this issue.

  • Richard confirmed his belief that the bad OS system date on Election Day EPBs was responsible for the errors, however with some slight differences in the details from Piper’s description. There were multiple vendors affected, not just one. Per Richard, the problem appeared to be that a number of Loudoun poll-books (regardless of vendor) that were used for Election Day had been in storage so long that their batteries had completely depleted. When they were finally powered up, their OS system clocks had a wildly incorrect date. The hardware used was a mixture of Samsung SM-T720 and iPad tablets, depending on poll-book application vendor. The hardware was purchased separately through local contracts with CDW, and the software was uploaded and configured by the vendors.
  • In Loudoun, all of the EPBs went through logic and accuracy testing before the election per Richard, but it does not appear that the procedures for the Logic and Accuracy testing had any specific checks for OS date settings.
  • In Prince William County the registrar (Eric) was not aware of any issues with the system clocks on the poll-books, and he was skeptical of the distribution of the small numbers of added DAL entries. He noted, as I did above, that if a poll book was misconfigured it would affect all of the records that passed through it, not just a small handful. He also noticed that there was a discrepancy with the attribution of polling place names that I had extracted from the DAL files, where some of the names did not correspond with actual polling places in PWC. He has stated he will look into the matter and get back to me. I will update my blog when he does so.
  • From my communications with Richard, the VERIS system imports a text-based file for processing voter credit and does not have any special checks against the dates for Election Day in-person vs early voting records. This is why the issue can affect multiple vendors, if their applications use the system clock to date-stamp their exported txt files for upload into VERIS.
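
To make the missing check concrete, here is a minimal MATLAB sketch of a date sanity test on exported check-in records. The timestamps and variable names are hypothetical and this is not the real VERIS upload format or logic, which is not described in detail here; the sketch only assumes that each record carries a date stamp taken from the tablet's system clock.

```matlab
% Hypothetical check-in timestamps exported from an Election Day poll-book
% (illustrative values only, not the real VERIS upload format).
% The third record was stamped by a tablet whose clock had reset.
checkin = datetime({'2021-11-02 07:15:00'; ...
                    '2021-11-02 18:42:10'; ...
                    '2019-01-01 00:05:12'}, ...
                   'InputFormat','yyyy-MM-dd HH:mm:ss');

% Election Day for the 2021 VA General Election.
electionDay = datetime(2021,11,2);

% Flag any record whose calendar date is not Election Day.
suspect = dateshift(checkin,'start','day') ~= electionDay;

fprintf('%d of %d records are not dated on Election Day\n', ...
    sum(suspect), numel(checkin));
```

A check of this kind could run either in the poll-book application before export or on import, and it would flag every record from a mis-dated device regardless of vendor.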

(C) I have reworked my code per my conversations with Ricky and Eric, and fixed a few bugs and parsing errors along the way. Most notably, there are a number of missing or malformed field values in the raw DAL files that were being parsed into ‘undefined’ categorical values by the default MATLAB csv parser. These ‘undefined’ values, even when located in unimportant fields in the row of a MATLAB table, can cause the entire row to be incomparable to other entries when performing logical operations. I have adjusted my parser and logic to account for and/or ignore these entries as necessary. Additionally, I had previously looked for new entries by comparing values across entire rows, but I now only look at voter ID numbers that have not been seen previously, in order to omit those entries that had simply been adjusted (address changes, etc) after the fact, or that contained ‘undefined’ field elements as mentioned previously. I also noticed that some of the DAL files had duplicate Approved and On-Machine records for the same voter ID. While that is an issue in itself, I de-duplicated those entries for this analysis. This new logic gives the updated results presented below, with the total number of discrepancies now at 2820.
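
The ‘undefined’ behaviour is easy to reproduce on a toy column. The snippet below is a small illustration with made-up values (the city name is arbitrary); it shows why a single undefined field breaks equality tests and how substituting an explicit placeholder category restores them.

```matlab
% Toy column with one empty field, which categorical() stores as <undefined>.
city = categorical({'LEESBURG'; ''; 'LEESBURG'});

% An <undefined> element never compares equal, not even to itself
% (much like NaN for doubles).
sameBefore = (city(2) == city(2));

% Replace undefined entries with an explicit placeholder category.
city(isundefined(city)) = 'UNDEFINED';

% The same comparison now succeeds.
sameAfter = (city(2) == city(2));

fprintf('equal before cleanup: %d, after cleanup: %d\n', sameBefore, sameAfter);
```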

I will note that I am still a little skeptical of the “bad date” explanation as being a complete answer to this issue, as it does not adequately explain the distribution of small numbers of discrepancies attributed to multiple precincts, for one thing. While the bad date may explain part of the issue, it does not adequately account for all of the observed effects, IMO. For example, in Loudoun there are 26 precincts listed below that have inserted DAL records attributed to them. Many of these precincts have only 1 or 2 associated records. If the bad OS date explanation is to blame, then (a) there must have been at least 26 poll-books, one at each precinct, in Loudoun with misconfigured and incorrect dates AND (b) many of these poll-books were used to check in only 1 or 2 people total, as anyone checked in on a misconfigured poll-book would have their voter credit/DAL file entries affected. This would have to have been replicated at ALL of the precincts in ALL of the localities listed below. While the above scenario is admittedly possible, I find it rather implausible.

Update 2022-03-20

I’ve heard back from both Ricky Keech (Loudoun) and Eric Olsen (PWC).

Eric looked into the 10 entries for PWC and all of them were military voters who did actually walk in, in person, to the registrar's office and vote on-machine absentee after the close of early voting, as allowed by state law. So everything checks out for PWC.

Per discussion with Ricky, there were two issues. The first was that a number of pollbooks used on Election Day in four specific precincts had the wrong date setting, as discussed above. The other issue was possibly a connectivity problem between the pollbooks and the servers in South Riding during the early voting period, which had to be corrected by hand. Ricky’s explanation e-mail to me is copied below.

Hi Jon,

Following up on our conversation the other day.  So, I did some more digging and was able to figure out what happened.  The bulk of the voters (1213) were from the four precincts that had a tablet with the wrong date on it.  That accounts for all voters in precincts 214, 416, 628, and 708.  I had a staff person go back and pull out the tablets used in those precincts, and we confirmed that each of those four had one table with the wrong date.

That leaves 141 voters that received ‘credit’ after election day.  Once we had narrowed it down, I looked for patterns and noticed all the remaining precincts were in South Riding.  That jogged my memory and led me to the solution.  When I ran the final reports on the Sunday before the election, I noticed that the number showing as voting early seemed to be off by 100 or so.  This was odd because our daily checked in count and voted count reconciled at every site every day.  So, I went back and compared the voters checked into our pollbooks at early voting to the voters with early voting (On Machine) credit in VERIS and found that there were 137 voters who voted on October 19 at the Dulles South EV site and for some reason did not have credit.  I worked on this Monday to make sure it was right, and it was, none of those voters had credit.  This could either have been a connectivity issue at Dulles South EV site OR an issue with VERIS when the data was uploaded to mark the voters.  I can say definitively that the number checked in on the pollbooks at that site on that day and the number of people who put ballots into the machine was correct, we check that constantly and the observers on site checked as well.  I can guarantee that if there had been a discrepancy, we’d have heard about it right away.

So, after determining that was exactly what happened I uploaded credit for those voters at 2:06:51pm on Wednesday, November 3 and the upload completed processing at 2:07:07pm.

When we spoke the other day, I thought it was likely a connectivity issue, but now I’m not entirely sure that’s the case, as if the connection wasn’t working the numbers should have been off.  And they were correct on the devices at the EV site and my laptop here at the office.  Everything matched.

So long story short, we did an audit, discovered missing credit from one early voting site on one day, and corrected it.

The other four voters were people who voted an emergency early voting ballot on Monday, November 1.

Richard Keech, Deputy Director of Elections for Loudoun County, Mar 11 2022 email to Jon Lareau

Locality | COUNT
LOUDOUN COUNTY | 1344
HANOVER COUNTY | 1302
CHESAPEAKE CITY | 92
PRINCE WILLIAM COUNTY | 10
HENRICO COUNTY | 7
WINCHESTER CITY | 7
CHARLES CITY COUNTY | 5
CAMPBELL COUNTY | 3
CHARLOTTE COUNTY | 3
FAUQUIER COUNTY | 3
LUNENBURG COUNTY | 3
WASHINGTON COUNTY | 3
ALEXANDRIA CITY | 2
AMELIA COUNTY | 2
AMHERST COUNTY | 2
BATH COUNTY | 2
CAROLINE COUNTY | 2
FALLS CHURCH CITY | 2
HENRY COUNTY | 2
NORTHAMPTON COUNTY | 2
ORANGE COUNTY | 2
ROANOKE CITY | 2
VIRGINIA BEACH CITY | 2
ALLEGHANY COUNTY | 1
APPOMATTOX COUNTY | 1
BLAND COUNTY | 1
CHARLOTTESVILLE CITY | 1
CLARKE COUNTY | 1
CULPEPER COUNTY | 1
ESSEX COUNTY | 1
FAIRFAX CITY | 1
FLOYD COUNTY | 1
KING GEORGE COUNTY | 1
MIDDLESEX COUNTY | 1
NELSON COUNTY | 1
NOTTOWAY COUNTY | 1
POQUOSON CITY | 1
STAFFORD COUNTY | 1
STAUNTON CITY | 1

LOCALITY | PRECINCT | COUNT
HANOVER COUNTY | 704 – ELMONT | 667
HANOVER COUNTY | 602 – LEE DAVIS | 635
LOUDOUN COUNTY | 416 – HAMILTON | 443
LOUDOUN COUNTY | 214 – SUGARLAND NORTH | 344
LOUDOUN COUNTY | 708 – SENECA | 319
LOUDOUN COUNTY | 628 – MOOREFIELD STATION | 97
LOUDOUN COUNTY | 319 – JOHN CHAMPE | 21
LOUDOUN COUNTY | 313 – PINEBROOK | 16
LOUDOUN COUNTY | 112 – FREEDOM | 13
LOUDOUN COUNTY | 122 – HUTCHISON FARM | 11
LOUDOUN COUNTY | 126-GOSHEN POST | 10
CHESAPEAKE CITY | 055 – GEORGETOWN EAST | 8
LOUDOUN COUNTY | 107 – LITTLE RIVER | 8
CHESAPEAKE CITY | 053 – FAIRWAYS | 7
LOUDOUN COUNTY | 121 – TOWN HALL | 7
LOUDOUN COUNTY | 316 – CREIGHTON’S CORNER | 7
LOUDOUN COUNTY | 318 – MADISON’S TRUST | 7
CHESAPEAKE CITY | 008 – SOUTH NORFOLK RECREATION | 6
CHESAPEAKE CITY | 012 – GEORGETOWN | 6
LOUDOUN COUNTY | 114 – DULLES SOUTH | 6
CHESAPEAKE CITY | 018 – INDIAN RIVER | 5
LOUDOUN COUNTY | 124 – LIBERTY | 5
LOUDOUN COUNTY | 320 – STONE HILL | 5
CHESAPEAKE CITY | 029 – TANGLEWOOD | 4
CHESAPEAKE CITY | 042 – PARKWAYS | 4
CHESAPEAKE CITY | 059 – CLEARFIELD | 4
CHESAPEAKE CITY | 065 – WATERWAY II | 4
LOUDOUN COUNTY | 119 – ARCOLA | 4
LOUDOUN COUNTY | 322-BUFFALO TRAIL | 4
CHESAPEAKE CITY | 005 – CRESTWOOD | 3
CHESAPEAKE CITY | 022 – NORFOLK HIGHLANDS | 3
CHESAPEAKE CITY | 057 – CYPRESS | 3
LOUDOUN COUNTY | 120 – LUNSFORD | 3
LOUDOUN COUNTY | 123 – CARDINAL RIDGE | 3
WINCHESTER CITY | 101 – MERRIMANS | 3
AMHERST COUNTY | 501 – MADISON | 2
CHARLES CITY COUNTY | 101 – PRECINCT 1-1 | 2
CHARLES CITY COUNTY | 301 – PRECINCT 3-1 | 2
CHARLOTTE COUNTY | 702 – BACON/SAXE | 2
CHESAPEAKE CITY | 010 – OSCAR SMITH | 2
CHESAPEAKE CITY | 014 – GRASSFIELD | 2
CHESAPEAKE CITY | 015 – GREENBRIER MIDDLE SCHOOL | 2
CHESAPEAKE CITY | 016 – HICKORY GROVE | 2
CHESAPEAKE CITY | 023 – OAK GROVE | 2
CHESAPEAKE CITY | 024 – OAKLETTE | 2
CHESAPEAKE CITY | 031 – CARVER SCHOOL | 2
CHESAPEAKE CITY | 032 – PROVIDENCE | 2
CHESAPEAKE CITY | 034 – HICKORY MIDDLE SCHOOL | 2
CHESAPEAKE CITY | 043 – PLEASANT CROSSING | 2
CHESAPEAKE CITY | 056 – GREEN TREE | 2
FALLS CHURCH CITY | 003 – THIRD WARD | 2
LOUDOUN COUNTY | 302 – ROUND HILL | 2
LOUDOUN COUNTY | 308 – ST LOUIS | 2
LOUDOUN COUNTY | 309 – ALDIE | 2
LOUDOUN COUNTY | 617 – OAK GROVE | 2
ORANGE COUNTY | 101 – ONE WEST | 2
PRINCE WILLIAM COUNTY | 409 – TYLER | 2
PRINCE WILLIAM COUNTY | 513 – LYNNWOOD | 2
PRINCE WILLIAM COUNTY | 712 – LEESYLVANIA | 2
WASHINGTON COUNTY | 701 – HIGH POINT | 2
WINCHESTER CITY | 201 – VIRGINIA AVENUE | 2
ALEXANDRIA CITY | 110 – CHARLES HOUSTON CENTER | 1
ALEXANDRIA CITY | 201 – NAOMI L. BROOKS SCHOOL | 1
ALLEGHANY COUNTY | 101 – ARRITT | 1
AMELIA COUNTY | 301 – NUMBER THREE | 1
AMELIA COUNTY | 501 – NUMBER FIVE | 1
APPOMATTOX COUNTY | 401 – COURTHOUSE | 1
BATH COUNTY | 101 – WARM SPRINGS | 1
BATH COUNTY | 201 – HOT SPRINGS | 1
BLAND COUNTY | 301 – HOLLYBROOK | 1
CAMPBELL COUNTY | 102 – NEW LONDON | 1
CAMPBELL COUNTY | 402 – COURT HOUSE | 1
CAMPBELL COUNTY | 602 – CONCORD | 1
CAROLINE COUNTY | 202 – SOUTH MADISON | 1
CAROLINE COUNTY | 401 – DAWN | 1
CHARLES CITY COUNTY | 201 – PRECINCT 2-1 | 1
CHARLOTTE COUNTY | 201 – RED OAK WYLLIESBURG | 1
CHARLOTTESVILLE CITY | 102 – CLARK | 1
CHESAPEAKE CITY | 006 – DEEP CREEK | 1
CHESAPEAKE CITY | 009 – BELLS MILL | 1
CHESAPEAKE CITY | 011 – GENEVA PARK | 1
CHESAPEAKE CITY | 020 – E W CHITTUM | 1
CHESAPEAKE CITY | 033 – WESTOVER | 1
CHESAPEAKE CITY | 046 – BELLS MILL II | 1
CHESAPEAKE CITY | 048 – JOLLIFF MIDDLE SCHOOL | 1
CHESAPEAKE CITY | 049 – WATERWAY | 1
CHESAPEAKE CITY | 050 – RIVER WALK | 1
CHESAPEAKE CITY | 051 – COOPERS WAY | 1
CHESAPEAKE CITY | 062 – FENTRESS | 1
CHESAPEAKE CITY | 063 – POPLAR BRANCH | 1
CHESAPEAKE CITY | 064 – DEEP CREEK II | 1
CLARKE COUNTY | 301 – MILLWOOD | 1
CULPEPER COUNTY | 303 – CARDOVA | 1
ESSEX COUNTY | 201 – NORTH | 1
FAIRFAX CITY | 001 – ONE | 1
FAUQUIER COUNTY | 202 – AIRLIE | 1
FAUQUIER COUNTY | 404 – SPRINGS VALLEY | 1
FAUQUIER COUNTY | 501 – THE PLAINS | 1
FLOYD COUNTY | 301 – COURTHOUSE | 1
HENRICO COUNTY | 105 – GREENDALE | 1
HENRICO COUNTY | 209 – GLEN LEA | 1
HENRICO COUNTY | 304 – JACKSON DAVIS | 1
HENRICO COUNTY | 316 – COLONIAL TRAIL | 1
HENRICO COUNTY | 416 – SPOTTSWOOD | 1
HENRICO COUNTY | 506 – EANES | 1
HENRICO COUNTY | 513 – PLEASANTS | 1
HENRY COUNTY | 203 – HORSEPASTURE #2 | 1
HENRY COUNTY | 501 – BASSETT NUMBER ONE | 1
KING GEORGE COUNTY | 101 – COURTHOUSE | 1
LOUDOUN COUNTY | 108 – MERCER | 1
LOUDOUN COUNTY | 118 – MOOREFIELD | 1
LOUDOUN COUNTY | 401 – WEST LOVETTSVILLE | 1
LUNENBURG COUNTY | 301 – ROSEBUD | 1
LUNENBURG COUNTY | 501 – REEDY CREEK | 1
LUNENBURG COUNTY | 502 – PEOPLES COMMUNITY CENTER | 1
MIDDLESEX COUNTY | 501 – WILTON | 1
NELSON COUNTY | 401 – ROSELAND | 1
NORTHAMPTON COUNTY | 201 – PRECINCT 2-1 | 1
NORTHAMPTON COUNTY | 401 – PRECINCT 4-1 | 1
NOTTOWAY COUNTY | 201 – PRECINCT 2-1 | 1
POQUOSON CITY | 001 – CENTRAL | 1
PRINCE WILLIAM COUNTY | 103 – GLENKIRK | 1
PRINCE WILLIAM COUNTY | 210 – PENN | 1
PRINCE WILLIAM COUNTY | 305 – PATTIE | 1
PRINCE WILLIAM COUNTY | 311 – SWANS CREEK | 1
ROANOKE CITY | 014 – Crystal Spring | 1
ROANOKE CITY | 019 – Forest Park | 1
STAFFORD COUNTY | 702 – WHITSON | 1
STAUNTON CITY | 301 – WARD NO 3 | 1
VIRGINIA BEACH CITY | 030 – RED WING | 1
VIRGINIA BEACH CITY | 063 – CULVER | 1
WASHINGTON COUNTY | 302 – SOUTH ABINGDON | 1
WINCHESTER CITY | 301 – WAR MEMORIAL | 1
WINCHESTER CITY | 402 – ROLLING HILLS | 1

The MATLAB code for generating the above is given below. The raw time-stamped DAL data files, as downloaded from the ELECT website, are loaded from the ‘droot’ directory tree as shown below, and I only utilize the latest daily download of DAL file data for simplicity.


warning off all
% Data directory root
droot = 'SourceData/DAL/2021/';

% Gets the list of DAL files that were downloaded from the ELECT provided
% URL over the course of the 2021 Election.
files = dir([droot,'raw/**/Daily_Absentee_List_*.csv']);
matc = regexp({files.name}, 'Daily_Absentee_List_\d+(T\d+)?.csv','match');
matc = find(~cellfun(@isempty,matc));
files = files(matc);

% Only process the last updated DAL for each day.  I downloaded multiple
% times per day, but we will just take the last file downloaded each day
% for simplicity here.
matc = regexp({files.name}, '(\d+)(T\d+)?','tokens');
fd = []; pc = 0; ic = 0; idx = [];
for i=1:numel(matc)
    if isempty(regexp(matc{i}{1}{1},'^2021.*'))
        fd(i) = datenum(matc{i}{1}{1},'mmddyyyy');
    else
        fd(i) = datenum(matc{i}{1}{1},'yyyymmdd');
    end
    datestr(fd(i))
    if pc ~= fd(i)
        ic = ic+1;
    end
    idx(ic) = i;
    pc = fd(i);
end
files = files(idx);

% Now that we have our list of files, lets process.
seen = [];
firstseen = [];
astats = [];
astatsbc = {};
cumOnMachine = [];
T = [];
for i = 1:numel(files)
    % Extract the date of the ELECT data pull.  Note that the first few
    % days I was running the script I was not including the time of the
    % pull when I was pulling the data from the ELECT url and writing to
    % disk, so there's some special logic here to handle that issue.
    matc = regexp(files(i).name, '(\d+)(T\d+)?','tokens');
    matc = [matc{1}{:}];
    if isempty(regexp(matc,'^2021.*'))
        fdn = datenum(matc,'mmddyyyy')+.5;
    else
        fdn= datenum(matc,'yyyymmddTHHMMSS');
    end
    fds = datestr(fdn,30)
    fdt = datetime(fdn,'ConvertFrom','datenum');

    % Move a copy to the 'byDay' folder so we can keep a reference to the
    % data that went into this analysis.
    dal2021filename = [files(i).folder,filesep,files(i).name];
    ofn = [droot,'byDay/Daily_Absentee_List_',fds,'.csv'];
    if ~exist(ofn)
        copyfile(dal2021filename,ofn);
    end

    % Import the DAL file
    dal2021 = import2021DALfile(dal2021filename, [2, Inf]);

    % Cleanup and handle undefined or missing values.
    dal2021.CITY(isundefined(dal2021.CITY)) = 'UNDEFINED';
    dal2021.STATE(isundefined(dal2021.STATE)) = 'UNDEFINED';
    dal2021.ZIP(isundefined(dal2021.ZIP)) = 'UNDEFINED';
    dal2021.COUNTRY(isundefined(dal2021.COUNTRY)) = 'UNDEFINED';

    % Do some basic indexing of different DAL status categories and combinations
    appvd = dal2021.APP_STATUS == 'Approved' ;
    aiv2021 = appvd & dal2021.BALLOT_STATUS == 'Issued';
    amv2021 = appvd & dal2021.BALLOT_STATUS == 'Marked';
    aomv2021 = appvd & dal2021.BALLOT_STATUS == 'On Machine';
    appv2021 = appvd & dal2021.BALLOT_STATUS == 'Pre-Processed';
    afwv2021 = appvd & dal2021.BALLOT_STATUS == 'FWAB';
    appmv2021 = amv2021 | aomv2021 | appv2021 | afwv2021; % Approved and Countable

    % Accumulate the stats for each DAL file
    rstats = table(fdt,sum(aiv2021),sum(amv2021),sum(aomv2021),...
        sum(appv2021),sum(afwv2021),sum(appmv2021),'VariableNames',{'DALFileDate',...
        'NumIssued','NumMarked','NumOnMachine','NumPreProcessed','NumFWAB','NumCountable'});
    astats = [astats; rstats];

    % Write out the entries that were approved and countable in this DAL
    % file
    ofn = [droot,'byDayCountable/Daily_Absentee_List_Countable_',fds,'.csv'];
    if ~exist(ofn)
        writetable(dal2021(appmv2021,:),ofn);
    end

    % Write out the entries that were approved, countable and marked as 'On
    % Machine' (i.e. an Early In-Person Voter Check-In) in this DAL file
    ofn = [droot,'byDayCountableOnMachine/Daily_Absentee_List_Countable_OnMachine',fds,'.csv'];
    if ~exist(ofn)
        writetable(dal2021(aomv2021,:),ofn);
    end

    % Since the DAL file grows over time, we're going to try and figure out
    % which On-Machine entries in each new file:
    %       (a) We've seen before and are still listed.
    %       (b) We haven't seen before (a NEW entry).
    %       (c) We've seen before but the listing is missing (a DELETED
    %           entry).
    if isempty(T)
        % Only applicable for the first file we process.
        T = dal2021(aomv2021,:);

        % There are sometimes duplicate uid numbers in the approved and
        % on-machine counts!  This is a problem in and of itself, but not
        % the problem I'm trying to focus on at the moment.  So I'm going
        % to remove any duplicated rows based on UID.
        [uid, ia, ib] = unique(T.identification_number);
        T = T(ia,:);
        inew = (1:size(T,1))';
        ideleted = [];
        firstseen = repmat(fdt,size(T,1),1);
    else
        dOM = dal2021(aomv2021,:);

        % There are sometimes duplicate uid numbers in the approved and
        % on-machine counts!  This is a problem in and of itself, but not
        % the problem I'm trying to focus on at the moment.  So I'm going
        % to remove any duplicated rows based on UID.
        [uid, ia, ib] = unique(dOM.identification_number);
        dOM = dOM(ia,:);

        %[~,ileft,iright] = innerjoin(T,dOM);
        [~,ileft,iright] = intersect(T.identification_number,dOM.identification_number);

        % 'inew' will be a boolean vector representing those entries in
        % 'dOM' that are new On-Machine records
        inew = true(size(dOM,1),1);
        inew(iright) = false;

        % 'ideleted' will be a boolean vector representing those entries in
        % 'T' that are missing On-Machine records in 'dOM'
        ideleted = true(size(T,1),1);
        ideleted(ileft) = false;
        T = [T;dOM(inew,:)];
        firstseen = [firstseen; repmat(fdt,sum(inew),1)];
    end
    clf;
    plot(astats.DALFileDate,astats{:,2:end},'LineWidth',2);
    grid on;
    grid minor;
    legend(astats.Properties.VariableNames(2:end),'Location','NorthWest');
    xlabel('Date of DAL file pull from ELECT');
    ylabel('Counts');
    drawnow;
end
% Write out the per-file stats accumulated above.
ofn = [droot,'byDayStats.csv'];
writetable(astats,ofn);

% Write out every On-Machine record seen, tagged with the date of the DAL
% file in which it first appeared.
T.FirstSeen = firstseen;
ofn = [droot,'onMachineRecords.csv'];
writetable(T,ofn);

% Write out the records that were seen previously but are missing from the
% last DAL file processed.
ofn = [droot,'onMachineRecords_missing.csv'];
writetable(T(ideleted,:),ofn);

% Write out the On-Machine records that first appeared on or after the
% cutoff date.
cutoffDate = datetime('2021-11-01');
ofn = [droot,'onMachineRecords_after_20211101.csv'];
writetable(T(T.FirstSeen >= cutoffDate,:),ofn);

% Tally the late-appearing records by locality.
Ta = T(T.FirstSeen >= cutoffDate,:);
[ulocality,ia,ib] = unique(Ta.LOCALITY_NAME);
clocality = accumarray(ib,1,size(ulocality));
Tu = table(ulocality,clocality,'VariableNames',{'Locality','COUNT'});
ofn = [droot,'numOnMachineRecords_after_20211101_byLocality.csv'];
writetable(Tu,ofn);

% Tally the late-appearing records by locality and precinct.
[uprecinct,ia,ib] = unique(join([string(Ta.LOCALITY_NAME),string(Ta.PRECINCT_NAME)]));
cprecinct = accumarray(ib,1,size(uprecinct));
Tu = table(Ta.LOCALITY_NAME(ia),Ta.PRECINCT_NAME(ia),cprecinct,'VariableNames',{'LOCALITY','PRECINCT','COUNT'});
ofn = [droot,'numOnMachineRecords_after_20211101_byLocalityByPrecinct.csv'];
writetable(Tu,ofn);
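
The core of the new-vs-missing bookkeeping in the loop above can be seen in isolation on a toy example. The voter IDs below are made up for illustration; the sketch applies the same `unique` and `intersect` steps to two consecutive "pulls".

```matlab
% Hypothetical voter IDs marked 'On Machine' in two consecutive DAL pulls.
prevIDs = [1001; 1002; 1003; 1004];
currIDs = [1002; 1003; 1003; 1005; 1006];   % 1003 appears twice in the file

% De-duplicate on voter ID, as the script does for each file.
currIDs = unique(currIDs);

% IDs present in both pulls, with their positions in each list.
[~, iPrev, iCurr] = intersect(prevIDs, currIDs);

% Anything in the current pull that was not matched is a NEW record.
isNew = true(numel(currIDs),1);
isNew(iCurr) = false;

% Anything in the previous pull that was not matched has gone MISSING.
isDeleted = true(numel(prevIDs),1);
isDeleted(iPrev) = false;

newIDs     = currIDs(isNew);       % 1005 and 1006
deletedIDs = prevIDs(isDeleted);   % 1001 and 1004
disp(newIDs'); disp(deletedIDs');
```

Because the match is on voter ID alone, a record whose address or other fields were edited between pulls is not counted as new.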

The adjusted MATLAB parser function is listed below:

function dal = import2021DALfile(filename, dataLines)
%IMPORT2021DALFILE Import data from a Daily Absentee List (DAL) text file
%  DAL = IMPORT2021DALFILE(FILENAME) reads data from text
%  file FILENAME for the default selection.  Returns the data as a table.
%
%  DAL = IMPORT2021DALFILE(FILENAME, DATALINES) reads data
%  for the specified row interval(s) of text file FILENAME. Specify
%  DATALINES as a positive scalar integer or a N-by-2 array of positive
%  scalar integers for dis-contiguous row intervals.
%
%  Example:
%  dal = import2021DALfile("SourceData/DAL/Daily_Absentee_List_10162021.csv", [2, Inf]);
%
%  See also READTABLE.
%
% Auto-generated by MATLAB on 16-Oct-2021 14:19:26
%% Input handling
% If dataLines is not specified, define defaults
if nargin < 2
    dataLines = [2, Inf];
end
%% Set up the Import Options and import the data
opts = delimitedTextImportOptions("NumVariables", 38);

% Specify range and delimiter
opts.DataLines = dataLines;
opts.Delimiter = ",";

% Specify column names and types
opts.VariableNames = ["ELECTION_NAME", "LOCALITY_CODE", "LOCALITY_NAME", "PRECINCT_CODE", "PRECINCT_NAME", "LAST_NAME", "FIRST_NAME", "MIDDLE_NAME", "SUFFIX", "ADDRESS_LINE_1", "ADDRESS_LINE_2", "ADDRESS_LINE_3", "CITY", "STATE", "ZIP", "COUNTRY", "INTERNATIONAL", "EMAIL_ADDRESS", "FAX", "VOTER_TYPE", "ONGOING", "APP_RECIEPT_DATE", "APP_STATUS", "BALLOT_RECEIPT_DATE", "BALLOT_STATUS", "identification_number", "PROTECTED", "CONG_CODE_VALUE", "STSENATE_CODE_VALUE", "STHOUSE_CODE_VALUE", "AB_ADDRESS_LINE_1", "AB_ADDRESS_LINE_2", "AB_ADDRESS_LINE_3", "AB_CITY", "AB_STATE", "AB_ZIP", "BALLOTSTATUSREASON", "Ballot_Comment"];
opts.VariableTypes = ["string", "string", "categorical", "string", "string", "string", "string", "string", "string", "string", "string", "string", "categorical", "categorical", "categorical", "categorical", "categorical", "string", "string", "string", "categorical", "string", "categorical", "string", "categorical", "double", "string", "categorical", "categorical", "categorical", "string", "string", "string", "string", "string", "string", "string", "string"];

% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";

% Specify variable properties
opts = setvaropts(opts, ["ELECTION_NAME", "LOCALITY_CODE", "PRECINCT_CODE", "PRECINCT_NAME", "LAST_NAME", "FIRST_NAME", "MIDDLE_NAME", "SUFFIX", "ADDRESS_LINE_1", "ADDRESS_LINE_2", "ADDRESS_LINE_3", "EMAIL_ADDRESS", "FAX", "VOTER_TYPE", "APP_RECIEPT_DATE", "BALLOT_RECEIPT_DATE", "PROTECTED", "AB_ADDRESS_LINE_1", "AB_ADDRESS_LINE_2", "AB_ADDRESS_LINE_3", "AB_CITY", "BALLOTSTATUSREASON", "Ballot_Comment"], "WhitespaceRule", "preserve");
opts = setvaropts(opts, ["ELECTION_NAME", "LOCALITY_CODE", "LOCALITY_NAME", "PRECINCT_CODE", "PRECINCT_NAME", "LAST_NAME", "FIRST_NAME", "MIDDLE_NAME", "SUFFIX", "ADDRESS_LINE_1", "ADDRESS_LINE_2", "ADDRESS_LINE_3", "CITY", "STATE", "COUNTRY", "INTERNATIONAL", "EMAIL_ADDRESS", "FAX", "VOTER_TYPE", "ONGOING", "APP_RECIEPT_DATE", "APP_STATUS", "BALLOT_RECEIPT_DATE", "BALLOT_STATUS", "PROTECTED", "CONG_CODE_VALUE", "STSENATE_CODE_VALUE", "STHOUSE_CODE_VALUE", "AB_ADDRESS_LINE_1", "AB_ADDRESS_LINE_2", "AB_ADDRESS_LINE_3", "AB_CITY", "AB_STATE", "BALLOTSTATUSREASON", "Ballot_Comment"], "EmptyFieldRule", "auto");

% Import the data
dal = readtable(filename, opts);

% Perform some cleanup on commonly found issues...
dal.Ballot_Comment = strrep(dal.Ballot_Comment,char([13,10]),". ");
dal.BALLOTSTATUSREASON = strrep(dal.BALLOTSTATUSREASON,char([13,10]),". ");
dal.LOCALITY_NAME = categorical(strtrim(string(dal.LOCALITY_NAME)));
dal.PRECINCT_NAME = categorical(regexprep(strtrim(string(dal.PRECINCT_NAME)),'^(\d+)( +)(\w+)','$1 - $3'));
end
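
As a quick illustration of the precinct-name cleanup on the last line of the parser, the sketch below runs the same `regexprep` pattern over three sample strings (chosen for illustration). Only names with a code followed directly by spaces and a word are rewritten; names that already contain a separator, or that have a hyphen with no spaces, pass through unchanged, which may be why entries such as "126-GOSHEN POST" keep their original form in the results.

```matlab
% Sample precinct names (illustrative inputs only).
names = ["126 GOSHEN POST"; "126-GOSHEN POST"; "704 - ELMONT"];

% Same pattern as in import2021DALfile: digits, one or more spaces, then a
% word. The match is rewritten as "<digits> - <word>".
normalized = regexprep(strtrim(names), '^(\d+)( +)(\w+)', '$1 - $3');

disp([names, normalized]);
```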

There’s More to the Story …

A while back I posted about the fact that the official turnout numbers on the VA Dept of Elections (“ELECT”) website were incorrect. ELECT’s webpage showed the turnout for the 2020 election was 81.48%, when their own stats on the same page clearly show the result should be 75.08%.

I first published this discrepancy on Aug 13 2021 via Twitter (here), and then again on Sep 09 2021 on this blog (as well as Twitter) in the first section of my 2020 analysis report (here). ELECT subsequently (and quietly) updated their results page by Sept 10th, but made no public notice of the error and revision, and never responded to my questions as to how the error came about in the first place, why it was still on their official results page for almost a year, and what they were doing to ensure these types of errors didn’t happen again. I subsequently updated my 2020 analysis report to include the silent update of the results by ELECT (here).

In going over and reviewing some of my 2020 work and trying to make sure all of my documentation is complete, I’ve come across another tidbit of interesting information regarding the results on the ELECT site:

There were other silent updates to the official results that predate my finding the wrong turnout numbers.

Using the Wayback Machine (here), the first record I can find that has the 2020 results is from March 10 2021. Subsequent changes to the values, as seen in the Wayback Machine data, are summarized below.

Wayback Machine Date | Year | Total Registered | Percent Change from Previous Year | Total Voting | Turnout (% Voting of Total Registered) | Voting Absentee (included in Total Voting)
2021-09-10 | 2020 | 5,975,696 | 6.18% | 4,486,821 | 75.08% | 2,687,304
2021-09-08 | 2020 | 5,975,696 | 6.18% | 4,486,821 | 81.48% | 2,687,304
2021-04-09 | 2020 | 5,975,696 | 6.18% | 4,486,821 | 81.48% | 2,687,304
2021-04-02 | 2020 | 5,975,802 | 6.18% | 4,413,388 | 73.85% | 2,614,927
2021-03-10 | 2020 | 5,975,802 | 6.18% | 4,413,388 | 73.85% | 2,614,927
Statistics posted on https://www.elections.virginia.gov/resultsreports/registrationturnout-statistics/ (as per the WayBack Machine) at different times since the close of the 2020 election.

From the table above there looks to have been a change sometime between April 2nd and April 9th to the number of registered voters, the total number of voters in the 2020 election, the turnout statistic, and the number of voters voting absentee. There also was a change sometime between Sept 8th and Sept 10th to the turnout statistic (presumably in response to my documenting the error and publishing it).
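
The arithmetic behind these observations is simple to reproduce. The snippet below is a small check using only the figures from the table above (the variable names are arbitrary): it recomputes turnout from the posted totals and takes the differences between the April 2nd and April 9th snapshots.

```matlab
% Figures copied from the Wayback Machine table above.
% Row 1 = 2021-04-02 snapshot, row 2 = 2021-04-09 snapshot.
registered = [5975802; 5975696];
voted      = [4413388; 4486821];
absentee   = [2614927; 2687304];
posted     = [73.85; 81.48];        % turnout (%) shown on the page

% Turnout recomputed from the page's own totals.
computed = round(100 * voted ./ registered, 2);

% Change in each total between the two snapshots.
deltas = [diff(registered), diff(voted), diff(absentee)];

fprintf('Posted turnout:   %.2f%%  %.2f%%\n', posted);
fprintf('Computed turnout: %.2f%%  %.2f%%\n', computed);
fprintf('Change in registered / voted / absentee: %d / %d / %d\n', deltas);
```

The recomputed values are 73.85% and 75.08%, so the posted 81.48% does not follow from the totals shown alongside it, and the total-voting figure grew by 73,433 between the two snapshots.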

The entry from 2021-03-10 agrees with the official Election 2020 Summary Report dated 2021-01-25. The entry 2021-09-10 agrees with the Election 2020 Summary Report dated 2021-10-01. To their credit, ELECT has at least put a notice on the page for the Election Summary Reports that the report was updated on Oct 1st, but there is no explanation as to the reasoning for the revisions.

I would like to know why ELECT was modifying the results for months after the close of the election. Where did the extra ~73K ballots come from that were added to the results between April 2nd and April 9th? Where did the 81% come from?
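
The turnout column can be cross-checked against the other two columns of the same row. The snippet below is only a minimal sketch (written in Python rather than the MATLAB used elsewhere on this blog); the names `snapshots` and `turnout_pct` are illustrative, and the numbers are copied from the Wayback Machine table above.

```python
# Rows copied from the Wayback Machine table above:
# (snapshot date, total registered, total voting, published turnout %)
snapshots = [
    ("2021-03-10", 5_975_802, 4_413_388, 73.85),
    ("2021-04-02", 5_975_802, 4_413_388, 73.85),
    ("2021-04-09", 5_975_696, 4_486_821, 81.48),
    ("2021-09-08", 5_975_696, 4_486_821, 81.48),
    ("2021-09-10", 5_975_696, 4_486_821, 75.08),
]


def turnout_pct(total_voting, total_registered):
    """Turnout as a percentage of total registered voters, rounded to 2 places."""
    return round(100.0 * total_voting / total_registered, 2)


# Flag every snapshot whose published turnout disagrees with its own inputs.
mismatches = [
    (date, published, turnout_pct(voting, registered))
    for date, registered, voting, published in snapshots
    if abs(turnout_pct(voting, registered) - published) > 0.005
]

for date, published, recomputed in mismatches:
    print(f"{date}: published {published}% vs recomputed {recomputed}%")

# Size of the change in Total Voting between the April 2nd and April 9th snapshots.
added_ballots = 4_486_821 - 4_413_388
print("Ballots added between the April snapshots:", added_ballots)
```

Only the two snapshots showing 81.48% fail the check; the other rows are consistent with their own Total Voting and Total Registered values.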


Another Interesting VA Election Data Discrepancy

On a spur of curiosity I went back to some of the data provided by the VA dept of elections (“ELECT”) for both the 2020 and 2021 elections and ran a new data consistency test …

I have a copy of the final Daily Absentee List (DAL) for both 2020 and 2021. I also have a copy of the paired Registered Voter List (RVL) and Voter History List (VHL) generated shortly after the close of the 2021 General Election and within a few moments of each other.

I was curious what percentage of the approved and counted absentee ballots in the DAL do NOT have an associated “voter credit” in the VHL for 2020 and for 2021. If ELECT’s data is accurate the number should ideally be 0, but most official thresholds for acceptability that I’ve seen for accuracy in election data systems hover somewhere around 0.1%. (0.1% is a fairly consistent standard that I’ve seen in the documentation for various localities’ Risk Limiting Audits, the Election Scanner Certification procedures, etc.) The VHL should cover all of the activity for the last four years, but to ensure that I’m accounting for people that might have been officially removed from the RVL and VHL since the 2020 election (due to death, moving out of state, etc), I only run this test on the subset of the entries in the DAL that still have a valid listing in the RVL.

The results are below. Both years seem to have a high amount of discrepancies compared to the 0.1% threshold, with 2020’s discrepancy percentage being over 3x the percentage computed for 2021.

| Year | Percent of Counted DAL Ballots without Voter Credit |
| --- | --- |
| 2020 | 1.352% |
| 2021 | 0.449% |

For those interested in the computation, the MATLAB pseudo-code is given below. I can’t actually link to the source data files because of VA’s draconian restrictions on redistributing the contents of the DAL, RVL and VHL data files.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% We first compute the indices of the DAL entries that represent 
% approved and countable ballots ...
%
% 'dal2020' and 'dal2021' variables are the imported DAL tables 
% 'VAVoteHistory' is the imported Voter History List
% 'RegisteredVoterList' is the Registered Voter List
% 
% All four of the above are imported directly from the CSV 
% files provided from the VA Department of elections with 
% very little error checking save for obvious whitespace or 
% line ending checks, etc.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

aiv2021 = dal2021.APP_STATUS == 'Approved' & dal2021.BALLOT_STATUS == 'Issued';
amv2021 = dal2021.APP_STATUS == 'Approved' & dal2021.BALLOT_STATUS == 'Marked';
aomv2021 = dal2021.APP_STATUS == 'Approved' & dal2021.BALLOT_STATUS == 'On Machine';
appv2021 = dal2021.APP_STATUS == 'Approved' & dal2021.BALLOT_STATUS == 'Pre-Processed';
afwv2021 = dal2021.APP_STATUS == 'Approved' & dal2021.BALLOT_STATUS == 'FWAB';
counted2021 = amv2021 | aomv2021 | appv2021 | afwv2021; % Approved and Countable
    
aiv2020 = dal2020.APP_STATUS == 'Approved' & dal2020.BALLOT_STATUS == 'Issued';
amv2020 = dal2020.APP_STATUS == 'Approved' & dal2020.BALLOT_STATUS == 'Marked';
aomv2020 = dal2020.APP_STATUS == 'Approved' & dal2020.BALLOT_STATUS == 'On Machine';
appv2020 = dal2020.APP_STATUS == 'Approved' & dal2020.BALLOT_STATUS == 'Pre-Processed';
afwv2020 = dal2020.APP_STATUS == 'Approved' & dal2020.BALLOT_STATUS == 'FWAB';
counted2020 = amv2020 | aomv2020 | appv2020 | afwv2020; % Approved and Countable

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Next we compute the indices in the VHL that represent 
% 2020 and 2021 General Election entries for voter credit
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
valid_2020_entries = strcmpi(strtrim(string(VAVoteHistory.ELECTION_NAME)), '2020 November General');
valid_2021_entries = strcmpi(strtrim(string(VAVoteHistory.ELECTION_NAME)), '2021 November General');


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% We use the MATLAB intersect function to make sure that 
% we are only using DAL entries that are still in the RVL 
% and therefore are possible to be present in the VHL and 
% compute the percentages.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[did,iida,iidb] = intersect(dal2020.identification_number(counted2020), ...
    RegisteredVoterList.IDENTIFICATION_NUMBER);
[vid,iida,iidb] = intersect(VAVoteHistory.IDENTIFICATION_NUMBER(valid_2020_entries),...
    RegisteredVoterList.IDENTIFICATION_NUMBER);
[iid,iida,iidb] = intersect(did,vid);
pct2020 = (1-numel(iida) / numel(did)) * 100

[did,iida,iidb] = intersect(dal2021.identification_number(counted2021), ...
    RegisteredVoterList.IDENTIFICATION_NUMBER);
[vid,iida,iidb] = intersect(VAVoteHistory.IDENTIFICATION_NUMBER(valid_2021_entries),...
    RegisteredVoterList.IDENTIFICATION_NUMBER);
[iid,iida,iidb] = intersect(did,vid);
pct2021 = (1-numel(iida) / numel(did)) * 100
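
For readers without MATLAB, the same intersect logic can be sketched with Python sets. This is a hedged illustration only: the function name `pct_without_credit` is made up for this example, and the toy ID lists below are invented and do not come from any ELECT file.

```python
def pct_without_credit(counted_dal_ids, vhl_ids_for_election, rvl_ids):
    """Percent of counted DAL voter IDs (still in the RVL) with no VHL credit."""
    rvl = set(rvl_ids)
    did = set(counted_dal_ids) & rvl        # counted DAL ballots still registered
    vid = set(vhl_ids_for_election) & rvl   # voters credited for this election
    if not did:
        raise ValueError("no counted DAL entries remain after the RVL filter")
    return (1 - len(did & vid) / len(did)) * 100


# Toy data only: these IDs are made up for illustration.
rvl_ids = [1, 2, 3, 4, 5, 6, 7, 8]
counted_dal_ids = [1, 2, 3, 4, 9]   # ID 9 has left the RVL, so it is excluded
vhl_ids = [1, 2, 3, 6, 7]           # ID 4 was counted but has no voter credit

pct = pct_without_credit(counted_dal_ids, vhl_ids, rvl_ids)
print(f"{pct:.3f}% of counted DAL ballots have no voter credit")
```

Sets behave like MATLAB's `intersect` here because both discard duplicate IDs before comparing.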

Update on 2021 VA Election Data Comparisons

Originally Posted 01-31-2022.

As of 03-20-2022, the latest version of my master spreadsheet is posted below.

This includes the updates from my discussions with the York County registrar (see here), the comparison to DAL file entries (here), and the latest updates and comparisons to Statement of Results (SOR) / Machine Tape records, including ALBEMARLE, ACCOMACK COUNTY (partial), FAIRFAX COUNTY, FREDERICK COUNTY, GOOCHLAND COUNTY, HENRICO COUNTY (partial), ISLE OF WIGHT, LOUDOUN COUNTY (partial), LOUISA COUNTY, NORFOLK CITY, ORANGE COUNTY, POWHATAN COUNTY (partial), PRINCE WILLIAM COUNTY, STAFFORD COUNTY, SPOTSYLVANIA COUNTY, SUFFOLK COUNTY and YORK COUNTY. There’s more to process, but I wanted to put out a status update.

The previous result between the “official” CSV results and the “official” DAL file results has not changed, at 5,766 Net discrepancies and 17,194 Absolute discrepancies. The latest update includes some cleaning up of the spreadsheet formulas and making sure that I am only accounting for discrepancies for those rows where we actually have verified transcriptions and/or tape images. The current discrepancies in the records between the CSV and the SOR / Machine Tapes are 4,549 Net discrepancies and 27,905 Absolute discrepancies. “Net” discrepancies are summed such that +/- deltas per candidate can cancel out, whereas “Absolute” discrepancies are summed using the absolute value of each delta. I have only been focusing on the Governor’s race at the moment and not any of the down-ballot races. Again, many many “thank you’s” to the volunteers at VFA who have been helping to put all of this together.
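
The difference between the two sums is easiest to see on a handful of numbers. The deltas below are hypothetical (they are not rows from the spreadsheet) and the variable names are illustrative; this is just a small Python sketch of the two definitions.

```python
# Hypothetical per-row deltas (official count minus tape count) for one candidate.
deltas = [12, -7, 0, 3, -8]

net_discrepancy = sum(deltas)                       # +/- deltas can cancel out
absolute_discrepancy = sum(abs(d) for d in deltas)  # every affected vote is counted

print("Net:", net_discrepancy)
print("Absolute:", absolute_discrepancy)
```

In this made-up case the Net figure is zero even though 30 votes differ, which is why both numbers are reported.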


VA 2020 and 2021 Machine Tapes Compared to Official Results for a single VA County

(Updates in red text as of 01-27-2022)

Walt Lantham, the registrar of York county (yes … it’s York County … don’t worry … I asked Walt if he was ok with me identifying him and the locality publicly before making this update) generously reached out and contacted me to try and clear up some of the original numbers I posted below. We had a long and detailed couple of phone calls (over two hours in total) and went over all of these results in detail. It turns out that I was missing the hand counted ballot tally sheets from the data that I had received. This cleared up most of the technical issues with the 2021 data and all but 1 issue with the data below for 2020. The remaining issue is the discrepancy with the official State CSV files, which is outside of Walt’s ability to diagnose. Our discussion also brought up some of the interesting procedural issues and challenges related to data reporting, retention, maintenance and general practices as well. I’ll touch on that in a bit. It was a great conversation and a big thank you to Walt and his staff for taking the time to address these issues in detail.

Special thanks again to the grassroots folks in VFA, and throughout the state, for all their hard work in collecting the machine tape information for various localities. There were a few localities that provided not only the 2021 machine tapes, but the 2020 versions as well! The first one of those that I have compiled all the data for is reported below. As with all of my results, I invite those that wish to replicate, validate or refute to contact me and I will happily provide the necessary details and data files.

I believe we have some significant issues with our elections in VA. The data below helps clearly make that case, in conjunction with the multiple other data analyses that I’ve done and posted here on my blog. The data below is only from a single locality in the state (I am working on others). I am purposefully summarizing the data and abstracting which locality this represents, as I do not want to jeopardize any potential legal proceedings or official investigations. But I do feel strongly that the “fact of” this information needs to be made public and become part of the discussion on the matter. (And … just to be clear … the background image for this blog post is the tapes from a different locality in 2021 only, which I discussed here)

I’ll note that the machine tapes and Statements of Results (SORs) are the “gold standard” to measure the published election results against in VA. Once a vote is cast on a machine, there is no process for it to be removed from the official vote count. Each ballot category is supposed to be scanned into a set of dedicated machines. Provisional ballots have their own dedicated machines that they get scanned into. Mail-In ballots have their own dedicated machines. Early In-Person ballots get their own machines, as do the In-Person Early and Day-Of ballots.

Ballots that cannot be scanned for some reason must be hand counted. This hand counting and verification is performed at the post-election canvass. In my discussion with Walt, we noted the fact that there is no requirement for the hand-count tally worksheets used during the canvass to be included in the statement of results reports that are available for the public to view. There is also no mechanism for the state to track or delineate the numbers of hand-counted ballots, or way for voters to know the various reasons why ballots needed to be hand counted in the first place. But luckily for me … Walt keeps good notes and records, and e-mailed me a copy of the Tally sheets that he preserved from the canvass for my records and to fill in the data gaps.

There should be zero difference from the sum of these reports (including the hand counts) to the reported official results, save for very very very small and rare documented errors and issues. Maaaaaybe 1 or 2 differences here and there, but nowhere near the results presented below when compared to the State CSV files. I have talked to multiple elections officers, public officials, registrars and lawyers and ALL of them have agreed with this sentiment.

Input Data:

The official results are supplied by the VA Dept. of Elections (ELECT) website in a csv file for 2020 (here) and 2021 (here). Additional versions of the official results are reported through a web portal to the ELECT historical archive for 2020 (here). Turnout numbers are supplied similarly from csv files on the ELECT servers for 2020 (here) and 2021 (here). Note that the official CSV and WEB tally results for 2020 show significant differences, yet both are supposed to reflect the “official” results. That, on its own, is a significant issue that warrants further inspection. I’ve computed the deviations between both data sets for this locality, so it’s dealer’s choice as to which dataset you want to compare against. For 2021 the CSV results are the only ones available.

SOR and machine tape information was obtained by volunteers across the state going to their local registrar or county clerk’s office. For some counties complete datasets of both 2021 and 2020 election results were obtained, with hand transcriptions of tape returns as well as photographic copies of the originals. The results reported below are for the moderately sized York County and were derived from both hand-transcribed results and photographic copies of the machine tapes for both 2020 and 2021. The Statement of Results (SOR) reports and change logs from the county do not document any issues that would account for the discrepancies below. As far as I am aware, there were no issues presented at the canvass (where the tapes and SORs are checked against the state database entries for errors) that would explain the results below. As noted above, through my discussions with Walt and the copies of the hand-count tally worksheets he provided, we were able to address most of the discrepancies that I came across, except for the discrepancy of the 2020 data with the CSV version of the official results posted by VA ELECT. In fact, Walt’s corrections made the discrepancy with the CSV results slightly worse, even though they fixed the discrepancies with the WEB based portal results.

Results:

The tables below show the computed differences of the VA Dept of Elections data minus the collected machine tape results, stratified by major candidate party. The 2020 results are for the office of President, the 2021 results for the Governor’s race. The 2021 and 2020 Web differences were never negative, so there is no distinction between the Sum of the differences (the NET change) and the Sum of the Absolute Value of the differences (how many votes were affected). The 2020 CSV results, however, had some negative differences, so there are two different ways to compute the effects of these deviations.

The 2021 Governor’s race showed a smaller percentage deviation from the machine tapes than the 2020 Presidential race for this locality, but still shows non-zero differences. (Remember, there should be zero difference.)

It is worth noting that in 2021 the reporting was changed such that absentee mail-in and early-vote totals are separated in the reported data. The 2021 In-Person Day-Of results were identical to the reported results presented by the VA Department of Elections, with the In-Person Early vote only showing a single vote discrepancy. Almost all of the differences are manifested in the Mail-In or Provisional ballots for the 2021 data.

Likewise for 2020 there is very little difference (a single vote again) in the In-Person Day-Of precinct totals from the reported results. Almost all of the differences can be attributed to Central Absentee Precinct (CAP) data when compared against the state’s official CSV report.

2021 Results

| Year | PRECINCT | VDOE-TAPE difference for Democratic candidate (original → updated) | VDOE-TAPE difference for Libertarian candidate | VDOE-TAPE difference for Republican candidate (original → updated) |
| --- | --- | --- | --- | --- |
| 2021 | Total Differences Provisional | 22 → 0 | 0 | 18 → 0 |
| 2021 | Total Differences AB – Central Absentee Precinct | 25 → 0 | 0 | 9 → 0 |
| 2021 | Total Differences EV – Central Absentee Precinct | 0 | 0 | 1 |
| 2021 | Total Differences PE – Central Absentee Precinct | 1 → 0 | 0 | 2 → 0 |
| 2021 | Total Differences Day-Of Precincts | 0 | 0 | 0 |
| | Total Differences | 48 → 0 | 0 | 30 → 1 |
| | Machine Tape Totals | 12142 → 12190 | 150 | 17455 → 17484 |
| | % of Machine Tape Totals | 48/12142*100 = 0.40% → 0/12190*100 = 0.00% | 0/150*100 = 0.00% | 30/17455*100 = 0.17% → 1/17484*100 = 0.006% |
Differences of 2021 Tape Counts from Official Results (CSV)

2020 Results (ELECT Web Portal Archive)

| Year | PRECINCT | VDOE-TAPE difference for Democratic candidate (original → updated) | VDOE-TAPE difference for Libertarian candidate (original → updated) | VDOE-TAPE difference for Republican candidate (original → updated) |
| --- | --- | --- | --- | --- |
| 2020 | Total Differences AB – Central Absentee Precinct | 234 → 0 | 10 → 0 | 112 → 0 |
| 2020 | Total Differences Provisional | 2 → 0 | 0 | 1 → 0 |
| 2020 | Total Differences Day-Of Precincts | 0 | 0 | 1 |
| | Total Differences | 236 → 0 | 10 → 0 | 114 → 1 |
| | Machine Tape Totals | 17447 → 17683 | 670 → 680 | 20027 → 20240 |
| | % of Machine Tape Totals | 236/17447*100 = 1.35% → 0/17447*100 = 0.00% | 10/670*100 = 1.49% → 0/670*100 = 0.00% | 114/20027*100 = 0.57% → 1/20027*100 = 0.005% |
Differences of 2020 Tape Counts from Official Results (WEB)

2020 Results (ELECT CSV Report)

| Year | PRECINCT | VDOE-TAPE difference for Democratic candidate (original → updated) | VDOE-TAPE difference for Libertarian candidate (original → updated) | VDOE-TAPE difference for Republican candidate (original → updated) |
| --- | --- | --- | --- | --- |
| 2020 | Total Differences AB – Central Absentee Precinct | 77 → 157 | -1 → 11 | 11 → 101 |
| 2020 | Total Differences Provisional | -114 → 116 | -1 → 1 | -52 → 53 |
| 2020 | Total Differences Day-Of Precincts | 0 | 0 | 1 |
| | Total Differences | -37 → 273 | -2 → 12 | 64 → 155 |
| | Machine Tape Totals | 17447 → 17683 | 670 → 680 | 20027 → 20240 |
| | % of Machine Tape Total | -37/17447*100 = -0.21% → 273/17683*100 = 1.54% | -2/670*100 = -0.30% → 12/680*100 = 1.76% | 64/20027*100 = 0.32% → 155/20240*100 = 0.77% |
| | Abs Val Total Differences | 191 → 273 | 2 → 12 | 164 → 155 |
| | Abs Val % of Machine Tape Totals | 191/17447*100 = 1.09% → 273/17683*100 = 1.54% | 2/670*100 = 0.30% → 12/680*100 = 1.76% | 164/20027*100 = 0.82% → 155/20240*100 = 0.77% |
Differences of 2020 Tape Counts from Official Results (CSV)


Latest tallies for VA 2021: 17,194 ballot discrepancies in official reported data … so far …

The more I dig, the more I am disturbed by the lack of data integrity and “data hygiene” at the VA Department of Elections (a.k.a. “ELECT”). If any private citizen or company was as reckless and error-prone with financial or other legal records, they’d be in jail and fined out of existence. And I’m being very very very generous as I am assuming “innocent” incompetence on the part of our elected officials and ELECT. (Although I’m not exactly sure which is worse … discovering that there is a hypothetical cabal of bad actors responsible behind the curtain, or that the vast majority of our elected officials are simply just incompetent, and the system as-is is unreliable.)

Many have been asking me for updates and status as to the various datasets I’ve been working on. Process is slow going, as I do this in my spare time, but progress IS being made. (With specific thanks to all of my peeps over at VFA on telegram who have stepped up to the plate in collecting machine tape transcriptions and images.)

I’ve previously posted on basic stats and other findings (here, here, here, here, here).

I have been working through all of the data and combined inputs from:

There is a whole lot of work left to do to digitize and transcribe the machine tapes and compare them to the results of the data as presented by ELECT. But even ignoring the machine tapes for the moment, we can see from the attached master spreadsheet that there are a NET 5,766 and ABSOLUTE 17,194 ballot accounting discrepancies between the various ELECT provided datasets for the 2021 Election.

Columns L,M,N,O,P are derived from the “2021 November General.csv” file as of 12-11-2021 provided by VA ELECT on their website
Columns S-Y are derived from the “Turnout-2021 November General.csv” file as of 12-12-2021 provided by VA ELECT on their website
Columns Z-AP are derived from the Daily Absentee List (DAL) file provided by VA ELECT, downloaded 2021-12-05T11-22-04
Columns H-K are generated from in person inspection of the machine tapes for each precinct
Columns E-G are the differences between the machine tapes and the sums reported in the “2021 November General.csv” file
Column Q is the difference between the CAP precinct data reported in “2021 November General.csv” and the corresponding “Turnout-2021 November General.csv” or DAL derived data.

For Rows with precinct == “## Provisional” : Q is the difference between column P and the sum of all of the column S for the corresponding precincts.

For Rows with precinct == “## AB – Central Absentee Precinct” : Q is equal to column AM minus column P for both that row and the corresponding row of Post-Election-Day (PE) counts

For Rows with precinct == “## EV – Central Absentee Precinct” : Q is equal to column AN minus column P
Column R is the absolute value of column Q

Summing column Q (the net discrepancy) gives 5,766

Summing column R (the absolute discrepancy) gives 17,194
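
The per-row rules for columns Q and R can be sketched as a small function. This is a hedged Python illustration, not the spreadsheet formula itself: the function name and keyword arguments are invented, the AB rule is implemented under one reading of the description above (column AM compared against the AB row plus its matching PE row), and all of the numbers are made up.

```python
def column_q(row_type, p, precinct_s=(), p_post_election=0, am=0, an=0):
    """Column Q for one CAP row, following the three rules listed above.

    p               -- column P (votes reported in the results CSV) for this row
    precinct_s      -- column S values for the corresponding precincts
    p_post_election -- column P of the matching Post-Election-Day (PE) row
    am, an          -- DAL-derived columns AM and AN for this row
    """
    if row_type == "Provisional":
        return p - sum(precinct_s)
    if row_type == "AB":
        # One reading of the AB rule: AM is compared against AB plus PE votes.
        return am - (p + p_post_election)
    if row_type == "EV":
        return an - p
    raise ValueError(f"unknown row type: {row_type}")


# Hypothetical rows: none of these numbers come from the master spreadsheet.
rows = [
    ("Provisional", {"p": 40, "precinct_s": (15, 10, 12)}),
    ("AB", {"p": 900, "p_post_election": 30, "am": 925}),
    ("EV", {"p": 2000, "an": 2004}),
]

q_values = [column_q(row_type, **cols) for row_type, cols in rows]
r_values = [abs(q) for q in q_values]  # column R

print("Column Q:", q_values, "net =", sum(q_values))
print("Column R:", r_values, "absolute =", sum(r_values))
```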

I stress again that this is based on data directly supplied by the VA Department of Elections. This is not based on estimates or approximations. There should be Zero (as in Zip, Zilch, Nada) discrepancies between their own distinct datasets that are supposed to be representing the same real world event (the election).


Manassas City, VA : Mismatch between 2021 Absentee Counts and Machine Tape Records

By law in VA copies of the return sheets and machine tapes, or “Statement of Results” (SOR), from tabulation machines are public record and must be made available for 60 days after the election. (see VA 24.2-658) A grassroots team of a number of volunteers (coordinated through Virginia First Audits) made a concerted effort to collect images of a number of these tapes for specific localities of interest. I have been going through these tapes and comparing the results they show with the official results presented by the VA Department of Elections. The 60 day window ends Jan 1, 2022. If you would like to assist in recording and preserving SOR and Tape information, please see this telegram channel https://t.me/VA2021ElectionEvidence/139.

The reported totals of counted votes on the tapes should match exactly to what is reported as counted votes by the VA Department of Elections, after correctly omitting over-votes/under-votes and separating out provisional ballots (which is simple to do with the way VA reported its data). I have confirmed this should be the case via multiple email and phone conversations with election experts and officials, in an effort to make sure that I wasn’t missing some obvious process or fact that could account for the discrepancies.

We were not allowed to photograph the SOR for Manassas City, but we were allowed to view and transcribe the results by hand. Two other volunteers and I viewed and transcribed these results on 12-23-2021. There is only 1 congressional district in Manassas City, so there is only one set of Central Absentee Precinct data, and there are 6 physical precincts.

SOR and Machine Tape Data:

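| PRECINCT | Tape Democratic | Tape Libertarian | Tape Republican | Tape UNDEFINED | Total |
| --- | --- | --- | --- | --- | --- |
| ## Provisional | 13 | 1 | 2 | 0 | 16 |
| ##AB – Central Absentee Precinct | 902 | 14 | 249 | 1 | 1166 |
| ##EV – Central Absentee Precinct | 2134 | 20 | 1592 | 4 | 3750 |
| ##PE – Central Absentee Precinct | 28 | 1 | 13 | 0 | 42 |
| 001 – DEAN | 535 | 8 | 375 | 0 | 918 |
| 002 – WEEMS | 553 | 16 | 564 | 0 | 1133 |
| 003 – METZ | 568 | 8 | 635 | 2 | 1213 |
| 004 – HAYDON | 527 | 8 | 582 | 2 | 1119 |
| 005 – BALDWIN | 461 | 8 | 482 | 3 | 954 |
| 006 – ROUND | 423 | 2 | 552 | 2 | 979 |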
Table 1: Transcribed Statements of Results (SOR) and Machine Tape data for Manassas City. Collected 12-23-2021.

VA Dept of Elections Data (Certified Official Election Results):

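| PRECINCT | VDOE Democratic | VDOE Libertarian | VDOE Republican | VDOE UNDEFINED | Total |
| --- | --- | --- | --- | --- | --- |
| ## Provisional | 13 | 1 | 2 | 0 | 16 |
| ##AB – Central Absentee Precinct | 913 | 15 | 251 | 1 | 1180 |
| ##EV – Central Absentee Precinct | 2134 | 20 | 1592 | 4 | 3750 |
| ##PE – Central Absentee Precinct | 28 | 1 | 15 | 0 | 44 |
| 001 – DEAN | 535 | 8 | 375 | 0 | 918 |
| 002 – WEEMS | 553 | 16 | 564 | 0 | 1133 |
| 003 – METZ | 568 | 8 | 635 | 2 | 1213 |
| 004 – HAYDON | 527 | 8 | 582 | 2 | 1119 |
| 005 – BALDWIN | 461 | 8 | 482 | 3 | 954 |
| 006 – ROUND | 423 | 2 | 552 | 2 | 979 |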
Table 2: Official VA Department of Elections data https://apps.elections.virginia.gov/SBE_CSV/ELECTIONS/ELECTIONRESULTS/2021/2021%20November%20General%20.csv

Results:

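| PRECINCT | (VDOE-Tape) Democratic | (VDOE-Tape) Libertarian | (VDOE-Tape) Republican | (VDOE-Tape) UNDEFINED | Total | % Diff from VDOE Total |
| --- | --- | --- | --- | --- | --- | --- |
| ## Provisional | 0 | 0 | 0 | 0 | 0 | 0.00% |
| ##AB – Central Absentee Precinct | 11 | 1 | 2 | 0 | 14 | 1.19% |
| ##EV – Central Absentee Precinct | 0 | 0 | 0 | 0 | 0 | 0.00% |
| ##PE – Central Absentee Precinct | 0 | 0 | 2 | 0 | 2 | 4.55% |
| 001 – DEAN | 0 | 0 | 0 | 0 | 0 | 0.00% |
| 002 – WEEMS | 0 | 0 | 0 | 0 | 0 | 0.00% |
| 003 – METZ | 0 | 0 | 0 | 0 | 0 | 0.00% |
| 004 – HAYDON | 0 | 0 | 0 | 0 | 0 | 0.00% |
| 005 – BALDWIN | 0 | 0 | 0 | 0 | 0 | 0.00% |
| 006 – ROUND | 0 | 0 | 0 | 0 | 0 | 0.00% |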
Table 3: Comparison of Results between SOR and VA Dept of Elections Data
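
The Table 3 arithmetic for the two rows that differ can be reproduced with a few lines of Python. This is only a sketch: the helper name `compare` and the dictionary layout are illustrative, and the values are the AB and PE rows transcribed from Table 1 and Table 2 above.

```python
PARTIES = ("Democratic", "Libertarian", "Republican", "UNDEFINED")

# Values transcribed from Table 1 (tapes) and Table 2 (VDOE) for the two
# rows that differ; every other row is identical in both tables.
tape = {
    "AB": (902, 14, 249, 1),
    "PE": (28, 1, 13, 0),
}
vdoe = {
    "AB": (913, 15, 251, 1),
    "PE": (28, 1, 15, 0),
}


def compare(vdoe_row, tape_row):
    """Return per-party deltas, the total delta, and % of the VDOE row total."""
    deltas = tuple(v - t for v, t in zip(vdoe_row, tape_row))
    total = sum(deltas)
    pct = round(100.0 * total / sum(vdoe_row), 2)
    return deltas, total, pct


results = {name: compare(vdoe[name], tape[name]) for name in tape}
for name, (deltas, total, pct) in results.items():
    print(name, dict(zip(PARTIES, deltas)), total, f"{pct:.2f}%")
```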

Prince William County, VA : Mismatch between 2021 Absentee Counts and Machine Tape Records

By law in VA copies of the return sheets and machine tapes, or Statement of Results (SOR), from tabulation machines are public record and must be made available for 60 days after the election. (see VA 24.2-658) A grassroots team of a number of volunteers (coordinated through Virginia First Audits) made a concerted effort to collect images of a number of these tapes for specific localities of interest. I have been going through these tapes and comparing the results they show with the official results presented by the VA Department of Elections. I started with Prince William County, which is where I live.

The reported totals of counted votes on the tapes should match exactly to what is reported as counted votes by the VA Department of Elections, after correctly omitting over-votes/under-votes and separating out provisional ballots (which is simple to do with the way VA reported its data). I have confirmed this should be the case via multiple email and phone conversations with election experts and officials, in an effort to make sure that I wasn’t missing some obvious process or fact that could account for the discrepancies.

An example of the information included on these tapes is shown below in Figure 1. The exposed section of the far left tape shows the vote totals for that machine, per candidate, summarized by House and Congressional district combination. The exposed section of the middle tape in the image shows the report summary of the total ballots counted for each House and Congressional district combination. The top of the exposed section of the far right tape in the image shows the signatures of the election officials on the tape, and the bottom shows the introductory information at the beginning of the report that identifies the machine information. Each machine is dedicated to a specific precinct and ballot type and is labeled as such.

Figure 1: Example set of voting machine tapes from Prince William County 2021 VA Election.

Computation of Results

The machine tape images corresponding to absentee ballot categories (“ABS” – Mail-In Absentee, “EARLY” – In Person Early Voting, and “PE” – Mail-In ballots received Post Election but with valid postmark) were examined by hand and the tabulated results for each precinct were extracted. These results were then compared with the reported totals from the VA Department of Elections website dated 12-11-2021 (https://apps.elections.virginia.gov/SBE_CSV/ELECTIONS/ELECTIONRESULTS/2021/2021%20November%20General%20.csv).

The “Margin” column in the tables below accounts for individual candidate totals that could not be read from the tape images due to imaging errors, but the end of tape reports listed the total number of ballots processed for that precinct so a maximum limit (“Margin”) of the number of votes is known. We are currently in the process of going back to the registrar and filling in the missing data.

All of the tables below correctly omit undervotes/overvotes and provisional ballot counts. (The machine tapes do not include provisional ballots.)

The results presented below in Table 1, Table 2 and Table 3 show a consistent pattern of the machine tapes and the Department of Elections data giving nearly identical matches to the In-Person Early vote, but different results for the Mail-In Absentee (“ABS”) and Post-Election (“PE”) ballot categories.

Prince William County – Congressional District 01


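| | Democrat | Libertarian | Republican | Total | Margin |
| --- | --- | --- | --- | --- | --- |
| TAPE ABS | 5076 | 89 | 1842 | 7028 | 21 |
| TAPE EARLY | 9748 | 64 | 7639 | 17451 | 0 |
| TAPE PE | 193 | 9 | 86 | 341 | 53 |
| CSV ABS | 5155 | 90 | 1884 | 7129 | |
| CSV EARLY | 9746 | 64 | 7638 | 17448 | |
| CSV PE | 251 | 13 | 96 | 360 | |

| PWC CD 01 | Est. Delta Dem | Est. Delta Lib | Est. Delta Rep | Est. Delta Tot | Est. Delta Tot – Margin |
| --- | --- | --- | --- | --- | --- |
| CSV ABS – TAPE ABS | 79 | 1 | 42 | 122 | 101 (~ 1.42 % diff) |
| CSV EARLY – TAPE EARLY | -2 | 0 | -1 | -3 | -3 (~ -.02 % diff) |
| CSV PE – TAPE PE | 58 | 4 | 10 | 72 | 19 (~ 5.28 % diff) |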
Table 1: 2021 VA Governors Race, Prince William County – CD 01
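
The “Margin” bookkeeping can be sketched in Python using the CD 01 mail-in absentee (ABS) row. This is an illustration under stated assumptions: the function name `compare_with_margin` is made up, the tape Total is taken to be the end-of-tape ballot count, and the percentage is taken relative to the CSV total, which reproduces the figure quoted in Table 1.

```python
def compare_with_margin(tape_votes, tape_ballots, csv_votes):
    """Compare CSV and tape counts when some tape totals could not be read.

    tape_votes   -- per-candidate totals that could be read from the tape
    tape_ballots -- total ballots from the end-of-tape report
    csv_votes    -- per-candidate totals from the official CSV
    """
    margin = tape_ballots - sum(tape_votes)    # upper limit on unreadable votes
    deltas = [c - t for c, t in zip(csv_votes, tape_votes)]
    delta_total = sum(deltas)
    delta_minus_margin = delta_total - margin  # gap the margin cannot explain
    pct = round(100.0 * delta_minus_margin / sum(csv_votes), 2)
    return margin, deltas, delta_total, delta_minus_margin, pct


# CD 01 ABS row from Table 1, in the order Democrat, Libertarian, Republican.
result = compare_with_margin([5076, 89, 1842], 7028, [5155, 90, 1884])
print(result)
```

Subtracting the Margin is the conservative choice: it assumes every unreadable vote on the tape would have closed the gap with the CSV.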

Prince William County, Congressional District 10


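| | Democrat | Libertarian | Republican | Total | Margin |
| --- | --- | --- | --- | --- | --- |
| TAPE ABS | 2292 | 28 | 996 | 3316 | 0 |
| TAPE EARLY | 5809 | 28 | 6045 | 11882 | 0 |
| TAPE PE | 57 | 4 | 24 | 119 | 34 |
| CSV ABS | 2320 | 28 | 1009 | 3357 | |
| CSV EARLY | 5809 | 28 | 6045 | 11882 | |
| CSV PE | 79 | 5 | 43 | 127 | |

| PWC CD 10 | Est. Delta Dem | Est. Delta Lib | Est. Delta Rep | Est. Delta Tot | Est. Delta Tot – Margin |
| --- | --- | --- | --- | --- | --- |
| CSV ABS – TAPE ABS | 28 | 0 | 13 | 41 | 41 (~ 1.22 % diff) |
| CSV EARLY – TAPE EARLY | 0 | 0 | 0 | 0 | 0 (~ 0.00 % diff) |
| CSV PE – TAPE PE | 22 | 1 | 19 | 42 | 8 (~ 6.30 % diff) |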
Table 2: 2021 VA Governors Race, Prince William County – CD 10

Prince William County, Congressional District 11


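| | Democrat | Libertarian | Republican | Total | Margin |
| --- | --- | --- | --- | --- | --- |
| TAPE ABS | 5307 | 93 | 1333 | 6733 | 0 |
| TAPE EARLY | 9780 | 60 | 3578 | 13418 | 0 |
| TAPE PE | 152 | 6 | 36 | 293 | 99 |
| CSV ABS | 5375 | 93 | 1314 | 6782 | |
| CSV EARLY | 9780 | 60 | 3578 | 13418 | |
| CSV PE | 237 | 10 | 63 | 310 | |

| PWC CD 11 | Est. Delta Dem | Est. Delta Lib | Est. Delta Rep | Est. Delta Tot | Est. Delta Tot – Margin |
| --- | --- | --- | --- | --- | --- |
| CSV ABS – TAPE ABS | 68 | 0 | -19 | 49 | 49 (~ 0.72 % diff) |
| CSV EARLY – TAPE EARLY | 0 | 0 | 0 | 0 | 0 (~ 0.00 % diff) |
| CSV PE – TAPE PE | 85 | 4 | 27 | 116 | 17 (~ 5.48 % diff) |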
Table 3: 2021 VA Governors Race, Prince William County – CD 11

Source Data:

Raw Machine Tape Images

2021 November General – 2021-12-11T15:58:46


Compiled per precinct data for VA 2020 and 2021

Below I’ve linked copies of newly compiled sets of VA 2020 President and 2021 Governor November General Election data for anybody who is interested. The data set is compiled from only official VA Dept of Elections data sources including the General Election CSV files, the turnout CSV files, and the Daily Absentee List for each election.

Columns common to all source data:

  • “LOCALITY” – Locality Name
  • “DISTRICT” – Congressional district ID number
  • “PRECINCT_ID” – Precinct ID number
  • “PRECINCT” – Precinct Name

Columns compiled from the General Election CSV file:

  • “Democratic” – Number of Democratic Party Votes
  • “Liberation” – Number of Libertarian Party Votes
  • “Republican” – Number of Republican Party Votes
  • “UNDEFINED” – Number of Write-in Votes
  • “TOTAL_VOTES” – Sum of [“Democratic”, “Liberation”, “Republican”, “UNDEFINED”]

Columns compiled from the Turnout CSV file:

  • “PROVISIONAL_BALLOTS” – Number of Provisional Ballots
  • “ABSENTEE_BALLOTS” – Number of Absentee Ballots
  • “CURBSIDE_BALLOTS” – Number of Curbside Ballots
  • “ABS_OR_CURB_BALLOTS” – Sum of [“ABSENTEE_BALLOTS”, “CURBSIDE_BALLOTS”]
  • “TOTAL_VOTE_TURNOUT” – Total number of ballots collected
  • “ACTIVE_REGISTERED_VOTERS” – The number of active voters
  • “TOTAL_REGISTERED_VOTERS” – The total number of registered voters

Columns compiled from the DAL file:

  • “ABSENTEE_ISSUED” – Number of approved but outstanding “Issued” ballots listed in the DAL
  • “ABSENTEE_MARKED” – Number of approved ballots listed as “Marked” in the DAL
  • “ABSENTEE_PRE_PROCESSED” – Number of approved ballots listed as “Pre-Processed” in the DAL
  • “ABSENTEE_MRKORPRE” – Sum of [“ABSENTEE_MARKED”, “ABSENTEE_PRE_PROCESSED”]
  • “ABSENTEE_EARLY_IN_PERSON” – Number of approved ballots listed as “On Machine” in the DAL
  • “ABSENTEE_FWAB” – Number of approved ballots listed as “FWAB” in the DAL
  • “ABSENTEE_COUNTABLE” – Sum of [“ABSENTEE_MARKED”, “ABSENTEE_PRE_PROCESSED”, “ABSENTEE_EARLY_IN_PERSON”, “ABSENTEE_FWAB”]

Columns created by summing all entries in a locality:

  • “TOTAL_VOTES_IN_LOCALITY”
  • “ACTIVE_REGISTERED_VOTERS_IN_LOCALITY”
  • “TOTAL_REGISTERED_VOTERS_IN_LOCALITY”
  • “ABSENTEE_ISSUED_IN_LOCALITY”
  • “ABSENTEE_MARKED_IN_LOCALITY”
  • “ABSENTEE_PRE_PROCESSED_IN_LOCALITY”
  • “ABSENTEE_MARKED_OR_PRE_IN_LOCALITY”
  • “ABSENTEE_EARLY_IN_PERSON_IN_LOCALITY”
  • “ABSENTEE_FWAB_IN_LOCALITY”
  • “ABSENTEE_COUNTABLE_IN_LOCALITY”
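
As a rough illustration of how the summed and per-locality columns above could be derived, here is a minimal MATLAB sketch on a toy table. The vote counts and locality names are invented, and repeating each locality total on every precinct row of that locality is an assumption about the layout, not something stated above.

```matlab
% Toy precinct table; all values are made up for illustration.
T = table( ...
    categorical(["A"; "A"; "B"]), [101; 102; 201], ...
    [10; 20; 30], [1; 0; 2], [15; 25; 5], [0; 1; 0], ...
    'VariableNames', {'LOCALITY','PRECINCT_ID', ...
    'Democratic','Liberation','Republican','UNDEFINED'});

% Per-precinct total, following the TOTAL_VOTES definition.
T.TOTAL_VOTES = T.Democratic + T.Liberation + T.Republican + T.UNDEFINED;
% T.TOTAL_VOTES is [26; 46; 37]

% Per-locality total (assumed to be repeated on each row of the locality).
g = findgroups(T.LOCALITY);                        % group index per row
localityTotals = splitapply(@sum, T.TOTAL_VOTES, g);
T.TOTAL_VOTES_IN_LOCALITY = localityTotals(g);
% T.TOTAL_VOTES_IN_LOCALITY is [72; 72; 37]
```

The other “_IN_LOCALITY” columns would follow the same pattern with a different source column.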

Source Data Files (except DAL):


VHL, RVL and DAL metrics for VA 2021 and (2020) General Elections

Now that I have the Registered Voter List (RVL), the Voter History List (VHL) and the completed Daily Absentee List (DAL) files for the 2020 and 2021 VA November General elections, I have computed some basic metrics on these files for those who are interested. The Voter History and Registered Voter List files used to generate these results were downloaded from the VA Dept of Elections on 2021-12-11. The 2021 DAL file used was downloaded 2021-12-05. The 2020 DAL file was downloaded 2020-11-09.

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Basic Voter History and Registered Voter List Metrics
nInPDayOf2020 = 1638026 
nAbs2020 = 2734048 
nInPAbsOf2020 =  657 

nInPDayOf2021 = 1721868 
nAbs2021 = 1188842 
nInPAbsOf2021 = 0 

nUniqueVHIDs = 4769508 
nUniqueRVLIDs = 5956464 
nIntersectUVVHwURVL = 4769508 

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Ages of the registered voters at the time of the election from RVL
regv_over100_On2021Elect = 2736 
regv_lessThan18_On2021Elect = 397 
regv_over100_On2020Elect = 1774 
regv_lessThan18_On2020Elect = 43939 
regv_lessThan17_On2020Elect = 391

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Ages of the registered voters at the time of the election from VHL
vh_over100_On2021Elect = 514 
vh_lessThan18_On2021Elect = 0 
vh_over100_On2020Elect = 558 
vh_lessThan18_On2020Elect = 1 
vh_lessThan17_On2020Elect = 0

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do some basic indexing of different DAL status categories and combinations
nApprovedAndIssued2021 = 58151 
nApprovedAndMarked2021 = 78278 
nApprovedAndOnMachine2021 = 861470 
nApprovedAndPreProcessed2021 = 254374 
nApprovedAndFWAB2021 = 52 
nApprovedAndCountable2021 = 1194174 

nApprovedAndIssued2020 = 106037 
nApprovedAndMarked2020 = 348705 
nApprovedAndOnMachine2020 = 1796973
nApprovedAndPreProcessed2020 = 670765
nApprovedAndFWAB2020 = 1351
nApprovedAndCountable2020 = 2817794

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many countable entries in the 2021 DAL are not contained in the voter
% registration file for 2021
phantomDALVoters2021 = 0

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many countable entries in the 2020 DAL are not contained in the voter
% registration file for 2021
phantomDALVoters2020 = 48721

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many approved but not counted entries in the 2020 DAL are not
% contained in the voter registration file for 2021?
phantomDALBallots2020 = 3671

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many absentee ballots were issued but not counted or spoiled AND the
% person is marked as having voted in person on election day
% For 2020:
numUnspoiled2020 = 106037
numUnspoiledAndInPDayOf2020 = 35806

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many absentee ballots were issued but not counted or spoiled AND the
% person is marked as having voted in person on election day
% For 2021:
numUnspoiled2021 = 58151
numUnspoiledAndInPDayOf2021 = 4632

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many absentee ballots were approved and counted AND the
% person is marked as having voted IN PERSON on election day. (Should be 0)
% For 2020:
numCountedAbs2020 = 2817794
numCountedAbs2020AndInPDayOf2020 = 245

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many absentee ballots were approved and counted AND the
% person is marked as having voted IN PERSON on election day. (Should be 0)
% For 2021:
numCountedAbs2021 = 1194174
numCountedAbs2021AndInPDayOf2021 = 14

The MATLAB program listing for generating these metrics is below. I have left out the routines that parse the various CSV input files into MATLAB table objects. The statements highlighted in bold, which are the ones without a trailing semicolon, produce the computed metric results listed above.
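
Since the parsing routines are not shown, here is a minimal sketch of the kind of import the listing below appears to rely on. The file name is hypothetical, and reading the text columns as categorical is an assumption, made so that comparisons such as ELECTION_NAME == '2020 November General' are valid MATLAB; the actual parsing code may differ.

```matlab
% Hypothetical file name; the real export names are not given here.
vhFile = 'voter_history.csv';

% Detect column types from the file, then override the text columns
% that the listing compares against character vectors with ==.
opts = detectImportOptions(vhFile);
opts = setvartype(opts, ...
    {'ELECTION_NAME','ABSENTEE','VOTE_IN_PERSON'}, 'categorical');

% Read the file into a table using the adjusted import options.
VAVoteHistory = readtable(vhFile, opts);
```

The same approach would apply to the RVL and DAL files, using their own column names (for example APP_STATUS and BALLOT_STATUS for the DAL).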

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Basic Voter History and Registered Voter List Metrics
inPDayOf2020 = VAVoteHistory.ELECTION_NAME=='2020 November General' & ...
    VAVoteHistory.ELECTION_YEAR==2020 & ...
    VAVoteHistory.ABSENTEE == 'False';
abs2020 = VAVoteHistory.ELECTION_NAME=='2020 November General' & ...
    VAVoteHistory.ELECTION_YEAR==2020 & ...
    VAVoteHistory.ABSENTEE == 'True';
inPAbsOf2020 = VAVoteHistory.ELECTION_NAME=='2020 November General' & ...
    VAVoteHistory.ELECTION_YEAR==2020 & ...
    VAVoteHistory.ABSENTEE == 'True' & VAVoteHistory.VOTE_IN_PERSON=='True';
nInPDayOf2020 = sum(inPDayOf2020)
nAbs2020 = sum(abs2020)
nInPAbsOf2020 = sum(inPAbsOf2020)
inPDayOf2021 = VAVoteHistory.ELECTION_NAME=='2021 November General' & ...
    VAVoteHistory.ELECTION_YEAR==2021 & ...
    VAVoteHistory.ABSENTEE == 'False';
abs2021 = VAVoteHistory.ELECTION_NAME=='2021 November General' & ...
    VAVoteHistory.ELECTION_YEAR==2021 & ...
    VAVoteHistory.ABSENTEE == 'True';
inPAbsOf2021 = VAVoteHistory.ELECTION_NAME=='2021 November General' & ...
    VAVoteHistory.ELECTION_YEAR==2021 & ...
    VAVoteHistory.ABSENTEE == 'True' & VAVoteHistory.VOTE_IN_PERSON=='True';
nInPDayOf2021 = sum(inPDayOf2021)
nAbs2021 = sum(abs2021)
nInPAbsOf2021 = sum(inPAbsOf2021)
uvhid = unique(VAVoteHistory.IDENTIFICATION_NUMBER);
urvlid = unique(RegisteredVoterList.IDENTIFICATION_NUMBER);
nUniqueVHIDs = numel(uvhid)
nUniqueRVLIDs = numel(urvlid)
intersectUVVHwURVL = intersect(uvhid,urvlid);
nIntersectUVVHwURVL = numel(intersectUVVHwURVL)

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Ages of the registered voters at the time of the election from RVL
dob = datenum(RegisteredVoterList.DOB);
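% NOTE: these reference dates fall one day after Election Day
% (2021-11-02 and 2020-11-03). Ages are approximated as days / 365.25.
ageOn2021Elect = (datenum("11/03/2021") - dob) / 365.25;
ageOn2020Elect = (datenum("11/04/2020") - dob) / 365.25;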

regv_over100_On2021Elect = sum(ageOn2021Elect > 100)
regv_lessThan18_On2021Elect = sum(ageOn2021Elect < 18)
regv_over100_On2020Elect = sum(ageOn2020Elect > 100)
regv_lessThan18_On2020Elect = sum(ageOn2020Elect < 18)
regv_lessThan17_On2020Elect = sum(ageOn2020Elect < 17)

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Ages of the registered voters at the time of the election from VHL
vh2021 = VAVoteHistory.ELECTION_NAME=='2021 November General' & ...
    VAVoteHistory.ELECTION_YEAR==2021;
vh2020 = VAVoteHistory.ELECTION_NAME=='2020 November General' & ...
    VAVoteHistory.ELECTION_YEAR==2020;
dob_vh = datenum(VAVoteHistory.DOB);
ageOn2021Elect_vh = (datenum("11/03/2021") - dob_vh) / 365.25;
ageOn2020Elect_vh = (datenum("11/04/2020") - dob_vh) / 365.25;

vh_over100_On2021Elect = sum(ageOn2021Elect_vh(vh2021) > 100)
vh_lessThan18_On2021Elect = sum(ageOn2021Elect_vh(vh2021) < 18)
vh_over100_On2020Elect = sum(ageOn2020Elect_vh(vh2020) > 100)
vh_lessThan18_On2020Elect = sum(ageOn2020Elect_vh(vh2020) < 18)
vh_lessThan17_On2020Elect = sum(ageOn2020Elect_vh(vh2020) < 17)

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Do some basic indexing of different DAL status categories and combinations
aiv2021 = dal2021.APP_STATUS == 'Approved' & dal2021.BALLOT_STATUS == 'Issued';
amv2021 = dal2021.APP_STATUS == 'Approved' & dal2021.BALLOT_STATUS == 'Marked';
aomv2021 = dal2021.APP_STATUS == 'Approved' & dal2021.BALLOT_STATUS == 'On Machine';
appv2021 = dal2021.APP_STATUS == 'Approved' & dal2021.BALLOT_STATUS == 'Pre-Processed';
afwv2021 = dal2021.APP_STATUS == 'Approved' & dal2021.BALLOT_STATUS == 'FWAB';
appmv2021 = amv2021 | aomv2021 | appv2021 | afwv2021; % Approved and Countable

aiv2020 = dal2020.APP_STATUS == 'Approved' & dal2020.BALLOT_STATUS == 'Issued';
amv2020 = dal2020.APP_STATUS == 'Approved' & dal2020.BALLOT_STATUS == 'Marked';
aomv2020 = dal2020.APP_STATUS == 'Approved' & dal2020.BALLOT_STATUS == 'On Machine';
appv2020 = dal2020.APP_STATUS == 'Approved' & dal2020.BALLOT_STATUS == 'Pre-Processed';
afwv2020 = dal2020.APP_STATUS == 'Approved' & dal2020.BALLOT_STATUS == 'FWAB';
appmv2020 = amv2020 | aomv2020 | appv2020 | afwv2020; % Approved and Countable

nApprovedAndIssued2021 = sum(aiv2021)
nApprovedAndMarked2021 = sum(amv2021)
nApprovedAndOnMachine2021 = sum(aomv2021)
nApprovedAndPreProcessed2021 = sum(appv2021)
nApprovedAndFWAB2021 = sum(afwv2021)
nApprovedAndCountable2021 = sum(appmv2021)

nApprovedAndIssued2020 = sum(aiv2020)
nApprovedAndMarked2020 = sum(amv2020)
nApprovedAndOnMachine2020 = sum(aomv2020)
nApprovedAndPreProcessed2020 = sum(appv2020)
nApprovedAndFWAB2020 = sum(afwv2020)
nApprovedAndCountable2020 = sum(appmv2020)

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many countable entries in the 2021 DAL are not contained in the voter
% registration file for 2021?
ua = unique(string(dal2021.identification_number(appmv2021)));
ub = unique(string(RegisteredVoterList.IDENTIFICATION_NUMBER));
uc = intersect(ua,ub); % countable DAL IDs that also appear in the RVL
% Percentage of countable DAL IDs that are missing from the RVL
phantomDALPct2021 = (numel(ua) - numel(uc)) / numel(ua) * 100
phantomDALVoters2021 = numel(ua) - numel(uc)

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many countable entries in the 2020 DAL are not contained in the voter
% registration file for 2021?
ua2020 = unique(string(dal2020.identification_number(appmv2020)));
ub2020 = unique(string(RegisteredVoterList.IDENTIFICATION_NUMBER));
uc2020 = intersect(ua2020,ub2020); % countable DAL IDs also in the RVL
% Percentage of countable DAL IDs that are missing from the RVL
phantomDALPct2020 = (numel(ua2020) - numel(uc2020)) / numel(ua2020) * 100
phantomDALVoters2020 = numel(ua2020) - numel(uc2020)

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many approved but not counted entries in the 2020 DAL are not
% contained in the voter registration file for 2021?
ua2020 = unique(string(dal2020.identification_number(aiv2020)));
ub2020 = unique(string(RegisteredVoterList.IDENTIFICATION_NUMBER));
uc2020 = intersect(ua2020,ub2020); % issued DAL IDs also in the RVL
% Percentage of issued DAL IDs that are missing from the RVL
phantomDALBallotsPct2020 = (numel(ua2020) - numel(uc2020)) / numel(ua2020) * 100
phantomDALBallots2020 = numel(ua2020) - numel(uc2020)

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many absentee ballots were issued but not counted or spoiled AND the
% person is marked as having voted in person on election day
% For 2020:
unspoiled2020 = dal2020.identification_number(aiv2020);
ngIPDOidx = VAVoteHistory.ELECTION_NAME=='2020 November General' & ...
    VAVoteHistory.ELECTION_YEAR==2020 & ...
    VAVoteHistory.ABSENTEE == 'False';
novGenInPersonDayOf2020 = VAVoteHistory.IDENTIFICATION_NUMBER(ngIPDOidx);
ua2020 = unique(unspoiled2020);
ub2020 = unique(novGenInPersonDayOf2020);
uc2020 = intersect(ua2020,ub2020);
numUnspoiled2020 = numel(unspoiled2020)
numUnspoiledAndInPDayOf2020 = numel(uc2020)

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many absentee ballots were issued but not counted or spoiled AND the
% person is marked as having voted in person on election day
% For 2021:
unspoiled2021 = dal2021.identification_number(aiv2021);
ngIPDOidx = VAVoteHistory.ELECTION_NAME=='2021 November General' & ...
    VAVoteHistory.ELECTION_YEAR==2021 & ...
    VAVoteHistory.ABSENTEE == 'False';
novGenInPersonDayOf2021 = VAVoteHistory.IDENTIFICATION_NUMBER(ngIPDOidx);
ua2021 = unique(unspoiled2021);
ub2021 = unique(novGenInPersonDayOf2021);
uc2021 = intersect(ua2021,ub2021);
numUnspoiled2021 = numel(unspoiled2021)
numUnspoiledAndInPDayOf2021 = numel(uc2021)

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many absentee ballots were approved and counted AND the
% person is marked as having voted IN PERSON on election day. (Should be 0)
% For 2020:
countedAbs2020 = dal2020.identification_number(appmv2020);
ngIPDOidx = VAVoteHistory.ELECTION_NAME=='2020 November General' & ...
    VAVoteHistory.ELECTION_YEAR==2020 & ...
    VAVoteHistory.ABSENTEE == 'False';
novGenInPersonDayOf2020 = VAVoteHistory.IDENTIFICATION_NUMBER(ngIPDOidx);
ua2020 = unique(countedAbs2020);
ub2020 = unique(novGenInPersonDayOf2020);
uc2020 = intersect(ua2020,ub2020);
numCountedAbs2020 = numel(countedAbs2020)
numCountedAbs2020AndInPDayOf2020 = numel(uc2020)

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% How many absentee ballots were approved and counted AND the
% person is marked as having voted IN PERSON on election day. (Should be 0)
% For 2021:
countedAbs2021 = dal2021.identification_number(appmv2021);
ngIPDOidx = VAVoteHistory.ELECTION_NAME=='2021 November General' & ...
    VAVoteHistory.ELECTION_YEAR==2021 & ...
    VAVoteHistory.ABSENTEE == 'False';
novGenInPersonDayOf2021 = VAVoteHistory.IDENTIFICATION_NUMBER(ngIPDOidx);
ua2021 = unique(countedAbs2021);
ub2021 = unique(novGenInPersonDayOf2021);
uc2021 = intersect(ua2021,ub2021);
numCountedAbs2021 = numel(countedAbs2021)
numCountedAbs2021AndInPDayOf2021 = numel(uc2021)
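
To make the "phantom" comparisons above easier to follow, here is a minimal sketch of the same set logic on toy data. The ID values are invented, and using setdiff in place of the intersect-and-subtract approach in the listing is my substitution for illustration; on unique ID sets the two should give the same count.

```matlab
% Toy IDs, invented for illustration only.
dalIds = ["1001"; "1002"; "1002"; "1003"];  % DAL entries (duplicates possible)
rvlIds = ["1002"; "1003"; "1004"];          % registered voter list IDs

uDal = unique(dalIds);                      % distinct voters in the DAL subset
notInRvl = setdiff(uDal, rvlIds);           % DAL IDs with no RVL match

nNotInRvl = numel(notInRvl)                 % 1 (only "1001" is unmatched)
pctNotInRvl = nNotInRvl / numel(uDal) * 100 % about 33.33
```

Note that the counts are of distinct ID numbers, not of DAL rows, so a voter with several DAL entries is only counted once.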