Categories
Election Data Analysis mathematics programming technical

Derivation of Expected number of String Collisions in VA Registered Voter Data

Below I present the theory and derivation behind the expected value of 11 collisions (+/- 3) mentioned in my posts discussing string distance analysis (here and here). I’ve tried to make the derivation as digestible as possible, with accessible references, but it is admittedly still a very technical read. I think it’s important to “show my work” on the subject, though, so I present it here and am happy to take comments and criticism (contact).

Q: How much of a chance do we actually have of getting an exact (Hamming distance of 0) collision in the full name and full date of birth? There is a similar and well-known probability puzzle that asks how many randomly chosen people you need in order to have roughly a 50% chance that 2 of them share the same birthday (not including the year of birth). This is known as the “Birthday Problem” in probability theory and, rather surprisingly, the answer is that you only need about 23 people in your population sample to have a 50% probability that 2 of those people will share a day-of-year of birth. To quote the Wikipedia article on the matter: “… While it may seem surprising that only 23 individuals are required to reach a 50% probability of a shared birthday, this result is made more intuitive by considering that the birthday comparisons will be made between every possible pair of individuals. With 23 individuals, there are 23 × 22/2 = 253 pairs to consider, far more than half the number of days in a year.” The same mathematics of the birthday problem is the basis of the Birthday Attack cryptographic exploit, and it is therefore a well-studied problem in cryptography and cyber security.

Figure 1: The computed probability of at least two people sharing a birthday versus the number of people. A recreation of the classic “Birthday Problem”.
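For readers who want to reproduce the classic result, here is a minimal Python sketch of the direct computation under the uniform IID assumption. The function names are my own illustration and are not taken from the code used to generate the figures.

```python
def birthday_collision_probability(k: int, n_states: int = 365) -> float:
    """Exact P(at least one shared value) for k uniform IID draws from n_states."""
    if k > n_states:
        return 1.0  # pigeonhole: a collision is guaranteed
    p_all_distinct = 1.0
    for i in range(k):
        # The (i+1)-th person must avoid the i values already taken
        p_all_distinct *= (n_states - i) / n_states
    return 1.0 - p_all_distinct


def crossover_population(n_states: int = 365, target: float = 0.5) -> int:
    """Smallest k whose collision probability reaches the target."""
    k = 1
    while birthday_collision_probability(k, n_states) < target:
        k += 1
    return k


print(crossover_population(365))  # 23
```

Passing a larger `n_states` to `crossover_population` applies the same calculation to a bigger space of possible states.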

Now, as interesting as the toy birthday problem described above is, it is oversimplified for the problem we are looking at here. Firstly, the problem setup assumes independent and identically distributed random variables (i.e. an “IID” set of variables). While this is not exactly the case, the IID assumption provides for a computable first order estimate, and in the case of the classical birthday problem the estimate has been shown to be fairly accurate under experimental conditions.

Secondly, when we start additionally considering the year of birth, or sharing of first names, middle names and last names, things get much more complicated to compute, but the method is the same. We want to determine the probability of 2 people sharing the same First Name, Middle Name, Last Name, Suffix, Month-of-Birth, Day-of-Birth and Year-of-Birth in the population of unique registrants in the Registered Voter List. This means that in addition to the 365 day-of-birth possibilities, we need to estimate the number of possible years to cover, the number of possible first names, the number of possible middle names, the number of possible last names, the number of possible suffix strings and then include these possibilities into the same formulation as the birthday problem setup.

For determining how many years we should cover, I will simply use the average life expectancy of approximately 79 years. We can therefore update our N value of the birthday problem from 365 to 365 * 79 = 28835. When we perform the same analysis as the standard birthday problem with just this new parameter included, we end up needing about 200 people in our sample population to have a 50% probability of 2 people having a match.

Figure 2: The computed probability of at least two people sharing a birthday versus the number of people in the sample population. A recreation of the classic “Birthday Problem”, but we’ve updated the analysis to include the year of birth, and assumed the average life expectancy of 79 years. This moves the 50% crossover point to a population size of 200 from 23 for the standard Birthday Problem setup.

A similar analysis can be done with the number of names being considered, etc. For each (assumed independent and uniform) variable we add to the setup, we multiply the number of possible states (N) by the number of unique variable settings.

We can estimate the universe of possible names using the frequentist method from the RVL data itself: We know that we have 6,127,859 unique voter ID’s in the RVL, and there are 14 unique SUFFIX entries, 291,368 unique FIRST names, 405,591 unique MIDDLE names, and 465,185 unique LAST names. So multiplying out 365 x 79 x 14 x 291368 x 405591 x 465185 = 2.22 x 10^22 potential states to consider.
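As a quick check of the arithmetic, the product can be reproduced with exact integer math. This is a small Python sketch; the dictionary keys are just labels I chose, while the counts are the ones quoted above.

```python
import math

# Unique-value counts quoted in the text (days and years are the assumed
# 365 days-of-year and 79-year life expectancy, not counts from the RVL).
counts = {
    "days": 365,
    "years": 79,
    "suffix": 14,
    "first": 291368,
    "middle": 405591,
    "last": 465185,
}

# Under the independence assumption the number of states is the product
n_states = math.prod(counts.values())
print(f"{n_states:.2e}")
```

Python integers do not overflow, so the product is exact before it is formatted for display.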

Now unfortunately, as we start dealing with bigger and bigger N values, it becomes harder and harder for computers to maintain the precision necessary for direct computation, eventually resulting in Infinite or divide-by-zero answers as the probabilities get smaller and smaller. So let’s begin by first determining if we can find the 50% crossover point for the unique voter ID population size. We find that we only need 410 unique First, Middle, and Last names (each) for the 50% crossover point to land at roughly the number of unique voter IDs in the RVL.

Figure 3: The computed probability of at least two people sharing a first name, middle name, last name, suffix, month-of-birth, day-of-birth, year-of-birth versus the number of people in the sample population. This assumes the Nyears = 79, Nsuffix = 14, Nfirst = 410, Nmiddle = 410, Nlast = 410.

As we increase the number of unique (first, middle, last) names under consideration, we find that we very quickly reduce the probability to near zero (again … this is assuming an IID set of variables … more on that later). In fact we only need to assume that there are 1300 unique first names, middle names and last names before the probability drops to under 1%. This is two full orders of magnitude below the actual number of unique first names, middle names and last names (each) that we find by simple examination of the RVL file, so the actual probability of a collision under these conditions should be much, much, much lower. While not exactly zero, it is computationally indistinguishable from zero given machine precision. Note (again) that this formulation is still simplified in that it assumes a uniform distribution within the N possible states, but it still serves to give a first order approximation and sanity check.

Figure 4: The computed probability of at least two people sharing a first name, middle name, last name, suffix, month-of-birth, day-of-birth, year-of-birth versus the number of people in the sample population. This assumes the Nyears = 79, Nsuffix = 14, Nfirst = 1300, Nmiddle = 1300, Nlast = 1300.

As we start approaching the limit of computational precision we have to resort to approximation methods for computing the very small, but non-zero probability of collision given the actual number of unique first, middle and last names observed in the RVL dataset. We can use the Taylor series expansion for small powers in order to do this, and our equation for computing the probability becomes: Pb = 1 – exp(-k*(k-1) / (2 *N)).
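The approximation is short enough to sketch in a few lines of Python. The helper names are hypothetical, and the use of `math.expm1` is my own choice for keeping precision when the exponent is tiny; it is not necessarily how the figures in this post were produced.

```python
import math


def approx_collision_probability(k: int, n_states: float) -> float:
    """Pb = 1 - exp(-k*(k-1) / (2*N)), evaluated as -expm1(-x)."""
    x = k * (k - 1) / (2.0 * n_states)
    return -math.expm1(-x)


def approx_crossover(n_states: float, target: float = 0.5) -> float:
    """Solve 1 - exp(-k^2 / (2N)) = target for k, taking k*(k-1) ~ k^2."""
    return math.sqrt(2.0 * n_states * math.log(1.0 / (1.0 - target)))


# Sanity check against the classic birthday problem (23 people, 365 days)
print(round(approx_collision_probability(23, 365), 3))  # 0.5
```

Calling `approx_crossover` with the full state count N gives the approximate number of records needed to reach a 50% collision probability.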

Replicating our earlier example in Figure 4 above with Nfirst == Nmiddle == Nlast == 1300 to show the comparison of the Taylor expansion to the explicit computation produces the graphic in Figure 5 below. We see that the small value approximation is close, but slightly over-estimates the directly computed probability for IID variables.

Figure 5: The computed probability of at least two people sharing a first name, middle name, last name, suffix, month-of-birth, day-of-birth, year-of-birth versus the number of people in the sample population. This assumes the Nyears = 79, Nsuffix = 14, Nfirst = 1300, Nmiddle = 1300, Nlast = 1300.

When we perform this Taylor series approximation and look for the number of records required to obtain a 50% probability that any 2 records would match given our updated universe of possible matches, we end up requiring K = 176,000,000,000, or 176 Billion records. When we again try to evaluate the Taylor series for the explicit number of unique Voter ID’s present in the RVL file, which is just over 6M, we again obtain a number that is computationally indistinguishable from 0. (To be absolutely meticulous … it’s a bigger number that is indistinguishable from 0 than we previously computed, but it is still indistinguishable from zero.)

Figure 6: The computed probability of at least two people sharing a first name, middle name, last name, suffix, month-of-birth, day-of-birth, year-of-birth versus the number of people in the sample population. This assumes the Nyears = 79, Nsuffix = 14, Nfirst = 291368, Nmiddle = 405591, Nlast = 465185, and the computed 50% crossover point occurs at approximately 176 Billion samples required.

Another implementation note: In order to explicitly code the above direct computations we also need to do some clever tricks with logarithms in order to avoid numerical overflow / underflow issues as much as possible. The formula for computing the permutations, which is N! / (N-k)! = N x (N-1) x … x (N-k+1), can have numerical issues when N becomes large. However, if we take the base-10 logarithm of the equation, we can use the product and quotient rules of logarithms to compute the result and avoid numerical overflow: log10( N! / (N-k)! ) = log10(N!) – log10((N-k)!) = log10(N) + log10(N-1) + … + log10(N-k+1), which is a much more stable computation.

We can perform a similar trick in order to compute the denominator of N^k by using the power property of logarithms such that log10( N^k ) = k x log10(N).

You must of course remember to reverse the logarithm once you’ve computed the log-sums. So the final computation of Pb becomes the following:

Vnr = log10(N) + log10(N-1) + … + log10(N-k+1), where N is the number of possible states N = 365 x Nyears x Nfirst x Nmiddle x Nlast x Nsuffix and k is the number of records.

Vt = k x log10(N)

Pa_log10 = Vnr – Vt = log10(Pa), where Pa = [N! / (N-k)!] / N^k is the probability that no two records collide.

Pb = 1 – 10^(Pa_log10)
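The log-sum recipe above can be sketched in Python as follows. The variable names mirror Vnr, Vt and Pa_log10 from the text; the function name is my own, and the loop is only practical for modest k.

```python
import math


def collision_probability_log10(k: int, n_states: int) -> float:
    """Collision probability via base-10 log sums, avoiding huge factorials."""
    if k > n_states:
        return 1.0  # more records than states: a collision is certain
    # Vnr = log10(N) + log10(N-1) + ... + log10(N-k+1)
    v_nr = sum(math.log10(n_states - i) for i in range(k))
    # Vt = k * log10(N), the log of the denominator N^k
    v_t = k * math.log10(n_states)
    # log10 of the probability that all k records are distinct
    pa_log10 = v_nr - v_t
    # Reverse the logarithm and take the complement
    return 1.0 - 10.0 ** pa_log10


print(round(collision_probability_log10(23, 365), 4))  # 0.5073
```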

Updating from uniform distributions to non-uniform distributions

So what happens when we take into account the fact that names and birthdays are not uniformly distributed? (e.g. the last name of “Smith” is more frequent than “Sandeval”) This fact increases the probability of a collision occurring in the dataset. This increase also makes intuitive sense as we can anecdotally observe that coincident names and birthdates, while still rare … do actually happen in real life with common names.

However, in the non-uniform case we don’t have nearly as nice a closed-form set of formulas for computing the probability. What we can do instead to estimate the probability is perform a number of Monte Carlo simulations of selecting K values from the weighted possibilities, and determine how many collisions occurred in each simulation trial. By setting K equal to the number of unique Voter ID values in the RVL dataset, we can directly answer, via simulation, the question of “how many collisions of First+Middle+Last+Suffix+DOB should we expect when looking at the VA Registered Voter List file?”

We can determine the weightings for each variable easily enough from the distributions of unique values in the data itself.
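To make the idea concrete, here is a minimal Python sketch of deriving weights from observed value counts and using them in one simulated trial. It is an illustration only: the function names and toy inputs are hypothetical, and the actual analysis was done with the MATLAB function described in this post.

```python
import random
from collections import Counter


def weights_from_column(values):
    """Return (labels, weights), where each weight is the observed count."""
    counts = Counter(values)
    labels = list(counts)
    weights = [counts[label] for label in labels]
    return labels, weights


def count_collisions(k, fields, rng):
    """Draw k synthetic records, one independent weighted draw per field,
    and count how many distinct records occur more than once."""
    columns = [rng.choices(labels, weights=weights, k=k)
               for labels, weights in fields]
    record_counts = Counter(zip(*columns))
    return sum(1 for n in record_counts.values() if n > 1)


# Toy, fictitious column of last names (not real voter data)
labels, weights = weights_from_column(["SMITH", "SMITH", "JONES"])
print(labels, weights)  # ['SMITH', 'JONES'] [2, 1]
```

Repeating `count_collisions` over many trials and taking the mean and standard deviation of the results gives the kind of estimate quoted below.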

The below MATLAB weightedCollisionSim(…) function is a program that can be used to perform this analysis. It assumes that the RVL table object is a global variable to set up the trials, and uses the MATLAB built-in randsample(…) function to perform each draw.

After 100 simulation runs, the results are that for the K=6,127,859 unique voter ID’s in the RVL, we should expect to have an average of about 11 collisions at Hamming distance of 0, with a standard deviation of roughly 3.

I will note that as a validation and verification step, the MATLAB simulation code below, when used with uniform sampling, produces similar results to what we analytically derived above.

function [p,m,s] = weightedCollisionSim(k,ntrials,varargin)
% To compute the probability the ntrials must be >> 1:
% [p,m,s] = weightedCollisionSim(k,ntrials,values1,weights1,...,valuesN,weightsN)
% [p,m,s] = weightedCollisionSim(k,ntrials,Nvalues1,weights1,...,NvaluesN,weightsN)
%
% OUTPUTS:
% p = Probability of a collision
% m = mean number of collisions
% s = standard deviation of collisions

if nargin == 0
    global rvl; % Assume the RVL is an available global var

    ntrials = 100; % Number of trials
   
    % Population size set as num of unique voter IDs in RVL
    npop = numel(unique(rvl.IDENTIFICATION_NUMBER));

    % Convert the DOB strings to datetime objects
    dob = datetime(rvl.DOB);

    % How many unique days of the year are there?
    [ud,uda,udb] = unique(day(dob,'dayofyear'));
    % How often do they occur?
    nud = accumarray(udb,1,[numel(ud),1]);
    Ndays = numel(ud);

    % How many unique years of birth are there?
    [uy,uya,uyb] = unique(year(dob));
    % How often do they occur?
    nuy = accumarray(uyb,1,[numel(uy),1]);
    Nyears = numel(uy);

    % How many unique suffix strings are there?
    [us,usa,usb] = unique(rvl.SUFFIX);
    % How often do they occur?
    nus = accumarray(usb,1,[numel(us),1]);
    Nsuffix = numel(us);

    % How many unique first names are there?
    [uf,ufa,ufb] = unique(rvl.FIRST_NAME);
    % How often do they occur?
    nuf = accumarray(ufb,1,[numel(uf),1]);
    Nfirst = numel(uf);

    % How many unique middle names are there?
    [um,uma,umb] = unique(rvl.MIDDLE_NAME);
    % How often do they occur?
    num = accumarray(umb,1,[numel(um),1]);
    Nmiddle = numel(um);

    % How many unique last names are there?
    [ul,ula,ulb] = unique(rvl.LAST_NAME);
    % How often do they occur?
    nul = accumarray(ulb,1,[numel(ul),1]);
    Nlast = numel(ul);
        
    % Initializing the weighting vectors
    w0 = nus;
    w1 = nud;
    w2 = nuy;
    w3 = nuf;
    w4 = num;
    w5 = nul;

    % Recursively compute results and return
    [p,m,s] = weightedCollisionSim(npop,ntrials,1:Nsuffix,w0,1:Ndays,w1,1:Nyears,w2,...
        1:Nfirst,w3,1:Nmiddle,w4,1:Nlast,w5);
    return
end

if nargin < 2 || isempty(ntrials)
    ntrials = 1;
end

nc = zeros(ntrials,1);
for j = 1:ntrials
    fprintf('Trial %d\n',j);
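    % One column of sampled values per variable (suffix, day, year, ...)
    y = zeros(k,numel(varargin)/2);
    col = 1; % Column index into y (kept separate from the output m)
    for i = 1:2:numel(varargin)
        w = varargin{i+1};
        v = varargin{i};
        if ~isempty(w) && isvector(w)
            % Non-uniform weightings
            y(:,col) = randsample(v,k,true,w);
        else
            % Uniform sampling
            y(:,col) = randsample(v,k,true);
        end
        col = col+1;
    end
    % Group identical rows and count the groups that occur more than once
    [u,~,ib] = unique(y,'rows');
    nu = accumarray(ib,1,[size(u,1),1]);
    nc(j) = sum(nu > 1);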
end
p = mean(nc>0);
m = mean(nc);
s = std(nc);
Categories
Election Data Analysis Election Forensics Election Integrity Interesting programming technical

FEC Filing Summaries Supporting James O’Keefe’s Recent Revelation

Per the recent James O’Keefe video documenting incredible amounts of contributions by individuals to political committees, I took a few minutes to download the public FEC data in bulk format and collated all of the individuals that had more than 100 donations to each of the following organizations: DNC, DCCC, DSCC, RNC, NRSC, NRCC, WinRed, ActBlue in the 2021-2022 Election period.

I made sure to account for and remove those records of contributions that had been (legally) returned due to over-contribution, or earmarked for campaign committee legal or facility funds, etc., which are exempt from campaign finance limits.

I hope it helps James. I’m not going to try and do any sort of analysis on this data, as I’ve got plenty to do regarding the IT of our elections, but I wanted to help where I could. Any questions, or if you would like the raw transaction data, please feel free to contact me, or if you have a particular committee ID that you would like the information for, just let me know. I am happy to help.

Categories
Election Data Analysis Election Forensics Election Integrity Interesting programming technical

Potential duplicate registrants in VA voter list via Hamming Distance

Using the 2022-11-23 Registered Voter List (RVL) and the 2023-01-26 Voter History List (VHL) purchased from the VA Department of Elections (ELECT) I wrote up an analysis script to check for potentially duplicated registrant records in the RVL and cross reference duplicate pairings with the VHL to identify potential duplicate votes. This was my initial attempt at quantifying the number of potentially duplicate records in the RVL, and I have since updated the code to use a more rigorous Levenshtein distance metric, as well as making improvements to the parsing routines, bugfixes, etc. The details of the Hamming distance work are summarized below, and left up here for reference. For the latest and up to date information, please see the newer article posted here.

Errata note: One of the code bugs I discovered was that some of the entries did not actually get checked as they were accidentally skipped, so the numbers below are lower than the numbers presented in the newer work.

Please note that I will not publish voter Personally Identifiable Information (PII) on this blog. I have substituted fictitious PII for all examples given below, and cryptographically hashed all voter information in the downloadable results file. I will make the detailed information available upon request to those that have the authorization to receive and process voter data (contact us).

Summary of Results:

We should mathematically expect approximately 11 exact string collisions in the full RVL dataset when comparing (First Name + Middle Name + Last Name + Suffix + Full DOB), but instead we see 1982 such collisions, which is roughly 180 times the expected value (over two orders of magnitude). While it’s possible that some of these collisions are false positives, quite a number of them deserve further scrutiny.

Method:

For every entry in the latest RVL, I performed a string distance comparison, based on Hamming distance, between every possible pair of strings of (FIRST NAME + MIDDLE NAME + LAST NAME + SUFFIX + FULL DOB). So for the ~6M different RVL entries, we need to compute ~3.6 x 10^13 different string comparisons. A Hamming distance of 0 indicates the strings being compared are identical, a Hamming distance of 1 indicates that there is a single character different between the two strings, a Hamming distance of 2 indicates 2 characters are different, etc. This obviously is a very computationally intensive process and it took over two days to complete the processing, once I got the bugs worked out. (I’ve been quietly working on this one for a while now … )

Note that the Hamming distance only compares each respective position in a string and does not account for adding or removing a character completely from a string. A metric that does include addition and subtraction is the Levenshtein Edit Distance, which is a much more computationally expensive (but more rigorous) metric. The two are related in that, for strings of equal length, the Hamming distance is mathematically an upper bound on the Levenshtein distance. I haven’t yet finished making an optimized GPU accelerated version of the Levenshtein edit distance metric, but it is in the works and I will redo this analysis with the new metric once that is completed.

I aggregated all of the Hamming distance pairings that were less than or equal to 3 characters different in order to identify potential (key word) duplicated registrants, and additionally for each pairing looked at the voter history information for each registrant in the pair to determine if there was a potential (again … key word) for multiple ballots to be cast by the same person in any given election.  As we allow for more characters to be different, we potentially are including many more likely false positive matches, even if we are catching more true positives.

For example: At a Hamming distance of 4 the strings of “Dave Joseph Smith M 10/01/1981” and “Tony Joseph Smith M 10/01/1981” at the same address would produce a potential match, but so would “Davey Joseph Smith M 10/01/1981” and “David Josiph Smith M 10/02/1981”. The first pair is more likely to be a false positive due to twins, while the second is more likely to be due to typos, mistakes, or use of nicknames and might warrant further investigation. A much stronger potential match would be something like “David Josiph Smith M 10/01/1981” and “David Joseph Smith M 10/01/1981”, with a Hamming distance of 1 at the same address. In an attempt to limit false positives, I have clamped the Hamming distance checks to <= 3 in this analysis.
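For reference, a position-by-position Hamming comparison can be sketched in Python as below. This is not the optimized code used for the analysis. Counting each leftover character of the longer string as one extra mismatch is an assumption on my part, and other conventions for unequal lengths exist.

```python
def hamming_distance(a: str, b: str) -> int:
    """Count positions at which two strings differ.

    Assumption: when the lengths differ, each leftover character in the
    longer string counts as one additional mismatch.
    """
    mismatches = sum(ca != cb for ca, cb in zip(a, b))
    return mismatches + abs(len(a) - len(b))


# Fictitious example strings taken from the text above
print(hamming_distance("David Josiph Smith M 10/01/1981",
                       "David Joseph Smith M 10/01/1981"))  # 1
print(hamming_distance("Dave Joseph Smith M 10/01/1981",
                       "Tony Joseph Smith M 10/01/1981"))   # 4
```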

One of the drawbacks of using Hamming distance over a more complete metric such as Levenshtein is that the Hamming distance would give a very high score to, and would therefore filter out of our results, an example pairing such as: “David Joseph Smith M 10/01/1981” and “Dave Joseph Smith M 10/01/1981”. The change from “id” to “e” adds/subtracts a character, making the rest of the characters in the remainder of the string shift position and also not match. A Levenshtein metric would correctly return a small distance of 2, whereas the Hamming distance returns 27. (As mentioned earlier, I am working on a Levenshtein implementation, but it is not yet complete.)
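To illustrate the difference, here is a textbook dynamic-programming sketch of the Levenshtein edit distance in Python. It is a plain reference version for small inputs and is not the GPU-accelerated implementation mentioned above.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            substitution_cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,                      # delete ca
                current[j - 1] + 1,                   # insert cb
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("David Joseph Smith M 10/01/1981",
                           "Dave Joseph Smith M 10/01/1981"))  # 2
```

The inner loop makes the cost proportional to the product of the two string lengths, which is why this metric is so much more expensive than a Hamming comparison when run over every pair of records.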

Note that with the official records obtained from ELECT, and in accordance with the laws of VA, I do not have access to the social security number or driver’s license number for each registration record, which would help in identifying and discriminating potential duplicate errors vs things like twins, etc. I only have the first name, middle name, last name, suffix, month of birth, day of birth, year of birth, gender, and address information that I can work with. I can therefore only take things so far before someone else (with investigative authority and ability to access those other fields) would need to step in and confirm and validate these findings.

Results:

The summary totals are as follows, with detailed examples.

Hamming Distance | 0 | 1 | 2 | 3
Number of Potential Duplicate Registrant Pairs | 1982 | 3276 | 21864 | 120642
Number of Potential Duplicate Ballots | 1103 | 2483 | 12101 | 75872

According to my derivations and simulations (now described in detail in a separate post, linked at the end of this article), we should only expect to see an average of 11 (+/- 3) potential duplicate pairs (a.k.a. “collisions”) at a Hamming distance of 0. This is over two orders of magnitude different than what we observe in the compiled results table above. Such a discrepancy deserves further investigation and verification.

Examples of Types of Issues Observed:

NOTE THE BELOW INFORMATION HAS HAD THE VOTER PERSONALLY IDENTIFIABLE INFORMATION (“PII”) FICTIONALIZED. WHILE THESE ARE BASED ON REAL DATA TO ILLUSTRATE THE DIFFERENT TYPES OF OBSERVATIONS, THEY DO NOT REPRESENT REAL VOTER INFORMATION.

Example #1: The following set of records has the exact match (Hamming Distance = 0) of full name and full birthdate (including year), but different address and different voter ID numbers AND there was a vote cast from each of those unique voter ID’s in the 2020 General Election.  While it’s remotely possible that two individuals share the exact same name, month, day and year of birth … it is probabilistically unlikely (see section below on mathematical derivation of probabilities if interested), and should warrant further scrutiny.

Voter Record A:

AMY BETH McVOTER 12/05/1970 F 12345 CITIZEN CT

Voter Record B:

AMY BETH McVOTER 12/05/1970 F 5678 McPUBLIC DR

Example #2: This set of records has a single character different (Hamming distance of 1) in their first name, but middle name, last name, birthdate and address are identical AND both records are associated with votes that were cast in the 2020, 2021, and 2022 November General Elections.  While it is possible that this is a pair of 23 year old twins (with same middle names) that live together, it at least bears looking into.

Voter Record A:

TAYLOR DAVID VOTER 02/16/2000 M 6543 OVERLOOK AVE NW

Voter Record B:

DAYLOR DAVID VOTER 02/16/2000 M 6543 OVERLOOK AVE NW

Example #3: This set of records has two characters different (Hamming distance of 2) in their birthdate, but name and address are identical AND the birth years are too close together for a child/parent relationship, AND both records are associated with votes that were cast in the 2020 and 2022 November General Elections. 

Voter Record A:

REGINA DESEREE MACGUFFIN 02/05/1973 F 123 POPE AVE

Voter Record B:

REGINA DESEREE MACGUFFIN 03/07/1973 F 123 POPE AVE

Example #4: This set of records has again a single character different (Hamming distance of 1) in the first name (but not the first letter this time) and the last name, birthdate and address are identical.  There were also multiple votes cast in the 2019 and 2022 November General from these registrants.

Voter Record A:

EDGARD JOHNSON 10/19/1981 M 5498 PAGELAND BLVD

Voter Record B:

EDUARD JOHNSON 10/19/1981 M 5498 PAGELAND BLVD

Example #5: This set of records has two characters different (Hamming distance of 2) in the first and middle names and the last name, birthdate, gender and address are identical.  There were also multiple votes cast in the 2021 and 2022 November General from these registrants. Again it is possible that these records represent a set of twins given the information that ELECT provides.

Voter Record A:

ALANA JAVETTE THOMPSON 01/01/2003 F 123 CHARITY LN

Voter Record B:

ALAYA YAVETTE THOMPSON 01/01/2003 F 123 CHARITY LN

Example #6: The following set of records has the exact match (Hamming Distance = 0) of full name and full birthdate (including year), and same address but different voter ID numbers. There were no duplicated votes in the same election detected between the two ID numbers.

Voter Record A:

JAMES TIBERIUS KIRK 03/22/2223 M 1701 Enterprise Bridge

Voter Record B:

JAMES TIBERIUS KIRK 03/22/2223 M 1701 Enterprise Bridge

Example #7: The following set of records has the exact match (Hamming Distance = 0) of full name and full birthdate (including year), same address but different gender and voter ID numbers. There were no duplicated votes in the same election detected between the two ID numbers.

Voter Record A:

MAXWELL QUAID CLINGER 11/03/2004 M 4077 MASH DR

Voter Record B:

MAXWELL QUAID CLINGER 11/03/2004 U 4077 MASH DR

Results Dataset:

A full version of the aggregated Excel data is provided below; however, all voter information (ID, first name, middle name, last name, dob, gender, address) has been removed and replaced by a one-way hash number, with randomized salt, based on the voter ID. The full file with specific voter information can be provided upon request to parties authorized by ELECT to receive and process voter information, Election Officials, or Law Enforcement.

On the mathematical probability of matches:

2023-05-27: I have moved my derivation of the expected value of the number of collisions to a separate post, available here.

Categories
Election Data Analysis Election Forensics Election Integrity technical

Discrepancies between official VA turnout report and Voter History information

Now that the Voter History List and corresponding list of those who voted in the 11-08-2022 November General election has been released by the VA dept. of elections (“ELECT”), we have compiled and compared the results from the official turnout statistics on the ELECT website (here) to the information contained in both the Daily Absentee List (“DAL”) and Voter History List (“VHL”) information.

These are all “official” datasets provided by ELECT, and they should theoretically all match. However, there are a number of discrepancies in the official records that need to be explained. Performing simple differences between the accumulated totals for each locality exposes these issues. For example:

  • Why do the official results on the ELECT website show the total number of ballots cast for Albemarle County to be 11,105 less than the number of ballots recorded in the VHL? Why do the DAL file records show 11,429 more cast absentee (Early or Mail) ballots than the ELECT official results in Albemarle County? A: The Albemarle electoral board looked into this matter when we brought it to their attention, and they reported that the registrar had made a transcription error and submitted the wrong line when submitting the election results to ELECT. While mistakes do happen, this explanation still raises the question of why ELECT did not catch or flag these discrepancies, and what procedures are in place to prevent, catch or rectify these issues.
  • Why does Richmond City show 25,500 more ballots cast in the official turnout results on the ELECT website than are present in the VHL? Why are there 325 more absentee (Early or Mail) ballots in the ELECT website data than the DAL file?
  • Why is the sum of the absolute value of discrepancies between the VHL and the official results on the ELECT website 69,381?
  • Why is the sum of the absolute value of discrepancies in absentee ballots (Early or Mail) between the DAL and the official results on the ELECT website 28,515?
  • Why do the VHL data records contain zero “Election Day” records for Covington City?
  • … etc
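The "simple differences" behind the questions above can be sketched in a few lines of Python. The locality names and totals below are invented purely for illustration. The sign convention (official total minus VHL total) is chosen to match the later remark in this post that a VHL count below the official count shows up as a positive discrepancy.

```python
def locality_discrepancies(official: dict, vhl: dict) -> dict:
    """Per-locality difference: official ballots cast minus VHL ballots cast."""
    localities = sorted(set(official) | set(vhl))
    return {name: official.get(name, 0) - vhl.get(name, 0)
            for name in localities}


def total_absolute_discrepancy(diffs: dict) -> int:
    """Sum of the absolute values of the per-locality differences."""
    return sum(abs(d) for d in diffs.values())


# Invented example totals (not real election data)
official = {"Locality A": 1000, "Locality B": 500, "Locality C": 250}
vhl = {"Locality A": 990, "Locality B": 520, "Locality C": 250}

diffs = locality_discrepancies(official, vhl)
print(diffs)  # {'Locality A': 10, 'Locality B': -20, 'Locality C': 0}
print(total_absolute_discrepancy(diffs))  # 30
```

Summing absolute values rather than signed values keeps positive and negative discrepancies in different localities from cancelling each other out.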

We have notified a number of local electoral board members, registrars and election integrity groups, and postponed publishing this data until after they were able to validate the existence of these discrepancies, in the hopes that they can discover the source of the discrepancies and rectify the issues.

As a further dive into the data, we also computed and noted the following additional issues. Documentation on these specific discrepancies is available upon request by entities that are able to receive and process election data according to VA law and ELECT requirements. We will not publish the details publicly on this blog as they reveal personal voter information, but will give the summary results below. Please contact us for further information.

  • There were 3,354 records of valid absentee ballots cast recorded in the VHL that did not have corresponding entries (matched by voter ID) in the DAL record.
  • There were also 19,723 records of valid absentee ballots cast recorded in the DAL that did not have corresponding entries (matched by voter ID) in the VHL record.
  • The VHL contained 6 records of ballots cast in the VA Nov 2022 election that do not have a corresponding registration record in the Registered Voter List (RVL) dated 11/23/2022.
  • The VHL contained 93 records of ballots cast in any election that do not have a corresponding registration record in the Registered Voter List (RVL) dated 11/23/2022.
  • The 11/23/2022 statewide RVL contained 1,197 records that had duplicate records with different voter IDs, based on performing an exact match to first name, middle name, last name, suffix, full date-of-birth, and gender.
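The cross-checks in the list above boil down to set differences on voter ID and a group-by on identity fields. Here is a minimal Python sketch under assumed field names (`voter_id`, `first`, `middle`, `last`, `suffix`, `dob`, `gender`), which are hypothetical and not the actual column names in the ELECT files. All records shown are fictitious.

```python
from collections import defaultdict


def unmatched_ids(source_ids, reference_ids):
    """Voter IDs present in source_ids with no matching entry in reference_ids."""
    return set(source_ids) - set(reference_ids)


def duplicate_identity_groups(records):
    """Group records by exact (name, suffix, DOB, gender) match and keep the
    groups that span more than one distinct voter ID."""
    groups = defaultdict(set)
    for rec in records:
        key = (rec["first"], rec["middle"], rec["last"],
               rec["suffix"], rec["dob"], rec["gender"])
        groups[key].add(rec["voter_id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Fictitious records for illustration only
records = [
    {"voter_id": 1, "first": "AMY", "middle": "BETH", "last": "MCVOTER",
     "suffix": "", "dob": "12/05/1970", "gender": "F"},
    {"voter_id": 2, "first": "AMY", "middle": "BETH", "last": "MCVOTER",
     "suffix": "", "dob": "12/05/1970", "gender": "F"},
    {"voter_id": 3, "first": "TAYLOR", "middle": "DAVID", "last": "VOTER",
     "suffix": "", "dob": "02/16/2000", "gender": "M"},
]

print(unmatched_ids([1, 2, 3], [2, 3, 4]))      # {1}
print(len(duplicate_identity_groups(records)))  # 1
```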

Background and Methodology:

The DAL file is supposed to track all of the transactions for the non-election day ballots for a specific election. This includes any Mail-In or Early In-Person ballots. (Any non-election day ballot is considered to be an “absentee” ballot in VA.) It is updated daily throughout the course of the election and is finalized once the election is certified. The VA 2022 election was certified by all of the localities by 11-18-2022. The last version of the DAL file we purchased and obtained from ELECT and used for this analysis was dated 12/13/2022, and it had been unchanged for a number of weeks by that point. We downloaded and archived copies of the DAL file multiple times per day in order to track the changes near-real-time over the course of the election cycle. (The day-to-day changes to the DAL records over time are documented in other blog entries.)

The VHL is updated by election officials after the election has concluded by adding the “voter credit” into VERIS (the state election database) for all registered voters that cast a ballot in the election (either absentee or on election day). The VHL contains information covering the last 4 years of election history for each currently registered voter. As of mid-January we were informed that all of the voter credit updates had been completed in the VERIS database for the 2022 General Election. (Why isn’t this done automatically, similar to the DAL file entries, and made available before election certification?) The version of the VHL that we purchased and obtained from ELECT and used for this analysis is from 1/26/2023. We specifically tried to time our purchase of the VHL to occur after the voter credit updates had been completed but before the scheduled maintenance process of removing longstanding “inactive” voters had taken place. We note that even if some “Inactive” voters had been removed from the VHL by the time we purchased the VHL, that should not impact this analysis, as the act of voting in the 2022 election would require the voter’s status to be listed as “Active”.

The “official” turnout report page on the ELECT website (here, again) was downloaded and converted to an Excel spreadsheet on 2/4/2023. A note on the page states that its content was last updated on 12/06/2022. (We have archived a PDF copy of the site as we downloaded it, attached at the bottom of this post for reference.) All of the local electoral boards had completed their canvass by 11/18/2022 and submitted their information to ELECT, so there is no reason that this information should be missing any data.

The official turnout results from ELECT tabulate the Early Voting, Election Day, By Mail, and Provisional ballots cast and accounted for during the 2022 General Election. The DAL file includes information as to the Mail-In, Early Voting, and Absentee Provisional ballots cast. The VHL contains information as to all voters who cast a ballot in the election, broken down by Election Day, Absentee (either Early or Mail-In), and Provisionals for each sub-group. Theoretically all of these data files should match exactly.

We note that there is a known issue that could potentially, but rarely, cause the number of ballots reported in the VHL to be strictly less than the number of ballots in the official results reported on the ELECT webpage. Due to the time delay between the close of the election and the update and publication of the VHL, there can be registrants who have been legitimately removed (due to death, etc.) during that interval, and their corresponding voting record is also removed when that happens. In the data presented below, this would manifest as a positive (+) discrepancy, but even so, these numbers should be relatively small.

We also again note that we made every effort to purchase the VHL dataset after all of the voter credit information was updated but before the department of elections performed their annual removal of records that had been inactive for more than 2 federal elections. Even so, if the voter was “inactive” and slated for removal, they wouldn’t be showing up in the count of ballots cast anyway. The act of casting a ballot immediately moves a voter into the “active” category.

The official turnout results are collated by locality, so we took the DAL and VHL information and accumulated their tallies to match the official turnout report breakdown. We then subtracted the DAL and/or VHL results from the official results to determine how much of a discrepancy exists between each of the datasets.

Terminology:

  • delta EV (DAL): This is the change in the Early Vote ballot numbers as calculated by subtracting the DAL “On Machine” ballots from the official turnout report “Early Vote” tally.
  • delta ByMail (DAL): This is the change in the Mail In ballot numbers as calculated by subtracting the DAL (“Pre-Processed” + “Marked”) ballots from the official turnout report “Mail In” tally.
  • Net DAL: The net sum of the DAL differences in each locality
  • Sum Abs Value DAL: The sum of the absolute values of the DAL differences in each locality.
  • delta ED (VHL): This is the change in Election Day ballot numbers as calculated by subtracting the VHL (“Election Day” AND NOT “Provisional”) ballots from the official turnout report “Election Day” tally.
  • delta Provisional (VHL): This is the change in Provisional ballot numbers as calculated by subtracting the VHL “Provisional” ballot total from the official turnout report “Provisional” tally.
  • delta Early or Mail (VHL):  This is the change in Early OR Mail-In ballot numbers as calculated by subtracting the VHL (“Absentee” AND NOT “Provisional”) ballots from the official turnout report (“Early Voting” + “Mail In”) tally.
  • delta Provisional (VHL): This is the change in Provisional ballot numbers as calculated by subtracting the VHL “Provisional” ballots from the official turnout report “Provisional” tally.
  • Net VHL: The net sum of the VHL differences for each locality   
  • Sum Abs Value VHL: The sum of the absolute values of the VHL differences for each locality
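The DAL definitions above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the actual processing script: the locality name, the dictionary layout, and every tally below are made-up placeholders, and the status labels are taken from the terminology list.

```python
# Illustrative only: the locality name and every tally below are made up.
official = {"Example County": {"Early Voting": 1200, "Mail In": 300}}
dal = {"Example County": {"On Machine": 1195, "Pre-Processed": 180, "Marked": 123}}


def dal_deltas(official_row, dal_row):
    """Official tally minus DAL tally, following the terminology list."""
    delta_ev = official_row["Early Voting"] - dal_row["On Machine"]
    delta_by_mail = official_row["Mail In"] - (
        dal_row["Pre-Processed"] + dal_row["Marked"]
    )
    return {
        "delta EV (DAL)": delta_ev,
        "delta ByMail (DAL)": delta_by_mail,
        "Net DAL": delta_ev + delta_by_mail,
        "Sum Abs Value DAL": abs(delta_ev) + abs(delta_by_mail),
    }


results = {name: dal_deltas(official[name], dal[name]) for name in official}
print(results["Example County"])
```

In this made-up case the two differences partly cancel in the net sum, which is why the sum of absolute values is tracked alongside it. The VHL columns would follow the same pattern with the VHL categories swapped in.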

Data Results:

We present the data and results in the following Excel spreadsheet. Tab 1 is the ELECT website data with differences between the DAL and VHL data on the right-hand side. Tab 2 is the accumulated DAL data by locality. Tab 3 is the accumulated VHL data by locality.

The screen capture of the ELECT website data from 2/4/2023 is also here:

https://digitalpollwatchers.org/wp-content/uploads/2023/02/ELECT-Turnout-Results-Screenshot-2023-02-04-1.pdf


Distribution of Invalid Voter Addresses and Absentee Ballots in VA 2022 General Election, with Mailing Address Substitution

Foreword:

In a previous post I documented the results from a United States Postal Service (USPS) National Change of Address (NCOA) database check with the 2022 Virginia Registered Voter List (RVL) primary address records joined with the Daily Absentee List (DAL) reports of absentee ballots cast. There was a significant public reaction to the fact that over 15K RVL primary addresses associated with voters who cast ballots in the VA 2022 general election were not recognized by the USPS database as valid addresses (among other issues). I reported the data as I found it, but a common criticism was that my analysis did not account for rural voters who do not have a traditional street address, or who do not have mail receptacles and use PO boxes as their primary delivery mechanism.

The requirements for voter registration and primary address are specified by the VA Constitution, Federal and VA law, and require the following:

  1. The VA constitution (Section II-1 and II-2) and the National Voter Registration Act (NVRA) require that registered voter primary addresses be an actual physical (and deliverable) street address. The de-facto arbiter of what defines a recognizable and/or deliverable address is the USPS, therefore a street address that is not recognized or is undeliverable according to USPS is not compliant. We should be making every effort to ensure that the primary address associated with each voter is able to be correctly recognized and translated into a deliverable address by the USPS. This may require adjusting VA’s data normalization policies such that input street addresses are correctly mapped to USPS addresses, or a legislative action in order to correct.
  2. There is also the legal restriction that registered VA voters are not allowed to use PO Boxes as their primary registered voter address. There is an exception that “protected” voters are allowed to provide a PO box address to be displayed in public records, but their actual address on file must still be a physical address.
  3. The 2022 VA GREB, section 6.2.3, states that in special cases, a rural voter may supply the name of the highway and enough detail in the comments section of the voter application that the registrar may ascertain where the physical address is. This seems (IMO, but I’m not a lawyer) to contradict the language in the VA constitution and in the NVRA that makes the implicit requirement of deliverable addresses. Also, if the address as entered is not recognized by the USPS NCOA system, then it will generate validation errors every time it is checked against the NCOA database, which VA is currently required to do at least annually. Again, this may require adjusting VA’s data normalization policies such that input street addresses are correctly mapped to USPS addresses, or a legislative action in order to correct.
  4. Additionally the USPS is supposed to recognize “Post Box Street Address” (PBSA) locations such that when someone addresses an envelope or package to the street address of a residence that is served only by delivery to a PO box, the USPS should automatically recognize this and adjust accordingly. Per the NCOA documentation, the USPS NCOA database checks are supposed to be doing this detection and translation already.

And again, to be clear, my point is not to accuse anyone of malicious intent or wrongdoing … I am simply trying to point out that the way we are using and managing our data is discrepant with what the requirements actually are. We either need to change the data and our practices to conform to the legal requirements, or change the law to fit how we are actually (and practically) using the data.

That being said, and in order to show that there are still significant data issues even after adjusting for rural routes, etc., I’ve re-run my NCOA analysis to account for records with invalid primary addresses but that had valid mailing addresses, even if those mailing addresses are PO Boxes. Any RVL record with a primary address that was marked invalid by an NCOA check, but had a different mailing address AND that mailing address was returned from a second NCOA check as NOT-invalid (even if it was a PO Box), was replaced with the mailing address listing and NCOA results. While the requirement is still that primary addresses are supposed to be valid, this re-processing and allowing for primary OR mailing addresses to be used is more in line with the way VA has actually implemented its voter registrations across the state … even though that implementation seems to be at odds with the requirements.
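The substitution rule described above can be sketched roughly as follows. This is a minimal illustration under my own assumptions, not the actual processing code: the record layout, the field names (`primary_address`, `primary_ncoa`, `mailing_address`, `mailing_ncoa`, `invalid`), and the sample addresses are all hypothetical.

```python
def effective_address(record):
    """Pick the address and NCOA result used for analysis.

    The mailing address replaces the primary address only when the primary
    was flagged invalid, a *different* mailing address exists, and that
    mailing address came back from its own NCOA check as not invalid
    (PO Boxes are allowed here).
    """
    primary_ncoa = record["primary_ncoa"]
    mailing = record.get("mailing_address")
    mailing_ncoa = record.get("mailing_ncoa")
    use_mailing = (
        primary_ncoa["invalid"]
        and mailing is not None
        and mailing != record["primary_address"]
        and mailing_ncoa is not None
        and not mailing_ncoa["invalid"]
    )
    if use_mailing:
        return mailing, mailing_ncoa, "mailing"
    return record["primary_address"], primary_ncoa, "primary"


# Hypothetical records: field names and addresses are placeholders.
rec_a = {
    "primary_address": "0 Example Hwy",
    "primary_ncoa": {"invalid": True},
    "mailing_address": "PO Box 1",
    "mailing_ncoa": {"invalid": False, "po_box": True},
}
rec_b = {
    "primary_address": "0 Example Hwy",
    "primary_ncoa": {"invalid": True},
    "mailing_address": "1 Nowhere Rd",
    "mailing_ncoa": {"invalid": True},
}
rec_c = {"primary_address": "2 Example St", "primary_ncoa": {"invalid": False}}

for rec in (rec_a, rec_b, rec_c):
    print(effective_address(rec)[2])
```

Only the first record is switched to its mailing address. The second stays flagged as invalid because neither address validates, and the third is left untouched because its primary address was never flagged.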

The rest of the analysis is performed the same as before, and is presented in the same format for consistency. I have updated the dates and numerical results as appropriate, but have otherwise kept the format and much of the language and layout the same as the previous analysis.


BLUF (Bottom Line Up Front):

There were 2,164 ballots cast during early voting in the VA 2022 General Election where the voter’s (primary or mailing) registered address on record was flagged as “Invalid” by a National Change of Address (NCOA) database check. If we include addresses that were identified as 90-day Vacant, the total rises to 4,274. (The previous analysis that used only the primary addresses returned 15,419 invalid records with associated ballots, and went up to 17,244 when including addresses flagged as vacant.)

A certified commercial provider of NCOA data verification was used to facilitate this analysis on raw data obtained from the VA Dept of Elections (“ELECT”). It is not technically possible to obtain a truly time-synchronized complete set of data for any election due to the way elections are run in VA, but we made every effort to obtain the data from the state as close in time as was possible. The NCOA database is maintained and curated by the United States Postal Service (USPS).

For those wishing to review specific entries, or to help validate these issues, and who are part of an organization that is able to receive and handle election information according to VA law and the VA Dept of Elections requirements, you may contact us to request the raw data breakdowns. We will need to validate your organization or employment and will make data available as legally allowed.

Commentary and Discussion:

I would like to be very clear: We are simply presenting the data as compiled to facilitate public discourse. We have strived to only utilize data directly obtained from authoritative sources (ELECT, the USPS via TrueNCOA provider).

The designation of “Invalid” addresses is according to the definition used by the USPS and TrueNCOA, i.e. the TrueNCOA check has reported that the addresses as listed in the RVL have no match in the USPS database. Invalid addresses do not include things like valid P.O. Boxes or valid rural addresses that are automatically recognized and translated by USPS to Post Box Street Address (PBSA) records.

The VA Constitution (Section II-1 and II-2) specifies the requirements for voter eligibility, including that voters are required to supply a primary address for their registration record, regardless of their method of voting. VA is required by law to consistently maintain and validate these records. Based on the analysis below, the data shows that there is a small but statistically significant number of “Invalid” addresses associated with voters who cast ballots in the Nov 2022 election.

Continuing EPEC’s mission to promote voter participation, analyze election technology, and educate the public about best practices in managing election technology systems, we are providing the analysis below in order to educate and inform the public, legislators and elections officials about the existence of these discrepancies.

Details:

After receiving the results of a National Change of Address (NCOA) database check on the registration (not the temporary) addresses in the latest VA Registered Voter List (RVL), I’ve gone through and collated the flagged addresses and reconciled them with the entries in the Daily Absentee List (DAL) file records provided by the VA Dept. of Elections (“ELECT”).

The DAL file (dated 2022-12-13) provides a record of all of the voters that cast absentee (either Early In-Person or Mail-In) ballots in the election, and the RVL (dated 2022-11-23) gives all of the registered voter addresses and other pertinent information. Both datasets come directly from the VA Dept. of Elections and must be purchased. Total cost was ~$7000. The two datasets can be tied together using the voter Identification Number that is assigned to each (supposedly) unique voter by the state. Entries in the RVL should be unique to each registered voter (although there are a small number of duplicate voter IDs that I have seen … but that’s for another post), whereas the DAL file can have multiple entries attributed to a single voter recording the various stages of ballot processing.
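As a rough sketch of how the two files can be tied together, the snippet below groups the DAL rows by voter ID first (since one voter can have several DAL entries) and then looks each voter up in the RVL, so each voter is counted once. The field names (`voter_id`, `status`, `address`), the IDs, and the sample rows are hypothetical placeholders, not the actual ELECT file layout.

```python
from collections import defaultdict


def join_rvl_dal(rvl, dal_rows):
    """Tie DAL rows to RVL records by voter ID, one joined entry per voter."""
    stages = defaultdict(list)
    for row in dal_rows:
        # A single voter may appear several times, once per processing stage.
        stages[row["voter_id"]].append(row["status"])

    joined = {
        vid: {"rvl": rvl[vid], "dal_stages": statuses}
        for vid, statuses in stages.items()
        if vid in rvl
    }
    # DAL voter IDs with no RVL record are reported rather than dropped silently.
    unmatched = sorted(vid for vid in stages if vid not in rvl)
    return joined, unmatched


# Hypothetical sample data.
rvl = {
    "1001": {"address": "1 Example St"},
    "1002": {"address": "2 Example St"},
}
dal_rows = [
    {"voter_id": "1001", "status": "Pre-Processed"},
    {"voter_id": "1001", "status": "Marked"},
    {"voter_id": "1002", "status": "On Machine"},
    {"voter_id": "9999", "status": "On Machine"},
]

joined, unmatched = join_rvl_dal(rvl, dal_rows)
print(len(joined), unmatched)
```

Keying the RVL by voter ID in a dictionary assumes the IDs are unique; the small number of duplicate IDs mentioned above would need separate handling before a join like this.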

The NCOA check was performed on all addresses in the RVL file in order to detect recent moves, invalid addresses, vacant addresses, P.O. Boxes, commercial addresses, etc. The NCOA check takes multiple days to run using a commercial service provider and was executed between 2022-12-19 and 2022-12-30.

*** As noted above, in this run I substituted those primary addresses that evaluated to “invalid” with their corresponding “mailing” address records that had been recognized as “valid”, if possible. ***

Results:

Raw TrueNCOA Processing result stats on the full RVL dataset:
| NCOA Processing of VA RVL Records | Full RVL 11-23 Primary Addresses (12/30/2022) | % | Unique RVL Mailing Addresses (12/30/2022) | % |
|---|---|---|---|---|
| Records Processed | 6,127,856 | | 216,896 | |
| 18-Month NCOA Moves | 282,669 | 4.61% | 8,247 | 3.80% |
| 48-Month NCOA Moves | 163,931 | 2.68% | 8,099 | 3.73% |
| Moves with no Forwarding Address | 20,913 | 0.34% | 1,415 | 0.65% |
| Total NCOA Moves | 467,513 | 7.63% | 17,761 | 8.19% |
| Vacant Flag | 29,315 | 0.48% | 22,763 | 10.49% |
| DPV Updated/Address Corrected Records | 601,539 | 9.82% | 18,555 | 8.55% |
| DPV Deliverable Records | 5,835,230 | 95.22% | 165,939 | 76.51% |
| DPV Non-Deliverable Records | 185,752 | 3.03% | 35,100 | 16.18% |
| LACS Updated (Rural Address converted to Street Address) | 33,497 | 0.55% | 254 | 0.12% |
| Residential Delivery Indicator | 5,970,531 | 97.43% | 194,342 | 89.60% |
| Addresses matched to the USPS Database | 6,020,983 | 98.26% | 201,040 | 92.69% |
| Invalid Addresses | 107,304 | 1.75% | 15,968 | 7.36% |
| Expired Addresses | 10,822 | 0.18% | 276 | 0.13% |
| Business Move (B) | 340 | 0.01% | 61 | 0.03% |
| Family Move (F) | 116,874 | 1.91% | 5,390 | 2.49% |
| Individual Move (I) | 350,299 | 5.72% | 12,310 | 5.68% |
| General Delivery Address | 159 | 0.00% | 170 | 0.08% |
| High Rise Address | 764,922 | 12.48% | 7,097 | 3.27% |
| PO Box Address | 28,040 | 0.46% | 158,359 | 73.01% |
| Rural Route Address | 81 | 0.00% | 269 | 0.12% |
| Single Family Address | 5,243,321 | 85.57% | 33,582 | 15.48% |
| Unknown | 51,903 | 0.85% | 15,537 | 7.16% |
Reporting as presented from the TrueNCOA data service. The TrueNCOA data dictionary is presented here.
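For readers checking the table, the percentage columns appear to be each count divided by the “Records Processed” figure for the same column. That reading is my assumption rather than something stated in the TrueNCOA report, but it reproduces the “Invalid Addresses” row:

```python
# Counts copied from the table; the variable names are just for illustration.
records_processed = {"primary": 6_127_856, "mailing": 216_896}
invalid_addresses = {"primary": 107_304, "mailing": 15_968}

# Assumed formula: percent = 100 * count / records processed (same column).
pct = {
    column: round(100 * invalid_addresses[column] / records_processed[column], 2)
    for column in invalid_addresses
}
print(pct)
```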
Combining NCOA results of RVL Addresses with the DAL data:
Vacant Addresses:

There were 2,112 records across the state with addresses that have been flagged as (90-day) “Vacant” by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election according to the DAL file. Of those records, 1,542 were Early In-Person and 547 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location. (Note this is actually an increase over the previous results that were not adjusted for potential mailing address matches.)

P.O. Boxes (Non-protected):

There were 13,492 records across the state with addresses that have been flagged as P.O. Box Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election AND were NOT listed as protected entries according to the DAL file. (VA allows voters who have a legal protective order to list a P.O. Box as their address of record on public documents.) Of those records, 11,426 were Early In-Person and 2,020 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Note, this is actually an increase over the previous results that were not adjusted for potential mailing address matches. This is not unexpected, as there are a number of PO box mailing addresses that have been substituted in for invalid primary street addresses. PO Boxes are not supposed to be allowed as registered voting addresses. (You should talk to your legislators about this discrepancy, because this catch-22 will likely need to be changed by legislative action!)

Invalid Addresses:

There were 2,164 records across the state with addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election. Of those records, 1,535 were Early In-Person and 598 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location. Note, this is actually a significant decrease from the previous results that were not adjusted for potential mailing addresses. This is not unexpected, as there were a number of invalid primary addresses with a valid (even if PO Box) mailing address. (The fact that the primary addresses do not validate with the USPS is its own issue, but we are ignoring that for this analysis as noted above.)

Invalid OR Vacant Addresses:

There were 4,274 records across the state with addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election. Of those records, 3,077 were Early In-Person and 1,143 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.
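The combined count is slightly smaller than the sum of the separate “Invalid” and “Vacant” counts. If the combined figure is a per-record union (my assumption about how the tally was built), the size of the overlap follows from inclusion-exclusion, using only the statewide numbers reported above:

```python
# Statewide counts as reported in the sections above.
invalid = {"total": 2164, "early": 1535, "mail": 598}
vacant = {"total": 2112, "early": 1542, "mail": 547}
either = {"total": 4274, "early": 3077, "mail": 1143}

# |A and B| = |A| + |B| - |A or B|, assuming "Invalid OR Vacant" is a set union.
overlap = {key: invalid[key] + vacant[key] - either[key] for key in either}
print(overlap)
```

Under that assumption, two records carry both flags, and both of them are Mail-In ballots.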

Record of Moves Out-of-State:

There were 797 records showing NCOA moves to valid out-of-state addresses before 2022-08 that also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election. Of those records, 346 were Early In-Person and 450 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Results By District:

District 01:
Invalid Addresses:

There were 213 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 01. Of those records, 170 were Early In-Person and 38 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 364 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 01. Of those records, 288 were Early In-Person and 71 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 02:
Invalid Addresses:

There were 183 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 02. Of those records, 147 were Early In-Person and 33 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 441 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 02. Of those records, 344 were Early In-Person and 90 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 03:
Invalid Addresses:

There were 37 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 03. Of those records, 27 were Early In-Person and 10 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 324 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 03. Of those records, 223 were Early In-Person and 94 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 04:
Invalid Addresses:

There were 116 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 04. Of those records, 104 were Early In-Person and 12 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 318 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 04. Of those records, 256 were Early In-Person and 59 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 05:
Invalid Addresses:

There were 383 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 05. Of those records, 299 were Early In-Person and 81 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 584 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 05. Of those records, 445 were Early In-Person and 134 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 06:
Invalid Addresses:

There were 202 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 06. Of those records, 144 were Early In-Person and 53 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 400 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 06. Of those records, 301 were Early In-Person and 91 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 07:
Invalid Addresses:

There were 243 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 07. Of those records, 169 were Early In-Person and 68 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 370 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 07. Of those records, 274 were Early In-Person and 87 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 08:
Invalid Addresses:

There were 165 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 08. Of those records, 69 were Early In-Person and 94 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 413 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 08. Of those records, 226 were Early In-Person and 183 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 09:
Invalid Addresses:

There were 328 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 09. Of those records, 257 were Early In-Person and 66 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 479 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 09. Of those records, 373 were Early In-Person and 100 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 10:
Invalid Addresses:

There were 174 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 10. Of those records, 104 were Early In-Person and 68 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 246 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 10. Of those records, 164 were Early In-Person and 82 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 11:
Invalid Addresses:

There were 122 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 11. Of those records, 46 were Early In-Person and 75 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 335 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 11. Of those records, 181 were Early In-Person and 152 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.
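As a quick consistency check, the per-district “Invalid OR Vacant” counts listed above can be summed and compared against the statewide figure of 4,274. The numbers are copied from the district sections; the variable names are only for illustration.

```python
# "Invalid OR Vacant" record counts by congressional district, as listed above.
invalid_or_vacant_by_district = {
    "01": 364, "02": 441, "03": 324, "04": 318, "05": 584, "06": 400,
    "07": 370, "08": 413, "09": 479, "10": 246, "11": 335,
}
statewide_invalid_or_vacant = 4274

district_sum = sum(invalid_or_vacant_by_district.values())
print(district_sum, district_sum == statewide_invalid_or_vacant)
```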

Summary Data Files by Locality:

The complete set of graphics and statistics for each locality and each congressional district in VA can be downloaded here as a zip file. The tabulated summary results can also be downloaded in Excel, CSV, or Numbers format:


Distribution of Invalid Voter Addresses and Absentee Ballots in VA 2022 General Election

Edited on 2022-12-15 for typo corrections, addition of Congressional District breakdown, and added commentary section.

BLUF (Bottom Line Up Front):

There were 15,419 ballots cast during early voting in the VA 2022 General Election where the voter’s registered address on record was flagged as “Invalid” by a National Change of Address (NCOA) database check. If we include addresses that were identified as 90-day Vacant, the total rises to 17,244. Plotting the distribution of these based on the ZIP+4 identified by the NCOA check shows a disproportionately high number of issues on the Eastern Shore of VA.

A certified commercial provider of NCOA data verification was used to facilitate this analysis on raw data obtained from the VA Dept of Elections (“ELECT”). It is not technically possible to obtain a truly time-synchronized complete set of data for any election due to the way elections are run in VA, but we made every effort to obtain the data from the state as close in time as was possible. The NCOA database is maintained and curated by the United States Postal Service (USPS).

For those wishing to review specific entries, or to help validate these issues, and who are part of an organization that is able to receive and handle election information according to VA law and the VA Dept of Elections requirements, you may contact us to request the raw data breakdowns. We will need to validate your organization or employment and will make data available as legally allowed.

Commentary and Discussion (added 2022-12-15):

In response to recent interest on this matter, I would like to be very clear: We are simply presenting the data as compiled to facilitate public discourse. We have strived to only utilize data directly obtained from authoritative sources (ELECT, the USPS via TrueNCOA provider).

The designation of “Invalid” addresses is according to the definition by the USPS and TrueNCOA, i.e. the TrueNCOA check has reported that the addresses as listed in the RVL have no match in the USPS database. Invalid addresses do not include things like valid P.O. Boxes or valid rural addresses.

The VA Constitution (Sections II-1 and II-2) specifies the requirements for voter eligibility, including that voters are required to supply a primary address for their registration record, regardless of their method of voting. VA is required by law to consistently maintain and validate these records. Based on the below analysis, the data shows that there is a small but statistically significant number of “Invalid” addresses associated with voters who cast ballots in the Nov 2022 election.

Continuing EPEC’s mission to promote voter participation, analyze election technology, and educate the public about best practices in managing election technology systems, we are providing the below analysis in order to educate and inform the public, legislators and elections officials about the existence of these discrepancies.

Details:

After receiving the results of a National Change of Address (NCOA) database check on the registration (not the temporary) addresses in the latest VA Registered Voter List (RVL), I’ve gone through and collated the flagged addresses and reconciled them with the entries in the Daily Absentee List (DAL) file records provided by the VA Dept. of Elections (“ELECT”).

The DAL file (dated 2022-12-08) provides a record of all of the voters who cast absentee (either Early In-Person or Mail-In) ballots in the election, and the RVL (dated 2022-11-23) gives all of the registered voter addresses and other pertinent information. Both datasets come directly from the VA Dept. of Elections and must be purchased. Total cost was ~$7000. The two datasets can be tied together using the voter Identification Number that is assigned to each (supposedly) unique voter by the state. Entries in the RVL should be unique to each registered voter (although there are a small number of duplicate voter IDs that I have seen … but that’s for another post), whereas the DAL file can have multiple entries attributed to a single voter, recording the various stages of ballot processing.
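To make the join described above concrete, here is a minimal Python sketch. It is not the code used for this analysis: the keys `voter_id` and `ballot_type`, the sample rows, and the choice to count each voter once (using the last DAL row seen) are all assumptions made for illustration.

```python
from collections import Counter

def count_flagged_ballots(flagged_ids, dal_rows):
    """Count DAL ballots by type for voters whose RVL address was flagged.

    flagged_ids: set of voter IDs flagged by the NCOA check (hypothetical input).
    dal_rows: iterable of dicts with hypothetical keys 'voter_id' and 'ballot_type'.
    The DAL can hold several rows per voter, so each voter is counted once,
    using the last row seen for that voter (an assumption of this sketch).
    """
    latest = {}
    for row in dal_rows:
        if row["voter_id"] in flagged_ids:
            latest[row["voter_id"]] = row["ballot_type"]
    return Counter(latest.values())

# Made-up sample data, for illustration only.
flagged = {"1001", "1003"}
dal = [
    {"voter_id": "1001", "ballot_type": "Mail-In"},
    {"voter_id": "1001", "ballot_type": "Mail-In"},  # a later processing stage
    {"voter_id": "1002", "ballot_type": "Early In-Person"},
    {"voter_id": "1003", "ballot_type": "Early In-Person"},
]
counts = count_flagged_ballots(flagged, dal)
print(sum(counts.values()), dict(counts))
```

Keying on the voter ID rather than on the DAL row is what keeps a voter with several ballot-processing entries from being counted more than once.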

The NCOA check was performed on all addresses in the RVL file in order to detect recent moves, invalid addresses, vacant addresses, P.O. Boxes, commercial addresses, etc. The NCOA check takes multiple days to run using a commercial service provider and was executed between 2022-12-01 and 2022-12-06. The processing needed to be performed in two batches.

Results:

Raw TrueNCOA Processing result stats on the full RVL dataset:
| NCOA Processing of VA RVL 2022-11-23 Records | Batch 1 | Batch 2 | Total | Percent |
|---|---|---|---|---|
| Records Processed | 5,831,089 | 296,767 | 6,127,856 | |
| 18-Month NCOA Moves | 264,210 | 12,618 | 276,828 | 4.52% |
| 48-Month NCOA Moves | 155,274 | 865 | 156,139 | 2.55% |
| Moves with no Forwarding Address | 23,651 | 447 | 24,098 | 0.39% |
| Total NCOA Moves | 443,135 | 13,930 | 457,065 | 7.46% |
| Vacant Flag | 26,865 | 1,742 | 28,607 | 0.47% |
| DPV Updated/Address Corrected Records | 568,039 | 20,748 | 588,787 | 9.61% |
| DPV Deliverable Records | 5,555,024 | 280,207 | 5,835,231 | 95.22% |
| DPV Non-Deliverable Records | 173,322 | 12,427 | 185,749 | 3.03% |
| LACS Updated (Rural Address converted to Street Address) | 32,116 | 1,327 | 33,443 | 0.55% |
| Residential Delivery Indicator | 5,681,183 | 289,345 | 5,970,528 | 97.43% |
| Addresses matched to the USPS Database | 5,728,347 | 292,634 | 6,020,981 | 98.26% |
| Invalid Addresses | 102,617 | 4,161 | 106,778 | 1.74% |
| Expired Addresses | 6,875 | 576 | 7,451 | 0.12% |
| Business Move (B) | 339 | 6 | 345 | 0.01% |
| Family Move (F) | 110,444 | 3,549 | 113,993 | 1.86% |
| Individual Move (I) | 332,352 | 10,375 | 342,727 | 5.59% |
| General Delivery Address | 153 | 0 | 153 | 0.00% |
| High Rise Address | 703,903 | 62,059 | 765,962 | 12.50% |
| PO Box Address | 26,973 | 770 | 27,743 | 0.45% |
| Rural Route Address | 79 | 1 | 80 | 0.00% |
| Single Family Address | 5,012,679 | 230,200 | 5,242,879 | 85.56% |
| Unknown | 49,299 | 2,457 | 51,756 | 0.84% |
Reporting as presented from the TrueNCOA data service. The TrueNCOA data dictionary is presented here.
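The Total and Percent columns of the table above can be spot-checked with a few lines of Python. This is only a sketch: it assumes the Percent column is the row total divided by the total number of records processed, which is my reading of the TrueNCOA report rather than something stated in it. The figures are copied from three rows of the table.

```python
RECORDS_PROCESSED = 6_127_856  # "Records Processed" total from the table

# (Batch 1, Batch 2, Total, Percent) for a few rows copied from the table
ROWS = {
    "Total NCOA Moves": (443_135, 13_930, 457_065, 7.46),
    "Vacant Flag": (26_865, 1_742, 28_607, 0.47),
    "Invalid Addresses": (102_617, 4_161, 106_778, 1.74),
}

def row_is_consistent(batch1, batch2, total, percent):
    """True when the batches sum to the total and the percent matches to 2 decimals."""
    share = round(100 * total / RECORDS_PROCESSED, 2)
    return batch1 + batch2 == total and share == percent

for name, row in ROWS.items():
    print(name, row_is_consistent(*row))
```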
Combining NCOA results of RVL Addresses with the DAL data:
Vacant Addresses:

There were 1,829 records across the state with registered addresses that have been flagged as (90-day) “Vacant” by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election according to the DAL file. Of those records, 1,317 were Early In-Person and 491 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Note: This graphic was updated on Thu Dec 15 to correct a typo in the title as to the date of the DAL file that was used.
P.O. Boxes (Non-protected):

There were 1,648 records across the state with registered addresses that have been flagged as P.O. Box Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election AND were NOT listed as protected entries according to the DAL file. (VA allows for voters who have a legal protective order to list a P.O. Box as their address of record on public documents.) Of those records, 1,348 were Early In-Person and 294 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Note: This graphic was updated on Thu Dec 15 to correct a typo in the title as to the date of the DAL file that was used.
Invalid Addresses:

There were 15,419 records across the state with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election. Of those records, 12,766 were Early In-Person and 2,566 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Note: This graphic was updated on Thu Dec 15 to correct a typo in the title as to the date of the DAL file that was used.
Invalid OR Vacant Addresses:

There were 17,244 records across the state with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election. Of those records, 14,083 were Early In-Person and 3,053 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Note: This graphic was updated on Thu Dec 15 to correct a typo in the title as to the date of the DAL file that was used.
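As a quick consistency check on the statewide counts above, the “Invalid”, “Vacant”, and “Invalid OR Vacant” totals can be related by inclusion-exclusion. The sketch below assumes all three counts were computed over the same set of DAL records with the same de-duplication rules; under that assumption the implied number of records flagged as both Invalid and Vacant is small.

```python
def implied_overlap(count_a, count_b, count_a_or_b):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|."""
    return count_a + count_b - count_a_or_b

invalid = 15_419            # statewide "Invalid" records with a ballot cast
vacant = 1_829              # statewide "Vacant" records with a ballot cast
invalid_or_vacant = 17_244  # statewide "Invalid" or "Vacant" records

both = implied_overlap(invalid, vacant, invalid_or_vacant)
print(both)  # 4
```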
Record of Moves Out-of-State:

There were 793 voter records showing NCOA moves to valid out-of-state addresses before 2022-08 that also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election. Of those records, 338 were Early In-Person and 454 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Note: This graphic was updated on Thu Dec 15 to correct a typo in the title as to the date of the DAL file that was used.

Results By District:

This section was added 2022-12-15, per multiple requests for by-district breakouts.

District 01:
Invalid Addresses:

There were 3,222 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 01. Of those records, 2,841 were Early In-Person and 364 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 3,310 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 01. Of those records, 2,909 were Early In-Person and 384 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 02:
Invalid Addresses:

There were 2,552 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 02. Of those records, 2,185 were Early In-Person and 353 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 2,763 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 02. Of those records, 2,346 were Early In-Person and 400 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 03:
Invalid Addresses:

There were 137 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 03. Of those records, 97 were Early In-Person and 34 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 412 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 03. Of those records, 283 were Early In-Person and 117 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 04:
Invalid Addresses:

There were 507 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 04. Of those records, 423 were Early In-Person and 78 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 695 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 04. Of those records, 567 were Early In-Person and 121 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 05:
Invalid Addresses:

There were 2,093 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 05. Of those records, 1,738 were Early In-Person and 348 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 2,264 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 05. Of those records, 1,860 were Early In-Person and 395 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 06:
Invalid Addresses:

There were 1,214 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 06. Of those records, 990 were Early In-Person and 212 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 1,390 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 06. Of those records, 1,129 were Early In-Person and 247 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 07:
Invalid Addresses:

There were 1,042 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 07. Of those records, 868 were Early In-Person and 167 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 1,139 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 07. Of those records, 946 were Early In-Person and 183 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 08:
Invalid Addresses:

There were 276 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 08. Of those records, 148 were Early In-Person and 125 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 517 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 08. Of those records, 300 were Early In-Person and 212 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 09:
Invalid Addresses:

There were 3,247 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 09. Of those records, 2,639 were Early In-Person and 597 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 3,369 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 09. Of those records, 2,733 were Early In-Person and 624 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 10:
Invalid Addresses:

There were 940 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 10. Of those records, 740 were Early In-Person and 198 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 992 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 10. Of those records, 783 were Early In-Person and 207 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 11:
Invalid Addresses:

There were 189 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 11. Of those records, 97 were Early In-Person and 90 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 393 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 11. Of those records, 227 were Early In-Person and 163 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Summary Data Files by Locality:

The complete set of graphics and statistics for each locality, and each congressional district in VA can be downloaded here as a zip file. The tabulated summary results can also be downloaded in excel, csv, or numbers format:

Categories
Election Data Analysis Election Forensics Election Integrity Interesting technical

Interesting change in effective dates in VA Registered Voter List

I’ve stumbled across an interesting data artifact that I’m not sure what to make of. But I will present it here for completeness.

In the Registered Voter List available from the VA Dept of Elections (“ELECT”), each record of a registered voter has an “effective date” associated with it. This can be the same as the actual registration date, or the date that the voter’s record is returned to “active” status, etc. It appears that sometime within the last year, almost all of the voter registrations with a previous effective date earlier than June 2011 have had their effective date reassigned.

For this analysis I am using an RVL that I purchased from ELECT on 2021-11-06 and comparing it with an RVL purchased on 2022-11-22. I am only comparing the records associated with common voter IDs between each dataset. Any new or removed voters in the last year have been removed from the data and the corresponding plots below.

In the 2021 RVL, we can see the distribution of the effective dates in the histogram below. The majority of records have rather recent effective dates, but there are diminishing tails from long-term voters whose registration effective dates go back many years. (The y-axis in the plot is logarithmic, so we can better see the shape of the distribution tails.)

This isn’t altogether surprising. Newer voters, or voters who have made recent changes to their registration information, will likely get an updated effective date on their voter registration record. Older or longer-term voters who have not made any recent changes and stay active would show older effective dates on their records.

Now compare that to the RVL file dated 2022-11-22. Again, this comparison and the data in these plots cover only those records that share common voter IDs between the two files. Sometime between the time I downloaded the 2021-11-06 RVL and the 2022-11-22 RVL, almost all records with effective dates before July 2011 have had their effective dates reset to a more recent date.

I don’t really know what to make of this. Was there a mass update of voter registration records? Or a database restore, or some other operation on the records?

Even more interesting is that when we superimpose the two histograms, we see that the 2022 records with effective dates after July 2011 look to also have had a significant percentage of dates reset. We see the red curve maintains its shape, save for the large spike at the far right, but is shifted lower … as if a constant percentage of the records have been included in the effective date shift.

Now if we apply a constant multiplier of 20x to the red (2022) dataset we can mostly re-align the histograms.

Of the effective dates that were changed between the two files, the distribution of the adjustments to the effective dates is shown below. I find it interesting that there are a number of records where the effective date has been moved backwards (?) in time.
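The comparison described above (restrict to voter IDs common to both lists, then look at how far each changed effective date moved) can be sketched as follows. The input layout, a plain dict from voter ID to `datetime.date`, and the sample values are hypothetical and stand in for the actual RVL fields.

```python
from datetime import date

def changed_effective_dates(rvl_old, rvl_new):
    """Return {voter_id: shift in days} for voters present in both lists
    whose effective date differs. A negative shift means the date moved
    backwards in time. Inputs are hypothetical dicts: voter ID -> date."""
    common = rvl_old.keys() & rvl_new.keys()
    return {
        vid: (rvl_new[vid] - rvl_old[vid]).days
        for vid in common
        if rvl_new[vid] != rvl_old[vid]
    }

# Made-up sample data, for illustration only.
old = {"A": date(2005, 3, 1), "B": date(2015, 6, 1), "C": date(2019, 1, 1)}
new = {"A": date(2022, 3, 1), "B": date(2015, 6, 1), "D": date(2022, 5, 5)}
shifts = changed_effective_dates(old, new)
print(shifts)
```

Voters "C" and "D" appear in only one list and are dropped, and "B" is unchanged, so only "A" shows up in the result. A histogram of the returned values would correspond to the distribution of adjustments discussed above.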

We know there have been significant issues with the database used by ELECT (known as the “VERIS” database), so maybe this is an artifact of some maintenance operations or repairs on the data entries? Or maybe this is a symptom of a larger problem. Whatever it is, it doesn’t make a lot of sense in what should be a very well maintained and authoritative set of records.

Categories
Election Data Analysis Election Forensics Election Integrity Press Release

Virginia’s Prince William County Conducts Ballot Recount after Errors Reported in Election Scanner


PWC Board Concludes Human Error ‘Likely’ Before Certifying Results

Non-profit electoral process group praises electoral board for responding to public’s calls to conduct a partial audit of discrepancies before certifying election results

November 16, 2022 - Election officials in Virginia’s Prince William County met to certify the results of the 2022 General Election after they conducted a hand count of ballots whose totals conflicted with machine counts and concluded that human error was a “likely” cause rather than machine error.

The board’s decision to conduct a hand-count of the ballots in question followed a call for a review from the public, local and state officials, and Electoral Process Education Corporation (EPEC), a Virginia-based non-profit 501(c)(3) that provides election data analysis.

The ballots in question in PWC’s Precinct 612 remain in the custody of the general registrar; the board ruled the matter resolved after conducting a hand count of the machine bin that tallied 27 more votes than physical ballots that were scanned into the machines. (See prior release detailing the error here: https://digitalpollwatchers.org/multiple-errors-found-in-virginias-2022-election-scanner-and-pollbook-data/)

Although the PWC electoral board was not able to reconcile one outstanding difference between the poll-book count of votes cast and the machine scans in two precincts, it judged the differences were not likely coming from machine malfunctions.

While it is still problematic for poll books not to match the scanner tapes, it is a much more serious issue when the number of physical ballots in the accumulation bin and the scanner totals do not match.

EPEC’s recommendations based on its analysis of the election data include the following:

Standard processes should be updated to require verification that the number of ballots in the scanner collection bin matches the scanner result report tape. If the numbers do not match, an on-premises hand tabulation should be performed by the election officers and the results recorded in the official public record.

EPEC also commended the PWC board of elections for ensuring transparency by confirming that the final counts of the physical ballots align with machine scan tallies before certifying the results.

Categories
Election Data Analysis Election Forensics Election Integrity technical

“On Machine” ballots with logically impossible time stamps

In looking over the VA DAL data, one interesting issue that is readily apparent is that the BALLOT_RECIEPT_DATE field for in-person, on-machine early vote data contains logically impossible values.

These time-stamps are supposed to be generated by the electronic poll-books when a voter is checked in at an in-person early voting site. The appeal and rationale for utilizing electronic poll-books is exactly that they can automate the recording of check-in and (theoretically) minimize human error. The operating hours of VA in-person early voting sites are limited to 7am – 7pm. I’m not aware of any in-person early voting center that had extended hours past those. Therefore, logically, we would expect that the electronic poll book generated time stamps for check-ins for in-person on-machine early votes would fall within the 7am – 7pm bounds.

The plot below is generated directly from the Daily Absentee List (DAL) file pulled from the VA Department of Elections on 11/08/2022 at 6am. The x-axis gives the time (rounded to the nearest minute) of the BALLOT_RECIEPT_DATE field associated with recorded Early In-Person On-Machine ballots in the file. The (logarithmic) y-axis gives the total number of Early In-Person On-Machine records that were recorded with that unique timestamp. The blue trace represents all of the records that fall within the daily 7am – 7pm bounds, and the red trace represents the data outside of those bounds.

There were 520,549 records that fall within the expected time bounds, and 156,576 that fall outside of the bounds. From a purely systems perspective, that means that the ability of our electronic poll books (or the backend database they are tied to) to accurately record the check-in time of Early In-Person On-Machine voters has an error rate of 156576 / (156576+520549) = 23.12%.

Let me say that again. A 23.12% error rate.

23.12% of the time, our electronic poll-book based system is reporting a logically impossible time for a person to have physically walked into an open + operating early voting location to check-in and cast their ballot.

Now, if we want to be generous and allow for the possibility that maybe voting locations opened early or closed late and we pad our (7am – 7pm) bounds to be from (6am – 8pm) and run the same analysis, we still get an error rate of 23.09%.

If we pad the hours of operations limits even further to (5am – 9pm), we still get an error rate of 23.06%.

If we run the same analysis using the 7am – 7pm bounds on the 2021 and 2020 data we get 29.64% and 71.17% error rates, respectively.
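The error-rate calculation above can be sketched in a few lines of Python. The function name, its parameters, and the sample timestamps are my own illustrative choices and not the code that generated the plots; the statewide counts at the bottom are the ones quoted above.

```python
from datetime import datetime, time

def out_of_hours_rate(timestamps, open_at=time(7, 0), close_at=time(19, 0)):
    """Fraction of timestamps whose time of day falls outside [open_at, close_at]."""
    outside = sum(1 for ts in timestamps if not (open_at <= ts.time() <= close_at))
    return outside / len(timestamps)

# Made-up sample check-in timestamps, for illustration only.
sample = [
    datetime(2022, 10, 1, 0, 0),    # midnight: outside any plausible hours
    datetime(2022, 10, 1, 9, 30),
    datetime(2022, 10, 1, 18, 59),
    datetime(2022, 10, 1, 20, 30),  # outside 7am-7pm, inside 5am-9pm
]
print(out_of_hours_rate(sample))                           # 7am-7pm bounds
print(out_of_hours_rate(sample, time(5, 0), time(21, 0)))  # padded bounds

# Statewide counts quoted above: inside vs. outside the 7am-7pm bounds.
in_bounds, out_of_bounds = 520_549, 156_576
print(f"{out_of_bounds / (in_bounds + out_of_bounds):.2%}")  # 23.12%
```

Padding the bounds only changes the result when a meaningful share of the out-of-hours timestamps sit just outside the original window, which is why the statewide figure barely moves as the window is widened.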

Update 2022-11-13

I adjusted the allowed times to 7am-10pm and re-ran the most recent 2022, 2021 and 2020 DAL files, as well as breaking down by locality. While doing this I noticed that some localities had all timestamps set to midnight, while others still had invalid timestamps set to unique values (but outside operational hours), and some had combinations of both. I’ve delineated the plots such that magenta traces are from ballot receipt timestamps that are all set to midnight, red trace is invalid timestamps not set to midnight, and blue traces are valid within 7am-10pm hours of operation (which is very very generous).

There are two error percentages computed and displayed in the graph title area. The first (“BRx error”) is as described above and results in a 23.14% error in the 2022 VA statewide data. The second (“BRx_Mok error”) is as described above except we allow for the uniformly midnight ballot receipt dates to be presumed allowable, and results in a 0.05% error metric.

The inclusion of the latter class of error computation is in order to account for the remote chance that a locality is legitimately using paper poll books or otherwise not recording the time of the voter check-in, but only recording the date information (which would be consistent with all timestamps at midnight). VA requires the use of electronic poll books, but there are still some that use manual entry paper poll-books as backup. So even IF that was the explanation for why so many entries were uniformly timestamped to midnight … (A) why did they have to go to their paper poll book backups in the first place? and (B) we still have a residual error of 0.05% across the state that needs to be explained even after removing uniform midnight timestamps from consideration. That might not seem a terribly huge error rate at first blush, but when you consider that most electronic data recording systems (at least that I am aware of) have error rate requirement thresholds for acceptance testing set to the order of 1/1,000,000 … that’s still unacceptable. I have been unable to find a documented requirement for error rate threshold for the electronic poll book systems used in VA, as per the VA department of elections.
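The two metrics can be sketched as below. This is a simplified illustration, not the plotting code: it treats any timestamp at exactly 00:00 as belonging to the “midnight” class, whereas the plots above separate out timestamps that are uniformly set to midnight; the function name and sample data are hypothetical.

```python
from datetime import datetime, time

MIDNIGHT = time(0, 0)

def receipt_error_rates(timestamps, open_at=time(7, 0), close_at=time(22, 0)):
    """Return (brx, brx_mok) as fractions of all timestamps.

    brx:     share of timestamps outside the allowed hours.
    brx_mok: same, but timestamps at exactly midnight are treated as allowable.
    """
    midnight = sum(1 for ts in timestamps if ts.time() == MIDNIGHT)
    other_invalid = sum(
        1 for ts in timestamps
        if ts.time() != MIDNIGHT and not (open_at <= ts.time() <= close_at)
    )
    n = len(timestamps)
    return (midnight + other_invalid) / n, other_invalid / n

# Made-up sample data, for illustration only.
sample = [
    datetime(2022, 10, 1, 0, 0),
    datetime(2022, 10, 2, 0, 0),
    datetime(2022, 10, 3, 10, 0),
    datetime(2022, 10, 3, 23, 30),  # invalid, and not at midnight
]
brx, brx_mok = receipt_error_rates(sample)
print(brx, brx_mok)
```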

The complete tabulation of all errors for each locality is provided here:

Selected Locality Plots:

The segmented Prince William County (my home county) 2022 plot is below. There is a 0.06% error rate of invalid (all midnight) timestamps in the Ballot Receipt date data.

The segmented Loudoun County 2022 plot is below. There is a 0.03% error rate of invalid (all midnight) timestamps in the Ballot Receipt date data.

The segmented Manassas City 2022 plot is below. There is a 5.82% error rate of invalid (all midnight) timestamps in the Ballot Receipt date data.

The segmented Mathews County 2022 plot is below. There is a 24.21% error rate of total invalid timestamps in the Ballot Receipt date data, and a reduced error rate of 15.71% when allowing all midnight timestamps to be considered as valid.

The segmented Virginia Beach City 2022 plot is below. There is a 0.24% error rate of invalid (all midnight) timestamps in the Ballot Receipt date data.

The complete set of generated plots for every locality is included in the attached zip file:


Variances Observed in TX Early Vote Data

Recently I’ve started downloading all of the data from the TX secretary of state website multiple times per day. Each time I download the data I grab new versions of files representing how many Mail-In or In-Person votes have happened since mail-in votes have started to be accepted, according to the TX SOS. Note that this TX Early Voting return data, which is required by law to be publicly posted daily, is supposed to reflect the number of voted ballots (either In-Person or Mail-In) per the previous days in the ongoing election and serves as the official public record of these ballot transactions.

The TX SOS site does not post the cumulative results, but instead has individual links by day that show the totals of each category of voted ballot. I have downloaded copies of all of this data over multiple days.

Now you would think that, if the TX SOS data was trustworthy and accurate, I shouldn’t see differences in the historical data on the TX SOS site from day to day. I should see new data as a newly available download, but the data associated with previous days’ results should stay the same.

… except it doesn’t.

In the gallery below are 3 separate graphs of the data pulled from the TX SOS site. Each pull of the data grabbed the entire history of the data.

If you play the images in sequence you will notice that between the first image (captured on 10/26 @ ~3pm) and the second image (captured 10/27 @ ~7am) there are a few thousand ballots that suddenly appear in the Mail-In ballot trace attributed to 10/22. Between the second and third image (captured 10/27 @ ~9pm) you will see that there are a handful (~10) of ballots that get retroactively added to the In-Person ballot totals attributed to 10/17 and 10/18.

What is the explanation for these additions?

I’m happy to supply the raw downloaded and timestamped files to anyone who is interested. Feel free to contact me and I will send the latest zip files and source code used to download.
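For anyone who wants to repeat this kind of check, the comparison between two pulls can be sketched as below. This is a minimal illustration and not the source code mentioned above: the function name, the dictionary layout keyed by (date, ballot type), and the sample counts are all assumptions made up for the example, and they are not real TX SOS figures.

```python
def diff_snapshots(earlier, later):
    """Find historical entries whose totals changed between two pulls.

    Each snapshot maps (date, ballot_type) -> ballot count.
    Only keys present in both pulls are compared, so days that are
    newly posted in the later pull are not flagged as changes.
    Returns {key: (old_count, new_count)}.
    """
    changes = {}
    for key, old_count in earlier.items():
        if key in later and later[key] != old_count:
            changes[key] = (old_count, later[key])
    return changes

# Hypothetical counts for illustration only.
pull_1 = {
    ("10/21", "Mail-In"): 50,
    ("10/22", "Mail-In"): 100,
}
pull_2 = {
    ("10/21", "Mail-In"): 50,
    ("10/22", "Mail-In"): 130,   # a previously posted day has changed
    ("10/23", "Mail-In"): 75,    # newly posted day, expected
}

for (day, kind), (old, new) in diff_snapshots(pull_1, pull_2).items():
    print(f"{day} {kind}: {old} -> {new} ({new - old:+d})")
```

If the posted history were stable, the returned dictionary would be empty for every pair of consecutive pulls.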