BLUF: The number of detected exact (Full Name + DOB) “clone” registration records in the VA Registered Voter List (RVL) file decreased overall from 2022-11-23 to 2023-07-01; however, new clones are still being added to the database.
There has been a concentrated effort by various election integrity groups and public officials around the state of VA to clean up the voter rolls. Specifically, the VA Department of Elections (“ELECT”) made a concerted effort in early 2023 to remove a large number (~19,000) of records for deceased voters, along with other errant records. The data below shows that these efforts have reduced the number of exact “clones” identified in the database, but that there is still more work to do.
BACKGROUND: This is a continuation of exploratory analysis on the existence of “cloned” records in the VA Registered Voter List. Please see previous posts here, here, here and here for background information.
As a reminder and for the purposes of this analysis, a potential “cloned” record is defined as a record in the VA Registered Voter List (RVL) where the Full Name (First + Middle + Last + Suffix) and full Date of Birth (mm/dd/yyyy) exactly match another record, but the two records have different Voter Identification Numbers. In my previous analyses I focused on the Active registration records that had been identified as clones. However, even if a cloned record is marked as Inactive in the database, it can still potentially be voted, because any interaction with the voter immediately moves the registration from Inactive to Active. Therefore the analysis below includes both Active and Inactive records.
It is important to note and emphasize that this analysis focuses only on exact clones, and not on any of the many other potential errors that could be present in a voter database. There are a few reasons for this narrow focus:
- The detection of exact clones in the database requires no additional data correlations and can operate directly on the data provided from ELECT. It is easily defined and scriptable, and can be replicated by other researchers and public officials for verification.
- Because of the first point above, the identification of exact clones is a good candidate to track over time as a proxy indicator for issues with the database.
- There are some rather interesting non-random distribution patterns of the already observed cloned records that I have previously discussed here, and I am interested in observing and understanding the cause of these distribution shapes.
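To illustrate how simple the exact-clone definition is to script, here is a minimal sketch in plain Python. It is not the code used for this analysis. The field names (FIRST_NAME, MIDDLE_NAME, LAST_NAME, SUFFIX, DOB, VOTER_ID) are assumptions for illustration and may not match the actual column headers in the file from ELECT. It also assumes the records have already been normalized.

```python
from collections import defaultdict

# Field names are assumptions for illustration; the actual RVL headers may differ.
KEY_FIELDS = ("FIRST_NAME", "MIDDLE_NAME", "LAST_NAME", "SUFFIX", "DOB")


def find_exact_clones(records):
    """Return the set of voter IDs whose full name + DOB match a different voter ID."""
    ids_by_key = defaultdict(set)
    for rec in records:
        if not rec.get("DOB"):
            continue  # no DOB, so no exact (name + DOB) match is possible
        # Missing middle names or suffixes are treated as empty strings.
        key = tuple((rec.get(field) or "") for field in KEY_FIELDS)
        ids_by_key[key].add(rec["VOTER_ID"])

    clones = set()
    for ids in ids_by_key.values():
        if len(ids) > 1:  # same name + DOB, more than one distinct voter ID
            clones |= ids
    return clones
```

The records argument can be any iterable of dictionaries, for example the rows produced by csv.DictReader over a delimited export.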
DETAILS: As we now have collected multiple statewide voter registration lists, I was curious as to how the numbers of detected cloned registrant records have changed over time, specifically with respect to the REGISTRATION_DATE field that is reported in each record.
In Figure 1 below I’ve plotted the number of identified cloned records in the 2022-11-23 RVL stratified by registration date year. Figure 2 is the same plot from the 2023-06-30 RVL file we recently purchased. Figure 3 shows the number of additions, removals and net change in the number of identified cloned records in each date bin between these two datasets, based on the unique set of cloned Voter ID Numbers that fall within each bin.
The total number of identified clones in the 2022-11-23 dataset was 2,445. The total number of identified clones in the 2023-06-30 dataset was 1,485. We can see in Figure 3 that while there has been an overall reduction of 960 in the number of cloned records (which is good!), there are still new clones being added to the voter registration database, even as previously identified clones have been removed. This suggests that one or more ongoing processes or mechanisms are continuously adding cloned records to the voter list database.
It is not readily apparent what causes the added cloned records. It could be any number of technology issues, such as poor input verification or poor coding practices, or it could be related to human error and poor procedures and/or training, or a mixture of these issues.
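The per-bin additions, removals and net change described for Figure 3 can be sketched with set differences on the cloned Voter ID Numbers. This is a hedged illustration, not the code behind the figure. It assumes each dataset has already been reduced to a hypothetical mapping of cloned voter ID to registration year.

```python
from collections import Counter


def clone_changes_by_year(old_clones, new_clones):
    """Compare two {cloned voter ID: registration year} mappings.

    Returns {year: {"added": n, "removed": n, "net": n}} for every year
    in which at least one cloned ID was added or removed.
    """
    added_ids = new_clones.keys() - old_clones.keys()    # clones only in the newer list
    removed_ids = old_clones.keys() - new_clones.keys()  # clones only in the older list

    added = Counter(new_clones[vid] for vid in added_ids)
    removed = Counter(old_clones[vid] for vid in removed_ids)

    return {
        year: {
            "added": added[year],
            "removed": removed[year],
            "net": added[year] - removed[year],
        }
        for year in sorted(set(added) | set(removed))
    }
```

As a sanity check, the net values summed over all bins must equal the difference between the two totals, which for the datasets above is 1,485 - 2,445 = -960.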
The two full datasets above (2022-11-23 and 2023-06-30) were purchased directly from the Department of Elections. The lists were parsed, standardized and normalized for string case, known typos, whitespace and punctuation issues, but otherwise the raw data entries were left unadjusted.
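As a rough illustration of that kind of normalization, the sketch below upper-cases a field, strips ASCII punctuation and collapses whitespace. The exact rules used in this analysis are not reproduced here. In particular, the known-typo corrections are omitted, and removing punctuation outright (rather than, say, replacing hyphens with spaces) is an assumption made for the example.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_field(value):
    """Upper-case a text field, drop punctuation, and collapse whitespace."""
    if value is None:
        return ""
    text = value.upper().translate(_PUNCT_TABLE)
    # Collapse runs of whitespace to a single space and trim the ends.
    return re.sub(r"\s+", " ", text).strip()
```

Applying the same function to every name field in both datasets before matching keeps the comparison consistent across files.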
We also purchased the Monthly Update Subscription (MUS) from ELECT at the time that we ordered the 2023-06-30 RVL. The MUS is generated on a monthly basis and captures the changes to the voter list over the prior month. We received the 2023-07-01 MUS, and applied the changes within it to the full 2023-06-30 RVL that we had received the day before. As the MUS contains all the changes over the previous month, and we had purchased our full dataset the prior day, we did not expect there to be many adjustments required, but there were a few. Applying the MUS to the 2023-06-30 dataset resulted in the generation of an updated 2023-07-01 dataset.
For completeness, the same plots that were generated above for the directly purchased data are repeated below for the updated RVL dataset. Figure 4 plots the number of identified cloned records in the 2023-07-01 RVL stratified by registration date year, and Figure 5 shows the differences with the 2022-11-23 dataset.
One thing that I find interesting is the difference in detected cloned entries between the purchased 2023-06-30 dataset and the 2023-07-01 dataset after the MUS entries have been applied. This is presented in Figure 6 below. We see that the application of the MUS did not remove any clones, but 90 were added. I’m not sure what this means yet, as we only have a single MUS file and it was generated so close to our full download. We will monitor and see how this progresses as we continue to receive the MUS files throughout the 2023 election cycle.