Categories
Election Data Analysis Election Forensics Election Integrity mathematics technical

‘Dark’ Transactions in VA’s Voter Registration Data

EPEC has compared the changes between two purchased full versions of the VA Registered Voter List (RVL) to the content of the Monthly Update Service (MUS) data covering the same time period. Of the ID numbers that were added to the RVL, 3,613 (or 1.0589% of total additions) never appear anywhere in the MUS files covering that period. Of the ID numbers that were removed from the RVL, 3,355 (or 2.4096% of total removals) never appear anywhere in the MUS files covering that period.


Since mid-2023 EPEC has been purchasing, processing and archiving copies of both the full Registered Voter List (RVL) and the Monthly Update Service (MUS) files, which give the UPDATE, ADD or CANCEL transactions to the voter list throughout the year.

Once a baseline RVL is established, the MUS files can be used to update that baseline in order to keep the list current. That should be all one needs to keep an accurate dataset of the registered voter list using monthly updates … except there is a catch … the MUS for some reason doesn’t quite capture all of the changes that are occurring in the voter list. In fact, we see about 1-2.5% of the ADD or CANCEL transactions between each RVL snapshot are not reflected by any corresponding entries in the MUS.

All of the changes that are made between two different RVL baseline snapshots should be observable in the corresponding MUS files that cover the same time period, and vice versa. The MUS has transaction logs accounting for new registrants, for registrants who move, for removing deceased individuals, for individuals that have had a change in their felon status, for individuals who are determined non-citizen, for administrative updates and corrections, etc. So, in theory, it should be a complete record. However, over the course of working with the VA data files, every so often we have noticed that some transactions seem to be unaccounted for. Therefore, once we had enough data compiled, we decided to test just how well the MUS data actually explains the changes we see between two baseline RVL files.

Method:

For this experiment, we used full RVL snapshots purchased from VA Department of Elections (ELECT) on 2023-06-30 and 2024-08-29, and all of the monthly MUS distributions covering the entire time period in between.

Using the voter ID number field that is present in all datasets, we first determine which ID numbers were added to the 2024 RVL dataset, and which ID numbers were deleted from the 2023 RVL data. We then checked to see how many of those ID numbers appear in any of the MUS data files, for any reason.

Note that this data was processed statewide, such that registrants moving between localities within the state should not affect the total number of computed additions or removals, as the ID numbers should still be present in the datasets, although corresponding locality information may have changed.
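The comparison described above can be sketched with plain set operations. This is a minimal illustration only, not EPEC's actual processing code; the function names and the toy ID numbers are hypothetical.

```python
def snapshot_diff(old_ids, new_ids):
    """Return (added, removed) ID sets between two RVL snapshots."""
    old_ids, new_ids = set(old_ids), set(new_ids)
    added = new_ids - old_ids      # present in the newer snapshot only
    removed = old_ids - new_ids    # present in the older snapshot only
    return added, removed


def not_in_mus(changed_ids, mus_ids):
    """IDs that changed between snapshots but never appear in any MUS file."""
    return set(changed_ids) - set(mus_ids)


# Toy data: hypothetical ID numbers, not real records.
rvl_2023 = {"100", "101", "102", "103"}
rvl_2024 = {"101", "102", "104", "105"}
mus_ids = {"100", "102", "104"}   # every ID seen in any MUS file, for any reason

added, removed = snapshot_diff(rvl_2023, rvl_2024)
dark_added = not_in_mus(added, mus_ids)
dark_removed = not_in_mus(removed, mus_ids)

print("added:", sorted(added), "removed:", sorted(removed))
print("added but not in MUS:", sorted(dark_added))
print("removed but not in MUS:", sorted(dark_removed))
```

Because the comparison is on ID numbers alone and is statewide, a registrant who only changes locality appears in both snapshots and contributes to neither set.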

Results:

The breakdown of the changes present in the MUS files over the time period of the RVL snapshots (2023-06-30 through 2024-08-29) is given in Figure 1 below. The MUS data was deduplicated and truncated to only consider transactions with TRANSACTION date information between the dates associated with the RVL datasets. The bars in Figure 1 are logarithmically scaled on the y-axis, with the x-axis representing the NVRAReasonCode given for each transaction in the MUS. The bars are color coded by transaction type. As there are duplicates and oversampling within the collection of MUS files, only the latest transaction for each uniquely identified ID number was used to generate the plot. As can be seen from the various categories along the x-axis of this plot, the data in the MUS logs should be sufficient to capture all of the transactions within the RVL.
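A rough sketch of the deduplication and date truncation described above follows. It assumes each MUS transaction has been parsed into a dictionary; the keys `id`, `date`, `type` and `reason`, the reason-code values, and the inclusive date window are all assumptions made for illustration, not the actual MUS file layout.

```python
from collections import Counter
from datetime import date


def latest_in_window(transactions, start, end):
    """Keep, per ID, only the latest transaction dated within [start, end]."""
    latest = {}
    for tx in transactions:
        if not (start <= tx["date"] <= end):
            continue  # outside the period covered by the two RVL snapshots
        prev = latest.get(tx["id"])
        if prev is None or tx["date"] > prev["date"]:
            latest[tx["id"]] = tx
    return latest


# Toy transactions with hypothetical field names and reason codes.
txs = [
    {"id": "A", "date": date(2023, 7, 1), "type": "ADD", "reason": "NEW"},
    {"id": "A", "date": date(2024, 1, 5), "type": "UPDATE", "reason": "MOVED"},
    {"id": "A", "date": date(2024, 1, 5), "type": "UPDATE", "reason": "MOVED"},    # duplicate row
    {"id": "B", "date": date(2023, 6, 1), "type": "CANCEL", "reason": "DECEASED"},  # before window
    {"id": "C", "date": date(2024, 8, 1), "type": "CANCEL", "reason": "DECEASED"},
]

latest = latest_in_window(txs, date(2023, 6, 30), date(2024, 8, 29))

# Tally by (reason code, transaction type), the grouping used for a plot like Figure 1.
breakdown = Counter((tx["reason"], tx["type"]) for tx in latest.values())
print(breakdown)
```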

Figure 1: Breakdown of MUS transactions between 2023-06-30 and 2024-08-29

Direct Inspection of the RVL Snapshots:

Performing a simple set-difference between the unique ID numbers present in the 2023-06-30 RVL data vs the 2024-08-29 RVL data shows that there were 341,191 unique IDs added, and 139,232 removed between the two datasets.

Of the ID numbers that were ADDED between the raw RVL snapshots, 3,613 (or 1.0589%) never appear anywhere in the MUS files covering the same temporal period.

Of the 3,613 ID numbers that were ADDED between the raw RVL snapshots, and that don’t appear in the MUS record, 537 (or 14.863%) have at least one entry in the Voter History List (VHL) data that EPEC has been collecting and archiving.

Of the ID numbers that were REMOVED between the raw RVL snapshots, 3,355 (or 2.4096%) never appear anywhere in the MUS files covering the same temporal period.

Of the 3,355 ID numbers that were REMOVED between the raw RVL snapshots, and that don’t appear in the MUS record, 2,011 (or 59.94%) have at least one entry in the VHL data that EPEC has been collecting and archiving.
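The percentages quoted above follow directly from the reported counts. As a quick arithmetic check (the helper name `pct` is ours, used only for illustration):

```python
def pct(part, whole):
    """Express part as a percentage of whole."""
    return 100.0 * part / whole


# Counts as reported in this section.
num_added, added_not_in_mus, added_not_in_mus_wvhl = 341_191, 3_613, 537
num_removed, removed_not_in_mus, removed_not_in_mus_wvhl = 139_232, 3_355, 2_011

print(f"Added, not in MUS:             {pct(added_not_in_mus, num_added):.4f}%")
print(f"  ...of those, with VHL entry: {pct(added_not_in_mus_wvhl, added_not_in_mus):.3f}%")
print(f"Removed, not in MUS:           {pct(removed_not_in_mus, num_removed):.4f}%")
print(f"  ...of those, with VHL entry: {pct(removed_not_in_mus_wvhl, removed_not_in_mus):.2f}%")
```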

Using the MUS-Adjusted RVL baseline

If we ignore the 2024-08-29 dataset, and instead directly apply the transactions in the MUS data files to the 2023-06-30 dataset in order to create a new RVL list, we end up with 342,888 additions and 137,849 removals of unique voter ID numbers. We see 1,697 more additions (342,888 - 341,191 = 1,697) when directly applying the MUS than when directly comparing RVL snapshots, and 1,383 fewer removals (139,232 - 137,849 = 1,383). Keep in mind these discrepancies are in addition to the 3,613 and 3,355 discrepancies found using the RVL snapshot baselines, as the ID numbers in each set are unique. So the total number of discrepancies is 3,613 + 3,355 + 1,697 + 1,383 = 10,048 records.
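As a simplified sketch of what building a MUS-adjusted baseline involves, the snippet below replays ADD and CANCEL transactions against a set of IDs and compares the result with a later snapshot. The tuple layout, the function name `apply_mus` and the toy IDs are assumptions for illustration; the real MUS files carry many more fields. The last few lines simply re-derive the discrepancy totals from the reported counts.

```python
def apply_mus(baseline_ids, transactions):
    """Replay (type, id) transactions, in order, against a baseline set of IDs."""
    current = set(baseline_ids)
    for tx_type, voter_id in transactions:
        if tx_type == "ADD":
            current.add(voter_id)
        elif tx_type == "CANCEL":
            current.discard(voter_id)
        # UPDATE (and any other type) leaves list membership unchanged
    return current


# Toy data: hypothetical ID numbers, not real records.
baseline = {"100", "101", "102"}
snapshot = {"101", "102", "103", "105"}
mus = [("ADD", "103"), ("CANCEL", "100"), ("UPDATE", "101"), ("ADD", "104")]

adjusted = apply_mus(baseline, mus)
only_in_adjusted = adjusted - snapshot   # MUS says present, snapshot disagrees
only_in_snapshot = snapshot - adjusted   # snapshot says present, MUS never added it
print(sorted(only_in_adjusted), sorted(only_in_snapshot))

# Discrepancy totals recomputed from the counts reported in this section.
extra_additions = 342_888 - 341_191
fewer_removals = 139_232 - 137_849
total_discrepancies = 3_613 + 3_355 + extra_additions + fewer_removals
print(extra_additions, fewer_removals, total_discrepancies)
```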

Summary of these results:
                   Num_Added: 341191
        Num_Added_not_in_MUS: 3613
        Pct_Added_not_in_MUS: 1.0589
  Num_Added_not_in_MUS_wVHL: 537
                 Num_Removed: 139232
      Num_Removed_not_in_MUS: 3355
      Pct_Removed_not_in_MUS: 2.4096
Num_Removed_not_in_MUS_wVHL: 2011
           MUS_Num_Deletions: 137849
           MUS_Num_Additions: 342888
             MUS_Num_Updates: 946248
            NUS_Num_NOOP_ADD: 651
         NUS_Num_NOOP_MODIFY: 334282

Discussion:

We do not yet understand the origin of these discrepancies. It could be a coding error on the part of the developers of the VERIS system, or it could be that there is a category of data adjustments that is not adequately reflected in the RVL or MUS data products. The RVL snapshots are supposed to be the authoritative record of the voter registration data, and the MUS data updates are supposed to capture all of the transactional changes to said registration records.

Regardless of the cause of the discrepancy, the fact remains that there are a small number of transactions and changes to the voter record that are unobservable. They are, in effect, “dark” transactions in the voter registration data that cannot be observed, validated or verified.

Categories
Election Integrity technical

VA Department of Elections Removing Full Date of Birth from Purchased Datasets

The VA Department of Elections (ELECT) has given us at EPEC notice (as of 8/26/2024) that they will be removing the full date of birth information from purchased datasets and replacing it with year of birth only. Note that this is not in relation to publicly available data, but for data that has been purchased by specific qualifying individuals or organizations capable of purchasing and handling expanded datasets according to VA law and ELECT policies. Current VA law does NOT allow full birthdate information in publicly disclosed records, but there is no restriction on it being included in otherwise protected records provided to qualified organizations. In fact, many qualified organizations have relied on this information in order to perform their legitimate functions.

Full birthdate information is an important field when trying to identify and discriminate between individual registrants that might otherwise have the same name information: “John Q. Smith 10/19/1981” can obviously be determined to be a unique registrant vs. “John Q. Smith 04/01/1981”. But if only the year of birth information is available, these two (hypothetical) records become much more difficult to distinguish without supplemental information. Removing the full date of birth information degrades the confidence and accuracy of registration matching queries.

The degradation in the ability to confidently match records due to removing month and day of birth information can be seen in the table below. 1,000,000 records from a recent VA RVL (Aug 2024) were used to demonstrate what happens to the ability to match records when altering the birth information. All other processing was exactly the same. Even when only considering records that are exactly matching, the removal of month and day of birth information raises the number of potential matches from 113 to 441, nearly four times as many (the “% Change” row in the table expresses the Year-Only count as a percentage of the Full-DOB count). That means that for every true match that Full-DOB processing would produce, there are roughly 3 additional false-positive matches when only using year of birth information. The numbers get increasingly worse as you consider 1, 2, or 3 character differences.

            Exact Matches   1 Char Diff   2 Chars Diff   3 Chars Diff      Total
Full-DOB              113            86            334           1556       2089
Year-Only             441          3512          10199          34151      48303
% Change          390.27%      4083.72%       3053.59%       2194.79%   2312.25%
1,000,000 records from the most recent VA Registered Voter List (RVL) were compared for similarity score using different birth information. “Full-DOB” processing utilized the FIRST MIDDLE LAST SUFFIX GENDER MM/DD/YYYY information. Year only processing utilized FIRST MIDDLE LAST SUFFIX GENDER YYYY information. All other processing was exactly the same.
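The effect can be illustrated with the two hypothetical “John Q. Smith” records mentioned earlier. The sketch below assumes the similarity score is a Levenshtein edit distance over a space-joined key string; the source does not state which metric or key format was actually used, so both are assumptions, as are the helper names and the gender value in the toy records.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = curr
    return prev[-1]


def full_key(rec):
    """FIRST MIDDLE LAST SUFFIX GENDER MM/DD/YYYY"""
    return " ".join([rec["first"], rec["middle"], rec["last"],
                     rec["suffix"], rec["gender"], rec["dob"]])


def year_key(rec):
    """FIRST MIDDLE LAST SUFFIX GENDER YYYY"""
    return " ".join([rec["first"], rec["middle"], rec["last"],
                     rec["suffix"], rec["gender"], rec["dob"][-4:]])


# Two hypothetical registrants with the same name and birth year.
a = {"first": "JOHN", "middle": "Q", "last": "SMITH", "suffix": "", "gender": "M", "dob": "10/19/1981"}
b = {"first": "JOHN", "middle": "Q", "last": "SMITH", "suffix": "", "gender": "M", "dob": "04/01/1981"}

full_dist = edit_distance(full_key(a), full_key(b))
year_dist = edit_distance(year_key(a), year_key(b))
print("Full-DOB distance:", full_dist)
print("Year-Only distance:", year_dist)
```

With the full date of birth the two keys are four edits apart and would fall outside all of the table's categories; with year only they become an exact match.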

The ability to match or distinguish between registration records in the registered voter list is important for being able to perform a number of different legitimate activities by authorized organizations, such as:

  • The determination if a person is already registered (or not) by get-out-the-vote organizations.
  • Determining the existence of potential duplicate registration records by election integrity, public research, and watchdog organizations.
  • The identification and validation of potential Electoral Board member candidates, Election Officers and authorized Poll Watchers by political parties and candidates as required by VA law.
  • The vetting of volunteers for other partisan candidate and party functions and activities (door-knockers, event organizers, etc.)
  • The identification and verification of deceased individuals and corroborating obituary information.
  • … and much, much, more.

This change in the official data by ELECT will also affect the organizational and operational logistics of multiple organizations. With only a few weeks left before the start of the early voting period, organizations (both partisan and non-partisan alike) will need to expend precious money, time and resources in order to correct all of their data ingest and processing systems to handle the new formats and fields. They will also need to invent new logic to combine and fuse older (full-DOB) data with the newly released (year-only) data in order to maintain their mission effectiveness.

Note: This also means that there is an increased risk of spam phone calls and marketing materials being sent to individuals who otherwise would have been excluded from targeted marketing efforts had they been able to confidently discriminate them from similar, but different, registration records. Each of those otherwise unnecessary phone calls or text messages costs time and money for the candidates and campaigns, and the companies they hire, as well as potential annoyance by the recipients.

As it stands, we (the proverbial “we”) are just going to have to deal with this sudden loss of data fidelity for the time being. Even if political or legal remedies are ultimately successful, they will take time. The work still needs to get done in the meantime. It’s not the end of the world, but it decreases the confidence of automated matching systems, and increases the amount of human labor required by volunteers, campaigns, registrars and other election officials.

Some questions I have regarding this matter:

  • Why the sudden change? There does not seem to be a recent court case or legal reason pressing for immediate change, at least that I am aware of.
  • Why not wait until after the election so as not to impact current operations of various election-related organizations? This would be consistent with how the Department of Elections has operated in the past.
Categories
Election Data Analysis Election Forensics Election Integrity technical

2024 VA November General Election DAL File Metrics

Below you will find the current summary data and graphics from the 2024 VA November General Election Daily Absentee List (DAL) files. We pull the DAL file every day and track the count of each specific ballot category in each daily file.


Linear Scale Plot:

Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.

Logarithmic Scale Plot:

The logarithmic plot shows the same underlying data as the linear scale plot, except with a logarithmic y-scale in order to compress the dynamic range and show the shape of all of the data curves in a single graphic. Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.

Summary Data Table:

The underlying data for the graphics above is provided in the summary data table.

Additional Data:

Additional CSV datasets stratified by Locality, City, Congressional District, State House District, State Senate District, and Precinct are available here.

Data column descriptions:
  • “ISSUED” := Number of DAL file records where BALLOT_STATUS = “ISSUED”
  • “NOT_ISSUED” := Number of DAL file records where BALLOT_STATUS = “NOT ISSUED”
  • “PROVISIONAL” := Number of DAL file records where BALLOT_STATUS = “PROVISIONAL” and APP_STATUS = “APPROVED”
  • “DELETED” := Number of DAL file records where BALLOT_STATUS = “DELETED”
  • “MARKED” := Number of DAL file records where BALLOT_STATUS = “MARKED” and APP_STATUS = “APPROVED”
  • “ON_MACHINE” := Number of DAL file records where BALLOT_STATUS = “ON_MACHINE” and APP_STATUS = “APPROVED”
  • “PRE_PROCESSED” := Number of DAL file records where BALLOT_STATUS = “PRE-PROCESSED” and APP_STATUS = “APPROVED”
  • “FWAB” := Number of DAL file records where BALLOT_STATUS = “FWAB” and APP_STATUS = “APPROVED”
  • “MAIL_IN” := The sum of “MARKED” + “PRE_PROCESSED”
  • “COUNTABLE” := The sum of “PROVISIONAL” + “MARKED” + “PRE_PROCESSED” + “ON_MACHINE” + “FWAB”
  • “MILITARY” := Number of DAL file records where VOTER_TYPE = “MILITARY”
  • “OVERSEAS” := Number of DAL file records where VOTER_TYPE = “OVERSEAS”
  • “TEMPORARY” := Number of DAL file records where VOTER_TYPE = “TEMPORARY”
  • “MILITARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE = “MILITARY” and where COUNTABLE is True
  • “OVERSEAS_COUNTABLE” := Number of DAL file records where VOTER_TYPE = “OVERSEAS” and where COUNTABLE is True
  • “TEMPORARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE = “TEMPORARY” and where COUNTABLE is True
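The column definitions above can be expressed as a small tally over parsed DAL records. This is a sketch under stated assumptions rather than EPEC's processing code: it assumes each record is a dictionary with BALLOT_STATUS, APP_STATUS and VOTER_TYPE keys, and the “DENIED” value in the toy data is a hypothetical non-approved status used only to exercise the filter.

```python
from collections import Counter

# Column name -> BALLOT_STATUS value, following the definitions above.
ANY_APP_STATUS = {"ISSUED": "ISSUED", "NOT_ISSUED": "NOT ISSUED", "DELETED": "DELETED"}
APPROVED_ONLY = {
    "PROVISIONAL": "PROVISIONAL", "MARKED": "MARKED", "ON_MACHINE": "ON_MACHINE",
    "PRE_PROCESSED": "PRE-PROCESSED", "FWAB": "FWAB",
}
VOTER_TYPES = ("MILITARY", "OVERSEAS", "TEMPORARY")


def is_countable(rec):
    """True if the record falls into one of the columns summed into COUNTABLE."""
    return (rec.get("APP_STATUS") == "APPROVED"
            and rec.get("BALLOT_STATUS") in APPROVED_ONLY.values())


def summarize(records):
    counts = Counter()
    for rec in records:
        status = rec.get("BALLOT_STATUS")
        for col, value in ANY_APP_STATUS.items():
            if status == value:
                counts[col] += 1
        if is_countable(rec):
            for col, value in APPROVED_ONLY.items():
                if status == value:
                    counts[col] += 1
            counts["COUNTABLE"] += 1
        vtype = rec.get("VOTER_TYPE")
        if vtype in VOTER_TYPES:
            counts[vtype] += 1
            if is_countable(rec):
                counts[vtype + "_COUNTABLE"] += 1
    counts["MAIL_IN"] = counts["MARKED"] + counts["PRE_PROCESSED"]
    return counts


# Toy records, not real DAL data.
records = [
    {"BALLOT_STATUS": "ISSUED", "APP_STATUS": "APPROVED", "VOTER_TYPE": ""},
    {"BALLOT_STATUS": "MARKED", "APP_STATUS": "APPROVED", "VOTER_TYPE": "MILITARY"},
    {"BALLOT_STATUS": "PRE-PROCESSED", "APP_STATUS": "APPROVED", "VOTER_TYPE": "OVERSEAS"},
    {"BALLOT_STATUS": "MARKED", "APP_STATUS": "DENIED", "VOTER_TYPE": "MILITARY"},
    {"BALLOT_STATUS": "ON_MACHINE", "APP_STATUS": "APPROVED", "VOTER_TYPE": ""},
]
summary = summarize(records)
print(dict(summary))
```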

All data purchased by Electoral Process Education Corp. (EPEC) from the VA Dept of Elections (ELECT). All processing performed by EPEC.

If you like the work that EPEC is doing, please support us with a donation.