VA Department of Elections Removing Full Date of Birth from Purchased Datasets

The VA Department of Elections (ELECT) has given us at EPEC notice (as of 8/26/2024) that it will be removing the full date of birth from purchased datasets and replacing it with year of birth only. Note that this does not concern publicly available data, but data that has been purchased by specific qualifying individuals or organizations capable of purchasing and handling expanded datasets according to VA law and ELECT policies. Current VA law does NOT allow full birthdate information in publicly disclosed records, but there is no restriction on it being included in otherwise protected records provided to qualified organizations. In fact, many qualified organizations have relied on this information in order to perform their legitimate functions.

Full birthdate is an important field when trying to identify and distinguish between individual registrants who might otherwise share the same name: “John Q. Smith 10/19/1981” is clearly a different registrant from “John Q. Smith 04/01/1981”. But if only the year of birth is available, these two (hypothetical) records become much more difficult to tell apart without supplemental information. Removing the full date of birth degrades the confidence and accuracy of registration matching queries.
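As a rough illustration of the point above, here is a minimal Python sketch that builds a comparison key for the two hypothetical records and groups them by that key. The record layout, the field names (`id`, `first`, `middle`, `last`, `dob`) and the helper names are assumptions made for this example only; they are not ELECT's schema or anyone's production matching code.

```python
from collections import defaultdict

# Hypothetical records; field names are illustrative, not ELECT's actual schema.
records = [
    {"id": 1, "first": "John", "middle": "Q", "last": "Smith", "dob": "10/19/1981"},
    {"id": 2, "first": "John", "middle": "Q", "last": "Smith", "dob": "04/01/1981"},
]

def match_key(rec, year_only=False):
    """Build a comparison key from the name fields plus full DOB or birth year."""
    dob = rec["dob"][-4:] if year_only else rec["dob"]
    return " ".join([rec["first"], rec["middle"], rec["last"], dob]).upper()

def group_by_key(recs, year_only=False):
    """Group record ids by comparison key; a group of 2+ ids is a potential match."""
    groups = defaultdict(list)
    for rec in recs:
        groups[match_key(rec, year_only)].append(rec["id"])
    return dict(groups)

full_groups = group_by_key(records)                  # two distinct keys
year_groups = group_by_key(records, year_only=True)  # both records share one key

print(len(full_groups))  # 2
print(len(year_groups))  # 1
```

With the full date of birth the two records land in separate groups; with year only they collapse into a single group and would need some other field to be told apart.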

The degradation in the ability to confidently match records due to removing month and day of birth information can be seen in the table below. 1,000,000 records from a recent VA RVL (Aug 2024) were used to demonstrate what happens to the ability to match records when altering the birth information. All other processing was exactly the same. Even when only considering records that match exactly, removing the month and day of birth raises the number of potential matches from 113 to 441, which is roughly 390% of the Full-DOB count (an increase of about 290%). That means that for every match that Full-DOB processing would produce, there are roughly 3 additional, likely false-positive, matches when only using year of birth. The numbers get considerably worse once you allow 1, 2, or 3 character differences.

          | Exact Matches | 1 Char Diff | 2 Chars Diff | 3 Chars Diff | Total
Full-DOB  | 113           | 86          | 334          | 1556         | 2089
Year-Only | 441           | 3512        | 10199        | 34151        | 48303
% Change  | 390.27%       | 4083.72%    | 3053.59%     | 2194.79%     | 2312.25%

1,000,000 records from the most recent VA Registered Voter List (RVL) were compared for similarity score using different birth information. “Full-DOB” processing utilized the FIRST MIDDLE LAST SUFFIX GENDER MM/DD/YYYY information. “Year-Only” processing utilized the FIRST MIDDLE LAST SUFFIX GENDER YYYY information. All other processing was exactly the same. The “% Change” row expresses the Year-Only count as a percentage of the Full-DOB count.
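For readers who want to try this kind of comparison themselves, the sketch below counts record pairs by how many character edits separate their comparison strings. It assumes Levenshtein edit distance as the "character difference" measure and uses a few made-up keys; the actual similarity score behind the table above is not specified here, so treat `levenshtein` and `bucket_pairs` as hypothetical stand-ins rather than the method that produced those numbers.

```python
from collections import Counter
from itertools import combinations

def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def bucket_pairs(keys, max_diff=3):
    """Count pairs of keys at each edit distance from 0 up to max_diff."""
    counts = Counter()
    for a, b in combinations(keys, 2):
        d = levenshtein(a, b)
        if d <= max_diff:
            counts[d] += 1
    return counts

# Made-up keys in FIRST MIDDLE LAST GENDER MM/DD/YYYY form (no suffix).
full_keys = [
    "JOHN Q SMITH M 10/19/1981",
    "JOHN Q SMITH M 04/01/1981",
    "JON Q SMITH M 10/19/1981",
]
# Drop MM/DD/ from the end of each key, keeping only the year.
year_keys = [k[:-10] + k[-4:] for k in full_keys]

full_counts = bucket_pairs(full_keys)
year_counts = bucket_pairs(year_keys)
print(dict(full_counts))  # {1: 1}
print(dict(year_counts))  # {0: 1, 1: 2}
```

Even on three toy records, dropping the month and day turns one near-match into three candidate matches. Comparing every pair like this is quadratic in the number of records, so a run over 1,000,000 records would need some form of blocking or indexing on top of this sketch.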

The ability to match or distinguish between registration records in the registered voter list is important for being able to perform a number of different legitimate activities by authorized organizations, such as:

  • The determination if a person is already registered (or not) by get-out-the-vote organizations.
  • Determining the existence of potential duplicate registration records by election integrity, public research, and watchdog organizations.
  • The identification and validation of potential Electoral Board member candidates, Election Officers and authorized Poll Watchers by political parties and candidates as required by VA law.
  • The vetting of volunteers for other partisan candidate and party functions and activities (door-knockers, event organizers, etc.)
  • The identification and verification of deceased individuals and corroborating obituary information.
  • … and much, much, more.

This change in the official data by ELECT will also affect the organizational and operational logistics of multiple organizations. With only a few weeks left before the start of the early voting period, organizations (both partisan and non-partisan alike) will need to expend precious money, time and resources in order to correct all of their data ingest and processing systems to handle the new formats and fields. They will also need to invent new logic to combine and fuse older (full-DOB) data with the newly released (year-only) data in order to maintain their mission effectiveness.
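One possible shape for that fusion logic is sketched below. It assumes both the older and the newer extracts carry a stable per-registrant identifier (called `voter_id` here) and that the new extract exposes a `birth_year` field; those field names, the `dob_source` flag and the `fuse_dob` helper are all hypothetical and would need to be adapted to whatever the real files contain.

```python
def fuse_dob(old_records, new_records):
    """Carry a previously known full DOB forward onto year-only records."""
    old_by_id = {rec["voter_id"]: rec for rec in old_records}
    fused = []
    for rec in new_records:
        # Start from the new record and assume no full DOB is known.
        merged = dict(rec, dob=None, dob_source="year_only")
        old = old_by_id.get(rec["voter_id"])
        # Only reuse the old DOB when its year agrees with the new extract.
        if old is not None and old["dob"][-4:] == str(rec["birth_year"]):
            merged["dob"] = old["dob"]
            merged["dob_source"] = "prior_extract"
        fused.append(merged)
    return fused

# Hypothetical older (full-DOB) and newer (year-only) extracts.
old_extract = [
    {"voter_id": "A1", "dob": "10/19/1981"},
    {"voter_id": "B2", "dob": "04/01/1981"},
]
new_extract = [
    {"voter_id": "A1", "birth_year": 1981},  # known id, year agrees
    {"voter_id": "B2", "birth_year": 1982},  # known id, year disagrees
    {"voter_id": "C3", "birth_year": 1990},  # new registrant
]

fused = fuse_dob(old_extract, new_extract)
for rec in fused:
    print(rec["voter_id"], rec["dob"], rec["dob_source"])
```

Recording where each date of birth came from keeps downstream matching honest: records flagged `year_only` can be routed to stricter rules or manual review, and new registrants will always fall into that bucket.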

Note: This also means that there is an increased risk of spam phone calls and marketing materials being sent to individuals who would otherwise have been excluded from targeted marketing efforts, had campaigns been able to confidently distinguish them from similar, but different, registration records. Each of those otherwise unnecessary phone calls or text messages costs time and money for the candidates and campaigns, and the companies they hire, as well as being a potential annoyance for the recipients.

As it stands, we (the proverbial “we”) are just going to have to deal with this sudden loss of data fidelity for the time being. Even if political or legal remedies are ultimately successful, they will take time. The work still needs to get done in the meantime. It’s not the end of the world, but it decreases the confidence of automated matching systems, and increases the amount of human labor required of volunteers, campaigns, registrars and other election officials.

Some questions I have regarding this matter:

  • Why the sudden change? There does not seem to be a recent court case or legal reason pressing for immediate change, at least that I am aware of.
  • Why not wait until after the election, so as not to impact the current operations of various election-related organizations? This would be consistent with how the Department of Elections has operated in the past.
