Categories
Election Data Analysis Election Forensics Election Integrity mathematics technical Uncategorized

VA 2024 March Primary Election Fingerprints

Abstract

Examining the Election Night Reporting data from the VA 2024 March Democratic and Republican primaries provides supporting evidence that the Republican primary was impacted and skewed by a large number of Democratic “crossover” voters, resulting in an irregular election fingerprint when the data is plotted.

Background

The US National Academy of Sciences (NAS) published a paper in 2012 titled “Statistical detection of systematic election irregularities.” [1] The paper asked the question, “How can it be distinguished whether an election outcome represents the will of the people or the will of the counters?” The study reviewed the results from elections in Russia and other countries where widespread fraud was suspected. It was published in the Proceedings of the National Academy of Sciences and has been referenced in multiple election guides by USAID [2][3], among other citations.

The study authors’ thesis was that, with a large sample of the voting data, they would be able to see whether or not voting patterns deviated from the voting patterns of elections where there was no suspected fraud. The results of their study showed that there were indeed significant deviations from the expected, normal voting patterns in the elections where fraud was suspected, and provided a number of interesting insights into the “signatures” that various electoral mechanisms leave in the data.

Statistical results are often graphed to provide a visual representation of how normal data should look. A particularly useful visual representation of election data, as utilized in [1], is a two-dimensional histogram of the percent voter turnout vs the percent vote share for the winner, or what I call an “election fingerprint”. Under the assumptions of a truly free and fair election, the expected shape of the fingerprint is that of a 2D Gaussian (a.k.a. a “Normal”) distribution [4]. The obvious caveat here is that no election is ever perfect, but with a large enough sample of data points we should be able to identify large-scale statistical properties.

In many situations, the results of an experiment follow what is called a ‘normal distribution’. For example, if you flip a coin 100 times and count how many times it comes up heads, the average result will be 50. But if you do this test 100 times, most of the results will be close to 50, but not exactly. You’ll get almost as many cases with 49, or 51. You’ll get quite a few 45s or 55s, but almost no 20s or 80s. If you plot your 100 tests on a graph, you’ll get a well-known shape called a bell curve that’s highest in the middle and tapers off on either side. That is a normal distribution.

https://news.mit.edu/2012/explained-sigma-0209
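The coin-flip experiment described in the quote above is easy to simulate. The following is a minimal sketch, not part of the original analysis; the function name `heads_counts` and the fixed random seed are assumptions made only for illustration.

```python
import random
from collections import Counter


def heads_counts(trials=100, flips=100, seed=0):
    """Repeat the 'flip a coin `flips` times' test `trials` times.

    Returns the number of heads observed in each trial.
    The seed is fixed only so that repeated runs give the same picture.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < 0.5 for _ in range(flips)) for _ in range(trials)]


counts = heads_counts()
histogram = Counter(counts)

# Crude text plot: one '#' per trial that produced this many heads.
for heads in sorted(histogram):
    print(f"{heads:3d} {'#' * histogram[heads]}")
```

The printed rows should bunch up around 50 heads and thin out on either side, which is the bell shape the quote describes.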

In a free and fair election, the plotted graphs of both the Turnout percentage and the percentage of Vote Share for Election Winner should (again … ideally) both resemble Gaussian “Normal” distributions, and their combined distribution should also follow a 2-dimensional Gaussian (or “normal”) distribution. This 2-dimensional joint distribution of the % Turnout vs. % Vote Share is what I refer to as an “Election Fingerprint”.

Figure 1 reprints examples from the referenced National Academy of Sciences paper. The actual election results in Russia, Uganda and Switzerland appear in the left column, the right column is the modeled expected appearance in a fair election with little fraud, and the middle column is the researchers’ model of the as-collected data, with any possible fraud mechanisms included.

Figure 1: NAS Paper Results (reprinted from [1])

As you can see, the election in Switzerland (assumed fair) shows a range of voter turnout, from approximately 30 – 70% across voting districts, and a similar range of votes for the winner. The Switzerland data is consistent across models, and does not show any significant irregularities.

What do the clusters mean in the Russia 2011 and 2012 elections? Of particular concern are the top right corners, showing nearly 100% turnout of voters, and nearly 100% of them voted for the winner.

Both of those events (more than 90% of registered voters turning out to vote and more than 90% of the voters voting for the winner) are statistically improbable, even for very contested elections. Election results that show a strong linear streak away from the main fingerprint lobe indicate ‘ballot stuffing,’ where ballots are added at a specific rate. Voter turnout over 100% indicates ‘extreme fraud’. [1][5]

Note that election results with ‘outliers’ – results that fall outside of expected normal voting patterns – while evidentiary indicators, are not in and of themselves definitive proof of outright fraud or malfeasance. For example, in rare but extreme cases, where the electorate is very split and the split closely follows the geographic boundaries between voting precincts, we could see multiple overlapping Gaussian lobes in the 2D image. Even in that rare case, there should not be distinct structures visible in the election fingerprint, linear streaks, overly skewed or smeared distributions, or exceedingly high turnout or vote share percentages. Additional reviews of voting patterns and election results should be conducted whenever deviations from normal patterns occur in an election.

Additionally it should be noted that “the absence of evidence is not the evidence of absence”: Election Fingerprints that look otherwise normal might still have underlying issues that are not readily apparent with this view of the data.

Results on 2024 VA March Primaries:

Figure 2 and Figure 3 are the computed election fingerprints for the Democratic and Republican VA 2024 March Primaries, respectively. They were computed according to the NAS paper and using official state reported voter turnout and votes for the statewide winner and reported per voting Locality with combined In-Person Early, Election Day, Absentee and Provisional votes. Figures 4 and 5 perform the same process, except each data point is generated per individual precinct in a locality. The color scale moves from precincts with low counts as deep blue, to precincts with high numbers represented as bright yellow. Note that a small blurring filter was applied to the computed image for ease of viewing small isolated Locality or Precinct results.

The upper right inset in each graphic image was computed per the NAS paper; the bottom left inset shows what an idealized model of the data could or should look like, based on the reported voter turnout and vote share for the winner. This ideal model is allowed to have up to 3 Gaussian lobes based on the peak locations and standard deviations in the reported results. The top-left and bottom-right inset plots show the sum of the rows and columns of the fingerprint image. The top-left graph corresponds to the sum of the rows in the upper right image and is the histogram of the vote share for the winner across precincts. The bottom right graph shows the sum of the columns of the upper right image, and is the histogram of the percentage turnout across voting localities.
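The core of the fingerprint computation described above can be sketched in a few lines. This is a minimal illustration and not the code used to produce the figures: the input layout (registered voters, ballots cast, votes for the winner per reporting unit), the 1% bin width, and the function names are all assumptions, and the blurring filter and the Gaussian-lobe model are left out.

```python
from collections import Counter


def election_fingerprint(units, bin_width=1.0):
    """2D histogram of (% turnout, % vote share for winner).

    `units` holds one (registered, ballots_cast, winner_votes) tuple per
    locality or precinct. Returns a Counter keyed by
    (turnout_bin, share_bin); each value is the number of units in that cell.
    """
    hist = Counter()
    for registered, ballots_cast, winner_votes in units:
        if registered <= 0 or ballots_cast <= 0:
            continue  # no usable turnout or vote share for this unit
        turnout = 100.0 * ballots_cast / registered
        share = 100.0 * winner_votes / ballots_cast
        hist[(int(turnout // bin_width), int(share // bin_width))] += 1
    return hist


def marginals(hist):
    """Collapse the 2D histogram into its two 1D histograms."""
    turnout_hist, share_hist = Counter(), Counter()
    for (turnout_bin, share_bin), n in hist.items():
        turnout_hist[turnout_bin] += n  # histogram of % turnout
        share_hist[share_bin] += n      # histogram of % vote share for winner
    return turnout_hist, share_hist


# Three made-up reporting units, for illustration only.
example_units = [(1000, 300, 240), (2000, 610, 500), (1500, 450, 200)]
fingerprint = election_fingerprint(example_units)
turnout_hist, share_hist = marginals(fingerprint)
```

The two marginal histograms correspond to the row and column sums shown in the top-left and bottom-right insets.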

Figure 2 Democratic primary, accumulated per Locality:
Figure 3 Republican primary, accumulated per Locality:
Figure 4 Democratic primary, accumulated per Precinct:
Figure 5 Republican primary, accumulated per Precinct:

Analysis:

As can be seen in Figures 2 and 4, the Democratic primary fingerprint looks to fall within the expected normal distribution. Even though the total vote share for the winner (Biden) is up around 90%, this was not unexpected given the current set of contestants and the fact that Biden is the incumbent.

The Republican primary results, as shown in Figures 3 and 5, show significant “smearing” of the percent of total vote share for the winner. The percent of voter turnout (x-axis) does however show a near Gaussian distribution, which is what one would expect. The Republican primary data does not show the linear streaking pattern that the authors in [1] correlate with extreme fraud, but significant smearing of the distribution is observed.

One consideration that might partially explain this smearing of the histogram is that at least 17% of voters in the Republican primary were “crossover voters” who historically lean Democratic (see here for more information). Multiple news reports and exit polling suggest that this was due in part to loosely organized efforts by the opposing party to cast “Protest Votes” that artificially inflate the vote share of the challenger (Haley) and dilute the expected margin of victory for the winner (Trump), with no intention of supporting a Republican candidate in the General Election. (This is completely legal in VA, by the way, as VA does not require by-party voter registration.)

If we categorize each locality as being either Democratic or Republican leaning based on the average results of the last four presidential elections, and then split the computation of the per-precinct results into separate parts accordingly, we can see this phenomenon much more clearly.
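The split described above could be sketched as follows. The exact metric behind “average results” is not spelled out here, so this sketch assumes a Republican vote share (in percent) for each of the last four presidential elections and a 50% threshold; the field names and the locality labels are hypothetical.

```python
from statistics import mean


def classify_localities(rep_share_history):
    """Label each locality 'R' or 'D' from its average Republican share.

    `rep_share_history` maps locality name -> list of Republican vote
    shares (percent) in the last four presidential elections (assumed input).
    """
    return {
        locality: ("R" if mean(shares) > 50.0 else "D")
        for locality, shares in rep_share_history.items()
    }


def split_precincts(precincts, leaning):
    """Group precinct records by the leaning of their parent locality."""
    groups = {"R": [], "D": []}
    for precinct in precincts:
        groups[leaning[precinct["locality"]]].append(precinct)
    return groups


# Made-up values, for illustration only.
history = {
    "LOCALITY A": [55.0, 58.0, 60.0, 57.0],
    "LOCALITY B": [40.0, 42.0, 38.0, 45.0],
}
precincts = [
    {"locality": "LOCALITY A", "precinct": "101"},
    {"locality": "LOCALITY B", "precinct": "201"},
    {"locality": "LOCALITY B", "precinct": "202"},
]
leaning = classify_localities(history)
groups = split_precincts(precincts, leaning)
```

Each group can then be run through the same fingerprint computation separately.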

Figure 6 shows the per-precinct results for only those precincts that belong to historically Republican-leaning localities. It depicts a much tighter distribution and has much less smearing or blurring of the distribution tails. We can see from the data that the Republican base in historically Republican-leaning localities seems solidly behind candidate Trump.

Figure 7 shows the per-precinct results for only those precincts that belong to historically Democratic-leaning localities. Comparing the two plots, it can clearly be seen that the major contributor to the spread of the total Republican primary distribution is the votes from historically Democratic-leaning localities.

Figure 6 Republican primary, accumulated per Precinct in Republican leaning localities:
Figure 7 Republican primary, accumulated per Precinct in Democratic leaning localities:

References:

Categories
Election Data Analysis Election Forensics Election Integrity

2024 VA March Republican Primary Election DAL File Metrics

Below you will find the current summary data and graphics from the VA 2024 Republican Primary Election Daily Absentee List files. We pull the DAL file every day and track the count of each specific ballot category in each daily file.

Note: Page may take a moment to load the graphics objects.

Linear Scale Plot:

Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.

Logarithmic Scale Plot:

The logarithmic plot shows the same underlying data as the linear scale plot, except with a logarithmic y-scale that compresses the dynamic range so the shape of all of the data curves can be seen in a single graphic. Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.

Summary Data Table:

The underlying data for the graphics above is provided in the summary data table.

Additional Data:

Additional CSV datasets stratified by Locality, City, Congressional District, State House District, State Senate District, and Precinct are available here.

Data column descriptions:
  • “ISSUED” := Number of DAL file records where BALLOT_STATUS = “ISSUED”
  • “NOT_ISSUED” := Number of DAL file records where BALLOT_STATUS = “NOT ISSUED”
  • “PROVISIONAL” := Number of DAL file records where BALLOT_STATUS = “PROVISIONAL” and APP_STATUS = “APPROVED”
  • “DELETED” := Number of DAL file records where BALLOT_STATUS = “DELETED”
  • “MARKED” := Number of DAL file records where BALLOT_STATUS = “MARKED” and APP_STATUS = “APPROVED”
  • “ON_MACHINE” := Number of DAL file records where BALLOT_STATUS = “ON_MACHINE” and APP_STATUS = “APPROVED”
  • “PRE_PROCESSED” := Number of DAL file records where BALLOT_STATUS = “PRE-PROCESSED” and APP_STATUS = “APPROVED”
  • “FWAB” := Number of DAL file records where BALLOT_STATUS = “FWAB” and APP_STATUS = “APPROVED”
  • “MAIL_IN” := The sum of “MARKED” + “PRE_PROCESSED”
  • “COUNTABLE” := The sum of “PROVISIONAL” + “MARKED” + “PRE_PROCESSED” + “ON_MACHINE” + “FWAB”
  • “MILITARY” := Number of DAL file records where VOTER_TYPE = “MILITARY”
  • “OVERSEAS” := Number of DAL file records where VOTER_TYPE = “OVERSEAS”
  • “TEMPORARY” := Number of DAL file records where VOTER_TYPE = “TEMPORARY”
  • “MILITARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE = “MILITARY” and where COUNTABLE is True
  • “OVERSEAS_COUNTABLE” := Number of DAL file records where VOTER_TYPE = “OVERSEAS” and where COUNTABLE is True
  • “TEMPORARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE = “TEMPORARY” and where COUNTABLE is True
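The column definitions above translate fairly directly into a tally over the DAL records. The sketch below is one possible reading of those definitions, not the production code behind the plots: it assumes each record is a dict with BALLOT_STATUS, APP_STATUS and VOTER_TYPE keys, and it normalizes case, spaces and hyphens so that values such as “Pre-Processed” and “PRE_PROCESSED” are treated alike.

```python
from collections import Counter

NEEDS_APPROVAL = {"PROVISIONAL", "MARKED", "ON_MACHINE", "PRE_PROCESSED", "FWAB"}
NO_APPROVAL_NEEDED = {"ISSUED", "NOT_ISSUED", "DELETED"}
VOTER_TYPES = {"MILITARY", "OVERSEAS", "TEMPORARY"}


def normalize(value):
    """Upper-case and map spaces/hyphens to underscores (assumed convention)."""
    return value.strip().upper().replace(" ", "_").replace("-", "_")


def dal_metrics(records):
    """Count DAL records per summary column for one daily file."""
    counts = Counter()
    for rec in records:
        status = normalize(rec["BALLOT_STATUS"])
        approved = normalize(rec["APP_STATUS"]) == "APPROVED"
        voter_type = normalize(rec.get("VOTER_TYPE", ""))

        countable = status in NEEDS_APPROVAL and approved
        if countable or status in NO_APPROVAL_NEEDED:
            counts[status] += 1

        if voter_type in VOTER_TYPES:
            counts[voter_type] += 1
            if countable:
                counts[voter_type + "_COUNTABLE"] += 1

    counts["MAIL_IN"] = counts["MARKED"] + counts["PRE_PROCESSED"]
    counts["COUNTABLE"] = sum(counts[name] for name in NEEDS_APPROVAL)
    return counts


# Made-up records, for illustration only.
sample = [
    {"BALLOT_STATUS": "Marked", "APP_STATUS": "Approved", "VOTER_TYPE": "Military"},
    {"BALLOT_STATUS": "Pre-Processed", "APP_STATUS": "Approved", "VOTER_TYPE": ""},
    {"BALLOT_STATUS": "Issued", "APP_STATUS": "Approved", "VOTER_TYPE": "Overseas"},
    {"BALLOT_STATUS": "On Machine", "APP_STATUS": "Denied", "VOTER_TYPE": ""},
]
metrics = dal_metrics(sample)
```

Running this once per daily file and collecting the results by date gives the kind of time series shown in the plots.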

All data purchased by Electoral Process Education Corp. (EPEC) from the VA Dept of Elections (ELECT). All processing performed by EPEC.

If you like the work that EPEC is doing, please support us with a donation.

Categories
Election Data Analysis Election Forensics Election Integrity technical

2024 VA March Democratic Primary Election DAL File Metrics

Below you will find the current summary data and graphics from the VA 2024 Democratic Primary Election Daily Absentee List files. We pull the DAL file every day and track the count of each specific ballot category in each daily file.

Note: Page may take a moment to load the graphics objects.

Linear Scale Plot:

Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.

Logarithmic Scale Plot:

The logarithmic plot shows the same underlying data as the linear scale plot, except with a logarithmic y-scale that compresses the dynamic range so the shape of all of the data curves can be seen in a single graphic. Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.

Summary Data Table:

The underlying data for the graphics above is provided in the summary data table.

Additional Data:

Additional CSV datasets stratified by Locality, City, Congressional District, State House District, State Senate District, and Precinct are available here.

Data column descriptions:
  • “ISSUED” := Number of DAL file records where BALLOT_STATUS = “ISSUED”
  • “NOT_ISSUED” := Number of DAL file records where BALLOT_STATUS = “NOT ISSUED”
  • “PROVISIONAL” := Number of DAL file records where BALLOT_STATUS = “PROVISIONAL” and APP_STATUS = “APPROVED”
  • “DELETED” := Number of DAL file records where BALLOT_STATUS = “DELETED”
  • “MARKED” := Number of DAL file records where BALLOT_STATUS = “MARKED” and APP_STATUS = “APPROVED”
  • “ON_MACHINE” := Number of DAL file records where BALLOT_STATUS = “ON_MACHINE” and APP_STATUS = “APPROVED”
  • “PRE_PROCESSED” := Number of DAL file records where BALLOT_STATUS = “PRE-PROCESSED” and APP_STATUS = “APPROVED”
  • “FWAB” := Number of DAL file records where BALLOT_STATUS = “FWAB” and APP_STATUS = “APPROVED”
  • “MAIL_IN” := The sum of “MARKED” + “PRE_PROCESSED”
  • “COUNTABLE” := The sum of “PROVISIONAL” + “MARKED” + “PRE_PROCESSED” + “ON_MACHINE” + “FWAB”
  • “MILITARY” := Number of DAL file records where VOTER_TYPE = “MILITARY”
  • “OVERSEAS” := Number of DAL file records where VOTER_TYPE = “OVERSEAS”
  • “TEMPORARY” := Number of DAL file records where VOTER_TYPE = “TEMPORARY”
  • “MILITARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE = “MILITARY” and where COUNTABLE is True
  • “OVERSEAS_COUNTABLE” := Number of DAL file records where VOTER_TYPE = “OVERSEAS” and where COUNTABLE is True
  • “TEMPORARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE = “TEMPORARY” and where COUNTABLE is True

All data purchased by Electoral Process Education Corp. (EPEC) from the VA Dept of Elections (ELECT). All processing performed by EPEC.


Categories
Election Data Analysis Election Forensics Election Integrity programming technical Uncategorized

Records of Early Ballots Cast Do Not Have Corresponding Registration Records in VA 2023 General Election Data

Update (2023-12-14 12:00:00 EST): Special thanks to Rick Michael of the Chesterfield Electoral Board for checking their records on Issues #1 and #2 below. There were 3 x Issue #1 records and 9 x Issue #2 records identified in Chesterfield County.

According to Rick, the records in question were populated and visible when viewed through the electronic VERIS (the state’s election database) login available to the Registrar. The 3 x Issue #1 records can be found and are Active records in the electronic system, and the 9 x Issue #2 records had an update, moving the records from Inactive to Active, that was not reflected in the data supplied to us.

That implies that the data that we purchased (for approximately $12,000) directly from the department of elections is inaccurate and incomplete. Our initial purchase and download of the June 30 Registered Voter List (RVL) database does not show the registrants identified in Issue #1, even though the Registrar can see them in their electronic terminal. And the Monthly Update Subscription (MUS) data we receive is missing the updates that show the registrant records identified in Issue #2 being moved from Inactive to Active status.

The department of elections is required by federal law (NVRA, HAVA) to keep and maintain accurate election records AND to make those records accessible for inspection and verification, and for use by candidates and political parties. Additionally, we have paid (twice!) for this data: once as taxpayers, and once again as a 501c3 entity. If the data that we and other campaigns and candidates are receiving is incomplete, inaccurate and not representative of the actual records in the database … that needs to be addressed and fixed.

Summary:
  • Issue #1: There are 99 records of ballots cast, according to the VA Department of Elections (ELECT) Daily Absentee List (DAL) data file, that do not have a corresponding voter ID listed in the Registered Voter List (RVL) data.
  • Issue #2: There are 380 records of ballots cast in the DAL where the corresponding RVL record has been listed as “Inactive” since June-30-2023 and no modification to the RVL record has taken place.
  • Issue #3: There are 18 records of ballots cast in the DAL where the corresponding RVL record is listed as “Inactive” as of Dec-01-2023, but there have been previous modifications to the RVL record since June-30-2023.
  • We are currently attempting to contact the VA AG’s office to provide them the details of this analysis in order to have these anomalies further investigated.

Data files utilized for this analysis:

Our 501c3 EPEC purchased and downloaded the full statewide VA RVL on June-30-2023 from ELECT. We additionally purchased the Monthly Update Service (MUS) package from ELECT, where on the 1st of each month we are provided a list of all of the changes that have occurred to the RVL in the previous month. By applying these changes to our baseline data file, we are able to update our copy of the RVL to reflect the latest state as per ELECT. We can also create a cumulative record of all entries associated with a particular voter ID by simply concatenating all of these datafiles.
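The bookkeeping described above, keeping every record per voter ID while also tracking the latest state, can be sketched as below. The record layout (`voter_id`, `status`) and the file labels are hypothetical stand-ins; the actual RVL and MUS files have their own column names.

```python
from collections import defaultdict


def build_cumulative(baseline, monthly_updates):
    """Concatenate the baseline RVL and all MUS files per voter ID.

    `baseline` is a list of record dicts; `monthly_updates` is a list of
    (file_label, records) pairs in chronological order. Each stored record
    is tagged with the file it came from.
    """
    history = defaultdict(list)
    for rec in baseline:
        history[rec["voter_id"]].append({**rec, "file_source": "Baseline RVL"})
    for file_label, records in monthly_updates:
        for rec in records:
            history[rec["voter_id"]].append({**rec, "file_source": file_label})
    return history


def latest_state(history):
    """Most recent record per voter ID, i.e. the updated copy of the RVL."""
    return {voter_id: recs[-1] for voter_id, recs in history.items()}


# Made-up records, for illustration only.
baseline = [
    {"voter_id": "A1", "status": "Active"},
    {"voter_id": "B2", "status": "Inactive"},
]
updates = [
    ("MUS 08/01/2023", [{"voter_id": "A1", "status": "Inactive"}]),
    ("MUS 09/01/2023", [{"voter_id": "C3", "status": "Active"}]),
]
history = build_cumulative(baseline, updates)
current = latest_state(history)
```

Keeping the source file label on every record is what later makes it possible to say when a given change first appeared in the data we received.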

Additionally, during the VA 2023 General Election, we purchased access to the Daily Absentee List (DAL) file generated by ELECT that documents all of the transactions associated with early mail-in or in-person voting during the 45-day early voting period. The DAL file we utilized for this analysis was downloaded from ELECT on Nov-13-2023 at 6am EST.

Identification of ballots cast via the DAL file can be performed by checking for rows of the DAL data table that have the APP_STATUS field set to “Approved” and have the BALLOT_STATUS field set to any of the following: “Marked” | “Pre-Processed” | “On Machine” | “FWAB” | “Provisional”.

Once cast ballots are identified in the DAL, the Voter Identification Number can be used to look up all of the corresponding records in our cumulative RVL data. The data issues summarized above can be directly observed using this process. Due to VA law, we cannot publish the full specific records here in this blog, but we have summarized and described our process and results.
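The identification and lookup steps can be sketched as follows. This is an illustration of the logic rather than the code used for the analysis: the `voter_id` and `status` field names are hypothetical, and the cumulative RVL is assumed to be a dict mapping each voter ID to its list of records in chronological order, with the baseline record first.

```python
CAST_STATUSES = {"MARKED", "PRE-PROCESSED", "ON MACHINE", "FWAB", "PROVISIONAL"}


def is_cast(dal_row):
    """True when the DAL row represents an approved, cast ballot."""
    app_status = dal_row["APP_STATUS"].strip().upper()
    ballot_status = dal_row["BALLOT_STATUS"].strip().upper().replace("_", " ")
    return app_status == "APPROVED" and ballot_status in CAST_STATUSES


def classify_cast_ballots(dal_rows, rvl_history):
    """Sort cast ballots into the three issue categories summarized above."""
    issues = {"no_rvl_record": [], "inactive_unmodified": [], "inactive_modified": []}
    for row in dal_rows:
        if not is_cast(row):
            continue
        voter_id = row["voter_id"]
        records = rvl_history.get(voter_id)
        if not records:
            issues["no_rvl_record"].append(voter_id)          # Issue #1
        elif records[-1]["status"] == "Inactive":
            if len(records) == 1:
                issues["inactive_unmodified"].append(voter_id)  # Issue #2
            else:
                issues["inactive_modified"].append(voter_id)    # Issue #3
    return issues


# Made-up rows, for illustration only.
dal_rows = [
    {"voter_id": "X1", "APP_STATUS": "Approved", "BALLOT_STATUS": "Marked"},
    {"voter_id": "X2", "APP_STATUS": "Approved", "BALLOT_STATUS": "On Machine"},
    {"voter_id": "X3", "APP_STATUS": "Approved", "BALLOT_STATUS": "Pre-Processed"},
    {"voter_id": "X4", "APP_STATUS": "Approved", "BALLOT_STATUS": "Issued"},
    {"voter_id": "X5", "APP_STATUS": "Approved", "BALLOT_STATUS": "FWAB"},
]
rvl_history = {
    "X2": [{"status": "Inactive"}],
    "X3": [{"status": "Active"}, {"status": "Inactive"}],
    "X5": [{"status": "Active"}],
}
issues = classify_cast_ballots(dal_rows, rvl_history)
```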

For Issue #1: If a corresponding registration record does not exist for a cast ballot, then the voter should not have been able to have their mail-in ballot approved or issued, or to check in to an early voting precinct to vote on-machine. If the voter record actually does exist, then why is it not reflected in the data that we purchased from ELECT? Note that all provisional and Same Day Registration (SDR) ballots were required to be entered into the state’s database (“VERIS”) by the Friday after the election (Fri Nov-10-2023). We specifically waited until we received the Dec-01-2023 MUS data update from ELECT before performing this analysis, in order to ensure that we would not be missing any last-minute registrations or RVL updates.

For Issue #2: There are 380 records of ballots being cast in the DAL where the baseline June-30-2023 RVL data file shows the registrant as inactive, and there have been no modifications or adjustments to the record presented in the MUS data files. Therefore these registrants should still have been listed as “Inactive” during the early voting period, which started in September and ran through Election Day (Nov 7).

For Issue #3: There are 18 records that show the cast ballot is from a registrant that is currently listed as “Inactive”, but there have been adjustments to the registration record over the last 6 months. An example is given below. Note that I have captured the MUS data file generation dates in the 5th column to note when the file was generated and received by us.

In the example given below, the first inactivation operation on the registration record appears in the MUS file dated Sep-01, with the earliest transaction date listed as Aug-29-2023. The ballot application was not received until Sept 26 according to the DAL, so the application should never have been approved or the ballot issued, as the registrant status should have been “Inactive” according to the state’s own data.

(Also … yes … I know there is a typo in the spelling of “APP_RECIEPT_DATE” in the tables below … but this is the data as it comes from ELECT).

APP_RECIEPT_DATE | APP_STATUS | BALLOT_RECEIPT_DATE | BALLOT_STATUS
“2023-09-26 00:00:00” | Approved | “2023-10-19 00:00:00” | Pre-Processed
Example Extract of a DAL data record for a mail-in ballot cast during early voting.

TRANSACTIONDATE | TRANSACTIONTIME | Trans_Type | NVRAReasonCode | File Source
30-June-2023 | 12:12:00 | BASELINE | BASELINE | Baseline RVL
28-Jul-2023 | 09:34:03 | MODIFY | Change Out | MUS 08/01/2023
28-Jul-2023 | 09:34:04 | MODIFY | Address Change | MUS 08/01/2023
28-Jul-2023 | 09:34:04 | MODIFY | Change In | MUS 09/01/2023
28-Jul-2023 | 09:34:03 | MODIFY | Change Out | MUS 09/01/2023
28-Jul-2023 | 09:34:04 | MODIFY | Address Change | MUS 09/01/2023
28-Jul-2023 | 09:34:04 | MODIFY | Change In | MUS 09/01/2023
29-Aug-2023 | 11:55:49 | MODIFY | Inactivate | MUS 09/01/2023
29-Aug-2023 | 11:55:49 | MODIFY | Inactivate | MUS 10/01/2023
Extract of RVL Cumulative Data Records for Voter ID in above DAL entry
Categories
Election Data Analysis Election Forensics Election Integrity technical

2023 VA General Election DAL File Metrics

Below you will find the current summary data and graphics from the VA 2023 General Election Daily Absentee List files. We pull the DAL file every day and track the count of each specific ballot category in each daily file.

Note: Page may take a moment to load the graphics objects.

Linear Scale Plot:

Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.

Logarithmic Scale Plot:

The logarithmic plot shows the same underlying data as the linear scale plot, except with a logarithmic y-scale that compresses the dynamic range so the shape of all of the data curves can be seen in a single graphic. Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.

Summary Data Table:

The underlying data for the graphics above is provided in the summary data table.

Additional Data:

Additional CSV datasets stratified by Locality, City, Congressional District, State House District, State Senate District, and Precinct are available here.

Data column descriptions:
  • “ISSUED” := Number of DAL file records where BALLOT_STATUS = “ISSUED”
  • “NOT_ISSUED” := Number of DAL file records where BALLOT_STATUS = “NOT ISSUED”
  • “PROVISIONAL” := Number of DAL file records where BALLOT_STATUS = “PROVISIONAL” and APP_STATUS = “APPROVED”
  • “DELETED” := Number of DAL file records where BALLOT_STATUS = “DELETED”
  • “MARKED” := Number of DAL file records where BALLOT_STATUS = “MARKED” and APP_STATUS = “APPROVED”
  • “ON_MACHINE” := Number of DAL file records where BALLOT_STATUS = “ON_MACHINE” and APP_STATUS = “APPROVED”
  • “PRE_PROCESSED” := Number of DAL file records where BALLOT_STATUS = “PRE-PROCESSED” and APP_STATUS = “APPROVED”
  • “FWAB” := Number of DAL file records where BALLOT_STATUS = “FWAB” and APP_STATUS = “APPROVED”
  • “MAIL_IN” := The sum of “MARKED” + “PRE_PROCESSED”
  • “COUNTABLE” := The sum of “PROVISIONAL” + “MARKED” + “PRE_PROCESSED” + “ON_MACHINE” + “FWAB”
  • “MILITARY” := Number of DAL file records where VOTER_TYPE = “MILITARY”
  • “OVERSEAS” := Number of DAL file records where VOTER_TYPE = “OVERSEAS”
  • “TEMPORARY” := Number of DAL file records where VOTER_TYPE = “TEMPORARY”
  • “MILITARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE = “MILITARY” and where COUNTABLE is True
  • “OVERSEAS_COUNTABLE” := Number of DAL file records where VOTER_TYPE = “OVERSEAS” and where COUNTABLE is True
  • “TEMPORARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE = “TEMPORARY” and where COUNTABLE is True

Editorial Note:

When we first started receiving DAL data files on Sept-14-2023 after our purchase from ELECT, we noticed that there were a dozen or so records marked as being in a countable state, including records that should correspond to In-Person-Early votes. This is problematic, as in-person early voting facilities did not open until Sept-22-2023. We respectfully raised this issue with the dept of elections, and they acknowledged the error and directed registrars to correct the issues before the official start of early voting. (This can be observed in the logarithmic plot, with the Overseas_Countable, On-Machine, Provisional, and Pre-Processed counts being reset to 0 on 9-21-2023.) We’d like to take a moment to thank the folks at ELECT and the registrars for listening to our concerns and correcting these errors before the start of early voting. Credit where credit is due.


All data purchased by Electoral Process Education Corp. (EPEC) from the VA Dept of Elections (ELECT). All processing performed by EPEC.


Categories
Election Data Analysis Election Forensics Election Integrity Interesting mathematics technical

VA voter registrations greater than census determined eligible voters

One of our volunteers brought to our attention the following press release: https://www.honestelections.org/honest-elections-project-supports-new-pre-litigation-notices-against-arizona-and-virginia/. They claim that “… 43 counties and independent cities in Virginia and four counties in Arizona claim to have more voters than voting-age adult citizens.”

Attempt to directly validate the claim

After reading through the press release, we decided to try to independently verify its claims. Note that an analysis like this has been on our list of things-to-do, but there are only so many hours in the day! The fact that this press release was issued gave us a well-deserved prod to complete this analysis.

EPEC has purchased the entire statewide registered voter list data from the VA Department of Elections (ELECT) and has current records as of 2023-08-01. Eligible parties can purchase data from ELECT via their website here.

The necessary data from the US Census Bureau can be downloaded here and includes the estimates of the eligible voting age citizens in each county. From the documentation on the census site, the “cvap_est” field in the census data represents “The rounded estimate of the total number of United States citizens 18 years of age or older for that geographic area and group.”

It is therefore a straightforward process to accumulate the number of registrant records in each county, as well as accumulating the number of eligible voting age citizens and compute the registration percent “REG_PCT” as (# Registered / # Eligible * 100). The below table has the results of this direct computation for each county.
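As a minimal sketch of this computation (the function name and input layout are assumptions for illustration), the registrant records can be tallied per locality and divided by the census estimate. The Goochland County figures used in the example are the N_REG and N_ELIGIBLE values from the results table.

```python
from collections import Counter


def registration_percent(registrant_localities, eligible_by_locality):
    """REG_PCT = (# Registered / # Eligible) * 100 for each locality.

    `registrant_localities` yields one locality name per registrant record;
    `eligible_by_locality` maps locality name -> cvap_est from the census data.
    """
    n_registered = Counter(registrant_localities)
    return {
        locality: 100.0 * n_registered[locality] / n_eligible
        for locality, n_eligible in eligible_by_locality.items()
        if n_eligible > 0
    }


# One list entry per registrant record; 22032 records for Goochland County.
records = ["GOOCHLAND COUNTY"] * 22032
reg_pct = registration_percent(records, {"GOOCHLAND COUNTY": 19840})
print(round(reg_pct["GOOCHLAND COUNTY"], 2))
```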

The results are only slightly different than the results presented by Honest Elections Project, but still show significant issues with 38 counties being over 100%.

Adjusting for population growth since 2020 census

As the census redistricting data is circa 2020, and the eligible voter data was estimated for 2021, we can attempt to account for population shifts since the 2020 census data was collected and the voter eligibility data was computed. The US Census bureau also makes available the estimates of population growth by county year-over-year since the date of the last census here, which we can use to find the recent rates of growth or decline for each county. We can then use these rates to adjust the number of eligible voter estimates to scale with the most recent rates of population change. This is admittedly an approximation and assumes a linear relationship, but it is arguably better than taking the 2020 census and 2021 eligible voter estimates and applying them directly to the latest (2023) RVL.
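One way to express this adjustment is shown below. It is a sketch under stated assumptions and not necessarily the exact scaling used for the table: it applies a constant annual growth rate linearly over the number of years between the eligibility estimate and the RVL, and the rate and counts in the example are made up.

```python
def adjust_eligible(n_eligible, annual_growth_rate, years):
    """Scale the eligible-voter estimate by a linear growth assumption.

    `annual_growth_rate` is a fraction (0.01 means +1% per year, negative
    values mean decline); `years` is the gap between the estimate and the RVL.
    """
    return n_eligible * (1.0 + annual_growth_rate * years)


def registration_percent_adjusted(n_registered, n_eligible, annual_growth_rate, years):
    """REG_PCT_ADJ: registrations as a percent of the adjusted estimate."""
    return 100.0 * n_registered / adjust_eligible(n_eligible, annual_growth_rate, years)


# Made-up locality: 10,000 eligible in 2021, growing 1% per year, 2023 RVL.
n_eligible_adj = adjust_eligible(10000, 0.01, 2)
reg_pct_adj = registration_percent_adjusted(10500, 10000, 0.01, 2)
```

A growing locality gets a larger denominator and therefore a lower adjusted percentage; a shrinking one moves the other way.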

The REG_PCT_ADJ column in the table below represents this adjusted estimate.

Active vs inactive registrations

An additional refinement is to consider only “Active” voter registrations rather than registrations with any status assigned. Note that “Inactive” voter registrations can be immediately returned to “Active” status by simply having any type of interaction with the department of elections (or through DMV, etc), and the registrant will then be allowed to vote. Because of this easy ability to change “Inactive” records to “Active”, it is most appropriate (IMO) to include them in this analysis. However, for completeness, and in order to bound the scope of the issue, the corresponding REG_PCT_ACTIVE and REG_PCT_ADJ_ACTIVE columns have also been computed, which only consider “Active” voters.

Results

Even the most forgiving analysis we could compute with the official data from US Census and VA ELECT, which only considers active voters and attempts to adjust for population change since the census, still results in multiple (6) counties in VA having more registered voters than eligible voters (over 100%), and many counties that had over 90%.

The most appropriate metric to consider, in my opinion, is the Adjusted result that includes both Active and Inactive registrations, as inactive registrations can still be converted to active status and used to vote. There were 36 localities with over 100% in this category and 59 between 90% and 100%. There are 133 voting localities in total in VA.

The summary tabulated data and graphics for each of the methods of analyzing the data is presented below.

Tabulated Data Results:
LOCALITY_NAME | N_REG | N_REG_ACTIVE | N_ELIGIBLE | N_ELIGIBLE_ADJ | REG_PCT | REG_PCT_ADJ | REG_PCT_ACTIVE | REG_PCT_ADJ_ACTIVE
GOOCHLAND COUNTY | 22032 | 21308 | 19840 | 20314.622534217 | 111.048387096774 | 108.453897988458 | 107.399193548387 | 104.889962705976
LOUDOUN COUNTY | 291468 | 272527 | 259775 | 261988.121593707 | 112.200173226831 | 111.252372140753 | 104.908863439515 | 104.022655050994
FAIRFAX CITY | 17867 | 16578 | 15890 | 16179.2525931696 | 112.441787287602 | 110.431553603056 | 104.32976714915 | 102.464560118177
SURRY COUNTY | 5635 | 5449 | 5325 | 5327.44865113427 | 105.821596244131 | 105.772957545076 | 102.328638497653 | 102.281605264085
NEW KENT COUNTY | 19258 | 18686 | 17545 | 18281.8036615372 | 109.763465374751 | 105.339715689632 | 106.503277286976 | 102.210921558649
KING WILLIAM COUNTY | 14025 | 13592 | 13380 | 13590.9343586927 | 104.820627802691 | 103.193788078519 | 101.584454409567 | 100.007840824473
MATHEWS COUNTY | 7353 | 7092 | 7145 | 7099.01111761264 | 102.911126662001 | 103.577806516702 | 99.25822253324 | 99.9012381091326
NORTHAMPTON COUNTY | 9801 | 9306 | 9435 | 9326.0652878146 | 103.879173290938 | 105.092551869714 | 98.6327503974563 | 99.7848472298299
HANOVER COUNTY | 86583 | 83800 | 83280 | 84111.898838322 | 103.96613832853 | 102.937873470706 | 100.624399615754 | 99.6291858314583
RAPPAHANNOCK COUNTY | 6194 | 5952 | 5945 | 5996.95979561651 | 104.188393608074 | 103.285668256898 | 100.117746005046 | 99.250290194552
JAMES CITY COUNTY | 63840 | 60418 | 60100 | 61066.3951247591 | 106.222961730449 | 104.541949577299 | 100.529118136439 | 98.9382128690675
FAUQUIER COUNTY | 56031 | 52805 | 52965 | 53424.3705925266 | 105.788728405551 | 104.879101763041 | 99.6979137166053 | 98.840659074394
FALLS CHURCH CITY | 11130 | 10253 | 10345 | 10373.4476832119 | 107.588206863219 | 107.293161732647 | 99.1106814886419 | 98.8388847479627
ACCOMACK COUNTY | 25327 | 24192 | 24650 | 24522.1840906366 | 102.74645030426 | 103.28199114071 | 98.1419878296146 | 98.6535290273647
CRAIG COUNTY | 3917 | 3747 | 3830 | 3807.2210828548 | 102.271540469974 | 102.883439515493 | 97.8328981723238 | 98.4182404555912
CHARLES CITY COUNTY | 5728 | 5584 | 5700 | 5677.65042979943 | 100.491228070175 | 100.886802927075 | 97.9649122807018 | 98.3505425182942
ISLE OF WIGHT COUNTY | 31039 | 29665 | 29710 | 30281.6797400553 | 104.473241332885 | 102.50091892671 | 99.8485358465163 | 97.9635220194221
SPOTSYLVANIA COUNTY | 105580 | 100062 | 100440 | 102345.441485999 | 105.117483074472 | 103.160432420865 | 99.6236559139785 | 97.7688879418126
LANCASTER COUNTY | 9201 | 8736 | 9065 | 8943.53432452276 | 101.50027578599 | 102.878791159456 | 96.3706563706564 | 97.6795043548532
CLARKE COUNTY | 12213 | 11489 | 11610 | 11882.4255832663 | 105.193798449612 | 102.782044915133 | 98.9577950043066 | 96.6890128576076
POWHATAN COUNTY | 24346 | 23624 | 24195 | 24441.048216348 | 100.62409588758 | 99.6111123569388 | 97.6400082661707 | 96.6570655680737
LOUISA COUNTY | 29799 | 28775 | 29005 | 29810.5293092847 | 102.737459058783 | 99.9613247079075 | 99.2070332701258 | 96.5262968042565
FAIRFAX COUNTY | 784080 | 721354 | 749510 | 747334.300776511 | 104.612346733199 | 104.916902540845 | 96.2434123627437 | 96.523603861148
HENRICO COUNTY | 239597 | 227831 | 236070 | 236137.172406976 | 101.494048375482 | 101.465177023065 | 96.5099334943025 | 96.4824799406588
BEDFORD COUNTY | 62932 | 60599 | 62435 | 62871.2603534819 | 100.796027868984 | 100.096609557653 | 97.059341715384 | 96.3858520718266
NELSON COUNTY | 11827 | 11384 | 11955 | 11835.45 | 98.9293182768716 | 99.9286043200723 | 95.2237557507319 | 96.1856118694262
FREDERICK COUNTY | 68222 | 63868 | 65715 | 66439.8543302061 | 103.814958533059 | 102.682344336483 | 97.189378376322 | 96.1290488124432
NORTHUMBERLAND COUNTY | 10395 | 9904 | 10090 | 10304.4313465051 | 103.022794846383 | 100.878929175705 | 98.1565906838454 | 96.1139888942937
POQUOSON CITY | 9547 | 9109 | 9500 | 9479.6573875803 | 100.494736842105 | 100.710390783827 | 95.8842105263158 | 96.0899706347414
ORANGE COUNTY | 28633 | 27351 | 27865 | 28465.9231224287 | 102.756145702494 | 100.586936446265 | 98.1553920689037 | 96.0833059316801
PRINCE WILLIAM COUNTY | 317403 | 289258 | 300255 | 301330.510742882 | 105.711145526303 | 105.333840644777 | 96.3374465038051 | 95.9935982874356
CHESTERFIELD COUNTY | 268258 | 253920 | 259725 | 264714.168974025 | 103.285398017134 | 101.33873870058 | 97.7649436904418 | 95.9223304835323
STAFFORD COUNTY | 111503 | 103640 | 106940 | 108128.634023171 | 104.266878623527 | 103.120696018509 | 96.9141574714793 | 95.8488016946475
AMELIA COUNTY | 10180 | 9903 | 10280 | 10370.9529879283 | 99.0272373540856 | 98.1587710584496 | 96.3326848249027 | 95.4878496848553
BOTETOURT COUNTY | 26424 | 25671 | 26910 | 27035.1370044442 | 98.1939799331104 | 97.7394713984852 | 95.3957636566332 | 94.9542071703948
GREENE COUNTY | 14956 | 14432 | 15215 | 15272.1611660643 | 98.2977325008216 | 97.9298203926316 | 94.8537627341439 | 94.4987408335423
ALBEMARLE COUNTY | 84004 | 79107 | 83265 | 83844.7847760722 | 100.887527772774 | 100.189892817249 | 95.0063051702396 | 94.3493387349904
CULPEPER COUNTY | 37104 | 35479 | 37200 | 37612.2190201729 | 99.741935483871 | 98.648792776889 | 95.3736559139785 | 94.3283882851241
EMPORIA CITY | 3987 | 3710 | 4035 | 3933.80202774813 | 98.8104089219331 | 101.352329676903 | 91.9454770755886 | 94.3107958618791
KING AND QUEEN COUNTY | 5385 | 5157 | 5430 | 5470.71685662867 | 99.171270718232 | 98.433169566712 | 94.9723756906077 | 94.2655256184836
GLOUCESTER COUNTY | 30265 | 29068 | 30580 | 30862.8948915182 | 98.9699149771092 | 98.0627387883742 | 95.0555918901243 | 94.1842950966615
WESTMORELAND COUNTY | 14199 | 13604 | 14470 | 14454.5505018151 | 98.1271596406358 | 98.2320411708203 | 94.015203870076 | 94.1156904069188
WARREN COUNTY | 30580 | 28871 | 30360 | 30718.0311057939 | 100.724637681159 | 99.5506511946729 | 95.0955204216074 | 93.9871435788555
MADISON COUNTY | 10409 | 10065 | 10770 | 10808.6021505376 | 96.6480965645311 | 96.3029247910864 | 93.4540389972145 | 93.1202745722244
FLUVANNA COUNTY | 20994 | 20197 | 21415 | 21693.8872899953 | 98.0340882558954 | 96.7738041567216 | 94.3123978519729 | 93.0999582048827
KING GEORGE COUNTY | 19822 | 18577 | 19745 | 19984.620303757 | 100.389972144847 | 99.1862727373088 | 94.084578374272 | 92.9564821229435
APPOMATTOX COUNTY | 12256 | 11745 | 12415 | 12679.9865837297 | 98.7192911800242 | 96.6562536882037 | 94.6033024567056 | 92.6262809699701
ROANOKE COUNTY | 73141 | 69136 | 74800 | 74926.0183357278 | 97.7820855614973 | 97.6176255253155 | 92.427807486631 | 92.2723528297154
SUFFOLK CITY | 71421 | 65488 | 69495 | 71088.6638879661 | 102.77142240449 | 100.467495228997 | 94.2341175624146 | 92.1215794732158
HIGHLAND COUNTY | 1879 | 1815 | 1920 | 1971.40562248996 | 97.8645833333333 | 95.3127037157757 | 94.53125 | 92.0662891134289
ARLINGTON COUNTY | 175053 | 155984 | 169220 | 169528.629042616 | 103.446992081314 | 103.258665505987 | 92.178229523697 | 92.0104178750769
YORK COUNTY | 50591 | 47465 | 51590 | 51709.595790716 | 98.0635782128319 | 97.8367732843179 | 92.0042643923241 | 91.7914736601402
FLOYD COUNTY | 11879 | 11477 | 12425 | 12513.1262492746 | 95.6056338028169 | 94.9323115851135 | 92.3702213279678 | 91.7196851639319
SHENANDOAH COUNTY | 32321 | 30958 | 33660 | 33764.3686006826 | 96.0219845513963 | 95.7251722436965 | 91.9726678550208 | 91.6883723375006
CUMBERLAND COUNTY | 7381 | 7134 | 7750 | 7789.9649339934 | 95.2387096774194 | 94.750105585087 | 92.0516129032258 | 91.579359604933
CAROLINE COUNTY | 23033 | 21805 | 23400 | 23813.5723839246 | 98.4316239316239 | 96.7221533529698 | 93.1837606837607 | 91.565430202818
ESSEX COUNTY | 8291 | 7901 | 8610 | 8673.64480667172 | 96.2950058072009 | 95.5884196874491 | 91.7653890824623 | 91.0920400374546
MIDDLESEX COUNTY | 8708 | 8308 | 8995 | 9137.79103230598 | 96.8093385214008 | 95.2965543774586 | 92.3624235686492 | 90.9191288203865
DINWIDDIE COUNTY | 20876 | 19932 | 21840 | 21966.3645130183 | 95.5860805860806 | 95.0362085980494 | 91.2637362637363 | 90.7387291519602
FRANKLIN CITY | 5877 | 5564 | 6060 | 6148.72293307087 | 96.980198019802 | 95.5808232696678 | 91.8151815181518 | 90.4903353194541
CHESAPEAKE CITY | 176334 | 164440 | 181540 | 182080.136649466 | 97.1323124380302 | 96.8441716075112 | 90.5805883000992 | 90.3118830125735
CHARLOTTE COUNTY | 8391 | 8099 | 9005 | 8968.26722791182 | 93.1815657967796 | 93.5632244976465 | 89.9389228206552 | 90.3073001080252
CAMPBELL COUNTY | 40938 | 39080 | 43505 | 43321.1594582393 | 94.0995287897943 | 94.4988557830808 | 89.828755315481 | 90.2099585715667
MECKLENBURG COUNTY | 22837 | 21907 | 24125 | 24290.6105610561 | 94.6611398963731 | 94.0157512409894 | 90.8062176165803 | 90.1871113734884
BATH COUNTY | 3353 | 3207 | 3590 | 3556.62099339369 | 93.3983286908078 | 94.2748751196175 | 89.3314763231198 | 90.1698552068635
HALIFAX COUNTY | 25042 | 24178 | 27030 | 26938.7203033355 | 92.645209027007 | 92.9591298993492 | 89.44876063633 | 89.7518505992518
WYTHE COUNTY | 20849 | 20188 | 22580 | 22508.7368794326 | 92.333923826395 | 92.6262549146007 | 89.406554472985 | 89.6896174500436
VIRGINIA BEACH CITY | 328860 | 303604 | 342075 | 339791.417436993 | 96.1368121026091 | 96.7829036061454 | 88.7536358985603 | 89.3501084547837
RUSSELL COUNTY | 19126 | 18589 | 20935 | 20829.4123626696 | 91.3589682350131 | 91.8220815210206 | 88.7938858371149 | 89.2439963083892
ALEXANDRIA CITY | 111216 | 97404 | 109245 | 109471.650837935 | 101.804201565289 | 101.593425465601 | 89.1610600027461 | 88.976460347894
SCOTT COUNTY | 15973 | 15535 | 17470 | 17465.1205660553 | 91.4310246136234 | 91.4565687627983 | 88.9238694905552 | 88.9487131866319
PITTSYLVANIA COUNTY | 44900 | 43260 | 48845 | 48732.8247628557 | 91.9234312621558 | 92.1350244285919 | 88.565871634763 | 88.7697362312001
PAGE COUNTY | 17104 | 16549 | 18785 | 18744.7481198269 | 91.0513707745542 | 91.246891612849 | 88.0968858131488 | 88.286062283737
CARROLL COUNTY | 21244 | 20645 | 23330 | 23390.9913659661 | 91.0587226746678 | 90.8212895623998 | 88.4912130304329 | 88.2604746288714
FRANKLIN COUNTY | 39895 | 38563 | 43750 | 43800.8998363934 | 91.1885714285714 | 91.0826036657173 | 88.144 | 88.0415702509351
GILES COUNTY | 12058 | 11549 | 13255 | 13150.2963699952 | 90.9694454922671 | 91.6937509295421 | 87.1293851376839 | 87.8231157310733
COLONIAL HEIGHTS CITY | 12913 | 11965 | 13585 | 13641.6725216819 | 95.0533676849466 | 94.6584810585087 | 88.0750828119249 | 87.7091865457335
SOUTHAMPTON COUNTY | 13173 | 12667 | 14545 | 14460.3282142263 | 90.5672052251633 | 91.0975173235707 | 87.0883465108285 | 87.5982883122804
HOPEWELL CITY | 15555 | 14490 | 16780 | 16647.3259883344 | 92.699642431466 | 93.4384297568279 | 86.3528009535161 | 87.0410059258396
AMHERST COUNTY | 22734 | 21897 | 25030 | 25187.8777356567 | 90.8270075908909 | 90.2577034817708 | 87.4830203755493 | 86.9346763939621
PATRICK COUNTY | 12896 | 12527 | 14475 | 14454.5180552411 | 89.0915371329879 | 89.217779179597 | 86.5423143350604 | 86.664944151893
MANASSAS CITY | 23663 | 21629 | 25055 | 24982.9372150123 | 94.4442227100379 | 94.7166451900655 | 86.3260826182399 | 86.5750884848044
PORTSMOUTH CITY | 68285 | 63085 | 73500 | 73041.5565660911 | 92.9047619047619 | 93.4878762314065 | 85.8299319727891 | 86.3686413130011
AUGUSTA COUNTY | 54910 | 53003 | 61185 | 61409.985342899 | 89.7442183541718 | 89.4154259985496 | 86.627441366348 | 86.310067823732
ROCKINGHAM COUNTY | 56871 | 54263 | 62240 | 62923.0410796733 | 91.3737146529563 | 90.3818363260444 | 87.1834832904884 | 86.2370906887544
ALLEGHANY COUNTY | 10965 | 10395 | 12265 | 12110.7078674121 | 89.4007337953526 | 90.5397117992166 | 84.7533632286996 | 85.8331330736759
WASHINGTON COUNTY | 39282 | 37831 | 44075 | 44210.1918465228 | 89.125354509359 | 88.8528150621215 | 85.8332387975043 | 85.5707664226648
WAYNESBORO CITY | 15564 | 14292 | 16550 | 16705.2752699593 | 94.0422960725076 | 93.1681744148711 | 86.356495468278 | 85.5538132059456
HAMPTON CITY | 99929 | 89709 | 105410 | 105555.290793416 | 94.8003035765108 | 94.6698164051036 | 85.1048287638744 | 84.9876868565225
ROCKBRIDGE COUNTY | 16203 | 15651 | 18455 | 18417.5014355758 | 87.7973448929829 | 87.9761028208844 | 84.806285559469 | 84.9789536042499
HENRY COUNTY | 36321 | 34247 | 40650 | 40334.7960076348 | 89.3505535055351 | 90.0488005272791 | 84.2484624846248 | 84.9068382384221
FREDERICKSBURG CITY | 19199 | 17549 | 20570 | 20786.1230585424 | 93.3349538162372 | 92.3645065793539 | 85.3135634419057 | 84.4265183583041
BUCHANAN COUNTY | 14607 | 14009 | 17030 | 16593.5531947032 | 85.7721667645332 | 88.0281626762295 | 82.2607163828538 | 84.4243534559663
CHARLOTTESVILLE CITY | 33751 | 30439 | 36565 | 36279.5483271375 | 92.3041159578832 | 93.0303753940449 | 83.2462737590592 | 83.9012650475344
MARTINSVILLE CITY | 9035 | 8269 | 9780 | 9907.04111004502 | 92.3824130879346 | 91.19776429351 | 84.5501022494888 | 83.4658896450509
STAUNTON CITY | 18003 | 16879 | 20050 | 20246.1778349511 | 89.7905236907731 | 88.9204873471048 | 84.1845386533666 | 83.3688221925113
DICKENSON COUNTY | 10071 | 9453 | 11470 | 11364.839012417 | 87.8029642545771 | 88.6154215558763 | 82.4149956408021 | 83.1775970576605
LUNENBURG COUNTY | 8036 | 7764 | 9320 | 9346.41326998416 | 86.2231759656652 | 85.9795064466866 | 83.3047210300429 | 83.069299160288
PULASKI COUNTY | 23605 | 22821 | 27645 | 27561.5940014198 | 85.3861457768132 | 85.6445385516674 | 82.5501899077591 | 82.8000006052786
LEE COUNTY | 15524 | 14913 | 18140 | 18075.0410226191 | 85.5788313120176 | 85.8863887532718 | 82.2105843439912 | 82.5060368125189
BRISTOL CITY | 12449 | 11090 | 13545 | 13449.9195671249 | 91.9084533038021 | 92.5581743286302 | 81.87523071244 | 82.4540246850758
SMYTH COUNTY | 20124 | 19429 | 23735 | 23613.9194256757 | 84.7861807457342 | 85.2209226144769 | 81.8580155887929 | 82.277743265587
GRAYSON COUNTY | 10885 | 10488 | 12740 | 12750.803652968 | 85.4395604395604 | 85.3671681899538 | 82.3233908948195 | 82.253638950504
BLAND COUNTY | 4624 | 4319 | 5285 | 5258.4851917786 | 87.4929044465468 | 87.9340690590782 | 81.7218543046358 | 82.1339196077333
MANASSAS PARK CITY | 8938 | 8188 | 10210 | 9983.46973422316 | 87.5416258570029 | 89.5279921504714 | 80.1958863858962 | 82.0155739234795
NEWPORT NEWS CITY | 122432 | 110842 | 136045 | 135707.766502132 | 89.9937520673307 | 90.2173863410214 | 81.4745121099636 | 81.6769760913118
PETERSBURG CITY | 23678 | 20626 | 25315 | 25335.4844606947 | 93.5334781749951 | 93.4578536942283 | 81.4773849496346 | 81.411508163576
BRUNSWICK COUNTY | 11073 | 10706 | 13200 | 13152.0871143376 | 83.8863636363636 | 84.1919605895016 | 81.1060606060606 | 81.4015289507093
COVINGTON CITY | 3797 | 3576 | 4435 | 4394.75920432734 | 85.6144306651635 | 86.3983627649325 | 80.6313416009019 | 81.369645838135
WISE COUNTY | 24615 | 23581 | 29215 | 28980.9985437029 | 84.254663700154 | 84.9349616538607 | 80.7153859318843 | 81.3671066731541
DANVILLE CITY | 28416 | 26010 | 32135 | 32023.525462526 | 88.426948809709 | 88.734764800498 | 80.9397852808464 | 81.2215383045099
WINCHESTER CITY | 18077 | 15867 | 19730 | 19604.3848479459 | 91.6218955904714 | 92.2089631488439 | 80.4206791687785 | 80.9359749008522
ROANOKE CITY | 65501 | 58902 | 73540 | 72871.9555618569 | 89.0685341310851 | 89.8850586552461 | 80.0951862931738 | 80.8294487856874
NORTON CITY | 2600 | 2443 | 3055 | 3036.48994767282 | 85.1063829787234 | 85.6251805474493 | 79.9672667757774 | 80.4547369528534
SALEM CITY | 17727 | 16271 | 20090 | 20223.9122032026 | 88.2379293180687 | 87.6536637515305 | 80.9905425584868 | 80.4542654087636
RICHMOND CITY | 158141 | 142368 | 177060 | 178454.840906495 | 89.3149214955382 | 88.6168171155757 | 80.4066418163334 | 79.778166440773
LYNCHBURG CITY | 55508 | 49029 | 61460 | 61591.2816299704 | 90.3156524568825 | 90.1231449176237 | 79.7738366417182 | 79.6037989508931
GALAX CITY | 4016 | 3765 | 4730 | 4780.43249737198 | 84.9048625792812 | 84.0091352028876 | 79.5983086680761 | 78.7585642527071
TAZEWELL COUNTY | 27315 | 25371 | 32495 | 32303.552312954 | 84.0590860132328 | 84.5572639670542 | 78.0766271734113 | 78.5393499581963
BUCKINGHAM COUNTY | 11026 | 10436 | 13620 | 13653.7685950413 | 80.9544787077827 | 80.7542615304345 | 76.6226138032305 | 76.4331102241624
SUSSEX COUNTY | 7121 | 6836 | 9185 | 9072.86348501665 | 77.5285792052259 | 78.4867975998972 | 74.4256940664126 | 75.3455621953234
RICHMOND COUNTY | 5629 | 5442 | 7200 | 7224.66570891811 | 78.1805555555556 | 77.9136395619187 | 75.5833333333333 | 75.3252845080764
BUENA VISTA CITY | 4380 | 3900 | 5225 | 5206.04308390023 | 83.8277511961722 | 84.1329956247427 | 74.6411483253589 | 74.9129413097024
NOTTOWAY COUNTY | 9651 | 9272 | 12435 | 12415.0516555441 | 77.6115802171291 | 77.7362855005938 | 74.5637314032971 | 74.6835394427009
PRINCE EDWARD COUNTY | 13443 | 12710 | 17880 | 17855.5704331193 | 75.1845637583893 | 75.2874294907171 | 71.0850111856823 | 71.182268007663
PRINCE GEORGE COUNTY | 24912 | 22751 | 32440 | 32542.6057025908 | 76.7940813810111 | 76.5519523165187 | 70.132552404439 | 69.9114269088438
NORFOLK CITY | 138210 | 126445 | 184395 | 182802.310498883 | 74.9532254128366 | 75.6062653818834 | 68.5729005667182 | 69.170351104929
MONTGOMERY COUNTY | 60377 | 54593 | 79030 | 79093.9692416654 | 76.3975705428318 | 76.3357820815931 | 69.0788308237378 | 69.02296157776
WILLIAMSBURG CITY | 10127 | 8721 | 12985 | 13178.843062201 | 77.9899884482095 | 76.8428605773891 | 67.1621101270697 | 66.174245788033
GREENSVILLE COUNTY | 6395 | 6154 | 9595 | 9438.61461619348 | 66.6492965085982 | 67.7535873647001 | 64.137571651902 | 65.2002465429811
HARRISONBURG CITY | 25205 | 22833 | 38720 | 38443.4607770834 | 65.0955578512397 | 65.5638162915473 | 58.9695247933884 | 59.3937162223725
RADFORD CITY | 9171 | 8418 | 14315 | 14484.6131060331 | 64.0656653859588 | 63.315464022854 | 58.8054488298987 | 58.1168439804149
LEXINGTON CITY | 4087 | 3410 | 6470 | 6490.89062289789 | 63.1684698608965 | 62.9651651436292 | 52.7047913446677 | 52.5351634792698
Categories
Election Data Analysis Election Forensics Election Integrity mathematics programming technical

Voter ID Number distribution patterns in VA Registered Voter List

One thing that I have been asked about repeatedly is whether there are any patterns in the assignment of voter ID numbers in the VA data. Specifically, I’ve been asked if I’ve found any pattern similar to what AuditNY has found in the NY data. It’s not something that I have looked at in depth previously, due mostly to lack of time, and because VA is set up very differently than NY, so a direct comparison or attempt to replicate the AuditNY findings in VA isn’t as straightforward as one would hope.

The NY data uses a different Voter ID number for counties vs at the state level, which is the “Rosetta Stone” that was needed for the NY team to understand the algorithms that were used to assign voter ID numbers, and in turn discover some very (ahem) “interesting” patterns in the data. VA doesn’t have such a system and only uses a single voter ID number throughout the state and local jurisdictions.

Well … while my other machine is busy crunching on the string distance computations, I figured I’d take a crack at looking at the distribution of the Voter ID numbers in the VA Registered Voter List (RVL) and just see what I find.

To start with, here is a simple scatter plot of the Voter ID numbers vs the registration date for each record in the 2023-07-01 RVL. From the zoomed-out plot it is readily apparent that there must have been a change in the algorithm used to assign voter identification numbers sometime around 2007, which coincides nicely with the introduction of the current Virginia Election and Registration Information System (VERIS).
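
For anyone wanting to reproduce this kind of view without plotting millions of points, one option is to bin the records into a coarse grid first. The sketch below is a minimal, hypothetical example: the `(voter_id, registration_date)` tuple layout, the bin width, and the sample records are assumptions for illustration, not the actual RVL schema or the method used for the plots here.

```python
from collections import Counter
from datetime import date


def density_grid(records, id_bin=1_000_000):
    """Count records per (registration year, voter-ID bin) cell."""
    grid = Counter()
    for voter_id, reg_date in records:
        grid[(reg_date.year, voter_id // id_bin)] += 1
    return grid


# Fabricated records for illustration only.
records = [
    (900_123_456, date(1998, 5, 1)),
    (900_654_321, date(1998, 9, 15)),
    (917_000_001, date(2012, 10, 3)),
]
grid = density_grid(records)
print(grid[(1998, 900)])  # two of the sample records share this cell
```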

From a high level, it appears that the previous assignment algorithm broke the universe of possible ID numbers up into discrete ranges and assigned IDs within those ranges, but favoring the bottom of each range. This would be a logical explanation for the banded structure we see pre-2007. The new assignment algorithm post-2007 looks to be using a much more randomized approach. Nothing strange about that. As computing systems have gotten better and security has become more of a concern over the years there have been many systems that migrated to more randomized assignments of identification numbers.
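
To see why those two kinds of scheme would look so different on a scatter plot, here is a purely illustrative toy model. It is not the pre-VERIS or VERIS algorithm (neither is documented here); the band count, band size, and both function names are invented for the example.

```python
import random


def banded_ids(n, n_bands=4, band_size=1_000_000, seed=0):
    """Toy model: choose a band at random, then take its lowest unused ID."""
    rng = random.Random(seed)
    next_free = [band * band_size for band in range(n_bands)]
    ids = []
    for _ in range(n):
        band = rng.randrange(n_bands)
        ids.append(next_free[band])
        next_free[band] += 1
    return ids


def randomized_ids(n, universe=4_000_000, seed=0):
    """Toy model: draw distinct IDs uniformly from the whole ID universe."""
    rng = random.Random(seed)
    return rng.sample(range(universe), n)


banded = banded_ids(1_000)
scattered = randomized_ids(1_000)
# Banded IDs hug the bottom of each band; randomized IDs spread across it.
print(max(i % 1_000_000 for i in banded), max(i % 1_000_000 for i in scattered))
```

In the banded model every ID sits within the first thousand slots of its band, which is the "filling from the bottom" look; the randomized model has no such structure.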

Looking at a zoomed-in block of the post-2007 “randomized” ID assignments, we can see some of the normal variability that we would expect to see across election cycles. There is a high density of new assignments around November of 2016 and 2020, with a low-density section of assignments correlated with the COVID-19 lockdowns. There are short periods where it looks like there were lulls in the assignment of voter IDs; these are perhaps due to holidays or maintenance periods, or related to the legal requirement to “freeze” the voter rolls 30 days before any election (primaries, runoffs, etc.). Note that VA now has same-day voter registration under the laws passed by the previous Democratic super-majority that went into effect in 2022, so going forward we would likely see these “blackout periods” significantly reduced.

We can see the banded assignment structure of the pre-2007 entries more clearly by zooming in on a smaller section of the plot, as shown below. It’s harder to make out in this banded structure, but we still see similar patterns of density changes, presumably due to the natural election cycles, holidays, maintenance periods, legally required registration lockout periods, etc. We can also see the “bucketing” of ID numbers into distinct bands, with a bias towards filling the lower section of each band.

All of that looks unremarkable and seems to make sense to me … however … if we zoom into the Voter ID range of around [900,000,000 to 920,000,000] we do see something that catches my curiosity. We see the same banded structure as above between 900,000,000 and 915,000,000 AND pre-2007, but there is another band of assignments superimposed on the entire date range of the RVL. This band does not seem to be affected by the introduction of the VERIS system (presumably), which is very interesting. There is also what looks to be a vertical high-density band between 2007 and 2010 that extends along the entire vertical axis, but we only see it once we zoom in to the VERIS transition period.

The horizontal band that extends across all date ranges only exists in the [~915,000,000 to ~920,000,000] ID range. It trails off in density pre ~1993, but it exists throughout the full registration date range. I will note that the “Motor Voter” National Voter Registration Act (NVRA) was enacted in 1993, so perhaps this is a block of the ID universe reserved for DMV (or other externally provided) registrations? (That’s a guess, but an educated one.)

A plausible explanation I can imagine for the distinct high-density band between 2007 and 2010 is that it is related to how the VERIS system was implemented and brought into service, and that there was some sort of update around 2010 that made corrections to its internal algorithms. (But that is just a guess.) That still wouldn’t entirely explain the huge change in the density of new registrants added to the rolls.

Another, or additional, explanation might be that when VERIS came online a number of registrants had their Voter ID number regenerated and/or their registration date field updated as part of the rollout of the new software. That would mean that while VERIS was coming online and handling the normal amount of new real registrations, it was also moving/updating a large number of historic registrations, which would account for the higher density as VERIS became the system of record. That seems to be a poor systems administration and design choice, in my opinion, as it makes those moved registrant records inaccurate by giving them a false registration date. However, if that was the case, and VERIS was resetting registration dates as it ingested voter records into its databases, why do we see any records with pre-2007 registration dates at all? (This is, again, merely an educated guess on my part, so take it with a grain of salt.)

Incorporating the identification of cloned registrations

In attempting to incorporate some of my early duplicate-record identification results on the most recent RVL data, I did a scatter plot of only those records that had an identified exact match of (Full Name + DOB) to other records in the dataset, but with unique Voter ID numbers. (Technically these are “cloned” records, as true “duplicates” would have the same voter ID numbers; this was pointed out to me a few days ago.) The scatter plot of those records is shown below, and we can see that there is a distinct ~horizontal cluster of records that aligns with the 915M – 920M ID band and pre-2007. In the post-2007 block the cloned records do not seem to be totally randomly distributed, but have a bias towards the lower right of the graph.
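
The selection of “cloned” records can be sketched as a simple grouping step. The field names (`voter_id`, `full_name`, `dob`) and the sample records below are hypothetical and do not reflect the actual RVL column names; the grouping itself is just one straightforward way to find exact (Full Name + DOB) matches with differing Voter IDs.

```python
from collections import defaultdict


def find_clones(records):
    """Group voter IDs by exact (full name, DOB); keep groups with 2+ distinct IDs."""
    ids_by_key = defaultdict(set)
    for rec in records:
        ids_by_key[(rec["full_name"], rec["dob"])].add(rec["voter_id"])
    return {key: sorted(ids) for key, ids in ids_by_key.items() if len(ids) > 1}


# Fabricated records for illustration only.
records = [
    {"voter_id": 915000001, "full_name": "JANE Q PUBLIC", "dob": "1970-01-01"},
    {"voter_id": 915000777, "full_name": "JANE Q PUBLIC", "dob": "1970-01-01"},
    {"voter_id": 100200300, "full_name": "JOHN DOE", "dob": "1980-02-02"},
]
clones = find_clones(records)
print(clones)
```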

Superimposing the two plots produces the following, with the red indicating the records with identified Full Name + DOB string matches.

Zooming in for a closer look at the 915M-920M band again gives the following:

It is curious that there seems to be an alignment of the exact Full Name + DOB matching records with the 915M-920M, pre-2007 ID band. Post-2007 the exact cloned matches have a less structured distribution throughout the data, but they do seem to cluster around the lower right.

If the cloned records were simply due to random data entry errors, etc., I would expect to see sporadic red datapoints distributed “salt-n-pepper” style throughout the entirety of the area covered by the blue data. There might be some argument to be made for a bias of more of the red data points towards the right side of the plot, as officials have not yet had time to “catch” or “clean up” erroneous entries, but there is little reason to have linear features, or to have a bias for lower ID numbers on the vertical axis.

I am continuing to investigate this data, but as of right now all I can tell you is that … yes, there do seem to be interesting patterns in the way Voter IDs are assigned in VA, especially with records that have already been found and flagged to be problematic (clones).
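
One simple way to put a number on the “salt-n-pepper” expectation is to compare the share of cloned records in each ID band: if clones were scattered at random, the rates should be roughly similar from band to band. The helper below is a hypothetical sketch with fabricated IDs and an arbitrary band width; a proper statistical test would be needed before drawing conclusions from real data.

```python
from collections import Counter


def clone_rate_by_band(all_ids, clone_ids, band_width=5_000_000):
    """Fraction of records flagged as clones within each voter-ID band."""
    totals = Counter(i // band_width for i in all_ids)
    flagged = Counter(i // band_width for i in clone_ids)
    return {band: flagged[band] / totals[band] for band in sorted(totals)}


# Fabricated IDs: band 183 covers 915,000,000 up to (not including) 920,000,000.
all_ids = [915_000_001, 915_000_002, 916_000_000, 917_500_000,
           100_000_001, 100_000_002, 100_000_003, 100_000_004]
clone_ids = [915_000_001, 915_000_002, 100_000_001]
rates = clone_rate_by_band(all_ids, clone_ids)
print(rates)
```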

Categories
Election Data Analysis Election Forensics Election Integrity programming technical

Preliminary Results of 2023-07-01 VA RVL duplicate detections

Below are the preliminary results from performing exact (string distance of 0) duplicate record checks on the 2023-07-01 VA Registered Voter List information. Note that these are the numbers of ordered matches discovered, not the number of individual unique registrants. Each count represents two different registration records, with unique voter IDs, that match the given criteria. Match pairs are directional, in that a pair (A, B) is counted separately from the pair (B, A). Matches are grouped into this table according to the LOCALITY_NAME of the first element of the identified pair. Except in the strictest case (3rd column), the two records in a pair can have different locality information, so a match might be counted in one locality while its mirror is counted in the other.
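
The directional counting can be illustrated with a short sketch. The function and the `(voter_id, locality)` tuple layout are hypothetical, and the example group is fabricated; it only shows how one matched pair that straddles two localities contributes one count to each.

```python
from collections import Counter
from itertools import permutations


def count_ordered_pairs(group):
    """Tally ordered pairs (A, B), A != B, by the locality of the first element.

    `group` holds (voter_id, locality) tuples that share one match key.
    """
    counts = Counter()
    for (_id_a, loc_a), (_id_b, _loc_b) in permutations(group, 2):
        counts[loc_a] += 1
    return counts


# Fabricated example: one matched pair that straddles two localities.
group = [(111, "LOCALITY A"), (222, "LOCALITY B")]
print(count_ordered_pairs(group))
```

When both records sit in the same locality, that locality is credited with both (A, B) and (B, A), which is why same-address matches always show up as even numbers.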

The first data column of the table below is equivalent to the criteria used by the MOU between the VA Department of Elections (ELECT) and the Department of Motor Vehicles (DMV), as discussed and documented in a previous post. There were 5,290 matches for these criteria across the state.

The second data column below is based on a match of the registrant’s Full Name and Day+Month+Year of birth information, but NOT the registrant’s listed address information. There were 1,200 matches for these criteria across the state.

The third data column uses the strictest match criteria and includes the Gender and Address of the registrant. There were 208 matches for these criteria across the state.

I am only considering Active registrations in the table below. I previously computed similar statistics using the previously purchased 2022-11-23 RVL. I have not yet done a comparison of the two datasets, but will do so once I complete the string distance processing on this latest set.
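
The three criteria differ only in which fields go into the comparison key. Below is a minimal sketch, assuming hypothetical field names (`first`, `middle`, `last`, `suffix`, `dob`, `gender`, `address`) that are not the actual RVL column names, and using two fabricated records.

```python
def match_keys(rec):
    """Build the three exact-match keys for one registration record."""
    first_last_dob = (rec["first"], rec["last"], rec["dob"])
    full_name_dob = (rec["first"], rec["middle"], rec["last"], rec["suffix"], rec["dob"])
    full_name_dob_gender_addr = full_name_dob + (rec["gender"], rec["address"])
    return first_last_dob, full_name_dob, full_name_dob_gender_addr


# Fabricated records: same first name, last name, and DOB, but different middle names.
a = {"first": "JANE", "middle": "Q", "last": "PUBLIC", "suffix": "",
     "dob": "1970-01-01", "gender": "F", "address": "1 MAIN ST"}
b = {"first": "JANE", "middle": "R", "last": "PUBLIC", "suffix": "",
     "dob": "1970-01-01", "gender": "F", "address": "1 MAIN ST"}
matches = [key_a == key_b for key_a, key_b in zip(match_keys(a), match_keys(b))]
print(matches)  # [True, False, False]
```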

LOCALITY_NAME | Sum of Exact Same First+Last+DOB | Sum of Exact Same First+Middle+Last+Sfx+DOB | Sum of Exact Same First+Middle+Last+Sfx+Gender+Address+DOB
ACCOMACK COUNTY | 24 | 7 | 0
ALBEMARLE COUNTY | 116 | 57 | 22
ALEXANDRIA CITY | 60 | 13 | 0
ALLEGHANY COUNTY | 12 | 4 | 0
AMELIA COUNTY | 5 | 0 | 0
AMHERST COUNTY | 32 | 8 | 2
APPOMATTOX COUNTY | 15 | 0 | 0
ARLINGTON COUNTY | 152 | 62 | 0
AUGUSTA COUNTY | 69 | 6 | 2
BATH COUNTY | 2 | 0 | 0
BEDFORD COUNTY | 67 | 13 | 0
BLAND COUNTY | 5 | 0 | 0
BOTETOURT COUNTY | 24 | 2 | 0
BRISTOL CITY | 8 | 0 | 0
BRUNSWICK COUNTY | 17 | 1 | 0
BUCHANAN COUNTY | 11 | 5 | 0
BUCKINGHAM COUNTY | 7 | 0 | 0
BUENA VISTA CITY | 3 | 0 | 0
CAMPBELL COUNTY | 39 | 13 | 4
CAROLINE COUNTY | 18 | 0 | 0
CARROLL COUNTY | 14 | 0 | 0
CHARLES CITY COUNTY | 2 | 0 | 0
CHARLOTTE COUNTY | 11 | 1 | 0
CHARLOTTESVILLE CITY | 25 | 8 | 0
CHESAPEAKE CITY | 136 | 13 | 8
CHESTERFIELD COUNTY | 380 | 127 | 34
CLARKE COUNTY | 11 | 0 | 0
COLONIAL HEIGHTS CITY | 12 | 4 | 0
COVINGTON CITY | 5 | 0 | 0
CRAIG COUNTY | 3 | 0 | 0
CULPEPER COUNTY | 26 | 2 | 0
CUMBERLAND COUNTY | 13 | 5 | 2
DANVILLE CITY | 21 | 4 | 0
DICKENSON COUNTY | 6 | 0 | 0
DINWIDDIE COUNTY | 21 | 2 | 0
EMPORIA CITY | 9 | 3 | 0
ESSEX COUNTY | 9 | 1 | 0
FAIRFAX CITY | 12 | 2 | 0
FAIRFAX COUNTY | 550 | 188 | 14
FALLS CHURCH CITY | 12 | 8 | 0
FAUQUIER COUNTY | 41 | 2 | 0
FLOYD COUNTY | 9 | 0 | 0
FLUVANNA COUNTY | 23 | 3 | 0
FRANKLIN CITY | 8 | 4 | 2
FRANKLIN COUNTY | 20 | 2 | 2
FREDERICK COUNTY | 54 | 9 | 4
FREDERICKSBURG CITY | 14 | 2 | 0
GALAX CITY | 3 | 0 | 0
GILES COUNTY | 3 | 0 | 0
GLOUCESTER COUNTY | 24 | 2 | 0
GOOCHLAND COUNTY | 23 | 2 | 0
GRAYSON COUNTY | 13 | 1 | 0
GREENE COUNTY | 16 | 1 | 0
GREENSVILLE COUNTY | 7 | 0 | 0
HALIFAX COUNTY | 23 | 6 | 4
HAMPTON CITY | 131 | 37 | 8
HANOVER COUNTY | 73 | 10 | 0
HARRISONBURG CITY | 18 | 8 | 2
HENRICO COUNTY | 245 | 75 | 34
HENRY COUNTY | 29 | 4 | 4
HIGHLAND COUNTY | 2 | 0 | 0
HOPEWELL CITY | 33 | 7 | 4
ISLE OF WIGHT COUNTY | 27 | 1 | 0
JAMES CITY COUNTY | 31 | 1 | 0
KING AND QUEEN COUNTY | 4 | 0 | 0
KING GEORGE COUNTY | 17 | 0 | 0
KING WILLIAM COUNTY | 17 | 4 | 0
LANCASTER COUNTY | 8 | 3 | 0
LEE COUNTY | 38 | 14 | 2
LEXINGTON CITY | 8 | 3 | 0
LOUDOUN COUNTY | 158 | 31 | 6
LOUISA COUNTY | 26 | 1 | 0
LUNENBURG COUNTY | 13 | 5 | 0
LYNCHBURG CITY | 75 | 29 | 4
MADISON COUNTY | 10 | 2 | 0
MANASSAS CITY | 23 | 3 | 2
MANASSAS PARK CITY | 4 | 1 | 0
MARTINSVILLE CITY | 9 | 0 | 0
MATHEWS COUNTY | 5 | 2 | 2
MECKLENBURG COUNTY | 25 | 3 | 0
MIDDLESEX COUNTY | 6 | 0 | 0
MONTGOMERY COUNTY | 47 | 5 | 0
NELSON COUNTY | 15 | 5 | 4
NEW KENT COUNTY | 18 | 1 | 0
NEWPORT NEWS CITY | 88 | 12 | 0
NORFOLK CITY | 111 | 15 | 0
NORTHAMPTON COUNTY | 8 | 3 | 2
NORTHUMBERLAND COUNTY | 8 | 1 | 0
NOTTOWAY COUNTY | 10 | 8 | 4
ORANGE COUNTY | 29 | 4 | 0
PAGE COUNTY | 23 | 3 | 0
PATRICK COUNTY | 10 | 1 | 0
PETERSBURG CITY | 31 | 6 | 0
PITTSYLVANIA COUNTY | 62 | 9 | 2
POQUOSON CITY | 8 | 0 | 0
PORTSMOUTH CITY | 80 | 10 | 0
POWHATAN COUNTY | 26 | 6 | 0
PRINCE EDWARD COUNTY | 20 | 5 | 0
PRINCE GEORGE COUNTY | 24 | 5 | 2
PRINCE WILLIAM COUNTY | 178 | 41 | 0
PULASKI COUNTY | 16 | 2 | 0
RADFORD CITY | 8 | 1 | 0
RAPPAHANNOCK COUNTY | 6 | 1 | 0
RICHMOND CITY | 159 | 48 | 2
RICHMOND COUNTY | 56 | 38 | 10
ROANOKE CITY | 40 | 0 | 0
ROANOKE COUNTY | 57 | 7 | 0
ROCKBRIDGE COUNTY | 17 | 2 | 0
ROCKINGHAM COUNTY | 34 | 7 | 0
RUSSELL COUNTY | 19 | 0 | 0
SALEM CITY | 11 | 1 | 0
SCOTT COUNTY | 17 | 4 | 0
SHENANDOAH COUNTY | 22 | 4 | 0
SMYTH COUNTY | 17 | 0 | 0
SOUTHAMPTON COUNTY | 18 | 6 | 0
SPOTSYLVANIA COUNTY | 86 | 22 | 4
STAFFORD COUNTY | 92 | 21 | 2
STAUNTON CITY | 15 | 0 | 0
SUFFOLK CITY | 51 | 5 | 2
SURRY COUNTY | 8 | 2 | 0
SUSSEX COUNTY | 11 | 2 | 2
TAZEWELL COUNTY | 21 | 0 | 0
VIRGINIA BEACH CITY | 244 | 25 | 2
WARREN COUNTY | 25 | 1 | 0
WASHINGTON COUNTY | 35 | 5 | 2
WAYNESBORO CITY | 13 | 0 | 0
WESTMORELAND COUNTY | 19 | 2 | 0
WILLIAMSBURG CITY | 15 | 8 | 0
WINCHESTER CITY | 11 | 2 | 0
WISE COUNTY | 24 | 5 | 0
WYTHE COUNTY | 21 | 2 | 0
YORK COUNTY | 26 | 0 | 0
Grand Total | 5290 | 1200 | 208
Categories
Election Data Analysis Election Forensics Election Integrity mathematics programming technical

Potential Duplicate Registrants in VA RVL by Locality

Previously I posted the computation of potential duplicate records based on string comparisons in the registered voter list. As a follow-up to that article, I’ve compiled the statistics of the number of potential pairs for each locality in VA.

I tallied the number of registrant pairs matching the reference criteria defined by the MOU between ELECT and the DMV, along with the two highest confidence (most stringent) match criteria that I computed. I also stratified the results by whether only Active registrant records or both Active and Inactive records were included, and by whether the pairs crossed a locality boundary.

The table below is organized into the following computed columns, and has been sorted in decreasing order according to column 5.

  1. Exactly matching First + Last + DOB, which is equivalent to the MOU between ELECT and DMV.
  2. Exactly matching First + Middle + Last + Suffix + DOB
  3. Exactly matching First + Middle + Last + Suffix + DOB + Gender + Street Address
  4. The same as #1, but filtering for only ACTIVE voter records
  5. The same as #2, but filtering for only ACTIVE voter records
  6. The same as #3, but filtering for only ACTIVE voter records
  7. The same as #1, but filtering for only pairs that cross a locality boundary.
  8. The same as #2, but filtering for only pairs that cross a locality boundary.
  9. The same as #3, but filtering for only pairs that cross a locality boundary.
  10. The same as #4, but filtering for only pairs that cross a locality boundary.
  11. The same as #5, but filtering for only pairs that cross a locality boundary.
  12. The same as #6, but filtering for only pairs that cross a locality boundary.
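
The percentages in these columns appear to be the number of matching pairs attributed to a locality divided by that locality's number of registrant records. The sketch below shows that arithmetic with the Active-only and cross-locality filters applied. The pair layout (`loc_a`, `loc_b`, `active_a`, `active_b`) is hypothetical, the data is fabricated, and treating “Active only” as requiring both records in the pair to be Active is an assumption on my part.

```python
def pct_pairs(pairs, n_records, locality, active_only=False, cross_locality=False):
    """Percent of a locality's records accounted for by matching ordered pairs."""
    count = 0
    for pair in pairs:
        if pair["loc_a"] != locality:
            continue  # pairs are attributed to the locality of the first record
        if active_only and not (pair["active_a"] and pair["active_b"]):
            continue  # assumption: both records must be Active
        if cross_locality and pair["loc_a"] == pair["loc_b"]:
            continue  # keep only pairs that cross a locality boundary
        count += 1
    return 100.0 * count / n_records


# Fabricated pairs for a made-up locality with 1,000 registrant records.
pairs = [
    {"loc_a": "LOCALITY A", "loc_b": "LOCALITY A", "active_a": True, "active_b": True},
    {"loc_a": "LOCALITY A", "loc_b": "LOCALITY A", "active_a": True, "active_b": True},
    {"loc_a": "LOCALITY A", "loc_b": "LOCALITY B", "active_a": True, "active_b": False},
]
for flags in [(False, False), (True, False), (False, True), (True, True)]:
    print(flags, pct_pairs(pairs, 1_000, "LOCALITY A", *flags))
```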


LOCALITY_NAME | Num Registrant Records | (1) Pct Same First Last Dob | (2) Pct Same Full Name Dob | (3) Pct Same Full Name Dob Address | (4) Pct Same First Last Dob _ Active Only | (5) Pct Same Full Name Dob _ Active Only | (6) Pct Same Full Name Dob Address _ Active Only | (7) Pct Same First Last Dob _ xLoc | (8) Pct Same Full Name Dob _ xLoc | (9) Pct Same Full Name Dob Address _ xLoc | (10) Pct Same First Last Dob _ Active Only _ xLoc | (11) Pct Same Full Name Dob _ Active Only _ xLoc | (12) Pct Same Full Name Dob Address _ Active Only _ xLoc
NORTON CITY | 2604 | 0.2304% | 0.2304% | 0.1536% | 0.1920% | 0.1920% | 0.1536% | 0.0768% | 0.0768% | 0.0000% | 0.0384% | 0.0384% | 0.0000%
NOTTOWAY COUNTY | 9704 | 0.2988% | 0.2061% | 0.0618% | 0.2473% | 0.1752% | 0.0618% | 0.2370% | 0.1649% | 0.0206% | 0.1855% | 0.1340% | 0.0206%
RADFORD CITY | 9551 | 0.4293% | 0.2827% | 0.0000% | 0.2827% | 0.1675% | 0.0000% | 0.4293% | 0.2827% | 0.0000% | 0.2827% | 0.1675% | 0.0000%
HIGHLAND COUNTY | 1903 | 0.2627% | 0.1576% | 0.1051% | 0.2627% | 0.1576% | 0.1051% | 0.1576% | 0.0525% | 0.0000% | 0.1576% | 0.0525% | 0.0000%
WILLIAMSBURG CITY | 10480 | 0.2195% | 0.1336% | 0.0000% | 0.2004% | 0.1336% | 0.0000% | 0.2004% | 0.1336% | 0.0000% | 0.1813% | 0.1336% | 0.0000%
LYNCHBURG CITY | 56319 | 0.3072% | 0.1829% | 0.0533% | 0.2255% | 0.1296% | 0.0533% | 0.1616% | 0.0764% | 0.0000% | 0.1190% | 0.0479% | 0.0000%
EMPORIA CITY | 4023 | 0.3480% | 0.1740% | 0.0000% | 0.2983% | 0.1243% | 0.0000% | 0.2486% | 0.0746% | 0.0000% | 0.1989% | 0.0249% | 0.0000%
SUFFOLK CITY | 71580 | 0.2403% | 0.1229% | 0.0754% | 0.2249% | 0.1187% | 0.0754% | 0.1229% | 0.0307% | 0.0000% | 0.1104% | 0.0265% | 0.0000%
FALLS CHURCH CITY | 11213 | 0.1784% | 0.1338% | 0.0357% | 0.1516% | 0.1159% | 0.0178% | 0.0892% | 0.0624% | 0.0000% | 0.0803% | 0.0624% | 0.0000%
SUSSEX COUNTY | 7149 | 0.2658% | 0.1259% | 0.0839% | 0.2238% | 0.1119% | 0.0839% | 0.1539% | 0.0140% | 0.0000% | 0.1119% | 0.0000% | 0.0000%
FRANKLIN CITY | 5924 | 0.2026% | 0.1182% | 0.0338% | 0.1857% | 0.1013% | 0.0338% | 0.1688% | 0.0844% | 0.0000% | 0.1519% | 0.0675% | 0.0000%
APPOMATTOX COUNTY | 12195 | 0.2542% | 0.1230% | 0.0328% | 0.2214% | 0.0902% | 0.0328% | 0.2050% | 0.0738% | 0.0000% | 0.1886% | 0.0574% | 0.0000%
LEE COUNTY | 15619 | 0.2497% | 0.0960% | 0.0128% | 0.2305% | 0.0832% | 0.0128% | 0.1473% | 0.0192% | 0.0000% | 0.1409% | 0.0192% | 0.0000%
ALBEMARLE COUNTY | 84889 | 0.1920% | 0.1001% | 0.0212% | 0.1590% | 0.0825% | 0.0188% | 0.1402% | 0.0554% | 0.0000% | 0.1096% | 0.0401% | 0.0000%
AMHERST COUNTY | 22906 | 0.1965% | 0.0829% | 0.0437% | 0.1790% | 0.0742% | 0.0437% | 0.1441% | 0.0393% | 0.0000% | 0.1266% | 0.0306% | 0.0000%
PRINCE EDWARD COUNTY | 13595 | 0.2280% | 0.0883% | 0.0000% | 0.1912% | 0.0662% | 0.0000% | 0.2133% | 0.0883% | 0.0000% | 0.1765% | 0.0662% | 0.0000%
STAUNTON CITY | 18180 | 0.1980% | 0.0935% | 0.0000% | 0.1595% | 0.0605% | 0.0000% | 0.1650% | 0.0605% | 0.0000% | 0.1265% | 0.0275% | 0.0000%
NELSON COUNTY | 11895 | 0.1765% | 0.0673% | 0.0168% | 0.1513% | 0.0588% | 0.0168% | 0.1261% | 0.0504% | 0.0000% | 0.1177% | 0.0420% | 0.0000%
ARLINGTON COUNTY | 177092 | 0.1378% | 0.0683% | 0.0113% | 0.1146% | 0.0576% | 0.0102% | 0.0870% | 0.0344% | 0.0000% | 0.0683% | 0.0260% | 0.0000%
NORTHUMBERLAND COUNTY | 10457 | 0.1339% | 0.0574% | 0.0191% | 0.1243% | 0.0574% | 0.0191% | 0.0956% | 0.0191% | 0.0000% | 0.0861% | 0.0191% | 0.0000%
SOUTHAMPTON COUNTY | 13218 | 0.2194% | 0.0757% | 0.0000% | 0.1740% | 0.0530% | 0.0000% | 0.1589% | 0.0454% | 0.0000% | 0.1286% | 0.0227% | 0.0000%
HOPEWELL CITY | 15825 | 0.2401% | 0.0695% | 0.0253% | 0.2085% | 0.0506% | 0.0253% | 0.1390% | 0.0190% | 0.0000% | 0.1201% | 0.0126% | 0.0000%
LUNENBURG COUNTY | 8097 | 0.1853% | 0.0618% | 0.0000% | 0.1729% | 0.0494% | 0.0000% | 0.1853% | 0.0618% | 0.0000% | 0.1729% | 0.0494% | 0.0000%
AMELIA COUNTY | 10179 | 0.1375% | 0.0884% | 0.0098% | 0.0884% | 0.0491% | 0.0098% | 0.1375% | 0.0884% | 0.0098% | 0.0884% | 0.0491% | 0.0098%
RICHMOND CITY | 161097 | 0.1707% | 0.0639% | 0.0000% | 0.1316% | 0.0490% | 0.0000% | 0.1459% | 0.0528% | 0.0000% | 0.1155% | 0.0416% | 0.0000%
CHARLOTTESVILLE CITY | 34789 | 0.1265% | 0.0604% | 0.0000% | 0.1064% | 0.0489% | 0.0000% | 0.1150% | 0.0489% | 0.0000% | 0.0949% | 0.0374% | 0.0000%
LEXINGTON CITY | 4211 | 0.2612% | 0.1187% | 0.0000% | 0.1900% | 0.0475% | 0.0000% | 0.2612% | 0.1187% | 0.0000% | 0.1900% | 0.0475% | 0.0000%
FAIRFAX COUNTY | 787727 | 0.1143% | 0.0559% | 0.0053% | 0.0988% | 0.0474% | 0.0053% | 0.0665% | 0.0236% | 0.0000% | 0.0546% | 0.0171% | 0.0000%
CHARLOTTE COUNTY | 8474 | 0.2242% | 0.0708% | 0.0236% | 0.1652% | 0.0472% | 0.0236% | 0.2006% | 0.0472% | 0.0000% | 0.1416% | 0.0236% | 0.0000%
HARRISONBURG CITY | 26443 | 0.1777% | 0.0870% | 0.0000% | 0.1210% | 0.0454% | 0.0000% | 0.1324% | 0.0567% | 0.0000% | 0.0908% | 0.0303% | 0.0000%
BRUNSWICK COUNTY | 11098 | 0.2253% | 0.0631% | 0.0000% | 0.1982% | 0.0451% | 0.0000% | 0.2072% | 0.0451% | 0.0000% | 0.1802% | 0.0270% | 0.0000%
HAMPTON CITY | 100807 | 0.2044% | 0.0764% | 0.0060% | 0.1468% | 0.0446% | 0.0040% | 0.1210% | 0.0387% | 0.0000% | 0.0972% | 0.0268% | 0.0000%
WISE COUNTY | 24750 | 0.1455% | 0.0525% | 0.0000% | 0.1333% | 0.0444% | 0.0000% | 0.1212% | 0.0364% | 0.0000% | 0.1091% | 0.0283% | 0.0000%
WYTHE COUNTY | 20950 | 0.1480% | 0.0525% | 0.0191% | 0.1289% | 0.0430% | 0.0191% | 0.1002% | 0.0143% | 0.0000% | 0.0907% | 0.0143% | 0.0000%
CHESAPEAKE CITY | 178005 | 0.1258% | 0.0433% | 0.0303% | 0.1140% | 0.0410% | 0.0303% | 0.0843% | 0.0062% | 0.0000% | 0.0747% | 0.0051% | 0.0000%
NEWPORT NEWS CITY | 124778 | 0.1354% | 0.0537% | 0.0016% | 0.1122% | 0.0409% | 0.0016% | 0.1002% | 0.0313% | 0.0000% | 0.0850% | 0.0216% | 0.0000%
CUMBERLAND COUNTY | 7416 | 0.1483% | 0.0539% | 0.0270% | 0.1214% | 0.0405% | 0.0270% | 0.1214% | 0.0270% | 0.0000% | 0.0944% | 0.0135% | 0.0000%
PRINCE GEORGE COUNTY | 24957 | 0.1643% | 0.0401% | 0.0000% | 0.1322% | 0.0401% | 0.0000% | 0.1643% | 0.0401% | 0.0000% | 0.1322% | 0.0401% | 0.0000%
HALIFAX COUNTY250860.1196%0.0438%0.0239%0.1156%0.0399%0.0239%0.0877%0.0120%0.0000%0.0837%0.0080%0.0000%
SMYTH COUNTY201590.1339%0.0397%0.0000%0.1290%0.0397%0.0000%0.1141%0.0198%0.0000%0.1091%0.0198%0.0000%
FAIRFAX CITY178250.1234%0.0617%0.0000%0.0954%0.0393%0.0000%0.1122%0.0617%0.0000%0.0842%0.0393%0.0000%
CAMPBELL COUNTY413180.1380%0.0508%0.0048%0.1186%0.0387%0.0048%0.1283%0.0411%0.0000%0.1089%0.0290%0.0000%
COLONIAL HEIGHTS CITY130660.0918%0.0383%0.0000%0.0918%0.0383%0.0000%0.0918%0.0383%0.0000%0.0918%0.0383%0.0000%
CHESTERFIELD COUNTY2700840.1529%0.0478%0.0067%0.1300%0.0381%0.0059%0.1107%0.0248%0.0000%0.0937%0.0196%0.0000%
PETERSBURG CITY237400.1685%0.0421%0.0000%0.1559%0.0379%0.0000%0.1601%0.0421%0.0000%0.1474%0.0379%0.0000%
SURRY COUNTY56750.1762%0.0352%0.0000%0.1410%0.0352%0.0000%0.1410%0.0000%0.0000%0.1057%0.0000%0.0000%
STAFFORD COUNTY1112610.1222%0.0440%0.0072%0.1079%0.0351%0.0072%0.1007%0.0279%0.0000%0.0881%0.0207%0.0000%
BUCHANAN COUNTY148360.0876%0.0337%0.0000%0.0876%0.0337%0.0000%0.0607%0.0067%0.0000%0.0607%0.0067%0.0000%
PORTSMOUTH CITY683810.1536%0.0409%0.0058%0.1375%0.0336%0.0058%0.1185%0.0263%0.0000%0.1024%0.0190%0.0000%
PITTSYLVANIA COUNTY453220.1677%0.0441%0.0044%0.1522%0.0331%0.0044%0.1324%0.0221%0.0000%0.1214%0.0154%0.0000%
MECKLENBURG COUNTY229960.1522%0.0478%0.0000%0.1305%0.0304%0.0000%0.1261%0.0391%0.0000%0.1131%0.0304%0.0000%
NORTHAMPTON COUNTY98770.0911%0.0304%0.0202%0.0810%0.0304%0.0202%0.0911%0.0101%0.0000%0.0810%0.0101%0.0000%
PAGE COUNTY170950.1872%0.0351%0.0000%0.1521%0.0292%0.0000%0.1521%0.0117%0.0000%0.1170%0.0058%0.0000%
ACCOMACK COUNTY254830.1216%0.0275%0.0000%0.1020%0.0275%0.0000%0.1138%0.0275%0.0000%0.0942%0.0275%0.0000%
GRAYSON COUNTY109410.1645%0.0274%0.0000%0.1554%0.0274%0.0000%0.1462%0.0274%0.0000%0.1371%0.0274%0.0000%
ALLEGHANY COUNTY110690.1355%0.0271%0.0000%0.1084%0.0271%0.0000%0.0994%0.0090%0.0000%0.0723%0.0090%0.0000%
MATHEWS COUNTY73780.0949%0.0271%0.0271%0.0678%0.0271%0.0271%0.0678%0.0000%0.0000%0.0407%0.0000%0.0000%
BEDFORD COUNTY632400.1233%0.0300%0.0063%0.1154%0.0269%0.0063%0.1012%0.0142%0.0000%0.0933%0.0111%0.0000%
HENRICO COUNTY2404360.1152%0.0299%0.0083%0.0998%0.0258%0.0083%0.0944%0.0175%0.0000%0.0807%0.0133%0.0000%
WAYNESBORO CITY155610.1735%0.0450%0.0000%0.1285%0.0257%0.0000%0.1735%0.0450%0.0000%0.1285%0.0257%0.0000%
HANOVER COUNTY870000.1092%0.0287%0.0023%0.1011%0.0253%0.0023%0.1023%0.0218%0.0000%0.0943%0.0184%0.0000%
CRAIG COUNTY39720.1007%0.0252%0.0000%0.1007%0.0252%0.0000%0.1007%0.0252%0.0000%0.1007%0.0252%0.0000%
GALAX CITY40670.1229%0.0246%0.0000%0.1229%0.0246%0.0000%0.1229%0.0246%0.0000%0.1229%0.0246%0.0000%
ORANGE COUNTY284820.1299%0.0351%0.0000%0.1194%0.0246%0.0000%0.1299%0.0351%0.0000%0.1194%0.0246%0.0000%
DANVILLE CITY288380.1040%0.0312%0.0000%0.0902%0.0243%0.0000%0.1040%0.0312%0.0000%0.0902%0.0243%0.0000%
CARROLL COUNTY211630.1040%0.0236%0.0095%0.1040%0.0236%0.0095%0.0945%0.0142%0.0000%0.0945%0.0142%0.0000%
FREDERICK COUNTY679120.1075%0.0324%0.0088%0.0883%0.0236%0.0059%0.0898%0.0206%0.0000%0.0736%0.0147%0.0000%
MANASSAS PARK CITY90180.0665%0.0222%0.0000%0.0554%0.0222%0.0000%0.0444%0.0222%0.0000%0.0333%0.0222%0.0000%
HENRY COUNTY365390.1259%0.0246%0.0000%0.1122%0.0219%0.0000%0.0931%0.0082%0.0000%0.0848%0.0055%0.0000%
BLAND COUNTY45810.1091%0.0218%0.0000%0.1091%0.0218%0.0000%0.1091%0.0218%0.0000%0.1091%0.0218%0.0000%
SPOTSYLVANIA COUNTY1053610.0987%0.0247%0.0057%0.0873%0.0218%0.0057%0.0816%0.0095%0.0000%0.0702%0.0066%0.0000%
WINCHESTER CITY183520.1035%0.0381%0.0000%0.0708%0.0218%0.0000%0.0926%0.0381%0.0000%0.0599%0.0218%0.0000%
LANCASTER COUNTY92670.0755%0.0216%0.0000%0.0755%0.0216%0.0000%0.0755%0.0216%0.0000%0.0755%0.0216%0.0000%
KING WILLIAM COUNTY139960.1286%0.0214%0.0000%0.1143%0.0214%0.0000%0.1286%0.0214%0.0000%0.1143%0.0214%0.0000%
WESTMORELAND COUNTY142330.1827%0.0211%0.0000%0.1756%0.0211%0.0000%0.1546%0.0211%0.0000%0.1475%0.0211%0.0000%
VIRGINIA BEACH CITY3319140.1118%0.0259%0.0066%0.0967%0.0208%0.0066%0.0883%0.0114%0.0000%0.0762%0.0081%0.0000%
POWHATAN COUNTY242870.1400%0.0371%0.0000%0.1153%0.0206%0.0000%0.1400%0.0371%0.0000%0.1153%0.0206%0.0000%
BOTETOURT COUNTY263110.1178%0.0190%0.0076%0.1102%0.0190%0.0076%0.1102%0.0114%0.0000%0.1026%0.0114%0.0000%
FLUVANNA COUNTY210010.1286%0.0238%0.0000%0.1190%0.0190%0.0000%0.1095%0.0238%0.0000%0.1000%0.0190%0.0000%
SCOTT COUNTY160590.1121%0.0249%0.0000%0.1059%0.0187%0.0000%0.0996%0.0125%0.0000%0.0934%0.0062%0.0000%
ALEXANDRIA CITY1122120.0820%0.0205%0.0000%0.0686%0.0178%0.0000%0.0784%0.0169%0.0000%0.0651%0.0143%0.0000%
TAZEWELL COUNTY281470.0995%0.0178%0.0142%0.0959%0.0178%0.0142%0.0853%0.0036%0.0000%0.0817%0.0036%0.0000%
RICHMOND COUNTY56490.2301%0.0354%0.0000%0.1947%0.0177%0.0000%0.1947%0.0354%0.0000%0.1593%0.0177%0.0000%
ROCKINGHAM COUNTY568170.0845%0.0246%0.0035%0.0739%0.0176%0.0000%0.0739%0.0176%0.0000%0.0669%0.0141%0.0000%
LOUISA COUNTY295670.1150%0.0271%0.0135%0.1015%0.0169%0.0135%0.1082%0.0135%0.0000%0.0947%0.0034%0.0000%
LOUDOUN COUNTY2919140.0740%0.0219%0.0041%0.0620%0.0164%0.0041%0.0651%0.0171%0.0000%0.0531%0.0116%0.0000%
RAPPAHANNOCK COUNTY62390.0962%0.0160%0.0000%0.0801%0.0160%0.0000%0.0962%0.0160%0.0000%0.0801%0.0160%0.0000%
JAMES CITY COUNTY643900.0745%0.0186%0.0000%0.0668%0.0155%0.0000%0.0621%0.0124%0.0000%0.0544%0.0093%0.0000%
PATRICK COUNTY128620.0855%0.0155%0.0000%0.0777%0.0155%0.0000%0.0855%0.0155%0.0000%0.0777%0.0155%0.0000%
PRINCE WILLIAM COUNTY3165300.0812%0.0186%0.0000%0.0663%0.0148%0.0000%0.0711%0.0142%0.0000%0.0581%0.0104%0.0000%
AUGUSTA COUNTY549930.1455%0.0218%0.0036%0.1255%0.0145%0.0036%0.1346%0.0182%0.0000%0.1146%0.0109%0.0000%
DINWIDDIE COUNTY208350.1584%0.0384%0.0048%0.1152%0.0144%0.0048%0.1488%0.0288%0.0048%0.1152%0.0144%0.0048%
GOOCHLAND COUNTY214100.1261%0.0187%0.0000%0.1121%0.0140%0.0000%0.1261%0.0187%0.0000%0.1121%0.0140%0.0000%
MONTGOMERY COUNTY619440.0936%0.0145%0.0000%0.0807%0.0129%0.0000%0.0904%0.0145%0.0000%0.0775%0.0129%0.0000%
SHENANDOAH COUNTY323040.0960%0.0155%0.0000%0.0743%0.0124%0.0000%0.0960%0.0155%0.0000%0.0743%0.0124%0.0000%
ROANOKE COUNTY734670.0953%0.0163%0.0027%0.0830%0.0123%0.0027%0.0817%0.0109%0.0000%0.0694%0.0068%0.0000%
SALEM CITY179320.0892%0.0112%0.0000%0.0781%0.0112%0.0000%0.0892%0.0112%0.0000%0.0781%0.0112%0.0000%
NEW KENT COUNTY190220.1051%0.0210%0.0000%0.0894%0.0105%0.0000%0.0946%0.0210%0.0000%0.0789%0.0105%0.0000%
WASHINGTON COUNTY394490.1014%0.0152%0.0000%0.0887%0.0101%0.0000%0.0862%0.0051%0.0000%0.0786%0.0051%0.0000%
MADISON COUNTY104070.0865%0.0192%0.0000%0.0769%0.0096%0.0000%0.0865%0.0192%0.0000%0.0769%0.0096%0.0000%
NORFOLK CITY1412360.0984%0.0092%0.0000%0.0864%0.0085%0.0000%0.0899%0.0064%0.0000%0.0793%0.0057%0.0000%
PULASKI COUNTY238250.0881%0.0126%0.0000%0.0756%0.0084%0.0000%0.0881%0.0126%0.0000%0.0756%0.0084%0.0000%
CLARKE COUNTY122690.1060%0.0163%0.0000%0.0978%0.0082%0.0000%0.1060%0.0163%0.0000%0.0978%0.0082%0.0000%
GREENE COUNTY149260.1072%0.0067%0.0000%0.1072%0.0067%0.0000%0.1072%0.0067%0.0000%0.1072%0.0067%0.0000%
GLOUCESTER COUNTY302840.0859%0.0066%0.0000%0.0859%0.0066%0.0000%0.0859%0.0066%0.0000%0.0859%0.0066%0.0000%
WARREN COUNTY305170.0885%0.0066%0.0000%0.0852%0.0066%0.0000%0.0819%0.0066%0.0000%0.0786%0.0066%0.0000%
ISLE OF WIGHT COUNTY311790.0898%0.0064%0.0000%0.0834%0.0064%0.0000%0.0898%0.0064%0.0000%0.0834%0.0064%0.0000%
ROCKBRIDGE COUNTY162660.1230%0.0123%0.0000%0.1045%0.0061%0.0000%0.1230%0.0123%0.0000%0.1045%0.0061%0.0000%
CULPEPER COUNTY371170.0943%0.0108%0.0000%0.0808%0.0054%0.0000%0.0889%0.0108%0.0000%0.0754%0.0054%0.0000%
FAUQUIER COUNTY563960.0887%0.0071%0.0000%0.0762%0.0053%0.0000%0.0887%0.0071%0.0000%0.0762%0.0053%0.0000%
FREDERICKSBURG CITY194550.0874%0.0051%0.0000%0.0720%0.0051%0.0000%0.0874%0.0051%0.0000%0.0720%0.0051%0.0000%
FRANKLIN COUNTY398660.0602%0.0050%0.0050%0.0502%0.0050%0.0050%0.0552%0.0000%0.0000%0.0452%0.0000%0.0000%
MANASSAS CITY238150.1008%0.0042%0.0000%0.0966%0.0042%0.0000%0.0840%0.0042%0.0000%0.0798%0.0042%0.0000%
YORK COUNTY508380.0925%0.0157%0.0000%0.0669%0.0039%0.0000%0.0885%0.0157%0.0000%0.0629%0.0039%0.0000%
BATH COUNTY33580.0893%0.0000%0.0000%0.0893%0.0000%0.0000%0.0893%0.0000%0.0000%0.0893%0.0000%0.0000%
BRISTOL CITY123450.0729%0.0000%0.0000%0.0567%0.0000%0.0000%0.0567%0.0000%0.0000%0.0567%0.0000%0.0000%
BUCKINGHAM COUNTY110630.1356%0.0271%0.0000%0.0904%0.0000%0.0000%0.1356%0.0271%0.0000%0.0904%0.0000%0.0000%
BUENA VISTA CITY44320.0903%0.0000%0.0000%0.0903%0.0000%0.0000%0.0903%0.0000%0.0000%0.0903%0.0000%0.0000%
CAROLINE COUNTY228940.1005%0.0087%0.0000%0.0830%0.0000%0.0000%0.1005%0.0087%0.0000%0.0830%0.0000%0.0000%
CHARLES CITY COUNTY57200.0524%0.0000%0.0000%0.0350%0.0000%0.0000%0.0524%0.0000%0.0000%0.0350%0.0000%0.0000%
COVINGTON CITY38880.1029%0.0000%0.0000%0.0772%0.0000%0.0000%0.1029%0.0000%0.0000%0.0772%0.0000%0.0000%
DICKENSON COUNTY101440.1084%0.0000%0.0000%0.0887%0.0000%0.0000%0.1084%0.0000%0.0000%0.0887%0.0000%0.0000%
ESSEX COUNTY83180.1443%0.0000%0.0000%0.1443%0.0000%0.0000%0.1443%0.0000%0.0000%0.1443%0.0000%0.0000%
FLOYD COUNTY118520.0759%0.0000%0.0000%0.0759%0.0000%0.0000%0.0759%0.0000%0.0000%0.0759%0.0000%0.0000%
GILES COUNTY120930.0413%0.0000%0.0000%0.0331%0.0000%0.0000%0.0413%0.0000%0.0000%0.0331%0.0000%0.0000%
GREENSVILLE COUNTY64350.1709%0.0155%0.0000%0.1399%0.0000%0.0000%0.1709%0.0155%0.0000%0.1399%0.0000%0.0000%
KING AND QUEEN COUNTY54030.0740%0.0000%0.0000%0.0740%0.0000%0.0000%0.0740%0.0000%0.0000%0.0740%0.0000%0.0000%
KING GEORGE COUNTY197800.1314%0.0000%0.0000%0.0910%0.0000%0.0000%0.1314%0.0000%0.0000%0.0910%0.0000%0.0000%
MARTINSVILLE CITY90700.0992%0.0000%0.0000%0.0882%0.0000%0.0000%0.0992%0.0000%0.0000%0.0882%0.0000%0.0000%
MIDDLESEX COUNTY87460.1029%0.0114%0.0000%0.0800%0.0000%0.0000%0.1029%0.0114%0.0000%0.0800%0.0000%0.0000%
POQUOSON CITY96350.0934%0.0000%0.0000%0.0934%0.0000%0.0000%0.0934%0.0000%0.0000%0.0934%0.0000%0.0000%
ROANOKE CITY660830.0817%0.0015%0.0000%0.0666%0.0000%0.0000%0.0817%0.0015%0.0000%0.0666%0.0000%0.0000%
RUSSELL COUNTY192400.1091%0.0000%0.0000%0.1040%0.0000%0.0000%0.1091%0.0000%0.0000%0.1040%0.0000%0.0000%

Potential duplicate registrants in VA voter list

I previously documented the use of the Hamming string distance measure to identify candidate pairs of duplicate registrants in voter lists. While that was a good first attempt at quantifying the number of potential duplicates in the voter rolls, the Hamming distance metric is less than ideal, for reasons discussed below and in the previous article. I have since updated the processing functions to use the more complete Levenshtein distance (LD) metric and made some improvements to the parsers and other code utilities, but otherwise the analysis followed the same procedure, which is described below.


Using the 2022-11-23 Registered Voter List (RVL) and the 2023-01-26 Voter History List (VHL) purchased from the VA Department of Elections (ELECT), I wrote an analysis script to check for potentially duplicated registrant records in the RVL and to cross-reference the duplicate pairings with the VHL to identify potential duplicate votes. The details are summarized below.

Please note that I will not publish voter Personally Identifiable Information (PII) on this blog. I have substituted fictitious PII for all examples given below, and cryptographically hashed all voter information in the downloadable results file. I will make the detailed information available, upon request, to those who have the authorization to receive and process voter data (contact us).

Summary of Results:

As a baseline, there were 6,464 records for STATUS=’Active’ registrants that met the definition of a “duplicate” used when the Social Security Number (SSN) is not available, as defined by the MOU between DMV and ELECT (section 7.3): the same First Name + Last Name + Full Date of Birth (DOB). I’ve included a copy of the MOU between the VA DMV and ELECT at the end of this article for reference. It should be noted that most records held by DMV and ELECT have an SSN associated with them (or at least they should). SSN information is not distributed as part of the data purchased from ELECT, however, so this is the appropriate standard baseline for this work.
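The baseline rule above amounts to grouping records on a (first name, last name, DOB) key and keeping the groups that contain more than one voter ID. The following is a minimal sketch of that idea, not the script used for this analysis; the field names (`voter_id`, `first`, `last`, `dob`) and the upper-casing of names are assumptions made for illustration, and the records are fictitious.

```python
from collections import defaultdict

def baseline_duplicate_groups(records):
    """Group records that share first name, last name, and full DOB.

    `records` is an iterable of dicts with hypothetical keys:
    voter_id, first, last, dob. Returns only the groups that contain
    more than one distinct voter ID.
    """
    groups = defaultdict(list)
    for rec in records:
        # Normalizing case and whitespace is an assumption, not part of the MOU text.
        key = (rec["first"].strip().upper(), rec["last"].strip().upper(), rec["dob"])
        groups[key].append(rec["voter_id"])
    return {key: ids for key, ids in groups.items() if len(set(ids)) > 1}

# Fictitious records for illustration only.
records = [
    {"voter_id": "A1", "first": "Amy", "last": "McVoter", "dob": "12/05/1970"},
    {"voter_id": "B2", "first": "AMY", "last": "MCVOTER", "dob": "12/05/1970"},
    {"voter_id": "C3", "first": "Taylor", "last": "Voter", "dob": "02/16/2000"},
]
dupes = baseline_duplicate_groups(records)
print(dupes)
```

Because this is an exact-key lookup, it runs in a single pass over the list, unlike the pairwise string distance comparisons described in the Method section.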

Upgrading our definition of a potential duplicate to [First + Middle + Last + Suffix + DOB] and using a LevenshteinDistance=0 drops the number of potential duplicates to 1,982, with each identified registrant in a pair having an exactly matching string result and unique voter ID numbers.

According to my derivations and simulations that are described in detail here, we should only expect to see an average of 11 (+/- 3) potential duplicate pairs (a.k.a. “collisions”) at a distance of 0. The observed count is over two orders of magnitude larger than this expectation. Such a discrepancy deserves further investigation and verification.

Allowing for a single character difference by setting LevenshteinDistance<=1 increases the pool of potential duplicates to 5,568. While this relaxation of the filter does allow us to find certain issues (described below), it also increases our chances of finding false positives. The LD metric results should not be viewed as a final determination, but simply as a useful tool to make an initial pass through the data and find candidate matches that still require further review, verification and validation.

Increasing to LevenshteinDistance<=2 brings the number of potential duplicates up to 32,610 across all records (28,884 when restricted to Active registrants). When we increase to LD <= 3 we get an explosion to 183,130 potential duplicates (164,302 Active only).

Method:

For every entry in the latest RVL, I performed a string distance comparison, based on Levenshtein distance, between every possible pair of strings of (FIRST NAME + MIDDLE NAME + LAST NAME + SUFFIX + FULL DOB). For the ~6M different RVL entries, comparing every entry against every other is on the order of N^2, or ~3.8 x 10^13 string comparisons (roughly half that if each unordered pair is only compared once). Each string comparison can require upwards of 75 x 75 individual character comparisons, meaning the total number of character operations is on the order of 2 x 10^17 (roughly 200 quadrillion), not including logging and I/O.

A distance of 0 indicates that the strings being compared are identical. A distance of 1 indicates that a single character can be changed, inserted or removed to convert one string into the other. A distance of 2 indicates that 2 modifications are required, etc.
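The scale of that workload can be checked with a few lines of arithmetic. The sketch below uses a round 6,000,000 for the record count and 75 characters as the maximum string length; both are approximations taken from the paragraph above, not exact values from the RVL. The slightly larger comparison count quoted above presumably reflects the actual record count being somewhat above 6M.

```python
# Back-of-the-envelope cost estimate for an all-pairs string comparison.
n_records = 6_000_000   # "~6M" RVL entries (rounded; an approximation)
max_len = 75            # assumed upper bound on the length of each string

ordered_pairs = n_records ** 2                        # every entry vs. every entry
unordered_pairs = n_records * (n_records - 1) // 2    # each pair counted once
char_ops = ordered_pairs * max_len * max_len          # worst-case character comparisons

print(f"ordered pairs:   {ordered_pairs:.2e}")
print(f"unordered pairs: {unordered_pairs:.2e}")
print(f"character ops:   {char_ops:.3e}")
```

With these rounded inputs the worst case comes to about 2.0 x 10^17 character operations, which is why any practical run needs to be batched, parallelized, or otherwise pruned.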

Example: The string pair of “ALISHA” –> “ALISHIA” has an LD of 1, corresponding to the addition of an “I” before the final “A”.
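For readers who want to experiment, here is a minimal sketch of the standard dynamic-programming (Wagner-Fischer) calculation of Levenshtein distance. It is a textbook version written for illustration, not the optimized code used to produce the results in this article.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty prefix of b
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]

print(levenshtein("ALISHA", "ALISHIA"))  # 1
```

Only two rows of the distance table are kept in memory at a time, so memory use grows with the length of the shorter string rather than with the product of both lengths.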

I aggregated all of the Levenshtein distance pairings that were less than or equal to 3 characters different in order to identify potential (key word) duplicated registrants. Additionally, for each pairing I looked at the voter history information for each registrant in the pair to determine if there was a potential (again … key word) for multiple ballots to be cast by the same person in any given election. As we allow for more characters to be different, we are likely to include many more false positive matches, even if we are catching more true positives.

For example: At a distance of 4 the strings of “Dave Joseph Smith M 10/01/1981” and “Tony Joseph Smith M 10/01/1981” at the same address would produce a potential match, but so would “Davey Joseph Smith M 10/01/1981” and “David Josiph Smith M 10/02/1981”. The first pair is more likely to be a false positive due to twins, while the second is more likely to be due to typos, mistakes, or use of nicknames and might warrant further investigation. A much stronger potential match would be something like “David Josiph Smith M 10/01/1981” and “David Joseph Smith M 10/01/1981”, with a distance of 1 at the same address. In an attempt to limit false positives, I have clamped the distance checks to <= 3 in this analysis.

Importantly, the Levenshtein distance measure is able to identify potential insertions or deletions as well as character changes, which is an improvement over the Hamming distance measure. This is illustrated by the following pairing: “David Joseph Smith M 10/01/1981” and “Dave Joseph Smith M 10/01/1981”. The change from “id” to “e” in the first name removes a character, which shifts the position of every character in the remainder of the string. The Levenshtein metric correctly returns a small distance of 2, whereas the Hamming distance returns 27.
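The cross-reference against voter history can be thought of as a set intersection: for each candidate pair, find the elections in which both voter IDs have a recorded vote. The sketch below assumes the VHL has already been loaded into a dictionary mapping a voter ID to the set of elections voted in; that layout, the ID values, and the election labels are hypothetical and for illustration only.

```python
def shared_elections(pair, history):
    """Return the elections in which BOTH voter IDs of a candidate pair
    have a recorded vote. `history` maps voter ID -> set of election labels."""
    id_a, id_b = pair
    return history.get(id_a, set()) & history.get(id_b, set())

# Fictitious vote history and candidate pairs.
history = {
    "A1": {"2020 General", "2022 General"},
    "B2": {"2020 General"},
    "C3": {"2021 General"},
}
candidate_pairs = [("A1", "B2"), ("A1", "C3")]

# Keep only the pairs with at least one election in common.
flagged = {}
for pair in candidate_pairs:
    overlap = shared_elections(pair, history)
    if overlap:
        flagged[pair] = overlap

print(flagged)
```

A non-empty overlap only marks a pair for review; it does not by itself establish that one person cast two ballots.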
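The Hamming figure for that pairing can be reproduced with a simple position-by-position comparison. Strictly speaking, Hamming distance is only defined for strings of equal length; the sketch below assumes the comparison is made over the overlapping length of the two strings, which is one plausible convention and not necessarily the one used in the earlier article.

```python
def hamming_overlap(a: str, b: str) -> int:
    """Count the positions at which two strings differ, comparing only
    up to the length of the shorter string (zip stops at the shorter one)."""
    return sum(char_a != char_b for char_a, char_b in zip(a, b))

record_a = "David Joseph Smith M 10/01/1981"
record_b = "Dave Joseph Smith M 10/01/1981"

print(hamming_overlap(record_a, record_b))  # 27
```

After the first three matching characters, every remaining position is misaligned by one, so nearly the whole string counts as different even though only the first name changed. Counting the one unmatched trailing character as a difference as well would give 28.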

Note that with the official records obtained from ELECT, and in accordance with the laws of VA, I do not have access to the social security number or driver’s license number for each registration record, which would help in distinguishing true duplicate errors from things like twins. I only have the first name, middle name, last name, suffix, month of birth, day of birth, year of birth, gender, and address information to work with. I can therefore only take things so far before someone else (with investigative authority and the ability to access those other fields) would need to step in to confirm and validate these findings.

Results:

The summary totals are as follows, with detailed examples.

| | DMV_ELECT MOU Standard | LD <= 0 | LD <= 1 | LD <= 2 | LD <= 3 |
| --- | --- | --- | --- | --- | --- |
| Number of Potential Duplicate Registrant Pairs | 7,586 (0.12%) | 2,472 (0.04%) | 6,620 (0.11%) | 32,610 (0.53%) | 183,130 (2.99%) |
| Number of Potential Duplicate Registrant Pairs (Active Only) | 6,464 (0.11%) | 1,982 (0.03%) | 5,568 (0.10%) | 28,884 (0.50%) | 164,302 (2.85%) |
| Number of Potential Duplicate Ballots | 6,362 | 112 | 3,576 | 37,028 | 236,254 |
| Number of Potential Duplicate Ballots (Active Only) | 6,228 | 110 | 3,542 | 36,434 | 232,394 |

Examples of Types of Issues Observed:

NOTE THE BELOW INFORMATION HAS HAD THE VOTER PERSONALLY IDENTIFIABLE INFORMATION (“PII”) FICTIONALIZED. WHILE THESE ARE BASED ON REAL DATA TO ILLUSTRATE THE DIFFERENT TYPES OF OBSERVATIONS, THEY DO NOT REPRESENT REAL VOTER INFORMATION.

Example #1: The following set of records has an exact match (distance = 0) of full name and full birthdate (including year), but different addresses and different voter ID numbers, AND there was a vote cast from each of those unique voter IDs in the 2020 General Election.  While it’s remotely possible that two individuals share the exact same name, month, day and year of birth … it is probabilistically unlikely (see here), and should warrant further scrutiny.

Voter Record A:

AMY BETH McVOTER 12/05/1970 F 12345 CITIZEN CT

Voter Record B:

AMY BETH McVOTER 12/05/1970 F 5678 McPUBLIC DR

Example #2: This set of records has a single character different (distance of 1) in the first name, but the middle name, last name, birthdate and address are identical AND both records are associated with votes that were cast in the 2020, 2021, and 2022 November General Elections.  While it is possible that this is a pair of 23-year-old twins (with the same middle name) who live together, it at least bears looking into.

Voter Record A:

TAYLOR DAVID VOTER 02/16/2000 M 6543 OVERLOOK AVE NW

Voter Record B:

DAYLOR DAVID VOTER 02/16/2000 M 6543 OVERLOOK AVE NW

Example #3: This set of records has two characters different (distance of 2) in their birthdate, but name and address are identical AND the birth years are too close together for a child/parent relationship, AND both records are associated with votes that were cast in the 2020 and 2022 November General Elections. 

Voter Record A:

REGINA DESEREE MACGUFFIN 02/05/1973 F 123 POPE AVE

Voter Record B:

REGINA DESEREE MACGUFFIN 03/07/1973 F 123 POPE AVE

Example #4: This set of records again has a single character different (distance of 1) in the first name (but not the first letter this time), while the last name, birthdate and address are identical.  There were also multiple votes cast in the 2019 and 2022 November General Elections from these registrants.

Voter Record A:

EDGARD JOHNSON 10/19/1981 M 5498 PAGELAND BLVD

Voter Record B:

EDUARD JOHNSON 10/19/1981 M 5498 PAGELAND BLVD

Example #5: This set of records has two characters different (distance of 2), one in the first name and one in the middle name, while the last name, birthdate, gender and address are identical.  There were also multiple votes cast in the 2021 and 2022 November General Elections from these registrants. Again, it is possible that these records represent a set of twins given the information that ELECT provides.

Voter Record A:

ALANA JAVETTE THOMPSON 01/01/2003 F 123 CHARITY LN

Voter Record B:

ALAYA YAVETTE THOMPSON 01/01/2003 F 123 CHARITY LN

Example #6: The following set of records has an exact match (distance = 0) of full name and full birthdate (including year), and the same address, but different voter ID numbers.  There were no duplicate votes in the same election detected between the two ID numbers.

Voter Record A:

JAMES TIBERIUS KIRK 03/22/2223 M 1701 Enterprise Bridge

Voter Record B:

JAMES TIBERIUS KIRK 03/22/2223 M 1701 Enterprise Bridge

Example #7: The following set of records has an exact match (distance = 0) of full name and full birthdate (including year), and the same address, but different gender and voter ID numbers.  There were no duplicate votes in the same election detected between the two ID numbers.

Voter Record A:

MAXWELL QUAID CLINGER 11/03/2004 M 4077 MASH DR

Voter Record B:

MAXWELL QUAID CLINGER 11/03/2004 U 4077 MASH DR

Example #8: The following set of records has a single punctuation character different, with the same address but different voter ID numbers.  There were no duplicate votes in the same election detected between the two ID numbers.

Voter Record A:

JOHN JACOB JINGLHIEMER-SCHMIDT 06/29/1997 M 12345 JACOBS RD

Voter Record B:

JOHN JACOB JINGLHIEMER SCHMIDT 06/29/1997 M 12345 JACOBS RD

Results Dataset:

A full version of the aggregated data is provided below as a CSV file. However, all voter information (ID, first name, middle name, last name, DOB, gender, address) has been removed and replaced by a one-way hash value, with randomized salt, based on the voter ID. The full file with specific voter information can be provided upon request to parties authorized by ELECT to receive and process voter information, Election Officials, or Law Enforcement.

20221123-VA-RVL-String-Distance.csv
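As an illustration of the kind of one-way salted hashing described above, here is a minimal sketch using Python’s standard library. The choice of SHA-256, the 16-byte salt, and the function names are assumptions made for this example; they are not necessarily what was used to produce the published file.

```python
import hashlib
import secrets

def make_hasher(salt=None):
    """Return a function that maps a voter ID to a salted SHA-256 hex digest.

    A fresh random salt is generated when none is supplied. The salt must be
    kept private; without it the hashed values cannot be recomputed from IDs.
    """
    if salt is None:
        salt = secrets.token_bytes(16)  # random salt (assumed size)

    def hash_id(voter_id: str) -> str:
        return hashlib.sha256(salt + voter_id.encode("utf-8")).hexdigest()

    return hash_id

hash_id = make_hasher()
token_a = hash_id("123456789")  # fictitious voter ID
token_b = hash_id("123456789")
print(token_a == token_b)  # the same ID always maps to the same token within one run
```

Using one salt for the whole file keeps the pairings intact (the same voter ID always produces the same token) while preventing anyone without the salt from rebuilding the mapping by hashing known voter IDs.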

The MOU between the VA Department of Elections (ELECT) and the VA Department of Motor Vehicles (DMV) is also provided below for reference. Section 7.3 defines the minimal standards for determining a match when no social security number is present.