Categories
Election Data Analysis Election Integrity technical

Ballot Completion Rate for VA Absentee Voters

Below is an analysis of the VA statewide voter completion rate for absentee ballots compiled from the 2022 General Election Daily Absentee List (DAL) file downloaded from the VA Dept of Elections (“ELECT”) on 2022-11-15 17:46:21.

The DAL file records the transactions of all absentee ballots during the early voting period in VA elections. It includes records for both mail-in and in-person early voting transactions. It does not record the actual values of the voted ballots, but only the “fact-of” a registered voter checking in to an early voting site, or mailing their ballot application or completed ballot to the registrar, etc. The DAL record is published daily over the course of the early voting period and the file is cumulative.

For the purposes of this analysis a “Completed” ballot is a ballot that has been recorded in the DAL file as reaching a state in which the ballot can be considered to be tabulate-able. A “Completed” ballot must have its “APP_STATUS” field set to “Approved” AND have the “BALLOT_STATUS” field set to (“FWAB” OR “Marked” OR “On Machine” OR “Pre-Processed”).

The “VOTER_TYPE” field was used to separate records into “Military”, “Overseas” or “Temporary(Federal-Only Ballot)” and the ballot completion rate was computed for each sub-category, as well as overall.
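
A minimal sketch of how the “Completed” definition and the per-category breakdown described above could be computed. The field names APP_STATUS, BALLOT_STATUS and VOTER_TYPE come from the DAL description above; the `VOTER_ID` key, the list-of-dicts layout, and the rule that a voter counts as completed if any one of their transactions is completed are assumptions made for illustration, and are not necessarily how the scripts behind the tables work.

```python
from collections import defaultdict

# BALLOT_STATUS values that count as tabulate-able, per the definition above.
COMPLETED_BALLOT_STATUSES = {"FWAB", "Marked", "On Machine", "Pre-Processed"}


def is_completed(record):
    """True when a DAL transaction meets the 'Completed' definition."""
    return (record["APP_STATUS"] == "Approved"
            and record["BALLOT_STATUS"] in COMPLETED_BALLOT_STATUSES)


def summarize(records, voter_type=None):
    """Per-voter summary, optionally restricted to one VOTER_TYPE.

    VOTER_ID is a hypothetical column name for the unique voter key.
    Assumption: a voter is 'completed' if any of their transactions is.
    """
    transactions = defaultdict(int)
    completed = defaultdict(bool)
    for rec in records:
        if voter_type is not None and rec["VOTER_TYPE"] != voter_type:
            continue
        vid = rec["VOTER_ID"]
        transactions[vid] += 1
        completed[vid] = completed[vid] or is_completed(rec)
    n = len(transactions)
    if n == 0:
        return {"num_voters": 0, "avg_transactions": 0.0, "completion_rate": 0.0}
    return {
        "num_voters": n,
        "avg_transactions": sum(transactions.values()) / n,
        "completion_rate": sum(completed.values()) / n,
    }
```

Calling `summarize(records)` would give the overall figures, and `summarize(records, "Military")` the figures for one sub-category.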

All Absentee Voters
  Avg Transactions Per Voter: 1.03
  Avg Completion Rate Per Voter: 91.95%
  Num Of Unique Voters: 1,057,268

Military Voters (VOTER_TYPE == “Military”)
  Avg Transactions Per Voter: 1.08
  Avg Completion Rate Per Voter: 78.60%
  Num Of Unique Voters: 9,346

Overseas Voters (VOTER_TYPE == “Overseas”)
  Avg Transactions Per Voter: 1.17
  Avg Completion Rate Per Voter: 63.63%
  Num Of Unique Voters: 7,052

Temporary Federal Voters (VOTER_TYPE == “Temporary(Federal-Only Ballot)”)
  Avg Transactions Per Voter: 1.21
  Avg Completion Rate Per Voter: 61.14%
  Num Of Unique Voters: 1,539

Discussion:

The data above shows that there is a distinct statistical discrepancy in the ability of Military, Overseas, and Temporary Federal voters to complete their absentee ballots in comparison to absentee voters overall: completion rates of 78.60%, 63.63% and 61.14% respectively, versus 91.95% for all absentee voters. These categories of voters are specifically reliant on the Mail-In absentee ballot process, and are demonstrably not having the same ability to have their votes cast and counted as is provided to standard absentee voters.
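
As a rough worked check on the size of the gaps, the sketch below computes the difference in percentage points between each sub-category and the overall rate, plus a pooled two-proportion z statistic. It makes assumptions that are not in the original analysis: it treats each “Avg Completion Rate Per Voter” as a simple proportion of the unique voters, and it treats the groups as independent even though the “All” group contains the sub-categories. Treat the z values as an order-of-magnitude indication only.

```python
import math

# Figures copied from the tables above: (completion rate, unique voters).
GROUPS = {
    "All": (0.9195, 1_057_268),
    "Military": (0.7860, 9_346),
    "Overseas": (0.6363, 7_052),
    "Temporary(Federal-Only Ballot)": (0.6114, 1_539),
}


def gap_in_points(group, baseline="All"):
    """Completion-rate gap versus the baseline, in percentage points."""
    return (GROUPS[baseline][0] - GROUPS[group][0]) * 100


def two_proportion_z(group, baseline="All"):
    """Pooled two-proportion z statistic (assumes independent groups)."""
    p1, n1 = GROUPS[baseline]
    p2, n2 = GROUPS[group]
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_err


for name in ("Military", "Overseas", "Temporary(Federal-Only Ballot)"):
    print(f"{name}: {gap_in_points(name):.2f} points, "
          f"z = {two_proportion_z(name):.1f}")
```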

This discrepancy might be due to any number of potential reasons or mechanisms, which cannot be determined from the DAL data as provided by ELECT. The discrepancy demonstrably exists, though, and it should be investigated by legislators and officials in order to remedy the comparative disenfranchisement of specific classes of VA voters.

I will note for completeness that the first discovery and observation of this discrepancy was due to the diligent work of a fellow EPEC board member. I independently validated his results and created the scripts to process the data on a statewide basis to produce these tables. As always I am happy to provide the raw data, scripts and results to interested parties that are capable of receiving and handling VA election data according to VA law and the policies of ELECT. Interested parties can contact us to request more information.

Categories
Election Data Analysis Election Forensics Election Integrity mathematics programming technical

Voter ID Number distribution patterns in VA Registered Voter List

One thing that I have been asked about repeatedly is whether there are any patterns in the assignment of voter ID numbers in the VA data. Specifically, I’ve been asked whether I’ve found anything similar to what AuditNY has found in the NY data. It’s not something that I have looked at in depth previously, mostly due to lack of time and because VA is set up very differently than NY, so a direct comparison or attempt to replicate the AuditNY findings in VA isn’t as straightforward as one would hope.

The NY data uses a different Voter ID number for counties vs at the state level, which is the “Rosetta Stone” that was needed for the NY team to understand the algorithms that were used to assign voter ID numbers, and in turn discover some very (ahem) “interesting” patterns in the data. VA doesn’t have such a system and only uses a single voter ID number throughout the state and local jurisdictions.

Well … while my other machine is busy crunching on the string distance computations, I figured I’d take a crack at looking at the distribution of the Voter ID numbers in the VA Registered Voter List (RVL) and just see what I find.

To start with, here is a simple scatter plot of the Voter ID numbers vs the Registration date for each record in the 2023-07-01 RVL. From the zoomed out plot it is readily apparent that there must have been a change in the algorithm that was used to assign voter identification numbers sometime around 2007, which coincides nicely with the introduction of the current Virginia Election and Registration Information System (VERIS).

From a high level, it appears that the previous assignment algorithm broke the universe of possible ID numbers up into discrete ranges and assigned IDs within those ranges, but favoring the bottom of each range. This would be a logical explanation for the banded structure we see pre-2007. The new assignment algorithm post-2007 looks to be using a much more randomized approach. Nothing strange about that. As computing systems have gotten better and security has become more of a concern over the years there have been many systems that migrated to more randomized assignments of identification numbers.

Looking at a zoomed in block of the post-2007 “randomized” ID assignments we can see some of the normal variability that we would expect to see in the election cycles. We see that we have a high density of new assignments around November of 2016 and 2020, with a low density section of assignments correlated to the COVID-19 lockdowns. There are short periods where it looks like there were lulls in the assignment of voter IDs. These are perhaps due to holidays or maintenance periods, or related to the legal requirements to “freeze” the voter rolls 30 days before any election (primaries, runoffs, etc). Note that VA now has same day voter registration, under the laws passed by the previous Democratic super-majority that went into effect in 2022, so going forward we would likely see these “blackout periods” be significantly reduced.

We can see more clearly the banded assignment structure of the pre-2007 entries by zooming in on a smaller section of the plot, as shown below. It’s harder to make out in this banded structure, but we still see similar patterns of density changes presumably due to the natural election cycles, holidays, maintenance periods, legally required registration lockout periods, etc. We can also see the “bucketing” of ID numbers into distinct bands, with the bias of numbers filling the lower section of each band.

All of that looks unremarkable and seems to make sense to me … however … if we zoom into the Voter ID range of around [900,000,000 to 920,000,000] we do see something that catches my curiosity. We see the existence of the same banded structure as above between 900,000,000 and 915,000,000 AND pre-2007, but there is another band of assignments superimposed on the entire date range of the RVL. This band does not seem to be affected by the introduction of the VERIS system (presumably), which is very interesting. There is also what looks to be a vertical high-density band between 2007 and 2010 that extends along the entire vertical axis, but we only see it once we zoom in to the VERIS transition period.

The horizontal band that extends across all date ranges only exists in the [~915,000,000 to ~920,000,000] ID range. It trails off in density pre ~1993, but it exists throughout the full registration date range. I will note that the “Motor Voter” National Voter Registration Act (NVRA) was implemented in 1993, so perhaps these are a reserved universe block for DMV (or other externally provided) registrations? (That’s a guess, but an educated one.)
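
One way to check a band like this without relying on the eye is to count registrations per year inside a chosen ID range. This is only a sketch: the `(voter_id, registration_date)` tuple layout is an assumed input format, and the band limits are passed in by the caller rather than taken from any ELECT documentation.

```python
from collections import Counter
from datetime import date


def band_counts_by_year(records, id_low, id_high):
    """Count registrations per year whose voter ID lies in [id_low, id_high).

    `records` is an iterable of (voter_id, registration_date) tuples, where
    voter_id is an int and registration_date is a datetime.date.
    """
    counts = Counter()
    for voter_id, registration_date in records:
        if id_low <= voter_id < id_high:
            counts[registration_date.year] += 1
    return dict(sorted(counts.items()))


# Tiny made-up sample, only to show the call shape.
sample = [
    (915_000_123, date(1994, 5, 1)),
    (916_400_000, date(2012, 10, 9)),
    (300_000_001, date(2012, 10, 9)),
]
print(band_counts_by_year(sample, 915_000_000, 920_000_000))
```

A band that is independent of the 2007 change would show non-zero counts on both sides of that year, while the neighbouring banded ranges would drop to roughly zero afterwards.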

A plausible explanation I can imagine for the distinct high density band between 2007-2010 is that this might be related to how the VERIS system was implemented and brought into service, and there was some sort of update around 2010 that made corrections to its internal algorithms. (But that is just a guess.) That still wouldn’t entirely explain the huge change in the density of new registrants added to the rolls.

Another, or additional, explanation might be that when VERIS came online there were a number of registrants that had their Voter ID number regenerated and/or their registration date field updated as part of the rollout of the new VERIS software. Meaning that while VERIS was coming online and handling the normal amount of new real registrations, it was also moving/updating a large number of historic registrations, which would account for the higher density as VERIS became the system of record. That seems to be a poor systems administration and design choice, in my opinion, as it would make those moved registrant records inaccurate by giving them a false registration date. However, if that was the case, and VERIS was resetting registration dates as it ingested voter records into its databases, why do we see any records with pre-2007 registration dates at all? (This is, again, merely an educated guess on my part, so take it with a grain of salt.)

Incorporating the identification of cloned registrations

In attempting to incorporate some of my early duplicate record identification results on the most recent RVL data, I did a scatter plot of only those records that had an identified exact match of (Full Name + DOB) to other records in the dataset, but with unique Voter ID numbers. (Technically these are “cloned” records, as true “duplicates” would have the same voter ID numbers. This was pointed out to me a few days ago.) The scatter plot of those records is shown below, and we can see that there is a distinct, roughly horizontal cluster of records that aligns with the 915M – 920M ID band and pre-2007. In the post-2007 block we see the cloned records do not seem to be totally randomly distributed, but have a bias towards the lower right of the graph.
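
The selection of “cloned” records can be sketched as a group-by on the exact (Full Name, DOB) key, keeping only keys that map to more than one distinct voter ID. The dictionary keys `VOTER_ID`, `FULL_NAME` and `DOB` are hypothetical names chosen for illustration, not the actual RVL column names.

```python
from collections import defaultdict


def find_clones(records):
    """Return {(full_name, dob): [voter_ids]} for keys with more than one ID.

    Records sharing a key AND a voter ID would be true duplicates; they
    collapse into a single ID here and so are not reported as clones.
    """
    ids_by_key = defaultdict(set)
    for rec in records:
        ids_by_key[(rec["FULL_NAME"], rec["DOB"])].add(rec["VOTER_ID"])
    return {key: sorted(ids) for key, ids in ids_by_key.items() if len(ids) > 1}
```

The voter IDs returned for each key are the points that would be drawn in red on top of the full scatter plot.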

Superimposing the two plots produces the following, with the red indicating the records with identified Full Name + DOB string matches.

Zooming in to take a closer look at the 915M-920M band again gives the following:

It is curious that there seems to be an alignment of the exact Full Name + DOB matching records with the 915M-920M, pre-2007 ID band. Post-2007 the exact cloned matches have a less structured distribution throughout the data, but they do seem to cluster around the lower right.

If the cloned records were simply due to random data entry errors, etc. I would expect to see sporadic red datapoints distributed “salt-n-pepper” style throughout the entirety of the area covered by the blue data. There might be some argument to be made for there being a bias of more of the red data points to the right side of the plot, as officials have not yet had time to “catch” or “clean-up” erroneous entries, but there is little reason to have linear features, or to have a bias for lower ID numbers in the vertical axis.

I am continuing to investigate this data, but as of right now all I can tell you is that … yes, there do seem to be interesting patterns in the way Voter IDs are assigned in VA, especially with records that have already been found and flagged to be problematic (clones).

Categories
Election Data Analysis Election Forensics Election Integrity programming technical

Preliminary Results of 2023-07-01 VA RVL duplicate detections

Below are the preliminary results from performing exact (string distance of 0) duplicate record checks on the 2023-07-01 VA Registered Voter List information. Note that these are the numbers of ordered matches discovered, not the number of individual unique registrants. Each count represents two different registration records, with unique voter IDs, that match the given criteria. Match pairs are directional, in that a pair (A,B) is counted separately from the pair (B, A). Matches are grouped into this table according to the LOCALITY_NAME of the first element of the identified pair. Some pairs can have different locality information, except for in the strictest case (3rd column), so a match might be counted in one locality while its mirror is counted in the other.

The first data column of the table below is equivalent to the criteria used by the MOU between the VA Department of Elections (ELECT) and the Department of Motor Vehicles (DMV), as discussed and documented in a previous post. There were 5,290 matches for these criteria across the state.

The second data column below is based on a match of the registrant’s Full Name and Day+Month+Year of birth information, but NOT the registrant’s listed address information. There were 1,200 matches for these criteria across the state.

The third data column uses the strictest match criteria and includes the Gender and Address of the registrant. There were 208 matches for these criteria across the state.
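
The counting scheme described above (ordered pairs, each attributed to the locality of the first element) could be sketched as follows. LOCALITY_NAME is the field named in the text; `VOTER_ID` and the per-field keys in `CRITERIA` are hypothetical names standing in for the actual RVL columns, and exact tuple equality stands in for a string distance of 0.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical field names for the three match criteria (loosest to strictest).
CRITERIA = {
    "first_last_dob": ("FIRST", "LAST", "DOB"),
    "full_name_dob": ("FIRST", "MIDDLE", "LAST", "SUFFIX", "DOB"),
    "full_name_dob_gender_address": (
        "FIRST", "MIDDLE", "LAST", "SUFFIX", "DOB", "GENDER", "ADDRESS"),
}


def ordered_matches_by_locality(records, fields):
    """Count ordered pairs (A, B) that agree exactly on every field in `fields`.

    (A, B) and (B, A) are counted separately, and each ordered pair is
    credited to the LOCALITY_NAME of its first element.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[f] for f in fields)].append(rec)
    counts = Counter()
    for members in groups.values():
        for a, b in permutations(members, 2):
            if a["VOTER_ID"] != b["VOTER_ID"]:
                counts[a["LOCALITY_NAME"]] += 1
    return counts
```

Because every unordered pair is counted twice, a pair that spans two localities contributes one count to each of them, while a pair inside one locality contributes two counts to that locality.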

I am only considering Active registrations in the table below. I previously computed similar statistics using the previously purchased 2022-11-23 RVL. I have not yet done a comparison of the two datasets, but will do so once I complete the string distance processing on this latest set.

Row Labels | Sum of Exact Same First+Last+DOB | Sum of Exact Same First+Middle+Last+Sfx+DOB | Sum of Exact Same First+Middle+Last+Sfx+Gender+Address+DOB
ACCOMACK COUNTY2470
ALBEMARLE COUNTY1165722
ALEXANDRIA CITY60130
ALLEGHANY COUNTY1240
AMELIA COUNTY500
AMHERST COUNTY3282
APPOMATTOX COUNTY1500
ARLINGTON COUNTY152620
AUGUSTA COUNTY6962
BATH COUNTY200
BEDFORD COUNTY67130
BLAND COUNTY500
BOTETOURT COUNTY2420
BRISTOL CITY800
BRUNSWICK COUNTY1710
BUCHANAN COUNTY1150
BUCKINGHAM COUNTY700
BUENA VISTA CITY300
CAMPBELL COUNTY39134
CAROLINE COUNTY1800
CARROLL COUNTY1400
CHARLES CITY COUNTY200
CHARLOTTE COUNTY1110
CHARLOTTESVILLE CITY2580
CHESAPEAKE CITY136138
CHESTERFIELD COUNTY38012734
CLARKE COUNTY1100
COLONIAL HEIGHTS CITY1240
COVINGTON CITY500
CRAIG COUNTY300
CULPEPER COUNTY2620
CUMBERLAND COUNTY1352
DANVILLE CITY2140
DICKENSON COUNTY600
DINWIDDIE COUNTY2120
EMPORIA CITY930
ESSEX COUNTY910
FAIRFAX CITY1220
FAIRFAX COUNTY55018814
FALLS CHURCH CITY1280
FAUQUIER COUNTY4120
FLOYD COUNTY900
FLUVANNA COUNTY2330
FRANKLIN CITY842
FRANKLIN COUNTY2022
FREDERICK COUNTY5494
FREDERICKSBURG CITY1420
GALAX CITY300
GILES COUNTY300
GLOUCESTER COUNTY2420
GOOCHLAND COUNTY2320
GRAYSON COUNTY1310
GREENE COUNTY1610
GREENSVILLE COUNTY700
HALIFAX COUNTY2364
HAMPTON CITY131378
HANOVER COUNTY73100
HARRISONBURG CITY1882
HENRICO COUNTY2457534
HENRY COUNTY2944
HIGHLAND COUNTY200
HOPEWELL CITY3374
ISLE OF WIGHT COUNTY2710
JAMES CITY COUNTY3110
KING AND QUEEN COUNTY400
KING GEORGE COUNTY1700
KING WILLIAM COUNTY1740
LANCASTER COUNTY830
LEE COUNTY38142
LEXINGTON CITY830
LOUDOUN COUNTY158316
LOUISA COUNTY2610
LUNENBURG COUNTY1350
LYNCHBURG CITY75294
MADISON COUNTY1020
MANASSAS CITY2332
MANASSAS PARK CITY410
MARTINSVILLE CITY900
MATHEWS COUNTY522
MECKLENBURG COUNTY2530
MIDDLESEX COUNTY600
MONTGOMERY COUNTY4750
NELSON COUNTY1554
NEW KENT COUNTY1810
NEWPORT NEWS CITY88120
NORFOLK CITY111150
NORTHAMPTON COUNTY832
NORTHUMBERLAND COUNTY810
NOTTOWAY COUNTY1084
ORANGE COUNTY2940
PAGE COUNTY2330
PATRICK COUNTY1010
PETERSBURG CITY3160
PITTSYLVANIA COUNTY6292
POQUOSON CITY800
PORTSMOUTH CITY80100
POWHATAN COUNTY2660
PRINCE EDWARD COUNTY2050
PRINCE GEORGE COUNTY2452
PRINCE WILLIAM COUNTY178410
PULASKI COUNTY1620
RADFORD CITY810
RAPPAHANNOCK COUNTY610
RICHMOND CITY159482
RICHMOND COUNTY563810
ROANOKE CITY4000
ROANOKE COUNTY5770
ROCKBRIDGE COUNTY1720
ROCKINGHAM COUNTY3470
RUSSELL COUNTY1900
SALEM CITY1110
SCOTT COUNTY1740
SHENANDOAH COUNTY2240
SMYTH COUNTY1700
SOUTHAMPTON COUNTY1860
SPOTSYLVANIA COUNTY86224
STAFFORD COUNTY92212
STAUNTON CITY1500
SUFFOLK CITY5152
SURRY COUNTY820
SUSSEX COUNTY1122
TAZEWELL COUNTY2100
VIRGINIA BEACH CITY244252
WARREN COUNTY2510
WASHINGTON COUNTY3552
WAYNESBORO CITY1300
WESTMORELAND COUNTY1920
WILLIAMSBURG CITY1580
WINCHESTER CITY1120
WISE COUNTY2450
WYTHE COUNTY2120
YORK COUNTY2600
Grand Total | 5290 | 1200 | 208
Categories
Election Data Analysis Election Integrity technical

VA Voter List data standardization and normalization

EPEC has purchased and downloaded the full statewide VA Registered Voter List (RVL), the full Voter History List (VHL) and the Monthly Update Subscription (MUS) to the voter list as of 2023-06-30. These files are provided by ELECT as comma-separated-value files, but contain numerous idiosyncrasies, formatting issues and errors.

We combined the MUS information with our baseline list to create a new Statewide voter list record incorporating all of the relevant changes. As we had downloaded our baseline list only the day before we received the MUS, there were a number of entries in the MUS that had already been incorporated into our baseline dataset; however, there were a few significant deletions / adds / modifications.

The updated RVL and VHL are currently being processed using the following methods, among others:

  • The Statewide RVL and VHL are being split into smaller data files organized by LOCALITY_NAME, LOCALITY_PRECINCT_NAME, CONG_CODE_VALUE, STHOUSE_CODE_VALUE, STSENATE_CODE_VALUE, and CITY
  • The data has been standardized and normalized to remove whitespace errors, all fields have been converted to upper case, observed field name issues have been corrected, and missing fields in the VHL have been added.
    • The VHL does not contain “LOCALITY_NAME” or “PRECINCT_NAME” fields, but does reference each by code value. The missing fields have been added into the VHL after correlating with the RVL data in order to ensure commonality between the datasets, and to allow for splitting into the folder structure defined above.
    • The formatting for precinct names in the RVL is inconsistent in its use of spaces and dashes between the precinct code and name. This has been standardized to be the ” – ” separator.
    • The inconsistent use of the ampersand symbol (“&”) in county names, such as “KING & QUEEN COUNTY”, has been standardized to always use the word “AND” instead.
    • etc. We will continue to update these standardizations and error checks as we discover new issues.
  • The primary and mailing addresses from the RVL have been fed as input to an NCOA processing system (truencoa.com) and the resultant reports have been collated for each grouping as listed above.
  • The RVL fields have also been collated against version 13 of the US Dept of Transportation’s National Address Database and the RVL entries have been augmented with the information regarding whether a match was found or not, as well as the type of match. Our best attempt has been made to match addresses to the RVL entries, but there are still inconsistencies and mis-spellings in both the NAD and RVL data that we are continuing to work to identify and improve.
    • Prior to matching to the NAD listings, the RVL primary and mailing addresses are normalized and standardized according to the US Postal Service’s published list of common street suffix abbreviations.
    • Initial matches are attempted based on a Strict match to either the Primary or Mailing address.
    • Subsequent matches use iterative relaxing of various criteria, such as ignoring the street suffix, or flipping the position of the street direction indicator. We have denoted the USDOT_MATCH_TYPE in the augmented RVL dataset to allow filtering on these different matching criteria.
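
The normalization and tiered address matching steps listed above might look something like the sketch below. Everything in it is illustrative: the suffix table is a tiny subset rather than the full published list, the precinct code is assumed to be a leading run of digits, a plain hyphen is used for the separator, and the match-type labels are hypothetical stand-ins for the values stored in USDOT_MATCH_TYPE.

```python
import re

# Small illustrative subset of street-suffix abbreviations (not the full list).
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "DRIVE": "DR"}
SUFFIX_ABBREVIATIONS = set(SUFFIXES.values())
DIRECTIONS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}


def normalize_field(value):
    """Trim, collapse whitespace, upper-case, and spell out '&' as 'AND'."""
    value = re.sub(r"\s+", " ", value.strip()).upper()
    return re.sub(r"\s*&\s*", " AND ", value).strip()


def normalize_precinct(value):
    """Force a ' - ' separator between a leading numeric code and the name."""
    value = normalize_field(value)
    match = re.match(r"^(\d+)[\s\-–]+(.*)$", value)
    return f"{match.group(1)} - {match.group(2)}" if match else value


def normalize_address(value):
    """Tokenize an address and abbreviate any known street suffixes."""
    return tuple(SUFFIXES.get(tok, tok) for tok in normalize_field(value).split(" "))


def without_suffix(tokens):
    return tuple(tok for tok in tokens if tok not in SUFFIX_ABBREVIATIONS)


def flip_direction(tokens):
    """Move a direction token between 'after the house number' and the end."""
    if len(tokens) > 2 and tokens[1] in DIRECTIONS:
        return (tokens[0],) + tokens[2:] + (tokens[1],)
    if len(tokens) > 2 and tokens[-1] in DIRECTIONS:
        return (tokens[0], tokens[-1]) + tokens[1:-1]
    return tokens


def match_type(address, reference_addresses):
    """Try a strict match first, then progressively relaxed ones."""
    tokens = normalize_address(address)
    reference = {normalize_address(ref) for ref in reference_addresses}
    if tokens in reference:
        return "STRICT"
    if without_suffix(tokens) in {without_suffix(ref) for ref in reference}:
        return "IGNORE_SUFFIX"
    if flip_direction(tokens) in reference:
        return "FLIP_DIRECTION"
    return None
```

Recording which tier produced the match, rather than a simple yes/no, is what allows later filtering on how much relaxation was needed.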

EPEC is working to make this value-added data available for those entities that are authorized to handle VA election information. Interested parties may contact us for more details.

Categories
Election Data Analysis Election Forensics Election Integrity mathematics programming technical

Potential Duplicate Registrants in VA RVL by Locality

Previously I posted the computation of potential duplicate records based on string comparisons in the registered voter list. As a follow up to that article, I’ve compiled the statistics of the number of potential pairs for each locality in VA.

I tallied the number of registrant pairs with the reference match criteria defined by the MOU between ELECT and the DMV, along with the two highest confidence (most stringent) match criteria that I computed. I also stratified the results by whether only Active registrant records were considered or either Active or Inactive records, and by whether or not the pairs crossed a locality boundary.

The table below is organized into the following computed columns, and has been sorted in decreasing order according to column 5.

  1. Exactly matching First + Last + DOB, which is equivalent to the MOU between ELECT and DMV.
  2. Exactly matching First + Middle + Last + Suffix + DOB
  3. Exactly matching First + Middle + Last + Suffix + DOB + Gender + Street Address
  4. The same as #1, but filtering for only ACTIVE voter records
  5. The same as #2, but filtering for only ACTIVE voter records
  6. The same as #3, but filtering for only ACTIVE voter records
  7. The same as #1, but filtering for only pairs that cross a locality boundary.
  8. The same as #2, but filtering for only pairs that cross a locality boundary.
  9. The same as #3, but filtering for only pairs that cross a locality boundary.
  10. The same as #4, but filtering for only pairs that cross a locality boundary.
  11. The same as #5, but filtering for only pairs that cross a locality boundary.
  12. The same as #6, but filtering for only pairs that cross a locality boundary.
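
The twelve columns follow a regular pattern: three match criteria, optionally restricted to Active records (+3) and optionally restricted to cross-locality pairs (+6). The sketch below encodes that pattern and the percentage calculation. The `STATUS` key and its "ACTIVE" value are hypothetical field names, and the assumption that each percentage is a count divided by the locality's number of registrant records is inferred from the published figures (for example, 6 / 2604 gives 0.2304%), not stated in the original.

```python
CRITERIA = ("first_last_dob", "full_name_dob", "full_name_dob_address")


def column_number(criterion, active_only=False, cross_locality=False):
    """Map a criterion plus its two filters to the column numbering 1..12."""
    base = CRITERIA.index(criterion) + 1
    return base + (3 if active_only else 0) + (6 if cross_locality else 0)


def pair_qualifies(a, b, active_only=False, cross_locality=False):
    """Apply the Active-only and cross-locality filters to a matched pair."""
    if active_only and not (a["STATUS"] == "ACTIVE" and b["STATUS"] == "ACTIVE"):
        return False
    if cross_locality and a["LOCALITY_NAME"] == b["LOCALITY_NAME"]:
        return False
    return True


def pct_of_records(count, num_records):
    """Express a match count as a percentage of a locality's records."""
    return 100.0 * count / num_records


print(column_number("full_name_dob", active_only=True))  # the sort column
print(f"{pct_of_records(6, 2604):.4f}%")
```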


LOCALITY_NAME | Num Registrant Records | 1: Pct Same First Last Dob | 2: Pct Same Full Name Dob | 3: Pct Same Full Name Dob Address | 4: Pct Same First Last Dob _ Active Only | 5: Pct Same Full Name Dob _ Active Only | 6: Pct Same Full Name Dob Address _ Active Only | 7: Pct Same First Last Dob _ xLoc | 8: Pct Same Full Name Dob _ xLoc | 9: Pct Same Full Name Dob Address _ xLoc | 10: Pct Same First Last Dob _ Active Only _ xLoc | 11: Pct Same Full Name Dob _ Active Only _ xLoc | 12: Pct Same Full Name Dob Address _ Active Only _ xLoc
NORTON CITY | 2604 | 0.2304% | 0.2304% | 0.1536% | 0.1920% | 0.1920% | 0.1536% | 0.0768% | 0.0768% | 0.0000% | 0.0384% | 0.0384% | 0.0000%
NOTTOWAY COUNTY | 9704 | 0.2988% | 0.2061% | 0.0618% | 0.2473% | 0.1752% | 0.0618% | 0.2370% | 0.1649% | 0.0206% | 0.1855% | 0.1340% | 0.0206%
RADFORD CITY | 9551 | 0.4293% | 0.2827% | 0.0000% | 0.2827% | 0.1675% | 0.0000% | 0.4293% | 0.2827% | 0.0000% | 0.2827% | 0.1675% | 0.0000%
HIGHLAND COUNTY | 1903 | 0.2627% | 0.1576% | 0.1051% | 0.2627% | 0.1576% | 0.1051% | 0.1576% | 0.0525% | 0.0000% | 0.1576% | 0.0525% | 0.0000%
WILLIAMSBURG CITY | 10480 | 0.2195% | 0.1336% | 0.0000% | 0.2004% | 0.1336% | 0.0000% | 0.2004% | 0.1336% | 0.0000% | 0.1813% | 0.1336% | 0.0000%
LYNCHBURG CITY | 56319 | 0.3072% | 0.1829% | 0.0533% | 0.2255% | 0.1296% | 0.0533% | 0.1616% | 0.0764% | 0.0000% | 0.1190% | 0.0479% | 0.0000%
EMPORIA CITY | 4023 | 0.3480% | 0.1740% | 0.0000% | 0.2983% | 0.1243% | 0.0000% | 0.2486% | 0.0746% | 0.0000% | 0.1989% | 0.0249% | 0.0000%
SUFFOLK CITY | 71580 | 0.2403% | 0.1229% | 0.0754% | 0.2249% | 0.1187% | 0.0754% | 0.1229% | 0.0307% | 0.0000% | 0.1104% | 0.0265% | 0.0000%
FALLS CHURCH CITY | 11213 | 0.1784% | 0.1338% | 0.0357% | 0.1516% | 0.1159% | 0.0178% | 0.0892% | 0.0624% | 0.0000% | 0.0803% | 0.0624% | 0.0000%
SUSSEX COUNTY | 7149 | 0.2658% | 0.1259% | 0.0839% | 0.2238% | 0.1119% | 0.0839% | 0.1539% | 0.0140% | 0.0000% | 0.1119% | 0.0000% | 0.0000%
FRANKLIN CITY | 5924 | 0.2026% | 0.1182% | 0.0338% | 0.1857% | 0.1013% | 0.0338% | 0.1688% | 0.0844% | 0.0000% | 0.1519% | 0.0675% | 0.0000%
APPOMATTOX COUNTY | 12195 | 0.2542% | 0.1230% | 0.0328% | 0.2214% | 0.0902% | 0.0328% | 0.2050% | 0.0738% | 0.0000% | 0.1886% | 0.0574% | 0.0000%
LEE COUNTY | 15619 | 0.2497% | 0.0960% | 0.0128% | 0.2305% | 0.0832% | 0.0128% | 0.1473% | 0.0192% | 0.0000% | 0.1409% | 0.0192% | 0.0000%
ALBEMARLE COUNTY | 84889 | 0.1920% | 0.1001% | 0.0212% | 0.1590% | 0.0825% | 0.0188% | 0.1402% | 0.0554% | 0.0000% | 0.1096% | 0.0401% | 0.0000%
AMHERST COUNTY | 22906 | 0.1965% | 0.0829% | 0.0437% | 0.1790% | 0.0742% | 0.0437% | 0.1441% | 0.0393% | 0.0000% | 0.1266% | 0.0306% | 0.0000%
PRINCE EDWARD COUNTY | 13595 | 0.2280% | 0.0883% | 0.0000% | 0.1912% | 0.0662% | 0.0000% | 0.2133% | 0.0883% | 0.0000% | 0.1765% | 0.0662% | 0.0000%
STAUNTON CITY | 18180 | 0.1980% | 0.0935% | 0.0000% | 0.1595% | 0.0605% | 0.0000% | 0.1650% | 0.0605% | 0.0000% | 0.1265% | 0.0275% | 0.0000%
NELSON COUNTY | 11895 | 0.1765% | 0.0673% | 0.0168% | 0.1513% | 0.0588% | 0.0168% | 0.1261% | 0.0504% | 0.0000% | 0.1177% | 0.0420% | 0.0000%
ARLINGTON COUNTY | 177092 | 0.1378% | 0.0683% | 0.0113% | 0.1146% | 0.0576% | 0.0102% | 0.0870% | 0.0344% | 0.0000% | 0.0683% | 0.0260% | 0.0000%
NORTHUMBERLAND COUNTY | 10457 | 0.1339% | 0.0574% | 0.0191% | 0.1243% | 0.0574% | 0.0191% | 0.0956% | 0.0191% | 0.0000% | 0.0861% | 0.0191% | 0.0000%
SOUTHAMPTON COUNTY | 13218 | 0.2194% | 0.0757% | 0.0000% | 0.1740% | 0.0530% | 0.0000% | 0.1589% | 0.0454% | 0.0000% | 0.1286% | 0.0227% | 0.0000%
HOPEWELL CITY | 15825 | 0.2401% | 0.0695% | 0.0253% | 0.2085% | 0.0506% | 0.0253% | 0.1390% | 0.0190% | 0.0000% | 0.1201% | 0.0126% | 0.0000%
LUNENBURG COUNTY | 8097 | 0.1853% | 0.0618% | 0.0000% | 0.1729% | 0.0494% | 0.0000% | 0.1853% | 0.0618% | 0.0000% | 0.1729% | 0.0494% | 0.0000%
AMELIA COUNTY | 10179 | 0.1375% | 0.0884% | 0.0098% | 0.0884% | 0.0491% | 0.0098% | 0.1375% | 0.0884% | 0.0098% | 0.0884% | 0.0491% | 0.0098%
RICHMOND CITY | 161097 | 0.1707% | 0.0639% | 0.0000% | 0.1316% | 0.0490% | 0.0000% | 0.1459% | 0.0528% | 0.0000% | 0.1155% | 0.0416% | 0.0000%
CHARLOTTESVILLE CITY | 34789 | 0.1265% | 0.0604% | 0.0000% | 0.1064% | 0.0489% | 0.0000% | 0.1150% | 0.0489% | 0.0000% | 0.0949% | 0.0374% | 0.0000%
LEXINGTON CITY | 4211 | 0.2612% | 0.1187% | 0.0000% | 0.1900% | 0.0475% | 0.0000% | 0.2612% | 0.1187% | 0.0000% | 0.1900% | 0.0475% | 0.0000%
FAIRFAX COUNTY | 787727 | 0.1143% | 0.0559% | 0.0053% | 0.0988% | 0.0474% | 0.0053% | 0.0665% | 0.0236% | 0.0000% | 0.0546% | 0.0171% | 0.0000%
CHARLOTTE COUNTY | 8474 | 0.2242% | 0.0708% | 0.0236% | 0.1652% | 0.0472% | 0.0236% | 0.2006% | 0.0472% | 0.0000% | 0.1416% | 0.0236% | 0.0000%
HARRISONBURG CITY | 26443 | 0.1777% | 0.0870% | 0.0000% | 0.1210% | 0.0454% | 0.0000% | 0.1324% | 0.0567% | 0.0000% | 0.0908% | 0.0303% | 0.0000%
BRUNSWICK COUNTY | 11098 | 0.2253% | 0.0631% | 0.0000% | 0.1982% | 0.0451% | 0.0000% | 0.2072% | 0.0451% | 0.0000% | 0.1802% | 0.0270% | 0.0000%
HAMPTON CITY | 100807 | 0.2044% | 0.0764% | 0.0060% | 0.1468% | 0.0446% | 0.0040% | 0.1210% | 0.0387% | 0.0000% | 0.0972% | 0.0268% | 0.0000%
WISE COUNTY | 24750 | 0.1455% | 0.0525% | 0.0000% | 0.1333% | 0.0444% | 0.0000% | 0.1212% | 0.0364% | 0.0000% | 0.1091% | 0.0283% | 0.0000%
WYTHE COUNTY | 20950 | 0.1480% | 0.0525% | 0.0191% | 0.1289% | 0.0430% | 0.0191% | 0.1002% | 0.0143% | 0.0000% | 0.0907% | 0.0143% | 0.0000%
CHESAPEAKE CITY | 178005 | 0.1258% | 0.0433% | 0.0303% | 0.1140% | 0.0410% | 0.0303% | 0.0843% | 0.0062% | 0.0000% | 0.0747% | 0.0051% | 0.0000%
NEWPORT NEWS CITY | 124778 | 0.1354% | 0.0537% | 0.0016% | 0.1122% | 0.0409% | 0.0016% | 0.1002% | 0.0313% | 0.0000% | 0.0850% | 0.0216% | 0.0000%
CUMBERLAND COUNTY | 7416 | 0.1483% | 0.0539% | 0.0270% | 0.1214% | 0.0405% | 0.0270% | 0.1214% | 0.0270% | 0.0000% | 0.0944% | 0.0135% | 0.0000%
PRINCE GEORGE COUNTY | 24957 | 0.1643% | 0.0401% | 0.0000% | 0.1322% | 0.0401% | 0.0000% | 0.1643% | 0.0401% | 0.0000% | 0.1322% | 0.0401% | 0.0000%
HALIFAX COUNTY | 25086 | 0.1196% | 0.0438% | 0.0239% | 0.1156% | 0.0399% | 0.0239% | 0.0877% | 0.0120% | 0.0000% | 0.0837% | 0.0080% | 0.0000%
SMYTH COUNTY | 20159 | 0.1339% | 0.0397% | 0.0000% | 0.1290% | 0.0397% | 0.0000% | 0.1141% | 0.0198% | 0.0000% | 0.1091% | 0.0198% | 0.0000%
FAIRFAX CITY | 17825 | 0.1234% | 0.0617% | 0.0000% | 0.0954% | 0.0393% | 0.0000% | 0.1122% | 0.0617% | 0.0000% | 0.0842% | 0.0393% | 0.0000%
CAMPBELL COUNTY | 41318 | 0.1380% | 0.0508% | 0.0048% | 0.1186% | 0.0387% | 0.0048% | 0.1283% | 0.0411% | 0.0000% | 0.1089% | 0.0290% | 0.0000%
COLONIAL HEIGHTS CITY | 13066 | 0.0918% | 0.0383% | 0.0000% | 0.0918% | 0.0383% | 0.0000% | 0.0918% | 0.0383% | 0.0000% | 0.0918% | 0.0383% | 0.0000%
CHESTERFIELD COUNTY | 270084 | 0.1529% | 0.0478% | 0.0067% | 0.1300% | 0.0381% | 0.0059% | 0.1107% | 0.0248% | 0.0000% | 0.0937% | 0.0196% | 0.0000%
PETERSBURG CITY | 23740 | 0.1685% | 0.0421% | 0.0000% | 0.1559% | 0.0379% | 0.0000% | 0.1601% | 0.0421% | 0.0000% | 0.1474% | 0.0379% | 0.0000%
SURRY COUNTY | 5675 | 0.1762% | 0.0352% | 0.0000% | 0.1410% | 0.0352% | 0.0000% | 0.1410% | 0.0000% | 0.0000% | 0.1057% | 0.0000% | 0.0000%
STAFFORD COUNTY | 111261 | 0.1222% | 0.0440% | 0.0072% | 0.1079% | 0.0351% | 0.0072% | 0.1007% | 0.0279% | 0.0000% | 0.0881% | 0.0207% | 0.0000%
BUCHANAN COUNTY | 14836 | 0.0876% | 0.0337% | 0.0000% | 0.0876% | 0.0337% | 0.0000% | 0.0607% | 0.0067% | 0.0000% | 0.0607% | 0.0067% | 0.0000%
PORTSMOUTH CITY | 68381 | 0.1536% | 0.0409% | 0.0058% | 0.1375% | 0.0336% | 0.0058% | 0.1185% | 0.0263% | 0.0000% | 0.1024% | 0.0190% | 0.0000%
PITTSYLVANIA COUNTY | 45322 | 0.1677% | 0.0441% | 0.0044% | 0.1522% | 0.0331% | 0.0044% | 0.1324% | 0.0221% | 0.0000% | 0.1214% | 0.0154% | 0.0000%
MECKLENBURG COUNTY | 22996 | 0.1522% | 0.0478% | 0.0000% | 0.1305% | 0.0304% | 0.0000% | 0.1261% | 0.0391% | 0.0000% | 0.1131% | 0.0304% | 0.0000%
NORTHAMPTON COUNTY | 9877 | 0.0911% | 0.0304% | 0.0202% | 0.0810% | 0.0304% | 0.0202% | 0.0911% | 0.0101% | 0.0000% | 0.0810% | 0.0101% | 0.0000%
PAGE COUNTY | 17095 | 0.1872% | 0.0351% | 0.0000% | 0.1521% | 0.0292% | 0.0000% | 0.1521% | 0.0117% | 0.0000% | 0.1170% | 0.0058% | 0.0000%
ACCOMACK COUNTY | 25483 | 0.1216% | 0.0275% | 0.0000% | 0.1020% | 0.0275% | 0.0000% | 0.1138% | 0.0275% | 0.0000% | 0.0942% | 0.0275% | 0.0000%
GRAYSON COUNTY | 10941 | 0.1645% | 0.0274% | 0.0000% | 0.1554% | 0.0274% | 0.0000% | 0.1462% | 0.0274% | 0.0000% | 0.1371% | 0.0274% | 0.0000%
ALLEGHANY COUNTY | 11069 | 0.1355% | 0.0271% | 0.0000% | 0.1084% | 0.0271% | 0.0000% | 0.0994% | 0.0090% | 0.0000% | 0.0723% | 0.0090% | 0.0000%
MATHEWS COUNTY | 7378 | 0.0949% | 0.0271% | 0.0271% | 0.0678% | 0.0271% | 0.0271% | 0.0678% | 0.0000% | 0.0000% | 0.0407% | 0.0000% | 0.0000%
BEDFORD COUNTY | 63240 | 0.1233% | 0.0300% | 0.0063% | 0.1154% | 0.0269% | 0.0063% | 0.1012% | 0.0142% | 0.0000% | 0.0933% | 0.0111% | 0.0000%
HENRICO COUNTY | 240436 | 0.1152% | 0.0299% | 0.0083% | 0.0998% | 0.0258% | 0.0083% | 0.0944% | 0.0175% | 0.0000% | 0.0807% | 0.0133% | 0.0000%
WAYNESBORO CITY | 15561 | 0.1735% | 0.0450% | 0.0000% | 0.1285% | 0.0257% | 0.0000% | 0.1735% | 0.0450% | 0.0000% | 0.1285% | 0.0257% | 0.0000%
HANOVER COUNTY | 87000 | 0.1092% | 0.0287% | 0.0023% | 0.1011% | 0.0253% | 0.0023% | 0.1023% | 0.0218% | 0.0000% | 0.0943% | 0.0184% | 0.0000%
CRAIG COUNTY | 3972 | 0.1007% | 0.0252% | 0.0000% | 0.1007% | 0.0252% | 0.0000% | 0.1007% | 0.0252% | 0.0000% | 0.1007% | 0.0252% | 0.0000%
GALAX CITY | 4067 | 0.1229% | 0.0246% | 0.0000% | 0.1229% | 0.0246% | 0.0000% | 0.1229% | 0.0246% | 0.0000% | 0.1229% | 0.0246% | 0.0000%
ORANGE COUNTY | 28482 | 0.1299% | 0.0351% | 0.0000% | 0.1194% | 0.0246% | 0.0000% | 0.1299% | 0.0351% | 0.0000% | 0.1194% | 0.0246% | 0.0000%
DANVILLE CITY | 28838 | 0.1040% | 0.0312% | 0.0000% | 0.0902% | 0.0243% | 0.0000% | 0.1040% | 0.0312% | 0.0000% | 0.0902% | 0.0243% | 0.0000%
CARROLL COUNTY | 21163 | 0.1040% | 0.0236% | 0.0095% | 0.1040% | 0.0236% | 0.0095% | 0.0945% | 0.0142% | 0.0000% | 0.0945% | 0.0142% | 0.0000%
FREDERICK COUNTY | 67912 | 0.1075% | 0.0324% | 0.0088% | 0.0883% | 0.0236% | 0.0059% | 0.0898% | 0.0206% | 0.0000% | 0.0736% | 0.0147% | 0.0000%
MANASSAS PARK CITY | 9018 | 0.0665% | 0.0222% | 0.0000% | 0.0554% | 0.0222% | 0.0000% | 0.0444% | 0.0222% | 0.0000% | 0.0333% | 0.0222% | 0.0000%
HENRY COUNTY | 36539 | 0.1259% | 0.0246% | 0.0000% | 0.1122% | 0.0219% | 0.0000% | 0.0931% | 0.0082% | 0.0000% | 0.0848% | 0.0055% | 0.0000%
BLAND COUNTY | 4581 | 0.1091% | 0.0218% | 0.0000% | 0.1091% | 0.0218% | 0.0000% | 0.1091% | 0.0218% | 0.0000% | 0.1091% | 0.0218% | 0.0000%
SPOTSYLVANIA COUNTY | 105361 | 0.0987% | 0.0247% | 0.0057% | 0.0873% | 0.0218% | 0.0057% | 0.0816% | 0.0095% | 0.0000% | 0.0702% | 0.0066% | 0.0000%
WINCHESTER CITY | 18352 | 0.1035% | 0.0381% | 0.0000% | 0.0708% | 0.0218% | 0.0000% | 0.0926% | 0.0381% | 0.0000% | 0.0599% | 0.0218% | 0.0000%
LANCASTER COUNTY | 9267 | 0.0755% | 0.0216% | 0.0000% | 0.0755% | 0.0216% | 0.0000% | 0.0755% | 0.0216% | 0.0000% | 0.0755% | 0.0216% | 0.0000%
KING WILLIAM COUNTY | 13996 | 0.1286% | 0.0214% | 0.0000% | 0.1143% | 0.0214% | 0.0000% | 0.1286% | 0.0214% | 0.0000% | 0.1143% | 0.0214% | 0.0000%
WESTMORELAND COUNTY | 14233 | 0.1827% | 0.0211% | 0.0000% | 0.1756% | 0.0211% | 0.0000% | 0.1546% | 0.0211% | 0.0000% | 0.1475% | 0.0211% | 0.0000%
VIRGINIA BEACH CITY | 331914 | 0.1118% | 0.0259% | 0.0066% | 0.0967% | 0.0208% | 0.0066% | 0.0883% | 0.0114% | 0.0000% | 0.0762% | 0.0081% | 0.0000%
POWHATAN COUNTY | 24287 | 0.1400% | 0.0371% | 0.0000% | 0.1153% | 0.0206% | 0.0000% | 0.1400% | 0.0371% | 0.0000% | 0.1153% | 0.0206% | 0.0000%
BOTETOURT COUNTY | 26311 | 0.1178% | 0.0190% | 0.0076% | 0.1102% | 0.0190% | 0.0076% | 0.1102% | 0.0114% | 0.0000% | 0.1026% | 0.0114% | 0.0000%
FLUVANNA COUNTY | 21001 | 0.1286% | 0.0238% | 0.0000% | 0.1190% | 0.0190% | 0.0000% | 0.1095% | 0.0238% | 0.0000% | 0.1000% | 0.0190% | 0.0000%
SCOTT COUNTY | 16059 | 0.1121% | 0.0249% | 0.0000% | 0.1059% | 0.0187% | 0.0000% | 0.0996% | 0.0125% | 0.0000% | 0.0934% | 0.0062% | 0.0000%
ALEXANDRIA CITY | 112212 | 0.0820% | 0.0205% | 0.0000% | 0.0686% | 0.0178% | 0.0000% | 0.0784% | 0.0169% | 0.0000% | 0.0651% | 0.0143% | 0.0000%
TAZEWELL COUNTY | 28147 | 0.0995% | 0.0178% | 0.0142% | 0.0959% | 0.0178% | 0.0142% | 0.0853% | 0.0036% | 0.0000% | 0.0817% | 0.0036% | 0.0000%
RICHMOND COUNTY | 5649 | 0.2301% | 0.0354% | 0.0000% | 0.1947% | 0.0177% | 0.0000% | 0.1947% | 0.0354% | 0.0000% | 0.1593% | 0.0177% | 0.0000%
ROCKINGHAM COUNTY | 56817 | 0.0845% | 0.0246% | 0.0035% | 0.0739% | 0.0176% | 0.0000% | 0.0739% | 0.0176% | 0.0000% | 0.0669% | 0.0141% | 0.0000%
LOUISA COUNTY | 29567 | 0.1150% | 0.0271% | 0.0135% | 0.1015% | 0.0169% | 0.0135% | 0.1082% | 0.0135% | 0.0000% | 0.0947% | 0.0034% | 0.0000%
LOUDOUN COUNTY | 291914 | 0.0740% | 0.0219% | 0.0041% | 0.0620% | 0.0164% | 0.0041% | 0.0651% | 0.0171% | 0.0000% | 0.0531% | 0.0116% | 0.0000%
RAPPAHANNOCK COUNTY | 6239 | 0.0962% | 0.0160% | 0.0000% | 0.0801% | 0.0160% | 0.0000% | 0.0962% | 0.0160% | 0.0000% | 0.0801% | 0.0160% | 0.0000%
JAMES CITY COUNTY | 64390 | 0.0745% | 0.0186% | 0.0000% | 0.0668% | 0.0155% | 0.0000% | 0.0621% | 0.0124% | 0.0000% | 0.0544% | 0.0093% | 0.0000%
PATRICK COUNTY | 12862 | 0.0855% | 0.0155% | 0.0000% | 0.0777% | 0.0155% | 0.0000% | 0.0855% | 0.0155% | 0.0000% | 0.0777% | 0.0155% | 0.0000%
PRINCE WILLIAM COUNTY | 316530 | 0.0812% | 0.0186% | 0.0000% | 0.0663% | 0.0148% | 0.0000% | 0.0711% | 0.0142% | 0.0000% | 0.0581% | 0.0104% | 0.0000%
AUGUSTA COUNTY | 54993 | 0.1455% | 0.0218% | 0.0036% | 0.1255% | 0.0145% | 0.0036% | 0.1346% | 0.0182% | 0.0000% | 0.1146% | 0.0109% | 0.0000%
DINWIDDIE COUNTY | 20835 | 0.1584% | 0.0384% | 0.0048% | 0.1152% | 0.0144% | 0.0048% | 0.1488% | 0.0288% | 0.0048% | 0.1152% | 0.0144% | 0.0048%
GOOCHLAND COUNTY | 21410 | 0.1261% | 0.0187% | 0.0000% | 0.1121% | 0.0140% | 0.0000% | 0.1261% | 0.0187% | 0.0000% | 0.1121% | 0.0140% | 0.0000%
MONTGOMERY COUNTY | 61944 | 0.0936% | 0.0145% | 0.0000% | 0.0807% | 0.0129% | 0.0000% | 0.0904% | 0.0145% | 0.0000% | 0.0775% | 0.0129% | 0.0000%
SHENANDOAH COUNTY | 32304 | 0.0960% | 0.0155% | 0.0000% | 0.0743% | 0.0124% | 0.0000% | 0.0960% | 0.0155% | 0.0000% | 0.0743% | 0.0124% | 0.0000%
ROANOKE COUNTY | 73467 | 0.0953% | 0.0163% | 0.0027% | 0.0830% | 0.0123% | 0.0027% | 0.0817% | 0.0109% | 0.0000% | 0.0694% | 0.0068% | 0.0000%
SALEM CITY | 17932 | 0.0892% | 0.0112% | 0.0000% | 0.0781% | 0.0112% | 0.0000% | 0.0892% | 0.0112% | 0.0000% | 0.0781% | 0.0112% | 0.0000%
NEW KENT COUNTY | 19022 | 0.1051% | 0.0210% | 0.0000% | 0.0894% | 0.0105% | 0.0000% | 0.0946% | 0.0210% | 0.0000% | 0.0789% | 0.0105% | 0.0000%
WASHINGTON COUNTY | 39449 | 0.1014% | 0.0152% | 0.0000% | 0.0887% | 0.0101% | 0.0000% | 0.0862% | 0.0051% | 0.0000% | 0.0786% | 0.0051% | 0.0000%
MADISON COUNTY | 10407 | 0.0865% | 0.0192% | 0.0000% | 0.0769% | 0.0096% | 0.0000% | 0.0865% | 0.0192% | 0.0000% | 0.0769% | 0.0096% | 0.0000%
NORFOLK CITY | 141236 | 0.0984% | 0.0092% | 0.0000% | 0.0864% | 0.0085% | 0.0000% | 0.0899% | 0.0064% | 0.0000% | 0.0793% | 0.0057% | 0.0000%
PULASKI COUNTY | 23825 | 0.0881% | 0.0126% | 0.0000% | 0.0756% | 0.0084% | 0.0000% | 0.0881% | 0.0126% | 0.0000% | 0.0756% | 0.0084% | 0.0000%
CLARKE COUNTY | 12269 | 0.1060% | 0.0163% | 0.0000% | 0.0978% | 0.0082% | 0.0000% | 0.1060% | 0.0163% | 0.0000% | 0.0978% | 0.0082% | 0.0000%
GREENE COUNTY | 14926 | 0.1072% | 0.0067% | 0.0000% | 0.1072% | 0.0067% | 0.0000% | 0.1072% | 0.0067% | 0.0000% | 0.1072% | 0.0067% | 0.0000%
GLOUCESTER COUNTY | 30284 | 0.0859% | 0.0066% | 0.0000% | 0.0859% | 0.0066% | 0.0000% | 0.0859% | 0.0066% | 0.0000% | 0.0859% | 0.0066% | 0.0000%
WARREN COUNTY | 30517 | 0.0885% | 0.0066% | 0.0000% | 0.0852% | 0.0066% | 0.0000% | 0.0819% | 0.0066% | 0.0000% | 0.0786% | 0.0066% | 0.0000%
ISLE OF WIGHT COUNTY | 31179 | 0.0898% | 0.0064% | 0.0000% | 0.0834% | 0.0064% | 0.0000% | 0.0898% | 0.0064% | 0.0000% | 0.0834% | 0.0064% | 0.0000%
ROCKBRIDGE COUNTY | 16266 | 0.1230% | 0.0123% | 0.0000% | 0.1045% | 0.0061% | 0.0000% | 0.1230% | 0.0123% | 0.0000% | 0.1045% | 0.0061% | 0.0000%
CULPEPER COUNTY | 37117 | 0.0943% | 0.0108% | 0.0000% | 0.0808% | 0.0054% | 0.0000% | 0.0889% | 0.0108% | 0.0000% | 0.0754% | 0.0054% | 0.0000%
FAUQUIER COUNTY | 56396 | 0.0887% | 0.0071% | 0.0000% | 0.0762% | 0.0053% | 0.0000% | 0.0887% | 0.0071% | 0.0000% | 0.0762% | 0.0053% | 0.0000%
FREDERICKSBURG CITY | 19455 | 0.0874% | 0.0051% | 0.0000% | 0.0720% | 0.0051% | 0.0000% | 0.0874% | 0.0051% | 0.0000% | 0.0720% | 0.0051% | 0.0000%
FRANKLIN COUNTY | 39866 | 0.0602% | 0.0050% | 0.0050% | 0.0502% | 0.0050% | 0.0050% | 0.0552% | 0.0000% | 0.0000% | 0.0452% | 0.0000% | 0.0000%
MANASSAS CITY | 23815 | 0.1008% | 0.0042% | 0.0000% | 0.0966% | 0.0042% | 0.0000% | 0.0840% | 0.0042% | 0.0000% | 0.0798% | 0.0042% | 0.0000%
YORK COUNTY508380.0925%0.0157%0.0000%0.0669%0.0039%0.0000%0.0885%0.0157%0.0000%0.0629%0.0039%0.0000%
BATH COUNTY33580.0893%0.0000%0.0000%0.0893%0.0000%0.0000%0.0893%0.0000%0.0000%0.0893%0.0000%0.0000%
BRISTOL CITY123450.0729%0.0000%0.0000%0.0567%0.0000%0.0000%0.0567%0.0000%0.0000%0.0567%0.0000%0.0000%
BUCKINGHAM COUNTY110630.1356%0.0271%0.0000%0.0904%0.0000%0.0000%0.1356%0.0271%0.0000%0.0904%0.0000%0.0000%
BUENA VISTA CITY44320.0903%0.0000%0.0000%0.0903%0.0000%0.0000%0.0903%0.0000%0.0000%0.0903%0.0000%0.0000%
CAROLINE COUNTY228940.1005%0.0087%0.0000%0.0830%0.0000%0.0000%0.1005%0.0087%0.0000%0.0830%0.0000%0.0000%
CHARLES CITY COUNTY57200.0524%0.0000%0.0000%0.0350%0.0000%0.0000%0.0524%0.0000%0.0000%0.0350%0.0000%0.0000%
COVINGTON CITY38880.1029%0.0000%0.0000%0.0772%0.0000%0.0000%0.1029%0.0000%0.0000%0.0772%0.0000%0.0000%
DICKENSON COUNTY101440.1084%0.0000%0.0000%0.0887%0.0000%0.0000%0.1084%0.0000%0.0000%0.0887%0.0000%0.0000%
ESSEX COUNTY83180.1443%0.0000%0.0000%0.1443%0.0000%0.0000%0.1443%0.0000%0.0000%0.1443%0.0000%0.0000%
FLOYD COUNTY118520.0759%0.0000%0.0000%0.0759%0.0000%0.0000%0.0759%0.0000%0.0000%0.0759%0.0000%0.0000%
GILES COUNTY120930.0413%0.0000%0.0000%0.0331%0.0000%0.0000%0.0413%0.0000%0.0000%0.0331%0.0000%0.0000%
GREENSVILLE COUNTY64350.1709%0.0155%0.0000%0.1399%0.0000%0.0000%0.1709%0.0155%0.0000%0.1399%0.0000%0.0000%
KING AND QUEEN COUNTY54030.0740%0.0000%0.0000%0.0740%0.0000%0.0000%0.0740%0.0000%0.0000%0.0740%0.0000%0.0000%
KING GEORGE COUNTY197800.1314%0.0000%0.0000%0.0910%0.0000%0.0000%0.1314%0.0000%0.0000%0.0910%0.0000%0.0000%
MARTINSVILLE CITY90700.0992%0.0000%0.0000%0.0882%0.0000%0.0000%0.0992%0.0000%0.0000%0.0882%0.0000%0.0000%
MIDDLESEX COUNTY87460.1029%0.0114%0.0000%0.0800%0.0000%0.0000%0.1029%0.0114%0.0000%0.0800%0.0000%0.0000%
POQUOSON CITY96350.0934%0.0000%0.0000%0.0934%0.0000%0.0000%0.0934%0.0000%0.0000%0.0934%0.0000%0.0000%
ROANOKE CITY660830.0817%0.0015%0.0000%0.0666%0.0000%0.0000%0.0817%0.0015%0.0000%0.0666%0.0000%0.0000%
RUSSELL COUNTY192400.1091%0.0000%0.0000%0.1040%0.0000%0.0000%0.1091%0.0000%0.0000%0.1040%0.0000%0.0000%
Categories
Election Data Analysis Election Forensics Election Integrity mathematics programming technical

Potential duplicate registrants in VA voter list

I previously documented the utilization of the Hamming string distance measure to identify candidate pairs of duplicate registrants in voter lists. While a good first attempt at quantifying the numbers of potential duplicates in the voter rolls, using a Hamming distance metric is less than ideal for reasons discussed below and in the previous article. I have since been able to update the processing functions to use a more complete Levenshtein distance (LD) metric, and made some improvements to parsers and other code utilities, etc., but otherwise the analysis followed the same procedure, and is described below.


Using the 2022-11-23 Registered Voter List (RVL) and the 2023-01-26 Voter History List (VHL) purchased from the VA Department of Elections (ELECT) I wrote up an analysis script to check for potentially duplicated registrant records in the RVL and cross reference duplicate pairings with the VHL to identify potential duplicate votes. The details are summarized below.

Please note that I will not publish voter Personally Identifiable Information (PII) on this blog. I have substituted fictitious PII information for all examples given below, and cryptographically hashed all voter information in the downloadable results file. I will make available the detailed information to those that have the authorization to receive and process voter data upon request (contact us).

Summary of Results:

As a baseline, there were 6,464 records for STATUS=’Active’ registrants that met the definition of a “duplicate” used when Social Security Number (SSN) is not available, as defined by the MOU between DMV and ELECT (section 7.3): the same First Name + Last Name + Full Date of Birth (DOB). I’ve included a copy of the MOU between the VA DMV and ELECT at the end of this article for reference. It should be noted that most records held by DMV and ELECT have an SSN associated with them (or at least they should). SSN information is not distributed as part of the data purchased from ELECT, however, so this is the appropriate standard baseline for this work.
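
As a rough illustration of the baseline rule, the sketch below groups records by an exact (first name, last name, DOB) key and lists the pairs of voter IDs that share a key. The record layout and field names (`voter_id`, `first`, `last`, `dob`) are assumptions made for this example, not the RVL column names, and the sample records are fictitious.

```python
from collections import defaultdict
from itertools import combinations

def baseline_duplicate_pairs(records):
    """Return pairs of voter IDs whose first name, last name and DOB match exactly."""
    groups = defaultdict(list)
    for rec in records:
        # Normalise case and surrounding whitespace before building the key.
        key = (rec["first"].strip().upper(),
               rec["last"].strip().upper(),
               rec["dob"])
        groups[key].append(rec["voter_id"])
    pairs = []
    for ids in groups.values():
        # A group of k records sharing one key yields k*(k-1)/2 candidate pairs.
        pairs.extend(combinations(sorted(ids), 2))
    return sorted(pairs)

# Fictitious sample records (hypothetical layout).
sample = [
    {"voter_id": "V001", "first": "Amy", "last": "McVoter", "dob": "12/05/1970"},
    {"voter_id": "V002", "first": "AMY ", "last": "MCVOTER", "dob": "12/05/1970"},
    {"voter_id": "V003", "first": "Amy", "last": "McVoter", "dob": "12/06/1970"},
]
baseline_pairs = baseline_duplicate_pairs(sample)
print(baseline_pairs)
```

Grouping on a key avoids comparing every record against every other record, which only works for exact matching; the fuzzy comparisons described later need a different approach.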

Upgrading our definition of a potential duplicate to [First + Middle + Last + Suffix + DOB] and using a LevenshteinDistance=0 drops the number of potential duplicates to 1,982, with each identified registrant in a pair having an exactly matching string result and unique voter ID numbers.

According to my derivations and simulations that are described in detail here, we should only expect to see an average of 11 (+/- 3) potential duplicate pairs (a.k.a. “collisions”) at a distance of 0. This is over two orders of magnitude different than what we observe in the compiled results. Such a discrepancy deserves further investigation and verification.

Allowing for a single string difference by setting LevenshteinDistance<=1 increases the pool of potential duplicates to 5,568. While this relaxation of the filter does allow us to find certain issues (described below) it also increases our chances of finding false positives as well. The LD metric results should not be viewed as a final determination, but as simply a useful tool to make an initial pass through the data and find candidate matches that still require further review, verification and validation.

Increasing to LevenshteinDistance<=2 brings the number of potential duplicates up to 32,610 across all records (28,884 when restricted to Active registrants). When we increase to LD <= 3 we get an explosion to 183,130 potential duplicates (164,302 Active only).

Method:

For every entry in the latest RVL, I performed a string distance comparison, based on Levenshtein distance, between every possible pair of strings of (FIRST NAME + MIDDLE NAME + LAST NAME + SUFFIX + FULL DOB).  For the ~6M different RVL entries, we therefore need to compute ~3.6 x 10^13 different string comparisons, and each string comparison can require upwards of 75 x 75 individual character comparisons, meaning the total number of character operations is on the order of 202.5 quadrillion (3.6 x 10^13 x 5,625), not including logging and I/O.
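
The arithmetic behind these figures can be checked with a few lines. This is only a back-of-the-envelope sketch: it takes the round figure of 6 million entries and the 75-character bound quoted above. Whether the actual script visits ordered or unordered pairs is not stated, so both counts are shown.

```python
n_records = 6_000_000   # round figure for the "~6M" RVL entries
max_len = 75            # per-string length bound quoted in the text

ordered_pairs = n_records ** 2                       # every record against every record
unordered_pairs = n_records * (n_records - 1) // 2   # each distinct pair counted once
char_ops = ordered_pairs * max_len * max_len         # worst-case character comparisons

print(f"ordered pairs:   {ordered_pairs:.3e}")
print(f"unordered pairs: {unordered_pairs:.3e}")
print(f"character ops:   {char_ops:.3e}")
```

Squaring 6 million gives 3.6 x 10^13, and multiplying by 75 x 75 gives 2.025 x 10^17, i.e. 202.5 quadrillion. Counting each distinct pair only once would roughly halve both figures.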

A distance of 0 indicates the strings being compared are identical; a distance of 1 indicates that a single character can be changed, inserted or removed to convert one string into the other. A distance of 2 indicates that 2 modifications are required, etc. 

Example: The string pair of “ALISHA” –> “ALISHIA” has an LD of 1, corresponding to the addition of an “I” before the final “A”.
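
For readers who want to reproduce the distances quoted in this article, here is a minimal sketch of the textbook dynamic-programming form of the Levenshtein distance. It is not the optimized code used for the full analysis, just a small reference version.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,           # delete ch_a
                               current[j - 1] + 1,        # insert ch_b
                               previous[j - 1] + cost))   # substitute or match
        previous = current
    return previous[-1]

print(levenshtein("ALISHA", "ALISHIA"))
```

Only two rows of the table are kept in memory at a time, so the memory cost is proportional to the shorter string while the running time is proportional to the product of the two lengths.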

I aggregated all of the Levenshtein distance pairings that were less than or equal to 3 characters different in order to identify potential (key word) duplicated registrants, and additionally for each pairing looked at the voter history information for each registrant in the pair to determine if there was a potential (again … key word) for multiple ballots to be cast by the same person in any given election.  As we allow for more characters to be different, we potentially are including many more likely false positive matches, even if we are catching more true positives.
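
The cross-reference against the voter history can be sketched as a set intersection per candidate pair. The data shapes here are assumptions made for illustration: `history` maps a voter ID to the set of elections with a vote on record, and the IDs and election labels are made up.

```python
def shared_elections(pairs, history):
    """For each candidate pair, list elections in which both voter IDs have a vote on record."""
    flagged = {}
    for id_a, id_b in pairs:
        both = history.get(id_a, set()) & history.get(id_b, set())
        if both:
            flagged[(id_a, id_b)] = sorted(both)
    return flagged

# Hypothetical voter history keyed by voter ID.
history = {
    "V001": {"2020 General", "2022 General"},
    "V002": {"2020 General"},
    "V003": {"2021 General"},
}
candidate_pairs = [("V001", "V002"), ("V001", "V003")]
flagged = shared_elections(candidate_pairs, history)
print(flagged)
```

A pair is flagged only when both IDs have a vote recorded in the same election; pairs that never overlap remain potential duplicate registrations but not potential duplicate ballots.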

For example: At a distance of 4 the strings of “Dave Joseph Smith M 10/01/1981” and “Tony Joseph Smith M 10/01/1981” at the same address would produce a potential match, but so would “Davey Joseph Smith M 10/01/1981” and “David Josiph Smith M 10/02/1981”. The first pair is more likely to be a false positive due to twins, while the second is more likely to be due to typos, mistakes, or use of nicknames and might warrant further investigation. A much stronger potential match would be something like “David Josiph Smith M 10/01/1981” and “David Joseph Smith M 10/01/1981”, with a distance of 1 at the same address. In an attempt to limit false positives, I have clamped the distance checks to <= 3 in this analysis.

The Levenshtein distance measure is importantly able to identify potential insertions or deletions as well as character changes, which is an improvement over the Hamming distance measure. This is illustrated by the following pairing: “David Joseph Smith M 10/01/1981” and “Dave Joseph Smith M 10/01/1981”. The change from “id” to “e” in the first name adds/subtracts a character, shifting the position of every character in the remainder of the string. A Levenshtein metric would correctly return a small distance of 2, whereas the Hamming distance returns 27.

Note that with the official records obtained from ELECT, and in accordance with the laws of VA, I do not have access to the social security number or drivers license numbers for each registration record, which would help in identifying and discriminating potential duplicate errors vs things like twins, etc. I only have the first name, middle name, last name, suffix, month of birth, day of birth, year of birth, gender, and address information that I can work with.  I can therefore only take things so far before someone else (with investigative authority and ability to access those other fields) would need to step in and confirm and validate these findings.

Results:

The summary totals are as follows, with detailed examples.

Measure | DMV_ELECT MOU Standard | LD <= 0 | LD <= 1 | LD <= 2 | LD <= 3
Number of Potential Duplicate Registrant Pairs | 7,586 (0.12%) | 2,472 (0.04%) | 6,620 (0.11%) | 32,610 (0.53%) | 183,130 (2.99%)
Number of Potential Duplicate Registrant Pairs (Active Only) | 6,464 (0.11%) | 1,982 (0.03%) | 5,568 (0.10%) | 28,884 (0.50%) | 164,302 (2.85%)
Number of Potential Duplicate Ballots | 6,362 | 112 | 3,576 | 37,028 | 236,254
Number of Potential Duplicate Ballots (Active Only) | 6,228 | 110 | 3,542 | 36,434 | 232,394

Examples of Types of Issues Observed:

NOTE THE BELOW INFORMATION HAS HAD THE VOTER PERSONALLY IDENTIFIABLE INFORMATION (“PII”) FICTIONALIZED. WHILE THESE ARE BASED ON REAL DATA TO ILLUSTRATE THE DIFFERENT TYPES OF OBSERVATIONS, THEY DO NOT REPRESENT REAL VOTER INFORMATION.

Example #1: The following set of records has the exact match (distance = 0) of full name and full birthdate (including year), but different address and different voter ID numbers AND there was a vote cast from each of those unique voter ID’s in the 2020 General Election.  While it’s remotely possible that two individuals share the exact same name, month, day and year of birth … it is probabilistically unlikely (see here), and should warrant further scrutiny.

Voter Record A:

AMY BETH McVOTER 12/05/1970 F 12345 CITIZEN CT

Voter Record B:

AMY BETH McVOTER 12/05/1970 F 5678 McPUBLIC DR

Example #2: This set of records has a single character different (distance of 1) in their first name, but middle name, last name, birthdate and address are identical AND both records are associated with votes that were cast in the 2020, 2021, and 2022 November General Elections.  While it is possible that this is a pair of 23 year old twins (with same middle names) that live together, it at least bears looking into.

Voter Record A:

TAYLOR DAVID VOTER 02/16/2000 M 6543 OVERLOOK AVE NW

Voter Record B:

DAYLOR DAVID VOTER 02/16/2000 M 6543 OVERLOOK AVE NW

Example #3: This set of records has two characters different (distance of 2) in their birthdate, but name and address are identical AND the birth years are too close together for a child/parent relationship, AND both records are associated with votes that were cast in the 2020 and 2022 November General Elections. 

Voter Record A:

REGINA DESEREE MACGUFFIN 02/05/1973 F 123 POPE AVE

Voter Record B:

REGINA DESEREE MACGUFFIN 03/07/1973 F 123 POPE AVE

Example #4: This set of records again has a single character different (distance of 1) in the first name (but not the first letter this time) and the last name, birthdate and address are identical.  There were also multiple votes cast in the 2019 and 2022 November General Elections from these registrants.

Voter Record A:

EDGARD JOHNSON 10/19/1981 M 5498 PAGELAND BLVD

Voter Record B:

EDUARD JOHNSON 10/19/1981 M 5498 PAGELAND BLVD

Example #5: This set of records has two characters different (distance of 2) in the first and middle names and the last name, birthdate, gender and address are identical.  There were also multiple votes cast in the 2021 and 2022 November General Elections from these registrants. Again it is possible that these records represent a set of twins given the information that ELECT provides.

Voter Record A:

ALANA JAVETTE THOMPSON 01/01/2003 F 123 CHARITY LN

Voter Record B:

ALAYA YAVETTE THOMPSON 01/01/2003 F 123 CHARITY LN

Example #6: The following set of records has an exact match (Distance = 0) of full name and full birthdate (including year), and the same address but different voter ID numbers.  There were no duplicated votes in the same election detected between the two ID numbers.

Voter Record A:

JAMES TIBERIUS KIRK 03/22/2223 M 1701 Enterprise Bridge

Voter Record B:

JAMES TIBERIUS KIRK 03/22/2223 M 1701 Enterprise Bridge

Example #7: The following set of records has an exact match (distance = 0) of full name and full birthdate (including year), the same address but different gender and voter ID numbers.  There were no duplicated votes in the same election detected between the two ID numbers.

Voter Record A:

MAXWELL QUAID CLINGER 11/03/2004 M 4077 MASH DR

Voter Record B:

MAXWELL QUAID CLINGER 11/03/2004 U 4077 MASH DR

Example #8: The following set of records has a single punctuation character different, with the same address but different voter ID numbers.  There were no duplicated votes in the same election detected between the two ID numbers.

Voter Record A:

JOHN JACOB JINGLHIEMER-SCHMIDT 06/29/1997 M 12345 JACOBS RD

Voter Record B:

JOHN JACOB JINGLHIEMER SCHMIDT 06/29/1997 M 12345 JACOBS RD

Results Dataset:

A full version of the aggregated Excel data is provided below; however, all voter information (ID, first name, middle name, last name, DOB, gender, address) has been removed and replaced by a one-way hash number, with randomized salt, based on the voter ID. The full file with specific voter information can be provided to parties authorized by ELECT to receive and process voter information, Election Officials, or Law Enforcement upon request.
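
One way to produce a salted one-way hash of a voter ID is sketched below. The choice of SHA-256 and a 16-byte random salt are assumptions made for this example; the article does not state which hash function or salt length was used for the published file.

```python
import hashlib
import secrets

def make_salt(num_bytes: int = 16) -> bytes:
    """Generate a random salt; it must be kept private by whoever holds the real IDs."""
    return secrets.token_bytes(num_bytes)

def pseudonymize(voter_id: str, salt: bytes) -> str:
    """Return a hex digest that stands in for the voter ID in published output."""
    return hashlib.sha256(salt + voter_id.encode("utf-8")).hexdigest()

salt = make_salt()
token_a = pseudonymize("123456789", salt)   # fictitious voter ID
token_b = pseudonymize("123456780", salt)   # fictitious voter ID
print(token_a[:16], token_b[:16])
```

Using the same salt for every record keeps the tokens consistent within the file, so a pair of records can still be followed across rows. The salt has to stay private, because voter IDs come from a small numeric space and an unsalted or publicly salted hash could be reversed by simply trying every possible ID.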

20221123-VA-RVL-String-Distance.csv

The MOU between the VA Department of Elections (ELECT) and the VA Department of Motor Vehicles (DMV) is also provided below for reference. Section 7.3 defines the minimal standards for determining a match when no social security number is present.

Categories
Election Data Analysis Election Forensics Election Integrity Interesting programming technical

FEC Filing Summaries Supporting James O’Keefe Recent Revelation

Per the recent James O’Keefe video documenting implausibly large numbers of contributions by individuals to political committees, I took a few minutes to download the public FEC data in bulk format and collated all of the individuals that had more than 100 donations to each of the following organizations: DNC, DCCC, DSCC, RNC, NRSC, NRCC, WinRed, ActBlue in the 2021-2022 Election period.

I made sure to account for and remove those records of contributions that had been (legally) returned due to over-contribution, or earmarked for campaign committee legal or facility funds, etc., which are exempt from campaign finance limits.
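
The collation step can be sketched as a filtered count per (committee, donor) pair. The record fields used here (`committee`, `donor`, `refunded`, `exempt_account`) are hypothetical names chosen for the example and do not correspond to the column names in the FEC bulk files; the sample uses a threshold of 2 instead of 100 so that it stays short.

```python
from collections import Counter

def frequent_donors(transactions, min_count=100):
    """Count contributions per (committee, donor), skipping refunded and exempt records."""
    counts = Counter()
    for t in transactions:
        if t["refunded"] or t["exempt_account"]:
            continue  # returned contributions and exempt-fund earmarks are excluded
        counts[(t["committee"], t["donor"])] += 1
    # Keep only donors with strictly more than min_count contributions to one committee.
    return {key: n for key, n in counts.items() if n > min_count}

# Fictitious transactions (hypothetical layout).
sample_tx = [
    {"committee": "C1", "donor": "DOE, JANE", "refunded": False, "exempt_account": False},
    {"committee": "C1", "donor": "DOE, JANE", "refunded": False, "exempt_account": False},
    {"committee": "C1", "donor": "DOE, JANE", "refunded": False, "exempt_account": False},
    {"committee": "C1", "donor": "DOE, JANE", "refunded": True, "exempt_account": False},
    {"committee": "C2", "donor": "ROE, RICH", "refunded": False, "exempt_account": True},
]
frequent = frequent_donors(sample_tx, min_count=2)
print(frequent)
```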

I hope it helps James. I’m not going to try and do any sort of analysis on this data, as I’ve got plenty to do regarding the IT of our elections, but I wanted to help where I could. Any questions, or if you would like the raw transaction data, please feel free to contact me, or if you have a particular committee ID that you would like the information for, just let me know. I am happy to help.

Categories
Election Data Analysis Election Forensics Election Integrity Interesting programming technical

Potential duplicate registrants in VA voter list via Hamming Distance

Using the 2022-11-23 Registered Voter List (RVL) and the 2023-01-26 Voter History List (VHL) purchased from the VA Department of Elections (ELECT) I wrote up an analysis script to check for potentially duplicated registrant records in the RVL and cross reference duplicate pairings with the VHL to identify potential duplicate votes. This was my initial attempt at quantifying the number of potentially duplicate records in the RVL, and I have since updated the code to use a more rigorous Levenshtein distance metric, as well as making improvements to the parsing routines, bugfixes, etc. The details of the Hamming distance work are summarized below, and left up here for reference. For the latest and up to date information, please see the newer article posted here.

Errata note: One of the code bugs I discovered was that some of the entries did not actually get checked as they were accidentally skipped, so the numbers below are lower than the numbers presented in the newer work.

Please note that I will not publish voter Personally Identifiable Information (PII) on this blog. I have substituted fictitious PII information for all examples given below, and cryptographically hashed all voter information in the downloadable results file. I will make available the detailed information to those that have the authorization to receive and process voter data upon request (contact us).

Summary of Results:

We should mathematically expect approximately 11 exact string collisions in the full RVL dataset when comparing (First Name + Middle Name + Last Name + Suffix + Full DOB), but instead we see 1982 such collisions, which is over two orders of magnitude above the expected value. While it’s possible that some of these collisions are false positives, there are quite a number of them that are deserving of further scrutiny.

Method:

For every entry in the latest RVL, I performed a string distance comparison, based on Hamming distance, between every possible pair of strings of (FIRST NAME + MIDDLE NAME + LAST NAME + SUFFIX + FULL DOB).  So for the ~6M different RVL entries, we need to compute ~3.6 x 10^13 different string comparisons. A hamming distance of 0 indicates the strings being compared are identical, a hamming distance of 1 indicates that there is a single character different between the two strings, a hamming distance of 2 indicates 2 characters are different, etc.  This obviously is a very computationally intensive process and it took over two days to complete the processing, once I got the bugs worked out.  (I’ve been quietly working on this one for a while now … )

Note that the Hamming distance only compares each respective position in a string and does not account for adding or removing a character completely from a string. A metric that does include addition and subtraction is the Levenshtein Edit Distance, which is a much more computationally expensive (but more rigorous) metric. The two are related in that, for strings of equal length, the Hamming distance is mathematically an upper bound on the Levenshtein distance. I haven’t yet finished making an optimized GPU accelerated version of the Levenshtein edit distance metric, but it is in the works and I will redo this analysis with the new metric once that is completed.

I aggregated all of the Hamming distance pairings that were less than or equal to 3 characters different in order to identify potential (key word) duplicated registrants, and additionally for each pairing looked at the voter history information for each registrant in the pair to determine if there was a potential (again … key word) for multiple ballots to be cast by the same person in any given election.  As we allow for more characters to be different, we potentially are including many more likely false positive matches, even if we are catching more true positives.

For example: At a Hamming distance of 4 the strings of “Dave Joseph Smith M 10/01/1981” and “Tony Joseph Smith M 10/01/1981” at the same address would produce a potential match, but so would “Davey Joseph Smith M 10/01/1981” and “David Josiph Smith M 10/02/1981”. The first pair is more likely to be a false positive due to twins, while the second is more likely to be due to typos, mistakes, or use of nicknames and might warrant further investigation. A much stronger potential match would be something like “David Josiph Smith M 10/01/1981” and “David Joseph Smith M 10/01/1981”, with a Hamming distance of 1 at the same address. In an attempt to limit false positives, I have clamped the Hamming distance checks to <= 3 in this analysis.

One of the drawbacks of using Hamming distance over a more complete metric such as Levenshtein is that the Hamming distance gives a very high score to, and would therefore filter out of our results, an example pairing such as: “David Joseph Smith M 10/01/1981” and “Dave Joseph Smith M 10/01/1981”. The change from “id” to “e” adds/subtracts a character, so every character in the remainder of the string shifts position and no longer matches. A Levenshtein metric would correctly return a small distance of 2, whereas the Hamming distance returns 27. (As mentioned earlier, I am working on a Levenshtein implementation, but it is not yet complete.)
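
The effect is easy to reproduce with a position-by-position comparison. How strings of unequal length were handled in the original script is not stated; this sketch compares only the overlapping positions, which is an assumption.

```python
def hamming(a: str, b: str) -> int:
    """Count positions at which the two strings differ, over their overlapping length."""
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))

# A one-character typo leaves every other position aligned.
typo = hamming("David Josiph Smith M 10/01/1981", "David Joseph Smith M 10/01/1981")

# Dropping one character shifts everything after it out of alignment.
shifted = hamming("David Joseph Smith M 10/01/1981", "Dave Joseph Smith M 10/01/1981")

print(typo, shifted)
```

The first pair differs in a single aligned position, while in the second pair every position from the fourth character onward disagrees, so the pair falls far outside the <= 3 cutoff even though only two edits separate the strings.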

Note that with the official records obtained from ELECT, and in accordance with the laws of VA, I do not have access to the social security number or drivers license numbers for each registration record, which would help in identifying and discriminating potential duplicate errors vs things like twins, etc. I only have the first name, middle name, last name, suffix, month of birth, day of birth, year of birth, gender, and address information that I can work with.  I can therefore only take things so far before someone else (with investigative authority and ability to access those other fields) would need to step in and confirm and validate these findings.

Results:

The summary totals are as follows, with detailed examples.

Hamming Distance | 0 | 1 | 2 | 3
Number of Potential Duplicate Registrant Pairs | 1982 | 3276 | 21864 | 120642
Number of Potential Duplicate Ballots | 110 | 3248 | 31210 | 175872

According to my derivations and simulations that are described in detail at the end of this article, we should only expect to see an average of 11 (+/- 3) potential duplicate pairs (a.k.a. “collisions”) at a Hamming distance of 0. This is over two orders of magnitude different than what we observe in the compiled results table above. Such a discrepancy deserves further investigation and verification.

Examples of Types of Issues Observed:

NOTE THE BELOW INFORMATION HAS HAD THE VOTER PERSONALLY IDENTIFIABLE INFORMATION (“PII”) FICTIONALIZED. WHILE THESE ARE BASED ON REAL DATA TO ILLUSTRATE THE DIFFERENT TYPES OF OBSERVATIONS, THEY DO NOT REPRESENT REAL VOTER INFORMATION.

Example #1: The following set of records has the exact match (Hamming Distance = 0) of full name and full birthdate (including year), but different address and different voter ID numbers AND there was a vote cast from each of those unique voter ID’s in the 2020 General Election.  While it’s remotely possible that two individuals share the exact same name, month, day and year of birth … it is probabilistically unlikely (see section below on mathematical derivation of probabilities if interested), and should warrant further scrutiny.

Voter Record A:

AMY BETH McVOTER 12/05/1970 F 12345 CITIZEN CT

Voter Record B:

AMY BETH McVOTER 12/05/1970 F 5678 McPUBLIC DR

Example #2: This set of records has a single character different (Hamming distance of 1) in their first name, but middle name, last name, birthdate and address are identical AND both records are associated with votes that were cast in the 2020, 2021, and 2022 November General Elections.  While it is possible that this is a pair of 23 year old twins (with same middle names) that live together, it at least bears looking into.

Voter Record A:

TAYLOR DAVID VOTER 02/16/2000 M 6543 OVERLOOK AVE NW

Voter Record B:

DAYLOR DAVID VOTER 02/16/2000 M 6543 OVERLOOK AVE NW

Example #3: This set of records has two characters different (Hamming distance of 2) in their birthdate, but name and address are identical AND the birth years are too close together for a child/parent relationship, AND both records are associated with votes that were cast in the 2020 and 2022 November General Elections. 

Voter Record A:

REGINA DESEREE MACGUFFIN 02/05/1973 F 123 POPE AVE

Voter Record B:

REGINA DESEREE MACGUFFIN 03/07/1973 F 123 POPE AVE

Example #4: This set of records again has a single character different (Hamming distance of 1) in the first name (but not the first letter this time) and the last name, birthdate and address are identical.  There were also multiple votes cast in the 2019 and 2022 November General Elections from these registrants.

Voter Record A:

EDGARD JOHNSON 10/19/1981 M 5498 PAGELAND BLVD

Voter Record B:

EDUARD JOHNSON 10/19/1981 M 5498 PAGELAND BLVD

Example #5: This set of records has two characters different (Hamming distance of 2) in the first and middle names and the last name, birthdate, gender and address are identical.  There were also multiple votes cast in the 2021 and 2022 November General Elections from these registrants. Again it is possible that these records represent a set of twins given the information that ELECT provides.

Voter Record A:

ALANA JAVETTE THOMPSON 01/01/2003 F 123 CHARITY LN

Voter Record B:

ALAYA YAVETTE THOMPSON 01/01/2003 F 123 CHARITY LN

Example #6: The following set of records has an exact match (Hamming Distance = 0) of full name and full birthdate (including year), and the same address but different voter ID numbers.  There were no duplicated votes in the same election detected between the two ID numbers.

Voter Record A:

JAMES TIBERIUS KIRK 03/22/2223 M 1701 Enterprise Bridge

Voter Record B:

JAMES TIBERIUS KIRK 03/22/2223 M 1701 Enterprise Bridge

Example #7: The following set of records has an exact match (Hamming Distance = 0) of full name and full birthdate (including year), the same address but different gender and voter ID numbers.  There were no duplicated votes in the same election detected between the two ID numbers.

Voter Record A:

MAXWELL QUAID CLINGER 11/03/2004 M 4077 MASH DR

Voter Record B:

MAXWELL QUAID CLINGER 11/03/2004 U 4077 MASH DR

Results Dataset:

A full version of the aggregated Excel data is provided below; however, all voter information (ID, first name, middle name, last name, DOB, gender, address) has been removed and replaced by a one-way hash number, with randomized salt, based on the voter ID. The full file with specific voter information can be provided to parties authorized by ELECT to receive and process voter information, Election Officials, or Law Enforcement upon request.

On the mathematical probability of matches:

2023-05-27: I have moved my derivation of the expected value of the number of collisions to a separate post, available here.

Categories
Election Data Analysis Election Forensics Election Integrity technical

Discrepancies between official VA turnout report and Voter History information

Now that the Voter History List and corresponding list of those who voted in the 11-08-2022 November General election has been released by the VA dept. of elections (“ELECT”), we have compiled and compared the results from the official turnout statistics on the ELECT website (here) to the information contained in both the Daily Absentee List (“DAL”) and Voter History List (“VHL”) information.

These are all “official” datasets provided by ELECT, and they should theoretically all match. However, there are a number of discrepancies in the official records that need to be explained. Performing simple differences between the accumulated totals for each locality exposes these issues. For example:

  • Why do the official results on the ELECT website show the total number of ballots cast for Albemarle County to be 11,105 fewer than the number of ballots recorded in the VHL? Why do the DAL file records show 11,429 more cast absentee (Early or Mail) ballots than the ELECT official results in Albemarle County? A: The Albemarle electoral board looked into this matter when we brought it to their attention and they reported that the registrar had made a transcription error and submitted the wrong line when submitting the election results to ELECT. While mistakes do happen, this explanation still raises the question of why ELECT did not catch or flag these discrepancies, and what procedures are in place to prevent, catch or rectify these issues.
  • Why does Richmond City show 25,500 more ballots cast in the official turnout results on the ELECT website than are present in the VHL? Why are there 325 more absentee (Early or Mail) ballots in the ELECT website data than the DAL file?
  • Why is the sum of the absolute value of discrepancies between the VHL and the official results on the ELECT website 69,381?
  • Why is the sum of the absolute value of discrepancies in absentee ballots (Early or Mail) between the DAL and the official results on the ELECT website 28,515?
  • Why do the VHL data records contain zero “Election Day” records for Covington City?
  • … etc
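
The per-locality differences and the summed absolute discrepancy quoted above come from a calculation of the following form. The locality names and counts here are invented purely to show the arithmetic and are not taken from the ELECT datasets.

```python
def discrepancies(official, vhl):
    """Return official-minus-VHL ballot counts for every locality present in either source."""
    localities = sorted(set(official) | set(vhl))
    return {loc: official.get(loc, 0) - vhl.get(loc, 0) for loc in localities}

# Invented counts for three placeholder localities.
official_counts = {"LOCALITY A": 1000, "LOCALITY B": 2500, "LOCALITY C": 400}
vhl_counts = {"LOCALITY A": 1010, "LOCALITY B": 2450, "LOCALITY C": 400}

diffs = discrepancies(official_counts, vhl_counts)
total_abs = sum(abs(d) for d in diffs.values())
print(diffs, total_abs)
```

Summing absolute values rather than signed differences keeps an undercount in one locality from cancelling an overcount in another.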

We have notified a number of local electoral board members, registrars and election integrity groups, and postponed publishing this data until after they were able to validate the existence of these discrepancies, in the hopes that they can discover the source of the discrepancies and rectify the issues.

As a further dive into the data, we also computed and noted the following additional issues. Documentation on these specific discrepancies is available upon request to entities that are able to receive and process election data according to VA law and ELECT requirements. We will not publish the details publicly on this blog as they reveal personal voter information, but will give the summary results below. Please contact us for further information.

  • There were 3,354 records of valid absentee ballots cast recorded in the VHL that did not have corresponding entries (matched by voter ID) in the DAL record.
  • There were also 19,723 records of valid absentee ballots cast recorded in the DAL that did not have corresponding entries (matched by voter ID) in the VHL record.
  • The VHL contained 6 records of ballots cast in the VA Nov 2022 election that do not have a corresponding registration record in the Registered Voter List (RVL) dated 11/23/2022.
  • The VHL contained 93 records of ballots cast in any election that do not have a corresponding registration record in the Registered Voter List (RVL) dated 11/23/2022.
  • The 11/23/2022 statewide RVL contained 1,197 records that had duplicate records with different voter IDs, based on performing an exact match to first name, middle name, last name, suffix, full date-of-birth, and gender.
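The two kinds of checks summarized above (cross-matching by voter ID, and exact-match duplicate detection) can be sketched roughly as follows. This is only a minimal illustration on made-up records, not the actual processing code; the field names (`voter_id`, `first`, `dob`, etc.) are assumptions and do not reflect the real DAL, VHL or RVL column names.

```python
from collections import defaultdict

# Hypothetical voter IDs; real extracts would be loaded from the purchased files.
dal_cast_ids = {"1001", "1002", "1003"}      # IDs with a cast absentee ballot in the DAL
vhl_absentee_ids = {"1002", "1003", "1004"}  # IDs with absentee voter credit in the VHL

# Set differences give the records present in one file but not the other.
in_vhl_not_dal = vhl_absentee_ids - dal_cast_ids
in_dal_not_vhl = dal_cast_ids - vhl_absentee_ids

# Made-up RVL rows; the first two share every identity field but differ in voter ID.
rvl_records = [
    {"voter_id": "1001", "first": "ANN", "middle": "B", "last": "DOE",
     "suffix": "", "dob": "1980-01-02", "gender": "F"},
    {"voter_id": "2001", "first": "ANN", "middle": "B", "last": "DOE",
     "suffix": "", "dob": "1980-01-02", "gender": "F"},
    {"voter_id": "1002", "first": "JOHN", "middle": "", "last": "ROE",
     "suffix": "JR", "dob": "1975-06-30", "gender": "M"},
]

# Exact-match key: first, middle, last, suffix, full date of birth, gender.
IDENTITY_FIELDS = ("first", "middle", "last", "suffix", "dob", "gender")


def duplicate_identities(records):
    """Return identity keys that map to more than one distinct voter ID."""
    ids_by_identity = defaultdict(set)
    for rec in records:
        key = tuple(rec[field] for field in IDENTITY_FIELDS)
        ids_by_identity[key].add(rec["voter_id"])
    return {key: ids for key, ids in ids_by_identity.items() if len(ids) > 1}


duplicates = duplicate_identities(rvl_records)
print(in_vhl_not_dal, in_dal_not_vhl, duplicates)
```

An exact match on all six fields is deliberately strict, so a check like this would miss duplicates that differ by a typo or a missing middle name.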

Background and Methodology:

The DAL file is supposed to track all of the transactions for the non-election day ballots for a specific election. This includes any Mail-In or Early In-Person ballots. (Any non-election day ballot is considered to be an “absentee” ballot in VA.) It is updated daily throughout the course of the election and is finalized once the election is certified. The VA 2022 election was certified by all of the localities by 11-18-2022. The last version of the DAL file we purchased and obtained from ELECT and used for this analysis was dated 12/13/2022, and it had been unchanged for a number of weeks by that point. We downloaded and archived copies of the DAL file multiple times per day in order to track the changes near-real-time over the course of the election cycle. (The day-to-day changes to the DAL records over time are documented in other blog entries.)

The VHL is updated by election officials after the election has concluded by adding the “voter credit” into VERIS (the state election database) for all registered voters that cast a ballot in the election (either absentee or on election day). The VHL contains information covering the last 4 years of election history for each currently registered voter. As of mid-January we were informed that all of the voter credit updates had been completed in the VERIS database for the 2022 General Election. (Why isn’t this done automatically, similar to the DAL file entries, and made available before election certification?) The version of the VHL that we purchased and obtained from ELECT and used for this analysis is from 1/26/2023. We specifically tried to time our purchase of the VHL to occur after the voter credit updates had been completed but before the scheduled maintenance process of removing longstanding “inactive” voters had taken place. We note that even if some “Inactive” voters had been removed from the VHL by the time we purchased it, that should not impact this analysis, as the act of voting in the 2022 election would require the voter’s status to be listed as “Active”.

The “official” turnout report page on the ELECT website (here, again) was downloaded and converted to an excel spreadsheet on 2/4/2023. A note on the website page states that the information content was last updated on 12/06/2022. (We have archived a pdf copy of the site as we downloaded it for reference, attached to the bottom of this post.) All of the local electoral boards had completed their canvass by 11/18/2022 and submitted their information to ELECT, so there is no reason that this information should be missing any data.

The official turnout results from ELECT tabulate the Early Voting, Election Day, By Mail, and Provisional ballots cast and accounted for during the 2022 General Election. The DAL file includes information as to the Mail-In, Early Voting, and Absentee Provisional ballots cast. The VHL contains information as to all voters who cast a ballot in the election, and broken down by Election Day, Absentee (either Early or Mail-In), and Provisionals for each sub-group. Theoretically all of these data files should match exactly.

We note that there is a known issue that could potentially, but rarely, cause the number of ballots reported in the VHL to be strictly less than the number of ballots in the official results reported on the ELECT webpage. Due to the time delay between the close of the election and the update and publication of the VHL, there can be registrants that have been legitimately removed (due to death, etc.) during that interval, and their corresponding voting record is also removed when that happens. In the data presented below, this would manifest as a positive (+) discrepancy, but even so, these numbers should be relatively small.

We also again note that we made every effort to purchase the VHL dataset after all of the voter credit information was updated but before the department of elections performed their annual removal of records that had been inactive for more than 2 federal elections. Even so, if the voter was “inactive” and slated for removal, they wouldn’t be showing up in the count of ballots cast anyway. The act of casting a ballot immediately moves a voter into the “active” category.

The official turnout results are collated by locality, so we took the DAL and VHL information and accumulated their tallies to match the official turnout report breakdown. We then subtracted the DAL and/or VHL results from the official results to determine how much of a discrepancy exists between each of the datasets.
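As a rough illustration of the accumulation step, the sketch below tallies unique voters per locality from DAL-style rows. It is a minimal sketch on made-up data: the field names (`voter_id`, `locality`, `ballot_status`) and the “Issued” status are assumptions used only for illustration, and counting each voter once per locality is one possible way to handle the cumulative, multi-transaction nature of the DAL.

```python
from collections import defaultdict

# Made-up DAL-style rows. "Issued" is a hypothetical non-counted status.
dal_rows = [
    {"voter_id": "1", "locality": "LOCALITY A", "ballot_status": "On Machine"},
    {"voter_id": "2", "locality": "LOCALITY A", "ballot_status": "Issued"},
    {"voter_id": "2", "locality": "LOCALITY A", "ballot_status": "Marked"},
    {"voter_id": "3", "locality": "LOCALITY B", "ballot_status": "Pre-Processed"},
    {"voter_id": "3", "locality": "LOCALITY B", "ballot_status": "Pre-Processed"},
]


def tally_by_locality(rows, statuses):
    """Count unique voters per locality whose ballot status is in `statuses`."""
    voters = defaultdict(set)
    for row in rows:
        if row["ballot_status"] in statuses:
            voters[row["locality"]].add(row["voter_id"])
    return {locality: len(ids) for locality, ids in voters.items()}


# Early In-Person ballots are the "On Machine" records;
# Mail-In ballots are the "Pre-Processed" plus "Marked" records.
dal_early = tally_by_locality(dal_rows, {"On Machine"})
dal_by_mail = tally_by_locality(dal_rows, {"Pre-Processed", "Marked"})
print(dal_early, dal_by_mail)
```

The per-locality tallies can then be lined up against the matching columns of the official turnout report.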

Terminology:

  • delta EV (DAL): This is the change in the Early Vote ballot numbers as calculated by subtracting the DAL “On Machine” ballots from the official turnout report “Early Vote” tally.
  • delta ByMail (DAL): This is the change in the Mail In ballot numbers as calculated by subtracting the DAL (“Pre-Processed” + “Marked”) ballots from the official turnout report “Mail In” tally.
  • Net DAL: The net sum of the DAL differences in each locality
  • Sum Abs Value DAL: The sum of the absolute values of the DAL differences in each locality.
  • delta ED (VHL): This is the change in Election Day ballot numbers as calculated by subtracting the VHL (“Election Day” AND NOT “Provisional”) ballots from the official turnout report “Election Day” tally.
  • delta Provisional (VHL): This is the change in Provisional ballot numbers as calculated by subtracting the VHL “Provisional” ballot total from the official turnout report “Provisional” tally.
  • delta Early or Mail (VHL):  This is the change in Early OR Mail-In ballot numbers as calculated by subtracting the VHL (“Absentee” AND NOT “Provisional”) ballots from the official turnout report (“Early Voting” + “Mail In”) tally.
  • Net VHL: The net sum of the VHL differences for each locality   
  • Sum Abs Value VHL: The sum of the absolute values of the VHL differences for each locality
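The DAL terms defined above can be worked through on a small example. The numbers below are invented purely for illustration (they are not from the spreadsheet), the dictionary keys are hypothetical names, and summing the per-locality absolute values into a statewide figure is an assumption about how the statewide totals were accumulated.

```python
# Invented per-locality tallies, for illustration only.
official = {
    "LOCALITY A": {"early": 100, "by_mail": 50},
    "LOCALITY B": {"early": 200, "by_mail": 80},
}
dal = {
    "LOCALITY A": {"on_machine": 98, "pre_processed": 30, "marked": 25},
    "LOCALITY B": {"on_machine": 200, "pre_processed": 40, "marked": 38},
}


def dal_deltas(off, d):
    """Official tally minus DAL tally, following the terminology list."""
    delta_ev = off["early"] - d["on_machine"]
    delta_by_mail = off["by_mail"] - (d["pre_processed"] + d["marked"])
    return {
        "delta_ev": delta_ev,
        "delta_by_mail": delta_by_mail,
        "net": delta_ev + delta_by_mail,                  # Net DAL
        "sum_abs": abs(delta_ev) + abs(delta_by_mail),    # Sum Abs Value DAL
    }


results = {loc: dal_deltas(official[loc], dal[loc]) for loc in official}
statewide_sum_abs = sum(r["sum_abs"] for r in results.values())

for loc, r in results.items():
    print(loc, r)
print("statewide sum of absolute values:", statewide_sum_abs)
```

The net value lets offsetting errors cancel (a +2 and a -5 net to -3), while the sum of absolute values counts every discrepancy regardless of sign, which is why both are reported.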

Data Results:

We present the data and results in the following EXCEL spreadsheet. Tab 1 is the ELECT website data with differences between the DAL and VHL data on the right hand side. Tab 2 is the accumulated DAL data by locality. Tab 3 is the accumulated VHL data by locality.

The screen capture of the ELECT website data from 2/4/2023 is also here:

https://digitalpollwatchers.org/wp-content/uploads/2023/02/ELECT-Turnout-Results-Screenshot-2023-02-04-1.pdf

Categories
Election Data Analysis Election Forensics Election Integrity programming technical

Distribution of Invalid Voter Addresses and Absentee Ballots in VA 2022 General Election, with Mailing Address Substitution

Forward:

In a previous post I documented the results from a United States Postal Service (USPS) National Change of Address (NCOA) database check with the 2022 Virginia Registered Voter List (RVL) primary address records joined with the Daily Absentee List (DAL) reports of absentee ballots cast. There was a significant public reaction to the fact that over 15K RVL primary addresses associated with voters who cast ballots in the VA 2022 general election were not recognized by the USPS database as valid addresses (among other issues). I reported the data as I found it, but a common response was that my analysis did not account for rural voters who do not have a traditional street address, or who do not have mail receptacles and use PO boxes as their primary delivery mechanism.

The requirements for voter registration and primary address are specified by the VA Constitution, Federal and VA law, and require the following:

  1. The VA constitution (Section II-1 and II-2) and the National Voter Registration Act (NVRA) require that registered voter primary addresses be actual physical (and deliverable) street addresses. The de-facto arbiter of what defines a recognizable and/or deliverable address is the USPS, therefore a street address that is not recognized or is undeliverable according to USPS is not compliant. We should be making every effort to ensure that the primary address associated with each voter is able to be correctly recognized and translated into a deliverable address by the USPS. This may require adjusting VA’s data normalization policies such that input street addresses are correctly mapped to USPS addresses, or a legislative action in order to correct.
  2. There is also the legal restriction that registered VA voters are not allowed to use PO Boxes as their primary registered voter address. There is an exception that “protected” voters are allowed to provide a PO box address to be displayed in public records, but their actual address on file must still be a physical address.
  3. The 2022 VA GREB, section 6.2.3 states that in special cases, a rural voter may supply the name of the highway and enough detail in the comments section of the voter application that the registrar may ascertain where the physical address is. This seems (IMO, but I’m not a lawyer) to contradict the language in the VA constitution and in the NVRA that makes the implicit requirement of deliverable addresses. Also, if the address as entered is not recognized by the USPS NCOA system, then it will generate validation errors every time it is checked against the NCOA database, which VA is currently required to do at least annually. Again, this may require adjusting VA’s data normalization policies such that input street addresses are correctly mapped to USPS addresses, or a legislative action in order to correct.
  4. Additionally the USPS is supposed to recognize “Post Box Street Address” (PBSA) locations such that when someone addresses an envelope or package to the street address of a residence that is served only by delivery to a PO box, the USPS should automatically recognize this and adjust accordingly. Per the NCOA documentation, the USPS NCOA database checks are supposed to be doing this detection and translation already.

And again, to be clear, my point is not to accuse anyone of malicious intent or wrongdoing … I am simply trying to point out that the way we are using and managing our data is discrepant with what the requirements actually are. We either need to change the data and our practices to conform to the legal requirements, or change the law to fit how we are actually (and practically) using the data.

That being said, and in order to show that there are still significant data issues even after adjusting for rural routes, etc., I’ve re-run my NCOA analysis to account for records with invalid primary addresses but that had valid mailing addresses, even if those mailing addresses are PO Boxes. Any RVL record with a primary address that was marked invalid by an NCOA check, but had a different mailing address AND that mailing address was returned from a second NCOA check as NOT-invalid (even if it was a PO Box), was replaced with the mailing address listing and NCOA results. While the requirement is still that primary addresses are supposed to be valid, this re-processing and allowing for primary OR mailing addresses to be used is more in line with the way VA has actually implemented its voter registrations across the state … even though that implementation seems to be at odds with the requirements.
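The substitution rule described above can be sketched as a small function. This is a minimal sketch under assumed record structures, not the actual processing code: the keys `primary`, `mailing`, `address` and `invalid` are hypothetical names, and `invalid` stands in for whatever flag the NCOA check returns.

```python
def effective_address(record):
    """Pick the address (and its NCOA result) to use for a voter record.

    The mailing address replaces the primary address only when the primary
    was flagged invalid, a different mailing address exists, and that mailing
    address did NOT come back invalid from its own NCOA check (a PO Box is
    acceptable here).
    """
    primary = record["primary"]
    mailing = record.get("mailing")
    use_mailing = (
        primary["invalid"]
        and mailing is not None
        and mailing["address"] != primary["address"]
        and not mailing["invalid"]
    )
    return mailing if use_mailing else primary


# Made-up records for illustration only.
records = [
    {"voter_id": "1",
     "primary": {"address": "RT 5 NEAR MILL RD", "invalid": True},
     "mailing": {"address": "PO BOX 12", "invalid": False}},
    {"voter_id": "2",
     "primary": {"address": "10 MAIN ST", "invalid": False},
     "mailing": {"address": "PO BOX 7", "invalid": False}},
    {"voter_id": "3",
     "primary": {"address": "99 NOWHERE LN", "invalid": True},
     "mailing": None},
]

chosen = {r["voter_id"]: effective_address(r)["address"] for r in records}
print(chosen)
```

In this example only voter 1 is switched to the mailing address; voter 2 keeps a valid primary address, and voter 3 stays flagged as invalid because there is no mailing address to fall back on.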

The rest of the analysis is performed the same as before, and is presented in the same format for consistency. I have updated the dates and numerical results as appropriate, but have otherwise kept the format and much of the language and layout the same as the previous analysis.


BLUF (Bottom Line Up Front):

There were 2,164 ballots cast during early voting in the VA 2022 General Election where the voters’ registered (primary or mailing) addresses on record were flagged as “Invalid” by a National Change of Address (NCOA) database check. If we include addresses that were identified as 90-day Vacant the total rises to 4,274. (The previous analysis that used only the primary addresses returned 15,419 invalid records with associated ballots, and went up to 17,244 when including addresses flagged as vacant.)

A certified commercial provider of NCOA data verification was used to facilitate this analysis on raw data obtained from the VA Dept of Elections (“ELECT”). It is not technically possible to obtain a truly time-synchronized complete set of data for any election due to the way elections are run in VA, but we made every effort to obtain the data from the state as close in time as was possible. The NCOA database is maintained and curated by the United States Postal Service (USPS).

For those wishing to review specific entries, or to help validate these issues, and who are part of an organization that is able to receive and handle election information according to VA law and the VA Dept of Elections requirements, you may contact us to request the raw data breakdowns. We will need to validate your organization or employment and will make data available as legally allowed.

Commentary and Discussion:

I would like to be very clear: We are simply presenting the data as compiled to facilitate public discourse. We have strived to only utilize data directly obtained from authoritative sources (ELECT, the USPS via TrueNCOA provider).

The designation of “Invalid” addresses is according to the definition by the USPS and TrueNCOA, i.e. the TrueNCOA check has reported that the addresses as listed in the RVL have no match in the USPS database. Invalid addresses do not include things like valid P.O. Boxes or valid rural addresses that are automatically recognized and translated by USPS to Post Box Street Address (PBSA) records.

The VA Constitution (Section II-1 and II-2) specifies the requirements for voter eligibility, including that voters are required to supply a primary address for their registration record, regardless of their method of voting. VA is required by law to consistently maintain and validate these records. Based on the below analysis, the data shows that there is a small but statistically significant number of “Invalid” addresses associated with voters who cast ballots in the Nov 2022 election.

Continuing EPEC’s mission to promote voter participation, analyze election technology, and educate the public about best practices in managing election technology systems, we are providing the below analysis in order to educate and inform the public, legislators and elections officials about the existence of these discrepancies.

Details:

After receiving the results of a National Change of Address (NCOA) database check on the registration (not the temporary) addresses in the latest VA Registered Voter List (RVL), I’ve gone through and collated the flagged addresses and reconciled them with the entries in the Daily Absentee List (DAL) file records provided by the VA Dept. of Elections (“ELECT”).

The DAL file (dated 2022-12-13) provides a record of all of the voters that cast absentee (either Early In-Person or Mail-In) ballots in the election, and the RVL (dated 2022-11-23) gives all of the registered voter addresses and other pertinent information. Both datasets come directly from the VA Dept. of Elections and must be purchased. Total cost was ~$7000. The two datasets can be tied together using the voter Identification Number that is assigned to each (supposedly) unique voter by the state. Entries in the RVL should be unique to each registered voter (although there are a small number of duplicate voter IDs that I have seen … but that’s for another post), whereas the DAL file can have multiple entries attributed to a single voter recording the various stages of ballot processing.
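Tying the two datasets together is a one-to-many join: one RVL registration record per voter ID, with potentially several DAL transactions attached. The sketch below shows the idea on made-up rows; the field names and the “Issued” status are assumptions for illustration, and keeping only the first RVL record per ID is just one possible way to handle the duplicate IDs mentioned above.

```python
from collections import defaultdict

# Made-up rows for illustration only.
rvl = [
    {"voter_id": "1001", "address": "10 MAIN ST"},
    {"voter_id": "1002", "address": "22 OAK AVE"},
]
dal = [
    {"voter_id": "1001", "ballot_status": "Issued"},      # hypothetical status
    {"voter_id": "1001", "ballot_status": "Marked"},
    {"voter_id": "1003", "ballot_status": "On Machine"},  # no RVL match
]

# Index the RVL by voter ID. setdefault keeps the first record seen per ID,
# so a duplicated ID does not silently overwrite an earlier entry.
rvl_by_id = {}
for rec in rvl:
    rvl_by_id.setdefault(rec["voter_id"], rec)

# Group all DAL transactions under the voter they belong to.
dal_by_voter = defaultdict(list)
for row in dal:
    dal_by_voter[row["voter_id"]].append(row)

# One joined entry per voter that appears in the DAL.
joined = [
    {"voter_id": vid, "registration": rvl_by_id.get(vid), "transactions": rows}
    for vid, rows in dal_by_voter.items()
]
unmatched = [j["voter_id"] for j in joined if j["registration"] is None]
print(len(joined), unmatched)
```

Grouping the DAL side first means each voter is counted once no matter how many processing stages were recorded for their ballot.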

The NCOA check was performed on all addresses in the RVL file in order to detect recent moves, invalid addresses, vacant addresses, P.O. Boxes, commercial addresses, etc. The NCOA check takes multiple days to run using a commercial service provider and was executed between 2022-12-19 and 2022-12-30.

*** As noted above, in this run I substituted those primary addresses that evaluated to “invalid” with their corresponding “mailing” address records that had been recognized as “valid”, if possible. ***

Results:

Raw TrueNCOA Processing result stats on the full RVL dataset:
| NCOA Processing of VA RVL Records | Full RVL 11-23 Primary Addresses (12/30/2022) | % | Unique RVL Mailing Addresses (12/30/2022) | % |
|---|---|---|---|---|
| Records Processed | 6,127,856 | | 216,896 | |
| 18-Month NCOA Moves | 282,669 | 4.61% | 8,247 | 3.80% |
| 48-Month NCOA Moves | 163,931 | 2.68% | 8,099 | 3.73% |
| Moves with no Forwarding Address | 20,913 | 0.34% | 1,415 | 0.65% |
| Total NCOA Moves | 467,513 | 7.63% | 17,761 | 8.19% |
| Vacant Flag | 29,315 | 0.48% | 22,763 | 10.49% |
| DPV Updated/Address Corrected Records | 601,539 | 9.82% | 18,555 | 8.55% |
| DPV Deliverable Records | 5,835,230 | 95.22% | 165,939 | 76.51% |
| DPV Non-Deliverable Records | 185,752 | 3.03% | 35,100 | 16.18% |
| LACS Updated (Rural Address converted to Street Address) | 33,497 | 0.55% | 254 | 0.12% |
| Residential Delivery Indicator | 5,970,531 | 97.43% | 194,342 | 89.60% |
| Addresses matched to the USPS Database | 6,020,983 | 98.26% | 201,040 | 92.69% |
| Invalid Addresses | 107,304 | 1.75% | 15,968 | 7.36% |
| Expired Addresses | 10,822 | 0.18% | 276 | 0.13% |
| Business Move (B) | 340 | 0.01% | 61 | 0.03% |
| Family Move (F) | 116,874 | 1.91% | 5,390 | 2.49% |
| Individual Move (I) | 350,299 | 5.72% | 12,310 | 5.68% |
| General Delivery Address | 159 | 0.00% | 170 | 0.08% |
| High Rise Address | 764,922 | 12.48% | 7,097 | 3.27% |
| PO Box Address | 28,040 | 0.46% | 158,359 | 73.01% |
| Rural Route Address | 81 | 0.00% | 269 | 0.12% |
| Single Family Address | 5,243,321 | 85.57% | 33,582 | 15.48% |
| Unknown | 51,903 | 0.85% | 15,537 | 7.16% |
Reporting as presented from the TrueNCOA data service. The TrueNCOA data dictionary is presented here.
Combining NCOA results of RVL Addresses with the DAL data:
Vacant Addresses:

There were 2,112 records across the state with addresses that have been flagged as (90-day) “Vacant” by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election according to the DAL file. Of those records, 1,542 were Early In-Person and 547 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location. (Note this is actually an increase over the previous results that were not adjusted for potential mailing address matches.)

P.O. Boxes (Non-protected):

There were 13,492 records across the state with addresses that have been flagged as P.O. Box Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election AND were NOT listed as protected entries according to the DAL file. (VA allows for voters who have a legal protective order to list a P.O. Box as their address of record on public documents.) Of those records, 11,426 were Early In-Person and 2,020 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Note, this is actually an increase over the previous results that were not adjusted for potential mailing address matches. This is not unexpected, as there are a number of PO box mailing addresses that have been substituted in for invalid primary street addresses. PO Boxes are not supposed to be allowed as registered voting addresses. (You should talk to your legislators about this discrepancy, because this catch-22 will likely need to be changed by legislative action!)

Invalid Addresses:

There were 2,164 records across the state with addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election. Of those records, 1,535 were Early In-Person and 598 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location. Note, this is actually a significant decrease from the previous results that were not adjusted for potential mailing addresses. This is not unexpected, as there were a number of invalid primary addresses with a valid (even if PO Box) mailing address. (The fact that the primary addresses do not validate with the USPS is its own issue, but we are ignoring that for this analysis as noted above.)

Invalid OR Vacant Addresses:

There were 4,274 records across the state with addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election. Of those records, 3,077 were Early In-Person and 1,143 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.
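The combined “Invalid” OR “Vacant” count is a set union, so a record flagged both ways is counted once. (The statewide figures above, 2,164 invalid and 2,112 vacant versus 4,274 combined, would be consistent with a very small overlap if the combined number was computed this way.) Below is a minimal sketch on made-up data; the flag names, ballot-type labels and dictionary layout are assumptions for illustration only.

```python
from collections import Counter

# Made-up NCOA flags per voter ID, and the ballot type recorded in the DAL.
ncoa_flags = {
    "1": {"invalid"},
    "2": {"vacant"},
    "3": {"invalid", "vacant"},   # flagged both ways
    "4": set(),                   # clean address
    "5": {"invalid"},             # flagged, but no ballot cast
}
ballot_type = {
    "1": "Early In-Person",
    "2": "Mail-In",
    "3": "Mail-In",
    "4": "Early In-Person",
}

# Only keep flagged records that also have a ballot cast.
invalid = {v for v, f in ncoa_flags.items() if "invalid" in f and v in ballot_type}
vacant = {v for v, f in ncoa_flags.items() if "vacant" in f and v in ballot_type}

# Union: each voter counted once even if both flags are set.
invalid_or_vacant = invalid | vacant
breakdown = Counter(ballot_type[v] for v in invalid_or_vacant)

print(len(invalid), len(vacant), len(invalid_or_vacant), dict(breakdown))
```

The union size always equals the two individual counts minus the overlap, which is a quick sanity check on any combined total.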

Record of Moves Out-of-State:

There were 797 records with NCOA moves to valid out-of-state addresses dated before 2022-08 that also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election. Of those records, 346 were Early In-Person and 450 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.
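The out-of-state filter combines three conditions on the NCOA move record plus the presence of a cast ballot. A rough sketch on made-up data follows; the field names (`move_state`, `move_month`, `new_address_valid`) and the `YYYY-MM` date format are assumptions for illustration and are not taken from the TrueNCOA data dictionary.

```python
# Made-up NCOA move records for illustration only.
ncoa_moves = [
    {"voter_id": "1", "move_state": "NC", "move_month": "2022-03", "new_address_valid": True},
    {"voter_id": "2", "move_state": "VA", "move_month": "2021-11", "new_address_valid": True},
    {"voter_id": "3", "move_state": "MD", "move_month": "2022-09", "new_address_valid": True},
    {"voter_id": "4", "move_state": "FL", "move_month": "2022-01", "new_address_valid": False},
    {"voter_id": "5", "move_state": "TX", "move_month": "2021-06", "new_address_valid": True},
]
ballot_cast_ids = {"1", "2", "3", "4"}  # voter "5" did not cast a ballot

CUTOFF = "2022-08"  # zero-padded YYYY-MM strings compare correctly as text


def moved_out_of_state_before_cutoff(move):
    """True for a move to a valid non-VA address dated before the cutoff."""
    return (
        move["move_state"] != "VA"
        and move["new_address_valid"]
        and move["move_month"] < CUTOFF
    )


flagged = [
    m["voter_id"]
    for m in ncoa_moves
    if moved_out_of_state_before_cutoff(m) and m["voter_id"] in ballot_cast_ids
]
print(flagged)
```

Only voter 1 survives all the conditions here: voter 2 moved within VA, voter 3 moved after the cutoff, voter 4's new address did not validate, and voter 5 has no ballot on record.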

Results By District:

District 01:
Invalid Addresses:

There were 213 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 01. Of those records, 170 were Early In-Person and 38 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 364 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 01. Of those records, 288 were Early In-Person and 71 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 02:
Invalid Addresses:

There were 183 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 02. Of those records, 147 were Early In-Person and 33 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 441 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 02. Of those records, 344 were Early In-Person and 90 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 03:
Invalid Addresses:

There were 37 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 03. Of those records, 27 were Early In-Person and 10 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 324 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 03. Of those records, 223 were Early In-Person and 94 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 04:
Invalid Addresses:

There were 116 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 04. Of those records, 104 were Early In-Person and 12 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 318 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 04. Of those records, 256 were Early In-Person and 59 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 05:
Invalid Addresses:

There were 383 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 05. Of those records, 299 were Early In-Person and 81 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 584 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 05. Of those records, 445 were Early In-Person and 134 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 06:
Invalid Addresses:

There were 202 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 06. Of those records, 144 were Early In-Person and 53 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 400 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 06. Of those records, 301 were Early In-Person and 91 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 07:
Invalid Addresses:

There were 243 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 07. Of those records, 169 were Early In-Person and 68 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 370 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 07. Of those records, 274 were Early In-Person and 87 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 08:
Invalid Addresses:

There were 165 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 08. Of those records, 69 were Early In-Person and 94 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 413 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 08. Of those records, 226 were Early In-Person and 183 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 09:
Invalid Addresses:

There were 328 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 09. Of those records, 257 were Early In-Person and 66 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 479 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 09. Of those records, 373 were Early In-Person and 100 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 10:
Invalid Addresses:

There were 174 records with registered addresses that were flagged as “Invalid” by the NCOA check and that also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 10. Of those records, 104 were Early In-Person and 68 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of each marker proportional to the number of records at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 246 records with registered addresses that were flagged as “Invalid” or “Vacant” by the NCOA check and that also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 10. Of those records, 164 were Early In-Person and 82 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of each marker proportional to the number of records at that ZIP+4 location.

District 11:
Invalid Addresses:

There were 122 records with registered addresses that were flagged as “Invalid” by the NCOA check and that also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 11. Of those records, 46 were Early In-Person and 75 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of each marker proportional to the number of records at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 335 records with registered addresses that were flagged as “Invalid” or “Vacant” by the NCOA check and that also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 11. Of those records, 181 were Early In-Person and 152 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of each marker proportional to the number of records at that ZIP+4 location.
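
The Early In-Person and Mail-In counts quoted for Districts 08 through 11 do not always add up to the stated totals. Since each total also covers FWAB and Provisional ballots, the difference is presumably made up of those two types, although the text does not break them out. The sketch below simply performs that subtraction on the figures quoted above; the dictionary layout and variable names are choices made for this example.

```python
# (district, flag set) -> (total, Early In-Person, Mail-In), as quoted in the text.
reported = {
    ("08", "Invalid"): (165, 69, 94),
    ("08", "Invalid or Vacant"): (413, 226, 183),
    ("09", "Invalid"): (328, 257, 66),
    ("09", "Invalid or Vacant"): (479, 373, 100),
    ("10", "Invalid"): (174, 104, 68),
    ("10", "Invalid or Vacant"): (246, 164, 82),
    ("11", "Invalid"): (122, 46, 75),
    ("11", "Invalid or Vacant"): (335, 181, 152),
}

# Records not accounted for by the two quoted ballot types
# (assumed here to be FWAB or Provisional).
remainder = {
    key: total - early_in_person - mail_in
    for key, (total, early_in_person, mail_in) in reported.items()
}

for (district, flags), rest in remainder.items():
    print(f"District {district}, {flags}: {rest} other record(s)")
```

In every case the remainder is small relative to the total, and for the District 10 “Invalid or Vacant” set it is zero.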

Summary Data Files by Locality:

The complete set of graphics and statistics for each locality and each congressional district in VA can be downloaded here as a zip file. The tabulated summary results can also be downloaded in Excel, CSV, or Numbers format: