Category: mathematics

Validating Ed Solomon’s Observations on CO 2020 data

Post author By Jonathan Lareau
Post date May 26, 2025

Recently I was made aware of the work Ed Solomon had been doing with data from the 2020 Colorado Cast Vote Records (CVRs), and I’ve taken some time to replicate and validate some of his data observations. I don’t always agree with Ed, but I wanted to take some time and verify the facts of the matter for myself.

For background, CVRs are machine logs of the way the tabulators process the “cast” ballots. You can think of them as equivalent to your bank statement showing all of the recorded transactions for each ballot scanned. Electronic ballot tabulation systems are required to be able to produce them, and they are used as part of official forensic audits and documentation. They do not contain any personal information and simply record the content of individual ballots as they are processed.

There are 2 specific items that need to be validated here:

1. Odd statistics associated with statewide ballot measures in Arapahoe County as compared to other counties. Specifically, there were two statewide ballot measures (one dealing with taxes, and another on abortion) that one would expect to show a significant partisan split, and we in fact do see such a split in neighboring El Paso and Adams counties. However, the ballot measures do not show the partisan split in Arapahoe County.
- The difference is not just that the partisan split is muted or reduced, it is a night and day difference. In Arapahoe county there is almost no statistical difference between Trump and Biden voters on the ballot measures, but there is an obvious and clear difference on the same ballot measures in neighboring Adams and El Paso counties.
- Why is this important? It raises questions as to the veracity of the election counts, data handling practices, and the ability to use CVRs for their intended forensic purpose.
2. The fact that the Arapahoe County CVR data was changed on the official county website without any notification or explanation around Feb 2025. The internal composition of ballots was changed in the data and “scrambled” by Arapahoe county … with the new version of the CVR files no longer showing the inconsistency from #1.
- As CVRs are official records that are used for legal purposes such as audits etc., they should never be “quietly” changed or modified retroactively. A full and transparent explanation of the issues and steps made to remedy should accompany any updates for official documents such as CVRs.
- This change took place years after the CVR was originally produced, and after Ed Solomon had used this particular CVR as part of his supporting documentation in an election case (Thompson vs Secretary of State NV) in Nevada.
  - The county was fully aware of the use of these records in the Nevada case.
  - The county CVRs had already been used in a previous audit of the 2020 election, where ballots from a specific tabulator and batch were pulled and compared to the cast vote record for accuracy. (see here, and here)
- After Ed and Mark noticed the retroactive modifications and started asking questions, the County released a statement explaining that they were contacted by a researcher about potential issues with their CVR regarding “redaction” and privacy concerns.
  - However, the county statement gives an incompatible date for when the documents were “corrected”, according to the file timestamps and internet archive logs. The statement claims April 2nd 2025, whereas the contents of the uploaded file show Feb 20th 2025 as the file modification date.
  - The county’s explanation does not comport with the observation of the scrambling of internal contents of official ballots. It’s not just that the ordering of ballots was randomized to assuage privacy concerns, but that the actual records of votes cast were being swapped between ballots.
- At BEST this shows a woeful lack of transparency and procedural safeguards by the county.
- At WORST this has the appearance of being intentional tampering with official records.

I can independently replicate and validate both of these data observations. There does seem to be an issue with the ballot measures in the original Arapahoe County CVR data, and that data has been retroactively modified by the county in such a way as to scramble the information associated with votes cast.

Note that Ed Solomon, Draza Smith, Jeff O’Donnell (a.k.a. “The Lone Raccoon”), Mark Cook, MadLiberals, and others all provided data and pointers to the original documents and URLs in question, as well as their own analysis on the X.com platform.

Starting with item # 1:

Arapahoe County makes their CVR data available on their website (https://arapahoevotes.gov/transparency/). To obtain the new (modified) file click on the “Certification & Recounts” tab and scroll down to find the CVR link on the left hand side. Direct link is here: https://www.arapahoevotes.gov/file/2020-general-election-redacted-cast-vote-record

The original data is also still available on the Arapahoe County website, but one needs to do some creative sleuthing via the wayback machine and looking at the URL links in order to get to it, as was done by Mark Cook. (see: here and here)

The link to the original CVR file is: https://gis.arapahoegov.com/DL_Data/Redacted_CVR/Redacted_CVR_Export.zip

The original 2020 CVR data had also been collected and collated by Jeff O’Donnell on his https://votedatabase.com (formerly ordros.com) site, which I archived and versioned in Sept of 2022. I can confirm that the original file matches the files in the votedatabase archive, as well as the current votedatabase site. I used additional files from the votedatabase archive for neighboring Adams and El Paso counties as a source for the rest of this work, as I could not find corresponding links to CVR downloads on the Adams or El Paso county websites.

Ed’s original observation was that the two ballot measures that *should* show a partisan split did not when looking at the original 2020 Arapahoe County CVR data. He used this observation as supporting evidence showing inconsistencies and irregularities in 2020 election data in a court case (Thompson vs Secretary of State NV) he was providing analysis for in Nevada. All of his analysis and the original file have therefore been previously submitted to the court.

Amendment B was a repeal of the “Gallagher Amendment” dealing with property tax rates and was expected to have a highly partisan split. Likewise, Proposition 115 dealt with abortion and was also expected to have a highly partisan split.

If we plot the ballots cast for these two ballot measures, conditioned on whether the voter chose Trump or Biden at the top of the ticket, we do see this partisan split in neighboring El Paso and Adams counties, as shown below. Note the significant spread between the Biden/No (Yellow) & Trump/No (Purple), as well as between the Biden/Yes (Blue) & Trump/Yes (Red).

As can be seen in the plots above from Adams and El Paso counties, there is a significant partisan split in these two down-ballot races when conditioned on how the top of the ticket votes. However this seems to vanish when looking at Arapahoe County, with the Biden/No (Yellow) & Trump/No (Purple) and the Biden/Yes (Blue) & Trump/Yes (Red) stacking almost completely on top of one another.
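The plots themselves are not reproduced here, but the conditioning they rely on can be sketched in a few lines. The following is only a minimal illustration, assuming each CVR row has been loaded as a dictionary; the column names ("President", "Prop115") and the toy ballots are hypothetical and are not taken from the actual files.

```python
from collections import Counter

def conditional_shares(ballots, top_col, measure_col):
    """Return {top_choice: {measure_choice: share}} over ballots voted in both contests."""
    counts = Counter()
    totals = Counter()
    for b in ballots:
        top, meas = b.get(top_col), b.get(measure_col)
        if not top or not meas:  # skip undervotes in either contest
            continue
        counts[(top, meas)] += 1
        totals[top] += 1
    shares = {}
    for (top, meas), n in counts.items():
        shares.setdefault(top, {})[meas] = n / totals[top]
    return shares

# Toy data only (hypothetical column names and values).
ballots = (
    [{"President": "Trump", "Prop115": "Yes"}] * 6
    + [{"President": "Trump", "Prop115": "No"}] * 4
    + [{"President": "Biden", "Prop115": "Yes"}] * 2
    + [{"President": "Biden", "Prop115": "No"}] * 8
)
shares = conditional_shares(ballots, "President", "Prop115")
# A large gap indicates a partisan split; a gap near zero indicates none.
split = shares["Trump"]["Yes"] - shares["Biden"]["Yes"]
print(shares, split)
```

A gap near zero between the two conditional “Yes” shares corresponds to the stacked curves described for Arapahoe County, while a large gap corresponds to the spread seen in Adams and El Paso.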

It can be clearly seen in the plots that the partisan split that was present in the other counties results seems to have completely vanished in Arapahoe.

I will note that the partisan split seems to be missing from almost all down-ballot races that I looked at, not just these two, although these were the ones specifically called out by Ed. This is an important point that I will come back to in a minute.

… And now to item # 2:

Ed’s original observation was submitted as part of his case in Nevada. At one point he and Mark Cook attempted to make a live stream video showing how people could recreate the observations starting from the source documents on the Arapahoe County website, which is when he and Mark realized that the original CVR file on the county website had been quietly replaced with a new file that had its contents scrambled and the results no longer showed the observed pattern.

Note that a CVR file is a legally required forensic record. It is the equivalent of a bank transaction log, and should almost never have its contents manipulated. If an error is discovered, and a correction does need to be issued, then a new file with the corrections should be published alongside the original with a clear and prominent explanation and notification of the change. In this case, however, the County simply replaced the link to the original file with the new file with no explanation and no notice.

It was only after this was discovered, and after Ed started making phone calls to the County and bringing up the issue with the Judge in his Nevada case, that Arapahoe County belatedly published an admission that they had adjusted the file. Their excuse for the modification was that they were made aware of a mistake with their “redaction” of data in the original publication, and were worried about individual privacy.

The (new) altered file did have 16 specific ballots that had their down-ballot races zeroed out, and was missing the “CountingGroup” metadata column. However, the file didn’t just have the down-ballot information omitted for a small number of ballots (16); the internal contents of ALL ballots were also completely scrambled, with the top-of-the-ticket entries for President and Senate and the metadata columns being completely reordered relative to all of the down-ballot information. This split-scrambling also “fixed” the observed issues with Amendment B and Proposition 115, as can be seen below, where there is now a distinct partisan split between the data trends.

This kicked off multiple efforts to reverse engineer the actual changes that were performed on the CVR data by Arapahoe county, by myself and multiple others. Jeff O’Donnell and MadLiberals on X made the observation of the split reordering. I was able to verify this and remove the split-shuffling, exposing the fact that there were 16 ballots that also had all of their down ballot information zeroed out. There were a total of 432 down ballot votes that were removed from 16 specific ballots, followed by ALL of the President and Senate votes for ALL ballots being scrambled in relation to their down-ballot races.
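To illustrate why a split reordering of this kind is invisible in contest totals but visible in cross-tabulations, here is a small sketch. It is not the county’s actual procedure, which is unknown to me: a simple row rotation stands in for whatever permutation was applied, and the column names and toy rows are hypothetical.

```python
from collections import Counter

def split_shuffle(rows, top_cols, shift):
    """Rotate the top-of-ticket columns by `shift` rows relative to all other columns."""
    n = len(rows)
    out = []
    for i, row in enumerate(rows):
        donor = rows[(i + shift) % n]  # row whose top-of-ticket values are borrowed
        new_row = dict(row)
        for c in top_cols:
            new_row[c] = donor[c]
        out.append(new_row)
    return out

def marginal(rows, col):
    """Per-contest totals."""
    return Counter(r[col] for r in rows)

def joint(rows, col_a, col_b):
    """Cross-tabulation of two contests on the same ballot."""
    return Counter((r[col_a], r[col_b]) for r in rows)

# Toy rows (hypothetical column names and values).
rows = [
    {"President": "Trump", "AmendmentB": "No"},
    {"President": "Trump", "AmendmentB": "No"},
    {"President": "Biden", "AmendmentB": "Yes"},
    {"President": "Biden", "AmendmentB": "Yes"},
]
shuffled = split_shuffle(rows, ["President"], shift=1)

same_totals = (marginal(rows, "President") == marginal(shuffled, "President")
               and marginal(rows, "AmendmentB") == marginal(shuffled, "AmendmentB"))
same_crosstab = joint(rows, "President", "AmendmentB") == joint(shuffled, "President", "AmendmentB")
print(same_totals, same_crosstab)
```

Any reordering of one column group relative to another leaves every contest total unchanged, so only analyses that look at combinations of votes within a ballot can detect it.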

Back to the point I made above … the “scrambling” of the new file does seem to have “fixed” the expected partisan nature of most of the down-ballot races, so it is not unreasonable to think that this was actually a “fix” for a processing error on the original CVR file. That being said, the original (assumed incorrect) CVR was used in an audit process of two down-ballot races (linked above). Why did they not catch this issue years earlier during the audit? And why did they make the change to files, under the pretense of privacy issues, without announcing and documenting the errors?

Conclusion:

I can verify the two main data issues documented by Ed Solomon on the Arapahoe County 2020 CVR data.

The original data file had significant issues with down-ballot races not showing the expected partisan splits.

Arapahoe county did “quietly” revise the data without explanation until it was discovered by Ed and Mark, and then when pushed, only acknowledged that there was an issue with redactions and data privacy concerns.

The fact that the modification did correct the expected partisan split for ALL down-ballot races lends some credence to the assertion that they were correcting an error/issue with the original CVR file, however it does not excuse the fact that they performed this correction without notice or explanation. It also does not explain how their 2020 audit was able to use the incorrect original CVR files and not catch any of these issues.

The CVR files are intended to be official forensic records. If they are subject to manipulation and “adjustments” without transparency then that brings into question the validity of those files as forensic devices in the first place.

Corrections:

(5/28/2025) Typo correction in that the original posting of this article had “April 2 2024” as the date the Arapahoe county statement said they changed the CVR. That was corrected to be “April 2 2025”.
(5/28/2025) I had the wrong reference for the associated case in NV. I had originally posted that the case was the “Gilbert” case. It is actually “Thompson vs State”. Links have been updated accordingly.
(5/28/2025) Minor spelling errors corrected.

Election Data Analysis Election Forensics Election Integrity mathematics technical

Differential Invalidation in VA 2024 General Election Data – Presidential Race

Post author By Jonathan Lareau
Post date November 18, 2024

Using data published by the VA Department of Elections (“ELECT”), we plotted the Ballot Invalidation Rate (BIR) vs. the % of vote share for the winner in order to attempt to determine if “Differential Invalidation” of ballots occurred in the 2024 VA General Election. The plotted data appears to show differential invalidation and suggests that there are underlying issues that should be investigated and addressed, including data reliability and consistency issues where the number of reported total votes cast is greater than the number of reported ballots cast for some localities.

Details

“Differential invalidation” takes place when the ballots of one candidate or position are invalidated at a higher rate than for other candidates or positions. Note that differential invalidation does not directly indicate any sort of fraud. It is however indicative of an unfairness or inequality in the rate of incomplete or invalid ballots conditioned on candidate choice. While it could be caused by fraud or malfeasance, it could also be caused by confusing ballot layout, poor procedural controls and uniformity, under-voting (not choosing a candidate) by the voter, or other compounding factors, etc. (ref: [1] ch. 6)

The Free and Fair Hypothesis

In a democratic election, each person’s vote counts the same. There are other requirements, but this is a necessary condition. In the presence of invalidation, the free and fair hypothesis reduces to each person’s vote having the same probability of being invalidated as any other person’s ballot. From a statistical standpoint, this means that the invalidation must be independent of the candidate chosen on the ballot (or of the person voting) [ref: 1, pg. 132]

The data used for this analysis was the “unofficial” election results (the certified results are not yet published), and comes directly from the VA Dept of Elections. The data was downloaded on Nov 18th at 4:34 pm. We purposefully waited to perform this analysis until after the localities had completed their canvass operations, and for the data feeds on the VA Department of Elections (“ELECT”) website to mostly stabilize. The actual certified results will not be available until at least Dec 2 after the State Electoral Board meets to finalize the certification. We will revisit this analysis at that time.

The list of the number of Active Voters, Inactive Voters and total Ballots cast in a given locality can be found here: https://enr.elections.virginia.gov/results/public/Virginia/elections/2024NovemberGeneral/voters

Figure 1: Table of voter registration statistics and number of ballots cast as appeared on the VA Dept of Elections Website on Nov 18 16:34:00 EST at https://enr.elections.virginia.gov/results/public/Virginia/elections/2024NovemberGeneral/voters

The Turnout statistics and Vote Tally reports CSV file reports can be downloaded from here: https://enr.elections.virginia.gov/results/public/Virginia/elections/2024NovemberGeneral/reports
- The direct link to the turnout report used for this computation is here: https://enr.elections.virginia.gov/cdn/results/d2c804ee-4ec2-46bb-91d7-5b41526eab03/Election%20Turnout_bb03ef46-1af5-4f14-86a8-ead1b9094036.csv
- The direct link for the vote tally results report is here: https://enr.elections.virginia.gov/cdn/results/d2c804ee-4ec2-46bb-91d7-5b41526eab03/Election%20Results_4b7d6963-0b3a-4b2f-b331-d9d14ca39bc0.csv

Figure 2: Listing of the links for report CSV files as appeared on the VA Dept of Elections Website on Nov 18 16:34:00 EST at https://enr.elections.virginia.gov/results/public/Virginia/elections/2024NovemberGeneral/reports. Note that additional CSV files for “Election Winners”, “Election Change Log Report”, and “EnrAbsenteeRawCSV” are also available, as well as a complete JSON listing under the “Media Export” link at the bottom of the page.

With this dataset in hand we can know how many ballots were cast, as well as how many votes were counted for each candidate in each race in each locality (at least as reported by the state). For a given race, we can then compute the number of incomplete or invalid ballots by subtracting the total number of votes recorded for that race in the locality from the total number of reported ballots cast.

In accordance with the techniques presented in [1] and [2], we computed the plots of the Invalidation Rate vs the Percent Vote Share for the Winner in an attempt to observe whether there is any evidence of Differential Invalidation ([1], ch 6). This is similar to the techniques presented in [2], which we have used previously to produce election fingerprints that plotted the 2D histograms of the vote share for the winner vs the turnout percentage. (The 2024 versions are coming, just not ready yet.)

Each dot in Figure 3 below represents the ballots from a specific locality. The x axis is the percent vote share for the winner (Harris), and the y axis is the ballot invalidation rate, computed as 100 - 100 * Nvotes / Nballots.

Figure 3: Plot of the Invalidation rate vs the % of vote share for the winner in each locality in the 2024 VA General Election for President.

A few things are immediately apparent from the plot in Figure 3:

1. There is clearly a distinction in the invalidation rate between localities that had low vote share and high vote share for Harris.
- The data for localities where Harris had low vote share do not have a large distribution of invalidation rates, whereas the high vote share localities do.
2. There are a number of localities that are reporting negative invalidation rates. How is this possible, you ask? There are a number of localities in the CSV data that have higher vote totals than the corresponding reported number of total ballots cast in the locality.

This implies that there is something significantly wrong in the data and reporting tools or procedures used by ELECT, as all of this data was pulled nearly simultaneously and therefore the data should be at least self-consistent. While we understand that this is still unofficial data and that new updates may occur over time, at any given point in time the data should at least be self-consistent.
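The invalidation-rate formula and the self-consistency check described above can be sketched as follows. This is a minimal illustration only: the locality names and counts are made up, and the function name is my own.

```python
def invalidation_rate(n_votes, n_ballots):
    """Ballot invalidation rate in percent: 100 - 100 * Nvotes / Nballots."""
    return 100.0 - 100.0 * n_votes / n_ballots

# Made-up example rows: (locality, votes recorded in the race, ballots cast).
localities = [
    ("Locality A", 9800, 10000),
    ("Locality B", 5050, 5000),  # more votes than ballots: not self-consistent
]

rates = {name: invalidation_rate(votes, ballots) for name, votes, ballots in localities}

# A negative rate means the reported votes exceed the reported ballots cast.
inconsistent = [name for name, rate in rates.items() if rate < 0]
print(rates, inconsistent)
```

Any locality landing in the `inconsistent` list has a reported vote total that cannot be reconciled with its reported ballot count, regardless of how the invalidation itself is interpreted.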

Note that there are still a few localities that have not yet had their vote totals reflected in the CSV files from ELECT. Those localities were omitted from this analysis. The combined information from all of the data source files that was used to generate this plot is available below.

Combined Results and Turnout Stats Table Download

In conclusion there does appear to be some indications that differential invalidation occurred in the 2024 VA General Election for President. Due to data inconsistencies and the fact that this data is still officially “unofficial” it is hard to make any definitive conclusions, but these results are suggestive of the existence of multiple underlying issues that need to be examined, understood and/or resolved. We can definitively say, however, that this is yet another example of the data streams from ELECT lacking self-consistency, which is a big problem in and of itself.

References

[1] Forsberg, O.J. (2020). Understanding Elections through Statistics: Polling, Prediction, and Testing (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9781003019695
[2] Klimek, Peter & Yegorov, Yuri & Hanel, Rudolf & Thurner, Stefan. (2012). Statistical Detection of Systematic Election Irregularities. Proceedings of the National Academy of Sciences of the United States of America. 109. 16469-73. https://doi.org/10.1073/pnas.1210722109.


‘Dark’ Transactions in VA’s Voter Registration Data

Post author By Jonathan Lareau
Post date September 14, 2024

EPEC has compared the changes between two purchased full versions of the VA Registered Voter List (RVL) to the content of the Monthly Update Service (MUS) data covering the same temporal period. Of the ID numbers that were added to the RVL, 3,613 (or 1.0589% of total additions) never appear anywhere in the MUS files covering the same temporal period. Of the ID numbers that were removed from the RVL, 3,355 (or 2.4096% of total removals) never appear anywhere in the MUS files covering the same temporal period.

Since mid 2023 EPEC has been purchasing, processing and archiving copies of both the full Registered Voter List (RVL) and the Monthly Update Service (MUS) files which gives the UPDATE, ADD or CANCEL transactions to the voter list throughout the year.

Once a baseline RVL is established, the MUS files can be used to update that baseline in order to keep the list current. That should be all one needs to keep an accurate dataset of the registered voter list using monthly updates … except there is a catch … the MUS for some reason doesn’t quite capture all of the changes that are occurring in the voter list. In fact, we see about 1-2.5% of the ADD or CANCEL transactions between each RVL snapshot are not reflected by any corresponding entries in the MUS.

All of the changes that are made between two different RVL baseline snapshots should be observable in the corresponding MUS files that cover the same time period, and vice versa. The MUS has transaction logs accounting for new registrants, for registrants who move, for removing deceased individuals, for individuals that have had a change in their felon status, for individuals who are determined non-citizen, for administrative updates and corrections, etc. So, in theory, it should be a complete record. However, over the course of working with the VA data files, every so often we have noticed that some transactions seem to be unaccounted for. Therefore, once we had enough data compiled, we decided to test just how well the MUS data actually explains the changes we see between two baseline RVL files.

Method:

For this experiment, we used full RVL snapshots purchased from VA Department of Elections (ELECT) on 2023-06-30 and 2024-08-29, and all of the monthly MUS distributions covering the entire time period in between.

Using the voter ID number field that is present in all datasets, we first determine which ID numbers were added to the 2024 RVL dataset, and which ID numbers were deleted from the 2023 RVL data. We then checked to see how many of those ID numbers appear in any of the MUS data files, for any reason.

Note that this data was processed statewide, such that registrants moving between localities within the state should not affect the total number of computed additions or removals, as the ID numbers should still be present in the datasets, although corresponding locality information may have changed.
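The comparison described in this method reduces to a handful of set operations on voter ID numbers. The sketch below shows the idea on toy data; the ID values and the function and key names are hypothetical, and loading the real RVL and MUS files is left out.

```python
def compare_snapshots(old_ids, new_ids, mus_ids):
    """Set-difference two RVL snapshots and flag changes never seen in the MUS."""
    added = new_ids - old_ids    # IDs present only in the later snapshot
    removed = old_ids - new_ids  # IDs present only in the earlier snapshot
    return {
        "added": added,
        "removed": removed,
        "added_not_in_mus": added - mus_ids,
        "removed_not_in_mus": removed - mus_ids,
    }

# Toy ID sets (hypothetical values).
rvl_2023 = {"V1", "V2", "V3", "V4"}
rvl_2024 = {"V1", "V2", "V5", "V6"}
mus_ids = {"V2", "V3", "V5"}  # every ID that appears in any MUS file, for any reason

result = compare_snapshots(rvl_2023, rvl_2024, mus_ids)
pct_added_dark = 100 * len(result["added_not_in_mus"]) / len(result["added"])
print(result, pct_added_dark)
```

Because the comparison is keyed on the ID number alone and run statewide, a registrant who moves between localities appears in both snapshots and is counted as neither an addition nor a removal.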

Results:

The breakdown of the number of changes that were present in the MUS file over the time period of the RVL snapshots (2023-06-30 through 2024-08-29) is given in Figure 1 below. The MUS data was deduplicated and truncated to only consider transactions with TRANSACTION date information between the dates associated with the RVL datasets. The bars in Figure 1 are logarithmically scaled in the y-axis, with the x-axis representing the NVRAReasonCode given for each transaction in the MUS. The bars are color coded by transaction type. As there are duplicates and oversampling within the collection of MUS files, only the latest transaction for each uniquely identified ID number was used to generate the plot. As can be seen from the various categories along the x-axis of this plot, the data in the MUS logs should be sufficient to capture all of the transactions with the RVL.
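The deduplication and date-window step can be sketched as below. This is an illustration under assumptions: the record layout, the "id", "date" and "type" keys, and the reason-code values are hypothetical (only the NVRAReasonCode field name comes from the MUS data), and I have assumed the window is inclusive at both ends.

```python
from collections import Counter
from datetime import date

def latest_in_window(transactions, start, end):
    """Keep only the latest transaction per ID whose date falls within [start, end]."""
    latest = {}
    for t in transactions:
        if not (start <= t["date"] <= end):  # assumption: inclusive window
            continue
        kept = latest.get(t["id"])
        if kept is None or t["date"] > kept["date"]:
            latest[t["id"]] = t
    return latest

# Toy transactions (hypothetical layout and values).
mus = [
    {"id": "V1", "date": date(2023, 7, 10), "type": "ADD", "NVRAReasonCode": "NEW"},
    {"id": "V1", "date": date(2024, 1, 5), "type": "UPDATE", "NVRAReasonCode": "MOVED"},
    {"id": "V1", "date": date(2024, 1, 5), "type": "UPDATE", "NVRAReasonCode": "MOVED"},    # duplicate
    {"id": "V2", "date": date(2023, 1, 1), "type": "CANCEL", "NVRAReasonCode": "DECEASED"},  # before window
    {"id": "V3", "date": date(2024, 8, 1), "type": "CANCEL", "NVRAReasonCode": "DECEASED"},
]

latest = latest_in_window(mus, date(2023, 6, 30), date(2024, 8, 29))
# Counts per (reason code, transaction type), i.e. the quantities a bar chart would show.
by_reason = Counter((t["NVRAReasonCode"], t["type"]) for t in latest.values())
print(by_reason)
```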

Figure 1: Breakdown of MUS transactions between 2023-06-30 and 2024-08-29

Direct Inspection of the RVL Snapshots:

Performing a simple set-difference between the unique ID numbers present in the 2023-06-30 RVL data vs the 2024-08-29 RVL data shows that there were 341,191 unique IDs added, and 139,232 removed between the two datasets.

Of the ID numbers that were ADDED between the raw RVL snapshots, 3,613 (or 1.0589%) never appear anywhere in the MUS files covering the same temporal period.

Of the 3,613 ID numbers that were ADDED between the raw RVL snapshots, and that don’t appear in the MUS record, 537 (or 14.863%) have at least one entry in the Voter History List (VHL) data that EPEC has been collecting and archiving.

Of the ID numbers that were REMOVED between the raw RVL snapshots, 3,355 (or 2.4096%) never appear anywhere in the MUS files covering the same temporal period.

Of the 3,355 ID numbers that were REMOVED between the raw RVL snapshots, and that don’t appear in the MUS record, 2,011 (or 59.94%) have at least one entry in the VHL data that EPEC has been collecting and archiving.

Using the MUS-Adjusted RVL baseline

If we ignore the 2024-08-29 dataset, and instead directly apply the transactions in the MUS datafiles to the 2023-06-30 dataset in order to create a new RVL list, we would end up with 342,888 additions and 137,849 removals of unique voter ID numbers. We see 1,697 more (342,888-341,191=1,697) additions when trying to directly apply the MUS than when directly comparing RVL snapshots, and 1,383 fewer (139,232-137,849=1,383) removals. Keep in mind these discrepancies are in addition to the 3,613 and 3,355 discrepancies using the RVL snapshot baselines, as the ID numbers in each set are unique. So the total number of discrepancies is 3,613 + 3,355 + 1,697 + 1,383 = 10,048 records.
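For readers who want to re-derive the arithmetic, the following short check uses only the counts quoted in this post; the variable names are my own.

```python
# Counts quoted in the text.
num_added, num_removed = 341_191, 139_232            # RVL snapshot set-differences
added_not_in_mus, removed_not_in_mus = 3_613, 3_355  # changes never seen in the MUS
mus_additions, mus_deletions = 342_888, 137_849      # MUS applied to the 2023 baseline

pct_added = 100 * added_not_in_mus / num_added
pct_removed = 100 * removed_not_in_mus / num_removed

extra_additions = mus_additions - num_added   # additions implied by MUS but not by the RVL diff
fewer_removals = num_removed - mus_deletions  # removals in the RVL diff but not implied by MUS

total_discrepancies = (added_not_in_mus + removed_not_in_mus
                       + extra_additions + fewer_removals)

print(round(pct_added, 4), round(pct_removed, 4))
print(extra_additions, fewer_removals, total_discrepancies)
```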

Summary of these results:

                   Num_Added: 341191
        Num_Added_not_in_MUS: 3613
        Pct_Added_not_in_MUS: 1.0589
  Num_Added_not_in_MUS_wVHL: 537
                 Num_Removed: 139232
      Num_Removed_not_in_MUS: 3355
      Pct_Removed_not_in_MUS: 2.4096
Num_Removed_not_in_MUS_wVHL: 2011
           MUS_Num_Deletions: 137849
           MUS_Num_Additions: 342888
             MUS_Num_Updates: 946248
            NUS_Num_NOOP_ADD: 651
         NUS_Num_NOOP_MODIFY: 334282

Discussion:

We do not yet understand the origin of these discrepancies. It could be a coding error on the part of the developers of the VERIS system, or it could be that there is a category of data adjustments that is not adequately reflected in the RVL or MUS data products. The RVL snapshots are supposed to be the authoritative record of the voter registration data, and the MUS data updates are supposed to capture all of the transactional changes to said registration records.

Regardless of the cause of the discrepancy, the fact remains that there are a small number of transactions and changes to the voter record that are unobservable. They are, in effect, “dark” transactions in the voter registration data that cannot be observed, validated or verified.


Identification of 2,502 Potential Matches of Active Voter Registrations Between FL and VA Voter Registration Lists

Post author By Jonathan Lareau
Post date June 1, 2024

Building off of our previous work on computing the string distance between all possible pairs of registered voter records in a single state in order to identify potential matches, we’ve updated the code to allow for cross-state comparisons. The first states that we ran this on were VA and FL, using the dataset produced by the FL Department of Elections on 05-07-2024, and the dataset from the VA Department of Elections dated 05-01-2024. There were a total of 2,502 records that matched our constraints between the FL and VA datasets, as detailed below.

Note: All examples of data records given in this writeup have been fictionalized to protect registered voter identities from being published on this website, and only serve as illustrative examples representative of the nature of properties and characteristics discussed. Law enforcement, election or other gov officials, or individuals otherwise authorized to receive and handle voter data as per VA law and the VA Department of Elections are welcome to contact us for specific details and further information.

Each dataset had the First Name, Middle Initial, Last Name, Suffix, Gender, and Year, Month and Day of Birth concatenated into strings that were then compared against each other using the Levenshtein String Distance measure as an initial filtering method to determine potential matches.
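As an illustration of this first filtering step, here is a minimal sketch of building the concatenated identity string and computing the Levenshtein distance with the standard dynamic-programming recurrence. The record keys and the space-separated layout of the string are assumptions on my part, and the sample records are the fictionalized ones used in this writeup.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions or substitutions."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution (free if equal)
        prev = cur
    return prev[-1]

def identity_key(rec):
    """Concatenate the name, gender and DOB fields, skipping empty ones (assumed layout)."""
    fields = [rec.get(k, "") for k in ("first", "middle", "last", "suffix", "gender", "dob")]
    return " ".join(f for f in fields if f)

# Fictionalized records with hypothetical keys.
fl = {"first": "BENNIE", "last": "DAS", "gender": "M", "dob": "05/14/1945"}
va_misspelled = {"first": "BENNEE", "last": "DAS", "gender": "M", "dob": "05/14/1945"}
va_with_initial = {"first": "BENNIE", "middle": "C", "last": "DAS", "gender": "M", "dob": "05/14/1945"}

d_misspelled = levenshtein(identity_key(fl), identity_key(va_misspelled))
d_initial = levenshtein(identity_key(fl), identity_key(va_with_initial))
print(d_misspelled, d_initial)
```

A single misspelled letter costs one edit, while an added middle initial costs two (the letter plus its separating space), which is why a threshold of two characters tolerates both kinds of variation.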

Additionally, for each pair we computed the minimum string distance measure between all four possible pairings of the Primary and Mailing addresses in each record between the states. We required that this minimum distance for a set of registration entries be less than or equal to 12 characters. The choice of the value of twelve was empirically determined after review of the data, as it is loose enough to allow for common variations in address presentation while not being so loose as to be overwhelmed with false positives.

We additionally filtered these findings for only those pairings that were of ACTIVE registrations in both datasets AND where the year, month and day of birth were exact matches.

In summary the 2,502 matches were generated according to the following constraints:

- Only applied to ACTIVE voter registrations
- Required complete DOB (year, month and day) to exactly match
- Required [First Name + Middle Initial + Last Name + Suffix + Gender + DOB] strings to be similar to within <=2 characters
- Required that the minimum distance between any pairwise combination of the Primary or Mailing address between the records be less than or equal to 12 characters.
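The four constraints above can be combined into a single predicate, sketched below. This is not our production code: the record keys ("status", "dob", "key", "primary", "mailing") are hypothetical, and the edit-distance helper is a compact Levenshtein implementation included only so the example runs on its own.

```python
from itertools import product

def edit_distance(a, b):
    """Compact Levenshtein distance (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def min_address_distance(rec_a, rec_b):
    """Minimum distance over the four Primary/Mailing address pairings."""
    pairs = product((rec_a["primary"], rec_a["mailing"]),
                    (rec_b["primary"], rec_b["mailing"]))
    return min(edit_distance(x, y) for x, y in pairs)

def is_potential_match(fl, va, max_name=2, max_addr=12):
    """Apply the four constraints: ACTIVE, exact DOB, name string, address."""
    if fl["status"] != "ACTIVE" or va["status"] != "ACTIVE":
        return False
    if fl["dob"] != va["dob"]:
        return False
    if edit_distance(fl["key"], va["key"]) > max_name:
        return False
    return min_address_distance(fl, va) <= max_addr

# Fictionalized records with hypothetical keys.
fl_rec = {"status": "ACTIVE", "dob": "08/19/1968", "key": "SOUXIEE Q SMITH F 08/19/1968",
          "primary": "1267 SLEEPY SONG PLACE SPRINGFIELD VA 22150",
          "mailing": "1267 SLEEPY SONG PLACE SPRINGFIELD VA 22150"}
va_rec = {"status": "ACTIVE", "dob": "08/19/1968", "key": "SOUXIEE Q SMITH F 08/19/1968",
          "primary": "1267 SLEEPY SONG PL SPRINGFIELD VA 221504259",
          "mailing": "1267 SLEEPY SONG PL SPRINGFIELD VA 221504259"}

matched = is_potential_match(fl_rec, va_rec)
addr_dist = min_address_distance(fl_rec, va_rec)
print(matched, addr_dist)
```

Checking the cheap exact-match conditions (status and DOB) before any string distance is computed keeps the number of expensive comparisons down when every FL record is paired against every VA record.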

It should be noted that it is readily apparent from reviewing the potential matched records that the majority of these matches look to have originated in FL and then were subsequently moved to VA, but the FL record remained listed as active.

Category 1 Matches:

There were 698 matches in Category 1: where the Levenshtein distance measure for the name and DOB was equal to 0 (exact match) and the minimum address distance was also 0 (also an exact match). Examples in this category are exact matches for every considered field. An example is given below.

FL Active Registration Record:
SOUXIEE Q SMITH F 08/19/1968
1267 SLEEPY SONG PL SPRINGFIELD VA 22150

VA Active Registration Record:
SOUXIEE Q SMITH F 08/19/1968
1267 SLEEPY SONG PL SPRINGFIELD VA 22150

Category 2 Matches:

There were 1,533 matches in Category 2: where the Levenshtein distance measure for the name and DOB was equal to 0 (exact match) and the minimum address distance was greater than 0, but less than or equal to 12. Examples in this category commonly have differences in how the zip code, apartment numbers or state code is presented in either the Primary or Mailing address strings. An example is given below.

FL Active Registration Record:
SOUXIEE Q SMITH F 08/19/1968
1267 SLEEPY SONG PLACE SPRINGFIELD VA 22150

VA Active Registration Record:
SOUXIEE Q SMITH F 08/19/1968
1267 SLEEPY SONG PL SPRINGFIELD VA 221504259

Category 3 Matches:

There were 44 matches in Category 3: where the Levenshtein distance measure for the name and DOB was equal to 1 and the minimum address distance was equal to 0 (exact match). Examples in this category are most often due to hyphenation or misspellings in the name, or a change in Gender (e.g. from “M” to “U”). An example is given below.

FL Active Registration Record:
BENNIE DAS M 05/14/1945
12345 PEPPERMINT PATTY CREST APT 1000 ASHBURN VA 201475724

VA Active Registration Record:
BENNEE DAS M 05/14/1945
12345 PEPPERMINT PATTY CREST APT 1000 ASHBURN VA 201475724

Category 4 Matches:

There were 140 matches in Category 4: where the Levenshtein distance measure for the name and DOB was equal to 1 and the minimum address distance was greater than 0, but less than or equal to 12. Examples in this category are most often due to hyphenation or misspellings in the name, or a change in Gender (e.g. from “M” to “U”), as well as small differences in how the addresses are presented. An example is given below.

FL Active Registration Record:
BENNIE DAS M 05/14/1945
1267 SLEEPY SONG PLACE SPRINGFIELD VA 22150

VA Active Registration Record:
BENNEE DAS M 05/14/1945
1267 SLEEPY SONG PL SPRINGFIELD VA 221504259

Category 5 Matches:

There were 19 matches in Category 5: where the Levenshtein distance measure for the name and DOB was equal to 2 and the minimum address distance was equal to 0 (exact match). Examples in this category are most often due to a middle name/initial being present in one record and not being present in the other. An example is given below.

FL Active Registration Record:
BENNIE DAS M 05/14/1945
12345 PEPPERMINT PATTY CREST APT 1000 ASHBURN VA 201475724

VA Active Registration Record:
BENNIE C DAS M 05/14/1945
12345 PEPPERMINT PATTY CREST APT 1000 ASHBURN VA 201475724

Category 6 Matches:

There were 68 matches in Category 6: where the Levenshtein distance measure for the name and DOB was equal to 2 and the minimum address distance was greater than 0, but less than or equal to 12. Examples in this category are most often due to a middle name/initial being present in one record and not being present in the other, as well as small differences in how the addresses are presented. An example is given below.

FL Active Registration Record:
BENNIE C DAS M 05/14/1945
1267 SLEEPY SONG PLACE SPRINGFIELD VA 22150

VA Active Registration Record:
BENNIE DAS M 05/14/1945
1267 SLEEPY SONG PL SPRINGFIELD VA 221504259
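The six categories are simply the combinations of the name/DOB distance (LD = 0, 1 or 2) and whether the minimum address distance (AD) is zero or not. A minimal sketch of that bucketing is shown below; the function name is hypothetical, and the per-category counts are copied from the Grand Total row of the tables in this post.

```python
def match_category(ld: int, ad: int) -> int:
    """Map a (name/DOB distance, min address distance) pair to Category 1..6."""
    if not 0 <= ld <= 2:
        raise ValueError("name/DOB distance must be 0, 1 or 2")
    if not 0 <= ad <= 12:
        raise ValueError("address distance must be between 0 and 12")
    # Odd categories are exact address matches, even categories are near matches.
    return 2 * ld + (1 if ad == 0 else 2)


# Category counts as reported in the Grand Total row of the results tables.
reported_counts = {1: 698, 2: 1533, 3: 44, 4: 140, 5: 19, 6: 68}
print(sum(reported_counts.values()))  # total number of matches across categories
```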

Table of Results by VA Locality:

Row Labels	LD=0, AD=0	LD=0, 0<AD<=12	LD=1, AD=0	LD=1, 0<AD<=12	LD=2, AD=0	LD=2, 0<AD<=12
ACCOMACK COUNTY	3	8	1	1	0	0
ALBEMARLE COUNTY	13	24	0	1	0	0
ALEXANDRIA CITY	15	52	1	6	1	1
ALLEGHANY COUNTY	1	3	0	1	0	0
AMELIA COUNTY	2	2	0	0	0	0
AMHERST COUNTY	3	2	0	0	0	0
APPOMATTOX COUNTY	5	0	0	0	1	0
ARLINGTON COUNTY	27	53	2	8	2	6
AUGUSTA COUNTY	3	8	0	1	1	0
BEDFORD COUNTY	4	15	0	1	0	0
BOTETOURT COUNTY	7	2	0	0	0	0
BRISTOL CITY	3	2	0	0	0	0
BRUNSWICK COUNTY	1	2	0	0	0	0
BUCHANAN COUNTY	1	0	0	0	0	0
BUCKINGHAM COUNTY	0	1	0	0	0	0
CAMPBELL COUNTY	2	3	1	1	0	0
CAROLINE COUNTY	0	2	0	0	0	0
CARROLL COUNTY	1	6	0	1	0	0
CHARLOTTE COUNTY	1	4	0	0	0	0
CHARLOTTESVILLE CITY	4	6	0	0	0	1
CHESAPEAKE CITY	27	87	4	13	1	4
CHESTERFIELD COUNTY	28	49	2	5	0	3
CLARKE COUNTY	0	2	0	0	0	0
COLONIAL HEIGHTS CITY	0	1	1	0	0	0
CRAIG COUNTY	2	1	0	0	0	0
CULPEPER COUNTY	6	8	0	0	0	0
CUMBERLAND COUNTY	2	0	0	0	0	0
DANVILLE CITY	2	1	0	0	0	0
DICKENSON COUNTY	1	3	0	0	0	0
DINWIDDIE COUNTY	0	3	0	1	0	0
ESSEX COUNTY	2	0	0	0	0	0
FAIRFAX CITY	3	6	0	0	0	0
FAIRFAX COUNTY	108	259	7	14	4	15
FALLS CHURCH CITY	2	2	0	0	0	1
FAUQUIER COUNTY	4	14	1	0	0	0
FLOYD COUNTY	1	1	1	0	0	0
FLUVANNA COUNTY	2	3	0	2	0	0
FRANKLIN CITY	3	1	0	0	0	0
FRANKLIN COUNTY	5	6	0	1	0	1
FREDERICK COUNTY	10	9	0	2	0	0
FREDERICKSBURG CITY	1	7	0	0	0	0
GALAX CITY	2	0	0	0	0	0
GILES COUNTY	0	0	0	1	0	0
GLOUCESTER COUNTY	6	17	0	1	1	0
GOOCHLAND COUNTY	2	2	1	0	1	0
GRAYSON COUNTY	1	3	0	1	0	0
GREENE COUNTY	0	5	0	0	0	0
HALIFAX COUNTY	1	2	0	1	0	0
HAMPTON CITY	10	16	0	6	0	0
HANOVER COUNTY	2	6	1	2	1	0
HARRISONBURG CITY	1	6	0	1	0	0
HENRICO COUNTY	24	33	0	3	0	1
HENRY COUNTY	3	5	0	1	0	0
ISLE OF WIGHT COUNTY	4	13	0	1	0	2
JAMES CITY COUNTY	23	25	1	1	0	0
KING GEORGE COUNTY	2	4	1	0	0	1
KING WILLIAM COUNTY	2	0	0	0	0	0
LANCASTER COUNTY	2	1	1	0	0	1
LEE COUNTY	3	1	0	0	0	0
LEXINGTON CITY	0	2	0	0	0	0
LOUDOUN COUNTY	29	73	1	1	2	2
LOUISA COUNTY	5	2	0	0	0	0
LYNCHBURG CITY	6	15	0	2	0	0
MADISON COUNTY	2	0	0	0	0	0
MANASSAS CITY	3	0	0	0	0	0
MANASSAS PARK CITY	1	0	0	0	0	0
MARTINSVILLE CITY	2	1	0	0	0	0
MATHEWS COUNTY	0	3	0	0	0	0
MECKLENBURG COUNTY	3	2	0	0	0	0
MIDDLESEX COUNTY	0	4	0	1	0	0
MONTGOMERY COUNTY	6	11	1	1	0	0
NELSON COUNTY	1	2	0	1	0	0
NEW KENT COUNTY	0	6	0	0	0	0
NEWPORT NEWS CITY	8	17	0	1	0	2
NORFOLK CITY	14	58	0	11	0	1
NORTHUMBERLAND COUNTY	2	1	1	0	0	0
NOTTOWAY COUNTY	0	1	0	0	0	0
ORANGE COUNTY	5	6	1	0	0	0
PAGE COUNTY	1	2	0	0	0	0
PATRICK COUNTY	0	2	0	0	0	0
PETERSBURG CITY	2	1	0	0	0	0
PITTSYLVANIA COUNTY	3	7	0	1	0	0
POQUOSON CITY	1	0	0	0	0	0
PORTSMOUTH CITY	5	9	1	1	0	0
POWHATAN COUNTY	2	2	0	1	0	0
PRINCE EDWARD COUNTY	0	2	0	0	0	0
PRINCE GEORGE COUNTY	1	1	1	1	0	1
PRINCE WILLIAM COUNTY	40	83	2	11	3	3
PULASKI COUNTY	2	2	0	0	0	0
RADFORD CITY	0	2	0	0	0	0
RAPPAHANNOCK COUNTY	0	2	1	0	0	0
RICHMOND CITY	12	29	1	3	0	0
ROANOKE CITY	14	12	1	2	0	0
ROANOKE COUNTY	14	15	0	0	0	1
ROCKBRIDGE COUNTY	2	2	2	0	0	0
ROCKINGHAM COUNTY	1	5	0	1	0	1
RUSSELL COUNTY	0	3	0	0	0	1
SALEM CITY	2	1	0	0	0	0
SCOTT COUNTY	2	0	0	0	0	0
SHENANDOAH COUNTY	0	1	0	1	0	1
SMYTH COUNTY	1	2	0	0	0	0
SOUTHAMPTON COUNTY	0	2	0	1	0	0
SPOTSYLVANIA COUNTY	10	19	1	1	0	0
STAFFORD COUNTY	20	48	0	4	0	4
STAUNTON CITY	1	2	0	0	0	0
SUFFOLK CITY	12	31	0	0	0	1
TAZEWELL COUNTY	0	5	0	1	0	0
VIRGINIA BEACH CITY	46	177	1	11	1	12
WARREN COUNTY	2	4	0	0	0	0
WASHINGTON COUNTY	3	5	1	1	0	0
WAYNESBORO CITY	1	3	0	0	0	0
WESTMORELAND COUNTY	5	2	0	0	0	1
WILLIAMSBURG CITY	1	1	0	0	0	0
WINCHESTER CITY	0	6	0	0	0	0
WISE COUNTY	0	7	0	0	0	0
WYTHE COUNTY	0	0	0	1	0	0
YORK COUNTY	12	35	2	2	0	0
Grand Total	698	1533	44	140	19	68

Tabulated Results by FL County Code:

Row Labels	LD=0, AD=0	LD=0, 0<AD<=12	LD=1, AD=0	LD=1, 0<AD<=12	LD=2, AD=0	LD=2, 0<AD<=12
MON	2	20	0	1	0	0
ALA	0	23	0	2	0	0
BAK	0	2	0	0	0	0
BAY	7	40	0	4	1	0
BRA	2	2	0	0	0	0
BRE	41	39	1	1	2	3
BRO	12	95	0	6	0	8
CHA	71	14	6	1	2	1
CIT	1	6	0	1	0	0
CLA	7	47	2	5	0	3
CLL	1	52	0	1	0	1
CLM	0	0	0	1	0	0
DAD	50	59	2	6	2	1
DES	1	1	0	0	0	0
DUV	28	114	4	21	1	9
ESC	19	103	1	10	0	3
FLA	5	11	0	1	2	2
FRA	1	1	0	0	0	0
GAD	1	0	0	1	0	0
GLA	1	0	0	0	0	0
GUL	0	4	0	0	0	0
HAM	3	0	0	0	0	0
HAR	3	1	0	0	0	0
HEN	1	0	0	0	0	0
HER	8	16	0	2	0	1
HIG	0	1	0	0	0	0
HIL	29	65	2	10	1	4
HOL	0	1	0	0	0	0
IND	9	11	1	0	1	0
JAC	0	2	0	0	0	0
LAK	1	10	0	1	0	1
LEE	0	46	0	3	0	1
LEO	35	9	2	0	1	0
LEV	3	0	1	0	0	0
MAD	0	0	1	0	0	0
MAN	31	21	1	1	0	1
MRN	26	16	0	1	0	1
MRT	40	6	2	2	1	1
NAS	4	12	0	1	0	0
OKA	50	31	3	0	1	2
OKE	1	0	0	0	0	0
ORA	1	139	0	9	0	4
OSC	4	15	1	0	0	0
PAL	35	89	3	10	0	2
PAS	0	30	0	3	0	1
PIN	4	88	0	6	0	3
POL	0	62	0	9	0	2
PUT	2	1	0	0	0	0
SAN	13	42	0	3	0	2
SAR	17	18	1	1	2	0
SEM	53	34	5	3	0	3
STJ	8	22	1	5	0	3
STL	60	20	4	2	2	1
SUM	2	29	0	3	0	1
SUW	3	3	0	0	0	0
TAY	0	2	0	0	0	0
VOL	0	51	0	3	0	3
WAK	1	1	0	0	0	0
WAL	1	6	0	0	0	0
Grand Total	698	1533	44	140	19	68

Addendum + Updates:

In response to a number of questions we have received on this topic, and continued work to dig into this data:

The number of matches above has been corrected from the original 2,527 to 2,502 (a difference of 25) due to a “fat-finger” error in tallying the total number of category 5 matches.
For the strict constraints given above, the number of matched records where there is a vote recorded for the same election date in both the VA and FL data is 13.
We also computed the number of exact [First Name + Middle Initial + Last Name + Gender + Full DOB] matches without requiring our additional address filter. This criterion is stricter in the initial match, but looser in the subsequent filtering.
- This results in a total of 17,701 matches when considering only Active voters on each of the FL and VA voter lists.
  - There are 343 of these matches where both FL and VA records have a history of votes cast in the same election.
- The number jumps to 81,155 if we consider either Active or Inactive registrations.
  - There are 382 of these matches where both FL and VA records have a history of votes cast in the same election.

VA 2024 March Primary Election Fingerprints

Post author By Jonathan Lareau
Post date March 8, 2024

Abstract

Examining the Election Night Reporting data from the VA 2024 March Democratic and Republican primaries provides supporting evidence that the Republican primary was impacted and skewed by a large number of Democratic “crossover” voters, resulting in an irregular election fingerprint when the data is plotted.

Background

The US National Academy of Sciences (NAS) published a paper in 2012 titled “Statistical detection of systematic election irregularities.” [1] The paper asked the question, “How can it be distinguished whether an election outcome represents the will of the people or the will of the counters?” The study reviewed the results from elections in Russia and other countries, where widespread fraud was suspected. The study was published in the proceedings of the National Academy of Sciences as well as referenced in multiple election guides by USAID [2][3], among other citations.

The study authors’ thesis was that with a large sample of the voting data, they would be able to see whether or not voting patterns deviated from the voting patterns of elections where there was no suspected fraud. The results of their study showed that there were indeed significant deviations from the expected, normal voting patterns in the elections where fraud was suspected, and also provided a number of interesting insights into the associated “signatures” of various electoral mechanisms as they present themselves in the data.

Statistical results are often graphed, to provide a visual representation of how normal data should look. A particularly useful visual representation of election data, as utilized in [1], is a two-dimensional histogram of the percent voter turnout vs the percent vote share for the winner, or what I call an “election fingerprint”. Under the assumptions of a truly free and fair election, the expected shape of the fingerprint is that of a 2D Gaussian (a.k.a. a “Normal”) distribution [4]. The obvious caveat here is that no election is ever perfect, but with a large enough sample size of data points we should be able to identify large scale statistical properties.

In many situations, the results of an experiment follow what is called a ‘normal distribution’. For example, if you flip a coin 100 times and count how many times it comes up heads, the average result will be 50. But if you do this test 100 times, most of the results will be close to 50, but not exactly. You’ll get almost as many cases with 49, or 51. You’ll get quite a few 45s or 55s, but almost no 20s or 80s. If you plot your 100 tests on a graph, you’ll get a well-known shape called a bell curve that’s highest in the middle and tapers off on either side. That is a normal distribution.
https://news.mit.edu/2012/explained-sigma-0209

In a free and fair election, the plotted graphs of both the Turnout percentage and the percentage of Vote Share for Election Winner should (again … ideally) both resemble Gaussian “Normal” distributions; and their combined distribution should also follow a 2-dimensional Gaussian (or “normal”) distribution. Computing this 2 Dimensional joint distribution of the % Turnout vs. % Vote Share is what I refer to as an “Election Fingerprint”.
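As a rough illustration of how such a fingerprint can be computed, the sketch below bins each reporting unit (locality or precinct) by its turnout and winner vote share. It is a minimal sketch under stated assumptions, not the code behind the figures in this post: the function name and input layout are hypothetical, each unit is counted once regardless of its size, and no blurring filter is applied.

```python
def election_fingerprint(units, n_bins=50):
    """Build a 2D histogram of turnout (columns) vs. winner vote share (rows).

    units: iterable of (registered_voters, ballots_cast, votes_for_winner).
    Returns an n_bins x n_bins grid of unit counts, indexed grid[share_bin][turnout_bin].
    """
    grid = [[0] * n_bins for _ in range(n_bins)]
    for registered, cast, winner in units:
        if registered <= 0 or cast <= 0:
            continue  # skip units with no registrants or no ballots
        turnout = cast / registered
        share = winner / cast
        # Values at or above 100% are clamped into the last bin. A real
        # analysis would likely want to flag turnout above 100% separately.
        col = min(int(turnout * n_bins), n_bins - 1)
        row = min(int(share * n_bins), n_bins - 1)
        grid[row][col] += 1
    return grid


# Hypothetical example units: (registered, cast, votes for winner).
example_units = [(1000, 550, 300), (1000, 550, 300), (200, 170, 160)]
fingerprint = election_fingerprint(example_units, n_bins=10)
for row in reversed(fingerprint):  # print with high vote share at the top
    print(row)
```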

Figure 1 reprints examples from the referenced National Academy of Sciences paper. The actual election results in Russia, Uganda and Switzerland appear in the left column; the right column is the modeled expected appearance in a fair election with little fraud; and the middle column is the researchers’ model of the as-collected data, with any possible fraud mechanisms included.

Figure 1: NAS Paper Results (reprinted from [1])

As you can see, the election in Switzerland (assumed fair) shows a range of voter turnout, from approximately 30 – 70% across voting districts, and a similar range of votes for the winner. The Switzerland data is consistent across models, and does not show any significant irregularities.

What do the clusters mean in the Russia 2011 and 2012 elections? Of particular concern are the top right corners, showing nearly 100% turnout of voters, and nearly 100% of them voted for the winner.

Both of those events (more than 90% of registered voters turning out to vote and more than 90% of the voters voting for the winner) are statistically improbable, even for very contested elections. Election results that show a strong linear streak away from the main fingerprint lobe indicate ‘ballot stuffing,’ where ballots are added at a specific rate. Voter turnout over 100% indicates ‘extreme fraud’. [1][5]

Note that election results with ‘outliers’ – results that fall outside of expected normal voting patterns – while evidentiary indicators, are not in and of themselves definitive proof of outright fraud or malfeasance. For example, in rare but extreme cases, where the electorate is very split and the split closely follows the geographic boundaries between voting precincts, we could see multiple overlapping Gaussian lobes in the 2D image. Even in that rare case, there should not be distinct structures visible in the election fingerprint, linear streaks, overly skewed or smeared distributions, or exceedingly high turnout or vote share percentages. Additional reviews of voting patterns and election results should be conducted whenever deviations from normal patterns occur in an election.

Additionally it should be noted that “the absence of evidence is not the evidence of absence”: Election Fingerprints that look otherwise normal might still have underlying issues that are not readily apparent with this view of the data.

Results on 2024 VA March Primaries:

Figure 2 and Figure 3 are the computed election fingerprints for the Democratic and Republican VA 2024 March Primaries, respectively. They were computed according to the NAS paper and using official state reported voter turnout and votes for the statewide winner and reported per voting Locality with combined In-Person Early, Election Day, Absentee and Provisional votes. Figures 4 and 5 perform the same process, except each data point is generated per individual precinct in a locality. The color scale moves from precincts with low counts as deep blue, to precincts with high numbers represented as bright yellow. Note that a small blurring filter was applied to the computed image for ease of viewing small isolated Locality or Precinct results.

The upper right inset in each graphic image was computed per the NAS paper; the bottom left inset shows what an idealized model of the data could or should look like, based on the reported voter turnout and vote share for the winner. This ideal model is allowed to have up to 3 Gaussian lobes based on the peak locations and standard deviations in the reported results. The top-left and bottom-right inset plots show the sum of the rows and columns of the fingerprint image. The top-left graph corresponds to the sum of the rows in the upper right image and is the histogram of the vote share for the winner across precincts. The bottom right graph shows the sum of the columns of the upper right image, and is the histogram of the percentage turnout across voting localities.
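The two marginal plots described above are just row and column sums of the fingerprint grid. A minimal sketch, assuming the grid is stored as a list of rows with vote share on the rows and turnout on the columns (the function name and the small example grid are hypothetical):

```python
def fingerprint_marginals(grid):
    """Return (vote_share_histogram, turnout_histogram) for a 2D fingerprint grid."""
    # Summing along each row collapses turnout, leaving one count per vote-share bin.
    share_hist = [sum(row) for row in grid]
    # Summing down each column collapses vote share, leaving one count per turnout bin.
    turnout_hist = [sum(col) for col in zip(*grid)]
    return share_hist, turnout_hist


# Tiny hypothetical 3x3 grid (rows = vote share bins, columns = turnout bins).
example_grid = [
    [0, 1, 0],
    [2, 5, 1],
    [0, 3, 0],
]
share_hist, turnout_hist = fingerprint_marginals(example_grid)
print(share_hist, turnout_hist)
```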

Figure 2 Democratic primary, accumulated per Locality:

Figure 3 Republican primary, accumulated per Locality:

Figure 4 Democratic primary, accumulated per Precinct:

Figure 5 Republican primary, accumulated per Precinct:

Analysis:

As can be seen in Figures 2 and 4, the Democratic primary fingerprint looks to fall within the expected normal distribution. Even though the total vote share for the winner (Biden) is up around 90%, this was not unexpected given the current set of contestants and the fact that Biden is the incumbent.

The Republican primary results, as shown in Figures 3 and 5, show significant “smearing” of the percent of total vote share for the winner. The percent of voter turnout (x-axis) does however show a near Gaussian distribution, which is what one would expect. The Republican primary data does not show the linear streaking pattern that the authors in [1] correlate with extreme fraud, but significant smearing of the distribution is observed.

One consideration that might partially explain this smearing of the histogram is that at least 17% of voters in the Republican primary were “crossover voters” who historically lean Democrat (see here for more information). Multiple news reports and exit polling suggest that this was due in part to loosely organized efforts by the opposing party to cast “Protest Votes” that would artificially inflate support for the challenger (Haley) and dilute the margin of victory for the expected winner (Trump), with no intention of supporting a Republican candidate in the General Election. (This is completely legal in VA, by the way, as VA does not require by-party voter registration.)

If we categorize each locality as being either Democratic or Republican leaning based on the average results of the last four presidential elections, and then split the computation of the per precinct results into separate parts accordingly, we can see this phenomenon much more clearly.
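A sketch of this split is given below. The exact averaging used for the figures is not spelled out here, so this assumes a simple mean of each party's vote share across the four elections; the function names, field names and all numbers in the example are hypothetical placeholders, not real results.

```python
def classify_lean(dem_shares, rep_shares):
    """Label a locality 'D' or 'R' by comparing average vote shares over past elections."""
    dem_avg = sum(dem_shares) / len(dem_shares)
    rep_avg = sum(rep_shares) / len(rep_shares)
    return "D" if dem_avg > rep_avg else "R"


def split_precincts_by_lean(precincts, lean_by_locality):
    """Group precinct records by the lean of the locality they belong to."""
    groups = {"D": [], "R": []}
    for precinct in precincts:
        groups[lean_by_locality[precinct["locality"]]].append(precinct)
    return groups


# Placeholder history: four past elections per locality (made-up percentages).
history = {
    "LOCALITY A": ([62.0, 60.5, 64.0, 66.0], [36.0, 38.0, 33.0, 32.0]),
    "LOCALITY B": ([35.0, 37.0, 34.0, 36.0], [63.0, 61.0, 64.0, 62.0]),
}
lean = {name: classify_lean(d, r) for name, (d, r) in history.items()}
precincts = [
    {"locality": "LOCALITY A", "precinct": "001"},
    {"locality": "LOCALITY B", "precinct": "101"},
    {"locality": "LOCALITY B", "precinct": "102"},
]
groups = split_precincts_by_lean(precincts, lean)
print(len(groups["D"]), len(groups["R"]))
```

Each group can then be passed through the same fingerprint computation separately.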

Figure 6 shows the per-precinct results for only those precincts that belong to historically Republican leaning localities. It depicts a much tighter distribution and has much less smearing or blurring of the distribution tails. We can see from the data that the Republican base in historically Republican leaning localities seems solidly behind candidate Trump.

Figure 7 shows the per-precinct results for only those precincts that belong to historically Democratic leaning localities. It can clearly be seen by comparing the two plots that the major contributor to the spread of the total Republican primary distribution is the votes from historically Democratic leaning localities.

Figure 6 Republican primary, accumulated per Precinct in Republican leaning localities:

Figure 7 Republican primary, accumulated per Precinct in Democratic leaning localities:

References:

[1] “Statistical detection of systematic election irregularities” Peter Klimek, Yuri Yegorov, Rudolf Hanel, Stefan Thurner Proceedings of the National Academy of Sciences Oct 2012, 109 (41) 16469-16473; DOI: 10.1073/pnas.1210722109 (https://www.pnas.org/content/109/41/16469)
[2] USAID: Assessing and Verifying Election Results: A Decision Makers Guide to Parallel Vote Tabulation and Other Tools (http://web.archive.org/web/20201118021847/https://pdf.usaid.gov/pdf_docs/PA00KGWR.pdf)
[3] USAID: A guide to Election Forensics (http://web.archive.org/web/20210501091306/https://pdf.usaid.gov/pdf_docs/PA00MXR7.pdf)
[4] Multivariate Normal Distribution – Wikipedia (https://en.wikipedia.org/wiki/Multivariate_normal_distribution)
[5] Mebane, Walter R. and Kalinin, Kirill, Comparative Election Fraud Detection (2009). APSA 2009 Toronto Meeting Paper, Available at SSRN: https://ssrn.com/abstract=1450078

Ranked Choice Voting: An Example of a Perverse Social Choice Function

Post author By Jonathan Lareau
Post date January 18, 2024

The below is based on the discussion of “Single Transferrable Vote” (“STV”) methods in [1], published in 1977. STV has more recently been called “Ranked Choice Voting” (RCV) or “Instant Runoff Voting” (IRV), among other names, by lobbying groups that are currently pushing for its incorporation into our voting systems. Irrespective of the name used, it represents a family of voting methods, with slightly different variants depending on how votes are removed and/or redistributed in each successive round of voting. [2][5]

What does STV/RCV/IRV entail, in general:

The core system is a proportional voting system, where voters are required to rank order their preferred candidate selections and all ballots are collected and centralized tabulation is performed in multiple rounds until winner(s), or candidates that have support above a specified quota (or “threshold”), are allocated.

A common definition of the quota utilized in STV/RCV/IRV systems is the “Droop quota”, which is defined as:

q = FLOOR( # of Voters / (# of Seats + 1) + 1)

In a given round the candidate with the least support is eliminated from further evaluation. Surplus votes from candidates that go over the droop threshold and votes from eliminated candidates can be distributed amongst remaining candidates for subsequent rounds. Surplus vote distribution is only applicable when multiple winners are allowed in a contest.

Vote allocation procedure for STV/RCV/IRV. Reprinted from [1].

The arguments used to support and push for RCV have not significantly changed since the time that the original paper was published, but the terms and language utilized have been modified. The authors note that much of the rationale in pushing for STV was centered around the ideas of inclusivity and making sure voters are able to cast “effective” ballots.

“Modern proponents emphasize the system’s effective representation of minorities, its sensitivity and accuracy in ‘measuring changes in popular will,’ and its tendency to encourage independent (nonparty line) voting.”
Doron, G., & Kronick, R. (1977) [1]

The same arguments have been recently repeated and pushed to legislators and the media. The name has changed from “Single Transferrable Vote” to “Ranked Choice Voting” or “Instant Runoff Voting”, but the argument remains largely the same, as can be seen by simply visiting the websites and promotional material for any of the current groups that are lobbying for RCV to be incorporated [3][4].

The issue pointed out by Doron & Kronick:

The authors in [1] note that the STV/RCV/IRV system allows for a “perversion” (their words, not mine) whereby a candidate’s chances of being selected as a winner can be negatively impacted even when that candidate receives increased support.

“… a function that permitted an increased vote for a candidate to cause a decline in that candidate’s rank in the social ordering-would probably strike most of us as a rather absurd, even perverse, method of arriving at a social choice. Consequently, some writers refer to this condition as the ‘Non-Perversity’ condition. All of the democratic social choice functions that have been considered in the literature were assumed to guarantee this condition, but the Single Transferrable Vote system does not.”
Doron, G., & Kronick, R. (1977) [1]

The authors present a hypothetical example to demonstrate the issue. Suppose we have 3 candidates (Candidate X, Candidate Y, Candidate Z) and two different voting groups, which we will refer to as group D and D’. Both D and D’ are fairly similar and only disagree on the relative ranking of two specific candidates.

In the tables below, recreated from [1], the only difference between the two voting groups is that the two voters who rank Y first and X second in group D instead rank X first and Y second in group D’, so candidate X receives more support in D’. However, under the voting rules described above, candidate X wins in D and loses in D’, even though X has increased support in D’.

# of Voters	First Choice	Second Choice	Third Choice
6	X	Y	Z
2	Y	X	Z
4	Y	Z	X
5	Z	X	Y

Voting group D selections. Reprinted from [1].

# of Voters	First Choice	Second Choice	Third Choice
6	X	Y	Z
2	X	Y	Z
4	Y	Z	X
5	Z	X	Y

Voting group D’ selections. Reprinted from [1].

There are 17 voters in each case, and only 1 seat available. Therefore, the Droop quota/threshold is 9 votes required in order to declare a winner.

In group D it is candidate Z that has the least amount of votes in the first round and is eliminated, therefore advancing 5 second-choice votes for X into the next round. Candidate X passes the threshold and wins in the second round.

In group D’, where candidate X received more support than candidate Y, it is candidate Y that has the least amount of votes in the first round and is eliminated, therefore advancing 4 second-choice votes for Z into the next round. Candidate Z then passes the threshold and wins in the second round.
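The two outcomes above can be reproduced with a short single-seat tabulation. This is a minimal sketch of the elimination rule as described in this post, not a full STV implementation: it handles one seat only, does no surplus transfers, and does not address tie-breaking for last place (no such tie occurs in this example). The function names are my own.

```python
from collections import Counter


def droop_quota(n_voters: int, n_seats: int = 1) -> int:
    """Droop quota: floor(voters / (seats + 1)) + 1."""
    return n_voters // (n_seats + 1) + 1


def single_seat_winner(profile):
    """profile: list of (number_of_voters, ranking) with ranking a tuple of candidates."""
    quota = droop_quota(sum(count for count, _ in profile))
    remaining = sorted({c for _, ranking in profile for c in ranking})
    while True:
        tally = Counter({c: 0 for c in remaining})
        for count, ranking in profile:
            # Each ballot counts for its highest-ranked candidate still in the race.
            top = next((c for c in ranking if c in remaining), None)
            if top is not None:
                tally[top] += count
        leader = max(remaining, key=lambda c: tally[c])
        if tally[leader] >= quota or len(remaining) == 1:
            return leader
        # Eliminate the candidate with the fewest votes and recount.
        remaining.remove(min(remaining, key=lambda c: tally[c]))


group_d = [(6, ("X", "Y", "Z")), (2, ("Y", "X", "Z")),
           (4, ("Y", "Z", "X")), (5, ("Z", "X", "Y"))]
group_d_prime = [(6, ("X", "Y", "Z")), (2, ("X", "Y", "Z")),
                 (4, ("Y", "Z", "X")), (5, ("Z", "X", "Y"))]

print(droop_quota(17), single_seat_winner(group_d), single_seat_winner(group_d_prime))
```

Running this on the two tables gives a quota of 9, with X winning in group D and Z winning in group D’, matching the walk-through above.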

Bibliography:

[1] Doron, G., & Kronick, R. (1977). Single Transferrable Vote: An Example of a Perverse Social Choice Function. American Journal of Political Science, 21(2), 303–311. https://doi.org/10.2307/2110496
[2] https://ballotpedia.org/Ranked-choice_voting_(RCV)
[3] https://campaignlegal.org/democracyu/accountability/ranked-choice-voting
[4] https://www.hhh.umn.edu/research-centers/center-study-politics-and-governance/research-and-initiatives-cspg/ranked-choice-voting
[5] Brandt F, Conitzer V, Endriss U, Lang J, Procaccia AD, eds. Handbook of Computational Social Choice. Cambridge: Cambridge University Press; 2016. https://doi.org/10.1017/CBO9781107446984

VA voter registrations greater than census determined eligible voters

Post author By Jonathan Lareau
Post date September 17, 2023

One of our volunteers brought to our attention the following press release: https://www.honestelections.org/honest-elections-project-supports-new-pre-litigation-notices-against-arizona-and-virginia/. They claim that “… 43 counties and independent cities in Virginia and four counties in Arizona claim to have more voters than voting-age adult citizens.”

Attempt to directly validate the claim

After reading through the press release we decided to try to verify its claims independently. Note that an analysis like this has been on our list of things-to-do, but there are only so many hours in the day! The fact that this press release was issued gave us a well-deserved prod to complete this analysis.

EPEC has purchased the entire statewide registered voter list data from the VA Department of Elections (ELECT) and has current records as of 2023-08-01. Eligible parties can purchase data from ELECT via their website here.

The necessary data from the US Census Bureau can be downloaded here and includes the estimates of the eligible voting age citizens in each county. From the documentation on the census site, the “cvap_est” field in the census data represents “The rounded estimate of the total number of United States citizens 18 years of age or older for that geographic area and group.”

It is therefore a straightforward process to accumulate the number of registrant records in each county, as well as accumulating the number of eligible voting age citizens and compute the registration percent “REG_PCT” as (# Registered / # Eligible * 100). The below table has the results of this direct computation for each county.
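The direct computation can be sketched as follows. The record layout and function names are hypothetical; the Goochland County figures used in the example are taken from the tabulated results in this post.

```python
from collections import Counter


def count_registrations(records):
    """Count registrant records per locality (each record is a dict with a 'locality' key)."""
    return Counter(rec["locality"] for rec in records)


def registration_pct(n_registered: int, n_eligible: float) -> float:
    """REG_PCT = # Registered / # Eligible * 100."""
    return n_registered / n_eligible * 100.0


# Hypothetical miniature voter list, just to show the accumulation step.
records = [
    {"locality": "GOOCHLAND COUNTY", "status": "Active"},
    {"locality": "GOOCHLAND COUNTY", "status": "Inactive"},
    {"locality": "SURRY COUNTY", "status": "Active"},
]
print(count_registrations(records))

# Goochland County values from the results table: N_REG and N_ELIGIBLE.
print(round(registration_pct(22032, 19840), 2))
```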

The results are only slightly different than the results presented by Honest Elections Project, but still show significant issues with 38 counties being over 100%.

Adjusting for population growth since 2020 census

As the census redistricting data is circa 2020, and the eligible voter data was estimated for 2021, we can attempt to account for population shifts since the 2020 census data was collected and the voter eligibility data was computed. The US Census bureau also makes available the estimates of population growth by county year-over-year since the date of the last census here, which we can use to find the recent rates of growth or decline for each county. We can then use these rates to adjust the number of eligible voter estimates to scale with the most recent rates of population change. This is admittedly an approximation and assumes a linear relationship, but it is arguably better than taking the 2020 census and 2021 eligible voter estimates and applying them directly to the latest (2023) RVL.
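One simple way to apply such an adjustment is to scale the eligible-voter estimate by the ratio of a county's latest population estimate to its population estimate for the year of the eligibility data. The sketch below assumes that form of scaling; the exact adjustment used to produce the N_ELIGIBLE_ADJ column may differ, and the function name and example numbers are hypothetical.

```python
def adjust_eligible(n_eligible: float, pop_base: float, pop_latest: float) -> float:
    """Scale an eligible-voter estimate by the county's population change.

    pop_base:   population estimate for the year the eligibility estimate refers to
    pop_latest: most recent population estimate for the same county
    Assumes eligible voters grow or shrink in proportion to total population.
    """
    if pop_base <= 0:
        raise ValueError("base population must be positive")
    return n_eligible * pop_latest / pop_base


# Hypothetical county: 1,000 eligible voters, population grew from 2,000 to 2,100.
adjusted = adjust_eligible(1000, 2000, 2100)
print(adjusted)

# Adjusted registration percentage for a hypothetical 1,080 registrants.
print(1080 / adjusted * 100.0)
```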

The REG_PCT_ADJ column in the table below represents this adjusted estimate.

Active vs inactive registrations

An additional consideration that can be made with this data, is to attempt to consider only “Active” voter registrations vs registrations with any status assigned. Note that “Inactive” voter registrations can be immediately returned to “Active” status by simply having any type of interaction with the department of elections (or through DMV, etc), and the registrant will then be allowed to vote. Because of this easy ability to change “Inactive” records to “Active”, it is most appropriate (IMO) to include them in this analysis. However, for completeness, and in order to bound the scope of the issue, the corresponding REG_PCT_ACTIVE and REG_PCT_ADJ_ACTIVE columns have also been computed which only consider “Active” voters.

Results

Even the most forgiving analysis we could compute with the official data from the US Census and VA ELECT, which only considers active voters and attempts to adjust for population change since the census, still results in multiple (6) counties in VA having more registered voters than eligible voters (over 100%), and many counties over 90%.

The most appropriate metric to consider, in my opinion, is the Adjusted and either Active or Inactive status results, as inactive status registrations can still be converted to active status and voted. There were 36 localities with over 100% in this category and 59 between 90% and 100%. There are 133 voting localities in total in VA.

The summary tabulated data and graphics for each of the methods of analyzing the data is presented below.

Tabulated Data Results:


LOCALITY_NAME	N_REG	N_REG_ACTIVE	N_ELIGIBLE	N_ELIGIBLE_ADJ	REG_PCT	REG_PCT_ADJ	REG_PCT_ACTIVE	REG_PCT_ADJ_ACTIVE
GOOCHLAND COUNTY	22032	21308	19840	20314.622534217	111.048387096774	108.453897988458	107.399193548387	104.889962705976
LOUDOUN COUNTY	291468	272527	259775	261988.121593707	112.200173226831	111.252372140753	104.908863439515	104.022655050994
FAIRFAX CITY	17867	16578	15890	16179.2525931696	112.441787287602	110.431553603056	104.32976714915	102.464560118177
SURRY COUNTY	5635	5449	5325	5327.44865113427	105.821596244131	105.772957545076	102.328638497653	102.281605264085
NEW KENT COUNTY	19258	18686	17545	18281.8036615372	109.763465374751	105.339715689632	106.503277286976	102.210921558649
KING WILLIAM COUNTY	14025	13592	13380	13590.9343586927	104.820627802691	103.193788078519	101.584454409567	100.007840824473
MATHEWS COUNTY	7353	7092	7145	7099.01111761264	102.911126662001	103.577806516702	99.25822253324	99.9012381091326
NORTHAMPTON COUNTY	9801	9306	9435	9326.0652878146	103.879173290938	105.092551869714	98.6327503974563	99.7848472298299
HANOVER COUNTY	86583	83800	83280	84111.898838322	103.96613832853	102.937873470706	100.624399615754	99.6291858314583
RAPPAHANNOCK COUNTY	6194	5952	5945	5996.95979561651	104.188393608074	103.285668256898	100.117746005046	99.250290194552
JAMES CITY COUNTY	63840	60418	60100	61066.3951247591	106.222961730449	104.541949577299	100.529118136439	98.9382128690675
FAUQUIER COUNTY	56031	52805	52965	53424.3705925266	105.788728405551	104.879101763041	99.6979137166053	98.840659074394
FALLS CHURCH CITY	11130	10253	10345	10373.4476832119	107.588206863219	107.293161732647	99.1106814886419	98.8388847479627
ACCOMACK COUNTY	25327	24192	24650	24522.1840906366	102.74645030426	103.28199114071	98.1419878296146	98.6535290273647
CRAIG COUNTY	3917	3747	3830	3807.2210828548	102.271540469974	102.883439515493	97.8328981723238	98.4182404555912
CHARLES CITY COUNTY	5728	5584	5700	5677.65042979943	100.491228070175	100.886802927075	97.9649122807018	98.3505425182942
ISLE OF WIGHT COUNTY	31039	29665	29710	30281.6797400553	104.473241332885	102.50091892671	99.8485358465163	97.9635220194221
SPOTSYLVANIA COUNTY	105580	100062	100440	102345.441485999	105.117483074472	103.160432420865	99.6236559139785	97.7688879418126
LANCASTER COUNTY	9201	8736	9065	8943.53432452276	101.50027578599	102.878791159456	96.3706563706564	97.6795043548532
CLARKE COUNTY	12213	11489	11610	11882.4255832663	105.193798449612	102.782044915133	98.9577950043066	96.6890128576076
POWHATAN COUNTY	24346	23624	24195	24441.048216348	100.62409588758	99.6111123569388	97.6400082661707	96.6570655680737
LOUISA COUNTY	29799	28775	29005	29810.5293092847	102.737459058783	99.9613247079075	99.2070332701258	96.5262968042565
FAIRFAX COUNTY	784080	721354	749510	747334.300776511	104.612346733199	104.916902540845	96.2434123627437	96.523603861148
HENRICO COUNTY	239597	227831	236070	236137.172406976	101.494048375482	101.465177023065	96.5099334943025	96.4824799406588
BEDFORD COUNTY	62932	60599	62435	62871.2603534819	100.796027868984	100.096609557653	97.059341715384	96.3858520718266
NELSON COUNTY	11827	11384	11955	11835.45	98.9293182768716	99.9286043200723	95.2237557507319	96.1856118694262
FREDERICK COUNTY	68222	63868	65715	66439.8543302061	103.814958533059	102.682344336483	97.189378376322	96.1290488124432
NORTHUMBERLAND COUNTY	10395	9904	10090	10304.4313465051	103.022794846383	100.878929175705	98.1565906838454	96.1139888942937
POQUOSON CITY	9547	9109	9500	9479.6573875803	100.494736842105	100.710390783827	95.8842105263158	96.0899706347414
ORANGE COUNTY	28633	27351	27865	28465.9231224287	102.756145702494	100.586936446265	98.1553920689037	96.0833059316801
PRINCE WILLIAM COUNTY	317403	289258	300255	301330.510742882	105.711145526303	105.333840644777	96.3374465038051	95.9935982874356
CHESTERFIELD COUNTY	268258	253920	259725	264714.168974025	103.285398017134	101.33873870058	97.7649436904418	95.9223304835323
STAFFORD COUNTY	111503	103640	106940	108128.634023171	104.266878623527	103.120696018509	96.9141574714793	95.8488016946475
AMELIA COUNTY	10180	9903	10280	10370.9529879283	99.0272373540856	98.1587710584496	96.3326848249027	95.4878496848553
BOTETOURT COUNTY	26424	25671	26910	27035.1370044442	98.1939799331104	97.7394713984852	95.3957636566332	94.9542071703948
GREENE COUNTY	14956	14432	15215	15272.1611660643	98.2977325008216	97.9298203926316	94.8537627341439	94.4987408335423
ALBEMARLE COUNTY	84004	79107	83265	83844.7847760722	100.887527772774	100.189892817249	95.0063051702396	94.3493387349904
CULPEPER COUNTY	37104	35479	37200	37612.2190201729	99.741935483871	98.648792776889	95.3736559139785	94.3283882851241
EMPORIA CITY	3987	3710	4035	3933.80202774813	98.8104089219331	101.352329676903	91.9454770755886	94.3107958618791
KING AND QUEEN COUNTY	5385	5157	5430	5470.71685662867	99.171270718232	98.433169566712	94.9723756906077	94.2655256184836
GLOUCESTER COUNTY	30265	29068	30580	30862.8948915182	98.9699149771092	98.0627387883742	95.0555918901243	94.1842950966615
WESTMORELAND COUNTY	14199	13604	14470	14454.5505018151	98.1271596406358	98.2320411708203	94.015203870076	94.1156904069188
WARREN COUNTY	30580	28871	30360	30718.0311057939	100.724637681159	99.5506511946729	95.0955204216074	93.9871435788555
MADISON COUNTY	10409	10065	10770	10808.6021505376	96.6480965645311	96.3029247910864	93.4540389972145	93.1202745722244
FLUVANNA COUNTY	20994	20197	21415	21693.8872899953	98.0340882558954	96.7738041567216	94.3123978519729	93.0999582048827
KING GEORGE COUNTY	19822	18577	19745	19984.620303757	100.389972144847	99.1862727373088	94.084578374272	92.9564821229435
APPOMATTOX COUNTY	12256	11745	12415	12679.9865837297	98.7192911800242	96.6562536882037	94.6033024567056	92.6262809699701
ROANOKE COUNTY	73141	69136	74800	74926.0183357278	97.7820855614973	97.6176255253155	92.427807486631	92.2723528297154
SUFFOLK CITY	71421	65488	69495	71088.6638879661	102.77142240449	100.467495228997	94.2341175624146	92.1215794732158
HIGHLAND COUNTY	1879	1815	1920	1971.40562248996	97.8645833333333	95.3127037157757	94.53125	92.0662891134289
ARLINGTON COUNTY	175053	155984	169220	169528.629042616	103.446992081314	103.258665505987	92.178229523697	92.0104178750769
YORK COUNTY	50591	47465	51590	51709.595790716	98.0635782128319	97.8367732843179	92.0042643923241	91.7914736601402
FLOYD COUNTY	11879	11477	12425	12513.1262492746	95.6056338028169	94.9323115851135	92.3702213279678	91.7196851639319
SHENANDOAH COUNTY	32321	30958	33660	33764.3686006826	96.0219845513963	95.7251722436965	91.9726678550208	91.6883723375006
CUMBERLAND COUNTY	7381	7134	7750	7789.9649339934	95.2387096774194	94.750105585087	92.0516129032258	91.579359604933
CAROLINE COUNTY	23033	21805	23400	23813.5723839246	98.4316239316239	96.7221533529698	93.1837606837607	91.565430202818
ESSEX COUNTY	8291	7901	8610	8673.64480667172	96.2950058072009	95.5884196874491	91.7653890824623	91.0920400374546
MIDDLESEX COUNTY	8708	8308	8995	9137.79103230598	96.8093385214008	95.2965543774586	92.3624235686492	90.9191288203865
DINWIDDIE COUNTY	20876	19932	21840	21966.3645130183	95.5860805860806	95.0362085980494	91.2637362637363	90.7387291519602
FRANKLIN CITY	5877	5564	6060	6148.72293307087	96.980198019802	95.5808232696678	91.8151815181518	90.4903353194541
CHESAPEAKE CITY	176334	164440	181540	182080.136649466	97.1323124380302	96.8441716075112	90.5805883000992	90.3118830125735
CHARLOTTE COUNTY	8391	8099	9005	8968.26722791182	93.1815657967796	93.5632244976465	89.9389228206552	90.3073001080252
CAMPBELL COUNTY	40938	39080	43505	43321.1594582393	94.0995287897943	94.4988557830808	89.828755315481	90.2099585715667
MECKLENBURG COUNTY	22837	21907	24125	24290.6105610561	94.6611398963731	94.0157512409894	90.8062176165803	90.1871113734884
BATH COUNTY	3353	3207	3590	3556.62099339369	93.3983286908078	94.2748751196175	89.3314763231198	90.1698552068635
HALIFAX COUNTY	25042	24178	27030	26938.7203033355	92.645209027007	92.9591298993492	89.44876063633	89.7518505992518
WYTHE COUNTY	20849	20188	22580	22508.7368794326	92.333923826395	92.6262549146007	89.406554472985	89.6896174500436
VIRGINIA BEACH CITY	328860	303604	342075	339791.417436993	96.1368121026091	96.7829036061454	88.7536358985603	89.3501084547837
RUSSELL COUNTY	19126	18589	20935	20829.4123626696	91.3589682350131	91.8220815210206	88.7938858371149	89.2439963083892
ALEXANDRIA CITY	111216	97404	109245	109471.650837935	101.804201565289	101.593425465601	89.1610600027461	88.976460347894
SCOTT COUNTY	15973	15535	17470	17465.1205660553	91.4310246136234	91.4565687627983	88.9238694905552	88.9487131866319
PITTSYLVANIA COUNTY	44900	43260	48845	48732.8247628557	91.9234312621558	92.1350244285919	88.565871634763	88.7697362312001
PAGE COUNTY	17104	16549	18785	18744.7481198269	91.0513707745542	91.246891612849	88.0968858131488	88.286062283737
CARROLL COUNTY	21244	20645	23330	23390.9913659661	91.0587226746678	90.8212895623998	88.4912130304329	88.2604746288714
FRANKLIN COUNTY	39895	38563	43750	43800.8998363934	91.1885714285714	91.0826036657173	88.144	88.0415702509351
GILES COUNTY	12058	11549	13255	13150.2963699952	90.9694454922671	91.6937509295421	87.1293851376839	87.8231157310733
COLONIAL HEIGHTS CITY	12913	11965	13585	13641.6725216819	95.0533676849466	94.6584810585087	88.0750828119249	87.7091865457335
SOUTHAMPTON COUNTY	13173	12667	14545	14460.3282142263	90.5672052251633	91.0975173235707	87.0883465108285	87.5982883122804
HOPEWELL CITY	15555	14490	16780	16647.3259883344	92.699642431466	93.4384297568279	86.3528009535161	87.0410059258396
AMHERST COUNTY	22734	21897	25030	25187.8777356567	90.8270075908909	90.2577034817708	87.4830203755493	86.9346763939621
PATRICK COUNTY	12896	12527	14475	14454.5180552411	89.0915371329879	89.217779179597	86.5423143350604	86.664944151893
MANASSAS CITY	23663	21629	25055	24982.9372150123	94.4442227100379	94.7166451900655	86.3260826182399	86.5750884848044
PORTSMOUTH CITY	68285	63085	73500	73041.5565660911	92.9047619047619	93.4878762314065	85.8299319727891	86.3686413130011
AUGUSTA COUNTY	54910	53003	61185	61409.985342899	89.7442183541718	89.4154259985496	86.627441366348	86.310067823732
ROCKINGHAM COUNTY	56871	54263	62240	62923.0410796733	91.3737146529563	90.3818363260444	87.1834832904884	86.2370906887544
ALLEGHANY COUNTY	10965	10395	12265	12110.7078674121	89.4007337953526	90.5397117992166	84.7533632286996	85.8331330736759
WASHINGTON COUNTY	39282	37831	44075	44210.1918465228	89.125354509359	88.8528150621215	85.8332387975043	85.5707664226648
WAYNESBORO CITY	15564	14292	16550	16705.2752699593	94.0422960725076	93.1681744148711	86.356495468278	85.5538132059456
HAMPTON CITY	99929	89709	105410	105555.290793416	94.8003035765108	94.6698164051036	85.1048287638744	84.9876868565225
ROCKBRIDGE COUNTY	16203	15651	18455	18417.5014355758	87.7973448929829	87.9761028208844	84.806285559469	84.9789536042499
HENRY COUNTY	36321	34247	40650	40334.7960076348	89.3505535055351	90.0488005272791	84.2484624846248	84.9068382384221
FREDERICKSBURG CITY	19199	17549	20570	20786.1230585424	93.3349538162372	92.3645065793539	85.3135634419057	84.4265183583041
BUCHANAN COUNTY	14607	14009	17030	16593.5531947032	85.7721667645332	88.0281626762295	82.2607163828538	84.4243534559663
CHARLOTTESVILLE CITY	33751	30439	36565	36279.5483271375	92.3041159578832	93.0303753940449	83.2462737590592	83.9012650475344
MARTINSVILLE CITY	9035	8269	9780	9907.04111004502	92.3824130879346	91.19776429351	84.5501022494888	83.4658896450509
STAUNTON CITY	18003	16879	20050	20246.1778349511	89.7905236907731	88.9204873471048	84.1845386533666	83.3688221925113
DICKENSON COUNTY	10071	9453	11470	11364.839012417	87.8029642545771	88.6154215558763	82.4149956408021	83.1775970576605
LUNENBURG COUNTY	8036	7764	9320	9346.41326998416	86.2231759656652	85.9795064466866	83.3047210300429	83.069299160288
PULASKI COUNTY	23605	22821	27645	27561.5940014198	85.3861457768132	85.6445385516674	82.5501899077591	82.8000006052786
LEE COUNTY	15524	14913	18140	18075.0410226191	85.5788313120176	85.8863887532718	82.2105843439912	82.5060368125189
BRISTOL CITY	12449	11090	13545	13449.9195671249	91.9084533038021	92.5581743286302	81.87523071244	82.4540246850758
SMYTH COUNTY	20124	19429	23735	23613.9194256757	84.7861807457342	85.2209226144769	81.8580155887929	82.277743265587
GRAYSON COUNTY	10885	10488	12740	12750.803652968	85.4395604395604	85.3671681899538	82.3233908948195	82.253638950504
BLAND COUNTY	4624	4319	5285	5258.4851917786	87.4929044465468	87.9340690590782	81.7218543046358	82.1339196077333
MANASSAS PARK CITY	8938	8188	10210	9983.46973422316	87.5416258570029	89.5279921504714	80.1958863858962	82.0155739234795
NEWPORT NEWS CITY	122432	110842	136045	135707.766502132	89.9937520673307	90.2173863410214	81.4745121099636	81.6769760913118
PETERSBURG CITY	23678	20626	25315	25335.4844606947	93.5334781749951	93.4578536942283	81.4773849496346	81.411508163576
BRUNSWICK COUNTY	11073	10706	13200	13152.0871143376	83.8863636363636	84.1919605895016	81.1060606060606	81.4015289507093
COVINGTON CITY	3797	3576	4435	4394.75920432734	85.6144306651635	86.3983627649325	80.6313416009019	81.369645838135
WISE COUNTY	24615	23581	29215	28980.9985437029	84.254663700154	84.9349616538607	80.7153859318843	81.3671066731541
DANVILLE CITY	28416	26010	32135	32023.525462526	88.426948809709	88.734764800498	80.9397852808464	81.2215383045099
WINCHESTER CITY	18077	15867	19730	19604.3848479459	91.6218955904714	92.2089631488439	80.4206791687785	80.9359749008522
ROANOKE CITY	65501	58902	73540	72871.9555618569	89.0685341310851	89.8850586552461	80.0951862931738	80.8294487856874
NORTON CITY	2600	2443	3055	3036.48994767282	85.1063829787234	85.6251805474493	79.9672667757774	80.4547369528534
SALEM CITY	17727	16271	20090	20223.9122032026	88.2379293180687	87.6536637515305	80.9905425584868	80.4542654087636
RICHMOND CITY	158141	142368	177060	178454.840906495	89.3149214955382	88.6168171155757	80.4066418163334	79.778166440773
LYNCHBURG CITY	55508	49029	61460	61591.2816299704	90.3156524568825	90.1231449176237	79.7738366417182	79.6037989508931
GALAX CITY	4016	3765	4730	4780.43249737198	84.9048625792812	84.0091352028876	79.5983086680761	78.7585642527071
TAZEWELL COUNTY	27315	25371	32495	32303.552312954	84.0590860132328	84.5572639670542	78.0766271734113	78.5393499581963
BUCKINGHAM COUNTY	11026	10436	13620	13653.7685950413	80.9544787077827	80.7542615304345	76.6226138032305	76.4331102241624
SUSSEX COUNTY	7121	6836	9185	9072.86348501665	77.5285792052259	78.4867975998972	74.4256940664126	75.3455621953234
RICHMOND COUNTY	5629	5442	7200	7224.66570891811	78.1805555555556	77.9136395619187	75.5833333333333	75.3252845080764
BUENA VISTA CITY	4380	3900	5225	5206.04308390023	83.8277511961722	84.1329956247427	74.6411483253589	74.9129413097024
NOTTOWAY COUNTY	9651	9272	12435	12415.0516555441	77.6115802171291	77.7362855005938	74.5637314032971	74.6835394427009
PRINCE EDWARD COUNTY	13443	12710	17880	17855.5704331193	75.1845637583893	75.2874294907171	71.0850111856823	71.182268007663
PRINCE GEORGE COUNTY	24912	22751	32440	32542.6057025908	76.7940813810111	76.5519523165187	70.132552404439	69.9114269088438
NORFOLK CITY	138210	126445	184395	182802.310498883	74.9532254128366	75.6062653818834	68.5729005667182	69.170351104929
MONTGOMERY COUNTY	60377	54593	79030	79093.9692416654	76.3975705428318	76.3357820815931	69.0788308237378	69.02296157776
WILLIAMSBURG CITY	10127	8721	12985	13178.843062201	77.9899884482095	76.8428605773891	67.1621101270697	66.174245788033
GREENSVILLE COUNTY	6395	6154	9595	9438.61461619348	66.6492965085982	67.7535873647001	64.137571651902	65.2002465429811
HARRISONBURG CITY	25205	22833	38720	38443.4607770834	65.0955578512397	65.5638162915473	58.9695247933884	59.3937162223725
RADFORD CITY	9171	8418	14315	14484.6131060331	64.0656653859588	63.315464022854	58.8054488298987	58.1168439804149
LEXINGTON CITY	4087	3410	6470	6490.89062289789	63.1684698608965	62.9651651436292	52.7047913446677	52.5351634792698
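
The percentage columns in the table above appear to be plain ratios of the count columns. For example, the GOOCHLAND COUNTY value of REG_PCT matches 22032 / 19840 × 100. The following is a minimal Python sketch under that assumption; the relationships are inferred from checking the rows, and the function name is hypothetical.

```python
def registration_percentages(n_reg, n_reg_active, n_eligible, n_eligible_adj):
    """Recompute the four percentage columns from the four count columns.

    Assumed relationships (inferred from the table values, not documented):
      REG_PCT            = 100 * N_REG        / N_ELIGIBLE
      REG_PCT_ADJ        = 100 * N_REG        / N_ELIGIBLE_ADJ
      REG_PCT_ACTIVE     = 100 * N_REG_ACTIVE / N_ELIGIBLE
      REG_PCT_ADJ_ACTIVE = 100 * N_REG_ACTIVE / N_ELIGIBLE_ADJ
    """
    return {
        "REG_PCT": 100.0 * n_reg / n_eligible,
        "REG_PCT_ADJ": 100.0 * n_reg / n_eligible_adj,
        "REG_PCT_ACTIVE": 100.0 * n_reg_active / n_eligible,
        "REG_PCT_ADJ_ACTIVE": 100.0 * n_reg_active / n_eligible_adj,
    }


# Counts taken from the GOOCHLAND COUNTY row of the table.
goochland = registration_percentages(22032, 21308, 19840, 20314.622534217)
```

Recomputing a row this way is a quick sanity check that a locality's percentages agree with its listed counts.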


Voter ID Number distribution patterns in VA Registered Voter List

Post author By Jonathan Lareau
Post date July 6, 2023

One thing I have been asked about repeatedly is whether there are any patterns in the assignment of voter ID numbers in the VA data, and specifically whether I’ve found anything similar to what AuditNY has found in the NY data. It’s not something that I have looked at in depth previously, mostly due to lack of time, and because VA is set up very differently than NY, so a direct comparison or attempt to replicate the AuditNY findings in VA isn’t as straightforward as one would hope.

The NY data uses a different Voter ID number for counties vs at the state level, which is the “Rosetta Stone” that was needed for the NY team to understand the algorithms that were used to assign voter ID numbers, and in turn discover some very (ahem) “interesting” patterns in the data. VA doesn’t have such a system and only uses a single voter ID number throughout the state and local jurisdictions.

Well … while my other machine is busy crunching on the string distance computations, I figured I’d take a crack at looking at the distribution of the Voter ID numbers in the VA Registered Voter List (RVL) and just see what I find.

To start with, here is a simple scatter plot of the Voter ID numbers vs the registration date for each record in the 2023-07-01 RVL. From the zoomed-out plot it is readily apparent that there must have been a change in the algorithm used to assign voter identification numbers sometime around 2007, which coincides nicely with the introduction of the current Virginia Election and Registration Information System (VERIS).

From a high level, it appears that the previous assignment algorithm broke the universe of possible ID numbers up into discrete ranges and assigned IDs within those ranges, but favoring the bottom of each range. This would be a logical explanation for the banded structure we see pre-2007. The new assignment algorithm post-2007 looks to be using a much more randomized approach. Nothing strange about that. As computing systems have gotten better and security has become more of a concern over the years there have been many systems that migrated to more randomized assignments of identification numbers.
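
One way to probe the "favoring the bottom of each range" idea is to measure where each ID sits inside its range. The sketch below is only an illustration: the band width is a hypothetical parameter, and it assumes the bands start at multiples of that width, neither of which is established by the data shown here.

```python
def position_within_band(voter_id, band_width):
    """Relative position (0.0 to just under 1.0) of an ID inside its band.

    Assumes bands begin at multiples of band_width (a hypothetical layout).
    """
    return (voter_id % band_width) / band_width


def mean_band_position(voter_ids, band_width):
    """Average relative position of a collection of IDs within their bands.

    A value near 0.5 is what evenly spread IDs would give; a value well
    below 0.5 suggests IDs are packed toward the bottom of each band.
    """
    positions = [position_within_band(v, band_width) for v in voter_ids]
    if not positions:
        raise ValueError("voter_ids must not be empty")
    return sum(positions) / len(positions)


# Synthetic IDs, not real data: each sits just above the start of a band.
bottom_heavy = mean_band_position([1_000_010, 2_000_020, 3_000_030], 1_000_000)
# Synthetic IDs spread symmetrically around the middle of their bands.
spread_out = mean_band_position([250_000, 1_750_000], 1_000_000)
```

Running the same statistic separately on pre-2007 and post-2007 records would be one way to quantify the visual difference between the two regimes.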

Looking at a zoomed-in block of the post-2007 “randomized” ID assignments, we can see some of the normal variability that we would expect across election cycles. There is a high density of new assignments around November of 2016 and 2020, with a low-density section of assignments correlated with the COVID-19 lockdowns. There are also short periods where it looks like there were lulls in the assignment of voter IDs. These are perhaps due to holidays or maintenance periods, or related to the legal requirement to “freeze” the voter rolls 30 days before any election (primaries, runoffs, etc.). Note that VA now has same-day voter registration under the laws passed by the previous Democratic super-majority that went into effect in 2022, so going forward we would likely see these “blackout periods” significantly reduced.

We can see the banded assignment structure of the pre-2007 entries more clearly by zooming in on a smaller section of the plot, as shown below. It’s harder to make out in this banded structure, but we still see similar patterns of density changes, presumably due to the natural election cycles, holidays, maintenance periods, legally required registration lockout periods, etc. We can also see the “bucketing” of ID numbers into distinct bands, with the numbers biased toward the lower section of each band.

All of that looks unremarkable and seems to make sense to me … however … if we zoom into the Voter ID range of around [900,000,000 to 920,000,000], we do see something that catches my curiosity. We see the same banded structure as above between 900,000,000 and 915,000,000 pre-2007, but there is another band of assignments superimposed across the entire date range of the RVL. This band does not seem to be affected by the (presumed) introduction of the VERIS system, which is very interesting. There also looks to be a vertical high-density band between 2007 and 2010 that extends along the entire vertical axis, but we only see it once we zoom in to the VERIS transition period.

The horizontal band that extends across all date ranges only exists in the [~915,000,000 to ~920,000,000] ID range. It trails off in density pre ~1993, but it exists throughout the full registration date range. I will note that the “Motor Voter” National Voter Registration Act (NVRA) was implemented in 1993, so perhaps these are a reserved universe block for DMV (or other externally provided) registrations? (That’s a guess, but an educated one.)

A plausible explanation I can imagine for the distinct high-density band between 2007 and 2010 is that it is related to how VERIS was implemented and brought into service, and that there was some sort of update around 2010 that made corrections to its internal algorithms. (But that is just a guess.) That still wouldn’t entirely explain the huge change in the density of new registrants added to the rolls.

Another, or additional, explanation might be that when VERIS came online, a number of registrants had their Voter ID number regenerated and/or their registration date field updated as part of the rollout of the new software. That would mean that while VERIS was handling the normal volume of new real registrations, it was also moving or updating a large number of historic registrations, which would account for the higher density as VERIS became the system of record. That seems to be a poor systems administration and design choice, in my opinion, as it gives those moved registrant records a false registration date and so makes them inaccurate. However, if that was the case, and VERIS was resetting registration dates as it ingested voter records into its databases, why do we see any records with pre-2007 registration dates at all? (This is, again, merely an educated guess on my part, so take it with a grain of salt.)

Incorporating the identification of cloned registrations

To incorporate some of my early results on duplicate record identification in the most recent RVL data, I did a scatter plot of only those records that had an exact match of (Full Name + DOB) to other records in the dataset, but with unique Voter ID numbers. (Technically these are “cloned” records, as true “duplicates” would have the same voter ID numbers. This was pointed out to me a few days ago.) The scatter plot of those records is shown below, and we can see that there is a distinct, roughly horizontal cluster of records that aligns with the 915M – 920M ID band and pre-2007. In the post-2007 block, the cloned records do not seem to be distributed entirely at random, but have a bias towards the lower right of the graph.
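
The selection of cloned records described above amounts to grouping on (Full Name, DOB) and keeping groups that contain more than one distinct Voter ID. Here is a minimal sketch of that idea; the field names `voter_id`, `full_name` and `dob` are hypothetical, and this is not the actual processing code used for the plots.

```python
from collections import defaultdict


def find_cloned_records(records):
    """Group records by exact (full name, DOB) and keep multi-ID groups.

    `records` is an iterable of dicts with hypothetical keys
    'voter_id', 'full_name' and 'dob'. Returns a dict mapping each
    (full_name, dob) key to the sorted list of distinct voter IDs,
    only for keys that have two or more distinct IDs.
    """
    ids_by_key = defaultdict(set)
    for rec in records:
        # Exact string match only: no case folding or fuzzy matching here.
        ids_by_key[(rec["full_name"], rec["dob"])].add(rec["voter_id"])
    return {key: sorted(ids) for key, ids in ids_by_key.items() if len(ids) > 1}


# Synthetic records for illustration only.
sample = [
    {"voter_id": 915000001, "full_name": "JANE Q PUBLIC", "dob": "1970-01-01"},
    {"voter_id": 100200300, "full_name": "JANE Q PUBLIC", "dob": "1970-01-01"},
    {"voter_id": 100200301, "full_name": "JOHN DOE", "dob": "1980-02-02"},
    # Same ID repeated is a true duplicate, not a clone, so it is not flagged.
    {"voter_id": 100200301, "full_name": "JOHN DOE", "dob": "1980-02-02"},
]
clones = find_cloned_records(sample)
```

Using a set per key is what separates clones (same person key, different IDs) from true duplicates (the same ID listed twice).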

Superimposing the two plots produces the following, with the red indicating the records with identified Full Name + DOB string matches.

Zooming in for a closer look at the 915M-920M band again gives the following:

It is curious that there seems to be an alignment of the exact Full Name + DOB matching records with the 915M-920M, pre-2007 ID band. Post-2007 the exact cloned matches have a less structured distribution throughout the data, but they do seem to cluster around the lower right.

If the cloned records were simply due to random data entry errors, etc., I would expect to see sporadic red datapoints distributed “salt-n-pepper” style throughout the entirety of the area covered by the blue data. There might be some argument for a bias of more red data points toward the right side of the plot, as officials have not yet had time to “catch” or “clean up” erroneous entries, but there is little reason to have linear features, or to have a bias toward lower ID numbers on the vertical axis.
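
One rough way to put a number on the “salt-n-pepper” expectation is to compare the share of cloned records that fall in an ID band with the share of all records in that band. This is a sketch rather than a formal statistical test, and the function names are hypothetical; the band limits come from the 915M-920M range discussed above.

```python
def band_share(voter_ids, low, high):
    """Fraction of the given IDs that fall in the half-open range [low, high)."""
    ids = list(voter_ids)
    if not ids:
        return 0.0
    return sum(1 for v in ids if low <= v < high) / len(ids)


def enrichment_ratio(all_ids, cloned_ids, low, high):
    """Share of cloned IDs in the band divided by share of all IDs in the band.

    A ratio near 1 is what evenly scattered clones would give; a ratio well
    above 1 means clones are concentrated in the band. Returns None when no
    record at all falls in the band, since the ratio is undefined there.
    """
    base = band_share(all_ids, low, high)
    if base == 0:
        return None
    return band_share(cloned_ids, low, high) / base


# Synthetic IDs for illustration only.
all_ids = [915_000_001, 916_000_000, 100, 200, 300, 400, 500, 600]
cloned_ids = [915_000_001, 916_000_000]
ratio = enrichment_ratio(all_ids, cloned_ids, 915_000_000, 920_000_000)
```

The same comparison could be repeated with an added registration-date cutoff to look at the pre-2007 portion of the band on its own.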

I am continuing to investigate this data, but as of right now all I can tell you is that … yes, there do seem to be interesting patterns in the way Voter IDs are assigned in VA, especially among records that have already been found and flagged as problematic (clones).


Potential Duplicate Registrants in VA RVL by Locality

Post author By Jonathan Lareau
Post date May 31, 2023

Previously I posted the computation of potential duplicate records based on string comparisons in the registered voter list. As a follow up to that article, I’ve compiled the statistics of the number of potential pairs for each locality in VA.

I tallied the number of registrant pairs using the reference match criteria defined by the MOU between ELECT and the DMV, along with the two highest-confidence (most stringent) match criteria that I computed. I also stratified the results by whether only Active registrant records or both Active and Inactive records were considered, and by whether the pairs crossed a locality boundary.

The table below is organized into the following computed columns, and has been sorted in decreasing order according to column 5.

1. Exactly matching First + Last + DOB, which is equivalent to the MOU between ELECT and DMV.
2. Exactly matching First + Middle + Last + Suffix + DOB
3. Exactly matching First + Middle + Last + Suffix + DOB + Gender + Street Address
4. The same as #1, but filtering for only ACTIVE voter records
5. The same as #2, but filtering for only ACTIVE voter records
6. The same as #3, but filtering for only ACTIVE voter records
7. The same as #1, but filtering for only pairs that cross a locality boundary.
8. The same as #2, but filtering for only pairs that cross a locality boundary.
9. The same as #3, but filtering for only pairs that cross a locality boundary.
10. The same as #4, but filtering for only pairs that cross a locality boundary.
11. The same as #5, but filtering for only pairs that cross a locality boundary.
12. The same as #6, but filtering for only pairs that cross a locality boundary.
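
The twelve columns above are the same tally repeated over three match keys, an optional Active-only filter, and an optional cross-locality filter. The sketch below shows one way such a tally could be organized. The field names are hypothetical, and it assumes each percentage is the number of a locality's records with at least one matching partner, divided by that locality's total record count. That definition appears consistent with the table values but is not spelled out in the text.

```python
from collections import defaultdict


def pct_matching_by_locality(records, key_fields, active_only=False,
                             cross_locality_only=False):
    """Percent of each locality's records that have a matching partner.

    `records` are dicts with hypothetical keys 'locality' and 'status',
    plus whatever fields are named in `key_fields`. The denominator is
    always the locality's total record count, even when active_only=True
    (an assumption inferred from the table, not stated in the text).
    """
    records = list(records)
    totals = defaultdict(int)
    for r in records:
        totals[r["locality"]] += 1

    # Optionally restrict which records are eligible to form pairs.
    pool = [r for r in records if not active_only or r["status"] == "Active"]

    # Group eligible records by the exact-match key.
    groups = defaultdict(list)
    for r in pool:
        groups[tuple(r[f] for f in key_fields)].append(r)

    matched = defaultdict(int)
    for members in groups.values():
        for r in members:
            partners = [m for m in members if m is not r]
            if cross_locality_only:
                partners = [m for m in partners
                            if m["locality"] != r["locality"]]
            if partners:
                matched[r["locality"]] += 1

    return {loc: 100.0 * matched[loc] / totals[loc] for loc in totals}


def _rec(locality, name, status):
    # Helper for building synthetic example records (illustration only).
    return {"locality": locality, "name": name, "dob": "1970-01-01",
            "status": status}


sample = [_rec("X", "a", "Active"), _rec("Y", "a", "Active"),
          _rec("X", "c", "Inactive"), _rec("X", "c", "Active"),
          _rec("X", "e", "Active")]
key = ("name", "dob")
all_pairs = pct_matching_by_locality(sample, key)
active = pct_matching_by_locality(sample, key, active_only=True)
xloc = pct_matching_by_locality(sample, key, cross_locality_only=True)
```

Swapping `key_fields` for a longer tuple (adding middle name, suffix, gender, street address) would give the more stringent criteria without changing the rest of the tally.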

		1	2	3	4	5	6	7	8	9	10	11	12
LOCALITY_NAME	Num Registrant Records	Pct Same First Last Dob	Pct Same Full Name Dob	Pct Same Full Name Dob Address	Pct Same First Last Dob _ Active Only	Pct Same Full Name Dob _ Active Only	Pct Same Full Name Dob Address _ Active Only	Pct Same First Last Dob _ xLoc	Pct Same Full Name Dob _ xLoc	Pct Same Full Name Dob Address _ xLoc	Pct Same First Last Dob _ Active Only _ xLoc	Pct Same Full Name Dob _ Active Only _ xLoc	Pct Same Full Name Dob Address _ Active Only _ xLoc
NORTON CITY	2604	0.2304%	0.2304%	0.1536%	0.1920%	0.1920%	0.1536%	0.0768%	0.0768%	0.0000%	0.0384%	0.0384%	0.0000%
NOTTOWAY COUNTY	9704	0.2988%	0.2061%	0.0618%	0.2473%	0.1752%	0.0618%	0.2370%	0.1649%	0.0206%	0.1855%	0.1340%	0.0206%
RADFORD CITY	9551	0.4293%	0.2827%	0.0000%	0.2827%	0.1675%	0.0000%	0.4293%	0.2827%	0.0000%	0.2827%	0.1675%	0.0000%
HIGHLAND COUNTY	1903	0.2627%	0.1576%	0.1051%	0.2627%	0.1576%	0.1051%	0.1576%	0.0525%	0.0000%	0.1576%	0.0525%	0.0000%
WILLIAMSBURG CITY	10480	0.2195%	0.1336%	0.0000%	0.2004%	0.1336%	0.0000%	0.2004%	0.1336%	0.0000%	0.1813%	0.1336%	0.0000%
LYNCHBURG CITY	56319	0.3072%	0.1829%	0.0533%	0.2255%	0.1296%	0.0533%	0.1616%	0.0764%	0.0000%	0.1190%	0.0479%	0.0000%
EMPORIA CITY	4023	0.3480%	0.1740%	0.0000%	0.2983%	0.1243%	0.0000%	0.2486%	0.0746%	0.0000%	0.1989%	0.0249%	0.0000%
SUFFOLK CITY	71580	0.2403%	0.1229%	0.0754%	0.2249%	0.1187%	0.0754%	0.1229%	0.0307%	0.0000%	0.1104%	0.0265%	0.0000%
FALLS CHURCH CITY	11213	0.1784%	0.1338%	0.0357%	0.1516%	0.1159%	0.0178%	0.0892%	0.0624%	0.0000%	0.0803%	0.0624%	0.0000%
SUSSEX COUNTY	7149	0.2658%	0.1259%	0.0839%	0.2238%	0.1119%	0.0839%	0.1539%	0.0140%	0.0000%	0.1119%	0.0000%	0.0000%
FRANKLIN CITY	5924	0.2026%	0.1182%	0.0338%	0.1857%	0.1013%	0.0338%	0.1688%	0.0844%	0.0000%	0.1519%	0.0675%	0.0000%
APPOMATTOX COUNTY	12195	0.2542%	0.1230%	0.0328%	0.2214%	0.0902%	0.0328%	0.2050%	0.0738%	0.0000%	0.1886%	0.0574%	0.0000%
LEE COUNTY	15619	0.2497%	0.0960%	0.0128%	0.2305%	0.0832%	0.0128%	0.1473%	0.0192%	0.0000%	0.1409%	0.0192%	0.0000%
ALBEMARLE COUNTY	84889	0.1920%	0.1001%	0.0212%	0.1590%	0.0825%	0.0188%	0.1402%	0.0554%	0.0000%	0.1096%	0.0401%	0.0000%
AMHERST COUNTY	22906	0.1965%	0.0829%	0.0437%	0.1790%	0.0742%	0.0437%	0.1441%	0.0393%	0.0000%	0.1266%	0.0306%	0.0000%
PRINCE EDWARD COUNTY	13595	0.2280%	0.0883%	0.0000%	0.1912%	0.0662%	0.0000%	0.2133%	0.0883%	0.0000%	0.1765%	0.0662%	0.0000%
STAUNTON CITY	18180	0.1980%	0.0935%	0.0000%	0.1595%	0.0605%	0.0000%	0.1650%	0.0605%	0.0000%	0.1265%	0.0275%	0.0000%
NELSON COUNTY	11895	0.1765%	0.0673%	0.0168%	0.1513%	0.0588%	0.0168%	0.1261%	0.0504%	0.0000%	0.1177%	0.0420%	0.0000%
ARLINGTON COUNTY	177092	0.1378%	0.0683%	0.0113%	0.1146%	0.0576%	0.0102%	0.0870%	0.0344%	0.0000%	0.0683%	0.0260%	0.0000%
NORTHUMBERLAND COUNTY	10457	0.1339%	0.0574%	0.0191%	0.1243%	0.0574%	0.0191%	0.0956%	0.0191%	0.0000%	0.0861%	0.0191%	0.0000%
SOUTHAMPTON COUNTY	13218	0.2194%	0.0757%	0.0000%	0.1740%	0.0530%	0.0000%	0.1589%	0.0454%	0.0000%	0.1286%	0.0227%	0.0000%
HOPEWELL CITY	15825	0.2401%	0.0695%	0.0253%	0.2085%	0.0506%	0.0253%	0.1390%	0.0190%	0.0000%	0.1201%	0.0126%	0.0000%
LUNENBURG COUNTY	8097	0.1853%	0.0618%	0.0000%	0.1729%	0.0494%	0.0000%	0.1853%	0.0618%	0.0000%	0.1729%	0.0494%	0.0000%
AMELIA COUNTY	10179	0.1375%	0.0884%	0.0098%	0.0884%	0.0491%	0.0098%	0.1375%	0.0884%	0.0098%	0.0884%	0.0491%	0.0098%
RICHMOND CITY	161097	0.1707%	0.0639%	0.0000%	0.1316%	0.0490%	0.0000%	0.1459%	0.0528%	0.0000%	0.1155%	0.0416%	0.0000%
CHARLOTTESVILLE CITY	34789	0.1265%	0.0604%	0.0000%	0.1064%	0.0489%	0.0000%	0.1150%	0.0489%	0.0000%	0.0949%	0.0374%	0.0000%
LEXINGTON CITY	4211	0.2612%	0.1187%	0.0000%	0.1900%	0.0475%	0.0000%	0.2612%	0.1187%	0.0000%	0.1900%	0.0475%	0.0000%
FAIRFAX COUNTY	787727	0.1143%	0.0559%	0.0053%	0.0988%	0.0474%	0.0053%	0.0665%	0.0236%	0.0000%	0.0546%	0.0171%	0.0000%
CHARLOTTE COUNTY	8474	0.2242%	0.0708%	0.0236%	0.1652%	0.0472%	0.0236%	0.2006%	0.0472%	0.0000%	0.1416%	0.0236%	0.0000%
HARRISONBURG CITY	26443	0.1777%	0.0870%	0.0000%	0.1210%	0.0454%	0.0000%	0.1324%	0.0567%	0.0000%	0.0908%	0.0303%	0.0000%
BRUNSWICK COUNTY	11098	0.2253%	0.0631%	0.0000%	0.1982%	0.0451%	0.0000%	0.2072%	0.0451%	0.0000%	0.1802%	0.0270%	0.0000%
HAMPTON CITY	100807	0.2044%	0.0764%	0.0060%	0.1468%	0.0446%	0.0040%	0.1210%	0.0387%	0.0000%	0.0972%	0.0268%	0.0000%
WISE COUNTY	24750	0.1455%	0.0525%	0.0000%	0.1333%	0.0444%	0.0000%	0.1212%	0.0364%	0.0000%	0.1091%	0.0283%	0.0000%
WYTHE COUNTY	20950	0.1480%	0.0525%	0.0191%	0.1289%	0.0430%	0.0191%	0.1002%	0.0143%	0.0000%	0.0907%	0.0143%	0.0000%
CHESAPEAKE CITY	178005	0.1258%	0.0433%	0.0303%	0.1140%	0.0410%	0.0303%	0.0843%	0.0062%	0.0000%	0.0747%	0.0051%	0.0000%
NEWPORT NEWS CITY	124778	0.1354%	0.0537%	0.0016%	0.1122%	0.0409%	0.0016%	0.1002%	0.0313%	0.0000%	0.0850%	0.0216%	0.0000%
CUMBERLAND COUNTY	7416	0.1483%	0.0539%	0.0270%	0.1214%	0.0405%	0.0270%	0.1214%	0.0270%	0.0000%	0.0944%	0.0135%	0.0000%
PRINCE GEORGE COUNTY	24957	0.1643%	0.0401%	0.0000%	0.1322%	0.0401%	0.0000%	0.1643%	0.0401%	0.0000%	0.1322%	0.0401%	0.0000%
HALIFAX COUNTY	25086	0.1196%	0.0438%	0.0239%	0.1156%	0.0399%	0.0239%	0.0877%	0.0120%	0.0000%	0.0837%	0.0080%	0.0000%
SMYTH COUNTY	20159	0.1339%	0.0397%	0.0000%	0.1290%	0.0397%	0.0000%	0.1141%	0.0198%	0.0000%	0.1091%	0.0198%	0.0000%
FAIRFAX CITY	17825	0.1234%	0.0617%	0.0000%	0.0954%	0.0393%	0.0000%	0.1122%	0.0617%	0.0000%	0.0842%	0.0393%	0.0000%
CAMPBELL COUNTY	41318	0.1380%	0.0508%	0.0048%	0.1186%	0.0387%	0.0048%	0.1283%	0.0411%	0.0000%	0.1089%	0.0290%	0.0000%
COLONIAL HEIGHTS CITY	13066	0.0918%	0.0383%	0.0000%	0.0918%	0.0383%	0.0000%	0.0918%	0.0383%	0.0000%	0.0918%	0.0383%	0.0000%
CHESTERFIELD COUNTY	270084	0.1529%	0.0478%	0.0067%	0.1300%	0.0381%	0.0059%	0.1107%	0.0248%	0.0000%	0.0937%	0.0196%	0.0000%
PETERSBURG CITY	23740	0.1685%	0.0421%	0.0000%	0.1559%	0.0379%	0.0000%	0.1601%	0.0421%	0.0000%	0.1474%	0.0379%	0.0000%
SURRY COUNTY	5675	0.1762%	0.0352%	0.0000%	0.1410%	0.0352%	0.0000%	0.1410%	0.0000%	0.0000%	0.1057%	0.0000%	0.0000%
STAFFORD COUNTY	111261	0.1222%	0.0440%	0.0072%	0.1079%	0.0351%	0.0072%	0.1007%	0.0279%	0.0000%	0.0881%	0.0207%	0.0000%
BUCHANAN COUNTY	14836	0.0876%	0.0337%	0.0000%	0.0876%	0.0337%	0.0000%	0.0607%	0.0067%	0.0000%	0.0607%	0.0067%	0.0000%
PORTSMOUTH CITY	68381	0.1536%	0.0409%	0.0058%	0.1375%	0.0336%	0.0058%	0.1185%	0.0263%	0.0000%	0.1024%	0.0190%	0.0000%
PITTSYLVANIA COUNTY	45322	0.1677%	0.0441%	0.0044%	0.1522%	0.0331%	0.0044%	0.1324%	0.0221%	0.0000%	0.1214%	0.0154%	0.0000%
MECKLENBURG COUNTY	22996	0.1522%	0.0478%	0.0000%	0.1305%	0.0304%	0.0000%	0.1261%	0.0391%	0.0000%	0.1131%	0.0304%	0.0000%
NORTHAMPTON COUNTY	9877	0.0911%	0.0304%	0.0202%	0.0810%	0.0304%	0.0202%	0.0911%	0.0101%	0.0000%	0.0810%	0.0101%	0.0000%
PAGE COUNTY	17095	0.1872%	0.0351%	0.0000%	0.1521%	0.0292%	0.0000%	0.1521%	0.0117%	0.0000%	0.1170%	0.0058%	0.0000%
ACCOMACK COUNTY	25483	0.1216%	0.0275%	0.0000%	0.1020%	0.0275%	0.0000%	0.1138%	0.0275%	0.0000%	0.0942%	0.0275%	0.0000%
GRAYSON COUNTY	10941	0.1645%	0.0274%	0.0000%	0.1554%	0.0274%	0.0000%	0.1462%	0.0274%	0.0000%	0.1371%	0.0274%	0.0000%
ALLEGHANY COUNTY	11069	0.1355%	0.0271%	0.0000%	0.1084%	0.0271%	0.0000%	0.0994%	0.0090%	0.0000%	0.0723%	0.0090%	0.0000%
MATHEWS COUNTY	7378	0.0949%	0.0271%	0.0271%	0.0678%	0.0271%	0.0271%	0.0678%	0.0000%	0.0000%	0.0407%	0.0000%	0.0000%
BEDFORD COUNTY	63240	0.1233%	0.0300%	0.0063%	0.1154%	0.0269%	0.0063%	0.1012%	0.0142%	0.0000%	0.0933%	0.0111%	0.0000%
HENRICO COUNTY	240436	0.1152%	0.0299%	0.0083%	0.0998%	0.0258%	0.0083%	0.0944%	0.0175%	0.0000%	0.0807%	0.0133%	0.0000%
WAYNESBORO CITY	15561	0.1735%	0.0450%	0.0000%	0.1285%	0.0257%	0.0000%	0.1735%	0.0450%	0.0000%	0.1285%	0.0257%	0.0000%
HANOVER COUNTY	87000	0.1092%	0.0287%	0.0023%	0.1011%	0.0253%	0.0023%	0.1023%	0.0218%	0.0000%	0.0943%	0.0184%	0.0000%
CRAIG COUNTY	3972	0.1007%	0.0252%	0.0000%	0.1007%	0.0252%	0.0000%	0.1007%	0.0252%	0.0000%	0.1007%	0.0252%	0.0000%
GALAX CITY	4067	0.1229%	0.0246%	0.0000%	0.1229%	0.0246%	0.0000%	0.1229%	0.0246%	0.0000%	0.1229%	0.0246%	0.0000%
ORANGE COUNTY	28482	0.1299%	0.0351%	0.0000%	0.1194%	0.0246%	0.0000%	0.1299%	0.0351%	0.0000%	0.1194%	0.0246%	0.0000%
DANVILLE CITY	28838	0.1040%	0.0312%	0.0000%	0.0902%	0.0243%	0.0000%	0.1040%	0.0312%	0.0000%	0.0902%	0.0243%	0.0000%
CARROLL COUNTY	21163	0.1040%	0.0236%	0.0095%	0.1040%	0.0236%	0.0095%	0.0945%	0.0142%	0.0000%	0.0945%	0.0142%	0.0000%
FREDERICK COUNTY	67912	0.1075%	0.0324%	0.0088%	0.0883%	0.0236%	0.0059%	0.0898%	0.0206%	0.0000%	0.0736%	0.0147%	0.0000%
MANASSAS PARK CITY	9018	0.0665%	0.0222%	0.0000%	0.0554%	0.0222%	0.0000%	0.0444%	0.0222%	0.0000%	0.0333%	0.0222%	0.0000%
HENRY COUNTY	36539	0.1259%	0.0246%	0.0000%	0.1122%	0.0219%	0.0000%	0.0931%	0.0082%	0.0000%	0.0848%	0.0055%	0.0000%
BLAND COUNTY	4581	0.1091%	0.0218%	0.0000%	0.1091%	0.0218%	0.0000%	0.1091%	0.0218%	0.0000%	0.1091%	0.0218%	0.0000%
SPOTSYLVANIA COUNTY	105361	0.0987%	0.0247%	0.0057%	0.0873%	0.0218%	0.0057%	0.0816%	0.0095%	0.0000%	0.0702%	0.0066%	0.0000%
WINCHESTER CITY	18352	0.1035%	0.0381%	0.0000%	0.0708%	0.0218%	0.0000%	0.0926%	0.0381%	0.0000%	0.0599%	0.0218%	0.0000%
LANCASTER COUNTY	9267	0.0755%	0.0216%	0.0000%	0.0755%	0.0216%	0.0000%	0.0755%	0.0216%	0.0000%	0.0755%	0.0216%	0.0000%
KING WILLIAM COUNTY	13996	0.1286%	0.0214%	0.0000%	0.1143%	0.0214%	0.0000%	0.1286%	0.0214%	0.0000%	0.1143%	0.0214%	0.0000%
WESTMORELAND COUNTY	14233	0.1827%	0.0211%	0.0000%	0.1756%	0.0211%	0.0000%	0.1546%	0.0211%	0.0000%	0.1475%	0.0211%	0.0000%
VIRGINIA BEACH CITY	331914	0.1118%	0.0259%	0.0066%	0.0967%	0.0208%	0.0066%	0.0883%	0.0114%	0.0000%	0.0762%	0.0081%	0.0000%
POWHATAN COUNTY	24287	0.1400%	0.0371%	0.0000%	0.1153%	0.0206%	0.0000%	0.1400%	0.0371%	0.0000%	0.1153%	0.0206%	0.0000%
BOTETOURT COUNTY	26311	0.1178%	0.0190%	0.0076%	0.1102%	0.0190%	0.0076%	0.1102%	0.0114%	0.0000%	0.1026%	0.0114%	0.0000%
FLUVANNA COUNTY	21001	0.1286%	0.0238%	0.0000%	0.1190%	0.0190%	0.0000%	0.1095%	0.0238%	0.0000%	0.1000%	0.0190%	0.0000%
SCOTT COUNTY	16059	0.1121%	0.0249%	0.0000%	0.1059%	0.0187%	0.0000%	0.0996%	0.0125%	0.0000%	0.0934%	0.0062%	0.0000%
ALEXANDRIA CITY	112212	0.0820%	0.0205%	0.0000%	0.0686%	0.0178%	0.0000%	0.0784%	0.0169%	0.0000%	0.0651%	0.0143%	0.0000%
TAZEWELL COUNTY	28147	0.0995%	0.0178%	0.0142%	0.0959%	0.0178%	0.0142%	0.0853%	0.0036%	0.0000%	0.0817%	0.0036%	0.0000%
RICHMOND COUNTY	5649	0.2301%	0.0354%	0.0000%	0.1947%	0.0177%	0.0000%	0.1947%	0.0354%	0.0000%	0.1593%	0.0177%	0.0000%
ROCKINGHAM COUNTY	56817	0.0845%	0.0246%	0.0035%	0.0739%	0.0176%	0.0000%	0.0739%	0.0176%	0.0000%	0.0669%	0.0141%	0.0000%
LOUISA COUNTY	29567	0.1150%	0.0271%	0.0135%	0.1015%	0.0169%	0.0135%	0.1082%	0.0135%	0.0000%	0.0947%	0.0034%	0.0000%
LOUDOUN COUNTY	291914	0.0740%	0.0219%	0.0041%	0.0620%	0.0164%	0.0041%	0.0651%	0.0171%	0.0000%	0.0531%	0.0116%	0.0000%
RAPPAHANNOCK COUNTY	6239	0.0962%	0.0160%	0.0000%	0.0801%	0.0160%	0.0000%	0.0962%	0.0160%	0.0000%	0.0801%	0.0160%	0.0000%
JAMES CITY COUNTY	64390	0.0745%	0.0186%	0.0000%	0.0668%	0.0155%	0.0000%	0.0621%	0.0124%	0.0000%	0.0544%	0.0093%	0.0000%
PATRICK COUNTY	12862	0.0855%	0.0155%	0.0000%	0.0777%	0.0155%	0.0000%	0.0855%	0.0155%	0.0000%	0.0777%	0.0155%	0.0000%
PRINCE WILLIAM COUNTY	316530	0.0812%	0.0186%	0.0000%	0.0663%	0.0148%	0.0000%	0.0711%	0.0142%	0.0000%	0.0581%	0.0104%	0.0000%
AUGUSTA COUNTY	54993	0.1455%	0.0218%	0.0036%	0.1255%	0.0145%	0.0036%	0.1346%	0.0182%	0.0000%	0.1146%	0.0109%	0.0000%
DINWIDDIE COUNTY	20835	0.1584%	0.0384%	0.0048%	0.1152%	0.0144%	0.0048%	0.1488%	0.0288%	0.0048%	0.1152%	0.0144%	0.0048%
GOOCHLAND COUNTY	21410	0.1261%	0.0187%	0.0000%	0.1121%	0.0140%	0.0000%	0.1261%	0.0187%	0.0000%	0.1121%	0.0140%	0.0000%
MONTGOMERY COUNTY	61944	0.0936%	0.0145%	0.0000%	0.0807%	0.0129%	0.0000%	0.0904%	0.0145%	0.0000%	0.0775%	0.0129%	0.0000%
SHENANDOAH COUNTY	32304	0.0960%	0.0155%	0.0000%	0.0743%	0.0124%	0.0000%	0.0960%	0.0155%	0.0000%	0.0743%	0.0124%	0.0000%
ROANOKE COUNTY	73467	0.0953%	0.0163%	0.0027%	0.0830%	0.0123%	0.0027%	0.0817%	0.0109%	0.0000%	0.0694%	0.0068%	0.0000%
SALEM CITY	17932	0.0892%	0.0112%	0.0000%	0.0781%	0.0112%	0.0000%	0.0892%	0.0112%	0.0000%	0.0781%	0.0112%	0.0000%
NEW KENT COUNTY	19022	0.1051%	0.0210%	0.0000%	0.0894%	0.0105%	0.0000%	0.0946%	0.0210%	0.0000%	0.0789%	0.0105%	0.0000%
WASHINGTON COUNTY	39449	0.1014%	0.0152%	0.0000%	0.0887%	0.0101%	0.0000%	0.0862%	0.0051%	0.0000%	0.0786%	0.0051%	0.0000%
MADISON COUNTY	10407	0.0865%	0.0192%	0.0000%	0.0769%	0.0096%	0.0000%	0.0865%	0.0192%	0.0000%	0.0769%	0.0096%	0.0000%
NORFOLK CITY	141236	0.0984%	0.0092%	0.0000%	0.0864%	0.0085%	0.0000%	0.0899%	0.0064%	0.0000%	0.0793%	0.0057%	0.0000%
PULASKI COUNTY	23825	0.0881%	0.0126%	0.0000%	0.0756%	0.0084%	0.0000%	0.0881%	0.0126%	0.0000%	0.0756%	0.0084%	0.0000%
CLARKE COUNTY	12269	0.1060%	0.0163%	0.0000%	0.0978%	0.0082%	0.0000%	0.1060%	0.0163%	0.0000%	0.0978%	0.0082%	0.0000%
GREENE COUNTY	14926	0.1072%	0.0067%	0.0000%	0.1072%	0.0067%	0.0000%	0.1072%	0.0067%	0.0000%	0.1072%	0.0067%	0.0000%
GLOUCESTER COUNTY	30284	0.0859%	0.0066%	0.0000%	0.0859%	0.0066%	0.0000%	0.0859%	0.0066%	0.0000%	0.0859%	0.0066%	0.0000%
WARREN COUNTY	30517	0.0885%	0.0066%	0.0000%	0.0852%	0.0066%	0.0000%	0.0819%	0.0066%	0.0000%	0.0786%	0.0066%	0.0000%
ISLE OF WIGHT COUNTY	31179	0.0898%	0.0064%	0.0000%	0.0834%	0.0064%	0.0000%	0.0898%	0.0064%	0.0000%	0.0834%	0.0064%	0.0000%
ROCKBRIDGE COUNTY	16266	0.1230%	0.0123%	0.0000%	0.1045%	0.0061%	0.0000%	0.1230%	0.0123%	0.0000%	0.1045%	0.0061%	0.0000%
CULPEPER COUNTY	37117	0.0943%	0.0108%	0.0000%	0.0808%	0.0054%	0.0000%	0.0889%	0.0108%	0.0000%	0.0754%	0.0054%	0.0000%
FAUQUIER COUNTY	56396	0.0887%	0.0071%	0.0000%	0.0762%	0.0053%	0.0000%	0.0887%	0.0071%	0.0000%	0.0762%	0.0053%	0.0000%
FREDERICKSBURG CITY	19455	0.0874%	0.0051%	0.0000%	0.0720%	0.0051%	0.0000%	0.0874%	0.0051%	0.0000%	0.0720%	0.0051%	0.0000%
FRANKLIN COUNTY	39866	0.0602%	0.0050%	0.0050%	0.0502%	0.0050%	0.0050%	0.0552%	0.0000%	0.0000%	0.0452%	0.0000%	0.0000%
MANASSAS CITY	23815	0.1008%	0.0042%	0.0000%	0.0966%	0.0042%	0.0000%	0.0840%	0.0042%	0.0000%	0.0798%	0.0042%	0.0000%
YORK COUNTY	50838	0.0925%	0.0157%	0.0000%	0.0669%	0.0039%	0.0000%	0.0885%	0.0157%	0.0000%	0.0629%	0.0039%	0.0000%
BATH COUNTY	3358	0.0893%	0.0000%	0.0000%	0.0893%	0.0000%	0.0000%	0.0893%	0.0000%	0.0000%	0.0893%	0.0000%	0.0000%
BRISTOL CITY	12345	0.0729%	0.0000%	0.0000%	0.0567%	0.0000%	0.0000%	0.0567%	0.0000%	0.0000%	0.0567%	0.0000%	0.0000%
BUCKINGHAM COUNTY	11063	0.1356%	0.0271%	0.0000%	0.0904%	0.0000%	0.0000%	0.1356%	0.0271%	0.0000%	0.0904%	0.0000%	0.0000%
BUENA VISTA CITY	4432	0.0903%	0.0000%	0.0000%	0.0903%	0.0000%	0.0000%	0.0903%	0.0000%	0.0000%	0.0903%	0.0000%	0.0000%
CAROLINE COUNTY	22894	0.1005%	0.0087%	0.0000%	0.0830%	0.0000%	0.0000%	0.1005%	0.0087%	0.0000%	0.0830%	0.0000%	0.0000%
CHARLES CITY COUNTY	5720	0.0524%	0.0000%	0.0000%	0.0350%	0.0000%	0.0000%	0.0524%	0.0000%	0.0000%	0.0350%	0.0000%	0.0000%
COVINGTON CITY	3888	0.1029%	0.0000%	0.0000%	0.0772%	0.0000%	0.0000%	0.1029%	0.0000%	0.0000%	0.0772%	0.0000%	0.0000%
DICKENSON COUNTY	10144	0.1084%	0.0000%	0.0000%	0.0887%	0.0000%	0.0000%	0.1084%	0.0000%	0.0000%	0.0887%	0.0000%	0.0000%
ESSEX COUNTY	8318	0.1443%	0.0000%	0.0000%	0.1443%	0.0000%	0.0000%	0.1443%	0.0000%	0.0000%	0.1443%	0.0000%	0.0000%
FLOYD COUNTY	11852	0.0759%	0.0000%	0.0000%	0.0759%	0.0000%	0.0000%	0.0759%	0.0000%	0.0000%	0.0759%	0.0000%	0.0000%
GILES COUNTY	12093	0.0413%	0.0000%	0.0000%	0.0331%	0.0000%	0.0000%	0.0413%	0.0000%	0.0000%	0.0331%	0.0000%	0.0000%
GREENSVILLE COUNTY	6435	0.1709%	0.0155%	0.0000%	0.1399%	0.0000%	0.0000%	0.1709%	0.0155%	0.0000%	0.1399%	0.0000%	0.0000%
KING AND QUEEN COUNTY	5403	0.0740%	0.0000%	0.0000%	0.0740%	0.0000%	0.0000%	0.0740%	0.0000%	0.0000%	0.0740%	0.0000%	0.0000%
KING GEORGE COUNTY	19780	0.1314%	0.0000%	0.0000%	0.0910%	0.0000%	0.0000%	0.1314%	0.0000%	0.0000%	0.0910%	0.0000%	0.0000%
MARTINSVILLE CITY	9070	0.0992%	0.0000%	0.0000%	0.0882%	0.0000%	0.0000%	0.0992%	0.0000%	0.0000%	0.0882%	0.0000%	0.0000%
MIDDLESEX COUNTY	8746	0.1029%	0.0114%	0.0000%	0.0800%	0.0000%	0.0000%	0.1029%	0.0114%	0.0000%	0.0800%	0.0000%	0.0000%
POQUOSON CITY	9635	0.0934%	0.0000%	0.0000%	0.0934%	0.0000%	0.0000%	0.0934%	0.0000%	0.0000%	0.0934%	0.0000%	0.0000%
ROANOKE CITY	66083	0.0817%	0.0015%	0.0000%	0.0666%	0.0000%	0.0000%	0.0817%	0.0015%	0.0000%	0.0666%	0.0000%	0.0000%
RUSSELL COUNTY	19240	0.1091%	0.0000%	0.0000%	0.1040%	0.0000%	0.0000%	0.1091%	0.0000%	0.0000%	0.1040%	0.0000%	0.0000%

Election Data Analysis Election Forensics Election Integrity mathematics programming technical

Potential duplicate registrants in VA voter list

Post author By Jonathan Lareau
Post date May 27, 2023

I previously documented the use of the Hamming string distance measure to identify candidate pairs of duplicate registrants in voter lists. While that was a good first attempt at quantifying the number of potential duplicates in the voter rolls, the Hamming distance metric is less than ideal for reasons discussed below and in the previous article. I have since updated the processing functions to use the more complete Levenshtein distance (LD) metric and made some improvements to the parsers and other code utilities, but otherwise the analysis followed the same procedure, which is described below.

Using the 2022-11-23 Registered Voter List (RVL) and the 2023-01-26 Voter History List (VHL) purchased from the VA Department of Elections (ELECT) I wrote up an analysis script to check for potentially duplicated registrant records in the RVL and cross reference duplicate pairings with the VHL to identify potential duplicate votes. The details are summarized below.

Please note that I will not publish voter Personally Identifiable Information (PII) on this blog. I have substituted fictitious PII for all examples given below, and cryptographically hashed all voter information in the downloadable results file. I will make the detailed information available upon request (contact us) to those who have the authorization to receive and process voter data.

Summary of Results:

As a baseline, there were 6,464 records for STATUS=’Active’ registrants that met the definition of a “duplicate” used when the Social Security Number (SSN) is not available, as defined by the MOU between DMV and ELECT (section 7.3): the same First Name + Last Name + Full Date of Birth (DOB). I’ve included a copy of the MOU between the VA DMV and ELECT at the end of this article for reference. It should be noted that most records held by DMV and ELECT have an SSN associated with them (or at least they should). SSN information is not distributed as part of the data purchased from ELECT, however, so this is the appropriate standard baseline for this work.

Upgrading our definition of a potential duplicate to [First + Middle + Last + Suffix + DOB] and using a LevenshteinDistance=0 drops the number of potential duplicates to 1,982, with each identified registrant in a pair having an exactly matching string result and unique voter ID numbers.
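The two exact-match definitions above (the MOU baseline and the stricter full-name key) can be sketched as a simple grouping step. This is a minimal illustration on fictional records, not the analysis script itself; the field names (`id`, `first`, `middle`, `last`, `suffix`, `dob`) are hypothetical and do not reflect the RVL column names.

```python
from collections import defaultdict

# Fictional records; the dictionary keys are hypothetical field names.
records = [
    {"id": "1001", "first": "AMY", "middle": "BETH", "last": "MCVOTER", "suffix": "", "dob": "12/05/1970"},
    {"id": "1002", "first": "AMY", "middle": "BETH", "last": "MCVOTER", "suffix": "", "dob": "12/05/1970"},
    {"id": "1003", "first": "AMY", "middle": "ANN", "last": "MCVOTER", "suffix": "", "dob": "12/05/1970"},
    {"id": "1004", "first": "JOHN", "middle": "Q", "last": "PUBLIC", "suffix": "JR", "dob": "01/02/1980"},
]

def mou_key(r):
    """Baseline key: First Name + Last Name + full DOB."""
    return (r["first"], r["last"], r["dob"])

def full_key(r):
    """Stricter key: First + Middle + Last + Suffix + full DOB."""
    return (r["first"], r["middle"], r["last"], r["suffix"], r["dob"])

def exact_duplicate_groups(records, key_fn):
    """Group voter IDs by key and keep keys shared by more than one distinct ID."""
    groups = defaultdict(set)
    for r in records:
        groups[key_fn(r)].add(r["id"])
    return {k: sorted(ids) for k, ids in groups.items() if len(ids) > 1}

mou_groups = exact_duplicate_groups(records, mou_key)
full_groups = exact_duplicate_groups(records, full_key)
print(mou_groups)
print(full_groups)
```

On this toy data the baseline key flags three IDs, while adding the middle name and suffix narrows the group to two, which mirrors the drop seen when moving from the MOU standard to the stricter key.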

According to my derivations and simulations that are described in detail here, we should only expect to see an average of 11 (+/- 3) potential duplicate pairs (a.k.a. “collisions”) at a distance of 0. This is over two orders of magnitude different than what we observe in the compiled results. Such a discrepancy deserves further investigation and verification.

Allowing for a single character difference by setting LevenshteinDistance<=1 increases the pool of potential duplicates to 5,568. While this relaxation of the filter does allow us to find certain issues (described below), it also increases our chances of finding false positives. The LD metric results should not be viewed as a final determination, but simply as a useful tool to make an initial pass through the data and find candidate matches that still require further review, verification and validation.

Increasing to LevenshteinDistance<=2 brings the number of potential duplicates among Active registrants up to 28,884 (32,610 across all records). When we increase to LD <= 3 we get an explosion to 164,302 potential duplicates (183,130 across all records).

Method:

For every entry in the latest RVL, I performed a string distance comparison, based on Levenshtein distance, between every possible pair of strings of (FIRST NAME + MIDDLE NAME + LAST NAME + SUFFIX + FULL DOB). For the ~6M different RVL entries, comparing every entry against every other requires ~3.6 x 10^13 string comparisons (about half that, ~1.8 x 10^13, if each unordered pair is evaluated only once), and each string comparison can require upwards of 75 x 75 = 5,625 individual character comparisons, meaning the total number of character operations is on the order of 2 x 10^17 (roughly 200 quadrillion), not including logging and I/O.
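The scale estimate can be checked with a few lines of arithmetic. This sketch assumes a round 6,000,000 entries and a 75-character upper bound on the key length; the exact RVL row count is not given here, so the figures are order-of-magnitude only.

```python
n = 6_000_000   # assumed round number of RVL entries
max_len = 75    # assumed upper bound on key length, per the text

ordered = n * n                 # every entry compared against every entry
unordered = n * (n - 1) // 2    # each unordered pair compared once
cells = max_len * max_len       # character comparisons per string comparison

print(f"all-against-all comparisons: {ordered:.2e}")
print(f"unique unordered pairs:      {unordered:.2e}")
print(f"character operations:        {ordered * cells:.3e}")
```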

A distance of 0 indicates the strings being compared are identical. A distance of 1 indicates that a single character can be changed, inserted or removed to convert one string into the other. A distance of 2 indicates that 2 such modifications are required, etc.

Example: The string pair of “ALISHA” –> “ALISHIA” has an LD of 1, corresponding to the addition of an “I” before the final “A”.
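For readers who want to experiment, the distance can be computed with the standard dynamic-programming recurrence. The function below is a minimal textbook sketch and is not necessarily how the analysis script implements it.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string a into string b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca from a
                cur[j - 1] + 1,            # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        prev = cur
    return prev[-1]

print(levenshtein("ALISHA", "ALISHIA"))
```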

I aggregated all of the Levenshtein distance pairings that were less than or equal to 3 characters different in order to identify potential (key word) duplicated registrants, and additionally for each pairing looked at the voter history information for each registrant in the pair to determine if there was a potential (again … key word) for multiple ballots to be cast by the same person in any given election. As we allow for more characters to be different, we potentially are including many more likely false positive matches, even if we are catching more true positives.

For example: At a distance of 4 the strings of “Dave Joseph Smith M 10/01/1981” and “Tony Joseph Smith M 10/01/1981” at the same address would produce a potential match, but so would “Davey Joseph Smith M 10/01/1981” and “David Josiph Smith M 10/02/1981”. The first pair is more likely to be a false positive due to twins, while the second is more likely to be due to typos, mistakes, or use of nicknames and might warrant further investigation. A much stronger potential match would be something like “David Josiph Smith M 10/01/1981” and “David Joseph Smith M 10/01/1981”, with a distance of 1 at the same address. In an attempt to limit false positives, I have clamped the distance checks to <= 3 in this analysis.
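One way to apply such a clamp is to stop computing a distance as soon as it is certain to exceed the threshold. The sketch below is an assumption about how a cutoff could be implemented, not a description of the actual script; `bounded_levenshtein` and `max_dist` are hypothetical names.

```python
def bounded_levenshtein(a: str, b: str, max_dist: int):
    """Return the Levenshtein distance if it is <= max_dist, else None."""
    # A length gap larger than the threshold already rules the pair out.
    if abs(len(a) - len(b)) > max_dist:
        return None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        # The final distance can never be smaller than the minimum of any row,
        # so once a whole row exceeds the threshold the pair can be discarded.
        if min(cur) > max_dist:
            return None
        prev = cur
    return prev[-1] if prev[-1] <= max_dist else None

twins = ("Dave Joseph Smith M 10/01/1981", "Tony Joseph Smith M 10/01/1981")
typos = ("Davey Joseph Smith M 10/01/1981", "David Josiph Smith M 10/02/1981")
strong = ("David Josiph Smith M 10/01/1981", "David Joseph Smith M 10/01/1981")

for label, (x, y) in [("twins", twins), ("typos", typos), ("strong", strong)]:
    print(label, bounded_levenshtein(x, y, 3), bounded_levenshtein(x, y, 4))
```

With a threshold of 3, the first two pairs are dropped and only the distance-1 pair survives, which is the trade-off described above.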

The Levenshtein distance measure is importantly able to identify potential insertions or deletions as well as character changes, which is an improvement over the Hamming distance measure. This is illustrated by the following pairing: “David Joseph Smith M 10/01/1981” and “Dave Joseph Smith M 10/01/1981”. The change from “id” to “e” in the first name removes a character, shifting the position of every character in the remainder of the string. The Levenshtein metric correctly returns a small distance of 2, whereas a position-by-position Hamming comparison returns 27.
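The Hamming figure for this pair can be reproduced with a position-by-position comparison. Strictly speaking, Hamming distance is defined only for equal-length strings; this sketch assumes the comparison is truncated to the length of the shorter string, which is the convention that yields 27 here. The function name is hypothetical.

```python
def hamming_truncated(a: str, b: str) -> int:
    """Count differing positions, comparing only up to the shorter length."""
    return sum(ca != cb for ca, cb in zip(a, b))

a = "David Joseph Smith M 10/01/1981"
b = "Dave Joseph Smith M 10/01/1981"

# After the first three characters, every position is misaligned by one.
print(hamming_truncated(a, b))
```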

Note that with the official records obtained from ELECT, and in accordance with the laws of VA, I do not have access to the social security number or driver’s license number for each registration record, which would help in identifying and discriminating potential duplicate errors vs things like twins, etc. I only have the first name, middle name, last name, suffix, month of birth, day of birth, year of birth, gender, and address information that I can work with. I can therefore only take things so far before someone else (with investigative authority and ability to access those other fields) would need to step in and confirm and validate these findings.

Results:

The summary totals are as follows, with detailed examples.

	DMV_ELECT MOU Standard	LD <= 0	LD <= 1	LD <= 2	LD <= 3
Number of Potential Duplicate Registrant Pairs	7,586 (0.12%)	2,472 (0.04%)	6,620 (0.11%)	32,610 (0.53%)	183,130 (2.99%)
Number of Potential Duplicate Registrant Pairs (Active Only)	6,464 (0.11%)	1,982 (0.03%)	5,568 (0.10%)	28,884 (0.50%)	164,302 (2.85%)
Number of Potential Duplicate Ballots	6,362	112	3,576	37,028	236,254
Number of Potential Duplicate Ballots (Active Only)	6,228	110	3,542	36,434	232,394
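The “potential duplicate ballots” rows come from cross-referencing each candidate pair against the voter history. A minimal sketch of that check is shown below, assuming the history has already been loaded into a mapping from voter ID to the set of elections with a recorded vote. The data structure, the function name, and the IDs are all hypothetical, and the sketch does not attempt to reproduce how the counts in the table were tallied.

```python
def shared_elections(pairs, history):
    """For each candidate pair of voter IDs, list the elections in which
    both IDs have a recorded vote."""
    flagged = {}
    for id_a, id_b in pairs:
        common = history.get(id_a, set()) & history.get(id_b, set())
        if common:
            flagged[(id_a, id_b)] = sorted(common)
    return flagged

# Fictional voter history: voter ID -> elections with a recorded vote.
history = {
    "1001": {"2020 November General", "2022 November General"},
    "1002": {"2020 November General"},
    "1003": {"2021 November General"},
}
pairs = [("1001", "1002"), ("1001", "1003")]

flagged = shared_elections(pairs, history)
print(flagged)
```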

Examples of Types of Issues Observed:

NOTE THE BELOW INFORMATION HAS HAD THE VOTER PERSONALLY IDENTIFIABLE INFORMATION (“PII”) FICTIONALIZED. WHILE THESE ARE BASED ON REAL DATA TO ILLUSTRATE THE DIFFERENT TYPES OF OBSERVATIONS, THEY DO NOT REPRESENT REAL VOTER INFORMATION.

Example #1: The following set of records has an exact match (distance = 0) of full name and full birthdate (including year), but different addresses and different voter ID numbers AND there was a vote cast from each of those unique voter IDs in the 2020 General Election. While it’s remotely possible that two individuals share the exact same name, month, day and year of birth … it is probabilistically unlikely (see here), and should warrant further scrutiny.

Voter Record A:

AMY BETH McVOTER 12/05/1970 F 12345 CITIZEN CT

Voter Record B:

AMY BETH McVOTER 12/05/1970 F 5678 McPUBLIC DR

Example #2: This set of records has a single character different (distance of 1) in the first name, but the middle name, last name, birthdate and address are identical AND both records are associated with votes that were cast in the 2020, 2021, and 2022 November General Elections. While it is possible that this is a pair of 23-year-old twins (with the same middle name) who live together, it at least bears looking into.

Voter Record A:

TAYLOR DAVID VOTER 02/16/2000 M 6543 OVERLOOK AVE NW

Voter Record B:

DAYLOR DAVID VOTER 02/16/2000 M 6543 OVERLOOK AVE NW

Example #3: This set of records has two characters different (distance of 2) in their birthdate, but name and address are identical AND the birth years are too close together for a child/parent relationship, AND both records are associated with votes that were cast in the 2020 and 2022 November General Elections.

Voter Record A:

REGINA DESEREE MACGUFFIN 02/05/1973 F 123 POPE AVE

Voter Record B:

REGINA DESEREE MACGUFFIN 03/07/1973 F 123 POPE AVE

Example #4: This set of records again has a single character different (distance of 1) in the first name (but not the first letter this time), while the last name, birthdate and address are identical. There were also multiple votes cast in the 2019 and 2022 November General Elections from these registrants.

Voter Record A:

EDGARD JOHNSON 10/19/1981 M 5498 PAGELAND BLVD

Voter Record B:

EDUARD JOHNSON 10/19/1981 M 5498 PAGELAND BLVD

Example #5: This set of records has two characters different (distance of 2), one each in the first and middle names, while the last name, birthdate, gender and address are identical. There were also multiple votes cast in the 2021 and 2022 November General Elections from these registrants. Again, given only the information that ELECT provides, it is possible that these records represent a set of twins.

Voter Record A:

ALANA JAVETTE THOMPSON 01/01/2003 F 123 CHARITY LN

Voter Record B:

ALAYA YAVETTE THOMPSON 01/01/2003 F 123 CHARITY LN

Example #6: The following set of records has an exact match (distance = 0) of full name and full birthdate (including year), and the same address, but different voter ID numbers. There were no duplicated votes in the same election detected between the two ID numbers.

Voter Record A:

JAMES TIBERIUS KIRK 03/22/2223 M 1701 Enterprise Bridge

Voter Record B:

JAMES TIBERIUS KIRK 03/22/2223 M 1701 Enterprise Bridge

Example #7: The following set of records has an exact match (distance = 0) of full name and full birthdate (including year), and the same address, but different gender and voter ID numbers. There were no duplicated votes in the same election detected between the two ID numbers.

Voter Record A:

MAXWELL QUAID CLINGER 11/03/2004 M 4077 MASH DR

Voter Record B:

MAXWELL QUAID CLINGER 11/03/2004 U 4077 MASH DR

Example #8: The following set of records has a single punctuation character different, with the same address but different voter ID numbers. There were no duplicated votes in the same election detected between the two ID numbers.

Voter Record A:

JOHN JACOB JINGLHIEMER-SCHMIDT 06/29/1997 M 12345 JACOBS RD

Voter Record B:

JOHN JACOB JINGLHIEMER SCHMIDT 06/29/1997 M 12345 JACOBS RD

Results Dataset:

A full version of the aggregated Excel data is provided below; however, all voter information (ID, first name, middle name, last name, DOB, gender, address) has been removed and replaced by a one-way hash value, with randomized salt, based on the voter ID. The full file with specific voter information can be provided upon request to parties authorized by ELECT to receive and process voter information, Election Officials, or Law Enforcement.

20221123-VA-RVL-String-Distance.csv
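As an illustration of the kind of one-way, salted hashing described above, the sketch below derives a stable pseudonym from a voter ID. The choice of SHA-256, the 16-byte salt, and the function names are assumptions made for this example; the actual hashing used for the results file is not specified here.

```python
import hashlib
import secrets

def make_pseudonymizer():
    """Return a function mapping voter IDs to salted one-way hashes.
    The random salt is generated once and kept private, so published
    hashes cannot be recomputed from a list of known voter IDs."""
    salt = secrets.token_bytes(16)  # assumed salt size; never published

    def pseudonym(voter_id: str) -> str:
        return hashlib.sha256(salt + voter_id.encode("utf-8")).hexdigest()

    return pseudonym

pseudonym = make_pseudonymizer()
# The same ID always maps to the same hash within one run, so pairings
# in the results remain consistent without exposing the underlying ID.
print(pseudonym("123456789") == pseudonym("123456789"))
print(pseudonym("123456789") == pseudonym("123456780"))
```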

The MOU between the VA Department of Elections (ELECT) and the VA Department of Motor Vehicles (DMV) is also provided below for reference. Section 7.3 defines the minimal standards for determining a match when no social security number is present.

MOU-between-DMV-and-The-Virginia-Department-of-Elections-2021 Download
