There are currently two completely separate but simultaneous primary elections being held in VA, with actual Election Day coming up fast on March 5th. As part of EPEC’s data analysis on the ongoing Democratic and Republican primaries, I took some time to look at the distribution of voter participation. VA does not have voter registration by party, but participation in primary elections is often used as a surrogate to estimate a voter’s leaning.
I was specifically interested in how many “cross-over” voters were participating in each party’s primary. There have been multiple news articles (here, for example) discussing the potential for Democrats to cross-vote in the 2024 primaries, and I wanted to see if I could observe evidence of that behavior in the data.
Results:
As can be seen from the image below, there is clear evidence of crossover voting, with historically Democratic primary voters crossing over and voting in this year’s (2024) Republican primary.
Approximately 17.5% of the 109,395 ballots cast in the 2024 VA Republican primary are associated with historically Democratic-leaning registrants. Only 0.35% of the 159,505 ballots cast in the 2024 VA Democratic primary are associated with historically Republican-leaning registrants. [Note: this plot was updated on 2024-03-11 to reflect the latest values. The previous results from mid-February had the number of crossover D->R voters at ~12%.]
Method:
Step 1: Compute an estimate of voter leaning.
The data utilized in this analysis all comes directly from the VA Dept of Elections (“ELECT”) and includes the statewide Registered Voter List (“RVL”) and Voter History List (“VHL”) files dated 03/11/2024, as well as the Daily Absentee List (“DAL”) files corresponding to each of the ongoing Democrat and Republican primaries.
An estimate of each voter’s party leaning is first computed by going through the VHL and, for each unique voter, summing the number of Democratic and Republican primaries that voter has participated in historically. We then take the difference of these two counts and divide by the total number of election contests the voter has participated in. This gives us an estimate of the “leaning” for each unique voter.
leaning = (# Dem Primaries – # Rep Primaries) / (# of Total Contests)
A leaning < 0 indicates a Republican lean, and > 0 indicates a Democratic lean. A voter might have a lean == 0 if they had a balanced participation in previous primaries, or if there is no voter history for that particular voter.
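As a rough illustration of the leaning formula above (this is a sketch, not the actual EPEC processing script), the computation could look like the following in Python. The contest labels `DEM_PRIMARY` and `REP_PRIMARY` and the `(voter_id, contest_type)` row layout are assumptions made for the example; the real VHL file encodes its elections differently.

```python
from collections import defaultdict

# Hypothetical contest labels; the real VHL encodes election types differently.
DEM_PRIMARY = "DEM_PRIMARY"
REP_PRIMARY = "REP_PRIMARY"


def compute_leanings(history_rows):
    """Return {voter_id: leaning} from (voter_id, contest_type) pairs.

    leaning = (# Dem primaries - # Rep primaries) / (# total contests)
    """
    dem = defaultdict(int)
    rep = defaultdict(int)
    total = defaultdict(int)
    for voter_id, contest_type in history_rows:
        total[voter_id] += 1
        if contest_type == DEM_PRIMARY:
            dem[voter_id] += 1
        elif contest_type == REP_PRIMARY:
            rep[voter_id] += 1
    return {v: (dem[v] - rep[v]) / total[v] for v in total}


def leaning_for(voter_id, leanings):
    """Voters with no history at all default to a leaning of 0."""
    return leanings.get(voter_id, 0.0)


# Toy history: voter A has one Dem primary out of two contests,
# voter B has two Rep primaries and one Dem primary out of four contests.
history = [
    ("A", DEM_PRIMARY), ("A", "GENERAL"),
    ("B", REP_PRIMARY), ("B", REP_PRIMARY), ("B", DEM_PRIMARY), ("B", "GENERAL"),
]
leanings = compute_leanings(history)
print(leanings["A"], leanings["B"], leaning_for("C", leanings))
```

Because the denominator is the total number of contests rather than the number of primaries, a voter who votes in many general elections but few primaries ends up with a leaning close to 0.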
Step 2: Plot the histogram of voter leaning for ballots cast so far in both the Democratic and Republican primaries. Additionally, plot the computed voter leaning for the entire RVL as a reference.
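A minimal sketch of Step 2, assuming the per-voter leanings are already available as a dictionary keyed by voter ID (the function names and the choice of equal-width bins are mine, for illustration only). It computes the crossover share for a primary and the bin counts that a histogram plot would display.

```python
def crossover_share(ballot_voter_ids, leanings, primary):
    """Fraction of ballots cast by voters who lean toward the other party.

    primary is "REP" or "DEM". Voters missing from `leanings` count as 0.
    """
    ids = list(ballot_voter_ids)
    if not ids:
        return 0.0
    if primary == "REP":
        crossed = sum(1 for v in ids if leanings.get(v, 0.0) > 0)
    elif primary == "DEM":
        crossed = sum(1 for v in ids if leanings.get(v, 0.0) < 0)
    else:
        raise ValueError("primary must be 'REP' or 'DEM'")
    return crossed / len(ids)


def leaning_histogram(values, n_bins=20):
    """Counts over equal-width bins spanning [-1, 1]."""
    counts = [0] * n_bins
    width = 2.0 / n_bins
    for x in values:
        # Clamp so that a leaning of exactly +1 falls in the last bin.
        idx = min(int((x + 1.0) / width), n_bins - 1)
        counts[idx] += 1
    return counts


toy_leanings = {"a": 0.5, "b": -0.5, "c": 0.0}
print(crossover_share(["a", "b", "c", "d"], toy_leanings, "REP"))
print(leaning_histogram([-1.0, 0.0, 1.0], n_bins=4))
```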
Below you will find the current summary data and graphics from the VA 2024 Republican Primary Election Daily Absentee List files. We pull the DAL file every day and track the count of each specific ballot category in each daily file.
Note: Page may take a moment to load the graphics objects.
Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.
The logarithmic plot shows the same underlying data as the linear scale plot, but with a logarithmic y-scale that compresses the dynamic range so that the shape of all of the data curves can be seen in a single graphic. Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.
The underlying data for the graphics above is provided in the summary data table.
Additional Data:
Additional CSV datasets stratified by Locality, City, Congressional District, State House District, State Senate District, and Precinct are available here.
Data column descriptions:
“ISSUED” := Number of DAL file records where BALLOT_STATUS= “ISSUED”
“NOT_ISSUED” := Number of DAL file records where BALLOT_STATUS= “NOT ISSUED”
“PROVISIONAL” := Number of DAL file records where BALLOT_STATUS= “PROVISIONAL” and APP_STATUS=”APPROVED”
“DELETED” := Number of DAL file records where BALLOT_STATUS= “DELETED”
“MARKED” := Number of DAL file records where BALLOT_STATUS= “MARKED” and APP_STATUS=”APPROVED”
“ON_MACHINE” := Number of DAL file records where BALLOT_STATUS= “ON_MACHINE” and APP_STATUS=”APPROVED”
“PRE_PROCESSED” := Number of DAL file records where BALLOT_STATUS= “PRE-PROCESSED” and APP_STATUS=”APPROVED”
“FWAB” := Number of DAL file records where BALLOT_STATUS= “FWAB” and APP_STATUS=”APPROVED”
“MAIL_IN” := The sum of “MARKED” + “PRE_PROCESSED”
“COUNTABLE” := The sum of “PROVISIONAL” + “MARKED” + “PRE_PROCESSED” + “ON_MACHINE” + “FWAB”
“MILITARY” := Number of DAL file records where VOTER_TYPE= “MILITARY”
“OVERSEAS” := Number of DAL file records where VOTER_TYPE= “OVERSEAS”
“TEMPORARY” := Number of DAL file records where VOTER_TYPE= “TEMPORARY”
“MILITARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE= “MILITARY” and where COUNTABLE is True
“OVERSEAS_COUNTABLE” := Number of DAL file records where VOTER_TYPE= “OVERSEAS” and where COUNTABLE is True
“TEMPORARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE= “TEMPORARY” and where COUNTABLE is True
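The column definitions above can be sketched as a single pass over the DAL records. This is an illustrative outline only: it assumes each record is a dictionary with `BALLOT_STATUS`, `APP_STATUS` and `VOTER_TYPE` keys, and it normalizes case, spaces and hyphens so that values such as “Pre-Processed” or “On Machine” map onto the column names. That normalization is my assumption, not something specified by ELECT.

```python
from collections import Counter

COUNTABLE_KEYS = {"PROVISIONAL", "MARKED", "PRE_PROCESSED", "ON_MACHINE", "FWAB"}
MAIL_IN_KEYS = {"MARKED", "PRE_PROCESSED"}
UNCONDITIONAL_KEYS = {"ISSUED", "NOT_ISSUED", "DELETED"}  # no APP_STATUS check
VOTER_TYPES = {"MILITARY", "OVERSEAS", "TEMPORARY"}


def normalize(value):
    """Upper-case and map spaces/hyphens to underscores."""
    return value.strip().upper().replace(" ", "_").replace("-", "_")


def summarize_dal(records):
    """Count DAL records per summary column for one daily file."""
    counts = Counter()
    for rec in records:
        status = normalize(rec.get("BALLOT_STATUS", ""))
        approved = normalize(rec.get("APP_STATUS", "")) == "APPROVED"
        voter_type = normalize(rec.get("VOTER_TYPE", ""))

        if status in UNCONDITIONAL_KEYS:
            counts[status] += 1
        countable = approved and status in COUNTABLE_KEYS
        if countable:
            counts[status] += 1
            counts["COUNTABLE"] += 1
            if status in MAIL_IN_KEYS:
                counts["MAIL_IN"] += 1
        if voter_type in VOTER_TYPES:
            counts[voter_type] += 1
            if countable:
                counts[voter_type + "_COUNTABLE"] += 1
    return counts


sample = [
    {"BALLOT_STATUS": "Issued", "APP_STATUS": "Approved", "VOTER_TYPE": ""},
    {"BALLOT_STATUS": "Marked", "APP_STATUS": "Approved", "VOTER_TYPE": "Military"},
    {"BALLOT_STATUS": "Pre-Processed", "APP_STATUS": "Approved", "VOTER_TYPE": "Overseas"},
    {"BALLOT_STATUS": "On Machine", "APP_STATUS": "Approved", "VOTER_TYPE": ""},
    {"BALLOT_STATUS": "Marked", "APP_STATUS": "Denied", "VOTER_TYPE": "Military"},
]
summary = summarize_dal(sample)
print(dict(summary))
```

Running this once per daily DAL file and appending the resulting counts to a table would give one time series per column.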
All data purchased by Electoral Process Education Corp. (EPEC) from the VA Dept of Elections (ELECT). All processing performed by EPEC.
If you like the work that EPEC is doing, please support us with a donation.
Below you will find the current summary data and graphics from the VA 2024 Democratic Primary Election Daily Absentee List files. We pull the DAL file every day and track the count of each specific ballot category in each daily file.
Note: Page may take a moment to load the graphics objects.
Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.
The logarithmic plot shows the same underlying data as the linear scale plot, but with a logarithmic y-scale that compresses the dynamic range so that the shape of all of the data curves can be seen in a single graphic. Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.
The underlying data for the graphics above is provided in the summary data table.
Additional Data:
Additional CSV datasets stratified by Locality, City, Congressional District, State House District, State Senate District, and Precinct are available here.
Data column descriptions:
“ISSUED” := Number of DAL file records where BALLOT_STATUS= “ISSUED”
“NOT_ISSUED” := Number of DAL file records where BALLOT_STATUS= “NOT ISSUED”
“PROVISIONAL” := Number of DAL file records where BALLOT_STATUS= “PROVISIONAL” and APP_STATUS=”APPROVED”
“DELETED” := Number of DAL file records where BALLOT_STATUS= “DELETED”
“MARKED” := Number of DAL file records where BALLOT_STATUS= “MARKED” and APP_STATUS=”APPROVED”
“ON_MACHINE” := Number of DAL file records where BALLOT_STATUS= “ON_MACHINE” and APP_STATUS=”APPROVED”
“PRE_PROCESSED” := Number of DAL file records where BALLOT_STATUS= “PRE-PROCESSED” and APP_STATUS=”APPROVED”
“FWAB” := Number of DAL file records where BALLOT_STATUS= “FWAB” and APP_STATUS=”APPROVED”
“MAIL_IN” := The sum of “MARKED” + “PRE_PROCESSED”
“COUNTABLE” := The sum of “PROVISIONAL” + “MARKED” + “PRE_PROCESSED” + “ON_MACHINE” + “FWAB”
“MILITARY” := Number of DAL file records where VOTER_TYPE= “MILITARY”
“OVERSEAS” := Number of DAL file records where VOTER_TYPE= “OVERSEAS”
“TEMPORARY” := Number of DAL file records where VOTER_TYPE= “TEMPORARY”
“MILITARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE= “MILITARY” and where COUNTABLE is True
“OVERSEAS_COUNTABLE” := Number of DAL file records where VOTER_TYPE= “OVERSEAS” and where COUNTABLE is True
“TEMPORARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE= “TEMPORARY” and where COUNTABLE is True
The discussion below is based on the treatment of “Single Transferrable Vote” (“STV”) methods in [1], published in 1977. STV has more recently been called “Ranked Choice Voting” (RCV) or “Instant Runoff Voting” (IRV), among other names, by lobbying groups that are currently pushing for its incorporation into our voting systems. Irrespective of the name used, it represents a family of voting methods, with slightly different variants depending on how votes are removed and/or redistributed in each successive round of voting. [2][5]
What does STV/RCV/IRV entail, in general:
The core system is a proportional voting system in which voters rank-order their preferred candidates. All ballots are collected, and centralized tabulation is performed in multiple rounds until the winner(s), meaning candidates whose support reaches a specified quota (or “threshold”), are determined.
A common definition of the quota utilized in STV/RCV/IRV systems is the “Droop quota”, defined as:
q = FLOOR( # of Voters / (# of Seats + 1) + 1)
In a given round, the candidate with the least support is eliminated from further evaluation. Surplus votes from candidates that go over the Droop threshold, and votes from eliminated candidates, can be distributed amongst the remaining candidates for subsequent rounds. Surplus vote distribution is only applicable when multiple winners are allowed in a contest.
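For reference, the Droop quota formula above reduces to simple integer arithmetic, since FLOOR(x + 1) equals FLOOR(x) + 1. A minimal sketch (the function name is mine):

```python
def droop_quota(num_voters, num_seats):
    """Droop quota: FLOOR(num_voters / (num_seats + 1) + 1)."""
    if num_voters < 0 or num_seats < 1:
        raise ValueError("need num_voters >= 0 and num_seats >= 1")
    return num_voters // (num_seats + 1) + 1


print(droop_quota(17, 1))   # 17 voters, 1 seat
print(droop_quota(100, 3))  # 100 voters, 3 seats
```

With 17 voters and a single seat this gives 9, and with 100 voters and 3 seats it gives 26.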
The arguments used to support and push for RCV have not significantly changed since the time that the original paper was published, but the terms and language utilized have been modified. The authors note that much of the rationale in pushing for STV was centered around the ideas of inclusivity and making sure voters are able to cast “effective” ballots.
“Modern proponents emphasize the system’s effective representation of minorities, its sensitivity and accuracy in ‘measuring changes in popular will,’ and its tendency to encourage independent (nonparty line) voting.”
Doron, G., & Kronick, R. (1977) [1]
The same arguments have been recently repeated and pushed to legislators and the media. The name has changed from “Single Transferrable Vote” to “Ranked Choice Voting” or “Instant Runoff Voting”, but the argument remains largely the same, as can be seen by simply visiting the websites and promotional material for any of the current groups that are lobbying for RCV to be incorporated [3][4].
The issue pointed out by Doron & Kronick:
The authors in [1] note that the STV/RCV/IRV system allows for a “perversion” (their words, not mine) whereby a candidate’s chances of being selected as a winner can be negatively impacted even when that candidate receives increased support.
“… a function that permitted an increased vote for a candidate to cause a decline in that candidate’s rank in the social ordering-would probably strike most of us as a rather absurd, even perverse, method of arriving at a social choice. Consequently, some writers refer to this condition as the ‘Non-Perversity’ condition. All of the democratic social choice functions that have been considered in the literature were assumed to guarantee this condition, but the Single Transferrable Vote system does not.”
Doron, G., & Kronick, R. (1977) [1]
The authors present a hypothetical example to demonstrate the issue. Suppose we have 3 candidates (Candidate X, Candidate Y, Candidate Z) and two different voting groups, which we will refer to as group D and D’. Both D and D’ are fairly similar and only disagree on the relative ranking of two specific candidates.
In the tables below, recreated from [1], the only difference between the two voting groups is that two voters who rank Y first and X second in group D instead rank X first and Y second in group D’, so candidate X receives more support in D’. However, under the voting rules described above, candidate X wins in D and loses in D’, even though X has increased support in D’.
| # of Voters | First Choice | Second Choice | Third Choice |
|---|---|---|---|
| 6 | X | Y | Z |
| 2 | Y | X | Z |
| 4 | Y | Z | X |
| 5 | Z | X | Y |
Voting group D selections. Reprinted from [1].
| # of Voters | First Choice | Second Choice | Third Choice |
|---|---|---|---|
| 6 | X | Y | Z |
| 2 | X | Y | Z |
| 4 | Y | Z | X |
| 5 | Z | X | Y |
Voting group D’ selections. Reprinted from [1].
There are 17 voters in each case, and only 1 seat available. Therefore, the Droop quota/threshold is 9 votes required in order to declare a winner.
In group D, the first-round totals are X = 6, Y = 6 and Z = 5. Candidate Z has the fewest votes and is eliminated, which advances 5 second-choice votes for X into the next round. Candidate X then has 11 votes, passes the threshold, and wins in the second round.
In group D’, where candidate X received more support than in group D, the first-round totals are X = 8, Y = 4 and Z = 5. Candidate Y now has the fewest votes and is eliminated, which advances 4 second-choice votes for Z into the next round. Candidate Z then has 9 votes, reaches the threshold, and wins in the second round.
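The two outcomes can be reproduced with a short single-winner elimination routine. This is a simplified sketch of the rules as described in this post (one seat, Droop quota, eliminate the lowest candidate each round, no surplus transfers), not a full STV implementation. Ties are broken alphabetically, which is an arbitrary choice of mine and does not arise in this example.

```python
def irv_winner(ballot_groups):
    """Single-seat instant runoff.

    ballot_groups: list of (count, ranking) pairs, where ranking is a tuple
    of candidate names ordered from most to least preferred.
    """
    total = sum(count for count, _ in ballot_groups)
    quota = total // 2 + 1  # Droop quota for a single seat
    remaining = sorted({c for _, ranking in ballot_groups for c in ranking})
    while remaining:
        tally = {c: 0 for c in remaining}
        for count, ranking in ballot_groups:
            # Each ballot counts for its highest-ranked surviving candidate.
            top = next((c for c in ranking if c in tally), None)
            if top is not None:
                tally[top] += count
        leader = max(remaining, key=lambda c: tally[c])
        if tally[leader] >= quota or len(remaining) == 1:
            return leader
        loser = min(remaining, key=lambda c: tally[c])
        remaining.remove(loser)
    return None


GROUP_D = [
    (6, ("X", "Y", "Z")),
    (2, ("Y", "X", "Z")),
    (4, ("Y", "Z", "X")),
    (5, ("Z", "X", "Y")),
]
GROUP_D_PRIME = [
    (6, ("X", "Y", "Z")),
    (2, ("X", "Y", "Z")),
    (4, ("Y", "Z", "X")),
    (5, ("Z", "X", "Y")),
]
print(irv_winner(GROUP_D))        # X
print(irv_winner(GROUP_D_PRIME))  # Z
```

The only change between the two inputs is the second row, where two voters move X ahead of Y, and that change alone flips the winner from X to Z.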
Bibliography:
Doron, G., & Kronick, R. (1977). Single Transferrable Vote: An Example of a Perverse Social Choice Function. American Journal of Political Science, 21(2), 303–311. https://doi.org/10.2307/2110496
Brandt F, Conitzer V, Endriss U, Lang J, Procaccia AD, eds. Handbook of Computational Social Choice. Cambridge: Cambridge University Press; 2016. https://doi.org/10.1017/CBO9781107446984
Update (2023-12-14 12:00:00 EST) : Special thank you to Rick Michael of the Chesterfield Electoral board for checking their records on issues #1 and #2 below. There were 3 x Issue #1 records and 9 x Issue #2 records identified in Chesterfield County.
According to Rick, the records in question were populated and visible when looking via the electronic VERIS (the state’s election database) login available to the Registrar. The 3 x Issue #1 records can be found and are Active records in the electronic system, and the 9 x Issue #2 records had an update, not reflected in the data supplied to us, that moved the records from Inactive to Active.
That implies that the data that we purchased (for approximately $12,000) directly from the department of elections is inaccurate and incomplete. Our initial purchase and download of the June 30 Registered Voter List (RVL) database does not show the registrants identified in Issue #1, even though the Registrar can see them in their electronic terminal. And our Monthly Update Subscription (MUS) we receive is missing the updates showing the registrant records identified in Issue #2 being moved from Inactive to Active status.
The department of elections is required by federal law (NVRA, HAVA) to keep and maintain accurate election records AND to make those records accessible for inspection and verification, and for use by candidates and political parties. Additionally, we have paid (twice!) for this data; once as taxpayers, and once again as a 501c3 entity. If the data that we and other campaigns and candidates are receiving is incomplete, inaccurate, and not representative of the actual records in the database, that needs to be addressed and fixed.
Summary:
Issue #1: There are 99 records of ballots cast, according to the VA Department of Elections (ELECT) Daily Absentee List (DAL) data file, that do not have a corresponding voter ID listed in the Registered Voter List (RVL) data.
Issue #2: There are 380 records of ballots cast in the DAL where the corresponding RVL record has been listed as “Inactive” since June-30-2023 and no modification to the RVL record has taken place.
Issue #3: There are 18 records of ballots cast in the DAL where the corresponding RVL record is listed as “Inactive” as of Dec-01-2023, but there have been previous modifications to the RVL record since June-30-2023.
We are currently reaching out in attempts to contact the VA AG’s office and to provide them the details of this analysis in order to have these anomalies further investigated.
Data files utilized for this analysis:
Our 501c3 EPEC purchased and downloaded the full statewide VA RVL on June-30-2023 from ELECT. We additionally purchased the Monthly Update Service (MUS) package from ELECT, where on the 1st of each month we are provided a list of all of the changes that have occurred to the RVL in the previous month. By applying these changes to our baseline data file, we are able to update our copy of the RVL to reflect the latest state as per ELECT. We can also create a cumulative record of all entries associated with a particular voter ID by simply concatenating all of these datafiles.
Additionally, during the VA 2023 General Election, we purchased access to the Daily Absentee List (DAL) file generated by ELECT that documents all of the transactions associated with early mail-in or in-person voting during the 45 day early voting period. The DAL file we utilized for this analysis was downloaded from ELECT on Nov-13-2023 at 6am EST.
Identification of ballots cast via the DAL file can be performed by checking for rows of the DAL data table that have the APP_STATUS field set to “Approved” and have the BALLOT_STATUS field set to any of the following: “Marked” | “Pre-Processed” | “On Machine” | “FWAB” | “Provisional”.
Once cast ballots are identified in the DAL, the Voter Identification Number can be used to lookup all of the corresponding records in our cumulative RVL data. The data issues summarized above can be directly observed using this process. Due to VA law, we cannot publish the full specific records here in this blog but have summarized, captured and described our process and results.
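The process just described can be outlined roughly as follows. This is a hedged sketch, not our production script: the `VOTER_ID`, `STATUS` and `SOURCE` field names are hypothetical stand-ins for the actual ELECT column names, and the cumulative RVL data is assumed to be a dictionary mapping each voter ID to its list of records, oldest first.

```python
CAST_STATUSES = {"marked", "pre-processed", "on machine", "fwab", "provisional"}


def is_cast(dal_row):
    """A DAL row counts as a cast ballot if approved and in a cast status."""
    return (
        dal_row["APP_STATUS"].strip().lower() == "approved"
        and dal_row["BALLOT_STATUS"].strip().lower() in CAST_STATUSES
    )


def classify_cast_ballots(dal_rows, rvl_history):
    """Group cast-ballot voter IDs by the three issue types.

    rvl_history: {voter_id: [record, ...]} oldest first, where each record
    has a 'STATUS' and a 'SOURCE' ('BASELINE' or an MUS file label).
    """
    issues = {"no_rvl_record": [], "inactive_unmodified": [], "inactive_modified": []}
    for row in dal_rows:
        if not is_cast(row):
            continue
        voter_id = row["VOTER_ID"]
        records = rvl_history.get(voter_id, [])
        if not records:
            issues["no_rvl_record"].append(voter_id)        # Issue #1
            continue
        if records[-1]["STATUS"].strip().lower() != "inactive":
            continue
        modified = any(r["SOURCE"] != "BASELINE" for r in records)
        if modified:
            issues["inactive_modified"].append(voter_id)    # Issue #3
        else:
            issues["inactive_unmodified"].append(voter_id)  # Issue #2
    return issues


dal = [
    {"VOTER_ID": "1", "APP_STATUS": "Approved", "BALLOT_STATUS": "Marked"},
    {"VOTER_ID": "2", "APP_STATUS": "Approved", "BALLOT_STATUS": "On Machine"},
    {"VOTER_ID": "3", "APP_STATUS": "Approved", "BALLOT_STATUS": "Pre-Processed"},
    {"VOTER_ID": "4", "APP_STATUS": "Approved", "BALLOT_STATUS": "Issued"},
    {"VOTER_ID": "5", "APP_STATUS": "Approved", "BALLOT_STATUS": "FWAB"},
]
rvl = {
    "2": [{"STATUS": "Inactive", "SOURCE": "BASELINE"}],
    "3": [
        {"STATUS": "Active", "SOURCE": "BASELINE"},
        {"STATUS": "Inactive", "SOURCE": "MUS 09/01/2023"},
    ],
    "5": [{"STATUS": "Active", "SOURCE": "BASELINE"}],
}
result = classify_cast_ballots(dal, rvl)
print(result)
```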
For Issue #1: If there does not exist a corresponding registration record for a cast ballot, then the voter should not have been able to have their mail-in ballot approved or issued, or been able to check in to an early voting precinct to vote on-machine. If the voter record actually does exist, then why is it not reflected in the data that we purchased from ELECT? Note that all provisional and Same Day Registration (SDR) ballots were required to be entered into the state’s database (“VERIS”) by the Friday after the election (Fri Nov-10-2023). We specifically waited until we received the Dec-01-2023 MUS data update from ELECT before attempting this or similar analysis in order to ensure that we would not be missing any last-minute registrations or RVL updates.
For Issue #2: There are 380 records of ballots being cast in the DAL where the baseline June-30-2023 RVL data file shows the registrant as inactive, and there have been no modifications or adjustments to the record presented in the MUS data files. Therefore these registrants should still have been listed as “Inactive” during the early voting period, which started in September and ran through Election Day (Nov 7).
For Issue #3: There are 18 records that show the cast ballot is from a registrant that is currently listed as “Inactive”, but there have been adjustments to the registration record over the last 6 months. An example of such is below. Note that I have captured the MUS data file generation dates in the 5th column to note when the file was generated and received by us.
In the example given below, the first inactivation operation on the registration record appears in the MUS file dated Sep-01, with the earliest transaction date listed as Aug-29-2023. The ballot application was not received until Sept 26 according to the DAL, so the application should never have been approved or the ballot issued, as the registrant status should have been “Inactive” according to the state’s own data.
(Also … yes … I know there is a typo in the spelling of “APP_RECIEPT_DATE” in the tables below … but this is the data as it comes from ELECT).
| APP_RECIEPT_DATE | APP_STATUS | BALLOT_RECEIPT_DATE | BALLOT_STATUS |
|---|---|---|---|
| “2023-09-26 00:00:00” | Approved | “2023-10-19 00:00:00” | Pre-Processed |
Example Extract of a DAL data record for a mail-in ballot cast during early voting.
| TRANSACTIONDATE | TRANSACTIONTIME | Trans_Type | NVRAReasonCode | File Source |
|---|---|---|---|---|
| 30-June-2023 | 12:12:00 | BASELINE | BASELINE | Baseline RVL |
| 28-Jul-2023 | 09:34:03 | MODIFY | Change Out | MUS 08/01/2023 |
| 28-Jul-2023 | 09:34:04 | MODIFY | Address Change | MUS 08/01/2023 |
| 28-Jul-2023 | 09:34:04 | MODIFY | Change In | MUS 09/01/2023 |
| 28-Jul-2023 | 09:34:03 | MODIFY | Change Out | MUS 09/01/2023 |
| 28-Jul-2023 | 09:34:04 | MODIFY | Address Change | MUS 09/01/2023 |
| 28-Jul-2023 | 09:34:04 | MODIFY | Change In | MUS 09/01/2023 |
| 29-Aug-2023 | 11:55:49 | MODIFY | Inactivate | MUS 09/01/2023 |
| 29-Aug-2023 | 11:55:49 | MODIFY | Inactivate | MUS 10/01/2023 |
Extract of RVL Cumulative Data Records for Voter ID in above DAL entry
Below you will find the current summary data and graphics from the VA 2023 General Election Daily Absentee List files. We pull the DAL file every day and track the count of each specific ballot category in each daily file.
Note: Page may take a moment to load the graphics objects.
Linear Scale Plot:
Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.
Logarithmic Scale Plot:
The logarithmic plot shows the same underlying data as the linear scale plot, but with a logarithmic y-scale that compresses the dynamic range so that the shape of all of the data curves can be seen in a single graphic. Place your cursor over the series name in the legend at right to see the series highlighted in the graphic. Place your cursor over a specific data point to see that data point’s value.
Summary Data Table:
The underlying data for the graphics above is provided in the summary data table.
Additional Data:
Additional CSV datasets stratified by Locality, City, Congressional District, State House District, State Senate District, and Precinct are available here.
Data column descriptions:
“ISSUED” := Number of DAL file records where BALLOT_STATUS= “ISSUED”
“NOT_ISSUED” := Number of DAL file records where BALLOT_STATUS= “NOT ISSUED”
“PROVISIONAL” := Number of DAL file records where BALLOT_STATUS= “PROVISIONAL” and APP_STATUS=”APPROVED”
“DELETED” := Number of DAL file records where BALLOT_STATUS= “DELETED”
“MARKED” := Number of DAL file records where BALLOT_STATUS= “MARKED” and APP_STATUS=”APPROVED”
“ON_MACHINE” := Number of DAL file records where BALLOT_STATUS= “ON_MACHINE” and APP_STATUS=”APPROVED”
“PRE_PROCESSED” := Number of DAL file records where BALLOT_STATUS= “PRE-PROCESSED” and APP_STATUS=”APPROVED”
“FWAB” := Number of DAL file records where BALLOT_STATUS= “FWAB” and APP_STATUS=”APPROVED”
“MAIL_IN” := The sum of “MARKED” + “PRE_PROCESSED”
“COUNTABLE” := The sum of “PROVISIONAL” + “MARKED” + “PRE_PROCESSED” + “ON_MACHINE” + “FWAB”
“MILITARY” := Number of DAL file records where VOTER_TYPE= “MILITARY”
“OVERSEAS” := Number of DAL file records where VOTER_TYPE= “OVERSEAS”
“TEMPORARY” := Number of DAL file records where VOTER_TYPE= “TEMPORARY”
“MILITARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE= “MILITARY” and where COUNTABLE is True
“OVERSEAS_COUNTABLE” := Number of DAL file records where VOTER_TYPE= “OVERSEAS” and where COUNTABLE is True
“TEMPORARY_COUNTABLE” := Number of DAL file records where VOTER_TYPE= “TEMPORARY” and where COUNTABLE is True
Editorial Note:
When we first started receiving DAL data files on Sept-14-2023 after our purchase from ELECT, we noticed that there were a dozen or so records that were marked as being in a countable state, including records that should correspond to In-Person-Early votes. This is problematic as in person early voting facilities were not opened until Sept-22-2023. We respectfully raised this issue with the dept of elections and they acknowledged the error and directed registrars to correct the issues before the official start of early voting. (This can be observed in the logarithmic plot with the Overseas_Countable, On-Machine, Provisional, and Pre-Processed counts being reset to 0 on 9-21-2023.) We’d like to take a moment to thank the folks at ELECT and the registrars for listening to our concerns and correcting these errors before the start of early voting. Credit where credit is due.
After reading through the press release we decided to independently try to verify the claims in the release. Note that an analysis like this has been on our list of things-to-do, but there are only so many hours in the day! The fact this press release was issued gave us a well deserved prod to complete this analysis.
EPEC has purchased the entire statewide registered voter list data from the VA Department of Elections (ELECT) and has current records as of 2023-08-01. Eligible parties can purchase data from ELECT via their website here.
The necessary data from the US Census Bureau can be downloaded here and includes the estimates of the eligible voting-age citizens in each county. From the documentation on the census site, the “cvap_est” field in the census data represents “The rounded estimate of the total number of United States citizens 18 years of age or older for that geographic area and group.”
It is therefore a straightforward process to accumulate the number of registrant records in each county, as well as accumulating the number of eligible voting age citizens and compute the registration percent “REG_PCT” as (# Registered / # Eligible * 100). The below table has the results of this direct computation for each county.
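As a small sketch of the REG_PCT computation described above (the dictionary inputs and county names are made up for illustration; the real inputs are the RVL record counts and the census “cvap_est” values per county):

```python
def registration_percent(registered_by_county, cvap_by_county):
    """REG_PCT = (# Registered / # Eligible) * 100 for each county."""
    result = {}
    for county, eligible in cvap_by_county.items():
        if eligible <= 0:
            continue  # skip counties with no usable eligibility estimate
        registered = registered_by_county.get(county, 0)
        result[county] = 100.0 * registered / eligible
    return result


# Toy numbers only, not actual VA data.
reg_pct = registration_percent(
    {"County A": 1100, "County B": 450},
    {"County A": 1000, "County B": 500},
)
over_100 = sorted(c for c, pct in reg_pct.items() if pct > 100.0)
print(reg_pct, over_100)
```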
The results are only slightly different than the results presented by Honest Elections Project, but still show significant issues with 38 counties being over 100%.
Adjusting for population growth since 2020 census
As the census redistricting data is circa 2020, and the eligible voter data was estimated for 2021, we can attempt to account for population shifts since the 2020 census data was collected and the voter eligibility data was computed. The US Census bureau also makes available the estimates of population growth by county year-over-year since the date of the last census here, which we can use to find the recent rates of growth or decline for each county. We can then use these rates to adjust the number of eligible voter estimates to scale with the most recent rates of population change. This is admittedly an approximation and assumes a linear relationship, but it is arguably better than taking the 2020 census and 2021 eligible voter estimates and applying them directly to the latest (2023) RVL.
The REG_PCT_ADJ column in the table below represents this adjusted estimate.
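One way to express a linear adjustment of this kind is to scale the eligible-voter estimate by the ratio of the latest county population estimate to the base-year population. That specific scaling is my assumption for this sketch; the exact form of the adjustment used to produce the table is not spelled out above, and the parameter names are hypothetical.

```python
def adjusted_registration_percent(registered, cvap_est, pop_base, pop_latest):
    """REG_PCT_ADJ under an assumed linear population scaling.

    eligible_adj = cvap_est * (pop_latest / pop_base)
    """
    if cvap_est <= 0 or pop_base <= 0:
        raise ValueError("cvap_est and pop_base must be positive")
    eligible_adj = cvap_est * (pop_latest / pop_base)
    return 100.0 * registered / eligible_adj


# Toy numbers: a county that grew 5% since the base year.
unadjusted = 100.0 * 1050 / 1000
adjusted = adjusted_registration_percent(1050, 1000, 2000, 2100)
print(unadjusted, adjusted)
```

In this toy case a county that looks like it is at 105% before adjustment drops to about 100% once the 5% population growth is applied to the eligibility estimate.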
Active vs inactive registrations
An additional consideration that can be made with this data, is to attempt to consider only “Active” voter registrations vs registrations with any status assigned. Note that “Inactive” voter registrations can be immediately returned to “Active” status by simply having any type of interaction with the department of elections (or through DMV, etc), and the registrant will then be allowed to vote. Because of this easy ability to change “Inactive” records to “Active”, it is most appropriate (IMO) to include them in this analysis. However, for completeness, and in order to bound the scope of the issue, the corresponding REG_PCT_ACTIVE and REG_PCT_ADJ_ACTIVE columns have also been computed which only consider “Active” voters.
Results
Even the most forgiving analysis we could compute with the official data from the US Census and VA ELECT, which only considers active voters and attempts to adjust for population change since the census, still results in multiple (6) counties in VA having more registered voters than eligible voters (over 100%), and many counties that are over 90%.
The most appropriate metric to consider, in my opinion, is the Adjusted and either Active or Inactive status results, as inactive status registrations can still be converted to active status and voted. There were 36 localities with over 100% in this category and 59 between 90% and 100%. There are 133 voting localities in total in VA.
The summary tabulated data and graphics for each of the methods of analyzing the data is presented below.
Below is an analysis of the VA statewide voter completion rate for absentee ballots compiled from the 2022 General Election Daily Absentee List (DAL) file downloaded from the VA Dept of Elections (“ELECT”) on 2022-11-15 17:46:21.
The DAL file records the transactions of all absentee ballots during the early voting period in VA elections. It includes records for both mail-in and in-person early voting transactions. It does not record the actual values of the voted ballots, only the “fact-of” a registered voter checking in to an early voting site, or mailing their ballot application or completed ballot to the registrar, etc. The DAL record is published daily over the course of the early voting period and the file is cumulative.
For the purposes of this analysis a “Completed” ballot is a ballot that has been recorded in the DAL file as reaching a state in which the ballot can be considered to be tabulate-able. A “Completed” ballot must have its “APP_STATUS” field set to “Approved” AND have the “BALLOT_STATUS” field set to (“FWAB” OR “Marked” OR “On Machine” OR “Pre-Processed”).
The “VOTER_TYPE” field was used to separate records into “Military”, “Overseas” or “Temporary(Federal-Only Ballot)” and the ballot completion rate was computed for each sub-category, as well as overall.
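A rough sketch of how such per-voter statistics could be computed is below. The `VOTER_ID` field name is a placeholder, and the rule that a voter counts as “completed” if any one of their DAL records reaches a completed state is my assumption about how to roll transactions up to the voter level.

```python
from collections import defaultdict

COMPLETED_STATUSES = {"fwab", "marked", "on machine", "pre-processed"}


def completion_stats(dal_rows):
    """Per-voter transaction and completion statistics for a set of DAL rows."""
    per_voter = defaultdict(lambda: {"transactions": 0, "completed": False})
    for row in dal_rows:
        entry = per_voter[row["VOTER_ID"]]
        entry["transactions"] += 1
        if (
            row["APP_STATUS"].strip().lower() == "approved"
            and row["BALLOT_STATUS"].strip().lower() in COMPLETED_STATUSES
        ):
            entry["completed"] = True
    n = len(per_voter)
    if n == 0:
        return {"unique_voters": 0, "avg_transactions": 0.0, "completion_rate": 0.0}
    total_tx = sum(e["transactions"] for e in per_voter.values())
    completed = sum(1 for e in per_voter.values() if e["completed"])
    return {
        "unique_voters": n,
        "avg_transactions": total_tx / n,
        "completion_rate": 100.0 * completed / n,
    }


rows = [
    {"VOTER_ID": "1", "APP_STATUS": "Approved", "BALLOT_STATUS": "Issued"},
    {"VOTER_ID": "1", "APP_STATUS": "Approved", "BALLOT_STATUS": "Marked"},
    {"VOTER_ID": "2", "APP_STATUS": "Approved", "BALLOT_STATUS": "Issued"},
]
stats = completion_stats(rows)
print(stats)
```

Filtering the rows by VOTER_TYPE before calling the function gives the per-category figures.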
| Voter Category | Avg Transactions Per Voter | Avg Completion Rate Per Voter | Num Of Unique Voters |
|---|---|---|---|
| All Absentee Voters | 1.03 | 91.95% | 1,057,268 |
| Military Voters (VOTER_TYPE==”Military”) | 1.08 | 78.60% | 9,346 |
| Overseas Voters (VOTER_TYPE==”Overseas”) | 1.17 | 63.63% | 7,052 |
| Temporary Federal Voters (VOTER_TYPE==”Temporary(Federal-Only Ballot)”) | 1.21 | 61.14% | 1,539 |
Discussion:
The data above shows a distinct statistical discrepancy in the ability of Military, Overseas, or Temporary Federal voters to complete their absentee ballots in comparison to standard absentee voters. These categories of voters are specifically reliant on the Mail-In absentee ballot process, and are demonstrably not having the same ability to have their votes cast and counted as is provided to standard absentee voters.
This discrepancy might be due to any number of potential reasons or mechanisms, which cannot be determined from the DAL data as provided by ELECT. The discrepancy demonstrably exists, though, and it should be investigated by legislators and officials in order to remedy the comparative disenfranchisement of specific classes of VA voters.
I will note for completeness that the first discovery and observation of this discrepancy was due to the diligent work of a fellow EPEC board member. I independently validated his results and created the scripts to process the data on a statewide basis to produce these tables. As always I am happy to provide the raw data, scripts and results to interested parties that are capable of receiving and handling VA election data according to VA law and the policies of ELECT. Interested parties can contact us to request more information.
BLUF: The number of detected exact (Full Name + DOB) “clone” registration records in the VA Registered Voter List (RVL) file has decreased overall from 2022-11-23 to 2023-07-01, however there are additional new clones still being added to the database.
There has been a concentrated effort by various election integrity groups and public officials around the state of VA to clean up the voter rolls. Specifically the VA Department of Elections (“ELECT”) made a concerted effort in early 2023 to remove and clean up a large number (~19,000) of deceased voters and other errant records. The data below shows that these efforts have made an impact on the number of exact “clones” identified in the database, but that there is still more work to do.
BACKGROUND: This is a continuation of exploratory analysis on the existence of “cloned” records in the VA Registered Voter List. Please see previous posts here, here, here and here for background information.
As a reminder and for the purposes of this analysis, a potential “cloned” record is defined as a record in the VA Registered Voter List (RVL) where the Full Name (First + Middle + Last + Suffix) and full Date of Birth (mm/dd/yyyy) exactly match another record, but the two records have different Voter Identification Numbers. In my previous analyses I focused on the Active registration records that had been identified as clones. However, even if a cloned record is marked as Inactive in the database, it still holds the potential to be voted, as any interaction with the voter immediately moves the registration from Inactive to Active. Therefore the analysis below includes both Active and Inactive records.
It is important to note and emphasize that this analysis focuses specifically on exact clones, and not on any of the other potential errors that could be present in a voter database. There are a few reasons for this narrow focus:
1. The detection of exact clones in the database requires no additional data correlations and can operate directly on the data provided from ELECT. It is easily defined and scriptable, and can be replicated by other researchers and public officials for verification.
2. Due to item (1), the identification of exact clones is a good candidate to track over time as a proxy indicator for issues with the database.
3. There are some rather interesting non-random distribution patterns of the already observed cloned records that I have previously discussed here, and I am interested in observing and understanding the cause of these distribution shapes.
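To illustrate how simple the exact-clone definition is to script, here is a minimal Python sketch. It is not the script used to produce the results in this post. The field names (FIRST_NAME, MIDDLE_NAME, LAST_NAME, SUFFIX, DOB, VOTER_ID) are placeholders assumed for illustration and may not match the column names in the ELECT files.

```python
from collections import defaultdict


def find_clones(records):
    """Group voter IDs that share an identical full name and date of birth.

    `records` is an iterable of dicts. The keys used below are hypothetical
    placeholders, not necessarily the column names used by ELECT.
    Returns a dict mapping (first, middle, last, suffix, dob) to the set of
    distinct voter IDs sharing that key, keeping only keys with 2+ IDs.
    """
    groups = defaultdict(set)
    for rec in records:
        key = (
            rec["FIRST_NAME"],
            rec["MIDDLE_NAME"],
            rec["LAST_NAME"],
            rec["SUFFIX"],
            rec["DOB"],
        )
        groups[key].add(rec["VOTER_ID"])
    # A clone group needs at least two different voter IDs for the same key.
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Because the match is exact, the only inputs are the name, date of birth and ID fields of each record; no external data source is needed.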
DETAILS: As we now have collected multiple statewide voter registration lists, I was curious as to how the numbers of detected cloned registrant records have changed over time, specifically with respect to the REGISTRATION_DATE field that is reported in each record.
In Figure 1 below I’ve plotted the number of identified cloned records in the 2022-11-23 RVL stratified by registration date year. Figure 2 is the same plot from the 2023-06-30 RVL file we recently purchased. Figure 3 shows the number of additions, removals and net change in the number of identified cloned records in each date bin between these two datasets, based on the unique set of cloned Voter ID Numbers that fall within each bin.
The total number of identified clones in the 2022-11-23 dataset was 2,445. The total number of identified clones in the 2023-06-30 dataset was 1,485. We can see in Figure 3 that while there has been an overall reduction of 960 in the number of cloned records (which is good!), there are still new clones being added to the voter registration database, even as previously identified clones have been removed. This suggests that there are one or more ongoing processes or mechanisms that continue to add cloned records to the voter list database.
It is not readily apparent what causes the added cloned records. It could be any number of technology issues, such as poor input verification or coding practices, or it could be related to human error and poor procedures and/or training, or a mixture of issues.
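The additions, removals and net change per registration-year bin can be computed with simple set logic on the cloned Voter ID numbers. The sketch below is one way to do it, assuming each dataset's clones have already been reduced to a mapping of voter ID to registration year; the function and argument names are hypothetical.

```python
from collections import Counter


def clone_changes_by_year(old_clones, new_clones):
    """Compare two clone sets, binned by registration year.

    `old_clones` and `new_clones` map voter ID -> registration year
    (an assumed, simplified representation of each dataset's clones).
    """
    # IDs flagged in the new dataset but not the old one are additions.
    added = Counter(y for vid, y in new_clones.items() if vid not in old_clones)
    # IDs flagged in the old dataset but not the new one are removals.
    removed = Counter(y for vid, y in old_clones.items() if vid not in new_clones)
    years = sorted(set(added) | set(removed))
    return {
        y: {"added": added[y], "removed": removed[y], "net": added[y] - removed[y]}
        for y in years
    }
```

Summed over all bins, the net change equals the difference in total clone counts between the two datasets, which for the totals above is 1,485 - 2,445 = -960.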
The two full datasets above (2022-11-23 and 2023-06-30) were purchased directly from the Department of Elections. The lists were parsed, standardized and normalized for string case, known typos, whitespace and punctuation issues, but otherwise the raw data entries were unadjusted.
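As a rough illustration of the kind of normalization described above, here is a small Python sketch. The exact rules are assumptions made for this example (upper-casing, stripping all punctuation, collapsing whitespace) and are not a statement of the precise rules applied to the ELECT data. The correction of known typos is not shown.

```python
import re


def normalize_field(value):
    """Normalize a text field for exact string comparison.

    Assumed rules for illustration only: upper-case, drop punctuation,
    collapse runs of whitespace, and trim leading/trailing whitespace.
    """
    text = value.upper()
    text = re.sub(r"[^\w\s]", "", text)   # remove punctuation characters
    text = re.sub(r"\s+", " ", text)      # collapse internal whitespace
    return text.strip()
```

For example, under these assumed rules the strings "  o'Brien,  jr. " and "OBRIEN JR" normalize to the same value and would therefore compare as equal.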
We also purchased the Monthly Update Subscription (MUS) from ELECT at the time that we ordered the 2023-06-30 RVL. The MUS is generated on a monthly basis and captures the changes to the voter list over the prior month. We received the 2023-07-01 MUS, and applied the changes within it to the full 2023-06-30 RVL that we had received the day before. As the MUS contains all the changes over the previous month, and we had purchased our full dataset the prior day, we did not expect there to be many adjustments required, but there were a few. Applying the MUS to the 2023-06-30 dataset resulted in the generation of an updated 2023-07-01 dataset.
For completeness, the same plots that were generated above for the directly purchased data are repeated below for the updated RVL dataset. Figure 4 plots the number of identified cloned records in the 2023-07-01 RVL stratified by registration date year, and Figure 5 shows the differences with the 2022-11-23 dataset.
One thing that I find interesting is the difference in detected cloned entries between the purchased 2023-06-30 dataset and the 2023-07-01 dataset after the MUS entries have been applied. This is presented in Figure 6 below. We see that the application of the MUS did not remove any clones, but 90 were added. I’m not sure what this means yet, as we only have a single MUS file and it was generated so close to our full download. We will monitor and see how this progresses as we continue to receive the MUS files throughout the 2023 election cycle.
One thing that I have been asked about repeatedly is whether there are any patterns in the assignment of voter ID numbers in the VA data. Specifically, I’ve been asked whether I’ve found any pattern similar to what AuditNY has found in the NY data. It’s not something that I have looked at in depth previously, due mostly to lack of time, and because VA is set up very differently than NY, so a direct comparison or attempt to replicate the AuditNY findings in VA isn’t as straightforward as one would hope.
The NY data uses a different Voter ID number for counties vs at the state level, which is the “Rosetta Stone” that was needed for the NY team to understand the algorithms that were used to assign voter ID numbers, and in turn discover some very (ahem) “interesting” patterns in the data. VA doesn’t have such a system and only uses a single voter ID number throughout the state and local jurisdictions.
Well … while my other machine is busy crunching on the string distance computations, I figured I’d take a crack at looking at the distribution of the Voter ID numbers in the VA Registered Voter List (RVL) and just see what I find.
To start with, here is a simple scatter plot of the Voter ID numbers vs the Registration date for each record in the 2023-07-01 RVL. From the zoomed out plot it is readily apparent that there must have been a change in the algorithm that was used to assign voter identification numbers sometime around 2007, which coincides nicely with the introduction of the current Virginia Election and Registration Information System (VERIS).
From a high level, it appears that the previous assignment algorithm broke the universe of possible ID numbers up into discrete ranges and assigned IDs within those ranges, but favoring the bottom of each range. This would be a logical explanation for the banded structure we see pre-2007. The new assignment algorithm post-2007 looks to be using a much more randomized approach. Nothing strange about that. As computing systems have gotten better and security has become more of a concern over the years there have been many systems that migrated to more randomized assignments of identification numbers.
Looking at a zoomed in block of the post-2007 “randomized” ID assignments we can see some of the normal variability that we would expect to see in the election cycles. We see that we have a high density of new assignments around November of 2016 and 2020, with a low density section of assignments correlated to the COVID-19 lockdowns. There are short periods where it looks like there were lulls in the assignment of voter IDs; these are perhaps due to holidays or maintenance periods, or related to the legal requirements to “freeze” the voter rolls 30 days before any election (primaries, runoffs, etc). Note that VA now has same day voter registration as of the laws passed by the previous Democratic super-majority that went into effect in 2022, so going forward we would likely see these “blackout periods” be significantly reduced.
We can see more clearly the banded assignment structure of the pre-2007 entries by zooming in on a smaller section of the plot, as shown below. It’s harder to make out in this banded structure, but we still see similar patterns of density changes presumably due to the natural election cycles, holidays, maintenance periods, legally required registration lockout periods, etc. We can also see the “bucketing” of ID numbers into distinct bands, with the bias of numbers filling the lower section of each band.
All of that looks unremarkable and seems to make sense to me … however … if we zoom into the Voter ID range of around [900,000,000 to 920,000,000] we do see something that catches my curiosity. We see the existence of the same banded structure as above between 900,000,000 and 915,000,000 AND pre-2007, but there is another band of assignments superimposed on the entire date range of the RVL. This band does not seem to be affected by the (presumed) introduction of the VERIS system, which is very interesting. There is also what looks to be a vertical high-density band between 2007 and 2010 that extends along the entire vertical axis, but we only see it once we zoom in to the VERIS transition period.
The horizontal band that extends across all date ranges only exists in the [~915,000,000 to ~920,000,000] ID range. It trails off in density pre ~1993, but it exists throughout the full registration date range. I will note that the “Motor Voter” National Voter Registration Act (NVRA) was implemented in 1993, so perhaps these are a reserved universe block for DMV (or other externally provided) registrations? (That’s a guess, but an educated one.)
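One way to quantify this band rather than relying on the plot alone is to count, per registration year, how many records have IDs inside it. The sketch below is a minimal Python illustration; the band limits are the approximate values read off the plot, and the input format (pairs of voter ID and registration year) is an assumption made for this example.

```python
from collections import Counter

# Approximate band limits read off the scatter plot (not exact values).
BAND_LOW = 915_000_000
BAND_HIGH = 920_000_000


def band_counts_by_year(records, low=BAND_LOW, high=BAND_HIGH):
    """Count records whose voter ID falls in [low, high), per registration year.

    `records` is an iterable of (voter_id, registration_year) pairs,
    an assumed simplified view of the RVL.
    """
    counts = Counter()
    for voter_id, year in records:
        if low <= voter_id < high:
            counts[year] += 1
    return dict(sorted(counts.items()))
```

If the band really does trail off before roughly 1993, that should show up as a drop in these per-year counts for the earliest registration years.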
A plausible explanation I can imagine for the distinct high density band between 2007-2010 is that this might be related to how the VERIS system was implemented and brought into service, and that there was some sort of update around 2010 that corrected its internal algorithms. (But that is just a guess.) That still wouldn’t entirely explain the huge change in the density of new registrants added to the rolls.
Another, or additional, explanation might be that when VERIS came online there were a number of registrants that had their Voter ID number regenerated and/or their registration date field updated as part of the rollout of the new VERIS software. Meaning that while VERIS was coming online and handling the normal amount of new real registrations, it was also moving/updating a large number of historic registrations, which would account for the higher density as VERIS became the system of record. That seems to be a poor systems administration and design choice, in my opinion, as it gives those moved registrant records a false registration date and therefore makes them inaccurate. However, if that was the case, and VERIS was resetting registration dates as it ingested voter records into its databases, why do we see any records with pre-2007 registration dates at all? (This is, again, merely an educated guess on my part, so take it with a grain of salt.)
Incorporating the identification of cloned registrations
To incorporate some of my early duplicate record identification results on the most recent RVL data, I made a scatter plot of only those records that had an identified exact match of (Full Name + DOB) to other records in the dataset, but with unique Voter ID numbers. (Technically these are “cloned” records, as “duplicates” would have the same voter ID numbers. This was pointed out to me a few days ago.) The scatter plot of those records is shown below, and we can see that there is a distinct ~horizontal cluster of records that aligns with the 915M – 920M ID band and pre-2007. In the post-2007 block we see the cloned records do not seem to be totally randomly distributed, but have a bias towards the lower right of the graph.
Superimposing the two plots produces the following, with the red indicating the records with identified Full Name + DOB string matches.
Zooming in to take a closer look at the 915M-920M band again gives the following:
It is curious that there seems to be an alignment of the exact Full Name + DOB matching records with the 915M-920M, pre-2007 ID band. Post-2007 the exact cloned matches have a less structured distribution throughout the data, but they do seem to cluster around the lower right.
If the cloned records were simply due to random data entry errors, etc., I would expect to see sporadic red datapoints distributed “salt-n-pepper” style throughout the entirety of the area covered by the blue data. There might be some argument to be made for there being a bias of more of the red data points to the right side of the plot, as officials have not yet had time to “catch” or “clean-up” erroneous entries, but there is little reason to have linear features, or to have a bias for lower ID numbers in the vertical axis.
I am continuing to investigate this data, but as of right now all I can tell you is that … yes, there do seem to be interesting patterns in the way Voter IDs are assigned in VA, especially with records that have already been found and flagged to be problematic (clones).
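The “salt-n-pepper” expectation can be expressed as a simple ratio: if clones were scattered at random among all records, the share of clones falling in any ID range should be about the same as the share of all records falling in that range. The Python sketch below computes that ratio. It is an illustrative check I am assuming for this example, not a formal statistical test, and the function name and inputs are hypothetical.

```python
def band_enrichment(all_ids, clone_ids, low, high):
    """Ratio of (share of clones in [low, high)) to (share of all records there).

    A value near 1.0 is what random scatter would suggest; a value well
    above 1.0 means clones are over-represented in that ID range.
    """
    def share(ids):
        ids = list(ids)
        if not ids:
            raise ValueError("empty ID list")
        return sum(low <= i < high for i in ids) / len(ids)

    overall = share(all_ids)
    if overall == 0:
        raise ValueError("no records fall in the requested ID range")
    return share(clone_ids) / overall
```

The same comparison could be repeated for a registration date range instead of an ID range to look at the apparent bias toward the right side of the plot.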