Categories
Election Data Analysis Election Forensics Election Integrity Interesting technical

Distribution of Invalid Voter Addresses and Absentee Ballots in VA 2022 General Election

Edited on 2022-12-15 for typo corrections, addition of Congressional District breakdown, and added commentary section.

BLUF (Bottom Line Up Front):

There were 15,419 ballots cast during early voting in the VA 2022 General Election where the voter’s registered address on record was flagged as “Invalid” by a National Change of Address (NCOA) database check. If we include addresses that were identified as 90-day Vacant, the total rises to 17,244. Plotting the distribution of these based on the ZIP+4 identified by the NCOA check shows a disproportionately high number of issues on the Eastern Shore of VA.

A certified commercial provider of NCOA data verification was used to facilitate this analysis on raw data obtained from the VA Dept of Elections (“ELECT”). It is not technically possible to obtain a truly time-synchronized complete set of data for any election due to the way elections are run in VA, but we made every effort to obtain the data from the state as close in time as was possible. The NCOA database is maintained and curated by the United States Postal Service (USPS).

For those wishing to review specific entries, or to help validate these issues, and who are part of an organization that is able to receive and handle election information according to VA law and the VA Dept of Elections requirements, you may contact us to request the raw data breakdowns. We will need to validate your organization or employment and will make data available as legally allowed.

Commentary and Discussion (added 2022-12-15):

In response to recent interest on this matter, I would like to be very clear: We are simply presenting the data as compiled to facilitate public discourse. We have strived to only utilize data directly obtained from authoritative sources (ELECT, the USPS via TrueNCOA provider).

The designation of “Invalid” addresses follows the definition used by the USPS and TrueNCOA, i.e. the TrueNCOA check has reported that the addresses as listed in the RVL have no match in the USPS database. Invalid addresses do not include things like valid P.O. Boxes or valid rural addresses.

The VA Constitution (Section II-1 and II-2) specifies the requirements for voter eligibility, including that voters are required to supply a primary address for their registration record, regardless of their method of voting. VA is required by law to consistently maintain and validate these records. Based on the below analysis, the data shows that there is a small but statistically significant number of “Invalid” addresses associated with voters who cast ballots in the Nov 2022 election.

Continuing EPEC’s mission to promote voter participation, analyze election technology, and educate the public about best practices in managing election technology systems, we are providing the below analysis in order to educate and inform the public, legislators and elections officials about the existence of these discrepancies.

Details:

After receiving the results of a National Change of Address (NCOA) database check on the registration (not the temporary) addresses in the latest VA Registered Voter List (RVL), I’ve gone through and collated the flagged addresses and reconciled them with the entries in the Daily Absentee List (DAL) file records provided by the VA Dept. of Elections (“ELECT”).

The DAL file (dated 2022-12-08) provides a record of all of the voters that cast absentee (either Early In-Person or Mail-In) ballots in the election, and the RVL (dated 2022-11-23) gives all of the registered voter addresses and other pertinent information. Both datasets come directly from the VA Dept. of Elections and must be purchased. Total cost was ~$7000. The two datasets can be tied together using the voter Identification Number that is assigned to each (supposedly) unique voter by the state. Entries in the RVL should be unique to each registered voter (although there are a small number of duplicate voter IDs that I have seen … but that’s for another post), whereas the DAL file can have multiple entries attributed to a single voter recording the various stages of ballot processing.
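The join described above can be sketched in a few lines. This is a minimal illustration, not the code used for the analysis: the real RVL and DAL column layouts are not given in this post, so the field names `voter_id`, `address` and `stage`, and all of the sample rows, are assumptions.

```python
from collections import defaultdict

# Hypothetical, simplified rows. The field names and values are assumptions
# made for illustration; they are not the real RVL or DAL layouts.
rvl_rows = [
    {"voter_id": "100", "address": "1 Main St"},
    {"voter_id": "101", "address": "2 Oak Ave"},
    {"voter_id": "101", "address": "2 Oak Avenue"},  # duplicate voter ID
    {"voter_id": "102", "address": "3 Elm Rd"},
]
dal_rows = [
    {"voter_id": "100", "stage": "Issued"},
    {"voter_id": "100", "stage": "Marked"},      # same voter, later stage
    {"voter_id": "102", "stage": "On Machine"},
    {"voter_id": "999", "stage": "Marked"},      # no matching RVL entry
]


def index_rvl(rows):
    """Map voter ID -> first RVL row, and collect IDs seen more than once."""
    by_id, duplicates = {}, set()
    for row in rows:
        vid = row["voter_id"]
        if vid in by_id:
            duplicates.add(vid)
        else:
            by_id[vid] = row
    return by_id, duplicates


def group_dal(rows):
    """Map voter ID -> list of DAL stages (one voter may have many rows)."""
    stages = defaultdict(list)
    for row in rows:
        stages[row["voter_id"]].append(row["stage"])
    return dict(stages)


rvl_by_id, duplicate_ids = index_rvl(rvl_rows)
dal_by_id = group_dal(dal_rows)

# Inner join on voter ID: one joined record per voter present in both files.
joined = {
    vid: {"address": rvl_by_id[vid]["address"], "stages": dal_by_id[vid]}
    for vid in dal_by_id
    if vid in rvl_by_id
}
unmatched = sorted(set(dal_by_id) - set(rvl_by_id))

print("joined voters:", sorted(joined))
print("duplicate RVL IDs:", sorted(duplicate_ids))
print("DAL IDs with no RVL entry:", unmatched)
```

Grouping the DAL rows first means each voter is counted once, no matter how many ballot-processing stages were recorded for them.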

The NCOA check was performed on all addresses in the RVL file in order to detect recent moves, invalid addresses, vacant addresses, P.O. Boxes, commercial addresses, etc. The NCOA check takes multiple days to run using a commercial service provider and was executed between 2022-12-01 and 2022-12-06. The processing needed to be performed in two batches.

Results:

Raw TrueNCOA Processing result stats on the full RVL dataset:
| NCOA Processing of VA RVL 2022-11-23 Records | Batch 1 | Batch 2 | Total | Percent |
|---|---|---|---|---|
| Records Processed | 5,831,089 | 296,767 | 6,127,856 | |
| 18-Month NCOA Moves | 264,210 | 12,618 | 276,828 | 4.52% |
| 48-Month NCOA Moves | 155,274 | 865 | 156,139 | 2.55% |
| Moves with no Forwarding Address | 23,651 | 447 | 24,098 | 0.39% |
| Total NCOA Moves | 443,135 | 13,930 | 457,065 | 7.46% |
| Vacant Flag | 26,865 | 1,742 | 28,607 | 0.47% |
| DPV Updated/Address Corrected Records | 568,039 | 20,748 | 588,787 | 9.61% |
| DPV Deliverable Records | 5,555,024 | 280,207 | 5,835,231 | 95.22% |
| DPV Non-Deliverable Records | 173,322 | 12,427 | 185,749 | 3.03% |
| LACS Updated (Rural Address converted to Street Address) | 32,116 | 1,327 | 33,443 | 0.55% |
| Residential Delivery Indicator | 5,681,183 | 289,345 | 5,970,528 | 97.43% |
| Addresses matched to the USPS Database | 5,728,347 | 292,634 | 6,020,981 | 98.26% |
| Invalid Addresses | 102,617 | 4,161 | 106,778 | 1.74% |
| Expired Addresses | 6,875 | 576 | 7,451 | 0.12% |
| Business Move (B) | 339 | 6 | 345 | 0.01% |
| Family Move (F) | 110,444 | 3,549 | 113,993 | 1.86% |
| Individual Move (I) | 332,352 | 10,375 | 342,727 | 5.59% |
| General Delivery Address | 153 | 0 | 153 | 0.00% |
| High Rise Address | 703,903 | 62,059 | 765,962 | 12.50% |
| PO Box Address | 26,973 | 770 | 27,743 | 0.45% |
| Rural Route Address | 79 | 1 | 80 | 0.00% |
| Single Family Address | 5,012,679 | 230,200 | 5,242,879 | 85.56% |
| Unknown | 49,299 | 2,457 | 51,756 | 0.84% |
Reporting as presented from the TrueNCOA data service. The TrueNCOA data dictionary is presented here.
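As a quick sanity check on the reported statistics, the short sketch below recomputes a few of the totals and percentages. It assumes the "Percent" column is the row total divided by "Records Processed"; the TrueNCOA report does not state this explicitly here, although the figures are consistent with it.

```python
# Batch counts copied from the TrueNCOA results reported above.
# Assumption: "Percent" = row total / total records processed.
records_processed = 5_831_089 + 296_767  # Batch 1 + Batch 2

rows = {
    "Total NCOA Moves": (443_135, 13_930),
    "Vacant Flag": (26_865, 1_742),
    "Invalid Addresses": (102_617, 4_161),
}


def percent_of_processed(batch1, batch2):
    """Return (total, percent of all processed records, rounded to 2 places)."""
    total = batch1 + batch2
    return total, round(100 * total / records_processed, 2)


results = {name: percent_of_processed(*batches) for name, batches in rows.items()}

for name, (total, percent) in results.items():
    print(f"{name}: {total:,} ({percent}%)")
```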
Combining NCOA results of RVL Addresses with the DAL data:
Vacant Addresses:

There were 1,829 records across the state with registered addresses that have been flagged as (90-day) “Vacant” by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election according to the DAL file. Of those records, 1,317 were Early In-Person and 491 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Note: This graphic was updated on Thu Dec 15 to correct a typo in the title as to the date of the DAL file that was used.
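The marker sizing used for these maps can be illustrated with a small sketch. The ZIP+4 values below are placeholders, not real data, and the linear scaling to a maximum marker area of 200 is an arbitrary choice for illustration rather than the setting used to produce the graphics.

```python
from collections import Counter

# Placeholder ZIP+4 values for flagged records (hypothetical, not real data).
flagged_zip4 = [
    "11111-0001", "11111-0001", "11111-0001",
    "22222-0002",
    "33333-0003", "33333-0003",
]

# Count how many flagged records share each ZIP+4.
counts = Counter(flagged_zip4)
largest = max(counts.values())

# Marker area proportional to the count at each ZIP+4.
MAX_MARKER_AREA = 200.0  # arbitrary, for illustration only
marker_area = {zip4: MAX_MARKER_AREA * n / largest for zip4, n in counts.items()}

for zip4, n in counts.most_common():
    print(zip4, n, round(marker_area[zip4], 1))
```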
P.O. Boxes (Non-protected):

There were 1,648 records across the state with registered addresses that have been flagged as P.O. Box Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election AND were NOT listed as protected entries according to the DAL file. (VA allows for voters who have a legal protective order to list a P.O. Box as their address of record on public documents) Of those records, 1,348 were Early In-Person and 294 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Note: This graphic was updated on Thu Dec 15 to correct a typo in the title as to the date of the DAL file that was used.
Invalid Addresses:

There were 15,419 records across the state with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election. Of those records, 12,766 were Early In-Person and 2,566 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Note: This graphic was updated on Thu Dec 15 to correct a typo in the title as to the date of the DAL file that was used.
Invalid OR Vacant Addresses:

There were 17,244 records across the state with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election. Of those records, 14,083 were Early In-Person and 3,053 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Note: This graphic was updated on Thu Dec 15 to correct a typo in the title as to the date of the DAL file that was used.
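The three statewide counts reported above (Vacant, Invalid, and Invalid OR Vacant) can be cross-checked with inclusion-exclusion. The sketch assumes all three counts were taken over the same set of DAL records; under that assumption, the small difference between the sum and the combined figure is the number of records flagged as both.

```python
# Statewide counts copied from the sections above:
# (all records, Early In-Person, Mail-In).
vacant = (1_829, 1_317, 491)
invalid = (15_419, 12_766, 2_566)
invalid_or_vacant = (17_244, 14_083, 3_053)

# Inclusion-exclusion, column by column:
# |Invalid and Vacant| = |Invalid| + |Vacant| - |Invalid or Vacant|
overlap = tuple(i + v - u for i, v, u in zip(invalid, vacant, invalid_or_vacant))


def remainder(row):
    """Records that are neither Early In-Person nor Mail-In
    (presumably the FWAB and Provisional ballots)."""
    total, early, mail = row
    return total - early - mail


print("flagged as both (all, early, mail):", overlap)
print("remaining ballot types:",
      remainder(invalid), remainder(vacant), remainder(invalid_or_vacant))
```

The remainders also add up (87 + 21 = 108), so the reported breakdowns are internally consistent.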
Record of Moves Out-of-State:

There were 793 records showing NCOA moves to valid out-of-state addresses before 2022-08 that also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election. Of those records, 338 were Early In-Person and 454 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Note: This graphic was updated on Thu Dec 15 to correct a typo in the title as to the date of the DAL file that was used.

Results By District:

This section was added 2022-12-15, per multiple requests for by-district breakouts.

District 01:
Invalid Addresses:

There were 3,222 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 01. Of those records, 2,841 were Early In-Person and 364 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 3,310 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 01. Of those records, 2,909 were Early In-Person and 384 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 02:
Invalid Addresses:

There were 2,552 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 02. Of those records, 2,185 were Early In-Person and 353 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 2,763 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 02. Of those records, 2,346 were Early In-Person and 400 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 03:
Invalid Addresses:

There were 137 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 03. Of those records, 97 were Early In-Person and 34 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 412 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 03. Of those records, 283 were Early In-Person and 117 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 04:
Invalid Addresses:

There were 507 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 04. Of those records, 423 were Early In-Person and 78 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 695 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 04. Of those records, 567 were Early In-Person and 121 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 05:
Invalid Addresses:

There were 2,093 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 05. Of those records, 1,738 were Early In-Person and 348 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 2,264 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 05. Of those records, 1,860 were Early In-Person and 395 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 06:
Invalid Addresses:

There were 1,214 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 06. Of those records, 990 were Early In-Person and 212 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 1,390 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 06. Of those records, 1,129 were Early In-Person and 247 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 07:
Invalid Addresses:

There were 1,042 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 07. Of those records, 868 were Early In-Person and 167 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 1,139 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 07. Of those records, 946 were Early In-Person and 183 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 08:
Invalid Addresses:

There were 276 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 08. Of those records, 148 were Early In-Person and 125 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 517 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 08. Of those records, 300 were Early In-Person and 212 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 09:
Invalid Addresses:

There were 3,247 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 09. Of those records, 2,639 were Early In-Person and 597 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 3,369 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 09. Of those records, 2,733 were Early In-Person and 624 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 10:
Invalid Addresses:

There were 940 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 10. Of those records, 740 were Early In-Person and 198 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 992 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 10. Of those records, 783 were Early In-Person and 207 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

District 11:
Invalid Addresses:

There were 189 records with registered addresses that have been flagged as “Invalid” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 11. Of those records, 97 were Early In-Person and 90 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.

Invalid OR Vacant Addresses:

There were 393 records with registered addresses that have been flagged as “Invalid” or “Vacant” Addresses by the NCOA check and also had an Early In-Person, Mail-In, FWAB or Provisional ballot cast in the VA 2022 General Election in District 11. Of those records, 227 were Early In-Person and 163 were Mail-In. The geographic distribution of the addresses (based on the ZIP+4), as reported by the NCOA service, is shown below, with the size of the marker proportional to the total number of counts at that ZIP+4 location.
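The per-district figures above can be checked against the statewide totals reported earlier. The numbers in this sketch are copied from the district paragraphs; only the variable and function names are invented for illustration.

```python
# (records, Early In-Person, Mail-In) per congressional district,
# copied from the "Invalid Addresses" paragraphs above.
invalid_by_district = {
    1: (3222, 2841, 364), 2: (2552, 2185, 353), 3: (137, 97, 34),
    4: (507, 423, 78), 5: (2093, 1738, 348), 6: (1214, 990, 212),
    7: (1042, 868, 167), 8: (276, 148, 125), 9: (3247, 2639, 597),
    10: (940, 740, 198), 11: (189, 97, 90),
}

# Same layout, copied from the "Invalid OR Vacant Addresses" paragraphs.
invalid_or_vacant_by_district = {
    1: (3310, 2909, 384), 2: (2763, 2346, 400), 3: (412, 283, 117),
    4: (695, 567, 121), 5: (2264, 1860, 395), 6: (1390, 1129, 247),
    7: (1139, 946, 183), 8: (517, 300, 212), 9: (3369, 2733, 624),
    10: (992, 783, 207), 11: (393, 227, 163),
}


def column_totals(table):
    """Sum each column (records, early, mail) across all districts."""
    return tuple(sum(column) for column in zip(*table.values()))


invalid_totals = column_totals(invalid_by_district)
combined_totals = column_totals(invalid_or_vacant_by_district)

print("Invalid:", invalid_totals)
print("Invalid OR Vacant:", combined_totals)
```

Both sets of district figures sum exactly to the statewide counts of 15,419 and 17,244 records, including the Early In-Person and Mail-In breakdowns.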

Summary Data Files by Locality:

The complete set of graphics and statistics for each locality, and each congressional district in VA can be downloaded here as a zip file. The tabulated summary results can also be downloaded in excel, csv, or numbers format:

Categories
Election Data Analysis Election Forensics Election Integrity Interesting technical

Interesting change in effective dates in VA Registered Voter List

I’ve stumbled across an interesting data artifact that I’m not sure what to make of. But I will present it here for completeness.

In the Registered Voter List available from the VA Dept of Elections (“ELECT”), each record of a registered voter has an “effective date” associated with it. This can be the same as the actual registration date, or the date that the voter’s record is returned to “active” status, etc. It appears that sometime within the last year, almost all of the voter registrations with a previous effective date earlier than June 2011 have had their effective date reassigned.

For this analysis I am using an RVL that I purchased from ELECT on 2021-11-06 and comparing it with an RVL purchased on 2022-11-22. I am only comparing the records associated with common voter IDs between each dataset. Any new or removed voters in the last year have been removed from the data and the corresponding plots below.
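Restricting the comparison to voter IDs present in both snapshots can be sketched as below. The voter IDs and dates are made up for illustration, and representing each RVL as a mapping of voter ID to effective date is an assumed simplification of the actual files.

```python
from datetime import date

# Hypothetical snapshots: voter ID -> effective date (not real records).
rvl_2021 = {
    "A": date(2005, 3, 1),
    "B": date(2015, 6, 10),
    "C": date(2020, 1, 5),
    "D": date(1999, 9, 9),    # removed before the 2022 snapshot
}
rvl_2022 = {
    "A": date(2022, 4, 2),    # effective date reset
    "B": date(2015, 6, 10),   # unchanged
    "C": date(2019, 12, 30),  # moved backwards
    "E": date(2022, 10, 1),   # new voter, not in the 2021 snapshot
}

# Keep only voter IDs common to both files; new and removed voters drop out.
common_ids = rvl_2021.keys() & rvl_2022.keys()

# Among the common IDs, find records whose effective date differs.
changed = {
    vid: (rvl_2021[vid], rvl_2022[vid])
    for vid in sorted(common_ids)
    if rvl_2021[vid] != rvl_2022[vid]
}

print("common IDs:", sorted(common_ids))
print("changed effective dates:", sorted(changed))
```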

In the 2021 RVL, we can see the distribution of the effective dates in the histogram below. The majority of records have rather recent effective dates, but there are diminishing tails from long-term voters whose registration effective dates go back many years. (The y-axis in the plot is logarithmic, so we can better see the shape of the distribution tails.)

This isn’t altogether surprising. Newer voters, or voters who have made recent changes to their registration information, will likely get an updated effective date on their voter registration record. Older, or longer-term, voters who have not made any recent changes and stay active would show older effective dates on their records.

Now compare that to the RVL file dated 2022-11-22. Again, this comparison and the data in these plots include only those records that share common voter IDs between the two files. Sometime between the time I downloaded the 2021-11-06 RVL and the 2022-11-22 RVL, almost all records with effective dates before July 2011 have had their effective dates reset to a more recent date.

I don’t really know what to make of this. Was there a mass update of voter registration records? Or a database restore, or some other operation on the records?

Even more interesting, when we superimpose the two histograms we see that the 2022 records with effective dates after July 2011 also appear to have had a significant percentage of dates reset. We see the red curve maintains its shape, save for the large spike at the far right, but is shifted lower … as if a constant percentage of the records have been included in the effective date shift.

Now if we apply a constant multiplier of 20x to the red (2022) dataset we can mostly re-align the histograms.
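One way to read the 20x multiplier, assuming the shift removed the same fraction of records from every histogram bin: if scaling the 2022 counts by 20 restores the 2021 shape, then roughly 1/20 of the records in each bin kept their date and about 95% had it moved. The bin counts in the sketch are invented purely to illustrate the arithmetic.

```python
def fraction_moved(multiplier):
    """Fraction of records that left each bin, if multiplying the remaining
    counts by `multiplier` restores the original histogram."""
    return 1 - 1 / multiplier


# Hypothetical bin counts (not real data): 2022 holds 1/20 of each 2021 bin.
counts_2021 = [2000, 1000, 400]
counts_2022 = [100, 50, 20]

# Applying the constant multiplier re-aligns the two histograms.
rescaled_2022 = [count * 20 for count in counts_2022]

print("rescaled 2022 matches 2021:", rescaled_2022 == counts_2021)
print("implied fraction moved:", round(fraction_moved(20), 2))
```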

Of the effective dates that were changed between the two files, the distribution of the adjustments to the effective dates is shown below. I find it interesting that there are a number of records where the effective date has been moved backwards (?) in time.
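The adjustment for each changed record is simply the 2022 effective date minus the 2021 effective date, so a negative value means the date moved backwards. A minimal sketch with made-up voter IDs and dates:

```python
from datetime import date

# Hypothetical (old, new) effective dates for records that changed.
changed = {
    "A": (date(2005, 3, 1), date(2022, 4, 2)),
    "B": (date(2010, 7, 15), date(2022, 4, 2)),
    "C": (date(2020, 1, 5), date(2019, 12, 30)),  # new date is earlier
}

# Signed adjustment in days: positive = moved forward, negative = backward.
adjustment_days = {vid: (new - old).days for vid, (old, new) in changed.items()}

moved_forward = sorted(vid for vid, days in adjustment_days.items() if days > 0)
moved_backward = sorted(vid for vid, days in adjustment_days.items() if days < 0)

print("adjustments (days):", adjustment_days)
print("moved backward in time:", moved_backward)
```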

We know there have been significant issues with the database used by ELECT (known as the “VERIS” database), so maybe this is an artifact of some maintenance operations or repairs on the data entries? Or maybe this is a symptom of a larger problem. Whatever it is, it doesn’t make a lot of sense in what should be a very well maintained and authoritative set of records.

Categories
Election Data Analysis Election Forensics Election Integrity Press Release

Virginia’s Prince William County Conducts Ballot Recount after Errors Reported in Election Scanner


PWC Board Concludes Human Error ‘Likely’ Before Certifying Results

Non-profit electoral process group praises electoral board for responding to public’s calls to conduct a partial audit of discrepancies before certifying election results

November 16, 2022 -- Election officials in Virginia’s Prince William County met to certify the results of the 2022 General Election after they conducted a hand count of ballots in a precinct where the ballot count conflicted with the machine count and concluded that human error was a “likely” cause rather than machine error.

The board’s decision to conduct a hand count of the ballots in question followed a call for a review from the public, local and state officials, and Electoral Process Education Corporation (EPEC), a Virginia-based non-profit 501(c)(3) that provides election data analysis.

The ballots in question in PWC’s Precinct 612 remain in the custody of the general registrar; the board ruled the matter resolved after conducting a hand count of the machine bin that tallied 27 more votes than physical ballots that were scanned into the machines. (See prior release detailing the error here: https://digitalpollwatchers.org/multiple-errors-found-in-virginias-2022-election-scanner-and-pollbook-data/)

Although the PWC electoral board was not able to reconcile one outstanding difference between the poll-book count of votes cast and the machine scans in two precincts, it judged the differences were not likely coming from machine malfunctions.

While it is still problematic for poll books not to match the scanner tapes, it is a much more serious issue when the number of physical ballots in the accumulation bin and the scanner totals do not match.

EPEC’s recommendations based on its analysis of the election data include the following:

Standard processes should be updated to require verification that the number of ballots in the scanner collection bin matches the scanner result report tape. If the numbers do not match, an on-premises hand tabulation should be performed by the election officers and the results recorded in the official public record.

EPEC also commended the PWC board of elections for ensuring transparency by confirming that the final counts of the physical ballots align with machine scan tallies before certifying the results.

Categories
Election Data Analysis Election Forensics Election Integrity Press Release

Multiple Errors Found in Virginia’s 2022 Election Scanner and Pollbook Data

At Least Two Precincts Showed Different Physical Ballot Count Compared to Scanner Counts

Non-profit electoral process group calls for a full audit of precincts with discrepancies before Commonwealth of Virginia certifies local and statewide election results.

November 14, 2022 -- Electoral Process Education Corporation (EPEC), a non-profit 501(c)(3) that performs election data analysis, is urging Virginia’s public election officials to verify scanner machine ballot counts before certification of results in key precincts as a result of recent findings.

The recommendation comes after election officers, analysts and observers discovered discrepancies in the data reported to and provided by the Virginia Department of Elections (“ELECT”). The findings raise questions about the proper certification of the machines in question, and whether issues were addressed according to statewide election protocols.

In at least two precincts in Prince William County (PWC) the number of physical ballots cast and accumulated was different than the machine scanner’s tally of ballots, as reported by election officers. The numbers must align as part of the precinct’s tracking of total ballots cast at the voting location.

Although the number of ballots impacted was small, the repeated findings raise questions about the origin of the errors and whether the machines were operating correctly.

In Virginia’s VA-7 Congressional District Race, election officers observed differences in ballot counts of voters who were checked in with pollbooks compared to the actual number of ballots in the machines throughout the day. When the election officers went to close out the polling station, they discovered a ballot scanner with 27 more ballots represented in the electronic total than physical ballots present inside the machine’s collection bin.  The scanner reported 531 ballots scanned and recorded, but only 504 physical ballots were in the collection bin underneath the scanner.

Election officers documented these issues with the General Registrar and Electoral Board and recorded the information in the official Statement of Results (SOR) and Chief’s notes. The officers proceeded to conduct a hand tabulation of the vote totals on the ballots in accordance with election procedures. They repeated this tabulation multiple times, with multiple officers witnessing the process. The results of the hand tabulation, as compared to the scanner totals, are as follows:

The Democrat candidate received 22 of the unexplained votes, a 7.86% difference compared to the physical ballot tally for the candidate. The Republican candidate received another 3 votes, a 1.34% difference over the physical ballot tally for the candidate.  There were 2 write-in ballots.
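The figures quoted above can be cross-checked with simple arithmetic. The hand-tally totals per candidate are not reproduced in this text, so the sketch below only back-computes what the published percentages imply; the implied tallies are an inference, not reported numbers.

```python
# Figures quoted in the release.
scanner_total = 531
physical_ballots = 504
unexplained = scanner_total - physical_ballots

# Unexplained votes by choice, as reported.
extra_votes = {"Democrat": 22, "Republican": 3, "Write-in": 2}

# Published differences relative to each candidate's physical ballot tally.
reported_percent = {"Democrat": 7.86, "Republican": 1.34}

# Implied hand tally = extra votes / (percent / 100), rounded to whole ballots.
# These are inferred values, not numbers stated in the release.
implied_hand_tally = {
    name: round(extra_votes[name] / (pct / 100))
    for name, pct in reported_percent.items()
}

print("unexplained scanner votes:", unexplained)
print("sum of reported extra votes:", sum(extra_votes.values()))
print("implied hand tallies:", implied_hand_tally)
print("implied tallies sum:", sum(implied_hand_tally.values()))
```

The reported extra votes account for all 27 unexplained scans, and the implied hand tallies are consistent with the 504 physical ballots found in the collection bin.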

In Virginia’s VA-10 Congressional District Race, elections officers also found a small difference between machine scans and physical ballots (approximately 5-10, out of 1505 cast). EPEC is working to confirm if this discrepancy was reported on the official Statements of Results or not.

EPEC’s collection and analysis of additional datasets resulted in the discovery of further issues and discrepancies, to include the following:

  • EPEC assisted poll watching teams with a unique web form for documenting observations. According to its analysis of 738 reports (at the time of this writing) by poll watchers in Virginia, 21% (155 reports) contained at least one serious issue flagged for further review; 10.16% (75 reports) specifically flagged data discrepancies or equipment issues. The VA poll watcher reporting summary can be reviewed at https://digitalpollwatchers.org/2022-general-election-va-poll-watcher-reporting-summary/
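The poll watcher percentages follow directly from the report counts quoted above; a short check, using only those counts:

```python
total_reports = 738
serious_issue_reports = 155
data_or_equipment_reports = 75


def percent(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 places."""
    return round(100 * part / whole, 2)


serious_pct = percent(serious_issue_reports, total_reports)
data_pct = percent(data_or_equipment_reports, total_reports)

print(f"serious issues: {serious_pct}%")
print(f"data or equipment issues: {data_pct}%")
```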

The PWC Electoral Board is expected to meet Tuesday, Nov. 15th, to perform a final certification of local election results.  The State Board of Elections will subsequently meet to certify the results of the election statewide.

EPEC is urging VA election officials to perform a detailed, transparent process to explain and rectify these discrepancies. It has compiled a list of recommendations based on its analysis:

  • Sequester all equipment at voting precincts that found discrepancies between ballots and machine counts, and perform a full hand count and tabulation audit for these precincts.
  • Physical ballots should be compared with the Scanner Report Tapes, the full Cast Vote Record (CVR), Digital Ballot Images and other machine records and logs. 
  • Ensure election equipment complies with Virginia’s election statutes by certifying software, hardware, and programming hash codes, and then checking randomly selected precincts with the same equipment but with no reported issues.
  • Standard processes should be updated to require verification that the number of ballots in the scanner collection bin matches the scanner result report tape. If the numbers do not match, an on-premises hand tabulation shall be performed by the election officers and the results recorded in the official public record.

Update 2022-11-16:

The Prince William County Board of Elections and Office of Elections heeded the call to investigate further and performed a hand recount of the ballots for the VA-7 precinct in question. See Virginia’s Prince William County Conducts Ballot Recount after Errors Reported in Election Scanner for more information.

Categories
Election Data Analysis Election Forensics Election Integrity technical

“On Machine” ballots with logically impossible time stamps

In looking over the VA DAL data, one readily apparent issue is that the BALLOT_RECIEPT_DATE field for in-person, on-machine early vote data contains logically impossible values.

These time-stamps are supposed to be generated by the electronic poll-books when a voter is checked in at an in-person early voting site. The appeal and rationale for utilizing electronic poll-books is exactly that they can automate the recording of check-in and (theoretically) minimize human error. The operating hours of VA in-person early voting sites are limited to 7am – 7pm. I’m not aware of any in-person early voting center that had extended hours past those. Therefore, logically, we would expect that the electronic poll book generated time stamps for check-ins for in-person on-machine early votes would fall within the 7am – 7pm bounds.

The plot below is generated directly from the Daily Absentee List (DAL) file pulled from the VA Department of Elections on 11/08/2022 at 6am. The x-axis gives the time (rounded to the nearest minute) of the BALLOT_RECIEPT_DATE field associated with recorded Early In-Person On-Machine ballots in the file. The (logarithmic) y-axis gives the total number of Early In-Person On-Machine records that were recorded with that unique timestamp. The blue trace represents all of the records that fall within the daily 7am – 7pm bounds, and the red trace represents the data outside of those bounds.

There were 520,549 records that fall within the expected time bounds, and 156,576 that fall outside of the bounds. From a purely systems perspective, that means that the ability of our electronic poll books (or the backend database they are tied to) to accurately record the check-in time of Early In-Person On-Machine voters has an error rate of 156576 / (156576+520549) = 23.12%.

Let me say that again. A 23.12% error rate.

23.12% of the time, our electronic poll-book based system is reporting a logically impossible time for a person to have physically walked into an open + operating early voting location to check-in and cast their ballot.

Now, if we want to be generous and allow for the possibility that maybe voting locations opened early or closed late and we pad our (7am – 7pm) bounds to be from (6am – 8pm) and run the same analysis, we still get an error rate of 23.09%.

If we pad the hours of operations limits even further to (5am – 9pm), we still get an error rate of 23.06%.

If we run the same analysis using the 7am – 7pm bounds on the 2021 and 2020 data we get 29.64% and 71.17% error rates, respectively.
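The error-rate computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used to produce the plots: the function name `out_of_hours_rate` and the sample timestamps are made up, and it assumes the BALLOT_RECIEPT_DATE values have already been parsed into `datetime` objects.

```python
from datetime import datetime, time

def out_of_hours_rate(timestamps, open_at=time(7, 0), close_at=time(19, 0)):
    """Fraction of timestamps whose time of day falls outside [open_at, close_at]."""
    if not timestamps:
        raise ValueError("no timestamps supplied")
    outside = sum(1 for ts in timestamps if not (open_at <= ts.time() <= close_at))
    return outside / len(timestamps)

# Made-up sample (not real DAL data): two in-hours check-ins,
# one stamped exactly at midnight, one stamped at 10:30pm.
sample = [
    datetime(2022, 10, 1, 9, 15),
    datetime(2022, 10, 1, 18, 59),
    datetime(2022, 10, 2, 0, 0),
    datetime(2022, 10, 2, 22, 30),
]
print(out_of_hours_rate(sample))                           # 7am-7pm bounds
print(out_of_hours_rate(sample, time(5, 0), time(21, 0)))  # padded 5am-9pm bounds

# The statewide 2022 figure, recomputed from the two record counts quoted in the text.
in_bounds, out_of_bounds = 520_549, 156_576
print(f"{100 * out_of_bounds / (in_bounds + out_of_bounds):.2f}%")  # 23.12%
```

Padding the bounds only changes the two `time` arguments, which is why the padded runs are cheap to repeat for each year of data.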

Update 2022-11-13

I adjusted the allowed times to 7am-10pm and re-ran the most recent 2022, 2021 and 2020 DAL files, as well as breaking down by locality. While doing this I noticed that some localities had all timestamps set to midnight, while others still had invalid timestamps set to unique values (but outside operational hours), and some had combinations of both. I’ve delineated the plots such that magenta traces are ballot receipt timestamps that are all set to midnight, red traces are invalid timestamps not set to midnight, and blue traces are timestamps that are valid within the 7am-10pm hours of operation (which is very very generous).

There are two error percentages being computed and displayed in the graph title area. The first (“BRx error”) is as described above and results in a 23.14% error in the 2022 VA statewide data. The second (“BRx_Mok error”) is as described above except we allow for the uniformly midnight ballot receipt dates to be presumed allowable, and results in a 0.05% error metric.

The latter error computation is included to account for the remote chance that a locality is legitimately using paper poll books, or is otherwise recording only the date of the voter check-in and not the time (which would be consistent with all timestamps at midnight). VA requires the use of electronic poll books, but some localities still use manual entry paper poll-books as backup. So even IF that was the explanation for why so many entries were uniformly timestamped to midnight … (A) why did they have to go to their paper poll book backups in the first place? and (B) we still have a residual error of 0.05% across the state that needs to be explained even after removing uniform midnight timestamps from consideration. That might not seem a terribly huge error rate at first blush, but when you consider that most electronic data recording systems (at least that I am aware of) have error rate requirement thresholds for acceptance testing set on the order of 1/1,000,000 … that’s still unacceptable. I have been unable to find a documented error rate threshold requirement from the VA department of elections for the electronic poll book systems used in VA.
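To make the difference between the two metrics concrete, here is a small Python sketch. It is an illustration under stated assumptions, not the code behind the plots: the function names are hypothetical, the sample data is made up, and it treats any timestamp at exactly 00:00:00 as belonging to the “midnight” class on a per-record basis, which is a simplification of the per-locality “uniformly midnight” idea described above.

```python
from datetime import datetime, time

MIDNIGHT = time(0, 0, 0)

def classify(ts, open_at=time(7, 0), close_at=time(22, 0)):
    """Label one timestamp as 'valid', 'midnight', or 'invalid'."""
    t = ts.time()
    if open_at <= t <= close_at:
        return "valid"
    if t == MIDNIGHT:
        return "midnight"
    return "invalid"

def error_metrics(timestamps):
    """Return (BRx, BRx_Mok) as fractions of all records.

    BRx counts every out-of-hours timestamp as an error.
    BRx_Mok presumes exactly-midnight timestamps are allowable.
    """
    counts = {"valid": 0, "midnight": 0, "invalid": 0}
    for ts in timestamps:
        counts[classify(ts)] += 1
    total = sum(counts.values())
    brx = (counts["midnight"] + counts["invalid"]) / total
    brx_mok = counts["invalid"] / total
    return brx, brx_mok

# Made-up sample: 7 valid check-ins, 2 midnight stamps, 1 stamp at 3:12am.
sample = (
    [datetime(2022, 10, 3, 10, m) for m in range(7)]
    + [datetime(2022, 10, 3, 0, 0), datetime(2022, 10, 4, 0, 0)]
    + [datetime(2022, 10, 4, 3, 12)]
)
print(error_metrics(sample))
```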

The complete tabulation of all errors for each locality is provided here:

Selected Locality Plots:

The segmented Prince William County (my home county) 2022 plot is below. There is a 0.06% error rate of invalid (all midnight) timestamps in the Ballot Receipt date data.

The segmented Loudoun County 2022 plot is below. There is a 0.03% error rate of invalid (all midnight) timestamps in the Ballot Receipt date data.

The segmented Manassas City 2022 plot is below. There is a 5.82% error rate of invalid (all midnight) timestamps in the Ballot Receipt date data.

The segmented Mathews County 2022 plot is below. There is a 24.21% error rate of total invalid timestamps in the Ballot Receipt date data, and a reduced error rate of 15.71% when allowing all midnight timestamps to be considered as valid.

The segmented Virginia Beach City 2022 plot is below. There is a 0.24% error rate of invalid (all midnight) timestamps in the Ballot Receipt date data.

The complete set of generated plots for every locality is included in the attached zip file:

Categories
Election Data Analysis Election Integrity Uncategorized

VA Daily Absentee List

The EPEC staff monitors the Virginia Daily Absentee List for unexpected values. We essentially “audit” the electoral process in Virginia during an election cycle. We are currently monitoring the 2022 General Election.

One of the areas of interest is the DAL – Daily Absentee List. It shows the current status of absentee voting in Virginia – by mail in ballot and early voting (absentee in person).

In Virginia, Absentee In-Person Early Voting started on Friday, September 23. Our initial DAL file was saved on Saturday, September 24, at 9 PM.

The official Ballot Status in the DAL at 9 PM was:

Issued: 290,095

Federal Write-In Absentee Ballot (FWAB): 1

Marked: 2,118

On Machine: 8,397

Not Issued: 5,766

Unmarked: 546

Pre-Processed: 1

Deleted: 13,015

Grand Total: 319,939

A total of 19,327 ballots, about 6% of those requested, were in a state which would not be counted if the election vote counting period were over today: Not Issued, Unmarked, or Deleted. There was also 1 ballot in a Pre-Processed Ballot Status state. The magnitude of ballots in one of these “states” is surprising but not alarming.
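For readers who want to check the arithmetic, the totals above can be reproduced directly from the listed status counts. The dictionary below simply restates the numbers from the list; the variable names are ours.

```python
# Ballot status counts as listed above (DAL file saved Saturday, September 24, 9 PM).
status_counts = {
    "Issued": 290_095,
    "FWAB": 1,
    "Marked": 2_118,
    "On Machine": 8_397,
    "Not Issued": 5_766,
    "Unmarked": 546,
    "Pre-Processed": 1,
    "Deleted": 13_015,
}

grand_total = sum(status_counts.values())
# Statuses that would not be counted if counting ended today.
at_risk = sum(status_counts[s] for s in ("Not Issued", "Unmarked", "Deleted"))

print(grand_total)                             # 319939
print(at_risk)                                 # 19327
print(f"{100 * at_risk / grand_total:.2f}%")   # 6.04%
```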

It appears Not Issued means there is either a backlog in mailing out ballots or an issue with voter registration – legal name, address of record in the registration database, citizenship, etc. Unless the backlog or issue is resolved, the voter will be denied a ballot.

Unmarked is associated with mail-in Absentee Ballots. A Marked ballot is moved to an Unmarked status if an election official notices an error with the associated absentee ballot documents such as a name or address error, missing signature, or missing signature verification. Election officers are required to contact voters if their ballot requires a cure – correction to the information accompanying the ballot. If the cure is not provided, the ballot will not be counted. Some voters choose to have a new ballot mailed to them if a cure is required, in which case a ballot in the Unmarked state will be spoiled and marked Deleted in the system. This is one of the reasons we see voters having one or more Deleted ballots associated with them in the DAL files.

Deleted ballots are not supposed to be processed (counted). We believe these are officially referred to as “spoiled ballots.” The process to keep these separate from countable ballots is an interest area for election integrity observers. The most common reason for ballots to get Deleted (spoiled) is voter error. Examples: a mistake when filling out a ballot in person resulting in the first ballot being spoiled and a new ballot issued, or a voter surrendering an absentee ballot to vote in person or receive a new one via the mail.

More accurate voter registration records MAY reduce the volume of initial Not Issued and Deleted ballots. Our post-election observations and recommendations will address this issue. Our initial hypothesis is that changes in residency, relocation within Localities, ineligible voters requesting ballots, and voters passing away probably account for most of the unexpectedly large number of ballots in an “at risk” state.

Categories
Election Data Analysis Election Forensics Election Integrity technical

2022 VA General Daily Changes to Voter Registration Totals

Here are the changes to the voter registration numbers for each VA locality over the course of the 2022 general election. These files will be updated automatically as the data becomes available. The first graph below is the percent change with the color coding clamped to +/- 3 x the standard deviation, and the second is the absolute percent change.

The computed csv file for the above data is here: https://digitalpollwatchers.org/files/2022/VA/registration-changes/2022-va-general-voter-registration-count-changes.csv
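As a rough illustration of the clamping used for the color coding, here is a minimal Python sketch. It rests on several assumptions that are not confirmed by the plots themselves: the clamp is taken to be symmetric around zero, the standard deviation is the population standard deviation across localities, and the function names and counts are made up.

```python
from statistics import pstdev

def percent_change(before, after):
    """Percent change in a locality's registration count between two snapshots."""
    return 100.0 * (after - before) / before

def clamp_symmetric(values, n_sigma=3.0):
    """Clamp each value to [-n_sigma * sd, +n_sigma * sd].

    sd is the population standard deviation of `values` (an assumption).
    """
    limit = n_sigma * pstdev(values)
    return [max(-limit, min(limit, v)) for v in values]

# Made-up counts: 19 localities with no change and one with a 10% jump.
before = [1000] * 20
after = [1000] * 19 + [1100]

changes = [percent_change(b, a) for b, a in zip(before, after)]
clamped = clamp_symmetric(changes)
print(max(changes), max(clamped))
```

Clamping only affects how the colors are scaled, so that one extreme locality does not wash out the rest of the map; the underlying percent changes are unchanged.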

Categories
Election Data Analysis Election Forensics Election Integrity technical

2022 VA General Election DAL File Statistics

Update 10-17-2022: There has been an issue with the VERIS system (the database that runs behind the scenes at the VA department of elections) where updates to the DAL files have not progressed since 10/14. On 10/17 there was a published change to the data files but the report generated was incomplete and cut off halfway through its listing of CARROLL COUNTY data. I had a phone conversation on 10/17 with ELECT and they are aware of the issue and working to correct it. Also I have included a new gallery at the bottom of the page of all of the individual localities or precincts that are automatically flagged as having issues of concern. Issues detected include any number of “vanishing” voters as defined below, “On Machine” ballot counts that decrease day-to-day, “Marked” OR “Pre-Processed” counts that decrease day-to-day, etc.

Update 10-18-2022: The publication of the DAL files has resumed. I have queries in to the department of elections as to the exact cause of the issues and will update accordingly as I find out more information.

Below is the current set of statistics from the 2022 VA General Election Daily Absentee List (DAL) file records. There are two plots below representing the same data, one plot with a linear y-axis and the other with a logarithmic y-axis. The x-axis is the date that each DAL file processed was archived and pulled from the Dept of Elections (ELECT) servers. Solid traces are directly extracted data from the DAL files. Dashed traces are computed metrics such as the number of “vanished” voters detected (described below). Red datapoints are placed on traces that exhibit questionable behavior, for example if the number of “approved” and “countable” ballots ever decreases, etc. Vertical dotted lines indicate important dates.
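The rule for placing a red datapoint on a trace (a count that should only ever grow, but drops from one DAL file to the next) can be sketched as follows. The function name and the sample series are hypothetical and only illustrate the idea.

```python
def flag_decreases(daily_counts):
    """Return the indices where a count is lower than the previous day's count."""
    return [
        i for i in range(1, len(daily_counts))
        if daily_counts[i] < daily_counts[i - 1]
    ]

# Made-up daily "countable" totals; days 2 and 5 show a decrease.
countable_by_day = [100, 150, 149, 200, 200, 180]
print(flag_decreases(countable_by_day))  # [2, 5]
```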

There are two very important fields in the DAL file that we want to pay attention to here: the APP_STATUS field, and the BALLOT_STATUS field.

DAL records with APP_STATUS = “Approved” and BALLOT_STATUS = “Issued” indicate a ballot that has been mailed to a voter.

DAL records with APP_STATUS = “Approved” and BALLOT_STATUS = “Marked” indicate a mail-in ballot that has been mailed to a voter, and then subsequently returned.

DAL records with APP_STATUS = “Approved” and BALLOT_STATUS = “Pre-Processed” indicate a mail-in ballot that has been mailed to a voter, returned and the ballot envelope has been opened and the ballot processed.

DAL records with APP_STATUS = “Approved” and BALLOT_STATUS = “On Machine” indicate a ballot record from a voter who physically walked into an early voting site and cast their vote on a tabulator machine.

DAL records with APP_STATUS = “Approved” and BALLOT_STATUS = “FWAB” indicate a Federal Write-In Absentee Ballot (FWAB) mail-in ballot that has been received.

The combination of all ballots that have APP_STATUS=Approved and BALLOT_STATUS = “Marked” | “Pre-Processed” | “On Machine” | “FWAB” we term as “Countable” ballots.

I’ve computed the number of countable records that have an invalid BALLOT_RECIEPT_DATE or an invalid APP_RECIEPT_DATE. (For example if the BALLOT_RECIEPT_DATE is before the start of early voting, etc.)

I am also attempting to detect the number of duplicate voter IDs in a “countable” (as described above) state, if any, for each DAL file.
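A minimal sketch of the “countable” filter and the duplicate check is shown below. APP_STATUS and BALLOT_STATUS are the field names described above; the voter ID column name (`VOTER_ID` here) is an assumption for illustration, as are the function names and the sample records.

```python
from collections import Counter

COUNTABLE_STATUSES = {"Marked", "Pre-Processed", "On Machine", "FWAB"}

def is_countable(record):
    """True if the record is Approved and in one of the countable ballot states."""
    return (
        record["APP_STATUS"] == "Approved"
        and record["BALLOT_STATUS"] in COUNTABLE_STATUSES
    )

def duplicate_countable_ids(records, id_field="VOTER_ID"):
    """Return voter IDs that appear more than once among countable records."""
    counts = Counter(r[id_field] for r in records if is_countable(r))
    return sorted(vid for vid, n in counts.items() if n > 1)

# Made-up records: voter "B" has two countable entries, voter "C" has
# one countable and one Deleted entry (which is not a duplicate).
records = [
    {"VOTER_ID": "A", "APP_STATUS": "Approved", "BALLOT_STATUS": "Issued"},
    {"VOTER_ID": "B", "APP_STATUS": "Approved", "BALLOT_STATUS": "Marked"},
    {"VOTER_ID": "B", "APP_STATUS": "Approved", "BALLOT_STATUS": "On Machine"},
    {"VOTER_ID": "C", "APP_STATUS": "Approved", "BALLOT_STATUS": "Deleted"},
    {"VOTER_ID": "C", "APP_STATUS": "Approved", "BALLOT_STATUS": "On Machine"},
]
print(sum(is_countable(r) for r in records))  # 3
print(duplicate_countable_ids(records))       # ['B']
```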

Additionally, I’ve computed and plotted the number of “Vanished” voters seen as we process the DAL files in chronological order. As each publication of the DAL file is intended to capture information on all of the absentee ballots to date during an election, we would expect that once a unique voter ID becomes a record in the DAL file, all subsequent DAL files should have an entry for that ID, regardless of its status. However, we know there are multiple instances where a voter ID will show up in the DAL record on a given date, and then be completely missing from a future DAL file.

Upon asking the department of elections for clarification as to how this can occur, their answer given was that if the voter has their registration cancelled for any reason, they are also removed from the DAL file. This holds true even if live ballots had been issued for that voter, or if the voter’s vote has already been fed into a tabulator. This means that there is NO ACCOUNTING for these ballots in the DAL record. Note that the department of elections also does the same thing with the Voter History List (VHL) and the List of Those Who Voted (LTWV) data files. This is apparently standard operating procedure for the VERIS database(s) at ELECT, and (I quote) “… nothing unusual …” or to be concerned about as far as the department of elections is concerned. I vehemently disagree, and think that removing these records from the DAL while the election is ongoing is extremely problematic, to put it politely.
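The “vanished” voter check described above amounts to a running set comparison across DAL files in chronological order. The sketch below is illustrative only: the function name is hypothetical, the IDs are made up, and it assumes each DAL file has already been reduced to the set of voter IDs it contains.

```python
def vanished_ids(snapshots):
    """Find voter IDs seen in an earlier DAL file but missing from a later one.

    snapshots: list of (label, iterable_of_voter_ids) in chronological order.
    Returns a dict mapping each label to the sorted list of missing IDs.
    """
    seen = set()
    result = {}
    for label, ids in snapshots:
        current = set(ids)
        result[label] = sorted(seen - current)
        seen |= current
    return result

# Made-up example: voter "B" appears in the first two files, then disappears.
snapshots = [
    ("day 1", {"A", "B", "C"}),
    ("day 2", {"A", "B", "C", "D"}),
    ("day 3", {"A", "C", "D", "E"}),
]
print(vanished_ids(snapshots))
# {'day 1': [], 'day 2': [], 'day 3': ['B']}
```

Counting `len()` of each list gives the per-file number of vanished voters that would be plotted as a dashed trace.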

I will continue to update these plots as the election progresses. As more data comes in I will also be publishing these types of graphs for selected localities and precincts.

All of the latest plots for every locality and precinct as well as the corresponding underlying CSV data files will be updated daily, and you can download them here.

The semilog versions of the plots for all localities or precincts that appear in the DAL data that have flagged issues of concern are shown in the gallery below. The image carousel below might take a moment to load, btw.

Categories
Election Data Analysis Election Forensics Election Integrity Interesting programming technical

Updates to Henrico CVR processing

Note: For background information, please see my introduction to Cast Vote Records processing and theory here: Statistical Detection of Irregularities via Cast Vote Records.

Since I posted my initial analysis of the Henrico CVR data, a member of the Texas election integrity group I have been working with made the following comment: We have been assuming, based on vendor documentation and the laws and requirements in various states, that when a cast vote record is produced by vendor software the results are sorted by the time the ballot was recorded onto a scanner. However, when looking at the results that we’ve been getting so far and trying to figure out plausible explanations for what we were seeing, he realized it might be the case that the CVR entries are being ordered by time within each USB stick grouping (which is usually associated with a specific scanner or precinct), with all of those groups then simply concatenated together.

While there isn’t enough information in the Henrico CVR files to break out the entries by USB/Scanner, and the Henrico data has record ID numbers instead of actual timestamps, there is enough information to break them out by Precinct, District and Race, with the exception of the Central Absentee Precincts (CAP) entries where we can only break them out by district given the metadata alone. However, with some careful MATLAB magic I was able to cluster the results marked as just “CAP” into at least 5 different sub-groupings that are statistically distinct. (I used an exponential moving average to discover the boundaries between groupings, looking at the crossover points in vote share.) I then relabeled the entries with the corresponding “CAP 1”, “CAP 2”, … , “CAP 5” labels as appropriate. My previous analysis was only broken out by Race ID and CAP/Non-CAP/Provisional category.
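To give a feel for the boundary-finding idea, here is a small Python sketch. It is not the MATLAB code used for the analysis and should be read as one plausible interpretation: it smooths a 0/1 ballot sequence with an exponential moving average and reports where the smoothed vote share crosses a reference level. The smoothing factor, the 0.5 level, the function names, and the synthetic data are all assumptions.

```python
def ema(values, alpha=0.05):
    """Exponential moving average of a sequence, seeded with the first value."""
    out = []
    current = values[0]
    for v in values:
        current = alpha * v + (1 - alpha) * current
        out.append(current)
    return out

def crossover_points(series, level=0.5):
    """Indices where the series moves from one side of `level` to the other."""
    return [
        i for i in range(1, len(series))
        if (series[i - 1] - level) * (series[i] - level) < 0
    ]

# Synthetic sequence: 200 ballots at roughly 80% for candidate A (1 = A, 0 = B),
# followed by 200 ballots at roughly 20% for candidate A, mimicking two
# concatenated groups with very different vote shares.
votes = [1, 1, 1, 1, 0] * 40 + [0, 0, 0, 0, 1] * 40

boundaries = crossover_points(ema(votes))
print(boundaries)  # crossing(s) appear shortly after index 200
```

The detected crossing lags the true boundary by a few ballots because of the smoothing, so in practice the boundary estimate would need to be refined after detection.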

Processing in this manner makes the individual distributions look much cleaner, so I think this does confirm that there is not a true sequential ordering in the CVR files coming out of the vendor software packages. (If they would just give us the dang timestamps … this would be a lot easier!)

I have also added a bit more rigor to the statistical outlier detection by adding plots of the length of observed runs (e.g. how many “heads” did we get in a row?) as we move through the entries, as well as a plot of the probability of that number of consecutive tosses occurring. We compute this probability for K consecutive draws using the rules of statistical independence, which is P([a,a,a,a]) = P(a) x P(a) x P(a) x P(a) = P(a)^4. Therefore the probability of getting 4 “heads” in a row with a hypothetical 53/47 weighted coin would be .53^4 = 0.0789. There are also plotted lines for a probability of 1/#Ballots for reference.
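The run-length and run-probability calculation can be sketched in Python as follows. The function names and the short ballot sequence are made up for illustration; the probability is simply the candidate’s overall share raised to the length of the run, per the independence rule above.

```python
def run_lengths(ballots):
    """For each ballot, the length of the current run of identical consecutive choices."""
    lengths = []
    current = 0
    previous = object()  # sentinel that never equals a real ballot value
    for b in ballots:
        current = current + 1 if b == previous else 1
        lengths.append(current)
        previous = b
    return lengths

def run_probability(share, k):
    """Probability of k consecutive ballots for a candidate with overall share `share`,
    assuming independent, identically distributed draws."""
    return share ** k

ballots = ["A", "A", "B", "A", "A", "A", "A"]
print(run_lengths(ballots))                  # [1, 2, 1, 1, 2, 3, 4]
print(round(run_probability(0.53, 4), 4))    # 0.0789
```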

Results

The good news is that this method of slicing the data and assuming that the Vendor is simply concatenating USB drives seems to produce much tighter results that look to obey the expected IID distributions. Breaking up the data this way resulted in no plot breaking the +/- 3/sqrt(N-1) boundaries, but there still are a few interesting datapoints that we can observe.

In the plot below we have the Attorney General’s race in the 4th district from precinct 501 – Antioch. This is a district that Miyares won handily 77%/23%. We see that the top plot of the cumulative spread is nicely bounded by the +/- 3/sqrt(N-1) lines. The second plot from the top gives the vote ratio in order to compare with the work that Draza Smith, Jeff O’Donnell and others are doing with CVRs over at Ordros.com. The second from bottom plot gives the number k of consecutive ballots (in either candidate’s favor) that have been seen at each moment in the counting process. And the bottom plot raises either the 77% or 23% overall probability to the k-th power to determine the probability associated with pulling that many consecutive Miyares or Herring ballots from an IID distribution. The most consecutive ballots Miyares received in a row was just over 15, which had a .77^15 = 0.0198 or 1.98% chance of occurring. The most consecutive ballots Herring received was about 4, which equates to a probability of occurrence of .23^4 = 0.0028 or 0.28% chance. The dotted line on the bottom plot is referenced at 1/N, and the solid line is referenced at 0.01%.

But let’s now take a look at another plot for the Miyares contest in another blowout locality with 84% / 16% for Miyares. The +/- 3/sqrt(N-1) limit nicely bounds our ballot distribution again. There is, however, an interesting block of 44 consecutive ballots for Miyares about halfway through the processing of ballots. This equates to .84^44 = 0.0004659 or a 0.04659% chance of occurrence from an IID distribution. Close to this peak is a run of 4 ballots for Herring which doesn’t sound like much, but given the 84% / 16% split, the probability of occurrence for that small run is .16^4 = 0.0006554 or 0.06554%!

Moving to the Lt. Governor’s race we see an interesting phenomenon where Ayala received a sudden 100 consecutive votes a little over midway through the counting process. Now granted, this was a landslide district for Ayala, but this still equates to a .92^100 = 0.000239 or 0.0239% chance of occurrence.

And here’s another large block of contiguous Ayala ballots equating to about .89^84 = 0.00005607 or 0.0056% chance of occurrence.

Tests for Differential Invalidation (added 2022-09-19):

“Differential invalidation” takes place when the ballots of one candidate or position are invalidated at a higher rate than for other candidates or positions. With this dataset we know how many ballots were cast, and how many ballots had incomplete or invalid results (no recorded vote in the CVR, but the ballot record exists) for the 3 statewide races. In accordance with the techniques presented in [1] and [2], I computed the plots of the Invalidation Rate vs the Percent Vote Share for the Winner in an attempt to observe if there looks to be any evidence of Differential Invalidation ([1], ch 6). This is similar to the techniques presented in [2], which I used previously to produce my election fingerprint plots and analysis that plotted the 2D histograms of the vote share for the winner vs the turnout percentage.

The generated invalidation rate plots for the Gov, Lt Gov and AG races statewide in VA 2021 are below. Each plot represents one of the statewide races, and each dot represents the ballots from a specific precinct. The x axis is the percent vote share for the winner, and the y axis is computed as 100 – 100 * Nvotes / Nballots. All three show a small but statistically significant linear trend and evidence of differential invalidation. The linear regression trendlines have been computed and superimposed on the data points in each graph.
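The y-axis formula and the trendline can be sketched in Python as below. This is an illustration, not the code that produced the plots: the precinct numbers are made up, the function names are ours, and the fit is an ordinary least-squares line with no significance test attached.

```python
def invalidation_rate(n_votes, n_ballots):
    """Percent of ballots with no valid recorded vote: 100 - 100 * Nvotes / Nballots."""
    return 100.0 - 100.0 * n_votes / n_ballots

def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up precincts: (winner vote share in percent, valid votes, ballots cast).
precincts = [(40.0, 990, 1000), (50.0, 980, 1000), (60.0, 970, 1000)]

shares = [p[0] for p in precincts]
rates = [invalidation_rate(p[1], p[2]) for p in precincts]
slope, intercept = linear_fit(shares, rates)
print(rates)
print(slope, intercept)
```

A nonzero slope in such a fit is what suggests differential invalidation; whether it is statistically significant requires a separate test on the slope.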

To echo the warning from [1]: a differential invalidation rate does not directly indicate any sort of fraud. It indicates an unfairness or inequality in the rate of incomplete or invalid ballots conditioned on candidate choice. While it could be caused by fraud, it could also be caused by confusing ballot layout, or socio-economic issues, etc.

Full Results Download

References

  • [1] Forsberg, O.J. (2020). Understanding Elections through Statistics: Polling, Prediction, and Testing (1st ed.). Chapman and Hall/CRC. https://doi.org/10.1201/9781003019695
  • [2] Klimek, P., Yegorov, Y., Hanel, R., & Thurner, S. (2012). Statistical Detection of Systematic Election Irregularities. Proceedings of the National Academy of Sciences of the United States of America, 109, 16469-16473. https://doi.org/10.1073/pnas.1210722109
Categories
Election Data Analysis Election Forensics Election Integrity Interesting programming technical

CVR Analysis – Henrico County VA 2021

Update 2022-08-29: Per observations by members of the Texas team I am working with, we’ve been able to figure out (a) that the vendor was simply concatenating data records from each machine and not sorting the CVR results, and (b) how to mostly unwrap this effect on the data to produce much cleaner results. The results below are left up for historical reference.

For background information, please see my introduction to Cast Vote Records processing and theory here: Statistical Detection of Irregularities via Cast Vote Records. This entry will be specifically documenting the results from processing the Henrico County Virginia CVR data from the 2021 election.

As in the results from the previous post, I expanded the theoretical error bounds out to 6/sqrt(N) instead of 3/sqrt(N) in order to give a little bit of extra “wiggle room” for small fluctuations.
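For a sense of scale on these bounds, here is a small Python sketch. It is one plausible reading of the test rather than the exact statistic, which is defined in the background post: it assumes ballots are coded 0/1, that n is the running number of ballots processed, and that the quantity being bounded is the difference between the running vote share and the final vote share. The function names are hypothetical.

```python
from math import sqrt

def bound(n, k=3.0):
    """Error bound k / sqrt(n) after n ballots."""
    return k / sqrt(n)

def share_std(p, n):
    """Standard deviation of the running share after n IID ballots with true share p."""
    return sqrt(p * (1 - p) / n)

def first_breach(ballots, k=3.0):
    """Index of the first ballot where |running share - final share| > k / sqrt(n),
    or None if the curve never leaves the bounds. `ballots` is a list of 0/1 values."""
    final_share = sum(ballots) / len(ballots)
    running = 0
    for i, b in enumerate(ballots):
        running += b
        n = i + 1
        if abs(running / n - final_share) > bound(n, k):
            return i
    return None

print(bound(100, 3.0), bound(100, 6.0), share_std(0.5, 100))
print(first_breach([1, 0] * 50))         # well mixed: None
print(first_breach([1] * 50 + [0] * 50)) # two concatenated blocks: 36
```

Under the IID assumption the standard deviation of the running share is at most 0.5/sqrt(n), so even the 3/sqrt(N) bound is at least six standard deviations wide, and 6/sqrt(N) is twice that.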

However, the Henrico dataset could only be broken up by CAP, Non-CAP or Provisional. So be aware that the CAP curves presented below contain a combination of both early-vote and mail-in ballots.

The good news is that I’ve at least found one race that seems to not have any issues with the CVR curves staying inside the error boundaries. MemberHouseOfDelegates68thDistrict did not have any parts of the curves that broke through the error boundaries.

The bad news … is that pretty much everything else does. I cannot tell you why these curves have such differences from statistical expectation, just that they do. We must have further investigation and analysis of these races to determine root cause. I’ve presented all of the races that had a sufficient number of ballots below (1000 minimum for the race as a whole, and 100 ballot minimum for each ballot type).