
Distribution of VHL Entries as compared to CSV and Turnout statistics

Second time’s the charm! See here for my (corrected) first attempt. My apologies for the coding error in my first effort to compute the 2020 CSV Estimates.

In response to a question I received, I looked at the differences between three numbers: the number of voters that the VA Department of Elections (“ELECT”) says participated in a given election (via its Voter History List files), the number of ballots that ELECT says were counted in that election (via the public CSV Election Results files), and the turnout recorded for that election (via ELECT's public Turnout CSV files).

Theoretically, all of these sources should give the same (or very similar) results. Note that the VHL counts people, while the Results CSV counts votes cast in the respective races, but the numbers should still be roughly similar between the two sources. Also, the Results CSV “Total Vote” field and the Turnout CSV files used for this analysis should include overvotes, write-ins, etc., so these should not be the source of the discrepancies. Furthermore, the 11-06-2021 VHL data file is identical to the VHL data file I downloaded on 12-14-2021, so there is no “missing data” in the VHL file for the 2021 election, unless ELECT has a significant lag in updating voter credit. From my conversations with multiple registrars and elections staff, voter credit is applied when the canvass is completed and the results are certified, so this also should not be a source of error.

Update 2022-07-31 23:05: Per an email discussion with ELECT staff, there IS a significant lag in updating the VHL data after the canvass and certification. This explains the incomplete 2021 VHL. Voter credit is supposed to be applied as of the canvass and certification, but there is often a lag in the data being entered and/or replicated. I am not sure how long a lag should be expected. Also, while the links to the DAL file that ELECT provides after a user purchases the data are “live” and updated daily over a 30-day window, the links for the RVL and VHL are NOT! So all of this data should be considered current as of Nov 6, 2021.
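To make the three-way comparison concrete, here is a minimal sketch of tallying each source per locality and computing the deviations between them. It is an illustration under stated assumptions rather than the code behind the plots: the column names (LOCALITY_NAME, ELECTION_DATE, OFFICE, TOTAL_VOTES) are hypothetical placeholders, not ELECT's actual headers; it assumes one VHL row per voter per election and one Results row per candidate per precinct; and the sign convention (other source minus VHL) is chosen for this sketch and may not match the plots below.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping

Row = Mapping[str, str]


def vhl_counts(vhl_rows: Iterable[Row], election_date: str) -> Dict[str, int]:
    """Voters credited per locality for one election (assumes one VHL row per voter per election)."""
    counts: Counter = Counter()
    for row in vhl_rows:
        if row["ELECTION_DATE"] == election_date:
            counts[row["LOCALITY_NAME"]] += 1
    return dict(counts)


def results_counts(result_rows: Iterable[Row], office: str) -> Dict[str, int]:
    """Votes cast per locality in one race (sums the hypothetical TOTAL_VOTES column)."""
    counts: Counter = Counter()
    for row in result_rows:
        if row["OFFICE"] == office:
            counts[row["LOCALITY_NAME"]] += int(row["TOTAL_VOTES"])
    return dict(counts)


def deviations(vhl: Mapping[str, int], other: Mapping[str, int]) -> Dict[str, int]:
    """Per-locality difference (other source minus VHL); a missing locality counts as 0."""
    localities = sorted(set(vhl) | set(other))
    return {loc: other.get(loc, 0) - vhl.get(loc, 0) for loc in localities}


if __name__ == "__main__":
    # Invented toy rows, only to show the shape of the computation.
    vhl = [
        {"LOCALITY_NAME": "A", "ELECTION_DATE": "2021-11-02"},
        {"LOCALITY_NAME": "A", "ELECTION_DATE": "2021-11-02"},
        {"LOCALITY_NAME": "A", "ELECTION_DATE": "2020-11-03"},
        {"LOCALITY_NAME": "B", "ELECTION_DATE": "2021-11-02"},
    ]
    results = [
        {"LOCALITY_NAME": "A", "OFFICE": "Governor", "TOTAL_VOTES": "2"},
        {"LOCALITY_NAME": "A", "OFFICE": "Governor", "TOTAL_VOTES": "1"},
        {"LOCALITY_NAME": "B", "OFFICE": "Governor", "TOTAL_VOTES": "1"},
    ]
    turnout = {"A": 2, "B": 0}  # hypothetical Turnout CSV totals, already per locality
    v = vhl_counts(vhl, "2021-11-02")
    print(deviations(v, results_counts(results, "Governor")))  # {'A': 1, 'B': 0}
    print(deviations(v, turnout))  # {'A': 0, 'B': -1}
```

Keeping the three tallies as separate per-locality dictionaries makes it easy to compare any pair of sources with the same `deviations` helper.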

Finally, note that by definition the Voter History List will slightly undercount the number of voters who participated in previous elections: voters removed from the registered voter list between the end of an election period and the date a given VHL file was downloaded (11-6-2021 and 12-14-2021 in this case) also have their corresponding records deleted from the VHL. As a result, the further the VHL file's date is from the election you are interested in, the more inaccurate the VHL data becomes, by design. In my opinion this is a pretty bad way to keep records, but that's how ELECT is handling the data.
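To illustrate why this undercount grows as the VHL file ages, here is a toy sketch. The voter IDs and removal dates are invented, and the function simply encodes the assumption described above: removing a voter's registration also deletes that voter's history records, so a participant is still counted only if the registration had not been removed as of the download date.

```python
from datetime import date
from typing import Dict, Set


def surviving_vhl_count(voted_ids: Set[str], removed_on: Dict[str, date], download_date: date) -> int:
    """How many of an election's participants still appear in a VHL downloaded on download_date.

    Assumes removing a registration deletes that voter's history rows, so a
    participant survives only if not yet removed when the file was downloaded.
    """
    return sum(
        1
        for vid in voted_ids
        if vid not in removed_on or removed_on[vid] > download_date
    )


# Toy data: four hypothetical voters who all voted in the same election,
# two of whom are later removed from the registered voter list.
voted = {"v1", "v2", "v3", "v4"}
removed = {"v2": date(2021, 3, 1), "v4": date(2021, 10, 15)}

for d in (date(2020, 12, 1), date(2021, 6, 1), date(2021, 11, 6)):
    print(d, surviving_vhl_count(voted, removed, d))
# 2020-12-01 4
# 2021-06-01 3
# 2021-11-06 2
```

The count can only stay flat or shrink as the download date moves later, which is the "more and more inaccurate by design" effect.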

Results:

One interesting thing to note is that even though ELECT and registrars state that “Voter Credit” (i.e., updates to the VHL) is applied at the conclusion of the canvass and before certification of the election, the 2021 VHL appears to differ significantly from the Turnout and Results CSV files. The VHL file I am using was first downloaded on 11-06-2021 after the canvass and certification were completed, and I continued to download and archive versions of the live URL link I received from ELECT until 12-14-2021. The content of the VHL file did not change over that entire period. (See my update note above.) An example from King George County is shown below.

Another interesting observation is that the CSV Election Results and CSV Turnout Results mostly agree, while the VHL discrepancies become increasingly negative the further back in time the election is. As discussed above, this may be because changes to voter registrations (such as removals) are mirrored in the Voter History List. An example from Highland County showing this small but growing deviation as we move into the past is shown below.

One specific datapoint of interest is the plot below from Page County, where the 2020 Results CSV and Turnout CSV numbers deviate significantly, while the deviation between the VHL numbers and the Results CSV is much smaller. I don’t have an explanation for why this occurs in the data.

Another specifically interesting datapoint is that the 2019 turnout numbers in Rockbridge County are significantly lower than the number of voters who supposedly participated in the election according to the VHL. (Note that there were no congressional district races in the November 2019 General Election.) This is particularly interesting because, per the discussion above, we presume that the VHL undercounts participating voters more and more as we look further into the past, since voters are removed from the VHL as they are purged from the voter rolls. So how does the number of participating voters (from the VHL) exceed the number of votes tallied by ~2400?

A similar occurrence can be observed below in New Kent County in 2020, with a deviation of nearly -6000 estimated ballots. The VHL file shows ~6000 more voters participating in the election than the Turnout CSV file records, yet the Results CSV count of votes cast is within 265 votes of the VHL estimate. What's going on here?

A similar situation occurs in Shenandoah County in 2018, but by a much smaller margin.

Complete Gallery:

Below is a complete gallery of the computed results for every locality.


Number of duplicated voter records in each VA locality

Computed below is the number of duplicated voter records in each locality as of the 11/06/21 VA Registered Voter List (RVL). The computation performs an exact match of the LAST_NAME, DOB and ADDRESS fields between records in the file.

Note: If the combination of the name “Jane Smith”, DOB “1/1/1980”, and address “12345 Some Road, Ln.” appears 3 times in the file, 3 counts are added to the results below. If the combination appears only once, 0 counts are added, as there is no repetition.
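The counting rule above can be sketched in a few lines of Python. This is an illustration rather than the exact code used for the results below: the LOCALITY column and the single ADDRESS column are hypothetical simplifications of the RVL layout, and the sketch assumes matches are only counted within a locality.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping, Sequence

# Fields for the lenient criterion described above. "LOCALITY" and a single
# "ADDRESS" column are hypothetical simplifications of the RVL layout.
LENIENT_FIELDS = ("LAST_NAME", "DOB", "ADDRESS")


def repeated_entries_by_locality(
    records: Iterable[Mapping[str, str]],
    key_fields: Sequence[str] = LENIENT_FIELDS,
) -> Dict[str, int]:
    """Per locality, sum the sizes of all key groups that occur more than once.

    A combination seen 3 times adds 3; a combination seen once adds 0.
    """
    key_counts: Dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        key_counts[rec["LOCALITY"]][key] += 1
    return {
        loc: sum(n for n in counts.values() if n > 1)
        for loc, counts in key_counts.items()
    }


# Invented sample records mirroring the "Jane Smith" example above.
jane = {"LOCALITY": "X", "LAST_NAME": "SMITH", "DOB": "1/1/1980",
        "ADDRESS": "12345 SOME ROAD, LN."}
other = {"LOCALITY": "X", "LAST_NAME": "JONES", "DOB": "2/2/1970",
         "ADDRESS": "1 MAIN ST"}
print(repeated_entries_by_locality([jane, jane, jane, other]))  # {'X': 3}
```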

Additionally, I performed an even more restrictive match that requires exact matches on the FIRST, MIDDLE and LAST name, DOB and ADDRESS fields; these results are in the second graphic and list presented below.

The first, more lenient, criterion will correctly flag multiple records for the same person whose first or middle name is misspelled, such as “Muhammad” vs “Mahammad”, but it could also flag voting-age twins who live together or spouses who share a DOB.

The second, stricter, criterion requires that all flagged rows have exactly the same spelling and punctuation in the FIRST, MIDDLE, LAST, DOB and ADDRESS fields. This produces fewer false positives but more false negatives, since it will likely miss common misspellings between entries.

I made no attempt to match common misspellings. I did perform a simple cleanup of multiple contiguous whitespace characters, etc., before attempting to match.
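The whitespace cleanup and the difference between the two criteria can be sketched as follows. The field names (FIRST_NAME, MIDDLE_NAME, LAST_NAME, DOB, ADDRESS) are assumed column labels and the sample records are invented. Consistent with the description above, the sketch only collapses whitespace; it applies no case-folding or misspelling tolerance.

```python
import re
from typing import Mapping, Sequence, Tuple

# Assumed column labels; a single ADDRESS column is a simplification.
LENIENT_FIELDS = ("LAST_NAME", "DOB", "ADDRESS")
STRICT_FIELDS = ("FIRST_NAME", "MIDDLE_NAME", "LAST_NAME", "DOB", "ADDRESS")


def normalize(value: str) -> str:
    """Collapse runs of whitespace to one space and trim; spelling and punctuation are untouched."""
    return re.sub(r"\s+", " ", value).strip()


def match_key(record: Mapping[str, str], fields: Sequence[str]) -> Tuple[str, ...]:
    """Build the exact-match key from the normalized values of the chosen fields."""
    return tuple(normalize(record[f]) for f in fields)


def same_entry(a: Mapping[str, str], b: Mapping[str, str], fields: Sequence[str]) -> bool:
    return match_key(a, fields) == match_key(b, fields)


a = {"FIRST_NAME": "MUHAMMAD", "MIDDLE_NAME": "", "LAST_NAME": "SMITH",
     "DOB": "1/1/1980", "ADDRESS": "12345 SOME ROAD,  LN."}
b = {"FIRST_NAME": "MAHAMMAD", "MIDDLE_NAME": "", "LAST_NAME": "SMITH",
     "DOB": "1/1/1980", "ADDRESS": " 12345 SOME ROAD, LN."}

print(same_entry(a, b, LENIENT_FIELDS))  # True: only whitespace differs in the lenient fields
print(same_entry(a, b, STRICT_FIELDS))   # False: the first-name spelling differs
```

The example shows the trade-off described above: the lenient key catches the misspelled first name, while the strict key treats the two rows as different people.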

I have summarized the data here so as not to reveal any personally identifiable information (PII) from the RVL in adherence to VA law.

Update 2022-07-13 12:30: I have sent the full information, for both the lenient and strict criteria queries, to the Prince William County and Loudoun County Registrars. The Loudoun deputy registrar has responded that all but one of the duplicates flagged by the stricter criterion had already been caught by the elections staff, but he has not yet looked at the entries in the more lenient criterion's results file. I have also attempted to contact the Henrico County, Lynchburg City, and York County registrars but have not yet received a response or a request to provide them with the full data.

Update 2022-07-31 23:03: I have also heard back from the PWC Registrar (Eric Olsen). Most of the entries I had flagged in the 11/6/2021 RVL had already been handled by PWC staff; only a couple had not yet been noticed or marked as duplicates. Also, per our discussion, I should reiterate and clarify that the titles on the plots below simply refer to duplicated entries in the data files according to the chosen filtering criterion. This is a technically accurate description and should not be read as asserting anything beyond the results of the matching operation.

Lenient criterion (exact match on LAST_NAME, DOB and ADDRESS)
Locality Name | Number of repeated entries
ACCOMACK COUNTY | 64
ALBEMARLE COUNTY | 311
ALEXANDRIA CITY | 225
ALLEGHANY COUNTY | 16
AMELIA COUNTY | 32
AMHERST COUNTY | 84
APPOMATTOX COUNTY | 18
ARLINGTON COUNTY | 446
AUGUSTA COUNTY | 119
BATH COUNTY | 2
BEDFORD COUNTY | 170
BLAND COUNTY | 4
BOTETOURT COUNTY | 45
BRISTOL CITY | 22
BRUNSWICK COUNTY | 30
BUCHANAN COUNTY | 30
BUCKINGHAM COUNTY | 32
BUENA VISTA CITY | 10
CAMPBELL COUNTY | 82
CAROLINE COUNTY | 67
CARROLL COUNTY | 42
CHARLES CITY COUNTY | 18
CHARLOTTE COUNTY | 28
CHARLOTTESVILLE CITY | 80
CHESAPEAKE CITY | 545
CHESTERFIELD COUNTY | 948
CLARKE COUNTY | 40
COLONIAL HEIGHTS CITY | 18
COVINGTON CITY | 2
CRAIG COUNTY | 4
CULPEPER COUNTY | 114
CUMBERLAND COUNTY | 8
DANVILLE CITY | 88
DICKENSON COUNTY | 22
DINWIDDIE COUNTY | 44
EMPORIA CITY | 6
ESSEX COUNTY | 25
FAIRFAX CITY | 38
FAIRFAX COUNTY | 2962
FALLS CHURCH CITY | 39
FAUQUIER COUNTY | 203
FLOYD COUNTY | 28
FLUVANNA COUNTY | 36
FRANKLIN CITY | 23
FRANKLIN COUNTY | 84
FREDERICK COUNTY | 210
FREDERICKSBURG CITY | 54
GALAX CITY | 0
GILES COUNTY | 24
GLOUCESTER COUNTY | 52
GOOCHLAND COUNTY | 84
GRAYSON COUNTY | 18
GREENE COUNTY | 32
GREENSVILLE COUNTY | 16
HALIFAX COUNTY | 48
HAMPTON CITY | 285
HANOVER COUNTY | 316
HARRISONBURG CITY | 40
HENRICO COUNTY | 676
HENRY COUNTY | 74
HIGHLAND COUNTY | 4
HOPEWELL CITY | 34
ISLE OF WIGHT COUNTY | 98
JAMES CITY COUNTY | 217
KING & QUEEN COUNTY | 13
KING GEORGE COUNTY | 42
KING WILLIAM COUNTY | 43
LANCASTER COUNTY | 10
LEE COUNTY | 24
LEXINGTON CITY | 12
LOUDOUN COUNTY | 1245
LOUISA COUNTY | 74
LUNENBURG COUNTY | 26
LYNCHBURG CITY | 165
MADISON COUNTY | 12
MANASSAS CITY | 64
MANASSAS PARK CITY | 24
MARTINSVILLE CITY | 14
MATHEWS COUNTY | 18
MECKLENBURG COUNTY | 54
MIDDLESEX COUNTY | 12
MONTGOMERY COUNTY | 159
NELSON COUNTY | 30
NEW KENT COUNTY | 26
NEWPORT NEWS CITY | 329
NORFOLK CITY | 411
NORTHAMPTON COUNTY | 18
NORTHUMBERLAND COUNTY | 22
NORTON CITY | 6
NOTTOWAY COUNTY | 12
ORANGE COUNTY | 70
PAGE COUNTY | 47
PATRICK COUNTY | 28
PETERSBURG CITY | 68
PITTSYLVANIA COUNTY | 84
POQUOSON CITY | 28
PORTSMOUTH CITY | 186
POWHATAN COUNTY | 55
PRINCE EDWARD COUNTY | 43
PRINCE GEORGE COUNTY | 77
PRINCE WILLIAM COUNTY | 1159
PULASKI COUNTY | 59
RADFORD CITY | 14
RAPPAHANNOCK COUNTY | 10
RICHMOND CITY | 300
RICHMOND COUNTY | 14
ROANOKE CITY | 133
ROANOKE COUNTY | 233
ROCKBRIDGE COUNTY | 28
ROCKINGHAM COUNTY | 113
RUSSELL COUNTY | 28
SALEM CITY | 58
SCOTT COUNTY | 18
SHENANDOAH COUNTY | 48
SMYTH COUNTY | 40
SOUTHAMPTON COUNTY | 28
SPOTSYLVANIA COUNTY | 345
STAFFORD COUNTY | 410
STAUNTON CITY | 14
SUFFOLK CITY | 194
SURRY COUNTY | 10
SUSSEX COUNTY | 14
TAZEWELL COUNTY | 52
VIRGINIA BEACH CITY | 922
WARREN COUNTY | 46
WASHINGTON COUNTY | 78
WAYNESBORO CITY | 26
WESTMORELAND COUNTY | 24
WILLIAMSBURG CITY | 22
WINCHESTER CITY | 42
WISE COUNTY | 40
WYTHE COUNTY | 35
YORK COUNTY | 178
Strict criterion (exact match on FIRST, MIDDLE and LAST name, DOB and ADDRESS)
Locality Name | Number of repeated entries
ACCOMACK COUNTY | 0
ALBEMARLE COUNTY | 4
ALEXANDRIA CITY | 0
ALLEGHANY COUNTY | 0
AMELIA COUNTY | 0
AMHERST COUNTY | 2
APPOMATTOX COUNTY | 0
ARLINGTON COUNTY | 10
AUGUSTA COUNTY | 0
BATH COUNTY | 0
BEDFORD COUNTY | 4
BLAND COUNTY | 0
BOTETOURT COUNTY | 0
BRISTOL CITY | 0
BRUNSWICK COUNTY | 0
BUCHANAN COUNTY | 2
BUCKINGHAM COUNTY | 0
BUENA VISTA CITY | 0
CAMPBELL COUNTY | 2
CAROLINE COUNTY | 0
CARROLL COUNTY | 0
CHARLES CITY COUNTY | 0
CHARLOTTE COUNTY | 0
CHARLOTTESVILLE CITY | 0
CHESAPEAKE CITY | 8
CHESTERFIELD COUNTY | 8
CLARKE COUNTY | 0
COLONIAL HEIGHTS CITY | 0
COVINGTON CITY | 0
CRAIG COUNTY | 0
CULPEPER COUNTY | 0
CUMBERLAND COUNTY | 0
DANVILLE CITY | 0
DICKENSON COUNTY | 0
DINWIDDIE COUNTY | 0
EMPORIA CITY | 0
ESSEX COUNTY | 0
FAIRFAX CITY | 0
FAIRFAX COUNTY | 54
FALLS CHURCH CITY | 0
FAUQUIER COUNTY | 2
FLOYD COUNTY | 0
FLUVANNA COUNTY | 0
FRANKLIN CITY | 3
FRANKLIN COUNTY | 0
FREDERICK COUNTY | 6
FREDERICKSBURG CITY | 0
GALAX CITY | 0
GILES COUNTY | 2
GLOUCESTER COUNTY | 0
GOOCHLAND COUNTY | 0
GRAYSON COUNTY | 0
GREENE COUNTY | 0
GREENSVILLE COUNTY | 0
HALIFAX COUNTY | 0
HAMPTON CITY | 8
HANOVER COUNTY | 0
HARRISONBURG CITY | 0
HENRICO COUNTY | 24
HENRY COUNTY | 2
HIGHLAND COUNTY | 0
HOPEWELL CITY | 2
ISLE OF WIGHT COUNTY | 4
JAMES CITY COUNTY | 0
KING & QUEEN COUNTY | 0
KING GEORGE COUNTY | 0
KING WILLIAM COUNTY | 0
LANCASTER COUNTY | 0
LEE COUNTY | 0
LEXINGTON CITY | 0
LOUDOUN COUNTY | 23
LOUISA COUNTY | 0
LUNENBURG COUNTY | 0
LYNCHBURG CITY | 16
MADISON COUNTY | 0
MANASSAS CITY | 0
MANASSAS PARK CITY | 0
MARTINSVILLE CITY | 0
MATHEWS COUNTY | 2
MECKLENBURG COUNTY | 0
MIDDLESEX COUNTY | 0
MONTGOMERY COUNTY | 2
NELSON COUNTY | 0
NEW KENT COUNTY | 0
NEWPORT NEWS CITY | 0
NORFOLK CITY | 0
NORTHAMPTON COUNTY | 0
NORTHUMBERLAND COUNTY | 0
NORTON CITY | 0
NOTTOWAY COUNTY | 0
ORANGE COUNTY | 0
PAGE COUNTY | 0
PATRICK COUNTY | 0
PETERSBURG CITY | 0
PITTSYLVANIA COUNTY | 4
POQUOSON CITY | 0
PORTSMOUTH CITY | 0
POWHATAN COUNTY | 0
PRINCE EDWARD COUNTY | 0
PRINCE GEORGE COUNTY | 0
PRINCE WILLIAM COUNTY | 8
PULASKI COUNTY | 0
RADFORD CITY | 0
RAPPAHANNOCK COUNTY | 0
RICHMOND CITY | 10
RICHMOND COUNTY | 0
ROANOKE CITY | 0
ROANOKE COUNTY | 11
ROCKBRIDGE COUNTY | 0
ROCKINGHAM COUNTY | 4
RUSSELL COUNTY | 0
SALEM CITY | 0
SCOTT COUNTY | 0
SHENANDOAH COUNTY | 0
SMYTH COUNTY | 0
SOUTHAMPTON COUNTY | 0
SPOTSYLVANIA COUNTY | 4
STAFFORD COUNTY | 4
STAUNTON CITY | 0
SUFFOLK CITY | 4
SURRY COUNTY | 0
SUSSEX COUNTY | 0
TAZEWELL COUNTY | 2
VIRGINIA BEACH CITY | 4
WARREN COUNTY | 0
WASHINGTON COUNTY | 0
WAYNESBORO CITY | 0
WESTMORELAND COUNTY | 2
WILLIAMSBURG CITY | 0
WINCHESTER CITY | 0
WISE COUNTY | 0
WYTHE COUNTY | 2
YORK COUNTY | 0

Distribution of VHL Entries for selected VA Localities

MAJOR CORRECTION (2022-07-17): The analysis below incorrectly computed the CSV totals for each of the 3 counties. I had an indexing error into my CSV file list and erroneously computed the 2020 totals from the CSVs. I have since updated and recomputed all of the VHL and CSV results, and have additionally added a check against the Turnout report numbers as reported by ELECT. These new results will appear in an upcoming blog post. While there are still discrepancies between the VHL and CSV, their magnitude is not as large as originally presented here. My apologies for the error. The corrected 2020 CSV totals are marked “(corrected)” below, each followed by my original estimate.

(Edited 2022-07-10 17:31 EST to add a better explanation of the VHL and CSV files to the first paragraph.)

Per Request – Prince William, Fairfax and Loudoun Counties

According to the CSV file hosted on the ELECT servers and downloaded on 11-30-2020, the number of votes cast in Prince William County in the 2020 Presidential race was 228,267 (corrected; my original, erroneous estimate was 137,874), which is significantly different from the number of voters who cast ballots (223,404) in 2020 according to the Voter History List (VHL). The 2021 CSV file, downloaded from ELECT on 12-11-2021, gives 153,218 votes compared to 152,166 voters from the VHL data. Why the discrepancies? Note that the VHL counts people, while the CSV counts votes cast in the races for President (2020) and Governor (2021), but the numbers should still be roughly similar between the two sources. Also, the CSV “Total Vote” field should include overvotes, write-ins, etc., so that is not the source of the discrepancies. Furthermore, the 11-06-2021 VHL data file is identical to the VHL data file downloaded on 12-14-2021, so there is no missing data in the VHL file for the 2021 election. Finally, note that by definition the Voter History List will undercount the number of voters who participated in the 2020 election, because voters removed from the registered voter list between the end of the 2020 election period and the date the file was downloaded (11-6-2021) will have had their corresponding records deleted from the VHL.

According to the CSV file hosted on the ELECT servers and downloaded on 11-30-2020, the number of votes cast in Fairfax County was 601,243 (corrected; my original, erroneous estimate was 487,232), which is significantly different from the number (589,282) generated from the VHL. The 2021 CSV file, downloaded from ELECT on 12-11-2021, gives 441,262 votes compared to 439,344 voters from the VHL data. Why the discrepancies?

According to the CSV file hosted on the ELECT servers and downloaded on 11-30-2020, the number of votes cast in Loudoun County was 224,976 (corrected; my original, erroneous estimate was 111,655), which is significantly different from the number (220,758) generated from the VHL. The 2021 CSV file, downloaded from ELECT on 12-11-2021, gives 161,793 votes compared to 161,449 voters from the VHL data. Why the discrepancies?
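To compare these figures on a common scale, the short sketch below computes each gap (CSV total minus VHL count) and expresses it as a percentage of the CSV total; using the CSV total as the denominator is an assumption made for illustration. All input numbers are copied from the three paragraphs above, using the corrected 2020 CSV totals.

```python
from typing import Tuple

# (county, year) -> (CSV total, VHL count), copied from the paragraphs above.
# The 2020 CSV totals are the corrected values.
REPORTED = {
    ("Prince William", 2020): (228_267, 223_404),
    ("Fairfax", 2020): (601_243, 589_282),
    ("Loudoun", 2020): (224_976, 220_758),
    ("Prince William", 2021): (153_218, 152_166),
    ("Fairfax", 2021): (441_262, 439_344),
    ("Loudoun", 2021): (161_793, 161_449),
}


def gap(csv_total: int, vhl_count: int) -> Tuple[int, float]:
    """Absolute gap (CSV minus VHL) and that gap as a percentage of the CSV total."""
    diff = csv_total - vhl_count
    return diff, 100.0 * diff / csv_total


for (county, year), (csv_total, vhl_count) in REPORTED.items():
    diff, pct = gap(csv_total, vhl_count)
    print(f"{county} {year}: CSV - VHL = {diff:,} ({pct:.2f}% of CSV total)")
```

With these numbers, each 2020 gap comes out at roughly 2% of the CSV total, and each 2021 gap at under 1%.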