Categories
Election Data Analysis Election Forensics Election Integrity Interesting programming technical

The 300K VA Ballots Outstanding and At-Risk Claim is Incorrect.

There’s a (IMO frivolous) lawsuit by Marc Elias (here) against the USPS that is causing people to think that there are > 300K outstanding mail-in ballots in VA. According to the VA Dept of Elections data, this is factually incorrect. There are not 300K outstanding absentee ballots floating around; there are < 100K as per the 11/01/2021 DAL file entries. (Light blue line, logarithmic y axis)

Of the 3 Localities listed in the lawsuit, from the 11/01/2021 DAL File:

| | ALBEMARLE COUNTY | JAMES CITY COUNTY | PORTSMOUTH CITY |
|---|---|---|---|
| Total Registered Voters | 81738 | 61961 | 66696 |
| Total Outstanding Mail In Ballots | 2545 | 1061 | 1151 |
| Outstanding / Registered * 100 | 3.1% | 1.7% | 1.7% |
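
The percentages for the three localities can be re-derived from the two counts listed for each one. The snippet below is only a small sketch of that arithmetic; the variable names are my own for illustration and are not fields from the DAL file.

```matlab
% Counts as listed for the three localities named in the lawsuit
localities  = {'ALBEMARLE COUNTY', 'JAMES CITY COUNTY', 'PORTSMOUTH CITY'};
registered  = [81738 61961 66696];   % total registered voters
outstanding = [2545 1061 1151];      % mail-in ballots still in "Issued" state

% Outstanding ballots as a percentage of registered voters
pctOutstanding = 100 * outstanding ./ registered;

for k = 1:numel(localities)
    fprintf('%-18s %5d of %5d = %.1f%%\n', ...
        localities{k}, outstanding(k), registered(k), pctOutstanding(k));
end
```

Rounded to one decimal place, these reproduce the percentages listed for each locality.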

The graph of the breakouts of these 3 localities over time, as the DAL files have been updated during the early voting period, is shown below. As we see increases in the number of mail-in ballots that are in a status of “Marked” or “Pre-Processed”, we see corresponding decreases in the outstanding “Issued” ballots in all 3 localities. We are NOT seeing a huge buildup of backlogged “Issued” ballots corresponding to an “October Slowdown” by the USPS, as claimed in the lawsuit. His lawsuit is bunk.


GALAX CITY “Marked +/or Pre-Processed” Vote Tally Decreases … why?

I have been working with a number of different groups on GOTV efforts for the 2021 VA Election, and as such I have been archiving and trending the Daily Absentee List (DAL) data files for a while now. One curious case that I’d love to get somebody at VA ELECT to explain is why the set of (“Marked” +/or “Pre-Processed”) ballots for GALAX City has suddenly been reduced to near zero.

As background, the DAL is effectively a transaction log of the status of all early and absentee ballots in VA. When a qualified voter requests a by-mail absentee ballot and it is sent to them, a DAL record is created in an “Issued” state. Once the ballot is received the record is updated to “Marked”, and once the ballot has been opened and scanned into a tabulator it is updated to “Pre-Processed”.

In the chart below we can see the number of issued ballots increasing, then decreasing as ballots are received and updated to “Marked” or “Pre-Processed”, but then suddenly (starting with the Oct 26th DAL files) we see the “MarkedOrPre” trace drop to near zero. The only line in this graph that I would ever expect to see decreasing is the number of outstanding “Issued” ballots. So … what’s going on here?
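
The expectation described above (the “MarkedOrPre” count should only ever grow from one DAL snapshot to the next) is easy to turn into an automated check. The following is a minimal sketch of that idea; the snapshot labels and counts are made-up placeholder values, NOT the real GALAX City numbers, and the variable names are hypothetical.

```matlab
% Hypothetical per-snapshot counts for one locality (NOT real DAL data)
snapshotDates = {'10/22', '10/23', '10/25', '10/26', '10/27'};
markedOrPre   = [310 335 352 4 6];

% Received ballots only move forward (Issued -> Marked -> Pre-Processed),
% so this count should never fall between consecutive snapshots.
change  = diff(markedOrPre);
dropIdx = find(change < 0) + 1;   % snapshots where the count fell

for k = dropIdx
    fprintf('%s: MarkedOrPre fell from %d to %d\n', ...
        snapshotDates{k}, markedOrPre(k-1), markedOrPre(k));
end
```

Running the same check over every locality in the archived DAL files would flag any other locality showing the same kind of drop.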


VA 2020 Election Analysis Report (v5)

Adding on to my previous report (old version is here).

  • Fixed a few typos and formatting issues.
  • Added some discussion about the fact that the VA DOE did correct the turnout statistic error I found, but did not make any sort of statement or explanation as to why the error was there in the first place, why it went undiscovered for nearly a year until I pointed it out, or what procedures and policies are in place to make sure errors like this don’t happen again.
  • Added a section documenting the discovery that a small number of public records have been retroactively adjusted and specific entries are now missing from the public archives regarding the registered voter totals.


Comparison of 2020 and 2021 Absentee Breakouts in VA

I got asked a question earlier this evening as to how the current 2021 VA election early vote/absentee data was shaping up in comparison to 2020, so I did some quick processing and plotting of the Daily Absentee List (DAL) from just after the 2020 election (11-09-2020), and the current DAL as of today (10-21-2021).

The graphs below plot the APP_STATUS=“Approved” entries in the DAL, broken out by BALLOT_STATUS and plotted vs their BALLOT_RECEIPT_DATE. One major difference is that we haven’t yet had any Federal Write-In Absentee Ballots (FWAB) entered into the DAL in 2021 that have a valid BALLOT_RECEIPT_DATE, and the 2020 FWAB counts had a very different general curve than the other ballots. We also had a little under 1000 ballots entered “On Machine” before early voting started in 2020. We see ballots issued earlier in the year for 2021, but no major “On Machine” counts. Note these graphs are logarithmic in the y-axis for easier viewing, and I had to discard 660 entries for 2020 and 11 entries for 2021 because the BALLOT_RECEIPT_DATE was invalid.
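
For anyone wanting to reproduce this kind of breakout, here is a rough sketch of the processing steps described above: keep the approved entries, discard those with an invalid receipt date, and count the rest by ballot status and date. The rows are made-up examples, and the mm/dd/yyyy date layout is an assumption on my part rather than a documented property of the DAL file.

```matlab
% Hypothetical DAL rows (NOT real data); names mirror the DAL fields above
appStatus    = {'Approved','Approved','Denied','Approved','Approved','Approved'};
ballotStatus = {'Marked','Marked','Marked','On Machine','Marked','FWAB'};
receiptDate  = {'10/01/2021','10/01/2021','10/02/2021', ...
                '10/05/2021','10/05/2021','not a date'};

% Keep approved entries whose receipt date looks like mm/dd/yyyy (assumed)
isApproved = strcmp(appStatus, 'Approved');
hasDate    = ~cellfun(@isempty, ...
    regexp(receiptDate, '^\d{1,2}/\d{1,2}/\d{4}$', 'once'));
keep       = isApproved & hasDate;
nDiscarded = sum(isApproved & ~hasDate);   % approved rows with a bad date

% Count entries per (ballot status, receipt day)
[statuses, ~, sIdx] = unique(ballotStatus(keep));
dayNum              = datenum(receiptDate(keep), 'mm/dd/yyyy');
[days, ~, dIdx]     = unique(dayNum);
counts = accumarray([sIdx(:) dIdx(:)], 1, [numel(statuses) numel(days)]);

% One row of counts per status; plot e.g. with semilogy(days, counts')
fprintf('Discarded %d approved entries with an invalid date\n', nDiscarded);
```

Each row of counts is one trace (one BALLOT_STATUS value) and each column is one receipt day.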


More Details on “Ideal” Fingerprint Computations

Per request by a reviewer of my most recent election irregularities report in VA (here), here’s a little more technical detail as to how the “ideal” model is computed in accordance with the original 2012 National Academy of Sciences paper that I based this work on.

The generalized summary in my report for VA reads as follows:

“The upper right image was computed per the NAS paper; the bottom left image shows what an idealized model of the data could or should look like, based on the reported voter turnout and vote share for the winner. This ideal model is allowed to have up to 3 Gaussian lobes based on the peak locations and standard deviations in the reported Virginia results.”

While that description is accurate, it glosses over some of the implementation, as I didn’t want the reader to go all glassy-eyed on me! A more explicit technical definition is as follows: all of the local maxima in the 2D histogram that are above pThresh (~= 0.7) x the value of the global maximum are used as the centroids of a Gaussian mixture model, with a shared diagonal covariance matrix equal to 1.5 x the element-wise square root of the covariance matrix of all of the data points, keeping only the diagonal terms. (That’s a lot of mathematics packed into one sentence, but it’s accurate!) In the case of the VA per-county, per-congressional-district data this gives us either 2 or 3 peaks, depending on the value that is used for pThresh. The value of 0.7 was chosen after observing results from the multiple states’ data that I have been doing fingerprint analysis on. The MATLAB imregionalmax(…) function from the Image Processing Toolbox is used to find the candidate local peaks, and the gmdistribution(…) function from the Statistics and Machine Learning Toolbox generates the final idealized model.

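% HBf is the 2D histogram image; rawData holds the underlying data points
pThresh = 0.7;                  % keep peaks >= pThresh x the global maximum
BW = imregionalmax(HBf);        % logical mask of the regional maxima
v = HBf(BW);                    % histogram values at those maxima
[r,c] = find(BW.*HBf >= max(v(:))*pThresh);  % peaks that pass the threshold
mu = [r,c];                     % one centroid (row, column) per kept peak
s = 1.5;                        % covariance scale factor

% Shared diagonal covariance: s x sqrt of the data covariance, diagonal only
cv = diag(diag(s*sqrt(cov(rawData))));
GMModel = gmdistribution(mu,cv);  % equal-weight Gaussian mixture model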

The end result of this is shown below (bottom left) with the Bayesian Information Criterion (BIC) and number of Gaussian components listed in the title of the bottom left “ideal” plot.
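
For readers without the two toolboxes, the same idea can be approximated in base MATLAB. The sketch below is not the code used for the report: it runs on a synthetic histogram, uses a strict 8-neighbour comparison in place of imregionalmax (so flat plateaus are handled differently), uses a placeholder covariance instead of one derived from real data, and evaluates the equal-weight mixture density by hand instead of calling gmdistribution.

```matlab
% Synthetic 100x100 histogram with three bumps (NOT real election data)
[C, R] = meshgrid(1:100, 1:100);
g = @(r0, c0, s) exp(-((R - r0).^2 + (C - c0).^2) / (2*s^2));
H = 1.0*g(30,40,5) + 0.8*g(60,55,5) + 0.3*g(80,20,5);

% Strict local maxima: larger than all 8 neighbours (border padded with -Inf)
P = -Inf(size(H) + 2);
P(2:end-1, 2:end-1) = H;
isPeak = true(size(H));
for dr = -1:1
    for dc = -1:1
        if dr == 0 && dc == 0, continue; end
        nb = P(2+dr:end-1+dr, 2+dc:end-1+dc);   % neighbour shifted by (dr,dc)
        isPeak = isPeak & (H > nb);
    end
end

% Keep only peaks at or above pThresh x the global maximum
pThresh = 0.7;
[r, c] = find(isPeak & (H >= pThresh * max(H(:))));
mu = [r, c];                      % centroids as (row, column) bin indices

% Equal-weight mixture with a shared diagonal covariance (placeholder values)
cv = diag([25 25]);
ideal = zeros(size(H));
for k = 1:size(mu, 1)
    ideal = ideal + exp(-0.5 * ((R - mu(k,1)).^2 / cv(1,1) + ...
                                (C - mu(k,2)).^2 / cv(2,2)));
end
ideal = ideal / (size(mu, 1) * 2*pi*sqrt(det(cv)));   % density sums to ~1
```

With pThresh = 0.7 only the two tallest bumps survive as centroids; lowering the threshold to 0.25 would admit the third one, which is the same 2-versus-3 peak sensitivity described above.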

Categories
Election Data Analysis Election Forensics Election Integrity Interesting technical

VA 2020 Election Analysis Report

Finally. This has been a long time in the making, but here is my summarized report on the most significant VA 2020 General Election irregularities that I’ve discovered. All of this information is presented in detail in previous posts on this site, which also catalog many other issues, but I’ve collected the major points here to try and make things easily digestible and accessible to those who are interested.

Special thanks to everyone that helped in putting this together: acquiring, deciphering and collating data, performing peer review, etc. I have tried to be as meticulous and as transparent as possible so that others can recreate my results at every stage if they wish to validate.

Note there is a newer updated version of this report (here).


NH 2020 Election Fingerprint

Background:

The US National Academy of Sciences (NAS) published a paper in 2012 titled “Statistical detection of systematic election irregularities.” [1]  The paper asked the question, “How can it be distinguished whether an election outcome represents the will of the people or the will of the counters?” The study reviewed the results from elections in Russia and other countries where widespread fraud was suspected. The study was published in the Proceedings of the National Academy of Sciences and is referenced in multiple election guides by USAID [2][3], among other citations.

The study authors’ thesis was that with a large sample of the voting data, they would be able to see whether or not voting patterns deviated from the voting patterns of elections where there was no fraud. The results of their study showed that there were indeed significant deviations from the expected, normal voting patterns in the elections where fraud was suspected.

Statistical results are often graphed, to provide a visual representation of how normal data should look. A particularly useful visual representation of election data is the election fingerprint. When used to analyze election data, the election fingerprint typically analyzes the votes for the winner versus voter turnout by voting district. The expected shape of the fingerprint is that of a 2D Gaussian (a.k.a. “Normal”) distribution [4].  (See this MIT News article for a great additional description and primer on the Gaussian or Normal distribution: https://news.mit.edu/2012/explained-sigma-0209)

Here is an example reprinted from the referenced National Academy of Sciences paper:

The actual election results in Russia, Uganda and Switzerland appear in the left column, the right column is the expected appearance in a fair election with little fraud, and the middle column is the researchers’ model with fraud included.

As you can see, the election in Switzerland shows a range of voter turnout, from approximately 30 – 70% across voting districts, and a similar range of votes for the winner.  

What do the clusters mean in the Russia 2011 and 2012 elections? Of particular concern are the top right corners, showing nearly 100% turnout of voters, and nearly 100% of them voted for the winner.

Both of those events (more than 90% of registered voters turning out to vote and more than 90% of the voters voting for the winner) are statistically improbable, even for very contested elections. Election results that show a strong linear streak away from the main fingerprint lobe indicate ‘ballot stuffing,’ where ballots are added at a specific rate. Voter turnout over 100% indicates ‘extreme fraud’. [1][5]

Election results with ‘outliers’ – results that fall outside of normal voting patterns – are not in and of themselves definitive proof of outright fraud. But additional reviews of voting patterns and election results should be conducted whenever deviations from normal patterns occur in an election.  Additionally it should be noted that “the absence of evidence is not the evidence of absence”:  Election Fingerprints that look otherwise normal might still have underlying issues that are just simply not readily apparent with this view of the data.

Using this study’s methodology, in late 2020 and 2021, multiple researchers in the US have applied the same analysis to the US 2020 election results, as well as the results of previous elections.

The US 2020 Election – New Hampshire:

Source Data:

  1. Registered Voter Data: https://sos.nh.gov/media/00lg4swb/names-on-checklist-general-2020.xlsx
  2. Total Regular and Absentee Ballots Cast: https://sos.nh.gov/media/yi4fonny/ballots-cast-2020-general.xls
  3. Vote Totals: https://sos.nh.gov/media/yjmp5qmd/president-2020.xls
  4. Write-In Totals: https://sos.nh.gov/media/wv3m4jne/presidential-write-ins-2020.pdf
  5. Records of Voter Rolls Pre-Election Day, On Election Day, and marked as Absentee.  (Note that due to personal privacy considerations, this raw dataset is not openly published and must be obtained via request. Summaries of this dataset are included in the “2020-NH-Combined-Data.csv” file included below.)

Election Fingerprint:

The upper right image in the following graphic is the computed election fingerprint, computed according to the NAS paper and using official state reported voter turnout and votes for the statewide winner. The color scale moves from precincts with low counts as deep blue, to precincts with high numbers represented as bright yellow. Note that a small blurring filter was applied to the computed image for ease of viewing small isolated histogram hits.

The bottom left image of the graphic shows what an “idealized” mixture-of-Gaussians model of the data could look like, based on the reported voter turnout and vote share for the winner.

The top-left and bottom-right plots show the sum of the rows and columns of the fingerprint image. The top-left graph corresponds to the sum of the rows in the upper right image and is the histogram of the vote share for Biden across precincts. The bottom right plot corresponds to the sum of the columns of the upper right image, and is the histogram of the % turnout across the precincts.
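
To make the relationship between the fingerprint image and the two side plots concrete, here is a small sketch of how such a 2D histogram and its row/column sums can be built. The five precincts are made-up numbers, the 1% bin width is an assumption, and vote share is taken here as winner votes divided by ballots cast; none of this is the exact code or data behind the graphic.

```matlab
% Hypothetical per-precinct totals (NOT the NH data)
registered  = [1000 1200  800 1500  900];
ballotsCast = [ 700  780  560 1050  540];
winnerVotes = [ 350  468  252  630  297];

turnoutPct = 100 * ballotsCast ./ registered;    % x axis of the fingerprint
sharePct   = 100 * winnerVotes ./ ballotsCast;   % y axis of the fingerprint

% 1%-wide bins (assumed); bin k covers [k-1, k) percent, clamped to bin 100
colIdx = min(floor(turnoutPct) + 1, 100);
rowIdx = min(floor(sharePct)   + 1, 100);

% Rows index vote share, columns index turnout
fingerprint = accumarray([rowIdx(:) colIdx(:)], 1, [100 100]);

% The two side plots are just the marginal sums of the image
shareHist   = sum(fingerprint, 2);   % sum across each row: vote share histogram
turnoutHist = sum(fingerprint, 1);   % sum down each column: turnout histogram
```

Note that this sketch clamps any precinct above 100% turnout into the top bin; in a real analysis those precincts should be flagged separately rather than hidden in the histogram.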

Observations/Conclusions:

  • There do not appear to be any distinct linear correlations, precincts with over 100% turnout, or other major red flags, even though there is some patterned noise.  The distribution is very large and diffuse, and has a definite skew, which is curious, but not necessarily indicative of a problem.
  • There are a small number of outlier precincts outside of the main distribution lobe, most notably the cluster along the 40% turnout line (Lempster, Newport & Claremont Ward 3), and two precincts above 90% turnout (Randolph & Ellsworth).
  • There are at least two major peaks in the main lobe, which is consistent with the theory of a split electorate.
  • The % Vote Share for Biden plot (Upper-Left) is “lop-sided” and shows a distinct skew in the data above the 40% Vote Share mark.  
  • Looking at the difference between the Total Reported Votes from Source 2 and the Total Votes count from official Source 3 shows 10,666 unaccounted-for votes.  The total number of Write-In votes from Source 4 was only 1158 and not nearly enough to account for this difference.
  • Looking at the difference between the registered voters from Source 5 and the registered voters from Source 1, there is a difference of 122,248 registrations.

References:

[1] “Statistical detection of systematic election irregularities” Peter Klimek, Yuri Yegorov, Rudolf Hanel, Stefan Thurner, Proceedings of the National Academy of Sciences Oct 2012, 109 (41) 16469-16473; DOI: 10.1073/pnas.1210722109 (https://www.pnas.org/content/109/41/16469)

[2] USAID: Assessing and Verifying Election Results: A Decision-Maker’s Guide to Parallel Vote Tabulation and Other Tools (http://web.archive.org/web/20201118021847/https://pdf.usaid.gov/pdf_docs/PA00KGWR.pdf)

[3] USAID: A Guide to Election Forensics (http://web.archive.org/web/20210501091306/https://pdf.usaid.gov/pdf_docs/PA00MXR7.pdf)

[4] Multivariate Normal Distribution – Wikipedia (https://en.wikipedia.org/wiki/Multivariate_normal_distribution)

[5] Mebane, Walter R. and Kalinin, Kirill, Comparative Election Fraud Detection (2009). APSA 2009 Toronto Meeting Paper, Available at SSRN: https://ssrn.com/abstract=1450078

Data Files:


Dallas, TX Election Fingerprint

Election Fingerprint for Dallas, TX polling places – All Vote Types:

Conclusions:

  • The results have a rotated and elongated orientation, and there appear to be a number of distinct “hot-spots” within the larger cluster.
  • Vote share near or over 100% is highly irregular and indicates a strong potential for fraud in any election. In the image above, there is what looks like a cluster centered around 50% turnout and spreading across multiple turnout bins of near 100% vote share for Biden.  Even in contentious elections, voter turnout over 90% is statistically unlikely, but not impossible.
  • Of the 868 precincts in this dataset, 87 are in the > 90% vote share for Biden band, 46 had > 93%, 15 had > 95%, and 2 had > 97%.
  • This fingerprint shows modest items of interest, but is inconclusive.
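
The band counts quoted in the list above are simple threshold tallies over the per-precinct vote share. A minimal sketch of that tally is shown here; the vote-share values are placeholders rather than the Dallas data, and the variable names are hypothetical.

```matlab
% Hypothetical per-precinct vote share for the winner, in percent
sharePct   = [88.2 91.5 93.4 95.1 97.6 72.0 90.0];
thresholds = [90 93 95 97];

% Number of precincts strictly above each threshold
nAbove = arrayfun(@(t) sum(sharePct > t), thresholds);

for k = 1:numel(thresholds)
    fprintf('> %d%% vote share: %d precincts\n', thresholds(k), nAbove(k));
end
```

Because each band is cumulative, a precinct above 97% is also counted in the 90%, 93% and 95% bands.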

Atlanta, Georgia Election Fingerprints

Hat tip to Ed Solomon for collating the data on this one. There’s a bunch more coming as Ed’s done the heavy lifting on a bunch of localities, and I’m working through them.

Election Fingerprint for Atlanta, GA polling places – All Vote Types:

Conclusions:

  • The results do not form a normal Gaussian distribution, and are therefore, by definition, an “irregular” distribution. The main lobe of the ‘fingerprint’ also has a diffuse linear streak up and to the left.  According to the authors of the NAS paper, election results that show a strong streaking away from the main lobe may indicate ‘ballot stuffing,’ where ballots are added (or subtracted) at a specific rate. The election fingerprint is in the form of a main lobe and streak, although the streak is not as pronounced as the NAS paper’s Russia results.  
  • Vote share near or over 100% is highly irregular and indicates a strong potential for fraud in any election. In the image above, there is a distinct and sharp line across multiple turnout bins of near 100% vote share for Biden.  Even in contentious elections, voter turnout over 90% is statistically unlikely, but not impossible.
  • Of the 1022 polling places in Atlanta, 252 are in the > 90% vote share for Biden band, 201 had > 93%, 84 had > 95%, and 8 had > 97%.
  • These findings indicate significant election irregularities that warrant additional scrutiny and investigation. It should be reiterated that the observed irregularities discussed above can serve as useful indicators and warnings of issues with an election, but do not constitute absolute proof of fraud on their own.

Well, isn’t that interesting …

Sometime after Nov 8 2020, USAID pulled two of their recent publications on election integrity from their website. Specifically, these are two publications that discuss election integrity forensics, including referencing the same National Academy of Sciences paper that describes the Election Fingerprinting process that I’ve been using as a basis for my work. Luckily … there’s the Wayback Machine … and I made copies.

“ASSESSING AND VERIFYING ELECTION RESULTS: A DECISION-MAKER’S GUIDE TO PARALLEL VOTE TABULATION AND OTHER TOOLS” (original URL: https://pdf.usaid.gov/pdf_docs/PA00KGWR.pdf)

https://digitalpollwatchers.org/wp-content/uploads/2021/08/PA00KGWR-Wayback-Snapshot-2020-Nov-8.pdf

“A GUIDE TO ELECTION FORENSICS” (original URL: https://pdf.usaid.gov/pdf_docs/PA00MXR7.pdf)

https://digitalpollwatchers.org/wp-content/uploads/2021/08/A-Guide-to-Elections-Forensics-USAID.pdf