Election Data Analysis Election Forensics Election Integrity Interesting programming technical

New WI 2020 Election Fingerprints

Previously (see here, and here) I built a set of Election Fingerprints, including WI, that were based on the County data I extracted from the NYT/Edison Research election tracker, which was the best data I had available at the time. While the results did have some interesting features, there weren’t really enough data points to make any sort of inference one way or the other. Well … this last weekend (07/30/21) I received a dataset with all of the vote counts and registered voters by Ward Grouping in the state, which is a much higher-fidelity dataset than I had been working with.

Update 8/4/21: Not 24 hours after posting, my data source for the WI data found more accurate voter registration numbers that better account for same-day registrations. (The registration data I used was dated as of 12/1/20.) I will be updating my analysis ASAP. The updated plots and analysis are posted here, but I’ll keep this post up for historic purposes.

The original image I created is replicated below for reference. As you can see, it’s a very blocky histogram created from the County-level data points. There is a main lobe at about 90% turnout and 40% for Biden, with some kurtotic streaking up and to the left, along with another grouping of Counties. While those features are indeed irregular and interesting, the coarse nature of the original per-County data made it plausible that the shape was just due to sampling issues, so I didn’t flag WI as one of my issue states.

The dataset I received this weekend, however, is a drastic improvement in fidelity. It’s not without its data quality issues: a good bit of data normalization was needed to get the vote tally data per ward group and the voter registration data per ward to marry up. Besides some different spellings, abbreviations and shorthand that needed to be reconciled between the datasets, the registration data labeled any municipality that straddles county lines in a format similar to “MULTIPLE COUNTIES, CITY OF NOWHERESVILLE, WARD 5A”, along with a registered voter count. The vote tally data grouped multiple wards together, as in “SOMEWHERE COUNTY, CITY OF NOWHERESVILLE, WARDS 1-3,4,5A,6C,13-15”, along with a set of vote totals for Biden, Trump and TotalVotesCast. So I had to go through the files by hand and group each file’s corresponding data into the smallest groupings possible. There also seem to be a few (not many, though) Wards that are not accounted for in the vote count results, and others that are missing from the registration data. That being said, we can see much more detail in the generated statewide fingerprint (below), which also matches the “low-res” version that I had done previously. (Hooray … my results are consistent!)
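For anyone trying to reproduce the ward matching, here is a minimal Python sketch of one way to automate it, assuming labels follow the “COUNTY, MUNICIPALITY, WARD(S) list” pattern quoted above. The helper names (`expand_wards`, `split_label`, `group_registration`) are hypothetical and this is not the process I actually used (which was largely by hand). Keying registration on (municipality, ward) instead of county is just one way to let the “MULTIPLE COUNTIES” entries line up.

```python
import re


def expand_wards(spec: str) -> list[str]:
    """Expand a ward list like '1-3,4,5A,6C,13-15' into individual ward IDs.

    Assumes numeric ranges are written 'N-M' and that lettered sub-wards
    (e.g. '5A') only ever appear individually, never inside a range.
    """
    wards: list[str] = []
    for part in spec.split(","):
        part = part.strip()
        m = re.fullmatch(r"(\d+)-(\d+)", part)
        if m:
            lo, hi = int(m.group(1)), int(m.group(2))
            wards.extend(str(n) for n in range(lo, hi + 1))
        elif part:
            wards.append(part)
    return wards


def split_label(label: str) -> tuple[str, str, list[str]]:
    """Split 'COUNTY, MUNICIPALITY, WARD(S) spec' into county, municipality, ward IDs."""
    county, municipality, ward_part = (s.strip() for s in label.split(",", 2))
    spec = re.sub(r"^WARDS?\s+", "", ward_part)
    return county, municipality, expand_wards(spec)


def group_registration(reg: dict[tuple[str, str], int], municipality: str,
                       wards: list[str]) -> tuple[int, list[str]]:
    """Sum registered voters over a ward group and list wards with no registration entry."""
    total = sum(reg.get((municipality, w), 0) for w in wards)
    missing = [w for w in wards if (municipality, w) not in reg]
    return total, missing
```

Returning the missing wards alongside the total makes the “Wards not accounted for” cases visible instead of silently counting them as zero.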

We see what looks like two (or even three) superimposed lobes around the 90% turnout and 40% for Biden bin, which is consistent with my earlier low-res version, and we have a lot more detail on the upward kurtotic smear off the main distribution. There are also a few Ward Groupings with an exceptionally high % for Biden (rarely, if ever, does a candidate receive near 100% vote share), and there are Ward Groupings near or over the 100% turnout marker, some of which are ALSO either 100% Biden or 0% Biden (all big red flags)!
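Pulling those red-flag Ward Groupings out of the data can be sketched as below. This assumes each Ward Grouping carries a registered voter count, TotalVotesCast and Biden’s vote count; the `WardGroup` record and the default 100% turnout cutoff are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass
class WardGroup:
    name: str
    registered: int  # registered voters
    total: int       # TotalVotesCast
    biden: int       # votes for Biden


def red_flags(g: WardGroup, turnout_limit: float = 100.0) -> list[str]:
    """Return human-readable flags for one Ward Grouping (empty list if none)."""
    if g.registered <= 0 or g.total <= 0:
        return ["missing registration or vote data"]
    flags = []
    turnout = 100.0 * g.total / g.registered
    if turnout >= turnout_limit:
        flags.append(f"turnout {turnout:.1f}%")
    # Compare integers directly so exact 0% / 100% shares are not lost to rounding
    if g.biden == 0:
        flags.append("Biden share 0%")
    elif g.biden == g.total:
        flags.append("Biden share 100%")
    return flags
```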

As discussed in the National Academy of Sciences paper that all of this work is based on, an idealized Election Fingerprint should look like a 2D Gaussian (i.e., a multivariate normal distribution), or maybe, in an extremely divided populace, two overlapping Gaussian distributions. Deviations from this are, by definition, irregular and an indication that there might be an issue with the election. Distinct linear or non-linear features sloping away from the central lobe(s) could indicate voter manipulation or “ballot stuffing”; isolated regions at high turnout and/or very high vote percentage for one candidate could be an indication of extreme manipulation or vote substitution. Now, it’s true that no real Election Fingerprint will look perfectly Gaussian due to the realities of how elections operate, data errors, etc. … but they should at least be close!
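For readers new to these plots, the fingerprint itself is a 2D histogram of turnout vs. vote share over voting units (here, Ward Groupings). A minimal sketch, assuming one unweighted count per unit and 5% bins; both are arbitrary choices for illustration and not necessarily what my plots use.

```python
from collections import Counter


def fingerprint(units, bin_width: float = 5.0) -> Counter:
    """2D histogram of (turnout %, candidate vote share %) over voting units.

    units: iterable of (registered, total_votes, candidate_votes) tuples.
    Keys are (turnout_bin, share_bin) indices; bin i covers [i*w, (i+1)*w).
    Units with no registered voters or no votes are skipped.
    """
    hist: Counter = Counter()
    for registered, total, cand in units:
        if registered <= 0 or total <= 0:
            continue
        turnout = 100.0 * total / registered
        share = 100.0 * cand / total
        hist[(int(turnout // bin_width), int(share // bin_width))] += 1
    return hist
```

A well-behaved election piles most of the counts into one (or two) compact clusters of neighboring bins; counts in bins at or past 100% turnout, or in the 0% and 100% share columns, are the outliers described above.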

So that moves the needle for me on whether to consider WI a problematic state based on Election Fingerprint generation. I would definitely say now that there are concrete signs of election irregularity in WI that deserve to be rigorously investigated. I can’t tell you if the shape of the above plot is due to incompetence or outright fraud … but there is definitely something wrong!

Per County Data

Since I now have all of this rich data to play with … it’d be a shame to only do things statewide! I also produced fingerprints for every single county. One note here: there is one plot for the group of “MULTIPLE COUNTIES” mentioned above, due to the way the source registration data was organized and labeled. Some of the counties actually look pretty good, with nice, localized, near-Gaussian distributions. Some are just an absolute mess!

My list of “interesting” counties:


Of that list, I’d label the following as “VERY interesting”:


The Election Fingerprints of all of the counties (plus the “MULTIPLE COUNTIES” super-group) are listed below:

Here is the link to the consolidated dataset used to generate all of the graphics on this page.


(Another) Interesting View of VA 2020 Election Data

In answering some questions and messages I’ve been getting about my election analysis work, I took a minute and simply made a log-log plot of the absolute vote counts for Biden vs. Trump for all precincts in the state. Real physical precincts are colored blue. Central absentee counting precincts are colored red. The real precincts look about like I would expect; however, the central absentee precincts fall fairly neatly along a straight line in log-log space, which is curious to me.
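One way to put a number on “fall fairly neatly on the line” is to fit a straight line in log10 space and compare the RMS residual of the central absentee precincts against that of the physical precincts. A sketch, assuming each precinct is a (Trump votes, Biden votes) pair; precincts with a zero count are dropped because their logarithm is undefined. The function name is hypothetical.

```python
import math


def loglog_fit(pairs):
    """Least-squares fit of log10(y) = a + b*log10(x).

    pairs: iterable of (x, y) vote counts; pairs with a zero or negative count are dropped.
    Returns (slope b, intercept a, RMS residual in log10 units).
    """
    pts = [(math.log10(x), math.log10(y)) for x, y in pairs if x > 0 and y > 0]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two precincts with nonzero counts")
    mx = sum(x for x, _ in pts) / n
    my = sum(y for _, y in pts) / n
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    b = sxy / sxx
    a = my - b * mx
    rms = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in pts) / n)
    return b, a, rms
```

A slope near 1 means Biden’s count is roughly a fixed multiple of Trump’s (10**a) across precincts of very different sizes; a small RMS residual means the points hug that line tightly.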

What do I think it means? Not quite sure yet as I’m still noodling things over … but it is very interesting, to say the least.

Election Data Analysis Election Forensics Election Integrity Ginsu Science Interesting programming technical Uncategorized

I have been censored by Big Tech for doing independent election research and analysis

Facebook has put me in “FB Jail” until Feb 7 for, I can only guess, my posting links to my independent analysis of election data and results. I’m assuming a Twitter ban will be coming shortly.

If the totalitarian fact checkers at Facebook think the countless hours I have spent collating and documenting the election irregularities of the 2020 election are incorrect, I invite them to please show me where I got the math wrong. I’ve shown my work and provided my sources and methodology. I’ve even made sure not to erase my errors and flaws as I found them, but to acknowledge them and update my posts with corrections as I did the work in real time. I’ll happily debate and defend my analysis and the theory behind it, and if you can show me where I’m wrong or have misrepresented anything, I will gladly correct it. This censorship by big tech is cowardly, totalitarian and demonstrably un-American.

From Thomas Paine: ‘He that would make his own liberty secure must guard even his enemy from oppression; for if he violates this duty he establishes a precedent that will reach to himself.’

The solution to bad ideas, or ideas that you do not agree with, is not to try and silence the speaker, but to combat them with better ideas.


A Midnight Spike in VA, and generation of Election Fingerprints over time.

BLUF: Discovered a 55,196-vote spike in Alexandria City absentee ballots with 0 (… as in … zilch, zip, nada) for Trump or Jorgensen sometime between 11:30pm 11/3/2020 and 01:00am 11/4/2020. Generated a time series of VA per-county election fingerprints and cumulative vote counts. Data sources: VA Dept of Elections voter registration data by locality file; NYT Edison time series datafeed snapshots from the Wayback Machine.


So I’m still puzzled by the shape of the per county election fingerprint in VA. Multiple scientists I’ve talked to and worked with on this are all scratching their heads as to what would explain the significant structures observed. As a reminder, the election fingerprint “should” look like one (maaaaybe two for a really split electorate) Gaussian lobes, without a lot of smearing, linear features or structures. They will of course not be perfect Gaussians, but we would expect them to be somewhat close. I go into the theory and details of generating these fingerprints here, and the VA per county data is replicated below as well. It does not look Gaussian at all and is by definition an “irregularity” in the election data.

For reference, I show the MN plot below as well, which still shows some slight irregularities but for the most part looks very clean. It’s got a little bit of kurtotic energy in the tails, but otherwise it looks like a pretty well-defined 2D Gaussian main lobe. It could even have two distribution centers: one for a large number of the smaller (presumably suburban + rural) localities that didn’t go for Biden, and one for far fewer (but more populous) larger localities that went just slightly above 50% for Biden. Both distribution centers had about 90% turnout. The MN data makes sense to me, and to other data scientists that I’ve shown it to, and it doesn’t set off any blaring alarm bells. The VA data, however, remains inexplicable.

So, in trying to decipher what happened in VA, I was interested in looking at how the election fingerprints for VA evolved over time. Does the whole structure move fairly uniformly from left to right as turnout increases? Does it look Gaussian at any point in the counting process and then shift? To do that I need the per-county vote tallies over time, and unfortunately I didn’t anticipate needing to do analysis like this before the election, so I didn’t set up anything to capture the updates to the VA Dept of Elections site or the NYT Edison datafeed. I tried an after-the-fact FOIA request, but it was denied by the VA Dept of Elections, stating that they do not keep those records.

However … thankfully, the Wayback Machine does have a few snapshots of the VA datafeed. Not nearly as many as I would need to make a full sequence showing how the fingerprint takes shape, but it’s better than nothing. And while we’re at it, let’s keep our eyes out for individual counties that have large vote swings and also have statistically improbable results for any candidate (> 95%).
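The outlier check described above can be sketched as a comparison of cumulative per-county totals between two consecutive snapshots. The snapshot layout, the function name and the 10,000-vote minimum batch size are assumptions for illustration; the 95% cutoff is the one stated in the text.

```python
def flag_updates(prev, curr, min_votes=10_000, max_share=95.0):
    """Flag large, one-sided county updates between two snapshots.

    prev, curr: {county: {candidate: cumulative votes}}.
    Returns (county, batch size, leading candidate, leader's % of batch) for
    counties whose net new votes are >= min_votes and whose leading candidate
    took more than max_share percent of them.
    """
    flagged = []
    for county, now in curr.items():
        before = prev.get(county, {})
        delta = {cand: votes - before.get(cand, 0) for cand, votes in now.items()}
        batch = sum(delta.values())
        if batch < min_votes:
            continue
        leader = max(delta, key=delta.get)
        share = 100.0 * delta[leader] / batch
        if share > max_share:
            flagged.append((county, batch, leader, round(share, 2)))
    return flagged
```

Because the snapshots are sparse, each delta may bundle several batch updates together, so a flag here is a lower bound on how lopsided the underlying batches could have been.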

The earliest data file I could find on the Wayback Machine was from 2020-11-04T04:06:57.160Z and shows results through 11:30pm on the 3rd. While that’s not ideal, it at least starts us off while Trump was still ahead.


We can already see our “boomerang” structure fairly well formed in this initial plot, and I didn’t find any individual counties that had a large update with over 95% Biden. (Now that doesn’t mean such updates didn’t happen before this snapshot, but since my snapshot sampling rate is so low, each update includes more than one set of batch updates, and specific outlier batches might be just getting rolled into the sums.)


We can pretty clearly see that just after the last snapshot, right around midnight, there’s obviously some sort of issue that happened with the data, with a couple of really large bumps for Biden that end up getting reversed and reverted. I have no idea if there’s any specific event that this can be correlated to in news reporting, if this is a glitch in the NYT feed, etc. But it looks like this event occurred between the updates to the NYT datafeed that I could find on the wayback machine. So whatever it is that happened, it got baked into the cake already as far as this datafeed snapshot is concerned. (If anyone has the files for these timestamps, please share!)

In addition to that we see that we’ve had a noticeable shift in our fingerprint as it looks like more low-turnout areas have been shifted into the boomerang.

What’s really interesting to me is that one locality (Alexandria) had a large change of 55,196 Biden votes and ZERO Trump or Jorgensen votes in the nearly hour and a half since the previous data point. From the data, it looks like these were all absentee votes, which we admittedly expect to break more heavily for Biden … but 100% of such a large sample size … c’mon man! According to the VA Dept of Elections, the absentee vote in Alexandria broke 84.78% (55,940 / 65,985) for Biden and 13.57% (8,951 / 65,985) for Trump overall. Now, we know the underlying distribution is not I.I.D., so we should not expect exactly 13.57% of this batch to have gone for Trump, but we would still expect at the very least a small handful of this batch to have been for Trump. Additionally, this batch makes up 83.65% (55,196 / 65,985) of all absentee votes in Alexandria and 98.67% (55,196 / 55,940) of Biden’s absentee total, meaning Trump must have received 8,951 / (65,985 – 55,196) = 82.96% of ALL the remaining absentee votes in Alexandria, which also seems pretty unlikely.
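The percentages above can be reproduced from the three VA Dept of Elections absentee figures and the batch size. A quick check (the function name is just for illustration):

```python
def alexandria_check(biden_ab=55_940, trump_ab=8_951, all_ab=65_985, batch=55_196):
    """Percentages quoted in the text, from VA Dept of Elections absentee totals."""
    remaining = all_ab - batch  # absentee votes outside the Biden-only batch
    return {
        "biden_share": round(100 * biden_ab / all_ab, 2),
        "trump_share": round(100 * trump_ab / all_ab, 2),
        "batch_of_all_absentee": round(100 * batch / all_ab, 2),
        "batch_of_biden_absentee": round(100 * batch / biden_ab, 2),
        "trump_share_of_remaining": round(100 * trump_ab / remaining, 2),
    }


for name, pct in alexandria_check().items():
    print(f"{name}: {pct}%")
```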

2020-11-05T02:45:04.745Z and Beyond

After the previous batch there aren’t any more large outliers at least that I caught. But I only have a few snapshots from the wayback machine to work with. We see the remaining straggler counties start to shift to underneath the “boomerang”. If there was a point in time where the data was looking like a 2D Gaussian, it was before the first snapshot that I was able to find and get my hands on.

Complete Galleries Below:


Excess Absentee Votes in VA

See also my previous election fingerprint blog posts here, and here. There is another, related, discrepancy in the Daily Absentee List data that I document here.

Originally Posted 2020-12-01 10:58:00 (Multiple Updates)

After computing the VA election fingerprints which clearly indicate that more investigation is required into the 2020 VA vote data, I took a look at the official VA daily absentee ballot count file and compared it with the JSON reports from VA dept of elections.  The JSON data provides a summary total of the votes recorded at each virtual absentee precinct, and the daily absentee list (here) gives all of the absentee ballot registrations for each precinct. For each (virtual) absentee precinct in a locality I summed all of the daily absentee numbers from component real precincts in order to compare those numbers with the reported total absentee votes. If everything has been tallied and recorded correctly they should be equal, or at least close.

I then compute the “excess absentee vote” for each virtual absentee precinct as the total absentee votes reported for that precinct minus the cumulative (‘Marked’ + ‘Pre-Processed’ + ‘OnMachine’ + ‘FWAB’) count for that absentee precinct.

I finally plotted the “excess absentee” vote totals vs the % of the absentee precinct vote that went for Biden.  Ideally we would like to see excess absentee vote == 0, meaning that the daily absentee reports and the number of absentee ballots that were counted in each virtual precinct match perfectly.  (I’ll give you a hint … they don’t.)
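In code, the computation described above boils down to counting the DAL entries in a countable BALLOT_STATUS for each absentee precinct and subtracting that count from the reported total. A minimal sketch, assuming the DAL rows have already been mapped to their virtual absentee precinct (that mapping step is not shown) and using my best-guess set of countable statuses, all four of which are added together. Function and variable names are hypothetical.

```python
from collections import Counter

# BALLOT_STATUS values treated as counted votes (best guess at which
# statuses correspond to valid ballots); all four are ADDED together.
COUNTED_STATUSES = {"Marked", "Pre-Processed", "On Machine", "FWAB"}


def excess_absentee(dal_rows, reported_totals):
    """Excess absentee vote per virtual absentee precinct.

    dal_rows: iterable of (absentee_precinct, BALLOT_STATUS) pairs, already
              mapped from each voter's physical precinct.
    reported_totals: {absentee_precinct: totalVotes from the JSON feed}.
    Returns {absentee_precinct: reported total minus counted DAL entries}.
    """
    counted = Counter(p for p, status in dal_rows if status in COUNTED_STATUSES)
    return {p: total - counted.get(p, 0) for p, total in reported_totals.items()}
```

A perfect match gives 0 for every precinct; summing the returned values gives the statewide excess figure.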

Outstanding Question: Is there a description document someplace where the BALLOT_STATUS (‘Marked’, etc) meanings are described.  I’m using my best guess as to which categories map to valid ballots, and I’d like to double check that.

Each dot in the plot above represents one of the virtual absentee precincts across the commonwealth. Unsurprisingly, we can clearly see that as the number of excess absentee votes increases, more of the absentee vote totals go to Biden. Summing over all of the excess votes, there were 1,334,968 excess absentee votes across VA in the 2020 election using this method. Where did these 1,334,968 votes come from? (See latest updates below)

All source data comes directly from the VA Dept of Elections. My tabulated results are posted below. I welcome any / all peer review and will gladly make my code and compiled datasets available.

Note: Removed incorrect files ... see updates below

Update 2020-12-01 19:30:00

In discussions with The Virginia Project and others who have been churning and burning through this data, a question came up as to whether or not the Daily Absentee Tally dataset (which is available to campaigns by request from the Dept of Elections, and which is what I used above) includes the In-Person early vote. My understanding is that the early vote was treated the same as No Excuse Absentee, so early vote numbers *should* be included in the Daily Absentee Tally dataset. But it looks like that might not be true. The JSON files published by the VA Dept of Elections only give the sum totals per candidate per precinct (including absentee precincts), and I used the Daily Absentee Tally dataset to map how many absentee (and early) votes should correspond to each precinct.

However, there is another dataset that I was pointed to that could be a useful comparison and a way to compute the missing numbers: the Summary Absentee Dataset located here. This dataset DOES contain the In-Person early vote absentee totals (‘ApplicationType’ == ‘In Person’), but it is only generated per absentee counting precinct and not mapped back to the voter’s physical precinct. It does, however, break down the Absentee votes per age group, gender, and type of absentee ballot … which is nice … but not what I’m looking for at the moment.

So, we have two datasets, both summaries of the absentee vote numbers with different breakdowns, that *should* sum to give the same totals. Surprise … they don’t. Why? Unknown. The ‘In Person’ vote not being included in the Daily tally does not reconcile the numbers between the two datasets, though.

The plot I generated above, redone with this other dataset is shown below. I’ve included new csv files with the additional data (I’ve labeled as “_V2”). The excess votes per absentee district are similarly computed as the totalVote reported by each absentee precinct (from the JSON files) minus the sum count of all of the Absentee/Mail-In numbers from the Summary Absentee Dataset for that absentee precinct. This new plot does not show positive excess votes, but instead shows negative excess votes, with higher negative excess votes also showing higher Biden total vote percentage. I don’t know which one is correct, but both seem to show high positive / negative excess vote numbers that align with higher Biden percent of the vote. Summing over all of the negative excess votes gives -121,049 votes that are unaccounted for.

One additional question I have is how to tease out the rejection rate of Absentee/Mail-In votes, which might be a contributing factor in both plots. Are the numbers in the Daily or Summary absentee stats files before or after rejection? So far I have not been able to find a dataset that captures the rejection rates for Absentee/Mail-In ballots.

Note: Removed incorrect files ... see updates below

Update 2020-12-04 02:00:00

So I’ve been able to confirm that the Daily Absentee List DOES, in fact, contain the In-Person “early” votes. I did this by finding my name and address in the list as I voted early on Oct 26th. The early votes are the ones marked as “On Machine”. I will also note that the date reported for my early vote was incorrectly set to Oct 28th (I voted on the 26th), which is why I had difficulty finding it at first.

So this means that:

  1. My first plot above *should* be correct in its computation of the “excess absentee vote”. Save for a fat finger bug in my code (I’m in the process of double-checking, btw).
  2. The Summary Absentee count does not match the totals from the Daily Absentee List. Why?
  3. If the Daily list is correct, and the excess vote is computed correctly above in the first plot, then how to account for the 1,334,968 excess votes?

Update 2020-12-06 00:05:00

I have revised my previously computed excess vote number of 185,713 to 1,334,968 due to finding some “fat finger” errors, and idiosyncrasies with the DAL file.

Yes, I know … that’s a really big number! I’m going back over my code again to see where / if I screwed something up. I will continue to update if I find anything else.

I’ve also taken the time to clean up the plots, both with and without annotations as to which precincts were the most egregious offenders.

Update 2020-12-10 23:12:00

Happy to report that I found a logical bug in my code. I was subtracting PreProcessed ballots when I should have been adding them when computing the excess vote with the DAL data (D’Oh!). It’s a simple bug, but it produces a big difference. The result is excess vote numbers of a much smaller order of magnitude, and a much more believable excess vote tally.
This line of code:

>> abCntVotes(j) = tdata.marked(j) + tdata.onMachine(j) - tdata.preProcessed(j) + tdata.fwab(j) ;

Should have read:  

>> abCntVotes(j) = tdata.marked(j) + tdata.onMachine(j) + tdata.preProcessed(j) + tdata.fwab(j) ;

That’s the good news.

The first bit of bad news is that the Summary Absentee List (SAL) still doesn’t make much sense, as discussed in my 2020-12-01 update above. The other bad news is that two absentee precincts still stick out like a sore thumb in the DAL-derived excess vote numbers. Guess which ones they are … PRINCE WILLIAM COUNTY (11) and PRINCE WILLIAM COUNTY (1)! PWC districts 1 and 11 are both waaay outside the standard deviation on the plot below. Another curious fact is that the deviation in the Biden precincts (blue) looks markedly higher than the deviation in the Trump precincts (red).

Now if we look at the summary totals of the PWC absentee precincts, we see that the sum of the negative excess absentee counts of districts 1 and 10 (-1304 + -317 = -1621) is almost a perfect complement of the district 11 excess count (1622, far right column). That’s also curious. By itself I’d call that just a coincidence, but combined with the fact that PWC 11 and 01 are so far outside the general trends of all other precincts in the plot above, it gets my worry beads out.

Per discussion with PWC staff, there was apparently an error in reporting election data that got caught and corrected where all absentee votes were being sent to the district 01 absentee precinct, so this might be an artifact of that issue.

Additionally, there is something we see in all of the precinct data and in the JSON data itself. The JSON data files directly report the Trump, Jorgensen and Biden vote totals, as well as the ‘totalVotes’ numbers. The ‘sumVotes’ column below is the sum of the Trump, Jorgensen and Biden votes, which should equal the ‘totalVotes’ column, but it doesn’t. It’s not usually a dramatic difference, but it’s a difference all the same.

Again, per discussion with PWC staff, the ‘totalVotes’ numbers reported by the Dept. of Elections JSON data feed includes items such as overvotes / undervotes / unmarked ballots, which count for turnout reasons, but don’t get attributed to a specific candidate.

localityStr | precinctStr | Nreg | Donald J. Trump | Jo Jorgensen | Joseph R. Biden | sumVotes | totalVotes | issued | marked | onMachine | cancelled | deleted | fwab | late | notIssued | preProcessed | provisional | unmarked | absenteeVotesSum | excessAbsenteeVotes
PRINCE WILLIAM COUNTY | # AB – Central Absentee Precinct (01) | 115475 | 22720 | 781 | 46939 | 70440 | 70721 | 2590 | 3371 | 47143 | 0 | 6106 | 62 | 0 | 3 | 21449 | 0 | 91 | 72025 | -1304
PRINCE WILLIAM COUNTY | # AB – Central Absentee Precinct (10) | 45383 | 10588 | 443 | 21371 | 32402 | 32525 | 968 | 1443 | 22543 | 0 | 2954 | 14 | 0 | 0 | 8842 | 1 | 42 | 32842 | -317
PRINCE WILLIAM COUNTY | # AB – Central Absentee Precinct (11) | 106455 | 19579 | 782 | 46040 | 66401 | 66683 | 3303 | 3228 | 40443 | 0 | 5626 | 52 | 0 | 0 | 21338 | 0 | 111 | 65061 | 1622
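As a sanity check on the arithmetic, the derived columns can be recomputed from the values in the PWC table above. A minimal sketch (values as read from the table; `unattributed` is a hypothetical name for totalVotes minus sumVotes, the portion PWC staff attribute to overvotes, undervotes and unmarked ballots):

```python
# Values as read from the PWC table above
pwc = {
    "01": {"trump": 22720, "jorgensen": 781, "biden": 46939,
           "totalVotes": 70721, "absenteeVotesSum": 72025},
    "10": {"trump": 10588, "jorgensen": 443, "biden": 21371,
           "totalVotes": 32525, "absenteeVotesSum": 32842},
    "11": {"trump": 19579, "jorgensen": 782, "biden": 46040,
           "totalVotes": 66683, "absenteeVotesSum": 65061},
}


def derived(row):
    """Recompute sumVotes, the unattributed remainder, and excessAbsenteeVotes."""
    sum_votes = row["trump"] + row["jorgensen"] + row["biden"]
    return {
        "sumVotes": sum_votes,
        "unattributed": row["totalVotes"] - sum_votes,
        "excess": row["totalVotes"] - row["absenteeVotesSum"],
    }


results = {name: derived(row) for name, row in pwc.items()}
net = sum(r["excess"] for r in results.values())  # -1304 - 317 + 1622
print(results)
print("net excess across the three central absentee precincts:", net)
```

The net of 1 across the three precincts is the near-perfect complement noted above.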

So, taking all of the above into account and assuming that PWC 11 and 01 are artifacts of a data entry error, using the DAL and the JSON vote tallies we see a good bit, but not extreme amounts, of variation in the “excess vote”, and the deviations in excess vote seem greater in blue precincts than in red precincts.

The SAL data still does not agree with the JSON or the DAL data files, and I’m still working to figure out why.

My tabulated results are posted below. Note that I do not include the raw DAL data in my results below, only my summarized results, as the raw data contains personal address information. I welcome any / all peer review and will gladly make my code and compiled datasets available.

Update 2020-12-13 17:44:00

Per a Twitter comment, looking at the data with a normalized x-axis (excess votes as a percent of the absentee vote total), to compensate for different precinct sizes, produces the plot below. Newport News City and Richmond County now also stand out, and we still see a difference in the deviation of excess votes between Biden precincts and Trump precincts.
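The normalization is simply the excess count divided by the precinct’s absentee vote total. A sketch, assuming the reported totalVotes is used as the denominator (the function names are hypothetical), with the three PWC central absentee precincts from the table above as example input:

```python
def normalized_excess(excess: int, absentee_total: int) -> float:
    """Excess absentee votes as a percent of the precinct's absentee vote total."""
    return 100.0 * excess / absentee_total


def rank_precincts(precincts):
    """precincts: {name: (excess, absentee_total)}; largest |normalized excess| first."""
    return sorted(precincts,
                  key=lambda p: abs(normalized_excess(*precincts[p])),
                  reverse=True)


example = {"PWC 01": (-1304, 70721), "PWC 10": (-317, 32525), "PWC 11": (1622, 66683)}
print(rank_precincts(example))
```

Normalizing keeps large precincts from dominating the x-axis just because of their size, which is why smaller localities can stand out once it is applied.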


HART Voting Machines in PWC, VA

I discovered today that my county, PWC, VA, uses the HART voting machine systems, and I have been doing a little research. Right now I’m just seeing what I can find and verify, and collecting it on this page.

Biggest Find:

I found a very recent paper (2019) discussing and demonstrating methods to hack all of the major brands of voting machines that use ballot images. It’s effective enough to defeat any ballot image recounts as well. It’s a fairly simple man-in-the-middle attack vector, which only needs to install a wrapper around the Windows scanner driver for the scanning systems. … and given that all of these systems are using unencrypted(!) USB sticks(!!), that’s pretty easy to do!!!

The wrapped adversarial payload can use some standard “textbook” (… no really … I own multiple textbooks that reference them) image processing tricks to selectively move where the voter has marked the ballot in the image as it’s scanned, keeping the voter’s handwriting and style intact. The user sees their vote recorded, but the ballot image is altered before it’s recorded or counted, so auditing or recounting from ballot images just recounts the altered image. With a well-designed and deterministic coding of this attack, re-running the same ballots through the scanner to perform a recount just re-applies the same deterministic conversion to the stored ballot images, and hence the recount stays the same. It’s a pretty devilishly elegant attack vector.

Bernhard M., Kandula K., Wink J., Halderman J.A. (2019) UnclearBallot: Automated Ballot Image Manipulation. In: Krimmer R. et al. (eds) Electronic Voting. E-Vote-ID 2019. Lecture Notes in Computer Science, vol 11759. Springer, Cham.
alt link:

Now I’m not saying that that took place in the election, but it is a direct example of a pretty scary scenario and a working example.

Other Finds:

Article on VA audit process, and why it can’t possibly affect the 2020 results.

HART Verity Systems Administrator Guide:

Feb 5 2016 PWC Elections Board Meeting Agenda discussing Acquisition of HART systems:

Romney Financial Ties to HART (from 2012):

Another follow-the-money story from 2019:

NBC Jan 2020 Story on Voting Machine makers testifying before congress:

2018 Story on Vulnerabilities of multiple Voting Machines:

Whistleblower detailing HART shady business practices (2008):

HART Systems Security Issues (2006):


VA Senate Timeseries Election Data Plot

Computing the VA 2020 Senate Election data time series from the NYT Edison data feed. This is direct from the NYT data feed without any processing.



VA Timeseries Election Data Plot

Computing the VA 2020 Presidential Election data time series from the NYT Edison data feed. This is direct from the NYT data feed without any processing.
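For anyone wanting to reproduce these time series, a minimal sketch follows. It assumes each snapshot in the feed provides a running total vote count and per-candidate vote shares as fractions; the field names (`timestamp`, `votes`, `shares`) are placeholders, not the feed’s actual schema, and the same approach applies to the Senate series. If the shares are rounded in the feed, each estimate inherits that rounding error scaled by the vote count.

```python
def candidate_series(snapshots, candidate):
    """Estimated cumulative votes for one candidate at each snapshot.

    snapshots: list of dicts shaped like
        {"timestamp": str, "votes": int, "shares": {candidate: fraction}}
    (placeholder field names, not the feed's actual schema).
    """
    return [(s["timestamp"], round(s["votes"] * s["shares"].get(candidate, 0.0)))
            for s in snapshots]


def increments(series):
    """Change in cumulative votes between consecutive snapshots (negative = reversal)."""
    return [(t1, v1 - v0) for (_, v0), (t1, v1) in zip(series, series[1:])]
```

Plotting the increments rather than the cumulative totals makes bumps that later get reversed easy to spot, since they show up as a positive spike followed by a negative one.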



Computing the VA 2020 Daily Absentee Ballot List Rejection Rate and Other Stats

Adding onto the work I’ve already done looking at the VA Daily Absentee List, I wanted to compute the rejection rate for absentee ballots and some other basic statistics just from the Daily Absentee Ballot List (DAL) produced by the VA Dept of Elections.

Keep in mind I’ve already been able to show:

  • There were 166 Absentee Ballot Applications that were received and accepted AFTER the corresponding ballot was received or accepted (here).
  • There is a discrepancy in the number of “excess” absentee votes counted vs. the total accepted registrations from the Daily Absentee List (here).
  • The 2020 VA election fingerprints, as well as other states, have significant structural irregularities and indications of vote manipulation (here).
  • The VA election fingerprints show a trend of increasing irregularity starting with 2008, 2012, 2016 and now 2020 (here).

Basic Info:

There are 3,335,335 entries, with 36 columns, in the Daily Absentee List.


Computing the Absentee Ballot Rejection Rate:

So (in MATLAB), with the full list directly imported into the ‘DailyAbsenteeList’ table variable:

% Let's see if we can tease out the rejection rate for 
% absentee ballots
>> ineligible = DailyAbsenteeList.APP_STATUS == "Denied/Not Eligible";
>> incomplete = DailyAbsenteeList.APP_STATUS == "Denied/Incomplete";
>> rejectionRate = 100 * mean(ineligible | incomplete)
rejectionRate = 0.0425

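RESULT: The absentee ballot rejection rate (DAL entries with an APP_STATUS of ‘Denied/Not Eligible’ or ‘Denied/Incomplete’, as a percentage of all DAL entries) was an exceptionally low 0.0425%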

Finding the Counted ballots:

As we did in the “excess vote” computation, we need to find the indices into the list of all of the ballots that were counted, meaning their BALLOT_STATUS is in one of the following states: {‘On Machine’, ‘Marked’, ‘Pre-Processed’, ‘FWAB’}. So the ‘idxv’ variable below is a boolean flag indicating that the BALLOT_STATUS of each entry in the list is in a countable state.

% For each entry in the Daily Absentee List, test if 
% BALLOT_STATUS is in a valid state 
>> idxv = DailyAbsenteeList.BALLOT_STATUS=='Marked' | ...
    DailyAbsenteeList.BALLOT_STATUS=='Pre-Processed' | ...
    DailyAbsenteeList.BALLOT_STATUS=='On Machine' | ...
    DailyAbsenteeList.BALLOT_STATUS=='FWAB';

Now one would think that if a ballot has been marked with one of those BALLOT_STATUS categories, then that ballot application should have been ‘Approved’, right? Well, let’s double-check that, shall we?

% For each entry in the Daily Absentee List, test if 
% BALLOT_STATUS is in a valid state AND APP_STATUS is set to 
% 'Approved'.  (One would think these would match.)
>> idxav = (DailyAbsenteeList.BALLOT_STATUS=='Marked' | ...
    DailyAbsenteeList.BALLOT_STATUS=='Pre-Processed' | ...
    DailyAbsenteeList.BALLOT_STATUS=='On Machine' | ...
    DailyAbsenteeList.BALLOT_STATUS=='FWAB') & ...
    DailyAbsenteeList.APP_STATUS=='Approved';

We can check for mismatches by counting the number of flags that are not equal between ‘idxv’ and ‘idxav’.

% How many absentee ballots have their BALLOT_STATUS set to 
% a valid (i.e. "countable") state, but are not marked as 
% 'Approved'
>> numUnapprovedButCountableVotes = sum(idxv)-sum(idxav)
numUnapprovedButCountableVotes = 1437

RESULT: There are 1437 entries that have BALLOT_STATUS set to a valid state, but don’t have an approved absentee ballot application

Well … this raises some follow-up questions…

Q1: Of the ballots that have been set to a valid BALLOT_STATUS state but were not set to ‘Approved’, what is the breakdown by APP_STATUS?

>> cancelled = sum(DailyAbsenteeList.APP_STATUS=='Cancelled or Duplicate' & (idxv~=idxav))
>> deniedInc = sum(DailyAbsenteeList.APP_STATUS=='Denied/Incomplete' & (idxv~=idxav))
>> deniedIne = sum(DailyAbsenteeList.APP_STATUS=='Denied/Not Eligible' & (idxv~=idxav))
>> issued = sum(DailyAbsenteeList.APP_STATUS=='Issued' & (idxv~=idxav))
>> onHold = sum(DailyAbsenteeList.APP_STATUS=='On Hold' & (idxv~=idxav))
>> pending = sum(DailyAbsenteeList.APP_STATUS=='Pending Approval' & (idxv~=idxav))
>> provisional = sum(DailyAbsenteeList.APP_STATUS=='Provisional' & (idxv~=idxav))

RESULTS: Of the 1437 ballots that were in a countable state but did not have APP_STATUS=’Approved’, their APP_STATUS was:
cancelled = 444
deniedInc = 568
deniedIne = 4
issued = 28
onHold = 101
pending = 288
provisional = 4
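Rather than one sum(...) line per status, the whole breakdown can be produced in one pass with unique and accumarray, which also makes it easy to confirm that the categories add back up to the total (for the numbers above, 444 + 568 + 4 + 28 + 101 + 288 + 4 = 1437). The sketch below runs on made-up cellstr data; it illustrates the tallying approach and is not the code used to produce the results above.

```matlab
% Made-up toy columns standing in for DailyAbsenteeList (not real data)
ballotStatus = {'Marked'; 'On Machine'; 'Pre-Processed'; 'Marked'; 'FWAB'; 'Marked'};
appStatus    = {'Cancelled or Duplicate'; 'Pending Approval'; 'Approved'; ...
                'Cancelled or Duplicate'; 'Approved'; 'On Hold'};

countableStates = {'On Machine', 'Marked', 'Pre-Processed', 'FWAB'};
idxv     = ismember(ballotStatus, countableStates);
idxav    = idxv & strcmp(appStatus, 'Approved');
mismatch = idxv ~= idxav;                 % countable but not approved

% Tally APP_STATUS over the mismatched entries only
[statusNames, ~, grp] = unique(appStatus(mismatch));
counts = accumarray(grp(:), 1);           % one count per unique status

for k = 1:numel(statusNames)
    fprintf('%-25s %d\n', statusNames{k}, counts(k));
end

% The per-status counts must add back up to the number of mismatches
assert(sum(counts) == nnz(mismatch));
```

With this toy data the tally is 2 ‘Cancelled or Duplicate’, 1 ‘On Hold’ and 1 ‘Pending Approval’. A status that never occurs simply does not appear, so zero rows would have to be added separately if a fixed list of categories is wanted.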

Q2: Of the ballots that have been set to a valid BALLOT_STATUS state but were not set to ‘Approved’, what is the breakdown by BALLOT_STATUS?

>> unapproved_mailIn = sum(DailyAbsenteeList.BALLOT_STATUS=='Marked' & ...
    (idxv~=idxav))
>> unapproved_preProcessed = sum(DailyAbsenteeList.BALLOT_STATUS=='Pre-Processed' & ...
    (idxv~=idxav))
>> unapproved_earlyInPerson = sum(DailyAbsenteeList.BALLOT_STATUS=='On Machine' & ...
    (idxv~=idxav))
>> unapproved_FWAB = sum(DailyAbsenteeList.BALLOT_STATUS=='FWAB' & ...
    (idxv~=idxav))

RESULTS: Of the 1437 ballots that were in a countable state but did not have APP_STATUS=’Approved’, their BALLOT_STATUS was:
unapproved_mailIn (‘Marked’) = 512
unapproved_preProcessed (‘Pre-Processed’) = 578
unapproved_earlyInPerson (‘On Machine’) = 347
unapproved_FWAB (‘FWAB’) = 0
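Q1 and Q2 are two marginal views of the same set of 1437 entries, and both sets of numbers add up to that total (512 + 578 + 347 + 0 = 1437 here). To see how the two statuses combine, a cross-tabulation gives both breakdowns at once: the row sums are the APP_STATUS counts and the column sums are the BALLOT_STATUS counts. A minimal sketch on made-up cellstr data (not the real file, and not the author’s code):

```matlab
% Made-up toy columns for the countable-but-not-approved entries only (not real data)
appStatus    = {'Cancelled or Duplicate'; 'Pending Approval'; 'On Hold'; ...
                'Cancelled or Duplicate'; 'Pending Approval'};
ballotStatus = {'Marked'; 'On Machine'; 'Marked'; 'Pre-Processed'; 'On Machine'};

% Map each status string to an integer group number (sorted alphabetically)
[appNames, ~, appGrp]       = unique(appStatus);
[ballotNames, ~, ballotGrp] = unique(ballotStatus);

% Rows = APP_STATUS, columns = BALLOT_STATUS, entries = number of ballots
xtab = accumarray([appGrp(:), ballotGrp(:)], 1, [numel(appNames), numel(ballotNames)]);

disp(ballotNames(:)');                    % column headers (BALLOT_STATUS)
disp([appNames(:), num2cell(xtab)]);      % one row per APP_STATUS

% Marginals: row sums give the Q1-style breakdown, column sums the Q2-style one
appTotals    = sum(xtab, 2);
ballotTotals = sum(xtab, 1);
```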

Checking for duplicate identities:

Another check we can do is to estimate how many voters appear more than once among the accepted ballots.

% What about the possibility of the same person being marked 
% with multiple ballots?  We will now go through 
% the 'Approved' and countable ballots and
% look for duplicate names and address information.
>> idxvs = find(idxav);
>> fnames = cellstr(DailyAbsenteeList.FIRST_NAME(idxvs));
>> lnames = cellstr(DailyAbsenteeList.LAST_NAME(idxvs));
>> mnames = cellstr(DailyAbsenteeList.MIDDLE_NAME(idxvs));
>> sffx = cellstr(DailyAbsenteeList.SUFFIX(idxvs));
>> addy1 = cellstr(DailyAbsenteeList.ADDRESS_LINE_1(idxvs));
>> addy2 = cellstr(num2str(DailyAbsenteeList.ADDRESS_LINE_2(idxvs)));
>> addy3 = cellstr(num2str(DailyAbsenteeList.ADDRESS_LINE_3(idxvs)));
>> zips = cellstr(num2str(DailyAbsenteeList.ZIP(idxvs)));
>> addy2 = strrep(strrep(addy2,'NaN',''),' ','');
>> addy3 = strrep(strrep(addy3,'NaN',''),' ','');
>> zips = strrep(strrep(zips,'NaN',''),' ','');
>> namesAndAddys = join([fnames, mnames, lnames, sffx, addy1, addy2, addy3, zips]);
>> [unames,unr,uni] = unique(namesAndAddys);
>> duplicateVotes = numel(namesAndAddys) - numel(unames)
duplicateVotes = 166

RESULT: The number of repeated accepted absentee ballots is 166 … now where have we seen that number pop up before … ahhhh yes … it’s the same as the number of ballots whose applications were received and accepted AFTER the actual ballot was accepted and cast!
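Note that numel(namesAndAddys) - numel(unames) counts surplus copies rather than distinct people: a name/address key that appears three times adds 2 to the total. The third output of unique (‘uni’ above) can be reused with accumarray to list which keys repeat and how often, which would make it easier to spot-check the duplicates by hand. A sketch on made-up keys (hypothetical people and addresses, not records from the file):

```matlab
% Made-up "name + address" keys, shaped like the join(...) output above (hypothetical people)
namesAndAddys = {'JOHN Q DOE 1 MAIN ST 22030'; ...
                 'JANE ROE 5 OAK AVE 23220'; ...
                 'JOHN Q DOE 1 MAIN ST 22030'; ...
                 'AL SMITH 9 ELM CT 24401'; ...
                 'JOHN Q DOE 1 MAIN ST 22030'; ...
                 'JANE ROE 5 OAK AVE 23220'};

[unames, ~, uni] = unique(namesAndAddys);
copies = accumarray(uni(:), 1);            % how many times each unique key occurs

% Same statistic as in the post: total entries minus distinct keys
duplicateVotes = numel(namesAndAddys) - numel(unames);

% Which keys repeat, and how many times each
repeated     = unames(copies > 1);
repeatCounts = copies(copies > 1);
for k = 1:numel(repeated)
    fprintf('%d x %s\n', repeatCounts(k), repeated{k});
end
```

Here the surplus is 3 from only two repeated keys, and sum(repeatCounts - 1) always equals duplicateVotes, so both numbers are worth reporting.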


Another VA Daily Absentee List Discrepancy

Looking further into the VA Daily Absentee List, there were 166 “valid” counted absentee ballots whose application receipt date was later than the ballot receipt date, and 1,797,901 ballots whose application and ballot receipt dates were the same.

An absentee application needs to be received and validated; then an absentee ballot needs to be mailed to the applicant, filled out, and returned. So for a mailed absentee ballot, the application receipt date should come before the ballot receipt date. The Daily Absentee List also does not seem to account for In-Person (a.k.a. “early”) votes, as I’ve discussed here, so that does not seem to be a viable explanation. If the Daily Absentee List DOES include the in-person data, then there needs to be a different explanation for the discrepancies that I noted in the link above between the Daily and Summary absentee lists and the JSON vote count tally.

Update 2020-12-04: I can now confirm the daily list does contain In-Person early votes. In-Person voting is likely a good explanation for the 1,797,901 number, presumably because an early in-person voter applies and votes in the same visit, but the 166 ballots received before their application are still problematic. This also raises more questions: why is there a discrepancy between the total counts on the Daily and Summary lists, and why does neither match the actual absentee vote counts recorded in the JSON data? I’ve also updated my other blog post on the excess vote with the details.

In MATLAB, after reading the Daily Absentee List file into the ‘DailyAbsenteeList’ variable:

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Get the indices of those absentee ballots that have their status marked as being one of the valid categories.
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> idxv = DailyAbsenteeList.BALLOT_STATUS=='Marked' | ...
    DailyAbsenteeList.BALLOT_STATUS=='Pre-Processed' | ...
    DailyAbsenteeList.BALLOT_STATUS=='On Machine' | ...
    DailyAbsenteeList.BALLOT_STATUS=='FWAB';

% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Next we check the dates for the ‘valid’ absentee ballots and perform the summations
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
>> sum(DailyAbsenteeList.APP_RECIEPT_DATE(idxv) > DailyAbsenteeList.BALLOT_RECEIPT_DATE(idxv))
ans = 166

>> sum(DailyAbsenteeList.APP_RECIEPT_DATE(idxv) == DailyAbsenteeList.BALLOT_RECEIPT_DATE(idxv))
ans = 1797901
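Because the application has to be received before a ballot can be mailed out and returned, the expected ordering is an application date on or before the ballot date; the two sums above count the “after” and “same day” cases. A minimal sketch on made-up dates classifies every entry into all three buckets at once with sign, and also counts entries with a missing date so the buckets can be reconciled against the total. It uses datenum day numbers as an assumption; the real table’s date columns may well be datetime instead, where missing values would show up as NaT rather than NaN.

```matlab
% Made-up receipt dates for five countable ballots (not real data)
appDates    = datenum({'2020-09-20'; '2020-10-01'; '2020-10-15'; '2020-10-20'; '2020-10-20'}, 'yyyy-mm-dd');
ballotDates = datenum({'2020-10-05'; '2020-10-01'; '2020-10-10'; '2020-10-20'; '2020-10-28'}, 'yyyy-mm-dd');

% -1: application came first, 0: same day, +1: application came after the ballot
% (NaN if either date is missing, which falls into none of the three buckets)
order = sign(appDates - ballotDates);

appBeforeBallot = sum(order < 0);
sameDay         = sum(order == 0);
appAfterBallot  = sum(order > 0);     % the ordering that should not happen for mailed ballots
missingDates    = sum(isnan(order));

fprintf('before: %d  same day: %d  after: %d  missing: %d\n', ...
    appBeforeBallot, sameDay, appAfterBallot, missingDates);
% prints: before: 2  same day: 2  after: 1  missing: 0
```

If the four counts do not add up to the number of countable ballots, something other than a date ordering (such as an unparsed date) is hiding in the data.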