Consider a state with heavy Republican support, Tennessee.
Each Tennessee county has different turnout rates
This plot shows the turnout rate for each age from 18 to 100 in all of Tennessee’s 95 counties.
Each point is the turnout rate for a particular age group in a particular county.
The red line shows the average turnout rate for each age group across counties.
There is no statewide key
This is essentially Douglas Frank’s state-level “key” that he uses to predict the turnout rate in each
county. If Doug Frank were correct, then all the points would cluster around this line.
But clearly this isn’t the case. There is a lot of variation across the counties. So not surprisingly, the
state-level turnout rate isn’t going to be a perfect prediction of the turnout rate in Tennessee’s counties.
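To make the idea of a statewide key concrete, here is a minimal sketch. It assumes the key is simply the unweighted average of each age group's turnout rate across counties, which is how the red line is described above; Frank's actual fitting procedure may differ. The three counties and their rates are made-up toy numbers, not Tennessee data, and the function names are ours.

```python
# Hypothetical turnout rates by age group for three made-up counties.
# These numbers are illustrative only, not real Tennessee data.
county_rates = {
    "County A": [0.30, 0.45, 0.60, 0.70],
    "County B": [0.40, 0.55, 0.65, 0.80],
    "County C": [0.20, 0.50, 0.70, 0.60],
}


def statewide_key(rates_by_county):
    """Average each age group's turnout rate across counties (unweighted)."""
    columns = zip(*rates_by_county.values())  # one tuple per age group
    return [sum(col) / len(col) for col in columns]


def max_gap(rates, key):
    """Largest absolute difference between a county's rates and the key."""
    return max(abs(r - k) for r, k in zip(rates, key))


key = statewide_key(county_rates)
gaps = {name: max_gap(rates, key) for name, rates in county_rates.items()}

for name, gap in gaps.items():
    print(f"{name}: largest miss against the key = {gap:.2f}")
```

Even in this toy example no county sits exactly on the key, which is the same pattern the plot shows for the real counties.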
But if there is so much variability, how could Douglas Frank possibly claim to have such strong correlations
between his predictions and the actual results? That’s because he performs a sleight of hand and examines the
number of people who turn out to vote from an age group, rather than the turnout rate.
Frank’s focus on turnout counts causes him to overstate the predictive power of his voter fraud test. In fact,
when predicting counts he reaches the unsurprising conclusion that age groups with more individuals have more
people turn out to vote.
Had he focused on turnout rates, which are what his conspiracy theory implies
should be predictable, instead of counts, he would have noticed that his predictive power is much lower.
We’ll use Frank’s procedure to estimate a state-level key for Tennessee (basically the red line in the previous plot).
Frank then uses this key as his prediction of Union County’s turnout rate.
We can plot the actual turnout rate against this
prediction. The 45-degree line shows where the points would fall if we made a perfect prediction. Clearly, the
predicted turnout rate differs from the actual turnout rate. The predicted and actual rates correlate at
only about 0.57.
But if we plot the turnout count rather than the rate, the relationship between the prediction and the actual results suddenly appears much stronger.
Now the correlation is 0.99, a number that Frank says “ain’t natural.”
What’s going on here?
After all, it seems strange that a poorly predicted turnout rate could result in such a strong correlation for turnout counts.
Frank’s test is not a test of voter fraud at all. When we correlate predicted and actual counts we’re only recovering the obvious fact that age groups with more registered voters will have more voters cast ballots than age groups with fewer registered voters.
The variation in age group sizes dominates the correlation coefficient, pushing it toward 1. In other words, because age groups with large numbers of people will obviously cast more votes than age groups with small numbers of people, we will mechanically observe a very strong correlation between group size and the number of votes cast.
This is not suspicious in the least: the larger a group is, the more votes it will cast in an election, and the correlation will therefore be very high.
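The mechanics can be sketched in a few lines of code. Everything here is an assumption made for illustration: the eight age groups, their sizes, and both sets of rates are invented, and `pearson` is our own helper for the ordinary Pearson correlation. The point is only that multiplying weakly related rates by very unequal group sizes produces strongly related counts.

```python
from math import sqrt


def pearson(x, y):
    """Ordinary Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented data: registered voters per age group, the rate a "key" predicts,
# and the rate that actually occurred. Not real county data.
group_size = [2000, 1500, 1200, 900, 600, 300, 100, 50]
predicted_rate = [0.50, 0.55, 0.60, 0.65, 0.70, 0.72, 0.60, 0.40]
actual_rate = [0.62, 0.48, 0.66, 0.55, 0.78, 0.60, 0.70, 0.44]

# Counts are just rate times group size.
predicted_count = [n * p for n, p in zip(group_size, predicted_rate)]
actual_count = [n * a for n, a in zip(group_size, actual_rate)]

rate_corr = pearson(predicted_rate, actual_rate)
count_corr = pearson(predicted_count, actual_count)

print(f"correlation of rates:  {rate_corr:.2f}")
print(f"correlation of counts: {count_corr:.2f}")
```

The rate prediction is mediocre, yet the count correlation comes out far higher, because both count columns inherit the same large differences in group size.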
To understand this basic pattern, let’s consider a simple fictitious county where there is no vote manipulation. In our fictitious county every age group turns out at a different rate, but in our first election every age group has the same number of people—100 residents.
Using this fictitious county, we follow Doug Frank’s procedure to estimate a predicted turnout rate for each group. The predicted turnout rate correlates with the actual turnout rate at 0.26.
Because every age group is the same size, we find the same correlation when we predict the number of people from each age group who vote, rather than their turnout rate.
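A short derivation shows why equal group sizes leave the correlation untouched. The notation is our own shorthand rather than Frank's: $k_a$ is the predicted turnout rate for age group $a$, $r_a$ is the actual rate, and $N$ is the common group size (100 in the fictitious county), so the predicted and actual counts are $N k_a$ and $N r_a$.

```latex
\operatorname{corr}(N k, N r)
  = \frac{\operatorname{Cov}(N k, N r)}{\sigma_{N k}\,\sigma_{N r}}
  = \frac{N^{2}\,\operatorname{Cov}(k, r)}{(N \sigma_{k})(N \sigma_{r})}
  = \operatorname{corr}(k, r)
```

The factor $N$ cancels only because it is the same for every group. Once group sizes differ, $N$ becomes $N_a$, it no longer cancels, and the count correlation can drift away from the rate correlation.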
But if there is variation in the number of people in each age group, the correlation between our predicted count of voters and the actual number of voters from each age group will appear stronger.
To see this, let’s return to our fictitious county and make one age group slightly bigger than the other age groups—but critically we leave all other age groups the same size.
The plot on the left shows the new relationship between the predicted and actual turnout counts as we increase the size of this single age group.
To get a sense of how increasing the size of the largest group changes the correlation between predicted and actual counts, the plot on the right shows how the correlation changes as we increase the size of the largest group.
The horizontal axis is the size of the largest group and the vertical axis is the correlation between predicted and actual turnout counts—Douglas Frank’s test of voter fraud.
The line tracks the evolution of the correlation as we change the size of the largest group and the point shows the correlation between predicted and actual turnout counts for the plot on the left.
Put simply, the plot on the right shows that Frank’s supposedly unnatural correlation can have quite natural origins. As the largest group gets bigger, the correlation between the predicted and actual counts gets closer to 1, even though the underlying prediction about turnout rates remains identical, with a correlation of about 0.26.
The explanation is simple: making one group bigger results in more variation across age groups and this variation swamps the correlation.
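The same experiment can be sketched in code. This is an illustration under our own assumptions, not a reproduction of the plots: the six age groups and their rates are invented (so the baseline correlation differs from the 0.26 above), `pearson` and `count_correlation` are hypothetical helper names, and only the first group's size is varied while every other group stays at 100.

```python
from math import sqrt


def pearson(x, y):
    """Ordinary Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Invented rates for six age groups; the prediction is deliberately poor.
predicted_rate = [0.40, 0.50, 0.60, 0.55, 0.65, 0.45]
actual_rate = [0.55, 0.42, 0.58, 0.66, 0.50, 0.49]


def count_correlation(largest_size, base_size=100):
    """Correlation of predicted vs. actual counts when only the first
    group has `largest_size` people and all others have `base_size`."""
    sizes = [largest_size] + [base_size] * (len(predicted_rate) - 1)
    predicted = [n * p for n, p in zip(sizes, predicted_rate)]
    actual = [n * a for n, a in zip(sizes, actual_rate)]
    return pearson(predicted, actual)


rate_corr = pearson(predicted_rate, actual_rate)
sweep = {size: count_correlation(size)
         for size in (100, 200, 500, 1000, 10000, 100000)}

print(f"correlation of rates (never changes): {rate_corr:.3f}")
for size, corr in sweep.items():
    print(f"largest group = {size:>6}: count correlation = {corr:.3f}")
```

With all groups at 100 the count correlation equals the rate correlation. As the one group grows, the count correlation approaches 1 even though the rates, and therefore the quality of the prediction, never change.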
Bottom Line:
Douglas Frank’s supposed evidence of fraud is not really evidence at all.
His tests aren’t really tests, there is plenty of county-to-county variation, and his statistical analysis exaggerates his predictive ability.