Why is it important?
After an election, it’s often hard to recall who said what: who was right or wrong, who predicted the right outcome, and who used the right data or the right predictive model.
In this article I’ll compile some forecasts for the 2020 US Presidential election. I retrieved these forecasts at 11 AM on November 2nd, one day before the election. I use maps without toss-ups when available.
Fivethirtyeight is the first website Google returns when you search for a forecast. They document their model and publish their map online. Basically, they adjust polls with economic and demographic data, then try to account for errors by averaging over multiple simulations run under different hypotheses. In theory, that’s a robust way to do political forecasting.
Biden: 350 - Trump: 188
270towin mostly uses state-level polls, so its method is less advanced than fivethirtyeight’s. Polls can be biased because not everybody answers them, and those who do don’t necessarily report the vote they’ll actually cast, so relying only on polls is usually a weak method.
Biden: 351 - Trump: 187
It’s interesting to note that they got almost the same result as fivethirtyeight with a much less complicated method:
The candidate that leads in the polls is shown as the winner of the state. The 2016 party winner is used where there are no polls.
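To make the quoted rule concrete, here is a minimal Python sketch. The state names, elector counts, poll margins and the `call_state` helper are all hypothetical and only illustrate the rule; this is not 270towin’s actual code or data, and the handling of an exact tie is my own assumption.

```python
from typing import Optional


def call_state(poll_margin: Optional[float], winner_2016: str) -> str:
    """Pick a party ("D" or "R") for one state.

    poll_margin is the Democratic lead in points (negative means a
    Republican lead), or None when the state has no polls.
    """
    if poll_margin is None or poll_margin == 0:
        # No polls (or an exact tie, which is an assumption of this sketch):
        # fall back to the party that won the state in 2016.
        return winner_2016
    return "D" if poll_margin > 0 else "R"


# Hypothetical states: (electors, poll margin, 2016 winner).
states = {
    "State A": (10, 3.2, "R"),
    "State B": (6, None, "R"),
    "State C": (20, -1.5, "D"),
}

# Add each state's electors to the party that the rule picks.
totals = {"D": 0, "R": 0}
for electors, margin, winner_2016 in states.values():
    totals[call_state(margin, winner_2016)] += electors

print(totals)
```

The rule ignores the size of the lead: a state with a 0.1 point lead counts exactly like a state with a 10 point lead, which is why a single new poll can flip a whole state on the map.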
CNN bases its map on “ratings”. I didn’t find a clear explanation of how they made their forecast map, nor a map without toss-ups, but it’s probably also based on polls, as they share per-state polls on their website. CNN’s map isn’t far from 270towin’s.
Biden: 290 - Trump: 163
Realclearpolitics has a no toss-ups map. Their result, however, is a bit different from the others. Biden still wins, but with 15 fewer electoral votes, because a recent poll they added has North Carolina going Republican. They seem to rely mostly on polls, just like 270towin.
Biden: 335 - Trump: 203
The forecast from economist.com is based on a model and not only on polls. Like fivethirtyeight, they run multiple simulations to account for errors. They build a similarity index between states using demographics and political profiles, which is a great way to propagate a change in one state’s polls to similar states.
Biden: 350 - Trump: 188
(6) my own lazy forecast: averaging forecasts of other people
You want to do a forecast but you’re too lazy to build a complete algorithm? Great news: you can combine other people’s forecasts. The closest technical term is “ensemble learning”.
We have 4 forecasts without toss-ups, three of which are nearly identical. The final result could be 350/188, but because of realclearpolitics we have to adjust to a bit lower than that. I weight each forecast by my estimate of the quality of its method: fivethirtyeight 5/5, 270towin 1/5, realclearpolitics 3/5 and economist.com 4/5. The weighted average is 347/191 (a split that may not be reachable with real state-by-state allocations, but it should minimize the error).
Biden: 347 - Trump: 191
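For anyone who wants to check the arithmetic, here is a small Python sketch of the weighted average. The forecasts and weights are the ones listed in this article; the `weighted_average` helper name is just mine for illustration.

```python
# (Biden, Trump) electoral votes from each no toss-ups forecast in this article.
forecasts = {
    "fivethirtyeight": (350, 188),
    "270towin": (351, 187),
    "realclearpolitics": (335, 203),
    "economist.com": (350, 188),
}

# My subjective quality score for each method, out of 5.
weights = {
    "fivethirtyeight": 5,
    "270towin": 1,
    "realclearpolitics": 3,
    "economist.com": 4,
}


def weighted_average(forecasts, weights):
    """Return the weighted mean (Biden, Trump) over all forecasts."""
    total_weight = sum(weights[name] for name in forecasts)
    biden = sum(weights[n] * f[0] for n, f in forecasts.items()) / total_weight
    trump = sum(weights[n] * f[1] for n, f in forecasts.items()) / total_weight
    return biden, trump


biden, trump = weighted_average(forecasts, weights)
print(f"Biden: {round(biden)} - Trump: {round(trump)}")  # Biden: 347 - Trump: 191
```

The unrounded values are about 346.6 and 191.4, so the rounded pair still adds up to 538.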
I could also add a bias towards Trump because that’s how it went in 2016, but I’m trusting the models as they are, since some of them claim to have corrected their algorithms to account for the previous presidential election.
The main problem is that all models seem to rely heavily on polls, and even complex models don’t stray far from what raw polls say. Maybe they’re all right and most polls were done properly; we’ll know this week. If not, forecasters should definitely try to base their models on different metrics, and try to understand why they failed to predict the right outcome in the states they got wrong.
The outcome will surely be interesting for future elections all over the world, as it will show how far we’ve come in forecasting elections. 2016 had a surprising result, so we hope that this time everyone doing forecasts has corrected their algorithms.
Who was right / Who was wrong / Why?
This part was written on November 12, 2020
OK, the count still isn’t over, and it’ll probably take a long time before it is. Let’s extrapolate from the latest results: Biden gets Georgia, and Trump gets North Carolina and Alaska. Final result:
Biden: 306 - Trump: 232
Who was right?
CNN did pretty well, but they didn’t make a no toss-ups map. RealClearPolitics got the best no toss-ups map: they only got Florida wrong (29 electors). Fivethirtyeight and 270towin gave North Carolina to Biden.
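As a quick check, this Python sketch compares each no toss-ups forecast from this article with my extrapolated final result of 306 for Biden. CNN is left out because it had no map without toss-ups. Keep in mind that the gap in electoral votes is only a rough score: it doesn’t say which states were wrong, and two wrong states could cancel each other out.

```python
# Biden's electoral votes: my extrapolated final result and each forecast.
actual_biden = 306
forecast_biden = {
    "fivethirtyeight": 350,
    "270towin": 351,
    "realclearpolitics": 335,
    "economist.com": 350,
    "my weighted average": 347,
}

# Positive error means the forecast gave Biden too many electoral votes.
errors = {name: biden - actual_biden for name, biden in forecast_biden.items()}

# Print the forecasts from closest to furthest.
for name, error in sorted(errors.items(), key=lambda item: abs(item[1])):
    print(f"{name}: {error:+d}")

best = min(errors, key=lambda name: abs(errors[name]))
```

Every forecast overestimated Biden, and the realclearpolitics gap of 29 matches Florida’s 29 electors.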
What lessons can be learned?
First, votes for “pro-countryside” candidates tend to be underestimated. Polls aren’t reliable enough (they’re often made in big cities because it’s easier) and models (such as the one from 270towin) should take other metrics into account as well.
Votes shouldn’t be predicted only from what people say they’ll do, but also from what they did and who they are. How people voted in previous elections is a great indicator of how they’ll vote in the next one. It appears that this time the truth lay between the 2020 polls and the 2016 election, while models mostly followed the 2020 polls.
It’s possible that the next Republican candidate will be less “pro-countryside” in order to win the next election. If models don’t account for that, they’ll apply a large bias towards that candidate, not understanding that it’s not Trump anymore. These models could predict a win for the Republican candidate even though that bias is no longer as prominent in public opinion, and the Democrat could win in the end, to the surprise of forecasters.
It’s important not only to learn the lesson, but also to understand why people voted the way they did. Did Trump invest much more in Florida? Was it because he voted in that state? Because he often goes to Florida and has a special aura there? This should be factored into the algorithms, but one data point isn’t enough, and these phenomena should be analyzed quantitatively using data from other elections.