Democratic Primary Predictions and Forecasts

Democratic Model

If you’ve been on our site, you’ve likely seen our Democratic primary predictions.  This post explains our models and how we come up with our Democratic forecasts.

Our first model predicts state winners. It uses logistic regression and simulates each contest 1,000 times.
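A minimal sketch of those two steps follows. It assumes the logistic regression is fit by plain batch gradient descent on the log-loss and that each of the 1,000 simulations is a single Bernoulli draw from the fitted win probability. The post does not say which fitting routine or simulation scheme it uses, and the feature names (`is_caucus`, `pct_black`) and toy data below are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """P(Clinton win) for feature vector x; w[0] is the intercept."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 1.0, epochs: int = 5000) -> List[float]:
    """Batch gradient descent on the mean log-loss.

    y[i] is 1 for a Clinton victory and 0 for a Sanders victory.
    lr=1.0 assumes the features are scaled to roughly [0, 1].
    """
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(linear score)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def simulate_contest(p_clinton: float, n_sims: int = 1000, seed: int = 0) -> float:
    """Fraction of n_sims Bernoulli draws in which Clinton wins."""
    rng = random.Random(seed)
    wins = sum(1 for _ in range(n_sims) if rng.random() < p_clinton)
    return wins / n_sims


# Hypothetical toy data: [is_caucus, pct_black] -> 1 (Clinton win) or 0 (Sanders win)
X_train = [[1.0, 0.02], [1.0, 0.05], [0.0, 0.08], [0.0, 0.45], [0.0, 0.50], [1.0, 0.55]]
y_train = [0, 0, 0, 1, 1, 1]

weights = fit_logistic(X_train, y_train)
p = predict_proba(weights, [0.0, 0.60])  # a hypothetical non-caucus state
print(f"P(Clinton win) = {p:.2f}, simulated share = {simulate_contest(p):.3f}")
```

Because every draw here uses the same probability, the simulated share simply converges on `p` as `n_sims` grows; a richer simulation would also perturb the inputs or coefficients between runs, which this sketch does not attempt.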

The state winner model uses the following independent variables: whether the contest is a caucus; the state's percentages of white, black, and Hispanic residents; state GDP per capita; and the state's average age. The dependent variable is a Clinton victory, coded 1 (a Sanders victory is coded 0). Here are its predictions for the primary season, along with each primary or caucus's actual winner:

State  Clinton Win Prob.  Sanders Win Prob.  Actual Winner
IA     0.44               0.56               Clinton
NH     0.19               0.81               Sanders
NV     0.70               0.30               Clinton
SC     0.99               0.01               Clinton
AL     0.99               0.01               Clinton
AR     0.83               0.17               Clinton
CO     0.48               0.52               Sanders
GA     0.99               0.01               Clinton
MA     0.75               0.25               Clinton
MN     0.10               0.90               Sanders
OK     0.15               0.85               Sanders
TN     0.86               0.14               Clinton
TX     0.97               0.03               Clinton
VT     0.12               0.88               Sanders
VA     0.93               0.07               Clinton
KS     0.25               0.75               Sanders
LA     1.00               0.00               Clinton
NE     0.20               0.80               Sanders
ME     0.14               0.86               Sanders
MI     0.43               0.57               Sanders
MS     0.97               0.03               Clinton
FL     0.96               0.04               Clinton
IL     0.98               0.02               Clinton
MO     0.84               0.16               Clinton
NC     0.99               0.01               Clinton
OH     0.84               0.16               Clinton
AZ     0.85               0.15               Clinton
ID     0.08               0.92               Sanders
UT     0.09               0.91               Sanders
AK     0.13               0.87               Sanders
HI     0.16               0.84               Sanders
WA     0.17               0.83               Sanders
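One way to score the winner predictions, sketched here as an illustration rather than the method used on this site, is to count how often the favored candidate (win probability above 0.5) actually won and to compute the Brier score: the mean squared gap between each forecast probability and the 0/1 outcome, where lower is better. The data are transcribed from the table above; the function names are hypothetical.

```python
from typing import List, Tuple

# (state, P(Clinton win), actual winner), transcribed from the table above
WIN_TABLE: List[Tuple[str, float, str]] = [
    ("IA", 0.44, "Clinton"), ("NH", 0.19, "Sanders"), ("NV", 0.70, "Clinton"),
    ("SC", 0.99, "Clinton"), ("AL", 0.99, "Clinton"), ("AR", 0.83, "Clinton"),
    ("CO", 0.48, "Sanders"), ("GA", 0.99, "Clinton"), ("MA", 0.75, "Clinton"),
    ("MN", 0.10, "Sanders"), ("OK", 0.15, "Sanders"), ("TN", 0.86, "Clinton"),
    ("TX", 0.97, "Clinton"), ("VT", 0.12, "Sanders"), ("VA", 0.93, "Clinton"),
    ("KS", 0.25, "Sanders"), ("LA", 1.00, "Clinton"), ("NE", 0.20, "Sanders"),
    ("ME", 0.14, "Sanders"), ("MI", 0.43, "Sanders"), ("MS", 0.97, "Clinton"),
    ("FL", 0.96, "Clinton"), ("IL", 0.98, "Clinton"), ("MO", 0.84, "Clinton"),
    ("NC", 0.99, "Clinton"), ("OH", 0.84, "Clinton"), ("AZ", 0.85, "Clinton"),
    ("ID", 0.08, "Sanders"), ("UT", 0.09, "Sanders"), ("AK", 0.13, "Sanders"),
    ("HI", 0.16, "Sanders"), ("WA", 0.17, "Sanders"),
]


def favorite(p_clinton: float) -> str:
    """Candidate the model favors (no contest in the table sits at exactly 0.5)."""
    return "Clinton" if p_clinton > 0.5 else "Sanders"


def misses(rows: List[Tuple[str, float, str]]) -> List[str]:
    """States where the favored candidate lost."""
    return [state for state, p, actual in rows if favorite(p) != actual]


def brier_score(rows: List[Tuple[str, float, str]]) -> float:
    """Mean squared error between P(Clinton win) and the 0/1 outcome."""
    return sum((p - (1.0 if actual == "Clinton" else 0.0)) ** 2
               for _, p, actual in rows) / len(rows)


print(len(WIN_TABLE) - len(misses(WIN_TABLE)), "of", len(WIN_TABLE))  # 31 of 32
print(misses(WIN_TABLE))  # ['IA']
print(round(brier_score(WIN_TABLE), 3))  # 0.041
```

The favorite-count is easy to read but ignores confidence; the Brier score also penalizes close calls such as Iowa (0.44) less than confident misses would be penalized.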

For the vote share predictions, I use a caucus dummy; the state's percentages of African American, white, and Hispanic residents; state GDP per capita; the percentage of the state population aged 18 to 25; a dummy for whether the state winner model above predicts a Clinton win; and a dummy for the South. The dependent variable is Clinton's actual vote share. This model explains around 93 percent of the variation in vote share (R² ≈ 0.93). Its predictions and the actual outcomes are shown below:

State  Predicted Clinton Share  Actual Clinton Share
IA     0.44                     0.50
NH     0.32                     0.38
NV     0.58                     0.53
SC     0.73                     0.74
AL     0.71                     0.78
AR     0.59                     0.66
CO     0.41                     0.40
GA     0.74                     0.71
MA     0.53                     0.50
MN     0.37                     0.38
OK     0.38                     0.42
TN     0.67                     0.66
TX     0.64                     0.65
VT     0.20                     0.14
VA     0.60                     0.64
KS     0.36                     0.32
LA     0.71                     0.71
NE     0.36                     0.43
ME     0.36                     0.36
MI     0.51                     0.48
MS     0.78                     0.83
FL     0.72                     0.64
IL     0.51                     0.51
MO     0.51                     0.50
NC     0.66                     0.55
OH     0.60                     0.57
AZ     0.52                     0.58
ID     0.26                     0.21
UT     0.30                     0.20
AK     0.26                     0.18
HI     0.21                     0.30
WA     0.32                     0.27
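To put the vote share table in a single number, here is a small sketch that computes the mean absolute error between predicted and actual Clinton share and finds the largest miss. The data are transcribed from the table above; the function names are hypothetical, and MAE is just one reasonable scoring choice, not one the post itself reports.

```python
from typing import Dict, Tuple

# state -> (predicted, actual) Clinton vote share, transcribed from the table above
SHARE_TABLE: Dict[str, Tuple[float, float]] = {
    "IA": (0.44, 0.50), "NH": (0.32, 0.38), "NV": (0.58, 0.53), "SC": (0.73, 0.74),
    "AL": (0.71, 0.78), "AR": (0.59, 0.66), "CO": (0.41, 0.40), "GA": (0.74, 0.71),
    "MA": (0.53, 0.50), "MN": (0.37, 0.38), "OK": (0.38, 0.42), "TN": (0.67, 0.66),
    "TX": (0.64, 0.65), "VT": (0.20, 0.14), "VA": (0.60, 0.64), "KS": (0.36, 0.32),
    "LA": (0.71, 0.71), "NE": (0.36, 0.43), "ME": (0.36, 0.36), "MI": (0.51, 0.48),
    "MS": (0.78, 0.83), "FL": (0.72, 0.64), "IL": (0.51, 0.51), "MO": (0.51, 0.50),
    "NC": (0.66, 0.55), "OH": (0.60, 0.57), "AZ": (0.52, 0.58), "ID": (0.26, 0.21),
    "UT": (0.30, 0.20), "AK": (0.26, 0.18), "HI": (0.21, 0.30), "WA": (0.32, 0.27),
}


def mean_absolute_error(table: Dict[str, Tuple[float, float]]) -> float:
    """Average of |predicted - actual| across all contests."""
    return sum(abs(pred - actual) for pred, actual in table.values()) / len(table)


def largest_miss(table: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """State with the biggest absolute error, and that error."""
    state = max(table, key=lambda s: abs(table[s][0] - table[s][1]))
    pred, actual = table[state]
    return state, abs(pred - actual)


print(round(mean_absolute_error(SHARE_TABLE), 3))  # 0.044
state, err = largest_miss(SHARE_TABLE)
print(state, round(err, 2))  # NC 0.11
```

An average miss of about 4.4 points hides a few larger errors (North Carolina, Utah, Hawaii), so it can be worth reporting the largest miss alongside the mean.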

Each contest yields new data points that are then fed back into the model. The two tables above show our initial predictions, not the output of the model after it has incorporated new results. When backtested, those updated predictions are more accurate than our first estimates, which is no surprise. Our Democratic primary predictions become more accurate as n (the number of completed primaries and caucuses) increases, which lets us forecast the remaining contests with greater confidence.
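The updating step can be sketched as an expanding-window refit: before each contest, re-estimate the vote share regression on every contest completed so far, then predict the next one. This sketch assumes ordinary least squares (the post reports the share of variance explained but does not name the estimator), and the toy features (`pct_black`, a Clinton-favored dummy) and helper names are hypothetical, not the site's code.

```python
from typing import List, Sequence, Tuple

Row = Tuple[Sequence[float], float]  # (features, Clinton vote share)


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_ols(rows: Sequence[Row]) -> List[float]:
    """OLS coefficients (intercept first) from the normal equations X'X b = X'y."""
    X = [[1.0] + list(x) for x, _ in rows]
    y = [target for _, target in rows]
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))


def expanding_window_forecasts(rows: Sequence[Row], min_train: int) -> List[Tuple[float, float]]:
    """For each contest after the first `min_train` (in calendar order), refit on
    all earlier contests only and return (forecast, actual) for that contest."""
    out = []
    for i in range(min_train, len(rows)):
        beta = fit_ols(rows[:i])
        x, actual = rows[i]
        out.append((predict(beta, x), actual))
    return out


# Hypothetical toy contests: ([pct_black, clinton_favored_dummy], vote share),
# generated without noise from share = 0.3 + 0.8 * pct_black + 0.1 * dummy.
contests: List[Row] = [
    ([0.05, 0.0], 0.34), ([0.30, 1.0], 0.64), ([0.10, 0.0], 0.38),
    ([0.35, 1.0], 0.68), ([0.20, 0.0], 0.46), ([0.25, 1.0], 0.60),
]
for forecast, actual in expanding_window_forecasts(contests, min_train=3):
    print(f"forecast {forecast:.2f}  actual {actual:.2f}")
```

With noiseless toy data the refit recovers the generating line after three contests, so every later forecast matches; with real results the errors shrink more gradually as n grows. Refitting only on earlier contests is what keeps a backtest honest, since no forecast sees the result it is predicting.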

Feel free to comment with questions or suggestions; we’re always looking to improve our model!
