By Matt Florell 2014-08-28
One of the most important things we reaffirmed this primary season: sample size is the most important variable in polling accuracy. We've noticed this in earlier elections, but this primary season gave us several consistent examples of how accurate larger-sample polls are, and how inaccurate small-sample polls can be. In the four races where we polled more than 500 respondents, our results were very accurate, only 0.4% to 3.6% off the final election day results. All of our smaller-sample polls differed more from the election results.
During this 2014 primary season we polled a total of 37 primary races, and we correctly picked the winner in 30 of them. Of the 7 that we picked wrong, 5 had sample sizes below 300 respondents. As an example of one of the races we got wrong, let's take a look at the HD-40 Republican Primary. This race put Coleen Burton against John Hugh Shannon. We polled this race four times from mid-July to mid-August, and we never had a sample size over 240. Keep in mind that only 8,493 people voted in this race, so that is a very small population to try to survey in any case. What you will notice from the chart below is that Shannon's vote percentage plateaued while Burton's percentage trended upward over time. In the end, Shannon's vote share stayed almost exactly where it had been in our last poll, while Burton took nearly the entire share of the undecideds. This isn't a common event, but we have seen it happen several times in the last few years.
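To put rough numbers on the sample size point, here is a minimal sketch of the standard margin-of-error formula with a finite population correction. It assumes a simple random sample, a 95% confidence level (z = 1.96), and a worst-case 50/50 split, none of which comes from our actual methodology. The function name `margin_of_error` is made up for illustration, and the 8,493 turnout figure is used as the population even though that number is only known after election day.

```python
import math

def margin_of_error(sample_size, population=None, proportion=0.5, z=1.96):
    """Approximate margin of error for one candidate's share.

    Assumes a simple random sample. proportion=0.5 is the worst case,
    and z=1.96 corresponds to a 95% confidence level.
    """
    standard_error = math.sqrt(proportion * (1 - proportion) / sample_size)
    if population is not None:
        # Finite population correction: matters when the sample is a
        # noticeable fraction of everyone who could be surveyed.
        standard_error *= math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error

# 8,493 is the HD-40 turnout; 240, 300 and 500 are the sample size
# thresholds mentioned in this recap, used here only for comparison.
for n in (240, 300, 500):
    moe = margin_of_error(n, population=8493)
    print(f"n={n}: about +/- {moe * 100:.1f} points")
```

Under these assumptions, a sample of 240 carries a margin of error of a little over 6 points on each candidate's share, while a sample of 500 brings that down to a little over 4 points.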
The next important trend I would like to touch on is how a candidate's momentum can carry them to victory on election day. To illustrate this we will take a look at one of the most hotly contested and expensive State House Republican primaries decided this week: HD-74, Richard DeNapoli vs. Julio Gonzalez. Around the time of our first poll, an internal campaign poll came out that showed DeNapoli way ahead, and it called Gonzalez "virtually unelectable". Our first poll in mid-July showed DeNapoli ahead by over 23 points. We polled this race a total of five times from mid-July to mid-August, with sample sizes always above 300. What you can see from the chart below is the consistent rise in Gonzalez's numbers while DeNapoli's steadily declined. On election day, similar to HD-40, the winner took almost the entire share of undecideds in our poll.
The last thing I wanted to mention in this recap is that polls with high undecided shares should not be considered very accurate. Our mid-July poll of the Republican primary for State House district 30 had an undecided share of 56.9%. With undecideds that high and a gap between the two candidates of less than 5%, it is impossible to project a winner. If we had wanted a more accurate set of results, we should have polled this race again in mid-August.
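One way to see why a gap that small can't support a projection is to estimate the margin of error on the lead itself. The sketch below uses the usual variance formula for the difference between two shares measured in the same sample. The individual candidate shares (24.0% and 19.1%) and the sample size of 300 are hypothetical values chosen only to match a 56.9% undecided share and a gap under 5%; they are not our actual HD-30 numbers, and the function name `lead_margin_of_error` is made up for illustration.

```python
import math

def lead_margin_of_error(share_a, share_b, sample_size, z=1.96):
    """Approximate margin of error on the lead (share_a - share_b).

    Both shares come from the same sample, so the variance of the
    difference is (a + b - (a - b)**2) / n. Assumes a simple random
    sample; z=1.96 corresponds to a 95% confidence level.
    """
    gap = share_a - share_b
    variance = (share_a + share_b - gap ** 2) / sample_size
    return z * math.sqrt(variance)

# Hypothetical inputs: the two shares sum to 43.1% (leaving 56.9%
# undecided) with a gap of 4.9 points. The sample size is assumed.
share_a, share_b = 0.240, 0.191
sample_size = 300

lead = share_a - share_b
moe = lead_margin_of_error(share_a, share_b, sample_size)
lead_is_clear = abs(lead) > moe

print(f"lead: {lead * 100:.1f} points, margin on the lead: +/- {moe * 100:.1f} points")
print("lead exceeds the margin of error" if lead_is_clear else "lead is within the margin of error")
```

With these assumed inputs the margin on the lead comes out larger than the lead, and that is before accounting for where the undecided majority ends up.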
As we get closer to the general election, we will be able to include voters from all parties, so small sample sizes should be less of an issue. But the big variable that we have to deal with in general elections is the turnout model: which population demographics we use to construct our calling lists so that they most accurately reflect who will vote in November.