Opinion polling's record, or "Defending polling"

This year the opinion polls called our general election very, very wrong. Understandably, this has led a lot of people to lose faith in opinion polls and to say that they are unreliable and should not be trusted. I am not one of those people.

It's true that opinion polls failed us spectacularly this year, but historically their track record is actually very good. As a quick back-of-the-envelope check, I've looked at their record by comparing ICM's final polls against the actual election result in every general election since 1992 (another year the polls famously failed):

Difference between ICM final polls and election results

All data is from ICM, sourced from the Guardian datablog and UK Polling Report.

This is also available as a table here, where I have also included most of the significant elections from the last parliament. I excluded the Welsh devolution referendum, as it wasn't polled much, and the PCC elections, because no-one cares about those. The London mayoral election wasn't covered by ICM, so I included YouGov data there.
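For anyone who wants to reproduce the comparison, the arithmetic is nothing more exotic than subtracting each party's actual share from its share in the final poll. A minimal sketch of that calculation, using placeholder figures rather than the real ICM numbers:

```python
# Sketch of the poll-vs-result comparison.
# The shares below are illustrative placeholders, not the actual ICM or election figures.
final_poll = {"Con": 35, "Lab": 35, "LD": 9, "UKIP": 11}   # final poll shares (%)
result     = {"Con": 38, "Lab": 31, "LD": 8, "UKIP": 13}   # election result shares (%)

# Signed error: positive means the poll overstated the party, negative understated it.
errors = {party: final_poll[party] - result[party] for party in final_poll}

for party, err in errors.items():
    print(f"{party}: poll was {err:+d} points from the result")
```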

An opinion poll would typically have a margin of error of ±3%, given a sample size of 1000. More detail on this, and a few caveats on what I've said, are here. Most of the polls fall within this range of the actual result and can be said to be broadly "correct"; 1992 and 2015 were therefore the exceptions rather than the rule.
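To show where that ±3% figure comes from: for a proportion near 50% and a simple random sample of 1,000, the 95% margin of error is roughly 1.96 × √(p(1 − p)/n), which works out at about ±3.1 points. A quick sketch of the calculation (assuming a simple random sample, which, as discussed below, real polls are not):

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# The worst case is p = 0.5; with n = 1000 this gives about 0.031, i.e. roughly ±3 points.
print(f"±{margin_of_error(0.5, 1000) * 100:.1f}%")  # ±3.1%
```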

However, there was at least one election between 1992 and 2015 where the polls were incorrect: the 2014 European Parliament election. UKIP was understated and the Conservatives overstated, though both (just) within the margin of error. Labour, however, was overstated by 4-5% both here and in the 2015 general election polls. No-one seemed to find this portentous at the time of the European elections, likely because European elections tend to behave very differently to general elections. Put bluntly, if you extrapolate trends from them then you're going to be wrong far more often than you are right.

Beyond this, the gulf between the results as expected and the results we got is just not very dramatic. The European elections are run under proportional representation, and although the use of regional rather than national lists produces some very disproportionate results for smaller parties (to the detriment of the Liberal Democrat and Green seat tallies), it does prevent dramatic, disproportionate gains or losses for the larger parties. Labour didn't collapse where the polls said they'd gain; they just gained fewer MEPs than expected, which is not exciting or attention-grabbing. It's small wonder that people missed it, but it's interesting how a systematic flaw in the polling may already have been visible last year.

Given that polls are actually reasonably reliable, we now must face the elephant in the room. What went wrong in the general election polling?

Elephants.

Obviously the polling companies are investigating this. Their whole business is measuring public opinion; if they can't do that well then their very existence is in danger! Clearly, until the formal investigation is concluded, anything I (or anyone else) says is largely speculation, but I would urge anyone with the time and interest to read some of the early post-mortems by the pollsters themselves (YouGov, ICM, Populus, ComRes and Survation). I particularly recommend the articles from YouGov, ICM and Populus.

Some of the explanations proposed for why the polls were so wrong are that there was a late swing to the Conservatives not picked up by the polls; that there was a "shy Tory" effect, where Conservative voters were too embarrassed to admit their real voting intention to the polling companies; or that there were fundamental problems with sampling that meant the polling companies were not polling a representative sample of the British public.

The late swing hypothesis has been advocated by Survation, who released a previously unpublished poll after the election. The fieldwork for this poll was carried out shortly before the election, but it was not published at the time because the results were thought to be bizarre and probably wrong. After election day it turned out that they closely resembled the actual result, and the poll was swiftly published. That said, YouGov also ran a poll with a large sample size on the day of the election itself, and it showed no net change in voting intention, which weakens this hypothesis greatly. A large late swing is possible, but it seems highly unlikely that only one company would have picked it up.

The "shy Tory" hypothesis presents different problems. Polling companies either work by anonymous polling of panels of people over the internet, or by semi-random phone interviews. If the "shy Tory" hypothesis accounted for the errors in the polls then you would expect higher Conservative vote shares in the internet-based polls, which was never observed. In fact phone polls tended to have slightly higher shares of the vote for the Conservatives (at the cost of UKIP). Other problems with this idea are laid out elegantly in the YouGov and ComRes articles I linked to earlier.

My personal inclination is that the third hypothesis, that the opinion polls were consistently missampling the population, is the most likely to be correct. Telephone pollsters have had great problems with low response rates to their calls (a 10% response rate is typical), and internet pollsters work from panels of people who volunteered to be polled, so neither type of pollster is collecting a truly random sample of the population. The article from ICM shows the process by which they adjust their raw data from telephone interviews, and that their raw data is not any closer to the general election result than their adjusted, published data. In my view, this suggests a problem with the raw data itself: the sample.
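To illustrate what that adjustment step involves in general terms (this is a generic sketch of demographic weighting with made-up numbers, not ICM's actual scheme): respondents in under-represented groups are weighted up, and those in over-represented groups down, so that the sample matches known population shares.

```python
# Generic illustration of demographic weighting, not ICM's actual method.
# The age bands and shares below are made up purely for illustration.
population_share = {"18-34": 0.28, "35-54": 0.34, "55+": 0.38}   # assumed population targets
sample_share     = {"18-34": 0.15, "35-54": 0.35, "55+": 0.50}   # raw sample composition

weights = {group: population_share[group] / sample_share[group]
           for group in population_share}

print(weights)   # roughly {'18-34': 1.87, '35-54': 0.97, '55+': 0.76}
```

The catch is that weighting can only stretch the people you did reach: if the respondents within each group are systematically unlike the people you missed, no amount of reweighting will fix the sample, which fits with ICM's finding that their adjusted data was no closer to the result than their raw data.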

Where the conventional methods of polling failed, there have been a few hints of better predictions made elsewhere. Reportedly, Labour's private polling showed a result closer to the one we got on the night. Their polling was unusual in that it asked several questions on topical issues before the voting intention question, rather than leaving them until afterwards as is standard. However, these polls are unpublished, and some people in Labour say they never existed at all, so I'd take this story with a pinch of salt.

Another tidbit is that in the last days of the campaign SurveyMonkey, clearly not a dedicated polling company, invited random UK participants in its user-made surveys to fill out a brief poll about the election afterwards. The results were not spot on, but they were nevertheless far closer to the real result than the opinion polls were. I couldn't find information about their methodology, but I strongly suspect that the reason for their success is that their sample was simply more random and more representative than those of the opinion pollsters.

I hope that I have demonstrated that the poor performance of the polls this year was unusual, even if I cannot confidently pin down the cause of their failure. Taking a broader view of their record shows that they can have great predictive value, and if their issues are resolved then they will remain the best data we have for predicting future political events. And ultimately, a prediction of the future that isn't data-driven is nothing more than a blind guess.

If you want to keep up to date with the polls then I'd recommend visiting UK Polling Report or Political Betting. The characters of the two sites are quite different, but either will keep you well and reliably informed. You can also follow Britain Elects on Twitter.