CHAPTER 3: ASSESSING THE QUALITY OF POLLS AND SURVEYS
3.3 POLLING FOR THE 2020 US ELECTION
After the 2016 election polls predicted with high probability that Hillary Clinton would win, there was great interest in whether the polls would do a better job of predicting the results of the 2020 US election. Again, the polls predicted with high probability that the Democratic nominee, Joe Biden, would win the presidency. In the end, that turned out to be the case, with Biden obtaining 7 million more votes than Donald Trump. However, US presidential elections are won or lost in the swing states, and in many of those states, Biden’s win was not so clear-cut. The New York Times article titled “‘A Black Eye’: Why Political Polling Missed the Mark. Again.” presents the final polling averages for each state alongside the actual outcome. For example, the polling averages predicted that Biden would win Wisconsin by 10 percentage points. However, Biden ended up winning by less than 1 percentage point. This was still a win for the candidate, but it was not a win for polling. As we will begin to understand in Chapter 6, the average of all the sample averages (at least in theory) is the truth (the population average). In a polling context, this means that if the polls conducted were of good quality, the average of all the polls should have been very close to the outcome of the election.
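To make this idea concrete, here is a minimal simulation sketch (not from the text; the population and sample sizes are invented for illustration). It draws many random samples from a population in which the true level of support for a candidate is 52% and shows that the average of the sample proportions lands very close to that truth:

```python
import random

# Hypothetical population: 1 = supports candidate A, 0 = does not.
# Assume the true population proportion is 52% (an invented figure).
random.seed(1)
population = [1] * 52_000 + [0] * 48_000

# Draw many independent simple random samples and record each sample proportion.
sample_props = []
for _ in range(1_000):
    sample = random.sample(population, 1_000)  # simple random sample, n = 1,000
    sample_props.append(sum(sample) / len(sample))

# The average of the sample proportions sits very close to the truth (0.52).
print(round(sum(sample_props) / len(sample_props), 4))
```

Individual samples will miss by a point or two in either direction, but the errors cancel out on average, which is exactly what a polling average is counting on.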
This was true for polls on the national level but not so true for swing states such as Florida and Wisconsin. We will go beyond the news headlines to the source polls themselves to try to understand why the polling averages were so far from the actual outcome of the election in these key swing states. A key swing state in every election is Florida. Historically, the race in that state is often a tossup between the two major-party candidates. On election night, one of the first states on which the media focused was Florida.
If Joe Biden won Florida, he was almost certain to win the presidency. On the day of the election, the website FiveThirtyEight, which calculates polling averages, showed Biden ahead by a margin of 2.5 percentage points in Florida. However, shortly after midnight on election night, the Associated Press called the race in Florida for Trump. In the end, Trump won Florida by 3.4 percentage points, a deviation of almost 6 percentage points from the polling average. In the 2008 US election, FiveThirtyEight’s Nate Silver predicted the outcome of the election in the individual states and nationally with great accuracy by calculating polling averages. However, election polling has become far more difficult to conduct since then because of lower response rates, as Silver himself concedes in the New York Times article titled “What’s the Matter with Polling?”
The FiveThirtyEight website states that they accounted for differences in sample size and quality among the individual polls when calculating the polling averages. With so many election polls emerging every day leading up to the election, we can imagine how difficult it would be to critique every poll in depth to determine (and adjust for) its quality. FiveThirtyEight’s statistical models were built upon polls conducted by other pollsters; if those polls are of poor quality, then the statistical models for predicting the election outcome will be of poor quality. We will examine several of the polls that FiveThirtyEight used to calculate its polling average for Florida so that we can assess the quality of the polls for ourselves. As with any analysis, polling averages are only as good as the quality of the individual polls and resulting data upon which they are based.
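To illustrate the idea (though certainly not FiveThirtyEight’s actual model, which is far more elaborate), a polling average that accounts for sample size and quality might weight each poll by its sample size times a quality factor. The polls and weights below are made up for illustration:

```python
# Toy quality- and size-weighted polling average (illustrative only).
polls = [
    # (candidate's share %, sample size, quality factor: 1.0 = highest grade)
    (47.0, 1657, 1.0),
    (49.0,  800, 0.6),
    (51.0,  600, 0.4),
]

# Weight each poll by sample size times its quality factor, then average.
weights = [n * q for _, n, q in polls]
average = sum(p * w for (p, _, _), w in zip(polls, weights)) / sum(weights)
print(round(average, 1))  # the weighted polling average for the candidate
```

Notice that no amount of clever weighting can rescue the average if all of the input polls share the same systematic error.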
FiveThirtyEight provides links to the source of the polls on which their averages are based and includes a letter grade for each of the polls. The highest-graded poll (a B+) conducted in Florida immediately before the election was one from Quinnipiac University. The link to the webpage for the poll, conducted from October 28 to November 1, provided further details about how it was conducted. The poll results were 47% for Biden and 42% for Trump, based on a sample of 1,657 self-identified likely voters, with a reported margin of error of 2.4%. In the end, Trump received 51.2% of the vote, and Biden received 47.8% of the vote. Although the poll results were close to the eventual outcome for Biden, the 51.2% for Trump was well outside the margin of error. The pollsters did not provide any information regarding the 11% of voters unaccounted for in the poll’s results. Based on the election results, it is likely that most of these voters ended up voting for Trump.
The margin of error of 2.4% would be correct for this sample size if it were a random sample with a 100% response rate. The pollsters used random-digit dialing to contact likely voters on landlines and cell phones. It is highly unlikely that, over a three-day period, the pollsters selected a random sample of 1,657 self-identified likely voters and got every one of those individuals to pick up the phone and respond. The pollsters did not provide the response rate, instead stating that the sample was weighted for known population characteristics, indicating a less-than-100% response rate. After adjusting the sample for known differences between the sample and the population, the margin of error increased to 3.2%. Even accounting for this adjustment, the result of 51.2% for Trump was still well outside the margin of error. The pollsters weighted the sample by county, gender, age, education, and race. Evidently, this was not enough to bring their predictions closer to the true outcome of the election in Florida.
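As a check, the reported 2.4% matches the standard 95% margin-of-error formula for a proportion at its most conservative value, p = 0.5. The sketch below is not from the poll’s documentation; in particular, the design effect of roughly 1.8 (the variance inflation caused by weighting) is inferred because it reproduces the reported 3.2%, not a figure the pollster published:

```python
import math

def margin_of_error(n, p=0.5, z=1.96, design_effect=1.0):
    """95% margin of error for a proportion; a design_effect > 1 reflects
    the variance inflation caused by weighting a non-ideal sample."""
    return z * math.sqrt(design_effect * p * (1 - p) / n)

n = 1657
print(round(100 * margin_of_error(n), 1))                     # -> 2.4 (unweighted)
print(round(100 * margin_of_error(n, design_effect=1.8), 1))  # -> 3.2 (after weighting)
```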
The deviation of their results from the actual outcome could also be due to systematic bias in data collection or to other factors (not adjusted for in their analysis) driving the outcome of the election. In an election during which emotions are running high, it can be very difficult or impossible to know what these factors are, never mind adjust for them. How an individual will vote may not be easily predicted simply by knowing their county, gender, age, education, or race; the decision may be more personal than demographic characteristics can capture. Finally, it should be mentioned that the pollsters asked several questions of the respondents, broken down by political party, age, and gender, without reporting the margin of error for each of these subgroup analyses. When a pollster breaks down polling data by subgroups, the margin of error increases because the results are based on a smaller sample size. The pollsters should make this clear by providing the margin of error for each subgroup analysis.
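Using the same formula, here is a quick illustration of how the margin of error grows as the full sample is split into subgroups. The subgroup sizes are hypothetical, since the pollster did not report margins of error for its subgroup breakdowns:

```python
import math

def moe(n, p=0.5, z=1.96):
    # 95% margin of error for a proportion, at the conservative value p = 0.5
    return z * math.sqrt(p * (1 - p) / n)

print(round(100 * moe(1657), 1))  # full sample                  -> 2.4%
print(round(100 * moe(600), 1))   # e.g., one party's supporters -> 4.0%
print(round(100 * moe(300), 1))   # e.g., one age group          -> 5.7%
```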
[…]
The debate over what went wrong with polling in 2020 died down much more quickly than it did in 2016. In the end, Biden received 306 electoral votes, and Trump received 232. Pollsters could claim that the national polls got it right, with Biden winning 51.4% of the vote to Trump’s 46.9%.
However, US presidential elections are not decided by the national vote, so national polls do not really matter. From the polls we examined, we can see that the quality of state polls is questionable. Pollsters should focus their attention on doing a better job at the state level, because those polls really do matter. As already stated, the polling averages had Biden 10 percentage points ahead in Wisconsin, but he ended up winning the state by less than 1 percentage point. Some percentage of the electorate in Wisconsin may well have looked at the poll results and decided they did not need to vote. In the end, Biden won Wisconsin by 20,608 votes. All it would have taken was another 21,000 Biden voters sitting it out for the election in Wisconsin to have gone the other way.
As with other types of research, the quality of polls varies by pollster. Research is difficult, and polls (and surveys) have become more difficult to conduct in recent years. Low response rates have made it increasingly hard to obtain a representative sample of respondents and thus to complete a quality poll. At the same time, it has become much easier to obtain a nonrepresentative (convenience) sample of respondents through online polling. In both scenarios, the pollsters can weight their results for known differences between the sample and the population. However, these known differences (or factors) may or may not be key factors in determining how someone will vote. As already discussed, the deciding factor(s) in an individual’s choice of candidate may be very different from the factors for which pollsters can adjust.
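To see what such weighting looks like in its simplest form, here is a minimal sketch of post-stratification on a single factor (education). The population shares, the tiny sample, and the vote indicators are all invented for illustration:

```python
# Weight a sample to known population shares (post-stratification on one factor).
population_share = {"college": 0.35, "no_college": 0.65}  # assumed known from census data

sample = [
    # (education group, 1 if respondent supports candidate A, else 0)
    ("college", 1), ("college", 1), ("college", 0), ("college", 1),
    ("no_college", 0), ("no_college", 1),
]

# Each respondent's weight = population share / sample share for their group.
counts = {}
for group, _ in sample:
    counts[group] = counts.get(group, 0) + 1
weights = {g: population_share[g] / (counts[g] / len(sample)) for g in counts}

# Weighted estimate of support for candidate A (vs. 0.667 unweighted).
estimate = sum(weights[g] * vote for g, vote in sample) / sum(weights[g] for g, _ in sample)
print(round(estimate, 3))  # -> 0.588
```

The weighted estimate moves toward the opinion of the underrepresented group, but if the factor that actually drives vote choice is not among those weighted on, the estimate can still miss badly, which is precisely the point here.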
In an article titled “Key Things to Know About Polling in the United States,” the Pew Research Center discusses exactly what the title suggests. It describes the different ways in which news and polling organizations select their samples, from telephone calls to online surveys, and how these choices affect data quality. The ability to conduct polls online quickly and inexpensively has led to many firms with few or no survey credentials conducting polls. The phrase “nationally representative” is often used, but the public should ask further questions. The article points the reader to a guide published by the American Association for the Advancement of Science that lists key questions that should be asked before trusting the results of a poll, including:
- How were the questions asked?
- Was weighting applied? If so, how?
- How many people were surveyed, and what was the margin of error?
The article points out (as we saw in some of the polls we critiqued) that the margin of error presented is often an underestimate due to other types of error (besides sampling error) contained in the data: error due to nonresponse, coverage error (the entire target population did not have a chance of being selected), and measurement error. The authors point out that the actual margin of error in the study could be twice as large as what is reported.
The authors also point out that opt-in online polls tend to overrepresent Democrats, which may be one reason why we saw a strong lead for Biden in some of the polls we examined. They note that membership in the Transparency and Accountability Initiative, under which pollsters agree to provide key information about how their polls are conducted, is a good sign (but no guarantee) that a poll was well conducted. We saw no reference to this initiative in any of the polls we examined. Finally, they point out that polling errors can be correlated across states with similar demographic characteristics. This was the case in the 2016 election and certainly seemed to be so in 2020. In 2016, the overlooked factor was more non-college-educated White voters voting for the Republican candidate in battleground states than in previous elections. Again, in the 2020 election, there were factors unaccounted for in the poll results in key states that predicted a sizable win for Biden when that outcome turned out not to be the case. What those factors were is debatable, and it would be interesting to know.
As already stated, interest in identifying those factors and in what went wrong with the polls died down much more quickly in 2020 than in 2016. In the end, the polls predicted that Biden would win, and that was the outcome. Polling is big business for the media and for pollsters, so it is good for both that polling lives to see another day. However, a little more reflection on what went wrong in the 2020 US election polls would have been good for polling. If no reflection is done regarding poll quality in key swing states, a rude awakening may occur again in future elections, as it did in 2016. Poll results can drive people’s decisions about whether to vote at all, with the Wisconsin polls being a good example. Polls are good for business, but poor-quality, misleading polls are bad for our democracy.
To summarize, the power of a random sample lies in the fact that we expect the sample to be representative of the population in every possible way. Whatever factor (or factors) drives a group of individuals’ decision-making (when deciding whom to vote for in a general election) should be reflected (at least in theory) in a random sample. That is what makes a random sample an extremely powerful mechanism for getting at the truth in a population. Adjusting convenience samples through weighting for known characteristics (or factors) of the population may be helpful, but not necessarily. Again, in a US election, why an individual or a group of individuals votes for a presidential candidate may have very little to do with the county they are from or their gender, age, education, or race. The reason someone votes for a candidate can’t always be determined from that person’s demographic characteristics. Adjusting for these characteristics may increase the pollster’s probability of making a good guess, but it is no guarantee of being correct. The more known factors for which a pollster can adjust, the higher the probability their predictions will be accurate.

In their analyses, the research firm Gallup adjusts for eight factors, and the Pew Research Center adjusts for 12 factors. Pew Research also selects its sample of respondents using what it calls the American Trends Panel (ATP) survey methodology. Begun in 2014, the ATP is a concerted effort by Pew Research to obtain (and maintain) a representative sample of Americans who are willing to take part in various surveys. Pew Research completes extensive weighting to ensure its sample data are as representative of the population as possible. In the next section, we will discuss two surveys conducted by Pew Research related to political polarization.
The quality of polls (as with any type of research) depends on the quality of the data collected. Unfortunately for polling, collecting quality data is becoming increasingly difficult. For many pollsters, the methods of selecting samples (online, telephone, text message) and the subsequent weighting still end up producing a poll result far from the outcome of the election. In March 2021, FiveThirtyEight’s Nate Silver announced that FiveThirtyEight would no longer take the methods used by pollsters into account when grading a particular poll. Instead, it would focus on grading pollsters by their track record, or, in other words, how close the pollsters’ results were to the outcome of a particular election. This approach certainly makes the job of grading polls much easier when deciding which polls to include in statistical models for calculating averages. However, it is a questionable approach, and it remains to be seen how well it will work. The use of good statistical methods should result in good estimates of the truth in the population. However, if the data is of poor quality, then the results will be of poor quality. There is only so much that weighting for known confounding factors can do when the data is of poor quality. The real issue with polling is not the methods used for analyzing the data but the quality of the data itself. Unless pollsters figure out how to collect better quality data, polling will continue its decline as a means of pursuing the truth in the population.