In March of 2016, the American Statistical Association (ASA), “the world’s largest professional association of statisticians” (5) 1, took an unprecedented step: it issued a statement (“ASA Statement on Statistical Significance and P-values”), which was published online under the auspices of one of its publications, The American Statistician (TAS), on the “proper use and interpretation” (7) of a certain statistical measure – the “p-value”. For those of you who escaped the blissful world of statistics in high school or college, “p-value” is short for “probability value”, a numerical index that practitioners rely on to reach a conclusion based on the data they are analyzing. Generally speaking, we use probabilities as a measure of uncertainty: the probability of coming up Heads on the flip of a fair coin is said to be 0.5 or 50%; if you decide to play the lotto (SuperLotto Plus) in my home state of California, there is roughly one chance in 40 million that your number will come up; in other words, you’re nearly certain to lose – but, still, there is a chance, however infinitesimal, that you will win. (NOTE: This is not an endorsement of games of chance.)
The pronouncement, whose target audience is “researchers, practitioners and science writers who are not primarily statisticians” (3), was unheard of because never before in its long history (the association was founded in 1839) had the ASA told practitioners how to use any statistical technique or methodology. (In the interest of full disclosure, I should let the reader know that I am a member of the ASA.)
How did this all come about? An introduction to the ASA statement, written by the association’s executive director, Ron Wasserstein, and the editor of TAS, Nicole Lazar, provides some background information. Its purpose is to explain what led the association’s board of directors to make the statement and the process that led to its publication. Wasserstein and Lazar identify two areas of concern that “stimulated” (1) the board’s response: i) a recent, “highly visible” (1), and ongoing discussion within scientific journals on the questionable use of statistical methods in the process of scientific discovery; ii) the reproducibility and replicability “crisis” (2) in science.
Regarding the first issue, Wasserstein and Lazar quote several sources that talk about statistics and its “flimsy foundation” (1), its “numerous deep flaws” (1), etc., on one side; and on the other, the defenders of statistical methods who claim that the problem is not statistics, but that a lot of data analysis is done by people who are not “properly trained [my emphasis] to perform” (1) it. The second area that spurred the ASA board to action is described thus: “The statistical community has been deeply concerned about issues of reproducibility and replicability of scientific conclusions.” (2) A failure of “reproducibility and replicability” means that nobody else can arrive at the “scientific conclusions” the original researchers presented. For example, some readers may remember the “cold fusion” episode, back in 1989. Here was a case where two University of Utah researchers made a claim to scientific discovery, but no one else in their field was able to replicate their findings.2 The “reproducibility and replicability crisis” has been brewing for a few years, but seems to have come to a head in August 2015 with an article in the journal Science, which found that although 97% of the original studies it scrutinized (all in the field of psychology) reported “statistically significant” results, only 36% of the replications did (p.944).3 In the ASA’s view, this creates “much confusion and even doubt about the validity of science” (2), and the “misunderstanding or misuse of statistical inference” (2) is partly responsible for this situation.
The authors tell us that “the Board envisioned that the ASA statement on p-values and statistical significance would shed light on an aspect of our field that is too often misunderstood and misused in the broader research community, and, in the process, provide the community a service.” (3) At the Board’s behest, Wasserstein assembled a “group of experts representing a wide variety of points of view” (3) to complete this task. Wasserstein and Lazar report that the “statement development process was lengthier and more controversial than anticipated.” (4) They also assure us that “nothing in the ASA statement is new. Statisticians and others have been sounding the alarm about these matters for decades, to little avail.” (5) They expressed the hope that the statement “would open a fresh discussion and draw renewed and vigorous attention to changing the practice of science with regards to the use of statistical inference.” (5)
The ASA’s message
Given this array of dire circumstances, e.g., some researchers throwing the “p-value” into the “dustbin of history” (more on that later), one could expect a statement full of vim and vigor, breathing fire and brimstone, and mounting a vigorous defense of one of the cornerstones of the “science of statistics”. But no. Instead, we are regaled with a pronouncement couched in very mild language whose ambition is to clarify “several widely agreed upon principles underlying the proper use and interpretation of the p-value” (6-7). So, for example, it tells us that “the p-value can be a useful statistical measure”: hardly a ringing endorsement, but neither is it a recommendation to discard it. The ASA statement is divided into five sections: Introduction; What is a p-value?; Principles; Other Approaches; and Conclusion. When Wasserstein and Lazar in their Introduction (1-6) state that “[n]othing in the ASA statement is new” (5), they are not kidding. The content of the Principles section of the statement reads, for the most part, like something students taking their first introductory course in statistics would find in a widely used textbook like David S. Moore’s The Basic Practice of Statistics (New York: W.H. Freeman, 1995; now in its seventh edition): for example, failing to reject the null hypothesis (H0) does not mean you have proved it to be true; rejecting the H0 does not mean it is false or that your research hypothesis (symbolized as H1) is true; statistical significance (i.e. rejecting the H0 and concluding in favor of H1) is not necessarily the same as substantive or clinical importance; etc. The beginner in statistics is, of course, entering a new cultural realm; as with any other practice, such as learning to be a chef, a crane operator, an automotive technician, or a brain surgeon, it is a process of socialization, i.e. a process that inculcates the rules and behavior that are considered appropriate within that culture.
Thus, all these prescriptions are rituals used to induct you into the scientific culture of inferential statistics: the student learns the norms that define the “proper use and interpretation of the p-value” (8).
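To make one of those textbook principles concrete, here is a minimal sketch of why statistical significance is not the same as substantive importance. The numbers are entirely hypothetical (a 0.5-point difference between two groups on a scale whose standard deviation is assumed to be a known 15), and the function name is my own; it uses a simple two-sample z-test rather than any procedure taken from the ASA statement.

```python
import math

def two_sided_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a two-sample z-test (equal group sizes, known sd)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # P(|Z| >= |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical data: the same tiny 0.5-point difference, two very different sample sizes.
small_study = two_sided_p_value(mean_diff=0.5, sd=15.0, n_per_group=100)
huge_study = two_sided_p_value(mean_diff=0.5, sd=15.0, n_per_group=100_000)

print(f"n = 100 per group:     p = {small_study:.3f}")
print(f"n = 100,000 per group: p = {huge_study:.2e}")
```

The difference between the groups is identical in both runs, and arguably trivial in both. Only the sample size changes, yet the small study is nowhere near “significant” while the huge one is overwhelmingly so. The p-value, by itself, says nothing about whether a difference matters.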
The statement claims that “misuses and misconceptions concerning p-values” are “prevalent” (11) in “the broader research community” (3) among those “who are not primarily statisticians”. It also states that “some statisticians prefer to supplement or even replace p-values with other approaches” (11), thereby encouraging “the broader research community” to do the same. One of the “other approaches” mentioned is “confidence intervals” (11). I would wager that there are just as many “misuses and misconceptions” (perhaps more “misconception” than “misuse”) of “confidence intervals” among “the broader research community” as there are concerning p-values. Take just one example, from a scholarly book first published in 2012, a compendium of articles on a specific topic. (The book will remain nameless, as will the author, and the quote has been modified to ensure anonymity. I do not wish to embarrass anybody; after all, errare humanum est, and I’ve done plenty of that myself, so I am hardly in a position to cast the first stone.) The article in question, suitably altered, states: “Swedish public approval for paternity leave is 67% ±3 percentage points. (…) [I]n repeated samples, we would expect the true level of public support for paternity leave to fall between 64 percent and 70 percent in 95 out of 100 samples.” Clearly, this illustrates a misinterpretation of the concept of “confidence interval” (the true level of support is a fixed quantity; it is the interval that varies from sample to sample), but the editors of the book did not catch it in time. However, the second edition, published four years later, corrects the mistake: “if the survey were repeated many times, 95 percent of the samples of this size would be expected to produce a margin of error that captures the true percentage of Swedes supporting paternity leave.” Therefore, advocating the use of confidence intervals in lieu of p-values does not seem to be much of a solution.
Obviously, supplementing the p-value with a confidence interval would not satisfy, one would think, those who advocate its abandonment.
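For readers who like to see such things for themselves, the corrected reading of the confidence interval can be checked with a small simulation. This is only a sketch under assumptions of my own: I pretend the true level of support is exactly 67%, that each survey is a simple random sample of 1,000 people, and that the usual normal-approximation interval is used. None of these details come from the book in question.

```python
import math
import random

def coverage_of_95_percent_intervals(true_share, sample_size, n_surveys, seed=0):
    """Fraction of simulated surveys whose 95% interval contains the true share."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_surveys):
        # Draw one simple random sample and count the supporters in it.
        supporters = sum(rng.random() < true_share for _ in range(sample_size))
        estimate = supporters / sample_size
        # Normal-approximation margin of error for a proportion.
        margin = 1.96 * math.sqrt(estimate * (1 - estimate) / sample_size)
        if estimate - margin <= true_share <= estimate + margin:
            covered += 1
    return covered / n_surveys

# Hypothetical setup: true support fixed at 67%, 2,000 repeated surveys of 1,000 people.
coverage = coverage_of_95_percent_intervals(true_share=0.67, sample_size=1000, n_surveys=2000)
print(f"Share of intervals that captured the true value: {coverage:.3f}")
```

The true value never moves; it is 0.67 in every run. What changes from survey to survey is the interval, and roughly 95 out of 100 of those intervals end up containing the fixed truth. That is the statement the second edition makes, and it is not the same as saying the truth “falls” between 64 and 70 percent in 95 samples out of 100.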
The ASA statement is by no means condoning the banishment of the p-value – nor is the ASA likely to do so in the future. This methodology has been with us for nearly a century and has been used, correctly or not, in multitudes of studies in a variety of disciplines that all harbor the science label. It is an elaborate scheme that has been the centerpiece of statistical practice and is based on the work of heavy hitters like Ronald Fisher (1890-1962), Jerzy Neyman (1894-1981) and Egon Pearson (1895-1980).
Another of the “other approaches” (11) mentioned in the statement is Bayesian statistics. This is a methodology that lost out to what is often referred to as the frequentist school (Fisher, and Neyman-Pearson) back in the 1930s and 40s. These are the two major schools of inferential statistics. “Lost out” does not mean the Bayesian approach is without its aficionados: in fact, it has been used by a substantial minority in the statistical community starting in the 1950s. But it has always been treated as a second-class citizen in the world of statistics: most textbooks, introductory and beyond, teach the frequentist creed (null hypothesis testing and the p-value), and (just about) all the commercial software packages are programmed along that same doctrine. It is why you are more likely to be assigned as an introductory textbook the one mentioned earlier by David S. Moore, or one by Mario Triola (Elementary Statistics), rather than one by Donald A. Berry (Statistics: A Bayesian Perspective). But Bayesianism is not without controversy either…
Statistical testing by means of the null hypothesis (NHST, hereafter) and its resulting p-value is one of the cornerstones of knowledge production in many sciences. Back in 2001, a prominent statistician could write: “hypothesis testing has become the most widely used statistical tool in scientific research” (David Salsburg, The Lady Tasting Tea, p.114). Controversy about this approach is by no means new: it is almost as old as the methodology itself, and it has been going on ever since. That hasn’t stopped the majority of statistics users from relying on this approach (“that’s what we’re taught; so we do what we’re taught”). So why did the ASA feel that it had to come out with a statement at this time in the history of its discipline? In other words, why this sudden urge on the part of the ASA to intervene on a topic that has been contentious for decades? I’d like to suggest a few items that might help make sense of the ASA’s action, and the tone and contents of its statement. I’m sure there are many more, but these come to mind immediately and seem to me to be important.
First, the dominance of the frequentist school is being challenged; not so much by means of debates between the two camps (those have been going on for years), but by a sort of critical-mass effect favoring the Bayesian school of thought: i.e. an increasing number of practitioners are adopting that approach. It seems to me that the frequentist creed has lost the hegemonic position it occupied for so long in the field of statistics. As David Salsburg states in his very informative history of statistics in the 20th century: “By the end of the twentieth century, [Bayesian statistics] had reached such a level of acceptability that over half the articles that appear in journals like Annals of Statistics and Biometrika now make use of Bayesian methods” (op. cit., 129-130).
The message from the ASA statement is: there is more to the practice of statistics than NHST. This is reflected in the composition of the panel of experts put together by the ASA. At first glance, and as best as I can judge, it appears to me that there is a good mix of different schools of statistical practice, including Bayesians and frequentists, which may also explain the mild language I referred to earlier. The “wide variety of points of view” (3) represented by the group of experts ensured that the ASA statement would be reflective of that diversity, thus requiring compromise (“The statement development process was lengthier and more controversial than anticipated” (4)). As the introduction by Wasserstein and Lazar reports, the statement went through “multiple drafts” (4). The ASA statement clearly shows that nobody got the upper hand: there’s NHST of course, but there are also “other approaches” just as important.
I have mentioned earlier that the contents of the Principles section of the ASA statement reads like something you would find in an introductory statistics textbook. Did the ASA believe that the reaffirmation of these principles would put an end to the “misunderstanding or misuse of statistical inference” (2); that “the proper use and interpretation of the p-value” (8) would blossom as a result? After decades of the inculcation of thousands upon thousands of students in a variety of scientific disciplines into the ways of NHST, the misuse and misinterpretation of the p-value persist. It is not the reiteration of rules that folks have been taught in their first class of statistics (and repeated thereafter) that is going to change that situation. So what’s the point? I would say that the ASA statement is primarily a symbolic act. Of course, misusers and misinterpreters of statistical inference are not going to see the light suddenly after reading the Principles section of the statement. What the ASA is doing is asserting its jurisdiction over the field of statistics. It is telling those “who are not primarily statisticians” (3) that it alone has the authority to determine what qualifies as the “proper” use of statistical tools. It is not for non-statisticians to decide what in statistics should or should not be discarded. As the Introduction to the ASA statement says: “Though there was disagreement on exactly what the statement should say, there was high agreement that the ASA should be speaking out about these matters” (5). In other words, despite the diversity and discord within the ASA, it stands united when it perceives that the discipline itself is being challenged. Certainly one of the functions of a professional organization is to uphold and maintain the good reputation of its area of activity.
Although the ASA statement assigns responsibility for the “crisis”, it conveniently eschews the issue of ultimate cause. It piously and wishfully states “that the scientific community could benefit from a formal statement clarifying several widely agreed upon principles underlying the proper use and interpretation of the p-value” (7-8). In the Introduction to the statement, the authors bemoan that “our field (…) is too often misunderstood and misused in the broader research community” (3). Thus the blame is placed squarely, although indirectly, on those “who are not primarily statisticians” (3) – who are, I would submit, the vast majority of statistics users. I would also venture to guess that the advent of the personal computer and the development of off-the-shelf commercial statistical software have made access to statistical tools relatively easy, and, as a result, have substantially increased the community of those “who are not primarily statisticians” – thereby making misuse and misinterpretation that much more likely. Statistics is a unique field in that its products are used mostly by non-statisticians. Any time researchers, whatever their discipline, have collected quantitative data, they are most likely to make use of the tools supplied by statistics. As Neyman said, back in 1955, statistics is the “servant of all sciences.” In effect, the ASA statement is deflecting responsibility away from its own discipline and telling non-statisticians: “There is nothing wrong with our tools. We know how to use them properly. It is you, non-statisticians, who misunderstand and misuse our products.”
Let’s look more closely at the way the ASA statement frames this controversy. To counter those in the “broader scientific community” (3) who question the very usefulness of statistics (“it’s got more flaws than I’ve had hot dinners” – a Rumpolian version of Tom Siegfried’s assertion for the reader who is not Facebook-savvy)4, it reiterates the key role statistics has in the production of scientific knowledge; but, more importantly, it identifies the individual user, who is not primarily a statistician, as a causal agent in this crisis. What are the implications? First, as already mentioned, it locates the problems outside the discipline of statistics: a) there is nothing wrong with the corpus of statistical knowledge; b) statisticians know how to use the tools of their trade, and know their limits. Some who are not primarily statisticians do not. Perhaps, another delicately hidden message is that “the broader research community” (3) should rely more heavily on statisticians instead of trying to go at it alone and making a mess of it.
The ASA statement tells us: “Statisticians and others have been sounding the alarm about these matters for decades, to little avail.” (5) So, are these statistics users deviants persisting in their deviant ways? After all, they don’t follow the statistical rules, and as a result give statistics a bad name, and impede scientific progress. No. There is a difference between breaching an ethical or moral norm, and failing to follow technical prescriptions. Although the latter might give rise to chastisement and charges of incompetence, it does not bring about the indignation, the moral indignation, that the former would. This also explains the tone of the ASA statement: no “fire and brimstone”. Oftentimes, violations of ethical norms call for punishment. For example, if a researcher falsifies his data and is discovered, punitive action is likely to follow: his published paper will be retracted, he might be stripped of an academic title (e.g. “PhD”), he might lose his job, and he might even be sued in a court of law. Not so for the misuse or misinterpretation of a technical rule. It might cause the user some embarrassment, and she might be reprimanded; and if this misuse or misinterpretation happens to be published, it will be corrected, often quietly (i.e. without a corrigendum or erratum) at the first opportunity by a vigilant editor, as in the confidence interval example given earlier. Unlike the moral violator or deviant, the violator of a technical norm is not seen as somebody who does so willfully. The latter will likely be glad to mend his or her ways; not the former, who will have to be coerced into submission. The ASA has no enforcement power over honest “misuses and misinterpretations”, especially if these are perpetrated by non-members.
Perhaps, an additional implicit message from the ASA statement is that researchers outside the statistical community are not being trained in the proper use of the methodologies provided by the field of statistics. In other words, this has to do with showing the technical norms-violators the correct path: “changing the practice of science with regards to the use of statistical inference.” (5)
One important ingredient for a “problem” to get noticed (or, perhaps more precisely, for an issue to be elevated to the status of “problem” or “crisis”) and, eventually, acted upon, is for it to be covered by the mainstream media.5 The ASA statement says as much when it refers to the “highly visible [my emphasis] discussions over the last few years” (1). What it does not tell us is that until recently the controversy over NHST was confined within specific disciplines like psychology (a big consumer of statistics), sociology, epidemiology, etc., and, lest we forget, statistics. In other words, the disputes regarding that topic were happening largely behind closed doors, so to speak. But in this latest round, the controversy spilled into the general scientific media – it made it into journals that are the flagships of the scientific community: Science and Nature. As long as the NHST controversy was limited to the pages of the American Journal of Epidemiology, the Journal of Experimental Education, Quality & Quantity, or the American Sociologist, not to mention the pages of statistical journals, “nobody” really paid attention to it. But when it got splashed all over the pages of the very prestigious mainstream scientific publications mentioned, it could no longer be ignored. It became incumbent upon the ASA to intervene: it could not sit back and let the discipline be bandied about.
And then, there is data science – that topic in itself deserves a post. Data science is encroaching upon the territory traditionally occupied by the discipline of statistics. Its emergence is a rather recent phenomenon. It took many statisticians by surprise. Isn’t statistics the science of data? Statistics is often defined as the study of the methods for collecting, processing, analyzing, and interpreting quantitative data. Or more succinctly, as David Moore puts it: “Statistics is the science of learning from data.” In 2013, the president of the ASA asked in an editorial in the association’s monthly magazine: “Aren’t we data science?”6 Somehow some folks outside the field of statistics (e.g. computer science) discovered an area of data that they believed statistics could not deal with: big data! As big data and data science were commanding the attention of the Obama administration, major institutions (e.g. National Science Foundation, National Institutes of Health), and the media, traditional statisticians were being bypassed and ignored, and felt they were being left behind. Data science appeared to portray itself as an independent field, not as a specialty within the discipline of statistics, like biostatistics, for example. It seemed to be questioning the authority, and hence the legitimacy, of statistics. This threat to statistics’ customary bailiwick was taken seriously by the ASA leadership, and it responded quickly (although some ASA members would argue “not quickly enough”) based on the principle “if you can’t beat them, join them.” Well, not quite; it was more like: let’s try to co-opt them into our fold (“bridging the ‘disconnect’”).7 Thus, ASA members, suddenly, saw the expression “data science” pop up all over the place.
For example, the ASA journal that started publication in 2008 (before the “data science” craze) under the title Statistical Analysis and Data Mining has now been given (as of the beginning of this year) the subtitle “The ASA Data Science Journal”. Our sisters in statistics who attended a “Women in Statistics” conference two years ago will now (in 2016) attend a conference called “Women in Statistics and Data Science”, and may well wonder if there is not redundancy in this new title. Did the women in statistics forget to invite the women in data science back in 2014? Or did the women in statistics assume that “Women in Statistics” was an all-inclusive title (i.e. by definition women in statistics were doing data science, what else?)?
The reader may well ask: what does all this have to do with the price of fish? Or, more appropriately, what does the emergence of data science and big data have to do with the ASA’s statement on p-values? I hope the reader will forgive me for the platitude, but everything happens in a context. What I am suggesting is that the brouhaha about data science and big data is part of the wider context in which the field of statistics has had to withstand some serious probing. For example, two economists, back in 2008, wrote an entire book arguing that “Statistical significance is not a scientific test” (p.4).8 In 2015, the editors of one social science journal, Basic and Applied Social Psychology (BASP), let their readers and potential contributors know that both NHST and confidence intervals would be banned from their publication.9 In their 2016 editorial, the editors of the same periodical lament the fact that “many researchers continue to believe that p remains useful for various other reasons” (p.1).10 In other words, and in direct contradiction of the ASA statement, which would come out a month later, the BASP editors don’t believe the p-value to be a “useful statistical measure”. As a result of this worrying environment, the ASA has had to step in and defend its turf. Its statement is a declaration in defense of the integrity of the discipline and an affirmation of the organization’s jurisdiction over matters statistical.
In summary, I see the ASA statement as primarily a symbolic act that came about as a result of the hostility against statistics expressed within the scientific community in the past few years. More precisely, it is a symbolic act under the guise of being instrumental. The instrumental guise part of the statement consists in declaring that by reiterating principles widely accepted in the statistical community “the conduct or interpretation of quantitative science” could be improved. (8) The statement seeks to defend the integrity and the value of the discipline, and reaffirms its central role in the production of scientific knowledge; it reasserts the ASA’s authority over matters statistical; it establishes a clear boundary between statisticians and non-statisticians and asserts that the latter misuse the tools provided by the field of statistics, and, consequently, that they, not statistics, are one of the causes for the replicability and reproducibility crisis.
1 All numbers in parentheses refer to the pages of the document published in TAS: “ASA Statement on P-Values and Statistical Significance” (Ronald L. Wasserstein & Nicole A. Lazar (2016): “The ASA’s statement on p-values: context, process, and purpose”, The American Statistician). It can be accessed freely at the following page: http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108. To learn more about the ASA go to http://www.amstat.org/ASA/about/home.aspx.
2 A failure of “reproducibility” refers to the inability to redo the original data analysis despite being given the data and the analytic procedures followed by the original researchers.
3 Brian Nosek, “Estimating the reproducibility of psychological science”, Science, 28 August 2015, http://science.sciencemag.org/content/349/6251/aac4716.
4 Siegfried, T. (2014), “To make science better, watch out for statistical flaws,” ScienceNews, available at https://www.sciencenews.org/blog/context/make-science-better-watch-out-statistical-flaws.
5 By “mainstream media” I mean, as will become clear shortly, that of the scientific community, although the controversy was also given some press in the mainstream lay media.
6 Marie Davidian, in Amstat News, July 2013, pp. 3-5.
7 “The ASA and Big Data”, Nathaniel Schenker, Marie Davidian, and Robert Rodriguez, in Amstat News, June, 2013, p. 4.
8 Deirdre Nansen McCloskey and Steve Ziliak. The Cult of Statistical Significance: How the Standard Error Costs Us Jobs, Justice, and Lives; Ann Arbor, MI: University of Michigan Press, 2008.
9 David Trafimow & Michael Marks (2015) Editorial, Basic and Applied Social Psychology, 37:1, 1-2.
10 David Trafimow & Michael Marks (2016) Editorial, Basic and Applied Social Psychology, 38:1, 1-2.
Although the events I relate in this post took place more than a year ago, the topic of the controversy (the use of opt-in online panels for election polling purposes) is still very much current, especially at this time of electoral contests, when we are likely to see both successes and blunders (recall the recent 2015 UK parliamentary elections).
On July 25, 2014, the New York Times (NYT), and its polling partner CBS News (CBS), made an announcement that “rocked the polling world” (Washington Post, 07/31/14). The news organizations reported that they had retained YouGov to conduct their polls for the upcoming November midterm elections. The remarkable part was that the polling house is one that bases its polls on an Internet panel, meaning folks who volunteer to take a survey from time to time. This represented a departure from NYT/CBS’s traditional approach: in the past they relied on polls that used telephones and random-digit-dialing (RDD) to reach respondents. RDD is regarded as the “gold standard” when it comes to polling and survey research by telephone because it conforms to the statistical theory of probability (or random) sampling. In contrast, Internet panels do not fit this theory because panel members are not selected at random; they self-select to be part of the panel.
The NYT indicated that their polls would be based on “an online panel of more than 100,000 respondents nationwide” (NYT, 07/27/14). It attributed its choice to work with YouGov to the fact that “declining response rates may be complicating the ability of telephone polls to capitalize on the advantages of random sampling” (id.). In the same article, it acknowledged both the limitations of working with an online panel (“only the 81 percent of Americans (…) use the Internet”), and YouGov’s less than perfect estimates in the 2012 election (it “underestimated President Obama’s share of the Hispanic vote in 2012”). However, YouGov’s results, it affirmed, “are broadly consistent with previous data on the campaign” (id.). It also cited the serious problem that has plagued telephone sampling: “Only 9 percent of sampled households responded to traditional telephone polls in 2012, down from 21 percent in 2006 and 36 percent in 1997, according to the Pew Research Center” (id.).
In a piece dated 10/05/2014, the NYT stated, perhaps to placate those who criticized their use of the YouGov panel, “The YouGov online surveys are being used to supplement, not replace, the Times’s traditional telephone polls.” It went on to explain that the NYT/CBS “political and social surveys are conducted using random digit dialing probability sampling,” and that the “YouGov data is used for The Upshot election forecasting model in key congressional races and Senate battleground states.”
The event described above was considered “a very big deal in the survey world” by the Pew Research Center’s director of survey research, Scott Keeter 1. Days after the NYT/CBS revelation, the American Association for Public Opinion Research (AAPOR) issued a statement (08/01), signed by its then president, Michael Link, expressing its “concerns” regarding the use of “opt-in Internet” surveys 2. AAPOR is a professional organization that brings together polling and survey research practitioners who work in the private sector, in government, and in academia. (In the interest of full disclosure, the reader should know that I am a member of this organization.) As such, one of its responsibilities is to police what is done in the polling industry. AAPOR chastised NYT/CBS: first, for using an Internet panel to report on an electoral contest, because this method of selecting a sample has “little grounding in theory”; second, for a lack of “transparency” regarding how the news organizations arrived at the results they published. As for this last point, the statement read: “While little information about the methodology accompanied the story, a high level overview of the methodology was posted subsequently on the polling vendor’s [i.e. YouGov] website. Unfortunately, due perhaps in part to the novelty of the approach used, many of the details required to honestly assess the methodology remain undisclosed.”
AAPOR rebuked the NYT for abandoning its high standards in matters of polling, and only telling its readers that “the old standards were undergoing review”. It also insisted that “standards need to be in place at all times.” In addition, it criticized the Times for publishing a story (NYT 05/20/2014) that reported on a study whose respondents were recruited by means of ads on Facebook. It warned that “using information from polls which are not conducted with scientific rigor in effect sets a new–lower–standard for the types of information that other news outlets may now seek to report.”
While acknowledging that the “world of polling and opinion research is indeed in the midst of significant change”, in so far as data collection, it warned that “the use of any new methods [should] be conducted within a strong framework of transparency, full disclosure and explicit standards.”
Reactions to the Reaction
Many individuals had their say about AAPOR’s statement. I will concentrate on two of the more notable (and accessible) ones – in my view. Although, predictably, there were two types of reactions to AAPOR’s announcement, for and against, the ones presented here are of the negative variety.
One response (08/05) came from a longtime member of the organization, Reg Baker, on his personal blog: The Survey Geek 3. He has been part of AAPOR’s leadership, having been, among other positions, a member of its executive council. The title of his post says it all: “AAPOR gets it wrong.” What did it get wrong?
He writes: “We have well over a decade of experience showing that with appropriate adjustments these polls are just as reliable as those relying on probability sampling, which also require adjustment.” He adds: “There is a substantial literature stretching back to the 2000 elections showing that with the proper adjustments polls using online panels can be every bit as accurate as those using standard RDD samples.” Presumably Baker’s remark was in response to AAPOR stating: “we are witnessing some of the potential dangers of rushing to embrace new approaches without an adequate understanding of the limits of these nascent methodologies.” So what Baker is saying is that AAPOR is wrong on two counts: online polling is not new, and we do have “an adequate understanding of [its] limits.”
AAPOR is also wrong, Baker believes, when it says that YouGov did not provide sufficient details regarding its methodology. On the contrary, Baker asserts: “The details of YouGov’s methodology have been widely shared, including at AAPOR conferences and in peer-reviewed journals.”
He says he agrees (partially) with AAPOR on one point. The NYT, he opines, did “an exceptionally poor job of describing [the decision to use online panels] and disclosing the details of the methodologies they are now willing to accept and the specific information they will routinely publish about them. Shame on them.” But he faults AAPOR for not providing practitioners with “a full set of standards for reporting on results from online research,” despite the fact that this methodology has been around for nearly two decades and is widely used by researchers around the world. One should note that Baker was chair of a 2010 AAPOR task force on opt-in online panels. One might ask: would that not have been a good opportunity to devise “a full set of standards for reporting on results from online research”? But AAPOR’s Executive Council made it very clear that it was not in the task force’s mandate to do so. Nevertheless, the task force did give one recommendation regarding the reporting of survey results based on the opt-in methodology: that surveys based on opt-in or other self-selected samples should not report a “margin of error” as this is not appropriate for non-probability samples.
A more strident reaction, at least in its second formulation, came from a Columbia University professor of political science and statistics. At first the good professor, Andrew Gelman is his name, in a blog called “The Monkey Cage” (?), a regular feature in the Washington Post, provided a response, with his colleague David Rothschild of Microsoft, in the best tradition of polite academic dialog 4. The authors’ post, “Modern polling needs innovation, not traditionalism”, was a model of moderation and reasonableness. In it, they gave AAPOR an emphatic reverential bow, calling it “a justly well-respected organization”, and warned their readers that they were not “disinterested observers” since they collaborate with YouGov on a number of projects. They found AAPOR’s statement, although “undoubtedly well-intentioned”, “so disturbing”. Why? Because, the authors believe, AAPOR’s “rigid faith in technology and theories or ‘standards’ determined in the 1930s” is “holding back our understanding of public opinion” and “putting the industry and research at risk of being unprepared for the end of landline phones and other changes to existing ‘standards’.” Like Baker, the authors point out that YouGov’s methodology has been widely discussed in professional meetings and in peer-reviewed journals. In their view, the theory behind YouGov’s methodology is “well-founded” and “based on the general principles of adjusting for known differences between sample and population.” They add: “If anything, people on the cutting edge of research are not hiding anything; on the contrary, we are fighting hard to overcome entrenched methods by being even more diligent and transparent.”
Although not generally known, academics are human too. And, like any other member of the species, they are prone to the occasional bile-spilling. This is what happened in Gelman’s second formulation of his response to the AAPOR missive posted (08/06) on his personal blog, which rejoices under the name of “Statistical Modeling, Causal Inference, and Social Science”. The article is titled (hold on to your hats) “President of American Association of Buggy-Whip Manufacturers takes a strong stand against internal combustion engine, argues that the so-called ‘automobile’ has ‘little grounding in theory’ and that ‘results can vary widely based on the particular fuel that is used’” 5. The professor directed his ire against Michael Link. He accuses Link of having an “anti-innovation” attitude, of “making things up” to support “his” position, of “talking out of his ass” (no, I’m not making this up; go check for yourself), and of “aggressive methodological conservatism” – apparently, the latter must emit some putrid odor since it seems to have occasioned (twice) a desperate search for the vomit bag – as he reports that it “just makes me want to barf” (no, I’m still not making this up). (Fortunately, our somewhat indisposed professor did make a few substantive points – I will come to that in a moment.) In a blog post later in the year (12/09: “Buggy-whip update”), he tells his readers that six days after the posting just mentioned, he sent a personal email to Link asking him to explain “his” (i.e. AAPOR’s) statement of August 1 6. He received no response. Somewhat miffed, the professor writes: “I get frustrated when people don’t respond to my queries.” Tell me about it! Now, it seems to me that it doesn’t take a very sophisticated statistical model to predict that the probability of receiving a response given Gelman’s August 6 post is much closer to zero (0=no response) than to one (1=response).
Now to the substance. Gelman makes the point that there really is no difference between a “probability” sample that has a response rate of 10% and an opt-in Internet panel – both are self-selected samples. In either case, in order to estimate what it is you are trying to estimate (e.g. the percentage a political candidate will receive), you “have to do some adjustment to correct for known differences between sample and population,” and in the process “make assumptions”. The methodology is “not new”, he says, and “a lot of research” has been done on these issues. Regarding the latter, he mentions the work of Roderick Little, an expert in the statistics of “missing data”.
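For readers who wonder what “adjustment to correct for known differences between sample and population” can look like in practice, here is a minimal sketch of the simplest textbook version, known as post-stratification. I am not suggesting that this is what YouGov or Gelman actually do (their models are considerably more elaborate); the function name, the two age groups and all the numbers are made up for illustration. The key assumption, and it is a big one, is that within each group the people who answered resemble the people who did not.

```python
def poststratify(sample_means, sample_counts, population_shares):
    """Return (raw, adjusted) estimates of a candidate's support.

    raw      : each group counts according to its share of the sample
    adjusted : each group counts according to its known population share
    Assumes respondents within a group resemble that group's nonrespondents.
    """
    total = sum(sample_counts.values())
    raw = sum(sample_means[g] * sample_counts[g] for g in sample_counts) / total
    adjusted = sum(sample_means[g] * population_shares[g] for g in population_shares)
    return raw, adjusted


# Hypothetical data: younger people are over-represented in the sample.
sample_counts = {"under_45": 700, "45_plus": 300}        # respondents per group
sample_means = {"under_45": 0.60, "45_plus": 0.40}       # support within each group
population_shares = {"under_45": 0.50, "45_plus": 0.50}  # e.g. from census figures

raw, adjusted = poststratify(sample_means, sample_counts, population_shares)
print(f"raw estimate: {raw:.2f}, adjusted estimate: {adjusted:.2f}")
```

With these invented numbers the raw figure overstates the candidate’s support because the over-represented group happens to favor him; the adjustment pulls the estimate back. The same arithmetic applies whether the lopsided sample came from an opt-in panel or from a telephone poll with a 10% response rate, which is the point Gelman is making.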
A Sociological View
This controversy illustrates several themes of the sociology of science, in our case, social science:
AAPOR as the guardian of agreed upon standards for the conduct of polls and survey research is duty bound (as one AAPOR member put it, it would have been irresponsible of AAPOR not to have said something) to intervene when any of its norms have been violated – whether the violator is a member of the association or not. In its view the NYT/CBS organization had done just that when it decided to base its election forecasts on polling data that came from an opt-in Internet panel, i.e. from a non-random sample. Generally, in the past, these types of samples have been considered un-scientific. In contrast, probability (aka random) samples have been recognized, if not adopted, as the “gold standard” of sampling since the late 1940s – at least in the United States. In other words, to borrow from sociologist Thomas Gieryn, it has been the task of AAPOR to demarcate science (probability sampling) from non-science (non-probability samples) in the field of polling and survey research 7. These norms, for example, have forced news network organizations to warn their viewers, when reporting the results of a call-in poll (aka 1-800-poll or “junk” poll), that the numbers on their screens were obtained from a “non-scientific” survey.
What is considered science in the polling world has changed over the years. In the 1930s and 40s (1935-1948), the new pollsters (Crossley, Gallup, and Roper) promoted a distinctly non-probability methodology (quota sampling) as science – and (before 1948) nobody really challenged them on this as AAPOR is now challenging the NYT/CBS organization 8. Nowadays, or at least until very recently, were you to use a quota sample or any other non-random sampling methodology for your study, you were liable to get your wrists slapped (figuratively, of course) – at least, and I repeat, in the US 9. The Hite Report is a good example 10. Thus, science is what those who are empowered to say what it is say it is. And science varies depending on the era you live in and what part of the world you reside in.
The AAPOR statement is an opportunity for the association to assert its authority. It is “the leading association of public opinion and survey researchers”, and as such its credentials cannot be doubted. The statement is also the occasion to reiterate the basic tenets of the faith: “a fundamental belief in a scientific approach”; “objective standards”; polls, conducted according to “standards of quality”, “mirror reality” (that is, social reality); etc. AAPOR’s basic ruling in the August 1, 2014 release is that Internet opt-in panels are NOT quite ready for the big league – pre-election polling; they’re still wet behind the ears. The time to extend the boundaries of what is considered scientific in polling is not now, because “these new approaches and methodologies” still require “rigorous empirical testing”, etc. In other words, AAPOR re-emphasized the demarcation line between legitimate polls or surveys that provide reliable knowledge about social reality (e.g. public opinion), and “polls which are not conducted with scientific rigor” whose results are “highly questionable, if not outright incorrect”. It also stated that it is not opposed to the idea of widening the boundaries, indeed it “encourage[s] assessment of [these methodologies’] viability for measure and insight”, but this must be done “within a strong framework of transparency, full disclosure and explicit standards.”
Transparency: a device to unmask illegitimate, non-scientific polls
One thing readers of the AAPOR statement might have noticed is the heavy emphasis that has been given to “transparency”. Transparency is the act of unveiling all the steps that were taken to generate the final poll results that are published: from the sampling design, question wording, and data collection mode (e.g. telephone, web), to weighting and other forms of “adjusting” the raw data. AAPOR launched its Transparency Initiative (TI) in 2014. Now, as anybody who has studied the history of polls in this country will tell you, “transparency” is not one of the pollsters’ most conspicuous virtues. Back in the 1940s, one reporter complained that he had made “several informal attempts (…) to check facts and figures” regarding Gallup polls, but that “all ended in failure” 11 (p.737). Two decades later things seemed to have improved a bit. Trying to get answers about the “Gallup system of processing” polling data, a New Yorker columnist had this to say: “By calling members of the Gallup staff, and by writing to Dr. Gallup, I was able to get answers – reluctant and incomplete, but still answers – to some of my questions about the process” 12 (p.174). Nevertheless, in the following decade, some folks were still not satisfied with the pollsters’ transparency, and not just anybody: a member of Congress drafted an unsuccessful bill under the name “Truth-in-Polling Act”.
The AAPOR release states that the organization “has for decades worked to encourage disclosure of methods.” Be that as it may… but between encouragement and actual disclosure, there is a wide gap. As an example, consider a recent (January 2016) poll by the Harvard School of Public Health in collaboration with STAT, an organization that reports news in the health and medical field 13. One page of the 15-page report is dedicated to the poll’s methodology. Although it provides a fair amount of detail (sample size, type of sampling, dates during which the poll took place, mode of interviewing), one would be hard pressed to find any information on response rate – even though it warns the reader that non-response bias can be part of the total error of the survey. Now I am not trying to pick on the folks who did this particular survey (I just happened to receive the results in my inbox as I was writing this), or the polling house (SSRS) that actually conducted the data collection, and is a member of AAPOR’s Transparency Initiative; I am sure they are all fine upstanding researchers. I am merely illustrating that the ideal of “full disclosure” that AAPOR promotes is yet to be realized – as some gentleman from China has said, I am told, “the future is bright but the road is tortuous”.
So, one may ask, why this hard push about transparency? Answer: the Internet (at least one answer). Thanks to the advent of this technology just about anyone can do a survey or poll nowadays. Throw a few questions together (you know how to ask questions, don’t you Steve?), spend a few bucks (or loonies, if you’re in Canada) to use SurveyMonkey or some other web-survey platform, put an ad on craigslist (or elsewhere) to recruit your participants; when you’re done download the results into Excel, et voilà, you’ve got yourself a study. (Disclaimer: I want the reader to know that I am neither promoting nor endorsing the companies mentioned. I am just describing what I have witnessed during the course of my professional career.) Because this technology is so ubiquitous and seemingly user-friendly, it endangers the monopoly the polling profession has over the production of knowledge about society, in general, and public opinion, in particular. It threatens the profession in that it creates the appearance that to conduct a survey or poll no longer requires “expert” knowledge – just like the museum visitor standing in front of a Jackson Pollock painting and exclaiming “My six-year-old could’ve done that!”
The promotion of transparency is, in part, a demarcation maneuver (again to borrow from Gieryn). It is a means for AAPOR to assert its authority and to reiterate what is and what is not legitimate when it comes to polling. Those that have joined the Transparency Initiative are recognized as worthy (i.e. scientific) polling practitioners; it is akin to the warning label consumers find on the packaging of products in the supermarket. The “Transparency Initiative” label tells consumers that the product from a particular organization is fit for consumption, and, by extension, that the products of those that have not joined the TI should be viewed as suspect (non-scientific).
One last thing about “transparency”: there is no such thing as “full disclosure” – at least from commercial polling houses. Polling is a business, big business, not mere idle curiosity. These companies can always invoke proprietary rights to avoid revealing how the results they publish have been produced. Thus, the gruesome details remain hidden from the public eye 14.
Redrawing the boundary: what should be considered science?
One of the themes in Gelman’s response to AAPOR is the contested nature of the boundary between what constitutes sound (i.e. scientific) polling practice and what does not. As we saw, AAPOR is firmly attached to the principle of probability (aka random) sampling. As I said before, this has been the central credo of the polling profession for decades. Gelman wants the boundary to be extended; he wants to push the demarcation line so that it will include non-probability samples. In reality, the line’s location has already been renegotiated since, nowadays, probability samples with response rates of 10% or less are still considered scientific 15. Gelman’s argument is that there is no difference between these types of samples and samples that recruit their respondents off the Internet (à la YouGov): they are both self-selected samples. Gelman writes: “the ‘grounding in theory’ that allows you to make claims about the nonrespondents in a traditional survey, also allows you to make claims about the people not reached in an internet survey.” In both cases, after the poll’s raw results are in, the analyst will have “to do some adjustment to correct for known differences between sample and population.”
In fact, Gelman is intimating that AAPOR seems to be unaware of this shift of the demarcation line, namely, that methodologies such as the one used by YouGov are definitely inside the scientific corral. In his view, they have become a legitimate part of the polling culture. Both he and Baker state that data obtained from Internet opt-in panel polls have a solid pedigree: they have passed muster. How? By the traditional, tried-and-true means of establishing one’s claim to scientificity, or scientific worth: the peer-review system and presentations at conferences attended by one’s peers. Baker writes: “There is a substantial literature stretching back to the 2000 elections showing that with the proper adjustments polls using online panels can be every bit as accurate as those using standard RDD samples.” He could have added “or inaccurate” to his statement for the sake of completeness.
Just as AAPOR relies on “transparency” to question the scientific credentials of polling houses that rely on non-probability samples, like YouGov, Gelman, clearly wedded to the transparency norm, underscores the fact that YouGov’s chief scientist “has detailed the methodology at length and subjected the methodology and results to public transparency that rivals the best practices of major polling companies.” In addition, that same individual has written “academic papers (…) published in the top peer review journals.” He adds: “If anything, people on the cutting edge of research are not hiding anything; on the contrary, we are fighting hard to overcome entrenched methods by being even more diligent and transparent.” So there you are. Who could doubt the scientific worth of the new polling techniques? They have been peer-reviewed and they are as transparent as Baccarat crystal. Clearly they have proven, so Gelman believes, their scientificity, and therefore their legitimacy. So what’s the beef?
In his diatribe of August 6, Gelman adopts a rhetoric that has a quasi-moralistic tone: he paints AAPOR as a force opposing progress. The title of his post could not be more explicit: AAPOR is stuck in the past, still relying on the horse-drawn “buggy” to get around, whereas he and his acolytes are the forces of progress, gliding in the most up-to-date mode of transportation, the automobile, propelled by the internal combustion engine. Who could argue against progress? Who would support obscurantism? AAPOR, apparently. Thus, Gelman’s whiggish attitude seems to want to locate this venerable institution beyond the pale – in that hellish zone of non-science. But really what he wants AAPOR to do is to recognize the scientific character, the legitimacy and respectability, of the new polling methodologies. It is time, he proclaims, to expand the scientific territory, to push back the boundaries, for the de jure to catch up with the de facto.
Resolution? Plus ça change… or “déjà vu all over again” (Berra)
The debate around the NYT/CBS announcement boils down to this: are polling samples based on Internet opt-in panels ready for prime time or not? AAPOR says no, Gelman and like-minded researchers say yes. How is this controversy going to be resolved? If the issue appears to be unsettled, it is only in the sphere of the de jure (an official acknowledgment from AAPOR), because, on the ground, in the de facto world, it has been resolved: pollsters have “voted with their feet”. Internet opt-in panels have been in use in the commercial polling world for nearly two decades. Powerful economic interests are at stake here: all the corporate polling organizations that have sprouted as a result of the advent of the Web. And it is not some statistical theory (probability sampling), however prestigious, especially when its application is doubtful and cumbersome, that is going to stand in the way of business: clients expect actionable results, while the polling house expects to be profitable – and so do corporate clients. Besides, as some believe (Gelman and others), plenty of tools have been developed to mitigate the limitations of self-selected (opt-in) samples, and their scientific character cannot be impugned: they can “mirror” reality just as well (or as badly) as the next probability sample.
The issue is how this is going to worm its way into AAPOR’s code of professional practice. In fact, non-probability samples have already carved out a bit of territory for themselves within the AAPOR canon. The current AAPOR Code of Ethics (November 2015 update) states: “Disclosure requirements for non-probability samples are different because the precision of estimates from such samples is a model-based measure (rather than the average deviation from the population value over all possible samples). Reports of non-probability samples will only provide measures of precision if they are accompanied by a detailed description of how the underlying model was specified, its assumptions validated and the measure(s) calculated. To avoid confusion, it is best to avoid using the term “margin of error” or “margin of sampling error” in conjunction with non-probability samples” 16. So the non-probability sample, anathema as it was in the not too distant past, has got its foot in the door – and then some. Does that mean the controversy is over? Apparently so. Of course, a lot of folks are not too crazy about non-probability samples; their probability counterparts are so much neater – if only those darn people cooperated, the blissful days of the 70%+ response rate would be back. But what can you do, if you’re not the Federal government? The show must go on as thespians say. Hence the online opt-in panel. Thus, non-probability sampling and probability sampling, now both bearing the science label, seem destined to live side-by-side in peaceful coexistence for the foreseeable future.
The polling profession has come full circle: it started its modern career (ca. 1935) using non-probability samples (quotas), and now it has gone back to its roots by relying on opt-in online panels. And both claim to be scientific. Another feature they have in common is their dependence on very large samples, much larger than is required if one uses probability sampling. In the ‘30s, Gallup used “vote poll” samples in the one hundred to two hundred thousand range 17. This was considered progress compared to the mass mailing (10 million) done by the most prestigious poll of that era: the Literary Digest poll. The scientific pollsters (Crossley, Gallup, and Roper) considered the Digest’s approach to be wasteful, among other things. Nowadays, online polling organizations also rely on samples in the tens of thousands to make their forecasts.
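To give a sense of what “much larger than is required” means, here is a back-of-the-envelope sketch using the standard textbook formula for a proportion estimated from a simple random sample. The function names are mine, and the formula assumes an idealized probability sample in which everyone selected responds, with a 95% confidence level and the worst case of a 50/50 split. It says nothing about the precision of a quota sample or an opt-in panel, for which, as the AAPOR code quoted above points out, the notion of a margin of error does not apply in the same way.

```python
import math

Z_95 = 1.96  # normal quantile for a 95% confidence level


def required_sample_size(margin, p=0.5, z=Z_95):
    """Smallest simple random sample giving at most `margin` of error
    (as a proportion, e.g. 0.03 for 3 points) on an estimated share p."""
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a share p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


print(required_sample_size(0.03))            # respondents needed for +/- 3 points
print(round(margin_of_error(100_000), 4))    # what 100,000 respondents would buy, in theory
```

Under these idealized assumptions, a little over a thousand respondents suffice for a margin of three points, which is why samples in the tens or hundreds of thousands look extravagant by probability-sampling standards.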
Scientific practice, here social scientific practice, seems to be ruled, in part, by the Humpty-Dumpty philosophy: “Science means just what I choose it to mean–neither more nor less.” (The reader will forgive, I hope, the poetic license, once again.) Moreover, what constitutes science depends on the circumstances. As I said, the quota sampling that was used in the 30s and 40s by Crossley, Gallup, and Roper was considered scientific, and labeled as such, even though probability sampling was known and, in 1934, had been demonstrated by a Polish mathematician-statistician, Jerzy Neyman, to be superior to any other form of sampling. The pollsters did not adopt probability sampling until well after their disastrous prediction of a Dewey victory over Truman in the 1948 presidential election. Folks in federal agencies, such as the Department of Agriculture, quickly adopted Neyman’s approach, and he was invited to lecture the staff on the issue of probability sampling. So, in effect, two forms of “scientific” sampling, although apparently polar opposites, one a probability methodology, the other a non-probability practice, coexisted for a number of years. Why does that sound familiar?
But let’s come back to our world. Whose “science” is winning? Link’s or Gelman’s? But is there a contest in the first place? I think not – in spite of the appearances: the bile spilled, the moral high ground (e.g. innovation vs. “methodological conservatism”), the abandonment of standards, etc. Gelman and like-minded data analysts are going about their business. As Gelman puts it, addressing AAPOR: “How bout [sic] you do your job and I do mine.” Indeed, no one in one’s right mind is going to strip probability sampling of its scientific legitimacy. But it is its practical implementation these days that makes it problematic for many survey researchers, thus their reliance on the opt-in methodology thanks to the rise of the Internet. This difficulty in applying probability sampling is reminiscent, if the reader allows me to go down memory lane once again, of the assessment made by the pollsters of the 30s and 40s. Gallup wrote: “Although random sampling can be highly accurate in the case of homogeneous populations, and is in many cases the simplest sampling method, there are times when it cannot be used successfully. Sometimes the statistical universe is heterogeneous–that is, it is composed of a number of dissimilar elements which are not evenly distributed throughout the whole. In addition, the universe is sometimes so widely distributed or so inaccessible that it is not feasible to set up a random sampling procedure which will guarantee that each unit has an equally good chance of being included in the sample” 18. Thus, they chose to use quota sampling. Gallup and his fellow pollsters were not the only ones in those days to think that way. As eminent a statistician as Samuel Wilks could state: “In the case of large-scale polls, which are made on a state-wide or nation-wide basis, it is clear that it would be impossible, or at any rate highly impractical to draw a random sample from the population under consideration” 19. 
Just like today, the pollsters of yesteryear found it very difficult to implement probability sampling, so they relied on a non-probability methodology to select respondents to their polls.
I have tried to illustrate the back-and-forth way the science label has been attached to and then taken away from non-probability sampling depending on the circumstances. During the early era of modern polling (1935-1948) in America, pre-election and issue polls were characterized by a distinctly non-probability methodology, the quota sample, which, nevertheless, was branded as scientific by the pollsters of that time. The circumstances then were that probability sampling was not a viable method for the pollsters in those days. During the golden era of random-digit-dialing (RDD) telephone surveys, any form of non-probability sampling was frowned upon and considered distinctly non-scientific. Respondents in non-probability samples only represent themselves, we were told sternly. The circumstances then were that polls were blessed with relatively high response rates (70%+). Then, just in time, came the Internet or World Wide Web era, and non-probability samples were back in business. The circumstances then were that traditional RDD surveys were (are) plagued with appallingly low response rates, making it increasingly costly, and thus impractical, to implement this methodology. That tension is present in the world of polling and survey research seems clear enough. On one side is the AAPOR statement; the association appears reluctant to confer the science label on opt-in Internet polls. On the other, there are those who rely squarely on that technology and believe in its scientificity. Nowadays, we live in an era, and not for the first time in the history of polling, in which two seemingly opposite sampling methodologies are used by practitioners. Both technologies have been labeled as science and both are riding into the sunset, perhaps not hand-in-hand but definitely side-by-side, towards new successes and failures for the foreseeable future.
Does the adage of the Spanish philosopher George Santayana, “Those who cannot remember the past are condemned to repeat it”, apply to the polling industry or not? Or does it matter?
7 Thomas F. Gieryn: “Boundary-Work and the Demarcation of Science from Non-Science: Strains and Interests in Professional Ideologies of Scientists”, American Sociological Review, Vol. 48, No. 6 (Dec., 1983), pp. 781-795.
8 They were criticized by a few statisticians during the course of a congressional hearing in December 1944: Hearings Committee to Investigate Campaign Expenditures House of Representatives Seventy-Eighth Congress Second Session on H. Res. 551. See for example p. 1294: “The quota-sampling method used, and on which principal dependence was placed, does not provide insurance that the sample drawn is a completely representative cross-section of the population eligible to vote, even with an adequate size of sample.” But to no avail.
9 In other countries, France for example, polling organizations have been using the quota methodology with, presumably, as much success and failure as their American counterparts using probability sampling. To paraphrase, with some poetic license, one of their 17th century compatriots: science on this side of the Atlantic, non-science on the other side.
10 This is a fascinating case and a real treasure trove for the sociologist of (social) scientific knowledge, and merits a post in itself – I will work on it. I mention it here because it was roundly criticized by AAPOR, among others, for the lack of randomness of the samples and the very low response rate to its questionnaires – in other words, not much different than one of today’s Internet or telephone surveys.
11 Benjamin Ginzburg, “Dr. Gallup on the mat”, The Nation, December 16, 1944, pp. 159, 737-739.
12 Joseph Alsop, “Dissection of a Poll”, The New Yorker, September 24, 1960, pp. 170-174, 177-184.
13 http://www.statnews.com/2016/02/11/stat-harvard-poll-gene-editing/ and https://cdn1.sph.harvard.edu/wp-content/uploads/sites/94/2016/01/STAT-Harvard-Poll-Jan-2016-Genetic-Technology.pdf (p.10 for the methodology; retrieved Thu 2/11/2016).
14 Academic survey research centers don’t escape the bottom line either: they will be closed down if they don’t meet certain financial standards. Knowledge production is good but not at any cost.
15 Pollsters and survey researchers have always had to struggle with low response rates: in other words, low response rates are nothing new. Contrary to what a recent article in the New Yorker claims [http://www.newyorker.com/magazine/2015/11/16/politics-and-the-new-machine] (a claim later picked up by the Guardian [http://www.theguardian.com/us-news/datablog/2016/jan/27/dont-trust-the-polls-the-systemic-issues-that-make-voter-surveys-unreliable]), response rates in the 1930s in America were not in the 90s. The most prestigious poll during that era was conducted by the Literary Digest (a weekly magazine similar to today’s Time): the highest response rate it achieved was about 24% in 1930 and 1936. When the new pollsters (Crossley, Gallup and Roper) emerged in 1935, they used quotas as their sampling methodology, from which a response rate cannot be computed. However, Gallup did use mail-in ballots, in addition to in-person interviews, for his pre-election polls of 1936. Two researchers assessing Gallup’s ballot returns wrote: “As a rule less than one-fifth of the mailed ballots are returned and these tend to come from selected groups. (…) The [Gallup] Institute found that the largest response (about 40 per cent) came from people listed in Who’s Who. Eighteen per cent of the people in telephone lists, 15 per cent of the registered voters in poor areas, and 11 per cent of people on relief returned their ballots” – a far cry from 90% (Daniel Katz & Hadley Cantril, “Public Opinion Polls”, Sociometry, Vol. 1, No. 1/2, Jul. - Oct., 1937, p.160).
17 “POLL: Dr. Gallup to Take the National Pulse and Temperature”, News-Week, October 26, 1935, p.24. Gallup was less than transparent when it came to revealing the exact size of his samples.
18 George Gallup and Saul Forbes Rae, The Pulse of Democracy: The Public-Opinion Poll and How It Works, 1940, Simon & Schuster, New York, p.59.
19 Samuel S. Wilks, “Representative Sampling and Poll Reliability”, The Public Opinion Quarterly, Vol. 4, No. 2 (Jun., 1940), p. 262.