On September 18, 2017, the results of a national poll of American college undergraduates were published on the website of the Brookings Institution. The results were commented on by the researcher who initiated the study, John Villasenor, a professor of electrical engineering at the University of California, Los Angeles, and a nonresident senior fellow at Brookings. The survey, conducted in August, received financial support from the Charles Koch Foundation. Its central topic was college students’ knowledge of and attitudes towards the First Amendment. As its name suggests, the amendment is the first addition to the Constitution of the United States of America, and it deals, among other things, with the issue of “freedom of speech.” Specifically, the amendment prohibits Congress from passing any law that would curtail “freedom of speech.” The poll in question explored this issue with college students in the U.S. who are American citizens.
The poll generated controversy not only for its substantive findings (“A chilling study shows how hostile college students are toward free speech”—Washington Post), but for its methodology (“‘Junk science’: experts cast doubt on widely cited college free speech survey”—The Guardian). In this comment, I will concentrate on the latter: what is considered legitimate knowledge and what is not?
The author of the Guardian piece (09/22/2017) spoke to several polling experts. One of them, Cliff Zukin, a former president of the American Association for Public Opinion Research (AAPOR, 2005-6), was reported as saying that the professor’s survey was “malpractice” and “junk science.” [AAPOR describes itself as “the leading professional organization of public opinion and survey research professionals in the U.S., with members from academia, media, government, the non-profit sector and private industry.” Disclosure: I am a member of this organization.] Zukin opined that the Brookings poll should never have been reported in the press. He added, somewhat nonsensically, “If it’s not a probability sample, it’s not a sample of anyone [emphasis added], it’s just 1,500 college students who happen to respond.” Another past president of AAPOR, Michael Traugott (1999-2000), was interviewed. He stated, more diplomatically, that the poll was “an interesting piece of data.” But he too doubted its validity: “Whether it represents the proportion of all college students who believe this is unknown.” The current president of AAPOR, Timothy Johnson (2017-8), was also contacted. He is reported as saying that the survey was “really not appropriate.” Finally, a vice-president at Ipsos, a multinational commercial polling firm, was asked what he thought. In his view, the professor “overstate[d] the quality of his survey.” How did Villasenor go about doing that? By providing a “margin of error,” said the Ipsos man.
In search of the poll’s methodology
So what do we know about the way this survey was conducted? Not much. But this is not unusual for polls. Villasenor’s methodology section is minimalist, to say the least, especially when it comes to the way his sample was selected. The poll was conducted online and 1,500 students responded. How did he find these students? We are not told. How many were eligible? Again, we are not told. How many eligible students were contacted? No answer. The field period started August 17 and ended August 31, 2017. Professor Villasenor tells us that he hired a polling firm to do the data collection. Which one? He does not say. He reports that the data collected were weighted with respect to gender. Indeed, his sample was about 70 percent female (N=1,040), whereas we are told that women represent 57 percent of the college population. However, college students who are American citizens are Villasenor’s target population. Since (I am guessing) students in the U.S. are overwhelmingly American citizens, the difference is probably not critical. Let us say that this poll does not meet the minimum standards of disclosure recommended by AAPOR or by the National Council on Public Polls. Is that it? Just about. He does tell his readers one more thing, something that has been described, after the fact, as a “caveat.” I quote: “To the extent that the demographics of the survey respondents (after weighting for gender) are probabilistically representative of the broader U.S. college undergraduate population, it is possible to estimate the margin of error…” What does that mean? It is a roundabout way of telling us that his sample is not a probability sample; but if you’d like to assume that it is, you can go ahead and compute a margin of sampling (an important word he omits) error. Is this assumption warranted? The author gives us no evidence to that effect.
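To make the gender weighting concrete, here is a minimal sketch of how such weights are usually computed. It uses only the figures quoted above (1,500 respondents, 1,040 of them women, and a 57 percent female share in the college population). The two-category treatment of gender and the use of Kish's approximation for the effective sample size are my assumptions for illustration; the Brookings report does not say how its weights were built.

```python
# Post-stratification weights for gender, using the figures quoted in the text.
# Assumption: gender is treated as two categories, so the male share is the remainder.
n_total = 1500
n_female = 1040
n_male = n_total - n_female

pop_share = {"female": 0.57, "male": 0.43}
sample_share = {"female": n_female / n_total, "male": n_male / n_total}

# Weight for each group = population share / sample share.
weights = {g: pop_share[g] / sample_share[g] for g in pop_share}

# Kish's approximation of the effective sample size after weighting:
# (sum of weights)^2 / (sum of squared weights).
sum_w = n_female * weights["female"] + n_male * weights["male"]
sum_w2 = n_female * weights["female"] ** 2 + n_male * weights["male"] ** 2
n_effective = sum_w ** 2 / sum_w2

print({g: round(w, 3) for g, w in weights.items()})
print(round(n_effective))
```

Women are weighted down and men are weighted up, and the weighted sample behaves roughly like a somewhat smaller unweighted one. Note what the adjustment does not do: it aligns the sample with the population on gender only, and says nothing about how the respondents were selected in the first place.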
Bad poll v. good poll
This is the crux of what the experts, quoted by The Guardian, do not like about the poll. For probability fundamentalists, if a poll is not based on a probability (i.e. random) sample, the findings are not worth the paper they are printed on. The results cannot be generalized to the population the poll purports to be studying. A probability survey is one in which each element in the target population (e.g. U.S. college students who are American citizens) has a known probability greater than zero of being selected into the sample. This is what allows pollsters to make statements about the population of interest based on the sample. From the little we are told about this poll, the sample is most likely composed of a self-selected sample of college students who are American citizens. Self-selection contravenes classical statistical theory of random sampling—think of letters of constituents sent to a member of Congress about some issue. As such, calculating a margin of sampling error is an exercise in futility.
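For readers who have never seen it, the margin of sampling error at the center of this dispute is a very simple calculation. The sketch below uses the textbook formula for a proportion under simple random sampling, with the conventional worst case of p = 0.5 and a 95 percent confidence level. Those defaults are my assumptions, not figures from the Brookings report, and the function name is mine.

```python
import math

def margin_of_sampling_error(n, p=0.5, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion.

    Valid only under simple random sampling, which is exactly the
    assumption in dispute for an opt-in sample.
    """
    return z * math.sqrt(p * (1 - p) / n)

moe = margin_of_sampling_error(1500)
print(f"{100 * moe:.1f} percentage points")
```

The arithmetic works for any sample size you feed it. That is the probability samplers' point: the formula will happily return a number for a self-selected sample of 1,500, but the number only means something if the selection was random.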
The Guardian article ends with a positive counter-example, i.e. a “good” poll. It mentions a 2016 Gallup survey of 3,000 students asking similar questions but coming up with very different answers. The story appeared to be attributing the dissimilarities to differences in methodologies. The newspaper states that the students “had been selected in a carefully randomized process from a nationally representative group of colleges.” (More about that later.)
But some came to the rescue of the Brookings Institution’s poll. One of them was the Washington Post columnist (09/28/2017) whose first piece commented on the results under the “chilling story” headline. In her second column, entitled “Free speech and ‘good’ vs. ‘bad’ polls,” she referred to the rebukes of the poll as “disingenuous, confused or both.” She added: they “don’t render a poll ‘junk science’.” She points out correctly that a lot of “major surveys are now conducted online and use ‘non-probability’ samples.”
We actually learn more from her about the methodology Villasenor used for his survey than we did in the original report on the poll! According to this Post column, the professor contacted the Rand Survey Research Group. They advised him, we are told, on sampling methods and put him in touch with a commercial polling house (Opinion Access Corporation—OAC) that conducts online polls. This firm had in its database members of the population of interest: “college students (subsequently narrowed to college students at four-year schools only).” One wonders why the good professor couldn’t have told us that in the first place. (Note also that the Post columnist says nothing about the citizenship criterion.) Although we know a little bit more about the methodology of the Brookings’ poll, many questions remain. For instance, how does OAC recruit its panel? How many “eligible” students did it contact for the poll? Again, we are in the dark.
Another defender of the Brookings poll is a blogger for the website “reason.com” (“Is That Poll That Found College Students Don’t Value Free Speech Really ‘Junk Science’? Not So Fast”—9/30/2017). In her view, just because the poll is based on an opt-in panel is no reason to “disregard the findings.” Like her Post colleague, she argues that many reputable firms rely on this methodology. She writes: “These days, lots of well-respected outfits are doing sophisticated work outside the confines of traditional probability polls.” And she adds: “it’s a stretch to claim that any poll that uses an opt-in panel is necessarily junk”.
Controversy over Sampling
Ever since their first appearance in the mid-1990s, Internet polls have been controversial. I think it is fair to say, though, that the controversy is dying down; the community of sample survey practitioners has had to face the facts of life, grudgingly for some: Internet polls are widely used and are here to stay, at least for the time being.
The dispute around the Brookings poll is just the latest installment on the issue of what constitutes legitimate knowledge when the source of that knowledge is a sample survey. (For an analysis of another recent flare-up see “Using online panels for election polls”.) For decades in America, the hegemonic creed that held sway over the community of sample survey practitioners was probability sampling. It was believed to be the only way to obtain reliable and valid (all other things being equal) knowledge from a poll or survey. If one happened not to practice the credo, one did so surreptitiously, being very careful not to advertise this breach—one did not wish to be labeled a deviant. Probability sampling was the “gold standard” and still is.
Nowadays, this ideal is unattainable for most researchers (or so they say), especially those in the commercial sector. With the secular decline of response rates, probability sampling is in jeopardy. As a result, non-probability sampling advocates have been emboldened. Not too long ago questioning the orthodoxy of probability sampling would simply have been inconceivable: anybody who had the audacity to suggest that there was merit in non-probability samples would have received a severe tongue lashing from the guardians of the faith. But with the rise of the Internet and response rates in the single digits, the non-probability school feels it can attack the legitimacy of sample surveys that are probability based with impunity. They argue that polls that have such low response rates cannot claim to be probability-based even though the original mechanism used to select the elements in the population to make up the sample was random. The reason for this is that high rates of non-response destroy the random (probability) quality of the sampling process. What practitioners end up with is a self-selected sample. In addition, and more importantly, it is often assumed that a high rate of non-response is associated with large non-response bias. The latter means that there is a wide gap between those who answered the polls and those who did not on the issue of interest—this is what happened, as far as anyone can tell, in the infamous Literary Digest presidential poll of 1936: respondents favored the Republican candidate (Alf Landon) and non-respondents supported FDR, the incumbent president, and ultimate winner. Of course, non-response bias can only be determined empirically, but as consumers of polls, it is wise to take the results of a low response poll with a heavy grain of salt, unless the polling house tells us that it has taken measures (e.g. non-response follow-up) to assess how different non-respondents are from respondents.
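The distinction between a low response rate and non-response bias can be put in one line of arithmetic. Under the simple deterministic model found in survey textbooks, the bias of the respondents' average equals the non-response rate multiplied by the gap between respondents and non-respondents. The sketch below illustrates this with inputs that are purely hypothetical and are not taken from any poll discussed here.

```python
def nonresponse_bias(response_rate, mean_respondents, mean_nonrespondents):
    """Bias of the respondent mean relative to the full-sample mean.

    Assumes the simple deterministic model in which every sampled person
    is either a respondent or a non-respondent.
    """
    return (1 - response_rate) * (mean_respondents - mean_nonrespondents)

# Hypothetical inputs, chosen only to illustrate the two cases.
same_views = nonresponse_bias(0.06, 0.40, 0.40)       # low response, no gap
different_views = nonresponse_bias(0.06, 0.55, 0.40)  # low response, 15-point gap

print(same_views, round(different_views, 3))
```

With the same single-digit response rate, the bias is zero when the two groups agree and large when they do not. Since the views of non-respondents are by definition unobserved, the gap has to be estimated by some follow-up effort, which is why the bias can only be determined empirically.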
I mentioned earlier that the Guardian story gave as an example of a “carefully randomized” poll, a 2016 survey of U.S. college students conducted by the Gallup organization (Gallup, hereafter). Gallup claimed that the poll results “are based on telephone interviews with a random sample of 3,072 U.S. college students, aged 18 to 24, who are currently enrolled as full-time students at four-year colleges” (p. 32). How did it reach this final sample? It started by selecting a random sample of 240 four-year colleges. All of these colleges were contacted but only 32 agreed to participate in the survey—that’s an eighty-seven percent refusal rate. Does that make them “a nationally representative group of colleges” as the Guardian states? From these colleges, Gallup selected a random sample of 54,806 students to whom emails were sent asking them to fill out a short Internet survey, which would determine their eligibility for a telephone interview. Thirteen percent (6,928) completed the web survey, of which ninety-eight percent (6,814) were eligible and provided a telephone number. Gallup reports that the response rate to the telephone survey was 49 percent. Finally, it states that the “combined response rate for the Web recruit and telephone surveys was 6%” (.13 × .49 × 100). Of course, this response rate does not include the fact that only 32 colleges out of the originally selected 240 decided to participate in the study. But this is a moot point given the already tiny response rate. (Note, however, how much more information we are provided, regarding how this poll was conducted, compared to the Brookings poll.) So is Gallup justified in calling this sample “random”?
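The figures Gallup discloses are enough to retrace the arithmetic. The sketch below recomputes each rate from the counts quoted above. The last line, which treats college participation as one more multiplicative stage, is my own illustration and not a rate that Gallup reports.

```python
# Counts and rates as quoted from the Gallup methodology report.
colleges_sampled, colleges_participating = 240, 32
students_emailed, web_completes = 54_806, 6_928
phone_response_rate = 0.49

college_refusal = 1 - colleges_participating / colleges_sampled
web_rate = web_completes / students_emailed
combined = web_rate * phone_response_rate

# Assumption for illustration only: count college participation as a further stage.
with_colleges = (colleges_participating / colleges_sampled) * combined

print(f"college refusal rate:        {college_refusal:.0%}")
print(f"web survey completion rate:  {web_rate:.0%}")
print(f"combined web and phone rate: {combined:.0%}")
print(f"including college stage:     {with_colleges:.1%}")
```

Counting the colleges that declined pushes the overall rate below one percent, which is why I call the omission a moot point: the reported figure is already tiny.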
This Gallup poll is exactly the type that online pollsters would put forward as an example of a survey that is probability-based in name only but in reality rests on a self-selected sample, just like those online polls that use an opt-in panel to conduct their research. The online samplers’ point of view is presented in the reason.com piece. The author quotes the head of the election polling unit of the online company SurveyMonkey, who is reported as saying: “We believe we can offer something of similar quality, at a very different price point [compared to traditional probability sampling], and with more speed.” [Emphasis added.] The same story mentions another online polling house, YouGov, described as “best in class when it comes to this type of online panel research.” Indeed, a different institute, the Cato Institute, used the services of this company to conduct a poll on topics similar to those studied by Brookings and Gallup. And like Villasenor’s survey, the Cato Institute methodology page (74) reports a “margin of error”. AAPOR in a 2013 report on non-probability sampling stated: “margin of sampling error in surveys has an accepted meaning and that this measure is not appropriate for non-probability samples” (p. 82). It added: “We believe that users of non-probability samples should be encouraged to report measures of the precision of their estimates, but suggest that, to avoid confusion, the set of terms be distinct from those currently used in probability sample surveys” (p. 82). Well, so much for that. I suppose that non-probability samplers would argue that if Gallup, with a response rate of about 6 percent, can give a margin of sampling error, why can’t they?
Some history and some…sociology
It is not the first time in the history of modern polling that the community of sample survey practitioners has been divided over a methodological issue, specifically over sampling, and more precisely, over the worth of probability versus non-probability sampling. In his landmark 1934 paper, Jerzy Neyman (1894-1981) demonstrated the beneficial value of probability sampling and condemned non-probability sampling as inadequate. Up until then, both had been deemed legitimate. The legitimacy of probability sampling was derived from the fact that it rested on a solid mathematical statistics foundation. Statisticians with the U.S. federal government were quick to adopt and expand on Neyman’s ideas. According to historians Duncan and Shelton: “By about the time the United States entered World War II, probability sampling had taken root in the Federal Government” (p. 323). Academic research centers followed suit—sometime later. Not so for the commercial pollsters (Crossley, Gallup, and Roper). From the start (1935), and for many years thereafter, they relied on the non-probability technique of quota samples. In fact, it took almost a decade before the rift over which of the two methodologies was “better” to come out into the open. During that time the pollsters were never questioned about their sampling preference. In December 1944 (NY Times, 12/30/1944, p. 6) one of the first salvos directed against quota sampling came from a technical committee appointed by Congress to look into the methodology of polls. Referring back to the recent presidential election polls, the committee stated: “The quota-sampling method used, and on which principal dependence was placed, does not provide insurance that the sample drawn is a completely representative cross-section of the population eligible to vote, even with an adequate size of sample. 
In general, the major defects of the quota-sample method lie, first, in the method of fixing quotas, and, second, in the method of selection of respondents to interview” (p. 1294, Hearings before the Committee to Investigate Campaign Expenditures, House of Representatives, 78th Congress, 2nd Session, on H. Res. 551, Part 12, Thursday, December 28, 1944). The line of demarcation was clearly drawn.
Despite this warning, the pollsters persisted in their “misguided” ways. Between 1944 and 1948, the debate between the two camps heated up. Probability samplers were busy attacking the legitimacy of quotas and promoting their brand of sampling as “the best that statistical science has to offer,” (p. 26) as statisticians Philip Hauser and Morris Hansen put it. At their most strident the probability samplers characterized the pollsters’ methodology as “rule of thumb” and their polls as “more like straw votes than scientific instruments” (p. 557). Although faced with the ascendancy of probability samplers, pollsters and their allies fought back and refused to be stripped of their legitimacy: “Current attempts of some academicians,” Hadley Cantril countered, “to set up themselves and their work as ‘scientific’ while labeling Crossley, Gallup and Roper as rule-of-thumb operators is not, in my judgment, either justified or statesmanlike” (p. 23). In addition to this “condemning the condemners” line of attack, quota samplers used two other approaches to question the putative superiority of the probability norm: 1) there was no empirical proof that it was better than quotas; 2) it was far too expensive and too slow to implement for the pollsters’ purposes. The latter justification was a way to argue that their work was done within a different setting than the one in which federal workers and academicians, the groups that included most of the probability samplers, practiced. In other words, what the pollsters were implying was that the advancing probability norm did not apply to their case. Their sampling was indeed “scientific” (see Gallup’s testimony at the 1944 Hearings, pp. 1238, 1253) despite the fact that it was performed within commercial constraints, which included providing their subscribers (the press) with timely results.
They did not want to convey the impression that business considerations took priority over scientific ones, but rather that they had to deploy their “science” within a much more demanding environment than did other sample survey practitioners.
For the 1948 presidential election, the pollsters essentially used the same approach they had relied on in previous elections, but this time with disastrous consequences: they all predicted, wrongly, that the incumbent president, Harry Truman, would lose to the Republican challenger, Thomas Dewey. The righteous indignation expressed by some probability samplers at the 1948 poll failure was not simply a result of what they saw as norm violation (using quota sampling instead of probability sampling), but also because they feared it would affect the image and status of social science in general, and the sample survey in particular. Fortunately, the authoritative Social Science Research Council (SSRC) stepped in quickly to defuse the apparent crisis, and to prevent the battle over sampling from taking center stage and degrading the image of social science at a time when many among natural scientists and politicians saw it as mere “political ideology,” not science (see forthcoming). Of course, the SSRC report on the polling failure chastised the pollsters for not using more up-to-date sampling methods (p. 601), but it allocated much more space to other issues that affected the forecast (e.g., last-minute shift, identifying likely voters), effectively diluting the conflict between the two schools of sampling. In fact, some observers felt that the SSRC report showed “a tendency to let down the polling organizations easily” (p. 134). Be that as it may, it is clear that after the 1948 failure, the probability norm had gained a position of dominance. Although it would take years for it to become the pollsters’ modus operandi, as early as 1949, they felt they had to pay their respects to probability sampling. For instance, Gallup at the annual conference of AAPOR that year announced that his organization was “designing a national probability sample” (pp. 765-6).
Ever since the 1930s, non-probability sampling (whatever its form) has always been second best, at least in America. In years past that methodology was labeled “rule-of-thumb” and even “primitive”; today some call it “junk science” and “malpractice”. But in reality, for decades, non-probability sampling has coexisted side-by-side with its counterpart: probability sampling. The latter had been elevated, shortly after Neyman’s paper, to the status of the only legitimate norm of practice and was endowed with much prestige. So non-probability sampling persisted in a state of what sociologists call a “patterned evasion of norm”: the discredited practice is allowed to thrive because most turn a blind eye as long as the violation is not too flagrant, i.e. if it does not call attention to itself. When it does, then the dominant norm must be reaffirmed for all to be reminded what is legitimate and what is not. This is what happened in 1948, although in a subdued way, with the SSRC report, and this is how I would interpret the reception the Hite Report received in 1988 from the officialdom of sample survey practitioners. At the AAPOR annual conference, a panel, including Shere Hite, was convened, which essentially condemned her methodology. (For an informative and entertaining description of this drama see chapter 1 of David W. Moore’s The Superpollsters, entitled “The Sins of Shere Hite.”) Her research was very controversial and got a lot of publicity. Her samples were self-selected, i.e. non-probability, but she had the audacity to claim that hers was a “scientific” study. She was told otherwise.
We can see the similarity between these historical examples and the reported reactions of Zukin and his like-minded colleagues to the Brookings poll. Perhaps theirs is a voice crying in the wilderness, and anachronistic. It might have carried some weight back in 1988, but today? The non-probability samplers would argue that there is no such thing as a true probability sample. Only a few, mostly in government, have the luxury of obtaining one. Moreover, the online poll practitioners would say that they have at their disposal an array of sophisticated statistical tools that allow them to adjust their non-probability samples in such a way that their performance is just as good (or as bad) as a so-called probability sample. To back up their statement they could point to empirical studies and election results. With the rise of the Internet and the decline of response rates, non-probability sampling’s worth and status have improved. For practitioners in the commercial sector, mostly, and their academic acolytes, probability sampling, given the current environment, fails on two counts: cost and speed. (The quota samplers of the 40s said the same.) We must not forget that, for most sample surveyors, polling is a commercial enterprise: providing their clients with timely information at a “price point” that makes business sense is a stronger imperative than trying to fulfill a norm that is, for all intents and purposes, unachievable.
So, after all of this, who are we to believe? The Brookings poll that’s been characterized as “junk science” by some, while others tell us not to “disregard [its] findings”? Or the Gallup poll that has been described as “carefully randomized”? Or the Cato Institute poll that was conducted by a polling house that has been called “best in class” when it comes to non-probability samples?
Most poll consumers, I venture to guess, perhaps wrongly, are unaware of the lingering controversy about sampling. We read stories that purport to dispense knowledge because, it is assumed, they are based on solid evidence. Take the “chilling study” column in the Washington Post. The writer reports on the Brookings study and develops her argument without once discussing the methodology of the poll. Why should I, as her reader, question its validity? If it were no good, she wouldn’t be writing about it. I might disagree with her conclusions, but polling methodology does not even cross my mind. If I happen to stumble upon the “junk science” Guardian piece, I might start having some doubts about the Post column. But then again, if, by chance, I read the reason.com article, my doubts might be dispelled. Or, I might throw my hands up in the air and decide that I can’t believe any poll!
I would think that pollsters might be concerned about the effect this lack of consensus could have on their field’s image (prestige), especially when it is bandied about in the open. It does not promote confidence in the polls as valid purveyors of knowledge. If probability fundamentalists insist on taking a conflictual approach towards non-probability pollsters, as exemplified earlier in the Guardian piece, they are likely to cause confusion, or worse, among the attentive public. Or they can compromise and accept that there are other, possibly legitimate, ways of conducting a poll. Yet they can still derive satisfaction, if only symbolic, from the certain knowledge that theirs is the “gold standard,” however elusive it may be. They would do well to remember that we live in an age in which the practice of probability sampling is highly compromised as a result of low response rates. Ever since the 1940s probability fundamentalists have acted as if the probability norm were codified as law. But it never has been. As far as I know, neither AAPOR nor the American Statistical Association has stated in its ethical guidelines of professional practice that probability sampling is the norm to follow. Of course, professionals in any field of activity will react if some egregious norm violation has taken place. Did the Brookings poll rise to that level?
Although the events I relate in this post took place more than a year ago, the topic of the controversy (the use of opt-in online panels for election polling purposes) is still very much current, especially at this time of electoral contests, when we are likely to see both successes and blunders (recall the recent 2015 UK parliamentary elections).
On July 25, 2014, the New York Times (NYT), and its polling partner CBS News (CBS), made an announcement that “rocked the polling world” (Washington Post, 07/31/14). The news organizations reported that they had retained YouGov to conduct their polls for the upcoming midterm November elections. The remarkable part was that the polling house is one that bases its polls on an Internet panel, meaning folks who volunteer to take a survey from time to time. This represented a departure from NYT/CBS’s traditional approach: in the past they relied on polls that used telephones and random-digit-dialing (RDD) to reach respondents. RDD is held up as the “gold standard” when it comes to polling and survey research by telephone because it conforms to the statistical theory of probability (or random) sampling. In contrast, Internet panels do not fit this theory because panel members are not selected at random; they select themselves to be part of the panel.
The NYT indicated that their polls would be based on “an online panel of more than 100,000 respondents nationwide” (NYT, 07/27/14). It attributed its choice to work with YouGov to the fact that “declining response rates may be complicating the ability of telephone polls to capitalize on the advantages of random sampling” (id.). In the same article, it acknowledged both the limitations of working with an online panel (“only the 81 percent of Americans (…) use the Internet”), and YouGov’s less than perfect estimates in the 2012 election (it “underestimated President Obama’s share of the Hispanic vote in 2012”). However, YouGov’s results, it affirmed, “are broadly consistent with previous data on the campaign” (id.). It also cited the serious problem that has plagued telephone sampling: “Only 9 percent of sampled households responded to traditional telephone polls in 2012, down from 21 percent in 2006 and 36 percent in 1997, according to the Pew Research Center” (id.).
In a piece dated 10/05/2014, the NYT stated, perhaps to placate those who criticized their use of the YouGov panel, “The YouGov online surveys are being used to supplement, not replace, the Times’s traditional telephone polls.” It went on to explain that the NYT/CBS “political and social surveys are conducted using random digit dialing probability sampling,” and that the “YouGov data is used for The Upshot election forecasting model in key congressional races and Senate battleground states.”
The event described above was considered “a very big deal in the survey world” by the Pew Research Center’s director of survey research, Scott Keeter 1. Days after the NYT/CBS revelation, the American Association for Public Opinion Research (AAPOR) issued a statement (08/01), signed by its then president, Michael Link, expressing its “concerns” regarding the use of “opt-in Internet” surveys 2. AAPOR is a professional organization that brings together polling and survey research practitioners who work in the private sector, in government, and in academia. (In the interest of full disclosure the reader should know that I am a member of this organization.) As such, one of its responsibilities is to police what is done in the polling industry. AAPOR chastised NYT/CBS: first, for using an Internet panel to report on an electoral contest, because this method of selecting a sample has “little grounding in theory”; second, for a lack of “transparency” regarding how the news organizations arrived at the results they published. As for this last point, the statement read: “While little information about the methodology accompanied the story, a high level overview of the methodology was posted subsequently on the polling vendor’s [i.e. YouGov] website. Unfortunately, due perhaps in part to the novelty of the approach used, many of the details required to honestly assess the methodology remain undisclosed.”
AAPOR rebuked the NYT for abandoning its high standards in matters of polling, and only telling its readers that “the old standards were undergoing review”. It also insisted that “standards need to be in place at all times.” In addition, it criticized the Times for publishing a story (NYT 05/20/2014) that reported on a study whose respondents were recruited by means of ads on Facebook. It warned that “using information from polls which are not conducted with scientific rigor in effect sets a new–lower–standard for the types of information that other news outlets may now seek to report.”
While acknowledging that the “world of polling and opinion research is indeed in the midst of significant change” where data collection is concerned, AAPOR warned that “the use of any new methods [should] be conducted within a strong framework of transparency, full disclosure and explicit standards.”
Reactions to the Reaction
Many individuals had their say about AAPOR’s statement. I will concentrate on two of the more notable (and accessible) ones – in my view. Although, predictably, there were two types of reactions to AAPOR’s announcement, for and against, the ones presented here are of the negative variety.
One response (08/05) came from a longtime member of the organization, Reg Baker, on his personal blog, The Survey Geek 3. He has been part of AAPOR’s leadership, having served, among other positions, as a member of its executive council. The title of his post says it all: “AAPOR gets it wrong.” What did it get wrong?
He writes: “We have well over a decade of experience showing that with appropriate adjustments these polls are just as reliable as those relying on probability sampling, which also require adjustment.” He adds: “There is a substantial literature stretching back to the 2000 elections showing that with the proper adjustments polls using online panels can be every bit as accurate as those using standard RDD samples.” Presumably Baker’s remark was in response to AAPOR stating: “we are witnessing some of the potential dangers of rushing to embrace new approaches without an adequate understanding of the limits of these nascent methodologies.” So what Baker is saying is that AAPOR is wrong on two counts: online polling is not new, and we do have “an adequate understanding of [its] limits.”
AAPOR is also wrong, Baker believes, when it says that YouGov did not provide sufficient details regarding its methodology. On the contrary, Baker asserts: “The details of YouGov’s methodology have been widely shared, including at AAPOR conferences and in peer-reviewed journals.”
He says he agrees (partially) with AAPOR on one point. The NYT, he opines, did “an exceptionally poor job of describing [the decision to use online panels] and disclosing the details of the methodologies they are now willing to accept and the specific information they will routinely publish about them. Shame on them.” But he faults AAPOR for not providing practitioners with “a full set of standards for reporting on results from online research,” despite the fact that this methodology has been around for nearly two decades and is widely used by researchers around the world. One should note that Baker was chair of a 2010 AAPOR task force on opt-in online panels. One might ask: would that not have been a good opportunity to devise “a full set of standards for reporting on results from online research”? But AAPOR’s Executive Council made it very clear that it was not in the task force’s mandate to do so. Nevertheless, the task force did give one recommendation regarding the reporting of survey results based on the opt-in methodology: that surveys based on opt-in or other self-selected samples should not report a “margin of error” as this is not appropriate for non-probability samples.
A more strident reaction, at least in its second formulation, came from a Columbia University professor of political science and statistics. At first the good professor, Andrew Gelman is his name, in a blog called “The Monkey Cage” (?), a regular feature in the Washington Post, provided a response, with his colleague David Rothschild of Microsoft, in the best tradition of polite academic dialog 4. The authors’ post, “Modern polling needs innovation, not traditionalism”, was a model of moderation and reasonableness. In it, they gave AAPOR an emphatic reverential bow, calling it “a justly well-respected organization”, and warned their readers that they were not “disinterested observers” since they collaborate with YouGov on a number of projects. They found AAPOR’s statement, although “undoubtedly well-intentioned”, “so disturbing”. Why? Because, the authors believe, AAPOR’s “rigid faith in technology and theories or ‘standards’ determined in the 1930s” is “holding back our understanding of public opinion” and “putting the industry and research at risk of being unprepared for the end of landline phones and other changes to existing ‘standards’.” Like Baker, the authors point out that YouGov’s methodology has been widely discussed in professional meetings and in peer-reviewed journals. In their view, the theory behind YouGov’s methodology is “well-founded” and “based on the general principles of adjusting for known differences between sample and population.” They add: “If anything, people on the cutting edge of research are not hiding anything; on the contrary, we are fighting hard to overcome entrenched methods by being even more diligent and transparent.”
Although not generally known, academics are human too. And, like any other member of the species, they are prone to the occasional bile-spilling. This is what happened in Gelman’s second formulation of his response to the AAPOR missive, posted (08/06) on his personal blog, which rejoices under the name of “Statistical Modeling, Causal Inference, and Social Science”. The article is titled (hold on to your hats) “President of American Association of Buggy-Whip Manufacturers takes a strong stand against internal combustion engine, argues that the so-called ‘automobile’ has ‘little grounding in theory’ and that ‘results can vary widely based on the particular fuel that is used’” 5. The professor directs his ire against Michael Link. He accuses Link of having an “anti-innovation” attitude, of “making things up” to support “his” position, of “talking out of his ass” (no, I’m not making this up; go check for yourself), and of “aggressive methodological conservatism” – apparently, the latter must emit some putrid odor since it seems to have occasioned (twice) a desperate search for the vomit bag – as he reports that it “just makes me want to barf” (no, I’m still not making this up). (Fortunately, our somewhat indisposed professor did make a few substantive points – I will come to that in a moment.) In a blog post later in the year (12/09: “Buggy-whip update”), he tells his readers that six days after the posting just mentioned, he sent a personal email to Link asking him to explain “his” (i.e. AAPOR’s) statement of August 1 6. He received no response. Somewhat miffed, the professor writes: “I get frustrated when people don’t respond to my queries.” Tell me about it! Now, it seems to me that it doesn’t take a very sophisticated statistical model to predict that the probability of receiving a response given Gelman’s August 6 post is much closer to zero (0=no response) than to one (1=response).
Now to the substance. Gelman makes the point that there really is no difference between a “probability” sample that has a response rate of 10% and an opt-in Internet panel – both are self-selected samples. In either case, in order to estimate what it is you are trying to estimate (e.g. the percentage a political candidate will receive), you “have to do some adjustment to correct for known differences between sample and population,” and in the process “make assumptions”. The methodology is “not new”, he says, and “a lot of research” has been done on these issues. Regarding the latter, he mentions the work of Roderick Little, an expert in the statistics of “missing data”.
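For readers who like to see the mechanics, the kind of “adjustment” Gelman has in mind can be sketched in a few lines. What follows is only a minimal illustration of one simple technique, post-stratification weighting on a single variable; it is not YouGov’s actual procedure, and the function name, the age groups, and all the numbers are made up for the example. The key assumption, flagged in the comments, is that within each group the people who responded resemble those who did not.

```python
from collections import Counter


def poststratify(respondents, population_shares):
    """Estimate a population proportion from a self-selected sample.

    respondents: list of (group, answer) pairs, where answer is 1 or 0.
    population_shares: dict mapping group -> known share of the population.

    Assumption: within each group, respondents answer like non-respondents.
    """
    counts = Counter(group for group, _ in respondents)
    n = len(respondents)
    # Weight = population share / sample share, so over-represented
    # groups count for less and under-represented groups count for more.
    weights = {g: population_shares[g] / (counts[g] / n) for g in counts}
    weighted_yes = sum(weights[g] * answer for g, answer in respondents)
    total_weight = sum(weights[g] for g, _ in respondents)
    return weighted_yes / total_weight


# Hypothetical sample: 80 young and 20 old respondents, drawn from an
# (assumed) population that is split 50/50 between the two groups.
sample = (
    [("young", 1)] * 60 + [("young", 0)] * 20
    + [("old", 1)] * 5 + [("old", 0)] * 15
)
raw = sum(answer for _, answer in sample) / len(sample)
adjusted = poststratify(sample, {"young": 0.5, "old": 0.5})
```

In this toy case the raw figure is 65% “yes”, while the weighted estimate drops to 50% because the over-represented young respondents are scaled down. The same arithmetic applies whether the lopsided sample came from an opt-in panel or from a probability sample with a 10% response rate, which is precisely Gelman’s point.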
A Sociological View
This controversy illustrates several themes of the sociology of science, in our case, social science:
AAPOR, as the guardian of agreed-upon standards for the conduct of polls and survey research, is duty-bound (as one AAPOR member put it, it would have been irresponsible of AAPOR not to have said something) to intervene when any of its norms have been violated – whether the violator is a member of the association or not. In its view the NYT/CBS organization had done just that when it decided to base its election forecasts on polling data that came from an opt-in Internet panel, i.e. from a non-random sample. Generally, in the past, these types of samples have been considered unscientific. In contrast, probability (aka random) samples have been recognized, if not adopted, as the “gold standard” of sampling since the late 1940s – at least in the United States. In other words, to borrow from sociologist Thomas Gieryn, it has been the task of AAPOR to demarcate science (probability sampling) from non-science (non-probability samples) in the field of polling and survey research 7. These norms, for example, have forced news network organizations to warn their viewers, when reporting the results of a call-in poll (aka 1-800-poll or “junk” poll), that the numbers on their screens were obtained from a “non-scientific” survey.
What is considered science in the polling world has changed over the years. In the 1930s and 40s (1935-1948), the new pollsters (Crossley, Gallup, and Roper) promoted a distinctly non-probability methodology (quota sampling) as science – and (before 1948) nobody really challenged them on this as AAPOR is now challenging the NYT/CBS organization 8. Nowadays, or at least until very recently, were you to use a quota sample or any other non-random sampling methodology for your study, you were liable to get your wrists slapped (figuratively, of course) – at least, and I repeat, in the US 9. The Hite Report is a good example 10. Thus, science is what those who are empowered to say what it is say it is. And science varies depending on the era you live in and what part of the world you reside in.
The AAPOR statement is an opportunity for the association to assert its authority. It is “the leading association of public opinion and survey researchers”, and as such its credentials cannot be doubted. The statement is also the occasion to reiterate the basic tenets of the faith: “a fundamental belief in a scientific approach”; “objective standards”; polls, conducted according to “standards of quality”, “mirror reality” (that is, social reality); etc. AAPOR’s basic ruling in the August 1, 2014 release is that Internet opt-in panels are NOT quite ready for the big league – pre-election polling; they’re still wet behind the ears. The time to extend the boundaries of what is considered scientific in polling is not now, because “these new approaches and methodologies” still require “rigorous empirical testing”, etc. In other words, AAPOR re-emphasized the demarcation line between legitimate polls or surveys that provide reliable knowledge about social reality (e.g. public opinion), and “polls which are not conducted with scientific rigor” whose results are “highly questionable, if not outright incorrect”. It also stated that it is not opposed to the idea of widening the boundaries, indeed it “encourage[s] assessment of [these methodologies’] viability for measure and insight”, but this must be done “within a strong framework of transparency, full disclosure and explicit standards.”
Transparency: a device to unmask illegitimate, non-scientific polls
One thing readers of the AAPOR statement might have noticed is the heavy emphasis that has been given to “transparency”. Transparency is the act of unveiling all the steps that were taken to generate the final poll results that are published: from the sampling design, question wording, and data collection mode (e.g. telephone, web), to weighting and other forms of “adjusting” the raw data. AAPOR launched its Transparency Initiative (TI) in 2014. Now, as anybody who has studied the history of polls in this country will tell you, “transparency” is not one of the pollsters’ most conspicuous virtues. Back in the 1940s, one reporter complained that he had made “several informal attempts (…) to check facts and figures” regarding Gallup polls, but that “all ended in failure” 11 (p.737). Two decades later things seemed to have improved a bit. Trying to get answers about the “Gallup system of processing” polling data, a New Yorker columnist had this to say: “By calling members of the Gallup staff, and by writing to Dr. Gallup, I was able to get answers – reluctant and incomplete, but still answers – to some of my questions about the process” 12 (p.174). Nevertheless, in the following decade, some folks were still not satisfied with the pollsters’ transparency, and not just anybody: a member of Congress drafted an unsuccessful bill under the name “Truth-in-Polling Act”.
The AAPOR release states that the organization “has for decades worked to encourage disclosure of methods.” Be that as it may… but between encouragement and actual disclosure, there is a wide gap. As an example, consider a recent (January 2016) poll by the Harvard School of Public Health in collaboration with STAT, an organization that reports news in the health and medical field 13. One page of the 15-page report is dedicated to the poll’s methodology. Although it provides a fair amount of detail (sample size, type of sampling, dates during which the poll took place, mode of interviewing), one would be hard-pressed to find any information on response rate – even though it warns the reader that non-response bias can be part of the total error of the survey. Now I am not trying to pick on the folks who did this particular survey (I just happened to receive the results in my inbox as I was writing this), or the polling house (SSRS) that actually conducted the data collection, and is a member of AAPOR’s Transparency Initiative; I am sure they are all fine upstanding researchers. I am merely illustrating that the ideal of “full disclosure” that AAPOR promotes is yet to be realized – as some gentleman from China has said, I am told, “the future is bright but the road is tortuous”.
So, one may ask, why this hard push about transparency? Answer: the Internet (at least one answer). Thanks to the advent of this technology just about anyone can do a survey or poll nowadays. Throw a few questions together (you know how to ask questions, don’t you Steve?), spend a few bucks (or loonies, if you’re in Canada) to use SurveyMonkey or some other web-survey platform, put an ad on craigslist (or elsewhere) to recruit your participants; when you’re done download the results into Excel, et voilà, you’ve got yourself a study. (Disclaimer: I want the reader to know that I am neither promoting nor endorsing the companies mentioned. I am just describing what I have witnessed during the course of my professional career.) Because this technology is so ubiquitous and seemingly user-friendly, it endangers the monopoly the polling profession has over the production of knowledge about society, in general, and public opinion, in particular. It threatens the profession in that it creates the appearance that to conduct a survey or poll no longer requires “expert” knowledge – just like the museum visitor standing in front of a Jackson Pollock painting and exclaiming “My six-year-old could’ve done that!”
The promotion of transparency is, in part, a demarcation maneuver (again to borrow from Gieryn). It is a means for AAPOR to assert its authority and to reiterate what is and what is not legitimate when it comes to polling. Those who have joined the Transparency Initiative are recognized as worthy (i.e. scientific) polling practitioners; membership is akin to the label consumers find on packaged products in the supermarket. The “Transparency Initiative” label tells consumers that the product from a particular organization is fit for consumption and, by extension, that the products of those who have not joined the TI should be viewed as suspect (non-scientific).
One last thing about “transparency”: there is no such thing as “full disclosure” – at least from commercial polling houses. Polling is a business, big business, not mere idle curiosity. These companies can always invoke proprietary rights to avoid revealing how the results they publish have been produced. Thus, the gruesome details remain hidden from the public eye 14.
Redrawing the boundary: what should be considered science?
One of the themes in Gelman’s response to AAPOR is the contested nature of the boundary between what constitutes sound (i.e. scientific) polling practice and what does not. As we saw, AAPOR is firmly attached to the principle of probability (aka random) sampling. As I said before, this has been the central credo of the polling profession for decades. Gelman wants the boundary to be extended; he wants to push the demarcation line so that it will include non-probability samples. In reality, the line’s location has already been renegotiated since, nowadays, probability samples with response rates of 10% or less are still considered scientific 15. Gelman’s argument is that there is no difference between these types of samples and samples that recruit their respondents off the Internet (à la YouGov): they are both self-selected samples. Gelman writes: “the ‘grounding in theory’ that allows you to make claims about the nonrespondents in a traditional survey, also allows you to make claims about the people not reached in an internet survey.” In both cases, after the poll’s raw results are in, the analyst will have “to do some adjustment to correct for known differences between sample and population.”
In fact, Gelman is intimating that AAPOR seems to be unaware of this shift of the demarcation line, namely, that methodologies such as the one used by YouGov are definitely inside the scientific corral. In his view, they have become a legitimate part of the polling culture. Both he and Baker state that data obtained from Internet opt-in panel polls have a solid pedigree: they have passed muster. How? By the traditional, tried-and-true means of establishing one’s claim to scientificity, or scientific worth: the peer-review system and presentations at conferences attended by one’s peers. Baker writes: “There is a substantial literature stretching back to the 2000 elections showing that with the proper adjustments polls using online panels can be every bit as accurate as those using standard RDD samples.” He could have added “or inaccurate” to his statement for the sake of completeness.
Just as AAPOR relies on “transparency” to question the scientific credentials of polling houses that rely on non-probability samples, like YouGov, Gelman, clearly wedded to the transparency norm, underscores the fact that YouGov’s chief scientist “has detailed the methodology at length and subjected the methodology and results to public transparency that rivals the best practices of major polling companies.” In addition, that same individual has written “academic papers (…) published in the top peer review journals.” He adds: “If anything, people on the cutting edge of research are not hiding anything; on the contrary, we are fighting hard to overcome entrenched methods by being even more diligent and transparent.” So there you are. Who could doubt the scientific worth of the new polling techniques? They have been peer-reviewed and they are as transparent as Baccarat crystal. Clearly they have proven, so Gelman believes, their scientificity, and therefore their legitimacy. So what’s the beef?
In his diatribe of August 6, Gelman adopts a rhetoric that has a quasi-moralistic tone: he paints AAPOR as a force opposing progress. The title of his post could not be more explicit: AAPOR is stuck in the past, still relying on the horse-drawn “buggy” to get around, whereas he and his acolytes are the forces of progress, gliding in the most up-to-date mode of transportation, the automobile, propelled by the internal combustion engine. Who could argue against progress? Who would support obscurantism? AAPOR, apparently. Thus, Gelman’s whiggish attitude seems to want to locate this venerable institution beyond the pale – in that hellish zone of non-science. But really what he wants AAPOR to do is to recognize the scientific character, the legitimacy and respectability, of the new polling methodologies. It is time, he proclaims, to expand the scientific territory, to push back the boundaries, for the de jure to catch up with the de facto.
Resolution? Plus ça change… or “déjà vu all over again” (Berra)
The debate around the NYT/CBS announcement boils down to this: are polling samples based on Internet opt-in panels ready for prime time or not? AAPOR says no; Gelman and like-minded researchers say yes. How is this controversy going to be resolved? If the issue appears to be unsettled, it is only in the sphere of the de jure (an official acknowledgment from AAPOR), because, on the ground, in the de facto world, it has been resolved: pollsters have “voted with their feet”. Internet opt-in panels have been in use in the commercial polling world for nearly two decades. Powerful economic interests are at stake here: all the corporate polling organizations that have sprouted as a result of the advent of the Web. And it is not some statistical theory (probability sampling), however prestigious, especially when its application is doubtful and cumbersome, that is going to stand in the way of business: clients expect actionable results, while the polling house expects to be profitable – and so do corporate clients. Besides, as some believe (Gelman and others), plenty of tools have been developed to mitigate the limitations of self-selected (opt-in) samples, and their scientific character cannot be impugned: they can “mirror” reality just as well (or as badly) as the next probability sample.
The issue is how this is going to work its way into AAPOR’s code of professional practice. In fact, non-probability samples have already carved themselves a bit of territory within the AAPOR canon. The current AAPOR Code of Ethics (November 2015 update) states: “Disclosure requirements for non-probability samples are different because the precision of estimates from such samples is a model-based measure (rather than the average deviation from the population value over all possible samples). Reports of non-probability samples will only provide measures of precision if they are accompanied by a detailed description of how the underlying model was specified, its assumptions validated and the measure(s) calculated. To avoid confusion, it is best to avoid using the term ‘margin of error’ or ‘margin of sampling error’ in conjunction with non-probability samples” 16. So the non-probability sample, anathema as it was in the not too distant past, has got its foot in the door – and then some. Does that mean the controversy is over? Apparently so. Of course, a lot of folks are not too crazy about non-probability samples; their probability counterparts are so much neater – if only those darn people cooperated, the blissful days of the 70%+ response rate would be back. But what can you do, if you’re not the Federal government? The show must go on, as thespians say. Hence the online opt-in panel. Thus, non-probability sampling and probability sampling, now both harboring the science label, seem destined to live side-by-side in peaceful coexistence for the foreseeable future.
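To see why the term is reserved for probability samples, it may help to recall what the classic “margin of error” actually computes. The sketch below is mine, not AAPOR’s: it uses the textbook normal-approximation formula for a proportion under simple random sampling, and the function name and the sample size of 1,500 are purely illustrative.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Half-width of the approximate 95% interval for a proportion p
    estimated from a simple random sample of size n.

    Assumption: every unit had a known chance of selection and the
    sample is a simple random one; z = 1.96 is the usual 95% multiplier.
    """
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical poll: 1,500 respondents, an observed proportion of 50%.
moe = margin_of_error(0.5, 1500)
```

Here the result comes out at roughly 2.5 percentage points. The formula only describes how estimates would vary across all the random samples that could have been drawn; nothing in it accounts for who chose to opt in, which is why the Code asks for a model-based measure of precision instead.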
The polling profession has come full circle: it started its modern career (ca. 1935) using non-probability samples (quotas), and now it has gone back to its roots by relying on opt-in online panels. And both claim to be scientific. Another feature they have in common is their dependence on very large samples, much larger than is required if one uses probability sampling. In the ’30s, Gallup used “vote poll” samples in the one hundred to two hundred thousand range 17. This was considered progress compared to the mass mailing (10 million) done by the most prestigious poll of that era: the Literary Digest poll. The scientific pollsters (Crossley, Gallup, and Roper) considered the Digest’s approach to be wasteful, among other things. Nowadays, online polling organizations also rely on samples in the tens of thousands to make their forecasts.
Scientific practice, here social scientific practice, seems to be ruled, in part, by the Humpty-Dumpty philosophy: “Science means just what I choose it to mean–neither more nor less.” (The reader will forgive, I hope, the poetic license, once again.) Moreover, what constitutes science depends on the circumstances. As I said, the quota sampling used in the 30s and 40s by Crossley, Gallup, and Roper was considered scientific, and labeled as such, even though probability sampling was known and, in 1934, had been demonstrated by a Polish mathematician-statistician, Jerzy Neyman, to be superior to any other form of sampling. The pollsters did not adopt probability sampling until well after their disastrous prediction of a Dewey victory over Truman in the 1948 presidential election. Folks in federal agencies, such as the Department of Agriculture, by contrast, quickly adopted Neyman’s approach, and he was invited to lecture the staff on the issue of probability sampling. So, in effect, two forms of “scientific” sampling, although apparently polar opposites, one a probability methodology, the other a non-probability practice, coexisted for a number of years. Why does that sound familiar?
But let’s come back to our world. Whose “science” is winning? Link’s or Gelman’s? But is there a contest in the first place? I think not – in spite of the appearances: the bile spilled, the moral high ground (e.g. innovation vs. “methodological conservatism”), the abandonment of standards, etc. Gelman and like-minded data analysts are going about their business. As Gelman puts it, addressing AAPOR: “How bout [sic] you do your job and I do mine.” Indeed, no one in one’s right mind is going to strip probability sampling of its scientific legitimacy. But it is its practical implementation these days that makes it problematic for many survey researchers, thus their reliance on the opt-in methodology thanks to the rise of the Internet. This difficulty in applying probability sampling is reminiscent, if the reader allows me to go down memory lane once again, of the assessment made by the pollsters of the 30s and 40s. Gallup wrote: “Although random sampling can be highly accurate in the case of homogeneous populations, and is in many cases the simplest sampling method, there are times when it cannot be used successfully. Sometimes the statistical universe is heterogeneous–that is, it is composed of a number of dissimilar elements which are not evenly distributed throughout the whole. In addition, the universe is sometimes so widely distributed or so inaccessible that it is not feasible to set up a random sampling procedure which will guarantee that each unit has an equally good chance of being included in the sample” 18. Thus, they chose to use quota sampling. Gallup and his fellow pollsters were not the only ones in those days to think that way. As eminent a statistician as Samuel Wilks could state: “In the case of large-scale polls, which are made on a state-wide or nation-wide basis, it is clear that it would be impossible, or at any rate highly impractical to draw a random sample from the population under consideration” 19. 
Just like their counterparts today, the pollsters of yesteryear found it very difficult to implement probability sampling, so they relied on a non-probability methodology to select respondents for their polls.
I have tried to illustrate the back-and-forth way the science label has been attached to and then taken away from non-probability sampling depending on the circumstances. During the early era of modern polling (1935-1948) in America, pre-election and issue polls were characterized by a distinctly non-probability methodology, the quota sample, which, nevertheless, was branded as scientific by the pollsters of that time. The circumstances then were that probability sampling was not a viable method for the pollsters in those days. During the golden era of random-digit-dialing (RDD) telephone surveys, any form of non-probability sampling was frowned upon and considered distinctly non-scientific. Respondents in non-probability samples only represent themselves, we were told sternly. The circumstances then were that polls were blessed with relatively high response rates (70%+). Then, just in time, came the Internet or World Wide Web era, and non-probability samples were back in business. The circumstances then were that traditional RDD surveys were (are) plagued with appallingly low response rates, making it increasingly costly, and thus impractical, to implement this methodology. That tension is present in the world of polling and survey research seems clear enough. On one side is the AAPOR statement; the association appears reluctant to confer the science label on opt-in Internet polls. On the other, there are those who rely squarely on that technology and believe in its scientificity. Nowadays, we live in an era, and not for the first time in the history of polling, in which two seemingly opposite sampling methodologies are used by practitioners. Both technologies have been labeled as science and both are riding into the sunset, perhaps not hand-in-hand but definitely side-by-side, towards new successes and failures for the foreseeable future.
Does the adage of the Spanish philosopher George Santayana, “Those who cannot remember the past are condemned to repeat it”, apply to the polling industry or not? Or does it matter?
7 Thomas F. Gieryn: “Boundary-Work and the Demarcation of Science from Non-Science: Strains and Interests in Professional Ideologies of Scientists”, American Sociological Review, Vol. 48, No. 6 (Dec., 1983), pp. 781-795.
8 They were criticized by a few statisticians during the course of a congressional hearing in December 1944: Hearings Committee to Investigate Campaign Expenditures House of Representatives Seventy-Eighth Congress Second Session on H. Res. 551. See for example p. 1294: “The quota-sampling method used, and on which principal dependence was placed, does not provide insurance that the sample drawn is a completely representative cross-section of the population eligible to vote, even with an adequate size of sample.” But to no avail.
9 In other countries, France for example, polling organizations have been using the quota methodology with, presumably, as much success and failure as their American counterparts using probability sampling. To paraphrase, with some poetic license, one of their 17th century compatriots: science on this side of the Atlantic, non-science on the other side.
10 This is a fascinating case and a real treasure trove for the sociologist of (social) scientific knowledge, and merits a post in itself – I will work on it. I mention it here because it was roundly criticized by AAPOR, among others, for the lack of randomness of the samples and the very low response rate to its questionnaires – in other words, not much different than one of today’s Internet or telephone surveys.
11 Benjamin Ginzburg, “Dr. Gallup on the mat”, The Nation, December 16, 1944, pp. 159, 737-739.
12 Joseph Alsop, “Dissection of a Poll”, The New Yorker, September 24, 1960, pp. 170-174, 177-184.
13 http://www.statnews.com/2016/02/11/stat-harvard-poll-gene-editing/ and https://cdn1.sph.harvard.edu/wp-content/uploads/sites/94/2016/01/STAT-Harvard-Poll-Jan-2016-Genetic-Technology.pdf (p.10 for the methodology; retrieved Thu 2/11/2016).
14 Academic survey research centers don’t escape the bottom line either: they will be closed down if they don’t meet certain financial standards. Knowledge production is good but not at any cost.
15 Pollsters and survey researchers have always had to struggle with low response rates: in other words, low response rates are nothing new. Contrary to what a recent article in the New Yorker claims [http://www.newyorker.com/magazine/2015/11/16/politics-and-the-new-machine] (a claim later picked up by the Guardian [http://www.theguardian.com/us-news/datablog/2016/jan/27/dont-trust-the-polls-the-systemic-issues-that-make-voter-surveys-unreliable]), response rates in the 1930s in America were not in the 90s. The most prestigious poll during that era was conducted by the Literary Digest (a weekly magazine similar to today’s Time): the highest response rate it achieved was about 24% in 1930 and 1936. When the new pollsters (Crossley, Gallup and Roper) emerged in 1935, they used quotas as their sampling methodology, from which a response rate cannot be computed. However, Gallup did use mail-in ballots, in addition to in-person interviews, for his pre-election polls of 1936. Two researchers assessing Gallup’s ballot returns wrote: “As a rule less than one-fifth of the mailed ballots are returned and these tend to come from selected groups. (…) The [Gallup] Institute found that the largest response (about 40 per cent) came from people listed in Who’s Who. Eighteen per cent of the people in telephone lists, 15 per cent of the registered voters in poor areas, and 11 per cent of people on relief returned their ballots” – a far cry from 90% (Daniel Katz & Hadley Cantril, “Public Opinion Polls”, Sociometry, Vol. 1, No. 1/2, Jul. - Oct., 1937, p.160).
17 “POLL: Dr. Gallup to Take the National Pulse and Temperature”, News-Week, October 26, 1935, p.24. Gallup was less than transparent when it came to revealing the exact size of his samples.
18 George Gallup and Saul Forbes Rae, The Pulse of Democracy: The Public-Opinion Poll and How It Works, 1940, Simon & Schuster, New York, p.59.
19 Samuel S. Wilks, “Representative Sampling and Poll Reliability”, The Public Opinion Quarterly, Vol. 4, No. 2 (Jun., 1940), p. 262.