On September 18, 2017, the results of a national poll of American college undergraduates were published on the website of the Brookings Institution. The results were commented on by the researcher who initiated the study, John Villasenor, a professor of electrical engineering at the University of California, Los Angeles, and a nonresident senior fellow at Brookings. The survey, conducted in August, received financial support from the Charles Koch Foundation. Its central topic was college students’ knowledge of and attitudes towards the First Amendment. As its name suggests, the amendment is the first addition to the Constitution of the United States of America, and it deals, among other things, with the issue of “freedom of speech.” Specifically, the amendment prohibits Congress from passing any law that would curtail “freedom of speech.” The poll in question explored this issue with college students in the U.S. who are American citizens.
The poll generated controversy not only for its substantive findings (“A chilling study shows how hostile college students are toward free speech”—Washington Post), but for its methodology (“‘Junk science’: experts cast doubt on widely cited college free speech survey”—The Guardian). In this comment, I will concentrate on the latter: what is considered legitimate knowledge and what is not?
The author of the Guardian piece (09/22/2017) spoke to several polling experts. One of them, Cliff Zukin, a former president of the American Association for Public Opinion Research (AAPOR, 2005-6), was reported as saying that the professor’s survey was “malpractice,” and “junk science.” [AAPOR describes itself as “the leading professional organization of public opinion and survey research professionals in the U.S., with members from academia, media, government, the non-profit sector and private industry.” Disclosure: I am a member of this organization.] Zukin opined that the Brookings poll should never have been reported in the press. He added, somewhat nonsensically, “If it’s not a probability sample, it’s not a sample of anyone [emphasis added], it’s just 1,500 college students who happen to respond.” Another past president of AAPOR, Michael Traugott (1999-2000), was interviewed. He stated, more diplomatically, that the poll was “an interesting piece of data.” But he, as well, doubted its validity: “Whether it represents the proportion of all college students who believe this is unknown.” The current president of AAPOR, Timothy Johnson (2017-8), was also contacted. He is reported as saying that the survey was “really not appropriate.” Finally, a vice-president at Ipsos, a multinational commercial polling firm, was asked what he thought. In his view, the professor “overstate[d] the quality of his survey.” How did Villasenor go about doing that? By providing a “margin of error,” said the Ipsos man.
In search of the poll’s methodology
So what do we know about the way this survey was conducted? Not much. But this is not unusual for polls. Villasenor’s methodology section is minimalist, to say the least, especially when it comes to the way his sample was selected. The poll was conducted online and 1,500 students responded. How did he find these students? We are not told. How many were eligible? Again, we are not told. How many eligible students were contacted? No answer. The field period started August 17 and ended August 31, 2017. Professor Villasenor tells us that he hired a polling firm to do the data collection. Which one? He does not say. He reports that the data collected were weighted with respect to gender. Indeed, his sample was about 70 percent female (N=1,040), whereas we are told that women represent 57 percent of the college population. However, college students who are American citizens are Villasenor’s target population. Since (I am guessing) students in the U.S. are overwhelmingly American citizens, the difference is probably not critical. Let us say that this poll does not meet the minimum standards of disclosure recommended by AAPOR or by the National Council on Public Polls. Is that it? Just about. He does tell his readers one more thing, something that has been described, after the fact, as a “caveat.” I quote: “To the extent that the demographics of the survey respondents (after weighting for gender) are probabilistically representative of the broader U.S. college undergraduate population, it is possible to estimate the margin of error…” What does that mean? It is a roundabout way of telling us that his sample is not a probability sample; but if you’d like to assume that it is, you can go ahead and compute a margin of sampling (an important word he omits) error. Is this assumption warranted? The author gives us no evidence to that effect.
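To make that conditional concrete, here is a minimal sketch of the two calculations the caveat alludes to: weighting for gender and computing a margin of sampling error. It is my illustration, not Villasenor's procedure. It assumes a simple random sample, a 95 percent confidence level, and the worst-case proportion of 0.5; the count of 460 men is simply 1,500 minus the 1,040 women reported, and the function names are hypothetical.

```python
import math

def gender_weights(n_female, n_male, target_female_share):
    """Post-stratification weights that pull the sample's gender mix to a target share."""
    n = n_female + n_male
    w_female = target_female_share * n / n_female
    w_male = (1.0 - target_female_share) * n / n_male
    return w_female, w_male

def margin_of_sampling_error(n, p=0.5, z=1.96):
    """Half-width of a 95% interval for a proportion; meaningful only for a random sample."""
    return z * math.sqrt(p * (1.0 - p) / n)

# 1,040 women reported; 460 men is an assumption (1,500 minus 1,040).
w_f, w_m = gender_weights(n_female=1040, n_male=460, target_female_share=0.57)
moe = margin_of_sampling_error(n=1500)

print(f"weight per woman: {w_f:.2f}, weight per man: {w_m:.2f}")
print(f"margin of sampling error, if the sample were random: +/-{100 * moe:.1f} points")
```

The arithmetic goes through whatever the origin of the sample. What is in doubt is the random-sample premise, not the formula.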
Bad poll v. good poll
This is the crux of what the experts, quoted by The Guardian, do not like about the poll. For probability fundamentalists, if a poll is not based on a probability (i.e. random) sample, the findings are not worth the paper they are printed on. The results cannot be generalized to the population the poll purports to be studying. A probability survey is one in which each element in the target population (e.g. U.S. college students who are American citizens) has a known probability greater than zero of being selected into the sample. This is what allows pollsters to make statements about the population of interest based on the sample. From the little we are told about this poll, the sample is most likely composed of a self-selected sample of college students who are American citizens. Self-selection contravenes classical statistical theory of random sampling—think of letters of constituents sent to a member of Congress about some issue. As such, calculating a margin of sampling error is an exercise in futility.
The Guardian article ends with a positive counter-example, i.e. a “good” poll. It mentions a 2016 Gallup survey of 3,000 students asking similar questions but coming up with very different answers. The story appeared to be attributing the dissimilarities to differences in methodologies. The newspaper states that the students “had been selected in a carefully randomized process from a nationally representative group of colleges.” (More about that later.)
But some came to the rescue of the Brookings Institution’s poll. One of them was the Washington Post columnist (09/28/2017) whose first piece commented on the results under the “chilling story” headline. In her second column, entitled “Free speech and ‘good’ vs. ‘bad’ polls,” she referred to the rebukes of the poll as “disingenuous, confused or both.” She added: they “don’t render a poll ‘junk science’.” She points out correctly that a lot of “major surveys are now conducted online and use ‘non-probability’ samples.”
We actually learn more from her about the methodology Villasenor used for his survey than we did in the original report on the poll! According to this Post column, the professor contacted the Rand Survey Research Group. They advised him, we are told, on sampling methods and put him in touch with a commercial polling house (Opinion Access Corporation—OAC) that conducts online polls. This firm had in its database members of the population of interest: “college students (subsequently narrowed to college students at four-year schools only).” One wonders why the good professor couldn’t have told us that in the first place. (Note also that the Post columnist says nothing about the citizenship criterion.) Although we know a little bit more about the methodology of the Brookings’ poll, many questions remain. For instance, how does OAC recruit its panel? How many “eligible” students did it contact for the poll? Again, we are in the dark.
Another defender of the Brookings poll is a blogger for the website “reason.com” (“Is That Poll That Found College Students Don’t Value Free Speech Really ‘Junk Science’? Not So Fast”—9/30/2017). In her view, just because the poll is based on an opt-in panel is no reason to “disregard the findings.” Like her Post colleague, she argues that many reputable firms rely on this methodology. She writes: “These days, lots of well-respected outfits are doing sophisticated work outside the confines of traditional probability polls.” And she adds: “it’s a stretch to claim that any poll that uses an opt-in panel is necessarily junk”.
Controversy over Sampling
Ever since their first appearance in the mid-1990s, Internet polls have been controversial. I think it is fair to say, though, that the controversy is dying down; the community of sample survey practitioners has had to face the facts of life, grudgingly for some: Internet polls are widely used and are here to stay, at least for the time being.
The dispute around the Brookings poll is just the latest installment on the issue of what constitutes legitimate knowledge when the source of that knowledge is a sample survey. (For an analysis of another recent flare-up see “Using online panels for election polls”.) For decades in America, the hegemonic creed that held sway over the community of sample survey practitioners was probability sampling. It was believed to be the only way to obtain reliable and valid (all other things being equal) knowledge from a poll or survey. If one happened not to practice the credo, one did so surreptitiously, being very careful not to advertise this breach—one did not wish to be labeled a deviant. Probability sampling was the “gold standard” and still is.
Nowadays, this ideal is unattainable for most researchers (or so they say), especially those in the commercial sector. With the secular decline of response rates, probability sampling is in jeopardy. As a result, non-probability sampling advocates have been emboldened. Not too long ago questioning the orthodoxy of probability sampling would simply have been inconceivable: anybody who had the audacity to suggest that there was merit in non-probability samples would have received a severe tongue lashing from the guardians of the faith. But with the rise of the Internet and response rates in the single digits, the non-probability school feels it can attack the legitimacy of sample surveys that are probability based with impunity. They argue that polls that have such low response rates cannot claim to be probability-based even though the original mechanism used to select the elements in the population to make up the sample was random. The reason for this is that high rates of non-response destroy the random (probability) quality of the sampling process. What practitioners end up with is a self-selected sample. In addition, and more importantly, it is often assumed that a high rate of non-response is associated with large non-response bias. The latter means that there is a wide gap between those who answered the polls and those who did not on the issue of interest—this is what happened, as far as anyone can tell, in the infamous Literary Digest presidential poll of 1936: respondents favored the Republican candidate (Alf Landon) and non-respondents supported FDR, the incumbent president, and ultimate winner. Of course, non-response bias can only be determined empirically, but as consumers of polls, it is wise to take the results of a low response poll with a heavy grain of salt, unless the polling house tells us that it has taken measures (e.g. non-response follow-up) to assess how different non-respondents are from respondents.
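The point that non-response bias depends on the gap between respondents and non-respondents, and not on the response rate alone, can be shown with a standard identity: the bias of the respondent mean equals the non-response rate times the difference between the two groups. The sketch below is only an illustration. The function name is hypothetical, and the figures are invented for the example rather than drawn from any poll discussed here.

```python
def nonresponse_bias(response_rate, respondent_mean, nonrespondent_mean):
    """Bias of the respondent mean relative to the mean of the full sample."""
    return (1.0 - response_rate) * (respondent_mean - nonrespondent_mean)

# Hypothetical numbers, chosen only for illustration.
small_gap = nonresponse_bias(0.06, respondent_mean=0.50, nonrespondent_mean=0.49)
large_gap = nonresponse_bias(0.06, respondent_mean=0.50, nonrespondent_mean=0.30)

print(f"6% response, 1-point gap:  bias = {100 * small_gap:.1f} points")
print(f"6% response, 20-point gap: bias = {100 * large_gap:.1f} points")
```

With the same low response rate, the bias is negligible in one case and severe in the other, which is why it has to be assessed empirically.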
I mentioned earlier that the Guardian story gave as an example of a “carefully randomized” poll, a 2016 survey of U.S. college students conducted by the Gallup organization (Gallup, hereafter). Gallup claimed that the poll results “are based on telephone interviews with a random sample of 3,072 U.S. college students, aged 18 to 24, who are currently enrolled as full-time students at four-year colleges” (p. 32). How did it reach this final sample? It started by selecting a random sample of 240 four-year colleges. All of these colleges were contacted but only 32 agreed to participate in the survey—that’s an eighty-seven percent refusal rate. Does that make them “a nationally representative group of colleges” as the Guardian states? From these colleges, Gallup selected a random sample of 54,806 students to whom emails were sent asking them to fill out a short Internet survey, which would determine their eligibility for a telephone interview. Thirteen percent (6,928) completed the web survey, of which ninety-eight percent (6,814) were eligible and provided a telephone number. Gallup reports that the response rate to the telephone survey was 49 percent. Finally, it states that the “combined response rate for the Web recruit and telephone surveys was 6%” (.13 × .49 × 100). Of course, this response rate does not include the fact that only 32 colleges out of the originally selected 240 decided to participate in the study. But this is a moot point given the already tiny response rate. (Note, however, how much more information we are provided, regarding how this poll was conducted, compared to the Brookings poll.) So is Gallup justified in calling this sample “random”?
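The chain of rates above can be laid out step by step. The sketch uses only the counts Gallup reports. The last line, which multiplies in the share of colleges that agreed to participate, is my own illustrative assumption about how one might fold that stage in; it is not a rate Gallup publishes.

```python
colleges_selected, colleges_participating = 240, 32
students_emailed, web_completes = 54_806, 6_928
phone_response_rate = 0.49  # as reported by Gallup

college_rate = colleges_participating / colleges_selected
web_rate = web_completes / students_emailed

# Gallup's combined figure: rounded web rate times telephone rate (.13 x .49).
combined_reported = round(web_rate, 2) * phone_response_rate

# Assumption: treat college participation as one more response stage.
combined_with_colleges = college_rate * web_rate * phone_response_rate

print(f"college participation: {college_rate:.1%}")
print(f"web survey completion: {web_rate:.1%}")
print(f"combined, as reported: {combined_reported:.1%}")
print(f"combined, with college stage: {combined_with_colleges:.1%}")
```

On that assumption the overall rate falls below one percent, which only sharpens the question of whether the sample can be called random.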
This Gallup poll is exactly the type that online pollsters would put forward as an example of a survey that is probability in name only, but in reality, is simply based on a self-selected sample, just like those online polls that use an opt-in panel to conduct their research. The online samplers’ point of view is presented in the reason.com piece. The author quotes the head of the election polling unit of the online company SurveyMonkey, who is reported as saying: “We believe we can offer something of similar quality, at a very different price point [compared to traditional probability sampling], and with more speed.” [Emphasis added.] The same story mentions another online polling house, YouGov, described as “best in class when it comes to this type of online panel research.” Indeed, a different institute, the Cato Institute, used the services of this company to conduct a poll on topics similar to those studied by Brookings and Gallup. And like Villasenor’s survey, the Cato Institute methodology page (74) reports a “margin of error”. AAPOR in a 2013 report on non-probability sampling stated: “margin of sampling error in surveys has an accepted meaning and that this measure is not appropriate for non-probability samples” (p. 82). It added: “We believe that users of non-probability samples should be encouraged to report measures of the precision of their estimates, but suggest that, to avoid confusion, the set of terms be distinct from those currently used in probability sample surveys” (p. 82). Well, so much for that. I suppose that non-probability samplers would argue that if Gallup with a response rate of less than 6 percent can give a margin of sampling error, why can’t they?
Some history and some…sociology
It is not the first time in the history of modern polling that the community of sample survey practitioners has been divided over a methodological issue, specifically over sampling, and more precisely, over the worth of probability versus non-probability sampling. In his landmark 1934 paper, Jerzy Neyman (1894-1981) demonstrated the beneficial value of probability sampling and condemned non-probability sampling as inadequate. Up until then, both had been deemed legitimate. The legitimacy of probability sampling was derived from the fact that it rested on a solid mathematical statistics foundation. Statisticians with the U.S. federal government were quick to adopt and expand on Neyman’s ideas. According to historians Duncan and Shelton: “By about the time the United States entered World War II, probability sampling had taken root in the Federal Government” (p. 323). Academic research centers followed suit, some time later. Not so for the commercial pollsters (Crossley, Gallup, and Roper). From the start (1935), and for many years thereafter, they relied on the non-probability technique of quota samples. In fact, it took almost a decade for the rift over which of the two methodologies was “better” to come out into the open. During that time the pollsters were never questioned about their sampling preference. In December 1944 (NY Times, 12/30/1944, p. 6) one of the first salvos directed against quota sampling came from a technical committee appointed by Congress to look into the methodology of polls. Referring back to the recent presidential election polls, the committee stated: “The quota-sampling method used, and on which principal dependence was placed, does not provide insurance that the sample drawn is a completely representative cross-section of the population eligible to vote, even with an adequate size of sample. In general, the major defects of the quota-sample method lie, first, in the method of fixing quotas, and, second, in the method of selection of respondents to interview” (p. 1294, Hearings before the Committee to Investigate Campaign Expenditures, House of Representatives, 78th Congress, 2nd Session, on H. Res. 551, Part 12, Thursday, December 28, 1944). The line of demarcation was clearly drawn.
Despite this warning, the pollsters persisted in their “misguided” ways. Between 1944 and 1948, the debate between the two camps heated up. Probability samplers were busy attacking the legitimacy of quotas and promoting their brand of sampling as “the best that statistical science has to offer,” (p. 26) as statisticians Philip Hauser and Morris Hansen put it. At their most strident the probability samplers characterized the pollsters’ methodology as “rule of thumb” and their polls as “more like straw votes than scientific instruments” (p. 557). Although faced with the ascendancy of probability samplers, pollsters and their allies fought back and refused to be stripped of their legitimacy: “Current attempts of some academicians,” Hadley Cantril countered, “to set up themselves and their work as ‘scientific’ while labeling Crossley, Gallup and Roper as rule-of-thumb operators is not, in my judgment, either justified or statesmanlike” (p. 23). In addition to this “condemning the condemners” line of attack, quota samplers used two other approaches to question the putative superiority of the probability norm: 1) there was no empirical proof that it was better than quotas; 2) it was far too expensive and too slow to implement for the pollsters’ purposes. The latter justification was a way to argue that their work was done within a different setting than the one in which federal workers and academicians, who made up most of the probability samplers, practiced. In other words, what the pollsters were implying was that the advancing probability norm did not apply to their case. Their sampling was indeed “scientific” (see Gallup’s testimony at the 1944 Hearings, pp. 1238, 1253) despite the fact it was performed within commercial constraints, which included providing their subscribers (the press) with timely results. They did not want to convey the impression that business considerations took priority over scientific ones, but rather that they had to deploy their “science” within a much more demanding environment than did other sample survey practitioners.
For the 1948 presidential election, the pollsters essentially used the same approach they had relied on in previous elections, but this time with disastrous consequences: they all predicted, wrongly, that the incumbent president, Harry Truman, would lose to the Republican challenger, Thomas Dewey. The righteous indignation expressed by some probability samplers at the 1948 poll failure was not simply a result of what they saw as norm violation (using quota sampling instead of probability sampling), but also because they feared it would affect the image and status of social science in general, and the sample survey in particular. Fortunately, the authoritative Social Science Research Council (SSRC) stepped in quickly to defuse the apparent crisis, and to prevent the battle over sampling from taking center stage and degrading the image of social science at a time when many among natural scientists and politicians saw it as mere “political ideology,” not science (see forthcoming). Of course, the SSRC report on the polling failure chastised the pollsters for not using more up-to-date sampling methods (p. 601), but it allocated much more space to other issues that affected the forecast (e.g., last-minute shift, identifying likely voters), effectively diluting the conflict between the two schools of sampling. In fact, some observers felt that the SSRC report showed “a tendency to let down the polling organizations easily” (p. 134). Be that as it may, it is clear that after the 1948 failure, the probability norm had gained a position of dominance. Although it would take years for it to become the pollsters’ modus operandi, as early as 1949, they felt they had to pay their respects to probability sampling. For instance, Gallup at the annual conference of AAPOR that year announced that his organization was “designing a national probability sample” (pp. 765-6).
Ever since the 1930s, non-probability sampling (whatever its form) has always been second best—at least in America. In yesteryears that methodology was labeled “rule-of-thumb” and even “primitive”; today some call it “junk science” and “malpractice”. But in reality, for decades, non-probability sampling has coexisted side-by-side with its counterpart: probability sampling. The latter had been elevated, shortly after Neyman’s paper, as the only legitimate norm of practice and was endowed with much prestige. So non-probability sampling persisted in a state of what sociologists call a “patterned evasion of norm”: the discredited practice is allowed to thrive because most turn a blind eye as long as the violation is not too flagrant, i.e. if it does not call attention to itself. When it does, then the dominant norm must be reaffirmed for all to be reminded what is legitimate and what is not. This is what happened in 1948, although in a subdued way, with the SSRC report, and this is how I would interpret the reception the Hite Report received in 1988 from the officialdom of sample survey practitioners. At the AAPOR annual conference, a panel, including Shere Hite, was convened, which essentially condemned her methodology. (For an informative and entertaining description of this drama see chapter 1 of David W. Moore’s The Superpollsters, entitled “The Sins of Shere Hite.”) Her research was very controversial and got a lot of publicity. Her samples were self-selected, i.e. non-probability, but she had the audacity to claim that hers was a “scientific” study. She was told otherwise.
We can see the similarity between these historical examples and the reported reactions of Zukin and his like-minded colleagues to the Brookings poll. Perhaps theirs is a voice crying in the wilderness, and anachronistic. It might have carried some weight back in 1988, but today? The non-probability samplers would argue that there is no such thing as a true probability sample. Only a few, mostly in government, have the luxury of obtaining one. Moreover, the online poll practitioners would say that they have at their disposal an array of sophisticated statistical tools that allow them to adjust their non-probability samples in such a way that their performance is just as good (or as bad) as a so-called probability sample. To back up their statement they could point to empirical studies and election results. With the rise of the Internet and the decline of response rates, non-probability sampling’s worth and status have improved. For practitioners in the commercial sector, mostly, and their academic acolytes, probability sampling, given the current environment, fails on two counts: cost and speed. (The quota samplers of the 40s said the same.) We must not forget that, for most sample surveyors, polling is a commercial enterprise: providing their clients with timely information at a “price point” that makes business sense is a stronger imperative than trying to fulfill a norm that is, for all intents and purposes, unachievable.
So, after all of this, who are we to believe? The Brookings poll that’s been characterized as “junk science” by some, while others tell us not to “disregard [its] findings”? Or the Gallup poll that has been described as “carefully randomized”? Or the Cato Institute poll that was conducted by a polling house that has been called “best in class” when it comes to non-probability samples?
Most poll consumers, I venture to guess, perhaps wrongly, are unaware of the lingering controversy about sampling. We read stories that purport to dispense knowledge because, it is assumed, they are based on solid evidence. Take the “chilling study” column in the Washington Post. The writer reports on the Brookings study and develops her argument without once discussing the methodology of the poll. Why should I, as her reader, question its validity? If it were no good, she wouldn’t be writing about it. I might disagree with her conclusions, but polling methodology does not even cross my mind. If I happen to stumble upon the “junk science” Guardian piece, I might start having some doubts about the Post column. But then again, if, by chance, I read the reason.com article, my doubts might be dispelled. Or, I might throw my hands up in the air and decide that I can’t believe any poll!
I would think that pollsters might be concerned about the effect this lack of consensus could have on their field’s image (prestige), especially when it is bandied about in the open. It does not promote confidence in the polls as valid purveyors of knowledge. If probability fundamentalists insist on taking a conflictual approach towards non-probability pollsters, as exemplified earlier in the Guardian piece, they are likely to cause confusion, or worse, among the attentive public. Or they can compromise and accept that there are other, possibly legitimate, ways of conducting a poll. Yet they can still derive satisfaction, if only symbolic, from the certain knowledge that theirs is the “gold standard,” however elusive it may be. They would do well to remember that we live in an age in which the practice of probability sampling is highly compromised as a result of low response rates. Ever since the 1940s probability fundamentalists have acted as if the probability norm were codified as law. But it never has been. As far as I know, neither AAPOR nor the American Statistical Association has stated in their ethical guidelines of professional practice that probability sampling is the norm to follow. Of course, professionals in any field of activity will react if some egregious norm violation has taken place. Did the Brookings poll rise to that level?