Genes Are Us

statistics

(Photo: bloomua/Shutterstock)

Why Statistically Significant Studies Aren’t Necessarily Significant

• June 06, 2014 • 9:52 AM

Modern statistics have made it easier than ever for us to fool ourselves.

Scientific results often defy common sense. Sometimes this is because science deals with phenomena that occur on scales we don’t experience directly, like evolution over billions of years or molecules that span billionths of meters. Even when it comes to things that happen on scales we’re familiar with, scientists often draw counter-intuitive conclusions from subtle patterns in the data. Because these patterns are not obvious, researchers rely on statistics to distinguish the signal from the noise. Without the aid of statistics, it would be difficult to convincingly show that smoking causes cancer, that drugged bees can still find their way home, that hurricanes with female names are deadlier than ones with male names, or that some people have a precognitive sense for porn.

OK, very few scientists accept the existence of precognition. But Cornell psychologist Daryl Bem’s widely reported porn precognition study illustrates the thorny relationship between science, statistics, and common sense. While many criticisms were leveled against Bem’s study, in the end it became clear that the study did not suffer from an obvious killer flaw. If it hadn’t dealt with the paranormal, it’s unlikely that Bem’s work would have drawn much criticism. As one psychologist put it after explaining how the study went wrong, “I think Bem’s actually been relatively careful. The thing to remember is that this type of fudging isn’t unusual; to the contrary, it’s rampant–everyone does it. And that’s because it’s very difficult, and often outright impossible, to avoid.”

That you can lie with statistics is well known; what is less commonly noted is how much scientists still struggle to define proper statistical procedures for handling the noisy data we collect in the real world. In an exchange published last month in the Proceedings of the National Academy of Sciences, statisticians argued over how to address the problem of false positives: statistically significant findings that don’t hold up on further investigation. Non-reproducible results in science are a growing concern, so do researchers need to change their approach to statistics?

Valen Johnson, at Texas A&M University, argued that the commonly used threshold for statistical significance isn’t as stringent as scientists think it is, and therefore researchers should adopt a tighter threshold to better filter out spurious results. In reply, statisticians Andrew Gelman and Christian Robert argued that tighter thresholds won’t solve the problem; they simply “dodge the essential nature of any such rule, which is that it expresses a tradeoff between the risks of publishing misleading results and of important results being left unpublished.” The acceptable level of statistical significance should vary with the nature of the study. Another team of statisticians raised a similar point, arguing that a more stringent significance threshold would exacerbate the worrying publishing bias against negative results. Ultimately, good statistical decision making “depends on the magnitude of effects, the plausibility of scientific explanations of the mechanism, and the reproducibility of the findings by others.”

However, arguments over statistics usually occur because it is not always obvious how to make good statistical decisions. Some bad decisions are clear. As xkcd’s Randall Munroe illustrated in his comic on the spurious link between green jelly beans and acne, most people understand that if you keep testing slightly different versions of a hypothesis on the same set of data, sooner or later you’re likely to get a statistically significant result just by chance. This kind of statistical malpractice is called fishing or p-hacking, and most scientists know how to avoid it.
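The arithmetic behind fishing can be sketched in a few lines of Python. This is an illustration under stated assumptions, not anything taken from the studies discussed here: the tests are treated as independent, the significance threshold is the conventional 0.05, there is no real effect anywhere, and the function names are invented for this example. The figure of 20 tests matches the number of jelly bean colors in the comic.

```python
import random

def family_wise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** num_tests

def simulate_fishing(num_tests: int, alpha: float = 0.05,
                     trials: int = 20_000, seed: int = 0) -> float:
    """Estimate the same chance by simulation (hypothetical helper)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # When there is no real effect, a p-value is uniform on [0, 1],
        # so each test comes out "significant" with probability alpha.
        if any(rng.random() < alpha for _ in range(num_tests)):
            hits += 1
    return hits / trials

if __name__ == "__main__":
    print(f"one test:  {family_wise_error_rate(1):.2f}")
    print(f"20 tests:  {family_wise_error_rate(20):.2f}")   # about 0.64
    print(f"simulated: {simulate_fishing(20):.2f}")
```

Under these assumptions, a researcher who tries 20 versions of a hypothesis that is false in every version still has roughly a two-in-three chance of finding at least one "significant" result.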

But there are more subtle forms of the problem that pervade the scientific literature. In an unpublished paper, statisticians Andrew Gelman, at Columbia University, and Eric Loken, at Penn State, argue that researchers who deliberately avoid p-hacking still unknowingly engage in a similar practice. The problem is that one scientific hypothesis can be translated into many different statistical hypotheses, with many chances for a spuriously significant result. After looking at their data, researchers decide which statistical hypothesis to test, but that decision is skewed by the data itself.

To see how this might happen, imagine a study designed to test the idea that green jellybeans cause acne. There are many ways the results could come out statistically significant in favor of the researchers’ hypothesis. Green jellybeans could cause acne in men, but not in women, or in women but not men. The results may be statistically significant if the jellybeans you call “green” include Lemon Lime, Kiwi, and Margarita but not Sour Apple. Gelman and Loken write that “researchers can perform a reasonable analysis given their assumptions and their data, but had the data turned out differently, they could have done other analyses that were just as reasonable in those circumstances.” In the end, the researchers may explicitly test only one or a few statistical hypotheses, but their decision-making process has already biased them toward the hypotheses most likely to be supported by their data. The result is “a sort of machine for producing and publicizing random patterns.”
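A small simulation can make this concrete. It is a sketch under simplifying assumptions, not Gelman and Loken's own analysis: jellybeans have no effect on anyone, acne scores are standard normal noise, there are only two possible analyses (men or women), and all names in the code are invented for illustration. One researcher commits in advance to testing men; the other looks at the data first and runs a single test on whichever subgroup looks more striking.

```python
import random
from statistics import NormalDist, fmean

CRITICAL_Z = NormalDist().inv_cdf(0.975)  # two-sided 5% cutoff, about 1.96

def subgroup_z(rng: random.Random, n: int) -> float:
    """z statistic for eaters vs. non-eaters when jellybeans do nothing."""
    eaters = [rng.gauss(0, 1) for _ in range(n)]
    others = [rng.gauss(0, 1) for _ in range(n)]
    # With unit variance in each arm, the difference in means has
    # standard deviation sqrt(1/n + 1/n).
    return (fmean(eaters) - fmean(others)) / (2 / n) ** 0.5

def false_positive_rates(trials: int = 4000, n: int = 30, seed: int = 1):
    """Return (fixed-plan rate, data-dependent rate) of false positives."""
    rng = random.Random(seed)
    fixed_hits = 0
    forked_hits = 0
    for _ in range(trials):
        z_men = subgroup_z(rng, n)
        z_women = subgroup_z(rng, n)
        # Plan fixed before seeing the data: always test men.
        fixed_hits += abs(z_men) > CRITICAL_Z
        # Data-dependent plan: test only the subgroup that looks stronger.
        forked_hits += max(abs(z_men), abs(z_women)) > CRITICAL_Z
    return fixed_hits / trials, forked_hits / trials

if __name__ == "__main__":
    fixed, forked = false_positive_rates()
    print(f"fixed plan: {fixed:.3f}, data-dependent plan: {forked:.3f}")
```

In this toy setup the second researcher runs only one explicit test per study, yet the false positive rate comes out at roughly twice the nominal 5 percent, because the choice of test was itself guided by the noise. Real studies offer far more than two such choices.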

Gelman and Loken are not alone in their concern. Last year Daniele Fanelli, at the University of Edinburgh, and John Ioannidis, at Stanford University, reported that many U.S. studies, particularly in the social sciences, may overestimate the effect sizes of their results. “All scientists have to make choices throughout a research project, from formulating the question to submitting results for publication.” These choices can be swayed “consciously or unconsciously, by scientists’ own beliefs, expectations, and wishes, and the most basic scientific desire is that of producing an important research finding.”

What is the solution? Part of the answer is to not let measures of statistical significance override our common sense: not our naïve common sense, but our scientifically informed common sense. We shouldn’t put much stock in one statistically significant precognition result that defies everything we know about the physical world. Studies with small, unrepresentative samples can be valuable, but we should treat them cautiously until they are replicated with other samples. As Gelman and Loken put it, without modern statistics most people would not believe a remarkable claim about general human behavior “based on two survey questions asked to 100 volunteers on the internet and 24 college students. But with the p-value, a result can be declared significant and deemed worth publishing in a leading journal in psychology.”

Michael White
Michael White is a systems biologist at the Department of Genetics and the Center for Genome Sciences and Systems Biology at the Washington University School of Medicine in St. Louis, where he studies how DNA encodes information for gene regulation. He co-founded the online science pub The Finch and Pea. Follow him on Twitter @genologos.

Copyright © 2014 by Pacific Standard and The Miller-McCune Center for Research, Media, and Public Policy. All Rights Reserved.