Genes Are Us

statistics

(Photo: bloomua/Shutterstock)

Why Statistically Significant Studies Aren’t Necessarily Significant

• June 06, 2014 • 9:52 AM

Modern statistics have made it easier than ever for us to fool ourselves.

Scientific results often defy common sense. Sometimes this is because science deals with phenomena that occur on scales we don’t experience directly, like evolution over billions of years or molecules that span billionths of meters. Even when it comes to things that happen on scales we’re familiar with, scientists often draw counter-intuitive conclusions from subtle patterns in the data. Because these patterns are not obvious, researchers rely on statistics to distinguish the signal from the noise. Without the aid of statistics, it would be difficult to convincingly show that smoking causes cancer, that drugged bees can still find their way home, that hurricanes with female names are deadlier than ones with male names, or that some people have a precognitive sense for porn.

OK, very few scientists accept the existence of precognition. But Cornell psychologist Daryl Bem’s widely reported porn precognition study illustrates the thorny relationship between science, statistics, and common sense. While many criticisms were leveled against Bem’s study, in the end it became clear that the study did not suffer from an obvious killer flaw. If it hadn’t dealt with the paranormal, it’s unlikely that Bem’s work would have drawn much criticism. As one psychologist put it after explaining how the study went wrong, “I think Bem’s actually been relatively careful. The thing to remember is that this type of fudging isn’t unusual; to the contrary, it’s rampant–everyone does it. And that’s because it’s very difficult, and often outright impossible, to avoid.”

That you can lie with statistics is well known; what is less commonly noted is how much scientists still struggle to define proper statistical procedures for handling the noisy data we collect in the real world. In an exchange published last month in the Proceedings of the National Academy of Sciences, statisticians argued over how to address the problem of false positive results: statistically significant findings that, on further investigation, don’t hold up. Non-reproducible results in science are a growing concern, so do researchers need to change their approach to statistics?

Valen Johnson, at Texas A&M University, argued that the commonly used threshold for statistical significance isn’t as stringent as scientists think it is, and therefore researchers should adopt a tighter threshold to better filter out spurious results. In reply, statisticians Andrew Gelman and Christian Robert argued that tighter thresholds won’t solve the problem; they simply “dodge the essential nature of any such rule, which is that it expresses a tradeoff between the risks of publishing misleading results and of important results being left unpublished.” The acceptable level of statistical significance should vary with the nature of the study. Another team of statisticians raised a similar point, arguing that a more stringent significance threshold would exacerbate the worrying publishing bias against negative results. Ultimately, good statistical decision making “depends on the magnitude of effects, the plausibility of scientific explanations of the mechanism, and the reproducibility of the findings by others.”
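The tradeoff Gelman and Robert describe can be put in rough numbers. The sketch below is only an illustration, not anything taken from the PNAS exchange: it assumes a simple two-sided z-test with known unit variance, and the effect size (0.3), the sample size (50), and the two thresholds (0.05 and 0.005) are hypothetical values chosen for the example. It computes the chance that a study detects a real effect at each threshold.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha):
    """Power of a two-sided one-sample z-test, assuming the noise SD is 1."""
    std_normal = NormalDist()
    # Cutoff the test statistic must exceed (in absolute value) to be "significant".
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # How far the true effect shifts the test statistic away from zero.
    shift = effect_size * n ** 0.5
    upper_tail = 1 - std_normal.cdf(critical - shift)
    lower_tail = std_normal.cdf(-critical - shift)
    return upper_tail + lower_tail


# Hypothetical study: a modest real effect and a modest sample.
for alpha in (0.05, 0.005):
    power = z_test_power(effect_size=0.3, n=50, alpha=alpha)
    print(f"threshold {alpha}: chance of detecting the real effect = {power:.2f}")
```

Under these assumptions, tightening the threshold cuts false positives but also sharply lowers the odds that a real, modest effect clears the bar, which is the risk of "important results being left unpublished."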

However, arguments over statistics usually occur because it is not always obvious how to make good statistical decisions. Some bad decisions are clear. As xkcd’s Randall Munroe illustrated in his comic on the spurious link between green jelly beans and acne, most people understand that if you keep testing slightly different versions of a hypothesis on the same set of data, sooner or later you’re likely to get a statistically significant result just by chance. This kind of statistical malpractice is called fishing or p-hacking, and most scientists know how to avoid it.
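The arithmetic behind the jelly bean comic is short enough to sketch. The example below assumes that each test is independent, that there is no real effect anywhere, and that the significance threshold is the conventional 0.05; the function name is made up for illustration. It computes the chance that at least one test comes up "significant" purely by luck.

```python
def chance_of_false_alarm(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests of a
    nonexistent effect falls below the significance threshold alpha."""
    # Each test stays quiet with probability (1 - alpha); all must stay quiet.
    return 1.0 - (1.0 - alpha) ** num_tests


for num_tests in (1, 5, 20, 100):
    rate = chance_of_false_alarm(num_tests)
    print(f"{num_tests:>3} tests: {rate:.2f} chance of at least one spurious hit")
```

With 20 tests, one per jelly bean color in the comic, a spurious hit is more likely than not. Real tests run on the same data are rarely fully independent, so treat these figures as a rough guide.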

But there are more subtle forms of the problem that pervade the scientific literature. In an unpublished paper, statisticians Andrew Gelman, at Columbia University, and Eric Loken, at Penn State, argue that researchers who deliberately avoid p-hacking still unknowingly engage in a similar practice. The problem is that one scientific hypothesis can be translated into many different statistical hypotheses, with many chances for a spuriously significant result. After looking at their data, researchers decide which statistical hypothesis to test, but that decision is skewed by the data itself.

To see how this might happen, imagine a study designed to test the idea that green jellybeans cause acne. There are many ways the results could come out statistically significant in favor of the researchers’ hypothesis. Green jellybeans could cause acne in men, but not in women, or in women but not men. The results may be statistically significant if the jellybeans you call “green” include Lemon Lime, Kiwi, and Margarita but not Sour Apple. Gelman and Loken write that “researchers can perform a reasonable analysis given their assumptions and their data, but had the data turned out differently, they could have done other analyses that were just as reasonable in those circumstances.” In the end, the researchers may explicitly test only one or a few statistical hypotheses, but their decision-making process has already biased them toward the hypotheses most likely to be supported by their data. The result is “a sort of machine for producing and publicizing random patterns.”
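A small simulation can make the jellybean scenario concrete. This is a simplified sketch, not Gelman and Loken's own analysis: it assumes the jellybeans have no effect at all, that acne scores are plain random noise with a standard deviation of 1, and that the researcher's data-driven choice amounts to reporting whichever of three comparisons (everyone, men only, women only) looks strongest. All names and numbers are hypothetical.

```python
import math
import random


def two_sided_p(treated, control):
    """Two-sided z-test p-value for a difference in means (noise SD assumed to be 1)."""
    diff = sum(treated) / len(treated) - sum(control) / len(control)
    standard_error = math.sqrt(1 / len(treated) + 1 / len(control))
    z = diff / standard_error
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(num_studies=2000, n_per_cell=25, alpha=0.05, seed=0):
    """Return (false positive rate with a fixed test, rate with a data-chosen test)."""
    rng = random.Random(seed)

    def draw():
        # Every group comes from the same distribution: the jellybeans do nothing.
        return [rng.gauss(0, 1) for _ in range(n_per_cell)]

    fixed_hits = 0
    forked_hits = 0
    for _ in range(num_studies):
        men_eaters, men_controls = draw(), draw()
        women_eaters, women_controls = draw(), draw()
        p_everyone = two_sided_p(men_eaters + women_eaters,
                                 men_controls + women_controls)
        p_men = two_sided_p(men_eaters, men_controls)
        p_women = two_sided_p(women_eaters, women_controls)
        # Fixed plan: always test everyone, decided before seeing the data.
        fixed_hits += p_everyone < alpha
        # Forked plan: report whichever comparison the data makes look best.
        forked_hits += min(p_everyone, p_men, p_women) < alpha
    return fixed_hits / num_studies, forked_hits / num_studies


fixed_rate, forked_rate = simulate()
print(f"analysis fixed in advance: {fixed_rate:.3f} of null studies 'significant'")
print(f"analysis chosen from data: {forked_rate:.3f} of null studies 'significant'")
```

In each simulated study the researcher reports only one test, yet the false positive rate for the data-chosen analysis runs well above the nominal 5 percent, because the choice of test was itself shaped by the noise.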

Gelman and Loken are not alone in their concern. Last year Daniele Fanelli, at the University of Edinburgh, and John Ioannidis, at Stanford University, reported that many U.S. studies, particularly in the social sciences, may overestimate the effect sizes of their results. “All scientists have to make choices throughout a research project, from formulating the question to submitting results for publication,” they write. These choices can be swayed “consciously or unconsciously, by scientists’ own beliefs, expectations, and wishes, and the most basic scientific desire is that of producing an important research finding.”

What is the solution? Part of the answer is to not let measures of statistical significance override our common sense—not our naïve common sense, but our scientifically-informed common sense. We shouldn’t put much stock in one statistically significant precognition result that defies everything we know about the physical world. Studies with small, unrepresentative samples can be valuable, but we should treat them cautiously before they are replicated with other samples. As Gelman and Loken put it, without modern statistics most people would not believe a remarkable claim about general human behavior “based on two survey questions asked to 100 volunteers on the internet and 24 college students. But with the p-value, a result can be declared significant and deemed worth publishing in a leading journal in psychology.”

Michael White
Michael White is a systems biologist at the Department of Genetics and the Center for Genome Sciences and Systems Biology at the Washington University School of Medicine in St. Louis, where he studies how DNA encodes information for gene regulation. He co-founded the online science pub The Finch and Pea. Follow him on Twitter @genologos.

Copyright © 2014 by Pacific Standard and The Miller-McCune Center for Research, Media, and Public Policy. All Rights Reserved.