Genes Are Us

statistics

(Photo: bloomua/Shutterstock)

Why Statistically Significant Studies Aren’t Necessarily Significant

• June 06, 2014 • 9:52 AM

Modern statistics have made it easier than ever for us to fool ourselves.

Scientific results often defy common sense. Sometimes this is because science deals with phenomena that occur on scales we don’t experience directly, like evolution over billions of years or molecules that span billionths of meters. Even when it comes to things that happen on scales we’re familiar with, scientists often draw counter-intuitive conclusions from subtle patterns in the data. Because these patterns are not obvious, researchers rely on statistics to distinguish the signal from the noise. Without the aid of statistics, it would be difficult to convincingly show that smoking causes cancer, that drugged bees can still find their way home, that hurricanes with female names are deadlier than ones with male names, or that some people have a precognitive sense for porn.

OK, very few scientists accept the existence of precognition. But Cornell psychologist Daryl Bem’s widely reported porn precognition study illustrates the thorny relationship between science, statistics, and common sense. While many criticisms were leveled against Bem’s study, in the end it became clear that the study did not suffer from an obvious killer flaw. If it hadn’t dealt with the paranormal, it’s unlikely that Bem’s work would have drawn much criticism. As one psychologist put it after explaining how the study went wrong, “I think Bem’s actually been relatively careful. The thing to remember is that this type of fudging isn’t unusual; to the contrary, it’s rampant–everyone does it. And that’s because it’s very difficult, and often outright impossible, to avoid.”

That you can lie with statistics is well known; what is less commonly noted is how much scientists still struggle to define proper statistical procedures for handling the noisy data we collect in the real world. In an exchange published last month in the Proceedings of the National Academy of Sciences, statisticians argued over how to address the problem of false positive results: statistically significant findings that don’t hold up on further investigation. Non-reproducible results in science are a growing concern, so do researchers need to change their approach to statistics?

Valen Johnson, at Texas A&M University, argued that the commonly used threshold for statistical significance (conventionally a p-value below 0.05) isn’t as stringent as scientists think it is, and therefore researchers should adopt a tighter threshold to better filter out spurious results. In reply, statisticians Andrew Gelman and Christian Robert argued that tighter thresholds won’t solve the problem; they simply “dodge the essential nature of any such rule, which is that it expresses a tradeoff between the risks of publishing misleading results and of important results being left unpublished.” The acceptable level of statistical significance should vary with the nature of the study. Another team of statisticians raised a similar point, arguing that a more stringent significance threshold would exacerbate the worrying publication bias against negative results. Ultimately, good statistical decision making “depends on the magnitude of effects, the plausibility of scientific explanations of the mechanism, and the reproducibility of the findings by others.”

However, arguments over statistics usually occur because it is not always obvious how to make good statistical decisions. Some bad decisions are clear. As xkcd’s Randall Munroe illustrated in his comic on the spurious link between green jelly beans and acne, most people understand that if you keep testing slightly different versions of a hypothesis on the same set of data, sooner or later you’re likely to get a statistically significant result just by chance. This kind of statistical malpractice is called fishing or p-hacking, and most scientists know how to avoid it.
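The arithmetic behind the jelly bean problem can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every hypothesis tested is actually false, the tests are independent of one another, and each uses the conventional 0.05 threshold. The function name is made up for this example.

```python
def chance_of_false_positive(num_tests, alpha=0.05):
    """Probability that at least one of num_tests independent tests
    comes out "significant" at level alpha when no real effect exists."""
    # Each test avoids a false positive with probability (1 - alpha);
    # all of them must avoid it for the whole batch to come up clean.
    return 1.0 - (1.0 - alpha) ** num_tests


for num_tests in (1, 5, 20):
    print(num_tests, round(chance_of_false_positive(num_tests), 3))
# prints:
# 1 0.05
# 5 0.226
# 20 0.642
```

With 20 jelly bean colors, as in the comic, the chance of at least one spurious "discovery" is closer to two in three than to one in twenty.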

But there are more subtle forms of the problem that pervade the scientific literature. In an unpublished paper, statisticians Andrew Gelman, at Columbia University, and Eric Loken, at Penn State, argue that researchers who deliberately avoid p-hacking still unknowingly engage in a similar practice. The problem is that one scientific hypothesis can be translated into many different statistical hypotheses, with many chances for a spuriously significant result. After looking at their data, researchers decide which statistical hypothesis to test, but that decision is skewed by the data itself.

To see how this might happen, imagine a study designed to test the idea that green jelly beans cause acne. There are many ways the results could come out statistically significant in favor of the researchers’ hypothesis. Green jelly beans could cause acne in men, but not in women, or in women but not men. The results may be statistically significant if the jelly beans you call “green” include Lemon Lime, Kiwi, and Margarita but not Sour Apple. Gelman and Loken write that “researchers can perform a reasonable analysis given their assumptions and their data, but had the data turned out differently, they could have done other analyses that were just as reasonable in those circumstances.” In the end, the researchers may explicitly test only one or a few statistical hypotheses, but their decision-making process has already biased them toward the hypotheses most likely to be supported by their data. The result is “a sort of machine for producing and publicizing random patterns.”
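A small simulation can make this concrete. The sketch below is not Gelman and Loken's analysis; it is a toy model built on invented assumptions: acne scores are random noise with no jelly bean effect at all, each of the four groups has 50 people, and the "flexible" analyst reports whichever of three reasonable-looking comparisons (everyone, men only, women only) looks strongest once the data are in. All function and variable names are hypothetical.

```python
import math
import random


def p_value(group_a, group_b):
    """Two-sided z-test p-value for a difference in means,
    assuming each observation has known variance 1."""
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    z = (mean_a - mean_b) / math.sqrt(1 / len(group_a) + 1 / len(group_b))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(num_studies=2000, n=50, alpha=0.05, seed=0):
    """Return (fixed_rate, flexible_rate): the share of no-effect studies
    declared significant under a pre-chosen test versus a test chosen
    after looking at the data."""
    rng = random.Random(seed)
    fixed_hits = 0
    flexible_hits = 0
    for _ in range(num_studies):
        # Acne scores are pure noise: jelly beans do nothing here.
        men_eat = [rng.gauss(0, 1) for _ in range(n)]
        men_not = [rng.gauss(0, 1) for _ in range(n)]
        women_eat = [rng.gauss(0, 1) for _ in range(n)]
        women_not = [rng.gauss(0, 1) for _ in range(n)]

        p_all = p_value(men_eat + women_eat, men_not + women_not)
        p_men = p_value(men_eat, men_not)
        p_women = p_value(women_eat, women_not)

        # Fixed analyst: committed in advance to the overall comparison.
        fixed_hits += p_all < alpha
        # Flexible analyst: reports the comparison the data favor most.
        flexible_hits += min(p_all, p_men, p_women) < alpha
    return fixed_hits / num_studies, flexible_hits / num_studies


fixed_rate, flexible_rate = simulate()
print("pre-chosen test:", fixed_rate)
print("data-dependent test:", flexible_rate)
```

In this setup the pre-chosen test should produce false positives at roughly the nominal 5 percent rate, while the data-dependent choice should come out noticeably higher, even though the flexible analyst only ever reports a single test per study.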

Gelman and Loken are not alone in their concern. Last year Daniele Fanelli, at the University of Edinburgh, and John Ioannidis, at Stanford University, reported that many U.S. studies, particularly in the social sciences, may overestimate the effect sizes of their results. They write: “All scientists have to make choices throughout a research project, from formulating the question to submitting results for publication.” These choices can be swayed “consciously or unconsciously, by scientists’ own beliefs, expectations, and wishes, and the most basic scientific desire is that of producing an important research finding.”

What is the solution? Part of the answer is to not let measures of statistical significance override our common sense: not our naïve common sense, but our scientifically informed common sense. We shouldn’t put much stock in one statistically significant precognition result that defies everything we know about the physical world. Studies with small, unrepresentative samples can be valuable, but we should treat them cautiously before they are replicated with other samples. As Gelman and Loken put it, without modern statistics most people would not believe a remarkable claim about general human behavior “based on two survey questions asked to 100 volunteers on the internet and 24 college students. But with the p-value, a result can be declared significant and deemed worth publishing in a leading journal in psychology.”

Michael White
Michael White is a systems biologist at the Department of Genetics and the Center for Genome Sciences and Systems Biology at the Washington University School of Medicine in St. Louis, where he studies how DNA encodes information for gene regulation. He co-founded the online science pub The Finch and Pea. Follow him on Twitter @genologos.

Copyright © 2014 by Pacific Standard and The Miller-McCune Center for Research, Media, and Public Policy. All Rights Reserved.