Do classic psychological studies published in high-profile journals hold up? The Reproducibility Project aims to find out.
There are few psychological effects better known—or more widely accepted—in academic halls than what is called semantic priming. Show a person a simple stimulus, something as unremarkable as a photograph of a cat. Let some time pass, then ask that same person to list as many words as possible that start with the letter c. This person is more likely not only to come up with the word cat, but to mention catlike animals such as cougars and cheetahs, because he was initially primed with that one little kitty cat.
Priming’s reach, of course, stretches far beyond cognitive tests. Therapists use it to help treat patients with depression during therapy sessions. Advertisers count on commercials to prime us to buy key brands during our trips to the mall or the grocery store. Priming is considered an underlying mechanism in stereotyping. And the word has become part of our cultural lexicon, too. We talk about how we are “primed” to feel, to want, to need, to talk. Priming is everywhere.
And yet, many of the classic studies that led us to our current understanding of priming have never been replicated. In fact, the few attempts to reproduce the results that we have taken at face value for so long have failed. In late 2012, that led Daniel Kahneman, noted Princeton University psychologist and author of the best-selling book Thinking, Fast and Slow, to write an open e-mail to the entire priming-research community. He wrote, “Your field is now the poster child for doubts about the integrity of psychological research. Your problem is not with the few people who have actively challenged the validity of some priming results. It is with the much larger population of colleagues who in the past accepted your surprising results as facts when they were published.” Kahneman’s solution? A new research protocol whereby cooperating labs attempt to check and replicate each other’s studies. This is the only way, he argues, to separate the scientific wheat from the chaff.
But accuracy and integrity issues are not limited to studies about semantic priming. They plague the whole psychological community. Research replication, an essential feature of good science and the element that allows truth to shine through the experimental brume, has simply not been a priority in today’s “publish or perish” climate. And we’re now learning that many well-publicized studies can’t be replicated. (This may be due to, say, incorrect or inappropriate analysis of results, or a sample size that is too small.) And even when studies can be reproduced, scientists have little incentive to do so.
“Journals are geared toward publishing new stuff, and that new stuff tends to be overwhelmingly positive results,” says Eric Eich, editor of the journal Psychological Science. “Discovery work, across the board, tends to be valued higher than doing confirmation or replication work.”
So with reputations hanging in the balance, what can be done?
That’s the question Brian Nosek, a psychologist at the University of Virginia, has been pondering. Last year, he launched the crowdsourced Reproducibility Project, with the express mission of replicating psychological studies published in high-profile journals. “Academic science is open, transparent—people are supposed to be able to see the evidence and basis for different claims and then evaluate them,” Nosek tells me. “Science values truth above all else.”
The first step: to find enough psychologists willing to forgo the prestige of discovery work. Nosek reached out to his colleagues across the country—psychologists who, like him, had been decrying the lack of replication in the field at conferences and meetings over the past decade. He challenged each of them to try to reproduce a single study from a sample of those published in three eminent psychological journals. By spreading the work around, and mitigating the difficulties and costs involved with replication, Nosek argued, the field could finally get an idea of what social psychology’s reproducibility rate really is.
The project has been met with overwhelming praise, publicly at least. Nosek’s original recruitment e-mail went viral, reaching a larger audience than he ever imagined. Now more than 100 scientists at more than 40 institutions worldwide have joined the reproducibility mission, using their own laboratory resources and the project’s guidelines in attempts to replicate studies. About 20 replication attempts have already been completed.
Georg Jahn, a psychologist at the University of Greifswald, in Germany, recently replicated a 2008 study that looked at the importance of attention when learning the associations between adjacent and nonadjacent items—a skill that has implications for the way people learn language and grammar. Jahn says the replication was very straightforward, and the results confirmed the original study’s findings.
But reproducing studies, and determining whether original results hold up, is not always so clear-cut. Privately, scientists have voiced concerns about what counts as a true replication. What if the exact same materials or methods are not used? What if the study is run in a setting with slightly more, or fewer, controls? The devil, as they say, is in the details.
Michael Frank, a developmental psychologist at Stanford University, ran across these issues as he and his students worked on the replications of several survey-type studies. One was a paper titled “Why People Are Reluctant to Tempt Fate.” Lead author Jane Risen, at the University of Chicago, and her coauthors documented that individuals, reacting to a series of “what if?” scenarios, responded that it was bad luck to tempt fate, even when they didn’t believe in fate.
Risen’s goal was to find out how and why people can believe things they know are false. Frank’s preliminary results suggest that the magical thinking described in the original paper was not evident in the new study. Risen, though, believes Frank’s students’ work was not a real replication. The Stanford group ran the study on the general population instead of on a subset of students—and did so using the Internet, whereas the original study was conducted in person. “Because the study involved participants imagining themselves being called on in a college class, it was important that it be run with student participants who could relate to the story,” Risen said.
Nosek is quick to tell me that a study could fail to replicate an original result for many reasons: study methods may be just different enough, or a study’s sample not quite large enough to be statistically valid. He admits that he and his project partners are learning as they go, and before making any declarations about a researcher’s success or failure in reproducing a study, they are evaluating every project on a case-by-case basis.
Accuracy is key. A low reproducibility rate, obviously, could have weighty consequences. Donors may be less inclined to fund studies in the field, Nosek explains, or people may lose their trust in psychology altogether.