Imagine a scientist from a small university who has spent a decade creating a unique data set by collecting thousands of measurements of a prairie ecosystem that is adapting to climate change. Despite struggles with funding and the time-consuming demands of her teaching duties, she publishes her first piece in what she intends to be a series of papers on this ecosystem. As a supplement to the paper, she shares her data on the Web. A year later she finds that a large, well-funded research group has downloaded her data and, without contacting her, published multiple papers that present the same analysis she was planning to do.
It’s not clear how often this scenario happens in science, but many scientists worry about it. While most government agencies and journals make scientists agree to share their data as a condition of funding and publication, researchers often have strong incentives not to share. The ethics of sharing in science are murky, and journals and funding agencies have largely left the specifics of what and when to share up to individual scientists. As Jonas Waldenström at Linnaeus University explained, “it is one thing where your data is used as a brick in a new construction, and another to have someone taking over your house and having to give away the key.” When they share data, scientists want to be sure they’re handing over a brick, and not the key to their ongoing creative projects. In many cases, if you want someone’s data, you won’t find it on the Web; you have to ask for it directly.
THE EDITORS OF THE Public Library of Science (PLOS) family of scientific journals recently decided to give their authors much more specific instructions for sharing data. They announced that “authors must make all data publicly available, without restriction, immediately upon publication of the article.” They defined data as “any and all of the digital materials that are collected and analyzed in the pursuit of scientific advances,” and now require authors to provide a “data availability statement” that serves the purpose of “describing where and how others can access each dataset that underlies the findings.”
The response to the PLOS editors’ announcement shows that, while most scientists agree in principle that some data sharing is important, they disagree about what data needs to be shared, how it should be distributed, and what sharing is supposed to accomplish. In sometimes-angry responses, researchers argued that PLOS is imposing a huge burden on researchers in an attempt to fix something that isn’t broken, that the new policy applies a misguided, one-size-fits-all solution to different scientific communities with different needs and standards, and that making it easy to access other people’s data will result in nothing but low-value research that is the scientific equivalent of fan fiction: “Science is the motorboat … data are the wake behind it … shit we’ve already churned through.”
The negative responses prompted PLOS to clarify its intentions. The PLOS editors argued that their new policy merely enforces more strongly the expectations for sharing that have always been in place. “The policy does not aim to say anything new about what data types, forms and amounts should be shared,” the editors wrote. “The policy does aim to make transparent where the data can be found, and says that it shouldn’t be just on the authors’ own hard drive.” They acknowledged that different scientific fields will have different constraints on what data can be shared, such as legal requirements that protect the privacy of patient data. Some fields don’t have well-established standards for sharing data, and the editors say they are willing to work with authors to figure out how best to comply with PLOS policy.
DESPITE THE ROUGH START, the new PLOS policy is a good thing because it is forcing scientists to reconsider why sharing is important. Before the Internet, if you wanted to look at someone’s data, you’d have to go to the trouble of getting it in person or through the mail. These days the whole process can be much easier, and a scientist is much more likely to explore a quick, preliminary idea or a puzzling inconsistency when the data can be downloaded in a few minutes. It may be that sharing data doesn’t do much to prevent outright fraud, but my own experience is that sharing is a strong incentive not to be sloppy, and despite science’s reputation for rigor, sloppiness is a substantial problem in some fields. You’re much more likely to check your work and follow best data-handling practices when you know someone is going to run your code and parse your data.
The rapid advances in information technology are a blessing and a curse for scientists today. With better computers and networks, scientists can collect and analyze more data, and share it more easily. This is supposed to be good for science and good for society—sharing is supposed to get us “a better ‘bang for the buck’ out of scientific research” that is primarily funded with public money. But this raises hard questions about the value and ethics of sharing that some scientific communities have not yet resolved. Nobody is forced to publish in PLOS journals, and so we should take the new PLOS data sharing policy for what it is: an experiment in science communication and an opportunity for the scientific community to learn how to share.