
Photo credit: Espen Faugstad

Eighty percent of scientific data are lost within two decades, disappearing into old email addresses and obsolete storage devices, a Canadian study indicated.

~United Press International (UPI) article

The data used in the research papers that make up the scientific record is an endangered species. 

That is the conclusion of a University of British Columbia study as reported by UPI. The authors tried to find the original data from “a random set of 516 studies published between 1991 and 2011.” The researchers succeeded with every paper published within the previous two years, but for older publications, “the odds of obtaining the underlying data dropped by 17 per cent per year after that.” This means that the data used to draw conclusions that inform everything from how we treat illness to our understanding of neuroscience is disappearing like the average American’s digital music library.
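To get a feel for how quickly a 17 per cent yearly drop compounds, here is a rough sketch. It assumes the drop is constant and starts two years after publication, which is one reading of the quote above and not something taken from the study itself. The function name and its parameters are made up for illustration.

```python
def odds_multiplier(years_since_publication, grace_years=2.0, annual_drop=0.17):
    """Fraction of the starting odds that remains after a given number of years.

    Assumptions (ours, not the study's): the odds hold steady for
    `grace_years`, then shrink by `annual_drop` every year after that.
    """
    years_of_decay = max(0.0, years_since_publication - grace_years)
    return (1.0 - annual_drop) ** years_of_decay


# Print the remaining fraction of the odds at a few ages.
for age in (2, 5, 10, 20):
    fraction = odds_multiplier(age)
    print(f"{age:>2} years after publication: {fraction:.3f} of the starting odds")
```

Under these assumptions, the odds of getting the data for a 20-year-old paper come out to roughly 3 to 4 per cent of what they were at the two-year mark. Note that this is a multiplier on the odds, not a probability; turning it into a probability would require the study's baseline figures, which are not given here.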

As advocates of a more collaborative and transparent scientific process, we realize the problems this presents. The peer review process that vets papers before they are published in scientific journals and added to the scientific record is not nearly as foolproof as one would hope. Access to the original data allows other scientists to review published work, to reproduce experiments, and to check whether the data really proves a paper’s conclusions.

As bloggers and writers, we also realize that this problem is relevant to journalism and the work of historians and other academics. In other words, it is applicable to the qualitative as much as the quantitative, the historical record as well as the scientific record.

Data-driven journalists have spreadsheets floating around with the data used in their analyses that are just as likely to disappear as scientific data. (This author knows he does.) More generally, every journalist and historian has hours and hours of interviews in Word documents, audio files, and notebooks that will never see the light of day. 

A perplexed Egyptian activist once told this author that for every half hour he spent talking with journalists, they seemed to use a single, one-sentence quote. (And spell his name wrong.) This ratio of one quote used per half hour of interviewing is common. Articles and books can rarely fit more than a few excerpts of each interview.

One of the most frustrating aspects of writing an article is this inability to use so much great material from interesting and important people. Take the example of journalist Bob Woodward. In the seventies, his reporting helped break the Watergate story. For the last 15-odd years, he has written books on the most significant policy decisions made by the Bush and Obama Administrations, books that involved hundreds of hours of interviews with the presidents and every major player on their staffs. One can only imagine all the material Woodward left on the cutting room floor (interviews with President Bush and his staff on the decision to invade Iraq, for example) that other historians and journalists would kill for the opportunity to look at, but that instead sits forgotten on a spare hard drive in Woodward’s home.

Some sort of open-access journalism and history would allow other historians and journalists, as well as the public, to profit from this material. But just as access to scientific data is important for holding researchers accountable, for ensuring that their data supports the conclusions they draw, so too is access to the full interviews from which academics and journalists draw for their publications. Have they accurately interpreted the interview? Or have they used quotes that misrepresent or fail to get at the full complexity of an issue? We can only know if we have access to the full interviews.

In the case of both the historical and scientific record, it would not be easy to get all this original data and reporting into some sort of public archive. Scientists, journalists, and historians are rewarded for publishing; no careers have been made by sharing data and interviews with valued sources. Someone would need to be responsible for keeping everything up to date for the day when today’s storage options are the equivalent of floppy disks. And if real journalists are anything like this author, many of their interviews exist only in illegible, physical notes and in shorthand that no outsider could understand.

But the logistical difficulties would be worth tackling. Doing so would preserve and add to our common store of knowledge, both historical and scientific.

This post was written by Alex Mayyasi.