Some weeks back, I wrote an article for Prospect on the discovery of an astronomical signature of both cosmic inflation and gravitational waves. Either of these alone would be huge news. Cosmic inflation, a period of immensely rapid expansion of the universe, could explain several puzzles about the Big Bang, while gravitational waves, predicted by Einstein’s theory of general relativity, have been sought for decades. Discovering both at once was a blockbusting event.
Except that it seems the announcement might have been premature. It looks increasingly likely that the result, reported by an international collaboration using a telescope at the South Pole called BICEP2, might be an artefact caused by dust scattered through our own galaxy. Most informed observers are now distancing themselves from the claims, although it is important to stress that the research hasn’t actually been debunked.
It’s difficult to know where this kind of confusion leaves the humble science writer. How much do you take on trust? How deeply can and should you check out the claims, especially when the news is hot? After all, it’s increasingly common for papers published in well-respected journals to be withdrawn, whether because the research is faulty, because other groups show it to be wrong, or because it simply can’t be verified. Or, worse, because of outright fraud, such as the fabrication of data.
Then, more problematically, there are all the mistakes that never get withdrawn. What’s unsettling is that faulty research generally only comes to light when there is enough riding on the outcome for other scientists to try to replicate it. The majority of research findings are never even tested by other groups, so we don’t know whether they’re true or not.
Welcome to the current controversy known as the “reproducibility crisis.”
The alarm was sounded in 2005, when John Ioannidis, a professor of medicine at Stanford University, published a paper in the journal PLoS Medicine with the combative title “Why most published research findings are false.” Ioannidis’s arguments are entirely statistical: in no case does he show that a particular piece of published work is incorrect. All the same, his conclusions are persuasive and disturbing. Not only does it seem likely that most claims are wrong, but that likelihood is greater when there are strong financial interests involved or when a topic is particularly newsworthy and competitive.
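The nub of Ioannidis’s case is simple Bayesian arithmetic: if only a small fraction of the hypotheses being tested in a field are actually true, and individual studies have modest statistical power, then a “statistically significant” result is quite likely to be a false positive. The sketch below runs that arithmetic with illustrative numbers of my own choosing, not figures taken from the paper itself.

```python
# A rough sketch of the arithmetic behind "most published findings are false".
# The prior, power and significance threshold are illustrative assumptions,
# not numbers drawn from Ioannidis's paper.

def positive_predictive_value(prior, power, alpha):
    """Probability that a statistically significant finding reflects a real effect."""
    true_positives = prior * power            # genuine effects correctly detected
    false_positives = (1 - prior) * alpha     # null effects wrongly flagged as significant
    return true_positives / (true_positives + false_positives)

# Suppose only 1 in 10 hypotheses tested in a hot field is actually true,
# the typical study has 40% power, and the usual p < 0.05 threshold applies.
ppv = positive_predictive_value(prior=0.10, power=0.40, alpha=0.05)
print(f"Chance that a positive result is real: {ppv:.0%}")  # about 47%
```

Under those assumptions, slightly more than half of the positive results in the literature would be wrong even before any bias creeps in; add selective reporting, or several teams racing to publish on the same fashionable question, and the odds get worse still, which is exactly the pattern Ioannidis describes.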
It has been well documented, not least by the indefatigable Ben Goldacre, that pharmaceutical companies tend to publish results favourable to the products they want to sell, and to cite supportive studies rather than those that raise doubts.
Another reason for the unreliability of the published literature in the life sciences is the small size of the samples used in human studies. This is particularly problematic in psychology, where claims can be made on the basis of testing just 20 or so subjects. No drug would ever get to market with clinical trials of that size. Yet sometimes the traits being studied are subtle and context-dependent, so that even if there are follow-up studies, they might conflict. That was true, for example, of the alleged Mozart effect, the claim that listening to Mozart produces cognitive benefits in children. The image of psychology has not been helped either by recent high-profile cases of fraud.
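To see why 20-subject studies are so fragile, it helps to ask how often such a study would even detect a genuine but modest effect. The simulation below is a back-of-the-envelope sketch, with an assumed effect size and group size of my own choosing rather than figures from any particular study.

```python
# Back-of-the-envelope power simulation (all numbers are illustrative assumptions):
# how often does a two-group study with 20 subjects per group reach p < 0.05
# when there really is a modest effect (Cohen's d = 0.4)?

import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
n_per_group, effect_size, n_experiments = 20, 0.4, 10_000

significant = 0
for _ in range(n_experiments):
    control = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    treated = rng.normal(loc=effect_size, scale=1.0, size=n_per_group)
    _, p_value = stats.ttest_ind(treated, control)
    significant += p_value < 0.05

print(f"Estimated power: {significant / n_experiments:.0%}")  # roughly 20-25%
```

With power that low, even real effects will usually be missed, and the studies that do clear the significance bar will tend to overstate them, which helps explain why follow-up studies of subtle, context-dependent effects such as the Mozart effect so often disagree.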
What’s to be done? One recent inquiry into allegedly faulty stem-cell research by scientists at a Japanese research centre has highlighted the lengths to which journal editors already go to spot potential problems. These include running all manuscripts through plagiarism-detection software—a level of suspicion that in itself suggests a dysfunctional system. A few publications such as the British Medical Journal have taken the rather drastic step of employing staff members specifically to look into cases of questionable research or reporting.
Veteran health reporter Ivan Oransky, vice president of MedPage Today, runs a blog called Retraction Watch, which supplies news and comment on retracted papers across the sciences. Among various proposals to make biomedical research more transparent is a requirement, long championed by Goldacre, that all clinical trials be registered before they are carried out and audited periodically, so that findings that don’t suit a drug company can’t simply be buried unpublished. And to encourage attempts at replication, it has been suggested that journals that publish a research finding should commit to publishing all subsequent replications submitted to them (subject to normal peer review). That’s not going to work, however, for the top journals with limited page space.
But there are difficult questions for science reporters too. As Oransky put it at the UK Conference of Science Journalists last month, if writers are determined to be “first” with a story, then they need to ask themselves “how often do you want to be wrong?” But this is no real guide to action. Can anyone honestly expect science writers to police the scientific literature—after the work has been peer-reviewed by experts? Or should they wait three months before reporting an exciting finding, to give potential problems a chance to come to light? It’s not clear what, on the whole, reporters can be expected to do beyond collecting impartial comments on the work from reputable experts.
The problem is one that the entire scientific community must face, and there is no unique solution. But scientists and science advocates could start by being more modest about the so-called scientific process, which supposedly filters all claims through the rigorous, objective and foolproof sieve of reproducibility before accepting them. It doesn’t do this at all, in any systematic way, and never has. Science does deliver trustworthy knowledge, but it does so in a way that is far more ad hoc and error-prone than the textbook accounts suggest. That’s nothing to be ashamed of, but we shouldn’t be making the scientific process a shibboleth.