Pitfalls of Statistically Complex Research

Tuesday, May 17, 2011

In 2006, a group of researchers at Duke University announced a major breakthrough [in chemotherapy] in a research article. This was followed by several articles in the same vein; all were published in leading journals and had citation counts that any academic would envy. One paper, in the New England Journal of Medicine, was cited 290 times.

By 2009, three trials based on the research results were under way, with 109 cancer patients eventually enrolled. But the efforts never came to fruition - in fact, the trials were halted early, for the promise had been a hollow one. The research was riddled with major errors. This sad story has lessons for our universities, individual researchers and academic journals.

When the first Duke papers on therapeutic regimes came to the attention of clinicians at the University of Texas MD Anderson Cancer Center in Houston, they were keen to try the techniques. Two of their biostatisticians, Keith Baggerly and Kevin Coombes, were asked to investigate. They discovered major problems with the statistics and the validity of the data, and pointed these out to the Duke researchers. Although some small errors were rectified, the researchers were adamant that the core work was valid.

So why had the Duke University review given the all-clear? The reason was that the external reviewers tasked with validating the research were working with corrupted databases. In the diplomatic words of the university's post-mortem report to the Institute of Medicine inquiry, the databases had "incorrect labelling ... the samples also appeared to be non-random and yielded robust predictions of drug response, while predictions with correct clinical annotation did not give accurate predictions".
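How can mislabelled, non-random data produce predictions that look robust? One common route, sketched below on purely synthetic data, is an analysis pipeline that leaks the outcome labels into the model, for example by selecting "predictive" genes on the full data set before cross-validation. This is an illustration of the general failure mode, not a reconstruction of the Duke analysis; all data and numbers here are invented.

```python
# Synthetic illustration of how a flawed pipeline makes pure noise look
# predictive (hypothetical data, not the Duke data sets): ranking genes by
# correlation with the labels using ALL samples, before cross-validation,
# leaks the test labels into the model.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))      # 60 samples, 2000 pure-noise "genes"
y = rng.integers(0, 2, size=60)      # arbitrary sensitive/resistant labels

# Leaky step: pick the 20 genes most correlated with the labels on all samples.
corr = np.abs(np.corrcoef(X.T, y)[-1, :-1])
top = np.argsort(corr)[-20:]

leaky = cross_val_score(LogisticRegression(max_iter=1000), X[:, top], y, cv=5).mean()
honest = cross_val_score(LogisticRegression(max_iter=1000), X[:, :20], y, cv=5).mean()

print(f"leaky feature selection : accuracy ~ {leaky:.2f}  (looks 'robust')")
print(f"honest arbitrary genes  : accuracy ~ {honest:.2f}  (noise behaves like noise)")
# A predictor that stays accurate on data known to carry no signal is a
# warning sign about the pipeline, not a discovery.
```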

The medical journals and the Duke researchers and senior managers should reflect on the damage caused. The events have tarnished one of the most promising areas of medical research, harmed the reputation of medical researchers in general, blighted the careers of junior staff whose names are attached to the withdrawn papers, diverted other researchers into wasted work and damaged the standing of Duke University.

What lessons should be learned from the scandal? The first concerns the journals. They were not incompetent. Their embarrassing lapses stemmed from two tenets, shared by many journals, that are now out of date in the age of the internet. The first is that a research paper is the primary indicator of research. That used to be the case when science was comparatively simple, but now masses of data and complex programs are used to establish results. The distinguished geophysicist Jon Claerbout has expressed this succinctly: "An article about computational science in a scientific publication isn't the scholarship itself, it's merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions used to generate the figures."
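Claerbout's standard is concrete: anyone should be able to rerun the analysis and regenerate every figure from the raw data, with no manual steps in between. A minimal sketch of what that looks like in practice is below; the file names, column names and figure are hypothetical placeholders, not the Duke materials.

```python
# A minimal sketch of Claerbout's standard: a script, kept alongside the
# paper, that rebuilds a figure directly from the raw data.
# The file names and column names are hypothetical placeholders.
import pandas as pd

RAW_DATA = "data/raw/drug_response.csv"          # read-only input data
FIGURE_OUT = "figures/figure1_dose_response.png" # regenerated output

def build_figure1():
    df = pd.read_csv(RAW_DATA)                        # load the raw data
    summary = df.groupby("dose")["response"].mean()   # the analysis itself
    ax = summary.plot(marker="o")                     # the plotted result
    ax.set_xlabel("dose")
    ax.set_ylabel("mean response")
    ax.figure.savefig(FIGURE_OUT, dpi=300)

if __name__ == "__main__":
    build_figure1()   # rerunning this script regenerates Figure 1 exactly
```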

The second tenet is that letters and discussions pointing out defects in a published paper have low status. Journals must acknowledge that falsifiability lies at the heart of the scientific endeavour. The philosopher of science Karl Popper argued that a theory has authority only as long as no one has provided evidence that shows it to be deficient. It is not good enough for a journal to reject a critical paper simply because it judges it to be too negative.

A further lesson is for scientists. When research involves data and computer software to process those data, it is usually a good idea to have a statistician on the team. At the "expense" of adding an extra name to a publication, statisticians provide a degree of validation not normally available from even the most conscientious external referee. Indeed, the statistics used might merit an extra publication in an applied statistics journal. Statisticians are harsh numerical critics - that is their job - but their involvement gives the researcher huge confidence in the results. At present the scientific literature, as represented by the major research journals, shows little involvement by statisticians.

In its official account to the Institute of Medicine inquiry - in effect a chronicle, a detailed description of the errors that were committed and a future agenda - Duke University implicitly acknowledges the mistakes. It states that "quantitative expertise is needed for complex analyses", "sustained statistical collaboration is critical to assure proper management of these complex datasets for translation to clinical utility" and "the implementation and utilization of systems that provide the ability to track and record each step in these types of complex projects is critical".
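The last of those points, tracking and recording each step, need not mean elaborate infrastructure. The sketch below shows one minimal way to do it, assuming a simple append-only log; the step names and file paths in the example are hypothetical, not Duke's systems.

```python
# A sketch of minimal step tracking of the kind the report calls for:
# append one log entry per analysis step, recording inputs, outputs and
# parameters, with content hashes so later discrepancies are detectable.
import hashlib
import json
import time
from pathlib import Path

LOG = Path("analysis_provenance.jsonl")

def sha256(path):
    """Fingerprint a file so silent changes to it can be detected later."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def record_step(name, inputs, outputs, params):
    entry = {
        "step": name,
        "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "inputs": {p: sha256(p) for p in inputs},
        "outputs": {p: sha256(p) for p in outputs},
        "params": params,
    }
    with LOG.open("a") as fh:
        fh.write(json.dumps(entry) + "\n")

# Example call (hypothetical paths), made after a normalisation step has run:
# record_step("normalise_expression",
#             inputs=["data/raw/chip_batch_03.csv"],
#             outputs=["data/derived/chip_batch_03_normalised.csv"],
#             params={"method": "quantile"})
```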

Systems failure, The Times Higher Education, May 5, 2011, http://www.timeshighereducation.co.uk/story.asp?sectioncode=26&storycode=416000&c=2.

Most published research findings are false. This was the startling conclusion of a paper by John Ioannidis, an epidemiologist at Stanford University, who has since become the poster boy for an uncomfortable fact: human fallibility undermines the pursuit of truth in research.
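The core of Ioannidis's argument is a short piece of arithmetic: the chance that a statistically significant finding is true depends on the pre-study odds that the hypothesis was true and on the study's power, not just on the significance threshold. A small sketch of that calculation follows, ignoring bias and using illustrative numbers rather than figures from any particular study.

```python
# The arithmetic behind the claim, ignoring bias: if R is the pre-study odds
# that a tested relationship is true, a significant result is true with
#   PPV = (1 - beta) * R / ((1 - beta) * R + alpha).

def ppv(pre_study_odds, power, alpha=0.05):
    """Chance that a 'significant' finding reflects a true relationship."""
    return (power * pre_study_odds) / (power * pre_study_odds + alpha)

r = 1 / 9   # assume only 1 in 10 tested hypotheses is actually true
print(f"well powered  (power 0.80): PPV = {ppv(r, 0.80):.2f}")
print(f"under powered (power 0.20): PPV = {ppv(r, 0.20):.2f}")
# With low pre-study odds, even a well-powered significant result is wrong
# about a third of the time, and an under-powered one is more often wrong
# than right - before any bias or data dredging is taken into account.
```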

While science is still popularly seen as an unfaltering march towards truth, Professor Ioannidis demonstrated in his now-famous 2005 paper, "Why most published research findings are false", that scientists have a bad habit of getting in the way. They play around with their data until they spot something they want to see, engaging in what Professor Ioannidis called statistical "significance chasing". They find clever ways to confirm hypotheses, cherry-picking results and burying bad news in inaccessible, complex databases.
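A toy simulation makes significance chasing concrete. The numbers below are invented for illustration: test enough outcomes that have no real effect and, by chance alone, a handful will clear p < 0.05.

```python
# A toy version of significance chasing (all numbers invented): test 100
# outcomes with no real effect and keep only those that reach p < 0.05.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_outcomes, n_per_group = 100, 30

hits = 0
for _ in range(n_outcomes):
    treated = rng.normal(0.0, 1.0, n_per_group)   # both groups drawn from
    control = rng.normal(0.0, 1.0, n_per_group)   # the same distribution
    _, p = stats.ttest_ind(treated, control)
    if p < 0.05:
        hits += 1

print(f"{hits} of {n_outcomes} pure-noise comparisons were 'significant'")
# Around 5 such false positives are expected by chance; a write-up that
# reports only those and hides the other 95 looks striking yet shows nothing.
```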

Of course, researchers are not lone actors. Another group of all-too-fallible humans, journal editors, can muddy the waters further. Journals like to report eye-catching positive research findings, but they often pay less attention if a theory is later shot down. When editors are also reluctant to print retractions when things are simply wrong, the scientific literature can become messy and murky.

What seemed to be a significant scientific breakthrough was leaped upon by several top journals, while subsequent evidence of major problems with the data and statistical analysis struggled to gain anything like the same public prominence. There was reluctance by some journals to set the record straight and suggestions that the issue was one of statistical interpretation, with no right or wrong answer.

In the end, it was clear that the case was built on flawed data. But if it had not been for two dogged biostatisticians who spotted the problems and would not let the matter go, much more than just money and time could have been at stake - clinical trials based on the flawed findings were under way.

Leader: To get to the truth, open up, Phil Baty, The Times Higher Education, May 5, 2011, http://www.timeshighereducation.co.uk/story.asp?sectioncode=26&storycode=416017&c=2.