Monday, June 23, 2008

I can't believe I'm quoting Reagan

I recently heard the traumatic tale of a local new-ish (read: untenured) lab. They'd written some code for acquisition and analysis of their data, and one of the grad students, poking through it one day because his analysis confused him, realized that all of the analysis this lab had done for the last THREE YEARS had been wrong.

They had to scurry to retract a submission at a high-profile journal (fortunately not yet sent out for review). And now they have to reanalyze Everything.

This is awfully scary, but there are a few silver linings:
1) It was just the analysis code, not the acquisition code. You can reanalyze data, but you really don't want to have to re-collect it.
2) It turns out that many of their conclusions are not seriously affected by this reanalysis. Phew.
3) The grad students in the lab are all learning a Very Useful Lesson about homewritten code: Trust, but verify.

We had a similar, though not as earthshaking, incident in GradLab. I had to reanalyze half a year's worth of data, though happily the inaccurate parameter was not very important for my research. I was furious about it--had a giant argument with my GradAdvisor about the situation, was grumpy for a full week, etc., etc.--but I sure did learn Lesson #3 above.

It's not just code, either. Another postdoc's solutions? A friend's summary of a paper? A colleague's bench protocol? Heck, even a significant result in a published article? Trust, but verify.

8 comments:

Anonymous said...

You heard about the affaire du Chang, right?

When he investigated, Chang was horrified to discover that a homemade data-analysis program had flipped two columns of data, inverting the electron-density map from which his team had derived the final protein structure. Unfortunately, his group had used the program to analyze data for other proteins. As a result, on page 1875, Chang and his colleagues retract three Science papers and report that two papers in other journals also contain erroneous structures.

http://www.sciencemag.org/cgi/content/full/314/5807/1856

Whoops!!! HAHAHAHAHAHAH!!!

Candid Engineer said...

Good point, Dr. J.

One of my (arrogant) colleagues provided me with a protocol when I first started my new position, and he was a bit steamed that I wanted to go through it and verify that the calculations had been done properly. As it turned out (and as I suspected), his calculations were done correctly. I was able, however, to shave about an hour of work off the protocol and learned more about what I was doing in the process.

Drugmonkey said...

yeah, that's a real laugh all right, PP. except for this part:

Clarke also served on grant panels on which he says Chang's work was influential. "Those applications providing preliminary results that were not in agreement with the retracted papers were given a rough time," he says.

how pissed would you be if you were one of those applicants...?

Dr. Jekyll and Mrs. Hyde said...

Jeez, that's pretty much the definition of a sick feeling in your stomach. I hadn't seen that one (despite my avowed retraction attraction...).

Yeah, sucks to be the researchers whose data doesn't match. Do you think that they end up looking good down the line? That is, that anyone will remember that they were right, and give them some extra bonus points (unclear exactly how those might translate to priority scores)?

I'm still blinking at the thought of retracting three Science papers.

Anonymous said...

Maybe papers up for review should include both the data, and the source code for programs used to analyze that data, so that they too can be reviewed.

Dr. Jekyll and Mrs. Hyde said...

HSF--interesting idea. Sadly, it would probably be incredibly hard for most of us to understand each other's code--and reviewing takes a lot of time already as is. However, maybe we could extend your suggestion to the infamous "supplemental data"--if the code is published online as supplemental data for the paper, then anyone with growing suspicions could take the time to check it out. Hey, I kind of like that idea!

Anonymous said...

I suspect that merely having the requirement to publish a paper's source code and data would cause the quality of such code and data to improve!

In computer science research, papers whose complete system source code is available are surprisingly rare but, when the code is available, extremely instructive. Source code can help when attempting to reproduce results or to understand details of methodology that the paper does not fully explain.

Here is an example from economics demonstrating that it's possible:
http://www.marginalrevolution.com/marginalrevolution/2007/09/justin-wolfers-.html

Arlenna said...

Number one rule of doing good science: TRUST NOBODY no matter how smart you think they are or how much you like them or how much you don't want to offend them. Or even if they are you.

Hours, days, months, years, careers wasted by not checking your shit.