Monday, Jan 9, 2012

Rant: Replicability...

If one were to list those traits that separate 'scientific knowledge' from 'intuition' or 'revealed knowledge', one might choose traits such as a) demonstrability, b) testability, c) reproducibility, d) consistency with previous scientific knowledge, and e) openness to revision in light of new observations, among others¹. As some scientific fields become more complex, involving highly specialized, technical analyses of ever-larger amounts of data, the question of 'reproducibility' is becoming a serious concern.

One concern is practical: If I generate terabytes upon terabytes of data, I have to be able to provide said data to the community, should they want to reproduce my work. Not every lab has the expertise (or the resources) to host its own data on the web, so such data have been stored in large, public repositories such as those found at the National Center for Biotechnology Information (NCBI). However, as has been pointed out by several op-eds in the recent scientific literature, the price of data generation (mostly in the form of nucleic acid sequencing) is dropping at a rate faster than the cost of the hard drives required to store it. Therefore, it's quickly becoming cheaper to simply store samples (and resequence them as needed) rather than store the raw data themselves².

A second concern is more personal: Some of the analyses being done nowadays are very, very big and very, very complicated. Anyone who's ever been involved in a genome consortium paper (and I've been involved in a few now) knows that a < 10 page Nature paper may involve the work of 40+ people, each contributing significant details to an analysis. Collating all of these details together into a single story is challenging, and unfortunately, crucial details about how things were done sometimes get lost in the shuffle.

It's not only consortium papers though - anecdotally, I've heard several scientists complain, upon reading new papers, that they couldn't reproduce the details of the analysis. Either the descriptions are too vague, or the available data are not in a format that makes it easy to understand where they came from or how they were generated.

In the past month, I've come across a few papers whose details and methods I'd love to adapt to answer some of my own questions. However (and you can probably see where this is going), I can't figure out how to square the supplementary data that are provided with the details of the analysis: Why does the paper consistently refer to 8 classes of data, whereas the public depository entry contains 14? Unfortunately, mismatches like this are an unnecessary impediment to my getting work done.

If scientific data are going to be held up to the somewhat lofty goals of the discipline, we need to make sure that they're well documented and reproducible³.
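For what it's worth, here is a minimal sketch of the kind of sanity check I have in mind, assuming a deposit ships with a simple manifest that maps each file to the data class it belongs to. Everything in it is hypothetical and purely illustrative: the function name, the class labels, and the file names are made up, and real repositories each have their own metadata formats.

```python
from collections import Counter


def compare_classes(paper_classes, deposit_manifest):
    """Compare the data classes named in a paper with those in a deposit.

    paper_classes: iterable of class labels described in the methods.
    deposit_manifest: dict mapping deposited file name -> class label.
    Returns (undocumented, missing, files_per_class).
    """
    described = set(paper_classes)
    files_per_class = Counter(deposit_manifest.values())
    deposited = set(files_per_class)
    # Classes present in the deposit but never explained in the paper.
    undocumented = sorted(deposited - described)
    # Classes the paper describes but the deposit does not contain.
    missing = sorted(described - deposited)
    return undocumented, missing, files_per_class


# Hypothetical labels and file names, for illustration only.
paper = ["class_A", "class_B"]
manifest = {
    "sample1.txt": "class_A",
    "sample2.txt": "class_A",
    "sample3.txt": "class_C",
}

undocumented, missing, counts = compare_classes(paper, manifest)
print("In deposit but not described in paper:", undocumented)
print("Described in paper but not in deposit:", missing)
print("Files per deposited class:", dict(counts))
```

If both lists came back empty, a reader would at least know that the paper and the deposit are talking about the same set of things.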

 

¹ The difference between 'demonstrability' and 'testability' is the subject of much debate in some scientific circles. For instance, if a theory is testable in principle (e.g., given the appropriate conditions/equipment) but not in practice (e.g., the equipment required is far beyond the scope of current knowledge), can it truly be called scientific? There are many reasonable hypotheses about things like life on other planets that just aren't testable in the foreseeable future. That's not to say that this automatically means that such fields aren't 'useful' or interesting; however, the more appropriate question may be whether funding should be prioritized towards testable hypotheses.

² There are all kinds of issues of practicality and reproducibility when we discuss such things - I'm only pointing out the details in purely economic terms.

³ I have a nitpicky, specific discussion point about online data depositories that I think I'm going to save for a follow-up post.
