Fears about 'Discovery Science'...
Tuesday, August 9, 2011 at 7:25PM

Given all of the recent difficulties I've had with regard to my career[1], I'm beginning to worry that my view of science has been somewhat naïve. See, I spent a good 10 years in training having the idea that hypotheses are important drilled into my head. Furthermore, every time I've applied for funding, I've been told (both by my supervisors and by the granting agencies themselves) that clearly defined hypotheses in my project proposal are a requirement for success. And yet, having spoken to some fellow postdocs about their own work, I seem to be noticing a trend towards generating large, expensive datasets (particularly of the 'next-generation sequencing' variety) and then searching for 'interesting stories' within said sets. I've heard this referred to as 'discovery science' in the past.
Now here's a caveat: there first needs to be an observation upon which to formulate a hypothesis. In other words, you first need to see some pattern in the data before you can speculate on its potential cause(s). Generating a large, expensive dataset may well produce the observation that wants an explanation. So let me be more precise: my concern here isn't so much with the lack of a hypothesis itself (which may come later), but rather with a lack of focus, or even of a scientific question in the first place.
There are an infinite number of potential datasets that one could generate - I could, for example, estimate gene expression levels in an adult dog's liver as well as a baby chimpanzee's cheek and call that a dataset. My observation may then be that 'sheesh, there sure are a lot of expression differences between these two species/tissues/developmental time points!' I could then formulate a number of hypotheses as to why this would be the case; but now I encounter a difficulty: it's very likely that this 'dataset' is totally inadequate for testing any meaningful, realistic hypothesis that I may generate. In fact, one may say that comparing these two 'unpaired' tissues between two distantly related species is rather 'random' and even 'weird'.
So let me step away from such an egregious example of poor experimental design by using it to make a point: the question you seek to answer ultimately determines how an experiment should be designed. As I've said numerous times in previous posts, experimental design is one of the most difficult things to learn as a junior scientist (and learning it should ultimately be the result of a Ph.D.). By extension, it's perhaps arguable that the ultimate 'thing' to master as one develops as a scientist is how to come up with interesting, tractable questions.
My naïveté may ultimately stem from my conviction, up to this point, that everything within a given project stems from the question being asked and the hypothesis being tested[2]. Together they determine the design of the experiment, the interpretation of the results, and the writing of the manuscript detailing those results. And yet, I've found myself - as well as folks I've spoken to - stuck in situations where I'm trying to find a 'story' for a number of observations at the point where I'm writing a manuscript. While I suppose that there's nothing wrong with generating new hypotheses even this late in the game, you may have to go and do additional experiments in order to properly test said hypotheses (again, something I'm not particularly used to, as I generally don't begin writing a paper until I have a fairly complete story).
I'd be lying if I said that I didn't worry about such completely discovery-based projects, on both scientific and practical grounds. Scientifically, hypotheses are best tested by data generated for this express purpose - often these large, exploratory projects 'test' hypotheses using pre-generated data (typically the same data from which the observation precipitating the hypothesis was drawn) and a set of assumptions. This isn't always bad, but it could be avoided in some cases by more detailed experimental design in the first place. On the practical side, these projects are often very expensive. I've worked in labs both 'rich' and 'poor', and my personal experience is that one spends a lot more time carefully thinking about experimental design in the latter - you need to be able to get as much 'bang for your buck' as possible.
Regardless, this isn't an easily resolved issue: there are good arguments for generating 'big' datasets - they're often very useful to the community for generating and testing hypotheses down the road, for instance. On the flip side, when I hear folks saying things like 'Nature will have to publish this because it's such a huge dataset!', it somewhat undermines my 'naïve' view of what science is supposed to be.
[1] I left my current postdoc because of dissatisfaction with the work that I was doing. I'd like to discuss this carefully, in a blog post someday.
[2] I would like to note here that I am in no way opposed to the question and/or hypotheses changing (even radically) based on new information during the process - only that the end result will be determined by the final question that's been settled upon.
Carlo