Wednesday, February 16, 2011

On powers and pitfalls of "Next-generation sequencing" in ecology and evolutionary biology

These days there is a lot of buzz about so-called "Next-generation sequencing" techniques, i.e. the novel molecular methods that allow ecologists and evolutionary biologists to rapidly generate massive amounts of sequence data, up to entire genomes, from their favourite organisms. A lot is promised by enthusiastic proponents, and it is understandable, and easy, to get carried away and think that all interesting problems will be solved in the near future. As usual, when it comes to new techniques and fashionable scientific "bandwagons", some sound scepticism and a critical attitude are healthy. I found an excellent critical perspective of this kind on the interesting and thoughtful blog "The Molecular Ecologist", which is worth reading. Here are some excerpts, but you should read it in full:

“Developing genomics tools for ecological organisms is desirable because we can study a wider range of phenotypic traits over evolutionary timescales and in more populations than was possible previously. Through this we are likely to gain a more realistic and comparative understanding of how selection works on natural levels of genetic variation, where this genetic variation comes from and how it is maintained.” Stapley et al. 2010 (TREE)

Okay, let’s just get this out of the way. We’re way past calling next-generation sequencing technologies “Next Gen.” I mean, really, isn’t Next Gen yesterday’s news? With the advent of third-generation sequencing technologies that can sequence a single molecule of DNA, our terms are out of date.

The blogger criticizes the often-made, rather arrogant argument that next-generation sequencing will solve many of the "mysteries" about the genetics of adaptation that could not be studied before, when in reality workers in the classical model systems already knew a great deal:

In this article, Stapley et al. (2010) suggest that ecologists tend to have a good idea of which traits might be involved in adaptation in their study organism. They also suggest that geneticists know a lot about the genomic architecture of a few classical model organisms but very little about the ecological relevance. This argument is a little bit of a strawman, because it sets up a false opposition between ecology and genetics. In their eyes, the importance of this technology is that it will make it easier to integrate both ecological and genomic data and to develop for ecologically interesting organisms “a range of genomic resources such as whole genome sequences, transcriptome sequences, and genome-wide marker panels can be generated within the scope of a three-year grant.”

When I first read this statement, I thought that the authors had found a practical way to explain the rate at which genomic data can be generated. But then I realized how uncomfortable the phrase, “can be generated within the scope of a three-year grant” made me feel. And while I can’t put my finger on the exact reasons, I think it’s because it underscores the stark reality that research has to operate within the confines of short-term constraints. Clearly the authors mean that this will shorten the timeframe for researchers to start answering the interesting questions on any organism.

Moreover, using next-generation sequencing on "any" randomly chosen organism is unlikely to generate interesting data per se if we know little about its ecology and natural history. The data will only be interesting if the organism is already a well-characterized "model organism", in the sense that it has been studied extensively (preferably experimentally and in the field) for many years, preferably decades. Interesting "questions" do not pop out of the blue by themselves; they only become relevant when there is good natural-history knowledge about the organism in question. In other words, if you do not have many years of research experience with your creature and do not know "what to do", next-generation sequencing will hardly be worth the money and effort, because you will be flooded with useless genomic information that you will not know what to do with:

However, the link between generating genomic data for interesting ecological organisms and how high-throughput sequencing technology (HTST) has already reinvigorated current studies of the genetic basis of adaptation is missing something. The tacit implication is that because we can use HTST to create extensive genomic toolkits on non-model organisms, we should be able to gain a stronger understanding of how selection operates on ecologically relevant variation, and thus to answer some of the questions that have “puzzled ecological geneticists for decades.”

I don’t disagree that we’ll move science along, but all of the non-model organisms described in the review have had extensive conceptual legwork contributed by many, many scientists over several years. It is because these biological systems are so highly developed conceptually that the power of HTST can be fully realized. 

For example, in the three-spine stickleback system, it has taken several generations of grad students and postdocs to work out that replicate isolated freshwater stickleback populations were independently derived from their oceanic ancestors, that there is no gene flow between these isolated populations of freshwater habitats, that there is significant variation in behavior, life history, and morphology, that diversification happened very rapidly, and that selection has acted in parallel in these different isolated freshwater habitats, evoking similar phenotypic trajectories at local, regional and global scales (the references are too numerous to cite so I’ve included a select few: Orti et al. 1994 Evolution, McKinnon and Rundle 2002 TREE, Hohenlohe et al. 2010 PLoS Genetics).

Lastly, and very importantly, next-generation sequencing techniques can be used to study parallel evolution and speciation, but they will not pick up all genes involved in adaptation if there are historical contingencies, or if different genotypes contribute to the same phenotype. Here we will have to rely on other approaches, such as informed guesses and searches for candidate genes with a priori known function:

In the case of the stickleback system, Baird et al. 2008 and, subsequently, Hohenlohe et al. 2010 used Illumina-sequenced RAD tags to gather genome-scale sequence data on natural populations. The data confirmed previous work showing that freshwater populations were independently derived from the oceanic populations. Furthermore, using high-throughput sequencing technology (RAD tags), researchers identified 9 genomic regions (3% of the genome) that were differentiated between the two ecotypes (freshwater and oceanic) and are thus putative candidate regions associated with adaptation to freshwater. Some of these genomic regions co-localized with previously identified loci of major effect (e.g. the Ectodysplasin A (Eda) locus). But using this HT sequence data, researchers also found several additional regions showing parallel differentiation across independent populations. The power of this much data is that there is now a list of novel candidate regions that may be important in adaptation to freshwater.

Even more interesting is that the data generated from HT sequencing did not find elevated divergence in a region previously identified as underlying a major phenotypic change between marine and freshwater fish. The trait in question is the pelvic structure, a bony girdle bearing spines on the belly, which is present in marine fish but reduced in freshwater fish. The region responsible is a cis-acting, tissue-specific enhancer of the Pituitary homeobox transcription factor 1 gene (Pitx1), found at the telomeric end of linkage group seven (Chan et al. 2008 Science). So why did the high-throughput sequencing data, which provided the researchers with 45,000 SNPs, not detect this locus? Hohenlohe et al. (2010) suggest that multiple alleles were selected in different freshwater populations, leading to a soft-sweep pattern. If this soft-sweep explanation is correct, then relying only on high-throughput sequencing data to detect regions of adaptive significance could bias us against detecting this form of selection.

High-throughput sequencing technologies do allow each lab to generate, cheaply and relatively quickly, a specific type of genomic data that can inform our understanding of how ecology shapes the genomic architecture of an organism. But that does not mean that within the scope of a three-year grant we will generate anything remotely resembling a detailed picture of the genetics of adaptation. That rich picture will emerge only after several decades of hair-pulling by grad students, postdocs and their supervisors, all of whom will toil away testing, challenging and advancing our understanding of adaptation.
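To see why a differentiation-based genome scan, like the stickleback RAD-tag study described above, could miss a soft sweep, consider a toy calculation in Python. It is a sketch under my own simplifying assumptions, not a reanalysis of Hohenlohe et al. (2010): the allele frequencies are invented, there is one marine and one pooled freshwater sample, each SNP is completely linked to a selected site whose adaptive allele has gone to fixation in freshwater, differentiation is measured with Nei's G_ST (an F_ST analogue), and the helper names `gst` and `scan` as well as the 0.1 cut-off are hypothetical. In a hard sweep the adaptive allele arose once, on a chromosome carrying allele A at the linked SNP, so A is dragged to fixation. In a soft sweep the adaptive allele was already present on both A and a chromosomes, so the linked SNP keeps roughly its marine frequency.

```python
"""Toy differentiation scan: why a soft sweep can escape detection.

Everything here is hypothetical: invented allele frequencies, a single
marine and a single (pooled) freshwater sample, and SNPs assumed to be
completely linked to their nearby selected site (no recombination).
"""


def gst(p1, p2):
    """Nei's G_ST (an F_ST analogue) for one biallelic SNP in two demes.

    p1, p2: frequency of allele A in deme 1 and deme 2.
    """
    # Mean expected heterozygosity within the two demes.
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity if the two demes are pooled.
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    # A SNP that is monomorphic overall carries no differentiation signal.
    return 0.0 if ht == 0 else (ht - hs) / ht


def scan(snps, threshold):
    """Return names of SNPs whose marine-freshwater G_ST exceeds threshold.

    snps: dict mapping SNP name -> (marine freq of A, freshwater freq of A).
    """
    return [name for name, (p_m, p_f) in snps.items() if gst(p_m, p_f) > threshold]


if __name__ == "__main__":
    snps = {
        "neutral": (0.50, 0.45),     # unlinked background SNP, slight drift
        "hard_sweep": (0.50, 1.00),  # adaptive allele arose once, on an A chromosome
        # Adaptive allele already present on A and a chromosomes in equal
        # shares, so the linked SNP keeps its marine frequency after the sweep.
        "soft_sweep": (0.50, 0.50),
    }
    for name, (p_m, p_f) in snps.items():
        print(f"{name}: G_ST = {gst(p_m, p_f):.3f}")
    print("flagged:", scan(snps, threshold=0.1))  # prints flagged: ['hard_sweep']
```

With these numbers the hard-sweep SNP gets a G_ST of about 0.33 and is flagged, while the soft-sweep SNP gets a G_ST of 0 and looks just like neutral background, even though the adaptive allele is fixed in freshwater in both cases. Real data add recombination, drift and many populations, but the toy shows how a scan built on differentiation at linked markers could be biased against this form of selection.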

So what should we conclude then? "Next-generation sequencing is dead, long live phenomics", perhaps? Only time will tell. There is clearly a reaction and a movement away from the naive reductionist world-view created by genomics, and an increasing awareness that the phenotype should be put back into the centre of evolutionary biology. My educated guess is that this trend will continue and grow in the near future, as many researchers are gathering large, messy data sets from next-generation sequencing and will struggle to get any meaningful results from them. What we can already conclude, I think, is that (as usual) new techniques promise more than they can deliver, and that we will never find the "silver bullet" method that explains everything. As usual, it is the ecological and evolutionary questions that must be at the centre of attention of any investigation. In the meantime, a dose of healthy scepticism is usually very helpful. And do not forget to read the classics and learn your natural history. That will help a lot in the future, too.


  1. Interesting post, Erik. I agree that genomics isn't a silver bullet, and has been over-hyped. But, I also believe (and I think you agree) that the new genomic tools will be very useful. We need nuclear, multi-locus phylogeography, for example. Another example: we need a better understanding of the factors that contribute to genome size, as genome size is correlated with cell size, growth rate, etc. Ultimately, genome size can have serious ecological and phenotypic consequences! There are many, many other uses that are yet to materialize, but I think will. Anyway, no disagreement here with Erik - we need phenomics, we need natural history, and we need well-studied ecological systems to maximize the power of genomic approaches.

  2. Sure, Shawn, and as you know we are already using "Next-generation sequencing" in our own research, so this post was not meant as a call to abandon it. I agree that genome size evolution is quite an important topic (especially in salamanders, many of which have HUGE genomes!), and perhaps this is an area in which next-gen sequencing might be especially useful. However, in some other areas, such as ecological speciation, parallel evolution and studies of "young systems" that differ at only a few loci, it is already quite clear that next-gen sequencing will not answer all the questions about the genetics of adaptation. I think this was also the main point of the post on "The Molecular Ecologist", which I found really useful and interesting.

    In multilocus phylogeography I can also see the benefits, as it will be possible to generate a lot of data from many different loci and strengthen inferences compared with using only a few genes ("Gene trees are not species trees" is as true as it ever was).