[Beg-evodevo] Comparative genomics: Lining up is hard to do

20 Jul 2008

      http://www.nature.com/nrg/journal/v9/n8/full/nrg2420.html

Fulltext:

Nature Reviews Genetics 9, 573 (August 2008) | doi:10.1038/nrg2420

    Comparative genomics: Lining up is hard to do

Tanita Casci

Comparisons between proteins or between DNA stretches tell a story about 
the evolutionary past. Unfortunately, which story they tell often 
depends on which method is used to line up the sequences. A new method 
claims to improve the accuracy of current alignments by making use not 
only of the similarity between sequences but also of their phylogenetic 
history.

Commonly used methods for aligning multiple sequences rely on sequential 
pairwise alignments between sequences --- the most closely related 
sequences are aligned first, followed by progressively more distant 
ones, all the way down the evolutionary tree. The problem with these 
so-called 'progressive algorithms' is that it is impossible to 
distinguish insertions from deletions when using only two sequences, 
causing biases that distort subsequent alignments. Traditional 
implementations of this algorithm set a higher penalty for insertions 
than for deletions and deem the former less likely to have occurred. 
These biases are particularly problematic when there are multiple 
neighbouring insertions or deletions. For example, independent 
insertions at neighbouring positions end up being aligned one under the 
other even though, by definition, they are not evolutionarily related. 
Misleading alignments caused by these false homologies are propagated 
through later alignment steps and lead to incorrect inferences about 
sequence evolution.
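
As a rough illustration of why two sequences alone cannot separate 
insertions from deletions, here is a minimal sketch of a global pairwise 
alignment (Needleman-Wunsch with a linear gap penalty). The function 
name and the scoring values are assumptions chosen for the example, not 
taken from the article or from any particular alignment program.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment with a linear gap penalty.

    The scoring values are illustrative assumptions only.
    Returns (aligned_a, aligned_b, score).
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner, preferring the diagonal.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


# The same gapped column appears whether "G" was inserted in one lineage
# or deleted in the other; the pairwise result carries no direction.
print(needleman_wunsch("ACGT", "ACT"))
print(needleman_wunsch("ACT", "ACGT"))
```

Swapping the two inputs only swaps the rows of the result. The gap marks 
a length difference, and nothing in the score says which lineage changed.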

Two sequences do not contain information about whether a length 
difference between them is caused by an insertion or a deletion --- 
however, in a multiple (as opposed to pairwise) alignment there are more 
than two sequences. The new phylogeny-aware algorithm uses outgroup 
information from related sequences to infer which of the two possible 
changes has happened. It does so by 'flagging' the gaps (length changes) 
made in earlier alignments so that they can be accounted for in future 
alignments without incurring a penalty. The re-use of a flagged gap also 
indicates that a particular length change was created by an insertion; 
the gap can then be labelled as a 'permanent' insertion and correctly 
kept distinct from any neighbouring ones.
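
The outgroup reasoning can be sketched in a few lines. This is a 
simplified parsimony-style illustration, not PRANK's actual 
implementation: it assumes the three sequences are already aligned to 
equal length, and the function name `classify_indels` and its return 
format are hypothetical.

```python
def classify_indels(sister_a, sister_b, outgroup):
    """Label each length difference between two sister sequences.

    Assumes all three inputs are pre-aligned rows of equal length,
    with '-' marking a gap. Returns a list of (column, event, lineage).
    """
    if not (len(sister_a) == len(sister_b) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")

    events = []
    for col, (a, b, o) in enumerate(zip(sister_a, sister_b, outgroup)):
        a_gap, b_gap = a == "-", b == "-"
        if a_gap == b_gap:
            continue  # no length difference between the sisters here
        gapped, ungapped = ("a", "b") if a_gap else ("b", "a")
        if o == "-":
            # Outgroup lacks the residue too: simplest explanation is an
            # insertion in the lineage that has it.
            events.append((col, "insertion", ungapped))
        else:
            # Outgroup has the residue: simplest explanation is a
            # deletion in the lineage that lacks it.
            events.append((col, "deletion", gapped))
    return events


# Same pairwise gap, opposite conclusions depending on the outgroup.
print(classify_indels("ACGT", "AC-T", "ACGT"))  # outgroup has the G
print(classify_indels("ACGT", "AC-T", "AC-T"))  # outgroup lacks the G
```

The two calls share an identical pair of sister sequences; only the 
outgroup row differs, and that alone decides which event is inferred.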

The new algorithm, called PRANK 
<http://www.ebi.ac.uk/goldman-srv/prank/>, was tested on simulated 
sequence data. Traditional methods were shown to introduce biases, and 
therefore misalignments, even when sequences are closely related. PRANK 
performs better than traditional methods in this situation and shows no 
bias as the sequences become more divergent and harder to align. It is 
striking that traditional methods remain error-prone even as sequences 
become increasingly similar, indicating that denser species sampling and 
more DNA sequencing cannot correct the algorithmic flaw in the alignment.

Correct alignment remains a challenge, but less so than in the past. Why 
have programs traditionally performed so badly? The authors suggest that 
alignment methods were historically developed to compare protein 
sequences, in which any biases would be likely to cluster harmlessly in 
unconstrained regions. Given the increasingly widespread analysis of 
genomic DNA for understanding evolution, there is a pressing need to get 
these methods up to scratch.

        Links

          WEB SITE

    * PRANK <http://www.ebi.ac.uk/goldman-srv/prank>

      References and links

          ORIGINAL RESEARCH PAPER

   1. Löytynoja, A. & Goldman, N. Phylogeny-aware gap placement prevents
      errors in sequence alignment and evolutionary analysis. Science
      320, 1632--1635 (2008)

          * Article <http://dx.doi.org/10.1126/science.1158395>
          * PubMed
            <http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?holding=npg&cmd=Retrieve&db=PubMed&list_uids=18566285&dopt=Abstract>
          * ChemPort
            <http://chemport.cas.org/cgi-bin/sdcgi?APP=ftslink&action=reflink&origin=npg&version=1.0&coi=1:CAS:528:DC%2BD1cXnt1Oju74%3D&pissn=1471-0056&pyear=2008&md5=a79f51b2820cec08c03d9813f15c2349>

Klaas Vandepoele

Yves Van de Peer
