Description | Run it | Instructions | Examples | Publications | Download
Multiple Genome Rearrangements
A tool for constructing phylogenies based on gene order for unichromosomal and multichromosomal genomes
by Guillaume Bourque
Recent progress in genome-scale sequencing and comparative mapping raises new challenges in studies of genome rearrangements. Although the pairwise genome rearrangement problem is well-studied (see GRIMM for details and terminology), algorithms for reconstructing rearrangement scenarios for multiple species are in great need.
The Multiple Genome Rearrangement Problem is to find a phylogenetic tree describing the most "plausible" rearrangement scenario for multiple species. Although the reversal distance for a pair of genomes can be computed in polynomial time, its use in studies of multiple genome rearrangements was somewhat limited since it was not clear how to combine pairwise rearrangement scenarios into a multiple rearrangement scenario. In particular, Caprara, 1999 demonstrated that even the simplest version of the Multiple Genome Rearrangement Problem, the Median Problem, is NP-hard.
MGR implements an algorithm which, given a set of genomes, seeks a tree such that the sum of the rearrangements is minimized over all the edges of the tree. It can be used for phylogeny inference and also for inferrence of the ancestral gene orders. The algorithm is described in details in Bourque and Pevzner, 2002. Although the C program is not available yet, a web tool is available here for small instances of the problem. MGR produces trees in Newick format with an ASCII representation both described below.
The algorithm makes extensive use of GRIMM, the pairwise distance engine developed by Glenn Tesler.
Genomes and rearrangements
We consider 3 different types of genomes:
A unichromosomal genome is represented by a permutation where each number corresponds to a gene. A multichromosomal genome is also represented by a permutation but with delimiters inserted between the chromosomes. For unichromosomal genomes, the rearrangement events that we consider are reversals. For multichromosomal genomes, the rearrangements events that we consider are reversals, translocations, fissions, and fusions. See GRIMM for more details. For example, 4 unichromosomal linear genomes could be
- Unichromosomal circular
- Unichromosomal linear (directed)
- Multichromosomal (or linear undirected)
||1 2 3 4 5
||1 -2 3 4 5
||1 2 3 -5 -4
||1 -3 -5 -4 2
Newick format and ASCII representation
The Newick Standard uses nested parenthesis for representing trees. It allows the labelling of the leaf and the internal nodes. Branch lengths, which correspond to the number of rearrangements, can also be incorporated, preceded by a colon. For instance, given the example above, MGR would produce the following Newick representation
And the following ASCII representation
The internal nodes correspond to ancestral nodes and are labelled by the letter A followed by a number (e.g. A4). The ASCII representation is generated by a modified version of the RETREE program available in the PHYLIP package by Joe Felsenstein. The number of rearrangements that occurred on each edge is shown. When no number is shown on an edge it means that no rearrangement occurred on that edge. In the tree produced by the web tool, it is possible to view actual rearrangements events by clicking on an edge to link to GRIMM.
Related web pages and software
Most of the previous work on the Multiple Genome Rearrangement Problem studied the related problem of finding the phylogenetic tree minimizing the breakpoint distance. So far, this line of work has been restricted to unichromosomal genomes. Two such programs have been made available:
- BPAnalysis is written by Mathieu Blanchette and David Sankoff. The first program to reconstruct phylogenies based on gene order.
- GRAPPA is written by Moret, B.M.E.; Wyman, S.; Bader, D.A.; Warnow, T.; and Yan, M. It greatly improves on the results and on the efficiency of BPAnalysis.
- Hannenhalli, S. and Pevzner, P.A. (1999) Transforming cabbage into turnip: polynomial algorithm for sorting signed permutations by reversals. J. ACM 46(1), 1-27.
- Caprara, A. (1999) Formulations and complexity of multiple sorting by reversals. RECOMB-99, 84-93, Lyon, France. ACM Press.
- Bourque, G. and Pevzner, P.A. (2002) Genome-Scale Evolution: Reconstructing Gene Orders in the Ancestral Species. Genome Res., 12(1), 26-36.
- Hannenhalli, S., Chappey, C., Koonin, E. V. and Pevzner, P. A. (1995) Genome sequence comparison and scenarios for gene rearrangements: a test case. Genomics, 30:299-311.
- Sankoff, D., Sundaram, G. and Kececioglu, J. (1996) Steiner points in the space of genome rearrangements. International Journal of Foundations of Computer Science 7:1-9.
- Blanchette, M., Kunisawa, T. and Sankoff, D. (1999) Gene order breakpoint evidence in animal mitochondrial phylogeny. Journal of Molecular Evolution, 49:193-203.
- Cosner, M.E., Jansen, R.K., Moret, B.M.E., Raubeson, L.A., Wang, L.S., Warnow, T., and Wyman, S. (2000) An empirical comparison of phylogenetic methods on chloroplast gene order data in Campanulaceae. In Comparative Genomics (DCAF-2000), 104-115, Montreal.
- Hannenhalli, S. and Pevzner, P. (1996) To cut ... or not to cut (applications of comparative physical maps in molecular evolution). In Proceedings of the Seventh Annual ACM-SIAM Symposium on Discrete Algorithms (Atlanta, GA, 1996), ACM, New York, pp. 304-313.
MGR began as a project in December 2000 at the University of Southern California. It is the work of Guillaume Bourque under the supervision of Pavel Pevzner. We are especially grateful to Glenn Tesler for uncountable recommendations and for providing the pairwise distance engine without which none of this would have been possible. We are also thankful to Bernard Moret for providing help with GRAPPA.
This page was created by Guillaume Bourque, Genome Institute of Singapore