MGR

Description | Run it | Instructions | Examples | Publications | Download | GRIMM

Multiple Genome Rearrangements

A tool for constructing phylogenies based on gene order for unichromosomal and multichromosomal genomes

by Guillaume Bourque

Recent progress in genome-scale sequencing and comparative mapping raises new challenges in studies of genome rearrangements. Although the pairwise genome rearrangement problem is well-studied (see GRIMM for details and terminology), algorithms for reconstructing rearrangement scenarios for multiple species are in great need.

The Multiple Genome Rearrangement Problem is to find a phylogenetic tree describing the most "plausible" rearrangement scenario for multiple species. Although the reversal distance for a pair of genomes can be computed in polynomial time, its use in studies of multiple genome rearrangements was somewhat limited since it was not clear how to combine pairwise rearrangement scenarios into a multiple rearrangement scenario. In particular, Caprara, 1999 demonstrated that even the simplest version of the Multiple Genome Rearrangement Problem, the Median Problem, is NP-hard.

MGR implements an algorithm which, given a set of genomes, seeks a tree such that the sum of the rearrangements is minimized over all the edges of the tree. It can be used for phylogeny inference and also for inferrence of the ancestral gene orders. The algorithm is described in details in Bourque and Pevzner, 2002. Although the C program is not available yet, a web tool is available here for small instances of the problem. MGR produces trees in Newick format with an ASCII representation both described below.

The algorithm makes extensive use of GRIMM, the pairwise distance engine developed by Glenn Tesler.

Genomes and rearrangements

We consider 3 different types of genomes: A unichromosomal genome is represented by a permutation where each number corresponds to a gene. A multichromosomal genome is also represented by a permutation but with delimiters inserted between the chromosomes. For unichromosomal genomes, the rearrangement events that we consider are reversals. For multichromosomal genomes, the rearrangements events that we consider are reversals, translocations, fissions, and fusions. See GRIMM for more details. For example, 4 unichromosomal linear genomes could be
Genome0:   1 2 3 4 5
Genome1:   1 -2 3 4 5
Genome2:   1 2 3 -5 -4
Genome3:   1 -3 -5 -4 2

Newick format and ASCII representation

The Newick Standard uses nested parenthesis for representing trees. It allows the labelling of the leaf and the internal nodes. Branch lengths, which correspond to the number of rearrangements, can also be incorporated, preceded by a colon. For instance, given the example above, MGR would produce the following Newick representation
(Genome3:2,Genome1:1,(Genome0:0,Genome2:1)A4:0)A5;

And the following ASCII representation

        +-----------------2-----------------Genome3
        |
        |  +-------1--------Genome1
        +-A5
           |  +Genome0
           +-A4
              +-------1--------Genome2
The internal nodes correspond to ancestral nodes and are labelled by the letter A followed by a number (e.g. A4). The ASCII representation is generated by a modified version of the RETREE program available in the PHYLIP package by Joe Felsenstein. The number of rearrangements that occurred on each edge is shown. When no number is shown on an edge it means that no rearrangement occurred on that edge. In the tree produced by the web tool, it is possible to view actual rearrangements events by clicking on an edge to link to GRIMM.

Related web pages and software

Please note that this page was originally written in the early 2000s. The list of other software and publications does not include more recent developments in the field of genome rearrangements, but we have updated expired URLs when possible.

Most of the previous work on the Multiple Genome Rearrangement Problem studied the related problem of finding the phylogenetic tree minimizing the breakpoint distance. So far, this line of work has been restricted to unichromosomal genomes. Two such programs have been made available:

References

Authors

MGR began as a project in December 2000 at the University of Southern California. It is the work of Guillaume Bourque under the supervision of Pavel Pevzner. We are especially grateful to Glenn Tesler for uncountable recommendations and for providing the pairwise distance engine without which none of this would have been possible. We are also thankful to Bernard Moret for providing help with GRAPPA.

This page was created by Guillaume Bourque (now at McGill) and Glenn Tesler (UCSD).