GRIMM

Description | Run it | Instructions | Human/cat/mouse data | Publications | Download | MGR

Genome Rearrangements In Man and Mouse

A tool for analyzing rearrangements in pairs of genomes, including unichromosomal and multichromosomal genomes, and signed and unsigned data

by Glenn Tesler

Two genomes may have many genes in common, but the genes may be arranged in a different sequence or be moved between chromosomes. Such differences in gene orders are the results of rearrangement events that are common in molecular evolution. For example, in unichromosomal genomes, the most common rearrangement events are reversals, in which a contiguous interval of genes is put into the reverse order. For multichromosomal genomes, the most common rearrangement events are reversals, translocations, fissions, and fusions, which are described later.

The pairwise genome rearrangement problem is to find an optimal scenario transforming one genome to another via these rearrangement events. We provide a C program and a web tool combining rearrangement algorithms for unichromosomal and multichromosomal genomes, with either signed or unsigned gene data. In each case, it computes the minimum possible number of rearrangement steps, and determines a possible scenario taking this number of steps.

GRIMM implements the Hannenhalli-Pevzner algorithms for computing unichromosomal and multichromosomal genomic distances, making extensive use of code that was adapted from GRAPPA for the unichromosomal problem. GRIMM also implements the Hannenhalli-Pevzner algorithm for computing the reversal distance between two unsigned unichromosomal genomes, and Tesler's algorithm for computing the distance between two unsigned multichromosomal genomes.

Representation of a genome

We consider a unichromosomal genome to be of a sequence of n genes. The genes are represented by numbers 1, 2, ..., n. The two orientations of gene i are represented by i and -i. A genome is represented as a signed permutation of the numbers 1, 2, ..., n. For example, a unichromosomal genome with n=5 genes is
5 -3 4 2 -1
A multichromosomal genome consists of n genes spread over m chromosomes. We represent it as a signed permutation of 1, 2, ..., n, with delimiters "$" or ";" inserted between the chromosomes. For example, a genome with 12 genes spread over 3 chromosomes is
7 -2 8 3 $    
5 9 -6 -1 12 $
11 4 10 $     
The order of the chromosomes and the direction of the chromosomes do not matter in the multichromosomal algorithms. Thus, we could represent this same genome by flipping the first chromosome (reverse the order of its entries and negate them) and then moving the last chromosome to the beginning:
11 4 10 $     
-3 -8 2 -7 $  
5 9 -6 -1 12 $
Further information regarding the syntax of genomes is available on the help page.

Unichromosomal genomes: sorting by reversal

A reversal in a signed permutation is an operation that takes an interval in a permutation, reverses the order of the numbers, and changes all their signs. For example,
5 1 3 2 -9 7 -4 6 8



5 1 -7 9 -2 -3 -4 6 8
The reversal distance between two genomes is the minimum number of reversals it takes to get from one genome to the other. For a given pair of genomes, the reversal distance is unique, but there are usually many possible reversal scenarios with this distance. However, it is possible that this mathematical notion of reversal distance can underestimate the actual number of steps that occurred biologically.

Multichromosomal genomes: rearrangement operations

We treat four elementary rearrangement events in multichromosomal genomes: reversals, translocations, fusions, and fissions.

Signed and unsigned genomes

Most comparative mapping techniques determine the physical locations and relative order of genes in each chromosome, but do not determine which of two orientations each gene has. Current sequencing methods do provide the orientations. It turns out that the genome rearrangement problem (uni- and multichromosomal) for unsigned permutations is NP-hard, but the same problems for signed data can be done in polynomial time. Fortunately, with many genomes currently being sequenced, it is likely that many comparative maps (corresponding to unsigned permutations) will soon be replaced by sequencing data (corresponding to signed permutations).

Existing data for which signs are not known may be entered into the programs without specifying the signs. The programs will find an optimal assignment of signs, if the data is not too complex, or will approximate it otherwise. For example, to turn the unsigned genome

1 2 3 4 5
into the unsigned genome
1 4 3 2 5
requires one unsigned reversal. The program determines an assignment of signs in the source and destination genomes that give a signed reversal scenario requiring this same number of steps. Here, we get
1 2 3 4 5



1 -4 -3 -2 5
which also takes one step. Note that there may be other sign assignments taking this minimum number of steps. It is possible that correctly signed data would have increased the number of steps:
1 2 3 4 5



1 -4 -3 -2 5



1 -4 3 -2 5
If the data collection method did not determine signs, it is impossible to know mathematically whether the one step or two step scenario is more biologically accurate; the mathematical problem the program solves is to find the signs giving the minimum possible distance.

Word problems and insertions/deletions

We do not currently consider "word problems" in which some genes are repeated,
1 2 -1 3 4
nor do we allow gaps in the numbering (as may arise from insertion/deletion),
1 3 -9 -7 5

Related web pages and software

Please note that this page was originally written in the early 2000s. The list of other software and publications does not include more recent developments in the field of genome rearrangements, but we have updated expired URLs when possible.

These sites offer programs to sort permutations by reversals. At their roots is the Hannenhalli-Pevzner algorithm for sorting signed permutations by reversals. Successive authors improved the algorithm.

In addition to these programs, Guillaume Bourque has developed an algorithm and implementation MGR for computing phylogenetic trees for multiple genomes, unichromosomal or multichromosomal. It uses GRIMM to do its distance computations.

References

Links to journals require subscriptions.

Authors

GRIMM began as a project by Glenn Tesler and Yang Yu for Pavel Pevzner's course CSE 291, in Spring, 2001, at UCSD. The original project was to implement the Hannenhalli-Pevzner algorithm for computing optimal reversal scenarios between multichromosomal signed genomes. This was written in C, and made extensive use of GRAPPA's invdist.c. Subsequent development, including the extensions of the algorithm, unsigned genomes, the web tool, and the related web pages, was done by Glenn Tesler.

This page was created by Glenn Tesler, University of California, San Diego.