GRIMM and MGR


Web Instructions

by Glenn Tesler

This page begins with the instructions for pairwise genome comparisons. The additional instructions for trees or distance matrices for multiple genomes are given in the Multiple Genome Options section.

Source and Destination Genomes (or Genome 1, Genome 2, ...)

We consider a unichromosomal genome to be a sequence of n genes. The genes are represented by the numbers 1, 2, ..., n. The two orientations of gene i are represented by i and -i. A genome is represented as a signed permutation of the numbers 1, 2, ..., n. For example, one unichromosomal genome with n=5 genes is
5 -3 4 2 -1
There are three kinds of unichromosomal genomes, described later.
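The following minimal sketch makes the representation concrete. It accepts a unichromosomal genome only if it is a signed permutation, meaning each of 1, ..., n appears exactly once as either i or -i. The function name parse_unichromosomal is a hypothetical helper for illustration. It is not part of GRIMM.

```python
def parse_unichromosomal(text: str) -> list[int]:
    """Parse a genome such as "5 -3 4 2 -1" into a list of signed gene numbers."""
    genes = [int(token) for token in text.split()]
    n = len(genes)
    # Each of 1..n must occur exactly once, in either orientation (i or -i).
    if sorted(abs(g) for g in genes) != list(range(1, n + 1)):
        raise ValueError("not a signed permutation of 1..n")
    return genes


genome = parse_unichromosomal("5 -3 4 2 -1")
print(genome, "with n =", len(genome))  # [5, -3, 4, 2, -1] with n = 5
```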

A multichromosomal genome consists of n genes spread over m chromosomes. We represent it as a signed permutation of 1, 2, ..., n, with delimiters "$" inserted between the chromosomes. For example, a genome with 12 genes spread over 3 chromosomes is

7 -2 8 3 $    
5 9 -6 -1 12 $
11 4 10 $     
For neatness, we have written each chromosome on a separate line and have terminated the last chromosome with the delimiter, but neither is necessary. Any whitespace, including line breaks, simply separates the genes and the chromosome delimiters; only the "$" actually separates chromosomes. For example, the same genome could also have been written on a single line:
7 -2 8 3 $ 5 9 -6 -1 12 $ 11 4 10

Also, the order of the chromosomes and the direction of the chromosomes do not matter in our algorithms. Thus, we could represent this same genome by flipping the first chromosome (reversing the order of its entries and negating them) and then moving the last chromosome to the beginning:
11 4 10 $     
-3 -8 2 -7 $  
5 9 -6 -1 12 $
Using this or the original representation of the genome makes no difference in computing the rearrangement distance, or in the possible rearrangement scenarios that can theoretically occur; however, it may affect some of the arbitrary choices GRIMM makes, such as cap numbers and which rearrangement scenario is chosen for display. There can be similar subtle effects when MGR has to make arbitrary choices.
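The sketch below illustrates the rules above. Whitespace only separates tokens, "$" ends a chromosome, and a chromosome may be flipped or moved without changing the genome. It also accepts ";" as a delimiter, which is mentioned under Caps below. The helper names are hypothetical. The choice of the lexicographically smaller orientation as "canonical" is just one convenient convention and is not necessarily how GRIMM makes its own arbitrary choices.

```python
def parse_multichromosomal(text: str) -> list[list[int]]:
    """Split a genome into chromosomes at "$" (or ";"); whitespace only separates tokens."""
    chromosomes, current = [], []
    for token in text.split():
        if token in ("$", ";"):
            chromosomes.append(current)
            current = []
        else:
            current.append(int(token))
    if current:  # the delimiter after the last chromosome is optional
        chromosomes.append(current)
    return chromosomes


def flip(chrom: list[int]) -> list[int]:
    """Reverse the order of the genes and negate each one."""
    return [-g for g in reversed(chrom)]


def same_genome(a: list[list[int]], b: list[list[int]]) -> bool:
    """True if a and b differ only in chromosome order and orientation."""
    def canonical(genome):
        # Put each chromosome in its smaller orientation, then ignore chromosome order.
        return sorted(min(c, flip(c)) for c in genome)
    return canonical(a) == canonical(b)


original = parse_multichromosomal("7 -2 8 3 $\n5 9 -6 -1 12 $\n11 4 10 $")
reordered = parse_multichromosomal("11 4 10 $ -3 -8 2 -7 $ 5 9 -6 -1 12 $")
print(same_genome(original, reordered))  # True
```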

Naming genomes and making comments: A genome may be named by preceding it with a line ">name", as shown below; this name will be used in the report.
Comments begin with "#" and last until the end of the line; they are ignored.

# Comments are indicated with a "#"
# and last till the end of the line.
>Sample name
11 4 10 $     
-3 -8 2 -7 $    # another comment
5 9 -6 -1 12 $

Default genome: If you enter only one genome, GRIMM assumes you want a pairwise comparison of that genome with the identity permutation:

1 2 3 ... n
Although this makes sense for unichromosomal genomes, it does not make much sense for multichromosomal genomes. However, as there really isn't any meaningful default to use in the multichromosomal case, this default is as good as any.

 

Tip: If your genomes are long or you will be using them extensively, we suggest that you create them in a single file on your own computer (in an editor or otherwise) and cut and paste them into the genome windows. The file format, illustrated below, is an extension of the one used by GRAPPA:

# useful comment about first genome
# another useful comment about it
>name of first genome
1 -4 2 $  # chromosome 1
-3 5 6    # chromosome 2
>name of second genome
5 -3 $
6 $
2 -4 1 $
For multiple genomes, continue this for as many genomes as required.
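Here is a minimal sketch of reading such a file, assuming only the rules stated on this page. A "#" starts a comment that runs to the end of the line. A line beginning with ">" names the genome that follows. "$" (or ";") ends a chromosome, and the last chromosome need not be terminated. The function names are hypothetical, and GRIMM's own parser may differ in edge cases.

```python
def tokens_to_chromosomes(tokens: list[str]) -> list[list[int]]:
    """Group gene tokens into chromosomes at each "$" or ";" delimiter."""
    chromosomes, current = [], []
    for token in tokens:
        if token in ("$", ";"):
            chromosomes.append(current)
            current = []
        else:
            current.append(int(token))
    if current:  # the last chromosome need not be terminated
        chromosomes.append(current)
    return chromosomes


def parse_genome_file(text: str):
    """Return a list of (name, chromosomes); name is None for an unnamed genome."""
    genomes = []
    name, tokens = None, []

    def flush():
        # Close the genome read so far (skipped if nothing has been read yet).
        if name is not None or tokens:
            genomes.append((name, tokens_to_chromosomes(tokens)))

    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # "#" starts a comment
        if line.startswith(">"):
            flush()
            name, tokens = line[1:].strip(), []
        else:
            tokens.extend(line.split())
    flush()
    return genomes


text = ">A\n1 -4 2 $ -3 5 6\n>B  # second genome\n5 -3 $ 6 $ 2 -4 1 $\n"
for name, chromosomes in parse_genome_file(text):
    print(name, chromosomes)
```

Because genomes are separated by the ">" lines rather than by window boundaries, the same routine works whether the genomes were pasted into one window or several.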

Instead of doing numerous cut and paste operations to manually separate the genomes into their own individual genome windows, you may cut and paste the entire file into one genome window. The number of genome windows does not have to match the number of genomes. For example, in the multiple genome form with a default view of 3 genome windows, you could still paste a dozen genomes into a single window.

Chromosome types

The mathematical formulations of the various distance measurements treat unichromosomal genomes slightly differently. This is true in the published literature, even if it has not been explicitly noted.

Signed and Unsigned genomes

If the signs of the genes are not known, enter them without signs and check the "unsigned" option. The program will try to determine an assignment of signs that minimizes the distance. If the genomes are too complex, it will give an upper bound on the minimum genomic distance instead.
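To illustrate what "an assignment of signs that minimizes the distance" means, the brute-force sketch below tries all 2^n sign assignments for a tiny unichromosomal genome. It counts reversals only, using breadth-first search over signed permutations. This is a swapped-in exhaustive method for illustration and is not GRIMM's algorithm. Its running time grows exponentially with n, so it is only practical for a handful of genes.

```python
from collections import deque
from itertools import product


def reversal_distance(start, target):
    """Fewest signed reversals turning start into target (breadth-first search, tiny n only)."""
    start, target = tuple(start), tuple(target)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        perm, d = queue.popleft()
        if perm == target:
            return d
        n = len(perm)
        for i in range(n):
            for j in range(i, n):
                # Reverse the segment perm[i..j] and flip the sign of each gene in it.
                nxt = perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, d + 1))
    raise ValueError("target is not reachable")


def unsigned_distance(genes):
    """Minimum, over all 2**n sign assignments, of the signed reversal distance to 1..n."""
    identity = tuple(range(1, len(genes) + 1))
    return min(
        reversal_distance([s * g for s, g in zip(signs, genes)], identity)
        for signs in product((1, -1), repeat=len(genes))
    )


print(unsigned_distance([3, 1, 2]))  # 2
```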

This option is available for pairwise distance scenarios and also for distance matrices among multiple genomes.

MGR only produces trees for signed genomes. "Show all possible initial steps of optimal scenarios" only allows signed genomes. If your browser is capable and you have not disabled JavaScript, the incompatible combinations will be disabled for you automatically.

Run, undo, clear, and sample data

 


Formatting options for pairwise scenarios

Report styles

The distance between the two genomes is the minimum number of reversals, translocations, fissions, and fusions required to transform one genome into the other. Usually there is a multitude of scenarios using this number of steps. GRIMM will show you one of the scenarios. The choice of formatting options depends on the size of the genome and your interest in the details of the mathematical algorithm used to produce the rearrangements.
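The sketch below shows concretely why there is usually a multitude of optimal scenarios. For a tiny unichromosomal genome with reversals as the only operation, it finds the distance to the identity by breadth-first search, counts how many shortest scenarios exist, and reconstructs one of them. This is an exhaustive search used purely for illustration. It is not the polynomial-time method GRIMM uses, and it does not model translocations, fissions, or fusions. The function names are hypothetical.

```python
from collections import deque


def reversals(perm):
    """Yield every permutation reachable from perm by one signed reversal."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            yield perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]


def optimal_scenarios(genome):
    """Return (distance, number of optimal scenarios, one optimal scenario) to 1..n."""
    start = tuple(genome)
    target = tuple(range(1, len(start) + 1))
    dist, ways, parent = {start: 0}, {start: 1}, {start: None}
    queue = deque([start])
    while queue:
        perm = queue.popleft()
        if perm == target:
            break  # every predecessor of target has already been expanded
        for nxt in reversals(perm):
            if nxt not in dist:
                dist[nxt] = dist[perm] + 1
                ways[nxt] = ways[perm]
                parent[nxt] = perm
                queue.append(nxt)
            elif dist[nxt] == dist[perm] + 1:
                ways[nxt] += ways[perm]  # another shortest route into nxt
    # Follow parent links back from the identity to recover one scenario.
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return dist[target], ways[target], path[::-1]


distance, count, scenario = optimal_scenarios([2, 1])
print(distance, count)  # 3 6
for step in scenario:
    print(*step)
```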

Highlighting style

The genes involved in a rearrangement event can be highlighted in a variety of ways, depending on the report style chosen: before they are rearranged; after they are rearranged; both before and after (in the two column formats); by a yellow line drawn between the lines (in the one line formats); or no highlighting. If your browser permits it, the options that do not make sense for the chosen report style will be disabled.

Caps (chromosome end markers)

Chromosome delimiters "$" or ";" are not displayed in the reports; the chromosome boundaries are rendered graphically instead as colored lines or table borders. However, most report formats do, by default, display caps. These are artificial markers created by the multichromosomal genomic distance algorithm to delimit the start and end of each chromosome. This is a necessary part of the mathematical algorithms that compute the distance and the rearrangement scenarios, but it is not necessary for you to see them if you do not need them.

If you enter the 12-gene genome

7 -2 8 3 $    
5 9 -6 -1 12 $
11 4 10 $     
it will initially add caps 13 and 14 to the first chromosome, 15 and 16 to the second, and 17 and 18 to the third:
13 7 -2 8 3 14    
15 5 9 -6 -1 12 16
17 11 4 10 18     
In the course of computing the distance and of computing rearrangement scenarios, the caps will be rearranged as well. Throughout a scenario with this genome, the numbers 1,...,12 will represent genes, and the numbers 13,...,18 will represent caps, but the numbers 13 and 14 will not necessarily continue to delimit the first chromosome, or even the same chromosome.

You may display the caps as numbers 13,...,18; highlight them as C13,...,C18 (default); or omit them altogether.
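As a minimal sketch of the initial capping in the example above, the code below adds caps n+1 and n+2 to the first chromosome, n+3 and n+4 to the second, and so on. It then renders a chromosome in the three display styles. It reproduces only the initial cap numbering shown on this page; how GRIMM moves caps during a scenario is not modeled. The function names and the mode strings "number", "highlight", and "omit" are hypothetical.

```python
def add_caps(chromosomes):
    """Surround each chromosome with two caps, numbered n+1, n+2, ... in chromosome order."""
    n = sum(len(c) for c in chromosomes)
    capped, next_cap = [], n + 1
    for chrom in chromosomes:
        capped.append([next_cap] + chrom + [next_cap + 1])
        next_cap += 2
    return capped


def render(chrom, n, caps="highlight"):
    """Format one capped chromosome; entries with |value| > n are caps."""
    if caps not in ("number", "highlight", "omit"):
        raise ValueError("unknown caps style")
    out = []
    for g in chrom:
        if abs(g) <= n or caps == "number":
            out.append(str(g))
        elif caps == "highlight":
            out.append(f"C{g}")
        # caps == "omit": the cap is simply left out
    return " ".join(out)


capped = add_caps([[7, -2, 8, 3], [5, 9, -6, -1, 12], [11, 4, 10]])
for chrom in capped:
    print(render(chrom, 12))  # first line: C13 7 -2 8 3 C14
```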

Color coding

The genes are assigned a color based on their chromosome in the source or destination genome. There are only a limited number of distinguishable web-safe colors that also contrast well with the normal and highlighting backgrounds, so if there are many chromosomes, the colors are cycled through and reused.
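The cycling can be sketched in a few lines. Each gene takes the color of its chromosome's index modulo the palette size, so chromosome k+1 reuses the first color when there are k colors. The palette below is a hypothetical set of web-safe colors, not the one GRIMM actually uses.

```python
# Hypothetical web-safe palette; GRIMM's actual colors are not given on this page.
PALETTE = ["#cc0000", "#006600", "#0000cc", "#996600", "#660099", "#006666"]


def gene_colors(chromosomes, palette=PALETTE):
    """Color every gene by its chromosome's index, wrapping around the palette."""
    return {
        abs(gene): palette[index % len(palette)]
        for index, chrom in enumerate(chromosomes)
        for gene in chrom
    }


colors = gene_colors([[7, -2, 8, 3], [5, 9, -6, -1, 12], [11, 4, 10]])
print(colors[2], colors[6], colors[10])
```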

 


Pairwise or Multiple Genome Form

In a pairwise genome comparison, the rearrangement distance between two genomes is given, and an example of a specific sequence of steps achieving that distance is shown.

In a multiple genome comparison, a matrix of the pairwise distances is displayed, and optionally, a phylogenetic tree is computed by MGR.
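The sketch below shows the shape of such a distance matrix: a symmetric table with zeros on the diagonal, filled in pair by pair. To keep it short, it substitutes the simple signed breakpoint count for unichromosomal linear genomes. That is a different and easier measure than GRIMM's rearrangement distance, which counts reversals, translocations, fissions, and fusions. MGR's tree construction is not modeled. All names are hypothetical.

```python
from itertools import combinations


def adjacencies(genome):
    """Adjacencies of a unichromosomal linear genome framed by 0 and n+1.

    Reading (x, y) on one strand is the same as (-y, -x) on the other, so each
    adjacency is stored in a canonical (smaller) form.
    """
    framed = [0, *genome, len(genome) + 1]
    return {min((x, y), (-y, -x)) for x, y in zip(framed, framed[1:])}


def breakpoint_distance(a, b):
    """Number of adjacencies of a that are broken in b (a stand-in distance)."""
    return len(adjacencies(a) - adjacencies(b))


def distance_matrix(genomes, dist):
    """Symmetric matrix of pairwise distances, with zeros on the diagonal."""
    k = len(genomes)
    matrix = [[0] * k for _ in range(k)]
    for i, j in combinations(range(k), 2):
        matrix[i][j] = matrix[j][i] = dist(genomes[i], genomes[j])
    return matrix


genomes = [[1, 2, 3, 4], [1, -3, -2, 4], [4, 3, 2, 1]]
for row in distance_matrix(genomes, breakpoint_distance):
    print(row)
```

Any other pairwise distance function with the same two-argument signature could be passed as dist in place of breakpoint_distance.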

A button at the top of the form lets you switch to the other form.

# genomes

On the multiple genome form, there is a box to adjust the number of genome windows. Enter a new number of windows and press Enter (the exact control may depend on your browser). Technically, this is not the number of genomes: you may leave windows blank, and you may enter several genomes in one window.

Multiple Genome Options

Action

Tree size

This page was created by Glenn Tesler, University of California, San Diego.