These instructions begin with pairwise genome comparisons.
The additional instructions for trees and distance matrices for
multiple genomes appear later on this page.
Source and Destination Genomes Or Genome 1, Genome 2, ...
We consider a unichromosomal genome to be a sequence of n genes.
The genes are represented by numbers 1, 2, ..., n.
The two orientations of gene i are represented by i and -i. A genome
is represented as a signed permutation of the
numbers 1, 2, ..., n. For example, one unichromosomal
genome with n=5 genes is
5 -3 4 2 -1
There are three kinds of unichromosomal genomes, described later.
A multichromosomal genome consists of n genes spread over m chromosomes. We represent it as a signed permutation of
1, 2, ..., n, with delimiters "$" inserted between
the chromosomes. For example, a genome with
12 genes spread over 3 chromosomes is
7 -2 8 3 $
5 9 -6 -1 12 $
11 4 10 $
For neatness, we have written each chromosome on a separate line, and
have terminated the last chromosome with the delimiter, but neither of
these is necessary. Any whitespace, including line breaks, simply
separates the genes and the chromosome delimiters; only the "$"
actually separates chromosomes.
This could also have been written in any of the following
alternative ways:
7 -2 8 3 $ 5 9 -6 -1 12 $ 11 4 10 $
7 -2 8 3 $ 5 9 -6 -1 12 $ 11 4 10
7 -2
8 3$ 5 9 -6
-1 12 $ 11 4 10
(This format is valid, but very sloppy; it
is provided for illustrative purposes only.)
Also, the order of the chromosomes and the direction of the
chromosomes do not matter in our algorithms. Thus, we could
represent this same genome by flipping the first chromosome
(reverse the order of its entries and negate them) and then
moving the last chromosome to the beginning:
11 4 10 $
-3 -8 2 -7 $
5 9 -6 -1 12 $
Using this or the original representation of
the genome makes no difference in computing
the rearrangement distance, or in the possible rearrangement scenarios
that can theoretically occur; however, it may affect
some of the arbitrary choices GRIMM makes, such as cap numbers and
which rearrangement scenario
is chosen for display. There can be similar subtle effects when MGR
has to make arbitrary choices.
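The equivalence of the two representations above can be checked with a short sketch. The helper names flip and canonical_genome are our own illustrations and are not part of GRIMM or MGR; the sketch only normalizes the input and does not compute any distance.

```python
def flip(chromosome):
    """Reverse the gene order and negate every gene."""
    return tuple(-gene for gene in reversed(chromosome))


def canonical_genome(chromosomes):
    """Hypothetical normal form: choose one fixed direction for each
    chromosome (the lexicographically smaller of the two), then sort the
    chromosomes so that their order no longer matters."""
    return sorted(min(tuple(c), flip(c)) for c in chromosomes)


original = [[7, -2, 8, 3], [5, 9, -6, -1, 12], [11, 4, 10]]
rearranged = [[11, 4, 10], [-3, -8, 2, -7], [5, 9, -6, -1, 12]]

# Flipping the first chromosome gives the second line of the rearranged form.
print(flip(original[0]))
# Both representations reduce to the same normal form.
print(canonical_genome(original) == canonical_genome(rearranged))
```

Two inputs with the same normal form describe the same multichromosomal genome, although, as noted above, GRIMM may still make different arbitrary choices for them.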
Naming genomes and making comments: A genome may be named by preceding it with a line ">name", as shown below.
This name will be used in the report.
Comments are given as "# comment".
They are ignored.
# Comments are indicated with a "#"
# and last till the end of the line.
>Sample name
11 4 10 $
-3 -8 2 -7 $ # another comment
5 9 -6 -1 12 $
Default genome: If you enter only one
genome, GRIMM assumes you want to do a pairwise comparison of that
genome with the identity permutation
1 2 3 ... n
Although this makes sense for unichromosomal genomes,
it does not make much sense for multichromosomal genomes.
However, as
there really isn't any meaningful default to use in the
multichromosomal case, this default is as good as any.
Tip: If your genomes are long or you will be using them extensively, we
suggest that you create them in a single file in an editor (or
otherwise) on your own computer, and cut and paste them into the
genome windows.
The genome format is an extension of the file format used by GRAPPA:
# useful comment about first genome
# another useful comment about it
>name of first genome
1 -4 2 $ # chromosome 1
-3 5 6 # chromosome 2
>name of second genome
5 -3 $
6 $
2 -4 1 $
For multiple genomes, continue this for as many genomes as required.
Instead of doing numerous cut and paste operations to
manually separate the genomes into their own individual genome windows,
you may cut and paste the entire file into one genome window.
The number of genome windows does not have to match the number of
genomes.
For example, in the multiple genome form with a default
view of 3 genome windows, you could still paste a dozen genomes
into a single window.
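If you prepare genomes in a file, it can help to check them before pasting. The following is a minimal sketch of a reader for the format described above ("#" comments, ">name" lines, "$" chromosome delimiters, free whitespace). The function names are our own, it is not GRIMM's parser, and it assumes that empty chromosomes are simply skipped.

```python
def parse_chromosomes(body):
    """Split one genome body on "$" and read each part as signed integers."""
    chromosomes = []
    for part in body.split("$"):
        genes = [int(token) for token in part.split()]
        if genes:  # assumption: an empty chromosome is ignored
            chromosomes.append(genes)
    return chromosomes


def parse_genomes(text):
    """Return a list of (name, chromosomes); name is None if no ">name" line."""
    genomes = []
    name, body = None, []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # a comment lasts to end of line
        if line.startswith(">"):
            if body:
                genomes.append((name, parse_chromosomes(" ".join(body))))
            name, body = line[1:].strip(), []
        elif line:
            body.append(line)
    if body:
        genomes.append((name, parse_chromosomes(" ".join(body))))
    return genomes


sample = """# useful comment about first genome
>name of first genome
1 -4 2 $ # chromosome 1
-3 5 6 # chromosome 2
>name of second genome
5 -3 $
6 $
2 -4 1 $
"""

for genome_name, chromosomes in parse_genomes(sample):
    print(genome_name, chromosomes)
```

Because whitespace only separates tokens, the "sloppy" layout shown earlier parses to the same three chromosomes as the neat one.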
Chromosome types
The mathematical formulations of the various distance measurements
treat unichromosomal genomes
slightly differently. This is true in the published literature,
even if it has not been explicitly noted.
Circular: A circular chromosome has no physical start or end, or preferred direction,
so the choice of which gene to read first is arbitrary.
These 6 signed permutations all represent the same circular chromosome
(distance=0 between any two of them in circular mode):
1 2 3, 2 3 1, 3 1 2, -3 -2 -1,
-1 -3 -2, -2 -1 -3
Linear (directed): All 6 signed permutations above represent different chromosomes.
The chromosomes
3 1 2 and -2 -1 -3
are considered to be one reversal apart (distance=1).
This is the classical signed permutation reversal distance, which is the
distance metric used in several existing programs that only compute distances for unichromosomal genomes.
Linear (undirected): Chromosomes are not regarded as having a direction; flipping
a chromosome gives an equivalent genome. The two chromosomes 3 1 2 and -2 -1 -3 above are
regarded as the same, with distance=0, but they are still not
the same as the other four permutations in the circular list.
In multichromosomal genomes, all chromosomes are of this type, and an
error message will be issued if you check off one of the other two
types.
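The three chromosome types can be summarized by listing which signed permutations each type treats as the same chromosome. This is a minimal sketch with our own function names and mode strings; it tests only for distance 0 and is not how GRIMM computes distances.

```python
def flip(chromosome):
    """Reverse the gene order and negate every gene."""
    return tuple(-gene for gene in reversed(chromosome))


def rotations(chromosome):
    """All circular shifts of a chromosome."""
    chromosome = tuple(chromosome)
    return {chromosome[i:] + chromosome[:i] for i in range(len(chromosome))}


def equivalent_forms(chromosome, mode):
    """Every signed permutation that represents the same chromosome."""
    chromosome = tuple(chromosome)
    if mode == "circular":
        return rotations(chromosome) | rotations(flip(chromosome))
    if mode == "linear_directed":
        return {chromosome}
    if mode == "linear_undirected":
        return {chromosome, flip(chromosome)}
    raise ValueError("unknown mode: " + mode)


def same_chromosome(a, b, mode):
    return tuple(b) in equivalent_forms(a, mode)


print(sorted(equivalent_forms((1, 2, 3), "circular")))
print(same_chromosome((3, 1, 2), (-2, -1, -3), "linear_directed"))
print(same_chromosome((3, 1, 2), (-2, -1, -3), "linear_undirected"))
```

In circular mode the six permutations listed above form a single class; in linear (undirected) mode only a chromosome and its flip are identified.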
Signed and Unsigned genomes
If the signs of the genes are not known, enter them without
signs and check the "unsigned" option. The program will
try to determine an assignment of signs that minimizes
the distance. If the genomes are too complex, it will
give an upper bound on the minimum genomic distance
instead.
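To illustrate what "an assignment of signs that minimizes the distance" means, here is a brute-force sketch for very small unichromosomal genomes in linear (directed) mode, compared against the identity permutation. It is not the method GRIMM uses: it explores every signed permutation by breadth-first search and tries all 2^n sign assignments, so it is only practical for a handful of genes. All function names are our own.

```python
from collections import deque
from itertools import product


def reversal_distances(n):
    """Breadth-first search from the identity over all signed permutations
    of 1..n, using reversals as moves. Returns {permutation: distance}."""
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        perm = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                # Reverse the segment i..j and negate its genes.
                segment = tuple(-g for g in reversed(perm[i:j + 1]))
                nxt = perm[:i] + segment + perm[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[perm] + 1
                    queue.append(nxt)
    return dist


def best_signing(unsigned):
    """Try every sign assignment and keep one with the smallest distance."""
    n = len(unsigned)
    dist = reversal_distances(n)
    candidates = (
        tuple(sign * gene for sign, gene in zip(signs, unsigned))
        for signs in product((1, -1), repeat=n)
    )
    best = min(candidates, key=lambda perm: dist[perm])
    return best, dist[best]


print(best_signing((3, 2, 1)))
```

For the unsigned genome 3 2 1, the best assignment is -3 -2 -1, which is a single reversal away from 1 2 3.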
This option is available for pairwise distance scenarios and also
for distance matrices among multiple genomes.
MGR only produces trees for signed genomes.
"Show all possible initial steps of optimal scenarios" only allows
signed genomes.
If your browser is capable and you have not disabled JavaScript,
the incompatible combinations will be disabled for you automatically.
Run, undo, clear, and sample data
The run button runs the program.
The undo button only undoes changes since your last
submission. The behavior may depend on your browser. Use your browser's Back button to back up to previous inputs.
The clear button clears the form.
The sample data button shows a menu of demonstration data.
Selecting data will automatically run the program with it, unless
you have an older browser or have disabled JavaScript, in which
case you will have to hit the run button.
Formatting options for pairwise scenarios
Report styles
The distance between the two genomes is
the minimum number of reversals, translocations, fissions, and fusions
required to transform one genome into the other. Usually there
is a multitude of scenarios using this number of steps.
GRIMM will show you one of the scenarios. The choice of formatting
options depends on the size of the genome and your interest in the
details of the mathematical algorithm used to produce the rearrangements.
One line per genome, displayed horizontally or vertically: A good choice for small genomes and
for seeing how the algorithm works. GRIMM concatenates all the chromosomes
together in an order determined by its algorithms. This lets you see
how translocations, fissions, and fusions are emulated by reversals
that cross chromosome boundaries. However, since each rearrangement
event involves just one or two chromosomes, this format is unwieldy for
large genomes.
One column: This is also suited to small genomes.
The chromosomes are shown separately.
If a chromosome is too large to
fit in the width of the screen, its genes will be shown on multiple lines.
Table borders (not line breaks) delineate the separate chromosomes.
Two column before & after:
This shows the chromosomes in two side-by-side tables
similar to the one column format.
The option "Only affected chromosomes" is the best choice for large
genomes, as it shows only the one or two chromosomes affected by
each rearrangement event.
Show all possible initial steps of optimal scenarios:
Every reversal, translocation, fission, and fusion that does not
create a new breakpoint is applied to the source genome. The
events that would reduce the distance by 1 are displayed graphically.
A summary of how many events changed the distance by -1, 0, or +1 is
displayed, and the number of events in each category that were
attempted until the first success (reducing distance by 1) is shown.
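The summary of events by their effect on the distance can be imitated for tiny unichromosomal genomes. The sketch below is a simplification under stated assumptions: it handles reversals only, in linear (directed) mode, against the identity permutation, applies every reversal rather than only those that create no new breakpoint, and gets distances by exhaustive search instead of GRIMM's algorithms. The function names are our own.

```python
from collections import deque


def reverse_segment(perm, i, j):
    """Signed reversal of positions i..j (inclusive) of a tuple."""
    return perm[:i] + tuple(-g for g in reversed(perm[i:j + 1])) + perm[j + 1:]


def all_distances(n):
    """Reversal distance to the identity for every signed permutation of 1..n."""
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        perm = queue.popleft()
        for i in range(n):
            for j in range(i, n):
                nxt = reverse_segment(perm, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[perm] + 1
                    queue.append(nxt)
    return dist


def classify_first_steps(source):
    """Count reversals that change the distance by -1, 0, or +1, and list
    the ones that could start an optimal scenario (change of -1)."""
    source = tuple(source)
    n = len(source)
    dist = all_distances(n)
    counts = {-1: 0, 0: 0, 1: 0}
    improving = []
    for i in range(n):
        for j in range(i, n):
            result = reverse_segment(source, i, j)
            change = dist[result] - dist[source]
            counts[change] += 1
            if change == -1:
                improving.append((i, j, result))
    return counts, improving


counts, improving = classify_first_steps((-3, -2, -1))
print(counts)
print(improving)
```

A single reversal can be undone by repeating it, so the distance can only change by -1, 0, or +1, which is why three categories suffice.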
In the graphical display, the events are grouped according to which chromosome(s) they act on.
Single chromosome operations (reversal, fission)
/ denotes a fission reducing distance by 1
: denotes a breakpoint where fission does not reduce distance by 1
denotes segments on which reversals reduce the distance by 1
Your browser may be able to compactify chromosomes that don't fit
in the window horizontally by showing the genes between
breakpoints/fissions over multiple lines.
For example, if the source genome has consecutive genes
1, 2, ..., 13, and the window is narrow, this might be displayed as
1 2 3
4 5
:
6 7
/
8 9 10
11 12
13
Resizing the window may cause the numbers to flow differently.
Two chromosome operations (translocation, fusion)
: denotes a breakpoint (fissions are not indicated)
denotes segments on which translocations reduce the distance by 1
+ denotes a fusion reducing distance by 1.
If chromosomes A, B can be combined as A+B and B+A, it
is denoted A+B+. (The other two possible fusions, in which
one chromosome is flipped but not the other, are shown separately
when relevant.)
separates chromosomes when fusion does not reduce the distance by 1
Highlighting style
The genes involved in a rearrangement event can be highlighted in
a variety of ways, depending on the report style chosen: before they are rearranged; after they are rearranged; both before and after (in the two column formats); by
a yellow line drawn between the lines (in the one line formats); or no highlighting. If your browser permits it, the options that do not make sense
for the chosen report style will be disabled.
Caps (chromosome end markers)
Chromosome delimiters "$" or ";" are not displayed in the reports;
the chromosome boundaries are rendered graphically instead
as colored lines or table borders.
However, most report formats do, by default, display caps. These are artificial markers created by the multichromosomal
genomic distance algorithm to delimit the start and end of each chromosome.
This is a necessary part of the mathematical algorithms that compute
the distance and the rearrangement scenarios,
but it is not necessary for you to see them if you
do not need them.
If you enter the 12 gene genome
7 -2 8 3 $
5 9 -6 -1 12 $
11 4 10 $
GRIMM will initially add caps 13 and 14 to the first chromosome, 15 and 16
to the second, and 17 and 18 to the third:
13 7 -2 8 3 14
15 5 9 -6 -1 12 16
17 11 4 10 18
In the course of computing the distance and of computing rearrangement
scenarios, the caps will be rearranged as well.
Throughout a scenario with this genome, the numbers 1,...,12
will represent genes, and the numbers 13,...,18 will represent caps,
but the numbers 13 and 14 will not necessarily continue to
delimit the first chromosome, or even the same chromosome.
You may display the caps
as numbers 13,...,18; highlight them
as C13,...,C18 (default); or omit them altogether.
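The initial cap numbering, and the way caps move during a scenario, can be sketched as follows. The code adds caps n+1, n+2, ... exactly as in the 12 gene example, concatenates the chromosomes, and applies one reversal that crosses a chromosome boundary. The choice of reversal is ours, purely for illustration, and split_at_caps assumes that the caps still occur in opening and closing pairs; none of this is GRIMM's actual code.

```python
def add_caps(chromosomes):
    """Surround each chromosome with two new caps numbered n+1, n+2, ..."""
    n = sum(len(c) for c in chromosomes)
    capped, next_cap = [], n + 1
    for chromosome in chromosomes:
        capped.append([next_cap] + list(chromosome) + [next_cap + 1])
        next_cap += 2
    return capped


def reverse_segment(seq, i, j):
    """Signed reversal of positions i..j (inclusive) of a list."""
    return seq[:i] + [-g for g in reversed(seq[i:j + 1])] + seq[j + 1:]


def split_at_caps(seq, n):
    """Cut a concatenated genome back into chromosomes. Any entry whose
    absolute value exceeds n is a cap; caps alternately open and close."""
    chromosomes, current = [], None
    for g in seq:
        if abs(g) > n and current is None:
            current = [g]            # opening cap
        elif abs(g) > n:
            current.append(g)        # closing cap
            chromosomes.append(current)
            current = None
        else:
            current.append(g)        # ordinary gene
    return chromosomes


genome = [[7, -2, 8, 3], [5, 9, -6, -1, 12], [11, 4, 10]]
capped = add_caps(genome)
concatenated = [g for chromosome in capped for g in chromosome]

# Reverse the stretch "3 14 15 5", which crosses the first boundary.
after = reverse_segment(concatenated, 4, 7)
for chromosome in split_at_caps(after, 12):
    print(chromosome)
```

After this reversal the first two chromosomes have exchanged ends, which is a translocation, and cap 14 now sits at the start of the second chromosome instead of the end of the first.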
Color coding
The genes are assigned a color based on their chromosome in the source or destination genome. There are only a limited
number of distinguishable web-safe colors that also contrast well with
the normal and highlighting backgrounds, so if there are a lot of
chromosomes, it cycles through the colors and then reuses them.
Pairwise or Multiple Genome Form
In a pairwise genome comparison, the rearrangement distance between
two genomes is given, and an example of a specific sequence of steps
achieving that distance is shown.
In a multiple genome comparison, a matrix of the pairwise distances is
displayed, and optionally, a phylogenetic tree is computed
by MGR.
A button at the top of the form lets you switch to the other form.
# genomes
On the multiple genome form, there is a box to adjust the number of
genome windows. Enter a new number of windows and press Enter or the
accompanying button, depending on your browser.
Technically this is not the number of genomes,
because you may leave windows blank, and because you may enter
multiple genomes in one window.
Multiple Genome Options
Action
Distance matrix only: This displays a matrix of the pairwise distances between the
genomes. Clicking on an entry in the matrix will show a pairwise
scenario achieving that distance. If you have updated the pairwise
scenario formatting options, they will be respected, but if you
have also changed the genomes (which would be inconsistent with
the matrix), those changes will be ignored.
This requires JavaScript to be enabled.
Phylogenetic tree (MGR): This produces:
A phylogenetic tree with the given genomes as the leaves.
Clicking on an edge runs GRIMM to produce
a pairwise scenario between the two genomes.
The "Newick Standard" string representation of the tree, for input
to other tree drawing software.
The distance matrix for the input genomes.
Since MGR does not produce instant results, this web interface to MGR
only permits small inputs: small numbers of genes and small
numbers of genomes. It aborts after a time limit of 1 minute.
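For readers unfamiliar with the Newick format, the sketch below builds a Newick string from a small nested tree. The tree shape, the names G1, G2, G3, and A1, and the edge lengths are all hypothetical; whether MGR labels internal nodes in its own output is not stated here, so treat the labeled ancestor as an assumption.

```python
def newick(node):
    """node is (name, children); children is a list of (child_node, length)."""
    name, children = node
    if not children:
        return name
    inner = ",".join(
        "{}:{}".format(newick(child), length) for child, length in children
    )
    return "({}){}".format(inner, name)


# Hypothetical tree: ancestor A1 joins G1 and G2; the unnamed root joins A1 and G3.
tree = ("", [
    (("A1", [(("G1", []), 2), (("G2", []), 1)]), 3),
    (("G3", []), 4),
])

print(newick(tree) + ";")
```

Each "name:length" pair gives the length of the edge above that node, and the trailing semicolon ends the tree.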
Tree size
Edge length proportional to distance and Total width of tree: The edges will be stretched out in proportion to the distances
they represent, to fit horizontally within the specified number
of characters.
Not proportional: The input genomes are aligned on the
right.
Reformat button: This only appears when a tree is displayed. You may change the
tree size options and hit this button to reformat
the tree. It does not recompute it, so any alterations you
have made to the genomes will be ignored.
This page was created by Glenn Tesler,
University of California, San Diego.