Glunčić M, Paar V, Jelovina D

What is grm2012.exe?

grm2012.exe is a command-line tool for finding (tandem) repeats in a genomic sequence.

What does the input file look like?

The current version supports .fasta, .fa and .txt files. The program runs from the command line, and file names are passed as command-line arguments. If the sequence is passed in a .txt file, it must be on the first line, without any other characters (including whitespace) before or after it.
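
The .txt requirement above can be checked before running the tool. The following is a minimal sketch, not part of grm2012.exe: the function name is hypothetical, and the accepted alphabet (A, C, G, T, N in either case) is an assumption, since this README does not list the characters the program accepts.

```python
def is_valid_txt_input(path):
    """Return True if the first line of the file is a bare nucleotide sequence."""
    # newline="" keeps line terminators untranslated so they can be stripped explicitly.
    with open(path, "r", newline="") as handle:
        first_line = handle.readline()
    # Drop only the line terminator; any other whitespace counts as an error.
    sequence = first_line.rstrip("\r\n")
    allowed = set("ACGTNacgtn")  # assumed alphabet, not confirmed by the README
    return len(sequence) > 0 and all(ch in allowed for ch in sequence)
```

A file whose first line starts or ends with a space or tab is rejected by this check, which mirrors the rule stated above.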

How to start?

The program may need administrator privileges to be able to write its output.
Download and extract the files into a folder (e.g. C:\New Folder\).
Open a command prompt (Start -> Run, type cmd.exe, and press Enter).
In the command prompt, change the current folder to the one where the files were extracted (e.g. cd "C:\New Folder").
Start the calculation by passing the full path of the input file as an argument to the program (e.g. grm2012.exe "C:\New Folder\file.fasta").

Advanced: Additional arguments can be passed to the program to change parameters (e.g. grm2012.exe -cntrl1 -cntrl2 "C:\New Folder\file.fasta").

Available parameters:

parameter        description

-rtf             Output an .rtf colourized data file with reduced data (full colour output can be very large).
-nortf           Don't output the .rtf colour file with reduced data.
-txt             Output a non-colourized .txt data file (full data output).
-notxt           Don't output the non-colourized .txt data file (full data output).
-kslen N         Set the key string length to N.
                 N can be in the range from 1 to 16.
                 Default value is 8.
-partlen L       Set the data breaking length to L.
                 Default value is 1 Mbp.
-GRMfilter G     Set the GRM data peak filter to G.
                 Default value is 600.
-help            Display this help.
-grm             Write out the GRM file.
-grm-M           Write out the GRM file, with output data length M.
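
The options listed above can also be assembled from a script. The sketch below only builds the argument list and leaves the actual call commented out. The helper name build_grm_command is hypothetical, and passing N as a separate token after -kslen is an assumption based on the "-kslen N" notation in the table; the 1 to 16 range check comes from the table.

```python
import subprocess


def build_grm_command(input_path, kslen=None, rtf=False, exe="grm2012.exe"):
    """Build an argument list for grm2012.exe (options first, input file last)."""
    command = [exe]
    if rtf:
        command.append("-rtf")
    if kslen is not None:
        if not 1 <= kslen <= 16:
            raise ValueError("kslen must be between 1 and 16")
        # Assumption: the value follows the flag as its own argument.
        command += ["-kslen", str(kslen)]
    command.append(input_path)
    return command


if __name__ == "__main__":
    cmd = build_grm_command(r"C:\New Folder\file.fasta", kslen=10, rtf=True)
    print(cmd)
    # Uncomment on a machine where grm2012.exe is in the current folder:
    # subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids manual quoting of paths that contain spaces, such as "C:\New Folder".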

What does the output look like?

The tandem repeats found are stored in a .txt and/or .rtf file (by default .txt only).
Files are saved in the folder where the input file is located, and are named:
The output file is organized in three columns:
the first column contains the start positions of the copies in a tandem repeat,
the second column contains the length of each copy,
the third column contains the copy sequence (the .rtf file shows only the start and end of each copy).
Tandem repeats are separated from one another by a horizontal line.

GRM-Total module

Glunčić M, Paar V, Basar I, Rosandić M

Computes the frequency vs. fragment length distribution for a given genomic sequence by superposing the results of consecutive KSA segmentations computed for an ensemble of all n-bp key strings (4^n key strings). In the GRM diagram, each pronounced peak corresponds to one or more repeats of that length, tandem or dispersed.
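
The idea of superposing segmentations over all key strings can be illustrated with a short sketch. This is not the grm2012.exe implementation: the function name is hypothetical, the fragment length is taken to be the distance between consecutive start positions of the same key string, and overlapping occurrences are counted, all of which are assumptions made for illustration.

```python
from collections import Counter


def grm_distribution(sequence, key_length=8):
    """Count distances between consecutive occurrences of every key string."""
    last_seen = {}         # key string -> start position of its latest occurrence
    histogram = Counter()  # fragment length -> frequency
    for start in range(len(sequence) - key_length + 1):
        key = sequence[start:start + key_length]
        if key in last_seen:
            # Distance to the previous occurrence of the same key string.
            histogram[start - last_seen[key]] += 1
        last_seen[key] = start
    return histogram


if __name__ == "__main__":
    # A 4-bp unit repeated five times gives a single peak at length 4.
    print(grm_distribution("ACGT" * 5, key_length=2))
```

Scanning the sequence once with a dictionary of last positions covers every key string that actually occurs, so the 4^n possible key strings never have to be enumerated explicitly.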

GRM-Dom module

Basar I, Glunčić M, Paar V, Rosandić M

Determines the dominant key string corresponding to the fragment length of each peak in the GRM diagram. An n-bp key string (or a group of n-bp key strings) that gives the largest frequency for the fragment length under consideration is referred to as the dominant key string.
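
The selection of a dominant key string can be sketched as follows. This is an illustration under assumptions, not the module's code: the function name is hypothetical, and the frequency of a key string at a given fragment length is taken to be the number of consecutive occurrences of that key string separated by exactly that distance.

```python
from collections import Counter


def dominant_key_strings(sequence, fragment_length, key_length=8):
    """Return (key strings with the highest count at fragment_length, that count)."""
    per_key = Counter()  # key string -> number of fragments of the requested length
    last_seen = {}       # key string -> start position of its latest occurrence
    for start in range(len(sequence) - key_length + 1):
        key = sequence[start:start + key_length]
        if key in last_seen and start - last_seen[key] == fragment_length:
            per_key[key] += 1
        last_seen[key] = start
    if not per_key:
        return [], 0
    best = max(per_key.values())
    # Several key strings can tie, which matches the "group of key strings" case.
    return sorted(k for k, c in per_key.items() if c == best), best


if __name__ == "__main__":
    print(dominant_key_strings("ACGT" * 5, fragment_length=4, key_length=2))
```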


GRM-Seg module

Basar I, Glunčić M, Paar V, Rosandić M

Performs segmentation of a given genomic sequence into KSA fragments using the dominant key string from the GRM-Dom module. Any periodic segment within the KSA length array reveals the location of a repeat and provides the genomic sequences of the corresponding repeat copies.
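
A minimal sketch of this step, assuming that the KSA length array is the list of distances between consecutive occurrences of the key string and that a periodic segment is a run of equal lengths. Both function names and the min_copies threshold are hypothetical and are not taken from the module.

```python
def ksa_segment(sequence, key_string):
    """Return key string start positions and the fragment lengths between them."""
    positions = []
    start = sequence.find(key_string)
    while start != -1:
        positions.append(start)
        start = sequence.find(key_string, start + 1)
    lengths = [b - a for a, b in zip(positions, positions[1:])]
    return positions, lengths


def periodic_runs(lengths, period, min_copies=2):
    """Find runs of consecutive fragments of length `period`.

    Returns a list of (index of first fragment, number of fragments in the run).
    """
    runs = []
    run_start = None
    # The trailing None closes a run that reaches the end of the array.
    for index, length in enumerate(lengths + [None]):
        if length == period:
            if run_start is None:
                run_start = index
        else:
            if run_start is not None and index - run_start >= min_copies:
                runs.append((run_start, index - run_start))
            run_start = None
    return runs


if __name__ == "__main__":
    seq = "GG" + "ACTTT" * 4 + "GGGGAC"
    positions, lengths = ksa_segment(seq, "AC")
    for first, count in periodic_runs(lengths, period=5):
        # Each fragment starts at a key string position and spans one period.
        copies = [seq[p:p + 5] for p in positions[first:first + count]]
        print(positions[first], copies)
```

The start position of each run gives the location of the repeat, and slicing the sequence at the recorded positions recovers the individual copies.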

ColorHOR application

Pavin N, Paar V, Rosandić M, Glunčić M, Basar I, Pezer R

Here we develop ColorHOR, a graphical user interface method for fast identification and analysis of higher order repeats (HORs) in a given genomic sequence, without requiring a priori information on the composition of the sequence. ColorHOR is based on an extension of the key-string algorithm (KSA). The choice of key string is based on the standard consensus alpha satellite. The ColorHOR program first constructs the alpha staircase, identifying alpha-satellite-containing segments in a given sequence as stairs in the staircase, and then constructs colored bands at the position of each stair, providing a direct visual identification of HORs (direct and/or reverse complement). We suggest that the HOR assignment obtained by ColorHOR be included in databases for complete genome sequences.