Homograph: exploring protein homology and orthology in whole genomes


Abreu-Goodger C, Moreno-Hagelsieb G, Collado-Vides J and Merino E


Summary:

Homograph is an X Window graphical interface for visualizing protein homology and orthology in whole genomes. A dot plot represents every pair of proteins that passes a given similarity threshold. Dots can be selected and colored by user-defined categories, gene descriptions, or similarity cutoff.
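
The dot-plot idea can be illustrated with a minimal Python sketch. This is not Homograph's own code: the function names, the (position in genome A, position in genome B, score) input format, and the color names are all assumptions made for illustration.

```python
def dotplot_points(hits, min_score):
    """Keep the protein pairs whose similarity score passes the cutoff.

    hits: iterable of (pos_a, pos_b, score) tuples, where pos_a and pos_b
          are the positions of the two proteins along their genomes
          (hypothetical input format).
    Returns a list of (x, y) dot coordinates.
    """
    return [(a, b) for a, b, score in hits if score >= min_score]


def color_by_keyword(points, descriptions_a, keyword, hit="red", other="black"):
    """Color each dot by whether the genome-A gene description contains keyword.

    descriptions_a: dict mapping a position in genome A to its description.
    The match is case-insensitive; dots without a description get `other`.
    """
    kw = keyword.lower()
    return [
        hit if kw in descriptions_a.get(a, "").lower() else other
        for a, _ in points
    ]


if __name__ == "__main__":
    hits = [(1, 1, 200.0), (2, 5, 40.0), (3, 3, 150.0)]
    points = dotplot_points(hits, min_score=100.0)
    descriptions = {1: "ABC transporter", 3: "hypothetical protein"}
    print(points)
    print(color_by_keyword(points, descriptions, "abc"))
```

Each surviving pair becomes one dot, so pairs lying along a diagonal indicate that gene order is conserved between the two genomes.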

Contact: merino@ibt.unam.mx


Downloads and Installation:

Homograph has been tested on Red Hat Linux, Mandrake Linux, and Windows XP. Be warned: it runs MUCH better under Linux.
Linux (highly recommended)
Windows
Although example databases are provided in the Downloads section, you may still want to make your own. In this case, please see these notes.


Screenshots and examples:
A few examples of how Homograph can be used:

To simply view the distribution of homologous proteins in a genome, for example:
- ABC transporters in M. loti, keyword used "ABC". Screenshot
- Possible transposable elements in Y. pestis KIM, keyword used "transpos". Screenshot
To compare the order of proteins in two closely related genomes:
- The orthologs shared by S. typhi and S. typhimurium. Two large inversions can be observed. The origin can be located by finding "dnaA" and is shown in yellow. Screenshot
- Two pathogenic strains of E. coli. The keyword "prophage" has been used to color the matching proteins in red. These proteins are concentrated in the region that differentiates the two strains. Full screenshot, and a closeup, showing that the selected elements lie at the ends of the rearranged fragments.
Incorporating externally obtained data (go to the Downloads section to obtain these datasets):
- Comparison of two strains of H. pylori. Blue and yellow show each genome's "Information storage and processing" proteins, according to the COG classification. Green shows those proteins that have the same classification in both genomes. Screenshot
- High and low GC skew at 3rd base positions, using a sliding window equivalent to one fiftieth of the genome's size. This can be used to locate the origin of replication. S. typhi vs S. typhimurium screenshot, with high GC skew in red and low GC skew in blue.
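
The GC skew dataset in the last example could be produced along the following lines. This is a sketch under stated assumptions, not the procedure used to build the provided datasets: it takes GC skew to be the usual (G - C) / (G + C), measures the window in genes rather than base pairs, and treats the chromosome as circular. The function names are hypothetical.

```python
def third_position_counts(cds):
    """Count G and C at the third position of each codon in a coding sequence."""
    third = cds.upper()[2::3]
    return third.count("G"), third.count("C")


def windowed_gc3_skew(cds_list, fraction=50):
    """GC skew (G - C) / (G + C) at third codon positions, per sliding window.

    cds_list: coding sequences in genome order (circular chromosome assumed).
    fraction: each window spans 1/fraction of the genes. Measuring the window
              in genes is an assumption; it could equally be in base pairs.
    Returns one skew value per gene, for the window starting at that gene.
    """
    n = len(cds_list)
    if n == 0:
        return []
    window = max(1, n // fraction)
    counts = [third_position_counts(cds) for cds in cds_list]
    skews = []
    for i in range(n):
        g = c = 0
        for k in range(window):
            gk, ck = counts[(i + k) % n]  # wrap around the circular genome
            g += gk
            c += ck
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews
```

The resulting values could then be split into "high" and "low" categories and loaded as external data for coloring. A change of sign in the skew along the genome is the feature commonly used to look for the origin of replication.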