Title
Expressed peptide tags: An additional layer of data for genome annotation
Date Issued
01 November 2006
Access level
metadata only access
Resource Type
journal article
Author(s)
Savidor A.
Donahoo R.S.
VerBerkmoes N.C.
Shah M.B.
Lamour K.H.
McDonald W.H.
University of Tennessee
Abstract
While genome sequencing is becoming ever more routine, genome annotation remains a challenging process. Identification of the coding sequences within the genomic milieu presents a tremendous challenge, especially for eukaryotes with their complex gene architectures. Here, we present a method to assist the annotation process through the use of proteomic data and bioinformatics. Mass spectra of digested protein preparations of the organism of interest were acquired and searched against a protein database created by a six-frame translation of the genome. The identified peptides were mapped back to the genome, compared to the current annotation, and then categorized as supporting or extending the current genome annotation. We named the classified peptides Expressed Peptide Tags (EPTs). The well-annotated bacterium Rhodopseudomonas palustris was used as a control for the method and showed a high degree of correlation between EPT mapping and the current annotation, with 86% of the EPTs confirming existing gene calls and less than 1% of the EPTs expanding on the current annotation. The eukaryotic plant pathogens Phytophthora ramorum and Phytophthora sojae, whose genomes have been recently sequenced and are much less well-annotated, were also subjected to this method. A series of algorithmic steps were taken to increase the confidence of EPT identification for these organisms, including generation of smaller subdatabases to be searched against, and definition of EPT criteria that accommodate the more complex eukaryotic gene architecture. As expected, the analysis of the Phytophthora species showed less correlation between EPT mapping and their current annotation. While ∼76% of Phytophthora EPTs supported the current annotation, a portion of them (7.7% and 12.9% for P. ramorum and P. sojae, respectively) suggested modification to current gene calls or identified novel genes that were missed by the current genome annotation of these organisms.
© 2006 American Chemical Society.
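Illustrative sketch (not from the article): the abstract describes searching spectra against a six-frame translation of the genome and mapping identified peptides back to genomic coordinates. The Python below shows one way those two steps could look for a single short sequence. It assumes the standard genetic code and exact string matching of an already-identified peptide; the function names (`six_frame_translation`, `map_peptide`) are hypothetical, and the authors' actual pipeline, search engine, and EPT classification criteria are not reproduced here.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase ACGT string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(genome):
    """Map (strand, offset) to the protein string for each of the six frames."""
    genome = genome.upper()
    frames = {}
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for offset in range(3):
            frames[(strand, offset)] = translate(seq[offset:])
    return frames


def map_peptide(peptide, genome):
    """Return (strand, start, end) for every exact match of the peptide.

    Coordinates are 0-based, half-open, and always given on the forward
    strand, so they can be compared directly with annotated gene intervals.
    """
    n = len(genome)
    hits = []
    for (strand, offset), protein in six_frame_translation(genome).items():
        pos = protein.find(peptide)
        while pos != -1:
            start = offset + 3 * pos
            end = start + 3 * len(peptide)
            if strand == "-":
                # Convert reverse-strand positions to forward-strand ones.
                start, end = n - end, n - start
            hits.append((strand, start, end))
            pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    toy_genome = "CCATGGCTAAATGACC"  # made-up sequence for illustration
    print(map_peptide("MAK", toy_genome))
```

In a workflow like the one the abstract outlines, the returned intervals would then be compared with existing gene calls to decide whether a peptide supports or extends the annotation; that comparison is left out here because the abstract does not state the criteria.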
Start page
3048
End page
3058
Volume
5
Issue
11
Language
English
OECD Knowledge area
Human genetics
Genetics, Heredity
Subjects
Scopus EID
2-s2.0-33750967613
PubMed ID
Source
Journal of Proteome Research
ISSN of the container
1535-3893
Sources of information:
Directorio de Producción Científica
Scopus