GENETACK: FRAMESHIFT IDENTIFICATION IN PROTEIN-CODING SEQUENCES BY THE VITERBI ALGORITHM
- نوع فایل : کتاب
- زبان : انگلیسی
- مؤلف : IVAN ANTONOV
- چاپ و سال / کشور: 2010
Description
We describe a new program for ab initio frameshift detection in protein-coding nucleotide sequences. The task is to distinguish the same strand overlapping ORFs that occur in the sequence due to a presence of a frameshifted gene from the same strand overlapping ORFs that encompass true overlapping or adjacent genes. The GeneTack program uses a hidden Markov model (HMM) of genomic sequence with possibly frameshifted protein-coding regions. The Viterbi algorithm finds the maximum likelihood path that discriminates between true adjacent genes and those adjacent protein-coding regions that just appear to be separate entities due to frameshifts. Therefore, the program can identify spurious predictions made by a conventional gene-finding program misled by a frameshift. We tested GeneTack as well as two earlier developed programs FrameD and FSFind on 17 prokaryotic genomes with frameshifts introduced randomly into known genes. We observed that the average frameshift prediction accuracy of GeneTack, in terms of (Sn+Sp)/2 values, was higher by a significant margin than the accuracy of two other programs. In addition, we observed that the average accuracy of GeneTack is favorably compared with the accuracy of the FSFind-BLAST program that uses protein database search to verify predicted frameshifts, even though GeneTack does not use external evidence. GeneTack is freely available at http://topaz.gatech.edu/GeneTack/.
Journal of Bioinformatics and Computational Biology Vol. 8, No. 3 (2010) 535–551 Received 25 October 2009 Revised 12 February 2010 Accepted 13 February 2010