Protein-coding, or not protein-coding?
Protea is a program that classifies nucleotide sequences as either protein-coding or other. The input is a set of related DNA or RNA sequences, which need not be aligned. The method exploits the characteristic evolutionary pattern of coding sequences, together with the consistency of reading frames across the set, to decide whether the sequences are coding. It is implemented as a graph-theoretical algorithm.
Q19267_CAEEL/158-187  UGUUUU---------GGAAAAGGAUCCUGUCAUGGAGAUGGAAGCCGCGAAGGCAGUGGA
Q86G85_PSEIC/635-670  UGCCGGUCACCUGAAAACAACGAAAUCUGCAGUGGAAACGGA---CAAUGUGUAUGUGGA
O97702_CANFA/508-543  UGCAGCCCCCGGGAGGGCCAGCCCGCCUGCAGCCAGCGGGGC---GAGUGCCUGUGUGGC

Q19267_CAEEL/158-187  AAGUGUAAAUGUGAGACUGGA------------UAUACUGGAAAUCUAUGC
Q86G85_PSEIC/635-670  CAAUGUAUGUGUAACUCUGACGAUGACCGCCACUAUAGUGGCAAAUACUGC
O97702_CANFA/508-543  CAAUGUGUCUGCCAUAGCAGUGACUUUGGCAAGAUCACGGGCAAGUACUGC
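The two signals named above, reading-frame consistency and the evolutionary pattern of coding sequences, can be illustrated with a small Python sketch. This is not Protea's algorithm (which is graph-based and written in C); it is only a toy illustration under simple assumptions: the standard genetic code, forward frames only, and two hypothetical helper functions, `stop_counts` and `synonymous_changes`, that do not come from the Protea distribution. The first counts stop codons in each reading frame; the second counts how many differing codons between two aligned sequences leave the amino acid unchanged.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order; '*' marks a stop.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons(seq, frame):
    """Split seq into complete codons, starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def stop_counts(seq):
    """Number of stop codons in each of the three forward reading frames."""
    # Normalise to upper-case RNA and drop alignment gaps before framing.
    seq = seq.upper().replace("T", "U").replace("-", "")
    return [sum(CODON_TABLE.get(c) == "*" for c in codons(seq, f))
            for f in range(3)]


def synonymous_changes(a, b, frame):
    """For two aligned upper-case RNA rows, return (synonymous, differing)
    codon counts in the given frame."""
    synonymous = differing = 0
    for ca, cb in zip(codons(a, frame), codons(b, frame)):
        if ca == cb or ca not in CODON_TABLE or cb not in CODON_TABLE:
            continue  # identical, gapped, or ambiguous codons are skipped
        differing += 1
        synonymous += CODON_TABLE[ca] == CODON_TABLE[cb]
    return synonymous, differing


# Rows of Q19267_CAEEL and Q86G85_PSEIC copied from the example alignment.
q19267 = ("UGUUUU---------GGAAAAGGAUCCUGUCAUGGAGAUGGAAGCCGCGAAGGCAGUGGA"
          "AAGUGUAAAUGUGAGACUGGA------------UAUACUGGAAAUCUAUGC")
q86g85 = ("UGCCGGUCACCUGAAAACAACGAAAUCUGCAGUGGAAACGGA---CAAUGUGUAUGUGGA"
          "CAAUGUAUGUGUAACUCUGACGAUGACCGCCACUAUAGUGGCAAAUACUGC")

print("stop codons per frame:", stop_counts(q19267))
for frame in range(3):
    print("frame", frame, "(synonymous, differing):",
          synonymous_changes(q19267, q86g85, frame))
```

In a coding region one would expect at least one frame to be free of stop codons, and substitutions in that frame to be synonymous more often than in the other two. A real classifier has to combine such evidence across many sequences and cope with unaligned input, which this sketch does not attempt.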
A web interface is also available.
Protea is freely distributed under the CeCILL license. Please consult the enclosed README file for information about installing and running it. Protea was developed in C and requires some freely available libraries (GMP and MPFR) and UNIX tools (Lex, Yacc). You also need to install ClustalW.
Computational identification of protein-coding sequences by comparative analysis
A. Fontaine, H. Touzet. International Journal of Data Mining and Bioinformatics, 3(2), pages 160-176