Gene prediction: filling in the form

Sequence frame

The query sequence and a probabilistic model for the target organism must be indicated here.

Two example sequences are provided: a GC-rich noisy sequence with a frameshift in the noisy region (the region containing 'N's) and a finished AT-rich sequence. Clicking the corresponding "Example" button fills in the frame with that sequence.

Options frame

The prediction of FrameD can be controlled by several parameters. Default values for all these parameters are provided.

  1. Compute mean expected prediction: this flag controls whether FrameD, in addition to computing one optimal prediction, also computes a "mean" prediction. The "mean" prediction gives, for each nucleotide, the probability that it is coding, non-coding, etc., considering all possible predictions. This makes it possible to identify positions where the optimal FrameD prediction may be unreliable: alternative start choices, possible frameshifts, etc.

  2. Increased sensitivity: by penalizing intergenic regions in the prediction score, this flag artificially increases FrameD sensitivity. This can be useful for analyzing sequences with non-standard coding statistics (e.g. sequences containing genes acquired through horizontal transfer). Such genes are usually visible in the FrameD graphical interface as open reading frames with an unstable coding potential.

  3. Frameshift penalty: score penalty for predicting a frameshift. For finished sequences, this should take a high value, typically around 20 (the default value is 18). Lower penalties increase the probability that a frameshift will be predicted and are well suited to, e.g., EST cluster analysis and unfinished sequences.

  4. Stop penalty: score penalty for using a translation STOP in the prediction. The default value is 4. Lower penalties allow the prediction of smaller genes.

  5. Generate corrected sequence: when a frameshift is predicted, FrameD will generate a sequence in which the frameshifts have been corrected. Each detected inserted nucleotide is followed by two 'N's to restore the phase without losing information, and each deleted nucleotide is inserted back as an 'N'. The corrected sequence is available for download on the prediction page.

  6. Generate translated sequence: for every gene predicted by FrameD that contains no predicted frameshift, the corresponding amino acid sequence is generated in a FASTA file. Each amino acid sequence name is built from the DNA sequence name followed by the position of the gene in the original sequence. The phase of the gene follows as a comment. The translated sequences are available for download on the prediction page.

  7. GC content: adjusts internal FrameD parameters to the GC content of your genome. The default parameters are adequate for GC-rich and medium-GC genomes.

  8. Matured eukaryotic sequences analysis (ATG start only): this flag must be activated for the analysis of intronless eukaryotic sequences. It restricts Start codon prediction to ATG codons, deactivates the ribosome binding site search, changes the a priori probabilities of being coding or not, and disables gene overlapping.

  9. TGA is not a STOP codon: this option should be activated for the analysis of sequences from organisms that use a non-standard genetic code in which TGA codes for an amino acid. It is also available on the page for learning new probabilistic models for new organisms.

  10. Default RBS pattern: the pattern used for RBS detection. The default value is ATTCCTCCA, from E. coli.
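
The correction rule of option 5 can be illustrated with a minimal Python sketch. This is not FrameD's own code: the function name `correct_frameshifts`, the `(position, kind)` event format and the 0-based position convention are assumptions made for this illustration only.

```python
def correct_frameshifts(seq, events):
    """Apply the rule described for "Generate corrected sequence".

    seq    -- DNA sequence as a string
    events -- list of (position, kind) tuples (hypothetical format):
              kind == "insertion": position is the 0-based index of the
                  extra nucleotide; two 'N's are added right after it,
                  so the extra base plus padding fills one codon
              kind == "deletion": position is the 0-based index before
                  which the missing nucleotide is restored as one 'N'
    """
    corrected = list(seq)
    # Process events from right to left so that earlier positions
    # are not shifted by the characters inserted further downstream.
    for pos, kind in sorted(events, reverse=True):
        if kind == "insertion":
            corrected[pos + 1:pos + 1] = ["N", "N"]
        elif kind == "deletion":
            corrected[pos:pos] = ["N"]
        else:
            raise ValueError("unknown frameshift kind: %r" % (kind,))
    return "".join(corrected)


# The extra 'A' at index 4 is kept and padded with two 'N's.
print(correct_frameshifts("ATGAAACCCTAA", [(4, "insertion")]))  # ATGAANNACCCTAA
```

In both cases the original nucleotides are preserved; only 'N's are added, which is why no information is lost.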

Protein similarities frame

To enhance prediction quality, FrameD can take into account protein similarities detected using the NCBI BlastX program. Here you can paste (or upload) similarity information in the so-called "tabulated" format of NCBI BlastX (obtained using the -m8 flag in NCBI BlastX rel. 2.2.5). The choice of the expectation threshold (BlastX -e flag) and of the protein databases used is left to the user. The hits for the two example sequences can be inserted directly in the text field by clicking the corresponding "Example" buttons.

These similarities can be given more or less confidence (see the Options frame). With a default 0 confidence, similarities are visualized in FrameD output but do not affect the prediction process. Higher confidences increase the likelihood of being coding for regions with strong similarities.
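
Before pasting hits into the form, it can help to check that a file really is in the tabulated format. The sketch below assumes the standard 12 tab-separated columns of that format (query, subject, % identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score). The names `Hit` and `parse_m8` are hypothetical, and the optional e-value filter is only a convenience of this example, not something FrameD requires.

```python
from typing import Iterable, List, NamedTuple, Optional


class Hit(NamedTuple):
    query: str
    subject: str
    identity: float   # percent identity
    length: int       # alignment length
    mismatches: int
    gap_opens: int
    q_start: int      # for BlastX, q_start > q_end on the reverse strand
    q_end: int
    s_start: int
    s_end: int
    evalue: float
    bit_score: float


def parse_m8(lines: Iterable[str], max_evalue: Optional[float] = None) -> List[Hit]:
    """Parse tabulated BLAST hits, skipping blank and '#' comment lines."""
    hits = []
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 12:
            raise ValueError("line %d: expected 12 columns, got %d"
                             % (number, len(fields)))
        hit = Hit(fields[0], fields[1], float(fields[2]),
                  int(fields[3]), int(fields[4]), int(fields[5]),
                  int(fields[6]), int(fields[7]),
                  int(fields[8]), int(fields[9]),
                  float(fields[10]), float(fields[11]))
        # Keep the hit unless an e-value ceiling was given and exceeded.
        if max_evalue is None or hit.evalue <= max_evalue:
            hits.append(hit)
    return hits
```

A typical use would be `parse_m8(open("hits.m8"), max_evalue=1e-5)`, where `hits.m8` is a placeholder file name.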

Output parameters frame

The parameters in this frame configure the format and contents of the output but do not affect the prediction itself.

