Creating a model for a new organism
If the organism of the sequence you are studying is not available on
the initial web page, and if no available organism model produces
satisfactory results, you can build a model for your organism directly
from known coding sequences.
From the initial web page, click on the text link below the organism
selection list and you will be redirected to the Markov model building
page. Here:
- choose a name for your Markov model (typically the name of the
organism)
- upload a FASTA file of CDS sequences from the organism. Each CDS
should run from the start codon to the STOP codon (excluded). If you
leave the STOP at the end, it will be trimmed automatically, but if
another in-frame STOP remains after trimming, the sequence will be
rejected.
- if you later want to predict genes in noisy (unfinished)
sequences, choose the "Build N resistant Markov model" option. If you
don't, FrameD can still work on unfinished sequences using an
automatically built averaged model (less precise than an N
resistant Markov model).
- once the sequences are submitted and the model is built without
error, go back to the initial page (click on the "Back to the main
page" link, then reload the page). The new model should now be
available in the organism list.
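Before uploading, it can help to check the CDS file locally against
the rule described above (terminal STOP trimmed, any other in-frame
STOP rejected). The following Python sketch is only an illustration of
that rule, not the code run by the FrameD server. It assumes the
standard STOP codons (TAA, TAG, TGA) and a reading frame starting at
the first base; the function names parse_fasta and clean_cds are
hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def parse_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs."""
    records = []
    name, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(parts).upper()))
            name, parts = line[1:].strip(), []
        elif line and name is not None:
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts).upper()))
    return records


def clean_cds(seq):
    """Trim a terminal STOP codon; return None if an in-frame STOP remains."""
    # Split into complete codons, reading from the first base.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    # A STOP left at the very end of the CDS is trimmed.
    if codons and len(seq) % 3 == 0 and codons[-1] in STOP_CODONS:
        codons = codons[:-1]
        seq = seq[:-3]
    # Any STOP still present in frame makes the sequence unusable.
    if any(codon in STOP_CODONS for codon in codons):
        return None
    return seq
```

Applying clean_cds to each record returned by parse_fasta and keeping
only the sequences that are not None gives a file that should contain
no in-frame STOP codons.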
Regarding the number of CDS to submit, note that FrameD uses
interpolated Markov models, which automatically adjust the order of
the model to the amount of data. The larger the data set, the better
the model. On small data sets, prediction performance may be poor,
but overfitting problems should not occur.
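To illustrate why the amount of data matters, here is a minimal Python
sketch of the general idea behind an interpolated Markov model. It is
not FrameD's actual weighting scheme: the weight formula
total / (total + threshold), the threshold value of 20, and the names
count_contexts and imm_prob are assumptions chosen for illustration,
and the sketch ignores codon phase. Each longer context contributes in
proportion to how often it was observed, so rarely seen contexts fall
back on shorter, better supported ones.

```python
from collections import defaultdict

ALPHABET = "ACGT"


def count_contexts(sequences, max_order):
    """Build counts[k][context][next_base] for each order k in 0..max_order."""
    counts = [defaultdict(lambda: defaultdict(int))
              for _ in range(max_order + 1)]
    for seq in sequences:
        for i, base in enumerate(seq):
            for k in range(min(i, max_order) + 1):
                counts[k][seq[i - k:i]][base] += 1
    return counts


def imm_prob(counts, context, base, threshold=20.0):
    """Interpolated estimate of P(base | context).

    threshold is a hypothetical tuning constant: a context observed
    'threshold' times gets weight 0.5 against the shorter contexts.
    """
    prob = 1.0 / len(ALPHABET)  # uniform estimate below order 0
    max_order = len(counts) - 1
    context = context[-max_order:] if max_order else ""
    for k in range(len(context) + 1):
        ctx = context[len(context) - k:]
        seen = counts[k].get(ctx)
        total = sum(seen.values()) if seen else 0
        if total == 0:
            break  # longer contexts cannot have been observed either
        weight = total / (total + threshold)
        prob = weight * seen.get(base, 0) / total + (1 - weight) * prob
    return prob
```

With few training sequences the weights of the high-order contexts
stay small and the estimate remains close to the low-order one; as
more CDS are added, the higher orders progressively take over.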