
As soon as you use a degenerate alphabet for DNA, you will get proteins that match this alphabet. Maybe you can just add a parameter "allow_IUPAC" and check for the strict ACGT otherwise?

Elisabeth Wischnitzki wrote:
And if you give NCBI a sequence consisting of only ACGTs they will say "sorry, that's a protein"? I doubt that...
And they allow IUPAC code?
lieven sterck wrote:
On the BLAST input page I have to determine whether the sequence provided is DNA for blastn/blastx, and otherwise print an error. Apparently NCBI can do it :-p
L.
Michiel Van Bel wrote:
yeah okay,
but those sequences cannot be distinguished from each other, so i don't really get the problem.
lieven sterck wrote:
no points, Seppe, because there exist protein sequences that can match the same regex
Sebastian Proost wrote:
lieven sterck wrote:
a Perl regex to check whether a sequence is pure DNA or not.
bonus question: what if I also need to allow IUPAC codes for DNA?
thx, L.
m/[ATCG]*/i for DNA.
bonus: Add extra IUPAC chars between brackets
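
Editorial sketch, not part of the original thread: as written, an unanchored `[ATCG]*` can match the empty string, so it succeeds on any input, protein included. A stricter check needs anchors and at least one character. The example below is in Python for illustration. The function name `looks_like_dna` and the `allow_iupac` flag are hypothetical, following the "allow_IUPAC" parameter suggested elsewhere in the thread. The character classes should carry over to Perl unchanged, e.g. `m/\A[ACGT]+\z/i`. The IUPAC set assumed here is ACGT plus the ambiguity codes R, Y, S, W, K, M, B, D, H, V and N, without U or gap characters.

```python
import re

# Strict DNA: one or more of A, C, G, T and nothing else.
STRICT_DNA = re.compile(r"[ACGT]+", re.IGNORECASE)

# Assumed IUPAC nucleotide alphabet: ACGT plus ambiguity codes
# R Y S W K M B D H V N (no U, no gap characters).
IUPAC_DNA = re.compile(r"[ACGTRYSWKMBDHVN]+", re.IGNORECASE)


def looks_like_dna(seq, allow_iupac=False):
    """Return True if the whole sequence uses only the chosen DNA alphabet.

    Hypothetical helper; fullmatch() anchors the pattern at both ends,
    so the empty string and partial matches are rejected.
    """
    pattern = IUPAC_DNA if allow_iupac else STRICT_DNA
    return pattern.fullmatch(seq) is not None
```

This does not remove the ambiguity raised in the thread: a protein made up only of letters that are also IUPAC nucleotide codes (for example "MKVHDGST") still passes with `allow_iupac=True`, which is why the strict ACGT check is kept as the default here.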
--
==================================================================
Elisabeth Wischnitzki
Tel: +32 (0)9 331 38 22
Fax: +32 (0)9 3313809
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
elwis@psb.ugent.be
http://www.psb.ugent.be
==================================================================