
a perl regex to check whether a sequence is pure DNA or not.
bonus question: what if I also need to allow IUPAC code for DNA.
thx, L.
--
==============================================================
Lieven Sterck, PhD
Tel:+32 (0)9 3313821 Fax:+32 (0)9 3313809
VIB Department of Plant Systems Biology, UGent
Bioinformatics and Evolutionary Genomics Division
Technologiepark 927, B-9052 Gent, Belgium
Email: lieven.sterck@psb.ugent.be
Website: http://bioinformatics.psb.ugent.be
--------------------------------------------------------------
Algal Genetics Group
UMR 7139 CNRS-UPMC Végétaux Marins et Biomolécules (Marine Plants and Biomolecules)
Station Biologique
Place Georges Teissier, BP74
29682 Roscoff Cedex, France
Website: http://www.sb-roscoff.fr/UMR7139/en/genetics.html
==============================================================
"Facts are meaningless. You could use facts to prove anything that's even remotely true!" H. Simpson

m/[ATCG]*/i for DNA.
bonus: Add extra IUPAC chars between brackets
--
==================================================================
Sebastian Proost
PhD Student
Tel:+32 (0)9 331 36 92 fax:+32 (0)9 3313809
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
sepro@psb.ugent.be
http://www.psb.ugent.be
==================================================================
"If I knew what I was doing, it wouldn't be called research." --Albert Einstein

correction: m/^[ATCG]*$/i for DNA, since the entire string should be made up of these chars (maybe you also need round brackets around the square ones... I haven't tested it ;)
-- Sebastian Proost
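For illustration, a small sketch comparing the two patterns from this thread. The sub names are made up for the example, and the third variant (`\A`, `\z` and `+` instead of `^`, `$` and `*`) is an added assumption, not something proposed above: it additionally rejects the empty string and a trailing newline. Round brackets are not needed for a plain yes/no test, since they only capture.

```perl
use strict;
use warnings;

# Unanchored: '*' also matches zero characters, so this matches ANY string.
sub matches_unanchored {
    my ($seq) = @_;
    return $seq =~ m/[ATCG]*/i ? 1 : 0;
}

# Anchored, as in the correction: every character must be A, T, C or G.
# Note that '$' still tolerates one trailing newline and '*' the empty string.
sub matches_anchored {
    my ($seq) = @_;
    return $seq =~ m/^[ATCG]*$/i ? 1 : 0;
}

# Stricter variant (assumption): \A and \z pin the real start and end,
# and '+' requires at least one base.
sub is_strict_dna {
    my ($seq) = @_;
    return $seq =~ m/\A[ACGT]+\z/i ? 1 : 0;
}

for my $seq ('ACGTacgt', 'MKVLAT', '') {
    printf "%-10s unanchored=%d anchored=%d strict=%d\n",
        "'$seq'", matches_unanchored($seq), matches_anchored($seq), is_strict_dna($seq);
}
```

The unanchored pattern reports a match for all three test strings, including the protein-like `MKVLAT`, which is why the anchors matter.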

no points seppe, because there exist protein sequences that can match the same regex
-- Lieven Sterck

yeah okay,
but those sequences cannot be distinguished from each other, so i don't really get the problem.
--
==================================================================
Michiel Van Bel
PhD student
Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
mibel@psb.ugent.be
http://www.psb.ugent.be
==================================================================

on the blast input page I have to determine if the seq provided is DNA for blastn/blastx and otherwise print an error. apparently NCBI can do it :-p
L.
-- Lieven Sterck
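A rough sketch of what such an input check could look like, assuming the page receives the raw pasted text, possibly in FASTA format. The helper names `clean_sequence` and `dna_input_error` are hypothetical, and this is not a description of how NCBI does it. A sequence made up only of A, C, G and T will always pass, whether it was meant as DNA or as protein.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bare sequence from pasted input,
# dropping FASTA header lines (starting with '>') and all whitespace.
sub clean_sequence {
    my ($input) = @_;
    my @lines = grep { !/^>/ } split /\r?\n/, $input;
    my $seq = join '', @lines;
    $seq =~ s/\s+//g;
    return $seq;
}

# Hypothetical helper: return an error message, or undef if the
# input consists of A, C, G and T only (case-insensitive).
sub dna_input_error {
    my ($input) = @_;
    my $seq = clean_sequence($input);
    return 'no sequence given' if $seq eq '';
    return if $seq =~ m/\A[ACGT]+\z/i;
    my ($bad) = $seq =~ m/([^ACGT])/i;    # first offending character
    return "not a DNA sequence (unexpected character '$bad')";
}

my $error = dna_input_error(">query1\nACGTTGCA\nGGCC\n");
print defined $error ? "Error: $error\n" : "OK, looks like DNA\n";
```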

And if you give NCBI a sequence consisting of only ACGTs they will say "sorry, that's a protein"? I doubt that...
And they allow IUPAC code?
--
==================================================================
Elisabeth Wischnitzki
Tel:+32 (0)9 331 38 22 fax:+32 (0)9 3313809
VIB Department of Plant Systems Biology, Ghent University
Technologiepark 927, 9052 Gent, BELGIUM
elwis@psb.ugent.be
http://www.psb.ugent.be
==================================================================

As soon as you use degenerate alphabets for DNA you will get proteins that match this alphabet.
Maybe you can just add a parameter "allow_IUPAC" and check for the stringent ACGT otherwise?
-- Elisabeth Wischnitzki
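The "allow_IUPAC" idea could be sketched roughly as follows. The sub name `is_dna` and the way the option is passed are assumptions for illustration. The IUPAC class here covers the standard nucleotide ambiguity codes (R, Y, S, W, K, M, B, D, H, V, N) on top of A, C, G, T, and it deliberately leaves out U and gap characters.

```perl
use strict;
use warnings;

# Strict alphabet versus IUPAC nucleotide codes (ambiguity codes included).
my $STRICT = qr/\A[ACGT]+\z/i;
my $IUPAC  = qr/\A[ACGTRYSWKMBDHVN]+\z/i;

# Hypothetical helper: strict ACGT check unless allow_IUPAC is set.
sub is_dna {
    my ($seq, %opt) = @_;
    my $pattern = $opt{allow_IUPAC} ? $IUPAC : $STRICT;
    return $seq =~ $pattern ? 1 : 0;
}

print is_dna('ACGTNNRY'), "\n";                      # strict: N, R, Y rejected
print is_dna('ACGTNNRY', allow_IUPAC => 1), "\n";    # accepted with IUPAC codes
print is_dna('MKVHGW',   allow_IUPAC => 1), "\n";    # a peptide that also passes
```

The last call shows the point made above: with the degenerate alphabet switched on, a short peptide such as `MKVHGW` is accepted as well, because each of its letters is also a valid IUPAC nucleotide code.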
participants (4)
- Elisabeth Wischnitzki
- lieven sterck
- Michiel Van Bel
- Sebastian Proost