
Hi, just to test our MySQL vigilance, here's a simple challenge: You've got a table with rows that could contain some data twice. Show me a query that removes (or selects) the data that's been entered twice, keeping one copy! For example, a simple table with 2 two fields: id and name. 'id' Is auto incrementing and name could be inserted twice or more. 1 thomas 2 thomas 3 thomas 4 michiel 5 sofie 6 michiel 7 kenny 8 elisabeth 9 michiel 10 sofie First one to solve it, I'll buy a beer :) have fun, Kenny -- ================================================================== Kenny Billiau Web Developer Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM kenny.billiau@psb.ugent.be http://bioinformatics.psb.ugent.be ==================================================================

iets gelijk dit: (is echt al een eeuwigheid geleden dat ik nog SQL heb gedaan). SELECT UNIQUE NAME, ID FROM TABLE; maar slaat die unique dan enkel op NAME of ook op ID? Meh, tis msschien voor iemand anders. Michiel Kenny Billiau wrote:
Hi,
just to test our MySQL vigilance, here's a simple challenge: You've got a table with rows that could contain some data twice. Show me a query that removes (or selects) the data that's been entered twice, keeping one copy!
For example, a simple table with 2 two fields: id and name. 'id' Is auto incrementing and name could be inserted twice or more. 1 thomas 2 thomas 3 thomas 4 michiel 5 sofie 6 michiel 7 kenny 8 elisabeth 9 michiel 10 sofie
First one to solve it, I'll buy a beer :)
have fun, Kenny
-- ================================================================== Kenny Billiau Web Developer Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM kenny.billiau@psb.ugent.be http://bioinformatics.psb.ugent.be ==================================================================
_______________________________________________ Binari mailing list Binari@psb.ugent.be https://maillist.psb.ugent.be/mailman/listinfo/binari
-- ================================================================== Michiel Van Bel PhD student Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM mibel@psb.ugent.be http://www.psb.ugent.be ==================================================================

Kenny Billiau schreef:
Hi,
just to test our MySQL vigilance, here's a simple challenge: You've got a table with rows that could contain some data twice. Show me a query that removes (or selects) the data that's been entered twice, keeping one copy!
For example, a simple table with 2 two fields: id and name. 'id' Is auto incrementing and name could be inserted twice or more. 1 thomas 2 thomas 3 thomas 4 michiel 5 sofie 6 michiel 7 kenny 8 elisabeth 9 michiel 10 sofie
First one to solve it, I'll buy a beer :)
SELECT MIN(id), name FROM nice_table GROUP BY name; Beer me!

yeah, this selects all the ones you could keep .. but doesn't remove the others ;) Thomas Van Parys wrote:
Kenny Billiau schreef:
Hi,
just to test our MySQL vigilance, here's a simple challenge: You've got a table with rows that could contain some data twice. Show me a query that removes (or selects) the data that's been entered twice, keeping one copy!
For example, a simple table with 2 two fields: id and name. 'id' Is auto incrementing and name could be inserted twice or more. 1 thomas 2 thomas 3 thomas 4 michiel 5 sofie 6 michiel 7 kenny 8 elisabeth 9 michiel 10 sofie
First one to solve it, I'll buy a beer :)
SELECT MIN(id), name FROM nice_table GROUP BY name;
Beer me!
_______________________________________________ Binari mailing list Binari@psb.ugent.be https://maillist.psb.ugent.be/mailman/listinfo/binari
-- ================================================================== Kenny Billiau Web Developer Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM kenny.billiau@psb.ugent.be http://bioinformatics.psb.ugent.be ==================================================================

Thomas Van Parys schreef:
Kenny Billiau schreef:
Hi,
just to test our MySQL vigilance, here's a simple challenge: You've got a table with rows that could contain some data twice. Show me a query that removes (or selects) the data that's been entered twice, keeping one copy!
For example, a simple table with 2 two fields: id and name. 'id' Is auto incrementing and name could be inserted twice or more. 1 thomas 2 thomas 3 thomas 4 michiel 5 sofie 6 michiel 7 kenny 8 elisabeth 9 michiel 10 sofie
First one to solve it, I'll buy a beer :)
SELECT MIN(id), name FROM nice_table GROUP BY name;
DELETE FROM nice_table WHERE id NOT IN (SELECT MIN(id) FROM nice_table GROUP BY name); Beer me! (really)

thpar wins! Although with a query that has an exponential execution time. If the complete table has to be tested with half of the table that has to removed, then I'm not gonna let this query run over our 15 million similarity table ;) So who can give me a query that runs in O(n)? (which would be any query that hasn't got an IN statement i guess) :D (the winner can sip of thpar's freshly won beer) -Kenny Thomas Van Parys wrote:
Thomas Van Parys schreef:
Kenny Billiau schreef:
Hi,
just to test our MySQL vigilance, here's a simple challenge: You've got a table with rows that could contain some data twice. Show me a query that removes (or selects) the data that's been entered twice, keeping one copy!
For example, a simple table with 2 two fields: id and name. 'id' Is auto incrementing and name could be inserted twice or more. 1 thomas 2 thomas 3 thomas 4 michiel 5 sofie 6 michiel 7 kenny 8 elisabeth 9 michiel 10 sofie
First one to solve it, I'll buy a beer :)
SELECT MIN(id), name FROM nice_table GROUP BY name;
DELETE FROM nice_table WHERE id NOT IN (SELECT MIN(id) FROM nice_table GROUP BY name);
Beer me! (really)
_______________________________________________ Binari mailing list Binari@psb.ugent.be https://maillist.psb.ugent.be/mailman/listinfo/binari
-- ================================================================== Kenny Billiau Web Developer Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM kenny.billiau@psb.ugent.be http://bioinformatics.psb.ugent.be ==================================================================

As I want to sip from that beer, here's the solution: select bad_rows.* from people as bad_rows inner join ( select name, MAX(id) as min_id from your_table group by name having count(*) > 1 ) as good_rows on good_rows.name = bad_rows.name and good_rows.min_id <> bad_rows.id; Then you can delete this result set :) (execution time was 0.04sec on a set of 8000 names with 1306 duplicates) (thpar's solution had an execution time of 65.7 secs) Kenny Billiau wrote:
thpar wins!
Although with a query that has an exponential execution time. If the complete table has to be tested with half of the table that has to removed, then I'm not gonna let this query run over our 15 million similarity table ;)
So who can give me a query that runs in O(n)? (which would be any query that hasn't got an IN statement i guess) :D
(the winner can sip of thpar's freshly won beer)
-Kenny
Thomas Van Parys wrote:
Thomas Van Parys schreef:
Kenny Billiau schreef:
Hi,
just to test our MySQL vigilance, here's a simple challenge: You've got a table with rows that could contain some data twice. Show me a query that removes (or selects) the data that's been entered twice, keeping one copy!
For example, a simple table with 2 two fields: id and name. 'id' Is auto incrementing and name could be inserted twice or more. 1 thomas 2 thomas 3 thomas 4 michiel 5 sofie 6 michiel 7 kenny 8 elisabeth 9 michiel 10 sofie
First one to solve it, I'll buy a beer :)
SELECT MIN(id), name FROM nice_table GROUP BY name;
DELETE FROM nice_table WHERE id NOT IN (SELECT MIN(id) FROM nice_table GROUP BY name);
Beer me! (really) -- ================================================================== Kenny Billiau Web Developer Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM kenny.billiau@psb.ugent.be http://bioinformatics.psb.ugent.be ==================================================================

And to make it interesting, I want to keep the results with the highest id ;) -Kenny Kenny Billiau wrote:
Hi,
just to test our MySQL vigilance, here's a simple challenge: You've got a table with rows that could contain some data twice. Show me a query that removes (or selects) the data that's been entered twice, keeping one copy!
For example, a simple table with 2 two fields: id and name. 'id' Is auto incrementing and name could be inserted twice or more. 1 thomas 2 thomas 3 thomas 4 michiel 5 sofie 6 michiel 7 kenny 8 elisabeth 9 michiel 10 sofie
First one to solve it, I'll buy a beer :)
have fun, Kenny
-- ================================================================== Kenny Billiau Web Developer Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM kenny.billiau@psb.ugent.be http://bioinformatics.psb.ugent.be ==================================================================
_______________________________________________ Binari mailing list Binari@psb.ugent.be https://maillist.psb.ugent.be/mailman/listinfo/binari
-- ================================================================== Kenny Billiau Web Developer Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM kenny.billiau@psb.ugent.be http://bioinformatics.psb.ugent.be ==================================================================

SELECT MAX(id), name FROM nice_table GROUP BY name; Kenny Billiau wrote:
And to make it interesting, I want to keep the results with the highest id ;)
-Kenny
Kenny Billiau wrote:
Hi,
just to test our MySQL vigilance, here's a simple challenge: You've got a table with rows that could contain some data twice. Show me a query that removes (or selects) the data that's been entered twice, keeping one copy!
For example, a simple table with 2 two fields: id and name. 'id' Is auto incrementing and name could be inserted twice or more. 1 thomas 2 thomas 3 thomas 4 michiel 5 sofie 6 michiel 7 kenny 8 elisabeth 9 michiel 10 sofie
First one to solve it, I'll buy a beer :)
have fun, Kenny
-- ================================================================== Kenny Billiau Web Developer Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM kenny.billiau@psb.ugent.be http://bioinformatics.psb.ugent.be ==================================================================
_______________________________________________ Binari mailing list Binari@psb.ugent.be https://maillist.psb.ugent.be/mailman/listinfo/binari
participants (4)
-
Kenny Billiau
-
Michiel Van Bel
-
Thomas Abeel
-
Thomas Van Parys