
As I want to sip from that beer, here's the solution: select bad_rows.* from people as bad_rows inner join ( select name, MAX(id) as min_id from your_table group by name having count(*) > 1 ) as good_rows on good_rows.name = bad_rows.name and good_rows.min_id <> bad_rows.id; Then you can delete this result set :) (execution time was 0.04sec on a set of 8000 names with 1306 duplicates) (thpar's solution had an execution time of 65.7 secs) Kenny Billiau wrote:
thpar wins!
Although with a query that has an exponential execution time. If the complete table has to be tested with half of the table that has to removed, then I'm not gonna let this query run over our 15 million similarity table ;)
So who can give me a query that runs in O(n)? (which would be any query that hasn't got an IN statement i guess) :D
(the winner can sip of thpar's freshly won beer)
-Kenny
Thomas Van Parys wrote:
Thomas Van Parys schreef:
Kenny Billiau schreef:
Hi,
just to test our MySQL vigilance, here's a simple challenge: You've got a table with rows that could contain some data twice. Show me a query that removes (or selects) the data that's been entered twice, keeping one copy!
For example, a simple table with 2 two fields: id and name. 'id' Is auto incrementing and name could be inserted twice or more. 1 thomas 2 thomas 3 thomas 4 michiel 5 sofie 6 michiel 7 kenny 8 elisabeth 9 michiel 10 sofie
First one to solve it, I'll buy a beer :)
SELECT MIN(id), name FROM nice_table GROUP BY name;
DELETE FROM nice_table WHERE id NOT IN (SELECT MIN(id) FROM nice_table GROUP BY name);
Beer me! (really) -- ================================================================== Kenny Billiau Web Developer Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM kenny.billiau@psb.ugent.be http://bioinformatics.psb.ugent.be ==================================================================