[Binari] some more SQL

Hey, i have to do 4 very similar queries (5 restrictions, and only the last one - coordinate- varies) . To give you some idea, ive written them down below. SELECT `ad_genes`.`gene_id` FROM `ad_genes`,`annotation` WHERE `ad_genes`.`exp_id`='2' AND `annotation`.`gene_id` = `ad_genes`.`gene_id` AND `annotation`.`species` = 'ath' AND `ad_genes`.`chr` ='1' AND `ad_genes`.`coordinate` = '9'; SELECT `ad_genes`.`gene_id` FROM `ad_genes`,`annotation` WHERE `ad_genes`.`exp_id`='2' AND `annotation`.`gene_id` = `ad_genes`.`gene_id` AND `annotation`.`species` = 'ath' AND `ad_genes`.`chr` ='1' AND `ad_genes`.`coordinate` = '109'; SELECT `ad_genes`.`gene_id` FROM `ad_genes`,`annotation` WHERE `ad_genes`.`exp_id`='2' AND `annotation`.`gene_id` = `ad_genes`.`gene_id` AND `annotation`.`species` = 'ath' AND `ad_genes`.`chr` ='1' AND `ad_genes`.`coordinate` = '309'; SELECT `ad_genes`.`gene_id` FROM `ad_genes`,`annotation` WHERE `ad_genes`.`exp_id`='2' AND `annotation`.`gene_id` = `ad_genes`.`gene_id` AND `annotation`.`species` = 'ath' AND `ad_genes`.`chr` ='1' AND `ad_genes`.`coordinate` ='409'; The thing is, when i execute them this way, each one takes about 500ms, so that makes 2 seconds in total (which is kinda crap). I could just create 1 query by concatenating the 4 coordinates and use coordinate IN (9,109,309,409). This also executes in about 500ms (which is quite an improvement). However, it is possible that there are no results (for 1 or more of these coordinates), and thus i do not always get the correct amount of data back. To make matters worse, sometimes these coordinates can be te same, and then i also have less then 4 results (i can filter those out with php however). So, is there any way to make these 4 queries (within about 500ms) and in such a way that i get always 4 results (which might be empty, but then i know at least that) ? Michiel PS: the 4 coordinates vary of course, this is just an example. -- ================================================================== Michiel Van Bel PhD student Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM mibel@psb.ugent.be http://www.psb.ugent.be ==================================================================

Michiel Van Bel schreef:
Hey,
i have to do 4 very similar queries (5 restrictions, and only the last one - coordinate- varies) . To give you some idea, ive written them down below.
SELECT `ad_genes`.`gene_id` FROM `ad_genes`,`annotation` WHERE `ad_genes`.`exp_id`='2' AND `annotation`.`gene_id` = `ad_genes`.`gene_id` AND `annotation`.`species` = 'ath' AND `ad_genes`.`chr` ='1' AND `ad_genes`.`coordinate` = '9'; SELECT `ad_genes`.`gene_id` FROM `ad_genes`,`annotation` WHERE `ad_genes`.`exp_id`='2' AND `annotation`.`gene_id` = `ad_genes`.`gene_id` AND `annotation`.`species` = 'ath' AND `ad_genes`.`chr` ='1' AND `ad_genes`.`coordinate` = '109'; SELECT `ad_genes`.`gene_id` FROM `ad_genes`,`annotation` WHERE `ad_genes`.`exp_id`='2' AND `annotation`.`gene_id` = `ad_genes`.`gene_id` AND `annotation`.`species` = 'ath' AND `ad_genes`.`chr` ='1' AND `ad_genes`.`coordinate` = '309'; SELECT `ad_genes`.`gene_id` FROM `ad_genes`,`annotation` WHERE `ad_genes`.`exp_id`='2' AND `annotation`.`gene_id` = `ad_genes`.`gene_id` AND `annotation`.`species` = 'ath' AND `ad_genes`.`chr` ='1' AND `ad_genes`.`coordinate` ='409';
The thing is, when i execute them this way, each one takes about 500ms, so that makes 2 seconds in total (which is kinda crap). I could just create 1 query by concatenating the 4 coordinates and use coordinate IN (9,109,309,409). This also executes in about 500ms (which is quite an improvement). However, it is possible that there are no results (for 1 or more of these coordinates), and thus i do not always get the correct amount of data back. To make matters worse, sometimes these coordinates can be te same, and then i also have less then 4 results (i can filter those out with php however).
So, is there any way to make these 4 queries (within about 500ms) and in such a way that i get always 4 results (which might be empty, but then i know at least that) ?
If you indeed do the concatenation thing and you include the field `ad_genes`.`coordinate` in the SELECT clause, you at least know for which coordinates you get results. You stuff this information into an array and let PHP sort the rest out. T. -- ================================================================== Thomas Van Parys Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM thomas.vanparys@psb.ugent.be http://bioinformatics.psb.ugent.be ==================================================================

Yes, I beleive thpar's solutions is the easiest one. If you make a lot of queries which take up a lot of time, you could consider spawning multiple threads which each do their query. You can fork in php :) in BOGAS I did the folowing to speed up two longlasting requests: function _launchDiffRequests($genome, $release, $locus_id, $date1, $date2) { $pid = pcntl_fork(); $loci = array(); if ($pid = -1) { $this->flash('Error' . $this->AnnotDB->error, $this->Session->read('referrer'), 360); exit(); } elseif ($pid) { if (!$locus1 = $this->AnnotDB->generalQ($genome, $release, $locus_id, array('modification_date' => $date1))) { $this->flash('Error' . $this->AnnotDB->error, $this->Session->read('referrer'), 360); exit(); } array_push($loci, $locus1); pcntl_wait(); } else { if (!$locus2 = $this->AnnotDB->generalQ($genome, $release, $locus_id, array('modification_date' => $date2))) { $this->flash('Error' . $this->AnnotDB->error, $this->Session->read('referrer'), 360); exit(); } array_push($loci, $locus2); } return $loci; } Unfortunatly, pcntl is not installed (and I don't like Dany recompiling php, as he said he had Francis' settings for php, but he hadn't). So .. you can use curl! function _lauchDiffRequest() $mh = curl_multi_init(); $handles = array(); for($i=0;$i<2;$i++) { $ch = curl_init(); curl_setopt($ch, CURLOPT_URL, "http://url/"); curl_setopt($ch, CURLOPT_HEADER, 0); curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); curl_setopt($ch, CURLOPT_TIMEOUT, 10); # get around mod_security curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.2) Gecko/20070223 Firefox/2.0.0.2'); curl_multi_add_handle($mh,$ch); $handles[] = $ch; } $threads = 0; curl_multi_exec($mh,$threads); while ($threads) { curl_multi_exec($mh,$threads); } $loci = array(); for($i=0; $i<count($handles); $i++) { array_push($loci, curl_multi_getcontent($handles[$i])); curl_multi_remove_handle($mh,$handles[$i]); } curl_multi_close($mh); return $loci; } Maybe this can speed up plaza a bit :) On Mon, 15 Sep 2008, Thomas Van Parys wrote:
Michiel Van Bel schreef:
Hey,
i have to do 4 very similar queries (5 restrictions, and only the last one - coordinate- varies) . To give you some idea, ive written them down below.
SELECT `ad_genes`.`gene_id` FROM `ad_genes`,`annotation` WHERE `ad_genes`.`exp_id`='2' AND `annotation`.`gene_id` = `ad_genes`.`gene_id` AND `annotation`.`species` = 'ath' AND `ad_genes`.`chr` ='1' AND `ad_genes`.`coordinate` = '9'; SELECT `ad_genes`.`gene_id` FROM `ad_genes`,`annotation` WHERE `ad_genes`.`exp_id`='2' AND `annotation`.`gene_id` = `ad_genes`.`gene_id` AND `annotation`.`species` = 'ath' AND `ad_genes`.`chr` ='1' AND `ad_genes`.`coordinate` = '109'; SELECT `ad_genes`.`gene_id` FROM `ad_genes`,`annotation` WHERE `ad_genes`.`exp_id`='2' AND `annotation`.`gene_id` = `ad_genes`.`gene_id` AND `annotation`.`species` = 'ath' AND `ad_genes`.`chr` ='1' AND `ad_genes`.`coordinate` = '309'; SELECT `ad_genes`.`gene_id` FROM `ad_genes`,`annotation` WHERE `ad_genes`.`exp_id`='2' AND `annotation`.`gene_id` = `ad_genes`.`gene_id` AND `annotation`.`species` = 'ath' AND `ad_genes`.`chr` ='1' AND `ad_genes`.`coordinate` ='409';
The thing is, when i execute them this way, each one takes about 500ms, so that makes 2 seconds in total (which is kinda crap). I could just create 1 query by concatenating the 4 coordinates and use coordinate IN (9,109,309,409). This also executes in about 500ms (which is quite an improvement). However, it is possible that there are no results (for 1 or more of these coordinates), and thus i do not always get the correct amount of data back. To make matters worse, sometimes these coordinates can be te same, and then i also have less then 4 results (i can filter those out with php however).
So, is there any way to make these 4 queries (within about 500ms) and in such a way that i get always 4 results (which might be empty, but then i know at least that) ?
If you indeed do the concatenation thing and you include the field `ad_genes`.`coordinate` in the SELECT clause, you at least know for which coordinates you get results. You stuff this information into an array and let PHP sort the rest out.
T.
--
participants (3)
-
Kenny Billiau
-
Michiel Van Bel
-
Thomas Van Parys