
I guess the data only changes when a new genome is added, so this might indeed be the only viable solution. I hadn't thought about it yet. Michiel PS: if anybody else has a brilliant solution, please feel free to submit it :) Thomas Abeel wrote:
How often does the data changes?
If this is not too often, you may want to put these counts in a separate table and update them when you update the other tables. This will probably be the only fast solution.
regards, Thomas
Michiel Van Bel wrote:
Hi,
perhaps you guys can help me with this. we have 2 tables:
* gf_data: 2 columns (gene_family id`s and gene_id`s) . This table has about 500k rows * annotation: lots of columns, but we only need the ones with the gene_id`s and the species. This table has about 300k rows.
We now need a couple of things:
* For each species, we need a count of genes that are in gene_families with a size larger than 1 * For each species, we need a count of gene_families.
Now, I've tried already a lot, but run into the problem of having to use either joins or nested queries, which do not really scale well apparently :s
I thought about creating a view (by appending the gene_family into the annotation table), but it's been such a long time i've done this that I cannot remember whether this will be any help at all.
So, any suggestions?
Michiel
PS: I'll think about a price for the winner :)
------------------------------------------------------------------------
_______________________________________________ Binari mailing list Binari@psb.ugent.be https://maillist.psb.ugent.be/mailman/listinfo/binari
-- ================================================================== Michiel Van Bel PhD student Tel:+32 (0)9 331 36 95 fax:+32 (0)9 3313809 VIB Department of Plant Systems Biology, Ghent University Technologiepark 927, 9052 Gent, BELGIUM mibel@psb.ugent.be http://www.psb.ugent.be ==================================================================