

That's good news and great work, Eric!

On Tuesday 06 June 2006 23:41, Eric Bonnet wrote:
Hi there,
I wrote a program to test the functional coherence of the modules, as described in Segal's PhD thesis. The test simply looks for any significant over-representation of a given GO category in each module, using the hypergeometric test with a Bonferroni correction. For each module, the result is the percentage of genes associated with a significant GO category (p < 0.05), i.e. the number of such genes divided by the number of genes tested for that module.
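A minimal Python sketch of the per-module score described above, for readers without access to gotest.pl. It is not a port of that script: the background population (all annotated genes), the Bonferroni factor (number of GO terms seen in the module), the absence of propagation to parent GO terms, and the names hypergeom_sf and module_coherence are all assumptions made for illustration.

```python
from collections import Counter
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes, n annotated to the term, N drawn)."""
    upper = min(n, N)
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, upper + 1))
    return hits / comb(M, N)


def module_coherence(module_genes, annotations, alpha=0.05):
    """Percentage of a module's tested genes that carry at least one GO term
    significantly over-represented in the module (Bonferroni-corrected).

    annotations: dict gene -> set of GO IDs (hypothetical input structure).
    """
    # Assumption: the background is every gene with at least one annotation.
    background = [g for g, terms in annotations.items() if terms]
    # Only annotated module genes count as "tested".
    tested = [g for g in module_genes if annotations.get(g)]
    if not tested:
        return 0.0
    M, N = len(background), len(tested)
    bg_counts = Counter(t for g in background for t in annotations[g])
    mod_counts = Counter(t for g in tested for t in annotations[g])
    # Assumption: one test per GO term present in the module.
    n_tests = len(mod_counts)
    significant = {
        term
        for term, k in mod_counts.items()
        if min(1.0, hypergeom_sf(k, M, bg_counts[term], N) * n_tests) < alpha
    }
    hits = sum(1 for g in tested if annotations[g] & significant)
    return 100.0 * hits / N
```

For example, with 20 genes where g0 to g4 share one term and the rest share another, the module g0 to g4 scores 100%, while a mixed module of two and three genes from each group scores 0%.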
biocomp modnet with HC
----------------------
Mean                  52.48
Std Dev               38.92
Num. modules > 0%     40
Num. modules > 50%    27

Segal
-----
Mean                  50.47
Std Dev               34.43
Num. modules > 0%     43
Num. modules > 50%    26
As you can see, our results are similar to Segal's, and even slightly better on average: 40 of the 50 modules have at least one "significant" gene, and 27 modules have more than 50% of their genes "significant". The average percentage per module is 52%, higher than Segal's 50%.
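The summary rows of the tables can be recomputed from a list of per-module percentages. In this sketch the function name summarize is hypothetical, and using the sample (rather than population) standard deviation is an assumption, since the message does not say which was used.

```python
from statistics import mean, stdev


def summarize(percentages):
    """Summary rows of the table from per-module percentages (0 to 100)."""
    return {
        "mean": mean(percentages),
        # Sample standard deviation (assumption; statistics.pstdev would give
        # the population version).
        "std_dev": stdev(percentages),
        "modules_gt_0": sum(1 for p in percentages if p > 0),
        "modules_gt_50": sum(1 for p in percentages if p > 50),
    }
```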
Besides that, the computation with the hierarchical clustering (HC) approach is amazingly fast: less than 5 minutes on the yeast data (2355 genes x 173 experiments, 321 regulators), compared to several hours for the classical greedy approach. That would be a big advantage when applying the technique to large datasets such as Arabidopsis or human.
I'll try to use the data contained in the paper sent by Steven to test the value of the regulation programs.
Eric
PS: for those interested, the script is at /nas/biocomp/projects/segal/java/erbon/gotest.pl and is run as:
perl gotest.pl gene_association.sgd segal_mod.tab
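The first input, gene_association.sgd, is an SGD file in the tab-separated GO Annotation File (GAF) format. Below is a hedged sketch of loading it into a gene -> GO terms mapping such as a coherence test needs; load_gaf is a hypothetical helper, annotations carrying a NOT qualifier are skipped, and which identifier column matches the module file (the symbol in column 3 by default here) is an assumption. The format of segal_mod.tab is not described in the message, so it is not parsed here.

```python
from collections import defaultdict


def load_gaf(lines, id_column=3):
    """Map gene identifier -> set of GO IDs from GAF-formatted lines.

    id_column is 1-based: 2 = DB Object ID, 3 = DB Object Symbol.
    """
    annotations = defaultdict(set)
    for line in lines:
        # Skip blank lines and "!" header/comment lines.
        if not line.strip() or line.startswith("!"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 5:
            continue
        # Column 4 holds qualifiers; "NOT" negates the annotation.
        if "NOT" in cols[3].split("|"):
            continue
        # Column 5 holds the GO ID.
        annotations[cols[id_column - 1]].add(cols[4])
    return dict(annotations)
```

Usage: `with open("gene_association.sgd") as fh: annotations = load_gaf(fh)`.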
_______________________________________________
Beg-sysbiol mailing list
Beg-sysbiol@psb.ugent.be
http://oberon.fvms.ugent.be:8080/mailman/listinfo.cgi/beg-sysbiol
-- Tom Michoel <http://www.psb.ugent.be/~tomic/>