[Beg-sysbiol] module testing

7 Jun 2006

Hi there,

I wrote a program to test the functional coherence of the modules, as
described in Segal's PhD. The test simply looks for any significant
overrepresentation of a given GO category in each module, measured with
the hypergeometric test with a Bonferroni correction. For each module,
the score is the number of genes associated with a significant GO
category (p < 0.05) divided by the number of genes tested for this
module, expressed as a percentage.
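For readers who want to see the scoring spelled out, here is a minimal
Python sketch of the test described above. It is not the gotest.pl
script; the function names, the choice of background (all annotated
genes), the definition of "tested" genes (module genes with at least one
GO annotation) and the Bonferroni factor (number of GO categories seen
in the module) are all assumptions made for illustration.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K carry the category."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def module_score(module_genes, annotations, alpha=0.05):
    """Percentage of tested module genes carrying a significant GO category.

    annotations: dict mapping gene -> set of GO categories (assumed background).
    """
    # Assumption: a gene is "tested" if it has at least one GO annotation.
    tested = [g for g in module_genes if annotations.get(g)]
    n = len(tested)
    if n == 0:
        return 0.0
    N = len(annotations)
    # Categories present in the module, with the module genes that carry them.
    in_module = {}
    for g in tested:
        for term in annotations[g]:
            in_module.setdefault(term, set()).add(g)
    n_tests = len(in_module)  # assumed Bonferroni factor
    significant_genes = set()
    for term, genes in in_module.items():
        K = sum(1 for terms in annotations.values() if term in terms)
        p = hypergeom_sf(len(genes), N, K, n)
        if min(1.0, p * n_tests) < alpha:
            significant_genes |= genes
    return 100.0 * len(significant_genes) / n
```

With a toy background of 20 genes, 5 of them annotated with one
category, a module holding 4 of those 5 plus one other gene scores 80%.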

biocomp modnet with HC
----------------------
Mean (%)                52.48
Std Dev (%)             38.92
Num. modules > 0%       40
Num. modules > 50%      27

Segal
-----
Mean (%)                50.47
Std Dev (%)             34.43
Num. modules > 0%       43
Num. modules > 50%      26

As you can see, our results are similar to Segal's, and slightly better
on average. 40 of the 50 modules have at least 1 "significant" gene, and
27 modules have more than 50% of their genes "significant". The average
percentage is 52% per module, slightly higher than Segal's 50%.
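The four summary rows can be derived from the per-module percentages
along these lines. This is only a sketch: the function name is
hypothetical, and using the sample standard deviation is an assumption,
since the post does not say which estimator was used.

```python
from statistics import mean, stdev

def summarize(scores):
    """Summarize per-module percentages (one value per module, 0 to 100)."""
    return {
        "mean": mean(scores),
        "std_dev": stdev(scores),  # assumption: sample standard deviation
        "n_above_0": sum(s > 0 for s in scores),
        "n_above_50": sum(s > 50 for s in scores),
    }
```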

Besides that, the computation with the hierarchical clustering approach
is amazingly fast: less than 5 minutes for the yeast data (2355 genes x
173 experiments and 321 regulators), compared to several hours for the
classical greedy approach. That would be a big advantage for applying
the technique to big datasets like Arabidopsis or human.

I'll try to use the data contained in the paper sent by Steven to test
the value of the regulation programs.

Eric

PS: for those interested, the script:
/nas/biocomp/projects/segal/java/erbon/gotest.pl

perl gotest.pl gene_association.sgd segal_mod.tab

Eric Bonnet
