[Beg-sysbiol] [Fwd: Fw: Important Notice on DREAM2 data]

-------- Original Message --------
Subject: Fw: Important Notice on DREAM2 data
Date: Thu, 24 Jan 2008 23:38:55 -0500
From: Gustavo Stolovitzky <gustavo@us.ibm.com>
To: DREAM2 Authors of Accepted Posters <DREAM2_Authors_of_Accepted_Posters%IBMUS@us.ibm.com>, DREAM2 Authors of Accepted Submissions <DREAM2_Authors_of_Accepted_Submissions%IBMUS@us.ibm.com>, DREAM2 Registrants <DREAM2_Registrants%IBMUS@us.ibm.com>

Neil and all,

Here is a conundrum:
1) We all agree that the DREAM exercise would be most useful with full disclosure of the gold standards.
2) However, in order to obtain data for the DREAM challenges we need to ensure that the data owners feel comfortable enough sharing that information before publication.

In the case of the BCL6 target challenge, the fact that Neil Clarke could figure out which the gold standards were from the PR and ROC curves allowed him to reach richer conclusions than he could have obtained without that information (those are the conclusions that make DREAM useful). I believe that Neil should be able to use that information provided that he doesn't disclose the true positives and negatives before publication by Andrea and collaborators. Andrea, you are the owner of that data: is that fine with you?

I agree with Neil that providing the target identity to facilitate analyses, with the commitment of the participants not to disclose the actual targets, would not affect in any way the data owners' ability to publish. But the data owners will have the last word on this.

It would be nice if we continued this discussion in the DREAM discussion forum, at: http://wiki.c2b2.columbia.edu/dream/discuss/
Neil: can you add your comments to the website?

Gustavo

Gustavo A. Stolovitzky, Ph.D.
Adj. Assoc Prof of Biomed Informatics, Columbia Univ
Mngr, Func Genomics & Sys Biology, IBM Research
P.O.Box 218                   Office : (914) 945-1292
Yorktown Heights, NY 10598    Fax : (914) 945-4217
http://www.research.ibm.com/people/g/gustavo
http://domino.research.ibm.com/comm/research_projects.nsf/pages/fungen.index...

Neil Clarke <clarken@gis.a-star.edu.sg>
01/24/2008 09:11 PM
To: Gustavo Stolovitzky/Watson/IBM@IBMUS, DREAM2 Authors of Accepted Posters <DREAM2_Authors_of_Accepted_Posters%IBMUS@us.ibm.com>, DREAM2 Authors of Accepted Submissions <DREAM2_Authors_of_Accepted_Submissions%IBMUS@us.ibm.com>, DREAM2 Registrants <DREAM2_Registrants%IBMUS@us.ibm.com>
cc: <califano@c2b2.columbia.edu>, <haiyuan_yu@DFCI.HARVARD.EDU>, <piacosma@tigem.it>, <pedro.mendes@manchester.ac.uk>, <boris@borisland.com>, <boris@gnsbiotech.com>, <tgardner@bu.edu>, <jpaul@nyas.org>
Subject: Re: Important Notice on DREAM2 data

Dear all, particularly those of you who defined the gold standard BCL6 targets for Challenge #1:

In order to figure out what worked - and, more importantly, what failed - we *have* to be able to use the list of "gold standards". Fortunately, I *do* know which of the 200 genes we were given were considered true positives. I was able to extract that information from the precision-recall and ROC curves provided to us by the organizers, based on our prediction.

This knowledge of the "gold standard" set is absolutely essential to analyzing what worked and what didn't. Those of you who were in NY may remember that I used that knowledge to show that we would have been much better off if we had only used our expression data analysis. We hurt ourselves considerably trying to include gene ontologies, predicted binding sites, ARACNe, publicly available ChIP data, etc. Without knowing what the gold standard set was, I would not have been able to figure this out. I would have gotten up and said that we did all these different things, and you (and I) would probably have come to the conclusion that we did something smart by incorporating these different terms. In fact, that's the wrong conclusion, but the only way we know that it's wrong is because I was able to figure out what was considered the gold standard set. The talk would have been almost meaningless without this - and the same goes for the paper that we are writing.

I have no intention of identifying the gold standard genes in my paper. There wouldn't be any point, anyway - it doesn't matter to the analysis what the gene names are. However, I *do* need to do analyses that rely on knowing which of the genes are in the gold standard set. I honestly don't see how these analyses could possibly infringe upon the intellectual property of those providing the Challenge set, or affect in any way their ability to publish or patent. However, if anyone disagrees with this, I would welcome further discussion before we get much further in the publication process.

Thanks!
Neil

Dear DREAM participants,
You may remember that when you registered for the datasets in DREAM2 there was the notice on the DREAM website stating
Important Notice
These datasets cannot be used for external publication without explicit permission from the data owners.
The reason for this notice was that the people providing the data did so, in some cases, before their data was published.
In particular:
Challenge #1: BCL6 targets challenge: The gold standard will be posted when the owners get their papers published. After that, you will be able to use this data for your publication purposes. For questions about this challenge, please contact <califano(at)c2b2.columbia.edu>
Challenge #2: Protein-Protein interaction challenge. The gold standard is posted on our website. Please contact Haiyuan Yu <haiyuan_yu(at)dfci.harvard.edu> for permission to use the data.
Challenge #3: The five-gene-network challenge. The gold standard is posted on our website. However, the original data has not been published by the data producers yet. Please contact Pia Cosma <piacosma(at)tigem.it> for authorization to use the data in any form. If you are using it for our Proceedings in the Annals of the NY Academy of Sciences, please do not mention that the data were obtained from a synthetic network, do not mention that it is coming from yeast, do not plot the qPCR data, and do not show the actual network topology. When referring to this data, please cite it as "DREAM five-gene-network challenge, Cantone et al, unpublished data", until this work is published by Pia and collaborators.
Challenge #4: The Insilico network challenge. The gold standard is posted on our website. This data can be used without problem. It was generated by Pedro Mendes. Please refer to Pedro Mendes and the DREAM project if you use this data. For questions on this data, please contact Pedro at: <pedro.mendes(at)manchester.ac.uk>
Challenge #5: The Genome-scale network challenge. The gold standard is posted on our website. This data can be used without problem. It was generated in Tim Gardner's lab, and curated by Boris Hayete. Please refer to Tim Gardner and Boris Hayete and the DREAM project if you use this data. For questions on this data, please contact Boris at <boris(at)borisland.com>
Thanks for your understanding and help on this.
Gustavo
Gustavo A. Stolovitzky, Ph.D.
Adj. Assoc Prof of Biomed Informatics, Columbia Univ
Mngr, Func Genomics & Sys Biology, IBM Research
P.O.Box 218                   Office : (914) 945-1292
Yorktown Heights, NY 10598    Fax : (914) 945-4217
http://www.research.ibm.com/people/g/gustavo
http://domino.research.ibm.com/comm/research_projects.nsf/pages/fungen.index...
--
Neil Clarke
Deputy Director
Genome Institute of Singapore
phone: (65) 6478 8005
60 Biopolis St #02-01 Genome
Singapore 138672

Required notice:
-------------------------------------------------------------------------
This email is confidential and may be privileged. If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
-------------------------------------------------------------------------

--
Yves Van de Peer, PhD.
Professor in Bioinformatics and Genome Biology
Group Leader Bioinformatics and Evolutionary Genomics
VIB Department of Plant Systems Biology, UGent
Ghent University
Technologiepark 927
B-9052 Ghent
Belgium
Phone: +32 (0)9 331 3807
Cell Phone: +32 (0)476 560 091
Fax: +32 (0)9 331 3809
email: yves.vandepeer@psb.ugent.be
http://bioinformatics.psb.ugent.be/