[Beg-sysbiol] [Fwd: Fw: Important Notice on DREAM2 data]

25 Jan 2008

Dear DREAM participants,
You may remember that when you registered for the datasets in DREAM2
-------- Original Message --------
Subject: 	Fw: Important Notice on DREAM2 data
Date: 	Thu, 24 Jan 2008 23:38:55 -0500
From: 	Gustavo Stolovitzky <gustavo@us.ibm.com>
To: 	DREAM2 Authors of Accepted Posters 
<DREAM2_Authors_of_Accepted_Posters%IBMUS@us.ibm.com>, DREAM2 Authors of 
Accepted Submissions 
<DREAM2_Authors_of_Accepted_Submissions%IBMUS@us.ibm.com>, DREAM2 
Registrants <DREAM2_Registrants%IBMUS@us.ibm.com>

Neil and all,

Here is a conundrum:

1) We all agree that the DREAM exercise would be most useful with full 
disclosure of the gold standards.

2) However, in order to obtain data for the DREAM challenges we need to 
ensure that the data owners feel comfortable enough
sharing that information before publication.

In the case of the BCL6 target challenge, the fact that Neil Clarke 
could figure out which genes were the gold standards from the PR and ROC 
curves allowed him to reach richer conclusions than he could have 
obtained without that information (those are the conclusions that make 
DREAM useful). I believe that Neil should be able to use that information, 
provided that he doesn't disclose the true positives and negatives before 
publication by Andrea and collaborators.  Andrea, you are the owner of 
that data: is that fine with you?

I agree with Neil that providing the target identities to facilitate 
analyses, with a commitment from the participants not to disclose the 
actual targets, would not affect in any way the data owners' ability to 
publish. But the data owners will have the last word on this.

It would be nice if we continued this discussion in the DREAM discussion 
forum, at:

http://wiki.c2b2.columbia.edu/dream/discuss/

Neil: can you add your comments to the website?

Gustavo

Gustavo A. Stolovitzky, Ph.D.
Adj. Assoc Prof of Biomed Informatics, Columbia Univ
Mngr, Func Genomics & Sys Biology, IBM  Research
P.O.Box 218                                  Office :  (914) 945-1292
Yorktown Heights, NY 10598  Fax     :  (914) 945-4217
http://www.research.ibm.com/people/g/gustavo
http://domino.research.ibm.com/comm/research_projects.nsf/pages/fungen.index...

*Neil Clarke <clarken@gis.a-star.edu.sg>*

01/24/2008 09:11 PM

To
	Gustavo Stolovitzky/Watson/IBM@IBMUS, DREAM2 Authors of Accepted 
Posters <DREAM2_Authors_of_Accepted_Posters%IBMUS@us.ibm.com>, DREAM2 
Authors of Accepted Submissions 
<DREAM2_Authors_of_Accepted_Submissions%IBMUS@us.ibm.com>, DREAM2 
Registrants <DREAM2_Registrants%IBMUS@us.ibm.com>
cc
	<califano@c2b2.columbia.edu>, <haiyuan_yu@DFCI.HARVARD.EDU>, 
<piacosma@tigem.it>, <pedro.mendes@manchester.ac.uk>, 
<boris@borisland.com>, <boris@gnsbiotech.com>, <tgardner@bu.edu>, 
<jpaul@nyas.org>
Subject
	Re: Important Notice on DREAM2 data

Dear all, particularly those of you who defined the gold standard BCL6
targets for Challenge #1:

In order to figure out what worked - and, more importantly, what failed - we
*have* to be able to use the list of "gold standards".  Fortunately, I *do*
know which of the 200 genes we were given were considered true positives. I
was able to extract that information from the precision-recall and ROC
curves provided to us by the organizers, based on our prediction.
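
For readers wondering how labels can leak from such curves, here is a
minimal sketch. It is not the procedure used for DREAM2, which this message
does not describe. It assumes the ROC curve has one point per rank cutoff,
that the ranking has no ties, and that both classes are present. The gene
names and both function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_points(ranked_labels: Sequence[bool]) -> List[Point]:
    """Build an ROC curve with one point per rank cutoff (organizer's side)."""
    n_pos = sum(ranked_labels)
    n_neg = len(ranked_labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    tp = fp = 0
    points = [(0.0, 0.0)]
    for is_positive in ranked_labels:
        if is_positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def labels_from_roc(ranked_genes: Sequence[str],
                    points: Sequence[Point]) -> Dict[str, bool]:
    """Recover hidden labels (participant's side).

    Moving the cutoff past one more gene raises the true positive rate
    only if that gene is a true positive; otherwise the false positive
    rate rises instead.
    """
    if len(points) != len(ranked_genes) + 1:
        raise ValueError("expected one ROC point per rank cutoff")
    recovered = {}
    for k, gene in enumerate(ranked_genes, start=1):
        recovered[gene] = points[k][1] > points[k - 1][1]
    return recovered


# Hypothetical five-gene submission, ranked from most to least confident.
genes = ["geneA", "geneB", "geneC", "geneD", "geneE"]
hidden = [True, False, True, False, False]   # known only to the organizers
curve = roc_points(hidden)                   # what the participant receives
recovered = labels_from_roc(genes, curve)
```

The same reasoning applies to a precision-recall curve, since recall is the
true positive rate and rises only at ranks holding a true positive.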

This knowledge of the "gold standard" set is absolutely essential to
analyzing what worked and what didn't. Those of you who were in NY may
remember that I used that knowledge to show that we would have been much
better off if we had only used our expression data analysis. We hurt
ourselves considerably trying to include gene ontologies, predicted binding
sites, ARACNe, publicly available ChIP data, etc.  Without knowing what
the gold standard set was, I would not have been able to figure this out. I
would have gotten up and said that we did all these different things, and you
(and I) would probably have come to the conclusion that we did something
smart by incorporating these different terms. In fact, that's the wrong
conclusion, but the only reason we know that it's wrong is that I was
able to figure out what was considered the gold standard set.

The talk would have been almost meaningless without this - and the same goes
for the paper that we are writing.

I have no intention of identifying the gold standard genes in my paper.
There wouldn't be any point, anyway - it doesn't matter to the analysis what
the gene names are. However, I *do* need to do analyses that rely on knowing
which of the genes are in the gold standard set.

I honestly don't see how these analyses could possibly infringe upon the
intellectual property of those providing the Challenge set, or affect in any
way their ability to publish or patent. However, if anyone disagrees with
this, I would welcome further discussion before we get much further in the
publication process.

Thanks!

Neil

there was the notice in the DREAM website stating
Important Notice
These datasets cannot be used for external publication without explicit
permission from the data owners.
The reason for this notice was that the people who provided the data did
so, in some cases, before their data was published.
In particular:
Challenge #1: BCL6 targets challenge:
The gold standard will be posted when the owners get their papers
published, after which you will be able to use this data in your own
publications.
For questions about this challenge, please contact
<califano(at)c2b2.columbia.edu>
Challenge #2: Protein-Protein interaction challenge.
The gold standard is posted in our website. Please contact Haiyuan Yu
<haiyuan_yu(at)dfci.harvard.edu> for permission to use the data.
Challenge #3: The five-gene-network challenge.
The gold standard is posted in our website. However, the original data has
not been published by the data producers yet. Please contact Pia Cosma
<piacosma(at)tigem.it> for authorization to use the data in any form. If
you are using it for our Proceedings in the Annals of the NY Academy of
Sciences, please do not mention that the data were obtained from a
synthetic network, do not mention that it is coming from yeast, do not plot
the qPCR data, and do not show the actual network topology. When referring
to this data, please refer to the "DREAM five-gene-network challenge,
Cantone et al, unpublished data" as a reference, until this work is
published by Pia and collaborators.
Challenge #4: The Insilico network challenge.
The gold standard is posted in our website. This data can be used without
problem. It was generated by Pedro Mendes. Please refer to
Pedro Mendes and the DREAM project if you use this data. For questions on
this data, please contact Pedro at: <pedro.mendes(at)manchester.ac.uk>
Challenge #5: The Genome-scale network challenge.
The gold standard is posted in our website. This data can be used without
problem. It was generated in Tim Gardner's lab, and curated by Boris Hayete.
Please refer to Tim Gardner and Boris Hayete and the DREAM project if you
use this data. For questions on this data, please contact Boris at
<boris(at)borisland.com>
Thanks for your understanding and help on this.
Gustavo
-- 
Neil Clarke
Deputy Director
Genome Institute of Singapore
phone: (65) 6478 8005
60 Biopolis St
#02-01 Genome
Singapore 138672


-- 
Yves Van de Peer, PhD.

Professor in Bioinformatics and Genome Biology
Group Leader Bioinformatics and Evolutionary Genomics
VIB Department of Plant Systems Biology, UGent
Ghent University
Technologiepark 927
B-9052 Ghent
Belgium

Phone: +32 (0)9 331 3807
Cell Phone: +32 (0)476 560 091
Fax: +32 (0)9 331 3809
email: yves.vandepeer@psb.ugent.be

http://bioinformatics.psb.ugent.be/
