[BBC] mini-workshop on gene network inference, ULg, January 9, 10am-12pm

Pierre Geurts p.geurts at ulg.ac.be
Tue Jan 3 21:14:19 CET 2012


Dear Colleagues,

The Systems and modeling research unit and the GIGA thematic research unit in Systems Biology and Chemical biology are pleased to invite you to the following two seminars on gene network inference:

Speaker: Robert Küffner, Ludwig-Maximilians-Universität München, Munich, Germany
Title: Solutions and Practical Problems in Gene Regulatory Network Inference
Date and place: Monday, January 9th, 10:00am, Room R7, Institut Montefiore, Sart Tilman, B28, Liège

Speaker: Christophe Ambroise, Laboratoire Statistique et Génome, Université d'Évry Val d'Essonne
Title: Inferring Sparse Gaussian Graphical Models for Biological Network
Date and place: Monday, January 9th, 11:00am, Room R7, Institut Montefiore, Sart Tilman, B28, Liège

The seminars will be followed in the afternoon by Vân Anh Huynh-Thu's PhD defense:

Title: Machine learning-based feature ranking: Statistical interpretation and gene network inference
Date and place: Monday, January 9th, 2:30pm, Room R3, Institut Montefiore, Sart Tilman, B28, Liège

Abstracts and address below.

With my best wishes for 2012,

Pierre Geurts

--

PLACE:

Institut Montefiore, Sart Tilman, B28, Liège
http://www.montefiore.ulg.ac.be/location.php


ABSTRACTS:

Speaker: Robert Küffner, Ludwig-Maximilians-Universität München, Munich, Germany
Title: Solutions and Practical Problems in Gene Regulatory Network Inference
Abstract:
The inference of gene regulatory networks from mRNA expression data is characterized by the development of many different approaches, each with its own performance, data requirements, and inherent biases. The recent community-wide DREAM challenge conducted a large assessment of inference approaches. The accuracy of predictions was evaluated against experimentally supported interactions in the prokaryote model organism E. coli, the eukaryote model organism S. cerevisiae, and in silico target systems. Further analysis revealed not only which inference strategies are particularly successful but also which kinds of specific information were utilized from the different types of experimental measurements. The challenge evaluation also revealed several weaknesses and limitations of current methods for the generation of artificial datasets as well as of the state-of-the-art inference methods. For instance, none of the examined approaches was able to achieve robust predictions from S. cerevisiae data.

-----------

Speaker: Christophe Ambroise, Laboratoire Statistique et Génome, Université d'Évry Val d'Essonne
Title: Inferring Sparse Gaussian Graphical Models for Biological Network
Abstract:
Gaussian Graphical Models provide a convenient framework for representing dependencies between variables. In this framework, a set of variables is represented by an undirected graph, where vertices correspond to variables, and an edge connects two vertices if the corresponding pair of variables is dependent, conditional on the remaining ones. Recently, this tool has attracted considerable interest for the discovery of biological networks by l1-penalization of the model likelihood.

In this presentation, we introduce various ways of inferring sparse co-expression networks based on partial correlation coefficients from either steady-state or time-course transcriptomic data. All proposals search for a latent structure of the network to drive the selection of edges through an adaptive l1-penalization of the model likelihood. We focus on inference from samples collected in different experimental conditions and therefore not identically distributed.

-----------

Speaker: Vân Anh Huynh-Thu, University of Liège
Title: Machine learning-based feature ranking: Statistical interpretation and gene network inference
Abstract:
Machine learning techniques, and in particular supervised learning methods, are nowadays widely used in bioinformatics. Two prominent applications that we target specifically in this thesis are biomarker discovery and regulatory network inference. These two problems are commonly addressed through the use of feature ranking methods that order the input features of a supervised learning problem from the most to the least relevant for predicting the output. This thesis presents, on the one hand, methodological contributions around machine learning-based feature ranking techniques and, on the other hand, more applied contributions on gene regulatory network inference.

Our methodological contributions focus on the problem of selecting truly relevant features from machine learning-based feature rankings. Unlike the p-values returned by univariate tests, relevance scores derived from machine learning techniques to rank the features are usually not statistically interpretable. This lack of interpretability makes the identification of the truly relevant features among the top-ranked ones a very difficult task and hence prevents the wide adoption of these methods by practitioners. Our first contribution in this field concerns a procedure, based on permutation tests, that estimates for each subset of top-ranked features the probability that the subset contains at least one irrelevant feature (called CER for "conditional error rate"). As a second contribution, we performed a large-scale evaluation of several procedures, both existing and novel, including our CER method, that all replace the original relevance scores with measures that can be interpreted in a statistical way. These procedures, which were assessed on several artificial and real datasets, differ greatly in computing time and in the tradeoff they achieve between false positives and false negatives. Our experiments also clearly highlight that using model performance as a criterion for feature selection is often counter-productive.

The problem of gene regulatory network inference can be formulated as several feature selection problems, each one aiming at discovering the regulators of one target gene. Within this family of methods, we developed the GENIE3 algorithm, which exploits feature rankings derived from tree-based ensemble methods to infer gene networks from steady-state gene expression data. In a second step, we derived two extensions of GENIE3 that aim to infer regulatory networks from other types of data. The first extension exploits expression data provided by time course experiments, while the second extension is related to genetical genomics datasets, which contain expression data together with information about genetic markers. GENIE3 was the best performer in the DREAM4 In Silico Multifactorial challenge in 2009 and in the DREAM5 Network Inference challenge in 2010, and its extensions perform very well compared to other methods on several artificial datasets.

