Network


Latest external collaboration on country level. Dive into details by clicking on the dots.

Hotspot


Dive into the research topics where Marco Scutari is active.

Publication


Featured researches published by Marco Scutari.


Artificial Intelligence in Medicine | 2013

Identifying significant edges in graphical models of molecular networks

Marco Scutari; Radhakrishnan Nagarajan

Objective Modelling the associations from high-throughput experimental molecular data has provided unprecedented insights into biological pathways and signalling mechanisms. Graphical models and networks have especially proven to be useful abstractions in this regard. Ad hoc thresholds are often used in conjunction with structure learning algorithms to determine significant associations. The present study overcomes this limitation by proposing a statistically motivated approach for identifying significant associations in a network. Methods and materials A new method that identifies significant associations in graphical models by estimating the threshold minimising the L1 norm between the cumulative distribution function (CDF) of the observed edge confidences and those of its asymptotic counterpart is proposed. The effectiveness of the proposed method is demonstrated on popular synthetic data sets as well as publicly available experimental molecular data corresponding to gene and protein expression profiles. Results The improved performance of the proposed approach is demonstrated across the synthetic data sets using sensitivity, specificity and accuracy as performance metrics. The results are also demonstrated across varying sample sizes and three different structure learning algorithms with widely varying assumptions. In all cases, the proposed approach has specificity and accuracy close to 1, while sensitivity increases linearly in the logarithm of the sample size. The estimated threshold systematically outperforms common ad hoc ones in terms of sensitivity while maintaining comparable levels of specificity and accuracy. Networks from experimental data sets are reconstructed accurately with respect to the results from the original papers. Conclusion Current studies use structure learning algorithms in conjunction with ad hoc thresholds for identifying significant associations in graphical abstractions of biological pathways and signalling mechanisms. Such an ad hoc choice can have pronounced effect on attributing biological significance to the associations in the resulting network and possible downstream analysis. The statistically motivated approach presented in this study has been shown to outperform ad hoc thresholds and is expected to alleviate spurious conclusions of significant associations in such graphical abstractions.


Genetics | 2014

Multiple quantitative trait analysis using bayesian networks.

Marco Scutari; Phil Howell; David J. Balding; Ian Mackay

Models for genome-wide prediction and association studies usually target a single phenotypic trait. However, in animal and plant genetics it is common to record information on multiple phenotypes for each individual that will be genotyped. Modeling traits individually disregards the fact that they are most likely associated due to pleiotropy and shared biological basis, thus providing only a partial, confounded view of genetic effects and phenotypic interactions. In this article we use data from a Multiparent Advanced Generation Inter-Cross (MAGIC) winter wheat population to explore Bayesian networks as a convenient and interpretable framework for the simultaneous modeling of multiple quantitative traits. We show that they are equivalent to multivariate genetic best linear unbiased prediction (GBLUP) and that they are competitive with single-trait elastic net and single-trait GBLUP in predictive performance. Finally, we discuss their relationship with other additive-effects models and their advantages in inference and interpretation. MAGIC populations provide an ideal setting for this kind of investigation because the very low population structure and large sample size result in predictive models with good power and limited confounding due to relatedness.


Statistical Applications in Genetics and Molecular Biology | 2013

Improving the efficiency of genomic selection

Marco Scutari; Ian Mackay; David J. Balding

Abstract We investigate two approaches to increase the efficiency of phenotypic prediction from genome-wide markers, which is a key step for genomic selection (GS) in plant and animal breeding. The first approach is feature selection based on Markov blankets, which provide a theoretically-sound framework for identifying non-informative markers. Fitting GS models using only the informative markers results in simpler models, which may allow cost savings from reduced genotyping. We show that this is accompanied by no loss, and possibly a small gain, in predictive power for four GS models: partial least squares (PLS), ridge regression, LASSO and elastic net. The second approach is the choice of kinship coefficients for genomic best linear unbiased prediction (GBLUP). We compare kinships based on different combinations of centring and scaling of marker genotypes, and a newly proposed kinship measure that adjusts for linkage disequilibrium (LD). We illustrate the use of both approaches and examine their performances using three real-world data sets with continuous phenotypic traits from plant and animal genetics. We find that elastic net with feature selection and GBLUP using LD-adjusted kinships performed similarly well, and were the best-performing methods in our study.


Frontiers in Physiology | 2010

Functional relationships between genes associated with differentiation potential of aged myogenic progenitors

Radhakrishnan Nagarajan; Sujay Datta; Marco Scutari; Marjorie L. Beggs; Greg T. Nolen; Charlotte A. Peterson

Aging is accompanied by considerable heterogeneity with possible co-expression of differentiation pathways. The present study investigates the interplay between crucial myogenic, adipogenic, and Wnt-related genes orchestrating aged myogenic progenitor differentiation (AMPD) using clonal gene expression profiling in conjunction with Bayesian structure learning (BSL) techniques. The expression of three myogenic regulatory factor genes (Myogenin, Myf-5, MyoD1), four genes involved in regulating adipogenic potential (C/EBPα, DDIT3, FoxC2, PPARγ), and two genes in the Wnt signaling pathway (Lrp5, Wnt5a) known to influence both differentiation programs were determined across 34 clones by quantitative reverse transcriptase polymerase chain reaction (qRT-PCR). Three control genes were used for normalization of the clonal expression data (18S, GAPDH, and B2M). Constraint-based BSL techniques, namely (a) PC Algorithm, (b) Grow-shrink (GS) algorithm, and (c) Incremental Association Markov Blanket (IAMB) were used to model the functional relationships (FRs) in the form of acyclic networks from the clonal expression profiles. A novel resampling approach that obviates the need for a user-defined confidence threshold is proposed to identify statistically significant FRs at small sample sizes. Interestingly, the resulting acyclic network consisted of FRs corresponding to myogenic, adipogenic, Wnt-related genes and their interaction. A significant number of these FRs were robust to normalization across the three house-keeping genes and the choice of the BSL technique. The results presented elucidate the delicate balance between differentiation pathways (i.e., myogenic as well as adipogenic) and possible cross-talk between pathways in AMPD.


PLOS Genetics | 2016

Using Genetic Distance to Infer the Accuracy of Genomic Prediction

Marco Scutari; Ian Mackay; David J. Balding

The prediction of phenotypic traits using high-density genomic data has many applications such as the selection of plants and animals of commercial interest; and it is expected to play an increasing role in medical diagnostics. Statistical models used for this task are usually tested using cross-validation, which implicitly assumes that new individuals (whose phenotypes we would like to predict) originate from the same population the genomic prediction model is trained on. In this paper we propose an approach based on clustering and resampling to investigate the effect of increasing genetic distance between training and target populations when predicting quantitative traits. This is important for plant and animal genetics, where genomic selection programs rely on the precision of predictions in future rounds of breeding. Therefore, estimating how quickly predictive accuracy decays is important in deciding which training population to use and how often the model has to be recalibrated. We find that the correlation between true and predicted values decays approximately linearly with respect to either FST or mean kinship between the training and the target populations. We illustrate this relationship using simulations and a collection of data sets from mice, wheat and human genetics.


Communications in Statistics-theory and Methods | 2012

Bayesian Network Structure Learning with Permutation Tests

Marco Scutari; A. Brogini

In literature there are several studies on the performance of Bayesian network structure learning algorithms. The focus of these studies is almost always the heuristics the learning algorithms are based on, i.e., the maximization algorithms (in score-based algorithms) or the techniques for learning the dependencies of each variable (in constraint-based algorithms). In this article, we investigate how the use of permutation tests instead of parametric ones affects the performance of Bayesian network structure learning from discrete data. Shrinkage tests are also covered to provide a broad overview of the techniques developed in current literature.


Bayesian Analysis | 2013

On the Prior and Posterior Distributions Used in Graphical Modelling

Marco Scutari

Graphical model learning and inference are often performed using Bayesian techniques. In particular, learning is usually performed in two separate steps. First, the graph structure is learned from the data; then the parameters of the model are estimated conditional on that graph structure. While the probability distributions involved in this second step have been studied in depth, the ones used in the first step have not been explored in as much detail. In this paper, we will study the prior and posterior distributions defined over the space of the graph structures for the purpose of learning the structure of a graphical model. In particular, we will provide a characterisation of the behaviour of those distributions as a function of the possible edges of the graph. We will then use the properties resulting from this characterisation to define measures of structural variability for both Bayesian and Markov networks, and we will point out some of their possible applications.


BMC Bioinformatics | 2009

NATbox: a network analysis toolbox in R

Shweta S. Chavan; Michael Bauer; Marco Scutari; Radhakrishnan Nagarajan

BackgroundThere has been recent interest in capturing the functional relationships (FRs) from high-throughput assays using suitable computational techniques. FRs elucidate the working of genes in concert as a system as opposed to independent entities hence may provide preliminary insights into biological pathways and signalling mechanisms. Bayesian structure learning (BSL) techniques and its extensions have been used successfully for modelling FRs from expression profiles. Such techniques are especially useful in discovering undocumented FRs, investigating non-canonical signalling mechanisms and cross-talk between pathways. The objective of the present study is to develop a graphical user interface (GUI), NATbox: Network Analysis Toolbox in the language R that houses a battery of BSL algorithms in conjunction with suitable statistical tools for modelling FRs in the form of acyclic networks from gene expression profiles and their subsequent analysis.ResultsNATbox is a menu-driven open-source GUI implemented in the R statistical language for modelling and analysis of FRs from gene expression profiles. It provides options to (i) impute missing observations in the given data (ii) model FRs and network structure from gene expression profiles using a battery of BSL algorithms and identify robust dependencies using a bootstrap procedure, (iii) present the FRs in the form of acyclic graphs for visualization and investigate its topological properties using network analysis metrics, (iv) retrieve FRs of interest from published literature. Subsequently, use these FRs as structural priors in BSL (v) enhance scalability of BSL across high-dimensional data by parallelizing the bootstrap routines.ConclusionNATbox provides a menu-driven GUI for modelling and analysis of FRs from gene expression profiles. By incorporating readily available functions from existing R-packages, it minimizes redundancy and improves reproducibility, transparency and sustainability, characteristic of open-source environments. NATbox is especially suited for interdisciplinary researchers and biologists with minimal programming experience and would like to use systems biology approaches without delving into the algorithmic aspects. The GUI provides appropriate parameter recommendations for the various menu options including default parameter choices for the user. NATbox can also prove to be a useful demonstration and teaching tool in graduate and undergraduate course in systems biology. It has been tested successfully under Windows and Linux operating systems. The source code along with installation instructions and accompanying tutorial can be found at http://bioinformatics.ualr.edu/natboxWiki/index.php/Main_Page.


PLOS ONE | 2013

Impact of Noise on Molecular Network Inference

Radhakrishnan Nagarajan; Marco Scutari

Molecular entities work in concert as a system and mediate phenotypic outcomes and disease states. There has been recent interest in modelling the associations between molecular entities from their observed expression profiles as networks using a battery of algorithms. These networks have proven to be useful abstractions of the underlying pathways and signalling mechanisms. Noise is ubiquitous in molecular data and can have a pronounced effect on the inferred network. Noise can be an outcome of several factors including: inherent stochastic mechanisms at the molecular level, variation in the abundance of molecules, heterogeneity, sensitivity of the biological assay or measurement artefacts prevalent especially in high-throughput settings. The present study investigates the impact of discrepancies in noise variance on pair-wise dependencies, conditional dependencies and constraint-based Bayesian network structure learning algorithms that incorporate conditional independence tests as a part of the learning process. Popular network motifs and fundamental connections, namely: (a) common-effect, (b) three-chain, and (c) coherent type-I feed-forward loop (FFL) are investigated. The choice of these elementary networks can be attributed to their prevalence across more complex networks. Analytical expressions elucidating the impact of discrepancies in noise variance on pairwise dependencies and conditional dependencies for special cases of these motifs are presented. Subsequently, the impact of noise on two popular constraint-based Bayesian network structure learning algorithms such as Grow-Shrink (GS) and Incremental Association Markov Blanket (IAMB) that implicitly incorporate tests for conditional independence is investigated. Finally, the impact of noise on networks inferred from publicly available single cell molecular expression profiles is investigated. While discrepancies in noise variance are overlooked in routine molecular network inference, the results presented clearly elucidate their non-trivial impact on the conclusions that in turn can challenge the biological significance of the findings. The analytical treatment and arguments presented are generic and not restricted to molecular data sets.


Behaviormetrika | 2018

Dirichlet Bayesian network scores and the maximum relative entropy principle

Marco Scutari

A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian–Dirichlet (BD) scores; the most famous is the Bayesian–Dirichlet equivalent uniform (BDeu) score from Heckerman et al. (Mach Learn 20(3):197–243, 1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network, which makes structure learning computationally efficient; it does not require the elicitation of prior knowledge from experts; and it satisfies score equivalence. In this paper we will review the derivation and the properties of BD scores, and of BDeu in particular, and we will link them to the corresponding entropy estimates to study them from an information theoretic perspective. To this end, we will work in the context of the foundational work of Giffin and Caticha (Proceedings of the 27th international workshop on Bayesian inference and maximum entropy methods in science and engineering, pp 74–84, 2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We will use this connection to show that BDeu should not be used for structure learning from sparse data, since it violates the maximum relative entropy principle; and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are sensitive to the value of its only hyperparameter. Using a large simulation study, we found in our previous work [Scutari in J Mach Learn Res (Proc Track PGM 2016) 52:438–448, 2016] that the Bayesian–Dirichlet sparse (BDs) score seems to provide better accuracy in structure learning; in this paper we further show that BDs does not suffer from the issues above, and we recommend to use it for sparse data instead of BDeu. Finally, will show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior.

Collaboration


Dive into the Marco Scutari's collaboration.

Top Co-Authors

Avatar
Top Co-Authors

Avatar

Ian Mackay

National Institute of Agricultural Botany

View shared research outputs
Top Co-Authors

Avatar
Top Co-Authors

Avatar

Phil Howell

National Institute of Agricultural Botany

View shared research outputs
Top Co-Authors

Avatar

Sophie Lèbre

University of Strasbourg

View shared research outputs
Top Co-Authors

Avatar

Allan Tucker

Brunel University London

View shared research outputs
Top Co-Authors

Avatar

Claudia Vitolo

European Centre for Medium-Range Weather Forecasts

View shared research outputs
Top Co-Authors

Avatar

Antoine Boivin

Université de Montréal

View shared research outputs
Top Co-Authors

Avatar

Chao-Jung Wu

Université du Québec à Montréal

View shared research outputs
Top Co-Authors

Avatar
Researchain Logo
Decentralizing Knowledge