Publication


Featured research published by Sergio Rodrigues de Morais.


Neurocomputing | 2010

A novel Markov boundary based feature subset selection algorithm

Sergio Rodrigues de Morais; Alexandre Aussem

We aim to identify the minimal subset of random variables that is relevant for probabilistic classification in data sets with many variables but few instances. A principled solution to this problem is to determine the Markov boundary of the class variable. In this paper, we propose a novel constraint-based Markov boundary discovery algorithm called MBOR with the objective of improving accuracy while remaining scalable to very high dimensional data sets and theoretically correct under the so-called faithfulness condition. We report extensive empirical experiments on synthetic data sets scaling up to tens of thousands of variables.
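The paper specifies MBOR itself, but the general shape of constraint-based Markov boundary discovery can be illustrated with a simpler grow-shrink scheme in the spirit of GS/IAMB. The sketch below is our own illustration, not MBOR: the Fisher-z partial-correlation test is one common conditional independence test for continuous data, and all function names are ours.

# Minimal grow-shrink Markov blanket sketch (GS/IAMB-style) -- NOT the
# MBOR algorithm from the paper, just the constraint-based idea:
# grow a candidate set, then shrink away false positives.
import numpy as np
from scipy import stats

def ci_test(data, x, y, z, alpha=0.05):
    """Fisher-z test of X independent of Y given Z via partial correlation.
    Returns True when independence cannot be rejected."""
    idx = [x, y] + list(z)
    prec = np.linalg.pinv(np.corrcoef(data[:, idx], rowvar=False))
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])   # partial correlation
    zstat = np.sqrt(len(data) - len(z) - 3) * np.arctanh(np.clip(r, -0.999, 0.999))
    p = 2 * (1 - stats.norm.cdf(abs(zstat)))
    return p > alpha

def markov_blanket(data, target, alpha=0.05):
    rest = [v for v in range(data.shape[1]) if v != target]
    mb, changed = [], True
    while changed:                      # grow: add variables still dependent
        changed = False
        for x in rest:
            if x not in mb and not ci_test(data, target, x, mb, alpha):
                mb.append(x)
                changed = True
    for x in list(mb):                  # shrink: drop false positives
        others = [v for v in mb if v != x]
        if ci_test(data, target, x, others, alpha):
            mb.remove(x)
    return mb

Constraint-based methods differ mainly in how they order the tests and bound the conditioning sets; smaller conditioning sets keep the tests reliable when instances are few, which is the regime this abstract targets.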


European Conference on Machine Learning | 2008

A novel scalable and data efficient feature subset selection algorithm

Sergio Rodrigues de Morais; Alexandre Aussem

In this paper, we aim to identify the minimal subset of discrete random variables that is relevant for probabilistic classification in data sets with many variables but few instances. A principled solution to this problem is to determine the Markov boundary of the class variable. We also present a novel scalable, data-efficient and correct Markov boundary learning algorithm under the so-called faithfulness condition. We report extensive empirical experiments on synthetic and real data sets scaling up to 139,351 variables.


Artificial Intelligence in Medicine | 2012

Analysis of nasopharyngeal carcinoma risk factors with Bayesian networks

Alexandre Aussem; Sergio Rodrigues de Morais; Marilys Corbex

OBJECTIVES: We propose a new graphical framework for extracting the relevant dietary, social and environmental risk factors that are associated with an increased risk of nasopharyngeal carcinoma (NPC) in a case-control epidemiologic study consisting of 1289 subjects and 150 risk factors.

METHODS: This framework builds on the use of Bayesian networks (BNs) for representing statistical dependencies between the random variables. We discuss a novel constraint-based procedure, called Hybrid Parents and Children (HPC), that recursively builds a local graph including all the features statistically associated with NPC, without having to learn the whole BN first. The local graph is afterwards directed by the domain expert according to their knowledge. It provides a statistical profile of the recruited population and helps identify the risk factors associated with NPC.

RESULTS: Extensive experiments on synthetic data sampled from known BNs show that HPC outperforms state-of-the-art algorithms from the recent literature. From a biological perspective, the present study confirms that exposure to chemical products, pesticides and domestic fumes from the incomplete combustion of coal and wood is significantly associated with NPC risk. These results suggest that industrial workers are often exposed to noxious chemicals and poisonous substances used in the course of manufacturing. This study also supports previous findings that the consumption of a number of preserved food items, such as home-made proteins and sheep fat, is a major risk factor for NPC.

CONCLUSION: BNs are valuable data mining tools for the analysis of epidemiologic data. They can explicitly combine both expert knowledge from the field and information inferred from the data. These techniques therefore merit consideration as valuable alternatives to traditional multivariate regression techniques in epidemiologic studies.
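The elementary operation behind a constraint-based procedure on case-control data of this kind is a conditional independence test between categorical variables. Below is a minimal sketch of one such test, a chi-squared statistic summed across the strata of a conditioning variable; this is our own illustration with hypothetical column names, not the paper's code.

# Hypothetical sketch: test whether a categorical risk factor is
# independent of case/control status given a stratifying variable,
# by summing chi-squared statistics across strata.
import pandas as pd
from scipy.stats import chi2, chi2_contingency

def stratified_chi2(df, factor, outcome, stratum, alpha=0.05):
    stat, dof = 0.0, 0
    for _, sub in df.groupby(stratum):
        table = pd.crosstab(sub[factor], sub[outcome])
        if table.shape[0] < 2 or table.shape[1] < 2:
            continue  # this stratum carries no information
        s, _, d, _ = chi2_contingency(table)
        stat, dof = stat + s, dof + d
    p = 1 - chi2.cdf(stat, dof) if dof else 1.0
    return p > alpha  # True: independence not rejected

# Usage on a toy frame with hypothetical column names:
# df = pd.DataFrame({"smoking": ..., "npc": ..., "sex": ...})
# stratified_chi2(df, "smoking", "npc", "sex")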


Neurocomputing | 2010

A conservative feature subset selection algorithm with missing data

Alexandre Aussem; Sergio Rodrigues de Morais

This paper introduces a novel conservative feature subset selection method for incomplete data sets. The method is conservative in the sense that it selects the minimal subset of features that renders the rest of the features independent of the target (the class variable) without making any assumption about the missing data mechanism. This is achieved by determining the Markov blanket of the target under the worst-case assumption about the missing data mechanism, including the case when data are not missing at random. An application of the method to synthetic and real-world incomplete data illustrates its practical relevance. The method is compared against state-of-the-art approaches such as the expectation-maximization (EM) algorithm and the available case technique.
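The available-case baseline mentioned above has a simple form: every pairwise statistic is computed from just the rows where the variables it involves are observed, rather than discarding all rows with any missing entry. A minimal pandas sketch with hypothetical column names:

# Available-case analysis: each statistic uses only the rows where the
# variables it involves are observed -- no imputation, and no requirement
# that a single complete-case subset exists.
import pandas as pd

def available_case_corr(df: pd.DataFrame, x: str, y: str) -> float:
    sub = df[[x, y]].dropna()        # rows where both x and y are observed
    return sub[x].corr(sub[y])

# Example with missing entries:
# df = pd.DataFrame({"a": [1, 2, None, 4], "b": [2, None, 6, 8]})
# available_case_corr(df, "a", "b")  # uses only the two fully observed rows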


BMC Bioinformatics | 2010

Analysis of lifestyle and metabolic predictors of visceral obesity with Bayesian Networks

Alex Aussem; André Tchernof; Sergio Rodrigues de Morais; Sophie Rome

BACKGROUND: The aim of this study was to provide a framework for the analysis of visceral obesity and its determinants in women, where complex inter-relationships are observed among lifestyle, nutritional and metabolic predictors. Thirty-four predictors related to lifestyle, adiposity, body fat distribution, blood lipids and adipocyte sizes were considered as potential correlates of visceral obesity in women. To properly address the difficulties in managing such interactions given our limited sample of 150 women, bootstrapped Bayesian networks were constructed based on novel constraint-based learning methods that recently appeared in the statistical learning community. The statistical significance of edge strengths was evaluated and the less reliable edges were pruned to increase the network robustness. To allow accessible interpretation and integrate biological knowledge into the final network, several undirected edges were afterwards directed with physiological expertise according to relevant literature.

RESULTS: Extensive experiments on synthetic data sampled from a known Bayesian network show that the algorithm, called Recursive Hybrid Parents and Children (RHPC), outperforms state-of-the-art algorithms from the recent literature. Regarding biological plausibility, we found that the inference results obtained with the proposed method were in excellent agreement with biological knowledge. For example, these analyses indicated that visceral adipose tissue accumulation is strongly related to blood lipid alterations independently of overall obesity level.

CONCLUSIONS: Bayesian networks are a useful tool for investigating and summarizing evidence when complex relationships exist among predictors, in particular for multifactorial conditions like visceral obesity, where several variables interact in a complex manner. The source code and the data sets used for the empirical tests are available at http://www710.univ-lyon1.fr/~aaussem/Software.html.
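The bootstrapping step described above can be sketched generically: resample the data with replacement, relearn the skeleton each time, and keep only the edges whose selection frequency clears a threshold. In the sketch below, learn_skeleton is a hypothetical stand-in for any skeleton learner (RHPC in the paper) returning a set of undirected edges:

# Generic bootstrap edge-confidence sketch. `learn_skeleton` is a
# hypothetical placeholder for a structure learner returning a set of
# undirected edges (e.g. frozensets of column names).
import numpy as np
from collections import Counter

def bootstrap_edges(df, learn_skeleton, n_boot=200, threshold=0.5, seed=0):
    rng = np.random.default_rng(seed)
    counts = Counter()
    for _ in range(n_boot):
        sample = df.sample(n=len(df), replace=True,
                           random_state=int(rng.integers(2**31)))
        counts.update(learn_skeleton(sample))
    # Keep edges appearing in at least `threshold` of the resamples.
    return {e for e, c in counts.items() if c / n_boot >= threshold}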


Intelligent Data Analysis | 2009

Exploiting Data Missingness in Bayesian Network Modeling

Sergio Rodrigues de Morais; Alexandre Aussem

This paper proposes a framework built on the use of Bayesian networks (BNs) for representing statistical dependencies between the existing random variables and additional dummy boolean variables that represent the presence or absence of the corresponding variable's value. We show how augmenting the BN with these additional variables helps pinpoint the mechanism through which missing data contribute to the classification task. The missing data mechanism is thus explicitly taken into account to predict the class variable using the data at hand. Extensive experiments on synthetic and real-world incomplete data sets reveal that the missingness information improves classification accuracy.
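The augmentation itself is mechanical: for every variable with missing entries, add a boolean companion variable flagging whether the entry was observed. A minimal pandas sketch of the idea (our illustration, not the paper's code):

# Augment a data set with boolean missingness indicators so that the
# "is this value missing?" information becomes an explicit variable
# the Bayesian network can model.
import pandas as pd

def add_missingness_indicators(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    for col in df.columns:
        if df[col].isna().any():          # only columns with missing entries
            out[f"{col}_missing"] = df[col].isna()
    return out

# df = pd.DataFrame({"age": [34, None, 51], "dose": [1.0, 2.0, None]})
# add_missingness_indicators(df).columns
# -> ['age', 'dose', 'age_missing', 'dose_missing']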


Artificial Intelligence in Medicine in Europe | 2007

Nasopharyngeal Carcinoma Data Analysis with a Novel Bayesian Network Skeleton Learning Algorithm

Alexandre Aussem; Sergio Rodrigues de Morais; Marilys Corbex

In this paper, we discuss efforts to apply a novel Bayesian network (BN) structure learning algorithm to a real-world epidemiological problem, namely nasopharyngeal carcinoma (NPC). Our specific aims are: (1) to provide a statistical profile of the recruited population, (2) to help identify the important environmental risk factors involved in NPC, and (3) to gain insight into the applicability and limitations of BN methods on small epidemiological data sets obtained from questionnaires. We first discuss the BN structure learning algorithm called Max-Min Parents and Children Skeleton (MMPC), developed by Tsamardinos et al. in 2005. Extensive empirical simulations have shown MMPC to offer an excellent trade-off between run time and quality of reconstruction compared to most constraint-based algorithms, especially for smaller sample sizes. Unfortunately, MMPC is unable to deal with data sets containing approximate functional dependencies between variables. In this work, we overcome this problem and apply the new version of MMPC to the nasopharyngeal carcinoma data in order to shed some light on the statistical profile of the population under study.
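The max-min heuristic at the heart of MMPC-style algorithms can be sketched in a few lines: pick the candidate whose weakest association with the target, over all conditioning subsets of the current parents-and-children set, is strongest. In the sketch below, assoc is a hypothetical conditional association measure that returns 0 whenever independence cannot be rejected; the function names are ours:

# Sketch of the max-min heuristic used by MMPC-style algorithms (our
# illustration, not the authors' implementation). `assoc(x, t, z, data)`
# is a hypothetical conditional association measure returning 0 when
# X and T are judged independent given Z.
from itertools import combinations

def max_min_candidate(candidates, pc, target, data, assoc):
    best_var, best_score = None, 0.0
    for x in candidates:
        # Minimum association over all conditioning subsets of the PC set.
        min_assoc = min(
            assoc(x, target, list(z), data)
            for k in range(len(pc) + 1)
            for z in combinations(pc, k)
        )
        if min_assoc > best_score:
            best_var, best_score = x, min_assoc
    return best_var  # None: every remaining candidate is d-separated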


Computers in Biology and Medicine | 2013

Learning the local Bayesian network structure around the ZNF217 oncogene in breast tumours

Emmanuel Prestat; Sergio Rodrigues de Morais; J. Vendrell; Aurélie Thollet; Christian Gautier; Pascale Cohen; Alex Aussem

In this study, we discuss and apply a novel and efficient algorithm for learning a local Bayesian network model in the vicinity of the ZNF217 oncogene from breast cancer microarray data without having to decide in advance which genes have to be included in the learning process. ZNF217 is a candidate oncogene located at 20q13, a chromosomal region frequently amplified in breast and ovarian cancer, and correlated with shorter patient survival in these cancers. To properly address the difficulties in managing complex gene interactions given our limited sample, statistical significance of edge strengths was evaluated using bootstrapping and the less reliable edges were pruned to increase the network robustness. We found that 13 out of the 35 genes associated with deregulated ZNF217 expression in breast tumours have been previously associated with survival and/or prognosis in cancers. Identifying genes involved in lipid metabolism opens new fields of investigation to decipher the molecular mechanisms driven by the ZNF217 oncogene. Moreover, nine of the 13 genes have already been identified as putative ZNF217 targets by independent biological studies. We therefore suggest that the algorithms for inferring local BNs are valuable data mining tools for unraveling complex mechanisms of biological pathways from expression data. The source code is available at http://www710.univ-lyon1.fr/~aaussem/Software.html.


European Conference on Machine Learning | 2010

An efficient and scalable algorithm for local Bayesian network structure discovery

Sergio Rodrigues de Morais; Alex Aussem

We present an efficient and scalable constraint-based algorithm, called Hybrid Parents and Children (HPC), to learn the parents and children of a target variable in a Bayesian network. Finding those variables is an important first step in many applications, including Bayesian network structure learning, dimensionality reduction and feature selection. The algorithm combines ideas from incremental and divide-and-conquer methods in a principled and effective way, while still being sound in the sample limit. Extensive empirical experiments are provided on public synthetic and real-world data sets of various sample sizes. The most noteworthy feature of HPC is its ability to handle large neighborhoods, unlike current constraint-based algorithm proposals. The number of calls to the statistical test, and hence the run time, is empirically on the order of O(n^1.09), where n is the number of variables, on the five benchmarks that we considered, and O(n^1.21) on a real drug-design data set characterized by 138,351 features.
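An empirical order such as O(n^1.09) is typically obtained by regressing the logarithm of the test count on the logarithm of the number of variables; the slope of the fit is the reported exponent. A small sketch, with data points fabricated purely for illustration:

# Estimating an empirical scaling exponent: fit log(count) ~ a*log(n) + b;
# the slope `a` is the exponent reported as O(n^a). The numbers below are
# made up for illustration only.
import numpy as np

n_vars = np.array([100, 500, 1_000, 5_000, 10_000])
n_tests = np.array([1_300, 7_200, 15_000, 82_000, 170_000])

slope, intercept = np.polyfit(np.log(n_vars), np.log(n_tests), 1)
print(f"empirical order: O(n^{slope:.2f})")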


European Conference on Symbolic and Quantitative Approaches to Reasoning and Uncertainty | 2009

Robust Gene Selection from Microarray Data with a Novel Markov Boundary Learning Method: Application to Diabetes Analysis

Alexandre Aussem; Sergio Rodrigues de Morais; Florence Perraud; Sophie Rome

This paper discusses the application of a novel feature subset selection method to high-dimensional genomic microarray data on type 2 diabetes, based on recent Bayesian network learning techniques. We report experiments on a database that consists of 22,283 genes and only 143 patients. The method searches for the genes that are jointly the most associated with diabetes status. This is achieved by learning the Markov boundary of the class variable. Since the selected genes are subsequently analyzed further by biologists, requiring much time and effort, not only model performance but also the robustness of the gene selection process is crucial. Therefore, we assess the variability of our results and propose an ensemble technique to yield more robust results. Our findings are compared with the genes that were associated with an increased risk of diabetes in the recent medical literature. The main outcomes of the present research are an improved understanding of the pathophysiology of obesity, and a clear appreciation of the applicability and limitations of Markov boundary learning techniques to human gene expression data.
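The ensemble technique for robustness follows the same resampling pattern as the bootstrapped-network study above, applied here to gene lists: rerun the selector on resampled data and keep the genes chosen sufficiently often. In the sketch below, select_genes is a hypothetical placeholder for the base Markov boundary selector:

# Generic stability/ensemble selection sketch: run a base gene selector
# on bootstrap resamples and keep genes selected often enough.
# `select_genes` is a hypothetical placeholder for the base method.
from collections import Counter

def ensemble_select(df, select_genes, n_runs=100, min_freq=0.8):
    counts = Counter()
    for i in range(n_runs):
        resample = df.sample(n=len(df), replace=True, random_state=i)
        counts.update(select_genes(resample))
    return sorted(g for g, c in counts.items() if c / n_runs >= min_freq)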

Collaboration


Dive into Sergio Rodrigues de Morais's collaborations.

Top Co-Authors

Marilys Corbex

International Agency for Research on Cancer
