Publication


Featured research published by Claudio Conversano.


Journal of Computational and Graphical Statistics | 2010

Combining an Additive and Tree-Based Regression Model Simultaneously: STIMA

E. Dusseldorp; Claudio Conversano; Bart Jan Van Os

Additive models and tree-based regression models are two main classes of statistical models used to predict the scores on a continuous response variable. It is known that additive models become very complex in the presence of higher-order interaction effects, whereas some tree-based models, such as CART, have problems capturing linear main effects of continuous predictors. To overcome these drawbacks, the regression trunk model has been proposed: a multiple regression model with main effects and a parsimonious number of higher-order interaction effects. The interaction effects can be represented by a small tree: a regression trunk. This article proposes a new algorithm, the Simultaneous Threshold Interaction Modeling Algorithm (STIMA), to estimate a regression trunk model that is more general and more efficient than the initial one (RTA) and is implemented in the R-package stima. Results from a simulation study show that the performance of STIMA is satisfactory for sample sizes of 200 or higher. For sample sizes of 300 or higher, the 0.50 SE rule is the best pruning rule for a regression trunk in terms of power and Type I error. For sample sizes of 200, the 0.80 SE rule is recommended. Results from a comparative study of eight regression methods applied to ten benchmark datasets suggest that STIMA and GUIDE are the best performers in terms of cross-validated prediction error. STIMA appeared to be the best method for datasets containing many categorical variables. The characteristics of a regression trunk model are illustrated using the Boston house price dataset. Supplemental materials for this article, including the R-package stima, are available online.
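
The pruning rules mentioned in this abstract can be illustrated with a small sketch. It assumes the c·SE rule follows the usual CART convention: choose the smallest model whose cross-validated error lies within c standard errors of the minimum. The function name and the error values are hypothetical and are not taken from the stima package.

```python
def select_size_by_se_rule(cv_errors, cv_ses, c=0.5):
    """Pick a model size with a c*SE pruning rule (assumed CART-style convention).

    cv_errors[i] and cv_ses[i] are the cross-validated error and its standard
    error for the model with i splits, ordered from smallest to largest model.
    Returns the index of the smallest model whose error is at most
    min_error + c * SE(min_error).
    """
    best = min(range(len(cv_errors)), key=lambda i: cv_errors[i])
    threshold = cv_errors[best] + c * cv_ses[best]
    for i, err in enumerate(cv_errors):
        if err <= threshold:
            return i
    return best


# Hypothetical cross-validation results for trunks with 0..4 splits.
errors = [1.00, 0.80, 0.70, 0.69, 0.71]
ses = [0.05, 0.04, 0.04, 0.04, 0.04]

size_half_se = select_size_by_se_rule(errors, ses, c=0.5)
size_no_slack = select_size_by_se_rule(errors, ses, c=0.0)
```

A larger c trades a little cross-validated accuracy for a smaller trunk; with c = 0 the rule simply returns the model with the minimum error.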


Journal of Classification | 2009

Incremental Tree-Based Missing Data Imputation with Lexicographic Ordering

Claudio Conversano; Roberta Siciliano

In the framework of incomplete data analysis, this paper provides a nonparametric approach to missing data imputation based on Information Retrieval. In particular, an incremental procedure based on the iterative use of a tree-based method is proposed and a suitable Incremental Imputation Algorithm is introduced. The key idea is to define a lexicographic ordering of cases and variables so that conditional mean imputation via binary trees can be performed incrementally. A simulation study and real data applications are carried out to assess its advantages and performance relative to standard approaches.
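
The incremental idea can be sketched roughly as follows. This is a simplified illustration and not the paper's algorithm: it orders only the variables (by number of missing values), and a one-split regression stump stands in for the binary tree. It also assumes at least one fully observed column. All function and column names are hypothetical.

```python
def fit_stump(x, y):
    """Best single split on x by squared error: (sse, threshold, left_mean, right_mean)."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between equal predictor values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    if best is None:  # no valid split: fall back to the overall mean
        m = sum(y) / len(y)
        return (sum((v - m) ** 2 for v in y), float("inf"), m, m)
    return best


def incremental_impute(data):
    """data maps column name -> list of numbers or None; returns a filled copy."""
    filled = {k: list(v) for k, v in data.items()}
    # Process variables from fewest to most missing values.
    order = sorted(filled, key=lambda k: sum(v is None for v in filled[k]))
    complete = [k for k in order if all(v is not None for v in filled[k])]
    if not complete:
        raise ValueError("sketch assumes at least one fully observed column")
    for col in order:
        if col in complete:
            continue
        obs = [i for i, v in enumerate(filled[col]) if v is not None]
        y = [filled[col][i] for i in obs]
        # Choose the already-complete predictor whose stump fits best.
        fits = [(fit_stump([filled[p][i] for i in obs], y), p) for p in complete]
        (_, cut, lm, rm), pred = min(fits, key=lambda f: f[0][0])
        for i, v in enumerate(filled[col]):
            if v is None:  # conditional mean of the matching leaf
                filled[col][i] = lm if filled[pred][i] <= cut else rm
        complete.append(col)  # imputed column can now serve as a predictor
    return filled


toy = {
    "a": [1, 2, 3, 4, 5, 6],
    "b": [10, 10, 10, 20, 20, None],
    "c": [None, 1, 1, None, 5, 5],
}
filled = incremental_impute(toy)
```

Each imputed column joins the pool of complete predictors, which is what makes the procedure incremental.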


Multiple Classifier Systems | 2000

Supervised Classifier Combination through Generalized Additive Multi-model

Claudio Conversano; Roberta Siciliano; Francesco Mola

In the framework of supervised classification and prediction modeling, this paper introduces a methodology based on a general formulation of combined model integration in order to improve the fit to the data. Unlike Generalized Additive Models (GAM), our approach combines not only, and not necessarily, estimates derived from smoothing functions, but also those provided by either parametric or nonparametric models. Because it combines multiple classifiers, we have named this general class of models Generalized Additive Multi-Models (GAM-M). The estimation procedure iterates the inner algorithm, which is a variant of the backfitting algorithm, and the outer algorithm, which is a standard local scoring algorithm, until convergence. The performance of the GAM-M approach relative to alternative approaches is shown in some applications using real data sets. The stability of the model estimates is evaluated by means of bootstrap and cross-validation. As a result, our methodology improves the goodness-of-fit of the model to the data while also providing stable estimates.
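
As a rough illustration of the inner loop, the sketch below runs plain backfitting for a Gaussian response with two different component estimators: a parametric linear fit and a nonparametric group-mean fit. It is not the GAM-M procedure itself; the outer local scoring step is omitted, and the function names and toy data are hypothetical.

```python
from statistics import mean


def fit_linear(x, r):
    """Parametric component: simple least-squares line through (x, r)."""
    mx, mr = mean(x), mean(r)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (ri - mr) for xi, ri in zip(x, r)) / sxx
    return lambda v: mr + slope * (v - mx)


def fit_group_means(g, r):
    """Nonparametric component: mean of r within each category of g."""
    groups = {}
    for gi, ri in zip(g, r):
        groups.setdefault(gi, []).append(ri)
    means = {k: mean(v) for k, v in groups.items()}
    return lambda v: means[v]


def backfit(y, predictors, fitters, max_iter=50, tol=1e-10):
    """Cycle over components, refitting each one to its partial residuals."""
    n, m = len(y), len(predictors)
    alpha = mean(y)
    fitted = [[0.0] * n for _ in range(m)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(m):
            partial = [
                y[i] - alpha - sum(fitted[k][i] for k in range(m) if k != j)
                for i in range(n)
            ]
            f = fitters[j](predictors[j], partial)
            new = [f(v) for v in predictors[j]]
            centre = mean(new)
            new = [v - centre for v in new]  # keep each component centred
            max_change = max(
                max_change, max(abs(a - b) for a, b in zip(new, fitted[j]))
            )
            fitted[j] = new
        if max_change < tol:
            break
    return alpha, fitted


# Toy data: y = 2 * x1 + (0 for group "a", 5 for group "b").
x1 = [0.0, 1.0, 2.0, 3.0, 0.0, 1.0, 2.0, 3.0]
g = ["a"] * 4 + ["b"] * 4
y = [2 * xi + (5.0 if gi == "b" else 0.0) for xi, gi in zip(x1, g)]

alpha, (f1, f2) = backfit(y, [x1, g], [fit_linear, fit_group_means])
```

Any estimator that maps a predictor and partial residuals to a prediction function can be plugged into the `fitters` list, which is the sense in which different model types are combined additively here.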


Archive | 2010

Simultaneous Threshold Interaction Detection in Binary Classification

Claudio Conversano; Elise Dusseldorp

Classification Trunk Approach (CTA) is a method for the automatic selection of threshold interactions in generalized linear modelling (GLM). It results from integrating classification trees and GLM. Interactions between predictors are expressed as “threshold interactions” instead of traditional cross-products. Unlike classification trees, CTA uses a different splitting criterion and is framed in a new algorithm, STIMA, that can be used to estimate threshold interaction effects in classification and regression models. This paper specifically focuses on the binary response case, and presents the results of an application to the Liver Disorders dataset to give insight into the advantages of CTA over other model-based or decision tree-based approaches. The performance of the different methods is compared in terms of prediction accuracy and model complexity.
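
To make the contrast with cross-products concrete, the following sketch builds both kinds of interaction term for a single observation. The variable names, cut points and helper function are hypothetical and only illustrate the general idea of an indicator defined by thresholds.

```python
def threshold_indicator(row, conditions):
    """Return 1 if every (name, op, cut) condition holds for this row, else 0.

    op is "<=" or ">"; the conditions together describe one region of the
    predictor space, like a path from the root to a leaf of a small tree.
    """
    for name, op, cut in conditions:
        value = row[name]
        holds = value <= cut if op == "<=" else value > cut
        if not holds:
            return 0
    return 1


row = {"x1": 3.0, "x2": 7.5}

# Traditional interaction: the product of the two predictors.
cross_product = row["x1"] * row["x2"]

# Threshold interaction: a 0/1 term for the region x1 <= 4 and x2 > 6.
region = [("x1", "<=", 4.0), ("x2", ">", 6.0)]
indicator = threshold_indicator(row, region)
```

The indicator can then enter a linear predictor alongside the main effects, so its coefficient is read as a shift for observations falling in that region.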


Archive | 2010

Detecting Subset of Classifiers for Multi-attribute Response Prediction

Claudio Conversano; Francesco Mola

An algorithm detecting a classification model in the presence of a multi-class response is introduced. It is called Sequential Automatic Search of a Subset of Classifiers (SASSC) because it adaptively and sequentially aggregates subsets of instances related to a proper aggregation of a subset of the response classes, that is, to a super-class. In each step of the algorithm, aggregations are based on the search for the subset of instances whose response classes generate a classifier with the lowest generalization error compared to alternative aggregations. Cross-validation is used to estimate such generalization errors. The user can choose a final number of subsets of the response classes (super-classes), obtaining a final tree-based classification model with a high level of accuracy without neglecting parsimony. Results obtained by analyzing a real dataset highlight the effectiveness of the proposed method.


Journal of Applied Statistics | 2010

Analysis of mutual funds’ management styles: a modeling, ranking and visualizing approach

Claudio Conversano; Domenico Vistocco

A method to rank mutual funds according to their investment style measured with respect to the returns of a reference portfolio (benchmark) is introduced. It is based on a style analysis model estimating a mutual fund portfolio composition as well as the benchmark one. Starting from such compositions, it computes a proximity measure based on the L1 or L2 norm to assess the similarity between each mutual fund portfolio's returns and the benchmark returns, as well as between the returns of each benchmark constituent and those of the corresponding mutual fund constituent. To this end, the mean integrated absolute error and the mean integrated squared error are computed to derive both a global ranking of mutual fund management styles and partial rankings expressing the over- (under-) weighting of each portfolio constituent. A visual inspection of the results emphasizing main differences in management styles is provided, using a parallel coordinates plot. Since a modeling, a ranking and a visualizing approach are integrated, the method is named MoRaViA. From the practitioners’ point of view, it allows the identification of a specific management style for each mutual fund, discriminating active management funds from passive management ones. To evaluate the effectiveness of MoRaViA, many sets of artificial portfolios are generated and an application to a set of equity funds operating in the European market is presented.
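
The ranking step can be sketched in a few lines. This is an illustration under simplifying assumptions, not the MoRaViA implementation: discrete-time averages of absolute and squared differences stand in for the integrated errors, the style analysis model is skipped, and the fund names and return series are made up.

```python
def mean_abs_error(a, b):
    """L1-type proximity: average absolute difference between two return series."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def mean_sq_error(a, b):
    """L2-type proximity: average squared difference between two return series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def rank_funds(fund_returns, benchmark, norm="l1"):
    """Order fund names from closest to farthest from the benchmark."""
    distance = mean_abs_error if norm == "l1" else mean_sq_error
    scores = {name: distance(r, benchmark) for name, r in fund_returns.items()}
    return sorted(scores, key=lambda name: scores[name])


# Hypothetical periodic returns.
benchmark = [0.010, 0.020, -0.010]
funds = {
    "active": [0.050, -0.030, 0.020],
    "passive": [0.010, 0.020, -0.010],
    "tilted": [0.012, 0.018, -0.008],
}

ranking_l1 = rank_funds(funds, benchmark, norm="l1")
ranking_l2 = rank_funds(funds, benchmark, norm="l2")
```

Funds near the top of the ranking track the benchmark closely, which is consistent with passive management; funds near the bottom deviate more, which suggests active management.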


Archive | 2008

Sequential Automatic Search of a Subset of Classifiers in Multiclass Learning

Francesco Mola; Claudio Conversano

A method called Sequential Automatic Search of a Subset of Classifiers is introduced to deal with classification problems requiring decisions among a wide set of competing classes. It applies classifiers sequentially, restricting the number of competing classes while keeping the true class outcome in the candidate set of classes. Some features of the method are discussed, namely: a cross-validation-based criterion to select the best classifier in each iteration of the algorithm, the resulting classification model, and the possibility of choosing between a heuristic and a probabilistic criterion to predict test set observations. Furthermore, the possibility of casting the whole method in the framework of unsupervised learning is also investigated. Advantages of the method are illustrated by analyzing data from a letter recognition experiment.


Journal of Applied Statistics | 2017

Variation in caesarean delivery rates across hospitals: a Bayesian semi-parametric approach

Massimo Cannas; Claudio Conversano; Francesco Mola; E. Sironi

This article presents a Bayesian semi-parametric approach for modeling the occurrence of caesarean sections using a sample of women delivering in 20 hospitals of Sardinia (Italy). A multilevel logistic regression has been fitted to the data using a Dirichlet process prior for modeling the random-effects distribution of the unobserved factors at the hospital level. Using the estimated random effects at the hospital level, a partition of the hospitals in terms of similar medical practice has been obtained that identifies different profiles of hospitals in terms of caesarean section risks. The limited number of clusters may be useful for suggesting policy implications that help to reduce the heterogeneity of caesarean delivery risks.


Archive | 2000

Generalized Additive Multi-Model for Classification and Prediction

Claudio Conversano; Roberta Siciliano; Francesco Mola

In this paper we introduce a methodology based on a combination of classification/prediction procedures derived from different approaches. In particular, starting from a general definition of a classification/prediction model named Generalized Additive Multi-Model (GAM-M) we will demonstrate how it is possible to obtain different types of statistical models based on parametric, semiparametric and nonparametric methods. In our methodology the estimation procedure is based on a variant of the backfitting algorithm used for Generalized Additive Models (GAM). The benchmarking of our methodology will be shown and the results will be compared with those derived from the applications of GAM and Tree procedures.


Archive | 2000

Semi-parametric models for data mining

Claudio Conversano; Francesco Mola

In order to combine the exactness of a very large data set with the major predictability of statistical modeling, we introduce a two-step methodology that makes use of partitioning algorithms and semi-parametric models. The result is an alternative strategy for supervised classification and prediction problems when dealing with huge and complex data sets, in order to improve the predictability of the dependent variable on the basis of the previous detection of homogeneous sub-populations.

Collaboration


Dive into Claudio Conversano's collaborations.

Top Co-Authors

Roberta Siciliano

University of Naples Federico II

Massimo Cannas

Catholic University of the Sacred Heart

Luca Frigau

University of Cagliari

Emiliano Sironi

Catholic University of the Sacred Heart

Raffaele Miele

University of Naples Federico II
