Antonio D’Ambrosio | Researchain

Publication

Featured research published by Antonio D’Ambrosio.

Psychometrika | 2016

A Recursive Partitioning Method for the Prediction of Preference Rankings Based Upon Kemeny Distances

Antonio D’Ambrosio; Willem J. Heiser

Preference rankings usually depend on the characteristics of both the individuals judging a set of objects and the objects being judged. This topic has been handled in the literature with log-linear representations of the generalized Bradley-Terry model and, recently, with distance-based tree models for rankings. A limitation of these approaches is that they only work with full rankings or with a pre-specified pattern governing the presence of ties, and/or they are based on quite strict distributional assumptions. To overcome these limitations, we propose a new prediction tree method for ranking data that is totally distribution-free. It combines Kemeny’s axiomatic approach to define a unique distance between rankings with the CART approach to find a stable prediction tree. Furthermore, our method is not limited by any particular design of the pattern of ties. The method is evaluated in an extensive full-factorial Monte Carlo study with a new simulation design.
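
The Kemeny distance and the CART-style split search can be made concrete with a minimal Python sketch. It is not the authors' implementation. Rankings are assumed to be coded as rank positions (1 = best, with tied objects sharing a position). The distance uses the Kemeny-Snell score-matrix form, in which a reversed pair costs 2 and a pair tied in only one of the two rankings costs 1. The names `score_matrix`, `kemeny_distance`, `node_impurity` and `best_split` are hypothetical. The impurity used here, the sum of within-node pairwise distances, is an assumption. The paper's exact impurity, stopping and pruning rules are not reproduced.

```python
from itertools import combinations


def score_matrix(ranking):
    """Kemeny-Snell scores: ranking[i] is the position of object i (1 = best).

    a[i][j] = 1 if i is preferred to j, -1 if j is preferred to i,
    and 0 if the two are tied (or i == j).
    """
    n = len(ranking)
    return [[(ranking[i] < ranking[j]) - (ranking[i] > ranking[j]) for j in range(n)]
            for i in range(n)]


def kemeny_distance(r1, r2):
    """Sum over unordered pairs of |a_ij - b_ij| (reversal = 2, tie vs. strict = 1)."""
    a, b = score_matrix(r1), score_matrix(r2)
    return sum(abs(a[i][j] - b[i][j]) for i, j in combinations(range(len(r1)), 2))


def node_impurity(rankings):
    """Hypothetical impurity: sum of all pairwise Kemeny distances within a node."""
    return sum(kemeny_distance(r, s) for r, s in combinations(rankings, 2))


def best_split(x, rankings):
    """Exhaustive CART-style search over thresholds of one numeric covariate.

    Returns (threshold, summed child impurity) for the best split x <= threshold,
    or None when x is constant and no split exists.
    """
    best = None
    for t in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, rankings) if xi <= t]
        right = [r for xi, r in zip(x, rankings) if xi > t]
        cost = node_impurity(left) + node_impurity(right)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Four judges rank three objects; x is a hypothetical judge covariate.
x = [1, 2, 3, 4]
rankings = [(1, 2, 3), (1, 2, 3), (3, 2, 1), (3, 2, 1)]
print(kemeny_distance((1, 1, 2), (1, 2, 3)))  # 1: only the tied pair differs
print(best_split(x, rankings))  # (2, 0): the split separates the two preference groups
```

The distance compares pairwise orderings only. As a result, it handles any pattern of ties in the same way, without a pre-specified tie design.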

Journal of Classification | 2012

Accurate Tree-based Missing Data Imputation and Data Fusion within the Statistical Learning Paradigm

Antonio D’Ambrosio; Massimo Aria; Roberta Siciliano

The framework of this paper is statistical data editing, specifically how to edit or impute missing or contradictory data and how to merge two independent data sets that present some lack of information. Assuming a missing-at-random mechanism, this paper provides an accurate tree-based methodology for both missing data imputation and data fusion that is justified within Vapnik's Statistical Learning Theory. It considers both an incremental variable imputation method, to improve computational efficiency, and boosted trees, to gain prediction accuracy with respect to other methods. As a result, the best approximation of the structural risk (also known as irreducible error) is reached, thus minimizing the generalization (or prediction) error of imputation. Moreover, the method is distribution-free: it holds independently of the underlying probability law generating the missing data values. Performance is assessed on simulated case studies and real-world applications.
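
The following is a minimal sketch of incremental, tree-based imputation in plain Python. It assumes numeric variables and at least one fully observed variable. It also assumes that variables are imputed in order from the fewest to the most missing values; this order is not stated in the abstract. A single least-squares regression stump stands in for the paper's boosted trees, so the sketch shows only the incremental mechanics, not the accuracy claims. `fit_stump` and `incremental_impute` are hypothetical names.

```python
from statistics import mean


def fit_stump(X, y):
    """Least-squares regression stump: one split on one predictor, constant leaves.

    Falls back to the overall mean when no split reduces the squared error.
    """
    overall = mean(y)
    best_sse = sum((v - overall) ** 2 for v in y)
    best = None  # (predictor index, threshold, left mean, right mean)
    for p in range(len(X[0])):
        for t in sorted({row[p] for row in X})[:-1]:
            left = [v for row, v in zip(X, y) if row[p] <= t]
            right = [v for row, v in zip(X, y) if row[p] > t]
            ml, mr = mean(left), mean(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if sse < best_sse:
                best_sse, best = sse, (p, t, ml, mr)
    if best is None:
        return lambda row: overall
    p, t, ml, mr = best
    return lambda row: ml if row[p] <= t else mr


def incremental_impute(rows):
    """Impute None entries one variable at a time (hypothetical helper).

    Assumed order: fewest missing values first. Each imputed variable is
    added to the predictor set used for the next variable.
    """
    data = [list(r) for r in rows]  # work on a copy
    n_cols = len(data[0])
    n_missing = {c: sum(r[c] is None for r in data) for c in range(n_cols)}
    predictors = [c for c in range(n_cols) if n_missing[c] == 0]
    if not predictors:
        raise ValueError("at least one fully observed variable is required")
    incomplete = sorted((c for c in range(n_cols) if n_missing[c] > 0), key=n_missing.get)
    for c in incomplete:
        observed = [r for r in data if r[c] is not None]
        model = fit_stump([[r[p] for p in predictors] for r in observed],
                          [r[c] for r in observed])
        for r in data:
            if r[c] is None:
                r[c] = model([r[p] for p in predictors])
        predictors.append(c)
    return data


rows = [
    [1, 10.0, 100.0],
    [2, 10.0, 100.0],
    [3, None, 200.0],
    [8, 20.0, 200.0],
    [9, 20.0, None],
]
print(incremental_impute(rows))
```

Each imputed variable immediately joins the predictor set. This is what makes the procedure incremental: later variables can borrow information from earlier imputations.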

Algorithms from and for Nature and Life | 2013

Clustering and Prediction of Rankings Within a Kemeny Distance Framework

Willem J. Heiser; Antonio D’Ambrosio

Rankings and partial rankings are ubiquitous in data analysis, yet there is relatively little work in the classification community that uses the typical properties of rankings. We review the broader literature that we are aware of and identify a common building block for both prediction of rankings and clustering of rankings, which is also valid for partial rankings. This building block is the Kemeny distance, defined as the minimum number of interchanges of two adjacent elements required to transform one (partial) ranking into another. The Kemeny distance is equivalent to Kendall’s τ for complete rankings, while for partial rankings it is equivalent to Emond and Mason’s extension of τ. For clustering, we use the flexible class of methods proposed by Ben-Israel and Iyigun (Journal of Classification 25: 5–26, 2008) and define the disparity between a ranking and the center of a cluster as the Kemeny distance. For prediction, we build a prediction tree by recursive partitioning and define the impurity measure of the subgroups formed as the sum of all within-node Kemeny distances. The median ranking characterizes the subgroups in both cases.
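
The stated equivalence can be checked numerically. This sketch assumes the score-matrix form of the distance: the sum over pairs of |a_ij - b_ij|, with scores in {1, 0, -1}. Under this scaling a reversed pair contributes 2, so on complete rankings d is twice the number of adjacent interchanges. The identity τ_x = 1 - 2d/(n(n - 1)) holds for this scaling. τ_x follows Emond and Mason's convention that a tie scores +1 rather than 0. The function names are hypothetical.

```python
from itertools import combinations, permutations


def kemeny_distance(r1, r2):
    """Sum over pairs i < j of |a_ij - b_ij| with Kemeny-Snell scores in {1, 0, -1}."""
    def a(r, i, j):
        return (r[i] < r[j]) - (r[i] > r[j])
    return sum(abs(a(r1, i, j) - a(r2, i, j)) for i, j in combinations(range(len(r1)), 2))


def tau_x(r1, r2):
    """Emond and Mason's tau_x: ties score +1 (not 0); the diagonal scores 0."""
    n = len(r1)
    def s(r, i, j):
        return 0 if i == j else (1 if r[i] <= r[j] else -1)
    total = sum(s(r1, i, j) * s(r2, i, j) for i in range(n) for j in range(n))
    return total / (n * (n - 1))


def kendall_tau(r1, r2):
    """Kendall's tau for complete rankings: (concordant - discordant) / number of pairs."""
    n = len(r1)
    def sign(v):
        return (v > 0) - (v < 0)
    total = sum(sign((r1[i] - r1[j]) * (r2[i] - r2[j])) for i, j in combinations(range(n), 2))
    return total / (n * (n - 1) / 2)


# tau_x = 1 - 2 d / (n (n - 1)), including rankings with ties ...
pairs = [((1, 2, 3, 4), (4, 3, 2, 1)),
         ((1, 1, 2, 3), (1, 2, 3, 4)),
         ((1, 1, 1, 1), (2, 1, 2, 1))]
for r1, r2 in pairs:
    n = len(r1)
    identity = 1 - 2 * kemeny_distance(r1, r2) / (n * (n - 1))
    assert abs(tau_x(r1, r2) - identity) < 1e-12

# ... and on complete rankings tau_x coincides with Kendall's tau.
base = (1, 2, 3, 4)
for perm in permutations(base):
    assert abs(tau_x(base, perm) - kendall_tau(base, perm)) < 1e-12
```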

European Journal of Operational Research | 2016

Accurate algorithms for identifying the median ranking when dealing with weak and partial rankings under the Kemeny axiomatic approach

Sonia Amodio; Antonio D’Ambrosio; Roberta Siciliano

Preference rankings appear in virtually all fields of science (political science, behavioral science, machine learning, decision making and so on). The well-known social choice problem consists of finding a reasonable procedure for aggregating the preferences or rankings expressed by subjects to reach a collective decision. This turns out to be equivalent to estimating the consensus (central) ranking from data, which is known to be an NP-hard problem. A useful solution was proposed by Emond and Mason in 2002 through a Branch-and-Bound algorithm (BB) within the Kemeny and Snell axiomatic framework. However, BB becomes very time-consuming when the problem grows intractable, i.e. with a large number of objects, weak and partial rankings, and a low degree of consensus. As an alternative, we propose an accurate heuristic algorithm called FAST that finds at least one of the consensus ranking solutions found by BB while saving a great deal of computational time. In addition, we show that the building block of FAST is an algorithm called QUICK, which by itself already finds one of the BB solutions and can therefore be used to speed up the search even further when the number of objects is small. Simulation studies and applications to real data show the accuracy and computational efficiency of our proposal.
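
To show what BB, QUICK and FAST approximate, here is an exhaustive search for the consensus ranking under the Kemeny distance, allowing ties in the candidate. This is plain brute force, not any of the three algorithms. It enumerates every weak order of n objects. Their count grows as the Fubini (ordered Bell) numbers, which is why exhaustive search is feasible only for a handful of objects. Rankings are coded as positions, with tied objects sharing a value. Partial rankings (missing objects) are not handled, and the function names are hypothetical.

```python
from itertools import combinations, product


def kemeny_distance(r1, r2):
    """Sum over pairs i < j of |a_ij - b_ij| with Kemeny-Snell scores in {1, 0, -1}."""
    def a(r, i, j):
        return (r[i] < r[j]) - (r[i] > r[j])
    return sum(abs(a(r1, i, j) - a(r2, i, j)) for i, j in combinations(range(len(r1)), 2))


def weak_orders(n):
    """All rankings of n objects with ties, coded as positions 1..k with no gaps."""
    for r in product(range(1, n + 1), repeat=n):
        if set(r) == set(range(1, max(r) + 1)):
            yield r


def exhaustive_consensus(rankings):
    """Return (minimum total Kemeny distance, every weak order attaining it)."""
    best_cost, best = None, []
    for cand in weak_orders(len(rankings[0])):
        cost = sum(kemeny_distance(cand, r) for r in rankings)
        if best_cost is None or cost < best_cost:
            best_cost, best = cost, [cand]
        elif cost == best_cost:
            best.append(cand)
    return best_cost, best


print([sum(1 for _ in weak_orders(n)) for n in range(1, 6)])  # [1, 3, 13, 75, 541]
print(exhaustive_consensus([(1, 2, 3), (1, 2, 3), (2, 1, 3)]))  # (2, [(1, 2, 3)])
```

Returning every minimizer, not just one, mirrors the fact that the consensus ranking need not be unique. This is why FAST is described as finding "at least one" of the BB solutions.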

Expert Systems With Applications | 2017

Regression trees for multivalued numerical response variables

Antonio D’Ambrosio; Massimo Aria; Carmela Iorio; Roberta Siciliano

In the framework of regression trees, this paper provides a recursive partitioning methodology to deal with a non-standard response variable. Specifically, both multivalued numerical responses and modal responses of histogram type are considered. These data are known as symbolic data, special cases of which are classical data, imprecise data, conjunctive data and fuzzy data. Instead of pre-processing the data so that the standard regression tree methodology can be applied, this paper provides, as its main contribution, a definition of the impurity measure and of the splitting criterion that allows a regression tree to be built for a multivalued numerical response variable. We analyze and evaluate the performance of our proposal using simulated data as well as real-world case studies.
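
The abstract does not spell out the impurity measure, so this sketch substitutes a common choice for histogram data. Each histogram response is represented by its quantile function, assuming values are uniform within bins. Impurity is then the sum of squared distances to the node's mean quantile function, which is a grid approximation of the squared L2 Wasserstein distance. This is an assumption made for illustration, not the definition proposed in the paper. The 20-point probability grid and all function names are hypothetical.

```python
def quantile(hist, p):
    """Quantile of a histogram [(lo, hi, weight), ...], assuming uniform mass within bins.

    Bins are assumed sorted and non-overlapping, with weights summing to 1.
    """
    cum = 0.0
    for lo, hi, w in hist:
        if w > 0 and cum + w >= p:
            return lo + (p - cum) / w * (hi - lo)
        cum += w
    return hist[-1][1]


LEVELS = [(k + 0.5) / 20 for k in range(20)]  # hypothetical probability grid


def quantile_vector(hist):
    return [quantile(hist, p) for p in LEVELS]


def node_impurity(qs):
    """Sum of squared distances (averaged over the grid) to the node's mean quantile function."""
    if not qs:
        return 0.0
    m = len(LEVELS)
    center = [sum(q[k] for q in qs) / len(qs) for k in range(m)]
    return sum(sum((q[k] - center[k]) ** 2 for k in range(m)) / m for q in qs)


def best_split(x, hists):
    """Exhaustive threshold search on a numeric predictor x."""
    qs = [quantile_vector(h) for h in hists]
    best = None
    for t in sorted(set(x))[:-1]:
        left = [q for xi, q in zip(x, qs) if xi <= t]
        right = [q for xi, q in zip(x, qs) if xi > t]
        cost = node_impurity(left) + node_impurity(right)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# An interval-valued response is a one-bin histogram; the last case has two bins.
hists = [[(0, 1, 1.0)], [(0, 1, 1.0)], [(10, 11, 1.0)], [(10, 10.5, 0.5), (10.5, 11, 0.5)]]
print(best_split([1, 2, 3, 4], hists))
```

Interval (multivalued) and histogram (modal) responses share one representation here. This is one way to use a single impurity for both response types without pre-processing them into scalars.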

Expert Systems With Applications | 2018

A P-Spline based clustering approach for portfolio selection

Carmela Iorio; Gianluca Frasso; Antonio D’Ambrosio; Roberta Siciliano

In recent years, many clustering techniques for time course data have been proposed, driven by growing interest in studying phenomena that change over time. A new clustering method suitable for time series applications has recently been proposed that exploits the properties of the P-spline approach. This semi-parametric tool has several advantages: it facilitates the removal of noise from time series and it reduces computational time. In this paper, we propose to use this clustering approach on financial data with the aim of building a financial portfolio. Our proposal works directly on the time series without any pre-processing, except for computing the spline coefficients and, where needed, normalizing the series. We show that our strategy is useful to support the investment decisions of financial practitioners.
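
The two ingredients, smoothing with a difference penalty and clustering the result, can be sketched roughly in plain Python. The sketch uses the simplest P-spline, one with an identity basis, which is the Whittaker smoother. It solves (I + λDᵀD)z = y, where D is the second-order difference matrix. The published approach clusters B-spline coefficients instead, and the abstract does not name the clustering algorithm, so plain k-means stands in here. The values λ = 1 and k = 2, the helper names and the toy price series are all hypothetical. The dense solver is suitable only for short series.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def pspline_identity_smooth(y, lam):
    """Solve (I + lam * D'D) z = y, where D takes second-order differences."""
    n = len(y)
    A = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(n - 2):
        row = {k: 1.0, k + 1: -2.0, k + 2: 1.0}  # k-th row of D
        for i, di in row.items():
            for j, dj in row.items():
                A[i][j] += lam * di * dj
    return solve(A, [float(v) for v in y])


def kmeans(vectors, k, iters=50):
    """Plain k-means; the first k vectors serve as initial centers."""
    centers = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


prices = [  # toy series, not real market data
    [0, 1, 2, 3, 4, 5],
    [5, 4, 3, 2, 1, 0],
    [0, 1.2, 1.9, 3.1, 4.0, 5.1],
    [5, 4.1, 2.9, 2.0, 1.1, 0],
]
smoothed = [pspline_identity_smooth(s, lam=1.0) for s in prices]
print(kmeans(smoothed, k=2))
```

The penalty acts on second differences, so a straight line passes through the smoother unchanged. Larger λ pulls each series toward its linear trend, which is how the noise removal mentioned above is controlled.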

Archive | 2011

Multidimensional Scaling as Visualization Tool of Web Sequence Rules

Antonio D’Ambrosio; Marcello Pecoraro

Web Mining can be defined as the application of data mining processes to Web data. In the field of Web Mining, we distinguish among Web Content Mining, Web Structure Mining and Web Usage Mining. Web Content Mining is the Web Mining process that analyzes various aspects related to the contents of a website, such as text, banners and graphics. Web Structure Mining is the branch of Web Mining that analyzes the structure of the Net (or a part of it) in terms of the connections among web pages and their linkage design. Finally, the goal of Web Usage Mining is to understand the usage behavior of website users. Within the context of Web Usage Mining, pattern discovery and pattern analysis make it possible to profile users and their preferences. Sequence rules are association rules ordered in time. Given a data set from a website that is characterized by sequences of visits, the proposal is to understand the differences among browsing sections through a Multidimensional Scaling solution, and then to obtain a graphical tool that visualizes the sequence rules in a new way. The resulting application lies halfway between Web Usage Mining and Web Structure Mining.
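
The pipeline can be sketched in plain Python in three steps. First, compute sequence-rule confidences from sessions. Second, turn them into a dissimilarity between sections. Third, embed the sections with classical (Torgerson) MDS. The dissimilarity 1 - max(conf(a -> b), conf(b -> a)) is a hypothetical choice, not the paper's. The power-iteration eigensolver is a small-matrix illustration that assumes the leading eigenvalues are positive; a real analysis would use a proper symmetric eigensolver. The session data and function names are made up.

```python
import math
import random


def rule_confidence(sessions, a, b):
    """Confidence of the sequence rule a -> b: the share of sessions containing a
    in which b is visited after the first visit to a."""
    with_a = [s for s in sessions if a in s]
    if not with_a:
        return 0.0
    return sum(b in s[s.index(a) + 1:] for s in with_a) / len(with_a)


def rule_dissimilarity(sessions, sections):
    """Hypothetical dissimilarity: 1 minus the stronger of the two rule confidences."""
    return [[0.0 if a == b else
             1.0 - max(rule_confidence(sessions, a, b), rule_confidence(sessions, b, a))
             for b in sections] for a in sections]


def classical_mds(D, dims=2, iters=500, seed=0):
    """Classical MDS via power iteration with deflation.

    Assumes the leading eigenvalues of the double-centered matrix are positive
    and dominant; negative eigenvalues are clipped to zero.
    """
    n = len(D)
    sq = [[d * d for d in row] for row in D]
    means = [sum(row) / n for row in sq]  # row means equal column means for symmetric D
    grand = sum(means) / n
    B = [[-0.5 * (sq[i][j] - means[i] - means[j] + grand) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        v = [rng.random() for _ in range(n)]
        for _ in range(iters):
            w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * B[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][d] = v[i] * scale
        B = [[B[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]  # deflate
    return coords


sessions = [["home", "news", "sport"], ["home", "sport"], ["news", "sport"], ["home", "news"]]
sections = ["home", "news", "sport"]
D = rule_dissimilarity(sessions, sections)
print(classical_mds(D))  # 2-D coordinates, one row per section
```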

Archive | 2008

Posterior Prediction Modelling of Optimal Trees

Roberta Siciliano; Massimo Aria; Antonio D’Ambrosio

The framework of this paper is classification and regression trees, also known as tree-based methods, binary segmentation, tree partitioning and decision trees. Trees can be fruitfully used either to explore and understand the dependence relationship between the response variable and a set of predictors, or to assign the response class or value to new objects for which only the measurements of the predictors are known. Since the introduction of the two-stage splitting procedure in 1992, the research unit in Naples has made several contributions to this field, one of the main issues being the combination of tree partitioning with statistical models. This paper provides a new idea of knowledge extraction using trees and models. It deals with the trade-off between the interpretability of the tree structure (i.e., exploratory trees) and the accuracy of the decision tree model (i.e., decision tree-based rules). Prospective and retrospective views of using models and trees are discussed. In particular, we introduce a tree-based methodology that grows an optimal tree structure with posterior prediction modelling, to be used as a decision rule for new objects. The general methodology is presented and a special case is described in detail. Finally, an application to a real-world data set is shown.

Archive | 2006

Boosted Incremental Tree-based Imputation of Missing Data

Roberta Siciliano; Massimo Aria; Antonio D’Ambrosio

Tree-based procedures have recently been considered as nonparametric tools for missing data imputation when dealing with large data structures and no probability assumptions. A previous work used an incremental algorithm based on cross-validated decision trees and a lexicographic ordering of the single data to be imputed. This paper considers an ensemble method in which a tree-based model is used as the learner. Furthermore, the incremental imputation handles the missing data of one variable at a time. As a result, the proposed method yields more accurate imputations through a more efficient algorithm. A simulation case study shows the overall good performance of the proposed method against some competitors. A MATLAB implementation extends the Tree Harvest Software for non-standard classification and regression trees.
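
The following sketches the ensemble ingredient, assuming least-squares boosting with shrinkage and a one-predictor regression stump as the tree-based learner. The abstract does not name the boosting variant, so this is a stand-in; all names and data are hypothetical. On a two-step pattern, a single stump leaves residual error. The boosted ensemble reduces that error, because each round fits a new stump to the residuals of the current ensemble.

```python
from statistics import mean


def fit_stump(x, y):
    """Least-squares stump on one predictor: returns (threshold, left mean, right mean).

    With a constant predictor no split exists, and the stump predicts mean(y) everywhere.
    """
    best_sse, best = None, (float("inf"), mean(y), mean(y))
    for t in sorted(set(x))[:-1]:
        left = [v for xi, v in zip(x, y) if xi <= t]
        right = [v for xi, v in zip(x, y) if xi > t]
        ml, mr = mean(left), mean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best_sse is None or sse < best_sse:
            best_sse, best = sse, (t, ml, mr)
    return best


def stump_predict(stump, xi):
    t, ml, mr = stump
    return ml if xi <= t else mr


def boost(x, y, n_rounds=100, shrinkage=0.1):
    """Least-squares boosting: each new stump is fitted to the current residuals."""
    base = mean(y)
    residuals = [v - base for v in y]
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        residuals = [r - shrinkage * stump_predict(stump, xi) for xi, r in zip(x, residuals)]
    return lambda xi: base + shrinkage * sum(stump_predict(s, xi) for s in stumps)


def sse(predict, x, y):
    return sum((predict(xi) - yi) ** 2 for xi, yi in zip(x, y))


x = [1, 2, 3, 4, 5, 6]
y = [1, 1, 5, 5, 3, 3]  # a two-step pattern that no single split can capture
single = fit_stump(x, y)
print(sse(lambda xi: stump_predict(single, xi), x, y))  # 4
print(sse(boost(x, y), x, y))
```

Shrinkage makes each stump's contribution small. As a result, many rounds are needed, but the ensemble can combine splits at different thresholds.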

Archive | 2015

A New Proposal for Tree Model Selection and Visualization

Carmela Iorio; Massimo Aria; Antonio D’Ambrosio

The most common approach to building a decision tree is a two-step procedure: growing a full tree and then pruning it back. The goal is to identify the tree with the lowest error rate. Alternative pruning criteria have been proposed in the literature. Within the framework of recursive partitioning algorithms by tree-based methods, this paper contributes to both the visual representation of the data partition in a geometrical space and the selection of the decision tree. In our visual approach, the best tree and the weakest links can be identified directly from a graphical analysis of the tree structure, without considering the pruning sequence. The results in terms of error rate are very similar to those returned by the classification and regression trees (CART) procedure, showing that this new way of selecting the best tree is a valid alternative to the well-known cost-complexity pruning.
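
For comparison with the cost-complexity pruning that the paper benchmarks against, here is the standard CART weakest-link computation. For each internal node t, g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1), where R(T_t) sums the risks of the leaves below t. The node with the smallest g(t) is pruned first. This is the textbook CART rule, not the authors' graphical selection method. The nested-dict tree and its risk values are hypothetical.

```python
def subtree_stats(node):
    """Return (sum of leaf risks, number of leaves) for the subtree rooted at node."""
    children = node.get("children", [])
    if not children:
        return node["risk"], 1
    stats = [subtree_stats(c) for c in children]
    return sum(r for r, _ in stats), sum(k for _, k in stats)


def weakest_link(node, path="root"):
    """Return (g, path) of the internal node with the smallest
    g(t) = (R(t) - R(T_t)) / (|leaves(T_t)| - 1), or None for a single leaf."""
    if not node.get("children"):
        return None
    leaf_risk, n_leaves = subtree_stats(node)
    best = ((node["risk"] - leaf_risk) / (n_leaves - 1), path)
    for i, child in enumerate(node["children"]):
        cand = weakest_link(child, f"{path}.{i}")
        if cand is not None and cand[0] < best[0]:
            best = cand
    return best


# Risks as numbers of misclassified training cases (hypothetical values).
tree = {"risk": 50, "children": [
    {"risk": 20, "children": [{"risk": 5}, {"risk": 10}]},
    {"risk": 10},
]}
print(weakest_link(tree))  # (5.0, 'root.0')
```

Collapsing the weakest link and repeating the step yields the nested pruning sequence of cost-complexity pruning. This is the sequence that the visual approach described above does not need to inspect.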
