Giovanni Felici
National Research Council
Publications
Featured research published by Giovanni Felici.
PLOS ONE | 2012
Robin van Velzen; Emanuel Weitschek; Giovanni Felici; Freek T. Bakker
Recently diverged species are challenging for identification, yet they are frequently of special interest scientifically as well as from a regulatory perspective. DNA barcoding has proven instrumental in species identification, especially in insects and vertebrates, but for the identification of recently diverged species it has been reported to be problematic in some cases. Problems are mostly due to incomplete lineage sorting or simply lack of a ‘barcode gap’ and probably related to large effective population size and/or low mutation rate. Our objective was to compare six methods in their ability to correctly identify recently diverged species with DNA barcodes: neighbor joining and parsimony (both tree-based), nearest neighbor and BLAST (similarity-based), and the diagnostic methods DNA-BAR, and BLOG. We analyzed simulated data assuming three different effective population sizes as well as three selected empirical data sets from published studies. Results show, as expected, that success rates are significantly lower for recently diverged species (∼75%) than for older species (∼97%) (P<0.00001). Similarity-based and diagnostic methods significantly outperform tree-based methods, when applied to simulated DNA barcode data (P<0.00001). The diagnostic method BLOG had highest correct query identification rate based on simulated (86.2%) as well as empirical data (93.1%), indicating that it is a consistently better method overall. Another advantage of BLOG is that it offers species-level information that can be used outside the realm of DNA barcoding, for instance in species description or molecular detection assays. Even though we can confirm that identification success based on DNA barcoding is generally high in our data, recently diverged species remain difficult to identify. Nevertheless, our results contribute to improved solutions for their accurate identification.
European Journal of Operational Research | 2003
Vanda De Angelis; Giovanni Felici; Paolo Impelluso
In this paper we address an important design and management problem in a health care centre where a set of services, required by one or more categories of users in different but predetermined sequences, are provided. Various types of servers and facilities are assigned to the different services and are subject to budget restrictions. The number of servers of each type assigned to each service affects the overall efficiency of the system and its indicators, such as the total time spent in the system by the various categories of users. We present a methodology that interactively uses system simulation, estimation of target function and optimisation to calculate and validate the optimal configuration of servers. Such a methodology constitutes the core of an effective decision support system for health care managers. In this paper we describe an application of the proposed methodology and show its effectiveness in improving the management of a transfusion centre.
Informs Journal on Computing | 2002
Giovanni Felici; Klaus Truemper
This paper describes a method for learning logic relationships that correctly classify a given data set. The method derives from given logic data certain minimum cost satisfiability problems, solves these problems, and deduces from the solutions the desired logic relationships. Uses of the method include data mining, learning logic in expert systems, and identification of critical characteristics for recognition systems. Computational tests have proved that the method is fast and effective.
Molecular Ecology Resources | 2013
Emanuel Weitschek; Robin van Velzen; Giovanni Felici; Paola Bertolazzi
BLOG (Barcoding with LOGic) is a diagnostic and character‐based DNA Barcode analysis method. Its aim is to classify specimens to species based on DNA Barcode sequences and on a supervised machine learning approach, using classification rules that compactly characterize species in terms of DNA Barcode locations of key diagnostic nucleotides. The BLOG 2.0 software, its fundamental modules, online/offline user interfaces and recent improvements are described. These improvements affect both methodology and software design, and lead to the availability of different releases on the website http://dmb.iasi.cnr.it/blog-downloads.php. Previous and new experimental tests show that BLOG 2.0 outperforms previous versions as well as other DNA Barcode analysis methods.
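To illustrate what a rule built from "key diagnostic nucleotides" might look like, here is a minimal sketch. It is not the BLOG algorithm: it assumes aligned sequences of equal length and only considers single-site rules, and the function names `find_diagnostic_sites` and `classify` are hypothetical.

```python
from collections import defaultdict


def find_diagnostic_sites(sequences, labels):
    """Return {species: [(position, nucleotide), ...]} for single-site diagnostics.

    Simplifying assumption (not taken from BLOG): a site is diagnostic for a
    species if all of its specimens share one nucleotide at that position and
    no specimen of any other species carries that nucleotide there.
    """
    by_species = defaultdict(list)
    for seq, label in zip(sequences, labels):
        by_species[label].append(seq)

    length = len(sequences[0])
    rules = {}
    for species, members in by_species.items():
        others = [s for s, lab in zip(sequences, labels) if lab != species]
        sites = []
        for pos in range(length):
            own = {s[pos] for s in members}
            if len(own) != 1:
                continue  # the species itself is variable at this site
            nucleotide = next(iter(own))
            if all(o[pos] != nucleotide for o in others):
                sites.append((pos, nucleotide))
        rules[species] = sites
    return rules


def classify(query, rules):
    """Assign the query to the species whose diagnostic sites it matches best.

    Returns None when the query matches no diagnostic site of any species.
    """
    best, best_score = None, 0.0
    for species, sites in rules.items():
        if not sites:
            continue
        score = sum(query[pos] == nuc for pos, nuc in sites) / len(sites)
        if score > best_score:
            best, best_score = species, score
    return best
```

With toy reference sequences for three species, `find_diagnostic_sites` returns a readable list of positions and nucleotides per species, which is the kind of species-level information the abstract refers to.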
Management Science | 2004
Giovanni Felici; Claudio Gentile
In this paper we formulate and efficiently solve staff scheduling problems for large organizations that provide continuous services to customers. We describe an integer programming approach for a class of such problems, where solutions have to obey a number of constraints related to workload balancing, shift compatibility, and distribution of days off. The formulation of the constraints is general and can be extended to different personnel management problems where staff members must cover shifts, and management must assign a fixed number of days off per week. The model maximizes staff satisfaction, expressed by positive weights for pairs of shifts in consecutive days. We consider the associated polytope and study its structure, determining some classes of inequalities that are facet inducing for special subproblems and other valid classes. We also identify a particular subproblem whose solution can be used to determine strong cuts for the complete problem. In addition, we design special branching rules that break the symmetries that arise in the solution space and have a large impact on the efficiency of the method. The validity of this approach has been ascertained by extensive computational tests; moreover, the operations research (OR) department of an airline has implemented the method to solve ground staff management problems.
Biodata Mining | 2014
Emanuel Weitschek; Giulia Fiscon; Giovanni Felici
Background: Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods: In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. Results: Software that converts DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species specific positions and nucleotide assignments. On synthetic data the supervised machine learning methods obtain superior classification performances with respect to the traditional DNA Barcode classification methods. On empirical data their classification performances are at a comparable level to the other methods. Conclusions: The classification analysis shows that supervised machine learning methods are promising candidates for successfully handling the DNA Barcoding species classification problem, obtaining excellent performances. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community.
Archive | 2006
Vanda De Angelis; Giovanni Felici; Gabriella Mancinelli
Feature Selection methods in Data Mining and Data Analysis problems aim at selecting a subset of the variables, or features, that describe the data in order to obtain a more essential and compact representation of the available information. The selected subset has to be small in size and must retain the information that is most useful for the specific application. The role of Feature Selection is particularly important when computationally expensive Data Mining tools are used, or when the data collection process is difficult or costly. Feature Selection problems are typically solved in the literature using search techniques, where the evaluation of a specific subset is accomplished by a proper function (filter methods) or directly by the performance of a Data Mining tool (wrapper methods). In this work we show how the Feature Selection problem can be formulated as a subgraph selection problem derived from the lightest k-subgraph problem, and solved as an Integer Program. The proposed formulation is very flexible, as additional conditions on the solution can be added in the formulation. Although optimal solutions for such problems are difficult to find in the worst case, a large number of test instances have been solved efficiently by commercial tools. Finally, an application to a database on urban mobility is presented, where the proposed method is integrated in the Data Mining tool named Lsquare and is compared with other approaches.
Computers & Mathematics With Applications | 2008
Paola Bertolazzi; Giovanni Felici; Paola Festa; Giuseppe Lancia
In this paper we investigate logic classification and related feature selection algorithms for large biomedical data sets. When the data is in binary/logic form, the feature selection problem can be formulated as a Set Covering problem of very large dimensions, whose solution is computationally challenging. We propose an alternative approximated formulation for feature selection that results in an extension of Set Covering of compact size, and use the logic classifier Lsquare to test its performances on two well-known data sets. An ad hoc metaheuristic of the GRASP type is used to solve efficiently the feature selection problem. A simple and effective method to convert rational data into logic data by interval mapping is also described. The computational results obtained are promising and the use of logic models, that can be easily understood and integrated with other domain knowledge, is one of the major strengths of this approach.
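The two ingredients named in this abstract, interval mapping of rational data and the Set Covering view of feature selection, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's formulation: the cut points are supplied by hand, each covering row corresponds to one pair of samples from different classes, and the names `to_logic`, `covering_rows` and `is_cover` are hypothetical.

```python
from itertools import combinations


def to_logic(values, cutpoints):
    """Map one numeric column to binary features, one per cut point.

    Feature k is 1 when the value exceeds cutpoints[k]. How cut points are
    chosen is an assumption left to the caller.
    """
    return [[int(v > c) for c in cutpoints] for v in values]


def covering_rows(X, y):
    """Build the Set Covering rows for binary data X with class labels y.

    Each row is the set of features on which two samples of different
    classes differ; a feasible selection must hit every row.
    """
    rows = []
    for i, j in combinations(range(len(X)), 2):
        if y[i] != y[j]:
            rows.append({f for f in range(len(X[i])) if X[i][f] != X[j][f]})
    return rows


def is_cover(selected, rows):
    """Check that the selected feature set separates every cross-class pair."""
    selected = set(selected)
    return all(row & selected for row in rows)
```

The number of rows grows with the product of the class sizes, which is why the abstract describes the exact formulation as being of very large dimensions.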
European Journal of Operational Research | 2016
Paola Bertolazzi; Giovanni Felici; Paola Festa; Giulia Fiscon; Emanuel Weitschek
Feature selection methods are used in machine learning and data analysis to select a subset of features that may be successfully used in the construction of a model for the data. These methods are applied under the assumption that often many of the available features are redundant for the purpose of the analysis. In this paper, we focus on a particular method for feature selection in supervised learning problems, based on a linear programming model with integer variables. For the solution of the optimization problem associated with this approach, we propose a novel robust metaheuristics algorithm that relies on a Greedy Randomized Adaptive Search Procedure, extended with the adoption of short memory and a local search strategy. The performances of our heuristic algorithm are successfully compared with those of well-established feature selection methods, both on simulated and real data from biological applications. The obtained results suggest that our method is particularly suited for problems with a very large number of binary or categorical features.
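A generic Greedy Randomized Adaptive Search Procedure for a covering-style feature selection problem can be sketched as below. This is a textbook GRASP skeleton, not the authors' algorithm: it omits the short memory extension, uses a simple "drop redundant features" local search, and assumes the input is a list of sets, each holding the features that separate one pair of samples from different classes. The function name and the parameters `iterations`, `alpha` and `seed` are hypothetical.

```python
import random


def grasp_set_cover(rows, n_features, iterations=50, alpha=0.3, seed=0):
    """Return a small set of features that intersects every row.

    rows: list of sets of feature indices; every row must be hit.
    alpha: greediness of the restricted candidate list (0 = pure greedy).
    """
    rng = random.Random(seed)
    best = None
    for _ in range(iterations):
        # Construction phase: add features until every row is covered.
        selected = set()
        uncovered = list(rows)
        while uncovered:
            gains = {}
            for f in range(n_features):
                if f not in selected:
                    g = sum(f in r for r in uncovered)
                    if g > 0:
                        gains[f] = g
            if not gains:
                raise ValueError("a row cannot be covered by any feature")
            g_max, g_min = max(gains.values()), min(gains.values())
            threshold = g_max - alpha * (g_max - g_min)
            # Restricted candidate list: features close to the best gain.
            rcl = [f for f, g in gains.items() if g >= threshold]
            choice = rng.choice(rcl)
            selected.add(choice)
            uncovered = [r for r in uncovered if choice not in r]
        # Local search phase: remove features that are no longer needed.
        for f in sorted(selected):
            trial = selected - {f}
            if all(r & trial for r in rows):
                selected = trial
        if best is None or len(selected) < len(best):
            best = selected
    return best
```

The randomized candidate list is what distinguishes GRASP from a plain greedy heuristic: repeated restarts explore different constructions, and the best solution found across iterations is kept.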
Genomics | 2014
Dimitris Polychronopoulos; Emanuel Weitschek; Slavica Dimitrieva; Philipp Bucher; Giovanni Felici; Yannis Almirantis
Little work has been done on the analysis of the composition of conserved non-coding elements (CNEs) that are identified by comparisons of two or more genomes and are found to exist in all metazoan genomes. Here we present the analysis of CNEs with a methodology that takes into account word occurrence at various length scales in the form of feature vector representation and rule-based classifiers. We implement our approach on both protein-coding exons and CNEs, originating from human, insect (Drosophila melanogaster) and worm (Caenorhabditis elegans) genomes, that are either identified in the present study or obtained from the literature. Alignment-free feature vector representation of sequences combined with rule-based classification methods leads to successful classification of the different CNE classes. Biologically meaningful results are derived by comparison with the genomic signatures approach, and classification rates for a variety of functional elements of the genomes along with surrogates are presented.
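An alignment-free feature vector based on word occurrence at several lengths can be sketched as follows. This is one common way to build such a representation, offered as an assumption rather than the paper's exact procedure: the word lengths, the normalization by window count, and the function name `kmer_features` are all hypothetical choices.

```python
from collections import Counter
from itertools import product


def kmer_features(sequence, word_lengths=(1, 2, 3), alphabet="ACGT"):
    """Return relative frequencies of all words of each length, concatenated.

    For each length k, the vector holds one entry per word over the alphabet
    (in lexicographic order of the alphabet), equal to the fraction of
    overlapping windows of length k that spell that word. Windows containing
    symbols outside the alphabet contribute to no entry.
    """
    features = []
    for k in word_lengths:
        n_windows = max(len(sequence) - k + 1, 0)
        counts = Counter(sequence[i:i + k] for i in range(n_windows))
        denominator = max(n_windows, 1)  # avoid division by zero on short input
        for word in product(alphabet, repeat=k):
            features.append(counts["".join(word)] / denominator)
    return features
```

Because every sequence maps to a vector of the same length regardless of its own length, the result can be passed directly to a rule-based or any other supervised classifier without alignment.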