Valerio Grossi | Researchain

Publication

Featured research published by Valerio Grossi.

Knowledge and Information Systems | 2012

Stream mining: a novel architecture for ensemble-based classification

Valerio Grossi; Franco Turini

Mining data streams has become an important and challenging task for a wide range of applications. In these scenarios, data tend to arrive in multiple, rapid and time-varying streams, thus constraining data mining algorithms to look at data only once. Maintaining an accurate model, e.g. a classifier, while the stream goes by requires a smart way of keeping track of the data that have already passed. Such a synthetic structure has to serve two purposes: distilling as much information as possible out of past data and allowing a fast reaction to concept drift, i.e. to the change in the data trend that necessarily affects the model. The paper outlines novel data structures and algorithms to tackle the above problem when the model mined out of the data is a classifier. The introduced model and the overall ensemble architecture are presented in detail, including how the approach can be extended to treat numerical attributes. A large part of the paper discusses the experiments and the comparisons with several existing systems. The comparisons show that the performance of our system in general, and in particular with respect to the reaction to concept drift, is at the top level.
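To make the idea of an ensemble that reacts to concept drift more concrete, here is a minimal sketch. It is not the architecture from the paper: the class names `MajorityByFeature` and `ChunkEnsemble`, the single categorical feature, and the rule of re-weighting each member by its accuracy on the newest chunk are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


class MajorityByFeature:
    """Toy base learner (hypothetical): most frequent label per feature value."""

    def __init__(self, chunk):
        counts = defaultdict(Counter)
        for x, y in chunk:
            counts[x][y] += 1
        self.table = {x: c.most_common(1)[0][0] for x, c in counts.items()}
        # Fallback for feature values never seen in this chunk.
        self.default = Counter(y for _, y in chunk).most_common(1)[0][0]

    def predict(self, x):
        return self.table.get(x, self.default)


class ChunkEnsemble:
    """Keeps at most max_models learners, one trained per chunk of the stream."""

    def __init__(self, max_models=3):
        self.max_models = max_models
        self.models = []  # list of [model, weight] pairs

    def predict(self, x):
        votes = Counter()
        for model, weight in self.models:
            votes[model.predict(x)] += weight
        return votes.most_common(1)[0][0] if votes else None

    def update(self, chunk):
        # Each chunk is seen once: first re-weight old members by their
        # accuracy on it, so members trained on an outdated concept fade out.
        for entry in self.models:
            correct = sum(entry[0].predict(x) == y for x, y in chunk)
            entry[1] = correct / len(chunk)
        # Assumption: the newest member starts with full weight.
        self.models.append([MajorityByFeature(chunk), 1.0])
        # Drop the weakest members once the ensemble is full.
        self.models.sort(key=lambda e: e[1], reverse=True)
        del self.models[self.max_models:]
```

Feeding the ensemble a chunk in which the label of each feature value has flipped drives the weight of the older member to zero, so the weighted vote follows the new concept.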

Software Engineering and Formal Methods | 2015

Clustering Formulation Using Constraint Optimization

Valerio Grossi; Anna Monreale; Mirco Nanni; Dino Pedreschi; Franco Turini

The problem of clustering a set of data is a textbook machine learning problem, but at the same time, at heart, a typical optimization problem. Given an objective function, such as minimizing the intra-cluster distances or maximizing the inter-cluster distances, the task is to find an assignment of data points to clusters that achieves this objective. In this paper, we present a constraint programming model for centroid-based clustering and one for density-based clustering. In particular, as a key contribution, we show how the expressivity introduced by formulating the problem in constraint programming makes the standard problem easy to extend with other constraints that generate interesting variants of the problem. We show this important aspect in two different ways: first, we show how the constraint programming formulation of density-based clustering makes it very similar to the label propagation problem; then, we propose a variant of the standard label propagation approach.
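The view of clustering as an optimization problem that can be extended with extra constraints can be illustrated with a small sketch. This is not the constraint programming model from the paper: exhaustive enumeration stands in for a solver, the data are one-dimensional, and the names `within_cluster_cost`, `best_assignment`, and the `cannot_link` parameter are assumptions made for the example.

```python
from itertools import product


def within_cluster_cost(points, labels, k):
    """Sum of squared distances of each point to its cluster centroid."""
    cost = 0.0
    for c in range(k):
        members = [p for p, l in zip(points, labels) if l == c]
        if not members:
            continue
        centroid = sum(members) / len(members)
        cost += sum((p - centroid) ** 2 for p in members)
    return cost


def best_assignment(points, k, cannot_link=()):
    """Search all assignments of points to k clusters (tiny inputs only).

    cannot_link is an optional list of index pairs (i, j) that must end up
    in different clusters: an example of an extra constraint added on top
    of the standard objective.
    """
    best, best_cost = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) < k:
            continue  # every cluster must be non-empty
        if any(labels[i] == labels[j] for i, j in cannot_link):
            continue  # violates a cannot-link constraint
        cost = within_cluster_cost(points, labels, k)
        if cost < best_cost:
            best, best_cost = labels, cost
    return best, best_cost


points = [1.0, 2.0, 10.0, 11.0]
plain_labels, plain_cost = best_assignment(points, k=2)
constrained_labels, constrained_cost = best_assignment(
    points, k=2, cannot_link=[(0, 1)]
)
```

Without constraints the two nearby pairs are grouped together. Forbidding the first two points from sharing a cluster changes the optimum and raises its cost, while the objective function itself stays unchanged.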

International Joint Conference on Artificial Intelligence | 2011

Kernel-based selective ensemble learning for streams of trees

Valerio Grossi; Alessandro Sperduti

Learning from streaming data represents an important and challenging task. Maintaining an accurate model while the stream goes by requires a smart way of tracking data changes through time, which give rise to concept drift. One way to treat this kind of problem is to resort to ensemble-based techniques. In this context, the advent of new technologies related to web and ubiquitous services calls for new learning approaches able to deal with structured, complex information, such as trees. Kernel methods enable the modeling of structured data in learning algorithms; however, they are computationally demanding. The contribution of this work is to show how an effective ensemble-based approach can be devised for streams of trees by optimizing the kernel-based model representation. Both efficacy and efficiency of the proposed approach are assessed for different models by using data sets exhibiting different levels and types of concept drift.

Data Mining and Knowledge Discovery | 2017

Survey on using constraints in data mining

Valerio Grossi; Andrea Romei; Franco Turini

This paper provides an overview of the current state-of-the-art on using constraints in knowledge discovery and data mining. The use of constraints in a data mining task requires specific definition and satisfaction tools during knowledge extraction. This survey proposes three groups of studies based on classification, clustering and pattern mining, whether the constraints are on the data, the models or the measures, respectively. We consider the distinctions between hard and soft constraint satisfaction, and between the knowledge extraction phases where constraints are considered. In addition to discussing how constraints can be used in data mining, we show how constraint-based languages can be used throughout the data mining process.

European Conference on Machine Learning | 2008

A Case Study in Sequential Pattern Mining for IT-Operational Risk

Valerio Grossi; Andrea Romei; Salvatore Ruggieri

IT-operational risk management consists of identifying, assessing, monitoring and mitigating the adverse risks of loss resulting from hardware and software system failures. We present a case study in IT-operational risk measurement in the context of a network of Private Branch eXchanges (PBXs). The approach relies on preprocessing and data mining tasks for the extraction of sequential patterns and their exploitation in the definition of a measure called expected risk.

Lecture Notes in Computer Science | 2016

Data Mining and Constraints: An Overview

Valerio Grossi; Dino Pedreschi; Franco Turini

This paper provides an overview of the current state-of-the-art on using constraints in knowledge discovery and data mining. The use of constraints requires mechanisms for defining and evaluating them during the knowledge extraction process. We give a structured account of three main groups of constraints based on the specific context in which they are defined and used. The aim is to provide a complete view on constraints as a building block of data mining methods.

ACM Transactions on Intelligent Systems and Technology | 2016

Driving Profiles Computation and Monitoring for Car Insurance CRM

Mirco Nanni; Roberto Trasarti; Anna Monreale; Valerio Grossi; Dino Pedreschi

Customer segmentation is one of the most traditional and valued tasks in customer relationship management (CRM). In this article, we explore the problem in the context of the car insurance industry, where the mobility behavior of customers plays a key role: different mobility needs, driving habits, and skills also imply different requirements (level of coverage provided by the insurance) and risks (of accidents). In the present work, we describe a methodology to extract several indicators describing the driving profile of customers, and we provide a clustering-oriented instantiation of the segmentation problem based on such indicators. Then, we consider the availability of a continuous flow of fresh mobility data sent by the circulating vehicles, aiming at keeping our segments constantly up to date. We tackle a major scalability issue that emerges in this context when the number of customers is large (namely, the communication bottleneck) by proposing and implementing a sophisticated distributed monitoring solution that reduces communications between vehicles and company servers to the essential. We validate the framework on a large database of real mobility data coming from GPS devices on private cars. Finally, we analyze the privacy risks that the proposed approach might involve for the users, providing and evaluating a countermeasure based on data perturbation.

VISUAL '08 Proceedings of the 10th international conference on Visual Information Systems: Web-Based Visual Information Search and Management | 2008

Extending KDDML with a Visual Metaphor for the KDD Process

Valerio Grossi; Andrea Romei

The spreading application of data mining techniques is clearly represented by the large number of suites supporting the knowledge discovery process. The latter can be viewed as real visual programming environments. Based on this assumption, we define some requirements which a typical data mining high-level graphical user interface should satisfy in order to guarantee a good level of interactivity and expressiveness. The aim of this study is to use these requirements during the engineering and development of a visual knowledge flow abstraction for the existing KDDML (Knowledge Discovery in Databases Markup Language) system. We introduce some features related not only to the visual metaphor, but also to the whole system, here intended as a real visual programming environment for the knowledge discovery process.

Archive | 2008

Discovering Strategic Behaviors in Multi-Agent Scenarios by Ontology-Driven Mining

Davide Bacciu; Andrea Bellandi; Barbara Furletti; Valerio Grossi; Andrea Romei

Providing human users with a structured insight into extensive data collections is a problem that, in recent years, has gathered increasing attention from the scientific community. An even more daring and ambitious research challenge is the recent attempt to address the same problem within the scope of so-called multi-agent systems. In this context, multiple autonomous and heterogeneous entities, i.e. the agents, populate the environment, performing actions and taking decisions based on internal policies and their knowledge of the world. The courses of action of the agents and their interaction with the environment and with each other generate extensive amounts of information. Underneath such flat data collections lies fundamental knowledge concerning strategies and common behaviors emerging from the agent actions. The advantage of extracting such “strategic” knowledge is twofold. On the one hand, it allows users a deeper insight into the multi-agent system, making it possible to discover patterns of actions that can be identified as interesting behaviors in the particular problem at hand. Consider, for instance, a multi-agent system simulating negotiation/selling phases in a particular market (Viamonte et al., 2006): the detection of interesting patterns of actions can bring to light specific market strategies that can be exploited to optimize revenues. On the other hand, the identified strategic knowledge can be exploited by the agents themselves to improve their performance, for instance by updating their internal representation of the world and their behavioral policies. In order to extract such “strategic” knowledge, in this work we take a data mining approach using association analysis, enriching it with the expressive power and the flexibility of ontologies.
Association rules can effectively tackle both aspects of strategic behavior extraction, since they provide both a human-understandable representation of the extracted knowledge and an action rule base that can be used to supply agents with procedural and high-level knowledge concerning the identified strategies. Ontologies offer a structured description of the domain knowledge while keeping data and their representation separate. An ontology refers to an “engineering artifact”, consisting of a specific vocabulary containing the terms used to describe a certain domain, and a set of explicit assumptions regarding the meaning of vocabulary words.
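As a reminder of what association analysis measures, the sketch below computes the standard support and confidence of a rule over a small set of transactions. The action names and the idea of treating each agent episode as one transaction are hypothetical, and the ontology-driven part of the approach is not modeled here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate of P(consequent | antecedent) over the transactions."""
    both = support(transactions, set(antecedent) | set(consequent))
    ante = support(transactions, antecedent)
    return both / ante if ante else 0.0


# Hypothetical log: each set holds the actions observed in one agent episode.
episodes = [
    {"lower_price", "advertise", "sale"},
    {"lower_price", "sale"},
    {"advertise"},
    {"lower_price", "advertise", "sale"},
]

rule_support = support(episodes, {"lower_price", "sale"})
rule_confidence = confidence(episodes, {"lower_price"}, {"sale"})
```

A rule such as "lower_price implies sale" with high support and confidence is the kind of pattern that could be read by a human analyst or handed back to an agent as an action rule.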

International Symposium on Neural Networks | 2017

A kernel-based ensemble classifier for evolving stream of trees with double concept drifting reaction

Valerio Grossi; Alessandro Sperduti

Modern mining approaches should be able to properly deal with the increased availability of structured data. Here we focus on the problem of processing streams of trees. Specifically, we cope with classification tasks. We show that by adopting a double concept drift reaction mechanism in the context of a kernel-based ensemble of classifiers, it is actually possible to have an effective and efficient system to process streams of trees. The original contribution consists in the introduction of a local concept drift mechanism, specifically designed for structured data, and used to compute the ensemble score function in such a way as to focus only on reliable (sub)trees belonging to the classification models which constitute the ensemble. Experimental results seem to support the relevance and usefulness of this local component for concept drift management.
