
Publications


Featured research published by Mirko Cesarini.


Information Processing and Management | 2015

A model-based evaluation of data quality activities in KDD

M Mezzanzanica; Roberto Boselli; Mirko Cesarini; Fabio Mercorio

We live in the Information Age, where most personal, business, and administrative data are collected and managed electronically. However, poor data quality may affect the effectiveness of knowledge discovery processes, making the development of data improvement steps a significant concern. In this paper we propose the Multidimensional Robust Data Quality Analysis, a domain-independent technique aimed at improving data quality by evaluating the effectiveness of a black-box cleansing function. The proposed approach has been realized through model checking techniques and then applied to a weakly structured dataset describing the working careers of millions of people. Our experimental outcomes show the effectiveness of the model-based approach: they provide a fine-grained analysis of both the source dataset and the cleansing procedures, enabling domain experts to identify the most relevant quality issues as well as the action points for improving the cleansing activities. Finally, an anonymized version of the dataset and the analysis results have been made publicly available to the community.


International Conference on Data Technologies and Applications | 2013

Automatic Synthesis of Data Cleansing Activities

M Mezzanzanica; Roberto Boselli; Mirko Cesarini; Fabio Mercorio

Data cleansing is growing in importance among both public and private organisations, mainly due to the large amounts of data exploited to support decision-making processes. This paper shows how model-based verification algorithms (namely, model checking) can contribute to addressing data cleansing issues; furthermore, a new benchmark problem focusing on labour market dynamics is introduced. The consistent evolution of the data is checked against a model defined on the basis of domain knowledge. We then formally introduce the concept of a universal cleanser, i.e. an object which summarises the set of cleansing actions for each feasible data inconsistency (according to a given consistency model), and provide an algorithm which synthesises it. The universal cleanser can be seen as a repository of corrective interventions useful for developing cleansing routines. We applied our approach to a dataset derived from Italian labour market data, making the whole dataset and outcomes publicly available to the community, so that the results we present can be shared and compared with other techniques.
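The universal-cleanser idea can be illustrated with a minimal sketch: a finite-state consistency model over career events, plus a table mapping each inconsistency class to its feasible corrective actions. The event names, states, and corrections below are illustrative assumptions, not the paper's actual model.

```python
# Minimal sketch of a finite-state consistency model for a worker's career:
# a career is a sequence of "start"/"end" events, and the model forbids
# starting a job while one is active or ending a job that never started.
# Event names and rules are illustrative assumptions.

def first_inconsistency(events):
    """Return the index of the first inconsistent event, or None."""
    employed = False  # single boolean state of the finite-state model
    for i, ev in enumerate(events):
        if ev == "start" and employed:
            return i  # overlapping employment spells
        if ev == "end" and not employed:
            return i  # closing a spell that was never opened
        employed = (ev == "start")
    return None

# In the spirit of the universal cleanser, each inconsistency class maps to
# the set of feasible corrective actions (again, invented for illustration).
CORRECTIONS = {
    "start-while-employed": ["insert missing end event", "drop duplicate start"],
    "end-while-unemployed": ["insert missing start event", "drop spurious end"],
}

print(first_inconsistency(["start", "end", "start"]))  # -> None (consistent)
print(first_inconsistency(["start", "start", "end"]))  # -> 1
```

A real consistency model would have many more states and event types; the point is that every inconsistency the model can flag gets an enumerated set of repairs, which is what makes the cleanser "universal" with respect to that model.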


International Conference on Data Technologies and Applications | 2012

Data Quality Sensitivity Analysis on Aggregate Indicators

M Mezzanzanica; Roberto Boselli; Mirko Cesarini; Fabio Mercorio

Decision-making activities place stringent requirements on data and information quality. The quality of data sources is frequently poor, so a cleansing process is required before such data can be used for decision making. When alternative (and more trusted) data sources are not available, data can be cleansed only using business rules derived from domain knowledge. Business rules focus on fixing inconsistencies, but an inconsistency can be cleansed in different ways (i.e. the correction may not be deterministic), so the choice of how to cleanse the data can (even strongly) affect the aggregate values computed for decision-making purposes. The paper proposes a methodology exploiting Finite State Systems to quantitatively estimate how computed variables and indicators might be affected by the uncertainty related to low data quality, independently of the data cleansing methodology used. The methodology has been implemented and tested on a real case scenario, providing effective results.
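The core of the sensitivity idea can be sketched in a few lines: when an inconsistent record admits several plausible corrections, compute the aggregate indicator under every combination of corrections and report the induced range. The records, field names, and candidate values below are invented for illustration.

```python
from itertools import product

# Toy sensitivity analysis: two records are inconsistent, and each admits
# two candidate corrections; the indicator is total months worked.
records = [
    {"months_worked": 12},                          # clean record
    {"months_worked": None, "candidates": [0, 12]}, # inconsistent record
    {"months_worked": None, "candidates": [6, 12]}, # inconsistent record
]

clean_total = sum(r["months_worked"] for r in records
                  if r["months_worked"] is not None)
choices = [r["candidates"] for r in records if r["months_worked"] is None]

# Evaluate the indicator under every combination of cleansing choices.
totals = [clean_total + sum(combo) for combo in product(*choices)]
print(min(totals), max(totals))  # range induced by the cleansing choices
```

A narrow range means the indicator is robust to the cleansing choices; a wide range means the non-determinism of the corrections materially affects the decision-making figures, which is exactly what the methodology aims to quantify.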


Knowledge Discovery and Data Mining | 2013

Inconsistency Knowledge Discovery for Longitudinal Data Management: A Model-Based Approach

Roberto Boselli; Mirko Cesarini; Fabio Mercorio; M Mezzanzanica

In recent years, the growing diffusion of IT-based services has given rise to the use of huge masses of data. However, using data for analytical and decision-making purposes requires performing several tasks, e.g. data cleansing, data filtering, data aggregation and synthesis. Tools and methodologies that empower people to appropriately manage the (high) complexity of large datasets are required.


Knowledge Discovery and Data Mining | 2014

A Policy-Based Cleansing and Integration Framework for Labour and Healthcare Data

Roberto Boselli; Mirko Cesarini; Fabio Mercorio; M Mezzanzanica

Large amounts of data are collected by public administrations and healthcare organizations; integrating the data scattered across several information systems can facilitate the comprehension of complex scenarios and support the activities of decision makers.


Intelligent Data Analysis | 2011

Data quality through model checking techniques

M Mezzanzanica; Roberto Boselli; Mirko Cesarini; Fabio Mercorio

The paper introduces the Robust Data Quality Analysis, which exploits formal methods to support data quality improvement processes. The proposed methodology can be applied to data sources containing sequences of events that can be modelled by Finite State Systems. Consistency rules (derived from domain business rules) can be expressed with formal methods and automatically verified on the data, both before and after the execution of cleansing activities. The assessment results can provide useful information for improving the data quality processes. The paper outlines the preliminary results of applying the methodology to a real case scenario: the cleansing of a very low quality database containing the work careers of the inhabitants of an Italian province. The methodology proved successful, giving insights into the data quality levels and providing suggestions on how to improve the overall data quality process.


IEEE International Conference on Semantic Computing | 2015

Challenge: Processing web texts for classifying job offers

Flora Amato; Roberto Boselli; Mirko Cesarini; Fabio Mercorio; M Mezzanzanica; Vincenzo Moscato; Fabio Persia; Antonio Picariello

Today the Web represents a rich source of labour market data for both public and private operators, as a growing number of job offers are advertised through Web portals and services. In this paper we apply and compare several techniques, namely explicit rules, machine learning, and LDA-based algorithms, to classify a real dataset of Web job offers, collected from 12 heterogeneous sources, against a standard classification system of occupations.
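The explicit-rules technique compared above can be sketched as a hand-written keyword table that maps a job-offer title to an occupation class. The keywords and occupation labels below are invented for illustration; a real system would target a standard taxonomy of occupations.

```python
# Toy explicit-rules classifier for job-offer titles. Each rule pairs a set
# of trigger keywords with an occupation label; the first matching rule wins.
# Keywords and labels are illustrative assumptions, not a real taxonomy.
RULES = [
    ({"developer", "programmer", "software"}, "ICT professional"),
    ({"nurse", "physician", "doctor"}, "Health professional"),
    ({"teacher", "tutor", "lecturer"}, "Teaching professional"),
]

def classify(title):
    """Return the occupation label for a job-offer title, via keyword rules."""
    words = set(title.lower().split())
    for keywords, occupation in RULES:
        if words & keywords:  # any keyword match triggers the rule
            return occupation
    return "Unclassified"

print(classify("Senior Software Developer"))  # -> ICT professional
print(classify("Primary school teacher"))     # -> Teaching professional
```

Rule-based classifiers like this are cheap and transparent but brittle against the vocabulary variety of real Web job offers, which is why the paper compares them with machine-learning and LDA-based alternatives.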


Journal of Data and Information Quality | 2015

A Model-Based Approach for Developing Data Cleansing Solutions

M Mezzanzanica; Roberto Boselli; Mirko Cesarini; Fabio Mercorio

Data extracted from electronic archives is a valuable asset; however, the issue of (poor) data quality should be addressed before performing data analysis and decision-making activities. Poor-quality data is frequently cleansed using business rules derived from domain knowledge. Unfortunately, designing and implementing cleansing activities based on business rules requires significant effort. In this article, we illustrate a model-based approach for performing inconsistency identification and corrective interventions, thus simplifying the process of developing cleansing activities. The article shows how the cleansing activities required to perform a sensitivity analysis can be easily developed using the proposed model-based approach. The sensitivity analysis provides insights into how cleansing activities can affect the results of indicator computation. The approach has been successfully used on a database describing the working histories of the population of an Italian area. A model formalizing how the data should evolve over time (i.e., a data consistency model) in this domain was created (by means of formal methods) and used to perform the cleansing and sensitivity analysis activities.


Journal of Intelligent Information Systems | 2017

WoLMIS: a labor market intelligence system for classifying web job vacancies

Roberto Boselli; Mirko Cesarini; Stefania Marrara; Fabio Mercorio; M Mezzanzanica; Gabriella Pasi; Marco Viviani

In recent decades, an increasing number of employers and job seekers have been relying on Web resources to get in touch and to find a job. If appropriately retrieved and analyzed, the huge number of job vacancies available today on online job portals can provide detailed and valuable information about Web Labor Market dynamics and trends. In particular, this information can be useful to all actors, public and private, who play a role in the European Labor Market. This paper presents WoLMIS, a system aimed at collecting and automatically classifying multilingual Web job vacancies with respect to a standard taxonomy of occupations. The proposed system has been developed for Cedefop, the European agency which supports the development of European Vocational Education and Training (VET) policies and contributes to their implementation. In particular, WoLMIS allows analysts and Labor Market specialists to make sense of Labor Market dynamics and trends across several European countries by overcoming linguistic boundaries. A detailed experimental evaluation is also provided for a set of about 2 million job vacancies collected from UK and Irish Web job sites from June to September 2015.


Semantic Technologies for E-Government | 2010

SEEMP: A Networked Marketplace for Employment Services

Irene Celino; Dario Cerizza; Mirko Cesarini; Emanuele Della Valle; Flavio De Paoli; Jacky Estublier; Maria Grazia Fugini; Asunción Gómez Pérez; Mick Kerrigan; Pascal Guarrera; M Mezzanzanica; Jaime Ramírez; Boris Villazon; Gang Zhao

Human capital is increasingly the key factor of economic growth and competitiveness in the information age and knowledge economy. However, owing to a still-fragmented employment market, compounded by the enlargement of the EU, human resources are not effectively exchanged and deployed. The business innovation of SEEMP develops a vision of an Employment Mediation Marketplace (EMM) aiming at market transparency and efficient mediation. Its technological innovation provides a federated marketplace of employment agencies through a peer-to-peer network of employment data and mediation services. In other words, the solution under development de-fragments the employment market through a Web-based collaborative network. The SEEMP-enabled employment marketplace will strengthen the social organization of public employment administration, maximize the business turnover of private employment agencies, improve citizens' productivity and welfare, and increase the competitiveness and performance of business.

Collaboration


Dive into Mirko Cesarini's collaborations.

Top Co-Authors

Roberto Boselli | University of Milano-Bicocca
Antonio Picariello | University of Naples Federico II
Gabriella Pasi | University of Milano-Bicocca
Vincenzo Moscato | University of Naples Federico II
Marco Pappagallo | University of Milano-Bicocca
Jaime Ramírez | Technical University of Madrid