Publication


Featured research published by Massimo Ruffolo.


international conference on document analysis and recognition | 2009

PDF-TREX: An Approach for Recognizing and Extracting Tables from PDF Documents

Ermelinda Oro; Massimo Ruffolo

This paper presents PDF-TREX, a heuristic approach for table recognition and extraction from PDF documents. The heuristic starts from an initial set of basic content elements and aligns and groups them bottom-up, considering only their spatial features, in order to identify tabular arrangements of information. The aim of the approach is to recognize tables contained in PDF documents as a 2-dimensional grid on a Cartesian plane and to extract them as a set of cells equipped with 2-dimensional coordinates. Experiments, carried out on a dataset composed of tables contained in documents from different domains, show that the approach performs well in recognizing table cells. The approach aims at improving PDF document annotation and information extraction by providing an output that can be further processed for understanding table and document contents.


international conference on enterprise information systems | 2006

HıLεX: A System for Semantic Information Extraction from Web Documents

Massimo Ruffolo; Marco Manna

Recognizing and extracting meaningful information from unstructured Web documents, taking into account their semantics, is an important problem of information and knowledge management. This paper describes HıLεX, a system implementing a novel logic-based approach to information extraction from unstructured documents. The approach adopted in the HıLεX system is founded on a new two-dimensional representation of documents, and heavily exploits DLP+, an extension of disjunctive logic programming for ontology representation and reasoning, which has been recently implemented on top of the DLV reasoning environment. Unlike previous systems, which are mainly syntactic, HıLεX combines both semantic and syntactic knowledge for powerful information extraction. Ontologies, representing the semantics of information to be extracted, are encoded in DLP+, while the extraction patterns are expressed using regular expressions and an ad hoc two-dimensional grammar. The execution of DLP+ reasoning modules, encoding the grammar expressions, yields the actual extraction of information from the input document. HıLεX supports semantic information extraction from both HTML pages and flat text documents by using concise and very expressive extraction patterns.


intelligent data engineering and automated learning | 2004

DESCRY: A Density Based Clustering Algorithm for Very Large Data Sets

Fabrizio Angiulli; Clara Pizzuti; Massimo Ruffolo

A novel algorithm, named DESCRY, for clustering very large multidimensional data sets with numerical attributes is presented. DESCRY discovers clusters of different shape, size, and density, even when the data contains noise, by first finding and clustering a small set of points, called meta-points, that depict the shape of the clusters present in the data set well. Final clusters are obtained by assigning each point to one of the partial clusters. The computational complexity of DESCRY is linear both in the data set size and in the data set dimensionality. Experiments show very good qualitative results, comparable with those obtained by state-of-the-art clustering algorithms.


information integration and web-based applications & services | 2015

Using apps and rules in contextual workflows to semantically extract data from documents

Ermelinda Oro; Massimo Ruffolo

If smartly utilized, Big Data locked in unstructured sources, such as PDF documents, can yield unprecedented insights in solving tough business issues, optimizing business processes and improving customer relations. The challenge addressed in this paper is to unlock the value held in data buried in unstructured documents. We describe how a contextual workflow based approach is used to address, in a semantic and flexible way, various problems arising in processing data contained in documents. We present the MANTRA Smart Data Platform, which makes it possible to turn Big Data into Smart Data by means of contextual workflows composed of smart-cloud applications (APPs for short). Among others, the MANTRA Language APP executes MANTRA rules that are able to extract and annotate information contained in heterogeneous sources (raw text, PDF, HTML or other presentation-oriented document formats). Such rules exploit syntactic and semantic expressions, visual and spatial features, and natural language capabilities. Real-world applications show that the proposed approach is able to process a large amount of heterogeneous input documents, as well as extract and consolidate the information of interest.


international conference on enterprise information systems | 2018

A Methodology for Identifying Influencers and their Products Perception on Twitter.

Ermelinda Oro; Clara Pizzuti; Massimo Ruffolo

The massive amount of information posted by twitterers is attracting growing interest because of the several application fields in which it can be used, such as e-commerce. In fact, tweets enable users to express opinions about products and to influence other users. Thus, the identification of key social network influencers, together with their product perceptions and preferences, is crucial to enable marketers to apply effective techniques of viral marketing and recommendation. In this paper, we propose a methodology, based on multilinear algebra, that combines topological and contextual information to identify the most influential twitterers on specific topics or products, along with their perceptions and opinions about them. Experiments on a real use case regarding smartphones show the ability of the proposed methodology to find users that are authoritative in the social network in expressing their views about products and to identify the most relevant products for these users, along with the opinions they express.


international conference on agents and artificial intelligence | 2018

Language Identification of Similar Languages using Recurrent Neural Networks.

Ermelinda Oro; Massimo Ruffolo; Mostafa Sheikhalishahi

The goal of similar Language IDentification (LID) is to quickly and accurately identify the language of a text. It plays an important role in several Natural Language Processing (NLP) applications, where it is frequently used as a pre-processing technique. For example, information retrieval systems use LID as a filtering technique to provide users with documents written only in a given language. Although different approaches to this problem have been proposed, similar language identification, in particular when applied to short texts, remains a challenging task in NLP. In this paper, a method that combines word vector representations and Long Short-Term Memory (LSTM) has been implemented. The experimental evaluation on public and well-known datasets has shown that the proposed method improves the accuracy and precision of language identification tasks.


advances in databases and information systems | 2018

Contributions from ADBIS 2018 Workshops

Udo Bub; Ajantha Dahanayake; Jérôme Darmont; Claudia Diamantini; Fabio Fassetti; Eduardo Fermé; Nadia Kabachi; Ilaria Matteucci; Bálint Molnár; Sham Navathe; Ermelinda Oro; Marinella Petrocchi; Simona E. Rombo; Massimo Ruffolo; Angelo Spognardi; Bernhard Thalheim; Domenico Ursino

The ADBIS conferences provide an international forum for the presentation of research on database theory, development of advanced DBMS technologies, and their applications. The 22nd edition of ADBIS, held on September 2–5, 2018, in Budapest, Hungary, includes six thematic workshops collecting contributions from various domains representing new trends in the broad research areas of databases and information systems.


IEEE Transactions on Multimedia | 2018

Detecting Topic Authoritative Social Media Users: A Multilayer Network Approach

Ermelinda Oro; Clara Pizzuti; Nicola Procopio; Massimo Ruffolo

After the impressive diffusion of social media and microblogging websites of the last few years, the identification of users having the capability of influencing other users’ choices is an important research topic because of the opportunities it can offer to many business companies. Most of the existing approaches, however, detect influencers by relying on centrality measures computed on networks that connect users having different types of inter-relationships. In this paper, we propose a method capable of finding influential users by exploiting the contents of the messages posted by them to express opinions on items, by modeling these contents with a three-layer network. Layers represent users, items, and keywords, along with intra-layer interactions among the actors of the same layer. Inter-layer connections are triples (u, ...


applications of natural language to data bases | 2017

A Method for Querying Touristic Information Extracted from the Web

Ermelinda Oro; Massimo Ruffolo


Archive | 2002

Towards An Adaptive Mail Classifier

Giuseppe Manco; Elio Masciari; Massimo Ruffolo; Andrea Tagarelli

Collaboration


Top Co-Authors

Clara Pizzuti
National Research Council

Nicola Procopio
National Research Council

Claudia Diamantini
Marche Polytechnic University

Domenico Ursino
Mediterranea University of Reggio Calabria