Publication


Featured research published by Ermelinda Oro.


international conference on document analysis and recognition | 2009

PDF-TREX: An Approach for Recognizing and Extracting Tables from PDF Documents

Ermelinda Oro; Massimo Ruffolo

This paper presents PDF-TREX, a heuristic approach for table recognition and extraction from PDF documents. The heuristics start from an initial set of basic content elements and align and group them in a bottom-up way, considering only their spatial features, in order to identify tabular arrangements of information. The aim of the approach is to recognize tables contained in PDF documents as a 2-dimensional grid on a Cartesian plane and extract them as a set of cells equipped with 2-dimensional coordinates. Experiments, carried out on a dataset composed of tables contained in documents coming from different domains, show that the approach performs well in recognizing table cells. The approach aims at improving PDF document annotation and information extraction by providing an output that can be further processed for understanding table and document contents.
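The bottom-up idea in this abstract (group basic content elements using only their spatial features, then read the result as a grid of cells with 2-dimensional coordinates) can be illustrated with a minimal sketch. This is not the PDF-TREX algorithm itself: the `Element` type, the overlap-based grouping rule, and the function names are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """A basic content element with a bounding box (hypothetical layout)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def group_by_axis(elements, lo, hi):
    """Greedily merge elements whose projections on one axis overlap.

    Elements are visited in order of their lower bound, so an element
    joins the current group when it starts before that group ends.
    """
    groups = []  # each group is [lower bound, upper bound, members]
    for el in sorted(elements, key=lo):
        if groups and lo(el) <= groups[-1][1]:
            groups[-1][1] = max(groups[-1][1], hi(el))
            groups[-1][2].append(el)
        else:
            groups.append([lo(el), hi(el), [el]])
    return groups


def to_grid(elements):
    """Map each element to a (row, column) cell using only its coordinates."""
    rows = group_by_axis(elements, lambda e: e.y0, lambda e: e.y1)
    cols = group_by_axis(elements, lambda e: e.x0, lambda e: e.x1)
    row_of = {el: r for r, g in enumerate(rows) for el in g[2]}
    col_of = {el: c for c, g in enumerate(cols) for el in g[2]}
    grid = {}
    for el in elements:
        # Several elements may fall into one cell, so cells hold lists.
        grid.setdefault((row_of[el], col_of[el]), []).append(el.text)
    return grid
```

In this sketch, elements whose vertical extents overlap form a row and elements whose horizontal extents overlap form a column. A real system would need tolerances for whitespace and handling of cells that span several rows or columns, which this example deliberately leaves out.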


very large data bases | 2010

SXPath: extending XPath towards spatial querying on web documents

Ermelinda Oro; Massimo Ruffolo; Steffen Staab

Querying data from presentation formats like HTML, for purposes such as information extraction, requires the consideration of tree structures as well as the consideration of spatial relationships between laid out elements. The underlying rationale is that frequently the rendering of tree structures is very involved and undergoing more frequent updates than the resulting layout structure. Therefore, in this paper, we present Spatial XPath (SXPath), an extension of XPath 1.0 that allows for inclusion of spatial navigation primitives into the language resulting in conceptually simpler queries on Web documents. The SXPath language is based on a combination of a spatial algebra with formal descriptions of XPath navigation, and maintains polynomial time combined complexity. Practical experiments demonstrate the usability of SXPath.
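To make the notion of spatial navigation concrete, the sketch below selects rendered elements by their position relative to a reference element instead of by their place in the document tree. It is written in Python and does not reproduce SXPath syntax or semantics: the `Box` type, the `east_of` function, and the overlap rule are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A rendered element: tag, text, and bounding box (hypothetical)."""
    tag: str
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def east_of(ref, nodes):
    """Return nodes lying to the right of `ref` on the same visual line.

    A node qualifies when it starts at or after the right edge of `ref`
    and its vertical extent overlaps that of `ref`. Results are ordered
    by horizontal distance, nearest first.
    """
    hits = [
        n for n in nodes
        if n is not ref
        and n.x0 >= ref.x1
        and n.y0 <= ref.y1
        and ref.y0 <= n.y1
    ]
    return sorted(hits, key=lambda n: n.x0 - ref.x1)
```

A query such as "the text immediately to the right of the label Price:" then depends only on the rendered layout, so it keeps working if the markup around the two elements is restructured while their visual arrangement stays the same.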


international conference on tools with artificial intelligence | 2008

XONTO: An Ontology-Based System for Semantic Information Extraction from PDF Documents

Ermelinda Oro; Massimo Ruffolo

Information extraction is of paramount importance in several real-world applications in the areas of business intelligence and competitive and military intelligence. Although several sophisticated and indeed complex approaches have been proposed, they are still limited in many respects. This paper presents XONTO, a novel ontology-based system that allows the semantic extraction of information from unstructured PDF documents. The XONTO system is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules named descriptors. These rules represent patterns that make it possible to automatically recognize and extract ontology objects contained in PDF documents, even when information is arranged in tabular form. In this way, a self-describing ontology expresses both the semantics of the information to extract and the rules that, in turn, populate the ontology itself. The behavior and structure of the XONTO system are sketched by means of a running example.


International Journal on Artificial Intelligence Tools | 2009

ONTOLOGY-BASED INFORMATION EXTRACTION FROM PDF DOCUMENTS WITH XONTO

Ermelinda Oro; Massimo Ruffolo; Domenico Saccà

Information extraction is of paramount importance in several real-world applications in the areas of business, competitive, and military intelligence because it makes it possible to acquire information contained in unstructured documents and store it in structured forms. Unstructured documents have different internal encodings; one of the most widespread is the visualization-oriented Adobe Portable Document Format (PDF). Although several sophisticated and indeed complex approaches have been proposed, they are still limited in many respects. In particular, existing information extraction systems cannot be applied to PDF documents because of their completely unstructured nature, which poses many issues in defining IE approaches. This paper presents XONTO, a novel ontology-based system that allows the semantic extraction of information from PDF documents. The XONTO system is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules named descriptors. These rules represent patterns that make it possible to automatically recognize and extract ontology objects contained in PDF documents, even when information is arranged in tabular form. In this way, a self-describing ontology expresses both the semantics of the information to extract and the rules that, in turn, populate the ontology itself. The behavior and structure of the XONTO system are sketched by means of a running example.


industrial conference on data mining | 2011

Towards a spatial instance learning method for deep web pages

Ermelinda Oro; Massimo Ruffolo

A large part of the information available on the Web is hidden from conventional search engines because the Web pages containing such information are dynamically generated as answers to queries submitted through search forms filled in with keywords. Such pages are referred to as Deep Web pages and contain a huge amount of relevant information for different application domains. For these reasons there is constant high interest in efficiently extracting data from Deep Web data sources. In this paper we present a spatial instance learning method for Deep Web pages that exploits both the spatial arrangement and the visual features of data records and data items/fields produced by the layout engines of web browsers. The proposed method is independent of the encoding of Deep Web pages and of the presentation layout of data records. Furthermore, it allows for recognizing data records in Deep Web pages having multiple data regions. The effectiveness of the proposed method is demonstrated by experiments carried out on a dataset of 100 Web pages randomly selected from the best-known Deep Web sites. The results show that the method has very high precision and recall and that the system works much better than the MDR and ViNTS approaches applied to the same dataset.


Archive | 2012

The H

Marco Manna; Ermelinda Oro; Massimo Ruffolo; Mario Alviano; Nicola Leone

In this paper, a new technique for the optimization of (partially) bound queries over disjunctive Datalog programs with stratified negation is presented. The technique exploits the propagation of query bindings and extends the Magic Set optimization technique (originally defined for non-disjunctive programs). An important feature of disjunctive Datalog programs is nonmonotonicity, which calls for nondeterministic implementations, such as backtracking search. A distinguishing characteristic of the new method is that the optimization can be exploited also during the nondeterministic phase. In particular, after some assumptions have been made during the computation, parts of the program may become irrelevant to a query under these assumptions. This allows for dynamic pruning of the search space. In contrast, the effect of the previously defined Magic Set methods for disjunctive Datalog is limited to the deterministic portion of the process. In this way, the potential performance gain by using the proposed method can be exponential, as could be observed empirically. The correctness of the method is established and proved in a formal way thanks to a strong relationship between Magic Sets and unfounded sets that has not been studied in the literature before. This knowledge allows for extending the method and the correctness proof also to programs with stratified negation in a natural way. The proposed method has been implemented in the DLV system and various experiments on synthetic as well as on real-world data have been conducted. The experimental results on synthetic data confirm the utility of Magic Sets for disjunctive Datalog, and they highlight the computational gain that may be obtained by the new method with respect to the previously proposed Magic Set method for disjunctive Datalog programs. 
Further experiments on data taken from a real-life application show the benefits of the Magic Set method within an application scenario that has received considerable attention in recent years, the problem of answering user queries over possibly inconsistent databases originating from integration of autonomous sources of information.

The explosive growth and popularity of the Web has resulted in a huge amount of digital information sources on the Internet. Unfortunately, such sources only manage data, rather than the knowledge they carry. Recognizing, extracting, and structuring relevant information according to its semantics is a crucial task. Several approaches in the field of Information Extraction (IE) have been proposed to support the translation of semi-structured/unstructured documents into structured data or knowledge. Most of them have high precision but, since they are mainly syntactic, they often have low recall, are dependent on the document format, and ignore the semantics of the information they extract. In this paper, we describe a new approach for semantic information extraction that could represent the basis for automatically extracting highly structured data from unstructured web sources without any undesirable trade-off between precision and recall. In short, the approach (i) is ontology driven, (ii) is based on a unified representation of documents, (iii) integrates existing IE techniques, (iv) implements semantic regular expressions, (v) has been implemented through Answer Set Programming, (vi) is employed in real-world applications, and (vii) has received positive feedback from business customers.


international conference on enterprise information systems | 2009

\imath

Ermelinda Oro; Massimo Ruffolo

Managing costs and risks is a high-priority theme for health care professionals and providers. A promising approach for reducing costs and risks, and enhancing patient safety, is the definition of process-oriented clinical information systems. In the area of health care information systems, a number of systems and approaches to the representation and management of medical knowledge and clinical processes are available, but no system currently provides an integrated approach to both declarative and procedural medical knowledge. This work describes a clinical process management system aimed at supporting a semantic, process-centered vision of health care practices. The system is founded on an ontology-based clinical knowledge representation framework that allows representing and managing, in a unified way, both medical knowledge and clinical processes. The system provides functionalities for: (i) designing clinical processes by exploiting existing and ad hoc medical ontologies and guideline bases; (ii) executing clinical processes and monitoring their evolution by adopting alerting techniques that help prevent risks and errors; (iii) analyzing clinical processes by semantic querying and data mining techniques in order to provide decision support features able to contain risks and to enhance cost control and patient safety.


business information systems | 2009

L ε X System for Semantic Information Extraction

Ermelinda Oro; Massimo Ruffolo; Domenico Saccà

A process-oriented vision of clinical practices may enhance patient safety by enabling better risk management capabilities. In the field of clinical information systems, less attention has been paid to approaches aimed at reducing risks and errors by integrating both declarative and procedural aspects of medical knowledge. This work describes a semantic clinical knowledge representation framework that allows representing and managing, in a unified way, both medical knowledge and clinical processes, and that supports the execution of clinical processes taking risk handling into account. Framework features are presented by using a running example inspired by the clinical process adopted for treating breast neoplasm in the oncological ward of an Italian hospital. The example shows how the proposed framework can contribute to reducing risks.


OTM '08 Proceedings of the OTM 2008 Confederated International Conferences, CoopIS, DOA, GADA, IS, and ODBASE 2008. Part II on On the Move to Meaningful Internet Systems | 2008

TOWARDS A SEMANTIC SYSTEM FOR MANAGING CLINICAL PROCESSES

Ermelinda Oro; Massimo Ruffolo

Ontologies make it possible to directly encode domain knowledge in software applications, so ontology-based systems can exploit the meaning of information to provide advanced and intelligent functionalities. One of the most interesting and promising applications of ontologies is information extraction from unstructured documents. In this area, the extraction of meaningful information from PDF documents has recently been recognized as an important and challenging problem. This paper proposes an ontology-based information extraction system for PDF documents founded on a well-suited knowledge representation approach named self-populating ontology (SPO). The SPO approach combines object-oriented, logic-based features with formal grammar capabilities and allows expressing knowledge in terms of ontology schemas, instances, and extraction rules (called descriptors) aimed at extracting information, including information in tabular form. The novel aspect of the SPO approach is that it allows representing ontologies enriched by rules that enable them to populate themselves with instances extracted from unstructured PDF documents. The tractability of the SPO approach is proven in the paper. Moreover, features and behavior of the prototypical implementation of the SPO system are illustrated by means of a running example.


international conference on enterprise information systems | 2017

A Semantic Clinical Knowledge Representation Framework for Effective Health Care Risk Management

Ermelinda Oro; Massimo Ruffolo

Big data generated across the web is assuming growing importance in producing insights useful for understanding real-world phenomena and making smarter decisions. Tourism is one of the leading growth sectors; therefore, methods and technologies that simplify and empower the gathering, processing, and analysis of web content are becoming more and more important in this application area. In this paper, we present a web content analytics method that automates and simplifies content extraction and acquisition from many different web sources, like newspapers and social networks, accelerates content cleaning, analysis, and annotation, and speeds up insight generation through visual exploration of analysis results. We also describe an application to a real-world use case regarding the analysis of the touristic impact of the Italian Open tennis tournament. The results obtained show that our method makes the analysis of news and social media posts easier, more agile, and more effective.

Collaboration


Dive into Ermelinda Oro's collaborations.

Top Co-Authors

Marco Manna

University of Calabria

Steffen Staab

University of Koblenz and Landau