Sergio Flesca | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Sergio Flesca is active.

Explore More

Publication

Featured researches published by Sergio Flesca.

symposium on principles of database systems | 2004

The Lixto data extraction project: back and forth between theory and practice

Georg Gottlob; Christoph T. Koch; Robert Baumgartner; Marcus Herzog; Sergio Flesca

We present the Lixto project, which is both a research project in database theory and a commercial enterprise that develops Web data extraction (wrapping) and Web service definition software. We discuss the projects main motivations and ideas, in particular the use of a logic-based framework for wrapping. Then we present theoretical results on monadic datalog over trees and on Elog, its close relative which is used as the internal wrapper language in the Lixto system. These results include both a characterization of the expressive power and the complexity of these languages. We describe the visual wrapper specification process in Lixto and various practical aspects of wrapping. We discuss work on the complexity of query languages for trees that was inseminated by our theoretical study of logic-based languages for wrapping. Then we return to the practice of wrapping and the Lixto Transformation Server, which allows for streaming integration of data extracted from Web pages. This is a natural requirement in complex services based on Web wrapping. Finally, we discuss industrial applications of Lixto and point to open problems for future study.

IEEE Transactions on Knowledge and Data Engineering | 2005

Fast detection of XML structural similarity

Sergio Flesca; Giuseppe Manco; Elio Masciari; Luigi Pontieri; Andrea Pugliese

Because of the widespread diffusion of semistructured data in XML format, much research effort is currently devoted to support the storage and retrieval of large collections of such documents. XML documents can be compared as to their structural similarity, in order to group them into clusters so that different storage, retrieval, and processing techniques can be effectively exploited. In this scenario, an efficient and effective similarity function is the key of a successful data management process. We present an approach for detecting structural similarity between XML documents which significantly differs from standard methods based on graph-matching algorithms, and allows a significant reduction of the required computation costs. Our proposal roughly consists of linearizing the structure of each XML document, by representing it as a numerical sequence and, then, comparing such sequences through the analysis of their frequencies. First, some basic strategies for encoding a document are proposed, which can focus on diverse structural facets. Moreover, the theory of discrete Fourier transform is exploited to effectively and efficiently compare the encoded documents (i.e., signals) in the domain of frequencies. Experimental results reveal the effectiveness of the approach, also in comparison with standard methods.

very large data bases | 2003

On the minimization of Xpath queries

Sergio Flesca; Filippo Furfaro; Elio Masciari

XML queries are usually expressed by means of XPath expressions identifying portions of the selected documents. An XPath expression defines a way of navigating an XML tree and returns the set of nodes which are reachable from one or more starting nodes through the paths specified by the expression. The problem of efficiently answering XPath queries is very interesting and has recently received increasing attention by the research community. In particular, an increasing effort has been devoted to define effective optimization techniques for XPath queries. One of the main issues related to the optimization of XPath queries is their minimization. The minimization of XPath queries has been studied for limited fragments of XPath, containing only the descendent, the child and the branch operators. In this work, we address the problem of minimizing XPath queries for a more general fragment, containing also the wildcard operator. We characterize the complexity of the minimization of XPath queries, stating that it is NP-hard, and propose an algorithm for computing minimum XPath queries. Moreover, we identify an interesting tractable case and propose an ad hoc algorithm handling the minimization of this kind of queries in polynomial time.

international conference on logic programming | 2001

Declarative Information Extraction, Web Crawling, and Recursive Wrapping with Lixto

Robert Baumgartner; Sergio Flesca; Georg Gottlob

Lixto is a system and method for the visual and interactive generation of wrappers for Web pages under the supervision of a human developer, for automatically extracting information from Web pages using such wrappers, and for translating the extracted content into XML. This paper describes some advanced features of Lixto, such as disjunctive pattern definitions, specialization rules, and Lixtos capability of collecting and aggregating information from several linked Web pages.

database programming languages | 2005

Consistent query answers on numerical databases under aggregate constraints

Sergio Flesca; Filippo Furfaro; Francesco Parisi

international conference on logic programming | 2001

The Elog Web Extraction Language

Robert Baumgartner; Sergio Flesca; Georg Gottlob

This paper illustrates some aspects of the visual wrapper generation tool Lixto and describes its internal declarative logic-based language Elog. In particular, it gives an example scenarioand contains a detailed description of predicates including their input/output behavior and introduces several new conditions. Additionally, entity relationship diagrams of filters and patterns are depicted and some words on the implementation are issued. Finally, some possible ramifications are discussed.

ACM Transactions on Database Systems | 2010

Querying and repairing inconsistent numerical databases

Sergio Flesca; Filippo Furfaro; Francesco Parisi

The problem of extracting consistent information from relational databases violating integrity constraints on numerical data is addressed. In particular, aggregate constraints defined as linear inequalities on aggregate-sum queries on input data are considered. The notion of repair as consistent set of updates at attribute-value level is exploited, and the characterization of several data-complexity issues related to repairing data and computing consistent query answers is provided. Moreover, a method for computing “reasonable” repairs of inconsistent numerical databases is provided, for a restricted but expressive class of aggregate constraints. Several experiments are presented which assess the effectiveness of the proposed approach in real-life application scenarios.

international xml database symposium | 2003

Repairs and Consistent Answers for XML Data with Functional Dependencies

Sergio Flesca; Filippo Furfaro; Sergio Greco; Ester Zumpano

In this paper we consider the problem of XML data which may be inconsistent with respect to a set of functional dependencies. We propose a technique for computing repairs (minimal sets of update operations making data consistent) and consistent answers. More specifically, our repairs are based on i) the replacing of values associated with attributes and elements, and ii) the introduction of a function stating if the node information is reliable.

computer aided verification | 2005

Verification of tree updates for optimization

Michael Benedikt; Angela Bonifati; Sergio Flesca; Avinash Vyas

With the rise of XML as a standard format for representing tree-shaped data, new programming tools have emerged for specifying transformations to tree-like structures. A recent example along this line are the update languages of [16,15,8] which add tree update primitives on top of the declarative query languages XPath and XQuery. These tree update languages use a “snapshot semantics”, in which all querying is performed first, after which a generated sequence of concrete updates is performed in a fixed order determined by query evaluation. In order to gain efficiency, one would prefer to perform updates as soon as they are generated, before further querying. This motivates a specific verification problem: given a tree update program, determine whether generated updates can be performed before all querying is completed. We formalize this notion, which we call “Binding Independence”. We give an algorithm to verify that a tree update program is Binding Independent, and show how this analysis can be used to produce optimized evaluation orderings that significantly reduce processing time.

ACM Transactions on Computational Logic | 2015

On the Complexity of Probabilistic Abstract Argumentation Frameworks

Bettina Fazzinga; Sergio Flesca; Francesco Parisi

Probabilistic abstract argumentation combines Dung’s abstract argumentation framework with probability theory in order to model uncertainty in argumentation. In this setting, we address the fundamental problem of computing the probability that a set of arguments is an extension according to a given semantics. We focus on the most popular semantics (i.e., admissible, stable, complete, grounded, preferred, ideal-set, ideal, stage, and semistable) and show the following dichotomy result: computing the probability that a set of arguments is an extension is either FP or FP#P-complete depending on the semantics adopted. Our polynomial-time results are particularly interesting, as they hold for some semantics for which no polynomial-time technique was known so far.

Explore More