Giorgio Orsi
University of Oxford
Publications
Featured research published by Giorgio Orsi.
International Conference on Data Engineering | 2011
Georg Gottlob; Giorgio Orsi; Andreas Pieris
Ontological queries are evaluated against an enterprise ontology rather than directly on a database. The evaluation and optimization of such queries is an intriguing new problem for database research. In this paper we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of the compilation of an ontological query into an equivalent query against the underlying relational database. The focus here is on soundness and completeness. We review previous results and present a new rewriting algorithm for rather general types of ontological constraints (description logics). In particular, we show how a conjunctive query (CQ) against an enterprise ontology can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this process so as to produce a possibly small and cost-effective output UCQ. We review existing optimization methods, and propose an effective new method that works for Linear Datalog±, an ontology language that encompasses well-known description logics of the DL-Lite family.
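The core rewriting step can be illustrated with a deliberately simplified sketch (our own toy illustration, not the paper's algorithm): a query atom that unifies with the head of a linear existential rule is replaced by the rule's body, adding one more CQ to the union. The predicates `supervises` and `manager` below are hypothetical examples.

```python
# Toy illustration of CQ-to-UCQ rewriting under one linear existential rule.
# Hypothetical rule:  manager(X) -> ∃Y supervises(X, Y)
# Query:              q(U) :- supervises(U, V)     (V occurs nowhere else)
# Rewriting adds:     q(U) :- manager(U)
# so the UCQ covers both stored supervises facts and those implied by manager.

def rewrite(query_atoms, rules):
    """One round of rewriting: replace each query atom that unifies with a
    rule head (existential position matched by a non-shared variable) by the
    rule body, collecting each result as an extra CQ in the union."""
    ucq = [list(query_atoms)]
    for i, (pred, args) in enumerate(query_atoms):
        for (h_pred, h_args), (b_pred, b_args) in rules:
            if pred != h_pred:
                continue
            subst, ok = {}, True
            for qa, ha in zip(args, h_args):
                if ha == '∃':
                    # the existential position must be matched by a variable
                    # occurring only once in the query (naive check)
                    occurrences = sum(a == qa for _, aa in query_atoms for a in aa)
                    if occurrences > 1:
                        ok = False
                        break
                else:
                    subst[ha] = qa
            if ok:
                new_cq = list(query_atoms)
                new_cq[i] = (b_pred, tuple(subst.get(a, a) for a in b_args))
                ucq.append(new_cq)
    return ucq

rules = [(('supervises', ('X', '∃')), ('manager', ('X',)))]
query = [('supervises', ('U', 'V'))]     # q(U) :- supervises(U, V)
print(rewrite(query, rules))
# union of two CQs: the original atom plus ('manager', ('U',))
```

A real rewriting algorithm iterates this step to a fixpoint and must handle shared existential variables, multi-atom bodies, and factorisation; the sketch only shows why the output is a union of conjunctive queries.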
Communications of the ACM | 2009
Carlo Curino; Giorgio Orsi; Elisa Quintarelli; Rosalba Rossato; Fabio A. Schreiber; Letizia Tanca
Common to all actors in today's information world is the problem of lowering the "information noise," both by reducing the amount of data to be stored and accessed and by enhancing the "precision" with which the available data fit the application requirements. Fitting data to the application needs is thus tantamount to fitting a dress to a person, and will be referred to as data tailoring. The context will be our scissors to tailor data, possibly assembled and integrated from many data sources.

Since the 1980s, many organizations have evolved to comply with market needs in terms of flexibility, effective customer relationship management, supply chain optimization, and so on: it has frequently been observed that a set of partners re-engineer their individual organizations into a unique, extended enterprise. Their information systems have evolved along with them, embracing new technologies such as XML and ontologies, used in ERP systems and Web-service-based applications. In recent years many organizations have also introduced Knowledge Management features into their information systems, to allow easy information sharing among the organizations' members; these new information sources and their content have to be managed together with other (we might say legacy) enterprise data.

This growth of information, if not properly controlled, leads to a data overload that may cause confusion rather than knowledge, and dramatically reduce the benefits of a rich information system. However, distinguishing useful information from noise, i.e., from all the information not relevant to the specific application, is not a trivial task: the same piece of information can be considered differently, even by the same user, in different situations or places; in a single word, in a different context.
The notion of context, which first emerged in various fields of research such as psychology and philosophy, is also acquiring great importance in computer science. In a commonsense interpretation, context is perceived as a set of variables that may be of interest to an agent and that influence its actions. Context often has a significant impact on the way humans (or machines) interpret their environment: a change in context causes a transformation in the actor's mental representation of reality, even when the reality itself has not changed. The word, derived from the Latin cum (with, together) and texere (to weave), describes context not just as a profile, but as an active process dealing with the way humans weave their experience within their whole environment to give it meaning. In the last few years, sophisticated and general context models have been proposed to support context-aware applications. The different meanings attributed to the word context include:

Presentation-oriented: context is perceived as the capability of the system to adapt content presentation to different channels or devices. These context models are often rigid, since they are designed for specific applications and rely on a well-known set of presentation variables.

Location-oriented: with this family of context models, it is possible to handle and

What can context do for data?
ACM Transactions on Database Systems | 2014
Georg Gottlob; Giorgio Orsi; Andreas Pieris
Ontological queries are evaluated against a knowledge base consisting of an extensional database and an ontology (i.e., a set of logical assertions and constraints that derive new intensional knowledge from the extensional database), rather than directly on the extensional database. The evaluation and optimization of such queries is an intriguing new problem for database research. In this article, we discuss two important aspects of this problem: query rewriting and query optimization. Query rewriting consists of the compilation of an ontological query into an equivalent first-order query against the underlying extensional database. We present a novel query rewriting algorithm for rather general types of ontological constraints that is well suited for practical implementations. In particular, we show how a conjunctive query against a knowledge base, expressed using linear and sticky existential rules, that is, members of the recently introduced Datalog± family of ontology languages, can be compiled into a union of conjunctive queries (UCQ) against the underlying database. Ontological query optimization, in this context, attempts to improve this rewriting process so as to produce possibly small and cost-effective UCQ rewritings for an input query.
International Conference on Management of Data | 2013
Paolo Atzeni; Christian S. Jensen; Giorgio Orsi; Sudha Ram; Letizia Tanca; Riccardo Torlone
We report the opinions expressed by well-known database researchers on the future of the relational model and SQL during a panel at the International Workshop on Non-Conventional Data Access (NoCoDa 2012), held in Florence, Italy in October 2012 in conjunction with the 31st International Conference on Conceptual Modeling. The panelists include: Paolo Atzeni (Università Roma Tre, Italy), Umeshwar Dayal (HP Labs, USA), Christian S. Jensen (Aarhus University, Denmark), and Sudha Ram (University of Arizona, USA). Quotations from movies are used as a playful though effective way to convey the dramatic changes that database technology and research are currently undergoing.
International World Wide Web Conference | 2012
Tim Furche; Georg Gottlob; Giovanni Grasso; Xiaonan Guo; Giorgio Orsi; Christian Schallhart
Forms are our gates to the web. They enable us to access the deep content of web sites. Automatic form understanding unlocks this content for applications ranging from crawlers to meta-search engines and is essential for improving usability and accessibility of the web. Form understanding has received surprisingly little attention other than as a component in specific applications such as crawlers. No comprehensive approach to form understanding exists, and previous works disagree even in the definition of the problem. In this paper, we present OPAL, the first comprehensive approach to form understanding. We identify form labeling and form interpretation as the two main tasks involved in form understanding. On both problems OPAL pushes the state of the art: for form labeling, it combines signals from the text, structure, and visual rendering of a web page, yielding robust characterisations of common design patterns. In extensive experiments on the ICQ and TEL-8 benchmarks and a set of 200 modern web forms, OPAL outperforms previous approaches by a significant margin. For form interpretation, we introduce a template language to describe frequent form patterns. These two parts of OPAL combined yield form understanding with near perfect accuracy (> 98%).
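The idea of combining textual, structural, and visual signals for form labeling can be sketched in miniature (this is our own toy illustration, not OPAL's actual algorithm; the signal functions, field names, and weights below are all hypothetical):

```python
# Toy form labeling: score each (label, field) pair by a weighted combination
# of a textual, a structural (DOM), and a visual signal, and assign each
# field the best-scoring label. All weights and helpers are made up for
# illustration only.

def label_fields(fields, labels, w_text=0.5, w_dom=0.3, w_visual=0.2):
    """Each signal is normalised to [0, 1]; the assignment simply maximises
    the weighted sum per field."""
    assignment = {}
    for field in fields:
        best, best_score = None, -1.0
        for label in labels:
            s = (w_text   * text_affinity(label, field)
                 + w_dom    * dom_proximity(label, field)
                 + w_visual * visual_alignment(label, field))
            if s > best_score:
                best, best_score = label, s
        assignment[field['name']] = best['text']
    return assignment

# Minimal stand-in signals for the demo below.
def text_affinity(label, field):     # does the label text mention the field name?
    return 1.0 if field['name'] in label['text'].lower() else 0.0

def dom_proximity(label, field):     # closer in document order -> higher signal
    return 1.0 / (1 + abs(label['pos'] - field['pos']))

def visual_alignment(label, field):  # rendered on the same row -> aligned
    return 1.0 if label['row'] == field['row'] else 0.0

fields = [{'name': 'departure', 'pos': 2, 'row': 1},
          {'name': 'arrival',   'pos': 5, 'row': 2}]
labels = [{'text': 'Departure city', 'pos': 1, 'row': 1},
          {'text': 'Arrival city',   'pos': 4, 'row': 2}]
print(label_fields(fields, labels))
# {'departure': 'Departure city', 'arrival': 'Arrival city'}
```

The point of combining several weak signals is robustness: a label that is textually ambiguous can still be attached correctly through its visual alignment, and vice versa.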
International Conference on Datalog in Academia and Industry | 2010
Giorgio Orsi; Letizia Tanca
Many interpretations of the notion of context have emerged in various fields, and context-aware systems are pervading everyday life, becoming an expanding research field. Context often has a significant impact on the way humans (or machines) act and on how they interpret things; furthermore, a change in context causes a transformation in the experience that is going to be lived. Accordingly, while the computer science community initially perceived context simply as a matter of user time and location, in the last few years this notion has been considered not simply as a state but as part of a process in which users are involved; thus, sophisticated and general context models and systems have been proposed to support context-aware applications. In this paper we propose a foundational framework for the life-cycle of context-aware systems, in which the system design and management activities consider context as an orthogonal, first-class citizen. In doing so, we present a Datalog-based formulation for the definition of context-aware databases.
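The flavour of a context-aware database can be conveyed with a minimal sketch (our own illustration, not the paper's Datalog formulation): each fact is tagged with the context dimensions in which it is relevant, and queries see only the facts visible in the active context. The dimensions and facts below are hypothetical.

```python
# Toy "context as a filter" semantics: facts carry context tags, and a query
# is evaluated only against the facts matching the active context.

FACTS = [
    ('restaurant', ('da_mario',),  {'location': 'milan', 'interest': 'food'}),
    ('restaurant', ('chez_anne',), {'location': 'paris', 'interest': 'food'}),
    ('museum',     ('louvre',),    {'location': 'paris', 'interest': 'art'}),
]

def visible(facts, context):
    """A fact is visible iff every dimension it constrains matches the
    active context, so changing context changes the database a query sees."""
    return [(pred, args) for pred, args, dims in facts
            if all(context.get(d) == v for d, v in dims.items())]

active = {'location': 'paris', 'interest': 'food'}
print(visible(FACTS, active))
# [('restaurant', ('chez_anne',))]
```

In the paper's terms, context is orthogonal to the data: the same fact base yields different views as the contextual dimensions change, without rewriting the queries themselves.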
Database and Expert Systems Applications | 2007
Carlo Curino; Giorgio Orsi; Letizia Tanca
System interoperability is a well-known issue, especially for heterogeneous information systems, where ontology-based representations may support automatic and user-transparent integration. In this paper we present X-SOM: an ontology mapping and integration tool. The contribution of our tool is a modular and extensible architecture that automatically combines several matching techniques by means of a neural network, performing also ontology debugging to avoid inconsistencies. Besides describing the tool components, we discuss the prototype implementation, which has been tested against the OAEI 2006 benchmark with promising results.
International Conference on Document Analysis and Recognition | 2013
Max C. Göbel; Tamir Hassan; Ermelinda Oro; Giorgio Orsi
Table understanding is a well studied problem in document analysis, and many academic and commercial approaches have been developed to recognize tables in several document formats, including plain text, scanned page images and born-digital, object-based formats such as PDF. Despite the abundance of these techniques, an objective comparison of their performance is still missing. The Table Competition held in the context of ICDAR 2013 is our first attempt at objectively evaluating these techniques against each other in a standardized way, across several input formats. The competition independently addresses three problems: (i) table location, (ii) table structure recognition, and (iii) these two tasks combined. We received results from seven academic systems, which we have also compared against four commercial products. This paper presents our findings.
International Conference on Data Engineering | 2012
Roberto De Virgilio; Giorgio Orsi; Letizia Tanca; Riccardo Torlone
We present NYAYA, a flexible system for the management of large-scale semantic data which couples a general-purpose storage mechanism with efficient ontological query answering. NYAYA rapidly imports semantic data expressed in different formalisms into semantic data kiosks. Each kiosk exposes the native ontological constraints in a uniform fashion using Datalog±, a very general rule-based language for the representation of ontological constraints. A group of kiosks forms a semantic data market where the data in each kiosk can be uniformly accessed using conjunctive queries and where users can specify user-defined constraints over the data. NYAYA is easily extensible and robust to updates of both data and meta-data in the kiosk and can readily adapt to different logical organizations of the persistent storage. In the demonstration, we will show the capabilities of NYAYA over real-world case studies and demonstrate its efficiency over well-known benchmarks.
Document Engineering | 2012
Max C. Göbel; Tamir Hassan; Ermelinda Oro; Giorgio Orsi
This paper presents a methodology for the evaluation of table understanding algorithms for PDF documents. The evaluation takes into account three major tasks: table detection, table structure recognition and functional analysis. We provide a general and flexible output model for each task along with corresponding evaluation metrics and methods. We also present a methodology for collecting and ground-truthing PDF documents based on consensus-reaching principles and provide a publicly available ground-truthed dataset.
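A common way to score table structure recognition, used in this line of work, is to compare tables through their cell adjacency relations rather than raw coordinates. The sketch below is our simplified illustration of that idea (the grids and texts are made-up examples), computing precision and recall over (cell, neighbour, direction) triples:

```python
# Simplified structure-recognition scoring: represent each table as a set of
# cell adjacency relations and compare detected output against ground truth
# by precision/recall over those relations.

def adjacency_relations(grid):
    """grid: list of rows of cell texts. Emit horizontal and vertical
    neighbour triples, the unit of comparison for structure evaluation."""
    rels = set()
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            if c + 1 < len(row):
                rels.add((cell, row[c + 1], 'horizontal'))
            if r + 1 < len(grid) and c < len(grid[r + 1]):
                rels.add((cell, grid[r + 1][c], 'vertical'))
    return rels

def score(detected, truth):
    d, t = adjacency_relations(detected), adjacency_relations(truth)
    correct = len(d & t)
    precision = correct / len(d) if d else 0.0
    recall = correct / len(t) if t else 0.0
    return precision, recall

truth    = [['Year', 'Sales'], ['2012', '10'], ['2013', '12']]
detected = [['Year', 'Sales'], ['2012', '10']]   # last row missed
p, r = score(detected, truth)
print(round(p, 2), round(r, 2))
# 1.0 0.57  (everything detected is right, but two rows' relations are missing)
```

Working on adjacency relations makes the metric tolerant to small geometric differences in cell boundaries: what is scored is whether neighbouring cell contents were recovered, which is what downstream table understanding actually consumes.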