Publication


Featured research published by Mauricio A. Hernández.


very large data bases | 2002

Translating web data

Lucian Popa; Yannis Velegrakis; Mauricio A. Hernández; Renée J. Miller; Ronald Fagin

We present a novel framework for mapping between any combination of XML and relational schemas, in which a high-level, user-specified mapping is translated into semantically meaningful queries that transform source data into the target representation. Our approach works in two phases. In the first phase, the high-level mapping, expressed as a set of inter-schema correspondences, is converted into a set of mappings that capture the design choices made in the source and target schemas (including their hierarchical organization as well as their nested referential constraints). The second phase translates these mappings into queries over the source schemas that produce data satisfying the constraints and structure of the target schema, and preserving the semantic relationships of the source. Nonnull target values may need to be invented in this process. The mapping algorithm is complete in that it produces all mappings that are consistent with the schema constraints. We have implemented the translation algorithm in Clio, a schema mapping tool, and present our experience using Clio on several real schemas.
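The two-phase idea above can be sketched in miniature: phase 1 groups attribute-level correspondences into per-target-relation mappings, and phase 2 emits a query for each mapping. This is an invented toy, not Clio's actual algorithm; all table and attribute names are hypothetical.

```python
# Toy sketch of the two-phase mapping translation (not Clio's implementation).
from collections import defaultdict

def phase1_group(correspondences):
    """Group (source_table.attr -> target_table.attr) correspondences
    by target relation, standing in for the first phase, where target
    schema design choices determine which correspondences form one mapping."""
    mappings = defaultdict(list)
    for src, tgt in correspondences:
        tgt_rel, tgt_attr = tgt.split(".")
        mappings[tgt_rel].append((src, tgt_attr))
    return dict(mappings)

def phase2_queries(mappings):
    """Second phase: translate each mapping into a query over the source."""
    queries = {}
    for tgt_rel, pairs in mappings.items():
        src_rels = sorted({src.split(".")[0] for src, _ in pairs})
        select = ", ".join(f"{src} AS {attr}" for src, attr in pairs)
        queries[tgt_rel] = f"SELECT {select} FROM {', '.join(src_rels)}"
    return queries

corrs = [("emp.name", "person.fullname"), ("emp.dept", "person.unit")]
qs = phase2_queries(phase1_group(corrs))
print(qs["person"])
# SELECT emp.name AS fullname, emp.dept AS unit FROM emp
```

The real algorithm additionally chases the schemas' nested referential constraints and may invent non-null target values; none of that is modeled here.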


international conference on management of data | 2001

The Clio project: managing heterogeneity

Renée J. Miller; Mauricio A. Hernández; Laura M. Haas; Lingling Yan; C. T. Howard Ho; Ronald Fagin; Lucian Popa

Clio is a system for managing and facilitating the complex tasks of heterogeneous data transformation and integration. In Clio, we have collected together a powerful set of data management techniques that have proven invaluable in tackling these difficult problems. In this paper, we present the underlying themes of our approach and present a brief case study.


international conference on management of data | 2005

Clio grows up: from research prototype to industrial tool

Laura M. Haas; Mauricio A. Hernández; Howard Ho; Lucian Popa; Mary Tork Roth

Clio, the IBM Research system for expressing declarative schema mappings, has progressed in the past few years from a research prototype into a technology behind some of IBM's mapping products. Clio provides a declarative way of specifying schema mappings between either XML or relational schemas. Mappings are compiled into an abstract query graph representation that captures the transformation semantics of the mappings. The query graph can then be serialized into different query languages, depending on the kind of schemas and systems involved in the mapping. Clio currently produces XQuery, XSLT, SQL, and SQL/XML queries. In this paper, we revisit the architecture and algorithms behind Clio. We then discuss some implementation issues, optimizations needed for scalability, and general lessons learned on the road towards creating an industrial-strength tool.
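The "one abstract representation, many query languages" design can be illustrated with a dispatch table of serializers. This is a hypothetical sketch; Clio's actual query graph is far richer than the dictionary used here.

```python
# Hypothetical sketch: serialize one abstract mapping representation
# into different query languages, as Clio does with its query graph.
# The 'graph' structure and all names are invented for illustration.

def to_sql(graph):
    return f"SELECT {', '.join(graph['cols'])} FROM {graph['src']}"

def to_xquery(graph):
    cols = "".join(f"<{c}>{{$t/{c}/text()}}</{c}>" for c in graph["cols"])
    return f"for $t in doc('{graph['src']}.xml')//row return <row>{cols}</row>"

# Choosing a backend depends on the schemas/systems involved in the mapping.
SERIALIZERS = {"sql": to_sql, "xquery": to_xquery}

graph = {"src": "emp", "cols": ["name", "dept"]}
print(SERIALIZERS["sql"](graph))     # SELECT name, dept FROM emp
print(SERIALIZERS["xquery"](graph))
```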


Conceptual Modeling: Foundations and Applications | 2009

Clio: Schema Mapping Creation and Data Exchange

Ronald Fagin; Laura M. Haas; Mauricio A. Hernández; Renée J. Miller; Lucian Popa; Yannis Velegrakis

The Clio project provides tools that vastly simplify information integration. Information integration requires data conversions to bring data in different representations into a common form. Key contributions of Clio are the definition of non-procedural schema mappings to describe the relationship between data in heterogeneous schemas, a new paradigm in which we view the mapping creation process as one of query discovery, and algorithms for automatically generating queries for data transformation from the mappings. Clio provides algorithms to address the needs of two major information integration problems, namely, data integration and data exchange. In this chapter, we present our algorithms for both schema mapping creation via query discovery, and for query generation for data exchange. These algorithms can be used in pure relational, pure XML, nested relational, or mixed relational and nested contexts.


international conference on management of data | 2001

Clio: a semi-automatic tool for schema mapping

Mauricio A. Hernández; Renée J. Miller; Laura M. Haas

We consider the integration requirements of modern data intensive applications including data warehousing, global information systems and electronic commerce. At the heart of these requirements lies the schema mapping problem in which a source (legacy) database must be mapped into a different, but fixed, target schema. The goal of schema mapping is the discovery of a query or set of queries to map source databases into the new structure. We demonstrate Clio, a new semi-automated tool for creating schema mappings. Clio employs a mapping-by-example paradigm that relies on the use of value correspondences describing how a value of a target attribute can be created from a set of values of source attributes. A typical session with Clio starts with the user loading a source and a target schema into the system. These schemas are read from either an underlying Object-Relational database or from an XML file with an associated XML Schema. Users can then draw value correspondences mapping source attributes into target attributes. Clio's mapping engine incrementally produces the SQL queries that realize the mappings implied by the correspondences. Clio provides schema and data browsers and other feedback to allow users to understand the mapping produced. Entering and manipulating value correspondences can be done in two modes. In the Schema View mode, users see a representation of the source and target schema and create value correspondences by selecting schema objects from the source and mapping them to a target attribute. The alternative Data View mode offers a WYSIWYG interface for the mapping process that displays example data for both the source and target tables [3]. Users may add and delete value correspondences from this view and immediately see the changes reflected in the resulting target tuples. Also, the Data View mode helps users navigate through alternative mappings, understanding the often subtle differences between them. For example, in some cases, changing a join from an inner join to an outer join may dramatically change the resulting table. In other cases, the same change may have no effect due to constraints that hold on the source.
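The inner- vs. outer-join effect mentioned above is easy to reproduce. The following runnable example (with invented data, not taken from the paper) shows the two joins diverging when a source tuple has no match; if a referential constraint guaranteed every `emp.dept` a matching `dept` row, the two results would coincide.

```python
# Demonstrate how switching an inner join to an outer join can change
# the resulting table -- or not, when source constraints rule out
# dangling tuples. Tables and data are invented for illustration.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
  CREATE TABLE emp(name TEXT, dept TEXT);
  CREATE TABLE dept(dept TEXT, city TEXT);
  INSERT INTO emp VALUES ('ana', 'db'), ('bo', 'ml');
  INSERT INTO dept VALUES ('db', 'almaden');   -- no row for 'ml': dangling
""")

inner = con.execute(
    "SELECT e.name, d.city FROM emp e JOIN dept d ON e.dept = d.dept"
).fetchall()
outer = con.execute(
    "SELECT e.name, d.city FROM emp e LEFT JOIN dept d ON e.dept = d.dept"
).fetchall()

print(inner)   # only 'ana' survives the inner join
print(outer)   # 'bo' is kept, padded with NULL
```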


very large data bases | 2010

Explaining missing answers to SPJUA queries

Melanie Herschel; Mauricio A. Hernández

This paper addresses the problem of explaining missing answers in queries that include selection, projection, join, union, aggregation and grouping (SPJUA). Explaining missing answers of queries is useful in various scenarios, including query understanding and debugging. We present a general framework for the generation of these explanations based on source data. We describe the algorithms used to generate a correct, finite, and, when possible, minimal set of explanations. These algorithms are part of Artemis, a system that assists query developers in analyzing queries by, for instance, allowing them to ask why certain tuples are not in the query results. Experimental results demonstrate that Artemis generates explanations of missing tuples at a pace that allows developers to effectively use them for query analysis.
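The flavor of a missing-answer explanation can be conveyed with a toy far simpler than Artemis: for a selection query, find the source tuples that could have produced the missing answer and report which predicates rejected them. All names and data below are invented.

```python
# Toy sketch of explaining a missing answer for a selection query
# (vastly simplified relative to Artemis and the SPJUA setting).

def explain_missing(tuples, predicates, wanted_key, key):
    """For each source tuple matching the missing answer's key, list the
    named predicates it fails -- the candidate explanations for why the
    expected result tuple is absent."""
    explanations = []
    for t in tuples:
        if t[key] != wanted_key:
            continue
        failed = [name for name, pred in predicates if not pred(t)]
        explanations.append((t, failed))
    return explanations

emps = [{"name": "bo", "salary": 40}, {"name": "ana", "salary": 90}]
preds = [("salary > 50", lambda t: t["salary"] > 50)]

# Why is 'bo' not in SELECT * FROM emps WHERE salary > 50?
print(explain_missing(emps, preds, "bo", "name"))
# [({'name': 'bo', 'salary': 40}, ['salary > 50'])]
```

The real problem is much harder once joins, unions, grouping, and aggregation enter: an explanation may involve tuples that would need to exist, and the paper's algorithms keep the explanation set correct, finite, and, when possible, minimal.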


international conference on data engineering | 2008

Orchid: Integrating Schema Mapping and ETL

Stefan Dessloch; Mauricio A. Hernández; Ryan Wisnesky; Ahmed Radwan; Jindan Zhou

This paper describes Orchid, a system that converts declarative mapping specifications into data flow specifications (ETL jobs) and vice versa. Orchid provides an abstract operator model that serves as a common model for both transformation paradigms; both mappings and ETL jobs are transformed into instances of this common model. As an additional benefit, instances of this common model can be optimized and deployed into multiple target environments. Orchid is being deployed in FastTrack, a data transformation toolkit in IBM Information Server.
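The shared-operator-model idea can be sketched as a pipeline of abstract operators that renders to either paradigm. This is a hypothetical miniature, not Orchid's actual operator model; the operator names and rendering rules are invented.

```python
# Hypothetical common operator model: a declarative mapping and an ETL
# job both reduce to one operator pipeline, which can then be deployed
# to different target environments.

PIPELINE = [("project", ["name", "dept"]), ("filter", "dept = 'db'")]

def as_sql(pipeline, src="emp"):
    """Render the pipeline as a declarative query (mapping side)."""
    cols, pred = "*", None
    for op, arg in pipeline:
        if op == "project":
            cols = ", ".join(arg)
        elif op == "filter":
            pred = arg
    sql = f"SELECT {cols} FROM {src}"
    return sql + (f" WHERE {pred}" if pred else "")

def as_etl(pipeline):
    """Render the same pipeline as a sequence of ETL job stages."""
    return [f"{op.upper()}({arg})" for op, arg in pipeline]

print(as_sql(PIPELINE))   # SELECT name, dept FROM emp WHERE dept = 'db'
print(as_etl(PIPELINE))
```

Because both renderings read from the same pipeline, an optimization applied to the pipeline (e.g. pushing the filter before the projection) benefits every deployment target, which is the advantage the paper highlights.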


international conference on data engineering | 2002

Mapping XML and relational schemas with Clio

Lucian Popa; Mauricio A. Hernández; Yannis Velegrakis; Renée J. Miller; Felix Naumann; Howard Ho

Merging and coalescing data from multiple and diverse sources into different data formats continues to be an important problem in modern information systems. Schema matching (the process of matching elements of a source schema with elements of a target schema) and schema mapping (the process of creating a query that maps between two disparate schemas) are at the heart of data integration systems. We demonstrate Clio, a semi-automatic schema mapping tool developed at the IBM Almaden Research Center. In this paper, we showcase Clio's mapping engine, which allows mapping to and from relational and XML schemas, and takes advantage of data constraints in order to preserve data associations.


IEEE Data(base) Engineering Bulletin | 2015

Extracting, Linking and Integrating Data from Public Sources: A Financial Case Study

Douglas Burdick; Mauricio A. Hernández; Howard Ho; Georgia Koutrika; Rajasekar Krishnamurthy; Lucian Popa; Ioana Stanoi; Shivakumar Vaithyanathan; Sanjiv Ranjan Das

We present Midas, a system that uses complex data processing to extract and aggregate facts from a large collection of structured and unstructured documents into a set of unified, clean entities and relationships. Midas focuses on data for financial companies and is based on periodic filings with the U.S. Securities and Exchange Commission (SEC) and Federal Deposit Insurance Corporation (FDIC). We show that, by using data aggregated by Midas, we can provide valuable insights about financial institutions either at the whole system level or at the individual company level. The key technology components that we implemented in Midas and that enable the various financial applications are: information extraction, entity resolution, mapping and fusion, all on top of a scalable infrastructure based on Hadoop. We describe our experience in building the Midas system and also outline the key research questions that remain to be addressed towards building a generic, high-level infrastructure for large-scale data integration from public sources.


very large data bases | 2009

Artemis: a system for analyzing missing answers

Melanie Herschel; Mauricio A. Hernández; Wang Chiew Tan

A central feature of relational database management systems is the ability to define multiple different views over an underlying database schema. Views provide a method of defining access control to the underlying database, since a view exposes a part of the database and hides the rest. Views also provide logical data independence to application programs that access the database. However, the process of specifying the desired views in SQL is typically tedious and error-prone. While numerous tools exist to support developers in debugging program code, we are not aware of any tool that supports developers in verifying the correctness of their views defined in SQL.
