Publication


Featured research published by Howard Ho.


International Conference on Management of Data | 2005

Clio grows up: from research prototype to industrial tool

Laura M. Haas; Mauricio A. Hernández; Howard Ho; Lucian Popa; Mary Tork Roth

Clio, the IBM Research system for expressing declarative schema mappings, has progressed in the past few years from a research prototype into the technology behind some of IBM's mapping products. Clio provides a declarative way of specifying schema mappings between either XML or relational schemas. Mappings are compiled into an abstract query graph representation that captures the transformation semantics of the mappings. The query graph can then be serialized into different query languages, depending on the kind of schemas and systems involved in the mapping. Clio currently produces XQuery, XSLT, SQL, and SQL/XML queries. In this paper, we revisit the architecture and algorithms behind Clio. We then discuss some implementation issues, optimizations needed for scalability, and general lessons learned on the road toward creating an industrial-strength tool.
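As a rough illustration of the idea in the abstract (a declarative mapping compiled into an executable query), the sketch below turns a toy set of attribute correspondences into SQL. This is not Clio's actual representation or API; the `Mapping` class and `compile_to_sql` function are hypothetical names invented for this example.

```python
# Illustrative sketch only: a toy declarative "schema mapping" serialized
# into SQL, loosely in the spirit of Clio's mapping-to-query step.
from dataclasses import dataclass, field

@dataclass
class Mapping:
    source_table: str
    target_table: str
    # target column -> source column correspondences
    correspondences: dict = field(default_factory=dict)

def compile_to_sql(m: Mapping) -> str:
    """Serialize the abstract mapping into an INSERT..SELECT query."""
    tgt_cols = ", ".join(m.correspondences)
    src_cols = ", ".join(m.correspondences.values())
    return (f"INSERT INTO {m.target_table} ({tgt_cols}) "
            f"SELECT {src_cols} FROM {m.source_table}")

m = Mapping("emp", "employee", {"name": "ename", "dept": "dname"})
print(compile_to_sql(m))
```

The same abstract mapping could equally be serialized to XQuery or XSLT, which is the point the abstract makes about Clio's query-graph intermediate form.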


International Conference on Data Engineering | 2002

Mapping XML and relational schemas with Clio

Lucian Popa; Mauricio A. Hernández; Yannis Velegrakis; Renée J. Miller; Felix Naumann; Howard Ho

Merging and coalescing data from multiple and diverse sources into different data formats continues to be an important problem in modern information systems. Schema matching (the process of matching elements of a source schema with elements of a target schema) and schema mapping (the process of creating a query that maps between two disparate schemas) are at the heart of data integration systems. We demonstrate Clio, a semi-automatic schema mapping tool developed at the IBM Almaden Research Center. In this paper, we showcase Clio's mapping engine, which allows mapping to and from relational and XML schemas, and takes advantage of data constraints in order to preserve data associations.


IEEE Data(base) Engineering Bulletin | 2015

Extracting, Linking and Integrating Data from Public Sources: A Financial Case Study

Douglas Burdick; Mauricio A. Hernández; Howard Ho; Georgia Koutrika; Rajasekar Krishnamurthy; Lucian Popa; Ioana Stanoi; Shivakumar Vaithyanathan; Sanjiv Ranjan Das

We present Midas, a system that uses complex data processing to extract and aggregate facts from a large collection of structured and unstructured documents into a set of unified, clean entities and relationships. Midas focuses on data for financial companies and is based on periodic filings with the U.S. Securities and Exchange Commission (SEC) and Federal Deposit Insurance Corporation (FDIC). We show that, by using data aggregated by Midas, we can provide valuable insights about financial institutions either at the whole system level or at the individual company level. The key technology components that we implemented in Midas and that enable the various financial applications are: information extraction, entity resolution, mapping and fusion, all on top of a scalable infrastructure based on Hadoop. We describe our experience in building the Midas system and also outline the key research questions that remain to be addressed towards building a generic, high-level infrastructure for large-scale data integration from public sources.


International World Wide Web Conference | 2007

Mapping-driven XML transformation

Haifeng Jiang; Howard Ho; Lucian Popa; Wook-Shin Han

Clio is an existing schema-mapping tool that provides user-friendly means to manage and facilitate the complex task of transformation and integration of heterogeneous data such as XML over the Web or in XML databases. By means of mappings from source to target schemas, Clio can help users conveniently establish the precise semantics of data transformation and integration. In this paper we study the problem of how to efficiently implement such data transformation (i.e., generating target data from the source data based on schema mappings). We present a three-phase framework for high-performance XML-to-XML transformation based on schema mappings, and discuss methodologies and algorithms for implementing these phases. In particular, we elaborate on novel techniques such as streamed extraction of mapped source values and scalable disk-based merging of overlapping data (including duplicate elimination). We compare our transformation framework with alternative methods such as using XQuery or SQL/XML provided by current commercial databases. The results demonstrate that the three-phase framework (simple as it is) is highly scalable and outperforms the alternative methods by orders of magnitude.
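The three phases the abstract names (extract mapped source values, merge overlapping data with duplicate elimination, construct the target) can be sketched in miniature. This is a hypothetical toy, not the paper's streaming, disk-based implementation; all function names are invented for illustration.

```python
# Toy sketch of a three-phase transform (extract, merge, construct),
# only loosely modeled on the paper's framework; names are invented.
from collections import defaultdict

def extract(records, mapping):
    """Phase 1: stream over source records, pulling out mapped values."""
    for r in records:
        yield tuple(r[src] for src in mapping)

def merge(tuples):
    """Phase 2: group overlapping data by key, eliminating duplicates."""
    groups = defaultdict(set)
    for key, value in tuples:
        groups[key].add(value)
    return groups

def construct(groups):
    """Phase 3: nest the merged groups into a target (XML-like) shape."""
    return [{"dept": k, "employees": sorted(v)} for k, v in sorted(groups.items())]

src = [{"dept": "R&D", "name": "Ann"},
       {"dept": "R&D", "name": "Ann"},      # duplicate to be eliminated
       {"dept": "Sales", "name": "Bob"}]
out = construct(merge(extract(src, ["dept", "name"])))
print(out)
```

In the paper the merge phase is the hard part at scale, which is why it runs disk-based rather than in memory as here.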


International Conference on Management of Data | 2010

Midas: integrating public financial data

Sreeram V. Balakrishnan; Vivian Chu; Mauricio A. Hernández; Howard Ho; Rajasekar Krishnamurthy; Shixia Liu; Jan Pieper; Jeffrey S. Pierce; Lucian Popa; Christine Robson; Lei Shi; Ioana Stanoi; Edison Lao Ting; Shivakumar Vaithyanathan; Huahai Yang

The primary goal of the Midas project is to build a system that enables easy and scalable integration of unstructured and semi-structured information present across multiple data sources. As a first step in this direction, we have built a system that extracts and integrates information from regulatory filings submitted to the U.S. Securities and Exchange Commission (SEC) and the Federal Deposit Insurance Corporation (FDIC). Midas creates a repository of entities, events, and relationships by extracting, conceptualizing, integrating, and aggregating data from unstructured and semi-structured documents. This repository enables applications to use the extracted and integrated data in a variety of ways including mashups with other public data and complex risk analysis.
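One step in a Midas-style pipeline is entity resolution: deciding that differently written mentions in SEC and FDIC filings refer to the same company. The sketch below clusters mentions by a normalized name key; it is purely illustrative (Midas itself runs richer resolution logic at scale on Hadoop), and the helper names are invented.

```python
# Toy entity-resolution step of the kind Midas-style pipelines perform:
# cluster raw mentions of the same company by normalized name.
import re
from collections import defaultdict

SUFFIXES = {"inc", "corp", "co", "llc", "bank"}

def normalize(name):
    """Lowercase, strip punctuation and common corporate suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(t for t in tokens if t not in SUFFIXES)

def resolve(mentions):
    """Group raw mentions that normalize to the same canonical key."""
    clusters = defaultdict(list)
    for m in mentions:
        clusters[normalize(m)].append(m)
    return dict(clusters)

mentions = ["Acme Corp.", "ACME, Inc.", "Beta Bank", "beta bank"]
print(resolve(mentions))
```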


Very Large Data Bases | 2013

Discovering linkage points over web data

Oktie Hassanzadeh; Ken Q. Pu; Soheil Hassas Yeganeh; Renée J. Miller; Lucian Popa; Mauricio A. Hernández; Howard Ho

A basic step in integration is the identification of linkage points, i.e., finding attributes that are shared (or related) between data sources, and that can be used to match records or entities across sources. This is usually performed using a match operator, that associates attributes of one database to another. However, the massive growth in the amount and variety of unstructured and semi-structured data on the Web has created new challenges for this task. Such data sources often do not have a fixed pre-defined schema and contain large numbers of diverse attributes. Furthermore, the end goal is not schema alignment as these schemas may be too heterogeneous (and dynamic) to meaningfully align. Rather, the goal is to align any overlapping data shared by these sources. We will show that even attributes with different meanings (that would not qualify as schema matches) can sometimes be useful in aligning data. The solution we propose in this paper replaces the basic schema-matching step with a more complex instance-based schema analysis and linkage discovery. We present a framework consisting of a library of efficient lexical analyzers and similarity functions, and a set of search algorithms for effective and efficient identification of linkage points over Web data. We experimentally evaluate the effectiveness of our proposed algorithms in real-world integration scenarios in several domains.
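The core idea of instance-based linkage discovery, as the abstract describes it, can be sketched with a single similarity function: score every pair of attributes across two sources by the overlap of their value sets, regardless of whether the attribute names align. This is a minimal toy, assuming Jaccard overlap only; the paper's framework uses a library of lexical analyzers and several search algorithms.

```python
# Sketch of instance-based linkage-point discovery: rank attribute pairs
# across two sources by Jaccard overlap of their values. Illustrative only.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def linkage_points(src_attrs, tgt_attrs, threshold=0.3):
    """Return (score, source_attr, target_attr) pairs above a threshold."""
    return sorted(
        ((round(jaccard(sv, tv), 2), s, t)
         for s, sv in src_attrs.items()
         for t, tv in tgt_attrs.items()
         if jaccard(sv, tv) >= threshold),
        reverse=True)

src = {"ticker": ["IBM", "AAPL", "MSFT"], "city": ["NY", "SF"]}
tgt = {"symbol": ["IBM", "MSFT", "GOOG"], "hq": ["NY", "London"]}
print(linkage_points(src, tgt))
```

Note that `ticker`/`symbol` link despite having different names, which is exactly the case where schema matching alone would fail.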


International Symposium on Parallel Architectures, Algorithms and Networks | 2005

Clio: a schema mapping tool for information integration

Mauricio A. Hernández; Howard Ho; Felix Naumann; Lucian Popa

Summary form only given. Information integration typically requires the construction of complex artifacts like federated databases, ETL scripts, data warehouses, applications for accessing multiple data sources, and applications that ingest or publish XML. For many companies, it is one of the most complicated IT tasks they face today. To reduce the overall cost, intelligent tools are needed to simplify this difficult task. Clio is a semi-automatic tool for schema mapping and data integration developed at IBM Almaden Research Center over the past few years. It takes source and target schemas as input, which may describe relational or XML data models. Via a graphical Schema Viewer, a user can then interactively specify attribute correspondences between the source and target schemas. An AttributeMatcher component helps suggest such correspondences, based on the similarity of both attribute names and attribute values. Once the user has specified correspondences, Clio generates SQL, SQL/XML, XQuery or XSLT on the fly to implement the specified transformation, which is guaranteed to produce output data that conforms to the target schema. In this paper, we first describe and demonstrate some basic features of Clio. In particular, we describe the abstracted problems and the algorithms behind the AttributeMatcher component. Then, we describe additional research problems abstracted from the area of schema mapping and information integration, with an emphasis on graph algorithms and issues of scalability and parallelism.
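The AttributeMatcher idea (scoring correspondences by both attribute names and attribute values) can be sketched as a weighted combination of a string-similarity score and a value-overlap score. This is a hypothetical illustration, not Clio's actual algorithm; `suggest` and its weighting are invented for this example.

```python
# Hypothetical sketch of an attribute matcher combining name similarity
# with value overlap, in the spirit of (not identical to) AttributeMatcher.
from difflib import SequenceMatcher

def name_sim(a, b):
    """String similarity of attribute names, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def value_sim(xs, ys):
    """Jaccard overlap of the attributes' value sets."""
    xs, ys = set(xs), set(ys)
    return len(xs & ys) / len(xs | ys) if xs | ys else 0.0

def suggest(src, tgt, w_name=0.5):
    """For each source attribute, suggest the best-scoring target attribute."""
    return {
        s: max(tgt, key=lambda t: w_name * name_sim(s, t)
                                  + (1 - w_name) * value_sim(sv, tgt[t]))
        for s, sv in src.items()
    }

src = {"empname": ["Ann", "Bob"], "salary": [100, 200]}
tgt = {"name": ["Bob", "Carol"], "pay": [200, 300]}
print(suggest(src, tgt))
```

Here `empname` matches `name` mostly on the name score, while `salary` matches `pay` mostly on overlapping values, showing why combining both signals helps.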


Very Large Data Bases | 2009

IBM UFO repository: object-oriented data integration

Michael N. Gubanov; Lucian Popa; Howard Ho; Hamid Pirahesh; Jeng-Yih Chang; Shr-Chang Chen

Currently, WWW, large enterprise, and desktop users suffer from an inability to efficiently access and manage differently structured data. The same data objects (e.g. Product) stored by different databases, repositories, distributed web storage systems, etc. are named, referenced, and combined internally into schemas or data structures differently. This leads to a structural mismatch of data that often consists of the same semantic objects (e.g. eBay and Yahoo! online auction offers).


International Conference on Data Engineering | 2007

Creating Nested Mappings with Clio

Mauricio A. Hernández; Howard Ho; Lucian Popa; Takeshi Fukuda; Ariel Fuxman; Renée J. Miller; Paolo Papotti

Schema mappings play a central role in many data integration and data exchange scenarios. In those applications, users need to quickly and correctly specify how data represented in one format is converted into a different format. Clio (L. Popa et al., 2002) is a joint research project between IBM and the University of Toronto studying the creation, maintenance, and use of schema mappings. There have always been two goals in our work in Clio: 1) the automatic creation of logical assertions that capture the way one or more source schemas are mapped into a target schema, and 2) the generation of transformation queries or programs that transform a source data instance into a target data instance.


Archive | 2010

Unleashing the Power of Public Data for Financial Risk Measurement, Regulation, and Governance

Mauricio A. Hernández; Sanjiv Ranjan Das; Howard Ho; Georgia Koutrika; Rajasekar Krishnamurthy; Lucian Popa; Ioana Stanoi; Shivakumar Vaithyanathan

We present Midas, a system that uses complex data processing to extract and aggregate facts from a large collection of structured and unstructured documents into a set of unified, clean entities and relationships. Midas focuses on data for financial companies and is based on periodic filings with the U.S. Securities and Exchange Commission (SEC) and Federal Deposit Insurance Corporation (FDIC). We show that, by using data aggregated by Midas, we can provide valuable insights about financial institutions either at the whole system level or at the individual company level. To illustrate, we show how co-lending relationships that are extracted and aggregated from SEC text filings can be used to construct a network of the major financial institutions. Centrality computations on this network enable us to identify critical hub banks for monitoring systemic risk. Financial analysts or regulators can further drill down into individual companies and visualize aggregated financial data as well as relationships with other companies or people (e.g., officers or directors). The key technology components that we implemented in Midas and that enable the above applications are: information extraction, entity resolution, mapping and fusion, all on top of a scalable infrastructure based on Hadoop.
