Mary Tork Roth | Researchain

Publication

Featured research published by Mary Tork Roth.

international conference on management of data | 2005

Clio grows up: from research prototype to industrial tool

Laura M. Haas; Mauricio A. Hernández; Howard Ho; Lucian Popa; Mary Tork Roth

Clio, the IBM Research system for expressing declarative schema mappings, has progressed in the past few years from a research prototype into a technology that is behind some of IBM's mapping technology. Clio provides a declarative way of specifying schema mappings between either XML or relational schemas. Mappings are compiled into an abstract query graph representation that captures the transformation semantics of the mappings. The query graph can then be serialized into different query languages, depending on the kind of schemas and systems involved in the mapping. Clio currently produces XQuery, XSLT, SQL, and SQL/XML queries. In this paper, we revisit the architecture and algorithms behind Clio. We then discuss some implementation issues, optimizations needed for scalability, and general lessons learned on the road toward creating an industrial-strength tool.
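The idea of serializing a declarative mapping into a query can be illustrated with a toy sketch. This is not Clio's query graph or its algorithms: the `Mapping` class, the `to_sql` function, and the table and column names are all hypothetical, and the sketch handles only a flat relational-to-relational case with SQL as the single output language.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Mapping:
    """A hypothetical, flat schema mapping (illustration only, not Clio's model)."""
    source_tables: List[str]                 # e.g. ["emp e", "dept d"]
    join_condition: str                      # e.g. "e.dno = d.dno"; may be empty
    target_table: str                        # e.g. "staff"
    correspondences: List[Tuple[str, str]]   # (source expression, target column)


def to_sql(mapping: Mapping) -> str:
    """Serialize the declarative mapping into one INSERT ... SELECT statement."""
    target_cols = ", ".join(t for _, t in mapping.correspondences)
    select_list = ", ".join(f"{s} AS {t}" for s, t in mapping.correspondences)
    sql = (
        f"INSERT INTO {mapping.target_table} ({target_cols})\n"
        f"SELECT {select_list}\n"
        f"FROM {', '.join(mapping.source_tables)}"
    )
    # The join condition is optional: a single-table mapping needs no WHERE clause.
    if mapping.join_condition:
        sql += f"\nWHERE {mapping.join_condition}"
    return sql
```

Because the mapping is kept as data rather than as a query, a second serializer (for example, one emitting XQuery) could in principle be written against the same `Mapping` object, which is the separation the abstract describes.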

IBM Systems Journal | 2002

Data integration through database federation

Laura M. Haas; Eileen Tien Lin; Mary Tork Roth

In a large modern enterprise, it is almost inevitable that different parts of the organization will use different systems to produce, store, and search their critical data. Yet, it is only by combining the information from these various systems that the enterprise can realize the full value of the data they contain. Database federation is one approach to data integration in which middleware, consisting of a relational database management system, provides uniform access to a number of heterogeneous data sources. In this paper, we describe the basics of database federation, introduce several styles of database federation, and outline the conditions under which each style of federation should be used. We discuss the benefits of an information integration solution based on database technology, and we demonstrate the utility of the database federation approach through a number of usage scenarios involving IBM's DB2 product.

IBM Systems Journal | 2002

Information integration: A new generation of information technology

Mary Tork Roth; Daniel C. Wolfson; Jim Kleewein; Constance Jane Nelin

The explosion of the Internet and e-business in recent years has caused a secondary explosion in the amounts and types of information available to enterprise applications. Industry analysts predict that more data will be generated in the next three years than in all of recorded history. Because the adoption of Internet-based business transaction models has significantly outpaced the development of tools and technologies to deal with the information explosion, many businesses find their systems breaking under the sheer volume and diversity of data being directed at them. The challenge facing businesses today is information integration. Enterprise applications must interact with databases, application servers, content management systems, data warehouses, workflow systems, search engines, message queues, Web crawlers, mining and analysis packages, and other enterprise integration applications. They must use a variety of programming interfaces and understand a variety of languages and formats. They must extract and combine data in multiple formats generated by multiple delivery mechanisms. Clearly, the boundaries that have traditionally existed between database management systems, content management systems, midtier caches, data warehouses, and other data management systems are blurring, and there is a great need for a platform that provides a unified view of all of these services and the data they deliver.

cooperative information systems | 1999

Using Fagin's algorithm for merging ranked results in multimedia middleware

Edward L. Wimmers; Laura M. Haas; Mary Tork Roth; Christoph Braendli

A distributed multimedia information system allows users to access data of different modalities, from different data sources, ranked by various combinations of criteria. Fagin (1996) gives an algorithm for efficiently merging multiple ordered streams of ranked results, to form a new stream ordered by a combination of those ranks. In this paper we describe the implementation of Fagin's algorithm in an actual multimedia middleware system, including a novel, incremental version of the algorithm that supports dynamic exploration of data. We show that the algorithm would perform well as part of a single multimedia server and can even be effective in the distributed environment (for a limited set of queries), but that the assumptions it makes about random access limit its applicability dramatically. Our experience provides a better understanding of an important algorithm, and exposes an open problem for distributed multimedia information systems.
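A minimal sketch of the basic (non-incremental) form of Fagin's algorithm may help make the random-access assumption concrete. It is a textbook-style rendering, not the middleware implementation described in the paper. The function name `fagin_top_k`, the in-memory list representation of each stream, and the default score of 0.0 for an object missing from a source are assumptions made for illustration; the combining function must be monotone (such as `min` or an average) for the result to be correct.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple

# One ranked stream: (object id, grade) pairs sorted by grade, best first.
Ranked = List[Tuple[str, float]]


def fagin_top_k(
    streams: Sequence[Ranked],
    k: int,
    combine: Callable[[Sequence[float]], float] = min,
) -> List[Tuple[str, float]]:
    """Return the k best objects under a monotone combining function."""
    # Random access is modeled as a dictionary lookup per source (an assumption;
    # in a distributed system this is the potentially expensive step).
    lookup = [dict(s) for s in streams]
    seen_in: Dict[str, Set[int]] = {}
    max_depth = max((len(s) for s in streams), default=0)

    # Phase 1: sorted access to all streams in lockstep, stopping once
    # at least k objects have been seen in every stream.
    depth = 0
    while depth < max_depth:
        for i, s in enumerate(streams):
            if depth < len(s):
                seen_in.setdefault(s[depth][0], set()).add(i)
        depth += 1
        complete = sum(1 for src in seen_in.values() if len(src) == len(streams))
        if complete >= k:
            break

    # Phase 2: random access to fetch every grade of every object seen so far.
    scored = [
        (obj, combine([lk.get(obj, 0.0) for lk in lookup])) for obj in seen_in
    ]

    # Phase 3: order by combined grade (ties broken by id) and keep the top k.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]
```

For example, with a color stream `[("a", 0.9), ("b", 0.8), ("c", 0.3), ("d", 0.1)]` and a shape stream `[("c", 0.95), ("a", 0.7), ("d", 0.6), ("b", 0.2)]`, sorted access stops at depth 2 for `k=1`, and phase 2 then needs a random-access lookup for objects `b` and `c` in the source where they have not yet been seen.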

IBM Systems Journal | 2004

Enabling distributed enterprise integration with WebSphere and DB2 information integrator

Cynthia Maro Saracco; Mary Tork Roth; Daniel C. Wolfson

Information technology architects increasingly find themselves searching for better ways to access, integrate, and leverage their information, applications, and business processes. Information integration, in particular, is critical to the community of Web-based businesses, as firms that are able to leverage their information resources most effectively are best positioned to emerge as leaders in their industries. In this paper, we explore how companies can solve this complex business challenge by extending the reach of WebSphere® technology with DB2® Information Integrator (II). DB2 II offers WebSphere developers a new approach to coping with diverse and distributed information sources, enabling them to reduce programming costs, shorten development cycles, and attain reasonable levels of performance for server-side components that need to integrate information throughout their enterprises and partner networks.

international conference on big data | 2015

LabBook: Metadata-driven social collaborative data analysis

Eser Kandogan; Mary Tork Roth; Peter M. Schwarz; Joshua Hui; Ignacio G. Terrizzano; Christina Christodoulakis; Renée J. Miller

Open data analysis platforms are being adopted to support collaboration in science and business. Studies suggest that analytic work in an enterprise occurs in a complex ecosystem of people, data, and software working in a coordinated manner. These studies also point to friction between the elements of this ecosystem that reduces user productivity and quality of work. LabBook is an open, social, and collaborative data analysis platform designed explicitly to reduce this friction and accelerate discovery. Its goal is to help users leverage each other's knowledge and experience to find the data, tools, and collaborators they need to integrate, visualize, and analyze data. The key insight is to collect and use more metadata about all elements of the analytic ecosystem by means of an architecture and user experience that reduce the cost of contributing such metadata. We demonstrate how metadata can be exploited to improve the collaborative user experience and facilitate collaborative data integration and recommendations. We describe a specific use case and discuss several design issues concerning the capture, representation, querying, and use of metadata.

international congress on big data | 2013

Data for All: A Systems Approach to Accelerate the Path from Data to Insight

Eser Kandogan; Mary Tork Roth; Cheryl A. Kieliszewski; Fatma Ozcan; Bob Schloss; Marc-Thomas Schmidt

Zettabytes of data are available to be harvested for competitive business advantage, sound government policies, and new insights in a broad array of applications. Yet, most of this data is inaccessible for users, since current data analysis tools require an army of technical people to find, transform, analyze, and visualize data in order to make it consumable for decision making. In this paper, we present work in progress to lower the barriers for data-driven decision making by introducing a systems approach to scale the user experience, not only in the volume and variety of data, but also in the skills required to harvest that data. We call for a new approach for data-intensive applications that engages the user as an intelligent partner in a social and intelligent conversation with data by automating, guiding, and recommending data, transformations, visualizations, analytics, and suggesting collaboration opportunities within an analytics marketplace, and leverages both metadata and semantic information about the data captured from conversations.

BTW | 1999

From Object-Relational to Federated Databases

Nelson Mendonca Mattos; Jim Kleewein; Mary Tork Roth; Kathy Zeidenstein

Object-relational databases allow users to manipulate rich data types that are not supported by traditional relational database systems. However, the majority of such data are in systems that are outside of that database—either in file systems, specialized systems, hierarchical databases, or other varieties of relational database that do not provide as rich a level of abstraction. Users want to take advantage of object-relational technology to exploit the rich semantics and abstraction but cannot afford to change existing applications or to move that data into the database to do so. The solution for enterprises is a federated database system, which allows users to leverage their existing applications and existing data, while allowing new applications to exploit the functional richness of object-relational technology.

international conference on data mining | 2016

Using Machine Learning to Accelerate Data Wrangling

Shilpi Ahuja; Mary Tork Roth; Rashmi Gangadharaiah; Peter M. Schwarz; Rafael Bastidas

70% of the time spent on data analytics is not actually spent on data analytics, but rather on data wrangling: the process of finding, interpreting, extracting, preparing, and recombining the data to be analyzed. For data that is collected as free-form text, the lack of standards or competing standards often results in a variety of formats for expressing the same type of data, making the data wrangling step a tedious and error-prone process. For example, US street addresses may be expressed with a house number, PO Box, rural or military route, and/or a direction, all of which can be abbreviated or spelled out in a variety of ways. In this paper, we present an algorithm that uses machine learning to efficiently and automatically identify categories of attributes, such as geo-spatial, that are present in a data file, and we discuss results on a variety of real data sets. Our implementation can be used to automatically prepare data for consumption by other tools and services, such as mapping and visualization tools, and is motivated by and in support of a customizable severe weather alerting service.

very large data bases | 1997