Publication


Featured research published by Len Seligman.


International Conference on Management of Data | 2010

OpenII: an open source information integration toolkit

Len Seligman; Peter Mork; Alon Y. Halevy; Kenneth P. Smith; Michael J. Carey; Kuang Chen; Chris Wolf; Jayant Madhavan; Akshay Kannan; Doug Burdick

OpenII (openintegration.org) is a collaborative effort to create a suite of open-source tools for information integration (II). The project is leveraging the latest developments in II research to create a platform on which integration tools can be built and further research conducted. In addition to a scalable, extensible platform, OpenII includes industrial-strength components developed by MITRE, Google, UC-Irvine, and UC-Berkeley that interoperate through a common repository in order to solve II problems. Components of the toolkit have been successfully applied to several large-scale US government II challenges.
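The abstract's central architectural idea is that independently developed components interoperate through one shared repository. The sketch below is only a rough illustration of that plugin-on-a-shared-store pattern; the class, registry, and function names (SchemaRepository, register, naive_matcher) are invented for this example and are not OpenII's actual API.

    class SchemaRepository:
        """Single shared store that every integration tool reads and writes."""
        def __init__(self):
            self.schemas = {}     # schema name -> list of attribute names
            self.mappings = []    # (source_attr, target_attr, confidence)

    TOOLS = {}

    def register(name):
        """Decorator so independently developed components plug into one platform."""
        def add(fn):
            TOOLS[name] = fn
            return fn
        return add

    @register("naive-matcher")
    def naive_matcher(repo, source, target):
        # Trivial stand-in matcher: align attributes whose names match ignoring case.
        for a in repo.schemas[source]:
            for b in repo.schemas[target]:
                if a.lower() == b.lower():
                    repo.mappings.append((a, b, 1.0))

    repo = SchemaRepository()
    repo.schemas["hospital"] = ["PatientID", "AdmitDate"]
    repo.schemas["registry"] = ["patientid", "DischargeDate"]
    TOOLS["naive-matcher"](repo, "hospital", "registry")
    print(repo.mappings)   # [('PatientID', 'patientid', 1.0)]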


Journal on Data Semantics | 2008

The Harmony Integration Workbench

Peter Mork; Len Seligman; Arnon Rosenthal; Joel Korb; Chris Wolf

A key aspect of any data integration endeavor is determining the relationships between the source schemata and the target schema. This schema integration task must be tackled regardless of the integration architecture or mapping formalism. In this paper, we provide a task model for schema integration. We use this breakdown to motivate a workbench for schema integration in which multiple tools share a common knowledge repository. In particular, the workbench facilitates the interoperation of research prototypes for schema matching (which automatically identify likely semantic correspondences) with commercial schema mapping tools (which help produce instance-level transformations). Currently, each of these tools provides its own ad hoc representation of schemata and mappings; combining these tools requires aligning these representations. The workbench provides a common representation so that these tools can more rapidly be combined.
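The "common representation" of schemata and mappings can be pictured as a small shared data model that both schema matchers and mapping tools read and write. The following is a minimal sketch under assumed names (Element, Correspondence, Repository); it is not the workbench's real repository schema.

    from dataclasses import dataclass, field

    @dataclass
    class Element:
        schema_id: str
        path: str               # e.g. "Patient/BirthDate"
        datatype: str = "string"

    @dataclass
    class Correspondence:
        source: Element
        target: Element
        confidence: float       # proposed by a matcher, possibly refined by an analyst
        provenance: str = ""    # which tool or person proposed it

    @dataclass
    class Repository:
        """Shared store that matching and mapping tools read and write."""
        elements: list = field(default_factory=list)
        correspondences: list = field(default_factory=list)

        def matches_for(self, target_schema_id):
            return [c for c in self.correspondences
                    if c.target.schema_id == target_schema_id]

    e1 = Element("hospital", "Patient/BirthDate", "date")
    e2 = Element("registry", "person.dob", "date")
    repo = Repository(elements=[e1, e2],
                      correspondences=[Correspondence(e1, e2, 0.9, "string matcher")])
    print(len(repo.matches_for("registry")))   # 1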


IEEE Computer | 2008

Everybody Share: The Challenge of Data-Sharing Systems

Kenneth P. Smith; Len Seligman; Vipin Swarup

Data sharing is increasingly important in modern society, yet researchers typically focus on a single technology, such as Web services, or explore only one aspect, such as semantic integration. The authors propose a technology-neutral framework for characterizing data sharing problems and solutions and discuss open research challenges.


International Conference on Information Systems Security | 2006

A data sharing agreement framework

Vipin Swarup; Len Seligman; Arnon Rosenthal

When consumers build value-added services on top of data resources they do not control, they need to manage their information supply chains to ensure that their data suppliers produce and supply required data as needed. Producers also need to manage their information supply chains to ensure that their data is disseminated and protected appropriately. In this paper, we present a framework for data sharing agreements (DSA) that supports a wide variety of data sharing policies. A DSA is modeled as a set of obligation constraints expressed over a dataflow graph whose nodes are principals with local stores and whose edges are (typed) channels along which data flows. We present a specification language for DSAs in which obligations are expressed as distributed temporal logic (DTL) predicates over data resources, dataflow events, and datastore events. We illustrate the use of our framework via a case study based on a real-world data sharing agreement and discuss issues related to the analysis and compliance of agreements.
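The paper specifies obligations in distributed temporal logic; that formalism is not reproduced here. As a loose illustration of the underlying model, the sketch below represents the dataflow graph's typed channels and flow events in Python and checks one hypothetical obligation ("each update must be delivered within a deadline"); all names and the scenario are invented for the example.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Channel:
        producer: str       # principal that supplies the data
        consumer: str       # principal that receives it
        datatype: str       # type of data flowing along this edge

    @dataclass(frozen=True)
    class FlowEvent:
        channel: Channel
        dataset: str
        day: int            # abstract time step

    def delivered_within(events, channel, dataset, update_day, deadline):
        """Obligation check: data updated on `update_day` must flow on `channel`
        no later than `update_day + deadline`."""
        return any(e.channel == channel and e.dataset == dataset
                   and update_day <= e.day <= update_day + deadline
                   for e in events)

    # Hypothetical agreement: each forecast update reaches the consumer within 2 days.
    ch = Channel("weather-service", "responder-app", "forecast")
    log = [FlowEvent(ch, "forecast-update-17", day=1)]
    print(delivered_within(log, ch, "forecast-update-17", update_day=0, deadline=2))  # True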


Information Reuse and Integration | 2011

PLUS: A provenance manager for integrated information

Adriane Chapman; Barbara T. Blaustein; Len Seligman; M. David Allen

It can be difficult to fully understand the result of integrating information from diverse sources. When all the information comes from a single organization, there is a collective knowledge about where it came from and whether it can be trusted. Unfortunately, once information from multiple organizations is integrated, there is no longer a shared knowledge of the data and its quality. It is often impossible to view and judge the information from a different organization; when errors occur, notification does not always reach all users of the data. We describe how a multi-organizational provenance store that collects provenance from heterogeneous systems addresses these problems. Unlike most provenance systems, we cope with an open world, where the data usage is not determined in advance and can take place across many systems and organizations.
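A provenance store of the kind described can be thought of as a derivation graph whose nodes carry the reporting organization, so lineage queries cross organizational boundaries. The snippet below is an illustrative toy, not the PLUS data model; the node names and organizations are made up.

    from collections import defaultdict

    class ProvenanceStore:
        """Toy cross-organization provenance store: each node is a data item or
        process run, and edges record which upstream nodes it was derived from."""
        def __init__(self):
            self.owner = {}                        # node -> reporting organization
            self.derived_from = defaultdict(set)   # node -> upstream nodes

        def report(self, node, org, inputs=()):
            self.owner[node] = org
            self.derived_from[node].update(inputs)

        def lineage(self, node):
            """All upstream nodes that contributed to `node`, across organizations."""
            seen, stack = set(), [node]
            while stack:
                for parent in self.derived_from[stack.pop()]:
                    if parent not in seen:
                        seen.add(parent)
                        stack.append(parent)
            return {n: self.owner.get(n, "unknown") for n in seen}

    store = ProvenanceStore()
    store.report("census.csv", org="Agency A")
    store.report("cleaned.csv", org="Agency B", inputs=["census.csv"])
    store.report("briefing.pdf", org="Agency C", inputs=["cleaned.csv"])
    print(store.lineage("briefing.pdf"))   # traces back through both upstream agencies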


Very Large Data Bases | 2008

Analyzing and revising data integration schemas to improve their matchability

Xiaoyong Chai; Mayssam Sayyadian; AnHai Doan; Arnon Rosenthal; Len Seligman

Data integration systems often provide a uniform query interface, called a mediated schema, to a multitude of data sources. To answer user queries, such systems employ a set of semantic matches between the mediated schema and the data-source schemas. Finding such matches is well known to be difficult. Hence much work has focused on developing semi-automatic techniques to efficiently find the matches. In this paper we consider the complementary problem of improving the mediated schema, to make finding such matches easier. Specifically, a mediated schema S will typically be matched with many source schemas. Thus, can the developer of S analyze and revise S in a way that preserves S's semantics, and yet makes it easier to match with in the future? In this paper we provide an affirmative answer to the above question, and outline a promising solution direction, called mSeer. Given a mediated schema S and a matching tool M, mSeer first computes a matchability score that quantifies how well S can be matched against using M. Next, mSeer uses this score to generate a matchability report that identifies the problems in matching S. Finally, mSeer addresses these problems by automatically suggesting changes to S (e.g., renaming an attribute, reformatting data values, etc.) that it believes will preserve the semantics of S and yet make it more amenable to matching. We present extensive experiments over several real-world domains that demonstrate the promise of the proposed approach.
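The abstract does not give mSeer's scoring formula, so the snippet below substitutes a generic string similarity purely to illustrate the idea of a matchability score: for each source-schema attribute, take its best similarity to any mediated-schema attribute, then average. The attribute names are invented.

    from difflib import SequenceMatcher

    def similarity(a, b):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def matchability(mediated_attrs, source_schemas):
        """Average, over all source attributes, of the best similarity to any
        mediated-schema attribute; higher means the mediated schema is easier to match."""
        best = [max(similarity(attr, m) for m in mediated_attrs)
                for schema in source_schemas for attr in schema]
        return sum(best) / len(best)

    mediated = ["customer_name", "dob", "addr"]
    sources = [["CustomerName", "DateOfBirth", "Address"],
               ["cust_nm", "birth_date", "street_address"]]
    print(round(matchability(mediated, sources), 2))
    # A matchability report could flag low-scoring attributes such as "dob" and
    # suggest a semantics-preserving rename (e.g., "date_of_birth").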


International Provenance and Annotation Workshop | 2010

Capturing Provenance in the Wild

M. David Allen; Adriane Chapman; Barbara T. Blaustein; Len Seligman

All current provenance systems are “closed world” systems; provenance is collected within the confines of a well understood, pre-planned system. However, when users compose services from heterogeneous systems and organizations to form a new application, it is impossible to track the provenance in the new system using currently available work. In this work, we describe the ability to compose multiple provenance-unaware services in an “open world” system and still collect provenance information about their execution. Our approach is implemented using the PLUS provenance system and the open source MULE Enterprise Service Bus. Our evaluations show that this approach is scalable and has minimal overhead.
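The actual implementation intercepts messages in the MULE Enterprise Service Bus; that code is not shown here. As a rough stand-in for the idea, the wrapper below records provenance around calls to provenance-unaware services composed into a pipeline; the service names and log structure are assumptions made for this example.

    provenance_log = []   # stands in for an external provenance store

    def with_provenance(service_name, fn):
        """Wrap a provenance-unaware service so every call is recorded,
        without modifying the service itself."""
        def wrapped(*inputs):
            output = fn(*inputs)
            provenance_log.append({"service": service_name,
                                   "inputs": list(inputs),
                                   "output": output})
            return output
        return wrapped

    # Two independent, provenance-unaware services composed into a small pipeline.
    geocode = with_provenance("geocoder", lambda addr: (42.36, -71.06))
    assess  = with_provenance("risk-model", lambda coords: "HIGH")

    print(assess(geocode("123 Main St")))
    print(provenance_log)   # records the chain: address -> coordinates -> risk rating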


Information Reuse and Integration | 2011

Unity: Speeding the creation of community vocabularies for information integration and reuse

Kenneth P. Smith; Peter Mork; Len Seligman; Peter Leveille; Beth Yost; Maya Hao Li; Chris Wolf

Many data sharing communities create data standards (“hub” schemata) to speed information integration by increasing reuse of both data definitions and mappings. Unfortunately, creation of these standards and the mappings to the enterprises' implemented systems is both time consuming and expensive. This paper presents Unity, a novel tool for speeding the development of a community vocabulary, which includes both a standard schema and the necessary mappings. We present Unity's scalable algorithms for creating vocabularies and its novel human-computer interface, which gives the integrator a powerful environment for refining the vocabulary. We then describe Unity's extensive reuse of data structures and algorithms from the OpenII information integration framework, which not only sped the construction of Unity but also results in reuse of the artifacts produced by Unity: vocabularies serve as the basis of information exchanges, and can also be reused as thesauri by other tools within the OpenII framework. Unity has been applied to real U.S. government information integration challenges.
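Unity's vocabulary-construction algorithms and its reuse of OpenII structures are not reproduced here. The toy sketch below only conveys the core idea: cluster similar attribute names from several schemas into shared vocabulary terms while retaining the mapping from each term back to the contributing schemas. The similarity threshold and schema names are arbitrary.

    from difflib import SequenceMatcher

    def similar(a, b, threshold=0.6):
        return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

    def build_vocabulary(schemas):
        """Greedily cluster attribute names into vocabulary terms.
        Returns {canonical_term: [(schema_name, attribute), ...]}."""
        vocab = {}
        for schema_name, attrs in schemas.items():
            for attr in attrs:
                for term in vocab:
                    if similar(term, attr):
                        vocab[term].append((schema_name, attr))
                        break
                else:
                    vocab[attr] = [(schema_name, attr)]
        return vocab

    schemas = {"agency_a": ["incident_id", "report_date", "location"],
               "agency_b": ["IncidentID", "ReportDate", "loc_description"]}
    for term, members in build_vocabulary(schemas).items():
        print(term, "->", members)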


IEEE International Conference on Technologies for Homeland Security | 2009

Information interoperability and provenance for emergency preparedness and response

Len Seligman; Barbara T. Blaustein; Peter Mork; Kenneth P. Smith; Neal Rothleder

Improved situation awareness is a key enabler of better emergency preparedness and response (EP&R). This paper describes two important challenges: information interoperability and provenance. The former enables meaningful information exchange across separately developed systems, while the latter gives users context that helps them interpret shared information and make trust decisions. We present applied research in information interoperability and provenance, describe our collaborations with leading industrial and academic partners, and illustrate how the resulting tools improve information sharing during preparation, training/exercises, ongoing operations, and response.


Journal of Data and Information Quality | 2016

The Challenge of “Quick and Dirty” Information Quality

Adriane Chapman; Arnon Rosenthal; Len Seligman

Traditionally, information quality (IQ) techniques provide CIOs and other senior IT managers with an in-depth quality assessment of data assets under the control of the enterprise. The goal is to clean dirty data to improve current operational effectiveness and to identify ways in which the organization’s data production processes can be improved so as to produce higher-quality data in the future. This process typically takes place over months and years, and it consumes substantial human and computer resources [Redman 1998; Shilakes and Tylman 1998; Gorla et al. 2010]. Prior work that studies data quality within a limited time frame, such as Long et al. [2004], assumes that the data owner is undertaking a curation task. This article argues for a new direction in IQ research, to enable the creation of rapidly deployed functionality, where the timescale is hours or days rather than months. In these scenarios, developers, not the CIO, make decisions but have neither time nor authority to influence the sources’ data collection processes. Developers may need to rapidly (1) choose from among a pool of candidate data sources, many of which may be external to the developer’s home organization; (2) clean the data; (3) integrate data from multiple sources; and (4) provide a lightweight, value-added application to decision makers. Fundamentally, these are triage activities involving multiple, rapid sprints designed to identify additional, value-added data sources, while ruling out or postponing the collection of other data sources. This triage activity is required to enable time-critical decision making and minimize decision-maker risk, giving rise to the need for quick and dirty information quality (QDIQ) assessments. The need for QDIQ assessments spans public health, disaster response, intelligence, and coalition…
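The article motivates QDIQ rather than prescribing an algorithm, so the snippet below is merely one illustration of the kind of lightweight, hours-not-months assessment it calls for: quick completeness and freshness metrics used to triage a candidate source. The field names and thresholds are invented.

    from datetime import date

    def quick_profile(rows, required_fields, as_of=None):
        """Lightweight triage metrics for one candidate source: completeness of
        required fields and the share of records updated in the last 30 days."""
        as_of = as_of or date.today()
        total = len(rows)
        complete = sum(all(r.get(f) not in (None, "") for f in required_fields)
                       for r in rows)
        recent = sum((as_of - r["updated"]).days <= 30
                     for r in rows if "updated" in r)
        return {"rows": total,
                "completeness": complete / total if total else 0.0,
                "recent_share": recent / total if total else 0.0}

    candidate = [
        {"shelter": "Eastside HS", "capacity": 300,  "updated": date(2024, 6, 1)},
        {"shelter": "Armory",      "capacity": None, "updated": date(2023, 1, 5)},
    ]
    print(quick_profile(candidate, required_fields=["shelter", "capacity"],
                        as_of=date(2024, 6, 10)))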
