Marie Jacob | Researchain

Publication

Featured research published by Marie Jacob.

international conference on management of data | 2008

The ORCHESTRA Collaborative Data Sharing System

Zachary G. Ives; Todd J. Green; Grigoris Karvounarakis; Nicholas E. Taylor; Val Tannen; Partha Pratim Talukdar; Marie Jacob; Fernando Pereira

Sharing structured data today requires standardizing upon a single schema, then mapping and cleaning all of the data. This results in a single queriable mediated data instance. However, for settings in which structured data is being collaboratively authored by a large community, e.g., in the sciences, there is often a lack of consensus about how it should be represented, what is correct, and which sources are authoritative. Moreover, such data is seldom static: it is frequently updated, cleaned, and annotated. The ORCHESTRA collaborative data sharing system develops a new architecture and consistency model for such settings, based on the needs of data sharing in the life sciences. In this paper we describe the basic architecture and implementation of the ORCHESTRA system, and summarize some of the open challenges that arise in this setting.

very large data bases | 2010

Dynamic join optimization in multi-hop wireless sensor networks

Svilen R. Mihaylov; Marie Jacob; Zachary G. Ives; Sudipto Guha

To enable smart environments and self-tuning data centers, we are developing the Aspen system for integrating physical sensor data, as well as stream data coming from machine logical state, and database or Web data from the Internet. A key component of this system is a query processor optimized for limited-bandwidth, possibly battery-powered devices with multiple hop wireless radio communications. This query processor is given a portion of a data integration query, possibly including joins among sensors, to execute. Several recent papers have developed techniques for computing joins in sensors, but these techniques are static and are only appropriate for specific join selectivity ratios. We consider the problem of dynamic join optimization for sensor networks, developing solutions that employ cost modeling, as well as adaptive learning and self-tuning heuristics to choose the best algorithm under real and variable selectivity values. We focus on in-network join computation, but our architecture extends to other approaches (and we compare against these). We develop basic techniques assuming selectivities are uniform and known in advance, and optimization can be done on a pairwise basis; we then extend the work to handle joins between multiple pairs, when selectivities are not fully known. We experimentally validate our work at scale using standard datasets.
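The trade-off between join strategies described above can be illustrated with a toy cost model. This is a minimal sketch, not the Aspen optimizer: the cost formulas, the function names, and the idea of counting "tuple-hops" as the unit of cost are all assumptions made for illustration.

```python
# Toy cost model (hypothetical, not from the paper): cost is counted in
# tuple-hops, i.e. one tuple forwarded over one wireless hop.

def at_base_cost(n_a, n_b, hops_a, hops_b):
    """Ship every tuple from both sensors to the base station and join there."""
    return n_a * hops_a + n_b * hops_b


def in_network_cost(n_a, n_b, hops_ab, hops_a, selectivity):
    """Ship B's tuples to A, join at A, then send only the results to the base."""
    ship_to_a = n_b * hops_ab
    ship_results = selectivity * n_a * n_b * hops_a
    return ship_to_a + ship_results


def choose_strategy(n_a, n_b, hops_a, hops_b, hops_ab, selectivity):
    """Pick the cheaper strategy under the assumed cost model."""
    costs = {
        "at-base": at_base_cost(n_a, n_b, hops_a, hops_b),
        "in-network": in_network_cost(n_a, n_b, hops_ab, hops_a, selectivity),
    }
    return min(costs, key=costs.get)


# A selective join favours joining inside the network; an unselective one
# produces so many results that shipping raw tuples to the base is cheaper.
low = choose_strategy(100, 100, hops_a=5, hops_b=5, hops_ab=1, selectivity=0.001)
high = choose_strategy(100, 100, hops_a=5, hops_b=5, hops_ab=1, selectivity=0.05)
```

Under these assumed numbers the choice flips as selectivity changes, which is why a static strategy tuned for one selectivity ratio can be a poor fit for another.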

very large data bases | 2014

A system for management and analysis of preference data

Marie Jacob; Benny Kimelfeld; Julia Stoyanovich

Preference data arises in a wide variety of domains. Over the past decade, we have seen a sharp increase in the volume of preference data, in the diversity of applications that use it, and in the richness of preference data analysis methods. Examples of applications include rank aggregation in genomic data analysis, management of votes in elections, and recommendation systems in e-commerce. However, little attention has been paid to the challenges of building a system for preference-data management, which would help incorporate sophisticated analytics into larger applications, support computational abstractions for usability by data scientists, and enable scaling up to modern volumes. This vision paper proposes a management system for preference data that aims to address these challenges. We adopt the relational database model, and propose extensions that are specialized to handling preference data. Specifically, we introduce a special type of a relation that is designed for preference data, and describe composable operators on preference relations that can be embedded in SQL statements, for convenient reuse across applications.
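Rank aggregation, one of the applications named above, can be sketched with a Borda count. This is only an illustrative example of an operator over preference data; the abstract does not say which aggregation methods the proposed system supports, and the function name and sample votes are hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Aggregate full rankings (best item first) into one consensus ranking.

    Each item earns (n - 1 - position) points per ranking, where n is the
    length of that ranking. Ties are broken alphabetically by item name.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - 1 - position
    return sorted(scores, key=lambda item: (-scores[item], item))


# Three hypothetical voters ranking the same three candidates.
votes = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["a", "c", "b"],
]
consensus = borda_aggregate(votes)
```

Here "a" collects 5 points, "b" 3, and "c" 1, so the consensus order is a, b, c.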

data management for sensor networks | 2008

A substrate for in-network sensor data integration

Svilen R. Mihaylov; Marie Jacob; Zachary G. Ives; Sudipto Guha

With the ultimate goal of extending the data integration paradigm and query processing capabilities to ad hoc wireless networks, sensors, and stream systems, we consider how to support communication between sets of nodes performing distributed joins in sensor networks. We develop a communication model that enables in-network join at a variety of locations, and which facilitates coordination among nodes in order to make optimization decisions. While we defer a discussion of the optimizer to future work, we experimentally compare a variety of strategies, including at-base and in-network joins. Results show significant performance gains versus prior work, as well as opportunities for optimization.

international conference on management of data | 2011

Sharing work in keyword search over databases

Marie Jacob; Zachary G. Ives

An important means of allowing non-expert end-users to pose ad hoc queries, whether over single databases or data integration systems, is through keyword search. Given a set of keywords, the query processor finds matches across different tuples and tables. It computes and executes a set of relational sub-queries whose results are combined to produce the k highest ranking answers. Work on keyword search primarily focuses on single-database, single-query settings: each query is answered in isolation, despite possible overlap between queries posed by different users or at different times; and the number of relevant tables is assumed to be small, meaning that sub-queries can be processed without using cost-based methods to combine work. As we apply keyword search to support ad hoc data integration queries over scientific or other databases on the Web, we must reuse and combine computation. In this paper, we propose an architecture that continuously receives sets of ranked keyword queries, and seeks to reuse work across these queries. We extend multiple query optimization and continuous query techniques, and develop a new query plan scheduling module we call the ATC (based on its analogy to an air traffic controller). The ATC manages the flow of tuples among a multitude of pipelined operators, minimizing the work needed to return the top-k answers for all queries. We also develop techniques to manage the sharing and reuse of state as queries complete and input data streams are exhausted. We show the effectiveness of our techniques in handling queries over real and synthetic data sets.
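The basic idea of reusing sub-query work across overlapping keyword queries can be sketched as follows. This is a deliberately simplified illustration and not the ATC: it caches fully materialized sub-query results instead of scheduling pipelined operators, and every name in it (`SharedSubqueryCache`, `top_k`, the toy sub-queries and scores) is hypothetical.

```python
import heapq


class SharedSubqueryCache:
    """Run each sub-query at most once and share its results across queries."""

    def __init__(self, run_subquery):
        self._run = run_subquery
        self._cache = {}
        self.executions = 0  # how many sub-queries were actually executed

    def results(self, subquery):
        if subquery not in self._cache:
            self.executions += 1
            self._cache[subquery] = list(self._run(subquery))
        return self._cache[subquery]


def top_k(cache, subqueries, k):
    """Combine (score, answer) pairs from several sub-queries, keep the best k."""
    candidates = (pair for sq in subqueries for pair in cache.results(sq))
    return heapq.nlargest(k, candidates, key=lambda pair: pair[0])


# Toy stand-in for a database: sub-query name -> scored answers.
TOY_RESULTS = {
    "author_paper": [(0.9, "t1"), (0.4, "t2")],
    "paper_venue": [(0.7, "t3")],
    "author_venue": [(0.5, "t4")],
}

cache = SharedSubqueryCache(lambda sq: TOY_RESULTS[sq])
first = top_k(cache, ["author_paper", "paper_venue"], k=2)
second = top_k(cache, ["author_paper", "author_venue"], k=2)
```

The two keyword queries share the `author_paper` sub-query, so only three sub-queries run instead of four. A pipelined design would go further and stop pulling tuples once the top-k answers are certain.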

international workshop on the web and databases | 2015

Analyzing Crowd Rankings

Julia Stoyanovich; Marie Jacob; Xuemei Gong

Ranked data is ubiquitous in real-world applications, arising naturally when users express preferences about products and services, when voters cast ballots in elections, and when funding proposals are evaluated based on their merits or university departments based on their reputation. This paper focuses on crowdsourcing and novel analysis of ranked data. We describe the design of a data collection task in which Amazon Mechanical Turk workers were asked to rank movies. We present results of data analysis, correlating our ranked dataset with IMDb, where movies are rated on a discrete scale rather than ranked. We develop an intuitive measure of worker quality appropriate for this task, where no gold standard answer exists. We propose a model of local structure in ranked datasets, reflecting that subsets of the workers agree in their ranking over subsets of the items, develop a data mining algorithm that identifies such structure, and evaluate it on our dataset. Our dataset is publicly available at https://github.com/stoyanovich/CrowdRank.
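One common way to quantify how much two workers agree in their rankings is the Kendall tau distance, which counts item pairs ordered differently by the two rankings. The abstract does not state which agreement measure the paper uses, so this is a standard technique swapped in for illustration; the function name and the movie identifiers are hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(ranking_1, ranking_2):
    """Count item pairs that the two rankings place in opposite order.

    Both rankings must be lists of the same distinct items, best first.
    """
    pos_1 = {item: i for i, item in enumerate(ranking_1)}
    pos_2 = {item: i for i, item in enumerate(ranking_2)}
    if pos_1.keys() != pos_2.keys():
        raise ValueError("rankings must cover the same items")
    return sum(
        1
        for a, b in combinations(ranking_1, 2)
        if (pos_1[a] - pos_1[b]) * (pos_2[a] - pos_2[b]) < 0
    )


# Two hypothetical workers ranking the same three movies.
worker_a = ["m1", "m2", "m3"]
worker_b = ["m3", "m1", "m2"]
disagreements = kendall_tau_distance(worker_a, worker_b)
```

A distance of 0 means identical rankings, and the maximum for n items is n(n-1)/2, reached when one ranking is the reverse of the other.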

very large data bases | 2008

Learning to create data-integrating queries

Partha Pratim Talukdar; Marie Jacob; Muhammad Salman Mehmood; Koby Crammer; Zachary G. Ives; Fernando Pereira; Sudipto Guha

conference on innovative data systems research | 2009

Interactive Data Integration through Smart Copy & Paste

Zachary G. Ives; Craig A. Knoblock; Steven Minton; Marie Jacob; Partha Pratim Talukdar; Rattapoom Tuchinda; José Luis Ambite; Maria Muslea; Cenk Gazen

international conference on management of data | 2009