Publication


Featured research published by Adam Marcus.


human factors in computing systems | 2011

Twitinfo: aggregating and visualizing microblogs for event exploration

Adam Marcus; Michael S. Bernstein; Osama Badar; David R. Karger; Samuel Madden; Robert C. Miller

Microblogs are a tremendous repository of user-generated content about world events. However, for people trying to understand events by querying services like Twitter, a chronological log of posts makes it very difficult to get a detailed understanding of an event. In this paper, we present TwitInfo, a system for visualizing and summarizing events on Twitter. TwitInfo allows users to browse a large collection of tweets using a timeline-based display that highlights peaks of high tweet activity. A novel streaming algorithm automatically discovers these peaks and labels them meaningfully using text from the tweets. Users can drill down to subevents, and explore further via geolocation, sentiment, and popular URLs. We contribute a recall-normalized aggregate sentiment visualization to produce more honest sentiment overviews. An evaluation of the system revealed that users were able to reconstruct meaningful summaries of events in a small amount of time. An interview with a Pulitzer Prize-winning journalist suggested that the system would be especially useful for understanding a long-running event and for identifying eyewitnesses. Quantitatively, our system can identify 80-100% of manually labeled peaks, facilitating a relatively complete view of each event studied.
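The peak detection described above lends itself to a short illustration. The sketch below flags time bins whose tweet count rises well above an exponentially weighted running mean; it is a minimal stand-in for the general idea, not TwitInfo's actual algorithm, and the function name and parameter values are hypothetical.

```python
# Minimal sketch of streaming peak detection over per-minute tweet counts.
# Flags bins whose count far exceeds a running mean that is updated with
# exponential weighting. Illustrative only; not TwitInfo's exact algorithm.

def detect_peaks(counts, alpha=0.125, threshold=2.0, min_dev=1.0):
    """Return indices of bins whose count spikes well above the running mean."""
    if not counts:
        return []
    mean = float(counts[0])
    meandev = 0.0
    peaks = []
    for i, count in enumerate(counts[1:], start=1):
        if (count - mean) / max(meandev, min_dev) > threshold:
            peaks.append(i)  # unusually high activity relative to recent history
        # exponentially weighted updates (heavier weight on history)
        meandev = (1 - alpha) * meandev + alpha * abs(count - mean)
        mean = (1 - alpha) * mean + alpha * count
    return peaks

# Example: the burst at bins 5-6 is flagged.
print(detect_peaks([10, 12, 11, 9, 10, 80, 75, 12, 11]))  # -> [5, 6]
```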


very large data bases | 2011

Human-powered sorts and joins

Adam Marcus; Eugene Wu; David R. Karger; Samuel Madden; Robert C. Miller

Crowdsourcing markets like Amazon's Mechanical Turk (MTurk) make it possible to task people with small jobs, such as labeling images or looking up phone numbers, via a programmatic interface. MTurk tasks for processing datasets with humans are currently designed with significant reimplementation of common workflows and ad-hoc selection of parameters such as price to pay per task. We describe how we have integrated crowds into a declarative workflow engine called Qurk to reduce the burden on workflow designers. In this paper, we focus on how to use humans to compare items for sorting and joining data, two of the most common operations in DBMSs. We describe our basic query interface and the user interface of the tasks we post to MTurk. We also propose a number of optimizations, including task batching, replacing pairwise comparisons with numerical ratings, and pre-filtering tables before joining them, which dramatically reduce the overall cost of running sorts and joins on the crowd. In an experiment joining two sets of images, we reduce the overall cost from $67 in a naive implementation to about $3, without substantially affecting accuracy or latency. In an end-to-end experiment, we reduced cost by a factor of 14.5.
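The rating-based optimization mentioned in the abstract can be illustrated with a small sketch: instead of asking workers for O(n^2) pairwise comparisons, each item receives a handful of numeric ratings and items are sorted by mean rating. This shows the general idea under assumed interfaces (get_worker_ratings stands in for posting tasks to MTurk); it is not Qurk's actual API.

```python
# Sketch of the "ratings instead of pairwise comparisons" idea for a
# crowd-powered ORDER BY: each item is rated on a numeric scale by a few
# workers, and items are sorted by mean rating. Illustrative only.
from statistics import mean

def crowd_sort(items, get_worker_ratings, workers_per_item=3):
    """Sort items by the average of a few workers' numeric ratings."""
    scores = {}
    for item in items:
        ratings = get_worker_ratings(item, workers_per_item)  # e.g. crowd responses
        scores[item] = mean(ratings)
    return sorted(items, key=lambda item: scores[item], reverse=True)

# Toy stand-in for crowd responses (in practice these come from posted tasks).
fake_ratings = {"squirrel": [2, 3, 2], "horse": [5, 6, 5], "whale": [7, 7, 6]}
print(crowd_sort(list(fake_ratings), lambda item, k: fake_ratings[item][:k]))
# -> ['whale', 'horse', 'squirrel']  (largest perceived size first)
```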


human factors in computing systems | 2010

Enhancing directed content sharing on the web

Michael S. Bernstein; Adam Marcus; David R. Karger; Robert C. Miller

To find interesting, personally relevant web content, people rely on friends and colleagues to pass links along as they encounter them. In this paper, we study and augment link-sharing via e-mail, the most popular means of sharing web content today. Armed with survey data indicating that active sharers of novel web content are often those that actively seek it out, we developed FeedMe, a plug-in for Google Reader that makes directed sharing of content a more salient part of the user experience. FeedMe recommends friends who may be interested in seeing content that the user is viewing, provides information on what the recipient has seen and how many emails they have received recently, and gives recipients the opportunity to provide lightweight feedback when they appreciate shared content. FeedMe introduces a novel design space within mixed-initiative social recommenders: friends who know the user voluntarily vet the material on the user's behalf. We performed a two-week field experiment (N=60) and found that FeedMe made it easier and more enjoyable to share content that recipients appreciated and would not have found otherwise.
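As a rough illustration of the kind of recommendation FeedMe performs, the sketch below scores potential recipients by the similarity between a post's terms and each friend's interest profile. The cosine-similarity approach and all names here are assumptions for illustration, not FeedMe's actual model.

```python
# Illustrative recipient recommender: rank friends by how well a post's terms
# overlap each friend's interest profile. A generic term-overlap sketch, not
# FeedMe's actual model.
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend_recipients(post_terms, interest_profiles, top_k=3):
    """Rank friends by similarity between the post and their interest profile."""
    post = Counter(post_terms)
    ranked = sorted(interest_profiles.items(),
                    key=lambda kv: cosine(post, kv[1]), reverse=True)
    return [friend for friend, _ in ranked[:top_k]]

profiles = {
    "alice": Counter({"databases": 5, "crowdsourcing": 3}),
    "bob": Counter({"soccer": 4, "cooking": 2}),
}
print(recommend_recipients(["crowdsourcing", "databases", "mturk"], profiles, top_k=1))
# -> ['alice']
```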


very large data bases | 2012

Counting with the crowd

Adam Marcus; David R. Karger; Samuel Madden; Robert C. Miller; Sewoong Oh

In this paper, we address the problem of selectivity estimation in a crowdsourced database. Specifically, we develop several techniques for using workers on a crowdsourcing platform like Amazon's Mechanical Turk to estimate the fraction of items in a dataset (e.g., a collection of photos) that satisfy some property or predicate (e.g., photos of trees). We do this without explicitly iterating through every item in the dataset. This is important in crowd-sourced query optimization to support predicate ordering and in query evaluation, when performing a GROUP BY operation with a COUNT or AVG aggregate. We compare sampling item labels, a traditional approach, to showing workers a collection of items and asking them to estimate how many satisfy some predicate. Additionally, we develop techniques to eliminate spammers and colluding attackers trying to skew selectivity estimates when using this count estimation approach. We find that for images, counting can be much more effective than sampled labeling, reducing the amount of work necessary to arrive at an estimate that is within 1% of the true fraction by up to an order of magnitude, with lower worker latency. We also find that sampled labeling outperforms count estimation on a text processing task, presumably because people are better at quickly processing large batches of images than they are at reading strings of text. Our spammer detection technique, which is applicable to both the label- and count-based approaches, can improve accuracy by up to two orders of magnitude.
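The count-based estimation approach can be sketched in a few lines: show workers random batches and ask how many items in each batch satisfy the predicate, then average the implied fractions. The helper ask_worker_for_count is a hypothetical stand-in for a posted crowd task, and the sketch omits the paper's spammer detection.

```python
# Sketch of count-based selectivity estimation: rather than labeling every
# item, show workers random batches, ask roughly how many items satisfy the
# predicate, and average the implied fractions. Illustrative only.
import random

def estimate_selectivity(items, ask_worker_for_count, batch_size=20, num_batches=10):
    """Estimate the fraction of items satisfying a predicate from batch counts."""
    fractions = []
    for _ in range(num_batches):
        batch = random.sample(items, min(batch_size, len(items)))
        reported = ask_worker_for_count(batch)      # worker's estimate for this batch
        fractions.append(reported / len(batch))
    return sum(fractions) / len(fractions)

# Toy simulation: a "worker" who counts exactly; real answers would be noisy.
items = ["tree"] * 300 + ["car"] * 700
est = estimate_selectivity(items, lambda batch: sum(1 for x in batch if x == "tree"))
print(round(est, 2))  # close to the true selectivity of 0.30
```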


2009 First International Workshop on Near Field Communication | 2009

Using NFC-Enabled Mobile Phones for Public Health in Developing Countries

Adam Marcus; Guido Davidzon; Denise Law; Namrata Verma; Rich Fletcher; Aamir J. Khan; Luis F. G. Sarmenta

One of the largest IT challenges in the health and medical fields is the ability to track large numbers of patients and materials. As mobile phone availability becomes ubiquitous around the world, the use of Near Field Communication (NFC) with mobile phones is emerging as a promising solution to this challenge. The decreasing price and increasing availability of mobile phones and NFC allows us to apply these technologies to developing countries in order to overcome patient identification and disease surveillance limitations, and permit improvements in data quality, patient referral, and emergency response. In this paper, we present a system using NFC-enabled mobile phones for facilitating the tracking and care of patients in a low-resource environment. While our system design has been inspired by the needs of an ongoing project in Karachi, Pakistan, we believe that it is easily generalizable and applicable for similar health and medical projects in other places where mobile service is available.


international conference on management of data | 2011

Tweets as data: demonstration of TweeQL and Twitinfo

Adam Marcus; Michael S. Bernstein; Osama Badar; David R. Karger; Samuel Madden; Robert C. Miller

Microblogs such as Twitter are a tremendous repository of user-generated content. Increasingly, we see tweets used as data sources for novel applications such as disaster mapping, brand sentiment analysis, and real-time visualizations. In each scenario, the workflow for processing tweets is ad-hoc, and a lot of unnecessary work goes into repeating common data processing patterns. We introduce TweeQL, a stream query processing language that presents a SQL-like query interface for unstructured tweets to generate structured data for downstream applications. We have built several tools on top of TweeQL, most notably TwitInfo, an event timeline generation and exploration interface that summarizes events as they are discussed on Twitter. Our demonstration will allow the audience to interact with both TweeQL and TwitInfo to convey the value of data embedded in tweets.
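A plain-Python sketch of the pattern a TweeQL-style pipeline imposes on a tweet stream (filter by a predicate, extract fields, aggregate per time window) may make the abstract more concrete. It illustrates only the general pattern and is not TweeQL or its syntax.

```python
# Sketch of the structure TweeQL-style processing imposes on a tweet stream:
# filter unstructured tweets by a keyword predicate and aggregate counts per
# fixed time window (a GROUP BY over the stream). Illustrative only.
from collections import Counter
from datetime import datetime

def windowed_keyword_counts(tweets, keyword, window_minutes=5):
    """Count matching tweets per fixed time window."""
    counts = Counter()
    for tweet in tweets:                 # each tweet: {"text": ..., "created_at": datetime}
        if keyword.lower() in tweet["text"].lower():
            ts = tweet["created_at"]
            bucket = ts.replace(minute=ts.minute - ts.minute % window_minutes,
                                second=0, microsecond=0)
            counts[bucket] += 1
    return counts

tweets = [
    {"text": "Goal! #worldcup", "created_at": datetime(2011, 6, 1, 12, 2)},
    {"text": "lunch time", "created_at": datetime(2011, 6, 1, 12, 3)},
    {"text": "another GOAL", "created_at": datetime(2011, 6, 1, 12, 7)},
]
print(windowed_keyword_counts(tweets, "goal"))
# one match in the 12:00 window, one in the 12:05 window
```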


international world wide web conferences | 2010

Sync kit: a persistent client-side database caching toolkit for data intensive websites

Edward Benson; Adam Marcus; David R. Karger; Samuel Madden

We introduce a client-server toolkit called Sync Kit that demonstrates how client-side database storage can improve the performance of data intensive websites. Sync Kit is designed to make use of the embedded relational database defined in the upcoming HTML5 standard to offload some data storage and processing from a web server onto the web browsers to which it serves content. Our toolkit provides various strategies for synchronizing relational database tables between the browser and the web server, along with a client-side template library so that portions of web applications may be executed client-side. Unlike prior work in this area, Sync Kit persists both templates and data in the browser across web sessions, increasing the number of concurrent connections a server can handle by up to a factor of four versus that of a traditional server-only web stack and a factor of three versus a recent template caching approach.
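The core synchronization idea can be sketched as a delta sync: the client remembers the highest row version it has cached, and the server returns only rows changed since then. The schema and function below are assumptions for illustration; Sync Kit itself targets the browser-side HTML5 database rather than a Python client.

```python
# Sketch of delta synchronization for a cached table: the client sends the
# highest row version it has seen, and the server returns only newer rows.
# Schema and function names are assumptions for this example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, version INTEGER)")
conn.executemany("INSERT INTO articles VALUES (?, ?, ?)",
                 [(1, "Old post", 3), (2, "Newer post", 7), (3, "Newest post", 9)])

def sync_articles(client_max_version: int):
    """Return only rows the client has not yet cached, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, title, version FROM articles WHERE version > ? ORDER BY version",
        (client_max_version,)).fetchall()
    new_max = max([client_max_version] + [v for _, _, v in rows])
    return rows, new_max                  # client stores rows and remembers new_max

rows, new_max = sync_articles(3)          # client last synced at version 3
print(rows, new_max)                      # -> [(2, 'Newer post', 7), (3, 'Newest post', 9)] 9
```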


very large data bases | 2015

Argonaut: macrotask crowdsourcing for complex data processing

Daniel Haas; Jason Ansel; Lydia Gu; Adam Marcus

Crowdsourced workflows are used in research and industry to solve a variety of tasks. The databases community has used crowd workers in query operators/optimization and for tasks such as entity resolution. Such research utilizes microtasks where crowd workers are asked to answer simple yes/no or multiple choice questions with little training. Typically, microtasks are used with voting algorithms to combine redundant responses from multiple crowd workers to achieve result quality. Microtasks are powerful, but fail in cases where larger context (e.g., domain knowledge) or significant time investment is needed to solve a problem, for example in large-document structured data extraction. In this paper, we consider context-heavy data processing tasks that may require many hours of work, and refer to such tasks as macrotasks. Leveraging the infrastructure and worker pools of existing crowdsourcing platforms, we automate macrotask scheduling, evaluation, and pay scales. A key challenge in macrotask-powered work, however, is evaluating the quality of a worker's output, since ground truth is seldom available and redundancy-based quality control schemes are impractical. We present Argonaut, a framework that improves macrotask-powered work quality using a hierarchical review. Argonaut uses a predictive model of worker quality to select trusted workers to perform review, and a separate predictive model of task quality to decide which tasks to review. Finally, Argonaut can identify the ideal trade-off between a single phase of review and multiple phases of review given a constrained review budget in order to maximize overall output quality. We evaluate an industrial use of Argonaut to power a structured data extraction pipeline that has utilized over half a million hours of crowd worker input to complete millions of macrotasks. We show that Argonaut can capture up to 118% more errors than random spot-check reviews in review budget-constrained environments with up to two review layers.
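The budget-constrained review selection described in the abstract can be sketched as ranking completed tasks by predicted error probability and routing the top ones to the most-trusted reviewers. The scoring functions and names below are stand-ins, not Argonaut's predictive models.

```python
# Sketch of budget-constrained review selection: given a predicted quality
# score per completed task and a limited review budget, send the tasks most
# likely to contain errors to the most-trusted reviewers. Illustrative only.

def select_tasks_for_review(tasks, predicted_error, review_budget):
    """Pick the tasks with the highest predicted error probability, up to budget."""
    ranked = sorted(tasks, key=lambda t: predicted_error(t), reverse=True)
    return ranked[:review_budget]

def assign_reviewers(tasks_to_review, workers, worker_quality):
    """Route each selected task to the currently most-trusted workers."""
    trusted = sorted(workers, key=worker_quality, reverse=True)
    return {task: trusted[i % len(trusted)] for i, task in enumerate(tasks_to_review)}

tasks = ["t1", "t2", "t3", "t4"]
error_scores = {"t1": 0.1, "t2": 0.7, "t3": 0.4, "t4": 0.9}
quality = {"alice": 0.95, "bob": 0.80}
to_review = select_tasks_for_review(tasks, error_scores.get, review_budget=2)
print(assign_reviewers(to_review, list(quality), quality.get))
# -> {'t4': 'alice', 't2': 'bob'}
```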


international conference on management of data | 2012

Processing and visualizing the data in tweets

Adam Marcus; Michael S. Bernstein; Osama Badar; David R. Karger; Samuel Madden; Robert C. Miller



international world wide web conferences | 2010

Talking about data: sharing richly structured information through blogs and wikis

Edward Benson; Adam Marcus; Fabian Howahl; David R. Karger


Collaboration


Dive into Adam Marcus's collaborations.

Top Co-Authors

Samuel Madden, Massachusetts Institute of Technology
David R. Karger, Massachusetts Institute of Technology
Robert C. Miller, Massachusetts Institute of Technology
Daniel Haas, University of California
Osama Badar, Massachusetts Institute of Technology
Denise Law, Massachusetts Institute of Technology