Dafna Shahaf | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Dafna Shahaf is active.

Explore More

Publication

Featured researches published by Dafna Shahaf.

knowledge discovery and data mining | 2010

Connecting the dots between news articles

Dafna Shahaf; Carlos Guestrin

The process of extracting useful knowledge from large datasets has become one of the most pressing problems in todays society. The problem spans entire sectors, from scientists to intelligence analysts and web users, all of whom are constantly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture. In this paper, we investigate methods for automatically connecting the dots -- providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events starting with the decline of home prices (January 2007), and ending with the ongoing health-care debate. We formalize the characteristics of a good chain and provide an efficient algorithm (with theoretical guarantees) to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. Finally, we evaluate our algorithm over real news data. Our user studies demonstrate the algorithms effectiveness in helping users understanding the news.

knowledge discovery and data mining | 2009

Turning down the noise in the blogosphere

Khalid El-Arini; Gaurav Veda; Dafna Shahaf; Carlos Guestrin

In recent years, the blogosphere has experienced a substantial increase in the number of posts published daily, forcing users to cope with information overload. The task of guiding users through this flood of information has thus become critical. To address this issue, we present a principled approach for picking a set of posts that best covers the important stories in the blogosphere. We define a simple and elegant notion of coverage and formalize it as a submodular optimization problem, for which we can efficiently compute a near-optimal solution. In addition, since people have varied interests, the ideal coverage algorithm should incorporate user preferences in order to tailor the selected posts to individual tastes. We define the problem of learning a personalized coverage function by providing an appropriate user-interaction model and formalizing an online learning framework for this task. We then provide a no-regret algorithm which can quickly learn a users preferences from limited feedback. We evaluate our coverage and personalization algorithms extensively over real blog data. Results from a user study show that our simple coverage algorithm does as well as most popular blog aggregation sites, including Google Blog Search, Yahoo! Buzz, and Digg. Furthermore, we demonstrate empirically that our algorithm can successfully adapt to user preferences. We believe that our technique, especially with personalization, can dramatically reduce information overload.

international world wide web conferences | 2012

Trains of thought: generating information maps

Dafna Shahaf; Carlos Guestrin; Eric Horvitz

When information is abundant, it becomes increasingly difficult to fit nuggets of knowledge into a single coherent picture. Complex stories spaghetti into branches, side stories, and intertwining narratives. In order to explore these stories, one needs a map to navigate unfamiliar territory. We propose a methodology for creating structured summaries of information, which we call metro maps. Our proposed algorithm generates a concise structured set of documents maximizing coverage of salient pieces of information. Most importantly, metro maps explicitly show the relations among retrieved pieces in a way that captures story development. We first formalize characteristics of good maps and formulate their construction as an optimization problem. Then we provide efficient methods with theoretical guarantees for generating maps. Finally, we integrate user interaction into our framework, allowing users to alter the maps to better reflect their interests. Pilot user studies with a real-world dataset demonstrate that the method is able to produce maps which help users acquire knowledge efficiently.

knowledge discovery and data mining | 2013

Information cartography: creating zoomable, large-scale maps of information

Dafna Shahaf; Jaewon Yang; Caroline Suen; Jeff Jacobs; Heidi Wang; Jure Leskovec

In an era of information overload, many people struggle to make sense of complex stories, such as presidential elections or economic reforms. We propose a methodology for creating structured summaries of information, which we call zoomable metro maps. Just as cartographic maps have been relied upon for centuries to help us understand our surroundings, metro maps can help us understand the information landscape. Given large collection of news documents our proposed algorithm generates a map of connections that explicitly captures story development. As different users might be interested in different levels of granularity, the maps are zoomable, with each level of zoom showing finer details and interactions. In this paper, we formalize characteristics of good zoomable maps and formulate their construction as an optimization problem. We provide efficient, scalable methods with theoretical guarantees for generating maps. Pilot user studies over real-world datasets demonstrate that our method helps users comprehend complex stories better than prior work.

knowledge discovery and data mining | 2012

Metro maps of science

Dafna Shahaf; Carlos Guestrin; Eric Horvitz

As the number of scientific publications soars, even the most enthusiastic reader can have trouble staying on top of the evolving literature. It is easy to focus on a narrow aspect of ones field and lose track of the big picture. Information overload is indeed a major challenge for scientists today, and is especially daunting for new investigators attempting to master a discipline and scientists who seek to cross disciplinary borders. In this paper, we propose metrics of influence, coverage and connectivity for scientific literature. We use these metrics to create structured summaries of information, which we call metro maps. Most importantly, metro maps explicitly show the relations between papers in a way which captures developments in the field. Pilot user studies demonstrate that our method helps researchers acquire new knowledge efficiently: map users achieved better precision and recall scores and found more seminal papers while performing fewer searches.

ACM Transactions on Knowledge Discovery From Data | 2012

Connecting Two (or Less) Dots: Discovering Structure in News Articles

Dafna Shahaf; Carlos Guestrin

Finding information is becoming a major part of our daily life. Entire sectors, from Web users to scientists and intelligence analysts, are increasingly struggling to keep up with the larger and larger amounts of content published every day. With this much data, it is often easy to miss the big picture. In this article, we investigate methods for automatically connecting the dots---providing a structured, easy way to navigate within a new topic and discover hidden connections. We focus on the news domain: given two news articles, our system automatically finds a coherent chain linking them together. For example, it can recover the chain of events starting with the decline of home prices (January 2007), and ending with the health care debate (2009). We formalize the characteristics of a good chain and provide a fast search-driven algorithm to connect two fixed endpoints. We incorporate user feedback into our framework, allowing the stories to be refined and personalized. We also provide a method to handle partially-specified endpoints, for users who do not know both ends of a story. Finally, we evaluate our algorithm over real news data. Our user studies demonstrate that the objective we propose captures the users’ intuitive notion of coherence, and that our algorithm effectively helps users understand the news.

knowledge discovery and data mining | 2015

Inside Jokes: Identifying Humorous Cartoon Captions

Dafna Shahaf; Eric Horvitz; Robert Mankoff

Humor is an integral aspect of the human experience. Motivated by the prospect of creating computational models of humor, we study the influence of the language of cartoon captions on the perceived humorousness of the cartoons. Our studies are based on a large corpus of crowdsourced cartoon captions that were submitted to a contest hosted by the New Yorker. Having access to thousands of captions submitted for the same image allows us to analyze the breadth of responses of people to the same visual stimulus. We first describe how we acquire judgments about the humorousness of different captions. Then, we detail the construction of a corpus where captions deemed funnier are paired with less-funny captions for the same cartoon. We analyze the caption pairs and find significant differences between the funnier and less-funny captions. Next, we build a classifier to identify funnier captions automatically. Given two captions and a cartoon, our classifier picks the funnier one 69% of the time for captions hinging on the same joke, and 64% of the time for any pair of captions. Finally, we use the classifier to find the best captions and study how its predictions could be used to significantly reduce the load on the cartoon contests judges.

Communications of The ACM | 2015

Information cartography

Dafna Shahaf; Carlos Guestrin; Eric Horvitz; Jure Leskovec

A metro map can tell a story, as well as provide good directions.

european conference on machine learning | 2016

Ballpark Learning: Estimating Labels from Rough Group Comparisons

Tom Hope; Dafna Shahaf

We are interested in estimating individual labels given only coarse, aggregated signal over the data points. In our setting, we receive sets (“bags”) of unlabeled instances with constraints on label proportions. We relax the unrealistic assumption of known label proportions, made in previous work; instead, we assume only to have upper and lower bounds, and constraints on bag differences. We motivate the problem, propose an intuitive formulation and algorithm, and apply our methods to real-world scenarios. Across several domains, we show how using only proportion constraints and no labeled examples, we can achieve surprisingly high accuracy. In particular, we demonstrate how to predict income level using rough stereotypes and how to perform sentiment analysis using very little information. We also apply our method to guide exploratory analysis, recovering geographical differences in twitter dialect.

ACM Sigweb Newsletter | 2013

Metro maps of information by Dafna Shahaf, Carlos Guestrin and Eric Horvitz, with Ching-man Au Yeung as coordinator

Dafna Shahaf; Carlos Guestrin; Eric Horvitz

When information is abundant, it becomes increasingly difficult to fit nuggets of knowledge into a single coherent picture. Complex stories spaghetti into branches, side stories, and intertwining narratives. In order to explore these stories, one needs a map to navigate unfamiliar territory. We have developed a methodology for creating structured summaries of information, which we call metro maps. Our algorithm generates a concise structured set of documents which maxi- mizes coverage of salient pieces of information. Most importantly, metro maps explicitly show the relations among retrieved pieces in a way that captures the evolution of a story. We first for- malize characteristics of good maps and formulate their construction as an optimization problem. Then, we provide efficient methods with theoretical guarantees for generating maps. Finally, we integrate capabilities for supporting user interaction into the framework, allowing users to guide the formulation of the maps so as to better re ect their interests. Pilot user studies with a real- world dataset demonstrate that the method is able to produce maps which help users to acquire knowledge efficiently.

Explore More