Fernando Diaz | Researchain

Publication

Featured research published by Fernando Diaz.

ACM Computing Surveys | 2015

Processing Social Media Messages in Mass Emergency: A Survey

Muhammad Imran; Carlos Castillo; Fernando Diaz; Sarah Vieweg

Social media platforms provide active communication channels during mass convergence and emergency events such as disasters caused by natural hazards. As a result, first responders, decision makers, and the public can use this information to gain insight into the situation as it unfolds. In particular, many social media messages communicated during emergencies convey timely, actionable information. Processing social media messages to obtain such information, however, involves solving multiple challenges including: parsing brief and informal messages, handling information overload, and prioritizing different types of information found in messages. These challenges can be mapped to classical information processing operations such as filtering, classifying, ranking, aggregating, extracting, and summarizing. We survey the state of the art regarding computational methods to process social media messages and highlight both their contributions and shortcomings. In addition, we examine their particularities, and methodically examine a series of key subproblems ranging from the detection of events to the creation of actionable and useful summaries. Research thus far has, to a large extent, produced methods to extract situational awareness information from social media. In this survey, we cover these various approaches, and highlight their benefits and shortcomings. We conclude with research challenges that go beyond situational awareness, and begin to look at supporting decision making and coordinating emergency-response actions.

International World Wide Web Conferences | 2013

Practical extraction of disaster-relevant information from social media

Muhammad Imran; Shady Elbassuoni; Carlos Castillo; Fernando Diaz; Patrick Meier

During times of disasters online users generate a significant amount of data, some of which are extremely valuable for relief efforts. In this paper, we study the nature of social-media content generated during two different natural disasters. We also train a model based on conditional random fields to extract valuable information from such content. We evaluate our techniques over our two datasets through a set of carefully designed experiments. We also test our methods over a non-disaster dataset to show that our extraction model is useful for extracting information from socially-generated content in general.

Meeting of the Association for Computational Linguistics | 2016

Query Expansion with Locally-Trained Word Embeddings

Fernando Diaz; Bhaskar Mitra; Nick Craswell

Continuous space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships. We study the use of term relatedness in the context of query expansion for ad hoc information retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when trained globally, underperform corpus and query specific embeddings for retrieval tasks. These results suggest that other tasks benefiting from global embeddings may also benefit from local embeddings.

Journal of Marketing Research | 2014

The Economic and Cognitive Costs of Annoying Display Advertisements

Daniel G. Goldstein; Siddharth Suri; R. Preston McAfee; Matthew Ekstrand-Abueg; Fernando Diaz

Some online display advertisements are annoying. Although publishers know the payment they receive to run annoying ads, little is known about the cost that such ads incur (e.g., causing website abandonment). Across three empirical studies, the authors address two primary questions: (1) What is the economic cost of annoying ads to publishers? and (2) What is the cognitive impact of annoying ads to users? First, the authors conduct a preliminary study to identify sets of more and less annoying ads. Second, in a field experiment, they calculate the compensating differential, that is, the amount of money a publisher would need to pay users to generate the same number of impressions in the presence of annoying ads as it would generate in their absence. Third, the authors conduct a mouse-tracking study to investigate how annoying ads affect reading processes. They conclude that in plausible scenarios, the practice of running annoying ads can cost more money than it earns.

European Conference on Information Retrieval | 2013

Updating users about time critical events

Qi Guo; Fernando Diaz; Elad Yom-Tov

During unexpected events such as natural disasters, individuals rely on the information generated by news outlets to form their understanding of these events. This information, while often voluminous, is frequently degraded by the inclusion of unimportant, duplicate, or wrong information. It is important to be able to present users with only the novel, important information about these events as they develop. We present the problem of updating users about time critical news events, and focus on the task of deciding which information to select for updating users as an event develops. We propose a solution to this problem which incorporates techniques from information retrieval and multi-document summarization and evaluate this approach on a set of historic events using a large stream of news documents. We also introduce an evaluation method which is significantly less expensive than traditional approaches to temporal summarization.

International Joint Conference on Natural Language Processing | 2015

Predicting Salient Updates for Disaster Summarization

Chris Kedzie; Kathleen R. McKeown; Fernando Diaz

During crises such as natural disasters or other human tragedies, information needs of both civilians and responders often require urgent, specialized treatment. Monitoring and summarizing a text stream during such an event remains a difficult problem. We present a system for update summarization which predicts the salience of sentences with respect to an event and then uses these predictions to directly bias a clustering algorithm for sentence selection, increasing the quality of the updates. We use novel, disaster-specific features for salience prediction, including geo-locations and language models representing the language of disaster. Our evaluation on a standard set of retrospective events using ROUGE shows that salience prediction provides a significant improvement over other approaches.

PLOS ONE | 2016

Online and Social Media Data As an Imperfect Continuous Panel Survey

Fernando Diaz; Michael Gamon; Jake M. Hofman; Emre Kiciman; David Rothschild

There is a large body of research on utilizing online activity as a survey of political opinion to predict real world election outcomes. There is considerably less work, however, on using this data to understand topic-specific interest and opinion amongst the general population and specific demographic subgroups, as currently measured by relatively expensive surveys. Here we investigate this possibility by studying a full census of all Twitter activity during the 2012 election cycle along with the comprehensive search history of a large panel of Internet users during the same period, highlighting the challenges in interpreting online and social media activity as the results of a survey. As noted in existing work, the online population is a non-representative sample of the offline world (e.g., the U.S. voting population). We extend this work to show how demographic skew and user participation are non-stationary and difficult to predict over time. In addition, the nature of user contributions varies substantially around important events. Furthermore, we note subtle problems in mapping what people are sharing or consuming online to specific sentiment or opinion measures around a particular topic. We provide a framework, built around considering this data as an imperfect continuous panel survey, for addressing these issues so that meaningful insight about public interest and opinion can be reliably extracted from online and social media data.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2016

Report on the SIGIR 2015 Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR)

Jaime Arguello; Matt Crane; Fernando Diaz; Jimmy J. Lin; Andrew Trotman

The SIGIR 2015 Workshop on Reproducibility, Inexplicability, and Generalizability of Results (RIGOR) took place on Thursday, August 13, 2015 in Santiago, Chile. The goal of the workshop was twofold. The first was to provide a venue for the publication and presentation of negative results. The second was to provide a venue through which the authors of open source search engines could compare performance of indexing and searching on the same collections and on the same machines, encouraging the sharing of ideas and discoveries in a like-to-like environment. In total, three papers were presented and seven systems participated.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2014

Mobile query reformulations

Milad Shokouhi; Rosie Jones; Umut Ozertem; Karthik Raghunathan; Fernando Diaz

Users frequently interact with web search systems on their mobile devices via multiple modalities, including touch and speech. These interaction modes are substantially different from the user experience on desktop search. As a result, system designers have new challenges and questions around understanding the intent on these platforms. In this paper, we study the query reformulation patterns in mobile logs. We group query reformulations based on their input method into four categories: text-text, text-voice, voice-text, and voice-voice. We discuss the unique characteristics of each of these groups by comparing them against each other and desktop logs. We also compare the distribution of reformulation types (e.g., adding/dropping words) against desktop logs and show that there are new classes of reformulations that are caused by errors in speech recognition. Our results suggest that users do not tend to switch between different input types (e.g., voice and text). Voice to text switches are largely caused by speech recognition errors, and text to voice switches are unlikely to be about the same intent.

Web Search and Data Mining | 2013

Temporal web dynamics and its application to information retrieval

Kira Radinsky; Fernando Diaz; Susan T. Dumais; Milad Shokouhi; Anlei Dong; Yi Chang

The World Wide Web is highly dynamic and is constantly evolving to cover the latest information about the physical and social updates in the world. At the same time, the changes in web contents are entangled with new information needs and time-sensitive user interactions with information sources. To address these temporal information needs effectively, it is essential for the search engines to model web dynamics and understand the changes in user behavior over time that are caused by them. In this full-day tutorial, we focus on modeling time-sensitive content on the web, and discuss the state-of-the-art approaches for integrating temporal signals in web search. We address many of the related research topics including those associated with searching dynamic collections, defining time-sensitive relevance, understanding user query behavior over time, and investigating the main reasons behind content changes. We also cover algorithms, architectures, evaluation methodologies and metrics for time-aware search, and discuss the latest breakthroughs and open challenges, both from the algorithmic and the architectural perspectives.
