Harr Chen
Massachusetts Institute of Technology
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Harr Chen.
international acm sigir conference on research and development in information retrieval | 2006
Harr Chen; David R. Karger
Traditionally, information retrieval systems aim to maximize thenumber of relevant documents returned to a user within some windowof the top. For that goal, the probability ranking principle, whichranks documents in decreasing order of probability of relevance, isprovably optimal. However, there are many scenarios in which thatranking does not optimize for the users information need. Oneexample is when the user would be satisfied with some limitednumber of relevant documents, rather than needing all relevantdocuments. We show that in such a scenario, an attempt to returnmany relevant documents can actually reduce the chances of findingany relevant documents. We consider a number of information retrieval metrics from theliterature, including the rank of the first relevant result, the%no metric that penalizes a system only for retrieving no relevantresults near the top, and the diversity of retrieved results whenqueries have multiple interpretations. We observe that given aprobabilistic model of relevance, it is appropriate to rank so asto directly optimize these metrics in expectation. While doing somay be computationally intractable, we show that a simple greedyoptimization algorithm that approximately optimizes the givenobjectives produces rankings for TREC queries that outperform thestandard approach based on the probability ranking principle.
international joint conference on natural language processing | 2009
Satchuthanan R. Branavan; Harr Chen; Luke Zettlemoyer; Regina Barzilay
In this paper, we present a reinforcement learning approach for mapping natural language instructions to sequences of executable actions. We assume access to a reward function that defines the quality of the executed actions. During training, the learner repeatedly constructs action sequences for a set of documents, executes those actions, and observes the resulting reward. We use a policy gradient algorithm to estimate the parameters of a log-linear model for action selection. We apply our method to interpret instructions in two domains --- Windows troubleshooting guides and game tutorials. Our results demonstrate that this method can rival supervised learning techniques while requiring few or no annotated training examples.
north american chapter of the association for computational linguistics | 2009
Harr Chen; Satchuthanan R. Branavan; Regina Barzilay; David R. Karger
We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be elegantly represented using a distribution over permutations called the generalized Mallows model. Our structure-aware approach substantially outperforms alternative approaches for cross-document comparison and single-document segmentation.
Journal of Artificial Intelligence Research | 2009
Harr Chen; Satchuthanan R. Branavan; Regina Barzilay; David R. Karger
We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.
IEEE Transactions on Knowledge and Data Engineering | 2011
Kuang Chen; Harr Chen; Neil Conway; Joseph M. Hellerstein; Tapan S. Parikh
Data quality is a critical problem in modern databases. Data entry forms present the first and arguably best opportunity for detecting and mitigating errors, but there has been little research into automatic methods for improving data quality at entry time. In this paper, we propose USHER, an end-to-end system for form design, entry, and data quality assurance. Using previous form submissions, USHER learns a probabilistic model over the questions of the form. USHER then applies this model at every step of the data entry process to improve data quality. Before entry, it induces a form layout that captures the most important data values of a form instance as quickly as possible. During entry, it dynamically adapts the form to the values being entered, and enables real-time feedback to guide the data enterer toward their intended values. After entry, it re-asks questions that it deems likely to have been entered incorrectly. We evaluate all three components of USHER using two real-world data sets. Our results demonstrate that each component has the potential to improve data quality considerably, at a reduced cost when compared to current practice.
international acm sigir conference on research and development in information retrieval | 2004
Raman Chandrasekar; Harr Chen; Simon Corston-Oliver; Eric D. Brill
We describe a method to define and use subwebs, user-defined neighborhoods of the Internet. Subwebs help improve search performance by inducing a topic-specific page relevance bias over a collection of documents. Subwebs may be automatically identified using a simple algorithm we describe, and used to provide highly-relevant topic-specific information retrieval. Using subwebs in a Help and Support topic, we see marked improvements in precision compared to generic search engine results.
Archive | 2004
Harr Chen; Raman Chandrasekar; Simon H. Corston; Eric D. Brill
empirical methods in natural language processing | 2010
Tahira Naseem; Harr Chen; Regina Barzilay; Mark Johnson
meeting of the association for computational linguistics | 2008
Satchuthanan R. Branavan; Harr Chen; Jacob Eisenstein; Regina Barzilay
Archive | 2004
Simon H. Corston; Raman Chandrasekar; Harr Chen