Harr Chen

Massachusetts Institute of Technology

Publication

Featured research published by Harr Chen.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2006

Less is more: probabilistic models for retrieving fewer relevant documents

Harr Chen; David R. Karger

Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some window of the top. For that goal, the probability ranking principle, which ranks documents in decreasing order of probability of relevance, is provably optimal. However, there are many scenarios in which that ranking does not optimize for the user's information need. One example is when the user would be satisfied with some limited number of relevant documents, rather than needing all relevant documents. We show that in such a scenario, an attempt to return many relevant documents can actually reduce the chances of finding any relevant documents. We consider a number of information retrieval metrics from the literature, including the rank of the first relevant result, the %no metric that penalizes a system only for retrieving no relevant results near the top, and the diversity of retrieved results when queries have multiple interpretations. We observe that given a probabilistic model of relevance, it is appropriate to rank so as to directly optimize these metrics in expectation. While doing so may be computationally intractable, we show that a simple greedy optimization algorithm that approximately optimizes the given objectives produces rankings for TREC queries that outperform the standard approach based on the probability ranking principle.
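The contrast between ranking by marginal relevance and greedily optimizing the chance of at least one relevant result can be illustrated with a toy sketch. It is not the authors' model: the interpretation prior, the per-interpretation relevance table, and the function names `greedy_rank`, `prp_rank`, and `p_any_relevant` are all assumptions made for illustration, and documents are assumed to be conditionally independent given the interpretation.

```python
from typing import Dict, List

Prior = Dict[str, float]             # P(interpretation) -- assumed toy input
Rel = Dict[str, Dict[str, float]]    # P(doc relevant | interpretation) -- assumed


def prp_rank(prior: Prior, rel: Rel, k: int) -> List[str]:
    """Rank by marginal probability of relevance (probability ranking principle)."""
    def marginal(doc: str) -> float:
        return sum(prior[t] * rel[doc].get(t, 0.0) for t in prior)
    return sorted(sorted(rel), key=marginal, reverse=True)[:k]


def greedy_rank(prior: Prior, rel: Rel, k: int) -> List[str]:
    """Greedily add the document that most raises P(at least one relevant)."""
    # miss[t]: probability that every document chosen so far is irrelevant
    # under interpretation t.
    miss = {t: 1.0 for t in prior}
    remaining = set(rel)
    ranking: List[str] = []
    for _ in range(min(k, len(rel))):
        def gain(doc: str) -> float:
            return sum(prior[t] * miss[t] * rel[doc].get(t, 0.0) for t in prior)
        best = max(sorted(remaining), key=gain)  # sorted() makes ties deterministic
        ranking.append(best)
        remaining.remove(best)
        for t in prior:
            miss[t] *= 1.0 - rel[best].get(t, 0.0)
    return ranking


def p_any_relevant(prior: Prior, rel: Rel, ranking: List[str]) -> float:
    """Probability that the ranking contains at least one relevant document."""
    total = 0.0
    for t, p_t in prior.items():
        all_miss = 1.0
        for doc in ranking:
            all_miss *= 1.0 - rel[doc].get(t, 0.0)
        total += p_t * (1.0 - all_miss)
    return total


# Hypothetical ambiguous query with two interpretations.
prior = {"car": 0.6, "animal": 0.4}
rel = {
    "car1": {"car": 0.9},
    "car2": {"car": 0.8},
    "cat1": {"animal": 0.7},
}
prp = prp_rank(prior, rel, 2)
greedy = greedy_rank(prior, rel, 2)
```

In this toy setup the marginal ranking returns two documents for the dominant interpretation, while the greedy ranking covers both interpretations and has a higher chance of containing at least one relevant document.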

International Joint Conference on Natural Language Processing | 2009

Reinforcement Learning for Mapping Instructions to Actions

Satchuthanan R. Branavan; Harr Chen; Luke Zettlemoyer; Regina Barzilay

In this paper, we present a reinforcement learning approach for mapping natural language instructions to sequences of executable actions. We assume access to a reward function that defines the quality of the executed actions. During training, the learner repeatedly constructs action sequences for a set of documents, executes those actions, and observes the resulting reward. We use a policy gradient algorithm to estimate the parameters of a log-linear model for action selection. We apply our method to interpret instructions in two domains: Windows troubleshooting guides and game tutorials. Our results demonstrate that this method can rival supervised learning techniques while requiring few or no annotated training examples.
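A policy gradient update for a log-linear action-selection model can be sketched as follows. This is a generic REINFORCE-style update, not the paper's implementation: the feature vectors, the learning rate, the single scalar reward per action sequence, and the function names are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Features = Sequence[Sequence[float]]  # one feature vector per candidate action


def action_probs(theta: Sequence[float], features: Features) -> List[float]:
    """Log-linear policy: p(a) is proportional to exp(theta . phi(a))."""
    scores = [sum(w * f for w, f in zip(theta, phi)) for phi in features]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def grad_log_prob(theta: Sequence[float], features: Features, chosen: int) -> List[float]:
    """Gradient of log p(chosen) w.r.t. theta: phi(chosen) - E_p[phi]."""
    probs = action_probs(theta, features)
    expected = [
        sum(p * phi[i] for p, phi in zip(probs, features))
        for i in range(len(theta))
    ]
    return [features[chosen][i] - expected[i] for i in range(len(theta))]


def reinforce_step(
    theta: Sequence[float],
    episode: Sequence[Tuple[Features, int]],
    reward: float,
    lr: float = 0.1,
) -> List[float]:
    """One update from a sampled action sequence and its observed reward.

    `episode` holds one (candidate features, chosen index) pair per decision.
    """
    new_theta = list(theta)
    for features, chosen in episode:
        grad = grad_log_prob(theta, features, chosen)
        for i, g in enumerate(grad):
            new_theta[i] += lr * reward * g
    return new_theta


# Hypothetical single-decision episode with two candidate actions.
candidates = [[1.0, 0.0], [0.0, 1.0]]
theta0 = [0.0, 0.0]
theta1 = reinforce_step(theta0, [(candidates, 0)], reward=1.0)
```

With a positive reward, the update shifts probability mass toward the action that was actually taken; a negative reward would shift it away.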

North American Chapter of the Association for Computational Linguistics | 2009

Global Models of Document Structure using Latent Permutations

Harr Chen; Satchuthanan R. Branavan; Regina Barzilay; David R. Karger

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be elegantly represented using a distribution over permutations called the generalized Mallows model. Our structure-aware approach substantially outperforms alternative approaches for cross-document comparison and single-document segmentation.

Journal of Artificial Intelligence Research | 2009

Content modeling using latent permutations

Harr Chen; Satchuthanan R. Branavan; Regina Barzilay; David R. Karger

We present a novel Bayesian topic model for learning discourse-level document structure. Our model leverages insights from discourse theory to constrain latent topic assignments in a way that reflects the underlying organization of document topics. We propose a global model in which both topic selection and ordering are biased to be similar across a collection of related documents. We show that this space of orderings can be effectively represented using a distribution over permutations called the Generalized Mallows Model. We apply our method to three complementary discourse-level tasks: cross-document alignment, document segmentation, and information ordering. Our experiments show that incorporating our permutation-based model in these applications yields substantial improvements in performance over previously proposed methods.
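A distribution over permutations of this kind can be sketched using the standard inversion-count parameterization of the generalized Mallows model. The abstract does not spell out the parameterization, so the form below is an assumption: relative to a canonical ordering 0..K-1, `v[j]` counts the items greater than `j` that appear before `j`, and each count is penalized independently by a dispersion parameter `rho[j]`. All function names are illustrative.

```python
import math
import random
from typing import List, Sequence


def inversions_to_permutation(v: Sequence[int]) -> List[int]:
    """Rebuild the permutation of 0..K-1 from its K-1 inversion counts."""
    k = len(v) + 1
    perm = [k - 1]
    # Insert items from largest to smallest; the list holds only larger items,
    # so inserting j at index v[j] puts exactly v[j] larger items before it.
    for j in range(k - 2, -1, -1):
        perm.insert(v[j], j)
    return perm


def permutation_to_inversions(perm: Sequence[int]) -> List[int]:
    """v[j] = number of items greater than j that precede j in perm."""
    k = len(perm)
    pos = {item: i for i, item in enumerate(perm)}
    return [
        sum(1 for bigger in range(j + 1, k) if pos[bigger] < pos[j])
        for j in range(k - 1)
    ]


def log_prob(v: Sequence[int], rho: Sequence[float]) -> float:
    """log P(v | rho) = sum_j [ -rho_j * v_j - log psi_j(rho_j) ]."""
    k = len(v) + 1
    total = 0.0
    for j, (vj, rj) in enumerate(zip(v, rho)):
        # v[j] ranges over 0..K-1-j, so normalize over exactly those values.
        log_norm = math.log(sum(math.exp(-rj * r) for r in range(k - j)))
        total += -rj * vj - log_norm
    return total


def sample_inversions(rho: Sequence[float], rng: random.Random) -> List[int]:
    """Draw each inversion count independently from its truncated distribution."""
    k = len(rho) + 1
    v = []
    for j, rj in enumerate(rho):
        weights = [math.exp(-rj * r) for r in range(k - j)]
        v.append(rng.choices(range(k - j), weights=weights)[0])
    return v


rho = [0.5, 1.0, 2.0]  # hypothetical dispersion parameters for K = 4 topics
ordering = inversions_to_permutation(sample_inversions(rho, random.Random(0)))
```

Under this parameterization, positive dispersion parameters make the canonical ordering the most probable one, with probability decaying as an ordering accumulates inversions, which matches the idea of orderings biased to be similar across related documents.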

IEEE Transactions on Knowledge and Data Engineering | 2011

Usher: Improving Data Quality with Dynamic Forms

Kuang Chen; Harr Chen; Neil Conway; Joseph M. Hellerstein; Tapan S. Parikh

Data quality is a critical problem in modern databases. Data entry forms present the first and arguably best opportunity for detecting and mitigating errors, but there has been little research into automatic methods for improving data quality at entry time. In this paper, we propose USHER, an end-to-end system for form design, entry, and data quality assurance. Using previous form submissions, USHER learns a probabilistic model over the questions of the form. USHER then applies this model at every step of the data entry process to improve data quality. Before entry, it induces a form layout that captures the most important data values of a form instance as quickly as possible. During entry, it dynamically adapts the form to the values being entered, and enables real-time feedback to guide the data enterer toward their intended values. After entry, it re-asks questions that it deems likely to have been entered incorrectly. We evaluate all three components of USHER using two real-world data sets. Our results demonstrate that each component has the potential to improve data quality considerably, at a reduced cost when compared to current practice.

International ACM SIGIR Conference on Research and Development in Information Retrieval | 2004

Subwebs for specialized search

Raman Chandrasekar; Harr Chen; Simon Corston-Oliver; Eric D. Brill

We describe a method to define and use subwebs, user-defined neighborhoods of the Internet. Subwebs help improve search performance by inducing a topic-specific page relevance bias over a collection of documents. Subwebs may be automatically identified using a simple algorithm we describe, and used to provide highly-relevant topic-specific information retrieval. Using subwebs in a Help and Support topic, we see marked improvements in precision compared to generic search engine results.

Archive | 2004