Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Eugene Charniak is active.

Publication


Featured research published by Eugene Charniak.


Meeting of the Association for Computational Linguistics | 2005

Coarse-to-Fine n-Best Parsing and MaxEnt Discriminative Reranking

Eugene Charniak; Mark Johnson

Discriminative reranking is one method for constructing high-performance statistical parsers (Collins, 2000). A discriminative reranker requires a source of candidate parses for each sentence. This paper describes a simple yet novel method for constructing sets of 50-best parses based on a coarse-to-fine generative parser (Charniak, 2000). This method generates 50-best lists that are of substantially higher quality than previously obtainable. We used these parses as the input to a MaxEnt reranker (Johnson et al., 1999; Riezler et al., 2002) that selects the best parse from the set of parses for each sentence, obtaining an f-score of 91.0% on sentences of length 100 or less.
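As a rough illustration of the reranking step (not the authors' code: the candidate trees, feature names, and weights below are all invented), a MaxEnt-style discriminative reranker scores each parse in the n-best list with a linear model over features and returns the argmax:

```python
def rerank(candidates, weights):
    """Return the candidate parse whose linear feature score is highest."""
    def score(parse):
        return sum(weights.get(name, 0.0) * value
                   for name, value in parse["features"].items())
    return max(candidates, key=score)

# Hypothetical 3-best list for one sentence; each candidate carries the
# generative model's log-probability plus one extra reranker feature.
candidates = [
    {"tree": "(S (NP ...) (VP ...))",  "features": {"log_p": -20.1, "right_branch": 3}},
    {"tree": "(S (NP ...) (VP' ...))", "features": {"log_p": -19.8, "right_branch": 1}},
    {"tree": "(S (VP ...))",           "features": {"log_p": -22.5, "right_branch": 5}},
]
weights = {"log_p": 1.0, "right_branch": 0.2}  # invented weights

best = rerank(candidates, weights)  # first candidate wins (score -19.5)
```

The key point the abstract makes is upstream of this step: the reranker can only be as good as the 50-best lists the coarse-to-fine generative parser feeds it.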


Artificial Intelligence | 1993

A Bayesian model of plan recognition

Eugene Charniak; Robert P. Goldman

We argue that the problem of plan recognition, inferring an agent's plan from observations, is largely a problem of inference under conditions of uncertainty. We present an approach to the plan recognition problem that is based on Bayesian probability theory. In attempting to solve a plan recognition problem we first retrieve candidate explanations. These explanations (sometimes only the most promising ones) are assembled into a plan recognition Bayesian network, which is a representation of a probability distribution over the set of possible explanations. We perform Bayesian updating to choose the most likely interpretation for the set of observed actions. This approach has been implemented in the Wimp3 system for natural language story understanding.
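The Bayesian updating step can be sketched in miniature (a deliberately simplified stand-in for the paper's Bayesian networks; the plans, observations, and probabilities below are invented): given a prior over candidate plans and per-plan likelihoods for observed actions, the posterior over explanations is proportional to prior times likelihood.

```python
def posterior(prior, likelihood, observations):
    """P(plan | observations) proportional to P(plan) * prod P(obs | plan)."""
    scores = {}
    for plan, p in prior.items():
        for obs in observations:
            p *= likelihood[plan].get(obs, 1e-6)  # tiny floor for unseen obs
        scores[plan] = p
    z = sum(scores.values())                      # normalizing constant
    return {plan: s / z for plan, s in scores.items()}

prior = {"go-shopping": 0.6, "rob-store": 0.4}    # invented prior
likelihood = {
    "go-shopping": {"enter-store": 0.9, "get-gun": 0.01},
    "rob-store":   {"enter-store": 0.9, "get-gun": 0.8},
}
post = posterior(prior, likelihood, ["enter-store", "get-gun"])
# observing "get-gun" shifts the posterior sharply toward "rob-store"
```

A full plan-recognition network also encodes structure among subactions; this sketch only shows the inference-under-uncertainty core of the argument.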


Language and Technology Conference | 2006

Effective Self-Training for Parsing

David McClosky; Eugene Charniak; Mark Johnson

We present a simple, but surprisingly effective, method of self-training a two-phase parser-reranker system using readily available unlabeled data. We show that this type of bootstrapping is possible for parsing when the bootstrapped parses are processed by a discriminative reranker. Our improved model achieves an f-score of 92.1%, an absolute 1.1% improvement (12% error reduction) over the previous best result for Wall Street Journal parsing. Finally, we provide some analysis to better understand the phenomenon.
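The self-training loop described above can be sketched as follows (a toy sketch, not the authors' system: the parser and reranker classes here are trivial stand-ins with invented interfaces): train the first-stage parser on labeled data, parse unlabeled sentences, let the reranker pick one parse per sentence, and retrain on the union.

```python
class ToyParser:
    """Stand-in first-stage parser: memorizes (sentence, parse) pairs and
    falls back to a dummy flat parse for unseen sentences."""
    def __init__(self):
        self.table = {}
    def train(self, pairs):
        self.table = dict(pairs)
    def parse_nbest(self, sentence):
        return [self.table.get(sentence, "(FLAT)"), "(FLAT)"]

class ToyReranker:
    """Stand-in reranker: just takes the top of the n-best list."""
    def best(self, nbest):
        return nbest[0]

def self_train(parser, reranker, labeled, unlabeled):
    parser.train(labeled)                                 # phase 1: gold data
    pseudo = [(s, reranker.best(parser.parse_nbest(s)))   # parse unlabeled text
              for s in unlabeled]
    parser.train(labeled + pseudo)                        # retrain on the union
    return parser

p = self_train(ToyParser(), ToyReranker(),
               labeled=[("dogs bark", "(S (NP dogs) (VP bark))")],
               unlabeled=["cats meow"])
```

The paper's finding is that this loop, which usually fails for plain generative parsers, works once the reranker filters the bootstrapped parses.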


AI Magazine | 1997

Statistical Techniques for Natural Language Parsing

Eugene Charniak

I review current statistical work on syntactic parsing and then consider part-of-speech tagging, which was the first syntactic problem to be successfully attacked by statistical techniques and also serves as a good warm-up for the main topic: statistical parsing. Here, I consider both the simplified case in which the input string is viewed as a string of parts of speech and the more interesting case in which the parser is guided by statistical information about the particular words in the sentence. Finally, I anticipate future research directions.


Cognitive Science | 1983

Passing Markers: A Theory of Contextual Influence in Language Comprehension

Eugene Charniak

Most Artificial Intelligence theories of language either assume a syntactic component which serves as “front end” for the rest of the system, or else reject all attempts at distinguishing modules within the comprehension system. In this paper we will present an alternative which, while keeping modularity, will account for several puzzles for typical “syntax first” theories. The major addition to this theory is a “marker passing” (or “spreading activation”) component, which operates in parallel to the normal syntactic component.


Meeting of the Association for Computational Linguistics | 2005

Supervised and Unsupervised Learning for Sentence Compression

Jenine Turner; Eugene Charniak

In Statistics-Based Summarization - Step One: Sentence Compression, Knight and Marcu (2000) (K&M) use a corpus of 1035 training sentences. More data is not easily available, so in addition to improving the original K&M noisy-channel model, we create unsupervised and semi-supervised models of the task. Finally, we point out problems with modeling the task in this way; these suggest areas for future research.


International Joint Conference on Natural Language Processing | 2005

Parsing biomedical literature

Matthew Lease; Eugene Charniak

We present a preliminary study of several parser adaptation techniques evaluated on the GENIA corpus of MEDLINE abstracts [1,2]. We begin by observing that the Penn Treebank (PTB) is lexically impoverished when measured on various genres of scientific and technical writing, and that this significantly impacts parse accuracy. To resolve this without requiring in-domain treebank data, we show how existing domain-specific lexical resources may be leveraged to augment PTB-training: part-of-speech tags, dictionary collocations, and named-entities. Using a state-of-the-art statistical parser [3] as our baseline, our lexically-adapted parser achieves a 14.2% reduction in error. With oracle-knowledge of named-entities, this error reduction improves to 21.2%.


North American Chapter of the Association for Computational Linguistics | 2001

Edit detection and parsing for transcribed speech

Eugene Charniak; Mark Johnson

We present a simple architecture for parsing transcribed speech in which an edited-word detector first removes such words from the sentence string, and then a standard statistical parser trained on transcribed speech parses the remaining words. The edit detector achieves a misclassification rate on edited words of 2.2%. (The NULL-model, which marks everything as not edited, has an error rate of 5.9%.) To evaluate our parsing results we introduce a new evaluation metric, the purpose of which is to make evaluation of a parse tree relatively indifferent to the exact tree position of EDITED nodes. By this metric the parser achieves 85.3% precision and 86.5% recall.
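The two-stage architecture is easy to picture (a minimal sketch, not the paper's trained detector: the example sentence and the detected span are invented): the detector marks a span of edited words, those positions are removed, and the cleaned string goes to an ordinary statistical parser.

```python
def remove_edits(words, edited_spans):
    """Drop word positions covered by detected EDITED spans (half-open
    (start, end) index ranges); keep the remaining words in order."""
    edited = {i for start, end in edited_spans for i in range(start, end)}
    return [w for i, w in enumerate(words) if i not in edited]

words = "show flights to Boston uh I mean to Denver".split()
# Pretend the detector marked "to Boston uh I mean" (positions 2-6) as edited.
clean = remove_edits(words, [(2, 7)])
# clean == ['show', 'flights', 'to', 'Denver'], ready for a standard parser
```

The paper's EDITED-indifferent evaluation metric then compares the parser's tree against the gold tree without penalizing where exactly the EDITED material attaches.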


Artificial Intelligence | 1994

Cost-based abduction and MAP explanation

Eugene Charniak; Solomon Eyal Shimony

Cost-based abduction attempts to find the best explanation for a set of facts by finding a minimal cost proof for the facts. The costs are computed by summing the costs of the assumptions necessary for the proof plus the cost of the rules. We examine existing methods for constructing explanations (proofs) as a minimization problem on a DAG (directed acyclic graph). We then define a probabilistic semantics for the costs, and prove the equivalence of the cost minimization problem to the Bayesian network MAP (maximum a posteriori probability) solution of the system. A simple best-first algorithm for finding least-cost proofs is presented, and possible improvements are suggested. The semantics of cost-based abduction for complete models are then generalized to handle negation. This, in turn, allows us to apply the best-first search algorithm as a novel way of computing MAP assignments to belief networks, one that can enumerate assignments in order of decreasing probability. An important point is that improvement results for the best-first search algorithm carry over to the computation of MAPs.
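A least-cost proof search of the kind the abstract describes can be sketched as uniform-cost (best-first) search over partial proofs (an illustrative toy, not the paper's algorithm: the rules, goals, and costs below are invented). Each open subgoal is either assumed at its assumption cost or rewritten by a rule at that rule's cost:

```python
import heapq

assume_cost = {"rain": 3.0, "sprinkler": 1.0}   # cost of assuming each atom
rules = {                                        # head: list of (rule cost, subgoals)
    "wet_grass": [(0.5, ("rain",)), (0.5, ("sprinkler",))],
}

def least_cost_proof(goal):
    """Best-first search: pop the cheapest partial proof; a proof is
    complete when its list of open subgoals is empty."""
    frontier = [(0.0, [goal])]                   # (cost so far, open subgoals)
    while frontier:
        cost, goals = heapq.heappop(frontier)
        if not goals:
            return cost                          # all subgoals discharged
        g, rest = goals[0], goals[1:]
        if g in assume_cost:                     # option 1: assume the atom
            heapq.heappush(frontier, (cost + assume_cost[g], rest))
        for rcost, subgoals in rules.get(g, []): # option 2: apply a rule
            heapq.heappush(frontier, (cost + rcost, list(subgoals) + rest))
    return None                                  # no proof exists

# Explaining "wet_grass" via the sprinkler (0.5 + 1.0) beats rain (0.5 + 3.0).
```

Under the paper's probabilistic semantics, costs behave like negative log probabilities, which is why popping proofs in increasing cost order enumerates MAP assignments in decreasing probability order.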


Meeting of the Association for Computational Linguistics | 2002

Entropy Rate Constancy in Text

Eugene Charniak

We present a constancy rate principle governing language generation. We show that this principle implies that local measures of entropy (ignoring context) should increase with the sentence number. We demonstrate that this is indeed the case by measuring entropy in three different ways. We also show that this effect has both lexical (which words are used) and non-lexical (how the words are used) causes.
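One context-ignoring measure of the kind the abstract mentions can be sketched as per-word unigram log-loss tracked by sentence position (our own simplification, not the paper's exact estimator; the two-sentence corpus is invented):

```python
import math
from collections import Counter

def unigram_entropy(sentence, counts, total):
    """Average negative log2 word probability, ignoring all context."""
    return -sum(math.log2(counts[w] / total) for w in sentence) / len(sentence)

corpus = [["the", "cat", "sat"],
          ["the", "cat", "purred", "loudly"]]
counts = Counter(w for sent in corpus for w in sent)
total = sum(counts.values())

# Local entropy of each sentence, in corpus order; under the constancy
# rate hypothesis this context-free measure should rise with position.
per_sentence = [unigram_entropy(s, counts, total) for s in corpus]
```

Here the later sentence scores higher simply because it contains more rare words; the paper's claim is that real text shows this rise systematically, for both lexical and non-lexical reasons.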

Collaboration


Dive into Eugene Charniak's collaborations.

Top Co-Authors

Matthew Lease

University of Texas at Austin