Featured Research

Computation And Language

An Empirical Evaluation of Probabilistic Lexicalized Tree Insertion Grammars

We present an empirical study of the applicability of Probabilistic Lexicalized Tree Insertion Grammars (PLTIG), a lexicalized counterpart to Probabilistic Context-Free Grammars (PCFG), to problems in stochastic natural-language processing. Comparing the performance of PLTIGs with non-hierarchical N-gram models and PCFGs, we show that PLTIGs combine the best aspects of both, with language modeling capability comparable to N-grams and improved parsing performance over their non-lexicalized counterpart. Furthermore, training of PLTIGs converges faster than that of PCFGs.

Read more
Computation And Language

An Empirical Investigation of Proposals in Collaborative Dialogues

We describe a corpus-based investigation of proposals in dialogue. First, we describe our DRI compliant coding scheme and report our inter-coder reliability results. Next, we test several hypotheses about what constitutes a well-formed proposal.

Read more
Computation And Language

An Empirical Study of Smoothing Techniques for Language Modeling

We present an extensive empirical comparison of several smoothing techniques in the domain of language modeling, including those described by Jelinek and Mercer (1980), Katz (1987), and Church and Gale (1991). We investigate for the first time how factors such as training data size, corpus (e.g., Brown versus Wall Street Journal), and n-gram order (bigram versus trigram) affect the relative performance of these methods, which we measure through the cross-entropy of test data. In addition, we introduce two novel smoothing techniques, one a variation of Jelinek-Mercer smoothing and one a very simple linear interpolation technique, both of which outperform existing methods.
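
To make the interpolation idea concrete, the following is a minimal sketch of linearly interpolated bigram smoothing evaluated by cross-entropy, in the spirit of Jelinek-Mercer smoothing. The fixed interpolation weights, the uniform back-off term, and the toy corpora are illustrative assumptions; the paper's techniques estimate the interpolation weights from held-out data rather than fixing them by hand.

from collections import Counter
import math

def train_counts(tokens):
    """Collect unigram and bigram counts from a training token sequence."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def interp_prob(w, prev, unigrams, bigrams, total, vocab_size,
                lam_bi=0.6, lam_uni=0.3):
    """Interpolated bigram probability:
    lam_bi * P_ML(w | prev) + lam_uni * P_ML(w) + remaining mass * 1/V.
    The uniform term keeps unseen events from receiving zero probability."""
    lam_unk = 1.0 - lam_bi - lam_uni
    p_bi = bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
    p_uni = unigrams[w] / total
    return lam_bi * p_bi + lam_uni * p_uni + lam_unk / vocab_size

def cross_entropy(test_tokens, unigrams, bigrams, total, vocab_size):
    """Per-word cross-entropy (in bits) of the test data under the model."""
    log_sum, n = 0.0, 0
    for prev, w in zip(test_tokens, test_tokens[1:]):
        p = interp_prob(w, prev, unigrams, bigrams, total, vocab_size)
        log_sum -= math.log2(p)
        n += 1
    return log_sum / n

train = "the cat sat on the mat".split()   # toy stand-in for a training corpus
test = "the cat sat on the rug".split()    # toy stand-in for test data
unigrams, bigrams = train_counts(train)
vocab = set(train) | set(test)
h = cross_entropy(test, unigrams, bigrams, len(train), len(vocab))
print(f"cross-entropy: {h:.2f} bits/word, perplexity: {2 ** h:.1f}")

Lower cross-entropy on held-out text is the criterion used to rank smoothing methods; the same harness can be rerun while varying the training-data size, the corpus, or the n-gram order.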

Read more
Computation And Language

An Information Extraction Core System for Real World German Text Processing

This paper describes SMES, an information extraction core system for real world German text processing. The basic design criterion of the system is to provide a set of basic, powerful, robust, and efficient natural language components and generic linguistic knowledge sources that can easily be customized for different processing tasks in a flexible manner.

Read more
Computation And Language

An Information Structural Approach to Spoken Language Generation

This paper presents an architecture for the generation of spoken monologues with contextually appropriate intonation. A two-tiered information structure representation is used in the high-level content planning and sentence planning stages of generation to produce efficient, coherent speech that makes certain discourse relationships, such as explicit contrasts, appropriately salient. The system is able to produce appropriate intonational patterns that cannot be generated by other systems which rely solely on word class and given/new distinctions.

Read more
Computation And Language

An Investigation of Transformation-Based Learning in Discourse

This paper presents results from the first attempt to apply Transformation-Based Learning to a discourse-level Natural Language Processing task. To address two limitations of the standard algorithm, we developed a Monte Carlo version of Transformation-Based Learning to make the method tractable for a wider range of problems without degradation in accuracy, and we devised a committee method for assigning confidence measures to tags produced by Transformation-Based Learning. The paper describes these advances, presents experimental evidence that Transformation-Based Learning is as effective as alternative approaches (such as Decision Trees and N-Grams) for a discourse task called Dialogue Act Tagging, and argues that Transformation-Based Learning has desirable features that make it particularly appealing for the Dialogue Act Tagging task.
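
As a rough illustration of the Monte Carlo idea, here is a minimal sketch of a greedy Transformation-Based Learning loop for dialogue act tagging that samples candidate rules instead of enumerating them. The single "contains word w, retag as t" rule template, the sample size, the default tag, and the toy data are simplifying assumptions for illustration; they are not the paper's rule templates, committee-based confidence method, or evaluation setup.

import random

def baseline_tags(utterances, default="STATEMENT"):
    """Start every utterance with a single default (most frequent) dialogue act."""
    return [default] * len(utterances)

def candidate_rules(utterances, tags, gold, sample_size=200):
    """Monte Carlo step: instead of enumerating every possible transformation,
    draw a random sample of rules of the form
    'if the utterance contains word w, retag it as t', seeded by current errors."""
    errors = [i for i, (t, g) in enumerate(zip(tags, gold)) if t != g]
    sampled = random.sample(errors, min(sample_size, len(errors)))
    return {(random.choice(utterances[i].split()), gold[i]) for i in sampled}

def apply_rule(rule, utterances, tags):
    """Apply one transformation to every utterance it matches."""
    word, new_tag = rule
    return [new_tag if word in u.split() else t for u, t in zip(utterances, tags)]

def tbl_train(utterances, gold, max_rules=10):
    """Greedy TBL loop: keep the sampled rule that most reduces training error,
    apply it, and repeat until no sampled rule helps."""
    tags, learned = baseline_tags(utterances), []
    for _ in range(max_rules):
        base_err = sum(t != g for t, g in zip(tags, gold))
        best, best_gain = None, 0
        for rule in candidate_rules(utterances, tags, gold):
            new_tags = apply_rule(rule, utterances, tags)
            gain = base_err - sum(t != g for t, g in zip(new_tags, gold))
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:
            break
        tags = apply_rule(best, utterances, tags)
        learned.append(best)
    return learned

# Hypothetical toy data: utterances paired with gold dialogue act tags.
utterances = ["do you want coffee", "yes please", "close the door", "okay"]
gold = ["QUESTION", "ANSWER", "DIRECTIVE", "ANSWER"]
print(tbl_train(utterances, gold))

Sampling the rule space at each iteration is what keeps the search tractable for larger problems; the committee idea described in the abstract would train several such learners and use their agreement on a tag as a confidence measure.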

Read more
Computation And Language

An Iterative Algorithm to Build Chinese Language Models

We present an iterative procedure to build a Chinese language model (LM). We segment Chinese text into words based on a word-based Chinese language model. However, the construction of a Chinese LM itself requires word boundaries. To get out of this chicken-and-egg problem, we propose an iterative procedure that alternates two operations: segmenting text into words and building an LM. Starting with an initial segmented corpus and an LM based upon it, we use a Viterbi-like algorithm to segment another set of data. Then we build an LM based on the second set and use the resulting LM to re-segment the first corpus. The alternating procedure provides a self-organized way for the segmenter to automatically detect unseen words and correct segmentation errors. Our preliminary experiment shows that the alternating procedure not only improves the accuracy of our segmentation but also discovers unseen words surprisingly well. The resulting word-based LM has a perplexity of 188 on a general Chinese corpus.
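
Below is a minimal sketch of the alternating procedure, assuming a unigram word LM and a simple Viterbi-style dynamic program over character positions. The probability floor for unknown characters, the maximum word length, the seed vocabulary, and the corpus splitting are illustrative assumptions rather than details from the paper.

import math
from collections import Counter

def viterbi_segment(chars, word_probs, max_len=4):
    """Viterbi-like segmentation: dynamic programming over character positions,
    choosing the word sequence with the highest probability under a unigram
    word LM. Unknown single characters get a small floor probability so the
    lattice never becomes disconnected."""
    n = len(chars)
    best = [0.0] + [float("-inf")] * n   # best log-prob of a segmentation ending at i
    back = [0] * (n + 1)                 # backpointer: start index of the last word
    floor = math.log(1e-8)
    for i in range(1, n + 1):
        for j in range(max(0, i - max_len), i):
            word = chars[j:i]
            logp = math.log(word_probs[word]) if word in word_probs else (
                floor if i - j == 1 else float("-inf"))
            if best[j] + logp > best[i]:
                best[i], back[i] = best[j] + logp, j
    words, i = [], n
    while i > 0:
        words.append(chars[back[i]:i])
        i = back[i]
    return list(reversed(words))

def estimate_lm(segmented_corpus):
    """Re-estimate unigram word probabilities from a segmented corpus."""
    counts = Counter(w for sent in segmented_corpus for w in sent)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def alternate(corpus_a, corpus_b, init_probs, rounds=3):
    """Alternate between segmenting one corpus and rebuilding the LM from the
    result, then swap corpora, as in the iterative procedure above."""
    probs = init_probs
    for _ in range(rounds):
        seg_b = [viterbi_segment(s, probs) for s in corpus_b]
        probs = estimate_lm(seg_b)
        seg_a = [viterbi_segment(s, probs) for s in corpus_a]
        probs = estimate_lm(seg_a)
    return probs

# Hypothetical toy setup: sentences as raw character strings and a seed vocabulary.
corpus_a = ["我喜欢学习语言模型"]
corpus_b = ["语言模型需要分词"]
seed = {"我": 0.1, "喜欢": 0.2, "学习": 0.2, "语言": 0.2, "模型": 0.2, "分词": 0.1}
print(viterbi_segment(corpus_b[0], seed))

Each round re-estimates the LM from the freshly segmented text, so strings that are consistently carved out of running text enter the vocabulary automatically, which is how such a loop can pick up unseen words.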

Read more
Computation And Language

Anchoring a Lexicalized Tree-Adjoining Grammar for Discourse

We here explore a "fully" lexicalized Tree-Adjoining Grammar for discourse that takes the basic elements of a (monologic) discourse to be not simply clauses, but larger structures that are anchored on variously realized discourse cues. This link with intra-sentential grammar suggests an account for different patterns of discourse cues, while the different structures and operations suggest three separate sources for elements of discourse meaning: (1) a compositional semantics tied to the basic trees and operations; (2) a presuppositional semantics carried by cue phrases that freely adjoin to trees; and (3) general inference, which draws additional, defeasible conclusions that flesh out what is conveyed compositionally.

Read more
Computation And Language

Annotation Style Guide for the Blinker Project

This annotation style guide was created by and for the Blinker project at the University of Pennsylvania. The Blinker project was named after the "bilingual linker" GUI, which was created to enable bilingual annotators to "link" word tokens that are mutual translations in parallel texts. The parallel text chosen for this project was the Bible, because it is probably the easiest text to obtain in electronic form in multiple languages. The languages involved were English and French, because, of the languages with which the project co-ordinator was familiar, these were the two for which a sufficient number of annotators was likely to be found.

Read more
Computation And Language

Another Facet of LIG Parsing

In this paper we present a new parsing algorithm for linear indexed grammars (LIGs), in the same spirit as the one described in (Vijay-Shanker and Weir, 1993) for tree adjoining grammars. For a LIG L and an input string x of length n, we build an unambiguous context-free grammar whose sentences are exactly the valid derivation sequences in L that lead to x. We show that this grammar can be built in O(n^6) time and that individual parses can be extracted in time linear in the size of the extracted parse tree. Although this O(n^6) upper bound does not improve on previous results, the average case behaves much better. Moreover, practical parsing times can be decreased by some statically performed computations.

Read more
