Featured Research

Computation And Language

Co-Indexing Labelled DRSs to Represent and Reason with Ambiguities

The paper addresses the problem of representing ambiguities in a way that allows for monotonic disambiguation and for direct deductive computation. It focuses on an extension of the formalism of underspecified DRSs to ambiguities introduced by plural NPs, dealing with the collective/distributive distinction as well as generic and cumulative readings. In addition, it provides a systematic account of an underspecified treatment of plural pronoun resolution.
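
To make the idea of co-indexed labels concrete, here is a toy sketch in Python (the names and the structure are our own illustration, not the paper's formalism): each plural NP carries a label whose collective/distributive value is left open, and co-indexed labels must resolve to the same reading. Adding a co-indexing constraint only narrows the set of readings, which is the monotonic behaviour the abstract refers to.

```python
from itertools import product

# Toy underspecified representation: each labelled plural NP is ambiguous
# between a collective and a distributive reading, and co-indexed labels
# (e.g. an NP and a plural pronoun resolved to it) must share a reading.
def readings(labels, co_indexed):
    """labels: iterable of NP labels; co_indexed: sets of labels tied together."""
    labels = sorted(labels)
    for choice in product(["collective", "distributive"], repeat=len(labels)):
        assignment = dict(zip(labels, choice))
        if all(len({assignment[l] for l in group}) == 1 for group in co_indexed):
            yield assignment

# "The lawyers hired a secretary. They paid her well." -- pronoun co-indexed with NP
for r in readings({"L1_lawyers", "L2_they"}, [{"L1_lawyers", "L2_they"}]):
    print(r)   # two readings instead of four, thanks to the co-indexing constraint
```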

Co-evolution of Language and of the Language Acquisition Device

A new account of parameter setting during grammatical acquisition is presented in terms of Generalized Categorial Grammar embedded in a default inheritance hierarchy, providing a natural partial ordering on the setting of parameters. Experiments show that several experimentally effective learners can be defined in this framework. Evolutionary simulations suggest that a learner with default initial settings for parameters will emerge, provided that learning is memory limited and the environment of linguistic adaptation contains an appropriate language.
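
The parameter-setting story can be pictured with a deliberately simplified sketch; the parameter names, the parser stand-in, and the update rule below are our own illustrative assumptions, not the paper's model. The learner starts from default settings and, because it is memory-limited, reacts only to the current utterance, flipping a single parameter when parsing fails.

```python
import random

# Toy memory-limited parameter setting: start from defaults and, on a parsing
# failure, flip one parameter exercised by the current input, keeping no
# record of earlier utterances. Parameter names are hypothetical.
DEFAULTS = {"head_initial": True, "verb_second": False, "pro_drop": False}

def parses(settings, sentence, target):
    # Stand-in for a categorial-grammar parser: succeed iff the settings
    # match the (hidden) target grammar on the parameters the sentence exercises.
    return all(settings[p] == target[p] for p in sentence)

def learn(corpus, target, defaults=DEFAULTS):
    settings = dict(defaults)
    for sentence in corpus:               # memory-limited: one utterance at a time
        if not parses(settings, sentence, target):
            p = random.choice(sentence)   # flip a single parameter
            settings[p] = not settings[p]
    return settings

target = {"head_initial": False, "verb_second": True, "pro_drop": False}
corpus = [["head_initial"], ["verb_second"], ["head_initial", "verb_second"]] * 20
print(learn(corpus, target))
```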

Collocational Grammar

A perspective on statistical language models that emphasizes their collocational aspect is advocated. It is suggested that strings be generalized in terms of classes of relationships rather than classes of objects. The single most important characteristic of such a model is a mechanism for comparing patterns. When patterns are fully generalized, a natural definition of syntactic class emerges as a subset of relational class. These collocational syntactic classes should form an unambiguous partition of traditional syntactic classes.
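
One way to picture the move from classes of objects to classes of relationships is the following toy sketch; the corpus, the choice of neighbour pairs as "patterns", and the grouping criterion are illustrative assumptions, not the author's model. Words that occur in exactly the same set of collocational relations fall into the same relational class, and such classes behave like syntactic classes.

```python
from collections import defaultdict

# Group words by their collocational signatures: the set of (left, right)
# neighbours they occur with. Identical signatures -> same relational class.
corpus = ["the dog sleeps", "the cat sleeps", "a dog runs", "a cat runs"]

signature = defaultdict(set)
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        left = words[i - 1] if i > 0 else "<s>"
        right = words[i + 1] if i < len(words) - 1 else "</s>"
        signature[w].add((left, right))

classes = defaultdict(set)
for word, sig in signature.items():
    classes[frozenset(sig)].add(word)

for members in classes.values():
    print(sorted(members))   # e.g. ['cat', 'dog'] emerge as one relational class
```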

Combining Expression and Content in Domains for Dialog Managers

We present work in progress on abstracting dialog managers from their domain in order to implement a dialog manager development tool that takes (among other data) a domain description as input and delivers a new dialog manager for the described domain as output. We focus on two topics: first, the construction of domain descriptions with description logics, and second, the interpretation of utterances in a given domain.
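
As an illustration of the intended workflow, the sketch below treats a domain description as a small terminology that a generic dialog manager consults to decide what information is still missing; the concept and role names, and the slot-filling reading of "interpretation", are our own assumptions rather than the authors' system.

```python
# Hypothetical domain description: a task concept and the roles it requires.
DOMAIN = {
    "TrainBooking": {"roles": {"origin": "City", "destination": "City", "date": "Date"}},
}

def interpret(utterance_slots, task, state):
    """Merge role fillers extracted from an utterance into the dialog state."""
    expected = DOMAIN[task]["roles"]
    for role, value in utterance_slots.items():
        if role in expected:
            state[role] = value
    missing = [r for r in expected if r not in state]
    return state, missing

state, missing = interpret({"destination": "Hamburg"}, "TrainBooking", {})
print(missing)   # a generic dialog manager would ask for these next (origin, date)
```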

Combining Hand-crafted Rules and Unsupervised Learning in Constraint-based Morphological Disambiguation

This paper presents a constraint-based morphological disambiguation approach that is applicable to languages with complex morphology--specifically agglutinative languages with productive inflectional and derivational morphological phenomena. In certain respects, our approach has been motivated by Brill's recent work, but with the observation that his transformational approach is not directly applicable to languages like Turkish. Our system combines corpus-independent hand-crafted constraint rules, constraint rules that are learned via unsupervised learning from a training corpus, and additional statistical information from the corpus to be morphologically disambiguated. The hand-crafted rules are linguistically motivated and tuned to improve precision without sacrificing recall. The unsupervised learning process produces two sets of rules: (i) choose rules, which select the morphological parses of a lexical item that satisfy a constraint, effectively discarding the other parses, and (ii) delete rules, which delete parses satisfying a constraint. Our system also takes a novel approach to unknown-word processing by employing a secondary morphological processor which recovers any relevant inflectional and derivational information from a lexical item whose root is unknown. With this approach, well below 1 percent of the tokens remain unknown in the texts we have experimented with. Our results indicate that by combining these hand-crafted, statistical, and learned information sources, we can attain a recall of 96 to 97 percent with a corresponding precision of 93 to 94 percent, and an ambiguity of 1.02 to 1.03 parses per token.
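
The choose/delete rule machinery can be sketched roughly as follows. The rule representation (predicates over single parses, with the contextual part omitted) and the safeguard against deleting the last parse are simplifying assumptions of this sketch, not the authors' implementation.

```python
def apply_rules(parses, choose_rules, delete_rules):
    """parses: list of candidate morphological analyses for one token.
    Each rule is a predicate over a single parse (context features omitted)."""
    for rule in choose_rules:
        chosen = [p for p in parses if rule(p)]
        if chosen:                      # a choose rule keeps only matching parses
            parses = chosen
    for rule in delete_rules:
        remaining = [p for p in parses if not rule(p)]
        if remaining:                   # never delete the last remaining parse
            parses = remaining
    return parses

# Toy example: prefer nominal parses, then drop a rare derivation.
parses = [{"pos": "Noun", "deriv": None}, {"pos": "Verb", "deriv": None},
          {"pos": "Noun", "deriv": "rare"}]
print(apply_rules(parses,
                  choose_rules=[lambda p: p["pos"] == "Noun"],
                  delete_rules=[lambda p: p["deriv"] == "rare"]))
```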

Combining Multiple Methods for the Automatic Construction of Multilingual WordNets

This paper explores the automatic construction of a multilingual Lexical Knowledge Base from preexisting lexical resources. First, a set of automatic and complementary techniques for linking Spanish words collected from monolingual and bilingual MRDs to English WordNet synsets is described. Second, we show how the data provided by each method are combined to produce a preliminary version of a Spanish WordNet with an accuracy of over 85%. Applying these combinations increases the number of extracted connections by 40% without losing accuracy. Both coarse-grained (class level) and fine-grained (synset assignment level) confidence ratios are used and evaluated. Finally, the results for the whole process are presented.
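
A hedged sketch of the combination step is given below; the additive evidence scheme, the method names, and the confidence values are illustrative assumptions, and the paper's actual combination and confidence ratios differ in detail.

```python
from collections import defaultdict

def combine(method_outputs, threshold=1.0):
    """method_outputs: list of dicts mapping (word, synset) -> confidence."""
    combined = defaultdict(float)
    for output in method_outputs:
        for link, conf in output.items():
            combined[link] += conf        # simple additive evidence combination
    return {link for link, score in combined.items() if score >= threshold}

# Toy outputs of two hypothetical linking methods
bilingual_mrd = {("perro", "dog.n.01"): 0.6, ("banco", "bank.n.01"): 0.4}
monolingual_mrd = {("perro", "dog.n.01"): 0.5, ("banco", "bench.n.01"): 0.5}
print(combine([bilingual_mrd, monolingual_mrd]))   # only the agreed link survives
```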

Combining Trigram-based and Feature-based Methods for Context-Sensitive Spelling Correction

This paper addresses the problem of correcting spelling errors that result in valid, though unintended, words (such as ``peace'' and ``piece'', or ``quiet'' and ``quite''), as well as the problem of correcting particular word usage errors (such as ``amount'' and ``number'', or ``among'' and ``between''). Such corrections require contextual information and are not handled by conventional spelling programs such as Unix `spell'. First, we introduce a method called Trigrams that uses part-of-speech trigrams to encode the context. This method uses a small number of parameters compared to previous methods based on word trigrams. However, it is effectively unable to distinguish among words that have the same part of speech. For this case, an alternative feature-based method called Bayes performs better; but Bayes is less effective than Trigrams when the distinction among words depends on syntactic constraints. A hybrid method called Tribayes is then introduced that combines the best of the previous two methods. The improvement in performance of Tribayes over its components is verified experimentally. Tribayes is also compared with the grammar checker in Microsoft Word, and is found to have substantially higher performance.
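
The hybrid strategy can be sketched schematically as follows; the scoring functions are placeholders rather than the paper's models, but the dispatch on part of speech mirrors the division of labour described above.

```python
def tribayes(candidates, context, pos_of, trigram_score, bayes_score):
    """Pick the intended word from a confusion set."""
    tags = {pos_of(w) for w in candidates}
    if len(tags) > 1:
        # POS trigrams can tell the candidates apart (e.g. "quiet" vs "quite")
        return max(candidates, key=lambda w: trigram_score(w, context))
    # Same part of speech (e.g. "peace" vs "piece"): use contextual features
    return max(candidates, key=lambda w: bayes_score(w, context))

# Toy usage with stub models
choice = tribayes(
    candidates=["peace", "piece"],
    context="a ___ of cake".split(),
    pos_of=lambda w: "NN",
    trigram_score=lambda w, ctx: 0.0,
    bayes_score=lambda w, ctx: 1.0 if w == "piece" else 0.2,
)
print(choice)   # piece
```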

Combining Unsupervised Lexical Knowledge Methods for Word Sense Disambiguation

This paper presents a method for combining a set of unsupervised algorithms that can accurately disambiguate word senses in a large, completely untagged corpus. Although most techniques for word sense resolution have been presented as stand-alone, it is our belief that full-fledged lexical ambiguity resolution should combine several information sources and techniques. The set of techniques has been applied in a combined way to disambiguate the genus terms of two machine-readable dictionaries (MRDs), enabling us to construct complete taxonomies for Spanish and French. Tested accuracy is above 80% overall and 95% for two-way ambiguous genus terms, showing that taxonomy building is not limited to structured dictionaries such as LDOCE.
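
A rough sketch of the combination idea follows; the weighted-voting scheme and the stub heuristics are our own illustration, standing in for the unsupervised knowledge sources the paper actually combines.

```python
def disambiguate(word, candidate_senses, heuristics):
    """heuristics: list of (weight, scorer) where scorer(word, sense) -> score."""
    votes = {sense: 0.0 for sense in candidate_senses}
    for weight, scorer in heuristics:
        scores = {s: scorer(word, s) for s in candidate_senses}
        best = max(scores, key=scores.get)
        votes[best] += weight            # each method votes for its preferred sense
    return max(votes, key=votes.get)

# Toy example: two stub heuristics disambiguating the genus term "organ"
heuristics = [(1.0, lambda w, s: 0.9 if s == "body_part" else 0.1),
              (0.5, lambda w, s: 0.8 if s == "instrument" else 0.2)]
print(disambiguate("organ", ["body_part", "instrument"], heuristics))  # body_part
```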

Comparative Ellipsis and Variable Binding

In this paper, we discuss the question of whether phrasal comparatives should be given a direct interpretation or require an analysis as elliptic constructions, and answer it with both Yes and No. The most adequate analysis of wide-reading attributive (WRA) comparatives seems to be as cases of ellipsis, while a direct (but asymmetric) analysis fits the data for narrow-scope attributive comparatives. The question of whether it is a syntactic or a semantic process that provides the missing linguistic material in the complement of WRA comparatives is also given a complex answer: linguistic context is accessed by combining a reconstruction operation with a mechanism of anaphoric reference. The analysis makes only a few straightforward syntactic assumptions. In part, this is made possible because the use of Generalized Functional Application as a semantic operation allows us to model semantic composition in a flexible way.

Comparative Experiments on Disambiguating Word Senses: An Illustration of the Role of Bias in Machine Learning

This paper describes an experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context. The algorithms tested include statistical, neural-network, decision-tree, rule-based, and case-based classification techniques. The specific problem tested involves disambiguating six senses of the word ``line'' using the words in the current and preceding sentence as context. The statistical and neural-network methods perform best on this particular problem, and we discuss a potential reason for this observed difference. We also discuss the role of bias in machine learning and its importance in explaining the performance differences observed on specific problems.
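
To convey the flavour of such a comparison, here is a minimal sketch using toy data in place of the "line" corpus and two of the learner families mentioned (a statistical Naive Bayes model and a decision tree); scikit-learn is assumed to be available, and the data and scores are purely illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Toy stand-in for the "line" corpus: short contexts labelled with a sense.
texts = ["wait in line at the bank", "stand in line for tickets",
         "the phone line is busy", "a bad line connection",
         "read the next line of text", "the last line of the poem"] * 5
senses = ["queue", "queue", "phone", "phone", "text", "text"] * 5

X = CountVectorizer().fit_transform(texts)   # bag-of-words context features
for name, clf in [("Naive Bayes", MultinomialNB()),
                  ("Decision tree", DecisionTreeClassifier())]:
    scores = cross_val_score(clf, X, senses, cv=5)
    print(name, scores.mean())
```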
