Featured Research

Computation And Language

"I don't believe in word senses"

Word sense disambiguation assumes word senses. Within the lexicography and linguistics literature, they are known to be very slippery entities. The paper looks at problems with existing accounts of 'word sense' and describes the various kinds of ways in which a word's meaning can deviate from its core meaning. An analysis is presented in which word senses are abstractions from clusters of corpus citations, in accordance with current lexicographic practice. The corpus citations, not the word senses, are the basic objects in the ontology. The corpus citations will be clustered into senses according to the purposes of whoever or whatever does the clustering. In the absence of such purposes, word senses do not exist. Word sense disambiguation also needs a set of word senses to disambiguate between. In most recent work, the set has been taken from a general-purpose lexical resource, with the assumption that the lexical resource describes the word senses of English/French/..., between which NLP applications will need to disambiguate. The implication of the paper is, by contrast, that word senses exist only relative to a task.
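
As a rough illustration of the clustering view (a toy sketch, not the paper's own procedure), the fragment below groups concordance lines for one word into candidate senses with an off-the-shelf clusterer. It assumes scikit-learn is available; the citations and the choice of two clusters are invented for illustration.

    # Toy illustration: "senses" emerge only when corpus citations are
    # clustered for some purpose. The citations and the choice of two
    # clusters are invented; scikit-learn is assumed to be installed.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.cluster import KMeans

    citations = [
        "the river bank was muddy after the flood",
        "she sat on the grassy bank, fishing",
        "the bank raised its interest rates again",
        "he deposited the cheque at the bank",
    ]

    vectors = TfidfVectorizer().fit_transform(citations)
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

    # Each cluster is a candidate "sense": an abstraction over citations that
    # exists only relative to the clustering purpose and parameters chosen.
    for citation, label in zip(citations, labels):
        print(label, citation)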

Computation And Language

A Bayesian hybrid method for context-sensitive spelling correction

Two classes of methods have been shown to be useful for resolving lexical ambiguity. The first relies on the presence of particular words within some distance of the ambiguous target word; the second uses the pattern of words and part-of-speech tags around the target word. These methods have complementary coverage: the former captures the lexical "atmosphere" (discourse topic, tense, etc.), while the latter captures local syntax. Yarowsky has exploited this complementarity by combining the two methods using decision lists. The idea is to pool the evidence provided by the component methods, and to then solve a target problem by applying the single strongest piece of evidence, whatever type it happens to be. This paper takes Yarowsky's work as a starting point, applying decision lists to the problem of context-sensitive spelling correction. Decision lists are found, by and large, to outperform either component method. However, it is found that further improvements can be obtained by taking into account not just the single strongest piece of evidence, but ALL the available evidence. A new hybrid method, based on Bayesian classifiers, is presented for doing this, and its performance improvements are demonstrated.
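
To make the pooling idea concrete, here is a minimal naive-Bayes-style sketch for a confusion set such as {"piece", "peace"}. The feature extraction, add-one smoothing and training interface below are simplified assumptions for illustration, not the paper's exact method.

    import math
    from collections import defaultdict

    def features(tokens, i):
        """Pool two evidence types for the ambiguous target at position i."""
        # Context-word evidence: any word within +/-3 of the target.
        evidence = {("ctx", w) for w in tokens[max(0, i - 3):i] + tokens[i + 1:i + 4]}
        # Collocation evidence: the immediate left and right neighbours, in order.
        evidence.add(("left", tokens[i - 1] if i > 0 else "<s>"))
        evidence.add(("right", tokens[i + 1] if i + 1 < len(tokens) else "</s>"))
        return evidence

    class NaiveBayesCorrector:
        def __init__(self, confusion_set):
            self.classes = confusion_set
            self.counts = {c: defaultdict(int) for c in confusion_set}
            self.totals = {c: 0 for c in confusion_set}

        def train(self, examples):
            # examples: (tokens, target_index, correct_word) triples
            for tokens, i, correct in examples:
                for f in features(tokens, i):
                    self.counts[correct][f] += 1
                self.totals[correct] += 1

        def predict(self, tokens, i):
            n = sum(self.totals.values())
            scores = {}
            for c in self.classes:
                # Log prior plus the sum of (smoothed) log likelihoods: all the
                # available evidence is pooled, not just the strongest piece.
                score = math.log((self.totals[c] + 1) / (n + len(self.classes)))
                for f in features(tokens, i):
                    score += math.log((self.counts[c][f] + 1) / (self.totals[c] + 2))
                scores[c] = score
            return max(scores, key=scores.get)

Trained on sentences labelled with the intended member of the confusion set, predict returns whichever member the pooled evidence favours.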

Computation And Language

A Categorial Framework for Composition in Multiple Linguistic Domains

This paper describes a computational framework for a grammar architecture in which different linguistic domains such as morphology, syntax, and semantics are treated not as separate components but as compositional domains. Word and phrase formation are modeled as uniform processes contributing to the derivation of the semantic form. The morpheme, as well as the lexeme, has a lexical representation in the form of semantic content, tactical constraints, and phonological realization. The motivation for this work is to handle morphology-syntax interaction (e.g., valency change in causatives, subcategorization imposed by case-marking affixes) in an incremental way. The model is based on Combinatory Categorial Grammars.
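
As a toy illustration (invented notation, not the paper's formalism), the sketch below gives a lexeme and a hypothetical causative suffix the same kind of lexical entry, pairing a phonological form with a categorial constraint and a semantic term, so that word formation and valency change fall out of ordinary categorial application.

    # Toy sketch: morphemes and lexemes share one entry format. The causative
    # suffix "-dir" and its category are hypothetical examples; applying it to
    # an intransitive verb yields a transitive (valency-changed) entry.
    from dataclasses import dataclass

    @dataclass
    class Entry:
        phon: str   # phonological realization
        cat: str    # categorial ("tactical") constraint
        sem: str    # semantic content, written as a lambda term for brevity

    run = Entry("run", r"S\NP", "\\x.run(x)")
    caus = Entry("-dir", r"(S\NP\NP)/(S\NP)", "\\p.\\x.\\y.cause(y, p(x))")

    def apply(functor, argument):
        """Forward application X/Y + Y => X, composing phonology and semantics."""
        head, arg_cat = functor.cat.rsplit("/", 1)   # split off the argument slot
        assert arg_cat.strip("()") == argument.cat.strip("()"), "category mismatch"
        return Entry(argument.phon + functor.phon,   # suffix attaches after the stem
                     head.strip("()"),
                     f"({functor.sem})({argument.sem})")

    print(apply(caus, run))   # a causativized, now transitive, verb entry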

Computation And Language

A Chart Generator for Shake and Bake Machine Translation

A generation algorithm based on an active chart parsing algorithm is introduced which can be used in conjunction with a Shake and Bake machine translation system. A concise Prolog implementation of the algorithm is provided, and some performance comparisons with a shift-reduce based algorithm are given which show the chart generator is much more efficient for generating all possible sentences from an input specification.
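
A very reduced sketch of the chart idea follows (in Python rather than the paper's Prolog, with invented toy categories): edges carry a coverage bitmask over the input bag of signs, two edges may combine only if their coverage is disjoint, and complete edges of the goal category that cover the whole bag are read off as generated sentences.

    # Simplified sketch of chart-based generation from a bag of lexical signs.
    # The categories, combination rules and example bag are toy inventions.
    from collections import namedtuple

    Edge = namedtuple("Edge", "cat string cover")

    def combine(left, right):
        """Toy application: X/Y + Y => X (forward) or Y + X\\Y => X (backward)."""
        if "/" in left.cat:
            head, arg = left.cat.rsplit("/", 1)
            if arg == right.cat:
                return Edge(head, left.string + " " + right.string,
                            left.cover | right.cover)
        if "\\" in right.cat:
            head, arg = right.cat.rsplit("\\", 1)
            if arg == left.cat:
                return Edge(head, left.string + " " + right.string,
                            left.cover | right.cover)
        return None

    def generate(bag, goal="S"):
        # Initialise the chart with one edge per lexical sign in the bag.
        chart = [Edge(cat, word, 1 << i) for i, (word, cat) in enumerate(bag)]
        agenda = list(chart)
        full = (1 << len(bag)) - 1
        results = set()
        while agenda:
            edge = agenda.pop()
            for other in list(chart):
                if edge.cover & other.cover:
                    continue  # the two edges use overlapping parts of the bag
                for new in (combine(edge, other), combine(other, edge)):
                    if new and new not in chart:
                        chart.append(new)
                        agenda.append(new)
                        if new.cat == goal and new.cover == full:
                            results.add(new.string)
        return results

    # Hypothetical bag of signs; the only full-coverage S is "the cat sleeps".
    bag = [("sleeps", r"S\NP"), ("cat", "N"), ("the", "NP/N")]
    print(generate(bag))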

Computation And Language

A Comparative Study of the Application of Different Learning Techniques to Natural Language Interfaces

In this paper we present first results from a comparative study whose aim is to test the feasibility of using different inductive learning techniques for the automatic acquisition of linguistic knowledge within a natural language database interface. In our interface architecture the machine learning module replaces an elaborate semantic analysis component. The learning module learns the correct mapping of a user's input to the corresponding database command based on a collection of past input data. We use an existing interface to a production planning and control system for evaluation and compare the results achieved by different instance-based and model-based learning algorithms.
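
The comparison can be pictured with a toy experiment like the one below, which maps invented utterances to made-up database command labels using one instance-based learner (k-nearest neighbours) and one model-based learner (a decision tree). Scikit-learn and bag-of-words features are assumptions of the sketch, not the study's actual setup.

    # Toy comparison of an instance-based and a model-based learner for mapping
    # utterances to database commands; data and labels are invented.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    utterances = [
        "show all orders from last week",
        "list the open orders",
        "delete order 4711",
        "remove the cancelled order",
        "how many orders are still open",
        "count the orders from march",
    ]
    commands = ["SELECT", "SELECT", "DELETE", "DELETE", "COUNT", "COUNT"]

    X = CountVectorizer().fit_transform(utterances)

    for learner in (KNeighborsClassifier(n_neighbors=1),
                    DecisionTreeClassifier(random_state=0)):
        # Trained and probed on the same toy data purely to show the interface.
        learner.fit(X, commands)
        print(type(learner).__name__, learner.predict(X[:1]))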

Computation And Language

A Comparison of WordNet and Roget's Taxonomy for Measuring Semantic Similarity

This paper presents the results of using Roget's International Thesaurus as the taxonomy in a semantic similarity measurement task. Four similarity metrics were taken from the literature and applied to Roget's. The experimental evaluation suggests that the traditional edge counting approach does surprisingly well (a correlation of r=0.88 with a benchmark set of human similarity judgements, with an upper bound of r=0.90 for human subjects performing the same task).
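
The edge counting baseline is simple enough to sketch directly. The fragment below computes a path-length similarity over a tiny hand-made taxonomy and correlates it with hypothetical human ratings; the hierarchy, word pairs, ratings and depth constant are all invented, not the paper's data.

    # Toy edge-counting similarity over a hand-made taxonomy fragment, plus a
    # Pearson correlation against invented "human" ratings.
    import math

    parent = {"car": "vehicle", "bicycle": "vehicle", "vehicle": "artifact",
              "cup": "container", "container": "artifact", "artifact": "entity",
              "dog": "animal", "animal": "entity"}

    def ancestors(node):
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    def path_length(a, b):
        pa, pb = ancestors(a), ancestors(b)
        common = next(n for n in pa if n in pb)      # lowest common subsumer
        return pa.index(common) + pb.index(common)   # edges a->lcs plus lcs->b

    def edge_similarity(a, b, max_depth=4):
        return 2 * max_depth - path_length(a, b)     # shorter path = more similar

    def pearson(xs, ys):
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) *
                               sum((y - my) ** 2 for y in ys))

    pairs = [("car", "bicycle"), ("car", "cup"), ("car", "dog")]
    human = [3.9, 1.5, 0.9]                          # hypothetical ratings
    system = [edge_similarity(a, b) for a, b in pairs]
    print(round(pearson(system, human), 2))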

Computation And Language

A Compositional Treatment of Polysemous Arguments in Categorial Grammar

We discuss an extension of the standard logical rules (functional application and abstraction) in Categorial Grammar (CG), in order to deal with some specific cases of polysemy. We borrow from Generative Lexicon theory, which proposes the mechanism of coercion, next to a rich nominal lexical semantic structure called the qualia structure. In a previous paper we introduced coercion into the framework of sign-based Categorial Grammar and investigated its impact on traditional Fregean compositionality. In this paper we elaborate on this idea, mostly working towards the introduction of a new semantic dimension. Whereas in current versions of sign-based Categorial Grammar only two representations are derived, a prosodic one (form) and a logical one (modelling), here we also introduce a more detailed representation of the lexical semantics. This extra knowledge serves to account for linguistic phenomena like metonymy.
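
A toy rendering of the coercion mechanism (invented names and structure, not the Generative Lexicon's or the paper's actual apparatus) might look as follows: a noun carries a small qualia structure, and an event-selecting verb repairs the type mismatch by picking up the noun's telic quale.

    # Toy sketch of type coercion via a qualia structure. "enjoy" wants an
    # event argument; "book" denotes an object, but its telic quale ("read")
    # supplies an event, so the mismatch is repaired by coercion.
    from dataclasses import dataclass, field

    @dataclass
    class Noun:
        form: str
        semtype: str                                  # e.g. "object"
        qualia: dict = field(default_factory=dict)    # telic, agentive, ...

    book = Noun("book", "object", {"telic": "read", "agentive": "write"})

    def coerce_to_event(verb, noun):
        """Build a logical form, coercing an object argument to an event
        through its telic quale when the verb selects for an event."""
        if noun.semtype == "event":
            return f"{verb}(x, {noun.form})"
        telic = noun.qualia.get("telic")
        if telic:   # metonymic reading: enjoying the book = enjoying reading it
            return f"{verb}(x, {telic}(x, {noun.form}))"
        raise TypeError(f"cannot coerce {noun.form!r} to an event")

    print(coerce_to_event("enjoy", book))   # enjoy(x, read(x, book))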

Computation And Language

A Computational Approach to Aspectual Composition

In this paper, I argue, contrary to the prevailing opinion in the linguistics and philosophy literature, that a sortal approach to aspectual composition can indeed be explanatory. In support of this view, I develop a synthesis of competing proposals by Hinrichs, Krifka and Jackendoff which takes Jackendoff's cross-cutting sortal distinctions as its point of departure. To show that the account is well-suited for computational purposes, I also sketch an implemented calculus of eventualities which yields many of the desired inferences. Further details on both the model-theoretic semantics and the implementation can be found in (White, 1994).
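
As a very small worked example of what aspectual composition has to deliver (invented rules, not the paper's calculus of eventualities): with an incremental-theme verb like drink, a quantized object yields a telic event and a cumulative one an atelic process, which in turn predicts the acceptability of in- versus for-adverbials.

    # Invented toy rules, not the paper's calculus: for an incremental-theme
    # verb phrase, the aspect of the whole depends on the sortal type of the
    # object -- quantized objects ("a beer") give telic events, cumulative
    # ones ("beer") give atelic processes.
    def compose_aspect(object_is_quantized):
        return "telic" if object_is_quantized else "atelic"

    def accepts_adverbial(aspect, adverbial):
        """Telic eventualities take 'in an hour'; atelic ones take 'for an hour'."""
        if adverbial == "in an hour":
            return aspect == "telic"
        if adverbial == "for an hour":
            return aspect == "atelic"
        raise ValueError(f"unknown adverbial: {adverbial}")

    print(accepts_adverbial(compose_aspect(True), "in an hour"))    # "drink a beer in an hour": True
    print(accepts_adverbial(compose_aspect(False), "in an hour"))   # *"drink beer in an hour": False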

Computation And Language

A Conceptual Reasoning Approach to Textual Ellipsis

We present a hybrid text understanding methodology for the resolution of textual ellipsis. It integrates conceptual criteria (based on the well-formedness and conceptual strength of role chains in a terminological knowledge base) and functional constraints reflecting the utterances' information structure (based on the distinction between context-bound and unbound discourse elements). The methodological framework for text ellipsis resolution is the centering model that has been adapted to these constraints.
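
A much-simplified sketch of the scoring idea follows (the knowledge base, role strengths and centering list below are invented): candidate antecedents from the previous utterance are ranked by the conceptual strength of the role chain linking them to the elliptical expression, with the centering-based salience order breaking ties.

    # Toy sketch of resolving a textual ellipsis such as "... bought a laptop.
    # The battery was weak." using hand-assigned role strengths and a
    # salience-ordered list of forward-looking centers.
    roles = {  # (concept, role) -> (filler concept, conceptual strength)
        ("laptop", "has-part"): ("battery", 1.0),
        ("laptop", "has-attribute"): ("price", 0.8),
        ("shop", "sells"): ("laptop", 0.5),
    }

    def chain_strength(antecedent, target):
        """Strength of a direct role chain from antecedent to target (0 if none)."""
        return max((s for (c, _), (f, s) in roles.items()
                    if c == antecedent and f == target), default=0.0)

    def resolve(target, forward_centers):
        """Pick the best antecedent; forward_centers is ordered by salience."""
        scored = [(chain_strength(c, target), -i, c)
                  for i, c in enumerate(forward_centers)]
        strength, _, best = max(scored)
        return best if strength > 0 else None

    # Previous utterance "He bought a laptop in the shop", centers by salience:
    print(resolve("battery", ["laptop", "shop"]))   # -> "laptop"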

Computation And Language

A Constraint-based Case Frame Lexicon

We present a constraint-based case frame lexicon architecture for bi-directional mapping between a syntactic case frame and a semantic frame. The lexicon uses a semantic sense as the basic unit and employs a multi-tiered constraint structure for the resolution of syntactic information into the appropriate senses and/or idiomatic usage. Valency changing transformations such as morphologically marked passivized or causativized forms are handled via lexical rules that manipulate case frame templates. The system has been implemented in a typed-feature system and applied to Turkish.
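
A toy rendering of the template manipulation (a plain-dictionary sketch, not the paper's typed-feature encoding): a case frame maps grammatical cases to thematic roles for one verb sense, and a passivization lexical rule rewrites the template by promoting the accusative argument and demoting the original subject to an oblique.

    # Toy sketch of a case frame and a valency-changing lexical rule.
    import copy

    read_frame = {
        "sense": "read-1",
        "frame": {"nominative": "agent", "accusative": "theme"},
    }

    def passivize(entry):
        """Lexical rule: ACC -> NOM, original NOM demoted to an oblique."""
        new = copy.deepcopy(entry)
        frame = new["frame"]
        new["frame"] = {"nominative": frame.pop("accusative"),
                        "oblique": frame.pop("nominative")}
        new["sense"] += "-passive"
        return new

    print(passivize(read_frame))
    # {'sense': 'read-1-passive', 'frame': {'nominative': 'theme', 'oblique': 'agent'}}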

