Featured Researches

Computation And Language

A Flexible POS tagger Using an Automatically Acquired Language Model

We present an algorithm that automatically learns context constraints using statistical decision trees. We then use the acquired constraints in a flexible POS tagger. The tagger is able to use information of any degree: n-grams, automatically learned context constraints, linguistically motivated manually written constraints, etc. The sources and kinds of constraints are unrestricted, and the language model can be easily extended, improving the results. The tagger has been tested and evaluated on the WSJ corpus.

Read more
Computation And Language

A Framework for Natural Language Interfaces to Temporal Databases

Over the past thirty years, there has been considerable progress in the design of natural language interfaces to databases. Most of this work has concerned snapshot databases, in which there are only limited facilities for manipulating time-varying information. The database community is becoming increasingly interested in temporal databases, databases with special support for time-dependent entries. We have developed a framework for constructing natural language interfaces to temporal databases, drawing on research on temporal phenomena within logic and linguistics. The central part of our framework is a logic-like formal language, called TOP, which can capture the semantics of a wide range of English sentences. We have implemented an HPSG-based sentence analyser that converts a large set of English queries involving time into TOP formulae, and have formulated a provably correct procedure for translating TOP expressions into queries in the TSQL2 temporal database language. In this way we have established a sound route from English to a general-purpose temporal database language.

Read more
Computation And Language

A General Architecture for Language Engineering (GATE) - a new approach to Language Engineering R&D

This report argues for the provision of a common software infrastructure for NLP systems. Current trends in Language Engineering research are reviewed as motivation for this infrastructure, and relevant recent work discussed. A freely-available system called GATE is described which builds on this work.

Read more
Computation And Language

A General, Sound and Efficient Natural Language Parsing Algorithm based on Syntactic Constraints Propagation

This paper presents a new context-free parsing algorithm based on a bidirectional strictly horizontal strategy which incorporates strong top-down predictions (derivations and adjacencies). From a functional point of view, the parser is able to propagate syntactic constraints reducing parsing ambiguity. From a computational perspective, the algorithm includes different techniques aimed at the improvement of the manipulation and representation of the structures used.

Read more
Computation And Language

A Geometric Approach to Mapping Bitext Correspondence

The first step in most corpus-based multilingual NLP work is to construct a detailed map of the correspondence between a text and its translation. Several automatic methods for this task have been proposed in recent years. Yet even the best of these methods can err by several typeset pages. The Smooth Injective Map Recognizer (SIMR) is a new bitext mapping algorithm. SIMR's errors are smaller than those of the previous front-runner by more than a factor of 4. Its robustness has enabled new commercial-quality applications. The greedy nature of the algorithm makes it independent of memory resources. Unlike other bitext mapping algorithms, SIMR allows crossing correspondences to account for word order differences. Its output can be converted quickly and easily into a sentence alignment. SIMR's output has been used to align over 200 megabytes of the Canadian Hansards for publication by the Linguistic Data Consortium.

Read more
Computation And Language

A Grammar Formalism and Cross-Serial Dependencies

First we define a unification grammar formalism called the Tree Homomorphic Feature Structure Grammar. It is based on Lexical Functional Grammar (LFG), but has a strong restriction on the syntax of the equations. We then show that this grammar formalism defines a full abstract family of languages, and that it is capable of describing cross-serial dependencies of the type found in Swiss German.

Read more
Computation And Language

A Hybrid Environment for Syntax-Semantic Tagging

The thesis describes the application of the relaxation labelling algorithm to NLP disambiguation. Language is modelled through context constraint inspired on Constraint Grammars. The constraints enable the use of a real value statind "compatibility". The technique is applied to POS tagging, Shallow Parsing and Word Sense Disambigation. Experiments and results are reported. The proposed approach enables the use of multi-feature constraint models, the simultaneous resolution of several NL disambiguation tasks, and the collaboration of linguistic and statistical models.

Read more
Computation And Language

A Labelled Analytic Theorem Proving Environment for Categorial Grammar

We present a system for the investigation of computational properties of categorial grammar parsing based on a labelled analytic tableaux theorem prover. This proof method allows us to take a modular approach, in which the basic grammar can be kept constant, while a range of categorial calculi can be captured by assigning different properties to the labelling algebra. The theorem proving strategy is particularly well suited to the treatment of categorial grammar, because it allows us to distribute the computational cost between the algorithm which deals with the grammatical types and the algebraic checker which constrains the derivation.

Read more
Computation And Language

A Lexical Semantic Database for Verbmobil

This paper describes the development and use of a lexical semantic database for the Verbmobil speech-to-speech machine translation system. The motivation is to provide a common information source for the distributed development of the semantics, transfer and semantic evaluation modules and to store lexical semantic information application-independently. The database is organized around a set of abstract semantic classes and has been used to define the semantic contributions of the lemmata in the vocabulary of the system, to automatically create semantic lexica and to check the correctness of the semantic representations built up. The semantic classes are modelled using an inheritance hierarchy. The database is implemented using the lexicon formalism LeX4 developed during the project.

Read more
Computation And Language

A Lexicalist Approach to the Translation of Colloquial Text

Colloquial English (CE) as found in television programs or typical conversations is different than text found in technical manuals, newspapers and books. Phrases tend to be shorter and less sophisticated. In this paper, we look at some of the theoretical and implementational issues involved in translating CE. We present a fully automatic large-scale multilingual natural language processing system for translation of CE input text, as found in the commercially transmitted closed-caption television signal, into simple target sentences. Our approach is based on the Whitelock's Shake and Bake machine translation paradigm, which relies heavily on lexical resources. The system currently translates from English to Spanish with the translation modules for Brazilian Portuguese under development.

Read more

Ready to get started?

Join us today