Featured Research

Computation and Language

A framework for lexical representation

In this paper we present a unification-based lexical platform designed for highly inflected languages (such as the Romance languages). A formalism is proposed for encoding a lemma-based lexical source, well suited to linguistic generalizations. From this source, we automatically generate an allomorph-indexed dictionary suitable for efficient processing. A set of software tools has been implemented around this formalism: access libraries, morphological processors, etc.
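The compilation step from a lemma-based source to an allomorph-indexed dictionary can be sketched roughly as below. This is a minimal toy illustration, not the paper's actual formalism: the paradigm table, feature labels, and field names are all invented.

```python
# Toy lemma-based source: each lemma names a paradigm; each paradigm maps
# feature bundles to suffixes (invented data, loosely Spanish-flavoured).
PARADIGMS = {
    "ar_verb": {"1sg.pres": "o", "2sg.pres": "as", "3sg.pres": "a"},
}

LEMMAS = {
    "cantar": {"paradigm": "ar_verb", "stem": "cant"},
}

def expand(lemmas, paradigms):
    """Compile the lemma source into a surface-form-indexed dictionary."""
    index = {}
    for lemma, entry in lemmas.items():
        for features, suffix in paradigms[entry["paradigm"]].items():
            form = entry["stem"] + suffix
            index.setdefault(form, []).append((lemma, features))
    return index

index = expand(LEMMAS, PARADIGMS)
print(index["canta"])   # [('cantar', '3sg.pres')]
```

The point of the precompiled index is that lookup at processing time is a direct hash on the surface form, while linguistic generalizations live only in the lemma-based source.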

A generation algorithm for f-structure representations

This paper shows that previously reported generation algorithms run into problems when dealing with f-structure representations. A generation algorithm suitable for this type of representation is presented: the Semantic Kernel Generation (SKG) algorithm. The SKG method has the same processing strategy as the Semantic Head Driven Generation (SHDG) algorithm and relies on the assumption that the Semantic Kernel (SK) and non-Semantic-Kernel (Non-SK) information can be computed for each input structure.

A lexical database tool for quantitative phonological research

A lexical database tool tailored for phonological research is described. Database fields include transcriptions, glosses and hyperlinks to speech files. Database queries are expressed using HTML forms, and these permit regular expression search on any combination of fields. Regular expressions are passed directly to a Perl CGI program, enabling the full flexibility of Perl extended regular expressions. The regular expression notation is extended to better support phonological searches, such as search for minimal pairs. Search results are presented in the form of HTML or LaTeX tables, where each cell is either a number (representing frequency) or a designated subset of the fields. Tables have up to four dimensions, with an elegant system for specifying which fragments of which fields should be used for the row/column labels. The tool offers several advantages over traditional methods of analysis: (i) it supports a quantitative method of doing phonological research; (ii) it gives universal access to the same set of informants; (iii) it enables other researchers to hear the original speech data without having to rely on published transcriptions; (iv) it makes the full power of regular expression search available, and search results are full multimedia documents; and (v) it enables the early refutation of false hypotheses, shortening the analysis-hypothesis-test loop. A life-size application to an African tone language (Dschang) is used for exemplification throughout the paper. The database contains 2200 records, each with approximately 15 fields. Running on a PC laptop with a stand-alone web server, the 'Dschang HyperLexicon' has already been used extensively in phonological fieldwork and analysis in Cameroon.
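The core query operation, regular-expression search over any field of a record set, can be sketched as below. This is a hedged illustration only: the paper's tool is a Perl CGI program, and the field names and toy records here are invented.

```python
import re

# Toy lexical records in the spirit of the HyperLexicon (invented data).
RECORDS = [
    {"transcription": "làɣà", "gloss": "to climb"},
    {"transcription": "láɣá", "gloss": "cloth"},
]

def search(records, field, pattern):
    """Return the records whose given field matches the regex pattern."""
    rx = re.compile(pattern)
    return [r for r in records if rx.search(r[field])]

hits = search(RECORDS, "gloss", r"^cl")
print([h["transcription"] for h in hits])   # ['láɣá']
```

Frequency tables of the kind the tool renders would then just be counts over the records returned by such a search, grouped by fragments of the chosen fields.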

Active Constraints for a Direct Interpretation of HPSG

In this paper, we characterize the properties of a direct interpretation of HPSG and present the advantages of this approach. High-level programming languages constitute an efficient solution in this perspective: we show how a multi-paradigm approach, incorporating in particular constraint logic programming, offers mechanisms close to those of the theory and preserves its fundamental properties. We take the example of LIFE and describe the implementation of the main HPSG mechanisms.

Adapting the Core Language Engine to French and Spanish

We describe how substantial domain-independent language-processing systems for French and Spanish were quickly developed by manually adapting an existing English-language system, the SRI Core Language Engine. We explain the adaptation process in detail, and argue that it provides a fairly general recipe for converting a grammar-based system for English into a corresponding one for a Romance language.

Adjunction As Substitution: An Algebraic Formulation of Regular, Context-Free and Tree Adjoining Languages

This note presents a method of interpreting the tree adjoining languages as the natural third step in a hierarchy that starts with the regular and the context-free languages. The central notion in this account is that of a higher-order substitution. Whereas in traditional presentations of rule systems for abstract language families the emphasis has been on a first-order substitution process, in which auxiliary variables are replaced by elements of the carrier of the proper algebra (concatenations of terminal and auxiliary category symbols in the string case), we lift this process to the level of operations defined on the elements of the carrier of the algebra. Our view is that this change of emphasis provides an adequate platform for a better understanding of the operation of adjunction. In a nutshell: adjoining is not a first-order but a second-order substitution operation.
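The operation in question, adjunction, can be illustrated concretely. The toy sketch below represents trees as (label, children) pairs and marks the auxiliary tree's foot node with a trailing "*"; it illustrates adjunction as a tree-level (second-order) operation, but it is only an illustration, not the note's algebraic formulation.

```python
def adjoin(tree, aux, label):
    """Adjoin auxiliary tree `aux` at the first node of `tree` labelled
    `label`: that subtree is cut out, `aux` is put in its place, and the
    cut-out subtree is re-inserted at the foot node of `aux`."""
    node, children = tree
    if node == label:
        return plug(aux, tree)
    return (node, [adjoin(c, aux, label) for c in children])

def plug(aux, subtree):
    """Replace the foot node (labelled e.g. 'VP*') with the old subtree."""
    node, children = aux
    if node == subtree[0] + "*":
        return subtree
    return (node, [plug(c, subtree) for c in children])

alpha = ("S", [("NP", []), ("VP", [("V", [])])])
beta  = ("VP", [("Adv", []), ("VP*", [])])   # auxiliary tree for VP

print(adjoin(alpha, beta, "VP"))
```

The result wraps the original VP subtree inside the auxiliary tree's structure, which is exactly what makes adjunction an operation on trees rather than a replacement of a symbol by a string.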

Aggregate and mixed-order Markov models for statistical language processing

We consider the use of language models whose size and accuracy are intermediate between different order n-gram models. Two types of models are studied in particular. Aggregate Markov models are class-based bigram models in which the mapping from words to classes is probabilistic. Mixed-order Markov models combine bigram models whose predictions are conditioned on different words. Both types of models are trained by Expectation-Maximization (EM) algorithms for maximum likelihood estimation. We examine smoothing procedures in which these models are interposed between different order n-grams. This is found to significantly reduce the perplexity of unseen word combinations.
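The aggregate (class-based) bigram idea decomposes the bigram probability through a probabilistic word-to-class mapping: P(w2 | w1) = sum over classes c of P(c | w1) P(w2 | c). The sketch below shows only this decomposition with toy hand-set probabilities; in the paper the parameters are trained by EM, which is not shown here.

```python
# Toy parameters (invented, not EM-trained as in the paper).
P_class_given_word = {          # P(c | w1)
    "the": {"DET": 1.0},
    "a":   {"DET": 1.0},
}
P_word_given_class = {          # P(w2 | c)
    "DET": {"cat": 0.6, "dog": 0.4},
}

def aggregate_bigram(w1, w2):
    """P(w2 | w1) marginalized over the latent class of w1's context."""
    return sum(p_c * P_word_given_class[c].get(w2, 0.0)
               for c, p_c in P_class_given_word[w1].items())

print(aggregate_bigram("the", "cat"))   # 0.6
```

Because words are tied together through shared classes, the model can assign probability to word pairs never seen together in training, which is the source of the perplexity reduction on unseen combinations.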

Algorithms for Speech Recognition and Language Processing

Speech processing requires very efficient methods and algorithms. Finite-state transducers have been shown recently both to constitute a very useful abstract model and to lead to highly efficient time and space algorithms in this field. We present these methods and algorithms and illustrate them in the case of speech recognition. In addition to classical techniques, we describe many new algorithms such as minimization, global and local on-the-fly determinization of weighted automata, and efficient composition of transducers. These methods are currently used in large vocabulary speech recognition systems. We then show how the same formalism and algorithms can be used in text-to-speech applications and related areas of language processing such as morphology, syntax, and local grammars, in a very efficient way. The tutorial is self-contained and requires no specific computational or linguistic knowledge other than classical results.
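One of the central operations mentioned, composition of transducers, can be sketched in a drastically simplified form: when each transducer deterministically rewrites one symbol to one symbol, composing T1 with T2 just feeds T1's output into T2. This is a toy assumption for illustration; the algorithms in the tutorial also handle nondeterminism, epsilon transitions, and semiring weights, none of which appear here.

```python
def make_transducer(rules):
    """Build a symbol-by-symbol rewriter from a dict of rewrite rules."""
    def run(symbols):
        return [rules[s] for s in symbols]
    return run

T1 = make_transducer({"a": "b", "c": "d"})   # first rewriting stage
T2 = make_transducer({"b": "B", "d": "D"})   # second rewriting stage

def compose(t1, t2):
    """T2 after T1: the output tape of T1 is the input tape of T2."""
    return lambda symbols: t2(t1(symbols))

T = compose(T1, T2)
print(T(["a", "c"]))   # ['B', 'D']
```

In a real speech pipeline this chaining is what lets separate models (pronunciation, language model, acoustics) be combined into a single search space, with lazy (on-the-fly) composition avoiding materializing the full product machine.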

Amalia -- A Unified Platform for Parsing and Generation

Contemporary linguistic theories (in particular, HPSG) are declarative in nature: they specify constraints on permissible structures, not how such structures are to be computed. Grammars designed under such theories are, therefore, suitable for both parsing and generation. However, practical implementations of such theories don't usually support bidirectional processing of grammars. We present a grammar development system that includes a compiler of grammars (for parsing and generation) to abstract machine instructions, and an interpreter for the abstract machine language. The generation compiler inverts input grammars (designed for parsing) to a form more suitable for generation. The compiled grammars are then executed by the interpreter using one control strategy, regardless of whether the grammar is the original or the inverted version. We thus obtain a unified, efficient platform for developing reversible grammars.

Ambiguity in the Acquisition of Lexical Information

This paper describes an approach to the automatic identification of lexical information in on-line dictionaries. This approach uses bootstrapping techniques, specifically so that ambiguity in the dictionary text can be treated properly. This approach consists of processing an on-line dictionary multiple times, each time refining the lexical information previously acquired and identifying new lexical information. The strength of this approach is that lexical information can be acquired from definitions which are syntactically ambiguous, given that information acquired during the first pass can be used to improve the syntactic analysis of definitions in subsequent passes. In the context of a lexical knowledge base, the types of lexical information that need to be represented cannot be viewed as a fixed set, but rather as a set that will change given the resources of the lexical knowledge base and the requirements of analysis systems which access it.
