Featured Researches

Computation And Language

A Semantics-based Communication System for Dysphasic Subjects

Dysphasic subjects do not have complete linguistic abilities and only produce a weakly structured, topicalized language. They are offered artificial symbolic languages to help them communicate in a way more adapted to their linguistic abilities. After a structural analysis of a corpus of utterances from children with cerebral palsy, we define a semantic lexicon for such a symbolic language. We use it as the basis of a semantic analysis process able to retrieve an interpretation of the utterances. This semantic analyser is currently used in an application designed to convert iconic languages into natural language; it might find other uses in the field of language rehabilitation.

Read more
Computation And Language

A Sign-Based Phrase Structure Grammar for Turkish

This study analyses Turkish syntax from an informational point of view. Sign-based linguistic representation and the principles of HPSG (Head-driven Phrase Structure Grammar) theory are adapted to Turkish. The basic informational elements are nested and inherently sorted feature structures called signs. The implementation uses the logic programming tool ALE (Attribute Logic Engine), which is primarily designed for implementing HPSG grammars. A type and structure hierarchy of the Turkish language is designed. Syntactic phenomena such as subcategorization, relative clauses, constituent order variation, adjuncts, nominal predicates and complement-modifier relations in Turkish are analysed. A parser is designed and implemented in ALE.
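The signs mentioned above can be pictured with a small stand-in: a minimal Python sketch of nested feature structures with unification. This is only illustrative — ALE's actual description language, type hierarchy and sortedness constraints are much richer, and all feature names here are made up.

```python
def unify(fs1, fs2):
    """Unify two feature structures represented as nested dicts.

    Atomic values must match exactly; dict-valued features are
    unified recursively. Returns None on unification failure.
    """
    if isinstance(fs1, dict) and isinstance(fs2, dict):
        result = dict(fs1)
        for feat, val in fs2.items():
            if feat in result:
                sub = unify(result[feat], val)
                if sub is None:
                    return None
                result[feat] = sub
            else:
                result[feat] = val
        return result
    return fs1 if fs1 == fs2 else None

# A toy "sign" for a nominal head, and a case-marking constraint.
noun = {"HEAD": {"POS": "noun", "CASE": "acc"}, "AGR": {"NUM": "sg"}}
constraint = {"HEAD": {"CASE": "acc"}}
print(unify(noun, constraint) is not None)  # True: structures are compatible
```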

Read more
Computation And Language

A Simple Transformation for Offline-Parsable Grammars and its Termination Properties

We present, in easily reproducible terms, a simple transformation for offline-parsable grammars which results in a provably terminating parsing program directly top-down interpretable in Prolog. The transformation consists of two steps: (1) removal of empty productions, followed by (2) left-recursion elimination. It is related both to left-corner parsing (where the grammar is compiled, rather than interpreted through a parsing program, and with the advantage of guaranteed termination in the presence of empty productions) and to the Generalized Greibach Normal Form for DCGs (with the advantage of implementation simplicity).
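Step (2) can be illustrated with the textbook elimination of immediate left recursion. The sketch below assumes a toy CFG encoding as Python lists and omits both empty-production removal and indirect left recursion, which the paper's full transformation also handles.

```python
def eliminate_immediate_left_recursion(grammar):
    """Remove immediate left recursion from a CFG.

    grammar: dict mapping each nonterminal to a list of right-hand
    sides, each a list of symbols. A rule set A -> A alpha | beta is
    rewritten as A -> beta A' and A' -> alpha A' | [] (epsilon).
    """
    new_grammar = {}
    for nt, rhss in grammar.items():
        recursive = [rhs[1:] for rhs in rhss if rhs and rhs[0] == nt]
        others = [rhs for rhs in rhss if not rhs or rhs[0] != nt]
        if not recursive:
            new_grammar[nt] = rhss
            continue
        fresh = nt + "'"
        new_grammar[nt] = [rhs + [fresh] for rhs in others]
        new_grammar[fresh] = [alpha + [fresh] for alpha in recursive] + [[]]
    return new_grammar

g = eliminate_immediate_left_recursion({"E": [["E", "+", "T"], ["T"]]})
print(g)  # E -> T E'   and   E' -> + T E' | epsilon
```

The rewritten grammar derives the same strings but can be interpreted top-down without looping, which is the point of the transformation.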

Read more
Computation And Language

A Study of the Context(s) in a Specific Type of Texts: Car Accident Reports

This paper addresses the issue of defining context, and more specifically the different contexts needed for understanding a particular type of texts. The corpus chosen is homogeneous and allows us to determine characteristic properties of the texts from which certain inferences can be drawn by the reader. These characteristic properties come from the real world domain (K-context), the type of events the texts describe (F-context) and the genre of the texts (E-context). Together, these three contexts provide elements for the resolution of anaphoric expressions and for several types of disambiguation. We show in particular that the argumentation aspect of these texts is an essential part of the context and explains some of the inferences that can be drawn.

Read more
Computation And Language

A Support Tool for Tagset Mapping

Many different tagsets are used in existing corpora; these tagsets vary according to the objectives of specific projects (which may be as far apart as robust parsing vs. spelling correction). In many situations, however, one would like to have uniform access to the linguistic information encoded in corpus annotations without having to know the classification schemes in detail. This paper describes a tool which maps unstructured morphosyntactic tags to a constraint-based, typed, configurable specification language, a "standard tagset". The mapping relies on a manually written set of mapping rules, which is automatically checked for consistency. In certain cases, unsharp mappings are unavoidable, and noise, i.e. groups of word forms not conforming to the specification, will appear in the output of the mapping. The system automatically detects such noise and informs the user about it. The tool has been tested with rules for the UPenn tagset and the SUSANNE tagset, in the framework of the validation phase of the EAGLES project (LRE project EAGLES) for standardised tagsets for European languages.
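The mapping-plus-noise-detection idea can be sketched in Python with a plain rule table from UPenn-style tags to structured descriptions. The rule contents and representation here are illustrative only; the actual tool expresses rules in a constraint-based, typed specification language and checks them for consistency.

```python
# Hypothetical rules mapping UPenn-style tags to structured descriptions.
MAPPING_RULES = {
    "NN":  {"cat": "noun", "num": "sg"},
    "NNS": {"cat": "noun", "num": "pl"},
    "VBD": {"cat": "verb", "tense": "past"},
}

def map_tags(tagged_tokens, rules):
    """Map (word, tag) pairs; collect unmapped tokens as noise."""
    mapped, noise = [], []
    for word, tag in tagged_tokens:
        if tag in rules:
            mapped.append((word, rules[tag]))
        else:
            noise.append((word, tag))   # reported to the user, not dropped
    return mapped, noise

mapped, noise = map_tags([("dogs", "NNS"), ("ran", "VBD"), ("?", "SYM")],
                         MAPPING_RULES)
print(noise)  # [('?', 'SYM')] -- flagged as not conforming to the spec
```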

Read more
Computation And Language

A Theory of Parallelism and the Case of VP Ellipsis

We provide a general account of parallelism in discourse, and apply it to the special case of resolving possible readings for instances of VP ellipsis. We show how several problematic examples are accounted for in a natural and straightforward fashion. The generality of the approach makes it directly applicable to a variety of other types of ellipsis and reference.

Read more
Computation And Language

A Variant of Earley Parsing

The Earley algorithm is a widely used parsing method in natural language processing applications. We introduce a variant of Earley parsing that is based on a ``delayed'' recognition of constituents. This allows us to start the recognition of a constituent only in cases in which all of its subconstituents have been found within the input string. This is particularly advantageous in several cases in which partial analysis of a constituent cannot be completed and in general in all cases of productions sharing some suffix of their right-hand sides (even for different left-hand side nonterminals). Although the two algorithms result in the same asymptotic time and space complexity, from a practical perspective our algorithm improves the time and space requirements of the original method, as shown by reported experimental results.
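For reference, the baseline that the variant modifies is the standard Earley recognizer. The sketch below implements that textbook algorithm (not the delayed-recognition variant), with an illustrative grammar encoding; terminals are any symbols that do not appear as left-hand sides.

```python
def earley_recognize(grammar, start, tokens):
    """Minimal textbook Earley recognizer.

    grammar: dict mapping a nonterminal to a list of RHS tuples.
    A chart item is (lhs, rhs, dot, origin).
    """
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in grammar[start]:
        chart[0].add((start, rhs, 0, 0))
    for i in range(len(tokens) + 1):
        added = True
        while added:                                     # close under
            added = False                                # predict/complete
            for lhs, rhs, dot, origin in list(chart[i]):
                if dot < len(rhs) and rhs[dot] in grammar:       # predict
                    for prod in grammar[rhs[dot]]:
                        item = (rhs[dot], prod, 0, i)
                        if item not in chart[i]:
                            chart[i].add(item)
                            added = True
                elif dot == len(rhs):                            # complete
                    for l2, r2, d2, o2 in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (l2, r2, d2 + 1, o2)
                            if item not in chart[i]:
                                chart[i].add(item)
                                added = True
        if i < len(tokens):                                      # scan
            for lhs, rhs, dot, origin in chart[i]:
                if dot < len(rhs) and rhs[dot] == tokens[i]:
                    chart[i + 1].add((lhs, rhs, dot + 1, origin))
    return any(lhs == start and dot == len(rhs) and origin == 0
               for lhs, rhs, dot, origin in chart[len(tokens)])

g = {"S": [("NP", "VP")], "NP": [("det", "n")], "VP": [("v", "NP")]}
print(earley_recognize(g, "S", ["det", "n", "v", "det", "n"]))  # True
```

In this baseline, a constituent's recognition starts as soon as it is predicted; the paper's variant instead delays it until all subconstituents have been found in the input.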

Read more
Computation And Language

A Word Grammar of Turkish with Morphophonemic Rules

In this thesis, morphological description of Turkish is encoded using the two-level model. This description is made up of the phonological component that contains the two-level morphophonemic rules, and the lexicon component which lists the lexical items and encodes the morphotactic constraints. The word grammar is expressed in tabular form. It includes the verbal and the nominal paradigm. Vowel and consonant harmony, epenthesis, reduplication, etc. are described in detail and coded in two-level notation. Loan-word phonology is modelled separately. The implementation makes use of Lexc/Twolc from Xerox. Mechanisms to integrate the morphological analyzer with the lexical and syntactic components are discussed, and a simple graphical user interface is provided. Work is underway to use this model in a classroom setting for teaching Turkish morphology to non-native speakers.
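Two-fold vowel harmony, one of the phenomena covered, can be illustrated procedurally. The thesis states such rules declaratively in two-level (Twolc) notation; this toy function is only a sketch of the generalization, using the plural suffix as an example.

```python
BACK_VOWELS = set("aıou")
FRONT_VOWELS = set("eiöü")

def plural(stem):
    """Attach the Turkish plural suffix using two-fold vowel harmony.

    The suffix surfaces as -lar after a stem whose last vowel is a
    back vowel, and as -ler after a front vowel.
    """
    for ch in reversed(stem):
        if ch in BACK_VOWELS:
            return stem + "lar"
        if ch in FRONT_VOWELS:
            return stem + "ler"
    raise ValueError("stem contains no vowel")

print(plural("kitap"))  # kitaplar ('books')
print(plural("ev"))     # evler ('houses')
```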

Read more
Computation And Language

A Word-to-Word Model of Translational Equivalence

Many multilingual NLP applications need to translate words between different languages, but cannot afford the computational expense of inducing or applying a full translation model. For these applications, we have designed a fast algorithm for estimating a partial translation model, which accounts for translational equivalence only at the word level. The model's precision/recall trade-off can be directly controlled via one threshold parameter. This feature makes the model more suitable for applications that are not fully statistical. The model's hidden parameters can be easily conditioned on information extrinsic to the model, providing an easy way to integrate pre-existing knowledge such as part-of-speech tags, dictionaries, word order, etc. Our model can link word tokens in parallel texts as well as other translation models in the literature can. Unlike other translation models, it can automatically produce dictionary-sized translation lexicons, and it can do so with over 99% accuracy.
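Word-level linking can be sketched with raw co-occurrence counts, greedy one-to-one linking, and a count threshold standing in for the model's threshold parameter. This is a deliberate simplification for illustration, not the paper's estimation algorithm: the real model estimates hidden parameters statistically rather than counting.

```python
from collections import Counter

def link_words(bitext, threshold=2):
    """Greedy one-to-one word linking from co-occurrence counts.

    bitext: list of (source_tokens, target_tokens) pairs.
    Pairs are linked in order of decreasing co-occurrence count;
    each word may be linked at most once, and pairs scoring below
    the threshold are discarded (the precision/recall knob).
    """
    cooc = Counter()
    for src, tgt in bitext:
        for s in set(src):
            for t in set(tgt):
                cooc[(s, t)] += 1
    links, used_s, used_t = [], set(), set()
    for (s, t), count in cooc.most_common():
        if count < threshold:
            break
        if s not in used_s and t not in used_t:   # one-to-one linking
            links.append((s, t, count))
            used_s.add(s)
            used_t.add(t)
    return links

bitext = [(["the", "house"], ["la", "maison"]),
          (["the", "car"], ["la", "voiture"]),
          (["the", "house"], ["la", "maison"])]
print(link_words(bitext))  # [('the', 'la', 3), ('house', 'maison', 2)]
```

Raising the threshold keeps only high-evidence links (higher precision, lower recall); lowering it does the reverse.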

Read more
Computation And Language

A complexity measure for diachronic Chinese phonology

This paper addresses the problem of deriving distance measures between parent and daughter languages with specific relevance to historical Chinese phonology. The diachronic relationship between the languages is modelled as a Probabilistic Finite State Automaton. The Minimum Message Length principle is then employed to find the complexity of this structure. The idea is that this measure is representative of the amount of dissimilarity between the two languages.
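The data-cost half of such a measure can be sketched as the code length of a symbol sequence under a probabilistic finite state automaton. The sketch below omits the cost of stating the automaton itself, which the Minimum Message Length principle adds to obtain the full message length; the toy automaton and all names are illustrative.

```python
import math

def message_length_bits(transitions, sequence, start="S0"):
    """Message length (in bits) of a symbol sequence under a PFSA.

    transitions: dict state -> dict symbol -> (next_state, probability).
    Each emitted symbol costs -log2(p); under MML this data cost is
    added to the cost of encoding the automaton (omitted here).
    """
    state, bits = start, 0.0
    for symbol in sequence:
        next_state, p = transitions[state][symbol]
        bits += -math.log2(p)
        state = next_state
    return bits

# A two-state toy automaton over a tiny sound inventory.
pfsa = {"S0": {"k": ("S1", 0.5), "t": ("S1", 0.5)},
        "S1": {"a": ("S0", 1.0)}}
print(message_length_bits(pfsa, ["k", "a", "t", "a"]))  # 2.0 bits
```

An automaton that models the parent-to-daughter correspondences well assigns high probabilities along the observed paths, so the total message length (and hence the dissimilarity measure) is small.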

Read more
