Kimmo Koskenniemi
University of Helsinki
Publication
Featured research published by Kimmo Koskenniemi.
international conference on computational linguistics | 1990
Kimmo Koskenniemi
A language-independent method of finite-state surface syntactic parsing and word-disambiguation is discussed. Input sentences are represented as finite-state networks already containing all possible roles and interpretations of their units. Syntactic constraint rules are also represented as finite-state machines, where each constraint excludes certain types of ungrammatical readings. The whole grammar is an intersection of its constraint rules and excludes all ungrammatical possibilities, leaving the correct interpretation(s) of the sentence. The method is being tested for Finnish, Swedish and English.
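The idea of a grammar as an intersection of constraint automata can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the tag set, the two constraints, and the example sentence are hypothetical and not taken from the paper, and the sketch enumerates the paths of a tiny sentence network instead of intersecting automata directly.

```python
from itertools import product

TAGS = ["DET", "NOUN", "VERB", "AUX"]  # hypothetical toy tag set

# Toy sentence network: every path picks one reading per word.
sentence = [
    ("the", ["DET"]),
    ("can", ["NOUN", "VERB", "AUX"]),
    ("rusts", ["NOUN", "VERB"]),
]

def make_dfa(transitions, start, accepting):
    """Return a function that runs a DFA over a tag sequence.

    A missing transition means the sequence is rejected.
    """
    def accepts(tags):
        state = start
        for tag in tags:
            state = transitions.get((state, tag))
            if state is None:
                return False
        return state in accepting
    return accepts

# Constraint 1 (hypothetical): a determiner is never directly followed
# by a verb or an auxiliary.
no_verb_after_det = make_dfa(
    {**{("any", t): "any" for t in TAGS if t != "DET"},
     ("any", "DET"): "after_det",
     ("after_det", "DET"): "after_det",
     ("after_det", "NOUN"): "any"},
    start="any", accepting={"any", "after_det"})

# Constraint 2 (hypothetical): the sentence contains at least one VERB reading.
needs_verb = make_dfa(
    {**{("no_verb", t): "no_verb" for t in TAGS if t != "VERB"},
     ("no_verb", "VERB"): "has_verb",
     **{("has_verb", t): "has_verb" for t in TAGS}},
    start="no_verb", accepting={"has_verb"})

def parse(sentence, constraints):
    """Keep the readings accepted by every constraint (their intersection)."""
    options = [readings for _, readings in sentence]
    return [list(path) for path in product(*options)
            if all(accepts(path) for accepts in constraints)]

print(parse(sentence, [no_verb_after_det, needs_verb]))
```

Each constraint only rejects readings, so adding a constraint can never introduce a new reading. Enumerating paths is exponential in sentence length and is used here only to keep the sketch short.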
international conference on computational linguistics | 1992
Kimmo Koskenniemi; Pasi Tapanainen; Atro Voutilainen
A language-independent framework for syntactic finite-state parsing is discussed. The article presents a framework, a formalism, a compiler and a parser for grammars written in this formalism. As a substantial example, fragments from a nontrivial finite-state grammar of English are discussed. The linguistic framework of the present approach is based on a surface syntactic tagging scheme by F. Karlsson. This representation is slightly less powerful than phrase structure tree notation, letting some ambiguous constructions be described more concisely. The finite-state rule compiler implements what was briefly sketched by Koskenniemi (1990). It is based on the calculus of finite-state machines. The compiler transforms rules into rule-automata. The run-time parser exploits one of certain alternative strategies in performing the effective intersection of the rule automata and the sentence automaton. Fragments of a fairly comprehensive finite-state grammar of English are presented here, including samples from non-finite constructions as a demonstration of the capacity of the present formalism, which goes far beyond plain disambiguation or part of speech tagging. The grammar itself is directly related to a parser and tagging system for English created as a part of project SIMPR using Karlsson's CG (Constraint Grammar) formalism.
international conference on computational linguistics | 1988
Laura Kataja; Kimmo Koskenniemi
This paper discusses the problems of description and computational implementation of phonology and morphology in Semitic languages, using Ancient Akkadian as an example. Phonological and morphophonological variations are described using standard finite-state two-level morphological rules. Interdigitation, prefixation and suffixation are described by using an intersection of two lexicons which effectively defines lexical representations of words.
Natural Language Engineering | 2003
Lauri Karttunen; Kimmo Koskenniemi; Gertjan van Noord
Finite state methods have been in common use in various areas of natural language processing (NLP) for many years. A series of specialized workshops in this area illustrates this. In 1996, András Kornai organized a very successful workshop entitled Extended Finite State Models of Language. One of the results of that workshop was a special issue of Natural Language Engineering (Volume 2, Number 4). In 1998, Kemal Oflazer organized a workshop called Finite State Methods in Natural Language Processing. A selection of submissions for this workshop were later included in a special issue of Computational Linguistics (Volume 26, Number 1). Inspired by these events, Lauri Karttunen, Kimmo Koskenniemi and Gertjan van Noord took the initiative for a workshop on finite state methods in NLP in Helsinki, as part of the European Summer School in Language, Logic and Information. As a related special event, the 20th anniversary of two-level morphology was celebrated. The appreciation of these events led us to believe that once again it should be possible, with some additional submissions, to compose an interesting special issue of this journal.
international conference on computational linguistics | 1988
Kimmo Koskenniemi; Kenneth Ward Church
Although Two-Level Morphology has been found in practice to be an extremely efficient method for processing Finnish words on very small machines, [Barton86] has recently shown the method to be NP-hard. This paper will discuss Barton's theoretical argument and explain why it has not been a problem for us in practice.
Natural Language Engineering | 1996
Kimmo Koskenniemi
A source of potential systematic errors in information retrieval is identified and discussed. These errors occur when base form reduction is applied with a (necessarily) finite dictionary. Formal methods for avoiding this error source are presented, along with some practical complexities met in their implementation.
international conference natural language processing | 2006
Anssi Yli-Jyrä; Kimmo Koskenniemi
New methods to compile morphophonological two-level rules into finite-state machines are presented. Compilation of the original and new two-level rules and grammars is formulated using an operation called the generalized restriction, which constructs a one-tape finite-state automaton over an input alphabet of symbol pairs. The generalized restriction is first used to compile the original two-level formalism, where the rules were restricted to single symbol pairs as their centers (i.e. the left-hand sides of the rules). The solution also handles strings of symbol pairs (or regular expressions over the pair alphabet) as centers of two-level rules. Then, the treatment of context conditions is generalized with unions, relative complements, etc. Moreover, an extended rule type, the presence requirement, combines the generalized context conditions with center conditions at both sides of the rules. The left-hand side specifies where the rule applies and the right-hand side specifies which of the applications are successful. The original two-level grammars were represented as a separate finite-state machine for each rule and the whole grammar as their intersection. The new methods are used first to redefine this setup, and then to implement a uniform conflict resolution scheme for all rules. The resolution scheme prefers successful and the longest embedded applications of rules, but it treats partially overlapping or explicitly independent applications of rules conjunctively. The composite rules of the original formalism have a marginal status in the new formalism because only identity pairs are allowed in locations where no rule is applicable.
international conference on implementation and application of automata | 2007
Anssi Yli-Jyrä; Kimmo Koskenniemi
Kempe and Karttunen [1] have presented a method that compiles a set of parallel conditional replacement (rewriting) rules into a finite-state transducer. Other, simpler methods exist for single rules or for rules of a restricted type, but they can be used only in restricted situations.
Journal of Language Modelling | 2013
Kimmo Koskenniemi
The paper shows how a certain kind of underlying representations (or deep forms) of words can be constructed in a straightforward manner through aligning the surface forms of the morphs of the word forms. The inventory of morphophonemes follows directly from this alignment. Furthermore, the two-level rules which govern the different realisations of such morphophonemes follow fairly directly from the previous steps. The alignment and rules are based upon an approximate general metric among phonemes, e.g., articulatory features, that determines which alternations are likely or possible. This enables us to summarise contexts for the different realisations.
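How morphophonemes can fall out of an alignment may be easier to see in a small Python sketch. It is a simplification under stated assumptions: the Finnish stem pair `kauppa` / `kaupa` is an illustrative example chosen here, the zero symbol `Ø` and the `{...}` notation are conventions of this sketch, and the cost function just counts differing columns, whereas the paper describes a metric based on phoneme similarity.

```python
from itertools import combinations

ZERO = "Ø"  # zero symbol inserted to make all forms equally long

def pad_to(reference, form):
    """Insert ZERO symbols into `form` so that it has the length of
    `reference` and differs from it in as few columns as possible."""
    gap = len(reference) - len(form)
    if gap < 0:
        raise ValueError("reference must be the longest form")
    best = None
    for zero_positions in combinations(range(len(reference)), gap):
        symbols = iter(form)
        candidate = [ZERO if i in zero_positions else next(symbols)
                     for i in range(len(reference))]
        cost = sum(a != b for a, b in zip(reference, candidate))
        if best is None or cost < best[0]:  # ties keep the first candidate
            best = (cost, candidate)
    return best[1]

def morphophonemes(forms):
    """Align all forms against the longest one and read off each column.

    A column with a single symbol stays a plain phoneme; a column with
    several symbols becomes a morphophoneme written as {xy...}.
    """
    reference = max(forms, key=len)
    aligned = [pad_to(reference, form) for form in forms]
    result = []
    for column in zip(*aligned):
        distinct = list(dict.fromkeys(column))  # first-seen order, no repeats
        if len(distinct) == 1:
            result.append(distinct[0])
        else:
            result.append("{" + "".join(distinct) + "}")
    return result

print(morphophonemes(["kauppa", "kaupa"]))
```

Each morphophoneme produced this way lists the alternating surface symbols; a two-level rule would then state in which contexts each of them is realised.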
international conference natural language processing | 2006
Jyrki Niemi; Lauri Carlson; Kimmo Koskenniemi
This paper presents two string-based finite-state approaches to modelling the semantics of natural-language calendar expressions: extended regular expressions (XREs) over a timeline string of unique symbols, and a string of hierarchical periods of time constructed by finite-state transducers (FSTs). The approaches cover expressions ranging from plain dates and times of the day to more complex ones, such as the second Tuesday following Easter. The paper outlines the representations of sample calendar expressions in the two models, presents a possible application in temporal reasoning, and informally compares the models.
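As a rough illustration of the string-based view, the following Python sketch encodes a stretch of days as a string and uses an ordinary regular expression to locate "the second Tuesday following Easter". It is not the paper's XRE or FST model: the one-symbol-per-weekday encoding, the function names, and the `horizon` parameter are assumptions of this sketch, and the date of Easter is supplied by the caller and not computed.

```python
import re
from datetime import date, timedelta

WEEKDAY_SYMBOLS = "MTWRFSU"  # Monday..Sunday, one symbol per day (assumed encoding)

def timeline(start, days):
    """Encode `days` consecutive days from `start` as one weekday symbol each."""
    return "".join(WEEKDAY_SYMBOLS[(start + timedelta(d)).weekday()]
                   for d in range(days))

def nth_weekday_after(anchor, symbol, n, horizon=60):
    """Return the n-th day carrying `symbol` strictly after `anchor`."""
    line = timeline(anchor + timedelta(1), horizon)
    # Match up to and including the n-th occurrence of the symbol.
    pattern = "(?:[^%s]*%s){%d}" % (symbol, symbol, n)
    match = re.match(pattern, line)
    if match is None:
        raise ValueError("no such day within the horizon")
    # Index 0 of `line` is the day after `anchor`, so the matched length
    # equals the offset of the found day from `anchor`.
    return anchor + timedelta(match.end())

easter_2006 = date(2006, 4, 16)  # Easter Sunday 2006, given as input
print(nth_weekday_after(easter_2006, "T", 2))
```

The regular expression plays the role of the calendar expression and the string plays the role of the timeline; richer expressions would need a richer symbol inventory than the seven weekday letters used here.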