Mans Hulden
University of Arizona
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Mans Hulden.
conference of the european chapter of the association for computational linguistics | 2014
Mans Hulden; Markus Forsberg; Malin Ahlberg
We present a semi-supervised approach to the problem of paradigm induction from inflection tables. Our system extracts generalizations from inflection tables, representing the resulting paradigms in an abstract form. The process is intended to be language-independent, and to provide human-readable generalizations of paradigms. The tools we provide can be used by linguists for the rapid creation of lexical resources. We evaluate the system through an inflection table reconstruction task using Wiktionary data for German, Spanish, and Finnish. With no additional corpus information available, the evaluation yields per word form accuracy scores on inflecting unseen base forms in different languages ranging from 87.81% (German nouns) to 99.52% (Spanish verbs); with additional unlabeled text corpora available for training the scores range from 91.81% (German nouns) to 99.58% (Spanish verbs). We separately evaluate the system in a simulated task of Swedish lexicon creation, and show that on the basis of a small number of inflection tables, the system can accurately collect from a list of noun forms a lexicon with inflection information ranging from 100.0% correct (collect 100 words), to 96.4% correct (collect 1000 words).
north american chapter of the association for computational linguistics | 2015
Malin Ahlberg; Markus Forsberg; Mans Hulden
Supervised morphological paradigm learning by identifying and aligning the longest common subsequence found in inflection tables has recently been proposed as a simple yet competitive way to induce morphological patterns. We combine this non-probabilistic strategy of inflection table generalization with a discriminative classifier to permit the reconstruction of complete inflection tables of unseen words. Our system learns morphological paradigms from labeled examples of inflection patterns (inflection tables) and then produces inflection tables from unseen lemmas or base forms. We evaluate the approach on datasets covering 11 different languages and show that this approach results in consistently higher accuracies vis-` other methods on the same task, thus indicating that the general method is a viable approach to quickly creating highaccuracy morphological resources.
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in#N# Phonetics, Phonology, and Morphology | 2016
Ryan Cotterell; Christo Kirov; John Sylak-Glassman; David Yarowsky; Jason Eisner; Mans Hulden
The 2016 SIGMORPHON Shared Task was devoted to the problem of morphological reinflection. It introduced morphological datasets for 10 languages with diverse typological characteristics. The shared task drew submissions from 9 teams representing 11 institutions reflecting a variety of approaches to addressing supervised learning of reinflection. For the simplest task, inflection generation from lemmas, the best system averaged 95.56% exact-match accuracy across all languages, ranging from Maltese (88.99%) to Hungarian (99.30%). With the relatively large training datasets provided, recurrent neural network architectures consistently performed best—in fact, there was a significant margin between neural and non-neural approaches. The best neural approach, averaged over all tasks and languages, outperformed the best nonneural one by 13.76% absolute; on individual tasks and languages the gap in accuracy sometimes exceeded 60%. Overall, the results show a strong state of the art, and serve as encouragement for future shared tasks that explore morphological analysis and generation with varying degrees of supervision.
Journal of Language Modelling | 2016
Manex Agirrezabal; Aitzol Astigarraga; Bertol Arrieta; Mans Hulden
We present a finite state technology based system capable of performing metrical scansion of verse written in English. Scansion is the traditional task of analyzing the lines of a poem, marking the stressed and non-stressed elements, and dividing the line into metrical feet. The system’s workflow is composed of several subtasks designed around finite state machines that analyze verse by performing tokenization, part of speech tagging, stress placement, and unknown word stress pattern guessing. The scanner also classifies its input according to the predominant type of metrical foot found. We also present a brief evaluation of the system using a gold standard corpus of human-scanned verse, on which a per-syllable accuracy of 86.78% is reached. The program uses open-source components and is released under the GNU GPL license.
finite state methods and natural language processing | 2005
Mans Hulden
We explore general strategies for finite-state syllabification and describe a specific implementation of a wide-coverage syllabifier for English, as well as outline methods to implement differing ideas encountered in the phonological literature about the English syllable. The syllable is a central phonological unit to which many allophonic variations are sensitive. How a word is syllabified is a non-trivial problem and reliable methods are useful in computational systems that deal with non-orthographic representations of language, for instance phonological research, text-to-speech systems, and speech recognition. The construction strategies for producing syllabifying transducers outlined here are not theory-specific and should be applicable to generalizations made within most phonological frameworks.
finite state methods and natural language processing | 2009
Iñaki Alegria; Izaskun Etxeberria; Mans Hulden; Montserrat Maritxalar
Basque is a morphologically rich language, of which several finite-state morphological descriptions have been constructed, primarily using the Xerox/PARC finite-state tools. In this paper we describe the process of porting a previous description of Basque morphology to foma, an open-source finite-state toolkit compatible with Xerox tools, provide a comparison of the two tools, and contrast the development of a twolevel grammar with parallel alternation rules and a sequential grammar developed by composing individual replacement rules.
conference of the european chapter of the association for computational linguistics | 2009
Mans Hulden
Various methods have been devised to produce morphological analyzers and generators for Semitic languages, ranging from methods based on widely used finite-state technologies to very specific solutions designed for a specific language or problem. Since the earliest proposals of how to adopt the elsewhere successful finite-state methods to root-and-pattern morphologies, the solution of encoding Semitic grammars using multi-tape automata has resurfaced on a regular basis. Multi-tape automata, however, require specific algorithms and reimplementation of finite-state operators across the board, and hence such technology has not been readily available to linguists. This paper, using an actual Arabic grammar as a case study, describes an approach to encoding multi-tape automata on a single tape that can be implemented using any standard finite-automaton toolkit.
Proceedings of the 2014 Joint Meeting of SIGMORPHON and SIGFSM | 2014
Mans Hulden
Extracting and performing an alignment of the longest common subsequence in inflection tables has been shown to be a fruitful approach to supervised learning of morphological paradigms. However, finding the longest subsequence common to multiple strings is well known to be an intractable problem. Additional constraints on the solution sought complicate the problem further—such as requiring that the particular subsequence extracted, if there is ambiguity, be one that is best alignable in an inflection table. In this paper we present and discuss the design of a tool that performs the extraction through some advanced techniques in finite state calculus and does so efficiently enough for the practical purposes of inflection table generalization.
language and technology conference | 2009
Mans Hulden
We present a parsing algorithm for arbitrary context-free and probabilistic context-free grammars based on a representation of such grammars as a combination of a regular grammar and a grammar of balanced parentheses, similar to the representation used in the Chomsky-Schutzenberger theorem. The basic algorithm has the same worst-case complexity as the popular CKY and Earley parsing algorithms frequently employed in natural language processing tasks.
conference on computational natural language learning | 2017
Mans Hulden
This paper explores a divisive hierarchical clustering algorithm based on the well-known Obligatory Contour Principle in phonology. The purpose is twofold: to see if such an algorithm could be used for unsupervised classification of phonemes or graphemes in corpora, and to investigate whether this purported universal constraint really holds for several classes of phonological distinctive features. The algorithm achieves very high accuracies in an unsupervised setting of inferring a consonant-vowel distinction, and also has a strong tendency to detect coronal phonemes in an unsupervised fashion. Remaining classes, however, do not correspond as neatly to phonological distinctive feature splits. While the results offer only mixed support for a universal Obligatory Contour Principle, the algorithm can be very useful for many NLP tasks due to the high accuracy in revealing consonant/vowel/coronal distinctions.