Garrett Nicolai
University of Alberta
Publication
Featured research published by Garrett Nicolai.
North American Chapter of the Association for Computational Linguistics | 2015
Garrett Nicolai; Colin Cherry; Grzegorz Kondrak
We approach the task of morphological inflection generation as discriminative string transduction. Our supervised system learns to generate word-forms from lemmas accompanied by morphological tags, and refines them by referring to the other forms within a paradigm. Results of experiments on six diverse languages with varying amounts of training data demonstrate that our approach improves the state of the art in terms of predicting inflected word-forms.
Meeting of the Association for Computational Linguistics | 2015
Garrett Nicolai; B. D. Hauer; Mohammad Salameh; Adam St Arnaud; Ying Xu; Lei Yao; Grzegorz Kondrak
We report the results of our experiments in the context of the NEWS 2015 Shared Task on Transliteration. We focus on methods of combining multiple base systems, and leveraging transliterations from multiple languages. We show error reductions over the best base system of up to 10% when using supplemental transliterations, and up to 20% when using system combination. We also discuss the quality of the shared task datasets.
Meeting of the Association for Computational Linguistics | 2014
Garrett Nicolai; Grzegorz Kondrak
The relative frequencies of character bigrams appear to contain much information for predicting the first language (L1) of the writer of a text in another language (L2). Tsur and Rappoport (2007) interpret this fact as evidence that word choice is dictated by the phonology of L1. In order to test their hypothesis, we design an algorithm to identify the most discriminative words and the corresponding character bigrams, and perform two experiments to quantify their impact on the L1 identification task. The results strongly suggest an alternative explanation of the effectiveness of character bigrams in identifying the native language of a writer.
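As an illustration of the feature this abstract refers to, the following is a minimal sketch of computing relative character-bigram frequencies for a text. The function name `char_bigram_frequencies` and the normalization choices (lowercasing, collapsing whitespace) are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def char_bigram_frequencies(text: str) -> dict[str, float]:
    """Return the relative frequency of each character bigram in `text`.

    Hypothetical helper for illustration; the paper's own feature
    extraction may differ.
    """
    # Assumption: lowercase and collapse whitespace so that bigrams
    # are comparable across texts.
    normalized = " ".join(text.lower().split())
    # Slide a window of width 2 over the normalized string.
    bigrams = [normalized[i:i + 2] for i in range(len(normalized) - 1)]
    counts = Counter(bigrams)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Divide each count by the total number of bigrams.
    return {bigram: count / total for bigram, count in counts.items()}


freqs = char_bigram_frequencies("the cat")
```

A classifier for L1 identification would typically take such frequency vectors, computed per document, as input features.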
North American Chapter of the Association for Computational Linguistics | 2015
Garrett Nicolai; Colin Cherry; Grzegorz Kondrak
We replicate the syntactic experiments of Mikolov et al. (2013b) on English, and expand them to include morphologically complex languages. We learn vector representations for Dutch, French, German, and Spanish with the WORD2VEC tool, and investigate to what extent inflectional information is preserved across vectors. We observe that the accuracy of vectors on a set of syntactic analogies is inversely correlated with the morphological complexity of the language.
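A syntactic analogy test of this kind is commonly scored with the vector-offset method: for "a is to b as c is to ?", pick the word whose vector is closest to vec(b) - vec(a) + vec(c). The sketch below assumes that formulation with cosine similarity. The function names and the three-dimensional toy vectors are invented for illustration and are not output of the WORD2VEC tool.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(vectors: dict[str, list[float]], a: str, b: str, c: str) -> str:
    """Answer 'a is to b as c is to ?' by the vector-offset method."""
    # Target point: vec(b) - vec(a) + vec(c).
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # The three query words are excluded from the candidate answers.
    candidates = (word for word in vectors if word not in {a, b, c})
    return max(candidates, key=lambda word: cosine(vectors[word], target))


# Toy vectors (assumption): the second dimension stands in for "past tense".
toy_vectors = {
    "walk": [1.0, 0.0, 0.0],
    "walked": [1.0, 1.0, 0.0],
    "jump": [0.0, 0.0, 1.0],
    "jumped": [0.0, 1.0, 1.0],
    "table": [1.0, 0.0, 1.0],
}
answer = solve_analogy(toy_vectors, "walk", "walked", "jump")
```

Accuracy on an analogy set is then the fraction of questions for which the returned word matches the expected inflected form.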
Meeting of the Association for Computational Linguistics | 2016
Garrett Nicolai; Grzegorz Kondrak
We present several methods for stemming and lemmatization based on discriminative string transduction. We exploit the paradigmatic regularity of semi-structured inflection tables to identify stems in an unsupervised manner with over 85% accuracy. Experiments on English, Dutch and German show that our stemmers substantially outperform Snowball and Morfessor, and approach the accuracy of a supervised model. Furthermore, the generated stems are more consistent than those annotated by experts. Our direct lemmatization model is more accurate than Morfette and Lemming on most datasets. Finally, we test our methods on the data from the shared task on morphological reinflection.
2013 International Conference on Electrical Information and Communication Technology (EICT) | 2014
Garrett Nicolai; Asadul Islam; Russell Greiner
Native Language Identification (NLI) is the task of identifying the native language of an author of a text written in a second language. Support Vector Machines and Maximum Entropy Learners are the most common methods used to solve this problem, but we consider it from the point of view of probabilistic graphical models. We hypothesize that graphical models are well-suited to this task, as they can capture feature inter-dependencies that cannot be exploited by SVMs. Using progressively more connected graphical models, we show that these models outperform SVMs on reduced feature sets. Furthermore, on full feature sets, even naïve Bayes increases accuracy from 82.06% to 83.41% over SVMs on a 5-language classification task.
Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection | 2017
Garrett Nicolai; B. D. Hauer; Mohammad Motallebi; Saeed Najafi; Grzegorz Kondrak
We describe our approach and experiments in the context of the CoNLL-SIGMORPHON 2017 Shared Task on Universal Morphological Reinflection. We combine a discriminative transduction system with neural models. The results on five languages show that our approach works well in the low-resource setting. We also investigate adaptations designed to handle small training sets.
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology | 2016
Garrett Nicolai; B. D. Hauer; Adam St Arnaud; Grzegorz Kondrak
We describe our approach and experiments in the context of the SIGMORPHON 2016 Shared Task on Morphological Reinflection. The results show that the methods of Nicolai et al. (2015) perform well on typologically diverse languages. We also discuss language-specific heuristics and errors.
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology | 2016
Garrett Nicolai; Lei Yao; Grzegorz Kondrak
Syllabification is sometimes influenced by morphological boundaries. We show that incorporating morphological information can improve the accuracy of orthographic syllabification in English and German. Surprisingly, unsupervised segmenters, such as Morfessor, can be more useful for this purpose than the supervised ones.
North American Chapter of the Association for Computational Linguistics | 2015
Garrett Nicolai; Grzegorz Kondrak
In spite of the apparent irregularity of the English spelling system, Chomsky and Halle (1968) characterize it as “near optimal”. We investigate this assertion using computational techniques and resources. We design an algorithm to generate word spellings that maximize both phonemic transparency and morphological consistency. Experimental results demonstrate that the constructed system is much closer to optimality than the traditional English orthography.