Publication


Featured research published by Roman Grundkiewicz.


Conference on Computational Natural Language Learning (CoNLL) | 2014

The AMU System in the CoNLL-2014 Shared Task: Grammatical Error Correction by Data-Intensive and Feature-Rich Statistical Machine Translation

Marcin Junczys-Dowmunt; Roman Grundkiewicz

Statistical machine translation toolkits like Moses have not been designed with grammatical error correction in mind. In order to achieve competitive results in this area, it is not enough to simply add more data: optimization procedures need to be customized, and task-specific features should be introduced. Only then can the decoder take advantage of relevant data. We demonstrate the validity of these claims by combining web-scale language models and large-scale error-corrected texts with parameter tuning according to the task metric and correction-specific features. Our system achieves a result of 35.0% F0.5 on the blind CoNLL-2014 test set, ranking in third place. A similar system, equipped with identical models but without tuned parameters and specialized features, stagnates at 25.4%.
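The task metric mentioned above, F0.5, weights precision twice as heavily as recall, which suits error correction: an unnecessary edit is worse than a missed one. A minimal sketch of the general F-beta formula (illustrative only, not the shared task's official scorer):

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Weighted harmonic mean of precision and recall.

    beta < 1 favours precision; beta = 0.5 gives the F0.5 used in GEC.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# High precision with modest recall still yields a respectable F0.5.
print(round(f_beta(0.5, 0.25), 4))  # 0.4167
```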


arXiv: Computation and Language | 2016

Log-linear Combinations of Monolingual and Bilingual Neural Machine Translation Models for Automatic Post-Editing

Marcin Junczys-Dowmunt; Roman Grundkiewicz

This paper describes the submission of the AMU (Adam Mickiewicz University) team to the Automatic Post-Editing (APE) task of WMT 2016. We explore the application of neural translation models to the APE problem and achieve good results by treating different models as components in a log-linear model, allowing for multiple inputs (the MT output and the source) that are decoded to the same target language (post-edited translations). A simple string-matching penalty integrated within the log-linear model is used to control for higher faithfulness with regard to the raw machine translation output. To overcome the problem of too little training data, we generate large amounts of artificial data. Our submission improves over the uncorrected baseline on the unseen test set by -3.2% TER and +5.5% BLEU and outperforms every other system submitted to the shared task by a large margin.
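A log-linear combination scores each candidate translation as a weighted sum of component log-probabilities, with the weights tuned on held-out data. A toy sketch of the idea (the component names, scores, and weights here are made up for illustration, not the paper's actual models):

```python
def loglinear_score(component_logprobs: dict, weights: dict) -> float:
    """Weighted sum of component log-probabilities for one candidate."""
    return sum(weights[name] * lp for name, lp in component_logprobs.items())

# Hypothetical components: an mt->pe model, a src->pe model, and a
# string-matching penalty discouraging drift from the raw MT output.
weights = {"mt2pe": 1.0, "src2pe": 0.5, "penalty": -2.0}
candidates = {
    "keeps MT wording": {"mt2pe": -2.0, "src2pe": -3.5, "penalty": 0.1},
    "rewrites freely":  {"mt2pe": -1.5, "src2pe": -3.0, "penalty": 0.9},
}
best = max(candidates, key=lambda c: loglinear_score(candidates[c], weights))
print(best)  # keeps MT wording
```

The negative weight on the penalty component is what steers the decoder toward faithfulness to the MT output.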


Empirical Methods in Natural Language Processing (EMNLP) | 2016

Phrase-based Machine Translation is State-of-the-Art for Automatic Grammatical Error Correction

Marcin Junczys-Dowmunt; Roman Grundkiewicz

In this work, we study parameter tuning towards the M^2 metric, the standard metric for automatic grammatical error correction (GEC) tasks. After implementing M^2 as a scorer in the Moses tuning framework, we investigate interactions of dense and sparse features, different optimizers, and tuning strategies for the CoNLL-2014 shared task. We notice erratic behavior when optimizing sparse feature weights with M^2 and offer partial solutions. We find that a bare-bones phrase-based SMT setup with task-specific parameter tuning outperforms all previously published results for the CoNLL-2014 test set by a large margin (46.37% M^2 over the previous best of 41.75%, achieved by an SMT system with neural features) while being trained on the same, publicly available data. Our newly introduced dense and sparse features widen that gap, and we improve the state of the art to 49.49% M^2.
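M^2 computes F0.5 over system edits matched against gold-standard annotations; the real scorer additionally maximises the match over alternative gold annotations via an edit lattice. A simplified sketch that treats edits as exact (start, end, replacement) tuples and ignores alternative annotations:

```python
def edit_f05(hyp_edits: set, gold_edits: set) -> float:
    """F0.5 over exact edit matches; a simplification of the M^2 scorer."""
    tp = len(hyp_edits & gold_edits)
    if tp == 0:
        return 0.0
    p = tp / len(hyp_edits)
    r = tp / len(gold_edits)
    return 1.25 * p * r / (0.25 * p + r)

gold = {(1, 2, "went"), (4, 5, "the")}   # two gold edits
hyp = {(1, 2, "went"), (7, 8, "a")}      # one match, one spurious edit
print(round(edit_f05(hyp, gold), 3))     # 0.5
```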


Empirical Methods in Natural Language Processing (EMNLP) | 2015

Human Evaluation of Grammatical Error Correction Systems

Roman Grundkiewicz; Marcin Junczys-Dowmunt; Edward Gillian

The paper presents the results of the first large-scale human evaluation of automatic grammatical error correction (GEC) systems. Twelve participating systems and the unchanged input of the CoNLL-2014 shared task have been reassessed in a WMT-inspired human evaluation procedure. Methods introduced for the Workshop on Machine Translation evaluation campaigns have been adapted to GEC and extended where necessary. The produced rankings are used to evaluate standard metrics for grammatical error correction in terms of correlation with human judgment.
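WMT-style human evaluation collects pairwise judgments between system outputs and derives a ranking from an aggregate such as each system's share of wins in the comparisons it took part in. A minimal sketch of that idea (the judgments below are made up, and the campaigns' actual scoring is more sophisticated):

```python
from collections import defaultdict

def rank_by_wins(judgments):
    """Rank systems by wins / comparisons from (winner, loser) pairs (ties excluded)."""
    wins, total = defaultdict(int), defaultdict(int)
    for winner, loser in judgments:
        wins[winner] += 1
        total[winner] += 1
        total[loser] += 1
    score = {s: wins[s] / total[s] for s in total}
    return sorted(score, key=score.get, reverse=True)

judgments = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B"), ("A", "B")]
print(rank_by_wins(judgments))  # ['A', 'C', 'B']
```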


International Conference on Natural Language Processing | 2014

The WikEd Error Corpus: A Corpus of Corrective Wikipedia Edits and Its Application to Grammatical Error Correction

Roman Grundkiewicz; Marcin Junczys-Dowmunt

This paper introduces the freely available WikEd Error Corpus. We describe the data mining process from Wikipedia revision histories, the corpus content, and its format. The corpus consists of more than 12 million sentences with a total of 14 million edits of various types. As one possible application, we show that WikEd can be successfully adapted to improve a strong baseline in a task of grammatical error correction for English-as-a-Second-Language (ESL) learners’ writings by 2.63%. Used together with an ESL error corpus, a combined system gains 1.64% over the ESL-trained system.
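Mining corrective edits from revision histories boils down to aligning an old and a new version of a sentence and keeping the spans that differ. A sketch with Python's difflib as a stand-in for the paper's actual extraction pipeline:

```python
import difflib

def extract_edits(old: str, new: str):
    """Return (old_span, new_span) pairs where two revisions of a sentence differ."""
    old_toks, new_toks = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_toks, b=new_toks)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # 'replace', 'insert', or 'delete'
            edits.append((" ".join(old_toks[i1:i2]), " ".join(new_toks[j1:j2])))
    return edits

print(extract_edits("He go to school yesterday", "He went to school yesterday"))
# [('go', 'went')]
```

A real pipeline would additionally filter out vandalism and non-corrective edits, which is a large part of the work the paper describes.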


Text, Speech and Dialogue | 2013

Automatic Extraction of Polish Language Errors from Text Edition History

Roman Grundkiewicz

There are no large error corpora for a number of languages, despite the fact that they have multiple applications in natural language processing. The main reason underlying this situation is the high cost of manual corpora creation. In this paper we present methods for the automatic extraction of various kinds of errors, such as spelling, typographical, grammatical, syntactic, semantic, and stylistic ones, from text edition histories. By applying these methods to Wikipedia’s article revision history, we created a large, publicly available corpus of naturally occurring language errors for Polish, called PlEWi. Finally, we analyse and evaluate the detected error categories in our corpus.


Language and Technology Conference | 2015

Reinvestigating the Classification Approach to the Article and Preposition Error Correction

Roman Grundkiewicz; Marcin Junczys-Dowmunt

In this work, we reinvestigate the classifier-based approach to article and preposition error correction, going beyond linguistically motivated factors. We show that state-of-the-art results can be achieved without relying on a plethora of heuristic rules, complex feature engineering, and advanced NLP tools. A proposed method for detecting candidate positions for article insertion is even more efficient than methods that use a parser. We examine automatically trained word classes acquired by unsupervised learning as a substitute for commonly used part-of-speech tags. Our best models significantly outperform the top systems from the CoNLL-2014 Shared Task in terms of article and preposition error correction.
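A classifier-based corrector predicts, for each candidate position, the most likely item from a confusion set (e.g. a/an/the/no article) given contextual features. A toy maximum-likelihood sketch over a single feature, the following word, far simpler than the paper's models:

```python
from collections import Counter, defaultdict

# Hypothetical confusion set; "" stands for "no article".
CONFUSION_SET = ("a", "an", "the", "")

def train(examples):
    """Count observed article choices per context feature (here: the next word)."""
    counts = defaultdict(Counter)
    for next_word, article in examples:
        counts[next_word][article] += 1
    return counts

def predict(counts, next_word, default="the"):
    """Pick the most frequent article seen with this context, or a default."""
    c = counts.get(next_word)
    return c.most_common(1)[0][0] if c else default

model = train([("apple", "an"), ("apple", "an"), ("car", "a"),
               ("sun", "the"), ("water", "")])
print(predict(model, "apple"))  # an
```

Real systems replace the single lookup feature with many contextual features (including, per the paper, unsupervised word classes) and a trained discriminative classifier.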


Language and Technology Conference | 2013

An Example of a Compatible NLP Toolkit

Krzysztof Jassem; Roman Grundkiewicz

The paper describes an open-source set of linguistic tools whose distinctive features are customisability and compatibility with other NLP toolkits: texts in various natural languages and character encodings may be read from a number of popular data formats; all annotation tools may be run with several options to differentiate the format of input and output; rule lists used by individual tools may be supplemented or replaced by the user; and external tools (including NLP tools designed in independent research centres) may be incorporated into the toolkit’s environment.


Meeting of the Association for Computational Linguistics (ACL) | 2018

Marian: Fast Neural Machine Translation in C++

Marcin Junczys-Dowmunt; Roman Grundkiewicz


North American Chapter of the Association for Computational Linguistics (NAACL) | 2018

Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task

Marcin Junczys-Dowmunt; Roman Grundkiewicz; Shubha Guha; Kenneth Heafield

Collaboration


Dive into Roman Grundkiewicz's collaborations.

Top Co-Authors

Marcin Junczys-Dowmunt
Adam Mickiewicz University in Poznań

Kenneth Heafield
Carnegie Mellon University

Krzysztof Jassem
Adam Mickiewicz University in Poznań

Hieu Hoang
University of Edinburgh