Maciej Ogrodniczuk | Researchain

Publication

Featured research published by Maciej Ogrodniczuk.

language and technology conference | 2013

Polish Coreference Corpus

Maciej Ogrodniczuk; Katarzyna Głowińska; Mateusz Kopeć; Agata Savary; Magdalena Zawisławska

The Polish Coreference Corpus (PCC) is a large corpus of Polish general nominal coreference built upon the National Corpus of Polish. With its 1900 documents from 14 text genres, containing about 540,000 tokens, 180,000 mentions and 128,000 coreference clusters, the PCC is among the largest coreference corpora in the international community. It has some novel features, such as the annotation of the quasi-identity relation, inspired by Recasens’ near-identity, as well as the mark-up of semantic heads and dominant expressions. It shows a good inter-annotator agreement and is distributed in three formats under an open license. Its by-products include freely available annotation tools with custom features such as file distribution management and annotation adjudication.

international conference on computational linguistics | 2013

Coreference annotation schema for an inflectional language

Maciej Ogrodniczuk; Magdalena Zawisławska; Katarzyna Głowińska; Agata Savary

Creating a coreference corpus for an inflectional and free-word-order language is a challenging task due to specific syntactic features largely ignored by existing annotation guidelines, such as the absence of definite/indefinite articles (making quasi-anaphoricity very common), frequent use of zero subjects or discrepancies between syntactic and semantic heads. This paper comments on the experience gained in preparing such a resource for an ongoing project (CORE), which aims at creating tools for coreference resolution. Starting with a clarification of the relation between noun groups and mentions, through the definition of the annotation scope and strategies, up to actual decisions for borderline cases, we present the process of building what is, to the best of our knowledge, the first corpus of general coreference of Polish.

applications of natural language to data bases | 2013

A Multi-purpose Online Toolset for NLP Applications

Maciej Ogrodniczuk; Michał Lenart

This paper presents a new implementation of the multi-purpose set of NLP tools for Polish, made available online in a common web service framework. The tool set comprises a morphological analyzer, a tagger, a named entity recognizer, a dependency parser, a constituency parser and a coreference resolver. Additionally, a web application offering chaining capabilities and a common BRAT-based presentation framework is presented.

CCL | 2013

Interesting Linguistic Features in Coreference Annotation of an Inflectional Language

Maciej Ogrodniczuk; Katarzyna Głowińska; Mateusz Kopeć; Agata Savary; Magdalena Zawisławska

This paper reports on linguistic features and decisions that we find vital in the process of annotation and resolution of coreference for highly inflectional languages. The presented results were collected during the preparation of a corpus of general direct nominal coreference of Polish. Starting from the notion of a mention, its borders and potential vs. actual referentiality, we discuss the problem of complete and near-identity, zero subjects and dominant expressions. We also present interesting linguistic cases influencing coreference resolution, such as the difference between semantic and syntactic heads or the phenomenon of coreference chains made of indefinite pronouns.

Studies in Polish Linguistics | 2016

The Use of Electronic Historical Dictionary Data in Corpus Design

Renata Bronikowska; Włodzimierz Gruszczyński; Maciej Ogrodniczuk; Marcin Woliński

The History of the 17th and 18th c. Polish Language Laboratory, Institute of Polish Language, Polish Academy of Sciences, is in the process of creating two large databases: The Electronic Dictionary of the 17th−18th c. Polish and The Electronic Corpus of the 17th and 18th c. Polish Texts (up to 1772), the latter in cooperation with the Institute of Computer Science, Polish Academy of Sciences. It is expected that combining these two sets of data will help to achieve the objectives established for both database projects. The present article shows the benefits that the Corpus creators can get from the data gathered in the dictionary, with special emphasis put on the use of grammatical information included in the dictionary entries to design tools for automatic text annotation in the Corpus.

international conference natural language processing | 2014

Detection of Nested Mentions for Coreference Resolution in Polish

Maciej Ogrodniczuk; Alicja Wójcicka; Katarzyna Głowińska; Mateusz Kopeć

This paper describes the results of creating a shallow grammar of Polish capable of detecting multi-level nested nominal phrases, intended to be used as mentions in coreference resolution tasks. The work is based on an existing grammar developed for the National Corpus of Polish and is evaluated on the manually annotated Polish Coreference Corpus.

international conference on mining intelligence and knowledge exploration | 2013

Discovery of Common Nominal Facts for Coreference Resolution: Proof of Concept

Maciej Ogrodniczuk

This paper reports on a preliminary experiment aimed at verifying whether nominal facts corresponding to world knowledge can be effectively extracted from both structured and unstructured data, and whether the results can be used as a source of pragmatic knowledge for coreference resolution in Polish. As a proof of concept only, this approach is work in progress and is intended to be further validated in a full-scale project.

intelligent information systems | 2013

Translation- and Projection-Based Unsupervised Coreference Resolution for Polish

Maciej Ogrodniczuk

Creating a coreference resolution tool for a new language is a challenging task due to the substantial effort required to develop the associated linguistic data, regardless of whether the approach is rule-based or statistical. In this paper, we test a translation- and projection-based method for an inflectional language, evaluate the result on a corpus of general coreference and compare the results with state-of-the-art solutions of this type for other languages.

language data and knowledge | 2017

Multi-pass Sieve Coreference Resolution System for Polish

Bartłomiej Nitoń; Maciej Ogrodniczuk

This paper examines the portability of Stanford’s multi-pass rule-based sieve coreference resolution system to an inflectional language (Polish) with a different annotation scheme. The presented system is implemented in BART, a modular toolkit later adapted to the sieve architecture by Baumann et al. The sieves for Polish include processing of zero subjects and an experimental knowledge-intensive sieve using a newly created database of periphrastic expressions. Evaluation shows that the results for Polish are higher than those seen on the CoNLL-2011/2012 data.
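
For readers unfamiliar with the sieve architecture named in this abstract, the general idea can be sketched as follows: rules ("sieves") are applied in successive passes, ordered from most to least precise, and each pass may merge the clusters left by the previous ones. This is only an illustrative sketch, not the system described in the paper or the BART API. The mention fields `text` and `head`, the two toy sieves and the function `resolve` are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List

# A mention is a plain dict with the hypothetical keys "text" and "head".
Mention = Dict[str, str]
# A sieve decides whether a mention may corefer with an earlier antecedent.
Sieve = Callable[[Mention, Mention], bool]


def exact_match(antecedent: Mention, mention: Mention) -> bool:
    """High-precision toy sieve: identical surface strings (case-insensitive)."""
    return antecedent["text"].lower() == mention["text"].lower()


def head_match(antecedent: Mention, mention: Mention) -> bool:
    """Lower-precision toy sieve: identical head words (case-insensitive)."""
    return antecedent["head"].lower() == mention["head"].lower()


def resolve(mentions: List[Mention], sieves: List[Sieve]) -> List[int]:
    """Run the sieves in order and return one cluster id per mention."""
    # Every mention starts in its own singleton cluster.
    cluster = list(range(len(mentions)))
    for sieve in sieves:  # one pass over the document per sieve
        for i in range(len(mentions)):
            for j in range(i):  # candidate antecedents precede the mention
                if cluster[i] != cluster[j] and sieve(mentions[j], mentions[i]):
                    # Merge the whole cluster of mention i into that of j.
                    old, new = cluster[i], cluster[j]
                    cluster = [new if c == old else c for c in cluster]
                    break
    return cluster


mentions = [
    {"text": "Jan Kowalski", "head": "Kowalski"},
    {"text": "prezydent", "head": "prezydent"},
    {"text": "Kowalski", "head": "Kowalski"},
    {"text": "Jan Kowalski", "head": "Kowalski"},
]
clusters = resolve(mentions, [exact_match, head_match])
print(clusters)
```

Because later passes see the clusters built by earlier ones, adding a language-specific rule (for instance, one handling zero subjects) amounts to inserting another function at the appropriate position in the list.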

Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature | 2017

Lexical Correction of Polish Twitter Political Data

Maciej Ogrodniczuk; Mateusz Kopeć

Language processing architectures are often evaluated in near-perfect conditions with respect to the processed content. Tools that perform sufficiently well on electronic press, books and other types of non-interactive content may poorly handle the littered, colloquial and multilingual textual data that make up the majority of communication today. This paper investigates how Polish Twitter data (in a slightly controlled ‘political’ flavour) differ from what linguistic tools expect and how the data could be corrected to make them ready for processing by the standard language processing chains available for Polish. The setting includes specialised components for spelling correction of tweets as well as hashtag and username decoding.
