Mateusz Kopeć | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Mateusz Kopeć is active.

Explore More

Publication

Featured researches published by Mateusz Kopeć.

Archive | 2013

Summarizing Short Texts Through a Discourse-Centered Approach in a Multilingual Context

Daniel Alexandru Anechitei; Dan Cristea; Ioannidis Dimosthenis; Eugen Ignat; Diman Karagiozov; Svetla Koeva; Mateusz Kopeć; Cristina Vertan

The chapter presents the architecture of a system targeting summaries of short texts in six languages. At the core of a summary, which comprises clauses and sentences extracted from the original text, is the structure of the discourse and its relationship with its coreferential links. The approach shows a uniform design for all languages, while language specificity is attributed to the resources that fuel the component modules. The design described here includes a number of feedback loops used to fine-tune the parameters by comparing the output of the modules against annotated corpora. “Average” summaries over some human-produced ones are used to evaluate the accuracy of each of the monolingual systems. The study also presents some quantitative data on the corpora used, showing a comparison among languages and results that, mostly, prove to be above the state of the art.

language and technology conference | 2013

Polish Coreference Corpus

Maciej Ogrodniczuk; Katarzyna Głowińska; Mateusz Kopeć; Agata Savary; Magdalena Zawisławska

The Polish Coreference Corpus (PCC) is a large corpus of Polish general nominal coreference built upon the National Corpus of Polish. With its 1900 documents from 14 text genres, containing about 540,000 tokens, 180,000 mentions and 128,000 coreference clusters, the PCC is among the largest coreference corpora in the international community. It has some novel features, such as the annotation of the quasi-identity relation, inspired by Recasens’ near-identity, as well as the mark-up of semantic heads and dominant expressions. It shows a good inter-annotator agreement and is distributed in three formats under an open license. Its by-products include freely available annotation tools with custom features such as file distribution management and annotation adjudication.

CCL | 2013

Interesting Linguistic Features in Coreference Annotation of an Inflectional Language

Maciej Ogrodniczuk; Katarzyna Głowińska; Mateusz Kopeć; Agata Savary; Magdalena Zawisławska

This paper reports on linguistic features and decisions that we find vital in the process of annotation and resolution of coreference for highly inflectional languages. The presented results have been collected during preparation of a corpus of general direct nominal coreference of Polish. Starting from the notion of a mention, its borders and potential vs. actual referentiality, we discuss the problem of complete and near-identity, zero subjects and dominant expressions. We also present interesting linguistic cases influencing the coreference resolution such as the difference between semantic and syntactic heads or the phenomenon of coreference chains made of indefinite pronouns.

international conference natural language processing | 2014

Detection of Nested Mentions for Coreference Resolution in Polish

Maciej Ogrodniczuk; Alicja Wójcicka; Katarzyna Głowińska; Mateusz Kopeć

This paper describes the results of creating a shallow grammar of Polish capable of detecting multi-level nested nominal phrases, intended to be used as mentions in coreference resolution tasks. The work is based on existing grammar developed for the National Corpus of Polish and evaluated on manually annotated Polish Coreference Corpus.

Proceedings of the Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature | 2017

Lexical Correction of Polish Twitter Political Data

Maciej Ogrodniczuk; Mateusz Kopeć

Language processing architectures are often evaluated in near-to-perfect conditions with respect to processed content. The tools which perform sufficiently well on electronic press, books and other type of non-interactive content may poorly handle littered, colloquial and multilingual textual data which make the majority of communication today. This paper aims at investigating how Polish Twitter data (in a slightly controlled ‘political’ flavour) differs from expectation of linguistic tools and how they could be corrected to be ready for processing by standard language processing chains available for Polish. The setting includes specialised components for spelling correction of tweets as well as hashtag and username decoding.

Archive | 2014

Inter-annotator Agreement in Coreference Annotation of Polish

Mateusz Kopeć; Maciej Ogrodniczuk

This paper discusses different methods of estimating the inter-annotator agreement in manual annotation of Polish coreference and proposes a new BLANC-based annotation agreement metric. The commonly used agreement indicators are calculated for mention detection, semantic head annotation, near-identity markup and coreference resolution.

language resources and evaluation | 2012