Adam Przepiórkowski
Polish Academy of Sciences
Publication
Featured research published by Adam Przepiórkowski.
conference of the european chapter of the association for computational linguistics | 2003
Adam Przepiórkowski; Marcin Woliński
The article notes certain weaknesses of current efforts aiming at the standardization of POS tagsets for morphologically rich languages and argues that, in order to achieve clear mappings between tagsets, it is necessary to have clear and formal rules of delimiting POSs and grammatical categories within any given tagset. An attempt at constructing such a tagset for Polish is presented.
meeting of the association for computational linguistics | 2007
Adam Przepiórkowski; Łukasz Degórski; Miroslav Spousta; Kiril Simov; Petya Osenova; Lothar Lemnitzer; Vladislav Kuboň; Beata Wójtowicz
This paper presents the results of preliminary experiments in the automatic extraction of definitions (for semi-automatic glossary construction) from usually unstructured or only weakly structured e-learning texts in Bulgarian, Czech and Polish. The extraction is performed by regular grammars over XML-encoded, morphosyntactically annotated documents. The results are less than satisfactory, and we claim that the reason is the intrinsic difficulty of the task, as measured by the low interannotator agreement, which calls for more sophisticated, deeper linguistic processing as well as for the use of machine learning classification techniques.
intelligent information systems | 2004
Jakub Piskorski; Peter Homola; Małgorzata Marciniak; Agnieszka Mykowiecka; Adam Przepiórkowski; Marcin Woliński
The aim of this article is to present the initial results of adapting SProUT, a multi-lingual Natural Language Processing platform developed at DFKI, Germany, to the processing of Polish. The article describes some of the problems posed by the integration of Morfeusz, an external morphological analyzer for Polish, and various solutions to the problem of the lack of extensive gazetteers for Polish. The main sections of the article report on some initial experiments in applying this adapted system to the Information Extraction task of identifying various classes of Named Entities in financial and medical texts, perhaps the first such Information Extraction effort for Polish.
meeting of the association for computational linguistics | 2007
Daniel Janus; Adam Przepiórkowski
This paper presents recent extensions to Poliqarp, an open source tool for indexing and searching morphosyntactically annotated corpora, which turn it into a tool for indexing and searching certain kinds of treebanks, complementary to existing treebank search engines. In particular, the paper discusses the motivation for such a new tool, the extended query syntax of Poliqarp and implementation and efficiency issues.
meeting of the association for computational linguistics | 2007
Adam Przepiórkowski
Information Extraction (IE) often involves some amount of partial syntactic processing. This is clear in the case of interesting high-level IE tasks, such as finding information about who did what to whom (when, where, how and why), but it is also true in the case of simpler IE tasks, such as finding company names in texts. The aim of this paper is to give an overview of Slavonic phenomena which pose particular problems for IE and partial parsing, and of some phenomena which seem easier to treat in Slavonic than in Germanic or Romance; I also mention various tools which have been used for the partial processing of Slavonic.
international conference natural language processing | 2008
Łukasz Kobyliński; Adam Przepiórkowski
We propose a novel machine learning approach to the task of identifying definitions in Polish documents. Specifics of the problem domain and characteristics of the available dataset have been taken into consideration by carefully choosing and adapting a classification method to highly imbalanced and noisy data. We evaluate the performance of a Random Forest-based classifier in extracting definitional sentences from natural language text and give a comparison with previous work.
linguistic annotation workshop | 2009
Piotr Bański; Adam Przepiórkowski
We present the annotation architecture of the National Corpus of Polish and discuss problems identified in the TEI stand-off annotation system, which, in its current version, is still very much unfinished and untested, due to both technical reasons (lack of tools implementing the TEI-defined XPointer schemes) and certain problems concerning data representation. We concentrate on two features that a stand-off system should possess and that are conspicuously missing in the current TEI Guidelines.
language and technology conference | 2009
Adam Przepiórkowski; Piotr Bański
The paper attempts to answer the question: Which XML standard(s) should be used for multilevel corpus annotation? Various more or less specific standards and best practices are reviewed: TEI P5, XCES, work within ISO TC37 / SC4, TIGER-XML and PAULA. The conclusion of the paper is that the approach with the best claim to following text encoding standards consists in 1) using TEI-conformant schemata that are 2) designed in a way compatible with other standards and data models.
language and technology conference | 2009
Aleksander Buczyński; Adam Przepiórkowski
This article presents a formalism and a beta version of a new tool for simultaneous morphosyntactic disambiguation and shallow parsing. Unlike in the case of other shallow parsing formalisms, the rules of the grammar allow for explicit morphosyntactic disambiguation statements, independently of structure-building statements, which facilitates the task of the shallow parsing of morphosyntactically ambiguous or erroneously disambiguated input.
language and technology conference | 2009
Rafał Młodzki; Adam Przepiórkowski
In this paper we present the Word Sense Disambiguation Development Environment (WSDDE), a platform for testing various Word Sense Disambiguation (WSD) technologies, as well as the results of first experiments in applying the platform to WSD in Polish. The current development version of the environment facilitates the construction and evaluation of WSD methods in the supervised Machine Learning (ML) paradigm using various knowledge sources. Experiments were conducted on a small manually sense-tagged corpus of 13 Polish words. The usual groups of features were implemented, including bag-of-words, parts-of-speech, words with their positions, etc. (with different settings), in connection with popular ML algorithms (including Naive Bayes, Decision Trees and Support Vector Machines). The aim was to test to what extent standard approaches to the English WSD task may be adapted to free word order and rich inflection languages such as Polish. In accordance with earlier results in the literature, the initial experiments suggest that these standard approaches are relatively well-suited for Polish. On the other hand, contrary to earlier findings, the experiments also show that adding some features beyond bag-of-words increases the average accuracy of the results.