Adam Przepiórkowski | Researchain

Publication

Featured research published by Adam Przepiórkowski.

Conference of the European Chapter of the Association for Computational Linguistics | 2003

A Flexemic Tagset for Polish

Adam Przepiórkowski; Marcin Woliński

The article notes certain weaknesses of current efforts aiming at the standardization of POS tagsets for morphologically rich languages and argues that, in order to achieve clear mappings between tagsets, it is necessary to have clear and formal rules of delimiting POSs and grammatical categories within any given tagset. An attempt at constructing such a tagset for Polish is presented.

Meeting of the Association for Computational Linguistics | 2007

Towards the Automatic Extraction of Definitions in Slavic

Adam Przepiórkowski; Łukasz Degórski; Miroslav Spousta; Kiril Simov; Petya Osenova; Lothar Lemnitzer; Vladislav Kuboň; Beata Wójtowicz

This paper presents the results of the preliminary experiments in the automatic extraction of definitions (for semi-automatic glossary construction) from usually unstructured or only weakly structured e-learning texts in Bulgarian, Czech and Polish. The extraction is performed by regular grammars over XML-encoded morphosyntactically-annotated documents. The results are less than satisfying and we claim that the reason for that is the intrinsic difficulty of the task, as measured by the low interannotator agreement, which calls for more sophisticated deeper linguistic processing, as well as for the use of machine learning classification techniques.

Intelligent Information Systems | 2004

Information Extraction for Polish Using the SProUT Platform

Jakub Piskorski; Peter Homola; Małgorzata Marciniak; Agnieszka Mykowiecka; Adam Przepiórkowski; Marcin Woliński

The aim of this article is to present the initial results of adapting SProUT, a multi-lingual Natural Language Processing platform developed at DFKI, Germany, to the processing of Polish. The article describes some of the problems posed by the integration of Morfeusz, an external morphological analyzer for Polish, and various solutions to the problem of the lack of extensive gazetteers for Polish. The main sections of the article report on some initial experiments in applying this adapted system to the Information Extraction task of identifying various classes of Named Entities in financial and medical texts, perhaps the first such Information Extraction effort for Polish.

Meeting of the Association for Computational Linguistics | 2007

Poliqarp: An open source corpus indexer and search engine with syntactic extensions

Daniel Janus; Adam Przepiórkowski

This paper presents recent extensions to Poliqarp, an open source tool for indexing and searching morphosyntactically annotated corpora, which turn it into a tool for indexing and searching certain kinds of treebanks, complementary to existing treebank search engines. In particular, the paper discusses the motivation for such a new tool, the extended query syntax of Poliqarp and implementation and efficiency issues.

Meeting of the Association for Computational Linguistics | 2007

Slavic Information Extraction and Partial Parsing

Adam Przepiórkowski

Information Extraction (IE) often involves some amount of partial syntactic processing. This is clear in the case of interesting high-level IE tasks, such as finding information about who did what to whom (when, where, how and why), but it is also true in the case of simpler IE tasks, such as finding company names in texts. The aim of this paper is to give an overview of Slavonic phenomena which pose particular problems for IE and partial parsing, and some phenomena which seem easier to treat in Slavonic than in Germanic or Romance; I also mention various tools which have been used for the partial processing of Slavonic.

International Conference Natural Language Processing | 2008

Definition Extraction with Balanced Random Forests

Łukasz Kobyliński; Adam Przepiórkowski

We propose a novel machine learning approach to the task of identifying definitions in Polish documents. Specifics of the problem domain and characteristics of the available dataset have been taken into consideration, by carefully choosing and adapting a classification method to highly imbalanced and noisy data. We evaluate the performance of a Random Forest-based classifier in extracting definitional sentences from natural language text and give a comparison with previous work.

Linguistic Annotation Workshop | 2009

Stand-off TEI Annotation: the Case of the National Corpus of Polish

Piotr Bański; Adam Przepiórkowski

We present the annotation architecture of the National Corpus of Polish and discuss problems identified in the TEI stand-off annotation system, which, in its current version, is still very much unfinished and untested, due to both technical reasons (lack of tools implementing the TEI-defined XPointer schemes) and certain problems concerning data representation. We concentrate on two features that a stand-off system should possess and that are conspicuously missing in the current TEI Guidelines.

Language and Technology Conference | 2009

Which XML standards for multilevel corpus annotation

Adam Przepiórkowski; Piotr Bański

The paper attempts to answer the question: Which XML standard(s) should be used for multilevel corpus annotation? Various more or less specific standards and best practices are reviewed: TEI P5, XCES, work within ISO TC37 / SC4, TIGER-XML and PAULA. The conclusion of the paper is that the approach with the best claim to following text encoding standards consists in 1) using TEI-conformant schemata that are 2) designed in a way compatible with other standards and data models.

Language and Technology Conference | 2009

Spejd: A Shallow Processing and Morphological Disambiguation Tool

Aleksander Buczyński; Adam Przepiórkowski

This article presents a formalism and a beta version of a new tool for simultaneous morphosyntactic disambiguation and shallow parsing. Unlike in the case of other shallow parsing formalisms, the rules of the grammar allow for explicit morphosyntactic disambiguation statements, independently of structure-building statements, which facilitates the task of the shallow parsing of morphosyntactically ambiguous or erroneously disambiguated input.

Language and Technology Conference | 2009

The WSD development environment

Rafał Młodzki; Adam Przepiórkowski

In this paper we present the Word Sense Disambiguation Development Environment (WSDDE), a platform for testing various Word Sense Disambiguation (WSD) technologies, as well as the results of first experiments in applying the platform to WSD in Polish. The current development version of the environment facilitates the construction and evaluation of WSD methods in the supervised Machine Learning (ML) paradigm using various knowledge sources. Experiments were conducted on a small manually sense-tagged corpus of 13 Polish words. The usual groups of features were implemented, including bag-of-words, parts-of-speech, words with their positions, etc. (with different settings), in connection with popular ML algorithms (including Naive Bayes, Decision Trees and Support Vector Machines). The aim was to test to what extent standard approaches to the English WSD task may be adapted to free word order and rich inflection languages such as Polish. In accordance with earlier results in the literature, the initial experiments suggest that these standard approaches are relatively well-suited for Polish. On the other hand, contrary to earlier findings, the experiments also show that adding some features beyond bag-of-words increases the average accuracy of the results.
