Publication


Featured research published by Marcin Woliński.


Conference of the European Chapter of the Association for Computational Linguistics | 2003

A Flexemic Tagset for Polish

Adam Przepiórkowski; Marcin Woliński

The article notes certain weaknesses of current efforts aimed at the standardization of POS tagsets for morphologically rich languages and argues that, in order to achieve clear mappings between tagsets, it is necessary to have clear and formal rules for delimiting POSs and grammatical categories within any given tagset. An attempt at constructing such a tagset for Polish is presented.


Intelligent Information Systems | 2004

Information Extraction for Polish Using the SProUT Platform

Jakub Piskorski; Peter Homola; Małgorzata Marciniak; Agnieszka Mykowiecka; Adam Przepiórkowski; Marcin Woliński

The aim of this article is to present the initial results of adapting SProUT, a multi-lingual Natural Language Processing platform developed at DFKI, Germany, to the processing of Polish. The article describes some of the problems posed by the integration of Morfeusz, an external morphological analyzer for Polish, and various solutions to the problem of the lack of extensive gazetteers for Polish. The main sections of the article report on some initial experiments in applying this adapted system to the Information Extraction task of identifying various classes of Named Entities in financial and medical texts, perhaps the first such Information Extraction effort for Polish.


Text, Speech and Dialogue | 2010

Towards a bank of constituent parse trees for Polish

Marek Świdziński; Marcin Woliński

We present a project aimed at the construction of a bank of constituent parse trees for 20,000 Polish sentences taken from the balanced, hand-annotated subcorpus of the National Corpus of Polish (NKJP). The treebank is to be obtained by automatic parsing followed by manual disambiguation of the resulting trees. The grammar applied in the project is a new version of Świdziński's formal definition of Polish. Each sentence is disambiguated independently by two linguists and, if needed, adjudicated by a supervisor. The feedback from this process is used to iteratively improve the grammar. In the paper, we describe the linguistic as well as the technical decisions made in the project. We discuss the overall shape of the parse trees, including the extent of the grammatical information they encode. We also address the problem of syntactic disambiguation as a challenge for our work.


International Conference on Computational Linguistics | 2014

Extended phraseological information in a valence dictionary for NLP applications

Adam Przepiórkowski; Elżbieta Hajnicz; Agnieszka Patejuk; Marcin Woliński

The aim of this paper is to propose a far-reaching extension of the phraseological component of a valence dictionary for Polish. The dictionary is the basis of two different parsers of Polish; its format has been designed so as to maximise the readability of the information it contains and its re-applicability. We believe that the extension proposed here follows this approach and, hence, may be an inspiration in the design of valence dictionaries for other languages.


Aspects of Natural Language Processing | 2009

A New Formal Definition of Polish Nominal Phrases

Marek Świdziński; Marcin Woliński

In the paper, a new formal definition of Polish nominal phrases is presented. Based on an existing formal grammar of Polish (FGP) that uses the formalism of metamorphosis grammars, it is the first step towards redesigning the entire grammar. It makes use of the results of experiments with an implementation of the grammar. After a report on empirical data, a large set of parameters that formalize various grammatical features is introduced. Some of these parameters are entirely new; others are to be reinterpreted and improved. A number of rules are written down to illustrate how the empirical expressions are accounted for. The paper ends by formulating some postulates that the new version of FGP is expected to fulfil.


Aspects of Natural Language Processing | 2009

Inflection of Polish Multi-Word Proper Names with Morfeusz and Multiflex

Agata Savary; Joanna Rabiega-Wiśniewska; Marcin Woliński

We discuss morphological properties of Polish multi-word proper names. We present a framework of two cooperating morphological tools: Morfeusz, a morphological analyser and generator for Polish simple words, and Multiflex, a cross-language morpho-syntactic generator of multi-word units. We discuss the interface constraints required for the interoperability of these tools, and we show how the resulting platform allows one to describe the morpho-syntactic behaviour of some interesting examples of Warsaw multi-word toponyms.


Language and Technology Conference | 2009

Toposław: a lexicographic framework for multi-word units

Małgorzata Marciniak; Agata Savary; Piotr Sikora; Marcin Woliński

The paper presents a tool for the creation of an electronic dictionary of multi-word proper names. Toposław uses graphs to represent inflectional and pragmatic variants of names. It cooperates with Morfeusz, a morphological analyser and generator for Polish words, and Multiflex, a cross-language morpho-syntactic generator of multi-word units. Our goal was to create a user-friendly tool that makes lexicographic work easy and efficient. In the paper we describe facilities for graph creation, management and debugging. The presented tool was applied to create a dictionary of Warsaw urban proper names.


Language and Technology Conference | 2009

A Relational Model of Polish Inflection in Grammatical Dictionary of Polish

Marcin Woliński

The subject of this article is a description of Polish inflection in the form of a relational database. The description has been developed for a grammatical dictionary of Polish that aims at a complete inflectional characterisation of all Polish lexemes. We show some complexities of the Polish inflectional system for various grammatical classes. Then we present a relatively compact relational model that can be used to describe Polish inflection in a uniform way.


Text, Speech and Dialogue | 2017

Morphosyntactic Annotation of Historical Texts. The Making of the Baroque Corpus of Polish

Witold Kieraś; Dorota Komosińska; Emanuel Modrzejewski; Marcin Woliński

In the paper, we present some technical issues concerning the processing of 17th- and 18th-century texts for the purpose of building a corpus of that period. We describe a chain of procedures, leading from transliterated source texts to the morphological annotation of text samples, that was implemented for building the Baroque Corpus of Polish, a relatively large historical corpus of Polish texts from the 17th and 18th centuries. The described procedure consists of: automatic transliteration from the original spelling to the modern one, morphological analysis (including the construction of an inflectional dataset for Baroque Polish), and a tool for manual morphosyntactic annotation. The toolchain is being used to create a small, manually validated subcorpus, which will serve as training data for a stochastic tagger. A larger corpus will then be annotated automatically and made available via the Poliqarp corpus search tool.


Studies in Polish Linguistics | 2016

The Use of Electronic Historical Dictionary Data in Corpus Design

Renata Bronikowska; Włodzimierz Gruszczyński; Maciej Ogrodniczuk; Marcin Woliński

The History of the 17th and 18th c. Polish Language Laboratory, Institute of Polish Language, Polish Academy of Sciences, is in the process of creating two large databases: The Electronic Dictionary of the 17th–18th c. Polish and The Electronic Corpus of the 17th and 18th c. Polish Texts (up to 1772), the latter in cooperation with the Institute of Computer Science, Polish Academy of Sciences. It is expected that combining these two sets of data will help to achieve the objectives established for both database projects. The present article shows the benefits that the Corpus creators can derive from the data gathered in the dictionary, with special emphasis on the use of grammatical information included in the dictionary entries to design tools for automatic text annotation in the Corpus.

Collaboration

Top Co-Authors

Elżbieta Hajnicz

Polish Academy of Sciences

Witold Kieraś

Polish Academy of Sciences

Agata Savary

François Rabelais University