Jakub Waszczuk | Researchain

Publication

Featured research published by Jakub Waszczuk.

international multiconference on computer science and information technology | 2010

Tools and methodologies for annotating syntax and named entities in the National Corpus of Polish

Jakub Waszczuk; Katarzyna Głowińska; Agata Savary; Adam Przepiórkowski

The ongoing project to create the National Corpus of Polish assumes several levels of linguistic annotation. We present the technical environment and methodological background developed for the three upper annotation levels: the levels of syntactic words, syntactic groups, and named entities. We show how the knowledge-based platforms Spejd and Sprout are used for the automatic pre-annotation of the corpus, and we discuss some particular problems faced during the elaboration of the syntactic grammar, which contains over 800 rules and is one of the largest chunking grammars for Polish. We also show how the tree editor TrEd has been customized for manual post-editing of annotations and for further revision of discrepancies. Our XML format converters and customized archiving repository ensure an automatic data flow and efficient corpus file management. We believe that this environment, or substantial parts of it, can be reused in or adapted for other corpus annotation tasks.

International Journal of Data Mining, Modelling and Management | 2013

Annotation tools for syntax and named entities in the National Corpus of Polish

Jakub Waszczuk; Katarzyna Głowińska; Agata Savary; Adam Przepiórkowski; Michał Lenart

The ongoing National Corpus of Polish project assumes several levels of linguistic annotation. We present the technical environment and methodological background developed for the three upper annotation levels: the levels of syntactic words, syntactic groups and named entities. We show how knowledge-based platforms Spejd and Sprout are used for the automatic pre-annotation of the corpus and discuss some particular problems faced during the preparation of the parser grammar, which contains over 1,000 rules and is one of the largest chunking grammars for Polish. We also show how the tree editor TrEd has been customised for manual post-editing of annotations and for further revision of discrepancies. Our XML format converters and customised archiving repository ensure an automatic data flow and efficient corpus file management. We discuss the inter-annotator agreement in the manually annotated data, and present the first results of a CRF classifier trained on these data.

international conference on implementation and application of automata | 2016

Enhancing Practical TAG Parsing Efficiency by Capturing Redundancy

Jakub Waszczuk; Agata Savary; Yannick Parmentier

The efficiency of parsing with tree adjoining grammars (TAGs) depends not only on the size of the input sentence but also, linearly, on the size of the input TAG, which can reach several thousand elementary trees. We propose a factorized, finite-state TAG representation to cope with this combinatorial explosion. The associated parsing algorithm shows a substantial performance gain on a real-size French TAG.

intelligent information systems | 2013

A Representation of an Old Polish Dictionary Designed for Practical Applications

Jakub Waszczuk

We describe an efficient representation of an old Polish dictionary designed for practical applications. This representation consists of two components: a memory-efficient automaton and a binary version of the dictionary. We have developed a separate automata library and we show some practical applications of the library within the context of the old Polish dictionary.

language resources and evaluation | 2010

Towards the Annotation of Named Entities in the National Corpus of Polish

Agata Savary; Jakub Waszczuk; Adam Przepiórkowski

international conference on computational linguistics | 2012

Harnessing the CRF Complexity with Domain-Specific Constraints. The Case of Morphosyntactic Tagging of a Highly Inflected Language

Jakub Waszczuk

text speech and dialogue | 2009

Semantic Annotation of City Transportation Information Dialogues Using CRF Method

Agnieszka Mykowiecka; Jakub Waszczuk

language and technology conference | 2015

PARSEME – PARSing and Multiword Expressions within a European multilingual network

Agata Savary; Manfred Sailer; Yannick Parmentier; Michael Rosner; Victoria Rosén; Adam Przepiórkowski; Cvetana Krstev; Veronika Vincze; Beata Wójtowicz; Gyri Smørdal Losnegaard; Carla Parra Escartín; Jakub Waszczuk; Mathieu Constant; Petya Osenova; Federico Sangati

Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing | 2017