Jakub Waszczuk
Polish Academy of Sciences
Publication
Featured research published by Jakub Waszczuk.
International Multiconference on Computer Science and Information Technology | 2010
Jakub Waszczuk; Katarzyna Głowińska; Agata Savary; Adam Przepiórkowski
The ongoing project to create the National Corpus of Polish assumes several levels of linguistic annotation. We present the technical environment and methodological background developed for the three upper annotation levels: the levels of syntactic words, syntactic groups, and named entities. We show how the knowledge-based platforms Spejd and Sprout are used for the automatic pre-annotation of the corpus, and we discuss some particular problems faced during the elaboration of the syntactic grammar, which contains over 800 rules and is one of the largest chunking grammars for Polish. We also show how the tree editor TrEd has been customized for manual post-editing of annotations, and for further revision of discrepancies. Our XML format converters and customized archiving repository ensure the automatic data flow and efficient corpus file management. We believe that this environment or substantial parts of it can be reused in or adapted for other corpus annotation tasks.
International Journal of Data Mining, Modelling and Management | 2013
Jakub Waszczuk; Katarzyna Głowińska; Agata Savary; Adam Przepiórkowski; Michał Lenart
The ongoing National Corpus of Polish project assumes several levels of linguistic annotation. We present the technical environment and methodological background developed for the three upper annotation levels: the levels of syntactic words, syntactic groups and named entities. We show how knowledge-based platforms Spejd and Sprout are used for the automatic pre-annotation of the corpus and discuss some particular problems faced during the preparation of the parser grammar, which contains over 1,000 rules and is one of the largest chunking grammars for Polish. We also show how the tree editor TrEd has been customised for manual post-editing of annotations and for further revision of discrepancies. Our XML format converters and customised archiving repository ensure an automatic data flow and efficient corpus file management. We discuss the inter-annotator agreement in the manually annotated data, and present the first results of a CRF classifier trained on these data.
International Conference on Implementation and Application of Automata | 2016
Jakub Waszczuk; Agata Savary; Yannick Parmentier
The efficiency of parsing with tree adjoining grammars (TAGs) depends not only on the size of the input sentence but also, linearly, on the size of the input TAG, which can attain several thousands of elementary trees. We propose a factorized, finite-state TAG representation to cope with this combinatorial explosion. The associated parsing algorithm shows a substantial performance gain on a real-size French TAG.
Intelligent Information Systems | 2013
Jakub Waszczuk
We describe an efficient representation of an old Polish dictionary designed for practical applications. This representation consists of two components: a memory-efficient automaton and a binary version of the dictionary. We have developed a separate automata library and we show some practical applications of the library within the context of the old Polish dictionary.
Language Resources and Evaluation | 2010
Agata Savary; Jakub Waszczuk; Adam Przepiórkowski
International Conference on Computational Linguistics | 2012
Jakub Waszczuk
Text, Speech and Dialogue | 2009
Agnieszka Mykowiecka; Jakub Waszczuk
Language and Technology Conference | 2015
Agata Savary; Manfred Sailer; Yannick Parmentier; Michael Rosner; Victoria Rosén; Adam Przepiórkowski; Cvetana Krstev; Veronika Vincze; Beata Wójtowicz; Gyri Smørdal Losnegaard; Carla Parra Escartín; Jakub Waszczuk; Mathieu Constant; Petya Osenova; Federico Sangati
Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing | 2017
Agata Savary; Jakub Waszczuk
International Conference on Computational Linguistics | 2016
Jakub Waszczuk; Agata Savary; Yannick Parmentier