Publication


Featured research published by Dominika Tkaczyk.


International Conference on Theory and Practice of Digital Libraries | 2013

Large Scale Citation Matching Using Apache Hadoop

Mateusz Fedoryszak; Dominika Tkaczyk; Łukasz Bolikowski

Citation matching creates links from bibliography entries to the publications they reference. Such links indicate topical similarity between the linked texts, are used to assess the impact of the referenced document, and improve navigation in the user interfaces of digital libraries. In this paper we present a citation matching method and show how to scale it up to large amounts of data using appropriate indexing and the MapReduce paradigm in the Hadoop environment.
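To make the idea concrete, here is a minimal single-machine sketch of citation matching arranged as a map phase and a reduce phase. It is not the method from the paper: the blocking key (publication year), the Jaccard token similarity, the threshold, and all function names are assumptions chosen for illustration, and plain Python dictionaries stand in for Hadoop.

```python
import re
from collections import defaultdict


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def block_key(text):
    """Hypothetical blocking key: the first four-digit year, or None."""
    found = re.search(r"\b(19|20)\d{2}\b", text)
    return found.group(0) if found else None


def map_phase(citations, documents):
    """Emit (key, record) pairs so that candidates sharing a key are grouped."""
    for cid, text in citations.items():
        yield block_key(text), ("citation", cid, text)
    for did, text in documents.items():
        yield block_key(text), ("document", did, text)


def reduce_phase(grouped, threshold=0.5):
    """Within each group, link each citation to its most similar document."""
    links = {}
    for key, records in grouped.items():
        if key is None:  # records without a year are left unmatched here
            continue
        cits = [(i, t) for kind, i, t in records if kind == "citation"]
        docs = [(i, t) for kind, i, t in records if kind == "document"]
        for cid, ctext in cits:
            best_id, best_score = None, threshold
            for did, dtext in docs:
                a, b = tokens(ctext), tokens(dtext)
                score = len(a & b) / len(a | b)  # Jaccard similarity
                if score >= best_score:
                    best_id, best_score = did, score
            if best_id is not None:
                links[cid] = best_id
    return links


def match(citations, documents, threshold=0.5):
    """Simulate the shuffle step by grouping mapped records by key."""
    grouped = defaultdict(list)
    for key, record in map_phase(citations, documents):
        grouped[key].append(record)
    return reduce_phase(grouped, threshold)
```

Grouping by a cheap key before comparing is what keeps the number of pairwise comparisons manageable; a real deployment would need a more robust key than the year alone.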


Document Analysis Systems | 2014

CERMINE -- Automatic Extraction of Metadata and References from Scientific Literature

Dominika Tkaczyk; Pawel Szostek; Piotr Jan Dendek; Mateusz Fedoryszak; Lukasz Bolikowski

CERMINE is a comprehensive open source system for extracting metadata and parsed bibliographic references from scientific articles in born-digital form. The system is based on a modular workflow whose architecture allows each step to be trained and evaluated separately, makes it easy to modify or replace individual components, and simplifies further expansion of the architecture. Most steps are implemented with supervised and unsupervised machine-learning techniques, which simplifies adjusting the system to new document layouts. The paper describes the overall workflow architecture, provides details about the individual implementations, and reports the evaluation methodology and results. The CERMINE service is available at http://cermine.ceon.pl.
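The modular-workflow idea can be sketched as a list of named, replaceable steps. This is not CERMINE's actual API or architecture: the `Workflow` class, the step names, and the toy segmentation and classification rules below are all hypothetical and only illustrate how one component can be swapped without touching the others.

```python
from typing import Callable, List, Tuple

Step = Callable[[dict], dict]


class Workflow:
    """A hypothetical pipeline of named steps applied in order."""

    def __init__(self, steps: List[Tuple[str, Step]]):
        self.steps = list(steps)

    def replace(self, name: str, step: Step) -> None:
        """Swap the implementation of one named step, keeping the order."""
        self.steps = [(n, step if n == name else s) for n, s in self.steps]

    def run(self, document: dict) -> dict:
        """Pass the document through every step and return the result."""
        for _, step in self.steps:
            document = step(document)
        return document


def segment(doc: dict) -> dict:
    """Toy segmentation: split raw text into zones on blank lines."""
    zones = [z.strip() for z in doc["text"].split("\n\n") if z.strip()]
    return {**doc, "zones": zones}


def classify(doc: dict) -> dict:
    """Toy classifier: label the first zone as title, the rest as body."""
    labels = ["title" if i == 0 else "body" for i in range(len(doc["zones"]))]
    return {**doc, "labels": labels}


def extract(doc: dict) -> dict:
    """Collect the first zone labelled as title into a metadata record."""
    pairs = zip(doc["zones"], doc["labels"])
    title = next((z for z, label in pairs if label == "title"), None)
    return {**doc, "metadata": {"title": title}}


workflow = Workflow([("segment", segment), ("classify", classify), ("extract", extract)])
result = workflow.run({"text": "A Sample Title\n\nSome body text."})
```

Because each step only reads and writes keys of a shared dictionary, a step can also be run and evaluated on its own against labelled input, which is the property the abstract attributes to the modular design.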


Document Analysis Systems | 2012

A Modular Metadata Extraction System for Born-Digital Articles

Dominika Tkaczyk; Lukasz Bolikowski; Artur Czeczko; K. Rusek

We present a comprehensive system for extracting metadata from scholarly articles. In our approach the entire document is inspected, including headers and footers of all the pages as well as bibliographic references. The system is based on a modular workflow which allows for evaluation, unit testing and replacement of individual components. The workflow is optimized towards processing of born-digital documents, but may accept scanned document images as well. The machine-learning approaches we have chosen for solving individual tasks increase the ability to adapt to new document layouts and formats. The evaluation tests we have performed showed good results of the individual implementations and the entire metadata extraction process.


Semantic Web Evaluation Challenges | 2015

Extracting Contextual Information from Scientific Literature Using CERMINE System

Dominika Tkaczyk; Łukasz Bolikowski

CERMINE is a comprehensive open source system for extracting structured metadata and references from born-digital scientific literature. Among other things, the system can extract information about the context in which an article was written, such as the authors and their affiliations, the relations between them, and references to other articles. The extracted information is presented in a structured, machine-readable form. CERMINE is based on a modular workflow whose loosely coupled architecture allows individual components to be evaluated and adjusted, makes it easy to improve or replace independent parts of the algorithm, and facilitates future expansion of the architecture. The workflow is implemented mostly with supervised and unsupervised machine-learning techniques, which simplifies adapting the system to new document layouts and styles. In this paper we outline the overall workflow architecture, describe key aspects of the system implementation, provide details about training and adjusting the individual algorithms, and report how CERMINE was used to extract contextual information from scientific articles in PDF format for the ESWC 2015 Semantic Publishing Challenge. The CERMINE system is available under an open-source licence and can be accessed at http://cermine.ceon.pl.


ACM/IEEE Joint Conference on Digital Libraries | 2012

GROTOAP: ground truth for open access publications

Dominika Tkaczyk; Artur Czeczko; K. Rusek; Lukasz Bolikowski; Roman Bogacewicz

The field of digital document content analysis includes many important tasks, for example page segmentation and zone classification. It is impossible to build effective solutions for such problems and evaluate their performance without a reliable test set that contains both input documents and the expected results of segmentation and classification. In this paper we present GROTOAP, a test set useful for training and performance evaluation of page segmentation and zone classification tasks. The test set contains input articles in digital form and the corresponding ground truth files. All input documents included in the test set were selected from the DOAJ database, which indexes articles published under the CC-BY license. The whole test set is available under the same license.


Intelligent Tools for Building a Scientific Information Platform | 2013

Data Model for Analysis of Scholarly Documents in the MapReduce Paradigm

Adam Kawa; Łukasz Bolikowski; Artur Czeczko; Piotr Jan Dendek; Dominika Tkaczyk

At CeON ICM UW we hold a large collection of scholarly documents that we store and process using the MapReduce paradigm. One of the main challenges is to design a simple but effective data model that fits various data access patterns and allows us to perform diverse analyses efficiently. In this paper we describe the organization of our data and explain how it is accessed and processed by open-source tools from the Apache Hadoop ecosystem.


Intelligent Tools for Building a Scientific Information Platform | 2013

Methodology for evaluating citation parsing and matching

Mateusz Fedoryszak; Łukasz Bolikowski; Dominika Tkaczyk; Krzysztof Wojciechowski

Bibliographic references between scholarly publications contain valuable information for researchers and developers involved with digital repositories. They indicate topical similarity between linked texts, reflect the impact of the referenced document, and improve navigation in the user interfaces of digital libraries. Consequently, several approaches to extracting, parsing and resolving such references have been proposed to date. In this paper we develop a methodology for evaluating parsing and matching algorithms and for choosing the most appropriate one for the document collection at hand. We apply the methodology to evaluate the reference parsing and matching module of the YADDA2 software platform.
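One common way to score a matching algorithm against a gold standard is precision, recall and F1 over the produced links. The sketch below assumes that setup; it is not taken from the paper, and the function name and the dictionary representation of links (citation identifier mapped to document identifier) are illustrative choices.

```python
def evaluate_links(predicted, gold):
    """Score predicted citation-to-document links against gold links.

    Both arguments map a citation id to a document id. A predicted link
    counts as correct only if the same pair appears in the gold standard.
    """
    pred_pairs = set(predicted.items())
    gold_pairs = set(gold.items())
    correct = len(pred_pairs & gold_pairs)
    precision = correct / len(pred_pairs) if pred_pairs else 0.0
    recall = correct / len(gold_pairs) if gold_pairs else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical links used only to exercise the function.
predicted = {"c1": "d1", "c2": "d3", "c3": "d4"}
gold = {"c1": "d1", "c2": "d2", "c3": "d4", "c4": "d5"}
precision, recall, f1 = evaluate_links(predicted, gold)
```

Running the same scoring over several candidate algorithms on one annotated sample gives a simple basis for choosing among them for a given collection.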


International Journal on Document Analysis and Recognition | 2015

CERMINE: automatic extraction of structured metadata from scientific literature

Dominika Tkaczyk; Pawel Szostek; Mateusz Fedoryszak; Piotr Jan Dendek; Lukasz Bolikowski


D-Lib Magazine | 2014

GROTOAP2: The Methodology of Creating a Large Ground Truth Dataset of Scientific Articles

Dominika Tkaczyk; Pawel Szostek; Lukasz Bolikowski


D-Lib Magazine | 2015

Structured Affiliations Extraction from Scientific Literature

Dominika Tkaczyk; Bartosz Tarnawski; Lukasz Bolikowski

Collaboration


Dive into Dominika Tkaczyk's collaborations.

Top Co-Authors


K. Rusek

University of Warsaw
