Publication


Featured research published by Jakub Stárka.


Computers in Industry | 2014

Linked data support for filing public contracts

Jakub Klímek; Jindřich Mynarz; Tomáš Knap; Vojtěch Svátek; Jakub Stárka

Management of the tendering phase of the public contract lifecycle is a demanding activity with an often irrevocable impact on the subsequent realization phase. We investigate the impact of linked data technology on this process. The public contract information itself can be published as linked data. A specialized vocabulary, the Public Contracts Ontology, was designed for this purpose. Extractors and transformers for public contract datasets in various formats (HTML, CSV, XML) were developed to enable conversion into RDF conforming to the vocabulary. Moreover, an application for filing public contracts was implemented. It enables a contracting authority to manage RDF data about itself and its contracts, suppliers to the contracts, to-be-contracted products and services, and actual tenders proposed by bidders. It also provides matchmaking services for finding similar contracts and suitable suppliers for a given call for tenders based on their history, which is a useful feature for contracting authorities.
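As a rough illustration of the CSV-to-RDF conversion step mentioned in this abstract, the sketch below turns each CSV row into N-Triples statements. It is not the authors' extractor: the `EX` and `VOCAB` namespaces, the `id` column, and the function names are hypothetical placeholders, and the real Public Contracts Ontology terms are not used here.

```python
import csv
import io

# Hypothetical namespaces for illustration only; these are NOT the
# Public Contracts Ontology IRIs.
EX = "http://example.org/contract/"
VOCAB = "http://example.org/vocab#"


def escape_literal(value):
    """Escape the characters an N-Triples string literal cannot contain raw."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(csv_text, id_column="id"):
    """Emit one subject per row and one plain-literal triple per non-empty column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{EX}{row[id_column]}>"
        for column, value in row.items():
            # Skip the identifier column, surplus fields, and empty cells.
            if column is None or column == id_column or not value:
                continue
            triples.append(
                f'{subject} <{VOCAB}{column}> "{escape_literal(value)}" .'
            )
    return triples


if __name__ == "__main__":
    sample = "id,title,supplier\n42,Road repair,ACME\n"
    for line in csv_to_ntriples(sample):
        print(line)
```

The sketch assumes column names and identifiers are already safe to embed in an IRI; a production transformer would map columns to ontology properties and validate or percent-encode the values.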


Procedia Computer Science | 2012

Optimization and Refinement of XML Schema Inference Approaches

Michal Klempa; Jakub Stárka; Irena Mlynkova

XML is a widely used technology. Although in most real-life applications XML data is required to conform to particular schemas, the majority of real-world XML documents do not contain any explicit declaration. To fill the gap, the research area of automatic schema inference from XML documents has emerged. This paper refines and extends recent approaches to automatic schema inference by exploiting an obsolete schema in the inference process, designing new MDL measures, and heuristically excluding eccentric data inputs. It delivers a ready-to-use implementation integrated into jInfer, a framework for XML schema inference. Experimental results are a part of the paper.


Information Integration and Web-based Applications & Services | 2013

Strigil: A Framework for Data Extraction in Semi-Structured Web Documents

Jakub Stárka; Irena Holubová

In this paper we introduce Strigil, a framework for automated data extraction. It is an easily configurable tool that enables one to retrieve data from textual or weakly structured documents. The paper contains a description of the framework architecture and its important components. Additionally, we propose a scraping language, inspired by XSL transformations, designed to extract data from different kinds of documents. Although there are many different approaches focused on various aspects of data scraping, they are usually very specialized for a specific domain or data source. We compare these solutions and discuss their advantages and disadvantages. Our scraping language is designed to work with an ontology to map scraped data directly to classes and attributes.


The Computer Journal | 2012

Analyzer: A Complex System for Data Analysis

Jakub Stárka; Martin Svoboda; Jan Sochna; Jiří Schejbal; Irena Mlýnková; David Bednárek

Recently, eXtensible Markup Language (XML) has achieved the leading role among languages for data representation and, thus, we can witness a massive boom of corresponding techniques for managing XML data. Most of the processing techniques, however, suffer from various bottlenecks worsening their time and/or space efficiency. We assume that the main reason is that they consider XML collections too globally, involving all their possible features, although real-world data are often much simpler. Even though some techniques do restrict the input data, the restrictions are mostly unnatural. This paper aims to introduce Analyzer, a complex framework for performing statistical analyses of real-world documents. Exploiting the results of these analyses is a classical way of optimizing data processing in many areas. Although this intent is legitimate, ad hoc and dedicated analyses soon become obsolete; they are usually built on insufficiently extensive collections and are difficult to repeat. Analyzer is an easily extensible framework that helps the user with gathering documents, managing analyses, and browsing computed reports.


International Conference on Innovations in Information Technology | 2011

Integration of web service interfaces via decision trees

Jakub Stárka; Irena Mlynkova; Jakub Klímek; Martin Necasky

Web services, as a basic instrument for inter-system communication, are expanding rapidly. This causes an increased interest in their effective integration into the respective complex systems. However, manual integration and managing the evolution of the XML formats may be very hard. In this paper we study the possibilities of reverse engineering of XML schemas. A new method based on analysis of our previously proposed platform-specific conceptual model XSEM and the subsequent creation of a decision tree is introduced. The method makes it possible to find a mapping from XML formats to a conceptual diagram efficiently and more precisely.


Signal-Image Technology and Internet-Based Systems | 2012

Inference of an XML Schema with the Knowledge of XML Operations

Mário Mikula; Jakub Stárka; Irena Mlynkova

Recently, plenty of methods dealing with automatic inference of XML schemas have been developed; however, most of them utilize XML documents as their only input. In this paper, we focus on extending inference by incorporating XML operations, in particular XQuery queries. We discuss how XQuery queries can help in improving the inference process, and we propose an algorithm based on chosen improvements, extending an existing method of key discovery. Experimental results are a part of the paper.


International Database Engineering and Applications Symposium | 2012

Schematron schema inference

Michal Kozák; Jakub Stárka; Irena Mlýnková

In this paper we introduce a method to infer a Schematron schema from a set of XML documents. We analyze different aspects of Schematron schema generation. Since automatic schema inference from XML documents is not a new problem, we introduce only a single method, which we use in our experimental implementation. In the implementation we generate a grammar using the introduced inference method and allow the user to modify the grammar. The grammar is then transformed into a Schematron schema by our algorithm. Experimental results are a part of the paper.
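To make the idea of inferring Schematron rules from sample documents more concrete, here is a deliberately simplified sketch. It does not reproduce the paper's grammar-based algorithm: it only records which child elements occur in every instance of an element and emits one Schematron assertion per such child. The function names are hypothetical, and the sketch assumes input documents without XML namespaces.

```python
import xml.etree.ElementTree as ET

SCH_NS = "http://purl.oclc.org/dsdl/schematron"  # ISO Schematron namespace


def infer_required_children(documents):
    """Map each element name to the child names present in every occurrence."""
    required = {}
    for text in documents:
        for element in ET.fromstring(text).iter():
            children = {child.tag for child in element}
            if element.tag in required:
                # Keep only children seen in all occurrences so far.
                required[element.tag] &= children
            else:
                required[element.tag] = children
    return required


def to_schematron(required):
    """Emit one rule per element, asserting each always-present child."""
    lines = [f'<sch:schema xmlns:sch="{SCH_NS}">', "  <sch:pattern>"]
    for name in sorted(required):
        if not required[name]:
            continue  # nothing to assert for leaf or fully optional content
        lines.append(f'    <sch:rule context="{name}">')
        for child in sorted(required[name]):
            lines.append(
                f'      <sch:assert test="{child}">'
                f"{name} must contain {child}</sch:assert>"
            )
        lines.append("    </sch:rule>")
    lines += ["  </sch:pattern>", "</sch:schema>"]
    return "\n".join(lines)


if __name__ == "__main__":
    docs = [
        "<book><title>A</title><author>X</author></book>",
        "<book><title>B</title></book>",
    ]
    print(to_schematron(infer_required_children(docs)))
```

For the two sample documents, only `title` is asserted for `book`, because `author` is missing from the second document and is therefore treated as optional.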


DATESO | 2012

On Distributed Querying of Linked Data

Martin Svoboda; Jakub Stárka; Irena Mlynkova


DATESO | 2011

XML Document Correction and XQuery Analysis with Analyzer

Jakub Stárka; Martin Svoboda; Jiří Schejbal; Irena Mlynkova; David Bednárek


COLD'12 Proceedings of the Third International Conference on Consuming Linked Data - Volume 905 | 2012

Analyses of RDF triples in sample datasets

Jakub Stárka; Martin Svoboda; Irena Mlynkova

Collaboration


Dive into Jakub Stárka's collaborations.

Top Co-Authors

Irena Mlynkova

Charles University in Prague

Martin Svoboda

Charles University in Prague

David Bednárek

Charles University in Prague

Irena Holubová

Charles University in Prague

Irena Mlýnková

Charles University in Prague

Jakub Klímek

Charles University in Prague

Michal Klempa

Charles University in Prague

Michal Kozák

Charles University in Prague

Mário Mikula

Charles University in Prague

Jan Sochna

Charles University in Prague
