Jakub Stárka
Charles University in Prague
Publication
Featured research published by Jakub Stárka.
Computers in Industry | 2014
Jakub Klímek; Jindřich Mynarz; Tomáš Knap; Vojtěch Svátek; Jakub Stárka
Management of the tendering phase of the public contract lifecycle is a demanding activity with often irrevocable impact on the subsequent realization phase. We investigate the impact of the linked data technology on this process. The public contract information itself can be published as linked data. A specialized vocabulary, the Public Contracts Ontology, was designed for this purpose. Extractors and transformers for public contract datasets in various formats (HTML, CSV, XML) were developed to enable conversion into RDF format corresponding to the vocabulary. Moreover, an application for filing public contracts was implemented. It enables a contracting authority to manage RDF data about itself and its contracts, suppliers to the contracts, to-be-contracted products and services, and actual tenders proposed by bidders. It also provides matchmaking services for finding similar contracts and suitable suppliers for a given call for tenders based on their history, which is a useful feature for contracting authorities.
Procedia Computer Science | 2012
Michal Klempa; Jakub Stárka; Irena Mlýnková
XML is a widely used technology. Although in most real-life applications XML data is required to conform to particular schemas, the majority of real-world XML documents do not contain any explicit declaration. To fill the gap, the research area of automatic schema inference from XML documents has emerged. This paper refines and extends recent approaches to automatic schema inference by exploiting an obsolete schema in the inference process, designing new MDL measures, and heuristically excluding eccentric data inputs. It delivers a ready-to-use implementation integrated into jInfer, a framework for XML schema inference. Experimental results are a part of the paper.
Information Integration and Web-based Applications & Services | 2013
Jakub Stárka; Irena Holubová
In this paper we introduce Strigil, a framework for automated data extraction. It is an easily configurable tool that enables one to retrieve data from textual or weakly structured documents. The paper contains a description of the framework architecture and its important components. Additionally, we propose a scraping language, inspired by XSL transformations, designed to extract data from different kinds of documents. Although there are many different approaches focused on various aspects of data scraping, they are usually very specialized to a concrete domain or data source. We compare these solutions and discuss their advantages and disadvantages. Our scraping language is designed to work with an ontology to map scraped data directly to classes and attributes.
The Computer Journal | 2012
Jakub Stárka; Martin Svoboda; Jan Sochna; Jiří Schejbal; Irena Mlýnková; David Bednárek
Recently, eXtensible Markup Language (XML) has achieved the leading role among languages for data representation and, thus, we can witness a massive boom of corresponding techniques for managing XML data. Most of the processing techniques, however, suffer from various bottlenecks worsening their time and/or space efficiency. We assume that the main reason is that they consider XML collections too globally, involving all their possible features, although real-world data are often much simpler. Even though some techniques do restrict the input data, the restrictions are mostly unnatural. This paper aims to introduce Analyzer, a complex framework for performing statistical analyses of real-world documents. Exploiting the results of such analyses is a classical way to optimize data processing in many areas. Although this intent is legitimate, ad hoc and dedicated analyses soon become obsolete, are usually built on insufficiently extensive collections, and are difficult to repeat. Analyzer is an easily extensible framework that helps the user with gathering documents, managing analyses and browsing computed reports.
International Conference on Innovations in Information Technology | 2011
Jakub Stárka; Irena Mlýnková; Jakub Klímek; Martin Nečaský
Web services, as a basic instrument for inter-system communication, are expanding rapidly. This causes an increased interest in their effective integration into the respective complex systems. However, manual integration and management of the evolution of XML formats can be very hard. In this paper we study the possibilities of reverse engineering of XML schemas. A new method based on analysis of our previously proposed platform-specific conceptual model XSEM and the subsequent creation of a decision tree is introduced. The method makes it possible to find a mapping from XML formats to a conceptual diagram efficiently and more precisely.
Signal-Image Technology and Internet-Based Systems | 2012
Mário Mikula; Jakub Stárka; Irena Mlýnková
Recently, plenty of methods dealing with automatic inference of XML schemas have been developed; however, most of them utilize XML documents as their only input. In this paper, we focus on extending inference by incorporating XML operations, in particular XQuery queries. We discuss how XQuery queries can help improve the inference process, and we propose an algorithm based on chosen improvements, extending an existing method of key discovery. Experimental results are a part of the paper.
International Database Engineering and Applications Symposium | 2012
Michal Kozák; Jakub Stárka; Irena Mlýnková
In this paper we introduce a method to infer a Schematron schema from a set of XML documents. We analyze different aspects of Schematron schema generation. Since automatic inference from XML documents is not a new problem, we introduce only a single method, which we use in our experimental implementation. In the implementation we generate a grammar using the introduced inference method and allow the user to modify the grammar. The grammar is then transformed into a Schematron schema using our algorithm. Experimental results are a part of the paper.
DATESO | 2012
Martin Svoboda; Jakub Stárka; Irena Mlýnková
DATESO | 2011
Jakub Stárka; Martin Svoboda; Jiří Schejbal; Irena Mlýnková; David Bednárek
COLD'12 Proceedings of the Third International Conference on Consuming Linked Data - Volume 905 | 2012
Jakub Stárka; Martin Svoboda; Irena Mlýnková