Jakub Yaghob
Charles University in Prague
Publication
Featured research published by Jakub Yaghob.
web intelligence | 2006
Jakub Yaghob; Filip Zavoral
Years of research and development of technologies and tools have not led to the expected widespread adoption of the semantic Web. We consider the practical nonexistence of an infrastructure for operating the semantic Web to be one of the main reasons for this situation. In this paper, we propose such an infrastructure based on the DataPile technology and related tools, and describe its integration with Web search engines and other tools.
high performance distributed computing | 2014
Zbyněk Falt; David Bednárek; Martin Kruliš; Jakub Yaghob; Filip Zavoral
At present, programmers may choose from a number of streaming languages. They cover various aspects of the development process of streaming applications; however, specifying complex or runtime-dependent parts of these applications remains a great challenge. We have analysed a large number of requirements arising from the development of multiple parallel data-streaming applications and propose a novel language called Bobolang. It contains syntactic and semantic features that allow the programmer to solve naturally most of the problems we encountered in the design of streaming applications. The language is used to specify the structure of the whole application as well as the inner structure of each operator. Thanks to the properties of the language, Bobolang can create an optimized evaluation plan that makes the best use of the available hardware resources. The language has been applied to several practical problems and has proven to be a very powerful tool for the development of data-intensive parallel applications.
IDC | 2013
David Bednárek; Jiří Dokulil; Jakub Yaghob; Filip Zavoral
The memory hierarchy affects the performance of task-scheduling strategies in task-based parallel environments. For data-intensive problems, the flow of data may be explicitly specified as a part of the algorithm, allowing the task scheduler to be aware of the data flow. In this paper, we describe such a task-based environment with explicit data-flow specification. We demonstrate the effect of data-flow awareness on the system performance. The results show that the explicit specification of data flow improves the quality of task scheduling.
2009 Third International Conference on Advances in Semantic Processing | 2009
David Bednárek; Jiří Dokulil; Jakub Yaghob; Filip Zavoral
The states of the art in semi-structured data processing (XML in particular) and in Semantic Web repositories correspond to each other: both suffer from the non-scalability of pilot implementations, limited possibilities for optimization, and the cost of a fully native implementation. Although there are successful implementations of each approach, none of the methods can be considered universal. The Bobox framework proposed in this paper is a relational-like storage engine applicable both as a native XML database and as a Semantic Web repository. The engine is primarily intended for experiments in both areas. The main emphasis is on the performance of complex queries and transformations, and in particular on the ability to evaluate them in parallel.
ubiquitous computing systems | 2007
Jiří Dokulil; Jaroslav Tykal; Jakub Yaghob; Filip Zavoral
The semantic Web has not become as widespread as its founders expected. This is partially caused by the lack of a standard, working infrastructure for the semantic Web. We have built a working, portable, stable, high-performance infrastructure for the semantic Web, which enables various experiments with the semantic Web in the real world.
IDC | 2015
Zbyněk Falt; Martin Kruliš; David Bednárek; Jakub Yaghob; Filip Zavoral
Parallel data processing and parallel streaming systems have become quite popular. They are employed in various domains such as real-time signal processing, OLAP database systems, and high-performance data extraction. One of the key components of these systems is the task scheduler, which plans and executes the tasks spawned by the system on the available CPU cores. Today's multiprocessor systems and CPU architectures have become quite complex, which makes task scheduling a challenging problem. In this paper, we propose a novel task scheduling strategy for parallel data-stream systems that reflects many technical issues of current hardware. We achieved a speedup of up to 3× on a NUMA system and up to 10% on an older SMP system compared with the unoptimized version of the scheduler. The basic ideas implemented in our scheduler may be adopted by task schedulers that focus on other priorities or employ different constraints.
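To make the notion of hardware-aware task scheduling more concrete, here is a toy Python sketch of a common NUMA-aware work-stealing order. It is an assumption for illustration, not the scheduling strategy proposed in the paper: the class `NumaAwareScheduler`, its methods, and the three-level search order are hypothetical. Each worker prefers its own queue, then the queues of workers on the same NUMA node, and only then queues on remote nodes, so that stealing across nodes (which typically implies slower remote memory accesses) is a last resort.

```python
from collections import deque
from typing import Callable, Deque, List, Optional

Task = Callable[[], None]


class NumaAwareScheduler:
    """Toy single-threaded scheduler: one task queue per worker, workers grouped by NUMA node.

    Hypothetical search order for a worker looking for work:
    1. its own queue (newest task first, to reuse data still hot in cache),
    2. queues of other workers on the same NUMA node,
    3. queues of workers on remote NUMA nodes.
    """

    def __init__(self, node_of_worker: List[int]) -> None:
        # node_of_worker[w] is the NUMA node that worker w runs on.
        self.node_of_worker = node_of_worker
        self.queues: List[Deque[Task]] = [deque() for _ in node_of_worker]

    def spawn(self, worker: int, task: Task) -> None:
        """Enqueue a task on the given worker's local queue."""
        self.queues[worker].append(task)

    def next_task(self, worker: int) -> Optional[Task]:
        """Return the next task for `worker`, or None if every queue is empty."""
        own = self.queues[worker]
        if own:
            return own.pop()  # LIFO on the local queue
        node = self.node_of_worker[worker]
        same_node = [w for w, n in enumerate(self.node_of_worker) if n == node and w != worker]
        remote = [w for w, n in enumerate(self.node_of_worker) if n != node]
        for victim in same_node + remote:
            if self.queues[victim]:
                return self.queues[victim].popleft()  # steal the oldest task (FIFO)
        return None
```

Popping the newest local task while stealing the oldest task from other queues is the usual work-stealing convention; a real scheduler would also need thread-safe queues and actual core and memory pinning, which this sketch omits.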
advances in databases and information systems | 2013
Zbyněk Falt; Jan Bulánek; Jakub Yaghob
Since the development of applications for parallel architectures is complicated and error-prone, many frameworks have been created to simplify this task. One promising approach, applicable especially to the development of parallel databases, is to express algorithms as stream programs, i.e., the inputs and outputs of procedures are data streams, and the procedures are connected to form a directed graph. In this paper, we introduce a highly scalable sorting algorithm suitable for streaming systems. We achieve this mainly by introducing a multiway merge algorithm that can merge multiple independent sorted streams in parallel.
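A sequential k-way merge shows the basic operation that a multiway merge performs on sorted streams. The sketch below is an assumption-based illustration in Python using a binary heap; it is not the parallel multiway merge algorithm introduced in the paper, and the name `multiway_merge` is hypothetical. Each input is consumed lazily, one element at a time, much as a streaming operator consumes its input streams.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

_END = object()  # sentinel marking an exhausted input stream


def multiway_merge(streams: List[Iterable[int]]) -> Iterator[int]:
    """Lazily merge several independently sorted input streams into one sorted stream."""
    iters = [iter(s) for s in streams]
    heap: List[Tuple[int, int]] = []
    # Seed the heap with the head element of every non-empty stream.
    for idx, it in enumerate(iters):
        head = next(it, _END)
        if head is not _END:
            heap.append((head, idx))
    heapq.heapify(heap)
    while heap:
        # Emit the smallest pending element, then refill from the stream it came from.
        value, idx = heapq.heappop(heap)
        yield value
        nxt = next(iters[idx], _END)
        if nxt is not _END:
            heapq.heappush(heap, (nxt, idx))
```

For example, `list(multiway_merge([[1, 4, 7], [2, 5, 8], [3, 6, 9]]))` yields the numbers 1 through 9 in order. The heap holds at most one pending element per input, so each output element costs O(log k) for k streams; the paper's contribution lies in performing such merges in parallel, which this sketch does not attempt.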
international conference data science | 2017
David Bednárek; Martin Kruliš; Jakub Yaghob; Filip Zavoral
Electronic sports, or pro gaming, have become very popular in this millennium, and the increasing value of this new industry is attracting investors with various interests. One of these interests is game betting, which requires player and team rating, game result prediction, and fraud detection techniques. In our work, we focus on preprocessing data from the game Counter-Strike: Global Offensive in order to apply subsequent data analysis methods for quantifying player performance. The data preprocessing is difficult because the data format is complex and undocumented, the quality of the available data sources is low, and there is no direct way to match players from the recorded files with players listed on public boards such as the HLTV website. We summarize our experience with the data preprocessing and provide a way to establish player matching based on player metadata.
database and expert systems applications | 2014
Jakub Yaghob; David Bednárek; Martin Kruliš; Filip Zavoral
Astrophysical databases have used domain-specific formats (especially the FITS format) to represent measured data and related metadata. The design of the FITS format was influenced by punch cards, which makes it highly unsuitable for modern hardware. Even though this format is well established in the domain of astrophysics and will certainly remain the common ground for data exchange, a new representation is required if the data are to be processed efficiently in a high-performance manner. In this paper, we propose a specialized column-oriented format for the measured data, which allows much faster loading from persistent storage and direct use of the data in computational operations. Furthermore, we tested various I/O methods available in modern operating systems to accommodate the different access patterns observed in various use cases. We have created a prototype implementation of the proposed methods and experimentally evaluated their benefits.
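The difference between a row-oriented and a column-oriented layout can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the format proposed in the paper: the record fields `x`, `y`, and `flux` and the helper names are hypothetical. Each field is stored as one contiguous array of doubles, so a column can be written and read back as raw bytes without per-value parsing, and a computation over one field does not touch the others.

```python
from array import array
from typing import Dict, List, Tuple

# Hypothetical measurement record (x, y, flux); the field names are illustrative only.
Row = Tuple[float, float, float]
FIELDS = ("x", "y", "flux")


def to_columns(rows: List[Row]) -> Dict[str, array]:
    """Transpose row-oriented records into one contiguous array of doubles per field."""
    columns = {name: array("d") for name in FIELDS}
    for record in rows:
        for name, value in zip(FIELDS, record):
            columns[name].append(value)
    return columns


def save_column(col: array) -> bytes:
    """Serialize a column as raw machine doubles (native byte order)."""
    return col.tobytes()


def load_column(raw: bytes) -> array:
    """Load a column directly from its raw bytes, with no text parsing."""
    col = array("d")
    col.frombytes(raw)
    return col


def mean(col: array) -> float:
    """A computation that reads only the one column it needs."""
    return sum(col) / len(col)
```

For example, after `cols = to_columns([(0.0, 0.0, 1.0), (1.0, 2.0, 3.0)])`, computing `mean(load_column(save_column(cols["flux"])))` reads only the flux column and returns 2.0. A real on-disk format would also have to fix the byte order and store per-column metadata, which this sketch leaves out.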
conference on current trends in theory and practice of informatics | 1996
David Bednárek; Petr Merta; David Obdrzalek; Jakub Yaghob; Filip Zavoral
T4 is a microkernel and a distributed operating system built upon it. This paper describes the concepts, goals, and major design features of the T4 system. It provides an overview of the T4 system architecture, communication principles, and support for distributed computation.