David Bednárek | Researchain

Publication

Featured research published by David Bednárek.

high performance distributed computing | 2014

Bobolang: a language for parallel streaming applications

Zbyněk Falt; David Bednárek; Martin Kruliš; Jakub Yaghob; Filip Zavoral

At present, programmers may choose from a number of streaming languages. They cover various aspects of the development process of streaming applications; however, specification of complex or runtime-dependent parts of the applications still remains a great challenge. We have analysed a large number of requirements raised by the development of multiple data streaming parallel applications and proposed a novel language called Bobolang. It contains syntactic and semantic features which allow the programmer to naturally solve most of the problems that we encountered in the design of streaming applications. The language is used to specify the structure of the whole application as well as the inner structure of each operator. Thanks to the properties of the language, Bobolang can create an optimized evaluation plan which is capable of making the best use of the available hardware resources. The language has been employed in several practical problems and it has proven itself to be a very powerful tool for the development of data-intensive parallel applications.

IDC | 2013

Data-Flow Awareness in Parallel Data Processing

David Bednárek; Jiří Dokulil; Jakub Yaghob; Filip Zavoral

The memory hierarchy affects the performance of task-scheduling strategies in task-based parallel environments. For data-intensive problems, the flow of data may be explicitly specified as a part of the algorithm, allowing the task scheduler to be aware of the data flow. In this paper, we describe such a task-based environment with explicit data-flow specification. We demonstrate the effect of data-flow awareness on the system performance. The results show that the explicit specification of data flow improves the quality of task scheduling.

2009 Third International Conference on Advances in Semantic Processing | 2009

Using Methods of Parallel Semi-structured Data Processing for Semantic Web

David Bednárek; Jiří Dokulil; Jakub Yaghob; Filip Zavoral

The state of the art in semi-structured data processing (and XML in particular) and in Semantic Web repositories correspond to each other: the non-scalability of pilot implementations, the inability to optimize, and the cost of a fully native implementation. Although there are successful implementations in each of the approaches, none of the methods may be considered universal. The Bobox framework proposed in this paper is a relational-like storage engine applicable both as a native XML database and as a Semantic Web repository. The main purpose of the engine is to support experiments in both areas. The main emphasis is on the performance of complex queries and transformations and, in particular, on parallel evaluation.

IDC | 2015

Locality Aware Task Scheduling in Parallel Data Stream Processing

Zbyněk Falt; Martin Kruliš; David Bednárek; Jakub Yaghob; Filip Zavoral

Parallel data processing and parallel streaming systems have become quite popular. They are employed in various domains such as real-time signal processing, OLAP database systems, or high performance data extraction. One of the key components of these systems is the task scheduler, which plans and executes tasks spawned by the system on available CPU cores. Contemporary multiprocessor systems and CPU architectures have become quite complex, which makes task scheduling a challenging problem. In this paper, we propose a novel task scheduling strategy for parallel data stream systems that reflects many technical issues of the current hardware. We were able to achieve up to 3× speedup on a NUMA system and up to 10% speedup on an older SMP system with respect to the unoptimized version of the scheduler. The basic ideas implemented in our scheduler may be adopted for task schedulers that focus on other priorities or employ different constraints.

Archive | 2015

Big Data Movement: A Challenge in Data Processing

Jaroslav Pokorný; Petr Skoda; Ivan Zelinka; David Bednárek; Filip Zavoral; Martin Kruliš; Petr Šaloun

This chapter discusses modern methods of data processing, especially data parallelization and data processing by bio-inspired methods. The synthesis of novel methods is performed by selected evolutionary algorithms and demonstrated on astrophysical data sets. Such an approach is now characteristic of so-called Big Data and Big Analytics. First, we describe some new database architectures that support Big Data storage and processing. We also discuss selected Big Data issues, specifically the data sources, characteristics, processing, and analysis. Particular attention is devoted to parallelism in the service of data processing, and we discuss this topic in detail. We show how new technologies encourage programmers to consider parallel processing not only in a distributed way (horizontal scaling), but also within each server (vertical scaling). The chapter also discusses in depth the interdisciplinary intersection between astrophysics and computer science, which has been termed astroinformatics, including a variety of data sources and examples. The last part of the chapter is devoted to selected bio-inspired methods and their application to simple model synthesis from astrophysical Big Data collections. We suggest a method by which new algorithms can be synthesized using a bio-inspired approach and demonstrate its application to an astronomy Big Data collection. The usability of these algorithms, along with general remarks on the limits of computing, is discussed at the conclusion of this chapter.

database and expert systems applications | 2010

TriQuery: Modifying XQuery for RDF and Relational Data

David Bednárek; Jiří Dokulil

The ability to convert between different data formats is important in large and heterogeneous information systems. Although XML was established as a universal standard for data exchange, XML-related languages like XQuery lack the ability to access data in other formats; in particular, relational data and RDF. In this paper, we describe TriQuery, an extension of the XQuery language which adds records (tuples) and RDF-specific operators. Using the statically optimizable record types, relational data as well as the results from RDF sub-queries can be integrated more efficiently than with their traditional encoding using XML elements and attributes.

advances in databases and information systems | 2008

Reducing Temporary Trees in XQuery

David Bednárek

The creation, maintenance and disposal of tree fragments during XQuery execution form a significant issue in the design of XQuery processors. The problem is further complicated by the definition of node identity which violates the functional nature of the XQuery language. This paper presents a novel mathematical model of XQuery execution that reflects temporary tree construction and manipulation, including navigation. Using this model as reference, an efficient algorithm of static analysis is presented that determines the level of information required at a particular place of the XQuery program. As a side effect, the algorithm also decides on the ordered/unordered context as defined by the XQuery language. Based on this algorithm, the amount of information stored during the execution as well as the complexity of operations may be significantly reduced.

IDC | 2008

Output-Driven XQuery Evaluation

David Bednárek

When an XML document is stored in a relational or native database, its tree structure is usually dissolved into various forms of interval or Dewey indexes. Besides other advantages, these loosely coupled structures allow parallel or distributed evaluation of XPath queries. However, when an XQuery or XSLT program produces a new XML document, its construction forms a bottleneck that is hard to parallelize. In this paper, we present a method of XQuery/XSLT evaluation that directly generates Dewey-like structures representing the output of the transformation. This approach forms an output-side counterpart of Dewey-based XPath evaluation methods and makes parallel evaluation of XQuery/XSLT programs easier.
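For readers unfamiliar with Dewey indexes, the following is a minimal illustrative sketch of the general labeling idea, not the method of the paper. Each node receives a label made of the child positions on the path from the root, so document order and ancestor tests reduce to tuple comparisons with no pointers between nodes. The function names `dewey_labels` and `is_ancestor` and the sample document are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def dewey_labels(root):
    """Return (label, tag) pairs in document order.

    A label is a tuple of 1-based child positions on the path from the root.
    """
    out = []

    def visit(node, label):
        out.append((label, node.tag))
        for pos, child in enumerate(node, start=1):
            visit(child, label + (pos,))

    visit(root, (1,))
    return out


def is_ancestor(a, b):
    """True if the node labeled a is a proper ancestor of the node labeled b."""
    return len(a) < len(b) and b[:len(a)] == a


# Hypothetical sample document, used only for illustration.
doc = ET.fromstring("<book><title/><chapter><section/><section/></chapter></book>")
labels = dewey_labels(doc)
for label, tag in labels:
    print(".".join(map(str, label)), tag)
```

Because the labels are self-describing, independent workers could in principle produce or consume labeled nodes without sharing a tree structure, which is the property the abstract relies on.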

international conference data science | 2017

Data Preprocessing of eSport Game Records - Counter-Strike: Global Offensive.

David Bednárek; Martin Kruliš; Jakub Yaghob; Filip Zavoral

Electronic sports, or pro gaming, have become very popular in this millennium, and the increased value of this new industry is attracting investors with various interests. One of these interests is game betting, which requires player and team rating, game result predictions, and fraud detection techniques. In our work, we focus on preprocessing data from the game Counter-Strike: Global Offensive in order to employ subsequent data analysis methods for quantifying player performance. The data preprocessing is difficult since the data format is complex and undocumented, the data quality of available sources is low, and there is no direct way to match players from the recorded files with players listed on public boards such as the HLTV website. We have summarized our experience from the data preprocessing and provide a way to establish a player matching based on their metadata.

Information Systems | 2017

Improving matrix-based dynamic programming on massively parallel accelerators

David Bednárek; Michal Brabec; Martin Kruliš

Dynamic programming techniques are well-established and employed by various practical algorithms, including the edit-distance algorithm or the dynamic time warping algorithm. These algorithms usually operate in an iteration-based manner where new values are computed from values of the previous iteration. The data dependencies enforce synchronization, which limits possibilities for internal parallel processing. In this paper, we investigate parallel approaches to processing matrix-based dynamic programming algorithms on modern multicore CPUs, Intel Xeon Phi accelerators, and general purpose GPUs. We address both the problem of computing a single distance on large inputs and the problem of computing a number of distances of smaller inputs simultaneously (e.g., when a similarity query is being resolved). Our proposed solutions yielded significant improvements in performance and achieved a speedup of two orders of magnitude when compared to the serial baseline.

Highlights: Dynamic programming algorithms with matrix organization (e.g., Levenshtein distance). Employing task parallelism and SIMD/SIMT vectorization. Proposed hierarchical algorithm optimized for CPUs, Intel Xeon Phi devices, and GPUs. Can be efficiently parallelized if inputs are large or many distances are computed. Experiments also determine optimal configurations for current hardware.
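The data dependencies mentioned in this abstract can be made concrete with a small sketch of the Levenshtein distance. This is a generic textbook illustration, not the hierarchical algorithm proposed in the paper; the function names are assumptions. The first function is a serial row-by-row baseline. The second visits the same matrix by anti-diagonals: every cell on one anti-diagonal depends only on the two preceding anti-diagonals, so those cells are mutually independent and could be computed in parallel.

```python
def edit_distance_rows(a: str, b: str) -> int:
    """Serial baseline: each row is computed from the previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def edit_distance_antidiagonals(a: str, b: str) -> int:
    """Same recurrence, visiting cells by anti-diagonal (i + j == d)."""
    n, m = len(a), len(b)
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        table[i][0] = i
    for j in range(m + 1):
        table[0][j] = j
    for d in range(2, n + m + 1):
        # All cells in this inner loop are independent of each other;
        # they read only cells from anti-diagonals d - 1 and d - 2.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            table[i][j] = min(table[i - 1][j] + 1,
                              table[i][j - 1] + 1,
                              table[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return table[n][m]


print(edit_distance_rows("kitten", "sitting"))
print(edit_distance_antidiagonals("kitten", "sitting"))
```

The inner loop of the second function is written serially here; it only shows which cells form one independent batch, which is where SIMD lanes or GPU threads would be applied.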
