Publication


Featured research published by Sándor Juhász.


Parallel, Distributed and Network-Based Processing | 2002

Execution time prediction for parallel data processing tasks

Sándor Juhász; Hassan Charaf

Nowadays a wide range of highly efficient hardware components is available as building blocks for parallel distributed systems, but many questions remain open on the software side. There is no common solution for the optimal distribution of co-operating tasks, and performance prediction is also an open issue. Efforts are focused on creating and using mathematical models in a precise domain, namely applications performing a moderate amount of computation on a relatively large amount of data. The possibilities of predicting and minimizing execution times are investigated in a cluster-of-workstations environment, where the data transfer system is expected to become the performance bottleneck. The use of the presented generic model is demonstrated on the example of a parallel integer sorting algorithm: formulas are built to provide the expected execution times and to approximate the optimal cluster size. Finally, the predicted and measured execution times of the sorting algorithm are compared for different problem and cluster sizes.
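The abstract does not reproduce the formulas themselves; purely as an illustration of this kind of model, a first-order prediction splits the time into a computation term that shrinks with cluster size and communication and coordination terms that do not. All constants and the model form below are hypothetical, not the paper's.

```python
# Illustrative execution-time model for a parallel data-processing task:
# T(n, p) = t_comp * n / p + t_comm * n + t_sync * p
# where n is the problem size (items) and p the cluster size (nodes).
import math

T_COMP = 1.0e-6   # per-item computation time (s), assumed
T_COMM = 2.0e-7   # per-item transfer time (s), assumed
T_SYNC = 5.0e-3   # per-node coordination overhead (s), assumed

def predicted_time(n, p):
    """Predicted wall-clock time for n items on p nodes."""
    return T_COMP * n / p + T_COMM * n + T_SYNC * p

def optimal_cluster_size(n):
    """Minimizing T over p (dT/dp = 0) gives p* = sqrt(t_comp * n / t_sync)."""
    p_star = math.sqrt(T_COMP * n / T_SYNC)
    return max(1, round(p_star))  # clamp to a whole number of machines
```

With these assumed constants, adding nodes beyond p* only increases coordination overhead, which mirrors the abstract's point that the transfer system becomes the bottleneck.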


ACM Symposium on Applied Computing | 2004

Exploiting Fast Ethernet performance in multiplatform cluster environment

Sándor Juhász; Hassan Charaf

As the communication subsystem largely determines the overall performance and characteristics of cluster systems, it must face diverging demands such as bandwidth, latency, quality of service and cost. In this paper we investigate the performance and improvement possibilities of a portable TCP/IP-based communication subsystem that aims to integrate heterogeneous nodes. The cluster is built from standard PCs connected with a low-cost network, where nodes may have different processor speeds and memory sizes, and may even run different operating systems. We present and compare application-level end-to-end latencies measured under different conditions, varying the number of simultaneous connections, the number of processing threads and the operating systems used. Our experiments show that message latencies are overwhelmingly dominated by software overheads, which can be hidden or eliminated by different methods; thus PC clusters can take good advantage of the bandwidth of a Fast Ethernet connection even with smaller message sizes. Finally, based on the results, we draw attention to a domain of inaccuracy of the standard communication models in a PC cluster environment, and we suggest a new formula to describe the latency of concurrent message channels over the same medium.
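The paper's own formula for concurrent channels is not reproduced in this abstract. As a sketch only: the standard point-to-point model is linear in the message size, and one plausible adjustment for k concurrent channels on a shared medium lets the per-byte term scale with k while the per-message software overheads overlap rather than add up serially. Both constants and the concurrent form below are assumptions.

```python
# Illustrative latency models; constants are assumed, not measured values.
ALPHA = 60e-6   # per-message software overhead (s), assumed
BETA = 8.0e-8   # per-byte transfer time at ~100 Mbit/s Fast Ethernet, assumed

def latency_single(m):
    """Classic linear model for one m-byte message: T = alpha + beta * m."""
    return ALPHA + BETA * m

def latency_concurrent(k, m):
    """Sketch for k concurrent channels over one shared medium: the wire
    bandwidth is shared (the beta term scales with k), while the software
    overheads of the k transfers largely overlap instead of summing."""
    return ALPHA + k * BETA * m
```

The point of such a correction is that the naive estimate, k independent transfers costing k * (alpha + beta * m), overcounts the software overhead that concurrent channels can hide.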


Lecture Notes in Computer Science | 2004

Asynchronous Distributed Broadcasting in Cluster Environment

Sándor Juhász; Ferenc Kovács

Improving communication performance is an important issue in cluster systems. This paper investigates the possibility of accelerating group communication at the level of message passing libraries. A new algorithm for implementing the broadcast communication primitive is introduced. It enhances the performance of fully switched cluster systems by using message decomposition and asynchronous communication. The new algorithm demonstrates the dynamism and portability of software solutions, while it has a constant asymptotic time complexity that was previously achieved only with hardware support. Test measurements show that the algorithm indeed has constant time complexity, and in certain cases it can outperform the widely used binary tree approach by 100 percent. The presented algorithm can be used to increase the performance of broadcasting, and can also indirectly speed up various group communication primitives used in standard message passing libraries.
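The paper's exact algorithm is not reproduced in this abstract. A well-known software scheme in the same spirit splits the message into one chunk per node, scatters the chunks, and then circulates them in a ring so that every node assembles the full message; the simulation below sketches that decomposition idea (names and the ring schedule are this sketch's choices, not necessarily the paper's).

```python
# Illustrative broadcast by message decomposition: scatter + ring allgather,
# simulated with plain Python lists instead of real network transfers.

def split(message, p):
    """Split message into p contiguous, nearly equal chunks."""
    q, r = divmod(len(message), p)
    chunks, start = [], 0
    for i in range(p):
        size = q + (1 if i < r else 0)
        chunks.append(message[start:start + size])
        start += size
    return chunks

def decomposed_broadcast(message, p):
    """Simulate broadcasting `message` from node 0 to p nodes.
    Phase 1 (scatter): the root delivers chunk i to node i.
    Phase 2 (ring allgather): in p-1 steps, node i forwards chunk
    (i - step) mod p to its right neighbor, so every node collects
    all p chunks in p-1 constant-size communication rounds."""
    chunks = split(message, p)
    have = [dict() for _ in range(p)]          # chunks held by each node
    for i in range(p):                         # scatter phase
        have[i][i] = chunks[i]
    for step in range(p - 1):                  # ring allgather phase
        for i in range(p):
            idx = (i - step) % p               # chunk node i holds at this step
            have[(i + 1) % p][idx] = chunks[idx]
    return ["".join(have[i][j] for j in range(p)) for i in range(p)]
```

Because each round moves only message_size / p bytes per link, the total time stays roughly constant in p, which is the property the abstract claims for the decomposition approach.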


International Journal of Parallel Programming | 2012

Software Controlled Adaptive Pre-Execution for Data Prefetching

Ákos Dudás; Sándor Juhász; Tamás Schrádi

Data prefetching mechanisms are widely used for hiding memory latency in data-intensive applications. They mask the speed gap between CPUs and their memory systems by preloading data into the CPU caches, where accessing it is at least one order of magnitude faster. Pre-execution is a combined prefetching method, which executes a slice of the original code, preloading the code and its data at the same time. Pre-execution is often mentioned in the literature, but to our knowledge it has not been formally defined yet. We fill this void by presenting a formal definition of speculative and non-speculative pre-execution, and derive a lightweight software-based strategy which accelerates the main working thread by introducing an adaptive, non-speculative pre-execution helper thread. This helper thread acts as a perfect predictor, calculates memory addresses, prefetches the data and consumes cache misses early. The adaptive automatic control allows the helper thread to configure itself at run time for best performance. The method is directly applicable to any data-intensive application without requiring hardware modifications. Our method achieved an average speedup of 10–30% in a real-life application.
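As a structural sketch of the idea, assuming names of our own: a helper thread runs the address-generating slice of the loop a bounded distance ahead of the worker and touches the data, so the worker's loads find it in cache. In CPython the GIL prevents any real speedup; the code only illustrates the thread arrangement, and the fixed lookahead stands in for the paper's adaptive control.

```python
# Illustrative non-speculative pre-execution: helper runs ahead of the worker
# by at most `lookahead` iterations and touches upcoming data.
import threading

def preexecute(data, indices, results, lookahead=64):
    progress = {"worker": 0}   # last iteration completed by the worker

    def helper():
        # the "slice" of the loop: compute the address and touch the data
        for k, i in enumerate(indices):
            while k - progress["worker"] > lookahead:
                pass           # throttle: stay at most `lookahead` ahead
            _ = data[i]        # touch stands in for a hardware prefetch

    def worker():
        for k, i in enumerate(indices):
            results.append(data[i] * 2)   # the "real" computation
            progress["worker"] = k

    t = threading.Thread(target=helper)
    t.start()
    worker()
    t.join()
```

The helper is non-speculative in the abstract's sense: it walks exactly the same index stream as the worker, so every touch is a perfect prediction.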


International Conference on Computational Cybernetics | 2004

Performance prediction for association rule mining algorithms

Renáta Iváncsy; Sándor Juhász; Ferenc Kovács

Execution time prediction is a very important issue in job scheduling and resource allocation. Association rule mining algorithms are complex, and their execution time depends both on the properties of the input data source and on the mining parameters. In this paper, an analytical model of the Apriori algorithm is introduced, based on statistical parameters of the input dataset (average size of the transactions, number of transactions in the dataset) and on the minimum support threshold. The developed model has only a few parameters, so the predicted execution time can be calculated in a simple way. The investigated domain of the input parameters covers the most commonly used datasets, therefore the introduced model can be applied widely in the field of association rule mining. The constant parameters of the model can be identified in a small number of test executions, after which the model allows the execution time of the Apriori algorithm to be predicted over a wide range of parameters. The model was validated on several different datasets, and the experimental results show that its overall average error rate is less than 15%.
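The abstract names the model's inputs (transaction count n, average transaction size t, minimum support s) but not its formula. Purely to illustrate the calibrate-then-predict workflow it describes, assume a linear model over hand-picked features and fit its constants from a few test executions by least squares; the feature set is entirely hypothetical.

```python
# Illustrative calibration of an execution-time model from a few test runs.

def features(n, t, s):
    # hypothetical feature set: a constant, a dataset-scan term, and a
    # support-driven term (lower support -> more candidates -> more work)
    return [1.0, n * t, n * t / s]

def fit(runs):
    """Least-squares fit of the model constants from calibration runs,
    given as [((n, t, s), measured_time), ...], via normal equations."""
    X = [features(*params) for params, _ in runs]
    y = [measured for _, measured in runs]
    k = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for col in range(k):                       # Gaussian elimination w/ pivoting
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    coeffs = [0.0] * k
    for i in reversed(range(k)):               # back-substitution
        coeffs[i] = (b[i] - sum(A[i][j] * coeffs[j] for j in range(i + 1, k))) / A[i][i]
    return coeffs

def predict(coeffs, n, t, s):
    return sum(c * f for c, f in zip(coeffs, features(n, t, s)))
```

This mirrors the workflow in the abstract: a handful of measured executions fix the constants, after which prediction is a cheap formula evaluation.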


Archive | 2013

Reconfigurable Preexecution in Data Parallel Applications on Multicore Systems

Ákos Dudás; Sándor Juhász

The performance of data-intensive applications is often limited not only by the computational power of current computers but also by the performance gap between the CPU and the main system memory. Data prefetch mechanisms mask this latency by moving data closer to the CPU automatically. These methods rely on predicting future memory addresses; however, they are not suited for applications with random memory access patterns. Preexecution is a prefetch method which executes a slice of the original algorithm in parallel with the main thread to calculate memory addresses and issue loads early. In this paper we propose a lightweight software preexecution strategy for data parallel applications that accelerates the main working thread with an adaptive preexecution helper thread acting as a perfect predictor and consuming cache misses. With automatic parameter tuning the helper thread adapts to the application and system it is executed on. This method was able to achieve an average speedup of 10–30% in a real-life data parallel application.
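The abstract's "automatic parameter tuning" is not spelled out. One minimal way to realize it, assuming our own names and a benchmark-style controller rather than the paper's run-time one: try several lookahead distances for the helper thread, time a work batch under each, and keep the fastest.

```python
# Illustrative automatic tuning of the pre-execution lookahead distance.
import time

def autotune(run_with_distance, candidates=(8, 16, 32, 64, 128)):
    """run_with_distance(d) executes one work batch with lookahead d.
    Time each candidate distance and return the fastest one."""
    best_d, best_t = None, float("inf")
    for d in candidates:
        start = time.perf_counter()
        run_with_distance(d)
        elapsed = time.perf_counter() - start
        if elapsed < best_t:
            best_d, best_t = d, elapsed
    return best_d
```

A too-small distance leaves the worker waiting on cold data, while a too-large one lets prefetched lines be evicted before use; measuring actual throughput sidesteps modeling either effect.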


Facing the Multicore-Challenge | 2013

Recalibrating Fine-Grained Locking in Parallel Bucket Hash Tables

Ákos Dudás; Sándor Juhász; Sándor Kolumbán

Mutual exclusion protects data structures in parallel environments in order to preserve data integrity. A lock being held effectively blocks the execution of all other threads wanting to access the same shared resource until the lock is released. This blocking behavior reduces the level of parallelism, causing performance loss. Fine-grained locking reduces contention for the locks, resulting in better throughput; however, the right granularity, i.e. how many locks to use, is not straightforward. In large bucket hash tables, the best approach is to divide the table into blocks, each containing one or more buckets, and to lock these blocks independently. The optimal block size depends on the time spent within the critical sections, which in turn depends on the table's internal properties and on the arrival intensity of the queries. A queuing model capturing this behavior is presented, together with an adaptive algorithm that fine-tunes the granularity of locking (the block size) to the execution environment.
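The block-level locking described above can be sketched as a striped hash table: buckets are grouped into fixed-size blocks and each block gets its own lock, so threads touching different blocks never wait on each other. Class and method names are this sketch's own; the block size is the tunable granularity the paper's adaptive algorithm would set.

```python
# Illustrative bucket hash table with per-block (striped) locking.
import threading

class StripedHashTable:
    def __init__(self, n_buckets=1024, block_size=16):
        self.n_buckets = n_buckets
        self.block_size = block_size
        self.buckets = [[] for _ in range(n_buckets)]
        n_blocks = (n_buckets + block_size - 1) // block_size
        self.locks = [threading.Lock() for _ in range(n_blocks)]

    def _locate(self, key):
        b = hash(key) % self.n_buckets
        return b, self.locks[b // self.block_size]   # lock guarding b's block

    def put(self, key, value):
        b, lock = self._locate(key)
        with lock:                                   # serializes one block only
            bucket = self.buckets[b]
            for i, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[i] = (key, value)
                    return
            bucket.append((key, value))

    def get(self, key, default=None):
        b, lock = self._locate(key)
        with lock:
            for k, v in self.buckets[b]:
                if k == key:
                    return v
            return default
```

With block_size equal to n_buckets this degenerates to one global lock; with block_size 1 it is one lock per bucket. The paper's point is that the sweet spot between those extremes depends on critical-section length and query arrival intensity.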


IDC | 2008

Large-Scale Data Dictionaries Based on Hash Tables

Sándor Juhász

Data dictionaries allow efficient transformation of repeating input values. Attention is focused on the analysis of voluminous lookup tables that store up to a few tens of millions of key-value pairs. Because of their compactness and search efficiency, hash tables turn out to provide the best solution in such cases. This paper deals with performance issues of such structures, and its main contribution is to take into consideration the effect of the multi-level memory hierarchies present in all current computers. The paper enumerates and compares various choices and methods in order to give an indication of how to choose the structure and the parameters of hash tables for large-scale, in-memory data dictionaries.
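One concrete flavor of such memory-hierarchy-aware parameter choice, as an assumed example rather than the paper's actual recommendation: size each bucket to fit one cache line, and derive the bucket count from the expected number of entries and a target load factor.

```python
# Illustrative cache-conscious sizing of a large in-memory hash dictionary.
CACHE_LINE = 64      # bytes, typical on current CPUs
ENTRY_SIZE = 16      # bytes per key-value slot, assumed
TARGET_LOAD = 0.75   # average fill ratio of bucket slots, assumed

def table_parameters(n_entries):
    """Return (n_buckets, slots_per_bucket) for n_entries key-value pairs."""
    slots_per_bucket = CACHE_LINE // ENTRY_SIZE      # bucket fits one cache line
    buckets_needed = n_entries / (slots_per_bucket * TARGET_LOAD)
    n_buckets = 1
    while n_buckets < buckets_needed:                # round up to a power of two
        n_buckets *= 2                               # allows masking instead of modulo
    return n_buckets, slots_per_bucket
```

Keeping a bucket within one cache line means an unsuccessful probe costs a single memory transfer, which is the kind of hierarchy effect the paper analyzes.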


International Conference on Artificial Intelligence | 2006

Cluster validity measurement techniques

Csaba Legány; Sándor Juhász; Attila Babos


Parallel and Distributed Computing and Networks | 2007

A distribution technique for graph rewriting and model transformation systems

Gergely Mezei; Sándor Juhász; Tihamer Levendovszky

Collaboration


Dive into Sándor Juhász's collaborations.

Top Co-Authors

Ákos Dudás (Budapest University of Technology and Economics)
Ferenc Kovács (Budapest University of Technology and Economics)
Hassan Charaf (Budapest University of Technology and Economics)
Tamás Schrádi (Budapest University of Technology and Economics)
Attila Babos (Budapest University of Technology and Economics)
Csaba Legány (Budapest University of Technology and Economics)
Gergely Mezei (Budapest University of Technology and Economics)
Renáta Iváncsy (Budapest University of Technology and Economics)
Sándor Kolumbán (Budapest University of Technology and Economics)