Marek Tudruj
Polish Academy of Sciences
Publication
Featured research published by Marek Tudruj.
parallel, distributed and network-based processing | 2003
Marek Tudruj; Lukasz Masko
New architectural solutions for parallel systems built of bus-based shared memory processor clusters are presented. A new paradigm for interprocessor communication, called communication on the fly, is proposed. With this paradigm, processors can be dynamically switched between clusters at program run-time to bring, in their caches, data that many processors in a cluster can read at the same time as the data are written to the cluster memory. A cache controlled macro data flow program execution paradigm is also proposed. Programs are structured into tasks for which all required data are brought to the processor data cache before task execution. A new graph representation of programs is introduced, which enables modeling of the functioning of data caches, memories, bus arbiters, processor switching between clusters and parallel reads of data on the fly. This representation is used for realistic simulation of a numerical algorithm execution based on distribution of parallel tasks between dynamic SMP clusters and on communication on the fly. Performance evaluation results are presented for different configurations of the programs and shared memory clusters in the system.
Applied Soft Computing | 2015
Ivanoe De Falco; Eryk Laskowski; Richard Olejnik; Umberto Scafuri; Ernesto Tarantino; Marek Tudruj
The paper describes methods for using Extremal Optimization (EO) for processor load balancing during execution of distributed applications. A load balancing algorithm for clusters of multicore processors is presented and discussed. In this algorithm the EO approach is used to periodically detect the best tasks as candidates for migration and for a guided selection of the best computing nodes to receive the migrating tasks. To decrease the complexity of selection for migration, the embedded EO algorithm assumes a two-step stochastic selection during the solution improvement based on two separate fitness functions. The functions are based on specific models which estimate relations between the programs and the executive hardware. The proposed load balancing algorithm is assessed by experiments with simulated load balancing of distributed program graphs. The algorithm is compared against a greedy fully deterministic approach, a genetic algorithm and an EO-based algorithm with random placement of migrated tasks.
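The two-step stochastic selection described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the two fitness functions used here (load of the node hosting a task, and load of a candidate target node), the power-law parameter `tau`, and all function names are assumptions chosen only to show the shape of an Extremal Optimization step.

```python
import random

def rank_pick(n, tau, rng):
    """Pick a rank in 0..n-1 with probability proportional to (rank + 1) ** -tau."""
    weights = [(k + 1) ** -tau for k in range(n)]
    return rng.choices(range(n), weights=weights, k=1)[0]

def imbalance(placement, cost, n_nodes):
    """Return (max load - min load, per-node loads) for a task placement."""
    loads = [0.0] * n_nodes
    for task, node in placement.items():
        loads[node] += cost[task]
    return max(loads) - min(loads), loads

def eo_balance(cost, n_nodes, placement, steps=200, tau=1.5, seed=0):
    """Hypothetical EO loop; requires n_nodes >= 2."""
    rng = random.Random(seed)
    current = dict(placement)
    best = dict(current)
    best_gap, _ = imbalance(best, cost, n_nodes)
    for _ in range(steps):
        _, loads = imbalance(current, cost, n_nodes)
        # Step 1 (assumed local fitness): rank tasks worst first,
        # where "worst" means hosted on the most loaded node.
        tasks = sorted(current, key=lambda t: -loads[current[t]])
        task = tasks[rank_pick(len(tasks), tau, rng)]
        # Step 2 (assumed second fitness): rank target nodes best first,
        # where "best" means least loaded.
        targets = sorted(
            (n for n in range(n_nodes) if n != current[task]),
            key=lambda n: loads[n],
        )
        current[task] = targets[rank_pick(len(targets), tau, rng)]
        # EO always accepts the move; only the best solution seen is remembered.
        gap, _ = imbalance(current, cost, n_nodes)
        if gap < best_gap:
            best, best_gap = dict(current), gap
    return best, best_gap
```

Both selections are rank-based rather than greedy, so a poor task or a lightly loaded node is likely but not certain to be chosen. The sketch ignores communication costs and migration overhead, which the models in the paper take into account.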
DAPSYS | 2005
Marek Tudruj; Janusz Borkowski; Damian Kopanski
An extension of the graphical parallel program design system P-GRADE towards specification of program execution control based on global application state monitoring is presented. De-coupled structured specifications of computational and control elements of parallel programs are assumed. Special synchronizer processes collect process state messages supplied with time interval timestamps and construct strongly consistent application states. Control predicates are evaluated on these states by synchronizers. As a result, control signals can be sent to application processes to stimulate desired reactions to the predicates. The signals can cause asynchronous computation activation or cancellation. Implementation of a parallel program of Traveling Salesman Problem (TSP) solved by branch-and-bound (B&B) method is described to illustrate properties of the new system.
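As a rough illustration of the synchronizer idea, the sketch below checks whether a set of reported local states forms a strongly consistent global state and, if so, evaluates a control predicate on it. The data layout, the interval test (latest possible begin earlier than earliest possible end), and the `"cancel"` signal name are assumptions made for this example, not details taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class LocalState:
    process: str
    value: float   # application-level state, e.g. best bound found so far
    begin: tuple   # (earliest, latest) real time at which the state began
    end: tuple     # (earliest, latest) real time at which the state ended

def surely_concurrent(states):
    """Assumed test: every state had certainly begun before any had possibly ended."""
    latest_begin = max(s.begin[1] for s in states)
    earliest_end = min(s.end[0] for s in states)
    return latest_begin < earliest_end

def synchronizer(states, predicate):
    """Return a signal per process if the predicate holds on a consistent state."""
    if surely_concurrent(states) and predicate(states):
        return {s.process: "cancel" for s in states}
    return {}
```

In a branch-and-bound setting, a hypothetical predicate could be `lambda ss: min(s.value for s in ss) <= target`, with the resulting signals cancelling computations that can no longer improve the solution.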
international symposium on parallel and distributed computing | 2011
Marek Tudruj; Janusz Borkowski; Lukasz Masko; Adam Smyk; Damian Kopanski; Eryk Laskowski
A new distributed program graphical design environment is described in the paper. It is oriented towards designing program execution control based on a built-in system infrastructure which enables easy monitoring of global application states in systems based on multicore processors. Two aspects of global application control design are covered. The first is the global control flow in programs at the level of processes and threads. The second is the asynchronous control of internal process and thread behavior. The proposed control infrastructure is based on structural program elements called synchronizers, organized at the process and thread levels to collect state information, evaluate control predicates on global states and send signals to application program threads and processes to stimulate global control actions. The paper presents principles of the application program graphical design and programming methods to implement global control at the level of threads.
Future Generation Computer Systems | 2007
Eryk Laskowski; Marek Tudruj; Richard Olejnik; Bernard Toursel
A method for an introductory optimization of multithreaded Java programs for execution on clusters of Java Virtual Machines (JVMs) inside desktop grids is presented. It is composed of two stages. In the first stage, a clustering algorithm is applied to extended macro data flow graphs generated on the basis of the byte-code compiled for multithreaded Java programs. These graphs account for data and control dependencies in programs, including conditional branch instructions annotated by branch statistics derived from execution traces for representative sets of data. In the second stage, list scheduling is performed based on the Earliest Task First (ETF) heuristic, in which node mapping on JVMs accounts for mutually exclusive paths outgoing from conditional branch nodes. The presented object placement optimization algorithm is a part of the DG-ADAJ environment.
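The second stage can be pictured with a minimal sketch of ETF-style list scheduling on a plain task graph. It is a simplification under stated assumptions: the graph is acyclic, communication cost is zero when two tasks share a processor, and the handling of mutually exclusive conditional paths described in the abstract is left out. The function name and data layout are hypothetical.

```python
def etf_schedule(duration, edges, n_procs):
    """Greedy ETF-style scheduling.

    duration: dict task -> execution time
    edges:    dict (u, v) -> communication cost if u and v run on
              different processors (assumed zero on the same processor)
    Returns:  dict task -> (processor, start, finish)
    """
    preds = {t: [] for t in duration}
    for (u, v) in edges:
        preds[v].append(u)
    proc_free = [0.0] * n_procs
    placed = {}
    while len(placed) < len(duration):
        # Ready tasks: not yet placed, all predecessors already scheduled.
        ready = [t for t in duration
                 if t not in placed and all(p in placed for p in preds[t])]
        best = None
        for t in ready:
            for p in range(n_procs):
                data_ready = max(
                    (placed[u][2] + (0 if placed[u][0] == p else edges[(u, t)])
                     for u in preds[t]),
                    default=0.0,
                )
                start = max(proc_free[p], data_ready)
                # Keep the (task, processor) pair with the earliest start time.
                if best is None or start < best[0]:
                    best = (start, t, p)
        start, t, p = best
        placed[t] = (p, start, start + duration[t])
        proc_free[p] = start + duration[t]
    return placed
```

For example, with `duration = {"a": 2, "b": 3, "c": 1}` and `edges = {("a", "c"): 5, ("b", "c"): 1}` on two processors, task `c` is placed next to `a`, because waiting for the cheap message from `b` allows an earlier start than waiting for the expensive message from `a`.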
parallel processing and applied mathematics | 2005
Lukasz Masko; Pierre Francois Dutot; Grégory Mounié; Denis Trystram; Marek Tudruj
The paper presents an algorithm for scheduling parallel programs for execution in a parallel architecture based on dynamic SMP processor clusters with data transfers on the fly. The algorithm is based on the concept of moldable computational tasks. First, an initial program graph is decomposed into subgraphs, which are then treated as moldable tasks. The moldable tasks identified in this way are then scheduled using an algorithm with a guaranteed schedule length.
international conference on parallel processing | 2001
Marek Tudruj; Lukasz Masko
The paper presents a new architectural solution for parallel systems built of shared memory processor clusters. The system is based on dynamically run-time reconfigurable multi-processor clusters, each organized around a local shared memory module placed in a common address space. Each memory module is accessed by a local cluster bus and a common inter-cluster bus. Programs are organized according to their macro dataflow graphs, in which tasks and communication are defined so as to eliminate reloading of data caches during task execution. The behaviour of the proposed system has been evaluated by simulation based on an extended macro dataflow graph representation that includes modelling of data bus arbiters in the system. Program distribution into dynamic processor clusters assumes run-time switching of processors between busses and memory modules. It can reduce contention on data busses. CG algorithm execution in the proposed architecture shows speed-up greater than 4 when 5 busses are applied instead of one.
european conference on applications of evolutionary computation | 2013
Ivanoe De Falco; Eryk Laskowski; Richard Olejnik; Umberto Scafuri; Ernesto Tarantino; Marek Tudruj
The paper shows how to use Extremal Optimization in load balancing of distributed applications executed in clusters of multicore processors interconnected by a message passing network. Composed of iterative optimization phases which improve program task placement on processors, the proposed load balancing method dynamically discovers the candidates for migration with the use of an Extremal Optimization algorithm and a special quality model which takes into account the computation and communication parameters of the constituent parallel tasks. The proposed Extremal Optimization approach is assessed by experiments with simulated load balancing of distributed program graphs and compared against a deterministic approach based on a similar theoretical load balancing model.
parallel, distributed and network-based processing | 2007
Janusz Borkowski; Damian Kopanski; Marek Tudruj
Many computational problems have irregular data/control characteristics, which make programs difficult to implement efficiently in parallel systems. Due to the irregular character of code or data, an even division of work between processors at application startup is frequently impossible. Runtime optimization is possible, but it requires a constant exchange of control information and/or data during runtime. A novel parallel application control method is proposed in the paper. It is based on application global state monitoring for runtime control of irregular applications. The method provides a ready-to-use control infrastructure, which can be conveniently applied by a programmer. Both suitability and efficiency of the proposed control method are discussed in the paper based on two selected numerical applications: adaptive integration and branch and bound search. The presented experimental results were obtained with the PS-GRADE graphical parallel design system, which embeds the proposed control method. The results confirm the efficiency of control based on global predicates in irregular computations.
parallel computing in electrical engineering | 2002
Marek Tudruj; Lukasz Masko
The paper concerns efficient architectural solutions for shared memory systems composed of processor clusters based on busses. The essential proposed feature is program run-time dynamic switching of processors between clusters. A new communication paradigm, called communication on the fly, is proposed, which is a combination of processor switching between clusters and parallel reads of data from cluster busses to processor data caches. Specific data cache functionality is assumed in the system. Programs are decomposed into tasks that are executed without preemption, so that caches are not reloaded during task execution. A cache controlled program execution paradigm is proposed in which task execution is enabled only if all necessary data have been introduced to the processor data cache. An extended macro-data flow program graph representation is proposed for modeling the functioning of data caches, data bus arbiters, switching of processors between clusters and multiple parallel reads of data on the fly, which is useful for designing parallel programs for execution in the proposed architecture. This new program representation has been used for simulated symbolic execution of an FFT program graph, based on mapping of parallel tasks on dynamic SMP clusters with communication on the fly.
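The cache controlled execution rule (a task is enabled only once all of its input data are in the processor data cache) can be sketched as a small firing loop. This is a toy model under loud assumptions: a single cache of unlimited capacity, data items arriving one at a time from the bus, and hypothetical names throughout. Bus arbitration and processor switching are not modeled.

```python
def fire_order(tasks, arrivals):
    """Return the order in which tasks fire as data reach the cache.

    tasks:    dict name -> (set of input items, set of output items)
    arrivals: data items read from the bus into the cache, in order
    """
    cache, done, order = set(), set(), []
    for item in arrivals:
        cache.add(item)
        progress = True
        while progress:
            progress = False
            for name, (inputs, outputs) in tasks.items():
                # A task is enabled only if every input is already cached.
                if name not in done and inputs <= cache:
                    done.add(name)
                    order.append(name)
                    cache |= outputs  # results stay in the cache
                    progress = True
    return order
```

With `tasks = {"t1": ({"x", "y"}, {"z"}), "t2": ({"z", "w"}, {"r"})}` and arrivals `["x", "w", "y"]`, nothing fires until `y` arrives; then `t1` runs, its output `z` enables `t2`, and neither task has to reload the cache while executing.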