Jiri Jaros
Brno University of Technology
Publication
Featured research published by Jiri Jaros.
Journal of the Acoustical Society of America | 2012
Bradley E. Treeby; Jiri Jaros; Alistair P. Rendell; Benjamin T. Cox
The simulation of nonlinear ultrasound propagation through tissue-realistic media has a wide range of practical applications. However, this is a computationally difficult problem due to the large size of the computational domain compared to the acoustic wavelength. Here, the k-space pseudospectral method is used to reduce the number of grid points required per wavelength for accurate simulations. The model is based on coupled first-order acoustic equations valid for nonlinear wave propagation in heterogeneous media with power law absorption. These are derived from the equations of fluid mechanics and include a pressure-density relation that incorporates the effects of nonlinearity, power law absorption, and medium heterogeneities. The additional terms accounting for convective nonlinearity and power law absorption are expressed as spatial gradients, making them efficient to encode numerically. The governing equations are then discretized using a k-space pseudospectral technique in which the spatial gradients are computed using the Fourier-collocation method. This increases the accuracy of the gradient calculation and thus relaxes the requirement for dense computational grids compared to conventional finite difference methods. The accuracy and utility of the developed model are demonstrated via several numerical experiments, including the 3D simulation of the beam pattern from a clinical ultrasound probe.
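The core idea of the Fourier-collocation gradient can be illustrated in a few lines. The snippet below is a minimal 1D sketch, assuming a periodic domain and a naive O(N²) DFT in place of an FFT library; it omits the k-space correction applied during time stepping as well as the nonlinear and absorption terms of the actual model. The helpers `dft`, `idft`, `spectral_gradient` and `central_difference` are hypothetical names used only for illustration. It compares the spectral derivative of sin(3x) with a second-order central difference on the same 16-point grid.

```python
import cmath
import math


def dft(values):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    return [sum(values[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]


def idft(coeffs):
    """Inverse of dft()."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]


def spectral_gradient(f, dx):
    """Fourier-collocation derivative: multiply each mode by i*k, then transform back."""
    n = len(f)
    coeffs = dft(f)
    grad = []
    for m in range(n):
        mode = m if m < n // 2 else m - n  # FFT ordering: 0..n/2-1, then negative modes
        k = 2 * math.pi * mode / (n * dx)
        grad.append(1j * k * coeffs[m])
    if n % 2 == 0:
        grad[n // 2] = 0  # drop the Nyquist mode so the derivative stays real
    return [v.real for v in idft(grad)]


def central_difference(f, dx):
    """Second-order finite difference on the same periodic grid, for comparison."""
    n = len(f)
    return [(f[(j + 1) % n] - f[j - 1]) / (2 * dx) for j in range(n)]


n = 16
dx = 2 * math.pi / n
x = [j * dx for j in range(n)]
f = [math.sin(3 * xj) for xj in x]
exact = [3 * math.cos(3 * xj) for xj in x]
spectral_error = max(abs(a - b) for a, b in zip(spectral_gradient(f, dx), exact))
fd_error = max(abs(a - b) for a, b in zip(central_difference(f, dx), exact))
print(f"spectral error {spectral_error:.1e}, central-difference error {fd_error:.1e}")
```

With roughly five points per wavelength, the spectral derivative of this band-limited signal is exact to rounding error, while the central difference is visibly off. This gap is what allows pseudospectral schemes to use coarser grids.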
European Conference on Applications of Evolutionary Computation | 2010
Petr Pospichal; Jiri Jaros; Josef Schwarz
This paper deals with the mapping of the parallel island-based genetic algorithm with unidirectional ring migrations to the nVidia CUDA software model. The proposed mapping is tested using Rosenbrock's, Griewank's and Michalewicz's benchmark functions. The obtained results indicate that our approach achieves speedups of up to seven thousand times compared to one CPU thread while maintaining reasonable result quality. This shows that GPUs have the potential to accelerate GAs and allow much more complex tasks to be solved.
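The island model with unidirectional ring migration can be sketched serially as follows. This is a minimal CPU illustration, not the CUDA mapping described in the paper: the genetic operators (binary tournament, uniform crossover, Gaussian mutation with elitism), the migration interval and all parameter values are assumptions chosen for brevity, and `ring_migrate`, `evolve_island` and `island_ga` are hypothetical names.

```python
import random


def rosenbrock(x):
    """Rosenbrock's benchmark function; global minimum 0 at (1, ..., 1)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2 for i in range(len(x) - 1))


def ring_migrate(islands, fitness):
    """Unidirectional ring: the best individual of island i replaces the worst of island i+1."""
    n = len(islands)
    emigrants = [min(isl, key=fitness) for isl in islands]  # chosen before any replacement
    for i in range(n):
        target = islands[(i + 1) % n]
        worst = max(range(len(target)), key=lambda j: fitness(target[j]))
        target[worst] = list(emigrants[i])


def evolve_island(pop, fitness, rng, sigma=0.1):
    """One generation: tournament selection, uniform crossover, Gaussian mutation, elitism."""
    def tournament():
        a, b = rng.sample(pop, 2)
        return a if fitness(a) < fitness(b) else b

    children = [list(min(pop, key=fitness))]  # keep the elite unchanged
    while len(children) < len(pop):
        p1, p2 = tournament(), tournament()
        child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
        children.append([g + rng.gauss(0.0, sigma) for g in child])
    return children


def island_ga(n_islands=4, pop_size=20, dim=2, generations=200, interval=10, seed=1):
    rng = random.Random(seed)
    islands = [[[rng.uniform(-2, 2) for _ in range(dim)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [evolve_island(isl, rosenbrock, rng) for isl in islands]
        if gen % interval == 0:
            ring_migrate(islands, rosenbrock)
    return min((ind for isl in islands for ind in isl), key=rosenbrock)


best = island_ga()
print(best, rosenbrock(best))
```

Islands evolve independently between migrations, which is what makes the model attractive for a GPU: the only coupling is the periodic exchange of a few individuals around the ring.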
IEEE International Conference on High Performance Computing, Data, and Analytics | 2016
Jiri Jaros; Alistair P. Rendell; Bradley E. Treeby
Model-based treatment planning and exposimetry for high-intensity focused ultrasound requires the numerical simulation of nonlinear ultrasound propagation through heterogeneous and absorbing media. This is a computationally demanding problem due to the large distances travelled by the ultrasound waves relative to the wavelength of the highest frequency harmonic. Here, the k-space pseudospectral method is used to solve a set of coupled partial differential equations equivalent to a generalised Westervelt equation. The model is implemented in C++ and parallelised using the message passing interface (MPI) for solving large-scale problems on distributed clusters. The domain is partitioned using a 1D slab decomposition, and global communication is performed using a sparse communication pattern. Operations in the spatial frequency domain are performed in transposed space to reduce the communication burden imposed by the 3D fast Fourier transform. The performance of the model is evaluated using grid sizes up to 4096×2048×2048 grid points, distributed over a cluster using up to 1024 compute cores. Given the global nature of the gradient calculation, the model shows good strong scaling behaviour, with a speed-up of 1.7x whenever the number of cores is doubled. This means large-scale simulations can be distributed across high numbers of cores on a cluster to minimise execution times with a relatively small overhead. The efficacy of the model is demonstrated by simulating the ultrasound beam pattern for a high-intensity focused ultrasound sonication of the kidney.
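The bookkeeping behind a 1D slab decomposition, and the scaling figure quoted above, can be checked with a little arithmetic. The sketch below assumes the slabs are cut along the 4096-point axis (the abstract does not say which axis is split) and that a field is stored as single-precision reals; `slab_ranges` and `parallel_efficiency` are hypothetical helpers, and no MPI communication is performed.

```python
def slab_ranges(n_planes, n_ranks):
    """Split n_planes along one axis into contiguous slabs, as evenly as possible."""
    if n_ranks > n_planes:
        raise ValueError("a 1D slab decomposition cannot use more ranks than planes")
    base, extra = divmod(n_planes, n_ranks)
    ranges, start = [], 0
    for rank in range(n_ranks):
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def parallel_efficiency(speedup_per_doubling, doublings):
    """Strong-scaling efficiency relative to the starting core count."""
    return (speedup_per_doubling / 2.0) ** doublings


nx, ny, nz = 4096, 2048, 2048
slabs = slab_ranges(nx, 1024)
planes = slabs[0][1] - slabs[0][0]
mib_per_field = planes * ny * nz * 4 / 2**20  # one single-precision real field per rank
print(f"{planes} planes per rank, {mib_per_field:.0f} MiB per real field")
print(f"efficiency after 4 doublings at 1.7x each: {parallel_efficiency(1.7, 4):.2f}")
```

Under these assumptions each of the 1024 ranks holds 4 planes, i.e. 64 MiB per single-precision field, and a 1.7x speed-up per doubling corresponds to 85% efficiency per doubling, or about 52% after four doublings. The `ValueError` reflects the main limitation of slabs: the number of ranks cannot exceed the number of planes along the split axis.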
Congress on Evolutionary Computation | 2012
Jiri Jaros
This paper introduces a novel implementation of the genetic algorithm exploiting a multi-GPU cluster. The proposed implementation employs an island-based genetic algorithm where every GPU evolves a single island. The individuals are processed by CUDA warps, which enables the solution of large knapsack instances and eliminates undesirable thread divergence. The MPI interface is used to exchange genetic material among isolated islands and collect statistical data. The characteristics of the proposed GAs are investigated on a two-node cluster composed of 14 Fermi GPUs and 4 six-core Intel Xeon processors. The overall GPU performance of the proposed GA reaches 5.67 TFLOPS.
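Evaluating one individual with one warp amounts to a strided per-lane partial sum followed by a tree reduction across the 32 lanes. The Python sketch below only simulates that pattern serially to show the arithmetic; the lane assignment and the zero-fitness penalty for overweight solutions are assumptions, not necessarily the scheme used in the paper, and `warp_knapsack_fitness` is a hypothetical name. On real hardware the halving loop would correspond to warp shuffle instructions such as CUDA's `__shfl_down_sync`.

```python
import random

WARP_SIZE = 32


def warp_knapsack_fitness(genome, values, weights, capacity):
    """Simulate one warp evaluating a 0/1 knapsack individual."""
    lane_value = [0] * WARP_SIZE
    lane_weight = [0] * WARP_SIZE
    # Each lane accumulates a strided subset of items: item j goes to lane j % 32.
    for j, bit in enumerate(genome):
        if bit:
            lane = j % WARP_SIZE
            lane_value[lane] += values[j]
            lane_weight[lane] += weights[j]
    # Tree reduction: the lower half of the active lanes adds in the upper half each step.
    offset = WARP_SIZE // 2
    while offset > 0:
        for lane in range(offset):
            lane_value[lane] += lane_value[lane + offset]
            lane_weight[lane] += lane_weight[lane + offset]
        offset //= 2
    total_value, total_weight = lane_value[0], lane_weight[0]
    # Assumed penalty: overweight (infeasible) solutions score zero.
    return total_value if total_weight <= capacity else 0


rng = random.Random(0)
n_items = 1000
values = [rng.randint(1, 100) for _ in range(n_items)]
weights = [rng.randint(1, 100) for _ in range(n_items)]
genome = [rng.randint(0, 1) for _ in range(n_items)]
print(warp_knapsack_fitness(genome, values, weights, capacity=30000))
```

Because every lane executes the same instructions regardless of which items are selected, this layout avoids the branch divergence that a thread-per-individual mapping would suffer on large instances.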
Genetic and Evolutionary Computation Conference | 2011
Petr Pospichal; Eoin Murphy; Michael O'Neill; Josef Schwarz; Jiri Jaros
Several papers show that symbolic regression is suitable for data analysis and prediction in financial markets. Grammatical Evolution (GE), a grammar-based form of Genetic Programming (GP), has been successfully applied to various tasks, including symbolic regression. However, the computational effort needed to calculate the fitness of a solution in GP can often limit the range of possible applications and/or the extent of experimentation undertaken. This paper deals with utilizing mainstream graphics processing units (GPUs) to accelerate GE solving symbolic regression. GPU optimization details are discussed and the NVCC compiler is analyzed. We design an effective mapping of the algorithm to the CUDA framework, and in doing so must tackle constraints of the GPU approach, such as the PCI Express bottleneck and main memory transactions. This is the first time GE has been adapted to run on a GPU. We measure our implementation running on one core of a Core i7 CPU and on a GTX 480 GPU, together with GEVA, a GE library written in Java. Results indicate that our algorithm offers the same convergence and is suitable for larger numbers of regression points, where the GPU reaches speedups of up to 39 times when compared to GEVA on a serial CPU code written in C. In conclusion, when properly utilized, GPUs can offer an interesting performance boost for GE tackling symbolic regression.
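The genotype-to-phenotype mapping at the heart of GE, and the fitness evaluation that the paper offloads to the GPU, can be sketched serially. The toy grammar, the wrapping limit, the function names `ge_map` and `fitness`, and the use of Python's `eval` on the generated string are all assumptions for illustration; GEVA's grammar format and the paper's CUDA kernels are not reproduced here.

```python
# Toy BNF grammar (hypothetical): each non-terminal maps to its list of productions.
GRAMMAR = {
    "<e>": [["<e>", "<o>", "<e>"], ["<v>"]],
    "<o>": [["+"], ["-"], ["*"]],
    "<v>": [["x"], ["1.0"]],
}


def ge_map(codons, start="<e>", max_wraps=2):
    """Map integer codons to an expression string using the standard modulo rule."""
    symbols = [start]
    used = 0
    limit = len(codons) * (max_wraps + 1)
    while True:
        # Always expand the leftmost non-terminal.
        pos = next((p for p, s in enumerate(symbols) if s in GRAMMAR), None)
        if pos is None:
            return " ".join(symbols)
        if used >= limit:
            return None  # invalid individual: codons exhausted even after wrapping
        rules = GRAMMAR[symbols[pos]]
        symbols[pos:pos + 1] = rules[codons[used % len(codons)] % len(rules)]
        used += 1


def fitness(expr, points):
    """Sum of absolute errors over (x, y) regression points; lower is better."""
    if expr is None:
        return float("inf")
    # eval is tolerable here only because the grammar restricts the alphabet.
    return sum(abs(eval(expr, {"__builtins__": {}}, {"x": x}) - y) for x, y in points)


points = [(float(x), 2.0 * x) for x in range(-5, 6)]
phenotype = ge_map([0, 1, 0, 0, 1, 0])
print(phenotype, fitness(phenotype, points))
```

The fitness loop over regression points is the data-parallel part: each point can be evaluated independently, which is why the GPU benefit grows with the number of points.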
International Ultrasonics Symposium | 2014
Bradley E. Treeby; Jiri Jaros; Daniel Rohrbach; Ben Cox
A new model for simulating elastic wave propagation using the open-source k-Wave MATLAB Toolbox is described. The model is based on two coupled first-order equations describing the stress and particle velocity within an isotropic medium. For absorbing media, the Kelvin-Voigt model of viscoelasticity is used. The equations are discretised in 2D and 3D using an efficient time-stepping pseudospectral scheme. This uses the Fourier collocation spectral method to compute spatial derivatives and a leapfrog finite-difference scheme to integrate forwards in time. A multi-axial perfectly matched layer (M-PML) is implemented to allow free-field simulations using a finite-sized computational grid. Acceleration using a graphics processing unit (GPU) is supported via the MATLAB Parallel Computing Toolbox. An overview of the simulation functions and their theoretical and numerical foundations is given.
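The staggered leapfrog update for stress and particle velocity, including a Kelvin-Voigt viscous term, can be sketched in 1D. To keep it short, this sketch swaps the Fourier collocation derivative for a staggered second-order finite difference, uses a periodic grid instead of a PML, and treats the viscous term explicitly; the function `kelvin_voigt_1d` and its parameters (`mu`, `eta`, `cfl`) are assumptions, not k-Wave options. It propagates a Gaussian stress pulse once around the domain and returns the initial and final stress.

```python
import math


def kelvin_voigt_1d(n=200, dx=1.0, rho=1.0, mu=1.0, eta=0.0, cfl=0.5):
    """1D staggered-grid leapfrog for sigma = mu*strain + eta*strain_rate (periodic grid)."""
    c = math.sqrt(mu / rho)
    dt = cfl * dx / c
    steps = round(n * dx / (c * dt))  # time for a wave to cross the whole domain once
    x0, width = n * dx / 2, 10 * dx
    sigma = [math.exp(-((j * dx - x0) / width) ** 2) for j in range(n)]  # at grid points
    initial = list(sigma)
    v = [0.0] * n     # particle velocity at j + 1/2, half a time step behind sigma
    rate = [0.0] * n  # strain rate dv/dx from the previous step
    for _ in range(steps):
        # Momentum: rho dv/dt = d(sigma)/dx
        for j in range(n):
            v[j] += dt / rho * (sigma[(j + 1) % n] - sigma[j]) / dx
        # Constitutive law: d(sigma)/dt = mu * dv/dx + eta * d/dt(dv/dx)
        for j in range(n):
            new_rate = (v[j] - v[j - 1]) / dx  # v[-1] wraps around the periodic grid
            sigma[j] += dt * mu * new_rate + eta * (new_rate - rate[j])
            rate[j] = new_rate
    return initial, sigma


initial, elastic = kelvin_voigt_1d()
_, damped = kelvin_voigt_1d(eta=0.1)
print(max(abs(a - b) for a, b in zip(initial, elastic)), max(elastic), max(damped))
```

In the lossless case the two half-amplitude pulses travel around the domain and recombine close to the initial profile; with `eta > 0` the peak is lower because the viscous term attenuates higher spatial frequencies more strongly.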
Genetic and Evolutionary Computation Conference | 2009
Jiri Jaros
The paper deals with the optimization of collective communications on multistage interconnection networks (MINs). In the experimental work, unidirectional MINs such as Omega, Butterfly and Clos networks are investigated. The study is complemented by bidirectional binary trees, fat trees and full binary trees. To avoid link contention and the associated delays, collective communications are processed in synchronized steps. The minimum number of steps is sought for the given network topology, wormhole switching, minimum routing and given sets of sender and/or receiver nodes. The evolutionary algorithm proposed in this paper is able to design optimal schedules for broadcast and scatter collective communications. The acquired optimum schedules can simplify the subsequent writing of high-performance communication routines for application-specific networks on chip, or the development of communication libraries in the case of general-purpose multistage interconnection networks.
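Link contention in a unidirectional MIN can be illustrated on an Omega network. The sketch below assumes the textbook construction (a perfect shuffle before each stage of 2x2 switches) with destination-tag routing, and simply checks whether the transfers of one communication step share a switch output link. It is not the evolutionary scheduler from the paper, only the kind of validity test such a scheduler needs; `omega_route` and `contention_free` are hypothetical names.

```python
def omega_route(src, dst, n_bits):
    """Destination-tag routing through an Omega network with 2**n_bits ports.

    Returns the (stage, output port) links used by the message."""
    size = 1 << n_bits
    port, links = src, []
    for stage in range(n_bits):
        port = ((port << 1) | (port >> (n_bits - 1))) & (size - 1)  # perfect shuffle
        bit = (dst >> (n_bits - 1 - stage)) & 1  # destination tag, most significant bit first
        port = (port & ~1) | bit                 # 2x2 switch: pass straight or exchange
        links.append((stage, port))
    return links


def contention_free(transfers, n_bits):
    """True if no two (src, dst) transfers in one step share a link."""
    used = set()
    for src, dst in transfers:
        for link in omega_route(src, dst, n_bits):
            if link in used:
                return False
            used.add(link)
    return True


identity = [(p, p) for p in range(8)]
print(contention_free(identity, 3))          # every port sends to itself
print(contention_free([(0, 1), (2, 0)], 2))  # both transfers need stage-0 output port 0
```

A scheduler would split a collective operation into steps so that every step passes this check; the number of such steps is the quantity the paper minimizes.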
European Conference on Applications of Evolutionary Computation | 2012
Jiri Jaros; Petr Pospichal
The paper introduces an optimized multicore CPU implementation of the genetic algorithm and compares its performance with a fine-tuned GPU version. The main goal is to show the true performance relation between modern CPUs and GPUs and to eradicate some of the myths surrounding GPU performance. When benchmarking CPUs and GPUs, it is essential for the evolutionary community to give both implementations the same conditions and the same design effort. Here we present a performance comparison, supported by architectural characteristics, that narrows the performance gain of GPUs.
Congress on Evolutionary Computation | 2007
Jiri Jaros; Josef Schwarz
The paper presents a new concept of a parallel bivariate marginal distribution algorithm (BMDA) using the stepping-stone model of communication with a unidirectional ring topology. The traditional migration of individuals is compared with a newly proposed technique of probability model migration. The idea of the new xBMDA algorithms is to modify the learning of the classical probability model (applied in the sequential BMDA). In the first strategy, adaptive learning of the resident probability model is used: the evaluation of pair dependency using Pearson's chi-square statistic is influenced by the corresponding immigrant pair dependency according to the quality of the resident and immigrant subpopulations. In the second proposed strategy, the evaluation metric is applied to the diploid mode of the aggregated resident and immigrant subpopulation. Experimental results show that the proposed adaptive BMDA outperforms the traditional concept of individual migration.
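The pairwise dependency measure in BMDA is Pearson's chi-square statistic computed from the observed bivariate and marginal frequencies of two binary variables. The first function below sketches that statistic. The second, `blended_dependency`, is a hypothetical quality-weighted mix of resident and immigrant estimates meant only to illustrate the idea of adaptive learning; the abstract does not give the paper's exact weighting, so this formula is an assumption.

```python
from itertools import product


def pair_chi_square(population, i, j):
    """Pearson's chi-square statistic for binary variables i and j in a population."""
    n = len(population)
    p_i = [sum(1 for ind in population if ind[i] == v) / n for v in (0, 1)]
    p_j = [sum(1 for ind in population if ind[j] == v) / n for v in (0, 1)]
    chi = 0.0
    for a, b in product((0, 1), repeat=2):
        expected = p_i[a] * p_j[b]
        if expected == 0.0:
            continue  # a constant variable carries no evidence of dependency
        observed = sum(1 for ind in population if ind[i] == a and ind[j] == b) / n
        chi += (observed - expected) ** 2 / expected
    return n * chi


def blended_dependency(chi_resident, chi_immigrant, q_resident, q_immigrant):
    """Hypothetical rule: weight each estimate by the relative quality of its subpopulation."""
    w = q_resident / (q_resident + q_immigrant)
    return w * chi_resident + (1.0 - w) * chi_immigrant


independent = [[0, 0], [0, 1], [1, 0], [1, 1]]
linked = [[0, 0], [1, 1]] * 4
print(pair_chi_square(independent, 0, 1), pair_chi_square(linked, 0, 1))
```

For perfectly linked binary variables the statistic equals the population size, while independent ones give zero. In BMDA a pair is commonly treated as dependent when the statistic exceeds 3.84, the 95% critical value for one degree of freedom.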
Genetic and Evolutionary Computation Conference | 2007
Jiri Jaros; Milos Ohlidal; Vaclav Dvorak
In this paper, we describe two evolutionary algorithms aimed at scheduling collective communications on interconnection networks of parallel computers. To avoid contention for links and the associated delays, collective communications proceed in synchronized steps. The minimum number of steps is sought for the given network topology, wormhole (pipelined) switching, minimum routing and given sets of sender and/or receiver nodes. The algorithms are able not only to re-invent optimum schedules for known symmetric topologies such as hypercubes, but also to find schedules for any asymmetric or irregular topology in the case of general many-to-many collective communications. In most cases, the number of steps reaches the theoretical lower bound for the given type of collective communication; if it does not, non-minimum routing can provide further improvement. Optimum schedules may serve for writing high-performance communication routines for application-specific networks on chip or for the development of communication libraries in the case of general-purpose interconnection networks.
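A candidate schedule found by such an algorithm has to satisfy simple invariants: each sender already holds the message, no node sends twice in one step, and no two routes in the same step share a directed link. The checker below sketches those rules with BFS shortest-path (minimum) routing on a connected network. The one-port restriction and the names `shortest_path`, `valid_broadcast`, `hypercube` and `hypercube_broadcast` are assumptions for illustration. It validates the classic recursive-doubling broadcast on a 3D hypercube, which meets the ceil(log2(8)) = 3 step lower bound.

```python
from collections import deque


def shortest_path(adj, src, dst):
    """Minimum routing via BFS; returns the directed links (u, v) from src to dst."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for w in adj[u]:
            if w not in prev:
                prev[w] = u
                queue.append(w)
    links, node = [], dst
    while prev[node] is not None:
        links.append((prev[node], node))
        node = prev[node]
    return links[::-1]


def valid_broadcast(adj, root, steps):
    """Check a step-by-step broadcast schedule for a connected network."""
    informed = {root}
    for step in steps:
        used_links, senders = set(), set()
        for src, dst in step:
            if src not in informed or src in senders:
                return False  # sender lacks the message, or violates the one-port assumption
            senders.add(src)
            for link in shortest_path(adj, src, dst):
                if link in used_links:
                    return False  # two routes compete for the same link in this step
                used_links.add(link)
        informed |= {dst for _, dst in step}
    return informed == set(adj)


def hypercube(dim):
    return {node: [node ^ (1 << b) for b in range(dim)] for node in range(1 << dim)}


def hypercube_broadcast(dim):
    """Recursive doubling: in step d every informed node sends across dimension d."""
    return [[(node, node | (1 << d)) for node in range(1 << d)] for d in range(dim)]


print(valid_broadcast(hypercube(3), 0, hypercube_broadcast(3)))
```

For irregular topologies no such closed-form schedule exists, which is where a search method such as an evolutionary algorithm can use a checker like this one as part of its fitness evaluation.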