Stéphane Domas | Researchain

Publication

Featured research published by Stéphane Domas.

The Journal of Supercomputing | 2012

Sparse systems solving on GPUs with GMRES

Raphaël Couturier; Stéphane Domas

Scientific applications very often rely on solving one or more linear systems. When matrices are sparse, iterative methods are preferred to direct ones. Nevertheless, the value of nonzero elements and their distribution (i.e., the sketch of the matrix) greatly influence the efficiency of those methods (in terms of computation time, number of iterations, result precision) or simply prevent the convergence. Among iterative methods, GMRES (Saad, Iterative methods for sparse linear systems. PWS Publishing, New York, 1996) is often chosen when dealing with general nonsymmetric matrices. Indeed its convergence is very fast and more stable than the biconjugate gradient. Furthermore, it is mainly based on mathematical operations (matrix-vector and dot products, norms, etc.) that can be heavily parallelized and is thus a good candidate to implement a solver for sparse systems on Graphics Processing Units (GPU). This paper presents a GMRES method for such an architecture. It is based on the modified Gram–Schmidt approach and is very similar to that of Sparselib (Barrett et al., Templates for the solution of linear systems: building blocks for iterative methods, SIAM, Philadelphia, 1994). Our version uses restarting and a very basic preconditioning. For its implementation, we have based our code on CUBLAS (NVIDIA, http://developer.download.nvidia.com/compute/cuda/2_1/toolkit/docs/CUBLAS_Library_2.1.pdf, 2008) and SpMV (Bell and Garland, Efficient sparse matrix-vector multiplication on CUDA. NVIDIA technical report NVR-2008-004, 2008) libraries, in order to achieve a good performance whatever the matrix sizes and their sketch are. Our experiments exhibit encouraging results on the comparison between Central Processing Units (CPU) and GPU executions in double precision, obtaining a speedup ranging from 8 up to 23 for a large variety of problems.
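To make the structure of restarted GMRES with modified Gram-Schmidt concrete, here is a minimal CPU sketch in plain Python. It is not the authors' GPU code: it uses dense lists instead of sparse storage, omits the preconditioning the abstract mentions, and the names `gmres_restarted`, `matvec`, and `dot` are hypothetical helpers chosen for illustration. The matrix-vector and dot products it relies on are the operations the abstract identifies as good candidates for parallelization.

```python
import math

def matvec(A, x):
    """Dense matrix-vector product (stand-in for a sparse SpMV kernel)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def gmres_restarted(A, b, restart=10, tol=1e-10, max_restarts=50):
    """Restarted GMRES with modified Gram-Schmidt; no preconditioning (assumption)."""
    n = len(b)
    x = [0.0] * n
    b_norm = math.sqrt(dot(b, b))
    if b_norm == 0.0:
        return x
    m = min(restart, n)
    for _ in range(max_restarts):
        r = [bi - ai for bi, ai in zip(b, matvec(A, x))]
        beta = math.sqrt(dot(r, r))
        if beta / b_norm < tol:
            break
        V = [[ri / beta for ri in r]]            # orthonormal Krylov basis
        H = [[0.0] * m for _ in range(m + 1)]    # Hessenberg matrix, H[i][j]
        cs, sn = [0.0] * m, [0.0] * m            # Givens rotation coefficients
        g = [beta] + [0.0] * m                   # rotated right-hand side
        k = 0                                    # number of usable basis vectors
        for j in range(m):
            w = matvec(A, V[j])
            for i in range(j + 1):               # modified Gram-Schmidt step
                H[i][j] = dot(V[i], w)
                w = [wi - H[i][j] * vi for wi, vi in zip(w, V[i])]
            h_next = math.sqrt(dot(w, w))
            for i in range(j):                   # apply earlier rotations to column j
                t = cs[i] * H[i][j] + sn[i] * H[i + 1][j]
                H[i + 1][j] = -sn[i] * H[i][j] + cs[i] * H[i + 1][j]
                H[i][j] = t
            denom = math.hypot(H[j][j], h_next)
            if denom == 0.0:                     # degenerate column, stop this cycle
                break
            cs[j], sn[j] = H[j][j] / denom, h_next / denom
            H[j][j] = denom                      # new rotation zeroes the subdiagonal
            g[j + 1] = -sn[j] * g[j]
            g[j] = cs[j] * g[j]
            k = j + 1
            if h_next < 1e-14 or abs(g[j + 1]) / b_norm < tol:
                break
            V.append([wi / h_next for wi in w])
        y = [0.0] * k                            # back substitution on the k x k block
        for i in range(k - 1, -1, -1):
            s = g[i] - sum(H[i][c] * y[c] for c in range(i + 1, k))
            y[i] = s / H[i][i]
        for j in range(k):                       # x += V_k y
            x = [xi + y[j] * vj for xi, vj in zip(x, V[j])]
    return x
```

A call such as `gmres_restarted(A, b, restart=3)` on a small nonsymmetric system restarts the Krylov basis every three iterations, trading memory for possibly more iterations.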

International Parallel and Distributed Processing Symposium | 2007

CRAC: a Grid Environment to Solve Scientific Applications with Asynchronous Iterative Algorithms

Raphaël Couturier; Stéphane Domas

This paper presents CRAC, an environment dedicated to designing efficient asynchronous iterative algorithms for a grid architecture. Those algorithms are particularly suited to grid architectures since they naturally allow communications to be overlapped with computations. Each processor computes its iterations freely without any synchronization with its neighbors. All the characteristics of CRAC are described. A real application using four distant clusters, with a total of 120 processors, shows the benefit of this environment and of asynchronous algorithms.

Signal Processing Systems | 2014

Fine-tuned High-speed Implementation of a GPU-based Median Filter

Gilles Perrot; Stéphane Domas; Raphaël Couturier

Median filtering is a well-known method used in a wide range of application frameworks as well as a standalone filter, especially for salt-and-pepper denoising. It can greatly reduce noise power while minimizing edge blurring. Existing algorithms and implementations are quite efficient but may be improved as far as processing speed is concerned, which has led us to further investigate the specificities of modern GPUs. In this paper, we propose a GPU implementation of fixed-size kernel median filters, able to output up to 1.85 billion pixels per second on C2070 Tesla cards. Based on a Branchless Vectorized Median class algorithm and implemented through memory fine-tuning and the use of GPU registers, our median filter drastically outperforms existing implementations, resulting, as far as we know, in the fastest median filter to date.
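As an illustration of a fixed-size median selection built from a data-independent sequence of compare-exchange steps, the following plain-Python sketch filters a 3×3 neighborhood. It is not the authors' Branchless Vectorized Median algorithm: it uses a generic odd-even transposition network, and the names `median9` and `median_filter_3x3` are hypothetical. Copying border pixels unchanged is also an assumption made for brevity.

```python
def median9(window):
    """Median of 9 values via an odd-even transposition network.

    The order of compare-exchange steps never depends on the data, which is
    the property that lets such networks run without divergent branches.
    """
    p = list(window)
    for rnd in range(9):                      # 9 rounds fully sort 9 values
        for i in range(rnd % 2, 8, 2):        # alternate even and odd pairs
            lo, hi = min(p[i], p[i + 1]), max(p[i], p[i + 1])
            p[i], p[i + 1] = lo, hi
    return p[4]                               # middle element of the sorted window

def median_filter_3x3(img):
    """Apply a 3x3 median filter to a 2D list; border pixels are copied as-is."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            win = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median9(win)
    return out
```

On an image containing an isolated bright pixel (salt noise) surrounded by uniform values, the filter replaces that pixel with the surrounding value while leaving the uniform region unchanged.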

Computer and Information Technology | 2011

GPU Implementation of a Region Based Algorithm for Large Images Segmentation

Gilles Perrot; Stéphane Domas; Raphaël Couturier; Nicolas Bertaux

Image segmentation is one of the most challenging issues in image computing. In this work, we focus on region-based active contour techniques (snakes) as they seem to achieve a high level of robustness and fit a large range of applications. Some algorithmic optimizations provide significant speedups, but even so, execution times are still non-negligible given the continuing increase of image sizes. Moreover, these algorithms are not well suited to running on multi-core CPUs. At the same time, recent developments in Graphics Processing Units (GPU) suggest that higher speedups could be obtained by exploiting their specific design. We have managed to adapt an especially efficient snake algorithm so that it fits recent Nvidia GPU architectures and takes advantage of their massively multi-threaded execution capabilities. The speedup obtained is most often around 7.

International Conference on Thermal, Mechanical and Multi-Physics Simulation and Experiments in Microelectronics and Microsystems | 2011

Modeling, filtering and optimization for AFM arrays

Hui Hui; Y. Yakoubi; Michel Lenczner; Scott Cogan; André Meister; Mélanie Favre; Raphaël Couturier; Stéphane Domas

In this paper, we present new tools and results developed for arrays of microsystems, and especially for Atomic Force Microscope (AFM) array design. For modeling, we developed a two-scale model of cantilever arrays in elastodynamics. A robust optimization toolbox is interfaced to aid design before the microfabrication process. A model-based algorithm for static state estimation, using measurements of mechanical displacements by interferometry, is stated. Quantization of interferometry data processing is analyzed for FPGA implementation. A robust H∞ filtering problem for the coupled cantilevers is solved for a time-invariant system with random noise effects. Our solution allows semi-decentralized computing based on functional calculus that can be implemented by networks of distributed electronic circuits, as shown in a previous paper.

Concurrency and Computation: Practice and Experience | 2016

An optimized GPU-based 2D convolution implementation

Gilles Perrot; Stéphane Domas; Raphaël Couturier

With the increasing sophistication of image processing algorithms, and because of its low computation complexity, convolution should fully benefit from the ever-increasing capacities of state-of-the-art graphics processing units, such as Nvidia's Kepler and Maxwell family cards. Currently, it tends to be used as a preprocessing stage within more intricate image manipulations and has recently been implemented quite efficiently by several teams. However, their implementations either do not come near the hardware's peak performance or are unable to process large mask sizes. Such limitations are overcome by our original parallel, register-only implementation of two-dimensional convolution filters, which can process 32-bit floating-point images on an Nvidia K40 card using mask sizes up to 127×127 while achieving pixel throughputs over 29 GP/s, which is, as far as we know, the highest rate known to date. Such results were obtained by using registers sparingly and by designing memory access patterns that cancel both load and store replays at warp level, along with optimizing cache use.
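For readers unfamiliar with the operation being accelerated, the following plain-Python sketch computes a direct two-dimensional convolution over the region where the mask fits entirely inside the image. It only illustrates the arithmetic; it does not reflect the register-only GPU design described in the abstract, and the function name `convolve2d_valid` is hypothetical.

```python
def convolve2d_valid(image, mask):
    """Direct 2D convolution of a 2D list with a mask, 'valid' region only.

    The mask is flipped in both directions, as in the mathematical definition
    of convolution (as opposed to correlation).
    """
    mh, mw = len(mask), len(mask[0])
    oh, ow = len(image) - mh + 1, len(image[0]) - mw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            acc = 0.0
            for i in range(mh):
                for j in range(mw):
                    acc += image[y + i][x + j] * mask[mh - 1 - i][mw - 1 - j]
            out[y][x] = acc
    return out
```

Each output pixel costs one multiply-add per mask element, so the work grows with the mask area; this is why large masks such as the 127×127 case mentioned above are demanding.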

International Conference on Cloud and Big Data Computing | 2017

A Hole-Filling Framework Based on DIBR and Improved Criminisi's Inpainting Algorithm for 3D Videos

Ke Du; Stéphane Domas; Mengdie Wu; Michel Lenczner

This study is based on the properties of Depth Image Based Rendering (DIBR), especially on the characteristics of holes caused by disocclusion. In order to recover the texture and the structure in missing areas and to improve the quality of the rendered image, some research has been done on the hole-filling process for the virtual view image, starting from Criminisi's inpainting algorithm. Depth information is taken into consideration in the hole-filling framework for 3D videos that we propose. Some pre-processing steps are also added, either to enhance the quality of the synthesized image in the virtual view or to speed up the processing. Experimental results show that the proposed framework performs better than the existing method: both the quality of the synthesized image and the processing speed are improved.

International Conference on Intelligent Computing | 2016

An Improved Algorithm Based on SURF for MR Infant Brain Image Registration

Ke Du; Stéphane Domas; Michel Lenczner; Guangjin Zhang

The correct diagnosis of brain diseases is crucial for children with brain disorders, but the complex characteristics of the infant brain make image analysis very complicated. Thus, accurate image registration is a prerequisite for accurate analysis of MR infant brain images, and it provides valuable information for diagnosis. This paper presents our research work on the SURF registration algorithm for 2-D MR infant brain images. We first describe the original algorithm and analyze its advantages and drawbacks. Then an improved version is proposed, which uses 8-D descriptor vectors with the length of 128. Experimental results show that, compared with the original version, our algorithm achieves more accurate image registration at the cost of slightly more computation time. For all the images tested, the increase in correct matching rate over the classical algorithm ranges from a minimum of 5.7% to a maximum of 14.9%.

2012 Second Workshop on Design, Control and Software Implementation for Distributed MEMS | 2012