
Publication


Featured research published by Valeriu Codreanu.


IEEE Transactions on Visualization and Computer Graphics | 2016

CUBu: Universal Real-Time Bundling for Large Graphs

Matthew van der Zwan; Valeriu Codreanu; Alexandru Telea

Visualizing very large graphs by edge bundling is a promising method, yet subject to several challenges: speed, clutter, level-of-detail, and parameter control. We present CUBu, a framework that addresses these problems in an integrated way. Fully GPU-based, CUBu bundles graphs of up to a million edges at interactive framerates, over 50 times faster than comparable state-of-the-art methods, and offers simple, intuitive control of its bundling parameters. CUBu extends and unifies existing bundling techniques, offering ways to control bundle shapes, separate bundles by edge direction, and shade bundles to create a level-of-detail visualization that shows both the graph's core structure and its details. We demonstrate CUBu on several large graphs extracted from real-life application domains.
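
The core of density-based bundling is easy to sketch: sample points along each edge, splat them into a density map, then advect the samples up the density gradient so nearby edges attract each other. The sketch below is a minimal CPU illustration of that idea (in the spirit of kernel-density-based bundling, not CUBu's actual GPU pipeline); the function name and parameters are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def bundle_step(points, grid=512, bandwidth=8.0, step=2.0):
    """One density-based bundling iteration over edge sample points.

    points: (n_edges, n_samples, 2) array of xy positions in [0, grid).
    Endpoints (first/last sample of each edge) are kept fixed.
    """
    density = np.zeros((grid, grid))
    ix = points[..., 0].astype(int).clip(0, grid - 1)
    iy = points[..., 1].astype(int).clip(0, grid - 1)
    np.add.at(density, (iy, ix), 1.0)           # splat samples into density map
    density = gaussian_filter(density, bandwidth)
    gy, gx = np.gradient(density)               # advect samples up the gradient
    moved = points.copy()
    moved[..., 0] += step * gx[iy, ix]
    moved[..., 1] += step * gy[iy, ix]
    moved[:, 0], moved[:, -1] = points[:, 0], points[:, -1]  # pin endpoints
    return moved.clip(0, grid - 1)
```

On a GPU, the splatting and advection steps map naturally onto one thread per sample point, which is what makes interactive rates on million-edge graphs plausible.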


International Conference on High Performance Computing and Simulation | 2013

GPU-ASIFT: A fast fully affine-invariant feature extraction algorithm

Valeriu Codreanu; Feng Dong; Baoquan Liu; Jos B. T. M. Roerdink; David Williams; Po Yang; Burhan Yasar

This paper presents a method that takes advantage of powerful graphics hardware to obtain fully affine-invariant image feature detection and matching. The chosen approach is the accurate, but also very computationally expensive, ASIFT algorithm. We have created a CUDA version of this algorithm that is up to 70 times faster than the original implementation, while keeping the algorithm's accuracy close to that of ASIFT. Its matching performance is therefore much better than that of other, non-fully-affine-invariant algorithms. We also adapted the approach to the multi-GPU paradigm in order to assess the acceleration potential of modern GPU clusters.
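
ASIFT's full affine invariance comes from simulating many affine camera views (samples of tilt and rotation) and running a SIFT-like detector on each one, which is both what makes it expensive and what makes it embarrassingly parallel on a GPU. A minimal sketch of the view-simulation loop, assuming SciPy and leaving the per-view detector as a plug-in:

```python
import numpy as np
from scipy import ndimage

def simulate_affine_views(image, max_tilt_level=3):
    """Yield affine-warped views of `image` following ASIFT-style sampling:
    tilts t = sqrt(2)^k and rotations phi stepped ~72/t degrees per tilt.
    A SIFT-like detector would then run on every view (on the GPU in the paper).
    """
    yield image, (1.0, 0.0)                      # the original (t = 1) view
    for k in range(1, max_tilt_level + 1):
        t = np.sqrt(2.0) ** k
        for phi in np.arange(0.0, 180.0, 72.0 / t):
            rot = ndimage.rotate(image, phi, reshape=True, order=1)
            # tilt = anisotropic subsampling by factor t, with an
            # anti-aliasing blur along the compressed axis
            blurred = ndimage.gaussian_filter(
                rot, sigma=(0.8 * np.sqrt(t * t - 1), 0))
            view = ndimage.zoom(blurred, (1.0 / t, 1.0), order=1)
            yield view, (t, phi)
```

Each generated view is independent of the others, so the views can be distributed across GPUs, which matches the multi-GPU adaptation described above.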


Computers & Graphics | 2014

Parallel centerline extraction on the GPU

Baoquan Liu; Alexandru Telea; Jos B. T. M. Roerdink; Gordon J. Clapworthy; David Williams; Po Yang; Feng Dong; Valeriu Codreanu; Alessandro Chiarini

Centerline extraction is important in a variety of visualization applications including shape analysis, geometry processing, and virtual endoscopy. Centerlines allow accurate measurements of length along winding tubular structures, assist automatic virtual navigation, and provide a path-planning system to control the movement and orientation of a virtual camera. However, efficiently computing centerlines with the desired accuracy has been a major challenge. Existing centerline methods are either not fast enough or not accurate enough for interactive application to complex 3D shapes. Some methods based on distance mapping are accurate, but these are sequential algorithms which have limited performance when running on the CPU. To our knowledge, there is no accurate parallel centerline algorithm that can take advantage of modern many-core parallel computing resources, such as GPUs, to perform automatic centerline extraction from large data volumes at interactive speed and with high accuracy. In this paper, we present a new parallel centerline extraction algorithm suitable for implementation on a GPU to produce highly accurate, 26-connected, one-voxel-thick centerlines at interactive speed. The resulting centerlines are as accurate as those produced by a state-of-the-art sequential CPU method [40], while being computed hundreds of times faster. Applications to fly-through path planning and virtual endoscopy are discussed. Experimental results demonstrating centeredness, robustness and efficiency are presented.
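
As a rough illustration of the distance-mapping family of methods mentioned above: compute each voxel's distance to the shape boundary, then find a minimal-cost path between two endpoints where voxels far from the boundary are cheap, so the path is pulled toward the center. The sketch below is 2D and 8-connected (the paper works in 3D with 26-connectivity) and is illustrative only, not the paper's GPU algorithm:

```python
import heapq
import numpy as np
from scipy.ndimage import distance_transform_edt

def centerline(mask, start, end):
    """Trace a centered path through a binary shape via Dijkstra search
    over a cost field derived from the distance transform."""
    dist = distance_transform_edt(mask)
    cost = 1.0 / (1.0 + dist)                # cheaper far from the boundary
    best = {start: 0.0}; prev = {}; heap = [(0.0, start)]
    while heap:
        c, p = heapq.heappop(heap)
        if p == end:
            break
        y, x = p
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                q = (y + dy, x + dx)
                if (dy or dx) and 0 <= q[0] < mask.shape[0] \
                        and 0 <= q[1] < mask.shape[1] and mask[q]:
                    nc = c + cost[q]
                    if nc < best.get(q, np.inf):
                        best[q] = nc; prev[q] = p
                        heapq.heappush(heap, (nc, q))
    path = [end]                             # walk predecessors back to start
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

The sequential priority-queue search is exactly the kind of step that limits CPU distance-mapping methods; the paper's contribution is a parallel formulation that avoids it.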


International Symposium on Electronics and Telecommunications | 2010

Performance gain from data and control dependency elimination in embedded processors

Valeriu Codreanu; Radu Hobincu

This paper presents a way of increasing overall performance in embedded processors by introducing an interleaved multithreading execution model that can be applied to any Instruction Set Architecture. Common acceleration techniques such as superpipelining or branch prediction are poorly suited to embedded machines due to their inherent inefficiency. We show that by removing dependencies within the processor, and thus eliminating the extra hardware required to maintain overall coherence, we obtain a noticeable increase in performance (up to 450%) as well as a decrease in size and power consumption. The approach also maintains backward compatibility with legacy software, keeping required software changes to a minimum.
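
The performance argument can be made concrete with a back-of-the-envelope model: a single-threaded in-order pipeline stalls for the full result latency on a chain of dependent instructions, while round-robin issue from enough independent threads spaces each thread's instructions out so its previous result is always ready, removing the need for forwarding and interlock hardware. The model below is illustrative only; the numbers are hypothetical, not from the paper:

```python
def cycles_single_thread(n_instr, dep_latency=3):
    """Each instruction depends on the previous one and must wait
    dep_latency cycles for its result: the pipeline stalls on every issue."""
    return n_instr * dep_latency

def cycles_interleaved(n_instr, n_threads=4, dep_latency=3):
    """Round-robin issue from n_threads independent threads: by the time a
    thread issues again, its previous result is ready if n_threads covers
    the latency, so no stall or forwarding logic is needed at all."""
    spacing = max(n_threads, dep_latency)    # cycles between a thread's issues
    per_thread = n_instr // n_threads
    return per_thread * spacing

# e.g. 1000 dependent instructions, latency 3, 4 threads:
# single thread: 3000 cycles; interleaved: 1000 cycles (3x throughput),
# with less hardware rather than more.
```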


Signal Processing: Image Communication | 2016

GSWO: A programming model for GPU-enabled parallelization of sliding window operations in image processing

Po Yang; Gordon J. Clapworthy; Feng Dong; Valeriu Codreanu; David Williams; Baoquan Liu; Jos B. T. M. Roerdink; Zhikun Deng

Sliding Window Operations (SWOs) are widely used in image processing applications. They often have to be performed repeatedly across the target image, which can demand significant computing resources when processing large images with large windows. In applications in which real-time performance is essential, running these filters on a CPU often fails to deliver results within an acceptable timeframe. The emergence of sophisticated graphics processing units (GPUs) presents an opportunity to address this challenge. However, GPU programming has a steep learning curve and is error-prone for novices, so a tool that can automatically produce a GPU implementation from the original CPU source code is an attractive means of harnessing GPU power effectively. This paper presents a GPU-enabled programming model, called GSWO, which assists GPU novices by converting their SWO-based image processing applications from the original C/C++ source code to CUDA code in a highly automated manner. The model includes a new set of simple SWO pragmas to generate GPU kernels and to support effective GPU memory management. We have implemented this programming model on top of a CPU-to-GPU translator (C2GPU). Evaluations have been performed on a number of typical SWO image filters and applications. The experimental results show that the GSWO model efficiently accelerates these applications, with better applicability and performance than several leading CPU-to-GPU source-to-source translators.
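
The loop structure GSWO targets is easy to picture: one output pixel per iteration, each computed from a small window, with no dependence between iterations, which is exactly why an SWO maps onto one GPU thread per pixel. Below is a minimal NumPy sketch of such a filter and its data-parallel equivalent; this illustrates the loop shape only and is not the GSWO pragma syntax:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def mean_filter_naive(img, k=3):
    """The CPU loop shape an SWO pragma would annotate: every (y, x)
    iteration is independent of all others."""
    h, w = img.shape
    r = k // 2
    out = np.zeros_like(img, dtype=float)
    for y in range(r, h - r):
        for x in range(r, w - r):
            out[y, x] = img[y - r:y + r + 1, x - r:x + r + 1].mean()
    return out

def mean_filter_vectorized(img, k=3):
    """The equivalent data-parallel formulation: what a generated GPU
    kernel computes, one window per thread."""
    win = sliding_window_view(img.astype(float), (k, k))
    r = k // 2
    out = np.zeros_like(img, dtype=float)
    out[r:-r, r:-r] = win.mean(axis=(2, 3))
    return out
```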


International Conference on Parallel Processing | 2013

Evaluation of Autoparallelization Toolkits for Commodity GPUs

David Williams; Valeriu Codreanu; Po Yang; Baoquan Liu; Feng Dong; Burhan Yasar; Babak Mahdian; Alessandro Chiarini; Xia Zhao; Jos B. T. M. Roerdink

In this paper we evaluate the performance of the OpenACC and Mint toolkits against C and CUDA implementations of the standard PolyBench test suite. Our analysis reveals that performance is similar in many cases, but that a certain set of code constructs impedes the ability of Mint to generate optimal code. We then present some small improvements which we integrate into our own GPSME toolkit (which is derived from Mint) and show that our toolkit now outperforms OpenACC in the majority of tests.


International Conference on Computer Medical Applications (ICCMA) | 2013

Accelerating colonic polyp detection using commodity graphics hardware

David Williams; Valeriu Codreanu; Jos B. T. M. Roerdink; Po Yang; Baoquan Liu; Feng Dong; Alessandro Chiarini

We present a parallel implementation of an algorithm for the detection of colonic polyps from CT data sets. This implementation is designed specifically to take advantage of the computational power available on modern Graphics Processing Units (GPUs), significantly reducing execution time and streamlining the workflow of clinicians examining the data. We provide details of the changes made to the existing algorithm to suit the new target hardware, and we perform tests demonstrating that the results are a very close match to the reference implementation while being computed in a fraction of the time.


Concurrency and Computation: Practice and Experience | 2016

Evaluating automatically parallelized versions of the support vector machine

Valeriu Codreanu; Bob Dröge; David Williams; Burhan Yasar; Po Yang; Baoquan Liu; Feng Dong; Olarik Surinta; Lambert Schomaker; Jos B. T. M. Roerdink; Marco Wiering

The support vector machine (SVM) is a supervised learning algorithm used for recognizing patterns in data. It is a very popular technique in machine learning and has been successfully used in applications such as image classification, protein classification, and handwriting recognition. However, the computational complexity of the kernelized version of the algorithm grows quadratically with the number of training examples. To tackle this high computational complexity, we have developed a directive-based approach that converts a gradient-ascent based training algorithm for the CPU to an efficient graphics processing unit (GPU) implementation. We compare our GPU-based SVM training algorithm to the standard LibSVM CPU implementation, a highly optimized GPU-LibSVM implementation, as well as to a directive-based OpenACC implementation. The results on different handwritten digit classification datasets demonstrate a significant speed-up for the current approach when compared to the CPU and OpenACC versions. Furthermore, our solution is almost as fast and sometimes even faster than the highly optimized CUBLAS-based GPU-LibSVM implementation, without sacrificing the algorithm's accuracy.
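
The gradient-ascent formulation is compact: maximize the dual objective W(a) = sum_i a_i - 0.5 * sum_ij a_i a_j y_i y_j K_ij subject to 0 <= a_i <= C, where the n-by-n kernel matrix is both the source of the quadratic cost and of the data parallelism the GPU exploits. A minimal NumPy sketch of this style of training (RBF kernel, bias term omitted, labels in {-1, +1}; hyperparameters illustrative, not the paper's):

```python
import numpy as np

def train_svm_gradient_ascent(X, y, C=1.0, gamma=0.05, lr=0.01, epochs=200):
    """Kernel SVM trained by projected gradient ascent on the dual."""
    sq = (X ** 2).sum(1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))  # RBF kernel
    a = np.zeros(len(y))
    for _ in range(epochs):
        grad = 1.0 - y * (K @ (a * y))       # dW/da for all examples at once
        a = np.clip(a + lr * grad, 0.0, C)   # project back into the box [0, C]
    return a

def predict(a, X, y, Xq, gamma=0.05):
    """Sign of the kernel expansion over the training set."""
    Kq = np.exp(-gamma * ((Xq ** 2).sum(1)[:, None] + (X ** 2).sum(1)[None, :]
                          - 2 * Xq @ X.T))
    return np.sign(Kq @ (a * y))
```

Every element of the gradient is an independent dot product against a kernel-matrix row, which is why this formulation converts so naturally to a GPU via directives.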


IEEE Faible Tension Faible Consommation (FTFC) | 2013

VASILE: A reconfigurable vector architecture for instruction level frequency scaling

Lucian Petrica; Valeriu Codreanu; Sorin Cotofana

Coarse-grained dynamic frequency scaling has been extensively utilised in embedded (multiprocessor) platforms to reduce energy and, by implication, to extend autonomy and battery lifetime. In this paper we propose to use fine-grained frequency scaling, i.e., adjusting the frequency at instruction level, to increase the instruction throughput of an FPGA-implemented Vector Processor (VP). We introduce a VP architectural template and an associated design methodology that enable the creation of VP instances tailored to application requirements. For each instance, the data-path delays of individual instructions are optimized separately, guided by profiling data for the target application class, maximizing the performance of frequently utilised instructions to the detriment of those executed less often. Instructions are thus divided into clock-frequency classes according to their data-path delay, and at run time the clock frequency is scaled to the value required by the class of the instruction to be executed. During application execution, different VP instances are dynamically configured in the FPGA to create the most appropriate hardware support for optimizing application throughput without increasing power consumption, thereby reducing energy. As operating-frequency changes induce a time penalty that may diminish the actual performance gain, the application code is optimised during compilation to reduce the number of runtime clock switches via, e.g., loop tiling and instruction clustering. We evaluate the effectiveness of the proposed approach on several computational kernels used in image processing applications, i.e., sum of absolute differences, sum of squared differences, and Gaussian filtering. Our results indicate that an average instruction throughput increase of 20% and a 15% energy consumption reduction are achieved through runtime reconfiguration and fine-grained frequency scaling.
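
The trade-off between per-instruction frequency scaling and clock-switch overhead can be illustrated with a toy cost model; the frequencies and penalty below are hypothetical, not the paper's measurements:

```python
def execution_time_ns(trace, freq_mhz, switch_penalty_ns=5.0):
    """Illustrative model of instruction-level frequency scaling: each
    instruction runs one cycle at its class frequency, and every change
    of frequency class costs a fixed clock-switch penalty."""
    total, prev = 0.0, None
    for instr in trace:
        f = freq_mhz[instr]                  # frequency class of this instruction
        if prev is not None and freq_mhz[prev] != f:
            total += switch_penalty_ns       # clock-switch overhead
        total += 1e3 / f                     # one cycle at f MHz, in ns
        prev = instr
    return total

freq_mhz = {"add": 200.0, "mul": 125.0}      # hypothetical class frequencies
scattered = ["add", "mul"] * 500             # 999 clock switches
clustered = ["add"] * 500 + ["mul"] * 500    # 1 switch (after loop tiling)
# execution_time_ns(scattered, freq_mhz) > execution_time_ns(clustered, freq_mhz):
# clustering same-class instructions amortizes the switch penalty, which is
# exactly why the compiler applies loop tiling and instruction clustering.
```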


International Conference on Pattern Recognition Applications and Methods | 2018

Learning to Evaluate Chess Positions with Deep Neural Networks and Limited Lookahead.

Matthia Sabatelli; Francesco Bidoia; Valeriu Codreanu; Marco Wiering

In this paper we propose a novel supervised learning approach for training Artificial Neural Networks (ANNs) to evaluate chess positions. The method that we present aims to train different ANN architectures to understand chess positions similarly to how highly rated human players do. We investigate the capabilities that ANNs have when it comes to pattern recognition, an ability that distinguishes chess grandmasters from more amateur players. We collect around 3,000,000 different chess positions played by highly skilled chess players and label them with the evaluation function of Stockfish, one of the strongest existing chess engines. We create four different datasets from scratch, used for different classification and regression experiments. The results show that relatively simple Multilayer Perceptrons (MLPs) outperform Convolutional Neural Networks (CNNs) in all the experiments we performed. We also investigate two different board representations: the first represents only whether a piece is present on the board, while the second assigns each piece a numerical value according to its strength. Our results show that the latter input representation negatively influences the performance of the ANNs in almost all experiments.
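
The two board representations are easy to make concrete. Assuming the python-chess package, the sketch below builds both encodings; the plane ordering and piece values are illustrative choices, not necessarily the paper's exact scheme:

```python
import numpy as np
import chess  # the python-chess package (assumed available)

# Hypothetical strength values; the king's value is an arbitrary choice here.
PIECE_VALUE = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
               chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 10}

def bitmap_planes(board):
    """First representation: 12 binary 8x8 planes, one per (piece, colour),
    marking only whether a piece is present on a square."""
    planes = np.zeros((12, 8, 8), dtype=np.float32)
    for sq, piece in board.piece_map().items():
        idx = (piece.piece_type - 1) + (0 if piece.color else 6)
        planes[idx, sq // 8, sq % 8] = 1.0
    return planes

def value_board(board):
    """Second representation: one 8x8 plane of signed piece strengths
    (the encoding the paper found to hurt performance in most experiments)."""
    plane = np.zeros((8, 8), dtype=np.float32)
    for sq, piece in board.piece_map().items():
        sign = 1.0 if piece.color else -1.0
        plane[sq // 8, sq % 8] = sign * PIECE_VALUE[piece.piece_type]
    return plane
```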

Collaboration


Valeriu Codreanu's most frequent co-authors and their affiliations.

Top Co-Authors

Baoquan Liu (University of Bedfordshire)
Feng Dong (University of Bedfordshire)
Po Yang (Liverpool John Moores University)
Lucian Petrica (Politehnica University of Bucharest)
Radu Hobincu (Politehnica University of Bucharest)
Xia Zhao (University of Bedfordshire)