J. Daniel Garcia | Researchain

Publication

Featured research published by J. Daniel Garcia.

ubiquitous computing | 2014

The Internet of Things: connecting the world

Jesús Carretero; J. Daniel Garcia

In the last few years, we have seen an increased interest in the Internet of Things (IoT). IoT is a network of Internet-enabled objects in a world where physical objects are seamlessly integrated into the information network and can become active participants in business processes. The Internet of Things [1] brings together two key concepts: Internet-connected devices everywhere, at any time and in any place, and ubiquitous computing, where “the most profound technologies are those that disappear” [2], in such a way that these devices make themselves indistinguishable from the explicit technology that humans use in their lives. IoT aims at increasing the ubiquity of the Internet by integrating every object for interaction via embedded systems, and leads to a highly distributed network of devices communicating with human beings as well as with other devices. These objects can communicate with humans and enable people to monitor and control them through intelligent services anytime and anywhere, taking into account security and privacy issues. IoT promises to improve services as perceived by users, for example by saving energy, enhancing comfort, improving healthcare, and increasing independence. On the other hand, IoT raises new technical and ethical challenges. The European Research Cluster on the Internet of Things has identified the trendiest IoT application areas in the document “The Internet of Things 2012 - New Horizons” and grouped them into different domains [3]: applications development facilities; autonomic and self-aware IoT; IoT infrastructure as a service; IoT-aware open networks and location; large-scale IoT process deployment; data management and security; device-level energy issues; standardization; and societal, economic, and legal issues.

parallel, distributed and network-based processing | 2016

RPL: A Domain-Specific Language for Designing and Implementing Parallel C++ Applications

Vladimir Janjic; Christopher Brown; K. Mackenzie; Kevin Hammond; Marco Danelutto; Marco Aldinucci; J. Daniel Garcia

Parallelising sequential applications is usually a very hard job, due to the many different ways in which an application can be parallelised and the large number of programming models (each with its own advantages and disadvantages) that can be used. In this paper, we describe a method to semi-automatically generate and evaluate different parallelisations of the same application, allowing programmers to find the best parallelisation without significant manual reengineering of the code. We describe a novel, high-level domain-specific language, Refactoring Pattern Language (RPL), that is used to represent the parallel structure of an application and to capture its extra-functional properties (such as service time). We then describe a set of RPL rewrite rules that can be used to generate alternative, but semantically equivalent, parallel structures (parallelisations) of the same application. We also describe the RPL Shell, which can be used to evaluate these parallelisations in terms of the desired extra-functional properties. Finally, we describe a set of C++ refactorings, targeting the OpenMP, Intel TBB and FastFlow parallel programming models, that semi-automatically apply the desired parallelisation to the application's source code, thereby producing a parallel version of the code. We demonstrate how RPL and the refactoring rules can be used to derive efficient parallelisations of two realistic C++ use cases (Image Convolution and Ant Colony Optimisation).

New Generation Computing | 2013

A Comparative Study and Evaluation of Parallel Programming Models for Shared-Memory Parallel Architectures

Luis Miguel Sanchez; Javier Fernández; Rafael Sotomayor; Soledad Escolar; J. Daniel Garcia

Shared-memory parallel architectures have evolved, and new programming frameworks have appeared to exploit them: OpenMP, TBB, Cilk Plus, ArBB and OpenCL. This article focuses on the most widespread of these frameworks in commercial and scientific areas, and presents both a comparative study and an evaluation of them. The study covers several capabilities, such as task deployment, scheduling techniques, and programming language abstractions. The evaluation measures three dimensions: code development complexity, performance, and efficiency, measured as speedup per watt. For this evaluation, several parallel benchmarks have been implemented with each framework. These benchmarks are designed to cover specific scenarios, such as regular memory access or irregular computation. The conclusions highlight that some frameworks (OpenMP, Cilk Plus) are better for quickly transforming sequential code, others (TBB) have a small footprint, which is ideal for small problems, and others (OpenCL) are suited to heterogeneous architectures but require a very complex development process. The conclusions also show that vectorization support is more critical than multitasking for achieving efficiency in those problems where this approach fits.
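As an illustration of the efficiency metric mentioned in this abstract, the following minimal sketch computes speedup and speedup per watt. The function names, the use of average power as the denominator, and the sample figures are assumptions made for this example; the abstract does not state how power was measured.

```python
def speedup(t_sequential: float, t_parallel: float) -> float:
    """Speedup of a parallel run relative to the sequential baseline."""
    if t_sequential <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_sequential / t_parallel


def speedup_per_watt(t_sequential: float, t_parallel: float,
                     avg_power_watts: float) -> float:
    """Efficiency as speedup divided by average power draw.

    Assumption: power is the average draw (in watts) during the
    parallel run; this choice is hypothetical, not from the article.
    """
    if avg_power_watts <= 0:
        raise ValueError("power must be positive")
    return speedup(t_sequential, t_parallel) / avg_power_watts


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    print(speedup(12.0, 3.0))
    print(speedup_per_watt(12.0, 3.0, 80.0))
```

Under this definition, a framework that runs faster but draws proportionally more power scores no better than a slower, more frugal one.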

Concurrency and Computation: Practice and Experience | 2017

A generic parallel pattern interface for stream and data processing

David del Rio Astorga; Manuel F. Dolz; Javier Fernández; J. Daniel Garcia

Current parallel programming frameworks aid developers to a great extent in implementing applications that exploit parallel hardware resources. Nevertheless, developers require additional expertise to properly use and tune them to operate efficiently on specific parallel platforms. On the other hand, porting applications between different parallel programming models and platforms is not straightforward and demands considerable effort and specific knowledge. Apart from that, the lack of high-level parallel pattern abstractions in those frameworks further increases the complexity of developing parallel applications. To pave the way in this direction, this paper proposes GrPPI, a generic and reusable parallel pattern interface for both stream processing and data-intensive C++ applications. GrPPI acts as a layer between developers and existing parallel programming frameworks targeting multi-core processors, such as C++ threads, OpenMP and Intel TBB, and accelerators, such as CUDA Thrust. Furthermore, thanks to its high-level C++ application programming interface and pattern composability features, GrPPI allows users to easily expose parallelism in sequential applications via standalone patterns or pattern compositions. We evaluate this interface using an image processing use case and demonstrate its benefits from the usability, flexibility, and performance points of view. Furthermore, we analyze the impact of using stream and data pattern compositions on CPUs, GPUs and heterogeneous configurations.

Simulation Modelling Practice and Theory | 2013

A novel black-box simulation model methodology for predicting performance and energy consumption in commodity storage devices

Laura Prada; Javier García; Alejandro Calderón; J. Daniel Garcia; Jesús Carretero

Traditional approaches to storage device simulation have been based on detailed and analytic models. However, analytic models are difficult to obtain, and detailed models carry a high computational cost that may not be affordable for large-scale simulations (e.g. detailed data center simulations). In current systems such as large clusters, grids, or clouds, performance and energy studies are critical, and fast simulations play an important role in them. A different approach is black-box statistical modeling, where the storage device, its interface, and the interconnection mechanisms are modeled as a single stochastic process, defining the request response time as a random variable with an unknown distribution. A random variate generator can then be built and integrated into a bigger simulation model. This approach makes it possible to generate a simulation model for both real and synthetic complex workloads. This article describes a novel methodology that aims to build fast simulation models for storage devices. Our method takes a workload as its starting point and produces a random variate generator that can be easily integrated into large-scale simulation models. A comparison between our variate generator and the widely known simulation tool DiskSim shows that our variate generator is faster, and can be as accurate as DiskSim for both performance and energy consumption predictions.
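The black-box idea in this abstract, treating response time as a random variable and drawing from it, can be sketched with a generic inverse-transform sampler over an empirical distribution. This is not the methodology of the article, whose details are not given here; the class name, the linear interpolation between observations, and the sample response times are all assumptions made for illustration.

```python
import random


class EmpiricalVariateGenerator:
    """Draws response times from the empirical distribution of a trace.

    Hypothetical helper: a generic inverse-transform sampler, not the
    generator described in the article.
    """

    def __init__(self, observed_times, seed=None):
        if not observed_times:
            raise ValueError("at least one observation is required")
        self._sorted = sorted(observed_times)
        self._rng = random.Random(seed)

    def quantile(self, u: float) -> float:
        """Inverse empirical CDF at u in [0, 1], linearly interpolated."""
        if not 0.0 <= u <= 1.0:
            raise ValueError("u must lie in [0, 1]")
        n = len(self._sorted)
        if n == 1:
            return self._sorted[0]
        pos = u * (n - 1)            # fractional index into the sorted trace
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        frac = pos - lo
        return self._sorted[lo] + frac * (self._sorted[hi] - self._sorted[lo])

    def sample(self) -> float:
        """One simulated response time: a uniform draw mapped through quantile()."""
        return self._rng.random() and self.quantile(self._rng.random()) or self.quantile(0.0)


if __name__ == "__main__":
    # Hypothetical response times in milliseconds.
    gen = EmpiricalVariateGenerator([4.0, 1.0, 3.0, 2.0], seed=1)
    print([round(gen.sample(), 3) for _ in range(5)])
```

A larger simulation would call `sample()` once per simulated request instead of modelling the device internals, which is where the speed advantage of a black-box model comes from.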

programming models and applications for multicores and manycores | 2016

Discovering Pipeline Parallel Patterns in Sequential Legacy C++ Codes

David del Rio Astorga; Manuel F. Dolz; Luis Miguel Sanchez; J. Daniel Garcia

Since the free performance lunch for processors is over, parallelism has become the new trend in hardware and architecture design. However, parallel resources deployed in data centers are underused in many cases, given that sequential programming is still deeply rooted in current software development. To address this problem, new methodologies and techniques for parallel programming have been progressively developed. For instance, parallel frameworks offer programming skeletons that allow expressing parallelism and concurrency in applications to better exploit concurrent hardware. Nevertheless, there remains a large portion of production software, coming from a broad range of scientific and industrial areas, that still executes sequential legacy code. Given that these software modules contain thousands, or even millions, of lines of code, the effort needed to identify parallel regions is extremely high. To pave the way in this area, this paper presents the Parallel Pattern Analyzer Tool (PPAT), a software component that aids in discovering and annotating parallel patterns in source code, thereby facilitating the transformation of sequential code into parallel code. We evaluate this tool for the special case of parallel pipelines using a series of well-known sequential benchmark suites.

parallel, distributed and network-based processing | 2016

Introducing Parallelism by Using REPARA C++11 Attributes

Marco Danelutto; J. Daniel Garcia; Luis Miguel Sanchez; Rafael Sotomayor; Massimo Torquati

Patterns provide a mechanism to express parallelism at a high level of abstraction and to ease the transformation of existing legacy applications to target parallel frameworks. They also open a path for writing new parallel applications. In this paper we introduce the REPARA approach for expressing parallel patterns and transforming source code to parallelism frameworks. We take advantage of C++11 attributes as a mechanism to introduce annotations and enrich semantic information on valid source code. We also present a methodology for source code transformation that makes it possible to target multiple parallel programming models. Another contribution is a rule-based mechanism to transform annotated code to those specific programming models. The REPARA approach requires programmer intervention only to perform the initial code annotation, while providing speedups comparable to those obtained by manual parallelization.

international conference on algorithms and architectures for parallel processing | 2016

A C++ Generic Parallel Pattern Interface for Stream Processing

David del Rio Astorga; Manuel F. Dolz; Luis Miguel Sanchez; Javier Garcia Blas; J. Daniel Garcia

Current parallel programming frameworks aid developers to a great extent in implementing applications that exploit parallel hardware resources. Nevertheless, developers require additional expertise to properly use and tune them to operate on specific parallel platforms. On the other hand, porting applications between different parallel programming models and platforms is not straightforward and requires, in most cases, considerable effort. Apart from that, the lack of high-level parallel pattern abstractions in these frameworks further increases the complexity of developing parallel applications. To pave the way in this direction, this paper proposes GrPPI, a generic and reusable high-level parallel pattern interface for stream-based C++ applications. Thanks to its high-level C++ API, this interface allows users to easily expose parallelism in sequential applications using existing parallel frameworks, such as C++ threads, OpenMP and Intel TBB. We evaluate this approach using an image processing use case to demonstrate its benefits from the usability, flexibility, and performance points of view.

international symposium on parallel and distributed processing and applications | 2012

A Comparative Evaluation of Parallel Programming Models for Shared-Memory Architectures

Luis Miguel Sanchez; Javier Fernández; Rafael Sotomayor; J. Daniel Garcia

Nowadays, most commercial off-the-shelf (COTS) computers include hardware features that increase the performance of parallel general-purpose threads (hyper-threading, multicore, ccNUMA architectures) or SIMD kernels (CPU vector instructions, GPUs). The purpose of this paper is to perform a comparative evaluation of several parallel programming models, each of which is fitted to exploit some of these features and each of which requires a different level of programming skill. Four parallel programming models (OpenMP, Intel TBB, Intel ArBB, and CUDA) have been selected. The idea is to cover a wide spectrum of programming models and most of the parallel hardware features included in modern computers. On one hand, OpenMP and TBB exploit parallel threads running on multicore systems. On the other hand, ArBB combines multicore parallel threads and multicore SIMD features with a simpler programming model, and CUDA exploits the SIMD features of GPU hardware. The results obtained with the benchmarks used in this paper suggest that OpenMP and TBB have lower performance than ArBB and CUDA, and that ArBB performance tends to be comparable with CUDA performance in most cases (although it is normally lower). Thus, there is evidence that a carefully designed top-range multicore and multisocket architecture can be comparable in performance with top-range GPU cards for many applications, with the advantage of a simpler programming model.

The Journal of Supercomputing | 2011

Power saving-aware prefetching for SSD-based systems

Laura Prada; Javier García; J. Daniel Garcia; Jesús Carretero

Energy saving in computing systems has recently become an important and pressing concern. Energy demand has been increasing in many systems, especially in data centers and supercomputers. This article considers the problem of saving energy in storage systems by taking advantage of SSD drives. SSD and magnetic disk devices have different power characteristics, with SSD drives consuming much less power than conventional magnetic disk drives. This paper presents the design and evaluation of a novel power consumption-aware prefetching mechanism for hybrid storage systems. The prefetching mechanism aims to reduce the power consumption of high-performance storage subsystems. Every disk access request is absorbed by an associated SSD device, and only when the SSD device is full are requests forwarded to the disk in the background. We have evaluated the proposed approach with both synthetic and realistic workloads. The experimental results demonstrate that our solution achieves a significant reduction in energy consumption. Additionally, the performance evaluation shows that our solution may bring a substantial I/O performance benefit.
