Carla Ramiro

Polytechnic University of Valencia

Publication

Featured research published by Carla Ramiro.

IEEE Transactions on Vehicular Technology | 2012

Fully Parallel GPU Implementation of a Fixed-Complexity Soft-Output MIMO Detector

Sandra Roger; Carla Ramiro; Alberto Gonzalez; Vicenc Almenar; Antonio M. Vidal

Multicore processors and graphics processing units (GPUs) can be combined to efficiently implement signal-processing algorithms for communication systems, due to their parallel processing capabilities. This paper proposes a fully parallel fixed-complexity soft-output detector, which is suitable for GPU implementation and allows a considerable decrease in the computational time required for the data detection stage in multiple-input multiple-output (MIMO) systems. A novel channel matrix preprocessing stage, based on column-norm ordering, is developed to efficiently match the multicore architecture. The throughput of the implementation is shown to outperform other recent implementations and to support some of the configurations in the Long-Term Evolution (LTE) standard.
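As a rough illustration of what a column-norm ordering step can look like, the following Python sketch sorts the columns of a channel matrix by their Euclidean norm. The function name `column_norm_order`, the ascending sort direction, and the example matrix are assumptions made for illustration; the abstract does not state the exact ordering rule used in the paper.

```python
import math

def column_norm_order(H):
    """Sort the columns of H (a list of rows) by ascending Euclidean norm.

    Returns the permutation of column indices and the reordered matrix.
    Ascending order is an assumption chosen for this sketch.
    """
    n_rows, n_cols = len(H), len(H[0])
    # abs() makes the norm valid for both real and complex entries.
    norms = [
        math.sqrt(sum(abs(H[r][c]) ** 2 for r in range(n_rows)))
        for c in range(n_cols)
    ]
    order = sorted(range(n_cols), key=lambda c: norms[c])
    H_ordered = [[row[c] for c in order] for row in H]
    return order, H_ordered

# Hypothetical 2x3 channel matrix used only to exercise the function.
H = [[3.0, 0.1, 1.0],
     [4.0, 0.2, 1.0]]
order, H_ordered = column_norm_order(H)
print(order)
```

Each column norm can be computed independently of the others, which is one reason a norm-based ordering is a natural fit for parallel hardware.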

Computer-Aided Engineering | 2012

An efficient GPU implementation of fixed-complexity sphere decoders for MIMO wireless systems

Sandra Roger; Carla Ramiro; Alberto Gonzalez; Vicenc Almenar; Antonio M. Vidal

The use of many-core processors such as general-purpose graphics processing units (GPUs) has recently become attractive for the efficient implementation of signal processing algorithms for communication systems. This is due to the cost-effectiveness of GPUs together with their potential for parallel processing. This paper presents an implementation of the widely employed fixed-complexity sphere decoder on GPUs, which considerably decreases the computational time required for the data detection stage in multiple-input multiple-output systems. Both the hard- and soft-output versions of the method have been implemented. Speedup results show that the proposed GPU implementation runs faster than a parallel execution of the methods on a high-performance multicore CPU. In addition, the throughput of the algorithm is evaluated and is shown to outperform other recent implementations and to fulfill the real-time requirements of several LTE configurations.
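To make the idea of a fixed-complexity sphere decoder more concrete, here is a minimal sequential Python sketch of a hard-output variant. It assumes the channel has already been QR-decomposed, so that the detector works on an upper-triangular matrix `R` and a rotated receive vector `z`. It fully expands only the first tree level and follows a single path below it. The function name, the single fully expanded level, and the toy data are assumptions; the sketch is not the GPU implementation described in the paper.

```python
def fsd_detect(R, z, constellation):
    """Hard-output fixed-complexity tree search on z ~ R s (R upper triangular).

    The last symbol is tried with every constellation point (full expansion);
    the remaining symbols are found by successive interference cancellation.
    Returns the candidate vector with the smallest Euclidean distance.
    """
    n = len(z)
    best_path, best_metric = None, float("inf")
    for candidate in constellation:            # full expansion, first level
        s = [0.0] * n
        s[n - 1] = candidate
        metric = abs(z[n - 1] - R[n - 1][n - 1] * candidate) ** 2
        for i in range(n - 2, -1, -1):         # single-path levels
            interference = sum(R[i][j] * s[j] for j in range(i + 1, n))
            estimate = (z[i] - interference) / R[i][i]
            # Slice the estimate to the nearest constellation point.
            s[i] = min(constellation, key=lambda c: abs(c - estimate))
            metric += abs(z[i] - interference - R[i][i] * s[i]) ** 2
        if metric < best_metric:
            best_path, best_metric = s, metric
    return best_path, best_metric

# Toy 2x2 example with BPSK symbols; the transmitted vector was [1, -1].
R = [[2.0, 1.0],
     [0.0, 0.5]]
z = [1.1, -0.4]
path, metric = fsd_detect(R, z, [-1.0, 1.0])
print(path)
```

Because every first-level candidate is processed independently and the number of visited nodes is fixed in advance, each candidate path can be assigned to its own thread, which is what makes this family of detectors attractive for GPUs.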

International Conference on Acoustics, Speech, and Signal Processing | 2012

A reconfigurable GPU implementation for Tomlinson-Harashima precoding

Fernando Domene; Sandra Roger; Carla Ramiro; Gema Piñero; Alberto Gonzalez

The fast parallel processing capability of general-purpose graphics processing units (GPUs) can be exploited to accelerate the precoding calculation needed in spatially multiplexed wireless communication systems. In this paper, a GPU-based implementation of the well-known multiuser Tomlinson-Harashima precoding (THP) scheme combined with a lattice-reduction (LR) stage is presented. The proposed approach allows the LR stage to be switched off when user requirements are achieved by using only THP. Moreover, our GPU implementation provides scalability in the number of sub-carriers per symbol, which is a key factor in LTE and 4G wireless standards. Simulation results show that the GPU-based THP implementation performs up to 7 times faster than its CPU equivalent, whereas the LR stage implementation only achieves a speedup of 3. Although the LR stage cannot be parallelized as efficiently as the THP, a speedup of nearly 6 is achieved when both are combined.
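The core of Tomlinson-Harashima precoding is a feedback loop that pre-subtracts interference from already precoded symbols and then folds the result back into a bounded interval with a modulo operation. The Python sketch below shows that loop for real-valued symbols, assuming a lower-triangular feedback matrix `B` with unit diagonal is already available. The function names, the real-valued simplification, and the example values are assumptions, and the lattice-reduction stage is not covered.

```python
import math

def thp_modulo(value, half_range):
    """Fold a real value into the interval [-half_range, half_range)."""
    width = 2.0 * half_range
    return value - width * math.floor((value + half_range) / width)

def thp_precode(symbols, B, half_range):
    """Precode symbols with feedback matrix B (lower triangular, unit diagonal).

    Each output removes the interference of previously precoded symbols
    and is then reduced by the modulo operation to limit transmit power.
    """
    x = []
    for k, s in enumerate(symbols):
        interference = sum(B[k][j] * x[j] for j in range(k))
        x.append(thp_modulo(s - interference, half_range))
    return x

# Hypothetical feedback matrix and 2-PAM symbols, modulo interval [-2, 2).
B = [[1.0, 0.0, 0.0],
     [0.5, 1.0, 0.0],
     [-3.0, 0.25, 1.0]]
x = thp_precode([1.0, -1.0, 1.0], B, half_range=2.0)
print(x)
```

The loop over `k` is inherently sequential within one symbol vector, so a parallel implementation would more plausibly process many sub-carriers at once, in line with the scalability in the number of sub-carriers that the abstract mentions.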

The Journal of Supercomputing | 2013

Multicore implementation of a fixed-complexity tree-search detector for MIMO communications

Carla Ramiro; Sandra Roger; Alberto Gonzalez; Vicenc Almenar; Antonio M. Vidal

Multicore systems allow the efficient implementation of signal processing algorithms for communication systems due to their high parallel processing capabilities. In this paper, we present a high-throughput multicore implementation of a fixed-complexity tree-search-based detector of interest for MIMO wireless communication systems. Experimental results confirm that this implementation accelerates the data detection stage for different constellation sizes and numbers of subcarriers.

The Journal of Supercomputing | 2015

MIMOPack: a high-performance computing library for MIMO communication systems

Carla Ramiro; Antonio M. Vidal; Alberto Gonzalez

This paper presents MIMOPack, a set of optimized functions that perform some of the most complex stages in multiple-input multiple-output (MIMO) communication systems, such as channel coding, preprocessing, precoding and detection. These functions are optimized to run on a wide range of architectures, increasing the portability of scientific codes between different computing environments. MIMOPack aims to become a useful library for the research community, making it easier for programmers to develop adaptable parallel applications and speeding up the simulation platforms used to assess different technologies proposed by companies involved in standardization processes.

The Journal of Supercomputing | 2014

A GPU implementation of an iterative receiver for energy saving MIMO ID-BICM systems

Carla Ramiro; M. Ángeles Simarro; Francisco-Jose Martínez-Zaldívar; Antonio M. Vidal; Alberto Gonzalez

Iterative detection and decoding in communication systems with multiple transmitter and receiver antennas suffers from a significant increase in computational cost and energy consumption. The application of specific high-performance computing techniques to signal processing in communication systems is currently receiving considerable attention. In this paper, we present an accelerated and efficient iterative receiver, which has been implemented following two strategies. First, we reduce the computational cost using parallelized algorithms executed on a graphics processing unit. In addition, our receiver allows the selection between two types of detectors with different complexity and performance. The selection can be made to fulfill a given compromise between bit error rate and power consumption.

International Conference on Conceptual Structures | 2012

Two-Stage Least Squares Algorithms with QR Decomposition for Simultaneous Equations Models on Heterogeneous Multicore and Multi-GPU Systems

Carla Ramiro; Jose-Juan López-Espín; Domingo Giménez; Antonio M. Vidal

This paper analyzes the use of a multicore+multiGPU system for solving Simultaneous Equations Models (SEM) by the Two-Stage Least Squares method with QR decomposition. The combination of CPU and GPU allows us to reduce the execution time in the solution of large SEMs. When working on a heterogeneous system, it is necessary to design dynamic and hybrid algorithms to exploit the full potential of the machine, but the heterogeneity makes this difficult. To obtain optimum performance, problems should be suitable and programming must be performed carefully. Our contribution shows that we can efficiently exploit the resources of the machine even for dense linear algebra problems with double-precision data, where GPUs do not offer good performance, as occurs in some highly optimized libraries that use hybrid CPU and GPU programming, such as CULA or MAGMA, where the speedup achieved is far from the theoretical one.
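For readers unfamiliar with the method, the following sequential Python sketch shows Two-Stage Least Squares for a single equation, with each least-squares problem solved through a QR factorization (modified Gram-Schmidt). The function names, the use of modified Gram-Schmidt, and the noiseless toy data are assumptions made for illustration; the sketch does not reproduce the hybrid multicore and multi-GPU algorithms of the paper.

```python
def qr_least_squares(A, b):
    """Solve min ||A x - b|| via modified Gram-Schmidt QR (A has full column rank)."""
    m, n = len(A), len(A[0])
    Q = [[A[i][j] for i in range(m)] for j in range(n)]   # stored column-wise
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j):
            R[k][j] = sum(Q[k][i] * Q[j][i] for i in range(m))
            Q[j] = [Q[j][i] - R[k][j] * Q[k][i] for i in range(m)]
        R[j][j] = sum(v * v for v in Q[j]) ** 0.5
        Q[j] = [v / R[j][j] for v in Q[j]]
    qtb = [sum(Q[j][i] * b[i] for i in range(m)) for j in range(n)]
    x = [0.0] * n
    for j in range(n - 1, -1, -1):                        # back substitution
        x[j] = (qtb[j] - sum(R[j][k] * x[k] for k in range(j + 1, n))) / R[j][j]
    return x

def two_stage_least_squares(Z, X, y):
    """Stage 1: project each column of X onto the instruments Z.
    Stage 2: regress y on the fitted regressors."""
    m, n = len(X), len(X[0])
    X_hat = [[0.0] * n for _ in range(m)]
    for j in range(n):
        coeffs = qr_least_squares(Z, [X[i][j] for i in range(m)])
        for i in range(m):
            X_hat[i][j] = sum(Z[i][k] * coeffs[k] for k in range(len(coeffs)))
    return qr_least_squares(X_hat, y)

# Noiseless toy data: x = 1 + 3 z and y = 0.5 + 2 x, with a constant column.
z_vals = [0.0, 1.0, 2.0, 3.0]
Z = [[1.0, z] for z in z_vals]
X = [[1.0, 1.0 + 3.0 * z] for z in z_vals]
y = [0.5 + 2.0 * row[1] for row in X]
beta = two_stage_least_squares(Z, X, y)
print(beta)
```

In a full simultaneous equations model this computation is repeated for every equation, and those per-equation solves are natural units of work to distribute across CPU cores and GPUs.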

Archive | 2013