
Publication


Featured research published by Esther Salamí.


IEEE International Symposium on Workload Characterization | 2005

A performance characterization of high definition digital video decoding using H.264/AVC

Mauricio Alvarez; Esther Salamí; Alex Ramirez; Mateo Valero

H.264/AVC is a new international video coding standard that provides higher coding efficiency than previous standards at the expense of higher computational complexity. The complexity is even higher when H.264/AVC is used in applications with high bandwidth and high quality, such as high-definition (HD) video decoding. In this paper, we analyze the computational requirements of the H.264 decoder, with special emphasis on HD video, and compare them with previous standards and lower resolutions. The analysis was done with a SIMD-optimized decoder using hardware performance monitoring. The main objective is to identify the application bottlenecks and to suggest the architectural support necessary for processing HD video efficiently. We have found that H.264/AVC decoding of HD video performs many more operations per frame than MPEG-4 and MPEG-2, has new kernels with more demanding memory access patterns, and has many data-dependent branches that are difficult to predict. To improve the H.264/AVC decoding process at HD, it is necessary to explore better support for media instructions, specialized prefetching techniques, and possibly some kind of multiprocessor architecture.


International Symposium on Performance Analysis of Systems and Software | 2007

Performance Impact of Unaligned Memory Operations in SIMD Extensions for Video Codec Applications

Mauricio Alvarez; Esther Salamí; Alex Ramirez; Mateo Valero

Although SIMD extensions are a cost-effective way to exploit the data-level parallelism present in most media applications, we show that they have a very limited memory architecture with weak support for unaligned memory accesses. In video codecs, and other applications, the overhead of accessing unaligned positions without efficient architectural support carries a big performance penalty and in some cases makes vectorization counter-productive. In this paper, we analyze the performance impact of extending the AltiVec SIMD ISA with unaligned memory operations. Results show that for several kernels in the H.264/AVC media codec, unaligned access support provides speedups of up to 3.8x compared to the plain SIMD version, translating into an average of 1.2x for the entire application. In addition to providing a significant performance advantage, the use of unaligned memory instructions makes programming SIMD code much easier, both for the manual developer and for the auto-vectorizing compiler.
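The overhead the abstract refers to comes from synthesizing one unaligned vector load out of two aligned loads plus a permute, which is how AltiVec-style code works around the missing hardware support. A minimal Python sketch of that realignment idiom (the function names and 16-byte width are illustrative, not taken from the paper's code):

```python
VEC = 16  # vector width in bytes, as in AltiVec

def aligned_load(mem, addr):
    """Load the VEC-byte aligned block containing addr."""
    base = (addr // VEC) * VEC
    return mem[base:base + VEC]

def unaligned_load_sw(mem, addr):
    """Software realignment: two aligned loads plus a byte shuffle,
    mimicking the vec_ld/vec_ld/vec_perm sequence used when the ISA
    has no hardware unaligned load."""
    lo = aligned_load(mem, addr)
    hi = aligned_load(mem, addr + VEC)
    shift = addr % VEC
    return (lo + hi)[shift:shift + VEC]

def unaligned_load_hw(mem, addr):
    """What a single hardware unaligned load would return."""
    return mem[addr:addr + VEC]
```

The software path costs three vector operations (and extra registers) per load where hardware support costs one, which is the per-access gap the kernel-level speedups in the abstract come from.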


IEEE International Symposium on Workload Characterization | 2007

HD-VideoBench. A Benchmark for Evaluating High Definition Digital Video Applications

Mauricio Alvarez; Esther Salamí; Alex Ramirez; Mateo Valero

HD-VideoBench is a benchmark devoted to high-definition (HD) digital video processing. It includes a set of video encoders and decoders (codecs) for the MPEG-2, MPEG-4, and H.264 video standards. The applications were carefully selected taking into account the quality and portability of the code, the representativeness of the video application domain, the availability of high-performance optimizations, and distribution under a free license. Additionally, HD-VideoBench defines a set of input sequences and configuration parameters for the video codecs that are appropriate for the HD video domain.


Geocarto International | 2011

Architecture for a helicopter-based unmanned aerial systems wildfire surveillance system

Enric Pastor; Cristina Barrado; Pablo Royo; Eduard Santamaria; Juan Lopez; Esther Salamí

Forest fires are an important problem for many countries. The economic loss is the most visible impact in the short term; the ecological damage and the impact on wildlife diversity and climate change are the most important factors in the long term. Up to now, satellites like NASA's MODIS (moderate-resolution imaging spectroradiometer) system have been the primary source for strategic large-area thermal imaging. Tactical monitoring has until recently been reduced to observation from the ground or from some dedicated aerial resource, such as command and control helicopters. However, little technological support has been available to those in charge of these monitoring tasks. An unmanned aerial systems (UAS) platform capable of overflying the area of a forest fire, with the capacity to operate from non-prepared terrain, would be an extremely valuable information-gathering asset in several well-defined circumstances: surveillance during the day and especially at night, and early-morning or late-afternoon monitoring of post-fire hot spots during the days following extinction. This work introduces the Sky-Eye system, a helicopter-based UAS platform that, together with its hardware/software architecture, is designed to facilitate the development of wildfire remote sensing applications. The Sky-Eye UAS will improve overall awareness by providing tactical support to wildfire monitoring and control of ground squads. Sky-Eye employs existing commercial off-the-shelf (COTS) technology that can be immediately deployed in the field on board medium-sized UAS helicopters at a reasonable cost. Sky-Eye is built on top of a user-parameterizable architecture called USAL (UAS Service Abstraction Layer). This architecture defines a collection of standard services and their interrelations as a basic starting point for further development. Functionalities such as enhanced flight plans, a mission control engine, data storage, and communications management are offered.


International Conference on Parallel Processing | 2005

A Vector-µSIMD-VLIW Architecture for Multimedia Applications

Esther Salamí

Media processing has motivated strong changes in the focus and design of processors. These applications are composed of heterogeneous regions of code, some with high levels of DLP and others with only modest amounts of ILP. A common approach to deal with these applications is µSIMD-VLIW processors. However, the ILP regions fail to scale when we increase the width of the machine, which, on the other hand, is desired to achieve high performance in the DLP regions. In this paper, we propose and evaluate adding vector capabilities to a µSIMD-VLIW core to speed up the execution of the DLP regions while, at the same time, reducing the fetch bandwidth requirements. Results show that, in the DLP regions, both 2- and 4-issue-width vector-µSIMD-VLIW architectures outperform an 8-issue-width µSIMD-VLIW by factors of up to 2.7x and 4.2x (1.6x and 2.1x on average), respectively. As a result, the DLP regions become less than 10% of the total execution time, and performance is dominated by the ILP regions.
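The shift of the bottleneck from the DLP regions to the ILP regions is an instance of Amdahl's law: once the accelerated fraction shrinks enough, the unaccelerated code dominates. A worked sketch with illustrative numbers (the 40% fraction and 8x region speedup are assumptions for the example, not the paper's measured values):

```python
def overall_speedup(dlp_fraction, dlp_speedup):
    """Amdahl's law: only the DLP fraction of execution time is
    accelerated; the ILP regions run at their original speed."""
    return 1.0 / ((1.0 - dlp_fraction) + dlp_fraction / dlp_speedup)

# Assumed example: DLP regions take 40% of time and are sped up 8x.
s = overall_speedup(0.40, 8.0)
# DLP share of the *new* total execution time:
remaining_dlp = (0.40 / 8.0) * s
```

With these example numbers the DLP share of the new total falls below 10%, matching the qualitative behavior the abstract describes: further widening the DLP datapath yields diminishing returns once the ILP regions dominate.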


International Conference on Supercomputing | 2001

On the potential of tolerant region reuse for multimedia applications

Carlos Álvarez; Jesus Corbal; Esther Salamí; Mateo Valero

Recent years have shown an interesting evolution in the mid- to low-end embedded domain. Portable systems are growing in importance as they improve in storage capacity and in interaction capabilities with general-purpose systems. Furthermore, media processing is changing the way embedded processors are designed, keeping in mind the emergence of new application domains such as those for PDA systems or for the third generation of mobile digital phones (UMTS). The performance requirements of these new kinds of devices are not those of the general-purpose domain, where traditionally the premium goal is the highest performance. Embedded systems must face ever-increasing real-time requirements as well as power consumption constraints. Under this special scenario, instruction/region reuse arises as a promising way of increasing the performance of media embedded processors and, at the same time, reducing power consumption. Furthermore, media and signal processing applications are a suitable target for instruction/region reuse, given the large amount of redundancy found in media data working sets. In this paper, we propose a novel region reuse mechanism that takes advantage of the tolerance of media algorithms to losses in the precision of computation. By identifying regions of code where an input data set is processed into an output data set, we can reuse computational instances by using the result of previous ones with a similar input data set (hence the term tolerant reuse). We show that conventional region reuse is barely able to provide more than an 8% reduction in executed instructions (even with significantly large tables) in a typical JPEG encoder application. On the other hand, when applying the concept of tolerance, we are able to provide a reduction of more than 25% in the number of executed instructions with tables smaller than 1 KB (with only small degradations in the quality of the output image), and up to a 40% reduction (with no visually perceptible differences) with bigger tables.
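The core idea of tolerant reuse is that the reuse table is keyed by a *quantized* version of the region's inputs, so inputs that differ only slightly hit the same entry and reuse its (slightly approximate) output. A minimal Python sketch of that mechanism, not the paper's hardware design; the function names and floor-quantization scheme are illustrative:

```python
def make_tolerant_reuse(region_fn, tolerance):
    """Wrap a region of computation with a reuse table keyed by
    quantized inputs: inputs that fall in the same tolerance bucket
    reuse a previously cached output instead of recomputing."""
    table = {}
    stats = {"computed": 0, "reused": 0}

    def quantize(inputs):
        # Floor-quantize each input so nearby values share a key.
        return tuple(int(x // tolerance) for x in inputs)

    def wrapped(inputs):
        key = quantize(inputs)
        if key in table:
            stats["reused"] += 1
        else:
            stats["computed"] += 1
            table[key] = region_fn(inputs)
        return table[key]

    return wrapped, stats
```

Note that a reused result is the output of the *earlier* similar input, not of the current one; that small output error is exactly the quality degradation the abstract says media algorithms tolerate in exchange for skipping instructions.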


IEEE International Symposium on Workload Characterization | 2005

Parallel processing in biological sequence comparison using general purpose processors

Friman Sánchez; Esther Salamí; Alex Ramirez; Mateo Valero

The comparison and alignment of DNA and protein sequences are important tasks in molecular biology and bioinformatics. One of the best-known algorithms for the string-matching operation present in these tasks is the Smith-Waterman algorithm (SW). However, it is a computation-intensive algorithm, and many researchers have developed heuristic strategies to avoid using it, especially when searching large databases. There are several efficient implementations of the SW algorithm on general-purpose processors. These implementations try to extract data-level parallelism by taking advantage of single-instruction multiple-data (SIMD) extensions, capable of performing several operations in parallel on a set of data. In this paper, we propose a more efficient data-parallel implementation of the SW algorithm. Our proposed implementation obtains a 30% reduction in execution time relative to the previous best data-parallel alternative. We also review different alternative implementations of the SW algorithm, compare them with our proposal, and present preliminary results for some heuristic implementations. Finally, we present a detailed study of the computational complexity of the different alignment algorithms presented and their behavior on the different aspects of the CPU microarchitecture.
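For reference, the recurrence that all these implementations accelerate is the Smith-Waterman local-alignment score. A scalar Python sketch with simple linear gap penalties (the scoring parameters are illustrative defaults, not the paper's); the SIMD versions the abstract discusses compute many cells of this matrix in parallel rather than changing the recurrence:

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Scalar Smith-Waterman local alignment score.
    H[i][j] = max(0, diag + substitution, up + gap, left + gap);
    the best local alignment score is the matrix maximum."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best
```

The quadratic cell count (len(a) × len(b)) is what makes SW computation-intensive on large databases and what motivates both the SIMD implementations and the heuristic alternatives compared in the paper.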


IEEE Computer Architecture Letters | 2002

Initial Results on Fuzzy Floating Point Computation for Multimedia Processors

Carlos Álvarez; Jesus Corbal; Esther Salamí; Mateo Valero

During recent years, the market for mid- to low-end portable systems such as PDAs or mobile digital phones has experienced a revolution in both sales volume and features, as handheld devices incorporate multimedia applications. This brings an increase in the computational demands of the devices, while the limitation of power and energy consumption remains. Instruction memoization is a promising technique to help alleviate the power consumption of expensive functional units such as the floating-point unit. Unfortunately, this technique can be energy-inefficient for low-end systems due to the additional power consumption of the relatively big tables required. In this paper, we present a novel way of understanding multimedia floating-point operations based on the fuzzy computation paradigm: losses in computation precision may exchange performance for negligible errors in the output. Exploiting the implicit characteristics of media FP computation, we propose a new technique called fuzzy memoization. Fuzzy memoization expands the capabilities of classic memoization by attaching entries with similar inputs to the same output. We present a case study for an SH-like processor and report good performance and power-delay improvements with feasible hardware requirements.
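One natural way to make "similar inputs share one entry" concrete for floating-point values is to drop low-order mantissa bits when forming the table key, so nearby doubles collide on the same cached result. A Python sketch of that idea; the class, the `drop_bits` parameter, and the bit-truncation scheme are illustrative assumptions, not the paper's hardware design:

```python
import struct

def fuzzy_key(x, drop_bits=12):
    """Truncate the low mantissa bits of a double's IEEE-754
    encoding so that nearby inputs map to one cache entry."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    return bits & ~((1 << drop_bits) - 1)

class FuzzyMemo:
    """Fuzzy instruction memoization: cache an expensive FP
    operation's result under the truncated-input key, trading
    tiny output errors for skipped computations."""
    def __init__(self, op, drop_bits=12):
        self.op, self.drop_bits = op, drop_bits
        self.cache, self.hits = {}, 0

    def __call__(self, x):
        key = fuzzy_key(x, self.drop_bits)
        if key in self.cache:
            self.hits += 1  # reuse: the unit stays idle
        else:
            self.cache[key] = self.op(x)
        return self.cache[key]
```

A larger `drop_bits` widens each bucket, raising the hit rate (more energy saved) at the cost of larger output error, which is the precision/performance exchange the abstract describes.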


ACM Transactions on Architecture and Code Optimization | 2005

Dynamic memory interval test vs. interprocedural pointer analysis in multimedia applications

Esther Salamí; Mateo Valero

Techniques to detect aliasing between access patterns of array elements are quite effective for many numeric applications. However, although multimedia codes usually follow very regular memory access patterns, current commercial compilers remain unsuccessful at disambiguating them, mainly due to complex pointer references. The Dynamic Memory Interval Test is a runtime memory disambiguation technique that takes advantage of the specific behavior of multimedia memory access patterns. It evaluates whether or not the full loop is disambiguated by analyzing the region domain of each load or store before each invocation of the loop. This paper provides a detailed evaluation of the approach, compares it against an advanced interprocedural pointer analysis framework, and analyzes the possibility of using both techniques at the same time. Both techniques achieve similar speedups separately (1.25x on average for an 8-issue-width architecture). Furthermore, they can be used together to improve performance (reaching an average speedup of 1.32x). Results also confirm that memory disambiguation is a key optimization for exploiting the available parallelism in multimedia codes, especially for wide-issue architectures (1.50x average speedup when scaling from 4- to 12-issue width, in contrast to a low 1.10x for the baseline compiler).
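The runtime check behind this style of disambiguation can be sketched simply: summarize each memory reference as the byte interval it will touch during the loop, test store intervals against load intervals before entering the loop, and dispatch to the aggressively scheduled version only when no overlap is possible. A minimal Python sketch under those assumptions (the function names and half-open `[start, end)` convention are illustrative, not the paper's formulation):

```python
def intervals_disjoint(loads, stores):
    """Return True only if no store interval overlaps any load
    interval; each interval is a half-open [start, end) byte range
    the access will touch during the loop."""
    for (ls, le) in loads:
        for (ss, se) in stores:
            if ls < se and ss < le:  # standard interval-overlap test
                return False
    return True

def run_loop(loads, stores, fast_version, safe_version):
    """Dispatch executed once per loop invocation: take the
    disambiguated (aggressively scheduled) body only when the
    runtime test proves the accesses cannot alias."""
    if intervals_disjoint(loads, stores):
        return fast_version()
    return safe_version()
```

Because the test runs once per loop invocation rather than per iteration, its cost is easily amortized over the regular, long-running loops typical of multimedia kernels.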


Journal of Aerospace Information Systems | 2013

Real-time data processing for the airborne detection of hot spots

Esther Salamí; Cristina Barrado; Enric Pastor; Pablo Royo; Eduard Santamaria

Satellites and aircraft have traditionally been the primary source of remote sensing data. The increasing number of satellite constellations and the improvement of the quality of airborne sensors have produced a great deal of imagery and high-precision geographic data. At present, the miniaturization of electronics, computers, and sensors creates new opportunities for remote sensing applications. Small and/or unmanned aircraft are promising technologies, especially for tactical reaction in emergency situations, such as forest fires, where a quick and efficient response is critical to minimize damage.

Collaboration


Esther Salamí's top co-authors:

- Mateo Valero (University of Las Palmas de Gran Canaria)
- Enric Pastor (Polytechnic University of Catalonia)
- Cristina Barrado (Polytechnic University of Catalonia)
- Alex Ramirez (Polytechnic University of Catalonia)
- Pablo Royo (Polytechnic University of Catalonia)
- Eduard Santamaria (Polytechnic University of Catalonia)
- Mauricio Alvarez (Polytechnic University of Catalonia)
- Carlos Álvarez (Polytechnic University of Catalonia)
- Friman Sánchez (Polytechnic University of Catalonia)
- Jesus Corbal (Polytechnic University of Catalonia)