Perttu Salmela

Tampere University of Technology

Publication

Featured research published by Perttu Salmela.

International Conference on Acoustics, Speech, and Signal Processing | 2008

Complex-valued QR decomposition implementation for MIMO receivers

Perttu Salmela; Adrian Burian; Harri Sorokin; Jarmo Takala

Multiple-input multiple-output (MIMO) transmission is an emerging technique targeted at 3G long term evolution (LTE) systems. One vital baseband function in MIMO receivers is QR decomposition of the channel matrix. In this paper, a processor-based complex-valued QR decomposition is presented. The processor is enhanced with complex arithmetic and inverse square root function units. The proposed processor fits well with the real-time requirements of the MIMO receiver. The computing power is tailored for typical MIMO systems. Due to the generality of the applied computing resources, it can also be used for other tasks. Also, the presented principles can be applied to any customizable processor architecture to accelerate QR decomposition.
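To make the operation concrete, here is a minimal sketch of a complex-valued QR decomposition of a small channel matrix. The abstract does not say which QR algorithm the processor runs, so the use of modified Gram-Schmidt, the function names `qr_mgs` and `inv_sqrt`, and the floating-point arithmetic are all assumptions made for illustration. The sketch only shows where an inverse square root can replace a division during normalisation.

```python
def inv_sqrt(x):
    # Floating-point stand-in for an inverse square root unit (assumption).
    return x ** -0.5


def qr_mgs(H):
    """Modified Gram-Schmidt QR of a complex matrix given as a list of rows.

    Returns (Q, R) with H = Q R, where Q has orthonormal columns and R is
    upper triangular. Assumes H has full column rank.
    """
    m, n = len(H), len(H[0])
    # Work column-wise on a complex copy of H.
    cols = [[complex(H[i][j]) for i in range(m)] for j in range(n)]
    R = [[0j] * n for _ in range(n)]
    for j in range(n):
        norm_sq = sum(abs(v) ** 2 for v in cols[j])
        r = inv_sqrt(norm_sq)
        R[j][j] = complex(norm_sq * r)        # equals the column norm
        cols[j] = [v * r for v in cols[j]]    # normalise by multiplication
        for k in range(j + 1, n):
            # Projection of column k onto the unit column j.
            R[j][k] = sum(q.conjugate() * v for q, v in zip(cols[j], cols[k]))
            cols[k] = [v - R[j][k] * q for q, v in zip(cols[j], cols[k])]
    Q = [[cols[j][i] for j in range(n)] for i in range(m)]
    return Q, R
```

For a 2 × 2 matrix such as `[[1+1j, 2], [0.5j, 1-1j]]`, multiplying the returned `Q` and `R` reproduces the input up to rounding error.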

VLSI Design | 2008

A Programmable Max-Log-MAP Turbo Decoder Implementation

Perttu Salmela; Harri Sorokin; Jarmo Takala

With the advent of the very high data rates of upcoming 3G long-term evolution telecommunication systems, there is a crucial need for efficient and flexible turbo decoder implementations. In this study, a max-log-MAP turbo decoder is implemented as an application-specific instruction-set processor. The processor is accompanied by accelerating computing units, which can be controlled in detail. With a novel memory interface, the dual-port memory for extrinsic information is avoided. As a result, processing one trellis stage with the max-log-MAP algorithm takes only 1.02 clock cycles on average, which is comparable to pure hardware decoders. With six turbo iterations and a 277 MHz clock frequency, a decoding speed of 22.7 Mbps is achieved on 130 nm technology.
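The following sketch illustrates the max-log approximation that gives the max-log-MAP algorithm its name, together with one forward-recursion step over a trellis stage. It is not the decoder described in the abstract: the function names and the `(from_state, to_state, branch_metric)` branch format are hypothetical and chosen only for illustration.

```python
import math


def max_star(a, b):
    # Exact Jacobian logarithm used by log-MAP: log(exp(a) + exp(b)).
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))


def max_log(a, b):
    # Max-log approximation: the correction term is dropped.
    return max(a, b)


def forward_stage(alpha, branches):
    """One forward-recursion step of max-log-MAP over a single trellis stage.

    alpha    -- list of state metrics before the stage
    branches -- iterable of (from_state, to_state, branch_metric) tuples
                (hypothetical format)
    """
    nxt = [-math.inf] * len(alpha)
    for s, t, gamma in branches:
        nxt[t] = max_log(nxt[t], alpha[s] + gamma)
    return nxt
```

Replacing `max_log` with `max_star` in `forward_stage` would give the exact log-MAP recursion at the cost of evaluating the correction term for every branch.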

International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation | 2008

Fine-grained application-specific instruction set processor design for the K-best list sphere detector algorithm

Juho Antikainen; Perttu Salmela; Olli Silvén; Markku J. Juntti; Jarmo Takala; Markus Myllylä

Very high spectral efficiency and data rates are among the goals of future wireless communication systems. A strong candidate for meeting the requirements is the use of multiple antennas at both the transmitter and the receiver, known as multiple-input multiple-output (MIMO) communications. Sphere detectors have been proposed for MIMO reception to achieve or approximate the optimal maximum likelihood detection with reduced computational complexity. Furthermore, list sphere detectors (LSDs) can be used to approximate the maximum a posteriori detection in channel coded systems. The K-best LSD is a particularly interesting LSD variant with predetermined computational complexity and fixed throughput. In this paper, an application-specific instruction set processor is designed for the K-best LSD using transport triggered architecture. A 2 × 2 64-level quadrature amplitude modulation transmission scheme with 16-bit arithmetic and a list size of 16 is used as a baseline design target. List size and word length simulations are presented to justify the choices. The designed processor has a significant amount of general-purpose properties, and it reaches a detection throughput of 7.6 Mbps with a hardware complexity of only 25 000 gate equivalents.
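The fixed-complexity property of the K-best search can be seen in a short sketch. The code below is a generic breadth-first K-best list search over an upper-triangular system, assuming the channel matrix has already been QR-decomposed and the received vector rotated accordingly. The function name, argument layout, and floating-point arithmetic are illustrative assumptions and do not reflect the processor's 16-bit implementation.

```python
def k_best_lsd(R, y, constellation, K):
    """Breadth-first K-best search for small ||y - R s||^2, R upper triangular.

    Returns up to K (metric, symbol_vector) pairs sorted by metric.
    """
    n = len(R)
    # Each survivor is (partial Euclidean distance, symbols of decided layers).
    survivors = [(0.0, [])]
    for i in range(n - 1, -1, -1):          # the last layer is detected first
        expanded = []
        for ped, tail in survivors:
            # Interference from the already decided layers i+1 .. n-1.
            interf = sum(R[i][i + 1 + k] * s for k, s in enumerate(tail))
            for s in constellation:
                inc = abs(y[i] - interf - R[i][i] * s) ** 2
                expanded.append((ped + inc, [s] + tail))
        expanded.sort(key=lambda c: c[0])
        survivors = expanded[:K]            # fixed list size, fixed workload
    return survivors
```

Because exactly K candidates survive each layer, the number of metric evaluations per received vector is known in advance, which is what makes the throughput fixed.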

Asilomar Conference on Signals, Systems and Computers | 2007

Application-specific Instruction Set Processor Implementation of List Sphere Detector

Juho Antikainen; Perttu Salmela; Olli Silvén; Markku J. Juntti; Jarmo Takala; Markus Myllylä

Multiple-input multiple-output (MIMO) technology enables higher transmission capacity without additional frequency spectrum and is becoming a part of many wireless system standards. Sphere detection has been introduced in MIMO systems to achieve maximum likelihood (ML) or near-ML estimation with reduced complexity. This paper presents an application-specific instruction set processor (ASIP) implementation of a K-best list sphere detector (LSD) using the transport triggered architecture (TTA). The implementation is based on using memory and a heap data structure for symbol vector sorting. The design space is explored by presenting several variations of the implementation and comparing them with each other in terms of latency and hardware complexity. An early proposal for a parallelized architecture with a detection throughput of approximately 5.3 Mbps is presented.
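One way a heap can serve symbol vector sorting is as a bounded list of the K best candidates, with the worst kept candidate at the root so that it can be replaced in logarithmic time. The abstract does not describe how the heap is organised in the actual design, so the sketch below, including the name `keep_k_best` and the `(metric, vector)` candidate format, is an assumption for illustration only.

```python
import heapq


def keep_k_best(candidates, K):
    """Return the K (metric, vector) pairs with the smallest metric.

    A max-heap on the metric is emulated by negating it, so the root always
    holds the worst candidate currently kept.
    """
    heap = []
    for order, (metric, vec) in enumerate(candidates):
        # `order` breaks ties so that vectors are never compared directly.
        item = (-metric, order, vec)
        if len(heap) < K:
            heapq.heappush(heap, item)
        elif metric < -heap[0][0]:
            heapq.heapreplace(heap, item)   # drop the worst, insert the new one
    return sorted(((-m, v) for m, _, v in heap), key=lambda c: c[0])
```

Each new candidate costs at most one comparison against the root plus one heap update, rather than a full sort of the list.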

Application-Specific Systems, Architectures, and Processors | 2004

Stride permutation networks for array processors

Tuomas Järvinen; Perttu Salmela; Harri Sorokin; Jarmo Takala

In several digital signal processing algorithms, the computation is performed in consecutive stages consisting of parallel computational nodes. The stages are decoupled by data permutations where stride permutations are common because of their regularity. Parallel computation of such algorithms with reduced number of processing elements implies that several computational nodes are assigned to each element. As a drawback, permutations become more complex and require data storage. In this paper, register-based stride permutation networks are proposed for array processors where the storage requirement of the networks is relatively small, and thus, memory-based structures would be an expensive solution. The proposed networks are regular and scalable and they support any stride of power-of-two. In addition, the networks reach the lower bound in the number of registers indicating area-efficiency. Furthermore, the networks are generated without heuristics, which makes them attractive for automated design procedures.
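For readers unfamiliar with the term, a stride permutation reorders a sequence by reading it with a fixed stride. The sketch below shows the reordering itself in software; it says nothing about the register-based networks proposed in the paper. The function name and the choice of index convention (some texts define the inverse direction) are assumptions.

```python
def stride_permute(x, stride):
    """Stride-by-`stride` permutation of the sequence x.

    Elements are read at indices 0, S, 2S, ..., then 1, 1+S, 1+2S, ...,
    where S is the stride. The stride must divide the sequence length.
    """
    n = len(x)
    if n % stride != 0:
        raise ValueError("stride must divide the sequence length")
    return [x[j] for start in range(stride) for j in range(start, n, stride)]
```

For eight elements, a stride of 2 separates the even-indexed from the odd-indexed elements, and following it with a stride of 4 restores the original order.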

Joint IST Workshop on Mobile Future, 2006 and the Symposium on Trends in Communications. SympoTIC '06. | 2006

DSP implementation of Cholesky decomposition

Perttu Salmela; Aki Happonen; Tuomas Järvinen; Adrian Burian; Jarmo Takala

Both matrix inversion and the solution of a set of linear equations can be computed with the aid of the Cholesky decomposition. In this paper, the Cholesky decomposition is mapped to the typical resources of digital signal processors (DSPs), and our implementation applies a novel way of computing the fixed-point inverse square root function. The presented principles result in savings in the number of clock cycles. As a result, the Cholesky decomposition can be incorporated in applications such as a 3G channel estimator, where short execution time is crucial.
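The role of the inverse square root can be illustrated with a short floating-point sketch: once the reciprocal square root of a pivot is known, both the diagonal entry and the whole column below it follow from multiplications alone. The paper's fixed-point inverse square root method is not reproduced here; `inv_sqrt` is a plain floating-point stand-in, and the function names are assumptions.

```python
def inv_sqrt(x):
    # Floating-point stand-in (assumption); not the paper's fixed-point method.
    return x ** -0.5


def cholesky(A):
    """Lower-triangular L with A = L L^T for a real symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Pivot after removing the contribution of earlier columns.
        d = A[j][j] - sum(L[j][k] ** 2 for k in range(j))
        r = inv_sqrt(d)
        L[j][j] = d * r                      # sqrt(d) without a square root call
        for i in range(j + 1, n):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = s * r                  # multiplication replaces division
    return L
```

For example, `cholesky([[4, 2], [2, 3]])` yields a factor whose first column is 2 and 1.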

International Conference on Acoustics, Speech, and Signal Processing | 2001

Multi-port interconnection networks for radix-R algorithms

Jarmo Takala; Tuomas Järvinen; Perttu Salmela; David Akopian

In array processors, complex data reordering is often needed to realize the interconnection topologies between the computational nodes in algorithms. Several important algorithms, e.g., discrete trigonometric transforms and Viterbi decoding, can be represented in a radix-R form where the principal topology is a stride-by-R permutation. A general factorization of stride permutations is derived, which can be mapped onto register-based structures for constructing area-efficient multi-port interconnection networks. The networks can be modified to support several stride permutations and sequence sizes.

International Journal of Digital Multimedia Broadcasting | 2009

3G Long Term Evolution Baseband Processing with Application-Specific Processors

Perttu Salmela; Juho Antikainen; Teemu Pitkänen; Olli Silvén; Jarmo Takala

Data rates in the upcoming 3G long term evolution (LTE) standard will be many times higher than those of the current universal mobile telecommunications system. Implementing receivers that conform to the high-capacity transmission techniques is challenging due to the complexity and computational requirements of the algorithms. In this study, software defined radio (SDR) is targeted, and the four essential baseband functions of the 3G LTE receiver, namely list sphere decoding, fast Fourier transform, QR decomposition, and turbo decoding, are implemented as application-specific processors (ASPs). As a result, the design space that describes the essential computational challenges of 3G LTE receivers is clarified, and estimates of area, power, and interprocessor communication buffer requirements are presented.

EURASIP Journal on Embedded Systems | 2007

Application-Specific Instruction Set Processor Implementation of List Sphere Detector

Juho Antikainen; Perttu Salmela; Olli Silvén; Markku J. Juntti; Jarmo Takala; Markus Myllylä

Multiple-input multiple-output (MIMO) technology enables higher transmission capacity without additional frequency spectrum and is becoming a part of many wireless system standards. Sphere detection has been introduced in MIMO systems to achieve maximum likelihood (ML) or near-ML estimation with reduced complexity. This paper reviews related work on sphere detector implementations and presents an application-specific instruction set processor (ASIP) implementation of a K-best list sphere detector (LSD) using transport triggered architecture (TTA). The implementation is based on using memory and a heap data structure for symbol vector sorting. The design space is explored by presenting several variations of the implementation and comparing them with each other in terms of their latencies and hardware complexities. An early proposal for a parallelized architecture with a decoding throughput of approximately 5.3 Mbps is presented.

Signal Processing Systems | 2005

A flexible multiplier for media processing

Claudio Brunelli; Perttu Salmela; Jarmo Takala; Jari Nurmi

In recent years, multimedia processing applications have gained more and more importance in the field of mobile and hand-held devices, requiring dedicated hardware platforms characterized by high-performance computation capabilities with reduced area occupation and low power consumption. 2D graphics and signal processing applications in general benefit from the usage of integer single-instruction-multiple-data (SIMD) functional units, while 3D graphics applications can be significantly accelerated by employing single-precision floating-point functional units. This paper presents a model and implementation of a versatile multiplier able to perform double-precision or (paired) single-precision floating-point multiplications, or 16-bit or 8-bit SIMD integer (vector) multiplications. It was implemented on an FPGA device and compared to other floating-point multipliers and similar devices, each capable of performing only a limited subset of the functions of the proposed design. The results show that all the functionalities provided by the set of the other considered devices can be performed by the proposed design with a minor area overhead and still competitive performance; thus the proposed multiplier is a particularly good candidate for area-limited designs.
