Tuomas Järvinen | Researchain

Publication

Featured research published by Tuomas Järvinen.

international symposium on circuits and systems | 2003

Conflict-free parallel memory access scheme for FFT processors

Jarmo Takala; Tuomas Järvinen; Harri Sorokin

In this paper, a parallel access scheme for constant geometry FFT algorithms is proposed, which allows conflict-free access of operands distributed over parallel memory modules. The scheme is a linear transformation, and the address generation is performed with the aid of bit-wise XOR operations. Different FFT lengths can be supported with the aid of a simple address rotation unit. The scheme is general, supporting several radices in FFT computations and different numbers of parallel memory modules. The scheme allows parallel butterfly computations independent of the FFT length.
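
The idea of XOR-based module addressing can be illustrated with a minimal sketch. It assumes the simplest case only: radix-2 butterflies, two memory modules, and a module index equal to the XOR (parity) of all address bits. The function names are hypothetical, and the paper's actual scheme, which covers several radices and module counts, is not reproduced here.

```python
def module_index(addr: int) -> int:
    """Return the memory module (0 or 1) as the XOR of all address bits."""
    parity = 0
    while addr:
        parity ^= addr & 1
        addr >>= 1
    return parity


def is_conflict_free(n_bits: int) -> bool:
    """Check the operand pairs of a radix-2 constant geometry FFT of length 2**n_bits.

    Assumed access pattern: butterfly i reads addresses i and i + N/2
    and writes addresses 2i and 2i + 1.
    """
    n = 1 << n_bits
    half = n >> 1
    for i in range(half):
        if module_index(i) == module_index(i + half):
            return False  # both read operands would hit the same module
        if module_index(2 * i) == module_index(2 * i + 1):
            return False  # both write operands would hit the same module
    return True
```

Each operand pair differs in exactly one address bit, so the parities differ and the two operands always fall into different modules, whatever the FFT length.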

application-specific systems, architectures, and processors | 2004

Stride permutation networks for array processors

Tuomas Järvinen; Perttu Salmela; Harri Sorokin; Jarmo Takala

In several digital signal processing algorithms, the computation is performed in consecutive stages consisting of parallel computational nodes. The stages are decoupled by data permutations, among which stride permutations are common because of their regularity. Parallel computation of such algorithms with a reduced number of processing elements implies that several computational nodes are assigned to each element. As a drawback, permutations become more complex and require data storage. In this paper, register-based stride permutation networks are proposed for array processors where the storage requirement of the networks is relatively small, and thus memory-based structures would be an expensive solution. The proposed networks are regular and scalable, and they support any power-of-two stride. In addition, the networks reach the lower bound in the number of registers, indicating area-efficiency. Furthermore, the networks are generated without heuristics, which makes them attractive for automated design procedures.
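
For readers unfamiliar with the term, a stride permutation can be sketched in software as follows. This is only the reordering itself under the usual definition (take every `stride`-th element, starting from offset 0, then offset 1, and so on); the function name is hypothetical and the register-based hardware networks of the paper are not modelled.

```python
def stride_permutation(x, stride):
    """Reorder x by reading every `stride`-th element for each start offset."""
    n = len(x)
    if stride <= 0 or n % stride != 0:
        raise ValueError("stride must be positive and divide the sequence length")
    return [x[j] for r in range(stride) for j in range(r, n, stride)]
```

For example, `stride_permutation(list(range(8)), 2)` gives `[0, 2, 4, 6, 1, 3, 5, 7]`, and applying stride 4 to that result restores the original order.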

Joint IST Workshop on Mobile Future, 2006 and the Symposium on Trends in Communications. SympoTIC '06. | 2006

DSP implementation of Cholesky decomposition

Perttu Salmela; Aki Happonen; Tuomas Järvinen; Adrian Burian; Jarmo Takala

Both matrix inversion and the solution of a set of linear equations can be computed with the aid of the Cholesky decomposition. In this paper, the Cholesky decomposition is mapped to the typical resources of digital signal processors (DSP), and our implementation applies a novel way of computing the fixed-point inverse square root function. The presented principles result in savings in the number of clock cycles. As a result, the Cholesky decomposition can be incorporated in applications such as a 3G channel estimator, where short execution time is crucial.
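
The role of the inverse square root in the decomposition can be seen in a minimal floating-point sketch. It assumes a real symmetric positive definite input given as a list of lists; the function name is hypothetical, and the fixed-point DSP mapping described in the paper is not reproduced.

```python
import math


def cholesky(a):
    """Return the lower-triangular L with L * L^T = a (a must be SPD)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        # One inverse square root per column; every division in the
        # column then becomes a multiplication.
        inv_sqrt = 1.0 / math.sqrt(d)
        l[j][j] = d * inv_sqrt
        for i in range(j + 1, n):
            s = a[i][j] - sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = s * inv_sqrt
    return l
```

Structuring the column update around a single inverse square root is what makes a cheap approximation of that function attractive on hardware without a fast divider.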

international conference on acoustics, speech, and signal processing | 2001

Multi-port interconnection networks for radix-R algorithms

Jarmo Takala; Tuomas Järvinen; Perttu Salmela; David Akopian

In array processors, complex data reordering is often needed to realize the interconnection topologies between the computational nodes in algorithms. Several important algorithms, e.g., discrete trigonometric transforms and Viterbi decoding, can be represented in a radix-R form where the principal topology is a stride-by-R permutation. A general factorization of stride permutations is derived, which can be mapped onto register-based structures for constructing area-efficient multi-port interconnection networks. The networks can be modified to support several stride permutations and sequence sizes.

IEEE Transactions on Communications | 2005

Systematic approach for path metric access in Viterbi decoders

Tuomas Järvinen; Perttu Salmela; Teemu Sipilä; Jarmo Takala

A systematic approach for the path metric memory management in Viterbi decoders is presented. Between the parallel computation units and memory modules, a permutation of path metrics is required in order to access the path metrics in the correct order. We propose a parallel memory access scheme, which reduces the interconnection complexity between parallel computation units and memory modules by rescheduling the path metric computations.

Joint IST Workshop on Mobile Future, 2006 and the Symposium on Trends in Communications. SympoTIC '06. | 2006

Simplified max-log-MAP decoder structure

Perttu Salmela; Tuomas Järvinen; Jarmo Takala

Area-efficient turbo decoder structures are needed in 3G receivers. Typically, max-log-MAP decoders are used as component decoders. In this paper, it is shown that all the computations of the max-log-MAP algorithm can be carried out with slightly modified add-compare-select (ACS) units. All the required computations of the max-log-MAP algorithm are analyzed and mapped to the proposed ACS units (ACSU). A set of four ACSUs is multiplexed to carry out the computations. With the presented method, the structure of the max-log-MAP decoder is simplified and the computing resources are shared economically.
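
As background, the basic add-compare-select operation can be sketched as below. This is a plain, unmodified ACS step with hypothetical names; the abstract does not specify the modifications made in the proposed ACSU, so none are assumed here.

```python
def acs(metric_a, branch_a, metric_b, branch_b):
    """Add-compare-select: return (surviving metric, index of the winning path)."""
    cand_a = metric_a + branch_a  # add
    cand_b = metric_b + branch_b
    if cand_a >= cand_b:          # compare
        return cand_a, 0          # select path a
    return cand_b, 1              # select path b
```

In the max-log-MAP forward recursion, a state metric is the maximum over incoming paths of the previous state metric plus the branch metric, which is exactly this max-of-sums form.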

personal, indoor and mobile radio communications | 2003

Parallel memory access in turbo decoders

Perttu Salmela; Tuomas Järvinen; Teemu Sipilä; Jarmo Takala

The memory requirements of turbo decoders are high since long code block lengths are preferred. In particular, the extrinsic information memory is accessed frequently with both linear and interleaved access patterns. In this paper, a parallel access scheme for the extrinsic information memory is developed for a 3GPP turbo decoder. A single-port memory is divided into modules that can be accessed in parallel. Module and word address generating functions are developed that take into account the memory throughput requirements and both the linear and interleaved access patterns. As a result, the throughput of the parallel access scheme allows high-speed decoding, the use of dual-port memory can be avoided, and savings in chip area are achieved.

International Scholarly Research Notices | 2011

Low-complexity Inverse Square Root Approximation for Baseband Matrix Operations

Perttu Salmela; Adrian Burian; Tuomas Järvinen; Aki Happonen; Jarmo Takala

Baseband functions like channel estimation and symbol detection of sophisticated telecommunications systems require matrix operations, which apply highly nonlinear operations like division or square root. In this paper, a scalable low-complexity approximation method of the inverse square root is developed and applied in Cholesky and QR decompositions. Computation is derived by exploiting the binary representation of the fixed-point numbers and by substituting the highly nonlinear inverse square root operation with a more implementation-appropriate function. Low complexity is obtained since the proposed method does not use large multipliers or look-up tables (LUT). Due to the scalability, the approximation accuracy can be adjusted according to the targeted application. The method is also applied as an accelerating unit of an application-specific instruction-set processor (ASIP) and as a software routine of a conventional DSP. As a result, the method can accelerate any fixed-point system where cost-efficiency and low power consumption are of high importance, and a coarse approximation of the inverse square root operation is required.
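
How the binary representation can yield a coarse inverse square root without large multipliers or tables is shown in the following generic sketch. It is not the method of the paper: it simply assumes a positive integer input, locates the leading one bit at position p, and returns 2^(-p/2). The function name is hypothetical.

```python
def coarse_inv_sqrt(x: int) -> float:
    """Approximate 1/sqrt(x) from the position of the leading one bit of x."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    p = x.bit_length() - 1       # x lies in [2**p, 2**(p + 1))
    estimate = 2.0 ** -(p // 2)  # a pure shift in fixed-point hardware
    if p & 1:
        estimate *= 0.7071067811865476  # constant 1/sqrt(2) for odd p
    return estimate
```

The estimate is exact for powers of two and otherwise overestimates the true value by a factor below sqrt(2). A scalable method would refine this starting point to the accuracy the application needs.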

international conference / workshop on embedded computer systems: architectures, modeling and simulation | 2004

Register-Based Permutation Networks for Stride Permutations

Tuomas Järvinen; Jarmo Takala

In several digital signal processing algorithms, intermediate results between computational stages are reordered according to stride permutations. If such algorithms are computed in parallel with a reduced number of processing elements, where one element computes several computational nodes, the permutation, instead of being hardwired, requires storage for intermediate data elements. In this paper, register-based permutation networks for stride permutations are proposed. The proposed networks are regular and scalable, and they support any power-of-two stride. In addition, the networks reach the minimum register complexity, i.e., the number of registers, indicating area-efficiency.

signal processing systems | 2007

Stride Permutation Networks for Array Processors

Tuomas Järvinen; Perttu Salmela; Harri Sorokin; Jarmo Takala

In several digital signal processing algorithms, computational nodes are organized in consecutive stages and data is reordered between these stages. Parallel computation of such algorithms with a reduced number of processing elements implies that several computational nodes are assigned to each element. As a drawback, permutations become more complex and require data storage. In this paper, a systematic design methodology for stride permutation networks is derived. These permutations are represented with Boolean matrices, which are decomposed and mapped directly onto register-based networks. The resulting networks are regular and scalable, and they support any power-of-two stride. In addition, the networks reach the lower bound in the number of registers, indicating area-efficiency. Since the proposed methodology is systematic, it can be exploited in automated design generation.
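
One way to see why a Boolean matrix can represent a power-of-two stride permutation is that the permutation only rearranges the bits of the element index. The sketch below assumes a sequence of length 2^n and a stride of 2^k and implements the reordering as a left rotation of the n-bit index by k positions. The function names are hypothetical, and the matrix decomposition of the paper is not reproduced.

```python
def rotate_left(index: int, k: int, n_bits: int) -> int:
    """Rotate an n_bits-wide index left by k bit positions."""
    k %= n_bits
    mask = (1 << n_bits) - 1
    return ((index << k) | (index >> (n_bits - k))) & mask


def stride_by_bit_rotation(x, k):
    """Apply the stride-by-2**k permutation to x, whose length is a power of two."""
    n = len(x)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 2")
    n_bits = n.bit_length() - 1
    return [x[rotate_left(i, k, n_bits)] for i in range(n)]
```

Because a bit rotation is a permutation of the index bits, it corresponds to a permutation matrix acting on the index written as a bit vector.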
