Jarmo Takala

Tampere University of Technology

Publication

Featured research published by Jarmo Takala.

Archive | 2013

Handbook of Signal Processing Systems

Shuvra S. Bhattacharyya; Ed F. Deprettere; Rainer Leupers; Jarmo Takala

Handbook of Signal Processing Systems is organized in three parts. The first part motivates representative applications that drive and apply state-of-the-art methods for design and implementation of signal processing systems; the second part discusses architectures for implementing these applications; the third part focuses on compilers and simulation tools and describes models of computation and their associated design tools and methodologies. This handbook is an essential tool for professionals in many fields and researchers of all levels.

field-programmable logic and applications | 2010

Customized Exposed Datapath Soft-Core Design Flow with Compiler Support

Otto Esko; Pekka Jääskeläinen; Pablo Huerta; Carlos Sanches De La Lama; Jarmo Takala; José Ignacio Martínez

A popular way to exploit high-level programming languages in FPGA designs is to use a soft-core with accompanying software development tools. However, a common shortcoming of the current soft-core offerings is their limited software execution capability: the required performance for the implementation can often be reached only with instruction set extensions. In this paper, we propose and evaluate an application-specific processor design toolset that uses a multi-issue exposed data path processor architecture template. The main benefit of the architecture is scalability with respect to instruction-level parallelism (ILP). The design flow allows the designer to freely customize the data path resources in the core to exploit the ILP available in computation-intensive kernels. The design toolset includes a retargetable C compiler and an architecture simulator, making design space exploration feasible. The experiments show that a relatively small soft-core tailored with the toolset provides significant speedups on software execution without using any instruction set extensions. The best measured speedup in comparison to the major commercial soft-cores was fourfold in applications from the CHStone benchmark suite, while the amount of consumed FPGA resources remained moderate.

signal processing systems | 2010

Binary Adders on Quantum-Dot Cellular Automata

Ismo Hänninen; Jarmo Takala

This article describes the design of adder units on quantum-dot cellular automata (QCA) nanotechnology, which promises very dense circuits and high operating frequencies, using a single homogeneous layer of the basic cells. We construct pipelined structures that avoid the earlier noise problems through careful clocking organization, and the modular layouts are verified with the QCADesigner coherence vector simulation. Our designs occupy only a fraction of the area of the previous noise-rejecting design while also offering superior performance, and it is shown that the wiring overhead of the arithmetic circuits on QCA grows with square-law dependence on the operand word length. Power analysis at the fundamental Landauer’s limit shows that the operating frequencies will indeed be bound by the energy dissipated in information erasure: under irreversible operation, the clock rates of the adder units on molecular QCA are only tens of gigahertz, while the switching speed of the technology is in the terahertz regime.
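
For orientation, the Landauer bound mentioned in the abstract can be written out as follows. This is a generic sketch, not the article's own power analysis: the temperature T = 300 K, the number of erased bits per clock cycle N, and the clock frequency f are assumptions introduced here for illustration.

```latex
% Minimum energy dissipated when one bit of information is erased
% (Landauer's limit), evaluated at an assumed temperature T = 300 K:
E_{\min} = k_B T \ln 2
         \approx (1.38 \times 10^{-23}\,\mathrm{J/K}) (300\,\mathrm{K}) (0.693)
         \approx 2.9 \times 10^{-21}\,\mathrm{J}

% If a circuit irreversibly erases N bits per clock cycle at clock
% frequency f (both assumed symbols), its dissipated power is at least
P_{\min} = N \, f \, k_B T \ln 2
```

Because the lower bound on power grows linearly with f, a fixed power budget caps the usable clock rate under irreversible operation, which is the kind of limit the abstract refers to.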

international conference on embedded computer systems: architectures, modeling, and simulation | 2010

OpenCL-based design methodology for application-specific processors

Pekka O. Jääskeläinen; Carlos S. de La Lama; Pablo Huerta; Jarmo Takala

OpenCL is a programming language standard which enables the programmer to express the application by structuring its computation as kernels. The OpenCL compiler is given the explicit freedom to parallelize the execution of kernel instances at all the levels of parallelism. In comparison to the traditional C programming language, which is sequential in nature, OpenCL enables higher utilization of parallelism naturally available in hardware constructs while still having a feasible learning curve for engineers familiar with the C language. This paper describes the methodology and compiler techniques involved in applying OpenCL as an input language for a design flow of application-specific processors. At the core of the methodology is a whole-program optimizing compiler that links together the host and kernel codes of the input OpenCL program and parallelizes the result on a customized statically scheduled processor. The OpenCL vendor extension mechanism is used to provide clean access to custom operations. The methodology is studied with a design case to verify the scalability of the implementation at the instruction level and to exemplify the use of custom operations. The case shows that the use of OpenCL allows producing scalable application-specific processor designs and makes it possible to gradually reach the performance of hand-tailored RTL designs by exploiting the OpenCL extension mechanism to access custom hardware operations of varying complexity.

international conference on acoustics, speech, and signal processing | 2008

Complex-valued QR decomposition implementation for MIMO receivers

Perttu Salmela; Adrian Burian; Harri Sorokin; Jarmo Takala

Multiple input multiple output (MIMO) transmission is an emerging technique targeted at 3G long-term evolution (LTE) systems. One vital baseband function in MIMO receivers is QR decomposition of the channel matrix. In this paper, a processor-based complex-valued QR decomposition is presented. The processor is enhanced with complex arithmetic and inverse square root function units. The proposed processor fits well with the real-time requirements of the MIMO receiver. The computing power is tailored for typical MIMO systems. Due to the generality of the applied computing resources, it can also be used for other tasks. Also, the presented principles can be applied to any customizable processor architecture to accelerate QR decomposition.
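
To make the role of complex arithmetic and the inverse square root concrete, here is a minimal software sketch of a complex-valued QR decomposition. The abstract does not say which QR algorithm the processor runs, so the modified Gram-Schmidt procedure below, the function names `qr_mgs` and `matmul`, and the example matrix are all assumptions chosen for illustration. The sketch also assumes the input matrix has full column rank.

```python
import math


def qr_mgs(a):
    """Modified Gram-Schmidt QR of a complex matrix given as a list of rows.

    Illustrative only: assumes full column rank, so no column norm is zero.
    Returns (Q, R) with Q having orthonormal columns and R upper triangular.
    """
    m, n = len(a), len(a[0])
    # Work column by column; copy the input into a list of complex columns.
    cols = [[complex(a[i][j]) for i in range(m)] for j in range(n)]
    q_cols = []
    r = [[0j] * n for _ in range(n)]
    for k in range(n):
        v = cols[k]
        norm_sq = sum((x * x.conjugate()).real for x in v)
        inv_norm = 1.0 / math.sqrt(norm_sq)    # the inverse square root step
        r[k][k] = complex(norm_sq * inv_norm)  # equals the column norm
        q = [x * inv_norm for x in v]          # normalized column of Q
        q_cols.append(q)
        for j in range(k + 1, n):
            # Projection coefficient r_kj = q^H a_j (complex inner product).
            r[k][j] = sum(qi.conjugate() * x for qi, x in zip(q, cols[j]))
            # Remove the component along q from the remaining column.
            cols[j] = [x - r[k][j] * qi for x, qi in zip(cols[j], q)]
    q_mat = [[q_cols[j][i] for j in range(n)] for i in range(m)]
    return q_mat, r


def matmul(x, y):
    """Plain matrix product of two matrices given as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]


# Hypothetical 2x2 channel matrix, for illustration only.
H = [[1 + 1j, 2], [0.5j, 1 - 1j]]
Q, R = qr_mgs(H)
```

Multiplying the factors back with `matmul(Q, R)` should reproduce `H` up to floating-point rounding. Each column needs one inverse square root and a run of complex multiply-accumulate operations, the two operation types the abstract says the processor provides dedicated units for.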

International Journal of Parallel Programming | 2015

pocl: A Performance-Portable OpenCL Implementation

Pekka Jääskeläinen; Carlos S. de La Lama; Kalle Raiskila; Jarmo Takala; Heikki Berg

OpenCL is a standard for parallel programming of heterogeneous systems. The benefits of a common programming standard are clear; multiple vendors can provide support for application descriptions written according to the standard, thus reducing the program porting effort. While the standard brings the obvious benefits of platform portability, the performance portability aspects are largely left to the programmer. The situation is made worse by multiple proprietary vendor implementations with different characteristics and, thus, different required optimization strategies. In this paper, we propose an OpenCL implementation that is both portable and performance portable. At its core is a kernel compiler that can be used to exploit the data parallelism of OpenCL programs on multiple platforms with different parallel hardware styles. The kernel compiler is modularized to perform target-independent parallel region formation separately from the target-specific parallel mapping of the regions to enable support for various styles of fine-grained parallel resources such as subword SIMD extensions, SIMD datapaths and static multi-issue. Unlike previous similar techniques that work on the source level, the parallel region formation retains the information of the data parallelism using the LLVM IR and its metadata infrastructure. This data can be exploited by the later generic compiler passes for efficient parallelization. The proposed open source implementation of OpenCL is also platform portable, enabling OpenCL on a wide range of architectures, both already commercialized and on those that are still under research. The paper describes how the portability of the implementation is achieved. We test the two aspects of portability by utilizing the kernel compiler and the OpenCL implementation to run OpenCL applications on various platforms with different styles of parallel resources. The results show that most of the benchmarked applications, when compiled using pocl, were faster than or close to as fast as the best proprietary OpenCL implementation for the platform at hand.

signal processing systems | 2013

Pedestrian Navigation Based on Inertial Sensors, Indoor Map, and WLAN Signals

Helena Leppäkoski; Jussi Collin; Jarmo Takala

As satellite signals, e.g. GPS, are severely degraded indoors or not available at all, other methods are needed for indoor positioning. In this paper, we propose methods for combining information from inertial sensors, an indoor map, and WLAN signals for pedestrian indoor navigation. We present results of field tests where a complementary extended Kalman filter was used to fuse together WLAN signal strengths and signals of an inertial sensor unit including one gyro and a three-axis accelerometer. A particle filter was used to combine the inertial data with map information. The results show that both the map information and WLAN signals can be used to improve the pedestrian dead reckoning estimate based on inertial sensors. The results with different combinations of the available sensor information are compared.

IEEE Sensors Journal | 2012

Bias Prediction for MEMS Gyroscopes

Martti Kirkko-Jaakkola; Jussi Collin; Jarmo Takala

MEMS gyroscopes are gaining popularity because of their low manufacturing costs in large quantities. For navigation system engineering, this presents a challenge because of strong nonstationary noise processes, such as 1/f noise, in the output of MEMS gyros. In practice, on-the-fly calibration is often required before the gyroscope data are useful and comparable to more expensive optical gyroscopes. In this paper, we focus on an important part of MEMS gyro processing, i.e., predicting the future bias given calibration data with known (usually zero) input. We derive prediction algorithms based on Kalman filtering and the computation of moving averages, and compare their performance against simple averaging of the calibration data based on both simulations and real measured data. The results show that it is necessary to model fractional noise in order to consistently predict the bias of a modern MEMS gyro, but the complexity of the Kalman filter approach makes other methods, such as the moving averages, appealing.

IEEE Transactions on Aerospace and Electronic Systems | 2007

User-level reliability monitoring in urban personal satellite-navigation

Heidi Kuusniemi; Andreas Wieser; Gérard Lachapelle; Jarmo Takala

Monitoring the reliability of the obtained user position is of great importance, especially when using the global positioning system (GPS) as a standalone system. In the work presented here, we discuss reliability testing, reliability enhancement, and quality control for global navigation satellite system (GNSS) positioning. Reliability testing usually relies on statistical tests for receiver autonomous integrity monitoring (RAIM) and fault detection and exclusion (FDE). It is here extended by including an assessment of the redundancy and the geometry of the obtained user position solution. The reliability enhancement discussed here includes rejection of possible outliers and the use of a robust estimator, namely a modified Danish method. We pay special attention to navigation applications in degraded signal environments such as indoors, where typically multiple errors occur simultaneously. The results of applying the discussed methods to high-sensitivity GPS data from an indoor experiment demonstrate that weighted estimation, FDE, and quality control yield a significant improvement in reliability and accuracy. The accuracy actually obtained was 40% better than with equal weights and no FDE; the rms value of horizontal errors was reduced from 15 m to 9 m, and the maximum horizontal errors were largely reduced.

IEEE Transactions on Very Large Scale Integration Systems | 2004

Multiple-symbol parallel decoding for variable length codes

Jari Nikara; Stamatis Vassiliadis; Jarmo Takala; Petri Liuha

In this paper, a multiple-symbol parallel variable length decoding (VLD) scheme is introduced. The scheme is capable of decoding all the codewords in an N-bit block of the encoded input data stream. The proposed method partially breaks the recursive dependency related to the VLD. First, all possible codewords in the block are detected in parallel and their lengths are returned. The procedure results in a redundant number of codeword lengths, from which incorrect values are removed by recursive selection. Next, the index for each symbol corresponding to the detected codeword is generated from the length determining the page and the partial codeword defining the offset in the symbol table. The symbol lookup can be performed independently from the symbol table. Finally, the sum of the valid codeword lengths is provided to an external shifter aligning the encoded input stream for a new decoding cycle. In order to prove feasibility and determine the limiting factors of our proposal, the variable length decoder has been implemented on a field-programmable gate array (FPGA) technology. When applied to MPEG-2 standard benchmark scenes, on average 4.8 codewords are decoded per cycle, resulting in a throughput of 106 million symbols per second.
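
The steps described in this abstract can be sketched in software roughly as follows. This is a sequential illustration, not the paper's hardware design: the prefix code in `CODE`, the function names, and the dictionary-based symbol lookup (in place of the page/offset indexing) are assumptions made here for the sake of a small runnable example, and the loop over bit positions stands in for detectors that would operate in parallel in hardware.

```python
# Hypothetical prefix code, for illustration only (not taken from the paper).
CODE = {"0": "A", "10": "B", "110": "C", "111": "D"}
MAX_LEN = max(len(c) for c in CODE)


def detect_lengths(block):
    """Step 1: for every bit position, find the length of the codeword that
    starts there, or 0 if no complete codeword fits inside the block."""
    lengths = []
    for pos in range(len(block)):
        found = 0
        for n in range(1, MAX_LEN + 1):
            if pos + n > len(block):
                break  # a codeword of this length would overrun the block
            if block[pos:pos + n] in CODE:
                found = n
                break
        lengths.append(found)
    return lengths


def select_valid(lengths):
    """Step 2: recursive selection. Only positions reached by chaining the
    lengths from position 0 hold real codeword boundaries; the rest of the
    detected lengths are redundant and ignored."""
    starts, pos = [], 0
    while pos < len(lengths) and lengths[pos] > 0:
        starts.append(pos)
        pos += lengths[pos]
    return starts, pos  # pos is the sum of the valid codeword lengths


def decode_block(block):
    """Decode every complete codeword in one block of bits (a '0'/'1' string)
    and return the symbols plus the shift amount for the next cycle."""
    lengths = detect_lengths(block)
    starts, shift = select_valid(lengths)
    # Step 3: symbol lookup, independent for each selected codeword.
    symbols = [CODE[block[s:s + lengths[s]]] for s in starts]
    return symbols, shift


symbols, shift = decode_block("01011011")
```

For the 8-bit block `"01011011"` this yields the symbols `A`, `B`, `C` and a shift of 6 bits; the trailing `11` is an incomplete codeword and is left for the next decoding cycle, mirroring the role of the external shifter in the abstract.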