Jarkko Niittylahti
Tampere University of Technology
Publications
Featured research published by Jarkko Niittylahti.
midwest symposium on circuits and systems | 2000
F. Curticapean; Jarkko Niittylahti
A new architecture for a direct digital frequency synthesizer (DDFS) is presented. The introduced technique reduces both the ROM storage requirements and the amount of additional logic and thus exhibits significant potential for low-power applications. Several design examples are described in the paper and compared with conventional designs.
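For context, the conventional baseline that such designs are compared against can be sketched as a phase accumulator driving a sine ROM. This is a minimal sketch of that textbook structure, not the architecture proposed in the paper; the names `make_sine_rom` and `ddfs` and all bit widths are assumptions chosen for illustration.

```python
import math

def make_sine_rom(addr_bits, amp_bits):
    """Full-period sine lookup table with 2**addr_bits entries (hypothetical sizing)."""
    size = 1 << addr_bits
    scale = (1 << (amp_bits - 1)) - 1  # largest positive signed amplitude
    return [round(scale * math.sin(2 * math.pi * i / size)) for i in range(size)]

def ddfs(fcw, n_samples, acc_bits=32, addr_bits=10, amp_bits=12):
    """Conventional DDFS: accumulate the frequency control word, look up the sine."""
    rom = make_sine_rom(addr_bits, amp_bits)
    mask = (1 << acc_bits) - 1
    phase = 0
    out = []
    for _ in range(n_samples):
        # The top addr_bits of the accumulator address the ROM.
        out.append(rom[phase >> (acc_bits - addr_bits)])
        # The accumulator wraps modulo 2**acc_bits, which gives the periodic phase.
        phase = (phase + fcw) & mask
    return out

# A control word of 2**30 with a 32-bit accumulator gives one period every 4 samples.
samples = ddfs(1 << 30, 4)
```

In this model the output frequency is fcw * f_clk / 2**acc_bits, and the ROM size grows exponentially with `addr_bits`, which is why ROM compression matters for low-power designs.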
IEEE Transactions on Circuits and Systems for Video Technology | 2004
Jarno K. Tanskanen; Tero Sihvo; Jarkko Niittylahti
This paper proposes an internal data memory architecture supporting byte and modulo addressing for processors having subword parallel processing capability, or alternatively, multiple SIMD-connected processing elements on-chip. Byte-addressable memory efficiently relieves the data word alignment problem in motion estimation block matching. In addition, a special modulo addressing allows part of the bytes in a word to be accessed simultaneously from both ends of a circular buffer. With the modulo-addressable memory, the external memory bandwidth can be significantly reduced, while preserving efficient memory access performance in block-matching operations. The proposed data memory architecture consists of parallel memory modules, address computation circuitry, and a data permutation network. Designs for different data bus widths (N = 2, 4, 8 bytes) are considered.
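The modulo-addressed access can be illustrated in software as a word read that wraps around the end of a circular buffer. This is a behavioral sketch only; the function name `read_word_modulo` and the buffer layout are assumptions, and the paper's hardware address computation is not reproduced here.

```python
def read_word_modulo(buf, start, n):
    """Read n successive bytes from a circular buffer, wrapping at its end.

    buf   : bytes-like circular buffer (hypothetical layout)
    start : byte address of the first byte to read
    n     : data bus width in bytes
    """
    size = len(buf)
    # Byte i of the word comes from address (start + i) modulo the buffer size,
    # so one access can return bytes from both ends of the buffer.
    return bytes(buf[(start + i) % size] for i in range(n))

buf = bytes(range(8))
word = read_word_modulo(buf, 6, 4)  # bytes 6, 7 from the end, then 0, 1 from the start
```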
signal processing systems | 2005
Jarno K. Tanskanen; Reiner Creutzburg; Jarkko Niittylahti
Some modern high-performance digital signal processors (DSPs) have byte-addressable internal data memory. This property is especially valuable in computationally demanding inter-frame video encoding, where data accesses are typically not aligned to word boundaries. Byte-addressable memory allows a load or store command to start accessing from any byte address, providing at most as many successive bytes from subsequent addresses as the data bus can handle in parallel. Perhaps the simplest way to construct such a byte-addressable memory is to use N 8-bit memory modules or banks accessed in parallel, where N is the data bus width in bytes. However, in addition to byte-addressable subsequent bytes, a memory consisting of parallel memory modules can provide much more versatile addressing capabilities at a reasonable implementation cost. Versatile access formats can significantly reduce the need for data reordering in the register file. First, we provide motivation for using a parallel memory architecture with versatile access formats as the internal on-chip data memory of a modern DSP. After this, notations are described and a general view of parallel memory design is given. We propose some example parallel data memory architecture designs with data access formats especially helpful in H.263 encoding and MPEG-4 core profile motion and texture encoding. The examples are given for different data bus widths (16, 32, 64, and 128 bits). Finally, performance is briefly compared to other memory architectures, and area, delay, and power figures are estimated.
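The simple construction mentioned in the abstract, N byte-wide modules accessed in parallel, can be modelled as follows. This is a sketch under assumed conventions (byte address a lives in module a mod N at row a div N, and a rotation acts as the data permutation); the class and method names are hypothetical and the paper's more versatile access formats are not covered.

```python
class ByteAddressableMemory:
    """Behavioral model of N parallel 8-bit memory modules (hypothetical layout)."""

    def __init__(self, n_modules, rows):
        self.n = n_modules
        self.modules = [bytearray(rows) for _ in range(n_modules)]

    def store_byte(self, addr, value):
        # Low-order interleaving: consecutive bytes go to consecutive modules.
        self.modules[addr % self.n][addr // self.n] = value

    def load_word(self, addr):
        """Load N successive bytes starting at any byte address."""
        n = self.n
        fetched = []
        for m in range(n):
            # Which of the N requested bytes is held by module m.
            offset = (m - addr) % n
            # Each module gets its own row address (address computation step).
            fetched.append(self.modules[m][(addr + offset) // n])
        # Rotate so the byte at addr comes first (data permutation step).
        shift = addr % n
        return bytes(fetched[shift:] + fetched[:shift])

mem = ByteAddressableMemory(n_modules=4, rows=4)
for a in range(16):
    mem.store_byte(a, a)
word = mem.load_word(3)  # unaligned access spanning two rows
```

Because every module supplies exactly one byte per access, an unaligned load costs the same single access as an aligned one in this model.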
EURASIP Journal on Advances in Signal Processing | 2003
Hakan Öktem; Karen O. Egiazarian; Jarkko Niittylahti; Juha Lemmetti
Digital radiography is a popular diagnostic imaging method. Denoising and enhancement have important potential for obtaining as much easily interpretable diagnostic information as possible with reasonable absorbed doses of ionising radiation. Due to the increasing usage of high-resolution and high-precision images with a limited number of human experts, the computational efficiency of the denoising and enhancement becomes important. In this paper, a local adaptive image enhancement and simultaneous denoising algorithm for fulfilling the requirements of digital X-ray image enhancement is introduced. The algorithm is based on modification of the wavelet transform coefficients by a pointwise nonlinear transformation and reconstructing the enhanced image from the modified wavelet transform coefficients. The implementation of the algorithm in software is simple, quick, and universal.
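The general pattern described here (transform, modify coefficients pointwise, reconstruct) can be sketched in one dimension. The abstract does not specify the wavelet or the nonlinearity, so this example assumes a one-level Haar transform and a soft threshold followed by a gain; `haar_forward`, `haar_inverse`, `modify`, and `enhance` are illustrative names, not the authors' algorithm.

```python
def haar_forward(x):
    """One-level Haar transform of an even-length sequence (assumed wavelet)."""
    approx = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Exact inverse of haar_forward."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([a + d, a - d])
    return out

def modify(d, threshold, gain):
    """Assumed pointwise nonlinearity: zero small coefficients, amplify the rest."""
    mag = abs(d) - threshold
    if mag <= 0:
        return 0.0  # treated as noise
    return gain * mag if d > 0 else -gain * mag

def enhance(x, threshold=1.0, gain=2.0):
    approx, detail = haar_forward(x)
    detail = [modify(d, threshold, gain) for d in detail]
    return haar_inverse(approx, detail)

# The small step (10 to 10.5) is smoothed away; the large edge (10 to 20) is sharpened.
result = enhance([10, 10.5, 10, 20])
```

With `threshold=0` and `gain=1` the modification is the identity, so the input is reconstructed unchanged.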
signal processing systems | 2004
Jarno K. Tanskanen; Jarkko Niittylahti
Current video compression standards, which process frames macroblock by macroblock, employ several processing functions to achieve the compression. These functions refer to the data memory address space in different ways. For example, motion estimation and motion compensation often require data accesses that are not aligned to word boundaries. On the other hand, the Discrete Cosine Transform (DCT) and its inverse (IDCT) for an 8 × 8 block can be performed first for rows and then for columns. Thus, transposition is needed between these two stages. Among other things, a parallel memory architecture can provide a solution for these tasks. In our other paper, we briefly surveyed parallel memory architectures and proposed parallel memory architecture designs for different data path widths for video coding applications. In this paper, we construct video coding function examples that use the proposed parallel data memory efficiently. Furthermore, the performance and implementation cost of the parallel memory architecture are estimated and compared to more conventional memory architectures. The examples are given for different data bus widths (16, 32, 64, and 128 bits). We show that the parallel memory can keep the data path fully utilized in many video coding function implementations. This ensures high-speed operation and full utilization of the processing resources.
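The row-column decomposition and the transposition between its two stages can be sketched in software. This is a plain floating-point reference model using the orthonormal DCT-II; the function names are assumptions, and the transposition that a parallel memory could provide through its access formats is done here with an explicit `transpose` helper.

```python
import math

def dct_1d(v):
    """Orthonormal 1-D DCT-II of a sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def transpose(block):
    return [list(row) for row in zip(*block)]

def dct_2d(block):
    """2-D DCT as two passes of row transforms with a transposition in between."""
    rows_done = [dct_1d(row) for row in block]
    # After transposing, the columns of the intermediate result are rows again,
    # so the same 1-D routine handles the second stage.
    cols_done = [dct_1d(row) for row in transpose(rows_done)]
    return transpose(cols_done)

flat = [[1.0] * 8 for _ in range(8)]
coeffs = dct_2d(flat)  # a constant block has only a DC coefficient
```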
international symposium on circuits and systems | 2003
Florean Curticapean; K.I. Palomaki; Jarkko Niittylahti
In this paper, a quadrature direct digital frequency synthesizer (DDFS) based on a new angle rotation algorithm is presented. It is shown that the proposed architecture features higher spectral purity, reduced hardware cost, power consumption, and tuning latency when compared to previously presented designs. The introduced DDFS produces 16-bit sine and cosine waveforms with a spurious free dynamic range (SFDR) of 114 dBc. The design was implemented using a 0.35 µm CMOS technology. It occupies an area of 0.46 mm² and dissipates 115 mW at 3.3 V supply voltage and 100 MHz clock.
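The abstract does not describe the new angle rotation algorithm itself. As a point of reference only, the classic CORDIC rotation, a well-known way to obtain sine and cosine together by rotating a vector through fixed micro-angles, can be sketched as below. It is a stand-in for the general idea, not the paper's method, and `cordic_sin_cos` is a hypothetical name.

```python
import math

def cordic_sin_cos(theta, iterations=24):
    """Classic CORDIC in rotation mode, valid for |theta| <= pi/2.

    Floating point is used for clarity; hardware would use shifts and adds.
    """
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Pre-compensate the constant gain of the micro-rotations.
    gain = 1.0
    for a in angles:
        gain *= math.cos(a)
    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0  # rotate towards the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x  # (sine, cosine), the quadrature pair

s, c = cordic_sin_cos(0.5)
```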
field-programmable custom computing machines | 2002
Tero Rissa; Milan Vasilko; Jarkko Niittylahti
This paper presents a system-level approach for modelling and implementing hardware-software systems, which contain Run-Time Reconfigurable (RTR) hardware. The developed technique provides management and scheduling of RTR tasks from system-level simulations to synthesizable VHDL descriptions. The developed technique was implemented using OCAPI-xl, a system-level modelling and implementation tool based on C++ libraries. The proposed approach allows designers to explore the tradeoffs between implementation of system partitions in software, static hardware, and RTR hardware. After the system has been partitioned, an OCAPI-xl-based design flow can be utilized for implementation of all the system components.
field-programmable technology | 2002
Tero Rissa; Riku Uusikartano; Jarkko Niittylahti
This paper presents a technique for realizing adaptive FIR filters that use constant-coefficient multipliers on a run-time reconfigurable FPGA. Three different adaptive FIR filter architectures for run-time reconfigurable FPGAs are presented. It is shown that run-time reconfigurable logic can be used to efficiently implement adaptive constant-coefficient FIR filters. With reasonable configuration latency, benefits in speed, area and power consumption are obtained.
Microprocessors and Microsystems | 2002
Juha Alakarhu; Jarkko Niittylahti
The DRAM is a typical bottleneck of a digital system. Moreover, the architectures are getting more complex, which makes their evaluation harder. In this paper, we propose a fast and early estimation of the memory system performance with an actual application in mind. A DRAM simulator, Rascas, developed for this purpose is presented. Rascas is used to study the DRAM behavior with several memory system configurations. The results show that the bandwidth of the same DRAM arrangement can vary by over 50% with typical cache configurations and applications. We also show that the packet-oriented multi-bank architectures can exploit the temporal locality of the memory references better than the traditional architectures.
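Why multiple banks help with locality can be seen in a toy open-row model. This is not Rascas and does not reflect its interface; the function `simulate`, the address mapping, and the cycle counts `t_hit` and `t_miss` are all assumptions made for illustration.

```python
def simulate(addresses, n_banks=4, row_size=1024, t_hit=1, t_miss=5):
    """Toy DRAM model: each bank keeps one row open (hypothetical timings).

    Returns (row_hits, total_cycles) for a trace of byte addresses.
    """
    open_rows = {}  # bank index -> currently open row
    hits = 0
    cycles = 0
    for addr in addresses:
        row_index = addr // row_size
        bank = row_index % n_banks   # consecutive rows are spread over the banks
        row = row_index // n_banks
        if open_rows.get(bank) == row:
            hits += 1
            cycles += t_hit          # the row is already open
        else:
            open_rows[bank] = row    # precharge and activate the new row
            cycles += t_miss
    return hits, cycles

trace = [0, 4, 8, 1024, 0]
multi = simulate(trace, n_banks=4)   # the return to address 0 still hits
single = simulate(trace, n_banks=1)  # the row was closed by the access to 1024
```

In this trace the four-bank configuration keeps the first row open while another bank serves address 1024, so the final access hits; with a single bank it misses.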
midwest symposium on circuits and systems | 2000
K.I. Palomaki; Jarkko Niittylahti
This paper presents a Taylor series approximation based method for digital sine and cosine phase-to-amplitude conversion. The performance of the phase-to-amplitude conversion and the spurious free dynamic range (SFDR) of the output signal are improved by applying methods similar to those used in direct digital frequency synthesizers for ROM compression. Finally, simulation results according to the number of approximation terms and the amplitude bit width are presented and compared with other phase-to-amplitude conversion methods.
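The basic idea of Taylor-series phase-to-amplitude conversion can be sketched as a coarse table lookup refined by a few series terms in the remaining fine phase. This is a floating-point illustration under assumed parameters; `taylor_sine`, `coarse_bits`, and `order` are hypothetical names, and the ROM compression techniques mentioned in the abstract are not modelled.

```python
import math

def taylor_sine(phase, coarse_bits=6, order=2):
    """Approximate sin(2*pi*phase) for phase in [0, 1).

    The coarse phase selects an expansion point; in hardware its sine and
    cosine would come from small ROMs. The fine phase drives the Taylor terms.
    """
    size = 1 << coarse_bits
    idx = int(phase * size)                  # coarse phase (table address)
    theta0 = 2 * math.pi * idx / size        # expansion point
    delta = 2 * math.pi * phase - theta0     # fine phase, 0 <= delta < 2*pi/size
    s0, c0 = math.sin(theta0), math.cos(theta0)
    derivs = [s0, c0, -s0, -c0]              # derivatives of sine repeat with period 4
    result = 0.0
    for k in range(order + 1):
        result += derivs[k % 4] * delta ** k / math.factorial(k)
    return result

approx = taylor_sine(0.1234)
exact = math.sin(2 * math.pi * 0.1234)
```

With 64 expansion points and terms up to second order, the truncation error is bounded by delta³/6, roughly 1.6e-4 in this configuration; adding terms or widening the coarse table trades logic against table size.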