Jari Nikara | Researchain

Publication

Featured research published by Jari Nikara.

IEEE Transactions on Very Large Scale Integration Systems | 2004

Multiple-symbol parallel decoding for variable length codes

Jari Nikara; Stamatis Vassiliadis; Jarmo Takala; Petri Liuha

In this paper, a multiple-symbol parallel variable length decoding (VLD) scheme is introduced. The scheme is capable of decoding all the codewords in an N-bit block of the encoded input data stream. The proposed method partially breaks the recursive dependency related to the VLD. First, all possible codewords in the block are detected in parallel and their lengths are returned. The procedure yields a redundant number of codeword lengths, from which incorrect values are removed by recursive selection. Next, the index for each symbol corresponding to the detected codeword is generated from the length, which determines the page, and the partial codeword, which defines the offset in the symbol table. The symbol lookup can be performed independently from the symbol table. Finally, the sum of the valid codeword lengths is provided to an external shifter that aligns the encoded input stream for a new decoding cycle. In order to prove feasibility and determine the limiting factors of our proposal, the variable length decoder has been implemented on a field-programmable gate-array (FPGA) technology. When applied to MPEG-2 standard benchmark scenes, on average 4.8 codewords are decoded per cycle, resulting in a throughput of 106 million symbols per second.
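The decoding steps in this abstract (parallel length detection, recursive selection, symbol lookup, shift amount) can be illustrated with a minimal Python sketch. It rests on several assumptions that are not from the paper: the four-entry prefix code `CODE` is a hypothetical toy table rather than an MPEG-2 table, the function names are invented for illustration, a plain dictionary lookup stands in for the page/offset index generation, and a software loop stands in for hardware that would examine all bit positions at once.

```python
# Toy prefix code (hypothetical; not an MPEG-2 table).
CODE = {"0": "a", "10": "b", "110": "c", "111": "d"}
MAX_LEN = max(len(cw) for cw in CODE)


def detect_lengths(block):
    """Step 1: length of the codeword starting at each bit position.

    Returns 0 where no complete codeword fits inside the block. Hardware
    would evaluate every position in parallel; the loop only models that.
    """
    lengths = []
    for pos in range(len(block)):
        found = 0
        for n in range(1, MAX_LEN + 1):
            if pos + n <= len(block) and block[pos:pos + n] in CODE:
                found = n
                break
        lengths.append(found)
    return lengths


def select_valid(lengths):
    """Step 2: recursive selection of the valid codeword start positions.

    Starting at bit 0, each valid length points to the next valid start;
    lengths detected at other positions are redundant and are skipped.
    """
    starts, pos = [], 0
    while pos < len(lengths) and lengths[pos] > 0:
        starts.append(pos)
        pos += lengths[pos]
    return starts


def decode_block(block):
    """Steps 3 and 4: look up the symbols and compute the shift amount."""
    lengths = detect_lengths(block)
    starts = select_valid(lengths)
    symbols = [CODE[block[s:s + lengths[s]]] for s in starts]
    shift = sum(lengths[s] for s in starts)  # bits consumed this cycle
    return symbols, shift


symbols, shift = decode_block("01011011")
print(symbols, shift)
```

For the 8-bit block `01011011`, the sketch decodes three symbols and reports a shift of 6 bits; the trailing `11` is an incomplete codeword that stays in the stream for the next cycle.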

International Conference on Computer Design | 2002

Parallel multiple-symbol variable-length decoding

Jari Nikara; Stamatis Vassiliadis; Jarmo Takala; Mihai Sima; Petri Liuha

In this paper, a parallel Variable-Length Decoding (VLD) scheme is introduced. The scheme is capable of decoding all the codewords in an N-bit buffer whose accumulated codelength is at most N. The proposed method partially breaks the recursive dependency related to the MPEG-2 VLD. All possible codewords in the buffer are detected in parallel, and the sum of the codelengths is provided to the external shifter that aligns the variable-length coded input stream for a new decoding cycle. Two length detection mechanisms are proposed: the first determines the length in a parallel/serial fashion, and the second uses a new device denoted as MultiplexedAdd. In order to prove feasibility and determine the limiting factors of our proposal, the parallel/serial codeword detector with 32-bit input has been described in behavioral non-optimized VHDL and mapped onto Altera's ACEX EP1K100 FPGA. The implemented prototype exhibits a latency of 110 ns and uses 32% of the logic cells of the device. When applied to MPEG-2 standard benchmark scenes, on average 3.5 symbols are decoded per cycle.

Human Factors in Computing Systems | 2012

EasyGroups: binding mobile devices for collaborative interactions

Andrés Lucero; Tero Jokela; Arto Palin; Viljakaisa Aaltonen; Jari Nikara

We present a touch and proximity based method for binding a group of mobile devices into an ecosystem for collaborative interactions. We aim to provide a seamless user experience by integrating the binding method with the application start-up flow. Our method also determines the order of the devices, allowing implementation of spatial interactions.

Design, Automation, and Test in Europe | 2009

A case for multi-channel memories in video recording

Eero Aho; Jari Nikara; Petri Antero Tuominen; Kimmo Kuusilinna

In video recording, ever-increasing demands on image resolution, frame rate, and quality necessitate a lot of memory bandwidth and energy. This paper presents and evaluates such a potential memory load in future handheld multimedia devices. Based on the achieved simulation results, multi-channel memories provide the capability for high bandwidth without excessive overhead in terms of energy consumption. A full HDTV (1080p) quality video recording with H.264/AVC encoding at 30 frames per second (fps) is found here to require 4.3 GB/s memory bandwidth. According to the simulations, this memory requirement can be fulfilled with four 32-bit memory channels operating at 400 MHz and consuming 345 mW of power. As another example, a 400 MHz 8-channel memory configuration is able to provide the required bandwidth for video recording at up to 3840×2160 at 30 fps. Die stacking is the technology thought to be able to provide the required bandwidth, sufficiently low power consumption, and the multi-channel memory organization.
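As a rough plausibility check on the figures above, the theoretical peak bandwidth of a multi-channel memory can be estimated as channels × bus width × clock × transfers per clock. The sketch below is an assumption-laden illustration, not the paper's simulation model: the abstract does not state whether the 400 MHz channels transfer data once or twice per clock, so both cases are shown, and the function name `peak_bandwidth_gbs` is hypothetical. Sustained bandwidth in a real system is lower than this peak.

```python
def peak_bandwidth_gbs(channels, width_bits, clock_mhz, transfers_per_clock):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = width_bits // 8
    bytes_per_second = (
        channels * bytes_per_transfer * clock_mhz * 1e6 * transfers_per_clock
    )
    return bytes_per_second / 1e9


REQUIRED_GBS = 4.3  # 1080p H.264/AVC recording at 30 fps, from the abstract

# transfers_per_clock is an assumption: 1 = single data rate, 2 = double.
for transfers_per_clock in (1, 2):
    peak = peak_bandwidth_gbs(4, 32, 400, transfers_per_clock)
    share = REQUIRED_GBS / peak
    print(f"{transfers_per_clock} transfer(s)/clock: peak {peak:.1f} GB/s, "
          f"required load uses {share:.0%} of peak")
```

Under either assumption, the four-channel peak exceeds the 4.3 GB/s requirement, which is consistent with the abstract's claim that this configuration suffices.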

International Conference on Electronics, Circuits, and Systems | 2002

Pipeline architecture for two-dimensional discrete cosine transform and its inverse

Jarmo Takala; Jari Nikara; Konsta Punkka

In this paper, a pipeline architecture supporting both the 8×8 discrete cosine transform (DCT) and its inverse is described. A regular two-dimensional algorithm with perfect shuffle topology for DCT is derived. The resulting signal flow graph is mapped vertically onto sequential processing units. A similar pipeline architecture is derived for the inverse transform. The unified architecture is obtained by mapping both previous pipelines onto common resources. The proposed architecture contains three multipliers for 8×8 transforms, and its throughput can be increased with additional pipelining.

Signal Processing | 2006

Discrete cosine and sine transforms: regular algorithms and pipeline architectures

Jari Nikara; Jarmo Takala; Jaakko Astola

In this paper, regular fast algorithms for discrete cosine transform (DCT) and discrete sine transform (DST) of types II-IV are proposed and mapped onto pipeline architectures. The algorithms are based on the factorization of transform matrices described earlier by Wang. The regular structures of the algorithms are advantageous when mapping them onto hardware, although such algorithms do not reach the theoretical lower bound on multiplicative complexity. Instead, the algorithms lend themselves to vertical mapping, resulting in area-efficient pipeline structures. A unified pipeline architecture supporting both the DCT-II and its inverse is implemented with data path synthesis to prove the feasibility and estimate the performance. The latency of an ASIC implementation is 94 cycles while operating at a frequency of 250 MHz.

Embedded Systems for Real Time Multimedia | 2012

Towards real-time applications in mobile web browsers

Eero Aho; Kimmo Kuusilinna; Tomi Aarnio; Janne Pietiäinen; Jari Nikara

WebGL and WebCL are web-targeted versions of the OpenGL ES and OpenCL standards. Using these standards, it is possible to better exploit the hardware resources in embedded systems from web browsers, allowing timely processing of audio, video, and graphics. WebGL excels in graphics applications, while WebCL fares better when more flexibility is required in execution platform selection, load balancing, data formats, control flow, or memory access patterns. This paper explores the potential for mobile web application acceleration utilizing WebGL and particularly WebCL, which is currently under intense development. Where driver support is lacking, WebGL is used as a proxy to provide an estimate of the WebCL opportunity. Speedups on the order of 200x over JavaScript are demonstrated in best-case situations for a GPU target. In similar situations, CPU acceleration can be 10x while running in a laptop browser. In addition, as building and optimizing a WebCL implementation is part of the reported work, an overview of the important development issues is given.

International Symposium on System-on-Chip | 2009

Performance analysis of multi-channel memories in mobile devices

Jari Nikara; Eero Aho; Petri Antero Tuominen; Kimmo Kuusilinna

Multi-channel memories can be organized in a variety of ways to optimize for different kinds of memory loads. However, their efficient configuration and management in a mobile environment is not obvious. In this paper, a SystemC model of a multi-channel memory is constructed out of low-power double data rate SDRAMs. The model is simulated with sketchy load in order to understand how memory access size and the number of channels affect access times and power figures. The simulations confirm that applications with large data accesses benefit from multi-channel memories. When used properly, multi-channel memories provide the capability for high throughput but do not introduce excessive overhead compared to single-channel memories in terms of energy consumption. The experiments also reveal that relatively small accesses can be extremely expensive if the memory is not properly configured. In future systems, novel policies, advanced control mechanisms, and reorganization of traditional memory management are needed to keep the power consumption manageable.

International Conference on Acoustics, Speech, and Signal Processing | 2000

Pipeline architecture for 8×8 discrete cosine transform

Jarmo Takala; Jari Nikara; David Akopian; Jaakko Astola; Jukka Saarinen

An array processor architecture for the 2-D discrete cosine transform (DCT) is presented, based on the row-column decomposition of the 2-D DCT. The utilized 1-D DCT architectures are derived by applying the principles used to construct pipelined fast Fourier transform architectures. In general, this approach has not been used due to the irregularities found in the fast DCT algorithms. The basis of our architectural derivation is the constant geometry fast algorithms for DCT described earlier. By rescheduling the operations, an in-place algorithm can be obtained, which can be mapped onto a pipelined structure with the aid of vertical projection. In addition, a sequential matrix transposition network is described, which is based on shift-exchange units.
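The row-column decomposition mentioned in this abstract can be sketched in a few lines of Python: a 1-D DCT is applied to every row, the intermediate result is transposed, and the 1-D DCT is applied again. This is only an illustration of the decomposition itself. It uses a direct O(n²) unnormalized DCT-II rather than the constant geometry fast algorithms or the pipelined hardware described in the paper, and all function names are assumptions made for the example.

```python
import math


def dct_1d(x):
    """Unnormalized DCT-II of a sequence, computed directly in O(n^2)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n))
        for k in range(n)
    ]


def transpose(matrix):
    """Matrix transpose; the step a transposition network performs."""
    return [list(col) for col in zip(*matrix)]


def dct_2d_row_column(block):
    """2-D DCT as two passes of 1-D DCTs with a transpose in between."""
    row_pass = [dct_1d(row) for row in block]
    col_pass = [dct_1d(col) for col in transpose(row_pass)]
    return transpose(col_pass)


def dct_2d_direct(block):
    """Reference 2-D DCT from the double-sum definition, for comparison."""
    n = len(block)
    return [
        [
            sum(block[i][j]
                * math.cos(math.pi * (2 * i + 1) * u / (2 * n))
                * math.cos(math.pi * (2 * j + 1) * v / (2 * n))
                for i in range(n) for j in range(n))
            for v in range(n)
        ]
        for u in range(n)
    ]


block = [[(8 * i + j) % 7 for j in range(8)] for i in range(8)]
fast = dct_2d_row_column(block)
ref = dct_2d_direct(block)
max_error = max(abs(fast[u][v] - ref[u][v])
                for u in range(8) for v in range(8))
print(max_error < 1e-9)
```

The comparison against the double-sum definition shows that the two 1-D passes reproduce the full 2-D transform, which is why a matrix transposition stage sits between the 1-D units in row-column architectures.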

International Symposium on Circuits and Systems | 2002

Register-based reordering networks for matrix transpose

Jarmo Takala; Tuomas Järvinen; Jari Nikara

In array processors, data reordering is often needed to perform the computations in correct order. Matrix transpose is such a reordering operation used, e.g., in block-based video coding implementations. In this paper, a parameterized decomposition of the permutation matrix performing 2^k × 2^k matrix transpose is derived. A systematic approach to design register-based reordering units based on the decomposition is proposed where the number of ports, 2^q, can be varied, q ≤ k.
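To illustrate why the transpose of a 2^k × 2^k matrix can be expressed as a product of simpler permutations, the sketch below uses a standard identity: on a row-major array of 2^(2k) elements, one perfect shuffle rotates each element's 2k-bit index left by one bit, so k shuffles swap the row and column bit fields, which is exactly a transpose. This is not the parameterized decomposition derived in the paper, and it does not model the 2^q-port register-based units; the function names are hypothetical.

```python
def perfect_shuffle(data):
    """Interleave the first and second halves: [a0, b0, a1, b1, ...]."""
    half = len(data) // 2
    out = []
    for a, b in zip(data[:half], data[half:]):
        out.extend((a, b))
    return out


def transpose_by_shuffles(flat, k):
    """Transpose a row-major 2^k x 2^k matrix using k perfect shuffles."""
    for _ in range(k):
        flat = perfect_shuffle(flat)
    return flat


k = 2
size = 2 ** k
flat = list(range(size * size))  # row-major 4 x 4 matrix holding 0..15
result = transpose_by_shuffles(flat, k)
expected = [flat[c * size + r] for r in range(size) for c in range(size)]
print(result == expected)
```

Decomposing the permutation into identical stages like this is what makes it attractive for hardware, since each stage can reuse the same small reordering structure.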
