Grzegorz Pastuszak | Researchain


Publication

Featured research published by Grzegorz Pastuszak.

IEEE Transactions on Circuits and Systems for Video Technology | 2005

A high-performance architecture for embedded block coding in JPEG 2000

Grzegorz Pastuszak

JPEG 2000 offers critical advantages over other still image compression schemes at the price of increased computational complexity. Hardware-accelerated performance is key to the successful development of real-time JPEG 2000 solutions for applications such as digital cinema and digital home theatre. Embedded block coding with optimized truncation plays the crucial role in the whole processing chain because it requires bit-level operations. In this paper, a dedicated architecture of the block-coding engine is presented. Square-based bit-plane scanning and an internal first-in first-out buffer are combined to speed up the context generation. A dynamic significance state restoring technique reduces the size of the state memories to 1 kbit. The pipeline architecture, enhanced by an inverse multiple branch selection method, is exploited to code two context-symbol pairs per clock cycle in the arithmetic coder module. The block-coding architecture was implemented in VHDL and synthesized for field-programmable gate array devices. Simulation results show that a single engine can process, on average, about 22 million samples per second at a 66-MHz working frequency.
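The bit-level operations mentioned in this abstract arise because embedded block coding processes sample magnitudes one bit-plane at a time, starting from the most significant plane. The following is a minimal Python sketch of that decomposition only; the function name `bit_planes` and the sample block are hypothetical, and the sketch does not model the paper's square-based scanning, state memories, or arithmetic coder.

```python
def bit_planes(magnitudes, num_planes):
    """Split a 2D list of non-negative magnitudes into bit-planes, MSB first."""
    planes = []
    for p in range(num_planes - 1, -1, -1):
        # Extract bit p of every sample in the block.
        planes.append([[(v >> p) & 1 for v in row] for row in magnitudes])
    return planes


# Hypothetical 2x2 block of magnitudes that fit in 3 bits.
block = [[5, 0], [3, 6]]
planes = bit_planes(block, 3)
```

Each entry of `planes` is a binary image of the same shape as the block; a block coder would visit these planes in order and emit context-symbol pairs for the bits it encounters.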

IEEE Computer Society Annual Symposium on VLSI | 2008

Transforms and Quantization in the High-Throughput H.264/AVC Encoder Based on Advanced Mode Selection

Grzegorz Pastuszak

The H.264/AVC standard allows for a high compression efficiency at the cost of computational complexity. To achieve the highest possible efficiency, the proposed architecture supports mode selection based on rate-distortion optimization. In particular, the dataflow assumes an average throughput of 32 samples/coefficients per clock cycle, allowing many compression options to be checked. Moreover, the architecture supports all transform sizes specified for the High Profile using the same hardware resources. Synthesis results show that the design can work at 100 MHz on Stratix II FPGA devices.
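Mode selection based on rate-distortion optimization is commonly expressed as choosing the candidate that minimizes the Lagrangian cost J = D + λ·R. The sketch below illustrates that general rule only; the function name `select_mode`, the candidate triples, and the value of λ are hypothetical and are not taken from the paper.

```python
def select_mode(candidates, lam):
    """Return (name, cost) of the candidate with the lowest D + lam * R."""
    best_name, best_cost = None, float("inf")
    for name, distortion, rate_bits in candidates:
        cost = distortion + lam * rate_bits
        if cost < best_cost:
            best_name, best_cost = name, cost
    return best_name, best_cost


# Hypothetical (mode, distortion, rate in bits) triples for one block.
candidates = [
    ("intra_4x4", 120.0, 96),
    ("intra_16x16", 200.0, 40),
    ("inter_16x16", 150.0, 60),
]
best_mode, best_cost = select_mode(candidates, lam=2.0)
```

A higher throughput lets the encoder evaluate more candidates per block, which is why the abstract ties throughput to the number of compression options that can be checked.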

IEEE Transactions on Circuits and Systems for Video Technology | 2016

Algorithm and Architecture Design of the H.265/HEVC Intra Encoder

Grzegorz Pastuszak; Andrzej Abramowski

Improved video coding techniques introduced in the H.265/High Efficiency Video Coding (HEVC) standard allow video encoders to achieve better compression efficiencies. On the other hand, the increased complexity requires a new design methodology able to face challenges associated with ever higher spatiotemporal resolutions. This paper presents a computationally scalable algorithm and its hardware architecture able to support intra encoding up to 2160p@30 frames/s resolution. The scalability allows a tradeoff between the throughput and the compression efficiency. In particular, the encoder is able to check a variable number of candidate modes. The rate estimation based on bin counting and the distortion estimation in the transform domain simplify the rate-distortion analysis and enable the evaluation of a great number of candidate intra modes. The encoder preselects candidate modes by the processing of 8 × 8 predictions computed from original samples. The preselection shares hardware resources used for the processing of predictions generated from reconstructed samples. To support intra 4×4 modes for the 2160p@30 frames/s resolution, the encoder incorporates a separate reconstruction loop. The processing of blocks with different sizes is interleaved to compensate for the delay of reconstruction loops. Implementation results show that the encoder utilizes 1086k gates and 52-kB on-chip memories for TSMC 90 nm. The main reconstruction loop can operate at 400 MHz, whereas the remaining modules work at 200 MHz. For 2160p@30 frames/s videos, the average BD-rate is 5.46% compared with that of the HM software.
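To put the 2160p@30 frames/s target in perspective, the required sample rate can be compared with the two clock frequencies quoted in the abstract. This is a back-of-the-envelope sketch that assumes a 3840×2160 frame and 4:2:0 chroma sampling, neither of which is stated in the abstract; the names used are illustrative.

```python
WIDTH, HEIGHT, FPS = 3840, 2160, 30  # assumed 2160p frame size and rate
CHROMA_FACTOR = 1.5                  # assumed 4:2:0 sampling (luma + two quarter-size chroma planes)

# Total luma and chroma samples that must be encoded every second.
samples_per_second = WIDTH * HEIGHT * FPS * CHROMA_FACTOR


def samples_per_cycle(clock_hz):
    """Average number of samples that must be consumed per clock cycle."""
    return samples_per_second / clock_hz


at_200_mhz = samples_per_cycle(200e6)
at_400_mhz = samples_per_cycle(400e6)
```

Under these assumptions, a module clocked at 200 MHz has to consume just under two samples per cycle on average, before counting any repeated evaluation of candidate modes.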

IEEE Transactions on Circuits and Systems for Video Technology | 2008

A High-Performance Architecture of the Double-Mode Binary Coder for H.264/AVC

Grzegorz Pastuszak

H.264/AVC offers critical advantages over other video compression schemes at the price of increased computational complexity. The efficiency of hardware video encoders depends on all modules embedded in the processing path. This paper presents the architecture of the H.264/AVC binary coder, which is the last stage of the video coder. The module conforms to H.264/AVC High Profile and supports two binary coding modes: context adaptive binary arithmetic coding (CABAC) and context adaptive variable-length coding (CAVLC). The architecture saves a considerable amount of hardware resources since the two coding modes share the same logic and storage elements. Five versions of the arithmetic coding path are developed to study the area/performance tradeoff related to parallel symbol encoding. The implementation results show that parallel symbol encoding allows higher efficiency. The whole architecture of the binary coder is described in VHDL and synthesized for different configurations to show the implementation cost of some coding options. For both CAVLC and CABAC modes, the architecture achieves similar throughput, sufficient to support HDTV in real time.
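CABAC codes binary symbols (bins), so non-binary syntax elements are first mapped to bin strings. The sketch below shows two of the simplest binarization schemes, unary and truncated unary, as a general illustration; the function names are hypothetical, and the code does not represent the paper's hardware or its parallel symbol encoding.

```python
def unary(value):
    """Unary binarization: `value` ones followed by a terminating zero."""
    if value < 0:
        raise ValueError("value must be non-negative")
    return [1] * value + [0]


def truncated_unary(value, c_max):
    """Truncated unary: like unary, but the terminating zero is dropped at c_max."""
    if not 0 <= value <= c_max:
        raise ValueError("value must lie in [0, c_max]")
    bins = [1] * value
    if value < c_max:
        bins.append(0)
    return bins


example_unary = unary(3)
example_truncated = truncated_unary(3, 3)
```

Each bin produced this way is one symbol for the arithmetic coder, so an architecture that encodes several symbols per clock cycle shortens the time spent on long bin strings.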

IEEE Transactions on Circuits and Systems for Video Technology | 2013

Adaptive Computationally Scalable Motion Estimation for the Hardware H.264/AVC Encoder

Grzegorz Pastuszak; Mariusz H. Jakubowski

Motion estimation is the most computationally intensive part of video encoders, as the compression efficiency usually increases with the amount of computation. The adaptive computationally scalable motion-estimation algorithm and its hardware implementation described in this paper allow H.264/AVC encoders to achieve efficiencies close to optimal in real-time conditions. The algorithm employs several search strategies to adapt to local motion activity, and the number of checked search points is set by the encoder controller for each macroblock. The algorithm can achieve results close to optimum even if the number of search points assigned to macroblocks is strongly limited and varies over time. The architecture applies a novel dataflow. First, the motion vector generation is not constrained by the calculation of residuals and corresponding costs. Second, the fractional-pel interpolation is performed prior to the integer-pel search. Third, motion estimation (ME) and compensation use the same resources. The architecture is verified in the real-time field-programmable gate array hardware encoder. The synthesis results and the real-time verification show that the design can support HDTV at 200 MHz for 0.13-μm TSMC technology.

IET Image Processing | 2014

Hardware architectures for the H.265/HEVC discrete cosine transform

Grzegorz Pastuszak

This study presents a design methodology for the two-dimensional (2D) discrete cosine transform dedicated to H.265/HEVC hardware encoders. The methodology decomposes matrix multiplications for different transform sizes into steps based on the division of transform units into fixed-size blocks. The modified order of processed blocks allows a significant reduction of the size of the transposition buffer. As a consequence, the resource consumption of the whole 2D-transform architecture is decreased. Separate transform cores assigned to the two transform stages more than double the throughput. The decomposition enables different hardware configurations of the architectures. In particular, the architectures applying the proposed methodology are parametrically specified in VHDL, and configuration parameters enable a tradeoff between resources and throughput. Furthermore, the interface can be adapted to the desired horizontal and vertical sizes. The use of regular multipliers allows support for transforms specified in other video standards. Computational elements embedded in the architectures are well-suited to FPGA devices, which improves the area-speed efficiency. Synthesis results show that they can operate at 200 and 400 MHz when implemented in FPGA Arria II and TSMC 90 nm, respectively.

Digital Systems Design | 2013

A Novel Intra Prediction Architecture for the Hardware HEVC Encoder

Andrzej Abramowski; Grzegorz Pastuszak

This work presents a novel intra prediction architecture for the hardware High Efficiency Video Coding (HEVC) encoder. The architecture supports the full range of features included in the standard, in accordance with the Main and Main 10 profiles, i.e., all modes and all Prediction Unit (PU) sizes. The architecture embeds an internal RAM working at a doubled clock rate to provide quick access to reference samples. This also leads to a reduction in the required number of registers while maintaining a high throughput. All needed multiplications are carried out using multiplexers and adders. The module provides a few soft configuration options, allowing the encoder to skip some modes and PU sizes. This feature trades computation time for compression efficiency. The module can produce 8×8 prediction blocks in almost every clock cycle. The design can operate at 100 MHz and 200 MHz for FPGA Arria II devices and the TSMC 0.13-μm technology, respectively. The implementations generating all allowable predictions are able to process almost 15 and 30 frames per second for 1080p sequences for FPGA and ASIC, respectively. When 4×4 predictions are off, the throughput is doubled.

Parallel Computing in Electrical Engineering | 2004

A Novel Architecture of Arithmetic Coder in JPEG2000 Based on Parallel Symbol Encoding

Grzegorz Pastuszak

This paper presents a high-performance architecture of the context adaptive binary arithmetic coder (CABAC) for the embedded block-coding algorithm in JPEG 2000. The architecture has been developed in two variants to code two or three context-symbol pairs per clock cycle. The inverse multiple branch selection (IMBS) method is proposed to minimize critical paths, which originate from causally dependent operations. The designs have been implemented in VHDL and synthesized for FPGA devices. Simulation results show that the two- and three-symbol engines can process about 22 million samples per second at 77 and 53 MHz working frequency, respectively.

IEEE Transactions on Circuits and Systems for Video Technology | 2015

Architecture Design of the H.264/AVC Encoder Based on Rate-Distortion Optimization

Grzegorz Pastuszak

Digital Systems Design | 2013

Architecture Design and Efficiency Evaluation for the High-Throughput Interpolation in the HEVC Encoder

Grzegorz Pastuszak; Maciej Trochimiuk
