Amila Edirisuriya | Researchain

Publication

Featured research published by Amila Edirisuriya.

IEEE Transactions on Circuits and Systems | 2014

Improved 8-Point Approximate DCT for Image and Video Compression Requiring Only 14 Additions

Uma Potluri; Arjuna Madanayake; Renato J. Cintra; Fábio M. Bayer; Sunera Kulasekera; Amila Edirisuriya

The multimedia market's need for low-energy video processing systems such as HEVC has led to extensive development of fast algorithms for the efficient approximation of 2-D DCT transforms. The DCT is employed in a multitude of compression standards due to its remarkable energy compaction properties. Multiplier-free approximate DCT transforms have been proposed that offer superior compression performance at very low circuit complexity. Such approximations can be realized in digital VLSI hardware using additions and subtractions only, leading to significant reductions in chip area and power consumption compared to conventional DCTs and integer transforms. In this paper, we introduce a novel 8-point DCT approximation that requires only 14 addition operations and no multiplications. The proposed transform possesses low computational complexity and is compared to state-of-the-art DCT approximations in terms of both algorithm complexity and peak signal-to-noise ratio. The proposed DCT approximation is a candidate for reconfigurable video standards such as HEVC. The proposed transform and several other DCT approximations are mapped to systolic-array digital architectures, physically realized as digital prototype circuits using FPGA technology, and mapped to 45 nm CMOS technology.
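The abstract above compares transforms by peak signal-to-noise ratio but does not say how the figure was computed. As a minimal sketch, assuming the standard PSNR definition for 8-bit samples (peak value 255), the metric could be evaluated as follows. The function name `psnr` and the flat-list image representation are illustrative choices, not taken from the paper.

```python
import math

def psnr(original, reconstructed, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists.

    Assumes the standard definition 10*log10(peak^2 / MSE); peak=255
    corresponds to 8-bit images (an assumption, not stated in the abstract).
    """
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: PSNR is unbounded
    return 10 * math.log10(peak ** 2 / mse)

# Every sample off by one gives MSE = 1, so PSNR = 20*log10(255), about 48.13 dB.
example_db = psnr([255, 0, 128, 64], [254, 1, 127, 65])
```

A higher PSNR means the image reconstructed through the approximate transform is closer to the original.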

IEEE Transactions on Circuits and Systems for Video Technology | 2012

A Row-Parallel 8 × 8 2-D DCT Architecture Using Algebraic Integer-Based Exact Computation

Arjuna Madanayake; Renato J. Cintra; Denis Onen; Vassil S. Dimitrov; Nilanka T. Rajapaksha; Leonard T. Bruton; Amila Edirisuriya

An algebraic integer (AI)-based time-multiplexed row-parallel architecture and two final reconstruction step (FRS) algorithms are proposed for the implementation of bivariate AI-encoded 2-D discrete cosine transform (DCT). The architecture directly realizes an error-free 2-D DCT without using FRSs between row-column transforms, leading to an 8 × 8 2-D DCT that is entirely free of quantization errors in the AI basis. As a result, the user-selectable accuracy in the FRS allows each of the 64 coefficients to have its precision set independently of the others, avoiding the leakage of quantization noise between channels that occurs in published DCT designs. The proposed FRS uses two approaches based on: 1) optimized Dempster-Macleod multipliers, and 2) expansion factor scaling. This architecture enables low-noise, high-dynamic-range applications in digital video processing that require full control of the finite-precision computation of the 2-D DCT. The proposed architectures and FRS techniques are experimentally verified and validated using hardware implementations that are physically realized and verified on a field-programmable gate array (FPGA) chip. Six designs, for 4-bit and 8-bit input word sizes, using the two proposed FRS schemes, have been designed, simulated, physically implemented, and measured. The maximum clock rate and block rate achieved among 8-bit input designs are 307.787 MHz and 38.47 MHz, respectively, implying a pixel rate of 8 × 307.787 ≈ 2.462 GHz if eventually embedded in a real-time video-processing system. The equivalent frame rate is about 1187.35 Hz for an image size of 1920 × 1080. All implementations are functional on a Xilinx Virtex-6 XC6VLX240T FPGA device.
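The throughput figures quoted in this abstract follow from simple arithmetic, sketched below. The assumption of 8 pixels per clock cycle is inferred from the "8 × 307.787" expression in the abstract, and the helper names `throughput` and `frame_rate` are illustrative. The quoted 1187.35 Hz is reproduced when starting from the rounded block rate of 38.47 MHz; the unrounded clock rate gives a slightly higher value.

```python
def throughput(clock_hz, pixels_per_cycle=8, block_pixels=64):
    """Return (pixel_rate, block_rate) in Hz for a row-parallel 8x8 design.

    pixels_per_cycle=8 is an assumption read off the abstract's arithmetic.
    """
    pixel_rate = clock_hz * pixels_per_cycle
    block_rate = pixel_rate / block_pixels
    return pixel_rate, block_rate

def frame_rate(pixel_rate, width=1920, height=1080):
    """Frames per second sustained at a given pixel rate."""
    return pixel_rate / (width * height)

pixel_rate, block_rate = throughput(307.787e6)   # about 2.462e9 and 38.47e6

# Frame rate as quoted in the abstract, derived from the rounded block rate.
quoted_fps = frame_rate(38.47e6 * 64)            # about 1187.35

# Frame rate from the unrounded clock rate.
exact_fps = frame_rate(pixel_rate)               # about 1187.45
```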

ieee convention of electrical and electronics engineers in israel | 2012

A multiplication-free digital architecture for 16×16 2-D DCT/DST transform for HEVC

Amila Edirisuriya; Arjuna Madanayake; Renato J. Cintra; Fábio M. Bayer

The discrete cosine transform (DCT) is widely employed in image and video coding applications due to its high energy compaction. In addition to the 4×4 and 8×8 transforms utilized in earlier video coding standards, the proposed HEVC standard suggests the use of larger transform sizes, including 16×16 and 32×32 transforms, in order to obtain higher coding gains. Further, it also proposes the use of non-square transform sizes as well as the use of the discrete sine transform (DST) in certain intra-prediction modes. The decision on the type of transform used in a given prediction scenario is made dynamically, to obtain the required compression rates. This motivated the proposed digital VLSI architecture for a multitransform engine capable of computing the 16×16 approximate 2-D DCT/DST transform with null multiplicative complexity. The relationship between DCT-II and DST-II is employed to compute both transforms using the same digital core, leading to reductions in both area and power. A closed-form relationship between the 16×16 transform and arbitrary smaller-sized transforms is presented, enabling this architecture to compute transforms of size 4·2^p × 4·2^q, where 0 ≤ p, q ≤ 2.
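The abstract does not spell out which DCT-II/DST-II relationship the architecture uses. One well-known form is that the DST-II of a sequence equals the DCT-II of the sign-alternated sequence read in reverse order. The sketch below checks that identity numerically with exact (unnormalized, floating-point) transforms rather than the paper's multiplier-free approximation, and lists the transform sizes covered by 4·2^p × 4·2^q. All function names are illustrative.

```python
import math

def dct2(x):
    """Unnormalized DCT-II, direct O(N^2) evaluation."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n)) for k in range(n)]

def dst2(x):
    """Unnormalized DST-II, direct O(N^2) evaluation."""
    n = len(x)
    return [sum(x[i] * math.sin(math.pi * (2 * i + 1) * (k + 1) / (2 * n))
                for i in range(n)) for k in range(n)]

def dst2_via_dct2(x):
    """DST-II computed with a DCT-II core: alternate input signs, reverse output."""
    alternated = [v if i % 2 == 0 else -v for i, v in enumerate(x)]
    return dct2(alternated)[::-1]

# Transform sizes of the form 4*2^p x 4*2^q with 0 <= p, q <= 2.
supported_sizes = [(4 * 2 ** p, 4 * 2 ** q) for p in range(3) for q in range(3)]

sample = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
max_error = max(abs(a - b) for a, b in zip(dst2(sample), dst2_via_dct2(sample)))
```

Because only a sign flip on the input and a reordering of the output are needed, a single transform core can serve both transforms, which is consistent with the area and power savings the abstract reports.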

Measurement Science and Technology | 2012

A digital hardware fast algorithm and FPGA-based prototype for a novel 16-point approximate DCT for image compression applications

Fábio M. Bayer; Renato J. Cintra; Amila Edirisuriya; Arjuna Madanayake

The discrete cosine transform (DCT) is the key step in many image and video coding standards. The eight-point DCT is an important special case, possessing several widely investigated low-complexity approximations. However, the 16-point DCT has energy compaction advantages. In this sense, this paper presents a new 16-point DCT approximation with null multiplicative complexity. The proposed transform matrix is orthogonal and contains only zeros and ones. The proposed transform outperforms the well-known Walsh–Hadamard transform and the current state-of-the-art 16-point approximation. A fast algorithm for the proposed transform is also introduced. This fast algorithm is experimentally validated using hardware implementations that are physically realized and verified on a 40 nm CMOS Xilinx Virtex-6 XC6VLX240T FPGA chip for a maximum clock rate of 342 MHz. Rapid prototypes on FPGA for an 8-bit input word size show significant improvement in compressed image quality, by up to 1–2 dB, at the cost of only eight adders compared to the state-of-the-art 16-point DCT approximation algorithm in the literature (Bouguezel et al 2010 Proc. 53rd IEEE Int. Midwest Symp. on Circuits and Systems).

IEEE Circuits and Systems Magazine | 2015

Low-Power VLSI Architectures for DCT/DWT: Precision vs Approximation for HD Video, Biomedical, and Smart Antenna Applications

Arjuna Madanayake; Renato J. Cintra; Vassil S. Dimitrov; Fábio M. Bayer; Khan A. Wahid; Sunera Kulasekera; Amila Edirisuriya; Uma Potluri; Shiva Madishetty; Nilanka T. Rajapaksha

The DCT and the DWT are used in a number of emerging DSP applications, such as HD video compression, biomedical imaging, and smart antenna beamformers for wireless communications and radar. Of late, there has been much interest in fast algorithms for the computation of the above transforms using multiplier-free approximations because they result in low-power and low-complexity systems. Approximate methods rely on the trade-off of accuracy for lower power and/or circuit complexity/chip area. This paper provides a detailed review of VLSI architectures and CAS implementations for both DCT/DWTs, which can be designed either for higher accuracy or for low power consumption. This article covers recent theoretical advancements on discrete transforms as well as an overview of existing VLSI architectures. The paper also discusses error-free VLSI architectures that provide high-accuracy systems and approximate architectures that offer high computational gain, making them highly attractive for real-world applications that are subject to constraints in both chip area and power. The methods discussed in the paper can be used in the design of emerging low-power digital systems having lowest complexity at the cost of a loss in accuracy: the optimal trade-off of computational accuracy for lowest possible complexity and power. A complete synopsis of available techniques, algorithms, and FPGA/VLSI realizations is given in the paper.

IEEE Transactions on Circuits and Systems for Video Technology | 2013

A Single-Channel Architecture for Algebraic Integer-Based 8 × 8 2-D DCT Computation

Amila Edirisuriya; Arjuna Madanayake; Renato J. Cintra; Vassil S. Dimitrov; Nilanka T. Rajapaksha

An area-efficient row-parallel architecture is proposed for the real-time implementation of bivariate algebraic integer (AI) encoded 2-D discrete cosine transform (DCT) for image and video processing. The proposed architecture computes the 8 × 8 2-D DCT based on the Arai DCT algorithm. An improved fast algorithm for AI-based 1-D DCT computation is proposed along with a single-channel 2-D DCT architecture. The design improves on the four-channel AI DCT architecture that was published recently by reducing the number of integer channels to one and the number of eight-point 1-D DCT cores from five down to two. The architecture offers exact computation of 8 × 8 blocks of the 2-D DCT coefficients up to the final reconstruction step (FRS), which converts the coefficients from the AI representation to fixed-point format using the method of expansion factors. Prototype circuits corresponding to FRS blocks based on two expansion factors are realized, tested, and verified on an FPGA chip, using a Xilinx Virtex-6 XC6VLX240T device. Post place-and-route results show a 20% reduction in area compared to the 2-D DCT architecture requiring five 1-D AI cores. The area-time and area-time^2 complexity metrics are also reduced by 23% and 22%, respectively, for designs with eight-bit input word length. The digital realizations are simulated up to place and route for ASICs using 45 nm CMOS standard cells. The maximum estimated clock rate is 951 MHz for the CMOS realizations, indicating 7.608·10^9 pixels/s and an 8 × 8 block rate of 118.875 MHz.

Journal of Control Science and Engineering | 2013

Nilanka T. Rajapaksha; Amila Edirisuriya; Arjuna Madanayake; Renato J. Cintra; Denis Onen; Ihab Amer; Vassil S. Dimitrov

Transformation and quantization play a critical role in video codecs. Recently proposed algebraic integer (AI)-based discrete cosine transform (DCT) algorithms are analyzed in the presence of quantization, using the High Efficiency Video Coding (HEVC) standard. AI DCT is implemented and tested on asynchronous quasi delay-insensitive logic, using an Achronix SPD60 field-programmable gate array (FPGA), which leads to lower complexity, higher speed of operation, and insensitivity to process-voltage-temperature variations. Performance of AI DCT with HEVC is measured in terms of the accuracy of the transform coefficients and the overall rate-distortion (R-D) characteristics, using HM 7.1 reference software. Results indicate a 31% improvement over the integer DCT in the number of transform coefficients having error within 1%. The performance of the 65 nm asynchronous hardware in terms of speed of operation is investigated and compared with the 65 nm synchronous Xilinx FPGA. Considering word lengths of 5 and 6 bits, a speed increase of 230% and 199% is observed, respectively. These results indicate that AI DCT can potentially be utilized in HEVC for applications demanding high accuracy as well as high throughput. However, novel quantization schemes are required to take advantage of the accuracy improvements obtained.

international midwest symposium on circuits and systems | 2011

Amila Edirisuriya; Arjuna Madanayake; Jithra Adikari; Vassil S. Dimitrov

Novel multiple-radix architectures for 7N × 7N integer multiplications, where N = 2^k and k is a non-negative integer, are presented, based on recent developments involving multiple-radix representation of numbers. A hardware implementation for a 7 × 7 bit multiple-radix multiplier is provided, followed by larger multiplier architectures that employ the 7 × 7 architecture as a building block based on the Karatsuba algorithm. The methodology employed for prototyping the multiplier circuits using Xilinx FPGA devices is described. Measured results in terms of speed and area complexity from on-chip physical FPGA realizations are provided. This recursive architecture provides a new method for building multiple-radix parallel hardware multipliers operating on large integer multiplicands, with potential future applications in areas such as computational number theory, digital arithmetic, and computer security.
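The recursive construction described above can be illustrated in software. The following is a minimal sketch, assuming operand widths of 7·2^k bits and using Python's built-in multiplication as a stand-in for the 7 × 7 multiple-radix hardware block, whose internals the abstract does not give. It uses the subtractive form of the Karatsuba algorithm so that every sub-product stays within the half-width operand size; whether the paper uses this variant is not stated. The names `karatsuba` and `BASE_BITS` are illustrative.

```python
BASE_BITS = 7  # width of the building-block multiplier

def karatsuba(x, y, bits):
    """Multiply two non-negative integers of at most `bits` bits.

    `bits` must equal 7 * 2**k. The base case stands in for the 7x7
    hardware multiplier (an assumption; its internals are not modeled).
    """
    if x < 0 or y < 0 or x >> bits or y >> bits:
        raise ValueError("operands must be non-negative and fit in `bits` bits")
    if bits == BASE_BITS:
        return x * y
    if bits < BASE_BITS or bits % 2:
        raise ValueError("bits must be 7 * 2**k")

    half = bits // 2
    mask = (1 << half) - 1
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask

    hi = karatsuba(x_hi, y_hi, half)
    lo = karatsuba(x_lo, y_lo, half)

    # Subtractive variant: |x_hi - x_lo| and |y_lo - y_hi| fit in `half` bits.
    dx, dy = x_hi - x_lo, y_lo - y_hi
    mid = karatsuba(abs(dx), abs(dy), half)
    if (dx < 0) != (dy < 0):
        mid = -mid

    # x_hi*y_lo + x_lo*y_hi = (x_hi - x_lo)*(y_lo - y_hi) + hi + lo
    cross = hi + lo + mid
    return (hi << bits) + (cross << half) + lo

product = karatsuba(123456789, 98765432, 28)  # 28 = 7 * 2**2
```

Each level of recursion replaces one full-width multiplication with three half-width ones plus additions and shifts, which is what lets a small fixed multiplier serve as the building block for larger operands.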

Journal of Circuits, Systems, and Computers | 2017

Renato J. Cintra; Fábio M. Bayer; Arjuna Madanayake; Uma Potluri; Amila Edirisuriya

Multiplier-free fast algorithms are derived and analyzed for realizing the 8-point discrete sine transforms of type II and type VII (DST-II and DST-VII), with applications in image and video compression. A new fast algorithm is identified using numerical search methods for approximating DST-VII without employing multipliers. In addition, recently proposed fast algorithms for approximating the 8-point DCT-II are now extended to approximate DST-II. All proposed approximations for DST-II and DST-VII are compared with ideal transforms, and circuit complexity is measured using FPGA-based rapid prototypes on a 90 nm Xilinx Virtex-4 device. The proposed architectures find applications in emerging video processing standards such as H.265/HEVC.

ieee convention of electrical and electronics engineers in israel | 2012

Nilanka T. Rajapaksha; Arjuna Madanayake; Kye-Shin Lee; Amila Edirisuriya; Len T. Bruton; Leonid Belostotski; Yongsheng Xu

A method is proposed for realizing analog active filters at microwave frequencies that avoids the use of spiral inductors and that can be used in the front-end of radio receivers. The proposed analog filters are based on doubly (resistively) terminated, passive, LC-ladder prototypes and wave-digital filter structures. It is expected that the proposed “wave-discrete” filters will carry to the analog domain the excellent sensitivity and stability properties of the wave-digital filters and have low sensitivity to process, voltage, and temperature variations. A building-block circuit is simulated in Cadence SPICE with BSIM4 models in a 65 nm CMOS process, and frequency response, time-domain response, and IIP3 results are given. The simulation serves as an encouraging proof of concept towards future RF realization of wave-discrete analog IC filters for RF-FPGA, software-defined, cognitive, and reconfigurable radio front-ends.
