Andrew Kinane
Dublin City University
Publication
Featured research published by Andrew Kinane.
international symposium on neural networks | 2006
Daniel Larkin; Andrew Kinane; Valentin Muresan; Noel E. O’Connor
This paper proposes an efficient hardware architecture for a function generator suitable for an artificial neural network (ANN). A spline-based approximation function is designed that provides a good trade-off between accuracy and silicon area, whilst also being inherently scalable and adaptable for numerous activation functions. This has been achieved by using a minimax polynomial and through optimal placement of the approximating polynomials based on the results of a genetic algorithm. The approximation error of the proposed method compares favourably to all related research in this field. Efficient hardware multiplication circuitry is used in the implementation, which reduces the area overhead and increases the throughput.
visual communications and image processing | 2005
Andrew Kinane; Valentin Muresan; Noel E. O'Connor
The explosive growth of the mobile multimedia industry has accentuated the need for efficient VLSI implementations of the associated computationally demanding signal processing algorithms. This need becomes greater as end-users demand increasingly enhanced features and more advanced underpinning video analysis. One such feature is object-based video processing as supported by MPEG-4 core profile, which allows content-based interactivity. MPEG-4 has many computationally demanding underlying algorithms, an example of which is the Shape Adaptive Discrete Cosine Transform (SA-DCT). The dynamic nature of the SA-DCT processing steps poses significant VLSI implementation challenges, and many of the previously proposed approaches use area- and power-hungry multipliers. Most also ignore the subtleties of the packing steps and manipulation of the shape information. We propose a new multiplier-less serial datapath based solely on adders and multiplexers to improve area and power. The adder cost is minimised by employing resource re-use methods. The number of (physical) adders used has been derived using a common sub-expression elimination algorithm. Additional energy efficiency is factored into the design by employing guarded evaluation and local clock gating. Our design implements the SA-DCT packing with minimal switching using efficient addressing logic with a transpose memory RAM. The entire design has been synthesized using TSMC 0.09μm TCBN90LP technology, yielding a gate count of 12028 for the datapath and its control logic.
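As a rough software illustration of what "multiplier-less" means here, the sketch below multiplies by a constant using only shifts, additions and subtractions, after recoding the constant into canonical signed-digit (CSD) form. It is not the datapath described in the abstract and does not perform common sub-expression elimination; the function names `csd_digits` and `shift_add_multiply` are hypothetical and chosen only for this example.

```python
def csd_digits(c: int) -> list[int]:
    """Canonical signed-digit form of a non-negative integer constant.

    Digits are returned least-significant first and lie in {-1, 0, 1};
    no two adjacent digits are non-zero.
    """
    if c < 0:
        raise ValueError("this sketch only handles non-negative constants")
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d           # clears the low bit (and may carry upwards)
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def shift_add_multiply(x: int, c: int) -> int:
    """Compute x * c with shifts, adds and subtracts only (illustrative)."""
    acc = 0
    for shift, d in enumerate(csd_digits(c)):
        if d == 1:
            acc += x << shift   # one adder in a hardware realisation
        elif d == -1:
            acc -= x << shift   # one subtractor in a hardware realisation
    return acc
```

For example, the constant 7 is recoded as 8 - 1, so a single subtraction replaces the two additions that the plain binary form 4 + 2 + 1 would need.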
power and timing modeling optimization and simulation | 2004
Andrew Kinane; Valentin Muresan; Noel E. O'Connor; Noel Murphy; Seán Marlow
This paper proposes an energy-efficient hardware acceleration architecture for the variable N-point 1D Discrete Cosine Transform (DCT) that can be leveraged when implementing MPEG-4’s Shape Adaptive DCT (SA-DCT) tool. The SA-DCT algorithm was originally formulated in response to the MPEG-4 requirement for object-based texture coding, and is one of the most computationally demanding blocks in an MPEG-4 video codec. Energy-efficient implementations are therefore important, especially on battery-powered wireless platforms. This N-point 1D DCT architecture employs a re-configurable distributed arithmetic data path and clock gating to reduce power consumption.
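For readers unfamiliar with the "variable N-point" transform, the following is a minimal floating-point reference of an orthonormal N-point DCT-II whose length is taken from the input. It is a software sketch only: it assumes the orthonormal scaling convention and does not model the distributed arithmetic data path or clock gating mentioned in the abstract. The function name `dct_n` is hypothetical.

```python
import math


def dct_n(samples: list[float]) -> list[float]:
    """Orthonormal N-point DCT-II, where N is simply len(samples).

    In a shape-adaptive setting N would vary per row or column with the
    number of pixels inside the object; here it is just the input length.
    """
    n = len(samples)
    coeffs = []
    for k in range(n):
        # Orthonormal scaling: the DC term uses sqrt(1/N), others sqrt(2/N).
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(
            samples[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)
        )
        coeffs.append(scale * total)
    return coeffs
```

Calling `dct_n` on vectors of length 1 to 8 exercises the same routine at every transform size, which is the property a variable N-point hardware unit has to provide with a single shared data path.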
field-programmable technology | 2005
Andrew Kinane; Alan Casey; Valentin Muresan; Noel E. O'Connor
Two FPGA implementations of a shape adaptive discrete cosine transform (SA-DCT) accelerator are presented in this paper: one PCI-based and the other AMBA-based. The former is used for conformance testing with the MPEG-4 standard requirements. The latter is an alternative platform for system prototyping and has an architecture more representative of a mobile device. The proposed accelerator meets real-time constraints on both platforms with a gate count of approximately 40k, and outperforms the optimised reference software implementation by 20 times. It is estimated that the accelerator consumes 250mW on a Virtex-E FPGA and 79mW on a Virtex-II FPGA in the worst-case scenario.
workshop on image analysis for multimedia interactive services | 2003
Noel E. O'Connor; Valentin Muresan; Andrew Kinane; Daniel Larkin; Seán Marlow; Noel Murphy
This paper presents a brief overview of past and current hardware acceleration (HwA) approaches that have been proposed for the most computationally intensive compression tools of the MPEG-4 standard. These approaches are classified based on their historical evolution and architectural approach. An analysis of both evolutionary and functional classifications is carried out in order to speculate on the possible trends of the HwA architectures to be employed in mobile video platforms.
international symposium on circuits and systems | 2006
Andrew Kinane; Valentin Muresan; Noel E. O'Connor
The efficient design of multiplierless implementations of constant matrix multipliers is challenged by the huge solution search spaces even for small-scale problems. Previous approaches tend to use hill-climbing algorithms, risking sub-optimal results. The proposed algorithm avoids this by exploring parallel solutions. The computational complexity is tackled by modelling the problem in a format amenable to genetic programming and hardware acceleration. Results show an improvement on state-of-the-art algorithms with future potential for even greater savings.
Lecture Notes in Computer Science | 2006
Andrew Kinane; Valentin Muresan; Noel E. O’Connor
The efficient design of multiplierless implementations of constant matrix multipliers is challenged by the huge solution search spaces even for small scale problems. Previous approaches tend to use hill-climbing algorithms risking sub-optimal results. The three-stage algorithm proposed in this paper partitions the global constant matrix multiplier into its constituent dot products, and all possible solutions are derived for each dot product in the first two stages. The third stage leverages the effective search capability of genetic programming to search for global solutions created by combining dot product partial solutions. A bonus feature of the algorithm is that the modelling is amenable to hardware acceleration. Another bonus feature is a search space reduction early exit mechanism, made possible by the way the algorithm is modelled. Results show an improvement on state of the art algorithms with future potential for even greater savings.
international conference on neural information processing | 2006
Daniel Larkin; Andrew Kinane; Noel E. O'Connor
This paper addresses the problem of accelerating large artificial neural networks (ANN), whose topology and weights can evolve via the use of a genetic algorithm. The proposed digital hardware architecture is capable of processing any evolved network topology, whilst at the same time providing a good trade-off between throughput, area and power consumption. The latter is vital for a longer battery life on mobile devices. The architecture uses multiple parallel arithmetic units in each processing element (PE). Memory partitioning and data caching are used to minimise the effects of PE pipeline stalling. A first-order minimax polynomial approximation scheme, tuned via a genetic algorithm, is used for the activation function generator. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry-save adders, is adopted throughout the design.
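The modified Booth recoding mentioned above can be sketched in software as follows. This is the textbook radix-4 scheme, shown only to illustrate how it halves the number of partial products; it is an assumption that it resembles the circuitry in the paper, and it does not model column compressors or carry-save adders. The names `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
def booth_radix4_digits(y: int, bits: int) -> list[int]:
    """Radix-4 (modified) Booth digits of a two's-complement integer.

    `bits` is the operand width and must be even. Digits are returned
    least-significant first and lie in {-2, -1, 0, 1, 2}.
    """
    if bits <= 0 or bits % 2 != 0:
        raise ValueError("bits must be a positive even number")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("y does not fit in the given width")
    pattern = y & ((1 << bits) - 1)  # two's-complement bit pattern of y
    digits = []
    prev = 0  # implicit bit to the right of the least-significant bit
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int = 8) -> int:
    """Multiply x by y using bits/2 Booth-recoded partial products."""
    return sum(
        (d * x) << (2 * i)  # d * x is 0, +/-x or +/-2x: a shift and a negate
        for i, d in enumerate(booth_radix4_digits(y, bits))
    )
```

An 8-bit multiplier therefore needs four partial products instead of eight, each of which is a shifted and possibly negated copy of `x`.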
Eurasip Journal on Embedded Systems | 2007
Andrew Kinane; Daniel Larkin; Noel E. O'Connor
We propose novel hardware accelerator architectures for the most computationally demanding algorithms of the MPEG-4 video compression standard: motion estimation, binary motion estimation (for shape coding), and the forward/inverse discrete cosine transforms (incorporating shape adaptive modes). These accelerators have been designed using general low-energy design philosophies at the algorithmic/architectural abstraction levels. The themes of these philosophies are avoiding waste and trading area/performance for power and energy gains. Each core has been synthesised targeting TSMC 0.09 μm TCBN90LP technology, and the experimental results presented in this paper show that the proposed cores improve upon the prior art.
Lecture Notes in Computer Science | 2006
Daniel Larkin; Andrew Kinane; Noel E. O'Connor