Eddie Hung | Researchain

Publication

Featured research published by Eddie Hung.

field-programmable gate arrays | 2013

Towards simulator-like observability for FPGAs: a virtual overlay network for trace-buffers

Eddie Hung; Steven J. E. Wilton

The rising complexity of verification has led to an increase in the use of FPGA prototyping, which can run at significantly higher operating frequencies and achieve much higher coverage than logic simulations. However, a key challenge is observability into these devices, which can be addressed by embedding trace-buffers to record on-chip signal values. Rather than connecting a predetermined subset of circuit signals to dedicated trace-buffer inputs at compile-time, in this work we propose building a virtual overlay network that multiplexes all on-chip signals to all on-chip trace-buffers. Subsequently, at debug-time, the designer can choose a signal subset for observation. To minimize its overhead, we build this network out of unused routing multiplexers, and by using optimal bipartite graph matching techniques, we show that any subset of on-chip signals can be connected to 80-90% of the maximum trace-buffer capacity in less than 50 seconds.
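
The abstract does not describe the overlay network or the matching algorithm in detail, so the following is only a minimal sketch of the general idea of bipartite matching: assigning selected signals to trace-buffer inputs when each signal can reach only some inputs. It uses the standard augmenting-path method; the function name and the `reach` table are hypothetical and are not taken from the paper.

```python
def max_bipartite_matching(candidates, num_inputs):
    """Match signals to trace-buffer inputs (augmenting-path method).

    candidates[s] lists the trace-buffer inputs that signal s can reach
    (a hypothetical reachability table, not data from the paper).
    Returns (number of matched signals, owner), where owner[t] is the
    signal assigned to input t, or None if the input is unused.
    """
    owner = [None] * num_inputs

    def try_assign(signal, visited):
        # Try each reachable input; if it is taken, try to move its
        # current owner to another input first.
        for t in candidates[signal]:
            if t in visited:
                continue
            visited.add(t)
            if owner[t] is None or try_assign(owner[t], visited):
                owner[t] = signal
                return True
        return False

    matched = sum(try_assign(s, set()) for s in range(len(candidates)))
    return matched, owner


# Four signals competing for three trace-buffer inputs (illustrative only).
reach = [[0, 1], [0], [1, 2], [2]]
matched, owner = max_bipartite_matching(reach, 3)
print(matched, owner)
```

In this toy instance only three of the four signals can be traced at once, since there are only three inputs.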

field-programmable technology | 2009

A detailed delay path model for FPGAs

Eddie Hung; Steven J. E. Wilton; Haile Yu; Thomas C. P. Chau; Philip Heng Wai Leong

A complete circuit-level description of a representative FPGA is presented in this paper, from which a simple RC delay model as a function of architectural and technology parameters is derived. Using this model, the expression for the optimal delay of any path through the FPGA can be formulated. We distill our model into a purely architecture-dependent form and use it to gain new insight into how FPGA parameters directly affect delay. Several applications of this model are: (1) to gain better intuition of how architecture and process parameters affect the delay path in an FPGA, (2) for initial studies into new circuit designs and integrated circuit technologies, (3) in CAD tools for optimisation and sensitivity analysis. The technique described can be applied to arbitrary circuits, and simulations show that our closed-form equations give delay values that are accurate to approximately 10% when compared to HSPICE simulation.
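
The paper's closed-form equations are not reproduced in this abstract. As a rough illustration of what a simple RC delay model looks like, the sketch below computes the Elmore delay of an RC ladder, a standard first-order approximation used here as a stand-in. It is an assumption that this resembles the paper's model; the function name and component values are hypothetical.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder driven from one end.

    resistances[i] is the series resistance of stage i (ohms) and
    capacitances[i] is the capacitance to ground at the node after
    that resistor (farads). Each capacitor contributes its value
    times the total resistance between it and the driver.
    """
    total = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r
        total += upstream_r * c
    return total


# Two-stage ladder with made-up values: 1 kOhm / 1 fF, then 2 kOhm / 2 fF.
delay = elmore_delay([1e3, 2e3], [1e-15, 2e-15])
print(delay)  # about 7e-12 seconds
```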

IEEE Transactions on Very Large Scale Integration Systems | 2014

Incremental Trace-Buffer Insertion for FPGA Debug

Eddie Hung; Steven J. E. Wilton

As integrated circuits encapsulate more functionality and complexity, verifying that these devices operate correctly under all scenarios is an increasingly difficult task. Rather than using traditional verification techniques such as software simulation, more and more designers are taking advantage of the significantly higher clock speeds that can be achieved by using field-programmable gate-array (FPGA)-based prototypes. A key challenge to these prototypes is the lack of on-chip observability during debugging; one popular solution is to insert trace-buffers into the design to record a limited set of internal signals, but modifying this trace configuration often requires the entire circuit to be recompiled. In this paper, we propose fully preserving the original circuit mapping and using incremental techniques to eliminate the need for a full recompilation, thereby accelerating the debugging process. We exploit two opportunities available during trace-insertion: the ability to connect from any point of a signal to any trace-pin, and the internal symmetry of the FPGA architecture. With these, we find that incremental trace-insertion can be 98 times faster than a full recompilation, return a routing solution with a shorter wirelength, and have a negligible effect on the critical-path delay of the original circuit when reclaiming 75% of the leftover memory capacity for tracing.

IEEE Transactions on Very Large Scale Integration Systems | 2013

Scalable Signal Selection for Post-Silicon Debug

Eddie Hung; Steven J. E. Wilton

As modern integrated circuits increase in size and complexity, more and more verification effort is necessary to ensure their error-free operation. This has motivated designers to apply post-silicon debugging techniques to their designs, such as by embedding trace instrumentation within. However, a key drawback to this approach is that only a small subset of a chip's internal signals can be traced, and the most effective signals to observe must be selected before fabrication and before the nature of any errors is known. This paper explores the tradeoff between the scalability of automated signal selection algorithms and the amount of circuit observability that they offer. Three selection methods are presented: a technique that optimizes for observability directly; a method based on the graph-centrality of the circuit's connectivity; and a hybrid technique that combines both algorithms through exploiting the circuit hierarchy. To quantify the observability of each technique, we define the debug difficulty metric to measure how accurately the traced data can be used to resolve a circuit's state behavior. Although we find that the graph-based method offers the least observability of the three algorithms, it was the only method that could be applied to our largest benchmark of over 50 000 flip-flops, computing a selection in less than 90 s. Last, we present a novel application that can only be enabled by these scalable algorithms: speculative debug insertion for field-programmable gate arrays.

field-programmable logic and applications | 2011

Speculative Debug Insertion for FPGAs

Eddie Hung; Steven J. E. Wilton

FPGA prototypes have become an increasingly important part of the overall integrated circuit design and verification flow, providing the ability to test an integrated circuit running at (near) speed with realistic inputs and outputs. When unexpected behaviour is observed in the prototype, it is necessary to determine the source of this behaviour; this usually requires observing signals that are internal to one of the devices in the prototype. Tools currently exist to enable FPGAs to be instrumented, but these are normally used in a reactive manner; that is, instrumentation is only added after incorrect behaviour has been observed. In this paper, we propose speculative debug insertion, in which a tool automatically predicts which signals will be useful during debug, and instruments the design during the first compilation. If done correctly, this can significantly accelerate the debug process, especially for large prototypes containing many FPGAs. However, it is important that this does not negatively affect the performance, capacity, power, or compilation time. We show that speculative debug insertion is possible, and experimentally evaluate the limits to speculative insertion.

field-programmable logic and applications | 2012

Limitations of incremental signal-tracing for FPGA debug

Eddie Hung; Steven J. E. Wilton

Developing state-of-the-art custom silicon can be a prohibitively expensive and risky undertaking, due in no small part to the need to perform thorough design verification. Field-Programmable Gate-Arrays offer a flexible platform for constructing prototypes to aid in this verification, but unlike software simulation, observability into these prototypes is a major challenge. Designers can choose to insert trace-instrumentation to enhance on-chip observability, but doing so often requires re-compiling the entire design for each new trace configuration. This work presents two contributions: to explore the limitations of incremental-synthesis for trace-buffer insertion, and to propose CAD optimizations exclusive to this application for improving runtime and routability. We find that 99.4% of all used cluster outputs (driving both combinational and sequential circuit signals) can be incrementally traced to 75% of the free memory-capacity on an FPGA, an order of magnitude quicker than the original compilation and with a nominal impact on circuit delay, for a 20% minimum channel width (10% area) increase.

field-programmable logic and applications | 2008

A configurable and programmable motion estimation processor for the H.264 video codec

Jose Luis Nunez-Yanez; Eddie Hung; Vassilios A. Chouliaras

This work presents a programmable, configurable motion estimation processor for the H.264 video coding standard, capable of handling the processing requirements of high definition (HD) video and suitable for FPGA implementation. The programmable aspect of the processor follows the ASIP (application specific instruction set processor) approach with an instruction set targeted at accelerating block matching motion estimation algorithms. Configurability relates to the ability to optimize the microarchitecture for the selected algorithm and performance requirements through varying the number and type of execution units at compile time.

IEEE Transactions on Very Large Scale Integration Systems | 2012

Cogeneration of Fast Motion Estimation Processors and Algorithms for Advanced Video Coding

Jose Luis Nunez-Yanez; Atukem Nabina; Eddie Hung; George Vafiadis

This paper presents a flexible and scalable motion estimation processor capable of supporting the processing requirements for high-definition (HD) video using the H.264 Advanced Video Codec, which is suited for FPGA implementation. Unlike most previous work, our core is optimized to execute all existing fast block matching algorithms, which we show to match or exceed the inter-frame prediction performance of traditional full-search approaches at the HD resolutions commonly in use today. Using our development tools, such algorithms can be described using a C-style syntax which is compiled into our custom instruction set. We show that different HD sequences exhibit different characteristics which necessitate a flexible and configurable solution when targeting embedded applications. This is supported in our core and toolset by allowing designers to modify the number of functional units to be instantiated. All processor instances remain binary compatible so recompilation of the motion estimation algorithm is not required. Due to this optimization process, it is possible to match the processing requirements of the selected motion estimation algorithm to the hardware microarchitecture leading to a very efficient implementation.
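
For readers unfamiliar with block matching, the sketch below shows the traditional full-search baseline that the abstract compares against: every displacement in a search window is scored with the sum of absolute differences (SAD), and the lowest-cost motion vector wins. This is a generic software illustration, not the processor's instruction set or any algorithm from the paper; the function names and the test frames are hypothetical.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of the current
    frame at (bx, by) and the reference block displaced by (dx, dy)."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def full_search(cur, ref, bx, by, n, search):
    """Exhaustively test every displacement within +/- search pixels and
    return (best_cost, dx, dy). Displacements that would read outside
    the reference frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if not (0 <= by + dy and by + dy + n <= height
                    and 0 <= bx + dx and bx + dx + n <= width):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


# Synthetic 8x8 reference frame with unique pixel values, and a current
# frame whose content is the reference shifted by (dx=2, dy=1).
ref_frame = [[y * 8 + x for x in range(8)] for y in range(8)]
cur_frame = [
    [ref_frame[y + 1][x + 2] if y + 1 < 8 and x + 2 < 8 else 0
     for x in range(8)]
    for y in range(8)
]
best_match = full_search(cur_frame, ref_frame, bx=2, by=2, n=2, search=2)
print(best_match)
```

Fast block matching algorithms reduce cost by evaluating only a subset of these displacements, which is the trade-off the abstract refers to.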

field-programmable custom computing machines | 2015

Accelerating SpMV on FPGAs by Compressing Nonzero Values

Paul Grigoras; Pavel Burovskiy; Eddie Hung; Wayne Luk

Sparse matrix-vector multiplication (SpMV) is an important kernel in many areas of scientific computing, especially as a building block for iterative linear system solvers. We study how lossless nonzero compression can be used to overcome memory bandwidth limitations in FPGA-based SpMV implementations. We introduce a dictionary-based compression algorithm which reduces redundant nonzero values to improve memory bandwidth without reducing computation efficiency by making use of spare FPGA resources. We show how a sparse matrix in the CSR format can be converted to the proposed storage format on the CPU and that average compression ratios of 1.14 to 1.40, and up to 2.65 times, can be achieved over CSR for relevant matrices in our benchmarks.
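
The abstract does not specify the proposed storage format, so the sketch below only illustrates the general idea under simple assumptions: a matrix is converted to CSR, repeated nonzero values are replaced by indices into a dictionary of distinct values, and SpMV decodes values through that dictionary. All function names are hypothetical, and a real implementation would also pack the codes into fewer bits than the original values.

```python
def dense_to_csr(matrix):
    """Convert a dense row-major matrix to CSR (values, col_idx, row_ptr)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def dictionary_encode(values):
    """Replace each nonzero with an index into a list of distinct values."""
    dictionary, codes, index_of = [], [], {}
    for v in values:
        if v not in index_of:
            index_of[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index_of[v])
    return dictionary, codes


def spmv_encoded(dictionary, codes, col_idx, row_ptr, x):
    """Compute y = A @ x, decoding each nonzero through the dictionary."""
    y = []
    for r in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[r], row_ptr[r + 1]):
            acc += dictionary[codes[k]] * x[col_idx[k]]
        y.append(acc)
    return y


# Small matrix with only two distinct nonzero values (illustrative only).
A = [[2.0, 0.0, 0.0],
     [0.0, 2.0, 0.5],
     [0.5, 0.0, 2.0]]
values, col_idx, row_ptr = dense_to_csr(A)
dictionary, codes = dictionary_encode(values)
y = spmv_encoded(dictionary, codes, col_idx, row_ptr, [1.0, 2.0, 3.0])
print(dictionary, codes, y)
```

Here five stored nonzeros collapse to a two-entry dictionary, so each value could in principle be represented by a one-bit code instead of a full floating-point word.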

field-programmable logic and applications | 2014

Transparent insertion of latency-oblivious logic onto FPGAs

Eddie Hung; Tim Todman; Wayne Luk

We present an approach for inserting latency-oblivious functionality into pre-existing FPGA circuits transparently. To ensure transparency, meaning that such modifications do not affect the design's maximum clock frequency, we insert any additional logic post place-and-route, using only the spare resources that were not consumed by the pre-existing circuit. The typical challenge with adding new functionality into existing circuits incrementally is that spare FPGA resources to host this functionality must be located close to the input signals that it requires, in order to minimise the impact of routing delays. In congested designs, however, such co-location is often not possible. We overcome this challenge by using flow techniques to pipeline and route signals from where they originate, potentially in a region of high resource congestion, into a region of low congestion capable of hosting new circuitry, at the expense of latency. We demonstrate and evaluate our approach by augmenting realistic designs with self-monitoring circuitry, which is not sensitive to latency. We report results on circuits operating over 200 MHz and show that our insertions have no impact on timing, are 2-4 times faster than compile-time insertion, and incur only a small power overhead.