Publication


Featured research published by Thomas C. P. Chau.


field-programmable technology | 2009

A detailed delay path model for FPGAs

Eddie Hung; Steven J. E. Wilton; Haile Yu; Thomas C. P. Chau; Philip Heng Wai Leong

A complete circuit-level description of a representative FPGA is presented in this paper, from which a simple RC delay model as a function of architectural and technology parameters is derived. Using this model, the expression for the optimal delay of any path through the FPGA can be formulated. We distill the model into a purely architecture-dependent form and use it to gain new insight into how FPGA parameters directly affect delay. Applications of this model include: (1) gaining better intuition of how architecture and process parameters affect the delay path in an FPGA, (2) initial studies into new circuit designs and integrated circuit technologies, and (3) optimisation and sensitivity analysis in CAD tools. The technique described can be applied to arbitrary circuits, and simulations show that our closed-form equations give delay values that are accurate to within approximately 10% when compared to HSPICE simulation.
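
To make the idea of a closed-form RC delay estimate concrete, the following is a minimal sketch of the classic Elmore delay for an RC ladder. It is an illustration only and does not reproduce the paper's model; the function name `elmore_delay` and the example values are hypothetical.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay to the far end of an RC ladder.

    resistances[i] is the series resistance of stage i (ohms) and
    capacitances[i] is the capacitance to ground at the node after
    that resistor (farads). The delay is the sum, over all nodes, of
    the node capacitance times the total resistance upstream of it.
    """
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per resistance")
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r           # total resistance from the driver to this node
        delay += upstream_r * c   # this node's contribution to the delay
    return delay


# Hypothetical two-stage path: 1 ohm into 3 F, then 2 ohm into 4 F.
example = elmore_delay([1.0, 2.0], [3.0, 4.0])
```

Because the result is a plain sum of products, the sensitivity of the delay to any single resistance or capacitance can be read off directly, which is the kind of property that makes closed-form models convenient for early architectural studies.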


applied reconfigurable computing | 2013

Heterogeneous reconfigurable system for adaptive particle filters in real-time applications

Thomas C. P. Chau; Xinyu Niu; Alison Eele; Wayne Luk; Peter Y. K. Cheung; Jan M. Maciejowski

This paper presents a heterogeneous reconfigurable system for real-time applications that use particle filters. The system consists of an FPGA and a multi-threaded CPU. We propose a method to adapt the number of particles dynamically and utilise the run-time reconfigurability of the FPGA for reduced power and energy consumption. An application is developed which involves simultaneous mobile robot localisation and people tracking. It shows that the proposed adaptive particle filter can reduce computation time by up to 99%. Using run-time reconfiguration, we achieve a 34% reduction in idle power and save 26-34% of system energy. Our proposed system is up to 7.39 times faster and 3.65 times more energy efficient than an Intel Xeon X5650 CPU with 12 threads, and 1.3 times faster and 2.13 times more energy efficient than an NVIDIA Tesla C2070 GPU.


field programmable gate arrays | 2009

A comparison of via-programmable gate array logic cell circuits

Thomas C. P. Chau; Philip Heng Wai Leong; Sam M. H. Ho; Brian P. W. Chan; Steve C. L. Yuen; Kong-Pang Pun; Oliver C. S. Choy; Xinan Wang

Via-programmable gate arrays (VPGAs) offer a middle ground between application specific integrated circuits and field programmable gate arrays in terms of flexibility, manufacturing cost, speed, power and area. In this paper, we present a novel VPGA logic cell, the complementary universal logic gate (CULG), which can be used to implement both sequential and combinatorial elements. Its performance is compared with a number of other designs including transmission gate, differential cascode voltage switch with pass gate, and standard cell. The CULG is found to have comparable power-delay product and process variation sensitivity to the other designs while offering the lowest power consumption.


field-programmable custom computing machines | 2013

Automating Elimination of Idle Functions by Run-Time Reconfiguration

Xinyu Niu; Thomas C. P. Chau; Qiwei Jin; Wayne Luk; Qiang Liu

A design approach is proposed to automatically identify and exploit run-time reconfiguration opportunities while optimising resource utilisation. We introduce Reconfiguration Data Flow Graph, a hierarchical graph structure enabling reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and run-time solution generation. Three applications, based on barrier option pricing, particle filter, and reverse time migration are used in evaluating the proposed approach. The run-time solutions approximate the theoretical performance by eliminating idle functions, and are 1.31 to 2.19 times faster than optimised static designs. FPGA designs developed with the proposed approach are up to 28.8 times faster than optimised CPU reference designs and 1.55 times faster than optimised GPU designs.


conference on decision and control | 2013

Parallelisation of Sequential Monte Carlo for real-time control in air traffic management

Alison Eele; Jan M. Maciejowski; Thomas C. P. Chau; Wayne Luk

This paper presents the parallelisation of a Sequential Monte Carlo algorithm, and the associated changes required when applied to the problem of conflict resolution and aircraft trajectory control in air traffic management. The target problem is non-linear, constrained, non-convex and multi-agent. The new method is shown to have a 98.5% computational time saving over that of a previous sequential implementation, with no degradation in path quality. The computation saving is enough to allow real-time implementation.


IEEE Transactions on Very Large Scale Integration Systems | 2013

Architecture and Design Flow for a Highly Efficient Structured ASIC

Man-Ho Ho; Yanqing Ai; Thomas C. P. Chau; Steve C. L. Yuen; Chiu-Sing Choy; Philip Heng Wai Leong; Kong-Pang Pun

As fabrication process technology continues to advance, mask set costs have become prohibitively expensive. Structured application specific integrated circuits (sASICs) offer a middle ground in price and performance between ASICs and field-programmable gate arrays (FPGAs) by sharing masks across different designs. In this paper, two sASIC architectures are proposed, the first being based on three-input lookup-tables, and the second on AOI22 gates. The sASICs are programmed using a standard-cell compatible design flow. They are customized using a minimum of three masks, i.e., two metals and one via. The area and delay of the sASIC are compared with ASICs and FPGAs. Results over a set of benchmark circuits show that our AOI22-based sASIC had an average of 1.76x/1.41x increase in area/delay compared to ASICs, a considerable improvement compared with the 26.56x/5.09x increase for FPGAs. This is, to the best of our knowledge, the best performance reported in the literature for a practical sASIC. A prototype using the sASIC was fabricated using a UMC 0.13-μm mixed-mode/RF process. It was fully verified using scan and functional tests, and used in a demonstration system.


ACM Sigarch Computer Architecture News | 2013

Accelerating sequential Monte Carlo method for real-time air traffic management

Thomas C. P. Chau; James Stanley Targett; Marlon Wijeyasinghe; Wayne Luk; Peter Y. K. Cheung; Benjamin Cope; Alison Eele; Jan M. Maciejowski

This paper presents how field-programmable gate arrays (FPGAs) are used to accelerate the Sequential Monte Carlo method for air traffic management. A novel data structure is introduced for a particle stream that enables efficient evaluation of constraints and weights. A parallel implementation for this streaming data structure is designed, and an analytical model is provided for estimating the performance and resource usage of our implementation. We compare our design to implementations on CPU and GPU. We show a 9.3 times speed-up and an 89 times improvement in energy efficiency over an Intel Core i7-950 CPU with 8 threads, and demonstrate a 1.3 times speed-up and a 13.5 times improvement in energy efficiency over an NVIDIA Tesla C2070 GPU with 448 cores. We also estimate the performance of FPGAs in a future scenario and show that an FPGA can control 15 times and 2.8 times more aircraft in real time than the CPU and GPU, respectively.


field programmable logic and applications | 2012

Adaptive Sequential Monte Carlo approach for real-time applications

Thomas C. P. Chau; Wayne Luk; Peter Y. K. Cheung; Alison Eele; Jan M. Maciejowski

This paper presents an adaptive Sequential Monte Carlo approach for real-time applications. The Sequential Monte Carlo method is employed to estimate the states of dynamic systems using weighted particles. The proposed approach reduces the run-time computational complexity by adapting the size of the particle set. Multiple processing elements on FPGAs are dynamically allocated for improved energy efficiency without violating real-time constraints. A robot localisation application is developed based on the proposed approach. Compared to a non-adaptive implementation, the dynamic energy consumption is reduced by up to 70% without affecting the quality of solutions.
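
The general idea of adapting the particle-set size can be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the adaptation rule (double or halve the particle count based on the effective sample size), the thresholds, and all function names are hypothetical.

```python
import random


def normalise(weights):
    """Scale non-negative weights so that they sum to one."""
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) over the normalised weights."""
    return 1.0 / sum(w * w for w in normalise(weights))


def adapt_particle_count(n_current, weights, n_min=64, n_max=4096,
                         low=0.3, high=0.7):
    """Assumed heuristic: grow the set when the weights are degenerate
    (low ESS ratio), shrink it when they are well spread (high ESS ratio)."""
    ratio = effective_sample_size(weights) / len(weights)
    if ratio < low:
        n_next = n_current * 2
    elif ratio > high:
        n_next = n_current // 2
    else:
        n_next = n_current
    return max(n_min, min(n_max, n_next))


def resample(particles, weights, n_out, rng=random):
    """Multinomial resampling that draws n_out particles in proportion
    to their weights, so the set size can change between time steps."""
    return rng.choices(particles, weights=weights, k=n_out)
```

In a filter loop, `adapt_particle_count` would be called after the weight update and its result passed to `resample` as `n_out`, so that fewer particles (and hence less computation) are used whenever the estimate is well supported.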


design and diagnostics of electronic circuits and systems | 2010

Design of a single layer programmable Structured ASIC library

Thomas C. P. Chau; David W. L. Wu; Yanqing Ai; Brian P. W. Chan; Sam M. H. Ho; Oscar K. L. Lau; Steve C. L. Yuen; Kong-Pang Pun; Oliver C. S. Choy; Philip Heng Wai Leong

A Structured Application-specific Integrated Circuit (SASIC) is a programmable fabric in which a small set of masks is customized for a particular application, serving to reduce the associated non-recurring engineering (NRE) cost. In this paper we describe the implementation of a SASIC logic cell which is programmable via a single metal layer. A SASIC fabric prototype is fabricated and all implemented functions are verified on silicon. Experimental measurement verifies correct operation of our SASIC with a clock frequency of over 250 MHz.


ACM Transactions on Reconfigurable Technology and Systems | 2015

Automating Elimination of Idle Functions by Runtime Reconfiguration

Xinyu Niu; Thomas C. P. Chau; Qiwei Jin; Wayne Luk; Qiang Liu; Oliver Pell

A design approach is proposed to automatically identify and exploit run-time reconfiguration opportunities while optimising resource utilisation. We introduce Reconfiguration Data Flow Graph, a hierarchical graph structure enabling reconfigurable designs to be synthesised in three steps: function analysis, configuration organisation, and run-time solution generation. Three applications, based on barrier option pricing, particle filter, and reverse time migration are used in evaluating the proposed approach. The run-time solutions approximate the theoretical performance by eliminating idle functions, and are 1.31 to 2.19 times faster than optimised static designs. FPGA designs developed with the proposed approach are up to 28.8 times faster than optimised CPU reference designs and 1.55 times faster than optimised GPU designs.

Collaboration


Dive into Thomas C. P. Chau's collaborations.

Top Co-Authors

Wayne Luk

Imperial College London

Alison Eele

University of Cambridge

Xinyu Niu

Imperial College London

Kong-Pang Pun

The Chinese University of Hong Kong

Sam M. H. Ho

The Chinese University of Hong Kong

Steve C. L. Yuen

The Chinese University of Hong Kong

Oliver C. S. Choy

The Chinese University of Hong Kong
