Ian Kuon
University of Toronto
Publication
Featured research published by Ian Kuon.
Foundations and Trends in Electronic Design Automation | 2008
Ian Kuon; Russell Tessier; Jonathan Rose
Field-Programmable Gate Arrays (FPGAs) have become one of the key digital circuit implementation media over the last decade. A crucial part of their creation lies in their architecture, which governs the nature of their programmable logic functionality and their programmable interconnect. FPGA architecture has a dramatic effect on the quality of the final device's speed performance, area efficiency, and power consumption. This survey reviews the historical development of programmable logic devices, the fundamental programming technologies that the programmability is built on, and then describes the basic understandings gleaned from research on architectures. We include a survey of the key elements of modern commercial FPGA architecture, and look toward future trends in the field.
field programmable gate arrays | 2009
Jason Luu; Ian Kuon; Peter Jamieson; Ted Campbell; Andy Ye; Wei Mark Fang; Jonathan Rose
The VPR toolset [6, 7] has been widely used to perform FPGA architecture and CAD research, but has not evolved over the past decade to include many architectural features now present in modern FPGAs. This paper describes a new version of the toolset that includes four significant features: first, it now supports a broad range of single-driver routing architectures [29, 4, 16]. Single-driver routing has significantly different architectural and electrical properties from the multi-driver approach previously modelled, and is now employed in the majority of FPGAs sold. Second, the new release can now model a heterogeneous selection of hard logic blocks, which could include the hard memory and multipliers that are now ubiquitous in FPGAs. Third, we provide optimized electrical models of a wide range of architectures in different process technologies, including a range of area-delay tradeoffs for each single architecture. Prior releases of VPR did not publish even one architecture file with accurate resistance and capacitance parameters. Finally, to maintain robustness and to support future development the release includes a set of regression tests to check functionality and quality of result of the output of the tools. To illustrate the use of the new features, we present a new look at the FPGA area vs. logic block LUT size question that shows that small LUT sizes, with the use of carefully optimized electrical design and single-driver architectures, have better area (relative to 4-LUTs) than previously thought. Another experiment shows that several of the previous architectural results are invariant in moving from multi-driver to single-driver routing architecture and across a range of process technologies.
field programmable gate arrays | 2008
Ian Kuon; Jonathan Rose
Field-programmable gate arrays (FPGAs) are used in a wide range of markets that have differing cost, performance and power consumption requirements. It would be advantageous if a single device family could serve these varied needs but the economics of catering to this wide distribution of market demands suggest more than one family is appropriate. Consequently, FPGA vendors have moved to provide a more diverse set of families that sit at different points in the area-speed-power design space. In this work, our goal is to understand the circuit and architectural design attributes of an FPGA that enable trade-offs between area and speed, and to determine the magnitude of the possible trade-offs. This will be useful for architects seeking to determine the number of device families in a suite of offerings, as well as the changes to make between families. We have found that varying both architecture and transistor sizing of an FPGA allows the effective area to change by a factor of 3.6 from largest to smallest and the speed to change by a factor of 2.6 from fastest to slowest. It is interesting to observe that the range of area and delay trade-offs possible by varying only the transistor sizing of a single architecture is larger than the ranges observed in past architectural experiments. In addition to transistor size, we note that LUT size is one of the most useful parameters for trading off area and delay.
field-programmable custom computing machines | 2004
Navid Azizi; Ian Kuon; Aaron Egier; Ahmad Darabiha; Paul Chow
Current high-performance applications are typically implemented on large-scale general-purpose distributed or multiprocessing systems often based on commodity microprocessors. Field-Programmable Gate Arrays (FPGAs) have now reached a level of sophistication that they too could be used for such applications. In this paper we explore the feasibility of using FPGAs to implement large-scale application-specific computations by way of a case study that implements a novel molecular dynamics system. The system has been designed such that it is scalable and parallelizable. On the Transmogrifier 3 (TM3), the system performs calculations on an 8,192 particle system in 37 seconds at 26 MHz. This implementation shows that by scaling to more modern parts running at 100 MHz, a speedup of over 20x can be achieved compared to a state-of-the-art microprocessor. This can also be achieved at less cost, using less power and taking less space than a standard microprocessor-based system, while maintaining the computational precision required.
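As a rough illustration of the clock-scaling part of that claim, the sketch below estimates the runtime of the same 8,192-particle calculation at 100 MHz. It assumes the cycle count stays fixed so that runtime scales inversely with clock frequency; the abstract does not state how the scaling was actually performed (a more modern part may also allow more parallelism), and the function name `scale_runtime` is hypothetical.

```python
def scale_runtime(runtime_s, measured_mhz, target_mhz):
    """Estimate runtime at a new clock rate.

    Assumption: the design needs the same number of clock cycles,
    so runtime is inversely proportional to clock frequency.
    """
    return runtime_s * measured_mhz / target_mhz


# Figures quoted in the abstract: 37 seconds at 26 MHz on the TM3.
tm3_runtime_s = 37.0
scaled_runtime_s = scale_runtime(tm3_runtime_s, measured_mhz=26.0, target_mhz=100.0)

print(f"Estimated runtime at 100 MHz: {scaled_runtime_s:.2f} s")
```

Under this assumption the clock increase alone accounts for a factor of roughly 3.8 over the TM3 measurement; it says nothing by itself about the comparison against a microprocessor.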
field programmable gate arrays | 2005
Ian Kuon; Aaron Egier; Jonathan Rose
Creating a new FPGA is a challenging undertaking because of the significant effort that must be spent on circuit design, layout and verification. It currently takes approximately 50 to 200 person years from architecture definition to tape-out for a new FPGA family. Such a lengthy development time is necessary because the process is primarily done manually. Simplifying and shortening the design process would be advantageous since it could reduce the time to market for new FPGAs while also enhancing architecture explorations. One way to accomplish this is through automation and, in this paper, we describe our efforts to automate the entire process by making use of a previously developed set of tools that assist in the creation of the repeatable FPGA tile [25]. Our aim is to demonstrate the feasibility of a CAD flow that uses an input FPGA architecture description to generate a layout that can be sent for fabrication. We prove the feasibility of this proposition by actually designing and fabricating a complete FPGA. Initial functional testing of the FPGA appears promising but is inconclusive at this time. Through this architecture to layout process, we investigate the issues that are faced in the architecture selection, circuit design, layout and verification of such an automatically produced FPGA. We found that there are significant savings in design time. As well, we demonstrate that we can produce a layout using automated tools that is only 36% larger than a commercial FPGA device layout. Given the significant time savings and the relatively minor area penalty, we feel that this work demonstrates that automated layout of FPGAs is practical and advantageous.
ACM Transactions on Reconfigurable Technology and Systems | 2011
Jason Luu; Ian Kuon; Peter Jamieson; Ted Campbell; Andy Ye; Wei Mark Fang; Kenneth B. Kent; Jonathan Rose
The VPR toolset has been widely used in FPGA architecture and CAD research, but has not evolved over the past decade. This article describes and illustrates the use of a new version of the toolset that includes four new features: first, it supports a broad range of single-driver routing architectures, which have superior architectural and electrical properties over the prior multidriver approach (and which are now employed in the majority of FPGAs sold). Second, it can now model, for placement and routing, a heterogeneous selection of hard logic blocks. This is a key (but not final) step toward the inclusion of blocks such as memory and multipliers. Third, we provide optimized electrical models for a wide range of architectures in different process technologies, including a range of area-delay trade-offs for each single architecture. Finally, to maintain robustness and support future development the release includes a set of regression tests for the software. To illustrate the use of the new features, we explore several architectural issues: the FPGA area efficiency versus logic block granularity, the effect of single-driver routing, and a simple use of the heterogeneity to explore the impact of hard multipliers on wiring track count.
design automation conference | 2008
Ian Kuon; Jonathan Rose
The creation of an FPGA requires extensive transistor-level design. This is necessary for both the final design, and during architecture exploration, when many different logic and routing architectures are considered. For such explorations, it is not feasible to spend significant amounts of time on transistor-level design. This paper presents an automated transistor sizing tool for FPGA architecture exploration that uses a two-phased approach: a coarse rapid phase with simple modeling followed by refinement with much more accurate models. The output of the system is a design optimized towards a specific area-delay criterion. We compare the quality of our results to prior manual and partially automated approaches. Also, our tool has been used to produce hundreds of candidate architectures which we are releasing to support future high-quality explorations.
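The two-phased idea can be sketched in a few lines of Python. This is only a toy illustration and not the authors' tool: the three-stage buffer chain, the `LOAD` value, both delay models, the grid search, and the coordinate-descent refinement are all assumptions chosen to keep the example small. The coarse phase searches a wide grid with a cheap model, and the refinement phase starts from that result and adjusts sizes using a costlier stand-in for an accurate model.

```python
from itertools import product

LOAD = 64.0                         # hypothetical output load, in minimum-inverter input capacitances
GRID = (1.0, 2.0, 4.0, 8.0, 16.0)   # coarse candidate sizes for each free stage


def area(sizes):
    """Toy area model: total transistor width of the chain."""
    return sum(sizes)


def coarse_delay(sizes):
    """Cheap model: each stage's delay is its load divided by its drive strength."""
    loads = sizes[1:] + (LOAD,)
    return sum(load / drive for drive, load in zip(sizes, loads))


def accurate_delay(sizes):
    """Stand-in for a slower, more accurate model (for example, circuit simulation).

    The extra terms are invented parasitic effects, used only so that the
    two models disagree slightly.
    """
    loads = sizes[1:] + (LOAD,)
    return sum(load / drive + 0.5 + 0.05 * drive ** 0.5
               for drive, load in zip(sizes, loads))


def objective(sizes, delay_model):
    """Area-delay product under the chosen delay model."""
    return area(sizes) * delay_model(sizes)


def coarse_phase():
    """Phase 1: exhaustive search over a coarse grid using the cheap model."""
    candidates = ((1.0, s2, s3) for s2, s3 in product(GRID, repeat=2))
    return min(candidates, key=lambda s: objective(s, coarse_delay))


def refine_phase(start, step=1.1, max_iters=1000):
    """Phase 2: coordinate descent from the coarse result using the accurate model."""
    best = start
    best_cost = objective(best, accurate_delay)
    for _ in range(max_iters):
        improved = False
        for i in (1, 2):                        # stage 0 stays at minimum size
            for factor in (step, 1.0 / step):
                trial = best[:i] + (best[i] * factor,) + best[i + 1:]
                if trial[i] < 1.0:              # never go below minimum size
                    continue
                cost = objective(trial, accurate_delay)
                if cost < best_cost:
                    best, best_cost, improved = trial, cost, True
        if not improved:
            break
    return best, best_cost


coarse = coarse_phase()
refined, refined_cost = refine_phase(coarse)
print("coarse sizes: ", coarse)
print("refined sizes:", tuple(round(s, 2) for s in refined))
```

The split keeps the expensive model out of the wide search: it is only evaluated in the neighbourhood of the design the cheap model already prefers.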
Archive | 2009
Ian Kuon; Jonathan Rose
The book focuses on the cost/area, performance and power consumption differences between Field-Programmable Gate Arrays (FPGAs) and Application Specific Integrated Circuits (ASICs). These differences are referred to as the gap between FPGAs and ASICs and knowledge of this gap is fundamental for people who design FPGAs, who use FPGAs, or who are considering their use. This book reviews and examines the gap in two ways. The first portion of the book focuses on measurements of the silicon area, performance, and power consumption gap. This is done by comparing designs implemented on a commercial FPGA and using an ASIC methodology. Through this comparison, various trends are noted to elucidate some of the design choices that can narrow the gap. The latter half of the book focuses on the trade-offs that can be made in the creation of an FPGA to narrow the gap selectively. This is useful because silicon area, performance and power consumption are not equally important to all users of FPGAs. The book describes the approach used to investigate these trade-offs and it includes a detailed description of the transistor sizing tool developed to assist in this investigation. The scope of the trade-offs is then examined and the effect of these trade-offs on the FPGA to ASIC gap is considered. The idea of making cost and performance trade-offs has been considered in past works but this book explores the use of transistor-sizing to enable these trade-offs.
IEEE Transactions on Very Large Scale Integration Systems | 2011
Ian Kuon; Jonathan Rose
Field-programmable gate arrays (FPGAs) are used in a variety of markets that have differing cost, performance and power consumption requirements. While it would be ideal to serve all these markets with a single FPGA family, the diversity in the needs of these markets means that generally more than one family is appropriate. Consequently, FPGA vendors have moved to provide a diverse set of families that sit at different points in the area-speed-power design space. This paper aims to understand the circuit and architectural design attributes of FPGAs that enable tradeoffs between area and speed, and to determine the magnitude of the possible tradeoffs. This will be useful for architects seeking to determine the number of device families in a suite of offerings, as well as the changes to make between families. We explore a broad range of architectures and circuit designs and develop a transistor sizing tool that automatically optimizes each design. In this paper, we describe this tool and demonstrate that it achieves results that are comparable to past work but with vastly less effort. We then use the designs produced by the tool to explore the range of tradeoffs possible. We find that through architecture and transistor sizing changes it is possible to usefully vary the area of an FPGA by a factor of 2.0 and the performance of an FPGA by a factor of 2.1. We also observe that the range of area and delay tradeoffs possible by varying only the transistor sizing of a single architecture is larger than the ranges observed in past architectural experiments. In addition to transistor size, we note that LUT size is one of the most useful parameters for trading off area and delay.
Archive | 2010
Ian Kuon; Jonathan Rose
This chapter continues the exploration of area and delay trade-offs started in the previous chapter. That chapter focused on architectural and process technology selection as those choices have been the conventional approach for enabling area and delay trade-offs. In each of the FPGA designs considered in that exploration, all the transistor sizes in the design were optimized using the tool described in Chapter 4 but, in all cases, the objective in that optimization was to minimize the circuits’ area-delay product. However, other objective functions are possible. Exploiting these possibilities may significantly expand the useful design space for FPGAs, and a larger design space will present more opportunities to trade off area and delay.
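To make the role of the objective function concrete, the sketch below picks a design from a small set of candidates under different objectives. It assumes a weighted form, area raised to one exponent times delay raised to another, which is one possible family of objectives and is not stated in the abstract; the design names and their (area, delay) values are invented for illustration.

```python
# Hypothetical candidate designs as (relative area, relative delay) pairs.
designs = {
    "small":    (1.0, 2.0),
    "balanced": (1.5, 1.2),
    "fast":     (3.0, 1.0),
}


def cost(area, delay, area_exp, delay_exp):
    """Assumed objective family: area**area_exp * delay**delay_exp."""
    return area ** area_exp * delay ** delay_exp


def best_design(area_exp, delay_exp):
    """Return the name of the design that minimizes the chosen objective."""
    return min(designs, key=lambda name: cost(*designs[name], area_exp, delay_exp))


print("area-delay product:", best_design(1, 1))
print("delay-weighted:    ", best_design(1, 4))
print("area-weighted:     ", best_design(4, 1))
```

With these made-up numbers, each objective selects a different design, which is the sense in which changing the objective widens the set of useful area and delay trade-off points.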