Gary William Grewal | Researchain

Publication

Featured research published by Gary William Grewal.

Proceedings of the 7th International Symposium on High-Level Synthesis | 1994

An integrated approach to retargetable code generation

Thomas Charles Wilson; Gary William Grewal; Ben Halley; Dilip K. Banerji

Special-purpose instruction set processors (ISPs) challenge compilers because of instruction level parallelism, small numbers of registers, and highly specialized register capabilities. Many traditionally separate subproblems in code generation have been unified and jointly optimized within a single integer linear programming (ILP) model. ILP modeling provides a powerful methodology for generating high-quality code for a variety of ISPs.

Integration | 2011

StarPlace: A new analytic method for FPGA placement

Ming Xu; Gary William Grewal; Shawki Areibi

To date, the best algorithms for performing placement on Field-Programmable Gate Arrays (FPGAs) are based on Simulated Annealing (SA). Unfortunately, these algorithms are not scalable due to the long convergence time of the latter. With an aim towards developing a scalable FPGA placer we present an analytic placement method based on a near-linear net model, called star+. The star+ model is a variant of the well-known star model and is continuously differentiable - a requirement of analytic methods that rely on the existence of first- and second-order derivatives. Most importantly, with the star+ model incremental changes in cost resulting from block movement can be computed in O(1) time, regardless of the size of the net. This makes it possible to construct time-efficient solution methods based on conjugate gradient and successive over-relaxation for solving the resulting non-linear equation system. When compared to VPR, the current state-of-the-art placer based on SA, our analytic method is able to obtain an 8-9% reduction in critical-path delay while achieving a speedup of nearly 5x when VPR is run in its fast mode.
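
The abstract does not define the star+ model, so the following is only a sketch of the underlying idea using the classic quadratic star model in one dimension, where a net's cost is the sum of squared distances from its pins to their centroid. The class name `StarNet` and its methods are hypothetical. Because the cost can be written as sum(x_i^2) - (sum x_i)^2 / n, keeping two running sums lets the cost change for a single block move be computed in O(1) time, independent of net size.

```python
class StarNet:
    """One-dimensional quadratic star cost for a single net (illustrative only).

    cost = sum_i (x_i - mean)^2 = sum(x_i^2) - (sum x_i)^2 / n
    This is the plain quadratic star model, not the paper's star+ model.
    """

    def __init__(self, xs):
        self.xs = list(xs)
        self.n = len(self.xs)
        self.s1 = sum(self.xs)                 # running sum of coordinates
        self.s2 = sum(x * x for x in self.xs)  # running sum of squared coordinates

    def cost(self):
        # O(1): uses only the two running sums.
        return self.s2 - self.s1 * self.s1 / self.n

    def move(self, i, new_x):
        """Move pin i to new_x and return the change in cost, in O(1) time."""
        before = self.cost()
        old_x = self.xs[i]
        self.s1 += new_x - old_x
        self.s2 += new_x * new_x - old_x * old_x
        self.xs[i] = new_x
        return self.cost() - before


def brute_force_cost(xs):
    """O(n) reference implementation used to check the incremental version."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs)


net = StarNet([0.0, 2.0, 4.0])
delta = net.move(0, 2.0)  # pin 0 moves from x=0 to x=2
```

The same bookkeeping applies independently to the y coordinate; a real placer would also have to handle the non-quadratic terms that make star+ "near-linear", which are not described in the abstract.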

International Conference on Computer Design | 1994

An ILP solution for simultaneous scheduling, allocation, and binding in multiple block synthesis

Thomas Charles Wilson; Gary William Grewal; Dilip K. Banerji

Presents a novel approach to the high-level synthesis problems of scheduling, allocation, and binding for multiblock behavioral descriptions. Our design tool, JOSHUA, uses an integer linear programming (ILP) formulation to solve the three interdependent subproblems simultaneously and optimally. The system allows the designer to minimize time, area, and the number of microwords for the entire design, or for specific segments of the design. A diverse module library provides a selection of modules that can perform a specific operation in differing amounts of time (control steps). A novel feature is the ability to select an implementation for part of an algorithm from among a set of implementation alternatives. The system can also handle the issues of path frequencies, loops, parallel threads of execution, and register allocation.

Code Generation for Embedded Processors | 2002

An ILP-Based Approach to Code Generation

Thomas Charles Wilson; Gary William Grewal; Shawn Henshall; Dilip K. Banerji

Generating efficient code for instruction-set processors involves many different, interrelated subproblems. Several aspects of the problem have been integrated within a single, powerful integer linear programming (ILP) model. We present the central concepts of the model and its application. We also explain the organization and function of a complete code generation system that is currently under development and that surrounds and supports an ILP optimizer. This system contains many optimization modules that can either perform optimizations on their own or present promising opportunities for the ILP to consider.

Canadian Journal of Electrical and Computer Engineering-revue Canadienne De Genie Electrique Et Informatique | 2007

Hierarchical FPGA placement

Shawki Areibi; Gary William Grewal; Dilip K. Banerji; Peng Du

Field-programmable gate arrays (FPGAs) are semiconductor chips that can realize most digital circuits on site by specifying programmable logic and their interconnections. The use of FPGAs has grown almost exponentially because they dramatically reduce design turnaround time and startup cost for electronic products compared with traditional application-specific integrated circuits (ASICs). Efficient computer-aided-design tools are required to compile hardware descriptions into bitstream files that are used to configure the target FPGA to implement the desired circuits. Currently, the compile time, which is dominated by placement and routing time, can easily be hours or even days for large (8-million-gate) FPGAs. With 40-million-gate FPGAs on the horizon, these prohibitively long compile times may nullify the time-to-market advantage of FPGAs. This paper presents two novel placement heuristics that significantly reduce the computation time required to achieve high-quality placements, compared with the versatile place and route (VPR) tool. The first algorithm is an enhancement of simulated annealing (SA) that attempts to solve the placement problem top-down by considering all modules at the flat level. The second algorithm involves a hierarchical approach based on a two-step procedure that first proceeds bottom-up (grouping highly connected modules together) and then top-down (declustering). The overall effect is to reduce the number of entities needing to be considered at each level, such that time-consuming methods like SA become feasible for very large problems. Experimental results show a 70–80% reduction in runtime, coupled with very high-quality placements.

International Journal of Computational Intelligence and Applications | 2001

An Enhanced Genetic Algorithm for Solving the High-Level Synthesis Problems of Scheduling, Allocation, and Binding

Gary William Grewal; Thomas Charles Wilson

This paper presents a novel approach to the concurrent solution of three High-Level Synthesis (HLS) problems that are modeled as a Constraint-Satisfaction Problem (CSP) and solved using an Enhanced Genetic Algorithm (EGA). We focus on the core problems of high-level synthesis: Scheduling, Allocation, and Binding. Scheduling consists of assigning operations in a Data-Flow Graph (DFG) to control steps or clock cycles. Allocation selects specific numbers and types of functional units from a hardware library to perform the operations specified in the DFG. Binding assigns constituent operations of the DFG to specific unit instances. A very general version of this problem is considered where functional units may perform different operations in different numbers of control steps. The EGA is designed to solve CSPs quickly and does not require a user to specify appropriate mutation and crossover rates a priori; these are determined automatically during the course of the genetic search. The enhancements include a directed mutation operator and a new type of elitism that avoids premature convergence. The HLS problems are solved by applying two EGAs in a hierarchical manner. The first performs allocation, while the second performs scheduling and binding and serves as the fitness function for the first. When compared to other, well-known techniques, our results show a reduction in time to obtain optimal solutions for standard benchmarks.
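
To make the scheduling definition concrete, here is a minimal sketch of as-soon-as-possible (ASAP) scheduling, a classic baseline that assigns each DFG operation to the earliest control step allowed by its data dependencies. It is not the EGA described in the abstract, and it ignores allocation and binding (it assumes unlimited functional units). The function name `asap_schedule`, the operation names, and the latencies are all hypothetical.

```python
from collections import deque


def asap_schedule(ops, deps, latency):
    """Return the earliest start control step for each operation.

    ops:     list of operation names in the data-flow graph
    deps:    dict mapping an operation to the list of operations it depends on
    latency: dict mapping an operation to its duration in control steps
    Assumes the graph is acyclic and resources are unlimited.
    """
    succs = {op: [] for op in ops}
    indeg = {op: 0 for op in ops}
    for op, preds in deps.items():
        for p in preds:
            succs[p].append(op)
            indeg[op] += 1

    start = {op: 0 for op in ops}
    ready = deque(op for op in ops if indeg[op] == 0)
    while ready:  # process operations in topological order
        op = ready.popleft()
        for s in succs[op]:
            # A successor cannot start before this operation has finished.
            start[s] = max(start[s], start[op] + latency[op])
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)
    return start


# Hypothetical DFG: two 2-step multiplies feed a 1-step add, which feeds another add.
ops = ["mul1", "mul2", "add1", "add2"]
deps = {"add1": ["mul1", "mul2"], "add2": ["add1"]}
latency = {"mul1": 2, "mul2": 2, "add1": 1, "add2": 1}
schedule = asap_schedule(ops, deps, latency)
```

Under a limited allocation (say, a single multiplier), `mul1` and `mul2` could not share control step 0, which is why the three subproblems are interdependent and are solved together in the paper.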

IEEE International Symposium on Parallel & Distributed Processing, Workshops and PhD Forum | 2010

The pilot approach to cluster programming in C

John D. Carter; William B. Gardner; Gary William Grewal

The Pilot library offers a new method for programming parallel clusters in C. Formal elements from Communicating Sequential Processes (CSP) were used to realize a process/channel model of parallel computation that reduces opportunities for deadlock and other communication errors. This simple model, plus an application programming interface (API) fashioned on C's formatted I/O, are designed to make the library easy for novice scientific C programmers to learn. Optional runtime services including deadlock detection help the programmer to debug communication issues. Pilot forms a thin layer on top of standard Message Passing Interface (MPI), preserving the latter's portability and efficiency, with little performance impact. MPI's powerful collective operations can still be accessed within the conceptual model.

International Conference on Computer Aided Design | 2016

GPlace: a congestion-aware placement tool for ultrascale FPGAs

Ryan Pattison; Ziad Abuowaimer; Shawki Areibi; Gary William Grewal; Anthony Vannelli

Traditional FPGA flows that wait until the routing stage to tackle congestion are quickly becoming less effective. This is due to the increasing size and complexity of FPGA architectures and the designs targeted for them. In this paper, we present two new congestion-aware placement tools for Xilinx UltraScale architectures, called GPlace-pack and GPlace-flat, respectively. The former placer participated in the ISPD 2016 Routability-driven Placement Contest for FPGAs, and finished in third place overall. The latter placer was subsequently developed based on our experience in the contest with GPlace-pack. Results obtained indicate that GPlace-flat is on average 5.3× faster than GPlace-pack. The post routing results show that GPlace-flat is able to obtain a further 22.5% improvement in wirelength and a 40.0% improvement in runtime compared to GPlace-pack.

International Conference on VLSI Design | 1997

An enhanced genetic solution for scheduling, module allocation, and binding in VLSI design

Gary William Grewal; Thomas Charles Wilson

This paper presents a novel approach to the high-level synthesis problems of scheduling, module allocation, and module binding for behavioral descriptions. A very general version of this problem is considered where modules may perform different operations in different numbers of control steps. These inherently interdependent problems are solved using an Enhanced Genetic Algorithm (EGA) which is both more robust and more efficient than the simple GA.

International Conference on Machine Learning and Applications | 2012

A Dynamic Sampling Framework for Multi-class Imbalanced Data

Bazyli Debowski; Shawki Areibi; Gary William Grewal; J. Tempelman

In this paper we present a Dynamic Sampling Framework for use with multi-class imbalanced data containing any number of classes. The framework makes use of existing sampling techniques such as RUS, ROS, and SMOTE and ties the classification algorithm into the sampling process in a wrapper-like manner. In doing so the framework is able to search for a desirably sampled training set, thus eliminating the need to specify a target distribution and automatically tuning the training set distribution to the classification algorithm's learning preferences. This is important when re-sampling multi-class data where manually searching for an appropriate target distribution would be a daunting task. We test both our Dynamic Sampling approach and traditional Static Sampling using RUS, ROS, SMOTE, ROS+RUS, and SMOTE+RUS with several classification algorithms on a four-class, highly imbalanced data set. We compare the results of Static Sampling and Dynamic Sampling and find that overall both techniques are able to raise Recall for the highest minority classes, but Dynamic Sampling is also able to maintain or raise Recall for the majority classes. Also, Dynamic Sampling is overall more robust and resilient, and is better able to sustain classifier Accuracy and to raise G-Mean and Minimum F-Measures.
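
The abstract does not spell out how its metrics are computed. As an illustration of why Accuracy alone can hide poor minority-class performance, the sketch below computes per-class Recall and a multi-class G-Mean, assuming the common definition of G-Mean as the geometric mean of the per-class recalls. The function names and the toy labels are hypothetical, and the paper's exact definitions may differ.

```python
import math
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Recall for each class: correctly predicted samples / actual samples of that class."""
    total = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / total[c] for c in total}


def g_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls (assumed definition)."""
    recalls = per_class_recall(y_true, y_pred)
    return math.prod(recalls.values()) ** (1 / len(recalls))


def accuracy(y_true, y_pred):
    """Fraction of samples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced example: class 0 is the majority, and one class-1 sample is missed.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 1, 0, 2, 2]
recalls = per_class_recall(y_true, y_pred)
score = g_mean(y_true, y_pred)
acc = accuracy(y_true, y_pred)
```

Here a single missed minority sample costs only one eighth of Accuracy but halves the Recall of class 1, and G-Mean drops accordingly; a classifier that never predicts some class gets a G-Mean of zero.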