Christian Fobel | Researchain


Publication

Featured research published by Christian Fobel.

Canadian Conference on Electrical and Computer Engineering | 2012

A formal and empirical analysis of recombination for genetic algorithm-based approaches to the FPGA placement problem

Robert Collier; Christian Fobel; Laura Richards; Gary William Grewal

To reduce the compilation times for Field Programmable Gate Arrays, genetic algorithms have been proposed for performing placement. However, the quality of solutions produced by these methods has so far been inferior to that produced by other search methods. In this paper, we show how traditional recombination operators, employed by the genetic algorithm when performing placement, fail to produce offspring solutions that are confined to the solution subspace defined by the parent solutions. This violates a fundamental principle that should govern the behavior of the recombination operator. We explore this flaw in detail and propose a novel recombination operator that yields highly statistically significant performance improvements when tested with standard benchmarks.

Field Programmable Logic and Applications | 2014

A scalable, serially-equivalent, high-quality parallel placement methodology suitable for modern multicore and GPU architectures

Christian Fobel; Gary William Grewal; Deborah A. Stacey

Placement and routing run-times continue to dominate the automated FPGA design flow. As the size of FPGA architectures continues to grow exponentially, it remains critical to develop parallel tools for FPGA design where the amount of exposed concurrent work scales with the size of the designs to be synthesized. In this paper, we propose a novel algorithm for parallel placement, based on simulated annealing, where the amount of parallel work directly scales with the size of the net-list to be placed. Our approach concurrently evaluates and conditionally applies very large sets of non-conflicting swaps using common parallel computing primitives, including stream compaction, category reduction, and sort. While our design is suitable for targeting all modern parallel computing platforms, we present results from our implementation, which targets NVIDIA's CUDA platform, where we achieve a mean speed-up of 19x over VPR with post-routing critical-path-delay and wire-length quality that matches or exceeds VPR. We believe that this work is an important step towards the development of a scalable, high-quality placement tool.

Canadian Conference on Electrical and Computer Engineering | 2007

Using Hardware Acceleration to Reduce FPGA Placement Times

Christian Fobel; Gary William Grewal; Andrew Morton

Placement is one of the most time-consuming processes in automatically synthesizing and configuring circuits for field programmable gate arrays (FPGAs). In this paper, we present a hardware-accelerated iterative-improvement algorithm for performing placement. The design and evaluation of the accelerated algorithm are presented. Initial results indicate that hardware execution is 3.5 times faster than software execution. By taking better advantage of hardware parallelism, it is anticipated that speedups of at least an order of magnitude can be accomplished.

Reconfigurable Computing and FPGAs | 2015

Scalable analytic placement for FPGA on GPGPU

Ryan Pattison; Christian Fobel; Gary William Grewal; Shawki Areibi

The growth in field-programmable gate array (FPGA) capacity has outpaced improvements in serial processor speeds for the last decade and will continue to do so for the foreseeable future. Unfortunately, as modern FPGAs have millions of logic elements and continue to grow, the compilation of designs can take hours or even days to complete. As a result, the runtimes of the placement and routing flow have become a major concern for FPGA users and vendors alike. Roughly half the total compilation time is spent in the placement phase. Analytic placement algorithms solve the FPGA placement problem quickly. With an aim toward developing a scalable FPGA placement algorithm, we present a parallel analytic placement algorithm implemented on general-purpose computing graphics processing units (GPGPUs). The proposed analytic placer is scalable, that is, the placer maintains parallel efficiency as the problem size grows and the number of parallel workers increases. Our algorithm is a parallelized version of the serial analytic placement algorithm StarPlace and achieves speedups of 13-31 times compared to this serial version. The proposed parallel algorithm is on average 78 times faster than the academic tool versatile place and route (VPR) when run in its fast, wirelength-driven mode. The wirelength is on average 3% lower than VPR, with a 24% reduction in critical-path delay.

IEEE International NEWCAS Conference | 2012

GPU Approach to FPGA placement based on Star+

Christian Fobel; Gary William Grewal; Robert Collier; Deborah A. Stacey

While simulated annealing is currently the most widely used method for performing FPGA placement, it does not scale to very large designs. Modern many-core architectures (including GPUs) offer a promising alternative to traditional multi-core processors for improving runtime performance. In this work, we propose a GPU-accelerated simulated-annealing variant for FPGA placement. Our approach uses the Star+ wirelength model along with a novel method of efficiently generating large sets of independent swap operations, providing a high level of parallelism. Speedups of 5.4× to 89.2× (median 20.2×) were achieved over a single-core CPU-only implementation.

Genetic and Evolutionary Computation Conference | 2012

Depictions of genotypic space for evaluating the suitability of different recombination operators

Robert Collier; Christian Fobel; Gary William Grewal; Mark Wineberg

When the genetic algorithm recombines two parent genotypes, the differences between them define a genotypic subspace, and any offspring produced should be confined to this subspace. Although this might seem insignificant, those recombination (or crossover) operators that violate this principle can direct a search away from the region (in genotypic space) that contains the two parent genotypes. This is contrary to the task for which the recombination operator was originally developed and can be detrimental, so this paper introduces a visualization that can be used to detect violations of this principle. The methodology also inspired the development of a different approach to recombining permutations, and a brief case study shows that an alternative recombination operator that does not violate this principle can be used to achieve a performance improvement over previous attempts to optimize Field-Programmable Gate-Array placements using a genetic algorithm. We believe that this technique will be invaluable for developing additional recombination operators.
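The confinement principle described in this abstract can be illustrated with a small check. This is only a sketch under one simple, position-wise reading of the principle (wherever the two parents agree, the offspring must carry the same value); the paper's own formal definition and its visualization may differ, and the function name `confined_to_parent_subspace` is hypothetical.

```python
from typing import Sequence


def confined_to_parent_subspace(
    parent_a: Sequence[int], parent_b: Sequence[int], child: Sequence[int]
) -> bool:
    """Return True if the child agrees with the parents wherever they agree.

    Assumption: genotypes are equal-length sequences compared position by
    position. Positions where the parents differ are left unconstrained.
    """
    if not (len(parent_a) == len(parent_b) == len(child)):
        raise ValueError("genotypes must have the same length")
    return all(c == a for a, b, c in zip(parent_a, parent_b, child) if a == b)


# The parents differ only at positions 1 and 2.
parent_a = [0, 1, 2, 3, 4]
parent_b = [0, 2, 1, 3, 4]

# This child only changes positions where the parents differ.
inside = confined_to_parent_subspace(parent_a, parent_b, [0, 2, 1, 3, 4])

# This child changes positions 0 and 4, where the parents agree.
outside = confined_to_parent_subspace(parent_a, parent_b, [4, 1, 2, 3, 0])
```

Under this reading, an operator that ever produces a child like the second one is searching outside the region spanned by its parents.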

IEEE International Conference on High Performance Computing Data and Analytics | 2011

GPU-Accelerated Wire-Length Estimation for FPGA Placement

Christian Fobel; Gary William Grewal; Deborah A. Stacey

In the FPGA design flow, placement remains one of the most time-consuming stages, and is also crucial in terms of quality of result. HPWL and Star+ are widely used as cost metrics in FPGA placement for estimating the total wire-length of a candidate placement prior to routing. However, both wire-length models are expensive to compute, requiring O(nm) time, where n is the number of nets and m is the average net cardinality. This paper proposes using the massively multi-threaded architecture provided by GPUs to reduce the time required to compute HPWL and Star+. First, a specialized set of data structures is developed for storing net-connectivity information on the GPU. Next, a study is performed to determine how to best map the data structures onto the GPU to exploit the heterogeneous memories and thread-level parallelism that are available. Finally, a study is performed to determine what effect circuit size and net cardinality have on the speedups that can be achieved. Overall, the results show that speedups of as much as 160x over a serial CPU implementation can be achieved for both models when tested using standard benchmarks.
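To make the O(nm) cost mentioned in this abstract concrete, here is a minimal serial sketch of half-perimeter wire-length (HPWL): for each net, take the bounding box of its blocks and add the box's width and height. It is not the paper's GPU implementation, and the data layout (a dict of block positions and a list of nets) and the name `hpwl` are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Position = Tuple[int, int]  # (x, y) grid coordinates of a placed block


def hpwl(nets: List[List[str]], placement: Dict[str, Position]) -> int:
    """Sum the half-perimeter of each net's bounding box.

    Every net visits each of its blocks once, so the total work is
    proportional to (number of nets) x (average net cardinality).
    """
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total


# Hypothetical three-block placement with two nets.
placement = {"a": (0, 0), "b": (3, 1), "c": (1, 4)}
nets = [["a", "b"], ["a", "b", "c"]]
total = hpwl(nets, placement)  # (3 + 1) + (3 + 4) = 11
```

Each net's bounding box is independent of every other net's, which is the kind of per-net parallelism a many-threaded architecture can exploit.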

Microelectronics Journal | 2009

Hardware accelerated FPGA placement

Christian Fobel; Gary William Grewal; Andrew Morton

A key advantage of field-programmable gate arrays (FPGAs) over full-custom and semi-custom devices is that they provide relatively quick implementation from concept to physical realization. However, as modern FPGAs reach close to one million logic blocks, more efficient and scalable FPGA placement algorithms are needed. This paper investigates the feasibility of using hardware acceleration, in the form of FPGAs, to improve the performance of placement algorithms. An iterative algorithm is presented which exploits the fine-grain parallelism in routing individual nets. Overall, our results show that speedups of 3-4 times can be obtained, without sacrificing solution quality.

International Conference on Computer Design | 2008

A parallel Steiner tree heuristic for macro cell routing

Christian Fobel; Gary William Grewal

Global routing of macro cells remains an important but time-consuming step in the VLSI design cycle. Macro cells are large, irregularly sized parameterized circuit modules that typically contain large numbers of terminals that must be interconnected. The interconnection pattern for each set of terminals (net) that must be connected is a Steiner tree, and the primary sub-problem in the global routing of macro cells is to find a set of dissimilar, low-cost Steiner trees for each net that must be routed. In this paper, a two-phase, parallel (multi-processor) algorithm is proposed for quickly constructing a diverse pool of high-quality Steiner trees for routing of multi-terminal nets. In the first phase, a single Steiner tree is constructed using a heuristic called Shrubbery. Then, in the second phase, a pool of dissimilar, high-quality trees is created from the original tree by running multiple instances of a local search in parallel. Computational experiments performed on over 800 commonly used benchmarks show that running multiple instances of the local search in parallel results in near-linear speed-up over the serial case. Most importantly, the trees produced are both high-quality and dissimilar, allowing for numerous routing possibilities for each net.

Evolutionary Intelligence | 2014

Advancing genetic algorithm approaches to field programmable gate array placement with enhanced recombination operators

Robert Collier; Christian Fobel; Ryan Pattison; Gary William Grewal; Shawki Areibi; Peter Jamieson

Since their inception, field programmable gate arrays have seen an enormous growth in usage because they can dramatically reduce design and manufacturing costs. However, the time required for placement (a key step in the design) is dominating the compilation process. In this paper, we take some initial theoretical steps towards developing an efficient genetic algorithm for solving the placement problem by developing suitable recombination operators for performing placement. According to Holland, when the genetic algorithm recombines two parent genotypes, the differences between them define a genotypic subspace, and any offspring produced should be confined to this subspace. Those recombination operators that violate this principle can direct a search away from the region containing the parent genotypes, which is contrary to the intended task for recombination and is often detrimental to search performance. This paper contributes an intuitive visualization technique that can be used to easily detect violations of this principle. The efficacy of the proposed methodology is demonstrated, and many standard recombination operators are shown to violate this principle. The methodology is then used to guide the development of novel operators that exhibit substantial (and statistically significant) improvements in performance over standard recombination operators.
