Peter Hallschmid | Researchain

Publication

Featured research published by Peter Hallschmid.

field programmable gate arrays | 2001

Detailed routing architectures for embedded programmable logic IP cores

Peter Hallschmid; Steven J. E. Wilton

As the complexity of integrated circuits increases, the ability to make post-fabrication changes to fixed ASIC chips will become more and more attractive. This ability can be realized using programmable logic cores. These cores are blocks of programmable logic that can be embedded into a fixed-function ASIC or a custom chip. Such cores differ from stand-alone FPGAs in that they can take on a variety of shapes and sizes. With this in mind, we investigate the detailed routing characteristics of rectangular programmable logic cores. We quantify the effects of having different x and y channel capacities, and show that the optimum ratio between the x and y channel widths for a rectangular core is between 1.2 and 1.5. We also present a new switch block family optimized for rectangular cores. Compared to a simple extension of an existing switch block, our new architecture leads to an 8.7% improvement in density with little effect on speed. Finally, we show that if the channel widths and switch block are chosen carefully the penalty for using a rectangular core (compared to a square core with the same logic capacity) is small; for a core with an aspect ratio of 2:1, the area penalty is 1.6% and the speed penalty is 1.1%.

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems | 2008

Fast Design Space Exploration Using Local Regression Modeling With Application to ASIPs

Peter Hallschmid; Resve A. Saleh

The configuration of an application-specific instruction-set processor through an exhaustive search of the design space is computationally prohibitive. Consequently, we propose a novel algorithm that models the design space using local regression statistics. With only a small subset of the design space sampled, our model uses statistical inference to estimate all remaining points. This technique enables existing design space exploration approaches to make longer strides toward the optimal point while evaluating fewer points in the design space. We tested our approach on two important aspects of processor architecture. Initially, we optimized the pattern history table (PHT) of a GSelect branch predictor to minimize the total energy of an embedded processor. Our approach was able to find the optimal configuration for the majority of benchmarks tested. By configuring the PHT size using our approach, the total processor energy was reduced by 17.2% on average, which is close to the 17.6% achievable using optimal configurations. We then extended our approach to a multidimensional cache tuning problem where we configured a two-level cache hierarchy with 19,278 possible configurations. In this case, only 1% of the design space was simulated, resulting in a 100× speedup. In doing so, we were able to identify near-optimal configurations for most benchmarks and reduce the overall energy of the processor by 13.9% on average, with one benchmark improving by as much as 53%.

field programmable logic and applications | 2014

High-level synthesis-based design methodology for Dynamic Power-Gated FPGAs

Rehan Ahmed; Assem A. M. Bsoul; Steven J. E. Wilton; Peter Hallschmid; Richard Klukas

Static leakage power consumption is critical in modern FPGAs for many applications. Dynamic Power-Gating (DPG), in which parts of the FPGA in-use logic blocks are powered down at run-time, is a promising technique to reduce the static power. Adoption of such emerging DPG-enabled FPGA architectures remains challenging as the current tool-chains for programming the FPGA do not support this type of power-gating. Moreover, manually identifying profitable power-gating opportunities in an application requires significant design expertise and is time-consuming. In this paper, we propose a high-level synthesis-based design framework that exploits the dynamic power-gating feature of the FPGAs to minimize the static power dissipation. We use this framework on a set of benchmarks from the CHStone suite and demonstrate that power-gating opportunities for hardware accelerators can be identified in an automatic way. Results show that up to 96% reduction in static energy is achieved for individual accelerators using the dynamic power-gating technique.

applied reconfigurable computing | 2015

Hierarchical Dynamic Power-Gating in FPGAs

Rehan Ahmed; Steven J. E. Wilton; Peter Hallschmid; Richard Klukas

Dynamic power-gating has been shown to reduce FPGA static leakage power significantly. In this paper, we propose a high-level synthesis (HLS) compiler-assisted framework that automatically detects hierarchical power-gating opportunities and turns off accelerators when they are not required. Unlike previous work, which considers turning off entire accelerators when they are not required, our technique is more fine-grained, in that it allows turning off a portion of an accelerator while other parts of the accelerator are running. Results on CHStone benchmarks show that hierarchical power-gating can save up to 31% of static energy when the parent and descendant accelerators are power-gated independently. An additional saving of up to 25% can be achieved if the parent accelerator is power-gated while the sub-accelerator runs.

design automation conference | 2007

Automatic cache tuning for energy-efficiency using local regression modeling

Peter Hallschmid; Resve A. Saleh

Configuration of an application-specific instruction-set processor (ASIP) through an exhaustive search of the design space is computationally prohibitive. We propose a novel algorithm that models the design space using local regressions. With only a small subset of the design space sampled, our model uses statistical inference to estimate all remaining points. We used our approach to tune a two-level cache with 19,278 legal configurations. Only 1% of the design space was simulated, resulting in a 100× speedup over a brute-force approach. In doing so, we were able to identify near-optimal configurations for most benchmarks and reduce the overall power of the processor by 13.9% on average, with one benchmark as high as 53%.

field-programmable logic and applications | 2011

Modeling and Evaluation of Dynamic Partial Reconfigurable Datapaths for FPGA-Based Systems Using Stochastic Networks

Rehan Ahmed; Peter Hallschmid

The dynamic partial reconfiguration of FPGAs is a method which modifies parts of FPGA configuration memory at run-time. The hardware resources and time overhead needed to perform a partial reconfiguration (PR) can significantly impact overall system cost and performance and must be considered early in the design cycle. Unfortunately, predicting reconfiguration overhead is difficult, especially in the presence of non-deterministic factors such as the sharing of resources with traffic not related to the PR process. Thus, current design practices include the measurement of overhead, but only after the system has been built, which limits the number of candidates that can be evaluated. We propose a flexible approach for modeling the PR datapath based on queueing theory such that we can estimate performance trends and bottlenecks of the PR process while considering the impact of shared resources. Performance trends are provided for an example system to demonstrate the effectiveness of the approach.
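
To give a feel for how queueing theory can expose the effect of shared resources on a reconfiguration datapath, here is a minimal Python sketch. It is not the model from the paper, whose details the abstract does not give. It assumes every datapath stage behaves as an independent M/M/1 queue in a tandem (Jackson-style) network, with unrelated background traffic joining individual stages; the function names, stage parameters, and rates are all hypothetical.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean time a request spends in an M/M/1 queue (waiting plus service)."""
    if arrival_rate >= service_rate:
        raise ValueError("stage is saturated: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def pr_datapath_latency(pr_rate, stages):
    """Mean end-to-end latency of one PR request through a tandem of stages.

    `stages` is a list of (service_rate, background_rate) pairs. The
    background rate models traffic unrelated to the PR process that shares
    that stage (an assumption of this sketch, not a detail from the paper).
    """
    return sum(
        mm1_response_time(pr_rate + background_rate, service_rate)
        for service_rate, background_rate in stages
    )


def bottleneck(pr_rate, stages):
    """Index of the stage with the highest utilization."""
    utilizations = [(pr_rate + bg) / mu for mu, bg in stages]
    return max(range(len(stages)), key=lambda i: utilizations[i])


if __name__ == "__main__":
    # Hypothetical three-stage datapath; rates are in requests per time unit.
    stages = [(10.0, 0.0), (6.0, 2.0), (8.0, 0.0)]
    for rate in (1.0, 2.0, 3.0):
        print(rate, pr_datapath_latency(rate, stages), bottleneck(rate, stages))
```

Sweeping `pr_rate` or a stage's background rate in a model of this kind shows latency trends and the saturating stage before any hardware is built, which is the type of early insight the abstract describes.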

Archive | 2013

Model-based Performance Evaluation of Dynamic Partial Reconfigurable Datapaths for FPGA-based Systems

Rehan Ahmed; Peter Hallschmid

Dynamic partial-reconfigurable (DPR) FPGAs have the property that all or part of their functionality can be time-multiplexed at run-time. This is achieved by dynamically transferring partial configuration bitstreams from off-chip memory to FPGA configuration memory via a specialized datapath. The performance of this datapath can have a significant impact on overall system performance and should be considered early in the design cycle. Unfortunately, performance measures for such systems can typically be determined only after development. Such measures are heavily dependent upon the detailed characteristics of the datapath and on the particular workload imposed on the system during measurement and thus can only be used to make predictions for systems similar to that used for initial measurements. In this chapter, we outline an approach to model the DPR datapath early in the design cycle using queueing networks. This modeling approach is essential for experimenting with system parameters and for providing statistical insight into the effectiveness of candidate architectures. A case study is provided to demonstrate the usefulness and flexibility of the modeling scheme.

canadian conference on electrical and computer engineering | 2007

Fast Power Estimation for Automatic Instruction-Set Selection

Peter Hallschmid; David Yeager; Resve A. Saleh

Recent research in the area of application-specific instruction-set processors (ASIPs) has focused on the automatic selection of a custom instruction-set based on a high-level description of the application. Automatic instruction-set selection typically comprises instruction selection and instruction enumeration. During instruction enumeration, candidate instructions are identified using a simple cost function that minimizes the total number of operations in each basic block of the application while also adhering to the micro-architectural constraints of the ASIP. Existing methods indirectly account for power by using the above-mentioned cost function and relying on the assumption that fewer operations will always reduce power. This approach is generally taken because power estimation is time-consuming. In this paper, we directly estimate the power dissipation of a custom instruction by using a simple yet effective probabilistic approach based on probability distributions of the input Hamming distance. Results indicate that our approach can estimate the power dissipation incurred by a custom instruction to within 12% of the value reported by PrimePower.

canadian conference on electrical and computer engineering | 2007

Point Estimation in Design Space Exploration Using Local Regression Modeling

Peter Hallschmid; Resve A. Saleh

Configuration of an application-specific instruction-set processor (ASIP) through an exhaustive search of the design space is computationally prohibitive. To enable further automation, new methods are needed to speed up design space exploration (DSE), since the evaluation of each configuration is very expensive in terms of run-time. One method of speeding up DSE is to simulate a small sample of the design space and then use this information to model the rest of the design space using statistical regression techniques. From this model, unknown points within the space can be estimated. This approach has the potential to reduce DSE time by several orders of magnitude. In this paper, we study the effectiveness of using local regression statistics (LOESS) to model the design space. We compare non-parametric statistics based on LOESS with polynomial regressions in their ability to estimate unknown points. After showing the effectiveness of LOESS, we apply it to the configuration of the pattern history table (PHT) of a branch predictor when configured to minimize the overall power dissipation of the processor.
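
The point-estimation idea in this abstract can be illustrated with a small Python sketch: simulate a few configurations, then estimate an unsampled one with a locally weighted linear fit. This is a simplified one-dimensional LOESS-style estimator using tricube weights, written under assumptions of my own; the function names, the `frac` parameter, and the synthetic sample values are hypothetical and are not taken from the paper.

```python
def tricube(u):
    """Tricube kernel: 1 at u = 0, falling smoothly to 0 at |u| >= 1."""
    u = abs(u)
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def loess_estimate(xs, ys, x0, frac=0.5):
    """Estimate y at x0 from samples (xs, ys) with a local linear fit.

    Only the nearest `frac` share of the samples takes part, and closer
    samples receive larger tricube weights.
    """
    k = min(len(xs), max(3, int(round(frac * len(xs)))))
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    h = max(abs(xs[i] - x0) for i in nearest)  # local bandwidth
    if h == 0:
        return sum(ys[i] for i in nearest) / k  # all neighbours sit on x0

    # Accumulate the sums needed for weighted least squares on a line.
    sw = sx = sy = sxx = sxy = 0.0
    for i in nearest:
        w = tricube((xs[i] - x0) / h)
        sw += w
        sx += w * xs[i]
        sy += w * ys[i]
        sxx += w * xs[i] * xs[i]
        sxy += w * xs[i] * ys[i]

    denom = sw * sxx - sx * sx
    if abs(denom) < 1e-12:
        return sy / sw  # degenerate neighbourhood: fall back to weighted mean
    slope = (sw * sxy - sx * sy) / denom
    intercept = (sy - slope * sx) / sw
    return intercept + slope * x0


if __name__ == "__main__":
    # Synthetic stand-in for "metric vs. log2(table size)"; not measured data.
    sampled_sizes = [4, 6, 8, 10, 12, 14]
    sampled_metric = [(s - 9) ** 2 + 50 for s in sampled_sizes]
    for unsampled in (5, 7, 9, 11, 13):
        print(unsampled, loess_estimate(sampled_sizes, sampled_metric, unsampled))
```

Because each estimate uses only nearby samples, the fit can follow local curvature that a single global polynomial might miss, which is the trade-off the abstract sets out to compare.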

IEEE Transactions on Very Large Scale Integration Systems | 2005

Routing architecture optimizations for high-density embedded programmable IP cores

Peter Hallschmid; Steven J. E. Wilton

Programmable logic cores differ from stand-alone field-programmable gate arrays in that they can take on a variety of shapes and sizes. With this in mind, we investigate the detailed routing architecture of rectangular programmable logic cores. We quantify the effects of having different X and Y channel capacities and show that the optimum ratio between the X and Y channel widths for a rectangular core is between 1.2 and 1.5. We also present a new switch block family optimized for rectangular cores. Further, we quantify the effects of logic block pin placement. Compared with a simple extension of an existing switch block, our new architecture leads to a density improvement of up to 11.9%. Finally, we show that, if the channel width, switch block, and pin placement are chosen carefully, then the penalty for using a rectangular core (compared to a square core with the same logic capacity) is small; for a core with an aspect ratio of 2:1, the area penalty is 1.6% and the speed penalty is 3.8%.
