Greg Semeraro
University of Rochester
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Greg Semeraro.
high-performance computer architecture | 2002
Greg Semeraro; Grigorios Magklis; Rajeev Balasubramonian; David H. Albonesi; Sandhya Dwarkadas; Michael L. Scott
As clock frequency increases and feature size decreases, clock distribution and wire delays present a growing challenge to the designers of singly-clocked, globally synchronous systems. We describe an alternative approach, which we call a multiple clock domain (MCD) processor, in which the chip is divided into several clock domains, within which independent voltage and frequency scaling can be performed. Boundaries between domains are chosen to exploit existing queues, thereby minimizing inter-domain synchronization costs. We propose four clock domains, corresponding to the front end , integer units, floating point units, and load-store units. We evaluate this design using a simulation infrastructure based on SimpleScalar and Wattch. In an attempt to quantify potential energy savings independent of any particular on-line control strategy, we use off-line analysis of traces from a single-speed run of each of our benchmark applications to identify profitable reconfiguration points for a subsequent dynamic scaling run. Using applications from the MediaBench, Olden, and SPEC2000 benchmark suites, we obtain an average energy-delay product improvement of 20% with MCD compared to a modest 3% savings from voltage scaling a single clock and voltage system.
international symposium on microarchitecture | 2002
Greg Semeraro; David H. Albonesi; Steven G. Dropsho; Grigorios Magklis; Sandhya Dwarkadas; Michael L. Scott
We describe the design, analysis, and performance of an on-line algorithm to dynamically control the frequency/voltage of a Multiple Clock Domain (MCD) microarchitecture. The MCD microarchitecture allows the frequency/voltage of microprocessor regions to be adjusted independently and dynamically, allowing energy savings when the frequency of some regions can be reduced without significantly impacting performance. Our algorithm achieves on average a 19.0% reduction in Energy Per Instruction (EPI), a 3.2% increase in Cycles Per Instruction (CPI), a 16.7% improvement in Energy-Delay Product, and a Power Savings to Performance Degradation ratio of 4.6. Traditional frequency/voltage scaling techniques which apply reductions globally to a fully synchronous processor achieve a Power Savings to Performance Degradation ratio of only 2-3. Our Energy-Delay Product improvement is 85.5% of what has been achieved using an off-line algorithm. These results were achieved using a broad range of applications from the MediaBench, Olden, and Spec2000 benchmark suites using an algorithm we show to require minimal hardware resources.
IEEE Computer | 2003
David H. Albonesi; Rajeev Balasubramonian; S.G. Dropsbo; Sandhya Dwarkadas; Eby G. Friedman; Michael C. Huang; Volkan Kursun; Grigorios Magklis; Michael L. Scott; Greg Semeraro; Pradip Bose; Alper Buyuktosunoglu; Peter W. Cook; Stanley E. Schuster
By using adaptive processing to dynamically tune major microprocessor resources, developers can achieve greater energy efficiency with reasonable hardware and software overhead while avoiding undue performance loss. Adaptive processors require few additional transistors. Further, because adaptation occurs only in response to infrequent trigger events, the decision logic can be placed into a low-leakage state until such events occur.
international conference on parallel architectures and compilation techniques | 2002
Steven G. Dropsho; Alper Buyuktosunoglu; Rajeev Balasubramonian; David H. Albonesi; Sandhya Dwarkadas; Greg Semeraro; Grigorios Magklis; M.L. Scottt
Energy efficiency in microarchitectures has become a necessity. Significant dynamic energy savings can be realized for adaptive storage structures such as caches, issue queues, and register files by disabling unnecessary storage resources. Prior studies have analyzed individual structures and their control. A common theme to these studies is exploration of the configuration space and use of system IPC as feedback to guide reconfiguration. However when multiple structures adapt in concert, the number of possible configurations increases dramatically, and assigning causal effects to IPC change becomes problematic. To overcome this issue, we introduce designs that are reconfigured solely on local behavior. We introduce a novel cache design that permits direct calculation of efficient configurations. For buffer and queue structures, limited histogramming permits precise resizing control. When applying these techniques we show energy savings of up to 70% on the individual structures, and savings averaging 30% overall for the portion of energy attributed to these structures with an average of 2.1% performance degradation.
symposium on asynchronous circuits and systems | 2004
Greg Semeraro; David H. Albonesi; Grigorios Magklis; Michael L. Scott; Steven G. Dropsho; Sandhya Dwarkadas
We analyze an Alpha 21264-like globally-asynchronous, locally-synchronous (GALS) processor organized as multiple clock domain (MCD) microarchitecture and identify the architectural features of the processor that influence the limited performance degradation measured. We show that the out-of-order superscalar execution features of a processor, which allow traditional instruction execution latency to be hidden, are the same features that reduce the performance degradation impact of the synchronization costs of an MCD processor. In the case of our Alpha 21264-like processor, up to 94% of the MCD synchronization delays are hidden and do not impact overall performance. In addition, we show that by adding out-of-order superscalar execution capabilities to a simpler microarchitecture, such as an Intel StrongARM-like processor, as much as 62% of the performance degradation caused by synchronization delays can be eliminated.
international symposium on microarchitecture | 2003
Grigorios Magklis; Greg Semeraro; David H. Albonesi; Steven G. Dropsho; Sandhya Dwarkadas; Michael L. Scott
Multiple clock domains is one solution to the increasing problem of propagating the clock signal across increasingly larger and faster chips. The ability to independently scale frequency and voltage in each domain creates a powerful means of reducing power dissipation. A multiple clock domain (MCD) microarchitecture, which uses a globally asynchronous, locally synchronous (GALS) clocking style, permits future aggressive frequency increases, maintains a synchronous design methodology, and exploits the trend of making functional blocks more autonomous. In MCD, each processor domain is internally synchronous, but domains operate asynchronously with respect to one another. Designers still apply existing synchronous design techniques to each domain, but global clock skew is no longer a constraint. Moreover, domains can have independent voltage and frequency control, enabling dynamic voltage scaling at the domain level.
international symposium on computer architecture | 2003
Grigorios Magklis; Michael L. Scott; Greg Semeraro; David H. Albonesi; Steven G. Dropsho
Archive | 2004
David H. Albonesi; Greg Semeraro; Grigorios Magklis; Michael L. Scott; Rajeev Balasubramonian; Sandhya Dwarkadas
international symposium on microarchitecture | 2004
Steven G. Dropsho; Greg Semeraro; David H. Albonesi; Grigorios Magklis; Michael L. Scott
Workshop on Application Specific Processors (WASP) | 2003
Greg Semeraro; David H. Albonesi; Steven G. Dropsho; Grigorios Magklis; Michael L. Scott