Jay B. Brockman
University of Notre Dame
Publications
Featured research published by Jay B. Brockman.
International Symposium on Microarchitecture | 2009
Sheng Li; Jung Ho Ahn; Richard D. Strong; Jay B. Brockman; Dean M. Tullsen; Norman P. Jouppi
This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap including bulk CMOS, SOI, and double-gate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area² product (EDA²P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account, clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account, configuring clusters with 4 cores gives the best EDA²P and EDAP.
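The composite metrics above are plain products of energy, delay, and area, so ranking design points is simple arithmetic. A minimal sketch of how two cluster configurations might compare under EDP, EDAP, and EDA2P, using made-up numbers rather than McPAT output:

```python
# Illustrative comparison of manycore cluster configurations using the
# composite metrics named above: EDP, EDAP, and EDA2P.
# The design-point numbers below are hypothetical, not McPAT results.

def metrics(energy_j, delay_s, area_mm2):
    """Return (EDP, EDAP, EDA2P) for one design point."""
    edp = energy_j * delay_s
    edap = edp * area_mm2
    eda2p = edp * area_mm2 ** 2
    return edp, edap, eda2p

# Hypothetical normalized design points: (label, energy, delay, area)
designs = [
    ("8-core clusters", 1.00, 1.00, 1.30),  # best raw EDP, larger die
    ("4-core clusters", 1.05, 1.02, 1.00),  # slightly slower, smaller die
]

for label, e, d, a in designs:
    edp, edap, eda2p = metrics(e, d, a)
    print(f"{label}: EDP={edp:.3f} EDAP={edap:.3f} EDA2P={eda2p:.3f}")
```

With these illustrative numbers, the 8-core configuration wins on EDP alone, but once area enters the product the 4-core configuration wins on EDAP and EDA2P, mirroring the qualitative conclusion of the abstract.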
Concurrent Engineering | 1996
Brett A. Wujek; John E. Renaud; Stephen M. Batill; Jay B. Brockman
This paper reviews recent implementation advances and modifications in the continued development of a Concurrent Subspace Optimization (CSSO) algorithm for Multidisciplinary Design Optimization (MDO). The CSSO MDO algorithm implemented in this research incorporates a Coordination Procedure of System Approximation (CP-SA) for design updates. This study also details the use of a new discipline-based decomposition strategy which provides for design variable sharing across discipline design regimes (i.e., subspaces). A graphical user interface is developed which provides for menu-driven execution of MDO algorithms and results display; this new programming environment highlights the modularity of the CSSO algorithm. The algorithm is implemented in a distributed computing environment using the graphical user interface, providing for truly concurrent discipline design. Implementation studies introduce two new multidisciplinary design test problems: the optimal design of a high-performance, low-cost structural system and the preliminary sizing of a general aviation aircraft concept for optimal performance. Significant time savings are observed when using distributed computing for concurrent design across disciplines. The use of design variable sharing across disciplines does not introduce any difficulties in implementation, as the design update in the CSSO MDO algorithm is generated in the CP-SA. Application of the CSSO algorithm results in a considerable decrease in the number of system analyses required for optimization in both test problems. More importantly, for the fully coupled aircraft concept sizing problem, a significant reduction in the number of individual contributing analyses is observed.
ACM Transactions on Architecture and Code Optimization | 2013
Sheng Li; Jung Ho Ahn; Richard D. Strong; Jay B. Brockman; Dean M. Tullsen; Norman P. Jouppi
This article introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a complete chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, and integrated system components such as memory controllers and Ethernet controllers. At the circuit level, McPAT supports detailed modeling of critical-path timing, area, and power. At the technology level, McPAT models timing, area, and power for the device types forecast in the ITRS roadmap. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to accurately quantify the cost of new ideas and assess trade-offs of different architectures using new metrics such as Energy-Delay-Area² Product (EDA²P) and Energy-Delay-Area Product (EDAP). This article explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting trade-offs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies from cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks for manycore designs at the 22nm technology node shows that 8-core clustering gives the best energy-delay product, whereas when die area is taken into account, 4-core clustering gives the best EDA²P and EDAP.
Design, Automation, and Test in Europe | 2012
Ke Chen; Sheng Li; Naveen Muralimanohar; Jung Ho Ahn; Jay B. Brockman; Norman P. Jouppi
Emerging 3D die-stacked DRAM technology is one of the most promising solutions for future memory architectures to satisfy the ever-increasing demands on performance, power, and cost. This paper introduces CACTI-3DD, the first architecture-level integrated power, area, and timing modeling framework for 3D die-stacked off-chip DRAM main memory. CACTI-3DD includes TSV models, improves models for 2D off-chip DRAM main memory over current versions of CACTI, and includes 3D integration models that enable the analysis of a full spectrum of 3D DRAM designs from coarse-grained rank-level 3D stacking to bank-level 3D stacking. CACTI-3DD enables an in-depth study of architecture-level tradeoffs of power, area, and timing for 3D die-stacked DRAM designs. We demonstrate the utility of CACTI-3DD in analyzing design trade-offs of emerging 3D die-stacked DRAM main memories. We find that a coarse-grained 3D DRAM design that stacks canonical DRAM dies can only achieve marginal benefits in power, area, and timing compared to the original 2D design. To fully leverage the huge internal bandwidth of TSVs, DRAM dies must be re-architected, and system implications must be considered when building 3D DRAMs with redesigned 2D planar DRAM dies. Our results show that the 3D DRAM with re-architected DRAM dies achieves significant improvements in power and timing compared to the coarse-grained 3D die-stacked DRAM.
International Conference on Computer-Aided Design | 2011
Sheng Li; Ke Chen; Jung Ho Ahn; Jay B. Brockman; Norman P. Jouppi
This paper introduces CACTI-P, the first architecture-level integrated power, area, and timing modeling framework for SRAM-based structures with advanced leakage power reduction techniques. CACTI-P supports modeling of major leakage power reduction approaches including power-gating, long channel devices, and Hi-k metal gate devices. Because it accounts for implementation overheads, CACTI-P enables in-depth study of architecture-level tradeoffs for advanced leakage power management schemes. We illustrate the potential applicability of CACTI-P in the design and analysis of leakage power reduction techniques of future manycore processors by applying nanosecond-scale power-gating to different levels of cache for a 64-core multithreaded architecture at the 22nm technology node. Combining results from CACTI-P and a performance simulator, we find that although nanosecond-scale power-gating is a powerful way to minimize leakage power for all levels of caches, its severe impact on processor performance and energy when used for L1 data caches makes nanosecond-scale power-gating a better fit for caches closer to main memory.
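Power-gating only pays off when the idle interval is long enough to amortize the energy overhead of entering and leaving the sleep state, which is why the implementation overheads the abstract mentions matter. A standard break-even calculation, sketched here with hypothetical numbers rather than CACTI-P output:

```python
# Break-even analysis for power-gating an SRAM structure: sleeping
# eliminates leakage power, but each sleep/wake transition costs a
# fixed energy overhead. All numbers are hypothetical illustrations.

def breakeven_time(leakage_saved_w, overhead_j):
    """Minimum idle time (s) for which power-gating saves net energy."""
    return overhead_j / leakage_saved_w

def net_savings(idle_s, leakage_saved_w, overhead_j):
    """Net energy saved (J) by gating across an idle interval."""
    return idle_s * leakage_saved_w - overhead_j

leak = 5e-3        # 5 mW of leakage eliminated while gated (hypothetical)
overhead = 50e-12  # 50 pJ per sleep/wake transition (hypothetical)

t_be = breakeven_time(leak, overhead)
print(f"break-even idle time: {t_be * 1e9:.0f} ns")
```

With these illustrative values the break-even point is 10 ns, which is the regime where the nanosecond-scale gating discussed above becomes worthwhile; shorter idle intervals lose energy on the transition overhead.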
Symposium on Frontiers of Massively Parallel Computation | 1996
Peter M. Kogge; Steven C. Bass; Jay B. Brockman; Danny Z. Chen; Edwin Hsing-Mean Sha
This paper is a summary of a proposal submitted to the NSF 100 TeraFlops Point Design Study. Its main thesis is that the use of Processing-In-Memory (PIM) technology can provide an extremely dense and highly efficient base on which such computing systems can be constructed. The paper describes a strawman organization of one potential PIM chip, along with how multiple such chips might be organized into a real system, what the software supporting such a system might look like, and several applications which we will be attempting to place onto such a system.
International Conference on Supercomputing | 1999
Jay B. Brockman; Peter M. Kogge; Thomas L. Sterling; Vincent W. Freeh; Shannon K. Kuntz
The semantics of memory, a large state which can only be read or changed a small piece at a time, has remained virtually untouched since von Neumann, and its effects, latency and bandwidth, have proved to be the major stumbling block for high performance computing. This paper suggests a new model, termed “microservers,” that exploits “Processing-In-Memory” VLSI technology and that can reduce latency and memory traffic, increase inherent opportunities for concurrency, and support a variety of highly concurrent programming paradigms. Application of this model is then discussed in the framework of several ongoing supercomputing programs, particularly the HTMT petaflops project.
IEEE International Conference on High Performance Computing, Data and Analytics | 2011
Sheng Li; Ke Chen; Ming-Yu Hsieh; Naveen Muralimanohar; Chad D. Kersey; Jay B. Brockman; Arun Rodrigues; Norman P. Jouppi
Resiliency will be one of the toughest challenges in future exascale systems. Memory errors contribute more than 40% of the total hardware-related failures and are projected to increase in future exascale systems. The use of error correction codes (ECC) and checkpointing are two effective approaches to fault tolerance. While there are numerous studies on ECC or checkpointing in isolation, this is the first paper to investigate the combined effect of both on overall system performance and power. Specifically, we study the impact of various ECC schemes (SECDED, BCH, and chip-kill) in conjunction with checkpointing on future exascale systems. Our simulation results show that while chip-kill is 13% better for computation-intensive applications, BCH has a 28% advantage in system energy-delay product (EDP) for memory-intensive applications. We also propose to use BCH in tagged memory systems with commodity DRAMs where chip-kill is impractical. Our proposed architecture achieves 2.3× better system EDP than state-of-the-art tagged memory systems.
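The checkpointing half of this tradeoff hinges on how often a system checkpoints relative to its failure rate. A standard reference point, not taken from this paper, is Young's first-order approximation for the optimal checkpoint interval, sketched here with hypothetical numbers:

```python
import math

# Young's first-order approximation for the optimal checkpoint interval:
#   t_opt ~ sqrt(2 * C * M)
# where C is the time to write one checkpoint and M is the system MTBF.
# The values below are hypothetical, for illustration only.

def young_interval(checkpoint_s, mtbf_s):
    """Optimal time between checkpoints (seconds), first-order model."""
    return math.sqrt(2.0 * checkpoint_s * mtbf_s)

C = 300.0        # 5 minutes to write one checkpoint (hypothetical)
M = 24 * 3600.0  # one system-wide failure per day (hypothetical)

print(f"optimal interval: {young_interval(C, M) / 60:.1f} minutes")
```

With these values the model suggests checkpointing every 120 minutes; as the projected exascale failure rate rises (smaller M), the interval shrinks and checkpoint overhead grows, which is why pairing checkpointing with stronger ECC matters.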
International Journal of Advanced Robotic Systems | 2014
Ethan E. Danahy; Eric L. Wang; Jay B. Brockman; Adam R. Carberry; Ben Shapiro; Chris Rogers
Our goal in this article is to reflect on the role LEGO robotics has played in college engineering education over the last 15 years, starting with the introduction of the RCX in 1998 and ending with the introduction of the EV3 in 2013. By combining a modular computer programming language with a modular building platform, LEGO Education has allowed students (of all ages) to become active leaders in their own education as they build everything from animals for a robotic zoo to robots that play children's games. Most importantly, it allows all students to develop different solutions to the same problem, providing a learning community. We look first at how recent developments in the learning sciences can help in promoting student learning in robotics. We then share four case studies of successful college-level implementations that build on these developments.
International Symposium on Computer Architecture | 2004
Jay B. Brockman; Shyamkumar Thoziyoor; Shannon K. Kuntz; Peter M. Kogge
This paper discusses die cost vs. performance tradeoffs for a PIM system that could serve as the memory system of a host processor. For an increase of less than twice the cost of a commodity DRAM part, it is possible to realize a performance speedup of nearly a factor of 4 on irregular applications. This cost efficiency derives from developing a custom multithreaded processor architecture and implementation style that is well-suited for embedding in a memory. Specifically, it takes advantage of the low latency and high row bandwidth both to simplify processor design, reducing area, and to improve processing throughput. To support our claims of cost and performance, we have used simulation and analysis of existing chips, and have also designed and fully implemented a prototype chip, PIM Lite.
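The cost-efficiency claim above reduces to a simple ratio of speedup to incremental cost. A minimal sketch using the round figures quoted in the abstract (under 2× the cost of a commodity DRAM part for nearly 4× speedup):

```python
# Cost efficiency of a PIM part vs. a commodity-DRAM baseline, using
# the round figures quoted in the abstract: < 2x die cost for ~4x
# speedup on irregular applications. Exact values are illustrative.

def perf_per_cost(speedup, relative_cost):
    """Performance gained per unit of (relative) cost."""
    return speedup / relative_cost

baseline = perf_per_cost(1.0, 1.0)  # commodity DRAM + host processor only
pim = perf_per_cost(4.0, 2.0)       # PIM-enhanced memory system

print(f"PIM delivers {pim / baseline:.1f}x the performance per unit cost")
```

Even at the conservative end of the quoted ranges, the PIM part delivers roughly twice the performance per unit of die cost, which is the sense in which the embedded-processor design is cost-efficient rather than merely fast.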