Publication


Featured research published by David J. Palframan.


International Symposium on Computer Architecture | 2017

Scalpel: Customizing DNN Pruning to the Underlying Hardware Parallelism

Jiecao Yu; Andrew Lukefahr; David J. Palframan; Ganesh S. Dasika; Reetuparna Das; Scott A. Mahlke

As the size of Deep Neural Networks (DNNs) continues to grow to increase accuracy and solve more complex problems, their energy footprint also scales. Weight pruning reduces DNN model size and computation by removing redundant weights. However, we implemented weight pruning for several popular networks on a variety of hardware platforms and observed surprising results. For many networks, the network sparsity caused by weight pruning can actually hurt overall performance despite large reductions in the model size and required multiply-accumulate operations. Also, encoding the sparse format of pruned networks incurs additional storage overhead. To overcome these challenges, we propose Scalpel, which customizes DNN pruning to the underlying hardware by matching the pruned network structure to the data-parallel hardware organization. Scalpel consists of two techniques: SIMD-aware weight pruning and node pruning. For low-parallelism hardware (e.g., microcontroller), SIMD-aware weight pruning maintains weights in aligned fixed-size groups to fully utilize the SIMD units. For high-parallelism hardware (e.g., GPU), node pruning removes redundant nodes, not redundant weights, thereby reducing computation without sacrificing the dense matrix format. For hardware with moderate parallelism (e.g., desktop CPU), SIMD-aware weight pruning and node pruning are synergistically applied together. Across the microcontroller, CPU, and GPU, Scalpel achieves mean speedups of 3.54x, 2.61x, and 1.25x while reducing the model sizes by 88%, 82%, and 53%. In comparison, traditional weight pruning achieves mean speedups of 1.90x, 1.06x, and 0.41x across the three platforms.
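The SIMD-aware weight pruning idea can be illustrated with a short sketch: weights are kept or zeroed in aligned fixed-size groups (sized to the SIMD width), so every surviving weight can still be loaded with full-width SIMD instructions. The group-scoring rule below (sum of absolute weights, keep the top fraction) is a simplified stand-in for the paper's actual pruning criterion:

```python
def simd_aware_prune(weights, group_size=4, keep_ratio=0.5):
    """Prune a flat weight vector in aligned fixed-size groups (sketch).

    Whole groups of `group_size` consecutive weights are kept or zeroed
    as a unit, so the surviving weights stay aligned for SIMD loads.
    """
    assert len(weights) % group_size == 0
    groups = [weights[i:i + group_size]
              for i in range(0, len(weights), group_size)]
    # Score each aligned group; here, by the sum of absolute weights.
    scores = [sum(abs(w) for w in g) for g in groups]
    keep = int(len(groups) * keep_ratio)
    threshold = sorted(scores, reverse=True)[keep - 1] if keep else float("inf")
    pruned = []
    for g, s in zip(groups, scores):
        # Keep or zero the whole group so survivors remain SIMD-aligned.
        pruned.extend(g if s >= threshold else [0.0] * group_size)
    return pruned
```

Pruning at group granularity avoids the sparse-format bookkeeping the abstract describes: each kept group occupies a known aligned slot, so no per-weight index metadata is needed.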


International Symposium on Computer Architecture | 2015

COP: To Compress and Protect Main Memory

David J. Palframan; Nam Sung Kim; Mikko H. Lipasti

Protecting main memories from soft errors typically requires special dual-inline memory modules (DIMMs) which incorporate at least one extra chip per rank to store error-correcting codes (ECC). This increases the cost of the DIMM as well as its power consumption. To avoid these costs, some proposals have suggested protecting non-ECC DIMMs by allocating a portion of memory space to store ECC metadata. However, such proposals can significantly shrink the available memory space while degrading performance due to extra memory accesses. In this work, we propose a technique called COP which uses block-level compression to make room for ECC check bits in DRAM. Because a compressed block with check bits is the same size as an uncompressed block, no extra memory accesses are required and the memory space is not reduced. Unlike other approaches that require explicit compression-tracking metadata, COP employs a novel mechanism that relies on ECC to detect compressed data. Our results show that COP can reduce the DRAM soft error rate by 93% with no storage overhead and negligible impact on performance. We also propose a technique using COP to protect both compressible and incompressible data with minimal storage and performance overheads.
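The storage trade at the heart of COP can be sketched in a few lines: a block is protected only if it compresses enough to leave room for check bits in its original footprint. In this simplified sketch, zlib stands in for the paper's hardware block compressor, an XOR fold stands in for real SECDED ECC, and the paper's ECC-based detection of compressed blocks is omitted:

```python
import zlib

BLOCK_SIZE = 64   # memory block (cache line) size in bytes
ECC_BYTES = 8     # hypothetical per-block check-bit budget

def check_bits(data: bytes) -> bytes:
    # Stand-in for SECDED ECC: an 8-byte XOR fold of the payload.
    acc = bytearray(ECC_BYTES)
    for i, b in enumerate(data):
        acc[i % ECC_BYTES] ^= b
    return bytes(acc)

def cop_store(block: bytes):
    """Store one block under a COP-like policy (sketch).

    If the block compresses enough to host check bits, compressed data
    plus ECC fits in the original footprint and the block is protected
    at zero storage cost; otherwise it is stored raw and unprotected.
    Returns (protected, payload).
    """
    assert len(block) == BLOCK_SIZE
    compressed = zlib.compress(block)
    if len(compressed) + ECC_BYTES <= BLOCK_SIZE:
        return True, compressed + check_bits(compressed)
    return False, block
```

Because a protected payload is never larger than an uncompressed block, reads and writes keep their normal size, which is why the scheme needs no extra memory accesses.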


High-Performance Computer Architecture | 2014

Precision-aware soft error protection for GPUs

David J. Palframan; Nam Sung Kim; Mikko H. Lipasti

With the advent of general-purpose GPU computing, it is becoming increasingly desirable to protect GPUs from soft errors. For high computation throughput, GPUs must store a significant amount of state and have many execution units. The high power and area costs of full protection from soft errors make selective protection techniques attractive. Such approaches provide maximum error coverage within a fixed area or power limit, but typically treat all errors equally. We observe that for many floating-point-intensive GPGPU applications, small magnitude errors may have little effect on results, while large magnitude errors can be amplified to have a significant negative impact. We therefore propose a novel precision-aware protection approach for the GPU execution logic and register file to mitigate large magnitude errors. We also propose an architecture modification to optimize error coverage for integer computations. Our approach combines selective logic hardening, targeted checker circuits, and intelligent register file encoding for best error protection. We demonstrate that our approach can reduce the mean error magnitude by up to 87% compared to a traditional selective protection approach with the same overhead.
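The observation about error magnitude can be made concrete: in IEEE-754 single precision, a flip in a low mantissa bit perturbs a value negligibly, while a flip in the exponent field changes it by orders of magnitude. A small illustration (this is only the motivating effect, not the paper's protection mechanism):

```python
import math
import struct

def flip_bit(x: float, bit: int) -> float:
    """Return x, encoded as float32, with one bit flipped (bit 0 = LSB)."""
    (word,) = struct.unpack("<I", struct.pack("<f", x))
    (flipped,) = struct.unpack("<f", struct.pack("<I", word ^ (1 << bit)))
    return flipped

# Mantissa LSB: error on the order of 2^-23 relative to 1.0.
small_error = flip_bit(1.0, 0)
# Top exponent bit: 1.0 becomes +inf, an unbounded-magnitude error.
large_error = flip_bit(1.0, 30)
```

This asymmetry is why protecting the sign and exponent bits (and high mantissa bits) yields far more benefit per unit of hardening cost than protecting low mantissa bits.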


Design, Automation and Test in Europe | 2011

Time redundant parity for low-cost transient error detection

David J. Palframan; Nam Sung Kim; Mikko H. Lipasti

With shrinking transistor sizes and supply voltages, errors in combinational logic due to radiation particle strikes are on the rise. A broad range of applications will soon require protection from this type of error, requiring an effective and inexpensive solution. Many previously proposed logic protection techniques rely on duplicate logic or latches, incurring high overheads. In this paper, we present a technique for transient error detection using parity trees for power and area efficiency. This approach is highly customizable, allowing adjustment of a number of parameters for optimal error coverage and overhead. We present simulation results comparing our scheme to latch duplication, showing on average greater than 55% savings in area and power overhead for the same error coverage. We also demonstrate adding protection to reach a target logic soft error rate, constituting at best a 59X reduction in the error rate with under 2% power and area overhead.
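The detection principle admits a rough software sketch (simplified; the paper's parity trees tap logic signals and exploit time redundancy in hardware): a parity tree XOR-reduces many signals into one check bit, and sampling the same outputs at two instants catches a transient pulse that is present in only one sample:

```python
from functools import reduce

def parity(bits):
    # A parity tree: XOR-reduce many signals into a single check bit.
    return reduce(lambda a, b: a ^ b, bits, 0)

def transient_detected(sample_early, sample_late):
    """Time-redundant parity check (sketch).

    The same set of outputs is sampled at two points in time. A
    transient pulse captured by only one of the two samples flips an
    odd number of bits in that sample, so the two parities disagree.
    An even number of simultaneous flips would escape a single tree,
    which is why coverage depends on how signals are partitioned
    across trees.
    """
    return parity(sample_early) != parity(sample_late)
```

A single XOR tree over n signals replaces n duplicate latches with one comparator, which is the source of the area and power savings reported above.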


Dependable Systems and Networks | 2012

Mitigating random variation with spare RIBs: Redundant intermediate bitslices

David J. Palframan; Nam Sung Kim; Mikko H. Lipasti

Delay variation due to dopant fluctuation is expected to become more prominent in future technology generations. To regain performance lost due to within-die variations, many architectural techniques propose modified timing schemes such as time borrowing or variable latency execution. As an alternative that specifically targets random variation, we propose introducing redundancy along the processor datapath in the form of one or more extra bitslices. This approach allows us to leave dummy slices in the datapath unused to avoid excessively slow critical paths created by delay variations. We examine the benefits of applying this technique to potential critical paths such as the ALU and register file, and demonstrate that our technique can significantly reduce the delay penalty due to variation. By adding a single bitslice, for instance, we can reduce this delay penalty by 10%. Finally, we discuss heuristics for configuring our redundant design after fabrication.
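The slice-selection idea behind spare RIBs can be sketched simply: after fabrication, per-bitslice delays are measured, and with R redundant slices the R slowest ones are left as unused dummies, shortening the datapath's critical path. The delay numbers below are hypothetical, and real configuration must also respect slice wiring constraints:

```python
def critical_path_with_spares(slice_delays, spares):
    """Return the critical-path delay after disabling the slowest
    `spares` bitslices (sketch of the post-fabrication heuristic)."""
    assert 0 <= spares < len(slice_delays)
    kept = sorted(slice_delays)[:len(slice_delays) - spares]
    return max(kept)
```

With one spare, the critical path drops from the slowest slice's delay to the second slowest's, which is where the reported ~10% delay-penalty reduction comes from when variation produces a single outlier slice.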


IEEE Micro | 2013

Resilient High-Performance Processors with Spare RIBs

David J. Palframan; Nam Sung Kim; Mikko H. Lipasti

Resilience to defects and parametric variations is of the utmost concern for future technology generations. Traditional redundancy to repair defects, however, can incur performance penalties owing to multiplexing. This article presents a processor design that incorporates bit-sliced redundancy along the data path. This approach makes it possible to tolerate defects without hurting performance, because the same bit offset is left unused throughout the execution core. In addition, the authors use this approach to enhance performance by avoiding excessively slow critical paths created by random delay variations. Adding a single bit slice, for instance, can reduce the delay overhead of random process variations by 10 percent while providing fault tolerance for 15 percent of the execution core.


International Symposium on Microarchitecture | 2011

CRAM: coded registers for amplified multiporting

Vignyan Reddy Kothinti Naresh; David J. Palframan; Mikko H. Lipasti

Modern out-of-order processors require a large number of register file access ports. However, adding more ports can drastically increase the delay, power and area of the register file. This relationship imposes constraints on existing superscalar designs while impeding implementation of faster and wider out-of-order processors. In this paper, we present a novel multi-ported register file using concepts from network coding. We split a true multi-ported register file into two interleaved banks, each having half the read and write ports. A third bank, storing the XOR of the write backs to the other two banks, is added to amplify the read and write bandwidth. When compared to a conventional register file, our 8R4W 128-entry coded CRAM register file reduces leakage power by 48%, area by 29% and delay by 9%. In addition, for SPEC2006 benchmarks, our implementation consumes 40% less register file dynamic energy on average with IPC degradation of 3%.
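The coding idea can be sketched in a few lines: registers interleave across two half-ported banks A and B, and a third bank C keeps A[i] XOR B[i] up to date on every writeback. When a bank's read ports are exhausted, the same value can be reconstructed from the other two banks, amplifying effective read bandwidth. Port arbitration and timing, the hard parts of the real design, are omitted in this sketch:

```python
class CodedRegisterFile:
    """CRAM-style coded banking sketch: even registers live in bank A,
    odd registers in bank B, and bank C holds A[i] XOR B[i]."""

    def __init__(self, n_regs):
        half = n_regs // 2
        self.a = [0] * half
        self.b = [0] * half
        self.c = [0] * half  # coded bank: a[i] ^ b[i]

    def write(self, reg, value):
        i = reg // 2
        if reg % 2 == 0:
            self.a[i] = value
        else:
            self.b[i] = value
        self.c[i] = self.a[i] ^ self.b[i]  # keep the code in sync

    def read_direct(self, reg):
        i = reg // 2
        return self.a[i] if reg % 2 == 0 else self.b[i]

    def read_coded(self, reg):
        # Reconstruct from the *other* bank plus the coded bank,
        # using none of the home bank's read ports.
        i = reg // 2
        if reg % 2 == 0:
            return self.b[i] ^ self.c[i]
        return self.a[i] ^ self.c[i]
```

Because either path returns the same value, a scheduler can steer each read to whichever bank pair has a free port, which is how two half-ported banks plus one coded bank stand in for a fully multi-ported array.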


High-Performance Computer Architecture | 2015

iPatch: Intelligent fault patching to improve energy efficiency

David J. Palframan; Nam Sung Kim; Mikko H. Lipasti


Archive | 2014

Method and Apparatus for Soft Error Mitigation in Computers

David J. Palframan; Nam Sung Kim; Mikko H. Lipasti


Great Lakes Symposium on VLSI | 2015

Online and Operand-Aware Detection of Failures Utilizing False Alarm Vectors

Amir Yazdanbakhsh; David J. Palframan; Azadeh Davoodi; Nam Sung Kim; Mikko H. Lipasti

Collaboration


Top co-authors of David J. Palframan:

Mikko H. Lipasti, Wisconsin Alumni Research Foundation
Amir Yazdanbakhsh, Georgia Institute of Technology
Azadeh Davoodi, University of Wisconsin-Madison
Jiecao Yu, University of Michigan