Donald Kline | Researchain

Publication

Featured research published by Donald Kline.

IEEE Transactions on Computers | 2016

Improving Bit Flip Reduction for Biased and Random Data

Seyed Mohammad Seyedzadeh; Rakan Maddah; Donald Kline; Rami G. Melhem

Nonvolatile memory technologies such as Spin-Transfer Torque Random Access Memory (STT-RAM) and Phase Change Memory (PCM) are emerging as promising replacements for DRAM. Before STT-RAM and PCM can be deployed in functional systems, a number of remaining challenges must be addressed. Specifically, both require relatively high write energy, STT-RAM suffers from high bit error rates, and PCM suffers from low endurance. A common solution to overcome these challenges is to minimize the number of bits changed per write. In this paper, we propose and evaluate the hybrid coset encoder to efficiently improve and balance the bit flip reduction for biased and unbiased data. The core of the coset encoder consists of biased and unbiased vectors which map the data input to a larger set of data vectors. Subsequently, the intermediate data vector that yields the least number of differences when compared to the currently stored data is selected. Our evaluation shows that the hybrid coset encoder reduces bit flips by up to 25 percent over a baseline differential writing scheme. Further, our proposed scheme reduces bit flips by up to 20 percent over the leading bit-flip minimization scheme for biased data, while achieving very low decoding overhead similar to the Flip-N-Write scheme.
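The selection step described in this abstract can be illustrated with a minimal sketch. It is not the paper's encoder: the coset vectors in `MASKS`, the 8-bit word size, and the function names are hypothetical, and the cost of writing the auxiliary index bits is ignored.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# ASSUMPTION: hypothetical 8-bit coset vectors chosen for illustration only;
# the biased and unbiased vectors used in the paper are not given in the abstract.
MASKS = [0x00, 0xFF, 0x0F, 0xF0]

def coset_encode(data: int, stored: int, masks=MASKS):
    """XOR the data with each candidate vector and keep the candidate
    that differs least from the currently stored word."""
    index = min(range(len(masks)),
                key=lambda i: hamming(data ^ masks[i], stored))
    return index, data ^ masks[index]

def coset_decode(index: int, codeword: int, masks=MASKS) -> int:
    """Undo the XOR using the recorded vector index."""
    return codeword ^ masks[index]
```

In this sketch, writing `0xF0` over a stored `0x0F` would flip all eight bits under plain differential writing, whereas the encoder picks the all-ones vector and leaves the stored word unchanged. A real design would also have to account for flips in the stored index.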

International Green and Sustainable Computing Conference | 2016

Modeling STT-RAM fabrication cost and impacts in NVSim

Ismail Bayram; Enes Eken; Donald Kline; Nikolas Parshook; Yiran Chen

Reducing power consumption of computational systems in the use phase has become a significant focus to decrease thermal impacts and overall energy consumption of computing systems while having battery life benefits for increasingly mobile computing products. It is also a major driver of the sustainability of these systems due to the environmental impacts incurred through electricity generation. STT-RAM is a promising candidate to reduce use-phase power consumption due to its non-volatile data storage that dramatically reduces static power common in deeply scaled CMOS while maintaining high-speed operation and excellent CMOS compatibility. However, augmenting CMOS chips with STT-RAM incurs an additional manufacturing cost through the extra materials and fabrication steps necessary to create the chip. In this paper we describe several extensions to the widely used NVSim tool, which estimates area, performance, and use-phase power of STT-RAM, to include calculations for manufacturing costs and environmental impacts such as energy usage, global warming potential, and other emissions. To demonstrate the value of these NVSim extensions, we provide a case study to experimentally determine the time it takes for replacing an SRAM cache with an iso-capacity and iso-area STT-RAM cache to overcome the manufacturing cost overhead. Our results indicate it can take an average of 80 and 160 days, respectively, at 100% utilization to recover the manufacturing energy overhead.
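The break-even idea in this case study can be written as a short calculation. This is a simplified sketch, not the NVSim extension itself: the function name, its parameters, and the assumption that power savings accrue only while the device is in use (scaled linearly by `utilization`) are all illustrative.

```python
SECONDS_PER_DAY = 86_400

def break_even_days(extra_mfg_energy_j: float,
                    baseline_power_w: float,
                    new_power_w: float,
                    utilization: float = 1.0) -> float:
    """Days of operation needed before use-phase savings repay the
    extra manufacturing energy of the new design.

    ASSUMPTION: savings scale linearly with utilization, taken here as
    the fraction of time the device is powered and in use.
    """
    saving_w = (baseline_power_w - new_power_w) * utilization
    if saving_w <= 0:
        return float("inf")  # the new design never pays back
    return extra_mfg_energy_j / saving_w / SECONDS_PER_DAY
```

With made-up inputs of 8.64 MJ extra manufacturing energy and a 1 W power saving at full utilization, the function returns 100 days. Halving the utilization doubles the payback time.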

Design Automation Conference | 2015

Domain-wall memory buffer for low-energy NoCs

Donald Kline; Haifeng Xu; Rami G. Melhem

Networks-on-chip (NoCs) have become a leading energy consumer in modern multi-core processors, with a considerable portion of this energy originating from the large number of virtual channel (FIFO) buffers. While emerging memories have been considered for many architectural components such as caches, the asymmetric access properties and relatively small size of network FIFOs compared to the required peripheral circuitry have led to few such replacements being proposed for NoCs. In this paper, we propose control schemes that leverage the “shift-register” nature of spintronic domain-wall memory (DWM) to replace conventional memory buffers for the NoC. Our results indicate that the best shift-based scheme utilizes a dual-nanowire approach to ensure that reads and writes can be more effectively aligned with access ports for simultaneous access in the same cycle. Our approach provides a 2.93X speedup over a DWM buffer using a traditional FIFO memory control scheme with a 1.16X savings in energy. Compared to an SRAM FIFO, it exhibits an 8% message latency degradation versus a 56% energy reduction. The resulting approach achieves a 53% reduction in energy-delay product compared to SRAM and a 42% reduction in energy-delay product versus STT-MRAM.

International Green and Sustainable Computing Conference | 2016

Holistically evaluating the environmental impacts in modern computing systems

Donald Kline; Nikolas Parshook; Xiaoyu Ge; Erik Brunvand; Rami G. Melhem; Panos K. Chrysanthis

There is mounting evidence that manufacturing energy and environmental costs are a growing factor in the overall energy footprint of computing systems. The quantification of these impacts requires the evaluation of both the manufacturing and use phase energy/environmental costs of major integrated circuit (IC) components, including processing units, memory, and storage. In particular, expansions of memory and cache can potentially increase manufacturing costs beyond what can be recovered through use phase advantages for reasonable usage patterns. With this holistic view of sustainability in mind, we provide evaluations of the environmental impacts of memory and cache options for Parsec and SPEC multi-program workloads. Using indifference point analysis, we determine which architectural decisions are the most sustainable in the context of these workloads for various usage scenarios. Through a form of break even analysis, we show the impact of upgrading to a new technology node. Our analysis of current processor trends indicates that upgrading may require upwards of 10 years of service time to break even, and that designing systems with smaller cache and main memory sizes may provide an overall positive environmental trend without dramatically reducing performance.

Great Lakes Symposium on VLSI | 2015

MSCS: Multi-hop Segmented Circuit Switching

Donald Kline; Kai Wang; Rami G. Melhem

NoCs (networks-on-chip) are commonly proposed as scalable on-chip interconnects for current and future CMPs (chip multi-processors) and many-core systems. While scalable, the lack of global control can create routing inefficiencies detrimental to the overall network latency. Recently, NoCs have been proposed that allow flits to traverse multiple network switches in a single cycle. This requires a more global view of control to allow routers along the path of a packet to configure their switches collectively. In this paper, we propose a reservation based circuit-switching design, MSCS, which provides simplified global control and multi-hop traversal while reducing latency. MSCS performs network control once per network dimension for the lifetime of a packet, while the leading methods require multiple arbitration steps depending on contention in the network. Furthermore, MSCS can perform control for a packet prior to the availability of resources through reservations, while previous schemes only perform control on-demand. Overall, MSCS can reduce the buffer size by 50% over the leading multi-hop scheme while maintaining a nominal latency improvement (1.4%). With the same buffer resources per port, MSCS achieves a 12.7% latency improvement.

International Conference on Computer Design | 2017

Yoda: Judge Me by My Size, Do You?

Jiangwei Zhang; Donald Kline; Liang Fang; Rami G. Melhem

Phase change memory is a promising alternative to conventional memories such as DRAM due to its density and non-volatility. Unfortunately, reliability is still a challenge as limited write endurance, exacerbated by process variation, leads to increasing numbers of stuck-at faults over the memory's lifetime. Error-correcting Pointers (ECP) is a popular proposal to mitigate stuck-at faults by recording the addresses and the values of faulty bits in order to extend the memory lifetime. In this paper, we propose Yoda, a method to extend ECP with one or a small number of additional encoding bits in order to dramatically improve the effectiveness and guaranteed fault correction capability of ECP. Our simulation results demonstrate that Yoda has a 3.0x improvement in fault coverage compared to a fault-aware ECP with a similar overhead, while also providing a 2.5-3.0x improvement over state-of-the-art schemes with comparable complexity.

Proceedings of the International Symposium on Memory Systems | 2017

Mitigating bitline crosstalk noise in DRAM memories

Seyed Mohammad Seyedzadeh; Donald Kline; Rami G. Melhem

DRAM cells in deeply scaled CMOS confront significant challenges to ensure reliable operation. Parasitic capacitances induced by certain bit storage patterns, or bad patterns, create coupling noise that can cause crosstalk-induced faults when the coupling exceeds tolerable margins. These margins decrease and their variabilities increase with scaling, leading to weak cells that are highly susceptible to this form of crosstalk. This paper explores coding techniques to address row-based crosstalk. First, n-to-m bit encoding is explored to remove bad bit patterns from code words. Second, a Periodic Flip Encoding (PFE) technique is proposed to flip specific bits in a repeated pattern with different offsets and produce multiple code word candidates. PFE encoding can be used in a fault-oblivious or fault-aware fashion. Fault-oblivious PFE mitigates faults when the location of weak cells is unknown by minimizing the number of bad patterns in the encoded data. Fault-aware PFE avoids faults when the location of the weak cells is known by selecting the code word in which the center of any bad pattern does not coincide with a weak cell. Fault-aware and fault-oblivious PFE provide two fault tolerance solutions with a trade-off between reliability improvement and performance and power overheads. Experimental evaluation demonstrates that PFE outperforms n-to-m bit encoding as well as other leading approaches, including error correction pointers (ECP) and error correction codes (ECC). For example, when 0.01% of the cells are weak, fault-aware PFE achieves an Uncorrectable Bit Error Rate (UBER) smaller than 3×10^-12 compared to 1.4×10^-6 for ECP and 6.8×10^-6 for ECC-1. When a relatively high 1% of the cells are weak, fault-aware PFE improves the UBER by more than two orders of magnitude compared to ECC-1 and one order of magnitude compared to ECP. This is accomplished in both cases with a low performance overhead of between 1% and 2%, depending on the hardware implementation.
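The fault-oblivious variant of this candidate-and-select idea can be sketched as follows. The abstract does not define the bad patterns, the flip period, or the offsets, so everything concrete here is an assumption made for illustration: a bad pattern is taken to be a cell whose two neighbours both hold the opposite value, and the period defaults to 2.

```python
def count_bad(bits):
    """ASSUMPTION: a 'bad pattern' is a cell whose two neighbours both
    hold the opposite value (101 or 010). The paper may define it differently."""
    return sum(1 for i in range(1, len(bits) - 1)
               if bits[i - 1] == bits[i + 1] and bits[i] != bits[i - 1])

def flip(bits, period, offset):
    """Invert every bit whose index is congruent to offset modulo period."""
    return [b ^ 1 if i % period == offset else b for i, b in enumerate(bits)]

def pfe_encode(bits, period=2):
    """Build one candidate per offset (plus the unmodified word) and
    return (offset, codeword) with the fewest bad patterns."""
    candidates = [(None, list(bits))]
    candidates += [(off, flip(bits, period, off)) for off in range(period)]
    return min(candidates, key=lambda c: count_bad(c[1]))

def pfe_decode(offset, code, period=2):
    """Flipping the same positions again restores the original data."""
    return list(code) if offset is None else flip(code, period, offset)
```

Under these assumptions, the alternating word `01010101` contains six bad patterns, while flipping every even position yields a code word with none. The chosen offset has to be stored alongside the code word so that the decoder can reverse the flip.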

Integration | 2018

Yielding optimized dependability assurance through bit inversion

Jiangwei Zhang; Donald Kline; Liang Fang; Rami G. Melhem

Phase change memory (PCM) is a promising alternative to conventional DRAM main memories, due to its read performance, density, and nonvolatility and resulting low static energy. Unfortunately, reliability is still a significant challenge as limited write endurance, exacerbated by process variation, leads to increasing numbers of stuck-at faults over the memory's lifetime. This includes a significant number of stuck-at faults that appear early in the memory's service. Error-correcting Pointers (ECP) is a popular proposal to mitigate stuck-at faults in PCM by recording the addresses and the values of faulty bits in order to extend the lifetime of the memory. We propose a method to extend the effectiveness of ECP coverage called Yoda, which utilizes a small number of additional encoding bits in order to dramatically improve the effectiveness and fault correction capability of ECP. By adding one additional bit to ECP which corrects f faults, Yoda can correct 2f + 1 faults. Further improvements are possible by introducing a small number of additional bits. Our simulation results demonstrate that Yoda has a 3.0× improvement in fault coverage compared to a fault-aware ECP with a similar overhead, while also providing a 2.5–3.0× improvement over state-of-the-art schemes with comparable complexity. Furthermore, Yoda provides a method to protect the auxiliary bits, also with a small overhead. By adding one auxiliary bit to protect the auxiliary bits, Yoda can achieve extra improvement.
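One way to see why a single extra bit can raise the tolerable fault count from f to 2f + 1 is the counting argument sketched below. This is an illustration of the bit-inversion idea suggested by the title, not the paper's implementation: the function names and the pointer representation are hypothetical. Each stuck-at cell disagrees with either the data or its complement, never both, so with 2f + 1 stuck cells one of the two orientations has at most f disagreements, which f pointers can cover.

```python
from itertools import product

def mismatches(word, stuck):
    """Positions where a stuck-at cell disagrees with the bit to be stored."""
    return [pos for pos, val in stuck.items() if word[pos] != val]

def encode(data, stuck, f):
    """Try the data as-is, then inverted. Return (invert_flag, pointers)
    for the first orientation needing at most f pointers, else None."""
    for flag in (0, 1):
        word = [b ^ flag for b in data]
        bad = mismatches(word, stuck)
        if len(bad) <= f:
            return flag, {pos: word[pos] for pos in bad}
    return None

def read_cells(word, stuck):
    """Model a read: stuck cells return their stuck value."""
    return [stuck.get(i, b) for i, b in enumerate(word)]

def decode(raw, flag, pointers):
    """Patch the pointed-to bits, then undo the inversion."""
    word = list(raw)
    for pos, val in pointers.items():
        word[pos] = val
    return [b ^ flag for b in word]

def check_all(n, stuck, f):
    """Exhaustively verify that every n-bit word survives a write and read."""
    for bits in product([0, 1], repeat=n):
        data = list(bits)
        result = encode(data, stuck, f)
        if result is None:
            return False
        flag, pointers = result
        raw = read_cells([b ^ flag for b in data], stuck)
        if decode(raw, flag, pointers) != data:
            return False
    return True
```

For example, `check_all(8, {0: 1, 3: 0, 6: 1}, 1)` exercises all 8-bit words against three stuck cells with a single pointer. The sketch assumes the inversion flag and the pointers are themselves stored fault-free, which the abstract notes requires separate protection.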

IEEE Computer Architecture Letters | 2018