Luis Angel D. Bathen
University of California, Irvine
Network
Latest external collaboration on country level. Dive into details by clicking on the dots.
Publication
Featured researches published by Luis Angel D. Bathen.
design, automation, and test in europe | 2012
Yi Wang; Luis Angel D. Bathen; Zili Shao; Nikil D. Dutt
Three-dimensional (3D) flash memory is emerging to fulfil the ever-increasing demands of storage capacity. In 3D NAND flash memory, multiple layers are stacked to increase bit density and reduce bit cost of flash memory. However, the physical architecture of 3D flash memory leads to a higher probability of disturbance to adjacent physical pages and greatly increases bit error rates. This paper presents 3D-FlashMap, a novel physical-location-aware block mapping strategy for three-dimensional NAND flash memory. 3D-FlashMap permutes the physical mapping of blocks and maximizes the distance between consecutively logical blocks, which can significantly reduce the disturbance to adjacent physical pages and effectively enhance the reliability. We apply 3D-FlashMap to a representative flash storage system. Experimental results show that the proposed scheme can reduce uncorrectable page errors by 85% with less than 2% space overhead in comparison with the baseline scheme.
design automation conference | 2012
Yi Wang; Luis Angel D. Bathen; Nikil D. Dutt; Zili Shao
The increasing density of NAND flash memory leads to a dramatic increase in the bit error rate of flash, which greatly reduces the ability of error correcting codes (ECC) to handle multi-bit errors. To ensure the functionality and reliability of flash memory, the pages containing address mapping information and other metadata should be carefully stored in flash memory. This paper presents Meta-Cure, a novel hardware and file system interface that transparently protects metadata in the presence of multi-bit faults. Meta-Cure exploits built-in ECC and replication in order to protect pages containing critical data. Redundant pairs are formed at run time and distributed to different physical pages to protect against failures. Meta-Cure requires no changes to the file system, on-chip hierarchy, or hardware implementation of flash memory chip. Experimental results show that the proposed technique can reduce uncorrectable page errors by 92% with less than 1% space overhead in comparison with conventional error correction techniques.
design, automation, and test in europe | 2012
Luis Angel D. Bathen; Nikil D. Dutt; Alex Nicolau; Puneet Gupta
Power consumption variability of both on-chip SRAMs and off-chip DRAMs is expected to continue to increase over the next decades. We opportunistically exploit this variability through a novel Variability-aware Memory Virtualization (VaMV) layer that allows programmers to partition their applications address space (through annotations) into virtual address regions and create mapping policies for each region. Each policy has different requirements (e.g., power, fault-tolerance) and is exploited by our dynamic memory management module (VaMVisor), which adapts to the underlying hardware, prioritizes the memory resources according to their characteristics (e.g., power consumption), and selectively maps data to the best-fitting memory resource (e.g., high-utilization data to low-power memory space). Our experimental results on embedded benchmarks show that VaMV is capable of reducing dynamic power consumption by 63% on average while reducing total execution time by an average of 34% by exploiting: 1) SRAM voltage scaling, 2) DRAM power variability, and 3) Efficient dynamic policy-driven variability-aware memory allocation.
design, automation, and test in europe | 2011
Luis Angel D. Bathen; Nikil D. Dutt
The dual effects of larger die sizes and technology scaling, combined with aggressive voltage scaling for power reduction, increase the error rates for on-chip memories. Traditional on-chip memory reliability techniques (e.g., ECC) incur significant power and performance overheads. In this paper, we propose a low-power-and-performance-overhead Embedded RAID (E-RAID) strategy and present Embedded RAIDs-on-Chip (E-RoC), a distributed dynamically managed reliable memory subsystem. E-RoC achieves reliability through redundancy by optimizing RAID-like policies tuned for on-chip distributed memories. We achieve on-chip reliability of memories through the use of distributed dynamic scratch pad allocatable memories (DSPAMs) and their allocation policies. We exploit aggressive voltage scaling to reduce power consumption overheads due to parallel DSP AM accesses, and rely on the E-RoC manager to automatically handle any resulting voltage-scaling-induced errors. Our experimental results on multimedia benchmarks show that E-RoCs fully distributed redundant reliable memory subsystem reduces power consumption by up to 85% and latency up to 61 % over traditional reliability approaches that use parity/cyclic hybrids for error checking and correction.
IEEE Transactions on Very Large Scale Integration Systems | 2014
Yi Wang; Zili Shao; Henry C. B. Chan; Luis Angel D. Bathen; Nikil D. Dutt
The linear scaling down of NAND flash memory is approaching its physical, electrical, and reliability limitations. To maintain the current trend of increasing bit density and reducing bit per cost, 3-D flash memory is emerging as a viable solution to fulfill the ever-increasing demands of storage capacity. In 3-D NAND flash memory, multiple layers are stacked to provide ultrahigh density storage devices. However, the physical architecture of 3-D flash memory leads to a higher probability of disturbance to adjacent physical pages and greatly increases bit error rates. This paper presents a novel physical-location-aware address mapping strategy for 3-D NAND flash memory. It permutes the physical mapping of pages and maximizes the distance between the consecutively logical pages, which can significantly reduce the disturbance to adjacent physical pages and effectively enhance the reliability. The proposed mapping strategy is applied to a representative flash storage system. Experimental results show that the proposed scheme can reduce uncorrectable page errors by 70.16% with less than 10.01% space overhead in comparison with the baseline scheme.
design automation conference | 2012
Luis Angel D. Bathen; Nikil D. Dutt
Hybrid on-chip memories that combine Non-Volatile Memories (NVMs) with SRAMs promise to mitigate the increasing leakage power of traditional on-chip SRAMs. We present HaVOC: a run-time memory manager that virtualizes the hybrid on-chip memory space and supports efficient sharing of distributed ScratchPad Memories (SPMs) and NVMs. HaVOC allows programmers and the compiler to partition the applications address space and generate data/instruction block layouts considering virtualized hybrid address spaces. We define a data volatility metric used by our hybrid memory-aware compilation flow to generate memory allocation policies that are enforced at run-time by a filter-inspired dynamic memory algorithm. Our experimental results with a set of embedded benchmarks executing simultaneously on a Chip-Multiprocessor with hybrid NVM/SPMs show that HaVOC is able to reduce execution time and energy by 60.8% and 74.7% respectively with respect to traditional multitasking based SPM allocation policies.
international conference on hardware/software codesign and system synthesis | 2011
Luis Angel D. Bathen; Nikil D. Dutt; Dongyoun Shin; Sung-Soo Lim
Emerging multicore platforms are increasingly deploying distributed scratchpad memories to achieve lower energy and area together with higher predictability; but this requires transparent and efficient software management of these critical resources. In this paper, we introduce SPMVisor, a hardware/software layer that virtualizes the scratchpad memory space in order to facilitate the use of distributed SPMs in an efficient, transparent and secure manner. We introduce the notion of virtual scratchpad memories (vSPMs), which can be dynamically created and managed as regular SPMs. To protect the on-chip memory space, the SP-MVisor supports vSPM-level and block-level access control lists. In order to efficiently manage the on-chip real-estate, our SPMVisor supports policy-driven allocation strategies based on privilege levels. Our experimental results on Me-diabench/CHStone benchmarks running on various Chip-Multiprocessor configurations and software stacks (RTOS, virtualization, secure execution) show that SPMVisor enhances performance by 71% on average and reduces power consumption by 79% on average.
international conference on hardware/software codesign and system synthesis | 2012
Luis Angel D. Bathen; Mark Gottscho; Nikil D. Dutt; Alex Nicolau; Puneet Gupta
ITRS predicts that over the next decade, hardware power variation will increase at alarming rates. As a result, designers must build software that can adapt to and exploit these variations to reduce power consumption and improve system performance. This paper presents ViPZonE, a system-level solution that opportunistically exploits DRAM power variation through physical address zoning. ViPZonE is composed of a variability-aware software stack that allows developers to indicate to the OS the expected dominant usage patterns (write or read) as well as level of utilization (high, medium, or low) through high-level APIs. ViPZonEs variability-aware page allocator, implemented in the Linux kernel, is responsible for interpreting these high-level requests for memory and transparently mapping them to physical address zones with different power consumption. Our experimental results across various configurations running PAR-SEC workloads show an average of 13.1% memory power consumption savings at the cost of a modest 1.03% increase in execution time over a typical Linux virtual memory allocator.
ACM Transactions on Storage | 2007
Binny S. Gill; Luis Angel D. Bathen
Prefetching is a widely used technique in modern data storage systems. We study the most widely used class of prefetching algorithms known as sequential prefetching. There are two problems that plague the state-of-the-art sequential prefetching algorithms: (i) cache pollution, which occurs when prefetched data replaces more useful prefetched or demand-paged data, and (ii) prefetch wastage, which happens when prefetched data is evicted from the cache before it can be used. A sequential prefetching algorithm can have a fixed or adaptive degree of prefetch and can be either synchronous (when it can prefetch only on a miss) or asynchronous (when it can also prefetch on a hit). To capture these distinctions we define four classes of prefetching algorithms: fixed synchronous (FS), fixed asynchronous (FA), adaptive synchronous (AS), and adaptive asynchronous (AsynchA). We find that the relatively unexplored class of AsynchA algorithms is in fact the most promising for sequential prefetching. We provide a first formal analysis of the criteria necessary for optimal throughput when using an AsynchA algorithm in a cache shared by multiple steady sequential streams. We then provide a simple implementation called AMP (adaptive multistream prefetching) which adapts accordingly, leading to near-optimal performance for any kind of sequential workload and cache size. Our experimental setup consisted of an IBM xSeries 345 dual processor server running Linux using five SCSI disks. We observe that AMP convincingly outperforms all the contending members of the FA, FS, and AS classes for any number of streams and over all cache sizes. As anecdotal evidence, in an experiment with 100 concurrent sequential streams and varying cache sizes, AMP surpasses the FA, FS, and AS algorithms by 29--172%, 12--24%, and 21--210%, respectively, while outperforming OBL by a factor of 8. Even for complex workloads like SPC1-Read, AMP is consistently the best-performing algorithm. For the SPC2 video-on-demand workload, AMP can sustain at least 25% more streams than the next best algorithm. Furthermore, for a workload consisting of short sequences, where optimality is more elusive, AMP is able to outperform all the other contenders in overall performance. Finally, we implemented AMP in the state-of-the-art enterprise storage system, the IBM system storage DS8000 series. We demonstrated that AMP dramatically improves performance for common sequential and batch processing workloads and delivers up to a twofold increase in the sequential read capacity.
design automation conference | 2015
Santanu Sarma; Tiago Mück; Luis Angel D. Bathen; Nikil D. Dutt; Alexandru Nicolau
Due to increased demand for higher performance and better energy efficiency, MPSoCs are deploying heterogeneous architectures with architecturally differentiated core types. However, the traditional Linux-based operating system is unable to exploit this heterogeneity since existing kernel load balancing and scheduling approaches lack support for aggressively heterogeneous architectural configurations (e.g. beyond two core types). In this paper we present SmartBalance: a sensing-driven closed-loop load balancer for aggressively heterogeneous MPSoCs that performs load balancing using a sense-predict-balance paradigm. SmartBalance can efficiently manage the chip resources while opportunistically exploiting the workload variations and performance-power trade-offs of different core types. When compared to the standard vanilla Linux kernel load balancer, our per-thread and per-core performance-power-aware scheme shows an improvement in energy efficiency (throughput/Watt) of over 50% for benchmarks from the PARSEC benchmark suite executing on a heterogeneous MPSoC with 4 different core types and over 20% w.r.t. state-of-the-art ARMs global task scheduling (GTS) scheme for octa-core big.Little architecture.