Youyou Lu
Tsinghua University
Publications
Featured research published by Youyou Lu.
International Conference on Computer Design | 2014
Youyou Lu; Jiwu Shu; Long Sun; Onur Mutlu
Emerging non-volatile memory (NVM) technologies enable data persistence at the main memory level at access speeds close to DRAM. In such persistent memories, memory writes need to be performed in strict order to satisfy storage consistency requirements and enable correct recovery from system crashes. Unfortunately, adhering to a strict order for writes to persistent memory significantly degrades system performance, as it requires flushing dirty data blocks from CPU caches and waiting for their completion at the main memory in the order specified by the program. This paper introduces a new mechanism, called Loose-Ordering Consistency (LOC), that satisfies the ordering requirements of persistent memory writes with significantly lower performance degradation than state-of-the-art mechanisms. LOC consists of two key techniques. First, Eager Commit reduces commit overhead by eliminating the need to write a persistent commit record at the end of a transaction: the necessary metadata is stored statically with the blocks of data written to memory, so the status of all transactions can be determined during recovery. Second, Speculative Persistence relaxes the ordering of writes between transactions by allowing writes to be speculatively written to persistent memory. A speculative write is made visible to software only after its associated transaction commits. To enable this, our mechanism requires tracking of committed transaction IDs and support for multi-versioning in the CPU cache. Our evaluations show that LOC reduces the average performance overhead of strict write ordering from 66.9% to 34.9% on a variety of workloads.
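Eager Commit's recovery-by-metadata idea can be sketched in software: if every log block carries its transaction ID, and the transaction's final block also records the total block count, recovery can decide commit status by counting blocks, with no separate commit record. Below is a minimal, illustrative C sketch; the struct layout and names are assumptions for exposition, not LOC's actual hardware format.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical log-block header: each block written to persistent
 * memory carries its transaction ID; the transaction's final block
 * also records the total block count. */
typedef struct {
    uint32_t txid;
    uint32_t total_blocks;   /* 0 until the final block is written */
} log_block_t;

/* Recovery: a transaction committed iff all of its blocks reached
 * persistent memory (blocks seen == recorded total). No separate
 * commit record is written or scanned. */
int tx_committed(const log_block_t *log, size_t n, uint32_t txid)
{
    size_t seen = 0;
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (log[i].txid != txid)
            continue;
        seen++;
        if (log[i].total_blocks)
            total = log[i].total_blocks;
    }
    return total != 0 && seen == total;
}

int main(void)
{
    /* Transaction 7 wrote 3 blocks (the last records the total);
     * transaction 9 crashed before completing. */
    log_block_t log[] = { {7, 0}, {9, 0}, {7, 0}, {7, 3} };
    printf("tx 7 committed: %d\n", tx_committed(log, 4, 7)); /* 1 */
    printf("tx 9 committed: %d\n", tx_committed(log, 4, 9)); /* 0 */
    return 0;
}
```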
European Conference on Computer Systems | 2016
Jiaxin Ou; Jiwu Shu; Youyou Lu
Emerging non-volatile main memories (NVMMs) provide data persistence at the main memory level. To avoid double-copy overheads among the user buffer, the OS page cache, and the storage layer, state-of-the-art NVMM-aware file systems bypass the OS page cache and copy data directly between the user buffer and NVMM storage. However, one major drawback of existing NVMM technologies is their slow writes; as a result, direct access for all file operations can lead to suboptimal system performance. In this paper, we propose HiNFS, a high-performance file system for non-volatile main memory. HiNFS uses an NVMM-aware Write Buffer policy to buffer lazy-persistent file writes in DRAM and persist them to NVMM lazily, hiding NVMM's long write latency. In contrast, HiNFS performs direct access to NVMM for eager-persistent file writes, and it reads file data directly from both DRAM and NVMM, as they have similar read performance, to eliminate double-copy overheads from the critical path. To ensure read consistency, HiNFS combines a DRAM Block Index with a Cacheline Bitmap to track the latest data between DRAM and NVMM. Finally, HiNFS employs a Buffer Benefit Model to identify eager-persistent file writes before issuing the write operations. Using software NVMM emulators, we evaluate HiNFS's performance with various workloads. Compared with the state-of-the-art NVMM-aware file systems PMFS and EXT4-DAX, our results surprisingly show that HiNFS improves system throughput by up to 184% for Filebench microbenchmarks and reduces execution time by up to 64% for data-intensive traces and macro-benchmarks, demonstrating the benefits of hiding NVMM's long write latency.
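The Cacheline Bitmap read path can be illustrated with a small sketch: with 4KB blocks and 64-byte cachelines, one 64-bit bitmap per block records which cachelines are newer in the DRAM write buffer, and a read assembles the block from the two locations. The types and names below are hypothetical, not HiNFS's actual data structures.

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define CL_SIZE    64
#define CLS_PER_BLOCK (BLOCK_SIZE / CL_SIZE)   /* 64 -> one uint64_t */

/* Hypothetical per-block index entry: where the buffered DRAM copy
 * lives and which cachelines are newer in DRAM than in NVMM. */
typedef struct {
    const uint8_t *dram_copy;   /* buffered write data, may be NULL */
    uint64_t dirty_bitmap;      /* bit i set: cacheline i is in DRAM */
} block_index_t;

/* Read one block: take each 64-byte cacheline from DRAM when the
 * bitmap says the DRAM copy is newer, otherwise directly from NVMM. */
void read_block(uint8_t *dst, const uint8_t *nvmm_block,
                const block_index_t *idx)
{
    for (int i = 0; i < CLS_PER_BLOCK; i++) {
        const uint8_t *src =
            (idx->dram_copy && (idx->dirty_bitmap >> i) & 1)
                ? idx->dram_copy + i * CL_SIZE
                : nvmm_block + i * CL_SIZE;
        memcpy(dst + i * CL_SIZE, src, CL_SIZE);
    }
}

int main(void)
{
    static uint8_t nvmm[BLOCK_SIZE], dram[BLOCK_SIZE], out[BLOCK_SIZE];
    memset(nvmm, 0xAA, sizeof nvmm);
    memset(dram, 0xBB, sizeof dram);
    block_index_t idx = { dram, 0x1 };  /* only cacheline 0 is in DRAM */
    read_block(out, nvmm, &idx);
    return (out[0] == 0xBB && out[CL_SIZE] == 0xAA) ? 0 : 1;
}
```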
IEEE Conference on Mass Storage Systems and Technologies | 2015
Youyou Lu; Jiwu Shu; Long Sun
Persistent memory provides data persistence at the main memory level and enables memory-level storage systems. To ensure consistency of such storage systems, memory writes need to be transactional and carefully moved across the boundary between the volatile CPU cache and persistent memory. Unfortunately, the CPU cache is hardware-controlled, so it is costly for programs to track data blocks and move them from volatile to persistent state. In this paper, we propose a software-based mechanism, Blurred Persistence, that blurs the volatility-persistence boundary so as to reduce the overhead of transaction support. Blurred Persistence consists of two techniques. First, Execution in Log executes a transaction in the log, eliminating duplicated data copies for execution. It allows volatile uncommitted data to become persistent, which can be detected by reorganizing the log structure. Second, Volatile Checkpoint with Bulk Persistence allows committed data to aggressively stay volatile by leveraging the durability of data in the log, as long as the commit order across threads is kept. By doing so, it reduces the frequency of forced persistence and improves cache efficiency. Evaluations show that our mechanism improves system performance by 56.3% to 143.7% for a variety of workloads.
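A rough software sketch of Execution in Log: transaction writes are appended to, and operated on in, a log region of persistent memory, so uncommitted data may reach persistent memory early; commit only has to persist the data written since the last commit point and then advance a persisted tail. The persist() helper and layout below are assumptions using x86 clflush/sfence primitives; the paper's mechanism may rely on different ordering instructions.

```c
#include <stdint.h>
#include <string.h>
#include <immintrin.h>

#define LOG_CAP 1024

/* Hypothetical log in persistent memory. Transactions execute directly
 * on their log entries, so no separate working copy is kept and copied
 * in at commit. Uncommitted entries may already be persistent; recovery
 * simply ignores everything past the persisted tail. */
static uint8_t pm_log[LOG_CAP];
static uint64_t pm_tail;     /* persisted commit point */

static void persist(const void *p, size_t len)
{
    for (size_t i = 0; i < len; i += 64)
        _mm_clflush((const char *)p + i);
    _mm_sfence();
}

/* Append an update into the log and keep executing on it in place. */
uint8_t *tx_write(uint64_t *vtail, const void *buf, size_t len)
{
    uint8_t *slot = pm_log + *vtail;
    memcpy(slot, buf, len);   /* may reach PM before commit: harmless */
    *vtail += len;
    return slot;              /* caller keeps operating on the log */
}

void tx_commit(uint64_t vtail)
{
    persist(pm_log + pm_tail, vtail - pm_tail);  /* data first ...   */
    pm_tail = vtail;
    persist(&pm_tail, sizeof pm_tail);           /* ... then the tail */
}

int main(void)
{
    uint64_t vtail = pm_tail;
    tx_write(&vtail, "hello", 5);
    tx_commit(vtail);
    return 0;
}
```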
International Conference on Computer Design | 2013
Youyou Lu; Jiwu Shu; Jia Guo; Shuai Li; Onur Mutlu
Flash memory has accelerated the architectural evolution of storage systems with its unique characteristics compared to magnetic disks. The no-overwrite property of flash memory has been leveraged to efficiently support transactions, a commonly used mechanism for providing consistency. However, existing transaction designs embedded in flash-based Solid State Drives (SSDs) have limited support for transaction flexibility, i.e., for different isolation levels between transactions, which is essential for letting different systems trade off performance against consistency. Because they support only strict isolation between transactions, existing designs reduce the number of on-the-fly requests and therefore cannot exploit the abundant internal parallelism of an SSD. Two design challenges must be overcome to support flexible transactions: (1) enabling a transaction commit protocol that supports parallel execution of transactions; and (2) efficiently tracking the state of transactions whose pages are scattered over different locations due to parallel page allocation. In this paper, we propose LightTx to address these two challenges. LightTx supports transaction flexibility using a lightweight embedded transaction design built on two key techniques. First, LightTx uses a commit protocol that determines transaction state solely inside each transaction (as opposed to having dependencies between transactions that complicate state tracking) in order to support parallel transaction execution. Second, LightTx periodically retires dead transactions to reduce the cost of tracking transaction state. Experiments show that LightTx provides up to 20.6% performance improvement due to transaction flexibility. LightTx also achieves nearly the lowest overhead in garbage collection and mapping persistence compared to existing embedded transaction designs.
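The page-independent flavor of the commit protocol can be sketched as a recovery scan, assuming each flash page's out-of-band (OOB) area records its transaction ID and the transaction's last page also records the total page count (LightTx's actual OOB layout may differ). Pages of different transactions can then interleave freely across parallel channels, since each transaction's state is derived from its own pages only.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-page OOB metadata. Commit state is derived from a
 * transaction's own pages only, so pages of many transactions can be
 * programmed in parallel without inter-transaction dependencies. */
typedef struct {
    uint32_t txid;
    uint32_t npages;   /* total pages of the tx; set only in its last page */
} oob_t;

/* Recovery scan: count pages per transaction; committed iff complete. */
int scan_committed(const oob_t *oob, int n, uint32_t txid)
{
    int seen = 0;
    uint32_t total = 0;
    for (int i = 0; i < n; i++)
        if (oob[i].txid == txid) {
            seen++;
            if (oob[i].npages)
                total = oob[i].npages;
        }
    return total && (uint32_t)seen == total;
}

int main(void)
{
    /* Pages of tx 1 and tx 2 interleaved by parallel allocation. */
    oob_t flash[] = { {1,0}, {2,0}, {1,0}, {1,3}, {2,0} };
    printf("tx1: %d, tx2: %d\n",
           scan_committed(flash, 5, 1),   /* 1: all 3 pages present */
           scan_committed(flash, 5, 2));  /* 0: no final page found  */
    return 0;
}
```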
International Parallel and Distributed Processing Symposium | 2014
Jiaxin Ou; Jiwu Shu; Youyou Lu; Letian Yi; Wei Wang
Data migration schemes are critical for balancing load in storage clusters to improve performance. However, as NAND-flash-based SSDs are widely deployed in storage systems, extending the lifespan of SSD storage clusters becomes a new challenge for data migration. Prior approaches designed for HDD storage clusters are inefficient because of excessive write amplification during data migration, which significantly decreases the lifespan of SSD storage clusters. To overcome this problem, we propose EDM, an endurance-aware data migration scheme that uses careful data placement and movement to minimize the amount of data migrated, limiting SSD wear while improving performance. Based on the observation that performance degradation is dominated by the wear speed of an SSD, which is affected by both storage utilization and write intensity, we design two complementary data migration policies to explore the trade-offs among throughput, response time during migration, and the lifetime of SSD storage clusters. Moreover, we design an SSD wear model and quantitatively calculate the amount of data to migrate as well as the sources and destinations of the migration, so as to reduce the write amplification caused by migration. Results on a real storage cluster using real-world traces show that EDM performs favorably versus existing HDD-based migration techniques, reducing cluster-wide aggregate erase count by up to 40%. Meanwhile, it improves performance by 25% on average over a baseline system that achieves almost the same performance improvement as previous migration techniques.
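As a toy version of the wear model: write amplification under garbage collection grows steeply with utilization (a common rough approximation is WA ≈ 1/(1-u)), so wear speed can be modeled as write intensity times that factor, and migration then moves load from the fastest-wearing SSD toward the slowest-wearing one. EDM's actual model and policies are more detailed; everything below is illustrative.

```c
#include <stdio.h>

/* Toy wear model: write amplification rises sharply with utilization
 * (greedy-GC style approximation), so wear speed depends on both the
 * write intensity and how full the SSD is. */
typedef struct {
    double write_mbps;    /* write intensity into this SSD   */
    double utilization;   /* fraction of capacity used, 0..1 */
} ssd_t;

static double wear_speed(const ssd_t *s)
{
    double wa = 1.0 / (1.0 - s->utilization);   /* rough WA model */
    return s->write_mbps * wa;
}

int main(void)
{
    ssd_t cluster[3] = { {80, 0.90}, {40, 0.50}, {60, 0.70} };
    int src = 0, dst = 0;
    for (int i = 1; i < 3; i++) {
        if (wear_speed(&cluster[i]) > wear_speed(&cluster[src])) src = i;
        if (wear_speed(&cluster[i]) < wear_speed(&cluster[dst])) dst = i;
    }
    /* Migrate from the fastest-wearing SSD to the slowest-wearing one,
     * moving just enough data/load to even out wear speeds. */
    printf("migrate from SSD %d (wear %.0f) to SSD %d (wear %.0f)\n",
           src, wear_speed(&cluster[src]),
           dst, wear_speed(&cluster[dst]));
    return 0;
}
```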
Computing Frontiers | 2015
Long Sun; Youyou Lu; Jiwu Shu
Emerging non-volatile memory (NVRAM) technologies, such as phase change memory, enable persistent memory architectures. To survive power failures, operations on persistent memory should follow transactional semantics, using techniques such as write-ahead logging (WAL). To ensure consistency and atomicity, persist barriers are widely adopted; unfortunately, they prevent the persistent memory controller from freely scheduling writes and exploiting the bank-level parallelism of NVRAM devices. In addition, using a uniform retention time for all persistent writes, i.e., both log and data writes, further reduces the performance of the persistent memory system, even though log writes do not need such long retention because the log is truncated periodically. In this paper, we study how NVRAM write latency affects system throughput and propose DP2, which consists of two main techniques: differential persistency and dual persistency. Differential persistency distinguishes log writes from data writes and enhances the NVRAM memory controller to schedule log writes across persist barriers, fully utilizing bank-level parallelism. Dual persistency relaxes the retention time of log writes to reduce write latency and the number of iterations per write, which in turn extends the lifetime of NVRAM devices. Evaluation results show that our proposed techniques improve system throughput by 43% on average and extend lifetime by 47%, with a 10^4-second retention time for log writes.
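Differential persistency boils down to one scheduling rule, sketched below: at a persist barrier, data writes issued after the barrier must stall until it drains, but log writes may be issued past it to keep otherwise idle NVRAM banks busy. The request format and scheduler are hypothetical simplifications of the paper's memory-controller design.

```c
#include <stdio.h>

typedef enum { DATA, LOG } wtype_t;

typedef struct {
    wtype_t type;
    int bank;            /* target NVRAM bank                        */
    int after_barrier;   /* issued after the pending persist barrier? */
} wreq_t;

/* Differential persistency in one decision: a request may issue if it
 * precedes the barrier, or if it is a LOG write -- log writes are
 * allowed to cross persist barriers to keep idle banks busy. DATA
 * writes issued after the barrier must wait for it to drain. */
static int may_issue(const wreq_t *r, int barrier_drained)
{
    return barrier_drained || !r->after_barrier || r->type == LOG;
}

int main(void)
{
    wreq_t q[] = {
        { DATA, 0, 0 },   /* before barrier: issue              */
        { LOG,  1, 1 },   /* after barrier, log: may still issue */
        { DATA, 2, 1 },   /* after barrier, data: must stall     */
    };
    for (int i = 0; i < 3; i++)
        printf("req %d (%s, bank %d): %s\n", i,
               q[i].type == LOG ? "log" : "data", q[i].bank,
               may_issue(&q[i], 0) ? "issue" : "stall");
    return 0;
}
```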
IEEE Transactions on Computers | 2015
Youyou Lu; Jiwu Shu; Jia Guo; Shuai Li; Onur Mutlu
Flash memory has accelerated the architectural evolution of storage systems with its unique characteristics compared to magnetic disks. The no-overwrite property of flash memory naturally supports transactions, a commonly used mechanism for providing consistency. However, existing embedded transaction designs in flash-based Solid State Drives (SSDs) either limit transaction concurrency or introduce high overhead in tracking transaction state, leading to low or unstable SSD performance. In this paper, we propose a transactional SSD (TxSSD) architecture, LightTx, to enable better concurrency at low overhead. First, LightTx supports arbitrary transaction concurrency by using a page-independent commit protocol. Second, LightTx tracks only recent updates, leveraging the near-log-structured update property of SSDs, and periodically retires dead transactions to reduce the cost of tracking transaction state. Experiments show that LightTx achieves nearly the lowest overhead in garbage collection, memory consumption, and mapping persistence compared to existing embedded transaction designs. LightTx also provides up to 20.6 percent performance improvement due to improved transaction concurrency.
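Dead-transaction retirement can be sketched as a sliding window over transaction IDs: because SSD updates are near-log-structured, transactions die roughly in allocation order, so the tracker keeps state only for recent IDs and advances its lower bound over contiguous dead ones. The structure below is an illustrative reconstruction, not LightTx's exact bookkeeping.

```c
#include <stdint.h>
#include <stdio.h>

#define WINDOW 8

/* Hypothetical tracker: state is kept only for a sliding window of
 * recent transaction IDs, so tracking cost stays bounded. */
typedef struct {
    uint32_t oldest;    /* oldest txid still tracked */
    uint32_t next;      /* next txid to hand out      */
    int live[WINDOW];   /* live flag per tracked txid */
} tx_tracker_t;

static void tx_finish(tx_tracker_t *t, uint32_t txid)
{
    t->live[txid % WINDOW] = 0;
    /* Retire: slide the window over contiguous dead transactions. */
    while (t->oldest < t->next && !t->live[t->oldest % WINDOW])
        t->oldest++;
}

int main(void)
{
    tx_tracker_t t = { 0, 3, { 1, 1, 1 } };   /* tx 0, 1, 2 live      */
    tx_finish(&t, 1);                         /* 0 still live: no move */
    printf("oldest tracked: %u\n", t.oldest); /* 0 */
    tx_finish(&t, 0);                         /* 0 and 1 dead: retire  */
    printf("oldest tracked: %u\n", t.oldest); /* 2 */
    return 0;
}
```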
2014 IEEE Non-Volatile Memory Systems and Applications Symposium (NVMSA) | 2014
Youyou Lu; Jiwu Shu; Peng Zhu
Transactions are a common technique to ensure system consistency, but they incur high overhead. Recent flash memory techniques enable efficient embedded transaction support inside solid state drives (SSDs). In this paper, we propose a new embedded transaction mechanism, TxCache, for SSDs with a non-volatile disk cache. TxCache revises the disk cache's management to support transactions using two techniques. First, it persists new-version data in the non-volatile disk cache in a shadow fashion, protecting old-version data from being overwritten. Second, it leverages byte-addressability, using pointers and flags to cluster the pages of each transaction and manage transaction status. The non-volatility and byte-addressability properties make TxCache an efficient transaction design. Experiments using file system and database workloads show performance improvements of up to 46.0% and lifetime extensions of up to 33.8% compared to a recent transactional SSD design.
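The shadow-update-plus-pointers idea can be sketched with a couple of structs: new-version pages land in free slots of the non-volatile cache, byte-addressable pointers chain the pages of one transaction, and commit walks the chain flipping a flag. In a real design the flag flips would need to be made atomic and durable (e.g., via a persisted commit record); the sketch below omits that and uses invented names.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical non-volatile disk-cache entry. A transaction's
 * new-version pages are written to free cache slots (shadow update),
 * so old versions are never overwritten before commit. */
typedef struct cache_entry {
    uint64_t lba;                 /* logical block address        */
    struct cache_entry *tx_next;  /* next page of the same tx     */
    uint8_t committed;            /* 0 = shadow, 1 = live version */
    uint8_t data[4096];
} cache_entry_t;

/* Commit: walk the transaction's chain and flip the flags. With a
 * non-volatile cache this is the only state that must become durable
 * at commit time; old versions can then be reclaimed lazily. */
void tx_commit(cache_entry_t *head)
{
    for (cache_entry_t *e = head; e != NULL; e = e->tx_next)
        e->committed = 1;
}

int main(void)
{
    static cache_entry_t a, b;
    a.lba = 10; a.tx_next = &b;
    b.lba = 11; b.tx_next = NULL;
    tx_commit(&a);
    return (a.committed && b.committed) ? 0 : 1;
}
```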
ACM Transactions on Storage | 2016
Youyou Lu; Jiwu Shu; Long Sun
Persistent memory provides data durability in main memory and enables memory-level storage systems. To ensure consistency of such storage systems, memory writes need to be transactional and carefully moved across the boundary between the volatile CPU cache and the persistent main memory. Unfortunately, the CPU cache is hardware-controlled, and legacy transaction mechanisms, which were designed for disk-based storage systems, are inefficient at the ordered data persistence that transactions in persistent memory require. In this article, we propose the Blurred Persistence mechanism, which reduces the transaction overhead of persistent memory by blurring the volatility-persistence boundary. Blurred Persistence consists of two techniques. First, Execution in Log executes a transaction in the log, eliminating duplicated data copies for execution. It allows volatile uncommitted data to become persistent, which can be detected with a reorganized log structure. Second, Volatile Checkpoint with Bulk Persistence allows committed data to aggressively stay volatile by leveraging the durability of data in the log, as long as the commit order across threads is kept. By doing so, it reduces the frequency of forced persistence and improves cache efficiency. Evaluations show that our mechanism improves system performance by 56.3% to 143.7% for a variety of workloads.
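Volatile Checkpoint with Bulk Persistence can be sketched as deferred flushing: since the log already holds a durable copy, commit merely records which cache lines must eventually persist, and a background checkpoint later flushes them in commit order and advances the log truncation point. The code below is a single-threaded illustration using x86 cache-flush primitives; all names are assumptions.

```c
#include <stdint.h>
#include <stddef.h>
#include <immintrin.h>

#define MAX_DIRTY 256

/* Committed data may stay in the volatile CPU cache because the log
 * already holds a durable copy. Dirty lines are recorded in commit
 * order and flushed in bulk later, cutting forced flushes from the
 * commit path. */
static const void *dirty[MAX_DIRTY];
static size_t ndirty;

/* Commit path: remember what must eventually persist; do NOT flush. */
void tx_commit_volatile(const void *line)
{
    if (ndirty < MAX_DIRTY)
        dirty[ndirty++] = line;   /* commit order preserved */
}

/* Background checkpoint: bulk-persist in commit order; the log up to
 * this point could then be truncated. */
void checkpoint(void)
{
    for (size_t i = 0; i < ndirty; i++)
        _mm_clflush(dirty[i]);
    _mm_sfence();
    ndirty = 0;                   /* truncation point advances */
}

int main(void)
{
    static char data[2][64];
    tx_commit_volatile(&data[0]);
    tx_commit_volatile(&data[1]);
    checkpoint();
    return 0;
}
```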
2016 5th Non-Volatile Memory Systems and Applications Symposium (NVMSA) | 2016
Hu Wan; Youyou Lu; Yuanchao Xu; Jiwu Shu
Atomic and durable transactions are widely used to ensure crash consistency in persistent memory (PM). However, whether to use redo or undo logging is still a hotly debated topic in persistent memory systems. In this paper, we empirically study the performance of both redo and undo logging using NVML, a persistent memory transactional object store framework. Our results on an NVDIMM server show that redo logging significantly outperforms undo logging for workloads in which a transaction updates a large number of different objects, while it underperforms undo logging for workloads with intensive read operations. Furthermore, undo logging is more sensitive to read-to-write ratios than redo logging. Finally, our experiments also demonstrate that asynchronous log truncation helps redo logging considerably for log-heavy transactions.
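The redo/undo distinction the paper measures can be seen in two tiny update routines: undo logging persists the old value before every in-place store, while redo logging persists the new value into the log and defers the in-place update to commit, which is why reads of uncommitted data (and thus read-heavy workloads) are the expensive case for redo. This is an illustrative sketch, not NVML's implementation.

```c
#include <stdint.h>
#include <string.h>
#include <immintrin.h>

static void persist(const void *p, size_t n)
{
    for (size_t i = 0; i < n; i += 64)
        _mm_clflush((const char *)p + i);
    _mm_sfence();
}

static char log_area[4096];   /* stand-in for a persistent log */

/* Undo: persist the OLD value first, then update in place. Reads see
 * the object directly, but every store pays a log write up front. */
void undo_update(uint64_t *obj, uint64_t val)
{
    memcpy(log_area, obj, sizeof *obj);
    persist(log_area, sizeof *obj);
    *obj = val;
    persist(obj, sizeof *obj);
}

/* Redo: persist the NEW value into the log; apply it to the object
 * only at commit. Before commit, reads must find uncommitted values
 * in the log, which is what hurts read-heavy workloads. */
void redo_update(uint64_t *obj, uint64_t val, int commit)
{
    memcpy(log_area, &val, sizeof val);
    persist(log_area, sizeof val);
    if (commit) {
        *obj = val;
        persist(obj, sizeof *obj);
    }
}

int main(void)
{
    static uint64_t x;
    undo_update(&x, 1);
    redo_update(&x, 2, 1);
    return (int)x - 2;   /* 0 on success */
}
```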