Jinglei Ren | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Jinglei Ren is active.

Explore More

Publication

Featured researches published by Jinglei Ren.

international symposium on microarchitecture | 2015

ThyNVM: enabling software-transparent crash consistency in persistent memory systems

Jinglei Ren; Jishen Zhao; Samira Manabi Khan; Jongmoo Choi; Yongwei Wu; Onur Mutiu

Emerging byte-addressable nonvolatile memories (NVMs) promise persistent memory, which allows processors to directly access persistent data in main memory. Yet, persistent memory systems need to guarantee a consistent memory state in the event of power loss or a system crash (i.e., crash consistency). To guarantee crash consistency, most prior works rely on programmers to (1) partition persistent and transient memory data and (2) use specialized software interfaces when updating persistent memory data. As a result, taking advantage of persistent memory requires significant programmer effort, e.g., to implement new programs as well as modify legacy programs. Use cases and adoption of persistent memory can therefore be largely limited. In this paper, we propose a hardware-assisted DRAM+NVM hybrid persistent memory design, Transparent Hybrid NVM (ThyNVM), which supports software-transparent crash consistency of memory data in a hybrid memory system. To efficiently enforce crash consistency, we design a new dual-scheme checkpointing mechanism, which efficiently overlaps checkpointing time with application execution time. The key novelty is to enable checkpointing of data at multiple granularities, cache block or page granularity, in a coordinated manner. This design is based on our insight that there is a tradeoff between the application stall time due to checkpointing and the hardware storage overhead of the metadata for checkpointing, both of which are dictated by the granularity of checkpointed data. To get the best of the tradeoff, our technique adapts the checkpointing granularity to the write locality characteristics of the data and coordinates the management of multiple-granularity updates. Our evaluation across a variety of applications shows that ThyNVM performs within 4.9% of an idealized DRAM-only system that can provide crash consistency at no cost.

architectural support for programming languages and operating systems | 2017

DudeTM: Building Durable Transactions with Decoupling for Persistent Memory

Mengxing Liu; Mingxing Zhang; Kang Chen; Xuehai Qian; Yongwei Wu; Weimin Zheng; Jinglei Ren

Emerging non-volatile memory (NVM) offers non-volatility, byte-addressability and fast access at the same time. To make the best use of these properties, it has been shown by empirical evidence that programs should access NVM directly through CPU load and store instructions, so that the overhead of a traditional file system or database can be avoided. Thus, durable transactions become a common choice of applications for accessing persistent memory data in a crash consistent manner. However, existing durable transaction systems employ either undo logging, which requires a fence for every memory write, or redo logging, which requires intercepting all memory reads within transactions. This paper presents DUDETM, a crash-consistent durable transaction system that avoids the drawbacks of both undo logging and redo logging. DUDETM uses shadow DRAM to decouple the execution of a durable transaction into three fully asynchronous steps. The advantage is that only minimal fences and no memory read instrumentation are required. This design also enables an out-of-the-box transactional memory (TM) to be used as an independent component in our system. The evaluation results show that DUDETM adds durability to a TM system with only 7.4 ~ 24.6% throughput degradation. Compared to the existing durable transaction systems, DUDETM provides 1.7times to 4.4times higher throughput. Moreover, DUDETM can be implemented with existing hardware TMs with minor hardware modifications, leading to a further 1.7times speedup.

Science in China Series F: Information Sciences | 2015

Fixing, preventing, and recovering from concurrency bugs

Dongdong Deng; Guoliang Jin; Marc de Kruijf; Ang Li; Ben Liblit; Shan Lu; ShanXiang Qi; Jinglei Ren; Karthikeyan Sankaralingam; Linhai Song; Yongwei Wu; Mingxing Zhang; Wei Zhang; Weimin Zheng

Concurrency bugs are becoming widespread with the emerging ubiquity of multicore processors and multithreaded software. They manifest during production runs and lead to severe losses. Many effective concurrency-bug detection tools have been built. However, the dependability of multi-threaded software does not improve until these bugs are handled statically or dynamically. This article discusses our recent progresses on fixing, preventing, and recovering from concurrency bugs.

foundations of software engineering | 2014

AI: a lightweight system for tolerating concurrency bugs

Mingxing Zhang; Yongwei Wu; Shan Lu; Shanxiang Qi; Jinglei Ren; Weimin Zheng

Concurrency bugs are notoriously difficult to eradicate during software testing because of their non-deterministic nature. Moreover, fixing concurrency bugs is time-consuming and error-prone. Thus, tolerating concurrency bugs during production runs is an attractive complementary approach to bug detection and testing. Unfortunately, existing bug-tolerating tools are usually either 1) constrained in types of bugs they can handle or 2) requiring roll-back mechanism, which can hitherto not be fully achieved efficiently without hardware supports. This paper presents a novel program invariant, called Anticipating Invariant (AI), which can help anticipate bugs before any irreversible changes are made. Benefiting from this ability of anticipating bugs beforehand, our software-only system is able to forestall the failures with a simple thread stalling technique, which does not rely on execution roll-back and hence has good performance Experiments with 35 real-world concurrency bugs demonstrate that AI is capable of detecting and tolerating most types of concurrency bugs, including both atomicity and order violations. Two new bugs have been detected and confirmed by the corresponding developers. Performance evaluation with 6 representative parallel programs shows that AI incurs negligible overhead (<1%) for many nontrivial desktop and server applications.

asia pacific workshop on systems | 2017

Programming for Non-Volatile Main Memory Is Hard

Jinglei Ren; Qingda Hu; Samira Manabi Khan; Thomas Moscibroda

Using non-volatile memory as main memory (NVMM) can largely improve the performance of applications, but adds to the challenge of programming -- it turns out to be very error-prone to write real-world NVMM programs, especially with object-oriented programming. This paper presents a field study of erroneous NVMM programs written by programmers who are trained to use a general NVMM programming interface. We performed the field study in a training workshop of 30 participants. Our observations and derived best practices offer a reference for future NVMM programming techniques design. Toward that end, we propose a taxonomy of latest NVMM programming techniques and, accordingly, a set of paradigms that can reduce the risk of NVMM-specific bugs. The paradigms incorporate a minimal NVMM library interface design and a new design pattern inspired by the field study.

IEEE Transactions on Computers | 2014

NO2: Speeding up Parallel Processing of Massive Compute-Intensive Tasks

Yongwei Wu; Weichao Guo; Jinglei Ren; Xun Zhao; Weiming Zheng

Large-scale computing frameworks, either tenanted on the cloud or deployed in the high-end local cluster, have become an indispensable software infrastructure to support numerous enterprise and scientific applications. Tasks executed on these frameworks are generally classified into data-intensive and compute-intensive ones. However, most existing frameworks, led by MapReduce, are mainly suitable for data-intensive tasks. Their task schedulers assume that the proportion of data I/O reflects the task progress and state. Unfortunately, this assumption does not apply to most compute-intensive tasks. Due to biased estimation of task progress, traditional frameworks cannot timely cut off outliers and therefore largely prolong execution time when performing compute-intensive tasks. We propose a new framework designed for compute-intensive tasks. By using instrumentation and automatic instrument point selector, our framework estimates the compute-intensive task progress without resorting to data I/O. We employ a clustering method to identify outliers at runtime and perform speculative execution/aborting, speeding up task execution by up to 25%. Moreover, our improvement to bare instrumentation limits overhead within 0.1%, and the aborting-based execution only introduces 10% more average CPU usage. Low overhead and resource consumption make our framework practically usable in the production environment.

ACM Transactions on Storage | 2018

D ude T x : Durable Transactions Made Decoupled

Mengxing Liu; Mingxing Zhang; Kang Chen; Xuehai Qian; Yongwei Wu; Weimin Zheng; Jinglei Ren

Emerging non-volatile memory (NVM) offers non-volatility, byte-addressability, and fast access at the same time. It is suggested that programs should access NVM directly through CPU load and store instructions. To guarantee crash consistency, durable transactions are regarded as a common choice of applications for accessing persistent memory data. However, existing durable transaction systems employ either undo logging, which requires a fence for every memory write, or redo logging, which requires intercepting all memory reads within transactions. Both approaches incur significant overhead. This article presents DudeTx, a crash-consistent durable transaction system that avoids the drawbacks of both undo and redo logging. DudeTx uses shadow DRAM to decouple the execution of a durable transaction into three fully asynchronous steps. The advantage is that only minimal fences and no memory read instrumentation are required. This design enables an out-of-the-box concurrency control mechanism, transactional memory or fine-grained locks, to be used as an independent component. The evaluation results show that DudeTx adds durability to a software transactional memory system with only 7.4%--24.6% throughput degradation. Compared to typical existing durable transaction systems, DudeTx provides 1.7× --4.4× higher throughput. Moreover, DudeTx can be implemented with hardware transactional memory or lock-based concurrency control, leading to a further 1.7× and 3.3× speedup, respectively.

IEEE Transactions on Computers | 2014

Quatrain: Accelerating Data Aggregation between Multiple Layers

Jinglei Ren; Yongwei Wu; Meiqi Zhu; Weiming Zheng

Composition of multiple layers (or components/services) has been a dominant practice in building distributed systems, meanwhile aggregation has become a typical pattern of data flows nowadays. However, the efficiency of data aggregation is usually impaired by multiple layers due to amplified delay. Current solutions based on data/execution flow optimization mostly counteract flexibility, reusability, and isolation of layers abstraction. Otherwise, programmers have to do much error-prone manual programming to optimize communication, and it is complicated in a multithreaded environment. To resolve the dilemma, we propose a new style of inter-process communication that not only optimizes data aggregation but also retains the advantages of layered (or component-based/service-oriented) architecture. Our approach relaxes the traditional definition of procedure and allows a procedure to return multiple times. Specifically, we implement an extended remote procedure calling framework Quatrain to support the new multireturn paradigm. In this paper, we establish the importance of multiple returns, introduce our very simple semantics, and present a new synchronization protocol that frees programmers from multireturn-related thread coordination. Several practical applications are constructed with Quatrain, and the evaluation shows an average of 56% reduction of response time, compared with the traditional calling paradigm, in realistic environments.

usenix annual technical conference | 2017