Heming Cui
Columbia University
Publication
Featured research published by Heming Cui.
Programming Language Design and Implementation | 2012
Jingyue Wu; Yang Tang; Gang Hu; Heming Cui; Junfeng Yang
Parallel programs are known to be difficult to analyze. A key reason is that they typically have an enormous number of execution interleavings, or schedules. Static analysis over all schedules requires over-approximations, resulting in poor precision; dynamic analysis rarely covers more than a tiny fraction of all schedules. We propose an approach called schedule specialization to analyze a parallel program over only a small set of schedules for precision, and then enforce these schedules at runtime for soundness of the static analysis results. We build a schedule specialization framework for C/C++ multithreaded programs that use Pthreads. Our framework avoids the need to modify every analysis to be schedule-aware by specializing a program into a simpler program based on a schedule, so that the resultant program can be analyzed with stock analyses for improved precision. Moreover, our framework provides a precise schedule-aware def-use analysis on memory locations, enabling us to build three highly precise analyses: an alias analyzer, a data-race detector, and a path slicer. Evaluation on 17 programs, including 2 real-world programs and 15 popular benchmarks, shows that analyses using our framework reduced may-aliases by 61.9%, false race reports by 69%, and path slices by 48.7%; and detected 7 unknown bugs in well-checked programs.
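The core idea of schedule specialization — fixing one total order of critical events so that an analysis sees exactly one interleaving — can be illustrated with a toy sketch. All names here are hypothetical stand-ins, not the framework's actual API; the framework itself works on Pthreads programs, not Python.

```python
import threading

# Enforce a fixed schedule (a total order of "critical events") so that
# every run produces the same interleaving. With the interleaving fixed,
# a stock analysis of the resulting trace is precise for that schedule.
class ScheduleEnforcer:
    def __init__(self, schedule):
        self.schedule = schedule          # e.g. ["t1", "t2", "t1"]
        self.pos = 0
        self.cv = threading.Condition()

    def run_event(self, tid, action):
        with self.cv:
            while self.schedule[self.pos] != tid:
                self.cv.wait()            # not this thread's turn yet
            action()                      # the scheduled critical event
            self.pos += 1
            self.cv.notify_all()

def run_with_schedule(schedule):
    trace = []
    enforcer = ScheduleEnforcer(schedule)

    def worker(tid):
        for _ in range(schedule.count(tid)):
            enforcer.run_event(tid, lambda: trace.append(tid))

    threads = [threading.Thread(target=worker, args=(tid,))
               for tid in set(schedule)]
    for t in threads: t.start()
    for t in threads: t.join()
    return trace

# Every run yields the interleaving the schedule dictates.
print(run_with_schedule(["t1", "t2", "t1"]))  # ['t1', 't2', 't1']
```

Because the trace is identical on every run, an analysis of it need not over-approximate across other interleavings — which is the source of the precision gains the abstract reports.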
Symposium on Operating Systems Principles | 2015
Heming Cui; Rui Gu; Cheng Liu; Tianyu Chen; Junfeng Yang
State machine replication (SMR) leverages distributed consensus protocols such as Paxos to keep multiple replicas of a program consistent in the face of replica failures or network partitions. This fault tolerance is enticing for implementing a principled SMR system that replicates general programs, especially server programs that demand high availability. Unfortunately, SMR assumes deterministic execution, but most server programs are multithreaded and thus nondeterministic. Moreover, existing SMR systems provide narrow state machine interfaces to suit specific programs, and it can be quite strenuous and error-prone to orchestrate a general program into these interfaces. This paper presents Crane, an SMR system that transparently replicates general server programs. Crane achieves distributed consensus on the socket API, a common interface to almost all server programs. It leverages deterministic multithreading (specifically, our prior system Parrot) to make multithreaded replicas deterministic. It uses a new technique we call time bubbling to efficiently tackle a difficult challenge of nondeterministic network input timing. Evaluation on five widely used server programs (e.g., Apache, ClamAV, and MySQL) shows that Crane is easy to use, has moderate overhead, and is robust. Crane's source code is at github.com/columbia/crane.
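The intuition behind time bubbling can be sketched as follows: replicas must agree not only on input contents but on input *timing*, so idle gaps are filled with explicit no-op "bubble" entries in the agreed log. This is an illustrative model only — the log format and names are made up, not Crane's actual protocol.

```python
# When the leader observes an idle gap longer than `window`, it inserts
# an explicit no-op bubble into the totally ordered log, so every
# replica advances through the same gaps at the same logical times.
def build_ordered_log(events, window):
    """events: list of (arrival_time, request); returns the replicated log."""
    log = []
    clock = 0
    for t, req in sorted(events):
        # Fill each idle window with a bubble all replicas will replay.
        while t - clock > window:
            clock += window
            log.append(("BUBBLE", clock))
        clock = t
        log.append(("REQ", req))
    return log

# A 6-unit gap between requests becomes two agreed-upon bubbles.
log = build_ordered_log([(1, "get x"), (7, "put y")], window=2)
print(log)
```

Because the bubbles are part of the consensus log, a replica that replays the log reproduces the primary's timing behavior without observing the network itself.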
Communications of the ACM | 2014
Junfeng Yang; Heming Cui; Jingyue Wu; Yang Tang; Gang Hu
Stable multithreading dramatically simplifies the interleaving behaviors of parallel programs, offering new hope for making parallel programming easier.
Symposium on Cloud Computing | 2017
Cheng Wang; Jianyu Jiang; Xusheng Chen; Ning Yi; Heming Cui
State machine replication (SMR) uses Paxos to enforce the same inputs for a program (e.g., Redis) replicated on a number of hosts, tolerating various types of failures. Unfortunately, traditional Paxos protocols incur prohibitive performance overhead on server programs due to their high consensus latency on TCP/IP. Worse, the consensus latency of extant Paxos protocols increases drastically when more concurrent client connections or hosts are added. This paper presents APUS, the first RDMA-based Paxos protocol that aims to be fast and scalable to client connections and hosts. APUS intercepts inbound socket calls of an unmodified server program, assigns a total order for all input requests, and uses fast RDMA primitives to replicate these requests concurrently. We evaluated APUS on nine widely-used server programs (e.g., Redis and MySQL). APUS incurred a mean overhead of 4.3% in response time and 4.2% in throughput. We integrated APUS with an SMR system Calvin. Our Calvin-APUS integration was 8.2X faster than the extant Calvin-ZooKeeper integration. The consensus latency of APUS outperformed an RDMA-based consensus protocol by 4.9X. APUS source code and raw results are released on github.com/hku-systems/apus.
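The ordering step APUS performs can be sketched in miniature: every inbound request receives a global sequence number before the server program sees it, so all replicas execute the same requests in the same order. The RDMA replication itself is modeled here as a plain list per replica; class and method names are illustrative, not APUS's interfaces.

```python
# A leader intercepts inbound requests, assigns a total order, and
# replicates each (seq, request) entry to every replica's log before
# the request is delivered to the server program.
class OrderingLeader:
    def __init__(self, replicas):
        self.seq = 0
        self.replicas = replicas     # each replica is a list acting as its log

    def intercept(self, request):
        self.seq += 1
        entry = (self.seq, request)
        for log in self.replicas:    # stand-in for concurrent RDMA writes
            log.append(entry)
        return entry

r1, r2 = [], []
leader = OrderingLeader([r1, r2])
for req in ["GET k1", "SET k2 v", "GET k2"]:
    leader.intercept(req)
assert r1 == r2                      # replicas agree on a total order
print(r1)
```

In the real system the per-replica "append" is a one-sided RDMA write, which is what removes the TCP/IP consensus latency the abstract highlights.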
Asia-Pacific Workshop on Systems | 2015
Heming Cui; Rui Gu; Cheng Liu; Junfeng Yang
Dynamic program analysis frameworks greatly improve software quality as they enable a wide range of powerful analysis tools (e.g., reliability, profiling, and logging) at runtime. However, because existing frameworks run only one actual execution for each software application, the execution is fully or partially coupled with an analysis tool in order to transfer execution states (e.g., accessed memory and thread interleavings) to the analysis tool, easily causing a prohibitive slowdown for the execution. To reduce the portion of execution state that requires transfer, many frameworks require significant re-engineering of both the analysis tools and the frameworks themselves. Thus, these frameworks trade away transparency to analysis tools and allow only one type of tool to run within one execution. This paper presents RepFrame, an efficient and transparent framework that fully decouples execution and analysis by constructing multiple equivalent executions. To do so, RepFrame leverages a recent fault-tolerant technique: transparent state machine replication, which runs the same software application on a set of machines (or replicas), and ensures that all replicas see the same sequence of inputs and process these inputs with the same efficient thread interleavings automatically. In addition, this paper discusses potential directions in which RepFrame can further strengthen existing analyses. Evaluation shows that RepFrame can easily run two asynchronous analysis tools together and has reasonable overhead.
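The decoupling idea can be shown with a toy sketch: the same agreed input sequence drives several equivalent executions, so a heavyweight analysis can observe one replica without slowing the primary. Determinism here is trivial (single-threaded); function names are illustrative, not RepFrame's API.

```python
# The same deterministic input order drives every replica; an analysis
# hook attached to one replica observes the execution without perturbing
# the input order the replicas agreed on.
def run_replica(inputs, hooks=()):
    state = {}
    for key, value in inputs:        # same agreed input order on every replica
        state[key] = value
        for hook in hooks:
            hook(key, value)         # analysis observes; it never reorders
    return state

accesses = []
inputs = [("x", 1), ("y", 2), ("x", 3)]
primary = run_replica(inputs)                          # no analysis attached
analyzed = run_replica(inputs, hooks=[lambda k, v: accesses.append(k)])
assert primary == analyzed           # equivalent executions
print(accesses)
```

In RepFrame the equivalence additionally covers thread interleavings, which is what deterministic multithreaded replication provides.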
International Conference on Enterprise Systems | 2017
Yuexuan Wang; Zhaoquan Gu; Lei Song; Tongyang Li; Heming Cui; Francis C. M. Lau
The emergence of low-cost 3D printing has catalyzed many new applications in academic, industrial, military, and medical fields. 3D printing has been dubbed a very slow process because its most common underlying printing method, Fused Deposition Modeling (FDM), is time-consuming due to the limited speed of the extruder. Multiple extruders, working simultaneously, can accelerate the printing process, but there are no existing algorithms with guaranteed efficiency for solving the key challenge: how to schedule the extruders to avoid collisions during the printing operations. In this paper, we propose efficient algorithms to meet this challenge. First, we propose slicing algorithms for two extruders, which divide each slicing piece into conflicting areas and non-conflicting areas. The extruders can print the non-conflicting areas simultaneously, while a conflicting area has to be printed separately by a single extruder. Then we extend the method to arbitrary n extruders, for which we present a more practical sector-based slicing algorithm. Our algorithms can achieve approximately n times the efficiency of single-extruder printers. We conduct simulations to verify the algorithms, and the results show that our algorithms can significantly reduce the printing time.
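The two-extruder partitioning idea can be sketched in one dimension: split a layer's segments into a left zone, a right zone, and a conflict band around the midline. The two zones print in parallel; the conflict band is printed alone afterwards. The geometry and names below are simplified stand-ins for the paper's slicing algorithms.

```python
# Partition a layer's segments (1-D intervals on the x-axis) by whether
# each extruder can reach them without colliding with the other.
def partition_layer(segments, midline, clearance):
    """segments: list of (x_start, x_end); returns (left, right, conflict)."""
    left, right, conflict = [], [], []
    for s in segments:
        if s[1] <= midline - clearance:
            left.append(s)           # safely printable by extruder A alone
        elif s[0] >= midline + clearance:
            right.append(s)          # safely printable by extruder B alone
        else:
            conflict.append(s)       # near the midline: print separately
    return left, right, conflict

def layer_time(left, right, conflict, speed=1.0):
    length = lambda segs: sum(e - s for s, e in segs)
    # Parallel zones overlap in time; the conflict band is serialized.
    return max(length(left), length(right)) / speed + length(conflict) / speed

l, r, c = partition_layer([(0, 4), (6, 9), (4.5, 5.5)], midline=5, clearance=1)
print(layer_time(l, r, c))  # 5.0, vs. a single-extruder time of 8.0
```

The speedup approaches 2x as the conflict band shrinks relative to the zones, matching the abstract's claim of roughly n-fold gains for n extruders.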
Annual Computer Security Applications Conference | 2017
Jianyu Jiang; Shixiong Zhao; Danish Alsayed; Yuexuan Wang; Heming Cui; Feng Liang; Zhaoquan Gu
Big-data frameworks (e.g., Spark) enable computations on tremendous data records generated by third parties, causing various security and reliability problems such as information leakage and programming bugs. Existing systems for big-data security (e.g., Titian) track data transformations at the record level, so they are imprecise and too coarse-grained for these problems. For instance, when we ran Titian to drill down to the input records that produced a buggy output record, Titian reported 3 to 9 orders of magnitude more input records than the actual ones. Information Flow Tracking (IFT) is a conventional approach for precise information control. However, existing IFT systems are neither efficient nor complete for big-data frameworks, because these frameworks are data-intensive, and data flowing across hosts is often ignored by IFT. This paper presents Kakute, the first precise, fine-grained information flow analysis system for big-data frameworks. Our insight for making IFT efficient is that most fields in a data record often have the same IFT tags, and we present two new efficient techniques called Reference Propagation and Tag Sharing. In addition, we design an efficient, complete cross-host information flow propagation approach. Evaluation on seven diverse big-data programs (e.g., WordCount) shows that Kakute had merely 32.3% overhead on average even when fine-grained information control was enabled. Compared with Titian, Kakute precisely drilled down to the actual bug-inducing input records, a huge reduction of 3 to 9 orders of magnitude. Kakute's performance overhead is comparable with Titian's. Furthermore, Kakute effectively detected 13 real-world security and reliability bugs across 4 diverse problem categories, including information leakage, data provenance, programming bugs, and performance bugs. Kakute's source code and results are available at https://github.com/hku-systems/kakute.
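The tag-sharing observation can be sketched as follows: since fields of a record usually carry the same provenance, all fields can reference a single shared tag object instead of holding one tag per field. Class and function names below are illustrative, not Kakute's API.

```python
# One immutable tag object referenced by every field of a record:
# O(1) extra space per record instead of O(#fields).
class SharedTag:
    def __init__(self, sources):
        self.sources = frozenset(sources)   # e.g. originating input record ids

def tag_record(fields, tag):
    # Every field references the same tag object.
    return [(value, tag) for value in fields]

def propagate(tagged_a, tagged_b):
    # A derived record's tag is the union of its parents' sources,
    # again shared by all of the derived record's fields.
    merged = SharedTag(tagged_a[0][1].sources | tagged_b[0][1].sources)
    return tag_record([v for v, _ in tagged_a + tagged_b], merged)

a = tag_record(["alice", 42], SharedTag({"rec1"}))
b = tag_record(["bob", 7], SharedTag({"rec2"}))
joined = propagate(a, b)
print(sorted(joined[0][1].sources))  # ['rec1', 'rec2']
```

Drilling down to bug-inducing inputs then reduces to reading the output record's source set, which is far smaller than the record-level lineage a system like Titian reports.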
Asia-Pacific Workshop on Systems | 2016
Cheng Wang; Jingyu Yang; Ning Yi; Heming Cui
Driven by increasing computational demands, cluster management systems (e.g., Mesos) are already pervasive for deploying many applications. Unfortunately, despite much effort, existing systems still struggle to meet the stringent requirements of critical applications (e.g., trading and military applications), because these applications naturally require high availability and low performance overhead in deployment. Existing systems typically replicate their job controllers so that these controllers are highly available and can handle application failures. However, the applications themselves often remain a single point of failure, leaving arbitrary windows of unavailability. This paper proposes the design of Tripod, a cluster management system that automatically provides high availability to general applications. Tripod's key to making applications highly available efficiently is a new Paxos replication protocol that leverages RDMA (Remote Direct Memory Access). Tripod runs replicas of the same job alongside replicas of its controllers, and the controllers agree on job requests efficiently with this protocol. Evaluation shows that Tripod has low performance overhead in both throughput and response time compared to an application's unreplicated execution.
Operating Systems Design and Implementation | 2010
Heming Cui; Jingyue Wu; Chia-Che Tsai; Junfeng Yang
Symposium on Operating Systems Principles | 2011
Heming Cui; Jingyue Wu; John Gallagher; Huayang Guo; Junfeng Yang