Guoliang Jin

University of Wisconsin-Madison

Publication

Featured research published by Guoliang Jin.

programming language design and implementation | 2012

Understanding and detecting real-world performance bugs

Guoliang Jin; Linhai Song; Xiaoming Shi; Joel Scherpelz; Shan Lu

Developers frequently use inefficient code sequences that could be fixed by simple patches. These inefficient code sequences can cause significant performance degradation and resource waste, referred to as performance bugs. Meager increases in single threaded performance in the multi-core era and increasing emphasis on energy efficiency call for more effort in tackling performance bugs. This paper conducts a comprehensive study of 110 real-world performance bugs that are randomly sampled from five representative software suites (Apache, Chrome, GCC, Mozilla, and MySQL). The findings of this study provide guidance for future work to avoid, expose, detect, and fix performance bugs. Guided by our characteristics study, efficiency rules are extracted from 25 patches and are used to detect performance bugs. 332 previously unknown performance problems are found in the latest versions of MySQL, Apache, and Mozilla applications, including 219 performance problems found by applying rules across applications.

programming language design and implementation | 2011

Automated atomicity-violation fixing

Guoliang Jin; Linhai Song; Wei Zhang; Shan Lu; Ben Liblit

Fixing software bugs has always been an important and time-consuming process in software development. Fixing concurrency bugs has become especially critical in the multicore era. However, fixing concurrency bugs is challenging, in part due to non-deterministic failures and tricky parallel reasoning. Beyond correctly fixing the original problem in the software, a good patch should also avoid introducing new bugs, degrading performance unnecessarily, or damaging software readability. Existing tools cannot automate the whole fixing process and provide good-quality patches. We present AFix, a tool that automates the whole process of fixing one common type of concurrency bug: single-variable atomicity violations. AFix starts from the bug reports of existing bug-detection tools. It augments these with static analysis to construct a suitable patch for each bug report. It further tries to combine the patches of multiple bugs for better performance and code readability. Finally, AFix's run-time component provides testing customized for each patch. Our evaluation shows that patches automatically generated by AFix correctly eliminate six out of eight real-world bugs and significantly decrease the failure probability in the other two cases. AFix patches never introduce new bugs and usually have similar performance to manually-designed patches.
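
To make the bug class concrete, here is a minimal Python sketch of a single-variable atomicity violation after patching. It is not AFix output: the `Session` class, the `conn` variable, and the `stress` helper are hypothetical, and the sketch assumes the patch takes the form of a mutual-exclusion region around the check-then-use pair and around the interfering write.

```python
import threading


class Session:
    """Hypothetical shared object; `conn` is the single shared variable."""

    def __init__(self):
        self.conn = {"sent": 0}
        self._lock = threading.Lock()  # lock introduced by the (assumed) patch

    def send(self):
        # The check and the use of `conn` must not be separated by a
        # write from another thread, so both sit in one critical section.
        with self._lock:
            if self.conn is not None:
                self.conn["sent"] += 1
                return True
            return False

    def close(self):
        # The interfering write takes the same lock.
        with self._lock:
            self.conn = None


def stress(workers=8, sends=1000):
    """Run senders against a concurrent close; return the exceptions seen."""
    session = Session()
    errors = []

    def sender():
        try:
            for _ in range(sends):
                session.send()
        except Exception as exc:  # a TypeError could surface without the lock
            errors.append(exc)

    threads = [threading.Thread(target=sender) for _ in range(workers)]
    closer = threading.Thread(target=session.close)
    for t in threads:
        t.start()
    closer.start()
    for t in threads + [closer]:
        t.join()
    return errors
```

Without the lock, `close` could run between the `is not None` check and the subscript in `send`, which is the interleaving such a patch is meant to rule out.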

architectural support for programming languages and operating systems | 2011

ConSeq: detecting concurrency bugs through sequential errors

Wei Zhang; Junghee Lim; Ramya Olichandran; Joel Scherpelz; Guoliang Jin; Shan Lu; Thomas W. Reps

Concurrency bugs are caused by non-deterministic interleavings between shared memory accesses. Their effects propagate through data and control dependences until they cause software to crash, hang, produce incorrect output, etc. The lifecycle of a bug thus consists of three phases: (1) triggering, (2) propagation, and (3) failure. Traditional techniques for detecting concurrency bugs mostly focus on phase (1), i.e., on finding certain structural patterns of interleavings that are common triggers of concurrency bugs, such as data races. This paper explores a consequence-oriented approach to improving the accuracy and coverage of state-space search and bug detection. The proposed approach first statically identifies potential failure sites in a program binary (i.e., it first considers a phase (3) issue). It then uses static slicing to identify critical read instructions that are highly likely to affect potential failure sites through control and data dependences (phase (2)). Finally, it monitors a single (correct) execution of a concurrent program and identifies suspicious interleavings that could cause an incorrect state to arise at a critical read and then lead to a software failure (phase (1)). ConSeq's backwards approach, (3)→(2)→(1), provides advantages in bug-detection coverage and accuracy but is challenging to carry out. ConSeq makes it feasible by exploiting the empirical observation that phases (2) and (3) usually are short and occur within one thread. Our evaluation on large, real-world C/C++ applications shows that ConSeq detects more bugs than traditional approaches and has a much lower false-positive rate.

architectural support for programming languages and operating systems | 2013

Production-run software failure diagnosis via hardware performance counters

Joy Arulraj; Po-Chun Chang; Guoliang Jin; Shan Lu

Sequential and concurrency bugs are widespread in deployed software. They cause severe failures and huge financial loss during production runs. Tools that diagnose production-run failures with low overhead are needed. The state-of-the-art diagnosis techniques use software instrumentation to sample program properties at run time and use off-line statistical analysis to identify properties most correlated with failures. Although promising, these techniques suffer from high run-time overhead, which is sometimes over 100%, for concurrency-bug failure diagnosis and hence are not suitable for production-run usage. We present PBI, a system that uses existing hardware performance counters to diagnose production-run failures caused by sequential and concurrency bugs with low overhead. PBI is designed based on several key observations. First, a few widely supported performance counter events can reflect a wide variety of common software bugs and can be monitored by hardware with almost no overhead. Second, the counter overflow interrupt supported by existing hardware and operating systems provides a natural and effective mechanism to conduct event sampling at user level. Third, the noise and non-determinism in interrupt delivery complement statistical processing well. We evaluate PBI using 13 real-world concurrency and sequential bugs from representative open-source server, client, and utility programs, and 10 bugs from a widely used software-testing benchmark. Quantitatively, PBI can effectively diagnose failures caused by these bugs with a small overhead that is never higher than 10%. Qualitatively, PBI does not require any change to software and presents a novel use of existing hardware performance counters.
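
The off-line statistical step mentioned above can be illustrated with a small Python sketch. It is not PBI's actual model: the scoring formula (failure rate among runs where an event was sampled, minus the overall failure rate), the `rank_events` function, and the event names are all assumptions chosen for illustration.

```python
from collections import Counter


def rank_events(runs):
    """Rank sampled events by how strongly they correlate with failure.

    runs: list of (set_of_observed_events, failed) pairs, one per run.
    Score (an assumed, simplified metric):
        P(fail | event observed) - P(fail)
    """
    if not runs:
        raise ValueError("need at least one run")

    base = sum(1 for _, failed in runs if failed) / len(runs)
    seen = Counter()         # runs in which the event was observed
    seen_failed = Counter()  # failing runs in which it was observed

    for events, failed in runs:
        for event in events:
            seen[event] += 1
            if failed:
                seen_failed[event] += 1

    scores = {e: seen_failed[e] / seen[e] - base for e in seen}
    # Highest score first; ties broken by event name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, with hypothetical events `"A"`, `"B"`, and `"C"`, an event that appears only in failing runs is ranked first, while one that appears only in successful runs is ranked last.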

Science in China Series F: Information Sciences | 2015

Fixing, preventing, and recovering from concurrency bugs

Dongdong Deng; Guoliang Jin; Marc de Kruijf; Ang Li; Ben Liblit; Shan Lu; ShanXiang Qi; Jinglei Ren; Karthikeyan Sankaralingam; Linhai Song; Yongwei Wu; Mingxing Zhang; Wei Zhang; Weimin Zheng

Concurrency bugs are becoming widespread with the emerging ubiquity of multicore processors and multithreaded software. They manifest during production runs and lead to severe losses. Many effective concurrency-bug detection tools have been built. However, the dependability of multi-threaded software does not improve until these bugs are handled statically or dynamically. This article discusses our recent progress on fixing, preventing, and recovering from concurrency bugs.

foundations of software engineering | 2015

What change history tells us about thread synchronization

Rui Gu; Guoliang Jin; Linhai Song; Linjie Zhu; Shan Lu

Multi-threaded programs are pervasive, yet difficult to write. Missing proper synchronization leads to correctness bugs and over-synchronization leads to performance problems. To improve the correctness and efficiency of multi-threaded software, we need a better understanding of synchronization challenges faced by real-world developers. This paper studies the code repositories of open-source multi-threaded software projects to obtain a broad and in-depth view of how developers handle synchronizations. We first examine how critical sections are changed when software evolves by checking over 250,000 revisions of four representative open-source software projects. The findings help us answer questions like how often synchronization is an afterthought for developers; whether it is difficult for developers to decide critical section boundaries and lock variables; and what are real-world over-synchronization problems. We then conduct case studies to better understand (1) how critical sections are changed to solve performance problems (i.e. over-synchronization issues) and (2) how software changes lead to synchronization-related correctness problems (i.e. concurrency bugs). This in-depth study shows that tool support is needed to help developers tackle over-synchronization problems; it also shows that concurrency bug avoidance, detection, and testing can be improved through better awareness of code revision history.

architectural support for programming languages and operating systems | 2014

Leveraging the short-term memory of hardware to diagnose production-run software failures

Joy Arulraj; Guoliang Jin; Shan Lu

Failures caused by software bugs are widespread in production runs, causing severe losses for end users. Unfortunately, diagnosing production-run failures is challenging. Existing work cannot satisfy privacy, run-time overhead, diagnosis capability, and diagnosis latency requirements all at once. This paper designs a low overhead, low latency, privacy preserving production-run failure diagnosis system based on two observations. First, short-term memory of program execution is often sufficient for failure diagnosis, as many bugs have short propagation distances. Second, maintaining a short-term memory of execution is much cheaper than maintaining a record of the whole execution. Following these observations, we first identify an existing hardware unit, Last Branch Record (LBR), that records the last few taken branches to help diagnose sequential bugs. We then propose a simple hardware extension, Last Cache-coherence Record (LCR), to record the last few cache accesses with specified coherence states and hence help diagnose concurrency bugs. Finally, we design LBRA and LCRA to automatically locate failure root causes using LBR and LCR. Our evaluation uses 31 real-world sequential and concurrency bug failures from 18 representative open-source software. The results show that with just 16 record entries, LBR and LCR enable our system to automatically locate the root causes for 27 out of 31 failures, with less than 3% run-time overhead. As our system does not rely on sampling,

architectural support for programming languages and operating systems | 2016

CloudSeer: Workflow Monitoring of Cloud Infrastructures via Interleaved Logs

Xiao Yu; Pallavi Joshi; Jianwu Xu; Guoliang Jin; Hui Zhang; Guofei Jiang

Cloud infrastructures provide a rich set of management tasks that operate computing, storage, and networking resources in the cloud. Monitoring the executions of these tasks is crucial for cloud providers to promptly find and understand problems that compromise cloud availability. However, such monitoring is challenging because there are multiple distributed service components involved in the executions. CloudSeer enables effective workflow monitoring. It takes a lightweight non-intrusive approach that purely works on interleaved logs widely existing in cloud infrastructures. CloudSeer first builds an automaton for the workflow of each management task based on normal executions, and then it checks log messages against a set of automata for workflow divergences in a streaming manner. Divergences found during the checking process indicate potential execution problems, which may or may not be accompanied by error log messages. For each potential problem, CloudSeer outputs necessary context information including the affected task automaton and related log messages hinting where the problem occurs to help further diagnosis. Our experiments on OpenStack, a popular open-source cloud infrastructure, show that CloudSeer's efficiency and problem-detection capability are suitable for online monitoring.
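
The idea of checking an interleaved log stream against a task automaton can be sketched in Python as follows. This is a heavily simplified illustration, not CloudSeer's implementation: it assumes a strictly linear workflow, assumes every log message carries a request identifier, and uses a made-up `BOOT_WORKFLOW` and `WorkflowChecker` class.

```python
from dataclasses import dataclass, field

# Hypothetical workflow for a "boot VM" task, as if learned from normal runs.
BOOT_WORKFLOW = [
    "api: accept request",
    "scheduler: pick host",
    "compute: spawn instance",
    "api: report active",
]


@dataclass
class WorkflowChecker:
    workflow: list
    state: dict = field(default_factory=dict)        # request id -> next index
    divergences: list = field(default_factory=list)  # (id, expected, got)

    def feed(self, request_id, message):
        """Consume one log message; record a divergence if it is unexpected."""
        pos = self.state.setdefault(request_id, 0)
        if pos < len(self.workflow) and message == self.workflow[pos]:
            self.state[request_id] = pos + 1
        else:
            expected = self.workflow[pos] if pos < len(self.workflow) else None
            self.divergences.append((request_id, expected, message))

    def unfinished(self):
        """Request ids whose automaton has not reached its final state."""
        return sorted(
            rid for rid, pos in self.state.items() if pos < len(self.workflow)
        )
```

Feeding messages one at a time mirrors the streaming style described in the abstract: each request advances its own automaton instance even though messages from different requests are interleaved, and a skipped step shows up as a divergence without any error message in the log.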

international conference on autonomic computing | 2015

Automatic Server Hang Bug Diagnosis: Feasible Reality or Pipe Dream?

Daniel Joseph Dean; Peipei Wang; Xiaohui Gu; William Enck; Guoliang Jin

It is notoriously difficult to diagnose server hang bugs as they often generate little diagnostic information and are difficult to reproduce offline. In this paper, we present a characteristic study of 177 real software hang bugs from 8 common open source server systems (i.e., Apache, Lighttpd, MySQL, Squid, HDFS, Hadoop MapReduce, Tomcat, Cassandra). We identify three major root cause categories (i.e., programmer errors, mishandled values, concurrency issues). We then describe two major problems (i.e., false positives and false negatives) while applying existing rule-based bug detection techniques to those bugs.

computer aided verification | 2013

Validating library usage interactively

William R. Harris; Guoliang Jin; Shan Lu; Somesh Jha

Programmers who develop large, mature applications often want to optimize the performance of their program without changing its semantics. They often do so by changing how their program invokes a library function or a function implemented in another module of the program. Unfortunately, once a programmer makes such an optimization, it is difficult for him to validate that the optimization does not change the semantics of the original program, because the original and optimized programs are equivalent only due to subtle, implicit assumptions about library functions called by the programs. In this work, we present an interactive program analysis that a programmer can apply to validate that his optimization does not change his program's semantics. Our analysis casts the problem of validating an optimization as an abductive inference problem in the context of checking program equivalence. Our analysis solves the abductive equivalence problem by interacting with the programmer so that the programmer implements a solver for a logical theory that models library functions invoked by the program. We have used our analysis to validate optimizations of real-world, mature applications: the Apache software suite, the Mozilla Suite, and the MySQL database.
