Chushu Gao | Researchain

Publication

Featured researches published by Chushu Gao.

International Conference on Software Engineering | 2016

VEnron: a versioned spreadsheet corpus and related evolution analysis

Wensheng Dou; Liang Xu; Shing Chi Cheung; Chushu Gao; Jun Wei; Tao Huang

Like most conventional software, spreadsheets are subject to software evolution. However, spreadsheet evolution is rarely assisted by version management tools. As a result, the version information across evolved spreadsheets is often missing or highly fragmented. This makes it difficult for users to notice the evolution issues arising from their spreadsheets. In this paper, we propose a semi-automated approach that leverages spreadsheets’ contexts (e.g., attached emails) and contents to identify evolved spreadsheets and recover the embedded version information. We apply it to the released email archive of the Enron Corporation and build an industrial-scale, versioned spreadsheet corpus, VEnron. Our approach first clusters spreadsheets that likely evolved from one another into evolution groups based on various fragmented information, such as spreadsheet filenames, spreadsheet contents, and spreadsheet-attached emails. Then, it recovers the version information of the spreadsheets in each evolution group. VEnron enables us to identify interesting issues that can arise from spreadsheet evolution. For example, versioned spreadsheets are common in the Enron email archive; changes in formulas are common; and some groups (16.9%) introduce new errors during evolution. To our knowledge, VEnron is the first spreadsheet corpus with version information. It provides a valuable resource for understanding issues arising from spreadsheet evolution.
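As a rough illustration of the filename-based part of the clustering step, the sketch below groups spreadsheets whose names differ only in version-like tokens. It is a minimal sketch, not the approach used to build VEnron: the token pattern and the function names `normalize_name` and `group_by_name` are assumptions made for this example, and the abstract's other signals (spreadsheet contents, attached emails) are not modeled.

```python
import re
from collections import defaultdict

# Assumed set of version-like tokens: bare numbers (dates, counters),
# "v2"-style markers, and a few common suffix words.
_VERSION_TOKEN = re.compile(r"\d+|v\d+|rev\d*|final|draft|copy")


def normalize_name(filename):
    """Drop the extension and any version-like tokens from a filename."""
    stem = filename.rsplit(".", 1)[0].lower()
    tokens = re.split(r"[\s_\-.()]+", stem)
    kept = [t for t in tokens if t and not _VERSION_TOKEN.fullmatch(t)]
    return " ".join(kept)


def group_by_name(filenames):
    """Group filenames sharing a normalized name; singletons are dropped."""
    groups = defaultdict(list)
    for name in filenames:
        groups[normalize_name(name)].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}


files = [
    "Budget_v1.xls",
    "Budget_v2.xls",
    "gas_forecast_final.xls",
    "gas forecast (2).xls",
    "notes.xls",
]
candidate_groups = group_by_name(files)
```

Groups found this way are only candidates; a filename match alone does not show that one spreadsheet evolved from another.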

International Symposium on Software Reliability Engineering | 2015

Fast reproducing web application errors

Jie Wang; Wensheng Dou; Chushu Gao; Jun Wei

JavaScript has become the most popular language for client-side web applications. Due to JavaScript's highly dynamic features and event-driven design, it is not easy to debug web application errors. Record-replay techniques are widely used to reproduce errors in web applications. However, the key events related to an error are hidden in the massive event trace collected during a long run. As a result, error diagnosis with the long event trace is exhausting and time-consuming. We present a tool, JSTrace, that can effectively cut down the time needed to reproduce web application errors and facilitate the diagnosis. Based on the dynamic dependencies of JavaScript and DOM instructions, we develop a novel dynamic slicing technique that can remove events irrelevant to reproducing the error. In this process, many events and related instructions are removed without losing reproduction accuracy. Our evaluation shows that the reduced event trace can faithfully reproduce errors with an average reduction rate of 96%.
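The idea of slicing an event trace by data dependencies can be sketched in a few lines. This is a simplified illustration, not the JSTrace algorithm: it assumes each recorded event is summarized by the set of locations (variables or DOM nodes) it reads and writes, and the `Event` type and `slice_trace` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    name: str
    reads: frozenset = frozenset()
    writes: frozenset = frozenset()


def slice_trace(trace, error_index):
    """Backward slice: keep only events the failing event depends on."""
    needed = set(trace[error_index].reads)
    kept = [error_index]
    # Walk backwards from the failing event, following read/write links.
    for i in range(error_index - 1, -1, -1):
        event = trace[i]
        hit = needed & event.writes
        if hit:
            kept.append(i)
            # Simplification: a write is assumed to fully define the location.
            needed -= hit
            needed |= event.reads
    kept.reverse()
    return [trace[i] for i in kept]


trace = [
    Event("load", writes=frozenset({"cart"})),
    Event("click_banner", writes=frozenset({"banner"})),
    Event("add_item", reads=frozenset({"cart"}), writes=frozenset({"cart", "total"})),
    Event("hover_menu", reads=frozenset({"menu"}), writes=frozenset({"menu"})),
    Event("checkout", reads=frozenset({"total"})),  # the failing event
]
reduced = slice_trace(trace, error_index=4)
```

In this toy trace the banner click and menu hover touch nothing that the failing `checkout` event depends on, so they are dropped, while `load` and `add_item` are kept.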

Foundations of Software Engineering | 2016

Detecting table clones and smells in spreadsheets

Wensheng Dou; Shing Chi Cheung; Chushu Gao; Chang Xu; Liang Xu; Jun Wei

Spreadsheets are widely used by end users for various business tasks, such as data analysis and financial reporting. End users may perform similar tasks by cloning a block of cells (table) in their spreadsheets. The corresponding cells in these cloned tables are supposed to keep the same or similar computational semantics. However, when spreadsheets evolve, these cloned tables can become inconsistent due to ad-hoc modifications, and as a result suffer from smells. In this paper, we propose TableCheck to detect table clones and related smells due to inconsistency among them. We observe that two tables with the same header information at their corresponding cells are likely to be table clones. Inspired by existing fingerprint-based code clone detection techniques, we developed a detection algorithm to detect this kind of table clone. We further detected outliers among corresponding cells as smells in the detected table clones. We implemented our idea in TableCheck, and applied it to real-world spreadsheets from the EUSES corpus. Experimental results show that table clones commonly exist (21.8%), and 25.6% of the spreadsheets with table clones suffer from smells due to inconsistency among these clones. TableCheck detected table clones and their smells with a precision of 92.2% and 85.5%, respectively, while existing techniques detected no more than 35.6% of the true smells that TableCheck could detect.
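The two observations in the abstract, matching headers suggest clones and outliers among corresponding cells suggest smells, can be sketched as follows. This is a minimal sketch under stated assumptions, not TableCheck's algorithm: tables are assumed to be already extracted as a header list plus a list of cell formulas in relative (R1C1-style) notation, and `header_fingerprint`, `find_clone_groups`, and `find_smells` are hypothetical names.

```python
import hashlib
from collections import Counter, defaultdict


def header_fingerprint(headers):
    """Hash the normalized header row so equal headers collide."""
    normalized = "|".join(h.strip().lower() for h in headers)
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def find_clone_groups(tables):
    """Group table names whose header fingerprints are identical."""
    groups = defaultdict(list)
    for name, table in tables.items():
        groups[header_fingerprint(table["headers"])].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


def find_smells(tables, group):
    """Flag cells that deviate from a strict majority of their clones."""
    smells = []
    width = min(len(tables[name]["cells"]) for name in group)
    for pos in range(width):
        counts = Counter(tables[name]["cells"][pos] for name in group)
        majority, count = counts.most_common(1)[0]
        if count == len(group) or count <= len(group) / 2:
            continue  # all cells agree, or there is no clear majority
        for name in group:
            cell = tables[name]["cells"][pos]
            if cell != majority:
                smells.append((name, pos, cell, majority))
    return smells


tables = {
    "Q1": {"headers": ["Item", "Price", "Qty", "Total"],
           "cells": ["=RC[-2]*RC[-1]", "=SUM(R[-3]C:R[-1]C)"]},
    "Q2": {"headers": ["item ", "price", "qty", "total"],
           "cells": ["=RC[-2]*RC[-1]", "=SUM(R[-3]C:R[-1]C)"]},
    "Q3": {"headers": ["Item", "Price", "Qty", "Total"],
           "cells": ["42", "=SUM(R[-3]C:R[-1]C)"]},
    "Staff": {"headers": ["Name", "Role"], "cells": ["=RC[-1]"]},
}
clone_groups = find_clone_groups(tables)
smells = find_smells(tables, clone_groups[0])
```

Here the hard-coded `42` in `Q3` is reported because the two other clones hold a formula at the same position.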

International Symposium on Software Reliability Engineering | 2015

Experience report: A characteristic study on out of memory errors in distributed data-parallel applications

Lijie Xu; Wensheng Dou; Feng Zhu; Chushu Gao; Jie Liu; Hua Zhong; Jun Wei

Out of memory (OOM) errors occur frequently in data-intensive applications that run atop distributed data-parallel frameworks, such as MapReduce and Spark. In these applications, the memory space is shared by the framework and user code. Since the framework hides the details of distributed execution, it is challenging for users to pinpoint the root causes and fix these OOM errors. This paper presents a comprehensive characteristic study on 123 real-world OOM errors in Hadoop and Spark applications. Our major findings include: (1) 12% of the errors are caused by large data buffered/cached in the framework, which indicates that it is hard for users to configure the right memory quota to balance the memory usage of the framework and user code. (2) 37% of the errors are caused by unexpectedly large runtime data, such as a large data partition, a hotspot key, or a large key/value record. (3) Most errors (64%) are caused by memory-consuming user code, which carelessly processes unexpectedly large data or generates large in-memory computing results. Among them, 13% of the errors are also caused by unexpectedly large runtime data. (4) There are three common fix patterns (used in 34% of the errors), namely changing the memory/dataflow-related configurations, dividing runtime data, and optimizing user code logic. Our findings inspire us to propose potential solutions to avoid the OOM errors: (1) providing dynamic memory management mechanisms to balance the memory usage of the framework and user code at runtime; (2) providing users with memory+disk data structures, since accumulating large computing results in in-memory data structures is a common cause (15% of the errors).

International Conference on Web Services | 2014

Inferring Data Contract for Web-Based API

Chushu Gao; Jun Wei; Hua Zhong; Tao Huang

Web-based APIs are a new trend for publishing services. To correctly use an API, developers should follow certain service specifications. A data contract is a service specification that expresses the constraints over the data model used in the APIs. Data contracts, however, are not always readily available in a formalized format, and are sometimes not documented at all. In this paper, we present an approach to infer formal data contracts for Web-based APIs. The approach integrates information from the parameters, error messages and testing results of a Web-based API. We demonstrate how this approach infers complicated data preconditions for Web-based APIs on real-world Web API platforms.

Journal of Systems and Software | 2017

JSTrace: Fast Reproducing Web Application Errors

Jie Wang; Wensheng Dou; Chushu Gao; Jun Wei

JavaScript has become the most popular language for client-side web applications. Due to JavaScript's highly dynamic and event-driven features, it is challenging to diagnose web application errors. Record-replay techniques are used to reproduce errors in web applications. After a long run, these techniques will record a long event trace that triggers an error. Although the error-related events are few, they are interleaved with a massive number of error-irrelevant events. It is time-consuming to diagnose errors with long event traces. In this article, we present JSTrace, which effectively removes error-irrelevant events from the long event trace, and further facilitates error diagnosis. Based on fine-grained dependences of JavaScript and DOM instructions, we develop a novel dynamic slicing technique that can remove events irrelevant to the error. We further present rules to remove irrelevant events that cannot be removed by dynamic slicing. In this process, many events and related instructions are removed without losing error reproduction accuracy. Our evaluation on 13 real-world web application errors shows that the reduced event traces can faithfully reproduce errors with an average reduction rate of 97%. We further performed case studies on 4 real-world errors, and the results show that JSTrace is useful for diagnosing web application errors.

IEEE International Conference on Services Computing | 2016

MORE: A Model-Driven Operation Service for Cloud-Based IT Systems

Wei Chen; Chaochao Liang; Yijun Wan; Chushu Gao; Guoquan Wu; Jun Wei; Tao Huang

The operation of a cloud-based IT system (system for short) is time-consuming and error-prone due to the system scale, heterogeneity and configuration dependency. Although administrators can manage their systems with various configuration management tools, a great deal of knowledge spanning various domains is necessary. To alleviate this situation, we present a model-driven service, MORE (Model-driven Operation seRvicE), to automate the initial deployment and the dynamic configuration of a system. Firstly, a model is proposed to specify the high-level view of a system in the form of a desired deployment topology. Then the topology model is transformed into executable code automatically, bridging the gap between high-level abstractions and low-level details. With this executable code as input, a runtime framework is designed based on a transaction-based self-configuration protocol to achieve automation and configuration consistency. Finally, we evaluate the service abilities (including modeling system topology, automating system provisioning, and performing runtime reconfiguration) with a case study.

Computer Software and Applications Conference | 2017

A Hierarchical Categorization Approach for Configuration Management Modules

Wei Chen; Peixing Xu; Wensheng Dou; Guoquan Wu; Chushu Gao; Jun Wei

Configuration management tools, CMTs for short, are a set of indispensable software for DevOps (Development and Operations). CMTs automate system deployment and configuration through CMT modules, which are reusable, shareable units of configuration code. Thousands of CMT modules have therefore been developed for various systems, and their number is still growing fast. Although CMT repositories usually provide keyword- and tag-based search, a large number of search results can prevent users from finding the desired CMT modules. CMT modules could be managed in a hierarchical categorization, which can limit the search scope to specified categories, and thus help to improve search performance. Unfortunately, no CMT repository provides a hierarchical categorization. In this paper, we propose a hierarchical categorization approach for CMT modules. Our approach first extracts frequently used module tags as categories, and constructs the category hierarchy by mining the hierarchical relations among tags. We leverage online module profiles (names, descriptions and tags) as source information for categorization. Our approach then trains a set of classifiers by taking the TF-IDF (term frequency-inverse document frequency) of module profiles as features. Finally, our evaluation on more than 11,000 CMT modules shows that our approach obtains 90 fine-grained and multi-layered categories, and categorizes CMT modules with high precision (0.81), recall (0.88) and F-Measure (0.85).
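For readers unfamiliar with TF-IDF, the feature computation can be sketched as below. This is an illustrative sketch only: it uses the common weighting tf(t, d) × log(N / df(t)) with whitespace tokenization, which may differ from the variant used in the paper, and the sample module profiles and the `tf_idf` function name are made up for this example. The classifiers themselves are not shown, since the abstract does not say which kind is used.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical module profiles (name, description and tags flattened to text).
profiles = [
    "nginx web server module",
    "mysql database server module",
    "nginx proxy module",
]
vectors = tf_idf(profiles)
```

A term that appears in every profile, such as `module` here, gets weight zero, while a term unique to one profile, such as `web`, gets the highest weight in that profile.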

Automated Software Engineering | 2017

A comprehensive study on real world concurrency bugs in Node.js

Jie Wang; Wensheng Dou; Yu Gao; Chushu Gao; Feng Qin; Kang Yin; Jun Wei

Node.js has become increasingly popular for building server-side JavaScript applications. It adopts an event-driven model, which supports asynchronous I/O and non-deterministic event processing. This asynchrony and non-determinism can introduce intricate concurrency bugs, and lead to unpredictable behaviors. An in-depth understanding of real-world concurrency bugs in Node.js applications will significantly promote effective techniques in bug detection, testing and fixing for Node.js. In this paper, we present NodeCB, a comprehensive study on real-world concurrency bugs in Node.js applications. Specifically, we have carefully studied 57 real bug cases from open-source Node.js applications, and have analyzed their bug characteristics, e.g., bug patterns and root causes, bug impacts, bug manifestation, and fix strategies. Through this study, we obtain several interesting findings, which may open up many new research directions in combating concurrency bugs in Node.js. For example, one finding is that two thirds of the bugs are caused by atomicity violations. However, due to the lack of locks and transaction mechanisms, Node.js cannot easily express and guarantee the atomic intention.
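An atomicity violation in a single-threaded, event-driven program can be illustrated with a check-then-act sequence that is split by an asynchronous call. The sketch below uses Python's `asyncio` as a stand-in for the Node.js event loop; it is an analogy written for this page, not a bug case from the study, and the `ConfigCache` class and its methods are hypothetical.

```python
import asyncio


class ConfigCache:
    def __init__(self):
        self.value = None
        self.loads = 0
        self._pending = None

    async def _load(self):
        await asyncio.sleep(0)  # stands in for asynchronous I/O
        self.loads += 1
        return {"port": 8080}

    async def get_buggy(self):
        # Check and act are separated by an await, so another handler
        # can run in between and also see value as None.
        if self.value is None:
            self.value = await self._load()
        return self.value

    async def get_fixed(self):
        # Record the in-flight load before yielding to the event loop,
        # so later callers share it instead of starting a second one.
        if self._pending is None:
            self._pending = asyncio.ensure_future(self._load())
        return await self._pending


async def count_loads(method_name):
    """Run two concurrent requests and report how many loads happened."""
    cache = ConfigCache()
    getter = getattr(cache, method_name)
    await asyncio.gather(getter(), getter())
    return cache.loads


buggy_loads = asyncio.run(count_loads("get_buggy"))
fixed_loads = asyncio.run(count_loads("get_fixed"))
```

With two concurrent requests, the buggy version performs the load twice even though no threads are involved; the fixed version performs it once because the check and the update happen without yielding in between.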

Journal of Systems and Software | 2017

Characterizing and diagnosing out of memory errors in MapReduce applications

Lijie Xu; Wensheng Dou; Feng Zhu; Chushu Gao; Jie Liu; Jun Wei

Out of memory (OOM) errors are common and serious in MapReduce applications. Since the MapReduce framework hides the details of distributed execution, it is challenging for users to pinpoint the OOM root causes. Current memory analyzers and memory leak detectors can only figure out what objects are (unnecessarily) persisted in memory but cannot figure out where the objects come from and why the objects become so large. Thus, they cannot identify the OOM root causes. Our empirical study on 56 OOM errors in real-world MapReduce applications found that the OOM root causes are improper job configurations, data skew, and memory-consuming user code. To identify the root causes of OOM errors in MapReduce applications, we design a memory profiling tool, Mprof. Mprof can automatically profile and quantify the correlation between a MapReduce application’s runtime memory usage and its static information (input data, configurations, user code). Mprof achieves this by modeling and profiling the application’s dataflow and the memory usage of user code, and by performing correlation analysis on them. Based on this correlation, Mprof uses quantitative rules to trace OOM errors back to the problematic user code, data, and configurations. We evaluated Mprof by diagnosing 28 real-world OOM errors in diverse MapReduce applications. Our evaluation shows that Mprof can accurately identify the root causes of 23 OOM errors, and partly identify the root causes of the other 5 OOM errors.
