Wensheng Dou | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Wensheng Dou is active.

Explore More

Publication

Featured researches published by Wensheng Dou.

international conference on software engineering | 2014

Is spreadsheet ambiguity harmful? detecting and repairing spreadsheet smells due to ambiguous computation

Wensheng Dou; Shing Chi Cheung; Jun Wei

Spreadsheets are widely used by end users for numerical computation in their business. Spreadsheet cells whose computation is subject to the same semantics are often clustered in a row or column. When a spreadsheet evolves, these cell clusters can degenerate due to ad hoc modifications or undisciplined copy-and-pastes. Such degenerated clusters no longer keep cells prescribing the same computational semantics, and are said to exhibit ambiguous computation smells. Our empirical study finds that such smells are common and likely harmful. We propose AmCheck, a novel technique that automatically detects and repairs ambiguous computation smells by recovering their intended computational semantics. A case study using AmCheck suggests that it is useful for discovering and repairing real spreadsheet problems.

international conference on software engineering | 2016

VEnron: a versioned spreadsheet corpus and related evolution analysis

Wensheng Dou; Liang Xu; Shing Chi Cheung; Chushu Gao; Jun Wei; Tao Huang

Like most conventional software, spreadsheets are subject to software evolution. However, spreadsheet evolution is rarely assisted by version management tools. As a result, the version information across evolved spreadsheets is often missing or highly fragmented. This makes it difficult for users to notice the evolution issues arising from their spreadsheets. In this paper, we propose a semi-automated approach that leverages spreadsheets’ contexts (e.g., attached emails) and contents to identify evolved spreadsheets and recover the embedded version information. We apply it to the released email archive of the Enron Corporation and build an industrial-scale, versioned spreadsheet corpus VEnron. Our approach first clusters spreadsheets that likely evolved from one to another into evolution groups based on various fragmented information, such as spreadsheet filenames, spreadsheet contents, and spreadsheet-attached emails. Then, it recovers the version information of the spreadsheets in each evolution group. VEnron enables us to identify interesting issues that can arise from spreadsheet evolution. For example, the versioned spreadsheets popularly exist in the Enron email archive; changes in formulas are common; and some groups (16.9%) can introduce new errors during evolution. According to our knowledge, VEnron is the first spreadsheet corpus with version information. It provides a valuable resource to understand issues arising from spreadsheet evolution.

international symposium on software reliability engineering | 2015

Fast reproducing web application errors

Jie Wang; Wensheng Dou; Chushu Gao; Jun Wei

JavaScript has become the most popular language for client-side web applications. Due to JavaScripts highly-dynamic features and event-driven design, it is not easy to debug web application errors. Record-replay techniques are widely used to reproduce errors in web applications. However, the key events related to an error are hidden in the massive event trace collected during a long running. As a result, error diagnosis with the long event trace is exhausting and time-consuming. We present a tool JSTrace that can effectively cut down the web application error reproducing time and facilitate the diagnosis. Based on the dynamic dependencies of JavaScript and DOM instructions, we develop a novel dynamic slicing technique that can remove events irrelevant to the error reproducing. In this process, many events and related instructions are removed without losing the reproducing accuracy. Our evaluation shows that the reduced event trace can faithfully reproduce errors with an average reduction rate of 96%.

IEEE Transactions on Software Engineering | 2017

CACheck: Detecting and Repairing Cell Arrays in Spreadsheets

Wensheng Dou; Chang Xu; Shing Chi Cheung; Jun Wei

Spreadsheets are widely used by end users for numerical computation in their business. Spreadsheet cells whose computation is subject to the same semantics are often clustered in a row or column as a cell array. When a spreadsheet evolves, the cells in a cell array can degenerate due to ad hoc modifications. Such degenerated cell arrays no longer keep cells prescribing the same computational semantics, and are said to exhibit ambiguous computation smells. We propose CACheck, a novel technique that automatically detects and repairs smelly cell arrays by recovering their intended computational semantics. Our empirical study on the EUSES and Enron corpora finds that such smelly cell arrays are common. Our study also suggests that CACheck is useful for detecting and repairing real spreadsheet problems caused by smelly cell arrays. Compared with our previous work AmCheck, CACheck detects smelly cell arrays with higher precision and recall rate.

foundations of software engineering | 2016

Detecting table clones and smells in spreadsheets

Wensheng Dou; Shing Chi Cheung; Chushu Gao; Chang Xu; Liang Xu; Jun Wei

Spreadsheets are widely used by end users for various business tasks, such as data analysis and financial reporting. End users may perform similar tasks by cloning a block of cells (table) in their spreadsheets. The corresponding cells in these cloned tables are supposed to keep the same or similar computational semantics. However, when spreadsheets evolve, thus cloned tables can become inconsistent due to ad-hoc modifications, and as a result suffer from smells. In this paper, we propose TableCheck to detect table clones and related smells due to inconsistency among them. We observe that two tables with the same header information at their corresponding cells are likely to be table clones. Inspired by existing fingerprint-based code clone detection techniques, we developed a detection algorithm to detect this kind of table clones. We further detected outliers among corresponding cells as smells in the detected table clones. We implemented our idea into TableCheck, and applied it to real-world spreadsheets from the EUSES corpus. Experimental results show that table clones commonly exist (21.8%), and 25.6% of the spreadsheets with table clones suffer from smells due to inconsistency among these clones. TableCheck detected table clones and their smells with a precision of 92.2% and 85.5%, respectively, while existing techniques detected no more than 35.6% true smells that TableCheck could detect.

international symposium on software reliability engineering | 2015

Experience report: A characteristic study on out of memory errors in distributed data-parallel applications

Lijie Xu; Wensheng Dou; Feng Zhu; Chushu Gao; Jie Liu; Hua Zhong; Jun Wei

Out of memory (OOM) errors occur frequently in data-intensive applications that run atop distributed data-parallel frameworks, such as MapReduce and Spark. In these applications, the memory space is shared by the framework and user code. Since the framework hides the details of distributed execution, it is challenging for users to pinpoint the root causes and fix these OOM errors. This paper presents a comprehensive characteristic study on 123 real-world OOM errors in Hadoop and Spark applications. Our major findings include: (1) 12% errors are caused by the large data buffered/cached in the framework, which indicates that it is hard for users to configure the right memory quota to balance the memory usage of the framework and user code. (2) 37% errors are caused by the unexpected large runtime data, such as large data partition, hotspot key, and large key/value record. (3) Most errors (64%) are caused by memory-consuming user code, which carelessly processes unexpected large data or generates large in-memory computing results. Among them, 13% errors are also caused by the unexpected large runtime data. (4) There are three common fix patterns (used in 34% errors), namely changing the memory/dataflow-related configurations, dividing runtime data, and optimizing user code logic. Our findings inspire us to propose potential solutions to avoid the OOM errors: (1) providing dynamic memory management mechanisms to balance the memory usage of the framework and user code at runtime; (2) providing users with memory+disk data structures, since accumulating large computing results in in-memory data structures is a common cause (15% errors).

Journal of Systems and Software | 2017

JSTrace: Fast Reproducing Web Application Errors

Jie Wang; Wensheng Dou; Chushu Gao; Jun Wei

Abstract JavaScript has become the most popular language for client-side web applications. Due to JavaScripts highly-dynamic and event-driven features, it is challenging to diagnose web application errors. Record-replay techniques are used to reproduce errors in web applications. After a long run, these techniques will record a long event trace that triggers an error. Although the error-related events are few, they are interleaved with other massive error-irrelevant events. It is time-consuming to diagnose errors with long event traces. In this article, we present JSTrace, which effectively removes error-irrelevant events from the long event trace, and further facilitates error diagnosis. Based on fine-grained dependences of JavaScript and DOM instructions, we develop a novel dynamic slicing technique that can remove events irrelevant to the error. We further present rules to remove irrelevant events, which cannot be removed by dynamic slicing. In this process, many events and related instructions are removed without losing the error reproducing accuracy. Our evaluation on 13 real-world web application errors shows that the reduced event traces can faithfully reproduce errors with an average reduction rate of 97%. We further performed case studies on 4 real-world errors, and the result shows that JSTrace is useful to diagnose web application errors.

practical aspects of declarative languages | 2018

Rewriting High-Level Spreadsheet Structures into Higher-Order Functional Programs

Florian Biermann; Wensheng Dou; Peter Sestoft

Spreadsheets are used heavily in industry and academia. Often, spreadsheet models are developed for years and their complexity grows vastly beyond what the paradigm was originally conceived for. Such complexity often comes at the cost of recalculation performance. However, spreadsheet models usually have some high-level structure that can be used to improve performance by performing independent computation in parallel. In this paper, we devise rules for rewriting high-level spreadsheet structure in the form of so-called cell arrays into higher-order functional programs that can be easily parallelized on multicore processors. We implement our rule set for the experimental Funcalc spreadsheet engine which already implements parallelizable higher-order array functions as well as user-defined higher-order functions. Benchmarks show that our rewriting approach improves recalculation performance for spreadsheets that are dominated by cell arrays.

computer software and applications conference | 2017

A Hierarchical Categorization Approach for Configuration Management Modules

Wei Chen; Peixing Xu; Wensheng Dou; Guoquan Wu; Chushu Gao; Jun Wei

Configuration management tools, CMTs for short, are a set of indispensable software for DevOps (Development and Operations). CMTs automate system deployment and configuration through CMT modules, which are reusable, shareable units of configuration code. Therefore, thousands of CMT modules have been developed for various systems, and are still growing fast. Although CMT repositories usually provide keyword-and tag-based search, a large number of search results could prevent users from finding desired CMT modules. CMT modules could be managed in a hierarchical categorization, which can limit the search scope in specified categories, and thus help to improve search performance. Unfortunately, there is no hierarchical categorization in all CMT repositories. In this paper, we propose a hierarchical categorization approach for CMT modules. Our approach first extracts frequently-used module tags as categories, and constructs the category hierarchy by mining the hierarchical relations among tags. We leverage online module profiles (names, descriptions and tags) as source information to do categorization. It trains a set of classifiers by taking TF-IDF (term frequency-inverse document frequency) of module profiles as features. Finally, our evaluation on more than 11,000 CMT modules shows that our approach could obtain 90 fine-grained and multi-layered categories, and does categorization for CMT modules with high precision (0.81), recall (0.88) and F-Measure (0.85).

automated software engineering | 2017

A comprehensive study on real world concurrency bugs in Node.js

Jie Wang; Wensheng Dou; Yu Gao; Chushu Gao; Feng Qin; Kang Yin; Jun Wei

Node.js becomes increasingly popular in building server-side JavaScript applications. It adopts an event-driven model, which supports asynchronous I/O and non-deterministic event processing. This asynchrony and non-determinism can introduce intricate concurrency bugs, and leads to unpredictable behaviors. An in-depth understanding of real world concurrency bugs in Node.js applications will significantly promote effective techniques in bug detection, testing and fixing for Node.js. In this paper, we present NodeCB, a comprehensive study on real world concurrency bugs in Node.js applications. Specifically, we have carefully studied 57 real bug cases from open-source Node.js applications, and have analyzed their bug characteristics, e.g., bug patterns and root causes, bug impacts, bug manifestation, and fix strategies. Through this study, we obtain several interesting findings, which may open up many new research directions in combating concurrency bugs in Node.js. For example, one finding is that two thirds of the bugs are caused by atomicity violation. However, due to lack of locks and transaction mechanism, Node.js cannot easily express and guarantee the atomic intention.

Explore More