
Publication


Featured research published by Thanh H. D. Nguyen.


International Conference on Software Engineering | 2011

An empirical study of build maintenance effort

Shane McIntosh; Bram Adams; Thanh H. D. Nguyen; Yasutaka Kamei; Ahmed E. Hassan

The build system of a software project is responsible for transforming source code and other development artifacts into executable programs and deliverables. Similar to source code, build system specifications require maintenance to cope with newly implemented features, changes to imported Application Programming Interfaces (APIs), and source code restructuring. In this paper, we mine the version histories of one proprietary and nine open source projects of different sizes and domains to analyze the overhead that build maintenance imposes on developers. We split our analysis into two dimensions: (1) Build Coupling, i.e., how frequently source code changes require build changes, and (2) Build Ownership, i.e., the proportion of developers responsible for build maintenance. Our results indicate that, despite the difference in scale, the build system churn rate is comparable to that of the source code, and build changes induce more relative churn on the build system than source code changes induce on the source code. Furthermore, build maintenance yields up to a 27% overhead on source code development and a 44% overhead on test development. Up to 79% of source code developers and 89% of test code developers are significantly impacted by build maintenance, yet investment in build experts can reduce the proportion of impacted developers to 22% of source code developers and 24% of test code developers.
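The Build Coupling dimension above can be sketched as a simple ratio over commit file lists. This is an illustrative reconstruction, not the paper's actual mining pipeline: the commit data layout and the file classifiers below are assumptions made for the example.

```python
def build_coupling(commits):
    """Fraction of source-changing commits that also touch build files.

    commits: a list of commits, each given as a list of changed file paths.
    The file classifiers below are illustrative, not the paper's exact rules.
    """
    def is_build(path):
        name = path.rsplit("/", 1)[-1]
        return name in ("Makefile", "CMakeLists.txt", "configure.ac") or name.endswith(".cmake")

    def is_source(path):
        return path.endswith((".c", ".cpp", ".h", ".java"))

    source_commits = [c for c in commits if any(is_source(p) for p in c)]
    coupled = [c for c in source_commits if any(is_build(p) for p in c)]
    return len(coupled) / len(source_commits) if source_commits else 0.0


commits = [
    ["src/parser.c", "Makefile"],       # source + build change (coupled)
    ["src/lexer.c"],                    # source-only change
    ["docs/README.md"],                 # neither source nor build
    ["src/ast.cpp", "CMakeLists.txt"],  # source + build change (coupled)
]
print(build_coupling(commits))  # 2 of 3 source-changing commits also touch the build
```

A high ratio means developers can rarely change code without also maintaining the build specification, which is the overhead the study quantifies.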


International Conference on Performance Engineering | 2012

Automated detection of performance regressions using statistical process control techniques

Thanh H. D. Nguyen; Bram Adams; Zhen Ming Jiang; Ahmed E. Hassan; Mohamed N. Nasser; Parminder Flora

The goal of performance regression testing is to check for performance regressions in a new version of a software system. It is an important phase of the software development process, yet it is very time consuming and usually has little time assigned to it. A typical test run outputs thousands of performance counters, which testers usually have to inspect manually to identify performance regressions. In this paper, we propose an approach to analyze performance counters across test runs using a statistical process control technique called control charts. We evaluate our approach using historical data of a large software team as well as an open-source software project. The results show that our approach can accurately identify performance regressions in both software systems. Feedback from practitioners is very promising due to the simplicity and ease of explanation of the results.
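The control-chart idea can be illustrated in a few lines: derive a center line and control limits from past runs of a counter, then flag a new run whose samples frequently fall outside the limits. This is a minimal sketch of the general technique; the exact chart construction and thresholds in the paper may differ, and the sample values are invented.

```python
import statistics

def violation_ratio(baseline, new_run, k=3.0):
    """Share of new-run samples outside the baseline control limits.

    Builds a control chart from past (baseline) runs of a counter: the
    center line is the baseline mean, and the control limits sit k standard
    deviations around it. A high violation ratio signals a regression.
    """
    center = statistics.mean(baseline)
    sigma = statistics.pstdev(baseline)
    lcl, ucl = center - k * sigma, center + k * sigma
    return sum(1 for x in new_run if not lcl <= x <= ucl) / len(new_run)


baseline = [10, 11, 9, 10, 10, 11, 9, 10]   # e.g. response time (ms) from past runs
print(violation_ratio(baseline, [10, 9, 11, 10]))   # healthy run -> 0.0
print(violation_ratio(baseline, [10, 15, 16, 14]))  # degraded run -> 0.75
```

The appeal noted in the abstract follows from this simplicity: the result is a single, easily explained ratio per counter rather than a statistical model testers must interpret.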


Working Conference on Reverse Engineering | 2010

A Case Study of Bias in Bug-Fix Datasets

Thanh H. D. Nguyen; Bram Adams; Ahmed E. Hassan

Software quality researchers build software quality models by recovering traceability links between bug reports in issue tracking repositories and source code files. However, all too often the data stored in issue tracking repositories is not explicitly tagged or linked to source code. Researchers have to resort to heuristics to tag the data (e.g., to determine if an issue is a bug report or a work item), or to link a piece of code to a particular issue or bug. Recent studies by Bird et al. and by Antoniol et al. suggest that software models based on imperfect datasets with missing links to the code and incorrectly tagged issues exhibit biases that compromise the validity and generality of the quality models built on top of them. In this study, we verify the effects of such biases for a commercial project that enforces strict development guidelines and rules on the quality of the data in its issue tracking repository. Our results show that even in such a perfect setting, with a near-ideal dataset, biases do exist, leading us to conjecture that biases are more likely a symptom of the underlying software development process than an artifact of the heuristics used.


International Conference on Software Maintenance | 2010

Studying the impact of dependency network measures on software quality

Thanh H. D. Nguyen; Bram Adams; Ahmed E. Hassan

Dependency network measures capture various facets of the dependencies among software modules. For example, betweenness centrality measures how much information flows through a module compared to the rest of the network. Prior studies have shown that these measures are good predictors of post-release failures. However, these studies did not explore the causes of such good performance and did not provide guidance for practitioners to avoid future bugs. In this paper, we closely examine the causes of such performance by replicating prior studies using data from the Eclipse project. Our study shows that a small subset of dependency network measures has a large impact on post-release failures, while the other network measures have very limited impact. We also analyze the benefit of bug prediction in reducing testing cost. Finally, we explore the practical implications of the important network measures.
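Betweenness centrality, the example measure named above, counts the fraction of shortest paths between other node pairs that pass through a node. A brute-force sketch for a small dependency graph (fine for illustration, far too slow for real module graphs, which use Brandes' algorithm instead):

```python
from collections import deque
from itertools import combinations

def betweenness(graph):
    """Brute-force betweenness centrality for a small undirected graph
    given as {node: set_of_neighbors}."""
    def shortest_paths(s, t):
        paths, best = [], None
        queue = deque([[s]])
        while queue:
            path = queue.popleft()
            if best is not None and len(path) > best:
                break  # BFS order: every remaining path is at least this long
            node = path[-1]
            if node == t:
                best = len(path)
                paths.append(path)
                continue
            for nb in graph[node]:
                if nb not in path:  # avoid cycles
                    queue.append(path + [nb])
        return paths

    score = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        paths = shortest_paths(s, t)
        if not paths:
            continue
        for v in graph:
            if v not in (s, t):
                score[v] += sum(1 for p in paths if v in p) / len(paths)
    return score


# A chain a - b - c: every a-to-c shortest path must pass through b.
g = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(betweenness(g))  # b scores 1.0; the endpoints score 0.0
```

A module sitting on many such paths mediates much of the system's interaction, which is why the measure plausibly correlates with post-release failures.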


Mining Software Repositories | 2014

An industrial case study of automatically identifying performance regression-causes

Thanh H. D. Nguyen; Meiyappan Nagappan; Ahmed E. Hassan; Mohamed N. Nasser; Parminder Flora

Even the addition of a single extra field or control statement in the source code of a large-scale software system can lead to performance regressions. Such regressions can considerably degrade the user experience. Working closely with the members of a performance engineering team, we observe that they face a major challenge in identifying the cause of a performance regression given the large number of performance counters (e.g., memory and CPU usage) that must be analyzed. We propose the mining of a regression-causes repository (where the results of performance tests and causes of past regressions are stored) to assist the performance team in identifying the regression-cause of a newly identified regression. We evaluate our approach on an open-source system and a commercial system for which the team is responsible. The results show that our approach can accurately (up to 80% accuracy) identify performance regression-causes using a reasonably small number of historical test runs (sometimes as few as four test runs per regression-cause).
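The mining step above can be pictured as a nearest-neighbor lookup: summarize the new regression as a vector of counter deviations and return the past cause with the most similar vector. Everything in this sketch is an assumption for illustration: the counter set, the profiles, the cosine-similarity matcher, and the function name are not taken from the paper.

```python
import math

def likely_cause(new_profile, repository):
    """Return the past regression cause whose counter-deviation profile is
    most similar (by cosine similarity) to the newly observed one."""
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        return dot / norms if norms else 0.0

    return max(repository, key=lambda entry: cosine(new_profile, entry["profile"]))["cause"]


# Hypothetical deviations of [cpu, memory, disk I/O] counters vs. a healthy run.
repository = [
    {"cause": "memory leak",    "profile": [0.1, 0.9, 0.0]},
    {"cause": "busy loop",      "profile": [0.9, 0.1, 0.0]},
    {"cause": "disk thrashing", "profile": [0.2, 0.1, 0.9]},
]
print(likely_cause([0.15, 0.85, 0.05], repository))  # memory leak
```

With only a handful of stored runs per cause, such a lookup can already rank candidate causes, which matches the abstract's observation that few historical test runs suffice.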


Working Conference on Reverse Engineering | 2011

Impact of Installation Counts on Perceived Quality: A Case Study on Debian

Israel Herraiz; Emad Shihab; Thanh H. D. Nguyen; Ahmed E. Hassan

Software defects are generally used to indicate software quality. However, due to the nature of software, we are often only able to know about the defects found and reported, either following the testing process or after deployment. In software research studies, it is assumed that a higher number of defect reports represents a higher number of defects in the software system. In this paper, we argue that widely deployed programs have more reported defects, regardless of their actual number of defects. To examine this claim, we perform a case study on the Debian GNU/Linux distribution, a well-known free/open source software collection. We compare the defects reported for all the software packages in Debian with their popularity. We find that the number of reported defects for a Debian package is limited by its popularity. This finding has implications for defect prediction studies, showing that they need to consider the impact of popularity on perceived quality, otherwise they risk bias.


International Conference on Software Testing, Verification and Validation | 2012

Using Control Charts for Detecting and Understanding Performance Regressions in Large Software

Thanh H. D. Nguyen

Load testing is a very important step in the testing of large-scale software systems. For example, studies have found that users are likely to abandon an online transaction if a web application fails to respond within eight seconds. Performance load tests ensure that performance counters such as response time stay within an acceptable range after each change to the code. Analyzing load test results to detect performance regressions is very time consuming due to the large number of performance counters. In this thesis, we propose approaches that use control charts, a statistical process control technique, to assist performance engineers in identifying test runs with performance regressions, pinpointing the components that cause the regressions, and determining the causes of regressions in load tests. Using our approaches, engineers will save time in analyzing the results of load tests.


International Symposium on Software Reliability Engineering | 2016

Does Geographical Distance Affect Distributed Development Teams: How Aggregation Bias in Software Artifacts Causes Contradictory Findings

Thanh H. D. Nguyen; Bram Adams; Ahmed E. Hassan

Does geographic distance affect distributed software development teams? Researchers have been mining software artifacts to find evidence that geographic distance between software team members introduces delays in communication and deliverables. While some studies found that geographical distance negatively impacts software teams, other studies dispute this finding. It has been speculated that various confounding factors are the reason for the contradictory findings. For example, newer tools and practices that enable team members to communicate and collaborate more effectively might have negated the effects of distance in some studies. In this study, we examine an alternative theory to explain the contradictory findings: the different aggregations of the software artifacts used in past studies. We call this type of bias aggregation bias. We replicated the previous studies on detecting evidence of delay in communication, using data from a large commercial distributed software project. We use two different levels of artifacts in this study: class files, and the components that aggregate those class files. Our results show that the effect of distance does appear in the low-level artifacts; however, it does not appear in the aggregated artifacts. Since mining software artifacts has become a popular research methodology in software engineering, this result calls for careful attention to the use of aggregated artifacts in software studies.


Asia-Pacific Software Engineering Conference | 2011

Automated Verification of Load Tests Using Control Charts

Thanh H. D. Nguyen; Bram Adams; Zhen Ming Jiang; Ahmed E. Hassan; Mohamed N. Nasser; Parminder Flora


Archive | 2011

Automatic Load Test Verification Using Control Charts

Thanh H. D. Nguyen; Bram Adams; Zhen Ming Jiang; Ahmed E. Hassan

Collaboration


Dive into Thanh H. D. Nguyen's collaborations.

Top Co-Authors

Bram Adams (École Polytechnique de Montréal)

Israel Herraiz (Technical University of Madrid)

Tien N. Nguyen (University of Texas at Dallas)

Meiyappan Nagappan (Rochester Institute of Technology)