Tien-Duy B. Le | Researchain

Archive Network Publication Hotspot Collaboration

Network

Latest external collaboration on country level. Dive into details by clicking on the dots.

Explore More

Hotspot

Dive into the research topics where Tien-Duy B. Le is active.

Explore More

Publication

Featured researches published by Tien-Duy B. Le.

foundations of software engineering | 2015

Information retrieval and spectrum based bug localization: better together

Tien-Duy B. Le; Richard Jayadi Oentaryo; David Lo

Debugging often takes much effort and resources. To help developers debug, numerous information retrieval (IR)-based and spectrum-based bug localization techniques have been proposed. IR-based techniques process textual information in bug reports, while spectrum-based techniques process program spectra (i.e., a record of which program elements are executed for each test case). Both eventually generate a ranked list of program elements that are likely to contain the bug. However, these techniques only consider one source of information, either bug reports or program spectra, which is not optimal. To deal with the limitation of existing techniques, in this work, we propose a new multi-modal technique that considers both bug reports and program spectra to localize bugs. Our approach adaptively creates a bug-specific model to map a particular bug to its possible location, and introduces a novel idea of suspicious words that are highly associated to a bug. We evaluate our approach on 157 real bugs from four software systems, and compare it with a state-of-the-art IR-based bug localization method, a state-of-the-art spectrum-based bug localization method, and three state-of-the-art multi-modal feature location methods that are adapted for bug localization. Experiments show that our approach can outperform the baselines by at least 47.62%, 31.48%, 27.78%, and 28.80% in terms of number of bugs successfully localized when a developer inspects 1, 5, and 10 program elements (i.e., Top 1, Top 5, and Top 10), and Mean Average Precision (MAP) respectively.

international symposium on software testing and analysis | 2016

A learning-to-rank based fault localization approach using likely invariants

Tien-Duy B. Le; David Lo; Claire Le Goues; Lars Grunske

Debugging is a costly process that consumes much of developer time and energy. To help reduce debugging effort, many studies have proposed various fault localization approaches. These approaches take as input a set of test cases (some failing, some passing) and produce a ranked list of program elements that are likely to be the root cause of the failures (i.e., failing test cases). In this work, we propose Savant, a new fault localization approach that employs a learning-to-rank strategy, using likely invariant diffs and suspiciousness scores as features, to rank methods based on their likelihood to be a root cause of a failure. Savant has four steps: method clustering & test case selection, invariant mining, feature extraction, and method ranking. At the end of these four steps, Savant produces a short ranked list of potentially buggy methods. We have evaluated Savant on 357 real-life bugs from 5 programs from the Defects4J benchmark. Out of these bugs, averaging over 100 repeated trials with different seeds to randomly break ties, we find that on average Savant can identify correct buggy methods for 63.03, 101.72, and 122 bugs at top 1, 3, and 5 positions in the ranked lists that Savant produces. We have compared Savant against several state-of-the-art fault localization baselines that work on program spectra. We show that Savant can successfully locate 57.73%, 56.69%, and 43.13% more bugs at top 1, top 3, and top 5 positions than the best performing baseline, respectively.

international conference on software maintenance | 2013

Theory and Practice, Do They Match? A Case with Spectrum-Based Fault Localization

Tien-Duy B. Le; Ferdian Thung; David Lo

Spectrum-based fault localization refers to the process of identifying program units that are buggy from two sets of execution traces: normal traces and faulty traces. These approaches use statistical formulas to measure the suspiciousness of program units based on the execution traces. There have been many spectrum-based fault localization approaches proposing various formulas in the literature. Two of the best performing and well-known ones are Tarantula and Ochiai. Recently, Xie et al. find that theoretically, under certain assumptions, two families of spectrum-based fault localization formulas outperform all other formulas including those of Tarantula and Ochiai. In this work, we empirically validate Xie et al.s findings by comparing the performance of the theoretically best formulas against popular approaches on a dataset containing 199 buggy versions of 10 programs. Our empirical study finds that Ochiai and Tarantula statistically significantly outperforms 3 out of 5 theoretically best fault localization techniques. For the remaining two, Ochiai also outperforms them, albeit not statistically significantly. This happens because an assumption in Xie et al.s work is not satisfied in many fault localization settings.

international conference on software maintenance | 2013

Will Fault Localization Work for These Failures? An Automated Approach to Predict Effectiveness of Fault Localization Tools

Tien-Duy B. Le; David Lo

Debugging is a crucial yet expensive activity to improve the reliability of software systems. To reduce debugging cost, various fault localization tools have been proposed. A spectrum-based fault localization tool often outputs an ordered list of program elements sorted based on their likelihood to be the root cause of a set of failures (i.e., their suspiciousness scores). Despite the many studies on fault localization, unfortunately, however, for many bugs, the root causes are often low in the ordered list. This potentially causes developers to distrust fault localization tools. Recently, Parnin and Orso highlight in their user study that many debuggers do not find fault localization useful if they do not find the root cause early in the list. To alleviate the above issue, we build an oracle that could predict whether the output of a fault localization tool can be trusted or not. If the output is not likely to be trusted, developers do not need to spend time going through the list of most suspicious program elements one by one. Rather, other conventional means of debugging could be performed. To construct the oracle, we extract the values of a number of features that are potentially related to the effectiveness of fault localization. Building upon advances in machine learning, we process these feature values to learn a discriminative model that is able to predict the effectiveness of a fault localization tool output. In this preliminary work, we consider an output of a fault localization tool to be effective if the root cause appears in the top 10 most suspicious program elements. We have experimented our proposed oracle on 200 faulty programs from Space, NanoXML, XML-Security, and the 7 programs in Siemens test suite. Our experiments demonstrate that we could predict the effectiveness of fault localization tool with a precision, recall, and F-measure (harmonic mean of precision and recall) of 54.36%, 95.29%, and 69.23%. The numbers indicate that many ineffective fault localization instances are identified correctly, while only very few effective ones are identified wrongly.

Empirical Software Engineering | 2015

Should I follow this fault localization tool's output?

Tien-Duy B. Le; David Lo; Ferdian Thung

Debugging is a crucial yet expensive activity to improve the reliability of software systems. To reduce debugging cost, various fault localization tools have been proposed. A spectrum-based fault localization tool often outputs an ordered list of program elements sorted based on their likelihood to be the root cause of a set of failures (i.e., their suspiciousness scores). Despite the many studies on fault localization, unfortunately, however, for many bugs, the root causes are often low in the ordered list. This potentially causes developers to distrust fault localization tools. Recently, Parnin and Orso highlight in their user study that many debuggers do not find fault localization useful if they do not find the root cause early in the list. To alleviate the above issue, we build an oracle that could predict whether the output of a fault localization tool can be trusted or not. If the output is not likely to be trusted, developers do not need to spend time going through the list of most suspicious program elements one by one. Rather, other conventional means of debugging could be performed. To construct the oracle, we extract the values of a number of features that are potentially related to the effectiveness of fault localization. Building upon advances in machine learning, we process these feature values to learn a discriminative model that is able to predict the effectiveness of a fault localization tool output. In this work, we consider an output of a fault localization tool to be effective if the root cause appears in the top 10 most suspicious program elements. We have evaluated our proposed oracle on 200 faulty versions of Space, NanoXML, XML-Security, and the 7 programs in Siemens test suite. Our experiments demonstrate that we could predict the effectiveness of 9 fault localization tools with a precision, recall, and F-measure (harmonic mean of precision and recall) of up to 74.38 %, 90.00 % and 81.45 %, respectively. The numbers indicate that many ineffective fault localization instances are identified correctly, while only few effective ones are identified wrongly.

international conference on program comprehension | 2015

RCLinker: automated linking of issue reports and commits leveraging rich contextual information

Tien-Duy B. Le; Mario Linares-Vásquez; David Lo; Denys Poshyvanyk

Links between issue reports and their corresponding commits in version control systems are often missing. However, these links are important for measuring the quality of various parts of a software system, predicting defects, and many other tasks. A number of existing approaches have been designed to solve this problem by automatically linking bug reports to source code commits via comparison of textual information in commit messages with textual contents in the bug reports. Yet, the effectiveness of these techniques is oftentimes sub optimal when commit messages are empty or only contain minimum information, this particular problem makes the process of recovering trace ability links between commits and bug reports particularly challenging. In this work, we aim at improving the effectiveness of existing bug linking techniques by utilizing rich contextual information. We rely on a recently proposed tool, namely Change Scribe, which generates commit messages containing rich contextual information by using a number of code summarization techniques. Our approach then extracts features from these automatically generated commit messages and bug reports and inputs them into a classification technique that creates a discriminative model used to predict if a link exists between a commit message and a bug report. We compared our approach, coined as RCLinker (Rich Context Linker), to MLink, which is an existing state-of-the-art bug linking approach. Our experiment results on bug reports from 6 software projects show that RCLinker can outperform MLink in terms of F-measure by 138.66%.

ieee international conference on software analysis evolution and reengineering | 2015

Beyond support and confidence: Exploring interestingness measures for rule-based specification mining

Tien-Duy B. Le; David Lo

Numerous rule-based specification mining approaches have been proposed in the literature. Many of these approaches analyze a set of execution traces to discover interesting usage rules, e.g., whenever lock() is invoked, eventually unlock() is invoked. These techniques often generate and enumerate a set of candidate rules and compute some interestingness scores. Rules whose interestingness scores are above a certain threshold would then be output. In past studies, two measures, namely support and confidence, which are well-known measures, are often used to compute these scores. However, aside from these two, many other interestingness measures have been proposed. It is thus unclear if support and confidence are the best interestingness measures for specification mining. In this work, we perform an empirical study that investigates the utility of 38 interestingness measures in recovering correct specifications of classes from Java libraries. We used a ground truth dataset consisting of 683 rules and recorded execution traces that are produced when we run the DaCapo test suite. We apply 38 different interestingness measures to identify correct rules from a pool of candidate rules. Our study highlights that many measures are on par to support and confidence. Some of the measures are even better than support or confidence and at least one of the measures is statistically significantly better than the two measures. We also find that compositions of several measures with support statistically significantly outperform the composition of support and confidence. Our findings highlight the need to look beyond standard support and confidence to find interesting rules.

automated software engineering | 2015

Synergizing Specification Miners through Model Fissions and Fusions (T)

Tien-Duy B. Le; Xuan-Bach D. Le; David Lo; Ivan Beschastnikh

Software systems are often developed and released without formal specifications. For those systems that are formally specified, developers have to continuously maintain and update the specifications or have them fall out of date. To deal with the absence of formal specifications, researchers have proposed techniques to infer the missing specifications of an implementation in a variety of forms, such as finite state automaton (FSA). Despite the progress in this area, the efficacy of the proposed specification miners needs to improve if these miners are to be adopted. We propose SpecForge, a new specification mining approach that synergizes many existing specification miners. SpecForge decomposes FSAs that are inferred by existing miners into simple constraints, through a process we refer to as model fission. It then filters the outlier constraints and fuses the constraints back together into a single FSA (i.e., model fusion). We have evaluated SpecForge on execution traces of 10 programs, which includes 5 programs from DaCapo benchmark, to infer behavioral models of 13 library classes. Our results show that SpecForge achieves an average precision, recall and F-measure of 90.57%, 54.58%, and 64.21% respectively. SpecForge outperforms the best performing baseline by 13.75% in terms of F-measure.

international conference on software maintenance | 2016

Inferring Links between Concerns and Methods with Multi-abstraction Vector Space Model

Yun Zhang; David Lo; Xin Xia; Tien-Duy B. Le; Giuseppe Scanniello; Jianling Sun

Concern localization refers to the process of locating code units that match a particular textual description. It takes as input textual documents such as bug reports and feature requests and outputs a list of candidate code units that are relevant to the bug reports or feature requests. Many information retrieval (IR) based concern localization techniques have been proposed in the literature. These techniques typically represent code units and textual descriptions as a bag of tokens at one level of abstraction, e.g., each token is a word, or each token is a topic. In this work, we propose a multi-abstraction concern localization technique named MULAB. MULAB represents a code unit and a textual description at multiple abstraction levels. Similarity of a textual description and a code unit is now made by considering all these abstraction levels. We combine a vector space model and multiple topic models to compute the similarity and apply a genetic algorithm to infer semi-optimal topic model configurations. We have evaluated our solution on 136 concerns from 8 open source Java software systems. The experimental results show that MULAB outperforms the state-of-art baseline PR, which is proposed by Scanniello et al. in terms of effectiveness and rank.

automated software engineering | 2016

Diversity maximization speedup for localizing faults in single-fault and multi-fault programs

Xin Xia; Liang Gong; Tien-Duy B. Le; David Lo; Lingxiao Jiang; Hongyu Zhang

Fault localization is useful for reducing debugging effort. Such techniques require test cases with oracles, which can determine whether a program behaves correctly for every test input. Although most fault localization techniques can localize faults relatively accurately even with a small number of test cases, choosing the right test cases and creating oracles for them are not easy. Test oracle creation is expensive because it can take much manual labeling effort (i.e., effort needed to decide whether the test cases pass or fail). Given a number of test cases to be executed, it is challenging to minimize the number of test cases requiring manual labeling and in the meantime achieve good fault localization accuracy. To address this challenge, this paper presents a novel test case selection strategy based on Diversity Maximization Speedup (Dms). Dms orders a set of unlabeled test cases in a way that maximizes the effectiveness of a fault localization technique. Developers are only expected to label a much smaller number of test cases along this ordering to achieve good fault localization results. We evaluate the performance of Dms on 2 different types of programs, single-fault and multi-fault programs. Our experiments with 411 faults from the Software-artifact Infrastructure Repository show (1) that Dms can help existing fault localization techniques to achieve comparable accuracy with on average 67 and 6 % fewer labeled test cases than previously best test case prioritization techniques for single-fault and multi-fault programs, and (2) that given a labeling budget (i.e., a fixed number of labeled test cases), Dms can help existing fault localization techniques reduce their debugging cost (in terms of the amount of code needed to be inspected to locate faults). We conduct hypothesis test and show that the saving of the debugging cost we achieve for the real C programs are statistically significant.

Explore More