Xuan-Bach D. Le | Researchain

Publication

Featured research published by Xuan-Bach D. Le.

ieee international conference on software analysis evolution and reengineering | 2016

History Driven Program Repair

Xuan-Bach D. Le; David Lo; Claire Le Goues

Effective automated program repair techniques have great potential to reduce the costs of debugging and maintenance. Previously proposed automated program repair (APR) techniques often follow a generate-and-validate and test-case-driven procedure: They first randomly generate a large pool of fix candidates and then exhaustively validate the quality of the candidates by testing them against existing or provided test suites. Unfortunately, many real-world bugs cannot be repaired by existing techniques even after more than 12 hours of computation in a multi-core cloud environment. More work is needed to advance the capabilities of modern APR techniques. We propose a new technique that utilizes the wealth of bug fixes across projects in their development history to effectively guide and drive a program repair process. Our main insight is that recurring bug fixes are common in real-world applications, and that previously-appearing fix patterns can provide useful guidance to an automated repair technique. Based on this insight, our technique first automatically mines bug fix patterns from the history of many projects. We then employ existing mutation operators to generate fix candidates for a given buggy program. Candidates that match frequently occurring historical bug fixes are considered more likely to be relevant, and we thus give them priority in the random search process. Finally, candidates that pass all the previously failed test cases are recommended as likely fixes. We compare our technique against existing generate-and-validate and test-driven APR approaches using 90 bugs from 5 Java programs. The experiment results show that our technique can produce good-quality fixes for many more bugs as compared to the baselines, while being reasonably computationally efficient: it takes less than 20 minutes, on average, to correctly fix a bug.
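The prioritization idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the pattern names, the frequency table `HISTORY`, and the candidate records are all hypothetical, and the random search described in the abstract is simplified here to a deterministic sort by historical frequency.

```python
from collections import Counter

# Hypothetical mined history: how often each abstract fix pattern was observed.
HISTORY = Counter({
    "replace_method_call": 40,
    "add_null_check": 25,
    "change_operator": 10,
})

def prioritize(candidates, history=HISTORY):
    """Order candidates so that frequent historical fix patterns come first."""
    # sorted() is stable, so candidates with equal frequency keep their input order.
    return sorted(candidates, key=lambda c: history.get(c["pattern"], 0), reverse=True)

def repair(candidates, passes_failed_tests, history=HISTORY):
    """Return, in priority order, the candidates that pass the previously failing tests."""
    return [c for c in prioritize(candidates, history) if passes_failed_tests(c)]

# Hypothetical fix candidates produced by mutation operators.
candidates = [
    {"id": "c1", "pattern": "change_operator"},
    {"id": "c2", "pattern": "add_null_check"},
    {"id": "c3", "pattern": "unseen_pattern"},
    {"id": "c4", "pattern": "replace_method_call"},
]
ranked_ids = [c["id"] for c in prioritize(candidates)]
```

In this toy setup, candidates whose pattern never appears in the history receive a frequency of zero and are tried last; a real system would need a way to match concrete edits against mined patterns, which is not shown here.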

foundations of software engineering | 2017

S3: syntax- and semantic-guided repair synthesis via programming by examples

Xuan-Bach D. Le; Duc-Hiep Chu; David Lo; Claire Le Goues; Willem Visser

A notable class of techniques for automatic program repair is known as semantics-based. Such techniques, e.g., Angelix, infer semantic specifications via symbolic execution, and then use program synthesis to construct new code that satisfies those inferred specifications. However, the obtained specifications are naturally incomplete, leaving the synthesis engine with a difficult task of synthesizing a general solution from a sparse space of many possible solutions that are consistent with the provided specifications but that do not necessarily generalize. We present S3, a new repair synthesis engine that leverages programming-by-examples methodology to synthesize high-quality bug repairs. The novelty in S3 that allows it to tackle the sparse search space to create more general repairs is three-fold: (1) A systematic way to customize and constrain the syntactic search space via a domain-specific language, (2) An efficient enumeration-based search strategy over the constrained search space, and (3) A number of ranking features based on measures of the syntactic and semantic distances between candidate solutions and the original buggy program. We compare S3’s repair effectiveness with state-of-the-art synthesis engines Angelix, Enumerative, and CVC4. S3 can successfully and correctly fix at least three times more bugs than the best baseline on datasets of 52 bugs in small programs, and 100 bugs in real-world large programs.
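The three ingredients listed in this abstract (a constrained DSL, enumerative search, and distance-based ranking) can be sketched on a toy scale. This is an illustrative assumption-laden example, not S3 itself: the DSL of two-variable comparisons, the input-output examples, and the position-wise `syntactic_distance` are all hypothetical, and semantic-distance ranking is omitted.

```python
from itertools import product

# Hypothetical DSL: comparisons of the form "VAR OP VAR" over two integer variables.
OPS = {
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
}
VARS = ("x", "y")

def enumerate_candidates():
    """Enumerate every expression in the constrained search space."""
    for left, op, right in product(VARS, OPS, VARS):
        if left != right:
            yield (left, op, right)

def evaluate(expr, env):
    left, op, right = expr
    return OPS[op](env[left], env[right])

def syntactic_distance(expr, original):
    """Count positions at which the candidate differs from the buggy expression."""
    return sum(a != b for a, b in zip(expr, original))

def synthesize(original, examples):
    """Keep candidates consistent with all examples, closest to the original first."""
    consistent = [
        e for e in enumerate_candidates()
        if all(evaluate(e, env) == expected for env, expected in examples)
    ]
    return sorted(consistent, key=lambda e: syntactic_distance(e, original))

buggy = ("x", "<", "y")  # hypothetical buggy condition
examples = [
    ({"x": 1, "y": 2}, True),
    ({"x": 2, "y": 2}, True),   # the buggy condition gets this case wrong
    ({"x": 3, "y": 2}, False),
]
ranked = synthesize(buggy, examples)
best = ranked[0]
```

Two candidates are consistent with the examples here, `x <= y` and `y >= x`; ranking by distance to the buggy expression prefers the one that changes only the operator.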

automated software engineering | 2015

Synergizing Specification Miners through Model Fissions and Fusions (T)

Tien-Duy B. Le; Xuan-Bach D. Le; David Lo; Ivan Beschastnikh

Software systems are often developed and released without formal specifications. For those systems that are formally specified, developers have to continuously maintain and update the specifications or have them fall out of date. To deal with the absence of formal specifications, researchers have proposed techniques to infer the missing specifications of an implementation in a variety of forms, such as finite state automaton (FSA). Despite the progress in this area, the efficacy of the proposed specification miners needs to improve if these miners are to be adopted. We propose SpecForge, a new specification mining approach that synergizes many existing specification miners. SpecForge decomposes FSAs that are inferred by existing miners into simple constraints, through a process we refer to as model fission. It then filters the outlier constraints and fuses the constraints back together into a single FSA (i.e., model fusion). We have evaluated SpecForge on execution traces of 10 programs, which include 5 programs from the DaCapo benchmark, to infer behavioral models of 13 library classes. Our results show that SpecForge achieves an average precision, recall and F-measure of 90.57%, 54.58%, and 64.21% respectively. SpecForge outperforms the best performing baseline by 13.75% in terms of F-measure.
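A reader may notice that the reported average F-measure (64.21%) is not the harmonic mean of the reported average precision and recall. One plausible reading, which is an assumption and not stated in the abstract, is that F-measure is computed per library class and then averaged. The sketch below shows that these two ways of aggregating generally differ; the per-class values are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# F-measure computed directly from the reported averages (in percent).
f_of_reported_means = f_measure(90.57, 54.58)

# Hypothetical per-class (precision, recall) pairs, for illustration only.
per_class = [(1.0, 0.2), (0.8, 0.8)]
f_of_means = f_measure(mean(p for p, _ in per_class), mean(r for _, r in per_class))
mean_of_f = mean(f_measure(p, r) for p, r in per_class)
```

With the reported averages, the direct harmonic mean comes to about 68.11%, which is higher than the reported 64.21%; in the hypothetical two-class data the mean of per-class F-measures is likewise lower than the F-measure of the means.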

international conference on software maintenance | 2016

Enhancing Automated Program Repair with Deductive Verification

Xuan-Bach D. Le; Quang Loc Le; David Lo; Claire Le Goues

Automated program repair (APR) is a challenging process of detecting bugs, localizing buggy code, generating fix candidates and validating the fixes. Effectiveness of program repair methods relies on the generated fix candidates, and the methods used to traverse the space of generated candidates to search for the best ones. Existing approaches generate fix candidates based on either syntactic searches over source code or semantic analysis of specification, e.g., test cases. In this paper, we propose to combine both syntactic and semantic fix candidates to enhance the search space of APR, and provide a function to effectively traverse the search space. We present an automated repair method based on structured specifications, deductive verification and genetic programming. Given a function with its specification, we utilize a modular verifier to detect bugs and localize both program statements and sub-formulas in the specification that relate to those bugs. While the former are identified as buggy code, the latter are transformed as semantic fix candidates. We additionally generate syntactic fix candidates via various mutation operators. Best candidates, which receive fewer warnings via a static verification, are selected for evolution through genetic programming until we find one satisfying the specification. Another interesting feature of our proposed approach is that we efficiently ensure the soundness of repaired code through modular (or compositional) verification. We implemented our proposal and tested it on C programs taken from the SIR benchmark that are seeded with bugs, achieving promising results.

international conference on software maintenance | 2016

Empirical Study on Synthesis Engines for Semantics-Based Program Repair

Xuan-Bach D. Le; David Lo; Claire Le Goues

Automatic Program Repair (APR) is an emerging and rapidly growing research area, with many techniques proposed to repair defective software. One notable state-of-the-art line of APR approaches is known as semantics-based techniques, e.g., Angelix, which extract semantics constraints, i.e., specifications, via symbolic execution and test suites, and then generate repairs conforming to these constraints using program synthesis. The repair capability of such approaches—expressive power, output quality, and scalability—naturally depends on the underlying synthesis technique. However, despite recent advances in program synthesis, not much attention has been paid to assess, compare, or leverage the variety of available synthesis engine capabilities in an APR context. In this paper, we empirically compare the effectiveness of different synthesis engines for program repair. We do this by implementing a framework on top of the latest semantics-based APR technique, Angelix, that allows us to use different such engines. For this preliminary study, we use a subset of bugs in the IntroClass benchmark, a dataset of many small programs recently proposed for use in evaluating APR techniques, with a focus on assessing output quality. Our initial findings suggest that different synthesis engines have their own strengths and weaknesses, and future work on semantics-based APR should explore innovative ways to exploit and combine multiple synthesis engines.

international conference on program comprehension | 2015

Active semi-supervised defect categorization

Ferdian Thung; Xuan-Bach D. Le; David Lo

Defects are an inseparable part of software development and evolution. To better comprehend problems affecting a software system, developers often store historical defects and these defects can be categorized into families. IBM proposes Orthogonal Defect Categorization (ODC) which includes various classifications of defects based on a number of orthogonal dimensions (e.g., Symptoms and semantics of defects, root causes of defects, etc.). To help developers categorize defects, several approaches that employ machine learning have been proposed in the literature. Unfortunately, these approaches often require developers to manually label a large number of defect examples. In practice, manually labelling a large number of examples is both time-consuming and labor-intensive. Thus, reducing the onerous burden of manual labelling while still being able to achieve good performance is crucial towards the adoption of such approaches. To deal with this challenge, in this work, we propose an active semi-supervised defect prediction approach. It is performed by actively selecting a small subset of diverse and informative defect examples to label (i.e., Active learning), and by making use of both labeled and unlabeled defect examples in the prediction model learning process (i.e., Semi-supervised learning). Using this principle, our approach is able to learn a good model while minimizing the manual labeling effort. To evaluate the effectiveness of our approach, we make use of a benchmark dataset that contains 500 defects from three software systems that have been manually labelled into several families based on ODC. We investigate our approach's ability in achieving good classification performance, measured in terms of weighted precision, recall, F-measure, and AUC, when only a small number of manually labelled defect examples are available.
Our experiment results show that our active semi-supervised defect categorization approach is able to achieve a weighted precision, recall, F-measure, and AUC of 0.651, 0.669, 0.623, and 0.710, respectively, when only 50 defects are manually labelled. Furthermore, it outperforms an existing active multi-class classification algorithm, proposed in the machine learning community, by a substantial margin.

international symposium on software testing and analysis | 2017

JFIX: semantics-based repair of Java programs via symbolic PathFinder

Xuan-Bach D. Le; Duc-Hiep Chu; David Lo; Claire Le Goues; Willem Visser

Recently there has been a proliferation of automated program repair (APR) techniques, targeting various programming languages. Such techniques can be generally classified into two families: syntactic- and semantics-based. Semantics-based APR, on which we focus, typically uses symbolic execution to infer semantic constraints and then program synthesis to construct repairs conforming to them. While syntactic-based APR techniques have been shown successful on bugs in real-world programs written in both C and Java, semantics-based APR techniques mostly target C programs. This leaves empirical comparisons of the APR families not fully explored, and developers without a Java-based semantics APR technique. We present JFix, a semantics-based APR framework that targets Java, and an associated Eclipse plugin. JFix is implemented atop Symbolic PathFinder, a well-known symbolic execution engine for Java programs. It extends one particular APR technique (Angelix), and is designed to be sufficiently generic to support a variety of such techniques. We demonstrate that semantics-based APR can indeed efficiently and effectively repair a variety of classes of bugs in large real-world Java programs. This supports our claim that the framework can both support developers seeking semantics-based repair of bugs in Java programs, as well as enable larger scale empirical studies comparing syntactic- and semantics-based APR targeting Java. The demonstration of our tool is available via the project website at: https://xuanbachle.github.io/semanticsrepair/

international conference on software maintenance | 2016

Recommending Code Changes for Automatic Backporting of Linux Device Drivers

Ferdian Thung; Xuan-Bach D. Le; David Lo; Julia L. Lawall

Device drivers are essential components of any operating system (OS). They specify the communication protocol that allows the OS to interact with a device. However, drivers for new devices are usually created for a specific OS version. These drivers often need to be backported to the older versions to allow use of the new device. Backporting is often done manually, and is tedious and error prone. To alleviate this burden on developers, we propose an automatic recommendation system to guide the selection of backporting changes. Our approach analyzes the version history for cues to recommend candidate changes. We have performed an experiment on 100 Linux driver files and have shown that we can give a recommendation containing the correct backport for 68 of the drivers. For these 68 cases, 73.5%, 85.3%, and 88.2% of the correct recommendations are located in the Top-1, Top-2, and Top-5 positions of the recommendation lists respectively. The successful cases cover various kinds of changes including change of record access, deletion of function argument, change of a function name, change of constant, and change of if condition. Manual investigation of failed cases highlights limitations of our approach, including inability to infer complex changes, and unavailability of relevant cues in version history.
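The Top-k figures in this abstract are hit rates over the 68 drivers for which a correct recommendation was produced. As a minimal sketch of how such rates are computed, the example below uses a hypothetical list of ranks, chosen only so that the resulting percentages match the reported ones; the actual per-driver ranks are not given in the abstract.

```python
def top_k_rate(ranks, k):
    """Fraction of cases whose correct recommendation is ranked within the top k."""
    return sum(rank <= k for rank in ranks) / len(ranks)

# Hypothetical rank of the correct backport for each of the 68 successful cases:
# 50 at rank 1, 8 at rank 2, 2 within ranks 3-5, and 8 below rank 5.
ranks = [1] * 50 + [2] * 8 + [4] * 2 + [7] * 8

rates = {k: round(100 * top_k_rate(ranks, k), 1) for k in (1, 2, 5)}
```

Under this assumed distribution, 50, 58, and 60 of the 68 cases fall within the Top-1, Top-2, and Top-5 positions, which rounds to the reported 73.5%, 85.3%, and 88.2%.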

foundations of software engineering | 2017

XSearch: a domain-specific cross-language relevant question retrieval tool

Bowen Xu; Zhenchang Xing; Xin Xia; David Lo; Xuan-Bach D. Le

During the software development process, Chinese developers often seek solutions to the technical problems they encounter by searching for relevant questions on Q&A sites. When developers fail to find solutions on Q&A sites in Chinese, they could translate their query and search on the English Q&A sites. However, Chinese developers who are non-native English speakers are often not comfortable asking or searching for questions in English, as they do not know the proper translation of the Chinese technical words into the English technical words. Furthermore, manually formulating cross-language queries and determining the importance of query words is tedious and time-consuming. To help Chinese developers take advantage of the rich knowledge base of the English version of Stack Overflow and simplify the retrieval process, we propose an automated cross-language relevant question retrieval tool (XSearch) to retrieve relevant English questions on Stack Overflow for a given Chinese question. This tool can address the increasing need for developers to solve technical problems by retrieving cross-language relevant Q&A resources. Demo Tool Website: http://172.93.36.10:8080/XSearch Demo Video: https://goo.gl/h57sed

automated software engineering | 2016

Towards efficient and effective automatic program repair

Xuan-Bach D. Le

Automatic Program Repair (APR) has recently been an emerging research area, addressing an important challenge in software engineering. APR techniques, if effective and efficient, can greatly help software debugging and maintenance. Recently proposed APR techniques can be generally classified into two families, namely search-based and semantics-based APR methods. To produce repairs, search-based APR techniques generate huge populations of possible repairs, i.e., search space, and lazily search for the best one among the search space. Semantics-based APR techniques utilize constraint solving and program synthesis to make the search space more tractable, and find those repairs that conform to semantics constraints extracted via symbolic execution. Despite recent advances in APR, search-based APR still suffers from the search space explosion problem, while semantics-based APR could be hindered by the limited capability of constraint solving and program synthesis. Furthermore, both APR families may be subject to overfitting, in which generated repairs do not generalize to other test sets. This thesis works towards enhancing both effectiveness and efficiency in order for APR to be practically adopted in the foreseeable future. To achieve this goal, other than using test cases as the primary criteria for traversing the search space, we designed a new feature used for a new search-based APR technique to effectively traverse the search space, wherein bug fix history is used to evaluate the quality of repair candidates. We also developed a deductive-reasoning-based repair technique that combines search-based and semantics-based approaches to enhance the repair capability, while ensuring the soundness of generated repairs. We also leveraged machine-learning techniques to build a predictive model that predicts whether an APR technique is effective in fixing particular bugs.
In the future, we plan to synergize many existing APR techniques, improve our predictive model, and adopt the advances of other fields such as test case generation and program synthesis for APR.
