Publication


Featured research published by Lucia.


Mining Software Repositories | 2012

Are faults localizable?

Lucia; Ferdian Thung; David Lo; Lingxiao Jiang

Many fault localization techniques have been proposed to facilitate debugging activities. Most of them attempt to pinpoint the location of faults (i.e., localize faults) based on a set of failing and correct executions, and expect developers to investigate a certain number of the located program elements to find the faults. These techniques thus assume that faults are localizable, i.e., that only one or a few lines of code, close to one another, are responsible for each fault. But in reality, are faults localizable? In this work, we investigate hundreds of real faults in several software systems and find that many faults, including faults with high severity levels, may not be localizable to a few lines of code.
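The study surveys faults rather than prescribing a localization formula, but a small sketch of the spectrum-based setup it assumes may help. Below is a minimal, hypothetical example using the standard Ochiai measure to rank statements by suspiciousness from failing and passing executions; all data and names are illustrative.

```python
# A minimal sketch of spectrum-based fault localization, the class of
# techniques this study examines. Ochiai is one standard suspiciousness
# measure; the paper itself does not prescribe a particular formula.
import math

def ochiai_scores(coverage, outcomes):
    """coverage: list of sets of statement ids executed per test run.
    outcomes: parallel list of booleans, True if the run failed."""
    total_failed = sum(outcomes)
    statements = set().union(*coverage)
    scores = {}
    for s in statements:
        failed_cover = sum(1 for cov, fail in zip(coverage, outcomes)
                           if fail and s in cov)
        passed_cover = sum(1 for cov, fail in zip(coverage, outcomes)
                           if not fail and s in cov)
        denom = math.sqrt(total_failed * (failed_cover + passed_cover))
        scores[s] = failed_cover / denom if denom else 0.0
    return scores

# Toy spectra: statement 2 is executed by both failing runs and no passing run.
coverage = [{1, 2, 3}, {1, 2}, {1, 3}, {1}]
outcomes = [True, True, False, False]
ranked = sorted(ochiai_scores(coverage, outcomes).items(), key=lambda kv: -kv[1])
print(ranked)  # statement 2 ranks first; a "localizable" fault sits near the top
```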


Automated Software Engineering | 2014

Fusion fault localizers

Lucia; David Lo; Xin Xia

Many spectrum-based fault localization techniques have been proposed to measure how likely each program element is to be the root cause of a program failure. For different bugs, the best technique to localize them may differ, due to the characteristics of the buggy programs and their program spectra. In this paper, we leverage the diversity of existing spectrum-based fault localization techniques to better localize bugs using data fusion methods. Our proposed approach consists of three steps: score normalization, technique selection, and data fusion. We investigate two score normalization methods, two technique selection methods, and five data fusion methods, resulting in twenty variants of Fusion Localizer. Our approach is bug specific: the set of techniques to be fused is adaptively selected for each buggy program based on its spectra. It also requires no training data, i.e., no execution traces of past buggy programs. We evaluate our approach on a common benchmark dataset and on a dataset of real bugs from three medium to large programs. Our evaluation demonstrates that our approach can significantly improve the effectiveness of existing state-of-the-art fault localization techniques. Compared to these techniques, the best variants of Fusion Localizer reduce, in a statistically significant way, the amount of code that must be inspected to find all bugs. They increase the proportion of bugs localized when developers inspect only the top 10% most suspicious program elements by more than 10%, and increase the number of bugs successfully localized when developers inspect at most 10 program blocks by more than 20%.
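As a rough illustration of the fusion step, the sketch below combines two hypothetical localizers using min-max normalization and the CombSUM rule, one of many possible normalizer/fusion pairings; the paper evaluates twenty variants and also selects techniques per bug, which is omitted here.

```python
# A minimal sketch of score normalization plus data fusion, assuming
# min-max normalization and CombSUM; the actual Fusion Localizer variants
# differ in their normalizer, technique selector, and fusion rule.
def minmax_normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {e: (s - lo) / span for e, s in scores.items()}

def combsum(technique_scores):
    """technique_scores: list of {program_element: suspiciousness} dicts,
    one per selected fault localization technique."""
    fused = {}
    for scores in technique_scores:
        for elem, s in minmax_normalize(scores).items():
            fused[elem] = fused.get(elem, 0.0) + s
    return sorted(fused, key=fused.get, reverse=True)

# Two hypothetical techniques disagree; fusion favors their consensus.
t1 = {"a": 0.9, "b": 0.5, "c": 0.1}
t2 = {"a": 0.4, "b": 0.8, "c": 0.2}
print(combsum([t1, t2]))  # ['b', 'a', 'c'] by fused score
```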


Automated Software Engineering | 2011

Search-based fault localization

Shaowei Wang; David Lo; Lingxiao Jiang; Lucia; Hoong Chuin Lau

Many spectrum-based fault localization measures have been proposed in the literature. However, no single measure completely outperforms the others: a measure that is more accurate in localizing some bugs in some programs is less accurate in localizing other bugs in other programs. This paper proposes to compose existing spectrum-based fault localization measures into an improved measure. We model the composition of various measures as an optimization problem and present a search-based approach that explores the space of many possible compositions and outputs a heuristically near-optimal composite measure. We employ two search-based strategies, genetic algorithms and simulated annealing, to look for optimal solutions, and compare the effectiveness of the resulting composite measures on benchmark software systems. Compared to individual spectrum-based fault localization techniques, our composite measures perform statistically significantly better.
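A minimal sketch of the search follows, assuming the composite measure is a non-negative weighted sum of individual measures and using simulated annealing to minimize the rank of the known faulty element over training bugs; the paper also uses genetic algorithms, and its composition form may be richer than a weighted sum.

```python
# A hypothetical search-based composition of fault localization measures
# via simulated annealing; all data and parameters here are illustrative.
import math, random

def composite(weights, measure_scores):
    """measure_scores: list of per-measure {element: score} dicts."""
    elems = measure_scores[0].keys()
    return {e: sum(w * m[e] for w, m in zip(weights, measure_scores))
            for e in elems}

def wasted_effort(scores, faulty_elem):
    """Rank of the known faulty element: lower means a better measure."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked.index(faulty_elem)

def anneal(training, n_measures, steps=2000, temp=1.0, cooling=0.995):
    """training: list of (measure_scores, faulty_elem) from past bugs."""
    weights = [random.random() for _ in range(n_measures)]
    cost = sum(wasted_effort(composite(weights, ms), f) for ms, f in training)
    for _ in range(steps):
        cand = [max(0.0, w + random.gauss(0, 0.1)) for w in weights]
        c = sum(wasted_effort(composite(cand, ms), f) for ms, f in training)
        # Accept improvements always; accept worsenings with shrinking odds.
        if c < cost or random.random() < math.exp((cost - c) / temp):
            weights, cost = cand, c
        temp *= cooling
    return weights

# Toy training set: two bugs, two measures each scoring three elements.
bugs = [([{"a": 0.2, "b": 0.9, "c": 0.1}, {"a": 0.8, "b": 0.3, "c": 0.1}], "b"),
        ([{"a": 0.1, "b": 0.7, "c": 0.6}, {"a": 0.2, "b": 0.9, "c": 0.3}], "b")]
print(anneal(bugs, n_measures=2))
```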


Automated Software Engineering | 2012

To what extent could we detect field defects? An empirical study of false negatives in static bug finding tools

Ferdian Thung; Lucia; David Lo; Lingxiao Jiang; Foyzur Rahman; Premkumar T. Devanbu

Software defects can cause significant losses. Static bug-finding tools are believed to help detect and remove defects. These tools are designed to find programming errors; but do they in fact help prevent the actual defects that occur in the field and are reported by users? If these tools had been used, would they have detected these field defects and generated warnings directing programmers to fix them? To answer these questions, we perform an empirical study that investigates the effectiveness of state-of-the-art static bug-finding tools on hundreds of reported and fixed defects extracted from three open source programs: Lucene, Rhino, and AspectJ. Our study addresses the question: to what extent could field defects be detected by state-of-the-art static bug-finding tools? Unlike past studies concerned with the number of false positives such tools produce, we address the orthogonal issue of the number of false negatives. We find that although many field defects could be detected by static bug-finding tools, a substantial proportion of defects could not be flagged. We also analyze which types of tool warnings are more effective in finding field defects and characterize the types of missed defects.
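The core measurement can be sketched as an intersection test between the lines a tool flags and the lines changed by each defect's fix; the data below is invented for illustration, and the real study additionally maps warnings and fixes across file revisions.

```python
# A minimal sketch of false-negative counting, assuming we already know
# the fixed lines of each field defect and the lines a static bug-finding
# tool flags; defect ids, files, and lines are hypothetical.
def false_negative_rate(defect_lines, warning_lines):
    """defect_lines: {defect_id: set of (file, line) changed by the fix}.
    warning_lines: set of (file, line) flagged by the tool."""
    missed = [d for d, lines in defect_lines.items()
              if not lines & warning_lines]
    return len(missed) / len(defect_lines), missed

defects = {"BUG-1": {("Parser.java", 42)}, "BUG-2": {("Util.java", 7)}}
warnings = {("Parser.java", 42), ("Other.java", 3)}
rate, missed = false_negative_rate(defects, warnings)
print(rate, missed)  # 0.5 ['BUG-2']: half the field defects go unflagged
```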


Programming Language Design and Implementation | 2011

kb-anonymity: a model for anonymized behavior-preserving test and debugging data

Aditya Budi; David Lo; Lingxiao Jiang; Lucia

It is often very expensive and practically infeasible to generate test cases that exercise all possible program states in a program, especially for a medium or large industrial system. In practice, industrial clients of the system often have a set of input data collected either before the system is built or after the deployment of a previous version. Such data are highly valuable: they represent the operations that matter in a client's daily business and may be used to test the system extensively. However, such data often carry sensitive information and cannot be released to third-party development houses. For example, a healthcare provider may have a set of patient records that are strictly confidential and cannot be used by any third party. Simply masking sensitive values alone may not be sufficient, as correlations among fields in the data can reveal the masked information. Also, masked data may exhibit different behavior in the system and become less useful than the original data for testing and debugging. For the purpose of releasing private data for testing and debugging, this paper proposes the kb-anonymity model, which combines the k-anonymity model commonly used in the data mining and database areas with the concept of program behavior preservation. Like k-anonymity, kb-anonymity replaces some information in the original data to ensure privacy preservation, so that the replaced data can be released to third-party developers. Unlike k-anonymity, kb-anonymity ensures that the replaced data exhibits the same kind of program behavior as the original data, so that the replaced data may still be useful for testing and debugging. We also provide a concrete version of the model under three particular configurations and have successfully applied our prototype implementation to three open source programs, demonstrating the utility and scalability of our prototype.
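The sketch below illustrates the two conditions kb-anonymity combines, using an invented two-branch program as the behavior oracle: a group of records is released only if it is at least k strong (as in k-anonymity) and all of its records drive the program down the same execution path (behavior preservation). A real implementation traces actual executions rather than a toy path function.

```python
# A minimal, hypothetical sketch of the kb-anonymity intuition; the toy
# program_path function stands in for real execution tracing.
def program_path(record):
    # Hypothetical program under test: branches on age and diagnosis fields.
    path = ["senior" if record["age"] >= 65 else "adult"]
    path.append("flagged" if record["diagnosis"] == "X" else "clear")
    return tuple(path)

def kb_anonymize(records, k):
    """Group records by execution path; release a group's representative
    values only if at least k originals share that behavior."""
    groups = {}
    for r in records:
        groups.setdefault(program_path(r), []).append(r)
    released = []
    for path, group in groups.items():
        if len(group) >= k:
            rep = group[0]  # every record in the group adopts one value set
            released += [dict(rep) for _ in group]
    return released

records = [{"age": 70, "diagnosis": "X"}, {"age": 68, "diagnosis": "X"},
           {"age": 30, "diagnosis": "Y"}]
print(kb_anonymize(records, k=2))  # only the 2-strong senior/flagged group
```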


International Conference on Software Engineering | 2012

Active refinement of clone anomaly reports

Lucia; David Lo; Lingxiao Jiang; Aditya Budi

Software clones have been widely studied in the recent literature and shown useful for finding bugs, because inconsistent changes among clones in a clone group may indicate potential bugs. However, many inconsistent clone groups are not real bugs (true positives). The excessive number of false positives could easily impede broad adoption of clone-based bug detection approaches. In this work, we aim to improve the usability of clone-based bug detection tools by increasing the rate of true positives found as a developer analyzes anomaly reports. Our idea is to control the number of anomaly reports a user sees at a time and to actively incorporate incremental user feedback to continually refine the anomaly reports. Our system first presents the top few anomaly reports from the list a tool generates in its default ordering. Users then either accept or reject each of the reports. Based on the feedback, our system automatically and iteratively refines a classification model for anomalies and re-sorts the remaining reports. Our goal is to present the true positives to users earlier than the default ordering would. The rationale is our observation that false positives among the inconsistent clone groups may share common features (in terms of code structure, programming patterns, etc.), and these features can be learned from incremental user feedback. We evaluate our refinement process on three sets of clone-based anomaly reports from three large real programs, the Linux kernel (C), Eclipse, and ArgoUML (Java), extracted by a clone-based anomaly detection tool. The results show that, compared to the original ordering of reports, we can improve the rate at which true positives are found (i.e., find them faster) by 11%, 87%, and 86% for the Linux kernel, Eclipse, and ArgoUML, respectively.
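A minimal sketch of the feedback loop follows, assuming a logistic-regression classifier from scikit-learn over precomputed report features; the paper does not fix the model or the features, so everything below is illustrative.

```python
# A hypothetical active-refinement loop: show a batch of reports, collect
# accept/reject feedback, refit a classifier, re-rank the remainder.
from sklearn.linear_model import LogisticRegression

def refine(reports, features, oracle, batch=3):
    """reports: ids in the tool's default order; features: {id: vector};
    oracle(id) -> True if the user accepts the report as a real bug."""
    remaining, seen_X, seen_y = list(reports), [], []
    model = None
    while remaining:
        if model is not None:  # re-sort by predicted true-positive odds
            remaining.sort(
                key=lambda r: -model.predict_proba([features[r]])[0][1])
        shown, remaining = remaining[:batch], remaining[batch:]
        for r in shown:
            seen_X.append(features[r])
            seen_y.append(oracle(r))
            yield r, seen_y[-1]
        if len(set(seen_y)) > 1:  # need both classes before fitting
            model = LogisticRegression().fit(seen_X, seen_y)

feats = {"r1": [1.0, 0.0], "r2": [0.9, 0.1], "r3": [0.0, 1.0],
         "r4": [0.1, 0.9], "r5": [0.8, 0.2], "r6": [0.2, 0.8]}
truth = {"r1": True, "r2": True, "r3": False, "r4": False,
         "r5": True, "r6": False}
for report, accepted in refine(list(feats), feats, truth.get, batch=2):
    print(report, accepted)  # later batches surface likely true positives first
```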


Extending Database Technology | 2011

Mining closed discriminative dyadic sequential patterns

David Lo; Hong Cheng; Lucia

Much data comes in sequential formats. In this study, we are interested in sequential data that comes in pairs. Many interesting datasets take this form across various domains, including parallel textual corpora, duplicate bug reports, and other pairs of related sequences of events. Our goal is to mine a set of closed discriminative dyadic sequential patterns from a database of sequence pairs, each belonging to one of two classes, +ve and -ve. These dyadic sequential patterns characterize the discriminating facets that contrast the two classes. They are potentially good features for the classification of dyadic sequential data: they can be used to characterize and flag correct and incorrect translations in parallel textual corpora, to automate the manual and time-consuming detection of duplicate bug reports, and more. We provide a solution to this new problem by proposing a new search space traversal strategy, a projected database structure, pruning properties, and novel mining algorithms. To demonstrate the scalability and utility of our solution, we experiment with both synthetic and real datasets. The results show that our solution is scalable, and that the mined patterns improve the accuracy of one possible downstream application, the detection of duplicate bug reports via pattern-based classification.
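To make "discriminative dyadic pattern" concrete, the sketch below scores a candidate pair of subsequences by the difference of its support in the +ve and -ve databases; the actual algorithm additionally enforces closure and prunes the search space with projected databases, all of which is omitted here.

```python
# A minimal, hypothetical scorer for dyadic sequential patterns: a pair of
# subsequence patterns, one per side of each sequence pair.
def is_subsequence(pattern, seq):
    it = iter(seq)
    return all(p in it for p in pattern)  # consumes seq left to right

def dyadic_support(pattern_pair, db):
    """db: list of (left_seq, right_seq); pattern_pair: (left_pat, right_pat)."""
    lp, rp = pattern_pair
    return sum(is_subsequence(lp, l) and is_subsequence(rp, r)
               for l, r in db) / len(db)

def discriminativeness(pattern_pair, pos_db, neg_db):
    return dyadic_support(pattern_pair, pos_db) - dyadic_support(pattern_pair, neg_db)

pos = [(["open", "read"], ["log", "close"]), (["open", "read"], ["log"])]
neg = [(["open"], ["close"]), (["read"], ["log"])]
print(discriminativeness((["open", "read"], ["log"]), pos, neg))  # 1.0
```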


International Conference on Software Maintenance | 2012

When would this bug get reported?

Ferdian Thung; David Lo; Lingxiao Jiang; Lucia; Foyzur Rahman; Premkumar T. Devanbu

Not all bugs in software are experienced and reported by end users right away. Some bugs manifest themselves quickly and may be reported by users a few days after they enter the code base; others manifest many months or even years later, and may only be experienced and reported by a small number of users. We refer to the period between the time a bug is introduced into the code and the time it is reported by a user as the bug reporting latency. Knowledge of bug reporting latencies has implications for prioritizing bug fixing activities: bugs with low reporting latencies may be fixed earlier than those with high latencies, shifting debugging resources toward the bugs that most concern users. To investigate bug reporting latencies, we analyze bugs from three Java software systems: AspectJ, Rhino, and Lucene. We extract bug reporting data from their version control repositories and bug tracking systems, identify bug locations based on bug fixes, and back-trace bug introducing times based on the change histories of the buggy code. We also remove non-essential changes and, most importantly, recover the root causes of bugs from their treatments/fixes. We then calculate the bug reporting latencies and find that bugs have diverse reporting latencies. Based on the calculated reporting latencies and features we extract from the bugs, we build classification models that predict whether a bug will be reported early (within 30 days) or later, which may help prioritize bug fixing activities. Our evaluation on the three software systems shows that our bug reporting latency prediction models achieve an AUC (Area Under the Receiver Operating Characteristic Curve) of 70.869%.
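A minimal sketch of the pipeline's final step follows, computing latencies from introduce/report dates and fitting a classifier for the early/late split; the features and model below are placeholders, not the paper's actual setup.

```python
# A hypothetical reporting-latency pipeline: label bugs as early (reported
# within 30 days of introduction) or late, then fit a classifier.
from datetime import date
from sklearn.ensemble import RandomForestClassifier

def latency_days(introduced, reported):
    return (reported - introduced).days

bugs = [
    {"introduced": date(2010, 1, 1), "reported": date(2010, 1, 10),
     "features": [120, 3]},   # invented features, e.g., lines and files touched
    {"introduced": date(2010, 1, 1), "reported": date(2011, 6, 1),
     "features": [15, 1]},
]
X = [b["features"] for b in bugs]
y = [latency_days(b["introduced"], b["reported"]) <= 30 for b in bugs]
model = RandomForestClassifier(n_estimators=50).fit(X, y)
print(model.predict([[80, 2]]))  # predicts whether a new bug is reported early
```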


ACM SIGSOFT Software Engineering Notes | 2014

Leveraging machine learning and information retrieval techniques in software evolution tasks: summary of the first MALIR-SE workshop, at ASE 2013

Lucia; David Lo; Giuseppe Scanniello; Alessandro Marchetto; Nasir Ali; Collin McMillan

The first International Workshop on MAchine Learning and Information Retrieval for Software Evolution (MALIR-SE) was held on the 11th of November 2013, in conjunction with the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE) in Silicon Valley, California, USA. The workshop brought together researchers and practitioners interested in leveraging machine learning and information retrieval techniques to automate various software evolution tasks. During the workshop, papers on applying machine learning and information retrieval techniques to bug fix time prediction and anti-pattern detection were presented. There were also discussions on the presented papers and on future directions of research in the area.


ACM SIGSOFT Software Engineering Notes | 2015

Improving Software Quality and Productivity Leveraging Mining Techniques: [Summary of the Second Workshop on Software Mining, at ASE 2013]

Ming Li; Hongyu Zhang; David Lo; Lucia

The second International Workshop on Software Mining (SoftMine) was held on the 11th of November 2013, in conjunction with the 28th IEEE/ACM International Conference on Automated Software Engineering (ASE) in Silicon Valley, California, USA. The workshop brought together researchers interested in mining various types of software-related data and in applying data mining techniques to support software engineering tasks. During the workshop, seven papers on software mining and behavior models, execution trace mining, and bug localization and fixing were presented; one received the best paper award. In addition, there were two invited talks by active researchers from the software engineering and data mining communities.

Collaboration


Dive into Lucia's collaborations.

Top Co-Authors

David Lo (Singapore Management University)
Lingxiao Jiang (Singapore Management University)
Aditya Budi (Singapore Management University)
Ferdian Thung (Singapore Management University)
Shaowei Wang (Singapore Management University)
Nasir Ali (École Normale Supérieure)
Foyzur Rahman (University of California)