Network


Latest external collaborations at the country level.

Hotspot


Research topics where Foyzur Rahman is active.

Publication


Featured research published by Foyzur Rahman.


Foundations of Software Engineering | 2010

The missing links: bugs and bug-fix commits

Adrian Bachmann; Christian Bird; Foyzur Rahman; Premkumar T. Devanbu; Abraham Bernstein

Empirical studies of software defects rely on links between bug databases and program code repositories. This linkage is typically based on bug-fixes identified in developer-entered commit logs. Unfortunately, developers do not always report which commits perform bug-fixes. Prior work suggests that such links can be a biased sample of the entire population of fixed bugs. The validity of statistical hypothesis testing based on linked data could well be affected by bias. Given the wide use of linked defect data, it is vital to gauge the nature and extent of the bias, and try to develop testable theories and models of the bias. To do this, we must establish ground truth: manually analyze a complete version history corpus, and nail down those commits that fix defects, and those that do not. This is a difficult task, requiring an expert to compare versions, analyze changes, find related bugs in the bug database, reverse-engineer missing links, and finally record their work for later use. This effort must be repeated for hundreds of commits to obtain a useful sample of reported and unreported bug-fix commits. We make several contributions. First, we present Linkster, a tool to facilitate link reverse-engineering. Second, we evaluate this tool, engaging a core developer of the Apache HTTP web server project to exhaustively annotate 493 commits that occurred during a six-week period. Finally, we analyze this comprehensive data set, showing that there are serious and consequential problems in the data.
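
As an illustration of the conventional linking heuristic whose bias this paper studies, the sketch below recovers bug-fix links by scanning commit logs for bug IDs. The commit messages, bug IDs, and regular expression are invented for the example, not taken from the paper; the point is that only links developers bothered to record in the log can be found this way.

```python
import re

# Hypothetical commit records: (commit_id, log_message).
commits = [
    ("a1b2c3", "Fix crash in parser, closes bug #4711"),
    ("d4e5f6", "Refactor config loading"),
    ("0a1b2c", "Bug 4712: handle empty input"),
]

# IDs of bugs marked FIXED in the bug database (assumed available).
fixed_bugs = {"4711", "4712", "4713"}

BUG_ID = re.compile(r"(?:bug\s*#?|#)(\d+)", re.IGNORECASE)

def recover_links(commits, fixed_bugs):
    """Link commits to bugs by scanning log messages for bug IDs.

    This only finds the links developers recorded, which is exactly
    the source of the sampling bias the paper investigates.
    """
    links = {}
    for commit_id, message in commits:
        ids = {m for m in BUG_ID.findall(message) if m in fixed_bugs}
        if ids:
            links[commit_id] = ids
    return links

print(recover_links(commits, fixed_bugs))
# {'a1b2c3': {'4711'}, '0a1b2c': {'4712'}}
```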


Mining Software Repositories | 2010

Clones: What is that smell?

Foyzur Rahman; Christian Bird; Premkumar T. Devanbu

Clones are generally considered bad programming practice in software engineering folklore. They are identified as a bad smell and a major contributor to project maintenance difficulties. Clones inherently cause code bloat, thus increasing project size and maintenance costs. In this work, we try to validate the conventional wisdom empirically to see whether cloning makes code more defect prone. This paper analyses the relationship between cloning and defect proneness. We find that, first, the great majority of bugs are not significantly associated with clones. Second, we find that clones may be less defect prone than non-cloned code. Finally, we find little evidence that clones with more copies are actually more error prone. Our findings do not support the claim that clones are really a “bad smell”. Perhaps we can clone, and breathe easy, at the same time.
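
A minimal sketch of the kind of comparison such a study performs, using invented per-file counts: compare the defect density of cloned code against that of non-cloned code.

```python
# Hypothetical per-file measurements: lines of cloned / non-cloned code
# and the number of fixed defects falling in each region.
files = [
    {"cloned_loc": 120, "cloned_bugs": 1, "other_loc": 880, "other_bugs": 9},
    {"cloned_loc": 40,  "cloned_bugs": 0, "other_loc": 460, "other_bugs": 6},
    {"cloned_loc": 200, "cloned_bugs": 2, "other_loc": 800, "other_bugs": 7},
]

def defect_density(loc_key, bug_key):
    """Defects per line of code, aggregated over all files."""
    loc = sum(f[loc_key] for f in files)
    bugs = sum(f[bug_key] for f in files)
    return bugs / loc

cloned = defect_density("cloned_loc", "cloned_bugs")
other = defect_density("other_loc", "other_bugs")
print(f"cloned: {cloned:.4f}, non-cloned: {other:.4f}")
# If cloned < other, the data would echo the paper's finding that
# clones may be *less* defect prone than non-cloned code.
```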


Foundations of Software Engineering | 2013

Sample size vs. bias in defect prediction

Foyzur Rahman; Daryl Posnett; Israel Herraiz; Premkumar T. Devanbu

Most empirical disciplines promote the reuse and sharing of datasets, as it leads to greater possibility of replication. While this is increasingly the case in Empirical Software Engineering, some of the most popular bug-fix datasets are now known to be biased. This raises two significant concerns: first, that sample bias may lead to underperforming prediction models, and second, that the external validity of the studies based on biased datasets may be suspect. This issue has raised considerable consternation in the ESE literature in recent years. However, there is a confounding factor of these datasets that has not been examined carefully: size. Biased datasets are sampling only some of the data that could be sampled, and doing so in a biased fashion; but biased samples could be smaller, or larger. Smaller datasets in general provide less reliable bases for estimating models, and thus could lead to inferior model performance. In this setting, we ask the question: what affects performance more, bias or size? We conduct a detailed, large-scale meta-analysis, using simulated datasets sampled with bias from a high-quality dataset which is relatively free of bias. Our results suggest that size always matters just as much as bias direction, and in fact much more than bias direction when considering information-retrieval measures such as AUCROC and F-score. This indicates that at least for prediction models, even when dealing with sampling bias, simply finding larger samples can sometimes be sufficient. Our analysis also exposes the complexity of the bias issue, and raises further issues to be explored in the future.
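
The sketch below mimics that experimental design in miniature: draw samples of varying size and bias from a synthetic "ground truth" dataset, fit a predictor, and compare AUC on held-out data. The data generation, the bias weighting, and the choice of scikit-learn logistic regression are all assumptions for illustration, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Synthetic "unbiased" corpus: features X, defect labels y.
n = 5000
X = rng.normal(size=(n, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 1).astype(int)

X_test, y_test = X[4000:], y[4000:]          # held-out evaluation set
pool_X, pool_y = X[:4000], y[:4000]          # pool to sample from

def biased_sample(size, bias):
    """Sample `size` rows, over-weighting defective rows by `bias`."""
    w = np.where(pool_y == 1, bias, 1.0)
    idx = rng.choice(len(pool_y), size=size, replace=False, p=w / w.sum())
    return pool_X[idx], pool_y[idx]

for size in (100, 400, 1600):
    for bias in (1.0, 3.0):  # 1.0 = unbiased, 3.0 = defect-heavy sample
        Xs, ys = biased_sample(size, bias)
        model = LogisticRegression(max_iter=1000).fit(Xs, ys)
        auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
        print(f"size={size:5d} bias={bias}: AUC={auc:.3f}")
# In the paper's large-scale version of this design, sample size mattered
# at least as much as bias direction for AUC-style measures.
```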


International Conference on Software Engineering | 2014

Comparing static bug finders and statistical prediction

Foyzur Rahman; Sameer Khatri; Earl T. Barr; Premkumar T. Devanbu

The all-important goal of delivering better software at lower cost has led to a vital, enduring quest for ways to find and remove defects efficiently and accurately. To this end, two parallel lines of research have emerged over recent years. Static analysis seeks to find defects using algorithms that process well-defined semantic abstractions of code. Statistical defect prediction uses historical data to estimate parameters of statistical formulae modeling the phenomena thought to govern defect occurrence and predict where defects are likely to occur. These two approaches have emerged from distinct intellectual traditions and have largely evolved independently, in “splendid isolation”. In this paper, we evaluate these two (largely) disparate approaches on a similar footing. We use historical defect data to appraise the two approaches, compare them, and seek synergies. We find that under some accounting principles, they provide comparable benefits; we also find that in some settings, the performance of certain static bug-finders can be enhanced using information provided by statistical defect prediction.
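
One way to put the two approaches on a similar footing, sketched below with invented per-file data: rank files either by a statistical prediction score or by static-tool warning density, and measure the fraction of field defects found within a fixed inspection budget of source lines. This cost-aware measure is in the spirit of the paper's "accounting principles" but is an assumption here, not the paper's exact metric.

```python
# Hypothetical per-file data:
# (sloc, field_defects, prediction_score, static_tool_warnings)
files = [
    (1200, 3, 0.80, 5),
    (300,  0, 0.10, 1),
    (700,  2, 0.60, 0),
    (150,  1, 0.40, 2),
    (900,  0, 0.30, 4),
]

def defects_found(rank_key, budget=0.5):
    """Fraction of defects found when inspecting files in ranked order
    until `budget` of the total SLOC has been spent."""
    order = sorted(files, key=rank_key, reverse=True)
    total_sloc = sum(f[0] for f in files)
    total_defects = sum(f[1] for f in files)
    spent, found = 0, 0
    for sloc, defects, _, _ in order:
        if spent + sloc > budget * total_sloc:
            break
        spent += sloc
        found += defects
    return found / total_defects

print("prediction:", defects_found(lambda f: f[2]))          # rank by score
print("bug finder:", defects_found(lambda f: f[3] / f[0]))   # warning density
```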


Automated Software Engineering | 2012

To what extent could we detect field defects? An empirical study of false negatives in static bug finding tools

Ferdian Thung; Lucia; David Lo; Lingxiao Jiang; Foyzur Rahman; Premkumar T. Devanbu

Software defects can cause significant losses. Static bug-finding tools are believed to help detect and remove defects. These tools are designed to find programming errors; but do they in fact help prevent actual defects that occur in the field and are reported by users? If these tools had been used, would they have detected these field defects and generated warnings that would direct programmers to fix them? To answer these questions, we perform an empirical study that investigates the effectiveness of state-of-the-art static bug-finding tools on hundreds of reported and fixed defects extracted from three open source programs: Lucene, Rhino, and AspectJ. Our study addresses the question: to what extent could field defects be found and detected by state-of-the-art static bug-finding tools? Unlike past studies, which are concerned with the number of false positives produced by such tools, we address the orthogonal issue of the number of false negatives. We find that although many field defects could be detected by static bug-finding tools, a substantial proportion of defects could not be flagged. We also analyze the types of tool warnings that are more effective in finding field defects and characterize the types of missed defects.
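
A toy version of the false-negative measurement, with invented defect locations, warnings, and matching tolerance: count a field defect as detected if some tool warning lands near it in the same file, and report the defects no warning covers.

```python
# Known field-defect locations (file, line) recovered from bug fixes,
# and warnings emitted by a static bug finder (file, line, category).
defects = {("Parser.java", 120), ("Config.java", 45), ("Lexer.java", 8)}
warnings = [
    ("Parser.java", 118, "NULL_DEREF"),
    ("Util.java",   30,  "DEAD_CODE"),
    ("Lexer.java",  8,   "ARRAY_BOUNDS"),
]

TOLERANCE = 3  # count a warning as a hit if it lands within a few lines

def flagged(defect):
    f, line = defect
    return any(wf == f and abs(wl - line) <= TOLERANCE
               for wf, wl, _ in warnings)

hits = {d for d in defects if flagged(d)}
misses = defects - hits  # the tool's false negatives
print(f"detected {len(hits)}/{len(defects)}; false negatives: {misses}")
```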


Foundations of Software Engineering | 2010

LINKSTER: enabling efficient manual inspection and annotation of mined data

Christian Bird; Adrian Bachmann; Foyzur Rahman; Abraham Bernstein

While many uses of mined software engineering data are automatic in nature, some techniques and studies either require, or can be improved by, manual methods. Unfortunately, manually inspecting, analyzing, and annotating mined data can be difficult and tedious, especially when information from multiple sources must be integrated. Oddly, while there are numerous tools and frameworks for automatically mining and analyzing data, there is a dearth of tools that facilitate manual methods. To fill this void, we have developed LINKSTER, a tool that integrates data from bug databases, source code repositories, and mailing list archives to allow manual inspection and annotation. LINKSTER has already been used successfully by an OSS project lead to obtain data for one empirical study.
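
A toy sketch of the cross-source join at the heart of such a tool, assuming invented commit, bug-tracker, and mailing-list records; LINKSTER's actual data model is richer than this.

```python
# Invented records from the three sources LINKSTER integrates.
commits = [{"sha": "a1b2c3", "author": "dev@apache.org",
            "log": "fix for PR 4711", "files": ["server/core.c"]}]
bugs = [{"id": "4711", "status": "FIXED", "summary": "segfault on reload"}]
mails = [{"from": "dev@apache.org", "subject": "Re: PR 4711 patch review"}]

def annotate_view(commit):
    """Collect everything mentioning the same bug ID, so a human can
    inspect the commit, bug report, and discussion side by side."""
    related_bugs = [b for b in bugs if b["id"] in commit["log"]]
    related_mail = [m for m in mails
                    if any(b["id"] in m["subject"] for b in related_bugs)]
    return {"commit": commit, "bugs": related_bugs, "mails": related_mail}

view = annotate_view(commits[0])
print(view["bugs"][0]["summary"], "|", view["mails"][0]["subject"])
```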


International Conference on Software Maintenance | 2012

When would this bug get reported?

Ferdian Thung; David Lo; Lingxiao Jiang; Lucia; Foyzur Rahman; Premkumar T. Devanbu

Not all bugs in software are experienced and reported by end users right away: some bugs manifest themselves quickly and may be reported by users a few days after they get into the code base; others manifest many months or even years later, and may only be experienced and reported by a small number of users. We refer to the period of time between when a bug is introduced into the code and when it is reported by a user as the bug reporting latency. Knowledge of bug reporting latencies has implications for the prioritization of bug-fixing activities: bugs with low reporting latencies may be fixed earlier than those with high latencies, shifting debugging resources towards the bugs that most concern users. To investigate bug reporting latencies, we analyze bugs from three Java software systems: AspectJ, Rhino, and Lucene. We extract bug reporting data from their version control repositories and bug tracking systems, identify bug locations based on bug fixes, and back-trace bug introducing time based on change histories of the buggy code. We also remove non-essential changes and, most importantly, recover root causes of bugs from their treatments/fixes. We then calculate the bug reporting latencies, and find that bugs have diverse reporting latencies. Based on the calculated reporting latencies and features we extract from bugs, we build classification models that can predict whether a bug will be reported early (within 30 days) or later, which may be helpful for prioritizing bug-fixing activities. Our evaluation on the three software systems shows that our bug reporting latency prediction models achieve an AUC (Area Under the Receiver Operating Characteristic Curve) of 70.869%.
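
A minimal sketch of the latency computation and the early/late labeling, with invented bug records and dates; the 30-day threshold is the one the abstract describes.

```python
from datetime import date

# Hypothetical bugs: introduction date (back-traced from change history)
# and the date a user first reported the bug.
bugs = [
    {"id": "B-1", "introduced": date(2011, 1, 10), "reported": date(2011, 1, 25)},
    {"id": "B-2", "introduced": date(2010, 6, 1),  "reported": date(2011, 3, 15)},
    {"id": "B-3", "introduced": date(2011, 2, 1),  "reported": date(2011, 2, 20)},
]

EARLY_DAYS = 30  # the paper's threshold for an "early" report

for bug in bugs:
    latency = (bug["reported"] - bug["introduced"]).days
    label = "early" if latency <= EARLY_DAYS else "late"
    print(f'{bug["id"]}: latency={latency}d -> {label}')
# The paper goes further: it trains classifiers on features of the buggy
# code to predict this early/late label before any report arrives.
```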


Foundations of Software Engineering | 2012

Recalling the "imprecision" of cross-project defect prediction

Foyzur Rahman; Daryl Posnett; Premkumar T. Devanbu


International Conference on Software Engineering | 2013

How, and why, process metrics are better

Foyzur Rahman; Premkumar T. Devanbu


International Conference on Software Engineering | 2011

Ownership, experience and defects: a fine-grained study of authorship

Foyzur Rahman; Premkumar T. Devanbu

Collaboration


Foyzur Rahman's top co-authors.

Top Co-Authors

Daryl Posnett, University of California
David Lo, Singapore Management University
Ferdian Thung, Singapore Management University
Lingxiao Jiang, Singapore Management University
Lucia, Singapore Management University
Earl T. Barr, University College London