Nicolas Bettenburg
Queen's University
Publications
Featured research published by Nicolas Bettenburg.
Foundations of Software Engineering | 2008
Nicolas Bettenburg; Sascha Just; Adrian Schröter; Cathrin Weiss; Rahul Premraj; Thomas Zimmermann
In software development, bug reports provide crucial information to developers. However, these reports differ widely in their quality. We conducted a survey among developers and users of APACHE, ECLIPSE, and MOZILLA to find out what makes a good bug report. The analysis of the 466 responses revealed an information mismatch between what developers need and what users supply. Most developers consider steps to reproduce, stack traces, and test cases the most helpful, yet these are, at the same time, the most difficult for users to provide. Such insight is helpful for designing new bug tracking tools that guide users in collecting and providing more helpful information. Our CUEZILLA prototype is such a tool: it measures the quality of new bug reports and recommends which elements should be added to improve their quality. We trained CUEZILLA on a sample of 289 bug reports, rated by developers as part of the survey. The participants of our survey also provided 175 comments on hurdles in reporting and resolving bugs. Based on these comments, we discuss several recommendations for better bug tracking systems, which should focus on engaging bug reporters, better tool support, and improved handling of bug duplicates.
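As a rough illustration of the kind of quality rating CUEZILLA performs, the sketch below scores a bug report by checking for the elements the survey identified as most helpful (steps to reproduce, stack traces, test cases) and recommends whatever is missing. The patterns, weights, and the rate_report helper are hypothetical assumptions for this sketch, not the actual CUEZILLA implementation.

import re

# Hypothetical patterns and weights for the elements the survey names as most
# helpful; these are illustrative, not CUEZILLA's trained model.
ELEMENT_PATTERNS = {
    "steps to reproduce": re.compile(r"steps? to reproduce|\b1\.\s.+\n\s*2\.", re.I),
    "stack trace":        re.compile(r"^\s+at [\w.$]+\(.*\)\s*$", re.M),
    "test case":          re.compile(r"\bjunit\b|\bassert\w*\(|\btest case\b", re.I),
}
WEIGHTS = {"steps to reproduce": 0.4, "stack trace": 0.35, "test case": 0.25}

def rate_report(text: str):
    """Return a [0, 1] quality score and a list of missing elements to recommend."""
    present = {name for name, pattern in ELEMENT_PATTERNS.items() if pattern.search(text)}
    score = sum(WEIGHTS[name] for name in present)
    missing = sorted(set(WEIGHTS) - present)
    return score, missing

if __name__ == "__main__":
    report = "The editor crashes on save.\n  at org.eclipse.ui.Editor.save(Editor.java:42)"
    score, missing = rate_report(report)
    print(f"quality score: {score:.2f}, consider adding: {', '.join(missing)}")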
International Conference on Software Maintenance | 2008
Nicolas Bettenburg; Rahul Premraj; Thomas Zimmermann; Sunghun Kim
In a survey, we found that most developers have experienced duplicate bug reports; however, only a few considered them a serious problem. This contradicts popular wisdom, which holds that bug duplicates are a serious problem for open source projects. In the survey, developers also pointed out that the additional information provided by duplicates helps to resolve bugs more quickly. In this paper, we therefore propose to merge bug duplicates rather than treating them separately. We quantify the amount of information that duplicates add for developers and show that automatic triaging can be improved as well. In addition, we discuss the different reasons why users submit duplicate bug reports in the first place.
Mining Software Repositories | 2008
Nicolas Bettenburg; Rahul Premraj; Thomas Zimmermann; Sunghun Kim
In software engineering experiments, the description of bug reports is typically treated as natural language text, although it often contains stack traces, source code, and patches. Neglecting such structural elements means losing valuable information; structure usually leads to better performance of machine learning approaches. In this paper, we present a tool called infoZilla that detects structural elements in bug reports with near-perfect accuracy and allows us to extract them. We anticipate that infoZilla can be used to leverage data from bug reports at a different level of granularity, which can facilitate interesting research in the future.
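A minimal sketch of this kind of structural-element extraction is shown below; the regular expressions and the extract_elements helper are illustrative assumptions and not infoZilla's actual detection rules.

import re

# Hypothetical patterns for the structural elements named in the abstract
# (patches, stack traces, source code); real detection rules are more involved.
PATCH_HUNK  = re.compile(r"^@@ -\d+(?:,\d+)? \+\d+(?:,\d+)? @@.*$", re.M)
STACK_FRAME = re.compile(r"^\s*at [\w.$]+\([^)]*\)\s*$", re.M)
CODE_LINE   = re.compile(r"^\s*(?:public|private|class|void|if|for|while)\b.*[;{]\s*$", re.M)

def extract_elements(report: str) -> dict:
    """Split a raw bug report into structural elements and the remaining prose."""
    elements = {
        "patch_hunks": PATCH_HUNK.findall(report),
        "stack_frames": STACK_FRAME.findall(report),
        "code_lines": CODE_LINE.findall(report),
    }
    prose = report
    for fragments in elements.values():
        for fragment in fragments:
            prose = prose.replace(fragment, "")
    elements["natural_language"] = "\n".join(
        line for line in prose.splitlines() if line.strip()
    )
    return elements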
IEEE Transactions on Software Engineering | 2010
Thomas Zimmermann; Rahul Premraj; Nicolas Bettenburg; Sascha Just; Adrian Schröter; Cathrin Weiss
In software development, bug reports provide crucial information to developers. However, these reports differ widely in their quality. We conducted a survey among developers and users of APACHE, ECLIPSE, and MOZILLA to find out what makes a good bug report. The analysis of the 466 responses revealed an information mismatch between what developers need and what users supply. Most developers consider steps to reproduce, stack traces, and test cases the most helpful, yet these are, at the same time, the most difficult for users to provide. Such insight is helpful for designing new bug tracking tools that guide users in collecting and providing more helpful information. Our CUEZILLA prototype is such a tool: it measures the quality of new bug reports and recommends which elements should be added to improve their quality. We trained CUEZILLA on a sample of 289 bug reports, rated by developers as part of the survey. The participants of our survey also provided 175 comments on hurdles in reporting and resolving bugs. Based on these comments, we discuss several recommendations for better bug tracking systems, which should focus on engaging bug reporters, better tool support, and improved handling of bug duplicates.
Working Conference on Reverse Engineering | 2009
Nicolas Bettenburg; Weiyi Shang; Walid M. Ibrahim; Bram Adams; Ying Zou; Ahmed E. Hassan
Current research on code clones tries to address the question of whether code clones are harmful to software quality. As most of these studies are based on fine-grained analysis of inconsistent changes at the revision level, they capture much of the chaotic and experimental nature inherent in any ongoing software development process. Conclusions drawn from the inspection of highly fluctuating and short-lived clones are likely to exaggerate the ill effects of inconsistent changes. To gain a broader perspective, we perform an empirical study on the effect of inconsistent changes on software quality at the release level. Based on a case study of two open source software systems, we observe that only 1% to 3% of inconsistent changes to clones introduce software defects, as opposed to the substantially higher percentages reported by other studies. Our findings suggest that developers are able to effectively manage and control the evolution of cloned code at the release level.
Eclipse Technology Exchange | 2007
Nicolas Bettenburg; Sascha Just; Adrian Schröter; Cathrin Weiß; Rahul Premraj; Thomas Zimmermann
The information in bug reports influences the speed at which bugs are fixed. However, bug reports differ in the quality of their information. We conducted a survey among ECLIPSE developers to determine which information in reports they use most and which problems they encounter most frequently. Our results show that steps to reproduce and stack traces are the most sought after by developers, while inaccurate steps to reproduce and incomplete information pose the largest hurdles. Surprisingly, developers are indifferent to bug duplicates. Such insight is useful for designing new bug tracking tools that guide reporters in providing more helpful information. We also present a prototype of a quality-meter tool that measures the quality of bug reports by scanning their content.
Mining Software Repositories | 2012
Nicolas Bettenburg; Meiyappan Nagappan; Ahmed E. Hassan
Much research energy in software engineering is focused on the creation of effort and defect prediction models. Such models are important means for practitioners to judge their current project situation, optimize the allocation of their resources, and make informed future decisions. However, software engineering data contains a large amount of variability. Recent research demonstrates that such variability leads to poor fits of machine learning models to the underlying data, and suggests splitting datasets into more fine-grained subsets with similar properties. In this paper, we present a comparison of three different approaches for creating statistical regression models to model and predict software defects and development effort. Global models are trained on the whole dataset. In contrast, local models are trained on subsets of the dataset. Finally, we build a global model that takes into account local characteristics of the data. We evaluate the performance of these three approaches in a case study on two defect and two effort datasets. We find that for both types of data, local models show a significantly better fit to the data than global models. The substantial improvements in both relative and absolute prediction errors demonstrate that this increased goodness of fit is valuable in practice. Our experiments also suggest that trends obtained from global models are too general for practical recommendations, while local models provide a multitude of trends that are valid only for specific subsets of the data. We therefore advocate the use of trends obtained from global models that take local characteristics into account, as they combine the best of both worlds.
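The sketch below illustrates the global-versus-local comparison on synthetic data using scikit-learn; the choice of plain linear regression, k-means clustering, and four clusters are assumptions made for illustration and do not reproduce the paper's datasets or modeling choices.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))          # stand-in for code and process metrics
y = X @ [1.5, -2.0, 0.5, 3.0] + np.where(X[:, 0] > 0, 5.0, -5.0) + rng.normal(size=500)

# Global model: one fit over the entire dataset.
global_model = LinearRegression().fit(X, y)
global_mae = mean_absolute_error(y, global_model.predict(X))

# Local models: cluster the data first, then fit one model per cluster.
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
local_pred = np.empty_like(y)
for c in np.unique(clusters):
    idx = clusters == c
    local_pred[idx] = LinearRegression().fit(X[idx], y[idx]).predict(X[idx])
local_mae = mean_absolute_error(y, local_pred)

print(f"global MAE: {global_mae:.2f}  local MAE: {local_mae:.2f}")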
International Conference on Program Comprehension | 2010
Nicolas Bettenburg; Ahmed E. Hassan
Correcting software defects consumes a significant amount of resources, such as time, money, and personnel. To be able to focus testing efforts where they are needed most, researchers have studied statistical models that predict in which parts of a software system future defects are likely to occur. By studying the mathematical relations between the predictor variables used in these models, researchers can gain a better understanding of the important connections between development activities and software quality. Predictor variables used in past top-performing models are largely based on file-oriented measures, such as source code and churn metrics. However, source code is the end product of numerous interlaced and collaborative activities carried out by developers. Traces of such activities can be found in the repositories used to manage development efforts. In this paper, we investigate statistical models to study the impact of social structures between developers and end-users on software quality. These models use predictor variables based on social information mined from the issue tracking and version control repositories of a large open-source software project. The results of our case study are promising and indicate that statistical models based on social information have a similar degree of explanatory power as traditional models. Furthermore, our findings suggest that social information does not substitute, but rather augments, traditional product and process-based metrics used in defect prediction models.
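The sketch below illustrates, on synthetic data, how social predictors might augment traditional metrics in a defect prediction model; the feature names (commenters, replies) and the logistic-regression setup are hypothetical and are not the study's actual variables or models.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
n = 1000
churn      = rng.poisson(20, n)     # traditional process metric (hypothetical)
loc        = rng.poisson(400, n)    # traditional product metric (hypothetical)
commenters = rng.poisson(3, n)      # social: distinct people discussing a file's issues (hypothetical)
replies    = rng.poisson(5, n)      # social: length of those discussions (hypothetical)
logit = 0.03 * churn + 0.002 * loc + 0.25 * commenters + 0.05 * replies - 3.5
defective = rng.random(n) < 1 / (1 + np.exp(-logit))

traditional = np.column_stack([churn, loc])
combined    = np.column_stack([churn, loc, commenters, replies])

for name, features in [("traditional", traditional), ("traditional + social", combined)]:
    model = LogisticRegression(max_iter=1000).fit(features, defective)
    auc = roc_auc_score(defective, model.predict_proba(features)[:, 1])
    print(f"{name:>22}: in-sample AUC = {auc:.3f}")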
International Conference on Software Maintenance | 2009
Nicolas Bettenburg; Emad Shihab; Ahmed E. Hassan
Mailing list repositories contain valuable information about the history of a project. Research is starting to mine this information to support developers and maintainers of long-lived software projects. However, such information exists as unstructured data that needs special processing before it can be studied. In this paper, we identify several challenges that arise when using off-the-shelf techniques for processing mailing list data. Our study highlights the importance of proper processing of mailing list data to ensure accurate research results.
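The sketch below shows the kind of preprocessing such mailing list data typically requires, using Python's standard mailbox module; the archive file name and the simple quote- and signature-stripping heuristics are assumptions for illustration only.

import mailbox

def clean_body(message) -> str:
    """Extract the plain-text body and drop quoted replies and signatures."""
    if message.is_multipart():
        parts = [p.get_payload(decode=True) or b""
                 for p in message.walk() if p.get_content_type() == "text/plain"]
        raw = b"\n".join(parts).decode(errors="replace")
    else:
        raw = (message.get_payload(decode=True) or b"").decode(errors="replace")
    lines = []
    for line in raw.splitlines():
        if line.startswith(">"):          # quoted text from earlier messages
            continue
        if line.strip() == "--":          # conventional signature separator
            break
        lines.append(line)
    return "\n".join(lines).strip()

if __name__ == "__main__":
    for msg in mailbox.mbox("dev-list.mbox"):     # hypothetical archive file
        print(msg.get("Subject", "(no subject)"))
        print(clean_body(msg)[:200], "\n")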
Science of Computer Programming | 2012
Nicolas Bettenburg; Weiyi Shang; Walid M. Ibrahim; Bram Adams; Ying Zou; Ahmed E. Hassan
To study the impact of code clones on software quality, researchers typically carry out their studies based on fine-grained analysis of inconsistent changes at the revision level. As a result, they capture much of the chaotic and experimental nature inherent in any ongoing software development process. Analyzing highly fluctuating and short-lived clones is likely to exaggerate the ill effects of inconsistent changes on the quality of the released software product, as perceived by the end user. To gain a broader perspective, we perform an empirical study on the effect of inconsistent changes on software quality at the release level. Based on a case study of three open source software systems, we observe that only 1.02% to 4.00% of all clone genealogies introduce software defects at the release level, as opposed to the substantially higher percentages reported by previous studies at the revision level. Our findings suggest that clones do not have a significant impact on the post-release quality of the studied systems, and that developers are able to effectively manage the evolution of cloned code.