Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Chakkrit Tantithamthavorn is active.

Publication


Featured research published by Chakkrit Tantithamthavorn.


International Conference on Software Engineering | 2016

Automated parameter optimization of classification techniques for defect prediction models

Chakkrit Tantithamthavorn; Shane McIntosh; Ahmed E. Hassan; Ken-ichi Matsumoto

Defect prediction models are classifiers that are trained to identify defect-prone software modules. Such classifiers have configurable parameters that control their characteristics (e.g., the number of trees in a random forest classifier). Recent studies show that these classifiers may underperform due to the use of suboptimal default parameter settings. However, it is impractical to assess all of the possible settings in the parameter spaces. In this paper, we investigate the performance of defect prediction models where Caret, an automated parameter optimization technique, has been applied. Through a case study of 18 datasets from systems that span both proprietary and open source domains, we find that (1) Caret improves the AUC performance of defect prediction models by as much as 40 percentage points; (2) Caret-optimized classifiers are at least as stable as (with 35% of them being more stable than) classifiers that are trained using the default settings; and (3) Caret increases the likelihood of producing a top-performing classifier by as much as 83%. Hence, we conclude that parameter settings can indeed have a large impact on the performance of defect prediction models, suggesting that researchers should experiment with the parameters of the classification techniques. Since automated parameter optimization techniques like Caret yield substantial benefits in terms of performance improvement and stability, while incurring a manageable additional computational cost, they should be included in future defect prediction studies.
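
Caret is an R toolkit; as a rough sketch of the same idea in Python, the example below grid-searches a random forest's parameters by cross-validated AUC using scikit-learn. The parameter grid and synthetic dataset are assumptions for illustration, not the paper's setup:

```python
# Minimal sketch of automated parameter optimization for a defect
# prediction model, analogous to Caret's grid search (the paper uses
# the R Caret package; this scikit-learn setup is an assumption).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for a defect dataset (module metrics + labels).
X, y = make_classification(n_samples=500, n_features=20, weights=[0.9],
                           random_state=42)

# Candidate settings for the classifier's configurable parameters,
# e.g. the number of trees in a random forest.
param_grid = {"n_estimators": [10, 50, 100, 500],
              "max_features": ["sqrt", "log2", None]}

# Optimize parameters by cross-validated AUC instead of relying on
# the library's default settings.
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, scoring="roc_auc", cv=10)
search.fit(X, y)
print("Best AUC: %.3f with %s" % (search.best_score_, search.best_params_))
```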


IEEE International Conference on Software Analysis, Evolution and Reengineering | 2015

Who should review my code? A file location-based code-reviewer recommendation approach for Modern Code Review

Patanamon Thongtanunam; Chakkrit Tantithamthavorn; Raula Gaikovina Kula; Norihiro Yoshida; Hajimu Iida; Ken-ichi Matsumoto

Software code review is an inspection of a code change by an independent third-party developer in order to identify and fix defects before integration. Effectively performing code review can improve the overall software quality. In recent years, Modern Code Review (MCR), a lightweight and tool-based code inspection, has been widely adopted in both proprietary and open-source software systems. Finding appropriate code-reviewers in MCR is a necessary step of reviewing a code change. However, little is known about the difficulty of finding code-reviewers in a distributed software development setting and its impact on reviewing time. In this paper, we investigate the impact that the code-reviewer assignment problem has on reviewing time. We find that reviews with a code-reviewer assignment problem take 12 days longer to approve a code change. To help developers find appropriate code-reviewers, we propose RevFinder, a file location-based code-reviewer recommendation approach. We leverage the similarity of previously reviewed file paths to recommend an appropriate code-reviewer. The intuition is that files located in similar file paths would be managed and reviewed by code-reviewers with similar experience. Through an empirical evaluation on a case study of 42,045 reviews of the Android Open Source Project (AOSP), OpenStack, Qt and LibreOffice projects, we find that RevFinder correctly recommended code-reviewers for 79% of reviews within its top-10 recommendations. RevFinder also correctly recommended the code-reviewers with a median rank of 4. The overall ranking of RevFinder is 3 times better than that of a baseline approach. We believe that RevFinder could be applied to MCR in order to help developers find appropriate code-reviewers and speed up the overall code review process.
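
To illustrate the file-location intuition, here is a minimal sketch of a path-similarity reviewer recommender. It is reduced to a single longest-common-prefix similarity, whereas RevFinder combines several string comparison techniques; the reviewer names and file paths are hypothetical:

```python
# Minimal sketch of a file-path-based reviewer recommender in the
# spirit of RevFinder (simplified to one longest-common-prefix
# similarity; the actual approach combines several string comparisons).
from collections import defaultdict

def path_similarity(p1, p2):
    """Fraction of leading path components shared by two file paths."""
    a, b = p1.split("/"), p2.split("/")
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return common / max(len(a), len(b))

def recommend(new_files, past_reviews, top_k=10):
    """Score reviewers of past reviews by how similar their previously
    reviewed file paths are to the files of the new review."""
    scores = defaultdict(float)
    for reviewed_files, reviewers in past_reviews:
        sim = sum(path_similarity(n, o)
                  for n in new_files for o in reviewed_files)
        sim /= len(new_files) * len(reviewed_files)
        for reviewer in reviewers:
            scores[reviewer] += sim
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

# Hypothetical usage with a made-up review history.
history = [(["ui/widget/Button.java"], ["alice"]),
           (["net/http/Client.java"], ["bob"])]
print(recommend(["ui/widget/Dialog.java"], history))  # alice ranks first
```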


International Conference on Software Engineering | 2015

The impact of mislabelling on the performance and interpretation of defect prediction models

Chakkrit Tantithamthavorn; Shane McIntosh; Ahmed E. Hassan; Akinori Ihara; Ken-ichi Matsumoto

The reliability of a prediction model depends on the quality of the data from which it was trained. Therefore, defect prediction models may be unreliable if they are trained using noisy data. Recent research suggests that randomly-injected noise that changes the classification (label) of software modules from defective to clean (and vice versa) can impact the performance of defect models. Yet, in reality, incorrectly labelled (i.e., mislabelled) issue reports are likely non-random. In this paper, we study whether mislabelling is random, and the impact that realistic mislabelling has on the performance and interpretation of defect models. Through a case study of 3,931 manually-curated issue reports from the Apache Jackrabbit and Lucene systems, we find that: (1) issue report mislabelling is not random; (2) precision is rarely impacted by mislabelled issue reports, suggesting that practitioners can rely on the accuracy of modules labelled as defective by models that are trained using noisy data; (3) however, models trained on noisy data typically achieve 56%-68% of the recall of models trained on clean data; and (4) only the metrics in the top influence rank of our defect models are robust to the noise introduced by mislabelling, suggesting that the less influential metrics of models that are trained on noisy data should not be interpreted or used to make decisions.
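
To make the noisy-versus-clean comparison concrete, here is an illustrative sketch that trains one model on correct labels and one on mislabelled data, then compares precision and recall on a clean test set. For simplicity it injects random noise, whereas the paper's central point is that realistic mislabelling is non-random; the dataset, classifier, and noise rate are stand-ins, not the paper's setup:

```python
# Illustrative noisy-vs-clean comparison: train on correct labels and
# on mislabelled data, then compare precision and recall on clean labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Flip 20% of the training labels to simulate mislabelled issue reports
# (random flips; realistic mislabelling would be non-random).
rng = np.random.default_rng(0)
noisy = y_tr.copy()
flip = rng.random(len(noisy)) < 0.2
noisy[flip] = 1 - noisy[flip]

for name, labels in [("clean", y_tr), ("noisy", noisy)]:
    model = LogisticRegression(max_iter=1000).fit(X_tr, labels)
    pred = model.predict(X_te)
    print(name, "precision=%.2f recall=%.2f"
          % (precision_score(y_te, pred), recall_score(y_te, pred)))
```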


Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing | 2013

Using Co-change Histories to Improve Bug Localization Performance

Chakkrit Tantithamthavorn; Akinori Ihara; Ken-ichi Matsumoto

A large open source software (OSS) project receives many bug reports on a daily basis. Bug localization techniques automatically pinpoint source code fragments that are relevant to a bug report, thus enabling faster correction. Even though many bug localization methods have been introduced, their performance is still limited. In this research, we improved on existing bug localization methods by taking co-change histories into account. We conducted experiments on two OSS datasets, the Eclipse SWT 3.1 project and the Android ZXing project. We validated our approach by evaluating its effectiveness against the state-of-the-art approach BugLocator. In the Eclipse SWT 3.1 project, our approach reliably identified the source code that should be fixed for a bug in 72.46% of the total bugs, while BugLocator identified only 51.02%. In the Android ZXing project, our approach identified 85.71%, while BugLocator identified 60%.
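
The abstract does not spell out the scoring, so as a hedged illustration, here is a minimal sketch of one way co-change histories can feed a suspiciousness score: files that were frequently committed together with previously fixed files score higher. The counting scheme and the toy commit data are assumptions:

```python
# Illustrative use of co-change histories for bug localization: files
# that frequently co-changed with previously fixed files receive a
# higher suspiciousness score (the paper's exact weighting may differ).
from collections import defaultdict
from itertools import combinations

def cochange_scores(commits, fixed_files):
    """commits: list of sets of files changed together in one commit.
    fixed_files: files touched by past bug-fixing commits."""
    cochange = defaultdict(lambda: defaultdict(int))
    for files in commits:
        for a, b in combinations(sorted(files), 2):
            cochange[a][b] += 1
            cochange[b][a] += 1
    # A file's score is how often it co-changed with known fixed files.
    scores = defaultdict(int)
    for fixed in fixed_files:
        for partner, count in cochange[fixed].items():
            scores[partner] += count
    return dict(scores)

# Hypothetical commit history.
commits = [{"a.c", "b.c"}, {"a.c", "b.c", "c.c"}, {"c.c", "d.c"}]
print(cochange_scores(commits, fixed_files={"a.c"}))
# b.c co-changed with the fixed file twice, c.c once.
```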


International Symposium on Software Reliability Engineering | 2013

Mining a change history to quickly identify bug locations: A case study of the Eclipse project

Chakkrit Tantithamthavorn; Rattamont Teekavanich; Akinori Ihara; Ken-ichi Matsumoto

In this study, we proposed an approach that mines the change history to improve bug localization performance. The key idea is that a recently fixed file may be fixed again in the near future. We used a combination of textual features and change-history mining to recommend source code files that are likely to be fixed for a given bug report. First, we adopted the Vector Space Model (VSM) to find relevant source code files that are textually similar to the bug report. Second, we analyzed the change history to identify previously fixed files. We then estimated the fault proneness of these files. Finally, we combined the two scores, from textual similarity and fault proneness, for every source code file. We then recommend that developers examine source code files with higher scores. We evaluated our approach on 1,212 bug reports from the Eclipse Platform and Eclipse JDT. The experimental results show that our proposed approach can improve bug localization performance and effectively identify buggy files.
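
The two-score combination described above can be sketched as follows; the TF-IDF representation, the exponential recency weighting for fault proneness, and the equal score weights are illustrative assumptions rather than the paper's exact formulas:

```python
# Minimal sketch of the two-score combination: VSM textual similarity
# between the bug report and each file, plus a fault-proneness score
# derived from how recently each file was fixed.
import math
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_files(bug_report, file_texts, days_since_last_fix):
    # Step 1: VSM similarity between the bug report and each file.
    vec = TfidfVectorizer()
    docs = vec.fit_transform(list(file_texts.values()) + [bug_report])
    sims = cosine_similarity(docs[-1], docs[:-1]).ravel()

    scores = {}
    for sim, path in zip(sims, file_texts):
        # Step 2: recently fixed files are assumed more fault-prone
        # (exponential decay with a 30-day scale is an assumption).
        days = days_since_last_fix.get(path)
        proneness = math.exp(-days / 30.0) if days is not None else 0.0
        # Step 3: combine both scores (equal weights assumed).
        scores[path] = 0.5 * sim + 0.5 * proneness
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical file contents and fix history.
files = {"Editor.java": "open file save buffer text editor",
         "Parser.java": "parse token grammar syntax tree"}
print(rank_files("crash when saving a file", files,
                 {"Parser.java": 400, "Editor.java": 5}))
```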


International Conference on Software Engineering | 2016

Towards a better understanding of the impact of experimental components on defect prediction modelling

Chakkrit Tantithamthavorn

Defect prediction models are used to pinpoint risky software modules and understand past pitfalls that lead to defective modules. The predictions and insights that are derived from defect prediction models may not be accurate and reliable if researchers do not consider the impact of experimental components (e.g., datasets, metrics, and classifiers) of defect prediction modelling. Therefore, a lack of awareness and practical guidelines from previous research can lead to invalid predictions and unreliable insights. In this thesis, we investigate the impact that experimental components have on the predictions and insights of defect prediction models. Through case studies of systems that span both proprietary and open-source domains, we find that (1) noise in defect datasets; (2) parameter settings of classification techniques; and (3) model validation techniques have a large impact on the predictions and insights of defect prediction models, suggesting that researchers should carefully select experimental components in order to produce more accurate and reliable defect prediction models.


International Conference on Software Engineering | 2017

An experience report on defect modelling in practice: pitfalls and challenges

Chakkrit Tantithamthavorn; Ahmed E. Hassan

Over the past decade with the rise of the Mining Software Repositories (MSR) field, the modelling of defects for large and long-lived systems has become one of the most common applications of MSR. The findings and approaches of such studies have attracted the attention of many of our industrial collaborators (and other practitioners worldwide). At the core of many of these studies is the development and use of analytical models for defects. In this paper, we discuss common pitfalls and challenges that we observed as practitioners attempt to develop such models or reason about the findings of such studies. The key goal of this paper is to document such pitfalls and challenges so practitioners can avoid them in future efforts. We also hope that other academics will be mindful of such pitfalls and challenges in their own work and industrial engagements.


International Symposium on Software Reliability Engineering | 2016

A Study of Redundant Metrics in Defect Prediction Datasets

Jirayus Jiarpakdee; Chakkrit Tantithamthavorn; Akinori Ihara; Ken-ichi Matsumoto

Defect prediction models can help Software Quality Assurance (SQA) teams understand the past pitfalls that lead to defective modules. However, the conclusions that are derived from defect prediction models without mitigating redundant-metrics issues may be misleading. In this paper, we set out to investigate whether redundant metrics affect defect prediction studies, as well as the degree and causes of such redundancy. Through a case study of 101 publicly-available defect datasets of systems that span both proprietary and open source domains, we observe that (1) 10%-67% of the metrics of the studied defect datasets are redundant, and (2) the redundancy of metrics is related to the aggregation functions that are used to compute them. These findings suggest that researchers should be aware of redundant metrics prior to constructing a defect prediction model in order to maximize the internal validity of their studies.
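
As one concrete way to detect such redundancy, the sketch below applies variance inflation factors (VIF) to synthetic metrics, where one metric is an aggregation of another. This is a stand-in illustration; the abstract does not name the paper's redundancy analysis technique, and the metric names and data are made up:

```python
# Illustrative redundancy check with variance inflation factors: a
# metric largely explained by the other metrics (VIF well above ~10)
# is a candidate redundant metric.
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
loc = rng.normal(100, 20, 200)
metrics = pd.DataFrame({
    "loc": loc,
    # Hypothetical aggregation of loc, hence nearly redundant with it.
    "avg_loc": loc / 7 + rng.normal(0, 1, 200),
    "complexity": rng.normal(10, 3, 200),
})

for i, name in enumerate(metrics.columns):
    print(name, variance_inflation_factor(metrics.values, i))
```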


APRES | 2014

Impact Analysis of Granularity Levels on Feature Location Technique

Chakkrit Tantithamthavorn; Akinori Ihara; Hideaki Hata; Ken-ichi Matsumoto

Due to the increasing number of software requirements and features, modern software systems continue to grow in size and complexity. Locating the source code entities that are required to implement a feature in millions of lines of code is labor- and cost-intensive for developers. To this end, several studies have proposed the use of Information Retrieval (IR) to rank source code entities based on their textual similarity to an issue report. The ranked source code entities can be at the class or function granularity level. Source code entities at the class level are usually large and might contain many functions that are unrelated to the feature. Hence, we conjecture that a class-level feature location technique requires more effort than a function-level one. In this paper, we investigate the impact of granularity levels on a feature location technique. We also presented a new effort-based evaluation method. The results indicated that the function-level feature location technique outperforms the class-level technique. Moreover, the function-level technique also required 7 times less effort than the class-level technique to localize the first relevant source code entity. Therefore, we conclude that feature location at the function level of program elements is effective in practice.
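
Effort-based evaluation can be sketched as follows: effort is counted as the lines of code a developer would read, walking down the ranked list, before reaching the first relevant entity. The measure and the toy numbers below are illustrative assumptions, not the paper's exact definition:

```python
# Minimal sketch of effort-based evaluation: cumulative LOC inspected
# in ranked order until the first relevant entity is reached.
def effort_to_first_relevant(ranked_entities, loc, relevant):
    """ranked_entities: entity names ordered by the technique.
    loc: entity -> lines of code.  relevant: set of correct entities."""
    effort = 0
    for entity in ranked_entities:
        effort += loc[entity]
        if entity in relevant:
            return effort
    return None  # feature not located

# Hypothetical class-level vs function-level rankings for one feature.
loc = {"Player": 900, "Codec": 700,
       "Player.play()": 30, "Codec.decode()": 40}
print(effort_to_first_relevant(["Player", "Codec"], loc,
                               {"Codec"}))                    # 1600 LOC
print(effort_to_first_relevant(["Player.play()", "Codec.decode()"], loc,
                               {"Codec.decode()"}))           # 70 LOC
```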


Empirical Software Engineering | 2018

Studying the dialogue between users and developers of free apps in the Google Play Store

Safwat Hassan; Chakkrit Tantithamthavorn; Cor-Paul Bezemer; Ahmed E. Hassan

The popularity of mobile apps has continued to grow over the past few years. Mobile app stores, such as the Google Play Store and Apple’s App Store, provide a unique user feedback mechanism to app developers through the possibility of posting app reviews. In the Google Play Store (and soon in the Apple App Store), developers are able to respond to such user feedback. Over the past years, mobile app reviews have been studied extensively by researchers. However, much of the prior work (including our own) incorrectly assumes that reviews are static in nature and that users never update their reviews. In a recent study, we started analyzing the dynamic nature of the review-response mechanism. Our previous study showed that responding to a review often has a positive effect on the rating that is given by the user to an app. In this paper, we revisit our prior finding in more depth by studying 4.5 million reviews with 126,686 responses for 2,328 top free-to-download apps in the Google Play Store. One of the major findings of our paper is that the assumption that reviews are static is incorrect. In particular, we find that developers and users in some cases use this response mechanism as a rudimentary user support tool, where dialogues emerge between users and developers through updated reviews and responses. Even though the messages are often simple, we find instances of as many as ten user-developer back-and-forth messages that occur via the response mechanism. Using a mixed-effect model, we identify that the likelihood of a developer responding to a review increases as the review rating gets lower or as the review content gets longer. In addition, we identify four patterns of developers: 1) developers who primarily respond to only negative reviews, 2) developers who primarily respond to negative reviews or to reviews based on their contents, 3) developers who primarily respond to reviews which are posted shortly after the latest release of their app, and 4) developers who primarily respond to reviews which are posted long after the latest release of their app. We perform a qualitative analysis of developer responses to understand what drives developers to respond to a review. We manually analyzed a statistically representative random sample of 347 reviews with responses for the top ten apps with the highest number of developer responses. We identify seven drivers that make a developer respond to a review, of which the most important ones are to thank the users for using the app and to ask the user for more details about the reported issue. Our findings show that it can be worthwhile for app owners to respond to reviews, as responding may lead to an increase in the given rating. In addition, our findings show that studying the dialogue between user and developer can provide valuable insights that can lead to improvements in the app store and user support process.
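
As a simplified illustration of the response-likelihood modelling, the sketch below fits a plain logistic regression of developer response on review rating and review length, using synthetic data. The paper itself uses a mixed-effect model (e.g., with per-app effects); the coefficients and data here are fabricated solely for illustration:

```python
# Illustrative model of developer-response likelihood: lower ratings
# and longer reviews should yield a higher probability of a response.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
rating = rng.integers(1, 6, 1000)        # 1-5 stars
length = rng.exponential(200, 1000)      # review length in characters

# Synthetic outcome built so that low ratings and long reviews are
# more likely to receive a developer response.
logit = 1.5 - 0.6 * rating + 0.004 * length
responded = (rng.random(1000) < 1 / (1 + np.exp(-logit))).astype(int)

X = sm.add_constant(np.column_stack([rating, length]))
model = sm.Logit(responded, X).fit(disp=False)
print(model.params)  # expect a negative rating and positive length term
```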

Collaboration


Dive into Chakkrit Tantithamthavorn's collaborations.

Top Co-Authors

Ken-ichi Matsumoto
Nara Institute of Science and Technology

Akinori Ihara
Nara Institute of Science and Technology

Jirayus Jiarpakdee
Nara Institute of Science and Technology

Hajimu Iida
Nara Institute of Science and Technology

Hideaki Hata
Nara Institute of Science and Technology