
Publication


Featured research published by Passakorn Phannachitta.


Evaluation and Assessment in Software Engineering | 2015

Case consistency: a necessary data quality property for software engineering data sets

Passakorn Phannachitta; Akito Monden; Jacky Keung; Ken-ichi Matsumoto

Data quality is an essential aspect of any empirical study, because the validity of models and analysis results derived from empirical data is inherently influenced by its quality. In this empirical study, we focus on data consistency as a critical factor influencing the accuracy of prediction models in software engineering. We propose a software metric called Cases Inconsistency Level (CIL) for analyzing conflicts within software engineering data sets, which leverages probability statistics on project cases and counts the number of conflicting pairs. The results demonstrate that CIL can be used as a metric to distinguish consistent data sets from inconsistent ones, which is valuable for building robust prediction models. Beyond measuring the level of consistency, CIL is also shown to predict whether an effort model built from a data set can achieve high accuracy, an important indicator for empirical experiments in software engineering.
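For illustration, the sketch below shows the general idea of counting conflicting project-case pairs. It is a minimal approximation, not the paper's exact definition of CIL: the similarity and effort thresholds (feat_tol, effort_tol) and the data layout are assumptions made for this example.

```python
# Minimal sketch of counting conflicting project-case pairs (the idea behind CIL).
# Thresholds and data layout are illustrative assumptions, not the paper's definition.
import numpy as np

def case_inconsistency_level(features, efforts, feat_tol=0.1, effort_tol=0.5):
    """Fraction of near-identical project pairs whose efforts differ substantially."""
    n = len(efforts)
    conflicts, similar_pairs = 0, 0
    for i in range(n):
        for j in range(i + 1, n):
            feat_diff = np.abs(features[i] - features[j]) / (np.abs(features[i]) + 1e-9)
            if np.all(feat_diff <= feat_tol):           # the two cases look alike
                similar_pairs += 1
                effort_diff = abs(efforts[i] - efforts[j]) / (abs(efforts[i]) + 1e-9)
                if effort_diff > effort_tol:            # ...but their efforts disagree
                    conflicts += 1
    return conflicts / similar_pairs if similar_pairs else 0.0

# toy usage: two nearly identical projects with very different efforts -> conflict
X = np.array([[10.0, 3.0], [10.5, 3.1], [50.0, 8.0]])
y = np.array([100.0, 250.0, 900.0])
print(case_inconsistency_level(X, y))
```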


Australian Software Engineering Conference | 2013

An Empirical Experiment on Analogy-Based Software Cost Estimation with CUDA Framework

Passakorn Phannachitta; Jacky Keung; Ken-ichi Matsumoto

The success of estimating software project costs using analogy-based reasoning has been notable for over a decade. Estimation accuracy depends heavily on heuristic methods for selecting the best feature subsets and a suitable set of similar projects from the repository. A complete search of all possible combinations may not be feasible due to insufficient computational resources for such a large search space. In this work, the problem is revisited, and we propose a novel algorithm tailored for analogy-based software cost estimation that utilizes the latest CUDA computing framework to enable estimation with large project datasets. We demonstrate the use of the proposed distributed algorithm executed on graphics processing units (GPUs), whose architecture is well suited to compute-intensive problems. The method has been evaluated using 11 real-world datasets from the PROMISE repository. Results show that the proposed ABE-CUDA approach is able to produce the best project cost estimates by determining the best feature subsets and the most suitable number of analogous projects for estimation, and it significantly improves the overall feature search time and prediction accuracy for software cost estimation. More importantly, the optimized estimation result can be used as a baseline benchmark against which other sophisticated analogy-based methods for software cost estimation can be compared.
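For illustration, the sketch below shows a simplified, single-threaded version of the exhaustive search that ABE-CUDA offloads to the GPU: every feature subset and every k is scored by leave-one-out estimation. The toy data, the Euclidean distance, and the mean-absolute-error score are assumptions made for this example.

```python
# Simplified CPU sketch of the exhaustive feature-subset / k search that ABE-CUDA
# parallelizes on a GPU. Data, distance, and error measure are illustrative assumptions.
from itertools import combinations
import numpy as np

def loo_mae(X, y, feat_idx, k):
    """Leave-one-out mean absolute error for one feature subset and one k."""
    errors = []
    for i in range(len(y)):
        others = [j for j in range(len(y)) if j != i]
        dists = [np.linalg.norm(X[i, feat_idx] - X[j, feat_idx]) for j in others]
        nearest = np.argsort(dists)[:k]
        pred = np.mean([y[others[j]] for j in nearest])
        errors.append(abs(pred - y[i]))
    return np.mean(errors)

def search_best_configuration(X, y, max_k=3):
    """Try every feature subset and every k; return the best-scoring combination."""
    best = None
    for r in range(1, X.shape[1] + 1):
        for feat_idx in combinations(range(X.shape[1]), r):
            for k in range(1, max_k + 1):
                score = loo_mae(X, y, list(feat_idx), k)
                if best is None or score < best[0]:
                    best = (score, feat_idx, k)
    return best  # (error, feature subset, k)

X = np.array([[10, 3], [12, 4], [50, 8], [48, 7], [30, 5]], dtype=float)
y = np.array([100, 120, 900, 880, 400], dtype=float)
print(search_best_configuration(X, y))
```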


Empirical Software Engineering and Measurement | 2017

The significant effects of data sampling approaches on software defect prioritization and classification

Kwabena Ebo Bennin; Jacky Keung; Akito Monden; Passakorn Phannachitta; Solomon Mensah

Context: Recent studies have shown that the performance of defect prediction models can be affected when data sampling approaches are applied to imbalanced training data for building defect prediction models. However, the magnitude (degree and power) of the effect of these sampling methods on the classification and prioritization performance of defect prediction models is still unknown. Goal: To investigate the statistical and practical significance of using resampled data for constructing defect prediction models. Method: We examine the practical effects of six data sampling methods on the performance of five defect prediction models. The prediction performance of models trained on default datasets (no sampling method) is compared with that of models trained on resampled datasets (application of sampling methods). To decide whether the performance changes are significant, robust statistical tests are performed and effect sizes computed. Twenty releases of ten open source projects extracted from the PROMISE repository are considered and evaluated using the AUC, pd, pf and G-mean performance measures. Results: There are statistically significant differences and practical effects on classification performance (pd, pf and G-mean) between models trained on resampled datasets and those trained on the default datasets. However, sampling methods have no statistical or practical effect on defect prioritization performance (AUC), with small or no effect sizes obtained from the models trained on the resampled datasets. Conclusions: Existing sampling methods can properly set the threshold between buggy and clean samples, but they cannot improve the prediction of defect-proneness itself. Sampling methods are highly recommended for defect classification purposes when all faulty modules are to be considered for testing.
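For illustration, the sketch below shows the kind of before/after comparison the study performs: the same classifier is trained on the default (imbalanced) data and on resampled data, and AUC is compared. Simple random oversampling and the synthetic data are assumptions; the paper evaluates six sampling methods and five models on real project data.

```python
# Compare one classifier trained on default vs. oversampled data (illustrative only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 10))
y = (rng.random(600) < 0.1).astype(int)            # ~10% "defective" modules
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def random_oversample(X, y):
    """Duplicate minority-class rows until both classes have equal counts."""
    minority = np.where(y == 1)[0]
    extra = rng.choice(minority, size=(y == 0).sum() - len(minority), replace=True)
    idx = np.concatenate([np.arange(len(y)), extra])
    return X[idx], y[idx]

for name, (Xs, ys) in {"default": (X_tr, y_tr),
                       "oversampled": random_oversample(X_tr, y_tr)}.items():
    model = RandomForestClassifier(random_state=0).fit(Xs, ys)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```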


Empirical Software Engineering | 2017

A stability assessment of solution adaptation techniques for analogy-based software effort estimation

Passakorn Phannachitta; Jacky Keung; Akito Monden; Ken-ichi Matsumoto

Among the numerous possible choices of effort estimation methods, analogy-based software effort estimation based on case-based reasoning is one of the most widely adopted methods in both industry and the research community. Solution adaptation is the final step of analogy-based estimation, employed to aggregate and adapt the solutions derived during the case-based reasoning process. Variants of solution adaptation techniques have been proposed in previous studies; however, the ranking of these techniques is not conclusive and shows conflicting results, since different studies rank the techniques in different ways. This paper aims to find a stable ranking of solution adaptation techniques for analogy-based estimation. Compared with existing studies, we evaluate 8 commonly adopted solution adaptation techniques with more datasets (12), more feature selection techniques (4), and more stable error measures (5), subjected to a robust statistical test method based on the Brunner test. This comprehensive experimental procedure allows us to discover a stable ranking of the techniques applied, and to observe similar behaviors from techniques with similar adaptation mechanisms. In general, the linear adaptation techniques based on functions of size and productivity (e.g., the regression towards the mean technique) outperform the other techniques in the more robust experimental setting adopted in this study. Our empirical results show that project features with a strong correlation to effort, such as software size or productivity, should be utilized in the solution adaptation step to achieve desirable performance. Designing a solution adaptation strategy in analogy-based software effort estimation requires careful consideration of these influential features to ensure its predictions are relevant and accurate.
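For illustration, the sketch below shows one linear solution adaptation scheme of the kind evaluated in the paper: each retrieved analogue's effort is adjusted by the ratio of the target project's size to the analogue's size, then averaged. The function name and toy values are assumptions made for this example.

```python
# Minimal sketch of a size/productivity-style linear solution adaptation step.
# Names and toy values are illustrative assumptions, not the paper's exact technique.
import numpy as np

def adapt_by_size(target_size, analogue_sizes, analogue_efforts):
    """Size-adjusted mean of the retrieved analogues' efforts."""
    analogue_sizes = np.asarray(analogue_sizes, dtype=float)
    analogue_efforts = np.asarray(analogue_efforts, dtype=float)
    adjusted = analogue_efforts * (target_size / analogue_sizes)  # scale by size ratio
    return adjusted.mean()

# target project of 25 KLOC, three analogues retrieved by case-based reasoning
print(adapt_by_size(25.0, [20.0, 30.0, 24.0], [400.0, 650.0, 480.0]))
```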


Open Source Systems | 2016

The Impact of a Low Level of Agreement Among Reviewers in a Code Review Process

Toshiki Hirao; Akinori Ihara; Yuki Ueda; Passakorn Phannachitta; Ken-ichi Matsumoto

Software code review systems are commonly used in software development. In these systems, many patches are submitted to improve software quality, and contributors commonly vote on patches to verify their quality. However, a major problem remains: reviewers do not always reach broad agreement. In our previous study, we found that consensus is not usually reached, implying that an individual reviewer's final decision often differs from that of the majority of the other reviewers. In this study, we further investigate why such situations occur and provide suggestions for handling these problems better. Our analysis of the Qt and OpenStack project datasets allows us to suggest that a patch owner should select more appropriate reviewers, i.e., those who often agree with others' decisions.
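For illustration, the sketch below computes how often each reviewer's vote agrees with the majority of the other reviewers on the same patch, which is the kind of (dis)agreement the study quantifies. The vote encoding (+1 approve, -1 reject) and the toy data are assumptions made for this example.

```python
# Per-reviewer agreement with the majority of the other reviewers (illustrative only).
from collections import defaultdict

votes = [  # (patch_id, reviewer, vote): +1 approve, -1 reject
    ("p1", "alice", +1), ("p1", "bob", +1), ("p1", "carol", -1),
    ("p2", "alice", -1), ("p2", "bob", +1), ("p2", "carol", +1),
]

by_patch = defaultdict(list)
for patch, reviewer, vote in votes:
    by_patch[patch].append((reviewer, vote))

agree, total = defaultdict(int), defaultdict(int)
for patch, entries in by_patch.items():
    for reviewer, vote in entries:
        others = [v for r, v in entries if r != reviewer]
        majority = 1 if sum(others) > 0 else -1 if sum(others) < 0 else 0
        if majority != 0:                      # skip ties among the other reviewers
            total[reviewer] += 1
            agree[reviewer] += int(vote == majority)

for reviewer in total:
    print(reviewer, agree[reviewer] / total[reviewer])
```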


ACM Symposium on Applied Computing | 2016

A review and comparison of methods for determining the best analogies in analogy-based software effort estimation

Bodin Chinthanet; Passakorn Phannachitta; Yasutaka Kamei; Pattara Leelaprute; Arnon Rungsawang; Naoyasu Ubayashi; Ken-ichi Matsumoto

Analogy-based effort estimation (ABE) is a commonly used software development effort estimation method. ABE is based on reusing effort values from similar past projects, and the appropriate number of past projects (the k value) to reuse is one of the long-standing debates in ABE research. To date, many approaches for finding this k value have been proposed. One important reason the debate remains inconclusive is that different studies appear to reach different conclusions about the appropriate k value. Therefore, in this study, we revisit 8 common approaches to determining the k value most appropriate in general situations. Using a more robust and comprehensive evaluation methodology with 5 robust error measures subjected to the Wilcoxon rank-sum statistical test, we found that the conflicting results in previous studies were not mainly due to the use of different methodologies or different datasets; rather, the performance of the different approaches actually varies widely.
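For illustration, the sketch below compares fixed k values under leave-one-out evaluation with a robust error measure (median absolute error). The toy data, the distance, and the error measure are assumptions; the paper compares 8 approaches, not all of which are fixed k values.

```python
# Compare fixed k values for ABE under leave-one-out (illustrative assumptions only).
import numpy as np

X = np.array([[10, 3], [12, 4], [50, 8], [48, 7], [30, 5], [28, 6]], dtype=float)
y = np.array([100, 120, 900, 880, 400, 420], dtype=float)

def loo_median_error(k):
    """Median absolute error when each project is estimated from its k nearest peers."""
    errs = []
    for i in range(len(y)):
        others = np.array([j for j in range(len(y)) if j != i])
        dists = np.linalg.norm(X[others] - X[i], axis=1)
        nearest = others[np.argsort(dists)[:k]]
        errs.append(abs(np.mean(y[nearest]) - y[i]))
    return np.median(errs)

for k in range(1, 6):
    print(f"k={k}: median absolute error = {loo_median_error(k):.1f}")
```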


Joint Conference of the International Workshop on Software Measurement and the International Conference on Software Process and Product Measurement | 2011

An Analysis of Gradual Patch Application: A Better Explanation of Patch Acceptance

Passakorn Phannachitta; Pijak Jirapiwong; Akinori Ihara; Masao Ohira; Ken-ichi Matsumoto

Patch submission is known as one of the most important activities for sustaining open source software (OSS). The patch archive can be analyzed to gain insights that support OSS project work. However, recent models and methods for analyzing patch acceptance are not comprehensive: complex activities, such as a committer splitting a Passed-QA patch and accepting only part of it, are still excluded from the analysis. Consequently, the results derived from those methods are inadequate for characterizing actual patch acceptance. In this research, we introduce an algorithm for analyzing patch acceptance that covers partially and gradually accepted patches. To validate our algorithm, we present methods for identifying partial and gradual application of Passed-QA patches between either the mailing list and SVN or Bugzilla and CVS, which are commonly deployed patch-related systems. We studied two well-known OSS projects, Apache HTTP and Eclipse Platform, and reached an interesting conclusion: larger patches are more likely to be accepted than smaller ones, contradicting the findings of recent methods.
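For illustration, the sketch below classifies a patch as fully, partially, or not applied by checking which of its added lines appear in the repository version of the file. This is a heavily simplified stand-in for the paper's algorithm; the line-matching rule and category names are assumptions made for this example.

```python
# Classify a patch as fully / partially / not applied by line matching (illustrative only).
def acceptance_status(patch_added_lines, repository_lines):
    repo = {line.strip() for line in repository_lines}
    applied = [line for line in patch_added_lines if line.strip() in repo]
    ratio = len(applied) / len(patch_added_lines) if patch_added_lines else 0.0
    if ratio == 1.0:
        return "fully applied", ratio
    if ratio > 0.0:
        return "partially applied", ratio
    return "not applied", ratio

patch = ["int limit = 10;", "check_limit(limit);", "log_limit(limit);"]
repo_file = ["int limit = 10;", "check_limit(limit);", "return 0;"]
print(acceptance_status(patch, repo_file))   # ('partially applied', 0.66...)
```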


International Conference on Software Maintenance | 2017

Bug or Not? Bug Report Classification Using N-Gram IDF

Pannavat Terdchanakul; Hideaki Hata; Passakorn Phannachitta; Ken-ichi Matsumoto

Previous studies have found that a significant number of bug reports are misclassified between bugs and non-bugs, and that manually classifying bug reports is a time-consuming task. To address this problem, we propose a bug report classification model with N-gram IDF, a theoretical extension of Inverse Document Frequency (IDF) for handling words and phrases of any length. N-gram IDF enables us to extract key terms of any length from texts; these key terms can be used as features to classify bug reports. We build classification models with logistic regression and random forest using features from N-gram IDF and from topic modeling, which is widely used in various software engineering tasks. With a publicly available dataset, our results show that our N-gram IDF-based models outperform the topic-based models in all of the evaluated cases. Our models show promising results and have the potential to be extended to other software engineering tasks.
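For illustration, the sketch below shows the overall classification setup: IDF-weighted n-gram features feeding a logistic regression classifier. Plain scikit-learn TF-IDF over word n-grams is used here as an approximation; the paper uses the dedicated N-gram IDF weighting, which handles key terms of any length more directly.

```python
# Bug / non-bug report classification with n-gram TF-IDF features (an approximation
# of the N-gram IDF setup; labels and reports below are toy examples).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

reports = [
    "app crashes with null pointer exception on startup",
    "add dark mode option to the settings page",
    "segmentation fault when parsing empty configuration file",
    "please update the user documentation for the new API",
]
labels = [1, 0, 1, 0]   # 1 = bug, 0 = non-bug

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3)),   # unigrams to trigrams as features
    LogisticRegression(max_iter=1000),
)
model.fit(reports, labels)
print(model.predict(["null pointer exception in settings page"]))
```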


Proceedings of the International Workshop on Innovative Software Development Methodologies and Practices | 2014

Scaling up analogy-based software effort estimation: a comparison of multiple hadoop implementation schemes

Passakorn Phannachitta; Jacky Keung; Akito Monden; Ken-ichi Matsumoto

Analogy-based estimation (ABE) is one of the most time-consuming and compute-intensive methods in software development effort estimation. Optimizing ABE has been a dilemma because simplifying the procedure can reduce estimation performance, while increasing the procedure's complexity with more sophisticated theory may sacrifice the advantage of unlimited scalability for large data inputs. Motivated by the emergence of cloud computing technology in software applications, in this study we present 3 different implementation schemes based on Hadoop MapReduce to optimize the ABE process across multiple computing instances in a cloud-computing environment. We experimentally compared the 3 MapReduce implementation schemes against our previously proposed GPGPU approach (named ABE-CUDA) on 8 high-performance Amazon EC2 instances. The results show that the Hadoop solution can provide more computational resources, which extends the scalability of the ABE process. We recommend the adoption of 2 of the Hadoop implementations (Hadoop streaming and RHadoop) for accelerating the computation, specifically for compute-intensive software engineering tasks.
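For illustration, the sketch below follows the Hadoop-streaming style the paper recommends: a mapper emits each historical project's distance to the target project, and a reducer averages the efforts of the k nearest. The input format, the hard-coded target, and k are assumptions; in a real job the mapper and reducer would be separate scripts passed to the Hadoop streaming jar. Locally, the pipeline can be simulated with `cat projects.csv | python abe_stream.py map | sort | python abe_stream.py reduce`.

```python
# abe_stream.py: Hadoop-streaming-style mapper/reducer sketch for ABE (illustrative only).
import sys
import heapq

TARGET = [25.0, 5.0]   # feature vector of the project being estimated (assumed)
K = 3

def mapper(lines):
    # assumed input line format: id,feature1,feature2,effort
    for line in lines:
        pid, f1, f2, effort = line.strip().split(",")
        dist = ((float(f1) - TARGET[0]) ** 2 + (float(f2) - TARGET[1]) ** 2) ** 0.5
        print(f"target\t{dist}\t{effort}")          # key, distance, effort

def reducer(lines):
    records = []
    for line in lines:
        _, dist, effort = line.strip().split("\t")
        records.append((float(dist), float(effort)))
    nearest = heapq.nsmallest(K, records)           # k closest analogues
    print(sum(e for _, e in nearest) / len(nearest))

if __name__ == "__main__":
    if sys.argv[1] == "map":
        mapper(sys.stdin)
    else:
        reducer(sys.stdin)
```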


Asia-Pacific Software Engineering Conference | 2013

Improving Analogy-Based Software Cost Estimation through Probabilistic-Based Similarity Measures

Passakorn Phannachitta; Jacky Keung; Akito Monden; Ken-ichi Matsumoto

The performance of software cost estimation based on analogy reasoning depends upon the measures that specify the similarity between software projects. This paper empirically investigates the use of probabilistic-based distance functions to improve similarity measurement. Probabilistic-based distance functions are considerably more robust, because they capture the implicit correlation between the occurrences of project feature attributes. This information gain enables the constructed estimation model to be more concise and comprehensible. The study compares 6 probabilistic-based distance functions against the commonly used Euclidean distance. We empirically evaluate the implemented cost estimation models using 5 real-world datasets collected from the PROMISE repository. The results show a significant improvement in terms of error reduction, implying that estimation based on probabilistic-based distance functions achieves higher accuracy on average, and the peak performance significantly outperforms the Euclidean distance according to the Wilcoxon signed-rank test.
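For illustration, the sketch below implements one probabilistic-based distance, the Value Difference Metric (VDM), over categorical project features: two feature values are close when they induce similar conditional distributions over an outcome class. The paper compares six probabilistic-based distance functions, not necessarily this exact formulation, so treat it as an assumption made for this example.

```python
# Value Difference Metric (VDM) over categorical project features (illustrative only).
from collections import Counter, defaultdict

def vdm_tables(rows, classes):
    """Count class occurrences per feature value: basis for P(class | feature value)."""
    tables = defaultdict(lambda: defaultdict(Counter))
    for row, cls in zip(rows, classes):
        for f, value in enumerate(row):
            tables[f][value][cls] += 1
    return tables

def vdm_distance(a, b, tables, class_labels):
    """Sum of squared differences between conditional class probabilities."""
    dist = 0.0
    for f, (va, vb) in enumerate(zip(a, b)):
        ca, cb = tables[f][va], tables[f][vb]
        na, nb = sum(ca.values()) or 1, sum(cb.values()) or 1
        for c in class_labels:
            dist += abs(ca[c] / na - cb[c] / nb) ** 2
    return dist ** 0.5

rows = [("high", "java"), ("high", "java"), ("low", "cobol"), ("low", "java")]
classes = ["over", "over", "under", "under"]       # e.g. effort over/under the median
tables = vdm_tables(rows, classes)
print(vdm_distance(("high", "java"), ("low", "cobol"), tables, {"over", "under"}))
```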

Collaboration


Dive into Passakorn Phannachitta's collaboration.

Top Co-Authors

Ken-ichi Matsumoto (Nara Institute of Science and Technology)
Jacky Keung (City University of Hong Kong)
Kwabena Ebo Bennin (City University of Hong Kong)
Solomon Mensah (City University of Hong Kong)
Akinori Ihara (Nara Institute of Science and Technology)
Hideaki Hata (Nara Institute of Science and Technology)