Hareton Leung

Hong Kong Polytechnic University

Publication

Featured research published by Hareton Leung.

IEEE Transactions on Software Engineering | 2006

Empirical Analysis of Object-Oriented Design Metrics for Predicting High and Low Severity Faults

Yuming Zhou; Hareton Leung

In the last decade, empirical studies on object-oriented design metrics have shown some of them to be useful for predicting the fault-proneness of classes in object-oriented software systems. This research did not, however, distinguish among faults according to the severity of impact. It would be valuable to know how object-oriented design metrics and class fault-proneness are related when fault severity is taken into account. In this paper, we use logistic regression and machine learning methods to empirically investigate the usefulness of object-oriented design metrics, specifically a subset of the Chidamber and Kemerer suite, in predicting fault-proneness when taking fault severity into account. Our results, based on a public domain NASA data set, indicate that 1) most of these design metrics are statistically related to fault-proneness of classes across fault severity, and 2) the prediction capabilities of the investigated metrics greatly depend on the severity of faults. More specifically, these design metrics are able to predict low severity faults in fault-prone classes better than high severity faults in fault-prone classes.

Journal of Systems and Software | 2007

Predicting object-oriented software maintainability using multivariate adaptive regression splines

Yuming Zhou; Hareton Leung

Accurate software metrics-based maintainability prediction can not only enable developers to better identify the determinants of software quality and thus help them improve design or coding, but can also provide managers with useful information to help them plan the use of valuable resources. In this paper, we employ a novel exploratory modeling technique, multivariate adaptive regression splines (MARS), to build software maintainability prediction models using the metric data collected from two different object-oriented systems. The prediction accuracy of the MARS models is evaluated and compared with that of multivariate linear regression models, artificial neural network models, regression tree models, and support vector models. The results suggest that for one system MARS can predict maintainability more accurately than the other four typical modeling techniques, and that for the other system MARS is as accurate as the best modeling technique.

Journal of Systems and Software | 2010

On the ability of complexity metrics to predict fault-prone classes in object-oriented systems

Yuming Zhou; Baowen Xu; Hareton Leung

Many studies use logistic regression models to investigate the ability of complexity metrics to predict fault-prone classes. However, it is not uncommon to see the inappropriate use of performance indicators such as the odds ratio in previous studies. In particular, a recent study by Olague et al. uses the odds ratio associated with one unit increase in a metric to compare the relative magnitude of the associations between individual metrics and fault-proneness. In addition, the percentages of concordant, discordant, and tied pairs are used to evaluate the predictive effectiveness of a univariate logistic regression model. Their results suggest that lesser known complexity metrics such as standard deviation method complexity (SDMC) and average method complexity (AMC) are better predictors than the two commonly used metrics: lines of code (LOC) and weighted method McCabe complexity (WMC). In this paper, however, we show that (1) the odds ratio associated with one standard deviation increase, rather than one unit increase, in a metric should be used to compare the relative magnitudes of the effects of individual metrics on fault-proneness, since misleading results may otherwise be obtained; and that (2) the connection of the percentages of concordant, discordant, and tied pairs with the predictive effectiveness of a univariate logistic regression model is false, as they do not depend on the model. Furthermore, we use the data collected from three versions of Eclipse to re-examine the ability of complexity metrics to predict fault-proneness. Our experimental results reveal that: (1) many metrics exhibit moderate or almost moderate ability in discriminating between fault-prone and not fault-prone classes; (2) LOC and WMC are indeed better fault-proneness predictors than SDMC and AMC; and (3) the explanatory power of other complexity metrics in addition to LOC is limited.
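
The difference between the two odds ratios can be illustrated with a short sketch. The coefficients and standard deviations below are invented for illustration and are not taken from the paper; the only formula used is the standard identity OR = exp(beta * delta) for a logistic regression coefficient beta and an increase of delta in the metric.

```python
import math

def odds_ratio(beta: float, delta: float = 1.0) -> float:
    """Odds ratio for an increase of `delta` in a metric whose
    logistic regression coefficient is `beta`."""
    return math.exp(beta * delta)

# Hypothetical univariate results (NOT from the paper):
# metric name -> (coefficient, standard deviation of the metric)
metrics = {
    "LOC": (0.004, 400.0),  # large-scale metric, tiny per-unit coefficient
    "AMC": (0.150, 3.0),    # small-scale metric, larger per-unit coefficient
}

for name, (beta, sd) in metrics.items():
    per_unit = odds_ratio(beta)     # depends on the metric's unit of measurement
    per_sd = odds_ratio(beta, sd)   # comparable across metrics
    print(f"{name}: OR per unit = {per_unit:.3f}, OR per SD = {per_sd:.3f}")
```

With these made-up numbers, the per-unit odds ratio ranks AMC above LOC while the per-standard-deviation odds ratio ranks LOC above AMC, which is the kind of reversal the abstract warns about.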

IEEE Transactions on Software Engineering | 2009

Examining the Potentially Confounding Effect of Class Size on the Associations between Object-Oriented Metrics and Change-Proneness

Yuming Zhou; Hareton Leung; Baowen Xu

Previous research shows that class size can influence the associations between object-oriented (OO) metrics and fault-proneness and therefore proposes that it should be controlled as a confounding variable when validating OO metrics on fault-proneness. Otherwise, their true associations may be distorted. However, it has not been determined whether this practice is equally applicable to other external quality attributes. In this paper, we use three size metrics, two of which are available during the high-level design phase, to examine the potentially confounding effect of class size on the associations between OO metrics and change-proneness. The OO metrics that are investigated include cohesion, coupling, and inheritance metrics. Our results, based on Eclipse, indicate that: 1) the confounding effect of class size on the associations between OO metrics and change-proneness, in general, exists, regardless of which size metric is used; 2) the confounding effect of class size generally leads to an overestimate of the associations between OO metrics and change-proneness; and 3) for many OO metrics, the confounding effect of class size completely accounts for their associations with change-proneness or results in a change of the direction of the associations. These results strongly suggest that studies validating OO metrics on change-proneness should also consider class size as a confounding variable.

Empirical Software Engineering | 2002

Estimating Maintenance Effort by Analogy

Hareton Leung

Effort estimation is a key step of any software project. This paper presents a method to estimate project effort using an improved version of analogy. Unlike estimation methods based on case-based reasoning, our method makes use of the two nearest neighbors of the target project for estimation. An additional refinement based on the relative location of the target project is then applied to generate the effort estimate. We first identify the relationships between cost drivers and project effort, and then determine how many past projects should be used in the estimation to provide the best result. Our method is then applied to a set of maintenance projects. Based on a comparison of the estimation results from our estimation method and those of other estimation methods, we conclude that our method can provide more accurate results.
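
A minimal sketch of estimation from two nearest neighbors follows. The abstract does not specify the refinement step, so inverse-distance weighting is used here purely as a stand-in assumption; the function name `estimate_effort`, the single cost driver, and the sample data are likewise hypothetical and not the paper's method or data.

```python
import math

def estimate_effort(target, history):
    """Estimate effort from the two past projects closest to `target`.

    target  -- tuple of cost-driver values for the new project
    history -- list of (cost_driver_tuple, effort) pairs for past projects
    """
    if len(history) < 2:
        raise ValueError("need at least two past projects")
    # Rank past projects by Euclidean distance to the target project.
    ranked = sorted(history, key=lambda p: math.dist(p[0], target))
    (f1, e1), (f2, e2) = ranked[0], ranked[1]
    d1, d2 = math.dist(f1, target), math.dist(f2, target)
    if d1 == 0:
        return e1  # an identical past project exists: reuse its effort
    # ASSUMED refinement: weight each neighbor by inverse distance, so the
    # estimate reflects where the target lies relative to its neighbors.
    w1, w2 = 1 / d1, 1 / d2
    return (w1 * e1 + w2 * e2) / (w1 + w2)

# Hypothetical past projects: (size in function points,) -> effort in hours
past = [((10,), 100.0), ((20,), 200.0), ((50,), 500.0)]
print(estimate_effort((12,), past))
```

In practice, cost drivers measured on different scales would usually be normalized before distances are computed.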

ACM Transactions on Software Engineering and Methodology | 2014

An in-depth study of the potentially confounding effect of class size in fault prediction

Yuming Zhou; Baowen Xu; Hareton Leung; Lin Chen

Background. The extent of the potentially confounding effect of class size in the fault prediction context is not clear, nor is the method to remove the potentially confounding effect, or the influence of this removal on the performance of fault-proneness prediction models. Objective. We aim to provide an in-depth understanding of the effect of class size on the true associations between object-oriented metrics and fault-proneness. Method. We first employ statistical methods to examine the extent of the potentially confounding effect of class size in the fault prediction context. After that, we propose a linear regression-based method to remove the potentially confounding effect. Finally, we empirically investigate whether this removal could improve the prediction performance of fault-proneness prediction models. Results. Based on open-source software systems, we found: (a) the confounding effect of class size on the associations between object-oriented metrics and fault-proneness in general exists; (b) the proposed linear regression-based method can effectively remove the confounding effect; and (c) after removing the confounding effect, the prediction performance of fault prediction models with respect to both ranking and classification can in general be significantly improved. Conclusion. We should remove the confounding effect of class size when building fault prediction models.
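
One common way to realize a linear regression-based removal of a size effect is to regress each metric on class size and keep the residuals, which are uncorrelated with size by construction. The sketch below illustrates that idea only; the paper's exact procedure may differ, and the function name and sample values are hypothetical.

```python
from statistics import mean

def remove_size_effect(metric, size):
    """Return residuals of an ordinary least squares fit of `metric` on `size`.

    The residuals are the part of the metric that a linear function of
    class size does not explain.
    """
    m_bar, s_bar = mean(metric), mean(size)
    sxx = sum((s - s_bar) ** 2 for s in size)
    if sxx == 0:
        raise ValueError("size has no variance")
    slope = sum((s - s_bar) * (m - m_bar) for s, m in zip(size, metric)) / sxx
    intercept = m_bar - slope * s_bar
    return [m - (intercept + slope * s) for s, m in zip(size, metric)]

# Hypothetical data: class size in SLOC and a coupling metric per class
size = [100, 200, 300, 400, 500]
coupling = [3, 5, 8, 9, 12]
adjusted = remove_size_effect(coupling, size)
print([round(r, 3) for r in adjusted])
```

The adjusted values can then replace the raw metric as a predictor in a fault-proneness model.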

IEEE Transactions on Software Engineering | 2015

Are Slice-Based Cohesion Metrics Actually Useful in Effort-Aware Post-Release Fault-Proneness Prediction? An Empirical Study

Yibiao Yang; Yuming Zhou; Hongmin Lu; Lin Chen; Zhenyu Chen; Baowen Xu; Hareton Leung; Zhenyu Zhang

Background. Slice-based cohesion metrics leverage program slices with respect to the output variables of a module to quantify the strength of functional relatedness of the elements within the module. Although slice-based cohesion metrics have been proposed for many years, few empirical studies have been conducted to examine their actual usefulness in predicting fault-proneness. Objective. We aim to provide an in-depth understanding of the ability of slice-based cohesion metrics in effort-aware post-release fault-proneness prediction, i.e., their effectiveness in helping practitioners find post-release faults when taking into account the effort needed to test or inspect the code. Method. We use the most commonly used code and process metrics, including size, structural complexity, Halstead's software science, and code churn metrics, as the baseline metrics. First, we employ principal component analysis to analyze the relationships between slice-based cohesion metrics and the baseline metrics. Then, we use univariate prediction models to investigate the correlations between slice-based cohesion metrics and post-release fault-proneness. Finally, we build multivariate prediction models to examine the effectiveness of slice-based cohesion metrics in effort-aware post-release fault-proneness prediction when used alone or used together with the baseline code and process metrics. Results. Based on open-source software systems, our results show that: 1) slice-based cohesion metrics are not redundant with respect to the baseline code and process metrics; 2) most slice-based cohesion metrics are significantly negatively related to post-release fault-proneness; 3) slice-based cohesion metrics in general do not outperform the baseline metrics when predicting post-release fault-proneness; and 4) when used with the baseline metrics together, however, slice-based cohesion metrics can produce a statistically significant and practically important improvement of the effectiveness in effort-aware post-release fault-proneness prediction. Conclusion. Slice-based cohesion metrics are complementary to the most commonly used code and process metrics and are of practical value in the context of effort-aware post-release fault-proneness prediction.

Information & Software Technology | 2014

Source code size estimation approaches for object-oriented systems from UML class diagrams: A comparative study

Yuming Zhou; Yibiao Yang; Baowen Xu; Hareton Leung; Xiaoyu Zhou

Background: Source code size in terms of SLOC (source lines of code) is the input of many parametric software effort estimation models. However, it is unavailable at the early phase of software development. Objective: We investigate the accuracy of early SLOC estimation approaches for an object-oriented system using the information collected from its UML class diagram available at the early software development phase. Method: We use different modeling techniques to build the prediction models for investigating the accuracy of six types of metrics to estimate SLOC. The techniques used include linear models, non-linear models, rule/tree-based models, and instance-based models. The investigated metrics are class diagram metrics, predictive object points, object-oriented project size metric, fast&&serious class points, objective class points, and object-oriented function points. Results: Based on 100 open-source Java systems, we find that the prediction model built using object-oriented project size metric and ordinary least squares regression with a logarithmic transformation achieves the highest accuracy (mean MMRE=0.19 and mean Pred(25)=0.74). Conclusion: We should use object-oriented project size metric and ordinary least squares regression with a logarithmic transformation to build a simple, accurate, and comprehensible SLOC estimation model.
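
The two accuracy measures quoted above have standard definitions: MMRE is the mean of the magnitudes of relative error |actual - predicted| / actual, and Pred(25) is the fraction of estimates whose relative error is at most 25%. A small sketch, with invented SLOC values that are not from the study:

```python
def mre(actual: float, predicted: float) -> float:
    """Magnitude of relative error for a single estimate."""
    return abs(actual - predicted) / actual

def mmre(actuals, predictions) -> float:
    """Mean magnitude of relative error over all estimates."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(errors) / len(errors)

def pred(actuals, predictions, level: float = 0.25) -> float:
    """Fraction of estimates whose relative error is at most `level`."""
    errors = [mre(a, p) for a, p in zip(actuals, predictions)]
    return sum(e <= level for e in errors) / len(errors)

# Hypothetical actual and estimated SLOC for four systems
actual_sloc = [1000, 2000, 4000, 8000]
estimated_sloc = [900, 2600, 4400, 8000]
print(f"MMRE = {mmre(actual_sloc, estimated_sloc):.3f}")
print(f"Pred(25) = {pred(actual_sloc, estimated_sloc):.2f}")
```

Lower MMRE and higher Pred(25) both indicate more accurate estimates.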

Empirical Software Engineering and Measurement | 2011

Mining Static Code Metrics for a Robust Prediction of Software Defect-Proneness

Lianfa Li; Hareton Leung

Defect-proneness prediction is affected by multiple aspects, including sampling bias, non-metric factors, and uncertainty of models. These aspects often contribute to prediction uncertainty and result in variance of prediction. This paper proposes two methods of data mining static code metrics to enhance defect-proneness prediction. Given that little non-metric or qualitative information can be extracted from software code, we first suggest using a robust unsupervised learning method, shared nearest neighbors (SNN), to extract the similarity patterns of the code metrics. These patterns indicate similar characteristics of the components of the same cluster that may result in the introduction of similar defects. Using the similarity patterns with code metrics as predictors, defect-proneness prediction may be improved. The second method uses Occam's window and Bayesian model averaging to deal with model uncertainty: first, the datasets are used to train and cross-validate multiple learners, and then highly qualified models are selected and integrated into a robust prediction. From a study based on 12 datasets from NASA, we conclude that our proposed solutions can contribute to a better defect-proneness prediction.

World Congress on Services | 2012

Predicting Failures in Dynamic Composite Services with Proactive Monitoring Technique

Yuelong Zhu; Xiaobin Wu; Pengcheng Zhang; Hareton Leung; Wenrui Li

Web service composition is a new paradigm for developing distributed and reactive software-intensive systems. Predicting and preventing failures of dynamic composite services is an important and challenging problem because of their dynamically evolving nature. In previous work, we proposed CASSANDRA, a novel proactive monitoring technique with the ability to predict and prevent potential failures in dynamically evolving systems. In this paper, we apply the approach to the field of web service composition. By combining runtime information and design-time specifications of basic services, the approach can analyse future -step models ahead of the current service execution states. These models can then be checked against a set of desired properties represented as property sequence charts. Initial experiments on an online medicine case study validate our approach and show encouraging results.