
Publications


Featured research published by Jacky Keung.


Future Generation Computer Systems | 2012

Empirical prediction models for adaptive resource provisioning in the cloud

Sadeka Islam; Jacky Keung; Kevin Lee; Anna Liu

Cloud computing allows dynamic resource scaling for enterprise online transaction systems, one of the key characteristics that differentiates the cloud from the traditional computing paradigm. However, initializing a new virtual instance in a cloud is not instantaneous; cloud hosting platforms introduce a delay of several minutes in hardware resource allocation. In this paper, we develop prediction-based resource measurement and provisioning strategies using neural networks and linear regression to satisfy upcoming resource demands. Experimental results demonstrate that the proposed technique offers more adaptive resource management for applications hosted in the cloud environment, an important mechanism for achieving on-demand resource allocation in the cloud.
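The core of the approach is forecasting: fit a model to a sliding window of recent demand and provision for the load predicted far enough ahead to absorb the instance startup delay. Below is a minimal sketch of that idea using a linear-regression forecast; the window size, lead time, per-instance capacity, and headroom factor are illustrative assumptions, not values from the paper.

```python
# Sketch of prediction-based provisioning: extrapolate a linear trend over
# recent demand and size the fleet for the load expected several minutes out,
# masking the VM startup delay. All constants below are illustrative.
import numpy as np

def predict_demand(history, lead_steps):
    """Extrapolate a linear trend fitted to the recent demand window."""
    x = np.arange(len(history))
    slope, intercept = np.polyfit(x, history, deg=1)
    return slope * (len(history) - 1 + lead_steps) + intercept

def instances_needed(predicted_load, capacity_per_instance, headroom=1.2):
    """Translate predicted load into an instance count with safety headroom."""
    return int(np.ceil(headroom * predicted_load / capacity_per_instance))

# Example: requests/sec sampled once a minute; instances take ~5 minutes to
# boot, so we provision for the load predicted 5 steps ahead.
window = [120, 135, 150, 170, 185, 200]           # recent observed demand
forecast = predict_demand(window, lead_steps=5)   # demand ~5 minutes out
print(instances_needed(forecast, capacity_per_instance=50))
```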


IEEE Transactions on Software Engineering | 2012

On the Value of Ensemble Effort Estimation

Ekrem Kocaguneli; Tim Menzies; Jacky Keung

Background: Despite decades of research, there is no consensus on which software effort estimation methods produce the most accurate models. Aim: Prior work has reported that, given M estimation methods, no single method consistently outperforms all others. Perhaps rather than recommending one estimation method as best, it is wiser to generate estimates from ensembles of multiple estimation methods. Method: Nine learners were combined with 10 preprocessing options to generate 9 × 10 = 90 solo methods. These were applied to 20 datasets and evaluated using seven error measures. This identified the best n (in our case n = 13) solo methods that showed stable performance across multiple datasets and error measures. The top 2, 4, 8, and 13 solo methods were then combined to generate 12 multimethods, which were then compared to the solo methods. Results: 1) The top 10 (out of 12) multimethods significantly outperformed all 90 solo methods. 2) The error rates of the multimethods were significantly lower than those of the solo methods. 3) The ranking of the best multimethod was remarkably stable. Conclusion: While there is no best single effort estimation method, there exist best combinations of such effort estimation methods.
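As a concrete illustration of the ensemble idea, the sketch below trains a few solo estimators and combines their predictions with the median. This is a simplification: the paper ranks 90 solo methods (nine learners crossed with ten preprocessors) before combining the best few, whereas here three off-the-shelf scikit-learn learners stand in for the selected top methods, and the toy data is made up.

```python
# Minimal sketch of a multimethod: several solo estimators, combined by
# taking the median of their estimates per test project.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

def ensemble_estimate(X_train, y_train, X_test):
    learners = [
        LinearRegression(),
        KNeighborsRegressor(n_neighbors=3),
        DecisionTreeRegressor(random_state=0),
    ]
    preds = [lrn.fit(X_train, y_train).predict(X_test) for lrn in learners]
    return np.median(preds, axis=0)   # multimethod = median of solo estimates

# Toy data: project features (e.g., size, team experience) -> effort.
rng = np.random.default_rng(0)
X = rng.uniform(1, 100, size=(40, 2))
y = 3 * X[:, 0] + 10 * X[:, 1] + rng.normal(0, 20, size=40)
print(ensemble_estimate(X[:30], y[:30], X[30:]))
```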


IEEE Transactions on Software Engineering | 2012

Exploiting the Essential Assumptions of Analogy-Based Effort Estimation

Ekrem Kocaguneli; Tim Menzies; Ayse Basar Bener; Jacky Keung

Background: There are too many design options for software effort estimators. How can we best explore them all? Aim: We seek general principles of effort estimation that can guide the design of effort estimators. Method: We identified the essential assumption of analogy-based effort estimation, i.e., that the immediate neighbors of a project offer stable conclusions about that project. We test that assumption by generating a binary tree of clusters of effort data and comparing the variance of supertrees versus smaller subtrees. Results: For 10 data sets (from Coc81, Nasa93, Desharnais, Albrecht, ISBSG, and data from Turkish companies), we found: 1) The estimation variance of cluster subtrees is usually larger than that of cluster supertrees; 2) if analogy is restricted to the cluster trees with lower variance, then effort estimates have a significantly lower error (measured using MRE, AR, and Pred(25) with a Wilcoxon test, 95 percent confidence, compared to nearest neighbor methods that use neighborhoods of a fixed size). Conclusion: Estimation by analogy can be significantly improved by a dynamic selection of nearest neighbors, using only the project data from regions with small variance.
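The essential assumption can be sketched in code: build a binary tree of clusters over the training projects, then estimate from the lowest-variance cluster on the path down to the new project rather than from a fixed-size neighborhood. The splitting rule here (median cut on the widest-spread feature) and the minimum cluster size are simplified illustrations, not the paper's exact algorithm.

```python
# Sketch: variance-guided analogy over a binary tree of clusters.
import numpy as np

def build_tree(X, y, min_size=4):
    node = {"y": y, "var": np.var(y), "kids": None}
    if len(y) >= 2 * min_size:
        f = int(np.argmax(X.std(axis=0)))   # widest-spread feature
        med = np.median(X[:, f])
        mask = X[:, f] <= med
        if mask.any() and (~mask).any():    # only split if both sides nonempty
            node["feat"], node["cut"] = f, med
            node["kids"] = (build_tree(X[mask], y[mask], min_size),
                            build_tree(X[~mask], y[~mask], min_size))
    return node

def estimate(node, x):
    best = node                             # lowest-variance cluster seen so far
    while node["kids"]:
        left, right = node["kids"]
        node = left if x[node["feat"]] <= node["cut"] else right
        if node["var"] < best["var"]:
            best = node
    return float(np.median(best["y"]))      # estimate from the stable cluster
```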


IEEE Transactions on Software Engineering | 2008

Analogy-X: Providing Statistical Inference to Analogy-Based Software Cost Estimation

Jacky Keung; Barbara A. Kitchenham; D. R. Jeffery

Data-intensive analogy has been proposed as a means of software cost estimation, as an alternative to other data-intensive methods such as linear regression. Unfortunately, the method has drawbacks: there is no mechanism to assess its appropriateness for a specific dataset, and heuristic algorithms are necessary to select the best set of variables and identify abnormal project cases. We introduce a solution to these problems, called Analogy-X, based on the Mantel correlation randomization test. We use the strength of correlation between the distance matrix of project features and the distance matrix of known effort values of the dataset. The method is demonstrated using the Desharnais dataset and two random datasets, showing (1) the use of Mantel's correlation to identify whether analogy is appropriate, (2) a stepwise procedure for feature selection, and (3) the use of a leverage statistic for sensitivity analysis that detects abnormal data points. Analogy-X thus provides a sound statistical basis for analogy, removes the need for heuristic search, and greatly improves its algorithmic performance.
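A minimal sketch of the Mantel randomization test at the heart of Analogy-X follows: correlate the between-project feature-distance matrix with the effort-distance matrix, then assess significance by permuting project labels. A weak or non-significant correlation warns that analogy may not suit the dataset. The permutation count and the use of Euclidean distances are illustrative choices.

```python
# Sketch of a Mantel test: correlation between two distance matrices,
# with a permutation-based p-value.
import numpy as np
from scipy.spatial.distance import pdist, squareform

def mantel(X, effort, n_perm=999, seed=0):
    rng = np.random.default_rng(seed)
    dx = squareform(pdist(X))                                  # feature distances
    dy = squareform(pdist(np.asarray(effort).reshape(-1, 1)))  # effort distances
    iu = np.triu_indices_from(dx, k=1)          # upper triangle, no diagonal
    r_obs = np.corrcoef(dx[iu], dy[iu])[0, 1]
    exceed = 0
    for _ in range(n_perm):
        p = rng.permutation(len(effort))        # relabel projects at random
        r = np.corrcoef(dx[np.ix_(p, p)][iu], dy[iu])[0, 1]
        exceed += r >= r_obs
    return r_obs, (exceed + 1) / (n_perm + 1)   # correlation and p-value
```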


Empirical Software Engineering | 2008

Evaluating guidelines for reporting empirical software engineering studies

Barbara A. Kitchenham; Hiyam Al-Khilidar; Muhammed Ali Babar; Mike Berry; Karl Cox; Jacky Keung; Felicia Kurniawati; Mark Staples; He Zhang; Liming Zhu

Background: Several researchers have criticized the standards of performing and reporting empirical studies in software engineering. In order to address this problem, Jedlitschka and Pfahl have produced reporting guidelines for controlled experiments in software engineering. They pointed out that their guidelines needed evaluation. We agree that guidelines need to be evaluated before they can be widely adopted. Aim: The aim of this paper is to present the method we used to evaluate the guidelines and report the results of our evaluation exercise. We suggest our evaluation process may be of more general use if reporting guidelines for other types of empirical study are developed. Method: We used a reading method inspired by perspective-based and checklist-based reviews to perform a theoretical evaluation of the guidelines. The perspectives used were: Researcher, Practitioner/Consultant, Meta-analyst, Replicator, Reviewer, and Author. Apart from the Author perspective, the reviews were based on a set of questions derived by brainstorming. A separate review was performed for each perspective. The review using the Author perspective considered each section of the guidelines sequentially. Results: The reviews detected 44 issues where the guidelines would benefit from amendment or clarification, and 8 defects. Conclusions: Reporting guidelines need to specify what information goes into what section and avoid excessive duplication. The current guidelines need to be revised and then subjected to further theoretical and empirical validation. Perspective-based checklists are a useful validation method, but the practitioner/consultant perspective presents difficulties.


IEEE International Conference on Cloud Computing Technology and Science | 2011

Application migration to cloud: a taxonomy of critical factors

Van Nghia Tran; Jacky Keung; Anna Liu; Alan Fekete

Cloud computing has attracted attention as an important platform for software deployment, with perceived benefits such as elasticity to fluctuating load and reduced operational costs compared to running in enterprise data centers. While some software is written from scratch specially for the cloud, many organizations also wish to migrate existing applications to a cloud platform. Such a migration exercise is not easy: changes need to be made to deal with differences in the software environment, such as the programming model and data storage APIs, as well as varying performance qualities. We report here on experiences in doing a number of sample migrations. We propose a taxonomy of the migration tasks involved, and we show the breakdown of costs among categories of task for a case study that migrated a .NET n-tier application to run on Windows Azure. We also identify important factors that impact the cost of various migration tasks. This work contributes towards our future direction of building a framework for cost-benefit tradeoff analysis that would apply to migrating applications to cloud platforms, and could help decision-makers evaluate proposals for using cloud computing.


Empirical Software Engineering | 2017

Robust Statistical Methods for Empirical Software Engineering

Barbara A. Kitchenham; Lech Madeyski; David Budgen; Jacky Keung; Pearl Brereton; Stuart M. Charters; Shirley Gibbs; Amnart Pohthong

There have been many changes in statistical theory in the past 30 years, including increased evidence that non-robust methods may fail to detect important results. The statistical advice available to software engineering researchers needs to be updated to address these issues. This paper aims both to explain the new results in the area of robust analysis methods and to provide a large-scale worked example of the new methods. We summarise the results of analyses of the Type 1 error efficiency and power of standard parametric and non-parametric statistical tests when applied to non-normal data sets. We identify parametric and non-parametric methods that are robust to non-normality. We present an analysis of a large-scale software engineering experiment to illustrate their use. We illustrate the use of kernel density plots, and parametric and non-parametric methods using four different software engineering data sets. We explain why the methods are necessary and the rationale for selecting a specific analysis. We suggest using kernel density plots rather than box plots to visualise data distributions. For parametric analysis, we recommend trimmed means, which can support reliable tests of the differences between the central location of two or more samples. When the distribution of the data differs among groups, or we have ordinal scale data, we recommend non-parametric methods such as Cliff’s δ or a robust rank-based ANOVA-like method.
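Two of the recommended statistics are easy to sketch: a 20 percent trimmed mean for robust central location, and Cliff's delta as a non-parametric effect size for comparing two groups. The sketch below uses scipy's trim_mean and computes Cliff's delta directly from its definition; the sample data is made up for illustration.

```python
# Sketch of two robust statistics recommended in the paper.
import numpy as np
from scipy.stats import trim_mean

def cliffs_delta(a, b):
    """P(a > b) - P(a < b) over all cross-group pairs, in [-1, 1]."""
    a, b = np.asarray(a), np.asarray(b)
    greater = (a[:, None] > b[None, :]).sum()
    less = (a[:, None] < b[None, :]).sum()
    return (greater - less) / (len(a) * len(b))

group1 = [12, 15, 14, 90, 13, 16]    # note the outlier at 90
group2 = [10, 11, 12, 13, 11, 12]
print(trim_mean(group1, 0.2))        # robust central location, 20% trimming
print(cliffs_delta(group1, group2))  # ordinal effect size
```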


Automated Software Engineering | 2010

When to use data from other projects for effort estimation

Ekrem Kocaguneli; Tim Menzies; Ye Yang; Jacky Keung

Collecting the data required for quality prediction within a development team is time-consuming and expensive. An alternative is to make predictions using data that comes from other projects, or even other companies. We show that imported data performs as well as local data when relevancy filtering is applied, and worse without it. Therefore, we recommend the use of relevancy filtering whenever generating estimates using data from another project.
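A minimal sketch of relevancy filtering, under the assumption of a simple nearest-neighbor filter: before estimating, keep only the imported projects closest in feature space to the project being estimated and discard the rest. The choice of k and the median-based estimate are illustrative, not the paper's exact procedure.

```python
# Sketch: estimate from only the most relevant cross-project data.
import numpy as np

def relevancy_filtered_estimate(X_other, y_other, x_new, k=5):
    d = np.linalg.norm(X_other - x_new, axis=1)   # distance to each imported project
    nearest = np.argsort(d)[:k]                   # keep the k most relevant
    return float(np.median(y_other[nearest]))     # estimate from that subset
```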


IEEE Transactions on Software Engineering | 2013

Active Learning and Effort Estimation: Finding the Essential Content of Software Effort Estimation Data

Ekrem Kocaguneli; Tim Menzies; Jacky Keung; David R. Cok; Raymond J. Madachy

Background: Do we always need complex methods for software effort estimation (SEE)? Aim: To characterize the essential content of SEE data, i.e., the least number of features and instances required to capture the information within SEE data. If the essential content is very small, then 1) the contained information must be very brief and 2) the value added by complex learning schemes must be minimal. Method: Our QUICK method computes the Euclidean distance between rows (instances) and columns (features) of SEE data, then prunes synonyms (similar features) and outliers (distant instances), then assesses the reduced data by comparing predictions from 1) a simple learner using the reduced data and 2) a state-of-the-art learner (CART) using all data. Performance is measured using hold-out experiments and expressed in terms of mean and median MRE, MAR, PRED(25), MBRE, MIBRE, or MMER. Results: For 18 datasets, QUICK pruned 69 to 96 percent of the training data (median = 89 percent). K = 1 nearest neighbor predictions (in the reduced data) performed as well as CART's predictions (using all data). Conclusion: The essential content of some SEE datasets is very small. Complex estimation methods may be overelaborate for such datasets and can be simplified. We offer QUICK as an example of such a simpler SEE method.
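The pruning steps can be sketched as follows: drop "synonym" features that correlate highly with an already-kept feature, drop "outlier" rows unusually far from the rest, then predict with k = 1 nearest neighbor on the reduced data. The correlation and distance thresholds below are illustrative assumptions, not QUICK's exact settings.

```python
# Sketch of QUICK-style data reduction followed by a 1-NN estimate.
import numpy as np

def reduce_data(X, y, corr_cut=0.95, dist_pct=90):
    keep = []
    for j in range(X.shape[1]):          # prune synonym (near-duplicate) features
        if all(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) < corr_cut for k in keep):
            keep.append(j)
    Xr = X[:, keep]
    # Prune outlier rows: instances whose mean distance to all others is large.
    d = np.linalg.norm(Xr[:, None, :] - Xr[None, :, :], axis=2).mean(axis=1)
    inliers = d <= np.percentile(d, dist_pct)
    return Xr[inliers], y[inliers], keep

def knn1_estimate(Xr, yr, x_new):
    # x_new must be projected onto the same kept-feature columns.
    return yr[np.argmin(np.linalg.norm(Xr - x_new, axis=1))]
```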


Australian Software Engineering Conference | 2009

Software Development Cost Estimation Using Analogy: A Review

Jacky Keung

Software project managers require reliable methods for estimating software project costs, especially in the early stages of the software life cycle. Analogy has been considered a suitable alternative to regression-based methods for software cost estimation, and empirical studies have shown that it can be used successfully in many circumstances. It is important for project managers to understand the strengths and weaknesses of each software cost estimation method, and more importantly when to use each method and at which stage of software development. This paper provides a comprehensive overview of the background and underlying theory of analogy for software cost estimation, as published in major software engineering journals and conferences over the past 15 years. Dataset quality evaluation and its relevance to the target problem for analogy are also discussed; the results allow researchers and project managers to familiarize themselves with the underlying nature of the analogy-based approach.
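For readers new to the technique, the classic analogy procedure the review covers can be sketched in a few lines: normalize features, find the k completed projects most similar to the new one, and adapt their actual efforts into an estimate. The choice of k = 3 and the mean adaptation rule are illustrative choices, not prescriptions from the review.

```python
# Sketch of classic analogy-based cost estimation (k nearest projects).
import numpy as np

def analogy_estimate(X_hist, y_hist, x_new, k=3):
    lo, hi = X_hist.min(axis=0), X_hist.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)         # avoid divide-by-zero
    d = np.linalg.norm((X_hist - lo) / scale - (x_new - lo) / scale, axis=1)
    return y_hist[np.argsort(d)[:k]].mean()         # mean effort of the k analogues
```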

Collaboration


Dive into Jacky Keung's collaborations.

Top Co-Authors

Kwabena Ebo Bennin | City University of Hong Kong
Arif Ali Khan | City University of Hong Kong
Shahid Hussain | City University of Hong Kong
Solomon Mensah | City University of Hong Kong
Qing Mi | City University of Hong Kong
Passakorn Phannachitta | Nara Institute of Science and Technology
Yan Xiao | City University of Hong Kong
Ken-ichi Matsumoto | Nara Institute of Science and Technology
Tim Menzies | North Carolina State University