Publications


Featured research published by Burak Turhan.


Empirical Software Engineering | 2009

On the relative value of cross-company and within-company data for defect prediction

Burak Turhan; Tim Menzies; Ayse Basar Bener; Justin S. Di Stefano

We propose a practical defect prediction approach for companies that do not track defect-related data. Specifically, we investigate the applicability of cross-company (CC) data for building localized defect predictors using static code features. First, we analyze the conditions under which CC data can be used as-is; these conditions turn out to be quite rare. Then we apply principles of analogy-based learning (i.e., nearest neighbor (NN) filtering) to CC data in order to fine-tune these models for localization. We compare the performance of these models with that of defect predictors learned from within-company (WC) data. As expected, we observe that defect predictors learned from WC data outperform the ones learned from CC data. However, our analyses also yield defect predictors learned from NN-filtered CC data whose performance is close to, but still not better than, that of WC predictors. Therefore, we perform a final analysis to determine the minimum number of local defect reports needed to learn WC defect predictors. We demonstrate that the minimum number of data samples required to build effective defect predictors can be quite small and can be collected within a few months. Hence, for companies with no local defect data, we recommend a two-phase approach that lets them start the defect prediction process immediately. In phase one, companies should use NN-filtered CC data to initiate the defect prediction process and simultaneously start collecting WC (local) data. Once enough WC data is collected (i.e., after a few months), organizations should switch to phase two and use predictors learned from WC data.
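
A minimal sketch of the NN relevancy filter described above, assuming k = 10 neighbors per local instance, Euclidean distance over standardized static code features, and a Naive Bayes predictor; the data is synthetic and the function name is ours, not from the paper.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

def nn_filter_cc(cc_X, cc_y, local_X, k=10):
    """Keep only the CC instances that appear among the k nearest
    neighbors of at least one local (WC) instance."""
    scaler = StandardScaler().fit(cc_X)
    nn = NearestNeighbors(n_neighbors=k).fit(scaler.transform(cc_X))
    _, idx = nn.kneighbors(scaler.transform(local_X))
    keep = np.unique(idx.ravel())          # de-duplicate shared neighbors
    return cc_X[keep], cc_y[keep]

# Tiny synthetic demo: filter the CC data against the local feature
# vectors, then train a defect predictor on the filtered subset.
rng = np.random.default_rng(0)
cc_X = rng.normal(size=(500, 5))
cc_y = rng.integers(0, 2, size=500)
local_X = rng.normal(size=(30, 5))
Xf, yf = nn_filter_cc(cc_X, cc_y, local_X)
model = GaussianNB().fit(Xf, yf)
print(f"kept {len(Xf)} of {len(cc_X)} CC instances")
```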


Automated Software Engineering | 2010

Defect prediction from static code features: current results, limitations, new approaches

Tim Menzies; Zach Milton; Burak Turhan; Bojan Cukic; Yue Jiang; Ayse Basar Bener

Building quality software is expensive and software quality assurance (QA) budgets are limited. Data miners can learn defect predictors from static code features, which can be used to direct QA resources, e.g., to focus on the parts of the code predicted to be more defective. Recent results show that better data mining technology is not leading to better defect predictors. We hypothesize that we have reached the limits of the standard learning goal of maximizing "AUC(pd, pf)", i.e., the area under the curve of probability of false alarm versus probability of detection. Accordingly, we explore changing the standard goal. Learners that maximize "AUC(effort, pd)" find the smallest set of modules that contain the most errors. WHICH is a meta-learner framework that can be quickly customized to different goals. When customized to AUC(effort, pd), WHICH outperforms all the data mining methods studied here. More importantly, measured in terms of this new goal, certain widely used learners perform much worse than simple manual methods. Hence, we advise against the indiscriminate use of learners: learners must be chosen and customized to the goal at hand. With the right architecture (e.g., WHICH), tuning a learner to specific local business goals can be a simple task.
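
As a rough illustration of the AUC(effort, pd) goal, the sketch below ranks modules by a predictor's score and measures the fraction of defects found (pd) against the fraction of lines of code inspected (effort). The score-based ranking and the synthetic data are our assumptions; WHICH itself, a rule-based meta-learner, is not reproduced here.

```python
import numpy as np

def auc_effort_pd(scores, loc, defects):
    order = np.argsort(-scores)                     # inspect highest-scored first
    effort = np.cumsum(loc[order]) / loc.sum()      # x-axis: fraction of LOC read
    pd = np.cumsum(defects[order]) / defects.sum()  # y-axis: fraction of defects found
    return np.trapz(pd, effort)                     # area under the effort-vs-pd curve

rng = np.random.default_rng(1)
loc = rng.integers(50, 2000, size=200).astype(float)
defects = (rng.random(200) < 0.2).astype(float)
scores = defects + rng.normal(scale=0.5, size=200)  # a noisy stand-in predictor
print(f"AUC(effort, pd) = {auc_effort_pd(scores, loc, defects):.3f}")
```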


IEEE Transactions on Software Engineering | 2013

Local versus Global Lessons for Defect Prediction and Effort Estimation

Tim Menzies; Andrew Butcher; David R. Cok; Andrian Marcus; Lucas Layman; Forrest Shull; Burak Turhan; Thomas Zimmermann

Existing research is unclear on how to generate lessons learned for defect prediction and effort estimation. Should we seek lessons that are global to multiple projects or just local to particular projects? This paper aims to comparatively evaluate local versus global lessons learned for effort estimation and defect prediction. We applied automated clustering tools to effort and defect datasets from the PROMISE repository. Rule learners generated lessons learned from all the data, from local projects, or just from each cluster. The results indicate that the lessons learned after combining small parts of different data sources (i.e., the clusters) were superior to either generalizations formed over all the data or local lessons formed from particular projects. We conclude that when researchers attempt to draw lessons from some historical data source, they should 1) ignore any existing local divisions into multiple sources, 2) cluster across all available data, then 3) restrict the learning of lessons to the clusters from other sources that are nearest to the test data.
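
A minimal sketch of the cluster-then-learn recipe from the conclusion: pool all projects, cluster across the pooled data, then train only on the cluster nearest each test instance. KMeans and a decision tree stand in for the paper's WHERE clusterer and rule learner, which are not reproduced here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def cluster_local_predict(train_X, train_y, test_X, n_clusters=8, seed=0):
    # 1) ignore project boundaries: cluster across all pooled training data
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(train_X)
    test_cluster = km.predict(test_X)       # 2) find each test point's nearest cluster
    preds = np.empty(len(test_X), dtype=train_y.dtype)
    for c in np.unique(test_cluster):
        mask = km.labels_ == c
        # 3) learn lessons only from the cluster nearest the test data
        model = DecisionTreeClassifier(max_depth=3, random_state=seed)
        model.fit(train_X[mask], train_y[mask])
        preds[test_cluster == c] = model.predict(test_X[test_cluster == c])
    return preds

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + rng.normal(size=300) > 0).astype(int)
print(cluster_local_predict(X[:250], y[:250], X[250:])[:10])
```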


Information & Software Technology | 2010

Practical considerations in deploying statistical methods for defect prediction: A case study within the Turkish telecommunications industry

Ayse Tosun; Ayse Basar Bener; Burak Turhan; Tim Menzies

Context: Building defect prediction models in large organizations has many challenges due to limited resources and tight schedules in the software development lifecycle. It is not easy to collect data, apply any type of algorithm, and build a permanent model all at once. We conducted a study in a large telecommunications company in Turkey to deploy a software measurement program and to predict pre-release defects. Building on our prior publication, we share our experience in terms of the project steps (i.e., challenges and opportunities) and introduce new techniques that improve our earlier results.

Objective: In our previous work, we built similar predictors using data representative of US software development. Our task here was to check whether those predictors were specific to US organizations or applied to a broader class of software.

Method: We present our approach and results in the form of an experience report. Specifically, we use different techniques for improving the information content of the software data and the performance of a Naive Bayes classifier in a prediction model that is locally tuned for the company. We increase the information content of the software data by using module dependency data, and improve the performance by adjusting a hyper-parameter (the decision threshold) of the Naive Bayes classifier. We report and discuss our results in terms of defect detection rates and false alarms, and carry out a cost-benefit analysis to show that our approach can be efficiently put into practice.

Results: Our general result is that defect predictors generalize across a wide range of software (in both US and Turkish organizations). Our specific results indicate that, for the organization studied, using version history information along with code metrics decreased false alarms by 22%, using dependencies between modules reduced false alarms by a further 8%, and optimizing the decision threshold of the Naive Bayes classifier (with code metrics and version history) reduced false alarms by a further 30%, all relative to a prediction using only code metrics and a default decision threshold.

Conclusion: Implementing statistical techniques and machine learning in a real-life scenario is a difficult yet feasible task. Simple statistical and algorithmic techniques produce an average detection rate of 88%. Although dependency data improves our results, such data is generally difficult to collect and analyze. We therefore recommend optimizing the decision threshold of Naive Bayes to calibrate the defect prediction model, rather than employing more complex classifiers. We also recommend that researchers who explore statistical and algorithmic methods for defect prediction spend less time on their algorithms and more time on the pragmatic considerations of large organizations.
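
The decision-threshold adjustment described in the Method lends itself to a short sketch: instead of the default 0.5 cutoff, pick the threshold on a validation split that keeps the false-alarm rate (pf) under a budget while maximizing the detection rate (pd). The 10% pf budget and the synthetic data below are illustrative assumptions.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def tune_threshold(model, X_val, y_val, max_pf=0.10):
    probs = model.predict_proba(X_val)[:, 1]
    best_t, best_pd = 0.5, -1.0
    for t in np.linspace(0.05, 0.95, 19):
        pred = probs >= t
        pf = np.mean(pred[y_val == 0])   # false alarms among clean modules
        pd = np.mean(pred[y_val == 1])   # detections among defective modules
        if pf <= max_pf and pd > best_pd:
            best_t, best_pd = t, pd
    return best_t

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 6))
y = (X[:, 0] > 0.8).astype(int)
nb = GaussianNB().fit(X[:300], y[:300])
print("tuned decision threshold:", tune_threshold(nb, X[300:], y[300:]))
```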


Empirical Software Engineering | 2012

On the dataset shift problem in software engineering prediction models

Burak Turhan

A core assumption of any prediction model is that the test data distribution does not differ from the training data distribution, and prediction models used in software engineering are no exception. In reality, this assumption can be violated in many ways, resulting in inconsistent and non-transferable observations across different cases. The goal of this paper is to explain the phenomenon of conclusion instability through the concept of dataset shift, from a software effort and fault prediction perspective. Different types of dataset shift are explained with examples from software engineering, and techniques for addressing the associated problems are discussed. While dataset shifts in the form of sample selection bias and imbalanced data are well known in software engineering research, understanding the other types is relevant for interpreting non-transferable results across different sites and studies. The software engineering community should be aware of, and account for, dataset shift related issues when evaluating the validity of research outcomes.
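
One common diagnostic for a kind of dataset shift the paper discusses (covariate shift) is to compare each feature's training and test distributions with a two-sample Kolmogorov-Smirnov test, as sketched below. This check is a standard illustration of the concept, not a technique claimed by the paper.

```python
import numpy as np
from scipy.stats import ks_2samp

def covariate_shift_report(train_X, test_X, alpha=0.01):
    for j in range(train_X.shape[1]):
        stat, p = ks_2samp(train_X[:, j], test_X[:, j])
        flag = "SHIFTED" if p < alpha else "ok"
        print(f"feature {j}: KS={stat:.3f} p={p:.4f} {flag}")

rng = np.random.default_rng(4)
train = rng.normal(size=(500, 3))
test = train[:200] + np.array([0.0, 1.5, 0.0])   # inject a shift in feature 1
covariate_shift_report(train, test)
```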


Software Quality Journal | 2011

An industrial case study of classifier ensembles for locating software defects

Ayse Tosun Misirli; Ayse Basar Bener; Burak Turhan

As the application layer in embedded systems comes to dominate the hardware, ensuring software quality becomes a real challenge. Software testing is the most time-consuming and costly project phase, particularly in the embedded software domain. Misclassifying safe code as defective increases project cost and hence leads to low margins. In this research, we present a defect prediction model based on an ensemble of classifiers. We collaborated with an industrial partner from the embedded systems domain and used our generic defect prediction models with data coming from embedded projects. The embedded systems domain is similar to mission-critical software in that the goal is to catch as many defects as possible, so the expectation from a predictor is a very high probability of detection (pd). On the other hand, most embedded systems in practice are commercial products, and companies want to lower their costs to remain competitive by keeping their false alarm (pf) rates as low as possible and improving their precision. In our experiments, we used data collected from our industry partner as well as publicly available data. Our results reveal that an ensemble of classifiers significantly decreases pf, down to 15%, while increasing precision by 43%, keeping the balance rate at 74%. The cost-benefit analysis of the proposed model shows that it is enough to inspect 23% of the code on local datasets to detect around 70% of the defects.
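
A minimal sketch of a classifier ensemble for defect prediction, assuming simple majority voting over three base learners; the paper's actual ensemble composition and vote weighting are not reproduced here, and the data below is synthetic.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

ensemble = VotingClassifier(
    estimators=[
        ("nb", GaussianNB()),
        ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
        ("knn", KNeighborsClassifier(n_neighbors=7)),
    ],
    voting="hard",   # majority vote; "soft" would average class probabilities
)

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 6))
y = (X[:, 0] > 0.5).astype(int)          # synthetic stand-in for defect labels
ensemble.fit(X[:150], y[:150])
pd_rate = ensemble.predict(X[150:])[y[150:] == 1].mean()
print(f"probability of detection on held-out data: {pd_rate:.2f}")
```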


Expert Systems With Applications | 2009

Data mining source code for locating software bugs

Burak Turhan; Gözde Koçak; Ayse Basar Bener

In a large software system, knowing which files are most likely to be fault-prone is valuable information for project managers, who can use it to prioritize software testing and allocate resources accordingly. However, our experience shows that it is difficult to collect and analyze fine-grained test defects in a large and complex software system. On the other hand, previous research has shown that companies can safely use cross-company data with nearest neighbor sampling to predict their defects when they are unable to collect local data. In this study, we analyzed 25 projects of a large telecommunication system. To predict the defect proneness of modules, we trained models on publicly available NASA MDP data. In our experiments, we used static call graph based ranking (CGBR) as well as nearest neighbor sampling to construct method-level defect predictors. Our results suggest that, for the analyzed projects, at least 70% of the defects can be detected by inspecting only (i) 6% of the code using a Naive Bayes model, or (ii) 3% of the code using the CGBR framework.
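
A rough sketch of the call graph based ranking (CGBR) idea: rank modules by a PageRank-style walk over the static call graph and use the rank to scale defect scores. The exact weighting scheme in the paper may differ; the call graph and the Naive Bayes scores below are hypothetical.

```python
import networkx as nx

# Hypothetical static call graph: (caller, callee) pairs.
calls = [("ui", "core"), ("ui", "net"), ("net", "core"), ("core", "util")]
G = nx.DiGraph(calls)
rank = nx.pagerank(G, alpha=0.85)        # heavily-called modules rank higher

# Hypothetical per-module defect scores, e.g. from a Naive Bayes model.
nb_score = {"ui": 0.30, "core": 0.40, "net": 0.25, "util": 0.20}
cgbr_score = {m: nb_score[m] * rank[m] for m in nb_score}
for m in sorted(cgbr_score, key=cgbr_score.get, reverse=True):
    print(f"{m}: {cgbr_score[m]:.4f}")   # inspect modules in this order
```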


International Symposium on Computer and Information Sciences | 2007

Software effort estimation using machine learning methods

Bilge Baskeles; Burak Turhan; Ayse Basar Bener

In software engineering, the main aim is to deliver projects that produce the desired results within a limited schedule and budget. The most important factor affecting the budget of a project is effort; estimating effort is therefore crucial, because hiring more people than needed leads to a loss of income, while hiring fewer people than needed leads to a schedule extension. The main objective of this research is to analyze software effort estimation so as to avoid budget and schedule overruns. To accomplish this, we propose a model that uses machine learning methods, and we evaluate these models on public datasets and on data gathered from software organizations in Turkey. Our experiments show that the best method can change from one dataset to another, which supports the point that no single model can always produce the best results.
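
A minimal sketch of the comparative setup the abstract implies: evaluate several machine learning regressors on an effort dataset with cross-validation and observe that the winner can change per dataset. The models, scoring metric, and synthetic data are illustrative stand-ins for the paper's choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

models = {
    "linear": LinearRegression(),
    "knn": KNeighborsRegressor(n_neighbors=3),
    "tree": DecisionTreeRegressor(max_depth=4, random_state=0),
}

rng = np.random.default_rng(5)
X = rng.normal(size=(80, 4))
y = 100 + 40 * X[:, 0] + rng.normal(scale=10, size=80)   # synthetic effort data

for name, model in models.items():
    mae = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    print(f"{name}: MAE = {mae:.1f}")    # the best model varies by dataset
```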


Knowledge-Based Systems | 2009

Ensemble of neural networks with associative memory (ENNA) for estimating software development costs

Yigit Kultur; Burak Turhan; Ayse Basar Bener

Companies usually have a limited amount of data for effort estimation. Machine learning methods have been preferred over parametric models due to their flexibility in calibrating the model to the available data. On the other hand, as machine learning methods become more complex, they need more data to learn from, so the challenge is to increase the performance of the algorithm when data is limited. In this paper, we use a relatively complex machine learning algorithm, neural networks, and show that stable and accurate estimations are achievable with an ensemble that uses associative memory. Our experimental results show that the proposed algorithm (ENNA) produces significantly better results than a single neural network (NN) in terms of accuracy and robustness. We also analyze the effect of feature subset selection on ENNA's estimation performance in a wrapper framework, and show that ENNA using the features selected by the wrapper does not perform worse than ENNA using all available features. Therefore, measuring only company-specific key factors is sufficient to obtain accurate and robust software cost estimates with ENNA.
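
A rough sketch of an ENNA-like setup: a bootstrap ensemble of small neural networks whose predictions are averaged. The associative-memory component of ENNA is not reproduced here; bagging is only a stand-in for the ensemble mechanism, and the data is synthetic.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(120, 5))
y = 200 + 60 * X[:, 1] + rng.normal(scale=15, size=120)  # synthetic cost data

# Ten small networks, each trained on a bootstrap sample; predictions
# are averaged, which stabilizes the notoriously high-variance NNs.
ensemble = BaggingRegressor(
    MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0),
    n_estimators=10, random_state=0,
)
ensemble.fit(X[:100], y[:100])
print("mean abs error:", np.abs(ensemble.predict(X[100:]) - y[100:]).mean())
```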


Expert Systems With Applications | 2009

Feature weighting heuristics for analogy-based effort estimation models

Ayse Tosun; Burak Turhan; Ayse Basar Bener

Software cost estimation is one of the critical tasks in project management. In a highly demanding and competitive market environment, software project managers need robust models and methodologies to accurately predict the cost of a new project. Analogy-based cost estimation is one of the widely used models that rely on historical project data: it measures the similarity of features between past and current projects, and approximates the current project's cost from the past ones. One shortcoming of analogy-based cost estimation is that it treats all project features as equally important, although these features may have different impacts on project cost depending on their relevance. In this research, we present two feature weight assignment heuristics for cost estimation. We assign weights to the project features using principal components analysis (PCA), a statistical technique for extracting optimal linear patterns from high-dimensional data. We test the proposed heuristics on public datasets and conclude that prediction performance in terms of MMRE and Pred(25) improves with the statistics-based weight assignment compared to a random assignment approach.
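
A minimal sketch of PCA-derived feature weighting for analogy-based estimation, assuming one plausible reading: weight each feature by the variance-weighted magnitude of its PCA loadings, then estimate a new project's effort as the mean of its k nearest past projects under the weighted distance. The paper's two exact heuristics may differ; the data and MMRE/Pred(25) computation below are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_weights(X):
    pca = PCA().fit(X)
    # |loadings| weighted by each component's explained variance ratio
    return np.abs(pca.components_.T) @ pca.explained_variance_ratio_

def analogy_estimate(hist_X, hist_y, new_x, w, k=3):
    d = np.sqrt((((hist_X - new_x) * w) ** 2).sum(axis=1))  # weighted distance
    return hist_y[np.argsort(d)[:k]].mean()                 # mean of k analogies

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 4))
y = 500 + 120 * X[:, 2] + rng.normal(scale=30, size=60)     # synthetic cost data
w = pca_weights(X[:50])
est = np.array([analogy_estimate(X[:50], y[:50], x, w) for x in X[50:]])
mre = np.abs(est - y[50:]) / np.abs(y[50:])
print(f"MMRE = {mre.mean():.2f}, Pred(25) = {(mre <= 0.25).mean():.2f}")
```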

Collaboration


Dive into Burak Turhan's collaborations.

Top Co-Authors

Tim Menzies, North Carolina State University
Fayola Peters, West Virginia University
Natalia Juristo, Technical University of Madrid
Ayse Tosun, Istanbul Technical University