Network


Latest external collaborations at the country level.

Hotspot


Dive into the research topics where Banu Diri is active.

Publication


Featured research published by Banu Diri.


Expert Systems With Applications | 2009

A systematic review of software fault prediction studies

Cagatay Catal; Banu Diri

This paper provides a systematic review of previous software fault prediction studies, with a specific focus on metrics, methods, and datasets. The review covers 74 software fault prediction papers from 11 journals and several conference proceedings. According to the review results, the usage percentage of public datasets has increased significantly and the usage percentage of machine learning algorithms has increased slightly since 2005. In addition, method-level metrics are still the dominant metrics in the fault prediction research area, and machine learning algorithms are still the most popular methods for fault prediction. Researchers working in the software fault prediction area should continue to use public datasets and machine learning algorithms to build better fault predictors. The usage percentage of class-level metrics is still below acceptable levels, and they should be used much more than they are now in order to predict faults earlier, in the design phase of the software life cycle.


Expert Systems With Applications | 2008

Visualization and analysis of classifiers performance in multi-class medical data

Banu Diri; Songül Albayrak

The primary role of the thyroid gland is to help regulate the body's metabolism. The correct diagnosis of thyroid dysfunctions is very important, and early diagnosis is the key factor in successful treatment. In this article, we used four different kinds of classifiers, namely Bayesian, k-NN, k-Means and 2-D SOM, to classify the thyroid gland data set. The robustness of the classifiers with regard to sampling variations is examined using cross validation, and the performance of the classifiers in medical diagnosis is visualized using a cobweb representation. The cobweb representation is the original contribution of this work: it visualizes classifier performance when the data have more than two classes and is newly applied to medical diagnosis.
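As a rough illustration of the kind of comparison described above, the sketch below cross-validates two of the four classifiers (a Bayesian classifier and k-NN) on a generic multi-class dataset. It uses scikit-learn, iris data stands in for the thyroid data set, and the cobweb visualization is not reproduced; these substitutions are assumptions, not the paper's setup.

```python
# A minimal sketch of a cross-validated, multi-classifier comparison. The thyroid
# dataset and the cobweb visualization are not reproduced here; scikit-learn's iris
# data stands in as a generic multi-class dataset, and only the Bayesian and k-NN
# classifiers from the paper are shown.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  # stand-in for the multi-class thyroid data

classifiers = {
    "Bayesian (GaussianNB)": GaussianNB(),
    "k-NN (k=5)": KNeighborsClassifier(n_neighbors=5),
}

for name, clf in classifiers.items():
    # 10-fold cross-validation to check robustness against sampling variations
    scores = cross_val_score(clf, X, y, cv=10)
    print(f"{name}: mean accuracy {scores.mean():.3f} (+/- {scores.std():.3f})")
```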


Expert Systems With Applications | 2011

Practical development of an Eclipse-based software fault prediction tool using Naive Bayes algorithm

Cagatay Catal; Ugur Sevim; Banu Diri

Despite the amount of effort software engineers have been putting into developing fault prediction models, software fault prediction still poses great challenges. Research using machine learning and statistical techniques has been ongoing for 15 years, and yet we still have not had a breakthrough. Unfortunately, none of these prediction models has achieved widespread applicability in the software industry, due to a lack of software tools to automate the prediction process. Historical project data, including software faults, combined with a robust software fault prediction tool can enable quality managers to focus on fault-prone modules and thus improve the testing process. We developed an Eclipse-based software fault prediction tool for Java programs to simplify the fault prediction process. We also integrated a machine learning algorithm called Naive Bayes into the plug-in because of its proven high performance on this problem. This article presents a practical view of the software fault prediction problem and shows how we combined software metrics with software fault data to apply the Naive Bayes technique inside an open-source platform.
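The Eclipse plug-in itself is not reproduced here; the sketch below only illustrates the underlying idea of training Naive Bayes on per-module software metrics labeled with past fault data and then scoring new modules. The metric names and values are hypothetical, and scikit-learn's GaussianNB is an assumed stand-in for the plug-in's own implementation.

```python
# Not the Eclipse plug-in: a minimal sketch of its core idea, training a Naive
# Bayes classifier on per-module software metrics labeled faulty / fault-free and
# then flagging fault-prone modules. All metric values below are illustrative.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical method-level metrics per module:
# [lines_of_code, cyclomatic_complexity, halstead_volume]
X_train = np.array([
    [120, 14,  830.0],
    [ 35,  3,  110.5],
    [410, 27, 2210.0],
    [ 60,  5,  190.0],
])
y_train = np.array([1, 0, 1, 0])  # 1 = faulty in a past release, 0 = fault-free

model = GaussianNB().fit(X_train, y_train)

# New modules from the current release; predicted probability of being fault-prone
X_new = np.array([[250, 18, 1500.0], [40, 2, 95.0]])
print(model.predict(X_new))              # hard labels
print(model.predict_proba(X_new)[:, 1])  # fault-proneness scores for test prioritization
```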


International Conference on Information Technology: New Generations | 2009

Clustering and Metrics Thresholds Based Software Fault Prediction of Unlabeled Program Modules

Cagatay Catal; Ugur Sevim; Banu Diri

Predicting the fault-proneness of program modules when fault labels for the modules are unavailable is a practical problem frequently encountered in the software industry. Because fault data belonging to a previous software version are not available, supervised learning approaches cannot be applied, leading to the need for new methods, tools, or techniques. In this study, we propose a software fault prediction approach based on clustering and metrics thresholds for this challenging problem and explore it on three datasets collected from a Turkish white-goods manufacturer developing embedded controller software. Experiments reveal that unsupervised software fault prediction can be automated and that reasonable results can be produced with techniques based on metrics thresholds and clustering. The results of this study demonstrate the effectiveness of metrics thresholds and show that the standalone application of metrics thresholds (one-stage) is currently easier than the clustering-and-thresholds-based (two-stage) approach, because the number of clusters must be chosen heuristically in the clustering-based method.
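A minimal sketch of the two-stage idea follows, assuming illustrative metric values and thresholds that are not taken from the paper: modules are clustered on their metrics with k-means, and a cluster is marked fault-prone if its centroid exceeds any threshold.

```python
# A rough sketch of the two-stage (clustering + metrics thresholds) idea, under
# assumed threshold values: cluster unlabeled modules by their metrics with k-means,
# then mark a whole cluster fault-prone if its centroid violates any threshold.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical metrics per unlabeled module:
# [lines_of_code, cyclomatic_complexity, operand_count]
X = np.array([
    [ 30,  2,  40], [ 45,  4,  60], [300, 25, 700],
    [280, 22, 650], [ 55,  5,  70], [350, 30, 800],
])

# Illustrative thresholds (the paper's actual thresholds may differ)
thresholds = np.array([200, 10, 300])

k = 2  # chosen heuristically, which is exactly the difficulty the paper notes
kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

cluster_fault_prone = {
    c: bool(np.any(kmeans.cluster_centers_[c] > thresholds)) for c in range(k)
}
module_labels = [cluster_fault_prone[c] for c in kmeans.labels_]
print(module_labels)  # True = predicted fault-prone, with no fault labels needed
```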


Product Focused Software Process Improvement | 2008

A Fault Prediction Model with Limited Fault Data to Improve Test Process

Cagatay Catal; Banu Diri

Software fault prediction models are used to identify fault-prone software modules and produce reliable software. The performance of a software fault prediction model is correlated with the available software metrics and fault data. On some occasions, only a few software modules have fault data, and prediction models that use only labeled data therefore cannot provide accurate results. Semi-supervised learning approaches, which benefit from both unlabeled and labeled data, may be applied in this case. In this paper, we propose an artificial immune system based semi-supervised learning approach. The proposed approach uses a recent semi-supervised algorithm called YATSI (Yet Another Two Stage Idea), and AIRS (Artificial Immune Recognition Systems) is applied in the first stage of YATSI. In addition, AIRS, the RF (Random Forests) classifier, AIRS-based YATSI, and RF-based YATSI are benchmarked. Experimental results showed that while the YATSI algorithm improved the performance of AIRS, it diminished the performance of RF for unbalanced datasets. Furthermore, the performance of AIRS-based YATSI is comparable with that of RF, which is the best machine learning classifier according to some studies.
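The sketch below illustrates the two-stage YATSI idea in a simplified form. AIRS has no standard scikit-learn implementation, so Random Forest (also benchmarked in the paper) is assumed in the first stage, the data are synthetic, and YATSI's instance weighting in the second stage is omitted.

```python
# A simplified sketch of the two-stage YATSI idea, with Random Forest standing in
# for AIRS. Stage 1 pre-labels the unlabeled modules; stage 2 classifies with
# nearest neighbours over the combined labeled + pre-labeled data. The data are
# synthetic and YATSI's actual instance weighting is not implemented.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X_labeled = rng.normal(size=(40, 5))
y_labeled = rng.integers(0, 2, 40)        # small labeled set (fault / no fault)
X_unlabeled = rng.normal(size=(200, 5))   # modules without fault data
X_test = rng.normal(size=(20, 5))

# Stage 1: train on the small labeled set, then pre-label the unlabeled modules
stage1 = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_labeled, y_labeled)
y_pre = stage1.predict(X_unlabeled)

# Stage 2: nearest-neighbour classification over the enlarged training set
X_all = np.vstack([X_labeled, X_unlabeled])
y_all = np.concatenate([y_labeled, y_pre])
stage2 = KNeighborsClassifier(n_neighbors=10).fit(X_all, y_all)
print(stage2.predict(X_test))
```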


Product Focused Software Process Improvement | 2007

Software fault prediction with object-oriented metrics based artificial immune recognition system

Cagatay Catal; Banu Diri

Software testing is a time-consuming and expensive process. Software fault prediction models are used to identify fault-prone classes automatically before system testing. These models can reduce the testing duration, project risks, and resource and infrastructure costs. In this study, we propose a novel fault prediction model to improve the testing process. Chidamber-Kemerer object-oriented metrics and method-level metrics such as Halstead and McCabe are used as independent variables in our Artificial Immune Recognition System based model. According to this study, a class-level-metrics-based model that applies the AIRS algorithm can be used successfully for fault prediction, and its performance is higher than that of a J48-based approach. A fault prediction tool using this model can easily be integrated into the testing process.


International Conference on Dependability of Computer Systems | 2007

An Artificial Immune System Approach for Fault Prediction in Object-Oriented Software

Cagatay Catal; Banu Diri; Bulent Ozumut

The features of real-time dependable systems are availability, reliability, safety and security. In the near future, real-time systems will be able to adapt themselves to specific requirements, and real-time dependability assessment techniques will be able to classify modules as faulty or fault-free. Software fault prediction models help us develop dependable software, and they are commonly applied prior to system testing. In this study, we examine Chidamber-Kemerer (CK) metrics and some method-level metrics for our model, which is based on the artificial immune recognition system (AIRS) algorithm. The dataset is part of the NASA Metrics Data Program, and the class-level metrics are from the PROMISE repository. Instead of validating individual metrics, our mission is to improve the prediction performance of our model. The experiments indicate that the combination of CK metrics and the lines-of-code metric provides the best prediction results for our fault prediction model. The findings of this study suggest that class-level data should be used rather than method-level data to construct better fault prediction models. Furthermore, this model can constitute part of a real-time dependability assessment technique in the future.


Engineering Applications of Artificial Intelligence | 2015

A corpus-based semantic kernel for text classification by using meaning values of terms

Berna Altınel; Murat Can Ganiz; Banu Diri

Text categorization plays a crucial role on both academic and commercial platforms due to the growing demand for automatic organization of documents. Kernel-based classification algorithms such as Support Vector Machines (SVM) have become highly popular in text mining, mainly because of their relatively high classification accuracy across several application domains and their ability to handle the high-dimensional and sparse data that are the prohibitive characteristics of textual data representation. Recently, there has been increased interest in exploiting background knowledge, such as ontologies and corpus-based statistical knowledge, in text categorization. It has been shown that, by replacing standard kernel functions such as the linear kernel with customized kernel functions that take advantage of this background knowledge, it is possible to increase the performance of SVM in the text classification domain. Based on this, we propose a novel semantic smoothing kernel for SVM. The suggested approach is based on a meaning measure, which calculates the meaningfulness of the terms in the context of classes. The document vectors are smoothed based on these meaning values of the terms in the context of classes. Since we make efficient use of the class information in the smoothing process, it can be considered a supervised smoothing kernel. The meaning measure is based on the Helmholtz principle from Gestalt theory and has previously been applied to several text mining applications such as document summarization and feature extraction. However, to the best of our knowledge, ours is the first study to use the meaning measure in a supervised setting to build a semantic kernel for SVM. We evaluated the proposed approach by conducting a large number of experiments on well-known textual datasets and present results for different experimental conditions. We compare our results with traditional kernels used in SVM, such as the linear kernel, as well as with several corpus-based semantic kernels. Our results show that the proposed approach outperforms the other kernels in classification performance.
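The sketch below is only a schematic of a supervised semantic smoothing kernel, not the paper's method: the Helmholtz-principle meaning measure is replaced by a simple class-conditional term weight (an assumption), just to show how a term-smoothing matrix S turns document vectors X into a precomputed kernel K = (XS)(XS)^T for an SVM.

```python
# A schematic sketch of a supervised semantic smoothing kernel. The paper's
# Helmholtz-principle meaning measure is NOT reimplemented; a simple
# class-conditional term weight stands in for it, to show how a smoothing matrix S
# turns document vectors X into the precomputed kernel K = (X S)(X S)^T.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

docs = ["cheap loan offer now", "limited offer cheap price",
        "project meeting agenda", "meeting notes and agenda"]
labels = np.array([0, 0, 1, 1])            # toy class labels

X = CountVectorizer().fit_transform(docs).toarray().astype(float)

# Stand-in "meaning" of term j in class c: its relative frequency in that class
n_classes = 2
M = np.vstack([X[labels == c].sum(axis=0) for c in range(n_classes)])
M = M / np.maximum(M.sum(axis=1, keepdims=True), 1e-12)

# Term-by-term smoothing matrix: terms prominent in the same classes reinforce each other
S = M.T @ M
K_train = (X @ S) @ (X @ S).T              # semantic kernel between training documents

clf = SVC(kernel="precomputed").fit(K_train, labels)
print(clf.predict(K_train))                # sanity check on the training data
```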


Expert Systems | 2009

Unlabelled extra data do not always mean extra performance for semi‐supervised fault prediction

Cagatay Catal; Banu Diri

This research focused on investigating and benchmarking several high-performance classifiers, namely J48, random forests, naive Bayes, KStar and artificial immune recognition systems, for software fault prediction with limited fault data. We also studied a recent semi-supervised classification algorithm called YATSI (Yet Another Two Stage Idea), and each classifier was used in the first stage of YATSI. YATSI is a meta-algorithm that allows different classifiers to be applied in its first stage. Furthermore, we proposed a semi-supervised classification algorithm that applies the artificial immune systems paradigm. Experimental results showed that YATSI does not always improve the performance of naive Bayes when unlabelled data are used together with labelled data. According to the experiments we performed, the naive Bayes algorithm is the best choice for building a semi-supervised fault prediction model on small data sets, and YATSI may improve the performance of naive Bayes on large data sets. In addition, the YATSI algorithm improved the performance of all the classifiers except naive Bayes on all the data sets.


Applications of Natural Language to Data Bases | 2006

Automatic Turkish text categorization in terms of author, genre and gender

M. Fatih Amasyali; Banu Diri

In this study, the first comprehensive text classification using an n-gram model has been realized for Turkish. We worked on three different tasks: automatically identifying the author of a Turkish document, classifying documents according to text genre, and identifying the gender of the author. Naive Bayes, Support Vector Machine, C4.5 and Random Forest were used as classification methods, and the results are given comparatively. Success rates of 83%, 93% and 96% were obtained in determining the author of the text, the genre of the text and the gender of the author, respectively.
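A minimal sketch of n-gram-based text classification in this spirit is shown below, using scikit-learn with Naive Bayes; the two-sentence corpus and the author labels are toy placeholders rather than the paper's Turkish dataset, and the character n-gram range is an assumption.

```python
# A minimal sketch of n-gram-based text classification in the spirit of the study:
# character n-grams feed a Naive Bayes classifier. The tiny "corpus" and the author
# labels are toy placeholders, not the paper's Turkish dataset.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = [
    "Bu çalışmada metin sınıflandırma yöntemleri incelenmiştir.",
    "Roman kahramanı sabahın erken saatlerinde yola çıktı.",
]
authors = ["author_A", "author_B"]   # placeholder labels (author / genre / gender)

model = make_pipeline(
    CountVectorizer(analyzer="char_wb", ngram_range=(2, 3)),  # character 2- and 3-grams
    MultinomialNB(),
)
model.fit(texts, authors)
print(model.predict(["Sabah erkenden yola çıkan kahraman köye vardı."]))
```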

Collaboration


Dive into Banu Diri's collaborations.

Top Co-Authors

Göksel Biricik, Yıldız Technical University
Tuğba Yıldız, Istanbul Bilgi University
Cagatay Catal, Information Technology Institute
Zeynep Banu Ozger, Yıldız Technical University
Ebubekir Buber, Yıldız Technical University