
Publication


Featured research published by Olcay Taner Yildiz.


International Journal of Bifurcation and Chaos | 2010

CRYPTANALYSIS OF FRIDRICH'S CHAOTIC IMAGE ENCRYPTION

Ercan Solak; Cahit Çokal; Olcay Taner Yildiz; Türker Bıyıkoğlu

We cryptanalyze Fridrich’s chaotic image encryption algorithm. We show that the algebraic weaknesses of the algorithm make it vulnerable to chosen-ciphertext attacks. We propose an attack that reveals the secret permutation that is used to shuffle the pixels of a round input. We demonstrate the effectiveness of our attack with examples and simulation results. We also show that our proposed attack can be generalized to other well-known chaotic image encryption algorithms.
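
As a rough illustration of the attack idea, the sketch below recovers a pixel permutation from chosen ciphertexts, assuming a hypothetical `decrypt` oracle whose only secret operation is scattering pixels by the permutation; the paper's actual attack handles Fridrich's full round structure, which this toy omits.

```python
import numpy as np

def recover_permutation(decrypt, n):
    """Recover a secret pixel permutation with n chosen ciphertexts.

    decrypt: oracle mapping a length-n ciphertext to its plaintext; here it
    is assumed to apply only the inverse of the secret shuffle.
    """
    perm = np.empty(n, dtype=int)
    for i in range(n):
        c = np.zeros(n, dtype=np.uint8)
        c[i] = 255                              # mark a single pixel
        perm[i] = int(np.argmax(decrypt(c)))    # see where the mark lands
    return perm

# Toy oracle: plaintext slot secret[i] receives ciphertext pixel i.
rng = np.random.default_rng(0)
secret = rng.permutation(16)

def oracle(c):
    p = np.empty_like(c)
    p[secret] = c
    return p

assert np.array_equal(recover_permutation(oracle, 16), secret)
```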


Empirical Software Engineering | 2014

Software defect prediction using Bayesian networks

Ahmet Okutan; Olcay Taner Yildiz

Many different software metrics have been proposed and used for defect prediction in the literature. Instead of dealing with so many metrics, it would be practical to determine the set of metrics that matter most and focus on them when predicting defectiveness. We use Bayesian networks to determine the probabilistic influential relationships among software metrics and defect proneness. In addition to the metrics in the Promise data repository, we define two more metrics: NOD (number of developers) and LOCQ (source code quality). We extract these metrics by inspecting the source code repositories of the selected Promise data sets. At the end of our modeling, we learn the marginal defect proneness probability of the whole software system, the set of most effective metrics, and the influential relationships among metrics and defectiveness. Our experiments on nine open source Promise data sets show that response for class (RFC), lines of code (LOC), and lack of coding quality (LOCQ) are the most effective metrics, whereas coupling between objects (CBO), weighted methods per class (WMC), and lack of cohesion of methods (LCOM) are less effective on defect proneness. Furthermore, number of children (NOC) and depth of inheritance tree (DIT) have very limited effect and are untrustworthy. On the other hand, based on the experiments on the Poi, Tomcat, and Xalan data sets, we observe a positive correlation between the number of developers (NOD) and the level of defectiveness. However, further investigation involving a greater number of projects is needed to confirm our findings.
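
The core output the abstract describes, the conditional influence of metrics on defect proneness, can be illustrated with a much simpler stand-in: a Laplace-smoothed conditional probability table over binned metrics. This is a minimal sketch on hypothetical data, not the paper's Bayesian network model.

```python
import numpy as np
import pandas as pd

def defect_cpt(df, metrics, target="defective", bins=2, alpha=1.0):
    """Estimate P(defective | binned metrics) from counts, Laplace-smoothed."""
    binned = df[metrics].apply(
        lambda col: pd.qcut(col, bins, labels=False, duplicates="drop"))
    grouped = df[target].groupby([binned[m] for m in metrics])
    return (grouped.sum() + alpha) / (grouped.count() + 2 * alpha)

# Hypothetical data using the metrics the paper finds most effective.
rng = np.random.default_rng(1)
data = pd.DataFrame({"RFC": rng.poisson(30, 500),
                     "LOC": rng.poisson(200, 500),
                     "LOCQ": rng.random(500)})
data["defective"] = (rng.random(500) < 0.1 + 0.4 * (data["RFC"] > 30)).astype(int)
print(defect_cpt(data, ["RFC", "LOC", "LOCQ"]))
```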


Information Sciences | 2009

Incremental construction of classifier and discriminant ensembles

Aydın Ulaş; Murat Semerci; Olcay Taner Yildiz; Ethem Alpaydin

We discuss approaches to incrementally construct an ensemble. The first constructs an ensemble of classifiers by choosing a subset from a larger set, and the second constructs an ensemble of discriminants, where a classifier is used for some classes only. We investigate criteria including accuracy, significant improvement, diversity, correlation, and the role of search direction. For discriminant ensembles, we test subset selection and trees. Fusion is by voting or by a linear model. Using 14 classifiers on 38 data sets, incremental search finds small, accurate ensembles in polynomial time. The discriminant ensemble uses a subset of discriminants and is simpler, interpretable, and accurate. We see that an incremental ensemble has higher accuracy than bagging and the random subspace method, and comparable accuracy to AdaBoost with fewer classifiers.
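
A minimal sketch of the first approach, greedy forward selection of classifiers under majority-vote accuracy, assuming each candidate's validation predictions are precomputed; the paper also studies other selection criteria and search directions.

```python
import numpy as np

def vote(preds):
    """Majority vote over a list of integer-label prediction arrays."""
    stacked = np.stack(preds)
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, stacked)

def forward_select(val_preds, y_val):
    """Greedily add whichever classifier most improves voted accuracy."""
    chosen, best = [], -1.0
    while len(chosen) < len(val_preds):
        gains = {name: (vote([val_preds[n] for n in chosen] + [p]) == y_val).mean()
                 for name, p in val_preds.items() if name not in chosen}
        name, acc = max(gains.items(), key=lambda kv: kv[1])
        if acc <= best:        # stop once no candidate improves the ensemble
            break
        chosen.append(name)
        best = acc
    return chosen, best
```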


International Journal of Pattern Recognition and Artificial Intelligence | 2005

LINEAR DISCRIMINANT TREES

Olcay Taner Yildiz; Ethem Alpaydin

We discuss and test empirically the effects of six dimensions along which existing decision tree induction algorithms differ. These are: node type (univariate versus multivariate), branching factor (two or more), grouping of classes into two if the tree is binary, error (impurity) measure, and the methods for minimization to find the best split vector and threshold. We then propose a new decision tree induction method that we name linear discriminant trees (LDT), which uses the best combination of these criteria in terms of accuracy, simplicity and learning time. This tree induction method can be univariate or multivariate. The method has a supervised outer optimization layer for converting a K > 2-class problem into a sequence of two-class problems, and each two-class problem is solved analytically using Fisher’s Linear Discriminant Analysis (LDA). On twenty datasets from the UCI repository, we compare the linear discriminant trees with the univariate decision tree methods C4.5 and C5.0, and the multivariate decision tree methods CART, OC1, QUEST, neural trees and LMDT. Our proposed linear discriminant trees learn fast, are accurate, and the trees generated are small.
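
At a multivariate node, the two-class subproblem has the closed-form Fisher solution the abstract mentions. A minimal sketch, assuming the K classes have already been grouped into two superclasses X1 and X2 (the paper's class-grouping layer is not reproduced here):

```python
import numpy as np

def fisher_split(X1, X2, reg=1e-6):
    """Fisher's LDA split: w = Sw^{-1}(m1 - m2), threshold at the midpoint."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    Sw = (np.cov(X1, rowvar=False) * (len(X1) - 1)
          + np.cov(X2, rowvar=False) * (len(X2) - 1))
    Sw += reg * np.eye(Sw.shape[0])     # regularize for invertibility
    w = np.linalg.solve(Sw, m1 - m2)
    w0 = w @ (m1 + m2) / 2              # midpoint of the projected means
    return w, w0

# At the node, a sample x goes left if w @ x > w0, right otherwise.
```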


Pattern Recognition Letters | 2007

Parallel univariate decision trees

Olcay Taner Yildiz; Onur Dikmen

Univariate decision tree algorithms are widely used in data mining because (i) they are easy to learn and (ii) once trained, they can be expressed in a rule-based manner. In many applications, notably in data mining, the dataset to be learned is very large, and it is highly desirable to construct univariate decision trees in reasonable time. This may be accomplished by parallelizing univariate decision tree algorithms. In this paper, we first present two different univariate decision tree algorithms, C4.5 and the univariate linear discriminant tree. We show how to parallelize these algorithms in three ways: (i) feature-based, (ii) node-based, and (iii) data-based. Experimental results show that the performance of the parallelizations depends strongly on the dataset, and that node-based parallelization demonstrates good speedups.
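
A minimal sketch of the feature-based scheme, assuming Gini impurity and thresholds at observed feature values: each worker scans one feature for its best threshold, and the results are reduced to a single split. The node-based and data-based variants follow the same pattern at different granularities.

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def gini(y):
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split_for_feature(args):
    """Scan one feature; return (weighted impurity, threshold, feature)."""
    x, y, j = args
    best = (np.inf, None, j)
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if score < best[0]:
            best = (score, float(t), j)
    return best

def parallel_best_split(X, y, workers=4):
    """Feature-based parallelization: one task per feature, then reduce."""
    tasks = [(X[:, j], y, j) for j in range(X.shape[1])]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return min(pool.map(best_split_for_feature, tasks), key=lambda b: b[0])
```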


International Journal of Pattern Recognition and Artificial Intelligence | 2009

AN INCREMENTAL FRAMEWORK BASED ON CROSS-VALIDATION FOR ESTIMATING THE ARCHITECTURE OF A MULTILAYER PERCEPTRON

Oya Aran; Olcay Taner Yildiz; Ethem Alpaydin

We define the problem of optimizing the architecture of a multilayer perceptron (MLP) as a state space search and propose the MOST (Multiple Operators using Statistical Tests) framework that incrementally modifies the structure and checks for improvement using cross-validation. We consider five variants that implement forward/backward search, use single/multiple operators, and search depth-first/breadth-first. On 44 classification and 30 regression datasets, we exhaustively search for the optimal architecture and evaluate the goodness of a variant based on: (1) Order, the accuracy with respect to the optimal, and (2) Rank, the computational complexity. We check for the effect of two resampling methods (5 × 2, ten-fold cv), four statistical tests (5 × 2 cv t, ten-fold cv t, Wilcoxon, sign) and two corrections for multiple comparisons (Bonferroni, Holm). We also compare with Dynamic Node Creation (DNC) and Cascade Correlation (CC). Our results show that: (1) on most datasets, networks with few hidden units are optimal; (2) forward searching finds simpler architectures; (3) variants using single node additions (deletions) generally stop early and get stuck in simple (complex) networks; (4) choosing the best of multiple operators finds networks closer to the optimal; and (5) MOST variants generally find simpler networks having lower or comparable error rates compared with DNC and CC.
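
A much-simplified single-operator forward variant is sketched below under stated assumptions: one "add a hidden unit" operator, ten-fold cross-validation, and a paired t test standing in for the battery of tests the paper evaluates.

```python
import numpy as np
from scipy import stats
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def forward_search(X, y, max_hidden=32, alpha=0.05):
    """Grow the hidden layer one unit at a time; keep a modification only
    if the cross-validated accuracy improves significantly."""
    def cv_scores(h):
        clf = MLPClassifier(hidden_layer_sizes=(h,), max_iter=2000,
                            random_state=0)
        return cross_val_score(clf, X, y, cv=10)   # same folds on every call

    h, cur = 1, cv_scores(1)
    while h < max_hidden:
        cand = cv_scores(h + 1)
        t, p = stats.ttest_rel(cand, cur)
        if t > 0 and p < alpha:    # significant improvement: accept operator
            h, cur = h + 1, cand
        else:
            break                  # no significant gain: stop the search
    return h, cur.mean()
```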


IEEE/ACM Transactions on Computational Biology and Bioinformatics | 2012

Design and Analysis of Classifier Learning Experiments in Bioinformatics: Survey and Case Studies

Ozan Irsoy; Olcay Taner Yildiz; Ethem Alpaydin

In many bioinformatics applications, it is important to assess and compare the performances of algorithms trained from data, in order to draw conclusions that are unaffected by chance and are therefore significant. Both the design of such experiments and the analysis of the resulting data using statistical tests should be done carefully for the results to carry significance. In this paper, we first review the performance measures used in classification, the basics of experiment design, and statistical tests. We then give the results of our survey over 1,500 papers published in the last two years in three bioinformatics journals (including this one). Although the basics of experiment design are well understood, such as resampling instead of using a single training set and the use of different performance metrics instead of error, only 21 percent of the papers use any statistical test for comparison. In the third part, we analyze four different scenarios which we encounter frequently in the bioinformatics literature, discussing the proper statistical methodology as well as showing an example case study for each. With the supplementary software, we hope that the guidelines we discuss will play an important role in future studies.
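
As a concrete instance of the kind of test the survey recommends, here is a sketch of Dietterich's 5 × 2 cv paired t test, one of the standard tests for comparing two classifiers; an assumption of this sketch is that the per-fold error differences have already been collected into a 5 × 2 array.

```python
import numpy as np
from scipy import stats

def cv52_t_test(diffs):
    """Dietterich's 5x2 cv paired t test.

    diffs[i, j]: difference in error of the two classifiers on fold j
    of replication i (a 5 x 2 array).
    """
    diffs = np.asarray(diffs, dtype=float)
    p_bar = diffs.mean(axis=1, keepdims=True)
    s2 = ((diffs - p_bar) ** 2).sum(axis=1)   # per-replication variance
    t = diffs[0, 0] / np.sqrt(s2.mean())      # sqrt((1/5) * sum_i s_i^2)
    p_value = 2 * stats.t.sf(abs(t), df=5)    # two-sided, 5 dof
    return t, p_value
```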


IEEE Transactions on Knowledge and Data Engineering | 2013

Omnivariate Rule Induction Using a Novel Pairwise Statistical Test

Olcay Taner Yildiz

Rule learning algorithms such as Ripper induce univariate rules, that is, a propositional condition in a rule uses only one feature. In this paper, we propose an omnivariate induction of rules where, under each condition, both a univariate and a multivariate condition are trained, and the best is chosen according to a novel statistical test. This paper has three main contributions: First, we propose a novel statistical test, the combined 5 × 2 cv t test, to compare two classifiers; it is a variant of the 5 × 2 cv t test, and we give its connections to other tests such as the 5 × 2 cv F test and the k-fold paired t test. Second, we propose a multivariate version of Ripper, where a support vector machine with a linear kernel is used to find multivariate linear conditions. Third, we propose an omnivariate version of Ripper, where the model selection is done via the combined 5 × 2 cv t test. Our results indicate that 1) the combined 5 × 2 cv t test has higher power (lower type II error), lower type I error, and higher replicability compared to the 5 × 2 cv t test, and 2) omnivariate rules are better in that they choose whichever condition is more accurate, selecting the right model automatically and separately for each condition in a rule.
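
The paper's combined 5 × 2 cv t statistic is not reproduced here; as a reference point, below is a sketch of one of the tests it is connected to, Alpaydin's 5 × 2 cv F test, taking the same 5 × 2 array of error differences as input.

```python
import numpy as np
from scipy import stats

def cv52_F_test(diffs):
    """Alpaydin's 5x2 cv F test over a 5 x 2 array of error differences."""
    diffs = np.asarray(diffs, dtype=float)
    p_bar = diffs.mean(axis=1, keepdims=True)
    s2 = ((diffs - p_bar) ** 2).sum(axis=1)   # per-replication variance
    f = (diffs ** 2).sum() / (2.0 * s2.sum())
    p_value = stats.f.sf(f, dfn=10, dfd=5)    # F with (10, 5) dof
    return f, p_value
```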


Information Sciences | 2012

Eigenclassifiers for combining correlated classifiers

Aydın Ulaş; Olcay Taner Yildiz; Ethem Alpaydin

In practice, classifiers in an ensemble are not independent. This paper is the continuation of our previous work on ensemble subset selection [A. Ulas, M. Semerci, O.T. Yildiz, E. Alpaydin, Incremental construction of classifier and discriminant ensembles, Information Sciences, 179 (9) (2009) 1298-1318] and has two parts: First, we investigate the effect of four factors on correlation: (i) the algorithms used for training, (ii) the hyperparameters of the algorithms, (iii) resampled training sets, and (iv) input feature subsets. Simulations using 14 classifiers on 38 data sets indicate that hyperparameters and overlapping training sets have a higher effect on positive correlation than features and algorithms. Second, we propose postprocessing the expert outputs with principal component analysis (PCA) before fusing, forming uncorrelated eigenclassifiers from a set of correlated experts. Combining the information from all classifiers may be better than subset selection, where some base classifiers are pruned before combination, because using all of them preserves redundancy.
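
A minimal binary-classification sketch of the second part: rotate the correlated expert outputs with PCA, keep all components (so no information is pruned), and fuse with a linear combiner. The expert-output layout is an assumption of this sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def fit_eigen_combiner(expert_outputs, y):
    """expert_outputs: (n_samples, n_experts), e.g. each expert's posterior
    for the positive class. PCA decorrelates; all components are kept."""
    pca = PCA().fit(expert_outputs)          # rotation only, no pruning
    combiner = LogisticRegression().fit(pca.transform(expert_outputs), y)
    return pca, combiner

def predict(pca, combiner, expert_outputs):
    return combiner.predict(pca.transform(expert_outputs))
```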


International Symposium on Computer and Information Sciences | 2009

Calculating the VC-dimension of decision trees

Olcay Taner Yildiz; Ethem Alpaydin

We propose an exhaustive search algorithm that calculates the VC-dimension of univariate decision trees with binary features. The VC-dimension of a univariate decision tree with binary features depends on (i) the VC-dimension values of the left and right subtrees, (ii) the number of inputs, and (iii) the number of nodes in the tree. From a training set of example trees whose VC-dimensions are calculated by exhaustive search, we fit a general regressor to estimate the VC-dimension of any binary tree. These VC-dimension estimates are then used to obtain VC-generalization bounds for complexity control via structural risk minimization (SRM) in decision trees, i.e., pruning. Our simulation results show that SRM-pruning using the estimated VC-dimensions finds trees that are as accurate as those pruned using cross-validation.
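
In the spirit of the exhaustive search the abstract describes, here is a generic shattering check that computes the VC-dimension of a small hypothesis class by brute force; the stump class in the example is a toy stand-in, not the paper's decision-tree algorithm.

```python
from itertools import combinations, product

def shatters(hypotheses, points):
    """True iff the hypotheses realize all 2^|points| labelings."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

def vc_dimension(hypotheses, domain):
    """Largest d such that some d-subset of the domain is shattered."""
    d = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(hypotheses, s) for s in combinations(domain, k)):
            d = k
        else:
            break   # if no k-set is shattered, no larger set can be
    return d

# Toy check: single-feature stumps on 3 binary features.
domain = list(product([0, 1], repeat=3))
stumps = [lambda x, j=j, v=v: int(x[j] == v) for j in range(3) for v in (0, 1)]
print(vc_dimension(stumps, domain))   # -> 2 (6 stumps cannot shatter 3 points)
```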
