Zehra Cataltepe
Istanbul Technical University
Publications
Featured research published by Zehra Cataltepe.
Neural Computation | 1999
Zehra Cataltepe; Yaser S. Abu-Mostafa; Malik Magdon-Ismail
We show that with a uniform prior on models having the same training error, early stopping at some fixed training error above the training error minimum results in an increase in the expected generalization error.
Neurocomputing | 2010
Yusuf Yaslan; Zehra Cataltepe
We introduce the relevant random subspace Co-training (Rel-RASCO) algorithm, which produces relevant random subspaces and then performs semi-supervised ensemble learning using those subspaces and unlabeled data. Ensemble learning algorithms may benefit from the diversity of the classifiers used. However, for high-dimensional data, choosing subspaces randomly, as in the RASCO (Random Subspace Method for Co-training, Wang et al. 2008 [5]) algorithm, may produce diverse but inaccurate classifiers. We produce relevant random subspaces by drawing features with probabilities proportional to their relevances, measured by the mutual information between features and class labels. We show that Rel-RASCO achieves better accuracy by this relevant and random subspace selection scheme. Experiments on five real and one synthetic data sets show that the Rel-RASCO algorithm outperforms both RASCO and Co-training in terms of the accuracy achieved at the end of Co-training.
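The subspace-drawing step described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation; the function names (`mutual_information`, `relevant_random_subspaces`) are our own, and features are assumed discrete:

```python
import math
import random
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), simplified to c*n/(cx*cy)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def relevant_random_subspaces(features, labels, k, n_subspaces, seed=0):
    """Draw feature subspaces of size k with probability proportional
    to each feature's mutual-information relevance to the labels."""
    rng = random.Random(seed)
    # floor at a tiny value so zero-relevance features keep nonzero weight
    rel = [max(mutual_information(col, labels), 1e-12) for col in features]
    subspaces = []
    for _ in range(n_subspaces):
        chosen = set()
        while len(chosen) < k:
            chosen.add(rng.choices(range(len(features)), weights=rel)[0])
        subspaces.append(sorted(chosen))
    return subspaces
```

A feature identical to the labels gets weight log 2, while an independent one gets essentially zero, so relevant features dominate the drawn subspaces.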
EURASIP Journal on Advances in Signal Processing | 2007
Zehra Cataltepe; Yusuf Yaslan; Abdullah Sonmez
We report our findings on using MIDI files and audio features from MIDI, separately and combined, for MIDI music genre classification. We use McKay and Fujinaga's 3-root and 9-leaf genre data set. In order to compute distances between MIDI pieces, we use the normalized compression distance (NCD). NCD uses the compressed length of a string as an approximation to its Kolmogorov complexity and has previously been used for music genre and composer clustering. We convert the MIDI pieces to audio and then use the audio features to train different classifiers. MIDI and audio-from-MIDI classifiers alone achieve much smaller accuracies than those reported by McKay and Fujinaga, who used not NCD but a number of domain-based MIDI features for their classification. Combining MIDI and audio-from-MIDI classifiers improves accuracy and gets closer to, but still remains below, the accuracies of McKay and Fujinaga. The best root genre accuracies achieved using MIDI, audio, and their combination are 0.75, 0.86, and 0.93, respectively, compared to the 0.98 of McKay and Fujinaga. Successful classifier combination requires diversity of the base classifiers. We achieve diversity by using a certain number of seconds of the MIDI file, different sample rates and sizes for the audio file, and different classification algorithms.
international symposium on computer and information sciences | 2008
Baris Senliol; Gokhan Gulgezen; Lei Yu; Zehra Cataltepe
In this paper we describe an extension of the information-theoretical FCBF (Fast Correlation-Based Filter) feature selection algorithm. The extension, called FCBF#, enables FCBF to select a feature subset of any given size, and it selects features in a different order than FCBF. We find that the extended FCBF algorithm results in more accurate classifiers.
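FCBF-style methods score features by their symmetrical uncertainty (SU) with the class. A minimal sketch of that relevance ranking is below; the redundancy-elimination step and the reordering specific to FCBF# are omitted, and the function names are ours:

```python
import math
from collections import Counter

def entropy(xs):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def symmetrical_uncertainty(xs, ys):
    """SU(X,Y) = 2 * I(X;Y) / (H(X) + H(Y)), in [0, 1]."""
    hx, hy = entropy(xs), entropy(ys)
    hxy = entropy(list(zip(xs, ys)))
    mi = hx + hy - hxy
    return 2 * mi / (hx + hy) if hx + hy > 0 else 0.0

def rank_by_su(features, labels, k):
    """Return indices of the k features with highest SU against the class."""
    scores = [(symmetrical_uncertainty(col, labels), i)
              for i, col in enumerate(features)]
    return [i for _, i in sorted(scores, reverse=True)[:k]]
```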
european conference on machine learning | 2009
Gokhan Gulgezen; Zehra Cataltepe; Lei Yu
In addition to accuracy, stability is also a measure of success for a feature selection algorithm. Stability can especially be a concern when the number of samples in a data set is small and the dimensionality is high. In this study, we introduce a stability measure and perform both accuracy and stability measurements of the MRMR (Minimum Redundancy Maximum Relevance) feature selection algorithm on different data sets. The two feature evaluation criteria used by MRMR, MID (Mutual Information Difference) and MIQ (Mutual Information Quotient), result in similar accuracies, but MID is more stable. We also introduce a new feature selection criterion, MIDα, where the redundancy and relevance of selected features are controlled by the parameter α.
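The two mRMR criteria can be sketched as a greedy selection loop: at each step MID maximizes relevance minus mean redundancy and MIQ maximizes their quotient. This is an illustrative re-implementation on discrete features, not the authors' code:

```python
import math
from collections import Counter

def mi(xs, ys):
    """Mutual information (bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr_select(features, labels, k, criterion="MID"):
    """Greedy mRMR: repeatedly add the feature maximizing
    relevance - redundancy (MID) or relevance / redundancy (MIQ)."""
    selected, remaining = [], list(range(len(features)))
    rel = {i: mi(features[i], labels) for i in remaining}
    while remaining and len(selected) < k:
        def score(i):
            red = (sum(mi(features[i], features[j]) for j in selected)
                   / len(selected)) if selected else 0.0
            return rel[i] - red if criterion == "MID" else rel[i] / (red + 1e-12)
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

After picking one informative feature, an exact duplicate of it scores zero under MID, so the next pick is a complementary feature rather than a redundant one.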
Data Mining and Knowledge Discovery | 2016
İsmail Güneş; Şule Gündüz-Öğüdücü; Zehra Cataltepe
We propose a link prediction method for evolving networks. Our method first computes a number of different node similarity scores (e.g. Common Neighbor, Preferential Attachment, Adamic–Adar, Jaccard) and their weighted versions, for different past time periods. In order to predict the future node similarity scores, a powerful time series forecasting model, ARIMA, based on these past node similarity scores is used. This time series forecasting based approach enables link prediction based on modeling of the change of past node similarities and also external factors. The proposed link prediction method can be used for evolving networks and prediction of new or recurring links. We evaluate the link prediction performances of our proposed method and the previously proposed time series and similarity based link prediction methods under different circumstances by means of different AUC measures. We show that the link prediction method proposed in this article results in better performance than the previous methods.
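The pipeline, a similarity score per past snapshot followed by a one-step forecast, can be sketched as follows. For brevity we substitute a least-squares linear trend for the ARIMA model the paper uses, and all function names are ours:

```python
import math

def adamic_adar(adj, u, v):
    """Adamic-Adar similarity in one snapshot (adj: node -> set of neighbors):
    sum over common neighbors z of 1 / log(deg(z))."""
    common = adj[u] & adj[v]
    return sum(1.0 / math.log(len(adj[z])) for z in common if len(adj[z]) > 1)

def linear_forecast(series):
    """One-step-ahead forecast via a least-squares line fit
    (a simple stand-in for an ARIMA forecaster)."""
    n = len(series)
    xbar, ybar = (n - 1) / 2, sum(series) / n
    denom = sum((x - xbar) ** 2 for x in range(n))
    slope = sum((x - xbar) * (y - ybar)
                for x, y in zip(range(n), series)) / denom
    return ybar + slope * (n - xbar)

def predict_link_score(snapshots, u, v):
    """Score a candidate link (u, v) from the trend of its
    similarity scores over the past network snapshots."""
    series = [adamic_adar(adj, u, v) for adj in snapshots]
    return linear_forecast(series)
```

Candidate pairs are then ranked by the forecast score, and the highest-scoring pairs are predicted as future links.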
international conference on pattern recognition | 2006
Yusuf Yaslan; Zehra Cataltepe
We examine the performance of different classifiers on different audio feature sets to determine the genre of a given music piece. For each classifier, we also evaluate the performance of feature sets obtained by dimensionality reduction methods. Finally, we experiment with increasing classification accuracy by combining different classifiers. Using a set of different classifiers, we first obtain a test genre classification accuracy of around 79.6 ± 4.2% on a 10-genre set of 1000 music pieces. This performance is better than the 71.1 ± 7.3% which is the best that has been reported on this data set. We also obtain 80% classification accuracy by using dimensionality reduction or combining different classifiers. We observe that the best feature set depends on the classifier used.
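Classifier combination of the simplest kind, majority voting over per-piece genre predictions, can be sketched as follows; the abstract does not specify the exact combination rule, so treat this as an illustration only:

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists (one list per classifier,
    one label per test piece) by majority vote; ties are broken in
    favor of the earliest-listed classifier."""
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        combined.append(next(l for l in labels if counts[l] == top))
    return combined
```

Voting only helps when the base classifiers are diverse, which matches the observation that the best feature set depends on the classifier used.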
international conference on data engineering | 2007
Zehra Cataltepe; Eser Aygün
k-nearest neighbor and centroid-based classification algorithms are frequently used in text classification due to their simplicity and performance. While the k-nearest neighbor algorithm usually performs well in terms of accuracy, it is slow in the recognition phase, because the distances/similarities between the new data point to be recognized and all the training data need to be computed. On the other hand, centroid-based classification algorithms are very fast, because only as many distance/similarity computations as the number of centroids (i.e. classes) need to be done. In this paper, we evaluate the performance of the centroid-based classification algorithm and compare it to nearest mean and nearest neighbor algorithms on 9 data sets. We propose and evaluate an improvement on the centroid-based classification algorithm. The proposed algorithm starts from the centroids of each class and increases the weight of misclassified training data points in the centroid computation until the validation error starts increasing. The weight increase is done based on the training confusion matrix entries for misclassified points. The proposed algorithm results in a smaller test error than the centroid-based classification algorithm in 7 out of 9 data sets. It is also better than the 10-nearest neighbor algorithm in 8 out of 9 data sets. We also evaluate different similarity metrics together with centroid and nearest neighbor algorithms. We find that, when Euclidean distance is turned into a similarity measure using division as opposed to exponentiation, Euclidean-based similarity can perform almost as well as cosine similarity.
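One simplified reading of the weight-update idea is sketched below. Unlike the paper, this sketch uses a fixed number of rounds and uniform weight increments rather than validation-error stopping and confusion-matrix-based increases, and all names are ours:

```python
def weighted_centroids(X, y, weights, classes):
    """Per-class centroids as weighted means of the training points."""
    cents = {}
    dim = len(X[0])
    for c in classes:
        idx = [i for i, yi in enumerate(y) if yi == c]
        tot = sum(weights[i] for i in idx)
        cents[c] = [sum(weights[i] * X[i][d] for i in idx) / tot
                    for d in range(dim)]
    return cents

def nearest_centroid(x, cents):
    """Label of the closest centroid by squared Euclidean distance."""
    def dist2(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(cents, key=lambda c: dist2(x, cents[c]))

def boost_misclassified(X, y, rounds=5, step=1.0):
    """Repeatedly raise the weight of misclassified training points
    in the centroid computation, pulling centroids toward them."""
    classes = sorted(set(y))
    weights = [1.0] * len(X)
    for _ in range(rounds):
        cents = weighted_centroids(X, y, weights, classes)
        wrong = [i for i, x in enumerate(X) if nearest_centroid(x, cents) != y[i]]
        if not wrong:
            break
        for i in wrong:
            weights[i] += step
    return weighted_centroids(X, y, weights, classes)
```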
Expert Systems With Applications | 2015
Hakan Gunduz; Zehra Cataltepe
Highlights:
- The direction of Borsa Istanbul 100 Index (BIST100) open prices is predicted.
- A feature selection method, called Balanced Mutual Information (BMI), is proposed.
- BMI is able to deal with the class imbalance problem through oversampling.
- BMI is compared with Mutual Information and Chi-square based feature selection.
- BMI achieves a higher macro-averaged F-measure than the other methods using fewer features.
In this paper, a novel method is proposed to predict the direction of Borsa Istanbul (BIST) 100 Index (BIST100) open prices using the news articles released, as well as the price data, from the day before. Although English news articles have been used for market prediction before, to the best of our knowledge, Turkish news articles together with prices have not yet been used to predict the Turkish markets. Turkish text mining techniques are applied on news articles to form feature vectors for each trading day. The feature vectors are assigned three labels based on the direction of the price change from the closing price of the day before and whether the change is significant. News articles are represented using high-dimensional features, some of which could be noisy or irrelevant for prediction. There is also the scarcity of training data. Therefore, this study incorporates feature selection methods to select features that could improve classification performance. By its nature, significant positive or negative changes in stock price happen much less often than non-significant changes, resulting in an imbalanced data set. Most feature selection methods in the literature aim to maximize classification accuracy. However, for imbalanced datasets, other measures, such as macro-averaged F-measure, need to be considered. The paper proposes a feature selection method that is able to deal with the class imbalance problem through oversampling of the minority classes and consideration of an ensemble of selected features.
In order to decide on the importance of features, the proposed methodology uses mutual information as the relevance criterion for each feature, since it can detect nonlinear dependencies between variables. Therefore, the proposed feature selection method is called the Balanced Mutual Information (BMI) feature selection method. Experiments were performed based on news articles provided by two different news sources: the Public Disclosure Platform of BIST and financial news websites. It was shown that, using the Balanced Mutual Information feature selection method, the significant changes in the BIST100 Index were predicted with an accuracy of 0.74 and a macro-averaged F-measure of 0.68. The BMI feature selection method was compared with Mutual Information and Chi-square based feature selection methods, and it was found that the BMI method results in higher performance using a smaller number of features.
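The oversample-then-score core of BMI, as we read it from the abstract, can be sketched as follows; the ensemble-of-selected-features step is omitted, and all function names are ours:

```python
import math
import random
from collections import Counter

def mi(xs, ys):
    """Mutual information (bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def balanced_mi_rank(features, labels, k, seed=0):
    """Oversample minority classes up to the majority-class size,
    then rank features by MI with the class on the balanced sample."""
    rng = random.Random(seed)
    by_class = {}
    for i, l in enumerate(labels):
        by_class.setdefault(l, []).append(i)
    target = max(len(v) for v in by_class.values())
    idx = []
    for members in by_class.values():
        idx += members + [rng.choice(members) for _ in range(target - len(members))]
    bal_labels = [labels[i] for i in idx]
    scores = [(mi([col[i] for i in idx], bal_labels), j)
              for j, col in enumerate(features)]
    return [j for _, j in sorted(scores, reverse=True)[:k]]
```

Without the balancing step, a feature that tracks a rare significant-change class contributes little to plain MI; oversampling lets such features score in proportion to their usefulness for the minority classes.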
signal processing and communications applications conference | 2007
Hakki Murat Genc; Zehra Cataltepe; T. C. Pearson
Dimensionality reduction algorithms help reduce the classification time and sometimes the classification error of a classifier (Yang, et al., 1997). For time-critical applications, in order to have a reduction in the feature acquisition phase, feature selection methods are preferable to dimensionality reduction methods, which require measurement of all inputs. Traditional feature selection methods, such as forward or backward feature selection, are costly to implement. In this study, we introduce a new feature selection method that decides on which features to retain based on how PCA (principal component analysis) or ICA (independent component analysis) (Hyvarinen and Oja, 1999) values those features. We compare the accuracy of our method to backward and forward feature selection with the same number of features selected, and to PCA and ICA using the same number of principal and independent components. For our experiments, we use spectral measurement data taken from corn kernels infested and not infested by fungi.
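Ranking features by their loadings on the first principal component is one way to let PCA "value" features. The sketch below illustrates that general idea via power iteration on the covariance matrix; it is not the authors' exact criterion, and the function name is ours:

```python
def pca_feature_ranking(X, k, iters=200):
    """Rank features by the magnitude of their loading on the first
    principal component, found by power iteration on the covariance."""
    n, d = len(X), len(X[0])
    # center the data
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # sample covariance matrix
    cov = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / (n - 1)
            for b in range(d)] for a in range(d)]
    # power iteration converges to the dominant eigenvector
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    order = sorted(range(d), key=lambda j: -abs(v[j]))
    return order[:k]
```

Unlike projecting onto the components, keeping the top-loading original features means only those features need to be measured at acquisition time, which is the point made above for time-critical applications.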