Derya Birant
Dokuz Eylül University
Publications
Featured research published by Derya Birant.
Data and Knowledge Engineering | 2007
Derya Birant; Alp Kut
This paper presents a new density-based clustering algorithm, ST-DBSCAN, which is based on DBSCAN. We propose three marginal extensions to DBSCAN related to the identification of (i) core objects, (ii) noise objects, and (iii) adjacent clusters. In contrast to existing density-based clustering algorithms, our algorithm is able to discover clusters according to the non-spatial, spatial and temporal values of the objects. In this paper, we also present a spatial-temporal data warehouse system designed for storing and clustering a wide range of spatial-temporal data. We show an implementation of our algorithm using this data warehouse and present the data mining results.
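The core idea summarised above, density-based clustering with separate thresholds for spatial and non-spatial/temporal attributes, can be sketched as follows. This is a simplified illustration based only on the abstract, not the authors' published implementation; the column layout (first two columns spatial, remaining columns non-spatial/temporal) and the parameter names eps1 and eps2 are assumptions.

```python
import numpy as np

def retrieve_neighbors(i, data, eps1, eps2):
    """Indices of points within the spatial radius eps1 AND the
    non-spatial/temporal radius eps2 of point i (hypothetical layout:
    columns 0-1 = spatial coordinates, remaining columns = other values)."""
    spatial = np.linalg.norm(data[:, :2] - data[i, :2], axis=1)
    non_spatial = np.linalg.norm(data[:, 2:] - data[i, 2:], axis=1)
    return np.where((spatial <= eps1) & (non_spatial <= eps2))[0]

def st_dbscan_sketch(data, eps1, eps2, min_pts):
    """Label each row of `data` with a cluster id; -1 marks noise."""
    labels = np.zeros(len(data), dtype=int)      # 0 = unvisited
    cluster_id = 0
    for i in range(len(data)):
        if labels[i] != 0:
            continue
        neighbors = retrieve_neighbors(i, data, eps1, eps2)
        if len(neighbors) < min_pts:
            labels[i] = -1                        # noise (may become a border point later)
            continue
        cluster_id += 1
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] <= 0:                    # unvisited or previously marked noise
                labels[j] = cluster_id
                j_neighbors = retrieve_neighbors(j, data, eps1, eps2)
                if len(j_neighbors) >= min_pts:   # expand only from core objects
                    seeds.extend(j_neighbors)
    return labels
```

The only change relative to plain DBSCAN in this sketch is the neighbourhood test, which requires points to be close under both distance measures before they count towards the min_pts density threshold.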
Information Technology Interfaces | 2006
Derya Birant; Alp Kut
Outlier detection is one of the major data mining methods. This paper proposes a three-step approach to detect spatio-temporal outliers in large databases. These steps are clustering, checking spatial neighbors, and checking temporal neighbors. In this paper, we introduce a new outlier detection algorithm to find small groups of data objects that are exceptional when compared with the rest of the data. In contrast to existing outlier detection algorithms, the new algorithm is able to discover outliers according to the non-spatial, spatial and temporal values of the objects. To demonstrate the new algorithm, this paper also presents an example application using a data warehouse.
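The three-step procedure can be illustrated with a short sketch. The clustering algorithm, the neighbourhood definitions and the deviation threshold k below are placeholders chosen for the example; the paper's actual rules differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def st_outliers_sketch(coords, times, values, n_clusters=5, k=3.0, t_window=1):
    """Hypothetical three-step check: (1) cluster the objects by location,
    (2) flag objects whose value deviates from their cluster (spatial
    neighbours), (3) confirm the flag against temporal neighbours."""
    coords, times, values = map(np.asarray, (coords, times, values))
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(coords)   # step 1
    outliers = []
    for i in range(len(values)):
        spatial_vals = values[labels == labels[i]]                          # step 2
        temporal_vals = values[np.abs(times - times[i]) <= t_window]        # step 3
        if (abs(values[i] - spatial_vals.mean()) > k * (spatial_vals.std() + 1e-9)
                and abs(values[i] - temporal_vals.mean()) > k * (temporal_vals.std() + 1e-9)):
            outliers.append(i)
    return outliers
```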
Expert Systems with Applications | 2011
Gözde Bakırlı; Derya Birant; Alp Kut
Traditionally, data mining tasks such as classification and clustering are performed on data warehouses. Usually, updates are collected and applied to the data warehouse at frequent time intervals. For this reason, all patterns derived from the data warehouse have to be updated frequently as well. Due to the very large volumes of data, it is highly desirable to perform these updates incrementally. This study proposes a new incremental genetic algorithm for classification to efficiently handle new transactions. It presents the comparison results of a traditional genetic algorithm and the incremental genetic algorithm for classification. Experimental results show that our incremental genetic algorithm considerably decreases the training time needed to construct a new classifier on the new dataset. This study also includes a sensitivity analysis of the incremental genetic algorithm parameters such as crossover probability, mutation probability, elitism and population size. In this analysis, many specific models were created using the same training dataset but with different parameter values, and then the performances of the models were compared.
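The incremental idea, reusing the previously evolved population as the seed when new transactions arrive instead of restarting from scratch, can be sketched as follows. The chromosome encoding (a linear-threshold classifier), the genetic operators and all parameter values are illustrative assumptions, not the encoding used in the paper.

```python
import numpy as np

def fitness(w, X, y):
    """Accuracy of a linear-threshold classifier encoded by chromosome w
    (weights + bias); this encoding is an assumption of the sketch."""
    preds = (X @ w[:-1] + w[-1] > 0).astype(int)
    return (preds == y).mean()

def evolve(population, X, y, generations=50, cx_prob=0.8, mut_prob=0.05, elite=2):
    """Standard generational GA: selection from the fitter half,
    uniform crossover, Gaussian mutation, elitism."""
    rng = np.random.default_rng(0)
    for _ in range(generations):
        population = sorted(population, key=lambda w: fitness(w, X, y), reverse=True)
        next_gen = population[:elite]                        # elitism
        while len(next_gen) < len(population):
            p1, p2 = rng.choice(len(population) // 2, size=2, replace=False)
            if rng.random() < cx_prob:                       # uniform crossover
                mask = rng.random(population[0].shape) < 0.5
                child = np.where(mask, population[p1], population[p2])
            else:
                child = population[p1].copy()
            if rng.random() < mut_prob:                      # Gaussian mutation
                child = child + rng.normal(0, 0.1, size=child.shape)
            next_gen.append(child)
        population = next_gen
    return population

# Incremental use: when new transactions arrive, start from the previously
# evolved population instead of a random one, so far fewer generations are
# needed to adapt the classifier to the updated training data.
```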
Information Systems | 2015
Elem Guzel Kalayci; Tahir Emre Kalayci; Derya Birant
Processing the excessive volumes of information on the Web is an important issue. The Semantic Web paradigm has been proposed as the solution. However, this approach generates several challenges, such as query processing and optimisation. This paper proposes a novel approach for optimising SPARQL queries with different graph shapes. This new method reorders the triple patterns using Ant Colony Optimisation (ACO) algorithms. Reordering the triple patterns is a way of decreasing the execution times of the SPARQL queries. The proposed approach is focused on in-memory models of RDF data, and it optimises the SPARQL queries by means of the Ant System, Elitist Ant System and MAX-MIN Ant System algorithms. The approach is implemented in the Apache Jena ARQ query engine, which is used for the experimentation, and the new method is compared with Normal Execution, the Jena Reorder Algorithms, and the Stocker et al. Algorithms. All of the experiments are performed using the LUBM dataset for various shapes of queries, such as chain, star, cyclic, and chain-star. The first contribution is the real-time optimisation of SPARQL query triple pattern orders using ACO algorithms, and the second contribution is the concrete implementation for the ARQ query engine, which is a component of the widely used Semantic Web framework Apache Jena. The experiments demonstrate that the proposed method reduces the execution time of the queries significantly.
Highlights: An approach for optimising SPARQL SELECT queries with different graph shapes and different numbers of triple patterns. Ant Colony Optimisation algorithms are used to optimise the queries. The approach is implemented in the Apache Jena ARQ query engine. Experiments are performed using the LUBM dataset for various shapes of queries. The experiments demonstrate that the proposed method reduces the execution time of the queries significantly.
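A minimal Ant System sketch of the triple-pattern reordering step is shown below. The cost function (assumed to return an estimated execution cost for a complete ordering, e.g. derived from selectivity statistics) and the heuristic matrix are placeholders; the Elitist and MAX-MIN variants and the ARQ integration described in the paper are not reproduced here.

```python
import random

def aco_reorder(patterns, cost, n_ants=20, n_iter=50, alpha=1.0, beta=2.0, rho=0.1):
    """Reorder a list of SPARQL triple patterns with a basic Ant System.
    `cost(order)` is assumed to estimate the execution cost of an ordering."""
    n = len(patterns)
    tau = [[1.0] * n for _ in range(n)]          # pheromone on transition i -> j
    eta = [[1.0] * n for _ in range(n)]          # heuristic desirability (placeholder)
    best_order, best_cost = list(range(n)), cost(list(range(n)))

    for _ in range(n_iter):
        for _ant in range(n_ants):
            order = [random.randrange(n)]
            while len(order) < n:                # build a complete ordering
                i = order[-1]
                choices = [j for j in range(n) if j not in order]
                weights = [(tau[i][j] ** alpha) * (eta[i][j] ** beta) for j in choices]
                order.append(random.choices(choices, weights=weights)[0])
            c = cost(order)
            if c < best_cost:
                best_order, best_cost = order, c
        # evaporation plus reinforcement along the best ordering found so far
        for i in range(n):
            for j in range(n):
                tau[i][j] *= (1 - rho)
        for i, j in zip(best_order, best_order[1:]):
            tau[i][j] += 1.0 / (best_cost + 1e-9)
    return [patterns[i] for i in best_order]
```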
International Conference on Web Services | 2004
Alp Kut; Derya Birant
This paper presents a model which combines the processing power of parallel computation with the ease of Web service usage. In this model, a parallel programming environment can be embedded in a visual environment. Parallelization of Web services is provided by using multithreading technology with dataset parameters. This work also provides parallel usage of computers located in different places via a wide area network such as the Internet.
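The multithreaded invocation pattern described above can be sketched as follows; call_service is a placeholder for whatever web-service call the application actually makes, and the round-robin assignment of dataset chunks to endpoints is an assumption of this sketch.

```python
from concurrent.futures import ThreadPoolExecutor

def call_service(endpoint, chunk):
    """Placeholder for a web-service call that processes one partition of
    the dataset; the endpoint and payload format are assumptions."""
    ...

def parallel_invoke(endpoints, dataset, n_parts):
    """Split the dataset and invoke (possibly remote) service instances
    concurrently, one thread per partition."""
    chunks = [dataset[i::n_parts] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        futures = [pool.submit(call_service, endpoints[i % len(endpoints)], chunks[i])
                   for i in range(n_parts)]
        return [f.result() for f in futures]
```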
International Symposium on Innovations in Intelligent Systems and Applications | 2014
Pelin Yildirim; Derya Birant
In data mining, when using the Naive Bayes classification technique, it is necessary to overcome the problem of how to deal with continuous attributes. Most previous work has solved the problem by using discretization, the normal method, or the kernel method. This study proposes the usage of different continuous probability distributions for Naive Bayes classification. It explores various probability density functions of distributions. The experimental results show that the proposed probability distributions also classify continuous data with potentially high accuracy. In addition, this paper introduces a novel method, named NBC4D, which offers a new approach to classification by applying different distribution types to different attributes. The results (classification accuracy rates) show that our proposed method (the usage of more than one distribution type) performs well on real-world datasets when compared with the usage of only one well-known distribution type.
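The idea of fitting a different continuous distribution to each attribute can be sketched with SciPy. The per-attribute choice of distribution family in dists is left to the user, and the sketch is an interpretation of the abstract rather than the published NBC4D implementation.

```python
import numpy as np
from scipy import stats

def fit_nbc4d_sketch(X, y, dists):
    """Naive Bayes where each continuous attribute j uses its own
    distribution family dists[j] (e.g. [stats.norm, stats.gamma, ...])."""
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        model[c] = {
            "prior": np.log(len(Xc) / len(X)),
            "params": [dist.fit(Xc[:, j]) for j, dist in enumerate(dists)],
        }
    return model

def predict_nbc4d_sketch(model, X, dists):
    """Pick the class with the highest log-prior plus summed log-likelihood."""
    preds = []
    for x in X:
        scores = {}
        for c, m in model.items():
            loglik = sum(dists[j].logpdf(x[j], *m["params"][j]) for j in range(len(dists)))
            scores[c] = m["prior"] + loglik
        preds.append(max(scores, key=scores.get))
    return np.array(preds)
```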
Machine Learning and Data Mining in Pattern Recognition | 2013
Yunus Dogan; Derya Birant; Alp Kut
Data clustering is an important and widely used data mining task that groups similar items together into subsets. This paper introduces a new clustering algorithm, SOM++, which first uses the K-Means++ method to determine the initial weight values and starting points, and then uses a Self-Organizing Map (SOM) to find the final clustering solution. The purpose of this algorithm is to provide a useful technique to improve the solution of data clustering and data mining in terms of runtime, the rate of unstable data points and internal error. This paper also presents a comparison of our algorithm with simple SOM and K-Means + SOM using real-world data. The results show that SOM++ has good stability and significantly outperforms the other methods in training time.
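The seeding idea, initialising the SOM weight grid with K-Means++ centres instead of random vectors, can be sketched as follows. The training loop is a deliberately simplified SOM update; the grid size, learning rate and neighbourhood decay are illustrative values, not the ones from the paper.

```python
import numpy as np
from sklearn.cluster import kmeans_plusplus

def som_plus_plus_sketch(data, grid_w, grid_h, n_iter=1000, lr=0.5, sigma=1.0):
    """Seed a (grid_w x grid_h) SOM with K-Means++ centres, then run a
    basic online SOM training loop."""
    centers, _ = kmeans_plusplus(data, n_clusters=grid_w * grid_h)
    weights = centers.reshape(grid_w, grid_h, data.shape[1]).astype(float)
    grid = np.dstack(np.meshgrid(np.arange(grid_w), np.arange(grid_h), indexing="ij"))

    for t in range(n_iter):
        x = data[np.random.randint(len(data))]
        # best matching unit = node whose weight vector is closest to x
        dists = np.linalg.norm(weights - x, axis=2)
        bmu = np.unravel_index(np.argmin(dists), dists.shape)
        # neighbourhood-weighted update, with learning rate and radius decaying over time
        decay = np.exp(-t / n_iter)
        influence = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2)
                           / (2 * (sigma * decay) ** 2))
        weights += lr * decay * influence[..., None] * (x - weights)
    return weights
```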
Euro American Conference on Telematics and Information Systems | 2008
Arben Hajra; Derya Birant; Alp Kut
The main focus of this paper is to use web-based services, data mining techniques and mobile technologies to improve Quality Assurance (QA) in education. This paper presents rather sophisticated web-based tools and services dedicated to QA in education. It proposes a model for efficiently building the key elements of the QA follow-up: surveys, questionnaires, the visualization of the obtained results, reporting and further usage of the obtained data. It also presents some practical applications to demonstrate the model's capabilities.
Wiley Interdisciplinary Reviews-Data Mining and Knowledge Discovery | 2018
Pelin Yildirim; Derya Birant; Tuba Alpyildiz
Data mining has been proven useful for knowledge discovery in many areas, ranging from marketing to medicine and from banking to education. This study focuses on data mining and machine learning in the textile industry, as applying them to textile data is considered an emerging interdisciplinary research field. Thus, data mining studies, including classification and clustering techniques and machine learning algorithms, implemented in the textile industry are presented and explained in detail in this study to provide an overview of how clustering and classification techniques can be applied in the textile industry to deal with different problems where traditional methods are not useful. This article clearly shows that classification techniques have received more interest than clustering techniques in the textile industry. It also shows that the most commonly applied classification methods are artificial neural networks and support vector machines, and that they generally provide high accuracy rates in textile applications. For the clustering task of data mining, the K-means algorithm was the most commonly implemented among the textile studies investigated in this article. We conclude with some remarks on the strengths of data mining techniques for the textile industry, ways to overcome certain challenges, and some possible further research directions. WIREs Data Mining Knowl Discov 2018, 8:e1228. doi: 10.1002/widm.1228
Scientific Programming | 2018
Yunus Dogan; Feriştah Dalkılıç; Derya Birant; Recep Alp Kut; Reyat Yilmaz
The dimensionality reduction and visualization problems associated with multivariate centroids obtained by clustering algorithms are addressed in this paper. Two approaches are used in the literature for the solution of such problems: the self-organizing map (SOM) approach and manually mapping two selected features (MS2Fs). In addition, principal component analysis (PCA) was evaluated as a component for solving this problem on supervised datasets. Each of these traditional approaches has drawbacks: if SOM runs with a small map size, all centroids are located contiguously rather than at their original distances according to the high-dimensional structure; MS2Fs is not an efficient method because it does not take the remaining features into account; and lastly, PCA is a supervised method and loses the most valuable feature. In this study, five novel hybrid approaches were proposed to eliminate these drawbacks by using the quantum genetic algorithm (QGA) method and four feature selection methods: Pearson's correlation, gain ratio, information gain, and relief. Experimental results demonstrate that, for 14 datasets of different sizes, the prediction accuracy of the proposed weighted clustering approaches is higher than that of the traditional K-means++ clustering approach. Furthermore, the proposed approach combining K-means++ and QGA shows the most efficient placements of the centroids on a two-dimensional map for all the test datasets.
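A simplified sketch of the feature-weighting step is shown below. The feature scores are approximated with mutual information from scikit-learn rather than the Pearson/gain ratio/information gain/relief scores used in the paper, and the quantum genetic algorithm search for the weights is not reproduced; the sketch only illustrates how per-feature scores can bias a K-means++ clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import mutual_info_classif

def weighted_kmeanspp_sketch(X, y, n_clusters):
    """Score each feature, scale the data by the normalised scores, then
    run K-means++ on the weighted representation."""
    scores = mutual_info_classif(X, y)            # placeholder for the paper's scores
    weights = scores / (scores.sum() + 1e-12)
    Xw = X * weights                               # feature-weighted representation
    km = KMeans(n_clusters=n_clusters, init="k-means++", n_init=10).fit(Xw)
    return km.labels_, km.cluster_centers_, weights
```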