Hasan Bulut | Researchain

Publication

Featured researches published by Hasan Bulut.

Expert Systems With Applications | 2016

Ensemble of keyword extraction methods and classifiers in text classification

Aytuğ Onan; Serdar Korukoğlu; Hasan Bulut

Text classification is a domain with a high-dimensional feature space. Extracting keywords as features can be extremely useful in text classification. The study presents an empirical analysis of five statistical keyword extraction methods and a comprehensive analysis of classifier and keyword extraction ensembles. For the ACM collection, a classification accuracy of 93.80% is obtained with a Bagging ensemble of Random Forest.

Automatic keyword extraction is an important research direction in text mining, natural language processing and information retrieval. Keyword extraction enables us to represent text documents in a condensed way. The compact representation of documents can be helpful in several applications, such as automatic indexing, automatic summarization, automatic classification, clustering and filtering. For instance, text classification is a domain challenged by a high-dimensional feature space. Hence, extracting the words most relevant to the content of a document and using these keywords as features can be extremely useful. In this regard, this study examines the predictive performance of five statistical keyword extraction methods (most frequent measure based keyword extraction, term frequency-inverse sentence frequency based keyword extraction, co-occurrence statistical information based keyword extraction, eccentricity-based keyword extraction and the TextRank algorithm) on classification algorithms and ensemble methods for scientific text document classification (categorization). The study conducts a comprehensive comparison of base learning algorithms (Naive Bayes, support vector machines, logistic regression and Random Forest) with five widely used ensemble methods (AdaBoost, Bagging, Dagging, Random Subspace and Majority Voting). To the best of our knowledge, this is the first empirical analysis that evaluates the effectiveness of statistical keyword extraction methods in conjunction with ensemble learning algorithms. The classification schemes are compared in terms of classification accuracy, F-measure and area under curve values. To validate the empirical analysis, a two-way ANOVA test is employed. The experimental analysis indicates that a Bagging ensemble of Random Forest with the most frequent based keyword extraction method yields promising results for text classification. For the ACM document collection, the highest average predictive performance (93.80%) is obtained by combining the most frequent based keyword extraction method with a Bagging ensemble of Random Forest. In general, Bagging and Random Subspace ensembles of Random Forest yield promising results. The empirical analysis indicates that a keyword-based representation of text documents in conjunction with ensemble learning can enhance the predictive performance and scalability of text classification schemes, which is of practical importance in the application fields of text classification.
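
As a rough illustration of the simplest of the five methods named above, most frequent based keyword extraction can be sketched as counting tokens and keeping the top few. This is a minimal sketch under stated assumptions, not the authors' implementation: the stopword list, the tokenizer, and the names `most_frequent_keywords` and `keyword_features` are all hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; the paper's preprocessing is not described here).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "for", "on", "with"}


def most_frequent_keywords(text, top_n=3):
    """Return the top_n most frequent non-stopword tokens in text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def keyword_features(text, vocabulary, top_n=5):
    """Binary feature vector: 1 where a vocabulary term is among the extracted keywords."""
    extracted = set(most_frequent_keywords(text, top_n))
    return [int(term in extracted) for term in vocabulary]


doc = ("Keyword extraction condenses a document. "
       "Keyword features reduce the feature space of text classification.")
keywords = most_frequent_keywords(doc, top_n=2)
features = keyword_features(doc, ["keyword", "ant"], top_n=2)
```

The resulting low-dimensional vectors would then be passed to a classifier or ensemble; that step is omitted here because it depends on the learning library used.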

Expert Systems With Applications | 2016

A multiobjective weighted voting ensemble classifier based on differential evolution algorithm for text sentiment classification

Aytuğ Onan; Serdar Korukoğlu; Hasan Bulut

Typically performed by supervised machine learning algorithms, sentiment analysis is highly useful for extracting subjective information from text documents online. Most approaches that use ensemble learning paradigms toward sentiment analysis involve feature engineering in order to enhance the predictive performance. In response, we sought to develop a paradigm of a multiobjective, optimization-based weighted voting scheme to assign appropriate weight values to classifiers and each output class based on the predictive performance of classification algorithms, all to enhance the predictive performance of sentiment classification. The proposed ensemble method is based on static classifier selection involving majority voting error and forward search, as well as a multiobjective differential evolution algorithm. Based on the static classifier selection scheme, our proposed ensemble method incorporates Bayesian logistic regression, naive Bayes, linear discriminant analysis, logistic regression, and support vector machines as base learners, whose performance in terms of precision and recall values determines weight adjustment. Our experimental analysis of classification tasks, including sentiment analysis, software defect prediction, credit risk modeling, spam filtering, and semantic mapping, suggests that the proposed classification scheme can predict better than conventional ensemble learning methods such as AdaBoost, bagging, random subspace, and majority voting. Of all datasets examined, the laptop dataset showed the best classification accuracy (98.86%).
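
The voting step of a scheme that weights each classifier per output class can be sketched as follows. This is only a sketch of the final vote under stated assumptions: the weight values are made up for illustration (in the paper they come from the multiobjective differential evolution search, which is not reproduced here), and the name `weighted_vote` is hypothetical.

```python
def weighted_vote(predictions, weights, classes):
    """Pick the class with the largest sum of (classifier, class) weights.

    predictions: classifier name -> predicted class label
    weights:     (classifier name, class label) -> weight
    """
    scores = {label: 0.0 for label in classes}
    for clf, label in predictions.items():
        scores[label] += weights[(clf, label)]
    return max(classes, key=lambda label: scores[label])


classes = ["negative", "positive"]

# Hypothetical weights; a real system would derive them from precision and recall.
weights = {
    ("naive_bayes", "positive"): 0.6, ("naive_bayes", "negative"): 0.4,
    ("svm", "positive"): 1.0, ("svm", "negative"): 0.7,
    ("logreg", "positive"): 0.5, ("logreg", "negative"): 0.5,
}

predictions = {"naive_bayes": "negative", "svm": "positive", "logreg": "negative"}
winner = weighted_vote(predictions, weights, classes)
```

With these illustrative weights, the single strongly weighted vote outweighs the two weaker ones, which is the behavior that distinguishes weighted voting from plain majority voting.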

Journal of Information Science | 2017

An improved ant algorithm with LDA-based representation for text document clustering

Aytuğ Onan; Hasan Bulut; Serdar Korukoğlu

Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from the high dimensionality and irrelevancy of text features. Besides, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to the initial value. To tackle the problems of conventional clustering algorithms, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, where two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, the latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared to the conventional clustering algorithms using 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.

Information Processing and Management | 2017

A hybrid ensemble pruning approach based on consensus clustering and multi-objective evolutionary algorithm for sentiment classification

Aytuğ Onan; Serdar Korukoğlu; Hasan Bulut

Sentiment analysis is a critical task of extracting subjective information from online text documents. Ensemble learning can be employed to obtain more robust classification schemes. However, most approaches in the field incorporate feature engineering to build efficient sentiment classifiers. The purpose of our research is to establish an effective sentiment classification scheme by pursuing the paradigm of ensemble pruning. Ensemble pruning is a crucial method for building classifier ensembles with high predictive accuracy and efficiency. Previous studies employed exponential search, randomized search, sequential search, ranking-based pruning and clustering-based pruning. However, there are tradeoffs in selecting among the ensemble pruning methods. In this regard, hybrid ensemble pruning schemes can be more promising. In this study, we propose a hybrid ensemble pruning scheme based on clustering and randomized search for text sentiment classification. Furthermore, a consensus clustering scheme is presented to deal with the instability of clustering results. The classifiers of the ensemble are initially clustered into groups according to their predictive characteristics. Then, two classifiers from each cluster are selected as candidate classifiers based on their pairwise diversity. The search space of candidate classifiers is explored by the elitist Pareto-based multi-objective evolutionary algorithm. For the evaluation task, the proposed scheme is tested on twelve balanced and unbalanced benchmark text classification tasks. In addition, the proposed approach is experimentally compared with three ensemble methods (AdaBoost, Bagging and Random Subspace) and three ensemble pruning algorithms (ensemble selection from libraries of models, Bagging ensemble selection and the LibD3C algorithm). Results demonstrate that consensus clustering and the elitist Pareto-based multi-objective evolutionary algorithm can be effectively used in ensemble pruning. The experimental analysis with conventional ensemble methods and pruning algorithms indicates the validity and effectiveness of the proposed scheme.
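
The step of selecting two classifiers per cluster by pairwise diversity can be sketched as below. The abstract does not say which diversity measure is used, so this sketch assumes the common disagreement measure (the fraction of instances on which two classifiers differ); the names `disagreement` and `most_diverse_pair` and the toy predictions are hypothetical.

```python
from itertools import combinations


def disagreement(preds_a, preds_b):
    """Fraction of instances on which two classifiers predict different labels."""
    differing = sum(a != b for a, b in zip(preds_a, preds_b))
    return differing / len(preds_a)


def most_diverse_pair(cluster):
    """Return the pair of classifier names with the highest disagreement.

    cluster: classifier name -> list of predictions on a shared validation set
    """
    return max(
        combinations(sorted(cluster), 2),
        key=lambda pair: disagreement(cluster[pair[0]], cluster[pair[1]]),
    )


# Toy predictions of three classifiers assumed to belong to one cluster.
cluster = {
    "clf_a": [1, 1, 0, 0, 1],
    "clf_b": [1, 1, 0, 1, 1],
    "clf_c": [0, 1, 1, 1, 1],
}
pair = most_diverse_pair(cluster)
```

The pairs chosen from every cluster would then form the candidate pool that the evolutionary search explores.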

Computer Standards & Interfaces | 2017

A hierarchical P2P clustering framework for video streaming systems

Sercan Demirci; Asil Yardimci; Muge Sayit; E. Turhan Tunali; Hasan Bulut

In this study, a novel overlay architecture for constructing hierarchical and scalable clustering of Peer-to-Peer (P2P) networks is proposed. The proposed architecture attempts to enhance the clustering of peers by incorporating join, split, merge and cluster leader election mechanisms in a fully distributed manner. It takes the delay proximity of peers into account as the distance measure. By constructing a hierarchical clustering of peers, the control message overhead and maintenance overhead, such as that of host departure and host join, are decreased. The overheads of the proposed system are compared theoretically with those of other systems from the literature. The control mechanism for dynamic peer behavior of the architecture is tested over PlanetLab. The performance metrics used are end-to-end delay, diameter, cluster head distance, occupancy rate, peer join latency, accuracy and correctness. The test results are compared with the Hierarchical Ring Tree (HRT) and mOverlay architectures. In addition, a P2P video streaming application is run over the proposed network overlay. Streaming tests show that video streaming applications perform well in terms of received video quality if hierarchical clusters that consider delay proximity are used as the underlying network architecture.

Intelligent Automation and Soft Computing | 2017

The Effect of Neighborhood Selection on Collaborative Filtering and a Novel Hybrid Algorithm

Musa Milli; Hasan Bulut

Recommender systems are widely used in industry and remain an active research area in academia. For many businesses, they have become indispensable tools. Producing accurate results with such systems is important for business operations. For this reason, various algorithms and approaches have been developed for recommender systems to increase the prediction accuracy. Collaborative filtering is one of the most successful approaches. In collaborative filtering, in order to predict more accurately, it is recommended to determine the user’s active neighbors. The k-nearest neighbor (k-NN) algorithm is one of the most widely used neighbor selection algorithms. However, the k-NN algorithm uses a fixed k value, which reduces the accuracy of the prediction. In this paper, we present two novel approaches to increase the prediction accuracy of recommender systems; k%-nearest neighbor (k%-NN) algorithm to determine the appropriate k value for a user and a hybrid algorithm that combines a collaborative...

International Conference on Management of Data | 2013

Overcoming limitations of term-based partitioning for distributed RDFS reasoning

Tugba Kulahcioglu; Hasan Bulut

RDFS reasoning is carried out via joint terms of triples; accordingly, a distributed reasoning approach should bring together triples that have terms in common. To achieve this, term-based partitioning distributes triples to partitions based on the terms they include. However, the skewed distribution of Semantic Web data results in unbalanced load distribution. A single peer should be able to handle even the largest partition, and this requirement limits scalability. This approach also suffers from data replication, since a triple is sent to multiple partitions. In this paper, we propose a two-step method to overcome the above limitations. Our RDFS-specific term-based partitioning algorithm applies a selective distribution policy and distributes triples with minimum replication. Our schema-sensitive processing approach eliminates non-productive partitions and enables processing of a partition regardless of its size. The resulting partitions reach full closure without repeating the global schema and without fix-point iteration as suggested by previous studies.
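
The baseline term-based partitioning that this abstract criticizes can be sketched as follows, to make the replication and skew problems concrete. This is a minimal sketch of the baseline only, not the authors' RDFS-specific algorithm; the hash-based assignment via `zlib.crc32`, the function names, and the example triples are assumptions made for illustration.

```python
import zlib
from collections import defaultdict


def partition_of(term, num_partitions):
    """Map a term to a partition with a deterministic hash (an illustrative choice)."""
    return zlib.crc32(term.encode("utf-8")) % num_partitions


def term_based_partition(triples, num_partitions):
    """Send every triple to the partition of each term it contains."""
    partitions = defaultdict(set)
    for triple in triples:
        for term in triple:
            partitions[partition_of(term, num_partitions)].add(triple)
    return partitions


# Hypothetical triples; "ex:Student" appears in all of them.
triples = [
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:bob", "rdf:type", "ex:Student"),
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
]

partitions = term_based_partition(triples, num_partitions=4)

# Average number of partitions each triple was copied to.
copies = sum(len(members) for members in partitions.values())
replication_factor = copies / len(triples)

# Triples sharing a term always meet in that term's partition.
student_partition = partitions[partition_of("ex:Student", 4)]
```

Every triple that mentions a popular term lands in the same partition, which is how skewed data produces one oversized partition, while each triple may also be stored several times.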

Signal Processing and Communications Applications Conference | 2011

Multicast tree based video streaming application over PlanetLab

Muge Sayit; Sercan Demirci; Yagiz Kaymak; Hasan Bulut; E. Turhan Tunali

In this paper, a video streaming system running on a multicast tree for peer-to-peer networks is presented, and a dynamic tree maintenance algorithm that handles node failures during a streaming session is proposed. The proposed algorithm is tested on PlanetLab nodes located in different countries across the Internet. The results show that the proposed algorithm can be used in latency-sensitive live video streaming applications running on peer-to-peer networks.

Wireless Networks | 2018

Fireworks: an intelligent location discovery algorithm for vehicular ad hoc networks

Ilker Basaran; Hasan Bulut

Searching for and locating a certain destination in a vehicular ad hoc network (VANET) are fundamental issues in ensuring routing and data dissemination under high mobility and a lack of fixed infrastructure. However, naive-flooding search is too expensive and consumes a considerable amount of valuable bandwidth in the network. To overcome this, the GPS information of the vehicles can be exploited to aid searching and routing in VANETs. In this paper, we present a novel position-based searching algorithm, called Fireworks, that can be used as a location discovery algorithm in VANETs. The proposed scheme is purely reactive and makes limited use of beacons. The Fireworks algorithm provides the position of the destination vehicle without a Location Information System infrastructure or a proactive mechanism. We show that the method is efficient and reliable while greatly reducing the searching overhead. The simulations show that the algorithm covers as many nodes as naive flooding with less than one-fifth of its broadcast messages, and with less than one-third of the broadcast messages of Dynamic Source Routing (DSR). It also performs better than the Acknowledgement-Based Broadcast Protocol (ABSM) in terms of total number of broadcast messages, node coverage speed and query success rate.

Gazi Üniversitesi Mühendislik-Mimarlık Fakültesi Dergisi | 2018

New initialization approaches for K-means and PSO clustering algorithms

Sinem Çınaroğlu; Hasan Bulut

Today, microarray technology makes it possible to measure the different expression levels of genes simultaneously. Representing the hidden information within genes makes them easier to understand; however, the large number of genes and the high level of noise in the datasets make gene data difficult to interpret. For this reason, clustering is used to make genes easier to understand. Microarray data are among the best examples of high-dimensional data. To cluster high-dimensional data, this study proposes new methods for selecting the initial cluster centers of the standard K-means and PSO clustering algorithms. In addition, the coreset approach is adapted to the PSO algorithm. The accuracy of the developed methods was tested on datasets frequently used in the literature, and the approaches were run on the Colon Cancer microarray dataset. The developed approaches were compared with the baseline standard K-means and PSO clustering methods; their performance was evaluated using the average number of iterations needed to reach a solution and the Rand and Silhouette indices. In the experimental studies, the developed approaches were observed to give successful results on normalized datasets to which feature selection had been applied.
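
One of the evaluation measures mentioned in this abstract, the Rand index, can be sketched in a few lines. This is the standard definition (the share of instance pairs on which two labelings agree about being in the same or in different clusters), not code from the paper; the function name `rand_index` and the toy labelings are assumptions.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of instance pairs grouped consistently by both labelings."""
    pairs = list(combinations(range(len(labels_true)), 2))
    agreements = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


truth = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]   # same grouping, different cluster ids
partial = [0, 0, 0, 1]     # one instance placed in the wrong cluster

score_relabeled = rand_index(truth, relabeled)
score_partial = rand_index(truth, partial)
```

Because the index compares pairs rather than label values, renaming the clusters does not change the score.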
