Network


Latest external collaborations at the country level. Dive into the details by clicking on the dots.

Hotspot


Dive into the research topics where Galip Aydin is active.

Publication


Featured research published by Galip Aydin.


Journal of Sensors | 2015

Architecture and Implementation of a Scalable Sensor Data Storage and Analysis System Using Cloud Computing and Big Data Technologies

Galip Aydin; Ibrahim Riza Hallac; Betul Karakus

Sensors are becoming ubiquitous. From almost any type of industrial application to intelligent vehicles, smart city applications, and healthcare applications, we see steady growth in the use of various types of sensors. The amount of data produced by these sensors grows even more dramatically, since sensors usually produce data continuously. It is therefore crucial for these data to be stored for future reference and analyzed to uncover valuable information, such as fault diagnosis information. In this paper we describe a scalable and distributed architecture for sensor data collection, storage, and analysis. The system uses several open source technologies and runs on a cluster of virtual servers. We use GPS sensors as the data source and run machine learning algorithms for data analysis.
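The abstract describes analyzing stored GPS readings with machine-learning algorithms. As a minimal, hypothetical sketch (not the paper's actual pipeline), one basic analysis is deriving vehicle speed from consecutive GPS fixes via the haversine formula:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two GPS coordinates."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def speeds_kmh(fixes):
    """Derive speed (km/h) between consecutive (timestamp_s, lat, lon) fixes."""
    out = []
    for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
        dist = haversine_km(la0, lo0, la1, lo1)
        out.append(dist / ((t1 - t0) / 3600.0))
    return out
```

Features like these speed series could then feed the kind of machine-learning analysis the paper runs at scale.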


Concurrency and Computation: Practice and Experience | 2018

Evaluating deep learning models for sentiment classification

Betul Karakus; Muhammed Talo; Ibrahim Riza Hallac; Galip Aydin

Deep learning has emerged as an effective solution to various text mining problems such as document classification and clustering, document summarization, web mining, and sentiment analysis. In this paper, we describe our work on investigating several deep learning models for a binary sentiment classification problem. We used movie reviews in Turkish from the website www.beyazperde.com to train and test the deep learning models. We also report a detailed comparison of the models in terms of accuracy and time performance. The two major deep learning architectures used in this study are Convolutional Neural Networks and Long Short-Term Memory. We built several variants of these models by changing the number of layers, tuning the hyper-parameters, and combining models. Additionally, word embeddings were created by applying the word2vec algorithm with a skip-gram model on a large dataset (∼13 M words) composed of movie reviews. We investigate the effect of using these pre-trained word embeddings with the models. Experimental results show that the use of word embeddings with deep neural networks effectively yields performance improvements in terms of run time and accuracy.
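A skip-gram word2vec model is trained on (target, context) pairs drawn from a sliding window over the text. A minimal sketch of that pair generation, illustrative only (the study applied an existing word2vec implementation to ∼13 M words of reviews; the tokens below are made-up examples):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (target, context) training pairs as in word2vec's skip-gram model."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the target word is not its own context
                pairs.append((target, tokens[j]))
    return pairs
```

Each pair becomes one training example: the model learns embeddings by predicting the context word from the target word.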


international symposium on innovations in intelligent systems and applications | 2012

Gabor wavelet and unsupervised Fuzzy C-means clustering for edge detection of medical images

Burhan Ergen; Ahmet Çinar; Galip Aydin

It is well known that the Gabor wavelet transform (GWT) provides directional information for the analysis of an image. In this paper, we propose an approach based on the GWT combined with unsupervised Fuzzy C-means (FCM) clustering, which plays an important role in recognition as a classifier. After enhancing the edges of the input image using the GWT, a binary image showing the edges is obtained using FCM clustering and morphological skeletonization. Compared to the Canny method and other conventional methods, the proposed method shows better detection accuracy for noisy medical images.
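Fuzzy C-means assigns each point a graded membership to every cluster and alternates between membership and center updates. A minimal one-dimensional sketch on scalar values, standing in for pixel intensities (illustrative only; the paper applies FCM to GWT-enhanced images, and this assumes at least two clusters):

```python
def fuzzy_cmeans_1d(data, c=2, m=2.0, iters=50):
    """Unsupervised Fuzzy C-means on scalar values (e.g. pixel intensities), c >= 2."""
    lo, hi = min(data), max(data)
    centers = [lo + (hi - lo) * i / (c - 1) for i in range(c)]  # spread initial centers
    u = []
    for _ in range(iters):
        # membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        u = []
        for x in data:
            d = [abs(x - v) or 1e-12 for v in centers]  # clamp zero distances
            u.append([1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1)) for j in range(c))
                      for i in range(c)])
        # center update: v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        centers = [sum(u[k][i] ** m * data[k] for k in range(len(data))) /
                   sum(u[k][i] ** m for k in range(len(data)))
                   for i in range(c)]
    return sorted(centers), u
```

Thresholding the memberships (e.g. assigning each pixel to its highest-membership cluster) would yield the kind of binary edge image the abstract mentions.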


arXiv: Distributed, Parallel, and Cluster Computing | 2017

Classification of scientific papers with big data technologies

Selen Gurbuz; Galip Aydin

Data sizes that cannot be processed by conventional data storage and analysis systems are referred to as Big Data. The term also refers to the new technologies developed to store, process, and analyze large amounts of data. Automatic information retrieval about the contents of large numbers of documents produced by different sources, identifying research fields and topics, extracting document abstracts, and discovering patterns are some of the topics studied in the field of big data. In this study, the Naïve Bayes classification algorithm is run on a data set consisting of scientific articles to automatically determine the classes to which these documents belong. We have developed an efficient system that can analyze Turkish scientific documents with a distributed document classification algorithm run on a Cloud Computing infrastructure. The Apache Mahout library is used in the study. The servers required for the distributed classification and clustering of documents are provisioned on this infrastructure.
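A minimal multinomial Naïve Bayes classifier with Laplace smoothing, sketched in plain Python (the study itself used Apache Mahout's distributed implementation; the tokens and labels below are hypothetical):

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """Train multinomial Naive Bayes; docs is a list of (tokens, label) pairs."""
    word_counts = defaultdict(Counter)  # per-class word frequencies
    class_counts = Counter()            # per-class document counts
    vocab = set()
    for tokens, label in docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def classify_nb(tokens, model):
    """Pick the class with the highest log-posterior under the trained model."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_lp = None, float("-inf")
    for label in class_counts:
        lp = math.log(class_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for t in tokens:
            # Laplace smoothing avoids zero probabilities for unseen words
            lp += math.log((word_counts[label][t] + 1) / (total_words + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best
```

Mahout distributes exactly this kind of count aggregation over a cluster; the per-document scoring stays the same.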


arXiv: Computation and Language | 2017

Preparation of Improved Turkish DataSet for Sentiment Analysis in Social Media

Semiha Makinist; Ibrahim Riza Hallac; Betul Karakus; Galip Aydin

A public dataset with a variety of properties suitable for sentiment analysis [1], event prediction, trend detection, and other text mining applications is needed in order to successfully perform analysis studies. The vast majority of data on social media is text-based, and it is not possible to apply machine learning directly to these raw data, since several preprocessing steps are required before the algorithms can be run. For example, different misspellings of the same word enlarge the word vector space unnecessarily, which reduces the success of the algorithms and increases the computational power required. This paper presents an improved Turkish dataset built with an effective spelling correction algorithm based on Hadoop [2]. The collected data are recorded on the Hadoop Distributed File System, and the text-based data are processed with the MapReduce programming model. This method is suitable for the storage and processing of large text-based social media data. In this study, movie reviews were automatically collected with Apache ManifoldCF (MCF) [3] and data clusters were created. Various methods, such as Levenshtein distance and fuzzy string matching, were compared for creating a public dataset from the collected data. Experimental results show that the proposed algorithm successfully detects and corrects spelling errors, and the resulting dataset can be used as an open source resource in sentiment analysis studies.
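Levenshtein distance is one of the string-matching methods the paper compares for spelling correction. A minimal sketch of distance computation plus dictionary-based correction (the dictionary and misspellings below are hypothetical examples, not the paper's data):

```python
def levenshtein(a, b):
    """Edit distance between two strings via dynamic programming (two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def correct(word, dictionary, max_dist=2):
    """Return the closest dictionary word within max_dist edits, else the word itself."""
    best = min(dictionary, key=lambda w: levenshtein(word, w))
    return best if levenshtein(word, best) <= max_dist else word
```

In a MapReduce setting, each mapper would apply `correct` to the words of its input split; the sketch only shows the per-word logic.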


2017 International Conference on Computer Science and Engineering (UBMK) | 2017

Data classification with deep learning using Tensorflow

Fatih Ertam; Galip Aydin

Deep learning is a subfield of machine learning that uses artificial neural networks inspired by the structure and function of the human brain. Despite being a relatively new approach, it has become very popular. Deep learning has achieved much higher success in many applications where machine learning had been successful only at certain rates. In particular, it is preferred for the classification of big data sets because it can provide fast and efficient results. In this study, we used Tensorflow, one of the most popular deep learning libraries, to classify the MNIST dataset, which is frequently used in data analysis studies. Using Tensorflow, an open source artificial intelligence library developed by Google, we studied and compared the effects of multiple activation functions on classification results. The functions used are Rectified Linear Unit (ReLU), Hyperbolic Tangent (tanh), Exponential Linear Unit (ELU), sigmoid, softplus, and softsign. A Convolutional Neural Network (CNN) with a SoftMax classifier is used as the deep learning architecture. The results show that the most accurate classification rate is obtained using the ReLU activation function.
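The six activation functions compared in the study have simple scalar forms, sketched below in plain Python (the study evaluates them inside a Tensorflow CNN, not standalone; this only shows the functions themselves):

```python
import math

def relu(x):
    """Rectified Linear Unit: zero for negative inputs, identity otherwise."""
    return max(0.0, x)

def tanh(x):
    """Hyperbolic tangent, squashing to (-1, 1)."""
    return math.tanh(x)

def elu(x, a=1.0):
    """Exponential Linear Unit: smooth negative saturation toward -a."""
    return x if x > 0 else a * (math.exp(x) - 1)

def sigmoid(x):
    """Logistic sigmoid, squashing to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    """Smooth approximation of ReLU: log(1 + e^x)."""
    return math.log1p(math.exp(x))

def softsign(x):
    """x / (1 + |x|), a polynomial alternative to tanh."""
    return x / (1.0 + abs(x))
```

ReLU's advantage reported in the study is often attributed to its non-saturating positive side, visible here: its output grows linearly while sigmoid and tanh flatten out.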


international symposium on networks computers and communications | 2016

Call center performance evaluation using big data analytics

Betul Karakus; Galip Aydin

Quality monitoring for call centers can be described as the process of listening to recorded calls in order to measure the performance of a customer service representative or agent. The main challenge of quality monitoring is that managers have no time to listen to all the records, and therefore only a few of the stored calls are randomly selected. This results in inaccurate performance measurements, since most call records cannot be reviewed. This paper presents a distributed call monitoring system for assessing all recorded calls using several quality criteria. In the proposed system, we analyze large amounts of call records using the popular Hadoop MapReduce framework and utilize text similarity algorithms such as cosine and n-gram similarity. We also integrated slang word lists into our monitoring system. Empirical call records are used to demonstrate the performance of the proposed monitoring system.
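A minimal sketch of the two text-similarity measures mentioned: cosine similarity over term-frequency vectors, and Jaccard similarity over character n-grams (illustrative only; the function names and the exact n-gram variant are my assumptions, not the paper's code):

```python
import math
from collections import Counter

def cosine_similarity(a_tokens, b_tokens):
    """Cosine similarity between term-frequency vectors of two token lists."""
    va, vb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def ngram_similarity(a, b, n=2):
    """Jaccard similarity over the sets of character n-grams of two strings."""
    grams = lambda s: {s[i:i + n] for i in range(len(s) - n + 1)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb) if ga | gb else 1.0
```

Scoring a transcribed call against reference phrases (greetings, required disclosures, slang lists) with measures like these is the per-record step that MapReduce then runs over all stored calls.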


Tehnicki Vjesnik-technical Gazette | 2016

Distributed log analysis on the cloud using MapReduce

Galip Aydin; Ibrahim Riza Hallac

In this paper we describe our work on designing a web-based, distributed data analysis system based on the popular MapReduce framework, deployed on a small cloud and developed specifically for analyzing web server logs. The log analysis system consists of several cluster nodes; it splits large log files on a distributed file system and quickly processes them using the MapReduce programming model. The cluster is created using an open source cloud infrastructure, which allows us to easily expand the computational power by adding new nodes. This gives us the ability to automatically resize the cluster according to the data analysis requirements. We implemented MapReduce programs for basic log analysis needs such as frequency analysis, error detection, and busy hour detection, as well as more complex analyses which require running several jobs. The system can automatically identify and analyze several web server log types, such as Apache, IIS, and Squid. We use open source projects for creating the cloud infrastructure and running MapReduce jobs.
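The MapReduce model splits work into a map phase that emits key-value pairs and a reduce phase that aggregates them per key. A minimal single-machine sketch of one of the basic analyses mentioned, status-code frequency over Apache common-log lines (illustrative only, not the paper's code):

```python
from collections import defaultdict
from itertools import chain

def map_phase(log_line):
    """Map: emit (status_code, 1) for an Apache common-log line."""
    parts = log_line.split()
    yield parts[-2], 1  # the status code precedes the final byte-count field

def reduce_phase(pairs):
    """Reduce: sum counts per key, as Hadoop does after the shuffle."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

def run_job(lines):
    """Run map over every line, then reduce the combined output."""
    return reduce_phase(chain.from_iterable(map_phase(l) for l in lines))
```

On a real cluster the same map and reduce functions run in parallel over file splits; only the plumbing (shuffle, distribution) differs from this sketch.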


advanced industrial conference on telecommunications | 2015

Running genetic algorithms on Hadoop for solving high dimensional optimization problems

Güngör Yildirim; Ibrahim Riza Hallac; Galip Aydin; Yetkin Tatar

Hadoop is a popular MapReduce framework for developing parallel applications in distributed environments. Several advantages of MapReduce, such as ease of programming and the ability to use commodity hardware, make the application of soft computing methods to parallel and distributed systems easier than before. In this paper, we present the results of an experimental study on running soft computing algorithms using Hadoop. This study shows how a simple genetic algorithm running on Hadoop can be used to produce solutions for high dimensional optimization problems. In addition, a simple but effective technique that does not need MapReduce chains is proposed.
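A minimal sketch of a simple genetic algorithm of the kind described, minimizing the sphere function in many dimensions with truncation selection, one-point crossover, and Gaussian mutation (run here on a single machine, not Hadoop; the test function and all parameters are illustrative assumptions):

```python
import random

def sphere(x):
    """High-dimensional test function; global minimum 0 at the origin."""
    return sum(v * v for v in x)

def genetic_algorithm(dim=50, pop_size=40, generations=100, seed=42):
    """Evolve a population of real-valued vectors to minimize sphere()."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sphere)
        parents = pop[: pop_size // 2]       # truncation selection keeps the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, dim)      # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < 0.2:           # occasional Gaussian mutation
                child[rng.randrange(dim)] += rng.gauss(0, 0.5)
            children.append(child)
        pop = parents + children
    return min(pop, key=sphere)
```

In a Hadoop setting, fitness evaluation of the population is the step that mappers parallelize; keeping selection and crossover in a single reduce step is what avoids chaining multiple MapReduce jobs per generation.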


arXiv: Distributed, Parallel, and Cluster Computing | 2018

Distributed Readability Analysis of Turkish Elementary School Textbooks

Betul Karakus; Ibrahim Riza Hallac; Galip Aydin

Collaboration


Dive into Galip Aydin's collaborations.

Top Co-Authors
