Saeed Aghabozorgi
Information Technology University
Publications
Featured research published by Saeed Aghabozorgi.
Expert Systems With Applications | 2014
Arman Khadjeh Nassirtoussi; Saeed Aghabozorgi; Teh Ying Wah; David Chek Ling Ngo
The quality of sentiment interpretation in online social-media buzz and online news can determine the predictability of financial markets and cause huge gains or losses. That is why a number of researchers have recently turned their full attention to different aspects of this problem. However, to the best of our knowledge, there is no well-rounded theoretical and technical framework for approaching it. We believe this lack of clarity stems from the topic's interdisciplinary nature, which involves at its core both behavioral-economic topics and artificial intelligence. We dive deeper into this interdisciplinary nature and contribute to the formation of a clear frame of discussion. We review the related work on market prediction based on online text mining and produce a picture of the generic components that these systems all share. We furthermore compare each system with the rest and identify their main differentiating factors. Our comparative analysis expands onto the theoretical and technical foundations behind each system. This work should help the research community structure this emerging field and identify the exact aspects that require further research and are of special significance.
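The generic components the survey identifies (text preprocessing, feature extraction, sentiment scoring, market-direction prediction) can be sketched as a minimal pipeline. The lexicon, tokenizer, and threshold below are illustrative assumptions, not components from the paper:

```python
# Illustrative lexicons; real systems use learned or curated sentiment resources.
POSITIVE = {"gain", "growth", "beat", "surge", "upgrade"}
NEGATIVE = {"loss", "drop", "miss", "plunge", "downgrade"}

def preprocess(text):
    """Preprocessing step: lowercase and tokenize on whitespace."""
    return text.lower().split()

def sentiment_score(tokens):
    """Feature/scoring step: net count of positive minus negative lexicon hits."""
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)

def predict_direction(text, threshold=0):
    """Prediction step: map the aggregate sentiment score to a market signal."""
    score = sentiment_score(preprocess(text))
    if score > threshold:
        return "up"
    if score < -threshold:
        return "down"
    return "flat"

print(predict_direction("Earnings beat forecasts and shares surge"))  # up
```

Each stage corresponds to one of the differentiating factors the survey compares across systems; swapping in a richer feature extractor or classifier changes only that stage.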
Information Systems | 2015
Saeed Aghabozorgi; Ali Seyed Shirkhorshidi; Teh Ying Wah
Clustering is a solution for classifying enormous data when there is no prior knowledge of classes. With emerging concepts such as cloud computing and big data and their vast applications in recent years, research on unsupervised solutions such as clustering algorithms has increased, aiming to extract knowledge from this avalanche of data. Clustering time-series data has been used in diverse scientific areas to discover patterns that empower data analysts to extract valuable information from complex and massive datasets. For huge datasets, supervised classification is almost impossible, whereas clustering can address the problem with unsupervised approaches. This work focuses on time-series data, one of the most popular data types in clustering problems, used broadly from gene-expression data in biology to stock-market analysis in finance. The review exposes the four main components of time-series clustering and aims to present an updated investigation of the trend of improvements in the efficiency, quality, and complexity of time-series clustering approaches over the last decade, and to enlighten new paths for future work. Highlights: the anatomy of time-series clustering is revealed by introducing its four main components; research in each of the four components is reviewed in detail and compared; works published in the last decade are analyzed; new paths for future work on time-series clustering and its components are outlined.
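Two of the components the review anatomizes, the time-series representation and the distance measure, can be illustrated with a common representation (Piecewise Aggregate Approximation) and the simplest distance (Euclidean). These are illustrative choices, not the review's recommendations:

```python
import math

def paa(series, segments):
    """Piecewise Aggregate Approximation: represent a series by the mean of
    each of `segments` equal blocks, reducing dimensionality before clustering."""
    n = len(series)
    return [
        sum(series[i * n // segments:(i + 1) * n // segments])
        / (((i + 1) * n // segments) - (i * n // segments))
        for i in range(segments)
    ]

def euclidean(a, b):
    """One of the simplest distance measures between equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

s = [1, 1, 2, 2, 8, 8, 9, 9]
print(paa(s, 2))                  # [1.5, 8.5]
print(euclidean([0, 0], [3, 4]))  # 5.0
```

The other two components, the clustering algorithm itself and the evaluation measure, then operate on these reduced representations and distances.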
international conference on computational science and its applications | 2014
Ali Seyed Shirkhorshidi; Saeed Aghabozorgi; Teh Ying Wah; Tutut Herawan
Clustering is an essential data mining tool for analyzing big data. Applying clustering techniques to big data is difficult due to the new challenges that big data raises. Since big data refers to terabytes and petabytes of data, and clustering algorithms come with high computational costs, the question is how to cope with this problem and how to deploy clustering techniques on big data while obtaining results in a reasonable time. This study reviews the trend and progress of clustering algorithms in coping with big-data challenges, from the very first proposed algorithms to today's novel solutions. The algorithms and the challenges targeted in producing improved clustering algorithms are introduced and analyzed, and afterward a possible future path toward more advanced algorithms is illuminated based on today's available technologies and frameworks.
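One family of solutions the survey covers is sub-sampling: processing small random batches so the full dataset never has to fit in memory. A minimal sketch of a mini-batch k-means update in that spirit (an illustrative example, not an algorithm from the paper):

```python
import random

def minibatch_kmeans_step(centers, batch, counts):
    """One mini-batch k-means update: assign each sampled point to its
    nearest center and move that center toward the point with a
    per-center learning rate of 1/count (1-D case for brevity)."""
    for x in batch:
        j = min(range(len(centers)), key=lambda i: (centers[i] - x) ** 2)
        counts[j] += 1
        eta = 1.0 / counts[j]
        centers[j] = (1 - eta) * centers[j] + eta * x
    return centers

random.seed(0)
# Two well-separated 1-D clusters standing in for a large dataset.
data = [random.gauss(0, 1) for _ in range(500)] + \
       [random.gauss(10, 1) for _ in range(500)]
centers, counts = [0.0, 10.0], [1, 1]
for _ in range(20):  # each step touches only 32 points, not all 1000
    centers = minibatch_kmeans_step(centers, random.sample(data, 32), counts)
print(sorted(round(c, 2) for c in centers))
```

The same idea scales to disk-resident data: each batch is streamed in, used once, and discarded.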
Expert Systems With Applications | 2014
Saeed Aghabozorgi; Ying Wah Teh
An automatic stock market categorization system would be invaluable to individual investors and financial experts, providing them with the opportunity to predict the stock price changes of a company with respect to other companies. In recent years, clustering all companies in the stock markets based on their similarities in the shape of their stock markets has increasingly become a common scheme. However, existing approaches are impractical because stock price data are high-dimensional and changes in the stock price usually occur with a shift, which makes the categorization more complex. Moreover, no stock market categorization method has been developed that can cluster companies down to the sub-cluster level, which is very meaningful to end users. Therefore, in this paper, a novel three-phase clustering model is proposed to categorize companies based on the similarity in the shape of their stock markets. First, low-resolution time series data are used to approximately categorize companies. Then, in the second phase, pre-clustered companies are split into pure sub-clusters. Finally, sub-clusters are merged in the third phase. The accuracy of the proposed method is evaluated using various published data sets in different domains. We show that this approach performs well in efficiency and effectiveness compared to existing conventional clustering algorithms.
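The three phases can be sketched on toy data: cluster a low-resolution view, split pre-clusters into tight sub-clusters, then merge nearby sub-clusters. The resolutions, radii, leader-scan splitting, and distance below are illustrative assumptions, not the paper's exact parameters:

```python
def dist(a, b):
    """Euclidean distance between equal-length series."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def downsample(s, k):
    """Phase 1 prep: low-resolution view via block means."""
    return [sum(s[i:i + k]) / k for i in range(0, len(s), k)]

def split_pure(cluster, radius):
    """Phase 2: split a pre-cluster into tight sub-clusters by a leader scan."""
    subs = []
    for s in cluster:
        for sub in subs:
            if dist(s, sub[0]) <= radius:
                sub.append(s)
                break
        else:
            subs.append([s])
    return subs

def merge(subs, radius):
    """Phase 3: merge sub-clusters whose prototypes (first members) are close."""
    merged = []
    for sub in subs:
        for m in merged:
            if dist(sub[0], m[0]) <= radius:
                m.extend(sub)
                break
        else:
            merged.append(list(sub))
    return merged

series = [[1, 1, 1, 1], [1.1, 1, 1, 0.9], [9, 9, 9, 9], [9.2, 9, 8.8, 9]]
low = [downsample(s, 2) for s in series]
final = merge(split_pure(low, radius=0.5), radius=1.0)
print(len(final))  # 2: the low-valued and high-valued series separate cleanly
```

Phase 1 keeps the expensive comparisons cheap; phases 2 and 3 recover the fine structure (the sub-clusters) that the coarse pass smoothed away.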
Neurocomputing | 2014
Afshin Jahangirzadeh; Shahaboddin Shamshirband; Saeed Aghabozorgi; Shatirah Akib; Hossein Basser; Nor Badrul Anuar; Miss Laiha Mat Kiah
In this study, a new procedure is proposed to determine the optimum dimensions of a rectangular collar that minimize the temporal trend of scouring around a pier model. Unlike previous methods of predicting collar dimensions around a bridge pier, the proposed approach concerns the selection of different collar dimension sizes in terms of the flume's upstream (L_uc/D), downstream (L_dc/D), and width (L_w/D) ratios. The determination method utilizes Expert Multi-Agent System (E-MAS) based Support Vector Regression (SVR) agents, in a cooperative-based expert SVR (Co-ESVR) scheme. The SVR agents (SVR_Luc, SVR_Ldc, and SVR_Lw) are set around a rectangular collar to predict the collar dimensions around a bridge pier. In the first layer, the Expert System (ES) is adopted to gather suitable data and send it to the next layer. The multi-agent-based SVR adjusts its parameters to find the optimal cost prediction function for the collar dimensions around the bridge pier, so as to reduce scour around the bridge. A weighted sharing strategy is utilized to select the cost optimization function through the root mean square error (RMSE). The efficiency of the proposed optimization method (Co-ESVR) is explored by comparing its outcomes with experimental results. Numerical results indicate that Co-ESVR achieves better accuracy in reducing the percentage of scour depth (r_e) with a smaller network size, compared to the non-cooperative approaches.
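The RMSE-based weighted sharing idea, combining multiple agents' predictions with weights that favor lower-error agents, can be sketched generically. The inverse-RMSE weighting below is an illustrative scheme, not necessarily the paper's exact strategy:

```python
import math

def rmse(pred, actual):
    """Root mean square error between a prediction series and the truth."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

def weighted_combine(agent_preds, actual):
    """Weight each agent's predictions inversely to its RMSE on reference
    data, then take the weighted average at each point."""
    weights = [1.0 / (rmse(p, actual) + 1e-9) for p in agent_preds]
    total = sum(weights)
    return [
        sum(w * p[i] for w, p in zip(weights, agent_preds)) / total
        for i in range(len(actual))
    ]

actual = [1.0, 2.0, 3.0]
good = [1.1, 2.0, 2.9]   # low-error agent: dominates the combination
bad = [3.0, 0.0, 5.0]    # high-error agent: nearly ignored
combined = weighted_combine([good, bad], actual)
print(rmse(combined, actual) < rmse(bad, actual))  # True
```

The cooperative scheme thus degrades gracefully: a poorly performing agent contributes little without being discarded outright.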
Software - Practice and Experience | 2016
Amin Mohebi; Saeed Aghabozorgi; Teh Ying Wah; Tutut Herawan; Ramin Yahyapour
Enterprises today are dealing with massive amounts of data, which have been increasing explosively. The key requirements to address this challenge are to extract, analyze, and process data in a timely manner. Clustering is an essential data mining tool that plays an important role in analyzing big data. However, large-scale data clustering has become a challenging task because of the large amount of information that emerges from technological progress in many areas, including finance and business informatics. Accordingly, researchers have developed parallel clustering algorithms using parallel programming models to address this issue. MapReduce is one of the most famous frameworks, and it has attracted great attention because of its flexibility, ease of programming, and fault tolerance. However, the framework has evident performance limitations, especially for iterative programs. This study first reviews the proposed iterative frameworks that extend MapReduce to support iterative algorithms. We summarize these techniques, discuss their uniqueness and limitations, and explain how they address the challenging issues of iterative programs. We also perform an in-depth review of the problems and solution techniques for parallel clustering algorithms. To the best of our knowledge, no well-rounded review yet provides a significant comparison among parallel clustering algorithms using MapReduce. This work aims to serve as a stepping stone for researchers who are studying big data clustering algorithms.
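The iteration problem can be seen in miniature: each k-means pass is one map (assign each point to its nearest center) plus one reduce (recompute centers), and a naive MapReduce driver must relaunch the job and re-read the data every iteration, which is the overhead the extended frameworks eliminate by caching loop-invariant data. A pure-Python sketch with illustrative names, not any framework's API:

```python
from collections import defaultdict

def map_phase(points, centers):
    """Map: emit (nearest-center-index, point) pairs (1-D case for brevity)."""
    for p in points:
        yield min(range(len(centers)), key=lambda i: (p - centers[i]) ** 2), p

def reduce_phase(pairs, k):
    """Reduce: average the points assigned to each center index."""
    groups = defaultdict(list)
    for i, p in pairs:
        groups[i].append(p)
    return [sum(groups[i]) / len(groups[i]) if groups[i] else 0.0
            for i in range(k)]

points = [1.0, 1.2, 0.8, 9.0, 9.2, 8.8]
centers = [0.0, 5.0]
for _ in range(5):  # each loop body = one full MapReduce job in the naive setup
    centers = reduce_phase(map_phase(points, centers), 2)
print(centers)  # converges near [1.0, 9.0]
```

In a real deployment, `points` would be re-scanned from distributed storage on every loop iteration, which is precisely the cost iterative extensions amortize.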
soft computing and pattern recognition | 2009
Saeed Aghabozorgi; Teh Ying Wah
The recent extensive growth of data on the Web has generated an enormous amount of log records in Web server databases. Applying Web usage mining techniques to these vast amounts of historical data can discover potentially useful patterns and reveal user access behaviors on a Web site. Cluster analysis has been widely applied to generate user behavior models from server Web logs. Most of these off-line models suffer a decrease in accuracy over time, resulting from new users joining or changes in the behavior of existing users in model-based approaches. This paper proposes a novel approach to generate a dynamic model from an off-line model created by fuzzy clustering. In this method, users' transactions are used periodically to update the off-line model. To this aim, an improved leader clustering model, along with a static approach, is used to regenerate clusters in an incremental fashion.
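The incremental idea behind leader clustering is that a new transaction either joins the first existing cluster whose leader is close enough, or becomes a leader itself, so the model updates without re-clustering the history. A generic leader-algorithm sketch (not the paper's exact improved variant):

```python
def leader_update(leaders, transaction, threshold, dist):
    """Incremental leader clustering: absorb the new transaction into the
    first cluster whose leader is within `threshold`, else promote the
    transaction to a new leader."""
    for leader in leaders:
        if dist(leader, transaction) <= threshold:
            return leaders          # absorbed into an existing cluster
    return leaders + [transaction]  # new behavior pattern -> new leader

dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

leaders = [(0.0, 0.0)]
leaders = leader_update(leaders, (0.1, 0.1), 0.5, dist)  # close: absorbed
leaders = leader_update(leaders, (5.0, 5.0), 0.5, dist)  # far: new leader
print(len(leaders))  # 2
```

Because each update is a single pass over the current leaders, the cost per new transaction stays proportional to the number of clusters, not to the size of the accumulated log.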
The Scientific World Journal | 2014
Zahra Moghaddasi; Hamid A. Jalab; Rafidah Md Noor; Saeed Aghabozorgi
Digital image forgery is becoming easier to perform because of the rapid development of various manipulation tools. Image splicing is one of the most prevalent techniques. Digital images have lost their trustworthiness, and researchers have exerted considerable effort to regain that trust, focusing mostly on detection algorithms. However, most of the proposed algorithms are incapable of handling the high dimensionality and redundancy of the extracted features. Moreover, existing algorithms are limited by high computational time. This study focuses on improving one of the image splicing detection algorithms, the run length run number (RLRN) algorithm, by applying two dimension reduction methods, namely, principal component analysis (PCA) and kernel PCA. A support vector machine is used to distinguish between authentic and spliced images. Results show that kernel PCA, a nonlinear dimension reduction method, has the best effect on the R, G, B, and Y channels and on gray-scale images.
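Run-length statistics of the kind underlying RLRN-style features can be illustrated on a single row of quantized pixel values: a run is a maximal stretch of consecutive equal values, and splicing tends to disturb the distribution of run lengths. A simplified illustration, not the exact RLRN definition:

```python
def run_lengths(row):
    """Lengths of maximal runs of consecutive equal values in one row."""
    runs, count = [], 1
    for prev, cur in zip(row, row[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

# Two zeros, three ones, one zero -> runs of length 2, 3, 1.
print(run_lengths([0, 0, 1, 1, 1, 0]))  # [2, 3, 1]
```

Aggregating such run statistics over rows and channels yields the high-dimensional feature vectors whose redundancy PCA and kernel PCA are then used to reduce.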
intelligent data analysis | 2014
Saeed Aghabozorgi; Teh Ying Wah
Time series clustering is a very effective approach for discovering valuable information in various systems, such as finance, embedded bio-sensors, and genomics. However, the focus on the efficiency and scalability of these algorithms in dealing with time series data has come at the expense of the usability and effectiveness of clustering. In this paper, a new multi-step approach is proposed to improve the accuracy of clustering time series data. In the first step, time series data are clustered approximately. Then, in the second step, the built clusters are split into sub-clusters. Finally, sub-clusters are merged in the third step. In contrast to existing approaches, this method can generate accurate clusters based on similarity in shape in very large time series datasets. The accuracy of the proposed method is evaluated using various published datasets in different domains.
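Similarity "in shape" typically means similarity after removing offset and scale, which is what z-normalization provides before any distance is computed. An illustrative preprocessing sketch; the paper's similarity measure may differ:

```python
import math

def znorm(series):
    """Z-normalize a series so that similarity depends on shape rather than
    amplitude or offset (a standard step in shape-based clustering)."""
    n = len(series)
    mu = sum(series) / n
    sd = math.sqrt(sum((x - mu) ** 2 for x in series) / n) or 1.0
    return [(x - mu) / sd for x in series]

a = [1, 2, 3, 4]
b = [10, 20, 30, 40]  # same shape, ten times the scale
d = sum((x - y) ** 2 for x, y in zip(znorm(a), znorm(b))) ** 0.5
print(round(d, 6))  # 0.0: identical shape after normalization
```

Without the normalization step, raw Euclidean distance would place these two series in different clusters despite their identical shape.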
Journal of Applied Mathematics | 2014
Mohammad Amin Shayegan; Saeed Aghabozorgi; Ram Gopal Raj
Dimensionality reduction (feature selection) is an important step in pattern recognition systems. Although there are various conventional approaches to feature selection, such as Principal Component Analysis, Random Projection, and Linear Discriminant Analysis, selecting optimal, effective, and robust features is usually a difficult task. In this paper, a new two-stage approach to dimensionality reduction is proposed. This method is based on one-dimensional and two-dimensional spectrum diagrams of the standard deviation and minimum-to-maximum distributions of the initial feature vector elements. The proposed algorithm is validated in an OCR application, using two large standard benchmark handwritten OCR datasets, MNIST and Hoda. Initially, a 133-element feature vector was assembled from the most commonly used features proposed in the literature. The size of the initial feature vector was then reduced to 59.40% (79 elements) for the MNIST dataset and to 43.61% (58 elements) for the Hoda dataset, respectively. Meanwhile, the accuracy of the OCR systems is enhanced by 2.95% for the MNIST dataset and 4.71% for the Hoda dataset. The achieved results show an improvement in the precision of the system in comparison to the rival approaches, Principal Component Analysis and Random Projection. The proposed technique can also be useful for generating decision rules in a pattern recognition system using rule-based classifiers.
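The intuition behind distribution-based selection, that near-constant features carry little discriminative information, can be sketched by keeping only columns whose standard deviation across samples exceeds a threshold. A simplified take; the paper's spectrum-diagram criterion is more elaborate:

```python
import math

def stddev(values):
    """Population standard deviation of a feature column."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))

def select_features(samples, min_std):
    """Return indices of feature columns whose standard deviation across
    samples exceeds `min_std`; near-constant columns are dropped."""
    cols = list(zip(*samples))
    return [i for i, col in enumerate(cols) if stddev(col) > min_std]

samples = [
    [0.0, 5.0, 1.0],   # column 0 is constant, column 2 barely varies
    [0.0, 9.0, 1.1],
    [0.0, 1.0, 0.9],
]
print(select_features(samples, min_std=0.5))  # [1]
```

Reducing a feature vector this way both shrinks the classifier's input and, as the paper reports for its richer criterion, can improve accuracy by discarding noisy or uninformative dimensions.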