Teh Ying Wah

Information Technology University

Publication

Featured research published by Teh Ying Wah.

Expert Systems With Applications | 2014

Review: Text mining for market prediction: A systematic review

Arman Khadjeh Nassirtoussi; Saeed Aghabozorgi; Teh Ying Wah; David Chek Ling Ngo

The quality of the interpretation of the sentiment in the online buzz in the social media and the online news can determine the predictability of financial markets and cause huge gains or losses. That is why a number of researchers have turned their full attention to the different aspects of this problem lately. However, there is no well-rounded theoretical and technical framework for approaching the problem to the best of our knowledge. We believe the existing lack of such clarity on the topic is due to its interdisciplinary nature that involves at its core both behavioral-economic topics as well as artificial intelligence. We dive deeper into the interdisciplinary nature and contribute to the formation of a clear frame of discussion. We review the related works that are about market prediction based on online-text-mining and produce a picture of the generic components that they all have. We, furthermore, compare each system with the rest and identify their main differentiating factors. Our comparative analysis of the systems expands onto the theoretical and technical foundations behind each. This work should help the research community to structure this emerging field and identify the exact aspects which require further research and are of special significance.

Information Systems | 2015

Time-series clustering - A decade review

Saeed Aghabozorgi; Ali Seyed Shirkhorshidi; Teh Ying Wah

Clustering is a solution for classifying enormous data when there is not any early knowledge about classes. With emerging new concepts like cloud computing and big data and their vast applications in recent years, research works have been increased on unsupervised solutions like clustering algorithms to extract knowledge from this avalanche of data. Clustering time-series data has been used in diverse scientific areas to discover patterns which empower data analysts to extract valuable information from complex and massive datasets. In case of huge datasets, using supervised classification solutions is almost impossible, while clustering can solve this problem using unsupervised approaches. In this research work, the focus is on time-series data, which is one of the popular data types in clustering problems and is broadly used from gene expression data in biology to stock market analysis in finance. This review will expose four main components of time-series clustering and is aimed to represent an updated investigation on the trend of improvements in efficiency, quality and complexity of clustering time-series approaches during the last decade and enlighten new paths for future works. Anatomy of time-series clustering is revealed by introducing its four main components. Research works in each of the four main components are reviewed in detail and compared. Analysis of research works published in the last decade. Enlighten new paths for future works for time-series clustering and its components.

Journal of Computer Science and Technology | 2014

On Density-Based Data Streams Clustering Algorithms: A Survey

Amineh Amini; Teh Ying Wah; Hadi Saboohi

Clustering data streams has drawn lots of attention in the last few years due to their ever-growing presence. Data streams put additional challenges on clustering such as limited time and memory and one pass clustering. Furthermore, discovering clusters with arbitrary shapes is very important in data stream applications. Data streams are infinite and evolving over time, and we do not have any knowledge about the number of clusters. In a data stream environment due to various factors, some noise appears occasionally. Density-based method is a remarkable class in clustering data streams, which has the ability to discover arbitrary shape clusters and to detect noise. Furthermore, it does not need the number of clusters in advance. Due to data stream characteristics, the traditional density-based clustering is not applicable. Recently, a lot of density-based clustering algorithms are extended for data streams. The main idea in these algorithms is using density-based methods in the clustering process and at the same time overcoming the constraints, which are put out by data stream’s nature. The purpose of this paper is to shed light on some algorithms in the literature on density-based clustering over data streams. We not only summarize the main density-based clustering algorithms on data streams, discuss their uniqueness and limitations, but also explain how they address the challenges in clustering data streams. Moreover, we investigate the evaluation metrics used in validating cluster quality and measuring algorithms’ performance. It is hoped that this survey will serve as a steppingstone for researchers studying data streams clustering, particularly density-based algorithms.

International Conference on Computational Science and Its Applications | 2014

Big Data Clustering: A Review

Ali Seyed Shirkhorshidi; Saeed Aghabozorgi; Teh Ying Wah; Tutut Herawan

Clustering is an essential data mining tool for analyzing big data. There are difficulties in applying clustering techniques to big data due to the new challenges that are raised with big data. As Big Data refers to terabytes and petabytes of data and clustering algorithms come with high computational costs, the question is how to cope with this problem and how to deploy clustering techniques to big data and get the results in a reasonable time. This study is aimed to review the trend and progress of clustering algorithms to cope with big data challenges from the very first proposed algorithms until today’s novel solutions. The algorithms and the targeted challenges for producing improved clustering algorithms are introduced and analyzed, and afterward the possible future path for more advanced algorithms is illuminated based on today’s available technologies and frameworks.

International Journal of Information Management | 2016

Big data reduction framework for value creation in sustainable enterprises

Muhammad Habib ur Rehman; Victor Chang; Aisha Batool; Teh Ying Wah

Value creation is a major sustainability factor for enterprises, in addition to profit maximization and revenue generation. Modern enterprises collect big data from various inbound and outbound data sources. The inbound data sources handle data generated from the results of business operations, such as manufacturing, supply chain management, marketing, and human resource management, among others. Outbound data sources handle customer-generated data which are acquired directly or indirectly from customers, market analysis, surveys, product reviews, and transactional histories. However, cloud service utilization costs increase because of big data analytics and value creation activities for enterprises and customers. This article presents a novel concept of big data reduction at the customer end in which early data reduction operations are performed to achieve multiple objectives, such as (a) lowering the service utilization cost, (b) enhancing the trust between customers and enterprises, (c) preserving privacy of customers, (d) enabling secure data sharing, and (e) delegating data sharing control to customers. We also propose a framework for early data reduction at customer end and present a business model for end-to-end data reduction in enterprise applications. The article further presents a business model canvas and maps the future application areas with its nine components. Finally, the article discusses the technology adoption challenges for value creation through big data reduction in enterprise applications.

Fuzzy Systems and Knowledge Discovery | 2011

A study of density-grid based clustering algorithms on data streams

Amineh Amini; Teh Ying Wah; Mahmoud Reza Saybani; Saeed Reza Aghabozorgi Sahaf Yazdi

Clustering data streams has attracted many researchers since the applications that generate data streams have become more popular. Several clustering algorithms have been introduced for data streams based on distance, which are unable to find clusters of arbitrary shapes and cannot handle outliers. Density-based clustering algorithms are remarkable not only for finding arbitrarily shaped clusters but also for dealing with noise in data. In density-based clustering algorithms, dense areas of objects in the data space are considered as clusters, which are segregated by low-density areas. Another group of clustering methods for data streams is grid-based clustering, where the data space is quantized into a finite number of cells which form the grid structure, and clustering is performed on the grids. Grid-based clustering maps the infinite number of data records in data streams to a finite number of grids. In this paper we review the grid-based clustering algorithms that use density-based algorithms or the density concept for clustering. We call them density-grid clustering algorithms. We explore the algorithms in detail, along with their merits and limitations. The algorithms are also summarized in a table based on their important features. Besides that, we discuss how well the algorithms address the challenging issues in clustering data streams.

Sensors | 2015

Mining personal data using smartphones and wearable devices: a survey.

Muhammad Habib ur Rehman; Chee Sun Liew; Teh Ying Wah; Junaid Shuja; Babak Daghighi

The staggering growth in smartphone and wearable device use has led to a massive scale generation of personal (user-specific) data. To explore, analyze, and extract useful information and knowledge from the deluge of personal data, one has to leverage these devices as the data-mining platforms in ubiquitous, pervasive, and big data environments. This study presents the personal ecosystem where all computational resources, communication facilities, storage and knowledge management systems are available in user proximity. An extensive review on recent literature has been conducted and a detailed taxonomy is presented. The performance evaluation metrics and their empirical evidences are sorted out in this paper. Finally, we have highlighted some future research directions and potentially emerging application areas for personal data mining using smartphones and wearable devices.

Software - Practice and Experience | 2016

Iterative big data clustering algorithms: a review

Amin Mohebi; Saeed Aghabozorgi; Teh Ying Wah; Tutut Herawan; Ramin Yahyapour

Enterprises today are dealing with the massive size of data, which have been explosively increasing. The key requirements to address this challenge are to extract, analyze, and process data in a timely manner. Clustering is an essential data mining tool that plays an important role for analyzing big data. However, large‐scale data clustering has become a challenging task because of the large amount of information that emerges from technological progress in many areas, including finance and business informatics. Accordingly, researchers have dealt with parallel clustering algorithms using parallel programming models to address this issue. MapReduce is one of the most famous frameworks, and it has attracted great attention because of its flexibility, ease of programming, and fault tolerance. However, the framework has evident performance limitations, especially for iterative programs. This study will first review the proposed iterative frameworks that extended MapReduce to support iterative algorithms. We summarize these techniques, discuss their uniqueness and limitations, and explain how they address the challenging issues of iterative programs. We also perform an in‐depth review to understand the problems and the solving techniques for parallel clustering algorithms. Hence, we believe that no well‐rounded review provides a significant comparison among parallel clustering algorithms using MapReduce. This work aims to serve as a stepping stone for researchers who are studying big data clustering algorithms.

Soft Computing and Pattern Recognition | 2009

Using Incremental Fuzzy Clustering to Web Usage Mining

Saeed Aghabozorgi; Teh Ying Wah

The recent extensive growth of data on the Web has generated an enormous amount of log records on Web server databases. Applying Web Usage Mining techniques on these vast amounts of historical data can discover potentially useful patterns and reveal user access behaviors on the Web site. Cluster analysis has widely been applied to generate user behavior models on Server Web logs. Most of these off-line models suffer from decreasing accuracy over time, resulting from new users joining or changes in the behavior of existing users in model-based approaches. This paper proposes a novel approach to generate a dynamic model from an off-line model created by fuzzy clustering. In this method, we will use users’ transactions periodically to change the off-line model. To this aim, an improved model of leader clustering along with a static approach is used to regenerate clusters in an incremental fashion.

Data Science and Engineering | 2016

Big Data Reduction Methods: A Survey

Muhammad Habib ur Rehman; Chee Sun Liew; Assad Abbas; Prem Prakash Jayaraman; Teh Ying Wah; Samee Ullah Khan

Research on big data analytics is entering a new phase called fast data, where multiple gigabytes of data arrive in the big data systems every second. Modern big data systems collect inherently complex data streams due to the volume, velocity, value, variety, variability, and veracity in the acquired data and consequently give rise to the 6Vs of big data. The reduced and relevant data streams are perceived to be more useful than collecting raw, redundant, inconsistent, and noisy data. Another perspective for big data reduction is that big datasets with millions of variables cause the curse of dimensionality, which requires unbounded computational resources to uncover actionable knowledge patterns. This article presents a review of methods that are used for big data reduction. It also presents a detailed taxonomic discussion of big data reduction methods including the network theory, big data compression, dimension reduction, redundancy elimination, data mining, and machine learning methods. In addition, the open research issues pertinent to big data reduction are also highlighted.
