Yee Ling Boo
RMIT University
Publication
Featured research published by Yee Ling Boo.
Engineering Applications of Artificial Intelligence | 2017
Alireza Moayedikia; Kok-Leong Ong; Yee Ling Boo; William Yeoh; Richard Jensen
Misclassification costs of minority class data in real-world applications can be very high. This is a challenging problem, especially when the data is also high-dimensional, because of the increase in overfitting and the loss of model interpretability. Feature selection has recently become a popular way to address this problem by identifying the features that best predict a minority class. This paper introduces a novel feature selection method called SYMON, which uses symmetrical uncertainty and harmony search. Unlike existing methods, SYMON uses symmetrical uncertainty to weigh features with respect to their dependency on class labels. This helps to identify features that are powerful in retrieving the least frequent class labels. SYMON also uses harmony search to formulate the feature selection phase as an optimisation problem and select the best possible combination of features. The proposed algorithm is able to deal with situations where a set of features have the same weight by incorporating two vector tuning operations embedded in the harmony search process. In this paper, SYMON is compared against various benchmark feature selection algorithms that were developed to address the same issue. Our empirical evaluation on different micro-array data sets using G-Mean and AUC measures confirms that SYMON is comparable to or better than current benchmarks.
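A minimal sketch of the symmetrical uncertainty measure the abstract refers to, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), which SYMON uses to weigh features against class labels. This is an illustration only, not the authors' SYMON code, and the function and variable names are assumptions.

import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a discrete sequence."""
    counts = np.array(list(Counter(values).values()), dtype=float)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def symmetrical_uncertainty(feature, labels):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), ranging over [0, 1]."""
    h_x = entropy(feature)
    h_y = entropy(labels)
    if h_x + h_y == 0:
        return 0.0
    h_xy = entropy(list(zip(feature, labels)))   # joint entropy H(X, Y)
    mutual_info = h_x + h_y - h_xy
    return 2.0 * mutual_info / (h_x + h_y)

# Example: weight each (discretised) feature by its dependency on the class.
X = np.array([[0, 1], [1, 1], [0, 0], [1, 0]])
y = np.array([0, 1, 0, 1])
weights = [symmetrical_uncertainty(X[:, j], y) for j in range(X.shape[1])]

In SYMON these weights feed a harmony search over feature subsets; the sketch above covers only the weighting step.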
International Journal of Intelligent Systems in Accounting, Finance & Management | 2013
Ding-Wen Tan; William Yeoh; Yee Ling Boo; Soung-Yue Liew
The capability to identify customers who are more likely to respond to a product is an important issue in direct marketing. This paper investigates the impact of feature selection on predictive models that predict the reordering demand of small and medium-sized enterprise customers in a large online job-advertising company. Three well-known feature subset selection techniques in data mining, namely correlation-based feature selection (CFS), subset consistency (SC) and symmetrical uncertainty (SU), are applied in this study. The results show that the predictive models using SU outperform those without feature selection and those with the CFS and SC feature subset evaluators. This study has examined and demonstrated the significance of applying the feature-selection approach to enhance the accuracy of predictive modelling in a direct-marketing context.
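A hedged illustration of the workflow this study evaluates: rank features, keep a subset, then train a response-prediction model. scikit-learn has no built-in symmetrical uncertainty scorer, so mutual information is used here as a stand-in for the SU/CFS/SC evaluators the paper compares; the dataset and parameter values are made up.

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

X, y = make_classification(n_samples=500, n_features=30, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(
    SelectKBest(score_func=mutual_info_classif, k=10),  # feature subset selection step
    LogisticRegression(max_iter=1000),                  # reorder-demand style classifier
)
model.fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))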
Journal of Computer Information Systems | 2017
Alireza Moayedikia; William Yeoh; Kok-Leong Ong; Yee Ling Boo
This paper presents a classification framework and a systematic analysis of the literature on answer aggregation techniques for the most popular and important type of crowdsourcing, i.e., micro-task crowdsourcing. In doing so, we analyzed research articles published since 2006 and developed four classification taxonomies. First, we provided a classification framework based on the algorithmic characteristics of answer aggregation techniques. Second, we outlined the statistical and probabilistic foundations used by different types of algorithms and micro-tasks. Third, we provided a matrix catalog of the data characteristics for which each answer aggregation algorithm is designed. Fourth, we presented a matrix catalog of the commonly used evaluation metrics for each type of micro-task. This paper represents the first systematic literature analysis and classification of answer aggregation techniques for micro-task crowdsourcing.
International Conference on Machine Learning and Applications | 2016
Alireza Moayedikia; Kok-Leong Ong; Yee Ling Boo; William Yeoh
Estimation of worker reliability on microtask crowdsourcing platforms has gained attention from many researchers. On microtask platforms no worker is fully reliable for a task, and it is likely that some workers are spammers, in the sense that they provide random answers to collect the financial reward. The existence of spammers is harmful, as they increase the cost of microtasking and negatively affect the answer aggregation process. Hence, to discriminate spammers from non-spammers one needs to measure worker reliability to predict how likely it is that a worker has put effort into solving a task. In this paper we introduce a new reliability estimation algorithm based on the bee colony algorithm, called REBECO. The algorithm relies on a Gaussian process model to estimate the reliability of workers dynamically. Of the bees that go in search of pollen, some are more successful than others. This maps well to our problem, where some workers (i.e., bees) are more successful than others for a given task, thus giving rise to a reliability measure. Answer aggregation with respect to worker reliability rates has been considered a suitable replacement for conventional majority voting. We compared REBECO with majority voting using two real-world datasets. The results indicate that REBECO is able to outperform majority voting significantly.
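A minimal sketch of the aggregation idea REBECO exploits: weight each worker's vote by an estimated reliability score instead of using plain majority voting. The worker names and reliability values below are hypothetical; the actual algorithm estimates reliabilities dynamically with a bee colony search and a Gaussian process model.

from collections import defaultdict

def weighted_vote(answers, reliability):
    """answers: {worker: label}; reliability: {worker: weight in [0, 1]}."""
    scores = defaultdict(float)
    for worker, label in answers.items():
        scores[label] += reliability.get(worker, 0.5)  # unknown workers get a neutral weight
    return max(scores, key=scores.get)

# Three low-reliability workers say "dog", two high-reliability workers say "cat":
answers = {"w1": "cat", "w2": "dog", "w3": "dog", "w4": "dog", "w5": "cat"}
reliability = {"w1": 0.90, "w2": 0.30, "w3": 0.35, "w4": 0.30, "w5": 0.85}
print(weighted_vote(answers, reliability))  # "cat", whereas plain majority voting would return "dog"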
Archive | 2016
Bob Li; Mong Shan Ee; Yee Ling Boo; Mamunur Rashid
Purpose – Ever since the publication of the original Jegadeesh and Titman (1993) study, the momentum effect has been tested vigorously to validate its pervasiveness across different time periods and different markets. In spite of numerous out-of-sample tests, there is one apparent gap: little research has addressed the steadily increasing universe of Shari'ah compliant stocks.
Methodology/approach – This study examines momentum strategy returns in a global Shari'ah compliant stock setting.
Findings – It finds a strong presence of stock momentum returns for Pakistan and Malaysia. The momentum returns are driven neither by industry momentum nor by small size stocks. Although no momentum profits are found for portfolios formed from global Shari'ah compliant stocks, this appears to be largely due to return reversal among the small size Shari'ah compliant stocks.
Originality/value – The strong presence of momentum profits for relatively large Shari'ah compliant stocks is a desirable trait, as it indicates that momentum trading strategies are practical and implementable.
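A simplified sketch of the Jegadeesh and Titman style momentum sort this kind of study applies: rank stocks on past formation-period returns, then take the top decile as the winner portfolio and the bottom decile as the loser portfolio. The data frame layout, column names and parameters are assumptions for illustration, not the study's actual procedure.

import pandas as pd

def momentum_portfolios(returns, formation=6, quantile=0.1):
    """returns: DataFrame of monthly returns, indexed by month, one column per stock."""
    # Cumulative return over the formation window for each stock.
    past = (1 + returns).rolling(formation).apply(lambda r: r.prod() - 1)
    last = past.iloc[-1].dropna()
    winners = last[last >= last.quantile(1 - quantile)].index  # long leg
    losers = last[last <= last.quantile(quantile)].index       # short leg
    return winners.tolist(), losers.tolist()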
Engineering Applications of Artificial Intelligence | 2018
Alireza Moayedikia; Kok-Leong Ong; Yee Ling Boo; William Yeoh
Conventional microtask crowdsourcing platforms rely on a random task distribution strategy and repeatedly assign tasks to workers. This strategy, known as repeated labelling, suffers from two shortcomings, high cost and low accuracy, as a result of distributing tasks at random. To overcome these shortcomings, researchers have introduced task assignment as a substitute strategy, in which an algorithm selectively chooses suitable tasks for an online worker. Task assignment has therefore gained attention from researchers as a way to reduce the cost of microtasking while increasing its accuracy. However, existing task assignment algorithms suffer from four shortcomings: (i) human intervention, (ii) reliance on a rough estimation of ground truth, (iii) reliance on workers' dynamic capabilities, and (iv) an inability to deal with sparsity. To overcome these shortcomings, this paper proposes a new task assignment algorithm known as LEarning Automata based Task assignment (LEATask), which works on the basis of similarities in worker performance. The algorithm has two stages, exploration and exploitation. In the exploration stage, a number of workers are first hired and their reliability is learnt. LEATask then clusters the hired workers using a given clustering algorithm and generates a learning automaton for each cluster. The clusters of workers, along with their attached learning automata, are then used in the exploitation stage. The exploitation stage initially assigns a number of tasks to a newly arrived worker to learn that worker's reliability. LEATask then identifies the worker's cluster and, based on that cluster and its attached learning automaton, assigns the next tasks to the new worker. LEATask has been empirically evaluated on several real datasets and compared against baseline and novel algorithms in terms of root mean square error. The comparisons indicate that LEATask consistently shows better or comparable performance.
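An illustrative sketch of the learning automaton component described above: one automaton per worker cluster keeps a probability vector over task types and reinforces the task types on which that cluster's workers answer reliably. This is a generic linear reward-inaction automaton, not the authors' LEATask code; the class, the number of actions and the reward signal are assumptions.

import numpy as np

class LearningAutomaton:
    def __init__(self, n_actions, learning_rate=0.1, seed=0):
        self.p = np.full(n_actions, 1.0 / n_actions)  # probability over task types
        self.lr = learning_rate
        self.rng = np.random.default_rng(seed)

    def choose(self):
        return self.rng.choice(len(self.p), p=self.p)

    def reward(self, action):
        # Linear reward-inaction update: shift probability mass toward the rewarded action.
        self.p *= (1.0 - self.lr)
        self.p[action] += self.lr

# One automaton per worker cluster; reward when the assigned task is answered correctly.
automaton = LearningAutomaton(n_actions=4)
task_type = automaton.choose()
answered_correctly = True  # hypothetical feedback from answer aggregation
if answered_correctly:
    automaton.reward(task_type)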
Australasian Conference on Data Mining | 2017
Alireza Moayedekia; Kok-Leong Ong; Yee Ling Boo; William Yeoh
Community detection (CD) is the act of grouping similar objects and has applications in social networks. Conventional CD algorithms focus on finding communities from a single perspective (objective), such as structure. However, reliance on the single objective of structure makes an algorithm biased, in the sense that objects are well separated in terms of structure but weakly separated in terms of other objective functions (e.g., attributes). To overcome this issue, multi-objective community detection algorithms consider two objective functions and try to find a proper balance between them. In this paper we integrate the Harmony Search (HS) algorithm with the Pareto Envelope-Based Selection Algorithm 2 (PESA-II) to introduce a new multi-objective harmony-search-based community detection algorithm. The integration of PESA-II and HS helps to identify non-dominated individuals, and new harmony vectors are generated from those individuals during the improvisation steps. We experimentally evaluate the proposed algorithm and compare it against two other multi-objective evolutionary community detection algorithms in terms of structure (modularity) and attribute (homogeneity). The experimental results indicate that the proposed algorithm outperforms, or shows performance comparable to, the alternatives.
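A small sketch of the Pareto dominance test that PESA-II style selection relies on: a candidate community assignment is retained only if no other candidate is at least as good on both objectives (e.g., modularity and attribute homogeneity) and strictly better on one. The objective values below are hypothetical.

def dominates(a, b):
    """True if solution a dominates b, assuming both objectives are maximised."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated(population):
    """population: list of (modularity, homogeneity) tuples; returns the Pareto front."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q is not p)]

candidates = [(0.42, 0.61), (0.38, 0.70), (0.42, 0.55), (0.30, 0.30)]
print(non_dominated(candidates))  # [(0.42, 0.61), (0.38, 0.70)]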
Archive | 2016
Kok-Leong Ong; Daswin De Silva; Yee Ling Boo; Ee Hui Lim; Frank Bodi; Damminda Alahakoon; Simone Leao
Research to solve engineering and science problems commonly requires the collection and complex analysis of vast amounts of data. This makes such research a natural exemplar of big data applications. For example, data from weather stations, high resolution images from CT scans, and data captured by astronomical instruments all easily showcase one or more big data characteristics, i.e., volume, velocity, variety and veracity. These characteristics present computational and analytical challenges that need to be overcome in order to deliver engineering solutions or make scientific discoveries. In this chapter, we catalogue engineering and science problems that carry a big data angle. We also discuss the research advances for these problems and present a list of tools available to the practitioner. A number of big data application exemplars from the authors' past work are discussed in further depth, highlighting the association between each specific problem and its big data characteristics. The overview from these various perspectives provides the reader with an up-to-date audit of big data developments in engineering and science.
Computational Intelligence in Digital Forensics: Forensic Investigation and Applications | 2014
Yee Ling Boo; Damminda Alahakoon
Profiling is important in law enforcement, especially for understanding the behaviours of criminals as well as the characteristics of, and similarities between, crimes. It can provide insights to law enforcement officers when solving similar crimes and, more importantly, enable pre-crime action, that is, acting before crimes happen. A single case usually captures data from the crime scene, offenders and other sources, and can therefore be described as multi-modal in its data sources, which results in a complex data fusion problem. Traditional criminal profiling requires experienced and skilful crime analysts or psychologists to laboriously associate and fuse multi-modal crime data. With the ubiquitous use of digital data in crime and forensic records, law enforcement has also encountered the issue of big data. In addition, law enforcement professionals are always competing against time and facing constant pressure in solving crimes. Therefore, a computational approach is needed to reduce the time and effort spent on the laborious fusion process when profiling multi-modal crime data. Besides the demographics, physical characteristics and behaviours of criminals, a crime profile should also comprise crime statistics and trends. In fact, crime and criminal profiles are highly interrelated, and both are required to provide a holistic analysis. In this chapter, we propose fusing multiple sources of crime data to populate a holistic crime profile through the use of Growing Self Organising Maps (GSOM).
Intelligent Systems Design and Applications | 2011
Yee Ling Boo; Damminda Alahakoon
The human brain processes information in both unimodal and multimodal fashion, progressively capturing, accumulating, abstracting and seamlessly fusing information. The fusion of multimodal inputs allows a holistic understanding of a problem. The proliferation of technology has produced various sources of electronic data and continues to do so exponentially. Finding patterns in such multi-source and multimodal data can be compared to the multimodal and multidimensional information processing in the human brain. Such brain functionality can therefore be taken as inspiration for developing a methodology to explore multimodal and multi-source electronic data and identify multi-view patterns. In this paper, we first propose a brain-inspired conceptual model that allows exploration and identification of patterns at different levels of granularity, different types of hierarchies and different types of modalities. Secondly, we present a cluster-driven approach for implementing the proposed brain-inspired model; in particular, the Growing Self Organising Maps (GSOM) based cross-clustering approach is discussed. Furthermore, the acquisition of multi-view patterns with the cluster-driven implementation is demonstrated with experimental results.
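A highly simplified sketch of the growth mechanism behind the Growing Self Organising Map (GSOM) used in both GSOM-based works above: each node accumulates quantization error, and a node spawns new nodes once its error exceeds the growth threshold GT = -D * ln(SF), where D is the input dimensionality and SF the spread factor. This illustrates the mechanism only; it is not the authors' cross-clustering implementation, and the learning rate, data and growth step are simplified assumptions.

import numpy as np

def growth_threshold(dim, spread_factor):
    return -dim * np.log(spread_factor)

def winner(nodes, x):
    """Index of the best-matching node (smallest Euclidean distance)."""
    return int(np.argmin(np.linalg.norm(nodes - x, axis=1)))

dim, sf = 4, 0.5
gt = growth_threshold(dim, sf)
nodes = np.random.rand(4, dim)        # a small initial map
errors = np.zeros(len(nodes))

for x in np.random.rand(100, dim):    # stream of one modality's feature vectors
    b = winner(nodes, x)
    errors[b] += np.linalg.norm(nodes[b] - x)
    nodes[b] += 0.1 * (x - nodes[b])  # move the winning node toward the input
    if errors[b] > gt:
        # A full GSOM inserts new nodes at free neighbouring grid positions;
        # here we simply add a jittered copy of the saturated node and reset its error.
        nodes = np.vstack([nodes, nodes[b] + 0.01 * np.random.randn(dim)])
        errors = np.append(errors, 0.0)
        errors[b] = 0.0

In the cross-clustering setting, one such map would be grown per modality and the resulting cluster memberships compared across maps to surface multi-view patterns.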