Bee Wah Yap
Universiti Teknologi MARA
Publication
Featured research published by Bee Wah Yap.
Business Strategy Series | 2012
Bee Wah Yap; T. Ramayah; Wan Nushazelin Wan Shahidan
Purpose – The purpose of this paper is to test some antecedents and outcomes of satisfaction in the banking sector in Malaysia. Design/methodology/approach – A research model based on the customer satisfaction index (CSI) was developed and tested using structural equation modeling with the partial least squares (PLS) approach. Data from 239 bank customer responses were analyzed. Findings – The results showed that satisfaction has a positive effect on trust and that this trust leads to loyalty to the bank. Good complaint handling by banks will also elevate satisfaction, trust and loyalty. Research limitations/implications – This work contributes to the existing literature on CSI models by introducing customer satisfaction as an antecedent of trust. The findings are limited by the constraints of the number of measures and the participation of banks in this study. Practical implications – Banks should focus on building credibility trust and benevolence trust with their customers. Commercial service i...
1st International Conference on Advanced Data and Information Engineering, DaEng 2013 | 2014
Bee Wah Yap; Khatijahhusna Abd Rani; Hezlin Aryani Abd Rahman; Simon Fong; Zuraida Khairudin; Nik Nairan Abdullah
Most classifiers work well when the class distribution in the response variable of the dataset is well balanced. Problems arise when the dataset is imbalanced. This paper applied four methods for handling imbalanced datasets: oversampling, undersampling, bagging and boosting. The cardiac surgery dataset has a binary response variable (1 = Died, 0 = Alive). The sample size is 4976 cases, with 4.2 % (Died) and 95.8 % (Alive) cases. CART, C5 and CHAID were chosen as the classifiers. In classification problems, the accuracy rate of the predictive model is not an appropriate measure when the data are imbalanced, because it is biased towards the majority class. Thus, the performance of the classifiers is measured using sensitivity and precision. Oversampling and undersampling are found to work well in improving the classification of the imbalanced dataset using decision trees. Meanwhile, boosting and bagging did not improve the decision tree performance.
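As an illustration of why accuracy misleads on imbalanced data, the following minimal Python sketch scores a trivial classifier that always predicts "Alive" and then applies random oversampling. The class counts (209 Died, 4767 Alive) are only an approximation derived from the percentages quoted above, and the function names are hypothetical; this is not the authors' code, and the decision tree classifiers themselves are not reproduced.

```python
import random

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn), treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def sensitivity(y_true, y_pred):
    """Share of true positives that were detected (0.0 if there are none)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0

def precision(y_true, y_pred):
    """Share of predicted positives that are correct (0.0 if none predicted)."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if (tp + fp) else 0.0

def random_oversample(labels, minority=1, seed=0):
    """Return row indices in which minority cases are resampled with
    replacement until both classes have the same count."""
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    majority_idx = [i for i, y in enumerate(labels) if y != minority]
    n_extra = len(majority_idx) - len(minority_idx)
    extra = [rng.choice(minority_idx) for _ in range(n_extra)]
    return majority_idx + minority_idx + extra

# Illustrative labels only: about 4.2 % of 4976 cases are "Died" (assumed counts).
n_died, n_alive = 209, 4767
y_true = [1] * n_died + [0] * n_alive
y_pred_all_alive = [0] * len(y_true)   # trivial majority-class classifier

balanced_idx = random_oversample(y_true)
balanced_labels = [y_true[i] for i in balanced_idx]
```

The always-"Alive" classifier reaches high accuracy while detecting no deaths at all, which is the bias the abstract describes; after oversampling, both classes contribute equally to training.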
Quality of Life Research | 2015
Zeinab Jannoo; Bee Wah Yap; Kamarul Imran Musa; Mohamad Alias Lazim; Mohamed Azmi Hassali
Purpose – The aim of this study was to evaluate and validate the ADDQoL and to assess the impact of diabetes on QoL among type 2 diabetes mellitus patients in Malaysia. Methods – The Malay and English versions of the ADDQoL questionnaire were administered to patients attending routine outpatient visits in three primary hospitals and a public clinic. The construct validity of the ADDQoL was assessed using confirmatory factor analysis (CFA). The sample comprised 350 Malay respondents who rated the ADDQoL Malay version and 246 non-Malay respondents (Chinese or Indian) who answered using the ADDQoL original English version. Results – CFA confirmed the presence of a one-factor structure for both samples. The internal consistency was high, with Cronbach’s alpha values of 0.945 and 0.907 for the ADDQoL Malay and English versions, respectively. Results showed that for all three ethnicities, the most important domain is ‘family life’. Overall, Malay patients stated that their ‘living conditions’ were the most negatively affected, while for Chinese and Indian patients, diabetes has the greatest impact on their ‘freedom to eat’. Conclusions – The ADDQoL was found to be culturally appropriate, valid and reliable among Malay- and English-speaking type 2 diabetes mellitus patients in Malaysia.
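For readers unfamiliar with the internal consistency measure reported above, here is a minimal Python sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the toy scores are hypothetical and are not ADDQoL data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, where each item is a list of
    scores with one entry per respondent (all items the same length)."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Sum of the sample variances of the individual items.
    item_var_sum = sum(variance(scores) for scores in items)
    # Sample variance of each respondent's total score across items.
    totals = [sum(resp) for resp in zip(*items)]
    total_var = variance(totals)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Toy data (hypothetical): two items answered by four respondents.
toy_items = [[1, 2, 3, 4],
             [2, 1, 4, 3]]
toy_alpha = cronbach_alpha(toy_items)
```

Values close to 1, such as those reported in the abstract, indicate that the items vary together and so measure a common construct.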
soft computing | 2016
Mohammad Nasir Abdullah; Bee Wah Yap; Yuslina Zakaria; Abu Bakar Abdul Majeed
Alzheimer’s disease (AD) is a neurodegenerative disorder characterized by gradual memory loss, impairment of cognitive functions and progressive disability. Previous studies have shown that the symptoms of AD are due to synaptic dysfunction and neuronal death in the area of the brain that performs memory consolidation. Thus, investigating deviations in various cellular metabolite linkages is crucial to advance our understanding of the early disease mechanism and to identify novel therapeutic targets. This study aims to identify small sets of metabolites that could be potential biomarkers of AD. Liquid chromatography/mass spectrometry-quadrupole time of flight (LC/MS-QTOF)-based metabolomics data were used to determine potential biomarkers. The metabolic profiling detected a total of 100 metabolites for 55 AD patients and 55 healthy controls. Random forest (RF), a supervised classification algorithm, was used to select the important features that might elucidate biomarkers of AD. A mean decrease in accuracy of 0.05 or higher indicates an important variable. Out of 100 metabolites, 10 were significantly modified, namely N-(2-hydroxyethyl) icosanamide, which had the highest Gini index, followed by X11-12-dihydroxy (arachidic) acid, N-(2-hydroxyethyl) palmitamide, phytosphingosine, dihydrosphingosine, deschlorobenzoyl indomethacin, XZN-2-hydroxyethyl (icos) 11-enamide, X1-hexadecanoyl (sn) glycerol, tryptophan and dihydroceramide C2.
Mathematical Problems in Engineering | 2015
Simon Fong; Robert P. Biuk-Aghai; Yain-Whar Si; Bee Wah Yap
A prime objective in constructing data stream mining models is to achieve good accuracy, fast learning, and robustness to noise. Although many techniques have been proposed in the past, efforts to improve the accuracy of classification models have been somewhat disparate. These techniques include, but are not limited to, feature selection, dimensionality reduction, and the removal of noise from training data. One limitation common to all of these techniques is the assumption that the full training dataset must be available. Although this has been effective for traditional batch training, it may not be practical for incremental classifier learning, also known as data stream mining, where only a single pass of the data stream is seen at a time. Because data streams are potentially unbounded, as in the so-called big data phenomenon, the data preprocessing time must be kept to a minimum. This paper introduces a new data preprocessing strategy suitable for the progressive purging of noisy data from the training dataset without the need to process the whole dataset at one time. This strategy is shown via a computer simulation to provide the significant benefit of allowing for the dynamic removal of bad records from the incremental classifier learning process.
international conference on artificial intelligence and applications | 2014
Simon Fong; Zhicong Luo; Bee Wah Yap; Suash Deb
Outlier detection is one of the most important data mining techniques. It has broad applications such as fraud detection, credit approval, computer network intrusion detection and anti-money laundering. The basis of outlier detection is to identify data points which are “different” or “far away” from the rest of the data points in the given dataset. Traditional outlier detection methods are based on statistical analysis. However, these methods have an inherent drawback: they require the availability of the entire dataset. In practice, especially in real-time data feed applications, it is not realistic to wait for all the data because fresh data are streaming in very quickly. Outlier detection is hence done in batches. However, two drawbacks may arise: processing time is relatively long because of the massive size, and the result may soon become outdated between successive updates. In this paper, we propose several novel incremental methods to process real-time data effectively for outlier detection. For the experiment, we test three types of mechanisms for analyzing the dataset, namely Global Analysis, Cumulative Analysis and Lightweight Analysis with Sliding Window. The experiment dataset is “household power consumption”, which is a popular benchmarking dataset for Massive Online Analysis.
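The abstract does not specify how the sliding-window mechanism scores a data point, so the following Python sketch is only one plausible reading: each new reading is compared with the mean and standard deviation of the most recent `window` readings and flagged when it lies more than `threshold` standard deviations away. The z-score rule, the parameter values and the sample readings are all assumptions for illustration, not the authors' method or data.

```python
from collections import deque
from statistics import mean, pstdev

def sliding_window_outliers(stream, window=5, threshold=3.0):
    """Return the positions of readings that deviate from the mean of the
    previous `window` readings by more than `threshold` standard deviations."""
    recent = deque(maxlen=window)   # only the last `window` readings are kept
    flagged = []
    for i, x in enumerate(stream):
        if len(recent) == window:   # wait until the window is full
            mu = mean(recent)
            sigma = pstdev(recent)
            # Skip the test when the window has no spread (sigma == 0).
            if sigma > 0 and abs(x - mu) > threshold * sigma:
                flagged.append(i)
        recent.append(x)            # the oldest reading drops out automatically
    return flagged

# Hypothetical readings with one obvious spike at position 6.
readings = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
spikes = sliding_window_outliers(readings)
```

Because only a fixed number of recent readings is held in memory, each point is scored in a single pass without waiting for the full dataset. One design caveat in this sketch is that a flagged value still enters the window and inflates the spread for the next few readings.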
2013 International Symposium on Computational and Business Intelligence | 2013
Simon Fong; Zhicong Luo; Bee Wah Yap
Classification is one of the most commonly used data mining methods; it makes predictions from a model built on known data. However, traditional classification requires acquiring the whole dataset before building a training model, which may consume a lot of time and resources. Another drawback of traditional classification is that it cannot process the dataset in a timely and efficient manner, especially for real-time data streams or big data. In this paper, we evaluate a lightweight method based on incremental learning algorithms for fast classification. We use this method to perform outlier detection via several popular incremental learning algorithms, such as Decision Table, Naïve Bayes, J48, VFI and KStar.
soft computing | 2016
Hezlin Aryani Abd Rahman; Bee Wah Yap
Imbalanced data in classification problems affect the performance of classifiers. In predictive analytics, logistic regression is a statistical technique which is often used as a benchmark when other classifiers, such as Naive Bayes, decision tree, artificial neural network and support vector machine, are applied to a classification problem. This study investigates the effect of the imbalance ratio in the response variable on the parameter estimates of binary logistic regression via a simulation study. Datasets were simulated with controlled percentages of imbalance ratio (IR), from 1 % to 50 %, and for various sample sizes. The simulated datasets were then modeled using binary logistic regression. The bias in the estimates was measured using MSE (Mean Square Error). The simulation results provided evidence that the imbalance ratio affects the parameter estimates, where severe imbalance (IR = 1 %, 2 %, 5 %) has higher MSE. Additionally, the effects of high imbalance (IR ≤ 5 %) are more severe when the sample size is small (n = 100 and n = 500). Further investigation using real datasets from the UCI repository (Bupa Liver, n = 345, and Diabetes Messidor, n = 1151) confirmed the effect of the imbalance ratio on the parameter estimates and the odds ratios, which can lead to misleading results.
Archive | 2016
Simon Fong; Charlie Fang; Neal Tian; Raymond K. Wong; Bee Wah Yap
Big Data is being touted as the next big thing, raising technical challenges that confront both academic research communities and commercial IT deployment. The root sources of Big Data are founded on infinite data streams and the curse of dimensionality. It is generally known that data sourced from data streams accumulate continuously, making traditional batch-based model induction algorithms infeasible for real-time data mining. In the past, many methods have been proposed for incremental data mining by modifying classical machine learning algorithms, such as the artificial neural network. In this paper we propose an incremental learning process for supervised learning with parameter optimization by neural network over a data stream. The process is coupled with a parameter optimization module which searches for the best combination of input parameter values based on a given segment of the data stream. The drawback of the optimization is its heavy consumption of time. To relieve this limitation, a loss function is proposed to look ahead for the occurrence of concept drift, which is one of the main causes of performance deterioration in data mining models. Optimization is skipped intermittently along the way so as to save computation costs. A computer simulation is conducted to confirm the merits of this incremental optimization process for neural networks.
THE 2ND ISM INTERNATIONAL STATISTICAL CONFERENCE 2014 (ISM-II): Empowering the Applications of Statistical and Mathematical Sciences | 2015
Bee Wah Yap; Zeinab Jannoo; Nornadiah Mohd Razali; Nor Azura Md. Ghani; Mohamad Alias Lazim
The Short Form 36 (SF-36) is one of the most widely used generic health status measures. This study used the SF-36 Health Survey instrument to investigate the functional health and well-being of Malay Type 2 Diabetes Mellitus patients in Malaysia. The survey was carried out in three local hospitals in Selangor. The questionnaires were both self-administered and interviewer-administered. A total of 354 questionnaires were returned, but only the 295 questionnaires with no missing data were analyzed. Confirmatory Factor Analysis (CFA) was used to confirm the first-order and third-order CFA models. The higher-order analyses included third-order CFA models with two second-order factors (physical and mental components) and with three second-order factors (physical, general well-being and mental health); both showed satisfactory model fit indices. This study confirmed the multidimensional factor structure of the SF-36.