Santi Wulan Purnami
Sepuluh Nopember Institute of Technology
Publication
Featured research published by Santi Wulan Purnami.
International Symposium on Information Technology | 2008
Santi Wulan Purnami; Santi Puteri Rahayu; Abdullah Embong
Support Vector Machines (SVM) are a relatively new data mining technique that has recently gained popularity in the machine learning community. This paper shows how the 1-norm SVM can be used for feature selection and the smooth SVM (SSVM) for classification. As a case study, breast cancer diagnosis was implemented. First, SVM-based feature selection was used to determine the important features. Then, SSVM was used to classify the state of the disease (benign or malignant). The results indicate that SVM can achieve state-of-the-art performance on feature selection and classification.
International Symposium on Information Technology | 2008
S. P. Rahayu; Santi Wulan Purnami; Abdullah Embong
Credit risk evaluation is an interesting and important data mining problem in the financial analysis domain. This problem requires estimable class probabilities as well as an accurate classification method. Kernel Logistic Regression is a classification method from the kernel-machine and data mining communities that allows nonlinear probabilistic classification, transparent reasoning, and competitive discriminative ability. It is a kernelized version of Logistic Regression, a well-known classification method in statistical learning. The parameters of the kernel model are given by the solution of a convex optimization problem, which can be found using the efficient Iteratively Re-weighted Least Squares (IRLS) algorithm. In this paper, we investigated the performance of Kernel Logistic Regression in classifying credit risk. The results demonstrated that Kernel Logistic Regression evaluates credit risk with good accuracy, comparable to that of another well-known kernel machine, the Support Vector Machine.
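The IRLS fit mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the ridge penalty `lam`, the absence of a bias term, the fixed iteration count, and all function names are assumptions made for illustration. Each iteration solves the Newton system (W K + lam I) alpha = W f + (y - p), where f = K alpha, p is the vector of predicted probabilities, and W = diag(p(1 - p)).

```python
import math

def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_klr(X, y, lam=0.1, gamma=1.0, n_iter=25):
    """Fit kernel logistic regression coefficients by IRLS (Newton steps)."""
    n = len(X)
    K = [[rbf_kernel(X[i], X[j], gamma) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(n_iter):
        f = [sum(K[i][j] * alpha[j] for j in range(n)) for i in range(n)]
        p = [sigmoid(fi) for fi in f]
        w = [pi * (1.0 - pi) for pi in p]
        # Newton / IRLS system: (W K + lam I) alpha_new = W f + (y - p)
        A = [[w[i] * K[i][j] + (lam if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        rhs = [w[i] * f[i] + (y[i] - p[i]) for i in range(n)]
        alpha = solve(A, rhs)
    return alpha

def predict_proba(x, X, alpha, gamma=1.0):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(sum(a * rbf_kernel(x, xi, gamma) for a, xi in zip(alpha, X)))

# Toy data (hypothetical): class 1 lies in the middle, class 0 on both sides,
# so no linear boundary separates the classes.
X = [[-3.0], [-2.0], [-0.5], [0.0], [0.5], [2.0], [3.0]]
y = [0, 0, 1, 1, 1, 0, 0]
alpha = fit_klr(X, y)
print(predict_proba([0.25], X, alpha), predict_proba([2.5], X, alpha))
```

Unlike a hard SVM decision, the output is a class probability, which is the property the abstract highlights for credit scoring.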
Networked Digital Technologies | 2010
Santi Wulan Purnami; Jasni Mohamad Zain; Abdullah Embong
Over the last decade, the use of data mining techniques in medical studies has grown steadily. The aim of this paper is to present recent research on the application of a data mining technique to medical diagnosis problems. The proposed technique is the Multiple Knot Spline Smooth Support Vector Machine (MKS-SSVM), a new SSVM that uses a multiple knot spline function to approximate the plus function instead of the integral of the sigmoid function used in SSVM. To evaluate the effectiveness of the method, we applied it to two medical datasets (diabetes and heart disease). Previously reported accuracies on these datasets have remained below 90%. The results of this study showed that MKS-SSVM was effective in diagnosing both diseases, a very promising result compared with previously reported results.
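For orientation, the plus function and the sigmoid-integral smoothing that standard SSVM uses in its place can be sketched as follows. The formula p(x, alpha) = x + (1/alpha) log(1 + exp(-alpha x)) is the usual SSVM smoothing. The multiple knot spline used by MKS-SSVM is not specified in the abstract and is not reproduced here. The function names and the default value of `alpha` are illustrative assumptions.

```python
import math

def plus(x):
    """Plus function (x)_+ = max(x, 0)."""
    return max(x, 0.0)

def smooth_plus(x, alpha=5.0):
    """Integral-of-sigmoid smoothing: x + (1/alpha) * log(1 + exp(-alpha * x)).

    Written in an algebraically equivalent form that avoids overflow
    for large |x|.
    """
    return max(x, 0.0) + math.log1p(math.exp(-alpha * abs(x))) / alpha

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  plus={plus(x):.4f}  smooth={smooth_plus(x):.4f}")
```

The smooth version is differentiable everywhere and approaches the plus function as `alpha` grows, which is what allows Newton-type solvers to be applied to the SVM objective.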
International Conference on Computational Science and Its Applications | 2010
Santi Wulan Purnami; Jasni Mohamad Zain; Abdullah Embong
In recent years, the use of intelligent methods in biomedical studies has grown steadily. In this paper, a novel method for diabetes diagnosis using a modified spline smooth support vector machine (MS-SSVM) is presented. To obtain optimal accuracy, we used the Uniform Design method for parameter selection. The performance of the method is evaluated using 10-fold cross-validation accuracy, the confusion matrix, sensitivity, and specificity. A comparison with the previous spline SSVM in diabetes diagnosis is also given. The classification accuracy obtained using 10-fold cross-validation is 96.58%. The results of this study showed that the modified spline SSVM was effective in detecting diabetes, a very promising result compared with previously reported results.
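The evaluation measures named in this abstract follow directly from the confusion matrix. The sketch below uses made-up labels purely for illustration. It does not reproduce the paper's data or its 96.58% figure, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def summarize(y_true, y_pred, positive=1):
    """Accuracy, sensitivity, and specificity.

    Assumes both classes occur in y_true; otherwise a denominator is zero.
    """
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),  # share of actual positives detected
        "specificity": tn / (tn + fp),  # share of actual negatives detected
    }

# Illustrative labels only (1 = disease present, 0 = absent).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
print(confusion_counts(y_true, y_pred))
print(summarize(y_true, y_pred))
```

In a 10-fold cross-validation setting, these quantities would typically be computed per fold and averaged, or computed once on the pooled out-of-fold predictions.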
International Conference on Software Engineering and Computer Systems | 2011
Santi Wulan Purnami; Jasni Mohamad Zain; Abdullah Embong
The smooth support vector machine (SSVM) is one of the promising algorithms for classification problems. However, it works well only on small to moderate datasets. Computational difficulties arise when SSVM is used with a nonlinear kernel on large datasets. Based on SSVM, the reduced support vector machine (RSVM) was proposed to overcome these difficulties by using a randomly selected subset of the data to obtain a nonlinear separating surface. In this paper, we propose an alternative algorithm, k-mode RSVM (KMO-RSVM), which combines RSVM with the k-mode clustering technique to handle classification problems on large categorical datasets. In our experiments, we tested the effectiveness of KMO-RSVM on four publicly available datasets. KMO-RSVM significantly reduces running time compared with SSVM while still achieving high accuracy. Compared with RSVM, KMO-RSVM is faster, yields a smaller reduced set, and achieves comparable testing accuracy.
International Conference on Machine Learning | 2017
Santi Wulan Purnami; Shofi Andari; Afifah W. Rusydiana
Feature selection has become one of the most interesting challenges in the analysis of high-dimensional microarray data. It addresses dimensionality reduction by identifying the important features needed to construct a good model, especially for classification. Many feature selection methods have been proposed, developed, and eventually adopted in common use. Many researchers report that some of these methods improve classification accuracy by dealing with redundant and irrelevant instances. The complicated associations among genes in microarray data tend to make the task more difficult, but removing less important features can improve accuracy. In this study, basic feature selection techniques were compared based on the performance of a support vector machine on binary classification problems. The experiments used high-dimensional microarray datasets whose dimensionality was first reduced using correlation-based feature selection and the fast correlation-based filter, and the results were evaluated by the classification accuracy of a standard support vector machine.
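The fast correlation-based filter ranks features by symmetrical uncertainty, SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), where H is entropy and I is mutual information. The following is a minimal sketch of that measure only. It assumes feature values have already been discretized, and it omits the relevance threshold and the redundancy-removal pass of the full filter. Function names are illustrative.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    hxy = entropy(list(zip(x, y)))  # joint entropy H(x, y)
    mutual_info = hx + hy - hxy
    return 2.0 * mutual_info / (hx + hy)

# Hypothetical discretized gene-expression features and a binary class label.
label = [0, 0, 1, 1]
gene_a = [0, 0, 1, 1]  # perfectly informative about the label
gene_b = [0, 1, 0, 1]  # independent of the label
print(symmetrical_uncertainty(gene_a, label))
print(symmetrical_uncertainty(gene_b, label))
```

A feature with SU close to 1 against the class label is highly relevant, while a high SU between two features signals redundancy.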
International Conference on Machine Learning | 2017
Santi Wulan Purnami; Rani Kemala Trapsilasiwi
Multiclass classification is still considered a significant hurdle in building an efficient classifier. The task becomes harder with imbalanced data, in which some classes contain many more instances than others. This condition can cause the classifier to predict the majority class and ignore the minority class. This study proposed the Synthetic Minority Oversampling Technique-Least Square Support Vector Machine (SMOTE-LSSVM) to build a classifier that addresses this problem. Particle Swarm Optimization-Gravitational Search Algorithm (PSO-GSA) was used to optimize the parameters of LS-SVM, while SMOTE was employed to balance the data. The effectiveness of SMOTE-LSSVM was examined on a breast cancer malignancy dataset. The results of this study showed that accuracy increased significantly after applying SMOTE compared with the results obtained without it.
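The oversampling step can be sketched as follows. This is a simplified illustration of the SMOTE idea: interpolating between a minority sample and one of its nearest minority neighbours. It is not the authors' pipeline. The function name, the value of `k`, the seed, and the toy data are assumptions, and the LS-SVM and PSO-GSA stages are not shown.

```python
import random

def smote(minority, n_synthetic, k=3, seed=0):
    """Generate synthetic minority samples by neighbour interpolation.

    minority: list of feature vectors (lists of floats), at least two of them.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + u * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Hypothetical two-feature minority class.
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote(minority, n_synthetic=6)
print(new_points)
```

Because every synthetic point lies on a segment between two real minority samples, the minority region is filled in rather than simply duplicated, which is what distinguishes SMOTE from random oversampling.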
IMT-GT International Conference on Mathematics, Statistics and Their Applications | 2017
Chusnul Khotimah; Santi Wulan Purnami; Dedy Dwi Prastyo; Virasakdi Chosuvivatwong; Hutcha Sriplung
Support Vector Machines (SVMs) have been widely applied for prediction in many fields. Recently, SVMs have also been developed for survival analysis. In this study, the Additive Survival Least Square SVM (A-SURLSSVM) approach is used to analyze a cervical cancer dataset, and its performance is compared with the Cox model as a benchmark. The comparison is based on the prognostic indices produced: concordance index (c-index), log rank, and hazard ratio. A higher prognostic index represents better performance of the corresponding method. This work also applied feature selection to choose important features using a backward elimination technique based on the c-index criterion. The cervical cancer dataset consists of 172 patients. The empirical results show that nine of the twelve features (age at marriage, age at first menstruation, age, parity, type of treatment, history of family planning, stage, duration of menstruation, and anemia status) are selected as relevant features that affect the survival time of cervical cancer patients. In addition, the performance of the proposed method is evaluated through a simulation study with different numbers of features and censoring percentages. Two of the three performance measures (c-index and hazard ratio) obtained from A-SURLSSVM are consistently better than those obtained from the Cox model on both the simulated and the cervical cancer data. Moreover, the simulation study showed that A-SURLSSVM performs better when the percentage of censored data is small.
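The concordance index used as a criterion above can be sketched in a few lines. This is a simplified Harrell-style version, assuming that a higher risk score means shorter expected survival. Pairs with tied survival times are skipped, and ties in risk count as half-concordant. The function name and the toy data are illustrative and are not taken from the study.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable pairs whose risk ordering matches survival.

    times:  observed times (event or censoring)
    events: 1 if the event was observed, 0 if censored
    risks:  model output, higher = expected to fail earlier
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical data: the third subject is censored at time 9.
times = [5, 3, 9, 1]
events = [1, 1, 0, 1]
risks = [2.0, 3.0, 1.0, 4.0]
print(concordance_index(times, events, risks))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why a higher c-index is read as better prognostic performance.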
IMT-GT International Conference on Mathematics, Statistics and Their Applications | 2017
Faroh Ladayya; Santi Wulan Purnami; Irhamah
DNA microarray data contain gene expression measurements with small sample sizes and a high number of features. Furthermore, class imbalance is a common problem in microarray data. It occurs when a dataset is dominated by a class that has significantly more instances than the minority classes. A classification method that can handle both high dimensionality and imbalance is therefore needed. Support Vector Machine (SVM) is a classification method capable of handling large or small samples, nonlinearity, high dimensionality, overfitting, and local minimum issues. SVM has been widely applied to DNA microarray data classification and has been shown to provide the best performance among machine learning methods. However, imbalanced data remain a problem because SVM treats all samples as equally important, so the results are biased against the minority class. To overcome the imbalance, Fuzzy SVM (FSVM) is proposed. This method applies a fuzzy membership to each input point and reformulates the SVM so that different input points make different contributions to the classifier. The minority classes are given large fuzzy memberships, so FSVM pays more attention to those samples. Because DNA microarray data are high dimensional with a very large number of features, feature selection is first performed using the Fast Correlation-Based Filter (FCBF). In this study, the data are analyzed with SVM and FSVM, each with and without FCBF, and the classification performance of the resulting models is compared. Based on the overall results, FSVM on the selected features has the best classification performance compared with SVM.
Journal of Physics: Conference Series | 2017
Riza Yuli Rusdiana; Ismaini Zain; Santi Wulan Purnami
Hurdle negative binomial regression is a method for discrete dependent variables with excess zeros and under- or overdispersion. It uses a two-part approach. The first part, the zero hurdle model, models the zero values of the dependent variable. The second part, the truncated negative binomial model, models the nonzero values (positive integers). In some cases the discrete dependent variable is censored for some values. The type of censoring studied in this research is right censoring. This study aims to obtain the parameter estimators and the test statistic of the hurdle negative binomial regression model for a right-censored dependent variable. The parameters are estimated using the Maximum Likelihood Estimator (MLE). The model is applied to the number of neonatal tetanus cases in Indonesia. The data are count data that contain zero values in some observations and a variety of other values. Based on the regression results, the factors that influence neonatal tetanus cases in Indonesia are the percentage of baby health care coverage and neonatal visits.
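The two-part structure and the right-censoring contribution can be made concrete with a small sketch of the log-likelihood. This is an illustration under stated assumptions, not the paper's estimator. The parameters `p_zero`, `mu`, and `alpha` are held constant here, whereas in a regression setting they would typically depend on covariates through link functions. The mean-dispersion parameterization of the negative binomial is also an assumption, and all function names are hypothetical.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial pmf with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def hurdle_nb_pmf(y, p_zero, mu, alpha):
    """Hurdle model: P(Y=0) = p_zero; positive counts follow a zero-truncated NB."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * nb_pmf(y, mu, alpha) / (1.0 - nb_pmf(0, mu, alpha))

def censored_loglik(ys, censored, p_zero, mu, alpha):
    """Log-likelihood where censored[i] marks y_i as right-censored (Y >= y_i)."""
    total = 0.0
    for y, is_censored in zip(ys, censored):
        if is_censored:
            # Only the tail probability P(Y >= y) is known.
            tail = 1.0 - sum(hurdle_nb_pmf(k, p_zero, mu, alpha) for k in range(y))
            total += math.log(tail)
        else:
            total += math.log(hurdle_nb_pmf(y, p_zero, mu, alpha))
    return total

# Hypothetical counts; the last observation is right-censored at 4.
ys = [0, 0, 1, 3, 4]
censored = [False, False, False, False, True]
print(censored_loglik(ys, censored, p_zero=0.3, mu=2.0, alpha=0.5))
```

Maximum likelihood estimation would maximize this function over the parameters, for example with a general-purpose numerical optimizer.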