Taufik Fuadi Abidin
North Dakota State University
Publications
Featured research published by Taufik Fuadi Abidin.
acm symposium on applied computing | 2006
Taufik Fuadi Abidin; William Perrizo
K-nearest neighbors (KNN) is the simplest method for classification. Given a set of objects in a multi-dimensional feature space, the method assigns a category to an unclassified object based on the plurality category among its k nearest neighbors. The closeness between objects is determined using a distance measure, e.g., Euclidean distance. Despite its simplicity, KNN has some drawbacks: 1) it suffers from expensive computational cost in training when the training set contains millions of objects; 2) its classification time is linear in the size of the training set: the larger the training set, the longer it takes to search for the k nearest neighbors. In this paper, we propose a new algorithm, called SMART-TV (Small Absolute difference of Total Variation), that approximates a set of candidate nearest neighbors by examining the absolute difference of total variation between each data object in the training set and the unclassified object. The k nearest neighbors are then searched for within that candidate set. We empirically evaluate the performance of our algorithm on both real and synthetic datasets and find that SMART-TV is fast and scalable. The classification accuracy of SMART-TV is high and comparable to that of the traditional KNN algorithm.
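The two-stage idea in the abstract, a cheap prefilter followed by exact k-NN restricted to the surviving candidates, can be sketched as below. Note the scalar summary used here (a simple feature sum) is an illustrative stand-in, not the paper's actual total-variation measure, and the function name is hypothetical.

```python
import numpy as np

def knn_with_candidate_filter(train_X, train_y, query, k=3, n_candidates=10):
    """Two-stage nearest-neighbor search: prefilter by a cheap scalar
    summary, then run exact k-NN only on the surviving candidates.
    (Illustrative stand-in for SMART-TV's total-variation filter.)"""
    # Cheap scalar summary per training object; SMART-TV uses total
    # variation, here we use the feature sum as a simple stand-in.
    summaries = train_X.sum(axis=1)
    q_summary = query.sum()
    # Keep the objects whose summary is closest to the query's
    # (small absolute difference of the summary statistic).
    candidates = np.argsort(np.abs(summaries - q_summary))[:n_candidates]
    # Exact Euclidean k-NN restricted to the candidate set.
    dists = np.linalg.norm(train_X[candidates] - query, axis=1)
    nearest = candidates[np.argsort(dists)[:k]]
    # Plurality vote among the k nearest neighbors.
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Because the exact distance computation only touches `n_candidates` objects, classification cost no longer grows with the full training-set size, which is the scalability gain the abstract describes.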
international conference on education technology and computer | 2010
Taufik Fuadi Abidin; Bustami Yusuf; Munzir Umran
Partitioning vast amounts of text documents is a challenging problem due to the high-dimensional representation of the documents. In this study, we investigate the quality of text document clustering when Singular Value Decomposition (SVD) is used to reduce the dimensionality of the documents. The results show that the quality of the clusters is comparable to that obtained without dimensionality reduction. In addition, the computational cost of clustering can be reduced significantly when the clustering is performed in the reduced space.
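The SVD-based reduction described above can be sketched as follows, assuming documents are the columns of a term-document matrix; the function name is illustrative, not from the paper.

```python
import numpy as np

def reduce_with_svd(term_doc, k):
    """Project a term-document matrix onto its top-k singular directions,
    so clustering can run on k-dimensional document vectors instead of
    vocabulary-sized ones."""
    # Thin SVD: term_doc ~= U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Each column of diag(S[:k]) @ Vt[:k] is a k-dimensional document
    # representation; transpose so rows correspond to documents.
    return (np.diag(S[:k]) @ Vt[:k]).T
```

A clustering algorithm such as k-means would then operate on the returned (documents x k) matrix, which is where the computational savings come from.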
international conference on advanced computer science and information systems | 2015
Taufik Fuadi Abidin; Muhammad Subianto; T. A. Gani; Ridha Ferdhiana
Many tropical disease cases that occur in Indonesia are reported online in Indonesian news portals. Online news portals have become valuable sources of information because online news articles are updated frequently. A rule-based approach, combined with a machine learning algorithm, has been developed to identify the locations of the cases. In this paper, a complete pipeline to routinely search, crawl, clean, classify, extract, and integrate the extracted entities into Google Earth is presented. The pipeline starts by searching for Indonesian news articles using a set of selected queries and the Google Site Search API, and then crawling them. After the articles are crawled, they are cleaned and classified. The articles that discuss tropical disease cases (classified as positive) are further examined to extract the location of the incident and to determine the sentences containing the date of occurrence and the number of casualties. The extracted entities are then stored in a relational database and annotated in Keyhole Markup Language (KML) notation to create a geographic visualization in Google Earth. The evaluation shows that it takes approximately 6 minutes to search, crawl, clean, classify, extract, and annotate the extracted entities into KML notation for 5 Web articles, i.e., about 72.40 seconds to process a new page.
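The final annotation step, rendering an extracted record as KML for Google Earth, can be sketched as below; the field names and coordinates are hypothetical examples, not data from the paper.

```python
def to_kml_placemark(location, date, casualties, lat, lon):
    """Render one extracted disease-case record as a KML Placemark,
    the XML notation Google Earth reads for geographic visualization.
    Note: KML coordinates are ordered longitude,latitude,altitude."""
    return (
        "<Placemark>"
        f"<name>{location}</name>"
        f"<description>Date: {date}; casualties: {casualties}</description>"
        f"<Point><coordinates>{lon},{lat},0</coordinates></Point>"
        "</Placemark>"
    )
```

A full KML file would wrap a list of such placemarks in `<kml><Document>...</Document></kml>` elements before loading it into Google Earth.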
2016 International Conference on Informatics and Computing (ICIC) | 2016
Taufik Fuadi Abidin; Ridha Ferdhiana
In this paper, we examine an algorithm to update an n-gram word dictionary (thesaurus) and evaluate its effectiveness in a binary classification problem. The thesaurus is used as a reference to generate the numerical feature attributes of web pages. Generally, the n-gram word dictionary is built once from a set of training data and its content is never updated. Hence, the content is static and its coverage is limited to the n-grams found in the initial training set. In practice, the content of the thesaurus should be dynamic, especially because the dictionary is used repeatedly as a reference when generating the numerical feature attributes of web pages. We argue that a dynamic thesaurus is better than a static one in the long term; thus, the n-gram word dictionary should be updated regularly with new data without degrading classification accuracy. We validate our proposed algorithm using several test sets, each of which contains one hundred web pages, except for the last one. The experimental results show that our proposed algorithm works well. On average, the accuracy of the feature dataset generated using the existing (old) dictionary is 57.75%, while the accuracy of the feature dataset generated using the updated (new) dictionary is 76.75%; the proposed algorithm thus increases classification accuracy by about 32.90%.
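The dictionary-update idea can be sketched as follows. This is a minimal illustration of growing an n-gram dictionary from new documents while keeping existing entries (so previously generated feature positions stay stable); the exact update rules of the paper's algorithm are not specified here.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """All contiguous n-grams of a token list, joined as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def update_dictionary(dictionary, new_docs, n=2, min_count=1):
    """Extend an n-gram dictionary with n-grams seen in new documents,
    appending so that old entries keep their feature indices."""
    counts = Counter()
    for doc in new_docs:
        counts.update(ngrams(doc.split(), n))
    for gram, c in counts.items():
        if c >= min_count and gram not in dictionary:
            dictionary.append(gram)  # append only: old indices unchanged
    return dictionary

def features(doc, dictionary, n=2):
    """Binary feature vector over the dictionary for one document."""
    present = set(ngrams(doc.split(), n))
    return [1 if g in present else 0 for g in dictionary]
```

With an update like this, pages containing n-grams absent from the initial training set can still produce informative feature vectors, which is the coverage problem the abstract raises for a static dictionary.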
international conference on information technology systems and innovation | 2014
Taufik Fuadi Abidin; Rahmad Dimyathi; Ridha Ferdhiana
With the rapid maturation of Internet and web technology over the last decades, the number of Indonesian online news articles on the web is growing at an unprecedented pace. In this paper, we introduce a combined rule-based and machine learning approach to find sentences that contain tropical disease information, such as the incidence date and the number of casualties, and we measure its accuracy. Given a set of web pages on tropical disease topics, we first extract the sentences in the pages that match contextual and morphological patterns for a date and a casualty count using a rule-based algorithm. We then classify the sentences using a Support Vector Machine and collect those that contain tropical disease information. The results show that the proposed method works well and achieves good accuracy.
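The rule-based first stage can be sketched as below. The regular expressions are simplified hypothetical patterns (an Indonesian-style date and a casualty count), not the patterns from the paper, and the SVM second stage is only indicated in the comment.

```python
import re

# Hypothetical patterns: a date written with Indonesian month names,
# and a count followed by "korban" (casualties) or "orang" (people).
DATE_RE = re.compile(
    r"\b\d{1,2}\s+(?:Januari|Februari|Maret|April|Mei|Juni|Juli"
    r"|Agustus|September|Oktober|November|Desember)\s+\d{4}\b"
)
CASUALTY_RE = re.compile(r"\b(\d+)\s+(?:korban|orang)\b")

def candidate_sentences(text):
    """Rule-based first stage: keep sentences matching a date or
    casualty pattern. A classifier (an SVM in the paper) would then
    decide which candidates truly report a disease case."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences
            if DATE_RE.search(s) or CASUALTY_RE.search(s)]
```

Running the expensive classifier only on the rule-matched candidates keeps the overall extraction cheap while the patterns guarantee recall of date- and casualty-bearing sentences.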
international conference on data engineering | 2006
Taufik Fuadi Abidin; Aijuan Dong; Honglin Li; William Perrizo
Organizing digital images into semantic categories is imperative for effective browsing and retrieval. In large image collections, efficient algorithms are crucial to quickly categorize new images. In this paper, we study a nearest neighbor based algorithm for image classification from a different perspective. The proposed algorithm vertically decomposes image features into separate bit vectors, one for each bit position of the values in the features, and approximates a set of nearest-neighbor candidates by examining the absolute difference of total variation between the images in the repository and the unclassified image. Once the candidate set is obtained, the k nearest neighbors are searched for within that set. We use a combination of a global color histogram in HSV (6x3x3) color space and Gabor texture for the image features. Our experiments on the Corel dataset show that our algorithm is fast and scalable for image classification even when image repositories are very large. In addition, the classification accuracy is comparable to that of the classical KNN algorithm.
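The vertical decomposition step can be sketched as follows, assuming 8-bit integer feature values; the function names are illustrative, and the candidate-filtering logic that scans these bit slices is omitted.

```python
import numpy as np

def vertical_decompose(values, n_bits=8):
    """Decompose integer feature values into one bit vector per bit
    position (most significant first), the vertical layout the
    candidate filter scans instead of whole feature records."""
    values = np.asarray(values, dtype=np.uint8)
    # bit_slices[b][i] is bit b of values[i]
    return [((values >> (n_bits - 1 - b)) & 1) for b in range(n_bits)]

def reconstruct(bit_slices):
    """Inverse operation: recombine bit slices into the original values,
    demonstrating that the decomposition is lossless."""
    n_bits = len(bit_slices)
    out = np.zeros_like(bit_slices[0], dtype=np.uint8)
    for b, sl in enumerate(bit_slices):
        out |= sl.astype(np.uint8) << (n_bits - 1 - b)
    return out
```

Storing each bit position as its own vector lets aggregate statistics over a whole feature column be computed with bitwise operations on a few slices rather than by touching every record.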
computers and their applications | 2006
Ranapratap Syamala; Taufik Fuadi Abidin; William Perrizo
Archive | 2005
William Perrizo; Taufik Fuadi Abidin; Amal Shehan Perera; Masum Serazi
IASSE | 2005
Amal Shehan Perera; Taufik Fuadi Abidin; Masum Serazi; George Hamer; William Perrizo
Journal of Emerging Technologies in Web Intelligence | 2013
Taufik Fuadi Abidin; Ridha Ferdhiana; Hajjul Kamil