Taufik Fuadi Abidin
North Dakota State University
Publications
Featured research published by Taufik Fuadi Abidin.
acm symposium on applied computing | 2006
Taufik Fuadi Abidin; William Perrizo
K-nearest neighbors (KNN) is the simplest method for classification. Given a set of objects in a multi-dimensional feature space, the method assigns a category to an unclassified object based on the plurality category among its k nearest neighbors. The closeness between objects is determined using a distance measure, e.g., Euclidean distance. Despite its simplicity, KNN has some drawbacks: 1) it suffers from expensive computational cost in training when the training set contains millions of objects; 2) its classification time is linear in the size of the training set: the larger the training set, the longer it takes to search for the k nearest neighbors. In this paper, we propose a new algorithm, called SMART-TV (Small Absolute difference of Total Variation), that approximates a set of candidate nearest neighbors by examining the absolute difference of total variation between each data object in the training set and the unclassified object. The k nearest neighbors are then searched for within that candidate set. We empirically evaluate the performance of our algorithm on both real and synthetic datasets and find that SMART-TV is fast and scalable. The classification accuracy of SMART-TV is high and comparable to that of the traditional KNN algorithm.
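The two-stage idea in the abstract, a cheap prefilter followed by exact k-NN restricted to the surviving candidates, can be sketched as below. Note the scalar summary used here (a simple feature sum) is an illustrative stand-in, not the paper's actual total-variation measure, and the function name is hypothetical.

```python
import numpy as np

def knn_with_candidate_filter(train_X, train_y, query, k=3, n_candidates=10):
    """Two-stage nearest-neighbor search: prefilter by a cheap scalar
    summary, then run exact k-NN only on the surviving candidates.
    (Illustrative stand-in for SMART-TV's total-variation filter.)"""
    # Cheap scalar summary per training object; SMART-TV uses total
    # variation, here we use the feature sum as a simple stand-in.
    summaries = train_X.sum(axis=1)
    q_summary = query.sum()
    # Keep the objects whose summary is closest to the query's
    # (small absolute difference of the summary statistic).
    candidates = np.argsort(np.abs(summaries - q_summary))[:n_candidates]
    # Exact Euclidean k-NN restricted to the candidate set.
    dists = np.linalg.norm(train_X[candidates] - query, axis=1)
    nearest = candidates[np.argsort(dists)[:k]]
    # Plurality vote among the k nearest neighbors.
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Because the exact distance computation only touches `n_candidates` objects, classification cost no longer grows with the full training-set size, which is the scalability gain the abstract describes.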
international conference on education technology and computer | 2010
Taufik Fuadi Abidin; Bustami Yusuf; Munzir Umran
Partitioning vast amounts of text documents is a challenging problem due to the high-dimensional representation of the documents. In this study, we investigate the quality of text document clustering when Singular Value Decomposition (SVD) is used to reduce the dimensionality of the documents. The results show that the quality of the clusters is comparable to that obtained without dimensionality reduction. In addition, the computational cost of clustering can be reduced significantly when the clustering is performed in the reduced space.
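The SVD-based reduction described above can be sketched as follows, assuming documents are the columns of a term-document matrix; the function name is illustrative, not from the paper.

```python
import numpy as np

def reduce_with_svd(term_doc, k):
    """Project a term-document matrix onto its top-k singular directions,
    so clustering can run on k-dimensional document vectors instead of
    vocabulary-sized ones."""
    # Thin SVD: term_doc ~= U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # Each column of diag(S[:k]) @ Vt[:k] is a k-dimensional document
    # representation; transpose so rows correspond to documents.
    return (np.diag(S[:k]) @ Vt[:k]).T
```

A clustering algorithm such as k-means would then operate on the returned (documents x k) matrix, which is where the computational savings come from.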
international conference on advanced computer science and information systems | 2015
Taufik Fuadi Abidin; Muhammad Subianto; T. A. Gani; Ridha Ferdhiana
Many tropical disease cases that occur in Indonesia are reported online in Indonesian news portals. Online news portals have become valuable sources of information because online news articles are updated frequently. A rule-based approach, combined with a machine learning algorithm, has been developed to identify the locations of the cases. In this paper, a complete pipeline to routinely search, crawl, clean, classify, extract, and integrate the extracted entities into Google Earth is presented. The pipeline starts by searching for Indonesian news articles using a set of selected queries and the Google Site Search API, and then crawling them. After the articles are crawled, they are cleaned and classified. The articles that discuss tropical disease cases (classified as positive) are further examined to extract the location of the incident and to determine the sentences containing the date of occurrence and the number of casualties. The extracted entities are then stored in a relational database and annotated in Keyhole Markup Language (KML) notation to create a geographic visualization in Google Earth. The evaluation shows that it takes approximately 6 minutes to search, crawl, clean, classify, extract, and annotate the extracted entities into KML notation for 5 Web articles, i.e., about 72.40 seconds to process a new page.
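The final annotation step, rendering an extracted record as KML for Google Earth, can be sketched as below; the field names and coordinates are hypothetical examples, not data from the paper.

```python
def to_kml_placemark(location, date, casualties, lat, lon):
    """Render one extracted disease-case record as a KML Placemark,
    the XML notation Google Earth reads for geographic visualization.
    Note: KML coordinates are ordered longitude,latitude,altitude."""
    return (
        "<Placemark>"
        f"<name>{location}</name>"
        f"<description>Date: {date}; casualties: {casualties}</description>"
        f"<Point><coordinates>{lon},{lat},0</coordinates></Point>"
        "</Placemark>"
    )
```

A full KML file would wrap a list of such placemarks in `<kml><Document>...</Document></kml>` elements before loading it into Google Earth.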
2016 International Conference on Informatics and Computing (ICIC) | 2016
Taufik Fuadi Abidin; Ridha Ferdhiana
In this paper, we examine an algorithm to update an n-gram word dictionary (thesaurus) and evaluate its effectiveness in a binary classification problem. The thesaurus is used as a reference to generate the numerical feature attributes of web pages. Generally, the n-gram word dictionary is built once from a set of training data and its content is never updated. Hence, the content is static and its coverage is limited to the n-grams found in the initial training set. In practice, the content of the thesaurus should be dynamic, especially because the dictionary is used repeatedly as a reference when generating the numerical feature attributes of web pages. We argue that a dynamic thesaurus is better than a static one in the long term; thus, the n-gram word dictionary should be updated regularly with new data without degrading classification accuracy. We validate our proposed algorithm using several test sets, each of which contains one hundred web pages, except for the last one. The experimental results show that our proposed algorithm works well. On average, the accuracy of the feature dataset generated using the existing (old) dictionary is 57.75%, while the accuracy of the feature dataset generated using the updated (new) dictionary is 76.75%; the proposed algorithm thus increases classification accuracy by about 32.90%.
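The dictionary-update idea can be sketched as follows. This is a minimal illustration of growing an n-gram dictionary from new documents while keeping existing entries (so previously generated feature positions stay stable); the exact update rules of the paper's algorithm are not specified here.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """All contiguous n-grams of a token list, joined as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def update_dictionary(dictionary, new_docs, n=2, min_count=1):
    """Extend an n-gram dictionary with n-grams seen in new documents,
    appending so that old entries keep their feature indices."""
    counts = Counter()
    for doc in new_docs:
        counts.update(ngrams(doc.split(), n))
    for gram, c in counts.items():
        if c >= min_count and gram not in dictionary:
            dictionary.append(gram)  # append only: old indices unchanged
    return dictionary

def features(doc, dictionary, n=2):
    """Binary feature vector over the dictionary for one document."""
    present = set(ngrams(doc.split(), n))
    return [1 if g in present else 0 for g in dictionary]
```

With an update like this, pages containing n-grams absent from the initial training set can still produce informative feature vectors, which is the coverage problem the abstract raises for a static dictionary.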
international conference on information technology systems and innovation | 2014
Taufik Fuadi Abidin; Rahmad Dimyathi; Ridha Ferdhiana
With the rapid maturation of Internet and web technology over the last decades, the number of Indonesian online news articles on the web is growing at an unprecedented pace. In this paper, we introduce a combined rule-based and machine learning approach to find sentences that contain tropical disease information, such as the incidence date and the number of casualties, and we measure its accuracy. Given a set of web pages on tropical disease topics, we first extract the sentences in the pages that match contextual and morphological patterns for a date and a casualty count using a rule-based algorithm. We then classify the sentences using a Support Vector Machine and collect those that contain tropical disease information. The results show that the proposed method works well and achieves good accuracy.
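The rule-based first stage can be sketched as below. The regular expressions are simplified hypothetical patterns (an Indonesian-style date and a casualty count), not the patterns from the paper, and the SVM second stage is only indicated in the comment.

```python
import re

# Hypothetical patterns: a date written with Indonesian month names,
# and a count followed by "korban" (casualties) or "orang" (people).
DATE_RE = re.compile(
    r"\b\d{1,2}\s+(?:Januari|Februari|Maret|April|Mei|Juni|Juli"
    r"|Agustus|September|Oktober|November|Desember)\s+\d{4}\b"
)
CASUALTY_RE = re.compile(r"\b(\d+)\s+(?:korban|orang)\b")

def candidate_sentences(text):
    """Rule-based first stage: keep sentences matching a date or
    casualty pattern. A classifier (an SVM in the paper) would then
    decide which candidates truly report a disease case."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s for s in sentences
            if DATE_RE.search(s) or CASUALTY_RE.search(s)]
```

Running the expensive classifier only on the rule-matched candidates keeps the overall extraction cheap while the patterns guarantee recall of date- and casualty-bearing sentences.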
international conference on data engineering | 2006
Taufik Fuadi Abidin; Aijuan Dong; Honglin Li; William Perrizo
Organizing digital images into semantic categories is imperative for effective browsing and retrieval. In large image collections, efficient algorithms are crucial to quickly categorize new images. In this paper, we study a nearest neighbor based algorithm for image classification from a different perspective. The proposed algorithm vertically decomposes image features into separate bit vectors, one for each bit position of the values in the features, and approximates a set of nearest-neighbor candidates by examining the absolute difference of total variation between the images in the repository and the unclassified image. Once the candidate set is obtained, the k nearest neighbors are searched for within that set. We use a combination of a global color histogram in HSV (6x3x3) color space and Gabor texture for the image features. Our experiments on the Corel dataset show that our algorithm is fast and scalable for image classification even when image repositories are very large. In addition, the classification accuracy is comparable to that of the classical KNN algorithm.
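The vertical decomposition step can be sketched as follows, assuming 8-bit integer feature values; the function names are illustrative, and the candidate-filtering logic that scans these bit slices is omitted.

```python
import numpy as np

def vertical_decompose(values, n_bits=8):
    """Decompose integer feature values into one bit vector per bit
    position (most significant first), the vertical layout the
    candidate filter scans instead of whole feature records."""
    values = np.asarray(values, dtype=np.uint8)
    # bit_slices[b][i] is bit b of values[i]
    return [((values >> (n_bits - 1 - b)) & 1) for b in range(n_bits)]

def reconstruct(bit_slices):
    """Inverse operation: recombine bit slices into the original values,
    demonstrating that the decomposition is lossless."""
    n_bits = len(bit_slices)
    out = np.zeros_like(bit_slices[0], dtype=np.uint8)
    for b, sl in enumerate(bit_slices):
        out |= sl.astype(np.uint8) << (n_bits - 1 - b)
    return out
```

Storing each bit position as its own vector lets aggregate statistics over a whole feature column be computed with bitwise operations on a few slices rather than by touching every record.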
computers and their applications | 2006
Ranapratap Syamala; Taufik Fuadi Abidin; William Perrizo
Archive | 2005
William Perrizo; Taufik Fuadi Abidin; Amal Shehan Perera; Masum Serazi
IASSE | 2005
Amal Shehan Perera; Taufik Fuadi Abidin; Masum Serazi; George Hamer; William Perrizo
Journal of Emerging Technologies in Web Intelligence | 2013
Taufik Fuadi Abidin; Ridha Ferdhiana; Hajjul Kamil