Arnulfo P. Azcarraga | Researchain

Publication

Featured research published by Arnulfo P. Azcarraga.

systems man and cybernetics | 2005

Extracting salient dimensions for automatic SOM labeling

Arnulfo P. Azcarraga; Ming-Huei Hsieh; Shan Ling Pan; Rudy Setiono

Learning in self-organizing maps (SOMs) is considered unsupervised because training patterns do not need accompanying desired output information. Prior to its use in some real-world applications, however, a trained SOM often has to be labeled. This labeling phase is usually supervised in that labeled patterns need accompanying output information. Because such labeled patterns are not always available or may not even be possible to construct, the supervised nature of the labeling phase restricts the deployment of SOMs across a wide range of potential application domains. This work proposes a methodical and automatic SOM labeling procedure that does not require a set of prelabeled patterns. Instead, nodes in the trained map are clustered and the subsets of training patterns associated with each of the clustered nodes are identified. Salient dimensions per node cluster, which constitute the basis for labeling each node in the map, are then identified. The effectiveness of the method is demonstrated on a SOM-based international market segmentation study.

IEEE Transactions on Knowledge and Data Engineering | 2004

Evaluating keyword selection methods for WEBSOM text archives

Arnulfo P. Azcarraga; Teddy N. Yap Jr.; J. Tan; Tat-Seng Chua

The WEBSOM methodology, proven effective for building very large text archives, includes a method that extracts labels for each document cluster assigned to nodes in the map. However, the WEBSOM method needs to retrieve all the words of all the documents associated with each node. Since maps may have more than 100,000 nodes and the archive may contain up to seven million documents, the WEBSOM methodology needs a faster alternative method for keyword selection. Presented here is such an alternative method, which is able to quickly deduce meaningful labels per node in the map. It does this just by analyzing the relative weight distribution of the SOM weight vectors and by taking advantage of some characteristics of the random projection method used in dimensionality reduction. The effectiveness of this technique is demonstrated on news document collections.

Journal of the Operational Research Society | 2005

Automatic knowledge extraction from survey data: learning M-of-N constructs using a hybrid approach

Rudy Setiono; Shan Ling Pan; Ming-Huei Hsieh; Arnulfo P. Azcarraga

Data collected from a survey typically consist of attributes that are mostly if not completely binary-valued or binary-encoded. We present a method for handling such data where the underlying data analysis can be cast as a classification problem. We propose a hybrid method that combines neural network and decision tree methods. The network is trained to remove irrelevant data attributes and the decision tree is applied to extract comprehensible classification rules from the trained network. The conditions of the rules are in the form of a conjunction of M-of-N constructs. An M-of-N construct is a rule condition that is satisfied if (at least, exactly, at most) M of the N binary attributes in the construct are present. The effectiveness of the method is illustrated on data collected for a study of global car market segmentation. The results show that besides achieving high predictive accuracy, the method also allows meaningful interpretation of the relationships among the data variables.
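The M-of-N construct defined in the abstract above can be illustrated with a minimal Python sketch. The function names (`m_of_n`, `rule_fires`), the `mode` strings, and the sample attribute vector are hypothetical and chosen only for illustration; this is not the authors' rule extraction code, only an evaluation of a rule of the form the abstract describes.

```python
from typing import Sequence, Tuple

def m_of_n(values: Sequence[int], indices: Sequence[int], m: int,
           mode: str = "at_least") -> bool:
    """Evaluate one M-of-N construct over binary attributes.

    `indices` selects the N attributes in the construct; the construct is
    satisfied if at least / exactly / at most `m` of them equal 1.
    """
    present = sum(1 for i in indices if values[i] == 1)
    if mode == "at_least":
        return present >= m
    if mode == "exactly":
        return present == m
    if mode == "at_most":
        return present <= m
    raise ValueError(f"unknown mode: {mode}")

def rule_fires(values: Sequence[int],
               constructs: Sequence[Tuple[Sequence[int], int, str]]) -> bool:
    """A rule condition is a conjunction of M-of-N constructs."""
    return all(m_of_n(values, idx, m, mode) for idx, m, mode in constructs)

# Hypothetical binary-encoded survey response with five attributes.
respondent = [1, 0, 1, 1, 0]

# "At least 2 of attributes {0, 1, 2}" AND "exactly 1 of attributes {3, 4}".
condition = [([0, 1, 2], 2, "at_least"), ([3, 4], 1, "exactly")]
fires = rule_fires(respondent, condition)
```

A conjunction is used here because the abstract states that rule conditions take the form of a conjunction of M-of-N constructs.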

international symposium on neural networks | 2012

Keyword extraction using backpropagation neural networks and rule extraction

Arnulfo P. Azcarraga; Michael David Liu; Rudy Setiono

Keyword extraction is vital for knowledge management systems, information retrieval systems, and digital libraries, as well as for general browsing of the web. Keywords are often the basis of document processing methods such as clustering and retrieval, since processing all the words in a document can be slow. Keyword extraction is commonly automated with statistics-based methods such as Bayesian, K-Nearest Neighbor, and Expectation-Maximization models. These models are limited in the word-related features they can use, since adding more features makes the models more complex and difficult to comprehend. In this research, a neural network, specifically a backpropagation network, is used to generalize the relationship between the title and the content of articles in the archive, using word features other than TF-IDF, such as the position of a word in the sentence, paragraph, or entire document, formats such as headings, and other attributes defined beforehand. To explain how the backpropagation network works, a rule extraction method is used to extract symbolic rules from the trained network. The extracted rules can then be transformed into decision trees that perform almost as accurately as the network, with the added benefit of being in an easily comprehensible format.

International Journal on Artificial Intelligence Tools | 2002

COMPARING KEYWORD EXTRACTION TECHNIQUES FOR WEBSOM TEXT ARCHIVES

Arnulfo P. Azcarraga; Teddy N. Yap; Tat-Seng Chua

The WEBSOM methodology for building very large text archives has a very slow method for extracting meaningful unit labels. This is because the method computes the relative frequencies of all the words of all the documents associated with each unit and then compares these to the relative frequencies of all the words of other units in the map. Since maps may have more than 100,000 units and the archive may contain up to 7 million documents, the existing WEBSOM method is not practical. A fast alternative method, referred to as the liGHtSOM method, is based on the distribution of weights in the weight vectors of the trained map, plus a simple manipulation of the random projection matrix used for input data compression. Comparisons made using a WEBSOM archive of the Reuters text collection reveal that a high percentage of keywords extracted using this method match the keywords extracted for the same map units using the original WEBSOM method. A detailed time complexity analysis of the two methods is also provided.

knowledge discovery and data mining | 2008

Improved SOM Labeling Methodology for Data Mining Applications

Arnulfo P. Azcarraga; Ming-Huei Hsieh; Shan Ling Pan; Rudy Setiono

Self-Organizing Maps (SOMs) have been useful in gaining insights about the information content of large volumes of data in various data mining applications. As a special form of neural network, they have been attractive as a data mining tool because they are able to extract information from data even with very little user intervention. However, although learning in self-organizing maps is considered unsupervised because training patterns do not need desired output information to be supplied by the user, a trained SOM often has to be labeled prior to use in many real-world applications. Unfortunately, this labeling phase is usually supervised, as patterns need accompanying output information that has to be supplied by the user. Because labeled patterns are not always available or may not even be possible to construct, the supervised nature of the labeling phase restricts the deployment of SOMs to a wider range of potential data mining applications. This work proposes a methodical and semi-automatic SOM labeling procedure that does not require a set of labeled patterns. Instead, nodes in the trained map are clustered and the subsets of training patterns associated with each of the clustered nodes are identified. Salient dimensions per node cluster, which constitute the basis for labeling each node in the map, are then identified. The effectiveness of the method is demonstrated on a data mining application involving customer profiling based on an international market segmentation study.

workshop on self organizing maps | 2011

Design of a structured 3D SOM as a music archive

Arnulfo P. Azcarraga; Sean Manalili

A structured 3D SOM is an extension of a Self-Organizing Map from 2D to 3D where a structure has been built into the design of the 3D map. The 3D SOM is a 3×3×3 cube, with a distinct core cube in the center and 26 exterior cubes around the center. The structured SOM mainly uses the 8 corner cubes among the 26 exterior cubes. To build a music archive, the SOM learning algorithm is modified into a four-phase learning and labeling procedure. The first phase is meant only to position the music files in their general locations within the core cube. The second phase is meant to position the music files in their respective corner cubes. The third phase is meant to do a fine adjustment of the weight vectors in the core cube. The fourth phase is the labeling of the map and the association of music files to specific nodes in the map. Through the embedded structure of the 3D SOM, a precise measure is developed to assess the quality of the resulting trained SOM (in this case, the music archive), as well as the quality of the different categories/genres of music albums, based on a novel measure of the attraction index and the fidelity of music files to their respective music genres.
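The cube layout described in the abstract above (one core cube, 26 exterior cubes, 8 of them corners) can be checked with a short Python sketch. The coordinate scheme, with each sub-cube indexed by a triple in {0, 1, 2}, is an assumption made here for illustration and is not taken from the paper.

```python
from itertools import product

# Assumed indexing: each sub-cube of the 3x3x3 map is a triple (x, y, z)
# with coordinates in {0, 1, 2}; the core cube sits at the center (1, 1, 1).
positions = list(product(range(3), repeat=3))
core = (1, 1, 1)

# Every sub-cube other than the core is an exterior cube.
exterior = [p for p in positions if p != core]

# A corner cube has every coordinate at an extreme (0 or 2).
corners = [p for p in exterior if all(c in (0, 2) for c in p)]

print(len(positions), len(exterior), len(corners))
```

Under this indexing the counts agree with the abstract: 27 sub-cubes in total, 26 exterior cubes, and 8 corner cubes.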

Neurocomputing | 2016

Neural network training and rule extraction with augmented discretized input

Yoichi Hayashi; Rudy Setiono; Arnulfo P. Azcarraga

The classification and prediction accuracy of neural networks can be improved when they are trained with discretized continuous attributes as additional inputs. Such input augmentation makes it easier for the network weights to form more accurate decision boundaries when the data samples of different classes in the data set are contained in distinct hyper-rectangular subregions in the original input space. In this paper, we first present how a neural network can be trained with augmented discretized inputs. The additional inputs are obtained by dividing the original interval of each continuous attribute into subintervals of equal length. The network is then pruned to remove most of the discretized inputs as well as the original continuous attributes, as long as the network still achieves a minimum preset accuracy requirement. We then discuss how comprehensible classification rules can be extracted from the pruned network by analyzing the activations of the network hidden units and the weights of the network connections that remain in the pruned network. Our experiments on artificial data sets show that the rules extracted from the neural networks can perfectly replicate the class membership rules used to create the data. On real-life benchmark data sets, neural networks trained with augmented discretized inputs are shown to achieve better accuracy than neural networks trained with the original data.
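The input augmentation step described above, in which each continuous attribute's interval is divided into subintervals of equal length, might look like the following Python sketch. The one-hot encoding of the subinterval, the function names, and the sample values are assumptions made for illustration; the abstract does not state how the paper encodes the discretized inputs.

```python
from typing import List, Sequence, Tuple

def discretize_equal_width(x: float, lo: float, hi: float, k: int) -> List[int]:
    """One-hot vector marking which of k equal-length subintervals of
    [lo, hi] contains x (the one-hot encoding is an assumption)."""
    if not lo < hi or k < 1:
        raise ValueError("need lo < hi and k >= 1")
    width = (hi - lo) / k
    idx = int((x - lo) / width)
    idx = min(max(idx, 0), k - 1)  # clamp so that x == hi falls in the last bin
    return [1 if i == idx else 0 for i in range(k)]

def augment(sample: Sequence[float],
            ranges: Sequence[Tuple[float, float]], k: int) -> List[float]:
    """Keep the original continuous attributes and append their
    discretized versions as additional inputs."""
    augmented: List[float] = list(sample)
    for x, (lo, hi) in zip(sample, ranges):
        augmented.extend(discretize_equal_width(x, lo, hi, k))
    return augmented

# Hypothetical sample with two continuous attributes and their value ranges.
row = augment([0.1, 7.5], [(0.0, 1.0), (0.0, 10.0)], k=2)
```

The original attributes are kept alongside the discretized ones because the abstract describes the discretized values as additional inputs, with pruning later deciding which inputs survive.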

Journal of the Operational Research Society | 2006

Knowledge acquisition and revision using neural networks: an application to a cross-national study of brand image perception

Rudy Setiono; Shan Ling Pan; Ming-Huei Hsieh; Arnulfo P. Azcarraga

A three-tier knowledge management approach is proposed in the context of a cross-national study of car brand and corporate image perceptions. The approach consists of knowledge acquisition, transfer and revision using neural networks. We investigate how knowledge acquired by a neural network from one car market can be exploited and applied in another market. This transferred knowledge is subsequently revised for application in the new market. Knowledge revision is achieved by re-training the neural network. Core knowledge common to both markets is retained while some localized knowledge components are introduced during network re-training. Since the knowledge acquired by a neural network can be expressed as an accurate set of simple rules, we are able to compare the knowledge extracted from one network with the knowledge extracted from another. Comparison of the originally acquired knowledge with the revised knowledge provides us with insights into the commonalities and differences in car brand and corporate perceptions across national markets.

International Journal on Artificial Intelligence Tools | 2002

GENERATING CONCISE SETS OF LINEAR REGRESSION RULES FROM ARTIFICIAL NEURAL NETWORKS

Rudy Setiono; Arnulfo P. Azcarraga

Neural networks with a single hidden layer are known to be universal function approximators. However, due to the complexity of the network topology and the nonlinear transfer function used in computing the hidden unit activations, the predictions of a trained network are difficult to comprehend. On the other hand, predictions from a multiple linear regression equation are easy to understand but are not accurate when the underlying relationship between the input variables and the output variable is nonlinear. We have thus developed a method for multivariate function approximation which combines neural network learning, clustering and multiple regression. This method generates a set of multiple linear regression equations using neural networks, where the number of regression equations is determined by clustering the weighted input variables. The predictions for samples of the same cluster are computed by the same regression equation. Experimental results on a number of real-world data sets demonstrate that this new method generates relatively few regression equations from the training data samples. Yet, drawing from the universal function approximation capacity of neural networks, the predictive accuracy is high. The prediction errors are comparable to or lower than those achieved by existing function approximation methods.
