Pattern Classification using Simplified Neural Networks with Pruning Algorithm

IICTM 2005

S. M. Kamruzzaman
Assistant Professor, Department of Computer Science and Engineering, Manarat International University, Dhaka-1212, Bangladesh. Email: [email protected]

Ahmed Ryadh Hasan
School of Communication, Independent University Bangladesh, Chittagong, Bangladesh. Email: [email protected]

Abstract:
In recent years, many neural network models have been proposed for pattern classification, function approximation, and regression problems. This paper presents an approach for classifying patterns using simplified neural networks. Although the predictive accuracy of ANNs is often higher than that of other methods or human experts, it is often said that ANNs are practically “black boxes” because of the complexity of the networks. In this paper, we attempt to open up these black boxes by reducing the complexity of the network. The factor that makes this possible is the pruning algorithm: by eliminating redundant weights, redundant input and hidden units are identified and removed from the network. Using the pruning algorithm, we have been able to prune networks so that only a few input units, hidden units, and connections remain, yielding a simplified network. Experimental results on several benchmark problems show the effectiveness of the proposed approach and its good generalization ability.
Keywords:
Artificial Neural Network, Pattern Classification, Pruning Algorithm, Weight Elimination, Penalty Function, Network Simplification.
1. Introduction

In recent years, many neural network models have been proposed for pattern classification, function approximation, and regression problems [2] [3] [18]. Among them, the class of multi-layer feedforward networks is the most popular. Methods using standard backpropagation perform gradient descent only in the weight space of a network with fixed topology [13]. In general, this approach is useful only when the network architecture is chosen correctly [9]: too small a network cannot learn the problem well, while too large a size leads to overfitting and poor generalization [1]. Artificial neural networks are considered efficient computing models and universal approximators [4]. Although the predictive accuracy of neural networks is often higher than that of other methods or human experts, it is generally difficult to understand how a network arrives at a particular decision, due to the complexity of its architecture [6] [15]. One of the major criticisms is that they are black boxes, since no satisfactory explanation of their behavior has been offered. This is because of the complexity of the interconnections between layers and the network size [18]. As such, an optimal network size with a minimal number of interconnections gives insight into how a neural network performs. Another motivation for network simplification and pruning is the time complexity of learning [7] [8].
Network pruning offers another approach for dynamically determining an appropriate network topology. Pruning techniques [11] begin by training a larger-than-necessary network and then eliminate weights and neurons that are deemed redundant. Typically, methods for removing weights involve adding a penalty term to the error function [5]. It is hoped that, by adding a penalty term to the error function, unnecessary connections will develop small weights, so that the complexity of the network can be significantly reduced. This paper aims at pruning the network size both in the number of neurons and in the number of interconnections between the neurons. The pruning strategies, along with the penalty function, are described in the subsequent sections.
2. Network Pruning

2.1 The Penalty Function

When a network is to be pruned, it is a common practice to add a penalty term to the error function during training [16]. Usually, the penalty term, as suggested in the literature, is

$$P(w,v) = \varepsilon_1 \left( \sum_{m=1}^{h} \sum_{l=1}^{n} \frac{\beta w_{ml}^2}{1 + \beta w_{ml}^2} + \sum_{m=1}^{h} \sum_{p=1}^{o} \frac{\beta v_{mp}^2}{1 + \beta v_{mp}^2} \right) + \varepsilon_2 \left( \sum_{m=1}^{h} \sum_{l=1}^{n} w_{ml}^2 + \sum_{m=1}^{h} \sum_{p=1}^{o} v_{mp}^2 \right). \quad (1)$$

Given an $n$-dimensional example $x^i$, $i \in \{1, 2, \ldots, k\}$, as input, let $w_{ml}$ be the weight of the connection from input unit $l \in \{1, 2, \ldots, n\}$ to hidden unit $m \in \{1, 2, \ldots, h\}$, and let $v_{mp}$ be the weight of the connection from hidden unit $m$ to output unit $p \in \{1, 2, \ldots, o\}$. The $p$-th output of the network for example $x^i$ is obtained by computing

$$S_p^i = \sigma\!\left( \sum_{m=1}^{h} v_{mp}\, \alpha_m^i \right), \quad (2)$$

where

$$\alpha_m^i = \delta\!\left( \sum_{l=1}^{n} x_l^i\, w_{ml} \right), \qquad \delta(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}. \quad (3)$$

The target output for an example $x^i$ that belongs to class $C_j$ is an $o$-dimensional vector $t^i$, where $t_p^i = 1$ if $p = j$ and $t_p^i = 0$ otherwise, $j, p = 1, 2, \ldots, o$. The backpropagation algorithm is applied to update the weights $(w, v)$ and minimize the following function:

$$\theta(w,v) = F(w,v) + P(w,v), \quad (4)$$

where $F(w,v)$ is the cross-entropy function

$$F(w,v) = -\sum_{i=1}^{k} \sum_{p=1}^{o} \left[ t_p^i \log S_p^i + \left(1 - t_p^i\right) \log\left(1 - S_p^i\right) \right] \quad (5)$$

and $P(w,v)$ is the penalty term described in (1), used for weight decay.
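To make Eqs. (1)-(5) concrete, the following is a minimal NumPy sketch of the objective, not the implementation used in our experiments. The weight matrices W (n x h) and V (h x o), the logistic form assumed for sigma, and the constants beta, eps1, and eps2 are illustrative assumptions.

```python
# Minimal sketch of the penalized objective of Eqs. (1)-(5); shapes and
# constants are illustrative assumptions, not values fixed by the paper.
import numpy as np

def forward(X, W, V):
    """Eqs. (2)-(3): tanh hidden units; sigma is assumed to be the logistic function."""
    A = np.tanh(X @ W)                   # alpha_m^i = delta(sum_l x_l^i w_ml), Eq. (3)
    S = 1.0 / (1.0 + np.exp(-(A @ V)))   # S_p^i = sigma(sum_m v_mp alpha_m^i), Eq. (2)
    return S

def penalty(W, V, beta=10.0, eps1=1e-1, eps2=1e-4):
    """Eq. (1): the saturating term drives small weights toward zero,
    while the quadratic term keeps all weights bounded."""
    sat = np.sum(beta * W**2 / (1 + beta * W**2)) + np.sum(beta * V**2 / (1 + beta * V**2))
    quad = np.sum(W**2) + np.sum(V**2)
    return eps1 * sat + eps2 * quad

def objective(X, T, W, V):
    """Eq. (4): theta(w, v) = F(w, v) + P(w, v), with F the cross entropy of Eq. (5)."""
    S = np.clip(forward(X, W, V), 1e-12, 1 - 1e-12)   # avoid log(0)
    F = -np.sum(T * np.log(S) + (1 - T) * np.log(1 - S))
    return F + penalty(W, V)
```

Backpropagation then minimizes objective() with respect to W and V; the gradient of the penalty is what pushes redundant connections toward small weights.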
2.2 Weight Elimination Algorithm

The penalty function is used for weight decay. As such, we can eliminate redundant weights with the following weight elimination algorithm, as suggested in the literature [12] [14] [17].
1. Let $\eta_1$ and $\eta_2$ be positive scalars such that $\eta_1 + \eta_2 < 0.5$.
2. Pick a fully connected network and train this network such that the error condition is satisfied by all input patterns. Let $(w, v)$ be the weights of this network.
3. For each $w_{ml}$, if
   $\max_p |v_{mp}\, w_{ml}| \le \eta_1$, (6)
   then remove $w_{ml}$ from the network.
4. For each $v_{mp}$, if
   $|v_{mp}| \le \eta_2$, (7)
   then remove $v_{mp}$ from the network.
5. If no weight satisfies condition (6) or condition (7), then remove the $w_{ml}$ with the smallest product $\max_p |v_{mp}\, w_{ml}|$.
6. Retrain the network. If the classification rate of the network falls below an acceptable level, then stop. Otherwise, go to Step 3.
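A minimal sketch of this procedure is given below. Pruning is represented by holding entries of W and V at exactly zero; retrain() and accuracy() are hypothetical caller-supplied routines (retrain is assumed to keep pruned entries at zero), and eta1, eta2, and min_acc are illustrative values.

```python
# Sketch of the weight elimination algorithm (Steps 1-6 above); retrain(),
# accuracy(), and all thresholds are assumptions for illustration.
import numpy as np

def weight_elimination(W, V, retrain, accuracy, eta1=0.1, eta2=0.1, min_acc=0.95):
    assert eta1 + eta2 < 0.5                           # Step 1
    while True:                                        # Steps 3-6 loop
        # Step 3, condition (6): small |v_mp * w_ml| products mark removable w_ml
        prod = np.abs(W) * np.max(np.abs(V), axis=1)   # max_p |v_mp| * |w_ml|, shape (n, h)
        mask_w = (prod <= eta1) & (W != 0)
        # Step 4, condition (7): small hidden-to-output weights are removable
        mask_v = (np.abs(V) <= eta2) & (V != 0)
        if mask_w.any() or mask_v.any():
            W[mask_w] = 0.0
            V[mask_v] = 0.0
        else:
            # Step 5: nothing qualifies, so remove the smallest remaining product
            live = W != 0
            if not live.any():
                return W, V
            l, m = np.unravel_index(np.argmin(np.where(live, prod, np.inf)), W.shape)
            W[l, m] = 0.0
        # Step 6: retrain and stop once the classification rate degrades
        W, V = retrain(W, V)
        if accuracy(W, V) < min_acc:
            return W, V
```

In practice one would keep a copy of the last acceptable (W, V) and restore it when the accuracy check fails; the sketch simply stops, as Step 6 states.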
2.3 Node Pruning Algorithm

A node-pruning algorithm is presented below to remove redundant nodes in the input and hidden layers.

Step 1:
Create an initial network with as many input neurons as required by the specific problem description and with one hidden unit. Randomly initialize the connection weights of the network within a certain range.
Step 2:
Partially train the network on the training set for a certain number of training epochs using a training algorithm. The number of training epochs, τ, is specified by the user.

Step 3:
Eliminate the redundant weights by using the weight elimination algorithm described in Section 2.2.
Step 4:
Test this network. If the accuracy of this network falls below an acceptable level, then add one more hidden unit and go to Step 2.
Step 5:
If there is any input node $x_l$ with $w_{ml} = 0$ for all $m = 1, 2, \ldots, h$, then remove this node.

Step 6:
Test the generalization ability of the network with the test set. If the network successfully converges, then terminate; otherwise, go to Step 1.
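The overall loop of Steps 1-6 can be sketched as follows, again only as an illustration under stated assumptions: train_partial() is a hypothetical routine performing τ epochs of backpropagation on the penalized objective, weight_elimination(), retrain(), and accuracy() are the routines sketched in Section 2.2, and a single accuracy() stands in for both the Step 4 check and the Step 6 test-set check.

```python
# Sketch of the node-pruning loop (Steps 1-6 above); all helper routines and
# parameter values are illustrative assumptions.
import numpy as np

def build_and_prune(n_inputs, n_outputs, train_partial, retrain, accuracy,
                    tau=100, min_acc=0.95):
    rng = np.random.default_rng(0)
    while True:                                            # a failed Step 6 restarts from Step 1
        h = 1                                              # Step 1: start with one hidden unit
        W = rng.uniform(-1.0, 1.0, (n_inputs, h))          # Step 1: random weights in a fixed range
        V = rng.uniform(-1.0, 1.0, (h, n_outputs))
        while True:
            W, V = train_partial(W, V, epochs=tau)         # Step 2: partial training for tau epochs
            W, V = weight_elimination(W, V, retrain, accuracy)  # Step 3: Section 2.2
            if accuracy(W, V) >= min_acc:                  # Step 4: accuracy acceptable?
                break
            h += 1                                         # Step 4: otherwise add one hidden unit
            W = np.hstack([W, rng.uniform(-1.0, 1.0, (n_inputs, 1))])
            V = np.vstack([V, rng.uniform(-1.0, 1.0, (1, n_outputs))])
        keep = ~np.all(W == 0.0, axis=1)                   # Step 5: inputs with all weights pruned
        W = W[keep]                                        # remove those input nodes
        if accuracy(W, V) >= min_acc:                      # Step 6: test-set check (accuracy is
            return W, V, keep                              # assumed to handle the reduced inputs)
```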
3. Experimental Results and Discussions
In this experiment, we have used three benchmark classification problems: breast cancer diagnosis, classification of glass types, and Pima Indians diabetes diagnosis [10] [19]. All the data sets were obtained from the UCI machine learning benchmark repository. Brief characteristics of the data sets are listed in Table 1.
Table 1: Characteristics of data sets.

Data set    Input Attributes   Output Units   Output Classes   Training Examples   Validation Examples   Test Examples   Total Examples
Cancer1           9                 2               2                350                  175                 174              699
Glass             9                 6               6                107                   54                  54              215
Diabetes          8                 2               2                384                  192                 192              768

The experimental results for the different data sets are shown in Table 2 and Figures 1, 2, and 3. In the experiments on the breast cancer data set, the fully connected network had a 9-3-2 architecture. After pruning the network with the weight elimination algorithm and the input and hidden node pruning algorithm, we found a simplified network of 3-1-2 architecture with a classification accuracy of 96.644%. The graphical representation of the simplified network is given in Figure 1. It shows that only three of the nine input attributes, along with a single hidden unit, are adequate for this problem.

Figure 1: Simplified network for the breast cancer diagnosis problem.
Figure 2: Simplified network for the Pima Indians diabetes diagnosis problem.
[Figures 1 and 2 show the network diagrams, with input, hidden, and output layers, bias units, and active and pruned neurons and weights marked; pruned elements are drawn with dotted lines. In the legends, W_i denotes an input-to-hidden weight, V_i a hidden-to-output weight, I_i an input signal, and O_i an output signal.]
In the experiments on the Pima Indians diabetes data set, we found that a fully connected network of 8-3-2 architecture had a classification accuracy of 77.344%. After pruning the network with the weight elimination algorithm and the input and hidden node pruning algorithm, we found a simplified network of 8-2-2 architecture with a classification accuracy of 75.260%. The graphical representation of the simplified network is given in Figure 2. It shows that no input attribute could be removed, but one hidden node along with some redundant connections was removed; these are shown with dotted lines in Figure 2.
Figure 3: Simplified network for the glass classification problem.
In the experiments on the glass classification data set, we found that a fully connected network of 9-4-6 architecture had a classification accuracy of 65.277%. After pruning the network with the weight elimination algorithm and the input and hidden node pruning algorithm, we found a simplified network of 9-3-6 architecture with a classification accuracy of 63.289%. The graphical representation of the simplified network is given in Figure 3. It shows that no input attribute could be removed, but one hidden node along with some redundant connections was removed; these are shown with dotted lines in Figure 3.
[Figure 3 shows the network diagram for the glass problem, with input, hidden, and output layers and active and pruned neurons and weights marked; pruned elements are drawn with dotted lines. I_i denotes an input signal and O_i an output signal.]
Table 2: Experimental Results

Results                                    Cancer1    Diabetes    Glass
Learning Rate                              -          -           -
No. of Epochs                              500        1200        650
Initial Architecture                       9-3-2      8-3-2       9-4-6
Input Nodes Removed                        6          0           1
Hidden Nodes Removed                       2          1           2
Total Connections Removed                  24         13          16
Simplified Architecture                    3-1-2      8-2-2       9-3-6
Accuracy (%) of fully connected network    -          77.344      65.277
Accuracy (%) of simplified network         96.644     75.260      63.289
4. Conclusions

In this paper, we proposed an efficient network simplification algorithm using pruning strategies. Using this approach, we obtain an optimal network architecture with a minimal number of connections and neurons without significantly deteriorating the performance of the network. Experimental results show that the performance of the simplified network is acceptable compared with the fully connected network, and the simplification ensures both reliability and reduced computational cost. In the future, we will use this network pruning approach for rule extraction and feature selection. These pruning strategies will also be examined for function approximation and regression problems.
References

[1] T. Ash, "Dynamic node creation in backpropagation networks", Connection Science, vol. 1, pp. 365-375, 1989.
[2] R. W. Brause, "Medical analysis and diagnosis by neural networks", J. W. Goethe-University, Computer Science Dept., Frankfurt a. M., Germany.
[3] J. W. Smith, J. E. Everhart, W. C. Dickson, W. C. Knowler, and R. S. Johannes, "Using the ADAP learning algorithm to forecast the onset of diabetes mellitus", Proc. Symp. on Computer Applications and Medical Care (Piscataway, NJ: IEEE Computer Society Press), pp. 261-265, 1988.
[4] S. E. Fahlman and C. Lebiere, "The cascade-correlation learning architecture", in Advances in Neural Information Processing Systems 2, D. S. Touretzky, Ed., San Mateo, CA: Morgan Kaufmann, pp. 524-532, 1990.
[5] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., Pearson Education Asia, Third Indian Reprint, 2002.
[6] T. Y. Kwok and D. Y. Yeung, "Constructive algorithms for structure learning in feedforward neural networks for regression problems", IEEE Trans. Neural Networks, vol. 8, pp. 630-645, 1997.
[7] M. Monirul Islam and K. Murase, "A new algorithm to design compact two-hidden-layer artificial neural networks", Neural Networks, vol. 14, pp. 1265-1278, 2001.
[8] M. Monirul Islam, M. A. H. Akhand, M. Abdur Rahman, and K. Murase, "Weight freezing to reduce training time in designing artificial neural networks", Proceedings of the 5th ICCIT, EWU, pp. 132-136, 27-28 December 2002.
[9] R. Parekh, J. Yang, and V. Honavar, "Constructive neural network learning algorithms for pattern classification", IEEE Trans. Neural Networks, vol. 11, no. 2, March 2000.
[10] L. Prechelt, "Proben1: a set of neural network benchmark problems and benchmarking rules", University of Karlsruhe, Germany, 1994.
[11] R. Reed, "Pruning algorithms: a survey", IEEE Trans. Neural Networks, vol. 4, pp. 740-747, 1993.
[12] R. Setiono and L. C. K. Hui, "Use of a quasi-Newton method in a feedforward neural network construction algorithm", IEEE Trans. Neural Networks, vol. 6, no. 1, pp. 273-277, Jan. 1995.
[13] R. Setiono and H. Liu, "Understanding neural networks via rule extraction", in Proceedings of the International Joint Conference on Artificial Intelligence, pp. 480-485, 1995.
[14] R. Setiono and H. Liu, "Improving backpropagation learning with feature selection", Applied Intelligence, vol. 6, no. 2, pp. 129-140, 1996.
[15] R. Setiono, "Extracting rules from pruned networks for breast cancer diagnosis", Artificial Intelligence in Medicine, vol. 8, no. 1, pp. 37-51, 1996.
[16] R. Setiono, "A penalty-function approach for pruning feedforward neural networks", Neural Computation, vol. 9, no. 1, pp. 185-204, 1997.
[17] R. Setiono, "Techniques for extracting rules from artificial neural networks", plenary lecture presented at the 5th International Conference on Soft Computing and Information Systems, Iizuka, Japan, October 1998.
[18] R. Setiono, W. K. Leow, and J. M. Zurada, "Extraction of rules from artificial neural networks for nonlinear regression", IEEE Trans. Neural Networks, vol. 13, no. 3, pp. 564-577, 2002.
[19] W. H. Wolberg and O. L. Mangasarian, "Multisurface method of pattern separation for medical diagnosis applied to breast cytology", Proceedings of the National Academy of Sciences, vol. 87, pp. 9193-9196, 1990.