A Bayesian Approach with Type-2 Student-t Membership Function for T-S Model Identification
Vikas Singh, Homanga Bharadhwaj and Nishchal K Verma
Abstract—Clustering techniques have proved highly successful for Takagi-Sugeno (T-S) fuzzy model identification. In particular, fuzzy c-regression clustering based on type-2 fuzzy sets has shown remarkable results on non-sparse data, but its performance degrades on sparse data. In this paper, an innovative architecture for the fuzzy c-regression model is presented, and a novel student-t distribution based membership function is designed for sparse data modelling. To avoid overfitting, we adopt a Bayesian approach that places a Gaussian prior on the regression coefficients. Additional novelty of our approach lies in type reduction, where the final output is computed using the Karnik-Mendel algorithm and the consequent parameters of the model are optimized by stochastic gradient descent. Detailed experimentation shows that the proposed approach outperforms various state-of-the-art methods on standard datasets.

Index Terms—TSK Model, Fuzzy c-Regression, Student-t distribution
I. INTRODUCTION

THERE has been much research focusing on the modeling of non-linear systems through their input-output mapping. In particular, fuzzy logic based approaches have been very successful in modeling non-linear dynamics in the presence of uncertainties [1]. Type-1 fuzzy logic enables system identification and modeling by virtue of numerous linguistic rules. Although this approach performs well, the limitation of crisp membership values restricts its potential to handle uncertainty in the data. Therefore, in order to successfully model data uncertainty, type-2 fuzzy logic was proposed, where the membership values of the data points are themselves fuzzy. Type-2 fuzzy logic has been remarkably successful owing to its robustness in the presence of imprecise and noisy data [2], [3].

The basic steps in building a fuzzy inference system are structure and parameter identification of the model. Structure identification concerns selecting the number of rules, the input features, and the partition of the input-output space, while parameter identification computes the antecedent and consequent parameters of the model. In the literature, fuzzy clustering has been widely used for fuzzy space partitioning: since a T-S fuzzy model is comprised of various locally weighted linear regression models, many of them hyperplane-based, clustering with hyperplane-shaped prototypes is very effective for structure identification. In particular, fuzzy c-regression clustering, which produces hyperplane-shaped clusters, has become popular [4], [5]. The architecture of these algorithms is robust in partitioning the data space, inferring estimates of the outputs from the inputs, and determining an optimum fit for the regression model. Previously proposed techniques such as the fuzzy c-regression model (FCRM) and fuzzy c-means (FCM) have been extended to the type-2 fuzzy logic framework, where upper and lower membership values are determined by simultaneously optimizing two objective functions. Interval type-2 (IT2) FCRM, recently presented for the T-S regression framework, has shown significantly better performance in terms of error minimization and robustness than type-1 fuzzy logic [4]-[6].

In this paper, we combine Gaussian and student-t density type membership functions in an IT2 FCRM framework. This is a hyperplane-shaped membership function with two differently weighted terms; the student-t density part is weighted more if the data being modeled is sparse. The student-t distribution is a popular prior for sparse data modelling in Bayesian inference, and is therefore used in our model. The stochastic gradient descent (SGD) technique is used to optimize the consequent parameters, and the Karnik-Mendel (KM) algorithm is applied for type reduction of the estimated output. We use L2 regularization of the regression coefficients in the IT2 fuzzy c-means clustering for identification of the antecedent parameters. As demonstrated in the results section, the regularization protects our model against overfitting of the training data and increases generalization on unseen data. In addition, an innovative scheme for optimizing the consequent parameters is presented, wherein we do not perform type reduction of the type-1 weight sets prior to output estimation. Instead, we use the KM algorithm to infer an optimal interval type-1 fuzzy set for the output, and the set boundaries are optimized by the SGD method.

The rest of the paper is organized as follows: In Section II, we discuss the TSK fuzzy model, IT2-FCM, and IT2-FCR. In Section III, we describe the proposed approach. In Section IV, we present the efficacy of the proposed approach through experimentation. Finally, Section V concludes the paper.

Vikas Singh, Homanga Bharadhwaj, and Nishchal K Verma are with the Department of Electrical Engineering, IIT Kanpur, India (e-mail: [email protected], [email protected], [email protected]).

II. PRELIMINARIES
A. TSK Fuzzy Model
The TSK fuzzy model provides a rule-based structure for modeling a complex non-linear system. Let G(x, y) be the system to be identified, where x \in R^m is the input vector and y \in R is the output. Then the i-th rule is written as

    Rule i: IF x_1 is A_{i1} and \cdots and x_m is A_{im} THEN
        y_i = \theta_{i0} + \theta_{i1} x_1 + \cdots + \theta_{im} x_m    (1)

where i = 1, \cdots, c indexes the fuzzy rules and y_i is the i-th rule output. Using these rules, the final model output is inferred as

    y = \frac{\sum_{i=1}^{c} w_i y_i}{\sum_{i=1}^{c} w_i}, \quad w_i = \prod_{j=1}^{m} \mu_{A_{ij}}(x_j)    (2)

where w_i denotes the overall firing strength of the i-th rule.

B. Interval Type-2 FCM (IT2-FCM)
In interval type-2 FCM, two objective functions that differ in their degree of fuzziness are optimized simultaneously using Lagrange multipliers to obtain the upper and lower membership functions [3]. Let m_1 and m_2 be the two degrees of fuzziness; the two objective functions are

    Q_{m_1}(U, v) = \sum_{k=1}^{N} \sum_{i=1}^{c} \mu_i(x_k)^{m_1} E_{ik}(\zeta_i)
    Q_{m_2}(U, v) = \sum_{k=1}^{N} \sum_{i=1}^{c} \mu_i(x_k)^{m_2} E_{ik}(\zeta_i)    (3)

C. Interval Type-2 Fuzzy c-Regression Algorithm (IT2-FCR)
The main motivation of the interval type-2 fuzzy c-regression algorithm is to partition a set of n data points (x_k, y_k), k = 1, \cdots, n, into c clusters. The data points in each cluster i are described by a regression model

    \hat{y}_k = g_i(x_k, \zeta_i) = b_{i1} x_{k1} + \cdots + b_{im} x_{km} + b_{i0} = [x_k \; 1] \zeta_i^T    (4)

where x_k = [x_{k1}, \cdots, x_{km}] is the k-th input vector, j = 1, \cdots, m indexes the features, i = 1, \cdots, c indexes the clusters, and \zeta_i = [b_{i1}, \cdots, b_{im}, b_{i0}] is the coefficient vector of the i-th cluster. In [4], the coefficient vectors are optimized by the weighted least squares method, whereas our approach uses SGD. The primary reason for using SGD is to keep the algorithm robust even in cases where [x^T P_i x] becomes singular [6].

III. PROPOSED METHODOLOGY
In this paper we present a new framework for FCRM with an innovative student-t distribution based membership function (MF) for sparse data modelling [6]-[8]. The proposed approach is described in the following subsections.

A. Fuzzy-Space Partitioning
Firstly, we formulate the task of fuzzy-space partitioning as a maximum a-posteriori (MAP) estimate over a squared error function [8]. Exploiting Bayes' rule, the MAP estimator is defined as

    \phi(y) = \arg\max_{x \in R^n} p(x|y) = \arg\max_{x \in R^n} p(y|x) p(x)    (5)

where p(x|y) is the posterior, p(y|x) the likelihood, and p(x) the prior distribution. Using the above equation, the MAP estimator is expressed in terms of a regression problem as

    E_{ik}(\zeta_i) = (y_k - g_i(x_k, \zeta_i))^2 + \lambda \sum_{p=1}^{m} (b_{ip})^2    (6)

where E_{ik}(\zeta_i) is the MAP estimator, (y_k - g_i(x_k, \zeta_i))^2 corresponds to the likelihood, and \sum_{p=1}^{m} (b_{ip})^2 is the prior, or regularizer, which is equivalent to the Bayesian notion of placing a prior on the regression weights b_i of each cluster i; \lambda is the regularization control parameter. The regularizer reduces overfitting in cluster assignment by constraining the regression weights to be small.

In the proposed approach, we first define two degrees of fuzziness m_1 and m_2, and initialize the number of clusters c and a termination threshold \epsilon. We also initialize the parameters \overline{\zeta}_i and \underline{\zeta}_i, the upper and lower regression coefficient vectors of the i-th cluster.
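As a concrete illustration of the regularized error in (6), the following is a minimal NumPy sketch; the function name `reg_error` is ours, and the convention of excluding the bias b_{i0} from the penalty follows the sum over p = 1..m in (6).

```python
import numpy as np

def reg_error(y, X, zeta, lam):
    """Per-sample regularized squared error E_ik for one cluster, as in eq. (6).

    zeta = [b_1, ..., b_m, b_0]: hyperplane coefficients; the bias b_0 is
    excluded from the L2 penalty, matching the sum over p = 1..m.
    """
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])  # append 1 for the bias term
    residual = y - X1 @ zeta                        # y_k - g_i(x_k, zeta_i)
    penalty = lam * np.sum(zeta[:-1] ** 2)          # lambda * sum_p b_ip^2
    return residual ** 2 + penalty
```

In the Bayesian reading, the squared residual is the Gaussian likelihood term and the L2 penalty is the Gaussian prior on the weights.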
Then, equation (6) is split into upper and lower error-function MAP estimators as follows:

    \overline{E}_{ik}(\overline{\zeta}_i) = (y_k - g_i(x_k, \overline{\zeta}_i))^2 + \lambda \sum_{p=1}^{m} (\overline{b}_{ip})^2
    \underline{E}_{ik}(\underline{\zeta}_i) = (y_k - g_i(x_k, \underline{\zeta}_i))^2 + \lambda \sum_{p=1}^{m} (\underline{b}_{ip})^2    (7)

To reduce the complexity of the system, a weighted-average type reduction is applied to obtain E_{ik}(\zeta_i):

    E_{ik}(\zeta_i) = \frac{\overline{E}_{ik}(\overline{\zeta}_i) + \underline{E}_{ik}(\underline{\zeta}_i)}{2}    (8)

Using E_{ik}(\zeta_i), the upper and lower membership values of every data point in each cluster are obtained similarly to [3]:

    \overline{u}_{ik} = 1 / \sum_{r=1}^{c} \left( E_{ik}(\zeta_i) / E_{rk}(\zeta_r) \right)^{1/(m_1-1)}, if \sum_{r=1}^{c} E_{ik}(\zeta_i)/E_{rk}(\zeta_r) < c
                      = 1 / \sum_{r=1}^{c} \left( E_{ik}(\zeta_i) / E_{rk}(\zeta_r) \right)^{1/(m_2-1)}, otherwise

    \underline{u}_{ik} = 1 / \sum_{r=1}^{c} \left( E_{ik}(\zeta_i) / E_{rk}(\zeta_r) \right)^{1/(m_1-1)}, if \sum_{r=1}^{c} E_{ik}(\zeta_i)/E_{rk}(\zeta_r) \geq c
                       = 1 / \sum_{r=1}^{c} \left( E_{ik}(\zeta_i) / E_{rk}(\zeta_r) \right)^{1/(m_2-1)}, otherwise    (9)

The above memberships can be interpreted as cluster assignments for the MAP problem formulated in (8). To estimate the parameters \overline{\zeta}_i and \underline{\zeta}_i, we formulate the problem as locally weighted linear regression with the objective function

    J(\zeta_i) = \sum_{k=1}^{n} u_{ik} ([x_k \; 1] \zeta_i^T - y_k)^2    (10)

Here, u_{ik} denotes the membership value of the k-th data point in the i-th cluster. The parameters \overline{\zeta}_i and \underline{\zeta}_i are estimated by SGD on the above objective with the corresponding weights \overline{u}_{ik} and \underline{u}_{ik}. The regression coefficients \zeta_i are then obtained by a type reduction technique as

    \zeta_i = \frac{\overline{\zeta}_i + \underline{\zeta}_i}{2}    (11)

These steps are repeated while ||\zeta_i^{current} - \zeta_i^{previous}|| \geq \epsilon to obtain the optimal values of the regression coefficients, as briefly described in Algorithm 1.

B. Identification of Antecedent Parameters
The MF developed in [5] is hyperplane-shaped, and cannot fully incorporate the relevant information of the data distributions within the different clusters. To overcome this issue, we propose a modified Gaussian-based MF combined with a student-t density function. The student-t distribution is widely used as a prior in Bayesian inference for sparse data modelling [7]. Here, we weigh the Gaussian and student-t parts by a hyper-parameter \alpha. If the data being modeled is very sparse, \alpha should be set very low so as to give more weight to the student-t density membership value.

    \overline{\mu}_{A_i}(x_k) = \alpha \exp\left( -\eta \frac{(\overline{d}_{ik}(\overline{\zeta}_i) - \overline{v}_i(\overline{\zeta}_i))^2}{\overline{\sigma}_i^2(\overline{\zeta}_i)} \right) + (1-\alpha) \left( 1 + \frac{\overline{d}_{ik}^2(\overline{\zeta}_i)}{r} \right)^{-(r+1)/2}    (12)

    \underline{\mu}_{A_i}(x_k) = \alpha \exp\left( -\eta \frac{(\underline{d}_{ik}(\underline{\zeta}_i) - \underline{v}_i(\underline{\zeta}_i))^2}{\underline{\sigma}_i^2(\underline{\zeta}_i)} \right) + (1-\alpha) \left( 1 + \frac{\underline{d}_{ik}^2(\underline{\zeta}_i)}{r} \right)^{-(r+1)/2}    (13)

In the above, d_{ik} is the distance between the k-th input vector and the i-th cluster hyperplane:

    \overline{d}_{ik}(\overline{\zeta}_i) = \frac{|x_k \cdot \overline{\zeta}_i|}{||\overline{\zeta}_i||}; \quad \underline{d}_{ik}(\underline{\zeta}_i) = \frac{|x_k \cdot \underline{\zeta}_i|}{||\underline{\zeta}_i||}    (14)

where r = \max\{ d_{ik}(\zeta_i), i = 1, \cdots, c \} is the maximum distance of the k-th input vector from the cluster hyperplanes, and v_i and \sigma_i^2 denote the average distance and the variance of the data points from the cluster hyperplane, respectively:

    v_i(\zeta_i) = \frac{1}{n} \sum_{k=1}^{n} d_{ik}(\zeta_i); \quad \sigma_i^2(\zeta_i) = \frac{1}{n} \sum_{k=1}^{n} (d_{ik}(\zeta_i) - v_i(\zeta_i))^2    (15)

The lower MF \underline{\mu}_{A_i}(x_k) and upper MF \overline{\mu}_{A_i}(x_k) serve as the weights of the TSK fuzzy model for the k-th input belonging to the i-th cluster.

Algorithm 1
The Proposed Approach
Begin
  for i = 1 to c do
    Calculate \overline{\zeta}_i and \underline{\zeta}_i, the upper and lower regression vectors, using (10)
    Calculate the errors \overline{E}_{ik}(\overline{\zeta}_i), \underline{E}_{ik}(\underline{\zeta}_i) using (7)
    Calculate the upper and lower MFs (\overline{u}_{ik}, \underline{u}_{ik}) using (9)
  end
  The above identifies optimal \overline{\zeta}_i and \underline{\zeta}_i \forall i \in [1, c]
  for i = 1 to c do
    Compute the input MFs using (12) and (13)
    Compute the interval type-2 output y_k using (18)
  end
End
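To make the partitioning phase of Algorithm 1 concrete, the following is a minimal NumPy sketch of one alternation. Function names are illustrative, and a batch weighted ridge solve stands in for the paper's SGD update of (10); this is a sketch of the flow, not the authors' implementation.

```python
import numpy as np

def fcm_memberships(E, m):
    """u_ik = 1 / sum_r (E_ik/E_rk)^(1/(m-1)): FCM-style weights from errors E (c x n)."""
    ratio = E[:, None, :] / E[None, :, :]              # index (i, r, k): E_ik / E_rk
    return 1.0 / np.sum(ratio ** (1.0 / (m - 1.0)), axis=1)

def interval_memberships(E, m1, m2):
    """Upper/lower memberships of eq. (9), switching on sum_r E_ik/E_rk versus c."""
    c = E.shape[0]
    s = np.sum(E[:, None, :] / E[None, :, :], axis=1)  # sum_r E_ik / E_rk, shape (c, n)
    u1, u2 = fcm_memberships(E, m1), fcm_memberships(E, m2)
    u_up = np.where(s < c, u1, u2)                     # fuzziness m1 below the threshold
    u_lo = np.where(s >= c, u1, u2)                    # reversed condition for the lower MF
    return u_up, u_lo

def weighted_fit(X, y, u, lam):
    """Per-cluster weighted ridge solve of eq. (10) (batch stand-in for SGD)."""
    X1 = np.hstack([X, np.ones((len(X), 1))])          # append bias column
    zetas = []
    for w in u:                                        # one weight vector per cluster
        A = X1.T @ (w[:, None] * X1) + lam * np.eye(X1.shape[1])
        zetas.append(np.linalg.solve(A, X1.T @ (w * y)))
    return np.array(zetas)
```

One outer iteration then computes errors from the current coefficients, forms the interval memberships, refits the upper and lower coefficient vectors, and averages them as in (11), repeating until the coefficient change falls below \epsilon.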
C. Identification of Consequent Parameters
In most of the literature, defuzzification of the weights is performed before determining the model output \hat{y}_k. The problem with these approaches is that they do not consider the effect of the model output, which affects the overall performance of the model. To overcome this, we evaluate \overline{y}_k and \underline{y}_k, corresponding to \overline{\mu}_{A_i}(x_k) and \underline{\mu}_{A_i}(x_k), using the KM algorithm [2]. The values of \overline{y}_k and \underline{y}_k are optimized in parallel until convergence. A further advantage of this approach is that it is more robust in handling noise and provides a confidence interval for every output data point. The model outputs \overline{y}_k and \underline{y}_k corresponding to the weights \overline{\mu}_{A_i}(x_k) and \underline{\mu}_{A_i}(x_k) are calculated using (1) and (2) as follows:

    \overline{y}_k = \frac{\sum_{i=1}^{p} \underline{\mu}_{A_i}(x_k)(\theta_{i0} + \theta_{i1}x_{k1} + \cdots + \theta_{iM}x_{kM}) + \sum_{i=p+1}^{c} \overline{\mu}_{A_i}(x_k)(\theta_{i0} + \theta_{i1}x_{k1} + \cdots + \theta_{iM}x_{kM})}{\sum_{i=1}^{p} \underline{\mu}_{A_i}(x_k) + \sum_{i=p+1}^{c} \overline{\mu}_{A_i}(x_k)}    (16)

    \underline{y}_k = \frac{\sum_{i=1}^{q} \overline{\mu}_{A_i}(x_k)(\theta_{i0} + \theta_{i1}x_{k1} + \cdots + \theta_{iM}x_{kM}) + \sum_{i=q+1}^{c} \underline{\mu}_{A_i}(x_k)(\theta_{i0} + \theta_{i1}x_{k1} + \cdots + \theta_{iM}x_{kM})}{\sum_{i=1}^{q} \overline{\mu}_{A_i}(x_k) + \sum_{i=q+1}^{c} \underline{\mu}_{A_i}(x_k)}    (17)

where p and q are the switching points computed by the KM algorithm. The above steps are run until \overline{y}_k and \underline{y}_k converge. Finally, the model output is determined by a type reduction technique as

    y_k = \frac{\overline{y}_k + \underline{y}_k}{2}    (18)

IV. RESULTS & DISCUSSION
A. House Prices Dataset

The hyper-parameters c, m_1, m_2, \lambda, \alpha and \eta are tuned by grid search; in particular, \alpha = 0.15 and \eta = 0.7. It should be noted that the value \alpha = 0.15 is small because the dataset is sparse, so the contribution of the student-t function to the MF should be high, which is ensured by a smaller value of \alpha, i.e., a larger value of 1 - \alpha, as defined in (12) and (13). The mean square error (MSE) is 0.008 on the test data, which is lower than the state-of-the-art methods shown in Table I. The absolute error shown in Fig. 2 is also small compared to the absolute house prices shown in Fig. 1. We postulate that this is due to the student-t MF used in our model, which helps in robustly quantifying the effects of sparse data. The higher test accuracy is also due to greater generalization owing to the L2 regularizer used in our model. The coefficient of determination, the ratio of explained variance to total variance, is also very high.

Table I: Comparison of performance on the House Prices dataset

Metric                        LR    RR    RBFNN  ITFRCM [9]  TIFNN [9]  RIT2FC [9]  Proposed
MSE                           0.06  0.06  0.049  0.019       0.045      0.035       0.008
Coefficient of Determination  0.68  0.69  0.67   0.73        0.77       0.79
Median Absolute Error         0.71  0.73  0.73   0.75        0.80       0.81

LR: Logistic Regression, RR: Ridge Regression, RBFNN: Radial Basis Function Neural Network, ITFRCM: Interval Type-2 Fuzzy c-Means, TIFNN: Type-1 Set-Based Fuzzy Neural Network, RIT2FC: Reinforced Interval Type-2 FCM-Based Fuzzy Classifier

Fig. 1: Performance comparison of model output with actual output
Fig. 2: Plot of test error for the house prices dataset
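The role of \alpha discussed above can be visualized with a small sketch of the blended MF of (12)-(13). The function and variable names (`t2_membership`, `gauss_part`, `t_part`) are ours, and the sample distances are synthetic; the point is only how \alpha trades off the two terms.

```python
import numpy as np

def t2_membership(d, v, sigma2, r, alpha, eta):
    """Blended MF of eq. (12): Gaussian part plus student-t-like part.

    d: distances of the samples to a cluster hyperplane; v, sigma2: their
    mean and variance; r: maximum distance (degrees-of-freedom role);
    alpha: blending weight; eta: Gaussian width control.
    """
    gauss_part = np.exp(-eta * (d - v) ** 2 / sigma2)
    t_part = (1.0 + d ** 2 / r) ** (-(r + 1.0) / 2.0)
    return alpha * gauss_part + (1.0 - alpha) * t_part

# For sparse data a small alpha (e.g. 0.15, as used in this section) shifts
# the weight toward the heavy-tailed student-t term.
d = np.linspace(0.0, 3.0, 7)
mu_sparse = t2_membership(d, d.mean(), d.var(), d.max(), alpha=0.15, eta=0.7)
mu_dense = t2_membership(d, d.mean(), d.var(), d.max(), alpha=0.85, eta=0.7)
```

Because the student-t term has heavier tails than the Gaussian term, points far from a cluster hyperplane retain a non-negligible membership when \alpha is small, which is the intended behavior on sparse data.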
B. Non-Linear Plant Modeling
The second-order non-linear difference equation given in (19) is used to draw comparison with the benchmark models listed in Table II:

    z(k) = \frac{(z(k-1) + 2.5)\, z(k-1)\, z(k-2)}{1 + z^2(k-1) + z^2(k-2)} + v(k)    (19)

where v(k) = \sin(2\pi k / 25) is the input used for validation of the model, z(k) is the model output, and z(k-1), z(k-2) and v(k) are the model inputs. The hyper-parameters are tuned by grid search, with four rules (c = 4) and \eta = 0.14; m_1 and m_2 are tuned likewise. The obtained MSE of the model on 500 test data points, using only four rules, is much smaller than that of the other models. Through simulations, we have shown that the proposed model outperforms the other state-of-the-art models. Fig. 3 shows that our model output closely tracks the actual output at every time step. As observed in Fig. 4, the error fluctuates from data point to data point, but the absolute error is consistently less than 0.1 with no rapid surge at stationary points of the time series. This is a crucial requirement for a stable system; we therefore conclude that our algorithm yields a dynamically stable model.

Table II: Performance on the non-linear time series problem

State-of-the-Art    Rules  MSE
Li et al. [1]       4      1.x 10^(-)
Fazel Zarandi [4]   4      5.x 10^(-)
Li et al. [5]       4      1.x 10^(-)
MIT2 FCRM [6]       4      1.x 10^(-)
Proposed            4      7.x 10^(-)

Fig. 3: Performance comparison of model output and actual output
Fig. 4: Plot of test error on time series data
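The benchmark series above can be generated directly from the recursion in (19). This sketch assumes zero initial conditions and the sinusoidal validation input as reconstructed in the text; the helper name `simulate_plant` and the training-pair layout are ours.

```python
import numpy as np

def simulate_plant(n, u):
    """Simulate the second-order benchmark plant of eq. (19).

    u: input sequence; z is started from zero initial conditions
    (an assumption, since the source does not state them).
    """
    z = np.zeros(n)
    for k in range(2, n):
        num = (z[k - 1] + 2.5) * z[k - 1] * z[k - 2]
        den = 1.0 + z[k - 1] ** 2 + z[k - 2] ** 2
        z[k] = num / den + u[k]
    return z

k = np.arange(500)
u = np.sin(2 * np.pi * k / 25)          # validation input v(k)
z = simulate_plant(500, u)

# Training pairs: model inputs (z(k-1), z(k-2), v(k)) -> target z(k)
X = np.column_stack([z[1:-1], z[:-2], u[2:]])
y = z[2:]
```

Since the denominator is at least 1, the recursion stays bounded under the sinusoidal input, which matches the stable behavior reported in Fig. 3 and Fig. 4.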
C. A sinc Function in one Dimension
In this subsection a non-linear sinc function is used to present the effectiveness of the proposed model:

    y = \frac{\sin(x)}{x}    (20)

where x is drawn from a symmetric interval around the origin, excluding the origin itself (where the function is undefined). We sample 121 data points uniformly for this one-dimensional function. As in the previous case study, the number of rules is taken as four; the hyper-parameters m_1, m_2 and \eta = 0.14 are tuned through grid search. The MSE of the proposed model is 2.x 10^(-), which is lower than that of the modified interval type-2 FCRM (MIT2-FCRM) [6], 7.x 10^(-), on the test data of 121 samples. Table III provides a detailed comparison of performance with state-of-the-art methods.

Table III: Performance on the sinc function

State-of-the-Art    Rules  MSE
SCM [10]            2      4.x 10^(-)
EUM [10]            2      4.x 10^(-)
EFCM [10]           2      8.x 10^(-)
Fazel Zarandi [4]   4      2.x 10^(-)
MIT2 FCRM [6]       4      7.x 10^(-)
Proposed            4      2.x 10^(-)
V. CONCLUSION

In this paper, we have illustrated the efficacy of the proposed Bayesian type-2 fuzzy regression approach using a student-t distribution based MF. The proposed MF is useful for fuzzy c-regression models, as demonstrated in Section IV. When the number of features is small compared to the number of samples, clustering of the input-output space proves very effective for identifying the rules of the fuzzy system. In addition, we have demonstrated that, instead of direct defuzzification of the weights before computation of the final output, a continuous defuzzification and optimization gives better results.

REFERENCES

[1] C. Li, et al., "T-S fuzzy model identification based on a novel fuzzy c-regression model clustering algorithm," Engineering Applications of Artificial Intelligence, vol. 22, no. 4-5, pp. 646-653, 2009.
[2] J. Mendel, "On KM algorithms for solving type-2 fuzzy set problems,"
[4] M. H. F. Zarandi, R. Gamasaee, and I. B. Turksen, "A type-2 fuzzy c-regression clustering algorithm for Takagi-Sugeno system identification and its application in the steel industry," Information Sciences, vol. 187, pp. 179-203, 2012.
[5] C. Li, et al., "T-S fuzzy model identification with a gravitational search-based hyperplane clustering algorithm," IEEE Transactions on Fuzzy Systems, vol. 20, no. 2, pp. 305-317, 2012.
[6] W. Zou, C. Li, and N. Zhang, "A T-S fuzzy model identification approach based on a modified inter type-2 FRCM algorithm," IEEE Trans. on Fuzzy Syst., vol. 26, no. 3, pp. 1104-1113, 2017.
[7] V. E. Bening and V. Y. Korolev, "On an application of the student distribution in the theory of probability and mathematical statistics," Theory of Probability & Its Applications, vol. 49, no. 3, pp. 377-391, 2005.
[8] R. Gribonval, "Should penalized least squares regression be interpreted as maximum a posteriori estimation?" IEEE Trans. on Signal Process., vol. 59, no. 5, pp. 2405-2410, 2011.
[9] E. H. Kim, S. K. Oh, and W. Pedrycz, "Design of reinforced interval type-2 fuzzy c-means-based fuzzy classifier," IEEE Trans. on Fuzzy Syst., vol. 26, no. 5, pp. 3054-3068, 2017.
[10] M. S. Chen and S. W. Wang, "Fuzzy clustering analysis for optimizing fuzzy membership functions," Fuzzy Sets and Systems, vol. 103, no. 2, pp. 239-254, 1999.