Curse of Dimensionality for TSK Fuzzy Neural Networks: Explanation and Solutions
Yuqi Cui, Dongrui Wu and Yifan Xu
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology
Wuhan, China
{yqcui, drwu, yfxu}@hust.edu.cn

Abstract—The Takagi-Sugeno-Kang (TSK) fuzzy system with Gaussian membership functions (MFs) is one of the most widely used fuzzy systems in machine learning. However, it usually has difficulty handling high-dimensional datasets. This paper explores why TSK fuzzy systems with Gaussian MFs may fail on high-dimensional inputs. After transforming the defuzzification to an equivalent form of the softmax function, we find that the poor performance is due to the saturation of softmax. We show that two defuzzification operations, LogTSK and HTSK, the latter of which is first proposed in this paper, can avoid the saturation. Experimental results on datasets with various dimensionalities validated our analysis and demonstrated the effectiveness of LogTSK and HTSK.
Index Terms—Mini-batch gradient descent, fuzzy neural network, high-dimensional TSK fuzzy system, HTSK, LogTSK
I. INTRODUCTION
Takagi-Sugeno-Kang (TSK) fuzzy systems [1] have achieved great success in numerous machine learning applications, including both classification and regression. Since a TSK fuzzy system is equivalent to a five-layer neural network [2], [3], it is also known as a TSK fuzzy neural network.

Fuzzy clustering [4]-[6] and evolutionary algorithms [7], [8] have been used to determine the parameters of TSK fuzzy systems on small datasets. However, their computational cost may be too high for big data. Inspired by its great success in deep learning [9]-[11], mini-batch gradient descent (MBGD) based optimization was recently proposed for training TSK fuzzy systems [12], [13].

Traditional optimization algorithms for TSK fuzzy systems use grid partition to partition the input space into different fuzzy regions, whose number grows exponentially with the input dimensionality. A more popular and flexible way is clustering-based partition, e.g., fuzzy c-means (FCM) [14], EWFCM [15], ESSC [4], [16] and SESSC [6], in which the fuzzy sets in different rules are independent and optimized separately.

Although the combination of MBGD-based optimization and clustering-based rule partition can handle the problem of optimizing antecedent parameters on high-dimensional datasets, TSK fuzzy systems still have difficulty achieving acceptable performance. The main reason is the curse of dimensionality, which affects all machine learning models. When the input dimensionality is high, the distances between data points become very similar [17]. TSK fuzzy systems usually use distance-based approaches to compute membership grades, so on high-dimensional datasets the fuzzy partitions may collapse. For instance, FCM is known to have trouble handling high-dimensional datasets, because the membership grades of different clusters become similar, leading all centers to move to the center of gravity [18].

Most previous works used feature selection or dimensionality reduction to cope with high dimensionality. Model-agnostic feature selection or dimensionality reduction algorithms, such as Relief [19] and principal component analysis (PCA) [20], [21], can filter the features before feeding them into TSK models. Neural networks pre-trained on large datasets can also be used as feature extractors to generate high-level features with low dimensionality [22], [23].

There are also approaches that select the fuzzy sets in each rule, so that rules may have different numbers of antecedents. For instance, Alcala-Fdez et al. proposed an association-analysis based algorithm to select the most representative patterns as rules [24]. Cózar et al. further improved it by proposing a local search algorithm to select the optimal fuzzy regions [25]. Xu et al. proposed to use the attribute weights learned by soft subspace fuzzy clustering to remove fuzzy sets with low weights and build a concise TSK fuzzy system [4]. However, few approaches directly train TSK models on high-dimensional datasets.

Our previous experiments found that when using MBGD-based optimization, the initialization of the standard deviation of the Gaussian membership functions (MFs), σ, is very important for high-dimensional datasets, and a larger σ may lead to better performance. In this paper, we demonstrate that this improvement is due to the reduction of the saturation caused by the increase of dimensionality.
Furthermore, we validate two convenient approaches to alleviate the saturation.

Our main contributions are:

• To the best of our knowledge, we are the first to discover that the curse of dimensionality in TSK modeling is due to the saturation of the softmax function. As a result, there exists an upper bound on the number of rules that each input can fire. Furthermore, the loss landscape of a saturated TSK system is more rugged, leading to worse generalization.

• We demonstrate that the initialization of σ should be correlated with the input dimensionality to avoid saturation. Based on this, we propose a high-dimensional TSK (HTSK) algorithm, which can be viewed as a new defuzzification operation or initialization strategy.

• We validate LogTSK [23] and our proposed HTSK on datasets with a large range of dimensionality. The results indicate that HTSK and LogTSK can not only avoid saturation, but are also more accurate and more robust than the vanilla TSK algorithm with simple initialization.

The remainder of this paper is organized as follows: Section II introduces TSK fuzzy systems and the saturation phenomenon on high-dimensional datasets. Section III introduces the details of LogTSK and our proposed HTSK. Section IV validates the performance of LogTSK and HTSK on datasets with various dimensionalities. Section V draws conclusions.
II. TRADITIONAL TSK FUZZY SYSTEMS ON HIGH-DIMENSIONAL DATASETS
This section introduces the details of TSK fuzzy systems with Gaussian MFs [26], the equivalence between defuzzification and the softmax function, and the saturation phenomenon of softmax on high-dimensional datasets.
A. TSK Fuzzy Systems
Let the training dataset be $\mathcal{D} = \{\mathbf{x}_n, y_n\}_{n=1}^{N}$, in which $\mathbf{x}_n = [x_{n,1}, \ldots, x_{n,D}]^T \in \mathbb{R}^{D \times 1}$ is a $D$-dimensional feature vector, and $y_n \in \{1, 2, \ldots, C\}$ can be the corresponding class label for a $C$-class classification problem, or $y_n \in \mathbb{R}$ for a regression problem.

Suppose a $D$-input single-output TSK fuzzy system has $R$ rules. The $r$-th rule can be represented as:

Rule$_r$: IF $x_1$ is $X_{r,1}$ and $\cdots$ and $x_D$ is $X_{r,D}$, THEN
$$y_r(\mathbf{x}) = b_{r,0} + \sum_{d=1}^{D} b_{r,d} x_d, \quad (1)$$
where $X_{r,d}$ ($r = 1, \ldots, R$; $d = 1, \ldots, D$) is the MF for the $d$-th attribute in the $r$-th rule, and $b_{r,d}$, $d = 0, \ldots, D$, are the consequent parameters. Note that here we only take single-output TSK fuzzy systems as an example, but the phenomenon and the conclusions can also be extended to multi-output TSK systems.

Consider Gaussian MFs. The membership degree of $x_d$ on $X_{r,d}$ is:
$$\mu_{X_{r,d}}(x_d) = \exp\left(-\frac{(x_d - m_{r,d})^2}{2\sigma_{r,d}^2}\right), \quad (2)$$
where $m_{r,d}$ and $\sigma_{r,d}$ are the center and the standard deviation of the Gaussian MF $X_{r,d}$, respectively.

The final output of the TSK fuzzy system is:
$$y(\mathbf{x}) = \frac{\sum_{r=1}^{R} f_r(\mathbf{x}) y_r(\mathbf{x})}{\sum_{i=1}^{R} f_i(\mathbf{x})}, \quad (3)$$
where
$$f_r(\mathbf{x}) = \prod_{d=1}^{D} \mu_{X_{r,d}}(x_d) = \exp\left(-\sum_{d=1}^{D} \frac{(x_d - m_{r,d})^2}{2\sigma_{r,d}^2}\right) \quad (4)$$
is the firing level of Rule$_r$. We can also re-write (3) as:
$$y(\mathbf{x}) = \sum_{r=1}^{R} \bar{f}_r(\mathbf{x}) y_r(\mathbf{x}), \quad (5)$$
where
$$\bar{f}_r(\mathbf{x}) = \frac{f_r(\mathbf{x})}{\sum_{i=1}^{R} f_i(\mathbf{x})} \quad (6)$$
is the normalized firing level of Rule$_r$. (5) is the defuzzification operation of TSK fuzzy systems.

In this paper, we use $k$-means clustering to initialize the antecedent parameters $m_{r,d}$, and MBGD to optimize the parameters $b_{r,d}$, $m_{r,d}$ and $\sigma_{r,d}$. More specifically, we run $k$-means clustering and assign the $R$ cluster centers to $m_{r,d}$ as the centers of the rules. We use different initializations of $\sigma_{r,d}$ to validate their influence on the performance of TSK models on high-dimensional datasets. He initialization [27] is used for the consequent parameters.
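To make (1)-(6) concrete, below is a minimal NumPy sketch of the forward pass of such a TSK fuzzy system. It is our own illustration, not the authors' released implementation; all names (tsk_forward, centers, sigmas, b) are ours.

```python
import numpy as np

def tsk_forward(x, centers, sigmas, b):
    """Forward pass of a single-output TSK fuzzy system with Gaussian MFs.

    x:       (D,)     input vector
    centers: (R, D)   Gaussian MF centers m_{r,d}
    sigmas:  (R, D)   Gaussian MF standard deviations sigma_{r,d}
    b:       (R, D+1) consequent parameters; b[:, 0] is the bias b_{r,0}
    """
    # Firing levels, Eq. (4): f_r = exp(-sum_d (x_d - m_{r,d})^2 / (2 sigma_{r,d}^2))
    Z = -np.sum((x - centers) ** 2 / (2 * sigmas ** 2), axis=1)
    f = np.exp(Z)
    f_bar = f / f.sum()            # normalized firing levels, Eq. (6)
    y_r = b[:, 0] + b[:, 1:] @ x   # rule consequents, Eq. (1)
    return f_bar @ y_r             # defuzzification, Eq. (5)

# Example: R = 3 rules on a D = 4 input
rng = np.random.default_rng(0)
print(tsk_forward(rng.standard_normal(4), rng.standard_normal((3, 4)),
                  np.ones((3, 4)), rng.standard_normal((3, 5))))
```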
B. TSK Fuzzy Systems on High-Dimensional Datasets

When using Gaussian MFs and the product t-norm, we can re-write (6) as:
$$\bar{f}_r(\mathbf{x}) = \frac{f_r(\mathbf{x})}{\sum_{i=1}^{R} f_i(\mathbf{x})} = \frac{\exp\left(-\sum_{d=1}^{D} \frac{(x_d - m_{r,d})^2}{2\sigma_{r,d}^2}\right)}{\sum_{i=1}^{R} \exp\left(-\sum_{d=1}^{D} \frac{(x_d - m_{i,d})^2}{2\sigma_{i,d}^2}\right)}. \quad (7)$$

Replacing $-\sum_{d=1}^{D} \frac{(x_d - m_{r,d})^2}{2\sigma_{r,d}^2}$ with $Z_r$, we can observe that $\bar{f}_r$ is a typical softmax function:
$$\bar{f}_r(\mathbf{x}) = \frac{\exp(Z_r)}{\sum_{i=1}^{R} \exp(Z_i)}, \quad (8)$$
where $Z_r < 0$, $\forall \mathbf{x}$. We can also show that, as the dimensionality increases, $Z_r$ decreases, which causes the saturation of softmax [28].

Let $\mathbf{Z} = [Z_1, \ldots, Z_R]$ and $\bar{\mathbf{f}} = [\bar{f}_1, \ldots, \bar{f}_R]$. In a three-rule TSK fuzzy system for a low-dimensional task, if, for instance, $\mathbf{Z} = [-1, -2, -3]$, then $\bar{\mathbf{f}} \approx [0.665, 0.245, 0.090]$. As the dimensionality increases, $\mathbf{Z}$ may grow in magnitude to, for example, $[-10, -20, -30]$, and then $\bar{\mathbf{f}} \approx [1, 4.5 \times 10^{-5}, 2.1 \times 10^{-9}]$, which means the final prediction is dominated by a single rule. In other words, $\bar{\mathbf{f}}$ in (8) with high-dimensional inputs tends to give a non-zero firing level only to the rule with the maximum $Z_r$.

In order to avoid numeric underflow, we compute the normalized firing level by a common trick:
$$\bar{f}_r(\mathbf{x}) = \frac{\exp(Z_r - Z_{\max})}{\sum_{i=1}^{R} \exp(Z_i - Z_{\max})}, \quad (9)$$
where $Z_{\max} = \max(Z_1, \ldots, Z_R)$. In this paper, we consider a rule to be fired by $\mathbf{x}$ when its normalized firing level $\bar{f}_r(\mathbf{x})$ exceeds a small threshold.

We generate a two-class toy dataset following the Gaussian distribution $x_i \sim N(0, 1)$, with the dimensionality varying from 5 to 2,000, for pilot experiments. The labels are also generated randomly. We initialize σ following a Gaussian distribution centered at $h$, $h = 1, 5, 10, 50$, and train TSK models with different $R$ for 30 epochs. The number of rules fired by each input for different dimensionalities and training epochs is shown in Fig. 1. The number of fired rules decreases rapidly as the dimensionality increases when $h = 1$. For a particular input dimensionality $D$, there exists an upper bound on the number of fired rules, i.e., a larger $R$ does not always increase the number of fired rules. Increasing $h$ can mitigate the saturation phenomenon to a certain extent and raise the upper bound on the number of fired rules.

Fig. 1. The average number of fired rules versus the input dimensionality on randomly generated datasets, for $R = 10, 50, 100, 150, 200$ and $h = 1, 5, 10, 50$. σ of the TSK models is initialized by a Gaussian distribution centered at $h$. The first and second columns represent the model before training and after 30 epochs of training, respectively.

Although each high-dimensional input feature vector can only fire a limited number of rules due to saturation, different inputs may fire different subsets of rules, which means every rule is useful to the final predictions. We compute the average normalized firing level of the $r$-th rule during training by:
$$A_r = \frac{1}{N} \sum_{n=1}^{N} \bar{f}_r(\mathbf{x}_n). \quad (10)$$
We train TSK models with 60 rules and compute the 5%-95% percentiles of $A_r$, $r = 1, \ldots, R$, during training on the dataset Books from the Amazon product review datasets. The details of this dataset will be introduced in Section IV-A. We repeat the experiments ten times and show the average results in Fig. 2. Except for a small number of rules with high $A_r$, most rules barely contribute to the predictions. This phenomenon does not change as the training goes on.

Fig. 2. Different percentiles of $A_r$, $r = 1, \ldots, R$, versus the training epochs.
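The saturation described above is easy to reproduce numerically. The following sketch (ours; the firing threshold of $10^{-5}$ and all sizes are illustrative choices, not values from the paper) counts the rules whose normalized firing level, computed with the stable form (9), is non-negligible as $D$ grows:

```python
import numpy as np

rng = np.random.default_rng(0)
R, h = 30, 1.0                                     # rules and sigma scale (illustrative)

for D in [5, 50, 500, 2000]:
    x = rng.standard_normal(D)                     # x ~ N(0, 1), as in the pilot experiment
    m = rng.standard_normal((R, D))                # random rule centers
    Z = -np.sum((x - m) ** 2, axis=1) / (2 * h ** 2)
    f_bar = np.exp(Z - Z.max())                    # subtract Z_max to avoid underflow, Eq. (9)
    f_bar /= f_bar.sum()
    fired = int((f_bar > 1e-5).sum())              # illustrative firing threshold
    print(f"D = {D:4d}: mean |Z_r| = {np.abs(Z).mean():8.1f}, fired rules = {fired}/{R}")
```

Running it shows $|Z_r|$ growing roughly linearly with $D$ while the number of fired rules collapses toward one, matching the trend in Fig. 1.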
C. Enhance the Performance of TSK Fuzzy Systems on High-Dimensional Datasets

The easiest way to mitigate saturation is to increase the scale of σ. As indicated by (7) and (8), increasing the scale of σ also increases the value of $Z_r$, which avoids saturation. Similar tricks have already been used for training TSK models with fuzzy clustering algorithms, such as FCM [14], ESSC [16] and SESSC [6]. The parameter σ is computed by:
$$\sigma_{r,d} = h\left[\sum_{n=1}^{N} U_{n,r} (x_{n,d} - V_{r,d})^2 \Big/ \sum_{i=1}^{N} U_{i,r}\right]^{1/2}, \quad (11)$$
where $U_{n,r}$ is the membership grade of $\mathbf{x}_n$ in the $r$-th cluster, $V_{r,\cdot}$ is the center of the $r$-th cluster, and $h$ is used to adjust the scale of $\sigma_{r,d}$. The larger $h$ is, the smaller $|Z_r|$ is. For MBGD-based optimization, we can directly initialize σ with a proper value to avoid saturation in training. However, the proper scale parameter $h$ for σ usually depends on the characteristics of the task, which requires trial-and-error, or time-consuming cross-validation.

A better way is to use a scaling factor depending on the dimensionality $D$ to constrain the range of $|Z_r|$. A similar approach is used in the Transformer [29], in which a scaling factor $1/\sqrt{d_k}$ is used to constrain the value of $QK^T$. When the distribution of the constrained $Z_r$ no longer depends on the dimensionality $D$, all we have to do is to choose one proper initialization range of σ suitable for most datasets.

Alternatively, we can use other normalization approaches which are insensitive to the scale of $Z_r$. For instance, we can replace the defuzzification by
$$\bar{f}_r(Z_r) = \frac{Z_r}{\|[Z_1, \ldots, Z_R]\|_1} \quad (12)$$
or
$$\bar{f}_r(Z_r) = \frac{Z_r}{\|[Z_1, \ldots, Z_R]\|_2}, \quad (13)$$
so that $\bar{f}_r(Z_r) = \bar{f}_r(hZ_r)$, $\forall h > 0$.
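A quick way to check the scale invariance of (12) and (13) is shown below; this is our own sketch, and the function names are ours.

```python
import numpy as np

def l1_defuzz(Z):
    return Z / np.abs(Z).sum()          # Eq. (12)

def l2_defuzz(Z):
    return Z / np.sqrt((Z ** 2).sum())  # Eq. (13)

Z = np.array([-1.0, -2.0, -3.0])
for defuzz in (l1_defuzz, l2_defuzz):
    # Invariant to any positive rescaling h of Z, unlike softmax
    assert np.allclose(defuzz(Z), defuzz(10.0 * Z))
print(l1_defuzz(Z), l2_defuzz(Z))
```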
III. DEFUZZIFICATION FOR HIGH-DIMENSIONAL PROBLEMS

This section introduces LogTSK, proposed by Du et al. [23], and our proposed HTSK. Both are suitable for high-dimensional problems.
A. LogTSK
Recently, an algorithm called TCRFN was proposed for predicting driver fatigue using the combination of a convolutional neural network (CNN) and a recurrent TSK fuzzy system [23]. Within TCRFN, a logarithm transformation of $f_r$ was proposed to "amplify the small differences on firing levels". The firing level and normalized firing level of the $r$-th rule in TCRFN are:
$$f_r^{\log} = -\frac{1}{\ln f_r} = \frac{1}{\sum_{d=1}^{D} \frac{(x_d - m_{r,d})^2}{2\sigma_{r,d}^2}}, \qquad \bar{f}_r^{\log} = \frac{f_r^{\log}}{\sum_{i=1}^{R} f_i^{\log}}. \quad (14)$$

The final output is:
$$y(\mathbf{x}) = \sum_{r=1}^{R} \bar{f}_r^{\log}(\mathbf{x}) y_r(\mathbf{x}). \quad (15)$$

We denote the TSK fuzzy system with this log-transformed defuzzification as LogTSK in this paper. Substituting $Z_r$ into (14) gives
$$\bar{f}_r^{\log} = \frac{-1/Z_r}{-\sum_{i=1}^{R} 1/Z_i} = \frac{-1/Z_r}{\|[1/Z_1, \ldots, 1/Z_R]\|_1}, \quad (16)$$
i.e., LogTSK avoids the saturation by changing the normalization from softmax to $L_1$ normalization. Since $L_1$ normalization is not affected by the scale of $Z_r$, LogTSK makes TSK fuzzy systems trainable on high-dimensional datasets.
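A minimal sketch of the LogTSK firing-level computation in (14) and (16) follows; it is our own illustration, not the TCRFN code, and the names are ours.

```python
import numpy as np

def logtsk_firing_levels(x, centers, sigmas):
    """Normalized firing levels of LogTSK, Eqs. (14) and (16)."""
    # -Z_r = sum_d (x_d - m_{r,d})^2 / (2 sigma_{r,d}^2) > 0
    neg_Z = np.sum((x - centers) ** 2 / (2 * sigmas ** 2), axis=1)
    f_log = 1.0 / neg_Z               # f_r^log = -1 / Z_r
    return f_log / f_log.sum()        # L1 normalization instead of softmax

# With identical rules, every rule fires equally (uniform firing levels)
print(logtsk_firing_levels(np.zeros(100), np.ones((5, 100)), np.ones((5, 100))))
```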
B. Our Proposed HTSK

We propose a simple but very effective approach, HTSK (high-dimensional TSK), to enable TSK fuzzy systems to deal with datasets of any dimensionality, by avoiding the saturation in (8). HTSK constrains the scale of $|Z_r|$ by simply changing the sum operator in $Z_r$ to an average:
$$Z'_r = -\frac{1}{D}\sum_{d=1}^{D} \frac{(x_d - m_{r,d})^2}{2\sigma_{r,d}^2}. \quad (17)$$

We can understand this transformation from the perspective of defuzzification. (5) can be rewritten as:
$$y(\mathbf{x}) = \sum_{r=1}^{R} \bar{f}'_r(\mathbf{x}) y_r(\mathbf{x}), \quad (18)$$
where
$$\bar{f}'_r(\mathbf{x}) = \frac{f_r(\mathbf{x})^{1/D}}{\sum_{i=1}^{R} f_i(\mathbf{x})^{1/D}} = \frac{\exp(Z'_r)}{\sum_{i=1}^{R} \exp(Z'_i)}. \quad (19)$$

In this way, the scale of $|Z'_r|$ no longer depends on the dimensionality $D$. Even in a very high-dimensional space, if the input feature vectors are properly pre-processed (z-score or zero-one normalization, etc.), we can still guarantee the stability of HTSK.

HTSK is equivalent to adaptively increasing σ by a factor of $\sqrt{D}$ in the vanilla TSK, i.e., the initialization of σ should be correlated with the input dimensionality. The vanilla TSK fuzzy system is a special case of HTSK when setting $D = 1$.
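HTSK changes only one line relative to the vanilla TSK: the sum over dimensions becomes a mean. Below is a sketch (ours), together with a numerical check of the √D equivalence stated above.

```python
import numpy as np

def htsk_firing_levels(x, centers, sigmas):
    """Normalized firing levels of HTSK, Eqs. (17)-(19)."""
    Z = -np.mean((x - centers) ** 2 / (2 * sigmas ** 2), axis=1)  # mean, not sum
    e = np.exp(Z - Z.max())            # numerically stable softmax
    return e / e.sum()

def vanilla_firing_levels(x, centers, sigmas):
    Z = -np.sum((x - centers) ** 2 / (2 * sigmas ** 2), axis=1)
    e = np.exp(Z - Z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
D = 500
x, m, s = rng.standard_normal(D), rng.standard_normal((5, D)), np.ones((5, D))
# HTSK with sigma equals the vanilla TSK with sigma scaled by sqrt(D)
assert np.allclose(htsk_firing_levels(x, m, s),
                   vanilla_firing_levels(x, m, np.sqrt(D) * s))
print(htsk_firing_levels(x, m, s))
```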
IV. RESULTS

In this section, we validate the performance of LogTSK and our proposed HTSK on multiple datasets with varying sizes and input dimensionalities.
A. Datasets
Fourteen datasets with the dimensionality $D$ varying from 10 to 4,955 were used. Their details are summarized in Table I. For FashionMNIST and MNIST, we used the official training-test partition. For the other datasets, we randomly selected 70% of the samples for training and the remaining 30% for test.

TABLE I
SUMMARY OF THE FOURTEEN DATASETS.

Dataset         Num. of features   Num. of samples   Num. of classes
Vowel           10                 990               11
Vehicle         18                 596               4
Biodeg          41                 1,055             2
Sensit          100                78,823            3
USPS            256                7,291             10
Books           400                2,000             2
DVD             400                1,999             2
ELEC            400                1,998             2
Kitchen         400                1,999             2
Isolet          617                1,560             26
MNIST           784                60,000            10
FashionMNIST    784                60,000            10
Colon           2,000              62                2
Gisette         4,955                                2

Dataset sources: https://archive.ics.uci.edu/ml/datasets/QSAR+biodegradation, https://jmcauley.ucsd.edu/data/amazon/, https://archive.ics.uci.edu/ml/datasets/isolet, http://yann.lecun.com/exdb/mnist/, https://github.com/zalandoresearch/fashion-mnist.

B. Algorithms

We compared the following five algorithms:

• PCA-TSK: We first perform PCA, keeping only the components that capture 95% of the variance, to reduce the dimensionality, and then train the vanilla TSK fuzzy system introduced in Section II. The parameter σ is initialized following a Gaussian distribution with mean 1.

• TSK-h: This is the vanilla TSK fuzzy system introduced in Section II. The parameter σ is initialized following a Gaussian distribution with mean h. We set h to {1, 5, 10, 50} to validate the influence of saturation on the generalization performance.

• TSK-BN-UR: This is the TSK-BN-UR algorithm in [13]. The weight for UR is selected on the validation set. The parameter σ is initialized following a Gaussian distribution with mean 1.

• LogTSK: TSK with the log-transformed defuzzification introduced in Section III-A. The parameter σ is initialized following a Gaussian distribution with mean 1. Other parameters are initialized by the method described in Section II-A.

• HTSK: This is our proposed HTSK in Section III-B. The parameter σ is initialized following a Gaussian distribution with mean 1.

All parameters except σ were initialized as described in Section II-A. All models were trained using MBGD-based optimization with the Adam optimizer [10]. The learning rate was set to 0.01, which was the best learning rate chosen by cross-validation on most datasets. The batch size was set to 2,048 for MNIST and FashionMNIST, and 512 for all other datasets; if the batch size was larger than the total number of samples $N_t$ in the training set, it was reduced to $N_t$. We randomly selected 10% of the samples from the training set as the validation set for early stopping. The maximum number of epochs was set to 200, and the patience of early stopping was 20. The best model on the validation set was used for testing. We ran all TSK algorithms with the number of rules $R = 30$. All algorithms were repeated ten times and the average performance is reported.

Note that the aim of this paper is not to pursue state-of-the-art performance on each dataset, so we did not use cross-validation to select the best hyper-parameters on each dataset, such as the number of rules. We only aim to demonstrate why TSK fuzzy systems perform poorly on high-dimensional datasets, and the improvement of HTSK and LogTSK.
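For reference, the training protocol above can be summarized in a compact PyTorch sketch of an HTSK classifier trained with MBGD and Adam. This is our own illustration: the class name, the random initialization of m (the paper uses k-means centers), and the random mini-batch are all placeholders.

```python
import torch
import torch.nn as nn

class HTSK(nn.Module):
    """Minimal HTSK classifier (a sketch, not the authors' code)."""
    def __init__(self, in_dim, n_rules, n_classes):
        super().__init__()
        self.m = nn.Parameter(torch.randn(n_rules, in_dim))     # paper: k-means centers
        self.sigma = nn.Parameter(torch.ones(n_rules, in_dim))  # paper: Gaussian init, mean 1
        self.cons = nn.Linear(in_dim, n_rules * n_classes)      # consequent parameters
        self.n_rules, self.n_classes = n_rules, n_classes

    def forward(self, x):                                       # x: (B, D)
        d2 = (x.unsqueeze(1) - self.m) ** 2 / (2 * self.sigma ** 2)
        f_bar = torch.softmax(-d2.mean(dim=2), dim=1)           # HTSK firing levels, Eq. (19)
        y_r = self.cons(x).view(-1, self.n_rules, self.n_classes)
        return (f_bar.unsqueeze(2) * y_r).sum(dim=1)            # (B, C)

model = HTSK(in_dim=400, n_rules=30, n_classes=2)               # R = 30 as in the text
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)       # lr = 0.01 as in the text
criterion = nn.CrossEntropyLoss()

xb = torch.randn(512, 400)                                      # batch size 512 as in the text
yb = torch.randint(0, 2, (512,))
optimizer.zero_grad()
criterion(model(xb), yb).backward()                             # one MBGD step
optimizer.step()
```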
C. Generalization Performances

The average test accuracies of the eight TSK algorithms with 30 rules are shown in Table II. The best accuracy on each dataset is marked in bold. We can observe that:

• On average, HTSK and LogTSK had similar performance, and both outperformed the other TSK algorithms over a large range of dimensionalities. TSK-5 and TSK-10 performed well on datasets within a certain range of dimensionality, but they were not always optimal when the dimensionality changed. For instance, on Colon, h = 50 was better than h = 5 or 10, whereas on Vowel, h = 1 or 5 was better than h = 10 or 50. However, HTSK and LogTSK always achieved optimal or close-to-optimal performance on those datasets. The results also indicate that the initialization of h should be correlated with D, and h = √D is a robust initialization strategy for datasets with a large range of dimensionality.

• PCA-TSK performed the worst, which may be due to the loss of information during dimensionality reduction. This also shows the necessity of training TSK models directly on high-dimensional features.

• In [13], TSK-BN-UR outperformed the vanilla TSK on low-dimensional datasets, but this paper shows that it does not cope well with high-dimensional datasets.

We also show the test accuracies of the eight TSK algorithms with different numbers of rules in Fig. 3. HTSK and LogTSK outperformed the other TSK algorithms, regardless of R.

D. Number of Fired Rules
We analyzed the number of rules fired by each input under HTSK and LogTSK, and show the results in Fig. 4. The dataset used here is the same as the one in Fig. 1. Both figures show that in HTSK and LogTSK, each high-dimensional input fires almost all rules, even with a small initial σ. However, when the number of rules is large, for instance, R = 200, LogTSK fires fewer than 200 rules, whereas HTSK still fires all 200. This may be caused by the $L_1$ normalization of LogTSK, which makes the normalized firing levels sparser than HTSK's.

E. Gradient and Loss Landscape
Figs. 1 and 4 show that a sufficiently large h (h ≥ 10) can counteract most of the influence of saturation when D is below a few thousand. Therefore, the performances of TSK-5, TSK-10 and TSK-50 are very similar to those of HTSK and LogTSK on datasets with dimensionalities in that range.

To study whether the limited number of fired rules is the only reason for the decrease in generalization performance, we further analyzed the gradients and the loss landscape during training. Because the scale of σ affects the gradients' absolute values, we only compare the $L_2$ norms of the gradients of TSK-1, HTSK, and LogTSK, for which σ was initialized following a Gaussian distribution with mean 1. We recorded the $L_2$ norms of the gradients of the antecedent parameters m and σ during training on the Books dataset. Figs. 5(a) and (b) show that the gradients of the antecedent parameters of TSK-1 are significantly larger than those of HTSK and LogTSK, especially in the initial training phase.

We also visualize the loss landscape along the gradient direction using the approach in [30]. Specifically, for each update step, we compute the gradient w.r.t. the loss and take one step further using a fixed step, η × the gradient (η = 1). Then, we record the loss as the parameters move in that direction. When the initial parameters of different runs are the same, the variation of this loss reflects the smoothness of the loss landscape.
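As we read it, the probing procedure reduces to evaluating the loss at θ − η·g for the current gradient g at every update. Below is a toy sketch (ours) with a generic loss interface; the quadratic objective is only a stand-in for the TSK training loss, and the step sizes are illustrative.

```python
import numpy as np

def probe_loss(loss_fn, grad_fn, theta, eta=1.0):
    """Loss after one fixed step of size eta along the current gradient direction."""
    return loss_fn(theta - eta * grad_fn(theta))

# Toy stand-in: an ill-conditioned quadratic loss
A = np.diag([1.0, 25.0])
loss = lambda t: 0.5 * t @ A @ t
grad = lambda t: A @ t

theta = np.array([1.0, 1.0])
for step in range(5):                      # a few MBGD-like updates
    print(f"step {step}: probed loss = {probe_loss(loss, grad, theta, eta=0.05):.4f}")
    theta -= 0.05 * grad(theta)            # actual parameter update
```

Large, erratic probed losses across runs indicate a rugged landscape; smooth monotone decreases indicate a flat one.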
TABLE II
AVERAGE ACCURACIES OF THE EIGHT TSK ALGORITHMS WITH 30 RULES ON THE FOURTEEN DATASETS.

Dataset         PCA-TSK   TSK-BN-UR   TSK-1    TSK-5    TSK-10   TSK-50   LogTSK   HTSK
Vowel           80.81     87.21       87.91    87.58    55.49    49.83    85.42
Vehicle         70.28     71.73       72.68    75.31    73.80    72.07    75.25
Biodeg          84.86
Sensit
USPS
Books           73.95     75.55       76.42    79.28    78.70    78.83    78.87
DVD             75.53     75.32       76.05    78.97    78.67    78.42
Elec            75.68     78.72       79.45    81.28    81.45
Kitchen
Isolet
MNIST
FashionMNIST
Colon
Gisette         93.66     94.14       95.80    95.38
TABLE III
ACCURACY RANKS OF THE EIGHT TSK ALGORITHMS WITH 30 RULES ON THE FOURTEEN DATASETS.

Dataset         PCA-TSK   TSK-BN-UR   TSK-1   TSK-5   TSK-10   TSK-50   LogTSK   HTSK
Vowel           6         4           2       3       7        8        5        1
Vehicle         8         7           5       2       4        6        3        1
Biodeg          6         1           5       3       8        7        4        2
Sensit          7         5           8       4       3        6        1        2
USPS            8         7           6       4       3        5        2        1
Books           8         7           6       2       5        4        3        1
DVD             7         8           6       3       4        5        1        1
Elec            8         7           6       4       2        1        3        5
Kitchen         8         7           6       1       5        4        3        2
Isolet          6         7           8       5       1        4        2        3
MNIST           7         6           8       4       2        5        1        3
FashionMNIST    7         6           8       4       2        5        1        3
Colon           8         7           5       4       6        1        3        1
Gisette         8         7           5       6       1        3        2        4
Average         7.3       6.1         6.0     3.5     3.8      4.6      2.4      2.1

Fig. 3. Test accuracies of the eight TSK algorithms versus the number of rules on each dataset (panels: Vowel, Vehicle, Biodeg, Sensit, USPS, Books, DVD, ELEC, Kitchen, Isolet, MNIST, FashionMNIST, ...).