Interpretable collaborative data analysis on distributed data
Akira Imakura, Hiroaki Inaba, Yukihiko Okada, Tetsuya Sakurai
University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8573, Japan
∗Email: [email protected]
Abstract
This paper proposes an interpretable non-model sharing collaborative data analysis method as one of the federated learning systems, an emerging technology for analyzing distributed data. Analyzing distributed data is essential in many applications, such as medical, financial, and manufacturing data analyses, due to privacy and confidentiality concerns. In addition, the interpretability of the obtained model plays an important role in practical applications of federated learning systems. By centralizing intermediate representations, which are individually constructed in each party, the proposed method obtains an interpretable model, achieving a collaborative analysis without revealing the individual data and learning models distributed over local parties. Numerical experiments indicate that the proposed method achieves better recognition performance for artificial and real-world problems than individual analysis.
1 Introduction

In many applications, e.g., medical, financial, and manufacturing data analyses, sharing the original data for analysis may be difficult due to privacy and confidentiality requirements. Distributed data analyses without revealing the individual data have recently attracted significant attention, resulting in federated learning systems including model share-type federated learning [19, 20, 22, 25] and non-model share-type collaborative data analysis [4, 14, 15, 37]. In addition, for practical applications, it is known that the interpretability (i.e., "the degree to which a human can understand the cause of a decision" according to Miller's definition [26]) of the obtained model plays an important role [11, 27].

A motivating example would be the distributed medical data analysis for employees of companies. In this scenario, employees (i.e., data samples) are distributed in multiple companies. Their medical and work records (i.e., features) are distributed in multiple parties, e.g., the records of medical treatments and checks are distributed in different medical institutions, and the work situations of the employees are stored in each company's personnel department. Due to the limited number of samples and features, the data in one party of one company could lack some useful information for analysis. Centralizing the data from multiple parties for collaborative analysis could help to learn more useful information and obtain high-quality predictions. However, due to privacy concerns, it is difficult to share individual medical records and work situations across multiple parties. A similar situation occurs in financial and manufacturing data analyses. Thus, collaborative data analysis for distributed data, which are partitioned according to samples and features, is essential and important.
Moreover, when companies aim to adopt policies or decisions according to analyses of machine-learning systems, the model should be interpretable; i.e., people need to understand the reasons why the system obtained such results [2]. This will allow people to make more useful decisions. Therefore, when distributed data analysis is used as a tool to support decision making, the model needs to be interpretable.

Federated learning is based on deep neural networks, and data collaboration analysis constructs a multi-layer model via intermediate representations. Thus, the interpretability of the obtained model is not high, which could limit its use in some application areas. To the best of our knowledge, there have been limited investigations on interpretable model construction for distributed data in the literature.

To meet the above needs of distributed data analysis and interpretability, we propose an interpretable non-model sharing collaborative data analysis on distributed data. The proposed method generates dimensionally-reduced intermediate representations from individual data in local parties, which are then shared instead of the individual data and models. The proposed method constructs an interpretable model for each party.

The main contributions of this paper are summarized as follows:
• The proposed method generates an interpretable model for distributed data based on sharing intermediate representations, without revealing the private data or sharing the model.
• The obtained interpretable model is based on the whole features of the distributed data, which is not possible in individual analysis.
• Each party can individually select an interpretable model according to its own needs.
• Numerical experiments on both artificial and real-world data show that the proposed method constructs an interpretable model with better recognition performance than individual analysis and comparable to that of centralized analysis.
In Section 2, we state the target distributed data and review related works. In Section 3, we propose a novel interpretable collaborative data analysis. Numerical results are reported in Section 4. Finally, in Section 5, we summarize the results and conclude the paper. Note that, throughout the paper, we use the MATLAB colon notation to refer to ranges of matrix elements.
In this paper, we consider the simple horizontal and vertical partitions. However, we note that the proposed method can also be applied to more complicated situations described in [15].

Let $m$ and $n$ denote the numbers of features and training data samples. In addition, let $X = [x_1, x_2, \dots, x_n]^T \in \mathbb{R}^{n \times m}$ and $Y = [y_1, y_2, \dots, y_n]^T \in \mathbb{R}^{n \times \ell}$ be the training dataset and the corresponding ground truth, respectively. The $n$ data samples are partitioned into $c$ institutions and the $m$ features are partitioned into $d$ parties as follows:
$$X = \begin{bmatrix} X_{1,1} & X_{1,2} & \cdots & X_{1,d} \\ X_{2,1} & X_{2,2} & \cdots & X_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X_{c,1} & X_{c,2} & \cdots & X_{c,d} \end{bmatrix}, \quad Y = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_c \end{bmatrix}. \qquad (1)$$
Then, the $(i,j)$-th party has the partial dataset and the corresponding ground truth,
$$X_{i,j} \in \mathbb{R}^{n_i \times m_j}, \quad Y_i \in \mathbb{R}^{n_i \times \ell}. \qquad (2)$$

Individual analysis of the dataset in a local party may not yield high-quality predictions due to the lack of feature information or insufficient samples. If the datasets can be centralized from multiple parties and analyzed as one dataset, i.e., centralized analysis, then we expect to achieve high-quality predictions. However, it is difficult to share individual data for centralization due to privacy and confidentiality concerns. All parties want to obtain an interpretable model that achieves prediction results competitive with centralized analysis without sharing the private dataset $X_{i,j}$.

Typical techniques for privacy-preserving distributed data analysis include cryptographic computations (or secure multi-party computation) [5, 9, 17], e.g., using fully homomorphic encryption [8], and methods using differential privacy [1, 6, 18], where randomization is used to protect the privacy of the original datasets. Recently, federated learning has been actively studied for distributed data analysis [19, 20, 25, 36], where the learning model is centralized while the original datasets remain distributed in local parties.
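The sample/feature partitioning of Eq. (1) can be sketched in a few lines of NumPy; the sizes below are toy values chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, c, d = 8, 6, 2, 3          # samples, features, institutions, parties (toy sizes)
X = rng.standard_normal((n, m))  # full training matrix (never materialized in practice)

# Split sample indices across c institutions and feature indices across d parties.
row_parts = np.array_split(np.arange(n), c)
col_parts = np.array_split(np.arange(m), d)

# X_blocks[i][j] is the partial dataset X_{i,j} held by the (i, j)-th party.
X_blocks = [[X[np.ix_(rows, cols)] for cols in col_parts] for rows in row_parts]
assert X_blocks[0][0].shape == (4, 2)

# Stacking all blocks back recovers X, i.e., the centralized dataset no single
# party is allowed to hold.
X_rebuilt = np.vstack([np.hstack(row) for row in X_blocks])
assert np.array_equal(X, X_rebuilt)
```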
Google first proposed the concept of federated learning in [19, 20], which is typically used for Android phone model updates [25]. Recently, there have been several efforts to improve federated learning, e.g., see [22, 36] and references therein. Note that, for federated learning, we may need to take care of the privacy of the original dataset due to the shared functional model [35]. Hence, non-model sharing-type methods, i.e., collaborative data analysis, have been proposed for supervised learning [14, 15] and feature selection [37]. A performance comparison between collaborative data analysis and federated learning is reported in [4].
In recent years, as machine learning has been used in various applications in society, there has been an active discussion on developing interpretable machine learning [2]. While regressions, rules, and decision trees have been considered interpretable machine-learning models, decision trees in particular have long been used in the context of decision support due to their high transparency [2, 11]. Also, the need for model transparency from various stakeholders has increased to replace the high-performance black-box models currently used for making predictions [29]. Hence, to create an interpretable model with high prediction accuracy, researchers have developed interpretable models that mimic the behavior of black-box models. In response to these current needs, there is an opinion that, since interpretability is a domain-specific concept, it is necessary to build models that consider ease of use and data structure [30]. Therefore, interpretable model construction algorithms need to be designed such that users are allowed to freely set the model according to their own needs.
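As a concrete illustration of why shallow decision trees are regarded as interpretable, the sketch below (using scikit-learn and its bundled iris data, not the paper's datasets) prints the learned model as a handful of explicit feature-threshold rules:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# A depth limit keeps the model human-readable: few rules, explicit thresholds.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

rules = export_text(
    tree, feature_names=["sepal_len", "sepal_wid", "petal_len", "petal_wid"]
)
print(rules)  # each branch is a readable "feature <= threshold" rule
```

Every prediction can be traced along one root-to-leaf path, which is exactly the transparency the surveyed works [2, 11] attribute to tree models.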
Here, we briefly introduce the algorithm of collaborative data analysis [15] and propose a novel interpretable non-model sharing collaborative data analysis on distributed data.
Collaborative data analysis has been proposed in [14, 15] for distributed data, together with a practical operation strategy to address privacy and confidentiality concerns. Here, we briefly introduce the algorithm based on the practical operation strategy.

In the practical operation strategy, collaborative data analysis is operated by two roles: user and analyst. Users have the private dataset $X_{i,j}$ and the corresponding ground truth $Y_i$, which need to be analyzed without sharing $X_{i,j}$. Each user individually constructs a dimensionally-reduced intermediate representation and shares it with the analyst. To allow each user to use an individual function for generating the intermediate representation, the analyst transforms the shared intermediate representations into an incorporable form called collaboration representations and analyzes them as one dataset.

Each user constructs the intermediate representation
$$\widetilde{X}_{i,j} = f_{i,j}(X_{i,j}) \in \mathbb{R}^{n_i \times \widetilde{m}_{i,j}},$$
where $f_{i,j}$ denotes a linear or nonlinear row-wise mapping function. A typical setting for $f_{i,j}$ is dimensionality reduction, with $\widetilde{m}_{i,j} < m_j$, including unsupervised [12, 24, 28] and supervised methods [7, 13, 23, 34]. To address privacy and confidentiality concerns, the function $f_{i,j}$ should be set such that:
• The private data $X_{i,j}$ can be obtained only if one has both the corresponding intermediate representation $\widetilde{X}_{i,j}$ and the mapping function $f_{i,j}$ or its approximation.
• The mapping function $f_{i,j}$ can be approximated only if one has both the input and output of $f_{i,j}$.
Then, the resulting intermediate representations $\widetilde{X}_{i,j}$ are centralized to the analyst instead of the original private data $X_{i,j}$ or the trained model.
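A per-party map $f_{i,j}$ of this kind can be sketched with a truncated-SVD (PCA-like) projection standing in for the LPP the paper uses later; the helper name `make_f` and all sizes are illustrative assumptions, not from the paper:

```python
import numpy as np

def make_f(X_local, k):
    """Return a row-wise linear map f(A) = A @ P built from the party's own data.

    P comes from a truncated SVD (a PCA-like stand-in for LPP); the party keeps
    P private and shares only the mapped output f(X_local).
    """
    Xc = X_local - X_local.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T                        # m_j x k projection, private to the party
    return lambda A: A @ P

rng = np.random.default_rng(0)
X_ij = rng.standard_normal((50, 10))    # private data of party (i, j)
f_ij = make_f(X_ij, k=4)

X_tilde = f_ij(X_ij)                    # intermediate representation, shared
assert X_tilde.shape == (50, 4)         # dimensionally reduced: 10 -> 4 features
```

Because only `X_tilde` leaves the party, recovering `X_ij` would require the private projection `P` as well, matching the two conditions above.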
By sharing the intermediate representations $\widetilde{X}_{i,j}$ while keeping the mapping function $f_{i,j}$ in each party, the collaborative data analysis can address the privacy and confidentiality concerns.

3.1.2 Training Phase: Construction and analysis of collaboration representations

Since $f_{i,j}$ depends on the user $(i,j)$, the analyst cannot analyze the shared intermediate representations as one dataset. To overcome this problem, the intermediate representations $\widetilde{X}_{i,j}$ are transformed into incorporable collaboration representations as follows:
$$\widehat{X}_i = g_i(\widetilde{X}_i) \in \mathbb{R}^{n_i \times \widehat{m}}, \quad \widetilde{X}_i = [\widetilde{X}_{i,1}, \widetilde{X}_{i,2}, \dots, \widetilde{X}_{i,d}] \in \mathbb{R}^{n_i \times \widetilde{m}_i},$$
where $g_i$ is a row-wise mapping function with $\widetilde{m}_i = \sum_{j=1}^{d} \widetilde{m}_{i,j}$ and $\widehat{m} = \min_i \widetilde{m}_i$.

To construct the mapping function $g_i$ for incorporable collaboration representations, an anchor dataset $X^{\mathrm{anc}} \in \mathbb{R}^{r \times m}$, which is a shareable dataset consisting of public data or randomly constructed dummy data, is introduced. The anchor dataset is shared with all users and is partitioned according to features, i.e.,
$$X^{\mathrm{anc}} = [X^{\mathrm{anc}}_{:,1}, X^{\mathrm{anc}}_{:,2}, \dots, X^{\mathrm{anc}}_{:,d}], \quad X^{\mathrm{anc}}_{:,j} \in \mathbb{R}^{r \times m_j}.$$
At the user side, applying each mapping function $f_{i,j}$ to the corresponding subset $X^{\mathrm{anc}}_{:,j}$ of the anchor dataset yields
$$\widetilde{X}^{\mathrm{anc}}_{i,j} = f_{i,j}(X^{\mathrm{anc}}_{:,j}) \in \mathbb{R}^{r \times \widetilde{m}_{i,j}},$$
which is centralized to the analyst. Then, the mapping function $g_i$ is constructed such that
$$\widehat{X}^{\mathrm{anc}}_i = g_i(\widetilde{X}^{\mathrm{anc}}_i) \in \mathbb{R}^{r \times \widehat{m}} \quad \text{s.t.} \quad \widehat{X}^{\mathrm{anc}}_i \approx \widehat{X}^{\mathrm{anc}}_{i'} \quad (i \neq i'),$$
where $\widetilde{X}^{\mathrm{anc}}_i = [\widetilde{X}^{\mathrm{anc}}_{i,1}, \widetilde{X}^{\mathrm{anc}}_{i,2}, \dots, \widetilde{X}^{\mathrm{anc}}_{i,d}]$.
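The alignment of anchor representations described above can be sketched as follows, with an ordinary-least-squares fit standing in for the total-least-squares formulation of [14, 15]; the shared target `Z` and all names and sizes are assumptions made for illustration:

```python
import numpy as np

def build_g(anchor_reps, k):
    """OLS stand-in for the paper's total-least-squares construction of g_i:
    pick a shared target Z, then per-institution maps G_i with
    anchor_reps[i] @ G_i ~= Z, so the collaboration representations match."""
    # Shared target: leading left singular vectors of the concatenated anchors.
    Z = np.linalg.svd(np.hstack(anchor_reps), full_matrices=False)[0][:, :k]
    return [np.linalg.lstsq(A, Z, rcond=None)[0] for A in anchor_reps]

rng = np.random.default_rng(1)
base = rng.standard_normal((30, 4))      # r = 30 anchor rows, latent dimension 4
# Each institution sees the same anchors through its own private mapping.
anchor_reps = [base @ rng.standard_normal((4, 5)) for _ in range(3)]

G = build_g(anchor_reps, k=4)
Zs = [A @ Gi for A, Gi in zip(anchor_reps, G)]

# After mapping, all institutions land (numerically) in the same space.
assert np.allclose(Zs[0], Zs[1]) and np.allclose(Zs[0], Zs[2])
```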
For computing $g_i$, the authors of [14, 15] introduced a practical method via a total least squares problem [16] when $g_i$ is linear, and also indicated an idea for the case when $g_i$ is nonlinear.

Finally, the obtained collaboration representations $\widehat{X}_i$ can be analyzed as one dataset,
$$\widehat{X} = \begin{bmatrix} \widehat{X}_1 \\ \widehat{X}_2 \\ \vdots \\ \widehat{X}_c \end{bmatrix} \in \mathbb{R}^{n \times \widehat{m}},$$
together with the shared ground truth $Y_i$, using supervised machine learning and deep learning methods. This yields a model $h$ such that $h(\widehat{X}) \approx Y$.

Let $X^{\mathrm{test}} \in \mathbb{R}^{s \times m}$ be a test dataset partitioned according to features and samples as
$$X^{\mathrm{test}} = \begin{bmatrix} X^{\mathrm{test}}_{1,1} & X^{\mathrm{test}}_{1,2} & \cdots & X^{\mathrm{test}}_{1,d} \\ X^{\mathrm{test}}_{2,1} & X^{\mathrm{test}}_{2,2} & \cdots & X^{\mathrm{test}}_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X^{\mathrm{test}}_{c,1} & X^{\mathrm{test}}_{c,2} & \cdots & X^{\mathrm{test}}_{c,d} \end{bmatrix}, \quad X^{\mathrm{test}}_{i,j} \in \mathbb{R}^{s_i \times m_j}.$$
Then, the intermediate representations $\widetilde{X}^{\mathrm{test}}_{i,j} = f_{i,j}(X^{\mathrm{test}}_{i,j})$ are constructed at the user side and shared with the analyst. At the analyst side, the predictions $Y^{\mathrm{test}}_i$ of $X^{\mathrm{test}}_i = [X^{\mathrm{test}}_{i,1}, X^{\mathrm{test}}_{i,2}, \dots, X^{\mathrm{test}}_{i,d}]$ are obtained by
$$Y^{\mathrm{test}}_i = h(g_i([\widetilde{X}^{\mathrm{test}}_{i,1}, \widetilde{X}^{\mathrm{test}}_{i,2}, \dots, \widetilde{X}^{\mathrm{test}}_{i,d}]))$$
via the intermediate and collaboration representations, and are returned to the corresponding users.

As shown in Section 3.1.3, the obtained model of the $i$-th institution is
$$Y^{\mathrm{test}}_i = h(g_i([f_{i,1}(X^{\mathrm{test}}_{i,1}), f_{i,2}(X^{\mathrm{test}}_{i,2}), \dots, f_{i,d}(X^{\mathrm{test}}_{i,d})])),$$
which is a multi-layer model via intermediate and collaboration representations. The model is separately held by the users and the analyst, such that $f_{i,j}$ is only at the user side, and $g_i$ and $h$ are only at the analyst side. Therefore, the interpretability of the model is not high, even if a highly interpretable model, e.g., a decision tree, is used for $h$. To address this, we propose an interpretable collaborative data analysis.

We first revisit the anchor data $X^{\mathrm{anc}}$.
In collaborative data analysis, the anchor data are shareable data consisting of public data or randomly constructed dummy data, and are used for constructing the collaboration representations (see Section 3.1.2).

The basic concept of the proposed method is to mimic the multi-layer model of collaborative data analysis, that is:
1. Predict the anchor data $X^{\mathrm{anc}}$ using collaborative data analysis.
2. Construct an interpretable model from the anchor data $X^{\mathrm{anc}}$ and their predictions.

The predictions of the anchor data $X^{\mathrm{anc}}$ are
$$Y^{\mathrm{anc}}_i = h(g_i([f_{i,1}(X^{\mathrm{anc}}_{:,1}), f_{i,2}(X^{\mathrm{anc}}_{:,2}), \dots, f_{i,d}(X^{\mathrm{anc}}_{:,d})]))$$
for each $i$. In collaborative data analysis, the analyst holds $\widetilde{X}^{\mathrm{anc}}_{i,j} = f_{i,j}(X^{\mathrm{anc}}_{:,j})$. Therefore, $Y^{\mathrm{anc}}_i$ can be obtained by
$$Y^{\mathrm{anc}}_i = h(g_i([\widetilde{X}^{\mathrm{anc}}_{i,1}, \widetilde{X}^{\mathrm{anc}}_{i,2}, \dots, \widetilde{X}^{\mathrm{anc}}_{i,d}]))$$
without the need for additional communication from users. Note that additional communication may increase privacy and confidentiality risks. For higher recognition performance, we can use another dataset for this purpose, different from the anchor data used for constructing $g_i$ in the collaborative data analysis. Then, the predictions $Y^{\mathrm{anc}}_i$ of the anchor data $X^{\mathrm{anc}}$ are returned to the $i$-th user.

At the user side, an interpretable model is individually constructed as
$$Y^{\mathrm{anc}}_i \approx t_i(X^{\mathrm{anc}}),$$
where the obtained model $t_i$ depends on $i$. Note that, since the anchor data have the whole features of $X$ instead of only the private dataset $X_{i,j}$, the obtained interpretable model $t_i$ is based on the whole features. For example, in a decision tree, the branches of $t_i$ can involve the whole features, which is not feasible in an individual analysis that only uses $X_{i,j}$. Here, each party can individually select an interpretable model according to its own needs. This is an advantage of the proposed method for practical applications.

Algorithm 1 Interpretable collaborative data analysis
Input (for user-side): $X_{i,j} \in \mathbb{R}^{n_i \times m_j}$, $Y_i \in \mathbb{R}^{n_i \times \ell}$, individually
Output (for user-side): Interpretable models $t_i$ $(i = 1, 2, \dots, c)$, which depend on $i$

User-side $(i,j)$:
1: Generate $X^{\mathrm{anc}}_{i,j}$ and share it with all users
2: Set $X^{\mathrm{anc}}$ and $X^{\mathrm{anc}}_{:,j}$
3: Generate $f_{i,j}$
4: Compute $\widetilde{X}_{i,j} = f_{i,j}(X_{i,j})$
5: Compute $\widetilde{X}^{\mathrm{anc}}_{i,j} = f_{i,j}(X^{\mathrm{anc}}_{:,j})$
6: Share $\widetilde{X}_{i,j}$, $\widetilde{X}^{\mathrm{anc}}_{i,j}$, and $Y_i$ with the analyst

Analyst-side:
7: Set $\widetilde{X}_i$ and $\widetilde{X}^{\mathrm{anc}}_i$ for all $i$
8: Construct $g_i$ from $\widetilde{X}^{\mathrm{anc}}_i$ for all $i$
9: Compute $\widehat{X}_i = g_i(\widetilde{X}_i)$ for all $i$
10: Set $\widehat{X}$ and $Y$
11: Analyze $\widehat{X}$ and get $h$ such that $Y \approx h(\widehat{X})$
12: Compute $\widehat{X}^{\mathrm{anc}}_i = g_i(\widetilde{X}^{\mathrm{anc}}_i)$ for all $i$
13: Compute $Y^{\mathrm{anc}}_i = h(\widehat{X}^{\mathrm{anc}}_i)$ for all $i$
14: Return $Y^{\mathrm{anc}}_i$ to the users

User-side $(i,j)$:
15: Analyze $X^{\mathrm{anc}}$ and get $t_i$ such that $Y^{\mathrm{anc}}_i \approx t_i(X^{\mathrm{anc}})$

Note that the performance of the proposed method depends on the choice of the anchor data $X^{\mathrm{anc}}$. The simplest way to set $X^{\mathrm{anc}}$ is via a random matrix [4, 14, 15]. However, to improve the performance, the anchor data need to preserve some statistics of $X$. One practical idea is to generate $X^{\mathrm{anc}}_{i,j}$ for each private dataset using methods such as generative adversarial nets (GAN) [10] and autoencoders based on (deep) neural networks, or dimensionality reduction with data augmentation. Then, $X^{\mathrm{anc}}_{i,j}$ is shared with all users and $X^{\mathrm{anc}}$ is set as
$$X^{\mathrm{anc}} = [X^{\mathrm{anc}}_{:,1}, X^{\mathrm{anc}}_{:,2}, \dots, X^{\mathrm{anc}}_{:,d}] = \begin{bmatrix} X^{\mathrm{anc}}_{1,1} & X^{\mathrm{anc}}_{1,2} & \cdots & X^{\mathrm{anc}}_{1,d} \\ X^{\mathrm{anc}}_{2,1} & X^{\mathrm{anc}}_{2,2} & \cdots & X^{\mathrm{anc}}_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ X^{\mathrm{anc}}_{c,1} & X^{\mathrm{anc}}_{c,2} & \cdots & X^{\mathrm{anc}}_{c,d} \end{bmatrix}.$$
We will investigate practical techniques for constructing suitable anchor data in the future. The pseudo-code of the proposed method is summarized in Algorithm 1. As shown in Algorithm 1, the proposed interpretable collaborative data analysis is based on a one-pass algorithm, which does not require iteration steps with data communication.
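The whole of Algorithm 1 can be sketched end-to-end as a toy run: an SVD-based stand-in for $f_{i,j}$, an ordinary-least-squares stand-in for the total-least-squares $g_i$, kernel ridge regression for $h$, and a decision tree for $t_i$. All sizes, helper names, and the random anchor data are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Toy distributed data: c = 2 institutions x d = 2 feature parties.
n_i, m_j, k = 60, 5, 3                   # samples/institution, features/party, dim of f
Xs = [[rng.standard_normal((n_i, m_j)) for _ in range(2)] for _ in range(2)]
ys = [rng.integers(0, 2, n_i) for _ in range(2)]   # labels held per institution
X_anc = rng.standard_normal((40, 2 * m_j))         # shareable random anchor data

def fit_f(X_local):        # private per-party map (SVD stand-in for LPP)
    P = np.linalg.svd(X_local - X_local.mean(0), full_matrices=False)[2][:k].T
    return lambda A: A @ P

# Steps 1-6: users build f_{i,j}; share intermediate reps of data and anchors.
fs = [[fit_f(Xs[i][j]) for j in range(2)] for i in range(2)]
Xt = [np.hstack([fs[i][j](Xs[i][j]) for j in range(2)]) for i in range(2)]
At = [np.hstack([fs[i][j](X_anc[:, j * m_j:(j + 1) * m_j]) for j in range(2)])
      for i in range(2)]

# Steps 7-9: analyst aligns institutions via anchors (OLS stand-in for TLS).
Z = np.linalg.svd(np.hstack(At), full_matrices=False)[0][:, :2 * k]
Gs = [np.linalg.lstsq(A, Z, rcond=None)[0] for A in At]
Xh = np.vstack([Xt[i] @ Gs[i] for i in range(2)])  # collaboration representations

# Steps 10-11: analyst fits one model h on the pooled collaboration space.
h = KernelRidge(alpha=0.05, kernel="rbf").fit(Xh, np.concatenate(ys))

# Steps 12-15: anchors go through g_i and h; each user fits an interpretable
# surrogate t_i on (X_anc, predicted labels) over the WHOLE feature set.
y_anc = [(h.predict(At[i] @ Gs[i]) > 0.5).astype(int) for i in range(2)]
ts = [DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_anc, y_anc[i])
      for i in range(2)]

assert ts[0].n_features_in_ == 2 * m_j   # t_i sees all 10 features of X
```

Note how each $t_i$ is trained on the full-width anchor matrix, so its branches can use features a single party never held, which is the point of the method.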
In the proposed method (Algorithm 1), each user shares the local anchor data $X^{\mathrm{anc}}_{i,j}$ with the other users and shares the intermediate representations $\widetilde{X}_{i,j}$, $\widetilde{X}^{\mathrm{anc}}_{i,j}$ with the analyst. We discuss how the privacy of the private data $X_{i,j}$ is preserved against both the users and the analyst. Here, we assume that the users do not trust each other and want to protect their training data $X_{i,j}$ against honest-but-curious users and the analyst. Hence, the users and the analyst will strictly follow the strategy, but they will try to infer as much information as possible. We also assume that the analyst does not collude with any users.

Regarding the privacy of $X_{i,j}$ against other users, each user shares the local anchor data $X^{\mathrm{anc}}_{i,j}$ with the other users. The local anchor data do not contain $X_{i,j}$ but may preserve some useful information. The local anchor data are constructed by the users themselves using methods such as GANs and autoencoders with data augmentation. Therefore, users can control the released information, although this may result in a trade-off with performance. Note that collaborative data analysis works well even when using random anchor data, as demonstrated in [4, 14, 15].

Regarding the privacy of $X_{i,j}$ against the analyst, each user shares the intermediate representations $\widetilde{X}_{i,j}$, $\widetilde{X}^{\mathrm{anc}}_{i,j}$ with the analyst. If the analyst had the mapping function $f_{i,j}$ or its approximation, he/she could obtain an approximation of $X_{i,j}$. However, the function $f_{i,j}$ is private and cannot be approximated by others, because no one has both the input and output of $f_{i,j}$. Therefore, the analyst cannot obtain an approximation of $X_{i,j}$ from the intermediate representations. In our future studies, we will further analyze the privacy of the proposed method in more detail.

This section evaluates the performance of the proposed interpretable collaborative data analysis (Algorithm 1) and compares it with those of interpretable centralized and individual analyses for classification problems.
Note that centralized analysis is considered an ideal case, since the private datasets $X_{i,j}$ cannot be shared in our target situation. The proposed collaborative data analysis aims to achieve a better performance than individual analysis.

We use a simple decision tree for the interpretable model. In the proposed method, each intermediate representation is designed from $X_{i,j}$ using locality preserving projections (LPP) [12], which is an unsupervised dimensionality reduction method. We use a kernel version of ridge regression (K-RR) [32] with a Gaussian kernel for the data analysis in the collaborative analysis. We set the regularization parameter of K-RR to λ = 0. . The local anchor data $X^{\mathrm{anc}}_{i,j}$ are constructed by a low-rank approximation based on the singular value decomposition (SVD) with random perturbation and data augmentation. We set r = 2, as the number of anchor data.

We set the ground truth $Y$ as a binary matrix whose $(i,j)$ entry is 1 if the training data $x_i$ are in class $j$ and 0 otherwise. This type of ground truth $Y$ has been applied to various classification algorithms, including ridge regression and deep neural networks [3].

In this paper, we evaluate the performance of the methods in terms of normalized mutual information (NMI) [33] and accuracy (ACC).
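The binary ground-truth matrix $Y$ described above is a standard one-hot encoding; a minimal sketch:

```python
import numpy as np

labels = np.array([0, 2, 1, 2])          # class index of each training sample
n_classes = 3

# Binary ground-truth matrix Y: Y[i, j] = 1 iff sample i belongs to class j.
Y = np.eye(n_classes)[labels]
assert Y.tolist() == [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]

# A regression model fitted to Y predicts a row per sample; the predicted
# class is the argmax over that row.
assert np.array_equal(Y.argmax(axis=1), labels)
```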
Moreover, to evaluate the similarity of the prediction models of individual and collaborative data analyses with that of centralized analysis, we use the fidelity to centralized analysis under NMI (Fidelity to CA), that is,
$$\mathrm{NMI}(Y^{\mathrm{IA}}, Y^{\mathrm{CA}}), \quad \mathrm{NMI}(Y^{\mathrm{CDA}}, Y^{\mathrm{CA}}),$$
where the function NMI denotes the value of NMI between two predictions, and $Y^{\mathrm{CA}}$, $Y^{\mathrm{IA}}$, and $Y^{\mathrm{CDA}}$ are the predictions of centralized, individual, and collaborative data analyses, respectively. All the numerical experiments are performed using MATLAB 2019b.

Figure 1: Features 1 and 11 of the training and test datasets for the artificial problem: (a) all training data, (b) training data in the 1st group, (c) training data in the 2nd group, (d) test data and its ground truth.

Table 1: Recognition performance (average ± standard error of NMI, ACC, and Fidelity to CA) of centralized, individual, and collaborative data analyses for the artificial problem.

We used 20-dimensional artificial data for two-class classification. Fig. 1(a) depicts features 1 and 11 of all the training datasets, where the number of samples is n = 1, . The other 18 dimensions have random values. Note that only features 1 and 11 are necessary for classification.

We considered the case where the dataset in Fig. 1(a) is distributed into four parties, $c = d = 2$, as
$$X = \begin{bmatrix} X_{1,1} & X_{1,2} \\ X_{2,1} & X_{2,2} \end{bmatrix}.$$
For the horizontal partitioning, the 1st group of parties $X_{1,1}, X_{1,2}$ holds the dataset shown in Fig. 1(b), and the 2nd group of parties $X_{2,1}, X_{2,2}$ holds the dataset shown in Fig. 1(c). For the vertical partitioning, $X_{1,1}, X_{2,1}$ have features 1–10 and $X_{1,2}, X_{2,2}$ have features 11–20. Fig. 1(d) illustrates features 1 and 11 of the test dataset and their ground truth. For the proposed method, we set the dimensionality of the intermediate representations to $\widetilde{m}_{i,j} = 4$ for all parties.

Figure 2: Recognition results of centralized, individual, and collaborative data analyses for the artificial problem: (a) collaborative data analysis, (b) centralized analysis, (c)–(f) individual analyses for $X_{1,1}$, $X_{1,2}$, $X_{2,1}$, and $X_{2,2}$.

Fig. 2 presents the recognition results, and Table 1 shows the average and standard error of NMI, ACC, and Fidelity to CA calculated across 10 trials. From these results, we can observe that individual analysis does not obtain good recognition results, for the following reasons. Since $X_{1,1}$ has feature 1 of the samples shown in Fig. 1(b) and $X_{2,2}$ has feature 11 of the samples shown in Fig. 1(c), the distributions of the two classes overlap. Therefore, using only $X_{1,1}$ or $X_{2,2}$ cannot separate the two classes. Moreover, $X_{1,2}$ has feature 11 of the samples shown in Fig. 1(b) and $X_{2,1}$ has feature 1 of the samples shown in Fig. 1(c).
Therefore, the classification boundaries for $X_{1,2}$ and $X_{2,1}$ are horizontal and vertical, respectively. On the other hand, when compared with individual analysis, the proposed collaborative data analysis (Fig. 2(a)) achieves good recognition results, which are comparable to the results of centralized analysis (Fig. 2(b)).

We used a credit rating dataset "CreditRating_Historical.dat" from the MATLAB Statistics and Machine Learning Toolbox. The dataset contains five financial ratios: Working capital / Total Assets (WC_TA), Retained Earnings / Total Assets (RE_TA), Earnings Before Interest and Taxes / Total Assets (EBIT_TA), Market Value of Equity / Book Value of Total Debt (MVE_BVTD), and Sales / Total Assets (S_TA), as well as industry sector labels from 1 to 12, for 3,932 customers. The dataset also includes credit ratings from "AAA" to "CCC" for all customers. Note that this dataset is simulated and not real.

We aim to predict the credit rating using the five financial ratios and the industry sector labels. We considered the case where the training dataset with 3,000 samples is distributed into four parties, $c = d = 2$, as
$$X = \begin{bmatrix} X_{1,1} & X_{1,2} \\ X_{2,1} & X_{2,2} \end{bmatrix} \in \mathbb{R}^{3000 \times 6},$$
where $X_{1,1}, X_{2,1}$ have the 1st group of features (WC_TA, RE_TA, and EBIT_TA) and $X_{1,2}, X_{2,2}$ have the 2nd group of features (MVE_BVTD, S_TA, and the industry sector label).

Figure 3: Decision trees of centralized, individual, and collaborative data analyses for the financial problem: (a) collaborative data analysis, (b) centralized analysis, (c)–(f) individual analyses for $X_{1,1}$, $X_{1,2}$, $X_{2,1}$, and $X_{2,2}$.

Table 2: Recognition performance (average ± standard error of NMI, ACC, and Fidelity to CA) of centralized, individual, and collaborative data analyses for the financial problem.

The obtained decision trees for centralized, individual, and collaborative data analyses are shown in Fig. 3, while the average and standard error of NMI, ACC, and Fidelity to CA across 10 trials are shown in Table 2. In Fig. 3, the features marked with ∗ are in $X_{1,1}, X_{2,1}$ and the features marked with ◦ are in $X_{1,2}, X_{2,2}$. As shown in Fig. 3, the proposed collaborative analysis (Fig. 3(a)) yields a tree with the same two features as centralized analysis, which belong to different groups. This cannot be achieved in individual analysis, as shown in Fig. 3(c)–(f).
We next evaluate the performances of centralized, individual, and collaborative data analyses on binary and multi-class classification problems obtained from [21, 31] and feature selection datasets (available at http://featureselection.asu.edu/datasets.php). We considered the case where the dataset is distributed into six parties: $c = 2$ and $d = 3$. The performance of each method is evaluated using a five-fold cross-validation framework. For the proposed method, we set $\widetilde{m}_{i,j} = 15$.

Table 3: Recognition performance (average ± standard error of NMI, ACC, and Fidelity to CA) of centralized analysis (CA), individual analysis (IA), and collaborative data analysis (CDA) for real-world problems: Carcinom ($m = 9182$, $n = 174$), CLL-SUB-111 ($m = 11340$, $n = 111$), GLA-BRA-180 ($m = 49151$, $n = 180$), jaffe ($m = 676$, $n = 213$), leukemia ($m = 7129$, $n = 72$), lung ($m = 3312$, $n = 203$), pixraw10P ($m = 10000$, $n = 100$), Prostate_GE ($m = 5966$, $n = 102$), TOX-171 ($m = 5789$, $n = 171$), and warpAR10P ($m = 2400$, $n = 130$).

The numerical results of the centralized analysis, the average of the individual analyses, and the proposed method for the 10 test problems are presented in Table 3. We can observe from Table 3 that the recognition performance of the proposed method is better than that of individual analysis and comparable to that of centralized analysis on most datasets.

To address the needs of distributed data analysis and achieve interpretability, we proposed an interpretable non-model sharing collaborative data analysis on distributed data. The proposed method generated an interpretable model for distributed data by sharing intermediate representations without revealing the private data or the model. The obtained interpretable model was based on the whole features of the distributed data, which cannot be achieved in individual analysis. Numerical experiments on both artificial and real-world data showed that the proposed method constructed an interpretable model with better recognition performance than individual analysis and comparable to that of centralized analysis.

Distributed data analysis and interpretable model construction are essential and important challenges in real-world situations, including medical, financial, and manufacturing data analyses. The proposed interpretable collaborative data analysis would be a breakthrough technology for such kinds of distributed data analysis.

In our future studies, we will further analyze the privacy and confidentiality concerns and the accuracy of the proposed method. Moreover, practical techniques for improving the performance of the proposed method, including other suitable anchor data, will be investigated.

Acknowledgements
The work was supported in part by the New Energy and Industrial Technology Development Organization (NEDO). The work of the first author was supported in part by the Japan Science and Technology Agency (JST), ACT-I (No. JPMJPR16U6), and the Japan Society for the Promotion of Science (JSPS), Grants-in-Aid for Scientific Research (Nos. 17K12690, 19KK0255). The work of the fourth author was supported in part by the Japan Society for the Promotion of Science (JSPS), Grants-in-Aid for Scientific Research (No. 18H03250).
References

[1] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, L. Zhang, Deep learning with differential privacy, in: Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, ACM, 2016.
[2] A. B. Arrieta, N. Díaz-Rodríguez, J. Del Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-López, D. Molina, R. Benjamins, et al., Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI, Information Fusion 58 (2020) 82–115.
[3] C. M. Bishop, Pattern Recognition and Machine Learning (Information Science and Statistics), Springer-Verlag Berlin, Heidelberg, 2006.
[4] A. Bogdanova, A. Nakai, Y. Okada, A. Imakura, T. Sakurai, Federated learning system without model sharing through integration of dimensional reduced data representations, in: International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with IJCAI 2020 (FL-IJCAI'20), 2020, (accepted).
[5] H. Cho, D. J. Wu, B. Berger, Secure genome-wide association analysis using multiparty computation, Nature Biotechnology 36 (6) (2018) 547.
[6] C. Dwork, Differential privacy, in: Bugliesi M., Preneel B., Sassone V., Wegener I. (eds) Automata, Languages and Programming. ICALP 2006. Lecture Notes in Computer Science, vol. 4052, 2006.
[7] R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Human Genetics 7 (2) (1936) 179–188.
[8] C. Gentry, Fully homomorphic encryption using ideal lattices, in: STOC, vol. 9, 2009.
[9] R. Gilad-Bachrach, N. Dowlin, K. Laine, K. Lauter, M. Naehrig, J. Wernsing, CryptoNets: Applying neural networks to encrypted data with high throughput and accuracy, in: International Conference on Machine Learning, 2016.
[10] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in Neural Information Processing Systems, 2014.
[11] R. Guidotti, A. Monreale, S. Ruggieri, F. Turini, D. Pedreschi, F. Giannotti, A survey of methods for explaining black box models, arXiv preprint (2018) arXiv:1802.01933.
[12] X. He, P. Niyogi, Locality preserving projections, in: Advances in Neural Information Processing Systems, 2004.
[13] A. Imakura, M. Matsuda, X. Ye, T. Sakurai, Complex moment-based supervised eigenmap for dimensionality reduction, in: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, 2019.
[14] A. Imakura, T. Sakurai, Data collaboration analysis framework using centralization of individual intermediate representations for distributed data sets, ASCE-ASME Journal of Risk and Uncertainty in Engineering Systems, Part A: Civil Engineering 6 (2020) 04020018.
[15] A. Imakura, X. Ye, T. Sakurai, Collaborative data analysis: Non-model sharing-type machine learning for distributed data, in: 2020 Principle and Practice of Data and Knowledge Acquisition Workshop (PKAW2020), 2020, (accepted).
[16] S. Ito, K. Murota, An algorithm for the generalized eigenvalue problem for nonsquare matrix pencils by minimal perturbation approach, SIAM J. Matrix Anal. Appl. 37 (2016) 409–419.
[17] S. Jha, L. Kruger, P. McDaniel, Privacy preserving clustering, in: European Symposium on Research in Computer Security, Springer, 2005.
[18] Z. Ji, Z. C. Lipton, C. Elkan, Differential privacy and machine learning: A survey and review, arXiv preprint (2014) arXiv:1412.7584.
[19] J. Konečný, H. B. McMahan, D. Ramage, P. Richtarik, Federated optimization: Distributed machine learning for on-device intelligence, arXiv preprint (2016) arXiv:1610.02527.
[20] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtarik, A. T. Suresh, D. Bacon, Federated learning: Strategies for improving communication efficiency, in: NIPS Workshop on Private Multi-Party Machine Learning, 2016. URL https://arxiv.org/abs/1610.05492
[21] Y. LeCun, The MNIST database of handwritten digits, http://yann.lecun.com/exdb/mnist/.
[22] Q. Li, Z. Wen, Z. Wu, S. Hu, N. Wang, B. He, A survey on federated learning systems: Vision, hype and reality for data privacy and protection, arXiv preprint (2019) arXiv:1907.09693.
[23] X. Li, M. Chen, F. Nie, Q. Wang, Locality adaptive discriminant analysis, in: Proceedings of the 26th International Joint Conference on Artificial Intelligence, AAAI Press, 2017.
[24] L. v. d. Maaten, G. Hinton, Visualizing data using t-SNE, Journal of Machine Learning Research 9 (2008) 2579–2605.
[25] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, et al., Communication-efficient learning of deep networks from decentralized data, arXiv preprint (2016) arXiv:1602.05629.
[26] T. Miller, Explanation in artificial intelligence: Insights from the social sciences, arXiv preprint (2017) arXiv:1706.07269.
[27] C. Molnar, Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, 2019. URL https://christophm.github.io/interpretable-ml-book
[28] K. Pearson, LIII. On lines and planes of closest fit to systems of points in space, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science 2 (11) (1901) 559–572.
[29] A. Preece, D. Harborne, D. Braines, R. Tomsett, S. Chakraborty, Stakeholders in explainable AI, arXiv preprint (2018) arXiv:1810.00184.
[30] C. Rudin, Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, Nature Machine Intelligence 1 (5) (2019) 206–215.
[31] F. Samaria, A. Harter, Parameterisation of a stochastic model for human face identification, in: Proceedings of IEEE Workshop on Applications of Computer Vision, 1994.
[32] C. Saunders, A. Gammerman, V. Vovk, Ridge regression learning algorithm in dual variables.
[33] A. Strehl, J. Ghosh, Cluster ensembles: a knowledge reuse framework for combining multiple partitions, Journal of Machine Learning Research 3 (Dec) (2002) 583–617.
[34] M.
Sugiyama, Dimensionality reduction of multimodal labeled data by local Fisher dis-criminant analysis, Journal of machine learning research 8 (May) (2007) 1027–1061.[35] Q. Yang, GDPR, data shortage and AI, invited Talk of The Thirty-Third AAAI Confer-ence on Artificial Intelligence (AAAI-19) (2019).URL https://aaai.org/Conferences/AAAI-19/invited-speakers/https://aaai.org/Conferences/AAAI-19/invited-speakers/