Data-driven inference of hidden nodes in networks
Danh-Tai Hoang,^{1,2} Junghyo Jo,^{3,4,*} and Vipul Periwal^{1,†}

^1 Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland 20892, USA
^2 Department of Natural Sciences, Quang Binh University, Dong Hoi, Quang Binh 510000, Vietnam
^3 School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea
^4 Department of Statistics, Keimyung University, Daegu 42601, Korea

* Corresponding author: [email protected]
† Corresponding author: [email protected]

(Dated: January 15, 2019)

The explosion of activity in finding interactions in complex systems is driven by the availability of copious observations of complex natural systems. However, such systems, e.g. the human brain, are rarely completely observable. Interaction network inference must then contend with hidden variables affecting the behavior of the observed parts of the system. We present a novel data-driven approach for model inference with hidden variables. From configurations of observed variables, we identify the observed-to-observed, hidden-to-observed, observed-to-hidden, and hidden-to-hidden interactions, the configurations of hidden variables, and the number of hidden variables. We demonstrate the performance of our method by simulating a kinetic Ising model, and show that our method outperforms existing methods. Turning to real data, we infer the hidden nodes in a neuronal network in the salamander retina and a stock market network. We show that predictive modeling with hidden variables is significantly more accurate than that without hidden variables. Finally, an important hidden variable problem is to find the number of clusters in a dataset. We apply our method to classify MNIST handwritten digits. We find that there are about 60 clusters which are roughly equally distributed amongst the digits.
I. INTRODUCTION
To go from observations to predictive understanding is to go from stamp-collecting to science. Absent principled quantitative laws, biological and social systems can generally be described as networks of interacting nodes, with time-series data providing a window on the dynamics of the underlying system. In the present era of big data, the network reconstruction problem has attracted considerable interest in research areas ranging from neuroscience [1–4] and genomics [5–7] to finance [8–11]. A fundamental caveat is that such reconstructions always rely on partial observation of these complex networks. For example, it is hopeless to follow the simultaneous spiking activity of every neuron in the brain, the transcription of every gene in the genome, or every fluctuating factor in a financial system.

The problem of accounting for the unobserved constituents of any system is ill-posed without further information, simply because the number, the interactions, and the configurations of these hidden nodes must all be identified from the observed data and, a priori, one can make the former two as large and as complicated, respectively, as one pleases. To render the problem well-defined, one can first choose a theoretical model structure and then account for the unobserved nodes within this structure. Given the importance of this problem, much work has been devoted to it.

A simple approach is to maximize the likelihood of observed configurations after marginalizing unobserved configurations [12]. Another effective approach is the Expectation Maximization (EM) algorithm for hidden variables, which contains two alternating steps: inferring all interactions of observed and hidden variables from configurations of observed variables, and reconstructing the configurations of hidden variables consistent with these inferred interactions [13]. As one might expect, this algorithm is computationally impractical for even moderately large systems if the fraction of unobserved variables is significant. Furthermore, the accuracy of hidden variable configuration reconstruction depends greatly on the accuracy of interaction inference, a factor that becomes significant for limited datasets. Therefore, recent network reconstruction methods have considered alternative approaches, such as mean field approximations [12, 14] and replica methods [15, 16]. However, the mean field approximations work only for weak and dense interactions [14], whereas the replica methods allow the inference of strong and sparse interactions but impose the stringent assumption that hidden variables are mutually independent [16]. In addition to non-interacting hidden variables, random interaction strengths and the thermodynamic limit are two prerequisites for the exact inference of the replica methods [15].

We recently formulated a new approach [17, 18] to network reconstruction for observed variables that is significantly more accurate in the limit of sparse sampling and orders of magnitude faster than previous methods. Based on this foundation, we propose a new approach for network reconstruction including hidden variables, by replacing the inference step with our approach.
This does not, by itself, address the crucial question of the number of unobserved variables, so we complete our proposal by formulating a simple quantitative test of model complexity to determine this number.

This paper is organized as follows: We briefly review our inference method and outline its extension to hidden variables, paying special attention to the determination of the number and interactions of hidden variables, amongst themselves and with observed variables. We then validate our method with simulated data from kinetic Ising models, showing the accurate determination of the number and interactions of hidden variables for a range of observed fractions of systems, going up to 40% hidden variables. Turning to real data, we apply our method to reconstruct a neural network from partially observed neuronal activities, and a stock-market network using data of opening and closing stock prices of 25 American companies. We validate our network reconstructions by reproducing observed neuronal activities by pinning just a few neuron configurations, and by exhibiting a profitable stock trading strategy based on our inferred network. Finally, we demonstrate that our approach is suited to unsupervised data clustering as well, since cluster membership is a type of hidden variable. We estimate the number of hidden features that can explain the MNIST hand-written digit dataset. Complete source code with documentation is available [19].
II. METHOD
We explain our approach in the context of a concrete example for ease of understanding. Consider a stochastic dynamical system in which a vector of $N$ binary ($\pm 1$) variables $\boldsymbol\sigma = (\sigma_1, \cdots, \sigma_N)$ evolves stochastically according to the conditional probability:
$$P(\sigma_i(t+1)\,|\,\boldsymbol\sigma(t)) = \frac{\exp(\sigma_i(t+1)\,H_i(t))}{\exp(H_i(t)) + \exp(-H_i(t))}, \qquad (1)$$
for $i = 1, \cdots, N$. The local field $H_i(t) = \sum_j W_{ij}\sigma_j(t)$ represents the summed influence of the present state $\sigma_j(t)$ on the future state $\sigma_i(t+1)$ through the weight $W_{ij}$. This kinetic Ising model has a model expectation $\langle\sigma_i(t+1)\rangle_{\rm model} = \tanh H_i(t)$. Generating $\boldsymbol\sigma(t)$ given $W_{ij}$ is easy, but inferring $W_{ij}$ given $\boldsymbol\sigma(t)$ is not trivial. Although numerous methods exist for the inverse problem [20–22], we recently proposed a new approach [17, 18]. We give here a simplified intuitive account. The first step is the linear regression $H_i = \sum_j W_{ij}\sigma_j$ between $H_i$ and $\sigma_j$. Suppose we know $H_i(t)$ and $\sigma_j(t)$. The coefficient $W_{ij}$ can then be obtained as usual:
$$W_{ij} = \sum_k \langle \delta H_i\,\delta\sigma_k \rangle\,[C^{-1}]_{kj}, \qquad (2)$$
where $C_{jk} \equiv \langle \delta\sigma_j\,\delta\sigma_k \rangle$ is the covariance matrix for $\boldsymbol\sigma(t)$, with $\langle f \rangle \equiv L^{-1}\sum_{t=1}^{L} f(t)$ and $\delta f \equiv f - \langle f \rangle$. The second step is the update of the observable,
$$H_i(t) \leftarrow \frac{\sigma_i(t+1)}{\langle\sigma_i(t+1)\rangle_{\rm model}}\,H_i(t) = \frac{\sigma_i(t+1)\,H_i(t)}{\tanh H_i(t)}. \qquad (3)$$
The multiplicative update of $H_i(t)$ corrects the magnitude and sign of $H_i(t)$ based on the ratio of the observed $\sigma_i(t+1)$ and the model expectation $\langle\sigma_i(t+1)\rangle_{\rm model}$, which is always larger than unity in absolute magnitude. A critical aspect of Eq. (3) is that the limit $|H_i| \downarrow 0$ gives $H_i(t) \leftarrow \sigma_i(t+1)$, independent of $H_i$. Therefore, the update in Eq. (3) avoids being entirely multiplicative for determining $W_{ij}$. These two steps, $H_i(t) \to W_{ij}$ and $W_{ij} \to H_i(t)$, provide a powerful iterative method. We continue this iteration until the discrepancy between data and model expectation, $D_i(W) \equiv \sum_t \big[\sigma_i(t+1) - \langle\sigma_i(t+1)\rangle_{\rm model}\big]^2$, is minimized. We derived the linear regression in Eq. (2) using the concept of free energy in statistical mechanics [17], so we call this method Free Energy Minimization (FEM). Notice that the parameter update in Eqs. (2-3) is completely independent of the computation of $D_i$. This crucial feature allows small-sample-size inference to avoid overfitting, because the minimization of $D_i$ is used only as a stopping criterion.

Now we propose to apply the FEM method to infer interactions from/to hidden variables. The system has $N_v$ observable (visible) and $N_h$ hidden variables ($N = N_v + N_h$). As a variant of the EM algorithm, we first assign random configurations to the hidden variables. We then infer the interaction weights $W_{ij}$ for observed-to-observed, hidden-to-observed, observed-to-hidden, and hidden-to-hidden variables with the FEM method. Given $W_{ij}$, we can update the configurations of hidden variables by flipping each with probability $L_2/(L_1 + L_2)$, where $L_1$ and $L_2$ represent the likelihoods $L$ of the system before and after the flip,
$$L = \prod_{t=1}^{L-1}\prod_{i=1}^{N} P(\sigma_i(t+1)\,|\,\boldsymbol\sigma(t)). \qquad (4)$$
Note that the terms of $L_1$ and $L_2$ that are independent of the hidden state being updated (at each $t$) cancel in the update ratio $L_2/(L_1 + L_2)$. Therefore, we just need to calculate the dependent terms.
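To make the two-step iteration concrete, here is a minimal numpy sketch of FEM for a single spin $i$; the function name, the stopping details, and the small-field guard are our simplifications, not taken from the reference implementation [19].

```python
import numpy as np

def fem_one_spin(s, i, max_iter=100):
    """Sketch of the FEM iteration (Eqs. 2-3) for row i of W.

    s : (L, N) array of +/-1 spins, s[t] = sigma(t).
    Returns the inferred weights W[i, :] and the final discrepancy D_i.
    """
    x, y = s[:-1].astype(float), s[1:, i].astype(float)
    dx = x - x.mean(axis=0)                       # delta sigma_j(t)
    C_inv = np.linalg.pinv(dx.T @ dx / len(x))    # [C^{-1}]_{kj} of Eq. (2)
    H = y.copy()                                  # initial guess for H_i(t)
    W_best, D_best = None, np.inf
    for _ in range(max_iter):
        dH = H - H.mean()
        W = C_inv @ (dx.T @ dH / len(x))          # Eq. (2): <dH dsigma> C^{-1}
        H_model = x @ W
        D = np.mean((y - np.tanh(H_model)) ** 2)  # discrepancy D_i
        if D >= D_best:                           # stop once D_i stops falling
            break
        W_best, D_best = W, D
        # Eq. (3): multiplicative update; H/tanh(H) -> 1 as |H| -> 0,
        # so the small-field limit gives H <- sigma_i(t+1).
        ratio = np.where(np.abs(H_model) < 1e-8, 1.0,
                         H_model / np.tanh(H_model))
        H = y * ratio
    return W_best, D_best
```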
The iterations between the parameter optimization (M step) and the hidden-variable update (E step) provide accurate inference of the interaction weights $W_{ij}$ and the unknown configurations of hidden variables.

We must now consider the problem of determining the number of hidden variables. A simple measure would be the same discrepancy between observation $\sigma_i^v(t+1)$ and model expectation $\langle\sigma_i^v(t+1)\rangle_{\rm model}$,
$$D_v \equiv \sum_{i=1}^{N_v} D_i(W), \qquad (5)$$
but this clearly does not take the hidden variables into account. On the other hand, extending the sum in Eq. (5) to include hidden variables is useless because the E-step update is minimizing these additional terms already. Since the error in the inference of hidden variable states cannot be set by a scale smaller than the model discrepancy in the observed part, we define the scaled discrepancy of the entire system based on the observed part as
$$D \equiv D_v \left(1 + \frac{N_h}{N_v}\right). \qquad (6)$$
The first factor in Eq. (6) represents the goodness of fit for observed variables and the second factor represents model complexity, so our criterion balances the two. Because $D_v \propto -\log L_v$, where $L_v$ represents the likelihood of observed variables, Eq. (6) can be rewritten as $D \propto -\log L_v\,(1 + N_h/N_v)$. Our criterion is thus similar in spirit to the Akaike information criterion [23] and the Bayesian information criterion [24], with log-likelihood of observation ($-\log L_v \sim D_v$) and model degrees of freedom ($N_v + N_h$).

Finally, our method can be summarized as the following set of steps. For a range of numbers of hidden variables, in parallel and independently:
(i) Assign configurations of hidden variables at random;
(ii) Infer the interaction weights $W_{ij}$, including observed-to-observed, hidden-to-observed, observed-to-hidden, and hidden-to-hidden, from the configurations of observed and hidden variables using FEM;
(iii) Flip the states of hidden variables with probability $L_2/(L_1 + L_2)$ (see Eq. (4) and the sketch following this list);
(iv) Repeat steps (ii) and (iii) until the discrepancy of observed variables is minimized. The final values of $W_{ij}$ and hidden states are the inferred coupling weights and configurations of hidden spins, respectively.
Then pick the number of hidden variables that minimizes Eq. (6).
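The E step in item (iii) only needs the likelihood terms of Eq. (4) that involve the flipped entry. A minimal sketch, assuming the kinetic Ising form of Eq. (1); the helper name and the restriction to interior times are our simplifications:

```python
import numpy as np

def sweep_hidden(s, W, hidden, rng):
    """One E-step sweep: flip each hidden entry s[t, j] with
    probability L2 / (L1 + L2), Eq. (4).

    s : (L, N) array of +/-1 states, hidden columns filled in.
    W : (N, N) inferred weights, W[i, j] couples node j to node i.
    """
    for t in range(1, len(s) - 1):       # interior times only, for brevity
        for j in hidden:
            H = W @ s[t]                              # H_i(t), current state
            Hf = H - 2.0 * s[t, j] * W[:, j]          # H_i(t) after the flip
            # terms P(s_i(t+1)|s(t)): log P = s*H - log(e^H + e^-H)
            dlogL = np.sum(s[t + 1] * (Hf - H)
                           - np.logaddexp(Hf, -Hf) + np.logaddexp(H, -H))
            # term P(s_j(t)|s(t-1)): the target spin itself is flipped
            dlogL += -2.0 * s[t, j] * (W[j] @ s[t - 1])
            # L2/(L1+L2) = 1/(1 + exp(-(log L2 - log L1)))
            if rng.random() < 1.0 / (1.0 + np.exp(-dlogL)):
                s[t, j] *= -1.0
    return s
```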
III. RESULTS

A. Kinetic Ising model

To demonstrate the performance of our method, we synthesized binary time series of $N = 100$ spins using the Sherrington-Kirkpatrick model [25]. The update of the spins follows Eq. (1) with preset coupling strengths $W_{ij}$ (Fig. 1A). Our goal is to reconstruct all of $W_{ij}$ from observations of a fraction of $\boldsymbol\sigma(t)$. Suppose that we only observe the time series of 60 spins, with 40 spins hidden (Fig. 1B). When we reconstructed the interactions $W_{ij}$ between observed node $i$ and observed node $j$ using FEM, the reconstructed $W_{ij}$, based on the partial observations, showed a large error (Fig. 1C). We then introduced 40 hidden variables and applied the EM algorithm outlined in the previous section. The reconstructed $W_{ij}$ was close to the true $W_{ij}$ (Fig. 1D). How well are the hidden variable configurations recovered? For the case of 40 hidden variables, the true configurations of the hidden variables were recovered with an accuracy of 96.6%. The reconstruction accuracy increased with fewer spins hidden (Fig. 2). For instance, when 90 spins were observed with 10 spins hidden, the accuracy was 97.6%.

The number of hidden variables is usually unknown in real-world problems. When we reconstructed $W_{ij}$ with different numbers of hidden variables, the mean square errors of observed-to-observed interaction strengths, ${\rm MSE} = N_v^{-2}\sum_{i,j \in {\rm obs}} (W_{ij} - W_{ij}^{\rm true})^2$, were minimal at the right number of hidden variables (Fig. 3, upper panels). The MSE is also inaccessible in real-world problems, but the minimum of $D$ (Eq. (6)) captured the correct value of $N_h$ (Fig. 3, lower panels, red lines).

To reconstruct $W_{ij}$ from observed and hidden variables, we used FEM. For the M step, mean field methods such as the naïve, Thouless-Anderson-Palmer, and exact mean field methods (nMF, TAP, and eMF), and maximum likelihood estimation (MLE), can also be used. A brief review of these methods can be found in Ref. [17]. Given partial observations, mean field approaches were not successful in reconstructing $W_{ij}$ (Fig. 4). For a small percentage of hidden variables (90 observable and 10 hidden), FEM and MLE showed similar performance in the reconstruction of observed-to-observed and hidden-to-observed interactions. However, FEM outperformed MLE in reconstructing observed-to-hidden and hidden-to-hidden interactions (Fig. 4A-D). For a large percentage of hidden variables (60 observable and 40 hidden), FEM showed significantly better performance even for observed-to-observed and hidden-to-observed interactions (Fig. 4E-H). We quantified the reconstruction performance by measuring the MSE between $W_{ij}$ and $W_{ij}^{\rm true}$. FEM showed more accurate reconstruction of $W_{ij}$, with lower MSE, in every case. More importantly, in addition to better performance, FEM took approximately 100 times less computation time than MLE due to its multiplicative update (see [17–19] for details).
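For reference, synthetic data of the kind used in this section can be generated directly from Eq. (1); a minimal sketch, in which the coupling scale $1/\sqrt{N}$ of the Sherrington-Kirkpatrick model is an assumption on our part:

```python
import numpy as np

rng = np.random.default_rng(0)
N, L = 100, 40_000
W_true = rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N))  # SK-type couplings

s = np.empty((L, N))
s[0] = rng.choice([-1.0, 1.0], size=N)
for t in range(L - 1):
    # Eq. (1): P(sigma_i = +1 | sigma(t)) = 1 / (1 + exp(-2 H_i(t)))
    p_up = 1.0 / (1.0 + np.exp(-2.0 * (W_true @ s[t])))
    s[t + 1] = np.where(rng.random(N) < p_up, 1.0, -1.0)

obs, hid = np.arange(60), np.arange(60, 100)  # 60 observed, 40 hidden spins
```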
B. Neuronal network

The analysis of real data brings out issues far more clearly than simulated validations. Therefore, we applied our method to infer hidden nodes and their contributions in a real neuronal network. We used the time series data of the 80 most active neurons from published multi-channel recordings of neuronal firing in the salamander retina [26]. Considering the existence of unobserved hidden neurons, we modeled the evolution of neuronal activities by defining a local field, $H_i(t) = H_i^{\rm ext} + \sum_j W_{ij}\sigma_j(t)$, that determines the future activity $\sigma_i(t+1)$. The external local field $H_i^{\rm ext}$ represents the bias of the $i$th neuron that sets its threshold. For various numbers of hidden neurons $N_h$, we computed $H_i^{\rm ext}$ and $W_{ij}$. The activities of observed neurons were explained better, and $D_v$ kept decreasing, with larger numbers $N_h$ of hidden neurons (Fig. 5B). However, once we considered the overall discrepancy $D$, the optimal number of hidden neurons was $N_h^* = 4$. Thus, the inclusion of four hidden neurons best explained the activities of observed neurons, taking model complexity into account. Given these four hidden neurons, the connection weights $W_{ij}$ were reconstructed as shown in Fig. 5C.
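One simple way to infer the bias $H_i^{\rm ext}$ with the same FEM machinery, though not necessarily the authors' implementation, is to append a constant unit to the state vector so that the bias becomes an ordinary column of $W$:

```python
import numpy as np

def augment_with_bias(s):
    """Fold H_ext into the weights: H_i(t) = H_ext_i + sum_j W_ij s_j(t)
    is a plain linear form over the augmented state (s(t), 1)."""
    return np.hstack([s, np.ones((len(s), 1))])  # last column never changes

# After inference on the augmented data, split the result:
# W_aug[:, :-1] are the couplings W_ij, and W_aug[:, -1] is H_ext_i.
```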
FIG. 1. (Color online) Network reconstruction from partial observations. From the actual interaction weights (A), typical time series of 100 variables are generated according to the kinetic Ising model (B). Using the configurations of 60 observable variables, the interaction weights are recovered in two cases: ignoring (C) and including (D) the existence of hidden variables. Data length L = 40,000 is used.

FIG. 2. Accuracy of network reconstruction and fraction of hidden variables. The inference accuracy is plotted as a function of the fraction of visible variables, $N_v/N$. System size N = 100 and data length L = 40,000 are used.
Since $W_{ij}^{\rm true}$ is unknown for the neuronal network, we validated our reconstruction in two different ways. First, we selected some neurons as input neurons, and then, based on the activities of these input neurons, we generated the activities of the remaining neurons by using the reconstructed $W_{ij}$. Here we selected as input neurons those having the strongest influence on other neurons, gauged by $\sum_i |W_{ij}|$. Given varying numbers of input neurons, we could successfully reconstruct the actual activities of the remaining neurons (Fig. 5D). As the number of input neurons increased, the reconstruction accuracy increased (Fig. 5F). Moreover, once the four hidden neurons were considered, the reconstruction accuracy was significantly improved. Second, given $\boldsymbol\sigma(t)$, we predicted $\boldsymbol\sigma(t+1)$, and then calculated the covariance $C_{ij} = \langle \delta\sigma_i(t+1)\,\delta\sigma_j(t) \rangle$, computed as in the sketch below. The reconstructed covariance was comparable with the actual covariance from the observation (Fig. 5E).
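The time-lagged covariance used in this check is straightforward to compute; a minimal sketch:

```python
import numpy as np

def lagged_cov(s):
    """C_ij = <delta s_i(t+1) delta s_j(t)> for an (L, N) spin series."""
    ds = s - s.mean(axis=0)
    return ds[1:].T @ ds[:-1] / (len(s) - 1)

# Compare the covariance of the recorded activity with that of activity
# regenerated from the inferred model:
# C_actual = lagged_cov(s_observed); C_model = lagged_cov(s_generated)
```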
C. Stock network

Our method has a wide range of practical applications. As a demonstration, we reconstruct a stock market network with possible hidden nodes. We used stock price time series of 25 major companies in the S&P 500 index in five different sectors: technology (AAPL, GOOGL, MSFT, INTC, IBM), finance (BRK.B, JPM, WFC, BAC, C), health care (JNJ, PFE, UNH, MRK, AMGN), consumer discretionary (AMZN, WMT, HD, DIS, EBAY), and energy and industrial (XOM, CVX, GE, BA, MMM) [27]. We examined the price difference between daily opening and closing stock prices. Their fluctuations from January 2005 to July 2018 are shown in Fig. 6A. First, instead of considering the continuous price fluctuations, we defined a discretized measure of price changes. If the daily price increased at time $t$ for the $i$th company (opening price < closing price), we defined $\sigma_i(t) = +1$. However, if the price decreased (opening price > closing price), then $\sigma_i(t) = -1$. Finally, if the price was unchanged (opening price = closing price), we defined $\sigma_i(t) = \sigma_i(t-1)$.
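This discretization rule is easy to implement directly; a minimal sketch, in which the tie-breaking on the very first day is our arbitrary choice:

```python
import numpy as np

def discretize(open_p, close_p):
    """Map daily open/close prices, arrays of shape (T, N), to spins:
    +1 if the price rose, -1 if it fell, previous day's value if flat."""
    s = np.sign(close_p - open_p)      # +1, -1, or 0
    s[0, s[0] == 0] = 1.0              # day one has no previous value
    for t in range(1, len(s)):
        flat = s[t] == 0
        s[t, flat] = s[t - 1, flat]    # carry forward the last change
    return s
```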
FIG. 3. (Color online) Estimation of hidden degrees of freedom. Mean square errors of observed variables (upper) and discrepancy $D_v$ of observed variables (lower, black circles) and discrepancy $D$ of total (observed and hidden) variables (lower, red triangles) are shown for differently assumed numbers $N_h$ of hidden variables. The actual numbers of hidden variables are $N_h$ = 10, 20, 30, and 40, from left to right (panels A-D and E-H). System size N = 100 and data length L = 40,000 are used.

FIG. 4. (Color online) Performance comparison between inference methods. Predicted interactions versus actual interactions, separately for observed-to-observed (A and E), hidden-to-observed (B and F), observed-to-hidden (C and G), and hidden-to-hidden (D and H) interactions, for two numbers of hidden variables, $N_h$ = 10 (first row) and 40 (second row). The mean square errors between predicted and actual interactions are shown as a function of the fraction ($N_v/N$) of observed variables over total variables (I-L). We compared five inference methods: naïve mean-field (nMF), Thouless-Anderson-Palmer (TAP), exact mean-field (eMF), Maximum Likelihood Estimation (MLE), and Free Energy Minimization (FEM). System size N = 100 and data length L = 40,000 are used. For MLE, we used a learning rate α = 1.
FIG. 5. (Color online) Neural network reconstruction with hidden nodes. Activities of the 80 most active neurons are plotted, with black dots representing active states and white dots representing silent states (A). Discrepancies between observed and expected neuronal activities (black circles) and discrepancies between entire neuronal activities and their expectations (red triangles) are shown as a function of the number of hidden variables (B). The predicted neuronal network is visualized, in which green nodes represent observed neurons while white nodes represent hidden variables. The red and blue edges represent positive and negative couplings, respectively. Edge direction is clockwise. Node size scales with firing rate, and edge thickness scales with coupling weight (C). Given the reconstructed coupling strengths, external local fields, and configurations of hidden variables, the activities of the 80 neurons are reconstructed (D). Inferred covariances $C_{ij}$ versus actual covariances $C_{ij}^{\rm true}$ (E). Inference accuracy of the remaining neuronal activities as a function of the number of input neurons, for two cases: ignoring (blue) and including (black) the existence of hidden variables (F).
FIG. 6. (Color online) Stock-market network reconstruction with hidden nodes. The time series of the difference between opening and closing prices of 25 American companies are shown from January 2005 to July 2018 (A). The network reconstruction is considered for different periods: from August 2014 to July 2016 (B, C, D) and from August 2016 to July 2018 (E, F, G). Discrepancies between observed and expected configurations (black circles) and discrepancies between entire variables and their expectations (red triangles) are computed to estimate the necessary number of hidden variables (B, E). Reconstructed stock-market networks are visualized (C, F), in which the red and blue edges represent positive and negative couplings, respectively. Edge direction is clockwise, and edge thickness scales with coupling strength. Inferred covariances $C_{ij}$ versus actual covariances $C_{ij}^{\rm true}$ (D, G). Cumulative profits are shown as a function of the time period for trading every day (full trade) (H) and on certain days (alternative trade) (I), with different trade strategies: random trades (black), strategic trades ignoring (blue) and including (red) hidden variables. Profit per transaction versus time window size for the network reconstruction (J), for everyday trade (dashed line) and certain-day trade (solid line).

From these discretized configurations, we inferred the external factors $H_i^{\rm ext}$ and interacting factors $\sum_j W_{ij}\sigma_j(t)$ that stochastically determine $\sigma_i(t+1)$. Since FEM works well even for small sample sizes [17], we divided the data into two-year periods to probe possible slower temporal changes in the interactions $W_{ij}$ between stock prices. In particular, we show results from more recent data, for 2014 to 2016 (Fig. 6B-D) and 2016 to 2018 (Fig. 6E-G). The discrepancy $D_v$ between the observed $\sigma_i^v(t+1)$ and the model expectation $\langle\sigma_i^v(t+1)\rangle_{\rm model}$ kept decreasing, as expected, when more hidden nodes were introduced (Fig. 6B and E). However, the entire discrepancy $D$, which accounts for the model complexity of the hidden variables, showed a minimum at an optimal number $N_h^*$ of hidden variables.

We then simulated stock trading based on the network reconstructed from a moving window of the most recent $T$ days: $\boldsymbol\sigma^v(t-T+1), \boldsymbol\sigma^v(t-T+2), \cdots, \boldsymbol\sigma^v(t)$. Then, we predicted the price change direction $\sigma^v(t+1)$ for the next day. Our strategy was to buy the stock $i$ that had the highest probability of increasing, with the maximum $H_i(t)$, and to sell the stock $j$ that had the highest probability of decreasing, with the minimum $H_j(t)$, at the beginning of the day. This trading strategy has a maximum profit bounded by (closing price($i$) − opening price($i$)) + (opening price($j$) − closing price($j$)). The trading simulation from 2008 to 2018 with a moving time window of T = 500 days obtained a 350% cumulative profit (Fig. 6H). This profit was significantly higher than the 50% profit from random trades, which is due to the secular rise of the entire stock index. Furthermore, the reconstructed network including hidden nodes showed a larger profit than the 250% profit from the reconstructed network ignoring the hidden nodes. Next, we refined the trading strategy by buying/selling the stock that has the highest probability of increasing/decreasing, but only if its price had decreased/increased on the previous day. In particular, this may result in only buying or only selling on any specific day. This new strategy produced the same cumulative profit in total, but it doubled the profit per transaction (Fig. 6I and J). Finally, we confirmed that the optimal time window for the highest profit was about T = 500 days (Fig. 6J).
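The basic trading rule reduces to picking the extreme local fields each day; a minimal sketch of the decision step (the backtesting loop, transaction bookkeeping, and window management are omitted):

```python
import numpy as np

def pick_trades(H):
    """One day's trades from the inferred local fields H_i(t): buy the
    stock most likely to rise, sell the one most likely to fall, since
    the probability of rising is monotone in H_i (Eq. (1))."""
    return np.argmax(H), np.argmin(H)  # (buy_index, sell_index)

# Daily profit bound quoted in the text:
# profit <= (close[buy] - open[buy]) + (open[sell] - close[sell])
```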
D. Classification of handwritten digits

Another potential application of our method, interpreting hidden states as labels, is unsupervised classification. We demonstrate this idea with the MNIST data of handwritten digits [28]. The data has 60,000 digit samples of 28 × 28 pixel gray-scale (between 0 and 255) images obtained from 500 different individuals. Some sample digits are shown in Fig. 7A. Our goal is to classify the 60,000 images into distinct clusters, where each cluster represents a different digit as well as a different writing style, without using the true labels. We formulated the classification problem as follows. Different combinations of digit and writing style are encoded in hidden variable states $\boldsymbol\sigma^h$. Then, one realization of $\boldsymbol\sigma^h(t)$ generates a digit image $\boldsymbol\sigma^v(t)$, where $t$ is now being used to index the MNIST images. The feature has $N_h$ degrees of freedom, with $\sigma_J^h(t)$, $J = 1, \cdots, N_h$. In particular, for simplicity, we adopted one-hot encoding by assigning only one nonzero element $\sigma_J^h(t) = 1$ among the $N_h$ elements of $\boldsymbol\sigma^h(t)$. Then, the generated image has binary values of $\sigma_i^v(t) = 1$ (gray value > 1) or $\sigma_i^v(t) = -1$ (otherwise) for the $i$th pixel, which is determined by the conditional probability,
$$P(\sigma_i^v(t) = \pm 1\,|\,\boldsymbol\sigma^h(t)) = \frac{\exp(\pm H_i(t))}{\exp(H_i(t)) + \exp(-H_i(t))}, \qquad (7)$$
where $H_i(t) \equiv \sum_J W_{iJ}\sigma_J^h(t)$ represents a local field acting on the $i$th pixel. Here, we ignored observed pixels $i$ if more than 95% of samples had the same value. The 95% threshold showed similar results to a more restrictive threshold of 99%. Thus, for this setup, our reconstruction method considers only hidden-to-observed interactions. Briefly summarizing the inference procedure, we (i) assign a random binary vector $\boldsymbol\sigma^h(t)$ in which only one element has a nonzero value ($\sigma_J^h(t) = 1$); (ii) apply FEM to reconstruct the interaction strength $W_{iJ}$ from hidden label $J$ to observed pixel $i$; and (iii) update the hidden states by assigning $\sigma_J^h(t) = 1$ for the label $J$ that maximizes the likelihood of the observed pixels of sample $t$, and $\sigma_J^h(t) = 0$ for the other $N_h - 1$ labels (see the sketch at the end of this section), repeating (ii) and (iii) until the discrepancy $D_v$ between $\sigma_i^v(t)$ and $\langle\sigma_i^v(t)\rangle_{\rm model}$ saturates. Then, the one-hot hidden states $\boldsymbol\sigma^h$ represent distinct classes of MNIST images $\boldsymbol\sigma^v$.

We examined various possible numbers (10 to 100) of labels by controlling the number $N_h$ of hidden variables. As the hidden degrees of freedom $N_h$ increased, the model generated images $\sigma_i^v(t)$ closer to the originals. In other words, the discrepancy $D_v$ kept decreasing as $N_h$ increased (Fig. 7B). However, once the model complexity was penalized for overuse of the hidden degrees of freedom, an optimal number of degrees of freedom $N_h^*$ was determined by the minimum of the overall discrepancy $D$. The estimate $N_h^* \approx 60$ means that the 60,000 MNIST images can be optimally clustered into about 60 classes of digits and writing styles. The mean images $N_c^{-1}\sum_{t \in c}\sigma_i^v(t)$ corresponding to the 60 labels are shown in Fig. 7C. Here $N_c$ is the number of samples corresponding to label $c$. It is of particular interest that each digit was divided into approximately six classes, suggesting that, in the MNIST dataset, about six different writing styles exist for every digit. To confirm the robustness of this result, we repeated the analysis with only 20,000 of the MNIST images, and obtained a similar conclusion (dashed lines in Fig. 7B).
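The one-hot E step referred to in item (iii) has a closed form: with a single active label $J$, the local field is simply $H_i = W_{iJ}$, so the per-label log-likelihood of an image can be evaluated for all labels at once. A minimal sketch (the function name is ours):

```python
import numpy as np

def assign_labels(sv, W):
    """One-hot E step: for each image, pick the label J that maximizes
    the log-likelihood of its pixels under Eq. (7).

    sv : (T, Nv) array of +/-1 pixels; W : (Nv, Nh) pixel-label weights.
    With one-hot hidden states, H_i(t) = W_iJ for the active label J, so
    log P(sv_i | J) = sv_i * W_iJ - log(2 cosh W_iJ), summed over pixels.
    """
    logL = sv @ W - np.log(2.0 * np.cosh(W)).sum(axis=0)  # shape (T, Nh)
    return np.argmax(logL, axis=1)     # cluster index for each image
```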
IV. SUMMARY
Given partial observations of systems, complete network reconstruction is a longstanding problem in inference. In this paper, we propose a new iterative approach based on free energy minimization (FEM) and expectation maximization. We demonstrated on simulated systems that our method can accurately estimate the actual number of hidden variables from partial observations. Furthermore, network reconstruction was successful in recovering not only observed-to-observed interactions but also those involving hidden variables (hidden-to-observed, observed-to-hidden, and hidden-to-hidden). Hidden-to-hidden interactions are challenging to reconstruct with mean-field methods [12, 14]. We applied this method to reconstruct a real neuronal network and a stock market network with the inclusion of possible hidden variables. The reconstructed networks were then validated by reproducing real neuronal activities and by a profitable trade simulation, respectively. Finally, as another potential application, unsupervised pattern classification, we found hidden labels in hand-written digit data.

FEM is more effective for network reconstruction than maximum likelihood estimation (MLE) because it separates the cost function evaluation from the independent multiplicative parameter update. This has two major benefits that are crucial for the application to hidden variable problems to succeed. The first is that the cost function can be used as a stopping criterion to avoid overfitting for small sample sizes, important when considering large numbers of possible hidden variables. The second is that the multiplicative update is computationally much more efficient (approximately 100 times faster than usual MLE-based network reconstruction methods), which is also critical for determining the configurations of hidden variables. Since the algorithm reconstructs the interaction strengths $W_{ij}$ from the $j$th node to the $i$th node independently for each node $i$, the network reconstruction can be easily parallelized, and therefore scaled to large system sizes.

ACKNOWLEDGMENT
This work was supported by the Intramural Research Program of the National Institutes of Health, NIDDK (D.-T.H., V.P.), and by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2016R1D1A1B03932264) (J.J.).

FIG. 7. (Color online) Hidden degrees of freedom for the classification of handwritten digit images. (A) 98 image samples randomly selected from the training set of the MNIST data. (B) Discrepancies between observed and expected configurations (black circles) and discrepancies between entire variables and their expectations (red triangles) are shown as a function of the number of hidden variables. For the clustering, we used 60,000 samples (solid lines) and 20,000 samples (dashed lines). (C) Mean images of each cluster, obtained from our inference method with 60 hidden variables. Different colors are used only to distinguish different clusters.

[1] E. Schneidman, M. J. Berry, 2nd, R. Segev, and W. Bialek, Nature, 1007 (2006).
[2] D. A. Dombeck, A. N. Khabbaz, F. Collman, T. L. Adelman, and D. W. Tank, Neuron, 43 (2007).
[3] J. P. Nguyen, F. B. Shipley, A. N. Linder, G. S. Plummer, M. Liu, S. U. Setru, J. W. Shaevitz, and A. M. Leifer, Proc Natl Acad Sci U S A, E1074 (2016).
[4] D. Bernal-Casas, H. J. Lee, A. J. Weitz, and J. H. Lee, Neuron, 522 (2017).
[5] T. R. Lezon, J. R. Banavar, M. Cieplak, A. Maritan, and N. V. Fedoroff, Proceedings of the National Academy of Sciences, 1013 (2009).
[7] Z. Bar-Joseph, A. Gitter, and I. Simon, Nature Reviews Genetics, 552 (2012).
[8] S. Pincus and R. E. Kalman, Proceedings of the National Academy of Sciences of the United States of America, 13709 (2004).
[9] C. K. Tse, J. Liu, and F. C. Lau, Journal of Empirical Finance, 659 (2010).
[10] B. M. Tabak, T. R. Serra, and D. O. Cajueiro, Physica A: Statistical Mechanics and its Applications, 3240 (2010).
[11] T. Bury, Physica A: Statistical Mechanics and its Applications, 1375 (2013).
[12] B. Dunn and Y. Roudi, Phys. Rev. E, 022127 (2013).
[13] A. P. Dempster, N. M. Laird, and D. B. Rubin, Journal of the Royal Statistical Society, Series B (Methodological), 1 (1977).
[14] J. Tyrcha and J. Hertz, Mathematical Biosciences and Engineering, 149 (2014).
[15] L. Bachschmid-Romano and M. Opper, Journal of Statistical Mechanics: Theory and Experiment, P06013 (2014).
[16] C. Battistin, J. Hertz, J. Tyrcha, and Y. Roudi, Journal of Statistical Mechanics: Theory and Experiment, P05021 (2015).
[17] D.-T. Hoang, J. Song, V. Periwal, and J. Jo, arXiv:1705.06384 (2018).
[18] D.-T. Hoang, J. Song, V. Periwal, and J. Jo, Network inference in stochastic systems (Accessed: October 25, 2018), https://nihcompmed.github.io/network-inference/.
[19] D.-T. Hoang, J. Jo, and V. Periwal, Network Inference with Hidden Variables (Accessed: November 20, 2018), https://nihcompmed.github.io/hidden-variable/.
[20] Y. Roudi and J. Hertz, Phys Rev Lett, 048702 (2011).
[21] M. Mézard and J. Sakellariou, Journal of Statistical Mechanics: Theory and Experiment, L07001 (2011).
[22] H.-L. Zeng, M. Alava, E. Aurell, J. Hertz, and Y. Roudi, Phys Rev Lett, 210601 (2013).
[23] H. Akaike, IEEE Transactions on Automatic Control, 716 (1974).
[24] G. Schwarz, The Annals of Statistics, 461 (1978).
[25] D. Sherrington and S. Kirkpatrick, Phys. Rev. Lett., 1792 (1975).
[26] G. Tkačik, O. Marre, D. Amodei, E. Schneidman, W. Bialek, and M. J. Berry, II, PLOS Computational Biology, 1 (2014).
[27] Fusion Media Limited, Stock Quotes: S&P 500 (Accessed: June 28, 2018).
[28] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Proceedings of the IEEE 86 (1998).