Money flow network among firms' accounts in a regional bank of Japan
Yoshi Fujiwara†, Hiroyasu Inoue, Takayuki Yamaguchi, Hideaki Aoyama, Takuma Tanaka
Graduate School of Simulation Studies, University of Hyogo, Kobe 650-0047, Japan
Center for Data Science Education and Research, Shiga University, Hikone 522-8522, Japan
RIKEN iTHEMS, Wako, Saitama 351-0198, Japan
Research Institute of Economy, Trade and Industry, Tokyo 100-0013, Japan
Graduate School of Data Science, Shiga University, Hikone 522-8522, Japan
July 29, 2020
Abstract
In this study, we investigate the flow of money among bank accounts possessed by firms in a region by employing an exhaustive list of all the bank transfers in a regional bank in Japan, to clarify how the network of money flow is related to the economic activities of the firms. The network statistics and structures are examined and shown to be similar to those of a nationwide production network. Specifically, the bowtie analysis reveals what we refer to as a "walnut" structure with core and upstream/downstream components. To quantify the location of an individual account in the network, we used the Hodge decomposition method and found that the Hodge potential of the account has a significant correlation with its position in the bowtie structure as well as with its net flow of incoming and outgoing money and links, namely the net demand/supply of individual accounts. In addition, we used non-negative matrix factorization to identify important factors underlying the entire flow of money; these factors can be interpreted as being associated with regional economic activities. One factor has a feature whereby the remittance source is localized to the largest city in the region, while the destination is scattered. The other factors correspond to the economic activities specific to different local places. This study serves as a basis for further investigation of the relationship between money flow and the economic activities of firms.
Keywords: input-output table, Hodge decomposition, non-negative matrix factorization, walnut structure
RIKEN-iTHEMS-Report-20
† Corresponding author: [email protected]

Introduction
Determining how money flows among economic entities is an important aspect of understanding the underlying economic activities. For example, the so-called flow of funds accounts record the financial transactions and the resulting credits and liabilities among households, firms, banks, and the government (see, e.g., [1]). Another example is the input-output table, which describes the purchase and sale relationships among producers and consumers within an economy and clarifies the flows of final and intermediate goods and services with respect to industrial sectors and product outputs (e.g., [2]). These data are used in macroscopic studies, such as those of industrial sectors and aggregated economic entities.

In recent years, microscopic data have become increasingly available. For example, one can study a nationwide production network, i.e., how individual firms transfer money among one another as suppliers and customers for transactions of goods and services (see [3] and the references therein). In contrast to the macroscopic studies mentioned above, microscopic studies can uncover the heterogeneous structure of the network and its role in economic activities, how the activities are affected by shocks due to natural disasters [4] and pandemics [5], and so forth. However, microscopic data are not exhaustive; although they may cover most active firms, not all the suppliers and customers are recorded. Such records are based on a survey in which a firm nominates a selected number of important customers and suppliers.
In addition, the transaction amounts are often lacking; hence, the network is directed but only binary. More importantly, both microscopic and macroscopic data are compiled and updated quarterly or annually at most (see [3, 6] and the references therein).

To uncover how economic entities such as firms perform economic activities in a real economy, we should ideally study how money flows among firms by using real-time data of bank transfers with exhaustive lists of accounts and transfers. To the best of our knowledge, such a study has not been conducted thus far, simply because such data are not available for academic purposes. The present study performs exactly such an analysis on a dataset from a Japanese bank. The bank is a regional bank with a high market share of loans and deposits in a prefecture, and it particularly supports financial transactions among the manufacturing firms located there (according to a disclosure issued by the bank).

The objective of this study is to investigate economic activities through bank transfers among firms' accounts, selecting all the transfers related to the firms to uncover how money flows behind the economic activities. More specifically, we examine the network and flow structures, especially the so-called bowtie structure, to locate the positions of individual accounts upstream and downstream of the entire flow. We quantify these locations using the Hodge decomposition of the flow. Furthermore, we identify significant factors underlying the entire flow and interpret them using geographical information associated with the firms' accounts.

Data
Our dataset comprises all the bank transfers sent from or received by the bank accounts in a regional bank. The regional bank is the largest bank in a prefecture of Japan that is mid-sized in terms of population (more than a million) and economic activity. Hereafter, we refer to it as Bank A for anonymity. The period covered in our study is from March 1, 2017, to July 31, 2019, i.e., a period of 29 months or 883 days.

During this period, there were 23 million transfers among 1.7 million bank accounts involving a total of 17.4 trillion yen (roughly 160 billion USD or 140 billion euros). Let us denote a transfer from account $i$ to account $j$ by $i \to j$. To focus only on the firms' accounts in Bank A, we filtered the data such that (i) both $i$ and $j$ are accounts of Bank A, (ii) both $i$ and $j$ are owned by firms (i.e., households are excluded), and (iii) self-loops $i \to i$ are deleted. Point (ii) is important for our purpose, because our concern here is how money flows and circulates among firms' accounts, which is considered to be closely related to the firms' economic activities. The resulting data are summarized in Table 1 (see the rightmost column).

Table 1: Bank accounts and transfers: summary
Number/Amount | Entire data | Within Bank A: all | Within Bank A: firms
For transfers $i \to j$, the column "Entire data" includes the cases in which either $i$ or $j$ is not an account of Bank A. The column "Within Bank A" corresponds to the case in which both $i$ and $j$ are accounts of Bank A. "firms" implies that both the source and the target of a link are firm accounts. M and T denote million and trillion, respectively.

Note that multiple transfers $i \to j$ can exist for a given pair of $i$ and $j$ because of repeated transfers. The strength of the directional relationship between a pair of accounts can be quantified either by the total amount of the transfers or by their frequency. To do so, we aggregate multiple transfers, if present, into a single link $i \to j$ with two types of weights, namely flow $f_{ij}$ and frequency $g_{ij}$ (see the illustration in Fig. 1).
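The aggregation of Fig. 1 together with filters (i)–(iii) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes each raw transfer is available as a hypothetical `(source, destination, amount)` tuple, and `is_bank_a` and `is_firm` are hypothetical predicates standing in for account metadata that the text does not describe.

```python
from collections import defaultdict


def aggregate_transfers(transfers, is_bank_a, is_firm):
    """Aggregate raw transfers (src, dst, amount) into links i -> j.

    Returns two dicts keyed by (i, j): the flow f_ij (total amount) and the
    frequency g_ij (number of transfers). is_bank_a and is_firm are
    hypothetical predicates on account identifiers supplied by the caller.
    """
    flow = defaultdict(int)
    freq = defaultdict(int)
    for src, dst, amount in transfers:
        if src == dst:                                 # (iii) drop self-loops
            continue
        if not (is_bank_a(src) and is_bank_a(dst)):    # (i) both accounts in Bank A
            continue
        if not (is_firm(src) and is_firm(dst)):        # (ii) both accounts owned by firms
            continue
        flow[(src, dst)] += amount                     # f_ij: total amount along i -> j
        freq[(src, dst)] += 1                          # g_ij: number of transfers
    return dict(flow), dict(freq)
```

With the transfers of Fig. 1, i.e., amounts 1, 2, and 4 from $i$ to $j$ and 1 from $j$ to $i$, the sketch yields $f_{ij} = 7$, $g_{ij} = 3$, $f_{ji} = 1$, and $g_{ji} = 1$.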
Hereafter, we use the term link for aggregated transfers. After the aggregation, the number of accounts or nodes in the network is $N = 30{,}613$, and the number of links is $M = 280{,}864$ (see Table 1).

The summary statistics of the links' flows $f_{ij}$ and frequencies $g_{ij}$ over all the pairs of accounts $i$ and $j$ are presented in Table 2. The distributions of both flow and frequency have large skewness, implying that a considerable fraction of the money flow is due to large amounts transferred along a small number of links.

[Diagram: transfers between accounts i and j aggregated into links with f_ij = 7, g_ij = 3 and f_ji = 1, g_ji = 1]
Figure 1:
Construction of bank-transfer network by aggregation.
How bank transfers are aggregated into links. Account $i$ made three transfers (1, 2, and 4, in an arbitrary unit of money) to $j$, while $j$ made one transfer (1) to $i$ during a certain period. Flow $f_{ij}$ is defined as the total amount of the transfers along $i \to j$ (here, $1 + 2 + 4 = 7$), and frequency $g_{ij}$ is the number of these transfers (here, 3).

Table 2: Summary statistics for links' flows and frequencies
Stats. | Flow (Yen) | Frequency
Min. | 1 | 1
Max. | $3.\ldots \times 10^{\ldots}$ | …
Summary statistics of the links' flows and frequencies over all the pairs of accounts, where links are aggregated transfers as defined in the main text and Fig. 1.
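The statistics behind Table 2 and the complementary cumulative distributions in Figs. 2, 4, and 5 are simple to compute. The sketch below is illustrative only: the text does not state which skewness estimator was used, so the plain moment-based form is an assumption, and the complementary cumulative distribution is taken here as $P(X \ge x)$, which is also an assumption.

```python
def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5 (population form, no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5          # raises ZeroDivisionError if all values are equal


def ccdf(values):
    """Empirical complementary CDF as (x, P(X >= x)) at each distinct value x."""
    xs = sorted(values)
    n = len(xs)
    points = []
    for k, x in enumerate(xs):
        if k == 0 or x != xs[k - 1]:        # first occurrence of a distinct value
            points.append((x, (n - k) / n))  # n - k values are >= x
    return points


def share_above(values, threshold):
    """Fraction of links whose weight strictly exceeds a threshold."""
    return sum(1 for v in values if v > threshold) / len(values)
```

For instance, `share_above(flows, 1e9)` would give the fraction of links carrying more than a billion yen, the quantity quoted as 0.1% in the discussion of Fig. 4, where `flows` is a hypothetical list of the $f_{ij}$ values.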
Results and Discussion
Network of firms’ accounts and links of transfers
First, let us summarize the structure of the network comprising firms' accounts as nodes and aggregated transfers as links. Recall that transfers are aggregated into links as shown in Fig. 1. The degree of an account is the number of links it receives or sends; the numbers of incoming and outgoing links of an account are called its in-degree and out-degree, respectively. Fig. 2 shows the distributions of the in-degree and out-degree as complementary cumulative distributions. By noting that the total number of accounts is $N = 30{,}613$, … ($r = 0.303$ ($p < 10^{-\ldots}$); Kendall's $\tau = 0.\ldots$ ($p < 10^{-\ldots}$)). We also observe that there are accounts that have many more incoming links than outgoing ones (and vice versa), which can be considered as "sinks" and "sources," respectively, with respect to the money flow.

[Plot: horizontal axis, degree; vertical axis, cumulative probability; legend: in-degree, out-degree]
Figure 2:
Degree distributions for the bank transfer network.
Complementary cumulative distributions for in-degree and out-degree, which refer to the numbers of incoming and outgoing links, respectively, of each account.

Each link has two weights, flow $f_{ij}$ and frequency $g_{ij}$ (see Fig. 1). Fig. 4 shows the complementary cumulative distribution of the flows of links. The distribution is highly skewed: a small number of links carry flows exceeding a billion yen, and these are likely important channels of money. Quantitatively, 0.1% of the links have flows larger than a billion yen.

Fig. 5 shows the complementary cumulative distribution of the frequencies of links. The steps at 30 and 60 on the horizontal axis presumably correspond to transfers performed once or twice each month (recall that the entire period includes 29 months). Further, 0.1% of the links have frequencies of 500 or more, which corresponds to roughly daily transfers on weekdays (the 883-day period contains about 630 weekdays).

Community analysis
Communities or clusters in a network are tightly knit groups with high intra-group density and low inter-group connectivity [7]. Community analysis is useful for understanding the heterogeneous structure of a network. We adopt the widely used Infomap method [8, 9] to detect communities in our data. The results are presented in Table 3. "Level" indicates the level of communities in the hierarchical tree of communities, which are detected recursively (see [9]). The number of communities indicates how many communities are detected at the corresponding level. The label "irr. comm." denotes irreducible communities, which cannot be decomposed further into smaller communities at the next level of the hierarchy. For example, 143 of the 164 communities at the first level are irreducible, whereas the rest are decomposed into 2,327 smaller communities at the next level, and so forth.

[Plot: horizontal axis, degree (in); vertical axis, degree (out)]
Figure 3:
Scatter plot for in-degree and out-degree of each account.
Each account, represented as a point, has incoming and outgoing links, the numbers of which are represented by the horizontal and vertical axes, respectively. The diagonal line represents the locations where the in-degree and out-degree are equal.

[Plot: horizontal axis, edge flow (in yen); vertical axis, cumulative probability]
Figure 4:
Distribution for the flows of links.
Complementary cumulative distributions for the amount of money $f_{ij}$ between each pair of accounts $i$ and $j$ (see Fig. 1).

[Plot: horizontal axis, edge frequency; vertical axis, cumulative probability]
Figure 5:
Distribution for the frequencies of transfers.
Complementary cumulative distributions for the frequency $g_{ij}$ between each pair of accounts $i$ and $j$ (see Fig. 1). Steps in the frequency can be observed around 30 and 60, which are presumed to be periodic transfers performed once or twice each month (recall that the entire period includes 29 months).

Table 3: Numbers of communities, irreducible communities, and accounts at each level of community analysis using Infomap
Level | Communities | Irr. comm. | Accounts
Bowtie structure
With respect to the flow of money, the accounts can be classified according to the so-called bowtie structure, which was first adopted in the study of the Internet [11]. Nodes in a directed network can be classified into a giant strongly connected component (GSCC), its upstream side as the IN component, its downstream side as the OUT component, and the remaining nodes of the giant weakly connected component (GWCC) that do not belong to any of GSCC, IN, and OUT. In general, they can be defined as follows.
GWCC
Giant weakly connected component: the largest connected component when the network is viewed as an undirected graph. At least one undirected path exists for an arbitrary pair of nodes in the component.

[Plot: horizontal axis, community size; vertical axis, rank]
Figure 6:
Distributions of the sizes of irreducible communities.
Rank-size plot for the sizes of irreducible communities detected using the Infomap method at all the levels, where communities are ranked in descending order of size, so that the largest rank equals the total number of irreducible communities (see Table 3). The size of a community is simply the number of nodes included in the community.
GSCC
Giant strongly connected component: the largest strongly connected component of the directed graph. For an arbitrary pair of nodes in the component, a directed path exists in each direction.
IN
Nodes outside the GSCC from which the GSCC can be reached via directed paths.
OUT
Nodes outside the GSCC that are reachable from the GSCC via directed paths.
TE
"Tendrils": the rest of the GWCC.
Therefore, the components satisfy
$$ \mathrm{GWCC} = \mathrm{GSCC} + \mathrm{IN} + \mathrm{OUT} + \mathrm{TE}. \tag{1} $$
For our data of the entire network with $N = 30{,}613$ nodes and $M = 280{,}864$ links, the GWCC comprises 30,225 (98.7%) nodes and 280,598 (99.9%) links. The components of GSCC, IN, and OUT are summarized in Table 4. As can be seen, nearly 40% of the accounts are inside GSCC. Further, 15% of the accounts are in the upstream portion, IN, whereas 37% are in the downstream portion, OUT. These figures are very similar to those observed in the production network in Japan in a previous study [10].

The set of three components GSCC, IN, and OUT is usually referred to as a "bowtie"; however, we find that the entire shape looks not like a bowtie but like a "walnut," in the sense that IN and OUT are two mutually disjoint thin skins enveloping the GSCC core rather than two wings extending from the center of a bowtie. In fact, the shortest-path lengths from GSCC to IN or OUT show that the accounts in the IN and OUT components are just a few steps away from GSCC, as shown in Table 5. This feature is also similar to the nationwide production network (see the walnut structure in [10]); however, it differs from many social and technological networks such as the Internet, where the maximum distances from GSCC to IN or OUT are usually very long (see the original paper [11]).
Table 4: Bowtie or "walnut" structure: size of each component.
Component | Number | Ratio
GSCC | 11,543 | 38.2%
IN | 4,508 | 14.9%
OUT | 11,270 | 37.3%
TE | 2,904 | 9.6%
Total | 30,225 | 100%
"Ratio" refers to the ratio of the number of accounts in each component to the total number of accounts in GWCC.
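A minimal pure-Python sketch of the bowtie (walnut) decomposition, including the shortest distances from GSCC to the IN and OUT nodes summarized in Table 5. It assumes the links are given as `(i, j)` pairs; strongly connected components are found with Kosaraju's algorithm and reachability with breadth-first search, which is a standard choice rather than the authors' documented implementation (graph libraries such as networkx provide equivalent routines).

```python
from collections import defaultdict, deque


def bfs_distances(adj, sources):
    """Multi-source BFS: shortest hop count from any source to each reachable node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def strongly_connected_components(nodes, succ, pred):
    """Kosaraju's algorithm with an iterative DFS (avoids recursion limits)."""
    visited, order = set(), []
    for root in nodes:
        if root in visited:
            continue
        visited.add(root)
        stack = [(root, iter(succ.get(root, ())))]
        while stack:
            u, children = stack[-1]
            for v in children:
                if v not in visited:
                    visited.add(v)
                    stack.append((v, iter(succ.get(v, ()))))
                    break
            else:                          # all children explored: u is finished
                stack.pop()
                order.append(u)
    assigned, components = set(), []
    for root in reversed(order):           # decreasing finishing time
        if root in assigned:
            continue
        assigned.add(root)
        comp, todo = {root}, [root]
        while todo:                        # search the reversed graph
            u = todo.pop()
            for v in pred.get(u, ()):
                if v not in assigned:
                    assigned.add(v)
                    comp.add(v)
                    todo.append(v)
        components.append(comp)
    return components


def bowtie(links):
    """Split the GWCC of a directed link list into GSCC, IN, OUT, and TE.

    IN and OUT are returned as {node: shortest distance to/from the GSCC}.
    """
    succ, pred, nodes = defaultdict(set), defaultdict(set), set()
    for i, j in links:
        if i != j:                         # self-loops are ignored
            succ[i].add(j)
            pred[j].add(i)
            nodes.update((i, j))
    undirected = {u: succ[u] | pred[u] for u in nodes}
    seen, gwcc = set(), set()
    for u in nodes:                        # weakly connected components
        if u not in seen:
            comp = set(bfs_distances(undirected, [u]))
            seen |= comp
            if len(comp) > len(gwcc):
                gwcc = comp
    gscc = max(strongly_connected_components(gwcc, succ, pred), key=len)
    out_dist = {v: d for v, d in bfs_distances(succ, gscc).items() if v not in gscc}
    in_dist = {v: d for v, d in bfs_distances(pred, gscc).items() if v not in gscc}
    te = gwcc - gscc - set(in_dist) - set(out_dist)
    return {"GWCC": gwcc, "GSCC": gscc, "IN": in_dist, "OUT": out_dist, "TE": te}
```

Taking `max(result["IN"].values())` and `max(result["OUT"].values())` gives the largest distances from GSCC; in a walnut-like network these stay small.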
Hodge decomposition: upstream/downstream flow
Our analysis of the bowtie structure implies that the nodes in IN and OUT are located on the upstream and downstream sides of the flow of money, respectively. The Hodge decomposition of the flow in a network is a mathematical method of ranking …

Table 5: "Walnut" structure: shortest distance from GSCC to IN/OUT.
Distance | IN to GSCC | OUT from GSCC

Figure 7: Walnut structure: a schematic view.
The so-called bowtie structure reveals that GSCC includes nearly 40% of all the nodes or accounts, while the IN and OUT components include 15% and 37%, respectively (see Table 4 for the details). The prominent features are as follows. (i) The shortest distances to IN and OUT from GSCC are quite small, typically 1 or 2, and 4 at most (Table 5); hence, the ties are not elongated like a "bowtie" but rather like a "walnut" skin. (ii) The nodes in the IN and OUT components are connected to nodes scattered widely in GSCC. See also the study of a supplier-customer network [10] with similar features.

Let $A_{ij}$ denote the adjacency matrix of our directed network of bank transfers, i.e.,
$$ A_{ij} = \begin{cases} 1 & \text{if there is a link from } i \text{ to } j, \\ 0 & \text{otherwise}. \end{cases} \tag{2} $$
Recall that the numbers of accounts and links are $N$ and $M$, respectively. We excluded all the self-loops, implying that $A_{ii} = 0$. Each link has a flow, denoted by $B_{ij}$, given either by the total amount of transfers, $f_{ij}$, or by the frequency of transfers, $g_{ij}$ (see Fig. 1), i.e.,
$$ B_{ij} = \begin{cases} f_{ij} \text{ or } g_{ij} & \text{if there is a flow from } i \text{ to } j, \\ 0 & \text{otherwise}. \end{cases} \tag{3} $$
Note that there may be a pair of accounts such that $A_{ij} = A_{ji} = 1$ and $B_{ij}, B_{ji} > 0$. In what follows, we use the frequency of transfers, $g_{ij}$, as $B_{ij}$, assuming that it represents the strength of the link.

Let us define a "net flow" $F_{ij}$ by
$$ F_{ij} = B_{ij} - B_{ji} \tag{4} $$
and a "net weight" $w_{ij}$ by
$$ w_{ij} = A_{ij} + A_{ji}. \tag{5} $$
Note that $w_{ij}$ is symmetric, i.e., $w_{ij} = w_{ji}$, and non-negative, i.e., $w_{ij} \ge 0$; it equals 2 when links exist in both directions between $i$ and $j$. We remark that Eq. (5) is simply a convention to account for mutual links between $i$ and $j$. One could multiply Eq. (5) by 0.5 or any other positive number, which does not change the result significantly for a large network.

Now, the Hodge decomposition is given by
$$ F_{ij} = F^{(c)}_{ij} + F^{(g)}_{ij}, \tag{6} $$
where the circular flow $F^{(c)}_{ij}$ satisfies
$$ \sum_j F^{(c)}_{ij} = 0, \tag{7} $$
which implies that the circular flow is divergence-free. The gradient flow $F^{(g)}_{ij}$ can be expressed as
$$ F^{(g)}_{ij} = w_{ij} (\phi_i - \phi_j), \tag{8} $$
i.e., as a difference of "potentials." In this manner, the weight $w_{ij}$ ensures that gradient flow is possible only where a link exists. We refer to the quantity $\phi_i$ as the Hodge potential. If $\phi_i$ is relatively large, account $i$ is located on the upstream side of the entire network, while a small $\phi_i$ implies that $i$ is located on the downstream side.

Eqs. (6)–(8) can be solved as follows. First, we combine them into the following equation for the Hodge potentials $\boldsymbol{\phi} \equiv (\phi_1, \ldots, \phi_N)$:
$$ \sum_j L_{ij} \phi_j = \sum_j F_{ij}, \tag{9} $$
for $i = 1, \ldots, N$. Here, $L_{ij}$ is the so-called graph Laplacian, defined by
$$ L_{ij} = \delta_{ij} \sum_k w_{ik} - w_{ij}, \tag{10} $$
where $\delta_{ij}$ is the Kronecker delta.
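The combination of Eqs. (6)–(8) into Eq. (9) follows by summing Eq. (6) over $j$ and using the divergence-free condition (7), the gradient form (8), and the definition (10):

```latex
\begin{aligned}
\sum_j F_{ij} &= \sum_j F^{(c)}_{ij} + \sum_j F^{(g)}_{ij}
  = 0 + \sum_j w_{ij}\,(\phi_i - \phi_j) \\
&= \phi_i \sum_k w_{ik} - \sum_j w_{ij}\,\phi_j
  = \sum_j \Bigl(\delta_{ij} \sum_k w_{ik} - w_{ij}\Bigr)\phi_j
  = \sum_j L_{ij}\,\phi_j .
\end{aligned}
```

Summing over $i$ as well gives zero on both sides, since $F_{ij} = -F_{ji}$ and $w_{ij} = w_{ji}$; this is consistent with the zero mode of $L$ discussed below.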
[Plot: horizontal axis, Hodge potential; vertical axis, frequency; legend: GSCC, IN, OUT, TE]
Figure 8:
Distribution of the Hodge potentials of individual accounts.
Distributions (histograms) of $\phi_i$ in each component of the bowtie or walnut structure (Fig. 7). The horizontal axis represents the value of $\phi_i$ of an individual node or account, while the vertical axis represents the frequency in the histogram. The black line corresponds to GSCC or the core. The blue and red lines correspond to the IN and OUT components, i.e., upstream and downstream with respect to the core, respectively. The green line corresponds to TE (tendrils), i.e., the rest of the nodes.

It is straightforward to show that, for a connected network such as the GWCC, the matrix $L = (L_{ij})$ has only one zero mode (eigenvector with zero eigenvalue), namely $\boldsymbol{\phi} = (1, 1, \ldots, 1)/\sqrt{N}$. The presence of this zero mode simply reflects the arbitrariness of the origin of $\phi$. All the other eigenvalues are positive (see, e.g., [17]). Therefore, Eq. (9) can be solved for the potentials once their origin is fixed. We require the potentials to average to zero, i.e., $\sum_i \phi_i = 0$.

The Hodge potentials obtained for the entire network of GWCC are shown in Fig. 8 as the distribution of the potentials of all the accounts in GWCC (red line). Since the average is zero by definition, we can see that the distribution is bimodal, with two peaks at positive and negative values, while a number of potentials are close to zero. The nodes in TE (tendrils) can be considered to have locations that are not particularly relevant to upstream or downstream; we expect these nodes mostly to have potentials close to zero, as shown by the blue line, i.e., the result after deleting all the nodes contained in TE. We can see that the TE nodes do not contribute to large absolute values of the Hodge potentials.

It can be expected that there is a correlation between the value of the Hodge potential and the net amount of demand or supply of money for each node.
We can measure the net amount of demand/supply by examining the in-degree and out-degree of the node, or alternatively, its in-flow and out-flow of money. Fig. 9 and Fig. 10 show the results. We find that if the potential is positive, the node is located on the upstream side, and its net degree and net flow are negative. If the potential is negative, the node is located on the downstream side, and its net degree and net flow are positive.

[Plot: horizontal axis, Hodge potential; vertical axis, net degree (in − out)]
Figure 9:
Hodge potential and net degree for each node.
Each point represents a node or an account. The net degree is defined as the difference between the in-degree and the out-degree of the node. If the net degree is positive, the node has more incoming links than outgoing ones, and vice versa.

This finding can be interpreted as follows. Consider a supplier in the production network, which supplies its products to a number of customers. The supplier has a bank account (or possibly multiple accounts) that receives money from the customers' accounts as the supplier's sales. Because money flows in the direction opposite to goods, if the supplier is on the upstream side of the supplier-customer relationship, its account is likely located on the downstream side of the money flows in this study. The supplier not only makes sales but also incurs costs, typically labor costs; these create outgoing flows from its account to households and other non-commercial entities, which are not included in the present study. Consequently, within the network studied here, the supplier's account has a positive net degree and net flow, while its Hodge potential is likely negative. A similar argument holds for customers in the opposite way. In other words, our finding is a direct observation of how the flow of money reflects the economic activities among the firms' accounts.
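A small dense sketch of Eqs. (4)–(10) in Python with NumPy, using the frequencies as $B_{ij}$ as in the text. It is an illustration under stated assumptions, not the authors' code: it assumes a connected network (as for the GWCC) and fixes the origin by $\sum_i \phi_i = 0$. A dense $N \times N$ Laplacian is impractical for $N \approx 30{,}000$; a sparse matrix with an iterative solver such as conjugate gradients (e.g., `scipy.sparse.linalg.cg`) would be the natural substitute.

```python
import numpy as np


def hodge_potential(B_links, nodes):
    """Hodge decomposition of Eqs. (4)-(10) for a small, connected network.

    B_links: {(i, j): B_ij} with B_ij > 0 for each directed link i -> j
             (the text uses the frequencies g_ij).
    Returns (phi, F_grad, F_circ), with phi normalized so that sum(phi) == 0.
    """
    idx = {u: k for k, u in enumerate(nodes)}
    n = len(nodes)
    A = np.zeros((n, n))
    B = np.zeros((n, n))
    for (i, j), b in B_links.items():
        if i != j:                                   # self-loops are excluded
            A[idx[i], idx[j]] = 1.0
            B[idx[i], idx[j]] = b
    F = B - B.T                                      # net flow, Eq. (4)
    w = A + A.T                                      # net weight, Eq. (5)
    L = np.diag(w.sum(axis=1)) - w                   # graph Laplacian, Eq. (10)
    rhs = F.sum(axis=1)                              # right-hand side of Eq. (9)
    # L is singular (constant zero mode); lstsq returns the minimum-norm
    # solution, which already has zero mean for a connected network.
    phi, *_ = np.linalg.lstsq(L, rhs, rcond=None)
    phi = phi - phi.mean()                           # enforce sum(phi) == 0
    F_grad = w * (phi[:, None] - phi[None, :])       # gradient flow, Eq. (8)
    F_circ = F - F_grad                              # circular flow, Eq. (6)
    return phi, F_grad, F_circ
```

In a chain a → b → c, the sketch assigns $\phi = (1, 0, -1)$: the source a is upstream with a positive potential and a net degree (in − out) of −1, matching the sign relation reported above, while a pure cycle a → b → c → a yields zero potentials and a purely circular flow.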
Non-negative matrix factorization (NMF): hidden factors of flows
We would like to show that there are hidden "factors" in the entire flow of the network. By "factor," we mean a component that can explain a significant part of the flow; in other words, the entire flow can be decomposed into a small number of factors.

[Plot: horizontal axis, Hodge potential; vertical axis, net flow (in − out, million yen)]
Figure 10:
Hodge potential and net flow for each node.
This figure is similar to Fig. 9 except for the vertical axis, which represents the net flow. The net flow is defined as the difference between the incoming and outgoing amounts of money.

In this section, we focus on the geographical information of bank transfers. Each bank account has an address. We obtain the latitudes and longitudes of the bank accounts by geocoding. Consequently, a bank transfer between two bank accounts has two coordinates: those of its remittance source and destination. We construct a non-negative matrix from the frequencies of transfers between geographical areas, and we apply NMF to find the hidden factors of the geographical structure of the flow.

NMF constructs an approximate factorization of a non-negative matrix [18]. For example, NMF is useful for processing facial images because it produces parts-based representations of such images [19]. To reveal the basic components of the geographical structure of bank transfers, we apply NMF to a non-negative matrix $V = (V_{mn})$ defined as follows. We take a square area including the prefecture and split it into $K \times K$ smaller squares in a lattice pattern, where $K = 100$. Let $R_{pq}$ be the $(p, q)$ small square area for $1 \le p, q \le K$. We consider the frequencies of bank transfers between two small square areas. Let $\alpha(p_1, q_1, p_2, q_2)$ be the frequency of bank transfers from $(p_1, q_1)$ to $(p_2, q_2)$ for $1 \le p_1, q_1, p_2, q_2 \le K$, i.e., using the frequency $g_{ij}$ of transfers from account $i$ to account $j$,
$$ \alpha(p_1, q_1, p_2, q_2) = \sum_{\{(i,j) \mid (x_i, y_i) \in R_{p_1 q_1},\, (x_j, y_j) \in R_{p_2 q_2}\}} g_{ij}, \tag{11} $$
where $(x_i, y_i)$ is the coordinate of the address of account $i$. The non-negative matrix $V$ of size $K^2 \times K^2$ is defined by
$$ V_{mn} = \log\left(\max\{1, \alpha(p_1, q_1, p_2, q_2)\}\right), \tag{12} $$
where $m = p_1 + (q_1 - 1)K$ and $n = p_2 + (q_2 - 1)K$.
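Eqs. (11) and (12) amount to binning account coordinates into a lattice and accumulating frequencies for each pair of cells. In the sketch below, the bounding box `bbox` of the square area, the assignment of $p$ to the east-west direction, and the natural logarithm are assumptions not fixed by the text; the coordinates are assumed to be planar (e.g., projected) values inside the box.

```python
import math
from collections import defaultdict

import numpy as np


def cell_index(x, y, bbox, K):
    """0-based version of m = p + (q - 1) K for a point inside the square bbox.

    Assumption: p indexes the x (east-west) direction and q the y direction.
    """
    x0, y0, x1, y1 = bbox
    p = min(int((x - x0) / (x1 - x0) * K), K - 1)   # p - 1 in 1-based notation
    q = min(int((y - y0) / (y1 - y0) * K), K - 1)   # q - 1 in 1-based notation
    return p + q * K


def lattice_matrix(coords, freq, bbox, K=100):
    """K^2 x K^2 matrix V of Eqs. (11)-(12); rows are source cells, columns destinations.

    coords: {account: (x, y)}; freq: {(i, j): g_ij}.
    """
    alpha = defaultdict(int)
    for (i, j), g in freq.items():
        m = cell_index(*coords[i], bbox, K)
        n = cell_index(*coords[j], bbox, K)
        alpha[(m, n)] += g                          # Eq. (11)
    V = np.zeros((K * K, K * K))                    # dense for clarity
    for (m, n), a in alpha.items():
        V[m, n] = math.log(max(1, a))               # Eq. (12), natural log assumed
    return V
```

For $K = 100$, the dense $10^4 \times 10^4$ array occupies about 800 MB in double precision, so a sparse format would be preferable in practice.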
For practical purposes, we convert the frequencies into their logarithmic values to reduce the influence of outliers. NMF gives the approximate factorization
$$ V \approx W H \tag{13} $$
for some integer $d$, where $W$ and $H$ are non-negative matrices of size $K^2 \times d$ and $d \times K^2$, respectively. We set $d = 10$ based on prior knowledge that the number of local communities in the prefecture is around 10. The $m$th row of $V$ collects the bank transfers from the source square $(p, q)$ with $m = p + (q - 1)K$; since each row of $V$ is approximated by a non-negative combination of the rows of $H$, the rows of $H$ constitute a basis of destination patterns of bank transfers for a given source. Similarly, the $n$th column of $V$ collects the bank transfers to the destination square $(p, q)$ with $n = p + (q - 1)K$, so the columns of $W$ constitute a basis of source patterns of bank transfers for a given destination. We can regard Eq. (13) as the approximation of $V$ by the sum of products of these basis vectors. Letting $w_m$ be the $m$th column vector of $W$ and $h_m$ be the $m$th row vector of $H$, we have
$$ V \approx \sum_{m=1}^{d} w_m h_m. \tag{14} $$
That is, the logarithms of the frequencies of bank transfers in the target area are decomposed into the matrices $w_m h_m$ for $m = 1, \ldots, d$.

A basis vector $v$, which is a column vector $w_m$ of $W$ or a row vector $h_m$ of $H$, can be converted into a $K \times K$ matrix $D(v)$ indexed by $1 \le p, q \le K$ on the geographical square area, because each index of $v$ corresponds to a small square area. In other words, $D(v)$ can be represented as a heatmap over the geographical area; Fig. 11 shows the heatmap of a basis vector. Since the basis vectors seem to indicate geographically localized structures, to quantify such structures, we consider for each basis vector the circular area that maximizes the sum of the entries of the basis vector included in it. Let $r_{pq}$ be the coordinate of the center of $R_{pq}$, and let $C_{pq}$ be a circular area of radius 10 km centered at $r_{pq}$. For a $K \times K$ matrix $E = (E_{pq})$ and a circular area $C$, we define
$$ \beta(C, E) = \frac{\sum_{\{(p,q) \mid r_{pq} \in C\}} E_{pq}}{\sum_{\{(p,q) \mid 1 \le p, q \le K\}} E_{pq}}. \tag{15} $$
The circular area $C'(v)$ and the proportion $\gamma(v)$ are calculated by
$$ C'(v) = \operatorname*{arg\,max}_{\{C_{pq} \mid 1 \le p, q \le K\}} \beta(C_{pq}, D(v)), \tag{16} $$
$$ \gamma(v) = \max_{\{C_{pq} \mid 1 \le p, q \le K\}} \beta(C_{pq}, D(v)). \tag{17} $$
The proportion $\gamma$ and the circular area $C'$ of a basis vector are shown in Fig. 11. Panels (A) and (B) in Fig. 12 show the proportions $\gamma$ of all the basis vectors of sources and destinations, respectively. The proportions are more than 23% except for one destination basis vector, whose proportion is 9% (Fig. 12(B)).

Figure 11:
Normalized basis vector obtained by NMF. The circular area has the largest sum of entries of the basis vector included in the circular area.
A normalized basis vector, whose entries sum to one, is converted into a heatmap whose lattice pattern corresponds to $R_{pq}$. The radius of the circular area is 10 km. The circular area is $C'(v)$ for some basis vector $v$, i.e., it is located at a position such that $\beta(\cdot, D(v))$ is maximized.
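The remaining steps, i.e., the factorization (13), the mapping $v \mapsto D(v)$, the proportion $\gamma(v)$ of Eqs. (15)–(17), and the cosine similarity used for Fig. 13, can be sketched as follows. The text does not say which NMF algorithm was used; the Lee–Seung multiplicative updates for the Frobenius loss shown here are one common choice and an assumption, as is `cell_km`, a hypothetical parameter for the side length of a small square in kilometres.

```python
import numpy as np


def nmf(V, d, n_iter=500, seed=0, eps=1e-12):
    """NMF V ~ W H via Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], d)) + eps
    H = rng.random((d, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def to_grid(v, K):
    """D(v): a length-K^2 basis vector as a K x K grid, grid[q-1, p-1] = v[m-1]."""
    return np.asarray(v, dtype=float).reshape(K, K)


def gamma(v, K, cell_km, radius_km=10.0):
    """Eqs. (15)-(17): gamma(v) and the 1-based (p, q) of the centre of C'(v)."""
    E = to_grid(v, K)
    centres = (np.arange(K) + 0.5) * cell_km
    yy, xx = np.meshgrid(centres, centres, indexing="ij")   # yy[q, p], xx[q, p]
    total = E.sum()
    best, best_pq = -1.0, None
    for q in range(K):
        for p in range(K):
            inside = (xx - xx[q, p]) ** 2 + (yy - yy[q, p]) ** 2 <= radius_km ** 2
            share = E[inside].sum() / total                  # beta(C_pq, D(v)), Eq. (15)
            if share > best:                                 # keep the first maximizer
                best, best_pq = share, (p + 1, q + 1)
    return best, best_pq


def cosine(a, b):
    """Eq. (18): cosine similarity of two basis vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

With `W, H = nmf(V, d=10)`, the source basis vectors are the columns `W[:, m]` and the destination basis vectors are the rows `H[m, :]`; `gamma` and `cosine` apply to these vectors directly. For $K = 100$ the brute-force search in `gamma` examines $10^4$ circles, which is slow but simple.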
[Panel (A): proportions γ(w_m) in %, labeled on the map: 34, 29, 24, 31, 26, 38, 36, 37, 26, 35]
[Panel (B): proportions γ(h_m) in %, labeled on the map: 34, 31, 26, 35, 31, 38, 9, 35, 23, 33]
Figure 12:
Circular areas corresponding to the basis vectors and proportions of the vector entries included in the circular areas. (A) is drawn from $w_m$, i.e., the basis vectors for sources, with the proportions $\gamma(w_m)$, while (B) is drawn from $h_m$, i.e., the basis vectors for destinations, with the proportions $\gamma(h_m)$ for $m = 1, \ldots, d$.
Figure 13:
Cosine similarities between basis vectors.
The vertical axis represents the indices of $h_s$, i.e., the $s$th row vector of $H$, and the horizontal axis represents the indices of $w_t$, i.e., the $t$th column vector of $W$. The index of the top left square is $(s, t) = (0, 0)$.

Fig. 13 shows the cosine similarities between the basis vectors. The cosine similarity between $w_m$ and $h_n$ is calculated by
$$ \frac{w_m \cdot h_n}{\lVert w_m \rVert \, \lVert h_n \rVert}, \tag{18} $$
where $w_m \cdot h_n$ is the inner product of $w_m$ and $h_n$ and $\lVert \cdot \rVert$ is the Euclidean norm of a vector. All the diagonal entries except for one are close to 1, i.e., the $m$th basis vector $h_m$ is similar to the $m$th basis vector $w_m$ except for $m = 7$. These pairs are the geographically localized basis vectors in Fig. 12, and the similarity within each pair implies that the incoming and outgoing bank transfers of a local area have similar patterns.

We can also interpret the seventh basis vectors of the source and destination, which are dissimilar. The seventh basis vector of the sources is localized to the largest city in the prefecture, whereas the seventh basis vector of the destinations is scattered throughout the prefecture. This means that this pair of basis vectors corresponds to bank transfers from the largest city to the local areas. Therefore, Eq. (14) applied to our data gives a decomposition into bank transfers within local areas and bank transfers from the largest city to local areas.

Finally, we state the results of NMF with different values of $d$. To investigate how the basis vectors change with $d$, we apply NMF to $V$ with $d = 5, \ldots, 15$. In all cases, most of the basis vectors are geographically localized and form source-destination pairs that are similar to each other and correspond to bank transfers within local areas. All the basis vectors are localized for $d < 7$, and there is a pair of basis vectors corresponding to bank transfers between the largest city and local areas for $d \ge 7$. For all the values of $d$ that we examined, the basis vectors correspond to either bank transfers within local areas or bank transfers between the largest city and other local areas.

Conclusion
We studied an exhaustive list of bank accounts of firms and remittances from source to destination within a regional bank with a high market share of loans and deposits in a prefecture of Japan. By studying such a network of money flow, we could uncover how firms conduct the underlying economic activities as suppliers and customers from the upstream side to the downstream side of the money flow. We aggregated the remittances between each pair of accounts into a link over the period from March 2017 to July 2019 (i.e., approximately two and a half years), yielding a network of 30K nodes and 0.28M links. We found that the statistical features of the network are similar to those of the nationwide production network in Japan [3], but with greater emphasis on regional aspects.

The bowtie analysis revealed what we refer to as a "walnut" structure, in which the core and the upstream/downstream components are tightly connected, typically within a few steps. By quantifying the location of each firm's account using the method of Hodge decomposition, we found that the Hodge potential of each node describes its location in the entire flow of money from the upstream side to the downstream side. In particular, the Hodge potentials are significantly correlated with the net flows of incoming and outgoing money and links, as well as with the walnut structure. This implies that we can characterize the net demand/supply of each node and decompose the flows into gradient flows, driven by differences in potentials, and divergence-free circular flows. Furthermore, by using non-negative matrix factorization, we found that the entire flow can be considered as a combination of several significant factors. One factor has a feature whereby the remittance source is localized to the largest city in the region, while the destination is scattered.
The other factors correspond to economic activities specific to different local places, which can be interpreted as local activities of the economy.
Several points remain to be studied separately from the present work. While we aggregated the entire period in this paper, it would be interesting to determine how the network changes with time by examining the time-stamps recorded for every remittance. At time scales of days, weeks, and months, there are quite likely intra-day, weekly, and seasonal patterns of activity. More interestingly, under mild booms and busts of the regional economy over relatively long time scales, the economic agents might change their behavior, possibly by changing their transaction partners. Alternatively, under sudden shocks such as natural disasters or pandemics, the agents may change their usual patterns abruptly. All of these are important aspects of a temporally changing network.
In addition, the amounts of money flow warrant further investigation, because the dominant driving force likely comes from “giant players” who demand or supply large amounts of money. Moreover, it would be interesting to extract them as a subgraph by choosing only links with flow amounts larger than a certain threshold. These topics will be studied in our future work.
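As a minimal sketch of the thresholding idea above, combined with the Hodge potential used in this paper, the following Python code (assuming NumPy) keeps only links whose aggregated amount exceeds a threshold and then computes a least-squares combinatorial Hodge potential on the remaining network, in the spirit of [12]. The helper names `threshold_links` and `hodge_potential`, the `(source, destination, amount)` edge-list format, and the choice of money amounts (rather than unweighted link directions) as the net flow are illustrative assumptions, not the exact procedure of this paper. In this convention the gradient part of the flow runs from higher to lower potential, so upstream accounts receive larger potentials.

```python
import numpy as np


# Hypothetical helpers for illustration; the input is assumed to be an edge list of
# (source, destination, amount) tuples with integer account ids in [0, n).
def threshold_links(links, threshold):
    """Keep only links whose amount is strictly larger than `threshold`."""
    return [(s, t, a) for s, t, a in links if a > threshold]


def hodge_potential(links, n):
    """Least-squares (combinatorial) Hodge potential of a directed flow network.

    Returns (phi, F, F_grad): the node potentials, the antisymmetric net flow
    F_ij = A_ij - A_ji, and its gradient part F_grad_ij = w_ij (phi_i - phi_j).
    The remainder F - F_grad is divergence-free (zero net outflow at every node).
    Assumption: money amounts are used as flows; unweighted directions would also work.
    """
    A = np.zeros((n, n))
    for s, t, a in links:
        A[s, t] += a                      # aggregate repeated remittances per pair
    np.fill_diagonal(A, 0.0)              # ignore transfers within the same account
    F = A - A.T                           # net flow from i to j
    W = ((A + A.T) > 0).astype(float)     # 1 if i and j are linked in either direction
    L = np.diag(W.sum(axis=1)) - W        # graph Laplacian of the undirected links
    div = F.sum(axis=1)                   # net outflow (divergence) of each node
    # L is singular; lstsq returns the minimum-norm solution, which sums to zero
    # on every connected component and assigns 0 to isolated accounts.
    phi, *_ = np.linalg.lstsq(L, div, rcond=None)
    F_grad = W * (phi[:, None] - phi[None, :])
    return phi, F, F_grad


if __name__ == "__main__":
    # Toy example: 0 pays 1, 1 pays 2, and a small payment 2 -> 3 is filtered out.
    links = [(0, 1, 5.0), (1, 2, 5.0), (2, 3, 0.1)]
    phi, F, F_grad = hodge_potential(threshold_links(links, threshold=1.0), n=4)
    print(np.round(phi, 3))  # upstream accounts have larger potentials
```

Because the Laplacian is singular, the potential is defined only up to an additive constant on each connected component; the minimum-norm least-squares solution fixes this constant so that the potentials sum to zero on each component. A dense matrix is used here only for readability; a network of the size studied in this paper would call for sparse matrices and an iterative solver.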
Acknowledgements
We would like to thank Bank A for giving us an opportunity to study such a unique and valuable dataset. We are also grateful to Yoshiaki Nakagawa (Center for Data Science Education and Research, Shiga University) for insightful discussions.
Funding
This work was supported in part by MEXT as Exploratory Challenges on Post-K computer (Studies of Multi-level Spatiotemporal Simulation of Socioeconomic Phenomena), the project “Large-scale Simulation and Analysis of Economic Network for Macro Prudential Policy” undertaken at the Research Institute of Economy, Trade and Industry (RIETI), and JSPS KAKENHI Grant Numbers 17H02041, 19K22032, and 20H02391.
Availability of data and materials
The dataset is available in a collaborative scheme upon request to TT and YF at Shiga University.
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
All authors contributed equally. All authors read and approved the final manuscript.
References
[1] Bank of Japan: Guide to Japan’s Flow of Funds Accounts. Accessed June 2020
[2] OECD: Input-Output Tables. Accessed June 2020
[3] Aoyama, H., Fujiwara, Y., Ikeda, Y., Iyetomi, H., Souma, W., Yoshikawa, H.: Macro-Econophysics – New Studies on Economic Networks and Synchronization. Cambridge University Press, Cambridge, UK (2017)
[4] Inoue, H., Todo, Y.: Firm-level propagation of shocks through supply-chain networks. Nature Sustainability, 841–847 (2019)
[5] Inoue, H., Todo, Y.: The Propagation of Economic Impacts through Supply Chains: The Case of a Mega-city Lockdown to Prevent the Spread of COVID-19. Research Institute of Economy, Trade and Industry (RIETI) Discussion Paper Series (2020)
[6] Fujiwara, Y., Aoyama, H.: Large-scale structure of a nation-wide production network. The European Physical Journal B (4), 565–580 (2010)
[7] Barabási, A.-L.: Network Science. Cambridge University Press, Cambridge, UK (2016)
[8] Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences (4), 1118–1123 (2008)
[9] Rosvall, M., Bergstrom, C.T.: Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE (4), e18209 (2011)
[10] Chakraborty, A., Kichikawa, Y., Iino, T., Iyetomi, H., Inoue, H., Fujiwara, Y., Aoyama, H.: Hierarchical communities in walnut structure of Japanese production network. PLoS ONE (2018). doi:10.1371/journal.pone.0202739
[11] Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., Wiener, J.: Graph structure in the Web. Computer Networks (1-6), 309–320 (2000)
[12] Jiang, X., Lim, L.-H., Yao, Y., Ye, Y.: Statistical ranking and combinatorial Hodge theory. Mathematical Programming (1), 203–244 (2011)
[13] Miura, K., Aoki, T.: Scaling of Hodge-Kodaira decomposition distinguishes learning rules of neural networks. IFAC-PapersOnLine (18), 175–180 (2015). 4th IFAC Conference on Analysis and Control of Chaotic Systems CHAOS 2015
[14] Kichikawa, Y., Iyetomi, H., Iino, T., Inoue, H.: Hierarchical and Circular Flow Structure of Interfirm Transaction Networks in Japan. https://ssrn.com/abstract=3173955 (2018)
[15] Iyetomi, H., Aoyama, H., Fujiwara, Y., Souma, W., Vodenska, I., Yoshikawa, H.: Relationship between macroeconomic indicators and economic cycles in U.S. Sci. Rep., 8420 (2020). https://doi.org/10.1038/s41598-020-65002-3
[16] MacKay, R., Johnson, S., Sansom, B.: How directed is a directed network? arXiv preprint arXiv:2001.05173 (2020)
[17] Fujiwara, Y., Islam, R.: Hodge Decomposition of Bitcoin Money Flow. Springer, in press (2020)
[18] Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Proceedings of the 13th International Conference on Neural Information Processing Systems. NIPS’00, pp. 535–541. MIT Press, Cambridge, MA, USA (2000)
[19] Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature (6755), 788–791 (1999). doi:10.1038/44565