Money flow network among firms' accounts in a regional bank of Japan

Abstract

In this study, we investigate the flow of money among bank accounts possessed by firms in a region by employing an exhaustive list of all the bank transfers in a regional bank in Japan, to clarify how the network of money flow is related to the economic activities of the firms. The network statistics and structures are examined and shown to be similar to those of a nationwide production network. Specifically, the bowtie analysis indicates what we refer to as a "walnut" structure with core and upstream/downstream components. To quantify the location of an individual account in the network, we used the Hodge decomposition method and found that the Hodge potential of the account has a significant correlation to its position in the bowtie structure as well as to its net flow of incoming and outgoing money and links, namely the net demand/supply of individual accounts. In addition, we used non-negative matrix factorization to identify important factors underlying the entire flow of money; it can be interpreted that these factors are associated with regional economic activities. One factor has a feature whereby the remittance source is localized to the largest city in the region, while the destination is scattered. The other factors correspond to the economic activities specific to different local places. This study serves as a basis for further investigation of the relationship between money flow and economic activities of firms.


Yoshi Fujiwara†, Hiroyasu Inoue, Takayuki Yamaguchi, Hideaki Aoyama, Takuma Tanaka

Graduate School of Simulation Studies, University of Hyogo, Kobe 650-0047, Japan
Center for Data Science Education and Research, Shiga University, Hikone 522-8522, Japan
RIKEN iTHEMS, Wako, Saitama 351-0198, Japan
Research Institute of Economy, Trade and Industry, Tokyo 100-0013, Japan
Graduate School of Data Science, Shiga University, Hikone 522-8522, Japan

July 29, 2020


Keywords: input-output table, Hodge decomposition, non-negative matrix factorization, walnut structure

RIKEN-iTHEMS-Report-20

† Corresponding author: [email protected]

arXiv [q-fin.GN]

Introduction

Determining how money flows among economic entities is an important aspect of understanding the underlying economic activities. For example, the so-called flow of funds accounts record the financial transactions and the resulting credits and liabilities among households, firms, banks, and the government (see, e.g., [1]). Another example is the input-output table, which describes the purchase and sale relationships among producers and consumers within an economy and clarifies the flows of final and intermediate goods and services with respect to industrial sectors and product outputs (e.g., [2]). These data are used in macroscopic studies, such as those of industrial sectors and aggregated economic entities.

Recent years have witnessed the increasing emergence of microscopic data. For example, one can study a nationwide production network, i.e., how individual firms transfer money among one another as suppliers and customers for transactions of goods and services (see [3] and the references therein). In contrast to the macroscopic studies mentioned above, microscopic studies can uncover the heterogeneous structure of the network and its role in economic activities, how the activities are subject to shocks due to natural disasters [4] and pandemics [5], and so forth. However, microscopic data are not exhaustive; although they may cover most active firms, not all the suppliers and customers are recorded. Such records are based on a survey in which a firm nominates a selected number of important customers and suppliers. In addition, the transaction amounts are often lacking; hence, the network is directed but only binary. More importantly, microscopic and macroscopic data are compiled and updated annually or quarterly at most (see [3, 6] and the references therein).

To uncover how economic entities such as firms perform economic activities in a real economy, we should ideally study how money flows among firms by using real-time data of bank transfers with exhaustive lists of accounts and transfers. To the best of our knowledge, such a study has not been conducted thus far, simply because such data are not available for academic purposes. The present study performs precisely such an analysis of a Japanese bank's dataset. The bank is a regional bank, which has a high market share with respect to the loans and deposits in a prefecture, particularly supporting financial transactions among the manufacturing firms located there (according to a disclosure issued by the bank).

The objective of this study is to investigate economic activities via bank transfers among firms' accounts by selecting all the transfers related to the firms to uncover how money flows behind the economic activities. More specifically, we examine the network and flow structures, especially the so-called bowtie structure, to locate the position of individual accounts upstream and downstream of the entire flow. We quantify the location using the method of Hodge decomposition of the flow. Furthermore, we find significant factors underlying the entire flow and interpret them using geographical information associated with the firms' accounts.

Data

Our dataset comprises all the bank transfers that are sent from or received by the bank accounts in a regional bank. The regional bank is the largest bank in a prefecture in Japan (mid-sized in terms of its population (more than a million) and economic activity). Hereafter, we refer to it as Bank A for anonymity. The period covered in our study is from March 1, 2017, to July 31, 2019, i.e., a period of 29 months or 883 days.

During this period, there were 23 million transfers among 1.7 million bank accounts involving a total of 17.4 trillion yen (roughly 160 billion USD or 140 billion Euros). Let us denote a transfer from account $i$ to account $j$ by $i \to j$. To focus only on the firms' accounts in Bank A, we filtered the data such that (i) both $i$ and $j$ are accounts of Bank A, (ii) both $i$ and $j$ are owned by firms, excluding households, and (iii) self-loops $i \to i$ are deleted. Point (ii) is important for our purpose, because our concern here is how money flows and circulates among firms' accounts, which is considered to be closely related to the firms' economic activities. The resulting data are summarized in Table 1 (see the rightmost column).

Table 1: Bank accounts and transfers: summary
Number/Amount    Entire data    Within Bank A (all)    Within Bank A (firms)
[...]
For a transfer $i \to j$, the column "Entire data" includes the cases in which either $i$ or $j$ is not an account of Bank A. The column "Within Bank A" corresponds to the case in which both $i$ and $j$ are accounts of Bank A. "firms" implies that both the source and the target of a link are firm accounts. M and T denote million and trillion, respectively.

Note that multiple transfers $i \to j$ can exist for a given pair of $i$ and $j$, because of frequent transfers. One can quantify the strength of the directional relationship between a pair of accounts either by the flow of transfers or by their frequency. To do so, we aggregate multiple transfers, if present, into a single link $i \to j$ with two types of weights, namely flow $f_{ij}$ and frequency $g_{ij}$ (see the illustration in Fig. 1). Hereafter, we use the term link for aggregated transfers. The number of accounts or nodes in the network is $N$ = 30,613, and the number of links is $M$ = 280,864 after the aggregation (see Table 1).

The summary statistics of the links' flows $f_{ij}$ and frequencies $g_{ij}$ for all the pairs of accounts $i$ and $j$ are presented in Table 2. One can observe that the distributions for flow and frequency have large skewness, implying that a considerable fraction of the money flow is due to a large amount transferred by a small number of flows.

Figure 1: Construction of bank-transfer network by aggregation. How bank transfers are aggregated into links. $i$ made three transfers (1, 2, and 4) in an arbitrary unit of money to $j$, while $j$ made one transfer (1) to $i$ during a certain period. Flow $f_{ij}$ is defined by the total flow of transfers along $i \to j$. Frequency $g_{ij}$ is the frequency of these transfers. In the aggregated network, $f_{ij} = 7$, $g_{ij} = 3$, $f_{ji} = 1$, and $g_{ji} = 1$.

Table 2: Summary statistics for links' flows and frequencies
Stats.    Flow (Yen)          Frequency
Min.      1                   1
Max.      3.[...] × [...]     [...]
[...]
Summary statistics of the links' flows and frequencies for all the pairs of accounts, where links are aggregated transfers as defined in the main text and Fig. 1.
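The aggregation of transfers into links illustrated in Fig. 1 can be sketched in a few lines of Python. This is only a minimal illustration: the function name `aggregate_transfers` and the `(source, target, amount)` record layout are assumptions made here, not the format of the bank's data.

```python
from collections import defaultdict


def aggregate_transfers(transfers):
    """Aggregate raw transfers into links.

    `transfers` is an iterable of (source, target, amount) records
    (a hypothetical layout). Returns two dicts keyed by (i, j):
    the flow f_ij (total amount) and the frequency g_ij (number of
    transfers). Self-loops i -> i are dropped, as in filtering step (iii).
    """
    flow = defaultdict(int)
    freq = defaultdict(int)
    for src, dst, amount in transfers:
        if src == dst:
            continue  # skip self-loops
        flow[(src, dst)] += amount
        freq[(src, dst)] += 1
    return dict(flow), dict(freq)


# Toy example of Fig. 1: i sends 1, 2, and 4 to j; j sends 1 to i.
transfers = [("i", "j", 1), ("i", "j", 2), ("i", "j", 4), ("j", "i", 1)]
flow, freq = aggregate_transfers(transfers)
print(flow)  # {('i', 'j'): 7, ('j', 'i'): 1}
print(freq)  # {('i', 'j'): 3, ('j', 'i'): 1}
```

The number of links M is then the number of keys in either dictionary, and the in-degree and out-degree of an account follow by counting the keys in which it appears as target or source.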

Results and Discussion

Network of firms' accounts and links of transfers

First, let us summarize the network structure comprising firms' accounts as nodes and aggregated transfers as links. We remark that transfers are aggregated into links as shown in Fig. 1. The degree of an account is the number of links attached to it. The number of incoming and outgoing links of an account is called the in-degree and out-degree, respectively. Fig. 2 shows the distributions of the in-degree and out-degree as complementary cumulative distributions. By noting that the total number of accounts is $N$ = 30,613, [...] ($r = 0.303$ ($p <$ [...]); Kendall's $\tau = 0.$[...] ($p <$ [...])). We also observe that there are accounts that have many more incoming links than outgoing ones (and vice versa), which can be respectively considered as "sinks" and "sources" with respect to the money flow.

Figure 2: Degree distributions for the bank transfer network. Complementary cumulative distributions for in-degree and out-degree, which refer to the number of incoming and outgoing links, respectively, of each account. (Horizontal axis: degree; vertical axis: cumulative probability.)

Figure 3: Scatter plot for in-degree and out-degree of each account. Each account as a node, represented as a point, has incoming links and outgoing links, the numbers of which are represented by the horizontal and vertical axes, respectively. The diagonal line represents the locations where the in-degree and out-degree are equal.

We can observe each link's weights, namely flow $f_{ij}$ and frequency $g_{ij}$ (see Fig. 1). Fig. 4 shows the complementary cumulative distribution for the flow along each link. The distribution is highly skewed; there exist a small number of links that have a large amount of flow exceeding a billion yen, which are likely important channels with large flows of money. Quantitatively, 0.1% of the links have flows larger than a billion yen.

Figure 4: Distribution for the flows of links. Complementary cumulative distributions for the amount of money defined by $f_{ij}$ between each pair of accounts $i$ and $j$ (see Fig. 1). (Horizontal axis: edge flow in yen; vertical axis: cumulative probability.)

Fig. 5 shows the complementary cumulative distribution for the frequency along each link. The steps at 30 and 60 on the horizontal axis are considered to correspond to transfers performed once or twice in each month (recall that the entire period includes 29 months). We can see that 0.1% of the links have frequencies of 500 or more, corresponding to daily transfers on weekdays.

Figure 5: Distribution for the frequencies of transfers. Complementary cumulative distributions for the frequency defined by $g_{ij}$ between each pair of accounts $i$ and $j$ (see Fig. 1). We can observe that there are frequency steps around 30 and 60, which are presumed to be periodic transfers performed once or twice in each month (recall that the entire period includes 29 months). (Horizontal axis: edge frequency; vertical axis: cumulative probability.)

Community analysis

Communities or clusters in a network are tightly knit groups with high intra-group density and low inter-group connectivity [7]. Community analysis is useful for understanding how a network has such heterogeneous structures. We adopt the widely used Infomap method [8, 9] to detect communities in our data.

The results are presented in Table 3. "Level" indicates the level of communities in a hierarchical tree of communities that are detected recursively (see [9]). The number of communities indicates how many communities are detected at the corresponding level. The label "irr. comm." denotes irreducible communities that cannot be decomposed further to the next level of smaller communities in the hierarchical decomposition. For example, 143 of 164 communities at the first level are irreducible ones, whereas the rest of them are decomposed into 2,327 smaller communities at the next level, and so forth.

Table 3: Numbers of communities, irreducible communities, and accounts at each level of community analysis using Infomap
Level    [...]

Figure 6: Distributions of the sizes of irreducible communities. Rank-size plot for the sizes of irreducible communities detected using the Infomap method at all the levels, where the ranks are in descending order of the size with the lowest rank equal to the total number of irreducible communities (see Table 3). The size of a community is simply the number of nodes included in the community. (Horizontal axis: community size; vertical axis: rank.)

Bowtie structure

With respect to the flow of money, the accounts can be located in a classification of the so-called bowtie structure, which was first adopted in the study of the Internet [11]. Nodes in a directed network can be classified into a giant strongly connected component (GSCC), its upstream side as the IN component, its downstream side as the OUT component, and the rest of the nodes that do not belong to any of GSCC, IN, and OUT. In general, they can be defined as follows.

GWCC  Giant weakly connected component: the largest connected component when viewed as an undirected graph. At least one undirected path exists for an arbitrary pair of nodes in the component.

GSCC  Giant strongly connected component: the largest connected component when viewed as a directed graph. At least one directed path exists for an arbitrary pair of nodes in the component.

IN  Nodes from which the GSCC is reached via directed paths.

OUT  Nodes that are reachable from the GSCC via directed paths.

TE  "Tendrils": the rest of GWCC.

Therefore, we have the components such that

$$\mathrm{GWCC} = \mathrm{GSCC} + \mathrm{IN} + \mathrm{OUT} + \mathrm{TE}. \tag{1}$$

For our data of the entire network with $N$ = 30,613 nodes and $M$ = 280,864 links, the GWCC component comprises 30,225 (99.0%) nodes and 280,598 (99.9%) links. The components of GSCC, IN, and OUT are summarized in Table 4. As can be seen, nearly 40% of the accounts are inside GSCC. Further, 15% of the accounts are in the upstream portion or IN, whereas 37% are in the downstream portion or OUT. These figures are very similar to those observed in the production network in Japan in a previous study [10].

The set of three components of GSCC, IN, and OUT is usually referred to as a "bowtie"; however, we find that the entire shape does not look like a "bowtie" but like a "walnut" in the sense that IN and OUT are two mutually disjoint thin skins enveloping the core of GSCC rather than two wings elongating from the center of a bowtie. In fact, by examining the shortest-path lengths from GSCC to IN or OUT, we can see that the accounts in the IN and OUT components are just a few steps away from GSCC, as shown in Table 5. This feature is also similar to the production network on a nationwide scale (see the walnut structure in [10]); however, it is different from many social and technological networks such as the Internet, where the maximum distances from GSCC to IN or OUT are usually very long (see the original paper [11]).

Table 4: Bowtie or "walnut" structure: size of each component.
Component    Accounts    Ratio
GSCC         11,543      38.2%
IN            4,508      14.9%
OUT          11,270      37.3%
TE            2,904       9.6%
total        30,225      100%
"Ratio" refers to the ratio of the number of accounts in the component to the total number of accounts in GWCC.
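The component definitions above can be made concrete with a small sketch in plain Python. It is an illustration under stated assumptions rather than the procedure used for the results in Table 4: the function names `reachable` and `bowtie` and the toy edge list are hypothetical, and the strongly connected components are found by intersecting forward and backward reachability, which is simple but not efficient for a network of this size.

```python
from collections import defaultdict, deque


def reachable(adj, start):
    """Set of nodes reachable from `start` (inclusive) by breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def bowtie(edges):
    """Classify the nodes of a directed graph into GWCC, GSCC, IN, OUT, TE."""
    fwd, bwd, und = defaultdict(set), defaultdict(set), defaultdict(set)
    nodes = set()
    for u, v in edges:
        fwd[u].add(v)
        bwd[v].add(u)
        und[u].add(v)
        und[v].add(u)
        nodes.update((u, v))

    # GWCC: largest connected component of the undirected graph.
    gwcc, unseen = set(), set(nodes)
    while unseen:
        comp = reachable(und, next(iter(unseen)))
        unseen -= comp
        if len(comp) > len(gwcc):
            gwcc = comp

    # GSCC: largest strongly connected component inside the GWCC.
    # The SCC of s is (reachable from s) & (can reach s).
    gscc, unseen = set(), set(gwcc)
    while unseen:
        s = next(iter(unseen))
        scc = reachable(fwd, s) & reachable(bwd, s)
        unseen -= scc
        if len(scc) > len(gscc):
            gscc = scc

    seed = next(iter(gscc))
    in_part = reachable(bwd, seed) - gscc   # nodes that can reach the GSCC
    out_part = reachable(fwd, seed) - gscc  # nodes reachable from the GSCC
    tendrils = gwcc - gscc - in_part - out_part
    return {"GWCC": gwcc, "GSCC": gscc, "IN": in_part,
            "OUT": out_part, "TE": tendrils}


# Toy network: a cycle a -> b -> c -> a as the core, s upstream,
# t downstream, x attached only to s, and a separate pair y -> z.
edges = [("a", "b"), ("b", "c"), ("c", "a"),
         ("s", "a"), ("c", "t"), ("s", "x"), ("y", "z")]
parts = bowtie(edges)
for name, members in parts.items():
    print(name, sorted(members))
```

In the toy network, the sizes satisfy Eq. (1): the six nodes of the GWCC split into three in GSCC and one each in IN, OUT, and TE, while the pair y and z lies outside the GWCC.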

Table 5: "Walnut" structure: shortest distance from GSCC to IN/OUT.
Distance    IN to GSCC    OUT from GSCC
[...]

Figure 7: Walnut structure: a schematic view. The so-called bowtie structure reveals that GSCC includes nearly 40% of all the nodes or accounts, while the IN and OUT components include 15% and 37%, respectively (see Table 4 for the details). The prominent features are as follows. (i) The shortest distances to IN and OUT from GSCC are quite small, typically 1 or 2, and 4 at most (Table 5); hence, the ties are not elongated like a "bowtie" but rather like a "walnut" skin. (ii) The nodes in the components of IN and OUT are connected to the nodes scattered widely in GSCC. See also the study of a supplier-customer network [10] with similar features.

Hodge decomposition: upstream/downstream flow

Our analysis of the bowtie structure implies that the nodes in IN and OUT are located in the upstream and downstream sides in the flow of money. The Hodge decomposition of the flow in a network is a mathematical method of ranking [...]

Let $A_{ij}$ denote the adjacency matrix of our directed network of bank transfers, i.e.,

$$A_{ij} = \begin{cases} 1 & \text{if there is a link from } i \text{ to } j, \\ 0 & \text{otherwise}. \end{cases} \tag{2}$$

Recall that the numbers of accounts and links are $N$ and $M$, respectively. We excluded all the self-loops, implying that $A_{ii} = 0$. Each link has a flow, denoted by $B_{ij}$, either of the total amount of transfers, $f_{ij}$, or the frequency of transfers, $g_{ij}$ (see Fig. 1), i.e.,

$$B_{ij} = \begin{cases} f_{ij} \text{ or } g_{ij} & \text{if there is a flow from } i \text{ to } j, \\ 0 & \text{otherwise}. \end{cases} \tag{3}$$

Note that there may be a pair of accounts such that $A_{ij} = A_{ji} = 1$ and $B_{ij}, B_{ji} > 0$. In what follows, we take the frequency of transfers, $g_{ij}$, by assuming that it represents the strength of the link.

Let us define a "net flow" $F_{ij}$ by

$$F_{ij} = B_{ij} - B_{ji} \tag{4}$$

and a "net weight" $w_{ij}$ by

$$w_{ij} = A_{ij} + A_{ji}. \tag{5}$$

Note that $w_{ij}$ is symmetric, i.e., $w_{ij} = w_{ji}$, and non-negative, i.e., $w_{ij} \ge 0$ for any pair of $i$ and $j$. We remark that Eq. (5) is simply a convention to consider the effect of mutual links between $i$ and $j$. One could multiply Eq. (5) by 0.5 or an arbitrary positive number, which does not change the result significantly for a large network.

Now, the Hodge decomposition is given by

$$F_{ij} = F^{(c)}_{ij} + F^{(g)}_{ij}, \tag{6}$$

where the circular flow $F^{(c)}_{ij}$ satisfies

$$\sum_j F^{(c)}_{ij} = 0, \tag{7}$$

which implies that the circular flow is divergence-free. The gradient flow $F^{(g)}_{ij}$ can be expressed as

$$F^{(g)}_{ij} = w_{ij}\,(\phi_i - \phi_j), \tag{8}$$

i.e., the difference of "potentials". In this manner, the weight $w_{ij}$ serves to make the gradient flow possible only where a link exists. We refer to the quantity $\phi_i$ as the Hodge potential. If $\phi_i$ is relatively large, account $i$ is located in the upstream side of the entire network, while a small $\phi_i$ implies that $i$ is located in the downstream side of the entire network.

Eqs. (6)-(8) can be solved as follows. First, we combine them into the following equation for the Hodge potentials $(\phi_1, \cdots, \phi_N)\ (\equiv \phi)$:

$$\sum_j L_{ij}\,\phi_j = \sum_j F_{ij}, \tag{9}$$

for $i = 1, \ldots, N$. Here, $L_{ij}$ is the so-called graph Laplacian, defined by

$$L_{ij} = \delta_{ij} \sum_k w_{ik} - w_{ij}, \tag{10}$$

where $\delta_{ij}$ is the Kronecker delta.
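The step that combines Eqs. (6)-(8) into Eq. (9) is not written out in the text. One way to reconstruct it, using only the definitions already given, is to sum Eq. (6) over $j$:

```latex
\begin{align*}
\sum_j F_{ij}
  &= \sum_j F^{(c)}_{ij} + \sum_j F^{(g)}_{ij}
     && \text{by Eq. (6)} \\
  &= \sum_j w_{ij}\,(\phi_i - \phi_j)
     && \text{by Eqs. (7) and (8)} \\
  &= \phi_i \sum_k w_{ik} - \sum_j w_{ij}\,\phi_j \\
  &= \sum_j \Bigl( \delta_{ij} \sum_k w_{ik} - w_{ij} \Bigr) \phi_j
   = \sum_j L_{ij}\,\phi_j
     && \text{by Eq. (10)}
\end{align*}
```

The divergence-free condition removes the circular part, so the potentials are determined by the net outflow of each node alone.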

It is straightforward to show that the matrix $L = (L_{ij})$ has only one zero mode (eigenvector with zero eigenvalue), i.e., $\phi = (1, 1, \cdots, 1)/\sqrt{N}$. The presence of this zero mode simply corresponds to the arbitrariness in the origin of $\phi$. We can show that all the other eigenvalues are positive (see, e.g., [17]). Therefore, Eq. (9) can be solved for the potentials by fixing the potentials' origin. We assume that the average value of $\phi$ is zero, i.e., $\sum_i \phi_i = 0$.

The Hodge potentials obtained for the entire network of GWCC are shown in Fig. 8 as the distribution for the potentials of all the accounts in GWCC (red line). By noting that the average is zero by definition, we can see that it is a bimodal distribution with two peaks at positive and negative values, while there are a number of potential values close to zero (peaks around zero). The nodes in TE (tendrils) can be considered to have locations that are not particularly relevant to upstream or downstream; we can expect that these nodes mostly have potentials close to zero, as shown by the blue line, i.e., the result after deleting all the nodes contained in TE. We can see that these TE do not contribute to large absolute values of the Hodge potentials.

Figure 8: Distribution of the Hodge potentials of individual accounts. Distributions as histograms of $\phi_i$ in each component of the bowtie or walnut structure (Fig. 7). The horizontal axis represents the value of $\phi_i$ of an individual node or account, while the vertical axis represents the frequency in the histogram. The black line corresponds to GSCC or the core. The blue and red lines, respectively, correspond to the IN and OUT components or upstream and downstream with respect to the core. The green line corresponds to TE (tendrils) or the rest of the nodes.

It can be expected that there is a correlation between the value of the Hodge potential and the net amount of demand or supply of money for each node. We can measure the net amount of demand/supply by examining the in-degree and out-degree of the node, or alternatively, the in-flow and out-flow of money. Fig. 9 and Fig. 10 show the results. We find that if the potential is positive, the node is located in the upstream side, and its net degree and flow are negative. If the potential is negative, the node is located in the downstream side, and its net degree and flow are positive.

Figure 9: Hodge potential and net degree for each node. Each point represents a node or an account. The net degree is defined by the difference between the in-degree and the out-degree of the node. If the net degree is positive, the node has more incoming links than outgoing ones and vice versa. (Horizontal axis: Hodge potential; vertical axis: net degree (in - out).)

This finding can be interpreted as follows. Consider a supplier in the production network, which supplies its products to a number of customers. The supplier has a bank account (or possibly multiple accounts) that receives money from the customers' accounts as the supplier's sales. If the supplier is in the upstream side of the supplier-customer relationship, it is likely that the account is located in the downstream side of the money flows in this study. As the supplier not only makes sales but also incurs costs, typically labor costs, there must be an outgoing flow from its account to households and other non-commercial entities, which are not included in the present study. Consequently, within the present data, the supplier's account has a positive net degree and flow, while its Hodge potential is likely negative. A similar argument would hold for customers in an opposite way. In other words, our finding is a direct observation of how the flow of money reflects the economic activities among the firms' accounts.
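The solution procedure of Eqs. (4)-(10) can be sketched with NumPy as follows. This is a minimal dense-matrix illustration, not the implementation behind Fig. 8: the function name `hodge_potential` and the three-node example are assumptions made here, the network is assumed to be weakly connected so that the Laplacian has a single zero mode, and a network with 30K nodes would more plausibly call for a sparse solver.

```python
import numpy as np


def hodge_potential(B):
    """Hodge decomposition of a weighted directed network.

    B[i, j] >= 0 is the weight of link i -> j (f_ij or g_ij), zero where
    no link exists; the diagonal is assumed to be zero (no self-loops).
    Returns the potentials phi, the gradient flow, and the circular flow.
    """
    B = np.asarray(B, dtype=float)
    A = (B > 0).astype(float)        # adjacency matrix, Eq. (2)
    F = B - B.T                      # net flow, Eq. (4)
    w = A + A.T                      # net weight, Eq. (5)
    L = np.diag(w.sum(axis=1)) - w   # graph Laplacian, Eq. (10)
    rhs = F.sum(axis=1)              # right-hand side of Eq. (9)
    # L is singular (zero mode), so use a least-squares solve and then
    # fix the origin of the potentials so that they sum to zero.
    phi, *_ = np.linalg.lstsq(L, rhs, rcond=None)
    phi = phi - phi.mean()
    F_grad = w * (phi[:, None] - phi[None, :])  # gradient flow, Eq. (8)
    F_circ = F - F_grad                         # circular flow, Eq. (6)
    return phi, F_grad, F_circ


# Toy network: 0 -> 1 (weight 2), 1 -> 2 (weight 2), 2 -> 0 (weight 1).
B = np.array([[0, 2, 0],
              [0, 0, 2],
              [1, 0, 0]])
phi, F_grad, F_circ = hodge_potential(B)
print(phi)                  # potentials: node 0 upstream, node 2 downstream
print(F_circ.sum(axis=1))   # divergence of the circular flow, Eq. (7)
```

For this toy network the potentials are 1/3, 0, and -1/3, so node 0 sits upstream and node 2 downstream, and the remaining flow of 5/3 along each link of the cycle is the divergence-free circular part.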

Figure 10: Hodge potential and net flow for each node. This figure is similar to Fig. 9 except for the vertical axis, which represents the net flow. The net flow is defined by the difference between the incoming amount of money and the outgoing one. (Horizontal axis: Hodge potential; vertical axis: net flow (in - out, million yen).)

Non-negative matrix factorization (NMF): hidden factors of flows

We would like to show that there are hidden "factors" in the entire flow of the network. By "factor", we mean a component that can explain a significant part of the flow. Alternatively, the entire flow can be decomposed into only a small number of factors.

In this section, we focus on the geographical information of bank transfers. Each bank account has an address. We obtain the latitudes and longitudes of the bank accounts by using geocoding. Consequently, a bank transfer between two bank accounts has two coordinates of its remittance source and destination. We construct a non-negative matrix defined from the frequencies between the geographical areas, and we adopt NMF to find the hidden factors of geographical structures of the flow.

NMF constructs an approximate factorization of a non-negative matrix [18]. For example, NMF is useful for processing facial images because it produces parts-based representations of such images [19]. To reveal the basic components of the geographical structure of bank transfers, we apply NMF to a non-negative matrix $V = (V_{mn})$ defined as follows. We set a square area including the prefecture and split it into $K \times K$ smaller squares in a lattice pattern, where $K = 100$. Let $R_{pq}$ be the $(p, q)$ small square area for $1 \le p, q \le K$. We consider the frequencies of bank transfers between two small square areas. Let $\alpha(p_1, q_1, p_2, q_2)$ be the frequency of bank transfers from $(p_1, q_1)$ to $(p_2, q_2)$ for $1 \le p_1, q_1, p_2, q_2 \le K$, i.e., using the frequency $g_{ij}$ of transfers from account $i$ to account $j$,

$$\alpha(p_1, q_1, p_2, q_2) = \sum_{\{(i,j) \,\mid\, (x_i, y_i) \in R_{p_1 q_1},\ (x_j, y_j) \in R_{p_2 q_2}\}} g_{ij}, \tag{11}$$

where $(x_i, y_i)$ is the coordinate of the address of account $i$. The non-negative matrix $V$ of size $K^2 \times K^2$ is defined by

$$V_{mn} = \log(\max\{1, \alpha(p_1, q_1, p_2, q_2)\}), \tag{12}$$

where $m = p_1 + (q_1 - 1)K$ and $n = p_2 + (q_2 - 1)K$. For practical purposes, we convert the frequencies into their logarithmic values to reduce the influence of outstanding values.

NMF gives the approximate factorization

$$V \approx WH \tag{13}$$

for some integer $d$, where $W$ and $H$ are non-negative matrices of size $K^2 \times d$ and $d \times K^2$, respectively. We let $d = 10$ from prior knowledge that the number of local communities in the prefecture is around 10. Since the $m$th row of $V$ corresponds to bank transfers from $(p, q)$ for $m = p + (q - 1)K$, the rows of $H$ constitute a basis of bank transfers for the given sources. Similarly, since the $m$th column corresponds to bank transfers to $(p, q)$ for $m = p + (q - 1)K$, the columns of $W$ constitute a basis of bank transfers for the given destinations. We can regard Eq. (13) as the approximation of $V$ by the sum of products of these basis vectors. By letting $w_m$ be the $m$th column vector of $W$ and $h_m$ be the $m$th row vector of $H$, we have

$$V \approx \sum_{m=1}^{d} w_m h_m. \tag{14}$$

The logarithms of the frequencies of bank transfers in the target area are decomposed into matrices $w_m h_m$ for $m = 1, \ldots, d$.

A basis vector $v$, which is a column vector $w_m$ of $W$ or a row vector $h_m$ of $H$, can be converted to a $K \times K$ matrix $D(v)$, $1 \le p, q \le K$, on the geographical square area because an entry of $V$ corresponds to the frequency of bank transfers between two small square areas. In other words, $D(v)$ is represented as a heatmap in the geographical area, and Fig. 11 shows a heatmap of a basis vector. Since basis vectors seem to indicate geographically localized structures, to quantify such structures, we consider a circular area for a basis vector so that the sum of entries of the basis vector included in the circular area is maximized. Let $r_{pq}$ be the coordinate of the center of $R_{pq}$ and let $C_{pq}$ be a circular area whose radius is 10 km and center is $r_{pq}$. For a $K \times K$ matrix $E = (E_{pq})$ and a circular area $C$, we define

$$\beta(C, E) = \frac{\sum_{\{(p,q) \,\mid\, r_{pq} \in C\}} E_{pq}}{\sum_{\{(p,q) \,\mid\, 1 \le p, q \le K\}} E_{pq}}. \tag{15}$$

The proportion $\gamma(v)$ is calculated by

$$C'(v) = \operatorname*{arg\,max}_{\{C_{pq} \,\mid\, 1 \le p, q \le K\}} \beta(C_{pq}, D(v)), \tag{16}$$

$$\gamma(v) = \max_{\{C_{pq} \,\mid\, 1 \le p, q \le K\}} \beta(C_{pq}, D(v)). \tag{17}$$

The proportion $\gamma$ and the circular area $C'$ of a basis vector are shown in Fig. 11.

The panels (A) and (B) in Fig. 12 show the proportions $\gamma$ of all the basis vectors of sources and destinations. The proportions are more than 23% except [...]

Figure 11: Normalized basis vector obtained by NMF. The circular area has the largest sum of entries of the basis vector included in the circular area. A normalized basis vector such that the sum of entries is one is converted into a heatmap whose lattice pattern corresponds to $R_{pq}$. The radius of the circular area is 10 km. The circular area is $C'(v)$ for some basis vector $v$, i.e., it is located at a position such that $\beta(\cdot, D(v))$ is maximized. (Color scale: 0.00 to 0.20.)
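The construction of $V$ in Eqs. (11) and (12) and the factorization of Eq. (13) can be sketched with NumPy as follows. Several points are assumptions made for illustration only: the function names `build_matrix` and `nmf`, the toy values $K = 2$ and $d = 2$, and the input format in which each link already carries the lattice cells of its source and destination (the geocoding step is skipped). The text does not state which NMF algorithm was used; the sketch uses the classical multiplicative update rule for the Frobenius norm, which is one common choice.

```python
import numpy as np


def build_matrix(links, K):
    """Build V of Eq. (12) from links given as ((p1, q1), (p2, q2), g_ij).

    Cell indices p, q are 1-based, as in the text.
    """
    alpha = np.zeros((K * K, K * K))
    for (p1, q1), (p2, q2), g in links:
        m = p1 + (q1 - 1) * K   # source cell index, 1-based
        n = p2 + (q2 - 1) * K   # destination cell index, 1-based
        alpha[m - 1, n - 1] += g            # Eq. (11)
    return np.log(np.maximum(1.0, alpha))   # Eq. (12)


def nmf(V, d, n_iter=200, eps=1e-9, seed=0):
    """Approximate V by W @ H with non-negative W (rows x d) and H (d x cols).

    Uses multiplicative updates; eps avoids division by zero.
    """
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], d)) + eps
    H = rng.random((d, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Toy data on a 2 x 2 lattice (hypothetical frequencies).
K = 2
links = [((1, 1), (1, 1), 5), ((1, 1), (2, 1), 3),
         ((2, 2), (2, 2), 10), ((1, 2), (1, 2), 1)]
V = build_matrix(links, K)
W, H = nmf(V, d=2)
print(V.shape, W.shape, H.shape)
print(np.linalg.norm(V - W @ H))  # reconstruction error of Eq. (13)
```

Each column of `W` and each row of `H` has $K^2$ entries and can be reshaped into a $K \times K$ array to obtain the heatmap $D(v)$ described in the text.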

Figure 12: Circular areas corresponding to the basis vectors and proportions of the vector entries included in the circular areas. (A) is drawn from $w_m$, i.e., the basis vectors for sources, and the proportions $\gamma(w_m)$, while (B) is drawn from $h_m$, i.e., the basis vectors for destinations, and the proportions $\gamma(h_m)$ for $m = 1, \ldots, d$. (Proportions labeled in the panels, in percent: (A) 34, 29, 24, 31, 26, 38, 36, 37, 26, 35; (B) 34, 31, 26, 35, 31, 38, 9, 35, 23, 33.)

Figure 13: Cosine similarities between basis vectors. The vertical axis represents the indices of $h_s$, i.e., the $s$th row vector of $H$, and the horizontal axis represents the indices of $w_t$, i.e., the $t$th column vector of $W$. The index of the top left square is $(s, t) = (0,$ [...]$)$. (Color scale: 0.0 to 1.0.)

[...] The cosine similarity between $w_m$ and $h_n$ is calculated by

$$\frac{w_m \cdot h_n}{\|w_m\| \, \|h_n\|}, \tag{18}$$

where $w_m \cdot h_n$ is the inner product of $w_m$ and $h_n$ and $\|\cdot\|$ is the Euclidean norm of a vector. All the diagonal entries except for one are 1's, i.e., the $m$th basis vector $h_m$ is similar to the $m$th basis vector $w_m$ except for $m = 7$. These basis vectors correspond to basis vectors having geographically localized properties in Fig. 12, and the similarities of pairs of basis vectors imply that both incoming and outgoing bank transfers for a local area have similar patterns.

We can also interpret the seventh basis vectors of the source and destination that do not have similarities. The seventh basis vector of the source is localized to the largest city in the prefecture and the seventh basis vector of the destination is scattered throughout the prefecture. This means that the pair of these basis vectors corresponds to bank transfers from the largest city to the local areas. Therefore, Eq. (14) for our data gives decompositions that describe bank transfers in local areas and bank transfers between the largest city and local areas.

Finally, we state the results of NMF with different values of $d$. To investigate the changes in the basis vectors that occur according to $d$, we apply NMF to $V$ with $d = 5, \ldots, 15$. In all the cases, most of the basis vectors are geographically localized and form source and destination pairs that are similar to each other and correspond to bank transfers in local areas. All the basis vectors are localized for $d$ less than 7, and there is a pair of basis vectors corresponding to bank transfers between the largest city and local areas for $d$ greater than or equal to 7. For all the values of $d$ that we have examined, the basis vectors correspond to either bank transfers in local areas or bank transfers between the largest city and other local areas.

Conclusion

We studied an exhaustive list of bank accounts of firms and remittances from source to destination within a regional bank with a high market share of loans and deposits in a prefecture of Japan. By studying such a network of money flow, we could uncover how firms conduct the underlying economic activities as suppliers and customers from the upstream side to the downstream side of the money flow. We aggregated the remittances that occurred for each pair of accounts as a link during the period from March 2017 to July 2019 (i.e., approximately two and a half years), which yields a network comprising 30K nodes and 0.28M links. We found that the statistical features of the network are actually similar to those of a production network on a nationwide scale in Japan [3], but with greater emphasis on the regional aspects.

The bowtie analysis revealed what we refer to as a “walnut” structure in which the core and upstream/downstream components are tightly connected within short distances, typically a few steps. By quantifying the location of the individual account of a firm using the method of Hodge decomposition, we found that the Hodge potential of each node describes its location in the entire flow of money from the upstream side to the downstream side. In particular, there is a significant correlation between the Hodge potentials and the net flows of incoming and outgoing money and links, as well as between the potentials and the walnut structure. This implies that we can characterize the net demand/supply of each node and decompose the flows into those due to the difference in potentials and divergence-free flows. Furthermore, by using non-negative matrix factorization, we found that the entire flow can be considered as a combination of several significant factors. One factor has a feature whereby the remittance source is localized to the largest city in the region, while the destination is scattered. The other factors correspond to the economic activities specific to different local places, which can be interpreted as local activities of the economy.

We can consider several points that remain to be studied separately from the present work. While we aggregated over the entire period in this paper, it would be interesting to determine how the network changes with time by examining the timestamps recorded for every remittance. At time scales of days, weeks, and months, it is quite likely that there are intra-day, weekly, and seasonal patterns of activities. More interestingly, under mild changes in the booms and busts of the regional economy on a relatively long time scale, the economic agents might change their behaviors, possibly by changing peers in the transactions. Alternatively, under sudden changes due to natural disasters or pandemics, the agents can change their usual patterns abruptly. In other words, these are important aspects of a temporally changing network.

In addition, further investigation of the aspect of money flow amounts is warranted in the sense that the dominant driving force likely comes from “giant players” who demand or supply a large amount of money. Moreover, it would be interesting to select them in a subgraph by choosing only links with flow amounts that are larger than a certain threshold. These topics will be studied in our future work.
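As a concrete illustration of the Hodge decomposition summarized in this section, the following minimal sketch splits the net flow on a toy network into a part driven by potential differences and a divergence-free remainder. It is not the implementation used in this study. The function name `hodge_decompose`, the toy flow matrix, the unit weight per linked pair, and the sign convention (a higher potential means a more upstream account) are all assumptions made for illustration.

```python
import numpy as np


def hodge_decompose(flow):
    """Split a flow matrix into potential-driven and divergence-free parts.

    flow[i, j] >= 0 is the total amount remitted from account i to account j.
    Assumed convention (not taken from the paper): every linked pair has
    unit weight, and a higher potential means a more upstream account.
    """
    flow = np.asarray(flow, dtype=float)
    net = flow - flow.T                          # antisymmetric net flow
    weight = ((flow > 0) | (flow.T > 0)).astype(float)  # 1 for each linked pair
    laplacian = np.diag(weight.sum(axis=1)) - weight
    net_outflow = net.sum(axis=1)                # outgoing minus incoming money
    # The Laplacian is singular (potentials are defined up to a constant),
    # so take the least-squares solution and fix the mean to zero.
    potential, *_ = np.linalg.lstsq(laplacian, net_outflow, rcond=None)
    potential -= potential.mean()
    gradient = weight * (potential[:, None] - potential[None, :])
    circular = net - gradient                    # divergence-free remainder
    return potential, net, gradient, circular


# Hypothetical toy network: a chain 0 -> 1 -> 2 plus a weak return link 2 -> 0.
toy_flow = np.array([
    [0.0, 4.0, 0.0],
    [0.0, 0.0, 4.0],
    [1.0, 0.0, 0.0],
])
potential, net, gradient, circular = hodge_decompose(toy_flow)
print("potential:", np.round(potential, 3))
print("net outflow of circular part:", np.round(circular.sum(axis=1), 3))
```

In this toy example the potential orders the accounts 0, 1, 2 from upstream to downstream, and the remainder is a pure circulation around the cycle with zero net outflow at every node.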

Acknowledgements

We would like to thank Bank A for giving us an opportunity to study such a unique and valuable dataset. We are also grateful to Yoshiaki Nakagawa (Center for Data Science Education and Research, Shiga University) for insightful discussions.

Funding

This work was supported in part by MEXT as Exploratory Challenges on Post-K computer (Studies of Multi-level Spatiotemporal Simulation of Socioeconomic Phenomena), the project “Large-scale Simulation and Analysis of Economic Network for Macro Prudential Policy” undertaken at the Research Institute of Economy, Trade and Industry (RIETI), and JSPS KAKENHI Grant Numbers 17H02041, 19K22032, and 20H02391.

Availability of data and materials

The dataset is available in a collaborative scheme upon request to TT and YF at Shiga University.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

All authors contributed equally. All authors read and approved the final manuscript.

References

[1] Bank of Japan: Guide to Japan’s Flow of Funds Accounts. Accessed June 2020
[2] OECD: Input-Output Tables. Accessed June 2020
[3] Aoyama, H., Fujiwara, Y., Ikeda, Y., Iyetomi, H., Souma, W., Yoshikawa, H.: Macro-Econophysics – New Studies on Economic Networks and Synchronization. Cambridge University Press, Cambridge, UK (2017)
[4] Inoue, H., Todo, Y.: Firm-level propagation of shocks through supply-chain networks. Nature Sustainability, 841–847 (2019)
[5] Inoue, H., Todo, Y.: The Propagation of Economic Impacts through Supply Chains: The Case of a Mega-city Lockdown to Prevent the Spread of COVID-19. Research Institute of Economy, Trade and Industry (RIETI) Discussion Paper Series (2020)
[6] Fujiwara, Y., Aoyama, H.: Large-scale structure of a nation-wide production network. The European Physical Journal B (4), 565–580 (2010)
[7] Barabási, A.-L.: Network Science. Cambridge University Press, Cambridge, UK (2016)
[8] Rosvall, M., Bergstrom, C.T.: Maps of random walks on complex networks reveal community structure. Proceedings of the National Academy of Sciences (4), 1118–1123 (2008)
[9] Rosvall, M., Bergstrom, C.T.: Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE (4), 18209 (2011)
[10] Chakraborty, A., Kichikawa, Y., Iino, T., Iyetomi, H., Inoue, H., Fujiwara, Y., Aoyama, H.: Hierarchical communities in walnut structure of japanese production network. PLoS ONE, 10–13710202739 (2018)
[11] Broder, A., Kumar, R., Maghoul, F., Raghavan, P., Rajagopalan, S., Stata, R., Tomkins, A., Wiener, J.: Graph structure in the Web. Computer Networks (1-6), 309–320 (2000)
[12] Jiang, X., Lim, L.-H., Yao, Y., Ye, Y.: Statistical ranking and combinatorial hodge theory. Mathematical Programming (1), 203–244 (2011)
[13] Miura, K., Aoki, T.: Scaling of hodge-kodaira decomposition distinguishes learning rules of neural networks. IFAC-PapersOnLine (18), 175–180 (2015). 4th IFAC Conference on Analysis and Control of Chaotic Systems CHAOS 2015
[14] Kichikawa, Y., Iyetomi, H., Iino, T., Inoue, H.: Hierarchical and Circular Flow Structure of Interfirm Transaction Networks in Japan. https://ssrn.com/abstract=3173955 (2018)
[15] Iyetomi, H., Aoyama, H., Fujiwara, Y., Souma, W., Vodenska, I., Yoshikawa, H.: Relationship between macroeconomic indicators and economic cycles in U.S. Sci. Rep., 8420 (2020). https://doi.org/10.1038/s41598-020-65002-3
[16] MacKay, R., Johnson, S., Sansom, B.: How directed is a directed network? arXiv preprint arXiv:2001.05173 (2020)
[17] Fujiwara, Y., Islam, R.: Hodge Decomposition of Bitcoin Money Flow. Springer, in press (2020)
[18] Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Proceedings of the 13th International Conference on Neural Information Processing Systems. NIPS’00, pp. 535–541. MIT Press, Cambridge, MA, USA (2000)
[19] Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature (6755), 788–791 (1999). doi:10.1038/44565