[PDF] Conjugate Distribution Law in Cultural Evolution via Statistical Learning

Abstract

Many cultural traits characterizing intelligent behaviors are now thought to be transmitted through statistical learning, motivating us to study its consequences for cultural evolution. We conduct a large-scale music data analysis and observe that various statistical parameters of musical products follow the beta distribution and other conjugate distributions. We construct a simple model of cultural evolution incorporating statistical learning and analytically show that conjugate distributions emerge at equilibrium in the presence of oblique transmission. The results demonstrate how the distribution of a cultural trait within a population depends on the individual's generative model for cultural production (the conjugate distribution law), and open interesting possibilities of theoretical and experimental studies on cultural evolution and social learning.

Full PDF

CConjugate Distribution Law in Cultural Evolution via Statistical Learning

Eita Nakamura ∗ The Hakubi Center for Advanced Research, Kyoto University, Sakyo, Kyoto 606-8501, Japan andGraduate School of Informatics, Kyoto University, Sakyo, Kyoto 606-8501, Japan (Dated: February 11, 2021)Many cultural traits characterizing intelligent behaviors are now thought to be transmittedthrough statistical learning, motivating us to study its consequences for cultural evolution. Weconduct a large-scale music data analysis and observe that various statistical parameters of musicalproducts follow the beta distribution and other conjugate distributions. We construct a simplemodel of cultural evolution incorporating statistical learning and analytically show that conjugatedistributions emerge at equilibrium in the presence of oblique transmission. The results demonstratehow the distribution of a cultural trait within a population depends on the individual’s generativemodel for cultural production (the conjugate distribution law), and open interesting possibilities oftheoretical and experimental studies on cultural evolution and social learning.

Cultural transmission and evolution are essential forthe development of human societies [1]. The commonapproach to studying these processes is to analyze dy-namical systems or stochastic processes incorporating thetransmission, selection, and mutation processes of cul-tural traits [2, 3]. For discrete cultural traits such as anindividual’s native language (English/French/etc.), thewell-developed theories of genetic evolution and popula-tion dynamics can be applied [2, 4, 5]. However, manycultural traits characterizing intelligent behaviors such asstyles of speech production (accent, speech speed, etc.)are not discrete quantities that are directly transmit-ted. Studies on information and cognitive sciences haveprovided accumulating evidences that such an intelligentbehavior often involves a stochastic data generation pro-cess and statistical parameters controlling the process aretransmitted from individuals to individuals through sta-tistical learning of the generated cultural products (e.g.[6–8]). This motivates us to study the signatures and con-sequences of statistical learning from the perspectives ofcultural evolution and social learning [9–14].Signatures of the underlying dynamical process in com-plex biological and social systems are often presented asstatistical laws that characterize the systems’ collectivebehavior. For example, Zipf’s law arises from mobile dy-namics of interacting individuals [15] and from chemicalreaction dynamics in self-reproducing cells [16], and Tay-lor’s law from the random diﬀusion process of traﬃc net-works [17]. There are ongoing projects on big-data anal-ysis of cultural products [18–20], and it has been foundthat in classical music data the within-song frequenciesof certain musical elements characterizing music stylesare approximately beta distributed [21]. The beta distri-bution is commonly used for probability parameters inBayesian statistics because it is a conjugate distribution(one that can parameterize both prior and posterior dis-tributions) [22]. So far, however, the origin of the betadistribution law and how general it is remain unknown.

Experimental observation. —To examine the ubiquityof the beta distribution law, we analyzed four music datasets from diﬀerent genres and observed the distribu-tions of frequencies of pitch and rhythm elements. The classical music dataset consists of pieces in various in-strumentations [21], from which we can extract pitch in-formation. All the other datasets consist of melody data,from which we can extract information about pitches andrhythms. The

Wikifonia dataset contains Western popu-lar songs and jazz songs mainly composed in the early tomid 20th century [23]. The

J-pop dataset contains pop-ular songs in Japan created after 1950 [24]. The

Irish song dataset contains Irish folk songs collected from apublic website [25]. All the datasets contain O (10 –10 )musical pieces created by various composers. As culturaltraits, we extracted the within-song frequencies of the in-tervals of consecutive pitches modulo 12 (pitch-class in-tervals; PCI) and those of the ratios of consecutive notedurations (note-value ratios), and observed their distri-butions within each dataset. See Supplemental Material(SM) for details of the data and analysis method.As shown in Fig. 1, beta distributions were observed forvarious musical elements across diﬀerent genres, whichextensively generalizes the previous observation [21].Two thirds of the 77 distributions analyzed had a ﬁt-ting quality (measured by the cumulative diﬀerence ofthe data and ﬁtted distributions) similar to or better thanthe example in Fig. 1(h). As exempliﬁed in Fig. 1(i), dis-tributions that signiﬁcantly deviate from the beta distri-bution were typically widespread and had multiple peakscorresponding to distinct music styles (e.g. duple rhythmvs dotted rhythm).We also analyzed song-level global statistics that ap-proximately obey certain statistical distributions usingthe same datasets. Fig. 2 shows the results for the J-popdataset. The contour of the distribution of pitch intervals I is approximately exponential distributed: P ( I ) ∝ e − λI in the range 1 ≤ I ≤

12, while the ﬁne structure re-ﬂects the rareness of dissonant intervals (Fig. 2(a)). Thewithin-corpus distribution of the rate parameters λ ofindividual songs approximately follows the gamma dis-tribution. Similarly, the contour of the distribution of a r X i v : . [ phy s i c s . s o c - ph ] F e b

0 0.1 0.2 (7/77) P ( f r equen cy ) Frequency (b) J-pop NVR 1:4

0 0.1 0.2 (13/77) P ( f r equen cy ) Frequency (e) Classical PCI 6

0 0.2 0.4 (2/77) P ( f r equen cy ) Frequency (a) Wikifonia PCI 9

0 0.2 0.4 (10/77) P ( f r equen cy ) Frequency (d) Irish NVR 1:2

0 0.2 0.4 0.6 (38/77) P ( f r equen cy ) Frequency (g) Irish PCI 10

0 0.2 (52/77) P ( f r equen cy ) Frequency (h) Wikifonia NVR 1:3

0 0.2 0.4 (8/77) P ( f r equen cy ) Frequency (c) Classical PCI 8

0 0.1 0.2 (27/77) P ( f r equen cy ) Frequency (f) J-pop PCI 5

0 0.5 1 (77/77) P ( f r equen cy ) Frequency (i) Irish NVR 1:1

FIG. 1. Within-corpus distributions of within-song frequen-cies of pitch-class intervals (PCIs) and note-value ratios(NVRs) with ﬁtted beta distributions. “PCI i ” representsPCIs of i semitones. The numbers in panels indicate the ranksof ﬁtting quality (ordered in the cumulative diﬀerence of thedata and ﬁtted distributions). note value ratios 1: R of integer values R approximatelyobeys a Pareto (power-law) distribution P ( R ) ∝ R − α in the range 1 ≤ R ≤

10, and shape parameters α areroughly gamma distributed (Fig. 2(b)). The number ofnotes in a bar (bar-wise note density) K approximatelyobeys a Poisson distribution P ( K ) ∝ ρ K /K ! in the range2 ≤ K ≤

12, and rate parameters ρ are roughly gammadistributed (Fig. 2(c)). The results for the other datasetswere similar (see SM).The observed gamma distributions are the conjugatedistributions for the exponential, Pareto, and Poissondistributions w.r.t. the analyzed parameters. Togetherwith the beta distribution law, the observations suggesta general conjugate relation between the within-productstatistical distribution and the within-population distri-bution of the statistical cultural traits, which we call the conjugate distribution law . The fact that the relation isobserved in music data of various societies and time pe-riods indicates an underlying general mechanism. Model. —To explain the origin of the conjugate distri-butions, we consider an evolutionary model called thestatistical learn-generate (SLG) system. Suppose that ateach generation t there are N individuals that generatecultural products X tn ( n = 1 , . . . , N ). Each product (e.g.a musical piece) X tn = ( x tn‘ ) L‘ =1 consists of L samples(e.g. musical notes) x tn‘ that are independently generatedby a probability distribution φ ( x ; θ tn ) called a generativemodel, where parameters θ tn are considered as a culturaltrait. For simplicity, the set of all products created by anindividual is treated as a single product with the samesize L . We assume that the distributions of all individu- -4 -3 -2 -1

0 2 4 6 8 10 12 14 (a) P ( I ) Pitch interval I Y ∝ exp(-0.50 X) 0 0.05 0.1 0 0.2 0.4 0.6 0.8 Ψ ( / < I > ) Song-wise 1/

Gamma fit10 -4 -3 -2 -1

1 2 3 4 5 7 10 15 (b) P ( R ) Note-value ratio 1: R Y ∝ X -2.65 Ψ ( / < l n R > ) Song-wise 1/ < ln R> Gamma fit10 -4

0 2 4 6 8 10 12 14 (c) K ! P ( K ) Bar-wise note density K Y ∝ (5.24) X Ψ ( < K > ) Song-wise

Gamma fit(exponential)(Pareto)(Poisson)

FIG. 2. Distributions of global musical statistics (J-pop data).(a) Left: Distributions P of pitch intervals within songs(ﬁne line segments), that within the dataset (bold line seg-ments), and a ﬁtted exponential distribution (straight line).Right: Distribution Ψ of song-wise statistics (reciprocals ofthe means) of pitch intervals and a ﬁtted gamma distribu-tion. (b) Similar results for note value ratios of the form 1: R ,where the global distribution is ﬁtted by a Pareto distribu-tion. (c) Similar results for bar-wise note densities, where theglobal distribution is ﬁtted by a Poisson distribution. als have an identical type but their parameter values canbe diﬀerent. For example, if φ is a Bernoulli distribution,sample x tn‘ takes 0 or 1, and θ is the probability to get1; usually, 1 represents the presence of a speciﬁc elementwe focus on (e.g. a PCI of 6 semitones) and 0 representsthe other cases. When φ is Poisson distributed, sample x tn‘ takes a nonnegative integer and θ represents the rate.We often identify individual n with its cultural trait θ tn .In general, we can incorporate a selection process forthe individuals to produce cultural oﬀspring in the nextgeneration. Since the selection process of cultural prod-ucts is still uncertain [26, 27], however, we consider theselectively neutral case. Also, each individual θ tn is as-sumed to have one cultural oﬀspring θ t +1 n . Parameters θ t +1 n are learned from the parent’s product X tn and we as-sume that the maximum likelihood method is used. Wedenote by ˆ θ tn the parameters learned from product X tn .If oﬀspring use only their parents’ product for learning(vertical transmission), θ t +1 n = ˆ θ tn holds. It is likely incultural transmission that products by other individuals(called secondary parents) in the parent generation arealso used for learning (oblique transmission [2]). Theeﬀect of oblique transmission from equally contributingsecondary parents can be represented as θ t +1 n = (1 − u )ˆ θ tn + uζ, ζ = 1 N X n ˆ θ tn , (1)where the summation is taken over the secondary par-ents, N is their number, and u represents the strength − u u − u − u uu ζ ・・・・・・ Lθ t X t ˆ θ t θ t X t ˆ θ t θ t +11 X t +11 θ t +12 X t +12 ・・・ t t + 1 θ tN X tN ˆ θ tN θ t +1 N X t +1 N Generate ζ ・・・ i nd i v i dua l s Generation samplesData N Ψ( θ, t ) { GenerationTraitLearn ˆ θ t − ˆ θ t − ˆ θ t − N Ψ( θ, t + 1) GL FIG. 3. Statistical learn-generate (SLG) system with obliquetransmission. of their inﬂuence (Fig. 3). When the secondary parentsare randomly chosen and N is large, as assumed in thefollowing analysis, we have ζ ’ ¯ θ = N P n θ tn (populationaverage) and ζ is eﬀectively a constant.When φ is Bernoulli distributed, the SLG system isequivalent to the (asexual) Wright-Fisher (WF) modelknown in population genetics [28]. The neutrally selec-tive WF model consists of a population of N g individuals,each having a gene with two alleles (0 and 1). At eachgeneration all individuals are replaced by new individualsand the genes of randomly chosen parents are inherited.The gene frequency follows the same stochastic processas a vertically transmitted trait of the SLG system with L = N g , and thus, the SLG system is equivalent to aset of N populations of the WF model. Moreover, theeﬀect of oblique transmission in Eq. (1) with constant ζ is equivalent to the linear pressure representing the eﬀectof constant migration (from outside) or mutation [29].We now consider the time evolution of the within-population distribution Ψ( θ, t ) = P ( θ tn ) of cultural traitsin the large N limit. This problem is well understoodfor the WF model, and accordingly, for the case that φ is Bernoulli distributed [30]. In the case of pure verti-cal transmission, the variable θ tn ﬂuctuates statisticallyuntil it becomes 0 or 1, and the equilibrium distributionis given as Ψ( θ, ∞ ) = (1 − ¯ θ ) δ ( θ ) + ¯ θ δ ( θ − L limit. Then the dynamics can bedescribed by a Fokker-Planck (FP) equation ∂ Ψ( θ, t ) ∂t = − ∂∂θ (cid:26) M ( θ )Ψ( θ, t ) − ∂ [ K ( θ )Ψ( θ, t )] ∂θ (cid:27) , (2)where M ( θ ) = u ( ζ − θ ), K ( θ ) = θ (1 − θ ) /L , andthe terms in the curly brackets represent the probabil-ity current [30]. The zero-current equilibrium solutionof the FP equation is given by the beta distributionBeta( θ ; 2 Luζ, Lu (1 − ζ )) [29]. Therefore, the SLG sys-tem can explain an evolutionary origin of the beta distri-bution law in the presence of oblique transmission.To extend the result for cultural traits following moregeneral generative models, we consider exponential fam- ily (EF) distributions with one parameter θ : φ ( x ; θ ) = exp (cid:2) F ( x ) B ( θ ) − A ( θ ) + U ( x ) (cid:3) . (3)A large class of probability distributions includingBernoulli, Poisson, Gaussian, and gamma distributionscan be represented by diﬀerent choices of F ( x ) and U ( x ).Functions B ( θ ) and A ( θ ) specify how the distributionis parameterized and we have h F i = A ( θ ) /B ( θ ) from0 = ∂ θ R dx φ ( x ; θ ). We can explicitly construct a conju-gate distribution for the distribution in Eq. (3) as˜ φ ( θ ; χ, ν ) = exp (cid:2) χB ( θ ) − νA ( θ ) + C ( θ ) + W ( χ, ν ) (cid:3) , (4)where C ( θ ) can be any function and W ( χ, ν ) representsthe normalization term. The function C does not changethe conjugacy property and the minimal case with C = 0is usually used in Bayesian statistics. To ﬁx a parame-terization scheme for θ , we note that it is natural to usethe expectation value h F i as a parameter. It is knownthat the Cram´er-Rao bound can be attained only withthis parameterization [31], meaning that it is the mosteﬃcient one for statistical learning. We thus assume therelation h F i = A ( θ ) /B ( θ ) = θ . The estimation varianceof F is given as D ( θ ) ≡ h ( F − θ ) i = 1 /B ( θ ), and thedistribution Ψ( θ, t ) follows Eq. (2) with K ( θ ) = D ( θ ) /L .With the oblique transmission eﬀect M ( θ ) = u ( ζ − θ ),the equilibrium solution of Eq. (2) can be written asΨ ∗ ( θ ) ∝ K ( θ ) exp (cid:20) Z θ M ( η ) K ( η ) dη (cid:21) . Using K ( θ ) − = LB ( θ ) and θ = A ( θ ) /B ( θ ), we obtainΨ ∗ ( θ ) ∝ exp (cid:2) LuζB ( θ ) − LuA ( θ ) + ln B ( θ ) (cid:3) . (5)Comparing with Eq. (4), this is a conjugate distributionwith C = ln B , which shows an origin of the general con-jugate distribution law in the presence of oblique trans-mission. Eq. (5) tells how the equilibrium distributionform of cultural traits in the population depends on thegenerative model for cultural production; Ψ ∗ is essen-tially determined by the estimation variance D (note that A = θD − and B = D − ).We can further say that the solution (5) is the minimalconjugate distribution if ln B ( θ ) is a linear combinationof B ( θ ) and A ( θ ) up to a constant term, or equivalently,the estimation variance D ( θ ) = 1 /B ( θ ) is quadratic: D ( θ ) = d + d θ + d θ , (6)where d i are constants (see SM). Table I lists examplesof probability distributions satisfying Eq. (6). It can beshown that if φ ( x ; θ ) satisﬁes Eq. (6), so does the distri-bution for a variable y = f ( x ) obtained by a transforma-tion function f ( D remains unchanged). For example, thePareto distribution satisﬁes Eq. (6) w.r.t. the logarithmicmean statistic since the gamma distribution does w.r.t. Individual’s gener-ative model φ Culturaltrait θ Trait distri-bution ΨBernoulli Mean BetaPoisson Mean GammaGaussian Mean GaussianGaussian Variance Inverse gammaGamma Mean Inverse gamma

TABLE I: Examples of exponential family distributionssatisfying Eq. (6) and their conjugate distributions.the mean statistic. EF distributions with quadratic esti-mation variances cover a wide range of well-known distri-butions [32], including those analyzed in Fig. 2 (see SM),indicating that conjugate distributions found in culturalproducts can often be minimal. Not all EF distributionssatisfy Eq. (6): a counter example is the gamma distri-bution w.r.t. the logarithmic mean statistic.Some properties of the equilibrium distribution can bederived. First, under a regular condition that is usuallymet, the relation h θ i ∗ ≡ R dθ θ Ψ ∗ ( θ ) = ζ holds (evenwhen ζ is an external parameter), which is consistentwith the assumption of the constant population mean ζ = ¯ θ . Second, with the condition (6), one can show that V ∗ ( θ ) = Z dθ ( θ − h θ i ∗ ) Ψ ∗ ( θ ) = D ( ζ )2 Lu − d , (7)indicating that when the individuals of the SLG systemtransmit diﬀerent cultural traits θ simultaneously, thevalues V ( θ ) /D (¯ θ ) at equilibrium are independent of ¯ θ anddepend only on Lu and d . Finally, the relaxation time isof order u − , which is physically expected as the obliquetransmission pushes a proportion u of the trait towards ζ in a time unit. Therefore, the conjugate distribution canbe maintained even in the presence of a selective pressureas long as its eﬀect is small in the time scale of order u − ,which may account for the evolving beta distributionsobserved in classical music data [21]. Derivations andnumerical conﬁrmations of these results are given in SM.In relating the theory with experiments, it is importantto note that the creator’s trait θ can only be observedthrough actual products. When θ is estimated from aproduct with L p samples, a variance of D ( θ ) /L p is ex-pected, and the product-wise statistical parameters (asin Figs. 1 and 2) will have a distribution Ψ p distortedfrom that of θ . Nonetheless, since this distortion processis same as the transmission process, Ψ p approximatelyfollows the same conjugate distribution as Ψ ∗ with avariance increased from Eq. (7) by D ( ζ ) /L p . This ex-plains the empirical conjugate distribution law observedin Figs. 1 and 2. This argument also indicates the dif-ﬁculty of applying Eq. (7) for estimating the value of Lu directly from product data if Lu (cid:29) L p . The values of D (¯ θ ) /V ( θ ) obtained from the distributions in Fig. 1 were29 . ± . . d . ) for the Wikifonia data (excluding distributions poorly ﬁtted by beta distributions), whichare relatively consistent among diﬀerent musical elementsand signiﬁcantly smaller than the mean L p = 107. Sincean extremely small u is unlikely, these small values prob-ably reﬂect statistical dependence of the samples due tofrequent repetitions in music. Similar results were ob-tained for the other data (see SM).We can extend the SLG system for EF distributionswith more than one parameter and show that a conju-gate distribution emerges at equilibrium in the presenceof oblique transmission (see SM). In the case of discretedistributions, one can show that the conjugate Dirich-let distribution emerges at equilibrium. It is not easy toﬁnd other examples where the minimal conjugate distri-bution is the equilibrium solution due to the non-diagonalelements of the estimation covariance matrix. Detailedanalyses are left for future work. Discussion. —When selectively neutral, the asymptoticbehavior of the SLG system is similar to that of iteratedlearning models with Bayesian agents [10, 13], in whichthe prior distribution is maintained at equilibrium. Thissuggests the necessity of studying the dynamics under se-lection to discriminate these quite diﬀerent transmissionmechanisms. From the model building perspective, theSLG system removes the reliance on prior distributionsand allows ﬂexible extensions with selective pressures.Our ﬁndings indicate a signiﬁcant link between statis-tical learning and cultural evolution, and connect themwith statistical theory in an intriguing way. Eqs. (5)and (6) indicate a nontrivial relationship between obliquetransmission and Bayesian learning that reﬂects the ge-ometric structure of generative models [33]. Much sta-tistical theory including Bayesian estimation theory andlimit theorems can be developed for EF distributions sat-isfying Eq. (6) [32, 34], which can now be applied for an-alyzing cultural traits of statistical nature. Importantly,our result provides a theoretical background for choosingsuitable parametric models for cultural traits. For exam-ple, the widely applied Gaussian assumption [2, 3] canbe an inappropriate approximation.The SLG system opens new possibilities and questionsfor studying the dynamic aspects of culture and intel-ligence. For example, the dynamical relation µ t ∝ σ t between the mean µ t and the standard deviation σ t ob-served in classical music data [21], as opposed to σ t ∝√ µ t expected from Eq. (7), implies that the factor Lu is time-varying, similarly to an evolving mutation rate[35, 36]. Finally, it will be interesting to apply the pre-sented analysis method in other cultural domains suchas language [18, 37], ﬁne art [20], and cooking [38], andinvestigate the universal properties of cultural evolution. Acknowledgments:

The author would like to thankKunihiko Kaneko and Nobuto Takeuchi for useful dis-cussions and Madison S. Krieger for helpful commentson the manuscript. This work was in part supported byJSPS KAKENHI No. 19K20340. ∗ [email protected][1] E. Szathm´ary and J. Maynard Smith, Nature , 227(1995).[2] L. L. Cavalli-Sforza and M. W. Feldman, CulturalTransmission and Evolution (Princeton University Press,1981).[3] R. Boyd and P. J. Richerson,

Culture and the Evolution-ary Process (The University of Chicago Press, 1985).[4] M. A. Nowak and D. C. Krakauer, Proc. Natl. Acad. Sci. , 8028 (1999).[5] D. M. Abrams, H. A. Yaple, and R. J. Wiener, Phys.Rev. Lett. , 088701 (2011).[6] J. R. Saﬀran, R. N. Aslin, and E. L. Newport, Science , 1926 (1996).[7] J. D. Fern´andez and F. Vico, J. Artiﬁcial Intelligence Res. , 513 (2013).[8] Y. LeCun, Y. Bengio, and G. Hinton, Nature , 436(2015).[9] G. J. Baxter, R. A. Blythe, W. Croft, and A. J. McKane,Phys. Rev. E , 046118 (2006).[10] T. L. Griﬃths and M. L. Kalish, Cog. Sci. , 441 (2007).[11] F. Reali and T. L. Griﬃths, Proc. R. Soc. B: Bio. Sci. , 429 (2010).[12] C. Perreault, C. Moya, and R. Boyd, Evolution and Hu-man Behavior , 449 (2012).[13] B. Thompson, S. Kirby, and K. Smith, Proc. Natl. Acad.Sci. , 4530 (2016).[14] B. Chazelle and C. Wang, J. Machine Learning Res. , 1 (2019).[15] M. Marsili and Y.-C. Zhang, Phys. Rev. Lett. ,2741 (1998).[16] C. Furusawa and K. Kaneko, Phys. Rev. Lett. ,088102 (2003).[17] S. Meloni, J. G´omez-Gardenes, V. Latora, andY. Moreno, Phys. Rev. Lett. , 208701 (2008).[18] J.-B. Michel et al. , Science , 176 (2011). [19] M. Mauch, R. M. MacCallum, M. Levy, and A. M. Leroi,R. Soc. open sci. , 150081 (2015).[20] A. Elgammal, B. Liu, D. Kim, M. Elhoseiny, andM. Mazzone, in Proceedings of the AAAI Conference onArtiﬁcial Intelligence , Vol. 32(1) (2018) p. 2183.[21] E. Nakamura and K. Kaneko, Sci. Rep. , 15993 (2019).[22] C. M. Bishop, Pattern recognition and machine learning (Springer, 2006).[23] F. Simonetta, F. Carnovalini, N. Orio, and A. Rod`a,in

Proceedings of the Audio Mostly 2018 on Sound inImmersion and Emotion (2018).[24] K. Shiiba and S. Kubo (ed.),

Japanese Songs Vols. 3–9(in Japanese) (Nobarasha, 1998/1999/2001/2004/2014).[25] B. L. Sturm, J. F. Santos, O. Ben-Tal, and I. Kor-shunova, in

Proceedings of the 1st Conference on Com-puter Simulation of Musical Creativity (2016).[26] M. J. Salganik, P. S. Dodds, and D. J. Watts, Science , 854 (2006).[27] R. A. Bentley, C. P. Lipo, H. A. Herzog, and M. W.Hahn, Evolution and Human Behavior , 151 (2007).[28] W. J. Ewens, Mathematical population genetics 1: theo-retical introduction (Springer, 2012).[29] S. Wright, Genetics , 97 (1931).[30] J. F. Crow and M. Kimura, An Introduction to Popula-tion Genetics Theory (Harper and Row, 1970).[31] A. V. Fend, Ann. Math. Stat. , 381 (1959).[32] C. N. Morris, Ann. Stat. , 65 (1982).[33] S. Amari,

Information geometry and its applications (Springer, 2016).[34] C. N. Morris, Ann. Stat. , 515 (1983).[35] K. Kaneko and T. Ikegami, Physica D: Nonlinear Phe-nomena , 406 (1992).[36] D. J. Earl and M. W. Deem, Proc. Natl. Acad. Sci. , 11531 (2004).[37] T. L. Griﬃths and M. Steyvers, Proc. Natl. Acad. Sci. , 5228 (2004).[38] M. Caraher, P. Dixon, T. Lang, and R. Carr-Hill, BritishFood Journal , 590 (1999). upplemental Material for“Conjugate Distribution Law in Cultural Evolution via Statistical Learning” Eita Nakamura ∗ The Hakubi Center for Advanced Research, Kyoto University, Sakyo, Kyoto 606-8501, Japan andGraduate School of Informatics, Kyoto University, Sakyo, Kyoto 606-8501, Japan

CONTENTS

S1. Datasets 1S2. Data analysis of frequency statistics 1S3. Data analysis of global statistics 2S4. Derivation of Eq. (6) 4S5. Examples of exponential family distributions with quadratic estimation variances 5S6. Mathematical analysis of the solution of the FP equation (2) 7S7. Numerical analysis of the solution of the FP equation (2) 8S8. Equilibrium solution of the FP equation in the multiple parameters case 9References 11

S1. DATASETS

We used four music datasets for the analysis presented in Figs. 1 and 2. The classical music dataset consists of9,727 musical pieces in various instrumentations written by various composers [1]. The data contents are representedin the MIDI format, and we extracted integer pitches for the analysis. We did not use the rhythm information sinceit is unreliable due to the nature of the MIDI format. The musical notes are ordered according to the onset time andthe musical instruments in each ﬁle and we obtained the sequence of pitches in this order. The

Wikifonia datasetcontains Western popular songs and jazz songs mainly composed in the early to mid 20th century and was compiled inthe work of [2]. The dataset contains 6,388 songs (mostly vocal melodies). The

J-pop dataset contains 1,596 popularsongs in Japan composed between 1946 and 2010 whose transcriptions are published in [3]. The

Irish song datasetcontains Irish folk songs collected from a public website and was compiled in the work of [4]. 6,345 songs in G majorkey and in 4/4 time were used for the analysis. The contents of the Wikifonia, J-pop, and Irish song datasets arerepresented in the MusicXML format and we extracted the pitch and rhythm information. The pitch information wasextracted as a sequence of integer pitches. The rhythm information was extracted as sequences of onset and oﬀsettimes, both represented in beat units. The score-notated duration (note value) of a musical note was deﬁned as thediﬀerence between its onset and oﬀset times.

S2. DATA ANALYSIS OF FREQUENCY STATISTICS

For the analysis in Fig. 1, we extracted the within-song frequencies of the intervals of consecutive pitches modulo12 (pitch-class intervals; PCI) and those of the ratios of consecutive note durations (note-value ratios; NVR). Given asequence of pitches ( p n ) Nn =1 , the sequence of PCI ( q n ) N − n =1 is obtained by q n ≡ p n +1 − p n (mod 12), where 0 ≤ q n ≤ ≤ q ≤

11, the within-song PCI frequency was calculated by dividing the number of appearance of q ∗ [email protected] a r X i v : . [ phy s i c s . s o c - ph ] F e b C u m u l a t i v e d i ff e r en c e Rank ∆ C D (a) (b)(c)(d)(e) (f) (g) (h) (i) FIG. S1. Cumulative diﬀerences of the data and ﬁtted beta distributions (representing the “goodness of ﬁt”) sorted by rank.The alphabetical labels represent the corresponding panels in Fig. 1. by the total number of musical notes in a song. Following [1], we did not use the zero PCI for the analysis since itdescribes a continuation of the same tone (up to octave equivalence) and has little information. The PCI probabilitiesare independent of musical key and commonly used for analyzing music styles. The within-song frequencies of NVRsare similarly deﬁned by counting the number of times the ratio r n : r n +1 of consecutive note values is a speciﬁc ratio.We calculated the frequencies of the most common ratios: 1:1, 1:2, 2:1, 1:3, 3:1, 2:3, 3:2, 1:4, 4:1, 1:6, and 6:1 (11statistics in total).The within-population distribution of the within-song frequencies were obtained for each of the four datasets. Thezero frequency samples were not used in the analysis of these distribution, as in [1], since those samples can becontaminated with diﬀerent styles/genres of music. Since the 11 PCI statistics were calculated for all the datasetsand the 11 NVR statistics were calculated for the datasets other than the classical music dataset, we analyzed 77distributions in total.For each distribution, the beta distribution was ﬁtted by the moment-matching method using the mean and variance.To quantitatively measure the “goodness of ﬁt”, we computed the cumulative diﬀerence ∆ CD of the data and ﬁttedbeta distributions deﬁned as ∆ CD = Z dθ | C data ( θ ) − C ﬁt ( θ ) | , (S1)where C data ( θ ) denotes the data cumulative distribution function (CDF) and C ﬁt ( θ ) denotes the CDF of the ﬁttedbeta distribution. In numerical analysis, the data distribution was represented by a histogram and the integral inEq. (S1) was approximated by a summation. The CDF was used here instead of the probability distribution functionto reduce the dependence on the histogram bin width (0 .

005 in our analysis). The distribution of the values of ∆ CD is shown in Fig. S1.For the estimation of D (¯ θ ) /V ( θ ) discussed below Eq. (7) in the main text, we used the distributions with∆ CD < .

004 that included 51 (about two thirds) of the 77 distributions. The (min, mean, max, s.d.) were(40 . , . , . , .

1) (classical), (17 . , . , . , .

11) (Wikifonia), (39 . , . , . , .

0) (J-pop), and (24 . , . . , .

0) (Irish). The average numbers of notes ¯ L p within each corpus were 2300 (classical), 107 (Wikifonia),148 (J-pop), and 115 (Irish). For all datasets, the values of D (¯ θ ) /V ( θ ) were similar among diﬀerent statistics andsigniﬁcantly smaller than ¯ L p . S3. DATA ANALYSIS OF GLOBAL STATISTICS

For the analysis in Fig. 2, we calculated the song-wise global statistics as follows. The probabilities of pitch intervals I within songs were obtained by calculating the relative frequencies of pitch intervals | p n +1 − p n | . The rate parameterof the exponential distributions used to ﬁt the contours of these distributions were calculated by the probability valuesin the range 1 ≤ I ≤

12. To deal with missing samples and deviations due to the rareness of dissonant intervals, weused the method of least squares (the linear regression was applied in the logarithmic domain of I ). Similarly, thecontour of the distribution of note-value ratios 1: R was ﬁtted by a Pareto distribution in the range 1 ≤ R ≤

10 byapplying the linear regression in the log-log domain. The probability distribution P ( K ) of the bar-wise note density K was ﬁtted by a Poisson distribution by applying the linear regression for ln K ! P ( K ) in the range 2 ≤ K ≤ -4 -3 -2 -1

0 2 4 6 8 10 12 14 P ( I ) Pitch interval I Y ∝ exp(-0.093 X) 0 0.05 0.1 0.15 0 0.1 0.2 0.3 0.4 0.5 Ψ ( / < I > ) Song-wise 1/

Gamma fit

FIG. S2. Distributions of pitch-interval statistics in the classical music data. The left panel shows the distributions ofpitch intervals within songs (ﬁne line segments), that within the whole dataset (bold line segments), and a ﬁtted exponentialdistribution (straight line); the right panel shows the distribution of song-wise statistics (reciprocals of the means) of pitchintervals and a ﬁtted gamma distribution. -4 -3 -2 -1

0 2 4 6 8 10 12 14 (a) P ( I ) Pitch interval I Y ∝ exp(-0.40 X) 0 0.05 0.1 0 0.2 0.4 0.6 0.8 Ψ ( / < I > ) Song-wise 1/

Gamma fit10 -4 -3 -2 -1

1 2 3 4 5 7 10 15 (b) P ( R ) Note-value ratio 1: R Y ∝ X -2.74 Ψ ( / < l n R > ) Song-wise 1/ < ln R> Gamma fit10 -4

0 2 4 6 8 10 12 14 (c) K ! P ( K ) Bar-wise note density K Y ∝ (4.45) X Ψ ( < K > ) Song-wise

Gamma fit

FIG. S3. Distributions of global musical statistics in the Wikifonia data. (a) The left panel shows the distributions ofpitch intervals within songs (ﬁne line segments), that within the whole dataset (bold line segments), and a ﬁtted exponentialdistribution (straight line); the right panel shows the distribution of song-wise statistics (reciprocals of the means) of pitchintervals and a ﬁtted gamma distribution. (b) Similar results for note value ratios of the form 1: R , where the global distributionis ﬁtted by a Pareto distribution. (c) Similar results for bar-wise note densities, where the global distribution is ﬁtted by aPoisson distribution. this dataset contains musical pieces by various instrumentations (including ensemble music) whereas the other threedatasets contain (mostly vocal) melodies. Despite this diﬀerence, the tendencies that the contour is approximatelyexponential distributed and the within-population distribution is approximately gamma distribution are same as theother cases. The results for the Wikifonia data in Fig. S3 are quantitatively similar to the results for the J-pop datain Fig. 2, indicating that the general tendencies are similar for popular music developed in diﬀerent regions. Thewithin-song distributions of the note-value ratios for the Irish music data in Fig. S4(b) clearly diﬀer from the othercases, reﬂecting the characteristic of the Irish music that note value ratios 1: R with R greater than 4 are much rarercompared to the other cases. The within-population distribution of the shape parameter has a noticeable excess in thesmall regime in comparison to the ﬁtted gamma distribution. As small shape parameters correspond to large values of h ln R i , the fact that data samples are sparse for large R may have caused unreliable estimation in this regime. Whilethere are some quantitative and qualitative diﬀerences, the feature that the conjugate gamma distributions appear inthe within-population distributions is found universally for all the analyzed datasets from diﬀerent musical genres.We can estimate the values of D (¯ θ ) /V ( θ ) for the global statistics, similarly as for the frequency statistics. The -4 -3 -2 -1

0 2 4 6 8 10 12 14 (a) P ( I ) Pitch interval I Y ∝ exp(-0.43 X) 0 0.05 0.1 0 0.2 0.4 0.6 0.8 Ψ ( / < I > ) Song-wise 1/

Gamma fit10 -6 -5 -4 -3 -2 -1

1 2 3 4 5 7 10 15 (b) P ( R ) Note-value ratio 1: R Y ∝ X -4.48 Ψ ( / < l n R > ) Song-wise 1/ < ln R> Gamma fit10 -4

0 2 4 6 8 10 12 14 (c) K ! P ( K ) Bar-wise note density K Y ∝ (5.49) X Ψ ( < K > ) Song-wise

Gamma fit

FIG. S4. Distributions of global musical statistics in the Irish song data. See the caption to Fig. S3. values estimated from the distributions of (pitch intervals, note value ratios) were (2 .

8, N/A) (classical), (7 . , . . , .

1) (J-pop), and (7 . , .

8) (Irish) (bar-wise note density were excluded from the analysis sincethe units of data samples are diﬀerent). These values are signiﬁcantly smaller than the corresponding values for thefrequency statistics, suggesting that there is even larger statistical dependence on the samples (possibly due to globalrepetitive structure within songs) for these statistics.

S4. DERIVATION OF EQ. (6)

We show that Eq. (6) is a necessary and suﬃcient condition for the equilibrium distribution in Eq. (5) to be theminimal conjugate distribution with C = 0 in Eq. (4). We assume that functions A ( θ ) and B ( θ ) are smooth in thedomain of deﬁnition and that B ( θ ) = 0, which is equivalent to the well-deﬁnedness of the mean of statistic F (recallthat h F i = A ( θ ) /B ( θ )). Comparing Eqs. (4) and (5), we ﬁnd that the equilibrium distribution is the conjugatedistribution if and only if ln B ( θ ) is a linear combination of B ( θ ) and A ( θ ) up to a constant term, that is, we canwrite ln B ( θ ) = c + c B ( θ ) + c A ( θ )for some constants c i . Diﬀerentiating this equation, we obtain an equivalent condition: B ( θ ) B ( θ ) = c B ( θ ) + c A ( θ ) = ( c + c θ ) B ( θ ) , (S2)where we have used the relation A ( θ ) /B ( θ ) = θ (as assumed in the expectation value parameterization). Since D ( θ ) = 1 /B ( θ ), Eq. (S2) is equivalent to the condition that D ( θ ) is linear in θ , or D ( θ ) is quadratic in θ , which iswhat Eq. (6) says.One can directly solve the zero-current equation by substituting the minimal conjugate distribution. Substituting M ( θ ) = u ( ζ − θ ) and K ( θ ) = D ( θ ) /L into Eq. (2), the zero-current equation is given as2 Lu ( θ − ζ )Ψ ∗ = − ddθ ( D Ψ ∗ ) . (S3)Substituting Ψ ∗ ( θ ) = exp (cid:2) χB ( θ ) − νA ( θ ) − W ( χ, ν ) (cid:3) into Eq. (S3) and using Ψ = ( χB − νA )Ψ ∗ = ( χ − νθ ) B Ψ ∗ = χ − νθD Ψ ∗ and Eq. (6), we have (cid:8) Lu ( θ − ζ ) + d + 2 d θ + χ − νθ (cid:9) Ψ ∗ ( θ ) = 0 . From this, we obtain an explicit solution χ = 2 Luζ − d , ν = 2 Lu + 2 d . (S4) S5. EXAMPLES OF EXPONENTIAL FAMILY DISTRIBUTIONS WITH QUADRATIC ESTIMATIONVARIANCES

Here we calculate the estimation variances for the well-known distributions listed in Table I in the expectation valueparameterization and conﬁrm that they are constant, linear, or quadratic functions. We also identify the correspondingconjugate distributions.

Bernoulli. —The Bernoulli distribution is deﬁned asBer( x ; p ) = p x (1 − p ) − x = exp (cid:2) x (cid:8) ln p − ln (1 − p ) (cid:9) + ln (1 − p ) (cid:3) , (S5)where x takes values in 0 , p is the probability to get 1. The distribution has the form of Eq. (3) with F ( x ) = x, B ( p ) = ln p − ln (1 − p ) , A ( p ) = − ln (1 − p ) . Since the mean and variance of the static F = x are h x i = p and V ( x ) = p (1 − p ), respectively, p is the expectationvalue parameter ( θ = p ) and the estimation variance is D ( θ ) = θ (1 − θ ). The conjugate distribution is calculated as˜ φ ( θ ; χ, ν ) ∝ exp (cid:2) χB ( θ ) − νA ( θ ) (cid:3) ∝ θ χ (1 − θ ) ν − χ , and therefore ˜ φ ( θ ; χ, ν ) = 1 B ( χ + 1 , ν − χ + 1) θ χ (1 − θ ) ν − χ = Beta( θ ; χ + 1 , ν − χ + 1) . (S6) Poisson. —The Poisson distribution is deﬁned asPois( x ; λ ) = λ x e − λ x ! = exp( x ln λ − λ − ln x !) , (S7)where x takes values in N and λ is the rate parameter. The distribution has the form of Eq. (3) with F ( x ) = x, B ( λ ) = ln λ, A ( λ ) = λ. Since the mean and variance of the static F = x are h x i = λ and V ( x ) = λ , respectively, λ is the expectation valueparameter ( θ = λ ) and the estimation variance is D ( θ ) = θ . The conjugate distribution is calculated as˜ φ ( θ ; χ, ν ) ∝ exp (cid:2) χB ( θ ) − νA ( θ ) (cid:3) ∝ θ χ e − νθ , and therefore ˜ φ ( θ ; χ, ν ) = ν χ +1 Γ( χ + 1) θ χ e − νθ = Gam( θ ; χ + 1 , ν ) . (S8) Gaussian parameterized by mean. —The Gaussian distribution is deﬁned asGauss( x ; µ, Σ) = 1 √ π Σ e − ( x − µ )22Σ = exp (cid:20) µ Σ x − x − µ −

12 ln(2 π Σ) (cid:21) , (S9)where x takes values in R , and µ and Σ are the mean and variance ( h x i = µ , h ( x − µ ) i = Σ). Focusing on the statistic x and treating Σ as a constant parameter, the distribution has the form of Eq. (3) with F ( x ) = x, B ( µ ) = µ Σ , A ( µ ) = µ . Since the mean and variance of the static F = x are h x i = µ and V ( x ) = Σ, respectively, µ is the expectation valueparameter ( θ = µ ) and the estimation variance is D ( θ ) = Σ. The conjugate distribution is calculated as˜ φ ( θ ; χ, ν ) ∝ exp (cid:2) χB ( θ ) − νA ( θ ) (cid:3) ∝ exp (cid:20) χ θ Σ − ν θ (cid:21) , and therefore ˜ φ ( θ ; χ, ν ) = r ν π Σ exp (cid:20) − ν (cid:18) θ − χν (cid:19) (cid:21) = Gauss (cid:18) θ ; χν , Σ ν (cid:19) . (S10) Gaussian parameterized by variance. —Focusing on the statistic ( x − µ ) and treating µ as a constant parameter,the Gaussian distribution in Eq. (S9) has the form of Eq. (3) with F ( x ) = ( x − µ ) , B (Σ) = − , A (Σ) = 12 ln(2 π Σ) . Since the mean and variance of the static F ( x ) are h F i = Σ and V ( F ) = 2Σ , respectively, Σ is the expectation valueparameter ( θ = Σ) and the estimation variance is D ( θ ) = 2 θ . The conjugate distribution is calculated as˜ φ ( θ ; χ, ν ) ∝ exp (cid:2) χB ( θ ) − νA ( θ ) (cid:3) = exp (cid:20) − χ θ − ν πθ ) (cid:21) ∝ θ − ν/ e − χ θ and therefore ˜ φ ( θ ; χ, ν ) = ( χ ) ν − Γ( ν −

1) 1 θ ν e − χ θ = InvGam (cid:18) θ ; ν − , χ (cid:19) . (S11) Gamma parameterized by mean. —The Gamma distribution is deﬁned asGam( x ; α, β ) = β α Γ( α ) x α − e − βx , (S12)where x takes values in [0 , ∞ ), and α and β denote the shape and rate parameters. Focusing on the statistic x andtreating α as a constant parameter, the distribution has the form of Eq. (3) with F ( x ) = x, B ( β ) = − β, A ( β ) = − α ln β. Since the mean and variance of the static F = x are h x i = α/β and V ( x ) = α/β , respectively, θ ≡ α/β is theexpectation value parameter and the estimation variance is given as D ( θ ) = θ /α . The conjugate distribution iscalculated as ˜ φ ( θ ; χ, ν ) ∝ exp (cid:2) χB ( θ ) − νA ( θ ) (cid:3) ∝ exp (cid:20) − αχθ − αν ln θ (cid:21) and therefore ˜ φ ( θ ; χ, ν ) = ( χα ) να − Γ( να −

1) 1 θ να e − χα/θ = InvGam( θ ; να − , χα ) . (S13)With the relation β = α/θ , this means that β is gamma distributed:˜ φ ( β ; χ, ν ) = Gam( β ; να − , χ ) . (S14)The exponential distribution is a special case of gamma distribution and the Pareto distribution can be obtainedfrom the exponential distribution by a logarithmic conversion. Therefore, the above calculation shows that theconjugate distributions for the statistical parameters analyzed in Figs. 2(a) and (c) are gamma distributions. S6. MATHEMATICAL ANALYSIS OF THE SOLUTION OF THE FP EQUATION (2)

We ﬁrst calculate the mean h θ i ∗ ≡ R dθ θ Ψ ∗ ( θ ) for the zero-current equilibrium solution Ψ ∗ to Eq. (2). Integratingboth of Eq. (S3) sides gives 2 Lu ( h θ i ∗ − ζ ) = − [ D Ψ ∗ ] θ + θ − , where ( θ − , θ + ) represents the domain of θ . If the surface term on the RHS vanishes at both ends of the domain of θ ,as it regularly does, we have h θ i ∗ = ζ. The physical meaning of this relation is clear: after a long time the directional eﬀect of the oblique transmission(linear pressure) term M ( θ ) dominates (the eﬀect of the diﬀusion term K ( θ ) is directionless).Next, we suppose that generative model φ has a quadratic estimation variance as in Eq. (6). Using Eq. (S3), wehave 2 Lu h ( θ − ζ ) i ∗ = 2 Lu Z dθ ( θ − ζ ) Ψ ∗ = − Z dθ ( θ − ζ ) ddθ ( D Ψ ∗ ) = − (cid:2) ( θ − ζ ) D Ψ ∗ (cid:3) θ + θ − + Z dθ D Ψ ∗ . (S15)The last integral can be transformed as Z dθ D Ψ ∗ = d + d h θ i ∗ + d h θ i ∗ = D ( ζ ) + d h ( θ − ζ ) i ∗ . Thus, if the surface term in Eq. (S15) vanishes at both ends of the domain of θ , as it regularly does, we have h ( θ − ζ ) i ∗ = D ( ζ )2 Lu − d , (S16)which is same as Eq. (7). This result can be explicitly checked for the example cases presented in the previous sectionby using the equilibrium solution in Eq. (S4) and the known formulas for the variances of the conjugate distributions.Given the stationary solution Ψ ∗ ( θ ), the time-dependent FP equation (2) can in general be solved by the eigen-function method. Here, we use the eigenfunction method to calculate the convergence time to the equilibrium. Wedeﬁne the coeﬃcient function Q ( θ, t ) by Ψ( θ, t ) = Q ( θ, t )Ψ ∗ ( θ ) and substitute it into Eq. (2) to obtain ∂Q ( θ, t ) ∂t = (cid:20) M ( θ ) ∂∂θ + K ( θ )2 ∂ ∂θ (cid:21) Q ( θ, t ) . Thus, the equation for an eigenfunction Q ( θ, t ) = e − λt Q λ ( θ ) is given as (cid:20) K ( θ )2 d dθ + M ( θ ) ddθ + λ (cid:21) Q λ ( θ ) = 0 . Using the method of series expansion Q λ = P ∞ n =0 a n θ n and substituting M = u ( ζ − θ ) and K = ( d + d θ + d θ ) /L ,we obtain the following iterative equation for the coeﬃcients: d ( n + 1)( n + 2) a n +2 + (cid:8) d n ( n + 1) + 2 L ( n + 1) uζ (cid:9) a n +1 + (cid:8) d n ( n −

1) + 2 L ( λ − nu ) (cid:9) a n = 0 . (S17)For all the distributions listed in Table I, one can check that the series Q λ must have a ﬁnite order to be a normalizablesolution. This leads to the condition that the coeﬃcient of a n in Eq. (S17) must vanish for some n , and we obtaindiscrete eigenvalues λ m = mu − m ( m − L d ( m = 0 , , , . . . ) . (S18)Since the lowest modes are λ = 0, λ = u , · · · , the lowest nonzero mode decays as ∝ e − ut . That is, the relaxationtime (convergence time to the equilibrium) is of order u − , reproducing the physically expected result as explainedin the main text. P r obab ili t y θ t = 0t = 10t = 20 0 0.03 0.06 0.09 0.12 0.15 0 0.2 0.4 0.6 θ t = 50t = 100 0 0.03 0.06 0.09 0.12 0.15 0 0.2 0.4 0.6 θ t = 200t = 500 0 0.02 0.04 0.06 0.08 0.1 0 0.2 0.4 0.6 P r obab ili t y θ t = 0t = 10t = 20 0 0.02 0.04 0.06 0.08 0.1 0 0.2 0.4 0.6 θ t = 50t = 100 0 0.02 0.04 0.06 0.08 0.1 0 0.2 0.4 0.6 θ t = 200t = 500 (a)(b) FIG. S5. Simulation results for the SLG system with beta distributed generative models. Parameters were set as N = 20000, L = 1000, u = 1 /

50 = 0 .

02, and ζ = 0 .

1. The ﬁtted distribution is Beta( θ ; 4 , θ = 0 .

2. (b) Initial distribution uniformly distributed in the range [0 , . P r obab ili t y θ t = 0t = 10t = 20 0 0.03 0.06 0.09 0.12 0.15 0 3 6 9 12 15 θ t = 50t = 100 0 0.03 0.06 0.09 0.12 0.15 0 3 6 9 12 15 θ t = 200t = 500 0 0.02 0.04 0.06 0.08 0 3 6 9 12 15 P r obab ili t y θ t = 0t = 10t = 20 0 0.02 0.04 0.06 0.08 0 3 6 9 12 15 θ t = 50t = 100 0 0.02 0.04 0.06 0.08 0 3 6 9 12 15 θ t = 200t = 500 (a)(b) FIG. S6. Simulation results for the SLG system with Poisson distributed generative models. Parameters were set as N = 20000, L = 25, u = 1 /

50 = 0 .

02, and ζ = 5. The ﬁtted distribution is Gam( θ ; 5 , θ = 3. (b)Initial distribution uniformly distributed in the range [1 , S7. NUMERICAL ANALYSIS OF THE SOLUTION OF THE FP EQUATION (2)

To conﬁrm the results in the previous section by numerical calculation, we conducted simulation experiments forstatistical learn-generate (SLG) systems whose generative models are one of those listed in Table I. For each SLGsystem, we iterate the learning and generation processes with oblique transmission (with constant ζ ) and observe thatthe within-population distribution of the cultural traits (statistical parameters) converge to the conjugate distributionwith parameters given in Eq. (S4). We also conﬁrm that the relaxation time is of order u − .For simulation, we tested two extreme cases for the initial within-population distribution, a delta function anda uniform distribution with a certain value range. The data generation process was simulated by random numbergeneration functions provided in the C++ standard library. To get samples from Poisson distributions, we usedKnuth’s algorithm (D. E. Knuth, “The Art of Computer Programming 2. Seminumerical Algorithms (3rd ed.),”Addison Wesley, 1997).The results are shown in Figs. S5 to S9. In all cases, we see that the within-population distribution converges tothe theoretically predicted conjugate distribution regardless of the initial distribution. It can be also conﬁrmed thatthe convergence is attained at time t ∼

100 = 2 u − . P r obab ili t y θ t = 0t = 10t = 20 0 0.03 0.06 0.09 0.12 -3 -2 -1 0 1 2 3 θ t = 50t = 100 0 0.03 0.06 0.09 0.12 -3 -2 -1 0 1 2 3 θ t = 200t = 500 0 0.02 0.04 0.06 0.08 -3 -2 -1 0 1 2 3 P r obab ili t y θ t = 0t = 10t = 20 0 0.02 0.04 0.06 0.08 -3 -2 -1 0 1 2 3 θ t = 50t = 100 0 0.02 0.04 0.06 0.08 -3 -2 -1 0 1 2 3 θ t = 200t = 500 (a)(b) FIG. S7. Simulation results for the SLG system with Gaussian distributed generative models and mean value traits with aﬁxed variance Σ = 1. Parameters were set as N = 20000, L = 50, u = 1 /

50 = 0 .

02, and ζ = 0. The ﬁtted distribution isGauss( θ ; 0 , . θ = 2. (b) Initial distribution uniformly distributed in the range [ − , P r obab ili t y θ t = 0t = 10t = 20 0 0.02 0.04 0.06 0.08 0.1 0.12 0 1 2 3 4 θ t = 50t = 100 0 0.02 0.04 0.06 0.08 0.1 0.12 0 1 2 3 4 θ t = 200t = 500 0 0.02 0.04 0.06 0 1 2 3 4 P r obab ili t y θ t = 0t = 10t = 20 0 0.02 0.04 0.06 0 1 2 3 4 θ t = 50t = 100 0 0.02 0.04 0.06 0 1 2 3 4 θ t = 200t = 500 (a)(b) FIG. S8. Simulation results for the SLG system with Gaussian distributed generative models and variance value traits witha ﬁxed mean µ = 0. Parameters were set as N = 20000, L = 100, u = 1 /

50 = 0 .

02, and ζ = 1. The ﬁtted distribution isInvGam( θ ; 3 , θ = 0 .

5. (b) Initial distribution uniformly distributed in the range[0 . , S8. EQUILIBRIUM SOLUTION OF THE FP EQUATION IN THE MULTIPLE PARAMETERS CASE

We extend the FP equation with the oblique transmission term for generative models with more than one parameterand show that equilibrium solution is given by a conjugate distribution in general. An exponential family distributionwith multiple parameters θ = ( θ i ) Ki =1 is given as φ ( x ; θ ) = exp (cid:20) K X k =1 F k ( x ) B k ( θ ) − A ( θ ) + U ( x ) (cid:21) , (S19)where F k are independent statistics. In the following, we write ∂ i = ( ∂/∂θ i ) and omit the summation symbol. From h ∂ i φ i = 0, we obtain h ∂ i A i = h F k i ∂ i B k = G ik h F k i , where G ik ≡ ∂ i B k . We assume G ik is invertible: G − ‘i G ik = δ ‘k and G ik G − kj = δ ij . In the expectation value parameterization, h F k i = G − ki ∂ i A = θ k , ∂ i A = G ik θ k . P r obab ili t y θ t = 0t = 10t = 20 0 0.03 0.06 0.09 0.12 0 1 2 3 4 θ t = 50t = 100 0 0.03 0.06 0.09 0.12 0 1 2 3 4 θ t = 200t = 500 0 0.02 0.04 0.06 0 1 2 3 4 P r obab ili t y θ t = 0t = 10t = 20 0 0.02 0.04 0.06 0 1 2 3 4 θ t = 50t = 100 0 0.02 0.04 0.06 0 1 2 3 4 θ t = 200t = 500 (a)(b) FIG. S9. Simulation results for the SLG system with gamma distributed generative models and mean value traits θ with aﬁxed shape parameter α = 0. The rate parameter is β = 1 /θ . Parameters were set as N = 20000, L = 50, u = 1 /

50 = 0 . ζ = 1. The ﬁtted distribution is InvGam( θ ; 3 , θ = 0 .

5. (b) Initial distributionuniformly distributed in the range [0 . , Diﬀerentiating the last expression, we have G ij = ∂ i ∂ j A − θ k ∂ i ∂ j B k . From h ∂ i ∂ j φ i = 0, we obtain, ∂ i ∂ j A − h F k i ∂ i ∂ j B k = h ( F k G ik − ∂ i A )( F ‘ G j‘ − ∂ j A ) i . Multiplying G − mi G − nj on both sides and summing over i and j , D mn ≡ = h ( F m − θ m )( F n − θ n ) i = G − mi G − nj G ij = G − mn . Since the estimation covariance matrix D mn is symmetric, so is G mn in the expectation value parameterization.The FP equation for the case with multiple parameters can be given as ∂ Ψ( θ, t ) ∂t = − ∂∂θ i (cid:20) M i ( θ )Ψ( θ, t ) − ∂ { K ij ( θ )Ψ( θ, t ) } ∂θ j (cid:21) . Substituting M i ( θ ) = u ( ζ i − θ i ) and K ij ( θ ) = D ij /L , we obtain ∂ Ψ( θ, t ) ∂t = − ∂∂θ i (cid:20) u ( ζ i − θ i )Ψ( θ, t ) − L ∂ { D ij ( θ )Ψ( θ, t ) } ∂θ j (cid:21) . (S20)The zero-current equation is thus 2 Lu ( ζ i − θ i )Ψ − ( ∂ j D ij )Ψ − D ij ∂ j Ψ = 0 . (S21)Assume the following conjugate distribution: Ψ ∝ exp[ χ k B k ( θ ) − νA ( θ ) + C ( θ )]. Since ∂ j Ψ = ( χ k ∂ j B k − ν∂ j A + ∂ j C )Ψ = (cid:2) G jk ( χ k − νθ k ) + ∂ j C (cid:3) Ψ , Eq. (S21) is equivalent to (cid:8) Luζ i − χ i + ( ν − Lu ) θ i − ∂ j D ij − D ij ∂ j C (cid:9) Ψ = 0 . Thus, the conjugate distribution is a solution if the following equations are satisﬁed: ν = 2 Lu, χ i = 2 Luζ i , ∂ j D ij + D ij ∂ j C = 0 . The last equation is equivalent to ∂ k C = − G ki ∂ j D ij . D = G − , ∂ j D = − D ( ∂ j G ) D or ∂ j D k‘ = − G − km G − n‘ ∂ j G mn . The RHS can be transformed as − G ki ∂ j D ij = G − ij ∂ j G ki = G − ij ∂ j ∂ k B i = G − ij ∂ k ∂ j B i = G − ij ∂ k G ji , and the equation for C is ∂ k C = G − ij ∂ k G ji . Using Jacobi’s formula this can be solved as C = ln det G . Therefore, the equilibrium solution is given asΨ ∗ ∝ exp[2 Luζ k B k ( θ ) − LuA ( θ ) + ln det G ( θ )] . (S22)As an example, we consider a discrete (categorical) distribution. A ( K + 1)-dimensional discrete distribution isdescribed by a variable x = ( x i ) K +1 i =1 , where x i is either 0 or 1 and P K +1 i =1 x i = 1 (one-hot vector). The parametersform a probability vector θ = ( θ i ) K +1 i =1 ( P K +1 i =1 θ i = 1). Because of the normalization conditions, the last element canbe represented by the other elements as x K +1 = 1 − P Ki =1 x i and θ K +1 = 1 − P Ki =1 θ i , and the parameter space is infact K -dimensional. The generative model is an exponential family distribution: φ ( x ; θ ) = K +1 Y i =1 θ x i i = exp (cid:20) K X i =1 x i { ln θ i − ln θ K +1 } + ln θ K +1 (cid:21) . We have F i = x i , B i = ln θ i − ln θ K +1 , and A = − ln θ K +1 , and thus G ij = δ ij /θ i + 1 /θ K +1 and ∂ i A = 1 /θ K +1 .The estimation covariance matrix is D ij = G − ij = θ i ( δ ij − θ j ). We can show det G = (det D ) − = ( θ · · · θ K +1 ) − bymathematical induction on K . Substituting this into Eq. (S22), we obtainΨ ∗ ( θ ) ∝ exp (cid:20) K X i =1 Luζ i (ln θ i − ln θ K +1 ) + 2 Lu ln θ K +1 − K +1 X i =1 ln θ i (cid:21) = exp (cid:20) K +1 X i =1 (2 Luζ i − θ i (cid:21) , where ζ K +1 ≡ − ζ − · · · − ζ K . Therefore, the equilibrium distribution is a Dirichlet distribution Dir( θ ; α ) withDirichlet parameters α i = 2 Luζ i ( i = 1 , . . . , K + 1). [1] E. Nakamura and K. Kaneko, Sci. Rep. , 15993 (2019).[2] F. Simonetta, F. Carnovalini, N. Orio, and A. Rod`a, in Proceedings of the Audio Mostly 2018 on Sound in Immersion andEmotion (2018).[3] K. Shiiba and S. Kubo (ed.),

Japanese Songs Vols. 3–9 (in Japanese) (Nobarasha, 1998/1999/2001/2004/2014).[4] B. L. Sturm, J. F. Santos, O. Ben-Tal, and I. Korshunova, in

Related Researches

Random Choices can Facilitate the Solving of Collective Network Coloring Problems by Artificial Agents

by Matthew I. Jones

Persistent individual bias in a voter model with quenched disorder

by Joseph W. Baron

The scaling of social interactions across animal species

by Luis E C Rocha

Smart testing and critical care bed sharing for COVID-19 control

by Paulo J. S. Silva

COVID-19 and Unemployment Risk: Lessons for the Vaccination Campaign

by Valentina Pieroni

Percolation theory of self-exciting temporal processes

by Daniele Notarmuzi

The critical behavior of Hegselmann-Krause opinion model with smart agents

by Yueying Zhu

Entropic Dynamics of Networks

by Felipe Xavier Costa

Mainstreaming of conspiracy theories and misinformation

by N.F. Johnson

Simulation of the impact of people mobility, vaccination rate, and virus variants on the evolution of Covid-19 outbreak

by Corrado Spinella

Noisy multistate voter model for flocking in finite dimensions

by Ernesto S. Loscar

Power laws and phase transitions in heterogenous car following with reaction times

by A. Sai Venkata Ramana

COVID-19 spreading under containment actions

by Fernando E. Cornes

Revealing Critical Characteristics of Mobility Patterns in New York City during the Onset of COVID-19 Pandemic

by Akhil Anil Rajput

Interspecific competition shapes the structural stability of mutualistic networks

by Xiangrong Wang

A survey on modelling of infectious disease spread and control on social contact networks

by Md Shahzamal

The climate impact of ICT: A review of estimates, trends and regulations

by Charlotte Freitag

Evolution of collective fairness in complex networks through degree-based role assignment

by Andreia Sofia Teixeira

Synergies in offshore energy: a roadmap for the Danish sector

by Matteo DAndrea

Scaling of Urban Income Inequality in the United States

by Elisa Heinrich Mora

Clustering for epidemics on networks: a geometric approach

by Bastian Prasse

Optimising the mitigation of epidemic spreading through targeted adoption of contact tracing apps

by Aleix Bassolas

Was India saved by staying below the critical travel threshold and was lockdown and travel restriction the most important public health intervention?

by Sunil Kumar Raina MD Fiapsm

Regional and Sectoral Structures and Their Dynamics of Chinese Economy: A Network Perspective from Multi-Regional Input-Output Tables

by Tao Wang

The rich still get richer: Empirical comparison of preferential attachment via linking statistics in Bitcoin and Ethereum

by Dániel Kondor

«

1

2

3

4

»

Submitted on 2 Feb 2021 (v1), last revised 10 Feb 2021 (this version, v2) Updated

arXiv.org Original Source

NASA ADS

Google Scholar

Semantic Scholar