Conjugate Distribution Law in Cultural Evolution via Statistical Learning
CConjugate Distribution Law in Cultural Evolution via Statistical Learning
Eita Nakamura ∗ The Hakubi Center for Advanced Research, Kyoto University, Sakyo, Kyoto 606-8501, Japan andGraduate School of Informatics, Kyoto University, Sakyo, Kyoto 606-8501, Japan (Dated: February 11, 2021)Many cultural traits characterizing intelligent behaviors are now thought to be transmittedthrough statistical learning, motivating us to study its consequences for cultural evolution. Weconduct a large-scale music data analysis and observe that various statistical parameters of musicalproducts follow the beta distribution and other conjugate distributions. We construct a simplemodel of cultural evolution incorporating statistical learning and analytically show that conjugatedistributions emerge at equilibrium in the presence of oblique transmission. The results demonstratehow the distribution of a cultural trait within a population depends on the individual’s generativemodel for cultural production (the conjugate distribution law), and open interesting possibilities oftheoretical and experimental studies on cultural evolution and social learning.
Cultural transmission and evolution are essential forthe development of human societies [1]. The commonapproach to studying these processes is to analyze dy-namical systems or stochastic processes incorporating thetransmission, selection, and mutation processes of cul-tural traits [2, 3]. For discrete cultural traits such as anindividual’s native language (English/French/etc.), thewell-developed theories of genetic evolution and popula-tion dynamics can be applied [2, 4, 5]. However, manycultural traits characterizing intelligent behaviors such asstyles of speech production (accent, speech speed, etc.)are not discrete quantities that are directly transmit-ted. Studies on information and cognitive sciences haveprovided accumulating evidences that such an intelligentbehavior often involves a stochastic data generation pro-cess and statistical parameters controlling the process aretransmitted from individuals to individuals through sta-tistical learning of the generated cultural products (e.g.[6–8]). This motivates us to study the signatures and con-sequences of statistical learning from the perspectives ofcultural evolution and social learning [9–14].Signatures of the underlying dynamical process in com-plex biological and social systems are often presented asstatistical laws that characterize the systems’ collectivebehavior. For example, Zipf’s law arises from mobile dy-namics of interacting individuals [15] and from chemicalreaction dynamics in self-reproducing cells [16], and Tay-lor’s law from the random diffusion process of traffic net-works [17]. There are ongoing projects on big-data anal-ysis of cultural products [18–20], and it has been foundthat in classical music data the within-song frequenciesof certain musical elements characterizing music stylesare approximately beta distributed [21]. The beta distri-bution is commonly used for probability parameters inBayesian statistics because it is a conjugate distribution(one that can parameterize both prior and posterior dis-tributions) [22]. So far, however, the origin of the betadistribution law and how general it is remain unknown.
Experimental observation. —To examine the ubiquityof the beta distribution law, we analyzed four music datasets from different genres and observed the distribu-tions of frequencies of pitch and rhythm elements. The classical music dataset consists of pieces in various in-strumentations [21], from which we can extract pitch in-formation. All the other datasets consist of melody data,from which we can extract information about pitches andrhythms. The
Wikifonia dataset contains Western popu-lar songs and jazz songs mainly composed in the early tomid 20th century [23]. The
J-pop dataset contains pop-ular songs in Japan created after 1950 [24]. The
Irish song dataset contains Irish folk songs collected from apublic website [25]. All the datasets contain O (10 –10 )musical pieces created by various composers. As culturaltraits, we extracted the within-song frequencies of the in-tervals of consecutive pitches modulo 12 (pitch-class in-tervals; PCI) and those of the ratios of consecutive notedurations (note-value ratios), and observed their distri-butions within each dataset. See Supplemental Material(SM) for details of the data and analysis method.As shown in Fig. 1, beta distributions were observed forvarious musical elements across different genres, whichextensively generalizes the previous observation [21].Two thirds of the 77 distributions analyzed had a fit-ting quality (measured by the cumulative difference ofthe data and fitted distributions) similar to or better thanthe example in Fig. 1(h). As exemplified in Fig. 1(i), dis-tributions that significantly deviate from the beta distri-bution were typically widespread and had multiple peakscorresponding to distinct music styles (e.g. duple rhythmvs dotted rhythm).We also analyzed song-level global statistics that ap-proximately obey certain statistical distributions usingthe same datasets. Fig. 2 shows the results for the J-popdataset. The contour of the distribution of pitch intervals I is approximately exponential distributed: P ( I ) ∝ e − λI in the range 1 ≤ I ≤
12, while the fine structure re-flects the rareness of dissonant intervals (Fig. 2(a)). Thewithin-corpus distribution of the rate parameters λ ofindividual songs approximately follows the gamma dis-tribution. Similarly, the contour of the distribution of a r X i v : . [ phy s i c s . s o c - ph ] F e b
0 0.1 0.2 (7/77) P ( f r equen cy ) Frequency (b) J-pop NVR 1:4
0 0.1 0.2 (13/77) P ( f r equen cy ) Frequency (e) Classical PCI 6
0 0.2 0.4 (2/77) P ( f r equen cy ) Frequency (a) Wikifonia PCI 9
0 0.2 0.4 (10/77) P ( f r equen cy ) Frequency (d) Irish NVR 1:2
0 0.2 0.4 0.6 (38/77) P ( f r equen cy ) Frequency (g) Irish PCI 10
0 0.2 (52/77) P ( f r equen cy ) Frequency (h) Wikifonia NVR 1:3
0 0.2 0.4 (8/77) P ( f r equen cy ) Frequency (c) Classical PCI 8
0 0.1 0.2 (27/77) P ( f r equen cy ) Frequency (f) J-pop PCI 5
0 0.5 1 (77/77) P ( f r equen cy ) Frequency (i) Irish NVR 1:1
FIG. 1. Within-corpus distributions of within-song frequen-cies of pitch-class intervals (PCIs) and note-value ratios(NVRs) with fitted beta distributions. “PCI i ” representsPCIs of i semitones. The numbers in panels indicate the ranksof fitting quality (ordered in the cumulative difference of thedata and fitted distributions). note value ratios 1: R of integer values R approximatelyobeys a Pareto (power-law) distribution P ( R ) ∝ R − α in the range 1 ≤ R ≤
10, and shape parameters α areroughly gamma distributed (Fig. 2(b)). The number ofnotes in a bar (bar-wise note density) K approximatelyobeys a Poisson distribution P ( K ) ∝ ρ K /K ! in the range2 ≤ K ≤
12, and rate parameters ρ are roughly gammadistributed (Fig. 2(c)). The results for the other datasetswere similar (see SM).The observed gamma distributions are the conjugatedistributions for the exponential, Pareto, and Poissondistributions w.r.t. the analyzed parameters. Togetherwith the beta distribution law, the observations suggesta general conjugate relation between the within-productstatistical distribution and the within-population distri-bution of the statistical cultural traits, which we call the conjugate distribution law . The fact that the relation isobserved in music data of various societies and time pe-riods indicates an underlying general mechanism. Model. —To explain the origin of the conjugate distri-butions, we consider an evolutionary model called thestatistical learn-generate (SLG) system. Suppose that ateach generation t there are N individuals that generatecultural products X tn ( n = 1 , . . . , N ). Each product (e.g.a musical piece) X tn = ( x tn‘ ) L‘ =1 consists of L samples(e.g. musical notes) x tn‘ that are independently generatedby a probability distribution φ ( x ; θ tn ) called a generativemodel, where parameters θ tn are considered as a culturaltrait. For simplicity, the set of all products created by anindividual is treated as a single product with the samesize L . We assume that the distributions of all individu- -4 -3 -2 -1
0 2 4 6 8 10 12 14 (a) P ( I ) Pitch interval I Y ∝ exp(-0.50 X) 0 0.05 0.1 0 0.2 0.4 0.6 0.8 Ψ ( / < I > ) Song-wise 1/
Gamma fit10 -4 -3 -2 -1
1 2 3 4 5 7 10 15 (b) P ( R ) Note-value ratio 1: R Y ∝ X -2.65 Ψ ( / < l n R > ) Song-wise 1/ < ln R> Gamma fit10 -4
0 2 4 6 8 10 12 14 (c) K ! P ( K ) Bar-wise note density K Y ∝ (5.24) X Ψ ( < K > ) Song-wise
Gamma fit(exponential)(Pareto)(Poisson)
FIG. 2. Distributions of global musical statistics (J-pop data).(a) Left: Distributions P of pitch intervals within songs(fine line segments), that within the dataset (bold line seg-ments), and a fitted exponential distribution (straight line).Right: Distribution Ψ of song-wise statistics (reciprocals ofthe means) of pitch intervals and a fitted gamma distribu-tion. (b) Similar results for note value ratios of the form 1: R ,where the global distribution is fitted by a Pareto distribu-tion. (c) Similar results for bar-wise note densities, where theglobal distribution is fitted by a Poisson distribution. als have an identical type but their parameter values canbe different. For example, if φ is a Bernoulli distribution,sample x tn‘ takes 0 or 1, and θ is the probability to get1; usually, 1 represents the presence of a specific elementwe focus on (e.g. a PCI of 6 semitones) and 0 representsthe other cases. When φ is Poisson distributed, sample x tn‘ takes a nonnegative integer and θ represents the rate.We often identify individual n with its cultural trait θ tn .In general, we can incorporate a selection process forthe individuals to produce cultural offspring in the nextgeneration. Since the selection process of cultural prod-ucts is still uncertain [26, 27], however, we consider theselectively neutral case. Also, each individual θ tn is as-sumed to have one cultural offspring θ t +1 n . Parameters θ t +1 n are learned from the parent’s product X tn and we as-sume that the maximum likelihood method is used. Wedenote by ˆ θ tn the parameters learned from product X tn .If offspring use only their parents’ product for learning(vertical transmission), θ t +1 n = ˆ θ tn holds. It is likely incultural transmission that products by other individuals(called secondary parents) in the parent generation arealso used for learning (oblique transmission [2]). Theeffect of oblique transmission from equally contributingsecondary parents can be represented as θ t +1 n = (1 − u )ˆ θ tn + uζ, ζ = 1 N X n ˆ θ tn , (1)where the summation is taken over the secondary par-ents, N is their number, and u represents the strength − u u − u − u uu ζ ・・・ ・・・ Lθ t X t ˆ θ t θ t X t ˆ θ t θ t +11 X t +11 θ t +12 X t +12 ・・・ t t + 1 θ tN X tN ˆ θ tN θ t +1 N X t +1 N Generate ζ ・・・ i nd i v i dua l s Generation samplesData N Ψ( θ, t ) { GenerationTraitLearn ˆ θ t − ˆ θ t − ˆ θ t − N Ψ( θ, t + 1) GL FIG. 3. Statistical learn-generate (SLG) system with obliquetransmission. of their influence (Fig. 3). When the secondary parentsare randomly chosen and N is large, as assumed in thefollowing analysis, we have ζ ’ ¯ θ = N P n θ tn (populationaverage) and ζ is effectively a constant.When φ is Bernoulli distributed, the SLG system isequivalent to the (asexual) Wright-Fisher (WF) modelknown in population genetics [28]. The neutrally selec-tive WF model consists of a population of N g individuals,each having a gene with two alleles (0 and 1). At eachgeneration all individuals are replaced by new individualsand the genes of randomly chosen parents are inherited.The gene frequency follows the same stochastic processas a vertically transmitted trait of the SLG system with L = N g , and thus, the SLG system is equivalent to aset of N populations of the WF model. Moreover, theeffect of oblique transmission in Eq. (1) with constant ζ is equivalent to the linear pressure representing the effectof constant migration (from outside) or mutation [29].We now consider the time evolution of the within-population distribution Ψ( θ, t ) = P ( θ tn ) of cultural traitsin the large N limit. This problem is well understoodfor the WF model, and accordingly, for the case that φ is Bernoulli distributed [30]. In the case of pure verti-cal transmission, the variable θ tn fluctuates statisticallyuntil it becomes 0 or 1, and the equilibrium distributionis given as Ψ( θ, ∞ ) = (1 − ¯ θ ) δ ( θ ) + ¯ θ δ ( θ − L limit. Then the dynamics can bedescribed by a Fokker-Planck (FP) equation ∂ Ψ( θ, t ) ∂t = − ∂∂θ (cid:26) M ( θ )Ψ( θ, t ) − ∂ [ K ( θ )Ψ( θ, t )] ∂θ (cid:27) , (2)where M ( θ ) = u ( ζ − θ ), K ( θ ) = θ (1 − θ ) /L , andthe terms in the curly brackets represent the probabil-ity current [30]. The zero-current equilibrium solutionof the FP equation is given by the beta distributionBeta( θ ; 2 Luζ, Lu (1 − ζ )) [29]. Therefore, the SLG sys-tem can explain an evolutionary origin of the beta distri-bution law in the presence of oblique transmission.To extend the result for cultural traits following moregeneral generative models, we consider exponential fam- ily (EF) distributions with one parameter θ : φ ( x ; θ ) = exp (cid:2) F ( x ) B ( θ ) − A ( θ ) + U ( x ) (cid:3) . (3)A large class of probability distributions includingBernoulli, Poisson, Gaussian, and gamma distributionscan be represented by different choices of F ( x ) and U ( x ).Functions B ( θ ) and A ( θ ) specify how the distributionis parameterized and we have h F i = A ( θ ) /B ( θ ) from0 = ∂ θ R dx φ ( x ; θ ). We can explicitly construct a conju-gate distribution for the distribution in Eq. (3) as˜ φ ( θ ; χ, ν ) = exp (cid:2) χB ( θ ) − νA ( θ ) + C ( θ ) + W ( χ, ν ) (cid:3) , (4)where C ( θ ) can be any function and W ( χ, ν ) representsthe normalization term. The function C does not changethe conjugacy property and the minimal case with C = 0is usually used in Bayesian statistics. To fix a parame-terization scheme for θ , we note that it is natural to usethe expectation value h F i as a parameter. It is knownthat the Cram´er-Rao bound can be attained only withthis parameterization [31], meaning that it is the mostefficient one for statistical learning. We thus assume therelation h F i = A ( θ ) /B ( θ ) = θ . The estimation varianceof F is given as D ( θ ) ≡ h ( F − θ ) i = 1 /B ( θ ), and thedistribution Ψ( θ, t ) follows Eq. (2) with K ( θ ) = D ( θ ) /L .With the oblique transmission effect M ( θ ) = u ( ζ − θ ),the equilibrium solution of Eq. (2) can be written asΨ ∗ ( θ ) ∝ K ( θ ) exp (cid:20) Z θ M ( η ) K ( η ) dη (cid:21) . Using K ( θ ) − = LB ( θ ) and θ = A ( θ ) /B ( θ ), we obtainΨ ∗ ( θ ) ∝ exp (cid:2) LuζB ( θ ) − LuA ( θ ) + ln B ( θ ) (cid:3) . (5)Comparing with Eq. (4), this is a conjugate distributionwith C = ln B , which shows an origin of the general con-jugate distribution law in the presence of oblique trans-mission. Eq. (5) tells how the equilibrium distributionform of cultural traits in the population depends on thegenerative model for cultural production; Ψ ∗ is essen-tially determined by the estimation variance D (note that A = θD − and B = D − ).We can further say that the solution (5) is the minimalconjugate distribution if ln B ( θ ) is a linear combinationof B ( θ ) and A ( θ ) up to a constant term, or equivalently,the estimation variance D ( θ ) = 1 /B ( θ ) is quadratic: D ( θ ) = d + d θ + d θ , (6)where d i are constants (see SM). Table I lists examplesof probability distributions satisfying Eq. (6). It can beshown that if φ ( x ; θ ) satisfies Eq. (6), so does the distri-bution for a variable y = f ( x ) obtained by a transforma-tion function f ( D remains unchanged). For example, thePareto distribution satisfies Eq. (6) w.r.t. the logarithmicmean statistic since the gamma distribution does w.r.t. Individual’s gener-ative model φ Culturaltrait θ Trait distri-bution ΨBernoulli Mean BetaPoisson Mean GammaGaussian Mean GaussianGaussian Variance Inverse gammaGamma Mean Inverse gamma
TABLE I: Examples of exponential family distributionssatisfying Eq. (6) and their conjugate distributions.the mean statistic. EF distributions with quadratic esti-mation variances cover a wide range of well-known distri-butions [32], including those analyzed in Fig. 2 (see SM),indicating that conjugate distributions found in culturalproducts can often be minimal. Not all EF distributionssatisfy Eq. (6): a counter example is the gamma distri-bution w.r.t. the logarithmic mean statistic.Some properties of the equilibrium distribution can bederived. First, under a regular condition that is usuallymet, the relation h θ i ∗ ≡ R dθ θ Ψ ∗ ( θ ) = ζ holds (evenwhen ζ is an external parameter), which is consistentwith the assumption of the constant population mean ζ = ¯ θ . Second, with the condition (6), one can show that V ∗ ( θ ) = Z dθ ( θ − h θ i ∗ ) Ψ ∗ ( θ ) = D ( ζ )2 Lu − d , (7)indicating that when the individuals of the SLG systemtransmit different cultural traits θ simultaneously, thevalues V ( θ ) /D (¯ θ ) at equilibrium are independent of ¯ θ anddepend only on Lu and d . Finally, the relaxation time isof order u − , which is physically expected as the obliquetransmission pushes a proportion u of the trait towards ζ in a time unit. Therefore, the conjugate distribution canbe maintained even in the presence of a selective pressureas long as its effect is small in the time scale of order u − ,which may account for the evolving beta distributionsobserved in classical music data [21]. Derivations andnumerical confirmations of these results are given in SM.In relating the theory with experiments, it is importantto note that the creator’s trait θ can only be observedthrough actual products. When θ is estimated from aproduct with L p samples, a variance of D ( θ ) /L p is ex-pected, and the product-wise statistical parameters (asin Figs. 1 and 2) will have a distribution Ψ p distortedfrom that of θ . Nonetheless, since this distortion processis same as the transmission process, Ψ p approximatelyfollows the same conjugate distribution as Ψ ∗ with avariance increased from Eq. (7) by D ( ζ ) /L p . This ex-plains the empirical conjugate distribution law observedin Figs. 1 and 2. This argument also indicates the dif-ficulty of applying Eq. (7) for estimating the value of Lu directly from product data if Lu (cid:29) L p . The values of D (¯ θ ) /V ( θ ) obtained from the distributions in Fig. 1 were29 . ± . . d . ) for the Wikifonia data (excluding distributions poorly fitted by beta distributions), whichare relatively consistent among different musical elementsand significantly smaller than the mean L p = 107. Sincean extremely small u is unlikely, these small values prob-ably reflect statistical dependence of the samples due tofrequent repetitions in music. Similar results were ob-tained for the other data (see SM).We can extend the SLG system for EF distributionswith more than one parameter and show that a conju-gate distribution emerges at equilibrium in the presenceof oblique transmission (see SM). In the case of discretedistributions, one can show that the conjugate Dirich-let distribution emerges at equilibrium. It is not easy tofind other examples where the minimal conjugate distri-bution is the equilibrium solution due to the non-diagonalelements of the estimation covariance matrix. Detailedanalyses are left for future work. Discussion. —When selectively neutral, the asymptoticbehavior of the SLG system is similar to that of iteratedlearning models with Bayesian agents [10, 13], in whichthe prior distribution is maintained at equilibrium. Thissuggests the necessity of studying the dynamics under se-lection to discriminate these quite different transmissionmechanisms. From the model building perspective, theSLG system removes the reliance on prior distributionsand allows flexible extensions with selective pressures.Our findings indicate a significant link between statis-tical learning and cultural evolution, and connect themwith statistical theory in an intriguing way. Eqs. (5)and (6) indicate a nontrivial relationship between obliquetransmission and Bayesian learning that reflects the ge-ometric structure of generative models [33]. Much sta-tistical theory including Bayesian estimation theory andlimit theorems can be developed for EF distributions sat-isfying Eq. (6) [32, 34], which can now be applied for an-alyzing cultural traits of statistical nature. Importantly,our result provides a theoretical background for choosingsuitable parametric models for cultural traits. For exam-ple, the widely applied Gaussian assumption [2, 3] canbe an inappropriate approximation.The SLG system opens new possibilities and questionsfor studying the dynamic aspects of culture and intel-ligence. For example, the dynamical relation µ t ∝ σ t between the mean µ t and the standard deviation σ t ob-served in classical music data [21], as opposed to σ t ∝√ µ t expected from Eq. (7), implies that the factor Lu is time-varying, similarly to an evolving mutation rate[35, 36]. Finally, it will be interesting to apply the pre-sented analysis method in other cultural domains suchas language [18, 37], fine art [20], and cooking [38], andinvestigate the universal properties of cultural evolution. Acknowledgments:
The author would like to thankKunihiko Kaneko and Nobuto Takeuchi for useful dis-cussions and Madison S. Krieger for helpful commentson the manuscript. This work was in part supported byJSPS KAKENHI No. 19K20340. ∗ [email protected][1] E. Szathm´ary and J. Maynard Smith, Nature , 227(1995).[2] L. L. Cavalli-Sforza and M. W. Feldman, CulturalTransmission and Evolution (Princeton University Press,1981).[3] R. Boyd and P. J. Richerson,
Culture and the Evolution-ary Process (The University of Chicago Press, 1985).[4] M. A. Nowak and D. C. Krakauer, Proc. Natl. Acad. Sci. , 8028 (1999).[5] D. M. Abrams, H. A. Yaple, and R. J. Wiener, Phys.Rev. Lett. , 088701 (2011).[6] J. R. Saffran, R. N. Aslin, and E. L. Newport, Science , 1926 (1996).[7] J. D. Fern´andez and F. Vico, J. Artificial Intelligence Res. , 513 (2013).[8] Y. LeCun, Y. Bengio, and G. Hinton, Nature , 436(2015).[9] G. J. Baxter, R. A. Blythe, W. Croft, and A. J. McKane,Phys. Rev. E , 046118 (2006).[10] T. L. Griffiths and M. L. Kalish, Cog. Sci. , 441 (2007).[11] F. Reali and T. L. Griffiths, Proc. R. Soc. B: Bio. Sci. , 429 (2010).[12] C. Perreault, C. Moya, and R. Boyd, Evolution and Hu-man Behavior , 449 (2012).[13] B. Thompson, S. Kirby, and K. Smith, Proc. Natl. Acad.Sci. , 4530 (2016).[14] B. Chazelle and C. Wang, J. Machine Learning Res. , 1 (2019).[15] M. Marsili and Y.-C. Zhang, Phys. Rev. Lett. ,2741 (1998).[16] C. Furusawa and K. Kaneko, Phys. Rev. Lett. ,088102 (2003).[17] S. Meloni, J. G´omez-Gardenes, V. Latora, andY. Moreno, Phys. Rev. Lett. , 208701 (2008).[18] J.-B. Michel et al. , Science , 176 (2011). [19] M. Mauch, R. M. MacCallum, M. Levy, and A. M. Leroi,R. Soc. open sci. , 150081 (2015).[20] A. Elgammal, B. Liu, D. Kim, M. Elhoseiny, andM. Mazzone, in Proceedings of the AAAI Conference onArtificial Intelligence , Vol. 32(1) (2018) p. 2183.[21] E. Nakamura and K. Kaneko, Sci. Rep. , 15993 (2019).[22] C. M. Bishop, Pattern recognition and machine learning (Springer, 2006).[23] F. Simonetta, F. Carnovalini, N. Orio, and A. Rod`a,in
Proceedings of the Audio Mostly 2018 on Sound inImmersion and Emotion (2018).[24] K. Shiiba and S. Kubo (ed.),
Japanese Songs Vols. 3–9(in Japanese) (Nobarasha, 1998/1999/2001/2004/2014).[25] B. L. Sturm, J. F. Santos, O. Ben-Tal, and I. Kor-shunova, in
Proceedings of the 1st Conference on Com-puter Simulation of Musical Creativity (2016).[26] M. J. Salganik, P. S. Dodds, and D. J. Watts, Science , 854 (2006).[27] R. A. Bentley, C. P. Lipo, H. A. Herzog, and M. W.Hahn, Evolution and Human Behavior , 151 (2007).[28] W. J. Ewens, Mathematical population genetics 1: theo-retical introduction (Springer, 2012).[29] S. Wright, Genetics , 97 (1931).[30] J. F. Crow and M. Kimura, An Introduction to Popula-tion Genetics Theory (Harper and Row, 1970).[31] A. V. Fend, Ann. Math. Stat. , 381 (1959).[32] C. N. Morris, Ann. Stat. , 65 (1982).[33] S. Amari,
Information geometry and its applications (Springer, 2016).[34] C. N. Morris, Ann. Stat. , 515 (1983).[35] K. Kaneko and T. Ikegami, Physica D: Nonlinear Phe-nomena , 406 (1992).[36] D. J. Earl and M. W. Deem, Proc. Natl. Acad. Sci. , 11531 (2004).[37] T. L. Griffiths and M. Steyvers, Proc. Natl. Acad. Sci. , 5228 (2004).[38] M. Caraher, P. Dixon, T. Lang, and R. Carr-Hill, BritishFood Journal , 590 (1999). upplemental Material for“Conjugate Distribution Law in Cultural Evolution via Statistical Learning” Eita Nakamura ∗ The Hakubi Center for Advanced Research, Kyoto University, Sakyo, Kyoto 606-8501, Japan andGraduate School of Informatics, Kyoto University, Sakyo, Kyoto 606-8501, Japan
CONTENTS
S1. Datasets 1S2. Data analysis of frequency statistics 1S3. Data analysis of global statistics 2S4. Derivation of Eq. (6) 4S5. Examples of exponential family distributions with quadratic estimation variances 5S6. Mathematical analysis of the solution of the FP equation (2) 7S7. Numerical analysis of the solution of the FP equation (2) 8S8. Equilibrium solution of the FP equation in the multiple parameters case 9References 11
S1. DATASETS
We used four music datasets for the analysis presented in Figs. 1 and 2. The classical music dataset consists of9,727 musical pieces in various instrumentations written by various composers [1]. The data contents are representedin the MIDI format, and we extracted integer pitches for the analysis. We did not use the rhythm information sinceit is unreliable due to the nature of the MIDI format. The musical notes are ordered according to the onset time andthe musical instruments in each file and we obtained the sequence of pitches in this order. The
Wikifonia datasetcontains Western popular songs and jazz songs mainly composed in the early to mid 20th century and was compiled inthe work of [2]. The dataset contains 6,388 songs (mostly vocal melodies). The
J-pop dataset contains 1,596 popularsongs in Japan composed between 1946 and 2010 whose transcriptions are published in [3]. The
Irish song datasetcontains Irish folk songs collected from a public website and was compiled in the work of [4]. 6,345 songs in G majorkey and in 4/4 time were used for the analysis. The contents of the Wikifonia, J-pop, and Irish song datasets arerepresented in the MusicXML format and we extracted the pitch and rhythm information. The pitch information wasextracted as a sequence of integer pitches. The rhythm information was extracted as sequences of onset and offsettimes, both represented in beat units. The score-notated duration (note value) of a musical note was defined as thedifference between its onset and offset times.
S2. DATA ANALYSIS OF FREQUENCY STATISTICS
For the analysis in Fig. 1, we extracted the within-song frequencies of the intervals of consecutive pitches modulo12 (pitch-class intervals; PCI) and those of the ratios of consecutive note durations (note-value ratios; NVR). Given asequence of pitches ( p n ) Nn =1 , the sequence of PCI ( q n ) N − n =1 is obtained by q n ≡ p n +1 − p n (mod 12), where 0 ≤ q n ≤ ≤ q ≤
11, the within-song PCI frequency was calculated by dividing the number of appearance of q ∗ [email protected] a r X i v : . [ phy s i c s . s o c - ph ] F e b C u m u l a t i v e d i ff e r en c e Rank ∆ C D (a) (b)(c)(d)(e) (f) (g) (h) (i) FIG. S1. Cumulative differences of the data and fitted beta distributions (representing the “goodness of fit”) sorted by rank.The alphabetical labels represent the corresponding panels in Fig. 1. by the total number of musical notes in a song. Following [1], we did not use the zero PCI for the analysis since itdescribes a continuation of the same tone (up to octave equivalence) and has little information. The PCI probabilitiesare independent of musical key and commonly used for analyzing music styles. The within-song frequencies of NVRsare similarly defined by counting the number of times the ratio r n : r n +1 of consecutive note values is a specific ratio.We calculated the frequencies of the most common ratios: 1:1, 1:2, 2:1, 1:3, 3:1, 2:3, 3:2, 1:4, 4:1, 1:6, and 6:1 (11statistics in total).The within-population distribution of the within-song frequencies were obtained for each of the four datasets. Thezero frequency samples were not used in the analysis of these distribution, as in [1], since those samples can becontaminated with different styles/genres of music. Since the 11 PCI statistics were calculated for all the datasetsand the 11 NVR statistics were calculated for the datasets other than the classical music dataset, we analyzed 77distributions in total.For each distribution, the beta distribution was fitted by the moment-matching method using the mean and variance.To quantitatively measure the “goodness of fit”, we computed the cumulative difference ∆ CD of the data and fittedbeta distributions defined as ∆ CD = Z dθ | C data ( θ ) − C fit ( θ ) | , (S1)where C data ( θ ) denotes the data cumulative distribution function (CDF) and C fit ( θ ) denotes the CDF of the fittedbeta distribution. In numerical analysis, the data distribution was represented by a histogram and the integral inEq. (S1) was approximated by a summation. The CDF was used here instead of the probability distribution functionto reduce the dependence on the histogram bin width (0 .
005 in our analysis). The distribution of the values of ∆ CD is shown in Fig. S1.For the estimation of D (¯ θ ) /V ( θ ) discussed below Eq. (7) in the main text, we used the distributions with∆ CD < .
004 that included 51 (about two thirds) of the 77 distributions. The (min, mean, max, s.d.) were(40 . , . , . , .
1) (classical), (17 . , . , . , .
11) (Wikifonia), (39 . , . , . , .
0) (J-pop), and (24 . , . . , .
0) (Irish). The average numbers of notes ¯ L p within each corpus were 2300 (classical), 107 (Wikifonia),148 (J-pop), and 115 (Irish). For all datasets, the values of D (¯ θ ) /V ( θ ) were similar among different statistics andsignificantly smaller than ¯ L p . S3. DATA ANALYSIS OF GLOBAL STATISTICS
For the analysis in Fig. 2, we calculated the song-wise global statistics as follows. The probabilities of pitch intervals I within songs were obtained by calculating the relative frequencies of pitch intervals | p n +1 − p n | . The rate parameterof the exponential distributions used to fit the contours of these distributions were calculated by the probability valuesin the range 1 ≤ I ≤
12. To deal with missing samples and deviations due to the rareness of dissonant intervals, weused the method of least squares (the linear regression was applied in the logarithmic domain of I ). Similarly, thecontour of the distribution of note-value ratios 1: R was fitted by a Pareto distribution in the range 1 ≤ R ≤
10 byapplying the linear regression in the log-log domain. The probability distribution P ( K ) of the bar-wise note density K was fitted by a Poisson distribution by applying the linear regression for ln K ! P ( K ) in the range 2 ≤ K ≤ -4 -3 -2 -1
0 2 4 6 8 10 12 14 P ( I ) Pitch interval I Y ∝ exp(-0.093 X) 0 0.05 0.1 0.15 0 0.1 0.2 0.3 0.4 0.5 Ψ ( / < I > ) Song-wise 1/
Gamma fit
FIG. S2. Distributions of pitch-interval statistics in the classical music data. The left panel shows the distributions ofpitch intervals within songs (fine line segments), that within the whole dataset (bold line segments), and a fitted exponentialdistribution (straight line); the right panel shows the distribution of song-wise statistics (reciprocals of the means) of pitchintervals and a fitted gamma distribution. -4 -3 -2 -1
0 2 4 6 8 10 12 14 (a) P ( I ) Pitch interval I Y ∝ exp(-0.40 X) 0 0.05 0.1 0 0.2 0.4 0.6 0.8 Ψ ( / < I > ) Song-wise 1/
Gamma fit10 -4 -3 -2 -1
1 2 3 4 5 7 10 15 (b) P ( R ) Note-value ratio 1: R Y ∝ X -2.74 Ψ ( / < l n R > ) Song-wise 1/ < ln R> Gamma fit10 -4
0 2 4 6 8 10 12 14 (c) K ! P ( K ) Bar-wise note density K Y ∝ (4.45) X Ψ ( < K > ) Song-wise
Gamma fit
FIG. S3. Distributions of global musical statistics in the Wikifonia data. (a) The left panel shows the distributions ofpitch intervals within songs (fine line segments), that within the whole dataset (bold line segments), and a fitted exponentialdistribution (straight line); the right panel shows the distribution of song-wise statistics (reciprocals of the means) of pitchintervals and a fitted gamma distribution. (b) Similar results for note value ratios of the form 1: R , where the global distributionis fitted by a Pareto distribution. (c) Similar results for bar-wise note densities, where the global distribution is fitted by aPoisson distribution. this dataset contains musical pieces by various instrumentations (including ensemble music) whereas the other threedatasets contain (mostly vocal) melodies. Despite this difference, the tendencies that the contour is approximatelyexponential distributed and the within-population distribution is approximately gamma distribution are same as theother cases. The results for the Wikifonia data in Fig. S3 are quantitatively similar to the results for the J-pop datain Fig. 2, indicating that the general tendencies are similar for popular music developed in different regions. Thewithin-song distributions of the note-value ratios for the Irish music data in Fig. S4(b) clearly differ from the othercases, reflecting the characteristic of the Irish music that note value ratios 1: R with R greater than 4 are much rarercompared to the other cases. The within-population distribution of the shape parameter has a noticeable excess in thesmall regime in comparison to the fitted gamma distribution. As small shape parameters correspond to large values of h ln R i , the fact that data samples are sparse for large R may have caused unreliable estimation in this regime. Whilethere are some quantitative and qualitative differences, the feature that the conjugate gamma distributions appear inthe within-population distributions is found universally for all the analyzed datasets from different musical genres.We can estimate the values of D (¯ θ ) /V ( θ ) for the global statistics, similarly as for the frequency statistics. The -4 -3 -2 -1
0 2 4 6 8 10 12 14 (a) P ( I ) Pitch interval I Y ∝ exp(-0.43 X) 0 0.05 0.1 0 0.2 0.4 0.6 0.8 Ψ ( / < I > ) Song-wise 1/
Gamma fit10 -6 -5 -4 -3 -2 -1
1 2 3 4 5 7 10 15 (b) P ( R ) Note-value ratio 1: R Y ∝ X -4.48 Ψ ( / < l n R > ) Song-wise 1/ < ln R> Gamma fit10 -4
0 2 4 6 8 10 12 14 (c) K ! P ( K ) Bar-wise note density K Y ∝ (5.49) X Ψ ( < K > ) Song-wise
Gamma fit
FIG. S4. Distributions of global musical statistics in the Irish song data. See the caption to Fig. S3. values estimated from the distributions of (pitch intervals, note value ratios) were (2 .
8, N/A) (classical), (7 . , . . , .
1) (J-pop), and (7 . , .
8) (Irish) (bar-wise note density were excluded from the analysis sincethe units of data samples are different). These values are significantly smaller than the corresponding values for thefrequency statistics, suggesting that there is even larger statistical dependence on the samples (possibly due to globalrepetitive structure within songs) for these statistics.
S4. DERIVATION OF EQ. (6)
We show that Eq. (6) is a necessary and sufficient condition for the equilibrium distribution in Eq. (5) to be theminimal conjugate distribution with C = 0 in Eq. (4). We assume that functions A ( θ ) and B ( θ ) are smooth in thedomain of definition and that B ( θ ) = 0, which is equivalent to the well-definedness of the mean of statistic F (recallthat h F i = A ( θ ) /B ( θ )). Comparing Eqs. (4) and (5), we find that the equilibrium distribution is the conjugatedistribution if and only if ln B ( θ ) is a linear combination of B ( θ ) and A ( θ ) up to a constant term, that is, we canwrite ln B ( θ ) = c + c B ( θ ) + c A ( θ )for some constants c i . Differentiating this equation, we obtain an equivalent condition: B ( θ ) B ( θ ) = c B ( θ ) + c A ( θ ) = ( c + c θ ) B ( θ ) , (S2)where we have used the relation A ( θ ) /B ( θ ) = θ (as assumed in the expectation value parameterization). Since D ( θ ) = 1 /B ( θ ), Eq. (S2) is equivalent to the condition that D ( θ ) is linear in θ , or D ( θ ) is quadratic in θ , which iswhat Eq. (6) says.One can directly solve the zero-current equation by substituting the minimal conjugate distribution. Substituting M ( θ ) = u ( ζ − θ ) and K ( θ ) = D ( θ ) /L into Eq. (2), the zero-current equation is given as2 Lu ( θ − ζ )Ψ ∗ = − ddθ ( D Ψ ∗ ) . (S3)Substituting Ψ ∗ ( θ ) = exp (cid:2) χB ( θ ) − νA ( θ ) − W ( χ, ν ) (cid:3) into Eq. (S3) and using Ψ = ( χB − νA )Ψ ∗ = ( χ − νθ ) B Ψ ∗ = χ − νθD Ψ ∗ and Eq. (6), we have (cid:8) Lu ( θ − ζ ) + d + 2 d θ + χ − νθ (cid:9) Ψ ∗ ( θ ) = 0 . From this, we obtain an explicit solution χ = 2 Luζ − d , ν = 2 Lu + 2 d . (S4) S5. EXAMPLES OF EXPONENTIAL FAMILY DISTRIBUTIONS WITH QUADRATIC ESTIMATIONVARIANCES
Here we calculate the estimation variances for the well-known distributions listed in Table I in the expectation valueparameterization and confirm that they are constant, linear, or quadratic functions. We also identify the correspondingconjugate distributions.
Bernoulli. —The Bernoulli distribution is defined asBer( x ; p ) = p x (1 − p ) − x = exp (cid:2) x (cid:8) ln p − ln (1 − p ) (cid:9) + ln (1 − p ) (cid:3) , (S5)where x takes values in 0 , p is the probability to get 1. The distribution has the form of Eq. (3) with F ( x ) = x, B ( p ) = ln p − ln (1 − p ) , A ( p ) = − ln (1 − p ) . Since the mean and variance of the static F = x are h x i = p and V ( x ) = p (1 − p ), respectively, p is the expectationvalue parameter ( θ = p ) and the estimation variance is D ( θ ) = θ (1 − θ ). The conjugate distribution is calculated as˜ φ ( θ ; χ, ν ) ∝ exp (cid:2) χB ( θ ) − νA ( θ ) (cid:3) ∝ θ χ (1 − θ ) ν − χ , and therefore ˜ φ ( θ ; χ, ν ) = 1 B ( χ + 1 , ν − χ + 1) θ χ (1 − θ ) ν − χ = Beta( θ ; χ + 1 , ν − χ + 1) . (S6) Poisson. —The Poisson distribution is defined asPois( x ; λ ) = λ x e − λ x ! = exp( x ln λ − λ − ln x !) , (S7)where x takes values in N and λ is the rate parameter. The distribution has the form of Eq. (3) with F ( x ) = x, B ( λ ) = ln λ, A ( λ ) = λ. Since the mean and variance of the static F = x are h x i = λ and V ( x ) = λ , respectively, λ is the expectation valueparameter ( θ = λ ) and the estimation variance is D ( θ ) = θ . The conjugate distribution is calculated as˜ φ ( θ ; χ, ν ) ∝ exp (cid:2) χB ( θ ) − νA ( θ ) (cid:3) ∝ θ χ e − νθ , and therefore ˜ φ ( θ ; χ, ν ) = ν χ +1 Γ( χ + 1) θ χ e − νθ = Gam( θ ; χ + 1 , ν ) . (S8) Gaussian parameterized by mean. —The Gaussian distribution is defined asGauss( x ; µ, Σ) = 1 √ π Σ e − ( x − µ )22Σ = exp (cid:20) µ Σ x − x − µ −
12 ln(2 π Σ) (cid:21) , (S9)where x takes values in R , and µ and Σ are the mean and variance ( h x i = µ , h ( x − µ ) i = Σ). Focusing on the statistic x and treating Σ as a constant parameter, the distribution has the form of Eq. (3) with F ( x ) = x, B ( µ ) = µ Σ , A ( µ ) = µ . Since the mean and variance of the static F = x are h x i = µ and V ( x ) = Σ, respectively, µ is the expectation valueparameter ( θ = µ ) and the estimation variance is D ( θ ) = Σ. The conjugate distribution is calculated as˜ φ ( θ ; χ, ν ) ∝ exp (cid:2) χB ( θ ) − νA ( θ ) (cid:3) ∝ exp (cid:20) χ θ Σ − ν θ (cid:21) , and therefore ˜ φ ( θ ; χ, ν ) = r ν π Σ exp (cid:20) − ν (cid:18) θ − χν (cid:19) (cid:21) = Gauss (cid:18) θ ; χν , Σ ν (cid:19) . (S10) Gaussian parameterized by variance. —Focusing on the statistic ( x − µ ) and treating µ as a constant parameter,the Gaussian distribution in Eq. (S9) has the form of Eq. (3) with F ( x ) = ( x − µ ) , B (Σ) = − , A (Σ) = 12 ln(2 π Σ) . Since the mean and variance of the static F ( x ) are h F i = Σ and V ( F ) = 2Σ , respectively, Σ is the expectation valueparameter ( θ = Σ) and the estimation variance is D ( θ ) = 2 θ . The conjugate distribution is calculated as˜ φ ( θ ; χ, ν ) ∝ exp (cid:2) χB ( θ ) − νA ( θ ) (cid:3) = exp (cid:20) − χ θ − ν πθ ) (cid:21) ∝ θ − ν/ e − χ θ and therefore ˜ φ ( θ ; χ, ν ) = ( χ ) ν − Γ( ν −
1) 1 θ ν e − χ θ = InvGam (cid:18) θ ; ν − , χ (cid:19) . (S11) Gamma parameterized by mean. —The Gamma distribution is defined asGam( x ; α, β ) = β α Γ( α ) x α − e − βx , (S12)where x takes values in [0 , ∞ ), and α and β denote the shape and rate parameters. Focusing on the statistic x andtreating α as a constant parameter, the distribution has the form of Eq. (3) with F ( x ) = x, B ( β ) = − β, A ( β ) = − α ln β. Since the mean and variance of the static F = x are h x i = α/β and V ( x ) = α/β , respectively, θ ≡ α/β is theexpectation value parameter and the estimation variance is given as D ( θ ) = θ /α . The conjugate distribution iscalculated as ˜ φ ( θ ; χ, ν ) ∝ exp (cid:2) χB ( θ ) − νA ( θ ) (cid:3) ∝ exp (cid:20) − αχθ − αν ln θ (cid:21) and therefore ˜ φ ( θ ; χ, ν ) = ( χα ) να − Γ( να −
1) 1 θ να e − χα/θ = InvGam( θ ; να − , χα ) . (S13)With the relation β = α/θ , this means that β is gamma distributed:˜ φ ( β ; χ, ν ) = Gam( β ; να − , χ ) . (S14)The exponential distribution is a special case of gamma distribution and the Pareto distribution can be obtainedfrom the exponential distribution by a logarithmic conversion. Therefore, the above calculation shows that theconjugate distributions for the statistical parameters analyzed in Figs. 2(a) and (c) are gamma distributions. S6. MATHEMATICAL ANALYSIS OF THE SOLUTION OF THE FP EQUATION (2)
We first calculate the mean h θ i ∗ ≡ R dθ θ Ψ ∗ ( θ ) for the zero-current equilibrium solution Ψ ∗ to Eq. (2). Integratingboth of Eq. (S3) sides gives 2 Lu ( h θ i ∗ − ζ ) = − [ D Ψ ∗ ] θ + θ − , where ( θ − , θ + ) represents the domain of θ . If the surface term on the RHS vanishes at both ends of the domain of θ ,as it regularly does, we have h θ i ∗ = ζ. The physical meaning of this relation is clear: after a long time the directional effect of the oblique transmission(linear pressure) term M ( θ ) dominates (the effect of the diffusion term K ( θ ) is directionless).Next, we suppose that generative model φ has a quadratic estimation variance as in Eq. (6). Using Eq. (S3), wehave 2 Lu h ( θ − ζ ) i ∗ = 2 Lu Z dθ ( θ − ζ ) Ψ ∗ = − Z dθ ( θ − ζ ) ddθ ( D Ψ ∗ ) = − (cid:2) ( θ − ζ ) D Ψ ∗ (cid:3) θ + θ − + Z dθ D Ψ ∗ . (S15)The last integral can be transformed as Z dθ D Ψ ∗ = d + d h θ i ∗ + d h θ i ∗ = D ( ζ ) + d h ( θ − ζ ) i ∗ . Thus, if the surface term in Eq. (S15) vanishes at both ends of the domain of θ , as it regularly does, we have h ( θ − ζ ) i ∗ = D ( ζ )2 Lu − d , (S16)which is same as Eq. (7). This result can be explicitly checked for the example cases presented in the previous sectionby using the equilibrium solution in Eq. (S4) and the known formulas for the variances of the conjugate distributions.Given the stationary solution Ψ ∗ ( θ ), the time-dependent FP equation (2) can in general be solved by the eigen-function method. Here, we use the eigenfunction method to calculate the convergence time to the equilibrium. Wedefine the coefficient function Q ( θ, t ) by Ψ( θ, t ) = Q ( θ, t )Ψ ∗ ( θ ) and substitute it into Eq. (2) to obtain ∂Q ( θ, t ) ∂t = (cid:20) M ( θ ) ∂∂θ + K ( θ )2 ∂ ∂θ (cid:21) Q ( θ, t ) . Thus, the equation for an eigenfunction Q ( θ, t ) = e − λt Q λ ( θ ) is given as (cid:20) K ( θ )2 d dθ + M ( θ ) ddθ + λ (cid:21) Q λ ( θ ) = 0 . Using the method of series expansion Q λ = P ∞ n =0 a n θ n and substituting M = u ( ζ − θ ) and K = ( d + d θ + d θ ) /L ,we obtain the following iterative equation for the coefficients: d ( n + 1)( n + 2) a n +2 + (cid:8) d n ( n + 1) + 2 L ( n + 1) uζ (cid:9) a n +1 + (cid:8) d n ( n −
1) + 2 L ( λ − nu ) (cid:9) a n = 0 . (S17)For all the distributions listed in Table I, one can check that the series Q λ must have a finite order to be a normalizablesolution. This leads to the condition that the coefficient of a n in Eq. (S17) must vanish for some n , and we obtaindiscrete eigenvalues λ m = mu − m ( m − L d ( m = 0 , , , . . . ) . (S18)Since the lowest modes are λ = 0, λ = u , · · · , the lowest nonzero mode decays as ∝ e − ut . That is, the relaxationtime (convergence time to the equilibrium) is of order u − , reproducing the physically expected result as explainedin the main text. P r obab ili t y θ t = 0t = 10t = 20 0 0.03 0.06 0.09 0.12 0.15 0 0.2 0.4 0.6 θ t = 50t = 100 0 0.03 0.06 0.09 0.12 0.15 0 0.2 0.4 0.6 θ t = 200t = 500 0 0.02 0.04 0.06 0.08 0.1 0 0.2 0.4 0.6 P r obab ili t y θ t = 0t = 10t = 20 0 0.02 0.04 0.06 0.08 0.1 0 0.2 0.4 0.6 θ t = 50t = 100 0 0.02 0.04 0.06 0.08 0.1 0 0.2 0.4 0.6 θ t = 200t = 500 (a)(b) FIG. S5. Simulation results for the SLG system with beta distributed generative models. Parameters were set as N = 20000, L = 1000, u = 1 /
50 = 0 .
02, and ζ = 0 .
1. The fitted distribution is Beta( θ ; 4 , θ = 0 .
2. (b) Initial distribution uniformly distributed in the range [0 , . P r obab ili t y θ t = 0t = 10t = 20 0 0.03 0.06 0.09 0.12 0.15 0 3 6 9 12 15 θ t = 50t = 100 0 0.03 0.06 0.09 0.12 0.15 0 3 6 9 12 15 θ t = 200t = 500 0 0.02 0.04 0.06 0.08 0 3 6 9 12 15 P r obab ili t y θ t = 0t = 10t = 20 0 0.02 0.04 0.06 0.08 0 3 6 9 12 15 θ t = 50t = 100 0 0.02 0.04 0.06 0.08 0 3 6 9 12 15 θ t = 200t = 500 (a)(b) FIG. S6. Simulation results for the SLG system with Poisson distributed generative models. Parameters were set as N = 20000, L = 25, u = 1 /
50 = 0 .
02, and ζ = 5. The fitted distribution is Gam( θ ; 5 , θ = 3. (b)Initial distribution uniformly distributed in the range [1 , S7. NUMERICAL ANALYSIS OF THE SOLUTION OF THE FP EQUATION (2)
To confirm the results in the previous section by numerical calculation, we conducted simulation experiments forstatistical learn-generate (SLG) systems whose generative models are one of those listed in Table I. For each SLGsystem, we iterate the learning and generation processes with oblique transmission (with constant ζ ) and observe thatthe within-population distribution of the cultural traits (statistical parameters) converge to the conjugate distributionwith parameters given in Eq. (S4). We also confirm that the relaxation time is of order u − .For simulation, we tested two extreme cases for the initial within-population distribution, a delta function anda uniform distribution with a certain value range. The data generation process was simulated by random numbergeneration functions provided in the C++ standard library. To get samples from Poisson distributions, we usedKnuth’s algorithm (D. E. Knuth, “The Art of Computer Programming 2. Seminumerical Algorithms (3rd ed.),”Addison Wesley, 1997).The results are shown in Figs. S5 to S9. In all cases, we see that the within-population distribution converges tothe theoretically predicted conjugate distribution regardless of the initial distribution. It can be also confirmed thatthe convergence is attained at time t ∼
100 = 2 u − . P r obab ili t y θ t = 0t = 10t = 20 0 0.03 0.06 0.09 0.12 -3 -2 -1 0 1 2 3 θ t = 50t = 100 0 0.03 0.06 0.09 0.12 -3 -2 -1 0 1 2 3 θ t = 200t = 500 0 0.02 0.04 0.06 0.08 -3 -2 -1 0 1 2 3 P r obab ili t y θ t = 0t = 10t = 20 0 0.02 0.04 0.06 0.08 -3 -2 -1 0 1 2 3 θ t = 50t = 100 0 0.02 0.04 0.06 0.08 -3 -2 -1 0 1 2 3 θ t = 200t = 500 (a)(b) FIG. S7. Simulation results for the SLG system with Gaussian distributed generative models and mean value traits with afixed variance Σ = 1. Parameters were set as N = 20000, L = 50, u = 1 /
50 = 0 .
02, and ζ = 0. The fitted distribution isGauss( θ ; 0 , . θ = 2. (b) Initial distribution uniformly distributed in the range [ − , P r obab ili t y θ t = 0t = 10t = 20 0 0.02 0.04 0.06 0.08 0.1 0.12 0 1 2 3 4 θ t = 50t = 100 0 0.02 0.04 0.06 0.08 0.1 0.12 0 1 2 3 4 θ t = 200t = 500 0 0.02 0.04 0.06 0 1 2 3 4 P r obab ili t y θ t = 0t = 10t = 20 0 0.02 0.04 0.06 0 1 2 3 4 θ t = 50t = 100 0 0.02 0.04 0.06 0 1 2 3 4 θ t = 200t = 500 (a)(b) FIG. S8. Simulation results for the SLG system with Gaussian distributed generative models and variance value traits witha fixed mean µ = 0. Parameters were set as N = 20000, L = 100, u = 1 /
50 = 0 .
02, and ζ = 1. The fitted distribution isInvGam( θ ; 3 , θ = 0 .
5. (b) Initial distribution uniformly distributed in the range[0 . , S8. EQUILIBRIUM SOLUTION OF THE FP EQUATION IN THE MULTIPLE PARAMETERS CASE
We extend the FP equation with the oblique transmission term for generative models with more than one parameterand show that equilibrium solution is given by a conjugate distribution in general. An exponential family distributionwith multiple parameters θ = ( θ i ) Ki =1 is given as φ ( x ; θ ) = exp (cid:20) K X k =1 F k ( x ) B k ( θ ) − A ( θ ) + U ( x ) (cid:21) , (S19)where F k are independent statistics. In the following, we write ∂ i = ( ∂/∂θ i ) and omit the summation symbol. From h ∂ i φ i = 0, we obtain h ∂ i A i = h F k i ∂ i B k = G ik h F k i , where G ik ≡ ∂ i B k . We assume G ik is invertible: G − ‘i G ik = δ ‘k and G ik G − kj = δ ij . In the expectation value parameterization, h F k i = G − ki ∂ i A = θ k , ∂ i A = G ik θ k . P r obab ili t y θ t = 0t = 10t = 20 0 0.03 0.06 0.09 0.12 0 1 2 3 4 θ t = 50t = 100 0 0.03 0.06 0.09 0.12 0 1 2 3 4 θ t = 200t = 500 0 0.02 0.04 0.06 0 1 2 3 4 P r obab ili t y θ t = 0t = 10t = 20 0 0.02 0.04 0.06 0 1 2 3 4 θ t = 50t = 100 0 0.02 0.04 0.06 0 1 2 3 4 θ t = 200t = 500 (a)(b) FIG. S9. Simulation results for the SLG system with gamma distributed generative models and mean value traits θ with afixed shape parameter α = 0. The rate parameter is β = 1 /θ . Parameters were set as N = 20000, L = 50, u = 1 /
50 = 0 . ζ = 1. The fitted distribution is InvGam( θ ; 3 , θ = 0 .
5. (b) Initial distributionuniformly distributed in the range [0 . , Differentiating the last expression, we have G ij = ∂ i ∂ j A − θ k ∂ i ∂ j B k . From h ∂ i ∂ j φ i = 0, we obtain, ∂ i ∂ j A − h F k i ∂ i ∂ j B k = h ( F k G ik − ∂ i A )( F ‘ G j‘ − ∂ j A ) i . Multiplying G − mi G − nj on both sides and summing over i and j , D mn ≡ = h ( F m − θ m )( F n − θ n ) i = G − mi G − nj G ij = G − mn . Since the estimation covariance matrix D mn is symmetric, so is G mn in the expectation value parameterization.The FP equation for the case with multiple parameters can be given as ∂ Ψ( θ, t ) ∂t = − ∂∂θ i (cid:20) M i ( θ )Ψ( θ, t ) − ∂ { K ij ( θ )Ψ( θ, t ) } ∂θ j (cid:21) . Substituting M i ( θ ) = u ( ζ i − θ i ) and K ij ( θ ) = D ij /L , we obtain ∂ Ψ( θ, t ) ∂t = − ∂∂θ i (cid:20) u ( ζ i − θ i )Ψ( θ, t ) − L ∂ { D ij ( θ )Ψ( θ, t ) } ∂θ j (cid:21) . (S20)The zero-current equation is thus 2 Lu ( ζ i − θ i )Ψ − ( ∂ j D ij )Ψ − D ij ∂ j Ψ = 0 . (S21)Assume the following conjugate distribution: Ψ ∝ exp[ χ k B k ( θ ) − νA ( θ ) + C ( θ )]. Since ∂ j Ψ = ( χ k ∂ j B k − ν∂ j A + ∂ j C )Ψ = (cid:2) G jk ( χ k − νθ k ) + ∂ j C (cid:3) Ψ , Eq. (S21) is equivalent to (cid:8) Luζ i − χ i + ( ν − Lu ) θ i − ∂ j D ij − D ij ∂ j C (cid:9) Ψ = 0 . Thus, the conjugate distribution is a solution if the following equations are satisfied: ν = 2 Lu, χ i = 2 Luζ i , ∂ j D ij + D ij ∂ j C = 0 . The last equation is equivalent to ∂ k C = − G ki ∂ j D ij . D = G − , ∂ j D = − D ( ∂ j G ) D or ∂ j D k‘ = − G − km G − n‘ ∂ j G mn . The RHS can be transformed as − G ki ∂ j D ij = G − ij ∂ j G ki = G − ij ∂ j ∂ k B i = G − ij ∂ k ∂ j B i = G − ij ∂ k G ji , and the equation for C is ∂ k C = G − ij ∂ k G ji . Using Jacobi’s formula this can be solved as C = ln det G . Therefore, the equilibrium solution is given asΨ ∗ ∝ exp[2 Luζ k B k ( θ ) − LuA ( θ ) + ln det G ( θ )] . (S22)As an example, we consider a discrete (categorical) distribution. A ( K + 1)-dimensional discrete distribution isdescribed by a variable x = ( x i ) K +1 i =1 , where x i is either 0 or 1 and P K +1 i =1 x i = 1 (one-hot vector). The parametersform a probability vector θ = ( θ i ) K +1 i =1 ( P K +1 i =1 θ i = 1). Because of the normalization conditions, the last element canbe represented by the other elements as x K +1 = 1 − P Ki =1 x i and θ K +1 = 1 − P Ki =1 θ i , and the parameter space is infact K -dimensional. The generative model is an exponential family distribution: φ ( x ; θ ) = K +1 Y i =1 θ x i i = exp (cid:20) K X i =1 x i { ln θ i − ln θ K +1 } + ln θ K +1 (cid:21) . We have F i = x i , B i = ln θ i − ln θ K +1 , and A = − ln θ K +1 , and thus G ij = δ ij /θ i + 1 /θ K +1 and ∂ i A = 1 /θ K +1 .The estimation covariance matrix is D ij = G − ij = θ i ( δ ij − θ j ). We can show det G = (det D ) − = ( θ · · · θ K +1 ) − bymathematical induction on K . Substituting this into Eq. (S22), we obtainΨ ∗ ( θ ) ∝ exp (cid:20) K X i =1 Luζ i (ln θ i − ln θ K +1 ) + 2 Lu ln θ K +1 − K +1 X i =1 ln θ i (cid:21) = exp (cid:20) K +1 X i =1 (2 Luζ i − θ i (cid:21) , where ζ K +1 ≡ − ζ − · · · − ζ K . Therefore, the equilibrium distribution is a Dirichlet distribution Dir( θ ; α ) withDirichlet parameters α i = 2 Luζ i ( i = 1 , . . . , K + 1). [1] E. Nakamura and K. Kaneko, Sci. Rep. , 15993 (2019).[2] F. Simonetta, F. Carnovalini, N. Orio, and A. Rod`a, in Proceedings of the Audio Mostly 2018 on Sound in Immersion andEmotion (2018).[3] K. Shiiba and S. Kubo (ed.),
Japanese Songs Vols. 3–9 (in Japanese) (Nobarasha, 1998/1999/2001/2004/2014).[4] B. L. Sturm, J. F. Santos, O. Ben-Tal, and I. Korshunova, in