An equilibrium approach to modelling social interaction

Abstract

The aim of this work is to put forward a statistical mechanics theory of social interaction, generalizing econometric discrete choice models. After showing the formal equivalence linking econometric multinomial logit models to equilibrium statistical mechanics, a multi-population generalization of the Curie-Weiss model for ferromagnets is considered as a starting point in developing a model capable of describing sudden shifts in aggregate human behaviour. Existence of the thermodynamic limit for the model is shown by an asymptotic sub-additivity method and factorization of correlation functions is proved almost everywhere. The exact solution of the model is provided in the thermodynamic limit by finding converging upper and lower bounds for the system's pressure, and the solution is used to prove an analytic result regarding the number of possible equilibrium states of a two-population system. The work stresses the importance of linking regimes predicted by the model to real phenomena, and to this end it proposes two possible procedures to estimate the model's parameters starting from micro-level data. These are applied to three case studies based on census-type data: though these studies are found to be ultimately inconclusive on an empirical level, considerations are drawn that encourage further refinements of the chosen modelling approach.


arXiv [physics.soc-ph]

Alma Mater Studiorum · Università di Bologna

FACOLTÀ DI SCIENZE MATEMATICHE, FISICHE E NATURALI

DOTTORATO DI RICERCA IN MATEMATICA, XXI CICLO

MAT 07: Fisica Matematica

An equilibrium approach to modelling social interaction

Doctoral thesis

Presented by:

Ignacio Gallo

Coordinator:

Alberto Parmeggiani

Supervisor: Pierluigi Contucci

Final examination, year 2009

To Liliana, Ricardo and Federico.

To Sara.

Chapter 1

Introduction

In recent years there has been an increasing awareness of the problem of finding a quantitative way to study the role played by human interactions in shaping the kind of aggregate behaviour observed at a population level: reference [3] provides a comprehensive account of how ramified this field of study already is. There the author reviews efforts made by researchers from areas as diverse as psychology, economics and physics, to cite a few, in the pursuit of regularities that may characterize different kinds of aggregate human behaviour such as urban traffic, market behaviour and the internet.

The idea of treating society as a unitary entity, characterized by global features not dissimilar from those exhibited by physical or living systems, has accompanied the development of philosophical thought since its very beginning, and one need look no further than Plato's Republic to find an early example of such a view. The proposal that mathematics might play a crucial role in pursuing such an idea, on the other hand, dates back at least to Thomas Hobbes's Leviathan, where an attempt is made to draw analogies between the laws describing mechanics and features of society as a whole. Hobbes's work gives an inspiring outlook on the ways in which modern science might contribute to practical human affairs from an organizational point of view, as well as a technological one.

In later centuries, nevertheless, quantitative science has grown aware of the fact that, though a holistic view such as Hobbes's plays an important motivational role in the development of new scientific enterprises, it is only by reducing a problem to its simplest components that success is attained by empirical studies. One of the interesting sub-problems singled out by the modern approach is that of characterizing the behaviour of large groups of people, when each individual is faced with a choice among a finite set of alternatives, and a set of motives driving the choice can be identified. Such motives might be given by the person's personal preferences, as well as by the way he or she interacts with other people. My thesis aims to contribute to the research effort which is currently analysing the role played by social interaction in the human decision-making process just described.

As early as the nineteen-seventies the dramatic consequences of including interaction between peers in a mathematical model of choice comprising large groups of people were recognized independently by the physics [23], economics [62] and social science [34] communities.
The conclusion reached by all these studies is that mathematical models have the potential to describe several features of social behaviour, among which are the sudden and dramatic shifts often observed in society trends [47], and that these are unavoidably linked to the way individual people influence each other when deciding how to behave.

The possibility of using such models as a tool of empirical investigation, however, is not found in the scientific literature until the beginning of the present decade [21]: the reason is to be found in the intrinsic difficulty of establishing a methodology of systematic measurement for social features. Confidence that such an aim might be achievable has been boosted by the wide consensus gained by econometrics following the Nobel prize awarded in 2000 to economist Daniel McFadden for his work on probabilistic models of discrete choice, and by the increasing interest of policy makers in tools enabling them to cope with the global dimension of today's society [39, 27].

This has led very recently to a number of studies confronting directly the challenge of quantitatively measuring social interaction for bottom-up models, that is, models deriving macroscopic phenomena from assumptions about human behaviour at an individual level [11, 61, 51, 65].

These works show an interesting interplay of methods coming from econometrics [25], statistical physics [26] and game theory [43], which reveals a substantial overlap in the basic assumptions driving these three disciplines. It must also be noted that all of these studies rely on a simplifying assumption which considers interaction working on a global uniform scale, that is, on a mean field approach. This is due to the inability, stated in [69], of existing methods to measure social network topological structure in any detail.
It is expected that it is only a matter of time before technology allows this difficulty to be overcome: in the meanwhile, one of the roles of today's empirical studies is to assess how much information can be derived from the existing kinds of data, such as those coming from surveys, polls and censuses.

This thesis considers a mean field model that highlights the possibility of using the methods of discrete choice econometrics to apply a statistical mechanical generalization of the model introduced in [21]. The approach is mainly that of mathematical physics: this means that the main aim shall be to establish the mathematical properties of the proposed model, such as the existence of the thermodynamic limit, its factorization properties, and its solution, in a rigorous way: it is hoped that this might be used as a good building block for later, more refined theories. Furthermore, since perhaps the most problematic point of a mathematical study of society lies in the feasibility of measuring the relevant quantities starting from real data, two estimation procedures are put forward: one tries to mimic the econometrics approach, while the other stems directly from equilibrium statistical mechanics, by stressing the role played by fluctuations of the main observable quantities. These procedures are applied to some simple case studies.

The thesis is therefore organised as follows: the first chapter reviews the theory of Multinomial Logit discrete choice models. These models are based on a probabilistic approach to the psychology of choice [48], which is chosen here as the modelling approach to human decision making. In this chapter we focus on the mathematical form of Multinomial Logit, and in particular on its equivalence to the statistical mechanics of non-interacting particles. In the second chapter we consider the Curie-Weiss model, of which we provide a treatment recently developed in the wider study of mean field spin glasses [37], which allows elegant rigorous proofs of the model's properties to be given.
In chapter three we generalise results from chapter two to a system partitioned into an arbitrary number of components. Since such a model corresponds to the generalization of discrete choice first considered in [21], which includes the effect of peer pressure in the decision-making process, it provides a potential tool for the study of social interaction: chapter four shows an application of this to three simple case studies.

Chapter 2

Discrete choice models

In this chapter we describe the general theory of discrete choice models. These are econometric models that were first applied to the study of demand in transportation systems in the nineteen-seventies [6]. When people travel they can choose the mode of transportation from a set of distinct alternatives, such as train or automobile, and the basic tenet of these models is that such a discrete choice can be described by a probability distribution, and that proposals for the form of such a distribution can be derived from principles established at the level of individuals. As we shall see, this modus operandi is one familiar to statistical mechanics, and corresponds to what is commonly known as a bottom-up strategy in finance.

After describing the general scope of discrete choice analysis, in section 2.3 we describe precisely the mathematical structure of one of the most widely used discrete choice models, the Multinomial Logit model. Here we shall see how the probability distribution describing people's choices arises from the assumption that individuals act trying to maximize the benefit coming from that choice, which is the common setting of neoclassical economics. Discrete choice models, in general, ignore the effect of social interaction, but we shall see in subsection 2.3.3 that the Multinomial Logit can be rephrased precisely as a statistical mechanical model, which gives an ideal starting point for extending such a model of behaviour to a context including interaction, to be considered in later chapters.

For his development of the theory of the Multinomial Logit model, economist Daniel McFadden was awarded the Nobel Prize in Economics in 2000 [50], for bringing economics closer to quantitative scientific measurement. The purpose of discrete choice theory is to describe people's behaviour: it is an econometric technique to infer people's preferences from empirical data. In discrete choice theory the decision-maker is assumed to make choices that maximise his/her own benefit.
Their 'benefit' is described by a mathematical formula, a utility function, which is derived from data collected in surveys. This utility function includes rational preferences, but also accounts for elements that deviate from rational behaviour.

Though discrete choice models do not account for 'peer pressure' or 'herding effects', it is nonetheless a fact that the standard performance of discrete choice models is close to optimal for the analysis of many phenomena where peer influence is perhaps not a major factor in an individual's decision: Figure 2.1 shows an example of this. The table (taken from [50]) compares predictions and actual data concerning use of travel modes, before and after the introduction of a new rail transport system called BART in San Francisco, 1975. We see a remarkable agreement between the predicted share of people using BART (6.3%), and the actual measured figure after the introduction of the service (6.2%).

Figure 2.1: Discrete choice predictions against actual use of travel modes in San Francisco, 1975 (source: McFadden 2001)

In discrete choice each decision process is described mathematically by a utility function, which each individual seeks to maximize. The principle of utility maximization is one which lies at the heart of neoclassical economics: this has often been criticised as too simplistic an assumption for complex human behaviour, and this criticism has been supported by the poor performance of quantitative models arising from such an assumption. It must be noted, however, that if we wish to attain a quantitative description of human behaviour at all, we must do so by considering a description which is analytically treatable. There exist of course alternative approaches (e.g. agent-based modelling), but since this field of research is still in its youth, it pays to consider possible improvements of utility maximisation before abandoning it altogether.
This is indeed the view taken by discrete choice, which sees people as rational utility maximizers, but also takes into account a certain degree of irrationality, which is modelled through a random contribution to the utility function.

As an example, a binary choice could be to either cycle to work or to catch a bus. The utility function for choosing the bus may be written as:

$$U = V + \varepsilon \tag{2.1}$$

where $V$, the deterministic part of the utility, could be symbolically parametrised as follows:

$$V = \sum_a \lambda_a x_a + \sum_a \alpha_a y_a \tag{2.2}$$

The variables $x_a$ are assumed to be attributes regarding the choice alternatives themselves, for example the bus fare or the journey time. On the other hand, the $y_a$ may be socio-economic variables that define the decision-maker, for example their age, gender or income. It is this latter set of variables that allows us to zoom in on specific geographical areas or socio-economic groups. The $\lambda_a$ and $\alpha_a$ are parameters that need to be estimated empirically, through survey data, for instance. The key property of these parameters is that they quantify the relative importance of any given attribute in a person's decision: the larger a parameter's value, the more the corresponding attribute will affect a person's choice. For example, we may find that certain people are more affected by the journey time than the bus fare; therefore changing the fare may not influence their behaviour significantly. The next section will explain how the value of these parameters is estimated from empirical data.

It is an observed fact [49, 2] that choices are not always perfectly rational. For example, someone who usually goes to work by bus may one day decide to cycle instead. This may be because it was a nice sunny day, or for no evident reason. This unpredictable component of people's choices is accounted for by the random term $\varepsilon$.
The distribution of $\varepsilon$ may be assumed to be of different forms, giving rise to different possible models: if, for instance, $\varepsilon$ is assumed to be normal, the resulting model is called a probit model, and it doesn't admit a closed form solution. Discrete choice analysis assumes $\varepsilon$ to be extreme-value distributed, and the resulting model is called a logit model [6]. In practice this is very convenient as it does not impose any significant restrictions on the model but simplifies it considerably from a practical point of view. In particular, it allows us to obtain a closed form solution for the probability of choosing a particular alternative, say catching a bus rather than cycling to work:

$$P = \frac{e^{V}}{1 + e^{V}}, \tag{2.3}$$

(see section 2.3 for the derivation).

In words, this describes the rational preferences of the decision maker. As will be explained later on, (2.3) is analogous to the equation describing the equilibrium state of a perfect gas of heterogeneous magnetic particles (a Langevin paramagnet): just like gas particles react to external forces differently depending, for instance, on their mass and charge, discrete choice describes individuals as experiencing heterogeneous influences in their decision-making, according to their own socio-economic attributes, such as gender and wealth. A question arises spontaneously: do people and gases behave in the same way? The answer to such a controversial question is that in some circumstances they might. Models are idealisations of reality, and equation (2.3) is telling us that the same equation may describe idealised aspects of both human and gas behaviour; in particular, how individual behaviour relates to macroscopic or societal variables. These issues go beyond the scope of this thesis, but it is important to note that (2.3) offers a mathematical and intuitive link between econometrics and statistical mechanics. The importance of this 'lucky coincidence' cannot be overstated, and some of the implications will be discussed later on in more detail.
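
The binary logit probability can be illustrated with a few lines of Python. This is only a minimal sketch: the function names are hypothetical, and the attribute values and the weights `lam` and `alpha` are made up for the example rather than estimated from any survey. It assumes the standard binary logit form $P = e^V/(1+e^V)$ with $V$ parametrised as in (2.2).

```python
import math

def deterministic_utility(lam, x, alpha, y):
    """V = sum_a lam_a * x_a + sum_a alpha_a * y_a, as in (2.2)."""
    return sum(l * xa for l, xa in zip(lam, x)) + sum(a * ya for a, ya in zip(alpha, y))

def binary_logit(V):
    """P = e^V / (1 + e^V), evaluated in a numerically stable way."""
    if V >= 0:
        return 1.0 / (1.0 + math.exp(-V))
    e = math.exp(V)
    return e / (1.0 + e)

# Hypothetical values, chosen only for illustration.
lam = [-0.8, -0.05]   # weights of the bus fare and of the journey time (minutes)
alpha = [0.02]        # weight of the traveller's age
x = [1.5, 30.0]       # attributes of the bus alternative
y = [40.0]            # socio-economic attribute of the decision-maker

V = deterministic_utility(lam, x, alpha, y)
p_bus = binary_logit(V)
print(f"V = {V:.3f}, probability of choosing the bus = {p_bus:.3f}")
```

With a negative weight on the fare, raising the fare lowers $V$ and therefore the probability of choosing the bus, which is the qualitative effect described in the text.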

Discrete choice may be seen as a purely empirical model. In order to specify the actual functional form associated with a specific group of people facing a specific choice, empirical data is needed. The actual utility function is then specified by estimating the numerical values of the parameters $\lambda_a$ and $\alpha_a$ which appear in our definition of $V$ given by (2.2), thus establishing the choice probabilities (2.3). As mentioned earlier, these parameters quantify the relative importance of the attribute variables $x_a$ and $y_a$. For example, costs are always associated with negative parameters: this means that the higher the price of an alternative, the less likely people will be to choose it. This makes intuitive sense: what discrete choice offers is a quantification of this effect. Once the data has been collected, the model parameters may be estimated by standard statistical techniques: in practice, Maximum Likelihood estimation methods are used most often (see, e.g., [6] chapter 4). We shall see in further chapters how, though optimal for standard discrete choice models, Maximum Likelihood estimation seems to be unsuitable for phenomena involving interaction due to discontinuities in the probability structure. As we shall see, a valuable alternative is given by a method put forward by Joseph Berkson [7].

Discrete choice has been used to study people's preferences since the seventies [50]. Initial applications focused on transport [68, 53]. These models have been used to develop national and regional transport models around the world, including the UK, the Netherlands [24], as well as Copenhagen [54]. Since then discrete choice has also been applied to a range of social problems, for example healthcare [30, 59], telecommunications [42] and social care [60].

The binomial logit model which gives the probabilities (2.3) can be seen as a special case of the Multinomial Logit model introduced by R. Duncan Luce in 1959 [48] when developing a mathematical theory of choice in psychology, and later given the utility maximization form which we describe here by Daniel McFadden [50].

In the following three subsections we shall describe the mathematical structure of a Multinomial Logit model. In the first subsection we shall give information about the Gumbel extreme-value distribution, which is the distribution by which the model describes the random contribution $\varepsilon$ to a person's utility, and is chosen essentially for reasons of analytical convenience. The second subsection uses the properties of the Gumbel distribution in order to derive the probability structure of the model. These two subsections are an 'executive summary' of the main results, which can be found in any standard book on econometrics [6, 25].

The third subsection gives the statistical mechanical reformulation of the Multinomial Logit model, by showing that the same probability structure arises when we compute the pressure of a suitably chosen Hamiltonian: this leads the way for the extensions of the model that shall be considered in later chapters.

In order to implement the modelling assumption of utility maximization in a quantitative way, we need a suitable probability distribution for the random term $\varepsilon$.

The Multinomial Logit Model models randomness in choice by a Gumbel distribution, which has cumulative distribution function

$$F(x) = \exp\{-e^{-\mu(x-\eta)}\}, \qquad \mu > 0,$$

and probability density function

$$f(x) = \mu\, e^{-\mu(x-\eta)} \exp\{-e^{-\mu(x-\eta)}\}.$$

We have that if $\varepsilon \overset{d}{=} \mathrm{Gumbel}(\eta, \mu)$ then

$$E(\varepsilon) = \eta + \frac{\gamma}{\mu}, \qquad \mathrm{Var}(\varepsilon) = \frac{\pi^2}{6\mu^2},$$

where $\gamma$ is the Euler-Mascheroni constant ($\cong 0.577$).

The following two properties hold for Gumbel variables sharing the same parameter $\mu$ (see [6], p. 104):

I. If $\varepsilon' \overset{d}{=} \mathrm{Gumbel}(\eta_1, \mu)$ and $\varepsilon'' \overset{d}{=} \mathrm{Gumbel}(\eta_2, \mu)$ are independent random variables, then $\varepsilon = \varepsilon' - \varepsilon''$ is logistically distributed with cumulative distribution

$$F_\varepsilon(x) = \frac{1}{1 + e^{\mu(\eta_1 - \eta_2 - x)}},$$

and probability density

$$f_\varepsilon(x) = \frac{\mu\, e^{\mu(\eta_1 - \eta_2 - x)}}{\big(1 + e^{\mu(\eta_1 - \eta_2 - x)}\big)^2}.$$

II. If $\varepsilon_i \overset{d}{=} \mathrm{Gumbel}(\eta_i, \mu)$ for $1 \le i \le k$ are independent then

$$\max_{i=1..k} \varepsilon_i \overset{d}{=} \mathrm{Gumbel}\Big(\frac{1}{\mu} \ln \sum_{i=1}^{k} e^{\mu \eta_i},\ \mu\Big).$$

As we said, the logit is a model which is founded on the assumption that individuals choose their behaviour trying to maximize a utility, or "benefit", function. In the next section we shall use Property II to handle the probabilistic maximum of the utilities coming from many different choices, whereas Property I shall be used to compare probabilistically the benefits of two different choices.

We shall now derive the probability distribution for an individual $l$ choosing between $k$ alternatives $i = 1..k$. We have that choice $i$ yields $l$ a utility:

$$U^{(l)}_i = V^{(l)}_i + \varepsilon^{(l)}_i,$$

where the $\varepsilon^{(l)}_i$ are taken to be independent $\mathrm{Gumbel}(0, \mu)$ random variables.

We assume that $l$ chooses the alternative with the highest utility. However, since these utilities are random we can only compute the probability that a particular choice is made:

$$p_{l,i} = P(\text{``$l$ chooses $i$''})$$

This is in fact the probability that $U^{(l)}_i$ is bigger than all other utilities, and we can write this as follows:

$$p_{l,i} = P\big(U^{(l)}_i > \max_{j \neq i} U^{(l)}_j\big) = P\big(V^{(l)}_i + \varepsilon^{(l)}_i > \max_{j \neq i} (V^{(l)}_j + \varepsilon^{(l)}_j)\big)$$

Now define $U^* = \max_{j \neq i} (V^{(l)}_j + \varepsilon^{(l)}_j)$. By property II of the Gumbel distribution,

$$U^* \overset{d}{=} \mathrm{Gumbel}\Big(\frac{1}{\mu} \ln \sum_{j \neq i} e^{\mu V^{(l)}_j},\ \mu\Big)$$

So, if $V^* = \frac{1}{\mu} \ln \sum_{j \neq i} e^{\mu V^{(l)}_j}$, we have that $U^* = V^* + \varepsilon^*$ with $\varepsilon^* \overset{d}{=} \mathrm{Gumbel}(0, \mu)$.

This in turn gives us that

$$p_{l,i} = P\big(V^{(l)}_i + \varepsilon^{(l)}_i > V^* + \varepsilon^*\big) = P\big(V^{(l)}_i - V^* > \varepsilon^* - \varepsilon^{(l)}_i\big) = \frac{1}{1 + e^{\mu(V^* - V^{(l)}_i)}}$$

by property I of the Gumbel distribution, and this can be re-expressed as

$$p_{l,i} = \frac{e^{\mu V^{(l)}_i}}{e^{\mu V^{(l)}_i} + e^{\mu V^*}} = \frac{e^{\mu V^{(l)}_i}}{\sum_{j=1}^{k} e^{\mu V^{(l)}_j}}.$$

Here $\mu$ is a parameter which cannot be identified from statistical data.
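
The result of this derivation can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original derivation: it draws independent Gumbel$(0, \mu)$ noise by inverting the cumulative distribution function, lets a simulated individual pick the alternative with the highest utility, and compares the observed frequencies with the closed-form logit probabilities. The utilities `V`, the value of `mu`, the number of draws and all function names are hypothetical choices made for the illustration.

```python
import math
import random

def sample_gumbel(rng, eta=0.0, mu=1.0):
    """Draw from Gumbel(eta, mu) by inverting F(x) = exp(-exp(-mu * (x - eta)))."""
    u = rng.random()
    while u == 0.0:  # random() may return 0.0, which would give log(0)
        u = rng.random()
    return eta - math.log(-math.log(u)) / mu

def logit_probabilities(V, mu=1.0):
    """Closed form p_i = exp(mu * V_i) / sum_j exp(mu * V_j)."""
    m = max(V)  # subtracting the maximum avoids overflow and leaves p unchanged
    w = [math.exp(mu * (v - m)) for v in V]
    total = sum(w)
    return [x / total for x in w]

def simulated_frequencies(V, mu=1.0, draws=100_000, seed=0):
    """Fraction of draws in which each alternative has the highest V_i + eps_i."""
    rng = random.Random(seed)
    counts = [0] * len(V)
    for _ in range(draws):
        utilities = [v + sample_gumbel(rng, 0.0, mu) for v in V]
        counts[utilities.index(max(utilities))] += 1
    return [c / draws for c in counts]

V = [1.0, 0.0, -0.5]  # hypothetical deterministic utilities
mu = 2.0              # hypothetical value of the Gumbel parameter
exact = logit_probabilities(V, mu)
freq = simulated_frequencies(V, mu)
for i, (p, f) in enumerate(zip(exact, freq)):
    print(f"alternative {i}: closed form {p:.4f}, simulated {f:.4f}")
```

Note that `logit_probabilities(V, mu)` depends on `V` and `mu` only through the products `mu * V_i`: rescaling the utilities and the parameter in opposite directions leaves the probabilities unchanged, which is one way to see why $\mu$ cannot be recovered from choice data alone.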
From a physical perspective, the fact that $\mu$ cannot be identified corresponds to the lack of a well defined temperature: intuitively this makes sense, since measuring temperature consists in comparing a system of interest with another system whose state we assume to know perfectly well. In physics this can be done to a high degree of precision: in social systems, however, such a concept has as yet no clear meaning, and finding one will most certainly require a change in perspective about what we mean by measuring a quantity.

As a practical consequence, in this simple model we can let the parameter $\mu$ be incorporated into the degrees of freedom $V^{(l)}_i$ of the various utilities, and get the choice probabilities in the following form:

$$p_{l,i} = \frac{e^{V^{(l)}_i}}{\sum_{j=1}^{k} e^{V^{(l)}_j}} \tag{2.4}$$

As we have seen, the Multinomial Logit model follows a utility-maximization approach, in that it assumes that each person behaves so as to optimize his/her own benefit. From a statistical-mechanical perspective, this amounts to the community of people trying to identify its ground state, where some definition of self-perceived well-being, the utility, takes the role traditionally played by energy.

If there were an exact value of the utility corresponding to each behaviour, a system characterized by such a maximizing principle for the ground state would identify a microcanonical ensemble in equilibrium statistical mechanics. This amounts to stating that the energy of the system has an exact value, as opposed to being a random variable.

However, since the Multinomial Logit defines utility itself as a Gumbel random variable in order to try and capture both the predictable and unpredictable components of human decisions, its "ground state" turns out to be a "noisy" object. Statistical mechanics models this situation by defining a so-called canonical ensemble, where all possible values of the energy are considered, each with a probability given by a Gibbs distribution, which weights energetically favourable states more than unfavourable ones. We will now see how the Gibbs distribution leads to a model which is formally equivalent to the Multinomial Logit arising from the Gumbel distribution.

Assume that we have a population of $N$ people, each of whom makes a choice $\sigma^{(l)} = e_i$, where the $e_i$ form the $k$-dimensional canonical basis

$$e_1 = (1, 0, \ldots, 0), \quad e_2 = (0, 1, \ldots, 0), \quad \text{etc.}$$

We have then that a particular state of this system can be described by the following set:

$$\underset{\sim}{\sigma} = \{\sigma^{(1)}, \ldots, \sigma^{(N)}\}$$

Now define $v^{(l)}$ as a $k$-dimensional vector giving the utilities of the various choices for individual $l$:

$$v^{(l)} = (V^{(l)}_1, \ldots, V^{(l)}_k).$$

We have that $V^{(l)}_i$, which is the deterministic part of the utility considered in the last section, changes from person to person, and that it can be parametrised by a person's social attributes, for instance. For the moment, however, we just consider them as different numbers, since the exact parametrization doesn't change the nature of the probability structure.

If we now denote by $v^{(l)} \cdot \sigma^{(l)}$ the scalar product between the two vectors, we may express the energy (also called Hamiltonian) for the Multinomial Logit Model as follows:

$$H_N(\underset{\sim}{\sigma}) = -\sum_{l=1}^{N} v^{(l)} \cdot \sigma^{(l)}.$$

Intuitively, the Hamiltonian defines a model where the favoured states $\underset{\sim}{\sigma}$ are the ones which make the quantity $H_N$ small, which, due to the minus sign, corresponds to people choosing so as to maximise their utility.
Most of the information contained in an equilibrium statistical mechanical model can be derived from its pressure, which is defined as

$$P_N = \ln \sum_{\underset{\sim}{\sigma}} e^{-H_N(\underset{\sim}{\sigma})},$$

which acts as a moment generating function for the Gibbs distribution

$$p(\underset{\sim}{\sigma}) = \frac{e^{-H_N(\underset{\sim}{\sigma})}}{\sum_{\underset{\sim}{\sigma}'} e^{-H_N(\underset{\sim}{\sigma}')}},$$

and from which many of the features of the model can be recovered, among which the probabilities $p_{l,i}$, as derivatives of $P_N$ with respect to suitable parameters.

This distribution is chosen in physics since it is the one which maximises the system's entropy at a given temperature, which in turn just means that it is the most likely distribution to expect for a system which is at equilibrium. This is not to say that using such a model corresponds to accepting that society is at equilibrium, but rather to believing that some features of society might have small enough variations for a period of time long enough to allow a quantitative study. As pointed out in a later chapter, this belief has at least some quantitative backing if one considers the remarkable findings made by Émile Durkheim as early as the end of the 19th century [20].

We will now show that this model is equivalent to the Multinomial Logit by computing its pressure explicitly and finding its derivatives. Indeed, since the model doesn't include interaction this is a task that can be done easily for a finite $N$:

$$\begin{aligned}
P_N &= \ln \sum_{\underset{\sim}{\sigma}} e^{-H_N(\underset{\sim}{\sigma})} = \ln \sum_{\underset{\sim}{\sigma}} \exp\Big\{ \sum_{l=1}^{N} v^{(l)} \cdot \sigma^{(l)} \Big\} \\
&= \ln \sum_{\sigma^{(1)}} \exp\big\{ v^{(1)} \cdot \sigma^{(1)} \big\} \cdots \sum_{\sigma^{(N)}} \exp\big\{ v^{(N)} \cdot \sigma^{(N)} \big\} \\
&= \ln \prod_{l=1}^{N} \sum_{i=1}^{k} \exp\{ V^{(l)}_i \} = \sum_{l=1}^{N} \ln \sum_{i=1}^{k} \exp\{ V^{(l)}_i \}.
\end{aligned}$$
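
The factorisation of the pressure can be verified by brute force on a small system. The sketch below is only an illustration under stated assumptions: each choice $\sigma^{(l)} = e_i$ is represented by the index $i$ (so that $v^{(l)} \cdot \sigma^{(l)}$ is simply the $i$-th entry of $v^{(l)}$), the utilities in `v` are made-up numbers, and the function names are hypothetical.

```python
import itertools
import math

def hamiltonian(v, sigma):
    """H_N(sigma) = -sum_l v[l][sigma[l]], where sigma[l] is the index chosen by person l."""
    return -sum(v[l][choice] for l, choice in enumerate(sigma))

def pressure_by_enumeration(v):
    """P_N = ln sum_sigma exp(-H_N(sigma)), summing over all k**N configurations."""
    k = len(v[0])
    total = sum(math.exp(-hamiltonian(v, sigma))
                for sigma in itertools.product(range(k), repeat=len(v)))
    return math.log(total)

def pressure_factorised(v):
    """P_N = sum_l ln sum_i exp(V_i^(l)), the closed form for the non-interacting model."""
    return sum(math.log(sum(math.exp(V) for V in row)) for row in v)

# Hypothetical utilities for N = 4 people and k = 3 alternatives.
v = [[0.2, -1.0, 0.5],
     [1.3, 0.0, -0.7],
     [-0.4, 0.9, 0.1],
     [0.0, 0.0, 2.0]]

p_enum = pressure_by_enumeration(v)
p_fact = pressure_factorised(v)
print(f"enumeration: {p_enum:.10f}, factorised: {p_fact:.10f}")
```

The enumeration visits $k^N$ configurations and is feasible only for very small systems, whereas the factorised form costs $N k$ operations; this is the practical content of the absence of interaction.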
Once we have the pressure $P_N$ it's easy to find the probability $p_{l,i}$ that person $l$ chooses alternative $i$, just by computing the derivative of $P_N$ with respect to the utility $V^{(l)}_i$:

$$p_{l,i} = P(\text{``$l$ chooses $i$''}) = \frac{\partial P_N}{\partial V^{(l)}_i} = \frac{e^{V^{(l)}_i}}{\sum_{j=1}^{k} e^{V^{(l)}_j}},$$

which is the same as (2.4).

This shows how the utility maximization principle is equivalent to a Hamiltonian model, whenever the random part of the utility is Gumbel distributed. There is a simple interpretation for this statistical mechanical model: it is a gas of $N$ magnetic particles, each of which has $k$ states, and the energy of these states depends on the corresponding value of the utility $V^{(l)}_i$, which therefore bears a close analogy to a magnetic field acting on the particle.

This model may seem completely uninteresting, since it is in no essential way different from a Langevin paramagnet. What is interesting, however, is how such a familiar, if trivial, model has arisen independently in the field of economics, and there are a few simple points to be made that can emphasize the change in perspective.

First, we see how for this model it makes sense to consider the pressure $P_N$ as an extensive quantity. This is due to the fact that these models are applied to samples of data that yield information about each single individual, rather than being applied to extremely large ensembles of particles that we regard as identical, and of which we measure average quantities.
Second, the availability of data about individuals (microeconomic data) allows us to define the vector $v^{(l)}$ which assigns a benefit value to each of the alternatives that individual $l$ has.

The main goal of an econometric model of this kind is then to find the parametrization for $v^{(l)}$ in terms of observable socio-economic features which fits micro data in an optimal way. The main goal of statistical mechanics is, on the other hand, to find a microscopic theory capable of generating laws that are observed consistently over a large number of experiments and measured with extreme precision at a macroscopic level. Since the sample sizes available for microeconomic data are not as high as the number of particles in a physical system, but the data are more detailed at the level of individuals, the goal of a model of social behaviour could be seen as an interesting mixture of the above.

We have seen how discrete choice can be given a statistical mechanical description: in this section we consider why this is of interest to modelling social phenomena.

A key limitation of discrete choice theory is that it does not formally account for social interactions and imitation. In discrete choice each individual's decisions are based on purely personal preferences, and are not affected by other people's choices. However, there is a great deal of theoretical and empirical evidence to suggest that an individual's behaviour, attitude, identity and social decisions are influenced by those of others through vicarious experience or social influence, persuasion and sanctioning [1, 4]. These theories specifically relate to the interpersonal social environment including social networks, social support, role models and mentoring. The key insight of these theories is that individual behaviours and decisions are affected by people's relationships with those around them, e.g. their parents or their peers.

Mathematical models that take into account social influence have been considered by social psychology since the '70s (see [63] for a short review). In particular, influential works by Schelling [62] and Granovetter [34] have shown how models where individuals take into account the mean behaviour of others are capable of reproducing, at least qualitatively, the dramatic opinion shifts observed in real life (for example in financial bubbles or during street riots). In other words, they observed that the interaction built into their models was unavoidably linked to the appearance of structural changes on a phenomenological level in the models themselves.

Figure 2.2: The diagram illustrates how the inclusion of social interactions (right) leads to the existence of tipping points. By contrast, models that do not account for social interactions cannot account for the tipping points.

Figure 2.2 compares the typical dependence of average choice with respect to an attribute parameter, such as cost, in discrete choice analysis (left), where the dependence is always a continuous one, with the typical behaviour of an interaction model of Schelling or Granovetter kind (right), where small changes in the attributes can lead to a drastic jump in the average choice, reflecting structural changes such as the disappearance of equilibria in the social context.

The research course initiated by Schelling was eventually linked to the parallel development of the discrete choice analysis framework at the end of the '90s, when Brock and Durlauf [21] suggested a direct econometric implementation of the models considered by social psychology. In order to accomplish this, Brock and Durlauf had to delve into the implications of a model where an individual takes into account the behaviour of others when making a discrete choice: this could only be done by considering a new utility function which depended on the choices of all other people.

This new utility function was built by starting from the assumptions of discrete choice analysis. The utility function reflects what an individual considers desirable: if we hold (see, e.g., [10]) that people consider it desirable to conform to the people they interact with, we have that, as a consequence, an individual's utility increases when he agrees with other people.

Symbolically, we can say that when an individual $i$ makes a choice, his utility for that choice increases by an amount $J_{ij}$ when another individual $j$ agrees with him, thus defining a set of interaction parameters $J_{ij}$ for all couples of individuals.
The new utility function for individual $i$ hence takes the following form:

$$U_i = \sum_j J_{ij}\,\tau_j + \sum_a \lambda_a x_a^{(i)} + \sum_a \alpha_a y_a^{(i)} + \varepsilon, \tag{2.5}$$

where the sum $\sum_j$ ranges over all individuals, and the symbol $\tau_j$ is equal to 1 if $j$ agrees with $i$, and 0 otherwise.

Analysing the general case of such a model is a daunting task, since the choice of another individual $j$ is itself a random variable, which in turn correlates the choices of all individuals. This problem, however, has been considered by statistical mechanics since the end of the 19th century, throughout the twentieth century, until the present day. Indeed, the first success of statistical mechanics was to give a microscopic explanation of the laws governing perfect gases, and this was achieved thanks to a formalism which is strictly equivalent to the one obtained by discrete choice analysis in (5.4).

The interest of statistical mechanics eventually shifted to problems concerning interaction between particles, and as daunting as the problem described by (2.5) may be, statistical physics has been able to identify some restrictions on models of this kind to make them tractable while retaining great descriptive power as shown, e.g., in the work of Pierre Weiss [70] regarding the behaviour of magnets.

The simplest way devised by physics to deal with such a problem is called a mean field assumption, where interactions are assumed to be of a uniform and global kind. This leads to a manageable closed form solution and a model that is consistent with the models of Schelling and Granovetter.
Moreover, this assumption is also shown by Brock and Durlauf to be closely linked to the assumption of rational expectations from economic theory, which assumes that the observed behaviour of an individual must be consistent with his belief about the opinion of others.

By assuming mean field or rational expectations we can rewrite (2.5) in the tamer form

$$U_i = J m + \sum_a \lambda_a x_a^{(i)} + \sum_a \alpha_a y_a^{(i)} + \varepsilon, \tag{2.6}$$

where $m$ is the average opinion of a given individual, and this average value is coupled to the model parameters by a closed form formula.

If we now define $V_i$ to be the deterministic part of the utility, similarly as before,

$$V_i = J m + \sum_a \lambda_a x_a^{(i)} + \sum_a \alpha_a y_a^{(i)},$$

we have that the functional form of the choice probability, given by (5.4),

$$P_i = \frac{e^{V_i}}{\sum_k e^{V_k}}, \tag{2.7}$$

remains unchanged, allowing the empirical framework of discrete choice analysis to be used to test the theory against real data. This sets the problem as one of heterogeneous interacting particles, and we shall see in the next two chapters how such a mean-field model, just like the standard Multinomial Logit, can be given a Hamiltonian statistical mechanical form, and solved in a completely rigorous way using elementary mathematics, via methods recently developed in the context of spin glasses [37].

Though the mean field assumption might be seen as a crude approximation, since it considers a uniform and fixed kind of interaction, one should bear in mind that statistical physics has built throughout the twentieth century the expertise needed to consider a wide range of forms for the interaction parameters $J_{ij}$, of both deterministic and random nature, so that a partial success in the application of mean field theory might be enhanced by browsing through a rich variety of well developed, though analytically more demanding, theories.

Nevertheless, an empirical attempt to assess the actual descriptive and predictive power of such models has not been carried out to date: the natural course for such a study would be to start by empirically testing the mean field picture, as was done for discrete choice in the seventies (see Figure 1), and to proceed by enhancing it with the help available from the econometrics, social science, and statistical physics communities. Two recent examples of empirical studies of mean-field models can be found in [65] and [29].

Chapter 3
The Curie-Weiss model

The Curie-Weiss model was first introduced in 1907 by Pierre Weiss [70] as a proposal for a phenomenological model capable of explaining the experimental observations carried out by Pierre Curie in 1895 [18], concerning the dependence on temperature of the magnetic nature of metals such as iron, nickel, and magnetite.

Iron and nickel are materials capable of retaining a degree of magnetization, which we call spontaneous magnetization, after having been exposed to a magnetic field: such materials are said to be ferromagnetic, from the Latin name for iron. However, it had been known since the days of Faraday ([18], p. 1) that these materials tend to lose their ability to retain magnetization as their temperature increases.

Pierre Curie's experiments showed not only that the loss of the ferromagnetic property indeed occurs, but also that it occurs in a very peculiar fashion. For each of the materials he considered, he found a definite temperature at which spontaneous magnetization vanishes abruptly, giving rise to an irregular point in the graph plotting spontaneous magnetization versus temperature (see Figure 3.1): we now call this temperature the Curie temperature for the given material.

Figure 3.1: Pierre Curie's measurements in 1895

Figure 3.2: Pierre Weiss's measurements (crosses) fitted against his theoretical prediction (line) in 1907: the graph shows the dependence of spontaneous magnetization on temperature for magnetite

Weiss's model arises from physical considerations about the nature of magnetic interactions between atoms: he claims that single atoms must experience, as well as the external field, a sum of all the fields produced by all the other particles inside the material. He calls this field a "molecular field" (champ moleculaire), and by adding a term corresponding to this field inside the balance equation derived by Paul Langevin to describe paramagnetic materials (that is, magnetic materials that do not retain magnetization after exposure to a field), he formulates a balance equation for ferromagnetic materials.

In his 1907 paper Weiss shows that the theoretical predictions of his model are in remarkable agreement with physical reality by fitting them against measurements, carried out by himself, on an ellipsoid made of magnetite (Figure 3.2).

Today we know that the Curie-Weiss model is not completely accurate: indeed, it is well known that some physically measurable quantities for ferromagnetic materials, called critical exponents, are not predicted correctly by it (see [41], p. 425). The subsequent study of more detailed models, such as the Ising model, has brought to light the reason for such a mismatch: when rewritten in the language of modern statistical mechanics, the model of Curie-Weiss readily shows itself to be equivalent to one where all particles are interacting with each other. This turns out to be too strong an assumption for a system where all particles sit next to each other geometrically and which interact, according to quantum mechanics, only up to a very short range.
On the other hand, the Ising model, which still makes use of all of Weiss's other simplifying assumptions about interaction between particles, manages to predict critical exponents correctly, just by assuming that particles only interact with their nearest neighbours on a regular lattice. From a mathematical point of view, however, this modification implies a drastic reduction of the symmetry of the problem, which has so far proved to be analytically untreatable in more than two dimensions (see [41], p. 341).

All objections standing, it is nevertheless worth remembering that the degree of agreement between theory and reality for the Curie-Weiss model is truly remarkable given the simplicity of the model. Today, Weiss's "molecular field" assumption is called a mean field assumption, and scientific wisdom tells that this assumption is of great value in exploring the phase structure of a system so that, when faced with a new situation, one would try mean field first ([41], p. 423).

As a modern statistical mechanics model, the Curie-Weiss model is defined by its Hamiltonian:

$$H(\sigma) = -\sum_{i,j=1}^{N} J_{ij}\,\sigma_i\sigma_j - \sum_{i=1}^{N} h_i\,\sigma_i. \tag{3.1}$$

We consider Ising spins, $\sigma_i = \pm 1$, subject to a uniform magnetic field $h_i = h$ and to isotropic interactions $J_{ij} = J/2N$, so that we have

$$H(\sigma) = -\frac{J}{2N}\sum_{i,j=1}^{N}\sigma_i\sigma_j - h\sum_{i=1}^{N}\sigma_i. \tag{3.2}$$

If we now introduce the magnetization of a configuration $\sigma$ as

$$m(\sigma) = \frac{1}{N}\sum_{i=1}^{N}\sigma_i$$

we can rewrite the Hamiltonian per particle as:

$$\frac{H(\sigma)}{N} = -\frac{J}{2}\,m(\sigma)^2 - h\,m(\sigma). \tag{3.3}$$

The established statistical mechanics framework defines the equilibrium value of an observable $f(\sigma)$ as the average with respect to the Gibbs distribution defined by the Hamiltonian. We call this average the Gibbs state for $f(\sigma)$, and write it explicitly as:

$$\langle f\rangle = \frac{\sum_\sigma f(\sigma)\,e^{-H(\sigma)}}{\sum_\sigma e^{-H(\sigma)}}.$$

The main observable of the model is the magnetization, $m(\sigma)$, which explicitly reads:

$$m(\sigma) = \frac{1}{N}\sum_{i=1}^{N}\sigma_i.$$

Our quantity of interest is therefore $\langle m\rangle$: to find it, as well as the moments of many other observables, statistical mechanics leads us to consider the pressure function:

$$p_N = \frac{1}{N}\log\sum_\sigma e^{-H(\sigma)}.$$

It is easy to verify that, once it has been derived exactly, the pressure is capable of generating the Gibbs state for the magnetization as

$$\langle m\rangle = \frac{\partial p_N}{\partial h}.$$

We show two ways of proving the existence of the thermodynamic limit in the Curie-Weiss model. The first method follows [5] in exploiting directly the convexity of the Hamiltonian in order to prove subadditivity in $N$ for the system's pressure.

The second method consists in a refinement of the first, and covers models for which the Hamiltonian is not necessarily convex, such as the two-population model considered in the next chapter. It is important to point out that a careful application of this method to the Sherrington-Kirkpatrick spin glass model allowed Guerra [36] to settle the question, open for twenty years, concerning the existence of the thermodynamic limit.

We consider a system of $N$ spins defined as above. Following [5] we split the system in two subsystems of $N_1$ and $N_2$ spins, respectively, with $N_1 + N_2 = N$. For each of these systems we define partial magnetizations

$$m_1(\sigma) = \frac{1}{N_1}\sum_{i=1}^{N_1}\sigma_i \quad\text{and}\quad m_2(\sigma) = \frac{1}{N_2}\sum_{i=N_1+1}^{N}\sigma_i,$$

and partial Hamiltonians

$$H_{N_1} = -N_1\Big(\frac{J}{2}m_1^2 + h m_1\Big) \quad\text{and}\quad H_{N_2} = -N_2\Big(\frac{J}{2}m_2^2 + h m_2\Big).$$

We have by definition that

$$m = \frac{N_1}{N}m_1 + \frac{N_2}{N}m_2 \tag{3.4}$$

and since $f(x) = x^2$ is a convex function we also have that

$$m^2 \le \frac{N_1}{N}m_1^2 + \frac{N_2}{N}m_2^2. \tag{3.5}$$

We are now ready to prove the following

Proposition 1.

There exists a function $p(J, h)$ such that $\lim_{N\to\infty} p_N = p$.

Proof. Relations (3.4) and (3.5) imply (for $J \ge 0$) that

$$H_N \ge H_{N_1} + H_{N_2}$$

and this in turn gives

$$Z_N = \sum_\sigma e^{-H_N(\sigma)} \le \sum_\sigma e^{-H_{N_1}(\sigma:1..N_1) - H_{N_2}(\sigma:N_1+1..N)} = Z_{N_1} Z_{N_2},$$

where $\sigma:1..N_1 = \{\sigma_1, \ldots, \sigma_{N_1}\}$ and $\sigma:N_1+1..N = \{\sigma_{N_1+1}, \ldots, \sigma_N\}$. Hence we have the following inequality:

$$N p_N \le N_1 p_{N_1} + N_2 p_{N_2}, \quad\text{for } N_1 + N_2 = N.$$

This identifies the sequence $\{N p_N\}$ as a subadditive sequence, for which the following holds:

$$\lim_{N\to\infty}\frac{N p_N}{N} = \lim_{N\to\infty} p_N = \inf_N p_N.$$

Hence in order to verify the existence of a finite limit we need to verify that the sequence $\{p_N\}$ is bounded below, which follows from the boundedness of the intensive quantity

$$\frac{H(\sigma)}{N} = -\frac{J}{2}m^2 - hm, \qquad -1 \le m \le 1.$$

Indeed, if $-\frac{H(\sigma)}{N} \ge K$ for every configuration $\sigma$, then

$$p_N = \frac{1}{N}\ln\sum_\sigma e^{-H(\sigma)} \ge \frac{1}{N}\ln\big(2^N e^{NK}\big) = \ln 2 + K,$$

so the result follows.

We shall now prove that our model admits a thermodynamic limit by exploiting an existence theorem provided for mean field models in [8]: the result states that the existence of the pressure per particle for large volumes is guaranteed by a monotonicity condition on the equilibrium state of the Hamiltonian. We therefore prove the existence of the thermodynamic limit independently of an exact solution. Such a line of enquiry is pursued in view of the study of models that may involve random interactions of spin glass or random graph type, and that might or might not come with an exact expression for the pressure.
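The subadditivity inequality $N p_N \le N_1 p_{N_1} + N_2 p_{N_2}$ used in the proof of Proposition 1 can be checked numerically at finite size. The following is a minimal sketch, not part of the original derivation: it assumes $J \ge 0$, and the function names `pressure` and `worst_subadditivity_gap` are hypothetical. It computes $Z_N$ exactly by grouping the $2^N$ configurations into the $N+1$ magnetization levels, each weighted by a binomial coefficient.

```python
import math


def pressure(N, J, h):
    """Exact p_N = (1/N) ln Z_N, summing over the N + 1 magnetization levels."""
    logs = []
    for k in range(N + 1):  # k = number of spins equal to +1
        m = (2 * k - N) / N
        log_binom = math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
        # -H/N = (J/2) m^2 + h m, as in (3.3)
        logs.append(log_binom + N * (0.5 * J * m * m + h * m))
    top = max(logs)  # log-sum-exp for numerical stability
    return (top + math.log(sum(math.exp(x - top) for x in logs))) / N


def worst_subadditivity_gap(N, J, h):
    """Largest value of N p_N - (N1 p_N1 + N2 p_N2) over all splits N1 + N2 = N."""
    total = N * pressure(N, J, h)
    return max(
        total - (n1 * pressure(n1, J, h) + (N - n1) * pressure(N - n1, J, h))
        for n1 in range(1, N)
    )


if __name__ == "__main__":
    for J, h in [(0.5, 0.0), (1.5, 0.3), (2.0, -0.1)]:
        # Subadditivity predicts a non-positive gap for every split.
        print(J, h, worst_subadditivity_gap(60, J, h))
```

Grouping configurations by magnetization keeps the cost linear in $N$ per evaluation, so the check stays cheap even though the configuration space is exponentially large.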

Proposition 2.

There exists a function $p(J, h)$ such that $\lim_{N\to\infty} p_N = p$.

Proof. Theorem 1 in [8] states that given a Hamiltonian $H_N$ such that $H_N/N$ is bounded in $N$, and its associated equilibrium state $\omega_N$, the model admits a thermodynamic limit whenever the physical condition

$$\omega_N(H_N) \ge \omega_N(H_{N_1}) + \omega_N(H_{N_2}), \qquad N_1 + N_2 = N, \tag{3.6}$$

is verified.

For the Curie-Weiss model the condition is easy to verify once we define partial magnetizations

$$m_1(\sigma) = \frac{1}{N_1}\sum_{i=1}^{N_1}\sigma_i \quad\text{and}\quad m_2(\sigma) = \frac{1}{N_2}\sum_{i=N_1+1}^{N}\sigma_i.$$

This gives that

$$m = \frac{N_1}{N}m_1 + \frac{N_2}{N}m_2$$

so that

$$\begin{aligned}
H_N - H_{N_1} - H_{N_2} &= -N\Big(\frac{J}{2}m^2 + hm\Big) + N_1\Big(\frac{J}{2}m_1^2 + hm_1\Big) + N_2\Big(\frac{J}{2}m_2^2 + hm_2\Big) \\
&= -N\frac{J}{2}\Big(m^2 - \frac{N_1}{N}m_1^2 - \frac{N_2}{N}m_2^2\Big) - Nh\Big(m - \frac{N_1}{N}m_1 - \frac{N_2}{N}m_2\Big) \\
&= -N\frac{J}{2}\Big(m^2 - \frac{N_1}{N}m_1^2 - \frac{N_2}{N}m_2^2\Big) \ge 0,
\end{aligned}$$

where the last inequality follows from the convexity of the function $f(x) = x^2$, and since it holds for every configuration $\sigma$, it also implies (3.6), proving the result.

In this section we shall prove that the correlation functions of our model factorize completely in the thermodynamic limit, for almost every choice of parameters. This implies that all the thermodynamic properties of the system can be described by the magnetization. Indeed, the exact solution of the model to be derived in the next section comes as an equation of state which, as expected, turns out to be the same as the balance equation derived by Weiss.

Proposition 3.

$$\lim_{N\to\infty}\big(\omega_N(m^2) - \omega_N(m)^2\big) = 0$$

for almost every choice of $h$.

Proof. We recall the definition of the Hamiltonian per particle

$$\frac{H_N(\sigma)}{N} = -\frac{J}{2}m^2 - hm,$$

and of the pressure per particle

$$p_N = \frac{1}{N}\ln\sum_\sigma e^{-H_N(\sigma)}.$$

By taking first and second partial derivatives of $p_N$ with respect to $h$ we get

$$\frac{\partial p_N}{\partial h} = \frac{1}{N}\sum_\sigma N m(\sigma)\frac{e^{-H(\sigma)}}{Z_N} = \omega_N(m), \qquad \frac{\partial^2 p_N}{\partial h^2} = N\big(\omega_N(m^2) - \omega_N(m)^2\big).$$

By using these relations we can bound above the integral with respect to $h$ of the fluctuations of $m$ in the Gibbs state:

$$\begin{aligned}
\left|\int_{h^{(1)}}^{h^{(2)}}\big(\omega_N(m^2) - \omega_N(m)^2\big)\,dh\right|
&= \frac{1}{N}\left|\int_{h^{(1)}}^{h^{(2)}}\frac{\partial^2 p_N}{\partial h^2}\,dh\right|
= \frac{1}{N}\left|\frac{\partial p_N}{\partial h}\Big|_{h^{(1)}}^{h^{(2)}}\right| \\
&\le \frac{1}{N}\Big(\big|\omega_N(m)|_{h^{(2)}}\big| + \big|\omega_N(m)|_{h^{(1)}}\big|\Big) = O\Big(\frac{1}{N}\Big).
\end{aligned} \tag{3.7}$$

On the other hand we have that

$$\omega_N(m) = \frac{\partial p_N}{\partial h}, \quad\text{and}\quad \omega_N(m^2) = 2\frac{\partial p_N}{\partial J},$$

so, by convexity of the thermodynamic pressure $p = \lim_{N\to\infty} p_N$, both quantities $\frac{\partial p_N}{\partial h}$ and $\frac{\partial p_N}{\partial J}$ have well defined thermodynamic limits almost everywhere. This together with (3.7) implies that

$$\lim_{N\to\infty}\big(\omega_N(m^2) - \omega_N(m)^2\big) = 0 \quad\text{a.e. in } h. \tag{3.8}$$

The last proposition proves that $m(\sigma)$ is a self-averaging quantity, that is, a random quantity whose fluctuations vanish in the thermodynamic limit. This is indeed a powerful result, which can be exploited thanks to the following

Proposition 4. (Cauchy-Schwarz inequality) Let $X$ and $Y$ be two random variables defined on a finite probability space such that $P(X_i) = P(Y_i) = p_i$. Then the following holds:

$$\big|E(XY) - E(X)E(Y)\big| \le \sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}.$$

Proof. Let us define the following quantities:

$$E(X) = \sum_i X_i p_i = \mu_X, \qquad \mathrm{Var}(X) = \sigma_X^2,$$
$$E(Y) = \sum_i Y_i p_i = \mu_Y, \qquad \mathrm{Var}(Y) = \sigma_Y^2.$$

If we now define rescaled versions of $X$ and $Y$:

$$\bar X = \frac{X - \mu_X}{\sigma_X}, \quad\text{and}\quad \bar Y = \frac{Y - \mu_Y}{\sigma_Y},$$

we get that $\{\bar X_i\, p_i^{1/2}\}$ and $\{\bar Y_i\, p_i^{1/2}\}$ are vectors of Euclidean length equal to 1 (since their squared lengths are the variances of $\bar X$ and $\bar Y$, which have been normalized). This implies

$$\big|E(\bar X\bar Y)\big| = \Big|\sum_i \bar X_i\bar Y_i\,p_i\Big| = \Big|\sum_i \bar X_i\,p_i^{1/2}\;\bar Y_i\,p_i^{1/2}\Big| \le 1, \tag{3.9}$$

since $E(\bar X\bar Y)$ is the projection of a unit vector against another, and therefore its modulus is at most one.

If we now substitute back $X$ and $Y$ into (3.9) we get our result.

By putting together the self-averaging property and the Cauchy-Schwarz inequality we get the following

Proposition 5.

Given any integer $k$ we have that

$$\lim_{N\to\infty}\big(\omega_N(m^k) - \omega_N(m)^k\big) = 0$$

for almost every choice of $h$.

Proof. Applying the Cauchy-Schwarz inequality to $X = m^{k-1}$ and $Y = m$ we get that

$$\big|\omega_N(m^{k-1}m) - \omega_N(m^{k-1})\,\omega_N(m)\big| \le \sqrt{\mathrm{Var}_N(m^{k-1})\,\mathrm{Var}_N(m)}. \tag{3.10}$$

Now self-averaging tells us that $\mathrm{Var}_N(m)$ tends to zero in the limit, and since $m^{k-1}$ is a bounded quantity, (3.10) implies:

$$\lim_{N\to\infty}\big(\omega_N(m^k) - \omega_N(m^{k-1})\,\omega_N(m)\big) = 0$$

and the rest of the proposition follows by induction on the same argument.

The last proposition is very important for this model, because the mean-field nature of the system allows us to use the factorization of the magnetization in order to prove factorization of spin correlation functions, thus characterizing all the thermodynamics of the system.

In the following proposition we shall only prove the factorization of 2-spin correlations: the factorization of $k$-spin correlations is done in the same way.

Proposition 6.

$$\lim_{N\to\infty}\big(\omega_N(\sigma_i\sigma_j) - \omega_N(\sigma_i)\,\omega_N(\sigma_j)\big) = 0$$

for almost every choice of $h$, whenever $\sigma_i$, $\sigma_j$ are distinct spins.

Proof. Now we can use the self-averaging of $m(\sigma)$ to prove the factorization of correlation functions. This is done by exploiting the translation invariance of the Gibbs measure on spins, which in turn follows from the mean-field nature of the model:

$$\begin{aligned}
\omega_N(m) &= \omega_N\Big(\frac{1}{N}\sum_{i=1}^{N}\sigma_i\Big) = \omega_N(\sigma_1), \\
\omega_N(m^2) &= \omega_N\Big(\frac{1}{N^2}\sum_{i,j=1}^{N}\sigma_i\sigma_j\Big) = \omega_N\Big(\frac{1}{N^2}\sum_{i\ne j}\sigma_i\sigma_j\Big) + \omega_N\Big(\frac{1}{N^2}\sum_{i=j}\sigma_i\sigma_j\Big) \\
&= \frac{N-1}{N}\,\omega_N(\sigma_1\sigma_2) + \frac{1}{N}.
\end{aligned} \tag{3.11}$$

We have that (3.11) and (3.8) imply

$$\lim_{N\to\infty}\big(\omega_N(\sigma_i\sigma_j) - \omega_N(\sigma_i)\,\omega_N(\sigma_j)\big) = 0, \quad\text{for a.e. } h, \tag{3.12}$$

which verifies our statement for all couples of spins $i \ne j$.

The self-averaging of the magnetization has been proved directly here: this, however, can be seen as a consequence of the convexity of the pressure.
Indeed, the second derivative of any convex function exists almost everywhere: this is a consequence of the first derivative existing almost everywhere and being monotonically increasing (see, e.g., [57]).

Therefore existence almost everywhere of $\frac{\partial^2 p}{\partial h^2}$ together with the intensivity property of the magnetization implies trivially that its fluctuations vanish in the thermodynamic limit. This also implies that, since energy per particle is another intensive quantity which is obtained by differentiating the pressure with respect to $J$, energy per particle is a self-averaging quantity too.

As we can see from Proposition 5, factorization of spins only holds a.e. for $h$, and indeed it can be proved that factorization doesn't hold at $h = 0$, $J > 1$. However, by using the self-averaging of energy per particle proved above, we can similarly obtain a weaker factorization rule which covers this regime:

Proposition 7.

$$\lim_{N\to\infty}\big(\omega_N(\sigma_i\sigma_j\sigma_k\sigma_l) - \omega_N(\sigma_i\sigma_j)\,\omega_N(\sigma_k\sigma_l)\big) = 0$$

for almost every choice of $J$, whenever $\sigma_i$, $\sigma_j$, $\sigma_k$, $\sigma_l$ are distinct spins.

Proof. The proof follows the same argument of Proposition 5, and uses the self-averaging of the energy per particle instead of the self-averaging of the magnetization.
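The self-averaging and factorization statements of this section can be illustrated numerically. The sketch below is an illustration only, with hypothetical function names: it computes the Gibbs moments of $m$ exactly at finite $N$ by summing over magnetization levels, obtains $\omega_N(\sigma_1\sigma_2)$ from relation (3.11), and lets one compare a generic point ($h \ne 0$) with the regime $h = 0$, $J > 1$, where the variance of $m$ is not expected to vanish.

```python
import math


def level_weights(N, J, h):
    """Gibbs probabilities of the N + 1 magnetization levels m = (2k - N)/N."""
    ms = [(2 * k - N) / N for k in range(N + 1)]
    logs = [
        math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
        + N * (0.5 * J * m * m + h * m)
        for k, m in enumerate(ms)
    ]
    top = max(logs)
    weights = [math.exp(x - top) for x in logs]
    total = sum(weights)
    return ms, [w / total for w in weights]


def moments(N, J, h):
    """Return (omega_N(m), omega_N(m^2))."""
    ms, ps = level_weights(N, J, h)
    first = sum(m * p for m, p in zip(ms, ps))
    second = sum(m * m * p for m, p in zip(ms, ps))
    return first, second


def variance(N, J, h):
    """Fluctuation omega_N(m^2) - omega_N(m)^2 appearing in Proposition 3."""
    first, second = moments(N, J, h)
    return second - first * first


def two_spin_connected(N, J, h):
    """omega_N(s1 s2) - omega_N(s1) omega_N(s2), using (3.11) for omega_N(s1 s2)."""
    first, second = moments(N, J, h)
    s1s2 = (N * second - 1) / (N - 1)
    return s1s2 - first * first


if __name__ == "__main__":
    for N in (50, 400, 3200):
        print(N, variance(N, 1.5, 0.2), two_spin_connected(N, 1.5, 0.2),
              variance(N, 1.5, 0.0))
```

In this sketch the last column ($h = 0$, $J = 1.5$) stays of order one as $N$ grows, in line with the remark that factorization fails at $h = 0$, $J > 1$.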

We shall derive upper and lower bounds for the thermodynamic limit of the pressure. The lower bound is obtained through the standard entropic variational principle, while the upper bound is derived by a decoupling strategy.

In order to find an upper bound for the pressure we shall divide the configuration space into a partition of microstates of equal magnetization, following [19, 37, 38]. Since the system consists of $N$ spins, its magnetization can take exactly $N + 1$ values, which are the elements of the set

$$R_N = \Big\{-1,\, -1 + \frac{2}{N},\, \ldots,\, 1 - \frac{2}{N},\, 1\Big\}.$$

Clearly for every $m(\sigma)$ we have that

$$\sum_{\bar m\in R_N}\delta_{m,\bar m} = 1,$$

where $\delta_{x,y}$ is a Kronecker delta. Therefore we have that

$$Z_N = \sum_\sigma \exp\Big\{N\Big(\frac{J}{2}m^2 + hm\Big)\Big\} = \sum_\sigma\sum_{\bar m\in R_N}\delta_{m,\bar m}\exp\Big\{N\Big(\frac{J}{2}m^2 + hm\Big)\Big\}. \tag{3.13}$$

Thanks to the Kronecker delta symbols, we can substitute $m$ (the average of the spins within a configuration) with the parameter $\bar m$ (which is not coupled to the spin configurations) in any convenient fashion. Therefore we can use the following relation in order to linearize the quadratic term appearing in the Hamiltonian:

$$(m - \bar m)^2 = 0,$$

and once we've carried out this substitution into (3.13) we are left with a function which depends only linearly on $m$:

$$Z_N = \sum_\sigma\sum_{\bar m\in R_N}\delta_{m,\bar m}\exp\Big\{N\Big(J\big(m\bar m - \tfrac{1}{2}\bar m^2\big) + hm\Big)\Big\},$$

and bounding above the Kronecker deltas by 1 we get

$$Z_N \le \sum_\sigma\sum_{\bar m\in R_N}\exp\Big\{N\Big(J\big(m\bar m - \tfrac{1}{2}\bar m^2\big) + hm\Big)\Big\}.$$

Since both sums are taken over finitely many terms, it is possible to exchange the order of the two summation symbols, in order to carry out the sum over the spin configurations, which now factorizes, thanks to the linearity of the interaction with respect to the $m$s. This way we get:

$$Z_N \le \sum_{\bar m\in R_N} G(\bar m),$$

where

$$G(\bar m) = \exp\Big\{-N\frac{J}{2}\bar m^2\Big\}\cdot 2^N\big(\cosh(J\bar m + h)\big)^N. \tag{3.14}$$

Since the summation is taken over the range $R_N$ of cardinality $N + 1$, the total number of summands is $N + 1$. Therefore

$$Z_N \le (N + 1)\sup_{\bar m} G, \tag{3.15}$$

which leads to the following upper bound for $p_N$:

$$p_N = \frac{1}{N}\ln Z_N \le \frac{1}{N}\ln\Big((N + 1)\sup_{\bar m} G\Big) = \frac{1}{N}\ln(N + 1) + \frac{1}{N}\sup_{\bar m}\ln G, \tag{3.16}$$

where the last equality follows from monotonicity of the logarithm.

Now defining the $N$-independent function

$$p_{up}(\bar m) = \frac{1}{N}\ln G = \ln 2 - \frac{J}{2}\bar m^2 + \ln\cosh(J\bar m + h),$$

and keeping in mind that $\lim_{N\to\infty}\frac{1}{N}\ln(N + 1) = 0$, in the thermodynamic limit we get:

$$\limsup_{N\to\infty} p_N \le \sup_{\bar m} p_{up}(\bar m). \tag{3.17}$$

We can summarize the previous computation into the following:

Lemma 1.

Given a Hamiltonian as defined in (3.3), and defining the pressure per particle as $p_N = \frac{1}{N}\ln Z_N$, given parameters $J$ and $h$, the following inequality holds:

$$\limsup_{N\to\infty} p_N \le \sup_{\bar m} p_{up}$$

where

$$p_{up}(\bar m) = \ln 2 - \frac{J}{2}\bar m^2 + \ln\cosh(J\bar m + h),$$

and $\bar m \in [-1, 1]$.

We shall give two ways of deriving a lower bound for the pressure: indeed, it is important to keep in mind that having as many bounding techniques as possible can be a good way of approaching more refined models.
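As a numerical sanity check on Lemma 1, and on the first of the lower bounds (Proposition 8 below), one can compare the exact finite-size pressure with the variational function. The sketch below is illustrative only; `pressure`, `p_up` and `sandwich` are hypothetical names, $J \ge 0$ is assumed, and the supremum over $[-1, 1]$ is approximated on a uniform grid, which can only underestimate it and therefore keeps the lower bound valid.

```python
import math


def pressure(N, J, h):
    """Exact p_N = (1/N) ln Z_N, summing over the N + 1 magnetization levels."""
    logs = []
    for k in range(N + 1):
        m = (2 * k - N) / N
        log_binom = math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
        logs.append(log_binom + N * (0.5 * J * m * m + h * m))
    top = max(logs)
    return (top + math.log(sum(math.exp(x - top) for x in logs))) / N


def p_up(mbar, J, h):
    """Variational function ln 2 - (J/2) mbar^2 + ln cosh(J mbar + h)."""
    return math.log(2) - 0.5 * J * mbar * mbar + math.log(math.cosh(J * mbar + h))


def sandwich(N, J, h, grid=2001):
    """Return (lower, p_N, upper) with lower <= p_N <= upper expected."""
    levels = [(2 * k - N) / N for k in range(N + 1)]  # the set R_N
    # Finite-N upper bound (3.16), with the supremum taken over R_N.
    upper = math.log(N + 1) / N + max(p_up(m, J, h) for m in levels)
    # Lower bound: grid approximation of sup over [-1, 1].
    fine = [-1 + 2 * i / (grid - 1) for i in range(grid)]
    lower = max(p_up(m, J, h) for m in fine)
    return lower, pressure(N, J, h), upper


if __name__ == "__main__":
    for N in (10, 200, 2000):
        print(N, sandwich(N, 2.0, 0.1))
```

The gap between the two bounds is controlled by $\frac{1}{N}\ln(N+1)$, so it closes as $N$ grows, which is the mechanism behind the exact solution.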

Proposition 8.

Given a Hamiltonian as defined in (3.3) and its associated pressure per particle $p_N = \frac{1}{N}\ln Z_N$, the following inequality holds for every $J$, $h$:

$$p_N \ge \sup_{-1\le\bar m\le 1} p_{low}$$

where

$$p_{low}(\bar m) = -\frac{J}{2}\bar m^2 + \ln 2 + \ln\cosh(J\bar m + h).$$

Proof. We recall the Hamiltonian per particle written in terms of the configuration's magnetization $m(\sigma)$:

$$\frac{H(\sigma)}{N} = -\frac{J}{2}m^2 - hm.$$

Given any $\bar m \in [-1, +1]$, the following holds:

$$(m - \bar m)^2 \ge 0 \;\Rightarrow\; m^2 \ge 2m\bar m - \bar m^2,$$

so that

$$\begin{aligned}
p_N = \frac{1}{N}\ln Z_N &= \frac{1}{N}\ln\sum_\sigma\exp\Big\{N\Big(\frac{J}{2}m^2 + hm\Big)\Big\} \\
&\ge \frac{1}{N}\ln\sum_\sigma\exp\Big\{N\Big(Jm\bar m - \frac{J}{2}\bar m^2 + hm\Big)\Big\} \\
&= \frac{1}{N}\ln\Big(\exp\Big\{-N\frac{J}{2}\bar m^2\Big\}\sum_\sigma\exp\{N(J\bar m m + hm)\}\Big) \\
&= -\frac{J}{2}\bar m^2 + \frac{1}{N}\ln\Big(2^N\cosh(J\bar m + h)^N\Big) = -\frac{J}{2}\bar m^2 + \ln 2 + \ln\cosh(J\bar m + h).
\end{aligned}$$

This way we get a new lower bound which can be expressed as

$$p_N \ge \sup_{-1\le\bar m\le 1} p_{low}$$

where

$$p_{low}(\bar m) = -\frac{J}{2}\bar m^2 + \frac{1}{N}\ln\Big(2^N\cosh(J\bar m + h)^N\Big) = \ln 2 - \frac{J}{2}\bar m^2 + \ln\cosh(J\bar m + h),$$

which is the result.

The second lower bound is provided by exploiting the well-known Gibbs entropic variational principle (see [58], p. 188). In our case, instead of considering the whole space of ansatz probability distributions considered in [58], we shall restrict to a much smaller one, and use the upper bound derived in the last section in order to show that the lower bound corresponding to the restricted space is sharp in the thermodynamic limit.

The mean-field nature of our Hamiltonian allows us to restrict the variational problem to a product measure with only one degree of freedom, represented by the non-interacting Hamiltonian:

$$\tilde H = -r\sum_{i=1}^{N}\sigma_i,$$

and so, given a Hamiltonian $\tilde H$, we define the ansatz Gibbs state corresponding to it for an observable $f(\sigma)$ as:

$$\tilde\omega(f) = \frac{\sum_\sigma f(\sigma)\,e^{-\tilde H(\sigma)}}{\sum_\sigma e^{-\tilde H(\sigma)}}.$$

In order to facilitate our task, we shall express the variational principle of [58] in the following simple form:

Proposition 9.

Let a Hamiltonian $H$, and its associated partition function $Z = \sum_\sigma e^{-H}$, be given. Consider an arbitrary trial Hamiltonian $\tilde H$ and its associated partition function $\tilde Z$. The following inequality holds:

$$\ln Z \ge \ln\tilde Z - \tilde\omega(H) + \tilde\omega(\tilde H). \tag{3.18}$$

Given a Hamiltonian as defined in (3.3) and its associated pressure per particle $p_N = \frac{1}{N}\ln Z_N$, the following inequality follows from (3.18):

$$\liminf_{N\to\infty} p_N \ge \sup_{\bar m} p'_{low} \tag{3.19}$$

where

$$p'_{low}(\bar m) = \frac{J}{2}\bar m^2 + h\bar m - \frac{1 + \bar m}{2}\ln\Big(\frac{1 + \bar m}{2}\Big) - \frac{1 - \bar m}{2}\ln\Big(\frac{1 - \bar m}{2}\Big), \tag{3.20}$$

and $\bar m \in [-1, 1]$.

Proof. The inequality (3.18) follows straightforwardly from Jensen's inequality:

$$e^{\tilde\omega(-H + \tilde H)} \le \tilde\omega\big(e^{-H + \tilde H}\big). \tag{3.21}$$

We recall the Hamiltonian:

$$H(\sigma) = -\frac{J}{2N}\sum_{i,j}\sigma_i\sigma_j - h\sum_i\sigma_i, \tag{3.22}$$

so that its expectation on the trial state is

$$\tilde\omega(H) = -\frac{J}{2N}\sum_{i,j}\tilde\omega(\sigma_i\sigma_j) - h\sum_i\tilde\omega(\sigma_i)$$

and a standard computation for the moments of a non-interacting system (i.e. for a perfect gas) leads to

$$\tilde\omega(H) = -N\Big(1 - \frac{1}{N}\Big)\frac{J}{2}(\tanh r)^2 - \frac{J}{2} - Nh\tanh r. \tag{3.23}$$

Analogously, the trial Gibbs state of $\tilde H$ is:

$$\tilde\omega(\tilde H) = -Nr\tanh r,$$

and the non-interacting partition function is:

$$\tilde Z_N = \sum_\sigma e^{-\tilde H(\sigma)} = 2^N(\cosh r)^N,$$

which implies that the non-interacting pressure gives

$$\tilde p_N = \frac{1}{N}\ln\tilde Z_N = \ln 2 + \ln\cosh r.$$

So we can finally apply (3.18) in order to find a lower bound for the pressure $p_N = \frac{1}{N}\ln Z_N$:

$$p_N = \frac{1}{N}\ln Z_N \ge \frac{1}{N}\Big(\ln\tilde Z_N - \tilde\omega(H) + \tilde\omega(\tilde H)\Big) \tag{3.24}$$

which explicitly reads:

$$p_N = \frac{1}{N}\ln Z_N \ge \ln 2 + \ln\cosh r + \frac{J}{2}(\tanh r)^2 + h\tanh r - r\tanh r + \frac{J}{2N} - \frac{J}{2N}(\tanh r)^2. \tag{3.25}$$

Taking the liminf over $N$ and the supremum in $r$ we get (3.19) after performing the change of variables $\bar m = \tanh r$, which brings the right hand side to the following form:

$$p'_{low}(\bar m) = \frac{J}{2}\bar m^2 + h\bar m - \frac{1 + \bar m}{2}\ln\Big(\frac{1 + \bar m}{2}\Big) - \frac{1 - \bar m}{2}\ln\Big(\frac{1 - \bar m}{2}\Big).$$

3.4.4 Exact solution of the model

We have derived two lower bounds and one upper bound to the thermodynamic pressure, which are given by the suprema w.r.t. $\bar m$ of the following functions:

$$\begin{aligned}
p_{up}(\bar m) = p_{low}(\bar m) &= \ln 2 - \frac{J}{2}\bar m^2 + \ln\cosh(J\bar m + h), \\
p'_{low}(\bar m) &= \frac{J}{2}\bar m^2 + h\bar m - \frac{1 + \bar m}{2}\ln\Big(\frac{1 + \bar m}{2}\Big) - \frac{1 - \bar m}{2}\ln\Big(\frac{1 - \bar m}{2}\Big).
\end{aligned} \tag{3.26}$$

Since $p_{up} = p_{low}$, the supremum of this function gives the thermodynamic value of the pressure, and thus provides the exact solution to the model. However, it is important to verify that the bounds provided by all functions coincide, since for more general cases one of the bounding arguments may fail, as indeed happens in the next chapter, where a bound of type $p_{low}$ cannot be found due to lack of convexity in the Hamiltonian. Furthermore, $p'_{low}$ has a direct thermodynamic interpretation, as shall be explained in the following section.

For the standard Curie-Weiss model that we are studying here the equivalence of the two bounds can be proved by way of a peculiar property of the Legendre transformation, and we will do this in this section.

Proposition 10.

The function

$$f^*(y) = \frac{1}{J}\Big(\frac{1 + y}{2}\ln\frac{1 + y}{2} + \frac{1 - y}{2}\ln\frac{1 - y}{2} - yh\Big)$$

is the Legendre transform of

$$f(x) = \frac{1}{J}\ln 2\cosh(Jx + h).$$

Proof. The Legendre transformation is defined by

$$f^*(y) = \sup_x\big(xy - f(x)\big).$$

Since we are dealing with a convex function we can find the supremum by differentiation:

$$\frac{d}{dx}\big(xy - f(x)\big) = y - \tanh(Jx + h) = 0,$$

which implies

$$Jx = \operatorname{arctanh} y - h,$$

so that by substituting we find that the Legendre transform of $f$ is

$$\begin{aligned}
f^*(y) &= \frac{y}{J}(\operatorname{arctanh} y - h) - \frac{1}{J}\ln 2\cosh(\operatorname{arctanh} y - h + h) \\
&= \frac{y}{J}\operatorname{arctanh} y - \frac{yh}{J} - \frac{1}{J}\ln 2\cosh\operatorname{arctanh} y \\
&= \frac{y}{2J}\ln\frac{1 + y}{1 - y} - \frac{yh}{J} - \frac{1}{J}\ln\Big(\exp\Big\{\frac{1}{2}\ln\frac{1 + y}{1 - y}\Big\} + \exp\Big\{\frac{1}{2}\ln\frac{1 - y}{1 + y}\Big\}\Big) \\
&= \frac{y}{2J}\ln\frac{1 + y}{1 - y} - \frac{yh}{J} - \frac{1}{J}\ln\Big(\frac{1 + y + 1 - y}{\sqrt{1 - y^2}}\Big)
= \frac{y}{2J}\ln\frac{1 + y}{1 - y} - \frac{yh}{J} - \frac{1}{J}\ln\Big(\frac{2}{\sqrt{1 - y^2}}\Big) \\
&= \frac{1}{J}\Big(\frac{1 + y}{2}\ln(1 + y) + \frac{1 - y}{2}\ln(1 - y) - yh - \ln 2\Big) \\
&= \frac{1}{J}\Big(\frac{1 + y}{2}\ln\frac{1 + y}{2} + \frac{1 - y}{2}\ln\frac{1 - y}{2} - yh\Big),
\end{aligned}$$

which is the required result.

We can similarly verify that the Legendre transform of $g(x) = \frac{1}{2}x^2$ is given by the function $g^*(x) = \frac{1}{2}x^2$.

This way we see that we can write the bounding functions as:

$$\begin{aligned}
p_{up}(\bar m) = p_{low}(\bar m) &= J\big(f(\bar m) - g(\bar m)\big), \\
p'_{low}(\bar m) &= J\big(g^*(\bar m) - f^*(\bar m)\big),
\end{aligned} \tag{3.27}$$

and the following proposition tells us that all of the bounds that we have found coincide.

Proposition 11.

Let $f$ and $g$ be two convex functions and $f^*$ and $g^*$ be their Legendre transforms. Then the following is true:

$$\sup_x\big(f(x) - g(x)\big) = \sup_y\big(g^*(y) - f^*(y)\big).$$

Proof. For a nice proof see [22], or the appendix in [40].

The last proposition tells us that both the variational principles we have derived provide the correct value for the thermodynamic pressure, and so the results of this section can be summarised in the following

Theorem 1.

Given a Hamiltonian as defined in (3.3), and defining the pressure per particle as $p_N = \frac{1}{N}\ln Z_N$, given parameters $J$ and $h$, the thermodynamic limit

$$\lim_{N\to\infty} p_N = p$$

of the pressure exists, and can be expressed in one of the following equivalent forms:

a) $p = \sup_{\bar m} p_{up}(\bar m) = \sup_{\bar m} p_{low}(\bar m)$

b) $p = \sup_{\bar m} p'_{low}(\bar m)$

In the last section we have expressed the thermodynamic pressure of the Curie-Weiss model as the supremum of two distinct functions. Indeed, more can be said about this variational principle, since even the argument of the supremum has a very important meaning: we shall see in this section that, in case there is a unique supremum for $p_{up} = p_{low}$ or $p'_{low}$, its argument gives the thermodynamic value of the magnetization. If there exists more than one supremum, we have a phase transition, and each argument gives a pure state for the magnetization.

First, we point out the straightforward fact that stationary points of both $p_{up} = p_{low}$ and $p'_{low}$ satisfy the condition:

$$\bar m^* = \tanh(J\bar m^* + h), \tag{3.28}$$

which can be found in the literature as consistency equation, mean field equation, state equation, secularity equation, and other names, depending on the context.

This equation is indeed important: since the bounding functions are smooth, and since it can be easily seen by checking derivatives that none of them admits a supremum at the boundary of $[-1, 1]$, every supremum is attained at a stationary point and therefore satisfies (3.28). That (3.28) has a solution in $[-1, 1]$ can also be seen as a consequence of the existence results of Section 3.2.
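The consistency equation (3.28) and the equivalence of the two variational principles in Theorem 1 lend themselves to a short numerical illustration. The following is a sketch under stated assumptions rather than a prescribed algorithm: it assumes $J \ge 0$, uses plain fixed-point iteration of $m \mapsto \tanh(Jm + h)$ started from $\pm 1$ (which reaches the largest and the smallest solution respectively), and keeps the solution with the larger value of $p_{up}$. All function names are hypothetical.

```python
import math


def p_up(m, J, h):
    """ln 2 - (J/2) m^2 + ln cosh(J m + h), the function p_up = p_low."""
    return math.log(2) - 0.5 * J * m * m + math.log(math.cosh(J * m + h))


def xlogx(x):
    """x ln x, extended by continuity to x = 0."""
    return x * math.log(x) if x > 0 else 0.0


def p_low_prime(m, J, h):
    """(J/2) m^2 + h m plus the entropy term, the function p'_low of (3.20)."""
    return 0.5 * J * m * m + h * m - xlogx((1 + m) / 2) - xlogx((1 - m) / 2)


def fixed_point(J, h, start, tol=1e-14, max_iter=100000):
    """Iterate m -> tanh(J m + h) from `start` until the update is below tol."""
    m = start
    for _ in range(max_iter):
        new = math.tanh(J * m + h)
        if abs(new - m) < tol:
            return new
        m = new
    return m


def equilibrium_magnetization(J, h):
    """Solution of (3.28) maximizing p_up; on ties (h = 0) the positive one is kept."""
    candidates = [fixed_point(J, h, 1.0), fixed_point(J, h, -1.0)]
    return max(candidates, key=lambda m: p_up(m, J, h))


if __name__ == "__main__":
    for J, h in [(0.5, 0.0), (1.5, 0.0), (1.5, 0.2)]:
        m = equilibrium_magnetization(J, h)
        print(J, h, m, p_up(m, J, h), p_low_prime(m, J, h))
```

At a solution of (3.28) the two printed pressures are expected to agree, since both functions take the same value at their common stationary points.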

Proposition 12.

Let $J$ and $h$ be given so that $p_{up} = p_{low}$ has a unique supremum, which is attained at $\bar m^*$. Then

$$\bar m^* = \lim_{N\to\infty}\omega_N(m) = \lim_{N\to\infty}\omega_N(\sigma_i).$$

Proof. The following holds at finite $N$, by definition of the pressure $p_N(J, h)$:

$$\frac{\partial p_N}{\partial h} = \omega_N(m).$$

We have proved that $\{p_N\}$ is a convergent sequence of functions which are convex (for a proof of the convexity of the pressure see [32], where convexity is proved for the free energy in the Ising model, which is essentially the same as the pressure multiplied by $-1$), so that

$$\lim_{N\to\infty}\omega_N(m) = \lim_{N\to\infty}\frac{\partial p_N}{\partial h} = \frac{\partial\,\sup_{\bar m} p_{low}}{\partial h}$$

whenever the last derivative exists (for a proof that the limit of the derivatives coincides with the derivative of the limit in this case see [22], p. 114).

Therefore if we write $\lim_{N\to\infty} p_N = p(J, h, \bar m^*(J, h))$, we can write the following:

$$\frac{\partial\,\sup_{\bar m} p_{low}}{\partial h} = \frac{\partial p(J, h, \bar m^*(J, h))}{\partial h} = -J\frac{\partial\bar m^*}{\partial h}\bar m^* + \tanh(J\bar m^* + h) + J\frac{\partial\bar m^*}{\partial h}\tanh(J\bar m^* + h),$$

and by substituting (3.28) we get

$$\frac{\partial\,\sup_{\bar m} p_{low}}{\partial h} = \bar m^*,$$

which is our result.

A similar proposition can be proved analogously for $p'_{low}$. Let us now write

$$\omega(m) = \lim_{N\to\infty}\omega_N(m) \quad\text{and}\quad \omega(\sigma_i) = \lim_{N\to\infty}\omega_N(\sigma_i).$$

As a consequence of Proposition 12 we have that we can write

$$p'_{low}(\bar m^*) = S - U$$

where

$$S = -\frac{1 + \omega(\sigma_i)}{2}\ln\Big(\frac{1 + \omega(\sigma_i)}{2}\Big) - \frac{1 - \omega(\sigma_i)}{2}\ln\Big(\frac{1 - \omega(\sigma_i)}{2}\Big)$$

is the thermodynamic entropy and

$$U = -\frac{J}{2}\omega(m)^2 - h\,\omega(m)$$

is the thermodynamic internal energy, as can be derived directly from the Gibbs distribution.

3.6 A heuristic approach

We shall now describe a heuristic procedure to obtain the consistency equation (3.28).
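First of all, we make the following observation about the Gibbs average $\omega_N(\sigma_N)$ of the magnetization:
$$\omega_N(m) = \omega_N(\sigma_N) = \frac{1}{Z_N} \sum_{\sigma \in \{-1,1\}^N} \sigma_N\, e^{-H(\sigma)}.$$
We now define the following Hamiltonian $\tilde H_N$:
$$\tilde H_N = -\frac{J}{2(N+1)} \sum_{i,j=1}^{N} \sigma_i \sigma_j - h \sum_{i=1}^{N} \sigma_i,$$
and its associated partition function
$$\tilde Z_N = \sum_{\sigma \in \{-1,1\}^N} e^{-\tilde H_N},$$
which allows us to write:
$$\begin{aligned}
\omega_N(\sigma_N) &= \frac{\sum_{\sigma \in \{-1,1\}^N} \sigma_N\, e^{\frac{J}{N} \sum_{i=1}^{N} \sigma_i \sigma_N + h \sigma_N}\, e^{-\tilde H_{N-1}(\sigma)}}{\sum_{\sigma \in \{-1,1\}^N} e^{\frac{J}{N} \sum_{i=1}^{N} \sigma_i \sigma_N + h \sigma_N}\, e^{-\tilde H_{N-1}(\sigma)}} \\
&= \frac{\frac{1}{\tilde Z_N} \sum_{\sigma \in \{-1,1\}^N} \sigma_N\, e^{\frac{J}{N} \sum_{i=1}^{N} \sigma_i \sigma_N + h \sigma_N}\, e^{-\tilde H_{N-1}(\sigma)}}{\frac{1}{\tilde Z_N} \sum_{\sigma \in \{-1,1\}^N} e^{\frac{J}{N} \sum_{i=1}^{N} \sigma_i \sigma_N + h \sigma_N}\, e^{-\tilde H_{N-1}(\sigma)}} \\
&= \frac{\frac{1}{\tilde Z_N} \sum_{\sigma \in \{-1,1\}^{N-1}} \sinh\big(\frac{J}{N} \sum_{i=1}^{N-1} \sigma_i + h + \frac{J}{N}\big)\, e^{-\tilde H_{N-1}(\sigma)}}{\frac{1}{\tilde Z_N} \sum_{\sigma \in \{-1,1\}^{N-1}} \cosh\big(\frac{J}{N} \sum_{i=1}^{N-1} \sigma_i + h + \frac{J}{N}\big)\, e^{-\tilde H_{N-1}(\sigma)}} \\
&= \frac{\tilde\omega_N\big(\sinh\big(\frac{J}{N} \sum_{i=1}^{N-1} \sigma_i + h + \frac{J}{N}\big)\big)}{\tilde\omega_N\big(\cosh\big(\frac{J}{N} \sum_{i=1}^{N-1} \sigma_i + h + \frac{J}{N}\big)\big)}.
\end{aligned}$$
Now, if we assume that the last line implies
$$\lim_{N\to\infty} \omega_N(\sigma_i) = \lim_{N\to\infty} \frac{\omega_N(\sinh(J m + h))}{\omega_N(\cosh(J m + h))} \qquad (3.29)$$
we can use the factorization properties of the model in order to derive the following.

Let us consider $\omega_N(\sinh(J m + h))$, and write it by making the power series at the argument explicit:
$$\omega_N(\sinh(J m + h)) = \omega_N\Big( \sum_{k=0}^{\infty} \frac{(J m + h)^{2k+1}}{(2k+1)!} \Big).$$
Now, if we consider only a partial sum up to $n$ at the argument of the Gibbs state, and take the thermodynamic limit, the self-averaging property of the magnetization tells us that the following holds a.e. in $J$ and $h$:
$$\begin{aligned}
\lim_{N\to\infty} \omega_N\Big( \sum_{k=0}^{n} \frac{(J m + h)^{2k+1}}{(2k+1)!} \Big) &= \lim_{N\to\infty} \omega_N\Big( \sum_{k=0}^{n} \frac{1}{(2k+1)!} \sum_{l=0}^{2k+1} \binom{2k+1}{l} (J m)^l h^{2k+1-l} \Big) \\
&= \lim_{N\to\infty} \sum_{k=0}^{n} \frac{1}{(2k+1)!} \sum_{l=0}^{2k+1} \binom{2k+1}{l} J^l\, \omega_N(m)^l\, h^{2k+1-l} \\
&= \lim_{N\to\infty} \sum_{k=0}^{n} \frac{(J \omega_N(m) + h)^{2k+1}}{(2k+1)!}. \qquad (3.30)
\end{aligned}$$
Now, disregarding convergence problems, the limit of (3.30) together with the assumption (3.29) give the following equation:
$$\bar m^* = \tanh(J \bar m^* + h),$$
where $\bar m^*$ is the thermodynamic magnetization.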
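The statement that the Gibbs average of the magnetization approaches a solution of the consistency equation can be checked numerically at finite $N$. The sketch below is only an illustration under an assumption: it takes the Curie-Weiss Hamiltonian in the form $H(\sigma) = -N\big(\frac{J}{2} m^2 + h m\big)$, which is consistent with the derivative computed in the proof of Proposition 12 but is not restated in this section. Since this Hamiltonian depends on $\sigma$ only through $m$, the sum over configurations reduces to a sum over the $N+1$ magnetization levels weighted by binomial coefficients. Function names and parameter values are hypothetical.

```python
import math


def gibbs_magnetization(N, J, h):
    """Exact Gibbs average of m for the assumed Hamiltonian H = -N*(J*m^2/2 + h*m)."""
    log_weights = []
    levels = []
    for k in range(N + 1):  # k = number of spins equal to +1
        m = (2 * k - N) / N
        log_binom = math.lgamma(N + 1) - math.lgamma(k + 1) - math.lgamma(N - k + 1)
        log_weights.append(log_binom + N * (0.5 * J * m * m + h * m))
        levels.append(m)
    top = max(log_weights)  # subtract the maximum to avoid overflow
    weights = [math.exp(lw - top) for lw in log_weights]
    Z = sum(weights)
    return sum(m * w for m, w in zip(levels, weights)) / Z


def fixed_point(J, h, m0=0.5, tol=1e-12, max_iter=100_000):
    """Solve m = tanh(J*m + h) by iteration from m0."""
    m = m0
    for _ in range(max_iter):
        m_next = math.tanh(J * m + h)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    return m


if __name__ == "__main__":
    J, h = 1.5, 0.1  # illustrative values with a positive field
    for N in (50, 500, 5000):
        print(N, gibbs_magnetization(N, J, h), fixed_point(J, h))
```

With a non-zero field the finite-$N$ average settles on the solution aligned with $h$, which is the behaviour described by Proposition 12.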
This way we have derived heuristically the consistency equation describing the most important quantity for our model just by making use of the model's factorization properties.

It is important, however, to stress that the procedure we proposed in this section is not mathematically rigorous: assumption (3.29), though sensible, hasn't been derived rigorously, and the possible convergence problems have not been considered. Nevertheless, since the procedure has provided the right answer, which we have derived rigorously throughout the chapter, and since it consists of simple considerations, it can be seen as a way of approaching models defined on random networks rather than on the complete graph, which are not as well understood as the one treated in this chapter.

Chapter 4

The Curie-Weiss model for many populations

In this chapter we consider the problem of characterizing the equilibrium statistical mechanics of a mean field interacting system partitioned into $p$ sets of spins. The relevance of such a problem to social modelling is that such a partition can be made to correspond to the partition into classes of people sharing the same socio-economic attributes, as described in chapter 2.

Our results can be summarised as follows. After introducing the model we show in section 3 that it is well posed by showing that its thermodynamic limit exists. The result is non-trivial, since sub-additivity is not met at finite volume. In section 4 we show that the system fulfils a factorization property for the correlation functions which reduces the equilibrium state to only $p$ degrees of freedom. The method is conceptually similar to the one developed by Guerra in [35] to derive identities for the overlap distributions in the Sherrington and Kirkpatrick model.

We also derive the pressure of the model by rigorous methods developed in the recent study of mean field spin glasses (see [37] for a review). It is interesting to notice that, though very simple, our model encompasses a range of regimes that do not admit solution by the elegant interpolation method used in the celebrated existence result of the Sherrington and Kirkpatrick model [36]. This is due to the lack of positivity of the quadratic form describing the considered interaction. Nevertheless we are able to solve the model exactly in section 4.4, using the lower bound provided by the Gibbs variational principle, and thanks to a further bound given by a partitioning of the configuration space, itself originally devised in the study of spin glasses (see [37, 19, 38]).

As in the classical Curie-Weiss model, the exact solution is provided in an implicit form; for our system, however, we find a system of equations of state, which are coupled as well as transcendental, and this makes the full characterization of all the possible regimes highly non-trivial.
A simple analytic result about the number of solutions for the two-population case is proved in section 4.5.

We can generalize the Curie-Weiss model to $p$ populations, allowing $r$-body interactions with $r = 1, \dots, p$. This gives rise to the following Hamiltonian:
$$H_N = -N \sum_{r=1}^{p} \sum_{i_1,\dots,i_r=1}^{p} J_{i_1,\dots,i_r} \prod_{k=1}^{r} m_{i_k}, \qquad (4.1)$$
or, equivalently, to the following utility function for individual $i$:
$$U_i = \sum_{r=1}^{p} \sum_{i_1,\dots,i_{r-1}=1}^{p} J_{i_1,\dots,i_{r-1},i} \prod_{k=1}^{r-1} m_{i_k}.$$
Here $J_{i_1,\dots,i_r}$ gives the interaction coefficients corresponding to the $r$-body interaction among individuals coming from populations $i_1, \dots, i_r$, respectively. We can also consider the external fields to be already included in this form of the model, just by setting $J_i = h_i$. So we have defined interactions by using a tensor $J_{i_1,\dots,i_r}$ of rank $r$ for each of the $r$-body interactions.

We shall prove that our model admits a thermodynamic limit by exploiting an existence theorem provided for mean field models in [8]: the result states that the existence of the pressure per particle for large volumes is guaranteed by a monotonicity condition on the equilibrium state of the Hamiltonian. Such a result proves to be quite useful when the condition of convexity introduced by the interpolation method [36, 37] doesn't apply due to lack of positivity of the quadratic form representing the interactions. We therefore prove the existence of the thermodynamic limit independently of an exact solution. Such a line of enquiry is pursued in view of further refinements of our model, which may involve random interactions of spin glass or random graph type, and which might or might not come with an exact expression for the pressure.

Proposition 13.

There exists a function $p$ of all the parameters $J_{i_1,\dots,i_r}$ such that
$$\lim_{N\to\infty} p_N = p.$$

The previous proposition is proved with a series of lemmas. Theorem 1 in [8] states that given a Hamiltonian $H_N$ and its associated equilibrium state $\omega_N$ the model admits a thermodynamic limit whenever the physical condition
$$\omega_N(H_N) \geq \omega_N(H_{N_1}) + \omega_N(H_{N_2}), \qquad N_1 + N_2 = N, \qquad (4.2)$$
is verified.

We proceed by first verifying this condition for an alternative Hamiltonian $\tilde H_N$, and then showing that its pressure $\tilde p_N$ tends to our original pressure $p_N$ as $N$ increases. We choose $\tilde H_N$ in such a way that the condition (4.2) is verified as an equality.

Now, define the alternative Hamiltonian $\tilde H_N$ as follows:
$$\tilde H_N = -C N \prod_{l=1}^{p} \frac{(N_{i_l} - k_l)!}{N_{i_l}!} \sum_{\substack{j_k = N_{i_k - 1} + 1, \dots, N_{i_k} \\ j_k \neq j_h \text{ for } k \neq h}} \sigma_{j_1} \cdots \sigma_{j_r}$$
where $C$ is a real number.

Though the notation is cumbersome at this point, the new Hamiltonian simply considers products of $r$ distinct spins, $k_i$ of which are taken from population $i$ (i.e. $\sum_{i=1}^{p} k_i = r$), and so the combinatorial coefficient is just dividing the sum by the correct number of terms contained in the sum itself.

Lemma 1.

There exists a function $\tilde p$ such that
$$\lim_{N\to\infty} \tilde p_N = \tilde p.$$

Proof.

By linearity we have that
$$\omega_N(\tilde H_N) = -C N \prod_{l=1}^{p} \frac{(N_{i_l} - k_l)!}{N_{i_l}!} \sum_{\substack{j_k = N_{i_k - 1} + 1, \dots, N_{i_k} \\ j_k \neq j_h \text{ for } k \neq h}} \omega_N(\sigma_{j_1} \cdots \sigma_{j_r}) = -C N\, \omega_N(\sigma_{j_1} \cdots \sigma_{j_r}), \qquad (4.3)$$
where, with a little abuse of notation, we let $\sigma_{j_1}, \dots, \sigma_{j_r}$ after the last equality be distinct spins taken from their own respective populations. The last equality hence follows from the invariance of $\tilde H_N$ with respect to permutations of spins belonging to the same population.

Equation (4.3) implies trivially
$$\omega_N(\tilde H_N - \tilde H_{N_1} - \tilde H_{N_2}) = 0$$
for $N_1 + N_2 = N$, which verifies (4.2) as an equality.

The following two Lemmas show that the difference between $H_N$ and $\tilde H_N$ is thermodynamically negligible and as a consequence their pressures coincide in the thermodynamic limit.

Though the notation is quite tedious, the proof is in no way different from the one described in [8]. We chose to keep full generality during this existence proof in order to show that the mean-field setting allows one to consider a whole range of possibilities for interaction, which might turn out useful for the modelling effort.

Lemma 2.
$$H_N = \tilde H_N + O(1) \qquad (4.4)$$
i.e.
$$\lim_{N\to\infty} \frac{H_N}{N} = \lim_{N\to\infty} \frac{\tilde H_N}{N}.$$

Proof.

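We begin the proof by rephrasing the Hamiltonian in terms of the spins, as follows:
$$\begin{aligned}
H_N &= -N \sum_{r=1}^{p} \sum_{i_1,\dots,i_r=1}^{p} \Big\{ J_{i_1,\dots,i_r} \prod_{k=1}^{r} \frac{N_{i_k}}{N_{i_k}}\, m_{i_k} \Big\} \\
&= -\sum_{r=1}^{p} \Big\{ \sum_{i_1,\dots,i_r=1}^{p} \frac{N\, N^r}{N^r} \prod_{k=1}^{r} \frac{1}{N_{i_k}}\, J_{i_1,\dots,i_r} \prod_{k=1}^{r} N_{i_k} m_{i_k} \Big\} \\
&= -\sum_{r=1}^{p} \sum_{i_1,\dots,i_r=1}^{p} \Big\{ \frac{1}{N^{r-1}} \prod_{k=1}^{r} \frac{1}{\alpha_{i_k}}\, J_{i_1,\dots,i_r} \sum_{j_k = N_{i_k - 1} + 1, \dots, N_{i_k}} \sigma_{j_1} \cdots \sigma_{j_r} \Big\}
\end{aligned}$$
where
$$N = \sum_{i=1}^{p} N_i, \qquad \alpha_i = \frac{N_i}{N}, \qquad N_0 = 0.$$
We only need to give details of the proof in the case where only one of the coefficients $J_{i_1,\dots,i_r} \neq 0$. The general case follows by summing up all the terms corresponding to non-zero interaction coefficients and noticing that, since this sum has only finitely many terms, the result still holds.

So we consider the following Hamiltonian
$$H_N = -N J_{i_1,\dots,i_r} \prod_{k=1}^{r} m_{i_k} = -\frac{1}{N^{r-1}} \prod_{k=1}^{r} \frac{1}{\alpha_{i_k}}\, J_{i_1,\dots,i_r} \sum_{j_k = N_{i_k - 1} + 1, \dots, N_{i_k}} \sigma_{j_1} \cdots \sigma_{j_r},$$
and we can lighten our notation by setting
$$C = -\prod_{k=1}^{r} \frac{1}{\alpha_{i_k}}\, J_{i_1,\dots,i_r}, \qquad H_N = \frac{C}{N^{r-1}} \sum_{j_k = N_{i_k - 1} + 1, \dots, N_{i_k}} \sigma_{j_1} \cdots \sigma_{j_r}.$$
Now, following [8] we divide the sum in two parts, as follows:
$$H_N = \frac{C}{N^{r-1}} \sum_{\substack{j_k = N_{i_k - 1} + 1, \dots, N_{i_k} \\ j_k \neq j_l \text{ for } k \neq l}} \sigma_{j_1} \cdots \sigma_{j_r} + \frac{C}{N^{r-1}} \sum\nolimits^{*} \sigma_{j_1} \cdots \sigma_{j_r}.$$
The first part is a sum only over products of distinct spins, whereas $\sum^{*}$ is a sum of all products where at least two spins are equal. It is straightforward to show that
$$\frac{C}{N^{r-1}} \sum\nolimits^{*} \sigma_{j_1} \cdots \sigma_{j_r} = O(1),$$
so that we can rewrite $H_N$ as follows:
$$H_N = \frac{C}{N^{r-1}} \sum_{\substack{j_k = N_{i_k - 1} + 1, \dots, N_{i_k} \\ j_k \neq j_l \text{ for } k \neq l}} \sigma_{j_1} \cdots \sigma_{j_r} + O(1).$$
A straightforward calculation comparing $H_N$ and $\tilde H_N$ can now check that
$$H_N = \tilde H_N + O(1),$$
which is our result.

Lemma 3.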

Say $p_N = \frac{1}{N} \ln Z_N$, and say $h_N(\sigma) = \frac{H_N(\sigma)}{N}$. Define $\tilde Z$, $\tilde p_N$ and $\tilde h_N$ in an analogous way. Define
$$k_N = \| h_N - \tilde h_N \| = \sup_{\sigma \in \{-1,+1\}^N} \{ | h_N(\sigma) - \tilde h_N(\sigma) | \} < \infty. \qquad (4.5)$$
Then
$$| p_N - \tilde p_N | \leq \| h_N - \tilde h_N \|.$$

Proof.
$$\begin{aligned}
p_N - \tilde p_N &= \frac{1}{N} \ln Z_N - \frac{1}{N} \ln \tilde Z_N = \frac{1}{N} \ln \frac{Z_N}{\tilde Z_N} = \frac{1}{N} \ln \frac{\sum_\sigma e^{-H_N(\sigma)}}{\sum_\sigma e^{-\tilde H_N(\sigma)}} \\
&\leq \frac{1}{N} \ln \frac{\sum_\sigma e^{-H_N(\sigma)}}{\sum_\sigma e^{-N (h_N(\sigma) + k_N)}} = \frac{1}{N} \ln \frac{\sum_\sigma e^{-H_N(\sigma)}}{e^{-N k_N} \sum_\sigma e^{-N h_N(\sigma)}} = \frac{1}{N} \ln e^{N k_N} = k_N = \| h_N - \tilde h_N \|
\end{aligned}$$
where the inequality follows from the definition of $k_N$ in (4.5) and from monotonicity of the exponential and logarithmic functions. The inequality for $\tilde p_N - p_N$ is obtained in a similar fashion.

We are now ready to prove the main result for this section:

Proof of Proposition 13:

The existence of the thermodynamic limit follows from our Lemmas. Indeed, since by Lemma 1 the limit for $\tilde p_N$ exists, Lemma 3 and Lemma 2 tell us that
$$\lim_{N\to\infty} | p_N - \tilde p_N | \leq \lim_{N\to\infty} \| h_N - \tilde h_N \| = 0,$$
implying our result. □

From now on we shall restrict the model to include pair interactions only. Therefore, we have a Hamiltonian of the following kind:
$$H_N = -\frac{N}{2} \sum_{i,j=1}^{p} J_{i,j}\, m_i m_j - N \sum_{i=1}^{p} h_i m_i. \qquad (4.6)$$

In this section we shall prove that the correlation functions of our model factorize completely in the thermodynamic limit, for almost every choice of parameters. This implies that all the thermodynamic properties of the system can be described by the magnetizations $m_i$ of the $p$ populations defined in Section 4.1. Indeed, the exact solution of the model, to be derived in the next section, comes as $p$ coupled equations of state for the $m_i$.

Proposition 14.
$$\lim_{N\to\infty} \big( \omega_N(\sigma_i \sigma_j) - \omega_N(\sigma_i)\, \omega_N(\sigma_j) \big) = 0$$
for almost every choice of parameters, where $\sigma_i$, $\sigma_j$ are any two distinct spins in the system.

Proof. We recall the definition of the Hamiltonian
$$H_N = -\frac{N}{2} \sum_{i,j=1}^{p} J_{i,j}\, m_i m_j - N \sum_{i=1}^{p} h_i m_i,$$
and of the pressure per particle
$$p_N = \frac{1}{N} \ln \sum_\sigma e^{-H_N(\sigma)}.$$
By taking first and second partial derivatives of $p_N$ with respect to $h_i$ we get
$$\frac{\partial p_N}{\partial h_i} = \frac{1}{N} \sum_\sigma N m_i(\sigma) \frac{e^{-H(\sigma)}}{Z_N} = \omega_N(m_i), \qquad \frac{\partial^2 p_N}{\partial h_i^2} = N \big( \omega_N(m_i^2) - \omega_N(m_i)^2 \big).$$
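The two derivative identities used in the proof of Proposition 14 can be checked by brute force on a very small system. The following sketch is an illustration only: it enumerates all $2^N$ configurations for the pair-interaction Hamiltonian (4.6), so it is practical only for a handful of spins, and the function names, population sizes and parameter values are hypothetical.

```python
import itertools
import math


def magnetizations(sigma, sizes):
    """Average spin of each population; sizes[g] is the number of spins in population g."""
    out, start = [], 0
    for n in sizes:
        out.append(sum(sigma[start:start + n]) / n)
        start += n
    return out


def energy(sigma, sizes, J, h):
    """Pair-interaction Hamiltonian (4.6): -(N/2) sum J_ij m_i m_j - N sum h_i m_i."""
    N = sum(sizes)
    m = magnetizations(sigma, sizes)
    p = len(sizes)
    pair = sum(J[i][j] * m[i] * m[j] for i in range(p) for j in range(p))
    field = sum(h[i] * m[i] for i in range(p))
    return -0.5 * N * pair - N * field


def pressure_and_moments(sizes, J, h, i=0):
    """Return (p_N, omega_N(m_i), omega_N(m_i^2)) by summing over all configurations."""
    N = sum(sizes)
    Z = first = second = 0.0
    for sigma in itertools.product((-1, 1), repeat=N):
        w = math.exp(-energy(sigma, sizes, J, h))
        mi = magnetizations(sigma, sizes)[i]
        Z += w
        first += mi * w
        second += mi * mi * w
    return math.log(Z) / N, first / Z, second / Z


if __name__ == "__main__":
    sizes, J, h = (3, 3), [[0.8, 0.3], [0.3, 0.5]], [0.1, -0.2]
    eps = 1e-4
    p0, m1, m1_sq = pressure_and_moments(sizes, J, h)
    p_plus, _, _ = pressure_and_moments(sizes, J, [h[0] + eps, h[1]])
    p_minus, _, _ = pressure_and_moments(sizes, J, [h[0] - eps, h[1]])
    print("dp/dh_1 (finite difference):", (p_plus - p_minus) / (2 * eps), " omega(m_1):", m1)
    print("d2p/dh_1^2 (finite difference):", (p_plus - 2 * p0 + p_minus) / eps**2,
          " N*Var(m_1):", sum(sizes) * (m1_sq - m1 * m1))
```

The finite-difference derivatives agree with the Gibbs averages up to discretization error, which is the content of the two identities above.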
By using these relations we can bound above the integral with respect to $h_i$ of the fluctuations of $m_i$ in the Gibbs state:
$$\begin{aligned}
\Big| \int_{h_i^{(1)}}^{h_i^{(2)}} \big( \omega_N(m_i^2) - \omega_N(m_i)^2 \big)\, dh_i \Big| &= \frac{1}{N} \Big| \int_{h_i^{(1)}}^{h_i^{(2)}} \frac{\partial^2 p_N}{\partial h_i^2}\, dh_i \Big| = \frac{1}{N} \Big| \frac{\partial p_N}{\partial h_i} \Big|_{h_i^{(1)}}^{h_i^{(2)}} \Big| \\
&\leq \frac{1}{N} \Big( \big| \omega_N(m_i)|_{h_i^{(2)}} \big| + \big| \omega_N(m_i)|_{h_i^{(1)}} \big| \Big) = O\Big( \frac{1}{N} \Big). \qquad (4.7)
\end{aligned}$$
On the other hand we have that
$$\omega_N(m_i) = \frac{\partial p_N}{\partial h_i}, \qquad \text{and} \qquad \omega_N(m_i^2) = 2 \frac{\partial p_N}{\partial J_{i,i}},$$
so, by convexity of the thermodynamic pressure $p = \lim_{N\to\infty} p_N$, both quantities $\frac{\partial p_N}{\partial h_i}$ and $\frac{\partial p_N}{\partial J_{i,i}}$ have well defined thermodynamic limits almost everywhere. This together with (4.7) implies that
$$\lim_{N\to\infty} \big( \omega_N(m_i^2) - \omega_N(m_i)^2 \big) = 0 \quad \text{a.e. in } h_i, J_{i,i}. \qquad (4.8)$$

In order to prove our statement we shall write the magnetization $m_i$ in terms of spins belonging to the $i$-th population, and then use the permutation invariance of the Gibbs measure:
$$\begin{aligned}
\omega_N(m_i) &= \omega_N\Big( \frac{1}{N_i - N_{i-1}} \sum_{j = N_{i-1}+1}^{N_i} \sigma_j \Big) = \omega_N(\sigma_1), \\
\omega_N(m_i^2) &= \omega_N\Big( \frac{1}{(N_i - N_{i-1})^2} \sum_{j,l = N_{i-1}+1}^{N_i} \sigma_j \sigma_l \Big) \\
&= \omega_N\Big( \frac{1}{(N_i - N_{i-1})^2} \sum_{j \neq l = N_{i-1}+1}^{N_i} \sigma_j \sigma_l \Big) + \omega_N\Big( \frac{1}{(N_i - N_{i-1})^2} \sum_{j = l = N_{i-1}+1}^{N_i} \sigma_j \sigma_l \Big) \\
&= \frac{N_i - N_{i-1} - 1}{N_i - N_{i-1}}\, \omega_N(\sigma_1 \sigma_2) + \frac{1}{N_i - N_{i-1}}. \qquad (4.9)
\end{aligned}$$
We have that (4.9) and (4.8) imply
$$\lim_{N\to\infty} \big( \omega_N(\sigma_i \sigma_j) - \omega_N(\sigma_i)\, \omega_N(\sigma_j) \big) = 0, \qquad (4.10)$$
which verifies our statement for all couples of spins $i \neq j$ belonging to the same population.

Furthermore, by defining $\mathrm{Var}_N(m_i) = \omega_N(m_i^2) - \omega_N(m_i)^2$ for all populations $i$, we exploit (4.8), and use the Cauchy-Schwarz inequality to get
$$| \omega_N(m_i m_j) - \omega_N(m_i)\, \omega_N(m_j) | \leq \sqrt{\mathrm{Var}_N(m_i)\, \mathrm{Var}_N(m_j)} \xrightarrow[N\to\infty]{} 0 \quad \text{a.e. in } J_{i,i}, J_{j,j}, h_i, h_j. \qquad (4.11)$$
By using (4.9) and (4.11) we can therefore verify statements which are analogous to (4.10), but which concern $\omega_N(\sigma_i \sigma_j)$ where $\sigma_i$ and $\sigma_j$ are spins belonging to different subsets. We have thus proved our claim for any couple of spins in the global system.

We shall derive upper and lower bounds for the thermodynamic limit of the pressure. The lower bound is obtained through the standard entropic variational principle, while the upper bound is derived by a decoupling strategy.

4.4.1 Upper bound

In order to find an upper bound for the pressure we shall divide the configuration space into a partition of microstates of equal magnetization, following [19, 37, 38]. Since each population $g$ consists of $N_g$ spins, its magnetization can take exactly $N_g + 1$ values, which are the elements of the set
$$R_{N_g} = \Big\{ -1,\ -1 + \frac{2}{N_g},\ \dots,\ 1 - \frac{2}{N_g},\ 1 \Big\}.$$
Clearly for every $m_g(\sigma)$ we have that
$$\sum_{\bar m_g \in R_{N_g}} \delta_{m_g, \bar m_g} = 1,$$
where $\delta_{x,y}$ is a Kronecker delta. This allows us to rewrite the partition function as follows:
$$\begin{aligned}
Z_N &= \sum_\sigma \exp\Big\{ \frac{N}{2} \sum_{i,j=1}^{p} J_{i,j}\, m_i m_j + N \sum_{i=1}^{p} h_i m_i \Big\} \\
&= \sum_\sigma \sum_{\bar m_g \in R_{N_g}\, \forall g}\ \prod_{g=1}^{p} \delta_{m_g, \bar m_g} \exp\Big\{ \frac{N}{2} \sum_{i,j=1}^{p} J_{i,j}\, m_i m_j + N \sum_{i=1}^{p} h_i m_i \Big\}. \qquad (4.12)
\end{aligned}$$
Thanks to the Kronecker delta symbols, we can substitute $m_i$ (the average of the spins within a configuration) with the parameter $\bar m_i$ (which is not coupled to the spin configurations) in any convenient fashion.

Therefore we can use the following relations in order to linearize all quadratic terms appearing in the Hamiltonian:
$$(m_i - \bar m_i)^2 = 0 \quad \forall i, \qquad (m_i - \bar m_i)(m_j - \bar m_j) = 0 \quad \forall i \neq j.$$
Once we've carried out these substitutions into (4.12) we are left with a function which depends only linearly on the $m_i$:
$$\begin{aligned}
Z_N &= \sum_\sigma \sum_{\bar m_g \in R_{N_g}\, \forall g}\ \prod_{g=1}^{p} \delta_{m_g, \bar m_g} \exp\Big\{ \frac{N}{2} \sum_{i,j=1}^{p} J_{i,j}\, m_i m_j + N \sum_{i=1}^{p} h_i m_i \Big\} \\
&= \sum_\sigma \sum_{\bar m_g \in R_{N_g}\, \forall g}\ \prod_{g=1}^{p} \delta_{m_g, \bar m_g} \exp\Big\{ \frac{N}{2} \sum_{i,j=1}^{p} J_{i,j} (m_i \bar m_j + \bar m_i m_j - \bar m_i \bar m_j) + N \sum_{i=1}^{p} h_i m_i \Big\} \\
&= \sum_\sigma \sum_{\bar m_g \in R_{N_g}\, \forall g}\ \prod_{g=1}^{p} \delta_{m_g, \bar m_g} \exp\Big\{ -\frac{N}{2} \sum_{i,j=1}^{p} J_{i,j}\, \bar m_i \bar m_j + \frac{N}{2} \sum_{i,j=1}^{p} J_{i,j} (m_i \bar m_j + \bar m_i m_j) + N \sum_{i=1}^{p} h_i m_i \Big\}
\end{aligned}$$
and bounding above the Kronecker deltas by 1 we get
$$Z_N \leq \sum_\sigma \sum_{\bar m_g \in R_{N_g}\, \forall g} \exp\Big\{ -\frac{N}{2} \sum_{i,j=1}^{p} J_{i,j}\, \bar m_i \bar m_j + \frac{N}{2} \sum_{i,j=1}^{p} J_{i,j} (m_i \bar m_j + \bar m_i m_j) + N \sum_{i=1}^{p} h_i m_i \Big\}. \qquad (4.13)$$
As observed many times by Guerra [37], since both sums are taken over finitely many terms, it is possible to exchange the order of the two summation symbols, in order to carry out the sum over the spin configurations, which now factorizes, thanks to the linearity of the interaction with respect to the $m_g$. This way we get:
$$Z_N \leq \sum_{\bar m_g \in R_{N_g}\, \forall g} G(\bar m_1, \dots, \bar m_p),$$
where
$$G = \exp\Big\{ -\frac{N}{2} \sum_{i,j=1}^{p} J_{i,j}\, \bar m_i \bar m_j \Big\} \cdot \prod_{j=1}^{p} 2^{N_j} \Big( \cosh\Big( \sum_{i=1}^{p} \frac{J_{i,j} + J_{j,i}}{2 \alpha_j}\, \bar m_i + \frac{h_j}{\alpha_j} \Big) \Big)^{N_j} \qquad (4.14)$$
and $\alpha_j = \frac{N_j}{N}$.

Since the summation is taken over the ranges $R_{N_g}$, of cardinality $N_g + 1$, we get that the total number of terms is $\prod_{g=1}^{p} (N_g + 1)$. Therefore
$$Z_N \leq \prod_{g=1}^{p} (N_g + 1)\ \sup_{\bar m_1, \dots, \bar m_p} G, \qquad (4.15)$$
which leads to the following upper bound for $p_N$:
$$p_N = \frac{1}{N} \ln Z_N \leq \sum_{g=1}^{p} \frac{1}{N} \ln(N_g + 1) + \frac{1}{N} \ln \sup_{\bar m_1, \dots, \bar m_p} G. \qquad (4.16)$$
Now, defining the $N$-independent function
$$p_{UP} = \frac{1}{N} \ln G = \ln 2 - \frac{1}{2} \sum_{i,j=1}^{p} J_{i,j}\, \bar m_i \bar m_j + \sum_{j=1}^{p} \alpha_j \ln \cosh\Big( \sum_{i=1}^{p} \frac{J_{i,j} + J_{j,i}}{2 \alpha_j}\, \bar m_i + \frac{h_j}{\alpha_j} \Big), \qquad (4.17)$$
where $\alpha_j = \frac{N_j}{N}$, the thermodynamic limit gives:
$$\limsup_{N\to\infty} p_N \leq \sup_{\bar m_1, \dots, \bar m_p} p_{UP}. \qquad (4.18)$$
We can summarize the previous computation into the following:

Lemma 4.

Given a Hamiltonian as defined in (4.6), and defining the pressure per particle as $p_N = \frac{1}{N} \ln Z$, given parameters $J_{i,j}$ and $h_i$, the following inequality holds:
$$\limsup_{N\to\infty} p_N \leq \sup_{\bar m_1, \dots, \bar m_p} p_{UP}$$
where
$$p_{UP} = \ln 2 - \frac{1}{2} \sum_{i,j=1}^{p} J_{i,j}\, \bar m_i \bar m_j + \sum_{j=1}^{p} \alpha_j \ln \cosh\Big( \sum_{i=1}^{p} \frac{J_{i,j} + J_{j,i}}{2 \alpha_j}\, \bar m_i + \frac{h_j}{\alpha_j} \Big), \qquad (4.19)$$
and $\bar m_i \in [-1, 1]$.

4.4.2 Lower bound

The lower bound is provided by exploiting the well-known Gibbs entropic variational principle (see [58], p. 188). In our case, instead of considering the whole space of ansatz probability distributions considered in [58], we shall restrict to a much smaller one, and use the upper bound derived in the last section in order to show that the lower bound corresponding to the restricted space is sharp in the thermodynamic limit.

The mean-field nature of our Hamiltonian allows us to restrict the variational problem to product measures with $p$ degrees of freedom, represented through the non-interacting Hamiltonian:
$$\tilde H = -r_1 \sum_{i=1}^{N_1} \sigma_i - r_2 \sum_{i=N_1+1}^{N_1+N_2} \sigma_i + \dots - r_p \sum_{i = \sum_{k=1}^{p-1} N_k + 1}^{N} \sigma_i,$$
and so, given a Hamiltonian $\tilde H$, we define the ansatz Gibbs state corresponding to it, evaluated on an observable $f(\sigma)$, as:
$$\tilde\omega(f) = \frac{\sum_\sigma f(\sigma)\, e^{-\tilde H(\sigma)}}{\sum_\sigma e^{-\tilde H(\sigma)}}.$$
In order to facilitate our task, we shall express the variational principle of [58] in the following simple form:

Proposition 15.

Let a Hamiltonian $H$, and its associated partition function $Z = \sum_\sigma e^{-H}$, be given. Consider an arbitrary trial Hamiltonian $\tilde H$ and its associated partition function $\tilde Z$. The following inequality holds:
$$\ln Z \geq \ln \tilde Z - \tilde\omega(H) + \tilde\omega(\tilde H). \qquad (4.20)$$
Given a Hamiltonian as defined in (4.6) and its associated pressure per particle $p_N = \frac{1}{N} \ln Z$, the following inequality follows from (4.20):
$$\liminf_{N\to\infty} p_N \geq \sup_{\bar m_1, \dots, \bar m_p} p_{LOW} \qquad (4.21)$$
where
$$p_{LOW} = \frac{1}{2} \sum_{g,k=1}^{p} J_{g,k}\, \bar m_g \bar m_k + \sum_{g=1}^{p} h_g \bar m_g + \sum_{g=1}^{p} \alpha_g S(\bar m_g), \qquad (4.22)$$
the function $S(\bar m_g)$ being the entropy
$$S(\bar m_g) = -\frac{1 + \bar m_g}{2} \ln\Big( \frac{1 + \bar m_g}{2} \Big) - \frac{1 - \bar m_g}{2} \ln\Big( \frac{1 - \bar m_g}{2} \Big)$$
and $\bar m_g \in [-1, 1]$.

Proof. Inequality (4.20) follows straightforwardly from Jensen's inequality:
$$e^{\tilde\omega(-H + \tilde H)} \leq \tilde\omega(e^{-H + \tilde H}). \qquad (4.23)$$
The Hamiltonian (4.6) can be written in terms of spins as:
$$H(\sigma) = -\frac{1}{2N} \sum_{g,k=1}^{p} \Big\{ \frac{J_{g,k}}{\alpha_g \alpha_k} \sum_{i \in P_g,\, j \in P_k} \sigma_i \sigma_j \Big\} - \sum_{g=1}^{p} \Big\{ \frac{h_g}{\alpha_g} \sum_{i \in P_g} \sigma_i \Big\}, \qquad (4.24)$$
where $P_g$ contains the labels for spins belonging to the $g$-th subpopulation, that is
$$P_g = \Big\{ \sum_{k=1}^{g-1} N_k + 1,\ \sum_{k=1}^{g-1} N_k + 2,\ \dots,\ \sum_{k=1}^{g} N_k \Big\}.$$
Its expectation on the trial state is
$$\tilde\omega(H) = -\frac{1}{2N} \sum_{g,k=1}^{p} \Big\{ \frac{J_{g,k}}{\alpha_g \alpha_k} \sum_{i \in P_g,\, j \in P_k} \tilde\omega(\sigma_i \sigma_j) \Big\} - \sum_{g=1}^{p} \Big\{ \frac{h_g}{\alpha_g} \sum_{i \in P_g} \tilde\omega(\sigma_i) \Big\} \qquad (4.25)$$
and a standard computation for the moments leads to
$$\tilde\omega(H) = -\frac{N}{2} \sum_{g=1}^{p} \Big( 1 - \frac{1}{N \alpha_g} \Big) J_{g,g} (\tanh r_g)^2 - \frac{1}{2} \sum_{g=1}^{p} \frac{J_{g,g}}{\alpha_g} - \frac{N}{2} \sum_{g \neq k = 1}^{p} J_{g,k} \tanh r_g \tanh r_k - N \sum_{g=1}^{p} h_g \tanh r_g. \qquad (4.26)$$
Analogously, the Gibbs state of $\tilde H$ is:
$$\tilde\omega(\tilde H) = -N \sum_{g=1}^{p} \alpha_g r_g \tanh r_g,$$
and
$$\tilde Z_N = \sum_\sigma e^{-\tilde H(\sigma)} = \prod_{g=1}^{p} 2^{N_g} (\cosh r_g)^{N_g},$$
which implies that the non-interacting pressure gives
$$\tilde p_N = \frac{1}{N} \ln \tilde Z_N = \ln 2 + \sum_{g=1}^{p} \alpha_g \ln \cosh r_g.$$
So we can finally apply (4.20) in order to find a lower bound for the pressure $p_N = \frac{1}{N} \ln Z_N$:
$$p_N = \frac{1}{N} \ln Z_N \geq \frac{1}{N} \Big( \ln \tilde Z_N - \tilde\omega(H) + \tilde\omega(\tilde H) \Big) \qquad (4.27)$$
which explicitly reads:
$$\begin{aligned}
p_N = \frac{1}{N} \ln Z_N \geq{} & \ln 2 + \sum_{g=1}^{p} \alpha_g \ln \cosh r_g & (4.28) \\
& + \frac{1}{2} \sum_{g,k=1}^{p} J_{g,k} \tanh r_g \tanh r_k + \sum_{g=1}^{p} h_g \tanh r_g & (4.29) \\
& - \sum_{g=1}^{p} \alpha_g r_g \tanh r_g & (4.30) \\
& - \frac{1}{2N} \sum_{g=1}^{p} \frac{J_{g,g}}{\alpha_g} (\tanh r_g)^2 + \frac{1}{2N} \sum_{g=1}^{p} \frac{J_{g,g}}{\alpha_g}. & (4.31)
\end{aligned}$$
Taking the lim inf over $N$ on the left-hand side and the supremum over the variables $r_g$ on the right-hand side, we get (4.21) after performing the change of variables $\bar m_g = \tanh r_g$.

Though the functions $p_{LOW}$ and $p_{UP}$ are different, it is easily checked that they share the same local suprema. Indeed, if we differentiate both functions with respect to the parameters $\bar m_g$, we see that the extremality conditions are given in both cases by the mean field equations:
$$\bar m_g = \tanh\Big( \sum_{k=1}^{p} \frac{J_{g,k} + J_{k,g}}{2 \alpha_g}\, \bar m_k + \frac{h_g}{\alpha_g} \Big), \qquad g = 1, \dots, p. \qquad (4.32)$$
If we now use these equations to express $\tanh^{-1} \bar m_i$ as a function of $\bar m_i$ and we substitute back into $p_{UP}$ and $p_{LOW}$ we get the same function:
$$p = -\frac{1}{2} \sum_{g,k=1}^{p} J_{g,k}\, \bar m_g \bar m_k - \sum_{g=1}^{p} \alpha_g\, \frac{1}{2} \ln \frac{1 - \bar m_g^2}{4}. \qquad (4.33)$$
Since this function returns the value of the pressure when the vector $(\bar m_1, \dots, \bar m_p)$ corresponds to an extremum, and this is the same both for $p_{LOW}$ and $p_{UP}$, we have proved the following:

Theorem 1.

Given a Hamiltonian as defined in (4.6), and defining the pressure per particle as $p_N = \frac{1}{N} \ln Z$, given parameters $J_{i,j}$ and $h_i$, the thermodynamic limit
$$\lim_{N\to\infty} p_N = p$$
of the pressure exists, and can be expressed in one of the following equivalent forms:

a) $p = \sup_{\bar m_1, \dots, \bar m_p} p_{LOW}$

b) $p = \sup_{\bar m_1, \dots, \bar m_p} p_{UP}$

The form we derived for the pressure can be rightfully considered a solution of the statistical mechanical model, since it expresses the thermodynamic properties of a large number of particles in terms of a finite number of parameters.

Nevertheless, the equations of state cannot be solved explicitly in terms of the parameters: indeed, even the phase diagram for the two-population case has only been characterised fully in a subset of our parameter space, in which it has been found useful for a few physical applications [13, 44, 46]. This gives us a feeling of how the mean field assumption, though simplistic from one point of view, can give rise to models exhibiting non-trivial behaviour.

In this section we shall focus on the two-population case, which is the case considered in the applications of the next chapter, and find an analytic result concerning the maximum number of equilibrium states arising from our equations of state. In particular we shall prove that, for any choice of the parameters, the total number of local maxima for the function $p(\bar m_1, \bar m_2)$ is less than or equal to five.

By applying a convenient relabelling to the model's parameters, we get the mean field equations for our two-population model in the following form:
$$\begin{cases}
\bar m_1 = \tanh(J_{11}\, \alpha\, \bar m_1 + J_{12} (1 - \alpha)\, \bar m_2 + h_1) \\
\bar m_2 = \tanh(J_{12}\, \alpha\, \bar m_1 + J_{22} (1 - \alpha)\, \bar m_2 + h_2),
\end{cases}$$
which correspond to the stationarity conditions of $p(\bar m_1, \bar m_2)$. So, a subset of solutions to this system of equations are local maxima, and some among them correspond to the thermodynamic equilibrium.

These equations give a two-dimensional generalization of the Curie-Weiss mean field equation.
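To get a feeling for the two-population mean field equations, one can iterate the corresponding map from a grid of starting points and collect the distinct limits. The sketch below is an illustration under stated assumptions: the subscripts $J_{11}$, $J_{12}$, $J_{22}$, $h_1$, $h_2$ follow the relabelled form of the equations, the function names and parameter values are hypothetical, and plain iteration only detects attracting fixed points of the map, which need not coincide with the full set of local maxima of the pressure.

```python
import math


def mean_field_map(m1, m2, J11, J12, J22, alpha, h1, h2):
    """One application of the two-population mean field map."""
    r1 = math.tanh(J11 * alpha * m1 + J12 * (1 - alpha) * m2 + h1)
    r2 = math.tanh(J12 * alpha * m1 + J22 * (1 - alpha) * m2 + h2)
    return r1, r2


def attracting_fixed_points(params, grid=(-0.9, -0.3, 0.3, 0.9),
                            tol=1e-12, max_iter=100_000, merge=1e-6):
    """Iterate the map from every grid point and return the distinct limits found."""
    found = []
    for a in grid:
        for b in grid:
            m1, m2 = a, b
            for _ in range(max_iter):
                n1, n2 = mean_field_map(m1, m2, **params)
                converged = abs(n1 - m1) + abs(n2 - m2) < tol
                m1, m2 = n1, n2
                if converged:
                    break
            # keep the limit only if it is not already in the list
            if all(abs(m1 - x) + abs(m2 - y) > merge for x, y in found):
                found.append((m1, m2))
    return found


if __name__ == "__main__":
    # Hypothetical values: strong interaction within groups, weak interaction across groups.
    params = dict(J11=3.0, J12=0.2, J22=3.0, alpha=0.5, h1=0.0, h2=0.0)
    for m1, m2 in attracting_fixed_points(params):
        print(f"m1={m1:+.4f}  m2={m2:+.4f}")
```

With weak coupling between the two groups, each group can settle in either orientation almost independently, so several attracting solutions coexist.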
Solutions of the classic Curie-Weiss model can be analysed by elementary geometry: in our case, however, the geometry is that of 2-dimensional maps, and it pays to recall that Hénon's map, a seemingly harmless 2-dimensional diffeomorphism of $\mathbb{R}^2$, is known to exhibit full-fledged chaos. Therefore, the parametric dependence of solutions, and in particular the number of solutions corresponding to local maxima of $p(\bar m_1, \bar m_2)$, is in no way apparent from the equations themselves.

We can, nevertheless, recover some geometric features from the analogy with the one-dimensional picture. For the classic Curie-Weiss equation, continuity and the Intermediate Value Theorem from elementary calculus assure the existence of at least one solution. In higher dimensions we can resort to the analogous result, Brouwer's Fixed Point Theorem, which states that any continuous map on a topological closed ball has at least one fixed point. This theorem, applied to the smooth map $R$ on the square $[-1, 1]^2$, given by
$$\begin{cases}
R_1(\bar m_1, \bar m_2) = \tanh(J_{11}\, \alpha\, \bar m_1 + J_{12} (1 - \alpha)\, \bar m_2 + h_1) \\
R_2(\bar m_1, \bar m_2) = \tanh(J_{12}\, \alpha\, \bar m_1 + J_{22} (1 - \alpha)\, \bar m_2 + h_2)
\end{cases}$$
establishes the existence of at least one point of thermodynamic equilibrium.

We can gain further information by considering the precise form of the equations: by inverting the hyperbolic tangent in the first equation, we can express $\bar m_2$ as a function of $\bar m_1$, and vice-versa for the second equation. Therefore, when $J_{12} \neq 0$ we can rewrite the equations in the following fashion:
$$\begin{cases}
\bar m_2 = \dfrac{1}{J_{12} (1 - \alpha)} \big( \tanh^{-1} \bar m_1 - J_{11}\, \alpha\, \bar m_1 - h_1 \big) \\[2ex]
\bar m_1 = \dfrac{1}{J_{12}\, \alpha} \big( \tanh^{-1} \bar m_2 - J_{22} (1 - \alpha)\, \bar m_2 - h_2 \big)
\end{cases} \qquad (4.34)$$
Consider, for example, the first equation: this defines a function $\bar m_2(\bar m_1)$, and we shall call its graph curve $\gamma_1$. Let's consider the second derivative of this function:
$$\frac{\partial^2 \bar m_2}{\partial \bar m_1^2} = \frac{1}{J_{12} (1 - \alpha)} \cdot \frac{2 \bar m_1}{(1 - \bar m_1^2)^2}.$$
We see immediately that this second derivative is strictly increasing, and that it changes sign exactly at zero.
This implies that $\gamma_1$ can be divided into three monotonic pieces, each having strictly positive third derivative as a function of $\bar m_1$. The same thing holds for the second equation, which defines a function $\bar m_1(\bar m_2)$, and a corresponding curve $\gamma_2$. An analytical argument easily establishes that there exist at most 9 crossing points of $\gamma_1$ and $\gamma_2$ (for convenience we shall label the three monotonic pieces of $\gamma_1$ as I, II and III, from left to right): since $\gamma_2$, too, has a strictly positive third derivative, it follows that it intersects each of the three monotonic pieces of $\gamma_1$ at most three times, and this leaves the number of intersections between $\gamma_1$ and $\gamma_2$ bounded above by 9 (see an example of this in Figure 4.1).

By definition of the mean field equations, the stationary points of the pressure correspond to crossing points of $\gamma_1$ and $\gamma_2$. Furthermore, common sense tells us that not all of these stationary points can be local maxima. This is indeed true, and it is proved by the following:

Proposition 16.

The function $p(\bar m_1, \bar m_2)$ admits at most 5 maxima.

To prove Proposition 16 we shall need the following:

Lemma 5.

Say $P_1$ and $P_2$ are two crossing points linked by a monotonic piece of one of the two functions considered above. Then at most one of them is a local maximum of the pressure $p(\bar m_1, \bar m_2)$.

Proof of Lemma 5: The proof consists of a simple observation about the meaning of our curves. The mean field equations arise as stationarity conditions for the pressure, so each of $\gamma_1$ and $\gamma_2$ is made of points where one of the two components of the gradient of $p(\bar m_1, \bar m_2)$ vanishes. Without loss of generality assume that $P_1$ is a maximum, and that the component that vanishes on the piece of curve that links $P_1$ to $P_2$ is $\frac{\partial p}{\partial \bar m_1}$.

Since $P_1$ is a local maximum, $p(\bar m_1, \bar m_2)$ locally increases along the piece of curve $\gamma_1$ towards $P_1$. On the other hand, the directional derivative of $p(\bar m_1, \bar m_2)$ along $\gamma_1$ is given by
$$\hat t \cdot \nabla p$$
where $\hat t$ is the unit tangent to $\gamma_1$. Now we just need to notice that, by assumption, for any point in $\gamma_1$ the vector $\hat t$ lies in the same quadrant, while $\nabla p$ is vertical with a definite direction. This implies that the scalar product giving the directional derivative is non-negative over all of $\gamma_1$, which prevents $P_2$ from being a maximum. □

Proof of Proposition 16:

The proof considers two separate cases:

a) All crossing points can be joined in a chain by using monotonic pieces of curve such as the one defined in the lemma;

b) At least one crossing point is linked to the others only by non-monotonic pieces of curve.

In case a), all stationary points can be joined in a chain in which no two local maxima can be nearest neighbours, by the lemma. Since there are at most 9 stationary points, there can be at most 5 local maxima.

For case b) assume that there is a point, call it $P$, which is not linked to any other point by a monotonic piece of curve. Without loss of generality, say that $P$ lies on I (which, we recall, is defined as the leftmost monotonic piece of $\gamma_1$). By assumption, I cannot contain other crossing points apart from $P$, for otherwise $P$ would be monotonically linked to at least one of them, contradicting the assumption. On the other hand, each of II and III contains at most 3 stationary points, and, by Lemma 5, at most 2 of these are maxima. So we have at most 2 maxima on each of II and III, and at most 1 maximum on I, which leaves the total bounded above by 5. The cases in which $P$ lies on II, or on III, are proved analogously, giving the result. □

Figure 4.1: The crossing points correspond to solutions of the mean field equations

Chapter 5

Case studies

In previous chapters we defined a model which, generalizing well known tools from econometrics, provides a viable approach to study phenomena of human interaction. Its well-posedness as an equilibrium statistical mechanical model, proved in the last chapter, though supporting the idea that modelling social phenomena working from the bottom up may be feasible, doesn't imply the relevance of the proposed tool to any actual scenario: indeed, for any model such relevance may only be established as a result of success in describing, and most importantly predicting, events from the real world.

There are many possible instances from the social sciences to which quantitative modelling is an appealing prospect. Due to the increasingly global nature of human mobility, one particularly timely social issue is immigration. The applicability of our model to immigration matters was considered in References [16] and [17]. Reference [17] analyses how the microscopic assumptions of the model reflect the tendency of individuals to act consistently with their cultural legacy as well as with what they identify as their social group, which are both tenets in the field of social psychology. The numerical analysis carried out in Reference [16] shows how such simple assumptions are enough for the model to identify regimes in which a global change in a cultural trait is triggered by a small fraction of immigrants interacting with a large population of residents.

The descriptive power shown by the model in the case of immigration further supports the view that equilibrium statistical mechanics can play a role in a quantitative theory of social phenomena. However, though qualitatively inspiring, the immigration scenario seems ill-suited as a first quantitative case study, due to the intrinsic difficulty of finding a database that characterizes such a social issue adequately.
We therefore turn to the problem of giving … that is, starting from individual interactions and trying to establish patterns that might be at work on a larger scale.

We consider a population of individuals faced with a "YES/NO" question, such as choosing between marrying through a religious or a civil ritual, or voting in favour of or against the death penalty in a referendum. We index individuals by $i$, $i = 1, \dots, N$, and assign a numerical value to each individual's choice $\sigma_i$ in the following way:

\[ \sigma_i = \begin{cases} +1 & \text{if } i \text{ says YES,} \\ -1 & \text{if } i \text{ says NO.} \end{cases} \]

Consistently with the many-population Curie-Weiss model analysed in the last chapter, which as we saw generalises the multinomial logit model described in chapter 2, we assume that the joint probability distribution of these choices is well approximated by a Boltzmann-Gibbs distribution corresponding to the following Hamiltonian:

\[ H_N(\sigma) = -\sum_{i,l=1}^{N} J_{il}\,\sigma_i \sigma_l - \sum_{i=1}^{N} h_i\,\sigma_i . \]

Heuristically, this distribution favours the agreement of people's choices $\sigma_i$ with some external influence $h_i$ which varies from person to person, and at the same time favours agreement between a pair of people whenever their interaction coefficient $J_{il}$ is positive, whereas it favours disagreement whenever $J_{il}$ is negative.

Given the setting, the model consists of two basic steps:

1) A parametrization of the quantities $J_{il}$ and $h_i$,

2) A systematic procedure allowing us to "measure" the parameters characterizing the model, starting from statistical data (such as surveys, polls, etc).

The parametrization must be chosen to fit as well as possible the data format available, in order to define a model which is able to make good use of the increasing wealth of data available through information technologies.

Let us first consider our model when it ignores interactions, $J_{il} \equiv 0 \;\; \forall\, i, l \in (1, \dots, N)$, that is

\[ H_N(\sigma) = -\sum_{i=1}^{N} h_i\,\sigma_i . \]
The model shall be applied to data coming from surveys, polls, and censuses, which means that together with the answer to our binary question, we shall have access to information characterizing individuals from a socio-economic point of view. We can formalize such further information by assigning to each person a vector of socio-economic attributes

\[ a_i = \{ a_i^{(1)}, a_i^{(2)}, \dots, a_i^{(k)} \} \]

where, for instance,

\[ a_i^{(1)} = \begin{cases} 1 & \text{for } i \text{ Male} \\ 0 & \text{for } i \text{ Female} \end{cases} , \qquad a_i^{(2)} = \begin{cases} 1 & \text{for } i \text{ Employee} \\ 0 & \text{for } i \text{ Self-employed} \end{cases} , \]

etc.

As we have seen in chapter 2, the general setting of the multinomial logit allows us to exploit the supplementary data by assuming that $h_i$ (which is the "field" influencing the choice of $i$) is a function of the vector of attributes $a_i$. Since for the sake of simplicity we choose our attributes to be binary variables, the most general form for $h_i$ turns out to be linear,

\[ h_i = \sum_{j=1}^{k} \alpha_j a_i^{(j)} + \alpha_0 , \]

and the model's parameters are given by the components of the vector $\alpha = \{ \alpha_0, \alpha_1, \dots, \alpha_k \}$. It's worth pointing out that the parameters $\alpha_j$, $j = 0, \dots, k$, do not depend on the specific individual $i$.

We know that discrete choice theory holds that, when making a choice, each person weighs various factors such as his or her own gender, age, income, etc, so as to maximize in probability the benefit arising from the decision. The parameters $\alpha$ tell us the relative weight (i.e. the relative importance) that the various socio-economic factors have when people are making a decision with respect to our binary question. The parameter $\alpha_0$ does not multiply any specific attribute, and thus it is a homogeneous influence which is felt by all people in the same way, regardless of their individual characteristics.
A discrete choice model is considered good when the parametrized attributes are well suited to the specific choice, so that the parameter $\alpha_0$ is found to be small in comparison to the attribute-specific ones.

We have shown in chapter 2 that elementary statistical mechanics gives us the probability of an individual $i$ with attributes $a_i$ answering "YES" to our question as:

\[ p_i = P(\sigma_i = 1) = \frac{e^{h_i}}{e^{h_i} + e^{-h_i}} , \qquad h_i = \sum_{j=1}^{k} \alpha_j a_i^{(j)} + \alpha_0 , \]

which as we saw is equivalent to the result obtained by applying economics' utility maximization principle to a random utility with Gumbel disturbances. Therefore collecting the choices made by a relevant number of people, and keeping track of their socio-economic attributes, allows us to use statistics in order to find the value of $\alpha$ for which our distribution best fits the real data. This in turn allows us to assess the implications for aggregate behavior of applying incentives to the population which affect specific attributes, such as commodity prices in a market situation.

The kind of model described in the last section has been successfully used in econometrics for the last thirty years [50], and has opened the way to the quantitative study of social phenomena.
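As a minimal sketch of the non-interacting choice probability above, the following Python fragment evaluates $h_i$ and $p_i$ for one individual. The function names and the parameter values are hypothetical and chosen only for illustration; they are not estimates from this work.

```python
import math

def field(alpha0, alphas, attributes):
    """Field h_i = sum_j alpha_j * a_i^(j) + alpha_0 for one individual."""
    return alpha0 + sum(a_j * x for a_j, x in zip(alphas, attributes))

def prob_yes(h):
    """P(sigma_i = +1) = e^h / (e^h + e^-h), rewritten as a logistic in 2h."""
    return 1.0 / (1.0 + math.exp(-2.0 * h))

# Hypothetical parameter values, for illustration only.
alpha0 = 0.1
alphas = [0.4, -0.3]   # weights of two binary attributes
person = [1, 0]        # a^(1) = 1, a^(2) = 0

h = field(alpha0, alphas, person)
p = prob_yes(h)
```

Since $e^{h}/(e^{h}+e^{-h}) = (1+\tanh h)/2$, the expected choice of the individual is $2p_i - 1 = \tanh h_i$, which is the form that reappears in the interacting state equations.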
Such models, however, only apply to situations where the functional relation between the people's attributes and the population's behavior is a smooth one: it is ever more evident, on the other hand, that behavior at a societal level can be marked by sudden jumps [51, 61, 47].

There exist many examples from linguistics, economics, and sociology where it has been observed how the global behaviour of large groups of people can change in an abrupt manner as a consequence of slight variations in the social structure (such as, for instance, a change in the pronunciation of a language due to a small immigration rate, or a substantial decrease in crime rates due to seemingly minor action taken by the authorities) [3, 31, 47]. From a statistical mechanical point of view, these abrupt transitions may be considered as phase transitions caused by the interaction between individuals, and this is what led us to consider in this thesis the interesting mapping between discrete choice econometrics and the Curie-Weiss theory, first stated in [21].

We then go back to studying the general interacting model

\[ H_N(\sigma) = -\sum_{i,l=1}^{N} J_{il}\,\sigma_i \sigma_l - \sum_{i=1}^{N} h_i\,\sigma_i , \qquad (5.1) \]

while keeping

\[ h_i = \sum_{j=1}^{k} \alpha_j a_i^{(j)} + \alpha_0 . \]

We now need to find a suitable parametrization for the interaction coefficients $J_{il}$. Since each person is characterized by $k$ binary socio-economic attributes, the population can be naturally partitioned into $2^k$ subgroups, so that using the mean-field assumptions allows one to rewrite the model in terms of subgroup-specific magnetizations $m_g$, as in the general Hamiltonian (4.1).
Equation (4.1) is general enough to consider populations with different relative sizes (such as one in which residents make up a much larger share of the population than immigrants): nevertheless, it turns out that the mean-field assumption implies a relation of direct proportionality between interaction coefficients and population sizes, which might be considered unnatural.

The approach taken in this thesis, therefore, is to consider sub-populations of comparable size, and model them in the thermodynamic limit as having equal size. Specifically, in all cases we divide the data into two geographical regions which have a similar population. This "equal size" assumption can be considered as part of the modelling process: by using it to analyze data, as we do here, we can gain insights on how to relax it in future refinements of the model. So, for the time being, let $J_{il}$ depend explicitly on a partition into sub-populations of equal sizes. By using the mean-field assumption we can express this as follows:

\[ J_{il} = \frac{1}{2^k N} J_{gg'} , \quad \text{if } i \in g \text{ and } l \in g' , \]

where $g$ and $g'$ are two sub-populations (not necessarily distinct). This in turn allows us to rewrite (5.1) as

\[ H_N(\sigma) = -\frac{N}{2^k} \Big( \sum_{g,g'=1}^{2^k} J_{gg'}\, m_g m_{g'} + \sum_{g=1}^{2^k} h_g m_g \Big) \]

where $m_g$ is the average opinion of group $g$:

\[ m_g = \frac{2^k}{N} \sum_{i=(g-1)N/2^k+1}^{gN/2^k} \sigma_i . \]

We readily see how this is the many-population model considered in the previous chapter, and this gives us a solid microscopic foundation for the theory. Indeed, the results we obtained through relatively elementary mathematics establish rigorously the existence of the model's thermodynamic limit, as well as its factorization properties, and just as importantly provide us with a closed form for the thermodynamic state equations.

Therefore if we are willing to test how well the model's assumptions compare with real data, we can use these equations as the main tool for a procedure of statistical estimation. Here we shall confront the simple case where $k = 1$.
This is a bipartite model which, as we know from the last chapter, can have at most five metastable equilibrium states, given by the thermodynamically stable solutions to the following equations:

\[ \bar m_1 = \tanh( J_{11} \bar m_1 + J_{12} \bar m_2 + h_1 ) \qquad (5.2) \]

\[ \bar m_2 = \tanh( J_{21} \bar m_1 + J_{22} \bar m_2 + h_2 ) \qquad (5.3) \]

Equation (4.32), which was derived from the model's exact solution, shows that the equilibrium state equations for a system consisting of two parts of equal size do not carry two different parameters $J_{12}$ and $J_{21}$, but that, even if these two parameters were different in the Hamiltonian, what characterizes each of the two subparts is rather their average $(J_{12} + J_{21})/2$. We keep $J_{12}$ and $J_{21}$ as two distinct parameters throughout the statistical application in order to use them as a consistency test: we shall be able to consider systems to be in equilibrium only if $J_{12} - J_{21} = 0$.

The state equations (5.2) and (5.3) allow us, in particular, to write the probability of $i$ choosing YES in a closed form, similar to the non-interacting one:

\[ p_i = P(\sigma_i = 1) = \frac{e^{U_g}}{e^{U_g} + e^{-U_g}} , \qquad (5.4) \]

where

\[ U_g = \sum_{g'=1}^{2} J_{g,g'}\, \bar m_{g'} + h_g . \]

This is the basic tool needed to estimate the model starting from real data. We describe the estimation procedure in the next section.
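To make the state equations (5.2) and (5.3) concrete, the following sketch solves them by plain fixed-point iteration and then evaluates the probability (5.4). The solver, its name, and the parameter values are assumptions made for illustration; the iteration returns one solution that depends on the starting point, and it does not check thermodynamic stability when several solutions exist.

```python
import math

def solve_state_equations(J, h, m0=(0.0, 0.0), tol=1e-13, max_iter=10000):
    """Iterate m_g <- tanh(sum_g' J[g][g'] * m_g' + h[g]) until it stops moving."""
    m1, m2 = m0
    for _ in range(max_iter):
        n1 = math.tanh(J[0][0] * m1 + J[0][1] * m2 + h[0])
        n2 = math.tanh(J[1][0] * m1 + J[1][1] * m2 + h[1])
        if max(abs(n1 - m1), abs(n2 - m2)) < tol:
            return n1, n2
        m1, m2 = n1, n2
    raise RuntimeError("fixed-point iteration did not converge")

def prob_yes(J, h, m, g):
    """Probability (5.4) that a member of group g (0 or 1) answers YES."""
    U = J[g][0] * m[0] + J[g][1] * m[1] + h[g]
    return math.exp(U) / (math.exp(U) + math.exp(-U))

# Hypothetical weak-interaction parameters, for illustration only.
J = [[0.5, 0.1], [0.1, 0.4]]
h = [0.2, -0.1]
m = solve_state_equations(J, h)
p_group1 = prob_yes(J, h, m, 0)
```

With these weak couplings the map is a contraction, so the iteration converges from any starting point. For diagonal couplings greater than one, different values of `m0` may lead to different equilibria.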

We have seen that according to the model an individual $i$ belonging to group $g$ has probability of choosing "YES" equal to

\[ p_i = \frac{e^{U_g}}{e^{U_g} + e^{-U_g}} \]

where

\[ U_g = \sum_{g'} J_{g,g'}\, \bar m_{g'} + h_g . \]

The standard approach of statistical estimation for discrete models is to maximize the probability of observing a sample of data with respect to the parameters of the model (see e.g. [6]). This is done by maximizing the likelihood function

\[ L = \prod_i p_i \]

with respect to the model's parameters, which in our case consist of the interaction matrix $J$ and the vector $\alpha$.

Our model, however, is such that $p_i$ is a function of the equilibrium states $m_g$, which in turn are discontinuous functions of the model's parameters. This problem takes away much of the appeal of the maximum likelihood procedure, and calls for a more feasible alternative.

The natural alternative to maximum likelihood for problems of model regression is given by the least squares method [25], which simply minimizes the squared norm of the difference between the observed quantities and the model's prediction. Since in our case the observed quantities are the empirical average opinions $\tilde m_g$, we need to find the parameter values which minimize

\[ \sum_g ( \tilde m_g - \tanh U_g )^2 , \qquad (5.5) \]

which in our case corresponds to satisfying as closely as possible the state equations (5.2) in squared norm. This, however, is still computationally cumbersome due to the non-linearity of the function $\tanh(U_g)$.
This problem was already encountered by Berkson back in the nineteen-fifties, when developing a statistical methodology for bioassay [7]: this is an interesting point, since this stimulus-response kind of experiment bears a close analogy to the natural kind of applications for a model of social behavior, such as linking stimuli given by incentives through policy and media to behavioral responses on the part of a population.

The key observation in Berkson's paper is that, since $U_g$ is a linear function of the model's parameters, and the function $\tanh(x)$ is invertible, a viable modification to least squares is given by minimizing the following quantity instead:

\[ \sum_g ( \operatorname{arctanh} \tilde m_g - U_g )^2 . \qquad (5.6) \]

This reduces the problem to a linear least squares problem which can be handled with standard statistical software, and Berkson finds an excellent numerical agreement between this method and the standard least squares procedure.

There are nevertheless a number of issues with Berkson's approach, which are analyzed in [6], p. 96. All the problems arising can be traced to the fact that to build (5.6), we are collecting the individual observations into subgroups, each of average opinion $m_g$. The problem is well exemplified by the case in which a subgroup has average opinion $m_g \equiv \pm 1$: in this case $\operatorname{arctanh} m_g = \pm\infty$, and the method breaks down. However the event $m_g \equiv \pm 1$ …

We shall carry out the estimation program for real situations which correspond to a very simple case of our model. The data was obtained from periodical censuses carried out by Istat (Italian National Institute of Statistics): since census data concerns events which are recorded in official documents, for a large number of people, we find it to be an ideal testing ground for our model.

For the sake of simplicity, individuals are described by a single binary attribute characterizing their place of residence (either Northern or Southern Italy) and we chose, among the several possible case studies, the ones for which choices are likely to involve peer interaction in a major way.

To address this first task we use data from the annual report on the institution of marriage compiled by Istat in the seven years going from 2000 to 2006. The reason for choosing this specific social question is both methodological and conceptual.

Firstly, we are motivated by the exceptional quality of the data available in this case, since it is a census which concerns a population of more than 250 thousand people per year, for seven years. This allows us some leeway with respect to possible issues regarding the sample size, such as the one highlighted in the last section. Just as importantly, the availability of a time series of data measured at evenly spaced times also allows us to check the consistency of the data as well as the stability of the phenomenon.

Secondly, marriage is probably one of the few matters where a great number of individuals make a genuine choice concerning their life that gets recorded in an official document, as opposed to what happens, for example, in the case of opinion polls.

We choose to study the data with one of the simplest forms of the model: individuals are divided according only to a binary attribute $a^{(1)}$, which takes value 1 for people from Northern Italy, and 0 for people from Southern Italy. In the formalism of Section 2, therefore, the model is defined by the Hamiltonian

\[ H_N(\sigma) = -\frac{N}{2} \big( J_{11} m_1^2 + (J_{12} + J_{21})\, m_1 m_2 + J_{22} m_2^2 + h_1 m_1 + h_2 m_2 \big) , \qquad h_i = \alpha_1 a_i^{(1)} + \alpha_0 , \]

and the state equations to be used for Berkson's statistical procedure are given by (5.4).

Table 5.1 shows the time evolution of the share of men choosing to marry through a religious ritual: the population is divided in two geographical classes. The first thing worth noticing is that these shares show a remarkable stability over the seven-year period: this confirms how, though arising from choices made by distinct individuals, who bear extremely different personal motivations, the aggregate behavior can be seen as an observable feature characterizing society as a whole.
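% of religious marriages, by year
Region            2000   2001   2002   2003   2004   2005   2006
Northern Italy    [values missing in source]
Southern Italy    [values missing in source]

Table 5.1: Percentage of religious marriages, by year and geographical region

[Table 5.2 body: rows $\alpha$, $\alpha$, $J$, $J$, $J$, $J$, each given as estimate ± error for four 4-year periods. Only two entries survive in the source, both in the first period: $-0.10$ in the first $\alpha$ row and $-0.21$ in the third $J$ row.]

Table 5.2: Religious vs civil marriages: estimation of the interacting model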

In order to apply Berkson's method of estimation, we choose to gather the data into periods of four years, starting with 2000-2003 … $g$ choosing the religious ritual in a specific year (say in 2000) by $m_g^{2000}$, we have that the quantity that ought to be minimized in order to estimate the model's parameters for the first period is the following, which we label $X$:

\[ X = \sum_{\text{year}=2000}^{2003} \sum_{g=1}^{2} \big( \operatorname{arctanh} m_g^{\text{year}} - U_g^{\text{year}} \big)^2 , \]

\[ U_g^{\text{year}} = \sum_{g'=1}^{2} J_{g,g'}\, m_{g'}^{\text{year}} + h_g , \qquad h_g = \alpha_1 a_g^{(1)} + \alpha_0 . \]

The results of the estimation for the four periods are shown in Table 5.2, whereas Table 5.3 shows the corresponding estimation for a discrete choice model which doesn't take into account interaction.
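The minimization of $X$ is a linear least squares problem, and it separates by group because each group $g$ has its own unknowns $(J_{g1}, J_{g2}, h_g)$. The sketch below fits one group through the normal equations, in plain Python. The function names are hypothetical, the data are synthetic (built from known parameters so that the fit can be checked), and this is not the statistical software used for the tables.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def berkson_fit(m1, m2, target):
    """Least squares fit of arctanh(target) = J_g1 * m1 + J_g2 * m2 + h_g."""
    rows = [[a, b, 1.0] for a, b in zip(m1, m2)]
    y = [math.atanh(t) for t in target]
    # Normal equations (R^T R) x = R^T y for the unknowns (J_g1, J_g2, h_g).
    A = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    b = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear(A, b)

# Synthetic check: build targets from known parameters and recover them.
x1 = [0.2, 0.5, -0.1, 0.4]
x2 = [0.6, 0.1, 0.3, -0.2]
true_params = (0.8, -0.3, 0.1)
target = [math.tanh(true_params[0] * a + true_params[1] * b + true_params[2])
          for a, b in zip(x1, x2)]
J_g1, J_g2, h_g = berkson_fit(x1, x2, target)
```

In an actual application the regressors would be the yearly values $m_1^{\text{year}}$, $m_2^{\text{year}}$ and the target would be $m_g^{\text{year}}$ itself. Note that `math.atanh` raises a `ValueError` at $\pm 1$, which is the breakdown of the method discussed above.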

The second case study uses data from the annual report compiled by Istat in the six years going from 2000 to 2005. The data show how divorcing couples chose between a consensual and a non-consensual divorce in Northern and Southern Italy. As shown in Table 5.4, here too, when looking at the ratio of consensual divorces to the total number of divorces, the data show a remarkable stability.

[Table 5.3 body: rows $\alpha$ and $\alpha$ by 4-year period (2000-2003, 2001-2004, 2002-2005, 2003-2006), each given as estimate ± error. Only one entry survives in the source: $-0.41$, first period, second $\alpha$ row.]

Table 5.3: Religious vs civil marriages: estimation of the non-interacting model

% of consensual divorces, by year
Region            2000    2001    2002    2003    2004    2005
Northern Italy    75.06   80.75   81.32   81.62   81.55   81.58
Southern Italy    58.83   72.80   71.80   72.61   72.76   72.08

Table 5.4: Percentage of consensual divorces, by year and geographical region

Again we gather the data into periods of four years, and Table 5.5 presents the estimation of our model's parameters for the whole available period, while in Table 5.6 we show the corresponding fit by the non-interacting discrete choice model.

We notice that the estimated parameters have some analogies with the preceding case study, in that here too the cross interactions $J_{12}$, $J_{21}$ are statistically close to zero, whereas the diagonal values $J_{11}$, $J_{22}$ are both greater than one, suggesting an interaction scenario characterized by multiple equilibria [28]. Furthermore, in both cases the attribute-specific parameter $\alpha_1$ is larger than the generic parameter $\alpha_0$ in the interacting model (Tables 5.2 and 5.5), as opposed to what we see in the non-interacting case (Tables 5.3 and 5.6): this suggests that by accounting for interaction we might be able to better evaluate the role played by socio-economic attributes.

The last case study deals with suicidal tendencies in Italy, again following the annual report compiled by Istat in the eight years from 2000 to 2007, and we use the same geographical attribute used for the former two studies.

The data in Table 5.7 show the percentage of suicides in which hanging was the mode of execution.
The topic of suicide is of particular relevance to sociology: indeed, the very first systematic quantitative treatise in the social sciences was carried out by Émile Durkheim [20], a founding father of the subject, who was puzzled by how a phenomenon as unnatural as suicide could arise with the astonishing regularity that he found. Such a regularity has even been dubbed "sociology's one law" [56], and there is hope that the connection to statistical mechanics might eventually shed light on the origin of such a law.

Mirroring the two previous case studies, we present the time series in Table 5.7, whereas Table 5.8 shows the estimation results for the interacting model, and Table 5.9 the estimation results for the discrete choice model. Again, the data agree with the analogies found for the two previous case studies.

[Table 5.5 body: rows $\alpha$, $\alpha$, $J$, $J$, $J$, $J$ by 4-year period (2000-2003, 2001-2004, 2002-2005), each given as estimate ± error. Surviving entries in the source, all in the first period: $-0.25$ in the second $\alpha$ row, $-0.05$ in the third $J$ row, $-0.08$ in the fourth $J$ row.]

Table 5.5: Consensual vs non-consensual divorces: estimation of the interacting model

[Table 5.6 body: rows $\alpha$ and $\alpha$, each given as estimate ± error for three periods; values missing in source.]

Table 5.6: Consensual vs non-consensual divorces: estimation of the non-interacting model

% suicides by hanging
Region            2000   2001   2002   2003   2004   2005   2006   2007
Northern Italy    [values missing in source]
Southern Italy    [values missing in source]

Table 5.7: Percentage of suicides with hanging as mode of execution, by year and geographical region

[Table 5.8 body: rows $\alpha$, $\alpha$, $J$, $J$, $J$, $J$ by 4-year period (2000-2003, 2001-2004, 2002-2005, 2003-2006, 2004-2007), each given as estimate ± error; values missing in source.]

Table 5.8: Suicidal tendencies: estimation of the interacting model

[Table 5.9 body: rows $\alpha$ and $\alpha$, each given as estimate ± error for five periods. Surviving entries in the source, both in the first period: $-0.25$ in the first row and $-0.05$ in the second row.]

Table 5.9: Suicidal tendencies: estimation of the non-interacting model

We shall now estimate our model parameters using a different approach, which makes explicit use of the time fluctuations of our main observable quantities $\tilde m_i$. This approach is not econometric, but typically statistical mechanical, in that it equates fluctuations observed over time with fluctuations of a system in an equilibrium which is defined by an ensemble of states rather than by a single state. The problem of retracing a model's parameters from observable quantities in this context has been referred to in the literature as the "inverse Ising problem" (see e.g. [64]).

We start from the usual model

\[ H_N(\sigma) = -\frac{N}{2} \big( J_{11} m_1^2 + (J_{12} + J_{21})\, m_1 m_2 + J_{22} m_2^2 + h_1 m_1 + h_2 m_2 \big) , \qquad (5.7) \]

\[ h_i = \alpha_1 a_i^{(1)} + \alpha_0 , \]

and we shall analyze the data from our three case studies again using the model's state equations

\[ \bar m_1 = \tanh( J_{11} \bar m_1 + J_{12} \bar m_2 + h_1 ) , \qquad \bar m_2 = \tanh( J_{21} \bar m_1 + J_{22} \bar m_2 + h_2 ) , \qquad (5.8) \]

which, as we shall see, will now also provide us with the system's fluctuations as well as the average quantities. Just as in the last section, we choose to use two distinct parameters $J_{12}$ and $J_{21}$ inside the state equations (5.8) instead of their average $\frac{1}{2}(J_{12} + J_{21})$ in order to test for consistency.

The method presented here comes from an observation about the quantity $\partial \bar m_i / \partial h_j$, which is called $m_i$'s susceptibility with respect to the external field $h_j$ in physics, or $m_i$'s elasticity with respect to the incentive $h_j$ in econometrics.

The two relevant points of view that make $\partial \bar m_i / \partial h_j$ such an interesting quantity are those of statistical mechanics and thermodynamics.

For statistical mechanics $\partial \bar m_i / \partial h_j$ is a quantity defined internally to the system. The following formula, which follows from (5.7), clarifies this point:

\[ \frac{\partial \bar m_i}{\partial h_j} = \frac{\partial}{\partial h_j} \Big\{ \sum_{\sigma} m_i(\sigma)\, \frac{e^{-H_N(\sigma)}}{Z} \Big\} = \frac{N}{2} \big( \omega_N(m_i m_j) - \omega_N(m_i)\, \omega_N(m_j) \big) \equiv c_{ij} . \qquad (5.9) \]

The quantity $\partial \bar m_i / \partial h_j$, which we shall refer to as $c_{ij}$ for notational convenience, is thus simply the amount of fluctuations that we observe in the quantities $m_i$: if we imagine the system as a closed box, and we imagine being inside such a closed box, we can in principle measure $c_{ij}$ by studying the way the $m_i$ vary.

The second point of view is intrinsically different: for thermodynamics $\partial \bar m_i / \partial h_j$ corresponds to the response of the "closed box" mentioned in the last paragraph to an external influence given by a small change in the field $h_j$. Differently from statistical mechanics, thermodynamics cannot provide us with this response's value a priori from observations, since it doesn't know any details of what is going on inside the box. Thermodynamics does tell us, however, that the responses of the system to different influences are interrelated, if the system is to obey the thermodynamic law identified by the state equations (5.8).

These interrelations can be made explicit by considering the partial derivatives of (5.8):

\[ \begin{aligned}
\frac{\partial \bar m_1}{\partial h_1} &= (1 - \bar m_1^2) \Big( J_{11} \frac{\partial \bar m_1}{\partial h_1} + J_{12} \frac{\partial \bar m_2}{\partial h_1} + 1 \Big) , \\
\frac{\partial \bar m_1}{\partial h_2} &= (1 - \bar m_1^2) \Big( J_{11} \frac{\partial \bar m_1}{\partial h_2} + J_{12} \frac{\partial \bar m_2}{\partial h_2} \Big) , \\
\frac{\partial \bar m_2}{\partial h_2} &= (1 - \bar m_2^2) \Big( J_{21} \frac{\partial \bar m_1}{\partial h_2} + J_{22} \frac{\partial \bar m_2}{\partial h_2} + 1 \Big) , \\
\frac{\partial \bar m_2}{\partial h_1} &= (1 - \bar m_2^2) \Big( J_{21} \frac{\partial \bar m_1}{\partial h_1} + J_{22} \frac{\partial \bar m_2}{\partial h_1} \Big) .
\end{aligned} \]

By relabeling $d_i = (1 - \bar m_i^2)$ and using definition (5.9), together with the symmetry $c_{12} = c_{21}$ that it implies, we can rewrite this system of equations as

\[ \begin{aligned}
J_{11} c_{11} + J_{12} c_{12} &= \frac{c_{11}}{d_1} - 1 , \\
J_{11} c_{12} + J_{12} c_{22} &= \frac{c_{12}}{d_1} , \\
J_{21} c_{12} + J_{22} c_{22} &= \frac{c_{22}}{d_2} - 1 , \\
J_{21} c_{11} + J_{22} c_{12} &= \frac{c_{12}}{d_2} .
\end{aligned} \]

This is linear in the $J_{ij}$, and the former two equations are independent from the latter two, so that we can easily solve for the $J_{ij}$ using Cramer's rule. This together with the equations of state (5.8) allows us to express all the model parameters $J_{ij}$ and $h_i$ as functions of the observable quantities $\bar m_i$ and $c_{ij}$, as follows:

\[ \begin{aligned}
J_{12} &= \frac{c_{12}}{c_{11} c_{22} - c_{12}^2} = J_{21} , \\
J_{11} &= \frac{\big( \frac{c_{11}}{d_1} - 1 \big) c_{22} - \frac{c_{12}^2}{d_1}}{c_{11} c_{22} - c_{12}^2} , \\
J_{22} &= \frac{\big( \frac{c_{22}}{d_2} - 1 \big) c_{11} - \frac{c_{12}^2}{d_2}}{c_{11} c_{22} - c_{12}^2} , \\
h_1 &= \operatorname{arctanh} \bar m_1 - J_{11} \bar m_1 - J_{12} \bar m_2 , \\
h_2 &= \operatorname{arctanh} \bar m_2 - J_{21} \bar m_1 - J_{22} \bar m_2 .
\end{aligned} \]
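The closed-form inversion above can be sketched in a few lines of Python. The function names are hypothetical; the forward map `susceptibilities` is not part of the method and is included only as a consistency check, since the linear system for the $c_{ij}$ is equivalent to $C = (D^{-1} - J)^{-1}$ with $D = \mathrm{diag}(d_1, d_2)$ and symmetric $J$. The numerical values are illustrative and do not come from the case studies.

```python
import math

def invert_two_population(m1, m2, c11, c12, c22):
    """Recover J11, J12 (= J21), J22, h1, h2 from means and susceptibilities."""
    d1 = 1.0 - m1 * m1
    d2 = 1.0 - m2 * m2
    det = c11 * c22 - c12 * c12
    J12 = c12 / det
    J11 = ((c11 / d1 - 1.0) * c22 - c12 * c12 / d1) / det
    J22 = ((c22 / d2 - 1.0) * c11 - c12 * c12 / d2) / det
    h1 = math.atanh(m1) - J11 * m1 - J12 * m2
    h2 = math.atanh(m2) - J12 * m1 - J22 * m2
    return {"J11": J11, "J12": J12, "J22": J22, "h1": h1, "h2": h2}

def susceptibilities(J11, J12, J22, m1, m2):
    """Forward map, used only as a check: C = (D^-1 - J)^-1 for symmetric J."""
    a = 1.0 / (1.0 - m1 * m1) - J11
    b = -J12
    e = 1.0 / (1.0 - m2 * m2) - J22
    det = a * e - b * b
    return e / det, -b / det, a / det   # c11, c12, c22

# Illustrative round trip: known parameters -> (c_ij) -> recovered parameters.
m1, m2 = 0.3, -0.2
c11, c12, c22 = susceptibilities(0.5, 0.2, 0.3, m1, m2)
params = invert_two_population(m1, m2, c11, c12, c22)
```

The inversion is undefined when $c_{11} c_{22} - c_{12}^2 = 0$, and it becomes numerically delicate when this determinant is small compared with the $c_{ij}$ themselves.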

In this case we see the consistency condition $J_{12} = J_{21}$ fulfilled a priori. This tells us that, given a set of sub-magnetizations, together with its covariance matrix, our parametrized family contains one and only one model corresponding to it. As a consequence we can say that such a model makes use of exactly the amount of information provided by the time series of standard statistics (i.e. means and covariances) of a poll-type database.

Estimators for $\bar m_i$ and $c_{ij}$ from the time series data are straightforward to obtain, and we have gathered these statistics for our three case studies in Tables 5.10, 5.12 and 5.14. Given a time period $T$, which in our case shall correspond to a range of four consecutive years, we define the corresponding estimators $\tilde m_i(T)$ of $\bar m_i$ and $\tilde c_{ij}(T)$ of $c_{ij}$:

\[ \tilde m_i(T) = \frac{1}{|T|} \sum_{\text{year} \in T} \bar m_i^{\text{year}} , \qquad \tilde c_{ij}(T) = \frac{N_T}{2 |T|} \sum_{\text{year} \in T} \big( \bar m_i^{\text{year}} - \tilde m_i(T) \big) \big( \bar m_j^{\text{year}} - \tilde m_j(T) \big) . \qquad (5.10) \]

We must point out that in order to be well defined, such estimators should apply to a time series of samples of equal size, since the susceptibility $c_{ij}$ has an explicit size dependence. Our systems, on the other hand, cannot be of equal size, since they consist of people who chose to participate in an activity, and the number of these people cannot be established a priori. As stated before, however, the point of view in this thesis is that human affairs can follow the kind of quasi-static processes familiar from thermodynamics. Consistently with this perspective, and with some justification coming from the data considered, we shall consider the system's population a slowly varying quantity, and use its average over small periods of time as the quantity $N_T$ in order to define $\tilde c_{ij}(T)$:

\[ N_T = \frac{1}{|T|} \sum_{\text{year} \in T} N^{\text{year}} . \]

We can thus use relations (5.10) in order to obtain estimates for the model parameters.
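The estimators of (5.10) amount to a time average and a rescaled covariance over the years of a period $T$. The sketch below assumes a prefactor $N_T/2$, i.e. the size of each of the two equal groups, in line with the susceptibility obtained by differentiating the two-population Hamiltonian; the function name and the input series are hypothetical and are not data from the case studies.

```python
def estimators(m1_series, m2_series, n_series):
    """Time-series estimators of the means and susceptibilities over one period T.

    m1_series, m2_series: yearly average opinions of the two groups.
    n_series: yearly total population sizes.
    """
    T = len(m1_series)
    mean1 = sum(m1_series) / T
    mean2 = sum(m2_series) / T
    n_T = sum(n_series) / T        # slowly varying population, averaged over T

    def cov(x, mx, y, my):
        return sum((a - mx) * (b - my) for a, b in zip(x, y)) / T

    scale = n_T / 2.0              # assumed prefactor: size of each equal group
    return {
        "m1": mean1,
        "m2": mean2,
        "c11": scale * cov(m1_series, mean1, m1_series, mean1),
        "c12": scale * cov(m1_series, mean1, m2_series, mean2),
        "c22": scale * cov(m2_series, mean2, m2_series, mean2),
    }

# Hypothetical two-year series, for illustration only.
stats = estimators([0.1, 0.3], [0.2, 0.0], [100, 300])
```

Because the covariances are multiplied by the population size, tiny year-to-year fluctuations in $\bar m_i$ can still produce large $\tilde c_{ij}$ for census-sized populations.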
By considering that

\[ \alpha_0 = h_2 , \qquad (5.11) \]

\[ \alpha_1 = h_1 - h_2 , \qquad (5.12) \]

we can compare the new estimates, presented in Tables 5.11, 5.13 and 5.15, with those from the preceding section.

5.6.2 Comments on results from the two estimation approaches

We can now compare Tables 5.11, 5.13 and 5.15 with their counterparts from the last section, which estimated the same model for the same data coming from our three chosen case studies, using our adaptation of Berkson's method.

Such a comparison can be summarised as follows: comparing Table 5.11, showing parameter estimations for the "religious vs civil marriage" case study, with Table 5.2, we find the estimated values to be definitely different, but we also see that they bear some interesting similarities, especially if we consider the confidence interval provided by the least squares method in Table 5.2. Three shared features are particularly noteworthy:

- The estimated values for the diagonal interactions $J_{11}$ and $J_{22}$ are similar in one aspect: in both cases the same one of the two is estimated to be consistently greater than the other over the years;

- $J_{12}$ is estimated to be very close to zero in Table 5.11: $J_{12}$ and $J_{21}$ can be considered to be statistically zero in Table 5.2 (which is also consistent with the condition $J_{12} - J_{21} = 0$);

- $\alpha_0$ and $\alpha_1$ are consistently estimated with equal signs by both methods: this is an essential prerequisite that any model needs to satisfy.

The agreement is not good for the two remaining case studies, however. In the "consensual vs non-consensual divorce" case study, despite estimations being consistent in the first time range (that is, 2000-2003), agreement gets worse and worse in the following two periods. As for the third case study, the two estimation methods do not show any agreement whatsoever.

An important point to be made is the dependence of the agreement between methods on population size. For the first case study, where the population is made up of over 200 thousand people, the agreement between the two methods is good.
In the second case study we have a population of roughly 40 thousand people, and we find agreement in one of the three considered time spans. The third case study doesn't show any agreement: the population size here, however, is only around 2000 people.

Finally, though the last point certainly motivates further enquiry, one should not be over-confident about population size being the only problem. An extremely important objection comes from the fact that wherever agreement is found, the estimators $\tilde c_{ij}$ are found to give very high values. We must remember that we are looking at the data through a model that assumes equilibrium: such big $\tilde c_{ij}$ values correspond to large fluctuations, and these should cause an equilibrium model to be less precise and not more.

The failure of the two estimation methods to give consistent results in regimes with small fluctuations (that is, whenever the $\tilde c_{ij}$ are small) reveals the presented study as inconclusive on an empirical level. There are however several improvements that can be made using the same framework established here, the most important one concerning the handling of the data. This thesis has as its goal to propose both a model and a procedure allowing one to establish the empirical relevance of the model itself. It was hence of the foremost importance to show a concrete example of such a procedure; since this was not a professional work in statistics, however, it featured several drawbacks, some of which can be described as follows:

- Though showing a remarkable temporal coherence, the time series consists of a number of measurements which is insufficient for any statistic to be reliable. In order to work on consistent groups of data, the choice was made to gather data in four-year ranges: the situation may be improved by considering a phenomenon having the same kind of temporal coherence, but for which measurements are available on a monthly basis;

- The regional separation between "Northern Italy" and "Southern Italy" is an artificial one, decided for technical reasons. The quality of the statistical study could be greatly improved by considering a partition into groups which is directly relevant to the issue under study;

- No use was made of the data regarding the relative sizes of the considered sub-populations. This, as noted before, was due to a difficulty arising from the mean-field assumption, which led us to characterize the populations as having equal size. This drawback can be amended in two ways: 1) at a fundamental level, by further considering the implications for the model of having populations of different size; 2) by keeping the same model, but considering estimators for $c_{ij}$ that make use of the information coming from the subpopulation sizes.

A final point to make concerns the model itself: very little is known about the structure of the phase diagram of a mean-field model of a multi-part system: indeed, as noted in earlier chapters, a subcase of the two-part system considered here was studied on several occasions from the nineteen-fifties [33, 9] until recently [46], and found to be highly non-trivial. As a consequence, it is to be expected that the analysis of the features characterizing the regime that empirical data identify will need to be treated locally and numerically before any kind of global picture arises, and it is not a priori clear whether the presence of big values for the $c_{ij}$ might characterize an interesting regime rather than just a failure of the model.
It is mainly for this reason that much of the effort in this thesis has been directed towards the aim of establishing a way to link the model to data, rather than to pursue further the analytic treatment of the model on its own.

[Table 5.10 body: rows $\tilde m$, $\tilde m$, $\tilde c$, $\tilde c$, $\tilde c$; values missing in source.]

Table 5.10: Religious vs civil marriages: statistics

Parameter    (four successive 4-year periods)
α            -0.21   -0.18   -0.15   -0.10
α            [values missing in source]
J            [values missing in source]
J            [values missing in source]
J            [values missing in source]

Table 5.11: Religious vs civil marriages: estimation of the interacting model

[Table 5.12 body: rows $\tilde m$, $\tilde m$, $\tilde c$, $\tilde c$, $\tilde c$; values missing in source.]

Table 5.12: Consensual vs non-consensual divorces: statistics

4-year period
Parameter    2000-2003   2001-2004   2002-2005
α            -0.05        0.23       -0.71
α            -0.17        0.01        5.81
J            [values missing in source]
J            [values missing in source]
J            [values missing in source]

Table 5.13: Consensual vs non-consensual divorces: estimation of the interacting model

Statistic    (five successive 4-year periods)
m̃            -0.29   -0.29   -0.29   -0.30   -0.28
m̃            -0.25   -0.26   -0.25   -0.24   -0.25
c̃            [values missing in source]
c̃            [values missing in source]
c̃            -0.08    0.00    0.11   -0.50   -0.91

Table 5.14: Suicidal tendencies: statistics

Parameter    (five successive 4-year periods)
α            -1.21   -0.16   -0.06   -0.13   -0.11
α            [values missing in source]
J            -0.01   -0.53   -2.33   -0.45    0.51
J            -3.40    0.41    0.57    0.75    0.76
J            -0.39    0.00    0.19   -0.22   -0.14

Table 5.15: Suicidal tendencies: estimation of the interacting model

Acknowledgment

In the first place I would like to express my gratitude to my advisor Prof. Pierluigi Contucci, for introducing me to the challenging topic of which this work gives but a glimpse, and for carefully guiding me throughout the process of my Ph.D. Special thanks go to Federico Gallo, who made me aware of the subtle connection between statistical mechanics and econometrics which is central to the thesis defended here; many thanks are due to Cristian Giardinà for his fundamental role in educating me on one of the key points of the analytic treatment provided in this work, and to Adriano Barra for his important contribution to the empirical case studies considered. I would also like to thank Diego Grandi, who was available for many interesting discussions on physics which were of great help to the completion of this task, and Stefano Ghirlanda and Giulia Menconi, with whom I was happy to collaborate in the early stage of my doctorate.
References

[1] Akerlof G., Social distance and economic decisions, Econometrica: 1005-1027, 1997.
[2] Ariely D., Predictably irrational - the hidden forces that shape our decisions, HarperCollins, London, 1986.
[3] Ball P., Critical Mass, Arrow books, 2004.
[4] Bandura A., Social Foundations of Thought and Action: A Social Cognitive Theory, Prentice Hall, 1986.
[5] Barra A., The mean field Ising model through interpolating techniques, J. Stat. Phys.: 234-261, 2008.
[6] Ben-Akiva M., Lerman S. R., Discrete Choice Analysis, The MIT Press, 1985.
[7] Berkson J., A statistically precise and relatively simple method of estimating the bioassay with quantal response, based on the logistic function, Journal of the American Statistical Association: 565-599, 1953.
[8] Bianchi A., Contucci P., Giardinà C., Thermodynamic limit for mean field spin models, Math. Phys. E J, 2004.
[9] Bidaux R., Carrara P., Vivet B., Antiferromagnetisme dans un champ moleculaire. I. Traitment de champ moleculaire, J. Phys. Chem. Solids: 2453-2469, 1967.
[10] Bond R., Smith P. B., Culture and conformity: A meta-analysis of studies using Asch’s (1952b, 1956) line judgment task, Psychological Bulletin: 111-137, 1996.
[11] Borghesi C., Bouchaud J. P., Of songs and men: a model for multiple choice with herding, Quality and Quantity: 557-568, 2007.
[12] Bovier A., Gayrard V., The thermodynamics of the Curie-Weiss model with random couplings, Journal of Statistical Physics: 643-663, 1993.
[13] Cohen E.G.D., Tricritical points in metamagnets and helium mixtures, Fundamental Problems in Statistical Mechanics, Proceedings of the 1974 Wageningen Summer School, North-Holland/American Elsevier, 1973.
[14] Chandler D., Introduction to modern statistical mechanics, Oxford University Press, 1987.
[15] Cont R., Lowe M., Social distance, heterogeneity and social interaction, Centre des Mathématiques Appliquées, Ecole Polytechnique, R.I. No 505, 2003.
[16] Contucci P., Gallo I., Menconi G., Phase transitions in social sciences: two-population mean field theory, Int. Jou. Mod. Phys. B (14): 1-14, 2008.
[17] Contucci P., Gallo I., Ghirlanda S., Equilibria of culture contact derived from ingroup and outgroup attitudes, arXiv:0712.1119, 2008.
[18] Curie P., Propriété ferromagnetiqué des corps a diverse températures, Ann. de Chim. et de Phys., 7e série, V: 289, 1895.
[19] De Sanctis L., Structural approaches to spin glasses and optimization problems, Ph.D. Thesis, Department of Mathematics, Princeton University, 2005.
[20] Durkheim E., Le Suicide. Étude de sociologie, Paris Alcan, 1897.
[21] Brock W., Durlauf S., Discrete Choice with Social Interactions, Review of Economic Studies: 235-260, 2001.
[22] Ellis R. S., Large deviations and statistical mechanics, Springer, 1985.
[23] Föllmer H., Random Economies with Many Interacting Agents, J. Math. Econ.: 51-62, 1973.
[24] Fox J., Daly A.J., Gunn H., Review of RAND Europe’s transport demand model systems, RAND, 2003.
[25] Hensher D. A., Rose J. M., Greene W. H., Applied Choice Analysis: A Primer, Cambridge University Press, 2005.
[26] Galam S., Moscovici S., Towards a theory of collective phenomena: Consensus and attitude changes in groups, European Journal of Social Psychology: 49-74, 1991.
[27] Gallo F., Contucci P., Coutts A., Gallo I., Tackling climate change through energy efficiency: mathematical models for evidence-based public policy recommendations, arXiv:0804.3319, 2008.
[28] Gallo I., Contucci P., Bipartite Mean Field Spin Systems. Existence and Solution, Math. Phys. E J, 2008.
[29] Gallo I., Barra A., Contucci P., Parameter Evaluation of a Simple Mean-Field Model of Social Interaction, arXiv:0810.3029, 2008.
[30] Gerard K., Shanahan M., Louviere J., Using stated Preference Discrete Choice Modelling to inform health care decision-making: A pilot study of breast screening participation, Applied Economics (9): 1073-1085, 2003.
[31] Gladwell M., The Tipping Point, Little, Brown and Company, 2000.
[32] Goldenfeld N., Lectures on phase transitions and the renormalization group, Addison-Wesley, 1995.
[33] Gorter C. J., Van Teski-Tinbergen T., Transitions and phase diagram in an orthorhombic antiferromagnetic crystal, Physica: 273-287, 1956.
[34] Granovetter M., Threshold models of collective behaviour, Am. J. Sociol.: 1420-1443, 1978.
[35] Guerra F., About the overlap distribution in a mean field spin glass model, Int. J. Phys. B: 1675-1684, 1997.
[36] Guerra F., Toninelli F. L., The Thermodynamic Limit in Mean Field Spin Glass Models, Communications in Mathematical Physics, 2002.
[37] Guerra F., Spin Glasses, cond-mat/0507581, 2006.
[38] Guerra F., Mathematical aspects of mean field spin glass theory, cond-mat/0410435, 2005.
[39] Halpern D., Bates C., Mulgan G., Aldridge S., Personal Responsibility and Changing Behaviour: the state of knowledge and its implications for public policy, Cabinet Office, S Unit, G Britain, 2004.
[40] van Hemmen J.L., van Enter A.C.D., Canisius J., On a classical spin glass model, Z Phys B: 311-336, 1983.
[41] Huang K., Statistical mechanics, 2nd ed., Wiley, 1987.
[42] Ida T., Kuroda T., Discrete choice analysis of demand for broadband in Japan, Journal of Regulatory Economics (1): 5-22, 2006.
[43] Kenneth B., Game Theory and The Social Contract, MIT Press, 1998.
[44] Kincaid J.M., Cohen E.G.D., Phase diagrams of liquid helium mixtures and metamagnets: experiment and mean field theory, Physics Letters C: 58-142, 1975.
[45] Knott D., Muers S., Aldrige S., Achieving Cultural Change: A Policy Framework, The Strategy Unit, Cabinet Office, UK, 2007.
[46] Külske C., Le Ny A., Spin-Flip Dynamics of the Curie-Weiss Model: Loss of Gibbsianness with Possibly Broken Symmetry, Communications in Mathematical Physics: 431-454, 2007.
[47] Kuran T., Now Out of Never, World politics, 1991.
[48] Luce R., Individual choice behavior: a theoretical analysis, J. Wiley and Sons, 1959.
[49] Luce R., Suppes P., Preferences, Utility and Subjective Probability, in Luce R., Bush R. and Galenter E., Handbook of Mathematical Psychology, Vol. 3, Wiley, 1965.
[50] Mcfadden D., Economic Choices, The American Economic Review: 351-378, 2001.
[51] Michard Q., Bouchaud J. P., Theory of collective opinion shifts: from smooth trends to abrupt swings, The European Physical Journal B: 151-159, 2001.
[52] Milgram S., The small world problem, Psychology today, 1967.
[53] Ortuzar J., Wilumsen L., Modelling Transport, Wiley, 2001.
[54] Paag H., Daly A.J., Rohr C., Predicting use of the Copenhagen harbour tunnel, in David Hensher, Travel behaviour research: the leading edge, Pergamon, 2001.
[55] Persky J., Retrospectives: The Ethology of Homo Economicus, The Journal of Economic Perspectives (2): 221-223, 1995.
[56] Pope W., Danigelis N., Sociology’s One Law, Social Forces, 1981.
[57] Royden H.L., Real Analysis 3rd ed., Prentice Hall, 1988.
[58] Ruelle D., Statistical mechanics: rigorous results, Addison Wesley, 1989.
[59] Ryan M., Gerard K., Using discrete choice experiments to value health care programmes: current practice and future research reflections, Applied Health Economics and Health Policy (1): 55-64, 2003.
[60] Ryan M., Netten A., Skatun D., Smith P., Using discrete choice experiments to estimate a preference-based measure of outcome - An application to social care for older people, Journal of Health Economics (5): 927-944, 2006.
[61] Salganik M. J., Dodds P. S., Watts D. J., Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market, Science, 2006.
[62] Schelling T., Micromotives and Macrobehaviour, W W Norton & Co Ltd, 1978.
[63] Scheinkman J. A., Social interactions, in The New Palgrave Dictionary of Economics, 2nd Edition, Palgrave Macmillan, 2008.
[64] Sessak V. and Monasson R., Small-correlation expansions for the inverse Ising problem, J. Phys. A: Math. Theor.: 1-17, 2009.
[65] Soetevent A. R., Kooreman P., A discrete-choice model with social interactions: with an application to high school teen behavior, Journal of Applied Econometrics: 599-624, 2007.
[66] Talagrand M., Spin Glasses: A Challenge for mathematicians, Springer-Verlag, 2003.
[67] Thompson C. J., Classical Equilibrium Statistical Mechanics, Clarendon Press, 1988: 91-95.
[68] Train K., Discrete choice methods with simulation, Cambridge University Press, 2003.
[69] Watts D. J., Dodds P. S., Influentials, Networks and Public Opinion Formation,