arXiv [econ.TH]
A Model of Competing Narratives∗
Kfir Eliaz and Ran Spiegler†
November 13, 2018
Abstract
We formalize the argument that political disagreements can be traced to a “clash of narratives”. Drawing on the “Bayesian Networks” literature, we model a narrative as a causal model that maps actions into consequences, weaving a selection of other random variables into the story. An equilibrium is defined as a probability distribution over narrative-policy pairs that maximizes a representative agent’s anticipatory utility, capturing the idea that public opinion favors hopeful narratives. Our equilibrium analysis sheds light on the structure of prevailing narratives, the variables they involve, the policies they sustain and their contribution to political polarization.

∗ Financial support by ERC Advanced Investigator grant no. 692995 is gratefully acknowledged. We thank Heidi Thysen, Stephane Wolton and conference audiences at ESSET and CCET for helpful comments.

† Eliaz: School of Economics, Tel-Aviv University and Economics Dept., Columbia University. Spiegler: School of Economics, Tel-Aviv University and Economics Dept., University College London and CFM.

Introduction
It has become commonplace to claim that political disagreements can be traced to a “clash of narratives”. Going beyond differences in preferences or information, divergent opinions emanate from fundamentally different interpretations of reality that take the form of stories. Consequently, a policy gains in popularity if it can be sustained by an effective narrative; and politicians and public-opinion makers spend considerable energy on trying to shape the popular narratives that surround policy debates.

There are countless expressions of this idea in popular and academic discourse. For instance, a recent New Yorker profile of a former aide of President Obama begins with the words “Barack Obama was a writer before he became a politician, and he saw his Presidency as a struggle over narrative”. Likewise, two public policy professors write in an LSE blog that “there can be little doubt then that people think narratives are important and that crafting, manipulating, or influencing them likely shapes public policy”. They add that narratives simplify complex policy issues “by telling a story that includes assertions about what causes what, who the victims are, who is causing the harm, and what should be done”.

In this paper we offer a formalization of the idea that battles over public opinion involve competing narratives. Of course, the term “narrative” is vague, and any formalization inevitably leaves many of its aspects outside the scope of investigation. Our model is based on the idea that in the context of public-policy debates, narratives can be regarded as causal models that map actions to consequences. Following the literature on probabilistic graphical models in Statistics, Artificial Intelligence and Psychology (Cowell et al.
(1999), Sloman (2005), Pearl (2009)), we represent such causal models by directed acyclic graphs (DAGs).

See http://blogs.lse.ac.uk/impactofsocialsciences/2018/07/18/mastering-the-art-of-the-narrative-using-stories-to-shape-public-policy/.

In our model, what defines a narrative is the variables it incorporates and the way these are arranged in the causal mapping from actions to consequences. For instance, consider a debate over US trade policy and its possible effect on employment in the local manufacturing sector. The DAG

trade policy → imports from China → employment (1)

represents a narrative that weaves a third variable (imports from China) into a causal story that regulates the action-consequence mapping which is the subject of the policy debate.

The nodes in the DAG represent variables (not the values they can take), and the links represent perceived direct causal effects (but not the sign or magnitude of these effects). The variables are coarse-grained, such that the narrative does not describe an individual historical episode; instead, it can be used to interpret a wealth of historical episodes. It draws the public’s attention to long-run correlations between adjacent variables along the causal chain and invites a causal interpretation of these correlations.

We refer to the narrative represented by (1) as a “lever narrative” because it regards imports from China as a “lever”, i.e., as an endogenous variable that is influenced by policy and in turn influences the target variable. Intuitively, this narrative supports a protectionist policy: imports from China are negatively correlated with both protectionism and employment in the local manufacturing sector, and it is natural to interpret these correlations in terms of the causal chain (1). But while the support is intuitive, it is illusory if the narrative is false, e.g., if the actual correlation between imports from China and employment is due to the confounding effect of exogenous technological change.

The following is another example of a lever narrative in the context of a foreign policy debate.
The policy question is whether to impose economic sanctions on a rival country with a hostile regime. The public considers destabilizing the regime a desirable outcome. A lever narrative that intuitively gives support to a hawkish policy is

sanction policy → economic situation in rival country → regime stability

The following is a lever narrative that involves a different “lever”:

sanction policy → nationalism in rival country → regime stability

This narrative intuitively supports a dovish policy because nationalistic sentiments in the rival country are positively correlated with the stability of its regime and are potentially ameliorated by a soft stance on sanctions.

Thus, two narratives may have the same “lever” structure but differ in the selection of variables that function as “levers”, and consequently in the policies they support. Likewise, the same variable can be assigned different roles in the causal scheme. For instance, the following is a foreign-policy narrative that treats nationalism as an exogenous variable:

sanction policy → regime stability ← nationalism in rival country

We refer to a narrative with this structure as a “threat/opportunity narrative”, because it regards the third variable that it weaves into the story as an external variable to which the policy responds, rather than one that the policy influences. In the context of our foreign-policy example, this narrative intuitively favors a hawkish policy because it regards the prospect of waning nationalism in the rival country as an opportunity for toppling its regime, which a tough sanction policy can exploit.

Thus, foreign-policy narratives can differ in the variables they weave into the story or in the role that these variables play in the causal mapping from actions to consequences. This is akin to a dramatist’s decision about which events to include as ingredients in a story and how to construct a plot around them.
Different narratives can generate different beliefs regarding the mapping from actions to consequences, and therefore lend support to different policies, because they draw the audience’s attention to correlations between different sets of variables and manipulate its causal interpretation of these correlations. A public-opinion maker who wishes to promote a particular policy will therefore devise a narrative that “sells” it most effectively.

Our objective is to define a notion of equilibrium in public-policy debates, in which narrative-policy pairs vie for dominance in public opinion. When the public adopts a narrative, we assume, following Spiegler (2016), that it constructs a belief over the narrative’s variables by factorizing their objective joint distribution according to the so-called “Bayesian-Network factorization formula”, and it relies on this belief to evaluate policies. This factorization captures the notion of fitting the causal model to objective data. A wrong causal model can induce a distorted belief regarding the mapping from actions to consequences.

To summarize the first ingredient of our model, a narrative is an arrangement of selected variables in a causal model (formalized as a DAG), combined with a rule for generating beliefs from such a causal model. But what happens when the public confronts competing narratives? Here we invoke the second ingredient of our model, which is the idea that the public selects between narrative-policy pairs “hedonically”, i.e., according to the indirect anticipatory utility that each one of them generates.

The idea that people adopt distorted beliefs to enhance their anticipatory utility has several precedents in the literature (Akerlof and Dickens (1982), Brunnermeier and Parker (2005), Spiegler (2008)). Recently, Montiel Olea et al.
(2018) studied the notion of “competing models” in the very different context of linear regression models that differ in the set of variables they admit, and assumed that prevailing models maximize the indirect expected utility they induce when estimated against a random sample. In the context of public-policy debates, we find it particularly natural to assume that the public will be drawn to hopeful narrative-policy pairs. Precisely because individuals have little influence over public policy, they incur negligible decision costs when indulging in hopeful fantasies. It is therefore realistic to assume that anticipatory feelings are a powerful driving force behind political positions.

Based on these two ingredients, we define equilibrium as a steady-state distribution over narrative-policy pairs, such that every element in the support maximizes a representative agent’s anticipatory utility. Why do we refer to this concept as “equilibrium” instead of plain maximization? The reason is that the action frequencies induced by a given distribution over narrative-policy pairs can affect the belief (and hence the anticipatory utility) that each narrative generates. This feedback effect is fundamental to the idea of beliefs that are generated by fitting a wrong causal model to objective long-run data (see Spiegler (2016)), and it is what creates the need for an equilibrium approach to the notion of competing narratives.

We employ our equilibrium concept to explore several questions: Which narratives are attached to various policies, that is, what is their causal structure and what kind of variables do they involve? Can we account for divergent popular policies by the notion of competing narratives? Are swings between conflicting dominant narratives fundamental to battles over public opinion? The results we present demonstrate the formalism’s potential to shed light on the role of narratives in political debates.

Related literature
The idea that people think about empirical regularities in terms of “causal stories” that can be represented by DAGs has been embraced by psychologists of causal reasoning (e.g., Sloman (2005), Sloman and Lagnado (2015)). Spiegler (2016) adopted this idea as a basis for a model of decision making under causal misperceptions. In Spiegler (2016), a decision maker forms a subjective belief by fitting a subjective causal model to objective long-run data. This continues to be a building block of the model in this paper, which goes beyond it in two major directions: first, the collection of variables that can appear in a causal model of a given size is not fixed but selected endogenously; and second, we assume “hedonic” selection between competing causal models.

We are aware of at least three papers in economics that draw attention to the role of narratives in economic contexts. Given that the term “narrative” has such a loose meaning, it should come as no surprise that it has received very different formalizations. Shiller (2017) does not provide an explicit model of what a narrative is. Instead, he regards certain terms and expressions that appear in popular discourse as indications of a specific narrative and proposes to use epidemiological models to study their spread. Benabou et al. (2016) focus on moral decision making and formalize narratives as messages or signals that can affect decision makers’ beliefs regarding the externality of their actions. Levy and Razin (2018) use the term “narrative” to describe information structures in game-theoretic settings that people postulate to explain observed behavior.

Finally, our paper joins a handful of works in so-called “behavioral political economics” that study voters’ belief formation according to misspecified subjective models or wrong causal attribution rules, e.g., Spiegler (2013) and Esponda and Pouzo (2017); see Schnellenbach and Schubert (2015) for a survey.
Let $X = X_1 \times \cdots \times X_n$, where $n > 2$ and $X_i = \{0, 1\}$ for each $i = 1, \ldots, n$. For every $N \subseteq \{1, \ldots, n\}$, denote $X_N = \times_{i \in N} X_i$. For any $x \in X$, the components $x_1$ and $x_n$, also denoted $a$ and $y$, are referred to as an action and a consequence. These components are independently distributed. In particular, actions have no causal effect on consequences.

Let $\mathcal{Q}$ be a finite set of conditional distributions over $(x_2, \ldots, x_{n-1})$ that have full support for every $(x_1, x_n)$. Given a pair of numbers $\alpha, \mu \in (0, 1)$, define $\mathcal{P}_{\alpha,\mu} \subset \Delta(X)$ as the set of distributions $p$ for which $p(a = 1) = \alpha$, $p(y = 1 \mid a) = \mu$ for all $a$, and $(p(\cdot \mid x_1, x_n))_{x_1, x_n}$ is in $\mathcal{Q}$. We regard $\mu$ as a constant, whereas $\alpha$ represents a historical action frequency that we endogenize below.

A directed acyclic graph (DAG) is a pair $(N, R)$, where $N \subseteq \{1, \ldots, n\}$ is a set of nodes and $R \subseteq N \times N$ is a set of directed links. Acyclicity means that the graph contains no directed path from a node to itself. We use $iRj$ or $i \to j$ to denote a directed link from node $i$ into node $j$. Abusing notation, let $R(i) = \{j \in N \mid jRi\}$ be the set of “parents” of node $i$. We will often suppress $N$ in the notation of a DAG and identify it with $R$.

Following Pearl (2009), we interpret a DAG as a causal model, where the link $i \to j$ means that $x_i$ is perceived as an immediate cause of $x_j$. Directedness and acyclicity of $R$ are consistent with basic intuitions regarding causality. The causal model is agnostic about the sign or magnitude of causal effects.

Let $\mathcal{R}$ be a collection of DAGs $(N, R)$ satisfying two restrictions: $\{1, n\} \subseteq N$, and there is no directed path from $n$ to $1$, i.e., the consequence variable is not perceived as a (possibly indirect) cause of the action. In all the DAGs that appear in the examples we will examine, $1$ is an ancestral node (i.e., $R(1) = \emptyset$) and $n$ is the unique terminal node (i.e., $n \notin R(i)$ for every $i \in N$, and there is no other node with this property).
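The restrictions on $\mathcal{R}$, as well as the ancestral and terminal-node properties used in the examples, are purely graph-theoretic and can be checked mechanically. The following is a minimal sketch of our own, not code from the paper: it assumes a DAG is stored as a Python dictionary mapping each node in $N$ to the set of its parents $R(i)$, and all function names are hypothetical.

```python
def children(dag, i):
    """Nodes j with a link i -> j; dag maps each node to the set of its parents."""
    return {j for j, parents in dag.items() if i in parents}


def is_acyclic(dag):
    """Repeatedly strip nodes with no remaining parents; a cycle blocks this."""
    remaining = {i: set(parents) for i, parents in dag.items()}
    while remaining:
        roots = [i for i, parents in remaining.items() if not parents]
        if not roots:
            return False  # every remaining node still has a parent: a cycle exists
        for r in roots:
            del remaining[r]
        for parents in remaining.values():
            parents.difference_update(roots)
    return True


def has_directed_path(dag, src, dst):
    """Depth-first search along directed links, starting at src."""
    stack, seen = [src], set()
    while stack:
        i = stack.pop()
        if i == dst:
            return True
        if i not in seen:
            seen.add(i)
            stack.extend(children(dag, i))
    return False


def in_feasible_collection(dag, n):
    """Well-formed, acyclic, contains nodes 1 and n, and has no directed path n -> 1."""
    nodes = set(dag)
    if not all(set(parents) <= nodes for parents in dag.values()):
        return False
    return (is_acyclic(dag) and {1, n} <= nodes
            and not has_directed_path(dag, n, 1))


def is_ancestral(dag, i):
    """R(i) is empty."""
    return not dag[i]


def is_unique_terminal(dag, n):
    """n has no children, and no other node in N has this property."""
    return {i for i in dag if not children(dag, i)} == {n}


lever = {1: set(), 2: {1}, 3: {2}}           # 1 -> 2 -> 3
opportunity = {1: set(), 2: set(), 3: {1, 2}}  # 1 -> 3 <- 2
```

With $n = 3$, both example DAGs pass every check, whereas a DAG containing the link $3 \to 1$ is excluded from $\mathcal{R}$ because it has a directed path from the consequence to the action.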
However, these properties are not necessary for our general analysis.

Narratives and their induced beliefs
Fix $\alpha, \mu \in (0, 1)$. A narrative is a pair $s = (p, R) \in \mathcal{P}_{\alpha,\mu} \times \mathcal{R}$. The narrative induces a subjective belief $p_R \in \Delta(X_N)$, defined as follows:

$$p_R(x_N) = \prod_{i \in N} p(x_i \mid x_{R(i)}) \qquad (2)$$

The full-support assumption ensures that all the terms in this factorization formula are well-defined.

The conditional distribution of $x_n$ given $x_1$ induced by $p_R$ is computed in the usual way. It has a simple expression when $1$ is an ancestral node:

$$p_R(x_n \mid x_1) = \sum_{x_2, \ldots, x_{n-1}} \prod_{i > 1} p(x_i \mid x_{R(i)}) \qquad (3)$$

For illustration, when the DAG is $R: 1 \to 3 \to 4 \leftarrow 2$, the narrative $(p, R)$ induces

$$p_R(x_1, x_2, x_3, x_4) = p(x_1)\, p(x_2)\, p(x_3 \mid x_1)\, p(x_4 \mid x_2, x_3)$$

and

$$p_R(x_4 \mid x_1) = \sum_{x_2, x_3} p(x_2)\, p(x_3 \mid x_1)\, p(x_4 \mid x_2, x_3)$$

The interpretation of this belief formation process is as follows. In a narrative $(p, R)$, the conditional distribution $p(x_2, \ldots, x_{n-1} \mid x_1, x_n)$ represents a selection of $n - 2$ variables that are incorporated into the story. In other words, every conditional distribution in $\mathcal{Q}$ is implemented by some collection of $n - 2$ variables, and the DAG $R$ determines how these variables (some or all of them) are woven into a causal structure. This is akin to a novelist who conjures up a collection of events and then organizes their unfolding according to a plot. The narrative generates a subjective belief regarding the mapping from actions to consequences by drawing the audience’s attention to particular correlations, namely those that the causal model deems relevant, and combining them according to the causal model. The correlations themselves are accurate: each of the terms in the factorization formula (2) is extracted from an objective distribution (over $a$, $y$ and the selected additional variables). However, the way they are combined may lead to a distorted belief, such that $p_R(y = 1 \mid a) \neq \mu$ for some $a$.

Policies and anticipatory utility
Let $D = [\varepsilon, 1 - \varepsilon]$, where $\varepsilon > 0$ is small. A policy $d \in D$ is a proposed mixture over actions, where $d$ is the proposed frequency of playing the action $a = 1$.

Given a historical action frequency $\alpha$, a narrative $s = (p, R)$ and a policy $d$ induce the following gross anticipatory utility:

$$V(s, d \mid \alpha) = d \cdot p_R(y = 1 \mid a = 1) + (1 - d) \cdot p_R(y = 1 \mid a = 0) \qquad (4)$$

Note that $V$ is defined for a given $\alpha$ because the set of feasible narratives varies with $\alpha$, but also (as we will later see) because the subjective distribution $p_R(y \mid a)$ is not invariant to $\alpha$.

A representative agent has a utility function $u(y, d) = y - C(d - d^*)$, where $d^* \in D$ is the agent’s ideal policy and $C$ is a symmetric, convex cost function that satisfies $C(0) = C'(0) = 0$. The function $C$ represents the intrinsic disutility the agent experiences when deviating from his ideal policy. Note that if the agent had rational expectations, he would realize that $y$ is independent of $a$ and find no reason to deviate from $d^*$. Given $\alpha$, the agent’s net anticipatory utility from the narrative-policy pair $(s, d)$ is

$$U(s, d \mid \alpha) = V(s, d \mid \alpha) - C(d - d^*) \qquad (5)$$

One may wonder why there is a need to define policy as a continuous variable, rather than identifying it with the binary action. The reason, as usual in these cases, is that we want our model to generate a fine mapping from the subjective belief $p_R(y \mid a)$ to policies. In addition, certain interesting effects in our model would disappear or become obscured under a binary-policy specification.

Equilibrium
The model’s primitives are the exogenous probability of a good outcome $\mu$, the set of conditional distributions $\mathcal{Q}$, the set of feasible DAGs $\mathcal{R}$ and the cost function $C$. The objects $\mathcal{P}_{\alpha,\mu}$, $p_R$ and $U$ are derived from these primitives. We are now ready to define our notion of equilibrium.

Definition 1
An action frequency $\alpha \in [0, 1]$ and a probability distribution $\sigma$ over narrative-policy pairs constitute an equilibrium if two conditions hold:

$$\mathrm{Supp}(\sigma) \subseteq \arg\max_{(s, d) \in \mathcal{P}_{\alpha,\mu} \times \mathcal{R} \times D} U(s, d \mid \alpha)$$

and

$$\alpha = \sum_{(s, d)} \sigma(s, d) \cdot d$$

This concept captures a steady state in the battle over public opinion. The first condition requires that prevailing narrative-policy pairs are those that maximize the representative agent’s net anticipatory utility, given the historical action frequency. Thus, public opinion’s criterion for selecting between competing narrative-policy pairs is net anticipatory utility; in other words, it chooses the narrative it prefers to believe in. This captures the idea that voters do not adjudicate between narratives using “scientific” methods; rather, they are attracted to narratives with a hopeful message. The second condition requires the historical action frequency to be consistent with the marginal steady-state distribution over policies. The lower and upper limits on $d$ are introduced in order to ensure that $\alpha$ is interior.

The frequency $\alpha$ can be interpreted as a cross-sectional measurement of the relative popularity of various policies among the public. However, we favor an “ergodic” interpretation, according to which $\alpha$ describes a historical action frequency. Different policies are ascendant at various points in time. A particular policy rises to dominance when the narrative that accompanies it appeals to the public, in the sense that the narrative-policy pair maximizes the public’s anticipatory payoff. Over time, as the historical action frequency changes, so does the anticipatory payoff induced by various narrative-policy pairs, and therefore a different narrative-policy pair may become dominant. Thus, $\alpha$ is the average action frequency that results from the periodic swings between dominant narrative-policy pairs.

The following preliminary result establishes equilibrium existence.

Proposition 1
An equilibrium exists.
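For small $n$, the objects defined so far can be computed by brute force. The sketch below is illustrative only and rests on our own assumptions: binary variables, a full-support $p$ stored as a dictionary over outcome tuples $(x_1, \ldots, x_n)$, and a singleton $\mathcal{Q}$, so that $p$ is pinned down by $\alpha$. It enumerates the factorization formula (2) to obtain $p_R(y = 1 \mid a)$, evaluates (4)-(5), and checks the two conditions of Definition 1, with optimality tested only against a finite grid of policies. The conditional table `PX2`, the quadratic cost and the grid are hypothetical choices made for the usage lines.

```python
from itertools import product


def cond_prob(p, i, xi, parents, xpa):
    """Objective p(x_i = xi | x_j = xpa_j for j in parents); nodes are 1-based."""
    num = den = 0.0
    for x, q in p.items():
        if all(x[j - 1] == v for j, v in zip(parents, xpa)):
            den += q
            if x[i - 1] == xi:
                num += q
    return num / den  # well defined because p has full support


def belief(p, dag, n):
    """Subjective p_R(y = 1 | a), a = 0, 1, via the factorization formula (2).

    dag maps each node i in N to a tuple R(i) of its parents.
    """
    nodes = sorted(dag)
    mass = {(a, y): 0.0 for a in (0, 1) for y in (0, 1)}
    for xs in product((0, 1), repeat=len(nodes)):
        val = dict(zip(nodes, xs))
        pr = 1.0
        for i in nodes:
            pr *= cond_prob(p, i, val[i], dag[i], [val[j] for j in dag[i]])
        mass[(val[1], val[n])] += pr  # aggregate p_R over the (a, y) cells
    return {a: mass[(a, 1)] / (mass[(a, 0)] + mass[(a, 1)]) for a in (0, 1)}


def net_utility(b, d, cost, d_star):
    """U = V - C(d - d*), with V from (4) and b[a] = p_R(y = 1 | a)."""
    return d * b[1] + (1 - d) * b[0] - cost(d - d_star)


def is_equilibrium(alpha, sigma, make_p, dags, n, cost, d_star, grid, tol=1e-6):
    """Approximate check of Definition 1 when Q is a singleton.

    sigma is a list of (weight, dag, d); make_p(alpha) returns the unique p in
    P_{alpha, mu}. Optimality is checked only against the policies in grid.
    """
    p = make_p(alpha)
    best = max(net_utility(belief(p, R, n), d, cost, d_star)
               for R in dags for d in grid)
    optimal = all(net_utility(belief(p, R, n), d, cost, d_star) >= best - tol
                  for _, R, d in sigma)
    consistent = abs(alpha - sum(w * d for w, _, d in sigma)) <= tol
    return optimal and consistent


# Usage with n = 3 and a hypothetical conditional p(x_2 = 1 | a, y).
MU = 0.5
PX2 = {(0, 0): 0.2, (0, 1): 0.5, (1, 0): 0.8, (1, 1): 0.3}


def make_p(alpha):
    p = {}
    for a, x2, y in product((0, 1), repeat=3):
        q = PX2[(a, y)]
        p[(a, x2, y)] = ((alpha if a else 1 - alpha) * (MU if y else 1 - MU)
                         * (q if x2 else 1 - q))
    return p


def quadratic_cost(delta, k=2.0):  # hypothetical C(delta) = k * delta**2
    return k * delta ** 2


GRID = [i / 100 for i in range(1, 100)]
RATIONAL = [{1: (), 3: (1,)},             # a -> y
            {1: (), 2: (1,), 3: (1, 2)},  # fully connected
            {1: (), 2: (), 3: (2,)}]      # no directed path from a to y
```

With only the DAGs in `RATIONAL`, every belief equals $\mu$ and the check accepts a steady state concentrated on $d = d^*$ while rejecting one at any other policy. Enumeration is exponential in $|N|$, which is harmless for the three- and four-node DAGs used in the text.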
Our next basic observation provides a simple rational-expectations benchmark. If $R$ is a fully connected DAG, or if it contains no directed path from the ancestral node $1$ to node $n$, then $p_R(y \mid a) = \mu$ for all $a$, i.e., the agent’s belief regarding the mapping from actions to consequences coincides with rational expectations. In this case, $V((p, R), d \mid \alpha) = \mu$ for every $p$ and $d$, such that deviating from the ideal policy $d^*$ does not produce any kick to anticipatory utility. If $\mathcal{R}$ only consists of such DAGs, then in any equilibrium $(\alpha, \sigma)$, the marginal of $\sigma$ over $d$ assigns probability one to $d^*$ (and therefore $\alpha = d^*$). In the next section, we will begin to see departures from this crisp benchmark when other DAGs are admitted.

Let $n = 3$, $\mu = d^* = \tfrac{1}{2}$ and $C(\Delta) = k\Delta^2$, where $k > 0$ is large enough for the equilibrium policies derived below to lie in the interior of $D$. Take $\varepsilon$ (in the definition of $D$) to be vanishingly small. Suppose that $\mathcal{Q}$ consists of a single conditional distribution:

$$p(x = 1 \mid a, y) \approx a(1 - y) \qquad (6)$$

The approximate equality is due to an arbitrarily small perturbation of the exact specification $x = a(1 - y)$, to ensure that $p$ has full support. The set $\mathcal{R}$ consists of all DAGs with two or three nodes in which $a$ is represented by an ancestral node.

Interpret the three variables as follows. The action $a$ represents foreign policy toward a rival country with a hostile regime, where $a = 1$ ($0$) denotes a hawkish (dovish) policy. The consequence $y$ represents the stability of the regime, where $y = 1$ ($0$) indicates regime change (regime stability). Finally, the variable $x$ represents the strength of nationalistic attitudes among the rival country’s population, where $x = 1$ ($0$) indicates that these attitudes are strong (weak).

The joint distribution $p$ satisfies the following properties. First, foreign policy has no causal effect on the stability of the rival country’s regime. Second, hawkish (dovish) policy tends to strengthen (weaken) nationalism in the rival country. Finally, nationalism and regime stability are positively correlated.
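The example can be checked numerically. The following sketch, written under our own assumptions, builds an explicit full-support version of $p$ from (6), checks the three properties just listed, and evaluates the lever narrative $a \to x \to y$ and the opportunity narrative $a \to y \leftarrow x$ at the candidate equilibrium reported in Claim 1 below. The perturbation scheme is an assumption: it puts mass $\varepsilon^2$ on $x = 1$ whenever $y = 1$ (and mass $\varepsilon$ on the other perturbed cell), so that $x = 1$ still essentially rules out $y = 1$; this is a choice under which the limiting beliefs used in the proof obtain. The values $\varepsilon = 10^{-6}$ and $k = 1.5$ are illustrative, and the helper names are hypothetical.

```python
import math
from itertools import product

NAMES = ("a", "x", "y")  # node 1: action, node 2: nationalism, node 3: regime change


def make_p(alpha, mu=0.5, eps=1e-6):
    """Full-support joint over (a, x, y) approximating x = a(1 - y), eq. (6).

    Assumed perturbation: p(x=1 | a, y=1) = eps**2, p(x=1 | a=0, y=0) = eps,
    and p(x=1 | a=1, y=0) = 1 - eps.
    """
    px1 = {(0, 0): eps, (0, 1): eps ** 2, (1, 0): 1 - eps, (1, 1): eps ** 2}
    p = {}
    for a, x, y in product((0, 1), repeat=3):
        q = px1[(a, y)]
        p[(a, x, y)] = ((alpha if a else 1 - alpha) * (mu if y else 1 - mu)
                        * (q if x else 1 - q))
    return p


def pr(p, **event):
    """Objective probability of an event such as pr(p, a=1, x=0)."""
    return sum(q for key, q in p.items()
               if all(key[NAMES.index(k)] == v for k, v in event.items()))


def belief_lever(p, a):
    """R_l: a -> x -> y, so p_R(y=1 | a) = sum_x p(x | a) p(y=1 | x)."""
    return sum(pr(p, a=a, x=x) / pr(p, a=a) * pr(p, x=x, y=1) / pr(p, x=x)
               for x in (0, 1))


def belief_opportunity(p, a):
    """R_o: a -> y <- x, so p_R(y=1 | a) = sum_x p(x) p(y=1 | a, x)."""
    return sum(pr(p, x=x) * pr(p, a=a, x=x, y=1) / pr(p, a=a, x=x)
               for x in (0, 1))


def net_utility(belief, p, d, k, d_star=0.5):
    """Net anticipatory utility (5) with C(delta) = k * delta**2."""
    return d * belief(p, 1) + (1 - d) * belief(p, 0) - k * (d - d_star) ** 2


# The three properties of p stated in the text, at an arbitrary alpha.
p = make_p(0.4)
cov_ay = pr(p, a=1, y=1) - pr(p, a=1) * pr(p, y=1)  # zero: a and y independent
cov_ax = pr(p, a=1, x=1) - pr(p, a=1) * pr(p, x=1)  # positive
cov_xy = pr(p, x=1, y=1) - pr(p, x=1) * pr(p, y=1)  # negative

# Candidate equilibrium of Claim 1 with an illustrative k = 1.5.
k = 1.5
alpha = 2 - math.sqrt(2)
d_o = 0.5 + math.sqrt(2) / (8 * k)
d_l = 0.5 - math.sqrt(2) / (8 * k)
p_star = make_p(alpha)
u_o = net_utility(belief_opportunity, p_star, d_o, k)
u_l = net_utility(belief_lever, p_star, d_l, k)
w_o = (alpha - d_l) / (d_o - d_l)  # weight on (R_o, d_o) solving alpha = E_sigma[d]
```

Up to terms of order $\varepsilon$, the two narrative-policy pairs yield equal net anticipatory utility, each policy satisfies its first-order condition, and the implied weight on $(R_o, d_o)$ lies strictly between 0 and 1.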
In particular, regime change can only happen when nationalistic attitudes are weak. Yet this correlation is not causal; rather, it is due to confounding by exogenous variables that are excluded from the causal models our narrators employ.

Since $\mathcal{Q}$ is a singleton in this example, narrators have no freedom in their choice of $p$. Consequently, a narrative can be identified with the DAG it employs.

Claim 1
There exists a unique equilibrium $(\alpha, \sigma)$, where $\alpha \approx 2 - \sqrt{2}$ and $\mathrm{Supp}(\sigma)$ consists of two narrative-policy pairs: (i) a lever narrative $R_l: a \to x \to y$ coupled with a dovish policy $d_l \approx \tfrac{1}{2} - \tfrac{\sqrt{2}}{8k}$; (ii) an opportunity narrative $R_o: a \to y \leftarrow x$ coupled with a hawkish policy $d_o \approx \tfrac{1}{2} + \tfrac{\sqrt{2}}{8k}$.

Proof.
For the sake of the calculations in this proof, we treat the approximate-equality definition of $p$ as if the equality were precise. We will also suppose that the equilibrium policies are interior and given by first-order conditions. We will later verify that the equilibrium is unique.

Consider the opportunity DAG $R_o$. By (3), we have

$$p_{R_o}(y \mid a) = \sum_{x = 0, 1} p(x)\, p(y \mid a, x)$$

We can calculate these terms under the specification (6) and the assumption that $\mu = \tfrac{1}{2}$, and obtain

$$p_{R_o}(y = 1 \mid a = 0) = \frac{2 - \alpha}{4}, \qquad p_{R_o}(y = 1 \mid a = 1) = \frac{2 - \alpha}{2}$$

such that

$$U(R_o, d \mid \alpha) = d \cdot \frac{2 - \alpha}{2} + (1 - d) \cdot \frac{2 - \alpha}{4} - k\left(d - \tfrac{1}{2}\right)^2 \qquad (7)$$

Therefore,

$$\frac{\partial U(R_o, d \mid \alpha)}{\partial d} = \frac{2 - \alpha}{4} - 2k\left(d - \tfrac{1}{2}\right) \qquad (8)$$

Because this derivative is strictly positive at $d \le \tfrac{1}{2}$ and strictly decreasing in $d$, there is a unique policy $d_o > \tfrac{1}{2}$ that maximizes $U(R_o, d \mid \alpha)$.

Now consider the lever DAG $R_l$. By (3), we have

$$p_{R_l}(y \mid a) = \sum_{x = 0, 1} p(x \mid a)\, p(y \mid x)$$

We can calculate these terms under the specification (6) and the assumption that $\mu = \tfrac{1}{2}$, and obtain

$$p_{R_l}(y = 1 \mid a = 0) = \frac{1}{2 - \alpha}, \qquad p_{R_l}(y = 1 \mid a = 1) = \frac{1}{2(2 - \alpha)}$$

such that

$$U(R_l, d \mid \alpha) = d \cdot \frac{1}{2(2 - \alpha)} + (1 - d) \cdot \frac{1}{2 - \alpha} - k\left(d - \tfrac{1}{2}\right)^2 \qquad (9)$$

Therefore,

$$\frac{\partial U(R_l, d \mid \alpha)}{\partial d} = -\frac{1}{2(2 - \alpha)} - 2k\left(d - \tfrac{1}{2}\right) \qquad (10)$$

Because this derivative is strictly negative at $d \ge \tfrac{1}{2}$ and strictly decreasing in $d$, there is a unique policy $d_l < \tfrac{1}{2}$ that maximizes $U(R_l, d \mid \alpha)$. It follows that $\mathrm{Supp}(\sigma)$ must be some weak subset of $\{(R_o, d_o), (R_l, d_l)\}$.

Let us first suppose that $\mathrm{Supp}(\sigma)$ coincides with this set and that $d_o$ and $d_l$ are given by first-order conditions. Then

$$U(R_o, d_o \mid \alpha) = U(R_l, d_l \mid \alpha) \qquad (11)$$

$$\left.\frac{\partial U(R_o, d \mid \alpha)}{\partial d}\right|_{d = d_o} = \left.\frac{\partial U(R_l, d \mid \alpha)}{\partial d}\right|_{d = d_l} = 0 \qquad (12)$$

By plugging (7)-(10) into the above equations, we can verify that they are satisfied at the values of $(d_o, d_l, \alpha)$ given in the statement of the claim. Indeed, (12) yields $d_o - \tfrac{1}{2} = \tfrac{2 - \alpha}{8k}$ and $\tfrac{1}{2} - d_l = \tfrac{1}{4k(2 - \alpha)}$; substituting into (11), both sides coincide when $(2 - \alpha)^2 = 2$, so that $\alpha = 2 - \sqrt{2}$ and $d_o - \tfrac{1}{2} = \tfrac{1}{2} - d_l = \tfrac{\sqrt{2}}{8k}$. The assumption on $k$ ensures that the solution is well-defined. The exact weights that $\sigma$ assigns to the two points in the support can be extracted from the condition $\alpha = \sum_{(s, d)} \sigma(s, d) \cdot d$.

To verify uniqueness, consider first equilibria in which $\mathrm{Supp}(\sigma)$ has two elements. Note that $U(R_o, d_o \mid \alpha)$ monotonically decreases with $\alpha$, while $U(R_l, d_l \mid \alpha)$ monotonically increases with $\alpha$. This means that for a given $(d_o, d_l)$, there is a unique $\alpha$ that solves equation (11). Given $\alpha$, equations (12) are linear in $(d_o, d_l)$ and hence have a unique solution. It follows that there is a unique triplet $(d_o, d_l, \alpha)$ that solves (11)-(12). Now suppose that $\mathrm{Supp}(\sigma)$ consists of a single point $(R_l, d)$ (respectively, $(R_o, d)$) only. Then $\alpha = d$. In this case, a simple calculation establishes that the narrative-policy pair $(R_o, 1 - d)$ (respectively, $(R_l, 1 - d)$) delivers a higher net anticipatory utility, a contradiction.

This example has a number of noteworthy features.

Coupling of narratives and policies
Although there is a single available variable (other than the action and the consequence) that narrators can incorporate into their stories, its location in the narrative’s causal scheme depends on the direction of the policy the narrative is meant to sustain. Thus, in order to sustain a hawkish policy $d > d^*$, the narrative must treat the variable $x$ as an exogenous opportunity. In contrast, to sustain a dovish policy $d < d^*$, the narrative must treat the variable $x$ as a lever.

The reason that the lever narrative promotes dovish policies is that according to $p$, $a$ and $x$ are positively correlated, whereas $x$ and $y$ are negatively correlated. The lever narrative puts these correlations together as if they reflected a causal chain $a \to x \to y$. As a result, $p_{R_l}$ predicts a negative indirect causal effect of $a$ on $y$.

The intuition for why the opportunity narrative promotes hawkish policies is quite different. According to $p$, $\Pr(a = 1, x = 0) \approx \alpha\mu$, i.e., the combination of $a = 1$ and $x = 0$ is an infrequent event. Yet the rarity is unaccounted for by $p_{R_o}$, which sums over $x$ without conditioning on $a$ (and observe that $\Pr(x = 0) = \alpha\mu + 1 - \alpha > \Pr(a = 1, x = 0)$). At the same time, the probability of $y = 1$ conditional on the combination $a = 1, x = 0$ is approximately one: if we observe both hawkish policy and weak nationalism, it is almost surely because the regime is unstable. The coupling of these two effects leads to an exaggerated belief in the probability of $y = 1$ conditional on $a = 1$.

Equilibrium polarization
The marginal equilibrium distribution over policies assigns weight to one policy on each side of the agent’s ideal point. The fundamental force behind this polarization effect is a “diminishing returns” property of the two narratives: their ability to deceive the agent about the effect of $a$ on $y$ decreases with the historical frequency of the action they support. Thus, perturbing $\alpha$ above the equilibrium level makes room for the growing popularity of a lever narrative that sustains a dovish policy. Conversely, perturbing $\alpha$ below the equilibrium level increases the popularity of an opportunity narrative that promotes a hawkish policy.

This effect can be interpreted in terms of cross-sectional political polarization: at any moment in time, there are two narrative-policy pairs that dominate public opinion. Alternatively, it can be given an “ergodic” interpretation: different narrative-policy pairs rise to dominance at different points in time, and the distribution $\sigma$ captures the long-run frequency with which each of them is dominant.

Mutual narrative refutation
In our model, the representative agent does not reason “scientifically” about the causal models conveyed by conflicting narratives. Rather than actively seeking data about $p(y \mid a)$ in order to test the contending narratives, he allows the “narrators” to determine the data he pays attention to. Thus, the lever narrative calls his attention to the conditional probabilities $p(x \mid a)$ and $p(y \mid x)$, whereas the opportunity narrative calls his attention to the marginal probability $p(x)$ and the conditional probability $p(y \mid a, x)$. When evaluating a given narrative $(p, R)$, the agent only considers the data that the narrative calls attention to and uses it to evaluate the narrative’s anticipatory value, via the factorization formula $p_R$.

If our agent were somewhat less passive in his approach to data, he could notice that the data that one narrative employs actually refutes the other narrative. Thus, the data $p(y \mid a, x)$ referred to by the opportunity narrative demonstrates that, unlike what the lever narrative assumes, $y$ and $a$ are not independent conditional on $x$. Conversely, the data $p(x \mid a)$ demonstrates that, unlike what the opportunity narrative assumes, $x$ and $a$ are not independent. But how would the agent respond to this observation? A critical reaction would be to distrust all narratives and develop a more “scientific” belief-formation method. However, an equally natural reaction would be to conclude that “all narratives are wrong” and stick to the one that makes the agent feel more hopeful about the future, especially in the political context, where the agent’s personal stakes are negligible.

Finally, note that this scenario would not arise in a modified version of our example in which there are two distinct variables with the same conditional distribution. In this case, the two conflicting narratives could invoke different variables, such that the above mutual refutation would be infeasible.

Hawkish bias and distortion of the status quo
For a given absolute policy distance from the ideal point $d^* = \tfrac{1}{2}$, the opportunity narrative leads to a higher anticipatory utility than the lever narrative. As a result, the average equilibrium policy lands on the hawkish side (even though $d_o$ and $d_l$ are equally far from the ideal point), i.e., $\alpha > \tfrac{1}{2}$.

The fundamental reason behind this effect is that given $p$, the lever narrative has the property that $V((p, R_l), \alpha \mid \alpha) = \mu$, whereas the opportunity narrative satisfies $V((p, R_o), \alpha \mid \alpha) > \mu$. In other words, while the lever narrative exaggerates the probability of $y = 1$ under a counterfactual policy, it does not distort the consequences of a policy that adheres to the status quo. In contrast, the opportunity narrative also distorts the status quo.

This ability to spin tales not just about counterfactual events but also about the status quo gives the opportunity narrative an advantage over the lever narrative. A plausible criterion for refining our notion of equilibrium is to rule out such distortions of the status quo, because the public is less likely to fall for a narrative that misrepresents the status quo. Our analysis in the next section will involve such a restriction. In the current example, it rules out the opportunity narrative (in fact, this is generically the case). The following result summarizes the effect of this change on the equilibrium analysis.

Claim 2
Suppose that $\mathcal{R}$ includes all the DAGs in the original specification except $a \to y \leftarrow x$. Then there exists an essentially unique equilibrium $(\alpha, \sigma)$, where $\alpha \approx \tfrac{5}{4} - \sqrt{\tfrac{9}{16} + \tfrac{1}{8k}}$, and $\mathrm{Supp}(\sigma)$ consists of the following narrative-policy pairs: (i) a lever narrative $R_l: a \to x \to y$ coupled with a dovish policy $d_l \approx 2 - \sqrt{\tfrac{9}{4} + \tfrac{1}{2k}}$; (ii) any distribution over the remaining DAGs in $\mathcal{R}$ coupled with the policy $d^*$.

The proof follows the same outline as in the previous claim, except that the policy $d^*$ coupled with any DAG that induces rational expectations (e.g., $a \to y$) replaces $(R_o, d_o)$; in particular, $\alpha$ is now pinned down by the indifference condition $U(R_l, d_l \mid \alpha) = \mu$. Thus, when the opportunity narrative is ruled out, the equilibrium exhibits a dovish bias, mixing between the rational-expectations policy $d^*$ and a dovish policy that is sustained by the lever narrative.

Toward the end of the previous section, we pointed out that while narratives distort the effect of $a$ on $y$, a plausible restriction is that this distortion only involves counterfactual deviations from the steady-state policy. It is one thing to stoke illusions about the consequences of counterfactual policies, and quite another to present a wrong picture about the consequences of actual policies, because the latter can be checked against the long-run observation of $p(y)$. Hence, it seems sensible to restrict attention to narratives that do not distort beliefs about the effectiveness of the status-quo policy. In this section, we implement this desideratum by restricting the set of feasible DAGs $\mathcal{R}$.

Definition 2 (Perfect DAGs)
A DAG $(N, R)$ is perfect if whenever $iRk$ and $jRk$ for some $i, j, k \in N$, it is the case that $iRj$ or $jRi$.

Thus, in a causal model that is represented by a perfect DAG, if two variables are perceived as direct causes of a third variable, then there must be a perceived direct causal link between them. [Figure: illustrative perfect DAGs, including the six-node DAG referred to below as (13).] By contrast, the DAG $1 \to 3 \leftarrow 2$ is imperfect: 1 and 2 are both perceived as direct causes of 3, yet there is no direct link between 1 and 2.

Perfection is a familiar property in the Bayesian Networks literature. In our context, the crucial properties of perfect DAGs are the following:
Correct marginals. Let $(N, R)$ be a perfect DAG. Then $p_R(x_i) = p(x_i)$ for every $i \in N$. That is, the subjective distribution induced by the DAG does not distort the objective marginal distribution over individual variables.

No status-quo distortion (NSQD). Let $(N, R)$ be a perfect DAG. Then $V((p, R), \alpha \mid \alpha) = \mu$ for every objective distribution $p$. That is, the DAG never distorts the consequences of following a policy that coincides with the historical action frequencies.

Indeed, Spiegler (2017, 2018) shows that the class of perfect DAGs is the largest that satisfies these properties for all objective distributions. This observation can be extended: for a generic $p$, imperfect DAGs will violate both properties. Thus, the significance of the restriction to perfect DAGs is that it is necessary for the NSQD property, given a generic set $Q$.

4.1 Linear Narratives

In this sub-section we investigate the structure of narratives. Specifically, we focus on the notion of linear DAGs.
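The correct-marginals and NSQD properties of perfect DAGs stated above can be checked numerically for three-variable narratives. The following Python sketch is illustrative only: it assumes binary $a$, $x$, $y$ and an objective distribution of the form $p(a)\,p(y)\,q(x \mid a, y)$ (the form used in the proofs of this section), and the particular $q$, $\alpha$ and $\mu$, as well as all function names, are hypothetical. It tests Definition 2 on a DAG given as a parent map and evaluates $V((p, R), \alpha \mid \alpha)$ for the perfect lever DAG $a \to x \to y$ and the imperfect DAG $a \to y \leftarrow x$.

```python
from itertools import combinations, product


def is_perfect(parents):
    """Definition 2: any two direct causes of a common node must be linked.

    `parents` maps each node to the list of its direct causes, e.g.
    {"x": ["a"], "y": ["x"]} encodes the lever DAG a -> x -> y.
    """
    linked = {frozenset((i, k)) for k, ps in parents.items() for i in ps}
    return all(frozenset((i, j)) in linked
               for ps in parents.values() for i, j in combinations(ps, 2))


def objective_joint(alpha, mu, q):
    """Hypothetical objective p(a, x, y) = p(a) p(y) q(x | a, y).

    q[(a, y)] is q(x = 1 | a, y); p(a = 1) = alpha and p(y = 1) = mu.
    """
    joint = {}
    for a, x, y in product((0, 1), repeat=3):
        p_a = alpha if a else 1 - alpha
        p_y = mu if y else 1 - mu
        p_x = q[(a, y)] if x else 1 - q[(a, y)]
        joint[(a, x, y)] = p_a * p_y * p_x
    return joint


def prob(joint, a=None, x=None, y=None):
    """Probability of the event that fixes any subset of (a, x, y)."""
    return sum(v for (ka, kx, ky), v in joint.items()
               if (a is None or ka == a) and (x is None or kx == x)
               and (y is None or ky == y))


def lever(joint, a):
    """a -> x -> y:  p_R(y=1 | a) = sum_x p(x | a) p(y=1 | x)."""
    return sum(prob(joint, a=a, x=x) / prob(joint, a=a)
               * prob(joint, x=x, y=1) / prob(joint, x=x)
               for x in (0, 1))


def opportunity(joint, a):
    """a -> y <- x:  p_R(y=1 | a) = sum_x p(x) p(y=1 | a, x)."""
    return sum(prob(joint, x=x)
               * prob(joint, a=a, x=x, y=1) / prob(joint, a=a, x=x)
               for x in (0, 1))


def status_quo_value(narrative, joint, alpha):
    """V((p, R), alpha | alpha) = alpha p_R(y=1|a=1) + (1-alpha) p_R(y=1|a=0)."""
    return alpha * narrative(joint, 1) + (1 - alpha) * narrative(joint, 0)


alpha, mu = 0.5, 0.5
# Hypothetical full-support q, close to "x = 1 iff y = 1 or a = 0".
q = {(0, 0): 0.95, (0, 1): 0.95, (1, 0): 0.05, (1, 1): 0.95}
joint = objective_joint(alpha, mu, q)

print(is_perfect({"x": ["a"], "y": ["x"]}))         # True: lever DAG
print(is_perfect({"y": ["a", "x"]}))                # False: opportunity DAG
print(status_quo_value(lever, joint, alpha))        # NSQD: mu, up to rounding
print(status_quo_value(opportunity, joint, alpha))  # exceeds mu
```

With this nearly deterministic $q$, the lever narrative returns $\mu$ at the status-quo policy, as NSQD requires, while the opportunity narrative returns about 0.60, illustrating the status-quo distortion that the restriction to perfect DAGs rules out.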
Definition 3
A DAG $(N, R)$ is linear if 1 is the unique ancestral node, $n$ is the unique terminal node, and $R(i)$ is a singleton for every non-ancestral node.

Clearly, linear DAGs are a subclass of perfect DAGs because, by definition, no node in a linear DAG has more than one parent. Linear DAGs capture the simplest form of narrative. They consist of a single causal chain and correspond to the notion of stories as "one damned thing after another". In addition, they are simple in the sense that they only call attention to correlations between pairs of variables (this property characterizes any causal tree; indeed, linear DAGs are degenerate trees with a single terminal node).

The intuitive appeal of linear DAGs raises the question of whether there is any loss of generality in restricting attention to them. Formally, we pose the following question. Consider a narrative $(p, R)$ in which $R$ is a perfect DAG. Is there an alternative narrative $(p', R')$ in which $R'$ is linear (and not larger than $R$, in the sense that it has weakly fewer nodes), such that $p'_{R'}(y \mid a) = p_R(y \mid a)$?

Looking at the illustrative perfect DAGs at the beginning of this section, one might get the impression that the answer is obvious. For instance, in the DAG given by (13), we could collapse the subsets $\{2, 3\}$ and $\{4, 5\}$ into a pair of "mega-nodes" $x'_2 = (x_2, x_3)$ and $x'_3 = (x_4, x_5)$, such that the six-node perfect DAG, denoted $R$, would be reduced to a four-node linear DAG $R': 1 \to 2' \to 3' \to 6$. However, note that for a given $p$, the original DAG $R$ induces

$$p_R(x_1, \ldots, x_6) = p(x_1, x_2, x_3)\, p(x_4 \mid x_2, x_3)\, p(x_5 \mid x_3, x_4)\, p(x_6 \mid x_4, x_5)$$

whereas the reduced DAG leads to a factorization that can be written as

$$p_{R'}(x_1, \ldots, x_6) = p(x_1, x_2, x_3)\, p(x_4 \mid x_2, x_3)\, p(x_5 \mid x_2, x_3, x_4)\, p(x_6 \mid x_4, x_5)$$

The third terms in these two expressions are different. Therefore, for arbitrary $p$, we will have $p_{R'} \neq p_R$, and it is not immediately obvious that we could come up with a different $p'$ such that $p'_{R'}(x_6 \mid x_1) = p_R(x_6 \mid x_1)$.

Proposition 2
For every narrative $(p, R)$ in which $R$ is perfect, there exists another narrative $(p', R')$ in which $R'$ is linear and has weakly fewer nodes than $R$, such that $p'_{R'}(y \mid a) \equiv p_R(y \mid a)$.

Thus, for every narrative $(p, R)$ that employs a perfect DAG, we can find a (potentially different) narrative $(p', R')$ in which $R'$ is a linear DAG with weakly fewer nodes than $R$, such that the two narratives generate the same conditional beliefs. The intermediate nodes in $R'$ represent variables that are derived from the original variables via a non-trivial sequence of transformations, which employs the basic tool of "junction trees" in the Bayesian Networks literature. Therefore, $p'$ is typically different from $p$. In particular, this means that $p'$ may lie outside the set $Q$ to which $p$ belongs. That is, our result does not mean that the restriction to linear DAGs is without loss of generality for an arbitrary set $Q$. However, if $Q$ is sufficiently rich, linear narratives can approximate non-linear narratives that involve perfect DAGs.

As shown at the end of Section 2, under rational expectations (or when $\mathcal{R}$ only consists of DAGs that induce $p_R(y=1 \mid a) = \mu$ for all $a$), any equilibrium assigns probability one to the ideal policy $d^*$. This provides a stark benchmark for the result in this sub-section.

Definition 4
Fix $\mu$. A pair $(Q, \mathcal{R})$ is rich if it satisfies the following two conditions: (i) for every $\alpha \in (0, 1)$ there exists a feasible narrative $(p, R)$, $p \in P_{\alpha,\mu}$, $R \in \mathcal{R}$, such that $p_R(y=1 \mid a)$ is non-constant in $a$; and (ii) for every $q \in Q$ there exists $q' \in Q$ such that $q'(\cdot \mid a, y) \equiv q(\cdot \mid 1-a, y)$.

Richness means that the set of feasible narratives always enables belief distortions that favor either action. To see why it is not a vacuous property, recall that the lever narrative in Section 3 satisfies $p_R(y=1 \mid a=0) > p_R(y=1 \mid a=1)$. Because $Q$ is a singleton in that example, it fails condition (ii) in the definition of richness. Now add to $Q$ a mirror image of the conditional distribution given by (6), such that $x = (1-a)(1-y)$ with arbitrarily high probability. Then, as long as $\mathcal{R}$ includes $a \to x \to y$, the pair $(Q, \mathcal{R})$ is rich.

Proposition 3
Let $\mathcal{R}$ be a collection of perfect DAGs such that $(Q, \mathcal{R})$ is rich. Then, in any equilibrium $(\alpha, \sigma)$, $\sigma$ assigns positive probability to exactly two policies, $d_r > d^*$ and $d_l < d^*$.

Proof.
Fix an equilibrium $(\alpha, \sigma)$. First, we establish that the support of $\sigma$ must include at least two distinct policies. Assume the contrary, i.e., the marginal of $\sigma$ over $d$ is degenerate. Then, by definition, it assigns probability one to the steady-state policy $\alpha$. By the NSQD property of perfect DAGs, $V(s, \alpha \mid \alpha) = \mu$ for every feasible narrative $s$.

There are two cases to consider. Suppose $\alpha \neq d^*$. Then any narrative $(p, R)$ in the support of $\sigma$ delivers $U((p, R), \alpha \mid \alpha) = \mu - C(\alpha - d^*) < \mu$. However, the narrative-policy pair $((p, R^*), d^*)$, where $R^*: a \to y$, generates the net payoff $U((p, R^*), d^* \mid \alpha) = \mu$, contradicting the first part of the definition of equilibrium. Suppose next that $\alpha = d^*$. Then,

$$V((p, R), d^* \mid \alpha) = d^* \cdot p_R(y=1 \mid a=1) + (1 - d^*) \cdot p_R(y=1 \mid a=0) = \mu$$

By property (i) of richness, there is a feasible narrative $(p', R')$ such that, without loss of generality, $p'_{R'}(y=1 \mid a=1) > p'_{R'}(y=1 \mid a=0)$. Therefore,

$$V((p', R'), d' \mid \alpha) = d' \cdot p'_{R'}(y=1 \mid a=1) + (1 - d') \cdot p'_{R'}(y=1 \mid a=0) > \mu$$

whenever $d' > d^*$. Since $C' = 0$ at $d = d^*$, it follows that coupling the narrative $(p', R')$ with a policy $d'$ that is slightly larger than $d^*$ will deliver $U((p', R'), d' \mid \alpha) > \mu$, a contradiction.

Now suppose that the support of $\sigma$ contains at least two distinct policies. We argue that at least two of these policies, denoted $d_l$ and $d_r$, satisfy $d_l < \alpha < d_r$. Note that every $(s, d) \in \mathrm{Supp}(\sigma)$ must deliver $U(s, d) \geq \mu$, because the narrative-policy pair $((p, a \to y), d^*)$ induces $U = \mu$.
Let us now show that the narrative $(p, R)$ that accompanies the policy $d_r$ satisfies $p_R(y=1 \mid a=1) > p_R(y=1 \mid a=0)$, and that the narrative $(p, R)$ that accompanies $d_l$ satisfies $p_R(y=1 \mid a=1) < p_R(y=1 \mid a=0)$.

By the definition of equilibrium, any narrative $(p, R)$ that accompanies any $d$ in the support of $\sigma$ maximizes

$$U((p, R), d \mid \alpha) = V((p, R), d \mid \alpha) - C(d - d^*)$$

where

$$V((p, R), d \mid \alpha) = d \cdot p_R(y=1 \mid a=1) + (1 - d) \cdot p_R(y=1 \mid a=0)$$

Because all feasible narratives involve perfect DAGs, any $(p, R)$ must satisfy $V((p, R), \alpha \mid \alpha) = \mu$. This means that we can rewrite $V((p, R), d \mid \alpha)$ as follows:

$$V((p, R), d \mid \alpha) = \frac{d - \alpha}{1 - \alpha} \cdot p_R(y=1 \mid a=1) + \frac{1 - d}{1 - \alpha} \cdot \mu \tag{14}$$

$$V((p, R), d \mid \alpha) = \frac{\alpha - d}{\alpha} \cdot p_R(y=1 \mid a=0) + \frac{d}{\alpha} \cdot \mu \tag{15}$$

It follows that the set of narratives that maximize $U$ for given $(d, \alpha)$ only depends on the ordinal ranking between $d$ and $\alpha$. Specifically, if $d > \alpha$, then $(p, R)$ should maximize $p_R(y=1 \mid a=1)$; if $d < \alpha$, then $(p, R)$ should maximize $p_R(y=1 \mid a=0)$; and if $d = \alpha$, then all feasible narratives induce $U = \mu - C(d - d^*)$. Richness implies that there is $(p, R)$ such that the slope of $V((p, R), d \mid \alpha)$ with respect to $d > \alpha$ is strictly positive, and there is $(p, R)$ such that the slope of $V((p, R), d \mid \alpha)$ with respect to $d < \alpha$ is strictly negative.

It follows that the value function $\max_{(p,R)} V((p, R), d \mid \alpha)$ is piecewise linear in $d$: it is linearly increasing (decreasing) in $d > \alpha$ ($d < \alpha$). Since $C$ is strictly convex, there is a unique maximizer $d_r$ of $\max_{(p,R)} U((p, R), d \mid \alpha)$ in the range $d \geq \alpha$, and a unique maximizer $d_l$ in the range $d \leq \alpha$. In both cases, $\alpha$ cannot be the maximizer. To see why, recall that $U((p, R), \alpha \mid \alpha) = \mu - C(\alpha - d^*)$ for any narrative $(p, R)$. We noted above that every $(s, d) \in \mathrm{Supp}(\sigma)$ must deliver $U(s, d) \geq \mu$. It follows that if $\alpha \in \arg\max_d U((p, R), d \mid \alpha)$, then $\alpha = d^*$.
But since $C' = 0$ at $d = d^*$, it follows from (14) that any narrative $(p, R)$ with $p_R(y=1 \mid a=1) > p_R(y=1 \mid a=0)$ satisfies $\max_{d > \alpha} U((p, R), d \mid \alpha) > \mu$. Likewise, by (15), any narrative $(p, R)$ with $p_R(y=1 \mid a=0) > p_R(y=1 \mid a=1)$ satisfies $\max_{d < \alpha} U((p, R), d \mid \alpha) > \mu$. We conclude that $d_r > \alpha$ and $d_l < \alpha$, and therefore the support of the marginal of $\sigma$ over $d$ is weakly contained in $\{d_l, d_r\}$. Because we have already established that this support cannot be a singleton, the containment must be an identity.

It remains to establish that $d_r > d^*$ and $d_l < d^*$. Assume the contrary, such that, without loss of generality, $d_l \geq d^*$. Recall that $d_l$ is accompanied by a narrative $(p, R)$ for which $p_R(y=1 \mid a=0) - p_R(y=1 \mid a=1) > 0$. Therefore, the derivative of $U((p, R), d \mid \alpha)$ with respect to $d$ is strictly negative at $d = d_l$, which means that switching from $d_l$ to a slightly lower policy (without changing the accompanying narrative) would generate a higher net anticipatory utility, a contradiction.

Thus, when the set of feasible narratives only involves perfect DAGs, yet is sufficiently rich to enable belief distortion in either direction, equilibrium must induce exactly two policies. Each of the two policies deviates from the ideal point $d^*$ in a different direction. As the proof of the result indicates, this polarization result does not directly rely on the notion of narratives as causal models. Indeed, any model of belief distortion that satisfies NSQD and richness would lead to the same result. Causal models only play an indirect role in this sub-section: perfect DAGs imply NSQD and the non-vacuousness of the richness property. They will return to play a direct role in the next sub-section.

In this sub-section we provide a complete equilibrium characterization for the following specification. First, narratives must be short: they can involve at most one variable $x$ in addition to $a$ and $y$. Second, $\mathcal{R}$ is the set of perfect DAGs with two or three nodes in which $a$ is represented by an ancestral node. The only DAG in this class that does not induce $p_R(y \mid a) = \mu$ for all $a$ is the lever DAG $a \to x \to y$. Finally, $Q$ is large in the following sense: there is an arbitrarily small constant $\delta > 0$ such that for every conditional distribution $(p(x \mid a, y))$, there is a conditional distribution $q \in Q$ such that $\max_{a,y} |q(x=1 \mid a, y) - p(x=1 \mid a, y)| < \delta$.

Our analysis in the previous sub-section implies that in any equilibrium $(\alpha, \sigma)$, $\mathrm{Supp}(\sigma)$ consists of two elements: a policy $d_r > d^*$ sustained by a lever narrative that employs some distribution $q_r \in Q$, and a policy $d_l < d^*$ sustained by another lever narrative that employs a different distribution $q_l \in Q$. The following result refines this characterization.

Proposition 4
There is an essentially unique equilibrium $(\alpha, \sigma)$. In particular:
(i) In the $\delta \to 0$ limit, $q_r$ is defined by $p(x=1 \mid a, y) = y + a(1-y)$ and $q_l$ is defined by $p(x=1 \mid a, y) = y + (1-a)(1-y)$.
(ii) $\alpha \in (1/2, d^*)$ when $d^* > 1/2$, and $\alpha = 1/2$ when $d^* = 1/2$.

Footnote: By essential uniqueness we mean that the definition of $q_r$ or $q_l$ is unique up to relabeling of $x$.

Proof. We established in the previous sub-section that $d_r$ is accompanied by a narrative $(p, R)$ that maximizes $p_R(y=1 \mid a=1)$; likewise, $d_l$ is accompanied by a narrative $(p, R)$ that maximizes $p_R(y=1 \mid a=0)$. The only DAG that can induce non-constant $p_R(y \mid a)$ is $a \to x \to y$. Therefore, the narratives that accompany both $d_r$ and $d_l$ involve this DAG, which we denote by $R$. To find the optimal narrative that accompanies $d_r$, we need to find the quadruple $(p(x=1 \mid a, y))_{a,y=0,1}$ that maximizes

$$p_R(y=1 \mid a=1) = \sum_x p(x \mid a=1)\, p(y=1 \mid x) = \sum_x \Big(\sum_{y'} p(y')\, p(x \mid a=1, y')\Big) \frac{\mu \sum_{a'} p(a')\, p(x \mid a', y=1)}{\sum_{y''} \sum_{a''} p(a'')\, p(y'')\, p(x \mid a'', y'')}$$

In the Appendix, we show that the solution in the $\delta \to 0$ limit is $p^*(x=1 \mid a, y) = y + a(1-y)$, inducing

$$p^*_R(y=1 \mid a=1) = \frac{\mu}{\mu + \alpha(1-\mu)}$$

and, by NSQD,

$$p^*_R(y=1 \mid a=0) = \frac{\mu^2}{\mu + \alpha(1-\mu)}$$

Therefore,

$$V((p^*, R), d \mid \alpha) = d\, \frac{\mu}{\mu + \alpha(1-\mu)} + (1-d)\, \frac{\mu^2}{\mu + \alpha(1-\mu)} = \mu + \frac{\mu(1-\mu)}{\mu + \alpha(1-\mu)}(d - \alpha)$$

Likewise, the narrative that accompanies $d_l$ in the $\delta \to 0$ limit is $p^{**}(x=1 \mid a, y) = y + (1-a)(1-y)$, inducing

$$p^{**}_R(y=1 \mid a=0) = \frac{\mu}{\mu + (1-\alpha)(1-\mu)}, \qquad p^{**}_R(y=1 \mid a=1) = \frac{\mu^2}{\mu + (1-\mu)(1-\alpha)}$$

Therefore,

$$V((p^{**}, R), d \mid \alpha) = d\, \frac{\mu^2}{\mu + (1-\mu)(1-\alpha)} + (1-d)\, \frac{\mu}{\mu + (1-\mu)(1-\alpha)} = \mu - \frac{\mu(1-\mu)}{\mu + (1-\alpha)(1-\mu)}(d - \alpha)$$

Denote

$$U_r(\alpha) = U((p^*, R), d_r \mid \alpha) = V((p^*, R), d_r \mid \alpha) - C(d_r - d^*)$$
$$U_l(\alpha) = U((p^{**}, R), d_l \mid \alpha) = V((p^{**}, R), d_l \mid \alpha) - C(d_l - d^*)$$

Denote $\Delta = |d - \alpha|$ and $e = \alpha - d^*$. Then we can write

$$U_r(\alpha) = \max_{\Delta \leq 1 - \varepsilon - \alpha} \Big[\mu + \frac{\mu(1-\mu)}{\mu + \alpha(1-\mu)}\Delta - C(\Delta + e)\Big] \tag{16}$$
$$U_l(\alpha) = \max_{\Delta \leq \alpha - \varepsilon} \Big[\mu + \frac{\mu(1-\mu)}{\mu + (1-\alpha)(1-\mu)}\Delta - C(\Delta - e)\Big]$$

Recall that by assumption, $d^* \geq 1/2$. Suppose $\alpha > d^*$.
Then $\alpha > 1/2$ and $e > 0$, and it is clear from (16) that $U_r(\alpha) < U_l(\alpha)$, contradicting equilibrium. Now suppose $\alpha < 1/2$. Then $e < 0$, and it is clear from (16) that $U_r(\alpha) > U_l(\alpha)$, again contradicting equilibrium. It follows that $\alpha \in [1/2, d^*]$. Furthermore, since $U_r(\alpha)$ is strictly decreasing in $\alpha$ while $U_l(\alpha)$ is strictly increasing in $\alpha$, there is at most one value of $\alpha$ for which $U_r(\alpha) = U_l(\alpha)$; hence equilibrium must be unique.

The characterization has a number of noteworthy properties. First, the lever narrative that sustains either of the two equilibrium policies selects the intermediate variable $x$ such that it is highly correlated with both the desired outcome $y = 1$ and the advocated policy. Specifically, the selected variable is such that one particular value is attained whenever $y = 1$ or the favored action is taken.

For illustration, recall the US trade policy debate described in the Introduction. In this context, our characterization approximates the following prevailing narratives. The lever narrative that sustains a policy with a protectionist bias (relative to the agent's ideal point) will involve a variable like "imports from China", because low imports are associated with trade restrictions as well as with high employment in the local manufacturing sector, even if the latter correlation is not causal but due to a confounding factor (such as exogenous technology changes that affect outsourcing of production). Likewise, the lever narrative that sustains a trade policy with a liberalized bias will select a variable like "industrial exports".

Second, the anticipatory utility induced by the equilibrium narratives exhibits a diminishing-returns property. That is, when $\alpha$ increases (decreases), the narrative that advocates right-leaning (left-leaning) policies has lower anticipatory value. This property is intuitive: narratives generate false hopes about counterfactual policies; as the historical action frequency leans in the same direction as the narrative, the ability to sell this illusion diminishes.
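The diminishing-returns logic behind uniqueness and the centrist bias can be seen by solving $U_r(\alpha) = U_l(\alpha)$ from (16) numerically. The sketch below is a hypothetical illustration, not the paper's method: it assumes a quadratic cost $C(z) = k z^2$ purely for tractability, uses the $\delta \to 0$ slopes derived in the proof of Proposition 4, and the values of $\mu$, $k$, $d^*$ and $\varepsilon$ (as well as the function names) are arbitrary choices.

```python
def lever_slopes(alpha, mu):
    """Slopes of V in d for the two limiting lever narratives (proof of Prop. 4)."""
    s_r = mu * (1 - mu) / (mu + alpha * (1 - mu))        # narrative p*, policy d_r
    s_l = mu * (1 - mu) / (mu + (1 - alpha) * (1 - mu))  # narrative p**, policy d_l
    return s_r, s_l


def best_net_value(mu, slope, shift, upper, k):
    """max over 0 <= delta <= upper of  mu + slope*delta - k*(delta + shift)**2.

    The objective is concave in delta, so clipping the unconstrained
    maximizer slope/(2k) - shift to the feasible interval is optimal.
    """
    delta = min(max(slope / (2 * k) - shift, 0.0), max(upper, 0.0))
    return mu + slope * delta - k * (delta + shift) ** 2


def equilibrium_alpha(mu, d_star, k, eps=1e-3, tol=1e-12):
    """Solve U_r(alpha) = U_l(alpha) from (16), assuming C(z) = k * z**2."""
    def gap(alpha):
        s_r, s_l = lever_slopes(alpha, mu)
        e = alpha - d_star
        u_r = best_net_value(mu, s_r, e, 1 - eps - alpha, k)  # cost C(delta + e)
        u_l = best_net_value(mu, s_l, -e, alpha - eps, k)     # cost C(delta - e)
        return u_r - u_l

    lo, hi = eps, 1 - eps
    # U_r falls and U_l rises in alpha (diminishing returns): bisect on the sign.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(equilibrium_alpha(mu=0.5, d_star=0.5, k=1.0))  # symmetric ideal point
print(equilibrium_alpha(mu=0.5, d_star=0.6, k=1.0))  # centrist bias
```

In the symmetric case the solution is $\alpha = 1/2$; with $d^* = 0.6$ it lands strictly between $1/2$ and $d^*$, in line with part (ii) of Proposition 4. Bisection is used only because the gap $U_r - U_l$ is monotone in $\alpha$.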
In turn, the diminishing-returns property implies two features of equilibrium: essential uniqueness (specifically, the marginal equilibrium distribution over policies is unique) and a "centrist bias" (i.e., the historical action frequency lies between $1/2$ and $d^*$).

Our analysis in the previous section ruled out imperfect DAGs, which include the opportunity narrative we encountered in Section 3. In this section we explore the implications of allowing for imperfect DAGs. We focus our analysis on the case in which only a single auxiliary variable can be used (i.e., $n = 3$). Thus, the set of feasible DAGs is the set of all DAGs with up to three nodes in which $a$ is represented by an ancestral node. The only imperfect DAG in this class that can induce $p_R(y \mid a) \neq \mu$ is $a \to y \leftarrow x$. We assume throughout that $d^* = 1/2$.

The following result establishes a polarization result akin to that of Section 4.2.

Proposition 5 If $(Q, \mathcal{R})$ is rich in the sense of Section 4.2, then any equilibrium assigns positive probability to at least one policy $d > d^*$ and one policy $d < d^*$.

Proof.
Assume the contrary: without loss of generality, there is an equilibrium $(\alpha, \sigma)$ that assigns probability one to policies $d \geq d^* = 1/2$. Therefore, $\alpha \geq 1/2$. If the DAG $a \to y \leftarrow x$ is never played in this equilibrium, we are back with the model of Section 4.2, where this possibility was ruled out.

Now suppose that $\mathrm{Supp}(\sigma)$ includes a narrative-policy pair $((p, R), d)$ in which $R: a \to y \leftarrow x$. Let us first establish that, for one such pair, $p_R(y=1 \mid a=1) \neq p_R(y=1 \mid a=0)$. Assume the contrary for every such $(p, R)$. This means that if we switched to the DAG $R': y \leftarrow x$, we would have $p_{R'}(y) = p_R(y)$. However, since $p_{R'}(y) \equiv p(y)$, we have $p_R(y=1 \mid a) = \mu$ for all $a$. This means that the narrative-policy pair $((p, R), d)$ induces the same net anticipatory utility as if the narrative involved the DAG $a \to y$. Since we can perform this substitution for every narrative-policy pair in $\mathrm{Supp}(\sigma)$ that involves the DAG $a \to y \leftarrow x$, we are back in the case of Section 4.2, which again leads to a contradiction.

From now on, assume without loss of generality that for every narrative-policy pair $((p, R), d)$ in which $R: a \to y \leftarrow x$, $p_R(y=1 \mid a=1) \neq p_R(y=1 \mid a=0)$. Suppose $d = d^*$. Since $C$ is flat at this point, a deviation to the narrative-policy pair $((p, R), d')$, where $d'$ is slightly different from $d^*$ in the direction of the action $a$ that has the higher $p_R(y=1 \mid a)$, would generate higher net anticipatory utility, contradicting the definition of equilibrium. Therefore, $d > d^*$. In particular, this means that $\alpha > 1/2$.
If $p_R(y=1 \mid a=1) < p_R(y=1 \mid a=0)$, a switch to the narrative-policy pair $((p, R), 1-d)$ would increase gross anticipatory utility without changing $C$, a contradiction.

Thus, $\alpha > 1/2$ and $\mathrm{Supp}(\sigma)$ includes a narrative-policy pair $((p, R), d)$ in which $R: a \to y \leftarrow x$, $d > 1/2$ and $p_R(y=1 \mid a=1) > p_R(y=1 \mid a=0)$. Write down the explicit formula for $p_R(y \mid a)$:

$$p_R(y=1 \mid a) = \sum_x p(x)\, p(y=1 \mid a, x) \tag{17}$$
$$= \sum_x \Big(\sum_{a''} p(a'') \sum_{y''} p(y'')\, p(x \mid a'', y'')\Big) \frac{p(a)\, p(y=1)\, p(x \mid a, y=1)}{\sum_{y'} p(y')\, p(a)\, p(x \mid a, y')}$$
$$= \mu \sum_x \frac{p(x \mid a, y=1)}{\sum_{y'} p(y')\, p(x \mid a, y')} \sum_{a''} p(a'') \sum_{y''} p(y'')\, p(x \mid a'', y'')$$

For $a = 1$, this expression becomes

$$\mu \sum_x \frac{p(x \mid a=1, y=1)}{\sum_y p(y)\, p(x \mid a=1, y)} \Big[\alpha \sum_y p(y)\, p(x \mid a=1, y) + (1-\alpha) \sum_y p(y)\, p(x \mid a=0, y)\Big]$$
$$= \mu \sum_x p(x \mid a=1, y=1) \Big[\alpha + (1-\alpha) \frac{\sum_y p(y)\, p(x \mid a=0, y)}{\sum_y p(y)\, p(x \mid a=1, y)}\Big]$$
$$= \mu \Big[\alpha + (1-\alpha) \sum_x p(x \mid a=1, y=1) \frac{\sum_y p(y)\, p(x \mid a=0, y)}{\sum_y p(y)\, p(x \mid a=1, y)}\Big]$$

Likewise, for $a = 0$, (17) becomes

$$\mu \Big[(1-\alpha) + \alpha \sum_x p(x \mid a=0, y=1) \frac{\sum_y p(y)\, p(x \mid a=1, y)}{\sum_y p(y)\, p(x \mid a=0, y)}\Big]$$

Denote

$$A = \sum_x p(x \mid a=1, y=1) \frac{\sum_y p(y)\, p(x \mid a=0, y)}{\sum_y p(y)\, p(x \mid a=1, y)}, \qquad B = \sum_x p(x \mid a=0, y=1) \frac{\sum_y p(y)\, p(x \mid a=1, y)}{\sum_y p(y)\, p(x \mid a=0, y)}$$

Since $p_R(y=1 \mid a=1) > p_R(y=1 \mid a=0)$, $A > B$. And since $p_R(y=1 \mid a=1) > \mu$, $A > 1$. The net anticipatory utility generated by $((p, R), d)$ can thus be written as

$$d \cdot p_R(y=1 \mid a=1) + (1-d) \cdot p_R(y=1 \mid a=0) - C(d - \tfrac{1}{2}) = \mu\big[d(\alpha + (1-\alpha)A) + (1-d)((1-\alpha) + \alpha B)\big] - C(d - \tfrac{1}{2}) \tag{18}$$

Now consider a deviation to the narrative-policy pair $((\tilde{p}, R), 1-d)$, where $\tilde{p}$ is defined by $\tilde{p}(x \mid a, y) \equiv p(x \mid 1-a, y)$. That is, $\tilde{p}$ is a mirror image of $p$. By assumption, $\tilde{p}$ is feasible. Define $\tilde{A}$ and $\tilde{B}$ accordingly. By construction, $\tilde{A} = B$ and $\tilde{B} = A$. Therefore, the net anticipatory utility generated by $((\tilde{p}, R), 1-d)$ is

$$(1-d) \cdot \tilde{p}_R(y=1 \mid a=1) + d \cdot \tilde{p}_R(y=1 \mid a=0) - C\big((1-d) - \tfrac{1}{2}\big) = \mu\big[(1-d)(\alpha + (1-\alpha)B) + d((1-\alpha) + \alpha A)\big] - C(\tfrac{1}{2} - d)$$

Since $d, \alpha > 1/2$ and $A > 1$, this expression exceeds (18), a contradiction.

Unlike the case of perfect DAGs, the DAG $a \to y \leftarrow x$ does not satisfy the NSQD property, and therefore the proof resorts to other arguments. The key question is whether, assuming all equilibrium policies lie on one side of $d^* = 1/2$, a narrative-policy pair $((p, a \to y \leftarrow x), d) \in \mathrm{Supp}(\sigma)$ can be destabilized by a deviation to a "mirror" pair. The answer is not obvious, and our proof relies on the particular structure of the imperfect three-node DAG $a \to y \leftarrow x$.

The result is weaker than its analogue in Section 4.2. In particular, we are unable to determine whether equilibrium will sustain exactly one policy on each side of $d^*$ for general cost functions. However, when costs are sufficiently small, we obtain a stronger characterization.

Proposition 6
Suppose (as in Section 4.3) that there is an arbitrarily small constant $\delta > 0$ such that for every conditional distribution $(p(x \mid a, y))$ there is $q \in Q$ such that $\max_{a,y} |q(x=1 \mid a, y) - p(x=1 \mid a, y)| < \delta$. Then, if $C'(\cdot)$ and $\varepsilon$ are sufficiently small, there is a unique equilibrium, in which $\alpha = 1/2$ and $\mathrm{Supp}(\sigma)$ consists of:
(i) An opportunity narrative that consists of the DAG $a \to y \leftarrow x$ and the conditional distribution $p(x=1 \mid a, y) \approx y + (1-a)(1-y)$, coupled with a policy $d_r \approx 1$.
(ii) An opportunity narrative that consists of the DAG $a \to y \leftarrow x$ and the conditional distribution $p(x=1 \mid a, y) \approx y + a(1-y)$, coupled with a policy $d_l \approx 0$.
If $d^* > 1/2$, a similar result holds, where the only difference is that $\alpha \in (1/2, d^*)$.

Proof. In Section 4.3, we derived, for each $a = 0, 1$, a lever narrative that sustains $p_R(y=1 \mid a) - p_R(y=1 \mid 1-a) > 0$ for every $\alpha \in (0, 1)$. Given the linearity of $V$ with respect to $d$, it follows that if $C'$ is sufficiently small, the only policies that survive in equilibrium are the extreme points $d = 1 - \varepsilon$ and $d = \varepsilon$. It follows that in order to characterize equilibrium in the low-$\varepsilon$ limit, we only need to look for the narratives $(p, R)$ that maximize $p_R(y=1 \mid a)$ for each $a = 0, 1$. The largest $p_R(y=1 \mid a=1)$ and $p_R(y=1 \mid a=0)$ that lever narratives can attain are $\mu/[\mu + (1-\mu)\alpha]$ and $\mu/[\mu + (1-\mu)(1-\alpha)]$, respectively. In the Appendix, we show that the largest $p_R(y=1 \mid a=1)$ and $p_R(y=1 \mid a=0)$ that opportunity narratives can attain are $1 - \alpha(1-\mu)$ and $1 - (1-\alpha)(1-\mu)$, respectively. A simple calculation establishes that

$$1 - \alpha(1-\mu) > \frac{\mu}{\mu + (1-\mu)\alpha}$$

for any $\alpha \in (0, 1)$ (since $(1 - \alpha(1-\mu))(\mu + \alpha(1-\mu)) = \mu + \alpha(1-\alpha)(1-\mu)^2$), and symmetrically for $a = 0$. Hence, in the $\varepsilon, \delta \to 0$ limit, both extreme policies are sustained by opportunity narratives, and they must generate the same net anticipatory utility:

$$1 - \alpha(1-\mu) - C(1 - \tfrac{1}{2}) = 1 - (1-\alpha)(1-\mu) - C(0 - \tfrac{1}{2})$$

which holds if and only if $\alpha = 1/2$.

Thus, when the set of feasible three-node DAGs is unrestricted, the set $Q$ is rich and the cost $C$ is low, the narratives that prevail in equilibrium are opportunity narratives, and they sustain extreme policies. Surprisingly, the opportunity narrative that sustains an extreme right (left) policy employs the same third variable that was employed by the equilibrium lever narrative that sustained the left (right) policy in Section 4.3. We saw an inkling of this effect in the illustrative example of Section 3: the same variable can feature in narratives that support radically different policies; what changes is the role that this variable plays in the narrative's causal structure.

6 Conclusion
The model presented in this paper formalized a number of intuitions regarding the role of narratives in the formation of popular political opinions. Our model was based on two main ideas.
What are narratives and how do they shape beliefs?
In our model, narratives are formalized as causal models (represented by DAGs) that describe how actions map into consequences. Different narratives employ different intermediate variables and arrange them differently in the causal scheme. Narratives shape beliefs in the sense that beliefs emerge from fitting causal models to long-run correlations between the variables that appear in the narrative. These beliefs are used to evaluate policies.
How does the public select between competing narratives?
Our behavioral assumption was that in the presence of conflicting narrative-policy pairs, the public (a representative agent in this paper) selects between them "hedonically", i.e., according to the anticipatory utility induced by each of these pairs. This is consistent with the basic intuition that people are drawn to "hopeful" stories.

The main insights that emerged as results of our formalism can be summarized as follows. First, narratives are employed to "sell false hopes": they involve misspecified causal models that generate biased beliefs regarding the consequences of counterfactual policies. Second, the same variable can serve two conflicting narratives with a different causal structure (e.g., "lever narrative" vs. "opportunity narrative") in the service of conflicting policies. Third, multiplicity of dominant narrative-policy pairs can be a fundamental property of long-run equilibrium in the "battle over public opinion". Indeed, growing popularity of one policy can strengthen the appeal of a narrative that supports an opposing policy. This "diminishing returns" property leads to additional properties of equilibrium (uniqueness, centrist bias) in specific settings. Finally, when we rule out narratives that convey false beliefs regarding the status quo, linear narratives are without loss of generality.

Our analysis leaves a number of open technical problems. First, Section 4.3 provided a complete equilibrium characterization for perfect DAGs and rich $Q$ in the case of $n = 3$. We also know that for $n = 4$, equilibrium narratives have the longer linear form $a \to x_1 \to x_2 \to y$. Naturally, we conjecture that for general $n$, prevailing narratives are linear chains of length $n$. But what are the conditional beliefs over consequences that these prevailing narratives induce?
Finally, the case of general $n$ and an unrestricted set of feasible DAGs (including imperfect ones) is almost entirely open; the only analysis we have been able to carry out for this domain is the $n = 3$ example of Section 5. A broad question that is common to these two cases is whether our definition of equilibrium generates a force that favors narratives that involve many variables.

References

[1] Akerlof, G. and W. Dickens (1982), The economic consequences of cognitive dissonance, American Economic Review 72, 307-319.
[2] Benabou, R., A. Falk and J. Tirole (2018), Narratives, Imperatives and Moral Reasoning, NBER Working Paper No. 24798.
[3] Brunnermeier, M. and J. Parker (2005), Optimal Expectations, American Economic Review 95, 1092-1118.
[4] Cowell, R., P. Dawid, S. Lauritzen and D. Spiegelhalter (1999), Probabilistic Networks and Expert Systems, Springer, London.
[5] Esponda, I. and D. Pouzo (2016), Berk–Nash Equilibrium: A Framework for Modeling Agents with Misspecified Models, Econometrica 84, 1093-1130.
[6] Esponda, I. and D. Pouzo (2017), Retrospective Voting and Party Polarization, International Economic Review, forthcoming.
[7] Levy, G. and R. Razin (2018), An Explanation-Based Approach to Combining Forecasts, mimeo.
[8] Montiel Olea, J., P. Ortoleva, M. Pai and A. Prat (2018), Competing Models, mimeo.
[9] Pearl, J. (2009), Causality: Models, Reasoning and Inference, Cambridge University Press, Cambridge.
[10] Shiller, R. (2017), Narrative Economics, American Economic Review …
[11] … Causal Models: How People Think about the World and its Alternatives, Oxford University Press.
[12] Sloman, S. and D. Lagnado (2015), Causality in Thought, Annual Review of Psychology 66, 223-247.
[13] Schnellenbach, J. and D. Schubert (2015), Behavioral Political Economy: A Survey, European Journal of Political Economy 40, 395-417.
[14] Spiegler, R. (2008), On Two Points of View Regarding Revealed Preferences and Behavioral Economics, in The Foundations of Positive and Normative Economics, Oxford University Press, 95-115.
[15] Spiegler, R. (2013), Placebo Reforms, American Economic Review …
… Quarterly Journal of Economics …
… Journal of the European Economic Association, forthcoming.
Appendix: Proofs
Proof of Proposition 1
Consider an auxiliary two-player game. Player 1's strategy space is $D$, and $\alpha$ denotes an element in this space. Player 2's strategy space is $\Delta(Q \times \mathcal{R} \times D)$, and $\beta$ denotes an element in this space. Observe that when we fix $\alpha$ and $\mu$, an element $q \in Q$ unambiguously induces an element $p_q \in P_{\alpha,\mu}$.

The payoff of player 2 from the strategy profile $(\alpha, \beta)$ is

$$\sum_{(q,R,d)} \beta(q, R, d)\, U((p_q, R), d \mid \alpha)$$

Note that since $p_R$ is a continuous function of $\alpha$, so is $U$. The payoff of player 1 from $(\alpha, \beta)$ is

$$-\Big(\alpha - \sum_{(q,R,d)} \beta(q, R, d)\, d\Big)^2$$

A Nash equilibrium in this auxiliary game is equivalent to our notion of equilibrium. The strategy spaces and payoff functions of the two players in the auxiliary game satisfy standard conditions for the existence of Nash equilibrium.
Proof of Proposition 2
The proof proceeds in three main steps.
Step 1: Deriving an auxiliary “clique factorization” formula
Consider a non-linear perfect DAG $(N, R)$, where $N = \{1, \ldots, n\}$, $n > 2$. We say that a subset of nodes $C \subseteq N$ is a clique if for every distinct $i, j \in C$, $iRj$ or $jRi$. We say that a clique is maximal if it is not contained in another clique. Let $\mathcal{C}$ be the collection of maximal cliques in the DAG.

The following is standard material in the Bayesian-Networks literature. Because $(N, R)$ is perfect, we can construct an auxiliary (non-directed) tree whose set of nodes is $\mathcal{C}$, such that for every pair of nodes $C$ and $C'$ in this tree, $C \cap C'$ is contained in any $C''$ that lies along the path that connects $C$ and $C'$ (the path is unique, by the definition of a tree). Such a tree is referred to in the literature as a junction tree. Given a junction tree, we say that $S \subseteq N$ is a separator if there are two adjacent tree nodes $C$ and $C'$ such that $S = C \cap C'$. Let $\mathcal{S}$ be the set of separators for a given junction tree constructed from $\mathcal{C}$. Then, for any distribution $p' \in \Delta(X)$ with full support that is consistent with $(N, R)$ (i.e., in the sense that $p'_R = p'$),

$$p'(x) = \frac{\prod_{C \in \mathcal{C}} p'(x_C)}{\prod_{S \in \mathcal{S}} p'(x_S)}$$

For an exposition of these results, see Cowell et al. (1999), pp. 52-69.

Now, our objective distribution $p$ is not necessarily consistent with $R$. However, $p_R$ is consistent with $R$ by definition. Furthermore, a key feature of perfect DAGs is that they do not distort the marginal distributions over cliques, i.e., $p_R(x_C) \equiv p(x_C)$ for every $C \in \mathcal{C}$ (see Spiegler (2017) for further details). It follows that for every objective distribution $p$ and a perfect DAG $(N, R)$, we can write

$$p_R(x) \equiv \frac{\prod_{C \in \mathcal{C}} p(x_C)}{\prod_{S \in \mathcal{S}} p(x_S)} \tag{19}$$

where $\mathcal{C}$ is the set of maximal cliques in $(N, R)$ and $\mathcal{S}$ is the set of separators in some junction tree constructed out of $\mathcal{C}$.

Let $C_1, C_m \in \mathcal{C}$ be two cliques in $(N, R)$ that include the nodes $1$ and $n$, respectively.
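Formula (19) can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it uses a hypothetical six-node perfect DAG in which $1 \to 2$ and every node $i \geq 3$ has parents $\{i-2, i-1\}$ (so its maximal cliques are $\{1,2,3\}, \{2,3,4\}, \{3,4,5\}, \{4,5,6\}$, arranged in a chain-shaped junction tree), binary variables, and a randomly drawn full-support objective distribution $p$. It compares the DAG factorization $p_R(x) = \prod_i p(x_i \mid x_{R(i)})$ with the clique-separator formula.

```python
import random
from itertools import product

random.seed(0)
N = 6
STATES = list(product((0, 1), repeat=N))          # x = (x_1, ..., x_6)
weights = [random.random() + 0.1 for _ in STATES]  # strictly positive: full support
total = sum(weights)
p = {s: w / total for s, w in zip(STATES, weights)}


def marginal(nodes, x):
    """p(x_S) for the 1-indexed node set S, evaluated at the full state x."""
    return sum(v for s, v in p.items()
               if all(s[i - 1] == x[i - 1] for i in nodes))


# Hypothetical perfect DAG: any two parents of a node are linked.
parents = {1: [], 2: [1], 3: [1, 2], 4: [2, 3], 5: [3, 4], 6: [4, 5]}
cliques = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}, {4, 5, 6}]  # junction tree: a chain
separators = [{2, 3}, {3, 4}, {4, 5}]                   # adjacent intersections


def p_R(x):
    """DAG factorization: prod_i p(x_i | x_R(i)) with conditionals taken from p."""
    out = 1.0
    for i, pa in parents.items():
        out *= marginal([i] + pa, x) / marginal(pa, x)
    return out


def clique_formula(x):
    """Right-hand side of (19): prod_C p(x_C) / prod_S p(x_S)."""
    num = 1.0
    for c in cliques:
        num *= marginal(c, x)
    den = 1.0
    for s in separators:
        den *= marginal(s, x)
    return num / den


max_err = max(abs(p_R(x) - clique_formula(x)) for x in STATES)
print(max_err)  # only floating-point noise
```

The two expressions agree state by state because every conditional in the DAG factorization is a ratio of clique and separator marginals of $p$, which is exactly the content of (19) for perfect DAGs.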
Furthermore, for a given junction-tree representation of the DAG, select these cliques to be minimally distant from each other, i.e., $1, n \notin C$ for every $C$ that lies strictly between $C_1$ and $C_m$ along the junction-tree path.

If $C_1 = C_m$, then by our earlier observation that perfect DAGs do not distort the marginals of collections of variables that form a clique, it follows that $p_R(x_1, x_n) \equiv p(x_1, x_n)$ and therefore $p_R(x_n \mid x_1) \equiv p(x_n \mid x_1)$, i.e., we can replace the original DAG with the degenerate linear DAG $1 \to n$ and obtain the same subjective conditional distribution over $x_n$. The case in which there is no junction-tree path between $C_1$ and $C_m$ is trivial as well: then $x_1 \perp x_n$ according to $p_R$, and therefore $p_R(x_n \mid x_1) \equiv p_R(x_n) \equiv p(x_n)$.

Thus, from now on, assume that $C_1 \neq C_m$ and that there is a junction-tree path between $C_1$ and $C_m$. Enumerate all the nodes in the junction tree and turn it into a directed tree, such that $C_1$ is its root node. For every $k = 2, \dots, |\mathcal{C}|$, let $pa(k)$ denote the index of the direct parent of $C_k$, i.e., the junction tree has a direct link $C_{pa(k)} \to C_k$. In particular, let $C_1, C_2, \dots, C_m$
be the tree nodes along the path between $C_1$ and $C_m$, such that this path is $C_1 \to C_2 \to \cdots \to C_m$. By the definition of a junction tree, if $i \in C_k, C_j$ for some $1 \le k < j \le m$, then $i \in C_h$ for every $h = k+1, \dots, j-1$. And since the cliques $C_1, \dots, C_m$ are maximal, it follows that every $C_k$ along the sequence $C_1, \dots, C_m$ must introduce at least one element $i \notin \cup_{j<k} C_j$. [...] and obtain the following equivalent formula:

$$p_R(x) \equiv p(x_{C_1}) \cdot \prod_{k=2}^{|\mathcal{C}|} p\big(x_{C_k - C_{pa(k)}} \mid x_{C_k \cap C_{pa(k)}}\big)$$

Furthermore, by the definition of the junction tree, for every $k > m$, $C_k - C_{pa(k)}$ and $C^* = C_1 \cup \cdots \cup C_m$ are mutually disjoint. Therefore,

$$p_R(x_{C^*}) \equiv p(x_{C_1}) \prod_{k=2}^{m} p\big(x_{C_k - C_{k-1}} \mid x_{C_k \cap C_{k-1}}\big) \qquad (20)$$

Step 2: Obtaining a linear-DAG factorization

We begin this step by deriving the subjective conditional probability $p_R(x_n \mid x_1)$ from (20). Recall that from the definition of $C_1$ and $C_m$ it follows that $1 \in C_1$, $n \in C_m$, and $1, n \notin C_k$ for every $k = 2, \dots, m-1$. Denote $C_0 = \{1\}$ and observe that $p(x_{C_1}) = p(x_1)\, p(x_{C_1 - \{1\}} \mid x_1)$. Then,

$$p_R(x_n \mid x_1) = \sum_{x_{C^* - \{1,n\}}} \prod_{k=1}^{m} p\big(x_{C_k - C_{k-1}} \mid x_{C_k \cap C_{k-1}}\big) \qquad (21)$$

We can draw an immediate conclusion from this formula. Suppose that there is some $i \in C^* - \{1, n\}$ such that $i \in C_k$ for a unique $k \in \{1, \dots, m\}$. Then, the variable $x_i$ appears in only one term in (21), namely $p(x_{C_k - C_{k-1}} \mid x_{C_k \cap C_{k-1}})$. Moreover, by assumption, $i \in C_k - C_{k-1}$. Therefore, we can rewrite this term as follows:

$$p\big(x_{C_k - C_{k-1}} \mid x_{C_k \cap C_{k-1}}\big) = p\big(x_{C_k - (C_{k-1} \cup \{i\})} \mid x_{C_k \cap C_{k-1}}\big)\, p\big(x_i \mid x_{C_k - \{i\}}\big)$$

This means we can rewrite $p_R(x_n \mid x_1)$ as follows:

$$\begin{aligned}
& \sum_{x_{C^* - \{1,n\}}} \prod_{h \neq k} p\big(x_{C_h - C_{h-1}} \mid x_{C_h \cap C_{h-1}}\big)\, p\big(x_{C_k - (C_{k-1} \cup \{i\})} \mid x_{C_k \cap C_{k-1}}\big)\, p\big(x_i \mid x_{C_k - \{i\}}\big) \\
&= \sum_{x_{C^* - \{1,n,i\}}} \prod_{h \neq k} p\big(x_{C_h - C_{h-1}} \mid x_{C_h \cap C_{h-1}}\big)\, p\big(x_{C_k - (C_{k-1} \cup \{i\})} \mid x_{C_k \cap C_{k-1}}\big) \sum_{x_i} p\big(x_i \mid x_{C_k - \{i\}}\big) \\
&= \sum_{x_{C^* - \{1,n,i\}}} \prod_{h \neq k} p\big(x_{C_h - C_{h-1}} \mid x_{C_h \cap C_{h-1}}\big)\, p\big(x_{C_k - (C_{k-1} \cup \{i\})} \mid x_{C_k \cap C_{k-1}}\big)
\end{aligned}$$

This is the same formula we would have obtained had we removed $i$ (and the links associated with this node) from the original DAG in the first place.
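The factorizations used in Steps 1 and 2 can be checked numerically. The sketch below is our own illustration, not part of the original proof: it assumes a hypothetical four-node perfect DAG $1 \to 2$, $1 \to 3$, $2 \to 3$, $3 \to 4$ (maximal cliques $C_1 = \{1,2,3\}$ and $C_2 = \{3,4\}$, separator $\{3\}$) and a randomly drawn full-support distribution over binary variables. It compares the Bayesian-network factorization with the clique formula (19), checks that clique marginals are undistorted, evaluates (21), and confirms that summing out node 2, which lies only in $C_1$, leaves the linear DAG $1 \to 3 \to 4$.

```python
import itertools
import math
import random

# Hypothetical perfect DAG (our example, not the paper's): 1->2, 1->3, 2->3, 3->4.
# Node j is stored at 0-based position j-1 of each state tuple.
PARENTS = {0: [], 1: [0], 2: [0, 1], 3: [2]}
CLIQUES = [[0, 1, 2], [2, 3]]  # C_1 = {1,2,3}, C_2 = {3,4}, so m = 2
SEPARATORS = [[2]]             # C_1 ∩ C_2 = {3}

random.seed(0)
STATES = list(itertools.product([0, 1], repeat=4))
raw = [random.random() + 0.1 for _ in STATES]  # +0.1 guarantees full support
total = sum(raw)
P = {s: w / total for s, w in zip(STATES, raw)}


def marg(nodes, x, dist=P):
    """Probability under `dist` that the variables in `nodes` take their values in x."""
    return sum(q for s, q in dist.items() if all(s[i] == x[i] for i in nodes))


def cond(target, given, x, dist=P):
    """Conditional probability dist(x_target | x_given)."""
    return marg(target + given, x, dist) / marg(given, x, dist)


# Bayesian-network factorization: p_R(x) = prod_i p(x_i | x_R(i)).
P_R = {s: math.prod(cond([i], PARENTS[i], s) for i in PARENTS) for s in STATES}


def clique_formula(x):
    """Formula (19): clique marginals divided by separator marginals."""
    num = math.prod(marg(c, x) for c in CLIQUES)
    den = math.prod(marg(sep, x) for sep in SEPARATORS)
    return num / den


def direct(x1, x4):
    """p_R(x_4 | x_1) obtained by marginalizing the full p_R."""
    num = sum(q for s, q in P_R.items() if s[0] == x1 and s[3] == x4)
    den = sum(q for s, q in P_R.items() if s[0] == x1)
    return num / den


def formula21(x1, x4):
    """(21) with C_0 = {1}: sum over x_2, x_3 of p(x_2, x_3 | x_1) p(x_4 | x_3)."""
    return sum(cond([1, 2], [0], (x1, x2, x3, x4)) * cond([3], [2], (x1, x2, x3, x4))
               for x2 in (0, 1) for x3 in (0, 1))


def reduced(x1, x4):
    """Node 2 lies only in C_1; summing it out leaves the linear DAG 1 -> 3 -> 4.
    The x_2 slot is a placeholder that neither factor reads."""
    return sum(cond([2], [0], (x1, 0, x3, x4)) * cond([3], [2], (x1, 0, x3, x4))
               for x3 in (0, 1))


pairs = [(a, b) for a in (0, 1) for b in (0, 1)]
err19 = max(abs(P_R[s] - clique_formula(s)) for s in STATES)
err_marg = max(abs(marg(c, s, P_R) - marg(c, s)) for c in CLIQUES for s in STATES)
err21 = max(abs(direct(a, b) - formula21(a, b)) for a, b in pairs)
err_red = max(abs(direct(a, b) - reduced(a, b)) for a, b in pairs)
print(err19, err_marg, err21, err_red)  # all should be at floating-point noise level
```

Brute-force enumeration is used deliberately: with four binary variables every marginal is a sum over 16 states, so no Bayesian-network library is needed and each identity is checked directly.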
Therefore, without loss of generality, we can assume that every $i \in C^* - \{1, n\}$ belongs to at least two cliques $C_k$, $k = 1, \dots, m$. Furthermore, by the definition of a junction tree, two of these cliques are consecutive, $C_k$ and $C_{k+1}$. In particular, this means that $C_1 - C_2 = \{1\}$, $C_m - C_{m-1} = \{n\}$, and $C_k - C_{k-1} \subseteq C_{k+1} \cap C_k$ for every $k = 1, \dots, m-1$. The latter observation implies that for every $k = 1, \dots, m-1$, $(C_{k+1} \cap C_k) - (C_k - C_{k-1})$ is weakly contained in $C_k \cap C_{k-1}$. Therefore, $p(x_{C_k - C_{k-1}} \mid x_{C_k \cap C_{k-1}}) = p(x_{C_{k+1} \cap C_k} \mid x_{C_k \cap C_{k-1}})$, such that we can replace the term $p(x_{C_k - C_{k-1}} \mid x_{C_k \cap C_{k-1}})$ in (20) with the equivalent term $p(x_{C_{k+1} \cap C_k} \mid x_{C_k \cap C_{k-1}})$. Finally, perform another change in (20), by replacing $p(x_{C_1})$ with the equivalent term $p(x_1)\, p(x_{C_2 \cap C_1} \mid x_1)$. After these changes are performed, (20) is transformed into a Bayesian-network factorization formula with respect to a linear DAG

$$1 \to (C_2 \cap C_1) \to (C_3 \cap C_2) \to \cdots \to (C_m \cap C_{m-1}) \to n$$

This DAG has at most $m + 1 \le n$ nodes.

Step 3: Transforming the intermediate linear-DAG nodes into binary variables

Write $a \equiv x_1$ and $y \equiv x_n$. For every $k = 1, \dots, m-1$, define $z_k = x_{C_{k+1} \cap C_k}$, and let $z^*_k$ be one arbitrary value that the variable $z_k$ can take. (Because $p$ has full support, at least two values of each $z_k$ have positive probability.) Observe that

$$p_R(y \mid a) = \sum_{z_1, \dots, z_{m-1}} p(z_1 \mid a)\, p(z_2 \mid z_1) \cdots p(z_{m-1} \mid z_{m-2})\, p(y \mid z_{m-1})$$

is equal to

$$\sum_{z_1, \dots, z_{k-1}} p(z_1 \mid a) \cdots p(z_{k-1} \mid z_{k-2}) \sum_{z_{k+1}} \Big( \sum_{z_k} p(z_k \mid z_{k-1})\, p(z_{k+1} \mid z_k) \Big) \cdots \sum_{z_{m-1}} p(z_{m-1} \mid z_{m-2})\, p(y \mid z_{m-1})$$

The expression in the large parentheses can be written as

$$p(z_k = z^*_k \mid z_{k-1})\, p(z_{k+1} \mid z_k = z^*_k) + p(z_k \neq z^*_k \mid z_{k-1})\, p(z_{k+1} \mid z_k \neq z^*_k)$$

This is the only place in the formula for $p_R(y \mid a)$ where $z_k$ makes an appearance.
Therefore, without loss of generality, we can transform $z_k$ into a binary variable that takes the value 1 when $z_k = z^*_k$ and the value 0 when $z_k \neq z^*_k$. The distribution $p'$ over $a$, $y$ and the other $m-1$ variables is derived from $p$ via the above series of steps. The requirement that $p'$ has full support is therefore satisfied because $z_k$ takes at least two values.

Missing step in the proof of Proposition 4

Let $R_L: a \to x \to y$. Our objective is to show that

$$p_{R_L}(y=1 \mid a=1) \le \frac{\mu}{\mu + \alpha(1-\mu)}, \qquad p_{R_L}(y=1 \mid a=0) \le \frac{\mu}{\mu + (1-\alpha)(1-\mu)}$$

in the $\delta \to$ [...]. Under $R_L$,

$$p_{R_L}(y=1 \mid a=1) = \sum_{x=0,1} p(x \mid a=1)\, p(y=1 \mid x)$$

Using the notation $p_{ay} \equiv p(x=1 \mid a, y)$, $p_{R_L}(y=1 \mid a=1)$ can be rewritten as

$$\begin{aligned}
&[\mu p_{11} + (1-\mu) p_{10}]\, \frac{\mu[\alpha p_{11} + (1-\alpha) p_{01}]}{(1-\mu)[\alpha p_{10} + (1-\alpha) p_{00}] + \mu[\alpha p_{11} + (1-\alpha) p_{01}]} \\
&+ [1 - \mu p_{11} - (1-\mu) p_{10}]\, \frac{\mu[1 - \alpha p_{11} - (1-\alpha) p_{01}]}{(1-\mu)[1 - \alpha p_{10} - (1-\alpha) p_{00}] + \mu[1 - \alpha p_{11} - (1-\alpha) p_{01}]}
\end{aligned}$$

This expression is a convex combination, with weights $\mu p_{11} + (1-\mu) p_{10}$ and $1 - \mu p_{11} - (1-\mu) p_{10}$, of two expressions,

$$\frac{\mu[\alpha p_{11} + (1-\alpha) p_{01}]}{(1-\mu)[\alpha p_{10} + (1-\alpha) p_{00}] + \mu[\alpha p_{11} + (1-\alpha) p_{01}]} \qquad (22)$$

and

$$\frac{\mu[1 - \alpha p_{11} - (1-\alpha) p_{01}]}{(1-\mu)[1 - \alpha p_{10} - (1-\alpha) p_{00}] + \mu[1 - \alpha p_{11} - (1-\alpha) p_{01}]} \qquad (23)$$

Suppose (22) is greater than or equal to (23). Then $p_{R_L}(y=1 \mid a=1)$ attains a maximum only if $p_{11} = p_{10} = 1$. Given this, (22) attains a maximum at $p_{01} = 1$ and $p_{00} = 0$. At these values,

$$p_{R_L}(y=1 \mid a=1) = \frac{\mu}{\mu + \alpha(1-\mu)}$$

and indeed, (22) is greater than (23). Using analogous arguments,

$$p_{R_L}(y=1 \mid a=0) \le \frac{\mu}{\mu + (1-\alpha)(1-\mu)}$$

where $p_{01} = p_{00} = p_{11} = 1$ and $p_{10} = 0$ attain this upper bound. ∎

Missing step in the proof of Proposition 6

Let $R_o: a \to y \leftarrow x$. Our objective is to show that

$$p_{R_o}(y=1 \mid a=1) \le 1 - \alpha(1-\mu), \qquad p_{R_o}(y=1 \mid a=0) \le 1 - (1-\alpha)(1-\mu)$$

in the $\delta \to$ [...]. Under $R_o$,

$$p_{R_o}(y=1 \mid a) = \sum_{x=0,1} p(x)\, p(y=1 \mid a, x)$$

Denote $p_{ay} \equiv p(x=1 \mid a, y)$.
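Then $p_{R_o}(y=1 \mid a=1)$ is equal to

$$\begin{aligned}
&[\alpha\mu p_{11} + \alpha(1-\mu) p_{10} + (1-\alpha)\mu p_{01} + (1-\alpha)(1-\mu) p_{00}]\, \frac{\mu\alpha p_{11}}{\alpha[\mu p_{11} + (1-\mu) p_{10}]} \\
&+ [\alpha\mu(1-p_{11}) + \alpha(1-\mu)(1-p_{10}) + (1-\alpha)\mu(1-p_{01}) + (1-\alpha)(1-\mu)(1-p_{00})]\, \frac{\mu\alpha(1-p_{11})}{\alpha[\mu(1-p_{11}) + (1-\mu)(1-p_{10})]}
\end{aligned}$$

which simplifies into

$$\Big[1 + \frac{1-\alpha}{\alpha} \cdot \frac{\mu p_{01} + (1-\mu) p_{00}}{\mu p_{11} + (1-\mu) p_{10}}\Big] \mu\alpha p_{11} + \Big[1 + \frac{1-\alpha}{\alpha} \cdot \frac{\mu(1-p_{01}) + (1-\mu)(1-p_{00})}{\mu(1-p_{11}) + (1-\mu)(1-p_{10})}\Big] \mu\alpha(1-p_{11}) \qquad (24)$$

Note that (24) is equal to $\mu\alpha + \mu(1-\alpha)E$, where $E$ is a convex combination, with weights $p_{11}$ and $1 - p_{11}$, of two expressions,

$$\frac{\mu p_{01} + (1-\mu) p_{00}}{\mu p_{11} + (1-\mu) p_{10}} \qquad (25)$$

and

$$\frac{\mu(1-p_{01}) + (1-\mu)(1-p_{00})}{\mu(1-p_{11}) + (1-\mu)(1-p_{10})} \qquad (26)$$

Suppose (25) is greater than or equal to (26). Then (24) attains a maximum only if $p_{11} = 1$. Given this, (25) attains a maximum at $p_{01} = p_{00} = 1$ and $p_{10} = 0$. Plugging these values into (24) gives

$$p_{R_o}(y=1 \mid a=1) = 1 - \alpha(1-\mu)$$

and (25) is greater than (26). By analogous arguments,

$$p_{R_o}(y=1 \mid a=0) \le 1 - (1-\alpha)(1-\mu)$$

and $p_{01} = p_{11} = p_{10} = 1$, $p_{00} = 0$ attain this upper bound. ∎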
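The closed-form values in the proofs of Propositions 4 and 6 can be spot-checked by computing the subjective conditionals directly from a joint distribution. This sketch rests on our reading of the formulas above: objectively, $a$ and $y$ are independent with $p(a=1) = \alpha$ and $p(y=1) = \mu$, and $p_{ay} = p(x=1 \mid a, y)$; the $\delta$-perturbation is ignored, and $\alpha = 0.3$, $\mu = 0.4$ are arbitrary illustrative values. It evaluates $p_{R_L}(y=1 \mid a)$ and $p_{R_o}(y=1 \mid a)$ at the maximizing parameters stated in the text and compares them with the bounds; it does not by itself show that the bounds hold at other parameter values.

```python
from itertools import product

IDX = {"a": 0, "y": 1, "x": 2}


def joint(alpha, mu, p):
    """Objective p(a, y, x). Assumption: a and y independent, p(a=1) = alpha,
    p(y=1) = mu, and p[(a, y)] = p_{ay} = p(x=1 | a, y)."""
    dist = {}
    for a, y, x in product((0, 1), repeat=3):
        pa = alpha if a else 1 - alpha
        py = mu if y else 1 - mu
        px = p[(a, y)] if x else 1 - p[(a, y)]
        dist[(a, y, x)] = pa * py * px
    return dist


def prob(dist, **event):
    """Probability of the event given as keyword values for a, y and/or x."""
    return sum(q for s, q in dist.items()
               if all(s[IDX[k]] == v for k, v in event.items()))


def chain(dist, a):
    """p_{R_L}(y=1 | a) for R_L: a -> x -> y, i.e. sum_x p(x | a) p(y=1 | x)."""
    total = 0.0
    for x in (0, 1):
        w = prob(dist, a=a, x=x) / prob(dist, a=a)
        if w > 0:  # w > 0 implies p(x) > 0, so p(y=1 | x) is well defined
            total += w * prob(dist, x=x, y=1) / prob(dist, x=x)
    return total


def collider(dist, a):
    """p_{R_o}(y=1 | a) for R_o: a -> y <- x, i.e. sum_x p(x) p(y=1 | a, x)."""
    total = 0.0
    for x in (0, 1):
        pax = prob(dist, a=a, x=x)
        if pax > 0:  # skip terms where p(y | a, x) is undefined
            total += prob(dist, x=x) * prob(dist, a=a, x=x, y=1) / pax
    return total


alpha, mu = 0.3, 0.4  # arbitrary illustrative values
# Maximizers stated in the text, written as {(a, y): p_{ay}}.
rl_a1 = chain(joint(alpha, mu, {(1, 1): 1, (1, 0): 1, (0, 1): 1, (0, 0): 0}), 1)
rl_a0 = chain(joint(alpha, mu, {(0, 1): 1, (0, 0): 1, (1, 1): 1, (1, 0): 0}), 0)
ro_a1 = collider(joint(alpha, mu, {(1, 1): 1, (0, 1): 1, (0, 0): 1, (1, 0): 0}), 1)
ro_a0 = collider(joint(alpha, mu, {(0, 1): 1, (1, 1): 1, (1, 0): 1, (0, 0): 0}), 0)

print(rl_a1, mu / (mu + alpha * (1 - mu)))
print(rl_a0, mu / (mu + (1 - alpha) * (1 - mu)))
print(ro_a1, 1 - alpha * (1 - mu))
print(ro_a0, 1 - (1 - alpha) * (1 - mu))
```

Each printed pair should agree up to floating-point error. Because these maximizers put zero probability on some cells, the helpers skip terms whose conditioning event has probability zero; under full support (any $\delta$-perturbation) no term is skipped.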