Improving Information from Manipulable Data∗

Alex Frankel†    Navin Kartik‡

April 7, 2020
Abstract
Data-based decisionmaking must account for the manipulation of data by agents who are aware of how decisions are being made and want to affect their allocations. We study a framework in which, due to such manipulation, data becomes less informative when decisions depend more strongly on data. We formalize why and how a decisionmaker should commit to underutilizing data. Doing so attenuates information loss and thereby improves allocation accuracy.
JEL Classification: C72; D40; D82
Keywords: Gaming; Goodhart's Law; Strategic Classification

∗We thank Ian Ball, Ralph Boleslavsky, Max Farrell, Pepe Montiel Olea, Canice Prendergast, Robert Topel, and conference audiences for helpful comments. Bruno Furtado and Suneil Parimoo provided excellent research assistance.
†University of Chicago Booth School of Business; [email protected].
‡Columbia University, Department of Economics; [email protected].

1. Introduction

In various situations an agent receives an allocation based on some prediction about her characteristics, and the prediction relies on data generated by the agent's own behavior. Firms use a consumer's web browsing history for price discrimination or ad targeting; a prospective borrower's loan decision and interest rate depend on her credit score; and web search rankings take as input a web site's own text and metadata. In all these settings, agents who understand the prediction algorithm can alter their behavior to receive a more desirable allocation. Consumers can adjust browsing behavior to mimic those with low willingness to pay; borrowers can open or close accounts to improve their credit score; and web sites can perform search engine optimization to improve their rankings. How should a designer account for manipulation when setting the allocation rule?

First consider a naive designer who is unaware of the potential for manipulation. Before implementing an allocation rule, the designer gathers data generated by agents and estimates their types (the relevant characteristics). The naive allocation rule assigns each agent the allocation that is optimal according to this estimate. But after the rule is implemented, agents' behavior changes: if agents with "higher observables" x receive a "higher allocation" y under the allocation rule Y(x), and if agents prefer higher allocations, then some agents will find ways to game the rule by increasing their x.
In line with Goodhart's Law, the original estimation is no longer accurate.

A more sophisticated designer realizes that behavior has changed, gathers new data, and re-estimates the relationship between observables and type. After the designer updates the allocation rule based on the new prediction, agent behavior changes once again. The designer might iterate to a fixed point: an allocation rule that is a best response to the data that is generated under this very rule. But the resulting allocation need not match the desired agent characteristics well.

The question of this paper is how a designer with commitment power—a Stackelberg leader—should adjust a fixed-point allocation rule in order to improve the accuracy of the allocation. We find that a designer should make the allocation rule less sensitive to manipulable data than under the fixed point. In other words, the designer should "flatten" the allocation rule. Flattening the allocation results in ex-post suboptimality; the designer has committed to "underutilizing" agents' data. Fixed-point allocations, by contrast, are ex-post optimal. However, a flatter allocation rule reduces manipulation, which makes the data more informative about agents' types. Allocation accuracy improves on balance. We develop and explore this logic in what we believe is a compelling model of information loss due to manipulation.

By way of background, note that in some environments, manipulation does not lead to information loss: fixed-point rules deliver the designer's full-information outcome. To see this, think of a fixed-point rule as corresponding to the designer's equilibrium strategy in a signaling game in which the designer and agent best respond to each other.
Under a standard single-crossing condition à la Spence (1973)—the designer wants to give more desirable allocations to agents with higher types, and higher types have lower marginal costs of taking higher observable actions—this signaling game has a fully separating equilibrium, i.e., one in which the designer perfectly matches the agent's allocation to her type. Even with commitment power, a designer cannot improve accuracy by departing from the corresponding allocation rule.

To introduce information loss, we build on a framework first presented by Prendergast and Topel (1996). The designer learns about an agent's type by observing data the agent generates: her action x ∈ R. Agents are heterogeneous on two dimensions of their types, what we call natural action and gaming ability. The designer is only interested in the natural action η ∈ R, which determines the agent's action x absent any manipulation. Gaming ability γ ∈ R summarizes how much an agent manipulates x in response to incentives. When drawing inferences from the action x, the designer's information about the agent's natural action η is "muddled" with that about gaming ability γ (Frankel and Kartik, 2019). We assume the designer observes x and chooses an allocation y = Y(x) ∈ R with the goal of minimizing the quadratic distance between y and η. We focus on linear allocation rules or policies Y(x) = βx + β₀, and we posit that agents adjust their observable x in proportion to γβ—their gaming ability times the sensitivity of allocations to observables. These linear functional forms arise in the linear-quadratic signaling models of Fischer and Verrecchia (2000) and Bénabou and Tirole (2006), among others.

Our main result establishes that the commitment-optimal policy is less sensitive to observables than is the fixed-point policy. (For this introduction, suppose there is a unique fixed point unless indicated otherwise.)
Mathematically, for policies Y(x) = βx + β₀, it is optimal for the designer to depart from the fixed point with sensitivity β > 0 by attenuating that coefficient towards zero. Information is underutilized in the sense that, given the data generated by agents in response to this optimal policy, the designer would ex-post benefit from using a higher β. For instance, suppose the sensitivity of the naive policy is β = 1: when the designer does not condition the allocation on observables, the linear regression coefficient of type η on observable x is 1, and the naive designer responds by matching her allocation rule's sensitivity to this regression coefficient. The fixed-point policy may then have some sensitivity β^fp ∈ (0, 1): when the designer sets β = β^fp and runs a linear regression of η on x using data generated by the agent in response to β^fp, the regression coefficient is again β^fp. Our result is that the optimal policy has sensitivity β* ∈ (0, β^fp). After the designer sets β*, however, the corresponding linear regression coefficient is larger than β*. We emphasize that our argument for shrinking regression coefficients is driven by the informational benefit from reduced manipulation, and in turn, the resulting improvement in allocations. It is orthogonal to concerns about model overfitting.

[Footnotes: Subsection 2.1 microfounds such agent behavior. Subsection 4.2 discusses optimality of linear allocation rules. Note that, following a common abuse of terminology, we say "linear" instead of the mathematically more precise "affine".]

In comparing our commitment solution with the fixed-point benchmark, it is helpful to keep in mind two distinct interpretations of the fixed point. The first concerns a designer who has market power in the sense that agents adjust their manipulation behavior in response to this designer's policies. Think of web sites engaging in search engine optimization to specifically improve their Google rankings; third party sellers paying for fake reviews on the Amazon platform; or citizens trying to game an eligibility rule for a targeted government policy. In these cases the designer may settle on a fixed point by iterating policies until reaching an ex post optimum. Our paper highlights that this fixed point may yet be suboptimal ex ante, and offers the prescriptive advice of flattening the allocation rule.

A second perspective is that the fixed-point policy represents the outcome of a competitive market.
With many banks, any one bank that uses credit information in an ex-post suboptimal manner will simply be putting itself at a disadvantage to its competitors; similarly for colleges using SAT scores for admissions. So the fixed point becomes a descriptive prediction of the market outcome, i.e., the equilibrium of a signaling game. In that case, our optimal policy suggests a government intervention to improve allocations, or a direction that collusion might take.

Before turning to the related literature, we stress two points about our approach. First, our paper aims to formalize a precise but ultimately qualitative point, and make salient its logic. Our model is deliberately stylized and, we believe, broadly relevant for many applications. But it is not intended to capture the details of any specific one. We hope that it will be useful for particular applications either as a building block or even simply as a benchmark for thinking about positive and normative implications. Second, we view our main result—the commitment policy flattens fixed points and underutilizes data—as intuitive once one understands the logic of our environment. Indeed, there is a simple first-order gain vs. second-order loss intuition for a local improvement from flattening a fixed point; see the discussion after Proposition 1. Confirming that the result holds for the global optimum is not straightforward, however; among other things, there can be multiple fixed points.

Related Literature.
There are many settings in economics in which a designer commits to making ex-post suboptimal allocations in order to improve ex-ante incentives on some dimension. Our specific interest in this paper is in a canonical problem of matching allocations to unobservables in the presence of strategic manipulation. In this context, we study a simple model in which there is a benefit of committing to distortions in order to improve the ex-ante accuracy of the allocations.

Building on intuitions from Prendergast and Topel (1996), Fischer and Verrecchia (2000), and Bénabou and Tirole (2006), Frankel and Kartik (2019) elucidate general conditions under which an agent's action becomes less informative to an observer when the agent has stronger incentives to manipulate. None of these papers model the allocation-accuracy problem we study here; the latter three papers do not study commitment either. Notwithstanding, our designer faces the following tradeoff suggested by the intuitions in those papers: making allocations more responsive to an agent's data amplifies the agent's manipulation, which makes the data less informative, reducing the optimal responsiveness for allocation accuracy.

At a very broad level, our main result that the designer should flatten allocations relative to the fixed-point rule is reminiscent of the "downward distortion" of allocations in screening problems following Mussa and Rosen (1978). That said, our framework, analysis, and emphasis—on manipulation and information loss, allocation accuracy, contrasting commitment with fixed points—are not readily comparable with that literature. One recent paper on screening we highlight is Bonatti and Cisternas (2019). In a dynamic price discrimination problem, they show that short-lived firms get better information about long-lived consumers' types—resulting in higher steady-state profits—if a designer reveals a statistic that underweights recent consumer behavior.
Suitable underweighting dampens consumer incentives to manipulate demand.

A finance literature addresses the difficulty of using market activity to learn fundamentals when participants have manipulation incentives. Again in models very different from ours, some papers highlight benefits of committing to underutilizing information. See, for example, Bond and Goldstein (2015) and Boleslavsky, Kelly, and Taylor (2017). These authors study trading in the shadow of a policymaker who may intervene after observing prices or order flows. The anticipation of intervention makes the financial market less informative about a fundamental to which the intervention should be tailored. Both papers establish that the policymaker may benefit from a commitment that, in some sense, entails underutilization of information. In particular, Bond and Goldstein (2015, Proposition 2) highlight a local first-order information benefit vs. second-order allocation loss akin to our Lemma 1. Unlike us, they do not study global optimality.

A number of papers in economics study the design of testing regimes and other instruments to improve information extraction. Recent examples include Harbaugh and Rasmusen (2018) on pooling test outcomes to improve voluntary participation, Perez-Richet and Skreta (2018) on the benefits of noisy tests when agents can manipulate the test, and Martinez-Gorricho and Oyarzun (2019) on using "conservative" (or "confirmatory") thresholds to mitigate manipulation. Jann and Schottmüller (2018), Ali and Bénabou (2019), and Frankel and Kartik (2019) analyze how hiding information about agents' actions—increasing privacy—can improve information about their characteristics.

Beyond economics, our paper connects to a recent computer science literature studying classification algorithms in the presence of strategic manipulation.
See, among others, Hardt, Megiddo, Papadimitriou, and Wootters (2016), Hu, Immorlica, and Vaughan (2019), Milli, Miller, Dragan, and Hardt (2018), and Kleinberg and Raghavan (2019). In a binary strategic classification problem, Braverman and Garg (2019) argue for random allocations to improve allocation accuracy and reduce manipulation costs.

We would like to reiterate that our designer is only interested in allocation accuracy, not directly the costs of manipulation. Moreover, unlike Kleinberg and Raghavan (2019), we model an agent's manipulation effort as pure "gaming": it does not provide desirable output or affect the designer's preferred allocation. By contrast to us, principal-agent problems in economics often focus on how allocation rules interact with incentives for desirable effort. For instance, Prendergast and Topel (1996) study contracts in which incentivizing worker effort provides a firm worse information about the worker's match quality because of an intermediary's favoritism. In a multitasking environment, Ederer, Holden, and Meyer (2018) study how opacity in incentive schemes can deter gaming of performance measures. Closest to our paper, Ball (2020) studies the scoring of strategic agents with multidimensional data. He compares his analog of our commitment solution with both his scoring and fixed-point solutions. Similar to us, he finds that under certain conditions, his commitment solution is less responsive to all of an agent's features than the (unique, under his assumptions) fixed-point solution.

[Footnote: Eliaz and Spiegler (2019) explore the distinct issue of an agent's incentives to reveal her own data to a "non-Bayesian statistician" making predictions about her.]
2. Model
An agent has a type (η, γ) ∈ R² drawn from a joint distribution F. It may be helpful to remember the mnemonics η for natural action and γ for gaming ability; see Subsection 2.1. Assume the variances Var(η) = σ_η² and Var(γ) = σ_γ² are positive and finite. Denote the means of η and γ by µ_η and µ_γ, and assume their correlation ρ ≡ Cov(η, γ)/(σ_η σ_γ) lies in (−1, 1).

A designer seeks to match an allocation y ∈ R to η, with a quadratic loss of (y − η)². The designer chooses y = Y(x) as a function of an observed action x ∈ R that is chosen by the agent. Thus, the designer's welfare loss is

Welfare Loss ≡ E[(Y(x) − η)²].   (1)

The agent chooses x as a function of her type (η, γ) after observing the allocation rule Y. In a manner detailed later, the agent will have an incentive to choose a higher x to obtain a higher y. Given a strategy of the agent, the designer can compute the distribution of x and E[η | x] for any x the agent may choose. A standard decomposition is

Welfare Loss = E[(E[η | x] − η)²] + E[(Y(x) − E[η | x])²],   (2)

where the first term is the information loss from estimating η using x, and the second is the misallocation loss given that estimation. Holding fixed the agent's strategy, it is "ex-post optimal" for the designer to set Y(x) = E[η | x]. However, the agent's strategy responds to Y. So the designer may prefer to use an ex-post suboptimal allocation rule to improve her estimation of η from x, as seen in the first term of (2). That is, the designer may benefit from the power to commit to her allocation rule.

[Footnotes: He interprets the aggregator as produced by an intermediary who shares the decisionmaker's interests, but cannot control the decisionmaker's behavior. That is, the intermediary can commit to the aggregation rule but allocations are made optimally given the aggregation. Throughout, we use 'positive' to mean 'strictly positive', and similarly for 'negative', 'larger', and 'smaller'.]
Assume the designer chooses among linear allocation rules: the designer chooses policy parameters (β, β₀) ∈ R² such that

Y(x) = βx + β₀.   (3)

Also assume that, given the designer's policy (β, β₀), the agent chooses x using a linear strategy X_β(η, γ) that takes the form

X_β(η, γ) = η + mβγ   (4)

for some exogenous parameter m > 0. Thus η is the agent's "natural action": the action taken when the designer's policy does not depend on x (i.e., β = 0). The variable γ represents idiosyncratic responsiveness to the designer's policy: a higher γ increases the agent's action from the natural level by more for any β > 0. The parameter m captures a common component of responsiveness across all agent types.

The strategy in Equation 4 can be motivated as the best response for an agent who maximizes mγy − (x − η)²/2. Here m captures the "stakes" that agents face to obtain higher y, and γ is an idiosyncratic marginal benefit. Alternatively, the strategy is also optimal for an agent with γ > 0 who maximizes y − (x − η)²/(2mγ). Here m parameterizes the "manipulability" of the action x, and γ is an agent's idiosyncratic "gaming ability".

The designer commits to her policy (β, β₀), which the agent observes and responds to according to (4). Plugging the rule (3) and the strategy (4) into the welfare loss function (1) yields

Welfare Loss = E[(β(η + mβγ) + β₀ − η)²].

The designer's problem is therefore to choose (β, β₀) to minimize the above loss function, which is quartic in β. We denote the solution as (β*, β₀*).

[Footnote: The right-hand sides of (1) and (2) are equal if E[(Y(x))² − 2ηY(x) + η²] = E[η² − 2ηE[η|x] + (E[η|x])² + (Y(x))² − 2Y(x)E[η|x] + (E[η|x])²]. Canceling out like terms and rearranging, it suffices to show that 2E[(E[η|x] − η)Y(x)] = 2E[(E[η|x] − η)E[η|x]]. This equality holds by the orthogonality condition E[(E[η|x] − η)g(x)] = 0 for all functions g(x).]
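The designer's problem above can be checked by simulation. The following Python sketch is our own illustration (all function and parameter names are ours, and the bivariate normal draws with zero means are an assumption for the example): it evaluates the welfare loss E[(β(η + mβγ) + β₀ − η)²] by Monte Carlo and locates the loss-minimizing slope on a grid.

```python
import math
import random

def welfare_loss(beta, beta0=0.0, m=1.0, sd_eta=1.0, sd_gamma=1.0, rho=0.0,
                 n=30_000, seed=0):
    """Monte Carlo estimate of E[(beta*(eta + m*beta*gamma) + beta0 - eta)^2],
    with (eta, gamma) bivariate normal, zero means, and correlation rho."""
    rng = random.Random(seed)  # fixed seed: common random numbers across beta values
    total = 0.0
    for _ in range(n):
        eta = sd_eta * rng.gauss(0.0, 1.0)
        shock = rng.gauss(0.0, 1.0)
        gamma = sd_gamma * (rho * eta / sd_eta + math.sqrt(1.0 - rho * rho) * shock)
        x = eta + m * beta * gamma      # agent's strategy X_beta(eta, gamma)
        y = beta * x + beta0            # designer's linear allocation rule
        total += (y - eta) ** 2
    return total / n

# With zero means the optimal intercept is beta0 = 0; grid-search the slope.
beta_star = min((b / 100 for b in range(0, 101, 2)), key=welfare_loss)
```

With m = σ_η = σ_γ = 1 and ρ = 0, the grid search lands near β ≈ 0.6, strictly between the constant policy β = 0 and the naive policy β = 1, previewing the attenuation result of Section 3.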
Given the asymmetry between the characteristics η and γ in the agent's strategy (4), it is crucial for our results that the designer seeks to match η rather than γ. The reason is that when the designer's policy puts more weight on the data—when β increases—the agent's action x becomes less informative about η but more informative about γ; Remark 1 below makes this point precise.

It is, on the other hand, straightforward to generalize our analysis to an allocation matching some other characteristic of the agent, τ, that is correlated with η. The assumption we would require is that E[τ | η, γ] is independent of γ and linear in η. The welfare loss E[(Y(x) − τ)²] could then be decomposed as E[(Y(x) − E[τ | η])²] + E[(E[τ | η] − τ)²]. As the second term—the loss from estimating τ using η—is independent of the allocation rule Y(x), it would not affect the designer's choice. The designer would then effectively be trying to match the allocation to a linear function of η.

[Footnote: Using standard mean-variance decompositions, Welfare Loss = (1 − β)²σ_η² + m²β⁴σ_γ² − 2(1 − β)mβ²ρσ_η σ_γ + (β₀ − (1 − β)µ_η + mβ²µ_γ)².]

Regressing η on action x. When the designer uses policy (β, β₀), the agent responds with strategy X_β(η, γ) = η + mβγ. Suppose the designer were to gather data under this agent behavior and then estimate the relationship between the dimension of interest η and the action x. Specifically, let η̂_β(x) denote the best linear estimator of η from x under a quadratic loss objective:

η̂_β(x) ≡ β̂₁(β)x + β̂₀(β),

with β̂₁ and β̂₀ the coefficients of an ordinary least squares (OLS) regression of η on x. Following standard results for OLS,

β̂₁(β) = (σ_η² + mρσ_η σ_γ β) / (σ_η² + m²σ_γ²β² + 2mρσ_η σ_γ β),   (5)

where the right-hand side's numerator is the covariance of x and η given the strategy X_β, and its denominator is the variance of x (which is positive because ρ ∈ (−1, 1)). Correspondingly, β̂₀(β) = µ_η − β̂₁(β)[µ_η + mβµ_γ].

It is useful to further rewrite the welfare loss (2) as follows, for any policy (β, β₀) defining the linear allocation rule Y(x) = βx + β₀:

Welfare Loss = E[(η̂_β(x) − η)²] + E[(Y(x) − η̂_β(x))²],   (6)

where the first term is the information loss from linearly estimating η using x, and the second is the misallocation loss given that linear estimation. Some readers may find it helpful to note that the information loss from estimation (the first term in (6)) is the variance of the residuals in an OLS regression of η on x; put differently, E[(η̂_β(x) − η)²] = σ_η²(1 − R²_{xη}), with R²_{xη} the coefficient of determination in that regression.

[Footnote: This derivation is identical to that in fn. 5, only replacing E[η | x] by η̂_β(x) and applying the orthogonality condition E[(η̂_β(x) − η)g(x)] = 0 for all affine functions g(x).]
We stress that Equation 6 is simply a convenient decomposition; given our focus on linear allocation rules, using OLS entails no restrictions.
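Equation (5) can be checked against a direct regression on simulated data. The Python sketch below is ours (hypothetical names; normal draws, zero means, and unit variances are assumptions for the illustration): it computes the closed-form slope and an OLS slope from simulated agent behavior.

```python
import math
import random

def beta_hat(beta, m=1.0, sd_eta=1.0, sd_gamma=1.0, rho=0.0):
    """Closed-form OLS slope of eta on x = eta + m*beta*gamma, Equation (5)."""
    num = sd_eta ** 2 + m * rho * sd_eta * sd_gamma * beta
    den = (sd_eta ** 2 + m ** 2 * sd_gamma ** 2 * beta ** 2
           + 2 * m * rho * sd_eta * sd_gamma * beta)
    return num / den

def simulated_slope(beta, m=1.0, rho=0.0, n=100_000, seed=1):
    """OLS slope of eta on x in a simulated sample (unit variances, zero means)."""
    rng = random.Random(seed)
    xs, etas = [], []
    for _ in range(n):
        eta = rng.gauss(0.0, 1.0)
        gamma = rho * eta + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        xs.append(eta + m * beta * gamma)   # agent's strategy X_beta
        etas.append(eta)
    mean_x = sum(xs) / n
    mean_eta = sum(etas) / n
    cov = sum((x - mean_x) * (e - mean_eta) for x, e in zip(xs, etas)) / n
    var = sum((x - mean_x) ** 2 for x in xs) / n
    return cov / var
```

With m = 1 and ρ = 0 the formula reduces to β̂₁(β) = 1/(1 + β²), which is decreasing in β on β ≥ 0, as Remark 1 below states.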
Remark 1. For ρ ≥ 0, β̂₁(β) is decreasing on β ≥ 0. To see why, notice that when β increases, the agent's action x depends more on the variable γ. This increases Var(x) and, when ρ ≥ 0, also provides the designer with less information about the variable η that she is trying to estimate from x. Both effects lead to a lower β̂₁. By contrast, if the designer were trying to estimate γ rather than η (minimizing E[(y − γ)²] rather than E[(y − η)²]), then for ρ ≥ 0, the analogous regression coefficient of γ on x need not be decreasing on β ≥ 0.

Constant policy. A rule that does not condition the allocation on the observable corresponds to a constant policy (β, β₀) with β = 0. A constant policy gives rise to a welfare loss of σ_η² + (β₀ − µ_η)². In the decomposition of Equation 6, the entire welfare loss is due to misallocation; the information loss from estimation is zero because the agent's behavior x = η fully reveals the natural action η. Under the constant policy the linear estimator η̂ has coefficients β̂₁(0) = 1 and β̂₀(0) = 0.

Naive.
If the designer uses a constant policy (β, β₀) with β = 0, the agent responds with X₀(η, γ) = η. Suppose the designer gathers data produced from such behavior, and—failing to account for manipulation—expects the agent to maintain this strategy regardless of the policy. Then the designer would (incorrectly) perceive her optimal policy to be (β^n, β₀^n) ≡ (β̂₁(0), β̂₀(0)) = (1, 0).

Designer's best response.
More generally, suppose the designer expects the agent to use the strategy X_β(η, γ) = η + mβγ regardless of her actual policy. The designer would find it optimal in response to set an allocation rule Y(x) equal to the best linear estimator of η from x, i.e., a policy (β̂₁(β), β̂₀(β)) yielding Y(x) = η̂_β(x).

Fixed point.
We say that a policy (β^fp, β₀^fp) is a fixed point if β^fp = β̂₁(β^fp) and β₀^fp = β̂₀(β^fp).

[Footnote: Less information is not generally in the Blackwell (1951) sense unless the prior on (η, γ) is bivariate normal. Rather, it is in the sense of a higher information loss from linearly estimating η using x: E[(η̂_β(x) − η)²] is increasing in β.]
A fixed point corresponds to a Nash equilibrium of a game in which the designer's policy is set simultaneously with the agent's strategy. That is, instead of the designer committing to a policy (the Stackelberg solution), the policy is a best response to the agent's strategy that the policy induces. In the decomposition of Equation 6, a fixed-point policy may have a positive information loss from estimation, but it has zero misallocation loss—the designer is choosing the optimal policy given the information generated by the agent.

Figure 1 illustrates some designer best response functions and fixed points.
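A fixed point can be computed by iterating the designer's best response, mimicking the informal re-estimation story from the introduction. The Python sketch below is our own illustration; convergence of naive iteration is an assumption that holds in the examples we tried, not a general claim.

```python
def beta_hat(beta, m=1.0, sd_eta=1.0, sd_gamma=1.0, rho=0.0):
    """Best-response slope from Equation (5)."""
    num = sd_eta ** 2 + m * rho * sd_eta * sd_gamma * beta
    den = (sd_eta ** 2 + m ** 2 * sd_gamma ** 2 * beta ** 2
           + 2 * m * rho * sd_eta * sd_gamma * beta)
    return num / den

def find_fixed_point(start=1.0, tol=1e-12, max_iter=10_000, **params):
    """Iterate beta <- beta_hat(beta) until (approximately) beta = beta_hat(beta)."""
    beta = start
    for _ in range(max_iter):
        nxt = beta_hat(beta, **params)
        if abs(nxt - beta) < tol:
            return nxt
        beta = nxt
    return beta
```

Starting from the naive policy β = 1 with m = σ_η = σ_γ = 1 and ρ = 0, the iteration converges to the root of β³ + β = 1, about 0.68.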
3. Analysis
We seek to compare the designer's optimal policy (β*, β₀*) with the fixed points (β^fp, β₀^fp). There can, in general, be multiple fixed points, but there is always at least one with a positive sensitivity or weight on the agent's action, i.e., β^fp > 0. Moreover, when there is nonnegative correlation in the agent's characteristics (ρ ≥ 0), there is only one nonnegative fixed point, and it satisfies β^fp ∈ (0, 1). See Proposition C.1 in the Supplementary Appendix.

Take any fixed-point sensitivity β^fp > 0. Our main result is that the optimal policy puts less weight on the agent's action than does the fixed point. Further, the optimal policy underutilizes information by putting less weight on the agent's action than does the OLS coefficient (and hence the best linear policy) given the data generated by the agent in response.

Proposition 1.
There is a unique optimum, (β*, β₀*). It has β* > 0 and β* < β^fp for any fixed point β^fp > 0. Moreover, β̂₁(β*) > β*.

For a concrete example, take m = σ_η = σ_γ = 1 and ρ = 0. Recall that the sensitivity of the naive policy is (normalized to) β = 1. The unique fixed-point policy has β^fp ≈ 0.68. The optimal policy reduces the sensitivity to β* ≈ 0.59. Given the agent's behavior under this policy, the designer would ex post prefer the higher value β̂₁(β*) ≈ 0.74.

[Figure 1 – The best response function β̂₁. Panel (a): σ_η = σ_γ = 1 and m = 1; panel (b): σ_η = σ_γ = 1 and m < 1, with negative values of ρ. As shown in Figure 1a, β̂₁ is decreasing on [0, ∞) when ρ ≥ 0. Figure 1b illustrates that this need not be true when ρ < 0. In all cases, intersections of β̂₁ with the 45° line correspond to fixed points β^fp.]

Here is the intuition for the proposition, as illustrated graphically in Figure 2. Consider a designer choosing β = β^fp > 0. This designer's policy is ex-post optimal in the sense that misallocation loss (the second term in the welfare decomposition (6)) given the information the designer obtains about η is minimized at zero. Adjusting the sensitivity β in either direction from β^fp increases misallocation loss, but this harm is second order because we are starting from a minimum. By contrast, at β = β^fp there is positive information loss from estimation (the first term in (6)) because the agent's action does not reveal η. Lowering β reduces information loss from estimation, which yields a first-order benefit. (The first-order benefit was suggested by Remark 1 for ρ ≥ 0, and the point is general.) Hence, there is a net first-order welfare benefit of lowering β from β^fp. Of course, the designer wouldn't lower β down to 0, since making some use of the information from data is better than not using it at all.
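For the concrete example (m = σ_η = σ_γ = 1, ρ = 0), the ordering in Proposition 1 can be verified numerically. At these parameters the welfare loss with the optimal intercept reduces to L(β) = (1 − β)² + β⁴ and the best response to β̂₁(β) = 1/(1 + β²); the Python sketch below is our own check, not part of the paper's analysis.

```python
def loss(beta):
    """Welfare loss (1 - beta)^2 + beta^4 at m = sd_eta = sd_gamma = 1, rho = 0."""
    return (1.0 - beta) ** 2 + beta ** 4

def beta_hat(beta):
    """Best-response OLS slope 1 / (1 + beta^2) at these parameters."""
    return 1.0 / (1.0 + beta ** 2)

# optimum: fine grid search over [0, 1]
beta_star = min((i / 10_000 for i in range(10_001)), key=loss)

# fixed point: solve beta = beta_hat(beta), i.e. beta^3 + beta - 1 = 0, by bisection
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if mid ** 3 + mid - 1.0 < 0.0:
        lo = mid
    else:
        hi = mid
beta_fp = (lo + hi) / 2

# Proposition 1 ordering: beta_star < beta_fp, yet ex post the designer
# would prefer the steeper slope beta_hat(beta_star) > beta_star.
```

This reproduces β* ≈ 0.59, β^fp ≈ 0.68, and β̂₁(β*) ≈ 0.74.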
The proof of Proposition 1 in Appendix A establishes uniqueness of the global optimum, rules out that it is negative, and shows that it is less than every fixed point β^fp > 0. To formalize a key step—the aforementioned first-order benefit of reducing β from any β^fp—let L(β) be the welfare loss from policy β (paired with the correspondingly optimal β₀), with derivative L′(β).

Lemma 1.
For any β^fp, it holds that L′(β^fp) > 0.

Note that Lemma 1 also applies to negative values of β^fp when those exist.

Remark 2. The welfare gains from commitment can be substantial. For suitable parameters, the unique fixed point's welfare is arbitrarily close to that of the best constant policy Y(x) = µ_η, while the optimal policy's welfare is arbitrarily close to the first best's.

We provide a few comparative statics below. In taking comparative statics, it is helpful to observe that the designer's best response β̂₁(β) defined in Equation 5 depends on parameters m, σ_η, and σ_γ only through the statistic k ≡ mσ_γ/σ_η, as does the welfare loss L(β) divided by σ_η² (see Equation A.1 in Appendix A.1). Therefore, the optimal and fixed-point values β* and β^fp also only depend on these parameters through k. The parameter k summarizes the susceptibility of the allocation problem to manipulation: higher k (arising from higher stakes or manipulability m of the mechanism, greater variance in gaming ability σ_γ², or lower variance in natural actions σ_η²) means that under any fixed policy, agents as a whole adjust their observable action x further from their natural action η, relative to the spread of observables prior to manipulation.

[Footnotes: Indeed, any fixed-point policy itself does better than the best constant policy (β, β₀) = (0, µ_η). Note, however, that this constant policy can be better than the naive policy (β^n, β₀^n) = (1, 0). — The parameters are such that mσ_γ/σ_η → 1/4⁺ and ρ → −1. Both the first-best welfare and the constant policy's are independent of ρ; the former is 0 (by normalization) while the latter is −σ_η², which can be arbitrarily low.]

[Figure 2 – The welfare loss decomposition from Equation 6 for policy (β, β₀), with the optimal β₀ plugged in for each β on the horizontal axis: total loss, information loss from estimation, and misallocation loss, with β^fp, β*, and β^n = 1 marked. Parameters: σ_η = σ_γ = m = 1 and ρ = 0. Numerical solutions: β* ≈ 0.59 and β^fp ≈ 0.68.]
Hence, for comparative statics over model primitives, it is sufficient to consider only the statistic k and the correlation parameter ρ.

Proposition 2.
For k ≡ mσ_γ/σ_η, the following comparative statics hold.

1. As k → ∞, β* → 0; as k → 0, β* → 1. If ρ ≥ 0, then β* is strictly decreasing in k; if ρ < 0, then β* is strictly quasi-concave in k, attaining a maximum at some point.

2. β* is strictly increasing in ρ when k > 3/4, strictly decreasing in ρ when k < 3/4, and independent of ρ when k = 3/4.

3. When ρ = 0, β*/β^fp is strictly decreasing in k, approaching (1/2)^{1/3} ≈ 0.79 as k → ∞ and 1 as k → 0.

Part 1 of the proposition implies that when agents' characteristics are nonnegatively correlated, a designer faced with a more manipulable environment should put less weight on the agents' observable action. While such monotonicity is intuitive, it does not hold when there is negative correlation. Similarly, one might expect greater positive correlation to increase the optimum β*; indeed, Frankel and Kartik (2019, Proposition 4) establish that it does have this effect on the (unique) positive fixed point β^fp > 0. But we see in part 2 of Proposition 2 that this holds for β* only when the susceptibility-to-manipulation statistic k is large enough. Finally, part 3 implies that when the characteristics are uncorrelated, the ratio β*/β^fp decreases as the statistic k increases. As k → 0, the fixed point fully reveals an agent's natural action (β^fp → 1) and so the designer does not benefit from commitment power: the fixed point is optimal as it provides the minimum possible welfare loss. As k → ∞, both β* and β^fp tend to zero yet the ratio β*/β^fp stays bounded.
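For ρ = 0, the objects in part 3 reduce to roots of cubics: β^fp solves k²β³ + β − 1 = 0 and β* solves the first-order condition 2k²β³ + β − 1 = 0 (from minimizing the normalized loss (1 − β)² + k²β⁴). The Python sketch below (ours, for illustration) traces the ratio β*/β^fp as k grows.

```python
def positive_root(a):
    """Unique root in [0, 1] of a*b^3 + b - 1 = 0, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if a * mid ** 3 + mid - 1.0 < 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def beta_fp(k):
    return positive_root(k * k)        # fixed point at rho = 0

def beta_star(k):
    return positive_root(2 * k * k)    # optimum at rho = 0 (first-order condition)

ratios = [beta_star(k) / beta_fp(k) for k in (0.01, 1.0, 1000.0)]
```

The ratio falls monotonically from about 1 toward (1/2)^{1/3} ≈ 0.794, consistent with part 3 of Proposition 2.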
4. Discussion
We have argued that the designer should make allocations less sensitive to data than in fixed points in order to improve information. Of course, information might only affect part of the designer's welfare. Recall that agents shift their action by mβγ away from their natural action. If the designer seeks to induce higher actions—because this corresponds to socially valuable "effort", say—then the designer will want to increase β above β*. If the designer instead wants to reduce signaling costs, she will further weaken manipulation incentives by attenuating β from β* towards zero. Both shifts will harm allocation accuracy, the former by increasing the information loss from estimation and the latter by increasing misallocation loss given estimation.

Another possibility is that the designer seeks to match the allocation to an agent's gaming ability γ instead of her natural action η, perhaps because gaming ability is a skill the designer values. In that case one would expect the sensitivity to data under commitment to be larger than in fixed points; intuitively, increasing manipulation incentives provides less information about η but more information about γ (Frankel and Kartik, 2019).

That the designer uses linear allocation rules is generally restrictive. Gesche (2019) and Frankel and Kartik (2019) have shown that fixing any linear strategy for the agent, the designer's best response is linear if the agent's type distribution is bivariate elliptical (Gómez, Gómez-Villegas, and Marín, 2003), subsuming bivariate normal; see also Fischer and Verrecchia (2000) and Bénabou and Tirole (2006). Hence, under these joint distributions—and when agents optimally respond to linear allocation rules with linear strategies (see Subsection 2.1)—the linear fixed-point policies of the current paper correspond to equilibria of a signaling game. Ball (2020) extends these results to a multidimensional action space.
A not implausible conjecture is that elliptical distributions also ensure optimality of linear allocation rules when the designer can commit.
We have developed our main point about flattening allocation rules in what we believe is a canonical model of information loss from manipulation, one used in a number of aforementioned papers. Information loss in this model stems from agents' heterogeneous responses to incentives. But we believe the underlying logic generalizes to other sources of information loss. For instance, even a model with a one-dimensional type (e.g., no heterogeneity in gaming ability γ) may have information loss from "pooling at the top" in a bounded action space. Appendix D establishes a version of our result for a simple model in that vein.

We conclude by sketching a proposal—estimation with noise—for attenuating the impact of manipulable data in more general allocation problems. Consider an environment in which a designer estimates agent characteristic η from the observation of some x, then assigns allocation y based on both x and the estimate of η. The variable x need not be scalar. As such, the allocation rule need not have any easily interpreted coefficient measuring how "flat" or "steep" it is with respect to manipulable components of x.

To formalize our proposal, let a data set be a joint distribution over (x, η). Let ML be an estimation procedure (e.g., a machine learning algorithm) that takes as input an observable x and a data set d, and then outputs an allocation y. We interpret ML(x; d) as first estimating η from x after being fit to the training data d, and then outputting the designer's preferred allocation given x and the estimate of η.

Estimation with noise.
Recall the classical econometric result that measurement error on an independent variable leads to attenuation bias, i.e., to an estimated coefficient in a linear regression that is biased towards zero. Applying this concept, here is one approach for generating the optimal policy of Sections 2–3. First gather a training data set d̃ from some linear policy Ỹ(x) = β_0 + βx, where we take the coefficient β such that we expect the best response ˆβ(β) to be above β*. Then add noise to the measurements of x in the data set d̃ to generate a new data set d′. For instance, replace each data point (x_i, η_i) in d̃ with a data point (x′_i, η_i) in d′, where the new regressor x′ is defined as x′_i = x_i + c + ε_i for c ∈ R and ε_i ∼ N(0, σ_ε²). When we linearly regress η on x′ in the data set d′, attenuation bias establishes that we find a smaller coefficient than ˆβ(β): increasing the variance of the noise σ_ε² from 0 to infinity reduces the estimated coefficient of η on x′ from ˆβ(β) to 0. For an appropriate level of noise, we hit the optimal coefficient β*. Finally, the constant c, added to or subtracted from all points x′, can be adjusted so that the average allocation is equal to µ_η and thus the constant term in the regression is optimal.

We can generalize this estimation with noise to arbitrary estimation procedures on arbitrary data sets. Start with the training data set d̃ induced by some original policy Ỹ. To generate the new data set d′, add noise—perhaps with nonzero mean—to any manipulable components of x to get x′, while keeping η unchanged. Now define the estimation with noise policy Y^ews as

Y^ews(x) = ML(x; d′).

Crucially, when determining the allocation for an agent with observable x, we do not add noise to this agent's x. The noise is only added to the data set on which the algorithm is trained.
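The linear version of this procedure can be sketched in a few lines. The simulation below is our own illustration: the sample size, variances, and noise level are arbitrary choices, and for simplicity the manipulation term is collapsed into a mean-zero disturbance on the observable.

```python
import random

random.seed(0)

def ols_slope(xs, ys):
    # slope of the least-squares regression of ys on xs
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

n = 20000
eta = [random.gauss(0.0, 1.0) for _ in range(n)]   # natural actions
x = [e + random.gauss(0.0, 0.5) for e in eta]      # observables under some policy

b_hat = ols_slope(x, eta)                          # best-response coefficient

# Estimation with noise: perturb the regressor in the *training* data only,
# keeping eta unchanged; the fitted slope is attenuated toward zero.
sigma_eps = 1.0
x_noisy = [xi + random.gauss(0.0, sigma_eps) for xi in x]
b_noisy = ols_slope(x_noisy, eta)
```

With these variances the population slopes are 1/1.25 = 0.8 without noise and 1/2.25 ≈ 0.44 with it, so b_noisy is markedly flatter; sweeping sigma_eps from 0 to infinity moves the fitted coefficient continuously from b_hat down to 0.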
In other words, Y^ews sets each agent's allocation based on an estimate of η, where η is estimated using artificially noised-up data. The logic of attenuation bias suggests that Y^ews is in some sense "flatter" with respect to the manipulable components of x, or "puts less weight" on those components, relative to the best response policy that does not add noise. We hope future research will explore this proposal systematically and study its benefits in improving information from manipulable data in complex environments.

Note that adding noise to the data here does not necessarily mean that the policy function Y^ews will be stochastic; indeed, by estimating on resampled data points with independent noise draws, the function can be made essentially deterministic conditional on the true data. In contrast, mechanisms designed to keep agent characteristics hidden from an observer may require stochastic output conditional on the underlying data. See the literature on differential privacy, surveyed in Dwork (2011).

A. Appendix: Proofs

A.1. Proof of Proposition 1
From Subsection 2.2, (β_0*, β*) solves min over (β_0, β) ∈ R×R of E[(mβ²γ + β_0 − (1 − β)η)²]. The first-order condition with respect to β_0 implies β_0* = (1 − β)µ_η − mβ²µ_γ. Substituting β_0* into the objective, the designer chooses β to minimize

E[(mβ²(γ − µ_γ) − (1 − β)(η − µ_η))²] = (1 − β)²σ_η² + m²β⁴σ_γ² − 2(1 − β)mβ²ρσ_ησ_γ = σ_η²[((1 − β) − kβ²)² + 2(1 − ρ)β²(1 − β)k],

where k ≡ mσ_γ/σ_η > 0. Equivalently, for k > 0 and ρ ∈ (−1, 1), β* minimizes

L(β, k, ρ) ≡ (kβ² + β − 1)² + 2(1 − ρ)β²(1 − β)k. (A.1)

Differentiating,

L_β(β, k, ρ) = −2(1 − β) + 4k²β³ + 2ρkβ(3β − 2). (A.2)

Note that L_β(0, k, ρ) = −2 < 0, i.e., there is a first-order benefit from putting some positive weight on the agent's action.

The last statement of Proposition 1 follows from the second because, from Equation 5, ˆβ(·) is continuous, ˆβ(0) = 1 > 0, and ˆβ(β^fp) = β^fp for any β^fp. Proposition 1 is thus implied by Lemma 1 and the following result. We abuse notation hereafter and drop the arguments k and ρ from L(·) when those are held fixed; so, for example, L(β) means that both k and ρ are held fixed at their given values.

Lemma A.1.
There exists β* ∈ (0, 2) such that:

1. The loss function L(β) from (A.1) is uniquely minimized over β ∈ R at β*.

2. β* = min{β ≥ 0 : L′(β) ≥ 0}.

3. L′′(β*) > 0.

Proof.
The proof has five steps below. Steps 1–3 are building blocks to Step 4, which establishes that all minimizers of L(β) are in (0, 2). Step 5 then establishes there is in fact a unique minimizer, and it has the requisite properties. It is useful in this proof to extend the domain of the function L defined in (A.1) to include ρ = −1 and ρ = 1.

Step 1: We first establish two useful properties of L(β, ρ = 1). Simplifying (A.1), L(β, ρ = 1) = (kβ² + β − 1)² is the square of a quadratic. The quadratic kβ² + β − 1 is strictly convex in β, minimized at

β = β_m ≡ −1/(2k) < 0, (A.3)

and, because it has one negative and one positive root, it is negative and strictly increasing on [β_m, 0]. It follows that L(·, ρ = 1) is strictly decreasing on [β_m, 0] and symmetric around β_m (i.e., for any x, L(β_m + x, ρ = 1) = L(β_m − x, ρ = 1)).

Step 2: We claim that for any β < 0 and ρ < 1, there is β̃ ≥ 0 such that L(β̃) < L(β). Since L′(0) < 0, it follows that for ρ < 1, argmin L(β, ρ) ⊂ R_{++}.

To prove the claim, we first establish that for any x > 0 and β = β_m − x (where β_m is defined in (A.3)), the symmetric point β_m + x has a lower loss when ρ < 1; note that β_m + x may also be negative. The argument is as follows:

L(β_m − x, ρ) − L(β_m + x, ρ)
= L(β_m − x, ρ = 1) + 2(1 − ρ)(β_m − x)²(1 − β_m + x)k − [L(β_m + x, ρ = 1) + 2(1 − ρ)(β_m + x)²(1 − β_m − x)k]
= 2(1 − ρ)k[(β_m − x)²(1 − β_m + x) − (β_m + x)²(1 − β_m − x)]
= 4(1 − ρ)kx(β_m(3β_m − 2) + x²) > 0,

where the second equality is because L(β_m + x, ρ = 1) = L(β_m − x, ρ = 1), the third equality is from algebraic simplification, and the inequality is because β_m < 0, x > 0, and ρ < 1.

It now suffices to establish L(0, ρ) < L(β, ρ) for all β ∈ [β_m, 0). Differentiating (A.2) yields L_βρ(β, ρ) = 2kβ(3β − 2) > 0 when β < 0. Hence for β ∈ [β_m, 0),

L(0, ρ) − L(β, ρ) ≤ L(0, ρ = 1) − L(β, ρ = 1) < 0,

where the strict inequality is from Step 1.

Step 3: argmin_β L(β, ρ = −1) ∩ (0, 2] ≠ ∅.

To prove this, simplify (A.1) to get L(β, ρ = −1) = (kβ² − β + 1)². The quadratic kβ² − β + 1 is strictly convex in β and minimized at β = 1/(2k); moreover, if k ≥ 1/4 then that quadratic is nonnegative, and otherwise it is equal to zero at β = (1 ± √(1 − 4k))/(2k). It follows that if k ≥ 1/4, argmin L(β, ρ = −1) = {1/(2k)} ⊂ (0, 2]. If k ∈ (0, 1/4), min argmin L(β, ρ = −1) = (1 − √(1 − 4k))/(2k) ∈ (0, 2).

Step 4: For ρ ∈ (−1, 1), argmin_β L(β, ρ) ⊂ (0, 2).

To prove this, note that L_βρ(β, ρ) = 2kβ(3β − 2) > 0 when β > 2/3. Monotone comparative statics (see Fact 1 in the Supplementary Appendix) imply that on the domain (2/3, ∞) every minimizer of L(·, ρ) when ρ > −1 is smaller than every minimizer of L(·, ρ = −1). Step 3 then implies that all minimizers when ρ > −1 are less than 2; Step 2 established that when ρ < 1, all minimizers are larger than 0.

Step 5: Finally, we claim that for ρ ∈ (−1, 1), L′(β) has only one root in (0, 2); moreover, L′′(β) > 0 at that root. The lemma follows because L′(β) is continuous and L′(0) < 0.

To prove the claim, first observe from Equation A.2 that L′(β) is a cubic function that is initially strictly concave and then strictly convex, with inflection point β = −ρ/(2k). For the rest of the proof, view L′ or L′′ as a function of β only.

1. If ρ ≥ 0, then the inflection point is nonpositive, and thus L′ is strictly convex on β > 0. Since L′(0) < 0, L′ has only one positive root, and L′′ > 0 at that root.

2. Consider ρ ∈ (−1, 0). L′′ is minimized at the inflection point of L′. Differentiating Equation A.2 and evaluating at the inflection point,

L′′(−ρ/(2k)) = 2 + 12k²(−ρ/(2k))² + 4ρk(3(−ρ/(2k)) − 1) = 2 − 3ρ² − 4kρ.

If this expression is positive, then L′′(β) > 0 for all β, i.e., L′ is strictly increasing and hence has a unique root.

So suppose instead 2 − 3ρ² − 4kρ ≤ 0. Equivalently, since ρ < 0, suppose k ≤ (2 − 3ρ²)/(4ρ). The right-hand side of this inequality is less than −ρ/4 because ρ ∈ (−1, 0), and hence k < −ρ/4. Consequently, the inflection point, β = −ρ/(2k), is larger than 2, and therefore L′(β) is concave over β ∈ (0, 2]. Moreover, recall that L′(0) < 0, and also observe that L′(2) = 32k² + 16kρ + 2 > 0 because k < −ρ/4 and ρ ∈ (−1, 0). It follows that L′ has only one root on (0, 2), and L′′ > 0 at that root.

A.2. Proof of Lemma 1
It holds that

E[(η̂_β(x) − η)²] = σ_η²(1 − R²_{ηx}) = σ_η² − (ˆβ(β))²Var(x),

where the first equality was noted after Equation 6, and the second equality holds because R²_{ηx} = (Cov(x, η))²/(Var(η)Var(x)) and ˆβ(β) = Cov(x, η)/Var(x).

We also have

E[(Y(x) − η̂_β(x))²] = E[(βx + β_0 − ˆβ(β)x − ˆβ_0(β))²] (from definitions)
= E[((β − ˆβ(β))(x − E[x]))²]
= (β − ˆβ(β))²Var(x),

where the second line is because βE[x] + β_0 = µ_η = ˆβ(β)E[x] + ˆβ_0(β) (the second equality here is standard; for the first, see the beginning of the proof of Proposition 1) and hence β_0 − ˆβ_0(β) = (ˆβ(β) − β)E[x].

Substituting these formulae into Equation 6 yields

L(β) = [σ_η² − (ˆβ(β))²Var(x)] (info loss) + [(β − ˆβ(β))²Var(x)] (misallocation loss).

Differentiating,

L′(β) = [−2ˆβ(β)ˆβ′(β)Var(x) − (ˆβ(β))² (d/dβ)Var(x)] (marginal change in info loss) + [2(β − ˆβ(β))(1 − ˆβ′(β))Var(x) + (β − ˆβ(β))² (d/dβ)Var(x)] (marginal change in misallocation loss).

When β = β^fp = ˆβ(β^fp), the marginal change in misallocation loss is evidently zero. Thus,

L′(β^fp) = −2β^fp ˆβ′(β^fp)Var(x) − (β^fp)² (d/dβ)Var(x).

Using Var(x) = Cov(x, η)/ˆβ(β), Cov(x, η) = σ_η² + mρσ_ησ_γβ, Var(x) = Cov(x, η) + mρσ_ησ_γβ + m²σ_γ²β², and β^fp = ˆβ(β^fp), some algebra yields

L′(β^fp) = (2m²/Var(x))(β^fp)²σ_η²σ_γ²(1 − ρ²).

Hence, L′(β^fp) > 0 because β^fp ≠ 0 (as ˆβ(0) = 1 from Equation 5) and ρ ∈ (−1, 1).

A.3. Proof of Proposition 2
The proof is via the following claims. Applying Lemma A.1, we without loss restrict attention to β ∈ (0, 2) in all the claims.

Claim A.1. β* is continuously differentiable in ρ and k.

Proof.
Lemma A.1 established that L′′(β*) > 0. Thus, the implicit function theorem implies that β* is continuously differentiable in k and ρ, with

dβ*/dk = −L_βk/L_ββ and dβ*/dρ = −L_βρ/L_ββ.

The algebra omitted in the proof of Lemma 1 is as follows. Letting C and V be shorthand for Cov(x, η) and Var(x) respectively, letting a prime denote the derivative with respect to β, suppressing arguments, evaluating all functions at β^fp, and using the properties noted:

L′ = −2β^fp ˆβ′V − (C/V)²V′ = −2Cˆβ′ − (C/V)²V′ = (−2CVC′ + C²V′)/V²
= (1/V²)[−2CC′(C + mρσ_ησ_γβ^fp + m²σ_γ²(β^fp)²) + C²(2C′ + 2m²σ_γ²β^fp)]
= (2β^fp C/V²)[−C′(mρσ_ησ_γ + m²σ_γ²β^fp) + Cm²σ_γ²]
= (2β^fp C/V²)[−(mρσ_ησ_γ)² − (mρσ_ησ_γ)m²σ_γ²β^fp + (σ_η² + mρσ_ησ_γβ^fp)m²σ_γ²]
= (2β^fp C/V²)m²(σ_ησ_γ)²(1 − ρ²)
= 2(β^fp)²(1/V)m²σ_η²σ_γ²(1 − ρ²).

Claim A.2. If k > 3/4 then β* < 2/3 and is strictly increasing in ρ. If k < 3/4 then β* > 2/3 and is strictly decreasing in ρ. If k = 3/4 then β* = 2/3, independent of ρ.

Proof.
From Equation A.2, compute the cross partial L_βρ = 2kβ(3β − 2). Hence L_βρ < 0 when 0 < β < 2/3, while L_βρ > 0 when β > 2/3. Moreover, it follows from Equation A.2 that when β = 2/3, sign[L_β] = sign[k − 3/4], independent of ρ.

1. Consider k = 3/4. Routine algebra verifies that L_β = 0 at β = 2/3 for every ρ, and Lemma A.1 implies that L_β has only one root on (0, 2); hence β* = 2/3, independent of ρ.

2. Consider k > 3/4. Since L_β > 0 when β = 2/3, it follows that β* < 2/3. (Recall that L_β < 0 when β = 0, and Lemma A.1 implies that β* = min{β > 0 : L_β = 0}.) Since L_βρ < 0 on the domain β < 2/3, monotone comparative statics (see Fact 1 in the Supplementary Appendix) imply β* is strictly increasing in ρ.

3. Consider k < 3/4. For ρ = 0, we have L_βk = 8kβ³ > 0 and hence β* > 2/3, using β* = 2/3 when k = 3/4 and monotone comparative statics. It follows that β* > 2/3 for all ρ because β* is continuous in ρ and L_β < 0 when β = 2/3 whereas L_β = 0 when β = β*. Since L_βρ > 0 on the domain β > 2/3, monotone comparative statics imply β* is strictly decreasing in ρ.

Claim A.3. As k → ∞, β* → 0; as k → 0, β* → 1. If ρ ≥ 0 then β* is strictly decreasing in k. If ρ < 0 then β* is strictly quasi-concave in k, attaining a maximum at some point.

Proof.
The first statement about limits is evident from inspecting Equation A.2. For the comparative statics, compute the cross partials L_βk = 8kβ³ + 2ρβ(3β − 2) and L_βkk = 8β³ > 0. Since dβ*/dk = −L_βk/L_ββ and, from Lemma A.1, L_ββ > 0 at β = β*, the sign of dβ*/dk is the sign of −L_βk. Using β* → 1 as k → 0, we see that for small k and at β = β*, L_βk is larger than but arbitrarily close to 2ρ.

1. It follows that L_βk > 0 for all k and β = β* when ρ ≥ 0. That is, dβ*/dk < 0 when ρ ≥ 0.

2. Consider ρ < 0. Plainly L_βk < 0 for small k and β = β*, while for some k it becomes positive (since β* → 0 as k → ∞). Since L_βk is strictly increasing in k, it follows that the sign of dβ*/dk switches only once: it is initially positive and eventually negative.

Claim A.4.
Assume ρ = 0. There is a unique β^fp, which is positive. Both β^fp and β*/β^fp are strictly decreasing in k. Moreover, β*/β^fp → (1/2)^(1/3) as k → ∞ and β*/β^fp → 1 as k → 0.

Proof.
Assume ρ = 0. Equation C.1 simplifies to

k²(β^fp)³ + β^fp − 1 = 0, (A.4)

which has a unique solution, with β^fp ∈ (0, 1), strictly decreasing in k with range (0, 1). The first-order condition for β* simplifies to

2k²(β*)³ + β* − 1 = 0, (A.5)

which has a unique solution, also in (0, 1) and strictly decreasing in k with range (0, 1). Hence, β*/β^fp → 1 as k → 0. Moreover, Equation A.4 and Equation A.5 imply that as k → ∞, k²(β^fp)³ → 1 and 2k²(β*)³ → 1, and hence (β*/β^fp)³ → 1/2.

It remains to prove that β*/β^fp is strictly decreasing in k. Applying the implicit function theorem to Equation A.4 and Equation A.5 (which is indeed valid) and doing some algebra,

dβ*/dk = −4k(β*)³/(6k²(β*)² + 1), dβ^fp/dk = −2k(β^fp)³/(3k²(β^fp)² + 1).

β*/β^fp is strictly decreasing in k if and only if β^fp(dβ*/dk) − β*(dβ^fp/dk) < 0. Substituting in the formulae above, this inequality is equivalent to

2k(β^fp)³β*/(3k²(β^fp)² + 1) < 4k(β*)³β^fp/(6k²(β*)² + 1) ⟺ (6k²(β*)² + 1)(β^fp)² < 2(3k²(β^fp)² + 1)(β*)² ⟺ β^fp < β*√2.

This inequality holds as k → 0 because both β^fp → 1 and β* → 1 as k → 0. By continuity, we are done if there is no k at which β^fp = β*√2. Indeed there is not, because then Equation A.4 would become equivalent to 2k²(β*)³ + β* − 1/√2 = 0, contradicting Equation A.5.

References
Ali, S. N. and R. Bénabou (2019): "Image Versus Information: Changing Societal Norms and Optimal Privacy," Forthcoming in American Economic Journal: Microeconomics.

Ball, I. (2020): "Scoring Strategic Agents," Working paper.

Bénabou, R. and J. Tirole (2006): "Incentives and Prosocial Behavior," American Economic Review, 96, 1652–1678.

Blackwell, D. (1951): "Comparison of Experiments," in Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, University of California Press, vol. 1, 93–102.

Boleslavsky, R., D. L. Kelly, and C. R. Taylor (2017): "Selloffs, Bailouts, and Feedback: Can Asset Markets Inform Policy?" Journal of Economic Theory, 169, 294–343.

Bonatti, A. and G. Cisternas (2019): "Consumer Scores and Price Discrimination," Forthcoming in Review of Economic Studies.

Bond, P. and I. Goldstein (2015): "Government Intervention and Information Aggregation by Prices," Journal of Finance, 70, 2777–2812.

Braverman, M. and S. Garg (2019): "The Role of Randomness and Noise in Strategic Classification," ACM EC 2019 Workshop on Learning in Presence of Strategic Behavior.

Dwork, C. (2011): "Differential Privacy," Encyclopedia of Cryptography and Security, 338–340.

Ederer, F., R. Holden, and M. Meyer (2018): "Gaming and Strategic Opacity in Incentive Provision," RAND Journal of Economics, 49, 819–854.

Eliaz, K. and R. Spiegler (2019): "The Model Selection Curse," American Economic Review: Insights, 1, 127–140.

Fischer, P. E. and R. E. Verrecchia (2000): "Reporting Bias," The Accounting Review, 75, 229–245.

Frankel, A. and N. Kartik (2019): "Muddled Information," Journal of Political Economy, 127, 1739–1776.

Gesche, T. (2019): "De-biasing Strategic Communication," Unpublished.

Gómez, E., M. A. Gómez-Villegas, and J. M. Marín (2003): "A Survey on Continuous Elliptical Vector Distributions," Revista Matemática Complutense, 16, 345–361.

Harbaugh, R. and E. Rasmusen (2018): "Coarse Grades: Informing the Public by Withholding Information," American Economic Journal: Microeconomics, 10, 210–235.

Hardt, M., N. Megiddo, C. Papadimitriou, and M. Wootters (2016): "Strategic Classification," in Proceedings of the 2016 ACM Conference on Innovations in Theoretical Computer Science, ACM, 111–122.

Hu, L., N. Immorlica, and J. W. Vaughan (2019): "The Disparate Effects of Strategic Manipulation," in ACM Conference on Fairness, Accountability, and Transparency, Atlanta, Georgia.

Jann, O. and C. Schottmüller (2018): "An Informational Theory of Privacy," Unpublished.

Kleinberg, J. and M. Raghavan (2019): "How Do Classifiers Induce Agents to Invest Effort Strategically?" in Proceedings of the 2019 ACM Conference on Economics and Computation, ACM, 825–844.

Kreps, D. and R. Wilson (1982): "Sequential Equilibria," Econometrica, 50, 863–894.

Liang, A. and E. Madsen (2020): "Data Linkages and Incentives," Unpublished.

Martinez-Gorricho, S. and C. Oyarzun (2019): "Hypothesis Testing with Endogenous Information," Unpublished.

Milli, S., J. Miller, A. D. Dragan, and M. Hardt (2018): "The Social Cost of Strategic Classification," ArXiv:1808.08460v2 [cs.LG].

Mussa, M. and S. Rosen (1978): "Monopoly and Product Quality," Journal of Economic Theory, 18, 301–317.

Perez-Richet, E. and V. Skreta (2018): "Test Design Under Falsification," Unpublished.

Prendergast, C. and R. H. Topel (1996): "Favoritism in Organizations," Journal of Political Economy, 104, 958–978.

Spence, M. (1973): "Job Market Signaling," Quarterly Journal of Economics, 87, 355–374.

Supplementary (Online) Appendices
B. Monotone Comparative Statics
The following fact on monotone comparative statics is used in the proof of Proposition 1 and in the proof of Proposition 2. Although it is well known, we include a proof.
Fact 1. Let T ⊆ R, Z ⊆ R be open, and f : Z × T → R be continuously differentiable in z with, for all t ∈ T, argmin_{z ∈ Z} f(z, t) ≠ ∅. Define M(t) ≡ argmin_{z ∈ Z} f(z, t). For any t₁ ∈ T and t₂ ∈ T with t₂ > t₁, it holds that:

1. If f_z(z, t₂) > f_z(z, t₁) for all z ∈ Z, then for any m₁ ∈ M(t₁) and any m₂ ∈ M(t₂) it holds that m₂ < m₁.

Proof: For any ẑ > m₁,

f(ẑ, t₂) − f(m₁, t₂) = ∫ from m₁ to ẑ of f_z(z, t₂) dz > ∫ from m₁ to ẑ of f_z(z, t₁) dz = f(ẑ, t₁) − f(m₁, t₁) ≥ 0.

Hence m₂ ≤ m₁. The inequality must be strict because otherwise the first-order conditions yield 0 = f_z(m₂, t₂) = f_z(m₁, t₂) > f_z(m₁, t₁) = 0, a contradiction.

2. If f_z(z, t₂) < f_z(z, t₁) for all z ∈ Z, then for any m₁ ∈ M(t₁) and any m₂ ∈ M(t₂) it holds that m₂ > m₁. (We omit a proof, as it is analogous to that above.) □

C. Additional Results
Proposition C.1.
There exists β^fp > 0 satisfying ˆβ(β^fp) = β^fp. If ρ ≥ 0, there is only one β^fp ≥ 0, and it satisfies β^fp ∈ (0, 1). That there is only one positive fixed point under nonnegative correlation has been noted in different form in Frankel and Kartik (2019, Proposition 4).
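The fixed point is easy to compute numerically. The sketch below is our own illustration, under our reading of Equation 5 (ˆβ(β) = Cov(x, η)/Var(x) with x = η + mβγ) and the cubic (C.1) derived in the proof below; the default parameter values are arbitrary.

```python
def bhat(beta, m=1.0, s_eta=1.0, s_gamma=1.0, rho=0.3):
    # best-response coefficient Cov(x, eta) / Var(x) for x = eta + m*beta*gamma
    C = s_eta ** 2 + m * beta * rho * s_eta * s_gamma
    V = s_eta ** 2 + 2 * m * beta * rho * s_eta * s_gamma + (m * beta * s_gamma) ** 2
    return C / V

def beta_fp(m=1.0, s_eta=1.0, s_gamma=1.0, rho=0.3, tol=1e-12):
    # bisection on the cubic (C.1): its left-hand side is negative at 0
    # and tends to +infinity, so a sign change is bracketed
    def f(b):
        return (m ** 2 * s_gamma ** 2 * b ** 3
                + 2 * m * rho * s_eta * s_gamma * b ** 2
                + (s_eta ** 2 - m * rho * s_eta * s_gamma) * b
                - s_eta ** 2)
    lo, hi = 0.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if f(mid) < 0 else (lo, mid)
    return (lo + hi) / 2
```

For instance, b = beta_fp() satisfies bhat(b) ≈ b, and with ρ ≥ 0 the computed fixed point lies in (0, 1) as the proposition states.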
Proof of Proposition C.1.
For β ≥ 0, Equation 5 can be rewritten as the cubic equation

m²σ_γ²β³ + 2mρσ_ησ_γβ² + (σ_η² − mρσ_ησ_γ)β − σ_η² = 0. (C.1)

The left-hand side of (C.1) is continuous, negative at β = 0, and tends to ∞ as β → ∞. There is a positive solution to (C.1) by the intermediate value theorem.

For the second statement of the proposition, differentiate ˆβ(·) from Equation 5 to obtain

ˆβ′(β) = −mσ_ησ_γ(2βmσ_ησ_γ + ρσ_η² + ρβ²m²σ_γ²)/(σ_η² + 2βmρσ_ησ_γ + β²m²σ_γ²)².

When ρ ≥ 0, this derivative is negative for all β > 0. The result follows from the fact that ˆβ(0) = 1 and, when ρ ≥ 0, ˆβ(1) < 1.

D. Alternative Model of Information Loss
Our paper finds that a designer improves information, and thereby allocation accuracy, by flattening a fixed point rule. We developed this point in what we believe is a canonical model of information loss from manipulation, one used in a number of other papers. But we think the point applies more broadly, including in other models of information loss. For instance, even a model with a one-dimensional type (such as the model in this paper with no heterogeneity in the gaming ability γ) can lead to information loss when there is a bounded action space and strong manipulation incentives. The reason is "pooling at the top". We establish below a version of our main result for a simple model in this vein.

Let the agent take action x ∈ {0, 1} with natural action η ∈ {0, 1}. The agent's type η is her private information, drawn with ex-ante probability π ∈ (0, 1) that η = 1. After observing x, the designer chooses allocation y ∈ R with payoff −(y − η)². We assume, for simplicity, that the agent of type η = 1 must choose x = 1. The payoff for type η = 0 is y − cx, where c > 0 is a commonly known parameter. To streamline the analysis, we assume c ∈ (0, π).

A pure allocation rule or policy is Y : {0, 1} → R. Due to the designer's quadratic loss payoff, it is without loss to focus on pure policies. Given a policy Y, let ∆ ≡ Y(1) − Y(0) be the difference in allocations across the two actions of the agent. We focus, without loss, on policies with ∆ ≥ 0. A policy with a smaller ∆ is a "flatter" policy, i.e., it is less sensitive to the agent's action. The naive policy Y^n sets Y^n(1) = 1 and Y^n(0) = 0, corresponding to a naive allocation difference of ∆^n = 1. Let ∆^fp and ∆* denote the corresponding differences from fixed point and commitment policies.

Our main point goes through so long as action x = 1 is no more costly than x = 0 for type η = 1, as this will ensure it is optimal for type η = 1 to choose x = 1.

D.1. Naive Policy

Take any policy with ∆ = 1.
Since we assume c < π < 1, even the agent with η = 0 will then choose x = 1. So welfare—the designer's ex-ante expected payoff—from the naive policy is −π(1 − 1)² − (1 − π)(1 − 0)² = −(1 − π).

D.2. Fixed Point
At a Bayesian Nash equilibrium (of either the simultaneous move game, or when the agent moves first), Y(x) = E[η | x] for any x on the equilibrium path. If x = 0 is on the equilibrium path, Y(0) = 0 because type η = 1 does not play x = 0.

There is a fully-pooling equilibrium with both types playing x = 1: the designer plays Y(1) = π and Y(0) = 0, and it is optimal for type η = 0 to play x = 1 because c < π. The corresponding welfare is −π(π − 1)² − (1 − π)(π − 0)² = −π(1 − π).

There is no equilibrium in which the agent of type η = 0 puts positive probability on action x = 0, because that would imply Y(1) > π and Y(0) = 0, against which the agent's unique best response is to play x = 1.

Therefore, we have identified the (essentially unique, up to the off-path allocation following x = 0) fixed point policy: Y^fp(1) = π, Y^fp(0) = 0, and therefore ∆^fp = π. The agent pools on x = 1, and welfare is −π(1 − π). This welfare is larger than that of the naive policy.
D.3. Commitment
Now suppose the designer commits to a policy before the agent moves. From the earlier analysis, if ∆ > c the agent will pool at x = 1 and so an optimal such policy is the fixed point policy Y^fp. For any ∆ < c, there is full separation: the agent's best response is x = η. Indeed, full separation is also a best response for the agent when ∆ = c. Given that the designer wants to match the agent's type, it follows that the optimal way to induce full separation is to set ∆ = c (or ∆ just below c), i.e., have Y*(1) = Y*(0) + c.

The choice of Y^fp(0) = 0 can be justified from the perspective of the agent "trembling". In particular, in the signaling game where the agent moves before the designer, any sequential equilibrium (Kreps and Wilson, 1982) has Y(0) = 0, as only type η = 0 can play x = 0. But note that no matter how Y(0) is specified, it must hold in a fixed point that ∆ ≥ c; otherwise the agent will not pool at x = 1.
At such an optimum, quadratic loss utility implies that the designer sets the average allocation (1 − π)Y*(0) + πY*(1) equal to E[η] = π. Plugging in Y*(1) = Y*(0) + c yields

(1 − π)Y*(0) + π(Y*(0) + c) = π,

and hence the solution

Y*(0) = π(1 − c), Y*(1) = π(1 − c) + c.

The corresponding welfare is

−(1 − π)(π(1 − c) − 0)² − π(π(1 − c) + c − 1)² = −(1 − c)²(1 − π)π.

This welfare is larger than that under the fixed point. Moreover, the optimal policy has ∆* = c while the fixed point has ∆^fp = π and the naive policy has ∆^n = 1. Thus the optimal policy is flatter than the fixed point, which in turn is flatter than the naive policy: ∆* < ∆^fp < ∆^n.

Note that the designer obtains no benefit from reducing ∆ from ∆^fp = π until reaching ∆* = c; this is an artifact of the assumption that there is no heterogeneity in the manipulation cost c. In a model with such heterogeneity, there would be a more continuous benefit of reducing ∆.
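The welfare comparisons in this appendix reduce to simple arithmetic, which can be checked directly. The following minimal sketch is our own code, checking the ranking over an arbitrary grid of parameters with 0 < c < π < 1:

```python
def welfare_naive(pi, c):
    # Delta = 1 > c: both types pool at x = 1, allocation Y(1) = 1
    return -(1 - pi)

def welfare_fixed_point(pi, c):
    # Delta = pi > c: pooling at x = 1 with Y(1) = pi, Y(0) = 0
    return -pi * (1 - pi)

def welfare_commitment(pi, c):
    # Delta = c: full separation with Y(0) = pi*(1 - c), Y(1) = pi*(1 - c) + c;
    # the average allocation equals E[eta] = pi by construction
    y0 = pi * (1 - c)
    y1 = y0 + c
    return -(1 - pi) * y0 ** 2 - pi * (y1 - 1) ** 2
```

On any such grid these satisfy welfare_naive < welfare_fixed_point < welfare_commitment, and welfare_commitment agrees with the closed form −(1 − c)²(1 − π)π.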