Optimal Incentive Contract with Endogenous Monitoring Technology

Abstract

Recent technology advances have enabled firms to flexibly process and analyze sophisticated employee performance data at a reduced and yet significant cost. We develop a theory of optimal incentive contracting where the monitoring technology that governs the above procedure is part of the designer's strategic planning. In otherwise standard principal-agent models with moral hazard, we allow the principal to partition agents' performance data into any finite categories and to pay for the amount of information the output signal carries. Through analysis of the trade-off between giving incentives to agents and saving the monitoring cost, we obtain characterizations of optimal monitoring technologies such as information aggregation, strict MLRP, likelihood ratio-convex performance classification, group evaluation in response to rising monitoring costs, and assessing multiple task performances according to agents' endogenous tendencies to shirk. We examine the implications of these results for workforce management and firms' internal organizations.



Anqi Li ∗ and Ming Yang †

Forthcoming, Theoretical Economics

Key words: incentive contract; endogenous monitoring technology.

JEL codes: D86, M15, M5.

∗ Department of Economics, Washington University in St. Louis.

† Fuqua School of Business, Duke University.

We thank the coeditor, four anonymous referees, Nick Bloom, George Mailath, Ilya Segal, Chris Shannon, Joel Sobel, Jacob Steinhardt, Bob Wilson and the seminar participants at Caltech, Decentralization Conference 2017, Duke-UNC, Johns Hopkins Carey, Northwestern, Stanford, UCSB, UCSD, U of Chicago and U of Washington for comments and suggestions. Lin Hu and Jessie Li provided generous assistance for the numerical analysis. All errors are our own.

Introduction

Recent technology advances have enabled firms to flexibly process and analyze sophisticated employee performance data at a reduced and yet significant cost. Speech analytics software, natural language processing tools and cloud-based systems are increasingly used to convert hard-to-process contents into succinct and meaningful ratings such as “satisfactory” and “unsatisfactory” (Murff et al. (2011); Singer (2013); Kaplan (2015)). This paper develops a theory of optimal incentive contracting where the monitoring technology that governs the above procedure is part of the designer’s strategic planning.

Our research agenda is motivated by the case of call center performance management reported by Singer (2013). It has long been recognized that the conversations between call center agents and customers contain useful performance indicators such as customer sentiment, voice quality and tone. Recently, the advent of speech analytics software has finally enabled the processing and analysis of these contents, as well as their conversion into meaningful ratings such as “satisfactory” and “unsatisfactory.” On the one hand, running speech analytics software consumes server space and power, and the procedure has been increasingly outsourced to third parties to take advantage of the latest developments in cloud computing. On the other hand, managers now have considerable freedom to decide which facets of the customer conversation to utilize, thanks to the increased availability of products whose specialties range from emotion detection to word spotting.

We formalize the flexibility and cost associated with the design and implementation of the monitoring technology in otherwise standard principal-agent models with moral hazard. Specifically, we allow the monitoring technology to partition agents’ performance data into any finite categories, at a cost that increases with the amount of information the output signal carries (hereafter monitoring cost).
An incentive contract pairs the monitoring technology with a wage scheme that maps realizations of the output signal to different wages. An optimal contract minimizes the sum of expected wage and monitoring cost, subject to agents’ incentive constraints.

Our main result gives characterizations of optimal monitoring technologies in general environments, showing that the assignment of Lagrange multiplier-weighted likelihood ratios to performance categories is positive assortative in the direction of agent utilities. Geometrically, this means that optimal monitoring technologies comprise convex cells in the space of likelihood ratios or their transformations. This result provides practitioners with the needed formula for sorting employee performance data, and exploiting its geometry yields insights into workforce management and firms’ internal organizations.

Our proof strategy works directly with the principal’s Lagrangian. It handles general situations featuring multiple agents and tasks, in which the direction of sorting vector-valued likelihood ratios is nonobvious a priori. It overcomes the technical challenge whereby perturbations of the sorting algorithm affect wages endogenously through the Lagrange multipliers of agents’ incentive constraints, yielding effects that are new and difficult to assess using standard methods.

We give three applications of our result. In the single-agent model considered in Holmström (1979), we show that the assignment of likelihood ratios to wage categories is positive assortative and follows a simple cutoff rule. The monitoring technology aggregates potentially high-dimensional performance data into rank-ordered ratings, and the output signal satisfies the strict monotone likelihood ratio property with respect to the order induced by likelihood ratios.
Solving for the cutoff likelihood ratios yields findings consistent with recent developments in the manufacturing, retail and healthcare sectors, where decreases in the data processing cost have been shown to increase the fineness of the performance grids (Bloom and Van Reenen (2006, 2007); Murff et al. (2011); Ewenstein, Hancock, and Komm (2016)).

In the multi-agent model considered in Holmström (1982), the optimal monitoring technology partitions vectors of individual agents’ likelihood ratios into convex polygons. Based on this result, we then compare individual and group performance evaluations from the angle of monitoring cost, showing that firms should switch from individual to group evaluation in response to rising monitoring costs. This result formalizes the theses of Alchian and Demsetz (1972) and Lazear and Rosen (1981) that either team or tournament should be the dominant incentive system when individual performance evaluation is too costly to conduct. It is consistent with the findings of Bloom and Van Reenen (2006, 2007) that lack of IT access increases the use of group performance evaluation among otherwise similar firms.

In the multiple-task model studied in Holmström and Milgrom (1991), the resources spent on the assessment of a task performance should increase with the agent’s endogenous tendency to shirk the corresponding task. Using simulation, we apply this result to study how improved precision of some task measurements (caused by, e.g., the advent of high-quality scanner data measuring the skillfulness in scanning items) would affect the resources spent on the assessments of other task performances (e.g., projecting warmth to customers).

Earlier studies on contracting with costly experiments (in the sense of Blackwell (1953)) include, but are not limited to: Baiman and Demski (1980) and Dye (1986), in which the principal can pay an external auditor for drawing a signal from an exogenous distribution; Holmström (1979), Grossman and Hart (1983) and Kim (1995), in which signal distributions are ranked based on the incentive costs they incur. In these studies, the principal can change the probability space generated by the agent’s hidden effort and, in the first two studies, through paying stylized costs. In contrast, we focus on the conversion of raw data into performance ratings while taking the probability space as given. Also, our assumption that the monitoring cost increases with the amount of information carried by the output signal could be ill-suited for modeling the cost of running experiments.

The current work differs from existing studies on rational inattention (hereafter RI) in three aspects. First, early developments in RI by Sims (1998, 2003), Maćkowiak and Wiederholt (2009) and Woodford (2009) sought to explain the stickiness of macroeconomic variables by information processing costs, whereas we examine the implication of costly yet flexible monitoring for principal-agent relationships. Second, we focus mainly on partitional monitoring technologies because in reality, adding non-performance-related factors into employee ratings could have dire consequences such as appeals, lawsuits and excessive turnover. Finally, our monitoring cost function nests entropy as a special case.

Recent works of Crémer, Garicano, and Prat (2007), Jäger, Metzger, and Riedel (2011), Sobel (2015) and Dilmé (2017) examine the optimal language used between organization members who share a common interest but face communication costs.
Yang (2019) studies a security design problem where a rationally inattentive buyer can obtain any signal about the uncertain fundamental at a cost that is proportional to entropy reduction. Other recent efforts to introduce RI into strategic environments include but are not limited to: Matějka and McKay (2012), Martin (2017) and Ravid (2017).

See standard HR textbooks for this subject matter.

Saint-Paul (2017) demonstrates the validity of entropy as an information cost in decision problems where the decision variable is a deterministic function of the exogenous state variable.

Primitives

A risk-neutral principal faces a risk-averse agent, who earns a utility $u(w)$ from spending a nonnegative wage $w \geq 0$ and incurs a cost $c(a)$ from privately exerting high or low effort $a \in \{0, 1\}$. The function $u: \mathbb{R}_+ \to \mathbb{R}$ satisfies $u(0) = 0$, $u' > 0$ and $u'' < 0$, whereas $c(1) = c > c(0) = 0$.

Each effort choice $a$ generates a probability space $(\Omega, \Sigma, P_a)$, where $\Omega$ is a finite-dimensional Euclidean space that comprises the agent’s performance data, $\Sigma$ is the Borel sigma-algebra on $\Omega$, and $P_a$ is the probability measure on $(\Omega, \Sigma)$. The $P_a$’s are assumed to be mutually absolutely continuous, and the probability density functions $p_a$ they induce are well-defined and everywhere positive.

Incentive contract

An incentive contract $\langle \mathcal{P}, w(\cdot) \rangle$ is a pair of monitoring technology $\mathcal{P}$ and wage scheme $w: \mathcal{P} \to \mathbb{R}_+$. The former represents a human- or machine-operated system that governs the processing and analysis of performance data, whereas the latter maps outputs of the first-step procedure to different levels of wages. In the main body of this paper, $\mathcal{P}$ can be any partition of $\Omega$ with at most $K$ cells that are all of positive measure, and $w: \mathcal{P} \to \mathbb{R}_+$ maps each cell $A$ of $\mathcal{P}$ to a nonnegative wage $w(A) \geq 0$. The upper bound $K$ for the rating scale $|\mathcal{P}|$ can be any integer greater than one and will be taken as given throughout the analysis.

For any $\omega \in \Omega$, let $A(\omega)$ be the unique performance category that contains $\omega$ and let $w(A(\omega))$ be the wage associated with $A(\omega)$. Time evolves as follows:

1. the principal commits to $\langle \mathcal{P}, w(\cdot) \rangle$;
2. the agent privately chooses $a \in \{0, 1\}$;
3. Nature draws $\omega$ from $\Omega$ according to $P_a$;
4. the monitoring technology outputs $A(\omega)$;
5. the principal pays $w(A(\omega))$ to the agent.

In Appendix B.2, we allow the monitoring technology to be any mapping from $\Omega$ to lotteries on finite performance categories. If the lottery is degenerate, then the monitoring technology is partitional.

Appendix B.1 examines the case where the agent is constrained by individual rationality.

The upper bound $K$, while stylized, guarantees the existence of optimal incentive contract(s). Judging from the simulation exercises we have so far conducted, the optimal rating scale is typically smaller than $K$ even when $\mu$ is small (see, e.g., Figure 1).

Implementation cost

For any given effort choice $a$ by the agent, a monitoring technology $\mathcal{P} = \{A_1, \cdots, A_N\}$ outputs a signal $X: \Omega \to \mathcal{P}$ whose probability distribution is represented by a vector $\pi(\mathcal{P}, a) = (P_a(A_1), \cdots, P_a(A_N), 0, \cdots, 0)$ in the $K$-dimensional simplex. The principal incurs the following cost from implementing an incentive contract $\langle \mathcal{P}, w(\cdot) \rangle$:

$$\sum_{A \in \mathcal{P}} P_a(A) w(A) + \mu \cdot H(\mathcal{P}, a),$$

which consists of two parts. The first part $\sum_{A \in \mathcal{P}} P_a(A) w(A)$, i.e., the incentive cost, has been the central focus of the existing principal-agent literature. The second part $\mu \cdot H(\mathcal{P}, a)$, hereafter termed the monitoring cost, represents the cost associated with the processing and analysis of the performance data. In particular, $\mu > 0$ is a parameter, and $H(\mathcal{P}, a)$ captures the amount of information carried by the output signal and is assumed to satisfy the following properties:

Assumption 1.

There exists a function $h: \Delta^K \to \mathbb{R}_+$ such that $H(\mathcal{P}, a) = h(\pi(\mathcal{P}, a))$ for all $(\mathcal{P}, a)$. Furthermore,

(a) $h(\pi_1, \cdots, \pi_K) = h(\pi_{\sigma(1)}, \cdots, \pi_{\sigma(K)})$ for all probability vectors $(\pi_1, \cdots, \pi_K) \in \Delta^K$ and permutations $\sigma$ on $\{1, \cdots, K\}$;

(b) $h(0, \pi_2, \cdots) < h(\pi_1', \pi_2', \cdots)$ for all $(0, \pi_2, \cdots)$ and $(\pi_1', \pi_2', \cdots) \in \Delta^K$ that differ only in the first two elements and satisfy $\pi_2, \pi_1', \pi_2' > 0$ and $\pi_2 = \pi_1' + \pi_2'$.

In words, Part (a) says that the monitoring cost is invariant to the relabeling of performance categories, and Part (b) says that merging two performance categories strictly reduces the monitoring cost. Monitoring cost functions that satisfy Assumption 1 include the entropy $-\sum_{A \in \mathcal{P}} P_a(A) \log P_a(A)$ of the output signal and the bits of information $\log |\mathcal{P}|$ it carries. In Section 2.2, we motivate the use of this assumption in the example of call center performance management.
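The two parts of Assumption 1 can be checked numerically for the two cost functions named above. The snippet below is a minimal sketch, not part of the original analysis: the function names `entropy` and `bits`, the base-2 logarithm, and the example probability vectors are all illustrative assumptions.

```python
import math

def entropy(pi):
    """Shannon entropy (in bits) of a probability vector; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in pi if p > 0)

def bits(pi):
    """log2 of the number of categories that occur with positive probability."""
    return math.log2(sum(1 for p in pi if p > 0))

# Hypothetical signal with three categories, padded with a zero to lie in the K = 4 simplex.
split = [0.2, 0.3, 0.5, 0.0]
# Merge the first two categories: (0, pi_2, ...) with pi_2 = 0.2 + 0.3.
merged = [0.0, 0.5, 0.5, 0.0]
# Relabel the categories of `split`.
permuted = [0.5, 0.0, 0.3, 0.2]

for h in (entropy, bits):
    # Assumption 1(a): relabeling categories leaves the cost unchanged.
    assert math.isclose(h(split), h(permuted))
    # Assumption 1(b): merging two categories strictly lowers the cost.
    assert h(merged) < h(split)
```

Both functions depend on the signal only through its probability vector, which is the first requirement of Assumption 1.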

The principal’s problem

Consider the problem of inducing high effort from the agent. Define a random variable $Z: \Omega \to \mathbb{R}$ by

$$Z(\omega) = 1 - \frac{p_0(\omega)}{p_1(\omega)} \quad \forall \omega,$$

where $p_0(\omega)/p_1(\omega)$ is the likelihood ratio associated with data point $\omega$. Note that $E[Z \mid a = 1] = 0$ and that the range of $Z$ is a subset of $(-\infty, 1)$. For any set $A \in \Sigma$ of positive measure, define the $z$-value of $A$ by

$$z(A) = E[Z \mid A; a = 1].$$

In words, $z(A)$ represents the average value of $Z$ conditional on the data point being drawn from $A$.

A contract $\langle \mathcal{P}, w(\cdot) \rangle$ is incentive compatible if

$$\sum_{A \in \mathcal{P}} P_1(A) u(w(A)) - c \geq \sum_{A \in \mathcal{P}} P_0(A) u(w(A))$$

or, equivalently,

$$\sum_{A \in \mathcal{P}} P_1(A) u(w(A)) z(A) \geq c, \quad \text{(IC)}$$

and it satisfies the limited liability constraint if

$$w(A) \geq 0 \quad \forall A \in \mathcal{P}. \quad \text{(LL)}$$

The bit is a basic unit of information in information theory, computing, and digital communications. In information theory, one bit is defined as the maximum information entropy of a binary random variable.

The problem of inducing low effort is standard.
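The equivalence between the two forms of the incentive compatibility constraint follows directly from the definition of the $z$-value. A short derivation, written as a sketch in the notation of this section (with $p_0$ and $p_1$ the densities under low and high effort), is:

```latex
\begin{align*}
P_1(A)\, z(A)
  &= P_1(A)\, E[Z \mid A;\, a = 1]
   = \int_A \left( 1 - \frac{p_0(\omega)}{p_1(\omega)} \right) p_1(\omega)\, d\omega \\
  &= \int_A p_1(\omega)\, d\omega - \int_A p_0(\omega)\, d\omega
   = P_1(A) - P_0(A), \\[4pt]
\sum_{A \in \mathcal{P}} P_1(A)\, u(w(A))\, z(A)
  &= \sum_{A \in \mathcal{P}} P_1(A)\, u(w(A))
   - \sum_{A \in \mathcal{P}} P_0(A)\, u(w(A)).
\end{align*}
```

The utility gain from high effort therefore equals the left-hand side of (IC), and high effort is incentive compatible exactly when this gain is at least $c$.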

An optimal incentive contract that induces high effort from the agent (optimal incentive contract for short) minimizes the total implementation cost under high effort, subject to the incentive compatibility constraint and limited liability constraint:

$$\min_{\langle \mathcal{P}, w(\cdot) \rangle} \sum_{A \in \mathcal{P}} P_1(A) w(A) + \mu \cdot H(\mathcal{P}, 1) \quad \text{s.t. (IC) and (LL)}.$$

In what follows, we will denote the solution(s) to the above problem by $\langle \mathcal{P}^*, w^*(\cdot) \rangle$.

We first illustrate Assumption 1 in the context of call center performance management:

Example 1.

In the example described in Section 1, a piece of performance data comprises the major characteristics of a call history (e.g., customer sentiment and voice quality) encoded in binary digits, and the monitoring technology represents the speech analytics program that categorizes binary digits into performance ratings. To formalize the design flexibility, we allow the monitoring technology to partition performance data into any $N \leq K$ categories, where $K$ can be any integer greater than one. The cost of running the monitoring technology is assumed to increase with the amount of processed information, whose definition varies from case to case. For example, if the monitoring technology runs many times among many identical agents, then the optimal design should minimize the average number of steps it takes to find the performance category containing the raw data point. It is well known that this quantity approximately equals the entropy of the output signal. In contrast, if the monitoring technology runs only a few times for a small number of agents, then the worst-case (or unamortized) amount of processed information is best captured by the bits of information carried by the output signal (see, e.g., Cover and Thomas (2006)). In both cases, the quantity of interest depends only on the probability distribution of the output signal and nothing else.

We next introduce the concept of setup cost and distinguish it from our notion of monitoring cost:

Example 1 (Continued). As its name suggests, setup cost refers to the cost incurred to set up the infrastructure that facilitates data processing and analysis, e.g., Fast Fourier Transform (FFT) chips (which transform sound waves into their major characteristics coded in binary digits), recording devices, etc.

The major role of setup cost is to change the probability space $(\Omega, \Sigma, P_a)$. For example, design improvements in FFT chips enable more frequent sampling of sound waves and cause $(\Omega, \Sigma, P_a)$ to change.
In what follows, we will take the probability space as given and ignore the setup cost. That said, one can certainly embed our analysis into a two-stage setting in which the principal first incurs the setup cost and then the monitoring cost. Results below will carry over to this new setting.

Example 2.

Suppose $u(w) = \sqrt{w}$, $Z$ is uniformly distributed over $[-1/2, 1/2]$ under $a = 1$ and $H(\mathcal{P}, a) = f(|\mathcal{P}|)$ for some strictly increasing function $f: \{1, \cdots, K\} \to \mathbb{R}_+$. Below we walk through the key steps in solving for the optimal incentive contract, give closed-form solutions and discuss their practical implications.

Optimal wage scheme

We first solve for the optimal wage scheme for any given monitoring technology $\mathcal{P}$ as in Holmström (1979). Specifically, label the performance categories as $A_1, \cdots, A_N$, and write $\pi_n = P_1(A_n)$ and $z_n = z(A_n)$ for $n = 1, \cdots, N$. Assume $z_j \neq z_k$ for some $j, k \in \{1, \cdots, N\}$ to make the analysis interesting. The principal’s problem is then

$$\min_{\{w_n\}} \sum_{n=1}^{N} \pi_n w_n, \quad \text{s.t.} \quad \sum_{n=1}^{N} \pi_n \sqrt{w_n}\, z_n \geq c, \quad \text{(IC)}$$

and

$$w_n \geq 0, \quad n = 1, \cdots, N. \quad \text{(LL)}$$

Straightforward algebra yields the expression for the minimal incentive cost:

$$c^2 \left( \sum_{n=1}^{N} \pi_n \max\{0, z_n\}^2 \right)^{-1}.$$

This expression illustrates the sufficient statistic principle, namely that the $z$-value is the only part of the performance data that provides the agent with incentives.

Optimal monitoring technology

We next solve for the optimal monitoring technology. First, note that the principal should partition performance data based only on their $z$-values, and that different performance categories must attain different $z$-values and wages. The reason combines the sufficient statistic principle with Assumption 1(b), namely merging performance categories of the same $z$-value saves the monitoring cost while leaving the incentive cost unaffected and thus constitutes an improvement to the original monitoring technology.

A more interesting question concerns how we should assign the various data points, identified by their $z$-values, to different performance categories. In the baseline model featuring a single agent and binary efforts, the answer to this question is relatively straightforward: assign high (resp. low) $z$-values to high-wage (resp. low-wage) categories. Here is a quick proof of this result: since the left-hand side of the (IC) constraint is supermodular in wages and $z$-values, if our conjecture were false, then reshuffling data points as above while holding the probabilities of performance categories constant would reduce the incentive cost while leaving the monitoring cost unaffected.

When extending the above intuition to general settings featuring multiple agents or multiple actions, we face two challenges. First, in the case where $z$-values and wages are vectors, the direction of sorting these objects is nonobvious a priori. Second, changes in the sorting algorithm affect wages endogenously through the Lagrange multipliers of the incentive constraints, yielding effects that are new and difficult to assess using standard methods.

The proof strategy presented in Section 3.3 overcomes these challenges, showing that the assignment of Lagrange multiplier-weighted $z$-values to performance categories must be positive assortative in the direction of agent utilities.
Geometrically, this means that any optimal monitoring technology must comprise convex cells in the space of $z$-values or their transformations. Theorems 1, 3 and 5 formalize the above statements.

Implications

An important feature of the optimal monitoring technology is information aggregation, a term used by human resource practitioners to refer to the aggregation of potentially high-dimensional performance data into rank-ordered ratings such as “satisfactory” and “unsatisfactory.”

The geometry of the optimal monitoring technology sheds light on the practical issues covered in Sections 3.4, 4.3 and 5.1. Consider, for example, optimal performance grids. In the current example, it can be shown that the optimal $N$-partitional monitoring technology divides the space $[-1/2, 1/2]$ of $z$-values into $N$ disjoint intervals $[\hat{z}_{n-1}, \hat{z}_n)$, $n = 1, \cdots, N$, where $\hat{z}_0 = -1/2$ and $\hat{z}_N = 1/2$. The optimal cut points $\{\hat{z}_n\}_{n=1}^{N-1}$ can be solved as follows:

$$\min_{\{\hat{z}_n\}} c^2 \left( \sum_{n=1}^{N} \pi_n \max\{0, z_n\}^2 \right)^{-1} + \mu \cdot f(N),$$

where

$$\pi_n = \int_{\hat{z}_{n-1}}^{\hat{z}_n} dZ = \hat{z}_n - \hat{z}_{n-1}, \quad \text{and} \quad z_n = \frac{1}{\pi_n} \int_{\hat{z}_{n-1}}^{\hat{z}_n} Z \, dZ = \frac{1}{2} \left( \hat{z}_{n-1} + \hat{z}_n \right).$$

Straightforward algebra yields

$$\hat{z}_n = \frac{2n - 1}{4N - 2}, \quad n = 1, \cdots, N - 1.$$

Based on this result, as well as the functional form of $f$, we can then solve for the optimal rating scale $N$ and hence the optimal incentive contract completely.

This section analyzes optimal incentive contracts. Results below hold true except perhaps on a measure zero set of data points. The same disclaimer applies to the remainder of this paper.

We first define $Z$-convexity:

Definition 1.

A set $A \in \Sigma$ is $Z$-convex if the following holds for all $\omega', \omega'' \in A$ such that $Z(\omega') \neq Z(\omega'')$:

$$\{\omega \in \Omega : Z(\omega) = (1 - s) \cdot Z(\omega') + s \cdot Z(\omega'') \text{ for some } s \in (0, 1)\} \subset A.$$

In words, a set $A \in \Sigma$ is $Z$-convex if whenever it contains data points of different $z$-values, it must also contain all data points of intermediate $z$-values. Let $Z(A)$ denote the image of any set $A \in \Sigma$ under the mapping $Z$. In the case where $Z(\Omega)$ is a connected set in $\mathbb{R}$, the above definition is equivalent to the convexity of $Z(A)$ in $\mathbb{R}$.

A few assumptions before we go into detail. The next assumption says that the distribution of $Z$ has no atom or hole:

Assumption 2. $Z$ is distributed atomlessly on a connected set $Z(\Omega)$ in $\mathbb{R}$ under $a = 1$.

The next assumption says that $Z(\Omega)$ is a compact set in $\mathbb{R}$:

Assumption 3. $Z(\Omega)$ is a compact set in $\mathbb{R}$.

The next assumption imposes regularities on the monitoring cost function. Part (a) holds for the bits of information carried by the output signal, and Part (b) holds for the entropy of the output signal:

Assumption 4.

The function $h: \Delta^K \to \mathbb{R}_+$ satisfies one of the following conditions:

(a) $h(\pi(\mathcal{P}, a)) = f(|\mathcal{P}|)$ for some strictly increasing function $f: \{1, \cdots, K\} \to \mathbb{R}_+$;

(b) $h$ is continuous.

We now state our main results. The next theorem shows that any optimal incentive contract assigns data points of high (resp. low) $z$-values to high-wage (resp. low-wage) categories. Under Assumption 2, this can be achieved by first dividing $z$-values into disjoint intervals and then backing out the partition of the original data space accordingly. The result is an aggregation of potentially high-dimensional data into rank-ordered ratings, as well as a wage scheme that is strictly increasing in these ratings:

Theorem 1.

Assume Assumption 1 and let $\langle \mathcal{P}^*, w^*(\cdot) \rangle$ be any optimal incentive contract that induces high effort from the agent. Then $\mathcal{P}^*$ comprises $Z$-convex cells labeled as $A_1, \cdots, A_N$ where $w^*(A_1) < \cdots < w^*(A_N)$. Assume, in addition, Assumption 2. Then there exist $\inf Z(\Omega) = \hat{z}_0 < \hat{z}_1 < \cdots < \hat{z}_N = \sup Z(\Omega)$ such that $A_n = \{\omega : Z(\omega) \in [\hat{z}_{n-1}, \hat{z}_n)\}$ for $n = 1, \cdots, N$.

Under Assumption 2, the set of (finite) cut points has measure zero, so it is unimportant which of the two adjacent intervals a cut point belongs to. The choice of expressing all intervals as right half-open ones is purely aesthetic.
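The interval structure in Theorem 1 can be made concrete in the setting of Example 2. The sketch below assumes $u(w) = \sqrt{w}$, $Z$ uniform on $[-1/2, 1/2]$ and the cut points $(2n - 1)/(4N - 2)$ from that example; the function names and the choice $c = 1$ are hypothetical and serve illustration only.

```python
import bisect

def rate(z_value, cuts):
    """Return the 1-based rating n such that z_value lies in [cut_{n-1}, cut_n)."""
    # `cuts` holds only the interior cut points, in increasing order.
    return bisect.bisect_right(cuts, z_value) + 1

def optimal_cuts(N):
    """Interior cut points (2n - 1) / (4N - 2), assuming Z ~ Uniform[-1/2, 1/2]."""
    return [(2 * n - 1) / (4 * N - 2) for n in range(1, N)]

def cell_stats(cuts, lo=-0.5, hi=0.5):
    """(probability, z-value) of each interval cell when Z is uniform on [lo, hi]."""
    edges = [lo] + list(cuts) + [hi]
    return [((b - a) / (hi - lo), (a + b) / 2) for a, b in zip(edges, edges[1:])]

def optimal_wages(cells, c):
    """Cost-minimizing wages for u(w) = sqrt(w): sqrt(w_n) is proportional to max(0, z_n)."""
    s = sum(p * max(0.0, z) ** 2 for p, z in cells)
    return [(c * max(0.0, z) / s) ** 2 for _, z in cells]

cuts = optimal_cuts(3)
cells = cell_stats(cuts)
wages = optimal_wages(cells, c=1.0)

# Wages are strictly increasing in the rating, and the (IC) constraint binds.
assert all(w1 < w2 for w1, w2 in zip(wages, wages[1:]))
ic_lhs = sum(p * w ** 0.5 * z for (p, z), w in zip(cells, wages))
assert abs(ic_lhs - 1.0) < 1e-9
```

Using `bisect_right` places a data point that sits exactly on a cut point into the higher category, which matches the right half-open intervals in the theorem.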

Theorem 2.

An optimal incentive contract that induces high effort from the agent exists under Assumptions 1-4.

Proof. See Appendix A.1.

The proof of Theorem 1 consists of three steps. The intuition behind steps one and two has already been discussed in Example 2. Step three is new.

Step one

We first take any monitoring technology $\mathcal{P}$ as given and solve for the optimal wage scheme as in Holmström (1979):

$$\min_{w: \mathcal{P} \to \mathbb{R}_+} \sum_{A \in \mathcal{P}} P_1(A) w(A) \quad \text{s.t. (IC) and (LL)}. \tag{3.1}$$

The next lemma restates Holmström’s (1979) sufficient statistic principle:

Lemma 1.

Let $w^*(\cdot; \mathcal{P})$ be any solution to Problem (3.1). Then there exists $\lambda > 0$ such that $u'(w^*(A; \mathcal{P})) = 1/(\lambda z(A))$ for all $A \in \mathcal{P}$ such that $w^*(A; \mathcal{P}) > 0$.

Proof. See Appendix A.1.
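As a worked instance of Lemma 1, take the utility function $u(w) = \sqrt{w}$ from Example 2. The lines below are a sketch under that assumption, using the shorthand $\pi_n = P_1(A_n)$, $z_n = z(A_n)$ and $w_n^* = w^*(A_n; \mathcal{P})$; they recover the minimal incentive cost reported in Example 2.

```latex
\begin{align*}
u'(w_n^*) = \frac{1}{2\sqrt{w_n^*}} = \frac{1}{\lambda z_n}
  \;&\Longrightarrow\;
  \sqrt{w_n^*} = \frac{\lambda}{2} \max\{0, z_n\}, \\
\text{binding (IC):}\quad \sum_n \pi_n \sqrt{w_n^*}\, z_n = c
  \;&\Longrightarrow\;
  \frac{\lambda}{2} = \frac{c}{\sum_n \pi_n \max\{0, z_n\}^2}, \\
\text{incentive cost:}\quad \sum_n \pi_n w_n^*
  &= \left( \frac{\lambda}{2} \right)^{2} \sum_n \pi_n \max\{0, z_n\}^2
   = \frac{c^2}{\sum_n \pi_n \max\{0, z_n\}^2}.
\end{align*}
```

Categories with $z_n \leq 0$ receive the zero wage because (LL) binds there, which is why only the positive part of $z_n$ appears.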

Step two

We next demonstrate that different performance categories must attain different $z$-values and wages:

Lemma 2.

Assume Assumption 1. Let $\langle \mathcal{P}^*, w^*(\cdot) \rangle$ be any optimal incentive contract that induces high effort from the agent and label the cells of $\mathcal{P}^*$ as $A_1, \cdots, A_N$ such that $z(A_1) \leq \cdots \leq z(A_N)$. Then $z(A_1) < 0 < \cdots < z(A_N)$ and $w^*(A_1) < \cdots < w^*(A_N)$.

Step three

We finally demonstrate that the assignment of $z$-values into wage categories is positive assortative. In Example 2, we sketched a proof based on supermodularity and pointed out the difficulties of extending that argument to multidimensional environments. The argument below overcomes these difficulties.

Take any optimal incentive contract with distinct performance categories $A_j$ and $A_k$. From Lemma 2, we know that $z(A_j) \neq z(A_k)$. Fix any $\epsilon > 0$, and take any $A'_\epsilon \subset A_j$ and $A''_\epsilon \subset A_k$ such that $P_1(A'_\epsilon) = P_1(A''_\epsilon) = \epsilon$ and $z(A'_\epsilon) = z' \neq z(A''_\epsilon) = z''$. In words, $A'_\epsilon$ and $A''_\epsilon$ have the same probability $\epsilon$ under $a = 1$ but different $z$-values that are independent of $\epsilon$. Lemma 3 of Appendix A.1.1 proves existence of $A'_\epsilon$ and $A''_\epsilon$ when $\epsilon$ is small.

Consider a perturbation to the monitoring technology that “swaps” $A'_\epsilon$ and $A''_\epsilon$. After the perturbation, the new performance categories, denoted by $A_n(\epsilon)$’s, become $A_j(\epsilon) = (A_j \setminus A'_\epsilon) \cup A''_\epsilon$, $A_k(\epsilon) = (A_k \setminus A''_\epsilon) \cup A'_\epsilon$ and $A_n(\epsilon) = A_n$ for $n \neq j, k$. Since the perturbation has no effect on the probabilities of the performance categories under $a = 1$, it does not affect the monitoring cost by Assumption 1(a). Meanwhile, it changes the principal’s Lagrangian to the following (ignore the (LL) constraint):

$$L(\epsilon) = \sum_n \pi_n \left( w_n(\epsilon) - \lambda(\epsilon)\, u(w_n(\epsilon))\, z_n(\epsilon) \right) + \lambda(\epsilon)\, c,$$

where $\pi_n$ denotes the probability of $A_n$ (equivalently $A_n(\epsilon)$) under $a = 1$, $z_n(\epsilon)$ the $z$-value of $A_n(\epsilon)$, $w_n(\epsilon)$ the optimal wage at $A_n(\epsilon)$, and $\lambda(\epsilon)$ the Lagrange multiplier associated with the (IC) constraint. A close inspection of the Lagrangian leads to the following conjecture: to minimize $L(\epsilon)$, the assignment of Lagrange multiplier-weighted $z$-values to performance categories must be positive assortative in the direction of agent utilities.

To develop intuition, we assume differentiability and obtain

$$
\begin{aligned}
L'(0) &= \sum_n \pi_n w_n'(0) - \lambda'(0) \underbrace{\left( \sum_n \pi_n u(w_n(0))\, z_n(0) - c \right)}_{(1)\, =\, 0} \\
&\quad - \lambda(0) \left[ \sum_n \pi_n \cdot \underbrace{u'(w_n(0))\, z_n(0)}_{(2)\, =\, 1/\lambda(0)} \cdot w_n'(0) + \sum_n \pi_n u(w_n(0))\, z_n'(0) \right] \\
&= \sum_n \pi_n w_n'(0) - 0 - \sum_n \pi_n w_n'(0) - \lambda(0) \sum_n \pi_n u(w_n(0))\, z_n'(0) \\
&= -\lambda(0) \sum_n \pi_n u(w_n(0))\, z_n'(0) \\
&= \lambda(0) \left( z'' - z' \right) \left( u(w_k(0)) - u(w_j(0)) \right).
\end{aligned}
$$

In the above expression, $(1) = 0$ because the (IC) constraint binds under the original contract, and $(2) = 1/\lambda(0)$ by Lemma 1. The last line uses $\pi_j z_j'(0) = z'' - z'$ and $\pi_k z_k'(0) = z' - z''$, which follow from the definition of the swap. These findings resolve our concerns raised in Section 3.1, showing that the effects of our perturbation on the Lagrange multiplier and wages are negligible.

To complete the proof, note that $L'(0) \geq 0$ by the optimality of the original contract, and that $L'(0) \neq 0$ because $\lambda(0) > 0$, $z'' \neq z'$ and $w_j(0) \neq w_k(0)$ (Lemma 2). Combining yields $L'(0) > 0$, so our conjecture is indeed true. $Z$-convexity is immediate: if a performance category contains extreme but not intermediate $z$-values, then the assignment of $z$-values goes in the wrong direction and an improvement can be constructed.

The above proof strategy yields the endogenous direction of sorting raw data into performance categories, which is relatively straightforward in the baseline model but is less so in later extensions. The proof in Appendix A.1 does not assume differentiability and handles the limited liability constraint, too.

Strict MLRP

Theorem 1 implies that the signal generated by any optimal monitoring technology must satisfy the strict monotone likelihood ratio property (hereafter strict MLRP) with respect to the order induced by $z$-values:

Definition 2. For any $A, A' \in \Sigma$ of positive measure, write $A \overset{z}{\preceq} A'$ if $z(A) \leq z(A')$.

Corollary 1.

The signal $X: \Omega \to \mathcal{P}^*$ generated by any optimal monitoring technology $\mathcal{P}^*$ satisfies strict MLRP with respect to $\overset{z}{\preceq}$, i.e., any $A, A' \in \mathcal{P}^*$ satisfy $A \overset{z}{\preceq} A'$ if and only if $z(A) < z(A')$.

While the signal generated by any monitoring technology trivially satisfies the weak MLRP with respect to $\overset{z}{\preceq}$ (i.e., replace “$<$” with “$\leq$” in Corollary 1), it violates the strict MLRP if there are multiple performance categories that attain the same $z$-value. By contrast, the signal generated by any optimal monitoring technology must satisfy the strict MLRP with respect to $\overset{z}{\preceq}$, because merging performance categories of the same $z$-value saves the monitoring cost while leaving the incentive cost unaffected.

Comparative statics

The parameter $\mu$ captures factors that affect the (opportunity) cost of data processing and analysis. Factors that reduce $\mu$ include, but are not limited to: the advent of IT-based HR management systems in the 1990s, advancements in speech analytics, and increases in computing power.

To facilitate comparative statics analysis, we write any choice of optimal incentive contract as $\langle \mathcal{P}^*(\mu), w^*(\cdot; \mu) \rangle$ to make its dependence on $\mu$ explicit:

Proposition 1. Fix any $0 < \mu < \mu'$. For any choices of $\langle \mathcal{P}^*(\mu), w^*(\cdot; \mu) \rangle$ and $\langle \mathcal{P}^*(\mu'), w^*(\cdot; \mu') \rangle$:
(i) $\sum_{A \in \mathcal{P}^*(\mu)} P(A)\, w^*(A; \mu) \leq \sum_{A \in \mathcal{P}^*(\mu')} P(A)\, w^*(A; \mu')$;
(ii) $H(\mathcal{P}^*(\mu), 1) \geq H(\mathcal{P}^*(\mu'), 1)$;
(iii) $|\mathcal{P}^*(\mu)| \geq |\mathcal{P}^*(\mu')|$ under Assumption 4(a).

Proof. Part (i) follows from the optimalities of $\mathcal{P}^*(\mu)$ and $\mathcal{P}^*(\mu')$. Parts (ii) and (iii) are immediate.

Proposition 1 shows that as data processing and analysis become cheaper, the principal pays less wage on average and the information carried by the output signal becomes finer. In the case where the monitoring cost is an increasing function of the rating scale (see, e.g., Hook, Jenkins, and Foot (2011)), the optimal rating scale is nonincreasing in $\mu$. For other monitoring cost functions such as entropy, we can first compute the cutoff $z$-values and then the optimal rating scale as in Example 2. Figure 1 plots the numerical solutions obtained in a special case.

The above findings are consistent with several strands of empirical facts. Among others, access to IT has proven to increase the fineness of the performance grids among manufacturing companies, holding other things constant (Bloom and Van Reenen (2006, 2007, 2010); Bloom, Sadun, and Van Reenen (2012)). Crowdsourcing the processing and analysis of real-time data has enabled the "exact individual diagnosis" that separates distinctive and mediocre performers in companies like GE and Zalando (Ewenstein, Hancock, and Komm (2016)).

In general, computing the cutoff $z$-values is not an easy task because perturbations of cutoff $z$-values (which differ from the perturbation considered in Section 3.3) affect wages endogenously through the Lagrange multipliers of the incentive constraints.
See the appendices of Bloom and Van Reenen (2006, 2007) for survey questions regarding theﬁneness of the performance grids, e.g., “Each employee is given a red light (not performing), anamber light (doing well and meeting targets), a green light (consistently meeting targets, very highperformer) and a blue light (high performer capable of promotion of up to two levels),” versus“rewards is based on an individual’s commitment to the company measured by seniority.”
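The comparative statics in Proposition 1 can be illustrated numerically in the setting of Figure 1. The sketch below is not the authors' numerical procedure: it assumes $Z \sim U[-1/2, 1/2]$, $u(w) = \sqrt{w}$, $c = 1$, Shannon entropy as the monitoring cost, partitions into at most three intervals of $z$-values, and cutoffs restricted to a coarse grid. Under $u(w) = \sqrt{w}$, Lemma 1 and the binding (IC) constraint give $\sqrt{w_n} = \lambda z_n / 2$ on cells with $z_n > 0$ and $w_n = 0$ elsewhere, so the minimal expected wage is $c^2 / \sum_{z_n > 0} \pi_n z_n^2$. The function names `cell_stats`, `incentive_cost`, `entropy` and `best_contract` are hypothetical.

```python
import math
from itertools import combinations


def cell_stats(lo, hi):
    """Probability and z-value of the cell {Z in [lo, hi)} when Z ~ U[-1/2, 1/2]."""
    return hi - lo, (lo + hi) / 2.0


def incentive_cost(cells, c=1.0):
    """Minimal expected wage under u(w) = sqrt(w): c^2 / sum of pi_n * z_n^2 over z_n > 0."""
    s = sum(p * z * z for p, z in cells if z > 0)
    return math.inf if s == 0 else c * c / s


def entropy(cells):
    """Shannon entropy of the output signal (assumed monitoring cost function)."""
    return -sum(p * math.log(p) for p, _ in cells)


def best_contract(mu, max_cells=3, grid=40, c=1.0):
    """Grid search over interval partitions with 2..max_cells cells.

    Returns (total cost, expected wage, entropy, number of cells).
    """
    cutoffs = [-0.5 + i / grid for i in range(1, grid)]  # interior grid points
    best = None
    for k in range(1, max_cells):  # k cutoffs produce k + 1 cells
        for cuts in combinations(cutoffs, k):
            edges = (-0.5,) + cuts + (0.5,)
            cells = [cell_stats(a, b) for a, b in zip(edges, edges[1:])]
            wage, info = incentive_cost(cells, c), entropy(cells)
            if best is None or wage + mu * info < best[0]:
                best = (wage + mu * info, wage, info, len(cells))
    return best


for mu in (0.1, 2.0):
    total, wage, info, n_cells = best_contract(mu)
    print(f"mu={mu}: cells={n_cells}, expected wage={wage:.2f}, entropy={info:.3f}")
```

Because both values of $\mu$ are optimized over the same finite set of partitions, the expected wage at the smaller $\mu$ cannot exceed that at the larger $\mu$, and the entropy moves in the opposite direction, mirroring Parts (i) and (ii) of Proposition 1. The grid and the three-cell cap are simplifications, so the reported cell counts need not match Figure 1.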

Figure 1: Plot of the optimal rating scale $|\mathcal{P}|$ against $\mu$: entropy cost, $u(w) = \sqrt{w}$, $Z \sim U[-1/2, 1/2]$, $c = 1$, $K = 100$.

Each of the two agents $i = 1, 2$ earns a payoff $u_i(w_i) - c_i(a_i)$ from spending a nonnegative wage $w_i \geq 0$ and exerting effort $a_i \in \{0, 1\}$. The function $u_i: \mathbb{R}_+ \to \mathbb{R}$ satisfies $u_i(0) = 0$, $u_i' > 0$ and $u_i'' < 0$, and $c_i(1) = c_i > c_i(0) = 0$. Each effort profile $a = a_1 a_2$ generates a probability space $(\Omega, \Sigma, P_a)$, where $\Omega$ is a finite-dimensional Euclidean space that comprises agents' performance data, $\Sigma$ is the Borel sigma-algebra on $\Omega$, and $P_a$ is the probability measure on $(\Omega, \Sigma)$. The $P_a$'s are assumed to be mutually absolutely continuous, and the probability density functions $p_a$ they induce are well-defined and everywhere positive.

In this new setting, a monitoring technology $\mathcal{P}$ can be any partition of $\Omega$ with at most $K$ cells that are all of positive measures, and a wage scheme $w: \mathcal{P} \to \mathbb{R}^2_+$ maps each cell $A$ of $\mathcal{P}$ to a vector $w(A) = (w_1(A), w_2(A))^\top$ of nonnegative wages. For any data point $\omega$, let $A(\omega)$ be the unique performance category that contains $\omega$ and let $w(A(\omega))$ be the wage vector associated with $A(\omega)$. Time evolves as follows:

1. the principal commits to $\langle \mathcal{P}, w(\cdot) \rangle$;
2. agent $i$ privately chooses $a_i \in \{0, 1\}$, $i = 1, 2$;
3. $\omega$ is drawn from $\Omega$ according to $P_a$;
4. the monitoring technology outputs $A(\omega)$;
5. the principal pays $w_i(A(\omega))$ to agent $i = 1, 2$.

Write $\mathbf{1}$ for $(1, 1)^\top$ and define a vector-valued random variable $Z = (Z_1, Z_2)^\top$ by

$$Z_i(\omega) = 1 - \frac{p_{a_i = 0,\, a_{-i} = 1}(\omega)}{p_{\mathbf{1}}(\omega)} \quad \forall \omega \in \Omega,\ i = 1, 2.$$

Define the $z$-value of any set $A \in \Sigma$ of positive measure by $(z_1(A), z_2(A))^\top$, where

$$z_i(A) = E[Z_i \mid A;\ a = \mathbf{1}] \quad \forall i = 1, 2.$$

A contract is incentive compatible for agent $i$ if

$$\sum_{A \in \mathcal{P}} P_{\mathbf{1}}(A)\, u_i(w_i(A))\, z_i(A) \geq c_i, \qquad (\text{IC}_i)$$

and it satisfies agent $i$'s limited liability constraint if

$$w_i(A) \geq 0 \quad \forall A \in \mathcal{P}. \qquad (\text{LL}_i)$$

An optimal contract minimizes the total implementation cost under the high effort profile, subject to agents' incentive compatibility constraints and limited liability constraints:

$$\min_{\langle \mathcal{P}, w(\cdot) \rangle} \sum_{A \in \mathcal{P}} P_{\mathbf{1}}(A) \sum_{i=1}^{2} w_i(A) + \mu \cdot H(\mathcal{P}, \mathbf{1}) \quad \text{s.t. } (\text{IC}_i) \text{ and } (\text{LL}_i),\ i = 1, 2.$$

The next definition generalizes $Z$-convexity:

Definition 3. A set $A \in \Sigma$ is $Z$-convex if the following holds for all $\omega', \omega'' \in A$ such that $Z(\omega') \neq Z(\omega'')$:

$$\{\omega \in \Omega : Z(\omega) = (1 - s) \cdot Z(\omega') + s \cdot Z(\omega'') \text{ for some } s \in (0, 1)\} \subset A.$$

The next two assumptions impose regularities on the principal's problem analogously to Assumptions 2 and 3:

Assumption 5. $Z$ is distributed atomlessly on a connected set $Z(\Omega)$ in $\mathbb{R}^2$ under $a = \mathbf{1}$.

Assumption 6. $Z(\Omega)$ is a compact set in $\mathbb{R}^2$ with $\dim Z(\Omega) = 2$.

The next theorems extend Theorems 1 and 2 to encompass multiple agents:

Theorem 3. Assume Assumptions 1, 5 and 6. Then any optimal monitoring technology comprises $Z$-convex cells that constitute convex polygons in $\mathbb{R}^2$.

Theorem 4. An optimal incentive contract that induces high effort from both agents exists under Assumptions 1, 4, 5 and 6.

Proof. See Appendix A.2.

Proof sketch

The proof strategy developed in Section 3.3 is useful for handling vector-valued $z$-values and wages. As before, fix any $\epsilon > 0$, and take any subsets $A'_\epsilon$ and $A''_\epsilon$ of two distinct performance categories $A_j$ and $A_k$, respectively, such that $P(A'_\epsilon) = P(A''_\epsilon) = \epsilon$ and $z(A'_\epsilon) = z' \neq z(A''_\epsilon) = z''$ (Lemma 5 of Appendix A.2.1 proves existence of sets that satisfy weaker properties). After the perturbation described in Section 3.3, the principal's Lagrangian becomes (ignoring the (LL$_i$) constraints):

$$L(\epsilon) = \sum_n \pi_n \sum_i w_{i,n}(\epsilon) - \sum_i \lambda_i(\epsilon) \Big(\sum_n \pi_n u_i(w_{i,n}(\epsilon))\, z_{i,n}(\epsilon) - c_i\Big),$$

where $\pi_n$ denotes the probability of $A_n$ (equivalently, $A_n(\epsilon)$) under $a = \mathbf{1}$, $w_{i,n}(\epsilon)$ agent $i$'s optimal wage at $A_n(\epsilon)$ and $\lambda_i(\epsilon)$ the Lagrange multiplier associated with the (IC$_i$) constraint. Assuming differentiability, we obtain

$$L'(0) = -\sum_n \pi_n \cdot u_n^\top \begin{pmatrix} \lambda_1(0) & 0 \\ 0 & \lambda_2(0) \end{pmatrix} \frac{d}{d\epsilon} z_n(\epsilon) \Big|_{\epsilon = 0} = (u_k - u_j)^\top (\hat{z}'' - \hat{z}'),$$

where $u_n = (u_1(w_{1,n}(0)), u_2(w_{2,n}(0)))^\top$ for all $n$ and

$$\hat{z} = \begin{pmatrix} \lambda_1(0) & 0 \\ 0 & \lambda_2(0) \end{pmatrix} z \quad \text{for } z = z', z''.$$

Since $L'(0) \geq 0$, the assignment of Lagrange multiplier-weighted $z$-values into performance categories must be "positive assortative," where the direction of sorting is given by the vector of agents' utilities. This implies $Z$-convexity for the same reason as in Section 3.3.

Implications

Solving the optimal convex polygons is computationally hard. That said, note that the boundaries of convex polygons consist of straight line segments in $Z(\Omega)$, which combined with Assumption 5 yields the following observations:

• any bi-partitional contract takes the form of either a team or a tournament and is fully captured by the intercept and slope of the straight line as depicted in Figure 2;

• contracts that evaluate and reward agents on an individual basis are fully determined by the individual performance cutoffs as depicted in Figure 3.

Figure 2: Bi-partitional contracts: team and tournament.

Figure 3: An individual incentive contract.

This section compares individual and group performance evaluations from the angle of monitoring cost. To obtain the sharpest insights, suppose that agents are technologically independent:

Assumption 7. There exist probability spaces $\{(\Omega_i, \Sigma_i, P_{i,a_i})\}_{i, a_i}$ as in Section 2 such that $(\Omega, \Sigma, P_a) = (\Omega_1 \times \Omega_2,\ \Sigma_1 \otimes \Sigma_2,\ P_{1,a_1} \times P_{2,a_2})$ for all $a \in \{0, 1\}^2$.

In the language of contract theory, Assumption 7 rules out any technological link (i.e., $\omega_i$ depends on $a_{-i}$) or common productivity shock (i.e., $\omega_1, \omega_2$ are correlated given $a$) between agents.

The next definition is standard:

Definition 4.
(i) $\mathcal{P}$ is an individual monitoring technology if for all $A \in \mathcal{P}$, there exist $A_1 \in \Sigma_1$ and $A_2 \in \Sigma_2$ such that $A = A_1 \times A_2$; otherwise $\mathcal{P}$ is a group monitoring technology;
(ii) Let $\mathcal{P}$ be any individual monitoring technology. Then $w: \mathcal{P} \to \mathbb{R}^2_+$ is an individual wage scheme if $w_i(A_i \times A'_{-i}; \mathcal{P}) = w_i(A_i \times A''_{-i}; \mathcal{P})$ for all $i = 1, 2$ and $A_i \times A'_{-i},\ A_i \times A''_{-i} \in \mathcal{P}$; otherwise $w: \mathcal{P} \to \mathbb{R}^2_+$ is a group wage scheme;
(iii) $\langle \mathcal{P}, w: \mathcal{P} \to \mathbb{R}^2_+ \rangle$ is an individual incentive contract if $\mathcal{P}$ is an individual monitoring technology and $w: \mathcal{P} \to \mathbb{R}^2_+$ is an individual wage scheme; otherwise it is a group incentive contract.

By definition, a group incentive contract either conducts group performance evaluations or pairs individual performance evaluations with group incentive pays. Under Assumption 7, the second option is sub-optimal by the sufficient statistics principle of Holmström (1982), thus reducing the comparison between individual and group incentive contracts to that of individual and group performance evaluations.

Let $I$ be the ratio between the minimal cost of implementing bi-partitional incentive contracts and that of implementing individual incentive contracts (the latter, by definition, have at least four performance categories).

Corollary 2. Under Assumptions 1, 4(a), 5, 6 and 7, $I < 1$ when $\mu$ is large.

Beyond the case considered in Corollary 2, we can compute $I$ numerically based on the prior discussion about how to parameterize bi-partitional and individual incentive contracts. Figure 4 plots the solutions obtained in a special case.

Figure 4: Plot of $I$ against $\mu$: entropy cost, $u_i(w) = \sqrt{w}$, $Z_i \sim U[-1/2, 1/2]$ and $c_i = 1$ for $i = 1, 2$.

In the future, it will be interesting to nail down the role of IT in Bloom and Van Reenen (2006, 2007), and to replicate these studies for recent advancements in data technologies.

In this section, suppose that the agent's action space $\mathcal{A}$ is a finite set, and that taking an action $a$ in $\mathcal{A}$ incurs a cost $c(a)$ to the agent and generates a probability space $(\Omega, \Sigma, P_a)$ as in Section 2. The principal wishes to induce the most costly action $a^*$, i.e., $c(a^*) > c(a)$ for all $a \in \mathcal{D} = \mathcal{A} \setminus \{a^*\}$. For any deviation from $a^*$ to $a \in \mathcal{D}$, define a random variable $Z_a: \Omega \to \mathbb{R}$ by

$$Z_a(\omega) = 1 - \frac{p_a(\omega)}{p_{a^*}(\omega)} \quad \forall \omega \in \Omega.$$

For any $a \in \mathcal{D}$ and set $A \in \Sigma$ of positive measure, define

$$z_a(A) = E[Z_a \mid A;\ a^*].$$

A contract is incentive compatible if for all $a \in \mathcal{D}$:

$$\sum_{A \in \mathcal{P}} P_{a^*}(A)\, u(w(A))\, z_a(A) \geq c(a^*) - c(a). \qquad (\text{IC}_a)$$

See the survey questions of Bloom and Van Reenen (2006, 2007) regarding the choices between individual and group evaluations, e.g., "employees are rewarded based on their individual contributions to the company," and "compensation is based on shift/plant-level outcomes." The former is regarded as an advanced but expensive managerial practice and is more prevalent among companies with better IT access, other things being equal.

An optimal incentive contract $\langle \mathcal{P}^*, w^*(\cdot) \rangle$ that induces $a^*$ solves

$$\min_{\langle \mathcal{P}, w(\cdot) \rangle} \sum_{A \in \mathcal{P}} P_{a^*}(A)\, w(A) + \mu \cdot H(\mathcal{P}, a^*) \quad \text{s.t. } (\text{IC}_a)\ \forall a \in \mathcal{D} \text{ and (LL)}.$$

Write $Z$ for $(Z_a)^\top_{a \in \mathcal{D}}$. For any $|\mathcal{D}|$-vector $\lambda = (\lambda_a)^\top_{a \in \mathcal{D}}$ in $\mathbb{R}^{|\mathcal{D}|}_+$, define a random variable $Z_\lambda: \Omega \to \mathbb{R}$ by

$$Z_\lambda(\omega) = \lambda^\top Z(\omega) \quad \forall \omega \in \Omega.$$

The next definition generalizes $Z$-convexity:

Definition 5. A set $A \in \Sigma$ is $Z_\lambda$-convex if the following holds for all $\omega', \omega'' \in A$ such that $Z_\lambda(\omega') \neq Z_\lambda(\omega'')$:

$$\{\omega : Z_\lambda(\omega) = (1 - s) \cdot Z_\lambda(\omega') + s \cdot Z_\lambda(\omega'') \text{ for some } s \in (0, 1)\} \subset A.$$

The next theorems extend Theorems 1 and 2 to encompass multiple actions:

Theorem 5. Assume Assumption 1 and Assumption 3 for all $a \in \mathcal{D}$. Then for any optimal incentive contract $\langle \mathcal{P}^*, w^*(\cdot) \rangle$ that induces $a^*$, there exists $\lambda^* \in \mathbb{R}^{|\mathcal{D}|}_+$ with $\|\lambda^*\| > 0$ such that all cells of $\mathcal{P}^*$ are $Z_{\lambda^*}$-convex and can be labeled as $A_1, \cdots, A_N$ such that $w^*(A_1) < \cdots < w^*(A_N)$. Assume, in addition, Assumption 2 for all $a \in \mathcal{D}$. Then there exist $-\infty \leq \hat{z}_0 < \cdots < \hat{z}_N < +\infty$ such that $A_n = \{\omega : Z_{\lambda^*}(\omega) \in [\hat{z}_{n-1}, \hat{z}_n)\}$ for $n = 1, \cdots, N$.

Here and in the remainder of this paper, $\|\cdot\|$ denotes the sup norm.

Theorem 6. Assume Assumptions 1 and 4, as well as Assumptions 2 and 3 for all $a \in \mathcal{D}$. Then an optimal incentive contract that induces $a^*$ exists.

Proof. See Appendix A.3.

In the presence of multiple actions, each data point is associated with finitely many $z$-values, each corresponding to a deviation from $a^*$ that the agent can potentially commit. By establishing that the assignment of Lagrange multiplier-weighted $z$-values into wage categories is positive assortative, Theorem 5 relates the focus of data processing and analysis to the agent's endogenous tendencies to commit deviations. Intuitively, when $\lambda^*_a$ is large and hence the agent is tempted to commit deviation $a$, focus should be given to the information $Z_a$ that helps detect deviation $a$. The next section gives an application of this result.

A single agent can exert either high or low effort $a_i \in \{0, 1\}$ in each of the two tasks $i = 1, 2$, and each $a_i$ independently generates a probability space $(\Omega_i, \Sigma_i, P_{i,a_i})$ as in Section 2. The goal of a risk-neutral principal is to induce high effort in both tasks. Write $a = a_1 a_2$, $\omega = \omega_1 \omega_2$, $\mathcal{A} = \{00, 01, 10, 11\}$, $a^* = 11$ and $\mathcal{D} = \{00, 01, 10\}$. For any $i = 1, 2$ and $\omega_i \in \Omega_i$, define

$$Z_i(\omega_i) = 1 - \frac{p_{i, a_i = 0}(\omega_i)}{p_{i, a_i = 1}(\omega_i)},$$

where $p_{i,a_i}$ is the probability density function induced by $P_{i,a_i}$. Since the two tasks generate data independently, $Z_{01} = Z_1$, $Z_{10} = Z_2$ and $Z_{00} = 1 - (1 - Z_1)(1 - Z_2)$. Accordingly, for any $\omega \in \Omega_1 \times \Omega_2$ and $\lambda = (\lambda_{00}, \lambda_{01}, \lambda_{10})^\top \in \mathbb{R}^3_+$, define

$$Z_\lambda(\omega) = (\lambda_{01} + \lambda_{00}) \cdot Z_1(\omega_1) + (\lambda_{10} + \lambda_{00}) \cdot Z_2(\omega_2) - \lambda_{00} \cdot Z_1(\omega_1) Z_2(\omega_2).$$

The next corollary is immediate from Theorem 5:

Corollary 3. Assume Assumption 1 and Assumption 3 for all $a \in \mathcal{D}$. Then for any optimal incentive contract $\langle \mathcal{P}^*, w^*(\cdot) \rangle$ that induces high effort in both tasks, there exists $\lambda^* = (\lambda^*_{00}, \lambda^*_{01}, \lambda^*_{10})^\top \in \mathbb{R}^3_+$ with $\lambda^*_{01} + \lambda^*_{00},\ \lambda^*_{10} + \lambda^*_{00} > 0$ such that all cells of $\mathcal{P}^*$ are $Z_{\lambda^*}$-convex and can be labeled as $A_1, \cdots, A_N$ such that $w^*(A_1) < \cdots < w^*(A_N)$. Assume, in addition, Assumption 2 for all $a \in \mathcal{D}$. Then there exist $-\infty \leq \hat{z}_0 < \cdots < \hat{z}_N < +\infty$ such that $A_n = \{\omega : Z_{\lambda^*}(\omega) \in [\hat{z}_{n-1}, \hat{z}_n)\}$ for $n = 1, \cdots, N$.

In a seminal paper, Holmström and Milgrom (1991) show that when the agent faces multiple tasks, over-incentivizing tasks that generate precise performance data may prevent the completion of tasks that generate noisy performance data. That analysis abstracts away from monitoring costs and focuses on the power of (linear) compensation schemes.

Corollary 3 delivers a different message: when it comes to allocating limited resources across the assessments of multiple task performances, the optimal allocation should reflect the agent's endogenous tendency to shirk each task. The usefulness of this result is illustrated by the next example:

Example 3. A cashier faces two tasks: to scan items and to project warmth to customers. A piece of performance data consists of the scanner data recorded by the point of sale (POS) system, as well as the feedback gathered from customers. By Corollary 3, the following ratio:

$$R = \frac{\lambda^*_{01} + \lambda^*_{00}}{\lambda^*_{10} + \lambda^*_{00}}$$

captures how the principal should allocate limited resources across the assessments of skillfulness in scanning items and warmth. Intuitively, a small $R$ arises when the cashier is reluctant to project warmth to customers, in which case resources should be devoted to the assessment of warmth, and the final performance rating should depend significantly on such assessment.

We examine how optimal resource allocation varies with the precision of raw performance data. As in Holmström and Milgrom (1991), we assume that

• $\omega_i = a_i + \xi_i$ for $i = 1, 2$, where the $\xi_i$'s are independent normal random variables with mean zero and variances $\sigma_i^2$;

• the cashier has CARA utility of consumption $u(w) = 1 - \exp(-\gamma w)$.

Unlike Holmström and Milgrom (1991), we do not confine ourselves to linear wage schemes.

In the case where the monitoring cost is an increasing function of the rating scale, we compute $R$ for different values of $\sigma_1$, holding $\sigma_2 = 1$ and $|\mathcal{P}| = 2$ fixed. Our findings are reported in Figure 5. Assuming that our parameter choices are reasonable ones, we arrive at the following conclusion: as skillfulness becomes easier to measure, thanks to the advent of high quality scanner data, the cashier becomes more afraid to shirk the scanning task and less so about projecting coldness to customers; to correct the cashier's incentive, resources should be shifted towards the processing and analysis of customer feedback and away from that of scanner data. In the future, one can test this prediction by running field experiments such as that of Bloom et al. (2013). For example, one can randomize the quality of scanner data among otherwise similar stores and examine the effect on resource allocation between scanner data and customer feedback.

Figure 5: Plot of $R$ against $\sigma_1$: $H(\mathcal{P}, a) = f(|\mathcal{P}|)$, $|\mathcal{P}| = 2$; CARA utility; $c(00) = 0$; $\xi_1$ and $\xi_2$ are normally distributed with mean zero and $\sigma_2 = 1$.

We conclude by posing open questions. First, our work is broadly related to the burgeoning literature on information design (see, e.g., Bergemann and Morris (2019) for a survey), and we hope it inspires new research questions such as how to conduct costly yet flexible monitoring in long-term employment relationships.
Second, our theory may guide investigations into empirical issues such as how advancements in big data technologies have affected the design and implementation of monitoring technologies, and whether they can partially explain the heterogeneity in the internal organizations of otherwise similar firms. We hope that someone, maybe ourselves, will pursue these research agendas in the future.

A Omitted Proofs

A.1 Proofs of Section 3

In this appendix, write any $N$-partitional contract $\langle \mathcal{P}, w(\cdot) \rangle$ as the corresponding tuple $\langle A_n, \pi_n, z_n, w_n \rangle_{n=1}^N$, where $A_n$ is a generic cell of $\mathcal{P}$, $\pi_n = P(A_n)$, $z_n = z(A_n)$ and $w_n = w(A_n)$. Assume w.l.o.g. that $z_1 \leq \cdots \leq z_N$.

A.1.1 Useful Lemmas

Proof of Lemma 1

Proof. The wage-minimization problem for given monitoring technology $\langle A_n, \pi_n, z_n \rangle_{n=1}^N$ as in Lemma 1 is

$$\min_{\langle \tilde{w}_n \rangle} \sum_{n=1}^N \pi_n \tilde{w}_n - \lambda \Big(\sum_{n=1}^N \pi_n u(\tilde{w}_n) z_n - c\Big) - \sum_{n=1}^N \eta_n \tilde{w}_n,$$

where $\lambda$ and $\eta_n$ denote the Lagrange multipliers associated with the (IC) constraint and the (LL) constraint at $\tilde{w}_n$, respectively. Differentiating the objective function with respect to $\tilde{w}_n$ and setting the result equal to zero yields

$$\lambda z_n u'(w_n) = 1 - \eta_n / \pi_n,$$

implying that $u'(w_n) = 1/(\lambda z_n)$ if and only if $w_n > 0$.

Proof of Lemma 2

Proof. Fix any optimal incentive contract that induces high effort from the agent and let $\langle A_n, \pi_n, z_n, w_n \rangle_{n=1}^N$ be the corresponding tuple. Note that $N \geq 2$. By Assumption 1(b), if $w_j = w_k$ for some $j \neq k$, then merging $A_j$ and $A_k$ has no effect on the incentive cost but strictly reduces the monitoring cost, which contradicts the optimality of the original contract. Then from Lemma 1 and the assumption $z_1 \leq \cdots \leq z_N$, it follows that $0 \leq w_1 < \cdots < w_N$ and $z_1 < \cdots < z_N$. In particular, we must have $z_1 < \sum_{n=1}^N \pi_n z_n = 0$. This implies $w_1 = 0$, because otherwise letting $w_1 = 0$ reduces the expected wage and relaxes the (IC) constraint while keeping the (LL) constraint satisfied. Finally, combining $w_n > 0$ for $n \geq 2$ with Lemma 1 yields $z_n > 0$ for $n \geq 2$.

Lemma 3. For all $A \in \Sigma$ such that $P(A) > 0$ and $\epsilon \in (0, P(A)]$, there exists $A_\epsilon \subset A$ such that $P(A_\epsilon) = \epsilon$ and $z(A_\epsilon) = z(A)$.

Proof. Let $A$ be as above. Since $P$ admits a density, it follows that for all $t \in (0, P(A)]$, there exists $B_t \subset A$ such that $P(B_t) = t$ and $Z(\omega') \leq Z(\omega)$ for all $\omega \in B_t$ and $\omega' \in A \setminus B_t$. Likewise, there exists $C_t \subset A$ such that $P(C_t) = t$ and $Z(\omega') \geq Z(\omega)$ for all $\omega \in C_t$ and $\omega' \in A \setminus C_t$. For $t = 0$ define $B_0 = C_0 = \emptyset$.

Let $\epsilon$ be as above. Consider $B_t \cup C_{\epsilon - t}$, $t \in [0, \epsilon]$. Since $z(B_t) \geq z(A)$ and $z(C_{\epsilon - t}) \leq z(A)$ for all $t \in (0, \epsilon)$ and $z(B_t \cup C_{\epsilon - t})$ is continuous in $t$ (because $P$ admits a density), there exists $t \in [0, \epsilon]$ such that $z(B_t \cup C_{\epsilon - t}) = z(A)$. Meanwhile $P(B_t \cup C_{\epsilon - t}) = \epsilon$ by construction, so let $A_\epsilon = B_t \cup C_{\epsilon - t}$ and we are done.

A.1.2 Proof of Theorem 1

Proof. Take any optimal incentive contract that induces high effort from the agent and let $\langle A_n, \pi_n, z_n, w_n \rangle_{n=1}^N$ be the corresponding tuple. Suppose, to the contrary, that some $A_j$ is not $Z$-convex. By Definition 1, there exist $A', A'' \subset A_j$ and $\tilde{A} \subset A_k$, $k \neq j$ such that (i) $P(A'), P(A''), P(\tilde{A}) > 0$, and (ii) $\tilde{z} = (1 - s) z' + s z''$, where $z' := z(A') \neq z'' := z(A'')$, $\tilde{z} := z(\tilde{A})$ and $s \in (0, 1)$. By Lemma 3, for any $\epsilon \in (0, \min\{P(A'), P(A''), P(\tilde{A})\})$, there exist $A'_\epsilon \subset A'$, $A''_\epsilon \subset A''$ and $\tilde{A}_\epsilon \subset \tilde{A}$ such that (i) $P(A'_\epsilon) = P(A''_\epsilon) = P(\tilde{A}_\epsilon) = \epsilon$, and (ii) $z(A'_\epsilon) = z'$, $z(A''_\epsilon) = z''$ and $z(\tilde{A}_\epsilon) = \tilde{z}$.

Consider two perturbations to the monitoring technology: (a) move $A'_\epsilon$ to $A_k$ and $\tilde{A}_\epsilon$ to $A_j$; (b) move $\tilde{A}_\epsilon$ to $A_j$ and $A''_\epsilon$ to $A_k$. By construction, neither perturbation affects the probability distribution of the output signal under high effort and hence the monitoring cost. Below we demonstrate that one of them strictly reduces the incentive cost compared to the original (optimal) contract.

Perturbation (a)

Let $\langle A_n(\epsilon), \pi_n, z_n(\epsilon) \rangle_{n=1}^N$ be the tuple associated with the monitoring technology after perturbation (a). By construction, $A_j(\epsilon) = (A_j \cup \tilde{A}_\epsilon) \setminus A'_\epsilon$, so

$$z_j(\epsilon) = \frac{\pi_j z_j - \epsilon z' + \epsilon \tilde{z}}{\pi_j} = z_j + \frac{s (z'' - z')}{\pi_j} \epsilon.$$

Likewise, $A_k(\epsilon) = (A_k \cup A'_\epsilon) \setminus \tilde{A}_\epsilon$ and $A_n(\epsilon) = A_n$ for $n \neq j, k$, and similar algebraic manipulation as above yields

$$
\begin{cases}
z_j(\epsilon) = z_j + \dfrac{s (z'' - z')}{\pi_j} \epsilon, \\
z_k(\epsilon) = z_k - \dfrac{s (z'' - z')}{\pi_k} \epsilon, \\
z_n(\epsilon) = z_n \quad \forall n \neq j, k.
\end{cases}
\qquad (\text{A.1})
$$

Consider wage profiles $\langle w_n(\epsilon) \rangle_{n=1}^N$ such that $w_1(\epsilon) = 0$ and the (IC) constraint remains binding after the perturbation, i.e.,

$$\sum_{n=1}^N \pi_n u(w_n(\epsilon)) z_n(\epsilon) = \sum_{n=1}^N \pi_n u(w_n) z_n = c. \qquad (\text{A.2})$$

A close inspection of Equations (A.1) and (A.2) reveals the existence of $M > 0$ independent of $\epsilon$ such that when $\epsilon$ is small, there exist wage profiles as above that satisfy $|w_n(\epsilon) - w_n| < M \epsilon$ for all $n$ and hence the (LL) constraint by Lemma 2. To be precise, recall that $u(w_n), z_n > 0$ for $n \geq 2$. Thus when $\epsilon$ is small, $z_n(\epsilon) > 0$ for $n \geq 2$, and solving $\sum_{n=2}^N \pi_n u(x_n) z_n(\epsilon) = \sum_{n=2}^N \pi_n u(w_n) z_n$ yields wage profiles as above.

With a slight abuse of notation, write $\dot{w}_n(\epsilon) = (w_n(\epsilon) - w_n)/\epsilon$ and $\dot{z}_n(\epsilon) = (z_n(\epsilon) - z_n)/\epsilon$, and note that $\dot{w}_1(\epsilon) = 0$. Note that we do not assume differentiability of $w_n(\epsilon)$ or $z_n(\epsilon)$ with respect to $\epsilon$; the same disclaimer applies to the remainder of this paper. When $\epsilon$ is small, expanding Equation (A.2) using the twice-differentiability of $u(\cdot)$ and $|w_n(\epsilon) - w_n| \sim O(\epsilon)$ yields

$$\sum_{n=1}^N \pi_n u(w_n) z_n = \sum_{n=1}^N \pi_n \big(u(w_n) + u'(w_n) \cdot \dot{w}_n(\epsilon) \cdot \epsilon + O(\epsilon^2)\big) \big(z_n + \dot{z}_n(\epsilon) \cdot \epsilon\big).$$

Multiplying the above equation by the Lagrange multiplier $\lambda > 0$ and rearranging yields

$$\sum_{n=1}^N \pi_n \cdot u'(w_n) \cdot \lambda z_n \cdot \dot{w}_n(\epsilon) = -\lambda \sum_{n=1}^N u(w_n) \cdot \pi_n \dot{z}_n(\epsilon) + O(\epsilon),$$

and simplifying using $\dot{w}_1(\epsilon) = 0$, $u'(w_n) = 1/(\lambda z_n)$ for $n \geq 2$ and Equation (A.1) yields

$$\sum_{n=1}^N \pi_n \dot{w}_n(\epsilon) = s \big(u(w_k) - u(w_j)\big) (\lambda z'' - \lambda z') + O(\epsilon). \qquad (\text{A.3})$$

Perturbation (b)

Repeating the above argument for perturbation (b) yields

$$\sum_{n=1}^N \pi_n \dot{w}_n(\epsilon) = -\lambda (1 - s) \big(u(w_k) - u(w_j)\big) (z'' - z') + O(\epsilon). \qquad (\text{A.4})$$

Since $u(w_j) \neq u(w_k)$ (Lemma 2), $z' \neq z''$ (by assumption) and $\lambda > 0$, the leading terms of Equations (A.3) and (A.4) are nonzero and of opposite signs, so the right-hand side of either Equation (A.3) or (A.4) is strictly negative when $\epsilon$ is small. Thus for either perturbation (a) or (b), we can construct a wage profile that incurs a lower incentive cost than the original optimal contract, and this leads to a contradiction.
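The logic of Theorem 1 can be checked on a small numerical example. The sketch below is only an illustration under stated assumptions, not part of the proof: it takes $Z \sim U[-1/2, 1/2]$, $u(w) = \sqrt{w}$ and $c = 1$, for which Lemma 1 and the binding (IC) constraint imply that the minimal expected wage equals $c^2 / \sum_{z_n > 0} \pi_n z_n^2$. It compares a two-cell monitoring technology whose first cell is not $Z$-convex with a $Z$-convex one that has the same cell probabilities (and hence the same monitoring cost when that cost depends only on the distribution of the output signal). The helper names `z_value` and `incentive_cost` are hypothetical, and the comparison is a global rearrangement rather than the $\epsilon$-perturbation used in the proof.

```python
def z_value(intervals):
    """Probability and z-value of a union of disjoint intervals when Z ~ U[-1/2, 1/2]."""
    prob = sum(b - a for a, b in intervals)
    # The density is 1 on [-1/2, 1/2], so the integral of z over [a, b] is (b^2 - a^2) / 2.
    mean = sum((b * b - a * a) / 2.0 for a, b in intervals) / prob
    return prob, mean


def incentive_cost(cells, c=1.0):
    """Minimal expected wage under u(w) = sqrt(w): c^2 / sum of pi_n * z_n^2 over z_n > 0."""
    s = sum(p * z * z for p, z in cells if z > 0)
    return float("inf") if s == 0 else c * c / s


# Cell A_j holds extreme z-values but skips the intermediate ones held by A_k.
non_convex = [z_value([(-0.5, 0.0), (0.25, 0.5)]), z_value([(0.0, 0.25)])]
# Z-convex alternative with the same cell probabilities (0.75 and 0.25).
convex = [z_value([(-0.5, 0.25)]), z_value([(0.25, 0.5)])]

cost_non_convex = incentive_cost(non_convex)
cost_convex = incentive_cost(convex)
print(f"non-Z-convex cells: expected wage = {cost_non_convex:.2f}")
print(f"Z-convex cells:     expected wage = {cost_convex:.2f}")
```

In this example the $Z$-convex technology induces high effort at a strictly lower expected wage, which is the direction of improvement that the perturbation argument establishes in general.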

A.1.3 Proof of Theorem 2

Proof. By Theorem 1, any optimal monitoring technology with at most $N \in \{2, \cdots, K\}$ cells is fully characterized by $N - 1$ cutoffs $\hat{z}_1, \cdots, \hat{z}_{N-1}$ satisfying $\min Z(\Omega) \leq \hat{z}_1 \leq \cdots \leq \hat{z}_{N-1} \leq \max Z(\Omega)$. Write $\hat{\mathbf{z}} = (\hat{z}_1, \cdots, \hat{z}_{N-1})^\top$. Define

$$\mathcal{Z}_N = \{\hat{\mathbf{z}} : \min Z(\Omega) \leq \hat{z}_1 \leq \cdots \leq \hat{z}_{N-1} \leq \max Z(\Omega)\},$$

equip $\mathcal{Z}_N$ with the sup norm $\|\cdot\|$, and note that $\mathcal{Z}_N$ is compact by Assumption 3. Let $W(\hat{\mathbf{z}})$ be the minimal incentive cost for inducing high effort from the agent under the monitoring technology formed by $\hat{\mathbf{z}}$. Note that $W(\hat{\mathbf{z}})$ exists and is finite if and only if $\min Z(\Omega) < \hat{z}_n < \max Z(\Omega)$ for some $n$, because then $z(A) \not\equiv 0$ across the cells $A$ formed under $\hat{\mathbf{z}}$, so $W(\hat{\mathbf{z}})$ can be solved by applying Lemma 1. We proceed in two steps.

Step 1

Show that $W(\hat{\mathbf{z}})$ is continuous in $\hat{\mathbf{z}}$ for any given $N \in \{2, \cdots, K\}$.

Fix any $\hat{\mathbf{z}} \in \mathcal{Z}_N$ such that $W(\hat{\mathbf{z}})$ is finite. W.l.o.g. consider the case where the $\hat{z}_n$'s are all distinct. For sufficiently small $\delta > 0$, let $\hat{\mathbf{z}}^\delta$ be any element of $\mathcal{Z}_N$ such that $\|\hat{\mathbf{z}}^\delta - \hat{\mathbf{z}}\| < \delta$. Let $\pi_n$ and $z_n$ (resp. $\pi_n^\delta$ and $z_n^\delta$) denote the probability (under $a = 1$) and $z$-value of $A_n = \{\omega : Z(\omega) \in [\hat{z}_{n-1}, \hat{z}_n)\}$ (resp. $A_n^\delta = \{\omega : Z(\omega) \in [\hat{z}_{n-1}^\delta, \hat{z}_n^\delta)\}$), respectively. Let $w_n$ denote the optimal wage at $A_n$.

Fix any $\epsilon > 0$, and consider the wage profile that pays $w_n + \epsilon$ at $A_n^\delta$ if $z_n^\delta > 0$ and $w_n$ otherwise. By construction, this wage profile satisfies the (LL) constraint. Under Assumptions 2 and 3, it satisfies the (IC) constraint when $\delta$ is sufficiently small:

$$\lim_{\delta \to 0} \sum_n \pi_n^\delta\, u\big(w_n + \mathbb{1}_{z_n^\delta > 0} \cdot \epsilon\big) z_n^\delta = \sum_n \pi_n\, u\big(w_n + \mathbb{1}_{z_n > 0} \cdot \epsilon\big) z_n > c,$$

where the inequality holds because $\sum_n \pi_n z_n = 0$ and $z_n \not\equiv 0$ imply $z_n > 0$ for some $n$. In addition, since

$$\lim_{\delta \to 0} \sum_n \pi_n^\delta \big(w_n + \mathbb{1}_{z_n^\delta > 0} \cdot \epsilon\big) = \sum_n \pi_n \big(w_n + \mathbb{1}_{z_n > 0} \cdot \epsilon\big),$$

it follows that when $\delta$ is sufficiently small,

$$W(\hat{\mathbf{z}}^\delta) - W(\hat{\mathbf{z}}) \leq \sum_n \pi_n^\delta \big(w_n + \mathbb{1}_{z_n^\delta > 0} \cdot \epsilon\big) - \sum_n \pi_n w_n < \epsilon,$$

where the first inequality holds because the constructed wage profile is not necessarily optimal under $\hat{\mathbf{z}}^\delta$. Finally, interchanging the roles of $\hat{\mathbf{z}}$ and $\hat{\mathbf{z}}^\delta$ in the above derivation yields $W(\hat{\mathbf{z}}) - W(\hat{\mathbf{z}}^\delta) < \epsilon$, implying that $|W(\hat{\mathbf{z}}^\delta) - W(\hat{\mathbf{z}})| < \epsilon$ when $\delta$ is sufficiently small.

Step 2

Under Assumption 4(a), the following quantity:

$$W_N := \min_{\hat{\mathbf{z}} \in \mathcal{Z}_N} W(\hat{\mathbf{z}})$$

exists and is finite for all $N \in \{2, \cdots, K\}$ by Step 1 and the compactness of $\mathcal{Z}_N$. Let $m_N$ denote the minimal rating scale attained by $W_N$. Solving

$$\min_{2 \leq N \leq K} W_N + \mu \cdot f(m_N)$$

yields the solution(s) to the principal's problem.

Under Assumption 4(b), the principal's problem can be written as follows:

$$\min_{\hat{\mathbf{z}} \in \mathcal{Z}_K} W(\hat{\mathbf{z}}) + \mu \cdot h(\pi(\hat{\mathbf{z}})),$$

where $\pi(\hat{\mathbf{z}})$ is the probability vector formed under $\hat{\mathbf{z}}$ and is clearly continuous in $\hat{\mathbf{z}}$. The existence of solution(s) then follows from Step 1 and the compactness of $\mathcal{Z}_K$.

A.2 Proof of Section 4

In this appendix, write any $N$-partitional contract $\langle \mathcal{P}, w(\cdot) \rangle$ as the corresponding tuple $\langle A_n, \pi_n, z_n, w_n \rangle_{n=1}^N$, where $A_n$ is a generic cell of $\mathcal{P}$, $\pi_n = P(A_n)$, $z_n = (z_{1,n}, z_{2,n})^\top := (z_1(A_n), z_2(A_n))^\top$ and $w_n = (w_{1,n}, w_{2,n})^\top := (w_1(A_n), w_2(A_n))^\top$.

A.2.1 Useful Lemmas

The next lemma generalizes Lemmas 1 and 2 to encompass multiple agents:

Lemma 4. Assume Assumption 1. Then under any optimal incentive contract that induces high effort from both agents, (i) there exist $\lambda_1, \lambda_2 > 0$ such that $u_i'(w_{i,n}) = 1/(\lambda_i z_{i,n})$ if and only if $w_{i,n} > 0$; (ii) $w_j \neq w_k$ for all $j \neq k$.

Proof. The wage-minimization problem for given monitoring technology $\langle A_n, \pi_n, z_n \rangle_{n=1}^N$ is

$$\min_{\langle \tilde{w}_{i,n} \rangle} \sum_{i,n} \pi_n \tilde{w}_{i,n} - \sum_i \lambda_i \Big(\sum_n \pi_n u_i(\tilde{w}_{i,n}) z_{i,n} - c_i\Big) - \sum_{i,n} \eta_{i,n} \tilde{w}_{i,n},$$

where $\lambda_i$ and $\eta_{i,n}$ denote the Lagrange multipliers associated with the (IC$_i$) constraint and the (LL$_i$) constraint at $\tilde{w}_{i,n}$, respectively. Differentiating the objective function with respect to $\tilde{w}_{i,n}$ yields the first-order condition in Part (i). The proof of Part (ii) is the same as that of Lemma 2 and is therefore omitted.

The next lemma plays a role analogous to that of Lemma 3:

Lemma 5. Assume Assumption 6. Fix any $\delta > 0$ and any $A \in \Sigma$ such that $P(A) > 0$. Then for all $\epsilon \in (0, P(A)]$, there exists $A_\epsilon \subset A$ such that $P(A_\epsilon) = \epsilon$ and $\|z(A_\epsilon) - z(A)\| < \delta$.

Proof. With a slight abuse of notation, let $\mathcal{P}$ be any finite partition of $\Omega$ such that every $B \in \mathcal{P}$ is measurable and $\|Z(\omega) - Z(\omega')\| < \delta$ for all $\omega, \omega' \in B$. $\mathcal{P}$ exists because $P$ admits a density and $Z(\Omega)$ is a compact set in $\mathbb{R}^2$. Define $\mathcal{P}_+ = \{B \in \mathcal{P} : P(A \cap B) > 0\}$ and $\mathcal{P}_0 = \{B \in \mathcal{P} : P(A \cap B) = 0\}$, which are both finite. Note that $\sum_{B \in \mathcal{P}_0} P(A \cap B) = 0$, $\sum_{B \in \mathcal{P}_+} P(A \cap B) = P(A)$ and $z(A) = \sum_{B \in \mathcal{P}_+} \frac{P(A \cap B)}{P(A)} z(A \cap B)$.

Since $P$ admits a density, it follows that for all $B \in \mathcal{P}_+$, there exists $C_B \subset A \cap B$ such that $P(C_B) = P(A \cap B)\, \epsilon / P(A)$. Also note that $\|z(C_B) - z(A \cap B)\| < \delta$ by construction. Let $A_\epsilon = \cup_{B \in \mathcal{P}_+} C_B$. Then $P(A_\epsilon) = \sum_{B \in \mathcal{P}_+} P(A \cap B)\, \epsilon / P(A) = \epsilon$ and

$$\|z(A_\epsilon) - z(A)\| = \Big\| \sum_{B \in \mathcal{P}_+} \frac{P(A \cap B)}{P(A)} \big(z(C_B) - z(A \cap B)\big) \Big\| \leq \sum_{B \in \mathcal{P}_+} \frac{P(A \cap B)}{P(A)} \|z(C_B) - z(A \cap B)\| < \delta.$$

A.2.2 Proof of Theorem 3

Proof.

Take any optimal incentive contract that induces high effort from both agents and let $\langle A_n, \pi_n, \mathbf{z}_n, \mathbf{w}_n \rangle_{n=1}^{N}$ be the corresponding tuple. Suppose, to the contrary, that some $A_j$ is not $\mathbf{Z}$-convex. By definition, there exist $A', A'' \subset A_j$ and $\tilde{A} \subset A_k$, $k \neq j$, such that (i) $P(A'), P(A''), P(\tilde{A}) > 0$, and (ii) $\tilde{\mathbf{z}} = (1 - s) \mathbf{z}' + s \mathbf{z}''$, where $\mathbf{z}' := \mathbf{z}(A') \neq \mathbf{z}'' := \mathbf{z}(A'')$, $\tilde{\mathbf{z}} := \mathbf{z}(\tilde{A})$ and $s \in (0, 1)$. By Lemma 5, for any $\delta > 0$ and any $\epsilon \in (0, \min\{ P(A'), P(A''), P(\tilde{A}) \})$, there exist $A'_\epsilon \subset A'$, $A''_\epsilon \subset A''$ and $\tilde{A}_\epsilon \subset \tilde{A}$ such that (i) $P(A'_\epsilon) = P(A''_\epsilon) = P(\tilde{A}_\epsilon) = \epsilon$, and (ii) $\| \mathbf{z}(A'_\epsilon) - \mathbf{z}' \|, \| \mathbf{z}(A''_\epsilon) - \mathbf{z}'' \|, \| \mathbf{z}(\tilde{A}_\epsilon) - \tilde{\mathbf{z}} \| < \delta$.

Consider two perturbations to the monitoring technology: (a) move $A'_\epsilon$ to $A_k$ and $\tilde{A}_\epsilon$ to $A_j$; (b) move $\tilde{A}_\epsilon$ to $A_j$ and $A''_\epsilon$ to $A_k$. By Assumption 1, neither perturbation affects the probability distribution of the output signal under $\mathbf{a} = \mathbf{1}$ and hence the monitoring cost. Below we demonstrate that one of them strictly reduces the incentive cost compared to the original optimal contract.

Perturbation (a)

Let $\langle A_n(\epsilon), \pi_n, \mathbf{z}_n(\epsilon) \rangle_{n=1}^{N}$ denote the tuple associated with the monitoring technology after perturbation (a), where $A_j(\epsilon) = (A_j \cup \tilde{A}_\epsilon) \setminus A'_\epsilon$, $A_k(\epsilon) = (A_k \cup A'_\epsilon) \setminus \tilde{A}_\epsilon$ and $A_n(\epsilon) = A_n$ for $n \neq j, k$. Straightforward algebra shows that
\[
\begin{cases}
\mathbf{z}_j(\epsilon) = \mathbf{z}_j + \dfrac{\mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon)}{\pi_j} \, \epsilon, \\[2mm]
\mathbf{z}_k(\epsilon) = \mathbf{z}_k - \dfrac{\mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon)}{\pi_k} \, \epsilon, \\[2mm]
\mathbf{z}_n(\epsilon) = \mathbf{z}_n \quad \forall n \neq j, k,
\end{cases}
\tag{A.5}
\]
and that
\[
\| \mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon) - (\tilde{\mathbf{z}} - \mathbf{z}') \| \leq \| \mathbf{z}(\tilde{A}_\epsilon) - \tilde{\mathbf{z}} \| + \| \mathbf{z}(A'_\epsilon) - \mathbf{z}' \| < \min \Big\{ 2\delta, \; 4 \max_{\omega \in \Omega} \| \mathbf{Z}(\omega) \| \Big\}.
\tag{A.6}
\]
Define $B_i = \{ n : w_{i,n} = 0 \}$ for $i = 1, 2$. Consider wage profile $\langle \mathbf{w}_n(\epsilon) \rangle_{n=1}^{N}$ such that for $i = 1, 2$: (1) $w_{i,n}(\epsilon) = w_{i,n} = 0$ for $n \in B_i$; (2) agent $i$'s incentive compatibility constraint remains binding after perturbation (a), i.e.,
\[
\sum_{n=1}^{N} \pi_n u_i(w_{i,n}(\epsilon)) z_{i,n}(\epsilon) = \sum_{n=1}^{N} \pi_n u_i(w_{i,n}) z_{i,n} = c_i.
\tag{A.7}
\]
A close inspection of Equations (A.5)-(A.7) reveals the existence of $M > 0$ independent of $\epsilon$ and $\delta$ such that when $\epsilon$ is sufficiently small, there exist wage profiles as above that satisfy $\| \mathbf{w}_n(\epsilon) - \mathbf{w}_n \| < M \epsilon$ for all $n$ and hence the (LL$_i$) constraints.

With a slight abuse of notation, write $\dot{\mathbf{w}}_n(\epsilon) = (\mathbf{w}_n(\epsilon) - \mathbf{w}_n)/\epsilon$ and $\dot{\mathbf{z}}_n(\epsilon) = (\mathbf{z}_n(\epsilon) - \mathbf{z}_n)/\epsilon$, and note that $\dot{w}_{i,n}(\epsilon) = 0$ for $i = 1, 2$ and $n \in B_i$. When $\epsilon$ is small, expanding Equation (A.7) using the twice-differentiability of $u_i(\cdot)$ and $| w_{i,n}(\epsilon) - w_{i,n} | \sim O(\epsilon)$ and multiplying the result by the Lagrange multiplier $\lambda_i > 0$ of the (IC$_i$) constraint prior to the perturbation yields
\[
\sum_{n=1}^{N} \pi_n \cdot u_i'(w_{i,n}) \cdot \lambda_i z_{i,n} \cdot \dot{w}_{i,n}(\epsilon) = - \lambda_i \sum_{n=1}^{N} u_i(w_{i,n}) \cdot \pi_n \dot{z}_{i,n}(\epsilon) + O(\epsilon).
\]
Simplifying using $\dot{w}_{i,n}(\epsilon) = 0$ if $n \in B_i$, $u_i'(w_{i,n}) = 1/(\lambda_i z_{i,n})$ if $n \notin B_i$ (Lemma 4) and Equation (A.5) yields
\[
\sum_{i,n} \pi_n \dot{w}_{i,n} = (\mathbf{u}_k - \mathbf{u}_j)^{\top} \Lambda \big( \mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon) \big) + O(\epsilon),
\]
where $\mathbf{u}_n = (u_1(w_{1,n}), u_2(w_{2,n}))^{\top}$ for $n = k, j$ and $\Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$. Further simplifying using Equation (A.6) and $\tilde{\mathbf{z}} = (1 - s) \mathbf{z}' + s \mathbf{z}''$ yields the following when $\delta$ is small:
\[
\begin{aligned}
\sum_{i,n} \pi_n \dot{w}_{i,n} &= (\mathbf{u}_k - \mathbf{u}_j)^{\top} \Lambda (\tilde{\mathbf{z}} - \mathbf{z}') + O(\epsilon) + (\mathbf{u}_k - \mathbf{u}_j)^{\top} \Lambda \big( \mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon) - (\tilde{\mathbf{z}} - \mathbf{z}') \big) \\
&= s \, (\mathbf{u}_k - \mathbf{u}_j)^{\top} \Lambda (\mathbf{z}'' - \mathbf{z}') + O(\epsilon) + O(\delta).
\end{aligned}
\tag{A.8}
\]

Perturbation (b)

Repeating the above argument for perturbation (b) yields
\[
\sum_{i,n} \pi_n \dot{w}_{i,n} = - (1 - s) \, (\mathbf{u}_k - \mathbf{u}_j)^{\top} \Lambda (\mathbf{z}'' - \mathbf{z}') + O(\epsilon) + O(\delta).
\tag{A.9}
\]
Consider two cases:

Case 1: $(\mathbf{u}_k - \mathbf{u}_j)^{\top} \Lambda (\mathbf{z}'' - \mathbf{z}') \neq 0$. In this case, the right-hand sides of Equations (A.8) and (A.9) have opposite signs when $\epsilon$ and $\delta$ are sufficiently small, and the remainder of the proof is the same as that of Theorem 1.

Case 2: $(\mathbf{u}_k - \mathbf{u}_j)^{\top} \Lambda (\mathbf{z}'' - \mathbf{z}') = 0$. In this case, note that $(\mathbf{u}_k - \mathbf{u}_j)^{\top} \Lambda \neq \mathbf{0}^{\top}$ by Lemma 4, where $\mathbf{0}$ denotes the 2-vector of zeros. Then from Assumption 5 ($\mathbf{Z}$ is distributed atomlessly on a connected set), there exist $B' \subset A'$, $B'' \subset A''$ and $\tilde{B} \subset \tilde{A}$ such that $P(B'), P(B''), P(\tilde{B}) > 0$, $\mathbf{z}(\tilde{B}) = (1 - s') \mathbf{z}(B') + s' \mathbf{z}(B'')$ for some $s' \in (0, 1)$, and $(\mathbf{u}_k - \mathbf{u}_j)^{\top} \Lambda (\mathbf{z}(B'') - \mathbf{z}(B')) \neq 0$. Replacing $A'$, $A''$ and $\tilde{A}$ with $B'$, $B''$ and $\tilde{B}$, respectively, in the above argument gives the desired result.

A.2.3 Proof of Theorem 4

Proof.

By Theorem 3, any optimal monitoring technology with at most $N \in \{2, \cdots, K\}$ cells is fully characterized by (1) a finite number $q_N$ of vertices $\mathbf{z}_1, \cdots, \mathbf{z}_{q_N}$ in $\mathbf{Z}(\Omega)$, and (2) a $q_N \times q_N$ adjacency matrix $M$ whose $lm$'th entry equals 1 if $\mathbf{z}_l$ and $\mathbf{z}_m$ are connected by a line segment and 0 otherwise. By definition, $M$ is symmetric and hence is determined by its upper triangle entries, which can be either 0 or 1. Thus $M$ belongs to $\mathcal{M}_N := \{0, 1\}^{q_N \times (q_N - 1)/2}$, which is a finite set.

Write $\vec{\mathbf{z}}$ for $(\mathbf{z}_1, \cdots, \mathbf{z}_{q_N})^{\top}$. For any $N \in \{2, \cdots, K\}$ and adjacency matrix $M \in \mathcal{M}_N$, define
\[
\mathcal{Z}_N(M) = \big\{ \vec{\mathbf{z}} : (\vec{\mathbf{z}}, M) \text{ partitions } \mathbf{Z}(\Omega) \text{ into at most } N \text{ convex polygons} \big\},
\]
equip $\mathcal{Z}_N(M)$ with the sup norm $\| \cdot \|$, and note that $\mathcal{Z}_N(M)$ is compact by Assumption 6. Let $W(\vec{\mathbf{z}}, M)$ denote the minimal incentive cost for inducing high effort from both agents under the monitoring technology formed by $(\vec{\mathbf{z}}, M)$. $W(\vec{\mathbf{z}}, M)$ exists and is finite if and only if for all $i = 1, 2$, $z_i(A) \not\equiv 0$ across the $A$'s formed under $(\vec{\mathbf{z}}, M)$.

We proceed in two steps.

Step 1

Show that $W(\vec{\mathbf{z}}, M)$ is continuous in the first argument for any given $N \in \{2, \cdots, K\}$ and $M \in \mathcal{M}_N$.

Fix any $\vec{\mathbf{z}} \in \mathcal{Z}_N(M)$ such that $W(\vec{\mathbf{z}}, M)$ is finite. For sufficiently small $\delta > 0$, let $\vec{\mathbf{z}}^{\delta}$ be any element of $\mathcal{Z}_N(M)$ such that $\| \vec{\mathbf{z}}^{\delta} - \vec{\mathbf{z}} \| < \delta$. Label the performance categories formed under $(\vec{\mathbf{z}}, M)$ and $(\vec{\mathbf{z}}^{\delta}, M)$ as $A_n$'s and $A_n^{\delta}$'s, respectively, such that for $n = 1, 2, \cdots$, $\mathbf{z}_l$ is a vertex of $cl(\mathbf{Z}(A_n))$ if and only if $\mathbf{z}_l^{\delta}$ is a vertex of $cl(\mathbf{Z}(A_n^{\delta}))$. Let $\pi_n$ and $z_{i,n}$ (resp. $\pi_n^{\delta}$ and $z_{i,n}^{\delta}$) denote the probability (under $\mathbf{a} = \mathbf{1}$) and $z_i$-value of $A_n$ (resp. $A_n^{\delta}$), respectively. Let $w_{i,n}$ denote the optimal wage of agent $i$ at $A_n$.

Fix any $\epsilon > 0$. Consider the wage profile that pays $w_{i,n} + \epsilon/2$ to agent $i$ if $z_{i,n}^{\delta} > 0$ and $w_{i,n}$ otherwise, and therefore satisfies the (LL$_i$) constraint. Under Assumptions 5 and 6, the (IC$_i$) constraint is satisfied when $\delta$ is sufficiently small:
\[
\lim_{\delta \to 0} \sum_{n} \pi_n^{\delta} \, u_i \big( w_{i,n} + 1_{z_{i,n}^{\delta} > 0} \cdot \epsilon/2 \big) z_{i,n}^{\delta} = \sum_{n} \pi_n \, u_i \big( w_{i,n} + 1_{z_{i,n} > 0} \cdot \epsilon/2 \big) z_{i,n} > c_i,
\]
where the inequality holds because $\sum_n \pi_n z_{i,n} = 0$ and $z_{i,n} \not\equiv 0$, so that $z_{i,n} > 0$ for some $n$. In addition, since
\[
\lim_{\delta \to 0} \sum_{i,n} \pi_n^{\delta} \big( w_{i,n} + 1_{z_{i,n}^{\delta} > 0} \cdot \epsilon/2 \big) = \sum_{i,n} \pi_n \big( w_{i,n} + 1_{z_{i,n} > 0} \cdot \epsilon/2 \big),
\]
it follows that when $\delta$ is sufficiently small,
\[
W(\vec{\mathbf{z}}^{\delta}, M) - W(\vec{\mathbf{z}}, M) \leq \sum_{i,n} \pi_n^{\delta} \big( w_{i,n} + 1_{z_{i,n}^{\delta} > 0} \cdot \epsilon/2 \big) - \sum_{i,n} \pi_n w_{i,n} < \epsilon,
\]
where the first inequality holds because the constructed wage profile is not necessarily optimal under $(\vec{\mathbf{z}}^{\delta}, M)$. Finally, interchanging the roles of $\vec{\mathbf{z}}^{\delta}$ and $\vec{\mathbf{z}}$ in the above derivation yields $W(\vec{\mathbf{z}}, M) - W(\vec{\mathbf{z}}^{\delta}, M) < \epsilon$, implying that $| W(\vec{\mathbf{z}}^{\delta}, M) - W(\vec{\mathbf{z}}, M) | < \epsilon$ when $\delta$ is sufficiently small.

Step 2

Under Assumption 4(a), the following quantity:
\[
W_N := \min_{M \in \mathcal{M}_N, \, \vec{\mathbf{z}} \in \mathcal{Z}_N(M)} W(\vec{\mathbf{z}}, M)
\]
exists and is finite for all $N \in \{2, \cdots, K\}$ by Step 1, the compactness of $\mathcal{Z}_N(M)$ and the finiteness of $\mathcal{M}_N$. Under Assumption 4(b), the principal's problem can be written as follows:
\[
\min_{M \in \mathcal{M}_K, \, \vec{\mathbf{z}} \in \mathcal{Z}_K(M)} W(\vec{\mathbf{z}}, M) + \mu \cdot h(\boldsymbol{\pi}(\vec{\mathbf{z}}, M)),
\]
where $\boldsymbol{\pi}(\vec{\mathbf{z}}, M)$ is the probability vector formed under $(\vec{\mathbf{z}}, M)$ and is clearly continuous in $\vec{\mathbf{z}}$. The remainder of the proof is the same as that of Theorem 2 and is therefore omitted.

A.3 Proofs of Section 5

In this appendix, write $\mathbf{z}(A) = (z_a(A))^{\top}_{a \in \mathcal{D}}$ for any set $A \in \Sigma$ of positive measure, as well as any $N$-partitional contract $\langle \mathcal{P}, w(\cdot) \rangle$ as the corresponding tuple $\langle A_n, \pi_n, \mathbf{z}_n, w_n \rangle_{n=1}^{N}$, where $A_n$ is a generic cell of $\mathcal{P}$, $\pi_n = P_{a^*}(A_n)$, $\mathbf{z}_n = \mathbf{z}(A_n)$ and $w_n = w(A_n)$. Assume w.l.o.g. that $w_1 \leq \cdots \leq w_N$.

A.3.1 Useful Lemma

The next lemma generalizes Lemmas 1 and 2 to encompass multiple actions:

Lemma 6.

Assume Assumption 1. Then for any optimal incentive contract that induces $a^*$, (i) there exists $\boldsymbol{\lambda} \in \mathbb{R}^{|\mathcal{D}|}_{+}$ with $\| \boldsymbol{\lambda} \| > 0$ such that $u'(w_n) = 1/(\boldsymbol{\lambda}^{\top} \mathbf{z}_n)$ if and only if $w_n > 0$; (ii) $\boldsymbol{\lambda}^{\top} \mathbf{z}_1 < 0 < \boldsymbol{\lambda}^{\top} \mathbf{z}_2 < \cdots$ and $w_1 < w_2 < \cdots$.

Proof. The wage-minimization problem for given monitoring technology $\langle A_n, \pi_n, \mathbf{z}_n \rangle_{n=1}^{N}$ is
\[
\min_{\langle \tilde{w}_n \rangle} \sum_{n} \pi_n \tilde{w}_n - \sum_{n} \pi_n u(\tilde{w}_n) \cdot \boldsymbol{\lambda}^{\top} \mathbf{z}_n - \sum_{n} \eta_n \tilde{w}_n,
\]
where $\boldsymbol{\lambda}$ denotes the profile of the Lagrange multipliers associated with the (IC$_a$) constraints and $\eta_n$ the Lagrange multiplier associated with the (LL) constraint at $\tilde{w}_n$. Note that $\| \boldsymbol{\lambda} \| > 0$, because otherwise all (IC$_a$) constraints are slack, and hence subtracting a small $\epsilon > 0$ from a positive wage would lower the incentive cost without violating any constraint, contradicting optimality. Differentiating the objective function with respect to $\tilde{w}_n$ yields the first-order condition in Part (i). The proof of Part (ii) is the same as that of Lemma 2 and is therefore omitted.

A.3.2 Proof of Theorem 5

Proof.

Take any optimal incentive contract that induces $a^*$. Let $\langle A_n, \pi_n, \mathbf{z}_n, w_n \rangle_{n=1}^{N}$ be the corresponding tuple and $\boldsymbol{\lambda}$ be the profile of the Lagrange multipliers associated with the (IC$_a$) constraints. Suppose, to the contrary, that some $A_j$ is not $Z_{\boldsymbol{\lambda}}$-convex. Then there exist $A', A'' \subset A_j$ and $\tilde{A} \subset A_k$, $k \neq j$, such that (i) $P_{a^*}(A'), P_{a^*}(A''), P_{a^*}(\tilde{A}) > 0$, and (ii) $\boldsymbol{\lambda}^{\top} \tilde{\mathbf{z}} = (1 - s) \boldsymbol{\lambda}^{\top} \mathbf{z}' + s \boldsymbol{\lambda}^{\top} \mathbf{z}''$, where $\mathbf{z}' := \mathbf{z}(A')$, $\mathbf{z}'' := \mathbf{z}(A'')$, $\tilde{\mathbf{z}} := \mathbf{z}(\tilde{A})$, $\boldsymbol{\lambda}^{\top} \mathbf{z}' \neq \boldsymbol{\lambda}^{\top} \mathbf{z}''$ and $s \in (0, 1)$. For any $\epsilon \in (0, \min\{ P_{a^*}(A'), P_{a^*}(A''), P_{a^*}(\tilde{A}) \})$, there exist $A'_\epsilon \subset A'$, $A''_\epsilon \subset A''$ and $\tilde{A}_\epsilon \subset \tilde{A}$ such that (i) $P_{a^*}(A'_\epsilon) = P_{a^*}(A''_\epsilon) = P_{a^*}(\tilde{A}_\epsilon) = \epsilon$, and (ii) $\boldsymbol{\lambda}^{\top} \mathbf{z}(A'_\epsilon) = \boldsymbol{\lambda}^{\top} \mathbf{z}'$, $\boldsymbol{\lambda}^{\top} \mathbf{z}(A''_\epsilon) = \boldsymbol{\lambda}^{\top} \mathbf{z}''$ and $\boldsymbol{\lambda}^{\top} \mathbf{z}(\tilde{A}_\epsilon) = \boldsymbol{\lambda}^{\top} \tilde{\mathbf{z}}$.

Consider two perturbations to the monitoring technology: (a) move $A'_\epsilon$ to $A_k$ and $\tilde{A}_\epsilon$ to $A_j$, and (b) move $\tilde{A}_\epsilon$ to $A_j$ and $A''_\epsilon$ to $A_k$. By Assumption 1, neither perturbation affects the probability distribution of the output signal under action $a^*$ and hence the monitoring cost. Below we demonstrate that one of them strictly reduces the incentive cost compared to the original (optimal) contract.

Perturbation (a)

Let $\langle A_n(\epsilon), \pi_n, \mathbf{z}_n(\epsilon) \rangle_{n=1}^{N}$ be the tuple associated with the monitoring technology after perturbation (a), where $A_j(\epsilon) = (A_j \cup \tilde{A}_\epsilon) \setminus A'_\epsilon$, $A_k(\epsilon) = (A_k \cup A'_\epsilon) \setminus \tilde{A}_\epsilon$ and $A_n(\epsilon) = A_n$ for $n \neq j, k$. Straightforward algebra shows that
\[
\begin{cases}
\mathbf{z}_j(\epsilon) = \mathbf{z}_j + \dfrac{\mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon)}{\pi_j} \, \epsilon, \\[2mm]
\mathbf{z}_k(\epsilon) = \mathbf{z}_k - \dfrac{\mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon)}{\pi_k} \, \epsilon, \\[2mm]
\mathbf{z}_n(\epsilon) = \mathbf{z}_n \quad \forall n \neq j, k,
\end{cases}
\tag{A.10}
\]
and that
\[
\| \mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon) \| \leq \| \mathbf{z}(\tilde{A}_\epsilon) \| + \| \mathbf{z}(A'_\epsilon) \| \leq 2 \max_{\omega \in \Omega} \| \mathbf{Z}(\omega) \|.
\tag{A.11}
\]
Consider wage profile $\langle w_n(\epsilon) \rangle_{n=1}^{N}$ such that (1) $w_1(\epsilon) = w_1 = 0$ and (2) all (IC$_a$) constraints are slack by $O(\epsilon^2)$ after the perturbation, i.e.,
\[
0 \leq \sum_{n=1}^{N} \pi_n u(w_n(\epsilon)) z_{a,n}(\epsilon) - \sum_{n=1}^{N} \pi_n u(w_n) z_{a,n} \sim O(\epsilon^2) \quad \forall a \in \mathcal{D}.
\tag{A.12}
\]
A close inspection of Equations (A.10)-(A.12) reveals the existence of $M > 0$ such that when $\epsilon$ is sufficiently small, there exist wage profiles as above that satisfy $| w_n(\epsilon) - w_n | < M \epsilon$ for all $n$ and hence the (LL) constraint.

Footnote. To see why such wage profiles exist, define $\kappa_a = \sum_{n=2}^{N} \pi_n u(w_n) z_{a,n}$ and $S_a = \big\{ \langle x_n \rangle_{n=2}^{N} \in \mathbb{R}^{N-1} : \sum_{n=2}^{N} x_n z_{a,n} \geq \kappa_a \big\}$ for each $a \in \mathcal{D}$, and note that $\langle \pi_n u(w_n) \rangle_{n=2}^{N} \in \cap_{a \in \mathcal{D}} S_a$. If we cannot construct a wage profile as above, then there exist $a', a'' \in \mathcal{D}$ such that
\[
\cap_{a = a', a''} \Big\{ \langle x_n \rangle_{n=2}^{N} \in \mathbb{R}^{N-1} : \sum_{n=2}^{N} x_n z_{a,n} \geq \kappa_a \Big\} = \Big\{ \langle x_n \rangle_{n=2}^{N} \in \mathbb{R}^{N-1} : \sum_{n=2}^{N} x_n z_{a',n} = \kappa_{a'} \Big\}
\]
and hence $z_{a'',n} = - z_{a',n}$ for $n = 2, \cdots, N$ and $\kappa_{a''} = - \kappa_{a'}$. In the meantime, $\kappa_a \geq c(a^*) - c(a) > 0$ for all $a \in \mathcal{D}$, thus reaching a contradiction.

Write $\dot{w}_n(\epsilon) = (w_n(\epsilon) - w_n)/\epsilon$ and $\dot{\mathbf{z}}_n(\epsilon) = (\mathbf{z}_n(\epsilon) - \mathbf{z}_n)/\epsilon$. When $\epsilon$ is small, expanding Equation (A.12) using the twice-differentiability of $u(\cdot)$ and $| w_n(\epsilon) - w_n | \sim O(\epsilon)$ and multiplying the result by $\boldsymbol{\lambda}$ yields
\[
\sum_{n=1}^{N} \pi_n \cdot u'(w_n) \cdot \boldsymbol{\lambda}^{\top} \mathbf{z}_n \cdot \dot{w}_n(\epsilon) = - \sum_{n=1}^{N} u(w_n) \cdot \pi_n \cdot \boldsymbol{\lambda}^{\top} \dot{\mathbf{z}}_n(\epsilon) + O(\epsilon).
\]
Simplifying using $\dot{w}_1(\epsilon) = 0$, $u'(w_n) = 1/(\boldsymbol{\lambda}^{\top} \mathbf{z}_n)$ for $n \geq 2$ and Equation (A.10) yields
\[
\sum_{n=1}^{N} \pi_n \dot{w}_n(\epsilon) = s \, (u(w_k) - u(w_j)) \big( \boldsymbol{\lambda}^{\top} \mathbf{z}'' - \boldsymbol{\lambda}^{\top} \mathbf{z}' \big) + O(\epsilon).
\tag{A.13}
\]

Perturbation (b)

Repeating the above argument for perturbation (b) yields
\[
\sum_{n=1}^{N} \pi_n \dot{w}_n(\epsilon) = - (1 - s) \, (u(w_k) - u(w_j)) \big( \boldsymbol{\lambda}^{\top} \mathbf{z}'' - \boldsymbol{\lambda}^{\top} \mathbf{z}' \big) + O(\epsilon).
\tag{A.14}
\]
Since $u(w_k) \neq u(w_j)$ by Lemma 6 and $\boldsymbol{\lambda}^{\top} \mathbf{z}'' \neq \boldsymbol{\lambda}^{\top} \mathbf{z}'$ by assumption, the right-hand sides of Equations (A.13) and (A.14) have opposite signs when $\epsilon$ is small. The remainder of the proof is the same as that of Theorem 1 and is therefore omitted.

A.3.3 Proof of Theorem 6

Proof.

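Define
\[
\Lambda = \big\{ \boldsymbol{\lambda} : \boldsymbol{\lambda} \in \mathbb{R}^{|\mathcal{D}|}_{+} \text{ and } \| \boldsymbol{\lambda} \|_{|\mathcal{D}|} = 1 \big\},
\]
where $\| \cdot \|_{|\mathcal{D}|}$ denotes the $|\mathcal{D}|$-dimensional Euclidean norm. By Theorem 5, any optimal monitoring technology with at most $N \in \{2, \cdots, K\}$ performance categories is fully captured by $\boldsymbol{\lambda} \in \Lambda$ and $N - 1$ cutoffs $\hat{z}_1, \cdots, \hat{z}_{N-1}$ such that $\min_{\omega \in \Omega} \boldsymbol{\lambda}^{\top} \mathbf{Z}(\omega) \leq \hat{z}_1 \leq \cdots \leq \hat{z}_{N-1} \leq \max_{\omega \in \Omega} \boldsymbol{\lambda}^{\top} \mathbf{Z}(\omega)$. Write $\hat{\mathbf{z}} = (\hat{z}_1, \cdots, \hat{z}_{N-1})$. Define
\[
\mathcal{Z}_N(\boldsymbol{\lambda}) = \Big\{ \hat{\mathbf{z}} : \min_{\omega \in \Omega} \boldsymbol{\lambda}^{\top} \mathbf{Z}(\omega) \leq \hat{z}_1 \leq \cdots \leq \hat{z}_{N-1} \leq \max_{\omega \in \Omega} \boldsymbol{\lambda}^{\top} \mathbf{Z}(\omega) \Big\},
\]
equip $\mathcal{Z}_N(\boldsymbol{\lambda})$ with the sup norm $\| \cdot \|$, and note that $\mathcal{Z}_N(\boldsymbol{\lambda})$ is compact by Assumption 3. For any given pair $(\boldsymbol{\lambda}, \hat{\mathbf{z}})$, write the minimal incentive cost for inducing $a^*$ as $W(\boldsymbol{\lambda}, \hat{\mathbf{z}})$, and note that $W(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ exists and is finite if and only if $\lambda_a > 0$ for all $a \in \mathcal{D}$ and $\min_{\omega \in \Omega} \boldsymbol{\lambda}^{\top} \mathbf{Z}(\omega) < \hat{z}_n < \max_{\omega \in \Omega} \boldsymbol{\lambda}^{\top} \mathbf{Z}(\omega)$ for some $n$. The first condition is necessary: otherwise there exists $a \in \mathcal{D}$ such that $z_a(A) \equiv 0$ across the $A$'s formed under $(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ and hence the (IC$_a$) constraint will be violated.

We proceed in two steps.

Step 1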

Show that $W(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ is continuous in $(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ for any given $N \in \{2, \cdots, K\}$.

Fix any $\boldsymbol{\lambda} \in \Lambda$ and $\hat{\mathbf{z}} \in \mathcal{Z}_N(\boldsymbol{\lambda})$ such that $W(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ is finite. W.l.o.g. consider the case where the $\hat{z}_n$'s are all distinct. For sufficiently small $\delta > 0$, let $\boldsymbol{\lambda}^{\delta}$ and $\hat{\mathbf{z}}^{\delta}$ be any element of $\Lambda$ and $\mathcal{Z}_N(\boldsymbol{\lambda}^{\delta})$, respectively, such that $\| \boldsymbol{\lambda}^{\delta} - \boldsymbol{\lambda} \|_{|\mathcal{D}|}, \| \hat{\mathbf{z}}^{\delta} - \hat{\mathbf{z}} \| < \delta$. Let $\pi_n$ and $\mathbf{z}_n$ (resp. $\pi_n^{\delta}$ and $\mathbf{z}_n^{\delta}$) denote the probability (under $a = a^*$) and $|\mathcal{D}|$-vector of $z$-values associated with performance category $A_n = \{ \omega : \boldsymbol{\lambda}^{\top} \mathbf{Z}(\omega) \in [\hat{z}_{n-1}, \hat{z}_n) \}$ (resp. $A_n^{\delta} = \{ \omega : \boldsymbol{\lambda}^{\delta \top} \mathbf{Z}(\omega) \in [\hat{z}_{n-1}^{\delta}, \hat{z}_n^{\delta}) \}$), respectively. Let $w_n$ denote the optimal wage at $A_n$.

Fix any $\epsilon > 0$, and consider the wage profile that pays $w_n + \epsilon$ at $A_n^{\delta}$ if $z_{a,n}^{\delta} > 0$ for all $a \in \mathcal{D}$ and $w_n$ otherwise. By construction, this wage profile satisfies the (LL) constraint. Under Assumptions 2 and 3, it satisfies every (IC$_a$) constraint when $\delta$ is small:
\[
\lim_{\delta \to 0} \sum_{n} u \Big( w_n + \prod_{a' \in \mathcal{D}} 1_{z_{a',n}^{\delta} > 0} \cdot \epsilon \Big) \pi_n^{\delta} z_{a,n}^{\delta} = \sum_{n} u \Big( w_n + \prod_{a' \in \mathcal{D}} 1_{z_{a',n} > 0} \cdot \epsilon \Big) \pi_n z_{a,n} > \sum_{n} u(w_n) \pi_n z_{a,n},
\]
where the inequality holds because $\sum_n \pi_n z_{a',n} = 0$ and $z_{a',n}$ is strictly increasing in $n$ for all $a' \in \mathcal{D}$, so there exists $n$ such that $\prod_{a' \in \mathcal{D}} 1_{z_{a',n} > 0} = 1$. To complete the proof, note that
\[
\lim_{\delta \to 0} \sum_{n} \pi_n^{\delta} \Big( w_n + \prod_{a \in \mathcal{D}} 1_{z_{a,n}^{\delta} > 0} \cdot \epsilon \Big) = \sum_{n} \pi_n \Big( w_n + \prod_{a \in \mathcal{D}} 1_{z_{a,n} > 0} \cdot \epsilon \Big),
\]
so the following holds when $\delta$ is sufficiently small:
\[
W(\boldsymbol{\lambda}^{\delta}, \hat{\mathbf{z}}^{\delta}) - W(\boldsymbol{\lambda}, \hat{\mathbf{z}}) \leq \sum_{n} \pi_n^{\delta} \Big( w_n + \prod_{a \in \mathcal{D}} 1_{z_{a,n}^{\delta} > 0} \cdot \epsilon \Big) - \sum_{n} \pi_n w_n < \epsilon.
\]
Finally, interchanging the roles of $(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ and $(\boldsymbol{\lambda}^{\delta}, \hat{\mathbf{z}}^{\delta})$ in the above derivation yields $W(\boldsymbol{\lambda}, \hat{\mathbf{z}}) - W(\boldsymbol{\lambda}^{\delta}, \hat{\mathbf{z}}^{\delta}) < \epsilon$, implying that $| W(\boldsymbol{\lambda}^{\delta}, \hat{\mathbf{z}}^{\delta}) - W(\boldsymbol{\lambda}, \hat{\mathbf{z}}) | < \epsilon$ when $\delta$ is sufficiently small.

Step 2

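Under Assumption 4(a), the following quantity:
\[
W_N := \min_{\boldsymbol{\lambda} \in \Lambda, \, \hat{\mathbf{z}} \in \mathcal{Z}_N(\boldsymbol{\lambda})} W(\boldsymbol{\lambda}, \hat{\mathbf{z}})
\]
exists and is finite for all $N \in \{2, \cdots, K\}$ by Step 1 and the compactness of $\Lambda$ and $\mathcal{Z}_N(\boldsymbol{\lambda})$. Under Assumption 4(b), the principal's problem can be written as follows:
\[
\min_{\boldsymbol{\lambda} \in \Lambda, \, \hat{\mathbf{z}} \in \mathcal{Z}_K(\boldsymbol{\lambda})} W(\boldsymbol{\lambda}, \hat{\mathbf{z}}) + \mu \cdot h(\boldsymbol{\pi}(\boldsymbol{\lambda}, \hat{\mathbf{z}})),
\]
where $\boldsymbol{\pi}(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ denotes the probability vector formed under $(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ and is continuous in its argument. The remainder of the proof is the same as that of Theorem 2 and is therefore omitted.

B Other Extensions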

B.1 Individual Rationality

In this appendix, let everything be as in the baseline model except that the agent is constrained by individual rationality rather than limited liability:
\[
\sum_{A \in \mathcal{P}} P_1(A) \, u(w(A)) \geq c + \underline{u}.
\tag{IR}
\]
A wage scheme is $w : \mathcal{P} \to \mathbb{R}$, and an optimal incentive contract that induces high effort from the agent (optimal incentive contract for short) minimizes the total implementation cost, subject to the (IC) and (IR) constraints.

Corollary 4.

Under Assumption 1, any optimal monitoring technology that induces high effort from the agent comprises $Z$-convex cells.

Proof. Take any optimal incentive contract and let $\langle A_n, \pi_n, z_n, w_n \rangle_{n=1}^{N}$ be the corresponding tuple. Assume without loss of generality that $z_1 \leq \cdots \leq z_N$.

Step 1

Show that $z_1 < \cdots < z_N$ and $w_1 < \cdots < w_N$.

The wage-minimization problem given $\langle A_n, \pi_n, z_n \rangle_{n=1}^{N}$ is
\[
\min_{\langle \tilde{w}_n \rangle} \sum_{n=1}^{N} \pi_n \tilde{w}_n - \lambda \Big( \sum_{n=1}^{N} \pi_n u(\tilde{w}_n) z_n - c \Big) - \gamma \Big( \sum_{n=1}^{N} \pi_n u(\tilde{w}_n) - (c + \underline{u}) \Big),
\]
where $\lambda$ and $\gamma$ denote the Lagrange multipliers associated with the (IC) and (IR) constraints, respectively. Differentiating the objective function with respect to $\tilde{w}_n$ yields
\[
u'(w_n) = \frac{1}{\lambda z_n + \gamma}.
\]
Thus if $z_j = z_k$ for some $j \neq k$, then $w_j = w_k$. But then merging $A_j$ and $A_k$ has no effect on the incentive cost but strictly reduces the monitoring cost by Assumption 1(b), a contradiction to the optimality of the original contract.

Step 2

Show Z -convexity.Suppose, to the contrary, that some A j is not Z -convex. Consider ﬁrst perturba-tion (a) in the proof of Theorem 1. Take any wage proﬁle (cid:104) w n ( (cid:15) ) (cid:105) Nn =1 such that the(IC) and (IR) constraints remain binding after the perturbation, i.e., N (cid:88) n =1 π n u ( w n ( (cid:15) )) z n ( (cid:15) ) = N (cid:88) n =1 π n u ( w n ) z n , (B.1)and N (cid:88) n =1 π n u ( w n ( (cid:15) )) = N (cid:88) n =1 π n u ( w n ) . (B.2)A close inspection of Equations (A.1), (B.1) and (B.2) reveals the existence of M > (cid:15) is suﬃciently small, there exist wage proﬁles as above such that | w n ( (cid:15) ) − w n | < M (cid:15) for all n . Write ˙ w n ( (cid:15) ) = ( w n ( (cid:15) ) − w n ) /(cid:15) and ˙ z n ( (cid:15) ) = ( z n ( (cid:15) ) − z n ) /(cid:15) , and let λ > γ > λ (B.1)+ γ (B.2) using the twice-diﬀerentiability of u ( · ) and | w n ( (cid:15) ) − w n | ∼ O ( (cid:15) ) yields the following when (cid:15) is small: N (cid:88) n =1 π n · u (cid:48) ( w n ) · ( λz n + γ ) · ˙ w n ( (cid:15) ) = − λ N (cid:88) n =1 u ( w n ) · π n ˙ z n ( (cid:15) ) + O ( (cid:15) ) . (B.3) To see why, deﬁne κ = (cid:80) Nn =1 π n u ( w n ) z n , κ = (cid:80) Nn =1 π n u ( w n ), S = (cid:110) (cid:104) x n (cid:105) Nn =1 ∈ R N : (cid:80) Nn =1 x n z n ≥ κ (cid:111) and S = (cid:110) (cid:104) x n (cid:105) Nn =1 ∈ R N : (cid:80) Nn =1 x n ≥ κ (cid:111) , and note that (cid:104) π n u ( w n ) (cid:105) Nn =1 ∈ S ∩ S . Then from z < · · · < z N , it follows that dim S ∩ S = N , and combiningwith Equation (A.1) gives the desired result. u (cid:48) ( w n ) = 1 / ( λz n + γ ) and Equation (A.1) yields N (cid:88) n =1 π n ˙ w n ( (cid:15) ) = s ( u ( w k ) − u ( w j )) ( λz (cid:48)(cid:48) − λz (cid:48) ) . (B.4)Consider next perturbation (b). Similar algebraic manipulation as above yields N (cid:88) n =1 π n ˙ w n ( (cid:15) ) = − (1 − s ) ( u ( w k ) − u ( w j )) ( λz (cid:48)(cid:48) − λz (cid:48) ) . 
(B.5)Since u ( w j ) (cid:54) = u ( w k ) and z (cid:48)(cid:48) (cid:54) = z (cid:48) , we must have sgn (B.4) (cid:54) = sgn (B.5), and theremainder of the proof is the same as that of Theorem 1. B.2 Random Monitoring Technology

This appendix extends the baseline model to encompass random monitoring technologies $\mathbf{q} : \Omega \to \Delta^K$ mapping raw data points to elements in the $K$-dimensional simplex. Time evolves as follows:

1. the principal commits to $\langle \mathbf{q}, \mathbf{w} \rangle$;
2. the agent privately chooses $a \in \{0, 1\}$;
3. Nature draws $\omega \in \Omega$ according to $P_a$;
4. the monitoring technology outputs $n \in \{1, \cdots, K\}$ with probability $q_n(\omega)$;
5. the principal pays the promised wage $w_n \geq 0$.

Under $\langle \mathbf{q}, \mathbf{w} \rangle$, the agent is assigned to performance category $n$ with probability
\[
\pi_n = \int q_n(\omega) \, dP_1(\omega)
\]
if he exerts high effort. Define $\mathcal{N} = \{ n : \pi_n > 0 \}$. For $n \in \mathcal{N}$, define
\[
z_n = \int Z(\omega) q_n(\omega) \, dP_1(\omega) / \pi_n
\]
as the $z$-value of performance category $n$. For $n \notin \mathcal{N}$, let $w_n = 0$. Then $\langle \mathbf{q}, \mathbf{w} \rangle$ is incentive compatible if
\[
\sum_{n \in \mathcal{N}} \pi_n u(w_n) z_n \geq c,
\tag{IC}
\]
in which case the monitoring cost is proportional to the mutual information of the raw data and output signal conditional on high effort:
\[
H(\mathbf{q}, 1) = \sum_{n \in \mathcal{N}} \int q_n(\omega) \log \frac{q_n(\omega)}{\int q_n(\omega) \, dP_1(\omega)} \, dP_1(\omega).
\]
An optimal incentive contract $\langle \mathbf{q}^*, \mathbf{w}^* \rangle$ that induces high effort from the agent solves
\[
\min_{\langle \mathbf{q}, \mathbf{w} \rangle} \sum_{n \in \mathcal{N}} \pi_n w_n + \mu \cdot H(\mathbf{q}, 1) \quad \text{s.t. (IC) and (LL)}.
\]
The next theorem gives characterizations of optimal incentive contracts:
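Before turning to the theorem, the category probabilities, the z-values and the mutual-information monitoring cost defined above can be illustrated numerically. The sketch below is not part of the paper: it assumes a hypothetical finite set of data points in place of Ω, so that integrals against the high-effort distribution become sums, and the names `category_stats` and `monitoring_cost` are illustrative only. Logarithms are natural, so the cost is measured in nats.

```python
import math


def category_stats(prior, Z, q):
    """Category probabilities pi_n and z-values z_n under high effort.

    prior[w]: probability of data point w under high effort (discrete stand-in, an assumption)
    Z[w]:     z-value of data point w
    q[w][n]:  probability that the technology outputs category n at data point w
    """
    num_categories = len(q[0])
    pi = [sum(p * row[n] for p, row in zip(prior, q)) for n in range(num_categories)]
    z = [
        sum(p * s * row[n] for p, s, row in zip(prior, Z, q)) / pi[n] if pi[n] > 0 else None
        for n in range(num_categories)
    ]
    return pi, z


def monitoring_cost(prior, q):
    """Mutual information between the raw data and the output signal (in nats)."""
    pi, _ = category_stats(prior, [0.0] * len(prior), q)
    cost = 0.0
    for p, row in zip(prior, q):
        for n, q_wn in enumerate(row):
            # Terms with zero probability contribute nothing (0 log 0 = 0).
            if p > 0 and q_wn > 0:
                cost += p * q_wn * math.log(q_wn / pi[n])
    return cost


prior = [0.5, 0.5]                 # two equally likely data points
Z = [-1.0, 1.0]                    # mean-zero z-values
pooling = [[0.5, 0.5], [0.5, 0.5]]     # output independent of the data
noisy = [[0.8, 0.2], [0.2, 0.8]]       # partially informative
revealing = [[1.0, 0.0], [0.0, 1.0]]   # deterministic two-cell partition

for name, q in [("pooling", pooling), ("noisy", noisy), ("revealing", revealing)]:
    print(name, category_stats(prior, Z, q), monitoring_cost(prior, q))
```

In this toy setting the pooling technology costs nothing but produces identical z-values across categories, so it cannot provide incentives, whereas the fully revealing technology costs log 2 nats. The noisy technology sits in between, which is the trade-off between incentive provision and monitoring cost that the principal's problem weighs.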

Theorem 7.

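For any optimal incentive contract $\langle \mathbf{q}^*, \mathbf{w}^* \rangle$ that induces high effort from the agent, we have (i) $\mathbf{q}^* : Z(\Omega) \to \Delta^K$; (ii) $\min \{ w_n^* : n \in \mathcal{N}^* \} = 0$; (iii) for all $j, k \in \mathcal{N}^*$, $w_j^* \neq w_k^*$ and $q_k^*(z)/q_j^*(z)$ is strictly increasing in $z$ if $w_j^* < w_k^*$.

Proof. Since the incentive cost is linear in $\mathbf{q}(\omega)$ whereas the monitoring cost is convex in $\mathbf{q}(\omega)$, it follows that $\mathbf{q}^* : Z(\Omega) \to \Delta^K$ and that $w_j^* \neq w_k^*$ for all $j, k \in \mathcal{N}^*$. Write $\mathcal{N}^* = \{1, \cdots, N\}$ and assume w.l.o.g. that $w_1^* < \cdots < w_N^*$. Then $w_1^* = 0$ for the same reason as in the proof of Lemma 2. Differentiating the principal's objective function with respect to $q_n(z)$ yields the following first-order condition:
\[
- w_n^* + \lambda u(w_n^*) z = \mu \Big( \log \frac{q_n^*(z)}{q_1^*(z)} - \log \frac{\pi_n^*}{\pi_1^*} \Big) \quad \forall n = 2, \cdots, N,
\tag{B.6}
\]
where $\lambda > 0$ denotes the Lagrange multiplier associated with the (IC) constraint. Taking the difference of (B.6) between categories $k$ and $j$ with $w_j^* < w_k^*$ shows that $\log ( q_k^*(z)/q_j^*(z) )$ equals a constant plus $\lambda ( u(w_k^*) - u(w_j^*) ) z / \mu$, which is strictly increasing in $z$, thus proving Part (iii) of this theorem.

The next theorem proves existence of an optimal incentive contract:

Theorem 8.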

Assume Assumptions 2 and 3. Then an optimal incentive contract that induces high effort from the agent exists.

Proof. For any given $\mathbf{q}$, the wage-minimization problem admits solutions if and only if $z_j \neq z_k$ for some $j, k \in \mathcal{N}$, in which case we denote the minimal incentive cost by $W(\mathbf{q})$. The principal's problem is
\[
\min_{\mathbf{q}} W(\mathbf{q}) + \mu \cdot H(\mathbf{q}, 1),
\]
and any solution of it must be continuously differentiable on $Z(\Omega)$ by Equation (B.6) and Assumptions 2 and 3 (taking the usual care of derivatives at end points). Define $C(Z(\Omega), \Delta^K)$ as the set of $\mathbf{q}$'s as above and equip $C(Z(\Omega), \Delta^K)$ with the sup norm $\| \cdot \|$, i.e., $\| \mathbf{q}' - \mathbf{q} \| = \sup_{z,n} | q_n'(z) - q_n(z) |$. Rewrite the principal's problem as follows:
\[
\min_{\mathbf{q} \in C(Z(\Omega), \Delta^K)} W(\mathbf{q}) + \mu \cdot H(\mathbf{q}, 1),
\]
and note that the objective function is continuous in $\mathbf{q}$.

To prove existence of solutions, note that
\[
\inf_{\mathbf{q} \in C(Z(\Omega), \Delta^K)} W(\mathbf{q}) + \mu \cdot H(\mathbf{q}, 1)
\]
is finite, and denote its value by $x$. Let $\{ \mathbf{q}^k \}$ be any sequence in $C(Z(\Omega), \Delta^K)$ such that $\lim_{k \to \infty} W(\mathbf{q}^k) + \mu \cdot H(\mathbf{q}^k, 1) = x$. Clearly, $\mathbf{q}^k$ is uniformly bounded for all $k$, and the family $\{ \mathbf{q}^k \}$ is equicontinuous by Assumption 3 and the definition of $C(Z(\Omega), \Delta^K)$. Thus, a subsequence of $\{ \mathbf{q}^k \}$ converges uniformly to some $\mathbf{q}^{\infty}$ by Helly's selection theorem, and $W(\mathbf{q}^{\infty}) + \mu \cdot H(\mathbf{q}^{\infty}, 1) = x$ by the continuity of the objective function.

References

Alchian, A. A., and H. Demsetz (1972): “Production, information costs, and economic organization,” American Economic Review, 62(5), 777-795.

Baiman, S., and J. S. Demski (1980): “Economically optimal performance evaluation and control systems,” Journal of Accounting Research, 18(S), 184-220.

Bergemann, D., and S. Morris (2019): “Information design: a unifying perspective,” Journal of Economic Literature, 57(1), 44-95.

Blackwell, D. (1953): “Equivalent comparisons of experiments,” Annals of Mathematical Statistics, 24(2), 265-272.

Bloom, N., B. Eifert, A. Mahajan, D. McKenzie, and J. Roberts (2013): “Does management matter? Evidence from India,” Quarterly Journal of Economics, 128(1), 1-51.

Bloom, N., R. Sadun, and J. Van Reenen (2012): “Americans do IT better: US multinationals and the productivity miracle,” American Economic Review, 102(1), 167-201.

Bloom, N., and J. Van Reenen (2006): “Measuring and explaining management practices across firms and countries,” Centre for Economic Performance Discussion Paper, No. 716.

——— (2007): “Measuring and explaining management practices across firms and countries,” Quarterly Journal of Economics, 122(4), 1351-1408.

——— (2010): “Why do management practices differ across countries?,” Journal of Economic Perspectives, 24(1), 203-224.

Cover, T. M., and J. A. Thomas (2006): Elements of information theory, Hoboken, NJ: John Wiley & Sons, 2nd ed.

Crémer, J., L. Garicano, and A. Prat (2007): “Language and the theory of the firm,” Quarterly Journal of Economics, 122(1), 373-407.

Dilmé, F. (2017): “Optimal languages,” Working Paper.

Dye, R. A. (1986): “Optimal monitoring policies in agencies,” The Rand Journal of Economics, 17(3), 339-350.

Ewenstein, B., B. Hancock, and A. Komm (2016): “Ahead of the curve: the future of performance management,” McKinsey Quarterly, May.

Green, J. R., and N. L. Stokey (1983): “A comparison of tournaments and contracts,” Journal of Political Economy, 91(3), 349-364.

Grossman, S. J., and O. D. Hart (1983): “An analysis of the principal-agent problem,” Econometrica, 51(1), 7-45.

Kayyali, B., D. Knott, and S. Van Kuiken (2013): “The ‘big data’ revolution in healthcare,” McKinsey Quarterly, January.

Holmström, B. (1979): “Moral hazard and observability,” The Bell Journal of Economics, 10(1), 74-91.

——— (1982): “Moral hazard in teams,” The Bell Journal of Economics, 13(2), 324-340.

Holmström, B., and P. Milgrom (1991): “Multitask principal-agent analyses: incentive contracts, asset ownership, and job design,” Journal of Law, Economics, and Organization, 7(S), 24-52.

Hook, C., A. Jenkins, and M. Foot (2011): Introducing human resource management, Pearson, 6th ed.

Jäger, G., L. P. Metzger, and F. Riedel (2011): “Voronoi languages: Equilibria in cheap-talk games with high-dimensional types and few signals,” Games and Economic Behavior, 73(2), 517-537.

Kaplan, E. (2015): “The spy who fired me: the human costs of workplace monitoring,” Harper’s Magazine, March.

Kim, S. K. (1995): “Efficiency of an information system in an agency model,” Econometrica, 63(1), 89-102.

Lazear, E. P., and S. Rosen (1981): “Rank-order tournaments as optimal labor contracts,” Journal of Political Economy, 89(5), 841-864.

Maćkowiak, B., and M. Wiederholt (2009): “Optimal sticky prices under rational inattention,” American Economic Review, 99(3), 769-803.

Martin, D. (2017): “Strategic pricing with rational inattention to quality,” Games and Economic Behavior, 104, 131-145.

Matějka, F., and A. McKay (2012): “Simple market equilibria with rationally inattentive consumers,” American Economic Review: Papers and Proceedings, 102(3), 24-29.

Mookherjee, D. (1984): “Optimal incentive schemes with many agents,” Review of Economic Studies, 51(3), 433-446.

Murff, H. J., F. FitzHenry, M. E. Matheny, N. Gentry, K. L. Kotter, K. Crimin, R. S. Dittus, A. K. Rosen, P. L. Elkin, S. H. Brown, and T. Speroff (2011): “Automated identification of postoperative complications within an electronic medical record using natural language processing,” Journal of American Medical Association, 306(8), 848-855.

Ravid, D. (2017): “Bargaining with rational inattention,” Working Paper.

Saint-Paul, G. (2017): “A “quantized” approach to rational inattention,” European Economic Review, 100, 50-71.

Shannon, C. E. (1948): “A mathematical theory of communication,” Bell Labs Technical Journal, 27(3), 379-423.

Sims, C. A. (1998): “Stickiness,” Carnegie-Rochester Conference Series on Public Policy, 49, 317-356.

——— (2003): “Implications of rational inattention,” Journal of Monetary Economics, 50(3), 665-690.

Singer, N. (2013): “In a mood? Call center agents can tell,” New York Times, October 12.

Sobel, J. (2015): “Broad terms and organizational codes,” Working Paper.

Woodford, M. D. (2009): “Information-constrained state-dependent pricing,” Journal of Monetary Economics, 56(S), S100-S124.

Yang, M. (2019): “Optimality of debt under ﬂexible information acquisition,”