Optimal Incentive Contract with Endogenous Monitoring Technology
Anqi Li ∗ Ming Yang † Forthcoming, Theoretical Economics
Abstract
Recent technology advances have enabled firms to flexibly process and analyze sophisticated employee performance data at a reduced and yet significant cost. We develop a theory of optimal incentive contracting where the monitoring technology that governs the above procedure is part of the designer’s strategic planning. In otherwise standard principal-agent models with moral hazard, we allow the principal to partition agents’ performance data into any finite categories and to pay for the amount of information the output signal carries. Through analysis of the trade-off between giving incentives to agents and saving the monitoring cost, we obtain characterizations of optimal monitoring technologies such as information aggregation, strict MLRP, likelihood ratio-convex performance classification, group evaluation in response to rising monitoring costs, and assessing multiple task performances according to agents’ endogenous tendencies to shirk. We examine the implications of these results for workforce management and firms’ internal organizations.

Key words: incentive contract; endogenous monitoring technology.

JEL codes: D86, M15, M5.

∗ Department of Economics, Washington University in St. Louis. [email protected].
† Fuqua School of Business, Duke University. [email protected]. We thank the coeditor, four anonymous referees, Nick Bloom, George Mailath, Ilya Segal, Chris Shannon, Joel Sobel, Jacob Steinhardt, Bob Wilson and the seminar participants at Caltech, Decentralization Conference 2017, Duke-UNC, Johns Hopkins Carey, Northwestern, Stanford, UCSB, UCSD, U of Chicago and U of Washington for comments and suggestions. Lin Hu and Jessie Li provided generous assistance for the numerical analysis. All errors are our own.

Introduction
Recent technology advances have enabled firms to flexibly process and analyze sophisticated employee performance data at a reduced and yet significant cost. Speech analytics software, natural language processing tools and cloud-based systems are increasingly used to convert hard-to-process contents into succinct and meaningful ratings such as “satisfactory” and “unsatisfactory” (Murff et al. (2011); Singer (2013); Kaplan (2015)). This paper develops a theory of optimal incentive contracting where the monitoring technology that governs the above procedure is part of the designer’s strategic planning.

Our research agenda is motivated by the case of call center performance management reported by Singer (2013). It has long been recognized that the conversations between call center agents and customers contain useful performance indicators such as customer sentiment, voice quality and tone, etc. Recently, the advent of speech analytics software has finally enabled the processing and analysis of these contents, as well as their conversion into meaningful ratings such as “satisfactory” and “unsatisfactory.” On the one hand, running speech analytics software consumes server space and power, and the procedure has been increasingly outsourced to third parties to take advantage of the latest developments in cloud computing. On the other hand, managers now have considerable freedom to decide which facets of the customer conversation to utilize, thanks to the increased availability of products whose specialties range from emotion detection to word spotting.

We formalize the flexibility and cost associated with the design and implementation of the monitoring technology in otherwise standard principal-agent models with moral hazard. Specifically, we allow the monitoring technology to partition agents’ performance data into any finite categories, at a cost that increases with the amount of information the output signal carries (hereafter monitoring cost).
An incentive contract pairs the monitoring technology with a wage scheme that maps realizations of the output signal to different wages. An optimal contract minimizes the sum of expected wage and monitoring cost, subject to agents’ incentive constraints.

Our main result gives characterizations of optimal monitoring technologies in general environments, showing that the assignment of Lagrange multiplier-weighted likelihood ratios to performance categories is positive assortative in the direction of agent utilities. Geometrically, this means that optimal monitoring technologies comprise convex cells in the space of likelihood ratios or their transformations. This result provides practitioners with the needed formula for sorting employee performance data, and exploiting its geometry yields insights into workforce management and firms’ internal organizations.

Our proof strategy works directly with the principal’s Lagrangian. It handles general situations featuring multiple agents and tasks, in which the direction of sorting vector-valued likelihood ratios is nonobvious a priori. It overcomes the technical challenge whereby perturbations of the sorting algorithm affect wages endogenously through the Lagrange multipliers of agents’ incentive constraints, yielding effects that are new and difficult to assess using standard methods.

We give three applications of our result. In the single-agent model considered in Holmström (1979), we show that the assignment of likelihood ratios to wage categories is positive assortative and follows a simple cutoff rule. The monitoring technology aggregates potentially high-dimensional performance data into rank-ordered ratings, and the output signal satisfies the strict monotone likelihood ratio property with respect to the order induced by likelihood ratios.
Solving for the cutoff likelihood ratios yields findings consistent with recent developments in the manufacturing, retail and healthcare sectors, where decreases in the data processing cost have been shown to increase the fineness of the performance grids (Bloom and Van Reenen (2006, 2007); Murff et al. (2011); Ewenstein, Hancock, and Komm (2016)).

In the multi-agent model considered in Holmström (1982), the optimal monitoring technology partitions vectors of individual agents’ likelihood ratios into convex polygons. Based on this result, we then compare individual and group performance evaluations from the angle of monitoring cost, showing that firms should switch from individual to group evaluation in response to rising monitoring costs. This result formalizes the theses of Alchian and Demsetz (1972) and Lazear and Rosen (1981) that either team or tournament should be the dominant incentive system when individual performance evaluation is too costly to conduct. It is consistent with the findings of Bloom and Van Reenen (2006, 2007) that lack of IT access increases the use of group performance evaluation among otherwise similar firms.

In the multiple-task model studied in Holmström and Milgrom (1991), the resources spent on the assessment of a task performance should increase with the agent’s endogenous tendency to shirk the corresponding task. Using simulation, we apply this result to the study of, e.g., how improved precision of some task measurements (caused by, e.g., the advent of high-quality scanner data measuring the skillfulness in scanning items) would affect the resources spent on the assessments of other task performances (e.g., projecting warmth to customers).
Earlier studies on contracting with costly experiments (in the sense of Blackwell (1953)) include, but are not limited to: Baiman and Demski (1980) and Dye (1986), in which the principal can pay an external auditor for drawing a signal from an exogenous distribution; Holmström (1979), Grossman and Hart (1983) and Kim (1995), in which signal distributions are ranked based on the incentive costs they incur. In these studies, the principal can change the probability space generated by the agent’s hidden effort and, in the first two studies, through paying stylized costs. In contrast, we focus on the conversion of raw data into performance ratings while taking the probability space as given. Also, our assumption that the monitoring cost increases with the amount of information carried by the output signal could be ill-suited for modeling the cost of running experiments.

The current work differs from existing studies on rational inattention (hereafter RI) in three aspects. First, early developments in RI by Sims (1998, 2003), Maćkowiak and Wiederholt (2009) and Woodford (2009) sought to explain the stickiness of macroeconomic variables by information processing costs, whereas we examine the implication of costly yet flexible monitoring for principal-agent relationships. Second, we focus mainly on partitional monitoring technologies because in reality, adding non-performance-related factors into employee ratings could have dire consequences such as appeals, lawsuits and excessive turnover. Finally, our monitoring cost function nests entropy as a special case.

Recent works of Crémer, Garicano, and Prat (2007), Jäger, Metzger, and Riedel (2011), Sobel (2015) and Dilmé (2017) examine the optimal language used between organization members who share a common interest but face communication costs.
Yang (2019) studies a security design problem where a rationally inattentive buyer can obtain any signal about the uncertain fundamental at a cost that is proportional to entropy reduction. Other recent efforts to introduce RI into strategic environments include but are not limited to: Matějka and McKay (2012), Martin (2017) and Ravid (2017).
See standard HR textbooks for this subject matter.
Saint-Paul (2017) demonstrates the validity of entropy as an information cost in decision problems where the decision variable is a deterministic function of the exogenous state variable.
Primitives
A risk-neutral principal faces a risk-averse agent, who earns a utility $u(w)$ from spending a nonnegative wage $w \geq 0$ and incurs a cost $c(a)$ from privately exerting high or low effort $a \in \{0, 1\}$. The function $u : \mathbb{R}_+ \to \mathbb{R}$ satisfies $u(0) = 0$, $u' > 0$ and $u'' < 0$, whereas $c(1) = c > c(0) = 0$.

Each effort choice $a$ generates a probability space $(\Omega, \Sigma, P_a)$, where $\Omega$ is a finite-dimensional Euclidean space that comprises the agent’s performance data, $\Sigma$ is the Borel sigma-algebra on $\Omega$, and $P_a$ is the probability measure on $(\Omega, \Sigma)$. The $P_a$’s are assumed to be mutually absolutely continuous, and the probability density functions $p_a$ they induce are well-defined and everywhere positive.

Incentive contract
An incentive contract $\langle \mathcal{P}, w(\cdot) \rangle$ is a pair of monitoring technology $\mathcal{P}$ and wage scheme $w : \mathcal{P} \to \mathbb{R}_+$. The former represents a human- or machine-operated system that governs the processing and analysis of performance data, whereas the latter maps outputs of the first-step procedure to different levels of wages. In the main body of this paper, $\mathcal{P}$ can be any partition of $\Omega$ with at most $K$ cells that are all of positive measures, and $w : \mathcal{P} \to \mathbb{R}_+$ maps each cell $A$ of $\mathcal{P}$ to a nonnegative wage $w(A) \geq 0$. The upper bound $K$ for the rating scale $|\mathcal{P}|$ can be any integer greater than one and will be taken as given throughout the analysis.

In Appendix B.2, we allow the monitoring technology to be any mapping from $\Omega$ to lotteries on finite performance categories. If the lottery is degenerate, then the monitoring technology is partitional.
Appendix B.1 examines the case where the agent is constrained by individual rationality.
The upper bound $K$, while stylized, guarantees the existence of optimal incentive contract(s). Judging from the simulation exercises we have so far conducted, the optimal rating scale is typically smaller than $K$ even when $\mu$ is small (see, e.g., Figure 1).

For any data point $\omega \in \Omega$, let $A(\omega)$ be the unique performance category that contains $\omega$ and let $w(A(\omega))$ be the wage associated with $A(\omega)$. Time evolves as follows:
1. the principal commits to $\langle \mathcal{P}, w(\cdot) \rangle$;
2. the agent privately chooses $a \in \{0, 1\}$;
3. Nature draws $\omega$ from $\Omega$ according to $P_a$;
4. the monitoring technology outputs $A(\omega)$;
5. the principal pays $w(A(\omega))$ to the agent.

Implementation cost
For any given effort choice $a$ by the agent, a monitoring technology $\mathcal{P} = \{A_1, \cdots, A_N\}$ outputs a signal $X : \Omega \to \mathcal{P}$ whose probability distribution is represented by a vector $\pi(\mathcal{P}, a) = (P_a(A_1), \cdots, P_a(A_N), 0, \cdots, 0)$ in the $K$-dimensional simplex. The principal incurs the following cost from implementing an incentive contract $\langle \mathcal{P}, w(\cdot) \rangle$:
$$\sum_{A \in \mathcal{P}} P_a(A)\, w(A) + \mu \cdot H(\mathcal{P}, a),$$
which consists of two parts. The first part $\sum_{A \in \mathcal{P}} P_a(A) w(A)$, i.e., the incentive cost, has been the central focus of the existing principal-agent literature. The second part $\mu \cdot H(\mathcal{P}, a)$, hereafter termed the monitoring cost, represents the cost associated with the processing and analysis of the performance data. In particular, $\mu > 0$, and $H(\mathcal{P}, a)$ captures the amount of information carried by the output signal and is assumed to satisfy the following properties:

Assumption 1. There exists a function $h : \Delta^K \to \mathbb{R}_+$ such that $H(\mathcal{P}, a) = h(\pi(\mathcal{P}, a))$ for all $(\mathcal{P}, a)$. Furthermore,
(a) $h(\pi_1, \cdots, \pi_K) = h(\pi_{\sigma(1)}, \cdots, \pi_{\sigma(K)})$ for all probability vectors $(\pi_1, \cdots, \pi_K) \in \Delta^K$ and permutations $\sigma$ on $\{1, \cdots, K\}$;
(b) $h(0, \pi_2, \cdots) < h(\pi_1', \pi_2', \cdots)$ for all $(0, \pi_2, \cdots)$ and $(\pi_1', \pi_2', \cdots) \in \Delta^K$ that differ only in the first two elements and satisfy $\pi_2, \pi_1', \pi_2' > 0$ and $\pi_2 = \pi_1' + \pi_2'$.

Part (a) says that the monitoring cost does not depend on how performance categories are labeled, and Part (b) says that merging two categories strictly reduces it. Examples of monitoring cost functions satisfying Assumption 1 include the entropy $-\sum_{A \in \mathcal{P}} P_a(A) \log P_a(A)$ of the output signal and the bits of information $\log |\mathcal{P}|$ it carries. In Section 2.2, we motivate the use of this assumption in the example of call center performance management.
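As an illustration of Assumption 1, the following minimal sketch checks properties (a) and (b) numerically for the two cost functions named above. The function names `entropy` and `bits`, the choice of base-2 logarithms, and the example probability vectors are assumptions made here for illustration; they are not taken from the paper.

```python
import math
from itertools import permutations


def entropy(pi):
    """Shannon entropy (base 2) of a probability vector; zero entries contribute nothing."""
    return -sum(p * math.log2(p) for p in pi if p > 0)


def bits(pi):
    """log2 of the number of cells that carry positive probability."""
    return math.log2(sum(1 for p in pi if p > 0))


# Hypothetical example with K = 4: a 3-cell partition, and the 2-cell partition
# obtained by merging its first two cells (0.2 + 0.3 = 0.5).
split = (0.2, 0.3, 0.5, 0.0)
merged = (0.0, 0.5, 0.5, 0.0)

for h in (entropy, bits):
    # Assumption 1(a): relabeling the cells leaves the cost unchanged.
    assert all(math.isclose(h(split), h(perm)) for perm in permutations(split))
    # Assumption 1(b): merging two cells strictly lowers the cost.
    assert h(merged) < h(split)
```

Both functions depend on the partition only through the probability vector of the output signal, which is the first requirement of Assumption 1.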
The principal’s problem
Consider the problem of inducing high effort from the agent. Define a random variable $Z : \Omega \to \mathbb{R}$ by
$$Z(\omega) = 1 - \frac{p_0(\omega)}{p_1(\omega)} \quad \forall \omega,$$
where $p_0(\omega)/p_1(\omega)$ is the likelihood ratio associated with data point $\omega$. Note that $E[Z \mid a = 1] = 0$ and that the range of $Z$ is a subset of $(-\infty, 1)$. For any set $A \in \Sigma$ of positive measure, define the $z$-value of $A$ by
$$z(A) = E[Z \mid A; a = 1].$$
In words, $z(A)$ represents the average value of $Z$ conditional on the data point being drawn from $A$.

A contract $\langle \mathcal{P}, w(\cdot) \rangle$ is incentive compatible if
$$\sum_{A \in \mathcal{P}} P_1(A)\, u(w(A)) - c \geq \sum_{A \in \mathcal{P}} P_0(A)\, u(w(A))$$
or, equivalently,
$$\sum_{A \in \mathcal{P}} P_1(A)\, u(w(A))\, z(A) \geq c, \tag{IC}$$
and it satisfies the limited liability constraint if
$$w(A) \geq 0 \quad \forall A \in \mathcal{P}. \tag{LL}$$

The bit is a basic unit of information in information theory, computing, and digital communications. In information theory, one bit is defined as the maximum information entropy of a binary random variable.
The problem of inducing low effort is standard.
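The equivalence between the two forms of the incentive compatibility constraint rests on a one-line computation, spelled out here for completeness. It uses only the definitions of $Z$ and $z(A)$ given above, with $p_a$ denoting the density of $P_a$:

```latex
\begin{align*}
P_1(A)\, z(A)
  &= \int_A Z(\omega)\, p_1(\omega)\, d\omega
   = \int_A \bigl( p_1(\omega) - p_0(\omega) \bigr)\, d\omega
   = P_1(A) - P_0(A).
\end{align*}
```

Multiplying by $u(w(A))$ and summing over the cells of the partition turns the first form of the constraint into (IC). Taking $A = \Omega$ in the same computation gives $E[Z \mid a = 1] = 0$.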
An optimal incentive contract that induces high effort from the agent (optimal incentive contract for short) minimizes the total implementation cost under high effort, subject to the incentive compatibility constraint and limited liability constraint:
$$\min_{\langle \mathcal{P}, w(\cdot) \rangle} \sum_{A \in \mathcal{P}} P_1(A)\, w(A) + \mu \cdot H(\mathcal{P}, 1) \quad \text{s.t. (IC) and (LL)}.$$
In what follows, we will denote the solution(s) to the above problem by $\langle \mathcal{P}^*, w^*(\cdot) \rangle$.

We first illustrate Assumption 1 in the context of call center performance management:
Example 1.
In the example described in Section 1, a piece of performance data comprises the major characteristics of a call history (e.g., customer sentiment and voice quality) encoded in binary digits, and the monitoring technology represents the speech analytics program that categorizes binary digits into performance ratings. To formalize the design flexibility, we allow the monitoring technology to partition performance data into any $N \leq K$ categories, where $K$ can be any integer greater than one. The cost of running the monitoring technology is assumed to increase with the amount of processed information, whose definition varies from case to case. For example, if the monitoring technology runs many times among many identical agents, then the optimal design should minimize the average number of steps it takes to find the performance category containing the raw data point. By now, it is well known that this quantity approximately equals the entropy of the output signal. In contrast, if the monitoring technology runs only a few times for a small number of agents, then the worst-case (or unamortized) amount of processed information is best captured by the bits of information carried by the output signal (see, e.g., Cover and Thomas (2006)). In both cases, the quantity of interest depends only on the probability distribution of the output signal and nothing else.

We next introduce the concept of setup cost and distinguish it from our notion of monitoring cost:

Example 1 (Continued). As its name suggests, setup cost refers to the cost incurred to set up the infrastructure that facilitates data processing and analysis, e.g., Fast Fourier Transform (FFT) chips (which transform sound waves into their major characteristics coded in binary digits), recording devices, etc.

The major role of setup cost is to change the probability space $(\Omega, \Sigma, P_a)$. For example, design improvements in FFT chips enable more frequent sampling of sound waves and cause $(\Omega, \Sigma, P_a)$ to change.
In what follows, we will take the probability space as given and ignore the setup cost. That said, one can certainly embed our analysis into a two-stage setting in which the principal first incurs the setup cost and then the monitoring cost. Results below will carry over to this new setting.

Example 2.
Suppose $u(w) = \sqrt{w}$, $Z$ is uniformly distributed over $[-1/2, 1/2]$ under $a = 1$, and $H(\mathcal{P}, a) = f(|\mathcal{P}|)$ for some strictly increasing function $f : \{1, \cdots, K\} \to \mathbb{R}_+$. Below we walk through the key steps in solving for the optimal incentive contract, give closed-form solutions and discuss their practical implications.

Optimal wage scheme
We first solve for the optimal wage scheme for any given monitoring technology $\mathcal{P}$ as in Holmström (1979). Specifically, label the performance categories as $A_1, \cdots, A_N$, and write $\pi_n = P_1(A_n)$ and $z_n = z(A_n)$ for $n = 1, \cdots, N$. Assume $z_j \neq z_k$ for some $j, k \in \{1, \cdots, N\}$ to make the analysis interesting. The principal’s problem is then
$$\min_{\{w_n\}} \sum_{n=1}^{N} \pi_n w_n, \quad \text{s.t.} \quad \sum_{n=1}^{N} \pi_n \sqrt{w_n}\, z_n \geq c, \tag{IC}$$
and
$$w_n \geq 0, \quad n = 1, \cdots, N. \tag{LL}$$
Straightforward algebra yields the expression for the minimal incentive cost:
$$c^2 \left( \sum_{n=1}^{N} \pi_n \max\{0, z_n\}^2 \right)^{-1}.$$
This expression illustrates the sufficient statistics principle, namely that the $z$-value is the only part of the performance data that provides the agent with incentives.

Optimal monitoring technology
We next solve for the optimal monitoring technology. First, note that the principal should partition performance data based only on their $z$-values, and that different performance categories must attain different $z$-values and wages. The reason combines the sufficient statistic principle with Assumption 1(b), namely merging performance categories of the same $z$-value saves the monitoring cost while leaving the incentive cost unaffected and thus constitutes an improvement to the original monitoring technology.

A more interesting question concerns how we should assign the various data points, identified by their $z$-values, to different performance categories. In the baseline model featuring a single agent and binary efforts, the answer to this question is relatively straightforward: assign high (resp. low) $z$-values to high-wage (resp. low-wage) categories. Here is a quick proof of this result: since the left-hand side of the (IC) constraint is supermodular in wages and $z$-values, if our conjecture were false, then reshuffling data points as above while holding the probabilities of performance categories constant reduces the incentive cost while leaving the monitoring cost unaffected.

When extending the above intuition to general settings featuring multiple agents or multiple actions, we face two challenges. First, in the case where $z$-values and wages are vectors, the direction of sorting these objects is nonobvious a priori. Second, changes in the sorting algorithm affect wages endogenously through the Lagrange multipliers of the incentive constraints, yielding effects that are new and difficult to assess using standard methods.

The proof strategy presented in Section 3.3 overcomes these challenges, showing that the assignment of Lagrange multiplier-weighted $z$-values to performance categories must be positive assortative in the direction of agent utilities.
Geometrically, this means that any optimal monitoring technology must comprise convex cells in the space of $z$-values or their transformations. Theorems 1, 3 and 5 formalize the above statements.

Implications
An important feature of the optimal monitoring technology is information aggregation, a term used by human resource practitioners to refer to the aggregation of potentially high-dimensional performance data into rank-ordered ratings such as “satisfactory” and “unsatisfactory.”

The geometry of the optimal monitoring technology sheds light on the practical issues covered in Sections 3.4, 4.3 and 5.1. Consider, for example, optimal performance grids. In the current example, it can be shown that the optimal $N$-partitional monitoring technology divides the space $[-1/2, 1/2]$ of $z$-values into $N$ disjoint intervals $[\hat{z}_{n-1}, \hat{z}_n)$, $n = 1, \cdots, N$, where $\hat{z}_0 = -1/2$ and $\hat{z}_N = 1/2$. The optimal cut points $\{\hat{z}_n\}_{n=1}^{N-1}$ can be solved as follows:
$$\min_{\{\hat{z}_n\}} \; c^2 \left( \sum_{n=1}^{N} \pi_n \max\{0, z_n\}^2 \right)^{-1} + \mu \cdot f(N),$$
where
$$\pi_n = \int_{\hat{z}_{n-1}}^{\hat{z}_n} dZ = \hat{z}_n - \hat{z}_{n-1} \quad \text{and} \quad z_n = \frac{1}{\pi_n} \int_{\hat{z}_{n-1}}^{\hat{z}_n} Z \, dZ = \frac{1}{2} \left( \hat{z}_{n-1} + \hat{z}_n \right).$$
Straightforward algebra yields
$$\hat{z}_n = \frac{2n - 1}{4N - 2}, \quad n = 1, \cdots, N - 1.$$
Based on this result, as well as the functional form of $f$, we can then solve for the optimal rating scale $N$ and hence the optimal incentive contract completely.

This section analyzes optimal incentive contracts. Results below hold true except perhaps on a measure zero set of data points. The same disclaimer applies to the remainder of this paper.

We first define $Z$-convexity:

Definition 1.
A set $A \in \Sigma$ is $Z$-convex if the following holds for all $\omega', \omega'' \in A$ such that $Z(\omega') \neq Z(\omega'')$:
$$\{\omega \in \Omega : Z(\omega) = (1 - s) \cdot Z(\omega') + s \cdot Z(\omega'') \text{ for some } s \in (0, 1)\} \subset A.$$

In words, a set $A \in \Sigma$ is $Z$-convex if whenever it contains data points of different $z$-values, it must also contain all data points of intermediate $z$-values. Let $Z(A)$ denote the image of any set $A \in \Sigma$ under the mapping $Z$. In the case where $Z(\Omega)$ is a connected set in $\mathbb{R}$, the above definition is equivalent to the convexity of $Z(A)$ in $\mathbb{R}$.

We state a few assumptions before going into detail. The next assumption says that the distribution of $Z$ has no atom or hole:

Assumption 2. $Z$ is distributed atomlessly on a connected set $Z(\Omega)$ in $\mathbb{R}$ under $a = 1$.

The next assumption says that $Z(\Omega)$ is a compact set in $\mathbb{R}$:

Assumption 3. $Z(\Omega)$ is a compact set in $\mathbb{R}$.

The next assumption imposes regularities on the monitoring cost function. Part (a) of it holds for the bits of information carried by the output signal, and Part (b) of it holds for the entropy of the output signal:
Assumption 4.
The function $h : \Delta^K \to \mathbb{R}_+$ satisfies one of the following conditions:
(a) $h(\pi(\mathcal{P}, a)) = f(|\mathcal{P}|)$ for some strictly increasing function $f : \{1, \cdots, K\} \to \mathbb{R}_+$;
(b) $h$ is continuous.

We now state our main results. The next theorem shows that any optimal incentive contract assigns data points of high (resp. low) $z$-values to high-wage (resp. low-wage) categories. Under Assumption 2, this can be achieved by first dividing $z$-values into disjoint intervals and then backing out the partition of the original data space accordingly. The result is an aggregation of potentially high-dimensional data into rank-ordered ratings, as well as a wage scheme that is strictly increasing in these ratings:

Theorem 1.
Assume Assumption 1 and let $\langle \mathcal{P}^*, w^*(\cdot) \rangle$ be any optimal incentive contract that induces high effort from the agent. Then $\mathcal{P}^*$ comprises $Z$-convex cells labeled as $A_1, \cdots, A_N$ where $w^*(A_1) < \cdots < w^*(A_N)$. Assume, in addition, Assumption 2. Then there exist $\inf Z(\Omega) = \hat{z}_0 < \hat{z}_1 < \cdots < \hat{z}_N = \sup Z(\Omega)$ such that $A_n = \{\omega : Z(\omega) \in [\hat{z}_{n-1}, \hat{z}_n)\}$ for $n = 1, \cdots, N$.

Under Assumption 2, the set of (finite) cut points has measure zero, so it is unimportant which of the two adjacent intervals a cut point belongs to. The choice of expressing all intervals as right half-open ones is purely aesthetic.
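Under Assumption 2, Theorem 1 reduces a monitoring technology to a list of cut points in the space of $z$-values. A minimal sketch of the resulting classification rule is given below. The names `z_value` and `rating`, the convention of assigning the top endpoint to the last cell, and the numerical cut points are all assumptions made for illustration.

```python
from bisect import bisect_right


def z_value(p0, p1):
    """Z = 1 - p0/p1, where p0 and p1 are the densities of a data point
    under low and high effort (p1 is assumed to be positive)."""
    return 1.0 - p0 / p1


def rating(z, cuts):
    """Return the 1-based index n of the interval [cuts[n-1], cuts[n]) containing z.

    `cuts` is the strictly increasing list [z_hat_0, ..., z_hat_N]. The top
    endpoint z_hat_N is assigned to the last cell, which is immaterial under
    Assumption 2 because cut points have measure zero.
    """
    if not cuts[0] <= z <= cuts[-1]:
        raise ValueError("z lies outside the range of Z")
    return min(bisect_right(cuts, z), len(cuts) - 1)


# Hypothetical two-category technology on [-1/2, 1/2] with a single interior cut point.
cuts = [-0.5, 1 / 6, 0.5]
low = rating(z_value(1.2, 1.0), cuts)   # z = -0.2 falls in the first interval
high = rating(z_value(0.6, 1.0), cuts)  # z = 0.4 falls in the second interval
```

Because the rule depends on the raw data only through the scalar `z`, it aggregates data of any dimension into rank-ordered ratings, in line with the theorem.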
Theorem 2.
An optimal incentive contract that induces high effort from the agent exists under Assumptions 1-4.

Proof.
See Appendix A.1.
The proof of Theorem 1 consists of three steps. The intuitions of steps one and two have already been discussed in Example 2. Step three is new.
Step one
We first take any monitoring technology $\mathcal{P}$ as given and solve for the optimal wage scheme as in Holmström (1979):
$$\min_{w : \mathcal{P} \to \mathbb{R}_+} \sum_{A \in \mathcal{P}} P_1(A)\, w(A) \quad \text{s.t. (IC) and (LL).} \tag{3.1}$$
The next lemma restates Holmström’s (1979) sufficient statistic principle:

Lemma 1.
Let $w^*(\cdot\,; \mathcal{P})$ be any solution to Problem (3.1). Then there exists $\lambda > 0$ such that $u'(w^*(A; \mathcal{P})) = 1/(\lambda z(A))$ for all $A \in \mathcal{P}$ such that $w^*(A; \mathcal{P}) > 0$.

Proof. See Appendix A.1.
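Lemma 1 can be checked by hand in the square-root case of Example 2, where Problem (3.1) has a closed-form solution. The sketch below is an illustration under that assumption, $u(w) = \sqrt{w}$. The function name `optimal_wages_sqrt` and the three-cell example are hypothetical, and the closed form $w_n = (c \max\{0, z_n\}/S)^2$ with $\lambda = 2c/S$ and $S = \sum_n \pi_n \max\{0, z_n\}^2$ comes from the first-order conditions of the problem rewritten in $\sqrt{w_n}$.

```python
import math


def optimal_wages_sqrt(pi, z, c):
    """Solve Problem (3.1) in closed form when u(w) = sqrt(w).

    pi[n] and z[n] are the probability (under high effort) and the z-value
    of cell n. Returns the wage of each cell and the multiplier of (IC).
    """
    S = sum(p * max(0.0, zn) ** 2 for p, zn in zip(pi, z))
    if S <= 0:
        raise ValueError("no cell has a positive z-value, so high effort cannot be induced")
    wages = [(c * max(0.0, zn) / S) ** 2 for zn in z]
    lam = 2.0 * c / S
    return wages, lam


# Hypothetical three-cell partition; the z-values average to zero under pi.
pi = [0.5, 0.3, 0.2]
z = [-0.4, 0.2, 0.7]
c = 1.0
wages, lam = optimal_wages_sqrt(pi, z, c)

# (IC) binds, and the expected wage equals c^2 / S.
ic_lhs = sum(p * math.sqrt(w) * zn for p, w, zn in zip(pi, wages, z))
expected_wage = sum(p * w for p, w in zip(pi, wages))
S = sum(p * max(0.0, zn) ** 2 for p, zn in zip(pi, z))
assert math.isclose(ic_lhs, c)
assert math.isclose(expected_wage, c ** 2 / S)

# Lemma 1: u'(w) = 1 / (lambda * z) on every cell that pays a positive wage.
for w, zn in zip(wages, z):
    if w > 0:
        assert math.isclose(1.0 / (2.0 * math.sqrt(w)), 1.0 / (lam * zn))
```

Cells with nonpositive $z$-values receive a zero wage, so the limited liability constraint binds exactly where the first-order condition of the lemma cannot hold.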
Step two
We next demonstrate that different performance categories must attain different $z$-values and wages:

Lemma 2.
Assume Assumption 1. Let $\langle \mathcal{P}^*, w^*(\cdot) \rangle$ be any optimal incentive contract that induces high effort from the agent and label the cells of $\mathcal{P}^*$ as $A_1, \cdots, A_N$ such that $z(A_1) \leq \cdots \leq z(A_N)$. Then $z(A_1) < 0 < \cdots < z(A_N)$ and $w^*(A_1) < \cdots < w^*(A_N)$.

Step three

We finally demonstrate that the assignment of $z$-values into wage categories is positive assortative. In Example 2, we sketched a proof based on supermodularity and pointed out the difficulties of extending that argument to multidimensional environments. The argument below overcomes these difficulties.

Take any optimal incentive contract with distinct performance categories $A_j$ and $A_k$. From Lemma 2, we know that $z(A_j) \neq z(A_k)$. Fix any $\epsilon > 0$, and take any $A'_\epsilon \subset A_j$ and $A''_\epsilon \subset A_k$ such that $P_1(A'_\epsilon) = P_1(A''_\epsilon) = \epsilon$ and $z(A'_\epsilon) = z' \neq z(A''_\epsilon) = z''$. In words, $A'_\epsilon$ and $A''_\epsilon$ have the same probability $\epsilon$ under $a = 1$ but different $z$-values that are independent of $\epsilon$. Lemma 3 of Appendix A.1.1 proves the existence of $A'_\epsilon$ and $A''_\epsilon$ when $\epsilon$ is small.

Consider a perturbation to the monitoring technology that “swaps” $A'_\epsilon$ and $A''_\epsilon$. After the perturbation, the new performance categories, denoted by $A_n(\epsilon)$, become $A_j(\epsilon) = (A_j \setminus A'_\epsilon) \cup A''_\epsilon$, $A_k(\epsilon) = (A_k \setminus A''_\epsilon) \cup A'_\epsilon$ and $A_n(\epsilon) = A_n$ for $n \neq j, k$. Since the perturbation has no effect on the probabilities of the performance categories under $a = 1$, it does not affect the monitoring cost by Assumption 1(a).
Meanwhile, it changes the principal’s Lagrangian to the following (ignore the (LL) constraint):
$$L(\epsilon) = \sum_n \pi_n \bigl( w_n(\epsilon) - \lambda(\epsilon)\, u(w_n(\epsilon))\, z_n(\epsilon) \bigr) + \lambda(\epsilon)\, c,$$
where $\pi_n$ denotes the probability of $A_n$ (equivalently $A_n(\epsilon)$) under $a = 1$, $z_n(\epsilon)$ the $z$-value of $A_n(\epsilon)$, $w_n(\epsilon)$ the optimal wage at $A_n(\epsilon)$, and $\lambda(\epsilon)$ the Lagrange multiplier associated with the (IC) constraint. A close inspection of the Lagrangian leads to the following conjecture: to minimize $L(\epsilon)$, the assignment of Lagrange multiplier-weighted $z$-values to performance categories must be positive assortative in the direction of agent utilities.

To develop intuition, we assume differentiability and obtain
$$\begin{aligned}
L'(0) &= \sum_n \pi_n w_n'(0) - \lambda'(0) \underbrace{\Bigl( \sum_n \pi_n u(w_n(0))\, z_n(0) - c \Bigr)}_{(1)\; =\; 0} \\
&\quad - \lambda(0) \sum_n \pi_n \Bigl[ \underbrace{u'(w_n(0))\, z_n(0)}_{(2)\; =\; 1/\lambda(0)} \cdot\, w_n'(0) + u(w_n(0))\, z_n'(0) \Bigr] \\
&= \sum_n \pi_n w_n'(0) - 0 - \sum_n \pi_n w_n'(0) - \lambda(0) \sum_n \pi_n u(w_n(0))\, z_n'(0) \\
&= -\lambda(0) \sum_n \pi_n u(w_n(0))\, z_n'(0) \\
&= \lambda(0)\, (z'' - z')\, \bigl( u(w_k(0)) - u(w_j(0)) \bigr).
\end{aligned}$$
In the above expression, $(1) = 0$ because the (IC) constraint binds under the original contract, and $(2) = 1/\lambda(0)$ by Lemma 1. The last equality holds because the swap changes only $z_j$ and $z_k$, with $\pi_j z_j'(0) = z'' - z'$ and $\pi_k z_k'(0) = z' - z''$. These findings resolve our concerns raised in Section 3.1, showing that the effects of our perturbation on the Lagrange multiplier and wages are negligible.

To complete the proof, note that $L'(0) \geq 0$ by the optimality of the original contract, and that $L'(0) \neq 0$ because $\lambda(0) > 0$, $z'' \neq z'$ and $w_j(0) \neq w_k(0)$ (Lemma 2). Combining yields $L'(0) > 0$, so our conjecture is indeed true.
$Z$-convexity is immediate: if a performance category contains extreme but not intermediate $z$-values, then the assignment of $z$-values goes in the wrong direction and an improvement can be constructed.

The above proof strategy yields the endogenous direction of sorting raw data into performance categories, which is relatively straightforward in the baseline model but is less so in later extensions. The proof in Appendix A.1 does not assume differentiability and handles the limited liability constraint, too.

Strict MLRP

Theorem 1 implies that the signal generated by any optimal monitoring technology must satisfy the strict monotone likelihood ratio property (hereafter strict MLRP) with respect to the order induced by $z$-values:

Definition 2. For any $A, A' \in \Sigma$ of positive measures, write $A \overset{z}{\preceq} A'$ if $z(A) \leq z(A')$.

Corollary 1. The signal $X : \Omega \to \mathcal{P}^*$ generated by any optimal monitoring technology $\mathcal{P}^*$ satisfies strict MLRP with respect to $\overset{z}{\preceq}$, i.e., any $A, A' \in \mathcal{P}^*$ satisfy $A \overset{z}{\preceq} A'$ if and only if $z(A) < z(A')$.

While the signal generated by any monitoring technology trivially satisfies the weak MLRP with respect to $\overset{z}{\preceq}$ (i.e., replace “$<$” with “$\leq$” in Corollary 1), it violates the strict MLRP if there are multiple performance categories that attain the same $z$-value. By contrast, the signal generated by any optimal monitoring technology must satisfy the strict MLRP with respect to $\overset{z}{\preceq}$, because merging performance categories of the same $z$-value saves the monitoring cost while leaving the incentive cost unaffected.

Comparative statics

The parameter $\mu$ captures factors that affect the (opportunity) cost of data processing and analysis.
Factors that reduce $\mu$ include, but are not limited to: the advent of IT-based HR management systems in the 90’s, advancements in speech analytics, increases in computing power, etc.

To facilitate comparative statics analysis, we write any choice of optimal incentive contract as $\langle \mathcal{P}^*(\mu), w^*(\cdot\,; \mu) \rangle$ to make its dependence on $\mu$ explicit:

Proposition 1. Fix any $0 < \mu < \mu'$. For any choices of $\langle \mathcal{P}^*(\mu), w^*(\cdot\,; \mu) \rangle$ and $\langle \mathcal{P}^*(\mu'), w^*(\cdot\,; \mu') \rangle$:
(i) $\sum_{A \in \mathcal{P}^*(\mu)} P_1(A)\, w^*(A; \mu) \leq \sum_{A \in \mathcal{P}^*(\mu')} P_1(A)\, w^*(A; \mu')$;
(ii) $H(\mathcal{P}^*(\mu), 1) \geq H(\mathcal{P}^*(\mu'), 1)$;
(iii) $|\mathcal{P}^*(\mu)| \geq |\mathcal{P}^*(\mu')|$ under Assumption 4(a).

Proof. Part (i) follows from the optimalities of $\mathcal{P}^*(\mu)$ and $\mathcal{P}^*(\mu')$. Parts (ii) and (iii) are immediate.

Proposition 1 shows that as data processing and analysis become cheaper, the principal pays a lower wage on average and the information carried by the output signal becomes finer. In the case where the monitoring cost is an increasing function of the rating scale (see, e.g., Hook, Jenkins, and Foot (2011)), the optimal rating scale is nonincreasing in $\mu$. For other monitoring cost functions such as entropy, we can first compute the cutoff $z$-values and then the optimal rating scale as in Example 2. Figure 1 plots the numerical solutions obtained in a special case.

The above findings are consistent with several strands of empirical facts. Among others, access to IT has been shown to increase the fineness of the performance grids among manufacturing companies, holding other things constant (Bloom and Van Reenen (2006, 2007, 2010); Bloom, Sadun, and Van Reenen (2012)). Crowdsourcing the processing and analysis of real-time data has enabled the “exact individual diagnosis” that separates distinctive and mediocre performers in companies like GE and Zalando (Ewenstein, Hancock, and Komm (2016)).
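Part (iii) of Proposition 1 can be illustrated in the setting of Example 2, where $u(w) = \sqrt{w}$, $Z$ is uniform on $[-1/2, 1/2]$ and the monitoring cost depends only on the rating scale. The sketch below is a rough numerical check, not the procedure behind Figure 1, which uses the entropy cost instead. The linear cost $f(N) = N$, the function names and the grid of $\mu$ values are assumptions made for illustration, and the cut points $\hat{z}_n = (2n-1)/(4N-2)$ are those of Example 2.

```python
def cut_points(N):
    """Cut points of Example 2: z_hat_0 = -1/2, z_hat_n = (2n-1)/(4N-2), z_hat_N = 1/2."""
    return [-0.5] + [(2 * n - 1) / (4 * N - 2) for n in range(1, N)] + [0.5]


def incentive_cost(N, c=1.0):
    """Minimal expected wage c^2 / sum_n pi_n * max(0, z_n)^2 for the N-cell technology."""
    cuts = cut_points(N)
    S = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        pi_n = hi - lo            # probability of the cell under the uniform law
        z_n = 0.5 * (lo + hi)     # z-value (conditional mean) of the cell
        S += pi_n * max(0.0, z_n) ** 2
    # With a single cell the signal is uninformative and effort cannot be induced.
    return float("inf") if S == 0.0 else c * c / S


def optimal_scale(mu, K=100, c=1.0, f=lambda N: N):
    """Rating scale minimizing incentive cost plus mu * f(N); f(N) = N is an assumption."""
    return min(range(1, K + 1), key=lambda N: incentive_cost(N, c) + mu * f(N))


mus = [0.05, 0.15, 0.3, 1.0, 3.0]
scales = [optimal_scale(mu) for mu in mus]

# The optimal rating scale is nonincreasing in mu, as in Proposition 1(iii).
assert scales == sorted(scales, reverse=True)
assert optimal_scale(3.0) == 2 and optimal_scale(1.0) == 3 and optimal_scale(0.3) == 4
```

In this example the incentive cost falls from 27 with two ratings towards 24 as the grid becomes arbitrarily fine, so a finer grid is worth its monitoring cost only when $\mu$ is small.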
In general, this is not an easy task because perturbations of cutoff $z$-values (which differ from the perturbation considered in Section 3.3) affect wages endogenously through the Lagrange multipliers of the incentive constraints.

See the appendices of Bloom and Van Reenen (2006, 2007) for survey questions regarding the fineness of the performance grids, e.g., "Each employee is given a red light (not performing), an amber light (doing well and meeting targets), a green light (consistently meeting targets, very high performer) and a blue light (high performer capable of promotion of up to two levels)," versus "rewards is based on an individual's commitment to the company measured by seniority."

Figure 1: Plot the optimal rating scale against $\mu$: entropy cost, $u(w) = \sqrt{w}$, $Z \sim U[-1/2, 1/2]$, $c = 1$, $K = 100$.

Each of the two agents $i = 1, 2$ earns a payoff $u_i(w_i) - c_i(a_i)$ from spending a nonnegative wage $w_i \ge 0$ and exerting high or low effort $a_i \in \{0, 1\}$. The function $u_i: \mathbb{R}_+ \to \mathbb{R}$ satisfies $u_i(0) = 0$, $u_i' > 0$ and $u_i'' < 0$, and $c_i(1) = c_i > c_i(0) = 0$. Each effort profile $a = a_1 a_2$ generates a probability space $(\Omega, \Sigma, P_a)$, where $\Omega$ is a finite-dimensional Euclidean space that comprises agents' performance data, $\Sigma$ is the Borel sigma-algebra on $\Omega$, and $P_a$ is the probability measure on $(\Omega, \Sigma)$. The $P_a$'s are assumed to be mutually absolutely continuous, and the probability density functions $p_a$ they induce are well-defined and everywhere positive.

In this new setting, a monitoring technology $\mathcal{P}$ can be any partition of $\Omega$ with at most $K$ cells that are all of positive measures, and a wage scheme $\mathbf{w}: \mathcal{P} \to \mathbb{R}^2_+$ maps each cell $A$ of $\mathcal{P}$ to a vector $\mathbf{w}(A) = (w_1(A), w_2(A))^\top$ of nonnegative wages. For any data point $\omega$, let $A(\omega)$ be the unique performance category that contains $\omega$ and let $\mathbf{w}(A(\omega))$ be the wage vector associated with $A(\omega)$. Time evolves as follows:
1. the principal commits to $\langle \mathcal{P}, \mathbf{w}(\cdot) \rangle$;
2. agent $i$ privately chooses $a_i \in \{0, 1\}$, $i = 1, 2$;
3. Nature draws $\omega$ from $\Omega$ according to $P_a$;
4. the monitoring technology outputs $A(\omega)$;
5. the principal pays $w_i(A(\omega))$ to agent $i = 1, 2$.

Write $\mathbf{1}$ for $(1, 1)^\top$ and define a vector-valued random variable $\mathbf{Z} = (Z_1, Z_2)^\top$ by
\[ Z_i(\omega) = 1 - \frac{p_{a_i = 0,\, a_{-i} = 1}(\omega)}{p_{\mathbf{1}}(\omega)} \quad \forall \omega \in \Omega,\ i = 1, 2. \]
Define the $z$-value of any set $A \in \Sigma$ of positive measure by $(z_1(A), z_2(A))^\top$, where
\[ z_i(A) = E[Z_i \mid A;\, a = \mathbf{1}] \quad \forall i = 1, 2. \]
A contract is incentive compatible for agent $i$ if
\[ \sum_{A \in \mathcal{P}} P_{\mathbf{1}}(A)\, u_i(w_i(A))\, z_i(A) \ge c_i, \tag{IC$_i$} \]
and it satisfies agent $i$'s limited liability constraint if
\[ w_i(A) \ge 0 \quad \forall A \in \mathcal{P}. \tag{LL$_i$} \]
An optimal contract minimizes the total implementation cost under the high effort profile, subject to agents' incentive compatibility constraints and limited liability constraints:
\[ \min_{\langle \mathcal{P}, \mathbf{w}(\cdot) \rangle} \sum_{A \in \mathcal{P}} P_{\mathbf{1}}(A) \sum_{i=1}^{2} w_i(A) + \mu \cdot H(\mathcal{P}, \mathbf{1}) \quad \text{s.t. (IC$_i$) and (LL$_i$)},\ i = 1, 2. \]

The next definition generalizes $Z$-convexity:

Definition 3. A set $A \in \Sigma$ is $\mathbf{Z}$-convex if the following holds for all $\omega', \omega'' \in A$ such that $\mathbf{Z}(\omega') \ne \mathbf{Z}(\omega'')$:
\[ \{\omega \in \Omega : \mathbf{Z}(\omega) = (1 - s) \cdot \mathbf{Z}(\omega') + s \cdot \mathbf{Z}(\omega'') \text{ for some } s \in (0, 1)\} \subset A. \]

The next two assumptions impose regularities on the principal's problem analogously to Assumptions 2 and 3:

Assumption 5. $\mathbf{Z}$ is distributed atomlessly on a connected set $\mathbf{Z}(\Omega)$ in $\mathbb{R}^2$ under $a = \mathbf{1}$.

Assumption 6. $\mathbf{Z}(\Omega)$ is a compact set in $\mathbb{R}^2$ with $\dim \mathbf{Z}(\Omega) = 2$.

The next theorems extend Theorems 1 and 2 to encompass multiple agents:

Theorem 3. Assume Assumptions 1, 5 and 6. Then any optimal monitoring technology comprises $\mathbf{Z}$-convex cells that constitute convex polygons in $\mathbb{R}^2$.

Theorem 4. An optimal incentive contract that induces high effort from both agents exists under Assumptions 1, 4, 5 and 6.

Proof. See Appendix A.2.
Proof sketch. The proof strategy developed in Section 3.3 is useful for handling vector-valued $z$-values and wages. As before, fix any $\epsilon > 0$, and take any subsets $A'_\epsilon$ and $A''_\epsilon$ of two distinct performance categories $A_j$ and $A_k$, respectively, such that $P(A'_\epsilon) = P(A''_\epsilon) = \epsilon$ and $\mathbf{z}(A'_\epsilon) = \mathbf{z}' \ne \mathbf{z}(A''_\epsilon) = \mathbf{z}''$ (Lemma 5 of Appendix A.2.1 proves existence of sets that satisfy weaker properties). After the perturbation described in Section 3.3, the principal's Lagrangian becomes (ignoring the (LL$_i$) constraints):
\[ \mathcal{L}(\epsilon) = \sum_n \pi_n \sum_i \Bigl[ w_{i,n}(\epsilon) - \lambda_i(\epsilon) \bigl( u_i(w_{i,n}(\epsilon))\, z_{i,n}(\epsilon) - c_i \bigr) \Bigr], \]
where $\pi_n$ denotes the probability of $A_n$ (equivalently, $A_n(\epsilon)$) under $a = \mathbf{1}$, $w_{i,n}(\epsilon)$ agent $i$'s optimal wage at $A_n(\epsilon)$ and $\lambda_i(\epsilon)$ the Lagrange multiplier associated with the (IC$_i$) constraint. Assuming differentiability, we obtain
\[ \mathcal{L}'(0) = -\sum_n \pi_n \cdot \mathbf{u}_n^\top \begin{pmatrix} \lambda_1(0) & 0 \\ 0 & \lambda_2(0) \end{pmatrix} \frac{d}{d\epsilon} \mathbf{z}_n(\epsilon) \Big|_{\epsilon = 0} = (\mathbf{u}_k - \mathbf{u}_j)^\top (\hat{\mathbf{z}}'' - \hat{\mathbf{z}}'), \]
where $\mathbf{u}_n = (u_1(w_{1,n}(0)), u_2(w_{2,n}(0)))^\top$ for all $n$ and
\[ \hat{\mathbf{z}} = \begin{pmatrix} \lambda_1(0) & 0 \\ 0 & \lambda_2(0) \end{pmatrix} \mathbf{z} \quad \text{for } \mathbf{z} = \mathbf{z}', \mathbf{z}''. \]
Since $\mathcal{L}'(0) \ge 0$ at an optimum, the assignment of $z$-values into performance categories must be "positive assortative," where the direction of sorting is given by the vector of agents' utilities. This implies $\mathbf{Z}$-convexity for the same reason as in Section 3.3.

Implications. Solving for the optimal convex polygons is computationally hard.
That said, note that the boundaries of convex polygons consist of straight line segments in $\mathbf{Z}(\Omega)$, which combined with Assumption 5 yields the following observations:

• any bi-partitional contract takes the form of either a team or a tournament and is fully captured by the intercept and slope of the straight line as depicted in Figure 2;

• contracts that evaluate and reward agents on an individual basis are fully determined by the individual performance cutoffs as depicted in Figure 3.

Figure 2: Bi-partitional contracts: team and tournament.

Figure 3: An individual incentive contract.

This section compares individual and group performance evaluations from the angle of monitoring cost. To obtain the sharpest insights, suppose that agents are technologically independent:

Assumption 7. There exist probability spaces $\{(\Omega_i, \Sigma_i, P_{i,a_i})\}_{i, a_i}$ as in Section 2 such that $(\Omega, \Sigma, P_a) = (\Omega_1 \times \Omega_2,\ \Sigma_1 \otimes \Sigma_2,\ P_{1,a_1} \times P_{2,a_2})$ for all $a \in \{0, 1\}^2$.

In the language of contract theory, Assumption 7 rules out any technological link (i.e., $\omega_i$ depends on $a_{-i}$) or common productivity shock (i.e., $\omega_1, \omega_2$ are correlated given $a$) between agents.

The next definition is standard:

Definition 4.
(i) $\mathcal{P}$ is an individual monitoring technology if for all $A \in \mathcal{P}$, there exist $A_1 \in \Sigma_1$ and $A_2 \in \Sigma_2$ such that $A = A_1 \times A_2$; otherwise $\mathcal{P}$ is a group monitoring technology;
(ii) Let $\mathcal{P}$ be any individual monitoring technology. Then $\mathbf{w}: \mathcal{P} \to \mathbb{R}^2_+$ is an individual wage scheme if $w_i(A_i \times A'_{-i}; \mathcal{P}) = w_i(A_i \times A''_{-i}; \mathcal{P})$ for all $i = 1, 2$ and $A_i \times A'_{-i},\ A_i \times A''_{-i} \in \mathcal{P}$; otherwise $\mathbf{w}: \mathcal{P} \to \mathbb{R}^2_+$ is a group wage scheme;
(iii) $\langle \mathcal{P}, \mathbf{w}: \mathcal{P} \to \mathbb{R}^2_+ \rangle$ is an individual incentive contract if $\mathcal{P}$ is an individual monitoring technology and $\mathbf{w}: \mathcal{P} \to \mathbb{R}^2_+$ is an individual wage scheme; otherwise it is a group incentive contract.
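Part (i) of the definition above can be illustrated on a toy example. The sketch below assumes a finite data space in which each agent's outcome is either "L" or "H"; this is a simplification made here for illustration, since the paper works with Euclidean data. The function names are hypothetical. A cell counts as individual when it equals the product of its two coordinate projections.

```python
from itertools import product


def is_product_cell(cell):
    """Return True if a set of (omega_1, omega_2) pairs equals A_1 x A_2,
    where A_1 and A_2 are its projections onto each agent's data."""
    proj_1 = {w1 for w1, _ in cell}
    proj_2 = {w2 for _, w2 in cell}
    return set(cell) == set(product(proj_1, proj_2))


def is_individual_technology(partition):
    """A partition is an individual monitoring technology (in the sense of
    Definition 4(i)) if every one of its cells is a product set."""
    return all(is_product_cell(cell) for cell in partition)


# Hypothetical finite data space: each agent's outcome is "L" or "H".
omega = set(product("LH", "LH"))

# Four product cells: each agent is rated separately.
individual = [{("L", "L")}, {("L", "H")}, {("H", "L")}, {("H", "H")}]

# Team-style bi-partition: the high rating is given only if both agents
# produce "H". The complementary cell is not a product set.
team = [{("H", "H")}, omega - {("H", "H")}]

print(is_individual_technology(individual))
print(is_individual_technology(team))
```

The team partition fails the check because its low-rating cell {LL, LH, HL} projects onto {L, H} for both agents, yet omits the pair HH.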
By definition, a group incentive contract either conducts group performance evaluations or pairs individual performance evaluations with group incentive pays. Under Assumption 7, the second option is sub-optimal by the sufficient statistics principle or Holmström (1982), thus reducing the comparison between individual and group incentive contracts to that of individual and group performance evaluations.

Let $I$ be the ratio between the minimal cost of implementing bi-partitional incentive contracts and that of implementing individual incentive contracts (the latter, by definition, have at least four performance categories). $I < 1$ means that bi-partitional contracts are the cheaper of the two to implement.

Corollary 2. Under Assumptions 1, 4(a), 5, 6 and 7, $I < 1$ when $\mu$ is large.

Beyond the case considered in Corollary 2, we can compute $I$ numerically based on the prior discussion about how to parameterize bi-partitional and individual incentive contracts. Figure 4 plots the solutions obtained in a special case.

Figure 4: Plot $I$ against $\mu$: entropy cost, $u_i(w) = \sqrt{w}$, $Z_i \sim U[-1/2, 1/2]$ and $c_i = 1$ for $i = 1, 2$.

In the future, it will be interesting to nail down the role of IT in Bloom and Van Reenen (2006, 2007), and to replicate these studies for recent advancements in data technologies.

In this section, suppose that the agent's action space $\mathcal{A}$ is a finite set, and that taking an action $a$ in $\mathcal{A}$ incurs a cost $c(a)$ to the agent and generates a probability space $(\Omega, \Sigma, P_a)$ as in Section 2. The principal wishes to induce the most costly action $a^*$, i.e., $c(a^*) > c(a)$ for all $a \in \mathcal{D} = \mathcal{A} - \{a^*\}$. For any deviation from $a^*$ to $a \in \mathcal{D}$, define a random variable $Z_a: \Omega \to \mathbb{R}$ by
\[ Z_a(\omega) = 1 - \frac{p_a(\omega)}{p_{a^*}(\omega)} \quad \forall \omega \in \Omega. \]
For any $a \in \mathcal{D}$ and set $A \in \Sigma$ of positive measure, define
\[ z_a(A) = E[Z_a \mid A;\, a^*]. \]
A contract is incentive compatible if for all $a \in \mathcal{D}$: $\sum_{A \in \mathcal{P}} P_{a^*}(A)\, u(w(A))\, z_a(A) \ge c(a^*) - c(a).$
(IC$_a$)

See the survey questions of Bloom and Van Reenen (2006, 2007) regarding the choices between individual and group evaluations, e.g., "employees are rewarded based on their individual contributions to the company," and "compensation is based on shift/plant-level outcomes." The former is regarded as an advanced but expensive managerial practice and is more prevalent among companies with better IT access, other things being equal.

An optimal incentive contract $\langle \mathcal{P}^*, w^*(\cdot) \rangle$ that induces $a^*$ solves
\[ \min_{\langle \mathcal{P}, w(\cdot) \rangle} \sum_{A \in \mathcal{P}} P_{a^*}(A)\, w(A) + \mu \cdot H(\mathcal{P}, a^*) \quad \text{s.t. (IC$_a$) } \forall a \in \mathcal{D} \text{ and (LL)}. \]
Write $\mathbf{Z}$ for $(Z_a)^\top_{a \in \mathcal{D}}$. For any $|\mathcal{D}|$-vector $\lambda = (\lambda_a)^\top_{a \in \mathcal{D}}$ in $\mathbb{R}^{|\mathcal{D}|}_+$, define a random variable $Z_\lambda: \Omega \to \mathbb{R}$ by
\[ Z_\lambda(\omega) = \lambda^\top \mathbf{Z}(\omega) \quad \forall \omega \in \Omega. \]

The next definition generalizes $Z$-convexity:

Definition 5. A set $A \in \Sigma$ is $Z_\lambda$-convex if the following holds for all $\omega', \omega'' \in A$ such that $Z_\lambda(\omega') \ne Z_\lambda(\omega'')$:
\[ \{\omega : Z_\lambda(\omega) = (1 - s) \cdot Z_\lambda(\omega') + s \cdot Z_\lambda(\omega'') \text{ for some } s \in (0, 1)\} \subset A. \]

The next theorems extend Theorems 1 and 2 to encompass multiple actions:

Theorem 5. Assume Assumption 1 and Assumption 3 for all $a \in \mathcal{D}$. Then for any optimal incentive contract $\langle \mathcal{P}^*, w^*(\cdot) \rangle$ that induces $a^*$, there exists $\lambda^* \in \mathbb{R}^{|\mathcal{D}|}_+$ with $\|\lambda^*\| > 0$ such that all cells of $\mathcal{P}^*$ are $Z_{\lambda^*}$-convex and can be labeled as $A_1, \cdots, A_N$ such that $w^*(A_1) < \cdots < w^*(A_N)$. Assume, in addition, Assumption 2 for all $a \in \mathcal{D}$. Then there exist $-\infty \le \hat{z}_0 < \cdots < \hat{z}_N < +\infty$ such that $A_n = \{\omega : Z_{\lambda^*}(\omega) \in [\hat{z}_{n-1}, \hat{z}_n)\}$ for $n = 1, \cdots, N$.

Theorem 6. Assume Assumptions 1 and 4, as well as Assumptions 2 and 3 for all $a \in \mathcal{D}$. Then an optimal incentive contract that induces $a^*$ exists.

Proof.
See Appendix A.3.

In the presence of multiple actions, each data point is associated with finitely many $z$-values, each corresponding to a deviation from $a^*$ that the agent can potentially commit. By establishing that the assignment of Lagrange multiplier-weighted $z$-values into wage categories is positive assortative, Theorem 5 relates the focus of data processing and analysis to the agent's endogenous tendencies to commit deviations. Intuitively, when $\lambda^*_a$ is large and hence the agent is tempted to commit deviation $a$, focus should be given to the information $Z_a$ that helps detect deviation $a$, and the performance rating should depend significantly on $Z_a$. The next section gives an application of this result.

$\|\cdot\|$ denotes the sup norm in the remainder of this paper.

A single agent can exert either high or low effort $a_i \in \{0, 1\}$ in each of the two tasks $i = 1, 2$, and each $a_i$ independently generates a probability space $(\Omega_i, \Sigma_i, P_{i,a_i})$ as in Section 2. The goal of a risk-neutral principal is to induce high effort in both tasks. Write $a = a_1 a_2$, $\omega = \omega_1 \omega_2$, $\mathcal{A} = \{00, 01, 10, 11\}$, $a^* = 11$ and $\mathcal{D} = \{00, 01, 10\}$. For any $i = 1, 2$ and $\omega_i \in \Omega_i$, define
\[ Z_i(\omega_i) = 1 - \frac{p_{i, a_i = 0}(\omega_i)}{p_{i, a_i = 1}(\omega_i)}, \]
where $p_{i,a_i}$ is the probability density function induced by $P_{i,a_i}$. For any $\omega \in \Omega_1 \times \Omega_2$ and $\lambda = (\lambda_{00}, \lambda_{01}, \lambda_{10})^\top \in \mathbb{R}^3_+$, define
\[ Z_\lambda(\omega) = (\lambda_{01} + \lambda_{00}) \cdot Z_1(\omega_1) + (\lambda_{10} + \lambda_{00}) \cdot Z_2(\omega_2) - \lambda_{00} \cdot Z_1(\omega_1) Z_2(\omega_2). \]

The next corollary is immediate from Theorem 5:

Corollary 3. Assume Assumption 1 and Assumption 3 for all $a \in \mathcal{D}$. Then for any optimal incentive contract $\langle \mathcal{P}^*, w^*(\cdot) \rangle$ that induces high effort in both tasks, there exists $\lambda^* \in \mathbb{R}^3_+$ with $\lambda^*_{01} + \lambda^*_{00},\ \lambda^*_{10} + \lambda^*_{00} > 0$ such that all cells of $\mathcal{P}^*$ are $Z_{\lambda^*}$-convex and can be labeled as $A_1, \cdots, A_N$ such that $w^*(A_1) < \cdots < w^*(A_N)$. Assume, in addition, Assumption 2 for all $a \in \mathcal{D}$. Then there exist $-\infty \le \hat{z}_0 < \cdots < \hat{z}_N < +\infty$ such that $A_n = \{\omega : Z_{\lambda^*}(\omega) \in [\hat{z}_{n-1}, \hat{z}_n)\}$ for $n = 1, \cdots, N$.
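The closed form of the weighted statistic in the two-task model can be checked against its definition as a multiplier-weighted sum of deviation-specific values. The sketch below is illustrative only. It assumes the labeling convention that deviation "01" means shirking task 1 only, "10" means shirking task 2 only, and "00" means shirking both, and it uses independence across tasks so that likelihood ratios multiply. The function names and the numerical inputs are hypothetical.

```python
def z_lambda_from_definition(lam, z1, z2):
    """Multiplier-weighted sum of the deviation-specific values Z_a.

    lam maps each deviation label ("00", "01", "10") to a nonnegative
    multiplier; z1 and z2 are the task-level values Z_1 and Z_2.
    """
    # Likelihood ratios p_{i,0} / p_{i,1} implied by Z_i = 1 - ratio.
    ratio_1, ratio_2 = 1.0 - z1, 1.0 - z2
    z_dev = {
        "01": 1.0 - ratio_1,            # shirk task 1 only
        "10": 1.0 - ratio_2,            # shirk task 2 only
        "00": 1.0 - ratio_1 * ratio_2,  # shirk both: ratios multiply
    }
    return sum(lam[a] * z_dev[a] for a in z_dev)


def z_lambda_closed_form(lam, z1, z2):
    """Closed form: linear in Z_1 and Z_2 minus a cross term."""
    return ((lam["01"] + lam["00"]) * z1
            + (lam["10"] + lam["00"]) * z2
            - lam["00"] * z1 * z2)


# Hypothetical multipliers and task-level values.
lam = {"00": 0.3, "01": 0.5, "10": 0.2}
gap = abs(z_lambda_from_definition(lam, 0.4, -0.7)
          - z_lambda_closed_form(lam, 0.4, -0.7))
print(gap < 1e-12)
```

The cross term appears only through the joint deviation "00", which is why its coefficient is the multiplier on that deviation alone.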
In a seminal paper, Holmström and Milgrom (1991) show that when the agent faces multiple tasks, over-incentivizing tasks that generate precise performance data may prevent the completion of tasks that generate noisy performance data. That analysis abstracts away from monitoring costs and focuses on the power of (linear) compensation schemes.

Corollary 3 delivers a different message: when it comes to allocating limited resources across the assessments of multiple task performances, the optimal allocation should reflect the agent's endogenous tendency to shirk each task. The usefulness of this result is illustrated by the next example:

Example 3. A cashier faces two tasks: to scan items and to project warmth to customers. A piece of performance data consists of the scanner data recorded by the point of sale (POS) system, as well as the feedback gathered from customers. By Corollary 3, the following ratio:
\[ R = \frac{\lambda^*_{01} + \lambda^*_{00}}{\lambda^*_{10} + \lambda^*_{00}} \]
captures how the principal should allocate limited resources across the assessments of skillfulness in scanning items and warmth. Intuitively, a small $R$ arises when the cashier is reluctant to project warmth to customers, in which case resources should be devoted to the assessment of warmth, and the final performance rating should depend significantly on such assessment.

We examine how optimal resource allocation varies with the precision of raw performance data. As in Holmström and Milgrom (1991), we assume that

• $\omega_i = a_i + \xi_i$ for $i = 1, 2$, where the $\xi_i$'s are independent normal random variables with mean zero and variances $\sigma_i^2$;

• the cashier has CARA utility of consumption $u(w) = 1 - \exp(-\gamma w)$.

Unlike Holmström and Milgrom (1991), we do not confine ourselves to linear wage schemes.

In the case where the monitoring cost is an increasing function of the rating scale, we compute $R$ for different values of $\sigma_1$, holding $\sigma_2 = 1$ and $|\mathcal{P}| = 2$ fixed. Our findings are reported in Figure 5.
Assuming that our parameter choices are reasonable ones, we arrive at the following conclusion: as skillfulness becomes easier to measure, thanks to the advent of high quality scanner data, the cashier becomes more afraid to shirk the scanning task and less so about projecting coldness to customers; to correct the cashier's incentive, resources should be shifted towards the processing and analysis of customer feedback and away from that of scanner data. In the future, one can test this prediction by running field experiments such as that of Bloom et al. (2013). For example, one can randomize the quality of scanner data among otherwise similar stores and examine the effect on resource allocation between scanner data and customer feedback.

Figure 5: Plot $R$ against $\sigma_1$: $H(\mathcal{P}, a) = f(|\mathcal{P}|)$, $|\mathcal{P}| = 2$; CARA utility $u(w) = 1 - \exp(-\gamma w)$; $c(00) = 0$; $\xi_1$ and $\xi_2$ are normally distributed with mean zero and $\sigma_2 = 1$.

We conclude by posing open questions. First, our work is broadly related to the burgeoning literature on information design (see, e.g., Bergemann and Morris (2019) for a survey), and we hope it inspires new research questions such as how to conduct costly yet flexible monitoring in long-term employment relationships. Second, our theory may guide investigations into empirical issues such as how advancements in big data technologies have affected the design and implementation of monitoring technologies, and whether they can partially explain the heterogeneity in the internal organizations of otherwise similar firms. We hope that someone, maybe ourselves, will pursue these research agendas in the future.

A Omitted Proofs

A.1 Proofs of Section 3

In this appendix, write any $N$-partitional contract $\langle \mathcal{P}, w(\cdot) \rangle$ as the corresponding tuple $\langle A_n, \pi_n, z_n, w_n \rangle_{n=1}^N$, where $A_n$ is a generic cell of $\mathcal{P}$, $\pi_n = P(A_n)$, $z_n = z(A_n)$ and $w_n = w(A_n)$. Assume w.l.o.g.
that $z_1 \le \cdots \le z_N$.

A.1.1 Useful Lemmas

Proof of Lemma 1

Proof. The wage-minimization problem for given monitoring technology $\langle A_n, \pi_n, z_n \rangle_{n=1}^N$ as in Lemma 1 is
\[ \min_{\langle \tilde{w}_n \rangle} \sum_{n=1}^N \pi_n \tilde{w}_n - \lambda \left( \sum_{n=1}^N \pi_n u(\tilde{w}_n) z_n - c \right) - \sum_{n=1}^N \eta_n \tilde{w}_n, \]
where $\lambda$ and $\eta_n$ denote the Lagrange multipliers associated with the (IC) constraint and the (LL) constraint at $\tilde{w}_n$, respectively. Differentiating the objective function with respect to $\tilde{w}_n$ and setting the result equal to zero yields
\[ \lambda z_n u'(w_n) = 1 - \eta_n / \pi_n, \]
implying that $u'(w_n) = 1/(\lambda z_n)$ if and only if $w_n > 0$.

Proof of Lemma 2

Proof. Fix any optimal incentive contract that induces high effort from the agent and let $\langle A_n, \pi_n, z_n, w_n \rangle_{n=1}^N$ be the corresponding tuple. Note that $N \ge 2$. By Assumption 1(b), if $w_j = w_k$ for some $j \ne k$, then merging $A_j$ and $A_k$ has no effect on the incentive cost but strictly reduces the monitoring cost, which contradicts the optimality of the original contract. Then from Lemma 1 and the assumption $z_1 \le \cdots \le z_N$, it follows that $0 \le w_1 < \cdots < w_N$ and $z_1 < \cdots < z_N$. In particular, we must have $z_1 < \sum_{n=1}^N \pi_n z_n = 0$. This implies $w_1 = 0$, because otherwise letting $w_1 = 0$ reduces the expected wage and relaxes the (IC) constraint while keeping the (LL) constraint satisfied. Finally, combining $w_n > 0$ for $n \ge 2$ with Lemma 1 yields $z_n > 0$ for $n \ge 2$.

Lemma 3. For all $A \in \Sigma$ such that $P(A) > 0$ and $\epsilon \in (0, P(A)]$, there exists $A_\epsilon \subset A$ such that $P(A_\epsilon) = \epsilon$ and $z(A_\epsilon) = z(A)$.

Proof. Let $A$ be as above. Since $P$ admits a density, it follows that for all $t \in (0, P(A)]$, there exists $B_t \subset A$ such that $P(B_t) = t$ and $Z(\omega') \le Z(\omega)$ for all $\omega \in B_t$ and $\omega' \in A \setminus B_t$. Likewise, there exists $C_t \subset A$ such that $P(C_t) = t$ and $Z(\omega') \ge Z(\omega)$ for all $\omega \in C_t$ and $\omega' \in A \setminus C_t$.
For $t = 0$ define $B_0 = C_0 = \emptyset$.

Let $\epsilon$ be as above. Consider $B_t \cup C_{\epsilon - t}$, $t \in [0, \epsilon]$. Since $z(B_t) \ge z(A)$ and $z(C_{\epsilon - t}) \le z(A)$ for all $t \in (0, \epsilon)$ and $z(B_t \cup C_{\epsilon - t})$ is continuous in $t$ (because $P$ admits a density), there exists $t \in [0, \epsilon]$ such that $z(B_t \cup C_{\epsilon - t}) = z(A)$. Meanwhile $P(B_t \cup C_{\epsilon - t}) = \epsilon$ by construction, so let $A_\epsilon = B_t \cup C_{\epsilon - t}$ and we are done.

A.1.2 Proof of Theorem 1

Proof. Take any optimal incentive contract that induces high effort from the agent and let $\langle A_n, \pi_n, z_n, w_n \rangle_{n=1}^N$ be the corresponding tuple. Suppose, to the contrary, that some $A_j$ is not $Z$-convex. By Definition 1, there exist $A', A'' \subset A_j$ and $\tilde{A} \subset A_k$, $k \ne j$ such that (i) $P(A')$, $P(A'')$, $P(\tilde{A}) > 0$, and (ii) $\tilde{z} = (1 - s) z' + s z''$, where $z' := z(A') \ne z'' := z(A'')$, $\tilde{z} := z(\tilde{A})$ and $s \in (0, 1)$. By Lemma 3, for any $\epsilon \in (0, \min\{P(A'), P(A''), P(\tilde{A})\})$, there exist $A'_\epsilon \subset A'$, $A''_\epsilon \subset A''$ and $\tilde{A}_\epsilon \subset \tilde{A}$ such that (i) $P(A'_\epsilon) = P(A''_\epsilon) = P(\tilde{A}_\epsilon) = \epsilon$, and (ii) $z(A'_\epsilon) = z'$, $z(A''_\epsilon) = z''$ and $z(\tilde{A}_\epsilon) = \tilde{z}$.

Consider two perturbations to the monitoring technology: (a) move $A'_\epsilon$ to $A_k$ and $\tilde{A}_\epsilon$ to $A_j$; (b) move $\tilde{A}_\epsilon$ to $A_j$ and $A''_\epsilon$ to $A_k$. By construction, neither perturbation affects the probability distribution of the output signal under high effort and hence the monitoring cost. Below we demonstrate that one of them strictly reduces the incentive cost compared to the original (optimal) contract.
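The bookkeeping behind perturbation (a) can be checked with a small numerical sketch. The numbers below are hypothetical and chosen only so that the probability-weighted average of the $z$-values is zero; the function name is likewise an assumption made for illustration. Swapping two subsets of equal mass leaves each cell's probability unchanged and shifts the two cells' $z$-values in opposite directions.

```python
def swap_update(pi_j, z_j, pi_k, z_k, eps, z_leaving_j, z_entering_j):
    """Swap a mass-eps subset of cell j (mean z_leaving_j) with a mass-eps
    subset of cell k (mean z_entering_j).

    Returns the cells' new z-values. The cell probabilities pi_j and pi_k
    are unchanged because equal masses move in each direction.
    """
    new_z_j = (pi_j * z_j - eps * z_leaving_j + eps * z_entering_j) / pi_j
    new_z_k = (pi_k * z_k + eps * z_leaving_j - eps * z_entering_j) / pi_k
    return new_z_j, new_z_k


# Hypothetical two-cell example with probability-weighted mean z equal to 0.
pi_j, z_j = 0.5, 0.1
pi_k, z_k = 0.5, -0.1
z_lo, z_hi, s = -0.2, 0.6, 0.25          # z', z'' and the mixing weight s
z_mid = (1 - s) * z_lo + s * z_hi        # z-tilde, lies between z' and z''
eps = 0.05

# Perturbation (a): the subset with mean z' leaves cell j, and the subset
# with mean z-tilde enters it.
new_z_j, new_z_k = swap_update(pi_j, z_j, pi_k, z_k, eps, z_lo, z_mid)

# Closed forms: z_j + s (z'' - z') eps / pi_j and z_k - s (z'' - z') eps / pi_k.
closed_j = z_j + s * (z_hi - z_lo) * eps / pi_j
closed_k = z_k - s * (z_hi - z_lo) * eps / pi_k
print(abs(new_z_j - closed_j) < 1e-12, abs(new_z_k - closed_k) < 1e-12)
```

The probability-weighted mean of the $z$-values remains zero after the swap, as it must, since the swap only relabels data points across cells.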
Perturbation (a). Let $\langle A_n(\epsilon), \pi_n, z_n(\epsilon) \rangle_{n=1}^N$ be the tuple associated with the monitoring technology after perturbation (a). By construction, $A_j(\epsilon) = (A_j \cup \tilde{A}_\epsilon) \setminus A'_\epsilon$, so
\[ z_j(\epsilon) = \frac{\pi_j z_j - \epsilon z' + \epsilon \tilde{z}}{\pi_j} = z_j + \frac{s (z'' - z')}{\pi_j}\, \epsilon. \]
Likewise, $A_k(\epsilon) = (A_k \cup A'_\epsilon) \setminus \tilde{A}_\epsilon$ and $A_n(\epsilon) = A_n$ for $n \ne j, k$, and similar algebraic manipulation as above yields
\[ \begin{aligned} z_j(\epsilon) &= z_j + \frac{s (z'' - z')}{\pi_j}\, \epsilon, \\ z_k(\epsilon) &= z_k - \frac{s (z'' - z')}{\pi_k}\, \epsilon, \\ z_n(\epsilon) &= z_n \quad \forall n \ne j, k. \end{aligned} \tag{A.1} \]
Consider wage profile $\langle w_n(\epsilon) \rangle_{n=1}^N$ such that $w_1(\epsilon) = 0$ and the (IC) constraint remains binding after the perturbation, i.e.,
\[ \sum_{n=1}^N \pi_n u(w_n(\epsilon)) z_n(\epsilon) = \sum_{n=1}^N \pi_n u(w_n) z_n = c. \tag{A.2} \]
A close inspection of Equations (A.1) and (A.2) reveals the existence of $M > 0$ independent of $\epsilon$ such that when $\epsilon$ is small, there exist wage profiles as above that satisfy $|w_n(\epsilon) - w_n| < M \epsilon$ for all $n$ and hence the (LL) constraint by Lemma 2.

With a slight abuse of notation, write $\dot{w}_n(\epsilon) = (w_n(\epsilon) - w_n)/\epsilon$ and $\dot{z}_n(\epsilon) = (z_n(\epsilon) - z_n)/\epsilon$, and note that $\dot{w}_1(\epsilon) = 0$. When $\epsilon$ is small, expanding Equation (A.2) using the twice-differentiability of $u(\cdot)$ and $|w_n(\epsilon) - w_n| \sim O(\epsilon)$ yields
\[ \sum_{n=1}^N \pi_n u(w_n) z_n = \sum_{n=1}^N \pi_n \bigl( u(w_n) + u'(w_n) \cdot \dot{w}_n(\epsilon) \cdot \epsilon + O(\epsilon^2) \bigr) \bigl( z_n + \dot{z}_n(\epsilon) \cdot \epsilon \bigr). \]
Multiplying the above equation by the Lagrange multiplier $\lambda > 0$ and rearranging yields
\[ \sum_{n=1}^N \pi_n \cdot u'(w_n) \cdot \lambda z_n \cdot \dot{w}_n(\epsilon) = -\lambda \sum_{n=1}^N u(w_n) \cdot \pi_n \dot{z}_n(\epsilon) + O(\epsilon), \]
and simplifying using $\dot{w}_1(\epsilon) = 0$, $u'(w_n) = 1/(\lambda z_n)$ for $n \ge 2$ (Lemma 1) and Equation (A.1) yields
\[ \sum_{n=1}^N \pi_n \dot{w}_n(\epsilon) = s \bigl( u(w_k) - u(w_j) \bigr) (\lambda z'' - \lambda z') + O(\epsilon). \tag{A.3} \]

Perturbation (b). Repeating the above argument for perturbation (b) yields
\[ \sum_{n=1}^N \pi_n \dot{w}_n(\epsilon) = -\lambda (1 - s) \bigl( u(w_k) - u(w_j) \bigr) (z'' - z') + O(\epsilon). \tag{A.4} \]

Since $u(w_j) \ne u(w_k)$ (Lemma 2), $z' \ne z''$ (by assumption) and $\lambda > 0$, it follows that the right-hand side of either Equation (A.3) or (A.4) is strictly negative when $\epsilon$ is small. Thus for either perturbation (a) or (b), we can construct a wage profile that incurs a lower incentive cost than the original optimal contract, and this leads to a contradiction.

To be precise, recall that $u(w_n), z_n > 0$ for $n \ge 2$. When $\epsilon$ is small, $z_n(\epsilon) > 0$ for $n \ge 2$, so solving $\sum_{n=2}^N \pi_n u(x_n) z_n(\epsilon) = \sum_{n=2}^N \pi_n u(w_n) z_n$ yields wage profiles as above.

Note that we do not assume differentiability of $w_n(\epsilon)$ or $z_n(\epsilon)$ with respect to $\epsilon$. The same disclaimer applies to the remainder of this paper.

A.1.3 Proof of Theorem 2

Proof. By Theorem 1, any optimal monitoring technology with at most $N \in \{2, \cdots, K\}$ cells is fully characterized by $N - 1$ cutoff $z$-values $\hat{z}_1, \cdots, \hat{z}_{N-1}$ satisfying $\min Z(\Omega) \le \hat{z}_1 \le \cdots \le \hat{z}_{N-1} \le \max Z(\Omega)$. Write $\hat{\mathbf{z}} = (\hat{z}_1, \cdots, \hat{z}_{N-1})^\top$.
Define
\[ \mathcal{Z}_N = \{\hat{\mathbf{z}} : \min Z(\Omega) \le \hat{z}_1 \le \cdots \le \hat{z}_{N-1} \le \max Z(\Omega)\}, \]
equip $\mathcal{Z}_N$ with the sup norm $\|\cdot\|$, and note that $\mathcal{Z}_N$ is compact by Assumption 3. Let $W(\hat{\mathbf{z}})$ be the minimal incentive cost for inducing high effort from the agent under the monitoring technology formed by $\hat{\mathbf{z}}$. Note that $W(\hat{\mathbf{z}})$ exists and is finite if and only if $\min Z(\Omega) < \hat{z}_n < \max Z(\Omega)$ for some $n$, because then $z(A) \not\equiv 0$ across the cells $A$ formed under $\hat{\mathbf{z}}$, so $W(\hat{\mathbf{z}})$ can be solved by applying Lemma 1. We proceed in two steps.

Step 1. Show that $W(\hat{\mathbf{z}})$ is continuous in $\hat{\mathbf{z}}$ for any given $N \in \{2, \cdots, K\}$.

Fix any $\hat{\mathbf{z}} \in \mathcal{Z}_N$ such that $W(\hat{\mathbf{z}})$ is finite. W.l.o.g. consider the case where the $\hat{z}_n$'s are all distinct. For sufficiently small $\delta > 0$, let $\hat{\mathbf{z}}^\delta$ be any element of $\mathcal{Z}_N$ such that $\|\hat{\mathbf{z}}^\delta - \hat{\mathbf{z}}\| < \delta$. Let $\pi_n$ and $z_n$ (resp. $\pi^\delta_n$ and $z^\delta_n$) denote the probability (under $a = 1$) and $z$-value of $A_n = \{\omega : Z(\omega) \in [\hat{z}_{n-1}, \hat{z}_n)\}$ (resp. $A^\delta_n = \{\omega : Z(\omega) \in [\hat{z}^\delta_{n-1}, \hat{z}^\delta_n)\}$), respectively. Let $w_n$ denote the optimal wage at $A_n$.

Fix any $\epsilon > 0$, and consider the wage profile that pays $w_n + \epsilon$ at $A^\delta_n$ if $z^\delta_n > 0$ and $w_n$ otherwise. By construction, this wage profile satisfies the (LL) constraint. Under Assumptions 2 and 3, it satisfies the (IC) constraint when $\delta$ is sufficiently small:
\[ \lim_{\delta \to 0} \sum_n \pi^\delta_n\, u\bigl( w_n + 1_{z^\delta_n > 0} \cdot \epsilon \bigr)\, z^\delta_n = \sum_n \pi_n\, u\bigl( w_n + 1_{z_n > 0} \cdot \epsilon \bigr)\, z_n > c, \]
Inaddition, since lim δ → (cid:88) n π δn (cid:0) w n + 1 z δn > · (cid:15) (cid:1) = (cid:88) n π n ( w n + 1 z n > · (cid:15) ) , it follows that when δ is sufficiently small, W (cid:0)(cid:98) z δ (cid:1) − W ( (cid:98) z ) ≤ (cid:88) n π δn (cid:0) w n + 1 z δn > · (cid:15) (cid:1) − (cid:88) n π n w n < (cid:15), where the first inequality holds because the constructed wage profile is not necessarilyoptimal under (cid:98) z δ . Finally, interchanging the roles between (cid:98) z and (cid:98) z δ in the abovederivation yields W ( (cid:98) z ) − W (cid:0)(cid:98) z δ (cid:1) < (cid:15) , implying that (cid:12)(cid:12) W (cid:0)(cid:98) z δ (cid:1) − W ( (cid:98) z ) (cid:12)(cid:12) < (cid:15) when δ issufficiently small. Step 2 Under Assumption 4(a), the following quantity: W N := min (cid:98) z ∈Z N W ( (cid:98) z )exists and is finite for all N ∈ { , · · · , K } by Step 1 and the compactness of Z N . Let m N denote the minimal rating scale attained by W N . Solvingmin ≤ N ≤ K W N + µ · f ( m N )yields the solution(s) to the principal’s problem.Under Assumption 4(b), the principal’s problem can be written as follows:min (cid:98) z ∈Z K W ( (cid:98) z ) + µ · h ( π ( (cid:98) z )) , where π ( (cid:98) z ) is the probability vector formed under (cid:98) z and is clearly continuous in (cid:98) z .The existence of solution(s) then follows from Step 1 and the compactness of Z K . A.2 Proof of Section 4 In this appendix, write any N -partitional contract (cid:104)P , w ( · ) (cid:105) as the correspondingtuple (cid:104) A n , π n , z n , w n (cid:105) Nn =1 , where A n is a generic cell of P , π n = P ( A n ), z n =32 z ,n , z ,n ) (cid:62) := ( z ( A n ) , z ( A n )) (cid:62) and w n = ( w ,n , w ,n ) (cid:62) := ( w ( A n ) , w ( A n )) (cid:62) . A.2.1 Useful Lemmas The next lemma generalizes Lemmas 1 and 2 to encompass multiple agents: Lemma 4. Assume Assumption 1. 
Then under any optimal incentive contract that induces high effort from both agents, (i) there exist $\lambda_1, \lambda_2 > 0$ such that $u_i'(w_{i,n}) = 1/(\lambda_i z_{i,n})$ if and only if $w_{i,n} > 0$; (ii) $\mathbf{w}_j \ne \mathbf{w}_k$ for all $j \ne k$.

Proof. The wage-minimization problem for given monitoring technology $\langle A_n, \pi_n, \mathbf{z}_n \rangle_{n=1}^N$ is
\[ \min_{\langle \tilde{w}_{i,n} \rangle} \sum_{i,n} \pi_n \tilde{w}_{i,n} - \sum_i \lambda_i \left( \sum_n \pi_n u_i(\tilde{w}_{i,n}) z_{i,n} - c_i \right) - \sum_{i,n} \eta_{i,n} \tilde{w}_{i,n}, \]
where $\lambda_i$ and $\eta_{i,n}$ denote the Lagrange multipliers associated with the (IC$_i$) constraint and the (LL$_i$) constraint at $\tilde{w}_{i,n}$, respectively. Differentiating the objective function with respect to $\tilde{w}_{i,n}$ yields the first-order condition in Part (i). The proof of Part (ii) is the same as that of Lemma 2 and is therefore omitted.

The next lemma plays a role analogous to that of Lemma 3:

Lemma 5. Assume Assumption 6. Fix any $\delta > 0$ and any $A \in \Sigma$ such that $P(A) > 0$. Then for all $\epsilon \in (0, P(A)]$, there exists $A_\epsilon \subset A$ such that $P(A_\epsilon) = \epsilon$ and $\|\mathbf{z}(A_\epsilon) - \mathbf{z}(A)\| < \delta$.

Proof. With a slight abuse of notation, let $\mathcal{P}$ be any finite partition of $\Omega$ such that every $B \in \mathcal{P}$ is measurable and $\|\mathbf{Z}(\omega) - \mathbf{Z}(\omega')\| < \delta$ for all $\omega, \omega' \in B$. $\mathcal{P}$ exists because $P$ admits a density and $\mathbf{Z}(\Omega)$ is a compact set in $\mathbb{R}^2$. Define $\mathcal{P}_+ = \{B \in \mathcal{P} : P(A \cap B) > 0\}$ and $\mathcal{P}_0 = \{B \in \mathcal{P} : P(A \cap B) = 0\}$, which are both finite. Note that $\sum_{B \in \mathcal{P}_0} P(A \cap B) = 0$, $\sum_{B \in \mathcal{P}_+} P(A \cap B) = P(A)$ and $\mathbf{z}(A) = \sum_{B \in \mathcal{P}_+} \frac{P(A \cap B)}{P(A)}\, \mathbf{z}(A \cap B)$.

Since $P$ admits a density, it follows that for all $B \in \mathcal{P}_+$, there exists $C_B \subset A \cap B$ such that $P(C_B) = P(A \cap B)\, \epsilon / P(A)$. Also note that $\|\mathbf{z}(C_B) - \mathbf{z}(A \cap B)\| < \delta$ by construction. Let $A_\epsilon = \cup_{B \in \mathcal{P}_+} C_B$.
Then $P(A_\epsilon) = \sum_{B \in \mathcal{P}_+} P(A \cap B)\, \epsilon / P(A) = \epsilon$ and
\[ \|\mathbf{z}(A_\epsilon) - \mathbf{z}(A)\| = \Bigl\| \sum_{B \in \mathcal{P}_+} \frac{P(A \cap B)}{P(A)} \bigl( \mathbf{z}(C_B) - \mathbf{z}(A \cap B) \bigr) \Bigr\| \le \sum_{B \in \mathcal{P}_+} \frac{P(A \cap B)}{P(A)} \bigl\| \mathbf{z}(C_B) - \mathbf{z}(A \cap B) \bigr\| < \delta. \]

A.2.2 Proof of Theorem 3

Proof. Take any optimal incentive contract that induces high effort from both agents and let $\langle A_n, \pi_n, \mathbf{z}_n, \mathbf{w}_n \rangle_{n=1}^N$ be the corresponding tuple. Suppose, to the contrary, that some $A_j$ is not $\mathbf{Z}$-convex. By definition, there exist $A', A'' \subset A_j$ and $\tilde{A} \subset A_k$, $k \ne j$ such that (i) $P(A')$, $P(A'')$, $P(\tilde{A}) > 0$, and (ii) $\tilde{\mathbf{z}} = (1 - s)\, \mathbf{z}' + s\, \mathbf{z}''$ where $\mathbf{z}' := \mathbf{z}(A') \ne \mathbf{z}'' := \mathbf{z}(A'')$, $\tilde{\mathbf{z}} := \mathbf{z}(\tilde{A})$ and $s \in (0, 1)$. By Lemma 5, for any $\delta > 0$ and $\epsilon \in (0, \min\{P(A'), P(A''), P(\tilde{A})\})$, there exist $A'_\epsilon \subset A'$, $A''_\epsilon \subset A''$ and $\tilde{A}_\epsilon \subset \tilde{A}$ such that (i) $P(A'_\epsilon) = P(A''_\epsilon) = P(\tilde{A}_\epsilon) = \epsilon$, and (ii) $\|\mathbf{z}(A'_\epsilon) - \mathbf{z}'\|$, $\|\mathbf{z}(A''_\epsilon) - \mathbf{z}''\|$, $\|\mathbf{z}(\tilde{A}_\epsilon) - \tilde{\mathbf{z}}\| < \delta$.

Consider two perturbations to the monitoring technology: (a) move $A'_\epsilon$ to $A_k$ and $\tilde{A}_\epsilon$ to $A_j$; (b) move $\tilde{A}_\epsilon$ to $A_j$ and $A''_\epsilon$ to $A_k$. By Assumption 1, neither perturbation affects the probability distribution of the output signal under $a = \mathbf{1}$ and hence the monitoring cost. Below we demonstrate that one of them strictly reduces the incentive cost compared to the original optimal contract.
Perturbation (a) Let $\langle A_n(\epsilon), \pi_n, \mathbf{z}_n(\epsilon) \rangle_{n=1}^N$ denote the tuple associated with the monitoring technology after perturbation (a), where $A_j(\epsilon) = (A_j \cup \tilde{A}_\epsilon) \setminus A'_\epsilon$, $A_k(\epsilon) = (A_k \cup A'_\epsilon) \setminus \tilde{A}_\epsilon$ and $A_n(\epsilon) = A_n$ for $n \neq j, k$. Straightforward algebra shows that
$$\mathbf{z}_j(\epsilon) = \mathbf{z}_j + \frac{\mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon)}{\pi_j}\,\epsilon, \quad \mathbf{z}_k(\epsilon) = \mathbf{z}_k - \frac{\mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon)}{\pi_k}\,\epsilon, \quad \mathbf{z}_n(\epsilon) = \mathbf{z}_n \ \ \forall n \neq j, k, \tag{A.5}$$
and that
$$\big\| \mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon) - (\tilde{\mathbf{z}} - \mathbf{z}') \big\| \leq \big\| \mathbf{z}(\tilde{A}_\epsilon) - \tilde{\mathbf{z}} \big\| + \big\| \mathbf{z}(A'_\epsilon) - \mathbf{z}' \big\| < \min\Big\{ 2\delta,\ 4 \max_{\omega \in \Omega} \|\mathbf{Z}(\omega)\| \Big\}. \tag{A.6}$$

Define $B_i = \{n : w_{i,n} = 0\}$ for $i = 1, 2$. Consider a wage profile $\langle \mathbf{w}_n(\epsilon) \rangle_{n=1}^N$ such that for $i = 1, 2$: (1) $w_{i,n}(\epsilon) = w_{i,n} = 0$ for $n \in B_i$; (2) agent $i$'s incentive compatibility constraint remains binding after perturbation (a), i.e.,
$$\sum_{n=1}^N \pi_n u_i(w_{i,n}(\epsilon))\, z_{i,n}(\epsilon) = \sum_{n=1}^N \pi_n u_i(w_{i,n})\, z_{i,n} = c_i. \tag{A.7}$$
A close inspection of Equations (A.5)-(A.7) reveals the existence of $M > 0$, independent of $\epsilon$ and $\delta$, such that when $\epsilon$ is sufficiently small, there exist wage profiles as above that satisfy $\|\mathbf{w}_n(\epsilon) - \mathbf{w}_n\| < M\epsilon$ for all $n$ and hence the (LL$_i$) constraints.

With a slight abuse of notation, write $\dot{\mathbf{w}}_n(\epsilon) = (\mathbf{w}_n(\epsilon) - \mathbf{w}_n)/\epsilon$ and $\dot{\mathbf{z}}_n(\epsilon) = (\mathbf{z}_n(\epsilon) - \mathbf{z}_n)/\epsilon$, and note that $\dot{w}_{i,n}(\epsilon) = 0$ for $i = 1, 2$ and $n \in B_i$.
When $\epsilon$ is small, expanding Equation (A.7) using the twice-differentiability of $u_i(\cdot)$ and $|w_{i,n}(\epsilon) - w_{i,n}| \sim O(\epsilon)$, and multiplying the result by the Lagrange multiplier $\lambda_i > 0$ of the (IC$_i$) constraint prior to the perturbation, yields
$$\sum_{n=1}^N \pi_n \cdot u_i'(w_{i,n}) \cdot \lambda_i z_{i,n} \cdot \dot{w}_{i,n}(\epsilon) = -\lambda_i \sum_{n=1}^N u_i(w_{i,n}) \cdot \pi_n \dot{z}_{i,n}(\epsilon) + O(\epsilon).$$
Simplifying using $\dot{w}_{i,n}(\epsilon) = 0$ if $n \in B_i$, $u_i'(w_{i,n}) = 1/(\lambda_i z_{i,n})$ if $n \notin B_i$ (Lemma 4) and Equation (A.5) yields
$$\sum_{i,n} \pi_n \dot{w}_{i,n} = (\mathbf{u}_k - \mathbf{u}_j)^\top \Lambda \big( \mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon) \big) + O(\epsilon),$$
where $\mathbf{u}_n = (u_1(w_{1,n}), u_2(w_{2,n}))^\top$ for $n = k, j$ and $\Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$. Further simplifying using Equation (A.6) and $\tilde{\mathbf{z}} = (1-s)\mathbf{z}' + s\mathbf{z}''$ yields the following when $\delta$ is small:
$$\begin{aligned} \sum_{i,n} \pi_n \dot{w}_{i,n} &= (\mathbf{u}_k - \mathbf{u}_j)^\top \Lambda (\tilde{\mathbf{z}} - \mathbf{z}') + O(\epsilon) + (\mathbf{u}_k - \mathbf{u}_j)^\top \Lambda \big( \mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon) - (\tilde{\mathbf{z}} - \mathbf{z}') \big) \\ &= s\, (\mathbf{u}_k - \mathbf{u}_j)^\top \Lambda (\mathbf{z}'' - \mathbf{z}') + O(\epsilon) + O(\delta). \end{aligned} \tag{A.8}$$

Perturbation (b) Repeating the above argument for perturbation (b) yields
$$\sum_{i,n} \pi_n \dot{w}_{i,n} = -(1-s)\, (\mathbf{u}_k - \mathbf{u}_j)^\top \Lambda (\mathbf{z}'' - \mathbf{z}') + O(\epsilon) + O(\delta). \tag{A.9}$$

Consider two cases:

Case 1 $(\mathbf{u}_k - \mathbf{u}_j)^\top \Lambda (\mathbf{z}'' - \mathbf{z}') \neq 0$. In this case, the right-hand sides of Equations (A.8) and (A.9) have opposite signs when $\epsilon$ and $\delta$ are sufficiently small, and the remainder of the proof is the same as that of Theorem 1.

Case 2 $(\mathbf{u}_k - \mathbf{u}_j)^\top \Lambda (\mathbf{z}'' - \mathbf{z}') = 0$. In this case, note that $(\mathbf{u}_k - \mathbf{u}_j)^\top \Lambda \neq \mathbf{0}^\top$ by Lemma 4, where $\mathbf{0}$ denotes the 2-vector of zeros.
Then from Assumption 5 ($\mathbf{Z}$ is distributed atomlessly on a connected set), there exist $B' \subset A'$, $B'' \subset A''$ and $\tilde{B} \subset \tilde{A}$ such that $P(B'), P(B''), P(\tilde{B}) > 0$, $\mathbf{z}(\tilde{B}) = (1-s')\mathbf{z}(B') + s'\mathbf{z}(B'')$ for some $s' \in (0,1)$, and $(\mathbf{u}_k - \mathbf{u}_j)^\top \Lambda (\mathbf{z}(B'') - \mathbf{z}(B')) \neq 0$. Replacing $A'$, $A''$ and $\tilde{A}$ with $B'$, $B''$ and $\tilde{B}$, respectively, in the above argument gives the desired result.

A.2.3 Proof of Theorem 4

Proof. By Theorem 3, any optimal monitoring technology with at most $N \in \{2, \cdots, K\}$ cells is fully characterized by (1) a finite number $q_N$ of vertices $\mathbf{z}_1, \cdots, \mathbf{z}_{q_N}$ in $\mathbf{Z}(\Omega)$, and (2) a $q_N \times q_N$ adjacency matrix $\mathbf{M}$ whose $lm$'th entry equals 1 if $\mathbf{z}_l$ and $\mathbf{z}_m$ are connected by a line segment and 0 otherwise. By definition, $\mathbf{M}$ is symmetric and hence is determined by its upper-triangle entries, each of which can be either 0 or 1. Thus $\mathbf{M}$ belongs to $\mathcal{M}_N := \{0,1\}^{q_N \times (q_N - 1)/2}$, which is a finite set.

Write $\vec{\mathbf{z}}$ for $(\mathbf{z}_1, \cdots, \mathbf{z}_{q_N})^\top$. For any $N \in \{2, \cdots, K\}$ and adjacency matrix $\mathbf{M} \in \mathcal{M}_N$, define
$$\mathcal{Z}_N(\mathbf{M}) = \{ \vec{\mathbf{z}} : (\vec{\mathbf{z}}, \mathbf{M}) \text{ partitions } \mathbf{Z}(\Omega) \text{ into at most } N \text{ convex polygons} \},$$
equip $\mathcal{Z}_N(\mathbf{M})$ with the sup norm $\|\cdot\|$, and note that $\mathcal{Z}_N(\mathbf{M})$ is compact by Assumption 6. Let $W(\vec{\mathbf{z}}, \mathbf{M})$ denote the minimal incentive cost for inducing high effort from both agents under the monitoring technology formed by $(\vec{\mathbf{z}}, \mathbf{M})$. $W(\vec{\mathbf{z}}, \mathbf{M})$ exists and is finite if and only if for all $i = 1, 2$, $z_i(A) \not\equiv 0$ across the $A$'s formed under $(\vec{\mathbf{z}}, \mathbf{M})$.

We proceed in two steps.

Step 1 Show that $W(\vec{\mathbf{z}}, \mathbf{M})$ is continuous in the first argument for any given $N \in \{2, \cdots, K\}$ and $\mathbf{M} \in \mathcal{M}_N$.

Fix any $\vec{\mathbf{z}} \in \mathcal{Z}_N(\mathbf{M})$ such that $W(\vec{\mathbf{z}}, \mathbf{M})$ is finite.
For sufficiently small $\delta > 0$, let $\vec{\mathbf{z}}^\delta$ be any element of $\mathcal{Z}_N(\mathbf{M})$ such that $\|\vec{\mathbf{z}}^\delta - \vec{\mathbf{z}}\| < \delta$. Label the performance categories formed under $(\vec{\mathbf{z}}, \mathbf{M})$ and $(\vec{\mathbf{z}}^\delta, \mathbf{M})$ as $A_n$'s and $A_n^\delta$'s, respectively, such that for $n = 1, 2, \cdots$, $\mathbf{z}_l$ is a vertex of $\mathrm{cl}(\mathbf{Z}(A_n))$ if and only if $\mathbf{z}_l^\delta$ is a vertex of $\mathrm{cl}(\mathbf{Z}(A_n^\delta))$. Let $\pi_n$ and $z_{i,n}$ (resp. $\pi_n^\delta$ and $z_{i,n}^\delta$) denote the probability (under $a = \mathbf{1}$) and $z_i$-value of $A_n$ (resp. $A_n^\delta$), respectively. Let $w_{i,n}$ denote the optimal wage of agent $i$ at $A_n$.

Fix any $\epsilon > 0$. Consider the wage profile that pays $w_{i,n} + \epsilon/2$ to agent $i$ if $z_{i,n}^\delta > 0$ and $w_{i,n}$ otherwise, and therefore satisfies the (LL$_i$) constraint. Under Assumptions 5 and 6, the (IC$_i$) constraint is satisfied when $\delta$ is sufficiently small:
$$\lim_{\delta \to 0} \sum_n \pi_n^\delta\, u_i\big( w_{i,n} + 1_{z_{i,n}^\delta > 0} \cdot \epsilon/2 \big)\, z_{i,n}^\delta = \sum_n \pi_n\, u_i\big( w_{i,n} + 1_{z_{i,n} > 0} \cdot \epsilon/2 \big)\, z_{i,n} > c_i,$$
where the inequality holds because $\sum_n \pi_n z_{i,n} = 0$ and $z_{i,n} \not\equiv 0$, so that $z_{i,n} > 0$ for some $n$. In addition, since
$$\lim_{\delta \to 0} \sum_{i,n} \pi_n^\delta \big( w_{i,n} + 1_{z_{i,n}^\delta > 0} \cdot \epsilon/2 \big) = \sum_{i,n} \pi_n \big( w_{i,n} + 1_{z_{i,n} > 0} \cdot \epsilon/2 \big),$$
it follows that when $\delta$ is sufficiently small,
$$W(\vec{\mathbf{z}}^\delta, \mathbf{M}) - W(\vec{\mathbf{z}}, \mathbf{M}) \leq \sum_{i,n} \pi_n^\delta \big( w_{i,n} + 1_{z_{i,n}^\delta > 0} \cdot \epsilon/2 \big) - \sum_{i,n} \pi_n w_{i,n} < \epsilon,$$
where the first inequality holds because the constructed wage profile is not necessarily optimal under $(\vec{\mathbf{z}}^\delta, \mathbf{M})$. Finally, interchanging the roles of $\vec{\mathbf{z}}^\delta$ and $\vec{\mathbf{z}}$ in the above derivation yields $W(\vec{\mathbf{z}}, \mathbf{M}) - W(\vec{\mathbf{z}}^\delta, \mathbf{M}) < \epsilon$, implying that $|W(\vec{\mathbf{z}}^\delta, \mathbf{M}) - W(\vec{\mathbf{z}}, \mathbf{M})| < \epsilon$ when $\delta$ is sufficiently small.
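To see what the minimal incentive cost in Step 1 looks like in a concrete case, the following sketch solves one agent's wage-minimization problem in closed form. It rests on assumptions that are not made in the paper: the utility function is taken to be $u(w) = \sqrt{w}$, and the probabilities, $z$-values and effort cost are made-up numbers. Under this assumed utility, the first-order condition of Lemma 4 gives $\sqrt{w_n} = \lambda z_n / 2$ on cells with $z_n > 0$ and $w_n = 0$ elsewhere, and the binding incentive constraint pins down $\lambda$, so the cost equals $c^2 / \sum_{n: z_n > 0} \pi_n z_n^2$. The function name `min_incentive_cost` is hypothetical.

```python
import math

def min_incentive_cost(pi, z, c):
    """Closed-form wage minimisation for ONE agent under the ASSUMED
    utility u(w) = sqrt(w): pay (lam * z_n / 2)**2 where z_n > 0, else 0."""
    s_plus = sum(p * zn * zn for p, zn in zip(pi, z) if zn > 0)
    if s_plus == 0:
        raise ValueError("no cell has a positive z-value")
    lam = 2 * c / s_plus                       # from the binding (IC) constraint
    wages = [(lam * zn / 2) ** 2 if zn > 0 else 0.0 for zn in z]
    cost = sum(p * w for p, w in zip(pi, wages))
    return lam, wages, cost

# Hypothetical monitoring technology: cell probabilities and z-values
# with sum_n pi_n * z_n = 0, and effort cost c.
pi = [0.5, 0.3, 0.2]
z = [-0.6, 0.2, 1.2]
c = 0.3

lam, wages, cost = min_incentive_cost(pi, z, c)
ic_lhs = sum(p * math.sqrt(w) * zn for p, w, zn in zip(pi, wages, z))

# A small change in the z-values moves the cost only slightly,
# in line with the continuity claim of Step 1.
_, _, cost_perturbed = min_incentive_cost(pi, [-0.6, 0.201, 1.2], c)

print(lam, wages, cost, ic_lhs, cost_perturbed)
```

In this special case the cost depends on $(\pi_n, z_n)$ only through a sum of continuous terms, which makes the continuity established in Step 1 easy to see; the general argument in the text does not rely on any closed form.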
Step 2 Under Assumption 4(a), the following quantity:
$$W_N := \min_{\mathbf{M} \in \mathcal{M}_N,\ \vec{\mathbf{z}} \in \mathcal{Z}_N(\mathbf{M})} W(\vec{\mathbf{z}}, \mathbf{M})$$
exists and is finite for all $N \in \{2, \cdots, K\}$ by Step 1, the compactness of $\mathcal{Z}_N(\mathbf{M})$ and the finiteness of $\mathcal{M}_N$. Under Assumption 4(b), the principal's problem can be written as follows:
$$\min_{\mathbf{M} \in \mathcal{M}_K,\ \vec{\mathbf{z}} \in \mathcal{Z}_K(\mathbf{M})} W(\vec{\mathbf{z}}, \mathbf{M}) + \mu \cdot h(\boldsymbol{\pi}(\vec{\mathbf{z}}, \mathbf{M})),$$
where $\boldsymbol{\pi}(\vec{\mathbf{z}}, \mathbf{M})$ is the probability vector formed under $(\vec{\mathbf{z}}, \mathbf{M})$ and is clearly continuous in $\vec{\mathbf{z}}$. The remainder of the proof is the same as that of Theorem 2 and is therefore omitted.

A.3 Proofs of Section 5

In this appendix, write $\mathbf{z}(A) = (z_a(A))^\top_{a \in \mathcal{D}}$ for any set $A \in \Sigma$ of positive measure, and write any $N$-partitional contract $\langle \mathcal{P}, w(\cdot) \rangle$ as the corresponding tuple $\langle A_n, \pi_n, \mathbf{z}_n, w_n \rangle_{n=1}^N$, where $A_n$ is a generic cell of $\mathcal{P}$, $\pi_n = P_{a^*}(A_n)$, $\mathbf{z}_n = \mathbf{z}(A_n)$ and $w_n = w(A_n)$. Assume w.l.o.g. that $w_1 \leq \cdots \leq w_N$.

A.3.1 Useful Lemma

The next lemma generalizes Lemmas 1 and 2 to encompass multiple agents:

Lemma 6. Assume Assumption 1. Then for any optimal incentive contract that induces $a^*$, (i) there exists $\boldsymbol{\lambda} \in \mathbb{R}^{|\mathcal{D}|}_+$ with $\|\boldsymbol{\lambda}\| > 0$ such that $u'(w_n) = 1/(\boldsymbol{\lambda}^\top \mathbf{z}_n)$ if and only if $w_n > 0$; (ii) $\boldsymbol{\lambda}^\top \mathbf{z}_1 < 0 < \boldsymbol{\lambda}^\top \mathbf{z}_2 < \cdots$ and $w_1 < w_2 < \cdots$.

Proof. The wage-minimization problem for given monitoring technology $\langle A_n, \pi_n, \mathbf{z}_n \rangle_{n=1}^N$ is
$$\min_{\langle \tilde{w}_n \rangle} \sum_n \pi_n \tilde{w}_n - \sum_n \pi_n u(\tilde{w}_n) \cdot \boldsymbol{\lambda}^\top \mathbf{z}_n - \sum_n \eta_n \tilde{w}_n,$$
where $\boldsymbol{\lambda}$ denotes the profile of the Lagrange multipliers associated with the (IC$_a$) constraints and $\eta_n$ the Lagrange multiplier associated with the (LL) constraint at $\tilde{w}_n$. Note that $\|\boldsymbol{\lambda}\| > 0$, because otherwise all (IC$_a$) constraints are slack and hence subtracting a small $\epsilon > 0$ from a positive wage would reduce the incentive cost without violating any constraint. Differentiating the objective function with respect to $\tilde{w}_n$ yields the first-order condition in Part (i).
The proof of Part (ii) is the same as that of Lemma 2 and is therefore omitted.

A.3.2 Proof of Theorem 5

Proof. Take any optimal incentive contract that induces $a^*$. Let $\langle A_n, \pi_n, \mathbf{z}_n, w_n \rangle_{n=1}^N$ be the corresponding tuple and $\boldsymbol{\lambda}$ be the profile of the Lagrange multipliers associated with the (IC$_a$) constraints. Suppose, to the contrary, that some $A_j$ is not $Z_{\boldsymbol{\lambda}}$-convex. Then there exist $A', A'' \subset A_j$ and $\tilde{A} \subset A_k$, $k \neq j$, such that (i) $P_{a^*}(A'), P_{a^*}(A''), P_{a^*}(\tilde{A}) > 0$, and (ii) $\boldsymbol{\lambda}^\top \tilde{\mathbf{z}} = (1-s) \boldsymbol{\lambda}^\top \mathbf{z}' + s \boldsymbol{\lambda}^\top \mathbf{z}''$, where $\mathbf{z}' := \mathbf{z}(A')$, $\mathbf{z}'' := \mathbf{z}(A'')$, $\tilde{\mathbf{z}} := \mathbf{z}(\tilde{A})$, $\boldsymbol{\lambda}^\top \mathbf{z}' \neq \boldsymbol{\lambda}^\top \mathbf{z}''$ and $s \in (0,1)$. For any $\epsilon \in (0, \min\{P_{a^*}(A'), P_{a^*}(A''), P_{a^*}(\tilde{A})\})$, there exist $A'_\epsilon \subset A'$, $A''_\epsilon \subset A''$ and $\tilde{A}_\epsilon \subset \tilde{A}$ such that (i) $P_{a^*}(A'_\epsilon) = P_{a^*}(A''_\epsilon) = P_{a^*}(\tilde{A}_\epsilon) = \epsilon$, and (ii) $\boldsymbol{\lambda}^\top \mathbf{z}(A'_\epsilon) = \boldsymbol{\lambda}^\top \mathbf{z}'$, $\boldsymbol{\lambda}^\top \mathbf{z}(A''_\epsilon) = \boldsymbol{\lambda}^\top \mathbf{z}''$ and $\boldsymbol{\lambda}^\top \mathbf{z}(\tilde{A}_\epsilon) = \boldsymbol{\lambda}^\top \tilde{\mathbf{z}}$.

Consider two perturbations to the monitoring technology: (a) move $A'_\epsilon$ to $A_k$ and $\tilde{A}_\epsilon$ to $A_j$, and (b) move $\tilde{A}_\epsilon$ to $A_j$ and $A''_\epsilon$ to $A_k$. By Assumption 1, neither perturbation affects the probability distribution of the output signal under action $a^*$ and hence the monitoring cost. Below we demonstrate that one of them strictly reduces the incentive cost compared to the original (optimal) contract.
Perturbation (a) Let $\langle A_n(\epsilon), \pi_n, \mathbf{z}_n(\epsilon) \rangle_{n=1}^N$ be the tuple associated with the monitoring technology after perturbation (a), where $A_j(\epsilon) = (A_j \cup \tilde{A}_\epsilon) \setminus A'_\epsilon$, $A_k(\epsilon) = (A_k \cup A'_\epsilon) \setminus \tilde{A}_\epsilon$ and $A_n(\epsilon) = A_n$ for $n \neq j, k$. Straightforward algebra shows that
$$\mathbf{z}_j(\epsilon) = \mathbf{z}_j + \frac{\mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon)}{\pi_j}\,\epsilon, \quad \mathbf{z}_k(\epsilon) = \mathbf{z}_k - \frac{\mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon)}{\pi_k}\,\epsilon, \quad \mathbf{z}_n(\epsilon) = \mathbf{z}_n \ \ \forall n \neq j, k, \tag{A.10}$$
and that
$$\big\| \mathbf{z}(\tilde{A}_\epsilon) - \mathbf{z}(A'_\epsilon) \big\| \leq \big\| \mathbf{z}(\tilde{A}_\epsilon) \big\| + \big\| \mathbf{z}(A'_\epsilon) \big\| \leq 2 \max_{\omega \in \Omega} \|\mathbf{Z}(\omega)\|. \tag{A.11}$$
Consider a wage profile $\langle w_n(\epsilon) \rangle_{n=1}^N$ such that (1) $w_1(\epsilon) = w_1 = 0$ and (2) all (IC$_a$) constraints are slack by $O(\epsilon)$ after the perturbation, i.e.,
$$0 \leq \sum_{n=1}^N \pi_n u(w_n(\epsilon))\, z_{a,n}(\epsilon) - \sum_{n=1}^N \pi_n u(w_n)\, z_{a,n} \sim O(\epsilon) \quad \forall a \in \mathcal{D}. \tag{A.12}$$
A close inspection of Equations (A.10)-(A.12) reveals the existence of $M > 0$ such that when $\epsilon$ is sufficiently small, there exist wage profiles as above that satisfy $|w_n(\epsilon) - w_n| < M\epsilon$ for all $n$ and hence the (LL) constraint.

Write $\dot{w}_n(\epsilon) = (w_n(\epsilon) - w_n)/\epsilon$ and $\dot{\mathbf{z}}_n(\epsilon) = (\mathbf{z}_n(\epsilon) - \mathbf{z}_n)/\epsilon$. When $\epsilon$ is small, expanding Equation (A.12) using the twice-differentiability of $u(\cdot)$ and $|w_n(\epsilon) - w_n| \sim O(\epsilon)$, and multiplying the result by $\boldsymbol{\lambda}$, yields
$$\sum_{n=1}^N \pi_n \cdot u'(w_n) \cdot \boldsymbol{\lambda}^\top \mathbf{z}_n \cdot \dot{w}_n(\epsilon) = -\sum_{n=1}^N u(w_n) \cdot \pi_n \cdot \boldsymbol{\lambda}^\top \dot{\mathbf{z}}_n(\epsilon) + O(\epsilon).$$
Simplifying using $\dot{w}_1(\epsilon) = 0$, $u'(w_n) = 1/(\boldsymbol{\lambda}^\top \mathbf{z}_n)$ for $n \geq 2$ and Equation (A.10) yields
$$\sum_{n=1}^N \pi_n \dot{w}_n(\epsilon) = s\, (u(w_k) - u(w_j)) \big( \boldsymbol{\lambda}^\top \mathbf{z}'' - \boldsymbol{\lambda}^\top \mathbf{z}' \big) + O(\epsilon). \tag{A.13}$$

(Footnote: To see why a wage profile as above can be constructed, define $\kappa_a = \sum_{n=2}^N \pi_n u(w_n) z_{a,n}$ and $S_a = \big\{ \langle x_n \rangle_{n=2}^N \in \mathbb{R}^{N-1} : \sum_{n=2}^N x_n z_{a,n} \geq \kappa_a \big\}$ for each $a \in \mathcal{D}$, and note that $\langle \pi_n u(w_n) \rangle_{n=2}^N \in \cap_{a \in \mathcal{D}} S_a$. If we cannot construct a wage profile as above, then there exist $a', a'' \in \mathcal{D}$ such that
$$\bigcap_{a = a', a''} \Big\{ \langle x_n \rangle_{n=2}^N \in \mathbb{R}^{N-1} : \sum_{n=2}^N x_n z_{a,n} \geq \kappa_a \Big\} = \Big\{ \langle x_n \rangle_{n=2}^N \in \mathbb{R}^{N-1} : \sum_{n=2}^N x_n z_{a',n} = \kappa_{a'} \Big\}$$
and hence $z_{a'',n} = -z_{a',n}$ for $n = 2, \cdots, N$ and $\kappa_{a''} = -\kappa_{a'}$. In the meantime, $\kappa_a \geq c(a^*) - c(a) > 0$ for all $a \in \mathcal{D}$, thus reaching a contradiction.)

Perturbation (b) Repeating the above argument for perturbation (b) yields
$$\sum_{n=1}^N \pi_n \dot{w}_n(\epsilon) = -(1-s)\, (u(w_k) - u(w_j)) \big( \boldsymbol{\lambda}^\top \mathbf{z}'' - \boldsymbol{\lambda}^\top \mathbf{z}' \big) + O(\epsilon). \tag{A.14}$$
Since $u(w_k) \neq u(w_j)$ by Lemma 6 and $\boldsymbol{\lambda}^\top \mathbf{z}'' \neq \boldsymbol{\lambda}^\top \mathbf{z}'$ by assumption, the right-hand sides of Equations (A.13) and (A.14) have opposite signs when $\epsilon$ is small. The remainder of the proof is the same as that of Theorem 1 and is therefore omitted.

A.3.3 Proof of Theorem 6

Proof. Define
$$\Lambda = \big\{ \boldsymbol{\lambda} : \boldsymbol{\lambda} \in \mathbb{R}^{|\mathcal{D}|}_+ \text{ and } \|\boldsymbol{\lambda}\|_{|\mathcal{D}|} = 1 \big\},$$
where $\|\cdot\|_{|\mathcal{D}|}$ denotes the $|\mathcal{D}|$-dimensional Euclidean norm.
By Theorem 5, any optimal monitoring technology with at most $N \in \{2, \cdots, K\}$ performance categories is fully captured by $\boldsymbol{\lambda} \in \Lambda$ and $N-1$ cut points $\hat{z}_1, \cdots, \hat{z}_{N-1}$ such that $\min_{\omega \in \Omega} \boldsymbol{\lambda}^\top \mathbf{Z}(\omega) \leq \hat{z}_1 \leq \cdots \leq \hat{z}_{N-1} \leq \max_{\omega \in \Omega} \boldsymbol{\lambda}^\top \mathbf{Z}(\omega)$. Write $\hat{\mathbf{z}} = (\hat{z}_1, \cdots, \hat{z}_{N-1})$. Define
$$\mathcal{Z}_N(\boldsymbol{\lambda}) = \Big\{ \hat{\mathbf{z}} : \min_{\omega \in \Omega} \boldsymbol{\lambda}^\top \mathbf{Z}(\omega) \leq \hat{z}_1 \leq \cdots \leq \hat{z}_{N-1} \leq \max_{\omega \in \Omega} \boldsymbol{\lambda}^\top \mathbf{Z}(\omega) \Big\},$$
equip $\mathcal{Z}_N(\boldsymbol{\lambda})$ with the sup norm $\|\cdot\|$, and note that $\mathcal{Z}_N(\boldsymbol{\lambda})$ is compact by Assumption 3. For any given pair $(\boldsymbol{\lambda}, \hat{\mathbf{z}})$, write the minimal incentive cost for inducing $a^*$ as $W(\boldsymbol{\lambda}, \hat{\mathbf{z}})$, and note that $W(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ exists and is finite if and only if $\lambda_a > 0$ for all $a \in \mathcal{D}$ and $\min_{\omega \in \Omega} \boldsymbol{\lambda}^\top \mathbf{Z}(\omega) < \hat{z}_n < \max_{\omega \in \Omega} \boldsymbol{\lambda}^\top \mathbf{Z}(\omega)$ for some $n$. The first condition is necessary: otherwise there exists $a \in \mathcal{D}$ such that $z_a(A) \equiv 0$ across the $A$'s formed under $(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ and hence the (IC$_a$) constraint will be violated.

We proceed in two steps.

Step 1 Show that $W(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ is continuous in $(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ for any given $N \in \{2, \cdots, K\}$.

Fix any $\boldsymbol{\lambda} \in \Lambda$ and $\hat{\mathbf{z}} \in \mathcal{Z}_N(\boldsymbol{\lambda})$ such that $W(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ is finite. W.l.o.g. consider the case where the $\hat{z}_n$'s are all distinct. For sufficiently small $\delta > 0$, let $\boldsymbol{\lambda}^\delta$ and $\hat{\mathbf{z}}^\delta$ be any elements of $\Lambda$ and $\mathcal{Z}_N(\boldsymbol{\lambda}^\delta)$, respectively, such that $\|\boldsymbol{\lambda}^\delta - \boldsymbol{\lambda}\|_{|\mathcal{D}|}, \|\hat{\mathbf{z}}^\delta - \hat{\mathbf{z}}\| < \delta$. Let $\pi_n$ and $\mathbf{z}_n$ (resp. $\pi_n^\delta$ and $\mathbf{z}_n^\delta$) denote the probability (under $a = a^*$) and the $|\mathcal{D}|$-vector of $z$-values associated with performance category $A_n = \{ \omega : \boldsymbol{\lambda}^\top \mathbf{Z}(\omega) \in [\hat{z}_{n-1}, \hat{z}_n) \}$ (resp. $A_n^\delta = \{ \omega : \boldsymbol{\lambda}^{\delta\top} \mathbf{Z}(\omega) \in [\hat{z}^\delta_{n-1}, \hat{z}^\delta_n) \}$), respectively.
Let $w_n$ denote the optimal wage at $A_n$.

Fix any $\epsilon > 0$, and consider the wage profile that pays $w_n + \epsilon$ at $A_n^\delta$ if $z^\delta_{a,n} > 0$ for all $a \in \mathcal{D}$ and $w_n$ otherwise. By construction, this wage profile satisfies the (LL) constraint. Under Assumptions 2 and 3, it satisfies every (IC$_a$) constraint when $\delta$ is small:
$$\lim_{\delta \to 0} \sum_n u\Big( w_n + \prod_{a' \in \mathcal{D}} 1_{z^\delta_{a',n} > 0} \cdot \epsilon \Big) \pi_n^\delta z^\delta_{a,n} = \sum_n u\Big( w_n + \prod_{a' \in \mathcal{D}} 1_{z_{a',n} > 0} \cdot \epsilon \Big) \pi_n z_{a,n} > \sum_n u(w_n)\, \pi_n z_{a,n},$$
where the inequality holds because $\sum_n \pi_n z_{a',n} = 0$ and $z_{a',n}$ is strictly increasing in $n$ for all $a' \in \mathcal{D}$, so there exists $n$ such that $\prod_{a' \in \mathcal{D}} 1_{z_{a',n} > 0} = 1$. To complete the proof, note that
$$\lim_{\delta \to 0} \sum_n \pi_n^\delta \Big( w_n + \prod_{a \in \mathcal{D}} 1_{z^\delta_{a,n} > 0} \cdot \epsilon \Big) = \sum_n \pi_n \Big( w_n + \prod_{a \in \mathcal{D}} 1_{z_{a,n} > 0} \cdot \epsilon \Big),$$
so the following holds when $\delta$ is sufficiently small:
$$W(\boldsymbol{\lambda}^\delta, \hat{\mathbf{z}}^\delta) - W(\boldsymbol{\lambda}, \hat{\mathbf{z}}) \leq \sum_n \pi_n^\delta \Big( w_n + \prod_{a \in \mathcal{D}} 1_{z^\delta_{a,n} > 0} \cdot \epsilon \Big) - \sum_n \pi_n w_n < \epsilon.$$
Finally, interchanging the roles of $(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ and $(\boldsymbol{\lambda}^\delta, \hat{\mathbf{z}}^\delta)$ in the above derivation yields $W(\boldsymbol{\lambda}, \hat{\mathbf{z}}) - W(\boldsymbol{\lambda}^\delta, \hat{\mathbf{z}}^\delta) < \epsilon$, implying that $|W(\boldsymbol{\lambda}^\delta, \hat{\mathbf{z}}^\delta) - W(\boldsymbol{\lambda}, \hat{\mathbf{z}})| < \epsilon$ when $\delta$ is sufficiently small.

Step 2 Under Assumption 4(a), the following quantity:
$$W_N := \min_{\boldsymbol{\lambda} \in \Lambda,\ \hat{\mathbf{z}} \in \mathcal{Z}_N(\boldsymbol{\lambda})} W(\boldsymbol{\lambda}, \hat{\mathbf{z}})$$
exists and is finite for all $N \in \{2, \cdots, K\}$ by Step 1 and the compactness of $\Lambda$ and $\mathcal{Z}_N(\boldsymbol{\lambda})$.
Under Assumption 4(b), the principal's problem can be written as follows:
$$\min_{\boldsymbol{\lambda} \in \Lambda,\ \hat{\mathbf{z}} \in \mathcal{Z}_K(\boldsymbol{\lambda})} W(\boldsymbol{\lambda}, \hat{\mathbf{z}}) + \mu \cdot h(\boldsymbol{\pi}(\boldsymbol{\lambda}, \hat{\mathbf{z}})),$$
where $\boldsymbol{\pi}(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ denotes the probability vector formed under $(\boldsymbol{\lambda}, \hat{\mathbf{z}})$ and is continuous in its argument. The remainder of the proof is the same as that of Theorem 2 and is therefore omitted.

B Other Extensions

B.1 Individual Rationality

In this appendix, let everything be as in the baseline model except that the agent is constrained by individual rationality rather than limited liability:
$$\sum_{A \in \mathcal{P}} P_1(A)\, u(w(A)) \geq c + \underline{u}. \tag{IR}$$
A wage scheme is $w : \mathcal{P} \to \mathbb{R}$, and an optimal incentive contract that induces high effort from the agent (optimal incentive contract for short) minimizes the total implementation cost, subject to the (IC) and (IR) constraints.

Corollary 4. Under Assumption 1, any optimal monitoring technology that induces high effort from the agent comprises $Z$-convex cells.

Proof. Take any optimal incentive contract and let $\langle A_n, \pi_n, z_n, w_n \rangle_{n=1}^N$ be the corresponding tuple. Assume without loss of generality that $z_1 \leq \cdots \leq z_N$.

Step 1 Show that $z_1 < \cdots < z_N$ and $w_1 < \cdots < w_N$.

The wage-minimization problem given $\langle A_n, \pi_n, z_n \rangle_{n=1}^N$ is
$$\min_{\langle \tilde{w}_n \rangle} \sum_{n=1}^N \pi_n \tilde{w}_n - \lambda \Big( \sum_{n=1}^N \pi_n u(\tilde{w}_n) z_n - c \Big) - \gamma \Big( \sum_{n=1}^N \pi_n u(\tilde{w}_n) - (c + \underline{u}) \Big),$$
where $\lambda$ and $\gamma$ denote the Lagrange multipliers associated with the (IC) and (IR) constraints, respectively. Differentiating the objective function with respect to $\tilde{w}_n$ yields
$$u'(w_n) = \frac{1}{\lambda z_n + \gamma}.$$
Thus if $z_j = z_k$ for some $j \neq k$, then $w_j = w_k$. But then merging $A_j$ and $A_k$ has no effect on the incentive cost but strictly reduces the monitoring cost by Assumption 1(b), a contradiction to the optimality of the original contract.
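The first-order condition $u'(w_n) = 1/(\lambda z_n + \gamma)$ can be illustrated with a small closed-form example. The sketch below assumes, purely for illustration and not as part of the paper, that $u(w) = 2\sqrt{w}$, so that $u'(w) = 1/\sqrt{w}$ and the condition reads $\sqrt{w_n} = \lambda z_n + \gamma$. It further assumes that both the (IC) and the (IR) constraints bind, which yields two linear equations in $(\lambda, \gamma)$. The function name `ir_wages`, the parameter `u_bar` (standing for the constant on the right-hand side of (IR) net of $c$) and all numbers are hypothetical.

```python
import math

def ir_wages(pi, z, c, u_bar):
    """Solve the binding (IC) and (IR) constraints for (lam, gam) under the
    ASSUMED utility u(w) = 2*sqrt(w), then recover wages from the
    first-order condition sqrt(w_n) = lam * z_n + gam."""
    s0 = sum(pi)
    s1 = sum(p * zn for p, zn in zip(pi, z))
    s2 = sum(p * zn * zn for p, zn in zip(pi, z))
    # (IC): 2 * (lam * s2 + gam * s1) = c
    # (IR): 2 * (lam * s1 + gam * s0) = c + u_bar
    det = s2 * s0 - s1 * s1
    lam = (c / 2 * s0 - (c + u_bar) / 2 * s1) / det
    gam = (s2 * (c + u_bar) / 2 - s1 * c / 2) / det
    roots = [lam * zn + gam for zn in z]
    if min(roots) <= 0:
        raise ValueError("first-order condition needs lam * z_n + gam > 0")
    return lam, gam, [r * r for r in roots]

# Hypothetical monitoring technology and parameters.
pi = [0.5, 0.3, 0.2]
z = [-0.6, 0.2, 1.2]
c, u_bar = 0.3, 1.0

lam, gam, wages = ir_wages(pi, z, c, u_bar)
ic_lhs = sum(p * 2 * math.sqrt(w) * zn for p, w, zn in zip(pi, wages, z))
ir_lhs = sum(p * 2 * math.sqrt(w) for p, w in zip(pi, wages))

# Two cells with the same z-value receive the same wage, which is the
# observation behind the merging argument in Step 1.
_, _, wages_dup = ir_wages([0.25, 0.25, 0.3, 0.2], [-0.6, -0.6, 0.2, 1.2], c, u_bar)

print(lam, gam, wages, ic_lhs, ir_lhs, wages_dup)
```

Because the wage depends on the cell only through its $z$-value, splitting a cell into two pieces with identical $z$-values changes nothing on the incentive side, consistent with the claim that such a split cannot be optimal when monitoring is costly.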
Step 2 Show $Z$-convexity.

Suppose, to the contrary, that some $A_j$ is not $Z$-convex. Consider first perturbation (a) in the proof of Theorem 1. Take any wage profile $\langle w_n(\epsilon) \rangle_{n=1}^N$ such that the (IC) and (IR) constraints remain binding after the perturbation, i.e.,
$$\sum_{n=1}^N \pi_n u(w_n(\epsilon))\, z_n(\epsilon) = \sum_{n=1}^N \pi_n u(w_n)\, z_n, \tag{B.1}$$
and
$$\sum_{n=1}^N \pi_n u(w_n(\epsilon)) = \sum_{n=1}^N \pi_n u(w_n). \tag{B.2}$$
A close inspection of Equations (A.1), (B.1) and (B.2) reveals the existence of $M > 0$ such that when $\epsilon$ is sufficiently small, there exist wage profiles as above such that $|w_n(\epsilon) - w_n| < M\epsilon$ for all $n$.

(Footnote: To see why, define $\kappa_1 = \sum_{n=1}^N \pi_n u(w_n) z_n$, $\kappa_2 = \sum_{n=1}^N \pi_n u(w_n)$, $S_1 = \big\{ \langle x_n \rangle_{n=1}^N \in \mathbb{R}^N : \sum_{n=1}^N x_n z_n \geq \kappa_1 \big\}$ and $S_2 = \big\{ \langle x_n \rangle_{n=1}^N \in \mathbb{R}^N : \sum_{n=1}^N x_n \geq \kappa_2 \big\}$, and note that $\langle \pi_n u(w_n) \rangle_{n=1}^N \in S_1 \cap S_2$. Then from $z_1 < \cdots < z_N$, it follows that $\dim S_1 \cap S_2 = N$, and combining with Equation (A.1) gives the desired result.)

Write $\dot{w}_n(\epsilon) = (w_n(\epsilon) - w_n)/\epsilon$ and $\dot{z}_n(\epsilon) = (z_n(\epsilon) - z_n)/\epsilon$, and let $\lambda > 0$ and $\gamma > 0$ denote the Lagrange multipliers associated with the (IC) and (IR) constraints prior to the perturbation, respectively. Expanding $\lambda$(B.1)$+\gamma$(B.2) using the twice-differentiability of $u(\cdot)$ and $|w_n(\epsilon) - w_n| \sim O(\epsilon)$ yields the following when $\epsilon$ is small:
$$\sum_{n=1}^N \pi_n \cdot u'(w_n) \cdot (\lambda z_n + \gamma) \cdot \dot{w}_n(\epsilon) = -\lambda \sum_{n=1}^N u(w_n) \cdot \pi_n \dot{z}_n(\epsilon) + O(\epsilon). \tag{B.3}$$
Simplifying using $u'(w_n) = 1/(\lambda z_n + \gamma)$ and Equation (A.1) yields
$$\sum_{n=1}^N \pi_n \dot{w}_n(\epsilon) = s\, (u(w_k) - u(w_j)) (\lambda z'' - \lambda z'). \tag{B.4}$$
Consider next perturbation (b). Similar algebraic manipulation as above yields
$$\sum_{n=1}^N \pi_n \dot{w}_n(\epsilon) = -(1-s)\, (u(w_k) - u(w_j)) (\lambda z'' - \lambda z').$$
(B.5)

Since $u(w_j) \neq u(w_k)$ and $z'' \neq z'$, we must have $\mathrm{sgn}$ (B.4) $\neq \mathrm{sgn}$ (B.5), and the remainder of the proof is the same as that of Theorem 1.

B.2 Random Monitoring Technology

This appendix extends the baseline model to encompass random monitoring technologies $\mathbf{q} : \Omega \to \Delta^K$ mapping raw data points to elements in the $K$-dimensional simplex. Time evolves as follows:

1. the principal commits to $\langle \mathbf{q}, w \rangle$;
2. the agent privately chooses $a \in \{0, 1\}$;
3. Nature draws $\omega \in \Omega$ according to $P_a$;
4. the monitoring technology outputs $n \in \{1, \cdots, K\}$ with probability $q_n(\omega)$;
5. the principal pays the promised wage $w_n \geq 0$.

Under $\langle \mathbf{q}, w \rangle$, the agent is assigned to performance category $n$ with probability
$$\pi_n = \int q_n(\omega)\, dP_1(\omega)$$
if he exerts high effort. Define $\mathcal{N} = \{n : \pi_n > 0\}$. For $n \in \mathcal{N}$, define
$$z_n = \int Z(\omega)\, q_n(\omega)\, dP_1(\omega) \Big/ \pi_n$$
as the $z$-value of performance category $n$. For $n \notin \mathcal{N}$, let $w_n = 0$. Then $\langle \mathbf{q}, w \rangle$ is incentive compatible if
$$\sum_{n \in \mathcal{N}} \pi_n u(w_n) z_n \geq c, \tag{IC}$$
in which case the monitoring cost is proportional to the mutual information of the raw data and output signal conditional on high effort:
$$H(\mathbf{q}, 1) = \sum_{n \in \mathcal{N}} \int q_n(\omega) \log \frac{q_n(\omega)}{\int q_n(\omega')\, dP_1(\omega')}\, dP_1(\omega).$$
An optimal incentive contract $\langle \mathbf{q}^*, w^* \rangle$ that induces high effort from the agent solves
$$\min_{\langle \mathbf{q}, w \rangle} \sum_{n \in \mathcal{N}} \pi_n w_n + \mu \cdot H(\mathbf{q}, 1) \quad \text{s.t. (IC) and (LL)}.$$
The next theorem gives characterizations of optimal incentive contracts:

Theorem 7. For any optimal incentive contract $\langle \mathbf{q}^*, w^* \rangle$ that induces high effort from the agent, we have (i) $\mathbf{q}^* : Z(\Omega) \to \Delta^K$; (ii) $\min\{w^*_n : n \in \mathcal{N}^*\} = 0$; (iii) for all $j, k \in \mathcal{N}^*$, $w^*_j \neq w^*_k$ and $q^*_k(z)/q^*_j(z)$ is strictly increasing in $z$ if $w^*_j < w^*_k$.

Proof.
Since the incentive cost is linear in $\mathbf{q}(\omega)$ whereas the monitoring cost is convex in $\mathbf{q}(\omega)$, it follows that $\mathbf{q}^* : Z(\Omega) \to \Delta^K$ and that $w^*_j \neq w^*_k$ for all $j, k \in \mathcal{N}^*$. Write $\mathcal{N}^* = \{1, \cdots, N\}$ and assume w.l.o.g. that $w^*_1 < \cdots < w^*_N$. Then $w^*_1 = 0$ for the same reason as in the proof of Lemma 2. Differentiating the principal's objective function with respect to $\mathbf{q}(z)$ yields the following first-order condition:
$$-w^*_n + \lambda u(w^*_n)\, z = \mu \Big( \log \frac{q^*_n(z)}{q^*_1(z)} - \log \frac{\pi^*_n}{\pi^*_1} \Big) \quad \forall n = 2, \cdots, N, \tag{B.6}$$
where $\lambda > 0$ denotes the Lagrange multiplier associated with the (IC) constraint. It follows that $q^*_k(z)/q^*_j(z)$ is strictly increasing in $z$ whenever $w^*_j < w^*_k$, thus proving Part (iii) of this theorem.

The next theorem proves existence of an optimal incentive contract:

Theorem 8. Assume Assumptions 2 and 3. Then an optimal incentive contract that induces high effort from the agent exists.

Proof. For any given $\mathbf{q}$, the wage-minimization problem admits solutions if and only if $z_j \neq z_k$ for some $j, k \in \mathcal{N}$, in which case we denote the minimal incentive cost by $W(\mathbf{q})$. The principal's problem is
$$\min_{\mathbf{q}} W(\mathbf{q}) + \mu \cdot H(\mathbf{q}, 1),$$
and any solution of it must be continuously differentiable on $Z(\Omega)$ by Equation (B.6) and Assumptions 2 and 3 (taking the usual care of derivatives at end points). Define $C(Z(\Omega), \Delta^K)$ as the set of $\mathbf{q}$'s as above and equip $C(Z(\Omega), \Delta^K)$ with the sup norm $\|\cdot\|$, i.e., $\|\mathbf{q}' - \mathbf{q}\| = \sup_{z,n} |q'_n(z) - q_n(z)|$. Rewrite the principal's problem as follows:
$$\min_{\mathbf{q} \in C(Z(\Omega), \Delta^K)} W(\mathbf{q}) + \mu \cdot H(\mathbf{q}, 1),$$
and note that the objective function is continuous in $\mathbf{q}$.

To prove existence of solutions, note that
$$\inf_{\mathbf{q} \in C(Z(\Omega), \Delta^K)} W(\mathbf{q}) + \mu \cdot H(\mathbf{q}, 1)$$
is finite, and denote it by $x$. Let $\{\mathbf{q}^k\}$ be any sequence in $C(Z(\Omega), \Delta^K)$ such that $\lim_{k \to \infty} W(\mathbf{q}^k) + \mu \cdot H(\mathbf{q}^k, 1) = x$. Clearly, $\mathbf{q}^k$ is uniformly bounded for all $k$, and the family $\{\mathbf{q}^k\}$ is equicontinuous by Assumption 3 and the definition of $C(Z(\Omega), \Delta^K)$.
Thus, a subsequence of $\{\mathbf{q}^k\}$ converges uniformly to some $\mathbf{q}^\infty$ by Helly's selection theorem, and $W(\mathbf{q}^\infty) + \mu \cdot H(\mathbf{q}^\infty, 1) = x$ by the continuity of the objective function.

References

Alchian, A. A., and H. Demsetz (1972): “Production, information costs, and economic organization,” American Economic Review, 62(5), 777-795.
Baiman, S., and J. S. Demski (1980): “Economically optimal performance evaluation and control systems,” Journal of Accounting Research, 18(S), 184-220.
Bergemann, D., and S. Morris (2019): “Information design: a unifying perspective,” Journal of Economic Literature, 57(1), 44-95.
Blackwell, D. (1953): “Equivalent comparisons of experiments,” Annals of Mathematical Statistics, 24(2), 265-272.
Bloom, N., B. Eifert, A. Mahajan, D. McKenzie, and J. Roberts (2013): “Does management matter? Evidence from India,” Quarterly Journal of Economics, 128(1), 1-51.
Bloom, N., R. Sadun, and J. Van Reenen (2012): “Americans do IT better: US multinationals and the productivity miracle,” American Economic Review, 102(1), 167-201.
Bloom, N., and J. Van Reenen (2006): “Measuring and explaining management practices across firms and countries,” Centre for Economic Performance Discussion Paper, No. 716.
——— (2007): “Measuring and explaining management practices across firms and countries,” Quarterly Journal of Economics, 122(4), 1351-1408.
——— (2010): “Why do management practices differ across countries?,” Journal of Economic Perspectives, 24(1), 203-224.
Cover, T. M., and J. A. Thomas (2006): Elements of information theory, Hoboken, NJ: John Wiley & Sons, 2nd ed.
Crémer, J., L. Garicano, and A. Prat (2007): “Language and the theory of the firm,” Quarterly Journal of Economics, 122(1), 373-407.
Dilmé, F. (2017): “Optimal languages,” Working Paper.
Dye, R. A. (1986): “Optimal monitoring policies in agencies,” The Rand Journal of Economics, 17(3), 339-350.
Ewenstein, B., B. Hancock, and A.
Komm (2016): “Ahead of the curve: the future of performance management,” McKinsey Quarterly, May.
Green, J. R., and N. L. Stokey (1983): “A comparison of tournaments and contracts,” Journal of Political Economy, 91(3), 349-364.
Grossman, S. J., and O. D. Hart (1983): “An analysis of the principal-agent problem,” Econometrica, 51(1), 7-45.
Kayyali, B., D. Knott, and S. Van Kuiken (2013): “The ‘big data’ revolution in healthcare,” McKinsey Quarterly, January.
Holmström, B. (1979): “Moral hazard and observability,” The Bell Journal of Economics, 10(1), 74-91.
——— (1982): “Moral hazard in teams,” The Bell Journal of Economics, 13(2), 324-340.
Holmström, B., and P. Milgrom (1991): “Multitask principal-agent analyses: incentive contracts, asset ownership, and job design,” Journal of Law, Economics, and Organization, 7(S), 24-52.
Hook, C., A. Jenkins, and M. Foot (2011): Introducing human resource management, Pearson, 6th ed.
Jäger, G., L. P. Metzger, and F. Riedel (2011): “Voronoi languages: Equilibria in cheap-talk games with high-dimensional signals and few signals,” Games and Economic Behavior, 73(2), 517-537.
Kaplan, E. (2015): “The spy who fired me: the human costs of workplace monitoring,” Harper’s Magazine, March.
Kim, S. K. (1995): “Efficiency of an information system in an agency model,” Econometrica, 63(1), 89-102.
Lazear, E. P., and S. Rosen (1981): “Rank-order tournaments as optimal labor contracts,” Journal of Political Economy, 89(5), 841-864.
Maćkowiak, B., and M. Wiederholt (2009): “Optimal sticky prices under rational inattention,” American Economic Review, 99(3), 769-803.
Martin, D. (2017): “Strategic pricing with rational inattention to quality,” Games and Economic Behavior, 104, 131-145.
Matějka, F., and A. McKay (2012): “Simple market equilibria with rationally inattentive consumers,” American Economic Review: Papers and Proceedings, 102(3), 24-29.
Mookherjee, D.
(1984): “Optimal incentive schemes with many agents,” Review of Economic Studies, 51(3), 433-446.
Murff, H. J., F. FitzHenry, M. E. Matheny, N. Gentry, K. L. Kotter, K. Crimin, R. S. Dittus, A. K. Rosen, P. L. Elkin, S. H. Brown, and T. Speroff (2011): “Automated identification of postoperative complications within an electronic medical record using natural language processing,” Journal of American Medical Association, 306(8), 848-855.
Ravid, D. (2017): “Bargaining with rational inattention,” Working Paper.
Saint-Paul, G. (2017): “A “quantized” approach to rational inattention,” European Economic Review, 100, 50-71.
Shannon, C. E. (1948): “A mathematical theory of communication,” Bell Labs Technical Journal, 27(3), 379-423.
Sims, C. A. (1998): “Stickiness,” Carnegie-Rochester Conference Series on Public Policy, 49, 317-356.
——— (2003): “Implications of rational inattention,” Journal of Monetary Economics, 50(3), 665-690.
Singer, N. (2013): “In a mood? Call center agents can tell,” New York Times, October 12.
Sobel, J. (2015): “Broad terms and organizational codes,” Working Paper.
Woodford, M. D. (2009): “Information-constrained state-dependent pricing,” Journal of Monetary Economics, 56(S), S100-S124.
Yang, M. (2019): “Optimality of debt under flexible information acquisition,”