An Agent-Based Model of Delegation Relationships With Hidden-Action: On the Effects of Heterogeneous Memory on Performance
Patrick Reinwald
Department of Management Control and Strategic Management, University of Klagenfurt, Klagenfurt, Austria
Email: [email protected]
ORCID: 0000-0002-2907-7939
Stephan Leitner
Department of Management Control and Strategic Management, University of Klagenfurt, Klagenfurt, Austria
Email: [email protected]
ORCID: 0000-0001-6790-4651
Friederike Wall
Department of Management Control and Strategic Management, University of Klagenfurt, Klagenfurt, Austria
Email: [email protected]
ORCID: 0000-0001-8001-8558
Abstract—We introduce an agent-based model of delegation relationships between a principal and an agent, which is based on the standard hidden-action model introduced by Holmström and, by doing so, provide a model which can be used to further explore theoretical topics in managerial economics, such as the efficiency of incentive mechanisms. We employ the concept of agentization, i.e., we systematically transform the standard hidden-action model into an agent-based model. Our modeling approach allows for a relaxation of some of the rather "heroic" assumptions included in the standard hidden-action model, whereby we particularly focus on assumptions related to (i) the availability of information about the environment and (ii) the principal's and the agent's cognitive capabilities (with a particular focus on their learning capabilities and their memory). Our analysis focuses on how close and how fast the incentive scheme, which endogenously emerges from the agent-based model, converges to the solution proposed by the standard hidden-action model. Also, we investigate whether a stable solution can emerge from the agent-based model variant. The results show that in stable environments the emergent result can nearly reach the solution proposed by the standard hidden-action model. Surprisingly, the results indicate that turbulence in the environment leads to stability in earlier time periods.
Keywords – Agent-based modeling and simulation; Management control system; Information asymmetry; Complexity economics; Agentization.
I. INTRODUCTION
The standard hidden-action model introduced by Holmström [1] describes, in general, a delegation relationship between a principal and an agent. It covers a situation in which the principal delegates a task to an agent. The agent selects an effort level in order to carry out this task, which is not observable by the principal (it is hidden). The agent's effort together with some environmental impact produces outcome which is to be shared between the principal and the agent. The principal's objective is to maximize her share of the outcome associated with the task, while the agent strives for maximizing his share of the outcome minus his disutility from making effort to carry out the task. The standard hidden-action model proposes an incentive scheme (i.e., a rule to share the outcome) which aligns the agent's and the principal's objectives so that the principal's utility finds its maximum. As only the outcome (but not the effort level) is observable for the principal, the sharing rule is based on outcome only.

In order to derive the optimal sharing rule, agency theory makes rather "heroic" assumptions about the capabilities of both parties with respect to (i) information processing capacity, (ii) availability of information, and (iii) capability to find the optimal solution immediately. Axtell [2] argues that the most "heroic" assumptions are agent homogeneity, non-interactiveness, and the existence of equilibrium solutions, and refers to them as the "neoclassical sweetspot". The result of these rather "heroic" assumptions is that the explanatory and the predictive power regarding real-world problems substantially decreases [3]. Critics often refer to principal-agent models as "toy problems" and argue that solutions derived from such "toy problems", where authors can "assume complicating things away", are of limited use for solving real-world problems [4]-[6].

We take up on this critique and put the "heroic" assumptions included in the standard hidden-action model [1] in the focus of this paper. In the vein of Leitner and Wall [7], we transfer the standard hidden-action model into an agent-based model following a procedure introduced by Guerrero and Axtell [8] and Leitner and Behrens [9], which allows us to relax some of the included assumptions. In particular, we relax the assumptions related to the principal's and the agent's (i) information processing capacity and (ii) availability of information. We model situations in which the principal and the agent no longer have full information about the environment but have to learn this information over time. In consequence, they can no longer find the optimal (second-best) solution immediately, as is the case in the standard hidden-action model. We, therefore, endow the principal and the agent with the capability to search for the best possible incentive scheme over time by employing a hill-climbing algorithm. Please notice that the optimal solution, which is derived from the standard hidden-action model for cases in which the agent's effort is not observable, is referred to as the second-best solution. The first-best solution, on the contrary, assumes that the agent's effort is observable [10], which, in consequence, means that no incentive problem in the above outlined sense arises.

The remainder of this paper is organized as follows: Section II introduces two variants of the hidden-action model. First, the main features of the standard hidden-action model are presented.
We then introduce the agent-based model variant, which relaxes some of the assumptions included in the standard hidden-action model. In Section III, we elaborate on the simulation setup, and introduce and discuss the results. Section IV concludes the paper and gives an outlook on future work.

II. THE HIDDEN-ACTION MODEL
This section summarizes the main features of the standard hidden-action model introduced in [1] and proposes an agent-based representation of the hidden-action problem in which the principal and the agent are endowed with limited and heterogeneous memory.
A. The standard hidden-action model
The standard hidden-action model, which describes a delegation relationship between a principal and an agent, was first described by Holmström [1]. This model covers a situation in which a principal offers an agent a contract upon a task to be carried out and a sharing rule over the generated outcome, among other things. If the agent agrees on the conditions stated in the contract, he exerts effort to complete the specified task. Together with an exogenous factor, his effort generates outcome, but it also leads to disutility for him. For the principal, however, the effort the agent has carried out is unobservable, which results in a situation where only the outcome can be used as a basis for the sharing rule. Both the principal and the agent are individual utility maximizers [10], [5]. Furthermore, it is assumed that the principal is risk-neutral and characterized by her utility function

U_P(x, s) = x − s(x),   (1)

whereby x represents the generated outcome and s = s(x) is the function for the sharing rule. As mentioned before, the outcome

x = f(a, θ),   (2)

is a function of the agent's effort a and the exogenous factor θ. The agent, who is assumed to be risk-averse, is characterized by the utility function

U_A(s, a) = V(s) − G(a),   (3)

where V(s) represents the utility generated from the compensation and G(a) denotes the disutility from the exerted effort. For the contract to be effective, two additional constraints have to be fulfilled. First, the participation constraint

E(U_A(s, a)) ≥ Ū,   (4)

which assures that the agent gets at least the utility he would get from the best outside option Ū. Second, the incentive compatibility constraint

a ∈ arg max_{a'} E{U_A(s(x), a')},   (5)

which aligns the agent's goal (maximizing his utility) with the goal of the principal. For an extensive review and the formal solution to this problem, the reader is, for example, referred to Holmström [1] and Lambert [10].
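For readers who prefer a compact formal statement, equations (1)-(5) can be combined into the principal's program as follows; this is a standard textbook-style formulation consistent with [1] and [10], written in the notation introduced above:

\begin{align*}
\max_{s(\cdot),\,a}\quad & E\left[\,x - s(x)\,\right] && \text{principal's expected utility, cf. (1)}\\
\text{s.t.}\quad & E\left[\,V(s(x)) - G(a)\,\right] \;\geq\; \bar{U} && \text{participation constraint (4)}\\
& a \;\in\; \arg\max_{a'}\, E\left[\,V(s(x)) - G(a')\,\right] && \text{incentive compatibility (5)}
\end{align*}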
B. Agent-based model variant
We transfer the standard hidden-action model [1] into an agent-based model following the agentization procedure introduced by Guerrero and Axtell [8] and Leitner and Behrens [9]. We limit the principal's and the agent's availability of information about the environment and endow them with the capability to learn about the environment over time. As a consequence of this change, the principal and the agent can no longer find the optimal (second-best) contract immediately but search for it over time. This stepwise search for the optimal contract requires us to switch from a one-periodic to a multi-periodic model in which the principal and the agent agree upon a contract in every time step. Due to the employed learning mechanism (which is introduced below), the principal's and the agent's states of information change over time. We do not give the agent the possibility to allocate effort over multiple periods. A condensed overview of the sequence of events is provided in Figure 1.

Figure 1. Flow diagram

We indicate time steps by t = 1, ..., T. After the adjustment to a multi-periodic model, the principal's utility function introduced in (1) takes the form of

U_P(x_t, s(x_t)) = x_t − s(x_t),   (6)

where x_t denotes the outcome and s(x_t) is the agent's compensation in time step t. The production function introduced in (2) is adapted, so that

x_t = a_t + θ_t,   (7)

where a_t stands for the effort level selected by the agent from a set of all feasible actions A_t in t. A_t is a subset of A and ranges from zero to the effort level which is required to achieve the second-best solution according to the standard hidden-action model. Setting the boundaries in this way is in line with [1] and assures the feasibility of the solution. We denote the effort level for which the principal designs the contract by ã_t and refer to it as incited effort. The variable θ_t denotes the realized exogenous factor in t, which follows a Normal distribution. Given these adaptations, the agent's compensation function s(x_t) takes the form of

s(x_t) = x_t · p_t,   (8)

where p_t ∈ [0, 1] is the premium parameter in t. After the adaptation to a multi-periodic model, the agent's utility function takes the form of

U_A(s(x_t), a_t) = V(s(x_t)) − G(a_t), with V(s(x_t)) = −e^(−η·s(x_t)) / η,   (9)

where η represents the agent's Arrow-Pratt measure of risk aversion [11] and G(a_t) denotes the disutility of the exerted effort. The agent strives for maximizing this utility function but, due to the adaptations outlined below, replaces the environmental factor θ_t by his expectation thereof. For the computation of the agent's expectation about the environmental factor see (11) and (13).

The most central change during the process of agentization is the relaxation of the availability of information about the environment for the principal and the agent. In the standard hidden-action model [1], it is assumed that both parties have knowledge about the distribution of the exogenous factor and its parameterization, so that they can compute an expected value for the environmental variable. In contrast to this, in the agent-based model variant they have to learn about the environment over time.
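To make the period-t payoff computations (6)-(9) concrete, consider the following minimal sketch. The paper's implementation is in MATLAB; this illustration is in Python, and the quadratic disutility G(a) = a² is an assumed specification for illustration only:

import math

def outcome(a_t, theta_t):
    """Production function (7): effort plus exogenous factor."""
    return a_t + theta_t

def compensation(x_t, p_t):
    """Sharing rule (8): the agent receives the share p_t of the outcome."""
    return x_t * p_t

def principal_utility(x_t, p_t):
    """Principal's utility (6): outcome minus the agent's compensation."""
    return x_t - compensation(x_t, p_t)

def agent_utility(x_t, p_t, a_t, eta):
    """Agent's utility (9): CARA valuation V of the compensation minus the
    disutility of effort (quadratic G assumed here)."""
    s = compensation(x_t, p_t)
    return -math.exp(-eta * s) / eta - a_t ** 2

Note that, as described above, the agent evaluates these quantities with his expectation of θ_t substituted into (7) rather than the realized value.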
The assumptions related to the environment are relaxed in the following way:

1) The principal and the agent no longer have full information about the environmental factor at the beginning of the simulation.
2) The principal and the agent are able to individually learn about the environment over time.
3) The principal and the agent also have the cognitive ability to store the gathered information about the environment in their memory.

These changes are implemented by a simultaneous and sequential learning model (see also [7]). Part of this learning model is not only the process of gathering and storing the information but also the application of this information to compute the estimation of the exogenous factor. Both the principal and the agent are now able to individually learn about the environment and estimate the exogenous factor in the same way: They draw their conclusions about the environment on the basis of the observable outcome x_t (see (7)), which is possible because the only unknown information in the equation is the realized environmental factor. Recall, both the principal and the agent know the realized outcome x_t, the agent knows the exerted effort a_t, and the principal knows the incited effort ã_t. Using this information, the principal and the agent can infer estimations of the environmental factor from the realized outcome, according to

θ̃_t = x_t − ã_t,   (10)

and

θ_t = x_t − a_t,   (11)

respectively. Furthermore, they are able to privately store the estimations in their memory until their defined cognitive capacity (m^P for the principal and m^A for the agent) is reached. Once the cognitive capacity is reached, the oldest entries are replaced by the most recent observations.
Their expectation about the environmental factor is computed by averaging all privately stored and retrievable estimations of the exogenous factor, so that

θ̂^P_t = (1/(t−1)) · Σ_{n=1}^{t−1} θ̃_n   if m^P = ∞,
θ̂^P_t = (1/m^P) · Σ_{n=n₀}^{t−1} θ̃_n, with n₀ = 1 for t ≤ m^P and n₀ = t − m^P for t > m^P,   if m^P < ∞,   (12)

for the principal and

θ̂^A_t = (1/(t−1)) · Σ_{n=1}^{t−1} θ_n   if m^A = ∞,
θ̂^A_t = (1/m^A) · Σ_{n=n₀}^{t−1} θ_n, with n₀ = 1 for t ≤ m^A and n₀ = t − m^A for t > m^A,   if m^A < ∞,   (13)

for the agent.

TABLE I. Notation for the agent-based model variant

Description | Parameter
Time steps | t
Principal's utility | U_P
Agent's utility | U_A
Agent's Arrow-Pratt measure of risk aversion | η
Agent's share of outcome in t | s(x_t) = x_t · p_t
Outcome | x_t = a_t + θ_t
Principal's expected outcome | x̃^P_t
Premium parameter in t | p_t
Exerted effort level in t | a_t
Incited effort level set by the principal in t | ã_t
Set of all feasible actions in t | A_t
Exogenous (environment) variable in t | θ_t
Principal's estimation of the realized exogenous factor in t | θ̃_t
Mental capability of the principal | m^P
Mental capability of the agent | m^A
Averaged expected exogenous factor of the principal | θ̂^P_t
Averaged expected exogenous factor of the agent | θ̂^A_t

Next, the principal randomly discovers two alternative effort levels in the search space A_t which, together with ã_t, serve as candidates for ã_{t+1}. Using her expectation about the environment θ̂^P_t, the principal computes the expected outcome x̃^P_t (according to (7)) and her expected utility (according to (6)) associated with all three candidates. Notice that for all effort levels the associated premium parameter is computed according to

p_t = arg max_{p ∈ [0,1]} U_P(x̃^P_t, s(x̃^P_t)).   (14)

Finally, the principal selects the candidate with the highest expected utility as desired effort level ã_{t+1} for period t+1 and communicates the associated premium parameter p_{t+1} to the agent. In period t+1, the agent starts over the procedure by selecting an effort level a_{t+1} using p_{t+1} (see Figure 1).
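The learning-and-search loop described above can be sketched as follows; a minimal Python illustration (the paper's implementation is in MATLAB). The bounded memory of (12) and (13) maps naturally onto a deque with a maximal length. The helper best_premium approximates (14) by grid search; note that we additionally enforce the participation constraint (4) when choosing p, which the compact statement of (14) leaves implicit — this, and the quadratic disutility in the agent's utility, are assumptions of the sketch:

import math
import random
from collections import deque

class LearningParty:
    """Bounded memory of inferred exogenous factors, cf. (10)-(13)."""

    def __init__(self, m=None):
        # m = None models unlimited memory (m = infinity); otherwise the
        # deque drops the oldest entry once the capacity m is reached.
        self.memory = deque(maxlen=m)

    def observe(self, x_t, effort):
        # (10)/(11): infer the realized exogenous factor from the outcome.
        self.memory.append(x_t - effort)

    def expectation(self):
        # (12)/(13): average over all stored and retrievable estimations.
        return sum(self.memory) / len(self.memory) if self.memory else 0.0

def best_premium(x_exp, a, eta, u_bar, grid=100):
    """Approximate (14) by a grid search over p in [0, 1], keeping only
    premia under which the agent's expected utility (9) reaches the
    outside option u_bar (participation constraint (4))."""
    best_p, best_u = 1.0, float("-inf")
    for i in range(grid + 1):
        p = i / grid
        s = x_exp * p
        u_agent = -math.exp(-eta * s) / eta - a ** 2  # quadratic G assumed
        if u_agent < u_bar:
            continue
        u_principal = x_exp - s                       # cf. (6)
        if u_principal > best_u:
            best_p, best_u = p, u_principal
    return best_p

def principal_step(a_incited, a_star, principal, eta, u_bar):
    """One search step of the principal: the incumbent incited effort plus
    two randomly discovered alternatives from A_t = [0, a_star] serve as
    candidates for the next period."""
    theta_hat = principal.expectation()
    candidates = [a_incited,
                  random.uniform(0, a_star),
                  random.uniform(0, a_star)]
    best = None
    for a in candidates:
        x_exp = a + theta_hat                         # expected outcome, cf. (7)
        p = best_premium(x_exp, a, eta, u_bar)
        u = x_exp * (1.0 - p)                         # expected utility, cf. (6)
        if best is None or u > best[0]:
            best = (u, a, p)
    return best[1], best[2]   # incited effort and premium for period t+1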
III. SIMULATION PARAMETERS AND RESULTS

We have conducted 8 scenarios with different underlying assumptions about the cognitive capacity (memory) of the principal and the agent and two different levels of environmental turbulence.

Recall that the environmental variable is modeled to follow a Normal distribution. The variations in environmental turbulence are operationalized by altering this distribution's standard deviation σ, which is set relative to the optimal outcome x* of the standard hidden-action model (the second-best solution in [1]): for the case of a rather stable environment, σ is set to a small fraction of x*, and for a rather unstable environment to a substantially larger fraction of x*. The distribution's mean is fixed at zero.

In the 8 investigated scenarios, we have, in general, two configurations related to the cognitive capacity of the principal and the agent. The first one assumes that the principal has an advantage in cognitive capacity, and the second assumes the opposite situation. Each of them consists of 4 scenarios in which the party in advantage always has unlimited memory and the other party has a memory of length either one or five. The agent's Arrow-Pratt measure η is always fixed at a positive value, which characterizes a risk-averse agent.

For every scenario, we perform 700 repetitions (R = 700); our analysis focuses on the first 20 time steps (t = 1, ..., 20). In order to implement our simulation model we use MATLAB®.

We report the averaged normalized effort level carried out by the agent in every period t as performance measure. Therefore, in every simulation run r = 1, ..., R and for every period t = 1, ..., T we track the effort level a_tr chosen by the agent and normalize it by the optimal level of effort a*:

φ_t = (1/R) · Σ_{r=1}^{R} (a_tr / a*).   (15)

We report the averaged normalized effort level as a measure of performance, as it provides fundamental insights into the functioning of the emergent incentive schemes without further perturbation caused by environmental turbulence.
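Computing (15) is straightforward given the tracked effort levels; a minimal sketch, assuming efforts is an R-by-T array holding the values a_tr:

import numpy as np

def averaged_normalized_effort(efforts, a_star):
    """Eq. (15): for each period t, average the exerted effort over all
    R repetitions and normalize by the optimal effort a*."""
    return efforts.mean(axis=0) / a_star

# Example: R = 700 runs, T = 20 periods -> one phi_t value per period.
phi = averaged_normalized_effort(np.zeros((700, 20)), a_star=1.0)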
A. Results 1: Advantage in information for the agent

First, we analyze the results for the situation with the advantage for the agent. In the two scenarios with a relatively stable environment, we can see that performance at the end of the observation period is significantly higher for cases in which the principal's cognitive capacity is higher, too (in terms of an increase in memory length): As the principal's memory increases from m^P = 1 to m^P = 5, the observed final averaged normalized effort level increases as well (see the two top subplots in Figure 2). We can observe similar results for the scenarios with a rather unstable environment: here, too, the final performance measure increases as the principal's memory grows from m^P = 1 to m^P = 5 (see the two bottom subplots in Figure 2). Furthermore, in line with intuition, we can observe that an increase in environmental turbulence leads to a decrease in the effort exerted by the agent.

We further analyze the effect of the principal's memory on the average variance of the results (i.e., the average variance of the effort exerted by the agent throughout the entire observation period): For the scenarios in a rather unstable environment, the average standard deviation significantly decreases as the principal's memory increases from m^P = 1 to m^P = 5. For the cases in a rather stable environment, no significant differences can be observed. The significance of this difference was confirmed using an F-test. Thus, increasing the principal's cognitive capacity in an unstable environment significantly reduces the variance of the effort induced by the incentive scheme (and exerted by the agent). In other words, if the principal manages to increase her memory in a turbulent environment, she not only increases the average normalized effort level but also significantly reduces the risk of extreme deviations from this value.

Figure 2. Situations with an advantage regarding cognitive capacity for the agent (A) and the cognitive capacity of the principal (P) set to either m^P = 1 or m^P = 5. Scenarios are plotted for two different environmental situations, stable and unstable.

Finally, we take a look at the stability of the averaged normalized effort level. We regard a solution as stable as soon as (i) the averaged normalized effort level at period t is not significantly different from the same measure in period t − 1, and (ii) this condition does not change after this point in time. For this analysis, we perform a t-test. First, we focus on rather stable environments: For cases with a low (high) cognitive capacity of the principal, i.e., m^P = 1 (m^P = 5), we observe a stable solution from period t = 7 (t = 9) onwards. For unstable environments, we can observe that a stable solution emerges earlier in both scenarios: For m^P = 1 (m^P = 5), the solution becomes stable in period t = 4 (t = 6). This is a counter-intuitive result, as one would expect that turbulence in the environment leads to instability in the emergent solution.
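The stability criterion above can be operationalized as follows; a sketch using scipy.stats, with the significance level alpha left as a parameter (the default of 0.05 is an assumption for illustration):

from scipy import stats

def first_stable_period(samples, alpha=0.05):
    """Earliest period t such that a t-test finds no significant difference
    between the effort levels of periods t-1 and t, and the same holds for
    all later pairs of consecutive periods; samples[t] holds the R observed
    effort levels of period t."""
    stable_from = None
    for t in range(1, len(samples)):
        _, p_value = stats.ttest_ind(samples[t - 1], samples[t])
        if p_value >= alpha:       # no significant change between t-1 and t
            if stable_from is None:
                stable_from = t
        else:                      # condition (ii) violated: reset
            stable_from = None
    return stable_from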
B. Results 2: Advantage in information for the principal

This section analyzes the cases in which the principal has an advantage in information. In stable environments, we cannot observe significant differences: the observed final averaged normalized effort levels for an agent's memory of m^A = 1 and of m^A = 5 are close to each other (see the two top subplots in Figure 3). The same result can be observed for unstable environments (see the two bottom subplots in Figure 3). Additionally, in line with intuition, we can observe that an increase in environmental turbulence leads to a decrease in the effort exerted by the agent.

Further, we analyze the effects of the agent's memory on the average variance of the results (i.e., the average variance of the effort exerted by the agent throughout the entire observation period). For unstable environments, the observed results are similar to the ones presented in the previous section: the variances of the exerted effort levels are significantly different at the 99%-level for m^A = 1 and m^A = 5, with the standard deviation being lower for the longer memory. For stable environments, the results indicate that increasing the agent's memory significantly decreases the standard deviation of the exerted effort at the 95%-level. Thus, increasing the agent's cognitive capacity significantly reduces the variance of the effort induced by the incentive scheme (and exerted by him). In other words, if the agent manages to increase his memory, he significantly reduces the risk of extreme deviations from the performance value.

Figure 3. Scenarios with an advantage regarding cognitive capacity for the principal (P) and the cognitive capacity of the agent (A) set to either m^A = 1 or m^A = 5. Scenarios are plotted for two different environmental situations, stable and unstable.

Finally, we investigate the stability of the averaged normalized effort level. First, we focus on the scenarios with rather stable environments: For a cognitive capacity of the agent of m^A = 1 and m^A = 5, we reach the stable point at period t = 14 and t = 13, respectively. For unstable environments, we can observe in both scenarios that a stable solution emerges earlier: For m^A = 1 (m^A = 5), the performance reaches a stable point at period t = 9 (t = 7). This leads to the same counter-intuitive result as in the section before, as one would expect that turbulence in the environment leads to instability in the emergent solution.

IV. CONCLUSION AND FUTURE WORK
The results presented in this paper deliver some insights about the effects of heterogeneous agents in a hidden-action setting:

• Our results suggest that gathering information about the environment is a good strategy for the principal to increase her utility, especially in situations in which the environment is rather turbulent.
• In turbulent environments, increasing the memory of both the principal and the agent always has a positive effect on the variance of the results. This means that increasing the memory significantly reduces the risk of extreme deviations from the performance measures reported above.
• In stable environments, the results, on the contrary, suggest that only an increase of the agent's memory leads to a significant decrease in the exerted effort's variance.
• Surprisingly, the presented results indicate that turbulence has a positive effect on stability, so that a stable solution emerges earlier in turbulent environments.

Future work might want to investigate more deeply the effects of heterogeneous memory in the hidden-action setting (e.g., more memory lengths) and also include cognitive biases when characterizing the principal's and the agent's cognitive capabilities (such as the recency or the primacy effect [12]). Another potentially fruitful avenue for future research might be to limit the principal's knowledge about the characteristics of the agent (such as the utility function).
ACKNOWLEDGMENT