On the Value of Penalties in Time-Inconsistent Planning
Susanne Albers∗  Dennis Kraft†

Abstract
People tend to behave inconsistently over time due to an inherent present bias. As this may impair performance, social and economic settings need to be adapted accordingly. Common tools to reduce the impact of time-inconsistent behavior are penalties and prohibition. Such tools are called commitment devices. In recent work Kleinberg and Oren [5] connect the design of prohibition-based commitment devices to a combinatorial problem in which edges are removed from a task graph G with n nodes. However, this problem is NP-hard to approximate within a ratio less than √n/3 [2]. To address this issue, we propose a penalty-based commitment device that does not delete edges but raises their cost. The benefits of our approach are twofold. On the conceptual side, we show that penalties are up to 1/β times more efficient than prohibition, where β ∈ (0, 1] parameterizes the present bias. On the computational side, we significantly improve approximability by presenting a 2-approximation algorithm for allocating the penalties. To complement this result, we prove that optimal penalties are NP-hard to approximate within a constant ratio greater than 1.

Most people make long-term plans. They intend to eat healthy, save money, prepare for exams, exercise regularly and so on. Curiously, the same people often change their plans at a later point in time. They indulge in fast food, squander their money, fail to study and skip workouts. Although change may be necessary due to unforeseen events, people often change their plans even if the circumstances stay the same. This type of time-inconsistent behavior is a well-known phenomenon in behavioral economics and might impair a person's performance in social or economic domains [1, 8].

A sensible explanation for time-inconsistent behavior is that people are present biased and assign disproportionately greater value to the present than to the future. Consider, for instance, a scenario in which a student named Alice attends a course over several weeks.
To pass the course, Alice either needs to solve a homework exercise each week or give a presentation once. The presentation incurs a one-time effort of 3, whereas each homework exercise incurs an effort of 1. Assume that she automatically fails the course if she misses a homework assignment before she has given a presentation. If the course lasts for more than 3 weeks, she clearly minimizes her effort by giving a presentation in the first week. Paradoxically, if Alice is present biased, she might solve all homework exercises instead. The reason for this is the following:

Suppose Alice perceives present effort accurately, but discounts future effort by a factor of β = 1/3. In the first week Alice must decide between solving the homework exercise or giving a presentation. Clearly, the homework incurs less immediate effort than the presentation. Furthermore, Alice can still give a presentation next week. Her perceived effort for doing the homework this week and giving the presentation the week after is 1 + 3β = 2. To Alice this plan appears more convenient than giving the presentation right away. Consequently, she does the homework. However, come next week she changes this plan and postpones the presentation in the same way.

∗ Department of Computer Science, Technical University of Munich, 85748 Garching, Germany; [email protected]. Work supported by the European Research Council, Grant Agreement No. 691672.
† Department of Computer Science, Technical University of Munich, 85748 Garching, Germany; [email protected].
Previous Work
Time-inconsistent behavior has been studied extensively in behavioral economics. For an introduction to the topic refer, for example, to [1]. Alice's scenario demonstrates how time-inconsistency arises whenever people are present biased. Alice evaluates her preferences based on a well-established discounting model called quasi-hyperbolic discounting [7]. As her story shows, quasi-hyperbolic discounting tempts people to make poor decisions. To prevent poor decisions, social and economic settings need to be adapted accordingly. Depending on the domain, such adaptations might be implemented by governments, companies, teachers or people themselves. We call these entities designers; their motivation can be benevolent or self-serving. In either case, the designer's objective is to commit people to a certain goal. Their tools are called commitment devices and may include rewards, penalty fees and strict prohibition [3, 9].

Until recently, the study of time-inconsistent behavior lacked a unifying and expressive framework. However, groundbreaking work by Kleinberg and Oren closed this gap by reducing the behavior of a quasi-hyperbolic-discounting person to a simple planning problem in task graphs [5]. Their framework has helped to identify various structural properties of social and economic settings that affect the performance of present-biased individuals [5, 10]. It has also been extended to people whose present bias varies over time [4] as well as people who are aware of their present bias and act accordingly [6]. We will formally introduce the framework in Section 2. A significant part of Kleinberg and Oren's work is concerned with the study of a simple yet powerful commitment device based on prohibition [5]. In particular, they demonstrate how performance can be improved by removing a strategically chosen set of edges from the task graph. The drawback of their approach is its computational complexity.
As it turns out, an optimal commitment device is NP-hard to approximate within a ratio less than √n/3, where n denotes the number of nodes in the task graph [2]. Currently, the best known polynomial-time approximation achieves a ratio of √n [2]. It should be mentioned that Kleinberg and Oren's framework has also been used to analyze reward-based commitment devices [2, 10]. Unfortunately, their computational complexity does not permit a polynomial-time approximation within any finite ratio unless P = NP [2].
Our Contribution
To circumvent the theoretical bottleneck mentioned above, we propose a natural generalization of Kleinberg and Oren's commitment device. Instead of prohibition, our commitment device is based on penalty fees, a standard tool in the economic literature [3, 9]. This means that the designer is free to raise the cost of arbitrary edges in the task graph. We call such an assignment of penalties a cost configuration. The designer's objective is to construct cost configurations that are as efficient as possible.

In Section 3 we conduct a quantitative comparison between the efficiency of prohibition-based and penalty-based commitment devices. Assuming that optimal solutions are known, we show that penalties are strictly more powerful than prohibitions. In particular, we show that penalties may outperform prohibitions by a factor of 1/β, where β parameterizes the present bias. This result is tight. In Section 4 we investigate the computational complexity of our commitment device. Using a reduction from 3-SAT, we argue that the construction of an efficient cost configuration is NP-hard when posed as a decision problem. A generalization of this reduction proves NP-hardness for approximations within a constant ratio greater than 1. Unless P = NP, this dismisses the existence of a polynomial-time approximation scheme. While analyzing the complexity of our commitment device we also point to a remarkable structural property. More specifically, we show that every cost configuration admits another cost configuration of comparable efficiency that assigns its cost entirely along a single path. Assuming that the path is known in advance, we provide an algorithm for constructing such a cost configuration in polynomial time. This result is important for the design of exact algorithms as it reduces the search space to the set of paths through the task graph. Finally, Section 5 introduces a 2-approximation algorithm for our commitment device.
This is the main result of our work and a considerable improvement over the complexity-theoretic barrier of √n/3 for approximating prohibition-based commitment devices [2].

In the following, we introduce Kleinberg and Oren's framework [5]. Let G = (V, E) be a directed acyclic graph with n nodes that models a given long-term project. The edges of G correspond to the tasks of the project and the nodes represent the states. In particular, there exists a start state s and a target state t. Each path from s to t corresponds to a valid sequence of tasks to complete the project. The effort of a specific task is captured by a non-negative cost c(e) assigned to the associated edge e.

To complete the project, an agent with a present bias of β ∈ (0, 1] incrementally constructs a path from s to t as follows: At any node v different from t, the agent evaluates her lowest perceived cost. For this purpose she considers all paths P leading from v to t. However, she only anticipates the cost of the first edge of P correctly; all other edges of P are discounted by her present bias. More formally, let d(w) denote the cost of a cheapest path from node w to t. The agent's lowest perceived cost at v is defined as ζ(v) = min{c(v, w) + βd(w) | (v, w) ∈ E}. We assume that she only traverses edges (v, w) that minimize her anticipated cost, i.e. edges for which c(v, w) + βd(w) = ζ(v). Ties are broken arbitrarily. For convenience, we define the perceived cost of (v, w) as η(v, w) = c(v, w) + βd(w). The agent is motivated by an intrinsic or extrinsic reward r collected at t. As she receives this reward in the future, she perceives its value as βr at each node different from t. When located at v, she compares her lowest perceived cost to the anticipated reward and continues moving if and only if ζ(v) ≤ βr. Otherwise, if ζ(v) > βr, we assume she abandons the project.
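To make the walk dynamics concrete, the agent's behavior can be simulated in a few lines of Python. This is a minimal sketch; the dict-based graph encoding and the function names are our own and not part of the framework.

```python
def cheapest_costs(graph, t):
    """Return d, where d(w) is the cost of a cheapest path from w to t (graph is a DAG)."""
    memo = {t: 0.0}
    def d(v):
        if v not in memo:
            memo[v] = min(c + d(w) for w, c in graph[v])
        return memo[v]
    return d

def agent_walk(graph, s, t, beta, r):
    """Path taken by a present-biased agent, or None if she abandons the project."""
    d = cheapest_costs(graph, t)
    path, v = [s], s
    while v != t:
        # Perceived cost of edge (v, w): first edge exact, remainder discounted by beta.
        w, zeta = min(((w, c + beta * d(w)) for w, c in graph[v]),
                      key=lambda pair: pair[1])
        if zeta > beta * r:
            return None  # lowest perceived cost exceeds the perceived reward beta*r
        path.append(w)
        v = w
    return path

# Toy example: the direct edge (s, t) costs 4; the detour via a costs 2 in total.
G = {'s': [('a', 1.0), ('t', 4.0)], 'a': [('t', 1.0)], 't': []}
print(agent_walk(G, 's', 't', beta=0.5, r=5.0))  # ['s', 'a', 't']
print(agent_walk(G, 's', 't', beta=0.5, r=2.0))  # None: zeta(s) = 1.5 > beta*r = 1.0
```

Note that the sketch breaks ties by picking a single minimizer, whereas the framework considers every tie-breaking outcome when deciding whether G is motivating.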
We call G motivating if she does not abandon while constructing her path from s to t. Note that in some graphs the agent can take several paths from s to t due to ties between incident edges. In this case, G is considered motivating if she does not abandon on any of these paths.

For the sake of a clear presentation, we will assume throughout this work that each node of G is located on a path from s to t. This assumption is sensible for the following reason: Clearly, the agent can only visit nodes that are reachable from s. Furthermore, she is not willing to enter nodes that do not lead to the reward. Consequently, only nodes that are on a path from s to t are relevant to her behavior. Note that all nodes that do not satisfy this property can be removed from G in a simple preprocessing step.

To illustrate the model, we revisit Alice's scenario from Section 1. Assume that the course takes m weeks. We represent each week i by a distinct node v_i and set s = v_1. Furthermore, we introduce a target node t that marks the passing of the course. Each week i < m Alice can either give a presentation or proceed with the homework. We model the first case by an edge (v_i, t) of cost 3 and the latter case by an edge (v_i, v_{i+1}) of cost 1. In the last week, i.e. i = m, Alice's only sensible choice is to do the homework. Therefore, edge (v_m, t) is of cost 1. Recall that Alice's present bias is β = 1/3. Moreover, assume that her intrinsic reward for passing is r = 6. For i < m her perceived cost of the edges (v_i, t) is η(v_i, t) = c(v_i, t) = 3. As this exceeds her perceived reward, which is βr = 2, she is never motivated to give a presentation right away. However, her perceived cost of the edges (v_i, v_{i+1}) is at most η(v_i, v_{i+1}) ≤ c(v_i, v_{i+1}) + βc(v_{i+1}, t) ≤ 2. This matches her perceived reward. As a result, she walks from v_1 to v_m along the edges (v_i, v_{i+1}).
Once she reaches v_m she traverses the only remaining edge for a perceived cost of η(v_m, t) = c(v_m, t) = 1 and passes the course. This matches our analysis from Section 1.

In this section we demonstrate how the designer can modify a given project to help the agent reach t. For this purpose, the designer may have several commitment devices at her disposal. A straightforward approach is to increase the reward that the agent collects at t. Although this may keep the agent from abandoning the project prematurely, it has no influence on the path taken by the agent. Furthermore, increasing the reward may be costly for the designer. As a result, the designer has two conflicting objectives. On the one hand, she must ensure that the agent reaches t. On the other hand, she needs to minimize the resources spent. To deal with this dilemma, Kleinberg and Oren allow the designer to prohibit a strategically chosen set of tasks [5]. This commitment device is readily implemented in their framework. In fact, it is sufficient to remove all edges of prohibited tasks. The result is a subgraph G′ that may significantly reduce the reward required to motivate the agent. Unfortunately, an optimal subgraph G′ is NP-hard to approximate within a ratio less than √n/3 [2].

To circumvent this theoretical bottleneck, we propose a different approach. Instead of prohibiting certain tasks we allow the designer to charge penalty fees. Such fees could be implemented in several ways, for instance in the form of donations to charity. Our only assumption is that the designer does not benefit from the fees, i.e. there is no incentive to maximize the fees paid by the agent. Similar to commitment devices based on prohibition, our commitment device is readily implemented in Kleinberg and Oren's framework. The designer simply assigns a positive extra cost ˜c(e) to the desired edges e. The new cost of e is equal to c(e) + ˜c(e). We call ˜c a cost configuration.
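To see the effect of a cost configuration in miniature, consider the following hypothetical example (the graph and all numbers are ours, not from the paper): a cheap-looking edge lures the agent onto a route she will later abandon, and a single penalty on that edge redirects her onto the sound route.

```python
BETA, R = 0.5, 6.0

# Edge costs of a tiny task graph with two routes from s to t.
cost = {('s', 'a'): 2.0, ('a', 't'): 1.0,   # sound route
        ('s', 'b'): 0.0, ('b', 't'): 4.0}   # tempting but expensive route

def eta(e, extra={}):
    """Perceived cost of edge e = (v, w): exact first edge plus discounted remainder.

    Specialized to this two-hop graph, where every non-target node reaches t directly.
    extra is the cost configuration ~c (defaults to the trivial configuration).
    """
    v, w = e
    tail = 0.0 if w == 't' else cost[(w, 't')] + extra.get((w, 't'), 0.0)
    return cost[e] + extra.get(e, 0.0) + BETA * tail

# Without penalties the agent prefers (s, b) ...
assert eta(('s', 'b')) < eta(('s', 'a'))          # 2.0 < 2.5
# ... but once at b she abandons: the last edge exceeds her perceived reward.
assert eta(('b', 't')) > BETA * R                 # 4.0 > 3.0

# A cost configuration ~c placing a penalty of 1 on (s, b) flips her choice.
config = {('s', 'b'): 1.0}
assert eta(('s', 'a'), config) < eta(('s', 'b'), config)   # 2.5 < 3.0
assert eta(('s', 'a'), config) <= BETA * R                 # she stays motivated
```

The penalty is never actually paid: the agent now avoids (s, b) altogether, which foreshadows the observation after Proposition 1.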
Applying a cost configuration to G yields a new task graph with increased edge cost. All concepts of the original framework carry over immediately. Sometimes it will be necessary to compare different commitment devices with each other. To clarify which commitment device we are talking about, we use the following notation whenever necessary: If we consider a subgraph G′, we write d_G′, η_G′ and ζ_G′. Similarly, if we consider a cost configuration ˜c, we write d_˜c, η_˜c and ζ_˜c. Moreover, we denote the trivial cost configuration, i.e. the one that assigns no extra cost, by ˜0.

It is interesting to think of penalty fees as a natural generalization of prohibition. This becomes particularly apparent in the context of Kleinberg and Oren's framework, as we can recreate the properties of any subgraph G′ by a cost configuration ˜c. For this purpose, it is sufficient to assign an extra cost of ˜c(e) = r + 1 to any edge e not contained in G′. As a result, the agent's perceived cost of paths along e certainly exceeds her perceived reward. However, this means that e is irrelevant to the agent's planning and could be deleted from G altogether. Consequently, penalties are at least as powerful as prohibitions. But how much more efficient are penalties in the best case? As the following theorem suggests, cost configurations may outperform subgraphs by a factor of almost 1/β.

Theorem 1.
The ratio between the minimum reward r that admits a motivating subgraph and the minimum reward q that admits a motivating cost configuration is at most 1/β. This bound is tight.

Proof. To see that r/q ≤ 1/β, let G be an arbitrary task graph and consider a subgraph G′ whose only edges are those of a cheapest path P from s to t. Recall that d(s) denotes the cost of P. In G′ the agent's only choice is to follow P. Because her perceived cost is a discounted version of the actual cost, she never perceives a cost greater than d(s) in G′. Consequently, d(s)/β is an upper bound on r. Next, consider an arbitrary cost configuration ˜c. As ˜c only increases edge cost, the agent's lowest perceived cost at s is at least βd(s). We conclude that q must be at least d(s) to be motivating. This yields the desired ratio.

It remains to show the tightness of the result. For this purpose, we construct a task graph G such that: (a) the minimum reward that admits a motivating subgraph is 1/β², and (b) there exists a cost configuration that is motivating for a reward of (1 + 2ε)/β, where ε is an arbitrarily small but positive quantity. Our construction is a modified version of Alice's task graph. Let m = ⌈1/(β²(1 − β)ε)⌉ and assume that G contains a path v_1, ..., v_{2m+1} whose edges are all of cost (1 − β)ε. We call this the main path and set s = v_1 and t = v_{2m+1}. In addition to the main path, each v_i with i ≤ 2m has a shortcut to t via a common node w. The edges (v_i, w) are free, whereas (w, t) is of cost 1/β. Figure 1 illustrates the structure of G. Note that the drawing merges some of the edges (v_i, w) for a concise representation.

We proceed to argue that G satisfies (a). For the sake of contradiction, assume the existence of a subgraph G′ that is motivating for a reward r < 1/β². In this case the agent must not take shortcuts, as her perceived cost at w would exceed her perceived reward. Therefore, she must follow the main path.
In particular, she must visit each node v_i on the first half of the path, i.e. i ≤ m + 1. At each of these nodes, her lowest perceived cost is realized along the edge (v_i, v_{i+1}). Essentially, there are two ways she can come up with this cost. First, she might plan to take a shortcut at a later point in time. As a result, we get η_G′(v_i, v_{i+1}) ≥ c(v_i, v_{i+1}) + βc(w, t) > 1. Secondly, she might plan to stay on the main path. In this case she must traverse at least m edges, each of which contributes β(1 − β)ε or more to η_G′(v_i, v_{i+1}). Consequently, we get η_G′(v_i, v_{i+1}) ≥ mβ(1 − β)ε ≥ 1/β ≥ 1. Either way her perceived cost for taking the main path is at least 1. As this tempts her to take the shortcut at v_i, all of the first m + 1 shortcuts must be interrupted in G′. This means she must walk along at least m edges of the main path before taking the first shortcut. As a result, her lowest perceived cost at v_1 is at least ζ_G′(v_1) ≥ mβ(1 − β)ε ≥ 1/β. This is a contradiction to the assumption that r is motivating.

Figure 1: Graph maximizing the ratio between the efficiency of subgraphs and cost configurations.

Next we show how to construct a cost configuration ˜c that satisfies (b). For this purpose it is sufficient to add an extra cost of 2ε to all edges (v_i, w). To upper bound the agent's perceived cost of (v_i, v_{i+1}), assume she plans to take a shortcut in the next step, i.e. at v_{i+1}. For i < 2m we get η_˜c(v_i, v_{i+1}) ≤ c(v_i, v_{i+1}) + β(˜c(v_{i+1}, w) + c(w, t)) = (1 − β)ε + 2βε + 1 < 1 + 2ε. In the special case of i = 2m, the inequality η_˜c(v_i, v_{i+1}) < 1 + 2ε is still satisfied, this time via the direct edge (v_{2m}, t). In contrast, the agent's perceived cost of an immediate shortcut is η_˜c(v_i, w) = 2ε + βc(w, t) = 1 + 2ε for all i ≤ 2m. Therefore, she is never tempted to divert from the main path.
Furthermore, a reward of q = (1 + 2ε)/β is sufficient to keep her motivated.

We now turn our attention to the computational aspects of designing efficient penalty fees. In this section, we assume that the agent's reward is fixed to some value r > 0. Our goal is to compute cost configurations that are motivating for r whenever they exist. Similar to prohibition-based commitment devices [2], this task is NP-hard whenever the agent is present biased, i.e. β < 1. We will prove this claim at the end of the section. But first, assume that we already have partial knowledge of the solution. More precisely, assume we know one of the paths the agent might take in a motivating cost configuration, provided a motivating cost configuration exists. We call this path P. Based on P, Algorithm 1 constructs a cost configuration ˜c that is motivating for a slightly larger reward r + ε.

Algorithm 1: PATHANDFENCE
Input: Task graph G, present bias β, path P = v_1, ..., v_m, positive value ε
Output: Cost configuration ˜c

˜c ← ˜0;
for i from m − 1 down to 1 do
    foreach w ∈ {w′ | (v_i, w′) ∈ E} do
        if w ≠ v_{i+1} then
            ˜c(v_i, w) ← max{0, η_˜c(v_i, v_{i+1}) − η_˜c(v_i, w) + βε/(m − 1)};
return ˜c;

The basic idea of Algorithm 1 is simple. Starting with v_{m−1}, it considers all nodes v_i of P in reverse order. For each v_i it assigns an extra cost of max{0, η_˜c(v_i, v_{i+1}) − η_˜c(v_i, w) + βε/(m − 1)} to the edges (v_i, w) that leave P, i.e. edges different from (v_i, v_{i+1}). As a result, the agent's perceived cost of (v_i, w) is greater than that of (v_i, v_{i+1}) by at least βε/(m − 1). Consequently, she has no incentive to divert from P at v_i. Since the algorithm runs in reverse order, extra cost assigned in iteration i has no effect on the agent's behavior at later nodes, i.e. nodes v_j with j > i. Figuratively speaking, the algorithm builds a fence of penalty fees along P preventing the agent from leaving P. For this reason, we call the algorithm PATHANDFENCE. As the next proposition suggests, cost configurations of this particular fence structure can achieve almost the same efficiency as any other cost configuration. Due to space constraints, refer to the Appendix for a proof.
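A direct transcription of PATHANDFENCE into Python might look as follows. The dict-based graph encoding is our own, and the sketch favors simplicity over speed: it recomputes the cheapest-path values d from scratch for every perceived-cost query, which is safe because extra cost assigned at v_i never affects nodes later on P.

```python
def path_and_fence(graph, beta, P, eps):
    """Fence the path P = [v_1, ..., v_m] with extra costs, as in Algorithm 1.

    graph maps each node to a dict {successor: edge cost}; P[-1] is the target t.
    Returns the extra-cost map ~c (edges absent from the map get extra cost 0).
    """
    m = len(P)
    extra = {}

    def d(v, memo):
        # Cheapest cost from v to the target under the current extra costs.
        if v not in memo:
            memo[v] = min(c + extra.get((v, w), 0.0) + d(w, memo)
                          for w, c in graph[v].items())
        return memo[v]

    def eta(v, w):
        # Perceived cost of edge (v, w) with respect to the current configuration.
        memo = {P[-1]: 0.0}
        return graph[v][w] + extra.get((v, w), 0.0) + beta * d(w, memo)

    for i in range(m - 2, -1, -1):        # nodes v_{m-1} down to v_1 (0-indexed)
        for w in graph[P[i]]:
            if w != P[i + 1]:             # fence every edge that leaves P at v_i
                gap = eta(P[i], P[i + 1]) - eta(P[i], w) + beta * eps / (m - 1)
                extra[(P[i], w)] = max(0.0, gap)
    return extra

# Example: the agent would be tempted by the free edge (s, b) without the fence.
G = {'s': {'a': 1.0, 'b': 0.0}, 'a': {'t': 1.0}, 'b': {'t': 2.0}, 't': {}}
fence = path_and_fence(G, beta=0.5, P=['s', 'a', 't'], eps=0.1)
```

Here the only fenced edge is (s, b), which receives extra cost η(s, a) − η(s, b) + βε/(m − 1) = 1.5 − 1.0 + 0.025 = 0.525; afterwards the agent strictly prefers staying on P.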
Proposition 1.
Let P be the agent's path from s to t with respect to a cost configuration ˜c∗ that is motivating for a reward r. PATHANDFENCE constructs a cost configuration ˜c that is motivating for a reward of r + ε, where ε is an arbitrarily small but positive quantity.

Proposition 1 has some interesting implications. The first one is of conceptual nature. Note that PATHANDFENCE constructs a cost configuration that never actually charges the agent any extra cost. This suggests the existence of efficient penalty-based commitment devices that do not require the designer to enforce penalties. The mere threat of repercussions appears to be sufficient. The second implication is computational. Clearly, PATHANDFENCE runs in polynomial time with respect to n. In particular, the number of iterations does not depend on the choice of ε. Consequently, PATHANDFENCE can be combined with an exhaustive search algorithm that considers all paths from s to t to search for a motivating cost configuration. Although the number of such paths can be exponential in n, this approach still reduces the size of the search space considerably. Finally, it should be noted that a similar result for commitment devices based on prohibition is unlikely to exist. The reason is that subgraphs remain hard to approximate even if the agent's optimal path is known [2], indicating a favorable computational complexity for the design of penalty fees. Of course there is another potential source of hardness: the computation of P. To prove that this is a limiting factor, we introduce the decision problem MOTIVATING COST CONFIGURATION:

Definition 1 (MCC). Given a task graph G, a reward r > 0 and a present bias β ∈ (0, 1], decide the existence of a motivating cost configuration.

We propose a reduction from 3-SAT to show that MCC is NP-complete for arbitrary β ∈ (0, 1). At a later point we will use the same reduction to establish a hardness of approximation result.

Theorem 2.
MCC is NP-complete for any present bias β ∈ (0, 1).

Proof. According to [2], whether or not a given task graph is motivating for a fixed reward can be verified in polynomial time. Of course, this remains valid if the edges are assigned extra cost. Consequently, any motivating cost configuration is a suitable certificate for a "yes"-instance of MCC. We conclude that MCC is in NP. In the following, we present a reduction from 3-SAT to show that MCC is also NP-hard. This establishes the theorem.

Let I be an arbitrary instance of 3-SAT consisting of ℓ clauses c_1, ..., c_ℓ over m variables x_1, ..., x_m. We construct an MCC instance J such that its task graph G admits a motivating cost configuration for a reward of r = 1/β if and only if I has a satisfying variable assignment. Figure 2 depicts G for a small sample instance of I. In general, G consists of a source s, a target t and five nodes u_1, ..., u_5. Depending on I, G also contains some extra nodes. For each variable x_k, there are two variable nodes w_{k,T} and w_{k,F}. The idea is to interpret x_k as true whenever the agent visits w_{k,T} and as false whenever she visits w_{k,F}. As a result, the agent's walk through G yields a variable assignment τ. Furthermore, for each clause c_i there is a literal node v_{i,j} corresponding to the j-th literal of c_i. Our goal is to construct G in such a way that every motivating cost configuration guides the agent along literal nodes that are satisfied with respect to τ.

All nodes v_{i,j} and w_{k,y} are connected via so-called forward edges. More specifically, for all 1 ≤ i < ℓ and 1 ≤ j, j′ ≤ 3 there is a forward edge from v_{i,j} to v_{i+1,j′}. Similarly, there is a forward edge from w_{k,y} to w_{k+1,y′} for all 1 ≤ k < m and y, y′ ∈ {T, F}.
We also have forward edges from s to each v_{1,j}, from each v_{ℓ,j} to u_1, from u_2 to each w_{1,y} and from each w_{m,y} to u_3. For the sake of readability, some forward edges are merged in Figure 2. The price of each forward edge is (1 − β) − ε, where the encoding length of β is assumed to be polynomial in I. Furthermore, ε denotes a sufficiently small but positive quantity; for the construction it suffices that ε is small compared to β(1 − β).

In addition to the forward edges, there are three types of shortcuts. The first type, which is depicted as dashed edges in Figure 2, connects each literal node v_{i,j} to a distinct variable node via a single edge of cost (1 − β). If the j-th literal of c_i is equal to x_k, the shortcut goes to w_{k,F}. Otherwise, if the literal is negated, i.e. ¬x_k, the shortcut goes to w_{k,T}. The second type of shortcut goes from u_1 to t along a single edge of cost 2 − β. For a clear representation, this shortcut is omitted in Figure 2. The third type of shortcut connects each variable node w_{k,y} to t via a distinct intermediate node. The first edge is free while the second costs 2 − β. Again, shortcuts of this type are omitted in Figure 2 to keep the drawing simple. Finally, there are four more edges (u_1, u_2), (u_3, u_4), (u_4, u_5) and (u_5, t) of cost (1 − β), (1 − β), 2 − β and 1, respectively. Note that G is acyclic and its encoding length is polynomial in I.

To establish the theorem, we must show that I has a satisfying variable assignment if and only if J has a motivating cost configuration. A detailed argument is described in the Appendix. At this point we only sketch the main ideas.

Figure 2: Reduction from a sample 3-SAT instance with three clauses over three variables.
For this purpose let ˜c be a cost configuration that is motivating for a reward of 1/β and let P be the agent's path through G with respect to ˜c. Note that P cannot contain shortcuts of the second or third type, as their edges are too expensive. Furthermore, P cannot contain a shortcut of the first type because the agent either perceives it as too expensive or is tempted to enter a shortcut of the third type immediately afterwards. As a result, P contains exactly one of the two nodes w_{k,T} and w_{k,F} for each variable x_k. Let τ : {x_1, ..., x_m} → {T, F} be the corresponding variable assignment. To keep the agent on P, ˜c must assign extra cost to all shortcuts that start at a variable node satisfied by τ. However, this raises the perceived cost of all paths via literal nodes not satisfied by τ to values that are not motivating. Consequently, P cannot contain such literal nodes. But P must contain exactly one literal node of each clause because P takes no shortcuts. This means that τ satisfies at least one literal in each clause and is therefore a feasible solution of I. Conversely, whenever I has a feasible solution τ, we can construct a motivating cost configuration ˜c as follows: First, assign an appropriate extra cost, e.g. (1 − β), to the third-type shortcuts starting at the variable nodes w_{k,τ(x_k)}. Secondly, block the forward edges into the opposite variable nodes with a sufficiently high extra cost.

Figure 3: Task graph with no optimal cost configuration.
The previous section showed that optimal penalty-based commitment devices are NP-hard to design. This section therefore focuses on an optimization version of the problem. Our goal is to construct cost configurations that require the designer to raise the reward at t as little as possible. However, before we provide a formal definition of the problem we should consider a curious technical detail; namely, not all task graphs admit an optimal cost configuration.

Consider, for instance, the task graph in Figure 3. At v_1 the agent is indifferent between the edges (v_1, v_2) and (v_1, w); in both cases her perceived cost is the same. If she chooses (v_1, w), she faces a perceived cost of 1 − β at w. Conversely, if she chooses (v_1, v_2), the cost she perceives at the remaining nodes of the lower path is strictly smaller. Assuming that β < 1, (v_1, v_2) is therefore the better choice. To break the tie between (v_1, w) and (v_1, v_2) we must place a positive extra cost of ε onto the upper path. However, when located at s the agent's perceived cost of the upper path is βε. In contrast, her perceived cost of the lower path is β(1 − β). Assuming that ε < (1 − β), she prefers the upper path. Consequently, we can construct a cost configuration that is motivating for a reward arbitrarily close to 1/β, but no cost configuration is motivating for a reward of exactly 1/β. To account for the potential lack of an optimal solution, we compare our results to the infimum of all rewards that admit a motivating cost configuration. The optimization problem MCC-OPT is defined accordingly:

Definition 2 (MCC-OPT). Given a task graph G and a present bias β ∈ (0, 1], determine the infimum of all rewards for which a motivating cost configuration exists.

Algorithm 2: MINMAXPATHAPPROX
Input: Task graph G, present bias β
Output: Cost configuration ˜c

P ← minmax path from s to t with respect to η_˜0;
̺ ← max{η_˜0(e) | e ∈ P};
foreach v ∈ V \ {t} do
    ς(v) ← successor node of v on a cheapest path from v to t;
foreach (v, w) ∈ E do
    if (v, w) ∈ P ∨ (ς(v) = w ∧ v ∉ P) then
        ˜c(v, w) ← 0
    else if (v, w) ∉ P ∧ ς(v) ≠ w then
        ˜c(v, w) ← 2̺/β
    else
        P′ ← v, ς(v), ς(ς(v)), ..., t;
        u ← first node of P′ different from v that is also a node of P;
        e ← most expensive edge of P′ between v and u;
        ˜c(v, w) ← c(e);
return ˜c;

We are now ready to introduce Algorithm 2. This algorithm enables us to construct cost configurations that approximate MCC-OPT within a factor of 2. At a high level, the algorithm proceeds in two phases. First, it computes a value ̺ such that ̺/β is a lower bound for any reward that admits a motivating cost configuration. Secondly, it constructs a cost configuration ˜c that is motivating for a reward of 2̺/β. This yields the promised approximation ratio of 2.

For a more detailed discussion of Algorithm 2 assume that each edge e is labeled with its perceived cost η_˜0(e). Furthermore, let ˜c′ be an arbitrary cost configuration and P′ the agent's corresponding path from s to t. Our goal is to lower bound the minimum reward that is motivating for ˜c′ by some value ̺/β. For this purpose, it is instructive to observe that any motivating reward must be at least max{η_˜c′(e) | e ∈ P′}/β ≥ max{η_˜0(e) | e ∈ P′}/β. Since P′ can be an arbitrary path from s to t, we set

̺ = min{ max{η_˜0(e) | e ∈ P} | P is a path from s to t }.

In other words, ̺ is the maximum edge cost of a minmax path P from s to t with respect to η_˜0. Note that P can be computed in polynomial time by adding the edges of G in non-decreasing order of perceived cost to an initially empty set E′ until s and t become connected for the first time.
Any path from s to t that uses only edges of E′ is a suitable minmax path.

We continue with the construction of c̃. To facilitate this task, Algorithm 2 sets up a cheapest-path successor relation ς. More precisely, it assigns a distinct successor node ς(v) to each v ∈ V \ {t}. The successor is chosen in such a way that (v, ς(v)) is the initial edge of a cheapest path from v to t. Since we may assume that t is reachable from each node of G, every v ≠ t has at least one suitable successor. By construction of ς, any path P′ = v, ς(v), ς(ς(v)), …, t is a cheapest path from v to t. We call P′ the ς-path of v.

Once ς has been created, Algorithm 2 assigns an appropriate extra cost to each edge of G. The idea behind this assignment is to either keep the agent on P or guide her along a suitable ς-path. For this reason, we also call the algorithm MinMaxPathApprox. While iterating through the edges (v, w) of G, the algorithm distinguishes between three types of edges: First, (v, w) might be an edge of P or an edge of a ς-path; in the latter case, v must not be a node of P. Any (v, w) that satisfies these requirements is an edge we want the agent to traverse or use in her plans. Consequently, (v, w) is assigned no extra cost. Secondly, (v, w) might be neither an edge of P nor of a ς-path. Since we do not want the agent to traverse or plan along such an edge, the algorithm assigns an extra cost of 2̺/β to (v, w). This is sufficiently expensive for the agent to lose interest in (v, w), provided that the reward is 2̺/β. Thirdly, (v, w) might not be an edge of P, but an edge of a ς-path such that v is a node of P. This is the most involved case. To find an appropriate cost for (v, w), the algorithm considers the ς-path P′ of v. Let u be the first common node of P and P′ that is different from v. Because P and P′ both end in t, such a node must exist.
Moreover, let e be the most expensive edge of P′ between v and u. The algorithm assigns an extra cost of c(e) to (v, w). As we will show in Theorem 3, this cost is either high enough to keep the agent on P, or she travels to u along P′ without encountering edges that are too expensive.

Clearly, Algorithm 2 can be implemented to run in polynomial time with respect to the size of G. It remains to show that the algorithm returns a cost configuration c̃ that approximates MCC-OPT within a factor of 2.

Theorem 3. MinMaxPathApprox has an approximation ratio of 2.

Proof. Recall that ̺ denotes the maximum perceived edge cost along the minmax path P. From the above description of MinMaxPathApprox, it should be evident that ̺/β is a lower bound on the minimum motivating reward of any cost configuration. To prove the theorem, we need to show that the algorithm returns a cost configuration c̃ that is motivating for a reward of 2̺/β.

As our first step, we argue that the cost of a cheapest path from any node v to t with respect to c̃ is at most twice the cost of a cheapest path with respect to 0̃. More formally, we prove that d_c̃(v) ≤ 2 d_0̃(v). For this purpose, let P′ be the ς-path of v. By construction of ς, P′ is a cheapest path from v to t. It is crucial to observe that MinMaxPathApprox only assigns extra cost to an edge (v′, ς(v′)) of P′ if v′ is located on P. Consequently, there is at most one edge with extra cost between any two consecutive intersections of P and P′. Furthermore, this extra cost is equal to the cost of an edge of P′ between v′ and the next intersection of P and P′. Therefore, each edge of P′ can contribute at most once to the total extra cost assigned to P′. This means that the price of P′ with respect to c̃ is at most twice the price of P′ with respect to 0̃.
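The three-way case distinction above can be sketched as follows, assuming the minmax path P, the successor relation ς (here `succ`), and ̺ (here `rho`) have already been computed. The encoding, the names, and the toy instance are my own, and the blocking penalty 2̺/β mirrors the description above.

```python
def assign_extra_costs(edges, path, succ, rho, beta):
    """Assignment phase of Algorithm 2 (illustrative reconstruction).
    Case 1: edges of P, and successor edges leaving nodes off P -> 0.
    Case 2: edges neither on P nor successor edges -> blocking cost 2*rho/beta.
    Case 3: successor edges leaving a node of P -> base cost of the most
            expensive edge on the successor path up to the next node of P.
    Assumes every successor path ends in t, which lies on P."""
    p_nodes = set(path)
    p_edges = set(zip(path, path[1:]))
    extra = {}
    for (v, w) in edges:
        if (v, w) in p_edges or (succ.get(v) == w and v not in p_nodes):
            extra[(v, w)] = 0.0
        elif succ.get(v) != w:
            extra[(v, w)] = 2 * rho / beta
        else:  # v lies on P, but its cheapest path to t leaves P here
            node, best = v, 0.0
            while True:
                nxt = succ[node]
                best = max(best, edges[(node, nxt)])
                if nxt in p_nodes:  # reached the next intersection u
                    break
                node = nxt
            extra[(v, w)] = best
    return extra

# toy instance: P = s, a, t, but a's cheapest path to t detours through b
edges = {("s", "a"): 1.0, ("a", "t"): 5.0, ("a", "b"): 1.0,
         ("b", "t"): 1.0, ("s", "b"): 4.0}
succ = {"s": "a", "a": "b", "b": "t"}  # cheapest-path successors
extra = assign_extra_costs(edges, ["s", "a", "t"], succ, rho=3.0, beta=0.5)
```

Here (a, b) falls into the third case and receives the cost of the most expensive edge on the path a, b, t, while (s, b) is blocked outright.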
Because the price of P′ is an upper bound for d_c̃(v), we have shown that d_c̃(v) ≤ 2 d_0̃(v).

We proceed to investigate the agent's walk through G. Our goal is to show that her lowest perceived cost is at most 2̺ at every node v on her way. This establishes the theorem. Our analysis is based on the following case distinction: First, assume that v is located on P. The immediate successor of v on P is denoted by w. Remember that c̃ assigns no extra cost to (v, w). Using the result from the previous paragraph, we get

ζ_c̃(v) ≤ η_c̃(v, w) = c(v, w) + β d_c̃(w) ≤ c(v, w) + 2β d_0̃(w) ≤ 2(c(v, w) + β d_0̃(w)) = 2 η_0̃(v, w) ≤ 2̺.

The last inequality is valid by definition of ̺.

Secondly, assume that v is not located on P, and consider the last node v′ on P that the agent visited before v. Because she traversed (v′, ς(v′)) to get to v, we know that η_c̃(v′, ς(v′)) ≤ 2̺ and d_c̃(ς(v′)) ≤ 2̺/β. We also know that she faces an extra cost of 2̺/β whenever she tries to leave the ς-path P′ of v′ before the next intersection of P and P′. Since she is not willing to pay this much, v must be located on P′. In particular, all paths from ς(v′) to t either visit ς(v) or cross an edge that charges an extra cost of 2̺/β. Consequently, a cheapest path from ς(v′) to t with respect to c̃ costs at least d_c̃(ς(v′)) ≥ min{2̺/β, d_c̃(ς(v))}. As d_c̃(ς(v′)) ≤ 2̺/β, this implies that d_c̃(ς(v′)) ≥ d_c̃(ς(v)). Our proof is almost complete. For the final part, recall that (v, ς(v)) is located on P′ between v′ and the next intersection of P and P′. By construction of c̃, we have c̃(v′, ς(v′)) ≥ c(v, ς(v)). Furthermore, (v, ς(v)) has no extra cost.
Putting all the pieces together, we get

ζ_c̃(v) ≤ η_c̃(v, ς(v)) = c(v, ς(v)) + β d_c̃(ς(v)) ≤ c̃(v′, ς(v′)) + β d_c̃(ς(v′)) ≤ c(v′, ς(v′)) + c̃(v′, ς(v′)) + β d_c̃(ς(v′)) = η_c̃(v′, ς(v′)) ≤ 2̺.

To complement this result and to emphasize the quality of our approximation given the theoretical limitations, we argue that MCC-OPT is NP-hard to approximate within any ratio of 1.08192 or less. In particular, assuming that P ≠ NP, this rules out the existence of a polynomial-time approximation scheme.

Theorem 4.
MCC-OPT is NP-hard to approximate within a ratio less than or equal to 1.08192.

Proof. To establish the theorem, a reduction similar to the one from Theorem 2 can be used. In fact, given a 3-SAT instance I, we can construct the corresponding MCC-OPT instance J in the same way as in the proof of Theorem 2. The only difference is that our choice of ε is slightly more restrictive, as we require

ε < min{ β(1−β)², β(1−β)²(2−β), β(1−β)²/(1+β), β²(1−β)²(2−β)/(1+β) }.

The proof can be structured around the following properties of J: (a) If I has a solution, J admits a motivating cost configuration for a reward of 1/β. (b) If I has no solution, J admits no motivating cost configuration for a reward of (1 + β(1−β)⁴)/β or less. Consequently, any algorithm that approximates MCC-OPT within a ratio of 1 + β(1−β)⁴ or less must also solve I. To maximize this ratio, we choose β = 1/5 and obtain the desired approximability bound, namely 1 + (1/5)(4/5)⁴ = 1.08192. All that remains to show is that J indeed satisfies (a) and (b). The correctness of (a) is an immediate consequence of the proof of Theorem 2. A detailed proof of (b) can be found in the Appendix.

Conclusion
In this work we have used Kleinberg and Oren's graph-theoretic framework [5] to provide a systematic analysis of penalty-based commitment devices. We have shown that penalty fees are strictly more powerful than commitment devices based on prohibition. In particular, penalties may outperform prohibitions by a factor of up to 1/β. We have also obtained some of the first positive computational results for the algorithmic design of commitment devices: a polynomial-time algorithm that constructs penalty fees matching an optimal solution up to a factor of 2. This is significant progress compared to prohibition-based commitment devices, whose approximation is NP-hard within a factor less than √n/3 [2]. Due to their versatility, expressiveness and favorable computational properties, we believe that penalty-based commitment devices will prove to be a valuable tool for the targeted improvement of complex social and economic settings in the context of time-inconsistent behavior.

References

[1] George A. Akerlof. Procrastination and obedience. The American Economic Review, 81(2):1–19, 1991.
[2] Susanne Albers and Dennis Kraft. Motivating time-inconsistent agents: A computational approach. In Proceedings of the 12th Conference on Web and Internet Economics, pages 309–323. Springer, 2016.
[3] Gharad Bryan, Dean Karlan, and Scott Nelson. Commitment devices. Annual Review of Economics, 2:671–698, 2010.
[4] Nick Gravin, Nicole Immorlica, Brendan Lucier, and Emmanouil Pountourakis. Procrastination with variable present bias. In Proceedings of the 17th ACM Conference on Economics and Computation, pages 361–361, New York, NY, USA, 2016. ACM.
[5] Jon Kleinberg and Sigal Oren. Time-inconsistent planning: A computational problem in behavioral economics. In Proceedings of the 15th ACM Conference on Economics and Computation, pages 547–564, New York, NY, USA, 2014. ACM.
[6] Jon Kleinberg, Sigal Oren, and Manish Raghavan. Planning problems for sophisticated agents with present bias. In Proceedings of the 17th ACM Conference on Economics and Computation, pages 343–360, New York, NY, USA, 2016. ACM.
[7] David Laibson. Golden eggs and hyperbolic discounting. The Quarterly Journal of Economics, pages 443–477, 1997.
[8] Ted O'Donoghue and Matthew Rabin. Doing it now or later. The American Economic Review, 89:103–124, 1999.
[9] Ted O'Donoghue and Matthew Rabin. Incentives and self control. Advances in Economics and Econometrics: The 9th World Congress, 2:215–245, 2006.
[10] Pingzhong Tang, Yifeng Teng, Zihe Wang, Shenke Xiao, and Yichong Xu. Computational issues in time-inconsistent planning. In Proceedings of the 31st AAAI Conference on Artificial Intelligence, 2017. To appear.
Appendix
Proof of Proposition 1.
Let P = v_0, …, v_m and assume that s = v_0 and t = v_m. From the description of PathAndFence it should be clear that whenever the algorithm assigns extra cost to an edge (v_i, w), the perceived cost of that edge exceeds the perceived cost of (v_i, v_{i+1}) by at least βε/(m−1). Furthermore, since G is acyclic, the extra cost of (v_i, w) does not affect the agent's perceived cost of any previously processed edge (v_j, w′) with j ≥ i. We conclude that PathAndFence returns a cost configuration c̃ for which η_c̃(v_i, v_{i+1}) < η_c̃(v_i, v_{i+1}) + βε/(m−1) ≤ η_c̃(v_i, w). Consequently, the agent has no incentive to deviate from P.

In the remainder, we bound the perceived cost of each (v_i, v_{i+1}) by η_c̃(v_i, v_{i+1}) ≤ η_c̃*(v_i, v_{i+1}) + βε. Together with the observations from the previous paragraph, this concludes the proof. Our argument is based on an induction on P. The main induction hypothesis is η_c̃(v_i, v_{i+1}) ≤ η_c̃*(v_i, v_{i+1}) + βε(m−1−i)/(m−1). Clearly, this also implies that η_c̃(v_i, v_{i+1}) ≤ η_c̃*(v_i, v_{i+1}) + βε. To simplify matters, we introduce d_c̃(v_i) ≤ d_c̃*(v_i) + ε(m−1−i)/(m−1) as an auxiliary induction hypothesis.

We start the induction at the last edge of P, i.e. i = m−1. Our goal is to show that η_c̃(v_{m−1}, t) ≤ η_c̃*(v_{m−1}, t) and d_c̃(v_{m−1}) ≤ d_c̃*(v_{m−1}). Recall that (v_{m−1}, t) minimizes the agent's perceived cost with respect to c̃. By definition of P, we also know that (v_{m−1}, t) minimizes her perceived cost with respect to c̃*. Consequently, we have η_c̃(v_{m−1}, t) = ζ_c̃(v_{m−1}) and η_c̃*(v_{m−1}, t) = ζ_c̃*(v_{m−1}). Since (v_{m−1}, t) is the last edge of P, we conclude that η_c̃(v_{m−1}, t) = ζ_c̃(v_{m−1}) ≤ d_c̃(v_{m−1}) ≤ c(v_{m−1}, t) + c̃(v_{m−1}, t) as well as c(v_{m−1}, t) + c̃*(v_{m−1}, t) = η_c̃*(v_{m−1}, t) = ζ_c̃*(v_{m−1}) ≤ d_c̃*(v_{m−1}).
Moreover, c̃ assigns no extra cost to (v_{m−1}, t). Therefore, c̃(v_{m−1}, t) = 0 ≤ c̃*(v_{m−1}, t) holds true. Combining the last three inequalities concludes the basis of our induction.

For the inductive step, assume that η_c̃(v_j, v_{j+1}) ≤ η_c̃*(v_j, v_{j+1}) + βε(m−1−j)/(m−1) and d_c̃(v_j) ≤ d_c̃*(v_j) + ε(m−1−j)/(m−1) are valid for all j such that i < j < m. We proceed to argue that both of these inequalities are also valid for i. We start with the first inequality. By construction of c̃, we have c̃(v_i, v_{i+1}) = 0 ≤ c̃*(v_i, v_{i+1}). Consequently, we can bound the perceived cost of (v_i, v_{i+1}) by η_c̃(v_i, v_{i+1}) ≤ c(v_i, v_{i+1}) + c̃*(v_i, v_{i+1}) + β d_c̃(v_{i+1}). The auxiliary induction hypothesis now implies the desired result

η_c̃(v_i, v_{i+1}) ≤ c(v_i, v_{i+1}) + c̃*(v_i, v_{i+1}) + β(d_c̃*(v_{i+1}) + ε(m−1−(i+1))/(m−1)) = η_c̃*(v_i, v_{i+1}) + βε(m−2−i)/(m−1) ≤ η_c̃*(v_i, v_{i+1}) + βε(m−1−i)/(m−1).

The proof of the second inequality, i.e. d_c̃(v_i) ≤ d_c̃*(v_i) + ε(m−1−i)/(m−1), is a bit more involved. Let (v_i, w) be the initial edge of a cheapest path P′ from v_i to t with respect to c̃*. In a first step, we show that d_c̃(w) ≤ d_c̃*(w) + ε(m−1−i)/(m−1). For this purpose, let v_j be the node of smallest index different from v_i that is located at an intersection of P and P′. Because P and P′ both end in t, such a node must exist. Recall that c̃ only assigns extra cost to edges that leave a node of P. By definition of v_j, no edge of P′ between w and v_j can be such an edge. Let d_c̃(w, v_j) and d_c̃*(w, v_j) denote the cost of a cheapest path from w to v_j with respect to c̃ and c̃*. According to our considerations, d_c̃(w, v_j) ≤ d_c̃*(w, v_j) holds true. If v_j = t, this immediately implies d_c̃(w) ≤ d_c̃*(w) + ε(m−1−i)/(m−1).
Otherwise, if v_j ≠ t, we can apply the auxiliary induction hypothesis to obtain the desired result

d_c̃(w) ≤ d_c̃(w, v_j) + d_c̃(v_j) ≤ d_c̃*(w, v_j) + d_c̃*(v_j) + ε(m−1−j)/(m−1) ≤ d_c̃*(w) + ε(m−1−i)/(m−1).

It remains to bound d_c̃(v_i) itself. We distinguish between two scenarios. First, assume that c̃ assigns no extra cost to (v_i, w). In this case, we have c̃(v_i, w) = 0 ≤ c̃*(v_i, w). Together with the inequality from the previous paragraph, this immediately concludes the inductive step:

d_c̃(v_i) ≤ c(v_i, w) + c̃(v_i, w) + d_c̃(w) ≤ c(v_i, w) + c̃*(v_i, w) + d_c̃*(w) + ε(m−1−i)/(m−1) = d_c̃*(v_i) + ε(m−1−i)/(m−1).

Secondly, assume that c̃ assigns positive extra cost to (v_i, w). In this case, the perceived cost of (v_i, w) with respect to c̃ is just slightly greater than that of (v_i, v_{i+1}). More formally, it holds true that η_c̃(v_i, w) = η_c̃(v_i, v_{i+1}) + βε/(m−1). This follows from the construction of c̃. Therefore, we can upper bound d_c̃(v_i) by

d_c̃(v_i) ≤ c(v_i, w) + c̃(v_i, w) + d_c̃(w) = η_c̃(v_i, w) + (1−β) d_c̃(w) = η_c̃(v_i, v_{i+1}) + βε/(m−1) + (1−β) d_c̃(w).

Recall that η_c̃(v_i, v_{i+1}) ≤ η_c̃*(v_i, v_{i+1}) + βε(m−2−i)/(m−1). In combination with our upper bound on d_c̃(w), we obtain

d_c̃(v_i) ≤ η_c̃*(v_i, v_{i+1}) + βε(m−2−i)/(m−1) + βε/(m−1) + (1−β)(d_c̃*(w) + ε(m−1−i)/(m−1)) = η_c̃*(v_i, v_{i+1}) + (1−β) d_c̃*(w) + ε(m−1−i)/(m−1).

Because (v_i, v_{i+1}) minimizes the perceived cost at v_i with respect to c̃*, we may assume that η_c̃*(v_i, v_{i+1}) ≤ η_c̃*(v_i, w) and obtain

d_c̃(v_i) ≤ η_c̃*(v_i, w) + (1−β) d_c̃*(w) + ε(m−1−i)/(m−1) = d_c̃*(v_i) + ε(m−1−i)/(m−1).
It remains to show that I has a satisfying variable assignment if and only if J has a cost configuration c̃ that is motivating for a reward of 1/β.

(⇒) We start by constructing c̃ from an assignment of truth values τ : {x_1, …, x_m} → {T, F} that satisfies each clause of I. For this purpose, we assign an extra cost of (1−β)² to the first edge of every shortcut that starts at a variable node w_{k,τ(x_k)}. Furthermore, we assign a blocking extra cost, large enough to push the edge cost above 1, to all forward edges ending in a variable node of the form w_{k,τ(x̄_k)}. To show that c̃ is indeed motivating, we divide the agent's walk into two separate parts.

The first part contains the literal nodes from s to u. When located at a specific node v_{i,j}, the agent has two options: either she takes the shortcut, or she follows a forward edge. In the first case, she ends up at a variable node. By construction of G, the cost of a cheapest path from any variable node to t is at least 2−β. This holds true regardless of extra cost. As a result, her perceived cost for taking the shortcut at v_{i,j} is at least (1−β)² + β(2−β) = 1. Her other option is to take a forward edge. Assuming that i < ℓ, let j′ be the index of a literal in c_{i+1} that evaluates to true with respect to τ. Because τ is a satisfying variable assignment, such a literal must exist. The agent's perceived cost for traversing (v_{i,j}, v_{i+1,j′}) and then taking the two direct shortcuts to t is (1−β)² − ε + β(2−β) = 1 − ε. In the special case that i = ℓ, we obtain the same perceived cost along the corresponding path through u to t. Consequently, the agent always prefers at least one forward edge to the current shortcut. A similar argument shows that there is one forward edge with a perceived cost of 1 − ε out of s. Furthermore, there are no immediate shortcuts at s. Finally, when located at u, the agent has no choice but to traverse the next edge, and at this point her perceived cost of the remaining path is at most 1.
Considering that her perceived value of the reward is 1, we conclude that she follows the forward edges until she successfully completes the first part of her walk.

The second part of the agent's walk contains the variable nodes from u to t. At u the agent has three options. First, she can follow the shortcut of type two. This has a cost of 2−β and is clearly not motivating. Secondly, she can traverse the forward edge to w_{1,τ(x̄_1)}. By construction of c̃, this edge has a cost greater than 1. Again, this is not motivating. Thirdly, she can traverse the forward edge to w_{1,τ(x_1)}. If she plans to take the shortcut to t immediately afterwards, the perceived cost is (1−β)² − ε + β(2−β) = 1 − ε. Therefore, this is her preferred choice. Because it is also a motivating choice, she moves to w_{1,τ(x_1)}, where she faces the same three options. The only difference is that this time the first option is a shortcut of the third type and has a perceived cost of (1−β)² + β(2−β) = 1. Repeating the argument shows that the agent travels from one variable node w_{k,τ(x_k)} to the next, w_{k+1,τ(x_{k+1})}, until she reaches the final section of the graph. From there a single path leads to t, and since the agent's lowest perceived cost at each of its nodes is at most 1, she remains motivated and eventually reaches t. We conclude that c̃ is motivating.

(⇐) Next, assume that J has a solution, i.e. there exists a cost configuration c̃ that is motivating for a reward of 1/β. We proceed to show how to obtain a variable assignment τ that satisfies each clause of I. For this purpose, we make the following two observations: First, no motivating cost configuration can guide the agent onto a shortcut of type two or three. This is because these shortcuts have an edge of cost 2−β > 1 and are therefore too expensive to traverse for the given reward. Secondly, the agent cannot enter a shortcut of the first type either. To understand this, assume for a moment that she does take such a shortcut.
The shortcut takes her from some literal node v_{i,j} to a variable node w_{k,y}. Her perceived cost of (v_{i,j}, w_{k,y}) can be at most 1, as otherwise the shortcut would not be motivating. By construction of G, there is exactly one cheapest path from w_{k,y} to t, namely the direct shortcut to t. As the total cost of this shortcut is 2−β, the only way to achieve a perceived cost of at most 1 for (v_{i,j}, w_{k,y}) is to plan along this very shortcut. In particular, no extra cost can be assigned to it. However, once the agent has reached w_{k,y}, her perceived cost for taking the direct shortcut to t is β(2−β), as no extra cost is placed on this path. Conversely, her perceived cost of any forward edge at w_{k,y} is at least 1 − ε, even if we neglect extra cost. By choice of ε, it holds true that

(1 − ε) − β(2−β) > (1 − (1−β)²) − β(2−β) = 0.

Consequently, the agent prefers the shortcut to any of the forward edges. This contradicts our previous observation that she does not take shortcuts of type three.

Because no motivating cost configuration can guide the agent onto a shortcut, we conclude that her walk from s to t must contain exactly one literal node v_{i,j} and one variable node w_{k,y} for each clause and variable of I. Let P be one of the possibly several paths the agent can walk from s to t. Based on P, we construct a suitable variable assignment τ as follows: If she visits w_{k,T} along P, we set τ(x_k) = T. Otherwise, if she visits w_{k,F}, we set τ(x_k) = F. To conclude the proof, we argue that τ satisfies all clauses of I.

Consider an arbitrary clause c_i and let v_{i,j} be the corresponding literal node in P. Furthermore, let v_{i−1,j′} be the literal node preceding v_{i,j} in P. If i = 1, let v_{i−1,j′} = s. We denote the agent's planned path to t when located at v_{i−1,j′} by P′. Clearly, the first edge of P′ must be (v_{i−1,j′}, v_{i,j}), as this edge is also on P. For the next edge of P′ we have two options: The first is another forward edge.
As a result, there must be some additional edge of cost (1−β)² in P′. This can either be a subsequent shortcut of type one or the edge leaving u. Moreover, P′ must include the two final edges leading to t, or a shortcut of type two or three. In all cases, the total cost of these edges is at least (1−β)² + (2−β). Therefore, the agent's perceived cost of the first option sums up to a value greater than or equal to

(1−β)² − ε + β((1−β)² − ε + (1−β)² + (2−β)) = 1 + (1+β)(2β(1−β)²/(1+β) − ε) > 1.

The inequality is valid by choice of ε. Clearly, this is not motivating. The second option is that the next edge of P′ is the shortcut from v_{i,j} to the corresponding variable node w_{k,y}. Again, we can distinguish between two cases. First, P′ might include a forward edge from w_{k,y} to a subsequent variable node w_{k+1,y′}, or to u if k = m. However, calculations similar to the one above indicate that this is not motivating. The only remaining option is that P′ contains the shortcut from w_{k,y} to t. In this case, the perceived cost of P′ is at least 1 − ε. This leaves an extra cost of at most ε/β to place onto the shortcut from w_{k,y} to t.

Now assume that P includes w_{k,y}, i.e. the agent visits w_{k,y} at a later point. Recall that her perceived cost for taking a forward edge at w_{k,y} is at least 1 − ε. By choice of ε, an extra cost of ε/β is not sufficient to prevent her from entering the shortcut at w_{k,y}, as

(1 − ε) − (ε/β + β(2−β)) = ((1+β)/β)(β(1−β)²/(1+β) − ε) > 0.

Of course, this contradicts the fact that the agent cannot take shortcuts. Consequently, the agent cannot visit w_{k,y}, but must visit w_{k,ȳ} instead. By construction of G, this implies that τ satisfies the j-th literal of c_i. Because this holds true for all clauses of I, τ must be a satisfying variable assignment.

Proof of Theorem 4 (continued).
In the following, we prove that J satisfies (b). For the sake of contradiction, assume that there exists a cost configuration c̃ that is motivating for a reward of at most (1 + β(1−β)⁴)/β, but I has no solution. Let P be a path that corresponds to the agent's walk from s to t.

Similar to the proof of Theorem 2, we first argue that P cannot include shortcuts. Recall that shortcuts of the second and third type have an edge of cost 2−β. However, the agent's perceived reward is at most 1 + β(1−β)⁴. Because 2−β = 1 + (1−β) > 1 + β(1−β)⁴, she has no incentive to traverse such an edge. It remains to show that she does not take a shortcut of the first type either. For this purpose, assume she travels from a literal node v_{i,j} to some variable node w_{k,y} via a shortcut of type one. Let P′ be her planned path when located at v_{i,j}. We distinguish between two scenarios. First, P′ might include a forward edge after (v_{i,j}, w_{k,y}). Even if we neglect extra cost, her perceived cost of P′ is at least

(1−β)² + β((1−β)² − ε + (2−β)) > (1−β)² + β((1−β)² − β(1−β)²(2−β) + (2−β)) = 1 + β(1−β)⁴.

The inequality is valid by choice of ε. Because her perceived cost of P′ exceeds her perceived reward, this scenario is not possible. Secondly, P′ might contain the shortcut from w_{k,y} to t. In this case, the agent's perceived cost of P′ is at least 1. Consequently, c̃ may assign an extra cost of no more than β(1−β)⁴/β = (1−β)⁴ to the edges of P′. This holds particularly true for the edges of the shortcut from w_{k,y} to t. Therefore, her perceived cost for taking the shortcut at w_{k,y} is at most (1−β)⁴ + β(2−β). Conversely, even without extra cost, her cost for taking a forward edge at w_{k,y} is at least 1 − ε. By choice of ε, she prefers the shortcut, as

1 − ε > 1 − β(1−β)²(2−β) = (1−β)⁴ + β(2−β).
This contradicts the fact that she does not enter a shortcut of type three.

Because c̃ does not guide the agent onto a shortcut, we conclude that P must contain exactly one literal node v_{i,j} and one variable node w_{k,y} for each clause and each variable of I. Similar to the proof of Theorem 2, we use P to construct a variable assignment τ in the following way: If the agent visits w_{k,T} along P, we set τ(x_k) = T. Otherwise, if she visits w_{k,F}, we set τ(x_k) = F. To conclude the proof, we argue that τ satisfies all clauses of I. This contradicts our initial assumption that I has no solution.

Consider an arbitrary clause c_i and let v_{i,j} be the corresponding literal node in P. Furthermore, let v_{i−1,j′} be the literal node that precedes v_{i,j} in P. If i = 1, let v_{i−1,j′} = s. The agent's planned path from v_{i−1,j′} to t is denoted by P′. Clearly, the first edge of P′ must be (v_{i−1,j′}, v_{i,j}). In the next step, two directions are possible. The first one is another forward edge. As argued in the proof of Theorem 2, the perceived cost of P′ is then at least

(1−β)² − ε + β((1−β)² − ε + (1−β)² + (2−β)) = 1 + 2β(1−β)² − (1+β)ε.
By choice of ε, this is not motivating:

1 + 2β(1−β)² − (1+β)ε > 1 + 2β(1−β)² − β²(1−β)²(2−β) ≥ 1 + β(1−β)⁴.

The second direction is along the shortcut from v_{i,j} to some variable node w_{k,y}. Again, we can distinguish between two cases. First, P′ might contain a forward edge from w_{k,y} to some variable node w_{k+1,y′}, or to u if k = m. However, a calculation similar to the one above indicates that this is not motivating. The only remaining possibility is that P′ also contains the shortcut from w_{k,y} to t. In this case, the perceived cost of P′ is at least 1 − ε. This leaves an extra cost of no more than (ε + β(1−β)⁴)/β = ε/β + (1−β)⁴ to place onto the shortcut from w_{k,y} to t.

Now assume that P also includes w_{k,y}. The agent's perceived cost for taking a forward edge from w_{k,y} is at least 1 − ε. By choice of ε, we conclude that an extra cost of ε/β + (1−β)⁴ is not sufficient to prevent the agent from entering the shortcut, as

(1 − ε) − (ε/β + (1−β)⁴ + β(2−β)) = ((1+β)/β)(β²(1−β)²(2−β)/(1+β) − ε) > 0.

Of course, this contradicts the fact that the agent cannot take shortcuts. Consequently, the agent cannot visit w_{k,y}, but must visit w_{k,ȳ} instead. By construction of G, this implies that τ satisfies the j-th literal of clause c_i. Because this holds true for all clauses of I, τ must be a satisfying variable assignment.
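The agent model that all of these walk arguments rely on (at every node she follows an outgoing edge of minimum perceived cost, and she abandons the project as soon as that minimum exceeds her perceived reward) can also be simulated directly. The interface and the example below are illustrative assumptions, not the paper's notation.

```python
import heapq
from collections import defaultdict

def agent_walk(edges, extra, s, t, beta, reward):
    """Simulate a naive present-biased agent on a task graph: returns her
    walk from s to t, or None if she abandons the project."""
    rev, out = defaultdict(list), defaultdict(list)
    for (v, w), c in edges.items():
        total = c + extra.get((v, w), 0.0)
        rev[w].append((v, total))
        out[v].append((w, total))
    # cheapest remaining cost to t under base + extra cost (Dijkstra)
    d, pq = {t: 0.0}, [(0.0, t)]
    while pq:
        dist, w = heapq.heappop(pq)
        if dist > d.get(w, float("inf")):
            continue
        for v, cost in rev[w]:
            if dist + cost < d.get(v, float("inf")):
                d[v] = dist + cost
                heapq.heappush(pq, (dist + cost, v))
    walk = [s]
    while walk[-1] != t:
        v = walk[-1]
        # perceived cost: full price of the next edge, future discounted by beta
        w, perceived = min(((w, cost + beta * d[w]) for w, cost in out[v]),
                           key=lambda option: option[1])
        if perceived > beta * reward:
            return None  # not motivating: the agent gives up at v
        walk.append(w)
    return walk

# without extra costs the agent takes the cheap detour if the reward suffices
edges = {("s", "a"): 1.0, ("a", "t"): 1.0, ("s", "t"): 3.0}
walk = agent_walk(edges, {}, "s", "t", beta=0.5, reward=4.0)
```

With reward 4 she walks s, a, t; lowering the reward to 2 makes her quit at s, since her perceived cost 1.5 then exceeds her perceived reward 1.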