Obtaining Reliable Feedback for Sanctioning Reputation Mechanisms
Journal of Artificial Intelligence Research 29 (2007) 391-419. Submitted 1/2007; published 8/2007.
Radu Jurca [email protected]
Boi Faltings [email protected]
Ecole Polytechnique Fédérale de Lausanne (EPFL), Artificial Intelligence Laboratory (LIA), CH-1015 Lausanne, Switzerland
Abstract
Reputation mechanisms offer an effective alternative to verification authorities for building trust in electronic markets with moral hazard. Future clients guide their business decisions by considering the feedback from past transactions; if truthfully exposed, cheating behavior is sanctioned and thus becomes irrational. It therefore becomes important to ensure that rational clients have the right incentives to report honestly. As an alternative to side-payment schemes that explicitly reward truthful reports, we show that honesty can emerge as a rational behavior when clients have a repeated presence in the market. To this end we describe a mechanism that supports an equilibrium where truthful feedback is obtained. Then we characterize the set of pareto-optimal equilibria of the mechanism, and derive an upper bound on the percentage of false reports that can be recorded by the mechanism. An important role in the existence of this bound is played by the fact that rational clients can establish a reputation for reporting honestly.
1. Introduction
The availability of ubiquitous communication through the Internet is driving the migration of business transactions from direct contact between people to electronically mediated interactions. People interact electronically either through human-computer interfaces or through programs representing humans, so-called agents. In either case, no physical interactions among entities occur, and the systems are much more susceptible to fraud and deception.

Traditional methods to avoid cheating involve cryptographic schemes and trusted third parties (TTPs) that oversee every transaction. Such systems are very costly, introduce potential bottlenecks, and may be difficult to deploy due to the complexity and heterogeneity of the environment: e.g., agents in different geographical locations may be subject to different legislation or different interaction protocols.

Reputation mechanisms offer a novel and effective way of ensuring the level of trust which is essential to the functioning of any market. They are based on the observation that agent strategies change when interactions are repeated: the other party remembers past cheating, and changes its terms of business accordingly in the future. Therefore, the expected gains due to future transactions in which the agent has a higher reputation can offset the loss incurred by not cheating in the present. This effect can be amplified considerably when such reputation information is shared among a large population, which multiplies the expected future gains made accessible by honest behavior.

Existing reputation mechanisms enjoy huge success. Systems such as eBay or Amazon implement reputation mechanisms which are partly credited for the businesses' success. Studies show that human users seriously take into account the reputation of the seller when placing bids in online auctions (Houser & Wooders, 2006), and that despite the incentive to free ride, feedback is provided in more than half of the transactions on eBay (Resnick & Zeckhauser, 2002).

One important challenge associated with designing reputation mechanisms is to ensure that truthful feedback is obtained about the actual interactions, a property called incentive-compatibility. Rational users can regard the private information they have observed as a valuable asset, not to be freely shared. Worse even, agents can have external incentives to misreport and thus manipulate the reputation information available to other agents (Harmon, 2004). Without proper measures, the reputation mechanism will obtain unreliable information, biased by the strategic interests of the reporters.

Honest reporting incentives should be addressed differently depending on the predominant role of the reputation mechanism. The signaling role is useful in environments where the service offered by different providers may have different quality, but all clients interacting with the same provider are treated equally (markets with adverse selection). This is the case, for example, in a market of web services. Different providers possess different hardware resources and employ different algorithms; this makes certain web services better than others. Nevertheless, all requests issued to the same web service are treated by the same program. Some clients might experience worse service than others, but these differences are random, and not determined by the provider.
The feedback from previous clients statistically estimates the quality delivered by a provider in the future, and hence signals to future clients which provider should be selected.

The sanctioning role, on the other hand, is present in settings where service requests issued by clients must be individually addressed by the provider. Think of a barber, who must skillfully shave every client that walks into his shop. The problem here is that providers must exert care (and costly effort) to satisfy every service request. Good quality can result only when enough effort is exerted, but the provider is better off exerting less effort: e.g., clients will pay for the shave anyway, so the barber is better off doing a sloppy job as fast as possible in order to have time for more customers. This moral hazard situation can be eliminated by a reputation mechanism that punishes providers for not exerting effort. Low effort results in negative feedback that decreases the reputation, and hence the future business opportunities, of the provider. The future loss due to a bad reputation offsets the momentary gain obtained by cheating, and makes cooperative behavior profitable.

There are well known solutions for providing honest reporting incentives for signaling reputation mechanisms. Since all clients interacting with a service receive the same quality (in a statistical sense), a client's private observation influences her belief regarding the experience of other clients. In the web services market mentioned before, the fact that one client had a bad experience with a certain web service makes her more likely to believe that other clients will also encounter problems with that same web service. This correlation between the client's private belief and the feedback reported by other clients can be used to design feedback payments that make honesty a Nash equilibrium. When submitting feedback, clients get paid an amount that depends both on the value they reported and on the reports submitted by other clients. As long as others report truthfully, the expected payment of every client is maximized by the honest report; hence the equilibrium. Miller, Resnick, and Zeckhauser (2005) and Jurca and Faltings (2006) show that incentive-compatible payments can be designed to offset both reporting costs and lying incentives.

For sanctioning reputation mechanisms, the same payment schemes are not guaranteed to be incentive-compatible. Different clients may experience different service quality because the provider decided to exert different effort levels. The private beliefs of the reporter may no longer be correlated to the feedback of other clients, and therefore the statistical properties exploited by Miller et al. (2005) are no longer present.

As an alternative, we propose different incentives to motivate honest reporting, based on the repeated presence of the client in the market. Game theoretic results (i.e., the folk theorems) show that repeated interactions support new equilibria where present deviations are made unattractive by future penalties. Even without a reputation mechanism, a client can guide her future play depending on the experience of previous interactions. As a first result of this paper, we describe a mechanism that indeed supports a cooperative equilibrium where providers exert effort all the time.
The reputation mechanism correctly records when the client received low quality.

There are certainly some applications where clients repeatedly interact with the same seller under a potential moral hazard problem. The barber shop mentioned above is one example, as most people prefer going to the same barber (or hairdresser). Another example is a market of delivery services. Every package must be scheduled for timely delivery, and this involves a cost for the provider. Some of this cost may be saved by occasionally dropping a package, hence the moral hazard. Moreover, business clients typically rely on the same carrier to dispatch their documents or merchandise. As their own business depends on the quality and timeliness of the delivery, they do have the incentive to form a lasting relationship and get good service. Yet another example is that of a business person who repeatedly travels to an offshore client. The business person has a direct interest in repeatedly obtaining good service from the hotel which is closest to the client's offices.

We assume that the quality observed by the clients is also influenced by environmental factors outside the control of, but observable by, the provider. Despite the barber's best effort, a sudden movement of the client can always generate an accidental cut that will make the client unhappy. Likewise, the delivery company may occasionally lose or damage some packages due to transportation accidents. Nevertheless, the delivery company (like the barber) eventually learns with certainty about any delays, damages or losses that entitle clients to complain about unsatisfactory service.

The mechanism we propose is quite simple. Before asking feedback from the client, the mechanism gives the provider the opportunity to acknowledge failure and reimburse the client. Only when the provider claims good service does the reputation mechanism record the feedback of the client. Contradictory reports (the provider claims good service, but the client submits negative feedback) may only appear when one of the parties is lying; therefore, both the client and the provider are sanctioned: the provider suffers a loss as a consequence of the negative report, while the client is given a small fine.

One equilibrium of the mechanism is when providers always do their best to deliver the promised quality, and truthfully acknowledge the failures caused by the environmental factors. Their "honest" behavior is motivated by the threat that any mistake will drive the unsatisfied client away from the market. When future transactions generate sufficient revenue, the provider cannot afford to risk losing a client, hence the equilibrium.

Unfortunately, this socially desired equilibrium is not unique. Clients can occasionally accept bad service and keep returning to the same provider because they don't have better alternatives. Moreover, since complaining about bad service is sanctioned by the reputation mechanism, clients might be reluctant to report negative feedback. Penalties for negative reports and the clients' lack of choice drive the provider to occasionally cheat in order to increase his revenue.

As a second result, we characterize the set of pareto-optimal equilibria of our mechanism and prove that the amount of unreported cheating that can occur is limited by two factors. The first factor limits the amount of cheating in general, and is given by the quality of the alternatives available to the clients.
Better alternatives increase the expectations of the clients; therefore the provider must cheat less in order to keep his customers.

The second factor limits the amount of unreported cheating, and represents the cost incurred by clients to establish a reputation for reporting the truth. By stubbornly exposing bad service when it happens, despite the fine imposed by the reputation mechanism, the client signals to the provider that she is committed to always report the truth. Such signals will eventually push the provider to full cooperation, as he will want to avoid the punishment for negative feedback. Having a reputation for reporting truthfully is, of course, valuable to the client; therefore, a rational client accepts to lie (and give up the reputation) only when the cost of building a reputation for reporting honestly is greater than the occasional loss created by tolerated cheating. This cost is determined by the ease with which the provider switches to cooperative play, and by the magnitude of the fine imposed for negative feedback.

Concretely, this paper proceeds as follows. In Section 2 we describe related work, followed by a more detailed description of our setting in Section 3. Section 4 presents a game theoretic model of our mechanism and an analysis of reporting incentives and equilibria. Here we establish the existence of the cooperative equilibrium, and derive an upper bound on the amount of cheating that can occur in any pareto-optimal equilibrium. In Section 5 we establish the cost of building a reputation for reporting honestly, and hence compute an upper bound on the percentage of false reports recorded by the reputation mechanism in any equilibrium. We continue in Section 6 by analyzing the impact of malicious buyers that explicitly try to destroy the reputation of the provider. We give some initial approximations of the worst case damage such buyers can cause to providers. Further discussions, open issues and directions for future work are presented in Section 7. Finally, Section 8 concludes our work.
2. Related Work
The notion of reputation is often used in game theory to signal the commitment of a player to a fixed strategy. This is what we mean by saying that clients establish a reputation for reporting the truth: they commit to always report the truth. Building a reputation usually requires some incomplete information repeated game, and can significantly impact the set of equilibrium points of the game. This is commonly referred to as the reputation effect, first characterized by the seminal papers of Kreps, Milgrom, Roberts, and Wilson (1982), Kreps and Wilson (1982) and Milgrom and Roberts (1982).

The reputation effect can be extended to all games where a player (A) could benefit from committing to a certain strategy σ that is not credible in a complete information game: e.g., a monopolist seller would like to commit to fight all potential entrants in a chain-store game (Selten, 1978); however, this commitment is not credible due to the cost of fighting. In an incomplete information game where the commitment type has positive probability, A's opponent (B) can at some point become convinced that A is playing as if she were the commitment type. At that point, B will play a best response against σ, which gives A the desired payoff. Establishing a reputation for the commitment strategy requires time and cost. When the higher future payoffs offset the cost of building the reputation, the reputation effect prescribes minimum payoffs any equilibrium strategy should give to player A (otherwise, A can profitably deviate by playing as if she were a commitment type).

Fudenberg and Levine (1989) study the class of all repeated games in which a long-run player faces a sequence of single-shot opponents who can observe all previous games. If the long-run player is sufficiently patient and the single-shot players have a positive prior belief that the long-run player might be a commitment type, the authors derive a lower bound on the payoff received by the long-run player in any Nash equilibrium of the repeated game. This result holds for both finitely and infinitely repeated games, and is robust against further perturbations of the information structure (i.e., it is independent of what other types have positive probability).

Schmidt (1993) provides a generalization of the above result for the two long-run player case in a special class of games of "conflicting interests", when one of the players is sufficiently more patient than the opponent. A game is of conflicting interests when the commitment strategy of one player (A) holds the opponent (B) to his minimax payoff. The author derives an upper limit on the number of rounds B will not play a best response to A's commitment type, which in turn generates a lower bound on A's equilibrium payoff. For a detailed treatment of the reputation effect, the reader is directed to the work of Mailath and Samuelson (2006).

In computer science and information systems research, reputation information defines some aggregate of feedback reports about past transactions. This is the semantics we are using when referring to the reputation of the provider. Reputation information encompasses a unitary appreciation of the personal attributes of the provider, and influences the trusting decisions of clients. Depending on the environment, reputation has two main roles: to signal the capabilities of the provider, and to sanction cheating behavior (Kuwabara, 2003).
Signaling reputation mechanisms allow clients to learn which providers are the most capable of providing good service. Such systems have been widely used in computational trust mechanisms. Birk (2001) and Biswas, Sen, and Debnath (2000) describe systems where agents use their direct past experience to recognize trustworthy partners. The global efficiency of the market is clearly increased; however, the time needed to build the reputation information prohibits the use of this kind of mechanism in a large scale online market.

A number of signaling reputation mechanisms also take into consideration indirect reputation information, i.e., information reported by peers. Schillo, Funk, and Rovatsos (2000) and Yu and Singh (2002, 2003) use social networks in order to obtain the reputation of an unknown agent. Agents ask acquaintances several hops away about the trustworthiness of an unknown agent. Recommendations are afterwards aggregated into a single measure of the agent's reputation. This class of mechanisms, however intuitive, does not provide any rational participation incentives for the agents. Moreover, there is little protection against untruthful reporting, and no guarantee that the mechanism cannot be manipulated by a malicious provider in order to obtain higher payoffs.

Truthful reporting incentives for signaling reputation mechanisms are described by Miller et al. (2005). Honest reports are explicitly rewarded by payments that take into account the value of the submitted report and the value of a report submitted by another client (called the reference reporter). The payment schemes are designed based on proper scoring rules, mathematical functions that make possible the revelation of private beliefs (Cooke, 1991). The essence behind honest reporting incentives is the observation that the private information a client obtains from interacting with a provider changes her belief regarding the reports of other clients. This change in beliefs can be exploited to make honesty an ex-ante Nash equilibrium strategy.
Jurca and Faltings (2006) extend the above result by taking a computational approach to designing incentive compatible payment schemes. Instead of using closed form scoring rules, they compute the payments using an optimization problem that minimizes the total budget required to reward the reporters. By also using several reference reports and filtering mechanisms, they render the payment mechanisms cheaper and more practical.

Dellarocas (2005) presents a comprehensive investigation of binary sanctioning reputation mechanisms. As in our setting, providers are equally capable of providing high quality; however, doing so requires costly effort. The role of the reputation mechanism is to encourage cooperative behavior by punishing cheating: negative feedback reduces future revenues either by excluding the provider from the market, or by decreasing the price the provider can charge in future transactions. Dellarocas shows that simple information structures and decision rules can lead to efficient equilibria, given that clients report honestly.

Our paper builds upon such mechanisms by addressing reporting incentives. We abstract away the details of the underlying reputation mechanism through an explicit penalty associated with a negative feedback. Given that such high enough penalties exist, any reputation mechanism (i.e., feedback aggregation and trusting decision rules) can be plugged into our scheme.

In the same group of work that addresses reporting incentives, we mention the work of Braynov and Sandholm (2002), Dellarocas (2002) and Papaioannou and Stamoulis (2005). Braynov and Sandholm consider exchanges of goods for money and prove that a market in which agents are trusted to the degree they deserve to be trusted is as efficient as a market with complete trustworthiness. By scaling the amount of the traded product, the authors prove that it is possible to make it rational for sellers to truthfully declare their trustworthiness. Truthful declaration of one's trustworthiness eliminates the need for reputation mechanisms and significantly reduces the cost of trust management. However, the assumptions made about the trading environment (i.e., the form of the cost function and the selling price, which is supposed to be smaller than the marginal cost) are not common in most electronic markets.
For eBay-like auctions, the Goodwill Hunting mechanism (Dellarocas, 2002) provides a way to make sellers indifferent between lying and truthfully declaring the quality of the good offered for sale. Momentary gains or losses obtained from misrepresenting the good's quality are later compensated by the mechanism, which has the power to modify the announcement of the seller.

Papaioannou and Stamoulis (2005) describe an incentive-compatible reputation mechanism that is particularly suited for peer-to-peer applications. Their mechanism is similar to ours, in the sense that both the provider and the client are punished for submitting conflicting reports. The authors experimentally show that a class of common lying strategies is successfully deterred by their scheme. Unlike their results, our paper considers all possible equilibrium strategies and sets bounds on the amount of untruthful information recorded by the reputation mechanism.
3. The Setting
We assume an online market where rational clients (she) repeatedly request the same service from one provider (he). Every client repeatedly interacts with the service provider; however, successive requests from the same client are always interleaved with enough requests generated by other clients. Transactions are assumed sequential; the provider does not have capacity constraints, and accepts all requests.

The price of the service is $p$ monetary units, and the service can have either high ($\bar q$) or low ($\underline q$) quality. Only high quality is valuable to the clients, and has utility $u(\bar q) = u$. Low quality has utility 0, and can be precisely distinguished from high quality. Before each round, the client can decide to request the service from the provider, or to quit the market and resort to an outside provider that is completely trustworthy. The outside provider always delivers high quality service, but for a higher price $p(1+\rho)$.

If the client decides to interact with the online provider, she issues a request to the provider and pays for the service. The provider can now decide to exert low ($\underline e$) or high ($\bar e$) effort when treating the request. Low effort has a normalized cost of 0, but generates only low quality. High effort is expensive (normalized cost equals $c(\bar e) = c$) and generates high quality with probability $\alpha < 1$; $\alpha$ is fixed, and depends on the environmental factors outside the control of the provider. We assume $\alpha p > c$, so that it is individually rational for providers to exert effort.

After exerting effort, the provider can observe the quality of the resulting service. He can then decide to deliver the service as it is, or to acknowledge failure and roll back the transaction by fully reimbursing the client. We assume perfect delivery channels, such that the client perceives exactly the same quality as the provider. After delivery, the client inspects the quality of the service, and can accuse low quality by submitting a negative report to the reputation mechanism.

The reputation mechanism (RM) is unique in the market and trusted by all participants. It can oversee monetary transactions (i.e., payments made between clients and the provider) and can impose fines on all parties. However, the RM does not observe the effort level exerted by the provider, nor does it know the quality of the delivered service.
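To make the transaction protocol concrete, the following sketch simulates one round of the game. It is our illustration rather than part of the paper's mechanism; the parameter names follow the notation above ($\varepsilon$ is the client's fine and $\bar\varepsilon$ the provider's penalty for a negative report, both introduced below), and the values in the example call are only illustrative.

```python
import random

def play_round(p, u, c, alpha, eps, eps_bar,
               exert_effort, deliver_low, report, rng):
    """One transaction round; returns (client payoff, provider payoff, feedback)."""
    client, provider = -p, p                 # the client pays the price up front
    if exert_effort:
        provider -= c                        # high effort costs c ...
        high_quality = rng.random() < alpha  # ... and succeeds with probability alpha
    else:
        high_quality = False                 # low effort always yields low quality
    if not high_quality and not deliver_low:
        # the provider acknowledges failure and rolls back the transaction:
        # full reimbursement, neutral feedback, nobody is sanctioned
        return client + p, provider - p, "neutral"
    client += u if high_quality else 0.0     # only high quality has utility u
    feedback = report(high_quality)          # the client's reporting strategy: 1 or 0
    if feedback == 0:
        client -= eps                        # fine for negative feedback
        provider -= eps_bar                  # value lost to a negative report
    return client, provider, feedback

# a diligent provider facing an honest client
print(play_round(p=1.0, u=2.0, c=0.8, alpha=0.99, eps=0.01, eps_bar=2.5,
                 exert_effort=True, deliver_low=False,
                 report=lambda high: 1 if high else 0,
                 rng=random.Random(0)))
```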
3. In reality, the provider might also pay a penalty for rolling back the transaction. As long as this penalty is small, the qualitative results we present in this paper remain valid.
The RM asks feedback from the client only if she chose to transact with the provider in the current round (i.e., paid the price of the service to the provider) and the provider delivered the service (i.e., the provider did not reimburse the client). When the client submits negative feedback, the RM punishes both the client and the provider: the client must pay a fine $\varepsilon$, and the provider accumulates a negative reputation report.

Although simplistic, this model retains the main characteristics of several interesting applications. A delivery service for perishable goods (goods that lose value past a certain deadline) is one of them. Pizza, for example, must be delivered within 30 minutes, otherwise it gets cold and loses its taste. Hungry clients can order at home, or drive to a more expensive local restaurant, where they're sure to get a hot pizza. The price of a home delivered pizza is $p = 1$, while at the restaurant the same pizza would cost $p(1+\rho) = 1.2$; the utility of a hot pizza is $u = 2$.

The pizza delivery provider must exert costly effort to deliver orders within the deadline. A courier must be dispatched immediately (high effort), for an estimated cost of $c = 0.8$. Despite best effort ($\alpha = 99\%$), traffic conditions and unexpected accidents (e.g., the address is not easily found) may still delay some deliveries past the deadline.

Once at the destination, the delivery person, as well as the client, knows whether the delivery was late or not. As is common practice, the provider can acknowledge being late and reimburse the client. Clients may provide feedback to a reputation mechanism, but their feedback counts only if they were not reimbursed. The client's fine for submitting a negative report can be set, for example, at $\varepsilon = 0.01$.
The future loss to the provider caused by the negative report (quantified below through $\bar\varepsilon$) depends on the reputation mechanism.

A simplified market of car mechanics or plumbers could fit the same model. The provider is commissioned to repair a car (respectively, the plumbing), and the quality of the work depends on the exerted effort. High effort is more costly but ensures a lasting result with high probability. Low effort is cheap, but the resulting fix is only temporary. In both cases, however, the warranty convention may specify the right of the client to ask for a reimbursement if problems reoccur within the warranty period. Reputation feedback may be submitted at the end of the warranty period, and is accepted only if reimbursements didn't occur.

An interesting emerging application comes with a new generation of web services that can optimally decide how to treat every request. For some service types, a high quality response requires the exclusive use of costly resources. For example, computation jobs require CPU time, storage requests need disk space, information requests need queries to databases. Sufficient resources are a prerequisite, but not a guarantee, for good service. Software and hardware failures may occur; however, these failures are properly signaled to the provider. Once monetary incentives become sufficiently important in such markets, intelligent providers will identify the moral hazard problem, and may act strategically as described in our model.
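As a quick sanity check, the following snippet (ours) verifies that the pizza numbers satisfy the participation conditions used in the analysis: effort is individually rational for the provider, and honest trade beats the outside option for the client.

```python
p, rho, u, c, alpha, eps = 1.0, 0.2, 2.0, 0.8, 0.99, 0.01

assert alpha * p > c                    # exerting effort is individually rational
outside_client  = u - p * (1 + rho)     # trusted restaurant: 0.8 per round
honest_client   = alpha * (u - p)       # cooperative home delivery: 0.99 per round
honest_provider = alpha * p - c         # provider margin under full effort: 0.19
assert honest_client > outside_client   # the online market is worth entering
print(outside_client, honest_client, honest_provider)
```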
4. Behavior and Reporting Incentives
From a game-theoretic point of view, one interaction between the client and the provider can be modeled by the extensive-form game ($G$) with imperfect public information, shown in Figure 1. The client moves first and decides (at node 1) whether to play $in$ and interact with the provider, or to play $out$ and resort to the trusted outside option.

Once the client plays $in$, the provider can choose at node 2 whether to exert high or low effort (i.e., play $\bar e$ or $\underline e$, respectively). When the provider plays $\underline e$, the generated quality is low. When the provider plays $\bar e$, nature chooses between high quality ($\bar q$) with probability $\alpha$, and low quality ($\underline q$) with probability $1-\alpha$. The constant $\alpha$ is assumed common knowledge in the market. Having seen the resulting quality, the provider delivers the service (i.e., plays $d$), or acknowledges low quality and rolls back the transaction (i.e., plays $l$) by fully reimbursing the client. If the service is delivered, the client can report positive (1) or negative (0) feedback.

A pure strategy is a deterministic mapping describing an action for each of the player's information sets. The client has three information sets in the game $G$. The first information set is a singleton and contains node 1 at the beginning of the game, when the client must decide between playing $in$ or $out$. The second information set contains nodes 7 and 8 (the dotted oval in Figure 1), where the client must decide between reporting 0 or 1, given that she has received low quality, $\underline q$. The third information set is a singleton and contains node 9, where the client must decide between reporting 0 or 1, given that she received high quality, $\bar q$. The strategy $in\,0_{\underline q}1_{\bar q}$, for example, is the honest reporting strategy, specifying that the client enters the game, reports 0 when she receives low quality, and reports 1 when she receives high quality. The set of pure strategies of the client is:

$A_C = \{out\,0_{\underline q}0_{\bar q},\ out\,0_{\underline q}1_{\bar q},\ out\,1_{\underline q}0_{\bar q},\ out\,1_{\underline q}1_{\bar q},\ in\,0_{\underline q}0_{\bar q},\ in\,0_{\underline q}1_{\bar q},\ in\,1_{\underline q}0_{\bar q},\ in\,1_{\underline q}1_{\bar q}\}.$

Similarly, the set of pure strategies of the provider is:

$A_P = \{\underline e\,l,\ \underline e\,d,\ \bar e\,l_{\underline q}l_{\bar q},\ \bar e\,l_{\underline q}d_{\bar q},\ \bar e\,d_{\underline q}l_{\bar q},\ \bar e\,d_{\underline q}d_{\bar q}\},$

where $\bar e\,l_{\underline q}d_{\bar q}$, for example, is the socially desired strategy: the provider exerts effort at node 2, acknowledges low quality at node 5, and delivers high quality at node 6. A pure strategy profile $s$ is a pair $(s_C, s_P)$ where $s_C \in A_C$ and $s_P \in A_P$. If $\Delta(A)$ denotes the set of probability distributions over the elements of $A$, $\sigma_C \in \Delta(A_C)$ and $\sigma_P \in \Delta(A_P)$ are mixed strategies for the client, respectively the provider, and $\sigma = (\sigma_C, \sigma_P)$ is a mixed strategy profile.

The payoffs to the players depend on the chosen strategy profile and on the move of nature. Let $g(\sigma) = (g_C(\sigma), g_P(\sigma))$ denote the pair of expected payoffs received by the client, respectively by the provider, when playing strategy profile $\sigma$. The function $g: \Delta(A_C) \times \Delta(A_P) \to \mathbb{R}^2$ is characterized in Table 1, which also describes the normal-form transformation of $G$. Besides the corresponding payments made between the client and the provider, Table 1 also reflects the influence of the reputation mechanism, as further explained in Section 4.1. The four strategies of the client that involve playing $out$ at node 1 generate the same outcomes, and have therefore been collapsed for simplicity into a single column of Table 1.
Figure 1: The game representing one interaction. Empty circles represent decision nodes, edge labels represent actions, full circles represent terminal nodes, and the dotted oval represents an information set. Payoffs are represented in rectangles: the top row describes the payoff of the client, the second row describes the payoff of the provider.
For every interaction, the reputation mechanism records one of the three different signals it may receive: positive feedback when the client reports 1, negative feedback when the client reports 0, and neutral feedback when the provider rolls back the transaction and reimburses the client. In Figure 1 (and Table 1) positive and neutral feedback do not influence the payoff of the provider, while negative feedback imposes a punishment equivalent to $\bar\varepsilon$.

|  | $in\,1_{\underline q}1_{\bar q}$ | $in\,1_{\underline q}0_{\bar q}$ | $in\,0_{\underline q}1_{\bar q}$ | $in\,0_{\underline q}0_{\bar q}$ | $out$ |
|---|---|---|---|---|---|
| $\underline e\,l$ | $(0,\ 0)$ | $(0,\ 0)$ | $(0,\ 0)$ | $(0,\ 0)$ | $(u-p(1+\rho),\ 0)$ |
| $\underline e\,d$ | $(-p,\ p)$ | $(-p,\ p)$ | $(-p-\varepsilon,\ p-\bar\varepsilon)$ | $(-p-\varepsilon,\ p-\bar\varepsilon)$ | $(u-p(1+\rho),\ 0)$ |
| $\bar e\,l_{\underline q}l_{\bar q}$ | $(0,\ -c)$ | $(0,\ -c)$ | $(0,\ -c)$ | $(0,\ -c)$ | $(u-p(1+\rho),\ 0)$ |
| $\bar e\,l_{\underline q}d_{\bar q}$ | $(\alpha(u-p),\ \alpha p-c)$ | $(\alpha(u-p-\varepsilon),\ \alpha(p-\bar\varepsilon)-c)$ | $(\alpha(u-p),\ \alpha p-c)$ | $(\alpha(u-p-\varepsilon),\ \alpha(p-\bar\varepsilon)-c)$ | $(u-p(1+\rho),\ 0)$ |
| $\bar e\,d_{\underline q}l_{\bar q}$ | $(-(1-\alpha)p,\ (1-\alpha)p-c)$ | $(-(1-\alpha)p,\ (1-\alpha)p-c)$ | $(-(1-\alpha)(p+\varepsilon),\ (1-\alpha)(p-\bar\varepsilon)-c)$ | $(-(1-\alpha)(p+\varepsilon),\ (1-\alpha)(p-\bar\varepsilon)-c)$ | $(u-p(1+\rho),\ 0)$ |
| $\bar e\,d_{\underline q}d_{\bar q}$ | $(\alpha u-p,\ p-c)$ | $(\alpha(u-\varepsilon)-p,\ p-\alpha\bar\varepsilon-c)$ | $(\alpha u-(1-\alpha)\varepsilon-p,\ p-(1-\alpha)\bar\varepsilon-c)$ | $(\alpha u-\varepsilon-p,\ p-\bar\varepsilon-c)$ | $(u-p(1+\rho),\ 0)$ |

Table 1: Normal-form transformation of the extensive-form game $G$. Each cell gives the expected payoffs (client, provider); the four $out$ strategies of the client are collapsed into the single column $out$.

Two considerations made us choose this representation. First, we associate neutral and positive feedback with the same reward (0 in this case) because, intuitively, the acknowledgement of failure may also be regarded as "honest" behavior on behalf of the provider. Failures occur despite best effort, and by acknowledging them the provider shouldn't suffer. However, neutral feedback may also result because the provider did not exert effort. The lack of punishment for these instances contradicts the goal of the reputation mechanism to encourage the exertion of effort. Fortunately, the action $\underline e\,l$ can be the result of rational behavior only in two circumstances, both excusable: one, when the provider defends himself against a malicious client who is expected to falsely report negative feedback (details in Section 6), and two, when the environmental noise is too big ($\alpha$ is too small) to justify exerting effort. Neutral feedback can be used to estimate the parameter $\alpha$, or to detect coalitions of malicious clients, and may thus indirectly influence the revenue of the provider. However, for the simplified model presented above, positive and neutral feedback are considered the same in terms of generated payoffs.
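The entries of Table 1 can be recomputed mechanically. The sketch below is our encoding of the stage game, not code from the paper; it evaluates the expected payoff pair of any pure strategy profile by averaging over nature's move.

```python
def stage_payoffs(effort, act_low, act_high, rep_low, rep_high,
                  p, u, c, alpha, eps, eps_bar):
    """Expected (client, provider) payoffs of a pure strategy profile."""
    client = provider = 0.0
    prob_high = alpha if effort else 0.0
    for high, prob in ((True, prob_high), (False, 1.0 - prob_high)):
        if prob == 0.0:
            continue
        cl, pr = -p, p - (c if effort else 0.0)  # price paid, effort cost sunk
        if (act_high if high else act_low) == "l":
            cl, pr = cl + p, pr - p              # roll back: full reimbursement
        else:
            cl += u if high else 0.0             # only high quality is valuable
            if (rep_high if high else rep_low) == 0:
                cl, pr = cl - eps, pr - eps_bar  # sanctions for a negative report
        client += prob * cl
        provider += prob * pr
    return client, provider

# the socially desired profile (in 0_q 1_q, e l_q d_q) with the pizza numbers:
print(stage_payoffs(True, "l", "d", 0, 1,
                    p=1, u=2, c=0.8, alpha=0.99, eps=0.01, eps_bar=2.5))
# ~ (0.99, 0.19), i.e. (alpha*(u-p), alpha*p - c), as in Table 1
```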
The second argument relates to the role of the RM to constrain the revenue of the provider depending on the feedback of the client. There are several ways of doing that. Dellarocas (2005) describes two principles, and two mechanisms that punish the provider when clients submit negative reports. The first works by exclusion: after each negative report, the reputation mechanism bans the provider from the market with probability $\pi$. This probability can be tuned such that the provider has the incentive to cooperate almost all the time, and the market stays efficient. The second works by changing the conditions of future trade: every negative report triggers a decrease of the price the next $N$ clients will pay for the service. For lower values of $N$ the price decrease is higher; nonetheless, $N$ can take any value in an efficient market.

Both mechanisms work because the future losses offset the momentary gain the provider would have had by intentionally cheating on the client. Note that these penalties are given endogenously by lost future opportunities, and require some minimum premiums for trusted providers. When margins are not high enough, providers do not care enough about future transactions, and will use the present opportunity to cheat.

Another option is to use exogenous penalties for cheating. For example, the provider may be required to buy a licence for operating in the market. The licence is partially destroyed by every negative feedback. Totally destroyed licences must be restored through a new payment, and remaining parts can be sold if the provider quits the market. The price of the licence and the amount that is destroyed by a negative feedback can be scaled such that rational providers have the incentive to cooperate. Unlike the previous solutions, this mechanism does not require minimum transaction margins, as punishments for negative feedback are directly subtracted from the upfront deposit.

4. The reputation mechanism can buy and sell market licences.

One way or another, all reputation mechanisms foster cooperation because the provider associates value to client feedback. Let $V(R^+)$ and $V(R^-)$ be the values of a positive, respectively a negative report. In the game in Figure 1, $V(R^+)$ is normalized to 0, and $V(R^-)$ is $-\bar\varepsilon$. By using this notation, we abstract away the details of the reputation mechanism, and retain only the essential punishment associated with negative feedback. Any reputation mechanism can be plugged into our scheme, as long as its particular constraints (e.g., minimum margins for transactions) are satisfied.

One last aspect to be considered is the influence of the reputation mechanism on the future transactions of the client. If negative reports attract lower prices, rational long-run clients might be tempted to falsely report in order to purchase cheaper services in the future. Fortunately, some of the mechanisms designed for single-run clients do not influence the reporting strategy of long-run clients. The reputation mechanism that only keeps the last $N$ reports (Dellarocas, 2005) is one of them. A false negative report only influences the next $N$ transactions of the provider; given that more than $N$ other requests are interleaved between any two successive requests of the same client, a dishonest reporter cannot decrease the price of her own future transactions. The licence-based mechanism we have described above is another example: the price of service remains unchanged, therefore reporting incentives are unaffected. On the other hand, when negative feedback is punished by exclusion, clients may be more reluctant to report negatively, since they also lose a trading partner.

The one-time game presented in Figure 1 has only one subgame perfect equilibrium, in which the client opts $out$. When asked to report feedback, the client always prefers to report 1 (reporting 0 attracts the penalty $\varepsilon$). Knowing this, the best strategy for the provider is to exert low effort and deliver the service. Knowing the provider will play $\underline e\,d$, it is strictly better for the client to play $out$.

The repeated game between the same client and provider may, however, have other equilibria. Before analyzing the repeated game, let us note that every interaction between a provider and a particular client can be strategically isolated and considered independently. As the provider accepts all clients and views them identically, he will maximize his expected revenue in each of the isolated repeated games.

From now on, we will only consider the repeated interaction between the provider and one client. This can be modeled by a $T$-fold repetition of the stage game $G$, denoted $G^T$, where $T$ is finite or infinite. In this paper we deal with the infinite horizon case; however, the results can also be applied, with minor modifications, to finitely repeated games where $T$ is large enough.

If $\hat\delta$ is the per-period discount factor reflecting the probability that the market ceases to exist after each round (or the present value of future revenues), let us denote by $\delta$ the expected discount factor in the game $G^T$. If our client interacts with the provider on average every $N$ rounds, $\delta = \hat\delta^N$.
The lifetime expected payoff of the players is computed as

$\sum_{\tau=0}^{T} \delta^\tau g_i^\tau,$

where $i \in \{C, P\}$ is the client, respectively the provider, $g_i^\tau$ is the expected payoff obtained by player $i$ in the $\tau$-th interaction, and $\delta^\tau$ is the discount applied to compute the present-day value of $g_i^\tau$. We will consider normalized lifetime expected payoffs, so that payoffs in $G$ and $G^T$ can be expressed using the same measure:

$V_i = (1-\delta) \sum_{\tau=0}^{T} \delta^\tau g_i^\tau. \qquad (1)$

We define the average continuation payoff for player $i$ from period $t$ onward (and including period $t$) as:

$V_i^t = (1-\delta) \sum_{\tau=t}^{T} \delta^{\tau-t} g_i^\tau. \qquad (2)$

The set of outcomes publicly perceived by both players after each round is $Y = \{out,\ l,\ \underline q^0,\ \underline q^1,\ \bar q^0,\ \bar q^1\}$, where:
• $out$ is observed when the client opts $out$;
• $l$ is observed when the provider acknowledges low quality and rolls back the transaction;
• $q^j$ is observed when the provider delivers quality $q \in \{\underline q, \bar q\}$ and the client reports $j \in \{0, 1\}$.

We denote by $h^t$ a specific public history of the repeated game out of the set $H^t$ of all possible histories up to and including period $t$. In the repeated game, a public strategy $\sigma_i$ of player $i$ is a sequence of maps $(\sigma_i^t)$, where $\sigma_i^t: H^{t-1} \to \Delta(A_i)$ prescribes the (mixed) strategy to be played in round $t$, after the public history $h^{t-1} \in H^{t-1}$. A perfect public equilibrium (PPE) is a profile of public strategies $\sigma = (\sigma_C, \sigma_P)$ that, beginning at any time $t$ and given any public history $h^{t-1}$, form a Nash equilibrium from that point on (Fudenberg, Levine, & Maskin, 1994). $V_i^t(\sigma)$ is the continuation payoff to player $i$ given by the strategy profile $\sigma$.

$G$ is a game with product structure, since any public outcome can be expressed as a vector of two components $(y_C, y_P)$ such that the distribution of $y_i$ depends only on the actions of player $i \in \{C, P\}$, the client, respectively the provider. For such games, Fudenberg et al. (1994) establish a Folk Theorem proving that any feasible, individually rational payoff profile is achievable as a PPE of $G^\infty$ when the discount factor is close enough to 1. The set of feasible, individually rational payoff profiles is characterized by:
• the minimax payoff to the client, obtained by the option $out$: $\underline V_C = u - p(1+\rho)$;
0. Both conditions impose restrictions on the minimum margingenerated by a transaction such that the interaction is profitable. The PPE payoff profilethat gives the provider the maximum payoff is ( V C , V P ) where: V P = ( α ∗ u − c − u + p (1 + ρ ) if ρ ≤ u (1 − α ) p p + c ( pρ − u ) αu if ρ > u (1 − α ) p and V C is defined above.While completely characterizing the set of PPE payoffs for discount factors strictlysmaller than 1 is outside the scope of this paper, let us note the following results:First, if the discount factor is high enough (but strictly less than 1) with respect to theprofit margin obtained by the provider from one interaction, there is at least one PPE suchthat the reputation mechanism records only honest reports. Moreover, this equilibrium ispareto-optimal. Proposition 1
Proposition 1

When $\delta > \frac{p}{p(1+\alpha) - c}$, the strategy profile where:
• the provider always exerts high effort, and delivers only high quality; if the client deviates from the equilibrium, the provider switches to $\underline e\,d$ for the rest of the rounds;
• the client always reports 1 when asked to submit feedback; if the provider deviates (i.e., she receives low quality), the client switches to $out$ for the rest of the rounds;
is a pareto-optimal PPE.

Proof.
It is not profitable for the client to deviate from the equilibrium path: reporting 0 attracts the penalty $\varepsilon$ in the present round, and the termination of the interaction with the provider (the provider stops exerting effort from that round onwards).

The provider, on the other hand, can gain momentarily by deviating to $\bar e\,d_{\underline q}d_{\bar q}$ or $\underline e\,d$. A deviation to $\bar e\,d_{\underline q}d_{\bar q}$ gives an expected momentary gain of $p(1-\alpha)$ and an expected continuation loss of $(1-\alpha)(\alpha p - c)$. A deviation to $\underline e\,d$ brings an expected momentary gain equal to $(1-\alpha)p + c$ and an expected continuation loss of $\alpha p - c$. For a discount factor satisfying our hypothesis, neither deviation is profitable: the discount factor is high enough with respect to the profit margins that the future revenues given by the equilibrium strategy offset the momentary gains obtained by deviating.

The equilibrium payoff profile is $(V_C, V_P) = (\alpha(u-p), \alpha p - c)$, which is pareto-optimal and socially efficient. □
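The two deviation conditions in this proof are easy to check numerically. In the sketch below (our arithmetic, using the pizza numbers), a one-shot deviation is unprofitable when $(1-\delta)\cdot gain \le \delta\cdot loss$, i.e., when $\delta \ge gain/(gain+loss)$; the first deviation yields exactly the threshold of the proposition, and is the binding one.

```python
p, c, alpha = 1.0, 0.8, 0.99

# deviation to (e d_q d_q): deliver low quality instead of rolling it back
gain1 = (1 - alpha) * p
loss1 = (1 - alpha) * (alpha * p - c)
print(gain1 / (gain1 + loss1))    # 0.84 = p / (p*(1+alpha) - c)

# deviation to (e d): exert no effort and deliver anyway
gain2 = (1 - alpha) * p + c
loss2 = alpha * p - c
print(gain2 / (gain2 + loss2))    # 0.81, less binding than the first
```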
Second, we can prove that the client never reports negative feedback in any pareto-optimal PPE, regardless of the value of the discount factor. The restriction to pareto-optimal equilibria is justifiable for practical reasons: assuming that the client and the provider can somehow negotiate the equilibrium they are going to play, it makes most sense to choose one of the pareto-optimal equilibria.

Proposition 2
The probability that the client reports negative feedback on the equilibrium path of any pareto-optimal PPE strategy is zero.
Sketch of Proof.
The full proof, presented in Appendix A, proceeds in three steps. Step 1: all equilibrium payoffs can be expressed by adding the present-round payoff to the discounted continuation payoff from the next round onward. Step 2: take a PPE payoff profile $V = (V_C, V_P)$ such that there is no other PPE payoff profile $V' = (V'_C, V_P)$ with $V_C < V'_C$. The client never reports negative feedback in the first round of the equilibrium that gives $V$. Step 3: the equilibrium continuation payoff after the first round also satisfies the conditions set for $V$. Hence, the probability that the client reports negative feedback on the equilibrium path that gives $V$ is 0. Pareto-optimal PPE payoff profiles clearly satisfy the definition of $V$, hence the result of the proposition. □

The third result we want to mention here is that there is an upper bound on the percentage of false reports recorded by the reputation mechanism in any of the pareto-optimal equilibria.
Proposition 3
The upper bound on the percentage of false reports recorded by the reputation mechanism in any pareto-optimal PPE is:

$\gamma \le \begin{cases} \frac{(1-\alpha)(p-u) + p\rho}{p} & \text{if } p\rho \le u(1-\alpha); \\ \frac{p\rho}{u} & \text{if } p\rho > u(1-\alpha). \end{cases} \qquad (3)$

Sketch of Proof.
The full proof, presented in Appendix B, builds directly on the result of Proposition 2. Since clients never report negative feedback along pareto-optimal equilibria, the only false reports recorded by the reputation mechanism appear when the provider delivers low quality and the client reports positive feedback. However, any PPE profile must give the client at least $\underline V_C = u - p(1+\rho)$, otherwise the client is better off resorting to the outside option. Every round in which the provider deliberately delivers low quality gives the client a payoff strictly smaller than $u - p(1+\rho)$. An equilibrium payoff greater than $\underline V_C$ is therefore possible only when the percentage of rounds in which the provider delivers low quality is bounded. The same bound limits the percentage of false reports recorded by the reputation mechanism. □

For a more intuitive understanding of the results presented in this section, let us refer to the pizza delivery example detailed in Section 3.1. The price of a home-delivered pizza is $p = 1$, while at the local restaurant the same pizza would cost $p(1+\rho) = 1.2$.
The utility of a warm pizza to the client is $u = 2$, the cost of delivery is $c = 0.8$, and the probability that a delivery is late despite high effort is $1-\alpha = 0.01$. The minimax payoff of the client is $\underline V_C = u - p(1+\rho) = 0.8$. In the socially efficient equilibrium, the payoff of the client is $V_C = \alpha(u-p) = 0.99$, while the payoff of the provider is $V_P = \alpha p - c = 0.19$. The cooperative equilibrium of Proposition 1 requires a discount factor $\delta > \frac{p}{p(1+\alpha)-c} = 0.84$;
assuming that the daily discount factor of the pizza service is $\hat\delta = 0.996$, the expected lifetime of the provider is $1/(1-\hat\delta) = 250$ interactions (with all clients), while the average lifetime of the client is at least $1/(1-\delta) = 7$ interactions (with the same pizza delivery service). These are clearly realistic numbers.

Proposition 3 gives an upper bound on the percentage of false reports that our mechanism may record in equilibrium from the clients. As $u(1-\alpha) = 0.02 < 0.2 = p\rho$, this limit is:

$\gamma = \frac{p\rho}{u} = 0.1.$

It follows that at least 90% of the reports recorded by our mechanism (in any equilibrium) are correct. The false reports (false positive reports) result from rare cases where the pizza delivery is intentionally delayed to save some cost but clients do not complain. The false report can be justified, for example, by the provider's threat to refuse future orders from clients that complain. Given that late deliveries are still rare enough, clients are better off with the home delivery than with the restaurant, hence they accept the threat. As other options become available to the clients (e.g., competing delivery services), the bound $\gamma$ will decrease.

Please note that the upper bound defined by Proposition 3 only depends on the outside alternative available to the clients, and is not influenced by the punishment $\bar\varepsilon$ introduced by the reputation mechanism. This happens because the revenue of a client is independent of the interactions of other clients, and therefore of the reputation information reported by other clients: equilibrium strategies are exclusively based on the direct experience of the client. In the following section, however, we will refine this bound by considering that clients can build a reputation for reporting honestly. There, the punishment $\bar\varepsilon$ plays an important role.
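Equation (3) is straightforward to evaluate for a concrete market. The snippet below (our transcription of the bound) reproduces the 10% figure of the pizza example.

```python
def gamma_bound(p, u, rho, alpha):
    """Worst-case fraction of false reports in a pareto-optimal PPE, Eq. (3)."""
    if p * rho <= u * (1 - alpha):
        return ((1 - alpha) * (p - u) + p * rho) / p
    return p * rho / u

print(gamma_bound(p=1.0, u=2.0, rho=0.2, alpha=0.99))   # 0.1
```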
5. Building a Reputation for Truthful Reporting
An immediate consequence of Propositions 2 and 3 is that the provider can extract all of the surplus created by the transactions by occasionally delivering low quality, and convincing the clients not to report negative feedback (providers can do so by promising sufficiently high continuation payoffs that prevent the client from resorting to the outside provider). Assuming that the provider has more "power" in the market, he could influence the choice of the equilibrium strategy towards one that gives him the most revenue, and holds the clients close to the minimax payoff $\underline V_C = u - p(1+\rho)$ given by the outside option.

However, a client who could commit to report honestly (i.e., commit to play the strategy $s_C^* = in\,0_{\underline q}1_{\bar q}$) would benefit from cooperative trade. The provider's best response against $s_C^*$ is to play $\bar e\,l_{\underline q}d_{\bar q}$ repeatedly, which leads the game to the socially efficient outcome. Unfortunately, the commitment to $s_C^*$ is not credible in the complete information game, for the reasons explained in Section 4.2.

Following the results of Kreps et al. (1982), Fudenberg and Levine (1989) and Schmidt (1993), we know that such honest reporting commitments may become credible in a game with incomplete information. Suppose that the provider has incomplete information in $G^\infty$, and believes with positive probability that he is facing a committed client that always reports the truth. A rational client can then "fake" the committed client, and "build a reputation" for reporting honestly. When the reputation becomes credible, the provider will play $\bar e\,l_{\underline q}d_{\bar q}$ (the best response against $s_C^*$), which is better for the client than the payoff she would obtain if the provider knew she was the "rational" type.

As an effect of reputation building, the set of equilibrium points is reduced to a set where the payoff to the client is higher than the payoff obtained by a client committed to report honestly. As anticipated from Proposition 3, a smaller set of equilibrium points also reduces the bound on false reports recorded by the reputation mechanism. In certain cases, this bound can be reduced to almost zero.

Formally, incomplete information can be modeled by a perturbation of the complete information repeated game $G^\infty$ such that in period 0 (before the first round of the game is played) the "type" of the client is drawn by nature out of a countable set $\Theta$ according to the probability measure $\mu$. The client's payoff now additionally depends on her type. We say that in the perturbed game $G^\infty(\mu)$ the provider has incomplete information, because he is not sure about the true type of the client.
5. All pareto-optimal PPE payoff profiles are also renegotiation-proof (Bernheim & Ray, 1989; Farrell & Maskin, 1989). This follows from the proof of Proposition 3: the continuation payoffs enforcing a pareto-optimal PPE payoff profile are also pareto-optimal. Therefore, clients falsely report positive feedback even under the more restrictive notion of renegotiation-proof equilibrium.

Two types from $\Theta$ have particular importance:
• The "normal" type of the client, denoted by $\theta$, is the rational client who has the payoffs presented in Figure 1.
• The "commitment" type of the client, denoted by $\theta^*$, always prefers to play the commitment strategy $s_C^*$. From a rational perspective, the commitment type client obtains an arbitrarily high supplementary reward for reporting the truth. This external reward makes the strategy $s_C^*$ the dominant strategy, and therefore no commitment type client will play anything else than $s_C^*$.

In Theorem 1 we give an upper bound $k_P$ on the number of times the provider delivers low quality in $G^\infty(\mu)$, given that he always observes the client reporting honestly.

The intuition behind this result is the following. The provider's best response to an honest reporter is $\bar e\,l_{\underline q}d_{\bar q}$: always exert high effort, and deliver only when the quality is high. This gives the commitment type client her maximum attainable payoff in $G^\infty(\mu)$, corresponding to the socially efficient outcome. The provider, however, would be better off playing against the normal type client, against whom he can obtain an expected payoff greater than $\alpha p - c$.

The normal type client may be distinguished from a commitment type client only in the rounds when the provider delivers low quality: the commitment type always reports negative feedback, while the normal type might decide to report positive feedback in order to avoid the penalty $\varepsilon$. The provider can therefore decide to deliver low quality to the client in order to test her real type. The question is how many times the provider should test the true type of the client.

Every failed test (i.e., the provider delivers low quality and the client reports negative feedback) generates a loss of $\bar\varepsilon$ to the provider, and slightly reinforces the belief that the client reports honestly. Since the provider cannot wait infinitely for future payoffs, there must be a time when the provider stops testing the type of the client, and accepts to play the socially efficient strategy, $\bar e\,l_{\underline q}d_{\bar q}$.

The switch to the socially efficient strategy is not triggered by a revelation of the client's type. The provider believes that the client behaves as if she were a commitment type, not that the client is a commitment type. The client may very well be a normal type who chooses to mimic the commitment type, in the hope that she will obtain better service from the provider. However, further trying to determine the true type of the client is too costly for the provider. Therefore, the provider chooses to play $\bar e\,l_{\underline q}d_{\bar q}$, which is the best response to the commitment strategy $s_C^*$.

Theorem 1
If the provider has incomplete information in $G^\infty$, and assigns positive probability to the normal and commitment types of the client ($\mu(\theta) > 0$, $\mu^* = \mu(\theta^*) > 0$), there is a finite upper bound, $k_P$, on the number of times the provider delivers low quality in any equilibrium of $G^\infty(\mu)$. This upper bound is:

$k_P = \left\lfloor \ln(\mu^*) \Big/ \ln\left(\frac{\delta(\bar V_P - \alpha p + c) + (1-\delta)p}{\delta(\bar V_P - \alpha p + c) + (1-\delta)\bar\varepsilon}\right) \right\rfloor \qquad (4)$

Proof.
First, we use an important result obtained by Fudenberg and Levine (1989) about statistical inference (their Lemma 1): if every previously delivered low quality service was sanctioned by a negative report, the provider must expect with increasing probability that his next low quality delivery will also be sanctioned by negative feedback. Technically, for any $\pi < 1$,
the provider can deliver at most $n(\pi)$ low quality services (sanctioned by negative feedback) before expecting that the $(n(\pi)+1)$-th low quality delivery will also be sanctioned by negative feedback with probability greater than $\pi$. This number equals:

$n(\pi) = \left\lfloor \frac{\ln \mu^*}{\ln \pi} \right\rfloor.$

As stated earlier, this lemma does not prove that the provider will become convinced that he is facing a commitment type client. It simply proves that after a finite number of rounds the provider becomes convinced that the client is playing as if she were a commitment type.

Second, if $\pi > \frac{\delta \bar V_P}{\delta \bar V_P + (1-\delta)\bar\varepsilon}$ but is strictly smaller than 1, the rational provider does not deliver low quality (it is easy to verify that the maximum discounted future gain does not compensate for the risk of getting a negative feedback in the present round). By the previously mentioned lemma, it must be that in any equilibrium the provider delivers low quality a finite number of times.

Third, let us analyze the round $\bar t$ when the provider is about to deliver a low quality service (play $d_{\underline q}$) for the last time. If $\pi$ is the belief of the provider that the client reports honestly in round $\bar t$, his expected payoff (just before deciding to deliver the low quality service) can be computed as follows:
• with probability $\pi$ the client reports 0. Her reputation for reporting honestly becomes credible, so the provider plays $\bar e\,l_{\underline q}d_{\bar q}$ in all subsequent rounds. The provider gains $p - \bar\varepsilon$ in the current round, and expects $\alpha p - c$ for the subsequent rounds;
• with probability $1-\pi$, the client reports 1 and deviates from the commitment strategy; the provider knows he is facing a rational client, and can choose a continuation PPE strategy from the complete information game. He gains $p$ in the current round, and expects at most $\bar V_P$ in the subsequent rounds.

Hence:

$V_P \le (1-\delta)(p - \pi\bar\varepsilon) + \delta\bigl(\pi(\alpha p - c) + (1-\pi)\bar V_P\bigr).$

On the other hand, had the provider acknowledged the low quality and rolled back the transaction (i.e., played $l_{\underline q}$), his expected payoff would have been at least:

$V'_P \ge (1-\delta)\cdot 0 + \delta(\alpha p - c).$

Since the provider nonetheless chooses to play $d_{\underline q}$, it must be that $V_P \ge V'_P$, which is equivalent to:

$\pi \le \bar\pi = \frac{\delta(\bar V_P - \alpha p + c) + (1-\delta)p}{\delta(\bar V_P - \alpha p + c) + (1-\delta)\bar\varepsilon}. \qquad (5)$
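For completeness, here is the algebra behind Equation (5) (our expansion; it combines the upper bound on $V_P$ with $V'_P \ge \delta(\alpha p - c)$):

```latex
\begin{align*}
(1-\delta)(p - \pi\bar\varepsilon)
   + \delta\bigl(\pi(\alpha p - c) + (1-\pi)\bar V_P\bigr)
   &\ge \delta(\alpha p - c) \\
(1-\delta)p - (1-\delta)\pi\bar\varepsilon
   + \delta(1-\pi)\bigl(\bar V_P - \alpha p + c\bigr) &\ge 0 \\
\delta(\bar V_P - \alpha p + c) + (1-\delta)p
   &\ge \pi\bigl[\delta(\bar V_P - \alpha p + c) + (1-\delta)\bar\varepsilon\bigr],
\end{align*}
```

and dividing by the bracketed term gives the bound $\bar\pi$ of Equation (5).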
0; this is true for: ˆ V C > δ k P − α ( u − p ) − (1 − δ k P − )( p + ε ) − − δδ ε ; Following the argument of Proposition 3 we can obtain a bound on the percentage offalse reports recorded by the reputation mechanism in a pareto-optimal PPE that gives theclient at least ˆ V C : ˆ γ = ( α ( u − p ) − ˆ V C p if ˆ V C ≥ αu − p ; u − p − ˆ V C u if ˆ V C < αu − p (8) Of particular importance is the case when k P = 1. ˆ V C and ˆ γ become: ˆ V C = α ( u − p ) − − δδ ε ; ˆ γ = (1 − δ ) εδp ; (9) so the probability of recording a false report (after the first one) can be arbitrarily close to0 as ε → k P ,defined in Theorem 1, as a function of the prior belief ( µ ∗ ) of the provider that the clientis an honest reporter. We have used a value of the discount factor equal to δ = 0 .
95, suchthat on average, every client interacts 1 / (1 − δ ) = 20 times with the same provider. Thepenalty for negative feedback was taken ¯ ε = 2 .
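As a concrete reading of Theorem 1, the following sketch evaluates $n(\pi)$, the threshold $\overline{\pi}$ of Equation (5), and the resulting bound $k_P$. Only δ and $\bar{\varepsilon}$ are taken from the text; the values of α, p, c and the stand-in value for $V_P$ are hypothetical, so the numbers printed here need not match Figure 3.

```python
import math

def n_of_pi(mu_star, pi):
    # n(pi) = floor(ln(mu*) / ln(pi)); both logarithms are negative
    # for mu*, pi in (0, 1), so the ratio is positive
    return math.floor(math.log(mu_star) / math.log(pi))

def pi_bar(V_P, alpha, p, c, delta, eps_bar):
    # Equation (5): the largest belief pi for which delivering low
    # quality one more time can still be rational for the provider
    num = delta * (V_P - alpha * p + c) + (1 - delta) * p
    den = delta * (V_P - alpha * p + c) + (1 - delta) * eps_bar
    return num / den

delta, eps_bar = 0.95, 2.5        # values used in the text
alpha, p, c = 0.9, 1.0, 0.3       # hypothetical market parameters
V_P = 0.7                         # hypothetical best PPE payoff of the provider

for mu_star in (0.2, 0.4):
    k_P = n_of_pi(mu_star, pi_bar(V_P, alpha, p, c, delta, eps_bar))
    print(f"mu* = {mu_star}: at most {k_P} low quality deliveries")
```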
When the provider believes that 20% of the clients always report honestly, he will deliver low quality at most 3 times. When the belief goes up to $\mu^* = 40\%$, no rational provider will deliver low quality more than once.

Figure 3: The upper bound $k_P$ as a function of the prior belief $\mu^*$.

In Figure 4 we plot the values of the bounds γ (Equation (3)) and $\hat{\gamma}$ (Equation (8)) as a function of the prior belief $\mu^*$. The bounds hold simultaneously, therefore the maximum percentage of false reports recorded by the reputation mechanism is the minimum of the two. When $\mu^*$ is less than 0.2, $k_P \ge 3$ and $\gamma \le \hat{\gamma}$, so the reputation effect does not significantly reduce the worst case percentage of false reports recorded by the mechanism. However, when $\mu^* \in (0.2, 0.4)$ the reputation mechanism records (in the worst case) only half as many false reports, and once $\mu^* > 0.4$, the percentage of false reports drops to $(1-\delta)\varepsilon/(\delta p)$ (Equation (9)). In the limit, as ε approaches 0, the reputation mechanism will register a false report with vanishing probability.

Figure 4: The maximum probability of recording a false report as a function of the prior belief $\mu^*$ (the curves γ, $\hat{\gamma}$ and min(γ, $\hat{\gamma}$)).

The result of Theorem 1 has to be interpreted as a worst case scenario. In real markets, providers that already have a small predisposition to cooperate will defect fewer times. Moreover, the mechanism is self-enforcing, in the sense that the more clients act as commitment types, the higher the prior beliefs of the providers that new, unknown clients will report truthfully, and therefore the easier it becomes for new clients to act as truthful reporters.

As mentioned at the end of Section 4.2, the bound $\hat{\gamma}$ strongly depends on the punishment $\bar{\varepsilon}$ imposed by the reputation mechanism for a negative feedback. The higher $\bar{\varepsilon}$, the easier it is for clients to build a reputation, and therefore the lower the amount of false information recorded by the reputation mechanism.
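For completeness, here is a short sketch of how the two bounds combine into the minimum plotted in Figure 4. The branch conditions restate Equation (8) and the bound proven in Appendix B; all numeric values are hypothetical and are not meant to reproduce the figure.

```python
def gamma_bound(alpha, u, p, rho):
    # Proposition 3 (Appendix B): worst-case fraction of false
    # reports recorded in any PPE
    if p * rho <= u * (1 - alpha):
        return ((1 - alpha) * (p - u) + p * rho) / p
    return (p * rho) / u

def gamma_hat_bound(alpha, u, p, V_hat_C):
    # Equation (8): bound for pareto-optimal PPEs that give the
    # client at least V_hat_C
    if V_hat_C >= alpha * u - p:
        return (alpha * (u - p) - V_hat_C) / p
    return (u - p - V_hat_C) / u

alpha, u, p, rho = 0.9, 2.0, 1.0, 0.2   # hypothetical market parameters
delta, eps = 0.95, 0.1                  # discount factor, reporting cost
V_hat_C = alpha * (u - p) - (1 - delta) / delta * eps  # Equation (9), k_P = 1

g = gamma_bound(alpha, u, p, rho)
g_hat = gamma_hat_bound(alpha, u, p, V_hat_C)
print(f"gamma = {g:.3f}, gamma_hat = {g_hat:.4f}, min = {min(g, g_hat):.4f}")
```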
6. The Threat of Malicious Clients
The mechanism described so far encourages service providers to do their best and deliver good service. The clients were assumed rational, or committed to report honestly, and in either case they never report negative feedback unfairly. In this section, we investigate what happens when clients explicitly try to "hurt" the providers by submitting fake negative ratings to the reputation mechanism.

An immediate consequence of fake negative reports is that clients lose money. However, the cost ε of a negative report would probably be too small to deter clients with separate agendas from hurting the provider. Fortunately, the mechanism we propose naturally protects service providers from consistent attacks initiated by malicious clients.

Formally, a malicious type client, $\theta_\beta \in \Theta$, obtains a supplementary (external) payoff β for reporting negative feedback. Obviously, β has to be greater than the penalty ε, otherwise the results of Proposition 2 would apply. In the incomplete information game $G^\infty(\mu)$, the provider now assigns non-zero initial probability to the belief that the client is malicious.

When only the normal type θ, the honest reporter type $\theta^*$ and the malicious type $\theta_\beta$ have non-zero initial probability, the mechanism we describe is robust against unfair negative reports. The first false negative report exposes the client as malicious, since neither the normal nor the commitment type reports 0 after receiving high quality. By Bayes' Law, the provider's updated belief following a false negative report must assign probability 1 to the malicious type. Although providers are not allowed to refuse service requests, they can protect themselves against malicious clients by playing $(\bar{e}, l)$: i.e., exert low effort and reimburse the client afterwards. The RM records neutral feedback in this case, and does not sanction the provider. Against $(\bar{e}, l)$, malicious clients are better off quitting the market (opt out), thus stopping the attack. The RM records at most one false negative report for every malicious client, and assuming that identity changes are difficult, providers are not vulnerable to unfair punishments.

When other types (besides θ, $\theta^*$ and $\theta_\beta$) have non-zero initial probability, malicious clients are harder to detect. They could masquerade as client types that are normal, but accidentally misreport. It is not rational for the provider to immediately exclude (by playing $(\bar{e}, l)$) normal clients that rarely misreport: the majority of the cooperative transactions rewarded by positive feedback still generate positive payoffs. Let us now consider the client type $\theta(\nu) \in \Theta$ that behaves exactly like the normal type, but misreports 0 instead of 1 independently with probability ν. When interacting with the client type θ(ν), the provider receives the maximum number of unfair negative reports when playing the efficient equilibrium, i.e., $(e, l_{\bar{q}}, d_q)$. In this case, the provider's expected payoff is:

$$V_P = \alpha p - c - \nu\bar{\varepsilon};$$

Since $V_P$ has to be positive (the minimax payoff of the provider is 0, given by $(\bar{e}, l)$), it must be that $\nu \le \frac{\alpha p - c}{\bar{\varepsilon}}$.

The maximum value of ν is also a good approximation of the maximum percentage of false negative reports the malicious type can submit to the reputation mechanism.
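For illustration, with the hypothetical values $\alpha p - c = 0.5$ and $\bar{\varepsilon} = 2.5$, the provider can absorb at most $\nu = 0.5/2.5 = 20\%$ unfair negative reports per client before serving that client stops being profitable; beyond this rate, playing $(\bar{e}, l)$ becomes the better option.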
Any significantly higher number of harmful reports exposes the malicious type and allows the provider to defend himself.

Note, however, that the malicious type can submit a fraction ν of false reports only when the type θ(ν) has positive prior probability. When the provider does not believe that a normal client can make so many mistakes (even if the percentage of false reports is still low enough to generate positive revenues), he attributes the false reports to a malicious type and disengages from cooperative behavior. Therefore, one method to reduce the impact of malicious clients is to make sure that normal clients make few or no mistakes. Technical means (for example, automated tools for formatting and submitting feedback) or improved user interfaces (that make it easier for human users to spot reporting mistakes) will greatly limit the percentage of mistakes made by normal clients, and therefore also reduce the amount of harm done by malicious clients.

One concrete method for reducing mistakes is to solicit only negative feedback from the clients (the principle that no news is good news, also applied by Dellarocas (2005)). As reporting then involves a conscious decision, mistakes will be less frequent. On the other hand, the reporting effort adds to the penalty for a negative report, and makes it harder for normal clients to establish a reputation as honest reporters. Alternative methods for reducing the harm done by malicious clients (like filtering mechanisms), as well as tighter bounds on the percentage of false reports introduced by such clients, will be further addressed in future work.
7. Discussion and Future Work
Further benefits can be obtained if the clients' reputation for reporting honestly is shared within the market. The reports submitted by a client while interacting with other providers will change the initial beliefs of a new provider. As we have seen in Section 5, providers cheat less if they a priori expect with higher probability to encounter honest reporting clients. A client that has once built a reputation for truthfully reporting the provider's behavior will benefit from cooperative trade during her entire lifetime, without having to convince each provider separately. Therefore the upper bound on the loss a client has to withstand in order to convince a provider that she is a commitment type becomes an upper bound on the total loss a client has to withstand during her entire lifetime in the market. How to effectively share the reputation of clients within the market remains an open issue.

Related to this idea is the observation that clients that use our mechanism are motivated to keep their identity. In generalized markets where agents are encouraged to play both roles (e.g., a peer-to-peer file sharing market where the fact that an agent acts only as "provider" can be interpreted as a strong indication of "double identity" with the intention of cheating), our mechanism also solves the problem signaled by Friedman and Resnick (2001) related to cheap online pseudonyms. The price to pay for a new identity is the loss due to building a reputation as a truthful reporter when acting as a client.

Unlike incentive-compatible mechanisms that pay reporters depending on the feedback provided by peers, the mechanism described here is less vulnerable to collusion. The only reason individual clients would collude is to badmouth (i.e., artificially decrease the reputation of) a provider. However, as long as the punishment for negative feedback is not super-linear in the number of reports (this is usually the case), coordinating within a coalition brings no benefits for the colluders: individual actions are just as effective as actions taken as part of a coalition. Collusion between the provider and a client can only accelerate the synchronization of strategies on one of the PPE profiles (collusion on a non-PPE strategy profile is not stable), which is rather desirable. The only profitable collusion can happen when competitor providers incentivize normal clients to unfairly downrate their current provider. Colluding clients become malicious in this case, and the limits on the harm they can do are presented in Section 6.

The mechanism we describe here is not a general solution for all online markets. In general retail e-commerce, clients don't usually interact with the same service provider more than once. As we have shown throughout this paper, the assumption of a repeated interaction is crucial for our results. Nevertheless, we believe there are several scenarios of practical importance that do meet our requirements (e.g., interactions that are part of a supply chain). For these, our mechanism can be used in conjunction with other reputation mechanisms to guarantee reliable feedback and improve the overall efficiency of the market.

Our mechanism can be further criticized for being centralized. The reputation mechanism acts as a central authority by supervising monetary transactions, collecting feedback and imposing penalties on the participants. However, we see no problem in implementing the reputation mechanism as a distributed system.
Different providers can use different reputation mechanisms, or can even switch mechanisms, given that some safeguarding measures are in place. Concrete implementations remain to be addressed by future work.

Although we present a setting where the service always costs the same amount, our results can be easily extended to scenarios where the provider may deliver different kinds of services, having different prices. As long as the provider believes that requests are randomly drawn from some distribution, the bounds presented above can be computed using the average values of u, p and c. The constraint on the provider's belief is necessary in order to exclude some unlikely situations where the provider cheats on a one-time high value transaction, knowing that the following interactions carry little revenue and therefore cannot impose effective punishments.

In this paper, we systematically overestimate the bounds on the worst case percentage of false reports recorded by the mechanism. The computation of tight bounds requires a precise quantitative description of the actual set of PPE payoffs the client and the provider can have in $G^\infty$. Fudenberg et al. (1994) and Abreu, Pearce, and Stacchetti (1990) lay the theoretical grounds for computing the set of PPE payoffs in an infinitely repeated game with discount factors strictly smaller than 1. However, efficient algorithms that allow us to find this set are still an open question. As research in this domain progresses, we expect to be able to significantly lower the upper bounds described in Sections 4 and 5.

One direction of future research is to study the behavior of the above mechanism when there is two-sided incomplete information: i.e., the client is also uncertain about the type of the provider. A provider type of particular importance is the "greedy" type who always tries to hold the client to a continuation payoff arbitrarily close to the minimal one. In this situation we expect to be able to find an upper bound $k_C$ on the number of rounds in which a rational client would be willing to test the true type of the provider. The condition $k_P < k_C$ describes the constraints on the parameters of the system for which the reputation effect works in favor of the client: i.e., the provider is the first to give up the "psychological" war and reverts to a cooperative equilibrium.

The problem of involuntary reporting mistakes briefly mentioned in Section 6 needs further attention. Besides false negative mistakes (reporting 0 instead of 1), normal clients can also make false positive mistakes (reporting 1 instead of the intended 0). In our present framework, one such mistake is enough to ruin the reputation of a normal type client as an honest reporter. This is one of the reasons why we chose a sequential model where the feedback of the client is not required if the provider acknowledges low quality. Once the reputation of the client becomes credible, the provider always rolls back the transactions that generate (accidentally or not) low quality, so the client is not required to continuously defend her reputation. Nevertheless, the consequences of reporting mistakes in the reputation building phase must be considered in more detail. Similarly, mistakes made by the provider, as well as monitoring and communication errors, will also influence the results presented here.

Last, but not least, practical implementations of the mechanism we propose must address the problem of persistent online identities.
One possible attack created by easy identity changes has been mentioned in Section 6: malicious buyers can continuously change identity in order to discredit the provider. In another attack, the provider can use fake identities to increase his revenue. When punishments for negative feedback are generated endogenously by decreased prices in a fixed number of future transactions (e.g., Dellarocas, 2005), the provider can adopt the following strategy: he cheats on all real customers, but generates a sufficient number of fake transactions in between two real transactions, such that the effect created by the real negative report disappears. An easy fix to this latter attack is to charge transaction or entrance fees. However, these measures also affect the overall efficiency of the market, and therefore different applications will most likely need individual solutions.
8. Conclusions
Effective reputation mechanisms must provide appropriate incentives in order to obtain honest feedback from self-interested clients. For environments characterized by adverse selection, direct payments can explicitly reward honest information by conditioning the amount to be paid on the information reported by other peers. The same technique unfortunately does not work when service providers have moral hazard and can individually decide which requests to satisfy. Sanctioning reputation mechanisms must therefore use other means to obtain reliable feedback.

In this paper we describe an incentive-compatible reputation mechanism for the case where clients also have a repeated presence in the market. Before asking the clients for feedback, we allow the provider to acknowledge failures and reimburse the price paid for the service. When future transactions generate sufficient profit, we prove that there is an equilibrium where the provider behaves as socially desired: he always exerts effort, and reimburses clients that occasionally receive bad service due to uncontrollable factors. Moreover, we analyze the set of pareto-optimal equilibria of the mechanism, and establish a limit on the maximum amount of false information recorded by the mechanism. The bound depends both on the external alternatives available to clients and on the ease with which they can commit to reporting the truth.
Appendix A. Proof of Proposition 2
The probability that the client reports negative feedback on the equilibrium path of any pareto-optimal PPE strategy is zero.
Proof.
Step 1. Following the principle of dynamic programming (Abreu et al., 1990), the payoff profile $V = (V_C, V_P)$ is a PPE of $G^\infty$ if and only if there is a strategy profile σ in G, and continuation PPE payoff profiles $\{W(y) \mid y \in Y\}$ of $G^\infty$, such that:

• V is obtained by playing σ in the current round, followed by a PPE strategy that gives W(y) as a continuation payoff, where y is the public outcome of the current round and $Pr[y \mid \sigma]$ is the probability of observing y after playing σ:

$$V_C = (1-\delta)g_C(\sigma) + \delta\Big(\sum_{y\in Y} Pr[y \mid \sigma]\cdot W_C(y)\Big); \qquad V_P = (1-\delta)g_P(\sigma) + \delta\Big(\sum_{y\in Y} Pr[y \mid \sigma]\cdot W_P(y)\Big);$$

• no player finds it profitable to deviate from σ:

$$V_C \ge (1-\delta)g_C\big((\sigma'_C,\sigma_P)\big) + \delta\Big(\sum_{y\in Y} Pr\big[y \mid (\sigma'_C,\sigma_P)\big]\cdot W_C(y)\Big); \quad \forall \sigma'_C \ne \sigma_C$$

$$V_P \ge (1-\delta)g_P\big((\sigma_C,\sigma'_P)\big) + \delta\Big(\sum_{y\in Y} Pr\big[y \mid (\sigma_C,\sigma'_P)\big]\cdot W_P(y)\Big); \quad \forall \sigma'_P \ne \sigma_P$$

The strategy σ and the payoff profiles $\{W(y) \mid y \in Y\}$ are said to enforce V.
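The two conditions above lend themselves to a direct computational check when the outcome set and the candidate deviations are finite. The sketch below is only an illustration of this test; the inputs (stage payoff function, outcome distribution, deviation sets) are hypothetical stand-ins for the abstract objects of the proof.

```python
TOL = 1e-9  # numeric tolerance; players are indexed 0 (client), 1 (provider)

def enforces(V, sigma, W, g, outcome_dist, deviations, delta):
    """True iff (sigma, {W(y)}) enforces the payoff profile V = (V_C, V_P).

    g(i, profile)          -- expected stage payoff g_i(profile)
    outcome_dist(profile)  -- dict mapping outcome y to Pr[y | profile]
    W[y]                   -- continuation PPE payoff profile (W_C(y), W_P(y))
    deviations[i]          -- alternative stage strategies for player i
    """
    def value(i, profile):
        cont = sum(pr * W[y][i] for y, pr in outcome_dist(profile).items())
        return (1 - delta) * g(i, profile) + delta * cont

    # promise keeping: V is generated by sigma and the continuations W(y)
    if any(abs(value(i, sigma) - V[i]) > TOL for i in (0, 1)):
        return False
    # incentive compatibility: no unilateral one-shot deviation is profitable
    for i in (0, 1):
        for s in deviations[i]:
            profile = (s, sigma[1]) if i == 0 else (sigma[0], s)
            if value(i, profile) > V[i] + TOL:
                return False
    return True
```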
Step 2. Take the PPE payoff profile $V = (V_C, V_P)$ such that there is no other PPE payoff profile $V' = (V'_C, V_P)$ with $V_C < V'_C$. Let σ and $\{W(y) \mid y\in Y\}$ enforce V, and assume that σ assigns positive probability $\beta_{\bar{q}0} = Pr[\bar{q}0 \mid \sigma] > 0$ to the outcome in which low quality is delivered and reported. With $\beta_{\bar{q}1} = Pr[\bar{q}1 \mid \sigma]$ (possibly equal to 0), let us consider:

• the strategy profile $\sigma' = (\sigma'_C, \sigma_P)$ where $\sigma'_C$ is obtained from $\sigma_C$ by asking the client to report 1 instead of 0 when she receives low quality (i.e., $\bar{q}$);
• the continuation payoffs $\{W'(y) \mid y\in Y\}$ such that

$$W'_i(\bar{q}1) = \frac{\beta_{\bar{q}0} W_i(\bar{q}0) + \beta_{\bar{q}1} W_i(\bar{q}1)}{\beta_{\bar{q}0} + \beta_{\bar{q}1}}; \qquad W'_i(y \ne \bar{q}1) = W_i(y) \quad \text{for } i \in \{C, P\}.$$

Since the set of correlated PPE payoff profiles of $G^\infty$ is convex, if the W(y) are PPE payoff profiles, so are the W'(y).

The payoff profile $(V'_C, V_P)$ with $V'_C = V_C + (1-\delta)\beta_{\bar{q}0}\varepsilon$ is a PPE payoff profile because it can be enforced by σ' and $\{W'(y) \mid y\in Y\}$. However, $V'_C > V_C$ contradicts our assumption that no PPE payoff profile $(V'_C, V_P)$ improves on $V_C$, so $Pr[\bar{q}0 \mid \sigma]$ must be 0. Following exactly the same argument, we can prove that $Pr[q0 \mid \sigma] = 0$.
Step 3. Taking V, σ and $\{W(y) \mid y\in Y\}$ from Step 2, we have:

$$V_C = (1-\delta)g_C(\sigma) + \delta\Big(\sum_{y\in Y} Pr[y \mid \sigma]\cdot W_C(y)\Big); \qquad (10)$$

If no other PPE payoff profile $V' = (V'_C, V_P)$ can have $V'_C > V_C$, it must be that the continuation payoffs W(y) satisfy the same property. (Assume otherwise that there is a PPE $(W'_C(y), W_P(y))$ with $W'_C(y) > W_C(y)$. Replacing $W'_C(y)$ in (10) we obtain a payoff $V'$ that contradicts the hypothesis.)

By continuing the recursion, we obtain that the client never reports 0 on the equilibrium path that enforces a payoff profile as defined in Step 2. Pareto-optimal payoff profiles clearly enter this category, hence the result of the proposition. □

Appendix B. Proof of Proposition 3
The upper bound on the percentage of false reports recorded by the reputation mechanism in any PPE equilibrium is:

$$\gamma \le \begin{cases} \dfrac{(1-\alpha)(p-u) + p\rho}{p} & \text{if } p\rho \le u(1-\alpha); \\[6pt] \dfrac{p\rho}{u} & \text{if } p\rho > u(1-\alpha) \end{cases}$$

Proof.
Since clients never report negative feedback along pareto-optimal equilibria, the only false reports recorded by the reputation mechanism appear when the provider delivers low quality and the client reports positive feedback. Let $\sigma = (\sigma_C, \sigma_P)$ be a pareto-optimal PPE strategy profile. σ induces a probability distribution over public histories and, therefore, over the expected outcomes in each of the following transactions. Let $\mu_t$ be the probability distribution induced by σ over the outcomes in round t. $\mu_t(\bar{q}0) = \mu_t(q0) = 0$ as proven by Proposition 2. The payoff received by the client when playing σ is therefore:

$$V_C(\sigma) \le (1-\delta)\sum_{t=0}^{\infty}\delta^t\Big(\mu_t(\bar{q}1)(-p) + \mu_t(q1)(u-p) + \mu_t(l)\cdot 0 + \mu_t(out)(u-p-p\rho)\Big);$$

where $\mu_t(\bar{q}1) + \mu_t(q1) + \mu_t(l) + \mu_t(out) = 1$ and $\mu_t(\bar{q}1) + \mu_t(l) \ge (1-\alpha)\mu_t(q1)/\alpha$, because the probability of $\bar{q}$ is at least $(1-\alpha)/\alpha$ times the probability of q. When the discount factor δ is interpreted as the probability that the repeated interaction continues after each transaction, the expected probability of the outcome $\bar{q}1$ is:

$$\gamma = (1-\delta)\sum_{t=0}^{\infty}\delta^t \mu_t(\bar{q}1);$$

Since any PPE profile must give the client at least $\underline{V}_C = u - p(1+\rho)$ (otherwise the client is better off resorting to the outside option), $V_C(\sigma) \ge \underline{V}_C$. By replacing the expression of $V_C(\sigma)$, and taking into account the constraints on the outcome probabilities, we obtain:

$$\gamma(-p) + (u-p)\cdot\min(1-\gamma,\, \alpha) \ge \underline{V}_C;$$

$$\gamma \le \begin{cases} \dfrac{(1-\alpha)(p-u) + p\rho}{p} & \text{if } p\rho \le u(1-\alpha); \\[6pt] \dfrac{p\rho}{u} & \text{if } p\rho > u(1-\alpha) \end{cases}$$

□

References
Abreu, D., Pearce, D., & Stacchetti, E. (1990). Toward a Theory of Discounted Repeated Games with Imperfect Monitoring. Econometrica, 58(5), 1041–1063.

Bernheim, B. D., & Ray, D. (1989). Collective Dynamic Consistency in Repeated Games. Games and Economic Behavior, 1, 295–326.

Birk, A. (2001). Learning to Trust. In Falcone, R., Singh, M., & Tan, Y.-H. (Eds.), Trust in Cyber-societies, Vol. LNAI 2246, pp. 133–144. Springer-Verlag, Berlin Heidelberg.

Biswas, A., Sen, S., & Debnath, S. (2000). Limiting Deception in a Group of Social Agents. Applied Artificial Intelligence, 14, 785–797.

Braynov, S., & Sandholm, T. (2002). Incentive Compatible Mechanism for Trust Revelation. In Proceedings of the AAMAS, Bologna, Italy.

Cooke, R. (1991). Experts in Uncertainty: Opinion and Subjective Probability in Science. Oxford University Press: New York.

Dellarocas, C. (2002). Goodwill Hunting: An Economically Efficient Online Feedback Mechanism for Environments with Variable Product Quality. In Padget, J., et al. (Eds.), Agent-Mediated Electronic Commerce IV. Designing Mechanisms and Systems, Vol. LNCS 2531, pp. 238–252. Springer Verlag.

Dellarocas, C. (2005). Reputation Mechanism Design in Online Trading Environments with Pure Moral Hazard. Information Systems Research, 16(2), 209–230.

Farrell, J., & Maskin, E. (1989). Renegotiation in Repeated Games. Games and Economic Behavior, 1, 327–360.

Friedman, E., & Resnick, P. (2001). The Social Cost of Cheap Pseudonyms. Journal of Economics and Management Strategy, 10, 173–199.

Fudenberg, D., & Levine, D. (1989). Reputation and Equilibrium Selection in Games with a Patient Player. Econometrica, 57, 759–778.

Fudenberg, D., Levine, D., & Maskin, E. (1994). The Folk Theorem with Imperfect Public Information. Econometrica, 62(5), 997–1039.

Harmon, A. (2004). Amazon Glitch Unmasks War of Reviewers. The New York Times.

Houser, D., & Wooders, J. (2006). Reputation in Auctions: Theory and Evidence from eBay. Journal of Economics and Management Strategy, 15, 353–369.

Jurca, R., & Faltings, B. (2006). Minimum Payments that Reward Honest Reputation Feedback. In Proceedings of the ACM Conference on Electronic Commerce (EC'06), pp. 190–199, Ann Arbor, Michigan, USA.

Kreps, D. M., Milgrom, P., Roberts, J., & Wilson, R. (1982). Rational Cooperation in the Finitely Repeated Prisoner's Dilemma. Journal of Economic Theory, 27, 245–252.

Kreps, D. M., & Wilson, R. (1982). Reputation and Imperfect Information. Journal of Economic Theory, 27, 253–279.

Kuwabara, K. (2003). Decomposing Reputation Effects: Sanctioning or Signaling? Working paper.

Mailath, G., & Samuelson, L. (2006). Repeated Games and Reputations: Long-Run Relationships. Oxford University Press.

Milgrom, P., & Roberts, J. (1982). Predation, Reputation and Entry Deterrence. Journal of Economic Theory, 27, 280–312.

Miller, N., Resnick, P., & Zeckhauser, R. (2005). Eliciting Informative Feedback: The Peer-Prediction Method. Management Science, 51, 1359–1373.

Papaioannou, T. G., & Stamoulis, G. D. (2005). An Incentives' Mechanism Promoting Truthful Feedback in Peer-to-Peer Systems. In Proceedings of IEEE/ACM CCGRID 2005.

Resnick, P., & Zeckhauser, R. (2002). Trust Among Strangers in Electronic Transactions: Empirical Analysis of eBay's Reputation System. In Baye, M. (Ed.), The Economics of the Internet and E-Commerce, Vol. 11 of Advances in Applied Microeconomics. Elsevier Science, Amsterdam.

Schillo, M., Funk, P., & Rovatsos, M. (2000). Using Trust for Detecting Deceitful Agents in Artificial Societies. Applied Artificial Intelligence, 14, 825–848.

Schmidt, K. M. (1993). Reputation and Equilibrium Characterization in Repeated Games with Conflicting Interests. Econometrica, 61, 325–351.

Selten, R. (1978). The Chain-Store Paradox. Theory and Decision, 9, 127–159.

Yu, B., & Singh, M. (2002). An Evidential Model of Distributed Reputation Management. In Proceedings of the AAMAS, Bologna, Italy.

Yu, B., & Singh, M. (2003). Detecting Deception in Reputation Management. In Proceedings of the AAMAS, Melbourne, Australia.